Using the Simple ECMAScript Engine (SEE)

by David Leonard, 2003.

ECMAScript is a standardized language also known variously as JavaScript, JScript, and LiveScript. SEE is a library that provides a parser and runtime environment for this language. It conforms to ECMAScript Edition 3, and to JavaScript 1.5, with some compatibility switches for earlier versions of JavaScript and Microsoft's JScript.

This documentation is intended for developers wishing to incorporate SEE into their applications. It explains how you can use SEE to:

I will use the phrase "host application" to mean your application, or any application that uses the SEE runtime environment as auxiliary to some primary purpose. Typical examples of host applications are web browsers and scripted XML processors.

Throughout this documentation, references are made to the C functions and macros provided by the SEE library. To avoid definitional redundancy and to improve precision, the reader is encouraged to examine the SEE header files to find the precise definitions and arguments of each function or macro.

Requirements

SEE is written completely in ANSI C. Although SEE is essentially self-contained, it does depend on the host application or developer providing the following:

At install time, SEE uses GNU configure to determine if these are available, and also to determine other system-dependent properties. Host applications should #include <see/see.h> to access all the macro and function prototypes.

(The developer may find they need to edit the header files to make SEE work for their system. I would be interested in hearing what changes were needed so that future releases can supply this automatically for other users.)

Creating interpreters

The first step in running an ECMAScript program with SEE is to create an interpreter instance. First, allocate storage for a struct SEE_interpreter and then call SEE_interpreter_init() to initialise it.

A pointer to your initialised SEE_interpreter structure is required for almost every function that SEE provides.

SEE supports multiple interpreter instances. For example, in HTML web browsers, each window needs its own interpreter instance as the variables and bindings to built-in objects can be different and separate in each one.

SEE does not support the sharing of object instances across interpreter instances. SEE's functions are not thread-safe, but if the memory allocator is thread-safe then separate threads can use separate interpreters without conflict: SEE treats its static storage as read-only.

There is no mechanism for explicitly deallocating an initialised interpreter; instead, SEE expects the garbage collector to reclaim all unreferenced storage.

If SEE encounters an internal error (such as memory exhaustion, memory corruption, or a bug), it calls the function pointer SEE_abort, passing it a pointer to the interpreter in context. The SEE_abort global variable initially points to a function that simply calls the C library function abort().

Memory management

It is strongly recommended that the host application use the same memory management as SEE. SEE provides 'hook' function pointers that the host application can initialise for this purpose. These pointers must be set up before any interpreter instances are created.

SEE manages memory by calling the following function pointers:

 void* (*SEE_mem_malloc_hook)(struct SEE_interpreter *, unsigned int);
 void  (*SEE_mem_free_hook)(struct SEE_interpreter *, void *);
 void  (*SEE_mem_exhausted_hook)(struct SEE_interpreter *);

If SEE was compiled with Boehm-gc support, these hooks are initialised to wrappers around the GC_malloc() and GC_free() functions. Otherwise, the application must initialise them. (If you intend to implement these, be aware that they may be called with a NULL interpreter argument, indicating unknown context.)

If SEE detects a memory allocation function returning NULL, it will call the function pointer SEE_mem_exhausted_hook, which defaults to a function that calls SEE_abort.

Currently, SEE never uses SEE_mem_free_hook, although future versions may use it. It may be safely left at its default value, NULL.
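
The hook arrangement can be illustrated in isolation. The following is only a sketch and deliberately does not include SEE's headers: struct demo_interp, demo_malloc_hook, demo_exhausted_hook and demo_alloc() are hypothetical stand-ins for the SEE names above, and SEE's real behaviour may differ in detail. It shows an allocation wrapper that reports exhaustion through a second hook when the allocator hook returns NULL.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct SEE_interpreter. */
struct demo_interp { int id; };

static int demo_exhausted_calls;	/* counts exhaustion reports */

/* Default allocator; interp may be NULL (unknown context). */
static void *
demo_plain_malloc(struct demo_interp *interp, unsigned int size)
{
	(void)interp;
	return malloc(size);
}

/* An allocator that always fails, to exercise the exhausted path. */
static void *
demo_failing_malloc(struct demo_interp *interp, unsigned int size)
{
	(void)interp;
	(void)size;
	return NULL;
}

/* Exhaustion handler for this sketch: just count the reports. */
static void
demo_count_exhausted(struct demo_interp *interp)
{
	(void)interp;
	demo_exhausted_calls++;
}

/* The hooks themselves; set them before creating any interpreter. */
static void *(*demo_malloc_hook)(struct demo_interp *, unsigned int) =
	demo_plain_malloc;
static void (*demo_exhausted_hook)(struct demo_interp *) =
	demo_count_exhausted;

/* Allocate through the hook; report exhaustion if it returns NULL. */
static void *
demo_alloc(struct demo_interp *interp, unsigned int size)
{
	void *p = demo_malloc_hook(interp, size);

	if (p == NULL)
		demo_exhausted_hook(interp);
	return p;
}
```

A host application that already has its own allocator would point the malloc hook at it once, at start-up, so that the library and the application draw from the same heap.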

SEE provides three convenient macros for allocating storage. They are:

On memory allocators

Memory allocators can be controversial; as a developer you will understand your application's memory requirements and limitations better than anyone. Developers not familiar with garbage collectors are often concerned that using them in their application will result in something somehow substandard. This section tries to address that concern.

SEE was designed for use with a garbage collector, partly because the ECMAScript language specification implies it, but mainly because using an alternative (strict) memory allocator would overly penalize the development time, run-time performance and code size of the library (and consequently of the host application). A range of very good, sometimes freely available garbage-collecting allocators exists for the C language on a wide range of platforms. Arguments can be made that garbage-collecting allocators have better runtime performance and lower overhead than schemes that manage storage explicitly.

That said, it would be a straightforward (albeit tedious) exercise to convert most of SEE to use (say) a reference-counting memory allocator: all functions would need to handle exceptions, so as to explicitly deallocate acquired references and dispose of temporary values on the stack; the SEE_string management code would need to carefully track segment sharing; and careful management would be required to handle the unavoidable reference cycles involving object constructors and their prototype objects, as well as cycles occurring in recursive function scope chains. This is not trivial. Using a garbage collecting allocator is heaps easier!

Running programs

The general strategy for invoking ECMAScript program text is as follows:

  1. obtain an initialised SEE_interpreter structure;
  2. construct a SEE_input stream that can transport the ECMAScript program text to SEE;
  3. establish a try-catch context;
  4. call the function SEE_Global_eval() to parse and evaluate the stream;
  5. handle any exceptions caught in the try-catch context;
  6. optionally examine the value result returned.

The SEE_Global_eval() function is optionally able to return the value associated with the last statement executed. In a non-interactive environment, this value is meaningless, and the value result return pointer given to SEE_Global_eval() may be safely given as NULL.

Inputs

SEE uses 'inputs' as character stream sources to scan and parse ECMAScript program text. Because ECMAScript is defined to use Unicode, the inputs must provide a stream of SEE_unicode_t (UCS-4) characters. (If you don't care about Unicode, it is helpful to know that 7-bit ASCII is a direct subset of Unicode. SEE uses Unicode internally.)

SEE uses this generalised stream input API rather than (say) a simple UCS-4 or UTF-8 string API for two reasons. First, Unicode-compliant applications will usually have a much better understanding of the encodings they are using than SEE will, and streams give them that flexibility. Second, streams avoid much unnecessary duplication of text storage.

Inputs are described with struct SEE_input structures. These functionally resemble stdio's FILE type, or Java's Reader classes, but stream fully-decoded Unicode characters instead. Each SEE_input structure contains the input's state and provides a pointer to its access methods.

The inputclass member indicates the access methods. It is a pointer to a SEE_inputclass structure. This class structure contains function pointers to the two methods next() and close().

Use these convenience macros to call the input methods:

Macro              Description
SEE_INPUT_NEXT()   consume and return the next Unicode character from the stream
SEE_INPUT_CLOSE()  release any resources obtained by the stream

The next() method should advance the input pointer, update the eof and lookahead members of the SEE_input structure, and return the old value of lookahead. SEE's scanner calls next() repeatedly, until the eof member becomes true.

If the next() method encounters an encoding error, it should return SEE_INPUT_BADCHAR and try to recover. It can throw an exception if it wants to, but SEE does not attempt to handle that: the application or user program will receive it.

The close() method should deallocate any operating system resources acquired during the input stream's construction. Note that, by convention, SEE will not call the close() method of any application-supplied input. The onus is on the caller to close the inputs it supplies to SEE library functions.

The SEE_input structure maintains the input state in various members. Most importantly, the lookahead field must always reflect the next character that a call to next() would return. Once set, the filename, first_lineno and interpreter members of the SEE_input structure should not be changed. The lookahead and eof members should also be initialised before the structure is given to SEE.
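
The lookahead contract is easiest to see in a small stand-alone model. The sketch below does not use SEE's headers: struct demo_input and the demo_input_*() functions are hypothetical, and it assumes the program text is a 7-bit ASCII C string so that each byte is already a Unicode code point. It shows a next() that returns the old lookahead while refilling it, and a constructor that primes lookahead and eof before the structure is handed to a scanner.

```c
#include <stddef.h>

typedef unsigned long demo_unicode_t;	/* wide enough for UCS-4 */

/* Hypothetical input over a NUL-terminated 7-bit ASCII string. */
struct demo_input {
	const char     *pos;		/* next unread byte */
	demo_unicode_t  lookahead;	/* what next() will return */
	int             eof;		/* nonzero once input is consumed */
};

/* Return the old lookahead, then refill it or set eof. */
static demo_unicode_t
demo_input_next(struct demo_input *in)
{
	demo_unicode_t old = in->lookahead;

	if (*in->pos == '\0')
		in->eof = 1;
	else
		in->lookahead = (unsigned char)*in->pos++;
	return old;
}

/* Initialise the state and prime lookahead/eof before first use. */
static void
demo_input_init(struct demo_input *in, const char *text)
{
	in->pos = text;
	in->lookahead = 0;
	in->eof = 0;
	(void)demo_input_next(in);
}

/* Drive the input the way a scanner would: call next() until eof. */
static size_t
demo_input_count(const char *text)
{
	struct demo_input in;
	size_t n = 0;

	demo_input_init(&in, text);
	while (!in.eof) {
		(void)demo_input_next(&in);
		n++;
	}
	return n;
}
```

Note that for an empty string the constructor sets eof immediately, so the scanner loop never runs.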

Three demonstration/testing input constructors are provided. When called, they create a new SEE_input structure, appropriately initialised. They are:

Application developers are strongly encouraged to develop their own input constructors suitable for their own application, based on these examples.

Try-catch contexts

SEE's exceptions are implemented using C's setjmp()/longjmp() mechanism. SEE provides macros that establish a try-catch context and that test whether a try block terminated abnormally (i.e. due to an exception). Typical code that uses try-catch looks like this:

	struct SEE_interpreter *interp;
	struct SEE_value *e;
	SEE_try_context_t c; /* storage for the try-catch context */

	...

	SEE_TRY(interp, c) {

		/*
		 * Now inside a protected "try block".
		 * The following calls may throw exceptions if they want,
		 * causing the try block to exit immediately.
		 */
		do_something();
		do_something_else();

		/* 
		 * Because the SEE_TRY macro expands into a 'for' loop,
		 * avoid using 'break', or 'return' statements.
		 * If you must leave the try block, use 'continue;',
		 * or throw an exception.
		 */
	}

	/* Code placed here always runs. */
	do_cleanup();

	if ((e = SEE_CAUGHT(c))) {
		/* Handle the thrown exception 'e', somehow. */
		handle_exception(e);

		/* or you can throw it up to the next handler like so: */
		SEE_THROW(interp, e);
	}

	...

Do not return, goto or break out of a try block; the macro does not check for this, and the try-catch context may not be restored properly, causing all sorts of havoc.

Exceptions thrown outside of any try-catch context will cause the interpreter to abort.

If you are not interested in catching exceptions, and only want the 'finally' behaviour, use the following idiom:

	SEE_TRY(interp, c) {
		do_something();
	}
	do_cleanup();
	SEE_DEFAULT_CATCH(interp, c);
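
For readers unfamiliar with setjmp()/longjmp(), the following stand-alone sketch shows the general technique that such macros build on. It is not SEE's implementation: struct demo_try, demo_throw() and demo_protect() are hypothetical names, the thrown "value" is just a C string, and the protected block is passed as a function pointer rather than written inline.

```c
#include <setjmp.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical try-catch context, chained to the enclosing one. */
struct demo_try {
	jmp_buf               env;
	const char *volatile  thrown;	/* NULL if the block ended normally;
					 * volatile so it survives longjmp() */
	struct demo_try      *prev;
};

static struct demo_try *demo_current;	/* innermost active context */

/* Throw: unwind to the innermost context, or abort if there is none. */
static void
demo_throw(const char *what)
{
	struct demo_try *c = demo_current;

	if (c == NULL)
		abort();
	demo_current = c->prev;		/* leave the context first */
	c->thrown = what;
	longjmp(c->env, 1);
}

/*
 * Run body() inside a new context. The cleanup counter is bumped on
 * both the normal and the exceptional path ('finally' behaviour).
 * Returns the thrown string, or NULL if nothing was thrown.
 */
static const char *
demo_protect(void (*body)(void), int *cleanups)
{
	struct demo_try c;

	c.thrown = NULL;
	c.prev = demo_current;
	demo_current = &c;
	if (setjmp(c.env) == 0) {
		body();
		demo_current = c.prev;	/* normal exit */
	}
	(*cleanups)++;
	return c.thrown;
}

static void demo_body_ok(void) { }
static void demo_body_fails(void) { demo_throw("boom"); }
```

The sketch also shows why leaving a try block with return or break is dangerous: the line that restores the enclosing context would be skipped, and a later throw would jump into a stack frame that no longer exists.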

Values

Eventually, the host application will need to pass numbers, strings and complex objects about through the SEE interpreter to and from the user code. The ECMAScript language provides for only six value types: undefined, null, boolean, number, string and object.

The struct SEE_value structure type expresses all these values (and some more that the parser uses):

	struct SEE_value {
	    enum { ... } 	    type;
	    union {
		SEE_boolean_t	    boolean;
		SEE_number_t	    number;
		struct SEE_string * string;
		struct SEE_object * object;
		...
	    } u;
	};

Its type member acts as a discriminator, and is always one of SEE_UNDEFINED, SEE_NULL, SEE_BOOLEAN, SEE_NUMBER, SEE_STRING or SEE_OBJECT.

Given a struct SEE_value called v, you can access the appropriate member of the union v.u as shown in the following table:

v.type         Valid member  Member's type
SEE_UNDEFINED  n/a
SEE_NULL       n/a
SEE_BOOLEAN    v.u.boolean   SEE_boolean_t
SEE_NUMBER     v.u.number    SEE_number_t
SEE_STRING     v.u.string    struct SEE_string *
SEE_OBJECT     v.u.object    struct SEE_object *

Two other types (SEE_COMPLETION and SEE_REFERENCE) are only used internally to SEE and are not documented here.

To initialise struct SEE_value structures, use the macros provided. These set the type field and the appropriate part of the union.

	void SEE_SET_UNDEFINED(struct SEE_value *)
	void SEE_SET_NULL(struct SEE_value *)
	void SEE_SET_OBJECT(struct SEE_value *, struct SEE_object *)
	void SEE_SET_STRING(struct SEE_value *, struct SEE_string *)
	void SEE_SET_NUMBER(struct SEE_value *, SEE_number_t)
	void SEE_SET_BOOLEAN(struct SEE_value *, SEE_boolean_t)

To copy a value, use the following macro:

	void SEE_VALUE_COPY(struct SEE_value *dst, struct SEE_value *src)

Most SEE_values are passed about the SEE library functions using pointers. In particular, functions that need to return a value do so by copying the value into the struct SEE_value that the caller provides. Conventionally, the result value pointer is the last argument to these functions. The SEE_VALUE_COPY() macro breaks this convention by instead following the better-known idiom of memcpy().
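
As an illustration of both the discriminated union and the result-pointer-last convention, here is a stand-alone sketch of a boolean conversion. It does not use SEE's headers: struct demo_value, the DEMO_* type tags and demo_to_boolean() are hypothetical, and the rules encoded are those of ToBoolean in ECMA-262 (undefined, null, zero, NaN and the empty string convert to false; everything else converts to true).

```c
#include <stddef.h>

enum demo_type {
	DEMO_UNDEFINED, DEMO_NULL, DEMO_BOOLEAN,
	DEMO_NUMBER, DEMO_STRING, DEMO_OBJECT
};

struct demo_string { unsigned int length; };
struct demo_object;			/* opaque in this sketch */

/* Hypothetical model of a tagged value. */
struct demo_value {
	enum demo_type type;		/* discriminator */
	union {
		int                 boolean;
		double              number;
		struct demo_string *string;
		struct demo_object *object;
	} u;
};

/* Convert *v to a boolean value, stored in *res (last argument). */
static void
demo_to_boolean(const struct demo_value *v, struct demo_value *res)
{
	struct demo_value in = *v;	/* copy first: res may alias v */
	double n;

	res->type = DEMO_BOOLEAN;
	switch (in.type) {
	case DEMO_BOOLEAN:
		res->u.boolean = (in.u.boolean != 0);
		break;
	case DEMO_NUMBER:
		n = in.u.number;
		/* false for +0, -0 and NaN (NaN is never equal to itself) */
		res->u.boolean = (n != 0 && n == n);
		break;
	case DEMO_STRING:
		res->u.boolean = (in.u.string->length > 0);
		break;
	case DEMO_OBJECT:
		res->u.boolean = 1;
		break;
	default:			/* undefined and null */
		res->u.boolean = 0;
		break;
	}
}
```

Only the union member selected by the type tag is read in each case, which is the discipline the table above asks for.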

The ECMAScript language specification provides for conversion functions that the host application developer may find useful. They convert values into values of a known type:

Undefined, null, boolean and number values

The undefined and null types have only one implicit and default value.

Boolean types (SEE_boolean_t) have values of either true (non-zero) or false (zero).

Number values (SEE_number_t) are IEEE 754 signed floating point numbers, normally corresponding to the C compiler's built-in double type.

The following macros may be used to find information about a number value. (They assume that the type is SEE_NUMBER):

SEE also provides constants SEE_Infinity and SEE_NaN which may be stored in number values, but should not be used to compare number values. Use the macros mentioned previously, instead.
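
The reason is a property of IEEE 754 arithmetic rather than of SEE: a NaN compares unequal to every value, including itself, so an equality test against a NaN constant is always false. The stand-alone sketch below demonstrates this using the C99 classification macros from <math.h> in place of SEE's own macros; the demo_*() function names are hypothetical.

```c
#include <math.h>

/* The wrong way: always 0 when nan_constant is a NaN. */
static int
demo_wrong_is_nan(double d, double nan_constant)
{
	return d == nan_constant;
}

/* The right way: classify the value instead of comparing it. */
static int
demo_is_nan(double d)
{
	return isnan(d) != 0;
}

/* Finite means neither infinite nor NaN. */
static int
demo_is_finite(double d)
{
	return isfinite(d) != 0;
}
```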

Numbers (and other values) may be converted to integers using the functions SEE_ToInt32(), SEE_ToUint32() or SEE_ToUint16(). SEE provides three data types for integers:

String values

String values are pointers to SEE_string structures that hold UTF-16 strings. Assuming a struct SEE_string * called s, the useful members of this structure are shown in the following table:

Member     Member's type  Description
s->length  unsigned int   Length of string in UTF-16 characters
s->data    SEE_char_t *   Read-only string storage

Be aware that other strings may come to share the string's data, such as by forming substrings. A string's content must not be modified after construction because of this risk. However, the length field of a string may be changed to a smaller value at any time without concern.

The SEE_char_t type represents each character in the string. It is equivalent to a 16-bit unsigned integer.

To manipulate a string, first create a new string using one of the following:

And then, optionally, append characters to your new string using the following:

Once a string has been passed to any other SEE function, it is generally unwise to modify its contents in any way. It is also OK to share a string between different interpreters if the string is guaranteed not to be modified, and the garbage collector can cope with it.

All strings are assumed to use UTF-16 encoding, meaning that in some cases you will need to be aware of Unicode surrogate characters. If the host application really needs (say) UCS-4 strings, you will need to write your own conversion function. (The SEE_input_string() generator may prove useful here.)
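
To show what awareness of surrogates involves, here is a stand-alone sketch of the standard UCS-4 to UTF-16 encoding step. It does not use SEE's headers; demo_ucs4_t, demo_char_t and demo_ucs4_to_utf16() are hypothetical names. Code points below U+10000 occupy one 16-bit unit; higher code points are split into a high and a low surrogate.

```c
#include <stddef.h>

typedef unsigned long  demo_ucs4_t;	/* one Unicode code point */
typedef unsigned short demo_char_t;	/* one UTF-16 code unit */

/*
 * Encode cp into out[]. Returns the number of units written (1 or 2),
 * or 0 if cp is a lone surrogate or lies beyond U+10FFFF.
 */
static size_t
demo_ucs4_to_utf16(demo_ucs4_t cp, demo_char_t out[2])
{
	if (cp >= 0xD800UL && cp <= 0xDFFFUL)
		return 0;		/* surrogates are not characters */
	if (cp > 0x10FFFFUL)
		return 0;		/* outside the Unicode range */
	if (cp < 0x10000UL) {
		out[0] = (demo_char_t)cp;
		return 1;
	}
	cp -= 0x10000UL;		/* leaves a 20-bit value */
	out[0] = (demo_char_t)(0xD800UL + (cp >> 10));	   /* high 10 bits */
	out[1] = (demo_char_t)(0xDC00UL + (cp & 0x3FFUL)); /* low 10 bits */
	return 2;
}
```

For example, U+1F600 encodes as the pair 0xD83D, 0xDE00, so a string holding that single character has a length of 2.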

Note: The SEE_string_sprintf() and SEE_string_vsprintf() functions assume the string constructed exists wholly within the 7-bit ASCII subset of Unicode.

Other string functions provided are:

If you find yourself comparing strings a lot, you may find it easier to compare internalised strings. These are strings that are kept in a fast hash table and may be compared for equality using pointer equality. The SEE_intern() function is very fast on already-interned strings, so it is worth using over SEE_string_cmp() if the strings are likely to be intern'ed already. (Most property names are.)
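
The idea behind interning can be sketched independently of SEE. The code below is an illustration only: it works on ordinary C strings rather than SEE_string structures, demo_intern() and its table are hypothetical, and nothing here reflects how SEE's own table is organised. Each distinct text is stored once, so two interned strings are equal exactly when their pointers are equal.

```c
#include <stdlib.h>
#include <string.h>

#define DEMO_BUCKETS 64

struct demo_entry {
	struct demo_entry *next;	/* chain within one bucket */
	char              *text;	/* the canonical copy */
};

static struct demo_entry *demo_table[DEMO_BUCKETS];

/* A simple multiplicative string hash, reduced to a bucket index. */
static unsigned int
demo_hash(const char *s)
{
	unsigned int h = 5381;

	while (*s != '\0')
		h = h * 33u + (unsigned char)*s++;
	return h % DEMO_BUCKETS;
}

/* Return the canonical copy of s, or NULL if memory runs out. */
static const char *
demo_intern(const char *s)
{
	unsigned int h = demo_hash(s);
	struct demo_entry *e;
	size_t len;

	/* Fast path: the text is already in the table. */
	for (e = demo_table[h]; e != NULL; e = e->next)
		if (strcmp(e->text, s) == 0)
			return e->text;

	/* Slow path: store a private copy for future lookups. */
	len = strlen(s) + 1;
	e = malloc(sizeof *e);
	if (e == NULL)
		return NULL;
	e->text = malloc(len);
	if (e->text == NULL) {
		free(e);
		return NULL;
	}
	memcpy(e->text, s, len);
	e->next = demo_table[h];
	demo_table[h] = e;
	return e->text;
}
```

Interning costs one hash and, at most, one full comparison per lookup; after that, every equality test between interned strings is a single pointer comparison.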

Objects

ECMAScript uses a prototype-inheritance object model with simple named properties. More information on the object model can be found in the ECMA-262 standard, and in other JavaScript references.

Objects are implemented as in-memory structures, with an objectclass pointer to a table of operational methods.

This section first describes how all objects can be accessed (the 'client interface'), and then goes on to describe the API that host applications can use to make their own objects visible (the 'implementation interface').

Object values, and the object client interface

All object values are pointers to object instances. The pointers are of type struct SEE_object *. The object pointer in an object value should never be NULL.

Objects can be accessed and manipulated using the following macros:

Note that the last four macros do not check if the object has a NULL pointer for the corresponding object method. Calling them on an unchecked object will probably result in an access violation (segmentation fault). The following macros return true if the object safely provides those methods:

When storing properties in an object with SEE_OBJECT_PUT(), a flags parameter is required. Normally, this should be supplied as zero, but when constructing an object with properties for the first time, the following bit flags can be used:

Flag                 Meaning
SEE_ATTR_READONLY    Future puts on this property will fail
SEE_ATTR_DONTENUM    Enumerators will not list this property (and will hide prototype properties of the same name)
SEE_ATTR_DONTDELETE  Future deletes on this property will fail

A property enumerator is a pointer to a struct SEE_enum that allows sequential access to the (enumerable) properties of the object. The order of the enumeration is not guaranteed to be sorted, or even to be the same each time. Once a pointer to a struct SEE_enum is obtained, the following macros can be used to access it:

Enumerators will become unstable if the properties of the underlying object change during enumeration. The recommended strategy is to create your own private list of property names and discard the enumerator before attempting to modify the object.

The object implementation interface

A host application usually wishes to expose its own objects to the runtime environment. SEE uses an object implementation API that host objects can independently provide.

All SEE objects are in-memory structures starting with a struct SEE_object:

	
	struct SEE_object {
		struct SEE_objectclass *objectclass;
		struct SEE_object *     Prototype;
	};

Normally, this is just part of a larger structure that maintains the object state. For example, Number objects could be defined as:

	struct number_object {
		struct SEE_object object;
		SEE_number_t      number;
	};

and pointers to a struct number_object can be cast to struct SEE_object *.

The objectclass field of a struct SEE_object points to a struct SEE_objectclass:

	struct SEE_objectclass {
		struct SEE_string *     Class;		/* mandatory */
		SEE_get_fn_t            Get;		/* mandatory */
		SEE_put_fn_t            Put;		/* mandatory */
		SEE_boolean_fn_t        CanPut;		/* mandatory */
		SEE_boolean_fn_t        HasProperty;	/* mandatory */
		SEE_boolean_fn_t        Delete;		/* mandatory */
		SEE_default_fn_t        DefaultValue;	/* mandatory */
		SEE_enumerator_fn_t     enumerator;	/* optional */
		SEE_call_fn_t           Construct;	/* optional */
		SEE_call_fn_t           Call;		/* optional */
		SEE_hasinstance_fn_t    HasInstance;	/* optional */
	};

The members of this structure are function pointers. Use the various SEE_OBJECT_* convenience macros to call them. A member marked "optional" may be set to NULL, in which case a sensible default action is taken.
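
The combination of structure embedding and a shared table of method pointers can be shown with a much smaller stand-alone model. The sketch below does not use SEE's headers: struct demo_class has only one mandatory and one optional method, and all demo_* and DEMO_* names are hypothetical. It shows a larger object structure being used through a pointer to its first member, and a check for an optional method before it is called.

```c
#include <stddef.h>

struct demo_object;

/* Hypothetical, much-reduced class table. */
struct demo_class {
	const char *name;					/* mandatory */
	double (*default_number)(struct demo_object *);	/* mandatory */
	double (*call)(struct demo_object *, double);		/* optional */
};

struct demo_object {
	const struct demo_class *cls;
	struct demo_object      *prototype;
};

/* A larger object; the base structure must be the first member. */
struct demo_number_object {
	struct demo_object object;
	double             number;
};

/* Method implementation: recover the full object from the base pointer. */
static double
demo_number_default(struct demo_object *o)
{
	return ((struct demo_number_object *)o)->number;
}

/* One class table is shared by every number object. */
static const struct demo_class demo_number_class = {
	"Number", demo_number_default, NULL	/* no call method */
};

static void
demo_number_init(struct demo_number_object *n, double d)
{
	n->object.cls = &demo_number_class;
	n->object.prototype = NULL;
	n->number = d;
}

/* Convenience macros in the style of a client interface. */
#define DEMO_HAS_CALL(o)	((o)->cls->call != NULL)
#define DEMO_DEFAULT_NUMBER(o)	((o)->cls->default_number(o))
```

The cast in demo_number_default() is valid because C guarantees that a pointer to a structure, suitably converted, points to its first member, and vice versa.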

The host application typically constructs one instance of a SEE_objectclass, and provides implementations for the mandatory object methods Get, Put, etc. SEE expects a precise behaviour from these methods. The behaviours are fully described in the ECMA-262 standard, but can be summarised as follows:

Method        Behaviour
Get           retrieve a named property (or return undefined)
Put           create/update a named property
Delete        delete a property or return 0
HasProperty   returns 0 if the property doesn't exist
CanPut        returns 0 if the property cannot be changed
DefaultValue  turns the object into a string or number value
Construct     constructs a new object; as per the new keyword
Call          the object has been called as a function
HasInstance   returns 0 if the objects are unrelated
enumerator    allow enumeration of the properties (see above)

It is up to the host application to provide storage for the properties, and so forth. The simplest strategy is to ignore calls to Put and Get that are meaningless. To this end, if the host object does not want to support some of the mandatory operations, it can use the corresponding 'do-nothing' function(s) from this list:

The Prototype field can either be set to the interpreter's Object_prototype, to NULL, or even to some other object.

Once the host application has constructed its own objects that conform to the API, they can be inserted into the 'Global object' as object-valued properties.

The 'Global object' is a user-inaccessible object whose sole purpose is to hold all the built-in objects, such as Object, Function, Math, etc., as well as all user-declared global variables. The host application can access it through the Global member of the SEE_interpreter structure.

Native objects

SEE provides support for a special kind of object class called native objects. Native objects maintain a hash table of properties, and implement the mandatory methods (plus enumerator), and correctly observe the Prototype field.

	struct SEE_native {
		struct SEE_object       object;
		struct SEE_property *   properties[SEE_NATIVE_HASHLEN];
	};

An application can create host objects based on native objects. First, place a struct SEE_native at the beginning of a structure:

	struct some_host_object {
		struct SEE_native       native;
		int			host_specific_info;
	};

Then, use the following object methods, either directly in the SEE_objectclass structure, or by calling them indirectly from method implementations:

C Function objects

Often, a host application wishes to provide a callable function to the runtime environment, backed by a C function. This requires the construction of an object whose prototype is Function.prototype, and whose objectclass's Call method points to the appropriate C function.

The function SEE_cfunction_make() performs this construction. It takes a pointer to the C function, and an integer indicating a typical number of arguments. (The integer becomes the function object's "length" property.)

NOTE: Objects returned by SEE_cfunction_make() should really only be used in the interpreter context in which they were created, but the current version of SEE does not check for this. (Because cfunction objects are essentially read-only after construction, and if memory allocation operates independently of the interpreters, sharing cfunction objects across interpreters will work OK. But, it is not recommended for future portability.)

A typical C function looks like the following: (actual code for Math.sqrt)

	static void
	math_sqrt(interp, self, thisobj, argc, argv, res)
		struct SEE_interpreter *interp;
		struct SEE_object *self, *thisobj;
		int argc;
		struct SEE_value **argv, *res;
	{
		struct SEE_value v;

		if (argc == 0)
			SEE_SET_UNDEFINED(res);
		else {
			SEE_ToNumber(interp, argv[0], &v);
			SEE_SET_NUMBER(res, sqrt(v.u.number));
		}
	}

The arguments passed to the C function from SEE_OBJECT_CALL are described in the following table:

Argument  Purpose
interp    the current interpreter context
self      a pointer to the object called (Math.sqrt here)
thisobj   the this object (the Math object in this case)
argc      number of arguments
argv      array of value pointers, of length argc
res       value location in which to store the result

The C function should ignore any extra arguments, and treat unsupplied arguments as if they were undefined values. It should also check any assumptions made about thisobj, if it uses it.
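
One way to follow this rule consistently is to fetch every argument through a helper that substitutes an undefined value when the caller supplied too few. The sketch below is stand-alone and does not use SEE's headers: struct demo_arg_value is a cut-down, hypothetical value type with only undefined and number variants, and demo_scale() is an invented example function that multiplies its first argument by an optional second one.

```c
#include <stddef.h>

enum demo_arg_type { DEMO_ARG_UNDEFINED, DEMO_ARG_NUMBER };

/* Hypothetical, cut-down value type. */
struct demo_arg_value {
	enum demo_arg_type type;
	double             number;	/* valid only for DEMO_ARG_NUMBER */
};

static struct demo_arg_value demo_arg_undefined = { DEMO_ARG_UNDEFINED, 0.0 };

/* Return argv[i], or an undefined value if it was not supplied. */
static struct demo_arg_value *
demo_arg(int argc, struct demo_arg_value **argv, int i)
{
	return (i >= 0 && i < argc) ? argv[i] : &demo_arg_undefined;
}

/* scale(x [, factor]): x * factor, with factor defaulting to 1. */
static void
demo_scale(int argc, struct demo_arg_value **argv,
    struct demo_arg_value *res)
{
	struct demo_arg_value *x = demo_arg(argc, argv, 0);
	struct demo_arg_value *factor = demo_arg(argc, argv, 1);
	double f = 1.0;

	/* Arguments beyond the second are never looked at. */
	if (x->type != DEMO_ARG_NUMBER) {
		res->type = DEMO_ARG_UNDEFINED;
		res->number = 0.0;
		return;
	}
	if (factor->type == DEMO_ARG_NUMBER)
		f = factor->number;
	res->type = DEMO_ARG_NUMBER;
	res->number = x->number * f;
}
```

Because the helper never indexes past argc, the function body can be written as if every parameter were always present.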

User function objects

Occasionally, a host application will wish to take some user text and create a callable function object from it. One way to do this is to invoke the Function constructor with SEE_OBJECT_CONSTRUCT, passing it the arguments and body text as arguments.

Another way, that is more convenient if the user text is available as an input stream, is to use the SEE_Function_new() function:

	struct SEE_object *SEE_Function_new(struct SEE_interpreter *interp, 
		struct SEE_string *name, struct SEE_input *param_input, 
		struct SEE_input *body_input);

where any of the name, param_input and body_input parameters may be NULL (indicating to use the empty string).

The returned function object may be called with the SEE_OBJECT_CALL() macro.

Errors and Error objects

Host applications sometimes need to convey errors to ECMAScript programs. Errors in ECMAScript are typically indicated by throwing an exception with an object value. The thrown objects conventionally have Error.prototype somewhere in their prototype chain, and provide message and name properties which Error.prototype reads to generate a human-readable error message.

Host applications can conveniently construct and throw new errors using the following macros:

	SEE_error_throw_string(interp, constructor, string)
	SEE_error_throw(interp, constructor, fmt, ...)
	SEE_error_throw_sys(interp, constructor, fmt, ...)

These macros construct a new error object, and throw it as an exception. The error object thrown normally has a message string property that reflects the rest of the arguments provided to the macro. The SEE_error_throw_sys() macro additionally appends a textual description of errno.

The constructor argument should be one of the error constructor objects found in the SEE_interpreter structure:

Member          Meaning
Error           runtime error
EvalError       error in eval()
RangeError      numeric argument has exceeded allowable range
ReferenceError  invalid reference was detected
SyntaxError     parsing error
TypeError       actual type of an operand different to that expected
URIError        error in a global URI handling function

Although Error is usually sufficient for most errors, host applications can create their own error constructor object with the SEE_Error_make() convenience function. Only one constructor of the same name should be created per interpreter.

Debugging facilities

The SEE library contains various debugging facilities that are omitted if it is compiled with the NDEBUG preprocessor define.

Most useful to the application developer are these two functions:

	void SEE_PrintValue(struct SEE_interpreter *i, 
		struct SEE_value *v, FILE *f);
	void SEE_PrintObject(struct SEE_interpreter *i, 
		struct SEE_object *o, FILE *f);

If debugging the library itself, it is worth reading the source code to find the debug flag variables that can be turned on by the host application to enable verbose traces during execution.

© David Leonard, 2003. This documentation may be entirely reproduced and distributed in any form, as long as this copyright notice remains intact, and the distributed reproduction is a complete and bona fide copy.
$Id: USAGE.html,v 1.3 2003/12/02 13:13:31 d Exp $