[see] / see / trunk / doc / BYTECODE.txt Repository:
ViewVC logotype

View of /see/trunk/doc/BYTECODE.txt

Parent Directory Parent Directory | Revision Log Revision Log


Revision 1416 - (download) (annotate)
Sun Apr 26 04:54:58 2009 UTC (16 months, 2 weeks ago) by d
File size: 22654 byte(s)
New ENDF code to disambiguate abnormal termination of finally blocks.
Closes bugs 131 and 132.

SEE's intermediate bytecode representation
------------------------------------------

This document describes and defines SEE's intermediate code generation
virtual machine model and instructions.

SEE is designed to support different code generators. The model defined
in this document is an interface between the parser and the code generation-
execution back-end which implements the virtual machine.

The virtual machine executes a sequence of virtual instructions.
The instructions operate on:

    * the environment available through a SEE_context structure
    * a bounded stack of SEE_values ("the value stack")
    * a bounded stack of 'blocks' (eg TRY,WITH. "the block stack")
    * a bounded array of local variable values (see VREF instruction)
    * the 'C' register (the last value resulting from a statement)
    * the 'L' register (a SEE_location)
    * the 'E' register (current enumeration, see B.ENUM)
    * The 'PC' register (the current instruction stream index)

A sequence of instructions implements a SourceElements production
minus any FunctionDeclarations. That is, the code generator is used to
build bytecode that implements exactly one function. Thus, there is only
one entry point.

This document does not define a physical byte code format, instead
it defines the instructions identified by the enumerated types of code.h.
Some instructions take operands. It is up to the code generator how to
encode these operands. The operands to instructions include:

    * address operands (branch and block instructions)
    * integer operands (actual parameter count for call instructions)
    * literal SEE_value (for the LITERAL instruction)

The value stack
---------------

The value stack is used for computing expressions. Most of the instructions
described in this file work on the value stack. The value stack is bounded
for any one function, as the parser is able to determine the maximum
number of values the stack could contain in that function.

I use a notation inspired by Adobe Postscript to describe manipulations on
the value stack. See the section on 'Notation' below.

The block stack
---------------

Some of the instructions in this file work on the block stack. The block 
stack is independent of the value stack and is used for keeping track of:

    * enumerators ('for ... in')
    * try-catch and try-finally handling
    * scope extension ('with')

The block stack is independent of the value stack. The top element of the
block stack, if any, is the current block. We say the program creates a block
when a new block is pushed onto the top of the stack. Similarly, the
program ends a block as it is popped.

The most important feature of blocks is that as the program ends a block, 
it triggers a behaviour such as restoring the scope chain or jumping to 
instructions implementing the finally part of a try-finally statement.

New blocks are created by the 4 instructions S.ENUM, S.TRYC, S.TRYF,
and S.WITH instructions, and a special instruction S.CATCH is used to modify
the block stack. There are 5 types of blocks that may exist on
the block stack: ENUM, CATCH, FINALLY, FINALLY2 and WITH. These are all
described in detail in the section "Block Behaviour".

Blocks are numbered from the bottom of the stack, starting at number 1.

Blocks are ended by the instruction END,n. This instruction ends all
blocks numbered n and higher, beginning with the highest.  There is
a special ENDF instruction used only for ending a FINALLY2 handler block 
by restoring conditions for continuation.

Attempting to end "block zero" (END,0) terminates the function and returns 
the C register to the caller. This is how the 'return' statement is 
implemented.

A block's end behaviour may interrupt the END instruction, stopping further
blocks from being ended and usually causing flow of control to resume 
elsewhere. However, the language eventually returns control to restart
the interrupted END instruction and all blocks will be ended. Because of this,
the END instruction is different to most in that it only advances the PC
when the block stack is short enough. Otherwise it loops on itself.

For example, ending a FINALLY block transfers control to a finally handler, 
which pushes a FINALLY2 block onto the block stack. When the FINALLY2 block
is ended, it restores the C and PC registers to return control to the END
statement that ended the original FINALLY block.

Generally, expressions use the value stack, while statements use the block 
stack and the C register.

Initial conditions
------------------
On entry to a function, the following machine state is defined:

    PC = initial instruction
    C  = the value 'undefined'
    L  = not defined
    E  = not defined
    block stack = empty
    value stack = empty

INSTRUCTION SUMMARY
===================

Instruction notation
--------------------

This section describes the notation used in subsequent sections for defining
all the virtual machine instructions.

Instructions marked with a '*' in the first column may throw exceptions. 
Instructions without a '*' cannot throw exceptions.

Most instructions operate on the value stack, and we need a notation to
describe the before and after states of the stack. We use general notation 
of "before | after". For example, the subtraction operator SUB is described 
like this:

	SUB    num1 num2 | num3
            Let num3 = num1 - num2

This indicates that num2 is first popped off the stack, then num1,
then num3 is pushed. On both sides, the top of the stack is always 
written rightmost in the list.  An empty list is represented by '-'.

The notation also indicates type constraints on the elements. For example,
the 'num2' element from the example is guaranteed to be a SEE_NUMBER value. 

The abbreviations of the value types and some union types are as follows:

    * bool	= { SEE_BOOLEAN }
    * num	= { SEE_NUMBER }
    * str	= { SEE_STRING }
    * null	= { SEE_NULL }
    * undef	= { SEE_UNDEFINED }
    * obj	= { SEE_OBJECT }
    * ref	= { SEE_REFERENCE }
    * cmpl	= { SEE_COMPLETION }
    * any	= bool + num + str + null + undef + obj + ref + cmpl
    * val	= bool + num + str + null + undef + obj
    * prim	= bool + num + str + null + undef

Briefly, 'prim' indicates non-object values, while 'any' indicates things
of unknown type.

Call instructions
-----------------

*   NEW,n 	objC any1..anyn | objR

	1. If typeof(objC) is not object, throw a TypeError exception.
	2. If objC does not implement [[Construct]] throw a TypeError exception.
	3. Call the [[Construct]] method on objC, providing any0..anyn as
	   the argument values (may be empty, i.e. n = 0) setting .
	4. Let objR be the result of the previous [[Construct]] call.

*   CALL,n 	anyC any1..anyn | valR 

	1. Compute objC = GetValue(anyC).
	2. If typeof(objC) is not object, throw a TypeError exception.
	3. If objC does not implement [[Call]] throw a TypeError exception.
	4. If anyC is a reference, let t = GetBase(anyC),
	   otherwise let t = NULL.
	5. If t is an activation object then set t = NULL.
	6. Call the [[Call]] method on objC, providing any0..anyn as the
	   argument values (may be empty).
	7. Let valR be the result of the previous [[Call]] call.

    Note: The L, C, E registers are unchanged by the CALL/NEW instructions.

XXX TODO the arguments to each NEW and CALL are in practice 'val', not 'any'.


Special instructions
--------------------

    FUNC,func	    - | obj
	Binds a function instance object to the current scope
	and pushes it onto the stack.

    LITERAL,val    - | val
	Pushes a copy of a literal value onto the stack.

    LOC,filename:lineno      - | -
	Sets the L register to the source current location.


    Notes: Implementations are encouraged to generate fixed tables (of 
    functions, literals and location tags) during parse, and then to use 
    non-negative integer arguments to the special instructions above.


Generic instructions
--------------------

    NOP  	    - | -
	No operation. (Used for padding.)

    DUP  	    any1 | any1 any1
	Duplicates the value on top of the stack.

    POP  	    any1 | -
	Removes the topmost element on the stack.

    EXCH	    any1 any2 | any2 any1
	Exchanges the top two elements on the stack.

    ROLL3	    any1 any2 any3 | any3 any1 any2
	Rotates the top three elements. The top element is
	pushed down beneath the next two elements, and those two elements
	rise up by one position.


Miscellaneous instructions
--------------------------

*   THROW	    val | -
	Throws an exception with value val.

        Note: The resulting state of the stack is not usefully defined 
        here because no following instruction will ever see the stack
        in that state. However, it is convenient for some analysis to
        assume that THROW does indeed continue.

    SETC	val | -
	Sets the C register to the given value.

    GETC	- | val
	Pushes the value of the C register.


Context instructions
--------------------

    THIS	- | obj
	Pushes the 'this' object from the current context onto the stack.

    OBJECT	- | obj
	Pushes the object interp->Object onto the stack.

    ARRAY	- | obj
	Pushes the object interp->Array onto the stack.

    REGEXP	- | obj
	Pushes the object interp->Regexp onto the stack.


Primitive instructions
----------------------

    REF		obj str | ref
	Creates a reference value by combining obj and str.

*   GETVALUE	ref | val
	Computes GetValue(ref) (8.7.1), i.e. val = ref.[[Get]]

    LOOKUP	str | ref
	Creates a reference by looking up an identifier in the current 
	scope (10.1.4)

*   PUTVALUE	ref val | -
*   PUTVALUE,n	ref val | -
	Computes PutValue(ref, val) (8.7.2), i.e ref.[[Put]](val).
	PUTVALUE,n is a variant of PUTVALUE that puts values
        with non-default (non-zero) attributes.

    VREF,n	- | ref
	Returns a reference to a variable. Variables are always referenced
	from the context variable object. Variables are always initialised
	to undefined in the variables object before the function body begins,
	unless they already exist.  The VREF instruction is equivalent to the 
        instructions <LITERAL,name;LOOKUP> when the variable object is closest
        in scope.

*   DELETE	any1 | bool1
	1. If any is not a reference, then let bool1=true
	2. Otherwise let bool1 be the result of calling [[Delete]] 
           on the reference (11.4.1)

    TYPEOF	any1 | str1
	1. If any1 is a reference with null base, then 
	   let v = undefined
	2. If any1 is a reference with a non-null base, then
	   let v = GetValue(any1)
	3. If any1 is not a reference, then 
           let v = any1
	3. Let str1 be a string representing the type of v (11.4.3)

Conversion instructions
-----------------------

*   TOOBJECT	val | obj
	Let obj be the result of ToObject(val) (9.9).

*   TONUMBER	val | num
	Let num be the result of ToObject(val) (9.3).

    TOBOOLEAN	val | bool
	Let bool be the result of ToBoolean(val) (9.2).

*   TOSTRING	val | str
	Let str be the result of ToString(val) (9.8).

*   TOPRIMITIVE	val | prim
	Let prim be the result of ToPrimitive(val) (9.1).

    Notes: These instructions have no effect when the value on the top
    of the stack is already of the right type. Peephole optimization may 
    occur when the type constraints of instructions obviate the need 
    for these conversions. For example, the DELETE instruction always pushes 
    a boolean; so any immediately following TOBOOLEAN instruction can be
    removed by a peephole optimizer.


Arithmetic instructions (11.6.3)
-----------------------

    NEG		num1 -> num2
	If num1 is NaN, then let num2 = NaN
	otherwise let num2 = -num1	(i.e. num1 with opposite sign)

    INV		val1 -> num2
	Let num2 = ~ToInt32(val1) (i.e. bitwise one's complement)

    NOT		bool1 -> bool2
	Let bool2 = !bool1 (i.e. logical inverse)

    MUL		num1 num2 | num3
    DIV		num1 num2 | num3
    MOD		num1 num2 | num3
	Let num3 = num1 [*/%] num2 (see 11.5)

*   ADD		prim1 prim2 | prim3
	If either of prim1 or prim2 is a string then
	    let prim3 = concat(ToString(prim1), ToString(prim2))
	otherwise
	    let prim3 = ToNumber(prim1) + ToNumber(prim2)

    SUB		num1 num2 | num3
	Let num3 = num1 - num2

    LSHIFT	val1 val2 | num3
	Let num3 = ToInt32(val1) << (ToUint32(val2) & 0x1f)

    RSHIFT	val1 val2 | num3
	Let num3 = ToInt32(val1) >> (ToUint32(val2) & 0x1f) 
	The most significant bit is propagated, and the result is
        an integer in the range [-(2^31), 2^31).

    URSHIFT	val1 val2 | num3
	Let num3 = ToUint32(val1) >> (ToUint32(val2) & 0x1f)
	The most signifiant bit of result is set to zero, and the result is
	an integer in the range [0, 2^32). (11.7.3)


Relational instructions (11.8)
-----------------------

    LT		val1 val2 | bool
	Let bool = (val1 < val2) (see 11.8.1)

    GT		val1 val2 | bool
	Let bool = (val1 > val2) (see 11.8.2)

    LE		val1 val2 | bool
	Let bool = (val1 <= val2) (see 11.8.3)

    GE		val1 val2 | bool
	Let bool = (val1 >= val2) (see 11.8.4)

*   INSTANCEOF	val1 val2 | bool
	1. If val2 is not an object, throw a TypeError exception
	2. Otherwise, let bool = (val2.[[HasInstance]](val1))

*   IN		str1 val2 | bool
	1. If val2 is not an object, throw a TypeError exception
	2. Otherwise, let bool = (val2.[[HasProperty]](str1))

    EQ		val1 val2 | bool
	Let bool = (val1 == val2) (see 11.9.3)

    SEQ		val1 val2 | bool
	Let bool = (val1 === val2) (see 11.9.6)


Binary bitwise instructions (11.10)
---------------------------

    BAND	val1 val2 | num3
    BXOR	val1 val2 | num3
    BOR		val1 val2 | num3
	Let num3 = (ToInt32(val1) [&^|] ToInt32(val2))  (bitwise operation)


Branch instructions
-------------------

    B.ALWAYS,x 	    - | -
	Set PC = x.

    B.TRUE,x	 bool | -
	If bool is true, then set PC = x.

    B.ENUM,x	    - | - str		(when branch IS taken)
		    - | -		(when branch IS NOT taken)
	1. If the enumeration register E is exhausted, then branch to x.
	2. Otherwise, push the next identifier string from E onto the value
	   stack (and do not branch).
	The B.ENUM instruction only pushes a string str where 
	obj.[[HasProperty]](str) is true. This is checked each time.


Block instructions
------------------

    S.ENUM	    obj | -
	1. Create and pushes a new enumeration block. 

        See the B.ENUM instruction above and 'The ENUM block' section below.
	Conceptually, this instruction saves register E, and
	then sets E to the new enumeration of obj.

    S.WITH	    obj | -
	1. Insert obj into the front of the context's scope chain.
	2. Create and push a new 'WITH' block. See 'The WITH block' below.

    S.TRYC,x	    str | -
	1. Creates and pushes a CATCH block. 

        See 'The CATCH block' below.
	The address x indicates the entry point of the catch handler.

    S.TRYF,x	    - | -
	1. Creates and pushes a FINALLY block. 

        See 'The FINALLY block' below.
	The address x indicates the entry point of the finally handler.
	The finally handler at x should finish with an appropiate END.

    S.CATCH         - | -
        1. Inserts the object from the top (CATCH) block into the scope chain
        2. Converts the top (CATCH) block into a WITH block.

        Note that this instruction modifies a block on the block stack,
        rather than just creating a new one. Also, it does not 'end' the 
        top CATCH block, but rather replaces it.

    END,n	    - | ?
        1. If there are less than n blocks remaining on the stack, then
           continue onto the next instruction.
        2. If there are no blocks on the stack, terminate the 
           program and return C.
        3. End the block on top of the block stack by popping it
           and invoking its end behaviour.
        4. Go to step 1 unless the end behaviour transfered control.
           Note: this step may be implemented by leaving the PC 
           alone so that the virtual machine execution loop ends up 
           re-executing the current END instruction.

	Conceptually, the END instruction pops one block, and then
	returns control to itself. This continues until the block stack 
        height is n-1.

        When there are less than n blocks on the block stack, this instruction
        is equivalent to a NOP.

    ENDF 	    - | ?
        1. Remove the FINALLY2 block from the top of the stack
        2. If the block held a valid exception, re-throw it
	   as if SEE_DEFAULT_CATCH() were called
        3. Otherwise, set PC to the saved resume location.

        This separate instruction is required to disambiguated the
        use of END,n within a finally handler (i.e. return, break, continue).

Block behaviour
===============

Block behaviours do not access values from the value stack. This is because
some blocks manipulate the value stack while 

The WITH block

    A WITH block is created by the S.WITH and S.CATCH instructions. 

    On creation it records the state of the scope chain
    before the scope insertion performed by the instruction.

    When this block is ended, it merely restores the scope chain. 

The ENUM block

    An ENUM block is created by the S.ENUM instruction. 

    On creation it saves the value of the E register before it is changed
    by the S.ENUM instruction.

    When this block is ended, it restores the E register.

The CATCH block

    A CATCH block is created by the S.TRYC instruction. 
    
    On creation of a CATCH block the following steps are taken:
       1. activate a SEE_TRY context for catching future instruction exceptions
       2. save the current height of the value stack
       3. saves the starting adddress of the catch handler

    If an exception occurs while a CATCH block is active, the following steps 
    are performed:
	1. Deactivate the SEE_TRY context
        2. Create a new Object K with a property set to the caught exception
        3. Save the object in the block to indicate an exception was caught.
           This is so it may be later consumed by a S.CATCH instruction.
	4. Truncate the value stack to the saved height
        5. Set the PC to the address of the catch handler, which 
           should be an END statement followed by an S.CATCH.

    When a CATCH block is ended, then an exception could not have been 
    caught, and the following steps are performed:
        1. Deactivate the SEE_TRY context

 Structure of a S.TRYC handler

    A catch handler should have the following structure:
        END,x+1
        S.CATCH   ; converts CATCH block x into a WITH block, pushes scope
        ...
        END,x

    where x is the computed depth of the CATCH block. 

    For examepl, consider this function:

        function () {
           try { AAA; } 
           catch(e) { BBB; }
        }

    For this function, a code generator may generate the following,
    unoptimized code:

        0: S.TRYC,3
        1: {AAA}
        2: B.ALWAYS,6
        3: END,2
        4: S.CATCH
        5: {BBB}
        6: END,1
        7: END,0

    The execution of this code when AAA completes is illustrated as follows:

        PC=0 blocks={}              S.TRYC,3
        PC=1 blocks={CATCH<3>}      {AAA} - completes successfully
        PC=2 blocks={CATCH<3>}      B.ALWAYS,6
        PC=6 blocks={CATCH<3>}      END,1 - ending CATCH has little effect
        PC=7 blocks={}              END,0

    And when AAA throws an exception:

        PC=0 blocks={}              S.TRYC,3
        PC=1 blocks={CATCH<3>}      {AAA} - throws an exception with blocks
        PC=3 blocks={CATCH<3,e>,+}  END,2 - ends blocks indicated by '+'
        PC=4 blocks={CATCH<3,e>}    S.CATCH - convert CATCH to WITH using e
        PC=5 blocks={WITH}          {BBB}
        PC=6 blocks={WITH}          END,1 - ends the WITH block
        PC=7 blocks={}

The FINALLY block

    A FINALLY block is created with the S.TRYF instruction. 
    
    On creation of a FINALLY block the following steps are followed:
        1. activate a SEE_TRY context.
        2. record the depth of the value stack

    When a FINALLY block is ended, the following steps happen:
	1. finalise the SEE_TRY context
	2. replace the FINALLY block with a new FINALLY2 block
        3. if the SEE_TRY context had caught an exception, then
           store the exception in the FINALLY2 block
	4. otherwise save the current PC as a resume point and
        5                 the intended x of the current END,x instruction
	5. set the PC to the handler from the FINALLY block

    When an exception occurs inside a FINALLY block context, the
    following steps occur:
	1. the value stack is truncated to its length at the time of S.TRYF
        2. the FINALLY block is ended (see steps above)
           Note especially that the FINALLY2 block will end up
           holding the exception value for rethrow.

 Structure of a S.TRYF handler

    The handler should start with an END statement that cleans up
    any potential blocks above the FINALLY2 block. Then it should
    performs the finalization body, and at the end perform an ENDF
    to complete the finally block.

        END,x+1
        ...
        ENDF

    For example, consider this function:

        function () {
           try { CCC; } 
           finally { DDD; }
        }

    For this function, a code generator may generate the following,
    unoptimized code:

        0: S.TRYF,3
        1: {CCC}
        2: B.ALWAYS,5
        3: END,2
        4: {DDD}
        5: ENDF
        6: END,0

    The execution of this code when CCC completes is illustrated
    as follows:

        PC=0 blocks={}              S.TRYF,3
        PC=1 blocks={FINALLY<3>}    {CCC} - completes successfully
        PC=2 blocks={FINALLY<3>}    B.ALWAYS,5
        PC=5 blocks={FINALLY<3>}    END,1 - saves PC as resume
        PC=3 blocks={FINALLY2<5>}   END,2 - no effect
        PC=4 blocks={FINALLY2<5>}   {DDD}
        PC=5 blocks={FINALLY2<5>}   ENDF - resumes at PC=5
        PC=5 blocks={}              END,1 - no effect
        PC=6 blocks={}              END,0

    And the execution of this code when CCC throws an exception e is 
    as follows:

        PC=0 blocks={}              S.TRYF,3
        PC=1 blocks={FINALLY<3>}    {CCC} - throws an exception
        PC=3 blocks={FINALLY2<e>,+} END,2 - ends blocks indicated by '+'
        PC=4 blocks={FINALLY2<e>}   {DDD}
        PC=5 blocks={FINALLY2<e>}   END,1 - re-throws exception 'e'
        -- Note that the exception e now escapes the virtual machine

The FINALLY2 block

    A FINALLY2 block is created only when a FINALLY block is ended.

    When the FINALLY2 block is ended, it indicates that a finally block
    has been abnormally left. There are no side-effects, other than that
    any saved exception or resume PC is lost.


David Leonard
ViewVC Help
Powered by ViewVC 1.0.9