Building the AST

MysoreScript uses Pegmatite for both parsing and AST construction. Pegmatite is a parsing expression grammar (PEG) library designed to allow easy experimentation with language design. PEGs use a greedy matching strategy with unlimited backtracking (equivalent to unlimited read-ahead).
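To make the matching strategy concrete, here is a minimal sketch of PEG-style ordered choice with backtracking. It does not use Pegmatite's API: the names Rule, literal, sequence, choice, recognise, and abcOrAbd are all hypothetical and exist only for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <optional>
#include <string>
#include <string_view>
#include <vector>

// A rule takes the input and a start position. On success it returns the
// position just past the match; on failure it returns std::nullopt.
// (Hypothetical type for this sketch, not part of Pegmatite.)
using Rule =
    std::function<std::optional<std::size_t>(std::string_view, std::size_t)>;

// Matches a fixed string at the current position.
Rule literal(std::string text)
{
    return [text](std::string_view in,
                  std::size_t pos) -> std::optional<std::size_t> {
        assert(pos <= in.size()); // precondition of string_view::substr
        if (in.substr(pos, text.size()) == text)
            return pos + text.size();
        return std::nullopt;
    };
}

// Matches each part in turn; fails as soon as any part fails.
Rule sequence(std::vector<Rule> parts)
{
    return [parts](std::string_view in,
                   std::size_t pos) -> std::optional<std::size_t> {
        for (const Rule &part : parts)
        {
            auto next = part(in, pos);
            if (!next)
                return std::nullopt;
            pos = *next;
        }
        return pos;
    };
}

// Ordered choice: tries the alternatives in order and commits to the first
// one that succeeds. A failed alternative leaves pos untouched, so the next
// alternative restarts from the same place. That restart is the backtracking.
Rule choice(std::vector<Rule> alternatives)
{
    return [alternatives](std::string_view in,
                          std::size_t pos) -> std::optional<std::size_t> {
        for (const Rule &alt : alternatives)
        {
            if (auto next = alt(in, pos))
                return next;
        }
        return std::nullopt;
    };
}

// A recogniser accepts the input only if the rule consumes all of it.
bool recognise(const Rule &rule, std::string_view in)
{
    auto end = rule(in, 0);
    return end && *end == in.size();
}

// Example grammar: "abc" / "abd"
Rule abcOrAbd()
{
    return choice({sequence({literal("a"), literal("b"), literal("c")}),
                   sequence({literal("a"), literal("b"), literal("d")})});
}
```

With these definitions, recognise(abcOrAbd(), "abd") succeeds: the first alternative consumes "ab", fails on "c", and the second alternative is then tried from the start. Because choice commits to the first alternative that succeeds, choice({literal("a"), literal("ab")}) does not recognise "ab", while the same alternatives in the opposite order do.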

The grammar for MysoreScript is defined in grammar.hh. Note that this file just describes what the language looks like: it defines a recogniser for the language, not a parser. This separation makes it easy to reuse a grammar description for syntax highlighting, autocompletion, and so on. The Pegmatite documentation contains more information about defining a grammar.

Pegmatite parsers are created by declaratively associating AST classes with rules in the grammar. In MysoreScript, the MysoreScriptParser class in parser.hh is responsible for creating these links; for example, it specifies that the assignment rule in the grammar is handled by the Assignment class.

Each AST node constructs itself by popping existing AST nodes off a stack and then pushing itself. The Assignment class, for example, declares two fields that use Pegmatite’s ASTPtr template. These register themselves with their container on construction (Assignment is a subclass of Pegmatite’s ASTContainer class) and the container initialises them in reverse order by popping values from the stack. Pegmatite will therefore construct an Assignment AST node by first popping a pointer to an Expression object into the expr field and then popping a pointer to a VarRef into the target field (MysoreScript does not support expressions that evaluate to l-values and so the target of an assignment is always the name of a variable).
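The pop-then-push construction order can be sketched without Pegmatite. The following is a simplified model, not the real implementation: it uses std::unique_ptr in place of ASTPtr, pops the fields by hand in the constructor rather than through ASTContainer, and the names NodeStack, popAs, Number, and buildAssignment are hypothetical.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Simplified stand-ins for the AST classes named in the text.
struct Node
{
    virtual ~Node() = default;
};
struct Expression : Node
{
};
struct Number : Expression // hypothetical literal node for the example
{
    int value;
    explicit Number(int v) : value(v) {}
};
struct VarRef : Node
{
    std::string name;
    explicit VarRef(std::string n) : name(std::move(n)) {}
};

// The stack of already-constructed nodes (hypothetical name).
using NodeStack = std::vector<std::unique_ptr<Node>>;

// Pops the top node and checks that it has the expected type.
template <typename T>
std::unique_ptr<T> popAs(NodeStack &stack)
{
    assert(!stack.empty());
    std::unique_ptr<Node> top = std::move(stack.back());
    stack.pop_back();
    T *typed = dynamic_cast<T *>(top.get());
    assert(typed != nullptr);
    top.release(); // ownership moves to the typed pointer below
    return std::unique_ptr<T>(typed);
}

struct Assignment : Node
{
    // Declared in source order: target first, then expr.
    std::unique_ptr<VarRef> target;
    std::unique_ptr<Expression> expr;

    // Filled in reverse order: the expression was parsed last, so it is on
    // top of the stack and is popped first.
    explicit Assignment(NodeStack &stack)
    {
        expr = popAs<Expression>(stack);
        target = popAs<VarRef>(stack);
    }
};

// Simulates the events for an assignment of 42 to x: the two children are
// pushed as they are parsed, then the Assignment pops them and pushes itself.
void buildAssignment(NodeStack &stack)
{
    stack.push_back(std::make_unique<VarRef>("x"));
    stack.push_back(std::make_unique<Number>(42));
    auto assignment = std::make_unique<Assignment>(stack);
    stack.push_back(std::move(assignment));
}
```

After buildAssignment runs on an empty stack, the stack holds a single Assignment whose target is the VarRef for x and whose expr is the Number. Popping in reverse declaration order is what lets the fields be declared in the same order as they appear in the source text.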

Once Pegmatite has constructed an AST, the next step is to perform some semantic analysis. MysoreScript doesn’t do very much of this, because it’s intended as a simple system for teaching rather than a production-quality language implementation.
It does, however, need to create symbol tables to let the interpreter and compiler associate variable references with their declarations and, more importantly, with their locations.

The AST::Statement class defines a collectVarUses method, which all concrete subclasses must implement. It takes two sets: one containing the names of declared variables and one containing the names of referenced variables. This is used when executing a closure, to determine which variables are in each scope. The first time that a closure is executed, it will invoke this method on each statement, which will recursively find all of the declarations and variable references. If you are implementing a new AST node that contains others, don’t forget to implement this method so that it forwards the invocation to the child nodes!
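The recursive collection can be sketched as follows. This is a simplified model rather than the code in MysoreScript: the exact signature of collectVarUses is an assumption, every node is treated as a Statement for brevity, and NameSet, Decl, Block, and sampleProgram are hypothetical names.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <unordered_set>
#include <utility>
#include <vector>

using NameSet = std::unordered_set<std::string>;

struct Statement
{
    virtual ~Statement() = default;
    // Adds declared names to decls and referenced names to uses.
    // (Assumed signature for this sketch.)
    virtual void collectVarUses(NameSet &decls, NameSet &uses) = 0;
};

// A leaf node that declares a variable.
struct Decl : Statement
{
    std::string name;
    explicit Decl(std::string n) : name(std::move(n)) {}
    void collectVarUses(NameSet &decls, NameSet &) override
    {
        decls.insert(name);
    }
};

// A leaf node that refers to a variable.
struct VarRef : Statement
{
    std::string name;
    explicit VarRef(std::string n) : name(std::move(n)) {}
    void collectVarUses(NameSet &, NameSet &uses) override
    {
        uses.insert(name);
    }
};

// A node that contains other nodes. It declares and references nothing
// itself, but it must forward the call to every child, otherwise the
// declarations and references below it are never seen.
struct Block : Statement
{
    std::vector<std::unique_ptr<Statement>> children;

    void add(std::unique_ptr<Statement> child)
    {
        assert(child != nullptr);
        children.push_back(std::move(child));
    }

    void collectVarUses(NameSet &decls, NameSet &uses) override
    {
        for (auto &child : children)
            child->collectVarUses(decls, uses);
    }
};

// Builds a small tree: an outer block that declares a and references b,
// with a nested block that declares c and references a.
std::unique_ptr<Block> sampleProgram()
{
    auto inner = std::make_unique<Block>();
    inner->add(std::make_unique<Decl>("c"));
    inner->add(std::make_unique<VarRef>("a"));

    auto outer = std::make_unique<Block>();
    outer->add(std::make_unique<Decl>("a"));
    outer->add(std::make_unique<VarRef>("b"));
    outer->add(std::move(inner));
    return outer;
}
```

Calling collectVarUses on the tree from sampleProgram leaves a and c in the declared set and a and b in the referenced set. If Block did not forward the call, the nested declaration of c would be missed.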

NOTE: There is no real error checking for variable references. The MysoreScript interpreter will simply crash if you try to access a variable that is not in any symbol table.