Building the AST
MysoreScript uses Pegmatite for both parsing and AST construction. Pegmatite is a parsing expression grammar (PEG) library that is designed to allow easy experimentation with language design. PEGs use a greedy matching strategy with unlimited backtracking (equivalent to unlimited read-ahead).
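To make the matching strategy concrete, here is a minimal sketch of PEG-style recognisers written as plain C++17 functions. This is not Pegmatite's API: the names Rule, lit, seq, choice and recognises are made up for illustration. It shows ordered choice, which tries the first alternative and, if that fails, backtracks to the starting position and tries the second.

```cpp
#include <cstddef>
#include <functional>
#include <optional>
#include <string>

// Hypothetical recogniser type (not Pegmatite's): given the input and a
// start position, return the position after the match, or nothing on failure.
using Rule = std::function<std::optional<std::size_t>(const std::string &, std::size_t)>;

// Matches a literal string at the current position.
Rule lit(std::string s)
{
    return [s](const std::string &in, std::size_t pos) -> std::optional<std::size_t> {
        if (in.compare(pos, s.size(), s) == 0)
        {
            return pos + s.size();
        }
        return std::nullopt;
    };
}

// Sequence: match a, then match b starting where a stopped.
Rule seq(Rule a, Rule b)
{
    return [a, b](const std::string &in, std::size_t pos) -> std::optional<std::size_t> {
        auto mid = a(in, pos);
        if (!mid)
        {
            return std::nullopt;
        }
        return b(in, *mid);
    };
}

// Ordered choice: try a first; only if it fails, backtrack to pos and try b.
Rule choice(Rule a, Rule b)
{
    return [a, b](const std::string &in, std::size_t pos) -> std::optional<std::size_t> {
        if (auto r = a(in, pos))
        {
            return r;
        }
        return b(in, pos);
    };
}

// True if the rule matches the whole input.
bool recognises(const Rule &r, const std::string &in)
{
    auto end = r(in, 0);
    return end && *end == in.size();
}
```

With these helpers, choice(seq(lit("a"), lit("b")), seq(lit("a"), lit("c"))) recognises "ac" by backtracking after the first alternative fails. Note that choice(lit("a"), lit("ab")) does not recognise "ab": the first alternative succeeds on "a", so the second is never tried. The order of alternatives therefore matters in a PEG.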
The grammar for MysoreScript is defined in grammar.h.
Note that this file just describes what the language looks like: it defines a recogniser for the language, not a parser.
This separation makes it easy to reuse a grammar description for syntax highlighting, autocompletion, and so on.
The Pegmatite documentation contains more information about defining a grammar.
Pegmatite parsers are created by declaratively associating AST classes with rules in the grammar.
In MysoreScript, the MysoreScriptParser class in parser.hh is responsible for creating these links, for example defining that the assignment rule in the grammar is handled by the Assignment class.
Each AST node constructs itself by popping existing AST nodes off a stack and then pushing itself.
The Assignment class, for example, declares two fields that use Pegmatite’s ASTPtr template.
These register themselves with their container on construction (Assignment is a subclass of Pegmatite’s ASTContainer class) and the container initialises them in reverse order by popping values from the stack.
Pegmatite will therefore construct an Assignment AST node by first popping a pointer to an Expression object into the expr field and then popping a pointer to a VarRef into the target field (MysoreScriptParser does not support expressions that evaluate to l-values and so the target of an assignment is always the name of a variable).
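The pop-then-push construction can be sketched without Pegmatite at all. The following is a simplified illustration rather than the real ASTPtr or ASTContainer implementation: Node, NodeStack, popAs and reduceAssignment are hypothetical names, and Expression is reduced to a single integer. It shows why the fields are filled in reverse order: the expression was parsed last, so it sits on top of the stack.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Simplified stand-ins for the AST classes (assumed shapes, not the real ones).
struct Node
{
    virtual ~Node() = default;
};

struct VarRef : Node
{
    std::string name;
    explicit VarRef(std::string n) : name(std::move(n)) {}
};

struct Expression : Node
{
    int value;
    explicit Expression(int v) : value(v) {}
};

using NodeStack = std::vector<std::unique_ptr<Node>>;

// Pops the top node and checks that it has the expected dynamic type.
template <typename T>
std::unique_ptr<T> popAs(NodeStack &stack)
{
    assert(!stack.empty());
    std::unique_ptr<Node> top = std::move(stack.back());
    stack.pop_back();
    T *typed = dynamic_cast<T *>(top.get());
    assert(typed != nullptr);
    top.release(); // ownership moves to the typed pointer returned below
    return std::unique_ptr<T>(typed);
}

struct Assignment : Node
{
    // Declared in source order: target first, then expr.
    std::unique_ptr<VarRef> target;
    std::unique_ptr<Expression> expr;

    // Filled in reverse order, because expr was pushed last.
    explicit Assignment(NodeStack &stack)
    {
        expr = popAs<Expression>(stack);
        target = popAs<VarRef>(stack);
    }
};

// Called when the assignment rule matches: the new node consumes its
// children from the stack and then takes their place.
void reduceAssignment(NodeStack &stack)
{
    auto node = std::make_unique<Assignment>(stack);
    stack.push_back(std::move(node));
}
```

After a VarRef and then an Expression have been pushed, calling reduceAssignment leaves a single Assignment on the stack that owns both.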
Once Pegmatite has constructed an AST, the next step is to perform some semantic analysis.
MysoreScript doesn’t do very much of this, because it’s intended as a simple system for teaching rather than a production-quality language implementation.
It does, however, need to create symbol tables to let the interpreter and compiler associate variable references with their declarations and, more importantly, with their locations.
The AST::Statement class defines a collectVarUses method, which all concrete subclasses must implement.
This method is passed two sets, one containing the names of declared variables and one containing the names of referenced variables.
This is used when executing a closure, to determine which variables are in each scope.
The first time that a closure is executed, it will invoke this method on each statement, which will recursively find all of the declarations and variable references.
If you are implementing a new AST node that contains others, don’t forget to implement this to forward the invocation to the child nodes!
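As a rough sketch of the forwarding pattern, assuming the method takes the two sets by reference (the exact signature in MysoreScript may differ), a container node simply passes both sets on to each child. The Decl, VarRef and Block classes below are simplified stand-ins, not the real AST classes.

```cpp
#include <memory>
#include <string>
#include <unordered_set>
#include <utility>
#include <vector>

using NameSet = std::unordered_set<std::string>;

// Assumed shape of the base class; the real AST::Statement has more members.
struct Statement
{
    virtual ~Statement() = default;
    // Adds declared names to decls and referenced names to uses.
    virtual void collectVarUses(NameSet &decls, NameSet &uses) = 0;
};

// A leaf node that declares a variable.
struct Decl : Statement
{
    std::string name;
    explicit Decl(std::string n) : name(std::move(n)) {}
    void collectVarUses(NameSet &decls, NameSet &) override
    {
        decls.insert(name);
    }
};

// A leaf node that refers to a variable.
struct VarRef : Statement
{
    std::string name;
    explicit VarRef(std::string n) : name(std::move(n)) {}
    void collectVarUses(NameSet &, NameSet &uses) override
    {
        uses.insert(name);
    }
};

// A node that contains other nodes: it has nothing to record itself,
// but it must forward the call to every child.
struct Block : Statement
{
    std::vector<std::unique_ptr<Statement>> body;
    void collectVarUses(NameSet &decls, NameSet &uses) override
    {
        for (auto &child : body)
        {
            child->collectVarUses(decls, uses);
        }
    }
};
```

If Block forgot to forward the call, declarations and references nested inside it would never reach the two sets, and so would be missing from the symbol tables built from them.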
NOTE: There is no real error checking for variable references: the MysoreScript interpreter will simply crash if you try to access a variable that is not in any symbol table.