diff options
Diffstat (limited to 'tcllib/modules/page/notes/doc_normalize.txt')
-rw-r--r-- | tcllib/modules/page/notes/doc_normalize.txt | 138 |
1 files changed, 138 insertions, 0 deletions
diff --git a/tcllib/modules/page/notes/doc_normalize.txt b/tcllib/modules/page/notes/doc_normalize.txt new file mode 100644 index 0000000..b64ad04 --- /dev/null +++ b/tcllib/modules/page/notes/doc_normalize.txt @@ -0,0 +1,138 @@ +Normalized PE Grammar Tree +========================== + +This file documents the tree generated by the transformational package +'pg::peg::normalize'. The input to the transformation is assumed to be +a 'Raw PE Grammar AS Tree', as generated by the PEG frontend, and +described in 'doc_grammar.txt'. + +General information +------------------- + +* The tree is implemented using 'struct::tree'. + +* The tree is used as a higher data structures keeping the various + parts of a grammar together, which are: Start parsing expression, + definitions, and their parsing expressions. The tree nature of the + parsing expressions map especially nicely to this data structure. + +Structure +--------- + +* The root node represents the overall grammar. It has one child node + for the start expression, and one child per definition of a + nonterminal symbol in the grammar. No other nodes are possible. The + order of the children is not specified and an implementation + detail. Attributes in the root provide quick access, and the nodes + can also be distinguished by the attributes they have and/or their + values. + +* A definition node represents a single nonterminal definition from + the grammar. Most of the information describing the definition is + stored in attributes of the node. Sole exception is the parsing + expression associated with the defined nonterminal symbol. This is + represented by an expression subtree, the root of which is the + single child of the definition node. + +* All other nodes represent a parsing expression with the operator + stored in the node and its arguments represented by the children of + the node. For operators allowing more than one argument the children + will be in the same order as specified in the grammar. I.e. the + first child represents the first argument to the operator, and so + on. + +Attributes +---------- + +Name Type Details +---- ---- ------- +name string Root only. The name of the grammar represented by the + tree. +---- ---- ------- +start noderef Root only. Id of the tree node which is the root of + the start expression. A child of the root node. Does + not intersect with the set of definition nodes. Can be + empty, representing a grammar without start expression. +---- ---- ------- +definitions Root only. Maps the names (strings) of nonterminal + dict symbols to the ids of the tree nodes (noderef) which + represents the definition of that symbol. The nodes + are all immediate children of the root node. None of + them can be the root of the start expression + however. The dictionary can be empty, representing a + grammar which has no nonterminal symbols. +---- ---- ------- +undefined Root only. Maps the name (string) of a nonterminal + dict symbol which has no definition in the grammar to a + list containting the ids of the tree nodes (noderef) + which use the symbol despite that. I.e. if this value + is not empty the grammar is invalid and has 'holes'. +==== ==== ======= +symbol string Root and definition nodes only. The name of the + nonterminal symbol whose definition the node is + representing. For root this is '<StartExpression>'. + It is defined for root so that some algorithms on + expressions can use it as a sentinel. +---- ---- ------- +label string Definition nodes only. The name of the input grammar + level nonterminal symbol represented by the node. This + is normally identical to 'symbol', but can differ for + helper definitions introduced by transformations. For + such 'symbol' will refer to the generated name of the + symbol, and 'label' to the name of the symbol in the + grammar the helper belongs to. +---- ---- ------- +mode enum Definition nodes only. Values in {value, discard, + leaf, match}. Specifies how the defined nonterminal + handles the generation of its semantic value during + matching. +---- ---- ------- +users list Definition nodes only. A list containing the ids of + the tree nodes which reference this definition. These + nodes are always expression nodes, with operator + 'n'. The list can be empty, representing a nonterminal + symbol which is defined, but not used anywhere in + grammar. +==== ==== ======= +op enum All expression nodes. Values in {n, t, .., epsilon, + alpha, alnum, x, /, ?, *, +, !, &}. Specifies the + operator part of the expression represented by the + node. +---- ---- ------- +char char Expression nodes with operator 't' (t-Nodes) + only. Value is the single character from the grammar + to match, as represented by Tcl. I.e. any quoting from + the input has been resolved. +---- ---- ------- +begin char ..-Nodes only. Values are like 'char' above, the first +end char and last character in the range to match. +---- ---- ------- +sym string n-Nodes only. The name of the nonterminal symbol to + match. +---- ---- ------- +def noderef n-Nodes only. The id of the definition node for the + nonterminal symbol to match. Can be empty. In that + case the node repesents a try to match an undefined + nonterminal symbol. The value of 'sym' will be a key + in the dictionary of root->undefined, and the id of + this node an element in the list associated with that + key. +==== ==== ======= +at*, to* See 'doc_grammar.txt' for the general definition. + + All nodes except root. + + Definition nodes: The span of input covered by the + definition. + + Expression nodes: The span of input covered by the + expression. + + The nodes for the operators + + dot, alpha, alnum, epsilon + + have no location information right now. Nodes based + on them may have only partial or no information as + well. +---- ---- ------- |