summaryrefslogtreecommitdiffstats
path: root/tcllib/modules/page/notes/doc_normalize.txt
diff options
context:
space:
mode:
Diffstat (limited to 'tcllib/modules/page/notes/doc_normalize.txt')
-rw-r--r--tcllib/modules/page/notes/doc_normalize.txt138
1 files changed, 138 insertions, 0 deletions
diff --git a/tcllib/modules/page/notes/doc_normalize.txt b/tcllib/modules/page/notes/doc_normalize.txt
new file mode 100644
index 0000000..b64ad04
--- /dev/null
+++ b/tcllib/modules/page/notes/doc_normalize.txt
@@ -0,0 +1,138 @@
+Normalized PE Grammar Tree
+==========================
+
+This file documents the tree generated by the transformational package
+'pg::peg::normalize'. The input to the transformation is assumed to be
+a 'Raw PE Grammar AS Tree', as generated by the PEG frontend, and
+described in 'doc_grammar.txt'.
+
+General information
+-------------------
+
+* The tree is implemented using 'struct::tree'.
+
+* The tree is used as a higher data structures keeping the various
+ parts of a grammar together, which are: Start parsing expression,
+ definitions, and their parsing expressions. The tree nature of the
+ parsing expressions map especially nicely to this data structure.
+
+Structure
+---------
+
+* The root node represents the overall grammar. It has one child node
+ for the start expression, and one child per definition of a
+ nonterminal symbol in the grammar. No other nodes are possible. The
+ order of the children is not specified and an implementation
+ detail. Attributes in the root provide quick access, and the nodes
+ can also be distinguished by the attributes they have and/or their
+ values.
+
+* A definition node represents a single nonterminal definition from
+ the grammar. Most of the information describing the definition is
+ stored in attributes of the node. Sole exception is the parsing
+ expression associated with the defined nonterminal symbol. This is
+ represented by an expression subtree, the root of which is the
+ single child of the definition node.
+
+* All other nodes represent a parsing expression with the operator
+ stored in the node and its arguments represented by the children of
+ the node. For operators allowing more than one argument the children
+ will be in the same order as specified in the grammar. I.e. the
+ first child represents the first argument to the operator, and so
+ on.
+
+Attributes
+----------
+
+Name Type Details
+---- ---- -------
+name string Root only. The name of the grammar represented by the
+ tree.
+---- ---- -------
+start noderef Root only. Id of the tree node which is the root of
+ the start expression. A child of the root node. Does
+ not intersect with the set of definition nodes. Can be
+ empty, representing a grammar without start expression.
+---- ---- -------
+definitions Root only. Maps the names (strings) of nonterminal
+ dict symbols to the ids of the tree nodes (noderef) which
+ represents the definition of that symbol. The nodes
+ are all immediate children of the root node. None of
+ them can be the root of the start expression
+ however. The dictionary can be empty, representing a
+ grammar which has no nonterminal symbols.
+---- ---- -------
+undefined Root only. Maps the name (string) of a nonterminal
+ dict symbol which has no definition in the grammar to a
+ list containting the ids of the tree nodes (noderef)
+ which use the symbol despite that. I.e. if this value
+ is not empty the grammar is invalid and has 'holes'.
+==== ==== =======
+symbol string Root and definition nodes only. The name of the
+ nonterminal symbol whose definition the node is
+ representing. For root this is '<StartExpression>'.
+ It is defined for root so that some algorithms on
+ expressions can use it as a sentinel.
+---- ---- -------
+label string Definition nodes only. The name of the input grammar
+ level nonterminal symbol represented by the node. This
+ is normally identical to 'symbol', but can differ for
+ helper definitions introduced by transformations. For
+ such 'symbol' will refer to the generated name of the
+ symbol, and 'label' to the name of the symbol in the
+ grammar the helper belongs to.
+---- ---- -------
+mode enum Definition nodes only. Values in {value, discard,
+ leaf, match}. Specifies how the defined nonterminal
+ handles the generation of its semantic value during
+ matching.
+---- ---- -------
+users list Definition nodes only. A list containing the ids of
+ the tree nodes which reference this definition. These
+ nodes are always expression nodes, with operator
+ 'n'. The list can be empty, representing a nonterminal
+ symbol which is defined, but not used anywhere in
+ grammar.
+==== ==== =======
+op enum All expression nodes. Values in {n, t, .., epsilon,
+ alpha, alnum, x, /, ?, *, +, !, &}. Specifies the
+ operator part of the expression represented by the
+ node.
+---- ---- -------
+char char Expression nodes with operator 't' (t-Nodes)
+ only. Value is the single character from the grammar
+ to match, as represented by Tcl. I.e. any quoting from
+ the input has been resolved.
+---- ---- -------
+begin char ..-Nodes only. Values are like 'char' above, the first
+end char and last character in the range to match.
+---- ---- -------
+sym string n-Nodes only. The name of the nonterminal symbol to
+ match.
+---- ---- -------
+def noderef n-Nodes only. The id of the definition node for the
+ nonterminal symbol to match. Can be empty. In that
+ case the node repesents a try to match an undefined
+ nonterminal symbol. The value of 'sym' will be a key
+ in the dictionary of root->undefined, and the id of
+ this node an element in the list associated with that
+ key.
+==== ==== =======
+at*, to* See 'doc_grammar.txt' for the general definition.
+
+ All nodes except root.
+
+ Definition nodes: The span of input covered by the
+ definition.
+
+ Expression nodes: The span of input covered by the
+ expression.
+
+ The nodes for the operators
+
+ dot, alpha, alnum, epsilon
+
+ have no location information right now. Nodes based
+ on them may have only partial or no information as
+ well.
+---- ---- -------