summaryrefslogtreecommitdiffstats
path: root/tcllib/modules/grammar_me/me_cpucore.man
blob: b81c0467e1be145279b8fa0c9c75316b476ba137 (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
[comment {-*- tcl -*- doctools manpage}]
[manpage_begin grammar::me::cpu::core n 0.2]
[keywords grammar]
[keywords parsing]
[keywords {virtual machine}]
[copyright {2005-2006 Andreas Kupries <andreas_kupries@users.sourceforge.net>}]
[moddesc   {Grammar operations and usage}]
[titledesc {ME virtual machine state manipulation}]
[category  {Grammars and finite automata}]
[require Tcl 8.4]
[require grammar::me::cpu::core [opt 0.2]]
[description]
[para]

This package provides an implementation of the ME virtual machine.

Please go and read the document [syscmd grammar::me_intro] first if
you do not know what a ME virtual machine is.

[para]

This implementation represents each ME virtual machine as a Tcl value
and provides commands to manipulate and query such values to show the
effects of executing instructions, adding tokens, retrieving state,
etc.

[para]

The values fully follow the paradigm of Tcl that every value is a
string and while also allowing C implementations for a proper
Tcl_ObjType to keep all the important data in native data structures.

Because of the latter it is recommended to access the state values
[emph only] through the commands of this package to ensure that
internal representation is not shimmered away.

[para]

The actual structure used by all state values is described in section
[sectref {CPU STATE}].

[section API]

The package directly provides only a single command, and all the
functionality is made available through its methods.

[list_begin definitions]

[call [cmd ::grammar::me::cpu::core] [method disasm] [arg asm]]

This method returns a list containing a disassembly of the match
instructions in [arg asm]. The format of [arg asm] is specified in the
section [sectref {MATCH PROGRAM REPRESENTATION}].

[para]

Each element of the result contains instruction label, instruction
name, and the instruction arguments, in this order. The label can be
the empty string. Jump destinations are shown as labels, strings and
tokens unencoded. Token names are prefixed with their numeric id, if,
and only if a tokmap is defined. The two components are separated by a
colon.

[call [cmd ::grammar::me::cpu::core] [method asm] [arg asm]]

This method returns code in the format as specified in section
[sectref {MATCH PROGRAM REPRESENTATION}] generated from ME assembly
code [arg asm], which is in the format as returned by the method
[method disasm].

[call [cmd ::grammar::me::cpu::core] [method new] [arg asm]]

This method creates state value for a ME virtual machine in its
initial state and returns it as its result.

[para]

The argument [arg matchcode] contains a Tcl representation of the
match instructions the machine has to execute while parsing the input
stream. Its format is specified in the section
[sectref {MATCH PROGRAM REPRESENTATION}].

[para]

The [arg tokmap] argument taken by the implementation provided by the
package [package grammar::me::tcl] is here hidden inside of the match
instructions and therefore not needed.

[call [cmd ::grammar::me::cpu::core] [method lc] [arg state] [arg location]]

This method takes the state value of a ME virtual machine and uses it
to convert a location in the input stream (as offset) into a line
number and column index. The result of the method is a 2-element list
containing the two pieces in the order mentioned in the previous
sentence.

[para]

[emph Note] that the method cannot convert locations which the machine
has not yet read from the input stream. In other words, if the machine
has read 7 characters so far it is possible to convert the offsets
[const 0] to [const 6], but nothing beyond that. This also shows that
it is not possible to convert offsets which refer to locations before
the beginning of the stream.

[para]

This utility allows higher levels to convert the location offsets
found in the error status and the AST into more human readable data.

[call [cmd ::grammar::me::cpu::core] [method tok] [arg state] [opt "[arg from] [opt [arg to]]"]]

This method takes the state value of a ME virtual machine and returns
a Tcl list containing the part of the input stream between the
locations [arg from] and [arg to] (both inclusive). If [arg to] is not
specified it will default to the value of [arg from]. If [arg from] is
not specified either the whole input stream is returned.

[para]

This method places the same restrictions on its location arguments as
the method [method lc].

[call [cmd ::grammar::me::cpu::core] [method pc] [arg state]]

This method takes the state value of a ME virtual machine and returns
the current value of the stored program counter.

[call [cmd ::grammar::me::cpu::core] [method iseof] [arg state]]

This method takes the state value of a ME virtual machine and returns
the current value of the stored eof flag.

[call [cmd ::grammar::me::cpu::core] [method at] [arg state]]

This method takes the state value of a ME virtual machine and returns
the current location in the input stream.

[call [cmd ::grammar::me::cpu::core] [method cc] [arg state]]

This method takes the state value of a ME virtual machine and returns
the current token.

[call [cmd ::grammar::me::cpu::core] [method sv] [arg state]]

This method takes the state value of a ME virtual machine and returns
the current semantic value stored in it.

This is an abstract syntax tree as specified in the document
[syscmd grammar::me_ast], section [sectref-external {AST VALUES}].

[call [cmd ::grammar::me::cpu::core] [method ok] [arg state]]

This method takes the state value of a ME virtual machine and returns
the match status stored in it.

[call [cmd ::grammar::me::cpu::core] [method error] [arg state]]

This method takes the state value of a ME virtual machine and returns
the current error status stored in it.

[call [cmd ::grammar::me::cpu::core] [method lstk] [arg state]]

This method takes the state value of a ME virtual machine and returns
the location stack.

[call [cmd ::grammar::me::cpu::core] [method astk] [arg state]]

This method takes the state value of a ME virtual machine and returns
the AST stack.

[call [cmd ::grammar::me::cpu::core] [method mstk] [arg state]]

This method takes the state value of a ME virtual machine and returns
the AST marker stack.

[call [cmd ::grammar::me::cpu::core] [method estk] [arg state]]

This method takes the state value of a ME virtual machine and returns
the error stack.

[call [cmd ::grammar::me::cpu::core] [method rstk] [arg state]]

This method takes the state value of a ME virtual machine and returns
the subroutine return stack.

[call [cmd ::grammar::me::cpu::core] [method nc] [arg state]]

This method takes the state value of a ME virtual machine and returns
the nonterminal match cache as a dictionary.

[call [cmd ::grammar::me::cpu::core] [method ast] [arg state]]

This method takes the state value of a ME virtual machine and returns
the abstract syntax tree currently at the top of the AST stack stored
in it.

This is an abstract syntax tree as specified in the document
[syscmd grammar::me_ast], section [sectref-external {AST VALUES}].

[call [cmd ::grammar::me::cpu::core] [method halted] [arg state]]

This method takes the state value of a ME virtual machine and returns
the current halt status stored in it, i.e. if the machine has stopped
or not.

[call [cmd ::grammar::me::cpu::core] [method code] [arg state]]

This method takes the state value of a ME virtual machine and returns
the code stored in it, i.e. the instructions executed by the machine.

[call [cmd ::grammar::me::cpu::core] [method eof] [arg statevar]]

This method takes the state value of a ME virtual machine as stored in
the variable named by [arg statevar] and modifies it so that the eof
flag inside is set. This signals to the machine that whatever token
are in the input queue are the last to be processed. There will be no
more.

[call [cmd ::grammar::me::cpu::core] [method put] [arg statevar] [arg tok] [arg lex] [arg line] [arg col]]

This method takes the state value of a ME virtual machine as stored in
the variable named by [arg statevar] and modifies it so that the token
[arg tok] is added to the end of the input queue, with associated
lexeme data [arg lex] and [arg line]/[arg col]umn information.

[para]

The operation will fail with an error if the eof flag of the machine
has been set through the method [method eof].

[call [cmd ::grammar::me::cpu::core] [method run] [arg statevar] [opt [arg n]]]

This method takes the state value of a ME virtual machine as stored in
the variable named by [arg statevar], executes a number of
instructions and stores the state resulting from their modifications
back into the variable.

[para]

The execution loop will run until either

[list_begin itemized]
[item] [arg n] instructions have been executed, or
[item] a halt instruction was executed, or
[item]
the input queue is empty and the code is asking for more tokens to
process.
[list_end]
[para]

If no limit [arg n] was set only the last two conditions are checked
for.

[list_end]

[subsection {MATCH PROGRAM REPRESENTATION}]

A match program is represented by nested Tcl list. The first element,
[term asm], is a list of integer numbers, the instructions to execute,
and their arguments. The second element, [term pool], is a list of
strings, referenced by the instructions, for error messages, token
names, etc. The third element, [term tokmap], provides ordering
information for the tokens, mapping their names to their numerical
rank. This element can be empty, forcing lexicographic comparison when
matching ranges.

[para]

All ME instructions are encoded as integer numbers, with the mapping
given below. A number of the instructions, those which handle error
messages, have been given an additional argument to supply that
message explicitly instead of having it constructed from token names,
etc. This allows the machine state to store only the message ids
instead of the full strings.

[para]

Jump destination arguments are absolute indices into the [term asm]
element, refering to the instruction to jump to. Any string arguments
are absolute indices into the [term pool] element. Tokens, characters,
messages, and token (actually character) classes to match are coded as
references into the [term pool] as well.

[para]
[list_begin enumerated]

[enum] "[cmd ict_advance] [arg message]"
[enum] "[cmd ict_match_token] [arg tok] [arg message]"
[enum] "[cmd ict_match_tokrange] [arg tokbegin] [arg tokend] [arg message]"
[enum] "[cmd ict_match_tokclass] [arg code] [arg message]"
[enum] "[cmd inc_restore] [arg branchlabel] [arg nt]"
[enum] "[cmd inc_save] [arg nt]"
[enum] "[cmd icf_ntcall] [arg branchlabel]"
[enum] "[cmd icf_ntreturn]"
[enum] "[cmd iok_ok]"
[enum] "[cmd iok_fail]"
[enum] "[cmd iok_negate]"
[enum] "[cmd icf_jalways] [arg branchlabel]"
[enum] "[cmd icf_jok] [arg branchlabel]"
[enum] "[cmd icf_jfail] [arg branchlabel]"
[enum] "[cmd icf_halt]"
[enum] "[cmd icl_push]"
[enum] "[cmd icl_rewind]"
[enum] "[cmd icl_pop]"
[enum] "[cmd ier_push]"
[enum] "[cmd ier_clear]"
[enum] "[cmd ier_nonterminal] [arg message]"
[enum] "[cmd ier_merge]"
[enum] "[cmd isv_clear]"
[enum] "[cmd isv_terminal]"
[enum] "[cmd isv_nonterminal_leaf] [arg nt]"
[enum] "[cmd isv_nonterminal_range] [arg nt]"
[enum] "[cmd isv_nonterminal_reduce] [arg nt]"
[enum] "[cmd ias_push]"
[enum] "[cmd ias_mark]"
[enum] "[cmd ias_mrewind]"
[enum] "[cmd ias_mpop]"
[list_end]

[section {CPU STATE}]

A state value is a list containing the following elements, in the order listed below:

[list_begin enumerated]
[enum] [term code]: Match instructions, see [sectref {MATCH PROGRAM REPRESENTATION}].
[enum] [term pc]:   Program counter, [term int].
[enum] [term halt]: Halt flag, [term boolean].
[enum] [term eof]:  Eof flag, [term boolean]
[enum] [term tc]:   Terminal cache, and input queue. Structure see below.
[enum] [term cl]:   Current location, [term int].
[enum] [term ct]:   Current token, [term string].
[enum] [term ok]:   Match status, [term boolean].
[enum] [term sv]:   Semantic value, [term list].
[enum] [term er]:   Error status, [term list].
[enum] [term ls]:   Location stack, [term list].
[enum] [term as]:   AST stack, [term list].
[enum] [term ms]:   AST marker stack, [term list].
[enum] [term es]:   Error stack, [term list].
[enum] [term rs]:   Return stack, [term list].
[enum] [term nc]:   Nonterminal cache, [term dictionary].
[list_end]
[para]

[term tc], the input queue of tokens waiting for processing and the
terminal cache containing the tokens already processing are one
unified data structure simply holding all tokens and their
information, with the current location separating that which has been
processed from that which is waiting.

Each element of the queue/cache is a list containing the token, its
lexeme information, line number, and column index, in this order.

[para]

All stacks have their top element aat the end, i.e. pushing an item is
equivalent to appending to the list representing the stack, and
popping it removes the last element.

[para]

[term er], the error status is either empty or a list of two elements,
a location in the input, and a list of messages, encoded as references
into the [term pool] element of the [term code].

[para]

[term nc], the nonterminal cache is keyed by nonterminal name and
location, each value a four-element list containing current location,
match status, semantic value, and error status, in this order.

[vset CATEGORY grammar_me]
[include ../doctools2base/include/feedback.inc]
[manpage_end]