From d38fb59d1263822ad6b0953ccc049b52c1ac2c77 Mon Sep 17 00:00:00 2001
From: "jan.nijtmans"
Date: Mon, 13 May 2024 07:55:28 +0000
Subject: Backout [b49efeca6a] (so people can judge whether this is just a textual improvement or not)

---
 doc/Tcl.n | 323 +++++++++++++++++++++++++++++++++++++-------------------------
 1 file changed, 194 insertions(+), 129 deletions(-)

diff --git a/doc/Tcl.n b/doc/Tcl.n
index fbe77bc..0f784af 100644
--- a/doc/Tcl.n
+++ b/doc/Tcl.n
@@ -1,7 +1,6 @@
 '\"
 '\" Copyright (c) 1993 The Regents of the University of California.
 '\" Copyright (c) 1994-1996 Sun Microsystems, Inc.
-'\" Copyright (c) 2023 Nathan Coulter
 '\"
 '\" See the file "license.terms" for information on usage and redistribution
 '\" of this file, and for a DISCLAIMER OF ALL WARRANTIES.
@@ -17,191 +16,257 @@ Summary of Tcl language syntax.
 .SH DESCRIPTION
 .PP
 The following rules define the syntax and semantics of the Tcl language:
-.
-.IP "[1] \fBScript.\fR"
-A script is composed of zero or more commands delimited by semi-colons or
-newlines.
-.IP "[2] \fBCommand.\fR"
-A command is composed of zero or more words delimited by whitespace. The
-replacement for a substitution is included verbatim in the word. For example, a
-space in the replacement is included in the word rather than becoming a
-delimiter, and \fI\\\\\fR becomes a single backslash in the word. Each word is
-processed from left to right and each substitution is performed as soon as it
-is complete.
-For example, the command
-.RS
-.PP
-.CS
-set y [set x 0][incr x][incr x]
-.CE
-.PP
-is composed of three words, and sets the value of \fIy\fR to \fI012\fR.
-.PP
-If hash
-.PQ #
-is the first character of what would otherwise be the first word of a command,
-all characters up to the next newline are ignored.
-.RE
-.
-.IP "[3] \fBBraced word.\fR"
-If a word is enclosed in braces
-.PQ {
-and
-.PQ } ""
-, the braces are removed and the enclosed characters become the word. No
-substitutions are performed. Nested pairs of braces may occur within the word.
-A brace preceded by an odd number of backslashes is not considered part of a
-pair, and neither brace nor the backslashes are removed from the word.
-.
-.IP "[4] \fBQuoted word.\fR"
-If a word is enclosed in double quotes
+.IP "[1] \fBCommands.\fR"
+A Tcl script is a string containing one or more commands.
+Semi-colons and newlines are command separators unless quoted as
+described below.
+Close brackets are command terminators during command substitution
+(see below) unless quoted.
+.IP "[2] \fBEvaluation.\fR"
+A command is evaluated in two steps.
+First, the Tcl interpreter breaks the command into \fIwords\fR
+and performs substitutions as described below.
+These substitutions are performed in the same way for all
+commands.
+Secondly, the first word is used to locate a routine to
+carry out the command, and the remaining words of the command are
+passed to that routine.
+The routine is free to interpret each of its words
+in any way it likes, such as an integer, variable name, list,
+or Tcl script.
+Different commands interpret their words differently.
+.IP "[3] \fBWords.\fR"
+Words of a command are separated by white space (except for
+newlines, which are command separators).
+.IP "[4] \fBDouble quotes.\fR"
+If the first character of a word is double-quote
 .PQ \N'34'
-, the double quotes are removed and the enclosed characters become the word.
-Substitutions are performed.
-.
-.IP "[5] \fBList.\fR"
-A list has the form of a single command. Newline is whitespace, and semicolon
-has no special interpretation. There is no script evaluation so there is no
-argument expansion, variable substitution, or command substitution: Dollar-sign
-and open bracket have no special interpretation, and what would be argument
-expansion in a script is invalid in a list.
-.
-.IP "[6] \fBArgument expansion.\fR"
-If
+then the word is terminated by the next double-quote character.
+If semi-colons, close brackets, or white space characters
+(including newlines) appear between the quotes then they are treated
+as ordinary characters and included in the word.
+Command substitution, variable substitution, and backslash substitution
+are performed on the characters between the quotes as described below.
+The double-quotes are not retained as part of the word.
+.IP "[5] \fBArgument expansion.\fR"
+If a word starts with the string
 .QW {*}
-prefixes a word, it is removed. After any remaining enclosing braces or quotes
-are processed and applicable substitutions performed, the word, which must
-be a list, is removed from the command, and in its place each word in the
-list becomes an additional word in the command. For example,
-.CS
-cmd a {*}{b [c]} d {*}{$e f {g h}}
-.CE
+followed by a non-whitespace character, then the leading
+.QW {*}
+is removed and the rest of the word is parsed and substituted as any other
+word. After substitution, the word is parsed as a list (without command or
+variable substitutions; backslash substitutions are performed as is normal for
+a list and individual internal words may be surrounded by either braces or
+double-quote characters), and its words are added to the command being
+substituted. For instance,
+.QW "cmd a {*}{b [c]} d {*}{$e f {g h}}"
 is equivalent to
-.CS
-cmd a b {[c]} d {$e} f {g h} .
-.CE
-.
-.IP "[7] \fBEvaluation.\fR"
-To evaluate a script, an interpreter evaluates each successive command. The
-first word identifies a procedure, and the remaining words are passed to that
-procedure for further evaluation. The procedure interprets each argument in
-its own way, e.g. as an integer, variable name, list, mathematical expression,
-script, or in some other arbitrary way. The result of the last command is the
-result of the script.
-.
-.IP "[8] \fBCommand substitution.\fR"
-Each pair of brackets
+.QW "cmd a b {[c]} d {$e} f {g h}" .
+.IP "[6] \fBBraces.\fR"
+If the first character of a word is an open brace
+.PQ {
+and rule [5] does not apply, then
+the word is terminated by the matching close brace
+.PQ } "" .
+Braces nest within the word: for each additional open
+brace there must be an additional close brace (however,
+if an open brace or close brace within the word is
+quoted with a backslash then it is not counted in locating the
+matching close brace).
+No substitutions are performed on the characters between the
+braces except for backslash-newline substitutions described
+below, nor do semi-colons, newlines, close brackets,
+or white space receive any special interpretation.
+The word will consist of exactly the characters between the
+outer braces, not including the braces themselves.
+.IP "[7] \fBCommand substitution.\fR"
+If a word contains an open bracket
 .PQ [
-and
-.PQ ] ""
-encloses a script and is replaced by the result of that script.
-.IP "[9] \fBVariable substitution.\fR"
-Each of the following forms begins with dollar sign
+then Tcl performs \fIcommand substitution\fR.
+To do this it invokes the Tcl interpreter recursively to process
+the characters following the open bracket as a Tcl script.
+The script may contain any number of commands and must be terminated
+by a close bracket
+.PQ ] "" .
+The result of the script (i.e. the result of its last command) is
+substituted into the word in place of the brackets and all of the
+characters between them.
+There may be any number of command substitutions in a single word.
+Command substitution is not performed on words enclosed in braces.
+.IP "[8] \fBVariable substitution.\fR"
+If a word contains a dollar-sign
 .PQ $
-and is replaced by the value of the identified variable. \fIname\fR names the
-variable and is composed of ASCII letters (\fBA\fR\(en\fBZ\fR and
-\fBa\fR\(en\fBz\fR), digits (\fB0\fR\(en\fB9\fR), underscores, or namespace
-delimiters (two or more colons). \fIindex\fR is the name of an individual
-variable within an array variable, and may be empty.
+followed by one of the forms
+described below, then Tcl performs \fIvariable
+substitution\fR: the dollar-sign and the following characters are
+replaced in the word by the value of a variable.
+Variable substitution may take any of the following forms:
 .RS
 .TP 15
 \fB$\fIname\fR
 .
-\fIname\fR may not be empty.
+\fIName\fR is the name of a scalar variable; the name is a sequence
+of one or more characters that are a letter, digit, underscore,
+or namespace separators (two or more colons).
+Letters and digits are \fIonly\fR the standard ASCII ones (\fB0\fR\(en\fB9\fR,
+\fBA\fR\(en\fBZ\fR and \fBa\fR\(en\fBz\fR).
 .TP 15
 \fB$\fIname\fB(\fIindex\fB)\fR
 .
-\fIname\fR may be empty. Substitutions are performed on \fIindex\fR.
+\fIName\fR gives the name of an array variable and \fIindex\fR gives
+the name of an element within that array.
+\fIName\fR must contain only letters, digits, underscores, and
+namespace separators, and may be an empty string.
+Letters and digits are \fIonly\fR the standard ASCII ones (\fB0\fR\(en\fB9\fR,
+\fBA\fR\(en\fBZ\fR and \fBa\fR\(en\fBz\fR).
+Command substitutions, variable substitutions, and backslash
+substitutions are performed on the characters of \fIindex\fR.
 .TP 15
 \fB${\fIname\fB}\fR
 .
-\fIname\fR may be empty.
-.TP 15
-\fB${\fIname(index)\fB}\fR
-.
-\fIname\fR may be empty. No substitutions are performed.
+\fIName\fR is the name of a scalar variable or array element. It may contain
+any characters whatsoever except for close braces. It indicates an array
+element if \fIname\fR is in the form
+.QW \fIarrayName\fB(\fIindex\fB)\fR
+where \fIarrayName\fR does not contain any open parenthesis characters,
+.QW \fB(\fR ,
+or close brace characters,
+.QW \fB}\fR ,
+and \fIindex\fR can be any sequence of characters except for close brace
+characters. No further
+substitutions are performed during the parsing of \fIname\fR.
+.PP
+There may be any number of variable substitutions in a single word.
+Variable substitution is not performed on words enclosed in braces.
+.PP
+Note that variables may contain character sequences other than those listed
+above, but in that case other mechanisms must be used to access them (e.g.,
+via the \fBset\fR command's single-argument form).
 .RE
-Variables that are not accessible through one of the forms above may be
-accessed through other mechanisms, e.g. the \fBset\fR command.
-.IP "[10] \fBBackslash substitution.\fR"
-Each backslash
+.IP "[9] \fBBackslash substitution.\fR"
+If a backslash
 .PQ \e
-that is not part of one of the forms listed below is removed, and the next
-character is included in the word verbatim, which allows the inclusion of
-characters that would normally be interpreted, namely whitespace, braces,
-brackets, double quote, dollar sign, and backslash. The following sequences
-are replaced as described:
+appears within a word then \fIbackslash substitution\fR occurs.
+In all cases but those described below the backslash is dropped and
+the following character is treated as an ordinary
+character and included in the word.
+This allows characters such as double quotes, close brackets,
+and dollar signs to be included in words without triggering
+special processing.
+The following table lists the backslash sequences that are
+handled specially, along with the value that replaces each sequence.
 .RS
 .RS
 .RS
 .TP 7
 \e\fBa\fR
-.
-Audible alert (bell) (U+7).
+Audible alert (bell) (Unicode U+000007).
 .TP 7
 \e\fBb\fR
-.
-Backspace (U+8).
+Backspace (Unicode U+000008).
 .TP 7
 \e\fBf\fR
-.
-Form feed (U+C).
+Form feed (Unicode U+00000C).
 .TP 7
 \e\fBn\fR
-.
-Newline (U+A).
+Newline (Unicode U+00000A).
 .TP 7
 \e\fBr\fR
-.
-Carriage-return (U+D).
+Carriage-return (Unicode U+00000D).
 .TP 7
 \e\fBt\fR
-.
-Tab (U+9).
+Tab (Unicode U+000009).
 .TP 7
 \e\fBv\fR
-.
-Vertical tab (U+B).
+Vertical tab (Unicode U+00000B).
 .TP 7
 \e\fB\fIwhiteSpace\fR
 .
-Newline preceded by an odd number of backslashes, along with the consecutive
-spaces and tabs that immediately follow it, is replaced by a single space.
-Because this happens before the command is split into words, it occurs even
-within braced words, and if the resulting space may subsequently be treated as
-a word delimiter.
+A single space character replaces the backslash, newline, and all spaces
+and tabs after the newline. This backslash sequence is unique in that it
+is replaced in a separate pre-pass before the command is actually parsed.
+This means that it will be replaced even when it occurs between braces,
+and the resulting space will be treated as a word separator if it is not
+in braces or quotes.
 .TP 7
 \e\e
-.
 Backslash
 .PQ \e "" .
 .TP 7
 \e\fIooo\fR
 .
-Up to three octal digits form an eight-bit value for a Unicode character in the
-range \fI0\fR\(en\fI377\fR, i.e. U+0\(enU+FF. Only the digits that result in a
-number in this range are consumed.
+The digits \fIooo\fR (one, two, or three of them) give a eight-bit octal
+value for the Unicode character that will be inserted, in the range
+\fI000\fR\(en\fI377\fR (i.e., the range U+000000\(enU+0000FF).
+The parser will stop just before this range overflows, or when
+the maximum of three digits is reached. The upper bits of the Unicode
+character will be 0.
 .TP 7
 \e\fBx\fIhh\fR
 .
-Up to two hexadecimal digits form an eight-bit value for a Unicode character in
-the range \fI0\fR\(en\fIFF\fR.
+The hexadecimal digits \fIhh\fR (one or two of them) give an eight-bit
+hexadecimal value for the Unicode character that will be inserted. The upper
+bits of the Unicode character will be 0 (i.e., the character will be in the
+range U+000000\(enU+0000FF).
 .TP 7
 \e\fBu\fIhhhh\fR
 .
-Up to four hexadecimal digits form a 16-bit value for a Unicode character in
-the range \fI0\fR\(en\fIFFFF\fR.
+The hexadecimal digits \fIhhhh\fR (one, two, three, or four of them) give a
+sixteen-bit hexadecimal value for the Unicode character that will be
+inserted. The upper bits of the Unicode character will be 0 (i.e., the
+character will be in the range U+000000\(enU+00FFFF).
 .TP 7
 \e\fBU\fIhhhhhhhh\fR
 .
-Up to eight hexadecimal digits form a 21-bit value for a Unicode character in
-the range \fI0\fR\(en\fI10FFFF\fR. Only the digits that result in a number in
-this range are consumed.
+The hexadecimal digits \fIhhhhhhhh\fR (one up to eight of them) give a
+twenty-one-bit hexadecimal value for the Unicode character that will be
+inserted, in the range U+000000\(enU+10FFFF. The parser will stop just
+before this range overflows, or when the maximum of eight digits
+is reached. The upper bits of the Unicode character will be 0.
 .RE
 .RE
 .PP
+Backslash substitution is not performed on words enclosed in braces,
+except for backslash-newline as described above.
 .RE
-.
+.IP "[10] \fBComments.\fR"
+If a hash character
+.PQ #
+appears at a point where Tcl is
+expecting the first character of the first word of a command,
+then the hash character and the characters that follow it, up
+through the next newline, are treated as a comment and ignored.
+The comment character only has significance when it appears
+at the beginning of a command.
+.IP "[11] \fBOrder of substitution.\fR"
+Each character is processed exactly once by the Tcl interpreter
+as part of creating the words of a command.
+For example, if variable substitution occurs then no further
+substitutions are performed on the value of the variable; the
+value is inserted into the word verbatim.
+If command substitution occurs then the nested command is
+processed entirely by the recursive call to the Tcl interpreter;
+no substitutions are performed before making the recursive
+call and no additional substitutions are performed on the result
+of the nested script.
+.RS
+.PP
+Substitutions take place from left to right, and each substitution is
+evaluated completely before attempting to evaluate the next. Thus, a
+sequence like
+.PP
+.CS
+set y [set x 0][incr x][incr x]
+.CE
+.PP
+will always set the variable \fIy\fR to the value, \fI012\fR.
+.RE
+.IP "[12] \fBSubstitution and word boundaries.\fR"
+Substitutions do not affect the word boundaries of a command,
+except for argument expansion as specified in rule [5].
+For example, during variable substitution the entire value of
+the variable becomes part of a single word, even if the variable's
+value contains spaces.
 .SH KEYWORDS
 backslash, command, comment, script, substitution, variable
 '\" Local Variables:
--
cgit v0.12
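
Rules [5], [11], and [12] as restored by this backout can be checked with a short Tcl session. What follows is a minimal sketch, not part of the patch, assuming a stock tclsh recent enough to support {*} (8.5 or later); the variable names v, oneWord, threeWords, x, and y are illustrative only.

```tcl
# Rule [12]: a substituted value stays a single word, even if it contains spaces.
set v "a b c"
set oneWord [llength [list $v]]       ;# list receives one argument

# Rule [5]: {*} re-parses the substituted value as a list and adds its words.
set threeWords [llength [list {*}$v]] ;# list receives three arguments

# Rule [11]: substitutions run left to right, each completed before the next.
set y [set x 0][incr x][incr x]

puts "$oneWord $threeWords $y"        ;# prints: 1 3 012
```

The last command is the same example the restored rule [11] uses; the first two show why argument expansion is the only exception to rule [12].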