update to tcl/tk 8.6.7

author: William Joye <wjoye@cfa.harvard.edu> 2017-09-22 18:51:12 (GMT)
committer: William Joye <wjoye@cfa.harvard.edu> 2017-09-22 18:51:12 (GMT)
commit: 3fa8e6dc88e8041b6cb88d1b1e9c05676d3346b7 (patch)
tree: 69afbb41089c8358615879f7cd3c4cf7997f4c7e /tcl8.6/doc/re_syntax.n
parent: a0e17db23c0fd7c771c0afce8cce350c98f90b02 (diff)
download: blt-3fa8e6dc88e8041b6cb88d1b1e9c05676d3346b7.zip
blt-3fa8e6dc88e8041b6cb88d1b1e9c05676d3346b7.tar.gz
blt-3fa8e6dc88e8041b6cb88d1b1e9c05676d3346b7.tar.bz2
1 files changed, 0 insertions, 858 deletions
diff --git a/tcl8.6/doc/re_syntax.n b/tcl8.6/doc/re_syntax.n
deleted file mode 100644
index 7988071..0000000
--- a/tcl8.6/doc/re_syntax.n
+++ /dev/null
@@ -1,858 +0,0 @@
-'\"
-'\" Copyright (c) 1998 Sun Microsystems, Inc.
-'\" Copyright (c) 1999 Scriptics Corporation
-'\"
-'\" See the file "license.terms" for information on usage and redistribution
-'\" of this file, and for a DISCLAIMER OF ALL WARRANTIES.
-'\"
-.so man.macros
-.ie '\w'o''\w'\C'^o''' .ds qo \C'^o'
-.el .ds qo u
-.TH re_syntax n "8.1" Tcl "Tcl Built-In Commands"
-.BS
-.SH NAME
-re_syntax \- Syntax of Tcl regular expressions
-.BE
-.SH DESCRIPTION
-.PP
-A \fIregular expression\fR describes strings of characters.
-It's a pattern that matches certain strings and does not match others.
-.SH "DIFFERENT FLAVORS OF REs"
-Regular expressions
-.PQ RE s ,
-as defined by POSIX, come in two flavors: \fIextended\fR REs
-.PQ ERE s
-and \fIbasic\fR REs
-.PQ BRE s .
-EREs are roughly those of the traditional \fIegrep\fR, while BREs are
-roughly those of the traditional \fIed\fR. This implementation adds
-a third flavor, \fIadvanced\fR REs
-.PQ ARE s ,
-basically EREs with some significant extensions.
-.PP
-This manual page primarily describes AREs. BREs mostly exist for
-backward compatibility in some old programs; they will be discussed at
-the end. POSIX EREs are almost an exact subset of AREs. Features of
-AREs that are not present in EREs will be indicated.
-.SH "REGULAR EXPRESSION SYNTAX"
-.PP
-Tcl regular expressions are implemented using the package written by
-Henry Spencer, based on the 1003.2 spec and some (not quite all) of
-the Perl5 extensions (thanks, Henry!). Much of the description of
-regular expressions below is copied verbatim from his manual entry.
-.PP
-An ARE is one or more \fIbranches\fR,
-separated by
-.QW \fB|\fR ,
-matching anything that matches any of the branches.
-.PP
-A branch is zero or more \fIconstraints\fR or \fIquantified atoms\fR,
-concatenated.
-It matches a match for the first, followed by a match for the second, etc;
-an empty branch matches the empty string.
-.SS QUANTIFIERS
-A quantified atom is an \fIatom\fR possibly followed
-by a single \fIquantifier\fR.
-Without a quantifier, it matches a single match for the atom.
-The quantifiers,
-and what a so-quantified atom matches, are:
-.RS 2
-.TP 6
-\fB*\fR
-.
-a sequence of 0 or more matches of the atom
-.TP
-\fB+\fR
-.
-a sequence of 1 or more matches of the atom
-.TP
-\fB?\fR
-.
-a sequence of 0 or 1 matches of the atom
-.TP
-\fB{\fIm\fB}\fR
-.
-a sequence of exactly \fIm\fR matches of the atom
-.TP
-\fB{\fIm\fB,}\fR
-.
-a sequence of \fIm\fR or more matches of the atom
-.TP
-\fB{\fIm\fB,\fIn\fB}\fR
-.
-a sequence of \fIm\fR through \fIn\fR (inclusive) matches of the atom;
-\fIm\fR may not exceed \fIn\fR
-.TP
-\fB*?  +?  ??  {\fIm\fB}?  {\fIm\fB,}?  {\fIm\fB,\fIn\fB}?\fR
-.
-\fInon-greedy\fR quantifiers, which match the same possibilities,
-but prefer the smallest number rather than the largest number
-of matches (see \fBMATCHING\fR)
-.RE
-.PP
-The forms using \fB{\fR and \fB}\fR are known as \fIbound\fRs. The
-numbers \fIm\fR and \fIn\fR are unsigned decimal integers with
-permissible values from 0 to 255 inclusive.
-.SS ATOMS
-An atom is one of:
-.RS 2
-.IP \fB(\fIre\fB)\fR 6
-matches a match for \fIre\fR (\fIre\fR is any regular expression) with
-the match noted for possible reporting
-.IP \fB(?:\fIre\fB)\fR
-as previous, but does no reporting (a
-.QW non-capturing
-set of parentheses)
-.IP \fB()\fR
-matches an empty string, noted for possible reporting
-.IP \fB(?:)\fR
-matches an empty string, without reporting
-.IP \fB[\fIchars\fB]\fR
-a \fIbracket expression\fR, matching any one of the \fIchars\fR (see
-\fBBRACKET EXPRESSIONS\fR for more detail)
-.IP \fB.\fR
-matches any single character
-.IP \fB\e\fIk\fR
-matches the non-alphanumeric character \fIk\fR
-taken as an ordinary character, e.g. \fB\e\e\fR matches a backslash
-character
-.IP \fB\e\fIc\fR
-where \fIc\fR is alphanumeric (possibly followed by other characters),
-an \fIescape\fR (AREs only), see \fBESCAPES\fR below
-.IP \fB{\fR
-when followed by a character other than a digit, matches the
-left-brace character
-.QW \fB{\fR ;
-when followed by a digit, it is the beginning of a \fIbound\fR (see above)
-.IP \fIx\fR
-where \fIx\fR is a single character with no other significance,
-matches that character.
-.RE
-.SS CONSTRAINTS
-A \fIconstraint\fR matches an empty string when specific conditions
-are met. A constraint may not be followed by a quantifier. The
-simple constraints are as follows; some more constraints are described
-later, under \fBESCAPES\fR.
-.RS 2
-.TP 8
-\fB^\fR
-.
-matches at the beginning of a line
-.TP
-\fB$\fR
-.
-matches at the end of a line
-.TP
-\fB(?=\fIre\fB)\fR
-.
-\fIpositive lookahead\fR (AREs only), matches at any point where a
-substring matching \fIre\fR begins
-.TP
-\fB(?!\fIre\fB)\fR
-.
-\fInegative lookahead\fR (AREs only), matches at any point where no
-substring matching \fIre\fR begins
-.RE
-.PP
-The lookahead constraints may not contain back references (see later),
-and all parentheses within them are considered non-capturing.
-.PP
-An RE may not end with
-.QW \fB\e\fR .
-.SH "BRACKET EXPRESSIONS"
-A \fIbracket expression\fR is a list of characters enclosed in
-.QW \fB[\|]\fR .
-It normally matches any single character from the list
-(but see below). If the list begins with
-.QW \fB^\fR ,
-it matches any single character (but see below) \fInot\fR from the
-rest of the list.
-.PP
-If two characters in the list are separated by
-.QW \fB\-\fR ,
-this is shorthand for the full \fIrange\fR of characters between those two
-(inclusive) in the collating sequence, e.g.
-.QW \fB[0\-9]\fR
-in Unicode matches any conventional decimal digit. Two ranges may not share an
-endpoint, so e.g.
-.QW \fBa\-c\-e\fR
-is illegal. Ranges in Tcl always use the
-Unicode collating sequence, but other programs may use other collating
-sequences and this can be a source of incompatibility between programs.
-.PP
-To include a literal \fB]\fR or \fB\-\fR in the list, the simplest
-method is to enclose it in \fB[.\fR and \fB.]\fR to make it a
-collating element (see below). Alternatively, make it the first
-character (following a possible
-.QW \fB^\fR ),
-or (AREs only) precede it with
-.QW \fB\e\fR .
-Alternatively, for
-.QW \fB\-\fR ,
-make it the last character, or the second endpoint of a range. To use
-a literal \fB\-\fR as the first endpoint of a range, make it a
-collating element or (AREs only) precede it with
-.QW \fB\e\fR .
-With the exception of
-these, some combinations using \fB[\fR (see next paragraphs), and
-escapes, all other special characters lose their special significance
-within a bracket expression.
-.SS "CHARACTER CLASSES"
-Within a bracket expression, the name of a \fIcharacter class\fR
-enclosed in \fB[:\fR and \fB:]\fR stands for the list of all
-characters (not all collating elements!) belonging to that class.
-Standard character classes are:
-.IP \fBalpha\fR 8
-A letter.
-.IP \fBupper\fR 8
-An upper-case letter.
-.IP \fBlower\fR 8
-A lower-case letter.
-.IP \fBdigit\fR 8
-A decimal digit.
-.IP \fBxdigit\fR 8
-A hexadecimal digit.
-.IP \fBalnum\fR 8
-An alphanumeric (letter or digit).
-.IP \fBprint\fR 8
-A "printable" (same as graph, except also including space).
-.IP \fBblank\fR 8
-A space or tab character.
-.IP \fBspace\fR 8
-A character producing white space in displayed text.
-.IP \fBpunct\fR 8
-A punctuation character.
-.IP \fBgraph\fR 8
-A character with a visible representation (includes both \fBalnum\fR
-and \fBpunct\fR).
-.IP \fBcntrl\fR 8
-A control character.
-.PP
-A locale may provide others. A character class may not be used as an endpoint
-of a range.
-.RS
-.PP
-(\fINote:\fR the current Tcl implementation has only one locale, the Unicode
-locale, which supports exactly the above classes.)
-.RE
-.SS "BRACKETED CONSTRAINTS"
-There are two special cases of bracket expressions: the bracket
-expressions
-.QW \fB[[:<:]]\fR
-and
-.QW \fB[[:>:]]\fR
-are constraints, matching empty strings at the beginning and end of a word
-respectively.
-.\" note, discussion of escapes below references this definition of word
-A word is defined as a sequence of word characters that is neither preceded
-nor followed by word characters. A word character is an \fIalnum\fR character
-or an underscore
-.PQ \fB_\fR "" .
-These special bracket expressions are deprecated; users of AREs should use
-constraint escapes instead (see below).
-.SS "COLLATING ELEMENTS"
-Within a bracket expression, a collating element (a character, a
-multi-character sequence that collates as if it were a single
-character, or a collating-sequence name for either) enclosed in
-\fB[.\fR and \fB.]\fR stands for the sequence of characters of that
-collating element. The sequence is a single element of the bracket
-expression's list. A bracket expression in a locale that has
-multi-character collating elements can thus match more than one
-character. So (insidiously), a bracket expression that starts with
-\fB^\fR can match multi-character collating elements even if none of
-them appear in the bracket expression!
-.RS
-.PP
-(\fINote:\fR Tcl has no multi-character collating elements. This information
-is only for illustration.)
-.RE
-.PP
-For example, assume the collating sequence includes a \fBch\fR multi-character
-collating element. Then the RE
-.QW \fB[[.ch.]]*c\fR
-(zero or more
-.QW \fBch\fRs
-followed by
-.QW \fBc\fR )
-matches the first five characters of
-.QW \fBchchcc\fR .
-Also, the RE
-.QW \fB[^c]b\fR
-matches all of
-.QW \fBchb\fR
-(because
-.QW \fB[^c]\fR
-matches the multi-character
-.QW \fBch\fR ).
-.SS "EQUIVALENCE CLASSES"
-Within a bracket expression, a collating element enclosed in \fB[=\fR
-and \fB=]\fR is an equivalence class, standing for the sequences of
-characters of all collating elements equivalent to that one, including
-itself. (If there are no other equivalent collating elements, the
-treatment is as if the enclosing delimiters were
-.QW \fB[.\fR \&
-and
-.QW \fB.]\fR .)
-For example, if \fBo\fR and \fB\*(qo\fR are the members of an
-equivalence class, then
-.QW \fB[[=o=]]\fR ,
-.QW \fB[[=\*(qo=]]\fR ,
-and
-.QW \fB[o\*(qo]\fR \&
-are all synonymous. An equivalence class may not be an endpoint of a range.
-.RS
-.PP
-(\fINote:\fR Tcl implements only the Unicode locale. It does not define any
-equivalence classes. The examples above are just illustrations.)
-.RE
-.SH ESCAPES
-Escapes (AREs only), which begin with a \fB\e\fR followed by an
-alphanumeric character, come in several varieties: character entry,
-class shorthands, constraint escapes, and back references. A \fB\e\fR
-followed by an alphanumeric character but not constituting a valid
-escape is illegal in AREs. In EREs, there are no escapes: outside a
-bracket expression, a \fB\e\fR followed by an alphanumeric character
-merely stands for that character as an ordinary character, and inside
-a bracket expression, \fB\e\fR is an ordinary character. (The latter
-is the one actual incompatibility between EREs and AREs.)
-.SS "CHARACTER-ENTRY ESCAPES"
-Character-entry escapes (AREs only) exist to make it easier to specify
-non-printing and otherwise inconvenient characters in REs:
-.RS 2
-.TP 5
-\fB\ea\fR
-.
-alert (bell) character, as in C
-.TP
-\fB\eb\fR
-.
-backspace, as in C
-.TP
-\fB\eB\fR
-.
-synonym for \fB\e\fR to help reduce backslash doubling in some
-applications where there are multiple levels of backslash processing
-.TP
-\fB\ec\fIX\fR
-.
-(where \fIX\fR is any character) the character whose low-order 5 bits
-are the same as those of \fIX\fR, and whose other bits are all zero
-.TP
-\fB\ee\fR
-.
-the character whose collating-sequence name is
-.QW \fBESC\fR ,
-or failing that, the character with octal value 033
-.TP
-\fB\ef\fR
-.
-formfeed, as in C
-.TP
-\fB\en\fR
-.
-newline, as in C
-.TP
-\fB\er\fR
-.
-carriage return, as in C
-.TP
-\fB\et\fR
-.
-horizontal tab, as in C
-.TP
-\fB\eu\fIwxyz\fR
-.
-(where \fIwxyz\fR is one up to four hexadecimal digits) the Unicode
-character \fBU+\fIwxyz\fR in the local byte ordering
-.TP
-\fB\eU\fIstuvwxyz\fR
-.
-(where \fIstuvwxyz\fR is one up to eight hexadecimal digits) reserved
-for a Unicode extension up to 21 bits. The digits are parsed until the
-first non-hexadecimal character is encountered, the maximum of eight
-hexadecimal digits are reached, or an overflow would occur in the maximum
-value of \fBU+\fI10ffff\fR.
-.TP
-\fB\ev\fR
-.
-vertical tab, as in C
-.TP
-\fB\ex\fIhh\fR
-.
-(where \fIhh\fR is one or two hexadecimal digits) the character
-whose hexadecimal value is \fB0x\fIhh\fR.
-.TP
-\fB\e0\fR
-.
-the character whose value is \fB0\fR
-.TP
-\fB\e\fIxyz\fR
-.
-(where \fIxyz\fR is exactly three octal digits, and is not a \fIback
-reference\fR (see below)) the character whose octal value is
-\fB0\fIxyz\fR. The first digit must be in the range 0-3, otherwise
-the two-digit form is assumed.
-.TP
-\fB\e\fIxy\fR
-.
-(where \fIxy\fR is exactly two octal digits, and is not a \fIback
-reference\fR (see below)) the character whose octal value is
-\fB0\fIxy\fR
-.RE
-.PP
-Hexadecimal digits are
-.QR \fB0\fR \fB9\fR ,
-.QR \fBa\fR \fBf\fR ,
-and
-.QR \fBA\fR \fBF\fR .
-Octal digits are
-.QR \fB0\fR \fB7\fR .
-.PP
-The character-entry escapes are always taken as ordinary characters.
-For example, \fB\e135\fR is \fB]\fR in Unicode, but \fB\e135\fR does
-not terminate a bracket expression. Beware, however, that some
-applications (e.g., C compilers and the Tcl interpreter if the regular
-expression is not quoted with braces) interpret such sequences
-themselves before the regular-expression package gets to see them,
-which may require doubling (quadrupling, etc.) the
-.QW \fB\e\fR .
-.SS "CLASS-SHORTHAND ESCAPES"
-Class-shorthand escapes (AREs only) provide shorthands for certain
-commonly-used character classes:
-.RS 2
-.TP 10
-\fB\ed\fR
-.
-\fB[[:digit:]]\fR
-.TP
-\fB\es\fR
-.
-\fB[[:space:]]\fR
-.TP
-\fB\ew\fR
-.
-\fB[[:alnum:]_]\fR (note underscore)
-.TP
-\fB\eD\fR
-.
-\fB[^[:digit:]]\fR
-.TP
-\fB\eS\fR
-.
-\fB[^[:space:]]\fR
-.TP
-\fB\eW\fR
-.
-\fB[^[:alnum:]_]\fR (note underscore)
-.RE
-.PP
-Within bracket expressions,
-.QW \fB\ed\fR ,
-.QW \fB\es\fR ,
-and
-.QW \fB\ew\fR \&
-lose their outer brackets, and
-.QW \fB\eD\fR ,
-.QW \fB\eS\fR ,
-and
-.QW \fB\eW\fR \&
-are illegal. (So, for example,
-.QW \fB[a-c\ed]\fR
-is equivalent to
-.QW \fB[a-c[:digit:]]\fR .
-Also,
-.QW \fB[a-c\eD]\fR ,
-which is equivalent to
-.QW \fB[a-c^[:digit:]]\fR ,
-is illegal.)
-.SS "CONSTRAINT ESCAPES"
-A constraint escape (AREs only) is a constraint, matching the empty
-string if specific conditions are met, written as an escape:
-.RS 2
-.TP 6
-\fB\eA\fR
-.
-matches only at the beginning of the string (see \fBMATCHING\fR,
-below, for how this differs from
-.QW \fB^\fR )
-.TP
-\fB\em\fR
-.
-matches only at the beginning of a word
-.TP
-\fB\eM\fR
-.
-matches only at the end of a word
-.TP
-\fB\ey\fR
-.
-matches only at the beginning or end of a word
-.TP
-\fB\eY\fR
-.
-matches only at a point that is not the beginning or end of a word
-.TP
-\fB\eZ\fR
-.
-matches only at the end of the string (see \fBMATCHING\fR, below, for
-how this differs from
-.QW \fB$\fR )
-.TP
-\fB\e\fIm\fR
-.
-(where \fIm\fR is a nonzero digit) a \fIback reference\fR, see below
-.TP
-\fB\e\fImnn\fR
-.
-(where \fIm\fR is a nonzero digit, and \fInn\fR is some more digits,
-and the decimal value \fImnn\fR is not greater than the number of
-closing capturing parentheses seen so far) a \fIback reference\fR, see
-below
-.RE
-.PP
-A word is defined as in the specification of
-.QW \fB[[:<:]]\fR
-and
-.QW \fB[[:>:]]\fR
-above. Constraint escapes are illegal within bracket expressions.
-.SS "BACK REFERENCES"
-A back reference (AREs only) matches the same string matched by the
-parenthesized subexpression specified by the number, so that (e.g.)
-.QW \fB([bc])\e1\fR
-matches
-.QW \fBbb\fR
-or
-.QW \fBcc\fR
-but not
-.QW \fBbc\fR .
-The subexpression must entirely precede the back reference in the RE.
-Subexpressions are numbered in the order of their leading parentheses.
-Non-capturing parentheses do not define subexpressions.
-.PP
-There is an inherent historical ambiguity between octal
-character-entry escapes and back references, which is resolved by
-heuristics, as hinted at above. A leading zero always indicates an
-octal escape. A single non-zero digit, not followed by another digit,
-is always taken as a back reference. A multi-digit sequence not
-starting with a zero is taken as a back reference if it comes after a
-suitable subexpression (i.e. the number is in the legal range for a
-back reference), and otherwise is taken as octal.
-.SH "METASYNTAX"
-In addition to the main syntax described above, there are some special
-forms and miscellaneous syntactic facilities available.
-.PP
-Normally the flavor of RE being used is specified by
-application-dependent means. However, this can be overridden by a
-\fIdirector\fR. If an RE of any flavor begins with
-.QW \fB***:\fR ,
-the rest of the RE is an ARE. If an RE of any flavor begins with
-.QW \fB***=\fR ,
-the rest of the RE is taken to be a literal string, with
-all characters considered ordinary characters.
-.PP
-An ARE may begin with \fIembedded options\fR: a sequence
-\fB(?\fIxyz\fB)\fR (where \fIxyz\fR is one or more alphabetic
-characters) specifies options affecting the rest of the RE. These
-supplement, and can override, any options specified by the
-application. The available option letters are:
-.RS 2
-.TP 3
-\fBb\fR
-.
-rest of RE is a BRE
-.TP 3
-\fBc\fR
-.
-case-sensitive matching (usual default)
-.TP 3
-\fBe\fR
-.
-rest of RE is an ERE
-.TP 3
-\fBi\fR
-.
-case-insensitive matching (see \fBMATCHING\fR, below)
-.TP 3
-\fBm\fR
-.
-historical synonym for \fBn\fR
-.TP 3
-\fBn\fR
-.
-newline-sensitive matching (see \fBMATCHING\fR, below)
-.TP 3
-\fBp\fR
-.
-partial newline-sensitive matching (see \fBMATCHING\fR, below)
-.TP 3
-\fBq\fR
-.
-rest of RE is a literal
-.PQ quoted
-string, all ordinary characters
-.TP 3
-\fBs\fR
-.
-non-newline-sensitive matching (usual default)
-.TP 3
-\fBt\fR
-.
-tight syntax (usual default; see below)
-.TP 3
-\fBw\fR
-.
-inverse partial newline-sensitive
-.PQ weird
-matching (see \fBMATCHING\fR, below)
-.TP 3
-\fBx\fR
-.
-expanded syntax (see below)
-.RE
-.PP
-Embedded options take effect at the \fB)\fR terminating the sequence.
-They are available only at the start of an ARE, and may not be used
-later within it.
-.PP
-In addition to the usual (\fItight\fR) RE syntax, in which all
-characters are significant, there is an \fIexpanded\fR syntax,
-available in all flavors of RE with the \fB\-expanded\fR switch, or in
-AREs with the embedded x option. In the expanded syntax, white-space
-characters are ignored and all characters between a \fB#\fR and the
-following newline (or the end of the RE) are ignored, permitting
-paragraphing and commenting a complex RE. There are three exceptions
-to that basic rule:
-.IP \(bu 3
-a white-space character or
-.QW \fB#\fR
-preceded by
-.QW \fB\e\fR
-is retained
-.IP \(bu 3
-white space or
-.QW \fB#\fR
-within a bracket expression is retained
-.IP \(bu 3
-white space and comments are illegal within multi-character symbols
-like the ARE
-.QW \fB(?:\fR
-or the BRE
-.QW \fB\e(\fR
-.PP
-Expanded-syntax white-space characters are blank, tab, newline, and
-any character that belongs to the \fIspace\fR character class.
-.PP
-Finally, in an ARE, outside bracket expressions, the sequence
-.QW \fB(?#\fIttt\fB)\fR
-(where \fIttt\fR is any text not containing a
-.QW \fB)\fR )
-is a comment, completely ignored. Again, this is not
-allowed between the characters of multi-character symbols like
-.QW \fB(?:\fR .
-Such comments are more a historical artifact than a useful facility,
-and their use is deprecated; use the expanded syntax instead.
-.PP
-\fINone\fR of these metasyntax extensions is available if the
-application (or an initial
-.QW \fB***=\fR
-director) has specified that the
-user's input be treated as a literal string rather than as an RE.
-.SH MATCHING
-In the event that an RE could match more than one substring of a given
-string, the RE matches the one starting earliest in the string. If
-the RE could match more than one substring starting at that point, its
-choice is determined by its \fIpreference\fR: either the longest
-substring, or the shortest.
-.PP
-Most atoms, and all constraints, have no preference. A parenthesized
-RE has the same preference (possibly none) as the RE. A quantified
-atom with quantifier \fB{\fIm\fB}\fR or \fB{\fIm\fB}?\fR has the same
-preference (possibly none) as the atom itself. A quantified atom with
-other normal quantifiers (including \fB{\fIm\fB,\fIn\fB}\fR with
-\fIm\fR equal to \fIn\fR) prefers longest match. A quantified atom
-with other non-greedy quantifiers (including \fB{\fIm\fB,\fIn\fB}?\fR
-with \fIm\fR equal to \fIn\fR) prefers shortest match. A branch has
-the same preference as the first quantified atom in it which has a
-preference. An RE consisting of two or more branches connected by the
-\fB|\fR operator prefers longest match.
-.PP
-Subject to the constraints imposed by the rules for matching the whole
-RE, subexpressions also match the longest or shortest possible
-substrings, based on their preferences, with subexpressions starting
-earlier in the RE taking priority over ones starting later. Note that
-outer subexpressions thus take priority over their component
-subexpressions.
-.PP
-The quantifiers \fB{1,1}\fR and \fB{1,1}?\fR can be used to
-force longest and shortest preference, respectively, on a
-subexpression or a whole RE.
-.RS
-.PP
-\fBNOTE:\fR This means that you can usually make a RE be non-greedy overall by
-putting \fB{1,1}?\fR after one of the first non-constraint atoms or
-parenthesized sub-expressions in it. \fIIt pays to experiment\fR with the
-placing of this non-greediness override on a suitable range of input texts
-when you are writing a RE if you are using this level of complexity.
-.PP
-For example, this regular expression is non-greedy, and will match the
-shortest substring possible given that
-.QW \fBabc\fR
-will be matched as early as possible (the quantifier does not change that):
-.PP
-.CS
-ab{1,1}?c.*x.*cba
-.CE
-.PP
-The atom
-.QW \fBa\fR
-has no greediness preference, we explicitly give one for
-.QW \fBb\fR ,
-and the remaining quantifiers are overridden to be non-greedy by the preceding
-non-greedy quantifier.
-.RE
-.PP
-Match lengths are measured in characters, not collating elements. An
-empty string is considered longer than no match at all. For example,
-.QW \fBbb*\fR
-matches the three middle characters of
-.QW \fBabbbc\fR ,
-.QW \fB(week|wee)(night|knights)\fR
-matches all ten characters of
-.QW \fBweeknights\fR ,
-when
-.QW \fB(.*).*\fR
-is matched against
-.QW \fBabc\fR
-the parenthesized subexpression matches all three characters, and when
-.QW \fB(a*)*\fR
-is matched against
-.QW \fBbc\fR
-both the whole RE and the parenthesized subexpression match an empty string.
-.PP
-If case-independent matching is specified, the effect is much as if
-all case distinctions had vanished from the alphabet. When an
-alphabetic that exists in multiple cases appears as an ordinary
-character outside a bracket expression, it is effectively transformed
-into a bracket expression containing both cases, so that \fBx\fR
-becomes
-.QW \fB[xX]\fR .
-When it appears inside a bracket expression,
-all case counterparts of it are added to the bracket expression, so
-that
-.QW \fB[x]\fR
-becomes
-.QW \fB[xX]\fR
-and
-.QW \fB[^x]\fR
-becomes
-.QW \fB[^xX]\fR .
-.PP
-If newline-sensitive matching is specified, \fB.\fR and bracket
-expressions using \fB^\fR will never match the newline character (so
-that matches will never cross newlines unless the RE explicitly
-arranges it) and \fB^\fR and \fB$\fR will match the empty string after
-and before a newline respectively, in addition to matching at
-beginning and end of string respectively. ARE \fB\eA\fR and \fB\eZ\fR
-continue to match beginning or end of string \fIonly\fR.
-.PP
-If partial newline-sensitive matching is specified, this affects
-\fB.\fR and bracket expressions as with newline-sensitive matching,
-but not \fB^\fR and \fB$\fR.
-.PP
-If inverse partial newline-sensitive matching is specified, this
-affects \fB^\fR and \fB$\fR as with newline-sensitive matching, but
-not \fB.\fR and bracket expressions. This is not very useful but is
-provided for symmetry.
-.SH "LIMITS AND COMPATIBILITY"
-No particular limit is imposed on the length of REs. Programs
-intended to be highly portable should not employ REs longer than 256
-bytes, as a POSIX-compliant implementation can refuse to accept such
-REs.
-.PP
-The only feature of AREs that is actually incompatible with POSIX EREs
-is that \fB\e\fR does not lose its special significance inside bracket
-expressions. All other ARE features use syntax which is illegal or
-has undefined or unspecified effects in POSIX EREs; the \fB***\fR
-syntax of directors likewise is outside the POSIX syntax for both BREs
-and EREs.
-.PP
-Many of the ARE extensions are borrowed from Perl, but some have been
-changed to clean them up, and a few Perl extensions are not present.
-Incompatibilities of note include
-.QW \fB\eb\fR ,
-.QW \fB\eB\fR ,
-the lack of special treatment for a trailing newline, the addition of
-complemented bracket expressions to the things affected by
-newline-sensitive matching, the restrictions on parentheses and back
-references in lookahead constraints, and the longest/shortest-match
-(rather than first-match) matching semantics.
-.PP
-The matching rules for REs containing both normal and non-greedy
-quantifiers have changed since early beta-test versions of this
-package. (The new rules are much simpler and cleaner, but do not work
-as hard at guessing the user's real intentions.)
-.PP
-Henry Spencer's original 1986 \fIregexp\fR package, still in
-widespread use (e.g., in pre-8.1 releases of Tcl), implemented an
-early version of today's EREs. There are four incompatibilities
-between \fIregexp\fR's near-EREs
-.PQ RREs " for short"
-and AREs. In roughly increasing order of significance:
-.IP \(bu 3
-In AREs, \fB\e\fR followed by an alphanumeric character is either an
-escape or an error, while in RREs, it was just another way of writing
-the alphanumeric. This should not be a problem because there was no
-reason to write such a sequence in RREs.
-.IP \(bu 3
-\fB{\fR followed by a digit in an ARE is the beginning of a bound,
-while in RREs, \fB{\fR was always an ordinary character. Such
-sequences should be rare, and will often result in an error because
-following characters will not look like a valid bound.
-.IP \(bu 3
-In AREs, \fB\e\fR remains a special character within
-.QW \fB[\|]\fR ,
-so a literal \fB\e\fR within \fB[\|]\fR must be written
-.QW \fB\e\e\fR .
-\fB\e\e\fR also gives a literal \fB\e\fR within \fB[\|]\fR in RREs,
-but only truly paranoid programmers routinely doubled the backslash.
-.IP \(bu 3
-AREs report the longest/shortest match for the RE, rather than the
-first found in a specified search order. This may affect some RREs
-which were written in the expectation that the first match would be
-reported. (The careful crafting of RREs to optimize the search order
-for fast matching is obsolete (AREs examine all possible matches in
-parallel, and their performance is largely insensitive to their
-complexity) but cases where the search order was exploited to
-deliberately find a match which was \fInot\fR the longest/shortest
-will need rewriting.)
-.SH "BASIC REGULAR EXPRESSIONS"
-BREs differ from EREs in several respects.
-.QW \fB|\fR ,
-.QW \fB+\fR ,
-and \fB?\fR are ordinary characters and there is no equivalent for their
-functionality. The delimiters for bounds are \fB\e{\fR and
-.QW \fB\e}\fR ,
-with \fB{\fR and \fB}\fR by themselves ordinary characters. The
-parentheses for nested subexpressions are \fB\e(\fR and
-.QW \fB\e)\fR ,
-with \fB(\fR and \fB)\fR by themselves ordinary
-characters. \fB^\fR is an ordinary character except at the beginning
-of the RE or the beginning of a parenthesized subexpression, \fB$\fR
-is an ordinary character except at the end of the RE or the end of a
-parenthesized subexpression, and \fB*\fR is an ordinary character if
-it appears at the beginning of the RE or the beginning of a
-parenthesized subexpression (after a possible leading
-.QW \fB^\fR ).
-Finally, single-digit back references are available, and \fB\e<\fR and
-\fB\e>\fR are synonyms for
-.QW \fB[[:<:]]\fR
-and
-.QW \fB[[:>:]]\fR
-respectively; no other escapes are available.
-.SH "SEE ALSO"
-RegExp(3), regexp(n), regsub(n), lsearch(n), switch(n), text(n)
-.SH KEYWORDS
-match, regular expression, string
-.\" Local Variables:
-.\" mode: nroff
-.\" End: