Diffstat (limited to 'doc/regexp.n')
-rw-r--r-- | doc/regexp.n | 140 |
1 files changed, 45 insertions, 95 deletions
diff --git a/doc/regexp.n b/doc/regexp.n
index 0d08dcf..e19ae65 100644
--- a/doc/regexp.n
+++ b/doc/regexp.n
@@ -4,7 +4,7 @@
 '\" See the file "license.terms" for information on usage and redistribution
 '\" of this file, and for a DISCLAIMER OF ALL WARRANTIES.
 '\"
-'\" RCS: @(#) $Id: regexp.n,v 1.3 1999/04/16 00:46:35 stanton Exp $
+'\" RCS: @(#) $Id: regexp.n,v 1.4 1999/04/30 22:45:01 stanton Exp $
 '\"
 .so man.macros
 .TH regexp n 8.1 Tcl "Tcl Built-In Commands"
@@ -204,8 +204,7 @@ see ESCAPES below
 .TP
 \fB{\fR
 when followed by a character other than a digit,
-matches the character
-`\fB{\fR';
+matches the left-brace character `\fB{\fR';
 when followed by a digit, it is the beginning of a
 \fIbound\fR (see above)
 .TP
@@ -239,20 +238,16 @@ where no substring matching \fIre\fR begins
 The lookahead constraints may not contain back references (see later),
 and all parentheses within them are considered non-capturing.
 .PP
-An RE may not end with
-`\fB\e\fR'.
+An RE may not end with `\fB\e\fR'.
 .SH "BRACKET EXPRESSIONS"
-A \fIbracket expression\fR is a list of characters enclosed in
-`\fB[\|]\fR'.
+A \fIbracket expression\fR is a list of characters enclosed in `\fB[\|]\fR'.
 It normally matches any single character from the list (but see below).
-If the list begins with
-`\fB^\fR',
+If the list begins with `\fB^\fR',
 it matches any single character
 (but see below) \fInot\fR
 from the rest of the list.
 .PP
-If two characters in the list are separated by
-`\fB\-\fR',
+If two characters in the list are separated by `\fB\-\fR',
 this is shorthand for the full \fIrange\fR of characters
 between those two (inclusive) in the
 collating sequence,
@@ -279,20 +274,16 @@ and
 to make it a collating element (see below).
 Alternatively,
 make it the first character
-(following a possible
-`\fB^\fR'),
-or (AREs only) precede it with
-`\fB\e\fR'.
-Alternatively, for
-`\fB\-\fR',
+(following a possible `\fB^\fR'),
+or (AREs only) precede it with `\fB\e\fR'.
+Alternatively, for `\fB\-\fR',
 make it the last character,
 or the second endpoint of a range.
 To use a literal
 \fB\-\fR
 as the first endpoint of a range,
 make it a collating element
-or (AREs only) precede it with
-`\fB\e\fR'.
+or (AREs only) precede it with `\fB\e\fR'.
 With the exception of these, some combinations using
 \fB[\fR
 (see next
@@ -324,12 +315,10 @@ multi-character collating element,
 then the RE
 \fB[[.ch.]]*c\fR
 matches the first five characters
-of
-`\fBchchcc\fR',
+of `\fBchchcc\fR',
 and the RE
 \fB[^c]b\fR
-matches all of
-`\fBchb\fR'.
+matches all of `\fBchb\fR'.
 .PP
 Within a bracket expression, a collating element enclosed in
 \fB[=\fR
@@ -338,20 +327,15 @@ and
 is an equivalence class, standing for the sequences of characters
 of all collating elements equivalent to that one, including itself.
 (If there are no other equivalent collating elements,
-the treatment is as if the enclosing delimiters were
-`\fB[.\fR'\&
-and
-`\fB.]\fR'.)
+the treatment is as if the enclosing delimiters were `\fB[.\fR'\&
+and `\fB.]\fR'.)
 For example, if
 \fBo\fR
 and
 \fB\o'o^'\fR
 are the members of an equivalence class,
-then
-`\fB[[=o=]]\fR',
-`\fB[[=\o'o^'=]]\fR',
-and
-`\fB[o\o'o^']\fR'\&
+then `\fB[[=o=]]\fR', `\fB[[=\o'o^'=]]\fR',
+and `\fB[o\o'o^']\fR'\&
 are all synonymous.
 An equivalence class may not be an endpoint
 of a range.
@@ -448,8 +432,7 @@ and whose other bits are all zero
 .TP
 \fB\ee\fR
 the character whose collating-sequence name
-is
-`\fBESC\fR',
+is `\fBESC\fR',
 or failing that, the character with octal value 033
 .TP
 \fB\ef\fR
@@ -513,13 +496,9 @@ the character whose octal value is
 \fB0\fIxyz\fR
 .RE
 .PP
-Hexadecimal digits are
-`\fB0\fR'-`\fB9\fR',
-`\fBa\fR'-`\fBf\fR',
-and
-`\fBA\fR'-`\fBF\fR'.
-Octal digits are
-`\fB0\fR'-`\fB7\fR'.
+Hexadecimal digits are `\fB0\fR'-`\fB9\fR', `\fBa\fR'-`\fBf\fR',
+and `\fBA\fR'-`\fBF\fR'.
+Octal digits are `\fB0\fR'-`\fB7\fR'.
 .PP
 The character-entry escapes are always taken as ordinary characters.
 For example,
@@ -532,8 +511,7 @@ but does not terminate a bracket expression.
 Beware, however, that some applications
 (e.g., C compilers) interpret such sequences themselves
 before the regular-expression package
-gets to see them, which may require doubling (quadrupling, etc.) the
-`\fB\e\fR'.
+gets to see them, which may require doubling (quadrupling, etc.) the `\fB\e\fR'.
 .PP
 Class-shorthand escapes (AREs only) provide shorthands for certain commonly-used
 character classes:
@@ -560,17 +538,11 @@ character classes:
 (note underscore)
 .RE
 .PP
-Within bracket expressions,
-`\fB\ed\fR',
-`\fB\es\fR',
-and
-`\fB\ew\fR'\&
+Within bracket expressions, `\fB\ed\fR', `\fB\es\fR',
+and `\fB\ew\fR'\&
 lose their outer brackets,
-and
-`\fB\eD\fR',
-`\fB\eS\fR',
-and
-`\fB\eW\fR'\&
+and `\fB\eD\fR', `\fB\eS\fR',
+and `\fB\eW\fR'\&
 are illegal.
 .PP
 A constraint escape (AREs only) is a constraint,
@@ -580,8 +552,7 @@ written as an escape:
 .TP 6
 \fB\eA\fR
 matches only at the beginning of the string
-(see MATCHING, below, for how this differs from
-`\fB^\fR')
+(see MATCHING, below, for how this differs from `\fB^\fR')
 .TP
 \fB\em\fR
 matches only at the beginning of a word
@@ -597,8 +568,7 @@ matches only at a point which is not the beginning or end of a word
 .TP
 \fB\eZ\fR
 matches only at the end of the string
-(see MATCHING, below, for how this differs from
-`\fB$\fR')
+(see MATCHING, below, for how this differs from `\fB$\fR')
 .TP
 \fB\e\fIm\fR
 (where
@@ -632,8 +602,7 @@ matches
 \fBbb\fR
 or
 \fBcc\fR
-but not
-`\fBbc\fR'.
+but not `\fBbc\fR'.
 The subexpression must entirely precede the back reference in the RE.
 Subexpressions are numbered in the order of their leading parentheses.
 Non-capturing parentheses do not define subexpressions.
@@ -655,11 +624,9 @@ forms and miscellaneous syntactic facilities available.
 Normally the flavor of RE being used is specified by
 application-dependent means.
 However, this can be overridden by a \fIdirector\fR.
-If an RE of any flavor begins with
-`\fB***:\fR',
+If an RE of any flavor begins with `\fB***:\fR',
 the rest of the RE is an ARE.
-If an RE of any flavor begins with
-`\fB***=\fR',
+If an RE of any flavor begins with `\fB***=\fR',
 the rest of the RE is taken to be a literal string,
 with all characters considered ordinary characters.
 .PP
@@ -750,17 +717,14 @@ if at all, is application-specific;
 expanded syntax is primarily a scripting facility.
 .PP
 Finally, in an ARE,
-outside bracket expressions, the sequence
-`\fB(?#\fIttt\fB)\fR'
+outside bracket expressions, the sequence `\fB(?#\fIttt\fB)\fR'
 (where
 \fIttt\fR
-is any text not containing a
-`\fB)\fR')
+is any text not containing a `\fB)\fR')
 is a comment, completely ignored.
 Again, this is not allowed
 between the characters of
-multi-character symbols like
-`\fB(?:\fR'.
+multi-character symbols like `\fB(?:\fR'.
 Such comments are more a historical artifact than a useful facility,
 and their use is deprecated;
 use the expanded syntax instead.
@@ -825,11 +789,9 @@ Match lengths are measured in characters, not collating elements.
 An empty string is considered longer than no match at all.
 For example,
 \fBbb*\fR
-matches the three middle characters of
-`\fBabbbc\fR',
+matches the three middle characters of `\fBabbbc\fR',
 \fB(week|wee)(night|knights)\fR
-matches all ten characters of
-`\fBweeknights\fR',
+matches all ten characters of `\fBweeknights\fR',
 when
 \fB(.*).*\fR
 is matched against
@@ -851,8 +813,7 @@ ordinary character outside a bracket expression, it is effectively
 transformed into a bracket expression containing both cases,
 so that
 \fBx\fR
-becomes
-`\fB[xX]\fR'.
+becomes `\fB[xX]\fR'.
 When it appears inside a bracket expression, all case counterparts
 of it are added to the bracket expression, so that
 \fB[x]\fR
@@ -860,8 +821,7 @@ becomes
 \fB[xX]\fR
 and
 \fB[^x]\fR
-becomes
-`\fB[^xX]\fR'.
+becomes `\fB[^xX]\fR'.
 .PP
 If newline-sensitive matching is specified,
 \fB.\fR
@@ -889,8 +849,7 @@ this affects
 and bracket expressions as with newline-sensitive matching,
 but not
 \fB^\fR
-and
-`\fB$\fR'.
+and `\fB$\fR'.
 .PP
 If inverse partial newline-sensitive matching is specified,
 this affects
@@ -923,9 +882,7 @@ syntax for both BREs and EREs.
 .PP
 Many of the ARE extensions are borrowed from Perl, but some have
 been changed to clean them up, and a few Perl extensions are not present.
-Incompatibilities of note include
-`\fB\eb\fR',
-`\fB\eB\fR',
+Incompatibilities of note include `\fB\eb\fR', `\fB\eB\fR',
 the lack of special treatment for a trailing newline,
 the addition of complemented bracket expressions to the things
 affected by newline-sensitive matching,
@@ -965,14 +922,12 @@ will not look like a valid bound.
 .PP
 In AREs,
 \fB\e\fR
-remains a special character within
-`\fB[\|]\fR',
+remains a special character within `\fB[\|]\fR',
 so a literal
 \fB\e\fR
 within
 \fB[\|]\fR
-must be written
-`\fB\e\e\fR'.
+must be written `\fB\e\e\fR'.
 \fB\e\e\fR
 also gives a literal
 \fB\e\fR
@@ -993,17 +948,14 @@ find a match which was \fInot\fR the longest/shortest
 will need rewriting.)
 .RE
 .SH "BASIC REGULAR EXPRESSIONS"
-BREs differ from EREs in several respects.
-`\fB|\fR',
-`\fB+\fR',
+BREs differ from EREs in several respects. `\fB|\fR', `\fB+\fR',
 and
 \fB?\fR
 are ordinary characters and there is no equivalent
 for their functionality.
 The delimiters for bounds are
 \fB\e{\fR
-and
-`\fB\e}\fR',
+and `\fB\e}\fR',
 with
 \fB{\fR
 and
@@ -1011,8 +963,7 @@ and
 by themselves ordinary characters.
 The parentheses for nested subexpressions are
 \fB\e(\fR
-and
-`\fB\e)\fR',
+and `\fB\e)\fR',
 with
 \fB(\fR
 and
@@ -1028,8 +979,7 @@ and
 \fB*\fR
 is an ordinary character if it appears at the beginning of the
 RE or the beginning of a parenthesized subexpression
-(after a possible leading
-`\fB^\fR').
+(after a possible leading `\fB^\fR').
 Finally,
 single-digit back references are available,
 and
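The MATCHING paragraph touched by this patch states that `bb*` matches the three middle characters of `abbbc` and that `(week|wee)(night|knights)` matches all ten characters of `weeknights`. Python's `re` module is a different engine: it tries alternatives in order and returns the first match it finds, so `re.search` reports `weeknight` for the second case. The sketch below emulates only the overall leftmost-longest rule on top of `re`, using a hypothetical helper named `leftmost_longest`. It takes the earliest starting position that matches at all, then the longest match from there, and counts an empty match as better than none. It does not reproduce Tcl's rules for subexpression lengths or for REs that prefer shortest matches, and it costs O(n²) `fullmatch` calls, so treat it as an illustration rather than a substitute for Tcl's `regexp`.

```python
import re


def leftmost_longest(pattern, text):
    """Return (start, end) of the leftmost-longest match of pattern in text, or None.

    Hypothetical helper: a brute-force emulation of the overall-match rule,
    not Tcl's engine. The earliest start wins; among matches at that start,
    the longest wins; an empty match still beats no match at all.
    """
    rx = re.compile(pattern)
    for start in range(len(text) + 1):
        # Try end points from longest to shortest, so the first full match
        # found at this start position is the longest one.
        for end in range(len(text), start - 1, -1):
            if rx.fullmatch(text, start, end):
                return start, end
    return None


if __name__ == "__main__":
    text = "weeknights"
    pat = r"(week|wee)(night|knights)"
    # Python's backtracking engine settles on the first alternative that works.
    print(re.search(pat, text).group())   # weeknight
    start, end = leftmost_longest(pat, text)
    print(text[start:end])                # weeknights
    print(leftmost_longest(r"bb*", "abbbc"))  # (1, 4)
```

For `bb*` both approaches agree; they diverge when an earlier alternative yields a shorter overall match, as with `weeknights`.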