Diffstat (limited to 'doc/regexp.n')
-rw-r--r-- doc/regexp.n 140
1 files changed, 45 insertions, 95 deletions
diff --git a/doc/regexp.n b/doc/regexp.n
index 0d08dcf..e19ae65 100644
--- a/doc/regexp.n
+++ b/doc/regexp.n
@@ -4,7 +4,7 @@
'\" See the file "license.terms" for information on usage and redistribution
'\" of this file, and for a DISCLAIMER OF ALL WARRANTIES.
'\"
-'\" RCS: @(#) $Id: regexp.n,v 1.3 1999/04/16 00:46:35 stanton Exp $
+'\" RCS: @(#) $Id: regexp.n,v 1.4 1999/04/30 22:45:01 stanton Exp $
'\"
.so man.macros
.TH regexp n 8.1 Tcl "Tcl Built-In Commands"
@@ -204,8 +204,7 @@ see ESCAPES below
.TP
\fB{\fR
when followed by a character other than a digit,
-matches the character
-`\fB{\fR';
+matches the left-brace character `\fB{\fR';
when followed by a digit, it is the beginning of a
\fIbound\fR (see above)
.TP
@@ -239,20 +238,16 @@ where no substring matching \fIre\fR begins
The lookahead constraints may not contain back references (see later),
and all parentheses within them are considered non-capturing.
.PP
-An RE may not end with
-`\fB\e\fR'.
+An RE may not end with `\fB\e\fR'.
.SH "BRACKET EXPRESSIONS"
-A \fIbracket expression\fR is a list of characters enclosed in
-`\fB[\|]\fR'.
+A \fIbracket expression\fR is a list of characters enclosed in `\fB[\|]\fR'.
It normally matches any single character from the list (but see below).
-If the list begins with
-`\fB^\fR',
+If the list begins with `\fB^\fR',
it matches any single character
(but see below) \fInot\fR from the rest of the list.
.PP
-If two characters in the list are separated by
-`\fB\-\fR',
+If two characters in the list are separated by `\fB\-\fR',
this is shorthand
for the full \fIrange\fR of characters between those two (inclusive) in the
collating sequence,
@@ -279,20 +274,16 @@ and
to make it a collating element (see below).
Alternatively,
make it the first character
-(following a possible
-`\fB^\fR'),
-or (AREs only) precede it with
-`\fB\e\fR'.
-Alternatively, for
-`\fB\-\fR',
+(following a possible `\fB^\fR'),
+or (AREs only) precede it with `\fB\e\fR'.
+Alternatively, for `\fB\-\fR',
make it the last character,
or the second endpoint of a range.
To use a literal
\fB\-\fR
as the first endpoint of a range,
make it a collating element
-or (AREs only) precede it with
-`\fB\e\fR'.
+or (AREs only) precede it with `\fB\e\fR'.
With the exception of these, some combinations using
\fB[\fR
(see next
@@ -324,12 +315,10 @@ multi-character collating element,
then the RE
\fB[[.ch.]]*c\fR
matches the first five characters
-of
-`\fBchchcc\fR',
+of `\fBchchcc\fR',
and the RE
\fB[^c]b\fR
-matches all of
-`\fBchb\fR'.
+matches all of `\fBchb\fR'.
.PP
Within a bracket expression, a collating element enclosed in
\fB[=\fR
@@ -338,20 +327,15 @@ and
is an equivalence class, standing for the sequences of characters
of all collating elements equivalent to that one, including itself.
(If there are no other equivalent collating elements,
-the treatment is as if the enclosing delimiters were
-`\fB[.\fR'\&
-and
-`\fB.]\fR'.)
+the treatment is as if the enclosing delimiters were `\fB[.\fR'\&
+and `\fB.]\fR'.)
For example, if
\fBo\fR
and
\fB\o'o^'\fR
are the members of an equivalence class,
-then
-`\fB[[=o=]]\fR',
-`\fB[[=\o'o^'=]]\fR',
-and
-`\fB[o\o'o^']\fR'\&
+then `\fB[[=o=]]\fR', `\fB[[=\o'o^'=]]\fR',
+and `\fB[o\o'o^']\fR'\&
are all synonymous.
An equivalence class may not be an endpoint
of a range.
@@ -448,8 +432,7 @@ and whose other bits are all zero
.TP
\fB\ee\fR
the character whose collating-sequence name
-is
-`\fBESC\fR',
+is `\fBESC\fR',
or failing that, the character with octal value 033
.TP
\fB\ef\fR
@@ -513,13 +496,9 @@ the character whose octal value is
\fB0\fIxyz\fR
.RE
.PP
-Hexadecimal digits are
-`\fB0\fR'-`\fB9\fR',
-`\fBa\fR'-`\fBf\fR',
-and
-`\fBA\fR'-`\fBF\fR'.
-Octal digits are
-`\fB0\fR'-`\fB7\fR'.
+Hexadecimal digits are `\fB0\fR'-`\fB9\fR', `\fBa\fR'-`\fBf\fR',
+and `\fBA\fR'-`\fBF\fR'.
+Octal digits are `\fB0\fR'-`\fB7\fR'.
.PP
The character-entry escapes are always taken as ordinary characters.
For example,
@@ -532,8 +511,7 @@ but
does not terminate a bracket expression.
Beware, however, that some applications (e.g., C compilers) interpret
such sequences themselves before the regular-expression package
-gets to see them, which may require doubling (quadrupling, etc.) the
-`\fB\e\fR'.
+gets to see them, which may require doubling (quadrupling, etc.) the `\fB\e\fR'.
.PP
Class-shorthand escapes (AREs only) provide shorthands for certain commonly-used
character classes:
@@ -560,17 +538,11 @@ character classes:
(note underscore)
.RE
.PP
-Within bracket expressions,
-`\fB\ed\fR',
-`\fB\es\fR',
-and
-`\fB\ew\fR'\&
+Within bracket expressions, `\fB\ed\fR', `\fB\es\fR',
+and `\fB\ew\fR'\&
lose their outer brackets,
-and
-`\fB\eD\fR',
-`\fB\eS\fR',
-and
-`\fB\eW\fR'\&
+and `\fB\eD\fR', `\fB\eS\fR',
+and `\fB\eW\fR'\&
are illegal.
.PP
A constraint escape (AREs only) is a constraint,
@@ -580,8 +552,7 @@ written as an escape:
.TP 6
\fB\eA\fR
matches only at the beginning of the string
-(see MATCHING, below, for how this differs from
-`\fB^\fR')
+(see MATCHING, below, for how this differs from `\fB^\fR')
.TP
\fB\em\fR
matches only at the beginning of a word
@@ -597,8 +568,7 @@ matches only at a point which is not the beginning or end of a word
.TP
\fB\eZ\fR
matches only at the end of the string
-(see MATCHING, below, for how this differs from
-`\fB$\fR')
+(see MATCHING, below, for how this differs from `\fB$\fR')
.TP
\fB\e\fIm\fR
(where
@@ -632,8 +602,7 @@ matches
\fBbb\fR
or
\fBcc\fR
-but not
-`\fBbc\fR'.
+but not `\fBbc\fR'.
The subexpression must entirely precede the back reference in the RE.
Subexpressions are numbered in the order of their leading parentheses.
Non-capturing parentheses do not define subexpressions.
@@ -655,11 +624,9 @@ forms and miscellaneous syntactic facilities available.
Normally the flavor of RE being used is specified by
application-dependent means.
However, this can be overridden by a \fIdirector\fR.
-If an RE of any flavor begins with
-`\fB***:\fR',
+If an RE of any flavor begins with `\fB***:\fR',
the rest of the RE is an ARE.
-If an RE of any flavor begins with
-`\fB***=\fR',
+If an RE of any flavor begins with `\fB***=\fR',
the rest of the RE is taken to be a literal string,
with all characters considered ordinary characters.
.PP
@@ -750,17 +717,14 @@ if at all, is application-specific;
expanded syntax is primarily a scripting facility.
.PP
Finally, in an ARE,
-outside bracket expressions, the sequence
-`\fB(?#\fIttt\fB)\fR'
+outside bracket expressions, the sequence `\fB(?#\fIttt\fB)\fR'
(where
\fIttt\fR
-is any text not containing a
-`\fB)\fR')
+is any text not containing a `\fB)\fR')
is a comment,
completely ignored.
Again, this is not allowed between the characters of
-multi-character symbols like
-`\fB(?:\fR'.
+multi-character symbols like `\fB(?:\fR'.
Such comments are more a historical artifact than a useful facility,
and their use is deprecated;
use the expanded syntax instead.
@@ -825,11 +789,9 @@ Match lengths are measured in characters, not collating elements.
An empty string is considered longer than no match at all.
For example,
\fBbb*\fR
-matches the three middle characters of
-`\fBabbbc\fR',
+matches the three middle characters of `\fBabbbc\fR',
\fB(week|wee)(night|knights)\fR
-matches all ten characters of
-`\fBweeknights\fR',
+matches all ten characters of `\fBweeknights\fR',
when
\fB(.*).*\fR
is matched against
@@ -851,8 +813,7 @@ ordinary character outside a bracket expression, it is effectively
transformed into a bracket expression containing both cases,
so that
\fBx\fR
-becomes
-`\fB[xX]\fR'.
+becomes `\fB[xX]\fR'.
When it appears inside a bracket expression, all case counterparts
of it are added to the bracket expression, so that
\fB[x]\fR
@@ -860,8 +821,7 @@ becomes
\fB[xX]\fR
and
\fB[^x]\fR
-becomes
-`\fB[^xX]\fR'.
+becomes `\fB[^xX]\fR'.
.PP
If newline-sensitive matching is specified,
\fB.\fR
@@ -889,8 +849,7 @@ this affects
and bracket expressions
as with newline-sensitive matching, but not
\fB^\fR
-and
-`\fB$\fR'.
+and `\fB$\fR'.
.PP
If inverse partial newline-sensitive matching is specified,
this affects
@@ -923,9 +882,7 @@ syntax for both BREs and EREs.
.PP
Many of the ARE extensions are borrowed from Perl, but some have
been changed to clean them up, and a few Perl extensions are not present.
-Incompatibilities of note include
-`\fB\eb\fR',
-`\fB\eB\fR',
+Incompatibilities of note include `\fB\eb\fR', `\fB\eB\fR',
the lack of special treatment for a trailing newline,
the addition of complemented bracket expressions to the things
affected by newline-sensitive matching,
@@ -965,14 +922,12 @@ will not look like a valid bound.
.PP
In AREs,
\fB\e\fR
-remains a special character within
-`\fB[\|]\fR',
+remains a special character within `\fB[\|]\fR',
so a literal
\fB\e\fR
within
\fB[\|]\fR
-must be written
-`\fB\e\e\fR'.
+must be written `\fB\e\e\fR'.
\fB\e\e\fR
also gives a literal
\fB\e\fR
@@ -993,17 +948,14 @@ find a match which was \fInot\fR the longest/shortest will need rewriting.)
.RE
.SH "BASIC REGULAR EXPRESSIONS"
-BREs differ from EREs in several respects.
-`\fB|\fR',
-`\fB+\fR',
+BREs differ from EREs in several respects. `\fB|\fR', `\fB+\fR',
and
\fB?\fR
are ordinary characters and there is no equivalent
for their functionality.
The delimiters for bounds are
\fB\e{\fR
-and
-`\fB\e}\fR',
+and `\fB\e}\fR',
with
\fB{\fR
and
@@ -1011,8 +963,7 @@ and
by themselves ordinary characters.
The parentheses for nested subexpressions are
\fB\e(\fR
-and
-`\fB\e)\fR',
+and `\fB\e)\fR',
with
\fB(\fR
and
@@ -1028,8 +979,7 @@ and
\fB*\fR
is an ordinary character if it appears at the beginning of the
RE or the beginning of a parenthesized subexpression
-(after a possible leading
-`\fB^\fR').
+(after a possible leading `\fB^\fR').
Finally,
single-digit back references are available,
and