Diffstat (limited to 'doc/regexp.n')
-rw-r--r-- | doc/regexp.n | 96 |
1 files changed, 64 insertions, 32 deletions
diff --git a/doc/regexp.n b/doc/regexp.n
index a1692bd..100f0d8 100644
--- a/doc/regexp.n
+++ b/doc/regexp.n
@@ -18,8 +18,8 @@ regexp \- Match a regular expression against a string
 .SH DESCRIPTION
 .PP
 Determines whether the regular expression \fIexp\fR matches part or
-all of \fIstring\fR and returns 1 if it does, 0 if it doesn't, unless
-\fB-inline\fR is specified (see below).
+all of \fIstring\fR and returns 1 if it does, 0 if it does not, unless
+\fB\-inline\fR is specified (see below).
 (Regular expression matching is described in the \fBre_syntax\fR
 reference page.)
 .LP
@@ -60,20 +60,37 @@ range of characters.
 \fB\-line\fR
 Enables newline-sensitive matching. By default, newline is a
 completely ordinary character with no special meaning. With this
-flag, `[^' bracket expressions and `.' never match newline, `^'
+flag,
+.QW [^
+bracket expressions and
+.QW .
+never match newline,
+.QW ^
 matches an empty string after any newline in addition to its normal
-function, and `$' matches an empty string before any newline in
+function, and
+.QW $
+matches an empty string before any newline in
 addition to its normal function. This flag is equivalent to
 specifying both \fB\-linestop\fR and \fB\-lineanchor\fR, or the
 \fB(?n)\fR embedded option (see the \fBre_syntax\fR manual page).
 .TP 15
 \fB\-linestop\fR
-Changes the behavior of `[^' bracket expressions and `.' so that they
+Changes the behavior of
+.QW [^
+bracket expressions and
+.QW .
+so that they
 stop at newlines. This is the same as specifying the \fB(?p)\fR
 embedded option (see the \fBre_syntax\fR manual page).
 .TP 15
 \fB\-lineanchor\fR
-Changes the behavior of `^' and `$' (the ``anchors'') so they match the
+Changes the behavior of
+.QW ^
+and
+.QW $
+(the
+.QW anchors )
+so they match the
 beginning and end of a line respectively. This is the same as
 specifying the \fB(?w)\fR embedded option (see the \fBre_syntax\fR
 manual page).
@@ -81,7 +98,6 @@ manual page).
 \fB\-nocase\fR
 Causes upper-case characters in \fIstring\fR to be treated as
 lower case during the matching process.
-.VS 8.3
 .TP 15
 \fB\-all\fR
 Causes the regular expression to be matched as many times as possible
@@ -91,69 +107,85 @@ the last match only.
 .TP 15
 \fB\-inline\fR
 Causes the command to return, as a list, the data that would otherwise
-be placed in match variables. When using \fB-inline\fR,
-match variables may not be specified. If used with \fB-all\fR, the
+be placed in match variables. When using \fB\-inline\fR,
+match variables may not be specified. If used with \fB\-all\fR, the
 list will be concatenated at each iteration, such that a flat list is
 always returned. For each match iteration, the command will append the
 overall match data, plus one element for each subexpression in the
 regular expression. Examples are:
 .CS
- regexp -inline -- {\\w(\\w)} " inlined "
- => {in n}
- regexp -all -inline -- {\\w(\\w)} " inlined "
- => {in n li i ne e}
+\fBregexp\fR -inline -- {\ew(\ew)} " inlined "
+ \fI\(-> in n\fR
+\fBregexp\fR -all -inline -- {\ew(\ew)} " inlined "
+ \fI\(-> in n li i ne e\fR
 .CE
 .TP 15
 \fB\-start\fR \fIindex\fR
 Specifies a character index offset into the string to start
-matching the regular expression at. When using this switch, `^'
-will not match the beginning of the line, and \\A will still
+matching the regular expression at.
+.VS 8.5
+The \fIindex\fR value is interpreted in the same manner
+as the \fIindex\fR argument to \fBstring index\fR.
+.VE 8.5
+When using this switch,
+.QW ^
+will not match the beginning of the line, and \eA will still
 match the start of the string at \fIindex\fR. If \fB\-indices\fR
 is specified, the indices will be indexed starting from the
 absolute beginning of the input string.
 \fIindex\fR will be constrained to the bounds of the input string.
-.VE 8.3
 .TP 15
 \fB\-\|\-\fR
 Marks the end of switches. The argument following this one will be
 treated as \fIexp\fR even if it starts with a \fB\-\fR.
 .PP
-If there are more \fIsubMatchVar\fR's than parenthesized
+If there are more \fIsubMatchVar\fRs than parenthesized
 subexpressions within \fIexp\fR, or if a particular subexpression
-in \fIexp\fR doesn't match the string (e.g. because it was in a
-portion of the expression that wasn't matched), then the corresponding
-\fIsubMatchVar\fR will be set to ``\fB\-1 \-1\fR'' if \fB\-indices\fR
-has been specified or to an empty string otherwise.
+in \fIexp\fR does not match the string (e.g. because it was in a
+portion of the expression that was not matched), then the corresponding
+\fIsubMatchVar\fR will be set to
+.QW "\fB\-1 \-1\fR"
+if \fB\-indices\fR has been specified or to an empty string otherwise.
 .SH EXAMPLES
+.PP
 Find the first occurrence of a word starting with \fBfoo\fR in a
 string that is not actually an instance of \fBfoobar\fR, and get the
 letters following it up to the end of the word into a variable:
 .CS
-\fBregexp\fR {\\<foo(?!bar\\>)(\\w*)} $string \-> restOfWord
+\fBregexp\fR {\emfoo(?!bar\eM)(\ew*)} $string \-> restOfWord
 .CE
 Note that the whole matched substring has been placed in the variable
-\fB\->\fR which is a name chosen to look nice given that we are not
+.QW \fB\->\fR ,
+which is a name chosen to look nice given that we are not
 actually interested in its contents.
 .PP
 Find the index of the word \fBbadger\fR (in any case) within a string
 and store that in the variable \fBlocation\fR:
 .CS
-\fBregexp\fR \-indices {(?i)\\<badger\\>} $string location
+\fBregexp\fR \-indices {(?i)\embadger\eM} $string location
+.CE
+This could also be written as a \fIbasic\fR regular expression (as opposed
+to using the default syntax of \fIadvanced\fR regular expressions) match by
+prefixing the expression with a suitable flag:
+.CS
+\fBregexp\fR \-indices {(?ib)\e<badger\e>} $string location
 .CE
 .PP
-Count the number of octal digits in a string:
+This counts the number of octal digits in a string:
 .CS
 \fBregexp\fR \-all {[0\-7]} $string
 .CE
 .PP
-List all words (consisting of all sequences of non-whitespace
-characters) in a string:
+This lists all words (consisting of all sequences of non-whitespace
+characters) in a string, and is useful as a more powerful version of the
+\fBsplit\fR command:
 .CS
-\fBregexp\fR \-all \-inline {\\S+} $string
+\fBregexp\fR \-all \-inline {\eS+} $string
 .CE
-
 .SH "SEE ALSO"
-re_syntax(n), regsub(n)
-
+re_syntax(n), regsub(n),
+.VS 8.5
+string(n)
+.VE
 .SH KEYWORDS
-match, regular expression, string
+match, parsing, pattern, regular expression, splitting, string
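
The patterns in the revised EXAMPLES section can be tried with a short Tcl script. This is a minimal sketch, not part of the patch: the sample strings and the variable names octalCount, words and loc are assumptions chosen for illustration, and the last command is an extra illustration of the \-start and \-indices switches described in the patched text.

```tcl
# Sample inputs below are illustrative assumptions; they do not come from the patch.

# A word that starts with "foo" but is not exactly "foobar".
# "foobar" is rejected by the negative lookahead, so the match lands on "foods".
set text "foobar foods"
regexp {\mfoo(?!bar\M)(\w*)} $text -> restOfWord
puts "rest of word: $restOfWord"          ;# rest of word: ds

# Case-insensitive whole-word search; -indices stores first/last character indices.
regexp -indices {(?i)\mbadger\M} "A Badger here" location
puts "location: $location"                ;# location: 2 7

# With -all and no -inline, regexp returns the number of matches.
set octalCount [regexp -all {[0-7]} "a1b8c7"]
puts "octal digits: $octalCount"          ;# octal digits: 2

# With -all -inline, regexp returns a flat list of the matches.
set words [regexp -all -inline {\S+} "  split  these words "]
puts "words: $words"                      ;# words: split these words

# -start skips ahead, but -indices still counts from the start of the string.
regexp -start 3 -indices {\w+} "ab cd ef" loc
puts "loc: $loc"                          ;# loc: 3 4
```

The variable named -> in the first command follows the convention the manual page describes: it receives the whole match, which is not needed here.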