author | Guido van Rossum <guido@python.org> | 1996-10-24 22:49:13 (GMT) |
---|---|---|
committer | Guido van Rossum <guido@python.org> | 1996-10-24 22:49:13 (GMT) |
commit | 6240b0b7734e94cec89619745dd79b1254ffc56c | |
tree | 223b72fe19d4dfe82c65fe72ec0a505c642e45d7 /Doc/libregex.tex | |
parent | c45289cb8138d8747561bb5633ac4c17bad3e40a | |
Small nits only.
Diffstat (limited to 'Doc/libregex.tex')
-rw-r--r-- | Doc/libregex.tex | 28 |
1 files changed, 17 insertions, 11 deletions
diff --git a/Doc/libregex.tex b/Doc/libregex.tex
index 45e7249..f2e094a 100644
--- a/Doc/libregex.tex
+++ b/Doc/libregex.tex
@@ -4,8 +4,8 @@
 This module provides regular expression matching operations similar to
 those found in Emacs. It is always available.
 
-By default the patterns are Emacs-style regular expressions,
-with one exception. There is
+By default the patterns are Emacs-style regular expressions
+(with one exception). There is
 a way to change the syntax to match that of several well-known
 \UNIX{} utilities. The exception is that Emacs' \samp{\e s}
 pattern is not supported, since the original implementation references
@@ -36,7 +36,8 @@ avoid interpretation as an octal escape.
 
 A regular expression (or RE) specifies a set of strings that matches
 it; the functions in this module let you check if a particular string
-matches a given regular expression.
+matches a given regular expression (or if a given regular expression
+matches a particular string, which comes down to the same thing).
 
 Regular expressions can be concatenated to form new regular
 expressions; if \emph{A} and \emph{B} are both regular expressions,
@@ -51,22 +52,23 @@ any textbook about compiler construction.
 % "Compilers: Principles, Techniques and Tools", by Alfred V. Aho,
 % Ravi Sethi, and Jeffrey D. Ullman, or some FA text.
 
-A brief explanation of the format of regular
-expressions follows.
+A brief explanation of the format of regular expressions follows.
 
 Regular expressions can contain both special and ordinary characters.
 Ordinary characters, like '\code{A}', '\code{a}', or '\code{0}', are
 the simplest regular expressions; they simply match themselves. You can
 concatenate ordinary characters, so '\code{last}' matches the
-characters 'last'.
+characters 'last'. (In the rest of this section, we'll write RE's in
+\code{this special font}, usually without quotes, and strings to be
+matched 'in single quotes'.)
 
 Special characters either stand for classes of ordinary characters, or
 affect how the regular expressions around them are interpreted.
 
 The special characters are:
 \begin{itemize}
-\item[\code{.}]{Matches any character except a newline.}
-\item[\code{\^}]{Matches the start of the string.}
+\item[\code{.}]{(Dot.) Matches any character except a newline.}
+\item[\code{\^}]{(Caret.) Matches the start of the string.}
 \item[\code{\$}]{Matches the end of the string.
 \code{foo} matches both 'foo' and 'foobar', while the regular
 expression '\code{foo\$}' matches only 'foo'.}
@@ -114,7 +116,8 @@ should be doubled are indicated.
 
 \begin{itemize}
 \item[\code{\e|}]\code{A\e|B}, where A and B can be arbitrary REs,
-creates a regular expression that will match either A or B.
+creates a regular expression that will match either A or B. This can
+be used inside groups (see below) as well.
 %
 \item[\code{\e( \e)}]{Indicates the start and end of a group; the
 contents of a group can be matched later in the string with the
@@ -126,7 +129,8 @@ number. For example, \code{\e (.+\e ) \e \e 1} matches 'the the' or
 '55 55', but not 'the end' (note the space after the group). This
 special sequence can only be used to match one of the first 9 groups;
 groups with higher numbers can be matched using the \code{\e v}
-sequence.}}
+sequence. (\code{\e 8} and \code{\e 9} don't need a double backslash
+because they are not octal digits.)}}
 %
 \item[\code{\e \e b}]{Matches the empty string, but only at the
 beginning or end of a word. A word is defined as a sequence of
@@ -151,6 +155,8 @@ character.}
 \item[\code{\e >}]{Matches the empty string, but only at the end of a
 word.}
 
+\item[\code{\e \e \e \e}]{Matches a literal backslash.}
+
 % In Emacs, the following two are start of buffer/end of buffer. In
 % Python they seem to be synonyms for ^$.
 \item[\code{\e `}]{Like \code{\^}, this only matches at the start of the
@@ -175,7 +181,7 @@ The module defines these functions, and an exception:
 
 \begin{funcdesc}{search}{pattern\, string}
 Return the first position in \var{string} that matches the regular
- expression \var{pattern}. Return -1 if no position in the string
+ expression \var{pattern}. Return \code{-1} if no position in the string
 matches the pattern (this is different from a zero-length match
 anywhere!).
 \end{funcdesc}
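The behaviour described in the patched text can be tried out with a short sketch. This is not the `regex` module the patch documents: it assumes the modern standard `re` module as a stand-in, so groups are written `( )` rather than the Emacs-style `\( \)`. The helper `first_match_position` is a hypothetical name, introduced here only to imitate the documented `search` convention of returning `-1` when nothing matches.

```python
import re

# Assumption: modern `re` syntax, where a group is written ( ) and the
# back-reference \1 repeats whatever group 1 matched.
REPEATED = re.compile(r"(.+) \1")


def matches_whole(string):
    """Return True if the entire string has the form '<something> <same thing>'."""
    return REPEATED.fullmatch(string) is not None


def first_match_position(pattern, string):
    """Hypothetical helper imitating the documented search() convention:
    the index of the first match, or -1 if no position matches."""
    match = re.search(pattern, string)
    return match.start() if match else -1


print(matches_whole("the the"))   # repeated word
print(matches_whole("55 55"))     # repeated number
print(matches_whole("the end"))   # the two halves differ

print(first_match_position(r"foo", "foobar"))   # match at the start
print(first_match_position(r"foo$", "foobar"))  # $ anchors at the end: no match
print(first_match_position(r"x*", "abc"))       # zero-length match at index 0
```

The whole-string check is deliberate: an unanchored search over 'the end' would still find the shorter repeat 'e e', so the documented "matches 'the the' but not 'the end'" only holds when the pattern has to cover the string from its start. The last two calls show the distinction the `search` description draws between no match (`-1`) and a zero-length match (position `0`).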