Made a number of revisions suggested by Fredrik Lundh.

Revised the first paragraph so it doesn't sound like it was written when 7-bit strings were assumed; note that Unicode strings can be used.
author: Fred Drake <fdrake@acm.org> 2000-10-06 19:59:22 (GMT)
committer: Fred Drake <fdrake@acm.org> 2000-10-06 19:59:22 (GMT)
commit: 062ea2e70bd320a5b4b3cd1907babf19c17c6622 (patch)
tree: be24263a1999eba769ee0e96d2e4428e2d248421 /Doc/lib/libre.tex
parent: e2b7c4dea38568ac7fa1576167fb1b32bf9cdf3f (diff)
1 files changed, 33 insertions, 12 deletions
diff --git a/Doc/lib/libre.tex b/Doc/lib/libre.tex
index 0c9df2a..37b4ee8 100644
--- a/Doc/lib/libre.tex
+++ b/Doc/lib/libre.tex
@@ -1,21 +1,21 @@
 \section{\module{re} ---
-         Perl-style regular expression operations.}
+         Regular expression operations}
 \declaremodule{standard}{re}
 \moduleauthor{Andrew M. Kuchling}{amk1@bigfoot.com}
+\moduleauthor{Fredrik Lundh}{effbot@telia.com}
 \sectionauthor{Andrew M. Kuchling}{amk1@bigfoot.com}
 
 
-\modulesynopsis{Perl-style regular expression search and match
-operations.}
+\modulesynopsis{Regular expression search and match operations with a
+                Perl-style expression syntax.}
 
 
 This module provides regular expression matching operations similar to
-those found in Perl.  It's 8-bit clean: the strings being processed
-may contain both null bytes and characters whose high bit is set.  Regular
-expression pattern strings may not contain null bytes, but can specify
-the null byte using the \code{\e\var{number}} notation.
-Characters with the high bit set may be included.  The \module{re}
-module is always available.
+those found in Perl.  Regular expression pattern strings may not
+contain null bytes, but can specify the null byte using the
+\code{\e\var{number}} notation.  Both patterns and strings to be
+searched can be Unicode strings as well as 8-bit strings.  The
+\module{re} module is always available.
 
 Regular expressions use the backslash character (\character{\e}) to
 indicate special forms or to allow special characters to be used
@@ -34,6 +34,15 @@ while \code{"\e n"} is a one-character string containing a newline.
 Usually patterns will be expressed in Python code using this raw
 string notation.
 
+\strong{Implementation note:}
+The \module{re}\refstmodindex{pre} module has two distinct
+implementations: \module{sre} is the default implementation and
+includes Unicode support, but may run into stack limitations for some
+patterns.  Though this will be fixed for a future release of Python,
+the older implementation (without Unicode support) is still available
+as the \module{pre}\refstmodindex{pre} module.
+
+
 \subsection{Regular Expression Syntax \label{re-syntax}}
 
 A regular expression (or RE) specifies a set of strings that matches
@@ -155,9 +164,16 @@ simply match the \character{\^} character.  For example, \regexp{[{\^}5]}
 will match any character except \character{5}.
 
 \item[\character{|}]\code{A|B}, where A and B can be arbitrary REs,
-creates a regular expression that will match either A or B.  This can
-be used inside groups (see below) as well.  To match a literal \character{|},
-use \regexp{\e|}, or enclose it inside a character class, as in  \regexp{[|]}.
+creates a regular expression that will match either A or B.  An
+arbitrary number of REs can be separated by the \character{|} in this
+way.  This can be used inside groups (see below) as well.  REs
+separated by \character{|} are tried from left to right, and the first
+one that allows the complete pattern to match is considered the
+accepted branch.  This means that if \code{A} matches, \code{B} will
+never be tested, even if it would produce a longer overall match.  In
+other words, the \character{|} operator is never greedy.  To match a
+literal \character{|}, use \regexp{\e|}, or enclose it inside a
+character class, as in \regexp{[|]}.
 
 \item[\code{(...)}] Matches whatever regular expression is inside the
 parentheses, and indicates the start and end of a group; the contents
@@ -184,6 +200,11 @@ for the entire regular expression.  This is useful if you wish to
 include the flags as part of the regular expression, instead of
 passing a \var{flag} argument to the \function{compile()} function.
 
+Note that the \regexp{(?x)} flag changes how the expression is parsed.
+It should be used first in the expression string, or after one or more
+whitespace characters.  If there are non-whitespace characters before
+the flag, the results are undefined.
+
 \item[\code{(?:...)}] A non-grouping version of regular parentheses.
 Matches whatever regular expression is inside the parentheses, but the
 substring matched by the
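
The left-to-right alternation rule added in the second hunk above can be checked with a short Python sketch. It assumes a current Python 3 interpreter rather than the Python 2.0-era module this patch documents, and the variable names are illustrative only.

```python
import re

# Branches are tried left to right: "a" succeeds first, so "ab" is never
# tried, even though it would give a longer match.
first_branch = re.match(r"a|ab", "ab").group()

# Reordering the branches changes the result.
reordered = re.match(r"ab|a", "ab").group()

# A branch is accepted only if it lets the complete pattern match. With the
# trailing "$", branch "a" fails overall, so the engine falls back to "ab".
anchored = re.match(r"(?:a|ab)$", "ab").group()

# Two ways to match a literal "|": escape it, or use a character class.
escaped = re.findall(r"\|", "a|b|c")
in_class = re.findall(r"[|]", "a|b|c")

print(first_branch, reordered, anchored, escaped, in_class)
```

The third case shows why the rule is worded as "the first one that allows the complete pattern to match": an earlier branch that matches locally can still be abandoned if the rest of the pattern fails after it.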
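
The note about \regexp{(?x)} in the last hunk says the flag should come first in the expression string. A minimal sketch of that placement, again assuming a current Python 3 interpreter and using hypothetical names (`verbose_pattern`, `compact`), compares an inline flag with the equivalent compact pattern:

```python
import re

# (?x) is the very first thing in the pattern, as the note recommends.
# In verbose mode, unescaped whitespace is ignored and "#" starts a comment.
verbose_pattern = r"""(?x)
    (\d+)   # integer part
    \.      # literal dot
    (\d+)   # fractional part
"""

# The same expression written compactly, without the flag.
compact = r"(\d+)\.(\d+)"

verbose_groups = re.match(verbose_pattern, "3.14").groups()
compact_groups = re.match(compact, "3.14").groups()

print(verbose_groups, compact_groups)
```

Passing `re.VERBOSE` as the flags argument to `re.compile()` is the alternative to the inline form and sidesteps the placement question entirely.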