Merge p3yk branch with the trunk up to revision 45595. This breaks a fair

number of tests, all because of the codecs/_multibytecodecs issue described here (it's not a Py3K issue, just something Py3K discovers): http://mail.python.org/pipermail/python-dev/2006-April/064051.html Hye-Shik Chang promised to look for a fix, so no need to fix it here. The tests that are expected to break are: test_codecencodings_cn test_codecencodings_hk test_codecencodings_jp test_codecencodings_kr test_codecencodings_tw test_codecs test_multibytecodec This merge fixes an actual test failure (test_weakref) in this branch, though, so I believe merging is the right thing to do anyway.
author: Thomas Wouters <thomas@python.org> 2006-04-21 10:40:58 (GMT)
committer: Thomas Wouters <thomas@python.org> 2006-04-21 10:40:58 (GMT)
commit: 49fd7fa4431da299196d74087df4a04f99f9c46f (patch)
tree: 35ace5fe78d3d52c7a9ab356ab9f6dbf8d4b71f4 /Doc/lib/libregex.tex
parent: 9ada3d6e29d5165dadacbe6be07bcd35cfbef59d (diff)
download: cpython-49fd7fa4431da299196d74087df4a04f99f9c46f.zip
cpython-49fd7fa4431da299196d74087df4a04f99f9c46f.tar.gz
cpython-49fd7fa4431da299196d74087df4a04f99f9c46f.tar.bz2
1 files changed, 0 insertions, 370 deletions
diff --git a/Doc/lib/libregex.tex b/Doc/lib/libregex.tex
deleted file mode 100644
index 0982f81..0000000
--- a/Doc/lib/libregex.tex
+++ /dev/null
@@ -1,370 +0,0 @@
-\section{\module{regex} ---
-         Regular expression operations}
-\declaremodule{builtin}{regex}
-
-\modulesynopsis{Regular expression search and match operations.
-                \strong{Obsolete!}}
-
-
-This module provides regular expression matching operations similar to
-those found in Emacs.
-
-\strong{Obsolescence note:}
-This module is obsolete as of Python version 1.5; it is still being
-maintained because much existing code still uses it.  All new code in
-need of regular expressions should use the new
-\code{re}\refstmodindex{re} module, which supports the more powerful
-and regular Perl-style regular expressions.  Existing code should be
-converted.  The standard library module
-\code{reconvert}\refstmodindex{reconvert} helps in converting
-\code{regex} style regular expressions to \code{re}\refstmodindex{re}
-style regular expressions.  (For more conversion help, see Andrew
-Kuchling's\index{Kuchling, Andrew} ``\module{regex-to-re} HOWTO'' at
-\url{http://www.python.org/doc/howto/regex-to-re/}.)
-
-By default the patterns are Emacs-style regular expressions
-(with one exception).  There is
-a way to change the syntax to match that of several well-known
-\UNIX{} utilities.  The exception is that Emacs' \samp{\e s}
-pattern is not supported, since the original implementation references
-the Emacs syntax tables.
-
-This module is 8-bit clean: both patterns and strings may contain null
-bytes and characters whose high bit is set.
-
-\strong{Please note:} There is a little-known fact about Python string
-literals which means that you don't usually have to worry about
-doubling backslashes, even though they are used to escape special
-characters in string literals as well as in regular expressions.  This
-is because Python doesn't remove backslashes from string literals if
-they are followed by an unrecognized escape character.
-\emph{However}, if you want to include a literal \dfn{backslash} in a
-regular expression represented as a string literal, you have to
-\emph{quadruple} it or enclose it in a singleton character class.
-E.g.\  to extract \LaTeX\ \samp{\e section\{\textrm{\ldots}\}} headers
-from a document, you can use this pattern:
-\code{'[\e ]section\{\e (.*\e )\}'}.  \emph{Another exception:}
-the escape sequence \samp{\e b} is significant in string literals
-(where it means the ASCII bell character) as well as in Emacs regular
-expressions (where it stands for a word boundary), so in order to
-search for a word boundary, you should use the pattern \code{'\e \e b'}.
-Similarly, a backslash followed by a digit 0-7 should be doubled to
-avoid interpretation as an octal escape.
-
-\subsection{Regular Expressions}
-
-A regular expression (or RE) specifies a set of strings that matches
-it; the functions in this module let you check if a particular string
-matches a given regular expression (or if a given regular expression
-matches a particular string, which comes down to the same thing).
-
-Regular expressions can be concatenated to form new regular
-expressions; if \emph{A} and \emph{B} are both regular expressions,
-then \emph{AB} is also an regular expression.  If a string \emph{p}
-matches A and another string \emph{q} matches B, the string \emph{pq}
-will match AB.  Thus, complex expressions can easily be constructed
-from simpler ones like the primitives described here.  For details of
-the theory and implementation of regular expressions, consult almost
-any textbook about compiler construction.
-
-% XXX The reference could be made more specific, say to 
-% "Compilers: Principles, Techniques and Tools", by Alfred V. Aho, 
-% Ravi Sethi, and Jeffrey D. Ullman, or some FA text.   
-
-A brief explanation of the format of regular expressions follows.
-
-Regular expressions can contain both special and ordinary characters.
-Ordinary characters, like '\code{A}', '\code{a}', or '\code{0}', are
-the simplest regular expressions; they simply match themselves.  You
-can concatenate ordinary characters, so '\code{last}' matches the
-characters 'last'.  (In the rest of this section, we'll write RE's in
-\code{this special font}, usually without quotes, and strings to be
-matched 'in single quotes'.)
-
-Special characters either stand for classes of ordinary characters, or
-affect how the regular expressions around them are interpreted.
-
-The special characters are:
-\begin{itemize}
-\item[\code{.}] (Dot.)  Matches any character except a newline.
-\item[\code{\^}] (Caret.)  Matches the start of the string.
-\item[\code{\$}] Matches the end of the string.  
-\code{foo} matches both 'foo' and 'foobar', while the regular
-expression '\code{foo\$}' matches only 'foo'.
-\item[\code{*}] Causes the resulting RE to
-match 0 or more repetitions of the preceding RE.  \code{ab*} will
-match 'a', 'ab', or 'a' followed by any number of 'b's.
-\item[\code{+}] Causes the
-resulting RE to match 1 or more repetitions of the preceding RE.
-\code{ab+} will match 'a' followed by any non-zero number of 'b's; it
-will not match just 'a'.
-\item[\code{?}] Causes the resulting RE to
-match 0 or 1 repetitions of the preceding RE.  \code{ab?} will
-match either 'a' or 'ab'.
-
-\item[\code{\e}] Either escapes special characters (permitting you to match
-characters like '*?+\&\$'), or signals a special sequence; special
-sequences are discussed below.  Remember that Python also uses the
-backslash as an escape sequence in string literals; if the escape
-sequence isn't recognized by Python's parser, the backslash and
-subsequent character are included in the resulting string.  However,
-if Python would recognize the resulting sequence, the backslash should
-be repeated twice.  
-
-\item[\code{[]}] Used to indicate a set of characters.  Characters can
-be listed individually, or a range is indicated by giving two
-characters and separating them by a '-'.  Special characters are
-not active inside sets.  For example, \code{[akm\$]}
-will match any of the characters 'a', 'k', 'm', or '\$'; \code{[a-z]} will
-match any lowercase letter.  
-
-If you want to include a \code{]} inside a
-set, it must be the first character of the set; to include a \code{-},
-place it as the first or last character. 
-
-Characters \emph{not} within a range can be matched by including a
-\code{\^} as the first character of the set; \code{\^} elsewhere will
-simply match the '\code{\^}' character.  
-\end{itemize}
-
-The special sequences consist of '\code{\e}' and a character
-from the list below.  If the ordinary character is not on the list,
-then the resulting RE will match the second character.  For example,
-\code{\e\$} matches the character '\$'.  Ones where the backslash
-should be doubled in string literals are indicated.
-
-\begin{itemize}
-\item[\code{\e|}]\code{A\e|B}, where A and B can be arbitrary REs,
-creates a regular expression that will match either A or B.  This can
-be used inside groups (see below) as well.
-%
-\item[\code{\e( \e)}] Indicates the start and end of a group; the
-contents of a group can be matched later in the string with the
-\code{\e [1-9]} special sequence, described next.
-\end{itemize}
-
-\begin{fulllineitems}
-\item[\code{\e \e 1, ... \e \e 7, \e 8, \e 9}]
-Matches the contents of the group of the same
-number.  For example, \code{\e (.+\e ) \e \e 1} matches 'the the' or
-'55 55', but not 'the end' (note the space after the group).  This
-special sequence can only be used to match one of the first 9 groups;
-groups with higher numbers can be matched using the \code{\e v}
-sequence.  (\code{\e 8} and \code{\e 9} don't need a double backslash
-because they are not octal digits.)
-\end{fulllineitems}
-
-\begin{itemize}
-\item[\code{\e \e b}] Matches the empty string, but only at the
-beginning or end of a word.  A word is defined as a sequence of
-alphanumeric characters, so the end of a word is indicated by
-whitespace or a non-alphanumeric character.
-%
-\item[\code{\e B}] Matches the empty string, but when it is \emph{not} at the
-beginning or end of a word.
-%
-\item[\code{\e v}] Must be followed by a two digit decimal number, and
-matches the contents of the group of the same number.  The group
-number must be between 1 and 99, inclusive.
-%
-\item[\code{\e w}]Matches any alphanumeric character; this is
-equivalent to the set \code{[a-zA-Z0-9]}.
-%
-\item[\code{\e W}] Matches any non-alphanumeric character; this is
-equivalent to the set \code{[\^a-zA-Z0-9]}.
-\item[\code{\e <}] Matches the empty string, but only at the beginning of a
-word.  A word is defined as a sequence of alphanumeric characters, so
-the end of a word is indicated by whitespace or a non-alphanumeric 
-character.
-\item[\code{\e >}] Matches the empty string, but only at the end of a
-word.
-
-\item[\code{\e \e \e \e}] Matches a literal backslash.
-
-% In Emacs, the following two are start of buffer/end of buffer.  In
-% Python they seem to be synonyms for ^$.
-\item[\code{\e `}] Like \code{\^}, this only matches at the start of the
-string.
-\item[\code{\e \e '}] Like \code{\$}, this only matches at the end of
-the string.
-% end of buffer
-\end{itemize}
-
-\subsection{Module Contents}
-\nodename{Contents of Module regex}
-
-The module defines these functions, and an exception:
-
-
-\begin{funcdesc}{match}{pattern, string}
-  Return how many characters at the beginning of \var{string} match
-  the regular expression \var{pattern}.  Return \code{-1} if the
-  string does not match the pattern (this is different from a
-  zero-length match!).
-\end{funcdesc}
-
-\begin{funcdesc}{search}{pattern, string}
-  Return the first position in \var{string} that matches the regular
-  expression \var{pattern}.  Return \code{-1} if no position in the string
-  matches the pattern (this is different from a zero-length match
-  anywhere!).
-\end{funcdesc}
-
-\begin{funcdesc}{compile}{pattern\optional{, translate}}
-  Compile a regular expression pattern into a regular expression
-  object, which can be used for matching using its \code{match()} and
-  \code{search()} methods, described below.  The optional argument
-  \var{translate}, if present, must be a 256-character string
-  indicating how characters (both of the pattern and of the strings to
-  be matched) are translated before comparing them; the \var{i}-th
-  element of the string gives the translation for the character with
-  \ASCII{} code \var{i}.  This can be used to implement
-  case-insensitive matching; see the \code{casefold} data item below.
-
-  The sequence
-
-\begin{verbatim}
-prog = regex.compile(pat)
-result = prog.match(str)
-\end{verbatim}
-%
-is equivalent to
-
-\begin{verbatim}
-result = regex.match(pat, str)
-\end{verbatim}
-
-but the version using \code{compile()} is more efficient when multiple
-regular expressions are used concurrently in a single program.  (The
-compiled version of the last pattern passed to \code{regex.match()} or
-\code{regex.search()} is cached, so programs that use only a single
-regular expression at a time needn't worry about compiling regular
-expressions.)
-\end{funcdesc}
-
-\begin{funcdesc}{set_syntax}{flags}
-  Set the syntax to be used by future calls to \code{compile()},
-  \code{match()} and \code{search()}.  (Already compiled expression
-  objects are not affected.)  The argument is an integer which is the
-  OR of several flag bits.  The return value is the previous value of
-  the syntax flags.  Names for the flags are defined in the standard
-  module \code{regex_syntax}\refstmodindex{regex_syntax}; read the
-  file \file{regex_syntax.py} for more information.
-\end{funcdesc}
-
-\begin{funcdesc}{get_syntax}{}
-  Returns the current value of the syntax flags as an integer.
-\end{funcdesc}
-
-\begin{funcdesc}{symcomp}{pattern\optional{, translate}}
-This is like \code{compile()}, but supports symbolic group names: if a
-parenthesis-enclosed group begins with a group name in angular
-brackets, e.g. \code{'\e(<id>[a-z][a-z0-9]*\e)'}, the group can
-be referenced by its name in arguments to the \code{group()} method of
-the resulting compiled regular expression object, like this:
-\code{p.group('id')}.  Group names may contain alphanumeric characters
-and \code{'_'} only.
-\end{funcdesc}
-
-\begin{excdesc}{error}
-  Exception raised when a string passed to one of the functions here
-  is not a valid regular expression (e.g., unmatched parentheses) or
-  when some other error occurs during compilation or matching.  (It is
-  never an error if a string contains no match for a pattern.)
-\end{excdesc}
-
-\begin{datadesc}{casefold}
-A string suitable to pass as the \var{translate} argument to
-\code{compile()} to map all upper case characters to their lowercase
-equivalents.
-\end{datadesc}
-
-\noindent
-Compiled regular expression objects support these methods:
-
-\setindexsubitem{(regex method)}
-\begin{funcdesc}{match}{string\optional{, pos}}
-  Return how many characters at the beginning of \var{string} match
-  the compiled regular expression.  Return \code{-1} if the string
-  does not match the pattern (this is different from a zero-length
-  match!).
-  
-  The optional second parameter, \var{pos}, gives an index in the string
-  where the search is to start; it defaults to \code{0}.  This is not
-  completely equivalent to slicing the string; the \code{'\^'} pattern
-  character matches at the real beginning of the string and at positions
-  just after a newline, not necessarily at the index where the search
-  is to start.
-\end{funcdesc}
-
-\begin{funcdesc}{search}{string\optional{, pos}}
-  Return the first position in \var{string} that matches the regular
-  expression \code{pattern}.  Return \code{-1} if no position in the
-  string matches the pattern (this is different from a zero-length
-  match anywhere!).
-  
-  The optional second parameter has the same meaning as for the
-  \code{match()} method.
-\end{funcdesc}
-
-\begin{funcdesc}{group}{index, index, ...}
-This method is only valid when the last call to the \code{match()}
-or \code{search()} method found a match.  It returns one or more
-groups of the match.  If there is a single \var{index} argument,
-the result is a single string; if there are multiple arguments, the
-result is a tuple with one item per argument.  If the \var{index} is
-zero, the corresponding return value is the entire matching string; if
-it is in the inclusive range [1..99], it is the string matching the
-corresponding parenthesized group (using the default syntax,
-groups are parenthesized using \code{{\e}(} and \code{{\e})}).  If no
-such group exists, the corresponding result is \code{None}.
-
-If the regular expression was compiled by \code{symcomp()} instead of
-\code{compile()}, the \var{index} arguments may also be strings
-identifying groups by their group name.
-\end{funcdesc}
-
-\noindent
-Compiled regular expressions support these data attributes:
-
-\setindexsubitem{(regex attribute)}
-
-\begin{datadesc}{regs}
-When the last call to the \code{match()} or \code{search()} method found a
-match, this is a tuple of pairs of indexes corresponding to the
-beginning and end of all parenthesized groups in the pattern.  Indices
-are relative to the string argument passed to \code{match()} or
-\code{search()}.  The 0-th tuple gives the beginning and end or the
-whole pattern.  When the last match or search failed, this is
-\code{None}.
-\end{datadesc}
-
-\begin{datadesc}{last}
-When the last call to the \code{match()} or \code{search()} method found a
-match, this is the string argument passed to that method.  When the
-last match or search failed, this is \code{None}.
-\end{datadesc}
-
-\begin{datadesc}{translate}
-This is the value of the \var{translate} argument to
-\code{regex.compile()} that created this regular expression object.  If
-the \var{translate} argument was omitted in the \code{regex.compile()}
-call, this is \code{None}.
-\end{datadesc}
-
-\begin{datadesc}{givenpat}
-The regular expression pattern as passed to \code{compile()} or
-\code{symcomp()}.
-\end{datadesc}
-
-\begin{datadesc}{realpat}
-The regular expression after stripping the group names for regular
-expressions compiled with \code{symcomp()}.  Same as \code{givenpat}
-otherwise.
-\end{datadesc}
-
-\begin{datadesc}{groupindex}
-A dictionary giving the mapping from symbolic group names to numerical
-group indexes for regular expressions compiled with \code{symcomp()}.
-\code{None} otherwise.
-\end{datadesc}
author	Thomas Wouters <thomas@python.org>	2006-04-21 10:40:58 (GMT)
committer	Thomas Wouters <thomas@python.org>	2006-04-21 10:40:58 (GMT)
commit	49fd7fa4431da299196d74087df4a04f99f9c46f (patch)
tree	35ace5fe78d3d52c7a9ab356ab9f6dbf8d4b71f4 /Doc/lib/libregex.tex
parent	9ada3d6e29d5165dadacbe6be07bcd35cfbef59d (diff)
download	cpython-49fd7fa4431da299196d74087df4a04f99f9c46f.zip cpython-49fd7fa4431da299196d74087df4a04f99f9c46f.tar.gz cpython-49fd7fa4431da299196d74087df4a04f99f9c46f.tar.bz2