author | Serhiy Storchaka <storchaka@gmail.com> | 2017-10-14 08:14:26 (GMT) |
---|---|---|
committer | GitHub <noreply@github.com> | 2017-10-14 08:14:26 (GMT) |
commit | cd195e2a7ac5c9b2574d5462752b7939641de4a9 (patch) | |
tree | ae561205263204cf6c1c2a33b0836d8007da6969 /Doc/library/re.rst | |
parent | ef611c96eab0ab667ebb43fdf429b319f6d99890 (diff) | |
bpo-31714: Improved regular expression documentation. (#3907)
Diffstat (limited to 'Doc/library/re.rst')
-rw-r--r-- | Doc/library/re.rst | 235 |
1 files changed, 133 insertions, 102 deletions
diff --git a/Doc/library/re.rst b/Doc/library/re.rst index 7efdd5d..3dd3a0f 100644 --- a/Doc/library/re.rst +++ b/Doc/library/re.rst @@ -14,8 +14,9 @@ This module provides regular expression matching operations similar to those found in Perl. -Both patterns and strings to be searched can be Unicode strings as well as -8-bit strings. However, Unicode strings and 8-bit strings cannot be mixed: +Both patterns and strings to be searched can be Unicode strings (:class:`str`) +as well as 8-bit strings (:class:`bytes`). +However, Unicode strings and 8-bit strings cannot be mixed: that is, you cannot match a Unicode string with a byte pattern or vice-versa; similarly, when asking for a substitution, the replacement string must be of the same type as both the pattern and the search string. @@ -81,9 +82,7 @@ strings to be matched ``'in single quotes'``.) Some characters, like ``'|'`` or ``'('``, are special. Special characters either stand for classes of ordinary characters, or affect -how the regular expressions around them are interpreted. Regular -expression pattern strings may not contain null bytes, but can specify -the null byte using a ``\number`` notation such as ``'\x00'``. +how the regular expressions around them are interpreted. Repetition qualifiers (``*``, ``+``, ``?``, ``{m,n}``, etc) cannot be directly nested. This avoids ambiguity with the non-greedy modifier suffix @@ -94,16 +93,16 @@ the expression ``(?:a{6})*`` matches any multiple of six ``'a'`` characters. The special characters are: -``'.'`` +``.`` (Dot.) In the default mode, this matches any character except a newline. If the :const:`DOTALL` flag has been specified, this matches any character including a newline. -``'^'`` +``^`` (Caret.) Matches the start of the string, and in :const:`MULTILINE` mode also matches immediately after each newline. -``'$'`` +``$`` Matches the end of the string or just before the newline at the end of the string, and in :const:`MULTILINE` mode also matches before a newline. 
``foo`` matches both 'foo' and 'foobar', while the regular expression ``foo$`` matches @@ -112,28 +111,28 @@ The special characters are: a single ``$`` in ``'foo\n'`` will find two (empty) matches: one just before the newline, and one at the end of the string. -``'*'`` +``*`` Causes the resulting RE to match 0 or more repetitions of the preceding RE, as many repetitions as are possible. ``ab*`` will match 'a', 'ab', or 'a' followed by any number of 'b's. -``'+'`` +``+`` Causes the resulting RE to match 1 or more repetitions of the preceding RE. ``ab+`` will match 'a' followed by any non-zero number of 'b's; it will not match just 'a'. -``'?'`` +``?`` Causes the resulting RE to match 0 or 1 repetitions of the preceding RE. ``ab?`` will match either 'a' or 'ab'. ``*?``, ``+?``, ``??`` The ``'*'``, ``'+'``, and ``'?'`` qualifiers are all :dfn:`greedy`; they match as much text as possible. Sometimes this behaviour isn't desired; if the RE - ``<.*>`` is matched against ``<a> b <c>``, it will match the entire - string, and not just ``<a>``. Adding ``?`` after the qualifier makes it + ``<.*>`` is matched against ``'<a> b <c>'``, it will match the entire + string, and not just ``'<a>'``. Adding ``?`` after the qualifier makes it perform the match in :dfn:`non-greedy` or :dfn:`minimal` fashion; as *few* characters as possible will be matched. Using the RE ``<.*?>`` will match - only ``<a>``. + only ``'<a>'``. ``{m}`` Specifies that exactly *m* copies of the previous RE should be matched; fewer @@ -145,8 +144,8 @@ The special characters are: RE, attempting to match as many repetitions as possible. For example, ``a{3,5}`` will match from 3 to 5 ``'a'`` characters. Omitting *m* specifies a lower bound of zero, and omitting *n* specifies an infinite upper bound. As an - example, ``a{4,}b`` will match ``aaaab`` or a thousand ``'a'`` characters - followed by a ``b``, but not ``aaab``. 
The comma may not be omitted or the + example, ``a{4,}b`` will match ``'aaaab'`` or a thousand ``'a'`` characters + followed by a ``'b'``, but not ``'aaab'``. The comma may not be omitted or the modifier would be confused with the previously described form. ``{m,n}?`` @@ -156,7 +155,7 @@ The special characters are: 6-character string ``'aaaaaa'``, ``a{3,5}`` will match 5 ``'a'`` characters, while ``a{3,5}?`` will only match 3 characters. -``'\'`` +``\`` Either escapes special characters (permitting you to match characters like ``'*'``, ``'?'``, and so forth), or signals a special sequence; special sequences are discussed below. @@ -179,8 +178,8 @@ The special characters are: them by a ``'-'``, for example ``[a-z]`` will match any lowercase ASCII letter, ``[0-5][0-9]`` will match all the two-digits numbers from ``00`` to ``59``, and ``[0-9A-Fa-f]`` will match any hexadecimal digit. If ``-`` is escaped (e.g. - ``[a\-z]``) or if it's placed as the first or last character (e.g. ``[a-]``), - it will match a literal ``'-'``. + ``[a\-z]``) or if it's placed as the first or last character + (e.g. ``[-a]`` or ``[a-]``), it will match a literal ``'-'``. * Special characters lose their special meaning inside sets. For example, ``[(+*)]`` will match any of the literal characters ``'('``, ``'+'``, @@ -201,13 +200,13 @@ The special characters are: place it at the beginning of the set. For example, both ``[()[\]{}]`` and ``[]()[{}]`` will both match a parenthesis. -``'|'`` - ``A|B``, where A and B can be arbitrary REs, creates a regular expression that - will match either A or B. An arbitrary number of REs can be separated by the +``|`` + ``A|B``, where *A* and *B* can be arbitrary REs, creates a regular expression that + will match either *A* or *B*. An arbitrary number of REs can be separated by the ``'|'`` in this way. This can be used inside groups (see below) as well. As the target string is scanned, REs separated by ``'|'`` are tried from left to right. 
When one pattern completely matches, that branch is accepted. This means - that once ``A`` matches, ``B`` will not be tested further, even if it would + that once *A* matches, *B* will not be tested further, even if it would produce a longer overall match. In other words, the ``'|'`` operator is never greedy. To match a literal ``'|'``, use ``\|``, or enclose it inside a character class, as in ``[|]``. @@ -217,7 +216,7 @@ The special characters are: start and end of a group; the contents of a group can be retrieved after a match has been performed, and can be matched later in the string with the ``\number`` special sequence, described below. To match the literals ``'('`` or ``')'``, - use ``\(`` or ``\)``, or enclose them inside a character class: ``[(] [)]``. + use ``\(`` or ``\)``, or enclose them inside a character class: ``[(]``, ``[)]``. ``(?...)`` This is an extension notation (a ``'?'`` following a ``'('`` is not meaningful @@ -232,10 +231,11 @@ The special characters are: letters set the corresponding flags: :const:`re.A` (ASCII-only matching), :const:`re.I` (ignore case), :const:`re.L` (locale dependent), :const:`re.M` (multi-line), :const:`re.S` (dot matches all), - and :const:`re.X` (verbose), for the entire regular expression. (The - flags are described in :ref:`contents-of-module-re`.) This - is useful if you wish to include the flags as part of the regular - expression, instead of passing a *flag* argument to the + :const:`re.U` (Unicode matching), and :const:`re.X` (verbose), + for the entire regular expression. + (The flags are described in :ref:`contents-of-module-re`.) + This is useful if you wish to include the flags as part of the + regular expression, instead of passing a *flag* argument to the :func:`re.compile` function. Flags should be used first in the expression string. 
@@ -272,10 +272,10 @@ The special characters are: | in the same pattern itself | * ``(?P=quote)`` (as shown) | | | * ``\1`` | +---------------------------------------+----------------------------------+ - | when processing match object ``m`` | * ``m.group('quote')`` | + | when processing match object *m* | * ``m.group('quote')`` | | | * ``m.end('quote')`` (etc.) | +---------------------------------------+----------------------------------+ - | in a string passed to the ``repl`` | * ``\g<quote>`` | + | in a string passed to the *repl* | * ``\g<quote>`` | | argument of ``re.sub()`` | * ``\g<1>`` | | | * ``\1`` | +---------------------------------------+----------------------------------+ @@ -289,18 +289,18 @@ The special characters are: ``(?=...)`` Matches if ``...`` matches next, but doesn't consume any of the string. This is - called a lookahead assertion. For example, ``Isaac (?=Asimov)`` will match + called a :dfn:`lookahead assertion`. For example, ``Isaac (?=Asimov)`` will match ``'Isaac '`` only if it's followed by ``'Asimov'``. ``(?!...)`` - Matches if ``...`` doesn't match next. This is a negative lookahead assertion. + Matches if ``...`` doesn't match next. This is a :dfn:`negative lookahead assertion`. For example, ``Isaac (?!Asimov)`` will match ``'Isaac '`` only if it's *not* followed by ``'Asimov'``. ``(?<=...)`` Matches if the current position in the string is preceded by a match for ``...`` that ends at the current position. This is called a :dfn:`positive lookbehind - assertion`. ``(?<=abc)def`` will find a match in ``abcdef``, since the + assertion`. ``(?<=abc)def`` will find a match in ``'abcdef'``, since the lookbehind will back up 3 characters and check if the contained pattern matches. The contained pattern must only match strings of some fixed length, meaning that ``abc`` or ``a|b`` are allowed, but ``a*`` and ``a{3,4}`` are not. Note that @@ -358,26 +358,26 @@ character ``'$'``. 
``\b`` Matches the empty string, but only at the beginning or end of a word. - A word is defined as a sequence of Unicode alphanumeric or underscore - characters, so the end of a word is indicated by whitespace or a - non-alphanumeric, non-underscore Unicode character. Note that formally, + A word is defined as a sequence of word characters. Note that formally, ``\b`` is defined as the boundary between a ``\w`` and a ``\W`` character (or vice versa), or between ``\w`` and the beginning/end of the string. This means that ``r'\bfoo\b'`` matches ``'foo'``, ``'foo.'``, ``'(foo)'``, ``'bar foo baz'`` but not ``'foobar'`` or ``'foo3'``. - By default Unicode alphanumerics are the ones used, but this can be changed - by using the :const:`ASCII` flag. Inside a character range, ``\b`` - represents the backspace character, for compatibility with Python's string - literals. + By default Unicode alphanumerics are the ones used in Unicode patterns, but + this can be changed by using the :const:`ASCII` flag. Word boundaries are + determined by the current locale if the :const:`LOCALE` flag is used. + Inside a character range, ``\b`` represents the backspace character, for + compatibility with Python's string literals. ``\B`` Matches the empty string, but only when it is *not* at the beginning or end of a word. This means that ``r'py\B'`` matches ``'python'``, ``'py3'``, ``'py2'``, but not ``'py'``, ``'py.'``, or ``'py!'``. - ``\B`` is just the opposite of ``\b``, so word characters are - Unicode alphanumerics or the underscore, although this can be changed - by using the :const:`ASCII` flag. + ``\B`` is just the opposite of ``\b``, so word characters in Unicode + patterns are Unicode alphanumerics or the underscore, although this can + be changed by using the :const:`ASCII` flag. Word boundaries are + determined by the current locale if the :const:`LOCALE` flag is used. ``\d`` For Unicode (str) patterns: @@ -387,11 +387,12 @@ character ``'$'``. 
used only ``[0-9]`` is matched (but the flag affects the entire regular expression, so in such cases using an explicit ``[0-9]`` may be a better choice). + For 8-bit (bytes) patterns: Matches any decimal digit; this is equivalent to ``[0-9]``. ``\D`` - Matches any character which is not a Unicode decimal digit. This is + Matches any character which is not a decimal digit. This is the opposite of ``\d``. If the :const:`ASCII` flag is used this becomes the equivalent of ``[^0-9]`` (but the flag affects the entire regular expression, so in such cases using an explicit ``[^0-9]`` may @@ -412,7 +413,7 @@ character ``'$'``. this is equivalent to ``[ \t\n\r\f\v]``. ``\S`` - Matches any character which is not a Unicode whitespace character. This is + Matches any character which is not a whitespace character. This is the opposite of ``\s``. If the :const:`ASCII` flag is used this becomes the equivalent of ``[^ \t\n\r\f\v]`` (but the flag affects the entire regular expression, so in such cases using an explicit ``[^ \t\n\r\f\v]`` may @@ -426,16 +427,21 @@ character ``'$'``. ``[a-zA-Z0-9_]`` is matched (but the flag affects the entire regular expression, so in such cases using an explicit ``[a-zA-Z0-9_]`` may be a better choice). + For 8-bit (bytes) patterns: Matches characters considered alphanumeric in the ASCII character set; - this is equivalent to ``[a-zA-Z0-9_]``. + this is equivalent to ``[a-zA-Z0-9_]``. If the :const:`LOCALE` flag is + used, matches characters considered alphanumeric in the current locale + and the underscore. ``\W`` - Matches any character which is not a Unicode word character. This is + Matches any character which is not a word character. This is the opposite of ``\w``. If the :const:`ASCII` flag is used this becomes the equivalent of ``[^a-zA-Z0-9_]`` (but the flag affects the entire regular expression, so in such cases using an explicit - ``[^a-zA-Z0-9_]`` may be a better choice). + ``[^a-zA-Z0-9_]`` may be a better choice). 
If the :const:`LOCALE` flag is + used, matches characters considered alphanumeric in the current locale + and the underscore. ``\Z`` Matches only at the end of the string. @@ -451,7 +457,7 @@ accepted by the regular expression parser:: only inside character classes.) ``'\u'`` and ``'\U'`` escape sequences are only recognized in Unicode -patterns. In bytes patterns they are not treated specially. +patterns. In bytes patterns they are errors. Octal escapes are included in a limited form. If the first digit is a 0, or if there are three octal digits, it is considered an octal escape. Otherwise, it is @@ -526,6 +532,7 @@ form. Make ``\w``, ``\W``, ``\b``, ``\B``, ``\d``, ``\D``, ``\s`` and ``\S`` perform ASCII-only matching instead of full Unicode matching. This is only meaningful for Unicode patterns, and is ignored for byte patterns. + Corresponds to the inline flag ``(?a)``. Note that for backward compatibility, the :const:`re.U` flag still exists (as well as its synonym :const:`re.UNICODE` and its embedded @@ -537,26 +544,40 @@ form. .. data:: DEBUG Display debug information about compiled expression. + No corresponding inline flag. .. data:: I IGNORECASE Perform case-insensitive matching; expressions like ``[A-Z]`` will also - match lowercase letters. The current locale does not change the effect of - this flag. Full Unicode matching (such as ``Ü`` matching ``ü``) also - works unless the :const:`re.ASCII` flag is also used to disable non-ASCII - matches. - + match lowercase letters. Full Unicode matching (such as ``Ü`` matching + ``ü``) also works unless the :const:`re.ASCII` flag is used to disable + non-ASCII matches. The current locale does not change the effect of this + flag unless the :const:`re.LOCALE` flag is also used. + Corresponds to the inline flag ``(?i)``. 
+ + Note that when the Unicode patterns ``[a-z]`` or ``[A-Z]`` are used in + combination with the :const:`IGNORECASE` flag, they will match the 52 ASCII + letters and 4 additional non-ASCII letters: 'İ' (U+0130, Latin capital + letter I with dot above), 'ı' (U+0131, Latin small letter dotless i), + 'ſ' (U+017F, Latin small letter long s) and 'K' (U+212A, Kelvin sign). + If the :const:`ASCII` flag is used, only letters 'a' to 'z' + and 'A' to 'Z' are matched (but the flag affects the entire regular + expression, so in such cases using an explicit ``(?-i:[a-zA-Z])`` may be + a better choice). .. data:: L LOCALE - Make ``\w``, ``\W``, ``\b``, ``\B``, ``\s`` and ``\S`` dependent on the - current locale. The use of this flag is discouraged as the locale mechanism - is very unreliable, and it only handles one "culture" at a time anyway; - you should use Unicode matching instead, which is the default in Python 3 - for Unicode (str) patterns. This flag can be used only with bytes patterns. + Make ``\w``, ``\W``, ``\b``, ``\B`` and case-insensitive matching + dependent on the current locale. This flag can be used only with bytes + patterns. The use of this flag is discouraged as the locale mechanism + is very unreliable, it only handles one "culture" at a time, and it only + works with 8-bit locales. Unicode matching is already enabled by default + in Python 3 for Unicode (str) patterns, and it is able to handle different + locales/languages. + Corresponds to the inline flag ``(?L)``. .. versionchanged:: 3.6 :const:`re.LOCALE` can be used only with bytes patterns and is @@ -577,6 +598,7 @@ form. end of each line (immediately preceding each newline). By default, ``'^'`` matches only at the beginning of the string, and ``'$'`` only at the end of the string and immediately before the newline (if any) at the end of the string. + Corresponds to the inline flag ``(?m)``. .. data:: S @@ -584,6 +606,7 @@ form. 
Make the ``'.'`` special character match any character at all, including a newline; without this flag, ``'.'`` will match anything *except* a newline. + Corresponds to the inline flag ``(?s)``. .. data:: X @@ -605,7 +628,7 @@ form. \d * # some fractional digits""", re.X) b = re.compile(r"\d+\.\d*") - + Corresponds to the inline flag ``(?x)``. .. function:: search(pattern, string, flags=0) @@ -660,7 +683,7 @@ form. If there are capturing groups in the separator and it matches at the start of the string, the result will start with an empty string. The same holds for - the end of the string: + the end of the string:: >>> re.split('(\W+)', '...words, words...') ['', '...', 'words', ', ', 'words', '...', ''] @@ -671,7 +694,7 @@ form. .. note:: :func:`split` doesn't currently split a string on an empty pattern match. - For example: + For example:: >>> re.split('x*', 'axbc') ['a', 'bc'] @@ -728,7 +751,7 @@ form. converted to a single newline character, ``\r`` is converted to a carriage return, and so forth. Unknown escapes such as ``\&`` are left alone. Backreferences, such as ``\6``, are replaced with the substring matched by group 6 in the pattern. - For example: + For example:: >>> re.sub(r'def\s+([a-zA-Z_][a-zA-Z_0-9]*)\s*\(\s*\):', ... r'static PyObject*\npy_\1(void)\n{', @@ -736,8 +759,8 @@ form. 'static PyObject*\npy_myfunc(void)\n{' If *repl* is a function, it is called for every non-overlapping occurrence of - *pattern*. The function takes a single match object argument, and returns the - replacement string. For example: + *pattern*. The function takes a single :ref:`match object <match-objects>` + argument, and returns the replacement string. For example:: >>> def dashrepl(matchobj): ... if matchobj.group(0) == '-': return ' ' @@ -747,7 +770,7 @@ form. >>> re.sub(r'\sAND\s', ' & ', 'Baked Beans And Spam', flags=re.IGNORECASE) 'Baked Beans & Spam' - The pattern may be a string or a :class:`Pattern` object. 
+ The pattern may be a string or a :ref:`pattern object <re-objects>`. The optional argument *count* is the maximum number of pattern occurrences to be replaced; *count* must be a non-negative integer. If omitted or zero, all @@ -809,6 +832,14 @@ form. >>> print('|'.join(map(re.escape, sorted(operators, reverse=True)))) /|\-|\+|\*\*|\* + This functions must not be used for the replacement string in :func:`sub` + and :func:`subn`, only backslashes should be escaped. For example:: + + >>> digits_re = r'\d+' + >>> sample = '/usr/sbin/sendmail - 0 errors, 12 warnings' + >>> print(re.sub(digits_re, digits_re.replace('\\', r'\\'), sample)) + /usr/sbin/sendmail - \d+ errors, \d+ warnings + .. versionchanged:: 3.3 The ``'_'`` character is no longer escaped. @@ -880,12 +911,12 @@ attributes: from *pos* to ``endpos - 1`` will be searched for a match. If *endpos* is less than *pos*, no match will be found; otherwise, if *rx* is a compiled regular expression object, ``rx.search(string, 0, 50)`` is equivalent to - ``rx.search(string[:50], 0)``. + ``rx.search(string[:50], 0)``. :: - >>> pattern = re.compile("d") - >>> pattern.search("dog") # Match at index 0 - <re.Match object; span=(0, 1), match='d'> - >>> pattern.search("dog", 1) # No match; search doesn't include the "d" + >>> pattern = re.compile("d") + >>> pattern.search("dog") # Match at index 0 + <re.Match object; span=(0, 1), match='d'> + >>> pattern.search("dog", 1) # No match; search doesn't include the "d" .. method:: Pattern.match(string[, pos[, endpos]]) @@ -896,12 +927,12 @@ attributes: different from a zero-length match. The optional *pos* and *endpos* parameters have the same meaning as for the - :meth:`~Pattern.search` method. + :meth:`~Pattern.search` method. :: - >>> pattern = re.compile("o") - >>> pattern.match("dog") # No match as "o" is not at the start of "dog". - >>> pattern.match("dog", 1) # Match as "o" is the 2nd character of "dog". 
- <re.Match object; span=(1, 2), match='o'> + >>> pattern = re.compile("o") + >>> pattern.match("dog") # No match as "o" is not at the start of "dog". + >>> pattern.match("dog", 1) # Match as "o" is the 2nd character of "dog". + <re.Match object; span=(1, 2), match='o'> If you want to locate a match anywhere in *string*, use :meth:`~Pattern.search` instead (see also :ref:`search-vs-match`). @@ -914,13 +945,13 @@ attributes: match the pattern; note that this is different from a zero-length match. The optional *pos* and *endpos* parameters have the same meaning as for the - :meth:`~Pattern.search` method. + :meth:`~Pattern.search` method. :: - >>> pattern = re.compile("o[gh]") - >>> pattern.fullmatch("dog") # No match as "o" is not at the start of "dog". - >>> pattern.fullmatch("ogre") # No match as not the full string matches. - >>> pattern.fullmatch("doggie", 1, 3) # Matches within given limits. - <re.Match object; span=(1, 3), match='og'> + >>> pattern = re.compile("o[gh]") + >>> pattern.fullmatch("dog") # No match as "o" is not at the start of "dog". + >>> pattern.fullmatch("ogre") # No match as not the full string matches. + >>> pattern.fullmatch("doggie", 1, 3) # Matches within given limits. + <re.Match object; span=(1, 3), match='og'> .. versionadded:: 3.4 @@ -934,14 +965,14 @@ attributes: Similar to the :func:`findall` function, using the compiled pattern, but also accepts optional *pos* and *endpos* parameters that limit the search - region like for :meth:`match`. + region like for :meth:`search`. .. method:: Pattern.finditer(string[, pos[, endpos]]) Similar to the :func:`finditer` function, using the compiled pattern, but also accepts optional *pos* and *endpos* parameters that limit the search - region like for :meth:`match`. + region like for :meth:`search`. .. method:: Pattern.sub(repl, string, count=0) @@ -1024,7 +1055,7 @@ Match objects support the following methods and attributes: pattern, an :exc:`IndexError` exception is raised. 
If a group is contained in a part of the pattern that did not match, the corresponding result is ``None``. If a group is contained in a part of the pattern that matched multiple times, - the last match is returned. + the last match is returned. :: >>> m = re.match(r"(\w+) (\w+)", "Isaac Newton, physicist") >>> m.group(0) # The entire match @@ -1041,7 +1072,7 @@ Match objects support the following methods and attributes: string argument is not used as a group name in the pattern, an :exc:`IndexError` exception is raised. - A moderately complicated example: + A moderately complicated example:: >>> m = re.match(r"(?P<first_name>\w+) (?P<last_name>\w+)", "Malcolm Reynolds") >>> m.group('first_name') @@ -1049,14 +1080,14 @@ Match objects support the following methods and attributes: >>> m.group('last_name') 'Reynolds' - Named groups can also be referred to by their index: + Named groups can also be referred to by their index:: >>> m.group(1) 'Malcolm' >>> m.group(2) 'Reynolds' - If a group matches multiple times, only the last match is accessible: + If a group matches multiple times, only the last match is accessible:: >>> m = re.match(r"(..)+", "a1b2c3") # Matches 3 times. >>> m.group(1) # Returns only the last match. @@ -1066,7 +1097,7 @@ Match objects support the following methods and attributes: .. method:: Match.__getitem__(g) This is identical to ``m.group(g)``. This allows easier access to - an individual group from a match: + an individual group from a match:: >>> m = re.match(r"(\w+) (\w+)", "Isaac Newton, physicist") >>> m[0] # The entire match @@ -1085,7 +1116,7 @@ Match objects support the following methods and attributes: many groups are in the pattern. The *default* argument is used for groups that did not participate in the match; it defaults to ``None``. 
- For example: + For example:: >>> m = re.match(r"(\d+)\.(\d+)", "24.1632") >>> m.groups() @@ -1093,7 +1124,7 @@ Match objects support the following methods and attributes: If we make the decimal place and everything after it optional, not all groups might participate in the match. These groups will default to ``None`` unless - the *default* argument is given: + the *default* argument is given:: >>> m = re.match(r"(\d+)\.?(\d+)?", "24") >>> m.groups() # Second group defaults to None. @@ -1106,7 +1137,7 @@ Match objects support the following methods and attributes: Return a dictionary containing all the *named* subgroups of the match, keyed by the subgroup name. The *default* argument is used for groups that did not - participate in the match; it defaults to ``None``. For example: + participate in the match; it defaults to ``None``. For example:: >>> m = re.match(r"(?P<first_name>\w+) (?P<last_name>\w+)", "Malcolm Reynolds") >>> m.groupdict() @@ -1129,7 +1160,7 @@ Match objects support the following methods and attributes: ``m.start(0)`` is 1, ``m.end(0)`` is 2, ``m.start(1)`` and ``m.end(1)`` are both 2, and ``m.start(2)`` raises an :exc:`IndexError` exception. - An example that will remove *remove_this* from email addresses: + An example that will remove *remove_this* from email addresses:: >>> email = "tony@tiremove_thisger.net" >>> m = re.search("remove_this", email) @@ -1175,7 +1206,7 @@ Match objects support the following methods and attributes: .. attribute:: Match.re - The regular expression object whose :meth:`~Pattern.match` or + The :ref:`regular expression object <re-objects>` whose :meth:`~Pattern.match` or :meth:`~Pattern.search` method produced this match instance. @@ -1213,7 +1244,7 @@ a 5-character string with each character representing a card, "a" for ace, "k" for king, "q" for queen, "j" for jack, "t" for 10, and "2" through "9" representing the card with that value. 
-To see if a given string is a valid hand, one could do the following: +To see if a given string is a valid hand, one could do the following:: >>> valid = re.compile(r"^[a2-9tjqk]{5}$") >>> displaymatch(valid.match("akt5q")) # Valid. @@ -1224,7 +1255,7 @@ To see if a given string is a valid hand, one could do the following: "<Match: '727ak', groups=()>" That last hand, ``"727ak"``, contained a pair, or two of the same valued cards. -To match this with a regular expression, one could use backreferences as such: +To match this with a regular expression, one could use backreferences as such:: >>> pair = re.compile(r".*(.).*\1") >>> displaymatch(pair.match("717ak")) # Pair of 7s. @@ -1326,7 +1357,7 @@ restrict the match at the beginning of the string:: Note however that in :const:`MULTILINE` mode :func:`match` only matches at the beginning of the string, whereas using :func:`search` with a regular expression -beginning with ``'^'`` will match at the beginning of each line. +beginning with ``'^'`` will match at the beginning of each line. :: >>> re.match('X', 'A\nB\nX', re.MULTILINE) # No match >>> re.search('^X', 'A\nB\nX', re.MULTILINE) # Match @@ -1342,7 +1373,7 @@ easily read and modified by Python as demonstrated in the following example that creates a phonebook. First, here is the input. Normally it may come from a file, here we are using -triple-quoted string syntax: +triple-quoted string syntax:: >>> text = """Ross McFluff: 834.345.1254 155 Elm Street ... @@ -1417,7 +1448,7 @@ Finding all Adverbs :func:`findall` matches *all* occurrences of a pattern, not just the first one as :func:`search` does. For example, if one was a writer and wanted to find all of the adverbs in some text, he or she might use :func:`findall` in -the following manner: +the following manner:: >>> text = "He was carefully disguised but captured quickly by police." 
>>> re.findall(r"\w+ly", text) @@ -1431,7 +1462,7 @@ If one wants more information about all matches of a pattern than the matched text, :func:`finditer` is useful as it provides :ref:`match objects <match-objects>` instead of strings. Continuing with the previous example, if one was a writer who wanted to find all of the adverbs *and their positions* in -some text, he or she would use :func:`finditer` in the following manner: +some text, he or she would use :func:`finditer` in the following manner:: >>> text = "He was carefully disguised but captured quickly by police." >>> for m in re.finditer(r"\w+ly", text): @@ -1446,7 +1477,7 @@ Raw String Notation Raw string notation (``r"text"``) keeps regular expressions sane. Without it, every backslash (``'\'``) in a regular expression would have to be prefixed with another one to escape it. For example, the two following lines of code are -functionally identical: +functionally identical:: >>> re.match(r"\W(.)\1\W", " ff ") <re.Match object; span=(0, 4), match=' ff '> @@ -1456,7 +1487,7 @@ functionally identical: When one wants to match a literal backslash, it must be escaped in the regular expression. With raw string notation, this means ``r"\\"``. Without raw string notation, one must use ``"\\\\"``, making the following lines of code -functionally identical: +functionally identical:: >>> re.match(r"\\", r"\\") <re.Match object; span=(0, 1), match='\\'> |
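
Reviewer's note (not part of the commit): several behaviours that the hunks above document can be checked with a short Python sketch. It assumes Python 3.6 or later and uses only the standard `re` module. The variable names are illustrative, apart from `digits_re` and `sample`, which are copied from the example the patch adds to the `re.escape` entry.

```python
import re

# Greedy vs. non-greedy qualifiers, using the strings from the patched text.
text = '<a> b <c>'
greedy = re.match(r'<.*>', text).group(0)    # consumes as much as possible
minimal = re.match(r'<.*?>', text).group(0)  # consumes as little as possible

# Replacement strings: escape only backslashes, not the whole string.
# This mirrors the example added to the re.escape() documentation.
digits_re = r'\d+'
sample = '/usr/sbin/sendmail - 0 errors, 12 warnings'
literal_repl = digits_re.replace('\\', r'\\')
replaced = re.sub(digits_re, literal_repl, sample)

# IGNORECASE on a str pattern: [a-z] also matches a few non-ASCII letters,
# such as the Kelvin sign (U+212A), unless the ASCII flag is also given.
kelvin = '\u212a'
kelvin_matches = re.fullmatch(r'[a-z]', kelvin, re.IGNORECASE) is not None
kelvin_matches_ascii = (
    re.fullmatch(r'[a-z]', kelvin, re.IGNORECASE | re.ASCII) is not None
)

# Pattern.search() with pos: the search starts at index 1, so "d" is skipped.
pattern = re.compile('d')
found_from_start = pattern.search('dog') is not None
found_from_one = pattern.search('dog', 1) is not None

if __name__ == '__main__':
    print(greedy, minimal)
    print(replaced)
    print(kelvin_matches, kelvin_matches_ascii)
    print(found_from_start, found_from_one)
```

Each value is bound to a name rather than printed inline so that the behaviour can be asserted in a test instead of being compared by eye.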