From e4936b830cbe2fff52f8fd65dc0bf56ad86156eb Mon Sep 17 00:00:00 2001 From: Nick Coghlan Date: Sat, 9 Aug 2014 16:14:04 +1000 Subject: Issue #21777: separate docs for binary sequence methods --- Doc/library/stdtypes.rst | 846 ++++++++++++++++++++++++++++++++++++++++++----- Misc/NEWS | 7 + 2 files changed, 776 insertions(+), 77 deletions(-) diff --git a/Doc/library/stdtypes.rst b/Doc/library/stdtypes.rst index e41fa6e..2dd57bc 100644 --- a/Doc/library/stdtypes.rst +++ b/Doc/library/stdtypes.rst @@ -1493,7 +1493,9 @@ expression support in the :mod:`re` module). .. method:: str.center(width[, fillchar]) Return centered in a string of length *width*. Padding is done using the - specified *fillchar* (default is a space). + specified *fillchar* (default is an ASCII space). The original string is + returned if *width* is less than or equal to ``len(s)``. + .. method:: str.count(sub[, start[, end]]) @@ -1598,7 +1600,8 @@ expression support in the :mod:`re` module). .. method:: str.index(sub[, start[, end]]) - Like :meth:`find`, but raise :exc:`ValueError` when the substring is not found. + Like :meth:`~str.find`, but raise :exc:`ValueError` when the substring is + not found. .. method:: str.isalnum() @@ -1701,9 +1704,9 @@ expression support in the :mod:`re` module). .. method:: str.ljust(width[, fillchar]) - Return the string left justified in a string of length *width*. Padding is done - using the specified *fillchar* (default is a space). The original string is - returned if *width* is less than or equal to ``len(s)``. + Return the string left justified in a string of length *width*. Padding is + done using the specified *fillchar* (default is an ASCII space). The + original string is returned if *width* is less than or equal to ``len(s)``. .. method:: str.lower() @@ -1720,7 +1723,7 @@ expression support in the :mod:`re` module). Return a copy of the string with leading characters removed. The *chars* argument is a string specifying the set of characters to be removed. 
If omitted or ``None``, the *chars* argument defaults to removing whitespace. The *chars* - argument is not a prefix; rather, all combinations of its values are stripped: + argument is not a prefix; rather, all combinations of its values are stripped:: >>> ' spacious '.lstrip() 'spacious ' @@ -1773,9 +1776,9 @@ expression support in the :mod:`re` module). .. method:: str.rjust(width[, fillchar]) - Return the string right justified in a string of length *width*. Padding is done - using the specified *fillchar* (default is a space). The original string is - returned if *width* is less than or equal to ``len(s)``. + Return the string right justified in a string of length *width*. Padding is + done using the specified *fillchar* (default is an ASCII space). The + original string is returned if *width* is less than or equal to ``len(s)``. .. method:: str.rpartition(sep) @@ -1800,7 +1803,7 @@ expression support in the :mod:`re` module). Return a copy of the string with trailing characters removed. The *chars* argument is a string specifying the set of characters to be removed. If omitted or ``None``, the *chars* argument defaults to removing whitespace. The *chars* - argument is not a suffix; rather, all combinations of its values are stripped: + argument is not a suffix; rather, all combinations of its values are stripped:: >>> ' spacious '.rstrip() ' spacious' @@ -1822,6 +1825,15 @@ expression support in the :mod:`re` module). (for example, ``'1<>2<>3'.split('<>')`` returns ``['1', '2', '3']``). Splitting an empty string with a specified separator returns ``['']``. 
+ For example:: + + >>> '1,2,3'.split(',') + ['1', '2', '3'] + >>> '1,2,3'.split(',', maxsplit=1) + ['1', '2,3'] + >>> '1,2,,3,'.split(',') + ['1', '2', '', '3', ''] + If *sep* is not specified or is ``None``, a different splitting algorithm is applied: runs of consecutive whitespace are regarded as a single separator, and the result will contain no empty strings at the start or end if the @@ -1829,8 +1841,14 @@ expression support in the :mod:`re` module). string or a string consisting of just whitespace with a ``None`` separator returns ``[]``. - For example, ``' 1 2 3 '.split()`` returns ``['1', '2', '3']``, and - ``' 1 2 3 '.split(None, 1)`` returns ``['1', '2 3 ']``. + For example:: + + >>> '1 2 3'.split() + ['1', '2', '3'] + >>> '1 2 3'.split(maxsplit=1) + ['1', '2 3'] + >>> ' 1 2 3 '.split() + ['1', '2', '3'] .. index:: @@ -1843,13 +1861,28 @@ expression support in the :mod:`re` module). Line breaks are not included in the resulting list unless *keepends* is given and true. - For example, ``'ab c\n\nde fg\rkl\r\n'.splitlines()`` returns - ``['ab c', '', 'de fg', 'kl']``, while the same call with ``splitlines(True)`` - returns ``['ab c\n', '\n', 'de fg\r', 'kl\r\n']``. + For example:: + + >>> 'ab c\n\nde fg\rkl\r\n'.splitlines() + ['ab c', '', 'de fg', 'kl'] + >>> 'ab c\n\nde fg\rkl\r\n'.splitlines(keepends=True) + ['ab c\n', '\n', 'de fg\r', 'kl\r\n'] Unlike :meth:`~str.split` when a delimiter string *sep* is given, this method returns an empty list for the empty string, and a terminal line - break does not result in an extra line. + break does not result in an extra line:: + + >>> "".splitlines() + [] + >>> "One line\n".splitlines() + ['One line'] + + For comparison, ``split('\n')`` gives:: + + >>> ''.split('\n') + [''] + >>> 'Two lines\n'.split('\n') + ['Two lines', ''] .. method:: str.startswith(prefix[, start[, end]]) @@ -1866,7 +1899,7 @@ expression support in the :mod:`re` module). 
The *chars* argument is a string specifying the set of characters to be removed. If omitted or ``None``, the *chars* argument defaults to removing whitespace. The *chars* argument is not a prefix or suffix; rather, all combinations of its - values are stripped: + values are stripped:: >>> ' spacious '.strip() 'spacious' @@ -1886,6 +1919,11 @@ expression support in the :mod:`re` module). Return a titlecased version of the string where words start with an uppercase character and the remaining characters are lowercase. + For example:: + + >>> 'Hello world'.title() + 'Hello World' + The algorithm uses a simple language-independent definition of a word as groups of consecutive letters. The definition works in many contexts but it means that apostrophes in contractions and possessives form word @@ -1938,9 +1976,18 @@ expression support in the :mod:`re` module). .. method:: str.zfill(width) - Return the numeric string left filled with zeros in a string of length - *width*. A sign prefix is handled correctly. The original string is - returned if *width* is less than or equal to ``len(s)``. + Return a copy of the string left filled with ASCII ``'0'`` digits to + make a string of length *width*. A leading sign prefix (``'+'``/``'-'``) + is handled by inserting the padding *after* the sign character rather + than before. The original string is returned if *width* is less than + or equal to ``len(s)``. + + For example:: + + >>> "42".zfill(5) + '00042' + >>> "-42".zfill(5) + '-0042' @@ -2198,16 +2245,28 @@ other ways: Also see the :ref:`bytes ` built-in. -Since bytes objects are sequences of integers, for a bytes object *b*, -``b[0]`` will be an integer, while ``b[0:1]`` will be a bytes object of -length 1. (This contrasts with text strings, where both indexing and -slicing will produce a string of length 1) +Since 2 hexadecimal digits correspond precisely to a single byte, hexadecimal +numbers are a commonly used format for describing binary data. 
Accordingly, +the bytes type has an additional class method to read data in that format: + +.. classmethod:: bytes.fromhex(string) + + This :class:`bytes` class method returns a bytes object, decoding the + given string object. The string must contain two hexadecimal digits per + byte, with ASCII spaces being ignored. + + >>> bytes.fromhex('2Ef0 F1f2 ') + b'.\xf0\xf1\xf2' + +Since bytes objects are sequences of integers (akin to a tuple), for a bytes +object *b*, ``b[0]`` will be an integer, while ``b[0:1]`` will be a bytes +object of length 1. (This contrasts with text strings, where both indexing +and slicing will produce a string of length 1) The representation of bytes objects uses the literal format (``b'...'``) since it is often more useful than e.g. ``bytes([46, 46, 46])``. You can always convert a bytes object into a list of integers using ``list(b)``. - .. note:: For Python 2.x users: In the Python 2.x series, a variety of implicit conversions between 8-bit strings (the closest thing 2.x offers to a @@ -2241,6 +2300,29 @@ common bytes and bytearray operations described in :ref:`bytes-methods`. Also see the :ref:`bytearray ` built-in. +Since 2 hexadecimal digits correspond precisely to a single byte, hexadecimal +numbers are a commonly used format for describing binary data. Accordingly, +the bytearray type has an additional class method to read data in that format: + +.. classmethod:: bytearray.fromhex(string) + + This :class:`bytearray` class method returns a bytearray object, decoding + the given string object. The string must contain two hexadecimal digits + per byte, with ASCII spaces being ignored. + + >>> bytearray.fromhex('2Ef0 F1f2 ') + bytearray(b'.\xf0\xf1\xf2') + +Since bytearray objects are sequences of integers (akin to a list), for a +bytearray object *b*, ``b[0]`` will be an integer, while ``b[0:1]`` will be +a bytearray object of length 1. 
(This contrasts with text strings, where +both indexing and slicing will produce a string of length 1) + +The representation of bytearray objects uses the bytes literal format +(``bytearray(b'...')``) since it is often more useful than e.g. +``bytearray([46, 46, 46])``. You can always convert a bytearray object into +a list of integers using ``list(b)``. + .. _bytes-methods: @@ -2252,25 +2334,10 @@ Bytes and Bytearray Operations Both bytes and bytearray objects support the :ref:`common ` sequence operations. They interoperate not just with operands of the same -type, but with any object that supports the -:ref:`buffer protocol `. Due to this flexibility, they can be +type, but with any :term:`bytes-like object`. Due to this flexibility, they can be freely mixed in operations without causing errors. However, the return type of the result may depend on the order of operands. -Due to the common use of ASCII text as the basis for binary protocols, bytes -and bytearray objects provide almost all methods found on text strings, with -the exceptions of: - -* :meth:`str.encode` (which converts text strings to bytes objects) -* :meth:`str.format` and :meth:`str.format_map` (which are used to format - text for display to users) -* :meth:`str.isidentifier`, :meth:`str.isnumeric`, :meth:`str.isdecimal`, - :meth:`str.isprintable` (which are used to check various properties of - text strings which are not typically applicable to binary protocols). - -All other string methods are supported, although sometimes with slight -differences in functionality and semantics (as described below). - .. note:: The methods on bytes and bytearray objects don't accept strings as their @@ -2285,25 +2352,30 @@ differences in functionality and semantics (as described below). a = b"abc" b = a.replace(b"a", b"f") -Whenever a bytes or bytearray method needs to interpret the bytes as -characters (e.g. 
the :meth:`is...` methods, :meth:`split`, :meth:`strip`), -the ASCII character set is assumed (text strings use Unicode semantics). +Some bytes and bytearray operations assume the use of ASCII compatible +binary formats, and hence should be avoided when working with arbitrary +binary data. These restrictions are covered below. .. note:: - Using these ASCII based methods to manipulate binary data that is not + Using these ASCII based operations to manipulate binary data that is not stored in an ASCII based format may lead to data corruption. -The search operations (:keyword:`in`, :meth:`count`, :meth:`find`, -:meth:`index`, :meth:`rfind` and :meth:`rindex`) all accept both integers -in the range 0 to 255 (inclusive) as well as bytes and byte array sequences. +The following methods on bytes and bytearray objects can be used with +arbitrary binary data. -.. versionchanged:: 3.3 - All of the search methods also accept an integer in the range 0 to 255 - (inclusive) as their first argument. +.. method:: bytes.count(sub[, start[, end]]) + bytearray.count(sub[, start[, end]]) + Return the number of non-overlapping occurrences of subsequence *sub* in + the range [*start*, *end*]. Optional arguments *start* and *end* are + interpreted as in slice notation. + + The subsequence to search for may be any :term:`bytes-like object` or an + integer in the range 0 to 255. + + .. versionchanged:: 3.3 + Also accept an integer in the range 0 to 255 as the subsequence. -Each bytes and bytearray instance provides a :meth:`~bytes.decode` convenience -method that is the inverse of :meth:`str.encode`: .. method:: bytes.decode(encoding="utf-8", errors="strict") bytearray.decode(encoding="utf-8", errors="strict") @@ -2316,37 +2388,175 @@ method that is the inverse of :meth:`str.encode`: :func:`codecs.register_error`, see section :ref:`codec-base-classes`. For a list of possible encodings, see section :ref:`standard-encodings`. + .. 
note:: + + Passing the *encoding* argument to :class:`str` allows decoding any + :term:`bytes-like object` directly, without needing to make a temporary + bytes or bytearray object. + .. versionchanged:: 3.1 Added support for keyword arguments. -Since 2 hexadecimal digits correspond precisely to a single byte, hexadecimal -numbers are a commonly used format for describing binary data. Accordingly, -the bytes and bytearray types have an additional class method to read data in -that format: -.. classmethod:: bytes.fromhex(string) - bytearray.fromhex(string) +.. method:: bytes.endswith(suffix[, start[, end]]) + bytearray.endswith(suffix[, start[, end]]) - This :class:`bytes` class method returns a bytes or bytearray object, - decoding the given string object. The string must contain two hexadecimal - digits per byte, spaces are ignored. + Return ``True`` if the binary data ends with the specified *suffix*, + otherwise return ``False``. *suffix* can also be a tuple of suffixes to + look for. With optional *start*, test beginning at that position. With + optional *end*, stop comparing at that position. - >>> bytes.fromhex('2Ef0 F1f2 ') - b'.\xf0\xf1\xf2' + The suffix(es) to search for may be any :term:`bytes-like object`. + + +.. method:: bytes.find(sub[, start[, end]]) + bytearray.find(sub[, start[, end]]) + + Return the lowest index in the data where the subsequence *sub* is found, + such that *sub* is contained in the slice ``s[start:end]``. Optional + arguments *start* and *end* are interpreted as in slice notation. Return + ``-1`` if *sub* is not found. + + The subsequence to search for may be any :term:`bytes-like object` or an + integer in the range 0 to 255. + + .. note:: + + The :meth:`~bytes.find` method should be used only if you need to know the + position of *sub*. To check if *sub* is a substring or not, use the + :keyword:`in` operator:: + + >>> b'Py' in b'Python' + True + + .. 
versionchanged:: 3.3 + Also accept an integer in the range 0 to 255 as the subsequence. -The maketrans and translate methods differ in semantics from the versions -available on strings: +.. method:: bytes.index(sub[, start[, end]]) + bytearray.index(sub[, start[, end]]) + + Like :meth:`~bytes.find`, but raise :exc:`ValueError` when the + subsequence is not found. + + The subsequence to search for may be any :term:`bytes-like object` or an + integer in the range 0 to 255. + + .. versionchanged:: 3.3 + Also accept an integer in the range 0 to 255 as the subsequence. + + +.. method:: bytes.join(iterable) + bytearray.join(iterable) + + Return a bytes or bytearray object which is the concatenation of the + binary data sequences in the :term:`iterable` *iterable*. A + :exc:`TypeError` will be raised if there are any values in *iterable* + that are not :term:`bytes-like objects `, including + :class:`str` objects. The separator between elements is the contents + of the bytes or bytearray object providing this method. + + +.. staticmethod:: bytes.maketrans(from, to) + bytearray.maketrans(from, to) + + This static method returns a translation table usable for + :meth:`bytes.translate` that will map each character in *from* into the + character at the same position in *to*; *from* and *to* must both be + :term:`bytes-like objects ` and have the same length. + + .. versionadded:: 3.1 + + +.. method:: bytes.partition(sep) + bytearray.partition(sep) + + Split the sequence at the first occurrence of *sep*, and return a 3-tuple + containing the part before the separator, the separator, and the part + after the separator. If the separator is not found, return a 3-tuple + containing a copy of the original sequence, followed by two empty bytes or + bytearray objects. + + The separator to search for may be any :term:`bytes-like object`. + + +.. 
method:: bytes.replace(old, new[, count]) + bytearray.replace(old, new[, count]) + + Return a copy of the sequence with all occurrences of subsequence *old* + replaced by *new*. If the optional argument *count* is given, only the + first *count* occurrences are replaced. + + The subsequence to search for and its replacement may be any + :term:`bytes-like object`. + + .. note:: + + The bytearray version of this method does *not* operate in place - it + always produces a new object, even if no changes were made. + + +.. method:: bytes.rfind(sub[, start[, end]]) + bytearray.rfind(sub[, start[, end]]) + + Return the highest index in the sequence where the subsequence *sub* is + found, such that *sub* is contained within ``s[start:end]``. Optional + arguments *start* and *end* are interpreted as in slice notation. Return + ``-1`` on failure. + + The subsequence to search for may be any :term:`bytes-like object` or an + integer in the range 0 to 255. + + .. versionchanged:: 3.3 + Also accept an integer in the range 0 to 255 as the subsequence. + + +.. method:: bytes.rindex(sub[, start[, end]]) + bytearray.rindex(sub[, start[, end]]) + + Like :meth:`~bytes.rfind` but raises :exc:`ValueError` when the + subsequence *sub* is not found. + + The subsequence to search for may be any :term:`bytes-like object` or an + integer in the range 0 to 255. + + .. versionchanged:: 3.3 + Also accept an integer in the range 0 to 255 as the subsequence. + + +.. method:: bytes.rpartition(sep) + bytearray.rpartition(sep) + + Split the sequence at the last occurrence of *sep*, and return a 3-tuple + containing the part before the separator, the separator, and the part + after the separator. If the separator is not found, return a 3-tuple + containing a copy of the original sequence, followed by two empty bytes or + bytearray objects. + + The separator to search for may be any :term:`bytes-like object`. + + +.. 
method:: bytes.startswith(prefix[, start[, end]]) + bytearray.startswith(prefix[, start[, end]]) + + Return ``True`` if the binary data starts with the specified *prefix*, + otherwise return ``False``. *prefix* can also be a tuple of prefixes to + look for. With optional *start*, test beginning at that position. With + optional *end*, stop comparing at that position. + + The prefix(es) to search for may be any :term:`bytes-like object`. + .. method:: bytes.translate(table[, delete]) bytearray.translate(table[, delete]) Return a copy of the bytes or bytearray object where all bytes occurring in - the optional argument *delete* are removed, and the remaining bytes have been - mapped through the given translation table, which must be a bytes object of - length 256. + the optional argument *delete* are removed, and the remaining bytes have + been mapped through the given translation table, which must be a bytes + object of length 256. - You can use the :func:`bytes.maketrans` method to create a translation table. + You can use the :func:`bytes.maketrans` method to create a translation + table. Set the *table* argument to ``None`` for translations that only delete characters:: @@ -2355,15 +2565,497 @@ available on strings: b'rd ths shrt txt' -.. staticmethod:: bytes.maketrans(from, to) - bytearray.maketrans(from, to) +The following methods on bytes and bytearray objects have default behaviours +that assume the use of ASCII compatible binary formats, but can still be used +with arbitrary binary data by passing appropriate arguments. Note that all of +the bytearray methods in this section do *not* operate in place, and instead +produce new objects. - This static method returns a translation table usable for - :meth:`bytes.translate` that will map each character in *from* into the - character at the same position in *to*; *from* and *to* must be bytes objects - and have the same length. +.. method:: bytes.center(width[, fillbyte]) + bytearray.center(width[, fillbyte]) - .. 
versionadded:: 3.1 + Return a copy of the object centered in a sequence of length *width*. + Padding is done using the specified *fillbyte* (default is an ASCII + space). For :class:`bytes` objects, the original sequence is returned if + *width* is less than or equal to ``len(s)``. + + .. note:: + + The bytearray version of this method does *not* operate in place - + it always produces a new object, even if no changes were made. + + +.. method:: bytes.ljust(width[, fillbyte]) + bytearray.ljust(width[, fillbyte]) + + Return a copy of the object left justified in a sequence of length *width*. + Padding is done using the specified *fillbyte* (default is an ASCII + space). For :class:`bytes` objects, the original sequence is returned if + *width* is less than or equal to ``len(s)``. + + .. note:: + + The bytearray version of this method does *not* operate in place - + it always produces a new object, even if no changes were made. + + +.. method:: bytes.lstrip([chars]) + bytearray.lstrip([chars]) + + Return a copy of the sequence with specified leading bytes removed. The + *chars* argument is a binary sequence specifying the set of byte values to + be removed - the name refers to the fact this method is usually used with + ASCII characters. If omitted or ``None``, the *chars* argument defaults + to removing ASCII whitespace. The *chars* argument is not a prefix; + rather, all combinations of its values are stripped:: + + >>> b' spacious '.lstrip() + b'spacious ' + >>> b'www.example.com'.lstrip(b'cmowz.') + b'example.com' + + The binary sequence of byte values to remove may be any + :term:`bytes-like object`. + + .. note:: + + The bytearray version of this method does *not* operate in place - + it always produces a new object, even if no changes were made. + + +.. method:: bytes.rjust(width[, fillbyte]) + bytearray.rjust(width[, fillbyte]) + + Return a copy of the object right justified in a sequence of length *width*. 
+ Padding is done using the specified *fillbyte* (default is an ASCII + space). For :class:`bytes` objects, the original sequence is returned if + *width* is less than or equal to ``len(s)``. + + .. note:: + + The bytearray version of this method does *not* operate in place - + it always produces a new object, even if no changes were made. + + +.. method:: bytes.rsplit(sep=None, maxsplit=-1) + bytearray.rsplit(sep=None, maxsplit=-1) + + Split the binary sequence into subsequences of the same type, using *sep* + as the delimiter string. If *maxsplit* is given, at most *maxsplit* splits + are done, the *rightmost* ones. If *sep* is not specified or ``None``, + any subsequence consisting solely of ASCII whitespace is a separator. + Except for splitting from the right, :meth:`rsplit` behaves like + :meth:`split` which is described in detail below. + + +.. method:: bytes.rstrip([chars]) + bytearray.rstrip([chars]) + + Return a copy of the sequence with specified trailing bytes removed. The + *chars* argument is a binary sequence specifying the set of byte values to + be removed - the name refers to the fact this method is usually used with + ASCII characters. If omitted or ``None``, the *chars* argument defaults to + removing ASCII whitespace. The *chars* argument is not a suffix; rather, + all combinations of its values are stripped:: + + >>> b' spacious '.rstrip() + b' spacious' + >>> b'mississippi'.rstrip(b'ipz') + b'mississ' + + The binary sequence of byte values to remove may be any + :term:`bytes-like object`. + + .. note:: + + The bytearray version of this method does *not* operate in place - + it always produces a new object, even if no changes were made. + + +.. method:: bytes.split(sep=None, maxsplit=-1) + bytearray.split(sep=None, maxsplit=-1) + + Split the binary sequence into subsequences of the same type, using *sep* + as the delimiter string. 
If *maxsplit* is given and non-negative, at most + *maxsplit* splits are done (thus, the list will have at most ``maxsplit+1`` + elements). If *maxsplit* is not specified or is ``-1``, then there is no + limit on the number of splits (all possible splits are made). + + If *sep* is given, consecutive delimiters are not grouped together and are + deemed to delimit empty subsequences (for example, ``b'1,,2'.split(b',')`` + returns ``[b'1', b'', b'2']``). The *sep* argument may consist of a + multibyte sequence (for example, ``b'1<>2<>3'.split(b'<>')`` returns + ``[b'1', b'2', b'3']``). Splitting an empty sequence with a specified + separator returns ``[b'']`` or ``[bytearray(b'')]`` depending on the type + of object being split. The *sep* argument may be any + :term:`bytes-like object`. + + For example:: + + >>> b'1,2,3'.split(b',') + [b'1', b'2', b'3'] + >>> b'1,2,3'.split(b',', maxsplit=1) + [b'1', b'2,3'] + >>> b'1,2,,3,'.split(b',') + [b'1', b'2', b'', b'3', b''] + + If *sep* is not specified or is ``None``, a different splitting algorithm + is applied: runs of consecutive ASCII whitespace are regarded as a single + separator, and the result will contain no empty strings at the start or + end if the sequence has leading or trailing whitespace. Consequently, + splitting an empty sequence or a sequence consisting solely of ASCII + whitespace without a specified separator returns ``[]``. + + For example:: + + + >>> b'1 2 3'.split() + [b'1', b'2', b'3'] + >>> b'1 2 3'.split(maxsplit=1) + [b'1', b'2 3'] + >>> b' 1 2 3 '.split() + [b'1', b'2', b'3'] + + +.. method:: bytes.strip([chars]) + bytearray.strip([chars]) + + Return a copy of the sequence with specified leading and trailing bytes + removed. The *chars* argument is a binary sequence specifying the set of + byte values to be removed - the name refers to the fact this method is + usually used with ASCII characters. If omitted or ``None``, the *chars* + argument defaults to removing ASCII whitespace. 
The *chars* argument is + not a prefix or suffix; rather, all combinations of its values are + stripped:: + + >>> b' spacious '.strip() + b'spacious' + >>> b'www.example.com'.strip(b'cmowz.') + b'example' + + The binary sequence of byte values to remove may be any + :term:`bytes-like object`. + + .. note:: + + The bytearray version of this method does *not* operate in place - + it always produces a new object, even if no changes were made. + + +The following methods on bytes and bytearray objects assume the use of ASCII +compatible binary formats and should not be applied to arbitrary binary data. +Note that all of the bytearray methods in this section do *not* operate in +place, and instead produce new objects. + +.. method:: bytes.capitalize() + bytearray.capitalize() + + Return a copy of the sequence with each byte interpreted as an ASCII + character, and the first byte capitalized and the rest lowercased. + Non-ASCII byte values are passed through unchanged. + + .. note:: + + The bytearray version of this method does *not* operate in place - it + always produces a new object, even if no changes were made. + + +.. method:: bytes.expandtabs(tabsize=8) + bytearray.expandtabs(tabsize=8) + + Return a copy of the sequence where all ASCII tab characters are replaced + by one or more ASCII spaces, depending on the current column and the given + tab size. Tab positions occur every *tabsize* bytes (default is 8, + giving tab positions at columns 0, 8, 16 and so on). To expand the + sequence, the current column is set to zero and the sequence is examined + byte by byte. If the byte is an ASCII tab character (``b'\t'``), one or + more space characters are inserted in the result until the current column + is equal to the next tab position. (The tab character itself is not + copied.) If the current byte is an ASCII newline (``b'\n'``) or + carriage return (``b'\r'``), it is copied and the current column is reset + to zero. 
Any other byte value is copied unchanged and the current column + is incremented by one regardless of how the byte value is represented when + printed:: + + >>> b'01\t012\t0123\t01234'.expandtabs() + b'01 012 0123 01234' + >>> b'01\t012\t0123\t01234'.expandtabs(4) + b'01 012 0123 01234' + + .. note:: + + The bytearray version of this method does *not* operate in place - it + always produces a new object, even if no changes were made. + + +.. method:: bytes.isalnum() + bytearray.isalnum() + + Return true if all bytes in the sequence are alphabetical ASCII characters + or ASCII decimal digits and the sequence is not empty, false otherwise. + Alphabetic ASCII characters are those byte values in the sequence + ``b'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'``. ASCII decimal + digits are those byte values in the sequence ``b'0123456789'``. + + For example:: + + >>> b'ABCabc1'.isalnum() + True + >>> b'ABC abc1'.isalnum() + False + + +.. method:: bytes.isalpha() + bytearray.isalpha() + + Return true if all bytes in the sequence are alphabetic ASCII characters + and the sequence is not empty, false otherwise. Alphabetic ASCII + characters are those byte values in the sequence + ``b'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'``. + + For example:: + + >>> b'ABCabc'.isalpha() + True + >>> b'ABCabc1'.isalpha() + False + + +.. method:: bytes.isdigit() + bytearray.isdigit() + + Return true if all bytes in the sequence are ASCII decimal digits + and the sequence is not empty, false otherwise. ASCII decimal digits are + those byte values in the sequence ``b'0123456789'``. + + For example:: + + >>> b'1234'.isdigit() + True + >>> b'1.23'.isdigit() + False + + +.. method:: bytes.islower() + bytearray.islower() + + Return true if there is at least one lowercase ASCII character + in the sequence and no uppercase ASCII characters, false otherwise. 
+ + For example:: + + >>> b'hello world'.islower() + True + >>> b'Hello world'.islower() + False + + Lowercase ASCII characters are those byte values in the sequence + ``b'abcdefghijklmnopqrstuvwxyz'``. Uppercase ASCII characters + are those byte values in the sequence ``b'ABCDEFGHIJKLMNOPQRSTUVWXYZ'``. + + +.. method:: bytes.isspace() + bytearray.isspace() + + Return true if all bytes in the sequence are ASCII whitespace and the + sequence is not empty, false otherwise. ASCII whitespace characters are + those byte values in the sequence ``b' \t\n\r\x0b\f'`` (space, tab, newline, + carriage return, vertical tab, form feed). + + +.. method:: bytes.istitle() + bytearray.istitle() + + Return true if the sequence is ASCII titlecase and the sequence is not + empty, false otherwise. See :meth:`bytes.title` for more details on the + definition of "titlecase". + + For example:: + + >>> b'Hello World'.istitle() + True + >>> b'Hello world'.istitle() + False + + +.. method:: bytes.isupper() + bytearray.isupper() + + Return true if there is at least one uppercase alphabetic ASCII character + in the sequence and no lowercase ASCII characters, false otherwise. + + For example:: + + >>> b'HELLO WORLD'.isupper() + True + >>> b'Hello world'.isupper() + False + + Lowercase ASCII characters are those byte values in the sequence + ``b'abcdefghijklmnopqrstuvwxyz'``. Uppercase ASCII characters + are those byte values in the sequence ``b'ABCDEFGHIJKLMNOPQRSTUVWXYZ'``. + + +.. method:: bytes.lower() + bytearray.lower() + + Return a copy of the sequence with all the uppercase ASCII characters + converted to their corresponding lowercase counterpart. + + For example:: + + >>> b'Hello World'.lower() + b'hello world' + + Lowercase ASCII characters are those byte values in the sequence + ``b'abcdefghijklmnopqrstuvwxyz'``. Uppercase ASCII characters + are those byte values in the sequence ``b'ABCDEFGHIJKLMNOPQRSTUVWXYZ'``. + + .. 
note:: + + The bytearray version of this method does *not* operate in place - it + always produces a new object, even if no changes were made. + + +.. index:: + single: universal newlines; bytes.splitlines method + single: universal newlines; bytearray.splitlines method + +.. method:: bytes.splitlines(keepends=False) + bytearray.splitlines(keepends=False) + + Return a list of the lines in the binary sequence, breaking at ASCII + line boundaries. This method uses the :term:`universal newlines` approach + to splitting lines. Line breaks are not included in the resulting list + unless *keepends* is given and true. + + For example:: + + >>> b'ab c\n\nde fg\rkl\r\n'.splitlines() + [b'ab c', b'', b'de fg', b'kl'] + >>> b'ab c\n\nde fg\rkl\r\n'.splitlines(keepends=True) + [b'ab c\n', b'\n', b'de fg\r', b'kl\r\n'] + + Unlike :meth:`~bytes.split` when a delimiter string *sep* is given, this + method returns an empty list for the empty string, and a terminal line + break does not result in an extra line:: + + >>> b"".split(b'\n'), b"Two lines\n".split(b'\n') + ([b''], [b'Two lines', b'']) + >>> b"".splitlines(), b"One line\n".splitlines() + ([], [b'One line']) + + +.. method:: bytes.swapcase() + bytearray.swapcase() + + Return a copy of the sequence with all the lowercase ASCII characters + converted to their corresponding uppercase counterpart and vice-versa. + + For example:: + + >>> b'Hello World'.swapcase() + b'hELLO wORLD' + + Lowercase ASCII characters are those byte values in the sequence + ``b'abcdefghijklmnopqrstuvwxyz'``. Uppercase ASCII characters + are those byte values in the sequence ``b'ABCDEFGHIJKLMNOPQRSTUVWXYZ'``. + + Unlike :meth:`str.swapcase`, it is always the case that + ``bin.swapcase().swapcase() == bin`` for the binary versions. Case + conversions are symmetrical in ASCII, even though that is not generally + true for arbitrary Unicode code points. + + ..
note:: + + The bytearray version of this method does *not* operate in place - it + always produces a new object, even if no changes were made. + + +.. method:: bytes.title() + bytearray.title() + + Return a titlecased version of the binary sequence where words start with + an uppercase ASCII character and the remaining characters are lowercase. + Uncased byte values are left unmodified. + + For example:: + + >>> b'Hello world'.title() + b'Hello World' + + Lowercase ASCII characters are those byte values in the sequence + ``b'abcdefghijklmnopqrstuvwxyz'``. Uppercase ASCII characters + are those byte values in the sequence ``b'ABCDEFGHIJKLMNOPQRSTUVWXYZ'``. + All other byte values are uncased. + + The algorithm uses a simple language-independent definition of a word as + groups of consecutive letters. The definition works in many contexts but + it means that apostrophes in contractions and possessives form word + boundaries, which may not be the desired result:: + + >>> b"they're bill's friends from the UK".title() + b"They'Re Bill'S Friends From The Uk" + + A workaround for apostrophes can be constructed using regular expressions:: + + >>> import re + >>> def titlecase(s): + ... return re.sub(rb"[A-Za-z]+('[A-Za-z]+)?", + ... lambda mo: mo.group(0)[0:1].upper() + + ... mo.group(0)[1:].lower(), + ... s) + ... + >>> titlecase(b"they're bill's friends.") + b"They're Bill's Friends." + + .. note:: + + The bytearray version of this method does *not* operate in place - it + always produces a new object, even if no changes were made. + + +.. method:: bytes.upper() + bytearray.upper() + + Return a copy of the sequence with all the lowercase ASCII characters + converted to their corresponding uppercase counterpart. + + For example:: + + >>> b'Hello World'.upper() + b'HELLO WORLD' + + Lowercase ASCII characters are those byte values in the sequence + ``b'abcdefghijklmnopqrstuvwxyz'``. 
Uppercase ASCII characters + are those byte values in the sequence ``b'ABCDEFGHIJKLMNOPQRSTUVWXYZ'``. + + .. note:: + + The bytearray version of this method does *not* operate in place - it + always produces a new object, even if no changes were made. + + +.. method:: bytes.zfill(width) + bytearray.zfill(width) + + Return a copy of the sequence left filled with ASCII ``b'0'`` digits to + make a sequence of length *width*. A leading sign prefix (``b'+'``/ + ``b'-'``) is handled by inserting the padding *after* the sign character + rather than before. For :class:`bytes` objects, the original sequence is + returned if *width* is less than or equal to ``len(seq)``. + + For example:: + + >>> b"42".zfill(5) + b'00042' + >>> b"-42".zfill(5) + b'-0042' + + .. note:: + + The bytearray version of this method does *not* operate in place - it + always produces a new object, even if no changes were made. .. _typememoryview: diff --git a/Misc/NEWS b/Misc/NEWS index 2b21ba5..247a4f3 100644 --- a/Misc/NEWS +++ b/Misc/NEWS @@ -283,6 +283,13 @@ Tests - Issue #21522: Added Tkinter tests for Listbox.itemconfigure(), PanedWindow.paneconfigure(), and Menu.entryconfigure(). +Documentation +------------- + +- Issue #21777: The binary sequence methods on bytes and bytearray are now + documented explicitly, rather than assuming users will be able to derive + the expected behaviour from the behaviour of the corresponding str methods. + Windows ------- -- cgit v0.12