author | Nick Coghlan <ncoghlan@gmail.com> | 2012-08-20 07:14:07 (GMT) |
---|---|---|
committer | Nick Coghlan <ncoghlan@gmail.com> | 2012-08-20 07:14:07 (GMT) |
commit | 273069cf7d79293fa0c2cca7fe6ab386a4e5c02a (patch) | |
tree | d79cb62e7b86eac5dbe59a09d6c8a4b4c4a56b82 /Doc/library | |
parent | 1685db011dd8e5b808f7c176b05bfdb14c101789 (diff) | |
Close #4966: revamp the sequence docs in order to better explain the state of modern Python
Diffstat (limited to 'Doc/library')
-rw-r--r-- | Doc/library/binary.rst | 23 |
-rw-r--r-- | Doc/library/index.rst | 3 |
-rw-r--r-- | Doc/library/stdtypes.rst | 1618 |
-rw-r--r-- | Doc/library/strings.rst | 27 |
-rw-r--r-- | Doc/library/text.rst | 24 |
5 files changed, 976 insertions, 719 deletions
diff --git a/Doc/library/binary.rst b/Doc/library/binary.rst new file mode 100644 index 0000000..51fbdc1 --- /dev/null +++ b/Doc/library/binary.rst @@ -0,0 +1,23 @@ +.. _binaryservices: + +******************** +Binary Data Services +******************** + +The modules described in this chapter provide some basic services operations +for manipulation of binary data. Other operations on binary data, specifically +in relation to file formats and network protocols, are described in the +relevant sections. + +Some libraries described under :ref:`textservices` also work with either +ASCII-compatible binary formats (for example, :mod:`re`) or all binary data +(for example, :mod:`difflib`). + +In addition, see the documentation for Python's built-in binary data types in +:ref:`binaryseq`. + +.. toctree:: + + struct.rst + codecs.rst + diff --git a/Doc/library/index.rst b/Doc/library/index.rst index 9ac688c..dc35b09 100644 --- a/Doc/library/index.rst +++ b/Doc/library/index.rst @@ -46,7 +46,8 @@ the `Python Package Index <http://pypi.python.org/pypi>`_. stdtypes.rst exceptions.rst - strings.rst + text.rst + binary.rst datatypes.rst numeric.rst functional.rst diff --git a/Doc/library/stdtypes.rst b/Doc/library/stdtypes.rst index 0e71910..3242d4a 100644 --- a/Doc/library/stdtypes.rst +++ b/Doc/library/stdtypes.rst @@ -672,7 +672,7 @@ Here are the rules in detail: To clarify the above rules, here's some example Python code, -equivalent to the builtin hash, for computing the hash of a rational +equivalent to the built-in hash, for computing the hash of a rational number, :class:`float`, or :class:`complex`:: @@ -799,110 +799,77 @@ the yield expression <yieldexpr>`. .. 
_typesseq: -Sequence Types --- :class:`str`, :class:`bytes`, :class:`bytearray`, :class:`list`, :class:`tuple`, :class:`range` -================================================================================================================== +Sequence Types --- :class:`list`, :class:`tuple`, :class:`range` +================================================================ -There are six sequence types: strings, byte sequences (:class:`bytes` objects), -byte arrays (:class:`bytearray` objects), lists, tuples, and range objects. For -other containers see the built in :class:`dict` and :class:`set` classes, and -the :mod:`collections` module. +There are three basic sequence types: lists, tuples, and range objects. +Additional sequence types tailored for processing of +:ref:`binary data <binaryseq>` and :ref:`text strings <textseq>` are +described in dedicated sections. -.. index:: - object: sequence - object: string - object: bytes - object: bytearray - object: tuple - object: list - object: range - -Strings contain Unicode characters. Their literals are written in single or -double quotes: ``'xyzzy'``, ``"frobozz"``. See :ref:`strings` for more about -string literals. In addition to the functionality described here, there are -also string-specific methods described in the :ref:`string-methods` section. - -Bytes and bytearray objects contain single bytes -- the former is immutable -while the latter is a mutable sequence. -Bytes objects can be constructed by using the -constructor, :func:`bytes`, and from literals; use a ``b`` prefix with normal -string syntax: ``b'xyzzy'``. To construct byte arrays, use the -:func:`bytearray` function. - -While string objects are sequences of characters (represented by strings of -length 1), bytes and bytearray objects are sequences of *integers* (between 0 -and 255), representing the ASCII value of single bytes. 
That means that for -a bytes or bytearray object *b*, ``b[0]`` will be an integer, while -``b[0:1]`` will be a bytes or bytearray object of length 1. The -representation of bytes objects uses the literal format (``b'...'``) since it -is generally more useful than e.g. ``bytes([50, 19, 100])``. You can always -convert a bytes object into a list of integers using ``list(b)``. - -Also, while in previous Python versions, byte strings and Unicode strings -could be exchanged for each other rather freely (barring encoding issues), -strings and bytes are now completely separate concepts. There's no implicit -en-/decoding if you pass an object of the wrong type. A string always -compares unequal to a bytes or bytearray object. - -Lists are constructed with square brackets, separating items with commas: ``[a, -b, c]``. Tuples are constructed by the comma operator (not within square -brackets), with or without enclosing parentheses, but an empty tuple must have -the enclosing parentheses, such as ``a, b, c`` or ``()``. A single item tuple -must have a trailing comma, such as ``(d,)``. - -Objects of type range are created using the :func:`range` function. They don't -support concatenation or repetition, and using :func:`min` or :func:`max` on -them is inefficient. - -Most sequence types support the following operations. The ``in`` and ``not in`` -operations have the same priorities as the comparison operations. The ``+`` and -``*`` operations have the same priority as the corresponding numeric operations. -[3]_ Additional methods are provided for :ref:`typesseq-mutable`. +.. _typesseq-common: + +Common Sequence Operations +-------------------------- + +.. index:: object: sequence + +The operations in the following table are supported by most sequence types, +both mutable and immutable. The :class:`collections.abc.Sequence` ABC is +provided to make it easier to correctly implement these operations on +custom sequence types. 
This table lists the sequence operations sorted in ascending priority (operations in the same box have the same priority). In the table, *s* and *t* -are sequences of the same type; *n*, *i*, *j* and *k* are integers. - -+------------------+--------------------------------+----------+ -| Operation | Result | Notes | -+==================+================================+==========+ -| ``x in s`` | ``True`` if an item of *s* is | \(1) | -| | equal to *x*, else ``False`` | | -+------------------+--------------------------------+----------+ -| ``x not in s`` | ``False`` if an item of *s* is | \(1) | -| | equal to *x*, else ``True`` | | -+------------------+--------------------------------+----------+ -| ``s + t`` | the concatenation of *s* and | \(6) | -| | *t* | | -+------------------+--------------------------------+----------+ -| ``s * n, n * s`` | *n* shallow copies of *s* | \(2) | -| | concatenated | | -+------------------+--------------------------------+----------+ -| ``s[i]`` | *i*\ th item of *s*, origin 0 | \(3) | -+------------------+--------------------------------+----------+ -| ``s[i:j]`` | slice of *s* from *i* to *j* | (3)(4) | -+------------------+--------------------------------+----------+ -| ``s[i:j:k]`` | slice of *s* from *i* to *j* | (3)(5) | -| | with step *k* | | -+------------------+--------------------------------+----------+ -| ``len(s)`` | length of *s* | | -+------------------+--------------------------------+----------+ -| ``min(s)`` | smallest item of *s* | | -+------------------+--------------------------------+----------+ -| ``max(s)`` | largest item of *s* | | -+------------------+--------------------------------+----------+ -| ``s.index(i)`` | index of the first occurence | | -| | of *i* in *s* | | -+------------------+--------------------------------+----------+ -| ``s.count(i)`` | total number of occurences of | | -| | *i* in *s* | | -+------------------+--------------------------------+----------+ - -Sequence types also support 
comparisons. In particular, tuples and lists are -compared lexicographically by comparing corresponding elements. This means that -to compare equal, every element must compare equal and the two sequences must be -of the same type and have the same length. (For full details see -:ref:`comparisons` in the language reference.) +are sequences of the same type, *n*, *i*, *j* and *k* are integers and *x* is +an arbitrary object that meets any type and value restrictions imposed by *s*. + +The ``in`` and ``not in`` operations have the same priorities as the +comparison operations. The ``+`` (concatenation) and ``*`` (repetition) +operations have the same priority as the corresponding numeric operations. + ++--------------------------+--------------------------------+----------+ +| Operation | Result | Notes | ++==========================+================================+==========+ +| ``x in s`` | ``True`` if an item of *s* is | \(1) | +| | equal to *x*, else ``False`` | | ++--------------------------+--------------------------------+----------+ +| ``x not in s`` | ``False`` if an item of *s* is | \(1) | +| | equal to *x*, else ``True`` | | ++--------------------------+--------------------------------+----------+ +| ``s + t`` | the concatenation of *s* and | (6)(7) | +| | *t* | | ++--------------------------+--------------------------------+----------+ +| ``s * n, n * s`` | *n* shallow copies of *s* | (2)(7) | +| | concatenated | | ++--------------------------+--------------------------------+----------+ +| ``s[i]`` | *i*\ th item of *s*, origin 0 | \(3) | ++--------------------------+--------------------------------+----------+ +| ``s[i:j]`` | slice of *s* from *i* to *j* | (3)(4) | ++--------------------------+--------------------------------+----------+ +| ``s[i:j:k]`` | slice of *s* from *i* to *j* | (3)(5) | +| | with step *k* | | ++--------------------------+--------------------------------+----------+ +| ``len(s)`` | length of *s* | | 
++--------------------------+--------------------------------+----------+ +| ``min(s)`` | smallest item of *s* | | ++--------------------------+--------------------------------+----------+ +| ``max(s)`` | largest item of *s* | | ++--------------------------+--------------------------------+----------+ +| ``s.index(x, [i[, j]])`` | index of the first occurence | \(8) | +| | of *x* in *s* (at or after | | +| | index *i* and before index *j*)| | ++--------------------------+--------------------------------+----------+ +| ``s.count(x)`` | total number of occurences of | | +| | *x* in *s* | | ++--------------------------+--------------------------------+----------+ + +Sequences of the same type also support comparisons. In particular, tuples +and lists are compared lexicographically by comparing corresponding elements. +This means that to compare equal, every element must compare equal and the +two sequences must be of the same type and have the same length. (For full +details see :ref:`comparisons` in the language reference.) .. index:: triple: operations on; sequence; types @@ -919,14 +886,19 @@ of the same type and have the same length. (For full details see Notes: (1) - When *s* is a string object, the ``in`` and ``not in`` operations act like a - substring test. + While the ``in`` and ``not in`` operations are used only for simple + containment testing in the general case, some specialised sequences + (such as :class:`str`, :class:`bytes` and :class:`bytearray`) also use + them for subsequence testing:: + + >>> "gg" in "eggs" + True (2) Values of *n* less than ``0`` are treated as ``0`` (which yields an empty sequence of the same type as *s*). Note also that the copies are shallow; nested structures are not copied. 
This often haunts new Python programmers; - consider: + consider:: >>> lists = [[]] * 3 >>> lists @@ -938,7 +910,7 @@ Notes: What has happened is that ``[[]]`` is a one-element list containing an empty list, so all three elements of ``[[]] * 3`` are (pointers to) this single empty list. Modifying any of the elements of ``lists`` modifies this single list. - You can create a list of different lists this way: + You can create a list of different lists this way:: >>> lists = [[] for i in range(3)] >>> lists[0].append(3) @@ -969,33 +941,354 @@ Notes: If *k* is ``None``, it is treated like ``1``. (6) - Concatenating immutable strings always results in a new object. This means - that building up a string by repeated concatenation will have a quadratic - runtime cost in the total string length. To get a linear runtime cost, - you must switch to one of the alternatives below: + Concatenating immutable sequences always results in a new object. This + means that building up a sequence by repeated concatenation will have a + quadratic runtime cost in the total sequence length. To get a linear + runtime cost, you must switch to one of the alternatives below: * if concatenating :class:`str` objects, you can build a list and use - :meth:`str.join` at the end; + :meth:`str.join` at the end or else write to a :class:`io.StringIO` + instance and retrieve its value when complete; * if concatenating :class:`bytes` objects, you can similarly use - :meth:`bytes.join`, or you can do in-place concatenation with a - :class:`bytearray` object. :class:`bytearray` objects are mutable and - have an efficient overallocation mechanism. + :meth:`bytes.join` or :class:`io.BytesIO`, or you can do in-place + concatenation with a :class:`bytearray` object. :class:`bytearray` + objects are mutable and have an efficient overallocation mechanism. + + * if concatenating :class:`tuple` objects, extend a :class:`list` instead. 
+ + * for other types, investigate the relevant class documentation + + +(7) + Some sequence types (such as :class:`range`) only support item sequences + that follow specific patterns, and hence don't support sequence + concatenation or repetition. + +(8) + ``index`` raises :exc:`ValueError` when *x* is not found in *s*. + When supported, the additional arguments to the index method allow + efficient searching of subsections of the sequence. Passing the extra + arguments is roughly equivalent to using ``s[i:j].index(x)``, only + without copying any data and with the returned index being relative to + the start of the sequence rather than the start of the slice. + + +.. _typesseq-immutable: + +Immutable Sequence Types +------------------------ + +.. index:: + triple: immutable; sequence; types + object: tuple + +The only operation that immutable sequence types generally implement that is +not also implemented by mutable sequence types is support for the :func:`hash` +built-in. + +This support allows immutable sequences, such as :class:`tuple` instances, to +be used as :class:`dict` keys and stored in :class:`set` and :class:`frozenset` +instances. + +Attempting to hash an immutable sequence that contains unhashable values will +result in :exc:`TypeError`. + + +.. _typesseq-mutable: + +Mutable Sequence Types +---------------------- + +.. index:: + triple: mutable; sequence; types + object: list + object: bytearray + +The operations in the following table are defined on mutable sequence types. +The :class:`collections.abc.MutableSequence` ABC is provided to make it +easier to correctly implement these operations on custom sequence types. + +In the table *s* is an instance of a mutable sequence type, *t* is any +iterable object and *x* is an arbitrary object that meets any type +and value restrictions imposed by *s* (for example, :class:`bytearray` only +accepts integers that meet the value restriction ``0 <= x <= 255``). + + +.. 
index:: + triple: operations on; sequence; types + triple: operations on; list; type + pair: subscript; assignment + pair: slice; assignment + statement: del + single: append() (sequence method) + single: extend() (sequence method) + single: count() (sequence method) + single: index() (sequence method) + single: insert() (sequence method) + single: pop() (sequence method) + single: remove() (sequence method) + single: reverse() (sequence method) + ++------------------------------+--------------------------------+---------------------+ +| Operation | Result | Notes | ++==============================+================================+=====================+ +| ``s[i] = x`` | item *i* of *s* is replaced by | | +| | *x* | | ++------------------------------+--------------------------------+---------------------+ +| ``s[i:j] = t`` | slice of *s* from *i* to *j* | | +| | is replaced by the contents of | | +| | the iterable *t* | | ++------------------------------+--------------------------------+---------------------+ +| ``del s[i:j]`` | same as ``s[i:j] = []`` | | ++------------------------------+--------------------------------+---------------------+ +| ``s[i:j:k] = t`` | the elements of ``s[i:j:k]`` | \(1) | +| | are replaced by those of *t* | | ++------------------------------+--------------------------------+---------------------+ +| ``del s[i:j:k]`` | removes the elements of | | +| | ``s[i:j:k]`` from the list | | ++------------------------------+--------------------------------+---------------------+ +| ``s.append(x)`` | same as ``s[len(s):len(s)] = | | +| | [x]`` | | ++------------------------------+--------------------------------+---------------------+ +| ``s.clear()`` | remove all items from ``s`` | \(5) | +| | (same as ``del s[:]``) | | ++------------------------------+--------------------------------+---------------------+ +| ``s.copy()`` | return a shallow copy of ``s`` | \(5) | +| | (same as ``s[:]``) | | 
++------------------------------+--------------------------------+---------------------+ +| ``s.extend(t)`` | same as ``s[len(s):len(s)] = | | +| | t`` | | ++------------------------------+--------------------------------+---------------------+ +| ``s.insert(i, x)`` | same as ``s[i:i] = [x]`` | | ++------------------------------+--------------------------------+---------------------+ +| ``s.pop([i])`` | same as ``x = s[i]; del s[i]; | \(2) | +| | return x`` | | ++------------------------------+--------------------------------+---------------------+ +| ``s.remove(x)`` | same as ``del s[s.index(x)]`` | \(3) | ++------------------------------+--------------------------------+---------------------+ +| ``s.reverse()`` | reverses the items of *s* in | \(4) | +| | place | | ++------------------------------+--------------------------------+---------------------+ + + +Notes: + +(1) + *t* must have the same length as the slice it is replacing. + +(2) + The optional argument *i* defaults to ``-1``, so that by default the last + item is removed and returned. + +(3) + ``remove`` raises :exc:`ValueError` when *x* is not found in *s*. + +(4) + The :meth:`reverse` method modifies the sequence in place for economy of + space when reversing a large sequence. To remind users that it operates by + side effect, it does not return the reversed sequence. + +(5) + :meth:`clear` and :meth:`!copy` are included for consistency with the + interfaces of mutable containers that don't support slicing operations + (such as :class:`dict` and :class:`set`) + + .. versionadded:: 3.3 + :meth:`clear` and :meth:`!copy` methods. + + +.. _typesseq-list: + +Lists +----- + +.. index:: object: list + +Lists are mutable sequences, typically used to store collections of +homogeneous items (where the precise degree of similarity will vary by +application). 
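As an aside to the patch text, the mutable sequence operations in the table above can be tried out on a plain list. This is only an illustrative sketch, not part of the commit; the variable names (`s`, `last`, `snapshot`) are made up for the example.

```python
# Illustrative sketch of the mutable sequence operations, using a list.
s = [0, 1, 2, 3, 4, 5]

s[1:3] = "abc"          # slice replaced by the items of any iterable
                        # (the replacement may have a different length)
del s[1:4]              # same as s[1:4] = []
s.append(6)             # same as s[len(s):len(s)] = [6]
s.extend(range(7, 9))   # extend accepts any iterable
s.insert(1, 1)          # same as s[1:1] = [1]
last = s.pop()          # i defaults to -1, so the last item is removed
s.remove(3)             # same as del s[s.index(3)]
s.reverse()             # in place; returns None

# Extended slice assignment requires a replacement of matching length.
s[::2] = ["x", "y", "z"]
try:
    s[::2] = ["only one item"]
except ValueError as exc:
    print("length mismatch:", exc)

snapshot = s.copy()     # shallow copy, same as s[:]
s.clear()               # same as del s[:]
print(snapshot, s)
```

Note that `reverse()` works by side effect and returns `None`, as note (4) in the table explains.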
+ +Lists may be constructed in several ways: + +* Using a pair of square brackets to denote the empty list: ``[]`` +* Using square brackets, separating items with commas: ``[a]``, ``[a, b, c]`` +* Using a list comprehension: ``[x for x in iterable]`` +* Using the :func:`list` built-in: ``list()`` or ``list(iterable)`` + +Many other operations also produce lists, including the :func:`sorted` built-in. + +Lists implement all of the :ref:`common <typesseq-common>` and +:ref:`mutable <typesseq-mutable>` sequence operations. Lists also provide the +following additional method: + +.. method:: list.sort(*, key=None, reverse=None) + + This method sorts the list in place, using only ``<`` comparisons + between items. Exceptions are not suppressed - if any comparison operations + fail, the entire sort operation will fail (and the list will likely be left + in a partially modified state). + + *key* specifies a function of one argument that is used to extract a + comparison key from each list element (for example, ``key=str.lower``). + The key corresponding to each item in the list is calculated once and + then used for the entire sorting process. The default value of ``None`` + means that list items are sorted directly without calculating a separate + key value. + + The :func:`functools.cmp_to_key` utility is available to convert a 2.x + style *cmp* function to a *key* function. + + *reverse* is a boolean value. If set to ``True``, then the list elements + are sorted as if each comparison were reversed. + + This method modifies the sequence in place for economy of space when + sorting a large sequence. To remind users that it operates by side + effect, it does not return the sorted sequence (use :func:`sorted` to + explicitly request a new sorted list instance). + + The :meth:`sort` method is guaranteed to be stable. 
A sort is stable if it + guarantees not to change the relative order of elements that compare equal + --- this is helpful for sorting in multiple passes (for example, sort by + department, then by salary grade). + + .. impl-detail:: + + While a list is being sorted, the effect of attempting to mutate, or even + inspect, the list is undefined. The C implementation of Python makes the + list appear empty for the duration, and raises :exc:`ValueError` if it can + detect that the list has been mutated during a sort. + + +.. _typesseq-tuple: + +Tuples +------ + +.. index:: object: tuple + +Tuples are immutable sequences, typically used to store collections of +heterogeneous data (such as the 2-tuples produced by the :func:`enumerate` +built-in). Tuples are also used for cases where an immutable sequence of +homogeneous data is needed (such as allowing storage in a :class:`set` or +:class:`dict` instance). + +Tuples may be constructed in a number of ways: + +* Using a pair of parentheses to denote the empty tuple: ``()`` +* Using a trailing comma for a singleton tuple: ``a,`` or ``(a,)`` +* Separating items with commas: ``a, b, c`` or ``(a, b, c)`` +* Using the :func:`tuple` built-in: ``tuple()`` or ``tuple(iterable)`` + +Note that the parentheses are optional (except in the empty tuple case, or +when needed to avoid syntactic ambiguity). It is actually the comma which +makes a tuple, not the parentheses. + +Tuples implement all of the :ref:`common <typesseq-common>` sequence +operations. + +For heterogeneous collections of data, :func:`collections.namedtuple` may +be more appropriate than a simple tuple object. + + +.. _typesseq-range: + +Ranges +------ + +.. index:: object: range + +The :class:`range` type represents an immutable sequence of numbers and is +commonly used for looping a specific number of times. Instances are created +using the :func:`range` built-in. 
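As an aside to the patch text, here is a brief sketch of how range objects behave as sequences. It is an illustration only, assuming Python 3.3 or later (where ranges compare equal as sequences); the name `r` is arbitrary.

```python
# Illustrative sketch: a range computes its items from start, stop and step.
r = range(10, 30, 5)            # represents 10, 15, 20, 25

print(list(r))
print(r[2] == 10 + 5 * 2)       # indexing follows start + step*i
print(r[-1])                    # negative indices count from the end
print(r[1:3])                   # slicing yields another range object

# A very large range is still cheap to create and query,
# because individual items are calculated on demand.
big = range(10 ** 12)
print(len(big), 999_999_999_999 in big)

# Concatenation is not supported by range objects.
try:
    range(3) + range(3)
except TypeError as exc:
    print("no concatenation:", exc)
```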
+ +For positive indices with results between the defined ``start`` and ``stop`` +values, integers within the range are determined by the formula: +``r[i] = start + step*i`` + +For negative indices and slicing operations, a range instance determines the +appropriate result for the corresponding tuple and returns either the +appropriate integer (for negative indices) or an appropriate range object +(for slicing operations) . + +The advantage of the :class:`range` type over a regular :class:`list` or +:class:`tuple` is that a :class:`range` object will always take the same +(small) amount of memory, no matter the size of the range it represents (as it +only stores the ``start``, ``stop`` and ``step`` values, calculating individual +items and subranges as needed). + +Ranges implement all of the :ref:`common <typesseq-common>` sequence operations +except concatenation and repetition (due to the fact that range objects can +only represent sequences that follow a strict pattern and repetition and +concatenation will usually violate that pattern). + + +.. _textseq: + +Text Sequence Type --- :class:`str` +=================================== + +.. index:: + object: string + object: bytes + object: bytearray + object: io.StringIO + + +Textual data in Python is handled with :class:`str` objects, which are +immutable sequences of Unicode code points. String literals are +written in a variety of ways: + +* Single quotes: ``'allows embedded "double" quotes'`` +* Double quotes: ``"allows embedded 'single' quotes"``. +* Triple quoted: ``'''Three single quotes'''``, ``"""Three double quotes"""`` +Triple quoted strings may span multiple lines - all associated whitespace will +be included in the string literal. + +String literals that are part of a single expression and have only whitespace +between them will be implicitly converted to a single string literal. 
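As an aside to the patch text, the string literal forms listed above can be compared side by side. This is a minimal sketch for illustration; the variable names are invented for the example.

```python
# Illustrative sketch of the string literal forms.
single = 'allows embedded "double" quotes'
double = "allows embedded 'single' quotes"

# Triple quoted literals may span lines; the newline becomes part of the string.
triple = """first line
second line"""

# Adjacent literals separated only by whitespace are joined into one literal.
joined = ("spam "
          "eggs")

print(triple.splitlines())
print(joined)

# There is no separate character type: indexing gives a string of length 1.
print(joined[0] == joined[0:1])
```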
+ +See :ref:`strings` for more about the various forms of string literal, +including supported escape sequences, and the ``r`` ("raw") prefix that +disables most escape sequence processing. + +Strings may also be created from other objects with the :func:`str` built-in. + +Since there is no separate "character" type, indexing a string produces +strings of length 1. That is, for a non-empty string *s*, ``s[0] == s[0:1]``. + +There is also no mutable string type, but :meth:`str.join` or +:class:`io.StringIO` can be used to efficiently construct strings from +multiple fragments. + +.. versionchanged:: 3.3 + For backwards compatibility with the Python 2 series, the ``u`` prefix is + once again permitted on string literals. It has no effect on the meaning + of string literals and cannot be combined with the ``r`` prefix. .. _string-methods: String Methods -------------- -.. index:: pair: string; methods +.. index:: + pair: string; methods + module: re + +Strings implement all of the :ref:`common <typesseq-common>` sequence +operations, along with the additional methods described below. -String objects support the methods listed below. +Strings also support two styles of string formatting, one providing a large +degree of flexibility and customization (see :meth:`str.format`, +:ref:`formatstrings` and :ref:`string-formatting`) and the other based on C +``printf`` style formatting that handles a narrower range of types and is +slightly harder to use correctly, but is often faster for the cases it can +handle (:ref:`old-string-formatting`). -In addition, Python's strings support the sequence type methods described in the -:ref:`typesseq` section. To output formatted strings, see the -:ref:`string-formatting` section. Also, see the :mod:`re` module for string -functions based on regular expressions. 
+The :ref:`textservices` section of the standard library covers a number of +other modules that provide various text related utilities (including regular +expression support in the :mod:`re` module). .. method:: str.capitalize() @@ -1462,8 +1755,8 @@ functions based on regular expressions. .. _old-string-formatting: -Old String Formatting Operations --------------------------------- +``printf``-style String Formatting +---------------------------------- .. index:: single: formatting, string (%) @@ -1475,23 +1768,19 @@ Old String Formatting Operations single: % formatting single: % interpolation -.. XXX is the note enough? - .. note:: - The formatting operations described here are modelled on C's printf() - syntax. They only support formatting of certain builtin types. The - use of a binary operator means that care may be needed in order to - format tuples and dictionaries correctly. As the new - :ref:`string-formatting` syntax is more flexible and handles tuples and - dictionaries naturally, it is recommended for new code. However, there - are no current plans to deprecate printf-style formatting. + The formatting operations described here exhibit a variety of quirks that + lead to a number of common errors (such as failing to display tuples and + dictionaries correctly). Using the newer :meth:`str.format` interface + helps avoid these errors, and also provides a generally more powerful, + flexible and extensible approach to formatting text. String objects have one unique built-in operation: the ``%`` operator (modulo). This is also known as the string *formatting* or *interpolation* operator. Given ``format % values`` (where *format* is a string), ``%`` conversion specifications in *format* are replaced with zero or more elements of *values*. -The effect is similar to the using :c:func:`sprintf` in the C language. +The effect is similar to using the :c:func:`sprintf` in the C language. 
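As an aside to the patch text, the tuple quirk that the revised note warns about can be reproduced in a few lines. This is a hedged sketch for illustration only; `point`, `wrapped`, `formatted` and `mapped` are hypothetical names.

```python
# Illustrative sketch of printf-style formatting and its tuple quirk.
basic = "%s has %d items" % ("spam", 3)

point = (1, 2)

# A tuple on the right-hand side is unpacked as the argument list,
# so formatting the tuple itself with a single %s fails.
try:
    "%s" % point
except TypeError as exc:
    print("quirk:", exc)

# Wrapping the value in a one-item tuple avoids the problem...
wrapped = "%s" % (point,)

# ...and str.format treats the tuple as a single argument.
formatted = "{}".format(point)

# Mapping keys select values from a dictionary.
mapped = "%(lang)s has %(n)03d" % {"lang": "Python", "n": 2}

print(basic, wrapped, formatted, mapped, sep="\n")
```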
If *format* requires a single argument, *values* may be a single non-tuple object. [5]_ Otherwise, *values* must be a tuple with exactly the number of @@ -1649,229 +1938,174 @@ that ``'\0'`` is the end of the string. ``%f`` conversions for numbers whose absolute value is over 1e50 are no longer replaced by ``%g`` conversions. -.. index:: - module: string - module: re - -Additional string operations are defined in standard modules :mod:`string` and -:mod:`re`. - - -.. _typesseq-range: - -Range Type ----------- - -.. index:: object: range - -The :class:`range` type is an immutable sequence which is commonly used for -looping. The advantage of the :class:`range` type is that an :class:`range` -object will always take the same amount of memory, no matter the size of the -range it represents. - -Range objects have relatively little behavior: they support indexing, contains, -iteration, the :func:`len` function, and the following methods: - -.. method:: range.count(x) - - Return the number of *i*'s for which ``s[i] == x``. - - .. versionadded:: 3.2 - -.. method:: range.index(x) - Return the smallest *i* such that ``s[i] == x``. Raises - :exc:`ValueError` when *x* is not in the range. +.. _binaryseq: - .. versionadded:: 3.2 +Binary Sequence Types --- :class:`bytes`, :class:`bytearray`, :class:`memoryview` +================================================================================= +.. index:: + object: bytes + object: bytearray + object: memoryview + module: array -.. _typesseq-mutable: +The core built-in types for manipulating binary data are :class:`bytes` and +:class:`bytearray`. They are supported by :class:`memoryview` which uses +the buffer protocol to access the memory of other binary objects without +needing to make a copy. -Mutable Sequence Types ----------------------- +The :mod:`array` module supports efficient storage of basic data types like +32-bit integers and IEEE754 double-precision floating values. -.. 
index:: - triple: mutable; sequence; types - object: list - object: bytearray +.. _typebytes: -List and bytearray objects support additional operations that allow in-place -modification of the object. Other mutable sequence types (when added to the -language) should also support these operations. Strings and tuples are -immutable sequence types: such objects cannot be modified once created. The -following operations are defined on mutable sequence types (where *x* is an -arbitrary object). +Bytes +----- -Note that while lists allow their items to be of any type, bytearray object -"items" are all integers in the range 0 <= x < 256. +.. index:: object: bytes -.. index:: - triple: operations on; sequence; types - triple: operations on; list; type - pair: subscript; assignment - pair: slice; assignment - statement: del - single: append() (sequence method) - single: extend() (sequence method) - single: count() (sequence method) - single: clear() (sequence method) - single: copy() (sequence method) - single: index() (sequence method) - single: insert() (sequence method) - single: pop() (sequence method) - single: remove() (sequence method) - single: reverse() (sequence method) - single: sort() (sequence method) +Bytes objects are immutable sequences of single bytes. Since many major +binary protocols are based on the ASCII text encoding, bytes objects offer +several methods that are only valid when working with ASCII compatible +data and are closely related to string objects in a variety of other ways. 
-+------------------------------+--------------------------------+---------------------+ -| Operation | Result | Notes | -+==============================+================================+=====================+ -| ``s[i] = x`` | item *i* of *s* is replaced by | | -| | *x* | | -+------------------------------+--------------------------------+---------------------+ -| ``s[i:j] = t`` | slice of *s* from *i* to *j* | | -| | is replaced by the contents of | | -| | the iterable *t* | | -+------------------------------+--------------------------------+---------------------+ -| ``del s[i:j]`` | same as ``s[i:j] = []`` | | -+------------------------------+--------------------------------+---------------------+ -| ``s[i:j:k] = t`` | the elements of ``s[i:j:k]`` | \(1) | -| | are replaced by those of *t* | | -+------------------------------+--------------------------------+---------------------+ -| ``del s[i:j:k]`` | removes the elements of | | -| | ``s[i:j:k]`` from the list | | -+------------------------------+--------------------------------+---------------------+ -| ``s.append(x)`` | same as ``s[len(s):len(s)] = | | -| | [x]`` | | -+------------------------------+--------------------------------+---------------------+ -| ``s.extend(x)`` | same as ``s[len(s):len(s)] = | \(2) | -| | x`` | | -+------------------------------+--------------------------------+---------------------+ -| ``s.clear()`` | remove all items from ``s`` | | -| | | | -+------------------------------+--------------------------------+---------------------+ -| ``s.copy()`` | return a shallow copy of ``s`` | | -| | | | -+------------------------------+--------------------------------+---------------------+ -| ``s.count(x)`` | return number of *i*'s for | | -| | which ``s[i] == x`` | | -+------------------------------+--------------------------------+---------------------+ -| ``s.index(x[, i[, j]])`` | return smallest *k* such that | \(3) | -| | ``s[k] == x`` and ``i <= k < | | -| | j`` | | 
-+------------------------------+--------------------------------+---------------------+ -| ``s.insert(i, x)`` | same as ``s[i:i] = [x]`` | \(4) | -+------------------------------+--------------------------------+---------------------+ -| ``s.pop([i])`` | same as ``x = s[i]; del s[i]; | \(5) | -| | return x`` | | -+------------------------------+--------------------------------+---------------------+ -| ``s.remove(x)`` | same as ``del s[s.index(x)]`` | \(3) | -+------------------------------+--------------------------------+---------------------+ -| ``s.reverse()`` | reverses the items of *s* in | \(6) | -| | place | | -+------------------------------+--------------------------------+---------------------+ -| ``s.sort([key[, reverse]])`` | sort the items of *s* in place | (6), (7), (8) | -+------------------------------+--------------------------------+---------------------+ +Firstly, the syntax for bytes literals is largely the same as that for string +literals, except that a ``b`` prefix is added: +* Single quotes: ``b'still allows embedded "double" quotes'`` +* Double quotes: ``b"still allows embedded 'single' quotes"``. +* Triple quoted: ``b'''3 single quotes'''``, ``b"""3 double quotes"""`` -Notes: +Only ASCII characters are permitted in bytes literals (regardless of the +declared source code encoding). Any binary values over 127 must be entered +into bytes literals using the appropriate escape sequence. -(1) - *t* must have the same length as the slice it is replacing. +As with string literals, bytes literals may also use a ``r`` prefix to disable +processing of escape sequences. See :ref:`strings` for more about the various +forms of bytes literal, including supported escape sequences. -(2) - *x* can be any iterable object. 
+While bytes literals and representations are based on ASCII text, bytes +objects actually behave like immutable sequences of integers, with each +value in the sequence restricted such that ``0 <= x < 256`` (attempts to +violate this restriction will trigger :exc:`ValueError`). This is done +deliberately to emphasise that while many binary formats include ASCII based +elements and can be usefully manipulated with some text-oriented algorithms, +this is not generally the case for arbitrary binary data (blindly applying +text processing algorithms to binary data formats that are not ASCII +compatible will usually lead to data corruption). -(3) - Raises :exc:`ValueError` when *x* is not found in *s*. When a negative index is - passed as the second or third parameter to the :meth:`index` method, the sequence - length is added, as for slice indices. If it is still negative, it is truncated - to zero, as for slice indices. +In addition to the literal forms, bytes objects can be created in a number of +other ways: -(4) - When a negative index is passed as the first parameter to the :meth:`insert` - method, the sequence length is added, as for slice indices. If it is still - negative, it is truncated to zero, as for slice indices. +* A zero-filled bytes object of a specified length: ``bytes(10)`` +* From an iterable of integers: ``bytes(range(20))`` +* Copying existing binary data via the buffer protocol: ``bytes(obj)`` -(5) - The optional argument *i* defaults to ``-1``, so that by default the last - item is removed and returned. +Since bytes objects are sequences of integers, for a bytes object *b*, +``b[0]`` will be an integer, while ``b[0:1]`` will be a bytes object of +length 1. (This contrasts with text strings, where both indexing and +slicing will produce a string of length 1.) -(6) - The :meth:`sort` and :meth:`reverse` methods modify the sequence in place for - economy of space when sorting or reversing a large sequence.
To remind you - that they operate by side effect, they don't return the sorted or reversed - sequence. +The representation of bytes objects uses the literal format (``b'...'``) +since it is often more useful than e.g. ``bytes([46, 46, 46])``. You can +always convert a bytes object into a list of integers using ``list(b)``. -(7) - The :meth:`sort` method takes optional arguments for controlling the - comparisons. Each must be specified as a keyword argument. - *key* specifies a function of one argument that is used to extract a comparison - key from each list element: ``key=str.lower``. The default value is ``None``. - Use :func:`functools.cmp_to_key` to convert an - old-style *cmp* function to a *key* function. +.. note:: + For Python 2.x users: In the Python 2.x series, a variety of implicit + conversions between 8-bit strings (the closest thing 2.x offers to a + built-in binary data type) and Unicode strings were permitted. This was a + backwards compatibility workaround to account for the fact that Python + originally only supported 8-bit text, and Unicode text was a later + addition. In Python 3.x, those implicit conversions are gone - conversions + between 8-bit binary data and Unicode text must be explicit, and bytes and + string objects will always compare unequal. - *reverse* is a boolean value. If set to ``True``, then the list elements are - sorted as if each comparison were reversed. +.. _typebytearray: - The :meth:`sort` method is guaranteed to be stable. A - sort is stable if it guarantees not to change the relative order of elements - that compare equal --- this is helpful for sorting in multiple passes (for - example, sort by department, then by salary grade). +Bytearray Objects +----------------- - .. impl-detail:: +.. index:: object: bytearray - While a list is being sorted, the effect of attempting to mutate, or even - inspect, the list is undefined. 
The C implementation of Python makes the - list appear empty for the duration, and raises :exc:`ValueError` if it can - detect that the list has been mutated during a sort. +:class:`bytearray` objects are a mutable counterpart to :class:`bytes` +objects. There is no dedicated literal syntax for bytearray objects; instead, +they are always created by calling the constructor: -(8) - :meth:`sort` is not supported by :class:`bytearray` objects. +* Creating an empty instance: ``bytearray()`` +* Creating a zero-filled instance with a given length: ``bytearray(10)`` +* From an iterable of integers: ``bytearray(range(20))`` +* Copying existing binary data via the buffer protocol: ``bytearray(b'Hi!')`` - .. versionadded:: 3.3 - :meth:`clear` and :meth:`!copy` methods. +As bytearray objects are mutable, they support the +:ref:`mutable <typesseq-mutable>` sequence operations in addition to the +common bytes and bytearray operations described in :ref:`bytes-methods`. .. _bytes-methods: -Bytes and Byte Array Methods ----------------------------- +Bytes and Bytearray Operations +------------------------------ .. index:: pair: bytes; methods pair: bytearray; methods -Bytes and bytearray objects, being "strings of bytes", have all methods found on -strings, with the exception of :func:`encode`, :func:`format` and -:func:`isidentifier`, which do not make sense with these types. For converting -the objects to strings, they have a :func:`decode` method. +Both bytes and bytearray objects support the :ref:`common <typesseq-common>` +sequence operations. They interoperate not just with operands of the same +type, but with any object that supports the +:ref:`buffer protocol <bufferobjects>`. Due to this flexibility, they can be +freely mixed in operations without causing errors. However, the return type +of the result may depend on the order of operands.
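The remark that "the return type of the result may depend on the order of operands" can be sketched as follows. This is only an illustration under the assumption of CPython 3 behaviour; the variable names are invented for the example.

```python
data = b"abc"
buf = bytearray(b"def")

left = data + buf    # bytes on the left: the result is a bytes object
right = buf + data   # bytearray on the left: the result is a bytearray

# Any object supporting the buffer protocol can take part, for example
# a memoryview over another bytes object.
view = memoryview(b"ghi")
mixed = data + view

# bytes and bytearray objects with the same contents compare equal.
same_contents = (data == bytearray(b"abc"))
```

In this sketch the left-hand operand decides the result type, which is worth keeping in mind when the result must later be mutated or hashed.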
+ +Due to the common use of ASCII text as the basis for binary protocols, bytes +and bytearray objects provide almost all methods found on text strings, with +the exceptions of: -Wherever one of these methods needs to interpret the bytes as characters -(e.g. the :func:`is...` methods), the ASCII character set is assumed. +* :meth:`str.encode` (which converts text strings to bytes objects) +* :meth:`str.format` and :meth:`str.format_map` (which are used to format + text for display to users) +* :meth:`str.isidentifier`, :meth:`str.isnumeric`, :meth:`str.isdecimal`, + :meth:`str.isprintable` (which are used to check various properties of + text strings which are not typically applicable to binary protocols). -.. versionadded:: 3.3 - The functions :func:`count`, :func:`find`, :func:`index`, - :func:`rfind` and :func:`rindex` have additional semantics compared to - the corresponding string functions: They also accept an integer in - range 0 to 255 (a byte) as their first argument. +All other string methods are supported, although sometimes with slight +differences in functionality and semantics (as described below). .. note:: The methods on bytes and bytearray objects don't accept strings as their arguments, just as the methods on strings don't accept bytes as their - arguments. For example, you have to write :: + arguments. For example, you have to write:: a = "abc" b = a.replace("a", "f") - and :: + and:: a = b"abc" b = a.replace(b"a", b"f") +Whenever a bytes or bytearray method needs to interpret the bytes as +characters (e.g. the :meth:`is...` methods, :meth:`split`, :meth:`strip`), +the ASCII character set is assumed (text strings use Unicode semantics). + +.. note:: + Using these ASCII based methods to manipulate binary data that is not + stored in an ASCII based format may lead to data corruption. 
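A hedged sketch of the ASCII-only semantics described above: the header value and the variable names are hypothetical, chosen only to show how the text-like methods treat bytes inside and outside the ASCII range.

```python
# Hypothetical ASCII-based protocol line.
header = b"Content-Type: text/plain\r\n"

name, _, value = header.partition(b":")
value = value.strip()    # strips ASCII whitespace, including \r\n
lowered = name.lower()   # only ASCII letters are case-mapped

# A byte outside the ASCII range is left untouched by upper(),
# whereas the equivalent text string uses Unicode semantics.
raw_upper = b"caf\xe9".upper()
text_upper = "caf\xe9".upper()

# Methods on bytes objects do not accept str arguments.
try:
    header.replace("Content", "X")
except TypeError:
    rejects_str = True
else:
    rejects_str = False
```

The difference between ``raw_upper`` and ``text_upper`` is the reason the note warns against applying these methods to data that is not stored in an ASCII based format.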
+ +The search operations (:keyword:`in`, :meth:`count`, :meth:`find`, +:meth:`index`, :meth:`rfind` and :meth:`rindex`) all accept both integers +in the range 0 to 255 and bytes and bytearray sequences. + +.. versionchanged:: 3.3 + All of the search methods also accept an integer in range 0 to 255 + (a byte) as their first argument. + + +Each bytes and bytearray instance provides a :meth:`decode` convenience +method that is the inverse of :meth:`str.encode`: .. method:: bytes.decode(encoding="utf-8", errors="strict") bytearray.decode(encoding="utf-8", errors="strict") @@ -1887,8 +2121,10 @@ Wherever one of these methods needs to interpret the bytes as characters .. versionchanged:: 3.1 Added support for keyword arguments. - -The bytes and bytearray types have an additional class method: +Since 2 hexadecimal digits correspond precisely to a single byte, hexadecimal +numbers are a commonly used format for describing binary data. Accordingly, +the bytes and bytearray types have an additional class method to read data in +that format: .. classmethod:: bytes.fromhex(string) bytearray.fromhex(string) @@ -1897,8 +2133,8 @@ The bytes and bytearray types have an additional class method: decoding the given string object. The string must contain two hexadecimal digits per byte, spaces are ignored. - >>> bytes.fromhex('f0 f1f2 ') - b'\xf0\xf1\xf2' + >>> bytes.fromhex('2Ef0 F1f2 ') + b'.\xf0\xf1\xf2' The maketrans and translate methods differ in semantics from the versions @@ -1932,6 +2168,390 @@ available on strings: .. versionadded:: 3.1 +.. _typememoryview: + +Memory Views +------------ + +:class:`memoryview` objects allow Python code to access the internal data +of an object that supports the :ref:`buffer protocol <bufferobjects>` without +copying. + +.. class:: memoryview(obj) + + Create a :class:`memoryview` that references *obj*. *obj* must support the + buffer protocol.
Built-in objects that support the buffer protocol include + :class:`bytes` and :class:`bytearray`. + + A :class:`memoryview` has the notion of an *element*, which is the + atomic memory unit handled by the originating object *obj*. For many + simple types such as :class:`bytes` and :class:`bytearray`, an element + is a single byte, but other types such as :class:`array.array` may have + bigger elements. + + ``len(view)`` is equal to the length of :class:`~memoryview.tolist`. + If ``view.ndim = 0``, the length is 1. If ``view.ndim = 1``, the length + is equal to the number of elements in the view. For higher dimensions, + the length is equal to the length of the nested list representation of + the view. The :class:`~memoryview.itemsize` attribute will give you the + number of bytes in a single element. + + A :class:`memoryview` supports slicing to expose its data. If + :class:`~memoryview.format` is one of the native format specifiers + from the :mod:`struct` module, indexing will return a single element + with the correct type. Full slicing will result in a subview:: + + >>> v = memoryview(b'abcefg') + >>> v[1] + 98 + >>> v[-1] + 103 + >>> v[1:4] + <memory at 0x7f3ddc9f4350> + >>> bytes(v[1:4]) + b'bce' + + Other native formats:: + + >>> import array + >>> a = array.array('l', [-11111111, 22222222, -33333333, 44444444]) + >>> a[0] + -11111111 + >>> a[-1] + 44444444 + >>> a[2:3].tolist() + [-33333333] + >>> a[::2].tolist() + [-11111111, -33333333] + >>> a[::-1].tolist() + [44444444, -33333333, 22222222, -11111111] + + .. versionadded:: 3.3 + + If the underlying object is writable, the memoryview supports slice + assignment. 
Resizing is not allowed:: + + >>> data = bytearray(b'abcefg') + >>> v = memoryview(data) + >>> v.readonly + False + >>> v[0] = ord(b'z') + >>> data + bytearray(b'zbcefg') + >>> v[1:4] = b'123' + >>> data + bytearray(b'z123fg') + >>> v[2:3] = b'spam' + Traceback (most recent call last): + File "<stdin>", line 1, in <module> + ValueError: memoryview assignment: lvalue and rvalue have different structures + >>> v[2:6] = b'spam' + >>> data + bytearray(b'z1spam') + + Memoryviews of hashable (read-only) types are also hashable. The hash + is defined as ``hash(m) == hash(m.tobytes())``:: + + >>> v = memoryview(b'abcefg') + >>> hash(v) == hash(b'abcefg') + True + >>> hash(v[2:4]) == hash(b'ce') + True + >>> hash(v[::-2]) == hash(b'abcefg'[::-2]) + True + + Hashing of multi-dimensional objects is supported:: + + >>> buf = bytes(list(range(12))) + >>> x = memoryview(buf) + >>> y = x.cast('B', shape=[2,2,3]) + >>> x.tolist() + [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11] + >>> y.tolist() + [[[0, 1, 2], [3, 4, 5]], [[6, 7, 8], [9, 10, 11]]] + >>> hash(x) == hash(y) == hash(y.tobytes()) + True + + .. versionchanged:: 3.3 + Memoryview objects are now hashable. + + + :class:`memoryview` has several methods: + + .. method:: tobytes() + + Return the data in the buffer as a bytestring. This is equivalent to + calling the :class:`bytes` constructor on the memoryview. :: + + >>> m = memoryview(b"abc") + >>> m.tobytes() + b'abc' + >>> bytes(m) + b'abc' + + For non-contiguous arrays the result is equal to the flattened list + representation with all elements converted to bytes. + + .. method:: tolist() + + Return the data in the buffer as a list of elements. :: + + >>> memoryview(b'abc').tolist() + [97, 98, 99] + >>> import array + >>> a = array.array('d', [1.1, 2.2, 3.3]) + >>> m = memoryview(a) + >>> m.tolist() + [1.1, 2.2, 3.3] + + .. method:: release() + + Release the underlying buffer exposed by the memoryview object. 
Many + objects take special actions when a view is held on them (for example, + a :class:`bytearray` would temporarily forbid resizing); therefore, + calling release() is handy to remove these restrictions (and free any + dangling resources) as soon as possible. + + After this method has been called, any further operation on the view + raises a :class:`ValueError` (except :meth:`release()` itself which can + be called multiple times):: + + >>> m = memoryview(b'abc') + >>> m.release() + >>> m[0] + Traceback (most recent call last): + File "<stdin>", line 1, in <module> + ValueError: operation forbidden on released memoryview object + + The context management protocol can be used for a similar effect, + using the ``with`` statement:: + + >>> with memoryview(b'abc') as m: + ... m[0] + ... + 97 + >>> m[0] + Traceback (most recent call last): + File "<stdin>", line 1, in <module> + ValueError: operation forbidden on released memoryview object + + .. versionadded:: 3.2 + + .. method:: cast(format[, shape]) + + Cast a memoryview to a new format or shape. *shape* defaults to + ``[byte_length//new_itemsize]``, which means that the result view + will be one-dimensional. The return value is a new memoryview, but + the buffer itself is not copied. Supported casts are 1D -> C-contiguous + and C-contiguous -> 1D. One of the formats must be a byte format + ('B', 'b' or 'c'). The byte length of the result must be the same + as the original length. 
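The ``cast`` rules above (1D to C-contiguous and back, same byte length, no copy of the buffer) can be sketched as follows. This is a minimal illustration with invented names, assuming Python 3.3 or later; it complements rather than replaces the examples in the hunk itself.

```python
import struct

data = bytearray(range(6))
flat = memoryview(data)

# 1D -> C-contiguous 2D view over the same six bytes.
grid = flat.cast("B", shape=[2, 3])

# Item assignment on the exporter is still allowed while views exist
# (only resizing is forbidden); the change is visible through the view
# because the buffer is shared, not copied.
data[0] = 255
rows = grid.tolist()

# C-contiguous -> 1D again; shape defaults to a one-dimensional result.
back = grid.cast("B")

# Casting from a byte format to a wider native format changes the
# element count but not the total byte length.
words = memoryview(bytearray(8)).cast("H")
```

Using ``struct.calcsize("H")`` rather than a hard-coded size keeps the length check independent of the platform's native integer sizes.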
+ + Cast 1D/long to 1D/unsigned bytes:: + + >>> import array + >>> a = array.array('l', [1,2,3]) + >>> x = memoryview(a) + >>> x.format + 'l' + >>> x.itemsize + 8 + >>> len(x) + 3 + >>> x.nbytes + 24 + >>> y = x.cast('B') + >>> y.format + 'B' + >>> y.itemsize + 1 + >>> len(y) + 24 + >>> y.nbytes + 24 + + Cast 1D/unsigned bytes to 1D/char:: + + >>> b = bytearray(b'zyz') + >>> x = memoryview(b) + >>> x[0] = b'a' + Traceback (most recent call last): + File "<stdin>", line 1, in <module> + ValueError: memoryview: invalid value for format "B" + >>> y = x.cast('c') + >>> y[0] = b'a' + >>> b + bytearray(b'ayz') + + Cast 1D/bytes to 3D/ints to 1D/signed char:: + + >>> import struct + >>> buf = struct.pack("i"*12, *list(range(12))) + >>> x = memoryview(buf) + >>> y = x.cast('i', shape=[2,2,3]) + >>> y.tolist() + [[[0, 1, 2], [3, 4, 5]], [[6, 7, 8], [9, 10, 11]]] + >>> y.format + 'i' + >>> y.itemsize + 4 + >>> len(y) + 2 + >>> y.nbytes + 48 + >>> z = y.cast('b') + >>> z.format + 'b' + >>> z.itemsize + 1 + >>> len(z) + 48 + >>> z.nbytes + 48 + + Cast 1D/unsigned char to to 2D/unsigned long:: + + >>> buf = struct.pack("L"*6, *list(range(6))) + >>> x = memoryview(buf) + >>> y = x.cast('L', shape=[2,3]) + >>> len(y) + 2 + >>> y.nbytes + 48 + >>> y.tolist() + [[0, 1, 2], [3, 4, 5]] + + .. versionadded:: 3.3 + + There are also several readonly attributes available: + + .. attribute:: obj + + The underlying object of the memoryview:: + + >>> b = bytearray(b'xyz') + >>> m = memoryview(b) + >>> m.obj is b + True + + .. versionadded:: 3.3 + + .. attribute:: nbytes + + ``nbytes == product(shape) * itemsize == len(m.tobytes())``. This is + the amount of space in bytes that the array would use in a contiguous + representation. 
It is not necessarily equal to len(m):: + + >>> import array + >>> a = array.array('i', [1,2,3,4,5]) + >>> m = memoryview(a) + >>> len(m) + 5 + >>> m.nbytes + 20 + >>> y = m[::2] + >>> len(y) + 3 + >>> y.nbytes + 12 + >>> len(y.tobytes()) + 12 + + Multi-dimensional arrays:: + + >>> import struct + >>> buf = struct.pack("d"*12, *[1.5*x for x in range(12)]) + >>> x = memoryview(buf) + >>> y = x.cast('d', shape=[3,4]) + >>> y.tolist() + [[0.0, 1.5, 3.0, 4.5], [6.0, 7.5, 9.0, 10.5], [12.0, 13.5, 15.0, 16.5]] + >>> len(y) + 3 + >>> y.nbytes + 96 + + .. versionadded:: 3.3 + + .. attribute:: readonly + + A bool indicating whether the memory is read only. + + .. attribute:: format + + A string containing the format (in :mod:`struct` module style) for each + element in the view. A memoryview can be created from exporters with + arbitrary format strings, but some methods (e.g. :meth:`tolist`) are + restricted to native single element formats. Special care must be taken + when comparing memoryviews. Since comparisons are required to return a + value for ``==`` and ``!=``, two memoryviews referencing the same + exporter can compare as not-equal if the exporter's format is not + understood:: + + >>> from ctypes import BigEndianStructure, c_long + >>> class BEPoint(BigEndianStructure): + ... _fields_ = [("x", c_long), ("y", c_long)] + ... + >>> point = BEPoint(100, 200) + >>> a = memoryview(point) + >>> b = memoryview(point) + >>> a == b + False + >>> a.tolist() + Traceback (most recent call last): + File "<stdin>", line 1, in <module> + NotImplementedError: memoryview: unsupported format T{>l:x:>l:y:} + + .. attribute:: itemsize + + The size in bytes of each element of the memoryview:: + + >>> import array, struct + >>> m = memoryview(array.array('H', [32000, 32001, 32002])) + >>> m.itemsize + 2 + >>> m[0] + 32000 + >>> struct.calcsize('H') == m.itemsize + True + + .. 
attribute:: ndim + + An integer indicating how many dimensions of a multi-dimensional array the + memory represents. + + .. attribute:: shape + + A tuple of integers the length of :attr:`ndim` giving the shape of the + memory as a N-dimensional array. + + .. attribute:: strides + + A tuple of integers the length of :attr:`ndim` giving the size in bytes to + access each element for each dimension of the array. + + .. attribute:: suboffsets + + Used internally for PIL-style arrays. The value is informational only. + + .. attribute:: c_contiguous + + A bool indicating whether the memory is C-contiguous. + + .. versionadded:: 3.3 + + .. attribute:: f_contiguous + + A bool indicating whether the memory is Fortran contiguous. + + .. versionadded:: 3.3 + + .. attribute:: contiguous + + A bool indicating whether the memory is contiguous. + + .. versionadded:: 3.3 + + .. _types-set: Set Types --- :class:`set`, :class:`frozenset` @@ -2358,7 +2978,7 @@ Keys views are set-like since their entries are unique and hashable. If all values are hashable, so that ``(key, value)`` pairs are unique and hashable, then the items view is also set-like. (Values views are not treated as set-like since the entries are generally not unique.) For set-like views, all of the -operations defined for the abstract base class :class:`collections.Set` are +operations defined for the abstract base class :class:`collections.abc.Set` are available (for example, ``==``, ``<``, or ``^``). An example of dictionary view usage:: @@ -2393,390 +3013,6 @@ An example of dictionary view usage:: {'juice', 'sausage', 'bacon', 'spam'} -.. _typememoryview: - -memoryview type -=============== - -:class:`memoryview` objects allow Python code to access the internal data -of an object that supports the :ref:`buffer protocol <bufferobjects>` without -copying. - -.. class:: memoryview(obj) - - Create a :class:`memoryview` that references *obj*. *obj* must support the - buffer protocol. 
Built-in objects that support the buffer protocol include - :class:`bytes` and :class:`bytearray`. - - A :class:`memoryview` has the notion of an *element*, which is the - atomic memory unit handled by the originating object *obj*. For many - simple types such as :class:`bytes` and :class:`bytearray`, an element - is a single byte, but other types such as :class:`array.array` may have - bigger elements. - - ``len(view)`` is equal to the length of :class:`~memoryview.tolist`. - If ``view.ndim = 0``, the length is 1. If ``view.ndim = 1``, the length - is equal to the number of elements in the view. For higher dimensions, - the length is equal to the length of the nested list representation of - the view. The :class:`~memoryview.itemsize` attribute will give you the - number of bytes in a single element. - - A :class:`memoryview` supports slicing to expose its data. If - :class:`~memoryview.format` is one of the native format specifiers - from the :mod:`struct` module, indexing will return a single element - with the correct type. Full slicing will result in a subview:: - - >>> v = memoryview(b'abcefg') - >>> v[1] - 98 - >>> v[-1] - 103 - >>> v[1:4] - <memory at 0x7f3ddc9f4350> - >>> bytes(v[1:4]) - b'bce' - - Other native formats:: - - >>> import array - >>> a = array.array('l', [-11111111, 22222222, -33333333, 44444444]) - >>> a[0] - -11111111 - >>> a[-1] - 44444444 - >>> a[2:3].tolist() - [-33333333] - >>> a[::2].tolist() - [-11111111, -33333333] - >>> a[::-1].tolist() - [44444444, -33333333, 22222222, -11111111] - - .. versionadded:: 3.3 - - If the underlying object is writable, the memoryview supports slice - assignment. 
Resizing is not allowed:: - - >>> data = bytearray(b'abcefg') - >>> v = memoryview(data) - >>> v.readonly - False - >>> v[0] = ord(b'z') - >>> data - bytearray(b'zbcefg') - >>> v[1:4] = b'123' - >>> data - bytearray(b'z123fg') - >>> v[2:3] = b'spam' - Traceback (most recent call last): - File "<stdin>", line 1, in <module> - ValueError: memoryview assignment: lvalue and rvalue have different structures - >>> v[2:6] = b'spam' - >>> data - bytearray(b'z1spam') - - Memoryviews of hashable (read-only) types are also hashable. The hash - is defined as ``hash(m) == hash(m.tobytes())``:: - - >>> v = memoryview(b'abcefg') - >>> hash(v) == hash(b'abcefg') - True - >>> hash(v[2:4]) == hash(b'ce') - True - >>> hash(v[::-2]) == hash(b'abcefg'[::-2]) - True - - Hashing of multi-dimensional objects is supported:: - - >>> buf = bytes(list(range(12))) - >>> x = memoryview(buf) - >>> y = x.cast('B', shape=[2,2,3]) - >>> x.tolist() - [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11] - >>> y.tolist() - [[[0, 1, 2], [3, 4, 5]], [[6, 7, 8], [9, 10, 11]]] - >>> hash(x) == hash(y) == hash(y.tobytes()) - True - - .. versionchanged:: 3.3 - Memoryview objects are now hashable. - - - :class:`memoryview` has several methods: - - .. method:: tobytes() - - Return the data in the buffer as a bytestring. This is equivalent to - calling the :class:`bytes` constructor on the memoryview. :: - - >>> m = memoryview(b"abc") - >>> m.tobytes() - b'abc' - >>> bytes(m) - b'abc' - - For non-contiguous arrays the result is equal to the flattened list - representation with all elements converted to bytes. - - .. method:: tolist() - - Return the data in the buffer as a list of elements. :: - - >>> memoryview(b'abc').tolist() - [97, 98, 99] - >>> import array - >>> a = array.array('d', [1.1, 2.2, 3.3]) - >>> m = memoryview(a) - >>> m.tolist() - [1.1, 2.2, 3.3] - - .. method:: release() - - Release the underlying buffer exposed by the memoryview object. 
Many - objects take special actions when a view is held on them (for example, - a :class:`bytearray` would temporarily forbid resizing); therefore, - calling release() is handy to remove these restrictions (and free any - dangling resources) as soon as possible. - - After this method has been called, any further operation on the view - raises a :class:`ValueError` (except :meth:`release()` itself which can - be called multiple times):: - - >>> m = memoryview(b'abc') - >>> m.release() - >>> m[0] - Traceback (most recent call last): - File "<stdin>", line 1, in <module> - ValueError: operation forbidden on released memoryview object - - The context management protocol can be used for a similar effect, - using the ``with`` statement:: - - >>> with memoryview(b'abc') as m: - ... m[0] - ... - 97 - >>> m[0] - Traceback (most recent call last): - File "<stdin>", line 1, in <module> - ValueError: operation forbidden on released memoryview object - - .. versionadded:: 3.2 - - .. method:: cast(format[, shape]) - - Cast a memoryview to a new format or shape. *shape* defaults to - ``[byte_length//new_itemsize]``, which means that the result view - will be one-dimensional. The return value is a new memoryview, but - the buffer itself is not copied. Supported casts are 1D -> C-contiguous - and C-contiguous -> 1D. One of the formats must be a byte format - ('B', 'b' or 'c'). The byte length of the result must be the same - as the original length. 
-
-      Cast 1D/long to 1D/unsigned bytes::
-
-         >>> import array
-         >>> a = array.array('l', [1,2,3])
-         >>> x = memoryview(a)
-         >>> x.format
-         'l'
-         >>> x.itemsize
-         8
-         >>> len(x)
-         3
-         >>> x.nbytes
-         24
-         >>> y = x.cast('B')
-         >>> y.format
-         'B'
-         >>> y.itemsize
-         1
-         >>> len(y)
-         24
-         >>> y.nbytes
-         24
-
-      Cast 1D/unsigned bytes to 1D/char::
-
-         >>> b = bytearray(b'zyz')
-         >>> x = memoryview(b)
-         >>> x[0] = b'a'
-         Traceback (most recent call last):
-           File "<stdin>", line 1, in <module>
-         ValueError: memoryview: invalid value for format "B"
-         >>> y = x.cast('c')
-         >>> y[0] = b'a'
-         >>> b
-         bytearray(b'ayz')
-
-      Cast 1D/bytes to 3D/ints to 1D/signed char::
-
-         >>> import struct
-         >>> buf = struct.pack("i"*12, *list(range(12)))
-         >>> x = memoryview(buf)
-         >>> y = x.cast('i', shape=[2,2,3])
-         >>> y.tolist()
-         [[[0, 1, 2], [3, 4, 5]], [[6, 7, 8], [9, 10, 11]]]
-         >>> y.format
-         'i'
-         >>> y.itemsize
-         4
-         >>> len(y)
-         2
-         >>> y.nbytes
-         48
-         >>> z = y.cast('b')
-         >>> z.format
-         'b'
-         >>> z.itemsize
-         1
-         >>> len(z)
-         48
-         >>> z.nbytes
-         48
-
-      Cast 1D/unsigned char to to 2D/unsigned long::
-
-         >>> buf = struct.pack("L"*6, *list(range(6)))
-         >>> x = memoryview(buf)
-         >>> y = x.cast('L', shape=[2,3])
-         >>> len(y)
-         2
-         >>> y.nbytes
-         48
-         >>> y.tolist()
-         [[0, 1, 2], [3, 4, 5]]
-
-      .. versionadded:: 3.3
-
-   There are also several readonly attributes available:
-
-   .. attribute:: obj
-
-      The underlying object of the memoryview::
-
-         >>> b = bytearray(b'xyz')
-         >>> m = memoryview(b)
-         >>> m.obj is b
-         True
-
-      .. versionadded:: 3.3
-
-   .. attribute:: nbytes
-
-      ``nbytes == product(shape) * itemsize == len(m.tobytes())``. This is
-      the amount of space in bytes that the array would use in a contiguous
-      representation. It is not necessarily equal to len(m)::
-
-         >>> import array
-         >>> a = array.array('i', [1,2,3,4,5])
-         >>> m = memoryview(a)
-         >>> len(m)
-         5
-         >>> m.nbytes
-         20
-         >>> y = m[::2]
-         >>> len(y)
-         3
-         >>> y.nbytes
-         12
-         >>> len(y.tobytes())
-         12
-
-      Multi-dimensional arrays::
-
-         >>> import struct
-         >>> buf = struct.pack("d"*12, *[1.5*x for x in range(12)])
-         >>> x = memoryview(buf)
-         >>> y = x.cast('d', shape=[3,4])
-         >>> y.tolist()
-         [[0.0, 1.5, 3.0, 4.5], [6.0, 7.5, 9.0, 10.5], [12.0, 13.5, 15.0, 16.5]]
-         >>> len(y)
-         3
-         >>> y.nbytes
-         96
-
-      .. versionadded:: 3.3
-
-   .. attribute:: readonly
-
-      A bool indicating whether the memory is read only.
-
-   .. attribute:: format
-
-      A string containing the format (in :mod:`struct` module style) for each
-      element in the view. A memoryview can be created from exporters with
-      arbitrary format strings, but some methods (e.g. :meth:`tolist`) are
-      restricted to native single element formats. Special care must be taken
-      when comparing memoryviews. Since comparisons are required to return a
-      value for ``==`` and ``!=``, two memoryviews referencing the same
-      exporter can compare as not-equal if the exporter's format is not
-      understood::
-
-         >>> from ctypes import BigEndianStructure, c_long
-         >>> class BEPoint(BigEndianStructure):
-         ...     _fields_ = [("x", c_long), ("y", c_long)]
-         ...
-         >>> point = BEPoint(100, 200)
-         >>> a = memoryview(point)
-         >>> b = memoryview(point)
-         >>> a == b
-         False
-         >>> a.tolist()
-         Traceback (most recent call last):
-           File "<stdin>", line 1, in <module>
-         NotImplementedError: memoryview: unsupported format T{>l:x:>l:y:}
-
-   .. attribute:: itemsize
-
-      The size in bytes of each element of the memoryview::
-
-         >>> import array, struct
-         >>> m = memoryview(array.array('H', [32000, 32001, 32002]))
-         >>> m.itemsize
-         2
-         >>> m[0]
-         32000
-         >>> struct.calcsize('H') == m.itemsize
-         True
-
-   .. attribute:: ndim
-
-      An integer indicating how many dimensions of a multi-dimensional array the
-      memory represents.
-
-   .. attribute:: shape
-
-      A tuple of integers the length of :attr:`ndim` giving the shape of the
-      memory as a N-dimensional array.
-
-   .. attribute:: strides
-
-      A tuple of integers the length of :attr:`ndim` giving the size in bytes to
-      access each element for each dimension of the array.
-
-   .. attribute:: suboffsets
-
-      Used internally for PIL-style arrays. The value is informational only.
-
-   .. attribute:: c_contiguous
-
-      A bool indicating whether the memory is C-contiguous.
-
-      .. versionadded:: 3.3
-
-   .. attribute:: f_contiguous
-
-      A bool indicating whether the memory is Fortran contiguous.
-
-      .. versionadded:: 3.3
-
-   .. attribute:: contiguous
-
-      A bool indicating whether the memory is contiguous.
-
-      .. versionadded:: 3.3
-
-
 .. _typecontextmanager:
 
 Context Manager Types
diff --git a/Doc/library/strings.rst b/Doc/library/strings.rst
deleted file mode 100644
index 08f1658..0000000
--- a/Doc/library/strings.rst
+++ /dev/null
@@ -1,27 +0,0 @@
-.. _stringservices:
-
-***************
-String Services
-***************
-
-The modules described in this chapter provide a wide range of string
-manipulation operations.
-
-In addition, Python's built-in string classes support the sequence type methods
-described in the :ref:`typesseq` section, and also the string-specific methods
-described in the :ref:`string-methods` section. To output formatted strings,
-see the :ref:`string-formatting` section. Also, see the :mod:`re` module for
-string functions based on regular expressions.
-
-
-.. toctree::
-
-   string.rst
-   re.rst
-   struct.rst
-   difflib.rst
-   textwrap.rst
-   codecs.rst
-   unicodedata.rst
-   stringprep.rst
-
diff --git a/Doc/library/text.rst b/Doc/library/text.rst
new file mode 100644
index 0000000..939ed4f
--- /dev/null
+++ b/Doc/library/text.rst
@@ -0,0 +1,24 @@
+.. _stringservices:
+.. _textservices:
+
+************************
+Text Processing Services
+************************
+
+The modules described in this chapter provide a wide range of string
+manipulation operations and other text processing services.
+
+The :mod:`codecs` module described under :ref:`binaryservices` is also
+highly relevant to text processing. In addition, see the documentation for
+Python's built-in string type in :ref:`textseq`.
+
+
+.. toctree::
+
+   string.rst
+   re.rst
+   difflib.rst
+   textwrap.rst
+   unicodedata.rst
+   stringprep.rst
+
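Note (not part of the commit): the ``memoryview`` text moved by this diff relates ``len()``, ``itemsize`` and ``nbytes``. The following is a minimal sketch of those relationships, assuming Python 3.3 or later. The helper name ``describe`` is hypothetical and used only for illustration, and item sizes are computed with ``struct.calcsize`` rather than hard-coded, since they can vary by platform.

```python
import array
import struct


def describe(view):
    """Return (format, itemsize, len, nbytes) for a memoryview.

    Hypothetical helper for illustration; not part of the memoryview API.
    """
    return view.format, view.itemsize, len(view), view.nbytes


a = array.array('i', [1, 2, 3, 4, 5])
m = memoryview(a)
int_size = struct.calcsize('i')  # native size of a C int on this platform

# len() counts items along the first dimension; nbytes counts bytes.
assert m.nbytes == len(m) * m.itemsize == 5 * int_size

# A strided slice has fewer items, and nbytes shrinks accordingly.
y = m[::2]
assert len(y) == 3
assert y.nbytes == 3 * int_size == len(y.tobytes())

# Casting to unsigned bytes makes len() and nbytes coincide.
raw = m.cast('B')
assert len(raw) == raw.nbytes == m.nbytes

# For a multi-dimensional view, len() is only the first dimension.
buf = struct.pack('d' * 12, *[1.5 * x for x in range(12)])
grid = memoryview(buf).cast('d', shape=[3, 4])
assert len(grid) == 3
assert grid.nbytes == 12 * struct.calcsize('d')
```

Using ``struct.calcsize`` keeps the checks valid on platforms where the native sizes differ from the values shown in the doctest output above.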