author | Mark Dickinson <dickinsm@gmail.com> | 2010-06-12 18:50:34 (GMT) |
---|---|---|
committer | Mark Dickinson <dickinsm@gmail.com> | 2010-06-12 18:50:34 (GMT) |
commit | 8e6c45cfd4b8e8205c55638a856909c38bccb586 (patch) | |
tree | bfbc892988d458e5d7bbe2c207cafbc7022c6cdd /Doc | |
parent | 8e5effaaa416bef3bbaf1d19171eef0d4b3401ba (diff) | |
Issue #8469: Add standard sizes to table in struct documentation; additional
clarifications and documentation tweaks.
Backport of revisions 81955-81956 from py3k.
Diffstat (limited to 'Doc')
-rw-r--r-- | Doc/library/struct.rst | 237 |
1 files changed, 119 insertions, 118 deletions
diff --git a/Doc/library/struct.rst b/Doc/library/struct.rst
index 166b734..5849261 100644
--- a/Doc/library/struct.rst
+++ b/Doc/library/struct.rst
@@ -82,9 +82,84 @@ Format Strings
 --------------
 
 Format strings are the mechanism used to specify the expected layout when
-packing and unpacking data. They are built up from format characters, which
-specify the type of data being packed/unpacked. In addition, there are
-special characters for controlling the byte order, size, and alignment.
+packing and unpacking data. They are built up from :ref:`format-characters`,
+which specify the type of data being packed/unpacked. In addition, there are
+special characters for controlling the :ref:`struct-alignment`.
+
+
+.. _struct-alignment:
+
+Byte Order, Size, and Alignment
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+By default, C types are represented in the machine's native format and byte
+order, and properly aligned by skipping pad bytes if necessary (according to the
+rules used by the C compiler).
+
+Alternatively, the first character of the format string can be used to indicate
+the byte order, size and alignment of the packed data, according to the
+following table:
+
++-----------+------------------------+--------------------+
+| Character | Byte order             | Size and alignment |
++===========+========================+====================+
+| ``@``     | native                 | native             |
++-----------+------------------------+--------------------+
+| ``=``     | native                 | standard           |
++-----------+------------------------+--------------------+
+| ``<``     | little-endian          | standard           |
++-----------+------------------------+--------------------+
+| ``>``     | big-endian             | standard           |
++-----------+------------------------+--------------------+
+| ``!``     | network (= big-endian) | standard           |
++-----------+------------------------+--------------------+
+
+If the first character is not one of these, ``'@'`` is assumed.
+
+Native byte order is big-endian or little-endian, depending on the host
+system. For example, Intel x86 and AMD64 (x86-64) are little-endian;
+Motorola 68000 and PowerPC G5 are big-endian; ARM and Intel Itanium feature
+switchable endianness (bi-endian). Use ``sys.byteorder`` to check the
+endianness of your system.
+
+Native size and alignment are determined using the C compiler's
+``sizeof`` expression. This is always combined with native byte order.
+
+Standard size and alignment are as follows: no alignment is required for any
+type (so you have to use pad bytes); :ctype:`short` is 2 bytes; :ctype:`int` and
+:ctype:`long` are 4 bytes; :ctype:`long long` (:ctype:`__int64` on Windows) is 8
+bytes; :ctype:`float` and :ctype:`double` are 32-bit and 64-bit IEEE floating
+point numbers, respectively. :ctype:`_Bool` is 1 byte.
+
+Note the difference between ``'@'`` and ``'='``: both use native byte order, but
+the size and alignment of the latter is standardized.
+
+The form ``'!'`` is available for those poor souls who claim they can't remember
+whether network byte order is big-endian or little-endian.
+
+There is no way to indicate non-native byte order (force byte-swapping); use the
+appropriate choice of ``'<'`` or ``'>'``.
+
+The ``'P'`` format character is only available for the native byte ordering
+(selected as the default or with the ``'@'`` byte order character). The byte
+order character ``'='`` chooses to use little- or big-endian ordering based on
+the host system. The struct module does not interpret this as native ordering,
+so the ``'P'`` format is not available.
+
+Notes:
+
+(1) Padding is only automatically added between successive structure members.
+    No padding is added at the beginning or the end of the encoded struct.
+
+(2) No padding is added when using non-native size and alignment, e.g.
+    with '<', '>', '=', and '!'.
+
+(3) To align the end of a structure to the alignment requirement of a
+    particular type, end the format with the code for that type with a repeat
+    count of zero. See :ref:`struct-examples`.
+
+
+.. _format-characters:
 
 Format Characters
 ^^^^^^^^^^^^^^^^^
@@ -92,46 +167,46 @@ Format Characters
 Format characters have the following meaning; the conversion between C and
 Python values should be obvious given their types:
 
-+--------+-------------------------+--------------------+------------+
-| Format | C Type                  | Python             | Notes      |
-+========+=========================+====================+============+
-| ``x``  | pad byte                | no value           |            |
-+--------+-------------------------+--------------------+------------+
-| ``c``  | :ctype:`char`           | string of length 1 |            |
-+--------+-------------------------+--------------------+------------+
-| ``b``  | :ctype:`signed char`    | integer            | \(3)       |
-+--------+-------------------------+--------------------+------------+
-| ``B``  | :ctype:`unsigned char`  | integer            | \(3)       |
-+--------+-------------------------+--------------------+------------+
-| ``?``  | :ctype:`_Bool`          | bool               | \(1)       |
-+--------+-------------------------+--------------------+------------+
-| ``h``  | :ctype:`short`          | integer            | \(3)       |
-+--------+-------------------------+--------------------+------------+
-| ``H``  | :ctype:`unsigned short` | integer            | \(3)       |
-+--------+-------------------------+--------------------+------------+
-| ``i``  | :ctype:`int`            | integer            | \(3)       |
-+--------+-------------------------+--------------------+------------+
-| ``I``  | :ctype:`unsigned int`   | integer or long    | \(3)       |
-+--------+-------------------------+--------------------+------------+
-| ``l``  | :ctype:`long`           | integer            | \(3)       |
-+--------+-------------------------+--------------------+------------+
-| ``L``  | :ctype:`unsigned long`  | long               | \(3)       |
-+--------+-------------------------+--------------------+------------+
-| ``q``  | :ctype:`long long`      | long               | \(2),\(3)  |
-+--------+-------------------------+--------------------+------------+
-| ``Q``  | :ctype:`unsigned long   | long               | \(2),\(3)  |
-|        | long`                   |                    |            |
-+--------+-------------------------+--------------------+------------+
-| ``f``  | :ctype:`float`          | float              |            |
-+--------+-------------------------+--------------------+------------+
-| ``d``  | :ctype:`double`         | float              |            |
-+--------+-------------------------+--------------------+------------+
-| ``s``  | :ctype:`char[]`         | string             |            |
-+--------+-------------------------+--------------------+------------+
-| ``p``  | :ctype:`char[]`         | string             |            |
-+--------+-------------------------+--------------------+------------+
-| ``P``  | :ctype:`void \*`        | long               | \(3)       |
-+--------+-------------------------+--------------------+------------+
++--------+-------------------------+--------------------+----------------+------------+
+| Format | C Type                  | Python type        | Standard size  | Notes      |
++========+=========================+====================+================+============+
+| ``x``  | pad byte                | no value           |                |            |
++--------+-------------------------+--------------------+----------------+------------+
+| ``c``  | :ctype:`char`           | string of length 1 | 1              |            |
++--------+-------------------------+--------------------+----------------+------------+
+| ``b``  | :ctype:`signed char`    | integer            | 1              | \(3)       |
++--------+-------------------------+--------------------+----------------+------------+
+| ``B``  | :ctype:`unsigned char`  | integer            | 1              | \(3)       |
++--------+-------------------------+--------------------+----------------+------------+
+| ``?``  | :ctype:`_Bool`          | bool               | 1              | \(1)       |
++--------+-------------------------+--------------------+----------------+------------+
+| ``h``  | :ctype:`short`          | integer            | 2              | \(3)       |
++--------+-------------------------+--------------------+----------------+------------+
+| ``H``  | :ctype:`unsigned short` | integer            | 2              | \(3)       |
++--------+-------------------------+--------------------+----------------+------------+
+| ``i``  | :ctype:`int`            | integer            | 4              | \(3)       |
++--------+-------------------------+--------------------+----------------+------------+
+| ``I``  | :ctype:`unsigned int`   | integer            | 4              | \(3)       |
++--------+-------------------------+--------------------+----------------+------------+
+| ``l``  | :ctype:`long`           | integer            | 4              | \(3)       |
++--------+-------------------------+--------------------+----------------+------------+
+| ``L``  | :ctype:`unsigned long`  | integer            | 4              | \(3)       |
++--------+-------------------------+--------------------+----------------+------------+
+| ``q``  | :ctype:`long long`      | integer            | 8              | \(2), \(3) |
++--------+-------------------------+--------------------+----------------+------------+
+| ``Q``  | :ctype:`unsigned long   | integer            | 8              | \(2), \(3) |
+|        | long`                   |                    |                |            |
++--------+-------------------------+--------------------+----------------+------------+
+| ``f``  | :ctype:`float`          | float              | 4              |            |
++--------+-------------------------+--------------------+----------------+------------+
+| ``d``  | :ctype:`double`         | float              | 8              |            |
++--------+-------------------------+--------------------+----------------+------------+
+| ``s``  | :ctype:`char[]`         | string             |                |            |
++--------+-------------------------+--------------------+----------------+------------+
+| ``p``  | :ctype:`char[]`         | string             |                |            |
++--------+-------------------------+--------------------+----------------+------------+
+| ``P``  | :ctype:`void \*`        | integer            |                | \(3)       |
++--------+-------------------------+--------------------+----------------+------------+
 
 Notes:
 
@@ -190,9 +265,6 @@ count-1, it is padded with null bytes so that exactly count bytes in all are
 used. Note that for :func:`unpack`, the ``'p'`` format character consumes count
 bytes, but that the string returned can never contain more than 255 characters.
 
-For the ``'I'``, ``'L'``, ``'q'`` and ``'Q'`` format characters, the return
-value is a Python long integer.
-
 For the ``'P'`` format character, the return value is a Python integer or long
 integer, depending on the size needed to hold a pointer when it has been cast to
 an integer type. A *NULL* pointer will always be returned as the Python integer
@@ -207,77 +279,6 @@ Either 0 or 1 in the native or standard bool representation will be packed, and
 any non-zero value will be True when unpacking.
 
 
-.. _struct-alignment:
-
-Byte Order, Size, and Alignment
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-
-By default, C types are represented in the machine's native format and byte
-order, and properly aligned by skipping pad bytes if necessary (according to the
-rules used by the C compiler).
-
-Alternatively, the first character of the format string can be used to indicate
-the byte order, size and alignment of the packed data, according to the
-following table:
-
-+-----------+------------------------+--------------------+
-| Character | Byte order             | Size and alignment |
-+===========+========================+====================+
-| ``@``     | native                 | native             |
-+-----------+------------------------+--------------------+
-| ``=``     | native                 | standard           |
-+-----------+------------------------+--------------------+
-| ``<``     | little-endian          | standard           |
-+-----------+------------------------+--------------------+
-| ``>``     | big-endian             | standard           |
-+-----------+------------------------+--------------------+
-| ``!``     | network (= big-endian) | standard           |
-+-----------+------------------------+--------------------+
-
-If the first character is not one of these, ``'@'`` is assumed.
-
-Native byte order is big-endian or little-endian, depending on the host
-system. For example, Intel x86 and AMD64 (x86-64) are little-endian;
-Motorola 68000 and PowerPC G5 are big-endian; ARM and Intel Itanium feature
-switchable endianness (bi-endian). Use ``sys.byteorder`` to check the
-endianness of your system.
-
-Native size and alignment are determined using the C compiler's
-``sizeof`` expression. This is always combined with native byte order.
-
-Standard size and alignment are as follows: no alignment is required for any
-type (so you have to use pad bytes); :ctype:`short` is 2 bytes; :ctype:`int` and
-:ctype:`long` are 4 bytes; :ctype:`long long` (:ctype:`__int64` on Windows) is 8
-bytes; :ctype:`float` and :ctype:`double` are 32-bit and 64-bit IEEE floating
-point numbers, respectively. :ctype:`_Bool` is 1 byte.
-
-Note the difference between ``'@'`` and ``'='``: both use native byte order, but
-the size and alignment of the latter is standardized.
-
-The form ``'!'`` is available for those poor souls who claim they can't remember
-whether network byte order is big-endian or little-endian.
-
-There is no way to indicate non-native byte order (force byte-swapping); use the
-appropriate choice of ``'<'`` or ``'>'``.
-
-The ``'P'`` format character is only available for the native byte ordering
-(selected as the default or with the ``'@'`` byte order character). The byte
-order character ``'='`` chooses to use little- or big-endian ordering based on
-the host system. The struct module does not interpret this as native ordering,
-so the ``'P'`` format is not available.
-
-Notes:
-
-(1) Padding is only automatically added between successive structure members.
-    No padding is added at the beginning or the end of the encoded struct.
-
-(2) No padding is added when using non-native size and alignment, e.g.
-    with '<', '>', '=', and '!'.
-
-(3) To align the end of a structure to the alignment requirement of a
-    particular type, end the format with the code for that type with a repeat
-    count of zero. See :ref:`struct-examples`.
-
 
 .. _struct-examples:
 
@@ -342,7 +343,7 @@ alignment does not enforce any alignment.
 
 .. _struct-objects:
 
-Objects
+Classes
 -------
 
 The :mod:`struct` module also defines the following type:
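
Not part of the commit: the byte order, size, and alignment rules that this patch moves and documents can be checked with a short sketch such as the one below. It assumes a Python 3 interpreter (for the ``b'...'`` literals), although the patch itself targets the 2.x documentation. The variable names are illustrative only. Native results depend on the platform, so the comments state exact values only for the standard modes.

```python
import struct
import sys

# Standard size and alignment ('=', '<', '>', '!'): fixed widths, no pad bytes.
standard_size = struct.calcsize('<hl')       # 2 + 4 bytes

# Native size and alignment ('@', also the default): the platform's C
# compiler rules apply, so pad bytes may appear between 'h' and 'l'.
native_size = struct.calcsize('@hl')         # platform dependent

# Byte order: '<' is little-endian, '>' and '!' are big-endian.
little = struct.pack('<H', 1)
big = struct.pack('>H', 1)
network = struct.pack('!H', 1)

# '=' uses the host byte order together with standard sizes.
host = struct.pack('=H', 1)
expected_host = little if sys.byteorder == 'little' else big

# 'P' is only accepted in native mode; '=P' is rejected.
try:
    struct.calcsize('=P')
    p_allowed_with_equals = True
except struct.error:
    p_allowed_with_equals = False

# A trailing zero repeat count pads the end of the struct to that type's
# alignment in native mode; in standard mode it adds nothing.
native_padded = struct.calcsize('@llh0l')    # platform dependent
standard_padded = struct.calcsize('<llh0l')  # 4 + 4 + 2

print(standard_size, native_size, native_padded, standard_padded)
```

Comparing ``native_size`` with ``standard_size`` on a given machine shows whether the C compiler's rules insert padding or use a wider ``long`` there.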