-rw-r--r-- Doc/lib/libstruct.tex | 73
-rw-r--r-- Doc/libstruct.tex     | 73
2 files changed, 128 insertions, 18 deletions
diff --git a/Doc/lib/libstruct.tex b/Doc/lib/libstruct.tex
index 4a08c78b..f29d83c 100644
--- a/Doc/lib/libstruct.tex
+++ b/Doc/lib/libstruct.tex
@@ -45,23 +45,81 @@ and Python values should be obvious given their types:
   \lineiii{x}{pad byte}{no value}
   \lineiii{c}{char}{string of length 1}
   \lineiii{b}{signed char}{integer}
+  \lineiii{B}{unsigned char}{integer}
   \lineiii{h}{short}{integer}
+  \lineiii{H}{unsigned short}{integer}
   \lineiii{i}{int}{integer}
+  \lineiii{I}{unsigned int}{integer}
   \lineiii{l}{long}{integer}
+  \lineiii{L}{unsigned long}{integer}
   \lineiii{f}{float}{float}
   \lineiii{d}{double}{float}
+  \lineiii{s}{char[]}{string}
 \end{tableiii}

 A format character may be preceded by an integral repeat count; e.g.\
 the format string \code{'4h'} means exactly the same as \code{'hhhh'}.

-C numbers are represented in the machine's native format and byte
-order, and properly aligned by skipping pad bytes if necessary
-(according to the rules used by the C compiler).
+For the \code{'s'} format character, the count is interpreted as the
+size of the string, not a repeat count like for the other format
+characters; e.g. \code{'10s'} means a single 10-byte string, while
+\code{'10c'} means 10 characters. For packing, the string is
+truncated or padded with null bytes as appropriate to make it fit.
+For unpacking, the resulting string always has exactly the specified
+number of bytes. As a special case, \code{'0s'} means a single, empty
+string (while \code{'0c'} means 0 characters).

-Examples (all on a big-endian machine):
+For the \code{'I'} and \code{'L'} format characters, the return
+value is a Python long integer if a Python plain integer can't
+represent the required range (note: this is dependent on the size of
+the relevant C types only, not of the sign of the actual value).
+
+By default, C numbers are represented in the machine's native format
+and byte order, and properly aligned by skipping pad bytes if
+necessary (according to the rules used by the C compiler).
+
+Alternatively, the first character of the format string can be used to
+indicate the byte order, size and alignment of the packed data,
+according to the following table:
+
+\begin{tableiii}{|c|l|l|}{samp}{Character}{Byte order}{Size and alignment}
+  \lineiii{@}{native}{native}
+  \lineiii{=}{native}{standard}
+  \lineiii{<}{little-endian}{standard}
+  \lineiii{>}{big-endian}{standard}
+  \lineiii{!}{network (= big-endian)}{standard}
+\end{tableiii}
+
+If the first character is not one of these, \code{'@'} is assumed.
+
+Native byte order is big-endian or little-endian, depending on the
+host system (e.g. Motorola and Sun are big-endian; Intel and DEC are
+little-endian).
+
+Native size and alignment are determined using the C compiler's sizeof
+expression. This is always combined with native byte order.
+
+Standard size and alignment are as follows: no alignment is required
+for any type (so you have to use pad bytes); short is 2 bytes; int and
+long are 4 bytes. In this mode, there is no support for float and
+double (\code{'f'} and \code{'d'}).
+
+Note the difference between \code{'@'} and \code{'='}: both use native
+byte order, but the size and alignment of the latter is standardized.
+
+The form \code{'!'} is available for those poor souls who claim they
+can't remember whether network byte order is big-endian or
+little-endian.
+
+There is no way to indicate non-native byte order (i.e. force
+byte-swapping); use the appropriate choice of \code{'<'} or
+\code{'>'}.
+
+Examples (all using native byte order, size and alignment, on a
+big-endian machine):

 \bcode\begin{verbatim}
+from struct import *
 pack('hhl', 1, 2, 3) == '\000\001\000\002\000\000\000\003'
 unpack('hhl', '\000\001\000\002\000\000\000\003') == (1, 2, 3)
 calcsize('hhl') == 8
@@ -71,8 +129,5 @@ Hint: to align the end of a structure to the alignment requirement of
 a particular type, end the format with the code for that type with a
 repeat count of zero, e.g.\ the format \code{'llh0l'} specifies two
 pad bytes at the end, assuming longs are aligned on 4-byte boundaries.
-
-(More format characters are planned, e.g.\ \code{'s'} for character
-arrays, upper case for unsigned variants, and a way to specify the
-byte order, which is useful for [de]constructing network packets and
-reading/writing portable binary file formats like TIFF and AIFF.)
+(This only works when native size and alignment are in effect;
+standard size and alignment does not enforce any alignment.)
diff --git a/Doc/libstruct.tex b/Doc/libstruct.tex
index 4a08c78b..f29d83c 100644
--- a/Doc/libstruct.tex
+++ b/Doc/libstruct.tex
@@ -45,23 +45,81 @@ and Python values should be obvious given their types:
   \lineiii{x}{pad byte}{no value}
   \lineiii{c}{char}{string of length 1}
   \lineiii{b}{signed char}{integer}
+  \lineiii{B}{unsigned char}{integer}
   \lineiii{h}{short}{integer}
+  \lineiii{H}{unsigned short}{integer}
   \lineiii{i}{int}{integer}
+  \lineiii{I}{unsigned int}{integer}
   \lineiii{l}{long}{integer}
+  \lineiii{L}{unsigned long}{integer}
   \lineiii{f}{float}{float}
   \lineiii{d}{double}{float}
+  \lineiii{s}{char[]}{string}
 \end{tableiii}

 A format character may be preceded by an integral repeat count; e.g.\
 the format string \code{'4h'} means exactly the same as \code{'hhhh'}.

-C numbers are represented in the machine's native format and byte
-order, and properly aligned by skipping pad bytes if necessary
-(according to the rules used by the C compiler).
+For the \code{'s'} format character, the count is interpreted as the
+size of the string, not a repeat count like for the other format
+characters; e.g. \code{'10s'} means a single 10-byte string, while
+\code{'10c'} means 10 characters. For packing, the string is
+truncated or padded with null bytes as appropriate to make it fit.
+For unpacking, the resulting string always has exactly the specified
+number of bytes. As a special case, \code{'0s'} means a single, empty
+string (while \code{'0c'} means 0 characters).

-Examples (all on a big-endian machine):
+For the \code{'I'} and \code{'L'} format characters, the return
+value is a Python long integer if a Python plain integer can't
+represent the required range (note: this is dependent on the size of
+the relevant C types only, not of the sign of the actual value).
+
+By default, C numbers are represented in the machine's native format
+and byte order, and properly aligned by skipping pad bytes if
+necessary (according to the rules used by the C compiler).
+
+Alternatively, the first character of the format string can be used to
+indicate the byte order, size and alignment of the packed data,
+according to the following table:
+
+\begin{tableiii}{|c|l|l|}{samp}{Character}{Byte order}{Size and alignment}
+  \lineiii{@}{native}{native}
+  \lineiii{=}{native}{standard}
+  \lineiii{<}{little-endian}{standard}
+  \lineiii{>}{big-endian}{standard}
+  \lineiii{!}{network (= big-endian)}{standard}
+\end{tableiii}
+
+If the first character is not one of these, \code{'@'} is assumed.
+
+Native byte order is big-endian or little-endian, depending on the
+host system (e.g. Motorola and Sun are big-endian; Intel and DEC are
+little-endian).
+
+Native size and alignment are determined using the C compiler's sizeof
+expression. This is always combined with native byte order.
+
+Standard size and alignment are as follows: no alignment is required
+for any type (so you have to use pad bytes); short is 2 bytes; int and
+long are 4 bytes. In this mode, there is no support for float and
+double (\code{'f'} and \code{'d'}).
+
+Note the difference between \code{'@'} and \code{'='}: both use native
+byte order, but the size and alignment of the latter is standardized.
+
+The form \code{'!'} is available for those poor souls who claim they
+can't remember whether network byte order is big-endian or
+little-endian.
+
+There is no way to indicate non-native byte order (i.e. force
+byte-swapping); use the appropriate choice of \code{'<'} or
+\code{'>'}.
+
+Examples (all using native byte order, size and alignment, on a
+big-endian machine):

 \bcode\begin{verbatim}
+from struct import *
 pack('hhl', 1, 2, 3) == '\000\001\000\002\000\000\000\003'
 unpack('hhl', '\000\001\000\002\000\000\000\003') == (1, 2, 3)
 calcsize('hhl') == 8
@@ -71,8 +129,5 @@ Hint: to align the end of a structure to the alignment requirement of
 a particular type, end the format with the code for that type with a
 repeat count of zero, e.g.\ the format \code{'llh0l'} specifies two
 pad bytes at the end, assuming longs are aligned on 4-byte boundaries.
-
-(More format characters are planned, e.g.\ \code{'s'} for character
-arrays, upper case for unsigned variants, and a way to specify the
-byte order, which is useful for [de]constructing network packets and
-reading/writing portable binary file formats like TIFF and AIFF.)
+(This only works when native size and alignment are in effect;
+standard size and alignment does not enforce any alignment.)
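The rules documented in this patch can be tried out with a short script. The following is a minimal sketch that assumes Python 3, which is much newer than the interpreter this patch describes: there, `'s'` and `'c'` operate on `bytes` rather than `str`, and `'I'`/`'L'` return ordinary `int` values because Python 3 has no separate long type. The script exercises the `'s'` count semantics, a signed versus an unsigned code, the byte-order prefixes, and the trailing `'0l'` alignment hint. Wherever the result would otherwise depend on the host, it uses standard-size formats.

```python
# Minimal sketch, assuming Python 3: 's' and 'c' work on bytes here, and
# there is no separate "long" type, unlike the interpreter this diff documents.
import struct
import sys

# 's': the count is the string length, not a repeat count.
padded = struct.pack('10s', b'hello')    # null-padded to exactly 10 bytes
truncated = struct.pack('3s', b'hello')  # truncated to exactly 3 bytes
chars = struct.unpack('3c', b'abc')      # '3c' means three 1-byte items
empty_s = struct.unpack('0s', b'')       # '0s' is a single empty string
empty_c = struct.unpack('0c', b'')       # '0c' is zero items

# The same four bytes read through a signed and an unsigned code.
all_ones = b'\xff\xff\xff\xff'
(signed,) = struct.unpack('<i', all_ones)
(unsigned,) = struct.unpack('<I', all_ones)

# Byte-order prefixes; all of these use standard size and no alignment.
little = struct.pack('<H', 1)
big = struct.pack('>H', 1)
network = struct.pack('!H', 1)           # '!' is the same as '>'
native_order = struct.pack('=H', 1)      # native byte order, standard size
expected = little if sys.byteorder == 'little' else big

# The documented 'hhl' example, made host-independent with '>':
# standard size fixes 'l' at 4 bytes regardless of the C compiler.
packed = struct.pack('>hhl', 1, 2, 3)
unpacked = struct.unpack('>hhl', packed)

# A trailing '0l' pads to a long boundary only in native mode.
native_size = struct.calcsize('llh0l')     # host-dependent
standard_size = struct.calcsize('<llh0l')  # no padding added

print(padded, truncated, chars, empty_s, empty_c)
print(signed, unsigned)
print(little, big, network, native_order == expected)
print(packed, unpacked, struct.calcsize('>hhl'))
print(native_size, standard_size)
```

On a typical 64-bit Linux build, where a C long is 8 bytes, the native `'llh0l'` size should be 24, while the standard-mode size stays at 10. The native figure depends on the platform's C compiler, so only the standard-mode results are portable.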