-rw-r--r-- Doc/lib/libstruct.tex 73
-rw-r--r-- Doc/libstruct.tex 73
2 files changed, 128 insertions, 18 deletions
diff --git a/Doc/lib/libstruct.tex b/Doc/lib/libstruct.tex
index 4a08c78b..f29d83c 100644
--- a/Doc/lib/libstruct.tex
+++ b/Doc/lib/libstruct.tex
@@ -45,23 +45,81 @@ and Python values should be obvious given their types:
\lineiii{x}{pad byte}{no value}
\lineiii{c}{char}{string of length 1}
\lineiii{b}{signed char}{integer}
+ \lineiii{B}{unsigned char}{integer}
\lineiii{h}{short}{integer}
+ \lineiii{H}{unsigned short}{integer}
\lineiii{i}{int}{integer}
+ \lineiii{I}{unsigned int}{integer}
\lineiii{l}{long}{integer}
+ \lineiii{L}{unsigned long}{integer}
\lineiii{f}{float}{float}
\lineiii{d}{double}{float}
+ \lineiii{s}{char[]}{string}
\end{tableiii}
A format character may be preceded by an integral repeat count; e.g.\
the format string \code{'4h'} means exactly the same as \code{'hhhh'}.
-C numbers are represented in the machine's native format and byte
-order, and properly aligned by skipping pad bytes if necessary
-(according to the rules used by the C compiler).
+For the \code{'s'} format character, the count is interpreted as the
+size of the string, not a repeat count as for the other format
+characters; e.g.\ \code{'10s'} means a single 10-byte string, while
+\code{'10c'} means 10 characters. For packing, the string is
+truncated or padded with null bytes as appropriate to make it fit.
+For unpacking, the resulting string always has exactly the specified
+number of bytes. As a special case, \code{'0s'} means a single, empty
+string (while \code{'0c'} means 0 characters).
-Examples (all on a big-endian machine):
+For the \code{'I'} and \code{'L'} format characters, the return
+value is a Python long integer if a Python plain integer can't
+represent the required range (note: this depends only on the size of
+the relevant C types, not on the sign of the actual value).
+
+By default, C numbers are represented in the machine's native format
+and byte order, and properly aligned by skipping pad bytes if
+necessary (according to the rules used by the C compiler).
+
+Alternatively, the first character of the format string can be used to
+indicate the byte order, size and alignment of the packed data,
+according to the following table:
+
+\begin{tableiii}{|c|l|l|}{samp}{Character}{Byte order}{Size and alignment}
+ \lineiii{@}{native}{native}
+ \lineiii{=}{native}{standard}
+ \lineiii{<}{little-endian}{standard}
+ \lineiii{>}{big-endian}{standard}
+ \lineiii{!}{network (= big-endian)}{standard}
+\end{tableiii}
+
+If the first character is not one of these, \code{'@'} is assumed.
+
+Native byte order is big-endian or little-endian, depending on the
+host system (e.g.\ Motorola and Sun are big-endian; Intel and DEC are
+little-endian).
+
+Native size and alignment are determined using the C compiler's sizeof
+expression. This is always combined with native byte order.
+
+Standard size and alignment are as follows: no alignment is required
+for any type (so you have to use pad bytes); short is 2 bytes; int and
+long are 4 bytes. In this mode, there is no support for float and
+double (\code{'f'} and \code{'d'}).
+
+Note the difference between \code{'@'} and \code{'='}: both use native
+byte order, but the size and alignment of the latter are standardized.
+
+The form \code{'!'} is available for those poor souls who claim they
+can't remember whether network byte order is big-endian or
+little-endian.
+
+There is no way to indicate non-native byte order (i.e.\ force
+byte-swapping); use the appropriate choice of \code{'<'} or
+\code{'>'}.
+
+Examples (all using native byte order, size and alignment, on a
+big-endian machine):
\bcode\begin{verbatim}
+from struct import *
pack('hhl', 1, 2, 3) == '\000\001\000\002\000\000\000\003'
unpack('hhl', '\000\001\000\002\000\000\000\003') == (1, 2, 3)
calcsize('hhl') == 8
@@ -71,8 +129,5 @@ Hint: to align the end of a structure to the alignment requirement of
a particular type, end the format with the code for that type with a
repeat count of zero, e.g.\ the format \code{'llh0l'} specifies two
pad bytes at the end, assuming longs are aligned on 4-byte boundaries.
-
-(More format characters are planned, e.g.\ \code{'s'} for character
-arrays, upper case for unsigned variants, and a way to specify the
-byte order, which is useful for [de]constructing network packets and
-reading/writing portable binary file formats like TIFF and AIFF.)
+(This only works when native size and alignment are in effect;
+standard size and alignment do not enforce any alignment.)
diff --git a/Doc/libstruct.tex b/Doc/libstruct.tex
index 4a08c78b..f29d83c 100644
--- a/Doc/libstruct.tex
+++ b/Doc/libstruct.tex
@@ -45,23 +45,81 @@ and Python values should be obvious given their types:
\lineiii{x}{pad byte}{no value}
\lineiii{c}{char}{string of length 1}
\lineiii{b}{signed char}{integer}
+ \lineiii{B}{unsigned char}{integer}
\lineiii{h}{short}{integer}
+ \lineiii{H}{unsigned short}{integer}
\lineiii{i}{int}{integer}
+ \lineiii{I}{unsigned int}{integer}
\lineiii{l}{long}{integer}
+ \lineiii{L}{unsigned long}{integer}
\lineiii{f}{float}{float}
\lineiii{d}{double}{float}
+ \lineiii{s}{char[]}{string}
\end{tableiii}
A format character may be preceded by an integral repeat count; e.g.\
the format string \code{'4h'} means exactly the same as \code{'hhhh'}.
-C numbers are represented in the machine's native format and byte
-order, and properly aligned by skipping pad bytes if necessary
-(according to the rules used by the C compiler).
+For the \code{'s'} format character, the count is interpreted as the
+size of the string, not a repeat count as for the other format
+characters; e.g.\ \code{'10s'} means a single 10-byte string, while
+\code{'10c'} means 10 characters. For packing, the string is
+truncated or padded with null bytes as appropriate to make it fit.
+For unpacking, the resulting string always has exactly the specified
+number of bytes. As a special case, \code{'0s'} means a single, empty
+string (while \code{'0c'} means 0 characters).
-Examples (all on a big-endian machine):
+For the \code{'I'} and \code{'L'} format characters, the return
+value is a Python long integer if a Python plain integer can't
+represent the required range (note: this depends only on the size of
+the relevant C types, not on the sign of the actual value).
+
+By default, C numbers are represented in the machine's native format
+and byte order, and properly aligned by skipping pad bytes if
+necessary (according to the rules used by the C compiler).
+
+Alternatively, the first character of the format string can be used to
+indicate the byte order, size and alignment of the packed data,
+according to the following table:
+
+\begin{tableiii}{|c|l|l|}{samp}{Character}{Byte order}{Size and alignment}
+ \lineiii{@}{native}{native}
+ \lineiii{=}{native}{standard}
+ \lineiii{<}{little-endian}{standard}
+ \lineiii{>}{big-endian}{standard}
+ \lineiii{!}{network (= big-endian)}{standard}
+\end{tableiii}
+
+If the first character is not one of these, \code{'@'} is assumed.
+
+Native byte order is big-endian or little-endian, depending on the
+host system (e.g.\ Motorola and Sun are big-endian; Intel and DEC are
+little-endian).
+
+Native size and alignment are determined using the C compiler's sizeof
+expression. This is always combined with native byte order.
+
+Standard size and alignment are as follows: no alignment is required
+for any type (so you have to use pad bytes); short is 2 bytes; int and
+long are 4 bytes. In this mode, there is no support for float and
+double (\code{'f'} and \code{'d'}).
+
+Note the difference between \code{'@'} and \code{'='}: both use native
+byte order, but the size and alignment of the latter are standardized.
+
+The form \code{'!'} is available for those poor souls who claim they
+can't remember whether network byte order is big-endian or
+little-endian.
+
+There is no way to indicate non-native byte order (i.e.\ force
+byte-swapping); use the appropriate choice of \code{'<'} or
+\code{'>'}.
+
+Examples (all using native byte order, size and alignment, on a
+big-endian machine):
\bcode\begin{verbatim}
+from struct import *
pack('hhl', 1, 2, 3) == '\000\001\000\002\000\000\000\003'
unpack('hhl', '\000\001\000\002\000\000\000\003') == (1, 2, 3)
calcsize('hhl') == 8
@@ -71,8 +129,5 @@ Hint: to align the end of a structure to the alignment requirement of
a particular type, end the format with the code for that type with a
repeat count of zero, e.g.\ the format \code{'llh0l'} specifies two
pad bytes at the end, assuming longs are aligned on 4-byte boundaries.
-
-(More format characters are planned, e.g.\ \code{'s'} for character
-arrays, upper case for unsigned variants, and a way to specify the
-byte order, which is useful for [de]constructing network packets and
-reading/writing portable binary file formats like TIFF and AIFF.)
+(This only works when native size and alignment are in effect;
+standard size and alignment do not enforce any alignment.)
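
Review note (not part of the patch): the byte-order prefixes and the 's' count rule documented above can be checked with a short sketch. It assumes a current Python 3 interpreter, where packed data are bytes objects rather than the strings shown in the patch; the variable names are illustrative only, and behaviour on the interpreter this patch targets may differ.

```python
import struct

# With an explicit prefix, standard sizes apply (short = 2 bytes,
# long = 4 bytes) and no alignment padding is inserted.
little = struct.pack('<hhl', 1, 2, 3)
big = struct.pack('>hhl', 1, 2, 3)
network = struct.pack('!hhl', 1, 2, 3)  # '!' is network order, i.e. big-endian

# '=' keeps native byte order but uses standard sizes, so the result
# has the same length on every platform.
standard_size = struct.calcsize('=hl')  # 2 + 4, no pad bytes

# For 's' the count is the field length, not a repeat count.
padded = struct.pack('5s', b'ab')        # padded with null bytes
truncated = struct.pack('2s', b'abcdef')  # truncated to fit
unpacked = struct.unpack('5s', padded)    # always exactly 5 bytes
empty = struct.unpack('0s', b'')          # a single, empty string
```

Because the prefixed formats never depend on the host's alignment rules, they are the ones to use for network packets and portable file formats; the unprefixed (native) form is only appropriate for data exchanged with C code compiled on the same machine.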