author | William Joye <wjoye@cfa.harvard.edu> | 2016-10-18 17:31:11 (GMT) |
---|---|---|
committer | William Joye <wjoye@cfa.harvard.edu> | 2016-10-18 17:31:11 (GMT) |
commit | 066971b1e6e77991d9161bb0216a63ba94ea04f9 (patch) | |
tree | 6de02f79b7a4bb08a329581aa67b444fb9001bfd /tcl8.6/doc/binary.n | |
parent | ba065c2de121da1c1dfddd0aa587d10e7e150f05 (diff) | |
parent | 9966985d896629eede849a84f18e406d1164a16c (diff) | |
Merge commit '9966985d896629eede849a84f18e406d1164a16c' as 'tcl8.6'
Diffstat (limited to 'tcl8.6/doc/binary.n')
-rw-r--r-- | tcl8.6/doc/binary.n | 908 |
1 files changed, 908 insertions, 0 deletions
diff --git a/tcl8.6/doc/binary.n b/tcl8.6/doc/binary.n
new file mode 100644
index 0000000..5f25d65
--- /dev/null
+++ b/tcl8.6/doc/binary.n
@@ -0,0 +1,908 @@
+'\"
+'\" Copyright (c) 1997 by Sun Microsystems, Inc.
+'\" Copyright (c) 2008 by Donal K. Fellows
+'\"
+'\" See the file "license.terms" for information on usage and redistribution
+'\" of this file, and for a DISCLAIMER OF ALL WARRANTIES.
+'\"
+.TH binary n 8.0 Tcl "Tcl Built-In Commands"
+.so man.macros
+.BS
+'\" Note: do not modify the .SH NAME line immediately below!
+.SH NAME
+binary \- Insert and extract fields from binary strings
+.SH SYNOPSIS
+.VS 8.6
+\fBbinary decode \fIformat\fR ?\fI\-option value ...\fR? \fIdata\fR
+.br
+\fBbinary encode \fIformat\fR ?\fI\-option value ...\fR? \fIdata\fR
+.br
+.VE 8.6
+\fBbinary format \fIformatString \fR?\fIarg arg ...\fR?
+.br
+\fBbinary scan \fIstring formatString \fR?\fIvarName varName ...\fR?
+.BE
+.SH DESCRIPTION
+.PP
+This command provides facilities for manipulating binary data. The
+subcommand \fBbinary format\fR creates a binary string from normal
+Tcl values. For example, given the values 16 and 22, on a 32-bit
+architecture, it might produce an 8-byte binary string consisting of
+two 4-byte integers, one for each of the numbers. The subcommand
+\fBbinary scan\fR does the opposite: it extracts data
+from a binary string and returns it as ordinary Tcl string values.
+.VS 8.6
+The \fBbinary encode\fR and \fBbinary decode\fR subcommands convert
+binary data to or from string encodings such as base64 (used in MIME
+messages for example).
+.VE 8.6
+.PP
+Note that other operations on binary data, such as taking a subsequence of it,
+getting its length, or reinterpreting it as a string in some encoding, are
+done by other Tcl commands (respectively \fBstring range\fR,
+\fBstring length\fR and \fBencoding convertfrom\fR in the example cases). A
+binary string in Tcl is merely one where all the characters it contains are in
+the range \eu0000\-\eu00FF.
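Here is a minimal sketch of the format/scan round trip described above. It assumes a Tcl 8.6 interpreter, which `binary encode` needs, and the variable names are illustrative only. The script packs 16 and 22 as two 32-bit little-endian integers with the `i2` specifier, shows the resulting bytes in hex, and scans them back:

```tcl
# Pack 16 and 22 as two 32-bit little-endian integers ("i2"): 8 bytes in total.
set bytes [binary format i2 {16 22}]
puts [string length $bytes]          ;# 8

# Show the raw bytes as hex digits (binary encode is new in Tcl 8.6).
puts [binary encode hex $bytes]      ;# 1000000016000000

# Read both integers back into one list variable.
# binary scan returns how many variables it set.
set n [binary scan $bytes i2 values]
puts $n                              ;# 1
puts $values                         ;# 16 22
```

The example uses `i2` rather than the native-order `n2`. That keeps the byte layout independent of the host, so the hex output should be the same on any platform.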
+.SH "BINARY ENCODE AND DECODE"
+.VS 8.6
+.PP
+When encoding binary data as a readable string, the starting binary data is
+passed to the \fBbinary encode\fR command, together with the name of the
+encoding to use and any encoding-specific options desired. Data which has been
+encoded can be converted back to binary form using \fBbinary decode\fR. The
+following formats and options are supported.
+.TP
+\fBbase64\fR
+.
+The \fBbase64\fR binary encoding is commonly used in mail messages and XML
+documents, and uses mostly upper and lower case letters and digits. It has the
+distinction of being able to be rewrapped arbitrarily without losing
+information.
+.RS
+.PP
+During encoding, the following options are supported:
+.TP
+\fB\-maxlen \fIlength\fR
+.
+Indicates that the output should be split into lines of no more than
+\fIlength\fR characters. By default, lines are not split.
+.TP
+\fB\-wrapchar \fIcharacter\fR
+.
+Indicates that, when lines are split because of the \fB\-maxlen\fR option,
+\fIcharacter\fR should be used to separate lines. By default, this is a
+newline character,
+.QW \en .
+.PP
+During decoding, the following options are supported:
+.TP
+\fB\-strict\fR
+.
+Instructs the decoder to throw an error if it encounters whitespace characters. Otherwise it ignores them.
+.RE
+.TP
+\fBhex\fR
+.
+The \fBhex\fR binary encoding converts each byte to a pair of hexadecimal
+digits in big-endian form.
+.RS
+.PP
+No options are supported during encoding. During decoding, the following
+options are supported:
+.TP
+\fB\-strict\fR
+.
+Instructs the decoder to throw an error if it encounters whitespace characters. Otherwise it ignores them.
+.RE
+.TP
+\fBuuencode\fR
+.
+The \fBuuencode\fR binary encoding used to be common for transfer of data
+between Unix systems and on USENET, but is less common these days, having been
+largely superseded by the \fBbase64\fR binary encoding.
+.RS
+.PP
+During encoding, the following options are supported (though changing them may
+produce files that other implementations of decoders cannot process):
+.TP
+\fB\-maxlen \fIlength\fR
+.
+Indicates that the output should be split into lines of no more than
+\fIlength\fR characters. By default, lines are split every 61 characters, and
+this must be in the range 3 to 85 due to limitations in the encoding.
+.TP
+\fB\-wrapchar \fIcharacter\fR
+.
+Indicates that, when lines are split because of the \fB\-maxlen\fR option,
+\fIcharacter\fR should be used to separate lines. By default, this is a
+newline character,
+.QW \en .
+.PP
+During decoding, the following options are supported:
+.TP
+\fB\-strict\fR
+.
+Instructs the decoder to throw an error if it encounters unexpected whitespace
+characters. Otherwise it ignores them.
+.PP
+Note that neither the encoder nor the decoder handles the header and footer of
+the uuencode format.
+.RE
+.VE 8.6
+.SH "BINARY FORMAT"
+.PP
+The \fBbinary format\fR command generates a binary string whose layout
+is specified by the \fIformatString\fR and whose contents come from
+the additional arguments. The resulting binary value is returned.
+.PP
+The \fIformatString\fR consists of a sequence of zero or more field
+specifiers separated by zero or more spaces. Each field specifier is
+a single type character followed by an optional flag character followed
+by an optional numeric \fIcount\fR.
+Most field specifiers consume one argument to obtain the value to be
+formatted. The type character specifies how the value is to be
+formatted. The \fIcount\fR typically indicates how many items of the
+specified type are taken from the value. If present, the \fIcount\fR
+is a non-negative decimal integer or \fB*\fR, which normally indicates
+that all of the items in the value are to be used. If the number of
+arguments does not match the number of fields in the format string
+that consume arguments, then an error is generated. The flag character
+is ignored for \fBbinary format\fR.
+.PP
+Here is a small example to clarify the relation between the field
+specifiers and the arguments:
+.CS
+\fBbinary format\fR d3d {1.0 2.0 3.0 4.0} 0.1
+.CE
+.PP
+The first argument is a list of four numbers, but because of the count
+of 3 for the associated field specifier, only the first three will be
+used. The second argument is associated with the second field
+specifier. The resulting binary string contains the four numbers 1.0,
+2.0, 3.0 and 0.1.
+.PP
+Each type-count pair moves an imaginary cursor through the binary
+data, storing bytes at the current position and advancing the cursor
+to just after the last byte stored. The cursor is initially at
+position 0 at the beginning of the data. The type may be any one of
+the following characters:
+.IP \fBa\fR 5
+Stores a byte string of length \fIcount\fR in the output string.
+Every character is taken as modulo 256 (i.e. the low byte of every
+character is used, and the high byte discarded) so when storing
+character strings not wholly expressible using the characters \eu0000-\eu00ff,
+the \fBencoding convertto\fR command should be used first to change
+the string into an external representation
+if this truncation is not desired (i.e. if the characters are
+not part of the ISO 8859\-1 character set).
+If \fIarg\fR has fewer than \fIcount\fR bytes, then additional zero
+bytes are used to pad out the field. If \fIarg\fR is longer than the
+specified length, the extra characters will be ignored. If
+\fIcount\fR is \fB*\fR, then all of the bytes in \fIarg\fR will be
+formatted. If \fIcount\fR is omitted, then one character will be
+formatted. For example,
+.RS
+.CS
+\fBbinary format\fR a7a*a alpha bravo charlie
+.CE
+will return a string equivalent to \fBalpha\e000\e000bravoc\fR,
+.CS
+\fBbinary format\fR a* [encoding convertto utf-8 \eu20ac]
+.CE
+will return a string equivalent to \fB\e342\e202\e254\fR (which is the
+UTF-8 byte sequence for a Euro-currency character) and
+.CS
+\fBbinary format\fR a* [encoding convertto iso8859-15 \eu20ac]
+.CE
+will return a string equivalent to \fB\e244\fR (which is the ISO
+8859\-15 byte sequence for a Euro-currency character). Contrast these
+last two with:
+.CS
+\fBbinary format\fR a* \eu20ac
+.CE
+which returns a string equivalent to \fB\e254\fR (i.e. \fB\exac\fR) by
+truncating the high bits of the character, and which is probably not
+what is desired.
+.RE
+.IP \fBA\fR 5
+This form is the same as \fBa\fR except that spaces are used for
+padding instead of nulls. For example,
+.RS
+.CS
+\fBbinary format\fR A6A*A alpha bravo charlie
+.CE
+will return \fBalpha bravoc\fR.
+.RE
+.IP \fBb\fR 5
+Stores a string of \fIcount\fR binary digits in low-to-high order
+within each byte in the output string. \fIArg\fR must contain a
+sequence of \fB1\fR and \fB0\fR characters. The resulting bytes are
+emitted in first to last order with the bits being formatted in
+low-to-high order within each byte. If \fIarg\fR has fewer than
+\fIcount\fR digits, then zeros will be used for the remaining bits.
+If \fIarg\fR has more than the specified number of digits, the extra
+digits will be ignored. If \fIcount\fR is \fB*\fR, then all of the
+digits in \fIarg\fR will be formatted. If \fIcount\fR is omitted,
+then one digit will be formatted. If the number of bits formatted
+does not end at a byte boundary, the remaining bits of the last byte
+will be zeros. For example,
+.RS
+.CS
+\fBbinary format\fR b5b* 11100 111000011010
+.CE
+will return a string equivalent to \fB\ex07\ex87\ex05\fR.
+.RE
+.IP \fBB\fR 5
+This form is the same as \fBb\fR except that the bits are stored in
+high-to-low order within each byte. For example,
+.RS
+.CS
+\fBbinary format\fR B5B* 11100 111000011010
+.CE
+will return a string equivalent to \fB\exe0\exe1\exa0\fR.
+.RE
+.IP \fBH\fR 5
+Stores a string of \fIcount\fR hexadecimal digits in high-to-low order
+within each byte in the output string. \fIArg\fR must contain a
+sequence of characters in the set
+.QW 0123456789abcdefABCDEF .
+The resulting bytes are emitted in first to last order with the hex digits
+being formatted in high-to-low order within each byte. If \fIarg\fR
+has fewer than \fIcount\fR digits, then zeros will be used for the
+remaining digits. If \fIarg\fR has more than the specified number of
+digits, the extra digits will be ignored. If \fIcount\fR is
+\fB*\fR, then all of the digits in \fIarg\fR will be formatted. If
+\fIcount\fR is omitted, then one digit will be formatted. If the
+number of digits formatted does not end at a byte boundary, the
+remaining bits of the last byte will be zeros. For example,
+.RS
+.CS
+\fBbinary format\fR H3H*H2 ab DEF 987
+.CE
+will return a string equivalent to \fB\exab\ex00\exde\exf0\ex98\fR.
+.RE
+.IP \fBh\fR 5
+This form is the same as \fBH\fR except that the digits are stored in
+low-to-high order within each byte. This is seldom required. For example,
+.RS
+.CS
+\fBbinary format\fR h3h*h2 AB def 987
+.CE
+will return a string equivalent to \fB\exba\ex00\exed\ex0f\ex89\fR.
+.RE
+.IP \fBc\fR 5
+Stores one or more 8-bit integer values in the output string. If no
+\fIcount\fR is specified, then \fIarg\fR must consist of an integer
+value. If \fIcount\fR is specified, \fIarg\fR must consist of a list
+containing at least that many integers. The low-order 8 bits of each integer
+are stored as a one-byte value at the cursor position. If \fIcount\fR
+is \fB*\fR, then all of the integers in the list are formatted. If the
+number of elements in the list is greater
+than \fIcount\fR, then the extra elements are ignored. For example,
+.RS
+.CS
+\fBbinary format\fR c3cc* {3 -3 128 1} 260 {2 5}
+.CE
+will return a string equivalent to
+\fB\ex03\exfd\ex80\ex04\ex02\ex05\fR, whereas
+.CS
+\fBbinary format\fR c {2 5}
+.CE
+will generate an error.
+.RE
+.IP \fBs\fR 5
+This form is the same as \fBc\fR except that it stores one or more
+16-bit integers in little-endian byte order in the output string. The
+low-order 16-bits of each integer are stored as a two-byte value at
+the cursor position with the least significant byte stored first. For
+example,
+.RS
+.CS
+\fBbinary format\fR s3 {3 -3 258 1}
+.CE
+will return a string equivalent to
+\fB\ex03\ex00\exfd\exff\ex02\ex01\fR.
+.RE
+.IP \fBS\fR 5
+This form is the same as \fBs\fR except that it stores one or more
+16-bit integers in big-endian byte order in the output string. For
+example,
+.RS
+.CS
+\fBbinary format\fR S3 {3 -3 258 1}
+.CE
+will return a string equivalent to
+\fB\ex00\ex03\exff\exfd\ex01\ex02\fR.
+.RE
+.IP \fBt\fR 5
+This form (mnemonically \fItiny\fR) is the same as \fBs\fR and \fBS\fR
+except that it stores the 16-bit integers in the output string in the
+native byte order of the machine where the Tcl script is running.
+To determine what the native byte order of the machine is, refer to
+the \fBbyteOrder\fR element of the \fBtcl_platform\fR array.
+.IP \fBi\fR 5
+This form is the same as \fBc\fR except that it stores one or more
+32-bit integers in little-endian byte order in the output string. The
+low-order 32-bits of each integer are stored as a four-byte value at
+the cursor position with the least significant byte stored first. For
+example,
+.RS
+.CS
+\fBbinary format\fR i3 {3 -3 65536 1}
+.CE
+will return a string equivalent to
+\fB\ex03\ex00\ex00\ex00\exfd\exff\exff\exff\ex00\ex00\ex01\ex00\fR.
+.RE
+.IP \fBI\fR 5
+This form is the same as \fBi\fR except that it stores one or more
+32-bit integers in big-endian byte order in the output string.
+For example,
+.RS
+.CS
+\fBbinary format\fR I3 {3 -3 65536 1}
+.CE
+will return a string equivalent to
+\fB\ex00\ex00\ex00\ex03\exff\exff\exff\exfd\ex00\ex01\ex00\ex00\fR.
+.RE
+.IP \fBn\fR 5
+This form (mnemonically \fInumber\fR or \fInormal\fR) is the same as
+\fBi\fR and \fBI\fR except that it stores the 32-bit integers in the
+output string in the native byte order of the machine where the Tcl
+script is running.
+To determine what the native byte order of the machine is, refer to
+the \fBbyteOrder\fR element of the \fBtcl_platform\fR array.
+.IP \fBw\fR 5
+This form is the same as \fBc\fR except that it stores one or more
+64-bit integers in little-endian byte order in the output string. The
+low-order 64-bits of each integer are stored as an eight-byte value at
+the cursor position with the least significant byte stored first. For
+example,
+.RS
+.CS
+\fBbinary format\fR w 7810179016327718216
+.CE
+will return the string \fBHelloTcl\fR.
+.RE
+.IP \fBW\fR 5
+This form is the same as \fBw\fR except that it stores one or more
+64-bit integers in big-endian byte order in the output string.
+For example,
+.RS
+.CS
+\fBbinary format\fR Wc 4785469626960341345 110
+.CE
+will return the string \fBBigEndian\fR.
+.RE
+.IP \fBm\fR 5
+This form (mnemonically the mirror of \fBw\fR) is the same as \fBw\fR
+and \fBW\fR except that it stores the 64-bit integers in the output
+string in the native byte order of the machine where the Tcl script is
+running.
+To determine what the native byte order of the machine is, refer to
+the \fBbyteOrder\fR element of the \fBtcl_platform\fR array.
+.IP \fBf\fR 5
+This form is the same as \fBc\fR except that it stores one or more
+single-precision floating point numbers in the machine's native
+representation in the output string. This representation is not
+portable across architectures, so it should not be used to communicate
+floating point numbers across the network. The size of a floating
+point number may vary across architectures, so the number of bytes
+that are generated may vary. If the value overflows the
+machine's native representation, then the value of FLT_MAX
+as defined by the system will be used instead. Because Tcl uses
+double-precision floating point numbers internally, there may be some
+loss of precision in the conversion to single-precision. For example,
+on a Windows system running on an Intel Pentium processor,
+.RS
+.CS
+\fBbinary format\fR f2 {1.6 3.4}
+.CE
+will return a string equivalent to
+\fB\excd\excc\excc\ex3f\ex9a\ex99\ex59\ex40\fR.
+.RE
+.IP \fBr\fR 5
+This form (mnemonically \fIreal\fR) is the same as \fBf\fR except that
+it stores the single-precision floating point numbers in little-endian
+order. This conversion only produces meaningful output when used on
+machines which use the IEEE floating point representation (very
+common, but not universal).
+.IP \fBR\fR 5
+This form is the same as \fBr\fR except that it stores the
+single-precision floating point numbers in big-endian order.
+.IP \fBd\fR 5
+This form is the same as \fBf\fR except that it stores one or more
+double-precision floating point numbers in the machine's native
+representation in the output string. For example, on a
+Windows system running on an Intel Pentium processor,
+.RS
+.CS
+\fBbinary format\fR d1 {1.6}
+.CE
+will return a string equivalent to
+\fB\ex9a\ex99\ex99\ex99\ex99\ex99\exf9\ex3f\fR.
+.RE
+.IP \fBq\fR 5
+This form (mnemonically the mirror of \fBd\fR) is the same as \fBd\fR
+except that it stores the double-precision floating point numbers in
+little-endian order. This conversion only produces meaningful output
+when used on machines which use the IEEE floating point representation
+(very common, but not universal).
+.IP \fBQ\fR 5
+This form is the same as \fBq\fR except that it stores the
+double-precision floating point numbers in big-endian order.
+.IP \fBx\fR 5
+Stores \fIcount\fR null bytes in the output string. If \fIcount\fR is
+not specified, stores one null byte. If \fIcount\fR is \fB*\fR,
+generates an error. This type does not consume an argument. For
+example,
+.RS
+.CS
+\fBbinary format\fR a3xa3x2a3 abc def ghi
+.CE
+will return a string equivalent to \fBabc\e000def\e000\e000ghi\fR.
+.RE
+.IP \fBX\fR 5
+Moves the cursor back \fIcount\fR bytes in the output string. If
+\fIcount\fR is \fB*\fR or is larger than the current cursor position,
+then the cursor is positioned at location 0 so that the next byte
+stored will be the first byte in the result string. If \fIcount\fR is
+omitted then the cursor is moved back one byte. This type does not
+consume an argument. For example,
+.RS
+.CS
+\fBbinary format\fR a3X*a3X2a3 abc def ghi
+.CE
+will return \fBdghi\fR.
+.RE
+.IP \fB@\fR 5
+Moves the cursor to the absolute location in the output string
+specified by \fIcount\fR. Position 0 refers to the first byte in the
+output string. If \fIcount\fR refers to a position beyond the last
+byte stored so far, then null bytes will be placed in the uninitialized
+locations and the cursor will be placed at the specified location. If
+\fIcount\fR is \fB*\fR, then the cursor is moved to the current end of
+the output string. If \fIcount\fR is omitted, then an error will be
+generated. This type does not consume an argument. For example,
+.RS
+.CS
+\fBbinary format\fR a5@2a1@*a3@10a1 abcde f ghi j
+.CE
+will return \fBabfdeghi\e000\e000j\fR.
+.RE
+.SH "BINARY SCAN"
+.PP
+The \fBbinary scan\fR command parses fields from a binary string,
+returning the number of conversions performed. \fIString\fR gives the
+input bytes to be parsed (one byte per character, and characters not
+representable as a byte have their high bits chopped)
+and \fIformatString\fR indicates how to parse it.
+Each \fIvarName\fR gives the name of a variable; when a field is
+scanned from \fIstring\fR the result is assigned to the corresponding
+variable.
+.PP
+As with \fBbinary format\fR, the \fIformatString\fR consists of a
+sequence of zero or more field specifiers separated by zero or more
+spaces. Each field specifier is a single type character followed by
+an optional flag character followed by an optional numeric \fIcount\fR.
+Most field specifiers consume one
+argument to obtain the variable into which the scanned values should
+be placed. The type character specifies how the binary data is to be
+interpreted. The \fIcount\fR typically indicates how many items of
+the specified type are taken from the data. If present, the
+\fIcount\fR is a non-negative decimal integer or \fB*\fR, which
+normally indicates that all of the remaining items in the data are to
+be used. If there are not enough bytes left after the current cursor
+position to satisfy the current field specifier, then the
+corresponding variable is left untouched and \fBbinary scan\fR returns
+immediately with the number of variables that were set. If there are
+not enough arguments for all of the fields in the format string that
+consume arguments, then an error is generated. The flag character
+.QW u
+may be given to cause some types to be read as unsigned values. The flag
+is accepted for all field types but is ignored for non-integer fields.
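The following sketch exercises two rules from the paragraph above. It assumes Tcl 8.6, and the variable names are illustrative. First, the `u` flag changes how an integer field is interpreted. Second, when a field runs past the end of the data, its variable is left unset and `binary scan` returns the number of variables it actually set.

```tcl
# Three bytes: 0x01 0x02 0xff.
set data [binary format c3 {1 2 255}]

# Signed 8-bit scan: the byte 0xff is read back as -1.
binary scan $data c3 signed
puts $signed                          ;# 1 2 -1

# The same bytes with the "u" flag placed after the type character.
binary scan $data cu3 unsigned
puts $unsigned                        ;# 1 2 255

# After c2 only one byte is left, so the 2-byte "s" field cannot be
# satisfied: the scan stops and reports that one variable was set.
set n [binary scan $data c2s first second]
puts $n                               ;# 1
puts [info exists second]             ;# 0
```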
+.PP
+An example similar to the one given for \fBbinary format\fR shows the
+relation between field specifiers and arguments for the binary
+scan subcommand:
+.CS
+\fBbinary scan\fR $bytes s3s first second
+.CE
+.PP
+This command (provided the binary string in the variable \fIbytes\fR
+is long enough) assigns a list of three integers to the variable
+\fIfirst\fR and assigns a single value to the variable \fIsecond\fR.
+If \fIbytes\fR contains fewer than 8 bytes (i.e. four 2-byte
+integers), no assignment to \fIsecond\fR will be made, and if
+\fIbytes\fR contains fewer than 6 bytes (i.e. three 2-byte integers),
+no assignment to \fIfirst\fR will be made. Hence:
+.CS
+puts [\fBbinary scan\fR abcdefg s3s first second]
+puts $first
+puts $second
+.CE
+will print (assuming neither variable is set previously):
+.CS
+1
+25185 25699 26213
+can't read "second": no such variable
+.CE
+.PP
+It is \fIimportant\fR to note that the \fBc\fR, \fBs\fR, and \fBS\fR
+(and \fBi\fR and \fBI\fR on 64-bit systems) will be scanned into
+long data size values. In doing this, values that have their high
+bit set (0x80 for chars, 0x8000 for shorts, 0x80000000 for ints)
+will be sign extended. Thus the following will occur:
+.CS
+set signShort [\fBbinary format\fR s1 0x8000]
+\fBbinary scan\fR $signShort s1 val; \fI# val == -32768\fR
+.CE
+If you require unsigned values you can include the
+.QW u
+flag character following
+the field type. For example, to read an unsigned short value:
+.CS
+set signShort [\fBbinary format\fR s1 0x8000]
+\fBbinary scan\fR $signShort su1 val; \fI# val == 0x00008000\fR
+.CE
+.PP
+Each type-count pair moves an imaginary cursor through the binary data,
+reading bytes from the current position. The cursor is initially
+at position 0 at the beginning of the data. The type may be any one of
+the following characters:
+.IP \fBa\fR 5
+The data is a byte string of length \fIcount\fR. If \fIcount\fR
+is \fB*\fR, then all of the remaining bytes in \fIstring\fR will be
+scanned into the variable. If \fIcount\fR is omitted, then one
+byte will be scanned.
+All bytes scanned will be interpreted as being characters in the
+range \eu0000-\eu00ff so the \fBencoding convertfrom\fR command will be
+needed if the string is not a binary string or a string encoded in ISO
+8859\-1.
+For example,
+.RS
+.CS
+\fBbinary scan\fR abcde\e000fghi a6a10 var1 var2
+.CE
+will return \fB1\fR with the string equivalent to \fBabcde\e000\fR
+stored in \fIvar1\fR and \fIvar2\fR left unmodified, and
+.CS
+\fBbinary scan\fR \e342\e202\e254 a* var1
+set var2 [encoding convertfrom utf-8 $var1]
+.CE
+will store a Euro-currency character in \fIvar2\fR.
+.RE
+.IP \fBA\fR 5
+This form is the same as \fBa\fR, except trailing blanks and nulls are stripped from
+the scanned value before it is stored in the variable. For example,
+.RS
+.CS
+\fBbinary scan\fR "abc efghi \e000" A* var1
+.CE
+will return \fB1\fR with \fBabc efghi\fR stored in \fIvar1\fR.
+.RE
+.IP \fBb\fR 5
+The data is turned into a string of \fIcount\fR binary digits in
+low-to-high order represented as a sequence of
+.QW 1
+and
+.QW 0
+characters. The data bytes are scanned in first to last order with
+the bits being taken in low-to-high order within each byte. Any extra
+bits in the last byte are ignored. If \fIcount\fR is \fB*\fR, then
+all of the remaining bits in \fIstring\fR will be scanned. If
+\fIcount\fR is omitted, then one bit will be scanned. For example,
+.RS
+.CS
+\fBbinary scan\fR \ex07\ex87\ex05 b5b* var1 var2
+.CE
+will return \fB2\fR with \fB11100\fR stored in \fIvar1\fR and
+\fB1110000110100000\fR stored in \fIvar2\fR.
+.RE
+.IP \fBB\fR 5
+This form is the same as \fBb\fR, except the bits are taken in
+high-to-low order within each byte. For example,
+.RS
+.CS
+\fBbinary scan\fR \ex70\ex87\ex05 B5B* var1 var2
+.CE
+will return \fB2\fR with \fB01110\fR stored in \fIvar1\fR and
+\fB1000011100000101\fR stored in \fIvar2\fR.
+.RE
+.IP \fBH\fR 5
+The data is turned into a string of \fIcount\fR hexadecimal digits in
+high-to-low order represented as a sequence of characters in the set
+.QW 0123456789abcdef .
+The data bytes are scanned in first to last
+order with the hex digits being taken in high-to-low order within each
+byte. Any extra bits in the last byte are ignored. If \fIcount\fR is
+\fB*\fR, then all of the remaining hex digits in \fIstring\fR will be
+scanned. If \fIcount\fR is omitted, then one hex digit will be
+scanned. For example,
+.RS
+.CS
+\fBbinary scan\fR \ex07\exC6\ex05\ex1f\ex34 H3H* var1 var2
+.CE
+will return \fB2\fR with \fB07c\fR stored in \fIvar1\fR and
+\fB051f34\fR stored in \fIvar2\fR.
+.RE
+.IP \fBh\fR 5
+This form is the same as \fBH\fR, except the digits are taken in
+reverse (low-to-high) order within each byte. For example,
+.RS
+.CS
+\fBbinary scan\fR \ex07\ex86\ex05\ex12\ex34 h3h* var1 var2
+.CE
+will return \fB2\fR with \fB706\fR stored in \fIvar1\fR and
+\fB502143\fR stored in \fIvar2\fR.
+.PP
+Note that most code that wishes to parse the hexadecimal digits from
+multiple bytes in order should use the \fBH\fR format.
+.RE
+.IP \fBc\fR 5
+The data is turned into \fIcount\fR 8-bit signed integers and stored
+in the corresponding variable as a list. If \fIcount\fR is \fB*\fR,
+then all of the remaining bytes in \fIstring\fR will be scanned. If
+\fIcount\fR is omitted, then one 8-bit integer will be scanned. For
+example,
+.RS
+.CS
+\fBbinary scan\fR \ex07\ex86\ex05 c2c* var1 var2
+.CE
+will return \fB2\fR with \fB7 -122\fR stored in \fIvar1\fR and \fB5\fR
+stored in \fIvar2\fR. Note that the integers returned are signed, but
+they can be converted to unsigned 8-bit quantities using an expression
+like:
+.CS
+set num [expr { $num & 0xff }]
+.CE
+.RE
+.IP \fBs\fR 5
+The data is interpreted as \fIcount\fR 16-bit signed integers
+represented in little-endian byte order. The integers are stored in
+the corresponding variable as a list. If \fIcount\fR is \fB*\fR, then
+all of the remaining bytes in \fIstring\fR will be scanned. If
+\fIcount\fR is omitted, then one 16-bit integer will be scanned. For
+example,
+.RS
+.CS
+\fBbinary scan\fR \ex05\ex00\ex07\ex00\exf0\exff s2s* var1 var2
+.CE
+will return \fB2\fR with \fB5 7\fR stored in \fIvar1\fR and \fB\-16\fR
+stored in \fIvar2\fR. Note that the integers returned are signed, but
+they can be converted to unsigned 16-bit quantities using an expression
+like:
+.CS
+set num [expr { $num & 0xffff }]
+.CE
+.RE
+.IP \fBS\fR 5
+This form is the same as \fBs\fR except that the data is interpreted
+as \fIcount\fR 16-bit signed integers represented in big-endian byte
+order. For example,
+.RS
+.CS
+\fBbinary scan\fR \ex00\ex05\ex00\ex07\exff\exf0 S2S* var1 var2
+.CE
+will return \fB2\fR with \fB5 7\fR stored in \fIvar1\fR and \fB\-16\fR
+stored in \fIvar2\fR.
+.RE
+.IP \fBt\fR 5
+The data is interpreted as \fIcount\fR 16-bit signed integers
+represented in the native byte order of the machine running the Tcl
+script. It is otherwise identical to \fBs\fR and \fBS\fR.
+To determine what the native byte order of the machine is, refer to
+the \fBbyteOrder\fR element of the \fBtcl_platform\fR array.
+.IP \fBi\fR 5
+The data is interpreted as \fIcount\fR 32-bit signed integers
+represented in little-endian byte order. The integers are stored in
+the corresponding variable as a list. If \fIcount\fR is \fB*\fR, then
+all of the remaining bytes in \fIstring\fR will be scanned. If
+\fIcount\fR is omitted, then one 32-bit integer will be scanned. For
+example,
+.RS
+.CS
+set str \ex05\ex00\ex00\ex00\ex07\ex00\ex00\ex00\exf0\exff\exff\exff
+\fBbinary scan\fR $str i2i* var1 var2
+.CE
+will return \fB2\fR with \fB5 7\fR stored in \fIvar1\fR and \fB\-16\fR
+stored in \fIvar2\fR. Note that the integers returned are signed, but
+they can be converted to unsigned 32-bit quantities using an expression
+like:
+.CS
+set num [expr { $num & 0xffffffff }]
+.CE
+.RE
+.IP \fBI\fR 5
+This form is the same as \fBi\fR except that the data is interpreted
+as \fIcount\fR 32-bit signed integers represented in big-endian byte
+order. For example,
+.RS
+.CS
+set str \ex00\ex00\ex00\ex05\ex00\ex00\ex00\ex07\exff\exff\exff\exf0
+\fBbinary scan\fR $str I2I* var1 var2
+.CE
+will return \fB2\fR with \fB5 7\fR stored in \fIvar1\fR and \fB\-16\fR
+stored in \fIvar2\fR.
+.RE
+.IP \fBn\fR 5
+The data is interpreted as \fIcount\fR 32-bit signed integers
+represented in the native byte order of the machine running the Tcl
+script. It is otherwise identical to \fBi\fR and \fBI\fR.
+To determine what the native byte order of the machine is, refer to
+the \fBbyteOrder\fR element of the \fBtcl_platform\fR array.
+.IP \fBw\fR 5
+The data is interpreted as \fIcount\fR 64-bit signed integers
+represented in little-endian byte order. The integers are stored in
+the corresponding variable as a list. If \fIcount\fR is \fB*\fR, then
+all of the remaining bytes in \fIstring\fR will be scanned. If
+\fIcount\fR is omitted, then one 64-bit integer will be scanned. For
+example,
+.RS
+.CS
+set str \ex05\ex00\ex00\ex00\ex07\ex00\ex00\ex00\exf0\exff\exff\exff
+\fBbinary scan\fR $str wi* var1 var2
+.CE
+will return \fB2\fR with \fB30064771077\fR stored in \fIvar1\fR and
+\fB\-16\fR stored in \fIvar2\fR. Note that the integers returned are
+signed and cannot be represented by Tcl as unsigned values.
+.RE
+.IP \fBW\fR 5
+This form is the same as \fBw\fR except that the data is interpreted
+as \fIcount\fR 64-bit signed integers represented in big-endian byte
+order. For example,
+.RS
+.CS
+set str \ex00\ex00\ex00\ex05\ex00\ex00\ex00\ex07\exff\exff\exff\exf0
+\fBbinary scan\fR $str WI* var1 var2
+.CE
+will return \fB2\fR with \fB21474836487\fR stored in \fIvar1\fR and \fB\-16\fR
+stored in \fIvar2\fR.
+.RE
+.IP \fBm\fR 5
+The data is interpreted as \fIcount\fR 64-bit signed integers
+represented in the native byte order of the machine running the Tcl
+script. It is otherwise identical to \fBw\fR and \fBW\fR.
+To determine what the native byte order of the machine is, refer to
+the \fBbyteOrder\fR element of the \fBtcl_platform\fR array.
+.IP \fBf\fR 5
+The data is interpreted as \fIcount\fR single-precision floating point
+numbers in the machine's native representation. The floating point
+numbers are stored in the corresponding variable as a list. If
+\fIcount\fR is \fB*\fR, then all of the remaining bytes in
+\fIstring\fR will be scanned. If \fIcount\fR is omitted, then one
+single-precision floating point number will be scanned. The size of a
+floating point number may vary across architectures, so the number of
+bytes that are scanned may vary. If the data does not represent a
+valid floating point number, the resulting value is undefined and
+compiler dependent. For example, on a Windows system running on an
+Intel Pentium processor,
+.RS
+.CS
+\fBbinary scan\fR \excd\excc\excc\ex3f f var1
+.CE
+will return \fB1\fR with \fB1.6000000238418579\fR stored in
+\fIvar1\fR.
+.RE
+.IP \fBr\fR 5
+This form is the same as \fBf\fR except that the data is interpreted
+as \fIcount\fR single-precision floating point numbers in little-endian
+order. This conversion is not portable to the minority of systems not
+using IEEE floating point representations.
+.IP \fBR\fR 5
+This form is the same as \fBf\fR except that the data is interpreted
+as \fIcount\fR single-precision floating point numbers in big-endian
+order. This conversion is not portable to the minority of systems not
+using IEEE floating point representations.
+.IP \fBd\fR 5
+This form is the same as \fBf\fR except that the data is interpreted
+as \fIcount\fR double-precision floating point numbers in the
+machine's native representation. For example, on a Windows system
+running on an Intel Pentium processor,
+.RS
+.CS
+\fBbinary scan\fR \ex9a\ex99\ex99\ex99\ex99\ex99\exf9\ex3f d var1
+.CE
+will return \fB1\fR with \fB1.6000000000000001\fR
+stored in \fIvar1\fR.
+.RE
+.IP \fBq\fR 5
+This form is the same as \fBd\fR except that the data is interpreted
+as \fIcount\fR double-precision floating point numbers in little-endian
+order. This conversion is not portable to the minority of systems not
+using IEEE floating point representations.
+.IP \fBQ\fR 5
+This form is the same as \fBd\fR except that the data is interpreted
+as \fIcount\fR double-precision floating point numbers in big-endian
+order. This conversion is not portable to the minority of systems not
+using IEEE floating point representations.
+.IP \fBx\fR 5
+Moves the cursor forward \fIcount\fR bytes in \fIstring\fR. If
+\fIcount\fR is \fB*\fR or is larger than the number of bytes after the
+current cursor position, then the cursor is positioned after
+the last byte in \fIstring\fR. If \fIcount\fR is omitted, then the
+cursor is moved forward one byte. Note that this type does not
+consume an argument. For example,
+.RS
+.CS
+\fBbinary scan\fR \ex01\ex02\ex03\ex04 x2H* var1
+.CE
+will return \fB1\fR with \fB0304\fR stored in \fIvar1\fR.
+.RE
+.IP \fBX\fR 5
+Moves the cursor back \fIcount\fR bytes in \fIstring\fR. If
+\fIcount\fR is \fB*\fR or is larger than the current cursor position,
+then the cursor is positioned at location 0 so that the next byte
+scanned will be the first byte in \fIstring\fR. If \fIcount\fR
+is omitted, then the cursor is moved back one byte. Note that this
+type does not consume an argument. For example,
+.RS
+.CS
+\fBbinary scan\fR \ex01\ex02\ex03\ex04 c2XH* var1 var2
+.CE
+will return \fB2\fR with \fB1 2\fR stored in \fIvar1\fR and \fB020304\fR
+stored in \fIvar2\fR.
+.RE
+.IP \fB@\fR 5
+Moves the cursor to the absolute location in the data string specified
+by \fIcount\fR. Note that position 0 refers to the first byte in
+\fIstring\fR. If \fIcount\fR refers to a position beyond the end of
+\fIstring\fR, then the cursor is positioned after the last byte. If
+\fIcount\fR is omitted, then an error will be generated. For example,
+.RS
+.CS
+\fBbinary scan\fR \ex01\ex02\ex03\ex04 c2@1H* var1 var2
+.CE
+will return \fB2\fR with \fB1 2\fR stored in \fIvar1\fR and \fB020304\fR
+stored in \fIvar2\fR.
+.RE
+.SH "PORTABILITY ISSUES"
+.PP
+The \fBr\fR, \fBR\fR, \fBq\fR and \fBQ\fR conversions will only work
+reliably for transferring data between computers that all use
+IEEE floating point representations. This is very common, but not
+universal. To transfer floating-point numbers portably between all
+architectures, use their textual representation (as produced by
+\fBformat\fR) instead.
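The portability advice above can be sketched as follows, assuming Tcl 8.5 or later. The Q form gives a fixed big-endian byte layout that any IEEE machine can read back, while format with %.17g produces text from which an IEEE double can be recovered exactly. The value 0.1 is an arbitrary sample:

```tcl
set x 0.1

# Between IEEE machines: Q fixes the byte order to big-endian, so the
# eight bytes are the same whatever tcl_platform(byteOrder) says.
set bytes [binary format Q $x]
binary scan $bytes H* hex
binary scan $bytes Q fromBinary

# Between arbitrary architectures: transfer text instead.
# 17 significant digits are enough for an IEEE double to round-trip.
set text [format %.17g $x]
set fromText [expr {double($text)}]

puts "$hex $fromBinary $text $fromText"
```

On an IEEE machine the hex string should be 3fb999999999999a regardless of native byte order, and both round trips should yield a value equal to 0.1.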
+.SH EXAMPLES
+.PP
+This is a procedure to write a Tcl string to a binary-encoded channel as
+UTF-8 data preceded by a length word:
+.PP
+.CS
+proc \fIwriteString\fR {channel string} {
+    set data [encoding convertto utf-8 $string]
+    puts -nonewline $channel [\fBbinary format\fR Ia* \e
+            [string length $data] $data]
+}
+.CE
+.PP
+This procedure reads a string from a channel that was written by the
+previously presented \fIwriteString\fR procedure:
+.PP
+.CS
+proc \fIreadString\fR {channel} {
+    if {![\fBbinary scan\fR [read $channel 4] I length]} {
+        error "missing length"
+    }
+    set data [read $channel $length]
+    return [encoding convertfrom utf-8 $data]
+}
+.CE
+.PP
+This converts the contents of a file (named in the variable \fIfilename\fR) to
+base64 and prints them:
+.PP
+.CS
+set f [open $filename rb]
+set data [read $f]
+close $f
+puts [\fBbinary encode\fR base64 \-maxlen 64 $data]
+.CE
+.SH "SEE ALSO"
+encoding(n), format(n), scan(n), string(n), tcl_platform(n)
+.SH KEYWORDS
+binary, format, scan
+'\" Local Variables:
+'\" mode: nroff
+'\" fill-column: 78
+'\" End:
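A self-contained sketch that exercises the writeString and readString procedures from the EXAMPLES section together. It repeats both procedures (passing the channel explicitly to puts) and round-trips a string containing a non-ASCII character through a temporary file. It assumes Tcl 8.6, where file tempfile is available. The sample text is arbitrary:

```tcl
# Length-prefixed UTF-8 strings, as in the EXAMPLES section.
proc writeString {channel string} {
    set data [encoding convertto utf-8 $string]
    # 32-bit big-endian length word (I), then the raw bytes (a*).
    puts -nonewline $channel [binary format Ia* [string length $data] $data]
}

proc readString {channel} {
    if {![binary scan [read $channel 4] I length]} {
        error "missing length"
    }
    set data [read $channel $length]
    return [encoding convertfrom utf-8 $data]
}

# Round-trip through a temporary file; file tempfile returns a
# read-write channel and stores the file's name in the variable path.
set f [file tempfile path]
chan configure $f -translation binary
writeString $f "h\u00e9llo"
flush $f
seek $f 0
set result [readString $f]
close $f
file delete $path

puts [string equal $result "h\u00e9llo"]
```

It should print 1. Because the length word tells readString exactly how many bytes to consume, several strings written back to back can be read in the same order.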