summaryrefslogtreecommitdiffstats
path: root/doc
diff options
context:
space:
mode:
authordgp <dgp@users.sourceforge.net>2021-11-07 04:25:39 (GMT)
committerdgp <dgp@users.sourceforge.net>2021-11-07 04:25:39 (GMT)
commit084b76b92a5ec029401f05a4138c1559602c23e7 (patch)
treef8029c86dca73799a108275dc5a45f17b571744c /doc
parent08ffe0305990c426e94a6cee8ba3739d8a8c2ab6 (diff)
downloadtcl-084b76b92a5ec029401f05a4138c1559602c23e7.zip
tcl-084b76b92a5ec029401f05a4138c1559602c23e7.tar.gz
tcl-084b76b92a5ec029401f05a4138c1559602c23e7.tar.bz2
Updates to TIP 568 implementation for Tcl 9:
Use the argument numBytes instead of length. Update the docs to the TIP spec. Clarify and simplify the code.
Diffstat (limited to 'doc')
-rw-r--r--doc/ByteArrObj.3176
-rw-r--r--doc/binary.n11
2 files changed, 124 insertions, 63 deletions
diff --git a/doc/ByteArrObj.3 b/doc/ByteArrObj.3
index 053401a..7fdaea6 100644
--- a/doc/ByteArrObj.3
+++ b/doc/ByteArrObj.3
@@ -4,87 +4,147 @@
'\" See the file "license.terms" for information on usage and redistribution
'\" of this file, and for a DISCLAIMER OF ALL WARRANTIES.
'\"
-.TH Tcl_ByteArrayObj 3 8.1 Tcl "Tcl Library Procedures"
+.TH Tcl_ByteArrayObj 3 9.0 Tcl "Tcl Library Procedures"
.so man.macros
.BS
.SH NAME
-Tcl_NewByteArrayObj, Tcl_SetByteArrayObj, Tcl_GetByteArrayFromObj, Tcl_SetByteArrayLength \- manipulate Tcl values as a arrays of bytes
+Tcl_NewByteArrayObj, Tcl_SetByteArrayObj, Tcl_GetBytesFromObj, Tcl_GetByteArrayFromObj, Tcl_SetByteArrayLength \- manipulate a Tcl value as an array of bytes
.SH SYNOPSIS
.nf
\fB#include <tcl.h>\fR
.sp
Tcl_Obj *
-\fBTcl_NewByteArrayObj\fR(\fIbytes, length\fR)
+\fBTcl_NewByteArrayObj\fR(\fIbytes, numBytes\fR)
.sp
void
-\fBTcl_SetByteArrayObj\fR(\fIobjPtr, bytes, length\fR)
+\fBTcl_SetByteArrayObj\fR(\fIobjPtr, bytes, numBytes\fR)
.sp
+.VS TIP568
unsigned char *
-\fBTcl_GetByteArrayFromObj\fR(\fIobjPtr, lengthPtr\fR)
+\fBTcl_GetBytesFromObj\fR(\fIinterp, objPtr, numBytesPtr\fR)
+.VE TIP568
.sp
unsigned char *
-\fBTcl_SetByteArrayLength\fR(\fIobjPtr, length\fR)
+\fBTcl_GetByteArrayFromObj\fR(\fIobjPtr, numBytesPtr\fR)
+.sp
+unsigned char *
+\fBTcl_SetByteArrayLength\fR(\fIobjPtr, numBytes\fR)
.SH ARGUMENTS
-.AS "const unsigned char" *lengthPtr in/out
+.AS "const unsigned char" *numBytesPtr in/out
.AP "const unsigned char" *bytes in
The array of bytes used to initialize or set a byte-array value. May be NULL
-even if \fIlength\fR is non-zero.
-.AP size_t length in
-The length of the array of bytes.
+even if \fInumBytes\fR is non-zero.
+.AP size_t numBytes in
+The number of bytes in the array.
.AP Tcl_Obj *objPtr in/out
-For \fBTcl_SetByteArrayObj\fR, this points to the value to be converted to
-byte-array type. For \fBTcl_GetByteArrayFromObj\fR and
-\fBTcl_SetByteArrayLength\fR, this points to the value from which to get
-the byte-array value; if \fIobjPtr\fR does not already point to a byte-array
-value, it will be converted to one.
-.AP size_t | int *lengthPtr out
-Filled with the length of the array of bytes in the value.
-May be (int *)NULL when not used.
+For \fBTcl_SetByteArrayObj\fR, this points to an unshared value to be
+overwritten by a byte-array value. For \fBTcl_GetBytesFromObj\fR,
+\fBTcl_GetByteArrayFromObj\fR and \fBTcl_SetByteArrayLength\fR, this points
+to the value from which to extract an array of bytes.
+.AP Tcl_Interp *interp in
+Interpreter to use for error reporting.
+.AP "size_t | int" *numBytesPtr out
+Points to space where the number of bytes in the array may be written.
+Caller may pass NULL when it does not need this information.
.BE
.SH DESCRIPTION
.PP
-These procedures are used to create, modify, and read Tcl byte-array values
-from C code. Byte-array values are typically used to hold the
-results of binary IO operations or data structures created with the
-\fBbinary\fR command. In Tcl, an array of bytes is not equivalent to a
-string. Conceptually, a string is an array of Unicode characters, while a
-byte-array is an array of 8-bit quantities with no implicit meaning.
-Accessor functions are provided to get the string representation of a
-byte-array or to convert an arbitrary value to a byte-array. Obtaining the
+These routines are used to create, modify, store, transfer, and retrieve
+arbitrary binary data in Tcl values. Specifically, data that can be
+represented as a sequence of arbitrary byte values is supported.
+This includes data read from binary channels, values created by the
+\fBbinary\fR command, encrypted data, or other information representable as
+a finite byte sequence.
+.PP
+A byte is an 8-bit quantity with no inherent meaning. When the 8 bits are
+interpreted as an integer value, the range of possible values is (0-255).
+The C type best suited to store a byte is the \fBunsigned char\fR.
+An \fBunsigned char\fR array of size \fIN\fR stores an aribtrary binary
+value of size \fIN\fR bytes. We call this representation a byte-array.
+Here we document the routines that allow us to operate on Tcl values as
+byte-arrays.
+.PP
+All Tcl values must correspond to a string representation.
+When a byte-array value must be processed as a string, the sequence
+of \fIN\fR bytes is transformed into the corresponding sequence
+of \fIN\fR characters, where each byte value transforms to the same
+character codepoint value in the range (U+0000 - U+00FF). Obtaining the
string representation of a byte-array value (by calling
-\fBTcl_GetStringFromObj\fR) produces a properly formed UTF-8 sequence with a
-one-to-one mapping between the bytes in the internal representation and the
-UTF-8 characters in the string representation.
+\fBTcl_GetStringFromObj\fR) produces this string in Tcl's usual
+Modified UTF-8 encoding.
.PP
-\fBTcl_NewByteArrayObj\fR and \fBTcl_SetByteArrayObj\fR will
-create a new value of byte-array type or modify an existing value to have a
-byte-array type. Both of these procedures set the value's type to be
-byte-array and set the value's internal representation to a copy of the
-array of bytes given by \fIbytes\fR. \fBTcl_NewByteArrayObj\fR returns a
-pointer to a newly allocated value with a reference count of zero.
-\fBTcl_SetByteArrayObj\fR invalidates any old string representation and, if
-the value is not already a byte-array value, frees any old internal
-representation. If \fIbytes\fR is NULL then the new byte array contains
-arbitrary values.
+\fBTcl_NewByteArrayObj\fR and \fBTcl_SetByteArrayObj\fR
+create a new value or overwrite an existing unshared value, respectively,
+to hold a byte-array value of \fInumBytes\fR bytes. When a caller
+passes a non-NULL value of \fIbytes\fR, it must point to memory from
+which \fInumBytes\fR bytes can be read. These routines
+allocate \fInumBytes\fR bytes of memory, copy \fInumBytes\fR
+bytes from \fIbytes\fR into it, and keep the result in the internal
+representation of the new or overwritten value.
+When the caller passes a NULL value of \fIbytes\fR, the data copying
+step is skipped, and the bytes stored in the value are undefined.
+A \fIbytes\fR value of NULL is useful only when the caller will arrange
+to write known contents into the byte-array through a pointer retrieved
+by a call to one of the routines explained below. \fBTcl_NewByteArrayObj\fR
+returns a pointer to the created value with a reference count of zero.
+\fBTcl_SetByteArrayObj\fR overwrites and invalidates any old contents
+of the unshared \fIobjPtr\fR as appropriate, and keeps its reference
+count (0 or 1) unchanged. The value produced by these routines has no
+string representation. Any memory allocation failure may cause a panic.
.PP
-\fBTcl_GetByteArrayFromObj\fR converts a Tcl value to byte-array type and
-returns a pointer to the value's new internal representation as an array of
-bytes. The length of this array is stored in \fIlengthPtr\fR if
-\fIlengthPtr\fR is non-NULL. The storage for the array of bytes is owned by
-the value and should not be freed. The contents of the array may be
-modified by the caller only if the value is not shared and the caller
-invalidates the string representation.
+\fBTcl_GetBytesFromObj\fR performs the opposite function of
+\fBTcl_SetByteArrayObj\fR, providing access to read a byte-array from
+a Tcl value that was previously written into it. When \fIobjPtr\fR
+is a value previously produced by \fBTcl_NewByteArrayObj\fR or
+\fBTcl_SetByteArrayObj\fR, then \fBTcl_GetBytesFromObj\fR returns
+a pointer to the byte-array kept in the value's internal representation.
+If the caller provides a non-NULL value for \fInumBytesPtr\fR, it must
+point to memory where \fBTcl_GetBytesFromObj\fR can write the number
+of bytes in the value's internal byte-array. With both pieces of
+information, the caller is able to retrieve any information about the
+contents of that byte-array that it seeks. When \fIobjPtr\fR does
+not already contain an internal byte-array, \fBTcl_GetBytesFromObj\fR
+will try to create one from the value's string representation. Any
+string value that does not include any character codepoints outside
+the range (U+0000 - U+00FF) will successfully translate to a unique
+byte-array value. With the created byte-array, the routine returns
+as before. For any string representation which does contain
+a forbidden character codepoint, the conversion fails, and
+\fBTcl_GetBytesFromObj\fR returns NULL to signal that failure. On
+failure, nothing will be written to \fInumBytesPtr\fR, and if
+the \fIinterp\fR argument is non-NULL, then error messages and
+codes are left in it recording the error.
.PP
-\fBTcl_SetByteArrayLength\fR converts the Tcl value to byte-array type
-and changes the length of the value's internal representation as an
-array of bytes. If \fIlength\fR is greater than the space currently
-allocated for the array, the array is reallocated to the new length; the
-newly allocated bytes at the end of the array have arbitrary values. If
-\fIlength\fR is less than the space currently allocated for the array,
-the length of array is reduced to the new length. The return value is a
-pointer to the value's new array of bytes.
-
+\fBTcl_GetByteArrayFromObj\fR performs exactly the same function as
+\fBTcl_GetBytesFromObj\fR does when called with the \fIinterp\fR
+argument passed the value NULL. This is incompatible with the
+way \fBTcl_GetByteArrayFromObj\fR functioned in Tcl 8.
+\fBTcl_GetBytesFromObj\fR is the more capable interface and should
+usually be favored for use over \fBTcl_GetByteArrayFromObj\fR.
+.PP
+On success, both \fBTcl_GetByteFromObj\fR and \fBTcl_GetByteArrayFromObj\fR
+return a pointer into the internal representation of a \fBTcl_Obj\fR.
+That pointer must not be freed by the caller, and should not be retained
+for use beyond the known time the internal representation of the value
+has not been disturbed. The pointer may be used to overwrite the byte
+contents of the internal representation, so long as the value is unshared
+and any string representation is invalidated.
+.PP
+\fBTcl_SetByteArrayLength\fR enables a caller to change the size of a
+byte-array in the internal representation of an unshared \fIobjPtr\fR to
+become \fInumBytes\fR bytes. This is most often useful after the
+bytes of the internal byte-array have been directly overwritten and it
+has been discovered that the required size differs from the first
+estimate used in the allocation. \fBTcl_SetByteArrayLength\fR returns
+a pointer to the resized byte-array. Because resizing the byte-array
+changes the internal representation, \fBTcl_SetByteArrayLength\fR
+also invalidates any string representation in \fIobjPtr\fR. If resizing
+grows the byte-array, the new byte values are undefined. If \fIobjPtr\fR
+does not already possess an internal byte-array, one is produced in the
+same way that \fBTcl_GetBytesFromObj\fR does, also returning NULL
+when any characters of the value in \fIobjPtr\fR (up to
+\fInumBytes\fR of them) are not valid bytes.
.SH "REFERENCE COUNT MANAGEMENT"
.PP
\fBTcl_NewByteArrayObj\fR always returns a zero-reference object, much
@@ -94,11 +154,11 @@ like \fBTcl_NewObj\fR.
reference count of their \fIobjPtr\fR arguments, but do require that the
object be unshared.
.PP
-\fBTcl_GetByteArrayFromObj\fR does not modify the reference count of its
-\fIobjPtr\fR argument; it only reads.
+\fBTcl_GetBytesFromObj\fR and \fBTcl_GetByteArrayFromObj\fR do not modify
+the reference count of \fIobjPtr\fR; they only read.
.SH "SEE ALSO"
Tcl_GetStringFromObj, Tcl_NewObj, Tcl_IncrRefCount, Tcl_DecrRefCount
.SH KEYWORDS
-value, binary data, byte array, utf, unicode, internationalization
+value, binary data, byte array, utf, unicode
diff --git a/doc/binary.n b/doc/binary.n
index 9ab694e..1a62ad6 100644
--- a/doc/binary.n
+++ b/doc/binary.n
@@ -44,8 +44,9 @@ the range \eu0000\-\eu00FF.
When encoding binary data as a readable string, the starting binary data is
passed to the \fBbinary encode\fR command, together with the name of the
encoding to use and any encoding-specific options desired. Data which has been
-encoded can be converted back to binary form using \fBbinary decode\fR. The
-following formats and options are supported.
+encoded can be converted back to binary form using \fBbinary decode\fR.
+The \fBbinary encode\fR command raises an error if the \fIdata\fR argument
+is not binary data. The following formats and options are supported.
.TP
\fBbase64\fR
.
@@ -607,9 +608,9 @@ will return
.PP
The \fBbinary scan\fR command parses fields from a binary string,
returning the number of conversions performed. \fIString\fR gives the
-input bytes to be parsed (one byte per character, and characters not
-representable as a byte have their high bits chopped)
-and \fIformatString\fR indicates how to parse it.
+input bytes to be parsed and \fIformatString\fR indicates how to parse it.
+An error is raised if \fIstring\fR is anything other than a valid binary
+data value.
Each \fIvarName\fR gives the name of a variable; when a field is
scanned from \fIstring\fR the result is assigned to the corresponding
variable.