summaryrefslogtreecommitdiffstats
path: root/doc/ByteArrObj.3
diff options
context:
space:
mode:
authordgp <dgp@users.sourceforge.net>2021-11-05 19:47:03 (GMT)
committerdgp <dgp@users.sourceforge.net>2021-11-05 19:47:03 (GMT)
commitcf9ec4f29ada714b10771db9437f510c4f0a4c94 (patch)
treee886fd3f7bf3d55633a79ee36bbdfe71ed420c81 /doc/ByteArrObj.3
parent1a396e5b24dc1eb61a3fde1e602cf95ab4c1e659 (diff)
downloadtcl-cf9ec4f29ada714b10771db9437f510c4f0a4c94.zip
tcl-cf9ec4f29ada714b10771db9437f510c4f0a4c94.tar.gz
tcl-cf9ec4f29ada714b10771db9437f510c4f0a4c94.tar.bz2
Adapt documentation of the *ByteArray* routines to better match Tcl
library functioning post-TIP 568.
Diffstat (limited to 'doc/ByteArrObj.3')
-rw-r--r--doc/ByteArrObj.3148
1 files changed, 105 insertions, 43 deletions
diff --git a/doc/ByteArrObj.3 b/doc/ByteArrObj.3
index 0703164..13aa012 100644
--- a/doc/ByteArrObj.3
+++ b/doc/ByteArrObj.3
@@ -50,57 +50,119 @@ Caller may pass NULL when it does not need this information.
.SH DESCRIPTION
.PP
-These procedures are used to create, modify, and read Tcl byte-array values
-from C code. Byte-array values are typically used to hold the
-results of binary IO operations, data structures created with the
-\fBbinary\fR command, or other information, such as encrypted data,
-represented as arbitrary binary data.
-A byte-array is an array of 8-bit quantities (the integer range 0 - 255)
-with no inherent meaning. When a byte-array value must be processed as
-a string, the sequence of \fBN\fR bytes is transformed into the corresponding
-sequence of \fBN\fR characters, where each byte value transforms to the same
+These routines are used to create, modify, store, transfer, and retrieve
+arbitrary binary data in Tcl values. Specifically, data that can be
+represented as a sequence of arbitrary byte values is supported.
+This includes data read from binary channels, values created by the
+\fBbinary\fR command, encrypted data, or other information representable as
+a finite byte sequence.
+.PP
+A byte is an 8-bit quantity with no inherent meaning. When the 8 bits are
+interpreted as an integer value, the range of possible values is (0-255).
+The C type best suited to store a byte is the \fBunsigned char\fR.
+An \fBunsigned char\fR array of size \fIN\fR stores an aribtrary binary
+value of size \fIN\fR bytes. We call this representation a byte-array.
+Here we document the routines that allow us to operate on Tcl values as
+byte-arrays.
+.PP
+All Tcl values must correspond to a string representation.
+When a byte-array value must be processed as a string, the sequence
+of \fIN\fR bytes is transformed into the corresponding sequence
+of \fIN\fR characters, where each byte value transforms to the same
character codepoint value in the range (U+0000 - U+00FF). Obtaining the
string representation of a byte-array value (by calling
-\fBTcl_GetStringFromObj\fR) produces this string in Tcl's usual
+\fBTcl_GetStringFromObj\fR) produces this string in Tcl's usual
Modified UTF-8 encoding.
.PP
\fBTcl_NewByteArrayObj\fR and \fBTcl_SetByteArrayObj\fR
create a new value or overwrite an existing unshared value, respectively,
-to hold a byte-array value of \fInumBytes\fR bytes. \fBTcl_NewByteArrayObj\fR
+to hold a byte-array value of \fInumBytes\fR bytes. When a caller
+passes a non-NULL value of \fIbytes\fR, it must point to memory from
+which \fInumBytes\fR bytes can be read. These routines
+allocate \fInumBytes\fR bytes of memory, copy \fInumBytes\fR
+bytes from \fIbytes\fR into it, and keep the result in the internal
+representation of the new or overwritten value.
+When the caller passes a NULL value of \fIbytes\fR, the data copying
+step is skipped, and the bytes stored in the value are undefined.
+A \fIbytes\fR value of NULL is useful only when the caller will arrange
+to write known contents into the byte-array through a pointer retrieved
+by a call to one of the routines explained below. \fBTcl_NewByteArrayObj\fR
returns a pointer to the created value with a reference count of zero.
\fBTcl_SetByteArrayObj\fR overwrites and invalidates any old contents
-as appropriate, and keeps the same reference count (0 or 1). When
-the \fIbytes\fR argument passed to either routine is not NULL, \fInumBytes\fR
-bytes are copied from \fIbytes\fR into the new value. When
-the \fIbytes\fR argument passed to either routine is NULL, the
-contents of the resulting byte array value are undefined. A \fIbytes\fR
-value of NULL is useful only when the caller will arrange to write
-known contents into the byte array through a pointer retrieved by a call
-to one of the routines explained below. Such manipulation must be performed
-only on unshared values, and accompanied by all appropriate invalidations.
+of the unshared \fIobjPtr\fR as appropriate, and keeps its reference
+count (0 or 1) unchanged. The value produced by these routines has no
+string representation. Any memory allocation failure may cause a panic.
+Note that the type of the \fInumBytes\fR argument is \fBint\fR; consequently
+the largest byte-array value that can be produced by these routines is one
+holding \fBINT_MAX\fR bytes. Note also that the string representation of
+any Tcl value is limited to \fBINT_MAX\fR bytes, so caution should be
+taken with any byte-array of more than \fBINT_MAX / 2\fR bytes.
.PP
-\fBTcl_GetByteArrayFromObj\fR converts a Tcl value to byte-array type and
-returns a pointer to the value's new internal representation as an array of
-bytes. The number of bytes in this array is stored in \fInumBytesPtr\fR if
-\fInumBytesPtr\fR is non-NULL. The storage for the array of bytes is owned by
-the value and should not be freed. The contents of the array may be
-modified by the caller only if the value is not shared and the caller
-invalidates the string representation.
+\fBTcl_GetBytesFromObj\fR performs the opposite function of
+\fBTcl_SetByteArrayObj\fR, providing access to read a byte-array from
+a Tcl value that was previously written into it. When \fIobjPtr\fR
+is a value previously produced by \fBTcl_NewByteArrayObj\fR or
+\fBTcl_SetByteArrayObj\fR, then \fBTcl_GetBytesFromObj\fR returns
+a pointer to the byte-array kept in the value's internal representation.
+If the caller provides a non-NULL value for \fInumBytesPtr\fR, it must
+point to memory where \fBTcl_GetBytesFromObj\fR can write the number
+of bytes in the value's internal byte-array. With both pieces of
+information, the caller is able to retrieve any information about the
+contents of that byte-array that it seeks. When \fIobjPtr\fR does
+not already contain an internal byte-array, \fBTcl_GetBytesFromObj\fR
+will try to create one from the value's string representation. Any
+string value that does not include any character codepoints outside
+the range (U+0000 - U+00FF) will successfully translate to a unique
+byte-array value. With the created byte-array, the routine returns
+as before. For any string representation which does contain
+a forbidden character codepoint, the conversion fails, and
+\fBTcl_GetBytesFromObj\fR returns NULL to signal that failure. On
+failure, nothing will be written to \fInumBytesPtr\fR, and if
+the \fIinterp\fR argument is non-NULL, then error messages and
+codes are left in it recording the error.
.PP
-\fBTcl_GetBytesFromObj\fR does almost the same as \fBTcl_GetByteArrayFromObj\fR,
-the difference is that this function can error if the object contains
-characters > 255. If \fBinterp\fR is not NULL, an error-message will be left there.
+\fBTcl_GetByteArrayFromObj\fR performs nearly the same function as
+\fBTcl_GetBytesFromObj\fR. They differ only in the circumstance when
+a byte-array internal value must be created by transformation of
+a string representation, and that string representation contains a
+character with codepoint greater than U+00FF. Instead of failing
+the conversion, \fBTcl_GetByteArrayFromObj\fR will use the 8 least
+significant bits of each codepoint to produce a valid byte value
+from any character codepoint value. In any other circumstance,
+\fBTcl_GetByteArrayFromObj\fR performs just as \fBTcl_GetBytesFromObj\fR
+does. Since the conversion cannot fail, \fBTcl_GetByteArrayFromObj\fR
+has no need for an \fIinterp\fR argument to record any errors and
+the caller can assume \fBTcl_GetByteArrayFromObj\fR does not return NULL.
.PP
-\fBTcl_SetByteArrayLength\fR converts the Tcl value to byte-array type
-and changes the number of bytes in the value's internal representation as an
-array of bytes. If \fInumBytes\fR is greater than the space currently
-allocated for the array, the array is reallocated be large enough to store
-the larger number of bytes; the newly allocated bytes at the end of the
-array have arbitrary values. If
-\fInumBytes\fR is less than the space currently allocated for the array,
-the length of array is reduced to the new length. The return value is a
-pointer to the value's new array of bytes.
-
+\fBTcl_GetByteArrayFromObj\fR must be used with caution. Because of the
+truncation on conversion, the byte-array made available to the caller
+cannot reliably complete a round-trip back to the original string
+representation. This creates opportunities for bugs due to blindness
+to differences in values. This routine exists in this form primarily
+for compatibility with codebases written for earlier releases of Tcl.
+It is expected this routine will incompatibly change in Tcl 9 so that
+it also signals failed conversions with a NULL return.
+.PP
+On success, both \fBTcl_GetByteFromObj\fR and \fBTcl_GetByteArrayFromObj\fR
+return a pointer into the internal representation of a \fBTcl_Obj\fR.
+That pointer must not be freed by the caller, and should not be retained
+for use beyond the known time the internal representation of the value
+has not been disturbed. The pointer may be used to overwrite the byte
+contents of the internal representation, so long as the value is unshared
+and any string representation is invalidated.
+.PP
+\fBTcl_SetByteArrayLength\fR enables a caller to change the size of a
+byte-array in the internal representation of an unshared \fIobjPtr\fR to
+become \fInumBytes\fR bytes. This is most often useful after the
+bytes of the internal byte-array have been directly overwritten and it
+has been discovered that the required size differs from the first
+estimate used in the allocation. \fBTcl_SetByteArrayLength\fR returns
+a pointer to the resized byte-array. Along with such resizing, any
+string representation of \fIobjPtr\fR must be invalidated. If resizing
+grows the byte-array, the new byte values are undefined. If \fIobjPtr\fR
+does not already possess an internal byte-array, one is produced in the
+same way that \fBTcl_GetByteArrayFromObj\fR does, with all the cautions
+that go along with that.
.SH "REFERENCE COUNT MANAGEMENT"
.PP
\fBTcl_NewByteArrayObj\fR always returns a zero-reference object, much
@@ -110,11 +172,11 @@ like \fBTcl_NewObj\fR.
reference count of their \fIobjPtr\fR arguments, but do require that the
object be unshared.
.PP
-\fBTcl_GetByteArrayFromObj\fR does not modify the reference count of its
-\fIobjPtr\fR argument; it only reads.
+\fBTcl_GetBytesFromObj\fR and \fBTcl_GetByteArrayFromObj\fR do not modify
+the reference count of \fIobjPtr\fR; they only read.
.SH "SEE ALSO"
Tcl_GetStringFromObj, Tcl_NewObj, Tcl_IncrRefCount, Tcl_DecrRefCount
.SH KEYWORDS
-value, binary data, byte array, utf, unicode, internationalization
+value, binary data, byte array, utf, unicode