summaryrefslogtreecommitdiffstats
path: root/doc/ByteArrObj.3
blob: fd7f245be1485ad64662894d02624c6a7e387f22 (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
'\"
'\" Copyright (c) 1997 Sun Microsystems, Inc.
'\"
'\" See the file "license.terms" for information on usage and redistribution
'\" of this file, and for a DISCLAIMER OF ALL WARRANTIES.
'\"
.TH Tcl_ByteArrayObj 3 8.1 Tcl "Tcl Library Procedures"
.so man.macros
.BS
.SH NAME
Tcl_NewByteArrayObj, Tcl_SetByteArrayObj, Tcl_GetBytesFromObj, Tcl_GetByteArrayFromObj, Tcl_SetByteArrayLength \- manipulate a Tcl value as an array of bytes
.SH SYNOPSIS
.nf
\fB#include <tcl.h>\fR
.sp
Tcl_Obj *
\fBTcl_NewByteArrayObj\fR(\fIbytes, numBytes\fR)
.sp
void
\fBTcl_SetByteArrayObj\fR(\fIobjPtr, bytes, numBytes\fR)
.sp
.VS TIP568
unsigned char *
\fBTcl_GetBytesFromObj\fR(\fIinterp, objPtr, numBytesPtr\fR)
.VE TIP568
.sp
unsigned char *
\fBTcl_GetByteArrayFromObj\fR(\fIobjPtr, numBytesPtr\fR)
.sp
unsigned char *
\fBTcl_SetByteArrayLength\fR(\fIobjPtr, numBytes\fR)
.SH ARGUMENTS
.AS "const unsigned char" *numBytesPtr in/out
.AP "const unsigned char" *bytes in
The array of bytes used to initialize or set a byte-array value. May be NULL
even if \fInumBytes\fR is non-zero.
.AP int numBytes in
The number of bytes in the array.  It must be >= 0.
.AP Tcl_Obj *objPtr in/out
For \fBTcl_SetByteArrayObj\fR, this points to an unshared value to be
overwritten by a byte-array value.  For \fBTcl_GetBytesFromObj\fR,
\fBTcl_GetByteArrayFromObj\fR and \fBTcl_SetByteArrayLength\fR, this points
to the value from which to extract an array of bytes.
.AP Tcl_Interp *interp in
Interpreter to use for error reporting.
.AP "size_t | int" *numBytesPtr out
Points to space where the number of bytes in the array may be written.
Caller may pass NULL when it does not need this information.
.BE

.SH DESCRIPTION
.PP
These routines are used to create, modify, store, transfer, and retrieve
arbitrary binary data in Tcl values.  Specifically, data that can be
represented as a sequence of arbitrary byte values is supported.
This includes data read from binary channels, values created by the
\fBbinary\fR command, encrypted data, or other information representable as
a finite byte sequence.
.PP
A byte is an 8-bit quantity with no inherent meaning.  When the 8 bits are
interpreted as an integer value, the range of possible values is (0-255).
The C type best suited to store a byte is the \fBunsigned char\fR.
An \fBunsigned char\fR array of size \fIN\fR stores an aribtrary binary
value of size \fIN\fR bytes.  We call this representation a byte-array.
Here we document the routines that allow us to operate on Tcl values as
byte-arrays.
.PP
All Tcl values must correspond to a string representation.
When a byte-array value must be processed as a string, the sequence
of \fIN\fR bytes is transformed into the corresponding sequence
of \fIN\fR characters, where each byte value transforms to the same
character codepoint value in the range (U+0000 - U+00FF).  Obtaining the
string representation of a byte-array value (by calling
\fBTcl_GetStringFromObj\fR) produces this string in Tcl's usual
Modified UTF-8 encoding.
.PP
\fBTcl_NewByteArrayObj\fR and \fBTcl_SetByteArrayObj\fR
create a new value or overwrite an existing unshared value, respectively,
to hold a byte-array value of \fInumBytes\fR bytes.  When a caller
passes a non-NULL value of \fIbytes\fR, it must point to memory from
which \fInumBytes\fR bytes can be read.  These routines
allocate \fInumBytes\fR bytes of memory, copy \fInumBytes\fR
bytes from \fIbytes\fR into it, and keep the result in the internal
representation of the new or overwritten value.
When the caller passes a NULL value of \fIbytes\fR, the data copying
step is skipped, and the bytes stored in the value are undefined.
A \fIbytes\fR value of NULL is useful only when the caller will arrange
to write known contents into the byte-array through a pointer retrieved
by a call to one of the routines explained below.  \fBTcl_NewByteArrayObj\fR
returns a pointer to the created value with a reference count of zero.
\fBTcl_SetByteArrayObj\fR overwrites and invalidates any old contents
of the unshared \fIobjPtr\fR as appropriate, and keeps its reference
count (0 or 1) unchanged.  The value produced by these routines has no
string representation.  Any memory allocation failure may cause a panic.
Note that the type of the \fInumBytes\fR argument is \fBint\fR; consequently
the largest byte-array value that can be produced by these routines is one
holding \fBINT_MAX\fR bytes.  Note also that the string representation of
any Tcl value is limited to \fBINT_MAX\fR bytes, so caution should be
taken with any byte-array of more than \fBINT_MAX / 2\fR bytes.
.PP
\fBTcl_GetBytesFromObj\fR performs the opposite function of
\fBTcl_SetByteArrayObj\fR, providing access to read a byte-array from
a Tcl value that was previously written into it.  When \fIobjPtr\fR
is a value previously produced by \fBTcl_NewByteArrayObj\fR or
\fBTcl_SetByteArrayObj\fR, then \fBTcl_GetBytesFromObj\fR returns
a pointer to the byte-array kept in the value's internal representation.
If the caller provides a non-NULL value for \fInumBytesPtr\fR, it must
point to memory where \fBTcl_GetBytesFromObj\fR can write the number
of bytes in the value's internal byte-array.  With both pieces of
information, the caller is able to retrieve any information about the
contents of that byte-array that it seeks.  When \fIobjPtr\fR does
not already contain an internal byte-array, \fBTcl_GetBytesFromObj\fR
will try to create one from the value's string representation.  Any
string value that does not include any character codepoints outside
the range (U+0000 - U+00FF) will successfully translate to a unique
byte-array value.  With the created byte-array, the routine returns
as before.  For any string representation which does contain
a forbidden character codepoint, the conversion fails, and
\fBTcl_GetBytesFromObj\fR returns NULL to signal that failure.  On
failure, nothing will be written to \fInumBytesPtr\fR, and if
the \fIinterp\fR argument is non-NULL, then error messages and
codes are left in it recording the error.
.PP
\fBTcl_GetByteArrayFromObj\fR performs nearly the same function as
\fBTcl_GetBytesFromObj\fR.  They differ only in the circumstance when
a byte-array internal value must be created by transformation of
a string representation, and that string representation contains a
character with codepoint greater than U+00FF.  Instead of failing
the conversion, \fBTcl_GetByteArrayFromObj\fR will use the 8 least
significant bits of each codepoint to produce a valid byte value
from any character codepoint value.  In any other circumstance,
\fBTcl_GetByteArrayFromObj\fR performs just as \fBTcl_GetBytesFromObj\fR
does.  Since the conversion cannot fail, \fBTcl_GetByteArrayFromObj\fR
has no need for an \fIinterp\fR argument to record any errors and
the caller can assume \fBTcl_GetByteArrayFromObj\fR does not return NULL.
.PP
\fBTcl_GetByteArrayFromObj\fR must be used with caution.  Because of the
truncation on conversion, the byte-array made available to the caller
cannot reliably complete a round-trip back to the original string
representation.  This creates opportunities for bugs due to blindness
to differences in values.  This routine exists in this form primarily
for compatibility with codebases written for earlier releases of Tcl.
It is expected this routine will incompatibly change in Tcl 9 so that
it also signals failed conversions with a NULL return.
.PP
On success, both \fBTcl_GetBytesFromObj\fR and \fBTcl_GetByteArrayFromObj\fR
return a pointer into the internal representation of a \fBTcl_Obj\fR.
That pointer must not be freed by the caller, and should not be retained
for use beyond the known time the internal representation of the value
has not been disturbed.  The pointer may be used to overwrite the byte
contents of the internal representation, so long as the value is unshared
and any string representation is invalidated.
.PP
On success, both \fBTcl_GetBytesFromObj\fR and \fBTcl_GetByteArrayFromObj\fR
write the number of bytes in the byte-array value of \fIobjPtr\fR
to the space pointed to by \fInumBytesPtr\fR.  This space may be of type
\fBsize_t\fR or of type \fBint\fR.  In Tcl 8, the largest number of
bytes possible is \fBINT_MAX\fR, so either type can receive the value.
In codebases meant to migrate to Tcl 9, the option to write to a space
of type \fBsize_t\fR may aid in the migration.
.PP
\fBTcl_SetByteArrayLength\fR enables a caller to change the size of a
byte-array in the internal representation of an unshared \fIobjPtr\fR to
become \fInumBytes\fR bytes.  This is most often useful after the
bytes of the internal byte-array have been directly overwritten and it
has been discovered that the required size differs from the first
estimate used in the allocation. \fBTcl_SetByteArrayLength\fR returns
a pointer to the resized byte-array.  Because resizing the byte-array
changes the internal representation, \fBTcl_SetByteArrayLength\fR
also invalidates any string representation in \fIobjPtr\fR.  If resizing
grows the byte-array, the new byte values are undefined.  If \fIobjPtr\fR
does not already possess an internal byte-array, one is produced in the
same way that \fBTcl_GetByteArrayFromObj\fR does, with all the cautions
that go along with that.
.SH "REFERENCE COUNT MANAGEMENT"
.PP
\fBTcl_NewByteArrayObj\fR always returns a zero-reference object, much
like \fBTcl_NewObj\fR.
.PP
\fBTcl_SetByteArrayObj\fR and \fBTcl_SetByteArrayLength\fR do not modify the
reference count of their \fIobjPtr\fR arguments, but do require that the
object be unshared.
.PP
\fBTcl_GetBytesFromObj\fR and \fBTcl_GetByteArrayFromObj\fR do not modify
the reference count of \fIobjPtr\fR; they only read.

.SH "SEE ALSO"
Tcl_GetStringFromObj, Tcl_NewObj, Tcl_IncrRefCount, Tcl_DecrRefCount

.SH KEYWORDS
value, binary data, byte array, utf, unicode