summaryrefslogtreecommitdiffstats
path: root/doc/string.n
blob: f06c3b6a2f2381268867ca8571cb8b81fb58b99f (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
'\"
'\" Copyright (c) 1993 The Regents of the University of California.
'\" Copyright (c) 1994-1996 Sun Microsystems, Inc.
'\"
'\" See the file "license.terms" for information on usage and redistribution
'\" of this file, and for a DISCLAIMER OF ALL WARRANTIES.
'\" 
'\" RCS: @(#) $Id: string.n,v 1.12 1999/08/09 16:30:35 hobbs Exp $
'\" 
.so man.macros
.TH string n 8.1 Tcl "Tcl Built-In Commands"
.BS
'\" Note:  do not modify the .SH NAME line immediately below!
.SH NAME
string \- Manipulate strings
.SH SYNOPSIS
\fBstring \fIoption arg \fR?\fIarg ...?\fR
.BE

.SH DESCRIPTION
.PP
Performs one of several string operations, depending on \fIoption\fR.
The legal \fIoption\fRs (which may be abbreviated) are:
.VS 8.1
.TP
\fBstring bytelength \fIstring\fR
Returns a decimal string giving the number of bytes used to represent
\fIstring\fR in memory.  Because UTF\-8 uses one to three bytes to
represent Unicode characters, the byte length will not be the same as
the character length in general.  The cases where a script cares about
the byte length are rare.  In almost all cases, you should use the
\fBstring length\fR operation.  Refer to the \fBTcl_NumUtfChars\fR
manual entry for more details on the UTF\-8 representation.
.TP
\fBstring compare ?\fB\-nocase\fR? ?\fB\-length int\fR? \fIstring1 string2\fR
.VE 8.1
Perform a character-by-character comparison of strings \fIstring1\fR and
\fIstring2\fR.  Returns
\-1, 0, or 1, depending on whether \fIstring1\fR is lexicographically
less than, equal to, or greater than \fIstring2\fR.
.VS 8.1
If \fB\-length\fR is specified, then only the first \fIlength\fR characters
are used in the comparison.  If \fB\-length\fR is negative, it is
ignored.  If \fB\-nocase\fR is specified, then the strings are
compared in a case-insensitive manner.
.TP
\fBstring equal ?\fB\-nocase\fR? ?\fB-length int\fR? \fIstring1 string2\fR
Perform a character-by-character comparison of strings
\fIstring1\fR and \fIstring2\fR.  Returns 1 if \fIstring1\fR and
\fIstring2\fR are identical, or 0 when not.  If \fB\-length\fR is
specified, then only the first \fIlength\fR characters are used in the
comparison.  If \fB\-length\fR is negative, it is ignored.  If
\fB\-nocase\fR is specified, then the strings are compared in a
case-insensitive manner.
.TP
\fBstring first \fIstring1 string2\fR ?\fIstartIndex\fR?
.VE 8.1
Search \fIstring2\fR for a sequence of characters that exactly match
the characters in \fIstring1\fR.  If found, return the index of the
first character in the first such match within \fIstring2\fR.  If not
found, return \-1.
.VS 8.1
If \fIstartIndex\fR is specified (in any of the forms accepted by the
\fBindex\fR method), then the search is constrained to start with the
character in \fIstring2\fR specified by the index.  For example,
.RS
.CS
\fBstring first a 0a23456789abcdef 5\fR
.CE
will return \fB10\fR, but
.CS
\fBstring first a 0123456789abcdef 11\fR
.CE
will return \fB\-1\fR.
.RE
.VE 8.1
.TP
\fBstring index \fIstring charIndex\fR
Returns the \fIcharIndex\fR'th character of the \fIstring\fR
argument.  A \fIcharIndex\fR of 0 corresponds to the first
character of the string.  
.VS 8.1
\fIcharIndex\fR may be specified as
follows:
.RS
.IP \fIinteger\fR 10
The char specified at this integral index
.IP \fBend\fR 10
The last char of the string.
.IP \fBend\-\fIinteger\fR 10
The last char of the string minus the specified integer
offset (e.g. \fBend\-1\fR would refer to the "c" in "abcd").
.PP
.VE 8.1
If \fIcharIndex\fR is less than 0 or greater than
or equal to the length of the string then an empty string is
returned.
.VS 8.1
.RE
.TP
\fBstring is \fIclass\fR ?\fB\-strict\fR? ?\fB\-failindex \fIvarname\fR? \fIstring\fR
Returns 1 if \fIstring\fR is a valid member of the specified character
class, otherwise returns 0.  If \fB\-strict\fR is specified, then an
empty string returns 0, otherwise and empty string will return 1 on
any class.  If \fB\-failindex\fR is specified, then if the function
returns 0, the index in the string where the class was no longer valid
will be stored in the variable named \fIvarname\fR.  The \fIvarname\fR
will not be set if the function returns 1.  The following character classes
are recognized (the class name can be abbreviated):
.RS
.IP \fBalnum\fR 10
Any Unicode alphabet or digit character.
.IP \fBalpha\fR 10
Any Unicode alphabet character.
.IP \fBascii\fR 10
Any character with a value less than \\u0080 (those that
are in the 7\-bit ascii range).
.IP \fBboolean\fR 10
Any of the forms allowed to \fBTcl_GetBoolean\fR.
.IP \fBcontrol\fR 10
Any Unicode control character.
.IP \fBdigit\fR 10
Any Unicode digit character.  Note that this includes characters
outside of the [0\-9] range.
.IP \fBdouble\fR 10
Any of the valid forms for a double in Tcl, with optional surrounding
whitespace.  In case of under/overflow in the value, 0 is returned
and the \fIvarname\fR will contain \-1.
.IP \fBfalse\fR 10
Any of the forms allowed to \fBTcl_GetBoolean\fR where the value is false.
.IP \fBgraph\fR 10
Any Unicode printing character, except space.
.IP \fBinteger\fR 10
Any of the valid forms for an integer in Tcl, with optional surrounding
whitespace.  In case of under/overflow in the value, 0 is returned
and the \fIvarname\fR will contain \-1.
.IP \fBlower\fR 10
Any Unicode lower case alphabet character.
.IP \fBprint\fR 10
Any Unicode printing character, including space.
.IP \fBpunct\fR 10
Any Unicode punctuation character.
.IP \fBspace\fR 10
Any Unicode space character.
.IP \fBtrue\fR 10
Any of the forms allowed to \fBTcl_GetBoolean\fR where the value is true.
.IP \fBupper\fR 10
Any upper case alphabet character in the Unicode character set.
.IP \fBwordchar\fR 10
Any Unicode word character.  That is any alphanumeric character,
and any Unicode connector punctuation characters (e.g. underscore).
.IP \fBxdigit\fR 10
Any hexadecimal digit character ([0\-9A\-Fa\-f]).
.PP
In the case of \fBboolean\fR, \fBtrue\fR and \fBfalse\fR, if the
function will return 0, then the \fIvarname\fR will always be set to 0,
due to the varied nature of a valid boolean value.
.RE
.TP
\fBstring last \fIstring1 string2\fR ?\fIstartIndex\fR?
.VE 8.1
Search \fIstring2\fR for a sequence of characters that exactly match
the characters in \fIstring1\fR.  If found, return the index of the
first character in the last such match within \fIstring2\fR.  If there
is no match, then return \-1.
.VS 8.1
If \fIstartIndex\fR is specified (in any of the forms accepted by the
\fBindex\fR method), then only the characters in \fIstring2\fR at or before the
specified \fIstartIndex\fR will be considered by the search.  For example,
.RS
.CS
\fBstring last a 0a23456789abcdef 15\fR
.CE
will return \fB10\fR, but
.CS
\fBstring last a 0a23456789abcdef 9\fR
.CE
will return \fB1\fR.
.RE
.VE 8.1
.TP
\fBstring length \fIstring\fR
Returns a decimal string giving the number of characters in
\fIstring\fR.  Note that this is not necessarily the same as the
number of bytes used to store the string.
.VS 8.1
.TP
\fBstring map\fR ?\fB\-nocase\fR? \fIcharMap string\fR
Replaces characters in \fIstring\fR based on the key-value pairs in
\fIcharMap\fR.  \fIcharMap\fR is a list of \fIkey value key value\fR ...
as in the form returned by \fBarray get\fR.  Each instance of a
key in the string will be replaced with its corresponding value.  If
\fB\-nocase\fR is specified, then matching is done without regard to
case differences. Both \fIkey\fR and \fIvalue\fR may be multiple
characters.  Replacement is done in an ordered manner, so the key appearing
first in the list will be checked first, and so on.  \fIstring\fR is
only iterated over once, so earlier key replacements will have no
affect for later key matches.  For example,
.RS
.CS
\fBstring map {abc 1 ab 2 a 3 1 0} 1abcaababcabababc\fR
.CE
will return the string \fB01321221\fR.
.RE
.TP
\fBstring match ?\fB\-nocase\fR? \fIpattern\fR \fIstring\fR
.VE 8.1
See if \fIpattern\fR matches \fIstring\fR; return 1 if it does, 0
if it doesn't.  
.VS 8.1
If \fB\-nocase\fR is specified, then the pattern attempts to match
against the string in a case insensitive manner.
.VE 8.1
For the two strings to match, their contents
must be identical except that the following special sequences
may appear in \fIpattern\fR:
.RS
.IP \fB*\fR 10
Matches any sequence of characters in \fIstring\fR,
including a null string.
.IP \fB?\fR 10
Matches any single character in \fIstring\fR.
.IP \fB[\fIchars\fB]\fR 10
Matches any character in the set given by \fIchars\fR.  If a sequence
of the form
\fIx\fB\-\fIy\fR appears in \fIchars\fR, then any character
between \fIx\fR and \fIy\fR, inclusive, will match.
.VS 8.1
When used with \fB\-nocase\fR, the end points of the range are converted
to lower case first.  Whereas {[A\-z]} matches '_' when matching
case-sensitively ('_' falls between the 'Z' and 'a'), with \fB\-nocase\fR
this is considered like {[A\-Za\-z]} (and probably what was meant in the
first place).
.VE 8.1
.IP \fB\e\fIx\fR 10
Matches the single character \fIx\fR.  This provides a way of
avoiding the special interpretation of the characters
\fB*?[]\e\fR in \fIpattern\fR.
.RE
.TP
\fBstring range \fIstring first last\fR
Returns a range of consecutive characters from \fIstring\fR, starting
with the character whose index is \fIfirst\fR and ending with the
character whose index is \fIlast\fR. An index of 0 refers to the
.VS 8.1
first character of the string.  \fIfirst\fR and \fIlast\fR may be
specified as for the \fBindex\fR method.
.VE 8.1
If \fIfirst\fR is less than zero then it is treated as if it were zero, and
if \fIlast\fR is greater than or equal to the length of the string then
it is treated as if it were \fBend\fR.  If \fIfirst\fR is greater than
\fIlast\fR then an empty string is returned.
.VS 8.1
.TP
\fBstring repeat \fIstring count\fR
Returns \fIstring\fR repeated \fIcount\fR number of times.
.TP
\fBstring replace \fIstring first last\fR ?\fInewstring\fR?
Removes a range of consecutive characters from \fIstring\fR, starting
with the character whose index is \fIfirst\fR and ending with the
character whose index is \fIlast\fR.  An index of 0 refers to the
first character of the string.  \fIFirst\fR and \fIlast\fR may be
specified as for the \fBindex\fR method.  If \fInewstring\fR is
specified, then it is placed in the removed character range.
If \fIfirst\fR is less than zero then it is treated as if it were zero, and
if \fIlast\fR is greater than or equal to the length of the string then
it is treated as if it were \fBend\fR.  If \fIfirst\fR is greater than
\fIlast\fR or the length of the initial string, or \fIlast\fR is less
than 0, then the initial string is returned untouched.
.TP
\fBstring tolower \fIstring\fR ?\fIfirst\fR? ?\fIlast\fR?
Returns a value equal to \fIstring\fR except that all upper (or title) case
letters have been converted to lower case.  If \fIfirst\fR is specified, it
refers to the first char index in the string to start modifying.  If
\fIlast\fR is specified, it refers to the char index in the string to stop
at (inclusive).  \fIfirst\fR and \fIlast\fR may be
specified as for the \fBindex\fR method.
.TP
\fBstring totitle \fIstring\fR ?\fIfirst\fR? ?\fIlast\fR?
Returns a value equal to \fIstring\fR except that the first character
in \fIstring\fR is converted to its Unicode title case variant (or upper
case if there is no title case variant) and the rest of the string is
converted to lower case.  If \fIfirst\fR is specified, it
refers to the first char index in the string to start modifying.  If
\fIlast\fR is specified, it refers to the char index in the string to stop
at (inclusive).  \fIfirst\fR and \fIlast\fR may be
specified as for the \fBindex\fR method.
.TP
\fBstring toupper \fIstring\fR ?\fIfirst\fR? ?\fIlast\fR?
Returns a value equal to \fIstring\fR except that all lower (or title) case
letters have been converted to upper case.  If \fIfirst\fR is specified, it
refers to the first char index in the string to start modifying.  If
\fIlast\fR is specified, it refers to the char index in the string to stop
at (inclusive).  \fIfirst\fR and \fIlast\fR may be specified as for the
\fBindex\fR method.
.VE 8.1
.TP
\fBstring trim \fIstring\fR ?\fIchars\fR?
Returns a value equal to \fIstring\fR except that any leading
or trailing characters from the set given by \fIchars\fR are
removed.
If \fIchars\fR is not specified then white space is removed
(spaces, tabs, newlines, and carriage returns).
.TP
\fBstring trimleft \fIstring\fR ?\fIchars\fR?
Returns a value equal to \fIstring\fR except that any
leading characters from the set given by \fIchars\fR are
removed.
If \fIchars\fR is not specified then white space is removed
(spaces, tabs, newlines, and carriage returns).
.TP
\fBstring trimright \fIstring\fR ?\fIchars\fR?
Returns a value equal to \fIstring\fR except that any
trailing characters from the set given by \fIchars\fR are
removed.
If \fIchars\fR is not specified then white space is removed
(spaces, tabs, newlines, and carriage returns).
.VS 8.1
.TP
\fBstring wordend \fIstring charIndex\fR
Returns the index of the character just after the last one in the word
containing character \fIcharIndex\fR of \fIstring\fR.  \fIcharIndex\fR
may be specified as for the \fBindex\fR method.  A word is
considered to be any contiguous range of alphanumeric (Unicode letters
or decimal digits) or underscore (Unicode connector punctuation)
characters, or any single character other than these.
.TP
\fBstring wordstart \fIstring charIndex\fR
Returns the index of the first character in the word containing
character \fIcharIndex\fR of \fIstring\fR.  \fIcharIndex\fR may be
specified as for the \fBindex\fR method.  A word is considered to be any
contiguous range of alphanumeric (Unicode letters or decimal digits)
or underscore (Unicode connector punctuation) characters, or any
single character other than these.
.VE 8.1

.SH KEYWORDS
case conversion, compare, index, match, pattern, string, word, equal, ctype