'\"
'\" Copyright (c) 1993 The Regents of the University of California.
'\" Copyright (c) 1994-1996 Sun Microsystems, Inc.
'\"
'\" See the file "license.terms" for information on usage and redistribution
'\" of this file, and for a DISCLAIMER OF ALL WARRANTIES.
'\" 
'\" RCS: @(#) $Id: string.n,v 1.7 1999/05/06 18:46:42 stanton Exp $
'\" 
.so man.macros
.TH string n 8.1 Tcl "Tcl Built-In Commands"
.BS
'\" Note:  do not modify the .SH NAME line immediately below!
.SH NAME
string \- Manipulate strings
.SH SYNOPSIS
\fBstring \fIoption arg \fR?\fIarg ...?\fR
.BE

.SH DESCRIPTION
.PP
Performs one of several string operations, depending on \fIoption\fR.
The legal \fIoption\fRs (which may be abbreviated) are:
.VS 8.1
.TP
\fBstring bytelength \fIstring\fR
Returns a decimal string giving the number of bytes used to represent
\fIstring\fR in memory.  Because UTF-8 uses one to three bytes to
represent Unicode characters, the byte length will not be the same as
the character length in general.  The cases where a script cares about
the byte length are rare.  In almost all cases, you should use the
\fBstring length\fR operation.  Refer to the \fBTcl_NumUtfChars\fR
manual entry for more details on the UTF-8 representation.
.TP
\fBstring compare ?\fI-nocase\fR? ?\fI-length int\fR? \fIstring1 string2\fR
.VE 8.1
Perform a character-by-character comparison of strings \fIstring1\fR and
\fIstring2\fR in the same way as the C \fBstrcmp\fR procedure.  Return
\-1, 0, or 1, depending on whether \fIstring1\fR is lexicographically
less than, equal to, or greater than \fIstring2\fR.
.VS 8.1
If \fI-length\fR is specified, it works like C \fBstrncmp\fR,
comparing only the first \fIint\fR characters.  If \fIint\fR is negative,
it is ignored.  If \fI-nocase\fR is specified, then the strings are
compared in a case-insensitive manner.
.TP
\fBstring equal ?\fI-nocase\fR? ?\fI-length int\fR? \fIstring1 string2\fR
.VE 8.1
Like the \fBcompare\fR method, but returns 1 when the strings
are equal, or 0 when not.
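A brief sketch contrasting \fBcompare\fR and \fBequal\fR (the sample
strings are arbitrary illustrations):
.RS
.CS
string compare apple banana             ;# returns -1
string compare -nocase TCL tcl          ;# returns 0
string compare -length 3 strict string  ;# returns 0 (only "str" compared)
string equal -nocase TCL tcl            ;# returns 1
string equal apple banana               ;# returns 0
.CE
.RE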
.TP
\fBstring first \fIstring1 string2\fR
Search \fIstring2\fR for a sequence of characters that exactly match
the characters in \fIstring1\fR.  If found, return the index of the
first character in the first such match within \fIstring2\fR.  If not
found, return \-1.
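For instance, with an arbitrary sample string:
.RS
.CS
string first an banana     ;# returns 1
string first xyz banana    ;# returns -1 (no match)
.CE
.RE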
.TP
\fBstring index \fIstring charIndex\fR
Returns the \fIcharIndex\fR'th character of the \fIstring\fR
argument.  A \fIcharIndex\fR of 0 corresponds to the first
character of the string.  
.VS 8.1
\fIcharIndex\fR may be specified as
follows:
.RS
.IP \fB[\fIinteger\fB]\fR 10
The char specified at this integral index.
.IP \fBend\fR 10
The last char of the string.
.IP \fIexpression\fR 10
A Tcl expression that returns a number.
.IP \fBend-\fIinteger\fR 10
The last char of the string minus the specified integer
offset (e.g. \fBend-1\fR).
.PP
.VE 8.1
If \fIcharIndex\fR is less than 0 or greater than
or equal to the length of the string then an empty string is
returned.
.RE
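As an illustrative sketch of these index forms (the sample string is
arbitrary):
.RS
.CS
string index abcde 0        ;# returns a
string index abcde end      ;# returns e
string index abcde end-1    ;# returns d
string index abcde 7        ;# returns an empty string (out of range)
.CE
.RE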
.VS 8.1
.TP
\fBstring is \fIclass\fR ?\fI-strict\fR? ?\fI-failindex varname\fR? \fIstring\fR
See if \fIstring\fR is a valid form of the specified class.  If
\fI-strict\fR is specified, then an empty string returns 0, otherwise an
empty string will return 1 on any class.  If \fI-failindex\fR is specified,
then if the function returns 0, the index in the string where the class was
no longer valid will be stored in the variable named \fIvarname\fR.  The
\fIvarname\fR will not be set if the function returns 1.  The following
class definitions are allowed (the class name can be abbreviated):
.RS
.IP \fBalnum\fR 10
Any Unicode alphabet or digit character.
.IP \fBalpha\fR 10
Any Unicode alphabet character.
.IP \fBascii\fR 10
Any character with a value less than \\u0080 (those that
are in the 7-bit ascii range).
.IP \fBboolean\fR 10
Any of the forms allowed to Tcl_GetBoolean.
.IP \fBdigit\fR 10
Any Unicode digit character.
.IP \fBdouble\fR 10
Any of the valid forms for a double in Tcl, with optional surrounding
whitespace.  In case of under/overflow in the value, 0 is returned
and the \fIvarname\fR will contain -1.
.IP \fBfalse\fR 10
Any of the forms allowed to Tcl_GetBoolean where the value is false.
.IP \fBinteger\fR 10
Any of the valid forms for an integer in Tcl, with optional surrounding
whitespace.  In case of under/overflow in the value, 0 is returned
and the \fIvarname\fR will contain -1.
.IP \fBlower\fR 10
Any Unicode lower case alphabet character.
.IP \fBspace\fR 10
Any Unicode space character.
.IP \fBtrue\fR 10
Any of the forms allowed to Tcl_GetBoolean where the value is true.
.IP \fBupper\fR 10
Any upper case alphabet character in the Unicode character set.
.IP \fBwordchar\fR 10
Any Unicode word character.  That is, any alphanumeric character
or any Unicode connector punctuation character (e.g. underscore).
.RE
In the case of \fBboolean\fR, \fBtrue\fR and \fBfalse\fR, if the
function returns 0, the \fIvarname\fR will always be set to 0,
due to the varied nature of a valid boolean value.
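A short sketch of the \fI-strict\fR and \fI-failindex\fR behavior
described above (the variable name \fBidx\fR and the sample values are
arbitrary choices for illustration):
.RS
.CS
string is integer 42                   ;# returns 1
string is integer ""                   ;# returns 1 (empty string, not strict)
string is integer -strict ""           ;# returns 0
string is digit -failindex idx 12a4    ;# returns 0 and sets idx to 2
.CE
.RE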
.VE 8.1
.TP
\fBstring last \fIstring1 string2\fR
Search \fIstring2\fR for a sequence of characters that exactly match
the characters in \fIstring1\fR.  If found, return the index of the
first character in the last such match within \fIstring2\fR.  If there
is no match, then return \-1.
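For instance, with an arbitrary sample string in which the pattern
occurs twice:
.RS
.CS
string last an banana     ;# returns 3 (start of the last "an")
string last xyz banana    ;# returns -1 (no match)
.CE
.RE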
.TP
\fBstring length \fIstring\fR
Returns a decimal string giving the number of characters in
\fIstring\fR.  Note that this is not necessarily the same as the
number of bytes used to store the string.
.VS 8.1
.TP
\fBstring map ?\fIoptions\fR? \fIcharMap string\fR
Replaces characters in \fIstring\fR based on the key-value pairs in
\fIcharMap\fR.  \fIcharMap\fR is a list of the form
\fIkey value key value ...\fR, as returned by \fBarray get\fR.  Each
instance of a key in
the string will be replaced with its corresponding value.  Both key and
value may be multiple characters.  Replacement is done
in an ordered manner, so the key appearing first in the list will be
checked first, and so on.  \fIstring\fR is only iterated over once,
so earlier key replacements will have no effect on later key matches.
For example,
.RS
.CS
\fBstring map {abc 1 ab 2 a 3 1 0} 1abcaababcabababc\fR
.CE
will return the string \fB01321221\fR.
.RE
.VE 8.1
.TP
\fBstring match \fIpattern\fR \fIstring\fR
See if \fIpattern\fR matches \fIstring\fR; return 1 if it does, 0
if it doesn't.  Matching is done in a fashion similar to that
used by the C-shell.  For the two strings to match, their contents
must be identical except that the following special sequences
may appear in \fIpattern\fR:
.RS
.IP \fB*\fR 10
Matches any sequence of characters in \fIstring\fR,
including a null string.
.IP \fB?\fR 10
Matches any single character in \fIstring\fR.
.IP \fB[\fIchars\fB]\fR 10
Matches any character in the set given by \fIchars\fR.  If a sequence
of the form
\fIx\fB\-\fIy\fR appears in \fIchars\fR, then any character
between \fIx\fR and \fIy\fR, inclusive, will match.
.IP \fB\e\fIx\fR 10
Matches the single character \fIx\fR.  This provides a way of
avoiding the special interpretation of the characters
\fB*?[]\e\fR in \fIpattern\fR.
.RE
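A few illustrative matches (the patterns and strings are arbitrary
samples; braces keep Tcl from treating the brackets as command
substitution):
.RS
.CS
string match *.tcl init.tcl     ;# returns 1
string match ?at cat            ;# returns 1
string match {[a-c]at} bat      ;# returns 1
string match {[a-c]at} rat      ;# returns 0
.CE
.RE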
.TP
\fBstring range \fIstring first last\fR
Returns a range of consecutive characters from \fIstring\fR, starting
with the character whose index is \fIfirst\fR and ending with the
character whose index is \fIlast\fR. An index of 0 refers to the
.VS 8.1
first character of the string.  \fIfirst\fR and \fIlast\fR may be
specified as for the \fBindex\fR method.
.VE 8.1
If \fIfirst\fR is less than zero then it is treated as if it were zero, and
if \fIlast\fR is greater than or equal to the length of the string then
it is treated as if it were \fBend\fR.  If \fIfirst\fR is greater than
\fIlast\fR then an empty string is returned.
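As a sketch of the clamping rules above (the sample string is
arbitrary):
.RS
.CS
string range abcdef 1 3      ;# returns bcd
string range abcdef 3 end    ;# returns def
string range abcdef -2 1     ;# returns ab (first treated as 0)
string range abcdef 4 2      ;# returns an empty string (first > last)
.CE
.RE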
.VS 8.1
.TP
\fBstring repeat \fIstring count\fR
Returns \fIstring\fR repeated \fIcount\fR number of times.
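For example, with an arbitrary sample string:
.RS
.CS
string repeat ab 3    ;# returns ababab
.CE
.RE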
.TP
\fBstring replace \fIstring first last\fR ?\fInewstring\fR?
Removes a range of consecutive characters from \fIstring\fR, starting
with the character whose index is \fIfirst\fR and ending with the
character whose index is \fIlast\fR.  An index of 0 refers to the
first character of the string.  \fIfirst\fR and \fIlast\fR may be
specified as for the \fBindex\fR method.  If \fInewstring\fR is
specified, then it is placed in the removed character range.
If \fIfirst\fR is less than zero then it is treated as if it were zero, and
if \fIlast\fR is greater than or equal to the length of the string then
it is treated as if it were \fBend\fR.  If \fIfirst\fR is greater than
\fIlast\fR or the length of the initial string, or \fIlast\fR is less
than 0, then the initial string is returned untouched.
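A minimal sketch of removal, substitution, and the untouched case (the
sample strings are arbitrary):
.RS
.CS
string replace abcdef 1 3       ;# returns aef
string replace abcdef 1 3 XY    ;# returns aXYef
string replace abcdef 4 2 XY    ;# returns abcdef (first > last)
.CE
.RE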
.TP
\fBstring tolower \fIstring\fR ?\fIfirst\fR? ?\fIlast\fR?
Returns a value equal to \fIstring\fR except that all upper (or title) case
letters have been converted to lower case.  If \fIfirst\fR is specified, it
refers to the first char index in the string to start modifying.  If
\fIlast\fR is specified, it refers to the char index in the string to stop
at (inclusive).  \fIfirst\fR and \fIlast\fR may be
specified as for the \fBindex\fR method.
.TP
\fBstring totitle \fIstring\fR ?\fIfirst\fR? ?\fIlast\fR?
Returns a value equal to \fIstring\fR except that the first character
in \fIstring\fR is converted to its Unicode title case variant (or upper
case if there is no title case variant) and the rest of the string is
converted to lower case.  If \fIfirst\fR is specified, it
refers to the first char index in the string to start modifying.  If
\fIlast\fR is specified, it refers to the char index in the string to stop
at (inclusive).  \fIfirst\fR and \fIlast\fR may be
specified as for the \fBindex\fR method.
.TP
\fBstring toupper \fIstring\fR ?\fIfirst\fR? ?\fIlast\fR?
Returns a value equal to \fIstring\fR except that all lower (or title) case
letters have been converted to upper case.  If \fIfirst\fR is specified, it
refers to the first char index in the string to start modifying.  If
\fIlast\fR is specified, it refers to the char index in the string to stop
at (inclusive).  \fIfirst\fR and \fIlast\fR may be specified as for the
\fBindex\fR method.
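The three case-conversion methods can be compared with a short sketch
(the sample strings are arbitrary):
.RS
.CS
string tolower "Hello World"    ;# returns hello world
string toupper "Hello World"    ;# returns HELLO WORLD
string totitle hELLO            ;# returns Hello
string toupper hello 0 0        ;# returns Hello (only index 0 converted)
.CE
.RE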
.VE 8.1
.TP
\fBstring trim \fIstring\fR ?\fIchars\fR?
Returns a value equal to \fIstring\fR except that any leading
or trailing characters from the set given by \fIchars\fR are
removed.
If \fIchars\fR is not specified then white space is removed
(spaces, tabs, newlines, and carriage returns).
.TP
\fBstring trimleft \fIstring\fR ?\fIchars\fR?
Returns a value equal to \fIstring\fR except that any
leading characters from the set given by \fIchars\fR are
removed.
If \fIchars\fR is not specified then white space is removed
(spaces, tabs, newlines, and carriage returns).
.TP
\fBstring trimright \fIstring\fR ?\fIchars\fR?
Returns a value equal to \fIstring\fR except that any
trailing characters from the set given by \fIchars\fR are
removed.
If \fIchars\fR is not specified then white space is removed
(spaces, tabs, newlines, and carriage returns).
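A brief sketch of the three trimming methods (the sample strings and
character sets are arbitrary):
.RS
.CS
string trim "  hello  "        ;# returns hello
string trim xxhixx x           ;# returns hi
string trimleft 000123 0       ;# returns 123
string trimright path/// /     ;# returns path
.CE
.RE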
.VS 8.1
.TP
\fBstring wordend \fIstring charIndex\fR
Returns the index of the character just after the last one in the word
containing character \fIcharIndex\fR of \fIstring\fR.  \fIcharIndex\fR
may be specified as for the \fBindex\fR method.  A word is
considered to be any contiguous range of alphanumeric (Unicode letters
or decimal digits) or underscore (Unicode connector punctuation)
characters, or any single character other than these.
.TP
\fBstring wordstart \fIstring charIndex\fR
Returns the index of the first character in the word containing
character \fIcharIndex\fR of \fIstring\fR.  \fIcharIndex\fR may be
specified as for the \fBindex\fR method.  A word is considered to be any
contiguous range of alphanumeric (Unicode letters or decimal digits)
or underscore (Unicode connector punctuation) characters, or any
single character other than these.
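For instance, taking index 5 (the \fBa\fR in \fBbar_baz\fR) of an
arbitrary sample string:
.RS
.CS
string wordstart "foo bar_baz" 5    ;# returns 4
string wordend "foo bar_baz" 5      ;# returns 11
.CE
.RE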
.VE 8.1

.SH KEYWORDS
case conversion, compare, index, match, pattern, string, word