\section{\module{locale} ---
         Internationalization services}

\declaremodule{standard}{locale}
\modulesynopsis{Internationalization services.}
\moduleauthor{Martin von Loewis}{loewis@informatik.hu-berlin.de}
\sectionauthor{Martin von Loewis}{loewis@informatik.hu-berlin.de}


The \module{locale} module opens access to the \POSIX{} locale database
and functionality. The \POSIX{} locale mechanism allows programmers
to deal with certain cultural issues in an application, without
requiring the programmer to know all the specifics of each country
where the software is executed.

The \module{locale} module is implemented on top of the
\module{_locale}\refbimodindex{_locale} module, which in turn uses an
ANSI C locale implementation if available.

The \module{locale} module defines the following exception, functions,
and constants:


\begin{funcdesc}{setlocale}{category\optional{, value}}
If \var{value} is specified, modifies the locale setting for the
\var{category}. The available categories are listed in the data
description below. The value is the name of a locale. An empty string
specifies the user's default settings. If the modification of the
locale fails, the exception \exception{Error} is
raised. If successful, the new locale setting is returned.

If no \var{value} is specified, the current setting for the
\var{category} is returned.

\function{setlocale()} is not thread-safe on most systems. Applications
typically start with a call to
\begin{verbatim}
import locale
locale.setlocale(locale.LC_ALL, "")
\end{verbatim}
This sets the locale for all categories to the user's default setting
(typically specified in the \envvar{LANG} environment variable). If
the locale is not changed thereafter, using multithreading should not
cause problems.
\end{funcdesc}

\begin{excdesc}{Error}
Exception raised when \function{setlocale()} fails.
\end{excdesc}
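A minimal sketch of querying and changing a single category, assuming a
Python 3 interpreter. The helper \function{try_setlocale()} is
hypothetical and exists only for illustration. It relies on the
\samp{C} locale always being available and on an unknown locale name
making \function{setlocale()} raise \exception{Error}.

\begin{verbatim}
import locale

def try_setlocale(category, name):
    """Switch one category to the locale called name and return the
    new setting, or return None if that locale is not available
    (hypothetical helper, for illustration only)."""
    try:
        return locale.setlocale(category, name)
    except locale.Error:
        return None

# With no value, setlocale() only queries the current setting.
current = locale.setlocale(locale.LC_NUMERIC)

# The "C" locale always exists, so this call succeeds.
print(try_setlocale(locale.LC_NUMERIC, "C"))

# An unknown name makes setlocale() raise Error, so the helper
# returns None instead.
print(try_setlocale(locale.LC_NUMERIC, "no_such_locale"))

# Put back whatever setting was active before.
locale.setlocale(locale.LC_NUMERIC, current)
\end{verbatim}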

\begin{funcdesc}{localeconv}{}
Returns the database of the local conventions as a dictionary. This
dictionary has the following strings as keys:
\begin{itemize}
\item \code{decimal_point} specifies the decimal point used in
floating point number representations for the \constant{LC_NUMERIC}
category.
\item \code{grouping} is a sequence of numbers giving the sizes of the
successive digit groups, counted from the right, between which
\code{thousands_sep} is inserted. If the sequence is terminated with
\constant{CHAR_MAX}, no further grouping is performed. If the sequence
terminates with a \code{0}, the last group size is used repeatedly.
\item \code{thousands_sep} is the character used between groups.
\item \code{int_curr_symbol} specifies the international currency
symbol from the \constant{LC_MONETARY} category.
\item \code{currency_symbol} is the local currency symbol.
\item \code{mon_decimal_point} is the decimal point used in monetary
values.
\item \code{mon_thousands_sep} is the separator for grouping of
monetary values.
\item \code{mon_grouping} has the same format as the \code{grouping}
key; it is used for monetary values.
\item \code{positive_sign} and \code{negative_sign} give the signs
used for positive and negative monetary quantities.
\item \code{int_frac_digits} and \code{frac_digits} specify the number
of fractional digits used in the international and local formatting
of monetary values.
\item \code{p_cs_precedes} and \code{n_cs_precedes} specify whether
the currency symbol precedes the value for positive or negative
values.
\item \code{p_sep_by_space} and \code{n_sep_by_space} specify
whether there is a space between the positive or negative value and
the currency symbol.
\item \code{p_sign_posn} and \code{n_sign_posn} indicate how the
sign should be placed for positive and negative monetary values. 
\end{itemize}

The possible values for \code{p_sign_posn} and
\code{n_sign_posn} are given below.

\begin{tableii}{c|l}{code}{Value}{Explanation}
\lineii{0}{Currency and value are surrounded by parentheses.}
\lineii{1}{The sign should precede the value and currency symbol.}
\lineii{2}{The sign should follow the value and currency symbol.}
\lineii{3}{The sign should immediately precede the value.}
\lineii{4}{The sign should immediately follow the value.}
\lineii{CHAR_MAX}{Nothing is specified in this locale.}
\end{tableii}
\end{funcdesc}
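The \code{grouping} rules above can be illustrated with a short
sketch, assuming a Python 3 interpreter. The helper
\function{group_digits()} is hypothetical and not part of the
\module{locale} module: it walks the group sizes from the right-hand
end of a digit string, stops at \constant{CHAR_MAX}, and repeats the
last size when it reaches \code{0}. The sketch also inspects
\function{localeconv()} in the \samp{C} locale, which uses a period as
decimal point and defines no grouping at all.

\begin{verbatim}
import locale

def group_digits(digits, grouping, sep):
    """Insert sep between groups of digits as described by a
    localeconv() grouping list (hypothetical helper)."""
    groups = []
    last = None
    for size in grouping:
        if size == locale.CHAR_MAX:
            break                      # no further grouping
        if size == 0:
            # Reuse the previous group size for the remaining digits.
            while last and len(digits) > last:
                groups.append(digits[-last:])
                digits = digits[:-last]
            break
        if len(digits) <= size:
            break
        groups.append(digits[-size:])
        digits = digits[:-size]
        last = size
    groups.append(digits)              # leftmost, possibly ungrouped, part
    return sep.join(reversed(groups))

saved = locale.setlocale(locale.LC_ALL)
try:
    locale.setlocale(locale.LC_ALL, "C")
    conv = locale.localeconv()
finally:
    locale.setlocale(locale.LC_ALL, saved)

# The C locale uses '.' as decimal point and an empty grouping list.
print(repr(conv["decimal_point"]), conv["grouping"])

print(group_digits("1234567", [3, 0], ","))                # 1,234,567
print(group_digits("1234567", [3, 2, 0], ","))             # 12,34,567
print(group_digits("1234567", [3, locale.CHAR_MAX], ","))  # 1234,567
\end{verbatim}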

\begin{funcdesc}{strcoll}{string1,string2}
Compares two strings according to the current \constant{LC_COLLATE}
setting. Like any other comparison function, it returns a negative
value, a positive value, or \code{0}, depending on whether
\var{string1} collates before or after \var{string2}, or is equal to it.
\end{funcdesc}
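Because \function{strcoll()} returns a three-way result, it can serve
as a comparison function for sorting. The sketch below assumes
Python 3, where \function{sorted()} takes a key function rather than a
comparison function, so \function{functools.cmp_to_key()} is used as an
adapter. The \samp{C} locale is set so that the order (plain
character-code order) is predictable.

\begin{verbatim}
import functools
import locale

words = ["banana", "Apple", "cherry", "apple"]

saved = locale.setlocale(locale.LC_COLLATE)
try:
    locale.setlocale(locale.LC_COLLATE, "C")
    # cmp_to_key() wraps the three-way strcoll() result in a key object.
    ordered = sorted(words, key=functools.cmp_to_key(locale.strcoll))
finally:
    locale.setlocale(locale.LC_COLLATE, saved)

print(ordered)   # ['Apple', 'apple', 'banana', 'cherry']
\end{verbatim}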

\begin{funcdesc}{strxfrm}{string}
Transforms a string into one that can be compared with the built-in
function \function{cmp()}\bifuncindex{cmp} and still yields
locale-aware results.  This function can be used when the same string
is compared repeatedly, e.g. when collating a sequence of strings.
\end{funcdesc}
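A sketch of the same sort using \function{strxfrm()} as a key function,
assuming Python 3. Each string is transformed only once, and the
transformed keys are then compared with ordinary string comparison,
which is what makes this cheaper than calling \function{strcoll()} for
every comparison. As before, the \samp{C} locale keeps the result
predictable.

\begin{verbatim}
import locale

words = ["banana", "Apple", "cherry", "apple"]

saved = locale.setlocale(locale.LC_COLLATE)
try:
    locale.setlocale(locale.LC_COLLATE, "C")
    # strxfrm() is called once per word; sorted() then compares
    # the transformed strings directly.
    ordered = sorted(words, key=locale.strxfrm)
finally:
    locale.setlocale(locale.LC_COLLATE, saved)

print(ordered)   # ['Apple', 'apple', 'banana', 'cherry']
\end{verbatim}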

\begin{funcdesc}{format}{format, val\optional{, grouping\code{ = 0}}}
Formats a number \var{val} according to the current
\constant{LC_NUMERIC} setting.  The format follows the conventions of
the \code{\%} operator.  For floating point values, the decimal point
is modified if appropriate.  If \var{grouping} is true, the thousands
grouping given by \function{localeconv()} is also applied.
\end{funcdesc}
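Later Python releases deprecated \function{format()} in favour of
\function{format_string()}, which accepts the same \var{format},
\var{val} and \var{grouping} arguments, and Python 3.12 removed
\function{format()} altogether. The following sketch therefore assumes
a Python 3 interpreter and uses \function{format_string()}. In the
\samp{C} locale no thousands separator is defined, so \var{grouping}
has no visible effect there.

\begin{verbatim}
import locale

saved = locale.setlocale(locale.LC_NUMERIC)
try:
    locale.setlocale(locale.LC_NUMERIC, "C")
    # The format string follows the conventions of the % operator.
    pi = locale.format_string("%.2f", 3.14159)
    # grouping=True applies the locale's grouping; the C locale has none.
    big = locale.format_string("%d", 1234567, grouping=True)
finally:
    locale.setlocale(locale.LC_NUMERIC, saved)

print(pi, big)   # 3.14 1234567
\end{verbatim}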

\begin{funcdesc}{str}{float}
Formats a floating point number using the same format as the built-in
function \code{str(\var{float})}, but takes the decimal point into
account.
\end{funcdesc}

\begin{funcdesc}{atof}{string}
Converts a string to a floating point number, following the
\constant{LC_NUMERIC} settings.
\end{funcdesc}

\begin{funcdesc}{atoi}{string}
Converts a string to an integer, following the \constant{LC_NUMERIC}
conventions.
\end{funcdesc}
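A round-trip sketch, assuming Python 3 and the \samp{C} locale:
\function{str()} produces the locale's representation of a float, and
\function{atof()} and \function{atoi()} parse text written with the
same conventions. In a locale whose decimal point is a comma,
\function{str()} would emit a comma and \function{atof()} would expect
one.

\begin{verbatim}
import locale

saved = locale.setlocale(locale.LC_NUMERIC)
try:
    locale.setlocale(locale.LC_NUMERIC, "C")
    text = locale.str(2.5)       # locale-aware float to string
    value = locale.atof(text)    # parse it back with the same conventions
    count = locale.atoi("42")    # locale-aware string to integer
finally:
    locale.setlocale(locale.LC_NUMERIC, saved)

print(text, value, count)   # 2.5 2.5 42
\end{verbatim}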

\begin{datadesc}{LC_CTYPE}
\refstmodindex{string}
Locale category for the character type functions. Depending on the
settings of this category, the functions of module \refmodule{string}
dealing with case change their behaviour.
\end{datadesc}

\begin{datadesc}{LC_COLLATE}
Locale category for sorting strings. The functions
\function{strcoll()} and \function{strxfrm()} of the \module{locale}
module are affected.
\end{datadesc}

\begin{datadesc}{LC_TIME}
Locale category for the formatting of time. The function
\function{time.strftime()} follows these conventions.
\end{datadesc}
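A sketch of the effect of \constant{LC_TIME}, assuming Python 3. The
day and month names produced by the \code{\%a} and \code{\%b}
directives come from the current locale, so the \samp{C} locale is set
explicitly to make the output predictable. The date is passed as a
nine-element time tuple.

\begin{verbatim}
import locale
import time

# 1 January 2000, 12:00; tm_wday 5 is Saturday (Monday is 0).
stamp = (2000, 1, 1, 12, 0, 0, 5, 1, 0)

saved = locale.setlocale(locale.LC_TIME)
try:
    locale.setlocale(locale.LC_TIME, "C")
    # %a and %b are the locale's abbreviated day and month names.
    text = time.strftime("%a %d %b %Y", stamp)
finally:
    locale.setlocale(locale.LC_TIME, saved)

print(text)   # Sat 01 Jan 2000
\end{verbatim}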

\begin{datadesc}{LC_MONETARY}
Locale category for formatting of monetary values. The available
options can be obtained from the \function{localeconv()} function.
\end{datadesc}

\begin{datadesc}{LC_MESSAGES}
Locale category for message display. Python currently does not support
application-specific locale-aware messages. Messages displayed by the
operating system, like those returned by \function{os.strerror()},
might be affected by this category.
\end{datadesc}

\begin{datadesc}{LC_NUMERIC}
Locale category for formatting numbers. The functions
\function{format()}, \function{atoi()}, \function{atof()} and
\function{str()} of the \module{locale} module are affected by that
category. All other numeric formatting operations are not affected.
\end{datadesc}
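The following sketch, assuming Python 3, contrasts \function{str()}
from this module, which uses the \code{decimal_point} of the current
\constant{LC_NUMERIC} locale, with the \code{\%} operator, which always
uses a period. It runs in whatever locale happens to be active: under
the \samp{C} locale both strings are identical, while under a locale
with a comma as decimal point they differ.

\begin{verbatim}
import locale

value = 1.5
conv = locale.localeconv()

# locale.str() uses the decimal point of the current LC_NUMERIC locale...
localized = locale.str(value)
# ...whereas the % operator ignores the locale and always uses '.'.
plain = "%.1f" % value

print(plain, localized)
\end{verbatim}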

\begin{datadesc}{LC_ALL}
Combination of all locale settings. If this flag is used when the
locale is changed, setting the locale for all categories is
attempted. If that fails for any category, no category is changed at
all. When the locale is retrieved using this flag, a string indicating
the setting for all categories is returned. This string can be later
used to restore the settings.
\end{datadesc}
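A minimal sketch of saving and restoring every category at once,
assuming Python 3. The helper \function{run_in_c_locale()} is
hypothetical. Because it changes the program-wide locale, the caveats
discussed below apply: it is best confined to application code and is
not safe while other threads depend on the locale.

\begin{verbatim}
import locale

def run_in_c_locale(func, *args):
    """Call func(*args) with every category temporarily set to "C",
    then restore the previous settings (hypothetical helper)."""
    saved = locale.setlocale(locale.LC_ALL)      # string covering all categories
    locale.setlocale(locale.LC_ALL, "C")
    try:
        return func(*args)
    finally:
        locale.setlocale(locale.LC_ALL, saved)   # restore all categories at once

before = locale.setlocale(locale.LC_ALL)
point = run_in_c_locale(lambda: locale.localeconv()["decimal_point"])
print(point)                                      # .
print(locale.setlocale(locale.LC_ALL) == before)  # True
\end{verbatim}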

\begin{datadesc}{CHAR_MAX}
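A symbolic constant used in some of the values returned by
\function{localeconv()}: it can terminate a \code{grouping} sequence,
and it marks an unspecified \code{p_sign_posn} or \code{n_sign_posn}.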
\end{datadesc}

Example (locale names such as \code{"de"} are platform dependent and
may not be installed on every system):

\begin{verbatim}
>>> import locale
>>> loc = locale.setlocale(locale.LC_ALL) # get current locale
>>> locale.setlocale(locale.LC_ALL, "de") # use German locale
>>> locale.strcoll("f\344n", "foo") # compare a string containing an umlaut 
>>> locale.setlocale(locale.LC_ALL, "") # use user's preferred locale
>>> locale.setlocale(locale.LC_ALL, "C") # use default (C) locale
>>> locale.setlocale(locale.LC_ALL, loc) # restore saved locale
\end{verbatim}

\subsection{Background, details, hints, tips and caveats}

The C standard defines the locale as a program-wide property that may
be relatively expensive to change.  On top of that, some
implementations are broken in such a way that frequent locale changes
may cause core dumps.  This makes the locale somewhat painful to use
correctly.

Initially, when a program is started, the locale is the \samp{C} locale, no
matter what the user's preferred locale is.  The program must
explicitly say that it wants the user's preferred locale settings by
calling \code{setlocale(LC_ALL, "")}.

It is generally a bad idea to call \function{setlocale()} in some library
routine, since as a side effect it affects the entire program.  Saving
and restoring it is almost as bad: it is expensive and affects other
threads that happen to run before the settings have been restored.

If, when coding a module for general use, you need a locale-independent
version of an operation that is affected by the locale
(e.g. \function{string.lower()}, or certain formats used with
\function{time.strftime()}), you will have to find a way to do it
without using the standard library routine.  Even better is convincing
yourself that using locale settings is okay.  Only as a last resort
should you document that your module is not compatible with
non-\samp{C} locale settings.
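A minimal sketch of the first option, assuming Python 3. The
hypothetical helpers \function{c_date()} and \function{ascii_lower()}
use fixed English name tables and an ASCII-only translation table, so
their results never depend on the locale. Note that in Python 3
\method{str.lower()} itself is no longer locale dependent;
\function{ascii_lower()} differs from it by leaving non-ASCII letters
unchanged.

\begin{verbatim}
import string
import time

# Fixed English names, so the output never depends on LC_TIME.
DAYS = ("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun")   # tm_wday 0..6
MONTHS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")        # tm_mon 1..12

# Translation table that maps only the ASCII letters A-Z to a-z.
ASCII_LOWER = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)

def c_date(t):
    """Format a struct_time the way strftime("%a, %d %b %Y") does in the
    C locale, without consulting the locale (hypothetical helper)."""
    return "%s, %02d %s %04d" % (DAYS[t.tm_wday], t.tm_mday,
                                 MONTHS[t.tm_mon - 1], t.tm_year)

def ascii_lower(s):
    """Lower-case the ASCII letters of s, leaving all other characters
    unchanged (hypothetical helper)."""
    return s.translate(ASCII_LOWER)

print(c_date(time.gmtime(0)))        # Thu, 01 Jan 1970
print(ascii_lower("HELLO, World"))   # hello, world
\end{verbatim}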

The case conversion functions in the
\refmodule{string}\refstmodindex{string} and
\module{strop}\refbimodindex{strop} modules are affected by the locale
settings.  When a call to the \function{setlocale()} function changes
the \constant{LC_CTYPE} settings, the variables
\code{string.lowercase}, \code{string.uppercase} and
\code{string.letters} (and their counterparts in \module{strop}) are
recalculated.  Note that code that uses these variables through
`\keyword{from} ... \keyword{import} ...', e.g. \code{from string
import letters}, is not affected by subsequent \function{setlocale()}
calls.

The only way to perform numeric operations according to the locale
is to use the special functions defined by this module:
\function{atof()}, \function{atoi()}, \function{format()},
\function{str()}.

\subsection{For extension writers and programs that embed Python}
\label{embedding-locale}

Extension modules should never call \function{setlocale()}, except to
find out what the current locale is.  But since the return value can
only be used portably to restore it, that is not very useful (except
perhaps to find out whether or not the locale is \samp{C}).

When Python is embedded in an application, if the application sets the
locale to something specific before initializing Python, that is
generally okay, and Python will use whatever locale is set,
\emph{except} that the \constant{LC_NUMERIC} locale should always be
\samp{C}.

The \function{setlocale()} function in the \module{locale} module
gives the Python programmer the impression that it is possible to
manipulate the \constant{LC_NUMERIC} locale setting, but this is not
the case at the C level: C code will always find that the
\constant{LC_NUMERIC} locale setting is \samp{C}.  This is because too
much would break when the decimal point character is set to something
other than a period (e.g. the Python parser would break).  Caveat: threads that run
without holding Python's global interpreter lock may occasionally find
that the numeric locale setting differs; this is because the only
portable way to implement this feature is to set the numeric locale
settings to what the user requests, extract the relevant
characteristics, and then restore the \samp{C} numeric locale.

When Python code uses the \module{locale} module to change the locale,
this also affects the embedding application.  If the embedding
application doesn't want this to happen, it should remove the
\module{_locale} extension module (which does all the work) from the
table of built-in modules in the \file{config.c} file, and make sure
that the \module{_locale} module is not accessible as a shared library.