/****************************************************************************
**
** Copyright (C) 2011 Nokia Corporation and/or its subsidiary(-ies).
** All rights reserved.
** Contact: Nokia Corporation (qt-info@nokia.com)
**
** This file is part of the documentation of the Qt Toolkit.
**
** $QT_BEGIN_LICENSE:FDL$
** GNU Free Documentation License
** Alternatively, this file may be used under the terms of the GNU Free
** Documentation License version 1.3 as published by the Free Software
** Foundation and appearing in the file included in the packaging of
** this file.
**
** Other Usage
** Alternatively, this file may be used in accordance with the terms
** and conditions contained in a signed written agreement between you
** and Nokia.
**
**
**
**
** $QT_END_LICENSE$
**
****************************************************************************/

/*!
    \group string-processing
    \title Classes for String Data

    \brief Classes for working with string data.

    These classes are relevant when working with string data. See the
    \l{Unicode in Qt}{information about support for Unicode in Qt} for
    more information.
*/


/*! 
    \page unicode.html
    \title Unicode in Qt
    \brief Information about support for Unicode in Qt.

    \keyword Unicode

    \ingroup technology-apis

    Unicode is a multi-byte character set that is portable across all
    major computing platforms and covers most of the world's writing
    systems. It is also single-locale: it includes no code pages or other
    complexities that make software harder to write and test. No
    competing character set is reasonably cross-platform. For these
    reasons, Unicode 4.0 is used as the native character set for Qt.
    
    \section1 Qt's Classes for Working with Strings

    These classes are relevant when working with string data. For information
    about rendering text, see the \l{Rich Text Processing} overview, and if
    your string data is in XML, see the \l{XML Processing} overview.

    \annotatedlist string-processing

    \section1 Information about Unicode on the Web

    The \l{http://www.unicode.org/}{Unicode Consortium} has a number
    of documents available, including

    \list

    \i \l{http://www.unicode.org/unicode/standard/principles.html}{A
    technical introduction to Unicode}
    \i \l{http://www.unicode.org/unicode/standard/standard.html}{The
    home page for the standard}

    \endlist


    \section1 The Standard

    The current version of the standard is \l{http://www.unicode.org/versions/Unicode5.1.0/}{Unicode 5.1.0}.

    Previous printed versions of the specification:

    \list
    \o \l{http://www.amazon.com/Unicode-Standard-Version-5-0-5th/dp/0321480910/trolltech/t}{The Unicode Standard, Version 5.0}
    \o \l{http://www.amazon.com/exec/obidos/ASIN/0321185781/trolltech/t}{The Unicode Standard, Version 4.0}
    \o \l{http://www.amazon.com/exec/obidos/ASIN/0201616335/trolltech/t}{The Unicode Standard, Version 3.2}
    \o \l{http://www.amazon.com/exec/obidos/ASIN/0201473459/trolltech/t}{The Unicode Standard, Version 2.0} \mdash
    see also the \l{http://www.unicode.org/unicode/reports/tr8.html}{2.1 update} and
    \l{http://www.unicode.org/unicode/standard/versions/enumeratedversions.html#Unicode 2.1.9}{the 2.1.9 data files} at
    \l{http://www.unicode.org}.
    \endlist

    \section1 Unicode in Qt

    In Qt, and in most applications that use Qt, most or all user-visible
    strings are stored using Unicode. Qt provides:

    \list

    \i Translation to/from legacy encodings for file I/O: see
    QTextCodec and QTextStream.
    \i Translation from Input Methods and 8-bit keyboard input.
    \i Translation to legacy character sets for on-screen display.
    \i A string class, QString, that stores Unicode characters, with
    support for migrating from C strings including fast (cached)
    translation to and from US-ASCII, and all the usual string
    operations.
    \i Unicode-aware widgets where appropriate.
    \i Unicode support detection on Windows, so that Qt provides Unicode
    even on Windows platforms that do not support it natively.

    \endlist

    To fully benefit from Unicode, we recommend using QString for storing
    all user-visible strings, and performing all text file I/O using
    QTextStream. Use QKeyEvent::text() for keyboard input in any custom
    widgets you write. This makes little difference for slow typists
    in Western Europe or North America, but it is beneficial for fast
    typists and for people using special input methods.
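
    As a minimal sketch of the keyboard-input recommendation, a custom
    widget might collect typed text as shown here. \c NoteWidget and its
    \c m_text member are hypothetical names used only for illustration,
    and the sketch does not filter control characters such as backspace.

    \code
    #include <QKeyEvent>
    #include <QString>
    #include <QWidget>

    // NoteWidget is a hypothetical widget, not part of Qt.
    class NoteWidget : public QWidget
    {
    protected:
        void keyPressEvent(QKeyEvent *event)
        {
            const QString typed = event->text();
            if (typed.isEmpty()) {
                // Keys such as Shift or Control generate no text;
                // leave them to the base class.
                QWidget::keyPressEvent(event);
                return;
            }
            // Append everything the event produced, which may be
            // more than a single character.
            m_text += typed;
            update();
        }

    private:
        QString m_text;
    };
    \endcode

    Appending the whole string rather than a single character keeps the
    widget correct when one event carries several characters, for example
    from an input method.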

    All function arguments in Qt that may be user-visible strings, such
    as the argument of QLabel::setText() and many others, take
    \c{const QString &}. QString provides implicit conversion from
    \c{const char *}, so that things like

    \snippet doc/src/snippets/code/doc_src_unicode.cpp 0

    will work. There is also a function, QObject::tr(), that provides
    translation support, like this:

    \snippet doc/src/snippets/code/doc_src_unicode.cpp 1

    QObject::tr() maps from \c{const char *} to a Unicode string, and
    uses installable QTranslator objects to do the mapping.
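
    Installing a translator could look roughly like the following sketch.
    The file name \c myapp_de is an assumption made for this example; a
    real application would load whichever translation file it ships.

    \code
    #include <QApplication>
    #include <QLabel>
    #include <QTranslator>

    int main(int argc, char *argv[])
    {
        QApplication app(argc, argv);

        // "myapp_de" is a hypothetical translation file name.
        QTranslator translator;
        if (translator.load("myapp_de"))
            app.installTranslator(&translator);

        // tr() returns the translated string if an installed translator
        // has one, and the source text otherwise.
        QLabel label(QObject::tr("Hello"));
        label.show();

        return app.exec();
    }
    \endcode

    The translator is installed before any user-visible string is
    created, so that the label already shows the translated text.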

    Qt provides a number of built-in QTextCodec classes, that is,
    classes that know how to translate between Unicode and legacy
    encodings to support programs that must talk to other programs or
    read/write files in legacy file formats.

    By default, conversion to/from \c{const char *} uses a
    locale-dependent codec. However, applications can easily find codecs
    for other locales, and set any open file or network connection to use
    a special codec. It is also possible to install new codecs, for
    encodings that the built-in ones do not support. (At the time of
    writing, Vietnamese/VISCII is one such example.)
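
    The following sketch illustrates both approaches: looking up a codec
    by name to convert a byte string, and setting a codec on a text stream
    that reads from a file. The function names \c decodeLatin2() and
    \c readKoi8File() are hypothetical, and the encodings are chosen only
    as examples.

    \code
    #include <QByteArray>
    #include <QFile>
    #include <QString>
    #include <QTextCodec>
    #include <QTextStream>

    // Hypothetical helper: decode bytes known to be in ISO 8859-2.
    QString decodeLatin2(const QByteArray &bytes)
    {
        QTextCodec *codec = QTextCodec::codecForName("ISO-8859-2");
        if (!codec)
            return QString();   // codec not available
        return codec->toUnicode(bytes);
    }

    // Hypothetical helper: read a whole file known to be in KOI8-R.
    QString readKoi8File(const QString &path)
    {
        QFile file(path);
        if (!file.open(QIODevice::ReadOnly | QIODevice::Text))
            return QString();   // file could not be opened

        QTextStream in(&file);
        in.setCodec("KOI8-R");  // override the locale-dependent default
        return in.readAll();
    }
    \endcode

    Both helpers return a null QString on failure; an application that
    needs to distinguish errors from empty input would report them
    separately.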

    Since US-ASCII and ISO-8859-1 are so common, there are also especially
    fast functions for mapping to and from them. For example, to open an
    application's icon one might do this:

    \snippet doc/src/snippets/code/doc_src_unicode.cpp 2

    or

    \snippet doc/src/snippets/code/doc_src_unicode.cpp 3

    Regarding output, Qt will do a best-effort conversion from
    Unicode to whatever encoding the system and fonts provide.
    The quality of this conversion depends on the operating system,
    locale, font availability, and Qt's support for the characters used.
    We will extend this in upcoming versions, with emphasis on the most
    common locales first.

    \sa {Internationalization with Qt}
*/