| | | |
|---|---|---|
| author | Antoine Pitrou <solipsis@pitrou.net> | 2011-10-23 22:14:43 (GMT) |
| committer | Antoine Pitrou <solipsis@pitrou.net> | 2011-10-23 22:14:43 (GMT) |
| commit | fd9b4166bb2adeaeed49782b1855e1acb41924a0 | |
| tree | a8d8dca6f182650cb225894eba5167368f2827c1 /Doc | |
| parent | 01fd26c7463483a9ce021606eb4e03096ecdfafd | |
Improve / clean up the PEP 393 description
Diffstat (limited to 'Doc')
-rw-r--r-- | Doc/whatsnew/3.3.rst | 36 |
1 files changed, 20 insertions, 16 deletions
```diff
diff --git a/Doc/whatsnew/3.3.rst b/Doc/whatsnew/3.3.rst
index ce47608..fb1c7ce 100644
--- a/Doc/whatsnew/3.3.rst
+++ b/Doc/whatsnew/3.3.rst
@@ -52,25 +52,27 @@ This article explains the new features in Python 3.3, compared to 3.2.
 PEP 393: Flexible String Representation
 =======================================
 
-[Abstract copied from the PEP: The Unicode string type is changed to support
-multiple internal representations, depending on the character with the largest
-Unicode ordinal (1, 2, or 4 bytes). This allows a space-efficient
-representation in common cases, but gives access to full UCS-4 on all systems.
-For compatibility with existing APIs, several representations may exist in
-parallel; over time, this compatibility should be phased out.]
+The Unicode string type is changed to support multiple internal
+representations, depending on the character with the largest Unicode ordinal
+(1, 2, or 4 bytes) in the represented string. This allows a space-efficient
+representation in common cases, but gives access to full UCS-4 on all
+systems. For compatibility with existing APIs, several representations may
+exist in parallel; over time, this compatibility should be phased out.
 
-PEP 393 is fully backward compatible. The legacy API should remain
-available at least five years. Applications using the legacy API will not
-fully benefit of the memory reduction, or worse may use a little bit more
-memory, because Python may have to maintain two versions of each string (in
-the legacy format and in the new efficient storage).
+On the Python side, there should be no downside to this change.
 
-XXX Add list of changes introduced by :pep:`393` here:
+On the C API side, PEP 393 is fully backward compatible. The legacy API
+should remain available at least five years. Applications using the legacy
+API will not fully benefit of the memory reduction, or - worse - may use
+a bit more memory, because Python may have to maintain two versions of each
+string (in the legacy format and in the new efficient storage).
+
+Changes introduced by :pep:`393` are the following:
 
 * Python now always supports the full range of Unicode codepoints, including
   non-BMP ones (i.e. from ``U+0000`` to ``U+10FFFF``). The distinction between
   narrow and wide builds no longer exists and Python now behaves like a wide
-  build.
+  build, even under Windows.
 
 * The storage of Unicode strings now depends on the highest codepoint in the string:
 
@@ -86,7 +88,8 @@ XXX Add list of changes introduced by :pep:`393` here:
   XXX The result should be moved in the PEP and a small summary about
   performances and a link to the PEP should be added here.
 
-* Some of the problems visible on narrow builds have been fixed, for example:
+* With the death of narrow builds, the problems specific to narrow builds have
+  also been fixed, for example:
 
   * :func:`len` now always returns 1 for non-BMP characters, so
     ``len('\U0010FFFF') == 1``;
@@ -94,10 +97,11 @@ XXX Add list of changes introduced by :pep:`393` here:
   * surrogate pairs are not recombined in string literals, so
     ``'\uDBFF\uDFFF' != '\U0010FFFF'``;
 
-  * indexing or slicing a non-BMP characters doesn't return surrogates anymore,
+  * indexing or slicing non-BMP characters returns the expected value,
     so ``'\U0010FFFF'[0]`` now returns ``'\U0010FFFF'`` and not ``'\uDBFF'``;
 
-  * several other functions in the stdlib now handle correctly non-BMP codepoints.
+  * several other functions in the standard library now handle correctly
+    non-BMP codepoints.
 
 * The value of :data:`sys.maxunicode` is now always ``1114111`` (``0x10FFFF``
   in hexadecimal). The :c:func:`PyUnicode_GetMax` function still returns
```
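The Python-level behavior described in the patched text (no more narrow builds, `len()` and indexing on non-BMP characters, unrecombined surrogate literals, `sys.maxunicode`, and the 1/2/4-byte storage chosen from the highest code point) can be checked with a short script. This is a minimal sketch, assuming CPython 3.3 or later: `bytes_per_code_point` is a hypothetical helper introduced only for illustration, and the widths it reports rely on `sys.getsizeof`, whose exact values are a CPython implementation detail rather than a language guarantee.

```python
import sys

# Python 3.3+ (PEP 393): every build behaves like a former "wide" build.
assert sys.maxunicode == 0x10FFFF == 1114111

top = '\U0010FFFF'  # highest code point, outside the BMP

# len() counts code points, so one non-BMP character has length 1.
assert len(top) == 1

# Indexing yields the character itself, never a lone surrogate half.
assert top[0] == '\U0010FFFF'
assert top[0] != '\uDBFF'

# A surrogate pair spelled out in a literal is not recombined:
# it stays two separate code points.
pair = '\uDBFF\uDFFF'
assert pair != top
assert len(pair) == 2


def bytes_per_code_point(s: str) -> int:
    """Hypothetical helper: estimate the internal storage width of *s*.

    Appending a character already present in *s* leaves the highest
    code point unchanged, so the growth reported by sys.getsizeof()
    equals the per-character width (CPython-specific assumption).
    """
    return sys.getsizeof(s + s[0]) - sys.getsizeof(s)


print(bytes_per_code_point('abc'))                    # 1 (ASCII)
print(bytes_per_code_point('caf\xe9'))                # 1 (Latin-1 range)
print(bytes_per_code_point('\u20ac 100'))             # 2 (BMP, euro sign)
print(bytes_per_code_point('a' * 50 + '\U0001F600'))  # 4 (one non-BMP char)
```

The last line shows the point of the "highest codepoint" rule: a single non-BMP character is enough to store the whole string at 4 bytes per code point, while pure ASCII or Latin-1 text stays at 1 byte.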