Update and reorganize the whatsnew entry for PEP 393.

author: Ezio Melotti <ezio.melotti@gmail.com> 2011-09-29 05:34:36 (GMT)
committer: Ezio Melotti <ezio.melotti@gmail.com> 2011-09-29 05:34:36 (GMT)
commit: 397546ac2f4f07fe6eca2fff50c72de9e045f483 (patch)
tree: 56e3af1476a21de2373e2376e9fbf617f08ac81e /Doc/whatsnew
parent: 9d3579b7d68816dc35da47a6a972e57f6c936dea (diff)
download: cpython-397546ac2f4f07fe6eca2fff50c72de9e045f483.zip
cpython-397546ac2f4f07fe6eca2fff50c72de9e045f483.tar.gz
cpython-397546ac2f4f07fe6eca2fff50c72de9e045f483.tar.bz2
1 files changed, 42 insertions, 21 deletions
diff --git a/Doc/whatsnew/3.3.rst b/Doc/whatsnew/3.3.rst
index 32d7a3e..c2cf524 100644
--- a/Doc/whatsnew/3.3.rst
+++ b/Doc/whatsnew/3.3.rst
@@ -58,35 +58,56 @@ PEP XXX: Stub
 PEP 393: Flexible String Representation
 =======================================
 
+XXX Give a short introduction about :pep:`393`.
+
+PEP 393 is fully backward compatible. The legacy API should remain
+available for at least five years. Applications using the legacy API will not
+fully benefit from the memory reduction, and may even use a little more
+memory, because Python may have to maintain two versions of each string (in
+the legacy format and in the new efficient storage).
+
 XXX Add list of changes introduced by :pep:`393` here:
 
+* Python now always supports the full range of Unicode codepoints
+  (``U+0000`` to ``U+10FFFF``), including non-BMP ones.  The distinction
+  between narrow and wide builds no longer exists and Python now behaves like
+  a wide build.
+
+* The storage of Unicode strings now depends on the highest codepoint in the string:
+
+  * pure ASCII and Latin1 strings (``U+0000-U+00FF``) use 1 byte per codepoint;
+
+  * BMP strings (``U+0000-U+FFFF``) use 2 bytes per codepoint;
+
+  * non-BMP strings (``U+10000-U+10FFFF``) use 4 bytes per codepoint.
+
+.. The memory usage of Python 3.3 is two to three times smaller than Python 3.2,
+   and a little bit better than Python 2.7, on a `Django benchmark
+   <http://mail.python.org/pipermail/python-dev/2011-September/113714.html>`_.
+   XXX The result should be moved into the PEP, and a short summary of the
+   performance results with a link to the PEP should be added here.
+
+* Some of the problems visible on narrow builds have been fixed, for example:
+
+  * :func:`len` now always returns 1 for non-BMP characters,
+    so ``len('\U0010FFFF') == 1``;
+
+  * surrogate pairs are not recombined in string literals,
+    so ``'\uDBFF\uDFFF' != '\U0010FFFF'``;
+
+  * indexing or slicing a non-BMP character no longer returns surrogates,
+    so ``'\U0010FFFF'[0]`` now returns ``'\U0010FFFF'`` and not ``'\uDBFF'``;
+
+  * several other functions in the stdlib now correctly handle non-BMP codepoints.
+
 * The value of :data:`sys.maxunicode` is now always ``1114111`` (``0x10FFFF``
   in hexadecimal).  The :c:func:`PyUnicode_GetMax` function still returns
   either ``0xFFFF`` or ``0x10FFFF`` for backward compatibility, and it should
   not be used with the new Unicode API (see :issue:`13054`).
 
-* Non-BMP characters (U+10000-U+10FFFF range) are no more special cases.
-  ``'\U0010FFFF'[0]`` is now ``'\U0010FFFF'`` on any platform, instead of
-  ``'\uDFFF'`` on narrow build or ``'\U0010FFFF'`` on wide build. And
-  ``len('\U0010FFFF')`` is now ``1`` on any platform, instead of ``2`` on
-  narrow build or ``1`` on wide build. More generally, most bugs related to
-  non-BMP characters are now fixed. For example, :func:`unicodedata.normalize`
-  handles correctly non-BMP characters on all platforms.
-
-* The storage of Unicode string is now adapted on the content of the string.
-  Pure ASCII and Latin1 strings (U+0000-U+00FF) use 1 byte per character, BMP
-  strings (U+0000-U+FFFF) use 2 bytes per character, and non-BMP characters
-  (U+10000-U+10FFFF range) use 4 bytes per characters. The memory usage of
-  Python 3.3 is two to three times smaller than Python 3.2, and a little bit
-  better than Python 2.7, on a `Django benchmark
-  <http://mail.python.org/pipermail/python-dev/2011-September/113714.html>`_.
-
-* The PEP 393 is fully backward compatible. The legacy API should remain
-  available at least five years. Applications using the legacy API will not
-  fully benefit of the memory reduction, or worse may use a little bit more
-  memory, because Python may have to maintain two versions of each string (in
-  the legacy format and in the new efficient storage).
+* The :file:`./configure` flag ``--with-wide-unicode`` has been removed.
 
+XXX mention new and deprecated functions and macros
 
 Other Language Changes
 ======================
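The behaviour changes listed in the whatsnew entry in this commit (``len``, indexing, surrogate literals, ``sys.maxunicode``) can be checked with a short script. This is a sketch that assumes CPython 3.3 or later; on a 3.2 narrow build several of these assertions would fail (for example, ``len('\U0010FFFF')`` was ``2`` there).

```python
import sys

# A single non-BMP codepoint: the highest one Unicode allows.
s = '\U0010FFFF'

# One codepoint gives length 1 on every platform (no narrow/wide distinction).
assert len(s) == 1

# Indexing returns the whole character, not the high surrogate '\uDBFF'.
assert s[0] == '\U0010FFFF'
assert s[0] != '\uDBFF'

# An explicit surrogate pair in a literal is NOT recombined: it stays
# two separate codepoints and does not compare equal to U+10FFFF.
pair = '\uDBFF\uDFFF'
assert len(pair) == 2
assert pair != s

# sys.maxunicode is always 0x10FFFF (1114111).
assert sys.maxunicode == 0x10FFFF == 1114111

print(len(s), hex(ord(s[0])), len(pair), sys.maxunicode)  # 1 0x10ffff 2 1114111
```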
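The 1/2/4 bytes-per-codepoint storage described in the whatsnew entry can be observed indirectly with ``sys.getsizeof``. The sketch below is CPython-specific (other implementations may report different sizes), and ``bytes_per_codepoint`` is a hypothetical helper written for illustration: it compares two strings that differ by one character, so the fixed object header cancels out and only the per-codepoint width remains.

```python
import sys


def bytes_per_codepoint(ch):
    """Hypothetical helper: estimate the storage width of strings built from ch.

    The sizes of two strings differing by exactly one codepoint are
    subtracted, so the constant object header cancels out.
    """
    return sys.getsizeof(ch * 101) - sys.getsizeof(ch * 100)


# Each sample string repeats a single character, so that character is
# also the highest codepoint and decides the width of the whole string.
samples = [
    ("ASCII", "a"),              # U+0061
    ("Latin-1", "\u00e9"),       # U+00E9, still within U+0000-U+00FF
    ("BMP", "\u20ac"),           # U+20AC, within U+0000-U+FFFF
    ("non-BMP", "\U0001F600"),   # U+1F600, above U+FFFF
]

widths = {label: bytes_per_codepoint(ch) for label, ch in samples}
print(widths)  # on CPython 3.3+: {'ASCII': 1, 'Latin-1': 1, 'BMP': 2, 'non-BMP': 4}
```

Measuring a difference rather than an absolute size keeps the example independent of the header layout, which differs between CPython versions.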