Fill out the Unicode section, somewhat uncertainly

author: Andrew M. Kuchling <amk@amk.ca> 2001-07-19 01:48:08 (GMT)
committer: Andrew M. Kuchling <amk@amk.ca> 2001-07-19 01:48:08 (GMT)
commit: f5fec3c88a2fbb025fba7f1625240058a875c8ea (patch)
tree: bc45a1b1e1116779c01a3efae1d3e6ddce0257f1 /Doc
parent: 8cfa9055cf1b299bdf321f916ccd562993fb5963 (diff)
1 files changed, 24 insertions, 7 deletions
diff --git a/Doc/whatsnew/whatsnew22.tex b/Doc/whatsnew/whatsnew22.tex
index f181dcf..96b0972 100644
--- a/Doc/whatsnew/whatsnew22.tex
+++ b/Doc/whatsnew/whatsnew22.tex
@@ -340,11 +340,21 @@ and Tim Peters, with other fixes from the Python Labs crew.}
 
 Python's Unicode support has been enhanced a bit in 2.2.  Unicode
 strings are usually stored as UCS-2, as 16-bit unsigned integers.
-Python 2.2 can also be compiled to use UCS-4, 32-bit unsigned integers
-by supplying \longprogramopt{enable-unicode=ucs4} to the configure script.
-
-XXX explain surrogates?  I have to figure out what the changes mean to users.
-
+Python 2.2 can also be compiled to use UCS-4, 32-bit unsigned
+integers, as its internal encoding by supplying
+\longprogramopt{enable-unicode=ucs4} to the configure script.  When
+built to use UCS-4, in theory Python could handle Unicode characters
+from U-00000000 to U-7FFFFFFF.  Being able to use UCS-4 internally is
+a necessary step to do that, but it's not the only step, and in Python
+2.2alpha1 the work isn't complete yet.  For example, the
+\function{unichr()} function still only accepts values from 0 to
+65535, and there's no \code{\e U} notation for embedding characters
+greater than 65535 in a Unicode string literal.  All this is the
+province of the still-unimplemented PEP 261, ``Support for `wide'
+Unicode characters''; consult it for further details, and please offer
+comments and suggestions on the proposal it describes.
+
+Another change is much simpler to explain.
 Since their introduction, Unicode strings have supported an
 \method{encode()} method to convert the string to a selected encoding
 such as UTF-8 or Latin-1.  A symmetric
@@ -375,9 +385,16 @@ end
 'furrfu'
 \end{verbatim}
 
-References: http://mail.python.org/pipermail/i18n-sig/2001-June/001107.html  
-and following thread.
+\method{encode()} and \method{decode()} were implemented by
+Marc-Andr\'e Lemburg.  The changes to support using UCS-4 internally
+were implemented by Fredrik Lundh and Martin von L\"owis.
+
+\begin{seealso}
+
+\seepep{261}{Support for `wide' Unicode characters}{PEP written by
+Paul Prescod.  Not yet accepted or fully implemented.}
 
+\end{seealso}
 
 %======================================================================
 \section{PEP 227: Nested Scopes}
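
Illustration (not part of the patch): the UCS-2 versus UCS-4 distinction described in the added text comes down to how many fixed-width units a character above 65535 needs. The sketch below is an assumption-laden example written for a modern Python 3 interpreter, not the 2.2alpha1 build the patch documents; the helper names utf16_units and utf32_units are hypothetical and exist only for this example.

```python
# Sketch only: runs on modern Python 3, not on the Python 2.2 build
# discussed in the patch. Helper names are hypothetical.

def utf16_units(ch):
    """Return the 16-bit code units needed to store the string ch."""
    data = ch.encode("utf-16-be")  # big-endian, no byte-order mark
    return [int.from_bytes(data[i:i + 2], "big")
            for i in range(0, len(data), 2)]


def utf32_units(ch):
    """Return the 32-bit code units needed to store the string ch."""
    data = ch.encode("utf-32-be")  # big-endian, no byte-order mark
    return [int.from_bytes(data[i:i + 4], "big")
            for i in range(0, len(data), 4)]


# A character above 65535 does not fit in one 16-bit unit, so a
# 16-bit representation needs a surrogate pair; a 32-bit one does not.
wide = chr(0x10400)
print([hex(u) for u in utf16_units(wide)])
print([hex(u) for u in utf32_units(wide)])

# encode() and decode() are symmetric: decoding the encoded bytes
# with the same codec gives back the original string.
text = "caf\u00e9"
assert text.encode("utf-8").decode("utf-8") == text
```

With 16-bit units the wide character takes two entries (a surrogate pair), while with 32-bit units it takes one. That is the practical difference a UCS-4 build is meant to remove.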