path: root/Doc/library/unicodedata.rst
author: Guido van Rossum <guido@python.org> 2007-08-17 00:24:54 (GMT)
committer: Guido van Rossum <guido@python.org> 2007-08-17 00:24:54 (GMT)
commit: da27fd267346e213512f4835dd0b7b40e6172bbe
tree: 8f9f7dbafb09976c7dbe412992e9270f62455246 /Doc/library/unicodedata.rst
parent: af554a0e17ceb0e6a3cc0c07e9cf6db2f80c1ad9
Manually patched a few things that didn't get merged in, but should.
Diffstat (limited to 'Doc/library/unicodedata.rst')
-rw-r--r-- Doc/library/unicodedata.rst 6
1 files changed, 5 insertions, 1 deletions
diff --git a/Doc/library/unicodedata.rst b/Doc/library/unicodedata.rst
index 017d4ee..ec788c5 100644
--- a/Doc/library/unicodedata.rst
+++ b/Doc/library/unicodedata.rst
@@ -107,7 +107,7 @@ the following functions:
based on the definition of canonical equivalence and compatibility equivalence.
In Unicode, several characters can be expressed in various way. For example, the
character U+00C7 (LATIN CAPITAL LETTER C WITH CEDILLA) can also be expressed as
- the sequence U+0043 (LATIN CAPITAL LETTER C) U+0327 (COMBINING CEDILLA).
+ the sequence U+0327 (COMBINING CEDILLA) U+0043 (LATIN CAPITAL LETTER C).
For each character, there are two normal forms: normal form C and normal form D.
Normal form D (NFD) is also known as canonical decomposition, and translates
@@ -126,6 +126,10 @@ the following functions:
(NFKC) first applies the compatibility decomposition, followed by the canonical
composition.
+ Even if two unicode strings are normalized and look the same to
+ a human reader, if one has combining characters and the other
+ doesn't, they may not compare equal.
+
.. versionadded:: 2.3
In addition, the module exposes the following constant:
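
Reviewer note (not part of the commit): the behaviour described in the patched paragraphs can be checked with a minimal sketch like the one below. It assumes Python 3 string literals and uses only `unicodedata.normalize`; the variable names are illustrative. As far as the standard library reports it, the canonical decomposition of U+00C7 puts the base letter U+0043 first and the combining cedilla U+0327 second, so the sketch uses that order.

```python
import unicodedata

composed = "\u00c7"          # LATIN CAPITAL LETTER C WITH CEDILLA
decomposed = "\u0043\u0327"  # LATIN CAPITAL LETTER C, then COMBINING CEDILLA

# Both strings render the same glyph but hold different code points,
# so a plain comparison treats them as different strings.
equal_without_normalizing = composed == decomposed

# NFD splits the precomposed character into base letter + combining mark.
nfd = unicodedata.normalize("NFD", composed)

# NFC recombines the base letter and combining mark into one code point.
nfc = unicodedata.normalize("NFC", decomposed)

# A string in NFC and the "same" text in NFD still compare unequal:
# each is normalized, but only one contains a combining character.
equal_across_forms = nfc == nfd

print([hex(ord(ch)) for ch in nfd])
print([hex(ord(ch)) for ch in nfc])
print(equal_without_normalizing, equal_across_forms)
```

To compare such strings reliably, normalize both sides to the same form before comparing.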