Diffstat (limited to 'Doc/library/unicodedata.rst')
-rw-r--r-- | Doc/library/unicodedata.rst | 6 |
1 files changed, 5 insertions, 1 deletions
diff --git a/Doc/library/unicodedata.rst b/Doc/library/unicodedata.rst
index 017d4ee..ec788c5 100644
--- a/Doc/library/unicodedata.rst
+++ b/Doc/library/unicodedata.rst
@@ -107,7 +107,7 @@ the following functions:
    based on the definition of canonical equivalence and compatibility equivalence.
    In Unicode, several characters can be expressed in various way. For example, the
    character U+00C7 (LATIN CAPITAL LETTER C WITH CEDILLA) can also be expressed as
-   the sequence U+0043 (LATIN CAPITAL LETTER C) U+0327 (COMBINING CEDILLA).
+   the sequence U+0327 (COMBINING CEDILLA) U+0043 (LATIN CAPITAL LETTER C).
 
    For each character, there are two normal forms: normal form C and normal form D.
    Normal form D (NFD) is also known as canonical decomposition, and translates
@@ -126,6 +126,10 @@ the following functions:
    (NFKC) first applies the compatibility decomposition, followed by the canonical
    composition.
 
+   Even if two unicode strings are normalized and look the same to
+   a human reader, if one has combining characters and the other
+   doesn't, they may not compare equal.
+
    .. versionadded:: 2.3
 
 In addition, the module exposes the following constant:
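
The behaviour described in the added paragraph can be checked with a short sketch. It assumes Python 3, where string literals are Unicode, and it uses a hypothetical helper `same_text` that is not part of `unicodedata`. As far as the standard decomposition goes, NFD of U+00C7 yields the base letter U+0043 followed by the combining mark U+0327, so it may be worth double-checking the order given in the first hunk.

```python
import unicodedata

composed = "\u00C7"          # LATIN CAPITAL LETTER C WITH CEDILLA
decomposed = "\u0043\u0327"  # LATIN CAPITAL LETTER C, then COMBINING CEDILLA


def same_text(a, b, form="NFC"):
    """Hypothetical helper: compare two strings after normalizing both to one form."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)


# The two strings render alike but hold different code points.
print(composed == decomposed)

# Canonical decomposition (NFD) turns the single character into base + mark.
print(unicodedata.normalize("NFD", composed) == decomposed)

# Canonical composition (NFC) folds base + mark back into one character.
print(unicodedata.normalize("NFC", decomposed) == composed)

# Normalizing both sides to the same form makes the comparison succeed.
print(same_text(composed, decomposed))
```

Normalizing both operands to the same form before comparing is one common way to avoid the mismatch; which form to pick depends on the application.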