author | Marc-André Lemburg <mal@egenix.com> | 2001-05-21 20:30:15 (GMT) |
---|---|---|
committer | Marc-André Lemburg <mal@egenix.com> | 2001-05-21 20:30:15 (GMT) |
commit | 489b56e04480b8ca3f2d1676265e67c65bae788d (patch) | |
tree | a148a1f74890d004f6434a77eb14185b76c73c77 /Include | |
parent | f52d27e52d289b99837b4555fb3f757f2c89f4ad (diff) | |
This patch changes the behaviour of the UTF-16 codec family. Only the
UTF-16 codec will now interpret and remove a *leading* BOM mark.
Subsequent BOM characters are no longer interpreted and removed.
UTF-16-LE and -BE pass through all BOM mark characters.
These changes should get the UTF-16 codec more in line with what
the Unicode FAQ recommends with respect to BOM marks.
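
The behaviour described above can be observed from Python code. The following is a minimal sketch, assuming a current CPython 3 interpreter, where these codec semantics still apply; the variable names are illustrative only and are not part of the patch.

```python
# Assumption: a current CPython 3 interpreter.
BOM_LE = b"\xff\xfe"  # U+FEFF encoded in little-endian byte order
BOM_BE = b"\xfe\xff"  # U+FEFF encoded in big-endian byte order

# Two BOMs followed by the letter "a", all little-endian.
data = BOM_LE + BOM_LE + "a".encode("utf-16-le")

# "utf-16": only the leading BOM is interpreted and removed;
# the second one stays in the result as U+FEFF.
generic = data.decode("utf-16")

# "utf-16-le": no BOM interpretation, both BOMs pass through.
explicit_le = data.decode("utf-16-le")

# "utf-16-be": likewise, a leading big-endian BOM passes through.
explicit_be = (BOM_BE + "a".encode("utf-16-be")).decode("utf-16-be")

print(ascii(generic))      # '\ufeffa'
print(ascii(explicit_le))  # '\ufeff\ufeffa'
print(ascii(explicit_be))  # '\ufeffa'
```
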
Diffstat (limited to 'Include')
-rw-r--r-- | Include/unicodeobject.h | 9 |
1 files changed, 5 insertions, 4 deletions
diff --git a/Include/unicodeobject.h b/Include/unicodeobject.h
index 988ea1b..f91a5a0 100644
--- a/Include/unicodeobject.h
+++ b/Include/unicodeobject.h
@@ -459,10 +459,11 @@ extern DL_IMPORT(PyObject*) PyUnicode_EncodeUTF8(
     *byteorder == 0:  native order
     *byteorder == 1:  big endian
 
-   and then switches according to all BOM marks it finds in the input
-   data. BOM marks are not copied into the resulting Unicode string.
-   After completion, *byteorder is set to the current byte order at
-   the end of input data.
+   In native mode, the first two bytes of the stream are checked for a
+   BOM mark. If found, the BOM mark is analysed, the byte order
+   adjusted and the BOM skipped. In the other modes, no BOM mark
+   interpretation is done. After completion, *byteorder is set to the
+   current byte order at the end of input data.
 
    If byteorder is NULL, the codec starts in native order mode.
 
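
The `*byteorder` contract documented in the new comment can be exercised without writing C. The sketch below assumes a current CPython 3 interpreter, where `ctypes.pythonapi` exposes `PyUnicode_DecodeUTF16`; the `decode` helper is hypothetical and exists only for this illustration.

```python
import ctypes

# Assumption: running on CPython, where ctypes.pythonapi exposes the C API.
decode_utf16 = ctypes.pythonapi.PyUnicode_DecodeUTF16
decode_utf16.argtypes = [
    ctypes.c_char_p,               # s: encoded bytes
    ctypes.c_ssize_t,              # size: number of bytes
    ctypes.c_char_p,               # errors: NULL means "strict"
    ctypes.POINTER(ctypes.c_int),  # byteorder: in/out parameter
]
decode_utf16.restype = ctypes.py_object


def decode(data, byteorder):
    """Hypothetical helper: return (decoded text, *byteorder after the call)."""
    order = ctypes.c_int(byteorder)
    text = decode_utf16(data, len(data), None, ctypes.byref(order))
    return text, order.value


le_with_bom = b"\xff\xfe" + "a".encode("utf-16-le")
be_with_bom = b"\xfe\xff" + "a".encode("utf-16-be")

# Native mode (0): the leading BOM is analysed and skipped, and
# *byteorder reports the byte order that was detected.
native_le = decode(le_with_bom, 0)
native_be = decode(be_with_bom, 0)

# Explicit little-endian mode (-1): no BOM interpretation, so the BOM
# is kept in the decoded text as U+FEFF.
forced_le = decode(le_with_bom, -1)

for label, result in [("native, LE BOM", native_le),
                      ("native, BE BOM", native_be),
                      ("forced LE", forced_le)]:
    print(label, ascii(result[0]), result[1])
```

Passing the byte order through a `ctypes.c_int` mirrors the C signature's `int *byteorder`, which is how the detected order is reported back to the caller.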