path: root/Include/unicodeobject.h
author: Marc-André Lemburg <mal@egenix.com>, 2001-05-21 20:30:15 (GMT)
committer: Marc-André Lemburg <mal@egenix.com>, 2001-05-21 20:30:15 (GMT)
commit: 489b56e04480b8ca3f2d1676265e67c65bae788d
tree: a148a1f74890d004f6434a77eb14185b76c73c77 /Include/unicodeobject.h
parent: f52d27e52d289b99837b4555fb3f757f2c89f4ad
This patch changes the behaviour of the UTF-16 codec family. Only the
UTF-16 codec will now interpret and remove a *leading* BOM mark.
Subsequent BOM characters are no longer interpreted and removed.
UTF-16-LE and -BE pass through all BOM mark characters. These changes
should get the UTF-16 codec more in line with what the Unicode FAQ
recommends with respect to BOM marks.
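
The behaviour described in the commit message can be observed from Python code. The following is a minimal sketch, assuming a present-day Python 3 interpreter whose `utf-16`, `utf-16-le` and `utf-16-be` codecs still follow this rule; the byte string is made-up sample data, not taken from the patch or its tests.

```python
# Illustrative sample data (an assumption, not part of the patch):
# a little-endian BOM, "a", a second U+FEFF, then "b".
BOM_LE = b"\xff\xfe"  # U+FEFF encoded as UTF-16 little endian

data = BOM_LE + "a".encode("utf-16-le") + BOM_LE + "b".encode("utf-16-le")

# UTF-16: the leading BOM selects the byte order and is removed;
# the later U+FEFF is kept as an ordinary character.
via_utf16 = data.decode("utf-16")

# UTF-16-LE: the byte order is fixed, so every U+FEFF passes through.
via_utf16_le = data.decode("utf-16-le")

print(ascii(via_utf16))
print(ascii(via_utf16_le))
```

Only the first result loses a character: the leading BOM is consumed as a byte order signal, while the second U+FEFF survives in both results.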
Diffstat (limited to 'Include/unicodeobject.h')
-rw-r--r-- Include/unicodeobject.h 9
1 files changed, 5 insertions, 4 deletions
diff --git a/Include/unicodeobject.h b/Include/unicodeobject.h
index 988ea1b..f91a5a0 100644
--- a/Include/unicodeobject.h
+++ b/Include/unicodeobject.h
@@ -459,10 +459,11 @@ extern DL_IMPORT(PyObject*) PyUnicode_EncodeUTF8(
*byteorder == 0: native order
*byteorder == 1: big endian
- and then switches according to all BOM marks it finds in the input
- data. BOM marks are not copied into the resulting Unicode string.
- After completion, *byteorder is set to the current byte order at
- the end of input data.
+ In native mode, the first two bytes of the stream are checked for a
+ BOM mark. If found, the BOM mark is analysed, the byte order
+ adjusted and the BOM skipped. In the other modes, no BOM mark
+ interpretation is done. After completion, *byteorder is set to the
+ current byte order at the end of input data.
If byteorder is NULL, the codec starts in native order mode.
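
The revised comment can be modelled in a few lines of Python. This is a rough sketch, not the C implementation: `decode_utf16_sketch` is a hypothetical helper, the value -1 for little endian comes from the part of the header comment above this hunk, and reporting 0 when no BOM is found in native mode is an assumption of the sketch.

```python
import sys


def decode_utf16_sketch(data, byteorder=0):
    """Hypothetical model of the documented byteorder semantics.

    byteorder: -1 = little endian, 0 = native order, 1 = big endian.
    Returns (text, byteorder_at_end_of_input).
    """
    if byteorder == 0:
        # Native mode: only the first two bytes are checked for a BOM.
        head = data[:2]
        if head == b"\xff\xfe":
            byteorder, data = -1, data[2:]  # little-endian BOM, skipped
        elif head == b"\xfe\xff":
            byteorder, data = 1, data[2:]   # big-endian BOM, skipped
    # In the other modes no BOM interpretation is done at all.
    effective = byteorder
    if effective == 0:
        # No BOM found: fall back to the machine's native order.
        effective = -1 if sys.byteorder == "little" else 1
    codec = "utf-16-le" if effective == -1 else "utf-16-be"
    return data.decode(codec), byteorder


# Native mode: the BOM is analysed and skipped, byte order becomes big endian.
text_native, order_native = decode_utf16_sketch(b"\xfe\xff\x00a")

# Explicit big endian: the same bytes keep U+FEFF as a character.
text_be, order_be = decode_utf16_sketch(b"\xfe\xff\x00a", 1)
```

Passing an explicit byte order is therefore the way to keep a leading U+FEFF in the decoded text.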