- Issue #719888: Updated tokenize to use a bytes API. generate_tokens has been renamed tokenize and now works with bytes rather than strings. A new detect_encoding function has been added for determining source file encoding according to PEP-0263. Token sequences returned by tokenize always start with an ENCODING token which specifies the encoding used to decode the file. This token is used to encode the output of untokenize back to bytes. Credit goes to Michael "I'm-going-to-name-my-first-child-unittest" Foord from Resolver Systems for this work.
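The bytes-based API described in this message can be sketched as follows. This is a minimal illustration written against the `tokenize` module as it ships in current CPython 3 releases, not code from this commit. The sample source bytes and the variable names are assumptions made for the example.

```python
import io
import tokenize

# Hypothetical sample: Latin-1 encoded source bytes with a PEP 263 coding cookie.
source = "# -*- coding: latin-1 -*-\nx = 'caf\u00e9'\n".encode("latin-1")

# detect_encoding takes a readline callable that yields bytes and returns
# the normalized encoding name plus the raw lines it had to read.
encoding, lines_read = tokenize.detect_encoding(io.BytesIO(source).readline)

# tokenize also takes a bytes readline; the first token it yields is ENCODING.
tokens = list(tokenize.tokenize(io.BytesIO(source).readline))
first = tokens[0]

# untokenize uses the ENCODING token to encode its output back to bytes.
round_tripped = tokenize.untokenize(tokens)

print(encoding)                        # normalized name of the declared encoding
print(first.type == tokenize.ENCODING)
print(type(round_tripped).__name__)
```

Passing a `readline` callable rather than a string keeps the decoding decision inside `tokenize`, which is why the `ENCODING` token is needed to turn the output of `untokenize` back into bytes.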
author: Trent Nelson <trent.nelson@snakebite.org> 2008-03-18 22:41:35 (GMT)
committer: Trent Nelson <trent.nelson@snakebite.org> 2008-03-18 22:41:35 (GMT)
commit: 428de65ca99492436130165bfbaeb56d6d1daec7 (patch)
tree: d6c11516a28d8ca658e1f35ac6d7cc802958e336 /Lib/test/tokenize_tests-latin1-coding-cookie-and-utf8-bom-sig.txt
parent: 112367a980481d54f8c21802ee2538a3485fdd41 (diff)
1 files changed, 13 insertions, 0 deletions
diff --git a/Lib/test/tokenize_tests-latin1-coding-cookie-and-utf8-bom-sig.txt b/Lib/test/tokenize_tests-latin1-coding-cookie-and-utf8-bom-sig.txt
new file mode 100644
index 0000000..4a7582a
--- /dev/null
+++ b/Lib/test/tokenize_tests-latin1-coding-cookie-and-utf8-bom-sig.txt
@@ -0,0 +1,13 @@
+# -*- coding: latin1 -*-
+# IMPORTANT: this file has the utf-8 BOM signature '\xef\xbb\xbf' 
+# at the start of it.  Make sure this is preserved if any changes
+# are made!  Also note that the coding cookie above conflicts with
+# the presense of a utf-8 BOM signature -- this is intended.
+
+# Arbitrary encoded utf-8 text (stolen from test_doctest2.py).
+x = 'ЉЊЈЁЂ'
+def y():
+    """
+    And again in a comment.  ЉЊЈЁЂ
+    """
+    pass
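The test file above deliberately pairs a utf-8 BOM with a conflicting latin1 coding cookie. The sketch below shows how `tokenize.detect_encoding` treats that combination in current CPython 3 releases. The behaviour at the time of this commit may have differed, and the `detect` helper and sample byte strings are hypothetical names introduced only for illustration.

```python
import codecs
import io
import tokenize

# Hypothetical stand-in for the test file: utf-8 BOM followed by a latin1 cookie.
conflicting = codecs.BOM_UTF8 + b"# -*- coding: latin1 -*-\nx = 1\n"
bom_only = codecs.BOM_UTF8 + b"x = 1\n"


def detect(data):
    """Return the detected encoding, or a short note if detection is rejected."""
    try:
        return tokenize.detect_encoding(io.BytesIO(data).readline)[0]
    except SyntaxError as exc:
        # Current CPython refuses a BOM combined with a non-utf-8 cookie.
        return f"rejected: {exc}"


result_conflict = detect(conflicting)
result_bom_only = detect(bom_only)

print(result_conflict)
print(result_bom_only)
```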