author | Thomas Kluyver <takowl@gmail.com> | 2018-06-05 17:26:39 (GMT) |
---|---|---|
committer | Carol Willing <carolcode@willingconsulting.com> | 2018-06-05 17:26:39 (GMT) |
commit | c56b17bd8c7a3fd03859822246633d2c9586f8bd (patch) | |
tree | 346fb8b3a6614679232792b3f46398b33e5f3c0e /Doc/library/tokenize.rst | |
parent | c2745d2d05546d76f655ab450eb23d1af39e0b1c (diff) | |
bpo-12486: Document tokenize.generate_tokens() as public API (#6957)
* Document tokenize.generate_tokens()
* Add news file
* Add test for generate_tokens
* Document behaviour around ENCODING token
* Add generate_tokens to __all__
Diffstat (limited to 'Doc/library/tokenize.rst')
-rw-r--r-- | Doc/library/tokenize.rst | 13 |
1 files changed, 12 insertions, 1 deletions
diff --git a/Doc/library/tokenize.rst b/Doc/library/tokenize.rst
index 4c0a0ce..111289c 100644
--- a/Doc/library/tokenize.rst
+++ b/Doc/library/tokenize.rst
@@ -57,6 +57,16 @@ The primary entry point is a :term:`generator`:
    :func:`.tokenize` determines the source encoding of the file by looking for a
    UTF-8 BOM or encoding cookie, according to :pep:`263`.
 
+.. function:: generate_tokens(readline)
+
+   Tokenize a source reading unicode strings instead of bytes.
+
+   Like :func:`.tokenize`, the *readline* argument is a callable returning
+   a single line of input. However, :func:`generate_tokens` expects *readline*
+   to return a str object rather than bytes.
+
+   The result is an iterator yielding named tuples, exactly like
+   :func:`.tokenize`. It does not yield an :data:`~token.ENCODING` token.
 
 All constants from the :mod:`token` module are also exported from
 :mod:`tokenize`.
@@ -79,7 +89,8 @@ write back the modified script.
     positions) may change.
 
     It returns bytes, encoded using the :data:`~token.ENCODING` token, which
-    is the first token sequence output by :func:`.tokenize`.
+    is the first token sequence output by :func:`.tokenize`. If there is no
+    encoding token in the input, it returns a str instead.
 
 
 :func:`.tokenize` needs to detect the encoding of source files it tokenizes. The
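
The two behaviours documented by this patch can be checked with a short script. The following is a minimal sketch, not part of the commit: the sample source string and the variable names are assumptions chosen for illustration. It tokenizes the same one-line program through both entry points and passes each token stream to `untokenize()`.

```python
import io
import tokenize

# Illustrative input; any small, valid Python source would do.
source = "x = 1 + 2\n"

# generate_tokens() expects a readline callable that returns str lines.
str_tokens = list(tokenize.generate_tokens(io.StringIO(source).readline))

# tokenize() expects a readline callable that returns bytes lines.
byte_tokens = list(
    tokenize.tokenize(io.BytesIO(source.encode("utf-8")).readline)
)

# Only the bytes-based stream begins with an ENCODING token.
str_has_encoding = any(tok.type == tokenize.ENCODING for tok in str_tokens)
first_byte_token_type = byte_tokens[0].type

# untokenize() returns str when there is no ENCODING token in its input,
# and bytes when there is one.
round_trip_str = tokenize.untokenize(str_tokens)
round_trip_bytes = tokenize.untokenize(byte_tokens)

print(type(round_trip_str).__name__, type(round_trip_bytes).__name__)
```

Apart from the leading `ENCODING` token, the two streams should carry the same sequence of token types, which is why `generate_tokens()` is convenient when the source is already held as a decoded string.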