author | R. David Murray <rdmurray@bitdance.com> | 2010-10-08 15:55:28 (GMT) |
---|---|---|
committer | R. David Murray <rdmurray@bitdance.com> | 2010-10-08 15:55:28 (GMT) |
commit | 96fd54eaec700cc50e5960f45ee79bc25c2c48c5 | |
tree | 4e4fc3f48d8957b6b0fccc372410e8374ce4fb70 /Doc | |
parent | 59fdd6736bbf1ba14083a4bb777abaefc364f876 | |
#4661: add bytes parsing and generation to email (email version bump to 5.1.0)
The work on this is not 100% complete, but everything needed for real-world
testing of the code is present. The only remaining major to-do item is
(hopefully!) to improve the handling of non-ASCII bytes in headers that are
converted to Unicode, by RFC 2047-encoding them rather than replacing them
with '?'s.
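A minimal sketch of the bytes round trip this commit introduces, assuming a current Python 3 interpreter (the documentation changes in this commit mark these APIs as new in 3.2). `message_from_bytes`, `BytesFeedParser`, and `BytesGenerator` are the names documented in the commit; the sample message `RAW` and the helpers `round_trip` and `parse_incrementally` are hypothetical illustrations, not part of the email package.

```python
import io
from email import message_from_bytes
from email.generator import BytesGenerator
from email.parser import BytesFeedParser

# Hypothetical sample message: the body is Latin-1 text sent as raw 8bit
# data, so it contains the non-ASCII byte 0xE9 ("é").
RAW = (
    b"From: alice@example.com\n"
    b"To: bob@example.com\n"
    b"Subject: test\n"
    b"MIME-Version: 1.0\n"
    b"Content-Type: text/plain; charset=latin-1\n"
    b"Content-Transfer-Encoding: 8bit\n"
    b"\n"
    b"caf\xe9\n"
)


def round_trip(raw: bytes) -> bytes:
    """Hypothetical helper: parse raw bytes, then flatten the model back to bytes."""
    msg = message_from_bytes(raw)
    out = io.BytesIO()
    # mangle_from_=False leaves body lines that start with "From " untouched.
    BytesGenerator(out, mangle_from_=False).flatten(msg)
    return out.getvalue()


def parse_incrementally(raw: bytes, chunk_size: int = 16):
    """Hypothetical helper: feed the bytes to a BytesFeedParser in small chunks."""
    parser = BytesFeedParser()
    for start in range(0, len(raw), chunk_size):
        parser.feed(raw[start:start + chunk_size])
    return parser.close()


if __name__ == "__main__":
    # For an unmodified, standards-compliant message the output should match.
    print(round_trip(RAW) == RAW)
    print(parse_incrementally(RAW)["Subject"])
```

For a simple, unmodified message like this one, `round_trip(RAW) == RAW` should hold, which is the idempotence property the generator documentation describes; the 8bit body byte is written back out as-is rather than replaced.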
Diffstat (limited to 'Doc')
-rw-r--r-- | Doc/library/email.generator.rst | 35 |
-rw-r--r-- | Doc/library/email.message.rst | 18 |
-rw-r--r-- | Doc/library/email.parser.rst | 69 |
-rw-r--r-- | Doc/library/email.rst | 40 |
4 files changed, 151 insertions, 11 deletions
diff --git a/Doc/library/email.generator.rst b/Doc/library/email.generator.rst
index 930905a..954f175 100644
--- a/Doc/library/email.generator.rst
+++ b/Doc/library/email.generator.rst
@@ -22,6 +22,12 @@ the Generator on a :class:`~email.message.Message` constructed by
 program may result in changes to the :class:`~email.message.Message` object as
 defaults are filled in.
 
+:class:`bytes` output can be generated using the :class:`BytesGenerator` class.
+If the message object structure contains non-ASCII bytes, this generator's
+:meth:`~BytesGenerator.flatten` method will emit the original bytes. Parsing a
+binary message and then flattening it with :class:`BytesGenerator` should be
+idempotent for standards compliant messages.
+
 Here are the public methods of the :class:`Generator` class, imported from the
 :mod:`email.generator` module:
 
@@ -65,6 +71,13 @@ Here are the public methods of the :class:`Generator` class, imported from the
 
       Note that for subparts, no envelope header is ever printed.
 
+      Messages parsed with a Bytes parser that have a
+      :mailheader:`Content-Transfer-Encoding` of 8bit will be converted to a
+      use a 7bit Content-Transfer-Encoding. Any other non-ASCII bytes in the
+      message structure will be converted to '?' characters.
+
+      .. versionchanged:: 3.2 added support for re-encoding 8bit message bodies.
+
    .. method:: clone(fp)
 
       Return an independent clone of this :class:`Generator` instance with the
@@ -76,11 +89,27 @@ Here are the public methods of the :class:`Generator` class, imported from the
       :class:`Generator`'s constructor. This provides just enough file-like API
       for :class:`Generator` instances to be used in the :func:`print` function.
 
-As a convenience, see the methods :meth:`Message.as_string` and
-``str(aMessage)``, a.k.a. :meth:`Message.__str__`, which simplify the generation
-of a formatted string representation of a message object. For more detail, see
+As a convenience, see the :class:`~email.message.Message` methods
+:meth:`~email.message.Message.as_string` and ``str(aMessage)``, a.k.a.
+:meth:`~email.message.Message.__str__`, which simplify the generation of a
+formatted string representation of a message object. For more detail, see
 :mod:`email.message`.
 
+.. class:: BytesGenerator(outfp, mangle_from_=True, maxheaderlen=78, fmt=None)
+
+   This class has the same API as the :class:`Generator` class, except that
+   *outfp* must be a file like object that will accept :class`bytes` input to
+   its `write` method. If the message object structure contains non-ASCII
+   bytes, this generator's :meth:`~BytesGenerator.flatten` method will produce
+   them as-is, including preserving parts with a
+   :mailheader:`Content-Transfer-Encoding` of ``8bit``.
+
+   Note that even the :meth:`write` method API is identical: it expects
+   strings as input, and converts them to bytes by encoding them using
+   the ASCII codec.
+
+   .. versionadded:: 3.2
+
 The :mod:`email.generator` module also provides a derived class, called
 :class:`DecodedGenerator` which is like the :class:`Generator` base class,
 except that non-\ :mimetype:`text` parts are substituted with a format string
diff --git a/Doc/library/email.message.rst b/Doc/library/email.message.rst
index 9dcb2b4..dc305a7 100644
--- a/Doc/library/email.message.rst
+++ b/Doc/library/email.message.rst
@@ -111,9 +111,17 @@ Here are the methods of the :class:`Message` class:
       be decoded if this header's value is ``quoted-printable`` or ``base64``.
       If some other encoding is used, or :mailheader:`Content-Transfer-Encoding`
       header is missing, or if the payload has bogus base64 data, the payload is
-      returned as-is (undecoded). If the message is a multipart and the
-      *decode* flag is ``True``, then ``None`` is returned. The default for
-      *decode* is ``False``.
+      returned as-is (undecoded). In all cases the returned value is binary
+      data. If the message is a multipart and the *decode* flag is ``True``,
+      then ``None`` is returned.
+
+      When *decode* is ``False`` (the default) the body is returned as a string
+      without decoding the :mailheader:`Content-Transfer-Encoding`. However,
+      for a :mailheader:`Content-Transfer-Encoding` of 8bit, an attempt is made
+      to decode the original bytes using the `charset` specified by the
+      :mailheader:`Content-Type` header, using the `replace` error handler. If
+      no `charset` is specified, or if the `charset` given is not recognized by
+      the email package, the body is decoded using the default ASCII charset.
 
    .. method:: set_payload(payload, charset=None)
 
@@ -160,6 +168,10 @@ Here are the methods of the :class:`Message` class:
    Note that in all cases, any envelope header present in the message is not
    included in the mapping interface.
 
+   In a model generated from bytes, any header values that (in contravention
+   of the RFCs) contain non-ASCII bytes will have those bytes transformed
+   into '?' characters when the values are retrieved through this interface.
+
 
    .. method:: __len__()
 
diff --git a/Doc/library/email.parser.rst b/Doc/library/email.parser.rst
index 32f4ff1..77a0b69 100644
--- a/Doc/library/email.parser.rst
+++ b/Doc/library/email.parser.rst
@@ -80,6 +80,14 @@ Here is the API for the :class:`FeedParser`:
       if you feed more data to a closed :class:`FeedParser`.
 
 
+.. class:: BytesFeedParser(_factory=email.message.Message)
+
+   Works exactly like :class:`FeedParser` except that the input to the
+   :meth:`~FeedParser.feed` method must be bytes and not string.
+
+   .. versionadded:: 3.2
+
+
 Parser class API
 ^^^^^^^^^^^^^^^^
 
@@ -131,7 +139,7 @@ class.
 
       Similar to the :meth:`parse` method, except it takes a string object
       instead of a file-like object. Calling this method on a string is exactly
-      equivalent to wrapping *text* in a :class:`StringIO` instance first and
+      equivalent to wrapping *text* in a :class:`~io.StringIO` instance first and
       calling :meth:`parse`.
 
       Optional *headersonly* is a flag specifying whether to stop parsing after
@@ -139,25 +147,78 @@ class.
       the entire contents of the file.
 
 
+.. class:: BytesParser(_class=email.message.Message, strict=None)
+
+   This class is exactly parallel to :class:`Parser`, but handles bytes input.
+   The *_class* and *strict* arguments are interpreted in the same way as for
+   the :class:`Parser` constructor. *strict* is supported only to make porting
+   code easier; it is deprecated.
+
+   .. method:: parse(fp, headeronly=False)
+
+      Read all the data from the binary file-like object *fp*, parse the
+      resulting bytes, and return the message object. *fp* must support
+      both the :meth:`readline` and the :meth:`read` methods on file-like
+      objects.
+
+      The bytes contained in *fp* must be formatted as a block of :rfc:`2822`
+      style headers and header continuation lines, optionally preceded by a
+      envelope header. The header block is terminated either by the end of the
+      data or by a blank line. Following the header block is the body of the
+      message (which may contain MIME-encoded subparts, including subparts
+      with a :mailheader:`Content-Transfer-Encoding` of ``8bit``.
+
+      Optional *headersonly* is a flag specifying whether to stop parsing after
+      reading the headers or not. The default is ``False``, meaning it parses
+      the entire contents of the file.
+
+   .. method:: parsebytes(bytes, headersonly=False)
+
+      Similar to the :meth:`parse` method, except it takes a byte string object
+      instead of a file-like object. Calling this method on a byte string is
+      exactly equivalent to wrapping *text* in a :class:`~io.BytesIO` instance
+      first and calling :meth:`parse`.
+
+      Optional *headersonly* is as with the :meth:`parse` method.
+
+   .. versionadded:: 3.2
+
+
 Since creating a message object structure from a string or a file object is such
-a common task, two functions are provided as a convenience. They are available
+a common task, four functions are provided as a convenience. They are available
 in the top-level :mod:`email` package namespace.
 
 .. currentmodule:: email
 
-.. function:: message_from_string(s[, _class][, strict])
+.. function:: message_from_string(s, _class=email.message.Message, strict=None)
 
    Return a message object structure from a string. This is exactly
    equivalent to ``Parser().parsestr(s)``. Optional *_class* and *strict* are
    interpreted as with the :class:`Parser` class constructor.
 
+.. function:: message_from_bytes(s, _class=email.message.Message, strict=None)
+
+   Return a message object structure from a byte string. This is exactly
+   equivalent to ``BytesParser().parsebytes(s)``. Optional *_class* and
+   *strict* are interpreted as with the :class:`Parser` class constructor.
+
+   .. versionadded:: 3.2
 
-.. function:: message_from_file(fp[, _class][, strict])
+.. function:: message_from_file(fp, _class=email.message.Message, strict=None)
 
    Return a message object structure tree from an open :term:`file object`.
    This is exactly equivalent to ``Parser().parse(fp)``. Optional *_class*
    and *strict* are interpreted as with the :class:`Parser` class constructor.
 
+.. function:: message_from_binary_file(fp, _class=email.message.Message, strict=None)
+
+   Return a message object structure tree from an open binary :term:`file
+   object`. This is exactly equivalent to ``BytesParser().parse(fp)``.
+   Optional *_class* and *strict* are interpreted as with the :class:`Parser`
+   class constructor.
+
+   .. versionadded:: 3.2
+
 Here's an example of how you might use this at an interactive Python prompt::
 
    >>> import email
diff --git a/Doc/library/email.rst b/Doc/library/email.rst
index d3f1908..8926ae4 100644
--- a/Doc/library/email.rst
+++ b/Doc/library/email.rst
@@ -6,7 +6,7 @@ email messages, including MIME documents.
 .. moduleauthor:: Barry A. Warsaw <barry@python.org>
 .. sectionauthor:: Barry A. Warsaw <barry@python.org>
 
-.. Copyright (C) 2001-2007 Python Software Foundation
+.. Copyright (C) 2001-2010 Python Software Foundation
 
 
 The :mod:`email` package is a library for managing email messages, including
@@ -92,6 +92,44 @@ table also describes the Python compatibility of each version of the package.
 +---------------+------------------------------+-----------------------+
 | :const:`4.0`  | Python 2.5                   | Python 2.3 to 2.5     |
 +---------------+------------------------------+-----------------------+
+| :const:`5.0`  | Python 3.0 and Python 3.1    | Python 3.0 to 3.2     |
++---------------+------------------------------+-----------------------+
+| :const:`5.1`  | Python 3.2                   | Python 3.0 to 3.2     |
++---------------+------------------------------+-----------------------+
+
+Here are the major differences between :mod:`email` version 5.1 and
+version 5.0:
+
+* It is once again possible to parse messages containing non-ASCII bytes,
+  and to reproduce such messages if the data containing the non-ASCII
+  bytes is not modified.
+
+* New functions :func:`message_from_bytes` and :func:`message_from_binary_file`,
+  and new classes :class:`~email.parser.BytesFeedParser` and
+  :class:`~email.parser.BytesParser` allow binary message data to be parsed
+  into model objects.
+
+* Given bytes input to the model, :meth:`~email.message.Message.get_payload`
+  will by default decode a message body that has a
+  :mailheader:`Content-Transfer-Encoding` of `8bit` using the charset specified
+  in the MIME headers and return the resulting string.
+
+* Given bytes input to the model, :class:`~email.generator.Generator` will
+  convert message bodies that have a :mailheader:`Content-Transfer-Encoding` of
+  8bit to instead have a 7bit Content-Transfer-Encoding.
+
+* New function :class:`~email.generator.BytesGenerator` produces bytes
+  as output, preserving any unchanged non-ASCII data that was
+  present in the input used to build the model, including message bodies
+  with a :mailheader:`Content-Transfer-Encoding` of 8bit.
+
+Here are the major differences between :mod:`email` version 5.0 and version 4:
+
+* All operations are on unicode strings. Text inputs must be strings,
+  text outputs are strings. Outputs are limited to the ASCII character
+  set and so can be encoded to ASCII for transmission. Inputs are also
+  limited to ASCII; this is an acknowledged limitation of email 5.0 and
+  means it can only be used to parse email that is 7bit clean.
 
 Here are the major differences between :mod:`email` version 4 and version 3:
 
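The 5.1 notes in this commit describe two behaviors for a model built from bytes: `get_payload()` decodes an 8bit body with the charset declared in the MIME headers, and the string-based `Generator` rewrites such bodies to a 7bit Content-Transfer-Encoding. A sketch of both, assuming a current Python 3 interpreter and a hypothetical sample message `RAW`. The specific 7bit encoding produced (quoted-printable for a Latin-1 body) comes from the email package's charset defaults in modern CPython; treat it as an assumption rather than something this commit states.

```python
import io
from email import message_from_bytes
from email.generator import Generator

# Hypothetical message with a raw 8bit Latin-1 body (0xE9 is "é").
RAW = (
    b"Subject: test\n"
    b"MIME-Version: 1.0\n"
    b"Content-Type: text/plain; charset=latin-1\n"
    b"Content-Transfer-Encoding: 8bit\n"
    b"\n"
    b"caf\xe9\n"
)

msg = message_from_bytes(RAW)

# Default (decode=False): the 8bit body is decoded using the declared charset.
text_body = msg.get_payload()
# decode=True: the original body bytes are returned unchanged.
byte_body = msg.get_payload(decode=True)

# The str-based Generator cannot emit raw 8bit data, so it re-encodes the
# body with a 7bit-safe transfer encoding; it works on a copy, so `msg`
# itself keeps its original headers.
buf = io.StringIO()
Generator(buf, mangle_from_=False).flatten(msg)
flattened = buf.getvalue()

print(repr(text_body))
print(repr(byte_body))
print(flattened)
```

If raw 8bit output is wanted instead, `BytesGenerator` is the class the notes point to.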