author: R. David Murray <rdmurray@bitdance.com> 2010-10-08 15:55:28 (GMT)
committer: R. David Murray <rdmurray@bitdance.com> 2010-10-08 15:55:28 (GMT)
commit: 96fd54eaec700cc50e5960f45ee79bc25c2c48c5
tree: 4e4fc3f48d8957b6b0fccc372410e8374ce4fb70 /Doc
parent: 59fdd6736bbf1ba14083a4bb777abaefc364f876
#4661: add bytes parsing and generation to email (email version bump to 5.1.0)
The work on this is not 100% complete, but everything is present to allow real-world testing of the code. The only remaining major todo item is to (hopefully!) enhance the handling of non-ASCII bytes in headers converted to unicode by RFC2047 encoding them rather than replacing them with '?'s.
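As an illustration of what the commit message describes (this sketch is not part of the patch), the bytes round trip might look like the following. The sample message and the variable names are made up for the example, and it assumes a Python 3 release that ships these changes.

```python
from io import BytesIO
from email import message_from_bytes
from email.generator import BytesGenerator

# Hypothetical sample: ASCII-only headers and an 8bit UTF-8 body ("café").
raw = (
    b"From: sender@example.com\n"
    b"To: recipient@example.com\n"
    b"Subject: round trip\n"
    b"MIME-Version: 1.0\n"
    b'Content-Type: text/plain; charset="utf-8"\n'
    b"Content-Transfer-Encoding: 8bit\n"
    b"\n"
    b"caf\xc3\xa9\n"
)

# Parse the binary message into a model object.
msg = message_from_bytes(raw)

# Flatten it back to bytes; mangle_from_ is disabled so body lines
# starting with "From " would not be rewritten.
out = BytesIO()
BytesGenerator(out, mangle_from_=False).flatten(msg)
flattened = out.getvalue()
```

For a simple, standards-compliant message like this one, `flattened` is expected to equal `raw`, non-ASCII body bytes included.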
Diffstat (limited to 'Doc')
-rw-r--r--  Doc/library/email.generator.rst  35
-rw-r--r--  Doc/library/email.message.rst    18
-rw-r--r--  Doc/library/email.parser.rst     69
-rw-r--r--  Doc/library/email.rst            40
4 files changed, 151 insertions, 11 deletions
diff --git a/Doc/library/email.generator.rst b/Doc/library/email.generator.rst
index 930905a..954f175 100644
--- a/Doc/library/email.generator.rst
+++ b/Doc/library/email.generator.rst
@@ -22,6 +22,12 @@ the Generator on a :class:`~email.message.Message` constructed by program may
result in changes to the :class:`~email.message.Message` object as defaults are
filled in.
+:class:`bytes` output can be generated using the :class:`BytesGenerator` class.
+If the message object structure contains non-ASCII bytes, this generator's
+:meth:`~BytesGenerator.flatten` method will emit the original bytes. Parsing a
+binary message and then flattening it with :class:`BytesGenerator` should be
+idempotent for standards-compliant messages.
+
Here are the public methods of the :class:`Generator` class, imported from the
:mod:`email.generator` module:
@@ -65,6 +71,13 @@ Here are the public methods of the :class:`Generator` class, imported from the
Note that for subparts, no envelope header is ever printed.
+ Messages parsed with a Bytes parser that have a
+ :mailheader:`Content-Transfer-Encoding` of 8bit will be converted to
+ use a 7bit Content-Transfer-Encoding. Any other non-ASCII bytes in the
+ message structure will be converted to '?' characters.
+
+ .. versionchanged:: 3.2 added support for re-encoding 8bit message bodies.
+
.. method:: clone(fp)
Return an independent clone of this :class:`Generator` instance with the
@@ -76,11 +89,27 @@ Here are the public methods of the :class:`Generator` class, imported from the
:class:`Generator`'s constructor. This provides just enough file-like API
for :class:`Generator` instances to be used in the :func:`print` function.
-As a convenience, see the methods :meth:`Message.as_string` and
-``str(aMessage)``, a.k.a. :meth:`Message.__str__`, which simplify the generation
-of a formatted string representation of a message object. For more detail, see
+As a convenience, see the :class:`~email.message.Message` methods
+:meth:`~email.message.Message.as_string` and ``str(aMessage)``, a.k.a.
+:meth:`~email.message.Message.__str__`, which simplify the generation of a
+formatted string representation of a message object. For more detail, see
:mod:`email.message`.
+.. class:: BytesGenerator(outfp, mangle_from_=True, maxheaderlen=78, fmt=None)
+
+ This class has the same API as the :class:`Generator` class, except that
+ *outfp* must be a file-like object that will accept :class:`bytes` input to
+ its :meth:`write` method. If the message object structure contains non-ASCII
+ bytes, this generator's :meth:`~BytesGenerator.flatten` method will produce
+ them as-is, including preserving parts with a
+ :mailheader:`Content-Transfer-Encoding` of ``8bit``.
+
+ Note that even the :meth:`write` method API is identical: it expects
+ strings as input, and converts them to bytes by encoding them using
+ the ASCII codec.
+
+ .. versionadded:: 3.2
+
The :mod:`email.generator` module also provides a derived class, called
:class:`DecodedGenerator` which is like the :class:`Generator` base class,
except that non-\ :mimetype:`text` parts are substituted with a format string
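The difference between the two generators described in the hunks above can be sketched as follows. This is an illustrative example rather than part of the patch: the sample message is hypothetical, and the exact 7bit-safe encoding that :class:`Generator` picks for the body is left to the library.

```python
from io import BytesIO, StringIO
from email import message_from_bytes
from email.generator import BytesGenerator, Generator

# Hypothetical sample: an 8bit UTF-8 body ("café") parsed from bytes.
raw = (
    b"Subject: generators\n"
    b"MIME-Version: 1.0\n"
    b'Content-Type: text/plain; charset="utf-8"\n'
    b"Content-Transfer-Encoding: 8bit\n"
    b"\n"
    b"caf\xc3\xa9\n"
)
msg = message_from_bytes(raw)

# Generator writes str; the 8bit body is re-encoded so the text stays ASCII.
text_out = StringIO()
Generator(text_out, mangle_from_=False).flatten(msg)
text = text_out.getvalue()
text.encode("ascii")  # would raise UnicodeEncodeError on non-ASCII output

# BytesGenerator writes bytes and emits the original body bytes as-is.
bytes_out = BytesIO()
BytesGenerator(bytes_out, mangle_from_=False).flatten(msg)
data = bytes_out.getvalue()
```

Here `text` no longer declares an ``8bit`` transfer encoding, while `data` still contains the raw UTF-8 bytes of the body.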
diff --git a/Doc/library/email.message.rst b/Doc/library/email.message.rst
index 9dcb2b4..dc305a7 100644
--- a/Doc/library/email.message.rst
+++ b/Doc/library/email.message.rst
@@ -111,9 +111,17 @@ Here are the methods of the :class:`Message` class:
be decoded if this header's value is ``quoted-printable`` or ``base64``.
If some other encoding is used, or :mailheader:`Content-Transfer-Encoding`
header is missing, or if the payload has bogus base64 data, the payload is
- returned as-is (undecoded). If the message is a multipart and the
- *decode* flag is ``True``, then ``None`` is returned. The default for
- *decode* is ``False``.
+ returned as-is (undecoded). In all cases the returned value is binary
+ data. If the message is a multipart and the *decode* flag is ``True``,
+ then ``None`` is returned.
+
+ When *decode* is ``False`` (the default) the body is returned as a string
+ without decoding the :mailheader:`Content-Transfer-Encoding`. However,
+ for a :mailheader:`Content-Transfer-Encoding` of 8bit, an attempt is made
+ to decode the original bytes using the ``charset`` specified by the
+ :mailheader:`Content-Type` header, using the ``replace`` error handler. If
+ no ``charset`` is specified, or if the ``charset`` given is not recognized by
+ the email package, the body is decoded using the default ASCII charset.
.. method:: set_payload(payload, charset=None)
@@ -160,6 +168,10 @@ Here are the methods of the :class:`Message` class:
Note that in all cases, any envelope header present in the message is not
included in the mapping interface.
+ In a model generated from bytes, any header values that (in contravention
+ of the RFCs) contain non-ASCII bytes will have those bytes transformed
+ into '?' characters when the values are retrieved through this interface.
+
.. method:: __len__()
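The :meth:`get_payload` behaviour described in the first hunk above can be sketched as follows. This is a hedged illustration, not part of the patch: `build` is a hypothetical helper, and ``x-unknown`` is a made-up charset name chosen because no codec is registered under it.

```python
from email import message_from_bytes


def build(charset):
    """Return a hypothetical 8bit message whose body is UTF-8 for 'café'."""
    return message_from_bytes(
        b"MIME-Version: 1.0\n"
        b"Content-Type: text/plain; charset=" + charset + b"\n"
        b"Content-Transfer-Encoding: 8bit\n"
        b"\n"
        b"caf\xc3\xa9\n"
    )


known = build(b"utf-8")
# decode=False (the default): a str, decoded with the declared charset.
as_text = known.get_payload()
# decode=True: the body as binary data.
as_bytes = known.get_payload(decode=True)

unknown = build(b"x-unknown")
# Unrecognized charset: falls back to ASCII with the 'replace' handler,
# so each non-ASCII byte becomes U+FFFD.
fallback = unknown.get_payload()
```

With a recognized charset the default call returns readable text; with an unrecognized one the non-ASCII bytes are replaced rather than raising an error.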
diff --git a/Doc/library/email.parser.rst b/Doc/library/email.parser.rst
index 32f4ff1..77a0b69 100644
--- a/Doc/library/email.parser.rst
+++ b/Doc/library/email.parser.rst
@@ -80,6 +80,14 @@ Here is the API for the :class:`FeedParser`:
if you feed more data to a closed :class:`FeedParser`.
+.. class:: BytesFeedParser(_factory=email.message.Message)
+
+ Works exactly like :class:`FeedParser` except that the input to the
+ :meth:`~FeedParser.feed` method must be bytes and not string.
+
+ .. versionadded:: 3.2
+
+
Parser class API
^^^^^^^^^^^^^^^^
@@ -131,7 +139,7 @@ class.
Similar to the :meth:`parse` method, except it takes a string object
instead of a file-like object. Calling this method on a string is exactly
- equivalent to wrapping *text* in a :class:`StringIO` instance first and
+ equivalent to wrapping *text* in a :class:`~io.StringIO` instance first and
calling :meth:`parse`.
Optional *headersonly* is a flag specifying whether to stop parsing after
@@ -139,25 +147,78 @@ class.
the entire contents of the file.
+.. class:: BytesParser(_class=email.message.Message, strict=None)
+
+ This class is exactly parallel to :class:`Parser`, but handles bytes input.
+ The *_class* and *strict* arguments are interpreted in the same way as for
+ the :class:`Parser` constructor. *strict* is supported only to make porting
+ code easier; it is deprecated.
+
+ .. method:: parse(fp, headersonly=False)
+
+ Read all the data from the binary file-like object *fp*, parse the
+ resulting bytes, and return the message object. *fp* must support
+ both the :meth:`readline` and the :meth:`read` methods on file-like
+ objects.
+
+ The bytes contained in *fp* must be formatted as a block of :rfc:`2822`
+ style headers and header continuation lines, optionally preceded by an
+ envelope header. The header block is terminated either by the end of the
+ data or by a blank line. Following the header block is the body of the
+ message (which may contain MIME-encoded subparts, including subparts
+ with a :mailheader:`Content-Transfer-Encoding` of ``8bit``).
+
+ Optional *headersonly* is a flag specifying whether to stop parsing after
+ reading the headers or not. The default is ``False``, meaning it parses
+ the entire contents of the file.
+
+ .. method:: parsebytes(bytes, headersonly=False)
+
+ Similar to the :meth:`parse` method, except it takes a byte string object
+ instead of a file-like object. Calling this method on a byte string is
+ exactly equivalent to wrapping *bytes* in a :class:`~io.BytesIO` instance
+ first and calling :meth:`parse`.
+
+ Optional *headersonly* is as with the :meth:`parse` method.
+
+ .. versionadded:: 3.2
+
+
Since creating a message object structure from a string or a file object is such
-a common task, two functions are provided as a convenience. They are available
+a common task, four functions are provided as a convenience. They are available
in the top-level :mod:`email` package namespace.
.. currentmodule:: email
-.. function:: message_from_string(s[, _class][, strict])
+.. function:: message_from_string(s, _class=email.message.Message, strict=None)
Return a message object structure from a string. This is exactly equivalent to
``Parser().parsestr(s)``. Optional *_class* and *strict* are interpreted as
with the :class:`Parser` class constructor.
+.. function:: message_from_bytes(s, _class=email.message.Message, strict=None)
+
+ Return a message object structure from a byte string. This is exactly
+ equivalent to ``BytesParser().parsebytes(s)``. Optional *_class* and
+ *strict* are interpreted as with the :class:`Parser` class constructor.
+
+ .. versionadded:: 3.2
-.. function:: message_from_file(fp[, _class][, strict])
+.. function:: message_from_file(fp, _class=email.message.Message, strict=None)
Return a message object structure tree from an open :term:`file object`.
This is exactly equivalent to ``Parser().parse(fp)``. Optional *_class*
and *strict* are interpreted as with the :class:`Parser` class constructor.
+.. function:: message_from_binary_file(fp, _class=email.message.Message, strict=None)
+
+ Return a message object structure tree from an open binary :term:`file
+ object`. This is exactly equivalent to ``BytesParser().parse(fp)``.
+ Optional *_class* and *strict* are interpreted as with the :class:`Parser`
+ class constructor.
+
+ .. versionadded:: 3.2
+
Here's an example of how you might use this at an interactive Python prompt::
>>> import email
diff --git a/Doc/library/email.rst b/Doc/library/email.rst
index d3f1908..8926ae4 100644
--- a/Doc/library/email.rst
+++ b/Doc/library/email.rst
@@ -6,7 +6,7 @@
email messages, including MIME documents.
.. moduleauthor:: Barry A. Warsaw <barry@python.org>
.. sectionauthor:: Barry A. Warsaw <barry@python.org>
-.. Copyright (C) 2001-2007 Python Software Foundation
+.. Copyright (C) 2001-2010 Python Software Foundation
The :mod:`email` package is a library for managing email messages, including
@@ -92,6 +92,44 @@ table also describes the Python compatibility of each version of the package.
+---------------+------------------------------+-----------------------+
| :const:`4.0` | Python 2.5 | Python 2.3 to 2.5 |
+---------------+------------------------------+-----------------------+
+| :const:`5.0` | Python 3.0 and Python 3.1 | Python 3.0 to 3.2 |
++---------------+------------------------------+-----------------------+
+| :const:`5.1` | Python 3.2 | Python 3.0 to 3.2 |
++---------------+------------------------------+-----------------------+
+
+Here are the major differences between :mod:`email` version 5.1 and
+version 5.0:
+
+* It is once again possible to parse messages containing non-ASCII bytes,
+ and to reproduce such messages if the data containing the non-ASCII
+ bytes is not modified.
+
+* New functions :func:`message_from_bytes` and :func:`message_from_binary_file`,
+ and new classes :class:`~email.parser.BytesFeedParser` and
+ :class:`~email.parser.BytesParser` allow binary message data to be parsed
+ into model objects.
+
+* Given bytes input to the model, :meth:`~email.message.Message.get_payload`
+ will by default decode a message body that has a
+ :mailheader:`Content-Transfer-Encoding` of ``8bit`` using the charset specified
+ in the MIME headers and return the resulting string.
+
+* Given bytes input to the model, :class:`~email.generator.Generator` will
+ convert message bodies that have a :mailheader:`Content-Transfer-Encoding` of
+ 8bit to instead have a 7bit Content-Transfer-Encoding.
+
+* New class :class:`~email.generator.BytesGenerator` produces bytes
+ as output, preserving any unchanged non-ASCII data that was
+ present in the input used to build the model, including message bodies
+ with a :mailheader:`Content-Transfer-Encoding` of 8bit.
+
+Here are the major differences between :mod:`email` version 5.0 and version 4:
+
+* All operations are on unicode strings. Text inputs must be strings, and
+ text outputs are strings. Outputs are limited to the ASCII character
+ set and so can be encoded to ASCII for transmission. Inputs are also
+ limited to ASCII; this is an acknowledged limitation of email 5.0 and
+ means it can only be used to parse email that is 7bit clean.
Here are the major differences between :mod:`email` version 4 and version 3: