diff options
author | R David Murray <rdmurray@bitdance.com> | 2011-04-18 17:59:37 (GMT) |
---|---|---|
committer | R David Murray <rdmurray@bitdance.com> | 2011-04-18 17:59:37 (GMT) |
commit | 3edd22ac950d3a2bcc1ad2e5a83554970aef3369 (patch) | |
tree | b4661afc1be45e0d072c1c83ab354b2362f05afb /Doc/library | |
parent | ce16be91dc68597b0c5bfc7b4b1c5136fe5697a6 (diff) | |
download | cpython-3edd22ac950d3a2bcc1ad2e5a83554970aef3369.zip cpython-3edd22ac950d3a2bcc1ad2e5a83554970aef3369.tar.gz cpython-3edd22ac950d3a2bcc1ad2e5a83554970aef3369.tar.bz2 |
#11731: simplify/enhance parser/generator API by introducing policy objects.
This new interface will also allow for future planned enhancements
in control over the parser/generator without requiring any additional
complexity in the parser/generator API.
Patch reviewed by Éric Araujo and Barry Warsaw.
Diffstat (limited to 'Doc/library')
-rw-r--r-- | Doc/library/email.generator.rst | 55 | ||||
-rw-r--r-- | Doc/library/email.parser.rst | 54 | ||||
-rw-r--r-- | Doc/library/email.policy.rst | 179 |
3 files changed, 256 insertions, 32 deletions
diff --git a/Doc/library/email.generator.rst b/Doc/library/email.generator.rst index 85b32fe..847d7e4 100644 --- a/Doc/library/email.generator.rst +++ b/Doc/library/email.generator.rst @@ -32,7 +32,8 @@ Here are the public methods of the :class:`Generator` class, imported from the :mod:`email.generator` module: -.. class:: Generator(outfp, mangle_from_=True, maxheaderlen=78) +.. class:: Generator(outfp, mangle_from_=True, maxheaderlen=78, *, \ + policy=policy.default) The constructor for the :class:`Generator` class takes a :term:`file-like object` called *outfp* for an argument. *outfp* must support the :meth:`write` method @@ -53,10 +54,16 @@ Here are the public methods of the :class:`Generator` class, imported from the :class:`~email.header.Header` class. Set to zero to disable header wrapping. The default is 78, as recommended (but not required) by :rfc:`2822`. + The *policy* keyword specifies a :mod:`~email.policy` object that controls a + number of aspects of the generator's operation. The default policy + maintains backward compatibility. + + .. versionchanged:: 3.3 Added the *policy* keyword. + The other public :class:`Generator` methods are: - .. method:: flatten(msg, unixfrom=False, linesep='\\n') + .. method:: flatten(msg, unixfrom=False, linesep=None) Print the textual representation of the message object structure rooted at *msg* to the output file specified when the :class:`Generator` instance @@ -72,12 +79,13 @@ Here are the public methods of the :class:`Generator` class, imported from the Note that for subparts, no envelope header is ever printed. Optional *linesep* specifies the line separator character used to - terminate lines in the output. It defaults to ``\n`` because that is - the most useful value for Python application code (other library packages - expect ``\n`` separated lines). ``linesep=\r\n`` can be used to - generate output with RFC-compliant line separators. + terminate lines in the output. If specified it overrides the value + specified by the ``Generator``\'s ``policy``. - Messages parsed with a Bytes parser that have a + Because strings cannot represent non-ASCII bytes, ``Generator`` ignores + the value of the :attr:`~email.policy.Policy.must_be_7bit` + :mod:`~email.policy` setting and operates as if it were set ``True``. + This means that messages parsed with a Bytes parser that have a :mailheader:`Content-Transfer-Encoding` of 8bit will be converted to a use a 7bit Content-Transfer-Encoding. Non-ASCII bytes in the headers will be :rfc:`2047` encoded with a charset of `unknown-8bit`. @@ -103,7 +111,8 @@ As a convenience, see the :class:`~email.message.Message` methods formatted string representation of a message object. For more detail, see :mod:`email.message`. -.. class:: BytesGenerator(outfp, mangle_from_=True, maxheaderlen=78) +.. class:: BytesGenerator(outfp, mangle_from_=True, maxheaderlen=78, *, \ + policy=policy.default) The constructor for the :class:`BytesGenerator` class takes a binary :term:`file-like object` called *outfp* for an argument. *outfp* must @@ -125,19 +134,31 @@ formatted string representation of a message object. For more detail, see wrapping. The default is 78, as recommended (but not required) by :rfc:`2822`. + The *policy* keyword specifies a :mod:`~email.policy` object that controls a + number of aspects of the generator's operation. The default policy + maintains backward compatibility. + + .. versionchanged:: 3.3 Added the *policy* keyword. + The other public :class:`BytesGenerator` methods are: - .. method:: flatten(msg, unixfrom=False, linesep='\n') + .. method:: flatten(msg, unixfrom=False, linesep=None) Print the textual representation of the message object structure rooted at *msg* to the output file specified when the :class:`BytesGenerator` instance was created. Subparts are visited depth-first and the resulting - text will be properly MIME encoded. If the input that created the *msg* - contained bytes with the high bit set and those bytes have not been - modified, they will be copied faithfully to the output, even if doing so - is not strictly RFC compliant. (To produce strictly RFC compliant - output, use the :class:`Generator` class.) + text will be properly MIME encoded. If the :mod:`~email.policy` option + :attr:`~email.policy.Policy.must_be_7bit` is ``False`` (the default), + then any bytes with the high bit set in the original parsed message that + have not been modified will be copied faithfully to the output. If + ``must_be_7bit`` is true, the bytes will be converted as needed using an + ASCII content-transfer-encoding. In particular, RFC-invalid non-ASCII + bytes in headers will be encoded using the MIME ``unknown-8bit`` + character set, thus rendering them RFC-compliant. + + .. XXX: There should be a complementary option that just does the RFC + compliance transformation but leaves CTE 8bit parts alone. Messages parsed with a Bytes parser that have a :mailheader:`Content-Transfer-Encoding` of 8bit will be reconstructed @@ -152,10 +173,8 @@ formatted string representation of a message object. For more detail, see Note that for subparts, no envelope header is ever printed. Optional *linesep* specifies the line separator character used to - terminate lines in the output. It defaults to ``\n`` because that is - the most useful value for Python application code (other library packages - expect ``\n`` separated lines). ``linesep=\r\n`` can be used to - generate output with RFC-compliant line separators. + terminate lines in the output. If specified it overrides the value + specified by the ``Generator``\ 's ``policy``. .. method:: clone(fp) diff --git a/Doc/library/email.parser.rst b/Doc/library/email.parser.rst index c72d3d4..c5e43a9 100644 --- a/Doc/library/email.parser.rst +++ b/Doc/library/email.parser.rst @@ -58,12 +58,18 @@ list of defects that it can find. Here is the API for the :class:`FeedParser`: -.. class:: FeedParser(_factory=email.message.Message) +.. class:: FeedParser(_factory=email.message.Message, *, policy=policy.default) Create a :class:`FeedParser` instance. Optional *_factory* is a no-argument callable that will be called whenever a new message object is needed. It defaults to the :class:`email.message.Message` class. + The *policy* keyword specifies a :mod:`~email.policy` object that controls a + number of aspects of the parser's operation. The default policy maintains + backward compatibility. + + .. versionchanged:: 3.3 Added the *policy* keyword. + .. method:: feed(data) Feed the :class:`FeedParser` some more data. *data* should be a string @@ -104,7 +110,7 @@ have the same API as the :class:`Parser` and :class:`BytesParser` classes. .. versionadded:: 3.3 BytesHeaderParser -.. class:: Parser(_class=email.message.Message) +.. class:: Parser(_class=email.message.Message, *, policy=policy.default) The constructor for the :class:`Parser` class takes an optional argument *_class*. This must be a callable factory (such as a function or a class), and @@ -112,8 +118,13 @@ have the same API as the :class:`Parser` and :class:`BytesParser` classes. :class:`~email.message.Message` (see :mod:`email.message`). The factory will be called without arguments. - .. versionchanged:: 3.2 - Removed the *strict* argument that was deprecated in 2.4. + The *policy* keyword specifies a :mod:`~email.policy` object that controls a + number of aspects of the parser's operation. The default policy maintains + backward compatibility. + + .. versionchanged:: 3.3 + Removed the *strict* argument that was deprecated in 2.4. Added the + *policy* keyword. The other public :class:`Parser` methods are: @@ -144,12 +155,18 @@ have the same API as the :class:`Parser` and :class:`BytesParser` classes. the entire contents of the file. -.. class:: BytesParser(_class=email.message.Message, strict=None) +.. class:: BytesParser(_class=email.message.Message, *, policy=policy.default) This class is exactly parallel to :class:`Parser`, but handles bytes input. The *_class* and *strict* arguments are interpreted in the same way as for - the :class:`Parser` constructor. *strict* is supported only to make porting - code easier; it is deprecated. + the :class:`Parser` constructor. + + The *policy* keyword specifies a :mod:`~email.policy` object that + controls a number of aspects of the parser's operation. The default + policy maintains backward compatibility. + + .. versionchanged:: 3.3 + Removed the *strict* argument. Added the *policy* keyword. .. method:: parse(fp, headeronly=False) @@ -187,12 +204,15 @@ in the top-level :mod:`email` package namespace. .. currentmodule:: email -.. function:: message_from_string(s, _class=email.message.Message, strict=None) +.. function:: message_from_string(s, _class=email.message.Message, *, \ + policy=policy.default) Return a message object structure from a string. This is exactly equivalent to - ``Parser().parsestr(s)``. Optional *_class* and *strict* are interpreted as + ``Parser().parsestr(s)``. *_class* and *policy* are interpreted as with the :class:`Parser` class constructor. + .. versionchanged:: removed *strict*, added *policy* + .. function:: message_from_bytes(s, _class=email.message.Message, strict=None) Return a message object structure from a byte string. This is exactly @@ -200,21 +220,27 @@ in the top-level :mod:`email` package namespace. *strict* are interpreted as with the :class:`Parser` class constructor. .. versionadded:: 3.2 + .. versionchanged:: 3.3 removed *strict*, added *policy* -.. function:: message_from_file(fp, _class=email.message.Message, strict=None) +.. function:: message_from_file(fp, _class=email.message.Message, *, \ + policy=policy.default) Return a message object structure tree from an open :term:`file object`. - This is exactly equivalent to ``Parser().parse(fp)``. Optional *_class* - and *strict* are interpreted as with the :class:`Parser` class constructor. + This is exactly equivalent to ``Parser().parse(fp)``. *_class* + and *policy* are interpreted as with the :class:`Parser` class constructor. + + .. versionchanged:: 3.3 removed *strict*, added *policy* -.. function:: message_from_binary_file(fp, _class=email.message.Message, strict=None) +.. function:: message_from_binary_file(fp, _class=email.message.Message, *, \ + policy=policy.default) Return a message object structure tree from an open binary :term:`file object`. This is exactly equivalent to ``BytesParser().parse(fp)``. - Optional *_class* and *strict* are interpreted as with the :class:`Parser` + *_class* and *policy* are interpreted as with the :class:`Parser` class constructor. .. versionadded:: 3.2 + .. versionchanged:: 3.3 removed *strict*, added *policy* Here's an example of how you might use this at an interactive Python prompt:: diff --git a/Doc/library/email.policy.rst b/Doc/library/email.policy.rst new file mode 100644 index 0000000..f7eb471 --- /dev/null +++ b/Doc/library/email.policy.rst @@ -0,0 +1,179 @@ +:mod:`email`: Policy Objects +---------------------------- + +.. module:: email.policy + :synopsis: Controlling the parsing and generating of messages + + +The :mod:`email` package's prime focus is the handling of email messages as +described by the various email and MIME RFCs. However, the general format of +email messages (a block of header fields each consisting of a name followed by +a colon followed by a value, the whole block followed by a blank line and an +arbitrary 'body'), is a format that has found utility outside of the realm of +email. Some of these uses conform fairly closely to the main RFCs, some do +not. And even when working with email, there are times when it is desirable to +break strict compliance with the RFCs. + +Policy objects are the mechanism used to provide the email package with the +flexibility to handle all these disparate use cases, + +A :class:`Policy` object encapsulates a set of attributes and methods that +control the behavior of various components of the email package during use. +:class:`Policy` instances can be passed to various classes and methods in the +email package to alter the default behavior. The settable values and their +defaults are described below. The :mod:`policy` module also provides some +pre-created :class:`Policy` instances. In addition to a :const:`default` +instance, there are instances tailored for certain applications. For example +there is an :const:`SMTP` :class:`Policy` with defaults appropriate for +generating output to be sent to an SMTP server. These are listed :ref:`below +<Policy Instances>`. + +In general an application will only need to deal with setting the policy at the +input and output boundaries. Once parsed, a message is represented by a +:class:`~email.message.Message` object, which is designed to be independent of +the format that the message has "on the wire" when it is received, transmitted, +or displayed. Thus, a :class:`Policy` can be specified when parsing a message +to create a :class:`~email.message.Message`, and again when turning the +:class:`~email.message.Message` into some other representation. While often a +program will use the same :class:`Policy` for both input and output, the two +can be different. + +As an example, the following code could be used to read an email message from a +file on disk and pass it to the system ``sendmail`` program on a ``unix`` +system:: + + >>> from email import msg_from_binary_file + >>> from email.generator import BytesGenerator + >>> import email.policy + >>> from subprocess import Popen, PIPE + >>> with open('mymsg.txt', 'b') as f: + >>> msg = msg_from_binary_file(f, policy=email.policy.mbox) + >>> p = Popen(['sendmail', msg['To'][0].address], stdin=PIPE) + >>> g = BytesGenerator(p.stdin, email.policy.policy=SMTP) + >>> g.flatten(msg) + >>> p.stdin.close() + >>> rc = p.wait() + +Some email package methods accept a *policy* keyword argument, allowing the +policy to be overridden for that method. For example, the following code use +the :meth:`email.message.Message.as_string` method to the *msg* object from the +previous example and re-write it to a file using the native line separators for +the platform on which it is running:: + + >>> import os + >>> mypolicy = email.policy.Policy(linesep=os.linesep) + >>> with open('converted.txt', 'wb') as f: + ... f.write(msg.as_string(policy=mypolicy)) + +Policy instances are immutable, but they can be cloned, accepting the same +keyword arguments as the class constructor and returning a new :class:`Policy` +instance that is a copy of the original but with the specified attributes +values changed. For example, the following creates an SMTP policy that will +raise any defects detected as errors:: + + >>> strict_SMTP = email.policy.SMTP.clone(raise_on_defect=True) + +Policy objects can also be combined using the addition operator, producing a +policy object whose settings are a combination of the non-default values of the +summed objects:: + + >>> strict_SMTP = email.policy.SMTP + email.policy.strict + +This operation is not commutative; that is, the order in which the objects are +added matters. To illustrate:: + + >>> Policy = email.policy.Policy + >>> apolicy = Policy(max_line_length=100) + Policy(max_line_length=80) + >>> apolicy.max_line_length + 80 + >>> apolicy = Policy(max_line_length=80) + Policy(max_line_length=100) + >>> apolicy.max_line_length + 100 + + +.. class:: Policy(**kw) + + The valid constructor keyword arguments are any of the attributes listed + below. + + .. attribute:: max_line_length + + The maximum length of any line in the serialized output, not counting the + end of line character(s). Default is 78, per :rfc:`5322`. A value of + ``0`` or :const:`None` indicates that no line wrapping should be + done at all. + + .. attribute:: linesep + + The string to be used to terminate lines in serialized output. The + default is '\\n' because that's the internal end-of-line discipline used + by Python, though '\\r\\n' is required by the RFCs. See `Policy + Instances`_ for policies that use an RFC conformant linesep. Setting it + to :attr:`os.linesep` may also be useful. + + .. attribute:: must_be_7bit + + If :const:`True`, data output by a bytes generator is limited to ASCII + characters. If :const:`False` (the default), then bytes with the high + bit set are preserved and/or allowed in certain contexts (for example, + where possible a content transfer encoding of ``8bit`` will be used). + String generators act as if ``must_be_7bit`` is `True` regardless of the + policy in effect, since a string cannot represent non-ASCII bytes. + + .. attribute:: raise_on_defect + + If :const:`True`, any defects encountered will be raised as errors. If + :const:`False` (the default), defects will be passed to the + :meth:`register_defect` method. + + .. method:: handle_defect(obj, defect) + + *obj* is the object on which to register the defect. *defect* should be + an instance of a subclass of :class:`~email.errors.Defect`. + If :attr:`raise_on_defect` + is ``True`` the defect is raised as an exception. Otherwise *obj* and + *defect* are passed to :meth:`register_defect`. This method is intended + to be called by parsers when they encounter defects, and will not be + called by code that uses the email library unless that code is + implementing an alternate parser. + + .. method:: register_defect(obj, defect) + + *obj* is the object on which to register the defect. *defect* should be + a subclass of :class:`~email.errors.Defect`. This method is part of the + public API so that custom ``Policy`` subclasses can implement alternate + handling of defects. The default implementation calls the ``append`` + method of the ``defects`` attribute of *obj*. + + .. method:: clone(obj, *kw): + + Return a new :class:`Policy` instance whose attributes have the same + values as the current instance, except where those attributes are + given new values by the keyword arguments. + + +Policy Instances +................ + +The following instances of :class:`Policy` provide defaults suitable for +specific common application domains. + +.. data:: default + + An instance of :class:`Policy` with all defaults unchanged. + +.. data:: SMTP + + Output serialized from a message will conform to the email and SMTP + RFCs. The only changed attribute is :attr:`linesep`, which is set to + ``\r\n``. + +.. data:: HTTP + + Suitable for use when serializing headers for use in HTTP traffic. + :attr:`linesep` is set to ``\r\n``, and :attr:`max_line_length` is set to + :const:`None` (unlimited). + +.. data:: strict + + :attr:`raise_on_defect` is set to :const:`True`. |