author    | R David Murray <rdmurray@bitdance.com> | 2012-05-25 19:01:48 (GMT)
committer | R David Murray <rdmurray@bitdance.com> | 2012-05-25 19:01:48 (GMT)
commit    | c27e52265b7ff4aa57dc357c289cce8c9dd0fec3
tree      | b2a25260b0aa89d0a4db3c0d2f91c8cb5e68d51a /Doc
parent    | 9242c1378f77214f5b9b90149861cb13ca986fb0
#14731: refactor email policy framework.
This patch primarily does two things: (1) it adds some internal-interface
methods to Policy that allow for Policy to control the parsing and folding of
headers in such a way that we can construct a backward compatibility policy
that is 100% compatible with the 3.2 API, while allowing a new policy to
implement the email6 API. (2) it adds that backward compatibility policy and
refactors the test suite so that the only differences between the 3.2
test_email.py file and the 3.3 test_email.py file are some small changes in the
test framework and the addition of tests for fixed bugs that apply to the 3.2
API.
There are some additional tweaks, such as moving just the code needed for the
compatibility policy into _policybase, so that the library code can import
only _policybase. That way the new code that will be added for email6
will only get imported when a non-compatibility policy is imported.
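This patch adds header hooks to `Policy`, and they can be exercised directly. The sketch below assumes a current CPython 3 release, where `email.policy.compat32` is still the parser's default policy. It parses a folded header, then calls `compat32.header_source_parse` by hand to show the `(name, value)` pair that gets stored.

```python
import email
from email import policy

# A message whose Subject header is folded across two source lines.
raw = "Subject: hello\n world\n\nbody\n"

# No policy argument: the parser attaches the default compat32 policy,
# and the resulting Message keeps a reference to it.
msg = email.message_from_string(raw)
print(msg.policy is policy.compat32)  # True

# The hook the parser calls for each header: the name is everything up to
# ':', the value has leading whitespace stripped and trailing CR/LF removed.
# Compat32 keeps the internal line break, as Python 3.2 did.
name, value = policy.compat32.header_source_parse(["Subject: hello\n", " world\n"])
print(repr(name), repr(value))  # 'Subject' 'hello\n world'

# For plain ASCII values, compat32's header_fetch_parse returns the
# stored value unchanged.
print(repr(msg["Subject"]))  # 'hello\n world'
```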
Diffstat (limited to 'Doc')
-rw-r--r-- | Doc/library/email.generator.rst | 38
-rw-r--r-- | Doc/library/email.policy.rst    | 290
2 files changed, 226 insertions, 102 deletions
diff --git a/Doc/library/email.generator.rst b/Doc/library/email.generator.rst
index 03733ee..73440b8 100644
--- a/Doc/library/email.generator.rst
+++ b/Doc/library/email.generator.rst
@@ -32,8 +32,7 @@ Here are the public methods of the :class:`Generator` class, imported from the
 :mod:`email.generator` module:


-.. class:: Generator(outfp, mangle_from_=True, maxheaderlen=78, *, \
-                     policy=policy.default)
+.. class:: Generator(outfp, mangle_from_=True, maxheaderlen=78, *, policy=None)

    The constructor for the :class:`Generator` class takes a :term:`file-like object`
    called *outfp* for an argument. *outfp* must support the :meth:`write` method
@@ -55,8 +54,9 @@ Here are the public methods of the :class:`Generator` class, imported from the
    The default is 78, as recommended (but not required) by :rfc:`2822`.

    The *policy* keyword specifies a :mod:`~email.policy` object that controls a
-   number of aspects of the generator's operation. The default policy
-   maintains backward compatibility.
+   number of aspects of the generator's operation. If no *policy* is specified,
+   then the *policy* attached to the message object passed to :attr:``flatten``
+   is used.

    .. versionchanged:: 3.3
       Added the *policy* keyword.
@@ -80,19 +80,19 @@ Here are the public methods of the :class:`Generator` class, imported from the

       Optional *linesep* specifies the line separator character used to
       terminate lines in the output. If specified it overrides the value
-      specified by the ``Generator``\'s ``policy``.
+      specified by the *msg*\'s or ``Generator``\'s ``policy``.

-      Because strings cannot represent non-ASCII bytes, ``Generator`` ignores
-      the value of the :attr:`~email.policy.Policy.must_be_7bit`
-      :mod:`~email.policy` setting and operates as if it were set ``True``.
-      This means that messages parsed with a Bytes parser that have a
-      :mailheader:`Content-Transfer-Encoding` of 8bit will be converted to a
-      use a 7bit Content-Transfer-Encoding. Non-ASCII bytes in the headers
-      will be :rfc:`2047` encoded with a charset of `unknown-8bit`.
+      Because strings cannot represent non-ASCII bytes, if the policy that
+      applies when ``flatten`` is run has :attr:`~email.policy.Policy.cte_type`
+      set to ``8bit``, ``Generator`` will operate as if it were set to
+      ``7bit``. This means that messages parsed with a Bytes parser that have
+      a :mailheader:`Content-Transfer-Encoding` of ``8bit`` will be converted
+      to a use a ``7bit`` Content-Transfer-Encoding. Non-ASCII bytes in the
+      headers will be :rfc:`2047` encoded with a charset of ``unknown-8bit``.

       .. versionchanged:: 3.2
-         Added support for re-encoding 8bit message bodies, and the *linesep*
-         argument.
+         Added support for re-encoding ``8bit`` message bodies, and the
+         *linesep* argument.

    .. method:: clone(fp)

@@ -149,13 +149,13 @@ formatted string representation of a message object. For more detail, see
       at *msg* to the output file specified when the :class:`BytesGenerator`
       instance was created. Subparts are visited depth-first and the resulting
       text will be properly MIME encoded. If the :mod:`~email.policy` option
-      :attr:`~email.policy.Policy.must_be_7bit` is ``False`` (the default),
+      :attr:`~email.policy.Policy.cte_type` is ``8bit`` (the default),
       then any bytes with the high bit set in the original parsed message that
       have not been modified will be copied faithfully to the output. If
-      ``must_be_7bit`` is true, the bytes will be converted as needed using an
-      ASCII content-transfer-encoding. In particular, RFC-invalid non-ASCII
-      bytes in headers will be encoded using the MIME ``unknown-8bit``
-      character set, thus rendering them RFC-compliant.
+      ``cte_type`` is ``7bit``, the bytes will be converted as needed
+      using an ASCII-compatible Content-Transfer-Encoding. In particular,
+      RFC-invalid non-ASCII bytes in headers will be encoded using the MIME
+      ``unknown-8bit`` character set, thus rendering them RFC-compliant.

       .. XXX: There should be a complementary option that just does the RFC
          compliance transformation but leaves CTE 8bit parts alone.
diff --git a/Doc/library/email.policy.rst b/Doc/library/email.policy.rst
index d9a292c..73cfba1 100644
--- a/Doc/library/email.policy.rst
+++ b/Doc/library/email.policy.rst
@@ -23,81 +23,100 @@ A :class:`Policy` object encapsulates a set of attributes and methods that
 control the behavior of various components of the email package during use.
 :class:`Policy` instances can be passed to various classes and methods in the
 email package to alter the default behavior. The settable values and their
-defaults are described below. The :mod:`policy` module also provides some
-pre-created :class:`Policy` instances. In addition to a :const:`default`
-instance, there are instances tailored for certain applications. For example
-there is an :const:`SMTP` :class:`Policy` with defaults appropriate for
-generating output to be sent to an SMTP server. These are listed `below
-<Policy Instances>`.
-
-In general an application will only need to deal with setting the policy at the
-input and output boundaries. Once parsed, a message is represented by a
-:class:`~email.message.Message` object, which is designed to be independent of
-the format that the message has "on the wire" when it is received, transmitted,
-or displayed. Thus, a :class:`Policy` can be specified when parsing a message
-to create a :class:`~email.message.Message`, and again when turning the
-:class:`~email.message.Message` into some other representation. While often a
-program will use the same :class:`Policy` for both input and output, the two
-can be different.
+defaults are described below.
+
+There is a default policy used by all classes in the email package. This
+policy is named :class:`Compat32`, with a corresponding pre-defined instance
+named :const:`compat32`. It provides for complete backward compatibility (in
+some cases, including bug compatibility) with the pre-Python3.3 version of the
+email package.
+
+The first part of this documentation covers the features of :class:`Policy`, an
+:term:`abstract base class` that defines the features that are common to all
+policy objects, including :const:`compat32`. This includes certain hook
+methods that are called internally by the email package, which a custom policy
+could override to obtain different behavior.
+
+When a :class:`~email.message.Message` object is created, it acquires a policy.
+By default this will be :const:`compat32`, but a different policy can be
+specified. If the ``Message`` is created by a :mod:`~email.parser`, a policy
+passed to the parser will be the policy used by the ``Message`` it creates. If
+the ``Message`` is created by the program, then the policy can be specified
+when it is created. When a ``Message`` is passed to a :mod:`~email.generator`,
+the generator uses the policy from the ``Message`` by default, but you can also
+pass a specific policy to the generator that will override the one stored on
+the ``Message`` object.
+
+:class:`Policy` instances are immutable, but they can be cloned, accepting the
+same keyword arguments as the class constructor and returning a new
+:class:`Policy` instance that is a copy of the original but with the specified
+attributes values changed.

 As an example, the following code could be used to read an email message from a
 file on disk and pass it to the system ``sendmail`` program on a Unix system::

     >>> from email import msg_from_binary_file
     >>> from email.generator import BytesGenerator
-    >>> import email.policy
     >>> from subprocess import Popen, PIPE
     >>> with open('mymsg.txt', 'b') as f:
-    ...     msg = msg_from_binary_file(f, policy=email.policy.mbox)
+    ...     msg = msg_from_binary_file(f)
     >>> p = Popen(['sendmail', msg['To'][0].address], stdin=PIPE)
-    >>> g = BytesGenerator(p.stdin, policy=email.policy.SMTP)
+    >>> g = BytesGenerator(p.stdin, policy=msg.policy.clone(linesep='\r\n'))
     >>> g.flatten(msg)
     >>> p.stdin.close()
     >>> rc = p.wait()

-.. XXX email.policy.mbox/MBOX does not exist yet
+Here we are telling :class:`~email.generator.BytesGenerator` to use the RFC
+correct line separator characters when creating the binary string to feed into
+``sendmail's`` ``stdin``, where the default policy would use ``\n`` line
+separators.

 Some email package methods accept a *policy* keyword argument, allowing the
 policy to be overridden for that method. For example, the following code uses
-the :meth:`~email.message.Message.as_string` method of the *msg* object from the
-previous example and re-write it to a file using the native line separators for
-the platform on which it is running::
+the :meth:`~email.message.Message.as_string` method of the *msg* object from
+the previous example and writes the message to a file using the native line
+separators for the platform on which it is running::

     >>> import os
-    >>> mypolicy = email.policy.Policy(linesep=os.linesep)
     >>> with open('converted.txt', 'wb') as f:
-    ...     f.write(msg.as_string(policy=mypolicy))
-
-Policy instances are immutable, but they can be cloned, accepting the same
-keyword arguments as the class constructor and returning a new :class:`Policy`
-instance that is a copy of the original but with the specified attributes
-values changed. For example, the following creates an SMTP policy that will
-raise any defects detected as errors::
-
-    >>> strict_SMTP = email.policy.SMTP.clone(raise_on_defect=True)
+    ...     f.write(msg.as_string(policy=msg.policy.clone(linesep=os.linesep))

 Policy objects can also be combined using the addition operator, producing a
 policy object whose settings are a combination of the non-default values of the
 summed objects::

-    >>> strict_SMTP = email.policy.SMTP + email.policy.strict
+    >>> compat_SMTP = email.policy.clone(linesep='\r\n')
+    >>> compat_strict = email.policy.clone(raise_on_defect=True)
+    >>> compat_strict_SMTP = compat_SMTP + compat_strict

 This operation is not commutative; that is, the order in which the objects are
 added matters. To illustrate::

-    >>> Policy = email.policy.Policy
-    >>> apolicy = Policy(max_line_length=100) + Policy(max_line_length=80)
+    >>> policy100 = compat32.clone(max_line_length=100)
+    >>> policy80 = compat32.clone(max_line_length=80)
+    >>> apolicy = policy100 + Policy80
     >>> apolicy.max_line_length
     80
-    >>> apolicy = Policy(max_line_length=80) + Policy(max_line_length=100)
+    >>> apolicy = policy80 + policy100
     >>> apolicy.max_line_length
     100


 .. class:: Policy(**kw)

-   The valid constructor keyword arguments are any of the attributes listed
-   below.
+   This is the :term:`abstract base class` for all policy classes. It provides
+   default implementations for a couple of trivial methods, as well as the
+   implementation of the immutability property, the :meth:`clone` method, and
+   the constructor semantics.
+
+   The constructor of a policy class can be passed various keyword arguments.
+   The arguments that may be specified are any non-method properties on this
+   class, plus any additional non-method properties on the concrete class. A
+   value specified in the constructor will override the default value for the
+   corresponding attribute.
+
+   This class defines the following properties, and thus values for the
+   following may be passed in the constructor of any policy class:

    .. attribute:: max_line_length

@@ -110,18 +129,28 @@ added matters. To illustrate::

       The string to be used to terminate lines in serialized output.
       The default is ``\n`` because that's the internal end-of-line discipline used
-      by Python, though ``\r\n`` is required by the RFCs. See `Policy
-      Instances`_ for policies that use an RFC conformant linesep. Setting it
-      to :attr:`os.linesep` may also be useful.
+      by Python, though ``\r\n`` is required by the RFCs.
+
+   .. attribute:: cte_type

-   .. attribute:: must_be_7bit
+      Controls the type of Content Transfer Encodings that may be or are
+      required to be used. The possible values are:

-      If ``True``, data output by a bytes generator is limited to ASCII
-      characters. If :const:`False` (the default), then bytes with the high
-      bit set are preserved and/or allowed in certain contexts (for example,
-      where possible a content transfer encoding of ``8bit`` will be used).
-      String generators act as if ``must_be_7bit`` is ``True`` regardless of
-      the policy in effect, since a string cannot represent non-ASCII bytes.
+      ======== ===============================================================
+      ``7bit`` all data must be "7 bit clean" (ASCII-only). This means that
+               where necessary data will be encoded using either
+               quoted-printable or base64 encoding.
+
+      ``8bit`` data is not constrained to be 7 bit clean. Data in headers is
+               still required to be ASCII-only and so will be encoded (see
+               'binary_fold' below for an exception), but body parts may use
+               the ``8bit`` CTE.
+      ======== ===============================================================
+
+      A ``cte_type`` value of ``8bit`` only works with ``BytesGenerator``, not
+      ``Generator``, because strings cannot contain binary data. If a
+      ``Generator`` is operating under a policy that specifies
+      ``cte_type=8bit``, it will act as if ``cte_type`` is ``7bit``.

    .. attribute:: raise_on_defect

@@ -129,56 +158,151 @@ added matters. To illustrate::
       :const:`False` (the default), defects will be passed to the
       :meth:`register_defect` method.

-   :mod:`Policy` object also have the following methods:
+   The following :class:`Policy` method is intended to be called by code using
+   the email library to create policy instances with custom settings:
+
+   .. method:: clone(**kw)
+
+      Return a new :class:`Policy` instance whose attributes have the same
+      values as the current instance, except where those attributes are
+      given new values by the keyword arguments.
+
+   The remaining :class:`Policy` methods are called by the email package code,
+   and are not intended to be called by an application using the email package.
+   A custom policy must implement all of these methods.

    .. method:: handle_defect(obj, defect)

-      *obj* is the object on which to register the defect. *defect* should be
-      an instance of a subclass of :class:`~email.errors.Defect`.
-      If :attr:`raise_on_defect`
-      is ``True`` the defect is raised as an exception. Otherwise *obj* and
-      *defect* are passed to :meth:`register_defect`. This method is intended
-      to be called by parsers when they encounter defects, and will not be
-      called by code that uses the email library unless that code is
-      implementing an alternate parser.
+      Handle a *defect* found on *obj*. When the email package calls this
+      method, *defect* will always be a subclass of
+      :class:`~email.errors.Defect`.
+
+      The default implementation checks the :attr:`raise_on_defect` flag. If
+      it is ``True``, *defect* is raised as an exception. If it is ``False``
+      (the default), *obj* and *defect* are passed to :meth:`register_defect`.

    .. method:: register_defect(obj, defect)

-      *obj* is the object on which to register the defect. *defect* should be
-      a subclass of :class:`~email.errors.Defect`. This method is part of the
-      public API so that custom ``Policy`` subclasses can implement alternate
-      handling of defects. The default implementation calls the ``append``
-      method of the ``defects`` attribute of *obj*.
+      Register a *defect* on *obj*. In the email package, *defect* will always
+      be a subclass of :class:`~email.errors.Defect`.

-   .. method:: clone(obj, *kw)
+      The default implementation calls the ``append`` method of the ``defects``
+      attribute of *obj*. When the email package calls :attr:`handle_defect`,
+      *obj* will normally have a ``defects`` attribute that has an ``append``
+      method. Custom object types used with the email package (for example,
+      custom ``Message`` objects) should also provide such an attribute,
+      otherwise defects in parsed messages will raise unexpected errors.

-      Return a new :class:`Policy` instance whose attributes have the same
-      values as the current instance, except where those attributes are
-      given new values by the keyword arguments.
+   .. method:: header_source_parse(sourcelines)
+
+      The email package calls this method with a list of strings, each string
+      ending with the line separation characters found in the source being
+      parsed. The first line includes the field header name and separator.
+      All whitespace in the source is preserved. The method should return the
+      ``(name, value)`` tuple that is to be stored in the ``Message`` to
+      represent the parsed header.
+
+      If an implementation wishes to retain compatibility with the existing
+      email package policies, *name* should be the case preserved name (all
+      characters up to the '``:``' separator), while *value* should be the
+      unfolded value (all line separator characters removed, but whitespace
+      kept intact), stripped of leading whitespace.
+
+      *sourcelines* may contain surrogateescaped binary data.
+
+      There is no default implementation
+
+   .. method:: header_store_parse(name, value)
+
+      The email package calls this method with the name and value provided by
+      the application program when the application program is modifying a
+      ``Message`` programmatically (as opposed to a ``Message`` created by a
+      parser). The method should return the ``(name, value)`` tuple that is to
+      be stored in the ``Message`` to represent the header.
+
+      If an implementation wishes to retain compatibility with the existing
+      email package policies, the *name* and *value* should be strings or
+      string subclasses that do not change the content of the passed in
+      arguments.
+
+      There is no default implementation
+
+   .. method:: header_fetch_parse(name, value)
+
+      The email package calls this method with the *name* and *value* currently
+      stored in the ``Message`` when that header is requested by the
+      application program, and whatever the method returns is what is passed
+      back to the application as the value of the header being retrieved.
+      Note that there may be more than one header with the same name stored in
+      the ``Message``; the method is passed the specific name and value of the
+      header destined to be returned to the application.
+
+      *value* may contain surrogateescaped binary data. There should be no
+      surrogateescaped binary data in the value returned by the method.
+
+      There is no default implementation
+
+   .. method:: fold(name, value)
+
+      The email package calls this method with the *name* and *value* currently
+      stored in the ``Message`` for a given header. The method should return a
+      string that represents that header "folded" correctly (according to the
+      policy settings) by composing the *name* with the *value* and inserting
+      :attr:`linesep` characters at the appropriate places. See :rfc:`5322`
+      for a discussion of the rules for folding email headers.
+
+      *value* may contain surrogateescaped binary data. There should be no
+      surrogateescaped binary data in the string returned by the method.
+
+   .. method:: fold_binary(name, value)
+
+      The same as :meth:`fold`, except that the returned value should be a
+      bytes object rather than a string.
+
+      *value* may contain surrogateescaped binary data. These could be
+      converted back into binary data in the returned bytes object.
+
+
+.. class:: Compat32(**kw)
+
+   This concrete :class:`Policy` is the backward compatibility policy. It
+   replicates the behavior of the email package in Python 3.2. The
+   :mod:`policy` module also defines an instance of this class,
+   :const:`compat32`, that is used as the default policy. Thus the default
+   behavior of the email package is to maintain compatibility with Python 3.2.
+   The class provides the following concrete implementations of the
+   abstract methods of :class:`Policy`:

-Policy Instances
-^^^^^^^^^^^^^^^^
+   .. method:: header_source_parse(sourcelines)

-The following instances of :class:`Policy` provide defaults suitable for
-specific common application domains.
+      The name is parsed as everything up to the '``:``' and returned
+      unmodified. The value is determined by stripping leading whitespace off
+      the remainder of the first line, joining all subsequent lines together,
+      and stripping any trailing carriage return or linefeed characters.

-.. data:: default
+   .. method:: header_store_parse(name, value)

-   An instance of :class:`Policy` with all defaults unchanged.
+      The name and value are returned unmodified.

-.. data:: SMTP
+   .. method:: header_fetch_parse(name, value)

-   Output serialized from a message will conform to the email and SMTP
-   RFCs. The only changed attribute is :attr:`linesep`, which is set to
-   ``\r\n``.
+      If the value contains binary data, it is converted into a
+      :class:`~email.header.Header` object using the ``unknown-8bit`` charset.
+      Otherwise it is returned unmodified.

-.. data:: HTTP
+   .. method:: fold(name, value)

-   Suitable for use when serializing headers for use in HTTP traffic.
-   :attr:`linesep` is set to ``\r\n``, and :attr:`max_line_length` is set to
-   :const:`None` (unlimited).
+      Headers are folded using the :class:`~email.header.Header` folding
+      algorithm, which preserves existing line breaks in the value, and wraps
+      each resulting line to the ``max_line_length``. Non-ASCII binary data are
+      CTE encoded using the ``unknown-8bit`` charset.

-.. data:: strict
+   .. method:: fold_binary(name, value)

-   :attr:`raise_on_defect` is set to :const:`True`.
+      Headers are folded using the :class:`~email.header.Header` folding
+      algorithm, which preserves existing line breaks in the value, and wraps
+      each resulting line to the ``max_line_length``. If ``cte_type`` is
+      ``7bit``, non-ascii binary data is CTE encoded using the ``unknown-8bit``
+      charset. Otherwise the original source header is used, with its existing
+      line breaks and and any (RFC invalid) binary data it may contain.