Diffstat (limited to 'Lib/email/architecture.rst')
-rw-r--r-- | Lib/email/architecture.rst | 216 |
1 files changed, 216 insertions, 0 deletions
diff --git a/Lib/email/architecture.rst b/Lib/email/architecture.rst
new file mode 100644
index 0000000..80d24fe
--- /dev/null
+++ b/Lib/email/architecture.rst
@@ -0,0 +1,216 @@
+:mod:`email` Package Architecture
+=================================
+
+Overview
+--------
+
+The email package consists of three major components:
+
+    Model
+        An object structure that represents an email message, and provides an
+        API for creating, querying, and modifying a message.
+
+    Parser
+        Takes a sequence of characters or bytes and produces a model of the
+        email message represented by those characters or bytes.
+
+    Generator
+        Takes a model and turns it into a sequence of characters or bytes. The
+        sequence can either be intended for human consumption (a printable
+        unicode string) or bytes suitable for transmission over the wire. In
+        the latter case all data is properly encoded using the content transfer
+        encodings specified by the relevant RFCs.
+
+Conceptually the package is organized around the model. The model provides both
+"external" APIs intended for use by application programs using the library,
+and "internal" APIs intended for use by the Parser and Generator components.
+This division is intentionally a bit fuzzy; the API described by this documentation
+is all public, stable API. This allows for an application with special needs
+to implement its own parser and/or generator.
+
+In addition to the three major functional components, there is a fourth key
+component to the architecture:
+
+    Policy
+        An object that specifies various behavioral settings and carries
+        implementations of various behavior-controlling methods.
+
+The Policy framework provides a simple and convenient way to control the
+behavior of the library, making it possible for the library to be used in a
+very flexible fashion while leveraging the common code required to parse,
+represent, and generate message-like objects.
+For example, in addition to the default :rfc:`5322` email message policy, we
+also have a policy that manages HTTP headers in a fashion compliant with
+:rfc:`2616`. Individual policy controls, such as the maximum line length
+produced by the generator, can also be adjusted separately to meet
+specialized application requirements.
+
+
+The Model
+---------
+
+The message model is implemented by the :class:`~email.message.Message` class.
+The model divides a message into the two fundamental parts discussed by the
+RFC: the header section and the body. The ``Message`` object acts as a
+pseudo-dictionary of named headers. Its dictionary interface provides
+convenient access to individual headers by name. However, all headers are kept
+internally in an ordered list, so that the information about the order of the
+headers in the original message is preserved.
+
+The ``Message`` object also has a ``payload`` that holds the body. A
+``payload`` can be one of two things: data, or a list of ``Message`` objects.
+The latter is used to represent a multipart MIME message. Because each
+``Message`` in such a list may itself have a list payload, the structure can
+be nested arbitrarily deeply, with all terminal leaves having non-list data
+payloads.
+
+
+Message Lifecycle
+-----------------
+
+The general lifecycle of a message is:
+
+    Creation
+        A ``Message`` object can be created by a Parser, or it can be
+        instantiated as an empty message by an application.
+
+    Manipulation
+        The application may examine one or more headers and/or the
+        payload, and it may modify one or more headers and/or
+        the payload. This may be done on the top-level ``Message``
+        object, or on any sub-object.
+
+    Finalization
+        The Model is converted into a unicode or binary stream,
+        or the model is discarded.
+
+
+
+Header Policy Control During Lifecycle
+--------------------------------------
+
+One of the major controls exerted by the Policy is the management of headers
+during the ``Message`` lifecycle. Most applications don't need to be aware of
+this.
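The lifecycle above, and the claim that most applications never see the header policy machinery, can be illustrated with a minimal sketch. It uses the standard-library ``email`` API of recent Python 3 releases. The ``policy=`` keyword of ``message_from_string`` and the ``compat32`` and ``default`` policy objects come from that API, not from this document. ``touch`` is a hypothetical helper. The same application code runs under both policies. The only visible difference is the type of the fetched header value, which is chosen by the policy's header hooks.

```python
from email import message_from_string
from email.message import Message
from email.policy import compat32, default

RAW = "From: alice@example.com\nTo: bob@example.com\nSubject: hi\n\nHello, Bob.\n"


def touch(msg):
    """Hypothetical manipulation step: identical code for any policy."""
    msg["X-Reviewed"] = "yes"          # routed through policy.header_store_parse
    return msg["Subject"], msg.keys()  # Subject routed through policy.header_fetch_parse


# Creation: by a parser, once per policy...
legacy = message_from_string(RAW, policy=compat32)
modern = message_from_string(RAW, policy=default)
# ...or as an empty message instantiated by the application.
draft = Message()
draft["Subject"] = "draft"

# Manipulation.
legacy_subject, legacy_names = touch(legacy)
modern_subject, modern_names = touch(modern)

# Source header order is preserved and the new header is appended.
print(legacy_names)  # ['From', 'To', 'Subject', 'X-Reviewed']
# compat32 returns a plain str; the default policy returns a str subclass
# (a header object) that still compares equal to the plain text.
print(type(legacy_subject) is str, type(modern_subject) is str, modern_subject == "hi")

# Finalization: serialize the model back to text.
print(legacy.as_string())
```

Because both header objects and plain strings are ``str`` instances, code that only compares or prints header values does not need to know which policy is in effect.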
+
+A header enters the model in one of two ways: via a Parser, or by being set to
+a specific value by an application program after the Model already exists.
+Similarly, a header exits the model in one of two ways: by being serialized by
+a Generator, or by being retrieved from a Model by an application program. The
+Policy object provides hooks for all four of these pathways.
+
+The model storage for headers is a list of (name, value) tuples.
+
+The Parser identifies headers during parsing, and passes them to the
+:meth:`~email.policy.Policy.header_source_parse` method of the Policy. The
+result of that method is the (name, value) tuple to be stored in the model.
+
+When an application program supplies a header value (for example, through the
+``Message`` object ``__setitem__`` interface), the name and the value are passed to
+the :meth:`~email.policy.Policy.header_store_parse` method of the Policy, which
+returns the (name, value) tuple to be stored in the model.
+
+When an application program retrieves a header (through any of the dict or list
+interfaces of ``Message``), the name and value are passed to the
+:meth:`~email.policy.Policy.header_fetch_parse` method of the Policy to
+obtain the value returned to the application.
+
+When a Generator requests a header during serialization, the name and value are
+passed to the :meth:`~email.policy.Policy.fold` method of the Policy, which
+returns a string containing line breaks in the appropriate places. The
+:attr:`~email.policy.Policy.cte_type` Policy control determines whether or
+not Content Transfer Encoding is performed on the data in the header. There is
+also a :meth:`~email.policy.Policy.binary_fold` method for use by generators
+that produce binary output, which returns the folded header as binary data,
+possibly folded at different places than the corresponding string would be.
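The four pathways can be observed by wrapping each hook. The sketch below is an illustration, not part of the package, and it rests on several assumptions. It subclasses the released ``email.policy.Compat32`` class, in which the binary hook is spelled ``fold_binary``; that hook is not traced here. It records calls in a hypothetical module-level ``CALLS`` list, because policy instances are read-only. It defers to the inherited implementations for the actual work. ``TracingPolicy`` is likewise a hypothetical name.

```python
from email import message_from_string
from email.policy import Compat32

# Hypothetical trace log; policy instances reject attribute assignment,
# so the log lives at module level instead of on the policy.
CALLS = []


class TracingPolicy(Compat32):
    """Hypothetical Compat32 subclass that records which header hook runs."""

    def header_source_parse(self, sourcelines):
        name, value = super().header_source_parse(sourcelines)
        CALLS.append(("header_source_parse", name))
        return name, value

    def header_store_parse(self, name, value):
        CALLS.append(("header_store_parse", name))
        return super().header_store_parse(name, value)

    def header_fetch_parse(self, name, value):
        CALLS.append(("header_fetch_parse", name))
        return super().header_fetch_parse(name, value)

    def fold(self, name, value):
        CALLS.append(("fold", name))
        return super().fold(name, value)


policy = TracingPolicy()
msg = message_from_string("Subject: hi\n\nbody\n", policy=policy)  # in via Parser
msg["X-Tag"] = "demo"     # in via the application
subject = msg["Subject"]  # out via the application
text = msg.as_string()    # out via the Generator

for hook, name in CALLS:
    print(hook, name)
```

Each pathway should appear in the log at least once. The Parser or Generator may also look up other headers (for example ``Content-Type``) while working, so the exact sequence can vary between Python versions.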
+
+
+Handling Binary Data
+--------------------
+
+In an ideal world all message data would conform to the RFCs, meaning that the
+parser could decode the message into the idealized unicode message that the
+sender originally wrote. In the real world, the email package must also be
+able to deal with badly formatted messages, including messages containing
+non-ASCII characters that either have no indicated character set or are not
+valid characters in the indicated character set.
+
+Since email messages are *primarily* text data, and operations on message data
+are primarily text operations (except for binary payloads, of course), the model
+stores all text data as unicode strings. Undecodable binary data inside text
+is handled by using the ``surrogateescape`` error handler of the ASCII
+codec. As with the binary filenames that the error handler was originally
+introduced to handle, this allows the email package to "carry" the binary data
+received during parsing along until the output stage, at which time it is
+regenerated in its original form.
+
+This carried binary data is almost entirely an implementation detail. The one
+place where it is visible is the "internal" API. A Parser must decode binary
+input data using the ``surrogateescape`` error handler, and pass the result to
+the appropriate Policy method. The "internal" interface used by the Generator
+to access header values preserves the surrogateescaped bytes. All other
+interfaces convert the binary data either back into bytes or into a safe form
+(losing information in some cases).
+
+
+Backward Compatibility
+----------------------
+
+The :class:`~email.policy.Compat32` Policy provides backward
+compatibility with version 5.1 of the email package.
+It does this via the following implementation of the four Policy methods
+described above, plus ``binary_fold``:
+
+header_source_parse
+    Splits the first line on the colon to obtain the name, discards any spaces
+    after the colon, and joins the remainder of the line with all of the
+    remaining lines, preserving the linesep characters, to obtain the value.
+    Trailing carriage return and/or linefeed characters are stripped from the
+    resulting value string.
+
+header_store_parse
+    Returns the name and value exactly as received from the application.
+
+header_fetch_parse
+    If the value contains any surrogateescaped binary data, returns the value
+    as a :class:`~email.header.Header` object, using the character set
+    ``unknown-8bit``. Otherwise, returns the value unchanged.
+
+fold
+    Uses :class:`~email.header.Header`'s folding to fold headers in the
+    same way the email 5.1 generator did.
+
+binary_fold
+    Same as fold, but encodes the result to bytes using ``ascii``.
+
+
+New Algorithm
+-------------
+
+header_source_parse
+    Same as legacy behavior.
+
+header_store_parse
+    Same as legacy behavior.
+
+header_fetch_parse
+    If the value is already a header object, returns it. Otherwise, parses the
+    value using the new parser, and returns the resulting object as the value.
+    Surrogateescaped bytes get turned into unicode unknown character code
+    points.
+
+fold
+    Uses the new header folding algorithm, respecting the policy settings.
+    Surrogateescaped bytes are encoded using the ``unknown-8bit`` charset for
+    ``cte_type=7bit`` or ``8bit``. Returns a string.
+
+    At some point there will also be a ``cte_type=unicode``, and for that
+    policy fold will serialize the idealized unicode message with RFC-like
+    folding, converting any surrogateescaped bytes into the unicode
+    unknown character glyph.
+
+binary_fold
+    Uses the new header folding algorithm, respecting the policy settings.
+    Surrogateescaped bytes are encoded using the ``unknown-8bit`` charset for
+    ``cte_type=7bit``, and get turned back into bytes for ``cte_type=8bit``.
+    Returns bytes.
+
+    At some point there will also be a ``cte_type=unicode``, and for that
+    policy binary_fold will serialize the message according to :rfc:`5335`.
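The legacy and new behaviors for carried binary data can be compared directly. This sketch has a few assumptions:

- It targets Python 3.6 or later, for ``as_bytes`` and for the header objects produced by the ``default`` policy.
- It calls the released hook name ``fold_binary`` in place of ``binary_fold``.
- Its input is a deliberately non-conformant header containing a raw ``0xE9`` byte with no declared charset.

The new policy's replacement of the escaped byte with U+FFFD is what the released library does; it is not a detail stated in this document.

```python
from email import message_from_bytes
from email.header import Header
from email.policy import compat32, default

# A header carrying a raw 0xE9 byte with no declared charset (not RFC-conformant).
RAW = b"Subject: caf\xe9\n\nbody\n"

legacy = message_from_bytes(RAW, policy=compat32)
modern = message_from_bytes(RAW, policy=default)

# header_fetch_parse: compat32 wraps surrogateescaped data in a Header object
# (charset unknown-8bit); the new policy returns a header string in which the
# undecodable byte has become U+FFFD, the Unicode replacement character.
legacy_subject = legacy["Subject"]
modern_subject = modern["Subject"]

# With cte_type='8bit' (the default for both policies), binary folding turns
# the carried data back into the original bytes, so the message round-trips.
legacy_bytes = legacy.as_bytes()
modern_bytes = modern.as_bytes()

# With cte_type='7bit', the same surrogateescaped value is encoded using the
# unknown-8bit charset instead.
escaped = "caf\udce9"  # what the parser stores for b"caf\xe9"
legacy_7bit = compat32.clone(cte_type="7bit").fold_binary("Subject", escaped)
modern_7bit = default.clone(cte_type="7bit").fold_binary("Subject", escaped)

print(type(legacy_subject).__name__, ascii(modern_subject))
print(legacy_bytes == RAW, modern_bytes == RAW)
print(legacy_7bit)
print(modern_7bit)
```

The ``clone`` calls leave the shared ``compat32`` and ``default`` objects untouched. Each call returns a new policy with only ``cte_type`` changed.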