From 5e634638e623e25aeb84d82e0b89891173a0a5f7 Mon Sep 17 00:00:00 2001
From: Barry Warsaw
Date: Wed, 26 Sep 2001 05:23:47 +0000
Subject: The email package documentation, currently organized the way I think Fred prefers. I'm not sure I like this organization, so it may change.

---
 Doc/lib/email.tex | 358 ++++++++++++++++++++++++++++++++++++++++++
 Doc/lib/emailencoders.tex | 53 +++++++
 Doc/lib/emailexc.tex | 53 +++++++
 Doc/lib/emailgenerator.tex | 68 ++++++++
 Doc/lib/emailiter.tex | 37 +++++
 Doc/lib/emailmessage.tex | 384 +++++++++++++++++++++++++++++++++++++++++++++
 Doc/lib/emailparser.tex | 96 ++++++++++++
 Doc/lib/emailutil.tex | 119 ++++++++++++++
 8 files changed, 1168 insertions(+)
 create mode 100644 Doc/lib/email.tex
 create mode 100644 Doc/lib/emailencoders.tex
 create mode 100644 Doc/lib/emailexc.tex
 create mode 100644 Doc/lib/emailgenerator.tex
 create mode 100644 Doc/lib/emailiter.tex
 create mode 100644 Doc/lib/emailmessage.tex
 create mode 100644 Doc/lib/emailparser.tex
 create mode 100644 Doc/lib/emailutil.tex

diff --git a/Doc/lib/email.tex b/Doc/lib/email.tex
new file mode 100644
index 0000000..eba0684
--- /dev/null
+++ b/Doc/lib/email.tex
@@ -0,0 +1,358 @@
+% Copyright (C) 2001 Python Software Foundation
+% Author: barry@zope.com (Barry Warsaw)
+
+\section{\module{email} --
+         An email and MIME handling package}
+
+\declaremodule{standard}{email}
+\modulesynopsis{Package supporting the parsing, manipulating, and
+    generating of email messages, including MIME documents.}
+\moduleauthor{Barry A. Warsaw}{barry@zope.com}
+
+\versionadded{2.2}
+
+The \module{email} package is a library for managing email messages,
+including MIME and other \rfc{2822}-based message documents. It
+subsumes most of the functionality in several older standard modules
+such as \module{rfc822}, \module{mimetools}, \module{multifile}, and
+other non-standard packages such as \module{mimecntl}.
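The parse, manipulate, and regenerate cycle that this package is built around can be sketched in a few lines. This sketch assumes a modern Python 3 interpreter, where the package's modules have since been renamed (for example \module{email.parser} and \module{email.message} rather than \module{email.Parser} and \module{email.Message}); it is not the 2.2 API documented here, and the header names and message text are made up for illustration.

```python
from email import message_from_string

# A tiny, made-up RFC 2822 message: two headers, a blank line, a body.
RAW = "Subject: hello\nX-Tag: old\n\nBody text.\n"

# 1. Flat text -> object model.
msg = message_from_string(RAW)

# 2. Manipulate the object model: replace one header.
del msg["X-Tag"]
msg["X-Tag"] = "new"

# 3. Object model -> flat text.
flat = msg.as_string()

# Parsing and regenerating an unmodified message reproduces the input.
unchanged = message_from_string(RAW).as_string()

print(flat)
```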
+
+The primary distinguishing feature of the \module{email} package is
+that it splits the parsing and generating of email messages from the
+internal \emph{object model} representation of email. Applications
+using the \module{email} package deal primarily with objects; you can
+add sub-objects to messages, remove sub-objects from messages,
+completely re-arrange the contents, etc. There is a separate parser
+and a separate generator which handle the transformation from flat
+text to the object model, and then back to flat text again. There
+are also handy subclasses for some common MIME object types, and a few
+miscellaneous utilities that help with such common tasks as extracting
+and parsing message field values, creating RFC-compliant dates, etc.
+
+The following sections describe the functionality of the
+\module{email} package. The ordering follows a progression that
+should be common in applications: an email message is read as flat
+text from a file or other source, the text is parsed to produce an
+object model representation of the email message, this model is
+manipulated, and finally the model is rendered back into
+flat text.
+
+It is perfectly feasible to create the object model out of whole
+cloth, i.e. completely from scratch. From there, the rest of the
+progression above (manipulating the model and rendering it back
+into flat text) applies unchanged.
+
+Also included are detailed specifications of all the classes and
+modules that the \module{email} package provides, the exception
+classes you might encounter while using the \module{email} package,
+some auxiliary utilities, and a few examples. For users of the older
+\module{mimelib} package, from which the \module{email} package is
+descended, a section on differences and porting is provided.
+
+\subsection{Representing an email message}
+
+The primary object in the \module{email} package is the
+\class{Message} class, provided in the \refmodule{email.Message}
+module. \class{Message} is the base class for the \module{email}
+object model.
+It provides the core functionality for setting and
+querying header fields, and for accessing message bodies.
+
+Conceptually, a \class{Message} object consists of \emph{headers} and
+\emph{payloads}. Headers are \rfc{2822} style field names and
+values, where the field name and value are separated by a colon. The
+colon is not part of either the field name or the field value.
+
+Headers are stored and returned in case-preserving form but are
+matched case-insensitively. There may also be a single
+\emph{Unix-From} header, also known as the envelope header or the
+\code{From_} header. The payload is either a string in the case of
+simple message objects, a list of \class{Message} objects for
+multipart MIME documents, or a single \class{Message} instance for
+\code{message/rfc822} type objects.
+
+\class{Message} objects provide a mapping style interface for
+accessing the message headers, and an explicit interface for accessing
+both the headers and the payload. They also provide convenience
+methods for generating a flat text representation of the message
+object tree, for accessing commonly used header parameters, and for
+recursively walking over the object tree.
+
+\subsection{Parsing email messages}
+Message object trees can be created in one of two ways: they can be
+created from whole cloth by instantiating \class{Message} objects and
+stringing them together via \method{add_payload()} and
+\method{set_payload()} calls, or they can be created by parsing a flat text
+representation of the email message.
+
+The \module{email} package provides a standard parser that understands
+most email document structures, including MIME documents. You can
+pass the parser a string or a file object, and the parser will return
+to you the root \class{Message} instance of the object tree. For
+simple, non-MIME messages, the payload of this root object will likely
+be a string (e.g. containing the text of the message).
+For MIME messages, the root object will return 1 from its
+\method{is_multipart()} method, and the subparts can be accessed via
+the \method{get_payload()} and \method{walk()} methods.
+
+Note that the parser can be extended in limited ways, and of course
+you can implement your own parser completely from scratch. There is
+no magical connection between the \module{email} package's bundled
+parser and the \class{Message} class, so your custom parser can create
+message object trees in any way it finds necessary. The \module{email}
+package's parser is described in detail in the \refmodule{email.Parser}
+module documentation.
+
+\subsection{Generating MIME documents}
+One of the most common tasks is to generate the flat text of the email
+message represented by a message object tree. You will need to do
+this if you want to send your message via the \refmodule{smtplib}
+module or the \refmodule{nntplib} module, or print the message on the
+console. Taking a message object tree and producing a flat text
+document is the job of the \refmodule{email.Generator} module.
+
+Again, as with the \refmodule{email.Parser} module, you aren't limited
+to the functionality of the bundled generator; you could write one
+from scratch yourself. However, the bundled generator knows how to
+generate most email in a standards-compliant way, should handle MIME
+and non-MIME email messages just fine, and is designed so that the
+round trip from flat text, to an object tree via the \class{Parser}
+class, and back to flat text is idempotent (the output is identical
+to the input).
+
+\subsection{Creating email and MIME objects from scratch}
+
+Ordinarily, you get a message object tree by passing some text to a
+parser, which parses the text and returns the root of the message
+object tree. However, you can also build a complete object tree from
+scratch, or even individual \class{Message} objects by hand.
+In fact, you can also take an existing tree and add new \class{Message}
+objects, move them around, etc. This makes a very convenient
+interface for slicing-and-dicing MIME messages.
+
+You can create a new object tree by creating \class{Message}
+instances, adding payloads and all the appropriate headers manually.
+For MIME messages, though, the \module{email} package provides some
+convenient classes to make things easier. Each of these classes
+should be imported from a module with the same name as the class, from
+within the \module{email} package. E.g.:
+
+\begin{verbatim}
+import email.MIMEImage
+# the class is then available as email.MIMEImage.MIMEImage
+\end{verbatim}
+
+or
+
+\begin{verbatim}
+from email.MIMEText import MIMEText
+\end{verbatim}
+
+Here are the classes:
+
+\begin{classdesc}{MIMEBase}{_maintype, _subtype, **_params}
+This is the base class for all the MIME-specific subclasses of
+\class{Message}. Ordinarily you won't create instances specifically
+of \class{MIMEBase}, although you could. \class{MIMEBase} is provided
+primarily as a convenient base class for more specific MIME-aware
+subclasses.
+
+\var{_maintype} is the \code{Content-Type:} major type (e.g. \code{text} or
+\code{image}), and \var{_subtype} is the \code{Content-Type:} minor type
+(e.g. \code{plain} or \code{gif}). \var{_params} is a parameter
+key/value dictionary and is passed directly to
+\method{Message.add_header()}.
+
+The \class{MIMEBase} class always adds a \code{Content-Type:} header
+(based on \var{_maintype}, \var{_subtype}, and \var{_params}), and a
+\code{MIME-Version:} header (always set to \code{1.0}).
+\end{classdesc}
+
+\begin{classdesc}{MIMEImage}{_imagedata\optional{, _subtype\optional{,
+                             _encoder\optional{, **_params}}}}
+
+A subclass of \class{MIMEBase}, the \class{MIMEImage} class is used to
+create MIME message objects of major type \code{image}.
+\var{_imagedata} is a string containing the raw image data.
+If the image type of this data can be determined by the standard
+Python module \refmodule{imghdr}, then the subtype will be
+automatically included in the \code{Content-Type:} header. Otherwise
+you can explicitly specify the image subtype via the \var{_subtype}
+parameter. If the minor type could not be guessed and \var{_subtype}
+was not given, then \exception{TypeError} is raised.
+
+Optional \var{_encoder} is a callable (e.g. a function) which will
+perform the actual encoding of the image data for transport. This
+callable takes one argument, which is the \class{MIMEImage} instance.
+It should use \method{get_payload()} and \method{set_payload()} to
+change the payload to encoded form. It should also add any
+\code{Content-Transfer-Encoding:} or other headers to the message
+object as necessary. The default encoding is \emph{Base64}. See the
+\refmodule{email.Encoders} module for a list of the built-in encoders.
+
+\var{_params} are passed straight through to the \class{MIMEBase}
+constructor.
+\end{classdesc}
+
+\begin{classdesc}{MIMEText}{_text\optional{, _subtype\optional{,
+                            _charset\optional{, _encoder}}}}
+A subclass of \class{MIMEBase}, the \class{MIMEText} class is used to
+create MIME objects of major type \code{text}. \var{_text} is the string
+for the payload. \var{_subtype} is the minor type and defaults to
+\code{plain}. \var{_charset} is the character set of the text and is
+passed as a parameter to the \class{MIMEBase} constructor; it defaults
+to \code{us-ascii}. No guessing or encoding is performed on the text
+data, but a newline is appended to \var{_text} if it doesn't already
+end with a newline.
+
+The \var{_encoder} argument is as with the \class{MIMEImage} class
+constructor, except that the default encoding for \class{MIMEText}
+objects is one that doesn't actually modify the payload, but does set
+the \code{Content-Transfer-Encoding:} header to \code{7bit} or
+\code{8bit} as appropriate.
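A minimal sketch of building a text part, assuming Python 3, where this class lives in \module{email.mime.text} instead of \module{email.MIMEText}. The modern class is not guaranteed to behave identically to the one described above (for instance, it picks the charset itself when \var{_charset} is omitted), so the example only inspects the headers produced for a plain ASCII body.

```python
from email.mime.text import MIMEText

# An all-ASCII body: the modern class picks us-ascii and a 7bit
# Content-Transfer-Encoding, leaving the text itself unencoded.
part = MIMEText("Hello, world!\n")

print(part.get_content_type())            # text/plain
print(part.get_param("charset"))          # us-ascii
print(part["MIME-Version"])               # 1.0
print(part["Content-Transfer-Encoding"])  # 7bit
```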
+\end{classdesc}
+
+\begin{classdesc}{MIMEMessage}{_msg\optional{, _subtype}}
+A subclass of \class{MIMEBase}, the \class{MIMEMessage} class is used to
+create MIME objects of main type \code{message}. \var{_msg} is used as
+the payload, and must be an instance of class \class{Message} (or a
+subclass thereof); otherwise, a \exception{TypeError} is raised.
+
+Optional \var{_subtype} sets the subtype of the message; it defaults
+to \code{rfc822}.
+\end{classdesc}
+
+\subsection{Encoders, Exceptions, Utilities, and Iterators}
+
+The \module{email} package provides various encoders for safe
+transport of binary payloads in \class{MIMEImage} and \class{MIMEText}
+instances. See the \refmodule{email.Encoders} module for more
+details.
+
+All of the exception classes that the \module{email} package can raise
+are available in the \refmodule{email.Errors} module.
+
+Some miscellaneous utility functions are available in the
+\refmodule{email.Utils} module.
+
+Iterating over a message object tree is easy with the
+\method{Message.walk()} method; some additional helper iterators are
+available in the \refmodule{email.Iterators} module.
+
+\subsection{Differences from \module{mimelib}}
+
+The \module{email} package was originally prototyped as a separate
+library called \module{mimelib}. Changes have been made so that
+method names are more consistent, and some methods or modules have
+either been added or removed. The semantics of some of the methods
+have also changed. For the most part, any functionality available in
+\module{mimelib} is still available in the \module{email} package,
+albeit often in a different way.
+
+Here is a brief description of the differences between the
+\module{mimelib} and the \module{email} packages, along with hints on
+how to port your applications.
+
+Of course, the most visible difference between the two packages is
+that the package name has been changed to \module{email}.
+In addition, the top-level package has the following differences:
+
+\begin{itemize}
+\item \function{messageFromString()} has been renamed to
+      \function{message_from_string()}.
+\item \function{messageFromFile()} has been renamed to
+      \function{message_from_file()}.
+\end{itemize}
+
+The \class{Message} class has the following differences:
+
+\begin{itemize}
+\item The method \method{asString()} was renamed to \method{as_string()}.
+\item The method \method{ismultipart()} was renamed to
+      \method{is_multipart()}.
+\item The \method{get_payload()} method has grown an optional
+      \var{decode} argument.
+\item The method \method{getall()} was renamed to \method{get_all()}.
+\item The method \method{addheader()} was renamed to \method{add_header()}.
+\item The method \method{gettype()} was renamed to \method{get_type()}.
+\item The method \method{getmaintype()} was renamed to
+      \method{get_main_type()}.
+\item The method \method{getsubtype()} was renamed to
+      \method{get_subtype()}.
+\item The method \method{getparams()} was renamed to
+      \method{get_params()}.
+      Also, whereas \method{getparams()} returned a list of strings,
+      \method{get_params()} returns a list of 2-tuples, effectively
+      the key/value pairs of the parameters, split on the \samp{=}
+      sign.
+\item The method \method{getparam()} was renamed to \method{get_param()}.
+\item The method \method{getcharsets()} was renamed to
+      \method{get_charsets()}.
+\item The method \method{getfilename()} was renamed to
+      \method{get_filename()}.
+\item The method \method{getboundary()} was renamed to
+      \method{get_boundary()}.
+\item The method \method{setboundary()} was renamed to
+      \method{set_boundary()}.
+\item The method \method{getdecodedpayload()} was removed. To get
+      similar functionality, pass the value 1 to the \var{decode} flag
+      of the \method{get_payload()} method.
+\item The method \method{getpayloadastext()} was removed.
+      Similar functionality is provided by the \class{DecodedGenerator}
+      class in the \refmodule{email.Generator} module.
+\item The method \method{getbodyastext()} was removed. You can get
+      similar functionality by creating an iterator with
+      \function{typed_subpart_iterator()} in the
+      \refmodule{email.Iterators} module.
+\end{itemize}
+
+The \class{Parser} class has no differences in its public interface.
+It does have some additional smarts to recognize
+\code{message/delivery-status} type messages, which it represents as
+a \class{Message} instance containing separate \class{Message}
+subparts for each header block in the delivery status
+notification\footnote{Delivery Status Notifications (DSN) are defined
+in \rfc{1894}.}.
+
+The \class{Generator} class has no differences in its public
+interface. There is a new class in the \refmodule{email.Generator}
+module, though, called \class{DecodedGenerator}, which provides most of
+the functionality previously available in the
+\method{Message.getpayloadastext()} method.
+
+The following modules and classes have been changed:
+
+\begin{itemize}
+\item The \class{MIMEBase} class constructor arguments \var{_major}
+      and \var{_minor} have changed to \var{_maintype} and
+      \var{_subtype} respectively.
+\item The \code{Image} class/module has been renamed to
+      \code{MIMEImage}. The \var{_minor} argument has been renamed to
+      \var{_subtype}.
+\item The \code{Text} class/module has been renamed to
+      \code{MIMEText}. The \var{_minor} argument has been renamed to
+      \var{_subtype}.
+\item The \code{MessageRFC822} class/module has been renamed to
+      \code{MIMEMessage}. Note that an earlier version of
+      \module{mimelib} called this class/module \code{RFC822}, but
+      that clashed with the Python standard library module
+      \refmodule{rfc822} on some case-insensitive file systems.
+
+      Also, the \class{MIMEMessage} class now represents any kind of
+      MIME message with main type \code{message}.
+      It takes an optional argument, \var{_subtype}, which is used to
+      set the MIME subtype. \var{_subtype} defaults to \code{rfc822}.
+\end{itemize}
+
+\module{mimelib} provided some utility functions in its
+\module{address} and \module{date} modules. All of these functions
+have been moved to the \refmodule{email.Utils} module.
+
+The \code{MsgReader} class/module has been removed. Its functionality
+is most closely matched by the \function{body_line_iterator()}
+function in the \refmodule{email.Iterators} module.
+
+\subsection{Examples}
+
+Coming soon...
+
diff --git a/Doc/lib/emailencoders.tex b/Doc/lib/emailencoders.tex
new file mode 100644
index 0000000..6ebb302
--- /dev/null
+++ b/Doc/lib/emailencoders.tex
@@ -0,0 +1,53 @@
+\section{\module{email.Encoders} ---
+         Email message payload encoders}
+
+\declaremodule{standard}{email.Encoders}
+\modulesynopsis{Encoders for email message payloads.}
+\sectionauthor{Barry A. Warsaw}{barry@zope.com}
+
+\versionadded{2.2}
+
+When creating \class{Message} objects from scratch, you often need to
+encode the payloads for transport through compliant mail servers.
+This is especially true for \code{image/*} and \code{text/*} type
+messages containing binary data.
+
+The \module{email} package provides some convenient encodings in its
+\module{Encoders} module. These encoders are actually used by the
+\class{MIMEImage} and \class{MIMEText} class constructors to provide default
+encodings. All encoder functions take exactly one argument, the
+message object to encode. They usually extract the payload, encode
+it, and reset the payload to this newly encoded value. They should also
+set the \code{Content-Transfer-Encoding:} header as appropriate.
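The encoder contract just described (take the message, extract the payload, encode it, store it back, and set \code{Content-Transfer-Encoding:}) can be illustrated with a hand-written Base64 encoder. This is a sketch, not the package's own \function{encode_base64()}: the name \function{encode_base64_sketch()} is hypothetical, it assumes Python 3 (\module{email.message}) with a text payload, and it assumes UTF-8 as the charset used to turn that text into bytes.

```python
import base64
from email.message import Message


def encode_base64_sketch(msg, charset="utf-8"):
    """Hypothetical encoder following the one-argument contract above.

    Assumes msg has a str payload; `charset` (an assumption of this
    sketch) is used to turn that text into bytes before encoding.
    """
    raw = msg.get_payload().encode(charset)
    # encodebytes() wraps its output into lines of at most 76 characters.
    encoded = base64.encodebytes(raw).decode("ascii")
    msg.set_payload(encoded)
    # Replace any existing header instead of adding a duplicate.
    del msg["Content-Transfer-Encoding"]
    msg["Content-Transfer-Encoding"] = "base64"


msg = Message()
msg.set_payload("caf\u00e9 au lait\n")
encode_base64_sketch(msg)

print(msg["Content-Transfer-Encoding"])   # base64
print(msg.get_payload(decode=True))       # the original UTF-8 bytes
```

Going through \method{set_payload()} and the mapping interface, rather than touching the message's internals, keeps the encoder usable on any \class{Message} subclass.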
+
+Here are the encoding functions provided:
+
+\begin{funcdesc}{encode_quopri}{msg}
+Encodes the payload into \emph{Quoted-Printable} form and sets the
+\code{Content-Transfer-Encoding:} header to
+\code{quoted-printable}\footnote{Note that encoding with
+\function{encode_quopri()} also encodes all tabs and space characters in
+the data.}.
+This is a good encoding to use when most of your payload is normal
+printable data, but contains a few unprintable characters.
+\end{funcdesc}
+
+\begin{funcdesc}{encode_base64}{msg}
+Encodes the payload into \emph{Base64} form and sets the
+\code{Content-Transfer-Encoding:} header to
+\code{base64}. This is a good encoding to use when most of your payload
+is unprintable data, since it is a more compact form than
+Quoted-Printable. The drawback of Base64 encoding is that it
+renders the text unreadable by humans.
+\end{funcdesc}
+
+\begin{funcdesc}{encode_7or8bit}{msg}
+This doesn't actually modify the message's payload, but it does set
+the \code{Content-Transfer-Encoding:} header to either \code{7bit} or
+\code{8bit} as appropriate, based on the payload data.
+\end{funcdesc}
+
+\begin{funcdesc}{encode_noop}{msg}
+This does nothing; it doesn't even set the
+\code{Content-Transfer-Encoding:} header.
+\end{funcdesc}
diff --git a/Doc/lib/emailexc.tex b/Doc/lib/emailexc.tex
new file mode 100644
index 0000000..8b2d189
--- /dev/null
+++ b/Doc/lib/emailexc.tex
@@ -0,0 +1,53 @@
+\section{\module{email.Errors} ---
+         email package exception classes}
+
+\declaremodule{standard}{email.Errors}
+\modulesynopsis{The exception classes used by the email package.}
+\sectionauthor{Barry A. Warsaw}{barry@zope.com}
+
+\versionadded{2.2}
+
+The following exception classes are defined in the
+\module{email.Errors} module:
+
+\begin{excclassdesc}{MessageError}{}
+This is the base class for all exceptions that the \module{email}
+package can raise. It is derived from the standard
+\exception{Exception} class and defines no additional methods.
+\end{excclassdesc}
+
+\begin{excclassdesc}{MessageParseError}{}
+This is the base class for exceptions raised by the \class{Parser}
+class. It is derived from \exception{MessageError}.
+\end{excclassdesc}
+
+\begin{excclassdesc}{HeaderParseError}{}
+Raised under some error conditions when parsing the \rfc{2822} headers of
+a message, this class is derived from \exception{MessageParseError}.
+It can be raised from the \method{Parser.parse()} or
+\method{Parser.parsestr()} methods.
+
+Situations where it can be raised include finding a \emph{Unix-From}
+header after the first \rfc{2822} header of the message, finding a
+continuation line before the first \rfc{2822} header is found, or finding
+a line in the headers which is neither a header nor a continuation
+line.
+\end{excclassdesc}
+
+\begin{excclassdesc}{BoundaryError}{}
+Raised under some error conditions when parsing the multipart
+structure of a message, this class is derived from
+\exception{MessageParseError}.
+It can be raised from the \method{Parser.parse()} or
+\method{Parser.parsestr()} methods.
+
+Situations where it can be raised include not being able to find the
+starting or terminating boundary in a \code{multipart/*} message.
+\end{excclassdesc}
+
+\begin{excclassdesc}{MultipartConversionError}{}
+Raised when a payload is added to a \class{Message} object using
+\method{add_payload()}, but the payload is already a scalar and the
+message has a \code{Content-Type:} header whose main type is not
+\code{multipart}. \exception{MultipartConversionError} multiply inherits
+from \exception{MessageError} and the built-in \exception{TypeError}.
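The class relationships described in this module can be checked directly. This sketch assumes Python 3, where the module is spelled \module{email.errors} and the non-multipart MIME classes such as \class{MIMEText} (in \module{email.mime.text}) raise \exception{MultipartConversionError} from \method{attach()}, which the \class{Message} documentation describes as a synonym for \method{add_payload()}. It illustrates the hierarchy, not the exact 2.2 code path.

```python
from email.errors import (
    BoundaryError,
    HeaderParseError,
    MessageError,
    MessageParseError,
    MultipartConversionError,
)
from email.mime.text import MIMEText

# The hierarchy described above.
print(issubclass(MessageParseError, MessageError))         # True
print(issubclass(HeaderParseError, MessageParseError))     # True
print(issubclass(BoundaryError, MessageParseError))        # True
print(issubclass(MultipartConversionError, TypeError))     # True

# A text/plain part has a scalar payload, so attaching a subpart fails.
part = MIMEText("not a multipart\n")
caught = None
try:
    part.attach(MIMEText("another part\n"))
except TypeError as exc:
    # Thanks to the multiple inheritance, a plain TypeError handler
    # also catches MultipartConversionError.
    caught = exc

print(type(caught).__name__)   # MultipartConversionError
```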
+\end{excclassdesc}
diff --git a/Doc/lib/emailgenerator.tex b/Doc/lib/emailgenerator.tex
new file mode 100644
index 0000000..2cb58ec
--- /dev/null
+++ b/Doc/lib/emailgenerator.tex
@@ -0,0 +1,68 @@
+\section{\module{email.Generator} ---
+         Generating flat text from an email message object tree}
+
+\declaremodule{standard}{email.Generator}
+\modulesynopsis{Generate flat text email messages from a message
+    object tree.}
+\sectionauthor{Barry A. Warsaw}{barry@zope.com}
+
+\versionadded{2.2}
+
+The \class{Generator} class is used to render a message object model
+into its flat text representation, including MIME encoding any
+sub-messages, generating the correct \rfc{2822} headers, etc. Here
+are the public methods of the \class{Generator} class.
+
+\begin{classdesc}{Generator}{outfp\optional{, mangle_from_\optional{,
+                             maxheaderlen}}}
+The constructor for the \class{Generator} class takes a file-like
+object called \var{outfp} for an argument. \var{outfp} must support
+the \method{write()} method and be usable as the output file in a
+Python 2.0 extended print statement.
+
+Optional \var{mangle_from_} is a flag that, when true, puts a ``>''
+character in front of any line in the body that starts exactly with
+\samp{From } (i.e. \code{From} followed by a space at the front of the
+line). This is the only guaranteed portable way to avoid having such
+lines be mistaken for \emph{Unix-From} headers (see
+\url{http://home.netscape.com/eng/mozilla/2.0/relnotes/demo/content-length.html}
+for details).
+
+Optional \var{maxheaderlen} specifies the maximum length of a
+non-continued header line. When a header line is longer than
+\var{maxheaderlen} (in characters, with tabs expanded to 8 spaces),
+the header will be broken on semicolons and continued as per
+\rfc{2822}. If no semicolon is found, then the header is left alone.
+Set to zero to disable wrapping headers. The default is 78, as
+recommended (but not required) by \rfc{2822}.
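A sketch of the \var{mangle_from_} behaviour, assuming Python 3: there the class lives in \module{email.generator}, and rendering is done with a \method{flatten()} method rather than by calling the instance, so this does not use the 2.2 calling convention documented here. The message content is made up.

```python
import io
from email.generator import Generator
from email.message import Message

msg = Message()
msg["Subject"] = "mangling demo"
# The second body line starts with "From ", which an mbox reader could
# mistake for a Unix-From envelope line.
msg.set_payload("Hello,\nFrom here on, this line is just body text.\n")

out = io.StringIO()
gen = Generator(out, mangle_from_=True, maxheaderlen=78)
gen.flatten(msg)
text = out.getvalue()

# The body line is now written as ">From here on, ...".
print(text)
```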
+\end{classdesc}
+
+The other public \class{Generator} methods are:
+
+\begin{methoddesc}[Generator]{__call__}{msg\optional{, unixfrom}}
+Print the textual representation of the message object tree rooted at
+\var{msg} to the output file specified when the \class{Generator}
+instance was created. Sub-objects are visited depth-first and the
+resulting text will be properly MIME encoded.
+
+Optional \var{unixfrom} is a flag that forces the printing of the
+\emph{Unix-From} (a.k.a. envelope header or \code{From_} header)
+delimiter before the first \rfc{2822} header of the root message
+object. If the root object has no \emph{Unix-From} header, a standard
+one is crafted. The flag defaults to 0, which inhibits the printing
+of the \emph{Unix-From} delimiter.
+
+Note that for sub-objects, no \emph{Unix-From} header is ever printed.
+\end{methoddesc}
+
+\begin{methoddesc}[Generator]{write}{s}
+Write the string \var{s} to the underlying file object,
+i.e. \var{outfp} passed to \class{Generator}'s constructor. This
+provides just enough file-like API for \class{Generator} instances to
+be used in extended print statements.
+\end{methoddesc}
+
+For convenience, the methods \method{Message.as_string()} and
+\code{str(aMessage)}, a.k.a. \method{Message.__str__()},
+simplify the generation of a formatted string representation of a
+message object. For more detail, see \refmodule{email.Message}.
diff --git a/Doc/lib/emailiter.tex b/Doc/lib/emailiter.tex
new file mode 100644
index 0000000..fbaafbb
--- /dev/null
+++ b/Doc/lib/emailiter.tex
@@ -0,0 +1,37 @@
+\section{\module{email.Iterators} ---
+         Message object tree iterators}
+
+\declaremodule{standard}{email.Iterators}
+\modulesynopsis{Iterate over a message object tree.}
+\sectionauthor{Barry A. Warsaw}{barry@zope.com}
+
+\versionadded{2.2}
+
+Iterating over a message object tree is fairly easy with the
+\method{Message.walk()} method.
+The \module{email.Iterators} module
+provides some useful higher-level iterators over message object
+trees.
+
+\begin{funcdesc}{body_line_iterator}{msg}
+This iterates over all the payloads in all the subparts of \var{msg},
+returning the string payloads line-by-line. It skips over all the
+subpart headers, and it skips over any subpart with a payload that
+isn't a Python string. This is roughly equivalent to reading the
+flat text representation of the message from a file using
+\method{readline()}, skipping over all the intervening headers.
+\end{funcdesc}
+
+\begin{funcdesc}{typed_subpart_iterator}{msg\optional{,
+                                         maintype\optional{, subtype}}}
+This iterates over all the subparts of \var{msg}, returning only those
+subparts that match the MIME type specified by \var{maintype} and
+\var{subtype}.
+
+Note that \var{subtype} is optional; if omitted, then subpart MIME
+type matching is done only with the main type. \var{maintype} is
+optional too; it defaults to \code{text}.
+
+Thus, by default \function{typed_subpart_iterator()} returns each
+subpart that has a MIME type of \code{text/*}.
+\end{funcdesc}
+
diff --git a/Doc/lib/emailmessage.tex b/Doc/lib/emailmessage.tex
new file mode 100644
index 0000000..bc9c0ce
--- /dev/null
+++ b/Doc/lib/emailmessage.tex
@@ -0,0 +1,384 @@
+\section{\module{email.Message} ---
+         The Message class}
+
+\declaremodule{standard}{email.Message}
+\modulesynopsis{The base class representing email messages.}
+\sectionauthor{Barry A. Warsaw}{barry@zope.com}
+
+\versionadded{2.2}
+
+The \module{Message} module provides a single class, the
+\class{Message} class. This class is the base class for the
+\module{email} package object model. It has a fairly extensive set of
+methods to get and set email headers and email payloads. For an
+introduction to the \module{email} package, please read the
+\refmodule{email} package overview.
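A minimal sketch of building a \class{Message} by hand and rendering it, assuming Python 3, where the class is imported from \module{email.message}; the addresses are placeholders. Note that the modern \method{is_multipart()} returns a boolean rather than 1 or 0.

```python
from email.message import Message

msg = Message()
msg["From"] = "sender@example.com"       # placeholder addresses
msg["To"] = "recipient@example.com"
msg["Subject"] = "A hand-built message"
msg.set_payload("Hello from a Message object.\n")

print(msg.is_multipart())   # False: the payload is a plain string
print(msg.as_string())
```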
+
+\class{Message} instances can be created either directly, or
+indirectly by using the parser in \refmodule{email.Parser}. \class{Message}
+objects provide a mapping style interface for accessing the message
+headers, and an explicit interface for accessing both the headers and
+the payload. They also provide convenience methods for generating a flat
+text representation of the message object tree, for accessing commonly
+used header parameters, and for recursively walking over the object
+tree.
+
+Here are the methods of the \class{Message} class:
+
+\begin{methoddesc}[Message]{as_string}{\optional{unixfrom}}
+Return the entire formatted message as a string. Optional
+\var{unixfrom}, when true, includes the \emph{Unix-From}
+envelope header; it defaults to 0.
+\end{methoddesc}
+
+\begin{methoddesc}[Message]{__str__}{}
+Equivalent to \method{aMessage.as_string(unixfrom=1)}.
+\end{methoddesc}
+
+\begin{methoddesc}[Message]{is_multipart}{}
+Return 1 if the message's payload is a list of sub-\class{Message}
+objects, otherwise return 0. When \method{is_multipart()} returns 0,
+the payload should either be a string object, or a single
+\class{Message} instance.
+\end{methoddesc}
+
+\begin{methoddesc}[Message]{set_unixfrom}{unixfrom}
+Set the \emph{Unix-From} (a.k.a. envelope header or \code{From_}
+header) to \var{unixfrom}, which should be a string.
+\end{methoddesc}
+
+\begin{methoddesc}[Message]{get_unixfrom}{}
+Return the \emph{Unix-From} header, or \code{None} if the
+\emph{Unix-From} header was never set.
+\end{methoddesc}
+
+\begin{methoddesc}[Message]{add_payload}{payload}
+Add \var{payload} to the message object's existing payload. If, prior
+to calling this method, the object's payload was \code{None}
+(i.e. never before set), then after this method is called, the payload
+will be the argument \var{payload}.
+
+If the object's payload was already a list
+(i.e.
+\method{is_multipart()} returns 1), then \var{payload} is
+appended to the end of the existing payload list.
+
+For any other type of existing payload, \method{add_payload()} will
+transform the payload into a list consisting of the old payload
+and \var{payload}, but only if the document is already a MIME
+multipart document. This condition is satisfied if the message's
+\code{Content-Type:} header's main type is \code{multipart}, or
+there is no \code{Content-Type:} header. In any other situation,
+\exception{MultipartConversionError} is raised.
+\end{methoddesc}
+
+\begin{methoddesc}[Message]{attach}{payload}
+Synonymous with \method{add_payload()}.
+\end{methoddesc}
+
+\begin{methoddesc}[Message]{get_payload}{\optional{i\optional{, decode}}}
+Return the current payload, which will be a list of \class{Message}
+objects when \method{is_multipart()} returns 1, or a scalar (either a
+string or a single \class{Message} instance) when
+\method{is_multipart()} returns 0.
+
+With optional \var{i}, \method{get_payload()} will return the
+\var{i}-th element of the payload, counting from zero, if
+\method{is_multipart()} returns 1. An \exception{IndexError} will be raised
+if \var{i} is less than 0 or greater than or equal to the number of
+items in the payload. If the payload is scalar
+(i.e. \method{is_multipart()} returns 0) and \var{i} is given, a
+\exception{TypeError} is raised.
+
+Optional \var{decode} is a flag indicating whether the payload should be
+decoded or not, according to the \code{Content-Transfer-Encoding:} header.
+When true and the message is not a multipart, the payload will be
+decoded if this header's value is \samp{quoted-printable} or
+\samp{base64}. If some other encoding is used, or the
+\code{Content-Transfer-Encoding:} header is missing, the payload is
+returned as-is (undecoded). If the message is a multipart and the
+\var{decode} flag is true, then \code{None} is returned.
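A sketch of the \var{decode} flag on a quoted-printable payload, assuming Python 3 (\module{email.message}), where the decoded payload comes back as \class{bytes}; the payload text is made up.

```python
from email.message import Message

msg = Message()
msg["Content-Transfer-Encoding"] = "quoted-printable"
msg.set_payload("caf=C3=A9 cr=C3=A8me\n")

encoded = msg.get_payload()              # the text exactly as stored
decoded = msg.get_payload(decode=True)   # bytes after QP decoding

print(encoded)
print(decoded.decode("utf-8"))
```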
+\end{methoddesc} + +\begin{methoddesc}[Message]{set_payload}{payload} +Set the entire message object's payload to \var{payload}. It is the +client's responsibility to ensure that the payload invariants described +under \method{is_multipart()} still hold. +\end{methoddesc} + +The following methods implement a mapping-like interface for accessing +the message object's \rfc{2822} headers. Note that there are some +semantic differences between these methods and a normal mapping +(i.e. dictionary) interface. For example, in a dictionary there are +no duplicate keys, but here there may be duplicate message headers. Also, +in dictionaries there is no guaranteed order to the keys returned by +\method{keys()}, but in a \class{Message} object, there is an explicit +order. These semantic differences are intentional and are biased +toward maximal convenience. + +Note that in all cases, any optional \emph{Unix-From} header the message +may have is not included in the mapping interface. + +\begin{methoddesc}[Message]{__len__}{} +Return the total number of headers, including duplicates. +\end{methoddesc} + +\begin{methoddesc}[Message]{__contains__}{name} +Return true if the message object has a field named \var{name}. +Match is done case-insensitively and \var{name} should not include the +trailing colon. Used for the \code{in} operator, +e.g.: + +\begin{verbatim} +if 'message-id' in myMessage: + print 'Message-ID:', myMessage['message-id'] +\end{verbatim} +\end{methoddesc} + +\begin{methoddesc}[Message]{__getitem__}{name} +Return the value of the named header field. \var{name} should not +include the colon field separator. If the header is missing, +\code{None} is returned; a \code{KeyError} is never raised. + +Note that if the named field appears more than once in the message's +headers, exactly which of those field values will be returned is +undefined. Use the \method{get_all()} method to get the values of all +the extant named headers.
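The mapping behavior above can be checked with a small sketch, assuming a current Python~3 interpreter and a made-up message carrying two \code{Received:} fields. It deliberately avoids relying on which \code{Received:} value \code{msg['received']} returns, since that is left undefined here.

\begin{verbatim}
import email

# Made-up headers; note the two Received: fields.
RAW = """\
Received: from a.example.com
Received: from b.example.com
Subject: mapping demo
Message-ID: <1234@example.com>

body text
"""

msg = email.message_from_string(RAW)

# __len__ counts every header, duplicates included.
n_headers = len(msg)

# __contains__ and __getitem__ match field names case-insensitively.
has_id = 'message-id' in msg
subject = msg['SUBJECT']

# A missing header gives None rather than raising KeyError.
missing = msg['X-Not-There']

# get_all() returns every value for a duplicated field, in header order.
received = msg.get_all('received')
\end{verbatim}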
+\end{methoddesc} + +\begin{methoddesc}[Message]{__setitem__}{name, val} +Add a header to the message with field name \var{name} and value +\var{val}. The field is appended to the end of the message's existing +fields. + +Note that this does \emph{not} overwrite or delete any existing header +with the same name. If you want to ensure that the new header is the +only one present in the message with field name +\var{name}, first use \method{__delitem__()} to delete all named +fields, e.g.: + +\begin{verbatim} +del msg['subject'] +msg['subject'] = 'Python roolz!' +\end{verbatim} +\end{methoddesc} + +\begin{methoddesc}[Message]{__delitem__}{name} +Delete all occurrences of the field with name \var{name} from the +message's headers. No exception is raised if the named field isn't +present in the headers. +\end{methoddesc} + +\begin{methoddesc}[Message]{has_key}{name} +Return 1 if the message contains a header field named \var{name}, +otherwise return 0. +\end{methoddesc} + +\begin{methoddesc}[Message]{keys}{} +Return a list of all the message's header field names. These keys +appear in the order in which they were added to the message +via \method{__setitem__()}, and may contain duplicates. Any fields +deleted and then subsequently re-added are always appended to the end +of the header list. +\end{methoddesc} + +\begin{methoddesc}[Message]{values}{} +Return a list of all the message's field values. These appear +in the order in which they were added to the message via +\method{__setitem__()}, and may contain duplicates. Any fields +deleted and then subsequently re-added are always appended to the end +of the header list. +\end{methoddesc} + +\begin{methoddesc}[Message]{items}{} +Return a list of 2-tuples containing all the message's field names and +values. These appear in the order in which they were added to +the message via \method{__setitem__()}, and may contain duplicates.
+Any fields deleted and then subsequently re-added are always appended +to the end of the header list. +\end{methoddesc} + +\begin{methoddesc}[Message]{get}{name\optional{, failobj}} +Return the value of the named header field. This is identical to +\method{__getitem__()} except that optional \var{failobj} is returned +if the named header is missing (defaults to \code{None}). +\end{methoddesc} + +Here are some additional useful methods: + +\begin{methoddesc}[Message]{get_all}{name\optional{, failobj}} +Return a list of all the values for the field named \var{name}. These +appear in the order in which they were added to the message +via \method{__setitem__()}. Any fields +deleted and then subsequently re-added are always appended to the end +of the list. + +If there are no such named headers in the message, \var{failobj} is +returned (defaults to \code{None}). +\end{methoddesc} + +\begin{methoddesc}[Message]{add_header}{_name, _value, **_params} +Extended header setting. This method is similar to +\method{__setitem__()} except that additional header parameters can be +provided as keyword arguments. \var{_name} is the header field to add and +\var{_value} is the \emph{primary} value for the header. + +For each item in the keyword argument dictionary \var{_params}, the +key is taken as the parameter name, with underscores converted to +dashes (since dashes are illegal in Python identifiers). Normally, +the parameter will be added as \code{key="value"} unless the value is +\code{None}, in which case only the key will be added.
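A minimal sketch of those two parameter rules, assuming a current Python~3 interpreter; \code{X-Demo}, \code{my_param}, and \code{flag} are made-up names used only for illustration.

\begin{verbatim}
from email.message import Message

msg = Message()

# Hypothetical header and parameter names, for illustration only.
# my_param is added as my-param="x"; flag=None adds the bare key "flag".
msg.add_header('X-Demo', 'primary', my_param='x', flag=None)

header = msg['X-Demo']
\end{verbatim}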
+ +Here's an example: + +\begin{verbatim} +msg.add_header('Content-Disposition', 'attachment', filename='bud.gif') +\end{verbatim} + +This will add a header that looks like this: + +\begin{verbatim} +Content-Disposition: attachment; filename="bud.gif" +\end{verbatim} +\end{methoddesc} + +\begin{methoddesc}[Message]{get_type}{\optional{failobj}} +Return the message's content type, as a string of the form +``maintype/subtype'' as taken from the \code{Content-Type:} header. +The returned string is coerced to lowercase. + +If there is no \code{Content-Type:} header in the message, +\var{failobj} is returned (defaults to \code{None}). +\end{methoddesc} + +\begin{methoddesc}[Message]{get_main_type}{\optional{failobj}} +Return the message's \emph{main} content type. This is the +\var{maintype} part of the string returned by \method{get_type()}, with the +same semantics for \var{failobj}. +\end{methoddesc} + +\begin{methoddesc}[Message]{get_subtype}{\optional{failobj}} +Return the message's content subtype. This is the +\var{subtype} part of the string returned by \method{get_type()}, with the +same semantics for \var{failobj}. +\end{methoddesc} + +\begin{methoddesc}[Message]{get_params}{\optional{failobj\optional{, header}}} +Return the message's \code{Content-Type:} parameters, as a list. The +elements of the returned list are 2-tuples of key/value pairs, as +split on the \samp{=} sign. The left-hand side of the \samp{=} is the +key, while the right-hand side is the value. If there is no \samp{=} +sign in the parameter, the value is the empty string. The value is +always unquoted with \function{Utils.unquote()}. + +Optional \var{failobj} is the object to return if there is no +\code{Content-Type:} header. Optional \var{header} is the header to +search instead of \code{Content-Type:}.
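A short sketch of \method{get_params()} together with the closely related \method{get_param()}, assuming a current Python~3 interpreter and a made-up message. The bare main type shows up as the first pair, with an empty value, because it contains no \samp{=} sign.

\begin{verbatim}
import email

# A made-up message with parameterized headers.
RAW = """\
Content-Type: text/plain; charset="us-ascii"; format=flowed
Content-Disposition: attachment; filename="bud.gif"

hello
"""

msg = email.message_from_string(RAW)

# All Content-Type: parameters, values unquoted.
params = msg.get_params()

# A single parameter, and the failobj returned when it is absent.
charset = msg.get_param('charset')
missing = msg.get_param('delsp', 'no')

# The header argument selects a different header to search.
disp = msg.get_params(header='content-disposition')
\end{verbatim}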
+\end{methoddesc} + +\begin{methoddesc}[Message]{get_param}{param\optional{, + failobj\optional{, header}}} +Return the value of the \code{Content-Type:} header's parameter +\var{param} as a string. If the message has no \code{Content-Type:} +header or if there is no such parameter, then \var{failobj} is +returned (defaults to \code{None}). + +Optional \var{header}, if given, specifies the message header to use +instead of \code{Content-Type:}. +\end{methoddesc} + +\begin{methoddesc}[Message]{get_charsets}{\optional{failobj}} +Return a list containing the character set names in the message. If +the message is a \code{multipart}, then the list will contain one +element for each subpart in the payload; otherwise, it will be a list +of length 1. + +Each item in the list will be a string which is the value of the +\code{charset} parameter in the \code{Content-Type:} header for the +represented subpart. However, if the subpart has no +\code{Content-Type:} header, no \code{charset} parameter, or is not of +the \code{text} main MIME type, then that item in the returned list +will be \var{failobj}. +\end{methoddesc} + +\begin{methoddesc}[Message]{get_filename}{\optional{failobj}} +Return the value of the \code{filename} parameter of the +\code{Content-Disposition:} header of the message, or \var{failobj} if +the header is missing or has no \code{filename} parameter. +The returned string will always be unquoted as per +\function{Utils.unquote()}. +\end{methoddesc} + +\begin{methoddesc}[Message]{get_boundary}{\optional{failobj}} +Return the value of the \code{boundary} parameter of the +\code{Content-Type:} header of the message, or \var{failobj} if +the header is missing or has no \code{boundary} parameter. The +returned string will always be unquoted as per +\function{Utils.unquote()}. +\end{methoddesc} + +\begin{methoddesc}[Message]{set_boundary}{boundary} +Set the \code{boundary} parameter of the \code{Content-Type:} header +to \var{boundary}.
\method{set_boundary()} will always quote +\var{boundary}, so you should not quote it yourself. A +\code{HeaderParseError} is raised if the message object has no +\code{Content-Type:} header. + +Note that using this method is subtly different from deleting the old +\code{Content-Type:} header and adding a new one with the new boundary +via \method{add_header()}, because \method{set_boundary()} preserves the +order of the \code{Content-Type:} header in the list of headers. +However, it does \emph{not} preserve any continuation lines which may +have been present in the original \code{Content-Type:} header. +\end{methoddesc} + +\begin{methoddesc}[Message]{walk}{} +The \method{walk()} method is an all-purpose generator which can be +used to iterate over all the parts and subparts of a message object +tree, in depth-first traversal order. You will typically use +\method{walk()} as the iterator in a \code{for ... in} loop; each +iteration returns the next subpart. + +Here's an example that prints the MIME type of every part of a message +object tree: + +\begin{verbatim} +>>> for part in msg.walk(): +...     print part.get_type('text/plain') +multipart/report +text/plain +message/delivery-status +text/plain +text/plain +message/rfc822 +\end{verbatim} +\end{methoddesc} + +\class{Message} objects can also optionally contain two instance +attributes, which can be used when generating the plain text of a MIME +message. + +\begin{datadesc}{preamble} +The format of a MIME document allows for some text between the blank +line following the headers and the first multipart boundary string. +Normally, this text is never visible in a MIME-aware mail reader +because it falls outside the standard MIME armor. However, when +viewing the raw text of the message, or when viewing the message in a +non-MIME-aware reader, this text can become visible. + +The \var{preamble} attribute contains this leading extra-armor text +for MIME documents.
When the \class{Parser} discovers some text after +the headers but before the first boundary string, it assigns this text +to the message's \var{preamble} attribute. When the \class{Generator} +is writing out the plain text representation of a MIME message, and it +finds the message has a \var{preamble} attribute, it will write this +text in the area between the headers and the first boundary. + +Note that if the message object has no preamble, the +\var{preamble} attribute will be \code{None}. +\end{datadesc} + +\begin{datadesc}{epilogue} +The \var{epilogue} attribute acts the same way as the \var{preamble} +attribute, except that it contains text that appears between the last +boundary and the end of the message. +\end{datadesc} diff --git a/Doc/lib/emailparser.tex b/Doc/lib/emailparser.tex new file mode 100644 index 0000000..c96c3b3 --- /dev/null +++ b/Doc/lib/emailparser.tex @@ -0,0 +1,96 @@ +\section{\module{email.Parser} --- + Parsing flat text email messages} + +\declaremodule{standard}{email.Parser} +\modulesynopsis{Parse flat text email messages to produce a message + object tree.} +\sectionauthor{Barry A. Warsaw}{barry@zope.com} + +\versionadded{2.2} + +The \module{Parser} module provides a single class, the \class{Parser} +class, which is used to take a message in flat text form and create +the associated object model. The resulting object tree can then be +manipulated using the \class{Message} class interface as described in +\refmodule{email.Message}, and turned over +to a generator (as described in \refmodule{email.Generator}) to +return the textual representation of the message. It is intended that +the \class{Parser} to \class{Generator} path be idempotent if the +object model isn't modified in between. + +\subsection{Parser class API} + +\begin{classdesc}{Parser}{\optional{_class}} +The constructor for the \class{Parser} class takes a single optional +argument \var{_class}. This must be a callable factory (i.e.
a function +or a class), and it is used whenever a sub-message object needs to be +created. It defaults to \class{Message} (see +\refmodule{email.Message}). \var{_class} will be called with zero +arguments. +\end{classdesc} + +The other public \class{Parser} methods are: + +\begin{methoddesc}[Parser]{parse}{fp} +Read all the data from the file-like object \var{fp}, parse the +resulting text, and return the root message object. \var{fp} must +support both the \method{readline()} and the \method{read()} methods +on file-like objects. + +The text contained in \var{fp} must be formatted as a block of \rfc{2822}-style +headers and header continuation lines, optionally preceded by a +\emph{Unix-From} header. The header block is terminated either by the +end of the data or by a blank line. Following the header block is the +body of the message (which may contain MIME-encoded subparts). +\end{methoddesc} + +\begin{methoddesc}[Parser]{parsestr}{text} +Similar to the \method{parse()} method, except it takes a string +object instead of a file-like object. Calling this method on a string +is exactly equivalent to wrapping \var{text} in a \class{StringIO} +instance first and calling \method{parse()}. +\end{methoddesc} + +Since creating a message object tree from a string or a file object is +such a common task, two functions are provided as a convenience. They +are available in the top-level \module{email} package namespace. + +\begin{funcdesc}{message_from_string}{s\optional{, _class}} +Return a message object tree from a string. This is exactly +equivalent to \code{Parser().parsestr(s)}. Optional \var{_class} is +interpreted as with the \class{Parser} class constructor. +\end{funcdesc} + +\begin{funcdesc}{message_from_file}{fp\optional{, _class}} +Return a message object tree from an open file object. This is exactly +equivalent to \code{Parser().parse(fp)}. Optional \var{_class} is +interpreted as with the \class{Parser} class constructor.
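A sketch of three routes to the same object tree, assuming a current Python~3 interpreter, where this module is spelled \module{email.parser} and \class{io.StringIO} stands in for an open file. The last check leans on the idempotent \class{Parser} to \class{Generator} path mentioned earlier, which should hold for a simple, unmodified message like this one.

\begin{verbatim}
import email
import io
from email.parser import Parser  # email.Parser in the Python 2.2 package

# A made-up single-part message.
RAW = """\
From: alice@example.com
Subject: parser demo

Just one line of body.
"""

# Three equivalent ways to build the object tree.
m1 = email.message_from_string(RAW)
m2 = Parser().parsestr(RAW)
m3 = email.message_from_file(io.StringIO(RAW))

# Regenerating text from the unmodified tree should give back the input.
round_trip = m1.as_string()
\end{verbatim}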
+\end{funcdesc} + +Here's an example of how you might use this at an interactive Python +prompt: + +\begin{verbatim} +>>> import email +>>> msg = email.message_from_string(myString) +\end{verbatim} + +\subsection{Additional notes} + +Here are some notes on the parsing semantics: + +\begin{itemize} +\item Most non-\code{multipart} type messages are parsed as a single + message object with a string payload. These objects will return + 0 for \method{is_multipart()}. +\item One exception is for \code{message/delivery-status} type + messages. Because the body of such messages consists of + blocks of headers, \class{Parser} will create a non-multipart + object containing non-multipart subobjects for each header + block. +\item Another exception is for \code{message/*} types (i.e.\ more + general than \code{message/delivery-status}). These are + typically \code{message/rfc822} type messages, represented as a + non-multipart object containing a singleton payload, another + non-multipart \class{Message} instance. +\end{itemize} diff --git a/Doc/lib/emailutil.tex b/Doc/lib/emailutil.tex new file mode 100644 index 0000000..e028fcd --- /dev/null +++ b/Doc/lib/emailutil.tex @@ -0,0 +1,119 @@ +\section{\module{email.Utils} --- + Miscellaneous email package utilities} + +\declaremodule{standard}{email.Utils} +\modulesynopsis{Miscellaneous email package utilities.} +\sectionauthor{Barry A. Warsaw}{barry@zope.com} + +\versionadded{2.2} + +There are several useful utilities provided with the \module{email} +package. + +\begin{funcdesc}{quote}{str} +Return a new string with backslashes in \var{str} replaced by two +backslashes and double quotes replaced by backslash-double quote. +\end{funcdesc} + +\begin{funcdesc}{unquote}{str} +Return a new string which is an \emph{unquoted} version of \var{str}. +If \var{str} begins and ends with double quotes, they are stripped +off. Likewise, if \var{str} begins and ends with angle brackets, they +are stripped off.
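A quick sketch of \function{quote()} and \function{unquote()}, assuming a current Python~3 interpreter, where this module is spelled \module{email.utils}.

\begin{verbatim}
from email import utils  # spelled email.Utils in the Python 2.2 package

# quote() doubles backslashes, then escapes double quotes.
quoted = utils.quote('say "hi" \\ bye')

# unquote() strips one level of surrounding quotes or angle brackets.
bare_name = utils.unquote('"bud.gif"')
bare_addr = utils.unquote('<alice@example.com>')

# Strings without such delimiters come back unchanged.
untouched = utils.unquote('no quotes here')
\end{verbatim}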
+\end{funcdesc} + +\begin{funcdesc}{parseaddr}{address} +Parse \var{address}, which should be the value of some address-containing +field such as \code{To:} or \code{Cc:}, into its constituent +``realname'' and ``email address'' parts. Returns a 2-tuple of that +information, unless the parse fails, in which case a 2-tuple of +\code{(None, None)} is returned. +\end{funcdesc} + +\begin{funcdesc}{dump_address_pair}{pair} +The inverse of \function{parseaddr()}, this takes a 2-tuple of the form +\code{(realname, email_address)} and returns the string value suitable +for a \code{To:} or \code{Cc:} header. If the first element of +\var{pair} is false, then the second element is returned unmodified. +\end{funcdesc} + +\begin{funcdesc}{getaddresses}{fieldvalues} +This function returns a list of 2-tuples of the form returned by +\function{parseaddr()}. \var{fieldvalues} is a sequence of header field +values as might be returned by \method{Message.get_all()}. Here's a +simple example that gets all the recipients of a message: + +\begin{verbatim} +from email.Utils import getaddresses + +# get_all() returns its failobj (here []) when a header is absent +tos = msg.get_all('to', []) +ccs = msg.get_all('cc', []) +resent_tos = msg.get_all('resent-to', []) +resent_ccs = msg.get_all('resent-cc', []) +all_recipients = getaddresses(tos + ccs + resent_tos + resent_ccs) +\end{verbatim} +\end{funcdesc} + +\begin{funcdesc}{decode}{s} +This function decodes a string according to the rules in \rfc{2047}. It +returns the decoded string as a Python unicode string. +\end{funcdesc} + +\begin{funcdesc}{encode}{s\optional{, charset\optional{, encoding}}} +This function encodes a string according to the rules in \rfc{2047}. It +is not actually the inverse of \function{decode()} since it doesn't +handle multiple character sets or multiple string parts needing +encoding. In fact, the input string \var{s} must already be encoded +in the \var{charset} character set (Python can't reliably guess what +character set a string might be encoded in). The default +\var{charset} is \samp{iso-8859-1}.
+ +\var{encoding} must be either the letter \samp{q} for +Quoted-Printable or \samp{b} for Base64 encoding. If it is +neither, a \code{ValueError} is raised. Both the \var{charset} and +the \var{encoding} strings are case-insensitive, and are coerced to +lowercase in the returned string. +\end{funcdesc} + +\begin{funcdesc}{parsedate}{date} +Attempts to parse a date according to the rules in \rfc{2822}. +However, some mailers don't follow that format as specified, so +\function{parsedate()} tries to guess correctly in such cases. +\var{date} is a string containing an \rfc{2822} date, such as +\code{"Mon, 20 Nov 1995 19:12:08 -0500"}. If it succeeds in parsing +the date, \function{parsedate()} returns a 9-tuple that can be passed +directly to \function{time.mktime()}; otherwise \code{None} will be +returned. Note that fields 6, 7, and 8 of the result tuple are not +usable. +\end{funcdesc} + +\begin{funcdesc}{parsedate_tz}{date} +Performs the same function as \function{parsedate()}, but returns +either \code{None} or a 10-tuple; the first 9 elements make up a tuple +that can be passed directly to \function{time.mktime()}, and the tenth +is the offset of the date's timezone from UTC in seconds (UTC is the official +term for Greenwich Mean Time)\footnote{Note that the sign of the timezone +offset is the opposite of the sign of the \code{time.timezone} +variable for the same timezone; the latter variable follows the +\POSIX{} standard while this module follows \rfc{2822}.}. If the input +string has no timezone, the last element of the tuple returned is +\code{None}. Note that fields 6, 7, and 8 of the result tuple are not +usable. +\end{funcdesc} + +\begin{funcdesc}{mktime_tz}{tuple} +Turn a 10-tuple as returned by \function{parsedate_tz()} into a UTC +timestamp. If the timezone item in the tuple is \code{None}, assume +local time. Minor deficiency: \function{mktime_tz()} interprets the +first 8 elements of \var{tuple} as a local time and then compensates +for the timezone difference.
This may yield a slight error around +changes in daylight saving time, though this is not worth worrying about for +common use. +\end{funcdesc} + +\begin{funcdesc}{formatdate}{\optional{timeval}} +Returns a date string formatted as per the Internet standard \rfc{2822} +(whose date format descends from \rfc{822} as updated by \rfc{1123}). If \var{timeval} is provided, then it +should be a floating point time value as expected by +\function{time.gmtime()}; otherwise the current time is used. +\end{funcdesc} -- cgit v0.12