summaryrefslogtreecommitdiffstats
path: root/Doc
diff options
context:
space:
mode:
authorBarry Warsaw <barry@python.org>2004-10-03 03:16:19 (GMT)
committerBarry Warsaw <barry@python.org>2004-10-03 03:16:19 (GMT)
commitbb113867305f8ab70947bffb77961a60d10730dc (patch)
tree0af1fbf0fbbd95170636205343ba827cf768bb38 /Doc
parent2cdd608601071df8e557beaaa78b54884c80e8de (diff)
downloadcpython-bb113867305f8ab70947bffb77961a60d10730dc.zip
cpython-bb113867305f8ab70947bffb77961a60d10730dc.tar.gz
cpython-bb113867305f8ab70947bffb77961a60d10730dc.tar.bz2
Big email 3.0 API changes, with updated unit tests and documentation.
Briefly (from the NEWS file): - Updates for the email package: + All deprecated APIs that in email 2.x issued warnings have been removed: _encoder argument to the MIMEText constructor, Message.add_payload(), Utils.dump_address_pair(), Utils.decode(), Utils.encode() + New deprecations: Generator.__call__(), Message.get_type(), Message.get_main_type(), Message.get_subtype(), the 'strict' argument to the Parser constructor. These will be removed in email 3.1. + Support for Python earlier than 2.3 has been removed (see PEP 291). + All defect classes have been renamed to end in 'Defect'. + Some FeedParser fixes; also a MultipartInvariantViolationDefect will be added to messages that claim to be multipart but really aren't. + Updates to documentation.
Diffstat (limited to 'Doc')
-rw-r--r--Doc/lib/email.tex47
-rw-r--r--Doc/lib/emailencoders.tex10
-rw-r--r--Doc/lib/emailexc.tex33
-rw-r--r--Doc/lib/emailmessage.tex46
-rw-r--r--Doc/lib/emailmimebase.tex9
-rw-r--r--Doc/lib/emailparser.tex96
-rw-r--r--Doc/lib/emailutil.tex35
7 files changed, 195 insertions, 81 deletions
diff --git a/Doc/lib/email.tex b/Doc/lib/email.tex
index debed70..56affa5 100644
--- a/Doc/lib/email.tex
+++ b/Doc/lib/email.tex
@@ -1,5 +1,5 @@
-% Copyright (C) 2001,2002 Python Software Foundation
-% Author: barry@zope.com (Barry Warsaw)
+% Copyright (C) 2001-2004 Python Software Foundation
+% Author: barry@python.org (Barry Warsaw)
\section{\module{email} ---
An email and MIME handling package}
@@ -7,8 +7,8 @@
\declaremodule{standard}{email}
\modulesynopsis{Package supporting the parsing, manipulating, and
generating email messages, including MIME documents.}
-\moduleauthor{Barry A. Warsaw}{barry@zope.com}
-\sectionauthor{Barry A. Warsaw}{barry@zope.com}
+\moduleauthor{Barry A. Warsaw}{barry@python.org}
+\sectionauthor{Barry A. Warsaw}{barry@python.org}
\versionadded{2.2}
@@ -22,7 +22,7 @@ sending of email messages to SMTP (\rfc{2821}) servers; that is the
function of the \refmodule{smtplib} module. The \module{email}
package attempts to be as RFC-compliant as possible, supporting in
addition to \rfc{2822}, such MIME-related RFCs as
-\rfc{2045}-\rfc{2047}, and \rfc{2231}.
+\rfc{2045}, \rfc{2046}, \rfc{2047}, and \rfc{2231}.
The primary distinguishing feature of the \module{email} package is
that it splits the parsing and generating of email messages from the
@@ -79,7 +79,7 @@ package, a section on differences and porting is provided.
\subsection{Encoders}
\input{emailencoders}
-\subsection{Exception classes}
+\subsection{Exception and Defect classes}
\input{emailexc}
\subsection{Miscellaneous utilities}
@@ -88,14 +88,41 @@ package, a section on differences and porting is provided.
\subsection{Iterators}
\input{emailiter}
-\subsection{Differences from \module{email} v1 (up to Python 2.2.1)}
+\subsection{Package History}
Version 1 of the \module{email} package was bundled with Python
releases up to Python 2.2.1. Version 2 was developed for the Python
2.3 release, and backported to Python 2.2.2. It was also available as
-a separate distutils based package. \module{email} version 2 is
-almost entirely backward compatible with version 1, with the
-following differences:
+a separate distutils-based package, and is compatible back to Python 2.1.
+
+\module{email} version 3.0 was released with Python 2.4 and as a separate
+distutils-based package. It is compatible back to Python 2.3.
+
+Here are the differences between \module{email} version 3 and version 2:
+
+\begin{itemize}
+\item The \class{FeedParser} class was introduced, and the \class{Parser}
+ class was implemented in terms of the \class{FeedParser}. All parsing
+ there for is non-strict, and parsing will make a best effort never to
+ raise an exception. Problems found while parsing messages are stored in
+ the message's \var{defect} attribute.
+
+\item All aspects of the API which raised \exception{DeprecationWarning}s in
+ version 2 have been removed. These include the \var{_encoder} argument
+ to the \class{MIMEText} constructor, the \method{Message.add_payload()}
+ method, the \function{Utils.dump_address_pair()} function, and the
+ functions \function{Utils.decode()} and \function{Utils.encode()}.
+
+\item New \exception{DeprecationWarning}s have been added to:
+ \method{Generator.__call__()}, \method{Message.get_type()},
+ \method{Message.get_main_type()}, \method{Message.get_subtype()}, and
+ the \var{strict} argument to the \class{Parser} class. These are
+ expected to be removed in email 3.1.
+
+\item Support for Pythons earlier than 2.3 has been removed.
+\end{itemize}
+
+Here are the differences between \module{email} version 2 and version 1:
\begin{itemize}
\item The \module{email.Header} and \module{email.Charset} modules
diff --git a/Doc/lib/emailencoders.tex b/Doc/lib/emailencoders.tex
index cd54d68..a49e04d 100644
--- a/Doc/lib/emailencoders.tex
+++ b/Doc/lib/emailencoders.tex
@@ -8,11 +8,11 @@ type messages containing binary data.
The \module{email} package provides some convenient encodings in its
\module{Encoders} module. These encoders are actually used by the
-\class{MIMEImage} and \class{MIMEText} class constructors to provide default
-encodings. All encoder functions take exactly one argument, the
-message object to encode. They usually extract the payload, encode
-it, and reset the payload to this newly encoded value. They should also
-set the \mailheader{Content-Transfer-Encoding} header as appropriate.
+\class{MIMEAudio} and \class{MIMEImage} class constructors to provide default
+encodings. All encoder functions take exactly one argument, the message
+object to encode. They usually extract the payload, encode it, and reset the
+payload to this newly encoded value. They should also set the
+\mailheader{Content-Transfer-Encoding} header as appropriate.
Here are the encoding functions provided:
diff --git a/Doc/lib/emailexc.tex b/Doc/lib/emailexc.tex
index 824a276..6ac0889 100644
--- a/Doc/lib/emailexc.tex
+++ b/Doc/lib/emailexc.tex
@@ -52,3 +52,36 @@ rarely raised in practice. However the exception may also be raised
if the \method{attach()} method is called on an instance of a class
derived from \class{MIMENonMultipart} (e.g. \class{MIMEImage}).
\end{excclassdesc}
+
+Here's the list of the defects that the \class{FeedParser} can find while
+parsing messages. Note that the defects are added to the message where the
+problem was found, so for example, if a message nested inside a
+\mimetype{multipart/alternative} had a malformed header, that nested message
+object would have a defect, but the containing messages would not.
+
+All defect classes are subclassed from \class{email.Errors.MessageDefect}, but
+this class is \emph{not} an exception!
+
+\versionadded[All the defect classes were added]{2.4}
+
+\begin{itemize}
+\item \class{NoBoundaryInMultipartDefect} -- A message claimed to be a
+ multipart, but had no \mimetype{boundary} parameter.
+
+\item \class{StartBoundaryNotFoundDefect} -- The start boundary claimed in the
+ \mailheader{Content-Type} header was never found.
+
+\item \class{FirstHeaderLineIsContinuationDefect} -- The message had a
+ continuation line as its first header line.
+
+\item \class{MisplacedEnvelopeHeaderDefect} - A ``Unix From'' header was found
+ in the middle of a header block.
+
+\item \class{MalformedHeaderDefect} -- A header was found that was missing a
+ colon, or was otherwise malformed.
+
+\item \class{MultipartInvariantViolationDefect} -- A message claimed to be a
+ \mimetype{multipart}, but no subparts were found. Note that when a
+ message has this defect, its \method{is_multipart()} method may return
+ false even though its content type claims to be \mimetype{multipart}.
+\end{itemize}
diff --git a/Doc/lib/emailmessage.tex b/Doc/lib/emailmessage.tex
index 1943273..f732054 100644
--- a/Doc/lib/emailmessage.tex
+++ b/Doc/lib/emailmessage.tex
@@ -359,13 +359,16 @@ the form \code{(CHARSET, LANGUAGE, VALUE)}. Note that both \code{CHARSET} and
\code{VALUE} to be encoded in the \code{us-ascii} charset. You can
usually ignore \code{LANGUAGE}.
-Your application should be prepared to deal with 3-tuple return
-values, and can convert the parameter to a Unicode string like so:
+If your application doesn't care whether the parameter was encoded as in
+\rfc{2231}, you can collapse the parameter value by calling
+\function{email.Utils.collapse_rfc2231_value()}, passing in the return value
+from \method{get_param()}. This will return a suitably decoded Unicode string
+whn the value is a tuple, or the original string unquoted if it isn't. For
+example:
\begin{verbatim}
-param = msg.get_param('foo')
-if isinstance(param, tuple):
- param = unicode(param[2], param[0] or 'us-ascii')
+rawparam = msg.get_param('foo')
+param = email.Utils.collapse_rfc2231_value(rawparam)
\end{verbatim}
In any case, the parameter value (either the returned string, or the
@@ -549,32 +552,21 @@ newline get printed after your closing \mimetype{multipart} boundary,
set the \var{epilogue} to the empty string.
\end{datadesc}
-\subsubsection{Deprecated methods}
-
-The following methods are deprecated in \module{email} version 2.
-They are documented here for completeness.
+\begin{datadesc}{defects}
+The \var{defects} attribute contains a list of all the problems found when
+parsing this message. See \refmodule{email.Errors} for a detailed description
+of the possible parsing defects.
-\begin{methoddesc}[Message]{add_payload}{payload}
-Add \var{payload} to the message object's existing payload. If, prior
-to calling this method, the object's payload was \code{None}
-(i.e. never before set), then after this method is called, the payload
-will be the argument \var{payload}.
+\versionadded{2.4}
+\end{datadesc}
-If the object's payload was already a list
-(i.e. \method{is_multipart()} returns \code{True}), then \var{payload} is
-appended to the end of the existing payload list.
+\subsubsection{Deprecated methods}
-For any other type of existing payload, \method{add_payload()} will
-transform the new payload into a list consisting of the old payload
-and \var{payload}, but only if the document is already a MIME
-multipart document. This condition is satisfied if the message's
-\mailheader{Content-Type} header's main type is either
-\mimetype{multipart}, or there is no \mailheader{Content-Type}
-header. In any other situation,
-\exception{MultipartConversionError} is raised.
+\versionchanged[The \method{add_payload()} method was removed; use the
+\method{attach()} method instead]{2.4}
-\deprecated{2.2.2}{Use the \method{attach()} method instead.}
-\end{methoddesc}
+The following methods are deprecated. They are documented here for
+completeness.
\begin{methoddesc}[Message]{get_type}{\optional{failobj}}
Return the message's content type, as a string of the form
diff --git a/Doc/lib/emailmimebase.tex b/Doc/lib/emailmimebase.tex
index 3318d6a..070c9a2 100644
--- a/Doc/lib/emailmimebase.tex
+++ b/Doc/lib/emailmimebase.tex
@@ -142,9 +142,7 @@ Optional \var{_subtype} sets the subtype of the message; it defaults
to \mimetype{rfc822}.
\end{classdesc}
-\begin{classdesc}{MIMEText}{_text\optional{, _subtype\optional{,
- _charset\optional{, _encoder}}}}
-
+\begin{classdesc}{MIMEText}{_text\optional{, _subtype\optional{, _charset}}}
A subclass of \class{MIMENonMultipart}, the \class{MIMEText} class is
used to create MIME objects of major type \mimetype{text}.
\var{_text} is the string for the payload. \var{_subtype} is the
@@ -153,6 +151,7 @@ character set of the text and is passed as a parameter to the
\class{MIMENonMultipart} constructor; it defaults to \code{us-ascii}. No
guessing or encoding is performed on the text data.
-\deprecated{2.2.2}{The \var{_encoding} argument has been deprecated.
-Encoding now happens implicitly based on the \var{_charset} argument.}
+\versionchanged[The previously deprecated \var{_encoding} argument has
+been removed. Encoding happens implicitly based on the \var{_charset}
+argument]{2.4}
\end{classdesc}
diff --git a/Doc/lib/emailparser.tex b/Doc/lib/emailparser.tex
index 1e8597c..5fac92f 100644
--- a/Doc/lib/emailparser.tex
+++ b/Doc/lib/emailparser.tex
@@ -18,29 +18,79 @@ messages, the root object will return \code{True} from its
\method{is_multipart()} method, and the subparts can be accessed via
the \method{get_payload()} and \method{walk()} methods.
+There are actually two parser interfaces available for use, the classic
+\class{Parser} API and the incremental \class{FeedParser} API. The classic
+\class{Parser} API is fine if you have the entire text of the message in
+memory as a string, or if the entire message lives in a file on the file
+system. \class{FeedParser} is more appropriate for when you're reading the
+message from a stream which might block waiting for more input (e.g. reading
+an email message from a socket). The \class{FeedParser} can consume and parse
+the message incrementally, and only returns the root object when you close the
+parser\footnote{As of email package version 3.0, introduced in
+Python 2.4, the classic \class{Parser} was re-implemented in terms of the
+\class{FeedParser}, so the semantics and results are identical between the two
+parsers.}.
+
Note that the parser can be extended in limited ways, and of course
you can implement your own parser completely from scratch. There is
no magical connection between the \module{email} package's bundled
parser and the \class{Message} class, so your custom parser can create
message object trees any way it finds necessary.
-The primary parser class is \class{Parser} which parses both the
-headers and the payload of the message. In the case of
-\mimetype{multipart} messages, it will recursively parse the body of
-the container message. Two modes of parsing are supported,
-\emph{strict} parsing, which will usually reject any non-RFC compliant
-message, and \emph{lax} parsing, which attempts to adjust for common
-MIME formatting problems.
+\subsubsection{FeedParser API}
+
+\versionadded{2.4}
+
+The \class{FeedParser} provides an API that is conducive to incremental
+parsing of email messages, such as would be necessary when reading the text of
+an email message from a source that can block (e.g. a socket). The
+\class{FeedParser} can of course be used to parse an email message fully
+contained in a string or a file, but the classic \class{Parser} API may be
+more convenient for such use cases. The semantics and results of the two
+parser APIs are identical.
+
+The \class{FeedParser}'s API is simple; you create an instance, feed it a
+bunch of text until there's no more to feed it, then close the parser to
+retrieve the root message object. The \class{FeedParser} is extremely
+accurate when parsing standards-compliant messages, and it does a very good
+job of parsing non-compliant messages, providing information about how a
+message was deemed broken. It will populate a message object's \var{defects}
+attribute with a list of any problems it found in a message. See the
+\refmodule{email.Errors} module for the list of defects that it can find.
+
+Here is the API for the \class{FeedParser}:
+
+\begin{classdesc}{FeedParser}{\optional{_factory}}
+Create a \class{FeedParser} instance. Optional \var{_factory} is a
+no-argument callable that will be called whenever a new message object is
+needed. It defaults to the \class{email.Message.Message} class.
+\end{classdesc}
+
+\begin{methoddesc}[FeedParser]{feed}{data}
+Feed the \class{FeedParser} some more data. \var{data} should be a
+string containing one or more lines. The lines can be partial and the
+\class{FeedParser} will stitch such partial lines together properly. The
+lines in the string can have any of the common three line endings, carriage
+return, newline, or carriage return and newline (they can even be mixed).
+\end{methoddesc}
+
+\begin{methoddesc}[FeedParser]{close}{}
+Closing a \class{FeedParser} completes the parsing of all previously fed data,
+and returns the root message object. It is undefined what happens if you feed
+more data to a closed \class{FeedParser}.
+\end{methoddesc}
-The \module{email.Parser} module also provides a second class, called
+\subsubsection{Parser class API}
+
+The \class{Parser} provides an API that can be used to parse a message when
+the complete contents of the message are available in a string or file. The
+\module{email.Parser} module also provides a second class, called
\class{HeaderParser} which can be used if you're only interested in
the headers of the message. \class{HeaderParser} can be much faster in
these situations, since it does not attempt to parse the message body,
instead setting the payload to the raw body as a string.
\class{HeaderParser} has the same API as the \class{Parser} class.
-\subsubsection{Parser class API}
-
\begin{classdesc}{Parser}{\optional{_class\optional{, strict}}}
The constructor for the \class{Parser} class takes an optional
argument \var{_class}. This must be a callable factory (such as a
@@ -49,19 +99,14 @@ needs to be created. It defaults to \class{Message} (see
\refmodule{email.Message}). The factory will be called without
arguments.
-The optional \var{strict} flag specifies whether strict or lax parsing
-should be performed. Normally, when things like MIME terminating
-boundaries are missing, or when messages contain other formatting
-problems, the \class{Parser} will raise a
-\exception{MessageParseError}. However, when lax parsing is enabled,
-the \class{Parser} will attempt to work around such broken formatting
-to produce a usable message structure (this doesn't mean
-\exception{MessageParseError}s are never raised; some ill-formatted
-messages just can't be parsed). The \var{strict} flag defaults to
-\code{False} since lax parsing usually provides the most convenient
-behavior.
+The optional \var{strict} flag is ignored. \deprecated{2.4}{Because the
+\class{Parser} class is a backward compatible API wrapper around the
+new-in-Python 2.4 \class{FeedParser}, \emph{all} parsing is effectively
+non-strict. You should simply stop passing a \var{strict} flag to the
+\class{Parser} constructor.}
\versionchanged[The \var{strict} flag was added]{2.2.2}
+\versionchanged[The \var{strict} flag was deprecated]{2.4}
\end{classdesc}
The other public \class{Parser} methods are:
@@ -149,4 +194,13 @@ Here are some notes on the parsing semantics:
object containing a list payload of length 1. Their
\method{is_multipart()} method will return \code{True}. The
single element in the list payload will be a sub-message object.
+
+\item Some non-standards compliant messages may not be internally consistent
+ about their \mimetype{multipart}-edness. Such messages may have a
+ \mailheader{Content-Type} header of type \mimetype{multipart}, but their
+ \method{is_multipart()} method may return \code{False}. If such
+ messages were parsed with the \class{FeedParser}, they will have an
+ instance of the \class{MultipartInvariantViolationDefect} class in their
+ \var{defects} attribute list. See \refmodule{email.Errors} for
+ details.
\end{itemize}
diff --git a/Doc/lib/emailutil.tex b/Doc/lib/emailutil.tex
index 80f0acf..c41f066 100644
--- a/Doc/lib/emailutil.tex
+++ b/Doc/lib/emailutil.tex
@@ -119,24 +119,33 @@ as-is. If \var{charset} is given but \var{language} is not, the
string is encoded using the empty string for \var{language}.
\end{funcdesc}
+\begin{funcdesc}{collapse_rfc2231_value}{value\optional{, errors\optional{,
+ fallback_charset}}}
+When a header parameter is encoded in \rfc{2231} format,
+\method{Message.get_param()} may return a 3-tuple containing the character
+set, language, and value. \function{collapse_rfc2231_value()} turns this into
+a unicode string. Optional \var{errors} is passed to the \var{errors}
+argument of the built-in \function{unicode()} function; it defaults to
+\code{replace}. Optional \var{fallback_charset} specifies the character set
+to use if the one in the \rfc{2231} header is not known by Python; it defaults
+to \code{us-ascii}.
+
+For convenience, if the \var{value} passed to
+\function{collapse_rfc2231_value()} is not a tuple, it should be a string and
+it is returned unquoted.
+\end{funcdesc}
+
\begin{funcdesc}{decode_params}{params}
Decode parameters list according to \rfc{2231}. \var{params} is a
sequence of 2-tuples containing elements of the form
\code{(content-type, string-value)}.
\end{funcdesc}
-The following functions have been deprecated:
-
-\begin{funcdesc}{dump_address_pair}{pair}
-\deprecated{2.2.2}{Use \function{formataddr()} instead.}
-\end{funcdesc}
-
-\begin{funcdesc}{decode}{s}
-\deprecated{2.2.2}{Use \method{Header.decode_header()} instead.}
-\end{funcdesc}
-
+\versionchanged[The \function{dump_address_pair()} function has been removed;
+use \function{formataddr()} instead.]{2.4}
-\begin{funcdesc}{encode}{s\optional{, charset\optional{, encoding}}}
-\deprecated{2.2.2}{Use \method{Header.encode()} instead.}
-\end{funcdesc}
+\versionchanged[The \function{decode()} function has been removed; use the
+\method{Header.decode_header()} method instead.]{2.4}
+\versionchanged[The \function{encode()} function has been removed; use the
+\method{Header.encode()} method instead.]{2.4}