Big email 3.0 API changes, with updated unit tests and documentation.

Briefly (from the NEWS file):

- Updates for the email package:

  + All deprecated APIs that in email 2.x issued warnings have been removed:
    _encoder argument to the MIMEText constructor, Message.add_payload(),
    Utils.dump_address_pair(), Utils.decode(), Utils.encode()

  + New deprecations: Generator.__call__(), Message.get_type(),
    Message.get_main_type(), Message.get_subtype(), the 'strict' argument to
    the Parser constructor.  These will be removed in email 3.1.

  + Support for Python earlier than 2.3 has been removed (see PEP 291).

  + All defect classes have been renamed to end in 'Defect'.

  + Some FeedParser fixes; also a MultipartInvariantViolationDefect will be
    added to messages that claim to be multipart but really aren't.

  + Updates to documentation.

author: Barry Warsaw <barry@python.org> 2004-10-03 03:16:19 (GMT)
committer: Barry Warsaw <barry@python.org> 2004-10-03 03:16:19 (GMT)
commit: bb113867305f8ab70947bffb77961a60d10730dc (patch)
tree: 0af1fbf0fbbd95170636205343ba827cf768bb38 /Doc/lib
parent: 2cdd608601071df8e557beaaa78b54884c80e8de (diff)
7 files changed, 195 insertions, 81 deletions
diff --git a/Doc/lib/email.tex b/Doc/lib/email.tex
index debed70..56affa5 100644
--- a/Doc/lib/email.tex
+++ b/Doc/lib/email.tex
@@ -1,5 +1,5 @@
-% Copyright (C) 2001,2002 Python Software Foundation
-% Author: barry@zope.com (Barry Warsaw)
+% Copyright (C) 2001-2004 Python Software Foundation
+% Author: barry@python.org (Barry Warsaw)
 
 \section{\module{email} ---
 	 An email and MIME handling package}
@@ -7,8 +7,8 @@
 \declaremodule{standard}{email}
 \modulesynopsis{Package supporting the parsing, manipulating, and
     generating email messages, including MIME documents.}
-\moduleauthor{Barry A. Warsaw}{barry@zope.com}
-\sectionauthor{Barry A. Warsaw}{barry@zope.com}
+\moduleauthor{Barry A. Warsaw}{barry@python.org}
+\sectionauthor{Barry A. Warsaw}{barry@python.org}
 
 \versionadded{2.2}
 
@@ -22,7 +22,7 @@ sending of email messages to SMTP (\rfc{2821}) servers; that is the
 function of the \refmodule{smtplib} module.  The \module{email}
 package attempts to be as RFC-compliant as possible, supporting in
 addition to \rfc{2822}, such MIME-related RFCs as
-\rfc{2045}-\rfc{2047}, and \rfc{2231}.
+\rfc{2045}, \rfc{2046}, \rfc{2047}, and \rfc{2231}.
 
 The primary distinguishing feature of the \module{email} package is
 that it splits the parsing and generating of email messages from the
@@ -79,7 +79,7 @@ package, a section on differences and porting is provided.
 \subsection{Encoders}
 \input{emailencoders}
 
-\subsection{Exception classes}
+\subsection{Exception and Defect classes}
 \input{emailexc}
 
 \subsection{Miscellaneous utilities}
@@ -88,14 +88,41 @@ package, a section on differences and porting is provided.
 \subsection{Iterators}
 \input{emailiter}
 
-\subsection{Differences from \module{email} v1 (up to Python 2.2.1)}
+\subsection{Package History}
 
 Version 1 of the \module{email} package was bundled with Python
 releases up to Python 2.2.1.  Version 2 was developed for the Python
 2.3 release, and backported to Python 2.2.2.  It was also available as
-a separate distutils based package.  \module{email} version 2 is
-almost entirely backward compatible with version 1, with the
-following differences:
+a separate distutils-based package, and is compatible back to Python 2.1.
+
+\module{email} version 3.0 was released with Python 2.4 and as a separate
+distutils-based package.  It is compatible back to Python 2.3.
+
+Here are the differences between \module{email} version 3 and version 2:
+
+\begin{itemize}
+\item The \class{FeedParser} class was introduced, and the \class{Parser}
+      class was implemented in terms of the \class{FeedParser}.  All parsing
+      therefore is non-strict, and parsing will make a best effort never to
+      raise an exception.  Problems found while parsing messages are stored in
+      the message's \var{defects} attribute.
+
+\item All aspects of the API which raised \exception{DeprecationWarning}s in
+      version 2 have been removed.  These include the \var{_encoder} argument
+      to the \class{MIMEText} constructor, the \method{Message.add_payload()}
+      method, the \function{Utils.dump_address_pair()} function, and the
+      functions \function{Utils.decode()} and \function{Utils.encode()}.
+
+\item New \exception{DeprecationWarning}s have been added to:
+      \method{Generator.__call__()}, \method{Message.get_type()},
+      \method{Message.get_main_type()}, \method{Message.get_subtype()}, and
+      the \var{strict} argument to the \class{Parser} class.  These are
+      expected to be removed in email 3.1.
+
+\item Support for Python versions earlier than 2.3 has been removed.
+\end{itemize}
+
+Here are the differences between \module{email} version 2 and version 1:
 
 \begin{itemize}
 \item The \module{email.Header} and \module{email.Charset} modules
diff --git a/Doc/lib/emailencoders.tex b/Doc/lib/emailencoders.tex
index cd54d68..a49e04d 100644
--- a/Doc/lib/emailencoders.tex
+++ b/Doc/lib/emailencoders.tex
@@ -8,11 +8,11 @@ type messages containing binary data.
 
 The \module{email} package provides some convenient encodings in its
 \module{Encoders} module.  These encoders are actually used by the
-\class{MIMEImage} and \class{MIMEText} class constructors to provide default
-encodings.  All encoder functions take exactly one argument, the
-message object to encode.  They usually extract the payload, encode
-it, and reset the payload to this newly encoded value.  They should also
-set the \mailheader{Content-Transfer-Encoding} header as appropriate.
+\class{MIMEAudio} and \class{MIMEImage} class constructors to provide default
+encodings.  All encoder functions take exactly one argument, the message
+object to encode.  They usually extract the payload, encode it, and reset the
+payload to this newly encoded value.  They should also set the
+\mailheader{Content-Transfer-Encoding} header as appropriate.
 
 Here are the encoding functions provided:
 
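The encoder contract described in this hunk (one message argument, payload re-encoded in place, \mailheader{Content-Transfer-Encoding} set) can be sketched as follows. This is a minimal illustration, not part of the commit: it assumes a current Python 3 interpreter, where the module is spelled `email.encoders` rather than the `Encoders` module documented here, and the sample payload bytes are made up.

```python
from email import encoders
from email.message import Message

# Hypothetical binary payload, used only for illustration.
data = b"\x00\x01\x02\xff binary payload"

msg = Message()
msg["Content-Type"] = "application/octet-stream"
msg.set_payload(data)

# An encoder takes exactly one argument, the message object.  It replaces
# the payload with the encoded text and sets Content-Transfer-Encoding.
encoders.encode_base64(msg)

print(msg["Content-Transfer-Encoding"])
print(msg.get_payload())

# Asking for the decoded payload recovers the original bytes.
round_trip = msg.get_payload(decode=True)
```

Because the encoder mutates the message it is given, the same object can be handed straight to a generator afterwards.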
diff --git a/Doc/lib/emailexc.tex b/Doc/lib/emailexc.tex
index 824a276..6ac0889 100644
--- a/Doc/lib/emailexc.tex
+++ b/Doc/lib/emailexc.tex
@@ -52,3 +52,36 @@ rarely raised in practice.  However the exception may also be raised
 if the \method{attach()} method is called on an instance of a class
 derived from \class{MIMENonMultipart} (e.g. \class{MIMEImage}).
 \end{excclassdesc}
+
+Here's the list of the defects that the \class{FeedParser} can find while
+parsing messages.  Note that the defects are added to the message where the
+problem was found, so for example, if a message nested inside a
+\mimetype{multipart/alternative} had a malformed header, that nested message
+object would have a defect, but the containing messages would not.
+
+All defect classes are subclassed from \class{email.Errors.MessageDefect}, but
+this class is \emph{not} an exception!
+
+\versionadded[All the defect classes were added]{2.4}
+
+\begin{itemize}
+\item \class{NoBoundaryInMultipartDefect} -- A message claimed to be a
+      multipart, but had no \mimetype{boundary} parameter.
+
+\item \class{StartBoundaryNotFoundDefect} -- The start boundary claimed in the
+      \mailheader{Content-Type} header was never found.
+
+\item \class{FirstHeaderLineIsContinuationDefect} -- The message had a
+      continuation line as its first header line.
+
+\item \class{MisplacedEnvelopeHeaderDefect} -- A ``Unix From'' header was found
+      in the middle of a header block.
+
+\item \class{MalformedHeaderDefect} -- A header was found that was missing a
+      colon, or was otherwise malformed.
+
+\item \class{MultipartInvariantViolationDefect} -- A message claimed to be a
+      \mimetype{multipart}, but no subparts were found.  Note that when a
+      message has this defect, its \method{is_multipart()} method may return
+      false even though its content type claims to be \mimetype{multipart}.
+\end{itemize}
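A rough sketch of how the defects listed in this hunk surface in practice. It assumes a current Python 3 interpreter, where the parser behind `email.message_from_string()` is the feed parser; the sample message and the `collect_defects()` helper are hypothetical and exist only for this illustration. Each defect is reported against the part that carries it, which mirrors the note above about nested messages.

```python
import email

# Hypothetical sample: claims to be multipart but names no boundary.
RAW = (
    "Content-Type: multipart/mixed\n"
    "Subject: broken\n"
    "\n"
    "This body was never split into parts.\n"
)


def collect_defects(msg):
    """Return (content type, defect class name) pairs for every part."""
    found = []
    for part in msg.walk():
        for defect in part.defects:
            found.append((part.get_content_type(), type(defect).__name__))
    return found


# Parsing does not raise; problems are recorded on the message instead.
msg = email.message_from_string(RAW)
defect_names = [name for _, name in collect_defects(msg)]

print(defect_names)
print(msg.is_multipart())
```

For this input the list should include both `NoBoundaryInMultipartDefect` and `MultipartInvariantViolationDefect`, and `is_multipart()` returns false even though the content type says `multipart/mixed`.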
diff --git a/Doc/lib/emailmessage.tex b/Doc/lib/emailmessage.tex
index 1943273..f732054 100644
--- a/Doc/lib/emailmessage.tex
+++ b/Doc/lib/emailmessage.tex
@@ -359,13 +359,16 @@ the form \code{(CHARSET, LANGUAGE, VALUE)}.  Note that both \code{CHARSET} and
 \code{VALUE} to be encoded in the \code{us-ascii} charset.  You can
 usually ignore \code{LANGUAGE}.
 
-Your application should be prepared to deal with 3-tuple return
-values, and can convert the parameter to a Unicode string like so:
+If your application doesn't care whether the parameter was encoded as in
+\rfc{2231}, you can collapse the parameter value by calling
+\function{email.Utils.collapse_rfc2231_value()}, passing in the return value
+from \method{get_param()}.  This will return a suitably decoded Unicode string
+when the value is a tuple, or the original string unquoted if it isn't.  For
+example:
 
 \begin{verbatim}
-param = msg.get_param('foo')
-if isinstance(param, tuple):
-    param = unicode(param[2], param[0] or 'us-ascii')
+rawparam = msg.get_param('foo')
+param = email.Utils.collapse_rfc2231_value(rawparam)
 \end{verbatim}
 
 In any case, the parameter value (either the returned string, or the
@@ -549,32 +552,21 @@ newline get printed after your closing \mimetype{multipart} boundary,
 set the \var{epilogue} to the empty string.
 \end{datadesc}
 
-\subsubsection{Deprecated methods}
-
-The following methods are deprecated in \module{email} version 2.
-They are documented here for completeness.
+\begin{datadesc}{defects}
+The \var{defects} attribute contains a list of all the problems found when
+parsing this message.  See \refmodule{email.Errors} for a detailed description
+of the possible parsing defects.
 
-\begin{methoddesc}[Message]{add_payload}{payload}
-Add \var{payload} to the message object's existing payload.  If, prior
-to calling this method, the object's payload was \code{None}
-(i.e. never before set), then after this method is called, the payload
-will be the argument \var{payload}.
+\versionadded{2.4}
+\end{datadesc}
 
-If the object's payload was already a list
-(i.e. \method{is_multipart()} returns \code{True}), then \var{payload} is
-appended to the end of the existing payload list.
+\subsubsection{Deprecated methods}
 
-For any other type of existing payload, \method{add_payload()} will
-transform the new payload into a list consisting of the old payload
-and \var{payload}, but only if the document is already a MIME
-multipart document.  This condition is satisfied if the message's
-\mailheader{Content-Type} header's main type is either
-\mimetype{multipart}, or there is no \mailheader{Content-Type}
-header.  In any other situation,
-\exception{MultipartConversionError} is raised.
+\versionchanged[The \method{add_payload()} method was removed; use the
+\method{attach()} method instead]{2.4}
 
-\deprecated{2.2.2}{Use the \method{attach()} method instead.}
-\end{methoddesc}
+The following methods are deprecated.  They are documented here for
+completeness.
 
 \begin{methoddesc}[Message]{get_type}{\optional{failobj}}
 Return the message's content type, as a string of the form
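The two-line verbatim example in this file's hunk can be expanded into something runnable. The sketch below assumes a current Python 3 interpreter, where the function lives in `email.utils` (lowercase) rather than `email.Utils`; the header value and file name are invented for illustration.

```python
import email
from email.utils import collapse_rfc2231_value

# Hypothetical header with an RFC 2231 encoded "name" parameter.
RAW = (
    "Content-Type: application/octet-stream; "
    "name*=iso-8859-1''caf%E9.txt\n"
    "\n"
)

msg = email.message_from_string(RAW)

# For an RFC 2231 encoded parameter, get_param() returns a 3-tuple of
# (charset, language, value) instead of a plain string.
rawparam = msg.get_param("name")
name = collapse_rfc2231_value(rawparam)

# A value that is not a tuple is simply returned unquoted.
plain = collapse_rfc2231_value('"plain.txt"')

print(rawparam)
print(name)
print(plain)
```

Callers that do not care how the parameter was encoded can therefore pass every `get_param()` result through the function without checking its type first.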
diff --git a/Doc/lib/emailmimebase.tex b/Doc/lib/emailmimebase.tex
index 3318d6a..070c9a2 100644
--- a/Doc/lib/emailmimebase.tex
+++ b/Doc/lib/emailmimebase.tex
@@ -142,9 +142,7 @@ Optional \var{_subtype} sets the subtype of the message; it defaults
 to \mimetype{rfc822}.
 \end{classdesc}
 
-\begin{classdesc}{MIMEText}{_text\optional{, _subtype\optional{,
-    _charset\optional{, _encoder}}}}
-
+\begin{classdesc}{MIMEText}{_text\optional{, _subtype\optional{, _charset}}}
 A subclass of \class{MIMENonMultipart}, the \class{MIMEText} class is
 used to create MIME objects of major type \mimetype{text}.
 \var{_text} is the string for the payload.  \var{_subtype} is the
@@ -153,6 +151,7 @@ character set of the text and is passed as a parameter to the
 \class{MIMENonMultipart} constructor; it defaults to \code{us-ascii}.  No
 guessing or encoding is performed on the text data.
 
-\deprecated{2.2.2}{The \var{_encoding} argument has been deprecated.
-Encoding now happens implicitly based on the \var{_charset} argument.}
+\versionchanged[The previously deprecated \var{_encoder} argument has
+been removed.  Encoding happens implicitly based on the \var{_charset}
+argument]{2.4}
 \end{classdesc}
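To illustrate the remark that encoding now follows from \var{_charset} alone, here is a small sketch. It assumes a current Python 3 interpreter, where the class is imported from `email.mime.text` instead of the `email.MIMEText` module of this era; the sample strings are arbitrary.

```python
from email.mime.text import MIMEText

# No encoder argument is passed; the transfer encoding is chosen from
# the character set alone.
ascii_part = MIMEText("Hello, world\n", "plain", "us-ascii")
utf8_part = MIMEText("Gr\u00fc\u00dfe\n", "plain", "utf-8")

for part in (ascii_part, utf8_part):
    print(part.get_content_charset(), part["Content-Transfer-Encoding"])

# The original text is still recoverable from the encoded payload.
decoded = utf8_part.get_payload(decode=True).decode("utf-8")
```

In the interpreter assumed here, the us-ascii part is labelled `7bit` and the utf-8 part `base64`.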
diff --git a/Doc/lib/emailparser.tex b/Doc/lib/emailparser.tex
index 1e8597c..5fac92f 100644
--- a/Doc/lib/emailparser.tex
+++ b/Doc/lib/emailparser.tex
@@ -18,29 +18,79 @@ messages, the root object will return \code{True} from its
 \method{is_multipart()} method, and the subparts can be accessed via
 the \method{get_payload()} and \method{walk()} methods.
 
+There are actually two parser interfaces available for use, the classic
+\class{Parser} API and the incremental \class{FeedParser} API.  The classic
+\class{Parser} API is fine if you have the entire text of the message in
+memory as a string, or if the entire message lives in a file on the file
+system.  \class{FeedParser} is more appropriate for when you're reading the
+message from a stream which might block waiting for more input (e.g. reading
+an email message from a socket).  The \class{FeedParser} can consume and parse
+the message incrementally, and only returns the root object when you close the
+parser\footnote{As of email package version 3.0, introduced in
+Python 2.4, the classic \class{Parser} was re-implemented in terms of the
+\class{FeedParser}, so the semantics and results are identical between the two
+parsers.}.
+
 Note that the parser can be extended in limited ways, and of course
 you can implement your own parser completely from scratch.  There is
 no magical connection between the \module{email} package's bundled
 parser and the \class{Message} class, so your custom parser can create
 message object trees any way it finds necessary.
 
-The primary parser class is \class{Parser} which parses both the
-headers and the payload of the message.  In the case of
-\mimetype{multipart} messages, it will recursively parse the body of
-the container message.  Two modes of parsing are supported,
-\emph{strict} parsing, which will usually reject any non-RFC compliant
-message, and \emph{lax} parsing, which attempts to adjust for common
-MIME formatting problems.
+\subsubsection{FeedParser API}
+
+\versionadded{2.4}
+
+The \class{FeedParser} provides an API that is conducive to incremental
+parsing of email messages, such as would be necessary when reading the text of
+an email message from a source that can block (e.g. a socket).  The
+\class{FeedParser} can of course be used to parse an email message fully
+contained in a string or a file, but the classic \class{Parser} API may be
+more convenient for such use cases.  The semantics and results of the two
+parser APIs are identical.
+
+The \class{FeedParser}'s API is simple; you create an instance, feed it a
+bunch of text until there's no more to feed it, then close the parser to
+retrieve the root message object.  The \class{FeedParser} is extremely
+accurate when parsing standards-compliant messages, and it does a very good
+job of parsing non-compliant messages, providing information about how a
+message was deemed broken.  It will populate a message object's \var{defects}
+attribute with a list of any problems it found in a message.  See the
+\refmodule{email.Errors} module for the list of defects that it can find.
+
+Here is the API for the \class{FeedParser}:
+
+\begin{classdesc}{FeedParser}{\optional{_factory}}
+Create a \class{FeedParser} instance.  Optional \var{_factory} is a
+no-argument callable that will be called whenever a new message object is
+needed.  It defaults to the \class{email.Message.Message} class.
+\end{classdesc}
+
+\begin{methoddesc}[FeedParser]{feed}{data}
+Feed the \class{FeedParser} some more data.  \var{data} should be a
+string containing one or more lines.  The lines can be partial and the
+\class{FeedParser} will stitch such partial lines together properly.  The
+lines in the string can have any of the common three line endings, carriage
+return, newline, or carriage return and newline (they can even be mixed).
+\end{methoddesc}
+
+\begin{methoddesc}[FeedParser]{close}{}
+Closing a \class{FeedParser} completes the parsing of all previously fed data,
+and returns the root message object.  It is undefined what happens if you feed
+more data to a closed \class{FeedParser}.
+\end{methoddesc}
 
-The \module{email.Parser} module also provides a second class, called
+\subsubsection{Parser class API}
+
+The \class{Parser} provides an API that can be used to parse a message when
+the complete contents of the message are available in a string or file.  The
+\module{email.Parser} module also provides a second class, called
 \class{HeaderParser} which can be used if you're only interested in
 the headers of the message. \class{HeaderParser} can be much faster in
 these situations, since it does not attempt to parse the message body,
 instead setting the payload to the raw body as a string.
 \class{HeaderParser} has the same API as the \class{Parser} class.
 
-\subsubsection{Parser class API}
-
 \begin{classdesc}{Parser}{\optional{_class\optional{, strict}}}
 The constructor for the \class{Parser} class takes an optional
 argument \var{_class}.  This must be a callable factory (such as a
@@ -49,19 +99,14 @@ needs to be created.  It defaults to \class{Message} (see
 \refmodule{email.Message}).  The factory will be called without
 arguments.
 
-The optional \var{strict} flag specifies whether strict or lax parsing
-should be performed.  Normally, when things like MIME terminating
-boundaries are missing, or when messages contain other formatting
-problems, the \class{Parser} will raise a
-\exception{MessageParseError}.  However, when lax parsing is enabled,
-the \class{Parser} will attempt to work around such broken formatting
-to produce a usable message structure (this doesn't mean
-\exception{MessageParseError}s are never raised; some ill-formatted
-messages just can't be parsed).  The \var{strict} flag defaults to
-\code{False} since lax parsing usually provides the most convenient
-behavior.
+The optional \var{strict} flag is ignored.  \deprecated{2.4}{Because the
+\class{Parser} class is a backward compatible API wrapper around the
+new-in-Python 2.4 \class{FeedParser}, \emph{all} parsing is effectively
+non-strict.  You should simply stop passing a \var{strict} flag to the
+\class{Parser} constructor.}
 
 \versionchanged[The \var{strict} flag was added]{2.2.2}
+\versionchanged[The \var{strict} flag was deprecated]{2.4}
 \end{classdesc}
 
 The other public \class{Parser} methods are:
@@ -149,4 +194,13 @@ Here are some notes on the parsing semantics:
       object containing a list payload of length 1.  Their
       \method{is_multipart()} method will return \code{True}.  The
       single element in the list payload will be a sub-message object.
+
+\item Some non-standards compliant messages may not be internally consistent
+      about their \mimetype{multipart}-edness.  Such messages may have a
+      \mailheader{Content-Type} header of type \mimetype{multipart}, but their
+      \method{is_multipart()} method may return \code{False}.  If such
+      messages were parsed with the \class{FeedParser}, they will have an
+      instance of the \class{MultipartInvariantViolationDefect} class in their
+      \var{defects} attribute list.  See \refmodule{email.Errors} for
+      details.
 \end{itemize}
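The feed-then-close cycle described in the new FeedParser subsection can be sketched like this. It assumes a current Python 3 interpreter, where the class is imported from `email.feedparser` rather than `email.FeedParser`; the chunk list is a made-up stand-in for data arriving from a socket, split deliberately in the middle of a header line.

```python
from email.feedparser import FeedParser

# Hypothetical chunks, cut at arbitrary points as a socket read might.
chunks = [
    "From: alice@example.com\nSub",
    "ject: Hello\n\nLine one\n",
    "Line two\n",
]

parser = FeedParser()
for chunk in chunks:
    # Partial lines are buffered and stitched together by the parser.
    parser.feed(chunk)

# close() finishes parsing and returns the root message object.
msg = parser.close()

print(msg["Subject"])
print(msg.get_payload())
print(msg.defects)
```

Nothing is returned until `close()` is called, so the loop body can sit inside whatever read loop the application already has.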
diff --git a/Doc/lib/emailutil.tex b/Doc/lib/emailutil.tex
index 80f0acf..c41f066 100644
--- a/Doc/lib/emailutil.tex
+++ b/Doc/lib/emailutil.tex
@@ -119,24 +119,33 @@ as-is.  If \var{charset} is given but \var{language} is not, the
 string is encoded using the empty string for \var{language}.
 \end{funcdesc}
 
+\begin{funcdesc}{collapse_rfc2231_value}{value\optional{, errors\optional{,
+    fallback_charset}}}
+When a header parameter is encoded in \rfc{2231} format,
+\method{Message.get_param()} may return a 3-tuple containing the character
+set, language, and value.  \function{collapse_rfc2231_value()} turns this into
+a Unicode string.  Optional \var{errors} is passed to the \var{errors}
+argument of the built-in \function{unicode()} function; it defaults to
+\code{replace}.  Optional \var{fallback_charset} specifies the character set
+to use if the one in the \rfc{2231} header is not known by Python; it defaults
+to \code{us-ascii}.
+
+For convenience, if the \var{value} passed to
+\function{collapse_rfc2231_value()} is not a tuple, it should be a string and
+it is returned unquoted.
+\end{funcdesc}
+
 \begin{funcdesc}{decode_params}{params}
 Decode parameters list according to \rfc{2231}.  \var{params} is a
 sequence of 2-tuples containing elements of the form
 \code{(content-type, string-value)}.
 \end{funcdesc}
 
-The following functions have been deprecated:
-
-\begin{funcdesc}{dump_address_pair}{pair}
-\deprecated{2.2.2}{Use \function{formataddr()} instead.}
-\end{funcdesc}
-
-\begin{funcdesc}{decode}{s}
-\deprecated{2.2.2}{Use \method{Header.decode_header()} instead.}
-\end{funcdesc}
-
+\versionchanged[The \function{dump_address_pair()} function has been removed;
+use \function{formataddr()} instead.]{2.4}
 
-\begin{funcdesc}{encode}{s\optional{, charset\optional{, encoding}}}
-\deprecated{2.2.2}{Use \method{Header.encode()} instead.}
-\end{funcdesc}
+\versionchanged[The \function{decode()} function has been removed; use the
+\method{Header.decode_header()} method instead.]{2.4}
 
+\versionchanged[The \function{encode()} function has been removed; use the
+\method{Header.encode()} method instead.]{2.4}
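For readers porting code off the removed helpers, the replacements named in these notes can be exercised as below. This is a sketch under assumptions: it targets a current Python 3 interpreter with the lowercase module names `email.utils` and `email.header`, and in that interpreter `decode_header()` is a module-level function of `email.header`, not a method on `Header` instances. The sample name and text are illustrative.

```python
from email.header import Header, decode_header
from email.utils import formataddr

# Replacement for dump_address_pair(): format a (realname, address) pair.
address = formataddr(("Barry Warsaw", "barry@python.org"))

# Replacement for Utils.encode(): build a Header and call encode().
header = Header("Gr\u00fc\u00dfe", "iso-8859-1")
encoded = header.encode()

# Replacement for Utils.decode(): decode_header() returns a list of
# (value, charset) pairs.
raw, charset = decode_header(encoded)[0]
text = raw.decode(charset)

print(address)
print(encoded)
print(text)
```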