author | Barry Warsaw <barry@python.org> | 2004-10-03 03:16:19 (GMT) |
---|---|---|
committer | Barry Warsaw <barry@python.org> | 2004-10-03 03:16:19 (GMT) |
commit | bb113867305f8ab70947bffb77961a60d10730dc (patch) | |
tree | 0af1fbf0fbbd95170636205343ba827cf768bb38 /Doc | |
parent | 2cdd608601071df8e557beaaa78b54884c80e8de (diff) | |
Big email 3.0 API changes, with updated unit tests and documentation.
Briefly (from the NEWS file):
- Updates for the email package:
  + All deprecated APIs that in email 2.x issued warnings have been removed:
    _encoder argument to the MIMEText constructor, Message.add_payload(),
    Utils.dump_address_pair(), Utils.decode(), Utils.encode()
  + New deprecations: Generator.__call__(), Message.get_type(),
    Message.get_main_type(), Message.get_subtype(), the 'strict' argument to
    the Parser constructor. These will be removed in email 3.1.
  + Support for Python earlier than 2.3 has been removed (see PEP 291).
  + All defect classes have been renamed to end in 'Defect'.
  + Some FeedParser fixes; also a MultipartInvariantViolationDefect will be
    added to messages that claim to be multipart but really aren't.
  + Updates to documentation.
Diffstat (limited to 'Doc')
-rw-r--r-- | Doc/lib/email.tex | 47 |
-rw-r--r-- | Doc/lib/emailencoders.tex | 10 |
-rw-r--r-- | Doc/lib/emailexc.tex | 33 |
-rw-r--r-- | Doc/lib/emailmessage.tex | 46 |
-rw-r--r-- | Doc/lib/emailmimebase.tex | 9 |
-rw-r--r-- | Doc/lib/emailparser.tex | 96 |
-rw-r--r-- | Doc/lib/emailutil.tex | 35 |
7 files changed, 195 insertions, 81 deletions
diff --git a/Doc/lib/email.tex b/Doc/lib/email.tex index debed70..56affa5 100644 --- a/Doc/lib/email.tex +++ b/Doc/lib/email.tex @@ -1,5 +1,5 @@ -% Copyright (C) 2001,2002 Python Software Foundation -% Author: barry@zope.com (Barry Warsaw) +% Copyright (C) 2001-2004 Python Software Foundation +% Author: barry@python.org (Barry Warsaw) \section{\module{email} --- An email and MIME handling package} @@ -7,8 +7,8 @@ \declaremodule{standard}{email} \modulesynopsis{Package supporting the parsing, manipulating, and generating email messages, including MIME documents.} -\moduleauthor{Barry A. Warsaw}{barry@zope.com} -\sectionauthor{Barry A. Warsaw}{barry@zope.com} +\moduleauthor{Barry A. Warsaw}{barry@python.org} +\sectionauthor{Barry A. Warsaw}{barry@python.org} \versionadded{2.2} @@ -22,7 +22,7 @@ sending of email messages to SMTP (\rfc{2821}) servers; that is the function of the \refmodule{smtplib} module. The \module{email} package attempts to be as RFC-compliant as possible, supporting in addition to \rfc{2822}, such MIME-related RFCs as -\rfc{2045}-\rfc{2047}, and \rfc{2231}. +\rfc{2045}, \rfc{2046}, \rfc{2047}, and \rfc{2231}. The primary distinguishing feature of the \module{email} package is that it splits the parsing and generating of email messages from the @@ -79,7 +79,7 @@ package, a section on differences and porting is provided. \subsection{Encoders} \input{emailencoders} -\subsection{Exception classes} +\subsection{Exception and Defect classes} \input{emailexc} \subsection{Miscellaneous utilities} @@ -88,14 +88,41 @@ package, a section on differences and porting is provided. \subsection{Iterators} \input{emailiter} -\subsection{Differences from \module{email} v1 (up to Python 2.2.1)} +\subsection{Package History} Version 1 of the \module{email} package was bundled with Python releases up to Python 2.2.1. Version 2 was developed for the Python 2.3 release, and backported to Python 2.2.2. It was also available as -a separate distutils based package. 
\module{email} version 2 is -almost entirely backward compatible with version 1, with the -following differences: +a separate distutils-based package, and is compatible back to Python 2.1. + +\module{email} version 3.0 was released with Python 2.4 and as a separate +distutils-based package. It is compatible back to Python 2.3. + +Here are the differences between \module{email} version 3 and version 2: + +\begin{itemize} +\item The \class{FeedParser} class was introduced, and the \class{Parser} + class was implemented in terms of the \class{FeedParser}. All parsing + there for is non-strict, and parsing will make a best effort never to + raise an exception. Problems found while parsing messages are stored in + the message's \var{defect} attribute. + +\item All aspects of the API which raised \exception{DeprecationWarning}s in + version 2 have been removed. These include the \var{_encoder} argument + to the \class{MIMEText} constructor, the \method{Message.add_payload()} + method, the \function{Utils.dump_address_pair()} function, and the + functions \function{Utils.decode()} and \function{Utils.encode()}. + +\item New \exception{DeprecationWarning}s have been added to: + \method{Generator.__call__()}, \method{Message.get_type()}, + \method{Message.get_main_type()}, \method{Message.get_subtype()}, and + the \var{strict} argument to the \class{Parser} class. These are + expected to be removed in email 3.1. + +\item Support for Pythons earlier than 2.3 has been removed. +\end{itemize} + +Here are the differences between \module{email} version 2 and version 1: \begin{itemize} \item The \module{email.Header} and \module{email.Charset} modules diff --git a/Doc/lib/emailencoders.tex b/Doc/lib/emailencoders.tex index cd54d68..a49e04d 100644 --- a/Doc/lib/emailencoders.tex +++ b/Doc/lib/emailencoders.tex @@ -8,11 +8,11 @@ type messages containing binary data. The \module{email} package provides some convenient encodings in its \module{Encoders} module. 
These encoders are actually used by the -\class{MIMEImage} and \class{MIMEText} class constructors to provide default -encodings. All encoder functions take exactly one argument, the -message object to encode. They usually extract the payload, encode -it, and reset the payload to this newly encoded value. They should also -set the \mailheader{Content-Transfer-Encoding} header as appropriate. +\class{MIMEAudio} and \class{MIMEImage} class constructors to provide default +encodings. All encoder functions take exactly one argument, the message +object to encode. They usually extract the payload, encode it, and reset the +payload to this newly encoded value. They should also set the +\mailheader{Content-Transfer-Encoding} header as appropriate. Here are the encoding functions provided: diff --git a/Doc/lib/emailexc.tex b/Doc/lib/emailexc.tex index 824a276..6ac0889 100644 --- a/Doc/lib/emailexc.tex +++ b/Doc/lib/emailexc.tex @@ -52,3 +52,36 @@ rarely raised in practice. However the exception may also be raised if the \method{attach()} method is called on an instance of a class derived from \class{MIMENonMultipart} (e.g. \class{MIMEImage}). \end{excclassdesc} + +Here's the list of the defects that the \class{FeedParser} can find while +parsing messages. Note that the defects are added to the message where the +problem was found, so for example, if a message nested inside a +\mimetype{multipart/alternative} had a malformed header, that nested message +object would have a defect, but the containing messages would not. + +All defect classes are subclassed from \class{email.Errors.MessageDefect}, but +this class is \emph{not} an exception! + +\versionadded[All the defect classes were added]{2.4} + +\begin{itemize} +\item \class{NoBoundaryInMultipartDefect} -- A message claimed to be a + multipart, but had no \mimetype{boundary} parameter. + +\item \class{StartBoundaryNotFoundDefect} -- The start boundary claimed in the + \mailheader{Content-Type} header was never found. 
+ +\item \class{FirstHeaderLineIsContinuationDefect} -- The message had a + continuation line as its first header line. + +\item \class{MisplacedEnvelopeHeaderDefect} - A ``Unix From'' header was found + in the middle of a header block. + +\item \class{MalformedHeaderDefect} -- A header was found that was missing a + colon, or was otherwise malformed. + +\item \class{MultipartInvariantViolationDefect} -- A message claimed to be a + \mimetype{multipart}, but no subparts were found. Note that when a + message has this defect, its \method{is_multipart()} method may return + false even though its content type claims to be \mimetype{multipart}. +\end{itemize} diff --git a/Doc/lib/emailmessage.tex b/Doc/lib/emailmessage.tex index 1943273..f732054 100644 --- a/Doc/lib/emailmessage.tex +++ b/Doc/lib/emailmessage.tex @@ -359,13 +359,16 @@ the form \code{(CHARSET, LANGUAGE, VALUE)}. Note that both \code{CHARSET} and \code{VALUE} to be encoded in the \code{us-ascii} charset. You can usually ignore \code{LANGUAGE}. -Your application should be prepared to deal with 3-tuple return -values, and can convert the parameter to a Unicode string like so: +If your application doesn't care whether the parameter was encoded as in +\rfc{2231}, you can collapse the parameter value by calling +\function{email.Utils.collapse_rfc2231_value()}, passing in the return value +from \method{get_param()}. This will return a suitably decoded Unicode string +whn the value is a tuple, or the original string unquoted if it isn't. For +example: \begin{verbatim} -param = msg.get_param('foo') -if isinstance(param, tuple): - param = unicode(param[2], param[0] or 'us-ascii') +rawparam = msg.get_param('foo') +param = email.Utils.collapse_rfc2231_value(rawparam) \end{verbatim} In any case, the parameter value (either the returned string, or the @@ -549,32 +552,21 @@ newline get printed after your closing \mimetype{multipart} boundary, set the \var{epilogue} to the empty string. 
\end{datadesc} -\subsubsection{Deprecated methods} - -The following methods are deprecated in \module{email} version 2. -They are documented here for completeness. +\begin{datadesc}{defects} +The \var{defects} attribute contains a list of all the problems found when +parsing this message. See \refmodule{email.Errors} for a detailed description +of the possible parsing defects. -\begin{methoddesc}[Message]{add_payload}{payload} -Add \var{payload} to the message object's existing payload. If, prior -to calling this method, the object's payload was \code{None} -(i.e. never before set), then after this method is called, the payload -will be the argument \var{payload}. +\versionadded{2.4} +\end{datadesc} -If the object's payload was already a list -(i.e. \method{is_multipart()} returns \code{True}), then \var{payload} is -appended to the end of the existing payload list. +\subsubsection{Deprecated methods} -For any other type of existing payload, \method{add_payload()} will -transform the new payload into a list consisting of the old payload -and \var{payload}, but only if the document is already a MIME -multipart document. This condition is satisfied if the message's -\mailheader{Content-Type} header's main type is either -\mimetype{multipart}, or there is no \mailheader{Content-Type} -header. In any other situation, -\exception{MultipartConversionError} is raised. +\versionchanged[The \method{add_payload()} method was removed; use the +\method{attach()} method instead]{2.4} -\deprecated{2.2.2}{Use the \method{attach()} method instead.} -\end{methoddesc} +The following methods are deprecated. They are documented here for +completeness. 
\begin{methoddesc}[Message]{get_type}{\optional{failobj}} Return the message's content type, as a string of the form diff --git a/Doc/lib/emailmimebase.tex b/Doc/lib/emailmimebase.tex index 3318d6a..070c9a2 100644 --- a/Doc/lib/emailmimebase.tex +++ b/Doc/lib/emailmimebase.tex @@ -142,9 +142,7 @@ Optional \var{_subtype} sets the subtype of the message; it defaults to \mimetype{rfc822}. \end{classdesc} -\begin{classdesc}{MIMEText}{_text\optional{, _subtype\optional{, - _charset\optional{, _encoder}}}} - +\begin{classdesc}{MIMEText}{_text\optional{, _subtype\optional{, _charset}}} A subclass of \class{MIMENonMultipart}, the \class{MIMEText} class is used to create MIME objects of major type \mimetype{text}. \var{_text} is the string for the payload. \var{_subtype} is the @@ -153,6 +151,7 @@ character set of the text and is passed as a parameter to the \class{MIMENonMultipart} constructor; it defaults to \code{us-ascii}. No guessing or encoding is performed on the text data. -\deprecated{2.2.2}{The \var{_encoding} argument has been deprecated. -Encoding now happens implicitly based on the \var{_charset} argument.} +\versionchanged[The previously deprecated \var{_encoding} argument has +been removed. Encoding happens implicitly based on the \var{_charset} +argument]{2.4} \end{classdesc} diff --git a/Doc/lib/emailparser.tex b/Doc/lib/emailparser.tex index 1e8597c..5fac92f 100644 --- a/Doc/lib/emailparser.tex +++ b/Doc/lib/emailparser.tex @@ -18,29 +18,79 @@ messages, the root object will return \code{True} from its \method{is_multipart()} method, and the subparts can be accessed via the \method{get_payload()} and \method{walk()} methods. +There are actually two parser interfaces available for use, the classic +\class{Parser} API and the incremental \class{FeedParser} API. The classic +\class{Parser} API is fine if you have the entire text of the message in +memory as a string, or if the entire message lives in a file on the file +system. 
\class{FeedParser} is more appropriate for when you're reading the +message from a stream which might block waiting for more input (e.g. reading +an email message from a socket). The \class{FeedParser} can consume and parse +the message incrementally, and only returns the root object when you close the +parser\footnote{As of email package version 3.0, introduced in +Python 2.4, the classic \class{Parser} was re-implemented in terms of the +\class{FeedParser}, so the semantics and results are identical between the two +parsers.}. + Note that the parser can be extended in limited ways, and of course you can implement your own parser completely from scratch. There is no magical connection between the \module{email} package's bundled parser and the \class{Message} class, so your custom parser can create message object trees any way it finds necessary. -The primary parser class is \class{Parser} which parses both the -headers and the payload of the message. In the case of -\mimetype{multipart} messages, it will recursively parse the body of -the container message. Two modes of parsing are supported, -\emph{strict} parsing, which will usually reject any non-RFC compliant -message, and \emph{lax} parsing, which attempts to adjust for common -MIME formatting problems. +\subsubsection{FeedParser API} + +\versionadded{2.4} + +The \class{FeedParser} provides an API that is conducive to incremental +parsing of email messages, such as would be necessary when reading the text of +an email message from a source that can block (e.g. a socket). The +\class{FeedParser} can of course be used to parse an email message fully +contained in a string or a file, but the classic \class{Parser} API may be +more convenient for such use cases. The semantics and results of the two +parser APIs are identical. + +The \class{FeedParser}'s API is simple; you create an instance, feed it a +bunch of text until there's no more to feed it, then close the parser to +retrieve the root message object. 
The \class{FeedParser} is extremely +accurate when parsing standards-compliant messages, and it does a very good +job of parsing non-compliant messages, providing information about how a +message was deemed broken. It will populate a message object's \var{defects} +attribute with a list of any problems it found in a message. See the +\refmodule{email.Errors} module for the list of defects that it can find. + +Here is the API for the \class{FeedParser}: + +\begin{classdesc}{FeedParser}{\optional{_factory}} +Create a \class{FeedParser} instance. Optional \var{_factory} is a +no-argument callable that will be called whenever a new message object is +needed. It defaults to the \class{email.Message.Message} class. +\end{classdesc} + +\begin{methoddesc}[FeedParser]{feed}{data} +Feed the \class{FeedParser} some more data. \var{data} should be a +string containing one or more lines. The lines can be partial and the +\class{FeedParser} will stitch such partial lines together properly. The +lines in the string can have any of the common three line endings, carriage +return, newline, or carriage return and newline (they can even be mixed). +\end{methoddesc} + +\begin{methoddesc}[FeedParser]{close}{} +Closing a \class{FeedParser} completes the parsing of all previously fed data, +and returns the root message object. It is undefined what happens if you feed +more data to a closed \class{FeedParser}. +\end{methoddesc} -The \module{email.Parser} module also provides a second class, called +\subsubsection{Parser class API} + +The \class{Parser} provides an API that can be used to parse a message when +the complete contents of the message are available in a string or file. The +\module{email.Parser} module also provides a second class, called \class{HeaderParser} which can be used if you're only interested in the headers of the message. 
\class{HeaderParser} can be much faster in these situations, since it does not attempt to parse the message body, instead setting the payload to the raw body as a string. \class{HeaderParser} has the same API as the \class{Parser} class. -\subsubsection{Parser class API} - \begin{classdesc}{Parser}{\optional{_class\optional{, strict}}} The constructor for the \class{Parser} class takes an optional argument \var{_class}. This must be a callable factory (such as a @@ -49,19 +99,14 @@ needs to be created. It defaults to \class{Message} (see \refmodule{email.Message}). The factory will be called without arguments. -The optional \var{strict} flag specifies whether strict or lax parsing -should be performed. Normally, when things like MIME terminating -boundaries are missing, or when messages contain other formatting -problems, the \class{Parser} will raise a -\exception{MessageParseError}. However, when lax parsing is enabled, -the \class{Parser} will attempt to work around such broken formatting -to produce a usable message structure (this doesn't mean -\exception{MessageParseError}s are never raised; some ill-formatted -messages just can't be parsed). The \var{strict} flag defaults to -\code{False} since lax parsing usually provides the most convenient -behavior. +The optional \var{strict} flag is ignored. \deprecated{2.4}{Because the +\class{Parser} class is a backward compatible API wrapper around the +new-in-Python 2.4 \class{FeedParser}, \emph{all} parsing is effectively +non-strict. You should simply stop passing a \var{strict} flag to the +\class{Parser} constructor.} \versionchanged[The \var{strict} flag was added]{2.2.2} +\versionchanged[The \var{strict} flag was deprecated]{2.4} \end{classdesc} The other public \class{Parser} methods are: @@ -149,4 +194,13 @@ Here are some notes on the parsing semantics: object containing a list payload of length 1. Their \method{is_multipart()} method will return \code{True}. 
The single element in the list payload will be a sub-message object. + +\item Some non-standards compliant messages may not be internally consistent + about their \mimetype{multipart}-edness. Such messages may have a + \mailheader{Content-Type} header of type \mimetype{multipart}, but their + \method{is_multipart()} method may return \code{False}. If such + messages were parsed with the \class{FeedParser}, they will have an + instance of the \class{MultipartInvariantViolationDefect} class in their + \var{defects} attribute list. See \refmodule{email.Errors} for + details. \end{itemize} diff --git a/Doc/lib/emailutil.tex b/Doc/lib/emailutil.tex index 80f0acf..c41f066 100644 --- a/Doc/lib/emailutil.tex +++ b/Doc/lib/emailutil.tex @@ -119,24 +119,33 @@ as-is. If \var{charset} is given but \var{language} is not, the string is encoded using the empty string for \var{language}. \end{funcdesc} +\begin{funcdesc}{collapse_rfc2231_value}{value\optional{, errors\optional{, + fallback_charset}}} +When a header parameter is encoded in \rfc{2231} format, +\method{Message.get_param()} may return a 3-tuple containing the character +set, language, and value. \function{collapse_rfc2231_value()} turns this into +a unicode string. Optional \var{errors} is passed to the \var{errors} +argument of the built-in \function{unicode()} function; it defaults to +\code{replace}. Optional \var{fallback_charset} specifies the character set +to use if the one in the \rfc{2231} header is not known by Python; it defaults +to \code{us-ascii}. + +For convenience, if the \var{value} passed to +\function{collapse_rfc2231_value()} is not a tuple, it should be a string and +it is returned unquoted. +\end{funcdesc} + \begin{funcdesc}{decode_params}{params} Decode parameters list according to \rfc{2231}. \var{params} is a sequence of 2-tuples containing elements of the form \code{(content-type, string-value)}. 
\end{funcdesc} -The following functions have been deprecated: - -\begin{funcdesc}{dump_address_pair}{pair} -\deprecated{2.2.2}{Use \function{formataddr()} instead.} -\end{funcdesc} - -\begin{funcdesc}{decode}{s} -\deprecated{2.2.2}{Use \method{Header.decode_header()} instead.} -\end{funcdesc} - +\versionchanged[The \function{dump_address_pair()} function has been removed; +use \function{formataddr()} instead.]{2.4} -\begin{funcdesc}{encode}{s\optional{, charset\optional{, encoding}}} -\deprecated{2.2.2}{Use \method{Header.encode()} instead.} -\end{funcdesc} +\versionchanged[The \function{decode()} function has been removed; use the +\method{Header.decode_header()} method instead.]{2.4} +\versionchanged[The \function{encode()} function has been removed; use the +\method{Header.encode()} method instead.]{2.4}
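
Illustrative sketch (not part of the commit): the documentation changes above describe the incremental FeedParser workflow (create, feed, close), the per-message defects list, and collapse_rfc2231_value(). The snippet below is one way this might look in practice. It assumes a current Python 3 interpreter, where the modules are spelled email.feedparser, email.errors and email.utils rather than the 2.4-era email.Parser, email.Errors and email.Utils named in the diff. The helper parse_incrementally() and the sample message are hypothetical.

```python
# Sketch only: uses the modern Python 3 module names (email.feedparser,
# email.errors, email.utils), which are an assumption relative to the
# 2.4-era email.Parser / email.Errors / email.Utils names in the diff.
from email import errors
from email.feedparser import FeedParser
from email.utils import collapse_rfc2231_value


def parse_incrementally(chunks):
    """Feed text chunks to a FeedParser and return the root message."""
    parser = FeedParser()
    for chunk in chunks:
        # Chunks may end mid-line; the parser stitches partial lines together.
        parser.feed(chunk)
    # close() finishes parsing and hands back the root message object.
    return parser.close()


# Hypothetical input: claims to be multipart but has no boundary parameter.
chunks = [
    "From: alice@example.com\r\nSub",
    "ject: hello\r\nContent-Type: multipart/mixed\r\n",
    "\r\nThis body has no MIME boundaries at all.\r\n",
]
msg = parse_incrementally(chunks)

# Parsing is non-strict: problems are recorded on the message, not raised.
defect_names = [type(d).__name__ for d in msg.defects]
print(msg["Subject"], msg.is_multipart(), defect_names)

# Collapse a get_param()-style value into a plain string, whether it is an
# RFC 2231 (charset, language, value) tuple or an ordinary quoted string.
print(collapse_rfc2231_value(("us-ascii", "en", "report.txt")))
print(collapse_rfc2231_value('"report.txt"'))
```

Because the parser never raises on malformed input here, callers that care about message quality would need to inspect msg.defects themselves, for example by checking for instances of specific defect classes with isinstance().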