From 5db478fa29299416f8475445f2584b20d8e534ed Mon Sep 17 00:00:00 2001 From: Barry Warsaw Date: Tue, 1 Oct 2002 04:33:16 +0000 Subject: Proofread and spell checked, all except the Examples section (which I'll do next). --- Doc/lib/email.tex | 40 +++---- Doc/lib/emailcharsets.tex | 240 +++++++++++++++++++++++++++++++++++++++++ Doc/lib/emailencoders.tex | 2 +- Doc/lib/emailgenerator.tex | 36 ++++--- Doc/lib/emailheaders.tex | 260 ++------------------------------------------- Doc/lib/emailmessage.tex | 87 +++++++-------- Doc/lib/emailmimebase.tex | 16 +-- Doc/lib/emailparser.tex | 24 ++--- Doc/lib/emailutil.tex | 4 +- 9 files changed, 351 insertions(+), 358 deletions(-) create mode 100644 Doc/lib/emailcharsets.tex diff --git a/Doc/lib/email.tex b/Doc/lib/email.tex index aa9f3e5..cbbcf87 100644 --- a/Doc/lib/email.tex +++ b/Doc/lib/email.tex @@ -39,14 +39,13 @@ and parsing message field values, creating RFC-compliant dates, etc. The following sections describe the functionality of the \module{email} package. The ordering follows a progression that should be common in applications: an email message is read as flat -text from a file or other source, the text is parsed to produce an -object model representation of the email message, this model is -manipulated, and finally the model is rendered back into -flat text. +text from a file or other source, the text is parsed to produce the +object structure of the email message, this structure is manipulated, +and finally rendered back into flat text. -It is perfectly feasible to create the object model out of whole cloth ---- i.e. completely from scratch. From there, a similar progression -can be taken as above. +It is perfectly feasible to create the object structure out of whole +cloth --- i.e. completely from scratch. From there, a similar +progression can be taken as above. 
Also included are detailed specifications of all the classes and modules that the \module{email} package provides, the exception @@ -71,9 +70,12 @@ package, a section on differences and porting is provided. \subsection{Creating email and MIME objects from scratch} \input{emailmimebase} -\subsection{Headers, Character sets, and Internationalization} +\subsection{Internationalized headers} \input{emailheaders} +\subsection{Representing character sets} +\input{emailcharsets} + \subsection{Encoders} \input{emailencoders} @@ -92,7 +94,7 @@ Version 1 of the \module{email} package was bundled with Python releases up to Python 2.2.1. Version 2 was developed for the Python 2.3 release, and backported to Python 2.2.2. It was also available as a separate distutils based package. \module{email} version 2 is -almost entirely backwards compatible with version 1, with the +almost entirely backward compatible with version 1, with the following differences: \begin{itemize} @@ -100,31 +102,31 @@ following differences: have been added. \item The pickle format for \class{Message} instances has changed. Since this was never (and still isn't) formally defined, this - isn't considered a backwards incompatibility. However if your + isn't considered a backward incompatibility. However if your application pickles and unpickles \class{Message} instances, be aware that in \module{email} version 2, \class{Message} instances now have private variables \var{_charset} and \var{_default_type}. \item Several methods in the \class{Message} class have been - deprecated, or their signatures changes. Also, many new methods + deprecated, or their signatures changed. Also, many new methods have been added. See the documentation for the \class{Message} - class for deatils. The changes should be completely backwards + class for details. The changes should be completely backward compatible. \item The object structure has changed in the face of \mimetype{message/rfc822} content types. 
In \module{email} version 1, such a type would be represented by a scalar payload, i.e. the container message's \method{is_multipart()} returned - false, \method{get_payload()} was not a list object, and was - actually a \class{Message} instance. + false, \method{get_payload()} was not a list object, but a single + \class{Message} instance. This structure was inconsistent with the rest of the package, so the object representation for \mimetype{message/rfc822} content - types was changed. In module{email} version 2, the container + types was changed. In \module{email} version 2, the container \emph{does} return \code{True} from \method{is_multipart()}, and \method{get_payload()} returns a list containing a single \class{Message} item. - Note that this is one place that backwards compatibility could + Note that this is one place that backward compatibility could not be completely maintained. However, if you're already testing the return type of \method{get_payload()}, you should be fine. You just need to make sure your code doesn't do a @@ -142,7 +144,7 @@ following differences: \module{email.Generator} module was added. \item The intermediate base classes \class{MIMENonMultipart} and \class{MIMEMultipart} have been added, and interposed in the - class heirarchy for most of the other MIME-related derived + class hierarchy for most of the other MIME-related derived classes. \item The \var{_encoder} argument to the \class{MIMEText} constructor has been deprecated. Encoding now happens implicitly based @@ -167,7 +169,9 @@ method names are more consistent, and some methods or modules have either been added or removed. The semantics of some of the methods have also changed. For the most part, any functionality available in \module{mimelib} is still available in the \refmodule{email} package, -albeit often in a different way. +albeit often in a different way. Backward compatibility between +the \module{mimelib} package and the \module{email} package was not a +priority. 
Here is a brief description of the differences between the \module{mimelib} and the \refmodule{email} packages, along with hints on diff --git a/Doc/lib/emailcharsets.tex b/Doc/lib/emailcharsets.tex new file mode 100644 index 0000000..d1ae728 --- /dev/null +++ b/Doc/lib/emailcharsets.tex @@ -0,0 +1,240 @@ +\declaremodule{standard}{email.Charset} +\modulesynopsis{Character Sets} + +This module provides a class \class{Charset} for representing +character sets and character set conversions in email messages, as +well as a character set registry and several convenience methods for +manipulating this registry. Instances of \class{Charset} are used in +several other modules within the \module{email} package. + +\versionadded{2.2.2} + +\begin{classdesc}{Charset}{\optional{input_charset}} +Map character sets to their email properties. + +This class provides information about the requirements imposed on +email for a specific character set. It also provides convenience +routines for converting between character sets, given the availability +of the applicable codecs. Given a character set, it will do its best +to provide information on how to use that character set in an email +message in an RFC-compliant way. + +Certain character sets must be encoded with quoted-printable or base64 +when used in email headers or bodies. Certain character sets must be +converted outright, and are not allowed in email. + +Optional \var{input_charset} is as described below. After being alias +normalized it is also used as a lookup into the registry of character +sets to find out the header encoding, body encoding, and output +conversion codec to be used for the character set. For example, if +\var{input_charset} is \code{iso-8859-1}, then headers and bodies will +be encoded using quoted-printable and no output conversion codec is +necessary. 
If \var{input_charset} is \code{euc-jp}, then headers will +be encoded with base64, bodies will not be encoded, but output text +will be converted from the \code{euc-jp} character set to the +\code{iso-2022-jp} character set. +\end{classdesc} + +\class{Charset} instances have the following data attributes: + +\begin{datadesc}{input_charset} +The initial character set specified. Common aliases are converted to +their \emph{official} email names (e.g. \code{latin_1} is converted to +\code{iso-8859-1}). Defaults to 7-bit \code{us-ascii}. +\end{datadesc} + +\begin{datadesc}{header_encoding} +If the character set must be encoded before it can be used in an +email header, this attribute will be set to \code{Charset.QP} (for +quoted-printable), \code{Charset.BASE64} (for base64 encoding), or +\code{Charset.SHORTEST} for the shortest of QP or BASE64 encoding. +Otherwise, it will be \code{None}. +\end{datadesc} + +\begin{datadesc}{body_encoding} +Same as \var{header_encoding}, but describes the encoding for the +mail message's body, which indeed may be different than the header +encoding. \code{Charset.SHORTEST} is not allowed for +\var{body_encoding}. +\end{datadesc} + +\begin{datadesc}{output_charset} +Some character sets must be converted before they can be used in +email headers or bodies. If the \var{input_charset} is one of +them, this attribute will contain the name of the character set +output will be converted to. Otherwise, it will be \code{None}. +\end{datadesc} + +\begin{datadesc}{input_codec} +The name of the Python codec used to convert the \var{input_charset} to +Unicode. If no conversion codec is necessary, this attribute will be +\code{None}. +\end{datadesc} + +\begin{datadesc}{output_codec} +The name of the Python codec used to convert Unicode to the +\var{output_charset}. If no conversion codec is necessary, this +attribute will have the same value as the \var{input_codec}. 
+\end{datadesc} + +\class{Charset} instances also have the following methods: + +\begin{methoddesc}[Charset]{get_body_encoding}{} +Return the content transfer encoding used for body encoding. + +This is either the string \samp{quoted-printable} or \samp{base64} +depending on the encoding used, or it is a function, in which case you +should call the function with a single argument, the Message object +being encoded. The function should then set the +\mailheader{Content-Transfer-Encoding} header itself to whatever is +appropriate. + +Returns the string \samp{quoted-printable} if +\var{body_encoding} is \code{QP}, returns the string +\samp{base64} if \var{body_encoding} is \code{BASE64}, and returns the +string \samp{7bit} otherwise. +\end{methoddesc} + +\begin{methoddesc}{convert}{s} +Convert the string \var{s} from the \var{input_codec} to the +\var{output_codec}. +\end{methoddesc} + +\begin{methoddesc}{to_splittable}{s} +Convert a possibly multibyte string to a safely splittable format. +\var{s} is the string to split. + +Uses the \var{input_codec} to try and convert the string to Unicode, +so it can be safely split on character boundaries (even for multibyte +characters). + +Returns the string as-is if it isn't known how to convert \var{s} to +Unicode with the \var{input_charset}. + +Characters that could not be converted to Unicode will be replaced +with the Unicode replacement character \character{U+FFFD}. +\end{methoddesc} + +\begin{methoddesc}{from_splittable}{ustr\optional{, to_output}} +Convert a splittable string back into an encoded string. \var{ustr} +is a Unicode string to ``unsplit''. + +This method uses the proper codec to try and convert the string from +Unicode back into an encoded format. Return the string as-is if it is +not Unicode, or if it could not be converted from Unicode. + +Characters that could not be converted from Unicode will be replaced +with an appropriate character (usually \character{?}). 
+ +If \var{to_output} is \code{True} (the default), uses +\var{output_codec} to convert to an +encoded format. If \var{to_output} is \code{False}, it uses +\var{input_codec}. +\end{methoddesc} + +\begin{methoddesc}{get_output_charset}{} +Return the output character set. + +This is the \var{output_charset} attribute if that is not \code{None}, +otherwise it is \var{input_charset}. +\end{methoddesc} + +\begin{methoddesc}{encoded_header_len}{} +Return the length of the encoded header string, properly calculating +for quoted-printable or base64 encoding. +\end{methoddesc} + +\begin{methoddesc}{header_encode}{s\optional{, convert}} +Header-encode the string \var{s}. + +If \var{convert} is \code{True}, the string will be converted from the +input charset to the output charset automatically. This is not useful +for multibyte character sets, which have line length issues (multibyte +characters must be split on a character, not a byte boundary); use the +higher-level \class{Header} class to deal with these issues (see +\refmodule{email.Header}). \var{convert} defaults to \code{False}. + +The type of encoding (base64 or quoted-printable) will be based on +the \var{header_encoding} attribute. +\end{methoddesc} + +\begin{methoddesc}{body_encode}{s\optional{, convert}} +Body-encode the string \var{s}. + +If \var{convert} is \code{True} (the default), the string will be +converted from the input charset to output charset automatically. +Unlike \method{header_encode()}, there are no issues with byte +boundaries and multibyte charsets in email bodies, so this is usually +pretty safe. + +The type of encoding (base64 or quoted-printable) will be based on +the \var{body_encoding} attribute. +\end{methoddesc} + +The \class{Charset} class also provides a number of methods to support +standard operations and built-in functions. + +\begin{methoddesc}[Charset]{__str__}{} +Returns \var{input_charset} as a string coerced to lower case. 
+\end{methoddesc} + +\begin{methoddesc}[Charset]{__eq__}{other} +This method allows you to compare two \class{Charset} instances for equality. +\end{methoddesc} + +\begin{methoddesc}[Charset]{__ne__}{other} +This method allows you to compare two \class{Charset} instances for inequality. +\end{methoddesc} + +The \module{email.Charset} module also provides the following +functions for adding new entries to the global character set, alias, +and codec registries: + +\begin{funcdesc}{add_charset}{charset\optional{, header_enc\optional{, + body_enc\optional{, output_charset}}}} +Add character properties to the global registry. + +\var{charset} is the input character set, and must be the canonical +name of a character set. + +Optional \var{header_enc} and \var{body_enc} are each either +\code{Charset.QP} for quoted-printable, \code{Charset.BASE64} for +base64 encoding, \code{Charset.SHORTEST} for the shortest of +quoted-printable or base64 encoding, or \code{None} for no encoding. +\code{SHORTEST} is only valid for \var{header_enc}. The default is +\code{None} for no encoding. + +Optional \var{output_charset} is the character set that the output +should be in. Conversions will proceed from input charset, to +Unicode, to the output charset when the method +\method{Charset.convert()} is called. The default is to output in the +same character set as the input. + +Both \var{input_charset} and \var{output_charset} must have Unicode +codec entries in the module's character set-to-codec mapping; use +\function{add_codec()} to add codecs the module does +not know about. See the \refmodule{codecs} module's documentation for +more information. + +The global character set registry is kept in the module global +dictionary \code{CHARSETS}. +\end{funcdesc} + +\begin{funcdesc}{add_alias}{alias, canonical} +Add a character set alias. \var{alias} is the alias name, +e.g. \code{latin-1}. \var{canonical} is the character set's canonical +name, e.g. \code{iso-8859-1}. 
+ +The global charset alias registry is kept in the module global +dictionary \code{ALIASES}. +\end{funcdesc} + +\begin{funcdesc}{add_codec}{charset, codecname} +Add a codec that maps characters in the given character set to and from +Unicode. + +\var{charset} is the canonical name of a character set. +\var{codecname} is the name of a Python codec, as appropriate for the +second argument to the \function{unicode()} built-in, or to the +\method{encode()} method of a Unicode string. +\end{funcdesc} diff --git a/Doc/lib/emailencoders.tex b/Doc/lib/emailencoders.tex index 4b4e637..cd54d68 100644 --- a/Doc/lib/emailencoders.tex +++ b/Doc/lib/emailencoders.tex @@ -17,7 +17,7 @@ set the \mailheader{Content-Transfer-Encoding} header as appropriate. Here are the encoding functions provided: \begin{funcdesc}{encode_quopri}{msg} -Encodes the payload into quoted-Printable form and sets the +Encodes the payload into quoted-printable form and sets the \mailheader{Content-Transfer-Encoding} header to \code{quoted-printable}\footnote{Note that encoding with \method{encode_quopri()} also encodes all tabs and space characters in diff --git a/Doc/lib/emailgenerator.tex b/Doc/lib/emailgenerator.tex index 03fee9f..01c12d0 100644 --- a/Doc/lib/emailgenerator.tex +++ b/Doc/lib/emailgenerator.tex @@ -24,12 +24,12 @@ Here are the public methods of the \class{Generator} class: The constructor for the \class{Generator} class takes a file-like object called \var{outfp} for an argument. \var{outfp} must support the \method{write()} method and be usable as the output file in a -Python 2.0 extended print statement. +Python extended print statement. Optional \var{mangle_from_} is a flag that, when \code{True}, puts a \samp{>} character in front of any line in the body that starts exactly as -\samp{From } (i.e. \code{From} followed by a space at the front of the -line). This is the only guaranteed portable way to avoid having such +\samp{From }, i.e. 
\code{From} followed by a space at the beginning of the +line. This is the only guaranteed portable way to avoid having such lines be mistaken for a Unix mailbox format envelope header separator (see \ulink{WHY THE CONTENT-LENGTH FORMAT IS BAD} {http://home.netscape.com/eng/mozilla/2.0/relnotes/demo/content-length.html} @@ -48,10 +48,10 @@ recommended (but not required) by \rfc{2822}. The other public \class{Generator} methods are: -\begin{methoddesc}[Generator]{flatten()}{msg\optional{, unixfrom}} +\begin{methoddesc}[Generator]{flatten}{msg\optional{, unixfrom}} Print the textual representation of the message object structure rooted at \var{msg} to the output file specified when the \class{Generator} -instance was created. Sub-objects are visited depth-first and the +instance was created. Subparts are visited depth-first and the resulting text will be properly MIME encoded. Optional \var{unixfrom} is a flag that forces the printing of the @@ -60,7 +60,7 @@ root message object. If the root object has no envelope header, a standard one is crafted. By default, this is set to \code{False} to inhibit the printing of the envelope delimiter. -Note that for sub-objects, no envelope header is ever printed. +Note that for subparts, no envelope header is ever printed. \versionadded{2.2.2} \end{methoddesc} @@ -99,16 +99,20 @@ Optional \var{_mangle_from_} and \var{maxheaderlen} are as with the \class{Generator} base class. If the subpart is not of main type \mimetype{text}, optional \var{fmt} -is a format string that is used instead of the message -payload. 
\var{fmt} is expanded with the following keywords (in -\samp{\%(keyword)s} format): - -type : Full MIME type of the non-\mimetype{text} part -maintype : Main MIME type of the non-\mimetype{text} part -subtype : Sub-MIME type of the non-\mimetype{text} part -filename : Filename of the non-\mimetype{text} part -description: Description associated with the non-\mimetype{text} part -encoding : Content transfer encoding of the non-\mimetype{text} part +is a format string that is used instead of the message payload. +\var{fmt} is expanded with the following keywords, \samp{\%(keyword)s} +format: + +\begin{itemize} +\item \code{type} -- Full MIME type of the non-\mimetype{text} part +\item \code{maintype} -- Main MIME type of the non-\mimetype{text} part +\item \code{subtype} -- Sub-MIME type of the non-\mimetype{text} part +\item \code{filename} -- Filename of the non-\mimetype{text} part +\item \code{description} -- Description associated with the + non-\mimetype{text} part +\item \code{encoding} -- Content transfer encoding of the + non-\mimetype{text} part +\end{itemize} The default value for \var{fmt} is \code{None}, meaning diff --git a/Doc/lib/emailheaders.tex b/Doc/lib/emailheaders.tex index 172e5d6..66eb716 100644 --- a/Doc/lib/emailheaders.tex +++ b/Doc/lib/emailheaders.tex @@ -3,7 +3,7 @@ \rfc{2822} is the base standard that describes the format of email messages. It derives from the older \rfc{822} standard which came -into widespread at a time when most email was composed of \ASCII{} +into widespread use at a time when most email was composed of \ASCII{} characters only. \rfc{2822} is a specification written assuming email contains only 7-bit \ASCII{} characters. 
@@ -19,10 +19,9 @@ The \module{email} package supports these standards in its If you want to include non-\ASCII{} characters in your email headers, say in the \mailheader{Subject} or \mailheader{To} fields, you should -use the \class{Header} class (in module \module{email.Header} and -assign the field in the \class{Message} object to an instance of -\class{Header} instead of using a string for the header value. For -example: +use the \class{Header} class and assign the field in the +\class{Message} object to an instance of \class{Header} instead of +using a string for the header value. For example: \begin{verbatim} >>> from email.Message import Message @@ -50,7 +49,8 @@ Here is the \class{Header} class description: \begin{classdesc}{Header}{\optional{s\optional{, charset\optional{, maxlinelen\optional{, header_name\optional{, continuation_ws}}}}}} -Create a MIME-compliant header that can contain many character sets. +Create a MIME-compliant header that can contain strings in different +character sets. Optional \var{s} is the initial header value. If \code{None} (the default), the initial header value is not set. You can later append @@ -74,7 +74,7 @@ e.g. \mailheader{Subject}) pass in the name of the field in default value for \var{header_name} is \code{None}, meaning it is not taken into account for the first line of a long, split header. -Optional \var{continuation_ws} must be RFC 2822 compliant folding +Optional \var{continuation_ws} must be \rfc{2822}-compliant folding whitespace, and is usually either a space or a hard tab character. This character will be prepended to continuation lines. \end{classdesc} @@ -89,7 +89,7 @@ will be converted to a \class{Charset} instance. A value of constructor is used. \var{s} may be a byte string or a Unicode string. If it is a byte -string (i.e. \code{isinstance(s, StringType)} is true), then +string (i.e. 
\code{isinstance(s, str)} is true), then \var{charset} is the encoding of that byte string, and a \exception{UnicodeError} will be raised if the string cannot be decoded with that character set. @@ -113,7 +113,7 @@ standard operators and built-in functions. \begin{methoddesc}[Header]{__str__}{} A synonym for \method{Header.encode()}. Useful for -\code{str(aHeader)} calls. +\code{str(aHeader)}. \end{methoddesc} \begin{methoddesc}[Header]{__unicode__}{} @@ -165,245 +165,3 @@ This function takes one of those sequence of pairs and returns a \var{header_name}, and \var{continuation_ws} are as in the \class{Header} constructor. \end{funcdesc} - -\declaremodule{standard}{email.Charset} -\modulesynopsis{Character Sets} - -This module provides a class \class{Charset} for representing -character sets and character set conversions in email messages, as -well as a character set registry and several convenience methods for -manipulating this registry. Instances of \class{Charset} are used in -several other modules within the \module{email} package. - -\versionadded{2.2.2} - -\begin{classdesc}{Charset}{\optional{input_charset}} -Map character sets to their email properties. - -This class provides information about the requirements imposed on -email for a specific character set. It also provides convenience -routines for converting between character sets, given the availability -of the applicable codecs. Given a character set, it will do its best -to provide information on how to use that character set in an email -message in an RFC-compliant way. - -Certain character sets must be encoded with quoted-printable or base64 -when used in email headers or bodies. Certain character sets must be -converted outright, and are not allowed in email. - -Optional \var{input_charset} is as described below. 
After being alias -normalized it is also used as a lookup into the registry of character -sets to find out the header encoding, body encoding, and output -conversion codec to be used for the character set. For example, if -\var{input_charset} is \code{iso-8859-1}, then headers and bodies will -be encoded using quoted-printable and no output conversion codec is -necessary. If \var{input_charset} is \code{euc-jp}, then headers will -be encoded with base64, bodies will not be encoded, but output text -will be converted from the \code{euc-jp} character set to the -\code{iso-2022-jp} character set. -\end{classdesc} - -\class{Charset} instances have the following data attributes: - -\begin{datadesc}{input_charset} -The initial character set specified. Common aliases are converted to -their \emph{official} email names (e.g. \code{latin_1} is converted to -\code{iso-8859-1}). Defaults to 7-bit \code{us-ascii}. -\end{datadesc} - -\begin{datadesc}{header_encoding} -If the character set must be encoded before it can be used in an -email header, this attribute will be set to \code{Charset.QP} (for -quoted-printable), \code{Charset.BASE64} (for base64 encoding), or -\code{Charset.SHORTEST} for the shortest of QP or BASE64 encoding. -Otherwise, it will be \code{None}. -\end{datadesc} - -\begin{datadesc}{body_encoding} -Same as \var{header_encoding}, but describes the encoding for the -mail message's body, which indeed may be different than the header -encoding. \code{Charset.SHORTEST} is not allowed for -\var{body_encoding}. -\end{datadesc} - -\begin{datadesc}{output_charset} -Some character sets must be converted before the can be used in -email headers or bodies. If the \var{input_charset} is one of -them, this attribute will contain the name of the character set -output will be converted to. Otherwise, it will be \code{None}. -\end{datadesc} - -\begin{datadesc}{input_codec} -The name of the Python codec used to convert the \var{input_charset} to -Unicode. 
If no conversion codec is necessary, this attribute will be -\code{None}. -\end{datadesc} - -\begin{datadesc}{output_codec} -The name of the Python codec used to convert Unicode to the -\var{output_charset}. If no conversion codec is necessary, this -attribute will have the same value as the \var{input_codec}. -\end{datadesc} - -\class{Charset} instances also have the following methods: - -\begin{methoddesc}[Charset]{get_body_encoding}{} -Return the content transfer encoding used for body encoding. - -This is either the string \samp{quoted-printable} or \samp{base64} -depending on the encoding used, or it is a function, in which case you -should call the function with a single argument, the Message object -being encoded. The function should then set the -\mailheader{Content-Transfer-Encoding} header itself to whatever is -appropriate. - -Returns the string \samp{quoted-printable} if -\var{body_encoding} is \code{QP}, returns the string -\samp{base64} if \var{body_encoding} is \code{BASE64}, and returns the -string \samp{7bit} otherwise. -\end{methoddesc} - -\begin{methoddesc}{convert}{s} -Convert the string \var{s} from the \var{input_codec} to the -\var{output_codec}. -\end{methoddesc} - -\begin{methoddesc}{to_splittable}{s} -Convert a possibly multibyte string to a safely splittable format. -\var{s} is the string to split. - -Uses the \var{input_codec} to try and convert the string to Unicode, -so it can be safely split on character boundaries (even for multibyte -characters). - -Returns the string as-is if it isn't known how to convert \var{s} to -Unicode with the \var{input_charset}. - -Characters that could not be converted to Unicode will be replaced -with the Unicode replacement character \character{U+FFFD}. -\end{methoddesc} - -\begin{methoddesc}{from_splittable}{ustr\optional{, to_output}} -Convert a splittable string back into an encoded string. \var{ustr} -is a Unicode string to ``unsplit''. 
- -This method uses the proper codec to try and convert the string from -Unicode back into an encoded format. Return the string as-is if it is -not Unicode, or if it could not be converted from Unicode. - -Characters that could not be converted from Unicode will be replaced -with an appropriate character (usually \character{?}). - -If \var{to_output} is \code{True} (the default), uses -\var{output_codec} to convert to an -encoded format. If \var{to_output} is \code{False}, it uses -\var{input_codec}. -\end{methoddesc} - -\begin{methoddesc}{get_output_charset}{} -Return the output character set. - -This is the \var{output_charset} attribute if that is not \code{None}, -otherwise it is \var{input_charset}. -\end{methoddesc} - -\begin{methoddesc}{encoded_header_len}{} -Return the length of the encoded header string, properly calculating -for quoted-printable or base64 encoding. -\end{methoddesc} - -\begin{methoddesc}{header_encode}{s\optional{, convert}} -Header-encode the string \var{s}. - -If \var{convert} is \code{True}, the string will be converted from the -input charset to the output charset automatically. This is not useful -for multibyte character sets, which have line length issues (multibyte -characters must be split on a character, not a byte boundary); use the -higher-level \class{Header} class to deal with these issues (see -\refmodule{email.Header}). \var{convert} defaults to \code{False}. - -The type of encoding (base64 or quoted-printable) will be based on -the \var{header_encoding} attribute. -\end{methoddesc} - -\begin{methoddesc}{body_encode}{s\optional{, convert}} -Body-encode the string \var{s}. - -If \var{convert} is \code{True} (the default), the string will be -converted from the input charset to output charset automatically. -Unlike \method{header_encode()}, there are no issues with byte -boundaries and multibyte charsets in email bodies, so this is usually -pretty safe. 
- -The type of encoding (base64 or quoted-printable) will be based on -the \var{body_encoding} attribute. -\end{methoddesc} - -The \class{Charset} class also provides a number of methods to support -standard operations and built-in functions. - -\begin{methoddesc}[Charset]{__str__}{} -Returns \var{input_charset} as a string coerced to lower case. -\end{methoddesc} - -\begin{methoddesc}[Charset]{__eq__}{other} -This method allows you to compare two \class{Charset} instances for equality. -\end{methoddesc} - -\begin{methoddesc}[Header]{__ne__}{other} -This method allows you to compare two \class{Charset} instances for inequality. -\end{methoddesc} - -The \module{email.Charset} module also provides the following -functions for adding new entries to the global character set, alias, -and codec registries: - -\begin{funcdesc}{add_charset}{charset\optional{, header_enc\optional{, - body_enc\optional{, output_charset}}}} -Add character properties to the global registry. - -\var{charset} is the input character set, and must be the canonical -name of a character set. - -Optional \var{header_enc} and \var{body_enc} is either -\code{Charset.QP} for quoted-printable, \code{Charset.BASE64} for -base64 encoding, \code{Charset.SHORTEST} for the shortest of qp or -base64 encoding, or \code{None} for no encoding. \code{SHORTEST} is -only valid for \var{header_enc}. It describes how message headers and -message bodies in the input charset are to be encoded. Default is no -encoding. - -Optional \var{output_charset} is the character set that the output -should be in. Conversions will proceed from input charset, to -Unicode, to the output charset when the method -\method{Charset.convert()} is called. The default is to output in the -same character set as the input. 
- -Both \var{input_charset} and \var{output_charset} must have Unicode -codec entries in the module's character set-to-codec mapping; use -\function{add_codec(charset, codecname)} to add codecs the module does -not know about. See the \refmodule{codecs} module's documentation for -more information. - -The global character set registry is kept in the module global -dictionary \code{CHARSETS}. -\end{funcdesc} - -\begin{funcdesc}{add_alias}{alias, canonical} -Add a character set alias. \var{alias} is the alias name, -e.g. \code{latin-1}. \var{canonical} is the character set's canonical -name, e.g. \code{iso-8859-1}. - -The global charset alias registry is kept in the module global -dictionary \code{ALIASES}. -\end{funcdesc} - -\begin{funcdesc}{add_codec}{charset, codecname} -Add a codec that map characters in the given character set to and from -Unicode. - -\var{charset} is the canonical name of a character set. -\var{codecname} is the name of a Python codec, as appropriate for the -second argument to the \function{unicode()} built-in, or to the -\method{encode()} method of a Unicode string. -\end{funcdesc} diff --git a/Doc/lib/emailmessage.tex b/Doc/lib/emailmessage.tex index 271619d..d76e7fd 100644 --- a/Doc/lib/emailmessage.tex +++ b/Doc/lib/emailmessage.tex @@ -33,9 +33,9 @@ The constructor takes no arguments. \end{classdesc} \begin{methoddesc}[Message]{as_string}{\optional{unixfrom}} -Return the entire formatted message as a string. Optional -\var{unixfrom}, when true, specifies to include the \emph{Unix-From} -envelope header; it defaults to \code{False}. +Return the entire message flattened as a string. When optional +\var{unixfrom} is \code{True}, the envelope header is included in the +returned string. \var{unixfrom} defaults to \code{False}. \end{methoddesc} \begin{methoddesc}[Message]{__str__}{} @@ -59,7 +59,7 @@ envelope header was never set. 
\end{methoddesc} \begin{methoddesc}[Message]{attach}{payload} -Add the given payload to the current payload, which must be +Add the given \var{payload} to the current payload, which must be \code{None} or a list of \class{Message} objects before the call. After the call, the payload will always be a list of \class{Message} objects. If you want to set the payload to a scalar object (e.g. a @@ -95,7 +95,7 @@ returned. The default for \var{decode} is \code{False}. \begin{methoddesc}[Message]{set_payload}{payload\optional{, charset}} Set the entire message object's payload to \var{payload}. It is the client's responsibility to ensure the payload invariants. Optional -\var{charset} sets the message's default character set (see +\var{charset} sets the message's default character set; see \method{set_charset()} for details. \versionchanged[\var{charset} argument added]{2.2.2} @@ -103,7 +103,7 @@ client's responsibility to ensure the payload invariants. Optional \begin{methoddesc}[Message]{set_charset}{charset} Set the character set of the payload to \var{charset}, which can -either be a \class{Charset} instance (see \refmodule{email.Charset}, a +either be a \class{Charset} instance (see \refmodule{email.Charset}), a string naming a character set, or \code{None}. If it is a string, it will be converted to a \class{Charset} instance. If \var{charset} is \code{None}, the @@ -128,14 +128,18 @@ Return the \class{Charset} instance associated with the message's payload. \end{methoddesc} The following methods implement a mapping-like interface for accessing -the message object's \rfc{2822} headers. Note that there are some +the message's \rfc{2822} headers. Note that there are some semantic differences between these methods and a normal mapping (i.e. dictionary) interface. For example, in a dictionary there are no duplicate keys, but here there may be duplicate message headers. 
Also, in dictionaries there is no guaranteed order to the keys returned by -\method{keys()}, but in a \class{Message} object, there is an explicit -order. These semantic differences are intentional and are biased -toward maximal convenience. +\method{keys()}, but in a \class{Message} object, headers are always +returned in the order they appeared in the original message, or were +added to the message later. Any header deleted and then re-added is +always appended to the end of the header list. + +These semantic differences are intentional and are biased toward +maximal convenience. Note that in all cases, any envelope header present in the message is not included in the mapping interface. @@ -175,8 +179,7 @@ fields. Note that this does \emph{not} overwrite or delete any existing header with the same name. If you want to ensure that the new header is the only one present in the message with field name -\var{name}, first use \method{__delitem__()} to delete all named -fields, e.g.: +\var{name}, delete the field first, e.g.: \begin{verbatim} del msg['subject'] @@ -196,27 +199,16 @@ otherwise return false. \end{methoddesc} \begin{methoddesc}[Message]{keys}{} -Return a list of all the message's header field names. These keys -will be sorted in the order in which they appeared in the original -message, or were added to the message and may contain -duplicates. Any fields deleted and then subsequently re-added are -always appended to the end of the header list. +Return a list of all the message's header field names. \end{methoddesc} \begin{methoddesc}[Message]{values}{} -Return a list of all the message's field values. These will be sorted -in the order in which they appeared in the original message, or were -added to the message, and may contain -duplicates. Any fields deleted and then subsequently re-added are -always appended to the end of the header list. +Return a list of all the message's field values.
\end{methoddesc} \begin{methoddesc}[Message]{items}{} Return a list of 2-tuples containing all the message's field headers -and values. These will be sorted in the order in which they appeared -in the original message, or were added to the message, and may contain -duplicates. Any fields deleted and then subsequently re-added are -always appended to the end of the header list. +and values. \end{methoddesc} \begin{methoddesc}[Message]{get}{name\optional{, failobj}} @@ -228,11 +220,7 @@ if the named header is missing (defaults to \code{None}). Here are some additional useful methods: \begin{methoddesc}[Message]{get_all}{name\optional{, failobj}} -Return a list of all the values for the field named \var{name}. These -will be sorted in the order in which they appeared in the original -message, or were added to the message. Any fields deleted and then -subsequently re-added are always appended to the end of the list. - +Return a list of all the values for the field named \var{name}. If there are no such named headers in the message, \var{failobj} is returned (defaults to \code{None}). \end{methoddesc} @@ -351,10 +339,10 @@ instead of \mailheader{Content-Type}. Parameter keys are always compared case insensitively. The return value can either be a string, or a 3-tuple if the parameter was \rfc{2231} encoded. When it's a 3-tuple, the elements of the value are of -the form \samp{(CHARSET, LANGUAGE, VALUE)}, where \var{LANGUAGE} may +the form \code{(CHARSET, LANGUAGE, VALUE)}, where \code{LANGUAGE} may be the empty string. 
Your application should be prepared to deal with -3-tuple return values, which it can convert the parameter to a Unicode -string like so: +3-tuple return values, which it can convert to a Unicode string like +so: \begin{verbatim} param = msg.get_param('foo') @@ -363,7 +351,7 @@ if isinstance(param, tuple): \end{verbatim} In any case, the parameter value (either the returned string, or the -\var{VALUE} item in the 3-tuple) is always unquoted, unless +\code{VALUE} item in the 3-tuple) is always unquoted, unless \var{unquote} is set to \code{False}. \versionchanged[\var{unquote} argument added, and 3-tuple return value @@ -398,7 +386,7 @@ Remove the given parameter completely from the \mailheader{Content-Type} header. The header will be re-written in place without the parameter or its value. All values will be quoted as necessary unless \var{requote} is \code{False} (the default is -\code{True}). Optional \var{header} specifies an alterative to +\code{True}). Optional \var{header} specifies an alternative to \mailheader{Content-Type}. \versionadded{2.2.2} @@ -417,8 +405,8 @@ leaves the existing header's quoting as is, otherwise the parameters will be quoted (the default). An alternative header can be specified in the \var{header} argument. -When the \mailheader{Content-Type} header is set, we'll always also -add a \mailheader{MIME-Version} header. +When the \mailheader{Content-Type} header is set, a +\mailheader{MIME-Version} header is also added. \versionadded{2.2.2} \end{methoddesc} @@ -440,11 +428,10 @@ returned string will always be unquoted as per \end{methoddesc} \begin{methoddesc}[Message]{set_boundary}{boundary} -Set the \code{boundary} parameter of the \mailheader{Content-Type} header -to \var{boundary}. \method{set_boundary()} will always quote -\var{boundary} so you should not quote it yourself. A -\exception{HeaderParseError} is raised if the message object has no -\mailheader{Content-Type} header.
+Set the \code{boundary} parameter of the \mailheader{Content-Type} +header to \var{boundary}. \method{set_boundary()} will always quote +\var{boundary} if necessary. A \exception{HeaderParseError} is raised +if the message object has no \mailheader{Content-Type} header. Note that using this method is subtly different than deleting the old \mailheader{Content-Type} header and adding a new one with the new boundary @@ -459,9 +446,9 @@ Return the \code{charset} parameter of the \mailheader{Content-Type} header. If there is no \mailheader{Content-Type} header, or if that header has no \code{charset} parameter, \var{failobj} is returned. -Note that this method differs from \method{get_charset} which returns -the \class{Charset} instance for the default encoding of the message -body. +Note that this method differs from \method{get_charset()} which +returns the \class{Charset} instance for the default encoding of the +message body. \versionadded{2.2.2} \end{methoddesc} @@ -484,15 +471,15 @@ will be \var{failobj}. The \method{walk()} method is an all-purpose generator which can be used to iterate over all the parts and subparts of a message object tree, in depth-first traversal order. You will typically use -\method{walk()} as the iterator in a \code{for ... in} loop; each +\method{walk()} as the iterator in a \code{for} loop; each iteration returns the next subpart. 
-Here's an example that prints the MIME type of every part of a message -object tree: +Here's an example that prints the MIME type of every part of a +multipart message structure: \begin{verbatim} >>> for part in msg.walk(): ->>> print part.get_type('text/plain') +...     print part.get_content_type() multipart/report text/plain message/delivery-status diff --git a/Doc/lib/emailmimebase.tex b/Doc/lib/emailmimebase.tex index 97c3eda..6bbd5dd 100644 --- a/Doc/lib/emailmimebase.tex +++ b/Doc/lib/emailmimebase.tex @@ -1,10 +1,10 @@ Ordinarily, you get a message object structure by passing a file or -some text to a parser, which parses the text and returns the root of -the message object structure. However you can also build a complete -object structure from scratch, or even individual \class{Message} -objects by hand. In fact, you can also take an existing structure and -add new \class{Message} objects, move them around, etc. This makes a -very convenient interface for slicing-and-dicing MIME messages. +some text to a parser, which parses the text and returns the root +message object. However, you can also build a complete message +structure from scratch, or even individual \class{Message} objects by +hand. In fact, you can also take an existing structure and add new +\class{Message} objects, move them around, etc. This makes a very +convenient interface for slicing-and-dicing MIME messages. You can create a new object structure by creating \class{Message} instances, adding attachments and all the appropriate headers manually. @@ -99,7 +99,7 @@ callable takes one argument, which is the \class{MIMEAudio} instance. It should use \method{get_payload()} and \method{set_payload()} to change the payload to encoded form. It should also add any \mailheader{Content-Transfer-Encoding} or other headers to the message -object as necessary. The default encoding is \emph{Base64}. See the +object as necessary. The default encoding is base64.
See the \refmodule{email.Encoders} module for a list of the built-in encoders. \var{_params} are passed straight through to the base class constructor. @@ -124,7 +124,7 @@ callable takes one argument, which is the \class{MIMEImage} instance. It should use \method{get_payload()} and \method{set_payload()} to change the payload to encoded form. It should also add any \mailheader{Content-Transfer-Encoding} or other headers to the message -object as necessary. The default encoding is \emph{Base64}. See the +object as necessary. The default encoding is base64. See the \refmodule{email.Encoders} module for a list of the built-in encoders. \var{_params} are passed straight through to the \class{MIMEBase} diff --git a/Doc/lib/emailparser.tex b/Doc/lib/emailparser.tex index b5d9900..62a5a6f 100644 --- a/Doc/lib/emailparser.tex +++ b/Doc/lib/emailparser.tex @@ -54,7 +54,7 @@ should be performed. Normally, when things like MIME terminating boundaries are missing, or when messages contain other formatting problems, the \class{Parser} will raise a \exception{MessageParseError}. However, when lax parsing is enabled, -the \class{Parser} will attempt to workaround such broken formatting +the \class{Parser} will attempt to work around such broken formatting to produce a usable message structure (this doesn't mean \exception{MessageParseError}s are never raised; some ill-formatted messages just can't be parsed). The \var{strict} flag defaults to @@ -73,14 +73,12 @@ support both the \method{readline()} and the \method{read()} methods on file-like objects. The text contained in \var{fp} must be formatted as a block of \rfc{2822} -style headers and header continuation lines, optionally preceeded by a +style headers and header continuation lines, optionally preceded by an envelope header. The header block is terminated either by the end of the data or by a blank line. Following the header block is the body of the message (which may contain MIME-encoded subparts).
-Optional \var{headersonly} is a flag specifying whether to stop -parsing after reading the headers or not. The default is \code{False}, -meaning it parses the entire contents of the file. +Optional \var{headersonly} is as with the \method{parse()} method. \versionchanged[The \var{headersonly} flag was added]{2.2.2} \end{methoddesc} @@ -104,7 +102,7 @@ convenience. They are available in the top-level \module{email} package namespace. \begin{funcdesc}{message_from_string}{s\optional{, _class\optional{, strict}}} -Return a message object tree from a string. This is exactly +Return a message object structure from a string. This is exactly equivalent to \code{Parser().parsestr(s)}. Optional \var{_class} and \var{strict} are interpreted as with the \class{Parser} class constructor. @@ -112,9 +110,10 @@ equivalent to \code{Parser().parsestr(s)}. Optional \var{_class} and \end{funcdesc} \begin{funcdesc}{message_from_file}{fp\optional{, _class\optional{, strict}}} -Return a message object tree from an open file object. This is exactly -equivalent to \code{Parser().parse(fp)}. Optional \var{_class} and -\var{strict} are interpreted as with the \class{Parser} class constructor. +Return a message object structure from an open file object. This +is exactly equivalent to \code{Parser().parse(fp)}. Optional +\var{_class} and \var{strict} are interpreted as with the +\class{Parser} class constructor. \versionchanged[The \var{strict} flag was added]{2.2.2} \end{funcdesc} @@ -138,9 +137,10 @@ Here are some notes on the parsing semantics: \method{get_payload()} method will return a string object. \item All \mimetype{multipart} type messages will be parsed as a container message object with a list of sub-message objects for - their payload. These messages will return \code{True} for - \method{is_multipart()} and their \method{get_payload()} method - will return a list of \class{Message} instances. + their payload.
The outer container message will return + \code{True} for \method{is_multipart()} and its + \method{get_payload()} method will return the list of + \class{Message} subparts. \item Most messages with a content type of \mimetype{message/*} (e.g. \mimetype{message/deliver-status} and \mimetype{message/rfc822}) will also be parsed as container diff --git a/Doc/lib/emailutil.tex b/Doc/lib/emailutil.tex index e2ff752..80f0acf 100644 --- a/Doc/lib/emailutil.tex +++ b/Doc/lib/emailutil.tex @@ -6,7 +6,7 @@ package. \begin{funcdesc}{quote}{str} Return a new string with backslashes in \var{str} replaced by two -backslashes and double quotes replaced by backslash-double quote. +backslashes, and double quotes replaced by backslash-double quote. \end{funcdesc} \begin{funcdesc}{unquote}{str} @@ -85,7 +85,7 @@ common use. \end{funcdesc} \begin{funcdesc}{formatdate}{\optional{timeval\optional{, localtime}}} -Returns a date string as per Internet standard \rfc{2822}, e.g.: +Returns a date string as per \rfc{2822}, e.g.: \begin{verbatim} Fri, 09 Nov 2001 01:08:47 -0000 -- cgit v0.12
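
Reviewer note (not part of the patch): the emailmessage.tex hunks above describe two behaviours that may be easier to check with a runnable snippet, namely that walk() visits the container and then each subpart depth-first, and that deleting a header before assigning it leaves exactly one header of that name. The following is a minimal sketch in present-day Python 3 syntax rather than the Python 2.2-era syntax used in the documented examples. The sample message text and the names RAW and content_types are invented for illustration; message_from_string(), walk(), get_content_type() and get_all() are the calls the patch itself documents.

```python
import email

# Hypothetical two-part message, made up purely for this illustration.
RAW = (
    "MIME-Version: 1.0\n"
    'Content-Type: multipart/mixed; boundary="BOUND"\n'
    "Subject: demo\n"
    "\n"
    "--BOUND\n"
    "Content-Type: text/plain\n"
    "\n"
    "hello\n"
    "--BOUND\n"
    "Content-Type: text/html\n"
    "\n"
    "<p>hello</p>\n"
    "--BOUND--\n"
)

msg = email.message_from_string(RAW)

# walk() yields the outer container first, then every subpart depth-first.
content_types = [part.get_content_type() for part in msg.walk()]
for content_type in content_types:
    print(content_type)

# Assigning a header never overwrites an existing one, so delete it first
# when exactly one header of that name is wanted. Lookups by header name
# are case-insensitive.
del msg["subject"]
msg["Subject"] = "updated"
print(msg.get_all("Subject"))
```

Skipping the del step would leave two Subject headers in the message, which is the duplicate-header case that the mapping-interface paragraph in the hunk warns about.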