Diffstat (limited to 'Doc/lib/emailheaders.tex')
-rw-r--r-- | Doc/lib/emailheaders.tex | 260 |
1 files changed, 9 insertions, 251 deletions
diff --git a/Doc/lib/emailheaders.tex b/Doc/lib/emailheaders.tex
index 172e5d6..66eb716 100644
--- a/Doc/lib/emailheaders.tex
+++ b/Doc/lib/emailheaders.tex
@@ -3,7 +3,7 @@
 
 \rfc{2822} is the base standard that describes the format of email
 messages. It derives from the older \rfc{822} standard which came
-into widespread at a time when most email was composed of \ASCII{}
+into widespread use at a time when most email was composed of \ASCII{}
 characters only. \rfc{2822} is a specification written assuming email
 contains only 7-bit \ASCII{} characters.
 
@@ -19,10 +19,9 @@ The \module{email} package supports these standards in its
 
 If you want to include non-\ASCII{} characters in your email headers,
 say in the \mailheader{Subject} or \mailheader{To} fields, you should
-use the \class{Header} class (in module \module{email.Header} and
-assign the field in the \class{Message} object to an instance of
-\class{Header} instead of using a string for the header value. For
-example:
+use the \class{Header} class and assign the field in the
+\class{Message} object to an instance of \class{Header} instead of
+using a string for the header value. For example:
 
 \begin{verbatim}
 >>> from email.Message import Message
@@ -50,7 +49,8 @@ Here is the \class{Header} class description:
 
 \begin{classdesc}{Header}{\optional{s\optional{, charset\optional{,
 maxlinelen\optional{, header_name\optional{, continuation_ws}}}}}}
-Create a MIME-compliant header that can contain many character sets.
+Create a MIME-compliant header that can contain strings in different
+character sets.
 
 Optional \var{s} is the initial header value. If \code{None} (the
 default), the initial header value is not set. You can later append
@@ -74,7 +74,7 @@ e.g. \mailheader{Subject}) pass in the name of the field in
 default value for \var{header_name} is \code{None}, meaning it is not
 taken into account for the first line of a long, split header.
 
-Optional \var{continuation_ws} must be RFC 2822 compliant folding
+Optional \var{continuation_ws} must be \rfc{2822}-compliant folding
 whitespace, and is usually either a space or a hard tab character.
 This character will be prepended to continuation lines.
 \end{classdesc}
@@ -89,7 +89,7 @@ will be converted to a \class{Charset} instance. A value of
 constructor is used.
 
 \var{s} may be a byte string or a Unicode string. If it is a byte
-string (i.e. \code{isinstance(s, StringType)} is true), then
+string (i.e. \code{isinstance(s, str)} is true), then
 \var{charset} is the encoding of that byte string, and a
 \exception{UnicodeError} will be raised if the string cannot be
 decoded with that character set.
@@ -113,7 +113,7 @@ standard operators and built-in functions.
 
 \begin{methoddesc}[Header]{__str__}{}
 A synonym for \method{Header.encode()}. Useful for
-\code{str(aHeader)} calls.
+\code{str(aHeader)}.
 \end{methoddesc}
 
 \begin{methoddesc}[Header]{__unicode__}{}
@@ -165,245 +165,3 @@ This function takes one of those sequence of pairs and returns a
 \var{header_name}, and \var{continuation_ws} are as in the
 \class{Header} constructor.
 \end{funcdesc}
-
-\declaremodule{standard}{email.Charset}
-\modulesynopsis{Character Sets}
-
-This module provides a class \class{Charset} for representing
-character sets and character set conversions in email messages, as
-well as a character set registry and several convenience methods for
-manipulating this registry. Instances of \class{Charset} are used in
-several other modules within the \module{email} package.
-
-\versionadded{2.2.2}
-
-\begin{classdesc}{Charset}{\optional{input_charset}}
-Map character sets to their email properties.
-
-This class provides information about the requirements imposed on
-email for a specific character set. It also provides convenience
-routines for converting between character sets, given the availability
-of the applicable codecs. Given a character set, it will do its best
-to provide information on how to use that character set in an email
-message in an RFC-compliant way.
-
-Certain character sets must be encoded with quoted-printable or base64
-when used in email headers or bodies. Certain character sets must be
-converted outright, and are not allowed in email.
-
-Optional \var{input_charset} is as described below. After being alias
-normalized it is also used as a lookup into the registry of character
-sets to find out the header encoding, body encoding, and output
-conversion codec to be used for the character set. For example, if
-\var{input_charset} is \code{iso-8859-1}, then headers and bodies will
-be encoded using quoted-printable and no output conversion codec is
-necessary. If \var{input_charset} is \code{euc-jp}, then headers will
-be encoded with base64, bodies will not be encoded, but output text
-will be converted from the \code{euc-jp} character set to the
-\code{iso-2022-jp} character set.
-\end{classdesc}
-
-\class{Charset} instances have the following data attributes:
-
-\begin{datadesc}{input_charset}
-The initial character set specified. Common aliases are converted to
-their \emph{official} email names (e.g. \code{latin_1} is converted to
-\code{iso-8859-1}). Defaults to 7-bit \code{us-ascii}.
-\end{datadesc}
-
-\begin{datadesc}{header_encoding}
-If the character set must be encoded before it can be used in an
-email header, this attribute will be set to \code{Charset.QP} (for
-quoted-printable), \code{Charset.BASE64} (for base64 encoding), or
-\code{Charset.SHORTEST} for the shortest of QP or BASE64 encoding.
-Otherwise, it will be \code{None}.
-\end{datadesc}
-
-\begin{datadesc}{body_encoding}
-Same as \var{header_encoding}, but describes the encoding for the
-mail message's body, which indeed may be different than the header
-encoding. \code{Charset.SHORTEST} is not allowed for
-\var{body_encoding}.
-\end{datadesc}
-
-\begin{datadesc}{output_charset}
-Some character sets must be converted before the can be used in
-email headers or bodies. If the \var{input_charset} is one of
-them, this attribute will contain the name of the character set
-output will be converted to. Otherwise, it will be \code{None}.
-\end{datadesc}
-
-\begin{datadesc}{input_codec}
-The name of the Python codec used to convert the \var{input_charset} to
-Unicode. If no conversion codec is necessary, this attribute will be
-\code{None}.
-\end{datadesc}
-
-\begin{datadesc}{output_codec}
-The name of the Python codec used to convert Unicode to the
-\var{output_charset}. If no conversion codec is necessary, this
-attribute will have the same value as the \var{input_codec}.
-\end{datadesc}
-
-\class{Charset} instances also have the following methods:
-
-\begin{methoddesc}[Charset]{get_body_encoding}{}
-Return the content transfer encoding used for body encoding.
-
-This is either the string \samp{quoted-printable} or \samp{base64}
-depending on the encoding used, or it is a function, in which case you
-should call the function with a single argument, the Message object
-being encoded. The function should then set the
-\mailheader{Content-Transfer-Encoding} header itself to whatever is
-appropriate.
-
-Returns the string \samp{quoted-printable} if
-\var{body_encoding} is \code{QP}, returns the string
-\samp{base64} if \var{body_encoding} is \code{BASE64}, and returns the
-string \samp{7bit} otherwise.
-\end{methoddesc}
-
-\begin{methoddesc}{convert}{s}
-Convert the string \var{s} from the \var{input_codec} to the
-\var{output_codec}.
-\end{methoddesc}
-
-\begin{methoddesc}{to_splittable}{s}
-Convert a possibly multibyte string to a safely splittable format.
-\var{s} is the string to split.
-
-Uses the \var{input_codec} to try and convert the string to Unicode,
-so it can be safely split on character boundaries (even for multibyte
-characters).
-
-Returns the string as-is if it isn't known how to convert \var{s} to
-Unicode with the \var{input_charset}.
-
-Characters that could not be converted to Unicode will be replaced
-with the Unicode replacement character \character{U+FFFD}.
-\end{methoddesc}
-
-\begin{methoddesc}{from_splittable}{ustr\optional{, to_output}}
-Convert a splittable string back into an encoded string. \var{ustr}
-is a Unicode string to ``unsplit''.
-
-This method uses the proper codec to try and convert the string from
-Unicode back into an encoded format. Return the string as-is if it is
-not Unicode, or if it could not be converted from Unicode.
-
-Characters that could not be converted from Unicode will be replaced
-with an appropriate character (usually \character{?}).
-
-If \var{to_output} is \code{True} (the default), uses
-\var{output_codec} to convert to an
-encoded format. If \var{to_output} is \code{False}, it uses
-\var{input_codec}.
-\end{methoddesc}
-
-\begin{methoddesc}{get_output_charset}{}
-Return the output character set.
-
-This is the \var{output_charset} attribute if that is not \code{None},
-otherwise it is \var{input_charset}.
-\end{methoddesc}
-
-\begin{methoddesc}{encoded_header_len}{}
-Return the length of the encoded header string, properly calculating
-for quoted-printable or base64 encoding.
-\end{methoddesc}
-
-\begin{methoddesc}{header_encode}{s\optional{, convert}}
-Header-encode the string \var{s}.
-
-If \var{convert} is \code{True}, the string will be converted from the
-input charset to the output charset automatically. This is not useful
-for multibyte character sets, which have line length issues (multibyte
-characters must be split on a character, not a byte boundary); use the
-higher-level \class{Header} class to deal with these issues (see
-\refmodule{email.Header}). \var{convert} defaults to \code{False}.
-
-The type of encoding (base64 or quoted-printable) will be based on
-the \var{header_encoding} attribute.
-\end{methoddesc}
-
-\begin{methoddesc}{body_encode}{s\optional{, convert}}
-Body-encode the string \var{s}.
-
-If \var{convert} is \code{True} (the default), the string will be
-converted from the input charset to output charset automatically.
-Unlike \method{header_encode()}, there are no issues with byte
-boundaries and multibyte charsets in email bodies, so this is usually
-pretty safe.
-
-The type of encoding (base64 or quoted-printable) will be based on
-the \var{body_encoding} attribute.
-\end{methoddesc}
-
-The \class{Charset} class also provides a number of methods to support
-standard operations and built-in functions.
-
-\begin{methoddesc}[Charset]{__str__}{}
-Returns \var{input_charset} as a string coerced to lower case.
-\end{methoddesc}
-
-\begin{methoddesc}[Charset]{__eq__}{other}
-This method allows you to compare two \class{Charset} instances for equality.
-\end{methoddesc}
-
-\begin{methoddesc}[Header]{__ne__}{other}
-This method allows you to compare two \class{Charset} instances for inequality.
-\end{methoddesc}
-
-The \module{email.Charset} module also provides the following
-functions for adding new entries to the global character set, alias,
-and codec registries:
-
-\begin{funcdesc}{add_charset}{charset\optional{, header_enc\optional{,
- body_enc\optional{, output_charset}}}}
-Add character properties to the global registry.
-
-\var{charset} is the input character set, and must be the canonical
-name of a character set.
-
-Optional \var{header_enc} and \var{body_enc} is either
-\code{Charset.QP} for quoted-printable, \code{Charset.BASE64} for
-base64 encoding, \code{Charset.SHORTEST} for the shortest of qp or
-base64 encoding, or \code{None} for no encoding. \code{SHORTEST} is
-only valid for \var{header_enc}. It describes how message headers and
-message bodies in the input charset are to be encoded. Default is no
-encoding.
-
-Optional \var{output_charset} is the character set that the output
-should be in. Conversions will proceed from input charset, to
-Unicode, to the output charset when the method
-\method{Charset.convert()} is called. The default is to output in the
-same character set as the input.
-
-Both \var{input_charset} and \var{output_charset} must have Unicode
-codec entries in the module's character set-to-codec mapping; use
-\function{add_codec(charset, codecname)} to add codecs the module does
-not know about. See the \refmodule{codecs} module's documentation for
-more information.
-
-The global character set registry is kept in the module global
-dictionary \code{CHARSETS}.
-\end{funcdesc}
-
-\begin{funcdesc}{add_alias}{alias, canonical}
-Add a character set alias. \var{alias} is the alias name,
-e.g. \code{latin-1}. \var{canonical} is the character set's canonical
-name, e.g. \code{iso-8859-1}.
-
-The global charset alias registry is kept in the module global
-dictionary \code{ALIASES}.
-\end{funcdesc}
-
-\begin{funcdesc}{add_codec}{charset, codecname}
-Add a codec that map characters in the given character set to and from
-Unicode.
-
-\var{charset} is the canonical name of a character set.
-\var{codecname} is the name of a Python codec, as appropriate for the
-second argument to the \function{unicode()} built-in, or to the
-\method{encode()} method of a Unicode string.
-\end{funcdesc}
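Note for readers trying out the API this patch documents: the sketch below is not part of the commit. It assumes a current Python 3 interpreter, where the modules are spelled `email.header`, `email.charset` and `email.message` (lowercase) rather than the `email.Header` / `email.Charset` names used in the patched text, and where some attribute values may differ from the Python 2 era descriptions above. It shows a non-ASCII `Subject` built with `Header`, and the registry lookups that `Charset` performs for an alias and for `euc-jp`.

```python
from email.charset import BASE64, QP, Charset
from email.header import Header, decode_header
from email.message import Message

# Assign a Header instance, not a plain string, to the Subject field.
msg = Message()
subject = Header('p\xf6stal', 'iso-8859-1')
msg['Subject'] = subject

# encode() produces the RFC 2047 encoded-word form of the header value.
encoded = subject.encode()
print(encoded)

# decode_header() splits an encoded value back into (bytes, charset) pairs.
pairs = decode_header(encoded)
print(pairs)

# Charset normalizes aliases and looks up header/body encodings in its registry.
latin1 = Charset('latin_1')
eucjp = Charset('euc-jp')
print(latin1.input_charset, latin1.header_encoding == QP)
print(eucjp.header_encoding == BASE64, eucjp.get_output_charset())
```

The `latin_1` and `euc-jp` cases mirror the examples given in the removed `Charset` text: the alias resolves to `iso-8859-1` with quoted-printable headers, while `euc-jp` headers use base64 and output is converted to `iso-2022-jp`.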