\section{Standard Module \sectcode{rfc822}}
\label{module-rfc822}
\stmodindex{rfc822}

\renewcommand{\indexsubitem}{(in module rfc822)}

This module defines a class, \code{Message}, which represents a
collection of ``email headers'' as defined by the Internet standard
RFC 822.  It is used in various contexts, usually to read such headers
from a file.
\index{RFC!822}

Note that there's a separate module to read \UNIX{}, MH, and MMDF
style mailbox files: \code{mailbox}.
\refstmodindex{mailbox}

A \code{Message} instance is instantiated with an open file object as
parameter.  The optional \code{seekable} parameter indicates whether
the file object is seekable; the default value is 1 for true.
Instantiation reads headers from the file up to a blank line and
stores them in the instance; after instantiation, the file is
positioned directly after the blank line that terminates the headers.

Input lines as read from the file may be terminated either by CR-LF or
by a single linefeed; a terminating CR-LF is replaced by a single
linefeed before the line is stored.

All header matching is done independent of upper or lower case;
e.g. \code{m['From']}, \code{m['from']} and \code{m['FROM']} all yield
the same result.

\begin{funcdesc}{parsedate}{date}
Attempts to parse a date according to the rules in RFC 822.  However,
some mailers don't follow that format as specified, so
\code{parsedate()} tries to guess correctly in such cases.
\var{date} is a string containing an RFC 822 date, such as
\code{"Mon, 20 Nov 1995 19:12:08 -0500"}.  If it succeeds in parsing
the date, \code{parsedate()} returns a 9-tuple that can be passed
directly to \code{time.mktime()}; otherwise \code{None} is returned.
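
As a hedged illustration of the call described above, the sketch below
parses the sample date string and hands the result to
\code{time.mktime()}.  The fallback import from \code{email.utils} is an
assumption added here so that the example also runs on Python versions
that no longer ship \code{rfc822}; it is a stand-in, not part of this
module.

```python
import time

try:
    from rfc822 import parsedate        # the module documented here
except ImportError:
    # Assumption: on Pythons without rfc822, email.utils.parsedate
    # serves as a stand-in with the same name and argument.
    from email.utils import parsedate

fields = parsedate("Mon, 20 Nov 1995 19:12:08 -0500")
if fields is not None:
    # The first six items are year, month, day, hour, minute, second.
    year, month, day, hour, minute, second = fields[:6]
    # mktime() interprets the 9-tuple as local time.
    timestamp = time.mktime(fields)

# A string that cannot be parsed as a date yields None.
bad = parsedate("not a date")
```

Checking the result against \code{None} before using it matters, since
unparsable input is reported through the return value rather than
through an exception.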
\end{funcdesc}

\begin{funcdesc}{parsedate_tz}{date}
Performs the same function as \code{parsedate()}, but returns either
\code{None} or a 10-tuple; the first 9 elements make up a tuple that
can be passed directly to \code{time.mktime()}, and the tenth is the
offset of the date's time zone from UTC (which is the official term
for Greenwich Mean Time).
\end{funcdesc}

\subsection{Message Objects}

A \code{Message} instance has the following methods:

\begin{funcdesc}{rewindbody}{}
Seek to the start of the message body.  This only works if the file
object is seekable.
\end{funcdesc}

\begin{funcdesc}{getallmatchingheaders}{name}
Return a list of lines consisting of all headers matching
\var{name}, if any.  Each physical line, whether it is a continuation
line or not, is a separate list item.  Return the empty list if no
header matches \var{name}.
\end{funcdesc}

\begin{funcdesc}{getfirstmatchingheader}{name}
Return a list of lines comprising the first header matching
\var{name}, and its continuation line(s), if any.  Return \code{None}
if there is no header matching \var{name}.
\end{funcdesc}

\begin{funcdesc}{getrawheader}{name}
Return a single string consisting of the text after the colon in the
first header matching \var{name}.  This includes leading whitespace,
the trailing linefeed, and internal linefeeds and whitespace if any
continuation line(s) were present.  Return \code{None} if there is no
header matching \var{name}.
\end{funcdesc}

\begin{funcdesc}{getheader}{name}
Like \code{getrawheader(\var{name})}, but strip leading and trailing
whitespace (but not internal whitespace).
\end{funcdesc}

\begin{funcdesc}{getaddr}{name}
Return a pair (full name, email address) parsed from the string
returned by \code{getheader(\var{name})}.  If no header matching
\var{name} exists, return \code{(None, None)}; otherwise both the full
name and the address are (possibly empty) strings.
Example: If \code{m}'s first \code{From} header contains the string\\
\code{'jack@cwi.nl (Jack Jansen)'}, then
\code{m.getaddr('From')} will yield the pair
\code{('Jack Jansen', 'jack@cwi.nl')}.
If the header contained
\code{'Jack Jansen <jack@cwi.nl>'} instead, it would yield the
exact same result.
\end{funcdesc}

\begin{funcdesc}{getaddrlist}{name}
This is similar to \code{getaddr(\var{name})}, but parses a header
containing a list of email addresses (e.g. a \code{To} header) and
returns a list of (full name, email address) pairs (even if there was
only one address in the header).  If there is no header matching
\var{name}, return an empty list.

XXX The current version of this function is not really correct.  It
yields bogus results if a full name contains a comma.
\end{funcdesc}

\begin{funcdesc}{getdate}{name}
Retrieve a header using \code{getheader()} and parse it into a 9-tuple
compatible with \code{time.mktime()}.  If there is no header matching
\var{name}, or it is unparsable, return \code{None}.

Date parsing appears to be a black art, and not all mailers adhere to
the standard.  While it has been tested and found correct on a large
collection of email from many sources, it is still possible that this
function may occasionally yield an incorrect result.
\end{funcdesc}

\begin{funcdesc}{getdate_tz}{name}
Retrieve a header using \code{getheader()} and parse it into a
10-tuple; the first 9 elements will make a tuple compatible with
\code{time.mktime()}, and the 10th is a number giving the offset of
the date's time zone from UTC.  Similarly to \code{getdate()}, if
there is no header matching \var{name}, or it is unparsable, return
\code{None}.
\end{funcdesc}

\code{Message} instances also support a read-only mapping interface.
In particular: \code{m[name]} is the same as \code{m.getheader(name)};
and \code{len(m)}, \code{m.has_key(name)}, \code{m.keys()},
\code{m.values()} and \code{m.items()} act as expected (and
consistently).
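
The lookup behaviour described above (case-insensitive matching, first
matching header only, continuation lines included, surrounding
whitespace stripped) can be sketched in a few lines.  This is only an
illustration under stated assumptions, not the module's own code: the
helper name \code{get\_header} and the sample header lines are
hypothetical.

```python
def get_header(header_lines, name):
    """Hypothetical helper mimicking the described getheader() behaviour."""
    prefix = name.lower() + ":"
    collected = []
    for line in header_lines:
        if collected:
            # Lines starting with whitespace continue the matched header.
            if line[:1] in (" ", "\t"):
                collected.append(line)
                continue
            break  # the first matching header has ended
        if line.lower().startswith(prefix):
            # Keep only the text after the colon.
            collected.append(line[len(prefix):])
    if not collected:
        return None
    # Strip leading and trailing whitespace, keep internal whitespace.
    return "".join(collected).strip()


lines = [
    "From: jack@cwi.nl (Jack Jansen)\n",
    "Subject: a long\n",
    "  subject line\n",
    "To: guido@cwi.nl\n",
]
sender = get_header(lines, "from")
subject = get_header(lines, "SUBJECT")
missing = get_header(lines, "Cc")
```

Returning \code{None} for a missing header mirrors the convention the
methods above use, so a caller has to test the result before treating
it as a string.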
Finally, \code{Message} instances have two public instance variables:

\begin{datadesc}{headers}
A list containing the entire set of header lines, in the order in
which they were read.  Each line contains a trailing newline.  The
blank line terminating the headers is not contained in the list.
\end{datadesc}

\begin{datadesc}{fp}
The file object passed at instantiation time.
\end{datadesc}
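
To make the contents of \code{headers} and the position of \code{fp}
concrete, the following sketch reads header lines the way the module
is described as doing: up to the blank line, with a terminating CR-LF
replaced by a single linefeed.  It is a minimal sketch, not the
module's implementation; the helper name \code{read\_header\_lines} and
the in-memory sample message are assumptions made for illustration.

```python
import io


def read_header_lines(fp):
    """Hypothetical helper: collect header lines up to the blank line."""
    headers = []
    while True:
        line = fp.readline()
        if not line:
            break  # end of file reached before a blank line
        if line.endswith("\r\n"):
            # Replace a terminating CR-LF by a single linefeed.
            line = line[:-2] + "\n"
        if line == "\n":
            break  # blank line ends the headers and is not stored
        headers.append(line)
    return headers


fp = io.StringIO(
    "From: jack@cwi.nl (Jack Jansen)\r\n"
    "Subject: test\n"
    "\n"
    "body text\n"
)
headers = read_header_lines(fp)
body = fp.read()  # the file is now positioned at the message body
```

Each stored line keeps its trailing newline, and whatever is read from
\code{fp} afterwards belongs to the message body.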