
\section{Standard Module \sectcode{rfc822}}
\label{module-rfc822}
\stmodindex{rfc822}

\setindexsubitem{(in module rfc822)}

This module defines a class, \code{Message}, which represents a
collection of ``email headers'' as defined by the Internet standard
\rfc{822}.  It is used in various contexts, usually to read such
headers from a file.

Note that there's a separate module to read \UNIX{}, MH, and MMDF
style mailbox files: \code{mailbox}.
\refstmodindex{mailbox}

A \code{Message} instance is created from an open file object, passed
as its argument.  The optional \code{seekable} argument indicates
whether the file object is seekable; the default value is 1 (true).
Instantiation reads headers from the file up to a blank line and
stores them in the instance; after instantiation, the file is
positioned directly after the blank line that terminates the headers.

Input lines as read from the file may either be terminated by CR-LF or
by a single linefeed; a terminating CR-LF is replaced by a single
linefeed before the line is stored.
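
The reading rules above can be sketched as follows.  This is only an
illustration of the described behaviour, not the module's code; the
helper name \code{read_header_lines} and the sample message are
hypothetical.

```python
import io


def read_header_lines(fp):
    """Hypothetical helper: collect header lines from fp up to the blank line."""
    lines = []
    while True:
        line = fp.readline()
        if not line:
            break  # end of file reached before any blank line
        if line.endswith("\r\n"):
            line = line[:-2] + "\n"  # a terminating CR-LF becomes one linefeed
        if line == "\n":
            break  # the blank line terminates the headers
        lines.append(line)
    return lines


# Sample input (made up for this sketch) using CR-LF line endings.
fp = io.StringIO("From: jack@cwi.nl\r\nSubject: test\r\n\r\nbody text\n")
headers = read_header_lines(fp)
rest = fp.read()  # the file is now positioned just after the blank line
```

After the call, \code{headers} holds the two header lines, each ending
in a single linefeed, and \code{rest} holds the message body.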

All header matching is done independent of upper or lower case;
e.g. \code{m['From']}, \code{m['from']} and \code{m['FROM']} all yield
the same result.

\begin{funcdesc}{parsedate}{date}
Attempts to parse a date according to the rules in \rfc{822}.  However,
some mailers don't follow that format as specified, so
\code{parsedate()} tries to guess correctly in such cases. 
\var{date} is a string containing an \rfc{822} date, such as 
\code{"Mon, 20 Nov 1995 19:12:08 -0500"}.  If it succeeds in parsing
the date, \code{parsedate()} returns a 9-tuple that can be passed
directly to \code{time.mktime()}; otherwise \code{None} will be
returned.  
\end{funcdesc}
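
A short usage sketch follows.  It assumes that, where the
\code{rfc822} module is not installed, \code{email.utils.parsedate()}
can serve as a stand-in; that fallback is an assumption of this
example, not part of this module.

```python
import time

try:
    from rfc822 import parsedate       # the module documented here
except ImportError:
    from email.utils import parsedate  # assumed stand-in where rfc822 is absent

fields = parsedate("Mon, 20 Nov 1995 19:12:08 -0500")
if fields is not None:
    # The 9-tuple can be handed straight to time.mktime(), which treats
    # the fields as local time and does not see the "-0500" offset.
    local_stamp = time.mktime(fields)

unparsable = parsedate("not a date")  # a string that cannot be parsed
```

Because \code{None} signals failure, callers should test the result
before passing it on to \code{time.mktime()}.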

\begin{funcdesc}{parsedate_tz}{date}
Performs the same function as \code{parsedate()}, but returns either
\code{None} or a 10-tuple; the first 9 elements make up a tuple that
can be passed directly to \code{time.mktime()}, and the tenth is the
offset of the date's timezone from UTC (which is the official term
for Greenwich Mean Time).  (Note that the sign of the timezone offset
is the opposite of the sign of the \code{time.timezone} variable for
the same timezone; the latter variable follows the \POSIX{} standard
while this module follows \rfc{822}.)  If the input string has no
timezone, the last element of the tuple returned is \code{None}.
\end{funcdesc}
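
The sign convention can be illustrated with a small sketch.  As in
other examples, the fallback to \code{email.utils.parsedate_tz()} is an
assumption made so the example runs where \code{rfc822} is absent.

```python
try:
    from rfc822 import parsedate_tz       # the module documented here
except ImportError:
    from email.utils import parsedate_tz  # assumed stand-in where rfc822 is absent

parsed = parsedate_tz("Mon, 20 Nov 1995 19:12:08 -0500")

# Tenth element: offset from UTC in seconds; negative means west of UTC.
offset = parsed[9]

# time.timezone uses the opposite sign (seconds west of UTC), so the
# same zone would appear there as a positive number.
posix_style = -offset
```

Here \code{offset} is five hours expressed in seconds with a negative
sign, while \code{posix_style} is the same magnitude with a positive
sign.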

\begin{funcdesc}{mktime_tz}{tuple}
Turn a 10-tuple as returned by \code{parsedate_tz()} into a UTC timestamp.
If the timezone item in the tuple is \code{None}, assume local time.
Minor deficiency: this function first interprets the first 8 elements
as a local time and then compensates for the timezone difference;
this may yield a slight error around daylight saving time
switch dates.  The error is not enough to worry about for common use.
\end{funcdesc}
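
A minimal sketch of the conversion, checked against
\code{calendar.timegm()}.  The fallback import from \code{email.utils}
is an assumption for environments without \code{rfc822}; the sample
date is far from any daylight saving switch, so the deficiency noted
above should not affect it.

```python
import calendar

try:
    from rfc822 import parsedate_tz, mktime_tz       # the module documented here
except ImportError:
    from email.utils import parsedate_tz, mktime_tz  # assumed stand-in

parsed = parsedate_tz("Mon, 20 Nov 1995 19:12:08 -0500")
stamp = mktime_tz(parsed)

# 19:12:08 at five hours west of UTC is 00:12:08 UTC on the next day.
expected = calendar.timegm((1995, 11, 21, 0, 12, 8, 0, 0, 0))
```

For this input \code{stamp} and \code{expected} agree.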

\subsection{Message Objects}

A \code{Message} instance has the following methods:

\begin{funcdesc}{rewindbody}{}
Seek to the start of the message body.  This only works if the file
object is seekable.
\end{funcdesc}

\begin{funcdesc}{getallmatchingheaders}{name}
Return a list of lines consisting of all headers matching
\var{name}, if any.  Each physical line, whether it is a continuation
line or not, is a separate list item.  Return the empty list if no
header matches \var{name}.
\end{funcdesc}

\begin{funcdesc}{getfirstmatchingheader}{name}
Return a list of lines comprising the first header matching
\var{name}, and its continuation line(s), if any.  Return \code{None}
if there is no header matching \var{name}.
\end{funcdesc}

\begin{funcdesc}{getrawheader}{name}
Return a single string consisting of the text after the colon in the
first header matching \var{name}.  This includes leading whitespace,
the trailing linefeed, and internal linefeeds and whitespace if
any continuation line(s) were present.  Return \code{None} if there is
no header matching \var{name}.
\end{funcdesc}

\begin{funcdesc}{getheader}{name}
Like \code{getrawheader(\var{name})}, but strip leading and trailing
whitespace (but not internal whitespace).
\end{funcdesc}
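
The difference between the raw and the stripped form can be sketched
on a plain list of header lines.  The functions \code{raw_header} and
\code{header} below are hypothetical stand-ins written for this
example; they mimic the described behaviour and are not the methods
themselves.

```python
def raw_header(header_lines, name):
    """Hypothetical stand-in: text after the colon of the first matching header."""
    prefix = name.lower() + ":"
    collected = []
    for line in header_lines:
        if collected:
            if line[:1] in (" ", "\t"):
                collected.append(line)  # continuation line of the match
                continue
            break  # the next header starts here
        if line.lower().startswith(prefix):  # matching ignores case
            collected.append(line[len(prefix):])
    if not collected:
        return None
    return "".join(collected)


def header(header_lines, name):
    """Hypothetical stand-in: like raw_header(), minus outer whitespace."""
    raw = raw_header(header_lines, name)
    return None if raw is None else raw.strip()


lines = ["Subject: a long\n", "\tsubject line\n", "From: jack@cwi.nl\n"]
raw = raw_header(lines, "subject")
clean = header(lines, "SUBJECT")
missing = raw_header(lines, "To")
```

The raw form keeps the leading space, the internal linefeed and tab,
and the trailing linefeed; the stripped form drops only the outer
whitespace.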

\begin{funcdesc}{getaddr}{name}
Return a pair (full name, email address) parsed from the string
returned by \code{getheader(\var{name})}.  If no header matching
\var{name} exists, return \code{(None, None)}; otherwise both the full
name and the address are (possibly empty) strings.

Example: If \code{m}'s first \code{From} header contains the string\\
\code{'jack@cwi.nl (Jack Jansen)'}, then
\code{m.getaddr('From')} will yield the pair
\code{('Jack Jansen', 'jack@cwi.nl')}.
If the header contained
\code{'Jack Jansen <jack@cwi.nl>'} instead, it would yield the
exact same result.
\end{funcdesc}
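
The two address forms can be compared directly on the header strings.
This sketch uses a \code{parseaddr()} function on the header text
rather than a \code{Message} instance; assuming it is importable from
\code{rfc822}, or otherwise from \code{email.utils}, is an assumption
of this example.

```python
try:
    from rfc822 import parseaddr       # may not exist in every rfc822 version
except ImportError:
    from email.utils import parseaddr  # assumed stand-in

# Both forms from the example above.
comment_form = parseaddr("jack@cwi.nl (Jack Jansen)")
angle_form = parseaddr("Jack Jansen <jack@cwi.nl>")
```

Both calls produce the same (full name, email address) pair.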

\begin{funcdesc}{getaddrlist}{name}
This is similar to \code{getaddr(\var{name})}, but parses a header
containing a list of email addresses (e.g. a \code{To} header) and
returns a list of (full name, email address) pairs (even if there was
only one address in the header).  If there is no header matching
\var{name}, return an empty list.

XXX The current version of this function is not really correct.  It
yields bogus results if a full name contains a comma.
\end{funcdesc}
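
The shape of the result can be illustrated with
\code{email.utils.getaddresses()}, used here purely as a stand-in that
works on header strings; it is a different function from the method
described above, and the second address is made up for the example.

```python
from email.utils import getaddresses  # stand-in, not part of rfc822

# A single To header value carrying two addresses in different forms.
to_value = "jack@cwi.nl (Jack Jansen), Ann Example <ann@example.org>"
pairs = getaddresses([to_value])

# An empty list of header values yields an empty list of pairs.
no_pairs = getaddresses([])
```

Each item of \code{pairs} is a (full name, email address) pair, in the
order in which the addresses appear in the header.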

\begin{funcdesc}{getdate}{name}
Retrieve a header using \code{getheader()} and parse it into a 9-tuple
compatible with \code{time.mktime()}.  If there is no header matching
\var{name}, or it is unparsable, return \code{None}.

Date parsing appears to be a black art, and not all mailers adhere to
the standard.  While this function has been tested and found correct
on a large collection of email from many sources, it is still possible
that it may occasionally yield an incorrect result.
\end{funcdesc}

\begin{funcdesc}{getdate_tz}{name}
Retrieve a header using \code{getheader()} and parse it into a 10-tuple;
the first 9 elements will make a tuple compatible with
\code{time.mktime()}, and the 10th is a number giving the offset of
the date's timezone from UTC.  As with \code{getdate()}, if
there is no header matching \var{name}, or it is unparsable, return
\code{None}.
\end{funcdesc}

\code{Message} instances also support a read-only mapping interface.
In particular: \code{m[name]} is the same as \code{m.getheader(name)};
and \code{len(m)}, \code{m.has_key(name)}, \code{m.keys()},
\code{m.values()} and \code{m.items()} act as expected (and
consistently).

Finally, \code{Message} instances have two public instance variables:

\begin{datadesc}{headers}
A list containing the entire set of header lines, in the order in
which they were read.  Each line contains a trailing newline.  The
blank line terminating the headers is not contained in the list.
\end{datadesc}

\begin{datadesc}{fp}
The file object passed at instantiation time.
\end{datadesc}