:mod:`rfc822` --- Parse RFC 2822 mail headers
=============================================

.. module:: rfc822
   :synopsis: Parse RFC 2822 style mail messages.
   :deprecated:


.. deprecated:: 2.3
   The :mod:`email` package should be used in preference to the :mod:`rfc822`
   module.  This module is present only to maintain backward compatibility.

This module defines a class, :class:`Message`, which represents an "email
message" as defined by the Internet standard :rfc:`2822`. [#]_  Such messages
consist of a collection of message headers, and a message body.  This module
also defines a helper class :class:`AddressList` for parsing :rfc:`2822`
addresses.  Please refer to the RFC for information on the specific syntax of
:rfc:`2822` messages.

.. index:: module: mailbox

The :mod:`mailbox` module provides classes to read mailboxes produced by
various end-user mail programs.


.. class:: Message(file[, seekable])

   A :class:`Message` instance is instantiated with an input object as parameter.
   :class:`Message` relies only on the input object having a :meth:`readline`
   method; in particular, ordinary file objects qualify.  Instantiation reads
   headers from the input object up to a delimiter line (normally a blank line)
   and stores them in the instance.  The message body, following the headers, is
   not consumed.

   This class can work with any input object that supports a :meth:`readline`
   method.  If the input object has seek and tell capability, the
   :meth:`rewindbody` method will work; also, illegal lines will be pushed back
   onto the input stream.  If the input object lacks seek but has an :meth:`unread`
   method that can push back a line of input, :class:`Message` will use that to
   push back illegal lines.  Thus this class can be used to parse messages coming
   from a buffered stream.

   The optional *seekable* argument is provided as a workaround for certain stdio
   libraries in which :cfunc:`tell` discards buffered data before discovering that
   the :cfunc:`lseek` system call doesn't work.  For maximum portability, you
   should set the seekable argument to zero to prevent that initial :meth:`tell`
   when passing in an unseekable object such as a file object created from a socket
   object.

   Input lines as read from the file may either be terminated by CR-LF or by a
   single linefeed; a terminating CR-LF is replaced by a single linefeed before the
   line is stored.

   All header matching is done independent of upper or lower case; e.g.
   ``m['From']``, ``m['from']`` and ``m['FROM']`` all yield the same result.


.. class:: AddressList(field)

   You may instantiate the :class:`AddressList` helper class using a single string
   parameter, a comma-separated list of :rfc:`2822` addresses to be parsed.  (The
   parameter ``None`` yields an empty list.)


.. function:: quote(str)

   Return a new string with backslashes in *str* replaced by two backslashes and
   double quotes replaced by backslash-double quote.


.. function:: unquote(str)

   Return a new string which is an *unquoted* version of *str*. If *str* ends and
   begins with double quotes, they are stripped off.  Likewise if *str* ends and
   begins with angle brackets, they are stripped off.
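
   As a rough illustration of :func:`quote` and :func:`unquote`, the sketch
   below escapes one string and strips the delimiters from two others.  The
   fallback import from :mod:`email.utils` is an assumption, added only so the
   snippet also runs where :mod:`rfc822` is unavailable; the variable names are
   illustrative.

   .. code-block:: python

      try:
          from rfc822 import quote, unquote      # the module documented here
      except ImportError:
          # Assumption: email.utils offers equivalent helpers.
          from email.utils import quote, unquote

      # quote() escapes backslashes and double quotes; it adds no outer quotes.
      escaped = quote('say "hi"')

      # unquote() strips one pair of surrounding double quotes ...
      name = unquote('"Jack Jansen"')

      # ... or one pair of surrounding angle brackets.
      bare = unquote('<jack@cwi.nl>')

      print(escaped)
      print(name)
      print(bare)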


.. function:: parseaddr(address)

   Parse *address*, which should be the value of some address-containing field such
   as :mailheader:`To` or :mailheader:`Cc`, into its constituent "realname" and
   "email address" parts. Returns a tuple of that information, unless the parse
   fails, in which case a 2-tuple ``(None, None)`` is returned.
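
   A minimal sketch of typical use, covering the two common address spellings.
   The fallback import from :mod:`email.utils` is an assumption made so the
   snippet runs where :mod:`rfc822` is unavailable.  Note that the fallback may
   report a failed parse differently, so only successful parses are shown.

   .. code-block:: python

      try:
          from rfc822 import parseaddr           # the module documented here
      except ImportError:
          # Assumption: email.utils.parseaddr parses these inputs the same way.
          from email.utils import parseaddr

      results = []
      for value in ('Jack Jansen <jack@cwi.nl>', 'jack@cwi.nl (Jack Jansen)'):
          realname, address = parseaddr(value)
          results.append((realname, address))
          print('%s -> %s' % (realname, address))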


.. function:: dump_address_pair(pair)

   The inverse of :func:`parseaddr`, this takes a 2-tuple of the form ``(realname,
   email_address)`` and returns the string value suitable for a :mailheader:`To` or
   :mailheader:`Cc` header.  If the first element of *pair* is false, then the
   second element is returned unmodified.


.. function:: parsedate(date)

   Attempts to parse a date according to the rules in :rfc:`2822`.  However, some
   mailers don't follow that format as specified, so :func:`parsedate` tries to
   guess correctly in such cases.  *date* is a string containing an :rfc:`2822`
   date, such as ``'Mon, 20 Nov 1995 19:12:08 -0500'``.  If it succeeds in parsing
   the date, :func:`parsedate` returns a 9-tuple that can be passed directly to
   :func:`time.mktime`; otherwise ``None`` will be returned.  Note that indexes 6,
   7, and 8 of the result tuple are not usable.
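
   The following sketch parses the sample date above and shows the ``None``
   result for unparsable input.  The fallback import from :mod:`email.utils` is
   an assumption made so the snippet runs where :mod:`rfc822` is unavailable;
   the variable names are illustrative.

   .. code-block:: python

      import time

      try:
          from rfc822 import parsedate           # the module documented here
      except ImportError:
          # Assumption: email.utils.parsedate behaves the same for these inputs.
          from email.utils import parsedate

      parsed = parsedate('Mon, 20 Nov 1995 19:12:08 -0500')
      if parsed is not None:
          # Only indexes 0-5 (year through second) carry usable information.
          year, month, day, hour, minute, second = parsed[:6]
          # mktime() treats the tuple as local time; the -0500 offset is ignored.
          local_stamp = time.mktime(parsed)

      # Input that cannot be read as a date yields None rather than an exception.
      unparsable = parsedate('not a date')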


.. function:: parsedate_tz(date)

   Performs the same function as :func:`parsedate`, but returns either ``None`` or
   a 10-tuple; the first 9 elements make up a tuple that can be passed directly to
   :func:`time.mktime`, and the tenth is the offset of the date's timezone from UTC
   (which is the official term for Greenwich Mean Time).  (Note that the sign of
   the timezone offset is the opposite of the sign of the ``time.timezone``
   variable for the same timezone; the latter variable follows the POSIX standard
   while this module follows :rfc:`2822`.)  If the input string has no timezone,
   the last element of the tuple returned is ``None``.  Note that indexes 6, 7, and
   8 of the result tuple are not usable.
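
   A short sketch of reading the tenth element, which holds the offset in
   seconds.  The fallback import from :mod:`email.utils` is an assumption made
   so the snippet runs where :mod:`rfc822` is unavailable.

   .. code-block:: python

      try:
          from rfc822 import parsedate_tz        # the module documented here
      except ImportError:
          # Assumption: email.utils.parsedate_tz behaves the same for this input.
          from email.utils import parsedate_tz

      parsed = parsedate_tz('Mon, 20 Nov 1995 19:12:08 -0500')

      # Index 9 is the offset from UTC in seconds; zones west of UTC are negative.
      offset_seconds = parsed[9]
      offset_hours = offset_seconds / 3600.0
      print(offset_hours)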


.. function:: mktime_tz(tuple)

   Turn a 10-tuple as returned by :func:`parsedate_tz` into a UTC timestamp.  If
   the timezone item in the tuple is ``None``, assume local time.  Minor
   deficiency: this first interprets the first 8 elements as a local time and then
   compensates for the timezone difference; this may yield a slight error around
   daylight saving time switch dates, though not enough to matter for common use.
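
   As a sketch, the two functions can be chained and the result checked against
   the same instant expressed in UTC.  The fallback import from
   :mod:`email.utils` is an assumption made so the snippet runs where
   :mod:`rfc822` is unavailable; :func:`calendar.timegm` is used here only as an
   independent cross-check.

   .. code-block:: python

      import calendar

      try:
          from rfc822 import parsedate_tz, mktime_tz   # the module documented here
      except ImportError:
          # Assumption: the email.utils versions behave the same for this input.
          from email.utils import parsedate_tz, mktime_tz

      stamp = mktime_tz(parsedate_tz('Mon, 20 Nov 1995 19:12:08 -0500'))

      # 19:12:08 at UTC-5 is 00:12:08 UTC on the following day.
      expected = calendar.timegm((1995, 11, 21, 0, 12, 8, 0, 0, 0))
      matches = int(stamp) == expected
      print(matches)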


.. seealso::

   Module :mod:`email`
      Comprehensive email handling package; supersedes the :mod:`rfc822` module.

   Module :mod:`mailbox`
      Classes to read various mailbox formats produced by end-user mail programs.

   Module :mod:`mimetools`
      Subclass of :class:`rfc822.Message` that handles MIME encoded messages.


.. _message-objects:

Message Objects
---------------

A :class:`Message` instance has the following methods:


.. method:: Message.rewindbody()

   Seek to the start of the message body.  This only works if the file object is
   seekable.


.. method:: Message.isheader(line)

   Returns a line's canonicalized fieldname (the dictionary key that will be used
   to index it) if the line is a legal :rfc:`2822` header; otherwise returns
   ``None`` (implying that parsing should stop here and the line be pushed back on
   the input stream).  It is sometimes useful to override this method in a
   subclass.


.. method:: Message.islast(line)

   Return true if the given line is a delimiter on which Message should stop.  The
   delimiter line is consumed, and the file object's read location positioned
   immediately after it.  By default this method just checks that the line is
   blank, but you can override it in a subclass.


.. method:: Message.iscomment(line)

   Return ``True`` if the given line should be ignored entirely, just skipped. By
   default this is a stub that always returns ``False``, but you can override it in
   a subclass.


.. method:: Message.getallmatchingheaders(name)

   Return a list of lines consisting of all headers matching *name*, if any.  Each
   physical line, whether it is a continuation line or not, is a separate list
   item.  Return the empty list if no header matches *name*.


.. method:: Message.getfirstmatchingheader(name)

   Return a list of lines comprising the first header matching *name*, and its
   continuation line(s), if any.  Return ``None`` if there is no header matching
   *name*.


.. method:: Message.getrawheader(name)

   Return a single string consisting of the text after the colon in the first
   header matching *name*.  This includes leading whitespace, the trailing
   linefeed, and internal linefeeds and whitespace if any continuation
   line(s) were present.  Return ``None`` if there is no header matching *name*.


.. method:: Message.getheader(name[, default])

   Return a single string consisting of the last header matching *name*,
   but strip leading and trailing whitespace.
   Internal whitespace is not stripped.  The optional *default* argument can be
   used to specify a different default to be returned when there is no header
   matching *name*; it defaults to ``None``.
   This is the preferred way to get parsed headers.


.. method:: Message.get(name[, default])

   An alias for :meth:`getheader`, to make the interface more compatible with
   regular dictionaries.


.. method:: Message.getaddr(name)

   Return a pair ``(full name, email address)`` parsed from the string returned by
   ``getheader(name)``.  If no header matching *name* exists, return ``(None,
   None)``; otherwise both the full name and the address are (possibly empty)
   strings.

   Example: If *m*'s first :mailheader:`From` header contains the string
   ``'jack@cwi.nl (Jack Jansen)'``, then ``m.getaddr('From')`` will yield the pair
   ``('Jack Jansen', 'jack@cwi.nl')``. If the header contained ``'Jack Jansen
   <jack@cwi.nl>'`` instead, it would yield the exact same result.


.. method:: Message.getaddrlist(name)

   This is similar to ``getaddr(name)``, but parses a header containing a list of
   email addresses (e.g. a :mailheader:`To` header) and returns a list of ``(full
   name, email address)`` pairs (even if there was only one address in the header).
   If there is no header matching *name*, return an empty list.

   If multiple headers exist that match the named header (e.g. if there are several
   :mailheader:`Cc` headers), all are parsed for addresses. Any continuation lines
   the named headers contain are also parsed.


.. method:: Message.getdate(name)

   Retrieve a header using :meth:`getheader` and parse it into a 9-tuple compatible
   with :func:`time.mktime`; note that fields 6, 7, and 8 are not usable.  If
   there is no header matching *name*, or it is unparsable, return ``None``.

   Date parsing appears to be a black art, and not all mailers adhere to the
   standard.  While it has been tested and found correct on a large collection of
   email from many sources, it is still possible that this function may
   occasionally yield an incorrect result.


.. method:: Message.getdate_tz(name)

   Retrieve a header using :meth:`getheader` and parse it into a 10-tuple; the
   first 9 elements will make a tuple compatible with :func:`time.mktime`, and the
   10th is a number giving the offset of the date's timezone from UTC.  Note that
   fields 6, 7, and 8 are not usable.  Similarly to :meth:`getdate`, if there is
   no header matching *name*, or it is unparsable, return ``None``.

:class:`Message` instances also support a limited mapping interface. In
particular: ``m[name]`` is like ``m.getheader(name)`` but raises :exc:`KeyError`
if there is no matching header; and ``len(m)``, ``m.get(name[, default])``,
``m.__contains__(name)``, ``m.keys()``, ``m.values()``, ``m.items()``, and
``m.setdefault(name[, default])`` act as expected, with the one difference
that :meth:`setdefault` uses an empty string as the default value.
:class:`Message` instances also support the mapping writable interface ``m[name]
= value`` and ``del m[name]``.  :class:`Message` objects do not support the
:meth:`clear`, :meth:`copy`, :meth:`popitem`, or :meth:`update` methods of the
mapping interface.  (Support for :meth:`get` and :meth:`setdefault` was only
added in Python 2.2.)

Finally, :class:`Message` instances have some public instance variables:


.. attribute:: Message.headers

   A list containing the entire set of header lines, in the order in which they
   were read (except that setitem calls may disturb this order). Each line contains
   a trailing newline.  The blank line terminating the headers is not contained in
   the list.


.. attribute:: Message.fp

   The file or file-like object passed at instantiation time.  This can be used to
   read the message content.


.. attribute:: Message.unixfrom

   The Unix ``From`` line, if the message had one, or an empty string.  This is
   needed to regenerate the message in some contexts, such as an ``mbox``\ -style
   mailbox file.


.. _addresslist-objects:

AddressList Objects
-------------------

An :class:`AddressList` instance has the following methods:


.. method:: AddressList.__len__()

   Return the number of addresses in the address list.


.. method:: AddressList.__str__()

   Return a canonicalized string representation of the address list. Addresses are
   rendered in ``"name" <host@domain>`` form, comma-separated.


.. method:: AddressList.__add__(alist)

   Return a new :class:`AddressList` instance that contains all addresses in both
   :class:`AddressList` operands, with duplicates removed (set union).


.. method:: AddressList.__iadd__(alist)

   In-place version of :meth:`__add__`; turns this :class:`AddressList` instance
   into the union of itself and the right-hand instance, *alist*.


.. method:: AddressList.__sub__(alist)

   Return a new :class:`AddressList` instance that contains every address in the
   left-hand :class:`AddressList` operand that is not present in the right-hand
   address operand (set difference).


.. method:: AddressList.__isub__(alist)

   In-place version of :meth:`__sub__`, removing addresses in this list which are
   also in *alist*.

Finally, :class:`AddressList` instances have one public instance variable:


.. attribute:: AddressList.addresslist

   A list of tuple string pairs, one per address.  In each member, the first is the
   canonicalized name part, the second is the actual route-address (``'@'``\
   -separated username-host.domain pair).

.. rubric:: Footnotes

.. [#] This module originally conformed to :rfc:`822`, hence the name.  Since then,
   :rfc:`2822` has been released as an update to :rfc:`822`.  This module should be
   considered :rfc:`2822`\ -conformant, especially in cases where the syntax or
   semantics have changed since :rfc:`822`.