:mod:`email.headerregistry`: Custom Header Objects
--------------------------------------------------

.. module:: email.headerregistry
   :synopsis: Automatic Parsing of headers based on the field name

.. note::

   The headerregistry module has been included in the standard library on a
   :term:`provisional basis <provisional package>`. Backwards incompatible
   changes (up to and including removal of the module) may occur if deemed
   necessary by the core developers.

.. versionadded:: 3.3
   as a :term:`provisional module <provisional package>`

Headers are represented by customized subclasses of :class:`str`.  The
particular class used to represent a given header is determined by the
:attr:`~email.policy.EmailPolicy.header_factory` of the :mod:`~email.policy` in
effect when the headers are created.  This section documents the particular
``header_factory`` implemented by the email package for handling :RFC:`5322`
compliant email messages, which not only provides customized header objects for
various header types, but also provides an extension mechanism for applications
to add their own custom header types.

When using any of the policy objects derived from
:class:`~email.policy.EmailPolicy`, all headers are produced by
:class:`.HeaderRegistry` and have :class:`.BaseHeader` as their last base
class.  Each header class has an additional base class that is determined by
the type of the header.  For example, many headers have the class
:class:`.UnstructuredHeader` as their other base class.  The specialized second
class for a header is determined by the name of the header, using a lookup
table stored in the :class:`.HeaderRegistry`.  All of this is managed
transparently for the typical application program, but interfaces are provided
for modifying the default behavior for use by more complex applications.

The sections below first document the header base classes and their attributes,
followed by the API for modifying the behavior of :class:`.HeaderRegistry`, and
finally the support classes used to represent the data parsed from structured
headers.
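
As a rough illustration of the class composition described above, the
following sketch asks the default policy's ``header_factory`` for a
:mailheader:`Subject` header and inspects the class that was built for it.
The header text is arbitrary sample data.

```python
from email import policy
from email.headerregistry import BaseHeader, UniqueUnstructuredHeader

# policy.default is an EmailPolicy, so its header_factory is a HeaderRegistry.
subject = policy.default.header_factory('Subject', 'Quarterly report')

# The generated class lists the specialized class first and BaseHeader last.
header_class = type(subject)
print(header_class.__bases__)

# Header objects are str subclasses that also remember the field name.
print(isinstance(subject, str))
print(subject.name, '->', str(subject))
```

Application code rarely calls the factory directly; the same objects are
returned when a header is read from a message that uses such a policy.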


.. class:: BaseHeader(name, value)

   *name* and *value* are passed to ``BaseHeader`` from the
   :attr:`~email.policy.EmailPolicy.header_factory` call.  The string value of
   any header object is the *value* fully decoded to unicode.

   This base class defines the following read-only properties:


   .. attribute:: name

      The name of the header (the portion of the field before the ':').  This
      is exactly the value passed in the
      :attr:`~email.policy.EmailPolicy.header_factory` call for *name*; that
      is, case is preserved.


   .. attribute:: defects

      A tuple of :exc:`~email.errors.HeaderDefect` instances reporting any
      RFC compliance problems found during parsing.  The email package tries to
      be complete about detecting compliance issues.  See the
      :mod:`email.errors` module for a discussion of the types of defects that
      may be reported.


   .. attribute:: max_count

      The maximum number of headers of this type that can have the same
      ``name``.  A value of ``None`` means unlimited.  The ``BaseHeader`` value
      for this attribute is ``None``; it is expected that specialized header
      classes will override this value as needed.

   ``BaseHeader`` also provides the following method, which is called by the
   email library code and should not in general be called by application
   programs:

   .. method:: fold(*, policy)

      Return a string containing :attr:`~email.policy.Policy.linesep`
      characters as required to correctly fold the header according
      to *policy*.  A :attr:`~email.policy.Policy.cte_type` of
      ``8bit`` will be treated as if it were ``7bit``, since strings
      may not contain binary data.


   ``BaseHeader`` by itself cannot be used to create a header object.  It
   defines a protocol that each specialized header cooperates with in order to
   produce the header object.  Specifically, ``BaseHeader`` requires that
   the specialized class provide a :func:`classmethod` named ``parse``.  This
   method is called as follows::

       parse(string, kwds)

   ``kwds`` is a dictionary containing one pre-initialized key, ``defects``.
   ``defects`` is an empty list.  The parse method should append any detected
   defects to this list.  On return, the ``kwds`` dictionary *must* contain
   values for at least the keys ``decoded`` and ``defects``.  ``decoded``
   should be the string value for the header (that is, the header value fully
   decoded to unicode).  The parse method should assume that *string* may
   contain transport encoded parts, but should correctly handle all valid
   unicode characters as well so that it can parse un-encoded header values.

   ``BaseHeader``'s ``__new__`` then creates the header instance, and calls its
   ``init`` method.  The specialized class only needs to provide an ``init``
   method if it wishes to set additional attributes beyond those provided by
   ``BaseHeader`` itself.  Such an ``init`` method should look like this::

       def init(self, *args, **kw):
           self._myattr = kw.pop('myattr')
           super().init(*args, **kw)

   That is, anything extra that the specialized class puts into the ``kwds``
   dictionary should be removed and handled, and the remaining contents of
   ``kw`` (and ``args``) passed to the ``BaseHeader`` ``init`` method.
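
The ``parse``/``init`` protocol can be sketched with a small custom header.
``PriorityHeader``, its ``level`` attribute and the ``X-Priority-Label`` field
name are hypothetical names invented for this example.  The sketch delegates
the actual parsing to ``UnstructuredHeader.parse``, on the assumption (true of
current CPython) that the base ``init`` also expects the ``parse_tree`` entry
that the inherited parser supplies.

```python
from email.headerregistry import HeaderRegistry, UnstructuredHeader


class PriorityHeader(UnstructuredHeader):   # hypothetical header type
    max_count = 1

    @classmethod
    def parse(cls, value, kwds):
        # Let the unstructured parser fill in 'decoded' (and 'parse_tree').
        super().parse(value, kwds)
        # Add one extra key for our own init() to consume.
        kwds['level'] = kwds['decoded'].strip().lower()

    def init(self, *args, **kw):
        # Remove the extra key before handing the rest to BaseHeader.init.
        self._level = kw.pop('level')
        super().init(*args, **kw)

    @property
    def level(self):
        return self._level


registry = HeaderRegistry()
registry.map_to_type('X-Priority-Label', PriorityHeader)

header = registry('X-Priority-Label', 'Urgent')
print(header.name, str(header), header.level, header.defects)
```

Keeping the extra value in a private attribute behind a read-only property
mirrors how the documented attributes behave.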


.. class:: UnstructuredHeader

   An "unstructured" header is the default type of header in :rfc:`5322`.
   Any header that does not have a specified syntax is treated as
   unstructured.  The classic example of an unstructured header is the
   :mailheader:`Subject` header.

   In :rfc:`5322`, an unstructured header is a run of arbitrary text in the
   ASCII character set.  :rfc:`2047`, however, has an :rfc:`5322` compatible
   mechanism for encoding non-ASCII text as ASCII characters within a header
   value.  When a *value* containing encoded words is passed to the
   constructor, the ``UnstructuredHeader`` parser converts such encoded words
   back into the original unicode, following the :rfc:`2047` rules for
   unstructured text.  The parser uses heuristics to attempt to decode certain
   non-compliant encoded words.  Defects are registered in such cases, as well
   as defects for issues such as invalid characters within the encoded words or
   the non-encoded text.

   This header type provides no additional attributes.
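
A minimal sketch of the decoding described above, using a made-up subject
line that contains a single :rfc:`2047` encoded word:

```python
from email import policy

# 'Café menu' written as a UTF-8, quoted-printable encoded word.
encoded = '=?utf-8?q?Caf=C3=A9_menu?='
subject = policy.default.header_factory('Subject', encoded)

# The string value of the header is the fully decoded text.
print(str(subject))

# Compliance problems, if any were found, are reported here as a tuple.
print(subject.defects)
```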


.. class:: DateHeader

   :rfc:`5322` specifies a very specific format for dates within email headers.
   The ``DateHeader`` parser recognizes that date format, as well as
   recognizing a number of variant forms that are sometimes found "in the
   wild".

   This header type provides the following additional attributes:

   .. attribute:: datetime

      If the header value can be recognized as a valid date of one form or
      another, this attribute will contain a :class:`~datetime.datetime`
      instance representing that date.  If the timezone of the input date is
      specified as ``-0000`` (indicating it is in UTC but contains no
      information about the source timezone), then :attr:`.datetime` will be a
      naive :class:`~datetime.datetime`.  If a specific timezone offset is
      found (including ``+0000``), then :attr:`.datetime` will contain an aware
      ``datetime`` that uses :class:`datetime.timezone` to record the timezone
      offset.

   The ``decoded`` value of the header is determined by formatting the
   ``datetime`` according to the :rfc:`5322` rules; that is, it is set to::

       email.utils.format_datetime(self.datetime)

   When creating a ``DateHeader``, *value* may be a
   :class:`~datetime.datetime` instance.  This means, for example, that
   the following code is valid and does what one would expect::

       msg['Date'] = datetime(2011, 7, 15, 21)

   Because this is a naive ``datetime`` it will be interpreted as a UTC
   timestamp, and the resulting value will have a timezone of ``-0000``.  Much
   more useful is to use the :func:`~email.utils.localtime` function from the
   :mod:`~email.utils` module::

       msg['Date'] = utils.localtime()

   This example sets the date header to the current time and date using
   the current timezone offset.
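
The naive/aware distinction can be checked with a short sketch.  The date
strings are sample values, and the headers are created directly through the
default policy's ``header_factory`` rather than through a message object.

```python
from datetime import datetime, timedelta
from email import policy

make = policy.default.header_factory

# An explicit offset yields an aware datetime.
aware = make('Date', 'Fri, 15 Jul 2011 21:00:00 +0200')
two_hours = timedelta(hours=2)
print(aware.datetime.utcoffset() == two_hours)

# -0000 means "UTC, source timezone unknown" and yields a naive datetime.
naive = make('Date', 'Fri, 15 Jul 2011 21:00:00 -0000')
print(naive.datetime.tzinfo)

# A naive datetime passed as the value is formatted with a -0000 zone.
from_dt = make('Date', datetime(2011, 7, 15, 21))
print(str(from_dt))
```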


.. class:: AddressHeader

   Address headers are one of the most complex structured header types.
   The ``AddressHeader`` class provides a generic interface to any address
   header.

   This header type provides the following additional attributes:


   .. attribute:: groups

      A tuple of :class:`.Group` objects encoding the
      addresses and groups found in the header value.  Addresses that are
      not part of a group are represented in this tuple as single-address
      ``Group`` objects whose :attr:`~.Group.display_name` is ``None``.


   .. attribute:: addresses

      A tuple of :class:`.Address` objects encoding all
      of the individual addresses from the header value.  If the header value
      contains any groups, the individual addresses from the group are included
      in the list at the point where the group occurs in the value (that is,
      the list of addresses is "flattened" into a one dimensional list).

   The ``decoded`` value of the header will have all encoded words decoded to
   unicode.  :mod:`~encodings.idna` encoded domain names are also decoded to
   unicode.  The ``decoded`` value is set by :meth:`~str.join`\ ing the
   :class:`str` value of the elements of the ``groups`` attribute with
   ``', '``.

   A list of :class:`.Address` and :class:`.Group` objects in any combination
   may be used to set the value of an address header.  ``Group`` objects whose
   ``display_name`` is ``None`` will be interpreted as single addresses, which
   allows an address list to be copied with groups intact by using the list
   obtained from the ``groups`` attribute of the source header.
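
The relationship between ``groups`` and ``addresses`` is easiest to see on a
value that mixes a group with a lone address.  The sketch below uses invented
``example.com`` addresses and builds the headers directly through the default
policy's ``header_factory``.

```python
from email import policy

make = policy.default.header_factory

value = 'Team: ann@example.com, Bob <bob@example.com>;, carol@example.com'
to = make('To', value)

# One Group per top-level item; the lone address has display_name None.
for group in to.groups:
    print(group.display_name, [a.addr_spec for a in group.addresses])

# The flattened view contains every individual address, in order.
flat = [a.addr_spec for a in to.addresses]
print(flat)

# Reusing the groups as a value copies the address list with groups intact.
cc = make('Cc', to.groups)
print(str(cc))
```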


.. class:: SingleAddressHeader

   A subclass of :class:`.AddressHeader` that adds one
   additional attribute:


   .. attribute:: address

      The single address encoded by the header value.  If the header value
      actually contains more than one address (which would be a violation of
      the RFC under the default :mod:`~email.policy`), accessing this
      attribute will result in a :exc:`ValueError`.
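
A brief sketch of the ``address`` attribute, relying on the default mapping of
:mailheader:`Sender` to a single-address header type; the addresses are sample
data.

```python
from email import policy

make = policy.default.header_factory

sender = make('Sender', 'Ann Example <ann@example.com>')
print(sender.address.addr_spec)

# Two addresses still parse, but the single-address accessor refuses them.
bad = make('Sender', 'ann@example.com, bob@example.com')
error = None
try:
    bad.address
except ValueError as exc:
    error = str(exc)
print(len(bad.addresses), error)
```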


Each of the above classes also has a ``Unique`` variant (for example,
``UniqueUnstructuredHeader``).  The only difference is that in the ``Unique``
variant, :attr:`~.BaseHeader.max_count` is set to 1.
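
The effect of ``max_count`` shows up when a message is assembled.  The sketch
below assumes a :class:`~email.message.Message` created with
``policy.default``; under that policy a second :mailheader:`Subject` is
rejected.

```python
from email import policy
from email.headerregistry import AddressHeader, UniqueAddressHeader
from email.message import Message

# The plain class allows any number of headers; the Unique variant allows one.
print(AddressHeader.max_count, UniqueAddressHeader.max_count)

msg = Message(policy=policy.default)
msg['Subject'] = 'first'

rejected = False
try:
    msg['Subject'] = 'second'     # Subject maps to a Unique header type
except ValueError:
    rejected = True

print(rejected, msg.get_all('Subject'))
```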


.. class:: HeaderRegistry(base_class=BaseHeader, \
                          default_class=UnstructuredHeader, \
                          use_default_map=True)

    This is the factory used by :class:`~email.policy.EmailPolicy` by default.
    ``HeaderRegistry`` builds the class used to create a header instance
    dynamically, using *base_class* and a specialized class retrieved from a
    registry that it holds.  When a given header name does not appear in the
    registry, the class specified by *default_class* is used as the specialized
    class.  When *use_default_map* is ``True`` (the default), the standard
    mapping of header names to classes is copied into the registry during
    initialization.  *base_class* is always the last class in the generated
    class's ``__bases__`` list.

    The default mappings are:

      :subject:         UniqueUnstructuredHeader
      :date:            UniqueDateHeader
      :resent-date:     DateHeader
      :orig-date:       UniqueDateHeader
      :sender:          UniqueSingleAddressHeader
      :resent-sender:   SingleAddressHeader
      :to:              UniqueAddressHeader
      :resent-to:       AddressHeader
      :cc:              UniqueAddressHeader
      :resent-cc:       AddressHeader
      :from:            UniqueAddressHeader
      :resent-from:     AddressHeader
      :reply-to:        UniqueAddressHeader

    ``HeaderRegistry`` has the following methods:


    .. method:: map_to_type(name, cls)

       *name* is the name of the header to be mapped.  It will be converted to
       lower case in the registry.  *cls* is the specialized class to be used,
       along with *base_class*, to create the class used to instantiate headers
       that match *name*.


    .. method:: __getitem__(name)

       Construct and return a class to handle creating a *name* header.


    .. method:: __call__(name, value)

       Retrieves the specialized header associated with *name* from the
       registry (using *default_class* if *name* does not appear in the
       registry) and composes it with *base_class* to produce a class.  It
       then calls the constructed class's constructor with the same argument
       list and returns the class instance created thereby.
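
The three methods can be exercised on a standalone registry.  In this sketch
the registry starts empty (``use_default_map=False``), and ``X-Expires`` is a
hypothetical field name chosen for illustration.

```python
from email.headerregistry import (
    BaseHeader, DateHeader, HeaderRegistry, UnstructuredHeader)

registry = HeaderRegistry(use_default_map=False)

# With an empty registry, every name falls back to default_class.
fallback_bases = registry['Date'].__bases__
print(fallback_bases)

# Register a specialized class; names are stored in lower case.
registry.map_to_type('X-Expires', DateHeader)
expires_bases = registry['x-expires'].__bases__
print(expires_bases)

# Calling the registry builds the class and instantiates it in one step.
header = registry('X-Expires', 'Fri, 15 Jul 2011 21:00:00 +0000')
print(header.datetime)
```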


The following classes are used to represent data parsed from structured
headers.  In general, an application program can also use them to construct
structured values to assign to specific headers.


.. class:: Address(display_name='', username='', domain='', addr_spec=None)

   The class used to represent an email address.  The general form of an
   address is::

      [display_name] <username@domain>

   or::

      username@domain

   where each part must conform to specific syntax rules spelled out in
   :rfc:`5322`.

   As a convenience, *addr_spec* can be specified instead of *username* and
   *domain*, in which case *username* and *domain* will be parsed from the
   *addr_spec*.  An *addr_spec* must be a properly RFC quoted string; if it is
   not, ``Address`` will raise an error.  Unicode characters are allowed and
   will be properly encoded when serialized.  However, per the RFCs, unicode is
   *not* allowed in the username portion of the address.

   .. attribute:: display_name

      The display name portion of the address, if any, with all quoting
      removed.  If the address does not have a display name, this attribute
      will be an empty string.

   .. attribute:: username

      The ``username`` portion of the address, with all quoting removed.

   .. attribute:: domain

      The ``domain`` portion of the address.

   .. attribute:: addr_spec

      The ``username@domain`` portion of the address, correctly quoted
      for use as a bare address (the second form shown above).  This
      attribute is not mutable.

   .. method:: __str__()

      The ``str`` value of the object is the address quoted according to
      :rfc:`5322` rules, but with no Content Transfer Encoding of any non-ASCII
      characters.

   To support SMTP (:rfc:`5321`), ``Address`` handles one special case: if
   ``username`` and ``domain`` are both the empty string (or ``None``), then
   the string value of the ``Address`` is ``<>``.
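
The two ways of constructing an ``Address``, and the quoting applied by
``str()``, can be sketched as follows.  All names and addresses are sample
data.

```python
from email.headerregistry import Address

# Built from separate parts.
full = Address('Ann Example', 'ann', 'example.com')
print(str(full))

# Built from an addr_spec; username and domain are parsed out of it.
bare = Address(addr_spec='bob@example.com')
print(bare.username, bare.domain, str(bare))

# A display name containing a special character is quoted when serialized.
quoted = Address('Example, Ann', 'ann', 'example.com')
print(str(quoted))

# The SMTP null address: empty username and domain.
null = Address()
print(str(null))
```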


.. class:: Group(display_name=None, addresses=None)

   The class used to represent an address group.  The general form of an
   address group is::

     display_name: [address-list];

   As a convenience for processing lists of addresses that consist of a mixture
   of groups and single addresses, a ``Group`` may also be used to represent
   single addresses that are not part of a group by setting *display_name* to
   ``None`` and providing a list of the single address as *addresses*.

   .. attribute:: display_name

      The ``display_name`` of the group.  If it is ``None`` and there is
      exactly one ``Address`` in ``addresses``, then the ``Group`` represents a
      single address that is not in a group.

   .. attribute:: addresses

      A possibly empty tuple of :class:`.Address` objects representing the
      addresses in the group.

   .. method:: __str__()

      The ``str`` value of a ``Group`` is formatted according to :rfc:`5322`,
      but with no Content Transfer Encoding of any non-ASCII characters.  If
      ``display_name`` is ``None`` and there is a single ``Address`` in the
      ``addresses`` list, the ``str`` value will be the same as the ``str`` of
      that single ``Address``.
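
A short sketch of the string forms a ``Group`` can take, again with invented
addresses:

```python
from email.headerregistry import Address, Group

ann = Address('Ann', 'ann', 'example.com')
bob = Address('', 'bob', 'example.com')

# A named group serializes as "display_name: address-list;".
team = Group('Team', [ann, bob])
print(str(team))

# A nameless group holding one address stands in for that address.
solo = Group(None, [ann])
print(str(solo))

# A group may be empty; addresses is then an empty tuple.
empty = Group('Empty')
print(str(empty), empty.addresses)
```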