path: root/Lib/email/Header.py
Commit message (Author, Date, Files, Lines)
* Big email 3.0 API changes, with updated unit tests and documentation.
  Briefly (from the NEWS file):
  - Updates for the email package:
    + All deprecated APIs that in email 2.x issued warnings have been removed: _encoder argument to the MIMEText constructor, Message.add_payload(), Utils.dump_address_pair(), Utils.decode(), Utils.encode()
    + New deprecations: Generator.__call__(), Message.get_type(), Message.get_main_type(), Message.get_subtype(), the 'strict' argument to the Parser constructor. These will be removed in email 3.1.
    + Support for Python earlier than 2.3 has been removed (see PEP 291).
    + All defect classes have been renamed to end in 'Defect'.
    + Some FeedParser fixes; also a MultipartInvariantViolationDefect will be added to messages that claim to be multipart but really aren't.
    + Updates to documentation.
  (Barry Warsaw, 2004-10-03, 1 file, -1/+2)
* _split_ascii(): Small optimization by RH.
  (Barry Warsaw, 2004-05-10, 1 file, -1/+1)
* Update to Python 2.3, getting rid of backward compatibility crud. Get rid of a bunch of module globals that aren't used.
  (Barry Warsaw, 2004-05-09, 1 file, -27/+6)
* __unicode__(): Fix the logic for calculating whether to add a separating space or not between encoded chunks. Closes SF bug #710498.
  (Barry Warsaw, 2003-03-30, 1 file, -3/+3)
* _encode_chunks(): Throw out empty chunks.
  (Barry Warsaw, 2003-03-17, 1 file, -0/+2)
* _split_ascii() [method and function]: Don't join the lines just to split them again. Simply return them as chunk lists.
  _encode_chunks(): Don't add more folding whitespace than necessary.
  (Barry Warsaw, 2003-03-10, 1 file, -10/+11)
* _split_ascii(): lstrip the individual lines in the ascii split lines, since we'll be adding our own continuation whitespace later.
  (Barry Warsaw, 2003-03-07, 1 file, -0/+3)
* More internal refinements of the ascii splitting algorithm.
  _encode_chunks(): Pass maxlinelen in instead of always using self._maxlinelen, so we can adjust for shorter initial lines. Pass this value through to _max_append().
  encode(): Weave maxlinelen through to the _encode_chunks() call.
  _split_ascii(): When recursively splitting a line on spaces (i.e. a lower-level syntactic split), don't append the whole returned string. Instead, split it on linejoiners and extend the lines up to the last line (for proper packing). Calculate the linelen based on the last element in this list.
  (Barry Warsaw, 2003-03-07, 1 file, -7/+10)
* Repaired a misleading comment Barry inherited from me.
  (Tim Peters, 2003-03-06, 1 file, -1/+1)
* _split_ascii(): In the clause where curlen + partlen > maxlen, if the part itself is longer than maxlen, and we aren't already splitting on whitespace, then we recursively split the part on whitespace and append that to this list.
  (Barry Warsaw, 2003-03-06, 1 file, -1/+8)
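The greedy packing with a recursive fallback described in the _split_ascii() entries above (pack parts while they fit; when a single part is longer than maxlen, split it on the next lower-level separator and keep packing after its last piece) can be sketched as below. This is a minimal Python 3 illustration, not the code in Header.py; the function name split_ascii and the separator order ";, " (semicolons, then commas, then spaces) are assumptions chosen for the example.

```python
from __future__ import annotations


def split_ascii(line: str, maxlen: int, splitchars: str = ";, ") -> list[str]:
    """Hypothetical sketch: greedily pack `line` into chunks of at most `maxlen`.

    Separators are tried from highest level to lowest (';', then ',', then ' ').
    The split is lossless: "".join(result) == line. A run with no separators
    left is returned as-is, even if it is longer than `maxlen`.
    """
    if len(line) <= maxlen or not splitchars:
        return [line]
    sep, lower = splitchars[0], splitchars[1:]
    parts = line.split(sep)
    if len(parts) == 1:
        # This separator does not occur; fall back to the next, lower-level one.
        return split_ascii(line, maxlen, lower)
    # Keep each separator attached to the end of the piece it terminated.
    pieces = [p + sep for p in parts[:-1]] + [parts[-1]]
    lines: list[str] = []
    current = ""
    for piece in pieces:
        if len(current) + len(piece) <= maxlen:
            current += piece
            continue
        if current:
            lines.append(current)
        if len(piece) > maxlen:
            # Too long on its own: split it on lower-level separators, emit all
            # but the last sub-line, and keep packing after that last one.
            sub = split_ascii(piece, maxlen, lower)
            lines.extend(sub[:-1])
            current = sub[-1]
        else:
            current = piece
    if current:
        lines.append(current)
    return lines


print(split_ascii("aaa bbb ccc ddd;e", 8))  # ['aaa bbb ', 'ccc ddd;', 'e']
```

Continuing to pack after the last sub-line, rather than treating the recursively split part as a finished block, is what lets the trailing "e" land on a line of its own only when it genuinely does not fit.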
* __unicode__(): When converting to a unicode string, we need to preserve spaces in the encoded/unencoded word boundaries. RFC 2047 is ambiguous here, but most people expect the space to be preserved. Really closes SF bug #640110.
  (Barry Warsaw, 2003-03-06, 1 file, -3/+20)
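The space-preservation rule in the __unicode__() entries can be sketched as a join over decoded (text, charset) chunks that inserts a space only where the sequence switches between an encoded chunk and an unencoded one (charset None or us-ascii). This is a hedged Python 3 illustration with a hypothetical function name; treating two adjacent chunks of the same kind as needing no space is an assumption of the sketch.

```python
from __future__ import annotations


def join_chunks(chunks: list[tuple[str, str | None]]) -> str:
    """Hypothetical sketch: join decoded (text, charset) chunks into one string.

    A space is added only at a switch between an encoded chunk (a non-ASCII
    charset) and an unencoded one (None or 'us-ascii'). RFC 2047 drops
    whitespace between two adjacent encoded words, so same-kind neighbours
    are simply concatenated.
    """
    out: list[str] = []
    last_encoded: bool | None = None
    for text, charset in chunks:
        encoded = charset is not None and charset.lower() != "us-ascii"
        if out and encoded != last_encoded:
            out.append(" ")
        out.append(text)
        last_encoded = encoded
    return "".join(out)


print(join_chunks([("Hello", None), ("wörld", "iso-8859-1")]))  # Hello wörld
```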
* decode_header(): Typo when appending an unencoded chunk to the previous unencoded chunk (e.g. when they appear on separate lines). Closes the 2nd bug in SF #640110 (the first one's already been fixed).
  (Barry Warsaw, 2003-03-06, 1 file, -1/+1)
* Merge of the folding-reimpl-branch. Specific changes:
  _split(): New implementation of ASCII line splitting which should do a better job and not be subject to the various weird artifacts (bugs) reported. This should also do a better job of higher-level syntactic splits by trying first to split on semis, then commas, then whitespace.
  Use a Timbot-ly binary search for optimal non-ASCII split points for better packing of header lines. This also lets us remove one recursion call.
  Don't pass in firstline, but instead pass in the actual line length we're shooting for. Also pass in the list of split characters.
  encode(): Pass in the list of split characters so applications can have some control over what "higher level syntactic breaks" are.
  Also, decode_header(): Transform binascii.Errors, which can occur when decoding a base64 RFC 2047 header with bogus data, into an email.Errors.HeaderParseError. Closes SF bug #696712.
  (Barry Warsaw, 2003-03-06, 1 file, -100/+154)
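The binary search for non-ASCII split points can be sketched as follows. This is an illustration under stated assumptions, not the Header.py implementation: it assumes base64 ("b") encoding only, UTF-8 as the charset, and one RFC 2047 encoded word per chunk; all helper names are hypothetical. Slicing the Python str, rather than the encoded bytes, guarantees that no multibyte character is cut in half.

```python
from __future__ import annotations

import base64


def encoded_word_len(text: str, charset: str = "utf-8") -> int:
    """Length of `text` as one RFC 2047 base64 encoded word: =?charset?b?...?="""
    payload = base64.b64encode(text.encode(charset))
    return len("=?") + len(charset) + len("?b?") + len(payload) + len("?=")


def longest_fitting_prefix(text: str, maxlen: int, charset: str = "utf-8") -> int:
    """Largest n such that text[:n] encodes within maxlen (0 if nothing fits).

    A binary search is valid here because the encoded length never decreases
    as the prefix grows.
    """
    lo, hi = 0, len(text)
    while lo < hi:
        mid = (lo + hi + 1) // 2  # round up so the loop always makes progress
        if encoded_word_len(text[:mid], charset) <= maxlen:
            lo = mid
        else:
            hi = mid - 1
    return lo


def split_encoded(text: str, maxlen: int, charset: str = "utf-8") -> list[str]:
    """Cut `text` into chunks whose encoded words each fit in maxlen."""
    chunks: list[str] = []
    while text:
        # Take at least one character so a too-small maxlen cannot loop forever.
        n = max(longest_fitting_prefix(text, maxlen, charset), 1)
        chunks.append(text[:n])
        text = text[n:]
    return chunks
```

For example, 30 copies of "é" with maxlen=40 split into three chunks of ten characters: 20 UTF-8 bytes become 28 base64 characters, plus 12 characters of =?utf-8?b?...?= framing.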
* Header.__init__(), .append(): Add an optional argument `errors' which is passed straight through to the unicode() and ustr.encode() calls. I think it's the best we can do to address the UnicodeErrors in badly encoded headers such as the one described in SF bug #648119.
  (Barry Warsaw, 2002-12-30, 1 file, -6/+11)
* append(): Fixing the test for convertibility after consultation with Ben. If s is a byte string, make sure it can be converted to unicode with the input codec, and from unicode with the output codec, or raise a UnicodeError exception early. Skip this test (and the unicode->byte string conversion) when the charset is our faux 8bit raw charset.
  (Barry Warsaw, 2002-10-14, 1 file, -14/+28)
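The early convertibility test in the append() entry above can be sketched in Python 3 terms, where byte strings are bytes and Unicode strings are str. The function name check_convertible and the plain codec-name arguments are assumptions for illustration; in the email package the codecs come from a Charset instance's input_codec and output_codec. Both UnicodeDecodeError and UnicodeEncodeError are subclasses of UnicodeError, so a caller can catch either failure with one except clause.

```python
def check_convertible(s, input_codec: str, output_codec: str):
    """Hypothetical sketch of the early UnicodeError check for byte strings.

    '8bit' stands in for the faux raw charset: such data is passed through
    untouched, with no conversion in either direction.
    """
    if input_codec == "8bit":
        return s
    if isinstance(s, bytes):
        # Byte string -> text with the input codec (UnicodeDecodeError on failure)...
        text = s.decode(input_codec)
        # ...then text -> bytes with the output codec (UnicodeEncodeError on failure).
        text.encode(output_codec)
        return text
    return s


print(check_convertible(b"caf\xc3\xa9", "utf-8", "iso-8859-1"))  # café
```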
* __init__(): Fix an invariant: the charset item in a chunk tuple must be a Charset instance, not a string. The bug here was that self._charset wasn't being converted to a Charset instance, so later .append() calls which used the default charset would break.
  _split(): If the charset of the chunk is '8bit', return the chunk unchanged. We can't safely split it, so this is the avenue of least harm.
  (Barry Warsaw, 2002-10-14, 1 file, -2/+11)
* _encode_chunks(), encode(): Don't modify self._chunks. As Ben says:
  "Also, it fixes a really egregious error in Header.encode() (really in Header._encode_chunks()) that could cause a header to grow and grow each time encode() was called if output_codec was different from input_codec."
  Also, fix a typo.
  (Barry Warsaw, 2002-10-13, 1 file, -23/+22)
* Docstring consistency with the updated .tex files.
  (Barry Warsaw, 2002-09-30, 1 file, -3/+4)
* With help from Martin v. Loewis, clarification is added for the semantics of header chunks using byte and Unicode strings. Specifically:
  append(): When the given string is a byte string, charset (whether specified explicitly in the argument list or implicitly via the constructor default) is the encoding of the byte string, and a UnicodeError will be raised if the string cannot be decoded with that charset. If s is a Unicode string, then charset is a hint specifying the character set of the characters in the string. In this case, when producing an RFC 2822 compliant header using RFC 2047 rules, the Unicode string will be encoded using the following charsets in order: us-ascii, the charset hint, utf-8.
  __init__(): Use the global USASCII Charset instance when the charset argument is None. Also, clarify the docstring.
  Also, use True/False where appropriate.
  (Barry Warsaw, 2002-09-30, 1 file, -29/+61)
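The encoding order for Unicode strings stated in the append() entry above (us-ascii, then the charset hint, then utf-8) amounts to trying each candidate in turn and keeping the first that can encode the whole string. A minimal Python 3 sketch with a hypothetical function name; it only chooses the charset and does not perform the RFC 2047 encoding itself.

```python
def pick_output_charset(text: str, hint: str) -> str:
    """Hypothetical sketch: first of us-ascii, `hint`, utf-8 that can encode `text`."""
    for charset in ("us-ascii", hint):
        try:
            text.encode(charset)
        except UnicodeEncodeError:
            continue
        return charset
    # utf-8 is the final fallback; it can encode any str without lone surrogates.
    return "utf-8"


print(pick_output_charset("café", "iso-8859-1"))  # iso-8859-1
```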
* _ascii_split(): Don't lstrip continuation lines. Closes SF bug #601392.
  (Barry Warsaw, 2002-09-10, 1 file, -1/+1)
* append(): Bite the bullet and let charset be the string name of a character set, which we'll convert to a Charset instance. Sigh.
  (Barry Warsaw, 2002-07-23, 1 file, -3/+6)
* make_header(): Watch out for a charset of None, which decode_header() will return as the charset if implicit us-ascii is used.
  (Barry Warsaw, 2002-07-23, 1 file, -3/+2)
* make_header(): New function to take the output of decode_header() and create a Header instance. Closes feature request #539481.
  Header.__init__(): Allow the initial string to be omitted.
  __eq__(), __ne__(): Support rich comparisons for equality of Header instances with Header instances or strings.
  Also, update a bunch of docstrings.
  (Barry Warsaw, 2002-07-09, 1 file, -6/+45)
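decode_header() and make_header() survive in Python 3 as email.header.decode_header() and email.header.make_header(), so the round trip introduced in this entry can be shown with the current standard library. Note that this is the modern email.header module, not the 2002 email.Header module this log describes.

```python
from email.header import decode_header, make_header

raw = "=?utf-8?q?caf=C3=A9?="
pairs = decode_header(raw)   # list of (bytes-or-str, charset-or-None) pairs
header = make_header(pairs)  # builds a Header instance from those pairs
print(str(header))           # café

# Plain ASCII text decodes with charset None, which make_header() accepts.
print(str(make_header(decode_header("hello"))))  # hello
```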
* append(): Clarify the expected type of charset.
  (Barry Warsaw, 2002-07-03, 1 file, -1/+2)
* __unicode__(): Patch #541263 by Mikhail Zabaluev, implementation modified by Barry.
  (Barry Warsaw, 2002-06-29, 1 file, -0/+6)
* Teach this class about "highest-level syntactic breaks" but only for headers with no charset or 'us-ascii' charsets. Actually this is only partially true: we know about semicolons (but not true parameters) and we know about whitespace (but not technically folding whitespace). Still, it should be good enough for all practical purposes.
  Other changes include:
  __init__(): Add a continuation_ws argument, which defaults to a single space. Set this to change the whitespace used for continuation lines when a header must be split. Also, changed the way header line lengths are calculated, so that they take into account continuation_ws (when tabs-expanded) and any provided header_name parameter. This should do much better on returning split headers for which the first and subsequent lines must fit into a specified width.
  guess_maxlinelen(): Removed. I don't think we need this method as part of the public API.
  encode_chunks() -> _encode_chunks(): I don't think we need this one as part of the public API either.
  (Barry Warsaw, 2002-06-28, 1 file, -58/+151)
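How header_name and continuation_ws shrink the usable width can be sketched with a little arithmetic. This is an assumption-labeled illustration, not the Header.py code: the function name, the 2 characters reserved for ": " after the field name, and the use of expandtabs() with its default tab size of 8 are choices made here. The 76-character default is the one named in the 2002-05-19 entry.

```python
DEFAULT_MAXLINELEN = 76  # default width named in the 2002-05-19 entry


def line_budgets(maxlinelen=None, header_name=None, continuation_ws=" "):
    """Hypothetical sketch: (first-line width, continuation-line width) for a header body."""
    if maxlinelen is None:
        maxlinelen = DEFAULT_MAXLINELEN
    if header_name is None:
        first = maxlinelen
    else:
        # Assumed: the field name plus ": " share the first line with the body.
        first = maxlinelen - len(header_name) - 2
    # Continuation lines begin with continuation_ws, measured with tabs expanded.
    rest = maxlinelen - len(continuation_ws.expandtabs())
    return first, rest


print(line_budgets(76, "Subject"))  # (67, 75)
```

With continuation_ws="\t" the continuation width drops to 68, because the tab expands to eight columns.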
* The _compat modules now export _floordiv() instead of _intdiv2() for better code reuse.
  _split(): Use _floordiv().
  (Barry Warsaw, 2002-06-01, 1 file, -5/+4)
* Whitespace normalization.
  (Tim Peters, 2002-05-23, 1 file, -4/+4)
* Fixed a bug in the splitting of lines, and improved the splitting for single-byte character sets. Also fixed a semantic problem with the constructor's default arguments. Specifically:
  __init__(): Change the maxlinelen argument default to None instead of MAXLINELEN. The semantics should have been (and now are) that if maxlinelen is given it is always honored. If it isn't given, but header_name is given, then the maximum line length is calculated. If neither is given, the default of 76 characters is used.
  _split(): If the character set is a single-byte character set, then we can split the line at the maxlinelen because we know that encoding the header won't increase its length. If the charset isn't a single-byte charset, then we use the quicker divide-and-conquer line splitting algorithm as before.
  (Barry Warsaw, 2002-05-19, 1 file, -11/+29)
* Sync'ing with standalone email package 2.0.1. This adds support for non-us-ascii character sets in headers and bodies. Some API changes (with DeprecationWarnings for the old APIs). Better RFC-compliant implementations of base64 and quoted-printable. Updated test cases. Documentation updates to follow (after I finish writing them ;).
  (Barry Warsaw, 2002-04-10, 1 file, -0/+210)