path: root/Lib/email/Parser.py
Commit message | Author | Age | Files | Lines
* Big email 3.0 API changes, with updated unit tests and documentation. | Barry Warsaw | 2004-10-03 | 1 | -11/+23
  Briefly (from the NEWS file):

  - Updates for the email package:

    + All deprecated APIs that in email 2.x issued warnings have been removed: _encoder argument to the MIMEText constructor, Message.add_payload(), Utils.dump_address_pair(), Utils.decode(), Utils.encode()

    + New deprecations: Generator.__call__(), Message.get_type(), Message.get_main_type(), Message.get_subtype(), the 'strict' argument to the Parser constructor. These will be removed in email 3.1.

    + Support for Python earlier than 2.3 has been removed (see PEP 291).

    + All defect classes have been renamed to end in 'Defect'.

    + Some FeedParser fixes; also a MultipartInvariantViolationDefect will be added to messages that claim to be multipart but really aren't.

    + Updates to documentation.
* Update to Python 2.3, getting rid of backward compatibility crud. | Barry Warsaw | 2004-05-09 | 1 | -281/+20
  This Parser is now just a backward compatible front-end to the FeedParser.
* Merge in Anthony's new parser code, from the anthony-parser-branch: | Thomas Wouters | 2004-03-20 | 1 | -128/+173
  > ----------------------------
  > revision 1.20.4.4
  > date: 2003/06/12 09:14:17; author: anthonybaxter; state: Exp; lines: +13 -6
  > preamble is None when missing, not ''.
  > Handle a couple of bogus formatted messages - now parses my main testsuite.
  > Handle message/external-body.
  > ----------------------------
  > revision 1.20.4.3
  > date: 2003/06/12 07:16:40; author: anthonybaxter; state: Exp; lines: +6 -4
  > epilogue-processing is now the same as the old parser - the newline at the
  > end of the line with the --endboundary-- is included as part of the epilogue.
  > Note that any whitespace after the boundary is _not_ part of the epilogue.
  > ----------------------------
  > revision 1.20.4.2
  > date: 2003/06/12 06:39:09; author: anthonybaxter; state: Exp; lines: +6 -4
  > message/delivery-status fixed.
  > HeaderParser fixed.
  > ----------------------------
  > revision 1.20.4.1
  > date: 2003/06/12 06:08:56; author: anthonybaxter; state: Exp; lines: +163 -129
  > A work-in-progress snapshot of the new parser. A couple of known problems:
  >
  > - first (blank) line of MIME epilogues is being consumed
  > - message/delivery-status isn't quite right
  >
  > It still needs a lot of cleanup, but right now it parses a whole lot of
  > badness that the old parser failed on. I also need to think about adding
  > back the old 'strict' flag in some way.
  > =============================================================================
* Merge of the folding-reimpl-branch. Specific changes, | Barry Warsaw | 2003-03-06 | 1 | -2/+2
  Rename a constant.
* parse(), _parseheaders(), _parsebody(): A fix for SF bug #633527, | Barry Warsaw | 2002-11-05 | 1 | -9/+22
  where in lax parsing, the first non-header line after a header block (e.g. the first line not containing a colon, and not a continuation), can be treated as the first body line, even without the RFC mandated blank line separator. rfc822 had this behavior, and I vaguely remember problems with this, but can't remember details. In any event, all the tests still pass, so I guess we'll find out. ;/

  This patch works by returning the non-header, non-continuation line from _parseheaders() and using that as the first body line prepended to fp.read() if given. It's usually None. We use this approach instead of trying to seek/tell the file-like object.
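The lax behavior described above can be observed with a short sketch. It assumes the present-day Python 3 `email` package (`email.message_from_string`), not the `Parser.py` revision in this log, so the internals differ; the sample message is made up for illustration.

```python
from email import message_from_string

# Hypothetical message: the header block is NOT followed by a blank line.
raw = (
    "Subject: hi\n"
    "To: someone@example.com\n"
    "this line has no colon so it starts the body\n"
)

msg = message_from_string(raw)

# The two header lines are parsed as headers...
subject = msg["Subject"]
# ...and the first non-header line becomes the start of the body.
body = msg.get_payload()
print(subject)
print(repr(body))
```

In current versions the parser also records a defect on `msg.defects` for the missing separator instead of raising.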
* _parsebody(): A fix for SF bug #631350, where a subobject in a | Barry Warsaw | 2002-11-05 | 1 | -2/+6
  multipart/digest isn't a message/rfc822. This is legal, but counter to recommended practice in RFC 2046, section 5.1.5.

  The fix is to look at the content type after setting the default content type. If the maintype is then message or multipart, attach the parsed subobject, otherwise use set_payload() to set the data of the other object.
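A minimal sketch of the two cases, assuming the present-day Python 3 `email` package (the historical `_parsebody()` is not used here). The boundary `XYZ` and the message contents are invented for the example.

```python
from email import message_from_string

raw = (
    "MIME-Version: 1.0\n"
    'Content-Type: multipart/digest; boundary="XYZ"\n'
    "\n"
    "--XYZ\n"
    "\n"                          # no part headers: default type applies
    "From: a@example.com\n"
    "Subject: inner\n"
    "\n"
    "inner body\n"
    "--XYZ\n"
    "Content-Type: text/plain\n"  # legal, but not the recommended practice
    "\n"
    "plain text\n"
    "--XYZ--\n"
)

digest = message_from_string(raw)
first, second = digest.get_payload()

# The first subpart defaults to message/rfc822 and holds a parsed sub-message.
print(first.get_content_type(), first.get_payload(0)["Subject"])
# The second subpart keeps its declared type and a plain string payload.
print(second.get_content_type(), repr(second.get_payload()))
```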
* _parsebody(): Use get_content_type() instead of the deprecated | Barry Warsaw | 2002-10-07 | 1 | -5/+6
  get_type().

  Also, one of the regular expressions is constant so might as well make it a module global. And, when splitting up digests, handle lineseps that are longer than 1 character in length (e.g. \r\n).
* Docstring consistency with the updated .tex files. | Barry Warsaw | 2002-09-30 | 1 | -0/+14
* Use True/False everywhere. | Barry Warsaw | 2002-09-28 | 1 | -5/+12
* _parsebody(): Instead of raising a BoundaryError when no start | Barry Warsaw | 2002-09-10 | 1 | -2/+5
  boundary could be found -- in a lax parser -- the entire body is assigned to the message payload.
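The fallback can be sketched as follows, assuming the present-day Python 3 `email` package, which is always lax in this sense. The message text and the boundary name are hypothetical.

```python
from email import message_from_string

# The header declares a boundary, but the body never contains it.
raw = (
    'Content-Type: multipart/mixed; boundary="XYZ"\n'
    "\n"
    "There is no boundary line anywhere in this body.\n"
)

msg = message_from_string(raw)

# No exception is raised; the whole body ends up as a string payload.
payload = msg.get_payload()
print(msg.is_multipart())
print(repr(payload))
# Current versions note the problem on the message instead of raising.
print([type(d).__name__ for d in msg.defects])
```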
* Whitespace normalization. | Tim Peters | 2002-08-23 | 1 | -5/+5
* Parser.__init__(): The consensus on the mimelib-devel list is that | Barry Warsaw | 2002-07-19 | 1 | -2/+2
  non-strict parsing should be the default. Make it so.
* Anthony Baxter's cleanup patch. Python project SF patch # 583190, | Barry Warsaw | 2002-07-18 | 1 | -18/+25
  quoting:

    in non-strict mode, messages don't require a blank line at the end with a missing end-terminator. A single newline is sufficient now.

    Handle trailing whitespace at the end of a boundary. Had to switch from using string.split() to re.split()

    Handle whitespace on the end of a parameter list for Content-type.

    Handle whitespace on the end of a plain content-type header.

  Specifically,

  get_type(): Strip the content type string.

  _get_params_preserve(): Strip the parameter names and values on both sides.

  _parsebody(): Lots of changes as described above, with some stylistic changes by Barry (who hopefully didn't screw things up ;).
* Anthony Baxter's patch for non-strict parsing. This adds a `strict' | Barry Warsaw | 2002-07-09 | 1 | -24/+71
  argument to the constructor -- defaulting to true -- which is different than Anthony's approach of using global state.

  parse(), parsestr(): Grow a `headersonly' argument which stops parsing once the header block has been seen, i.e. it does /not/ parse or even read the body of the message. This is used for parsing message/rfc822 type messages.

  We need test cases for the non-strict parsing. Anthony will supply these.

  _parsebody(): We can get rid of the isdigest end-of-line kludges, although we still need to know if we're parsing a multipart/digest so we can set the default type accordingly.
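To illustrate the `headersonly` argument, here is a small sketch using the present-day Python 3 `email.parser.Parser` (an assumption; the revision above predates it, and the `strict` argument is not passed). The sample multipart message is made up.

```python
from email.parser import Parser

raw = (
    "Subject: outer\n"
    'Content-Type: multipart/mixed; boundary="XYZ"\n'
    "\n"
    "--XYZ\n"
    "Content-Type: text/plain\n"
    "\n"
    "part text\n"
    "--XYZ--\n"
)

# Full parse: the body is split into subparts.
full = Parser().parsestr(raw)
# Headers only: the body is left unparsed as one string.
head = Parser().parsestr(raw, headersonly=True)

print(full.is_multipart(), head.is_multipart())
print(head["Subject"], type(head.get_payload()).__name__)
```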
* _parsebody(): Fix for the new message/rfc822 tree structure (the | Barry Warsaw | 2002-06-02 | 1 | -4/+3
  parent is now a multipart with one element, the sub-message object).
* I've thought about it some more, and I believe it is proper for the | Barry Warsaw | 2002-05-19 | 1 | -10/+20
  email package's Parser to handle the three common line endings. Certain protocols such as IMAP define CRLF line endings and it doesn't make sense for the client app to have to normalize the line endings before handing the message off to the Parser.

  _parsebody(): Be more flexible in the matching of line endings for finding the MIME separators. Accept any of \r, \n and \r\n. Note that we do /not/ change the line endings in the payloads, we just accept any of those three around MIME boundaries.
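A rough check of this behavior, assuming the present-day Python 3 `email` package instead of the historical `_parsebody()`. Only `\n` and `\r\n` are exercised; the helper `build()` and the sample lines are hypothetical.

```python
from email import message_from_string

LINES = [
    'Content-Type: multipart/mixed; boundary="XYZ"',
    "",
    "--XYZ",
    "Content-Type: text/plain",
    "",
    "line one",
    "line two",
    "--XYZ",
    "Content-Type: text/plain",
    "",
    "second part",
    "--XYZ--",
]

def build(eol):
    """Join the sample lines with the given line ending (illustrative helper)."""
    return eol.join(LINES) + eol

results = {}
for eol in ("\n", "\r\n"):
    parts = message_from_string(build(eol)).get_payload()
    # Boundaries are found either way; payload line endings are left alone.
    results[eol] = (len(parts), parts[0].get_payload())
    print(repr(eol), results[eol])
```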
* Sync'ing with standalone email package 2.0.1. This adds support for | Barry Warsaw | 2002-04-10 | 1 | -10/+16
  non-us-ascii character sets in headers and bodies. Some API changes (with DeprecationWarnings for the old APIs). Better RFC-compliant implementations of base64 and quoted-printable.

  Updated test cases. Documentation updates to follow (after I finish writing them ;).
* _parsebody(): When adding subparts to a multipart container, make sure | Barry Warsaw | 2002-01-27 | 1 | -2/+7
  that the first subpart added makes the payload a list object. Otherwise, a multipart/* with only one subpart will not have the proper structure.
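The expected structure for a single-subpart multipart can be checked like this. The sketch assumes the present-day Python 3 `email` package, and the message is invented for the example.

```python
from email import message_from_string

raw = (
    'Content-Type: multipart/mixed; boundary="XYZ"\n'
    "\n"
    "--XYZ\n"
    "Content-Type: text/plain\n"
    "\n"
    "only part\n"
    "--XYZ--\n"
)

msg = message_from_string(raw)
payload = msg.get_payload()

# Even with one subpart, the container's payload is a list of Message objects.
print(type(payload).__name__, len(payload))
print(repr(payload[0].get_payload()))
```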
* HeaderParser: A new subclass of Parser which only parses the message | Barry Warsaw | 2001-10-11 | 1 | -0/+16
  headers. It does not parse the body of the message, instead simply assigning it as a string to the container's payload. This can be much faster when you're only interested in a message's header.
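A short usage sketch of `HeaderParser`, assuming its present-day location in Python 3 (`email.parser.HeaderParser`); at the time of this commit the module was `email.Parser`. The message text is hypothetical.

```python
from email.parser import HeaderParser

raw = (
    "From: a@example.com\n"
    "Subject: report\n"
    "\n"
    "First body line.\n"
    "Second body line.\n"
)

msg = HeaderParser().parsestr(raw)

# Headers are available as usual.
print(msg["From"], msg["Subject"])
# The body is not parsed; it is stored as one string.
print(repr(msg.get_payload()))
```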
* Give me back my page breaks. | Barry Warsaw | 2001-10-04 | 1 | -1/+1
* Whitespace normalization. | Tim Peters | 2001-10-04 | 1 | -1/+1
* _parsebody(): Use get_boundary() and get_type(). | Barry Warsaw | 2001-09-26 | 1 | -10/+16
  Also, add a clause to the big-if to handle message/delivery-status content types. These create a message with subparts that are Message instances, which best represent the header blocks of this content type.
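The resulting structure for message/delivery-status can be sketched as below, assuming the present-day Python 3 `email` package and a made-up, top-level delivery status body (in practice this type usually appears inside a multipart/report).

```python
from email import message_from_string

raw = (
    "Content-Type: message/delivery-status\n"
    "\n"
    "Reporting-MTA: dns; mail.example.com\n"
    "\n"
    "Final-Recipient: rfc822; user@example.com\n"
    "Action: failed\n"
    "Status: 5.1.1\n"
)

msg = message_from_string(raw)
blocks = msg.get_payload()

# Each blank-line-separated header block becomes its own Message object.
for block in blocks:
    print(block.items())
```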
* The email package version 1.0, prototyped as mimelib | Barry Warsaw | 2001-09-23 | 1 | -0/+154
<http://sf.net/projects/mimelib>. There /are/ API differences between mimelib and email, but most of the implementations are shared (except where cool Py2.2 stuff like generators are used).