Diffstat (limited to 'Doc/library/multifile.rst')
-rw-r--r-- | Doc/library/multifile.rst | 190 |
1 files changed, 190 insertions, 0 deletions
diff --git a/Doc/library/multifile.rst b/Doc/library/multifile.rst
new file mode 100644
index 0000000..c36ccb7
--- /dev/null
+++ b/Doc/library/multifile.rst
@@ -0,0 +1,190 @@
+
+:mod:`multifile` --- Support for files containing distinct parts
+================================================================
+
+.. module:: multifile
+   :synopsis: Support for reading files which contain distinct parts, such as some MIME data.
+.. sectionauthor:: Eric S. Raymond <esr@snark.thyrsus.com>
+
+
+.. deprecated:: 2.5
+   The :mod:`email` package should be used in preference to the :mod:`multifile`
+   module. This module is present only to maintain backward compatibility.
+
+The :class:`MultiFile` object enables you to treat sections of a text file as
+file-like input objects, with ``''`` being returned by :meth:`readline` when a
+given delimiter pattern is encountered. The defaults of this class are designed
+to make it useful for parsing MIME multipart messages, but by subclassing it and
+overriding methods it can be easily adapted for more general use.
+
+
+.. class:: MultiFile(fp[, seekable])
+
+   Create a multi-file. You must instantiate this class with an input object
+   argument for the :class:`MultiFile` instance to get lines from, such as a file
+   object returned by :func:`open`.
+
+   :class:`MultiFile` only ever looks at the input object's :meth:`readline`,
+   :meth:`seek` and :meth:`tell` methods, and the latter two are only needed if you
+   want random access to the individual MIME parts. To use :class:`MultiFile` on a
+   non-seekable stream object, set the optional *seekable* argument to false; this
+   will prevent using the input object's :meth:`seek` and :meth:`tell` methods.
+
+It will be useful to know that in :class:`MultiFile`'s view of the world, text
+is composed of three kinds of lines: data, section-dividers, and end-markers.
+MultiFile is designed to support parsing of messages that may have multiple
+nested message parts, each with its own pattern for section-divider and
+end-marker lines.
+
+
+.. seealso::
+
+   Module :mod:`email`
+      Comprehensive email handling package; supersedes the :mod:`multifile` module.
+
+
+.. _multifile-objects:
+
+MultiFile Objects
+-----------------
+
+A :class:`MultiFile` instance has the following methods:
+
+
+.. method:: MultiFile.readline()
+
+   Read a line. If the line is data (not a section-divider or end-marker or real
+   EOF) return it. If the line matches the most-recently-stacked boundary, return
+   ``''`` and set ``self.last`` to 1 or 0 according as the match is or is not an
+   end-marker. If the line matches any other stacked boundary, raise an error. On
+   encountering end-of-file on the underlying stream object, the method raises
+   :exc:`Error` unless all boundaries have been popped.
+
+
+.. method:: MultiFile.readlines()
+
+   Return all lines remaining in this part as a list of strings.
+
+
+.. method:: MultiFile.read()
+
+   Read all lines, up to the next section. Return them as a single (multiline)
+   string. Note that this doesn't take a size argument!
+
+
+.. method:: MultiFile.seek(pos[, whence])
+
+   Seek. Seek indices are relative to the start of the current section. The *pos*
+   and *whence* arguments are interpreted as for a file seek.
+
+
+.. method:: MultiFile.tell()
+
+   Return the file position relative to the start of the current section.
+
+
+.. method:: MultiFile.next()
+
+   Skip lines to the next section (that is, read lines until a section-divider or
+   end-marker has been consumed). Return true if there is such a section, false if
+   an end-marker is seen. Re-enable the most-recently-pushed boundary.
+
+
+.. method:: MultiFile.is_data(str)
+
+   Return true if *str* is data and false if it might be a section boundary. As
+   written, it tests for a prefix other than ``'-``\ ``-'`` at start of line (which
+   all MIME boundaries have) but it is declared so it can be overridden in derived
+   classes.
+
+   Note that this test is intended as a fast guard for the real boundary
+   tests; if it always returns false it will merely slow processing, not cause it
+   to fail.
+
+
+.. method:: MultiFile.push(str)
+
+   Push a boundary string. When a decorated version of this boundary is found as
+   an input line, it will be interpreted as a section-divider or end-marker
+   (depending on the decoration, see :rfc:`2045`). All subsequent reads will
+   return the empty string to indicate end-of-file, until a call to :meth:`pop`
+   removes the boundary or a :meth:`next` call re-enables it.
+
+   It is possible to push more than one boundary. Encountering the
+   most-recently-pushed boundary will return EOF; encountering any other
+   boundary will raise an error.
+
+
+.. method:: MultiFile.pop()
+
+   Pop a section boundary. This boundary will no longer be interpreted as EOF.
+
+
+.. method:: MultiFile.section_divider(str)
+
+   Turn a boundary into a section-divider line. By default, this method
+   prepends ``'--'`` (which MIME section boundaries have) but it is declared so
+   it can be overridden in derived classes. This method need not append LF or
+   CR-LF, as comparison with the result ignores trailing whitespace.
+
+
+.. method:: MultiFile.end_marker(str)
+
+   Turn a boundary string into an end-marker line. By default, this method
+   prepends ``'--'`` and appends ``'--'`` (like a MIME-multipart end-of-message
+   marker) but it is declared so it can be overridden in derived classes. This
+   method need not append LF or CR-LF, as comparison with the result ignores
+   trailing whitespace.
+
+Finally, :class:`MultiFile` instances have two public instance variables:
+
+
+.. attribute:: MultiFile.level
+
+   Nesting depth of the current part.
+
+
+.. attribute:: MultiFile.last
+
+   True if the last end-of-file was for an end-of-message marker.
+
+
+.. _multifile-example:
+
+:class:`MultiFile` Example
+--------------------------
+
+.. sectionauthor:: Skip Montanaro <skip@mojam.com>
+
+
+::
+
+   import mimetools
+   import multifile
+   import StringIO
+
+   def extract_mime_part_matching(stream, mimetype):
+       """Return the first element in a multipart MIME message on stream
+       matching mimetype."""
+
+       msg = mimetools.Message(stream)
+       msgtype = msg.gettype()
+       params = msg.getplist()
+
+       data = StringIO.StringIO()
+       if msgtype[:10] == "multipart/":
+
+           file = multifile.MultiFile(stream)
+           file.push(msg.getparam("boundary"))
+           while file.next():
+               submsg = mimetools.Message(file)
+               try:
+                   data = StringIO.StringIO()
+                   mimetools.decode(file, data, submsg.getencoding())
+               except ValueError:
+                   continue
+               if submsg.gettype() == mimetype:
+                   break
+           file.pop()
+       return data.getvalue()
+
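Editor's note on the three line kinds described in the patch (data, section-dividers, end-markers): the following is a minimal Python 3 sketch of that classification, not the :mod:`multifile` implementation. It does not import :mod:`multifile` (which is unavailable on Python 3) and handles only a single boundary rather than a stack. The helper ``split_parts`` is a hypothetical name introduced here for illustration; ``is_data``, ``section_divider`` and ``end_marker`` are free functions that only mirror the default behaviour the documentation describes for the methods of the same names.

```python
import io


def section_divider(boundary):
    # Documented default: prepend '--' to the boundary string.
    return "--" + boundary


def end_marker(boundary):
    # Documented default: prepend and append '--'.
    return "--" + boundary + "--"


def is_data(line):
    # Fast guard: a line that does not start with '--' cannot be a boundary.
    # A False result only means "might be a boundary"; the real test follows.
    return not line.startswith("--")


def split_parts(fp, boundary):
    """Yield the text of each section in fp (hypothetical helper, one boundary only)."""
    divider = section_divider(boundary)
    marker = end_marker(boundary)
    current = None  # stays None until the first section-divider (skips the preamble)
    for line in fp:
        if not is_data(line):
            candidate = line.rstrip()  # comparison ignores trailing whitespace
            if candidate == divider:
                if current is not None:
                    yield "".join(current)
                current = []
                continue
            if candidate == marker:
                if current is not None:
                    yield "".join(current)
                return
        if current is not None:
            current.append(line)
    # Assumption: treat running out of input before the end-marker as an error,
    # loosely analogous to the Error the documentation describes for readline().
    raise ValueError("end of input before end-marker")
```

Calling ``list(split_parts(io.StringIO(text), "XYZ"))`` on a message whose parts are separated by ``--XYZ`` and closed by ``--XYZ--`` returns one string per section. A line such as ``-- not a boundary`` fails the fast guard but also fails the exact comparison, so it is kept as data.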