diff options
Diffstat (limited to 'Doc/lib/libmultifile.tex')
-rw-r--r-- | Doc/lib/libmultifile.tex | 147 |
1 files changed, 147 insertions, 0 deletions
diff --git a/Doc/lib/libmultifile.tex b/Doc/lib/libmultifile.tex new file mode 100644 index 0000000..0741aa1 --- /dev/null +++ b/Doc/lib/libmultifile.tex @@ -0,0 +1,147 @@ +% Documentation by ESR +\section{Standard Module \module{multifile}} +\stmodindex{multiFile} +\label{module-multifile} + +The \code{MultiFile} object enables you to treat sections of a text +file as file-like input objects, with EOF being returned by +\code{readline} when a given delimiter pattern is encountered. The +defaults of this class are designed to make it useful for parsing +MIME multipart messages, but by subclassing it and overriding methods +it can be easily adapted for more general use. + +\begin{classdesc}{MultiFile}{fp[, seekable=1]} +Create a multi-file. You must instantiate this class with an input +object argument for MultiFile to get lines from, such as as a file +object returned by \code{open}. + +MultiFile only ever looks at the input object's \code{readline}, +\code{seek} and \code{tell} methods, and the latter two are only +needed if you want to random-access the multifile sections. To use +MultiFile on a non-seekable stream object, set the optional seekable +argument to 0; this will avoid using the input object's \code{seek} +and \code{tell} at all. +\end{classdesc} + +It will be useful to know that in MultiFile's view of the world, text +is composed of three kinds of lines: data, section-dividers, and +end-markers. MultiFile is designed to support parsing of +messages that may have multiple nested message parts, each with its +own pattern for section-divider and end-marker lines. + +\subsection{MultiFile Objects} +\label{MultiFile-objects} + +A \class{MultiFile} instance has the following methods: + +\begin{methoddesc}{push}{str} +Push a boundary string. When an appropriately decorated version of +this boundary is found as an input line, it will be interpreted as a +section-divider or end-marker and passed back as EOF. All subsequent +reads will also be passed back as EOF, until a \method{pop} removes +the boundary a or \method{next} call reenables it. + +It is possible to push more than one boundary. Encountering the +most-recently-pushed boundary will return EOF; encountering any other +boundary will raise an error. +\end{methoddesc} + +\begin{methoddesc}{readline}{str} +Read a line. If the line is data (not a section-divider or end-marker +or real EOF) return it. If the line matches the most-recently-stacked +boundary, return EOF and set \code{self.last} to 1 or 0 according as +the match is or is not an end-marker. If the line matches any other +stacked boundary, raise an error. If the line is a real EOF, raise an +error unless all boundaries have been popped. +\end{methoddesc} + +\begin{methoddesc}{readlines}{str} +Read all lines, up to the next section. Return them as a list of strings +\end{methoddesc} + +\begin{methoddesc}{read}{str} +Read all lines, up to the next section. Return them as a single +(multiline) string. Note that this doesn't take a size argument! +\end{methoddesc} + +\begin{methoddesc}{next}{str} +Skip lines to the next section (that is, read lines until a +section-divider or end-marker has been consumed). Return 1 if there +is such a section, 0 if an end-marker is seen. Re-enable the +most-recently-pushed boundary. +\end{methoddesc} + +\begin{methoddesc}{pop}{str} +Pop a section boundary. This boundary will no longer be interpreted as EOF. +\end{methoddesc} + +\begin{methoddesc}{seek}{str, pos, whence=0} +Seek. Seek indices are relative to the start of the current section. +The pos and whence arguments are interpreted as for a file seek. +\end{methoddesc} + +\begin{methoddesc}{next}{str} +Tell. Tell indices are relative to the start of the current section. +\end{methoddesc} + +\begin{methoddesc}{is_data}{str} +Return true if a 1 is certainly data and 0 if it might be a section +boundary. As written, it tests for a prefix other than '--' at start of +line (which all MIME boundaries have) but it is declared so it can be +overridden in derived classes. + +Note that this test is used intended as a fast guard for the real +boundary tests; if it always returns 0 it will merely slow processing, +not cause it to fail. +\end{methoddesc} + +\begin{methoddesc}{section_divider}{str} +Turn a boundary into a section-divider line. By default, this +method prepends '--' (which MIME section boundaries have) but it is +declared so it can be overridden in derived classes. This method +need not append LF or CR-LF, as comparison with the result ignores +trailing whitespace. +\end{methoddesc} + +\begin{methoddesc}{end_marker}{str} +Turn a boundary string into an end-marker line. By default, this +method prepends '--' and appends '--' (like a MIME-multipart +end-of-message marker) but it is declared so it can be be overridden +in derived classes. This method need not append LF or CR-LF, as +comparison with the result ignores trailing whitespace. +\end{methoddesc} + +Finally, \class{MultiFile} instances have two public instance variables: + +\begin{memberdesc}{level} +\end{memberdesc} + +\begin{memberdesc}{last} +1 if the last EOF passed back was for an end-of-message marker, 0 otherwise. +\end{memberdesc} + +Example: + +\begin{verbatim} + fp = MultiFile(sys.stdin, 0) + fp.push(outer_boundary) + message1 = fp.readlines() + # We should now be either at real EOF or stopped on a message + # boundary. Re-enable the outer boundary. + fp.next() + # Read another message with the same delimiter + message2 = fp.readlines() + # Re-enable that delimiter again + fp.next() + # Now look for a message subpart with a different boundary + fp.push(inner_boundary) + sub_header = fp.readlines() + # If no exception has been thrown, we're looking at the start of + # the message subpart. Reset and grab the subpart + fp.next() + sub_body = fp.readlines() + # Got it. Now pop the inner boundary to re-enable the outer one. + fp.pop() + # Read to next outer boundary + message3 = fp.readlines() +\end{verbatim} |