diff options
Diffstat (limited to 'Doc/libhtmllib.tex')
-rw-r--r-- | Doc/libhtmllib.tex | 119 |
1 files changed, 0 insertions, 119 deletions
diff --git a/Doc/libhtmllib.tex b/Doc/libhtmllib.tex deleted file mode 100644 index c856b42..0000000 --- a/Doc/libhtmllib.tex +++ /dev/null @@ -1,119 +0,0 @@ -\section{Standard Module \module{htmllib}} -\label{module-htmllib} -\stmodindex{htmllib} -\index{HTML} -\index{hypertext} - - -This module defines a class which can serve as a base for parsing text -files formatted in the HyperText Mark-up Language (HTML). The class -is not directly concerned with I/O --- it must be provided with input -in string form via a method, and makes calls to methods of a -``formatter'' object in order to produce output. The -\class{HTMLParser} class is designed to be used as a base class for -other classes in order to add functionality, and allows most of its -methods to be extended or overridden. In turn, this class is derived -from and extends the \class{SGMLParser} class defined in module -\module{sgmllib}\refstmodindex{sgmllib}. The \class{HTMLParser} -implementation supports the HTML 2.0 language as described in -\rfc{1866}. Two implementations of formatter objects are provided in -the \module{formatter}\refstmodindex{formatter} module; refer to the -documentation for that module for information on the formatter -interface. -\index{SGML} -\withsubitem{(in module sgmllib)}{\ttindex{SGMLParser}} -\index{formatter} - -The following is a summary of the interface defined by -\class{sgmllib.SGMLParser}: - -\begin{itemize} - -\item -The interface to feed data to an instance is through the \method{feed()} -method, which takes a string argument. This can be called with as -little or as much text at a time as desired; \samp{p.feed(a); -p.feed(b)} has the same effect as \samp{p.feed(a+b)}. When the data -contains complete HTML tags, these are processed immediately; -incomplete elements are saved in a buffer. To force processing of all -unprocessed data, call the \method{close()} method. - -For example, to parse the entire contents of a file, use: -\begin{verbatim} -parser.feed(open('myfile.html').read()) -parser.close() -\end{verbatim} - -\item -The interface to define semantics for HTML tags is very simple: derive -a class and define methods called \code{start_\var{tag}()}, -\code{end_\var{tag}()}, or \code{do_\var{tag}()}. The parser will -call these at appropriate moments: \code{start_\var{tag}} or -\code{do_\var{tag}()} is called when an opening tag of the form -\code{<\var{tag} ...>} is encountered; \code{end_\var{tag}()} is called -when a closing tag of the form \code{<\var{tag}>} is encountered. If -an opening tag requires a corresponding closing tag, like \code{<H1>} -... \code{</H1>}, the class should define the \code{start_\var{tag}()} -method; if a tag requires no closing tag, like \code{<P>}, the class -should define the \code{do_\var{tag}()} method. - -\end{itemize} - -The module defines a single class: - -\begin{classdesc}{HTMLParser}{formatter} -This is the basic HTML parser class. It supports all entity names -required by the HTML 2.0 specification (\rfc{1866}). It also defines -handlers for all HTML 2.0 and many HTML 3.0 and 3.2 elements. -\end{classdesc} - -In addition to tag methods, the \class{HTMLParser} class provides some -additional methods and instance variables for use within tag methods. - -\begin{memberdesc}{formatter} -This is the formatter instance associated with the parser. -\end{memberdesc} - -\begin{memberdesc}{nofill} -Boolean flag which should be true when whitespace should not be -collapsed, or false when it should be. In general, this should only -be true when character data is to be treated as ``preformatted'' text, -as within a \code{<PRE>} element. The default value is false. This -affects the operation of \method{handle_data()} and \method{save_end()}. -\end{memberdesc} - - -\begin{methoddesc}{anchor_bgn}{href, name, type} -This method is called at the start of an anchor region. The arguments -correspond to the attributes of the \code{<A>} tag with the same -names. The default implementation maintains a list of hyperlinks -(defined by the \code{href} attribute) within the document. The list -of hyperlinks is available as the data attribute \code{anchorlist}. -\end{methoddesc} - -\begin{methoddesc}{anchor_end}{} -This method is called at the end of an anchor region. The default -implementation adds a textual footnote marker using an index into the -list of hyperlinks created by \method{anchor_bgn()}. -\end{methoddesc} - -\begin{methoddesc}{handle_image}{source, alt\optional{, ismap\optional{, align\optional{, width\optional{, height}}}}} -This method is called to handle images. The default implementation -simply passes the \var{alt} value to the \method{handle_data()} -method. -\end{methoddesc} - -\begin{methoddesc}{save_bgn}{} -Begins saving character data in a buffer instead of sending it to the -formatter object. Retrieve the stored data via \method{save_end()}. -Use of the \method{save_bgn()} / \method{save_end()} pair may not be -nested. -\end{methoddesc} - -\begin{methoddesc}{save_end}{} -Ends buffering character data and returns all data saved since the -preceeding call to \method{save_bgn()}. If the \code{nofill} flag is -false, whitespace is collapsed to single spaces. A call to this -method without a preceeding call to \method{save_bgn()} will raise a -\exception{TypeError} exception. -\end{methoddesc} |