summaryrefslogtreecommitdiffstats
path: root/Doc/libhtmllib.tex
diff options
context:
space:
mode:
Diffstat (limited to 'Doc/libhtmllib.tex')
-rw-r--r--Doc/libhtmllib.tex119
1 files changed, 0 insertions, 119 deletions
diff --git a/Doc/libhtmllib.tex b/Doc/libhtmllib.tex
deleted file mode 100644
index c856b42..0000000
--- a/Doc/libhtmllib.tex
+++ /dev/null
@@ -1,119 +0,0 @@
-\section{Standard Module \module{htmllib}}
-\label{module-htmllib}
-\stmodindex{htmllib}
-\index{HTML}
-\index{hypertext}
-
-
-This module defines a class which can serve as a base for parsing text
-files formatted in the HyperText Mark-up Language (HTML). The class
-is not directly concerned with I/O --- it must be provided with input
-in string form via a method, and makes calls to methods of a
-``formatter'' object in order to produce output. The
-\class{HTMLParser} class is designed to be used as a base class for
-other classes in order to add functionality, and allows most of its
-methods to be extended or overridden. In turn, this class is derived
-from and extends the \class{SGMLParser} class defined in module
-\module{sgmllib}\refstmodindex{sgmllib}. The \class{HTMLParser}
-implementation supports the HTML 2.0 language as described in
-\rfc{1866}. Two implementations of formatter objects are provided in
-the \module{formatter}\refstmodindex{formatter} module; refer to the
-documentation for that module for information on the formatter
-interface.
-\index{SGML}
-\withsubitem{(in module sgmllib)}{\ttindex{SGMLParser}}
-\index{formatter}
-
-The following is a summary of the interface defined by
-\class{sgmllib.SGMLParser}:
-
-\begin{itemize}
-
-\item
-The interface to feed data to an instance is through the \method{feed()}
-method, which takes a string argument. This can be called with as
-little or as much text at a time as desired; \samp{p.feed(a);
-p.feed(b)} has the same effect as \samp{p.feed(a+b)}. When the data
-contains complete HTML tags, these are processed immediately;
-incomplete elements are saved in a buffer. To force processing of all
-unprocessed data, call the \method{close()} method.
-
-For example, to parse the entire contents of a file, use:
-\begin{verbatim}
-parser.feed(open('myfile.html').read())
-parser.close()
-\end{verbatim}
-
-\item
-The interface to define semantics for HTML tags is very simple: derive
-a class and define methods called \code{start_\var{tag}()},
-\code{end_\var{tag}()}, or \code{do_\var{tag}()}. The parser will
-call these at appropriate moments: \code{start_\var{tag}} or
-\code{do_\var{tag}()} is called when an opening tag of the form
-\code{<\var{tag} ...>} is encountered; \code{end_\var{tag}()} is called
-when a closing tag of the form \code{<\var{tag}>} is encountered. If
-an opening tag requires a corresponding closing tag, like \code{<H1>}
-... \code{</H1>}, the class should define the \code{start_\var{tag}()}
-method; if a tag requires no closing tag, like \code{<P>}, the class
-should define the \code{do_\var{tag}()} method.
-
-\end{itemize}
-
-The module defines a single class:
-
-\begin{classdesc}{HTMLParser}{formatter}
-This is the basic HTML parser class. It supports all entity names
-required by the HTML 2.0 specification (\rfc{1866}). It also defines
-handlers for all HTML 2.0 and many HTML 3.0 and 3.2 elements.
-\end{classdesc}
-
-In addition to tag methods, the \class{HTMLParser} class provides some
-additional methods and instance variables for use within tag methods.
-
-\begin{memberdesc}{formatter}
-This is the formatter instance associated with the parser.
-\end{memberdesc}
-
-\begin{memberdesc}{nofill}
-Boolean flag which should be true when whitespace should not be
-collapsed, or false when it should be. In general, this should only
-be true when character data is to be treated as ``preformatted'' text,
-as within a \code{<PRE>} element. The default value is false. This
-affects the operation of \method{handle_data()} and \method{save_end()}.
-\end{memberdesc}
-
-
-\begin{methoddesc}{anchor_bgn}{href, name, type}
-This method is called at the start of an anchor region. The arguments
-correspond to the attributes of the \code{<A>} tag with the same
-names. The default implementation maintains a list of hyperlinks
-(defined by the \code{href} attribute) within the document. The list
-of hyperlinks is available as the data attribute \code{anchorlist}.
-\end{methoddesc}
-
-\begin{methoddesc}{anchor_end}{}
-This method is called at the end of an anchor region. The default
-implementation adds a textual footnote marker using an index into the
-list of hyperlinks created by \method{anchor_bgn()}.
-\end{methoddesc}
-
-\begin{methoddesc}{handle_image}{source, alt\optional{, ismap\optional{, align\optional{, width\optional{, height}}}}}
-This method is called to handle images. The default implementation
-simply passes the \var{alt} value to the \method{handle_data()}
-method.
-\end{methoddesc}
-
-\begin{methoddesc}{save_bgn}{}
-Begins saving character data in a buffer instead of sending it to the
-formatter object. Retrieve the stored data via \method{save_end()}.
-Use of the \method{save_bgn()} / \method{save_end()} pair may not be
-nested.
-\end{methoddesc}
-
-\begin{methoddesc}{save_end}{}
-Ends buffering character data and returns all data saved since the
-preceeding call to \method{save_bgn()}. If the \code{nofill} flag is
-false, whitespace is collapsed to single spaces. A call to this
-method without a preceeding call to \method{save_bgn()} will raise a
-\exception{TypeError} exception.
-\end{methoddesc}