Diffstat (limited to 'Doc/lib/libhtmllib.tex')
-rw-r--r-- Doc/lib/libhtmllib.tex 181
1 files changed, 0 insertions, 181 deletions
diff --git a/Doc/lib/libhtmllib.tex b/Doc/lib/libhtmllib.tex
deleted file mode 100644
index e51dfcb..0000000
--- a/Doc/lib/libhtmllib.tex
+++ /dev/null
@@ -1,181 +0,0 @@
-\section{\module{htmllib} ---
- A parser for HTML documents}
-
-\declaremodule{standard}{htmllib}
-\modulesynopsis{A parser for HTML documents.}
-
-\index{HTML}
-\index{hypertext}
-
-
-This module defines a class which can serve as a base for parsing text
-files formatted in the HyperText Mark-up Language (HTML). The class
-is not directly concerned with I/O --- it must be provided with input
-in string form via a method, and makes calls to methods of a
-``formatter'' object in order to produce output. The
-\class{HTMLParser} class is designed to be used as a base class for
-other classes in order to add functionality, and allows most of its
-methods to be extended or overridden. In turn, this class is derived
-from and extends the \class{SGMLParser} class defined in module
-\refmodule{sgmllib}\refstmodindex{sgmllib}. The \class{HTMLParser}
-implementation supports the HTML 2.0 language as described in
-\rfc{1866}. Two implementations of formatter objects are provided in
-the \refmodule{formatter}\refstmodindex{formatter}\ module; refer to the
-documentation for that module for information on the formatter
-interface.
-\withsubitem{(in module sgmllib)}{\ttindex{SGMLParser}}
-
-The following is a summary of the interface defined by
-\class{sgmllib.SGMLParser}:
-
-\begin{itemize}
-
-\item
-Data is fed to an instance through the \method{feed()}
-method, which takes a string argument. This can be called with as
-little or as much text at a time as desired; \samp{p.feed(a);
-p.feed(b)} has the same effect as \samp{p.feed(a+b)}. When the data
-contains complete HTML markup constructs, these are processed immediately;
-incomplete constructs are saved in a buffer. To force processing of all
-unprocessed data, call the \method{close()} method.
-
-For example, to parse the entire contents of a file, use:
-\begin{verbatim}
-parser.feed(open('myfile.html').read())
-parser.close()
-\end{verbatim}
-
-\item
-The interface to define semantics for HTML tags is very simple: derive
-a class and define methods called \method{start_\var{tag}()},
-\method{end_\var{tag}()}, or \method{do_\var{tag}()}. The parser will
-call these at appropriate moments: \method{start_\var{tag}()} or
-\method{do_\var{tag}()} is called when an opening tag of the form
-\code{<\var{tag} ...>} is encountered; \method{end_\var{tag}()} is called
-when a closing tag of the form \code{</\var{tag}>} is encountered. If
-an opening tag requires a corresponding closing tag, like \code{<H1>}
-... \code{</H1>}, the class should define the \method{start_\var{tag}()}
-method; if a tag requires no closing tag, like \code{<P>}, the class
-should define the \method{do_\var{tag}()} method.
-
-\end{itemize}
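
The two points above can be sketched in runnable form. Because \module{htmllib} and \module{sgmllib} exist only in Python 2, this sketch assumes Python 3 and uses \class{html.parser.HTMLParser} as a stand-in; the \class{ConventionParser} class, its dispatch logic, and the \function{parse_in_chunks()} helper are hypothetical and only emulate the \method{start_\var{tag}()} / \method{end_\var{tag}()} / \method{do_\var{tag}()} naming convention. It is not the \module{sgmllib} implementation.

```python
from html.parser import HTMLParser  # Python 3 stand-in; htmllib is Python 2 only


class ConventionParser(HTMLParser):
    """Hypothetical parser that emulates the start_/end_/do_ naming convention."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        # Prefer start_<tag>; fall back to do_<tag> for tags without a closing tag.
        for prefix in ("start_", "do_"):
            method = getattr(self, prefix + tag, None)
            if method is not None:
                method(attrs)
                return

    def handle_endtag(self, tag):
        method = getattr(self, "end_" + tag, None)
        if method is not None:
            method()

    # Tag methods: <h1> needs a closing tag, <p> is treated as not needing one.
    def start_h1(self, attrs):
        self.events.append("start_h1")

    def end_h1(self):
        self.events.append("end_h1")

    def do_p(self, attrs):
        self.events.append("do_p")


def parse_in_chunks(chunks):
    """Feed each chunk in turn, then close() to flush any buffered data."""
    parser = ConventionParser()
    for chunk in chunks:
        parser.feed(chunk)
    parser.close()
    return parser.events


whole = parse_in_chunks(["<h1>Title</h1><p>Body"])
split = parse_in_chunks(["<h1>Ti", "tle</h", "1><p>Body"])
```

Both calls record the same tag events, which illustrates that incomplete constructs such as \samp{</h} are buffered until the rest of the markup arrives.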
-
-The module defines a parser class and an exception:
-
-\begin{classdesc}{HTMLParser}{formatter}
-This is the basic HTML parser class. It supports all entity names
-required by the XHTML 1.0 Recommendation (\url{http://www.w3.org/TR/xhtml1}).
-It also defines handlers for all HTML 2.0 and many HTML 3.0 and 3.2 elements.
-\end{classdesc}
-
-\begin{excdesc}{HTMLParseError}
-Exception raised by the \class{HTMLParser} class when it encounters an
-error while parsing.
-\versionadded{2.4}
-\end{excdesc}
-
-
-\begin{seealso}
- \seemodule{formatter}{Interface definition for transforming an
- abstract flow of formatting events into
- specific output events on writer objects.}
- \seemodule{HTMLParser}{Alternate HTML parser that offers a slightly
- lower-level view of the input, but is
- designed to work with XHTML, and does not
- implement some of the SGML syntax not used in
- ``HTML as deployed'' and which isn't legal
- for XHTML.}
- \seemodule{htmlentitydefs}{Definition of replacement text for XHTML 1.0
- entities.}
- \seemodule{sgmllib}{Base class for \class{HTMLParser}.}
-\end{seealso}
-
-
-\subsection{HTMLParser Objects \label{html-parser-objects}}
-
-In addition to tag methods, the \class{HTMLParser} class provides several
-methods and instance variables for use within tag methods.
-
-\begin{memberdesc}[HTMLParser]{formatter}
-This is the formatter instance associated with the parser.
-\end{memberdesc}
-
-\begin{memberdesc}[HTMLParser]{nofill}
-Boolean flag which should be true when whitespace should not be
-collapsed, or false when it should be. In general, this should only
-be true when character data is to be treated as ``preformatted'' text,
-as within a \code{<PRE>} element. The default value is false. This
-affects the operation of \method{handle_data()} and \method{save_end()}.
-\end{memberdesc}
-
-
-\begin{methoddesc}[HTMLParser]{anchor_bgn}{href, name, type}
-This method is called at the start of an anchor region. The arguments
-correspond to the attributes of the \code{<A>} tag with the same
-names. The default implementation maintains a list of hyperlinks
-(defined by the \code{HREF} attribute for \code{<A>} tags) within the
-document. The list of hyperlinks is available as the data attribute
-\member{anchorlist}.
-\end{methoddesc}
-
-\begin{methoddesc}[HTMLParser]{anchor_end}{}
-This method is called at the end of an anchor region. The default
-implementation adds a textual footnote marker using an index into the
-list of hyperlinks created by \method{anchor_bgn()}.
-\end{methoddesc}
-
-\begin{methoddesc}[HTMLParser]{handle_image}{source, alt\optional{, ismap\optional{,
- align\optional{, width\optional{, height}}}}}
-This method is called to handle images. The default implementation
-simply passes the \var{alt} value to the \method{handle_data()}
-method.
-\end{methoddesc}
-
-\begin{methoddesc}[HTMLParser]{save_bgn}{}
-Begins saving character data in a buffer instead of sending it to the
-formatter object. Retrieve the stored data via \method{save_end()}.
-Calls to the \method{save_bgn()} / \method{save_end()} pair may not be
-nested.
-\end{methoddesc}
-
-\begin{methoddesc}[HTMLParser]{save_end}{}
-Ends buffering character data and returns all data saved since the
-preceding call to \method{save_bgn()}. If the \member{nofill} flag is
-false, whitespace is collapsed to single spaces. A call to this
-method without a preceding call to \method{save_bgn()} will raise a
-\exception{TypeError} exception.
-\end{methoddesc}
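
The buffering protocol described for \method{save_bgn()} and \method{save_end()} can be modelled in a few lines. The \class{SaveBuffer} class below is a hypothetical, stand-alone sketch written for Python 3; its attribute names \member{savedata} and \member{output} are assumptions, and it is not the \module{htmllib} source. It only mirrors the documented behaviour: data is diverted while saving, whitespace is collapsed unless \member{nofill} is true, and an unmatched \method{save_end()} raises \exception{TypeError}.

```python
class SaveBuffer:
    """Hypothetical stand-alone model of the save_bgn()/save_end() protocol."""

    def __init__(self):
        self.nofill = False
        self.savedata = None  # None means "not currently saving"
        self.output = []      # stands in for the formatter object

    def save_bgn(self):
        # Start diverting character data into the buffer.
        self.savedata = ""

    def handle_data(self, data):
        if self.savedata is not None:
            self.savedata += data
        else:
            self.output.append(data)

    def save_end(self):
        if self.savedata is None:
            raise TypeError("save_end() called without save_bgn()")
        data, self.savedata = self.savedata, None
        if not self.nofill:
            # Collapse every run of whitespace to a single space.
            data = " ".join(data.split())
        return data


buf = SaveBuffer()
buf.save_bgn()
buf.handle_data("  Hello,\n   world  ")
collapsed = buf.save_end()

buf.nofill = True  # as within a <PRE> element
buf.save_bgn()
buf.handle_data("a  b\n")
preserved = buf.save_end()
```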
-
-
-
-\section{\module{htmlentitydefs} ---
- Definitions of HTML general entities}
-
-\declaremodule{standard}{htmlentitydefs}
-\modulesynopsis{Definitions of HTML general entities.}
-\sectionauthor{Fred L. Drake, Jr.}{fdrake@acm.org}
-
-This module defines three dictionaries, \code{name2codepoint},
-\code{codepoint2name}, and \code{entitydefs}. \code{entitydefs} is
-used by the \refmodule{htmllib} module to provide the
-\member{entitydefs} member of the \class{HTMLParser} class. The
-definition provided here contains all the entities defined by XHTML 1.0
-that can be handled using simple textual substitution in the Latin-1
-character set (ISO-8859-1).
-
-
-\begin{datadesc}{entitydefs}
-  A dictionary mapping XHTML 1.0 entity names to their
-  replacement text in ISO Latin-1.
-
-\end{datadesc}
-
-\begin{datadesc}{name2codepoint}
-  A dictionary that maps HTML entity names to Unicode codepoints.
- \versionadded{2.3}
-\end{datadesc}
-
-\begin{datadesc}{codepoint2name}
- A dictionary that maps Unicode codepoints to HTML entity names.
- \versionadded{2.3}
-\end{datadesc}
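
A short sketch of how the three dictionaries relate. It assumes Python 3, where this module is available as \module{html.entities}; under Python 2 the import would be from \module{htmlentitydefs} instead. The helper functions \function{entity_to_char()} and \function{char_to_entity()} are hypothetical names introduced only for illustration.

```python
from html.entities import codepoint2name, entitydefs, name2codepoint


def entity_to_char(name):
    """Return the character for an entity name such as 'eacute'."""
    return chr(name2codepoint[name])


def char_to_entity(ch):
    """Return '&name;' if the character has a named entity, else the character."""
    name = codepoint2name.get(ord(ch))
    return "&%s;" % name if name else ch


amp_text = entitydefs["amp"]      # replacement text for &amp;
e_acute = entity_to_char("eacute")
escaped = "".join(char_to_entity(c) for c in "a<b")
```

The two codepoint dictionaries are inverses of each other, so a lookup in one can be reversed through the other.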