summaryrefslogtreecommitdiffstats
path: root/Doc/library/htmllib.rst
diff options
context:
space:
mode:
Diffstat (limited to 'Doc/library/htmllib.rst')
-rw-r--r--Doc/library/htmllib.rst147
1 files changed, 0 insertions, 147 deletions
diff --git a/Doc/library/htmllib.rst b/Doc/library/htmllib.rst
deleted file mode 100644
index 5e6554a..0000000
--- a/Doc/library/htmllib.rst
+++ /dev/null
@@ -1,147 +0,0 @@
-
-:mod:`htmllib` --- A parser for HTML documents
-==============================================
-
-.. module:: htmllib
- :synopsis: A parser for HTML documents.
-
-
-.. index::
- single: HTML
- single: hypertext
-
-.. index::
- module: sgmllib
- module: formatter
- single: SGMLParser (in module sgmllib)
-
-This module defines a class which can serve as a base for parsing text files
-formatted in the HyperText Mark-up Language (HTML). The class is not directly
-concerned with I/O --- it must be provided with input in string form via a
-method, and makes calls to methods of a "formatter" object in order to produce
-output. The :class:`HTMLParser` class is designed to be used as a base class
-for other classes in order to add functionality, and allows most of its methods
-to be extended or overridden. In turn, this class is derived from and extends
-the :class:`SGMLParser` class defined in module :mod:`sgmllib`. The
-:class:`HTMLParser` implementation supports the HTML 2.0 language as described
-in :rfc:`1866`. Two implementations of formatter objects are provided in the
-:mod:`formatter` module; refer to the documentation for that module for
-information on the formatter interface.
-
-The following is a summary of the interface defined by
-:class:`sgmllib.SGMLParser`:
-
-* The interface to feed data to an instance is through the :meth:`feed` method,
- which takes a string argument. This can be called with as little or as much
- text at a time as desired; ``p.feed(a); p.feed(b)`` has the same effect as
- ``p.feed(a+b)``. When the data contains complete HTML markup constructs, these
- are processed immediately; incomplete constructs are saved in a buffer. To
- force processing of all unprocessed data, call the :meth:`close` method.
-
- For example, to parse the entire contents of a file, use::
-
- parser.feed(open('myfile.html').read())
- parser.close()
-
-* The interface to define semantics for HTML tags is very simple: derive a class
- and define methods called :meth:`start_tag`, :meth:`end_tag`, or :meth:`do_tag`.
- The parser will call these at appropriate moments: :meth:`start_tag` or
- :meth:`do_tag` is called when an opening tag of the form ``<tag ...>`` is
- encountered; :meth:`end_tag` is called when a closing tag of the form ``<tag>``
- is encountered. If an opening tag requires a corresponding closing tag, like
- ``<H1>`` ... ``</H1>``, the class should define the :meth:`start_tag` method; if
- a tag requires no closing tag, like ``<P>``, the class should define the
- :meth:`do_tag` method.
-
-The module defines a parser class and an exception:
-
-
-.. class:: HTMLParser(formatter)
-
- This is the basic HTML parser class. It supports all entity names required by
- the XHTML 1.0 Recommendation (http://www.w3.org/TR/xhtml1). It also defines
- handlers for all HTML 2.0 and many HTML 3.0 and 3.2 elements.
-
-
-.. exception:: HTMLParseError
-
- Exception raised by the :class:`HTMLParser` class when it encounters an error
- while parsing.
-
-
-.. seealso::
-
- Module :mod:`formatter`
- Interface definition for transforming an abstract flow of formatting events into
- specific output events on writer objects.
-
- Module :mod:`html.parser`
- Alternate HTML parser that offers a slightly lower-level view of the input, but
- is designed to work with XHTML, and does not implement some of the SGML syntax
- not used in "HTML as deployed" and which isn't legal for XHTML.
-
- Module :mod:`html.entities`
- Definition of replacement text for XHTML 1.0 entities.
-
- Module :mod:`sgmllib`
- Base class for :class:`HTMLParser`.
-
-
-.. _html-parser-objects:
-
-HTMLParser Objects
-------------------
-
-In addition to tag methods, the :class:`HTMLParser` class provides some
-additional methods and instance variables for use within tag methods.
-
-
-.. attribute:: HTMLParser.formatter
-
- This is the formatter instance associated with the parser.
-
-
-.. attribute:: HTMLParser.nofill
-
- Boolean flag which should be true when whitespace should not be collapsed, or
- false when it should be. In general, this should only be true when character
- data is to be treated as "preformatted" text, as within a ``<PRE>`` element.
- The default value is false. This affects the operation of :meth:`handle_data`
- and :meth:`save_end`.
-
-
-.. method:: HTMLParser.anchor_bgn(href, name, type)
-
- This method is called at the start of an anchor region. The arguments
- correspond to the attributes of the ``<A>`` tag with the same names. The
- default implementation maintains a list of hyperlinks (defined by the ``HREF``
- attribute for ``<A>`` tags) within the document. The list of hyperlinks is
- available as the data attribute :attr:`anchorlist`.
-
-
-.. method:: HTMLParser.anchor_end()
-
- This method is called at the end of an anchor region. The default
- implementation adds a textual footnote marker using an index into the list of
- hyperlinks created by :meth:`anchor_bgn`.
-
-
-.. method:: HTMLParser.handle_image(source, alt[, ismap[, align[, width[, height]]]])
-
- This method is called to handle images. The default implementation simply
- passes the *alt* value to the :meth:`handle_data` method.
-
-
-.. method:: HTMLParser.save_bgn()
-
- Begins saving character data in a buffer instead of sending it to the formatter
- object. Retrieve the stored data via :meth:`save_end`. Use of the
- :meth:`save_bgn` / :meth:`save_end` pair may not be nested.
-
-
-.. method:: HTMLParser.save_end()
-
- Ends buffering character data and returns all data saved since the preceding
- call to :meth:`save_bgn`. If the :attr:`nofill` flag is false, whitespace is
- collapsed to single spaces. A call to this method without a preceding call to
- :meth:`save_bgn` will raise a :exc:`TypeError` exception.