Diffstat (limited to 'Doc/library/html.parser.rst')
-rw-r--r-- | Doc/library/html.parser.rst | 19 |
1 files changed, 17 insertions, 2 deletions
diff --git a/Doc/library/html.parser.rst b/Doc/library/html.parser.rst
index 1fa11a2..06a3b1a 100644
--- a/Doc/library/html.parser.rst
+++ b/Doc/library/html.parser.rst
@@ -9,12 +9,20 @@
    single: HTML
    single: XHTML
 
+**Source code:** :source:`Lib/html/parser.py`
+
+--------------
+
 This module defines a class :class:`HTMLParser` which serves as the basis for
 parsing text files formatted in HTML (HyperText Mark-up Language) and XHTML.
 
-.. class:: HTMLParser()
+.. class:: HTMLParser(strict=True)
 
-   The :class:`HTMLParser` class is instantiated without arguments.
+   Create a parser instance. If *strict* is ``True`` (the default), invalid
+   html results in :exc:`~html.parser.HTMLParseError` exceptions [#]_. If
+   *strict* is ``False``, the parser uses heuristics to make a best guess at
+   the intention of any invalid html it encounters, similar to the way most
+   browsers do.
 
    An :class:`HTMLParser` instance is fed HTML data and calls handler functions
    when tags begin and end. The :class:`HTMLParser` class is meant to be overridden by the
@@ -23,6 +31,8 @@ parsing text files formatted in HTML (HyperText Mark-up Language) and XHTML.
    This parser does not check that end tags match start tags or call the end-tag
    handler for elements which are closed implicitly by closing an outer element.
 
+   .. versionchanged:: 3.2 *strict* keyword added
+
 
 An exception is defined as well:
 
@@ -191,3 +201,8 @@ As a basic example, below is a very basic HTML parser that uses the
     Encountered a html end tag
 
 
+.. rubric:: Footnotes
+
+.. [#] For backward compatibility reasons *strict* mode does not raise
+       exceptions for all non-compliant HTML. That is, some invalid HTML
+       is tolerated even in *strict* mode.
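
Reviewer note: a minimal sketch of how the documented *strict* keyword might be used from a subclass. `TagLogger` and `make_lenient_parser` are hypothetical names introduced here for illustration and are not part of this change. The fallback assumes that an interpreter without the keyword rejects it with `TypeError`.

```python
from html.parser import HTMLParser


class TagLogger(HTMLParser):
    """Hypothetical subclass that records start and end tags in order."""

    def __init__(self, **kwargs):
        # Forward keyword arguments (such as strict) to HTMLParser unchanged.
        super().__init__(**kwargs)
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))


def make_lenient_parser():
    # Assumption: the strict keyword is accepted only by interpreters that
    # include this change; where it is rejected, use the default constructor.
    try:
        return TagLogger(strict=False)
    except TypeError:
        return TagLogger()


parser = make_lenient_parser()
# The <p> element is never closed explicitly; per the documentation above,
# the parser does not call the end-tag handler for implicitly closed elements.
parser.feed("<html><body><p>unclosed paragraph</body></html>")
parser.close()
print(parser.events)
```

The recorded events contain a start entry for `p` but no matching end entry, which is the behavior the context lines of the second hunk describe.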