Diffstat (limited to 'Doc/library/html.parser.rst')
-rw-r--r-- Doc/library/html.parser.rst 19
1 files changed, 17 insertions, 2 deletions
diff --git a/Doc/library/html.parser.rst b/Doc/library/html.parser.rst
index 1fa11a2..06a3b1a 100644
--- a/Doc/library/html.parser.rst
+++ b/Doc/library/html.parser.rst
@@ -9,12 +9,20 @@
single: HTML
single: XHTML
+**Source code:** :source:`Lib/html/parser.py`
+
+--------------
+
This module defines a class :class:`HTMLParser` which serves as the basis for
parsing text files formatted in HTML (HyperText Mark-up Language) and XHTML.
-.. class:: HTMLParser()
+.. class:: HTMLParser(strict=True)
- The :class:`HTMLParser` class is instantiated without arguments.
+ Create a parser instance. If *strict* is ``True`` (the default), invalid
+ HTML results in :exc:`~html.parser.HTMLParseError` exceptions [#]_. If
+ *strict* is ``False``, the parser uses heuristics to make a best guess at
+ the intention of any invalid HTML it encounters, similar to the way most
+ browsers do.
An :class:`HTMLParser` instance is fed HTML data and calls handler functions when tags
begin and end. The :class:`HTMLParser` class is meant to be overridden by the
@@ -23,6 +31,8 @@ parsing text files formatted in HTML (HyperText Mark-up Language) and XHTML.
This parser does not check that end tags match start tags or call the end-tag
handler for elements which are closed implicitly by closing an outer element.
+ .. versionchanged:: 3.2 *strict* keyword added
+
An exception is defined as well:
@@ -191,3 +201,8 @@ As a basic example, below is a very basic HTML parser that uses the
Encountered a html end tag
+.. rubric:: Footnotes
+
+.. [#] For backward compatibility reasons, *strict* mode does not raise
+ exceptions for all non-compliant HTML. That is, some invalid HTML
+ is tolerated even in *strict* mode.
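The following is a minimal Python sketch, not part of this commit. It illustrates two points from the documentation above: the *strict* keyword and the note that the parser does not call the end-tag handler for elements closed implicitly. The *strict* argument only exists on Python 3.2 through 3.4 (it was deprecated in 3.3 and removed in 3.5, after which the parser is always lenient). For that reason, the hypothetical helper ``make_lenient_parser`` passes ``strict=False`` where the constructor accepts it and otherwise falls back to no arguments. ``TagLogger`` is likewise a hypothetical subclass name used only for illustration.

```python
from html.parser import HTMLParser


class TagLogger(HTMLParser):
    """Hypothetical subclass that records start/end tag callbacks in order."""

    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))


def make_lenient_parser():
    """Return a TagLogger that tolerates invalid markup (hypothetical helper).

    Python 3.2-3.4 accept strict=False. Later versions removed the
    keyword, so the constructor raises TypeError and we fall back to
    the default constructor, which is always lenient there.
    """
    try:
        return TagLogger(strict=False)
    except TypeError:
        return TagLogger()


parser = make_lenient_parser()
# <b> is never closed explicitly; only </p> appears in the input.
parser.feed("<p><b>bold text</p>")
parser.close()
print(parser.events)
# expected: [('start', 'p'), ('start', 'b'), ('end', 'p')]
```

No ``('end', 'b')`` event is recorded: the parser reports only the end tags present in the input and does not infer that ``</p>`` implicitly closed ``<b>``.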