#1486713: Add a tolerant mode to HTMLParser.

The motivation for adding this option is that the functionality it provides used to be provided by sgmllib in Python 2, and was used by, for example, BeautifulSoup. Without this option, the Python 3 version of BeautifulSoup and the many programs that use it are crippled. The original patch was by 'kxroberto'. I modified it heavily but kept his heuristics and test. I also added heuristics to fix #975556, #1046092, and part of #6191. This patch should be completely backward compatible: the behavior with the default strict=True is unchanged.
author: R. David Murray <rdmurray@bitdance.com> 2010-12-03 04:06:39 (GMT)
committer: R. David Murray <rdmurray@bitdance.com> 2010-12-03 04:06:39 (GMT)
commit: b579dba1195df97f87ba868a5987f18fb7509bff (patch)
tree: d1ff2cf38f061ee0bba08459167e33daa7a4ad79 /Doc/library/html.parser.rst
parent: 79cdb661f5a6cf8bba07aa50f4451f6c409bb067 (diff)
download: cpython-b579dba1195df97f87ba868a5987f18fb7509bff.zip
cpython-b579dba1195df97f87ba868a5987f18fb7509bff.tar.gz
cpython-b579dba1195df97f87ba868a5987f18fb7509bff.tar.bz2
1 files changed, 11 insertions, 2 deletions
diff --git a/Doc/library/html.parser.rst b/Doc/library/html.parser.rst
index 2bc6555..743d183 100644
--- a/Doc/library/html.parser.rst
+++ b/Doc/library/html.parser.rst
@@ -12,9 +12,13 @@
 This module defines a class :class:`HTMLParser` which serves as the basis for
 parsing text files formatted in HTML (HyperText Mark-up Language) and XHTML.
 
-.. class:: HTMLParser()
+.. class:: HTMLParser(strict=True)
 
-   The :class:`HTMLParser` class is instantiated without arguments.
+   Create a parser instance.  If *strict* is ``True`` (the default), invalid
+   html results in :exc:`~html.parser.HTMLParseError` exceptions [#]_.  If
+   *strict* is ``False``, the parser uses heuristics to make a best guess at
+   the intention of any invalid html it encounters, similar to the way most
+   browsers do.
 
    An :class:`HTMLParser` instance is fed HTML data and calls handler functions when tags
    begin and end.  The :class:`HTMLParser` class is meant to be overridden by the
@@ -191,3 +195,8 @@ As a basic example, below is a very basic HTML parser that uses the
    Encountered a html end tag
 
 
+.. rubric:: Footnotes
+
+.. [#] For backward compatibility reasons *strict* mode does not throw
+       errors for all non-compliant HTML.  That is, some invalid HTML
+       is tolerated even in *strict* mode.