cpython.git - https://github.com/python/cpython.git

	Commit message	Author	Age	Files	Lines
*	gh-140875: Fix handling of unclosed charrefs before EOF in HTMLParser (GH-140904)	Serhiy Storchaka	2025-11-19	1	-10/+19
*	gh-137836: Support more RAWTEXT and PLAINTEXT elements in HTMLParser (GH-137837)	Serhiy Storchaka	2025-10-31	1	-6/+18
\|	* the "plaintext" element
\|	* the RAWTEXT elements "xmp", "iframe", "noembed" and "noframes"
\|	* optionally RAWTEXT (if scripting=True) element "noscript"
*	gh-135661: Fix parsing unterminated bogus comments in HTMLParser (GH-137873)	Serhiy Storchaka	2025-08-17	1	-14/+8
\|	Bogus comments that start with "<![CDATA[" should not include the starting "!" in their value.
*	gh-135661: Fix CDATA section parsing in HTMLParser (GH-135665)	Serhiy Storchaka	2025-08-14	1	-2/+26
\|	"] ]>" and "]] >" no longer end the CDATA section.
\|	Make CDATA section parsing context-dependent.
\|	Add private method HTMLParser._set_support_cdata() to change the context.
\|	If called with True, "<![CDATA[" starts a CDATA section which ends with "]]>".
\|	If called with False, "<![CDATA[" starts a bogus comment which ends with ">".
*	gh-118350: Fix support of elements "textarea" and "title" in HTMLParser (#135310)	Timon Viola	2025-07-22	1	-5/+15
\|	Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
\|	Co-authored-by: Łukasz Langa <lukasz@langa.pl>
*	gh-135661: Fix parsing attributes with whitespaces around the "=" separator in HTMLParser (GH-136908)	Serhiy Storchaka	2025-07-21	1	-2/+2
\|	This fixes a regression introduced in GH-135930.
*	gh-102555: Fix comment parsing in HTMLParser according to the HTML5 standard (GH-135664)	Serhiy Storchaka	2025-07-04	1	-1/+17
\|	* "--!>" now ends the comment.
\|	* "-- >" no longer ends the comment.
\|	* Support abnormally ended empty comments "<!-->" and "<!--->".
\|	---------
\|	Co-authored-by: Kerim Kabirov <the.privat33r+gh@pm.me>
\|	Co-authored-by: Ezio Melotti <ezio.melotti@gmail.com>
*	gh-135661: Fix parsing start and end tags in HTMLParser according to the HTML5 standard (GH-135930)	Serhiy Storchaka	2025-07-03	1	-74/+69
\|	* Whitespace is no longer accepted between `</` and the tag name. E.g. `</ script>` does not end the script section.
\|	* Vertical tabulation (`\v`) and non-ASCII whitespace are no longer recognized as whitespace. The only whitespace characters are `\t\n\r\f `.
\|	* Null character (U+0000) no longer ends the tag name.
\|	* Attributes and slashes after the tag name in end tags are now ignored, instead of terminating after the first `>` in quoted attribute value. E.g. `</script/foo=">"/>`.
\|	* Multiple slashes and whitespace between the last attribute and closing `>` are now ignored in both start and end tags. E.g. `<a foo=bar/ //>`.
\|	* Multiple `=` between attribute name and value are no longer collapsed. E.g. `<a foo==bar>` produces attribute "foo" with value "=bar".
\|	* Whitespace between the `=` separator and attribute name or value is no longer ignored. E.g. `<a foo =bar>` produces two attributes "foo" and "=bar", both with value None; `<a foo= bar>` produces two attributes: "foo" with value "" and "bar" with value None.
\|	* Fix Sphinx errors.
\|	* Apply suggestions from code review
\|	  Co-authored-by: Ezio Melotti <ezio.melotti@gmail.com>
\|	* Address review comments.
\|	* Move to Security.
\|	---------
\|	Co-authored-by: Ezio Melotti <ezio.melotti@gmail.com>
*	gh-135462: Fix quadratic complexity in processing special input in HTMLParser (GH-135464)	Serhiy Storchaka	2025-06-13	1	-11/+30
\|	End-of-file errors are now handled according to the HTML5 specs -- comments and declarations are automatically closed, tags are ignored.
*	gh-86155: Fix data loss after unclosed script or style tag in HTMLParser (GH-22658)	Waylan Limberg	2025-05-10	1	-1/+1
\|	When calling .close() the HTMLParser should flush all remaining content, even when that content is in an unclosed script or style tag.
*	gh-77057: Fix handling of invalid markup declarations in HTMLParser (GH-9295)	Ezio Melotti	2025-05-10	1	-2/+2
\|	Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
*	gh-69426: HTMLParser: only unescape properly terminated character entities in attribute values (GH-95215)	Sascha Ißbrücker	2025-05-07	1	-1/+19
\|	According to the HTML5 spec, named character references in attribute values should only be processed if they are not followed by an ASCII alphanumeric or an equals sign.
\|	https://html.spec.whatwg.org/multipage/parsing.html#named-character-reference-state
*	gh-95813: Improve HTMLParser from the view of inheritance (#95874)	Dong-hee Na	2022-08-18	1	-1/+2
\|	* gh-95813: Improve HTMLParser from the view of inheritance
\|	* gh-95813: Add unittest
\|	* Address code review
*	bpo-45421: Remove dead code from html.parser (GH-28847)	Alberto Mardegan	2021-10-12	1	-7/+0
\|	Support for HTMLParseError was removed back in 2014 with commit 73a4359eb0eb624c588c5d52083ea4944f9787ea; however, this small block was missed.
*	Fix typos in the Lib directory (GH-28775)	Christian Clauss	2021-10-06	1	-1/+1
\|	Fix typos in the Lib directory as identified by codespell.
\|	Co-authored-by: Terry Jan Reedy <tjreedy@udel.edu>
*	bpo-41748: Handles unquoted attributes with commas (#24072)	Karl Dubost	2021-02-01	1	-1/+1
\|	* bpo-41748: Adds tests for unquoted attributes with comma
\|	* bpo-41748: Handles unquoted attributes with comma
\|	* bpo-41748: Addresses review comments
\|	* bpo-41748: Addresses review comments
\|	* Adds more test cases
\|	* Simplifies the regex for handling spaces
\|	* bpo-41748: Moves attributes tests under the right class
\|	* bpo-41748: Addresses review about duplicate attributes
\|	* bpo-41748: Adds NEWS.d entry for this patch
*	bpo-37328: remove deprecated HTMLParser.unescape (GH-14186)	Inada Naoki	2019-08-27	1	-8/+0
\|	It has been deprecated since Python 3.4.
*	bpo-30629: Remove second call of str.lower() in html.parser.parse_endtag. (#2099)	Motoki Naruse	2017-06-17	1	-1/+1
\|	elem is the result of .lower() 6 lines above the handle_endtag call.
\|	Patch by Motoki Naruse
*	Revert "Fixed a typo in the HTMLParser.feed docstrings" (#1771)	Serhiy Storchaka	2017-05-24	1	-1/+1
\|	* Revert "Fixed a typo in the HTMLParser.feed docstrings. The docstring started with an 'r', like a rawstring. (#1759)"
\|	The docstring was correct. I read the patch in the opposite direction, as adding the "r" prefix.
\|	This reverts commit 5ba185039f1bd465d3f82531324fd3fe1ee42f0c.
*	Fixed a typo in the HTMLParser.feed docstrings. The docstring started with an 'r', like a rawstring. (#1759)	Jani Šumak	2017-05-23	1	-1/+1
*	#27364: fix "incorrect" uses of escape character in the stdlib.	R David Murray	2016-09-08	1	-2/+2
\|	And most of the tools. Patch by Emanuel Barry, reviewed by me, Serhiy Storchaka, and Martin Panter.
*	Issue #27076: Doc, comment and tests spelling fixes	Martin Panter	2016-05-26	1	-1/+1
\|	Most fixes to Doc/ and Lib/ directories by Ville Skyttä.
*	#23144: merge with 3.4.	Ezio Melotti	2015-09-06	1	-1/+9
\|\
\| *	#23144: Make sure that HTMLParser.feed() returns all the data, even when convert_charrefs is True.	Ezio Melotti	2015-09-06	1	-1/+9
* \|	#21047: set the default value for the convert_charrefs argument of HTMLParser to True. Patch by Berker Peksag.	Ezio Melotti	2014-08-02	1	-8/+2
* \|	#15114: the strict mode and argument of HTMLParser, HTMLParser.error, and the HTMLParseError exception have been removed.	Ezio Melotti	2014-08-02	1	-94/+12
\|/
*	#20288: merge with 3.3.	Ezio Melotti	2014-02-01	1	-3/+3
\|\
\| *	#20288: fix handling of invalid numeric charrefs in HTMLParser.	Ezio Melotti	2014-02-01	1	-3/+3
\| \|
* \|	#13633: Added a new convert_charrefs keyword arg to HTMLParser that, when True, automatically converts all character references.	Ezio Melotti	2013-11-23	1	-17/+45
* \|	#19688: add back and deprecate the internal HTMLParser.unescape() method.	Ezio Melotti	2013-11-22	1	-0/+7
\| \|
* \|	#2927: Added the unescape() function to the html module.	Ezio Melotti	2013-11-19	1	-33/+5
\| \|
* \|	#19480: merge with 3.3.	Ezio Melotti	2013-11-07	1	-9/+12
\|\ \ \| \|/
\| *	#19480: HTMLParser now accepts all valid start-tag names as defined by the HTML5 standard.	Ezio Melotti	2013-11-07	1	-9/+12
* \|	#15114: The html.parser module now raises a DeprecationWarning when the strict argument of HTMLParser or the HTMLParser.error method is used.	Ezio Melotti	2013-11-02	1	-4/+10
* \|	#17802: merge with 3.3.	Ezio Melotti	2013-05-01	1	-0/+1
\|\ \ \| \|/
\| *	#17802: Fix an UnboundLocalError in html.parser. Initial tests by Thomas Barlow.	Ezio Melotti	2013-05-01	1	-0/+1
* \|	#14679: add an __all__ (that contains only HTMLParser) to html.parser.	Ezio Melotti	2013-05-01	1	-0/+2
\|/
*	#15156: HTMLParser now uses the new "html.entities.html5" dictionary.	Ezio Melotti	2012-06-24	1	-17/+15
\|
*	#15114: the strict mode of HTMLParser and the HTMLParseError exception are deprecated now that the parser is able to parse invalid markup.	Ezio Melotti	2012-06-23	1	-9/+12
*	#14538: HTMLParser can now parse correctly start tags that contain a bare /.	Ezio Melotti	2012-04-19	1	-3/+3
\|
*	HTMLParser is now able to handle slashes in the start tag.	Ezio Melotti	2012-02-21	1	-7/+11
\|
*	Fix an index and clean up comments.	Ezio Melotti	2012-02-13	1	-1/+2
\|
*	Improve handling of declarations in HTMLParser.	Ezio Melotti	2012-02-13	1	-8/+22
\|
*	#13993: HTMLParser is now able to handle broken end tags when strict=False.	Ezio Melotti	2012-02-13	1	-15/+27
\|
*	#13960: HTMLParser is now able to handle broken comments when strict=False.	Ezio Melotti	2012-02-10	1	-1/+24
\|
*	#13358: HTMLParser now calls handle_data only once for each CDATA.	Ezio Melotti	2011-11-18	1	-3/+4
\|
*	#1745761, #755670, #13357, #12629, #1200313: improve attribute handling in HTMLParser.	Ezio Melotti	2011-11-14	1	-9/+10
*	#670664: Fix HTMLParser to correctly handle the content of ``<script>...</script>`` and ``<style>...</style>``.	Ezio Melotti	2011-11-01	1	-4/+18
*	#13273: fix a bug that prevented HTMLParser from properly detecting some tags when strict=False.	Ezio Melotti	2011-10-28	1	-3/+2
*	#12888: Fix a bug in HTMLParser.unescape that prevented it from unescaping more than 128 entities. Patch by Peter Otten.	Ezio Melotti	2011-09-05	1	-1/+1
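
Several entries in this log concern how HTMLParser reports character references, attribute values, and the content of script elements. The following is a minimal sketch of how such behavior can be observed, assuming Python 3.5 or later (where convert_charrefs defaults to True). The EventCollector class and the collect() helper are hypothetical names introduced here for illustration; they are not part of html.parser. Results for malformed input (unclosed tags, bogus comments, unusual whitespace) differ between Python releases, as the commits above show, so only well-formed input is used.

```python
from html.parser import HTMLParser


class EventCollector(HTMLParser):
    """Hypothetical helper: records parser callbacks as tuples."""

    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("starttag", tag, attrs))

    def handle_endtag(self, tag):
        self.events.append(("endtag", tag))

    def handle_data(self, data):
        self.events.append(("data", data))

    def handle_comment(self, data):
        self.events.append(("comment", data))


def collect(source, **kwargs):
    """Feed source to a fresh parser, flush it with close(), return events."""
    parser = EventCollector(**kwargs)
    parser.feed(source)
    parser.close()  # flushes any text still buffered by the parser
    return parser.events


if __name__ == "__main__":
    # Character references in text are converted before handle_data is called.
    print(collect("<p>a &amp; b</p>"))
    # Script content is passed through as raw data; "<" does not start a tag.
    print(collect("<script>if (a < b) {}</script>"))
    # Properly terminated references in attribute values are unescaped.
    print(collect('<a href="x?a=1&amp;b=2">'))
```

Running the same inputs under different Python versions and comparing the recorded events is one way to check whether a given fix from this log is present in an interpreter.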