| Commit message (Collapse) | Author | Age | Files | Lines |
HTMLParser (GH-135310) (GH-136985)
(cherry picked from commit 4d02f31cdd45d81b95540d9076222b709d4f2335)
Co-authored-by: Timon Viola <44016238+timonviola@users.noreply.github.com>
Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
Co-authored-by: Łukasz Langa <lukasz@langa.pl>
|
separator in HTMLParser (GH-136908) (GH-136918)
This fixes a regression introduced in GH-135930.
(cherry picked from commit dee650189497735edbc08a54edabb5b06ef1bd09)
Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
|
standard (GH-135664) (GH-136272)
* "--!>" now ends the comment.
* "-- >" no longer ends the comment.
* Support abnormally ended empty comments "<!-->" and "<!--->".
---------
(cherry picked from commit 8ac7613dc8b8f82253d7c0e2b6ef6ed703a0a1ee)
Co-authored-by: Kerim Kabirov <the.privat33r+gh@pm.me>
Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
Co-authored-by: Ezio Melotti <ezio.melotti@gmail.com>
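
The comment-termination rules listed in this entry can be probed with a small script. This is only a sketch: `CommentCollector` and `collect_comments` are hypothetical helper names, and the output for the abnormal cases depends on whether the running interpreter includes this change.

```python
from html.parser import HTMLParser


class CommentCollector(HTMLParser):
    """Hypothetical helper: records the text of every comment seen."""

    def __init__(self):
        super().__init__()
        self.comments = []

    def handle_comment(self, data):
        self.comments.append(data)


def collect_comments(markup):
    """Parse `markup` in one go and return the list of comment bodies."""
    parser = CommentCollector()
    parser.feed(markup)
    parser.close()
    return parser.comments


if __name__ == "__main__":
    # A normally terminated comment is reported the same way with or
    # without the change.
    print(collect_comments("<!-- ok -->"))
    # These are the cases named in the commit message. No expected output
    # is given because it varies with the interpreter version.
    for sample in ("<!--a--!>b", "<!--a-- >b-->", "<!-->x", "<!--->x"):
        print(repr(sample), collect_comments(sample))
```

Comparing the printed results on two interpreter builds is a quick way to see whether a given installation carries the backport.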
|
the HTML5 standard (GH-135930) (GH-136256)
* Whitespace is no longer accepted between `</` and the tag name.
E.g. `</ script>` does not end the script section.
* Vertical tab (`\v`) and non-ASCII whitespace characters are no longer
recognized as whitespace. The only whitespace characters are `\t\n\r\f `.
* A null character (U+0000) no longer ends the tag name.
* Attributes and slashes after the tag name in end tags are now ignored,
instead of the tag terminating after the first `>` in a quoted attribute value.
E.g. `</script/foo=">"/>`.
* Multiple slashes and whitespace characters between the last attribute and
the closing `>` are now ignored in both start and end tags. E.g. `<a foo=bar/ //>`.
* Multiple `=` between an attribute name and its value are no longer collapsed.
E.g. `<a foo==bar>` produces attribute "foo" with value "=bar".
* Whitespace between the `=` separator and the attribute name or value is no
longer ignored. E.g. `<a foo =bar>` produces two attributes, "foo" and
"=bar", both with value None; `<a foo= bar>` produces two attributes:
"foo" with value "" and "bar" with value None.
---------
(cherry picked from commit 0243f97cbadec8d985e63b1daec5d1cbc850cae3)
Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
Co-authored-by: Ezio Melotti <ezio.melotti@gmail.com>
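
The attribute rules above can be inspected with a short script. This is a sketch under stated assumptions: `TagCollector` and `collect_tags` are hypothetical names, and only the well-formed case has a version-independent result. The malformed samples print whatever the running interpreter produces.

```python
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Hypothetical helper: records start and end tags as tuples."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag, attrs))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))


def collect_tags(markup):
    """Parse `markup` and return the recorded tag events."""
    parser = TagCollector()
    parser.feed(markup)
    parser.close()
    return parser.events


if __name__ == "__main__":
    # Well-formed input: quoted and unquoted attribute values.
    print(collect_tags('<a href="x" id=y>text</a>'))
    # Samples taken from the commit message; results differ between
    # interpreters with and without this change.
    for sample in ("<a foo==bar>", "<a foo =bar>", "<a foo= bar>"):
        print(repr(sample), collect_tags(sample))
```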
|
HTMLParser (GH-135464) (GH-135482)
End-of-file errors are now handled according to the HTML5 specs --
comments and declarations are automatically closed, tags are ignored.
(cherry picked from commit 6eb6c5dbfb528bd07d77b60fd71fd05d81d45c41)
Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
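
To see how unterminated constructs are reported at end of input, a sketch like the following can be used. `EventCollector` and `collect_events` are hypothetical names. How the truncated samples are reported depends on the interpreter version, so no output is asserted for them.

```python
from html.parser import HTMLParser


class EventCollector(HTMLParser):
    """Hypothetical helper: records comments, declarations, tags and text."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag, attrs))

    def handle_comment(self, data):
        self.events.append(("comment", data))

    def handle_decl(self, decl):
        self.events.append(("decl", decl))

    def handle_data(self, data):
        self.events.append(("data", data))


def collect_events(markup):
    """Feed `markup`, signal end of input with close(), return the events."""
    parser = EventCollector()
    parser.feed(markup)
    parser.close()  # close() is what tells the parser the input has ended
    return parser.events


if __name__ == "__main__":
    # Inputs cut off in the middle of a comment, a declaration and a tag.
    for sample in ("<!--unfinished comment", "<!DOCTYPE html", "<a href=x"):
        print(repr(sample), collect_events(sample))
```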
|
HTMLParser (GH-22658) (GH-133845)
When calling .close() the HTMLParser should flush all remaining content,
even when that content is in an unclosed script or style tag.
(cherry picked from commit 53383e90e4df7029f792b7aa81aa2e4cff348ed0)
Co-authored-by: Waylan Limberg <waylan.limberg@icloud.com>
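
A minimal way to check the flushing behaviour described here is to collect everything passed to `handle_data`. `DataCollector` and `collect_text` are hypothetical names. The unclosed `<script>` case is printed, not asserted, because older interpreters leave that text unreported.

```python
from html.parser import HTMLParser


class DataCollector(HTMLParser):
    """Hypothetical helper: accumulates all text passed to handle_data."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def collect_text(markup):
    """Return all character data reported for `markup`, joined together."""
    parser = DataCollector()
    parser.feed(markup)
    parser.close()  # should flush whatever is still buffered
    return "".join(parser.chunks)


if __name__ == "__main__":
    # Properly closed script element: the body is reported as data.
    print(repr(collect_text("<script>var a;</script>")))
    # Unclosed script element: with this change, close() is expected to
    # flush the remaining text.
    print(repr(collect_text("<script>alert(1)")))
```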
|
(GH-9295) (GH-133834)
(cherry picked from commit 76c0b01bc401c3e976011bbc69cec56dbebe0ad5)
Co-authored-by: Ezio Melotti <ezio.melotti@gmail.com>
Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
|
entities in attribute values (GH-95215) (GH-133586)
According to the HTML5 spec, named character references in attribute values
should only be processed if they are not followed by an ASCII alphanumeric
or an equals sign.
(cherry picked from commit 77b14a6d58e527f915966446eb0866652a46feb5)
https://html.spec.whatwg.org/multipage/parsing.html#named-character-reference-state
Co-authored-by: Sascha Ißbrücker <sascha.issbruecker@googlemail.com>
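
The effect on attribute values can be observed with a sketch such as the one below. `AttrCollector` and `first_attrs` are hypothetical names. A reference terminated by a semicolon is decoded regardless of this change, while the unterminated samples are the ones whose handling the commit alters.

```python
from html.parser import HTMLParser


class AttrCollector(HTMLParser):
    """Hypothetical helper: keeps the attribute list of each start tag."""

    def __init__(self):
        super().__init__()
        self.attr_lists = []

    def handle_starttag(self, tag, attrs):
        self.attr_lists.append(attrs)


def first_attrs(markup):
    """Return the attributes of the first start tag found in `markup`."""
    parser = AttrCollector()
    parser.feed(markup)
    parser.close()
    return parser.attr_lists[0] if parser.attr_lists else []


if __name__ == "__main__":
    # "&amp;" is terminated by a semicolon and is decoded to "&".
    print(first_attrs('<a href="?a=1&amp;b=2">'))
    # "&not" followed by a letter or by "=": results depend on whether the
    # interpreter includes this change.
    print(first_attrs('<a href="?x=1&notit=2">'))
    print(first_attrs('<a href="?x=1&not=2">'))
```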
|
* gh-95813: Improve HTMLParser from the view of inheritance
* gh-95813: Add unittest
* Address code review
|
Support for HTMLParserError was removed back in 2014 with commit
73a4359eb0eb624c588c5d52083ea4944f9787ea, however this small block was
missed.
|
Fix typos in the Lib directory as identified by codespell.
Co-authored-by: Terry Jan Reedy <tjreedy@udel.edu>
|
* bpo-41748: Adds tests for unquoted attributes with comma
* bpo-41748: Handles unquoted attributes with comma
* bpo-41748: Addresses review comments
* bpo-41748: Addresses review comments
* Adds more test cases
* Simplifies the regex for handling spaces
* bpo-41748: Moves attributes tests under the right class
* bpo-41748: Addresses review about duplicate attributes
* bpo-41748: Adds NEWS.d entry for this patch
|
It has been deprecated since Python 3.4.
|
(#2099)
elem is the result of .lower() 6 lines above the handle_endtag call.
Patch by Motoki Naruse
|
* Revert "Fixed a typo in the HTMLParser.feed docstrings. The docstring started with an 'r', like a
The docstring was correct. I read the patch in the opposite direction, as *adding* the "r" prefix.
This reverts commit 5ba185039f1bd465d3f82531324fd3fe1ee42f0c.
|
an 'r', like a rawstring. (#1759)
|
And most of the tools.
Patch by Emanuel Barry, reviewed by me, Serhiy Storchaka, and
Martin Panter.
|
Most fixes to Doc/ and Lib/ directories by Ville Skyttä.
|
convert_charrefs is True.
|
HTMLParser to True. Patch by Berker Peksag.
|
the HTMLParserError exception have been removed.
|
True, automatically converts all character references.
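
The difference the `convert_charrefs` argument makes can be shown with a small sketch. `CharrefDemo` and `parse` are hypothetical names; `convert_charrefs`, `handle_data`, `handle_entityref` and `handle_charref` are the documented `HTMLParser` API.

```python
from html.parser import HTMLParser


class CharrefDemo(HTMLParser):
    """Hypothetical helper: records text and character-reference events."""

    def __init__(self, convert_charrefs):
        super().__init__(convert_charrefs=convert_charrefs)
        self.events = []

    def handle_data(self, data):
        self.events.append(("data", data))

    def handle_entityref(self, name):
        # Only called when convert_charrefs is False.
        self.events.append(("entityref", name))

    def handle_charref(self, name):
        # Only called when convert_charrefs is False.
        self.events.append(("charref", name))


def parse(markup, convert_charrefs):
    """Parse `markup` with the given setting and return the events."""
    parser = CharrefDemo(convert_charrefs)
    parser.feed(markup)
    parser.close()
    return parser.events


if __name__ == "__main__":
    print(parse("a &amp; b", True))   # references decoded inside the text
    print(parse("a &amp; b", False))  # references reported separately
```

With the setting enabled, subclasses that only care about text need to override nothing but `handle_data`.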
|
HTML5 standard.
|
strict argument of HTMLParser or the HTMLParser.error method are used.
|
Barlow.
|
deprecated now that the parser is able to parse invalid markup.
|
HTMLParser.
|
``<script>...</script>`` and ``<style>...</style>``.
|
when strict=False.
|
than 128 entities. Patch by Peter Otten.
|
svn+ssh://pythondev@svn.python.org/python/branches/py3k
........
r87542 | senthil.kumaran | 2010-12-28 23:55:16 +0800 (Tue, 28 Dec 2010) | 3 lines
Fix Issue10759 - html.parser.unescape() fails on HTML entities with incorrect syntax
........
|
svn+ssh://pythondev@svn.python.org/python/branches/py3k
................
r81504 | victor.stinner | 2010-05-24 23:46:25 +0200 (Mon, 24 May 2010) | 13 lines
Recorded merge of revisions 81500-81501 via svnmerge from
svn+ssh://pythondev@svn.python.org/python/trunk
........
r81500 | victor.stinner | 2010-05-24 23:33:24 +0200 (Mon, 24 May 2010) | 2 lines
Issue #6662: Fix parsing of malformatted charref (&#bad;)
........
r81501 | victor.stinner | 2010-05-24 23:37:28 +0200 (Mon, 24 May 2010) | 2 lines
Add the author of the last fix (Issue #6662)
........
........
................
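
As an illustration of the malformed reference named in Issue #6662, the sketch below feeds `&#bad;` to a current parser and to `html.unescape`. `TextCollector` and `collect_text` are hypothetical names, and `html.unescape` is the present-day module-level function, not the code path touched by this 2010 fix.

```python
import html
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Hypothetical helper: accumulates all text passed to handle_data."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def collect_text(markup):
    """Return the character data reported for `markup`, joined together."""
    parser = TextCollector()
    parser.feed(markup)
    parser.close()
    return "".join(parser.chunks)


if __name__ == "__main__":
    # "&#bad;" is not a valid numeric reference, so it is passed through
    # as ordinary text and no error is raised.
    print(repr(collect_text("&#bad;")))
    print(repr(html.unescape("&#bad;")))
    # A valid numeric reference, for comparison.
    print(repr(html.unescape("&#65;")))
```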
|