| Commit message (Collapse) | Author | Age | Files | Lines |
| |
|
|
| |
(GH-140904)
|
| |
|
|
|
|
| |
* the "plaintext" element
* the RAWTEXT elements "xmp", "iframe", "noembed" and "noframes"
* optionally RAWTEXT (if scripting=True) element "noscript"
|
| |
|
|
| |
Bogus comments that start with "<![CDATA[" should not include the starting "!"
in their value.
|
| |
|
|
|
|
|
|
|
| |
"] ]>" and "]] >" no longer end the CDATA section.
Make CDATA section parsing context-dependent.
Add private method HTMLParser._set_support_cdata() to change the context.
If called with True, "<![CDATA[" starts a CDATA section which ends with "]]>".
If called with False, "<![CDATA[" starts a bogus comment which ends with ">".
|
| |
|
|
|
|
| |
(#135310)
Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
Co-authored-by: Łukasz Langa <lukasz@langa.pl>
|
| |
|
|
|
| |
in HTMLParser (GH-136908)
This fixes a regression introduced in GH-135930.
|
| |
| |
(GH-135664)
* "--!>" now ends the comment.
* "-- >" no longer ends the comment.
* Support abnormally ended empty comments "<!-->" and "<!--->".
---------
Co-authored-by: Kerim Kabirov <the.privat33r+gh@pm.me>
Co-authored-by: Ezio Melotti <ezio.melotti@gmail.com>
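The comment rules listed in this entry can be probed with a small harness. This is a minimal sketch: `CommentCollector` and `comments_in` are hypothetical helper names, not part of `html.parser`, and the edge-case samples are printed without expected output because the result depends on whether the running Python includes this change.

```python
from html.parser import HTMLParser


class CommentCollector(HTMLParser):
    """Hypothetical helper: records every comment the parser reports."""

    def __init__(self):
        super().__init__()
        self.comments = []

    def handle_comment(self, data):
        self.comments.append(data)


def comments_in(markup):
    """Feed markup, signal end of input, and return the collected comments."""
    parser = CommentCollector()
    parser.feed(markup)
    parser.close()
    return parser.comments


# A well-formed comment is reported the same way on every version.
print(comments_in("<!-- note -->"))

# Edge cases named in the entry above; results vary by Python version.
for sample in ("<!--a--!>b", "<!--a-- >b-->", "<!-->x", "<!--->x"):
    print(repr(sample), comments_in(sample))
```

Running the loop on two interpreters, one with and one without the change, is a quick way to see which comment terminators each accepts.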
|
| |
| |
HTML5 standard (GH-135930)
* Whitespace is no longer accepted between `</` and the tag name.
E.g. `</ script>` does not end the script section.
* Vertical tabulation (`\v`) and non-ASCII whitespace characters are no longer
recognized as whitespace. The only whitespace characters are `\t\n\r\f `.
* The null character (U+0000) no longer ends the tag name.
* Attributes and slashes after the tag name in end tags are now ignored,
instead of terminating after the first `>` in a quoted attribute value.
E.g. `</script/foo=">"/>`.
* Multiple slashes and whitespace between the last attribute and the closing `>`
are now ignored in both start and end tags. E.g. `<a foo=bar/ //>`.
* Multiple `=` between the attribute name and value are no longer collapsed.
E.g. `<a foo==bar>` produces attribute "foo" with value "=bar".
* Whitespace between the `=` separator and the attribute name or value is no
longer ignored. E.g. `<a foo =bar>` produces two attributes, "foo" and
"=bar", both with value None; `<a foo= bar>` produces two attributes:
"foo" with value "" and "bar" with value None.
* Fix Sphinx errors.
* Apply suggestions from code review
Co-authored-by: Ezio Melotti <ezio.melotti@gmail.com>
* Address review comments.
* Move to Security.
---------
Co-authored-by: Ezio Melotti <ezio.melotti@gmail.com>
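The attribute examples in this entry can be inspected with a short script. This is a sketch only: `StartTagCollector` and `start_tags` are hypothetical names introduced for illustration. The edge cases are printed without expected output, since the values quoted in the entry apply only to Python builds that include this change.

```python
from html.parser import HTMLParser


class StartTagCollector(HTMLParser):
    """Hypothetical helper: records (tag, attrs) for every start tag."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append((tag, attrs))


def start_tags(markup):
    """Parse markup and return the list of (tag, attrs) pairs seen."""
    parser = StartTagCollector()
    parser.feed(markup)
    parser.close()
    return parser.tags


# Ordinary markup: quoted, unquoted and valueless attributes.
print(start_tags('<a href="x" id=y checked>'))

# Edge cases quoted in the entry above; output depends on the Python version.
for sample in ("<a foo==bar>", "<a foo =bar>", "<a foo= bar>", "<a foo=bar/ //>"):
    print(repr(sample), start_tags(sample))
```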
|
| |
|
|
|
|
| |
HTMLParser (GH-135464)
End-of-file errors are now handled according to the HTML5 specs --
comments and declarations are automatically closed, tags are ignored.
|
| |
|
|
|
|
|
| |
(GH-22658)
When calling .close() the HTMLParser should flush all remaining content,
even when that content is in an unclosed script or style tag.
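A minimal sketch of what flushing on `.close()` looks like from the caller's side. `DataCollector` and `text_after_close` are hypothetical names; the unclosed `<script>` case is printed without an expected value because only Python versions that include this fix report its content.

```python
from html.parser import HTMLParser


class DataCollector(HTMLParser):
    """Hypothetical helper: accumulates every chunk passed to handle_data."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def text_after_close(markup):
    """Feed markup, call close() to signal end of input, return all text."""
    parser = DataCollector()
    parser.feed(markup)
    parser.close()
    return "".join(parser.chunks)


# Text after an unclosed ordinary tag is delivered on every version.
print(repr(text_after_close("<p>hello")))

# Text inside an unclosed script element; delivered only with this fix.
print(repr(text_after_close("<script>var x = 1;")))
```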
|
| |
|
| |
Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
|
| |
|
|
|
|
|
|
|
|
| |
in attribute values (GH-95215)
According to the HTML5 spec, named character references in attribute values
should only be processed if they are not followed by an ASCII alphanumeric
or an equals sign.
https://html.spec.whatwg.org/multipage/parsing.html#named-character-reference-state
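The rule can be observed by collecting attribute values as the parser reports them. This is a sketch under stated assumptions: `AttrCollector` and `attr_values` are hypothetical names, and the query-string sample is printed without expected output because the result differs between Python versions with and without this change.

```python
from html.parser import HTMLParser


class AttrCollector(HTMLParser):
    """Hypothetical helper: records (name, value) for every attribute."""

    def __init__(self):
        super().__init__()
        self.attrs = []

    def handle_starttag(self, tag, attrs):
        self.attrs.extend(attrs)


def attr_values(markup):
    """Parse markup and return all attributes as (name, value) pairs."""
    parser = AttrCollector()
    parser.feed(markup)
    parser.close()
    return parser.attrs


# A reference terminated by ";" is always decoded in attribute values.
print(attr_values('<a title="a &amp; b">'))

# "&not" followed by "=" is the case the spec rule covers; version-dependent.
print(attr_values('<a href="?x=1&not=2">'))
```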
|
| |
|
|
|
|
|
| |
* gh-95813: Improve HTMLParser from the view of inheritance
* gh-95813: Add unittest
* Address code review
|
| |
|
|
|
| |
Support for HTMLParseError was removed back in 2014 with commit
73a4359eb0eb624c588c5d52083ea4944f9787ea, however this small block was
missed.
|
| |
|
|
|
| |
Fix typos in the Lib directory as identified by codespell.
Co-authored-by: Terry Jan Reedy <tjreedy@udel.edu>
|
| |
| |
* bpo-41748: Adds tests for unquoted attributes with comma
* bpo-41748: Handles unquoted attributes with comma
* bpo-41748: Addresses review comments
* bpo-41748: Addresses review comments
* Adds more test cases
* Simplifies the regex for handling spaces
* bpo-41748: Moves attributes tests under the right class
* bpo-41748: Addresses review about duplicate attributes
* bpo-41748: Adds NEWS.d entry for this patch
|
| |
|
| |
It has been deprecated since Python 3.4.
|
| |
|
|
|
|
| |
(#2099)
elem is the result of .lower() 6 lines above the handle_endtag call.
Patch by Motoki Naruse
|
| |
|
|
|
| |
* Revert "Fixed a typo in the HTMLParser.feed docstrings. The docstring started with an 'r', like a rawstring. (#1759)"
The docstring was correct. I read the patch in the opposite direction, as *adding* the "r" prefix.
This reverts commit 5ba185039f1bd465d3f82531324fd3fe1ee42f0c.
|
| |
|
|
| |
an 'r', like a rawstring. (#1759)
|
| |
|
|
|
|
|
| |
And most of the tools.
Patch by Emanuel Barry, reviewed by me, Serhiy Storchaka, and
Martin Panter.
|
| |
|
|
| |
Most fixes to Doc/ and Lib/ directories by Ville Skyttä.
|
| |\ |
|
| | |
| |
| |
| | |
convert_charrefs is True.
|
| | |
| |
| |
| | |
HTMLParser to True. Patch by Berker Peksag.
|
| |/
|
|
| |
the HTMLParseError exception have been removed.
|
| |\ |
|
| | | |
|
| | |
| |
| |
| | |
True, automatically converts all character references.
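A short sketch of the effect of `convert_charrefs`, the `HTMLParser` constructor argument this entry refers to. `EventCollector` and `events` are hypothetical names used only for this illustration.

```python
from html.parser import HTMLParser


class EventCollector(HTMLParser):
    """Hypothetical helper: logs text and character-reference callbacks."""

    def __init__(self, convert_charrefs):
        super().__init__(convert_charrefs=convert_charrefs)
        self.events = []

    def handle_data(self, data):
        self.events.append(("data", data))

    def handle_entityref(self, name):
        self.events.append(("entityref", name))

    def handle_charref(self, name):
        self.events.append(("charref", name))


def events(markup, convert_charrefs):
    """Parse markup and return the logged (kind, payload) events."""
    parser = EventCollector(convert_charrefs)
    parser.feed(markup)
    parser.close()
    return parser.events


markup = "<p>a &amp; b &#62; c</p>"

# With conversion on, references arrive already decoded inside the text.
print(events(markup, convert_charrefs=True))

# With conversion off, each reference triggers its own callback.
print(events(markup, convert_charrefs=False))
```

With conversion enabled, a handler only needs `handle_data`; with it disabled, `handle_entityref` and `handle_charref` must be implemented to avoid losing the references.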
|
| | | |
|
| | | |
|
| |\ \
| |/ |
|
| | |
| |
| |
| | |
HTML5 standard.
|
| | |
| |
| |
| | |
strict argument of HTMLParser or the HTMLParser.error method are used.
|
| |\ \
| |/ |
|
| | |
| |
| |
| | |
Barlow.
|
| |/ |
|
| | |
|
| |
|
|
| |
deprecated now that the parser is able to parse invalid markup.
|
| |
|
|
| |
HTMLParser.
|
| |
|
|
| |
``<script>...</script>`` and ``<style>...</style>``.
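The subject of this entry is truncated, so the sketch below only illustrates the general treatment of these two elements: `HTMLParser` passes their content to `handle_data` as raw text rather than parsing it as markup. `ScriptProbe` and `probe` are hypothetical names.

```python
from html.parser import HTMLParser


class ScriptProbe(HTMLParser):
    """Hypothetical helper: records start tags and text separately."""

    def __init__(self):
        super().__init__()
        self.start_tags = []
        self.data = []

    def handle_starttag(self, tag, attrs):
        self.start_tags.append(tag)

    def handle_data(self, data):
        self.data.append(data)


def probe(markup):
    """Return (start tags seen, concatenated text) for the given markup."""
    parser = ScriptProbe()
    parser.feed(markup)
    parser.close()
    return parser.start_tags, "".join(parser.data)


# The "<b>" inside the script is text, not a start tag.
tags, text = probe('<script>if (a < b) { s = "<b>bold"; }</script>')
print(tags)
print(text)
```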
|
| |
|
|
| |
when strict=False.
|
| |
|
|
| |
than 128 entities. Patch by Peter Otten.
|