path: root/Lib/sgmllib.py
Commit message | Author | Age | Files | Lines
* Deprecate htmllib and sgmllib for 3.0. | Georg Brandl | 2008-06-01 | 1 | -0/+5
|
* Replace unnecessary function call. | Georg Brandl | 2007-08-06 | 1 | -1/+1
|
* Forward port of 51850 from release25-maint branch. | Neal Norwitz | 2006-09-11 | 1 | -10/+9
|   As mentioned on python-dev, reverting patch #1504333 because it introduced an infinite loop in rev 47154. This patch also adds a test to prevent the regression.
* SF bug #1504333: sgmllib should allow angle brackets in quoted values | Fred Drake | 2006-06-29 | 1 | -9/+10
|   (modified patch by Sam Ruby; changed to use separate REs for start and end tags to reduce matching cost for end tags; extended tests; updated to avoid breaking previous changes to support IPv6 addresses in unquoted attribute values)
* - SF bug #853506: IP6 address parsing in sgmllib | Fred Drake | 2006-06-23 | 1 | -3/+3
|   ('[' and ']' were not accepted in unquoted attribute values)
|   - cleaned up tests of character and entity reference decoding so the tests cover the documented relationships among handle_charref, handle_entityref, convert_charref, convert_codepoint, and convert_entityref, without bringing up Unicode issues that sgmllib cannot be involved in
* fix change that broke the htmllib tests | Fred Drake | 2006-06-17 | 1 | -2/+2
|
* SF patch 1504676: Make sgmllib char and entity references pluggable | Fred Drake | 2006-06-16 | 1 | -37/+44
|   (implementation/tests contributed by Sam Ruby)
* explain an XXX in more detail | Fred Drake | 2006-06-14 | 1 | -0/+3
|
* Whitespace normalization. | Tim Peters | 2006-04-03 | 1 | -1/+1
|
* patch #1462498: handle entityrefs in attribute values. | Georg Brandl | 2006-04-01 | 1 | -3/+31
|
* add name that should be considered public to __all__ | Fred Drake | 2004-09-09 | 1 | -1/+1
|
* Replace backticks with repr() or "%r" | Walter Dörwald | 2004-02-12 | 1 | -3/+3
|   From SF patch #852334.
* Patch #793559: Reset __starttext_tag. Fixes #709491. Backported to 2.3. | Martin v. Löwis | 2003-09-20 | 1 | -1/+1
|
* Allow "@" in unquoted attribute values. | Fred Drake | 2003-04-29 | 1 | -1/+1
|   Added test that checks for characters allowed in the query part of URLs.
|   Backport candidate.
* Whitespace normalization. | Tim Peters | 2003-04-24 | 1 | -4/+4
|
* Patch #545300: Support marked sections. | Martin v. Löwis | 2003-03-30 | 1 | -14/+8
|
* Accept commas in unquoted attribute values. | Fred Drake | 2003-03-14 | 1 | -1/+1
|   This closes SF patch #669683.
* Replace boolean test with is None. | Raymond Hettinger | 2002-06-02 | 1 | -1/+1
|
* SF 563203. Replaced 'has_key()' with 'in'. | Raymond Hettinger | 2002-06-01 | 1 | -1/+1
|
* Re-arrange things and remove some unused variables/imports to keep pychecker | Fred Drake | 2001-10-26 | 1 | -1/+0
|   happy. (This does not cover everything it complained about, though.)
* Re-factor the SGMLParser class to use the new markupbase.ParserBase class. | Fred Drake | 2001-09-24 | 1 | -75/+34
|   Use a new internal method, error(), consistently to raise parse errors; the new base class also uses this.
|   Adjust the parse_comment() method to return the new offset into the buffer instead of the number of characters scanned; this was the only helper method that did it this way, so we have better consistency now. Required to share the new base class.
|   This fixes SF bug #448482 and #453706.
* Patch #444359: Remove unused imports. | Martin v. Löwis | 2001-08-02 | 1 | -1/+0
|
* Make the new docstrings better conform to Guido's style guide. | Fred Drake | 2001-07-19 | 1 | -7/+15
|
* Added docstrings based on a patch by Evelyn Mitchell. | Fred Drake | 2001-07-19 | 1 | -11/+16
|   This closes SF patch #440153.
* In CDATA mode, make sure entity-reference syntax is not interpreted; | Fred Drake | 2001-07-16 | 1 | -8/+26
|   entity references are not allowed in that mode.
|   Do a better job of scanning <!DOCTYPE ...> declarations; based on the code in HTMLParser.py.
* Be more permissive in what is accepted as an attribute name; this makes | Fred Drake | 2001-07-14 | 1 | -1/+1
|   this module slightly more resilient in the face of XHTML input, or just colons in attribute names.
* Allow underscores in tag names and quote characters in unquoted attribute | Fred Drake | 2001-07-05 | 1 | -2/+2
|   values. The change for attribute values matches the way Mozilla and Navigator view the world, at least.
|   This closes SF bug #436621.
* parse_declaration(): be more lenient in what we accept. We now | Guido van Rossum | 2001-05-21 | 1 | -12/+7
|   basically accept <!...> where the dots can be single- or double-quoted strings or any other character except >.
|   Background: I found a real-life example that failed to parse with the old assumption: http://www.opensource.org/licenses/jabberpl.html contains a few constructs of the form <![if !supportLists]>...<![endif]>.
* Fix typo in exception name (SGMLParserError should be SGMLParseError) | Guido van Rossum | 2001-04-15 | 1 | -1/+1
|   found by Neal Norwitz's PyChecker.
* Change RuntimeError to SGMLParseError, which subclasses RuntimeError | Fred Drake | 2001-03-16 | 1 | -5/+53
|   for backward compatibility.
|   Add support for SGML declaration syntax (<!....>) to some reasonable degree. This does not support everything allowed in SGML, but should work with "real" HTML (internal subset in a DOCTYPE is not handled). The content of the declaration is passed to the .handle_decl() method, which can be overridden by subclasses.
* Change "[%s]" % string.whitespace to r"\s" in regular expressions. | Fred Drake | 2001-03-14 | 1 | -4/+3
|
* SF Patch #103839 by dougfort: Allow ';' in attributes | Guido van Rossum | 2001-02-19 | 1 | -1/+1
|   sgmllib does not recognize HTML attributes containing the semicolon ';' character. This may be in accordance with the HTML spec, but there are sites that use it (excite.com) and the browsers I regularly use (IE5, Netscape, Opera) all handle it.
|   Doug Fort, Downright Software LLC
* bunch more __all__ lists | Skip Montanaro | 2001-02-15 | 1 | -0/+1
|   also modified check_all function to suppress all warnings since they aren't relevant to what this test is doing (allows quiet checking of regsub, for instance)
* Use ValueError instead of string.atoi.error, since we've switched to | Eric S. Raymond | 2001-02-09 | 1 | -1/+1
|   int().
* String method conversion. | Eric S. Raymond | 2001-02-09 | 1 | -5/+5
|
* Whitespace normalization. | Tim Peters | 2001-01-15 | 1 | -2/+2
|
* Update the code to better reflect recommended style: | Fred Drake | 2000-12-12 | 1 | -2/+2
|   Use != instead of <> since <> is documented as "obsolescent".
|   Use "is" and "is not" when comparing with None or type objects.
* [Old patch that hadn't been checked in.] | Fred Drake | 2000-06-29 | 1 | -2/+11
|   get_starttag_text(): New method. Return the text of the most recently parsed start tag, from the '<' to the '>' or '/'. Not really useful for structure processing, but requested for Web-related use. May also be useful for being able to re-generate the input from the parse events, but there's no equivalent for end tags.
|   attrfind: Be a little more forgiving of unquoted attribute values.
* typos fixed by Rob Hooft | Jeremy Hylton | 2000-06-28 | 1 | -1/+1
|
* The third and final doc-string sweep by Ka-Ping Yee. | Guido van Rossum | 2000-02-04 | 1 | -1/+1
|   The attached patches update the standard library so that all modules have docstrings beginning with one-line summaries.
|   A new docstring was added to formatter. The docstring for os.py was updated to mention nt, os2, ce in addition to posix, dos, mac.
* Allow recognition of attributes even if they don't have space in front | Fred Drake | 1999-01-25 | 1 | -1/+1
|   of them. I.e., '<a name="foo"href="bar.html">' will now have two attributes recognized. Based on comments from newsgroup.
* Patch by Chris Herborth (posted to comp.lang.python) to make it behave | Guido van Rossum | 1998-08-24 | 1 | -3/+3
|   with tags that have - or . in their names.
* Put back the call to report_unbalanced() that was lost when | Guido van Rossum | 1998-07-07 | 1 | -0/+2
|   parse_endtag() was restructured in parse_endtag() and finish_endtag().
* Patch by Lars Marius Garshol: | Guido van Rossum | 1998-05-28 | 1 | -2/+30
|   - Handle <? processing instructions >.
|   - Allow . and - in entity names.
|   Also fixed an oversight in the previous fix (in one place, [ \t\r\n] was used instead of string.whitespace).
* Fix regexp for attrfind; bug reported by Lars Marius Garshol | Fred Drake | 1998-04-16 | 1 | -4/+4
|   <larsga@ifi.uio.no>.
* Mass check-in after untabifying all files that need it. | Guido van Rossum | 1998-03-26 | 1 | -275/+275
|
* Although it's hard to be sure, I *think* this is a working conversion | Guido van Rossum | 1997-10-23 | 1 | -67/+67
|   from regex to re style regular expressions. This should make sgmllib and htmllib threadsafe, so I can now create a threaded version of webchecker...
* (sgmllib.py): Partial acceptance of patch from David Leonard | Fred Drake | 1996-12-16 | 1 | -1/+1
|   <leonard@dstc.edu.au>; allows hyphen and period in the middle of attribute names. Still not allowed as first character; as first character these are illegal in the Reference Concrete Syntax, and we've not identified any use of these characters as the first char in an attribute name in deployment on the web.
* Reformatted with 4-space tab stops. | Guido van Rossum | 1996-03-28 | 1 | -286/+406
|   Allow '=' and '~' in unquoted attribute values.
|
|   Added overridable methods handle_starttag(tag, method, attrs) and handle_endtag(tag, method) so subclasses can decide whether they really want to call the method (e.g. when suppressing some portion of the document).
|
|   Added support for a number of SGML shortcuts:
|
|       shorthand        full notation
|       <tag>...<>...    <tag>...<tag>...
|       <tag>...</>      <tag>...</tag>
|       <tag/.../        <tag>...</tag>
|       <tag1<tag2>      <tag1><tag2>
|       </tag1</tag2>    </tag1></tag2>
|       </tag1<tag2>     </tag1><tag2>
|
|   This required factoring out some common actions and rationalizing the interface to parse_endtag(), so as to make the code more readable.
|
|   Fixed syntax for &entity and &#char references so the trailing semicolon is optional; removed explicit support for trailing period (which was a TBL mistake in HTML 0.0).
|
|   Generalized the test program.
|
|   Tried to speed things up a little. (More to come after the profile results are in.)
|
|   Fix error recovery: call the end methods popped from the stack instead of the one that triggers. (Plus some complications because of the way HTML extensions are handled in Grail.)
* typos in attrfind regex | Guido van Rossum | 1995-10-06 | 1 | -1/+1
|
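
Several entries in this log loosen how attributes are matched: no whitespace is required in front of an attribute (1999-01-25), and '@', commas and ';' are accepted in unquoted values (2003-04-29, 2003-03-14, 2001-02-19). The sketch below illustrates that kind of lenient matching. It is not the attrfind expression from sgmllib.py; the pattern ATTR and the helper parse_attrs are hypothetical names made up for this illustration, and the set of accepted characters is an assumption.

```python
import re

# Hypothetical pattern (NOT sgmllib's attrfind): leading whitespace is
# optional, and an unquoted value may contain anything except whitespace
# and '>'.
ATTR = re.compile(
    r'\s*([a-zA-Z_][-.:a-zA-Z0-9_]*)'     # attribute name
    r'(?:\s*=\s*'                          # optional "=value" part
    r'(\'[^\']*\'|"[^"]*"|[^\s>]*))?'      # quoted or unquoted value
)


def parse_attrs(text):
    """Return (name, value) pairs found in the attribute part of a tag."""
    attrs = []
    pos = 0
    while pos < len(text):
        match = ATTR.match(text, pos)
        if match is None:
            break  # nothing attribute-like at this position
        name, value = match.group(1), match.group(2)
        if value is None:
            value = name  # bare attribute: assume the name as its value
        elif value[:1] in ("'", '"'):
            value = value[1:-1]  # strip the surrounding quotes
        attrs.append((name.lower(), value))
        pos = match.end()
    return attrs


if __name__ == "__main__":
    # The 1999-01-25 example: no space between the two attributes.
    print(parse_attrs(' name="foo"href="bar.html"'))
```

Making the leading whitespace optional is what lets the second attribute be recognized right after the closing quote of the first; a pattern that required whitespace there would stop after `name`.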