path: root/Lib/sgmllib.py
Commit message | Author | Date | Files | Lines
* Allow recognition of attributes even if they don't have space in front of them. | Fred Drake | 1999-01-25 | 1 | -1/+1
  I.e., '<a name="foo"href="bar.html">' will now have two attributes recognized. Based on comments from the newsgroup.
* Patch by Chris Herborth (posted to comp.lang.python) to make it behave with tags that have - or . in their names. | Guido van Rossum | 1998-08-24 | 1 | -3/+3
* Put back the call to report_unbalanced() that was lost when parse_endtag() was restructured into parse_endtag() and finish_endtag(). | Guido van Rossum | 1998-07-07 | 1 | -0/+2
* Patch by Lars Marius Garshol: | Guido van Rossum | 1998-05-28 | 1 | -2/+30
  - Handle <? processing instructions >.
  - Allow . and - in entity names.
  Also fixed an oversight in the previous fix (in one place, [ \t\r\n] was used instead of string.whitespace).
* Fix regexp for attrfind; bug reported by Lars Marius Garshol <larsga@ifi.uio.no>. | Fred Drake | 1998-04-16 | 1 | -4/+4
* Mass check-in after untabifying all files that need it. | Guido van Rossum | 1998-03-26 | 1 | -275/+275
* Although it's hard to be sure, I *think* this is a working conversion from regex-style to re-style regular expressions. | Guido van Rossum | 1997-10-23 | 1 | -67/+67
  This should make sgmllib and htmllib threadsafe, so I can now create a threaded version of webchecker...
* (sgmllib.py): Partial acceptance of patch from David Leonard <leonard@dstc.edu.au>; allows hyphen and period in the middle of attribute names. | Fred Drake | 1996-12-16 | 1 | -1/+1
  They are still not allowed as the first character: there they are illegal in the Reference Concrete Syntax, and we've not identified any deployed use on the web of these characters as the first character of an attribute name.
* Reformatted with 4-space tab stops. | Guido van Rossum | 1996-03-28 | 1 | -286/+406
  Allow '=' and '~' in unquoted attribute values.

  Added overridable methods handle_starttag(tag, method, attrs) and handle_endtag(tag, method) so subclasses can decide whether they really want to call the method (e.g. when suppressing some portion of the document).

  Added support for a number of SGML shortcuts:

      shorthand          full notation
      <tag>...<>...      <tag>...<tag>...
      <tag>...</>        <tag>...</tag>
      <tag/.../          <tag>...</tag>
      <tag1<tag2>        <tag1><tag2>
      </tag1</tag2>      </tag1></tag2>
      </tag1<tag2>       </tag1><tag2>

  This required factoring out some common actions and rationalizing the interface to parse_endtag(), so as to make the code more readable.

  Fixed syntax for &entity and &#char references so the trailing semicolon is optional; removed explicit support for trailing period (which was a TBL mistake in HTML 0.0).

  Generalized the test program.

  Tried to speed things up a little. (More to come after the profile results are in.)

  Fix error recovery: call the end methods popped from the stack instead of the one that triggers. (Plus some complications because of the way HTML extensions are handled in Grail.)
* typos in attrfind regex | Guido van Rossum | 1995-10-06 | 1 | -1/+1
* allow _ in attr names (Netscape!) | Guido van Rossum | 1995-09-30 | 1 | -1/+1
* fix <!...!> parsing; added verbose option; don't lowercase entityrefs | Guido van Rossum | 1995-09-22 | 1 | -5/+7
* support value-less attributes, using regex.group() | Guido van Rossum | 1995-09-01 | 1 | -14/+8
* added note about missing features | Guido van Rossum | 1995-08-10 | 1 | -0/+2
* changed comment parsing | Guido van Rossum | 1995-08-04 | 1 | -13/+14
* make reporting unbalanced tags an overridable method | Guido van Rossum | 1995-06-22 | 1 | -2/+7
* remove redundant backslashes; some cosmetics | Guido van Rossum | 1995-03-04 | 1 | -9/+10
* added html parser and supporting cast | Guido van Rossum | 1995-02-27 | 1 | -0/+321