cpython.git - https://github.com/python/cpython.git

	Commit message	Author	Age	Files	Lines
*	SF bug 753592, websucker bug	Neal Norwitz	2003-07-01	1	-1/+1
	Pass the proper variable when the user supplies a directory. Will backport.
*	When bad HTML is encountered, ignore the page rather than failing with	Mark Hammond	2003-02-27	1	-1/+9
	a traceback.
*	Handle the Content-Type header a little more appropriately: if it	Fred Drake	2002-11-12	1	-0/+3
	contains options, drop them to get the major/minor content type. Modified from the supplied patch to support more whitespace variation. Closes SF patch #613605.
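A minimal sketch of the idea described in this commit, not the code from the patch: everything after the first ";" in a Content-Type value is treated as options and dropped. The function name `major_minor_type` and the lowercasing step are assumptions made for illustration.

```python
def major_minor_type(header_value):
    """Return 'type/subtype' from a Content-Type value, dropping options."""
    # Everything after the first ';' is options such as charset=...
    ctype = header_value.split(";", 1)[0]
    # Tolerate surrounding whitespace and mixed case (an assumption here).
    return ctype.strip().lower()


if __name__ == "__main__":
    print(major_minor_type("text/html; charset=ISO-8859-1"))
    print(major_minor_type(" Text/HTML ;charset=utf-8"))
```

With this shape, a checker can compare the result against "text/html" regardless of which options the server appended.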
*	Apply diff2.txt from SF patch http://www.python.org/sf/572113	Walter Dörwald	2002-09-11	5	-27/+22
	(with one small bugfix in bgen/bgen/scantools.py) This replaces string module functions with string methods for the stuff in the Tools directory. Several uses of string.letters etc. are still remaining.
*	Apply diff.txt from SF patch http://www.python.org/sf/561478	Walter Dörwald	2002-06-06	1	-1/+2
	This uses cgi.parse_header() in Checker.checkforhtml(), so that webchecker recognises the mime type text/html even if options are specified.
*	[Bug #512799] urllib.splittype() returns a 2-tuple. (Reported by seb bacon)	Andrew M. Kuchling	2002-03-08	1	-1/+1
*	Fix SF bug #482171: webchecker dies on file: URLs w/o robots.txt	Guido van Rossum	2001-12-11	1	-2/+2
	The cause seems to be that when a file URL doesn't exist, urllib.urlopen() raises OSError instead of IOError. Simply add this to the except clause. Not elegant, but effective. :-)
*	Only catch NameError and TypeError when attempting to subclass an	Fred Drake	2001-05-11	1	-1/+1
	exception (for compatibility with old versions of Python).
*	Added more link attributes based on additional information from Chris	Fred Drake	2001-04-05	1	-1/+13
	McCafferty <christopher.mccafferty@csg.ch>, and a bit of experimentation with Navigator 4.7. HTML-as-deployed is evil!
*	A number of improvements based on a discussion with Chris McCafferty	Fred Drake	2001-04-04	1	-2/+24
	<christopher.mccafferty@csg.ch>: Add javascript: and telnet: to the types of URLs we ignore. Add support for several additional URL-valued attributes on the BODY, FRAME, IFRAME, LINK, OBJECT, and SCRIPT elements.
*	Patch inspired by Just van Rossum: on the Mac, in savefilename(), make	Guido van Rossum	2000-04-25	1	-1/+3
	the path to save a relative path by prefixing it with os.sep (':'). Also fix an indent inconsistency in the same function.
*	Moved robotparser.py to the Lib directory.	Guido van Rossum	2000-03-29	1	-97/+0
	If you do a "cvs update" in the Lib directory, it will pop up there.
*	Fix suggested by Magnus Kessler: in class Page, it is possible for	Guido van Rossum	2000-03-28	1	-1/+4
	self.parser to be None; in that case don't dereference it in getnames().
*	Skip Montanaro:	Guido van Rossum	2000-03-27	1	-17/+17
	The robotparser.py module currently lives in Tools/webchecker. In preparation for its migration to Lib, I made the following changes:
	* renamed the test() function _test
	* corrected the URLs in _test() so they refer to actual documents
	* added an "if __name__ == '__main__'" catcher to invoke _test() when run as a main program
	* added doc strings for the two main methods, parse and can_fetch
	* replaced usage of regsub and regex with corresponding re code
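For orientation, a short sketch of the two methods named above, parse and can_fetch, as they exist in present-day Python 3, where the module is available as urllib.robotparser. The rules, the "webchecker" user-agent string, and the example.com URLs are made up for illustration and do not come from the commit.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, supplied as a list of lines.
rules = [
    "User-agent: *",
    "Disallow: /private/",
]

rp = RobotFileParser()
rp.parse(rules)  # parse() takes the lines of a robots.txt file

# can_fetch(useragent, url) reports whether the rules allow the fetch.
allowed = rp.can_fetch("webchecker", "http://example.com/index.html")
blocked = rp.can_fetch("webchecker", "http://example.com/private/x.html")

if __name__ == "__main__":
    print(allowed, blocked)
```

Feeding parse() a list of lines avoids any network access, which is convenient for a self-test of the kind the commit describes.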
*	Complete the integration of Sam Bayer's fixes.	Guido van Rossum	1999-11-17	2	-912/+10
*	Changed from importing wcnew back to webchecker.	Guido van Rossum	1999-11-17	2	-6/+2
*	Integrated Sam Bayer's wcnew.py code. It seems silly to keep two	Guido van Rossum	1999-11-17	1	-46/+185
	files. Removed Sam's "SLB" change comments; otherwise this is the same as wcnew.py.
*	# NOT by Sam Bayer: reindented to use 4 spaces like the rest here,	Guido van Rossum	1999-11-17	1	-204/+203
	# and removed trailing whitespace.
*	Samuel L. Bayer:	Guido van Rossum	1999-11-17	1	-4/+12
	- same trick with "import wcnew; webchecker = wcnew" as above
	- updated readhtml() method to handle pair representation; used new name suppression infrastructure from wcnew.py to suppress processing name anchors
	[And untabified --GvR]
*	Samuel L. Bayer:	Guido van Rossum	1999-11-17	1	-17/+46
	- added -t and -a arguments
	- added "import wcnew; webchecker = wcnew" in place of "import webchecker" (I assume that if you're happy with the changes, you'll just replace webchecker.py with wcnew.py, but if I were to do that, the diffs would be incomprehensible)
	- fixed buggy -v argument (I think you got out of sync with the way verbosity was handled in webchecker vs. wcgui between 1.5 and 1.5.2)
	- made -v actually do something by adding a call to c.setflags() (probably the same problem as above)
	- updated references to URLs to accommodate wcnew.py's pair representation; added appropriate calls to format_url() to handle display; added argument to ListPanel() initialization to provide access to format_url()
	[And untabified --GvR]
*	Samuel L. Bayer:	Guido van Rossum	1999-11-17	1	-154/+178
	- same fixes from webchecker.py
	- incorporated small diff between current webchecker.py and 1.5.2
	- fixed bug where "extra roots" added with the -t argument were being checked as real roots, not just as possible continuations
	- added -a argument to suppress checking of name anchors
	[And untabified --GvR]
*	Samuel L. Bayer:	Guido van Rossum	1999-11-17	1	-1/+6
	- forced new done origins to set errors if they're in self.bad (fixes bug where only the first of a number of errorful references to a link is reported under some circumstances)
	- suppressed adding duplicates to self.todo list (cleans up printout in wcgui details)
*	Some changes (maybe not enough?) to make it work on Windows with local	Guido van Rossum	1999-04-26	1	-3/+3
	file URLs.
*	Added Samuel Bayer's new webchecker.	Guido van Rossum	1999-03-24	1	-0/+884
	Unfortunately his code breaks wcgui.py in a way that's not easy to fix. I expect that this is a temporary situation -- eventually Sam's changes will be merged back in. (The changes add a -t option to specify exceptions to the -x option, and explicit checking for #foo style fragment ids.)
*	Recover from failed saves; when a file turns out to be a directory,	Guido van Rossum	1999-01-03	1	-5/+17
	create a directory and move the original file to the index.html.
*	Added note() message to Page class -- this was used but didn't exist.	Guido van Rossum	1998-08-06	1	-0/+9
	(The alternative would be to call self.checker.note() but since self.checker might be None that's not quite right.)
*	Rewrite to support multiple suckers, each with their own thread.	Guido van Rossum	1998-07-08	1	-102/+140
*	Instead of printing, use self.message() or self.note().	Guido van Rossum	1998-07-08	2	-72/+63
*	# This is a new module I wrote over the weekend. Again, you missed the	Guido van Rossum	1998-06-15	1	-16/+37
	# checkin email because my PC doesn't have the "Mail" command.
	Add threading (now that it works). Also some small adaptations to Unix again.
*	Primitive GUI for websucker.	Guido van Rossum	1998-06-15	1	-0/+185
*	Fix the way a trailing / is changed to /index.html so that it	Guido van Rossum	1998-06-15	1	-2/+3
	doesn't depend on the value of os.sep. (I.e. ported to Windows :-)
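One way to get the behaviour described here, sketched under assumptions and not taken from websucker itself: apply the trailing-slash rule to the URL path, whose separator is always "/", and only afterwards build the local path with os.path.join. The name `local_path_for` and the host-as-top-directory layout are hypothetical.

```python
import os
from urllib.parse import urlsplit


def local_path_for(url):
    """Map a URL to a relative local file path (hypothetical layout)."""
    parts = urlsplit(url)
    path = parts.path
    # Decide on the URL's own "/" separator, never on os.sep.
    if not path or path.endswith("/"):
        path = path + "index.html"
    # Only now translate URL segments into a platform-specific path.
    segments = [parts.netloc] + [s for s in path.split("/") if s]
    return os.path.join(*segments)


if __name__ == "__main__":
    print(local_path_for("http://example.com/docs/"))
```

Because the slash test happens before any os.path call, the result has the same shape on Unix and Windows; only the separator in the returned path differs.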
*	sort the urls in the todo list	Guido van Rossum	1998-06-15	1	-1/+3
*	Use a try-except so that the pickle file is written even when we die	Guido van Rossum	1998-04-27	1	-14/+18
	because of an unexpected exception.
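The pattern behind this commit can be sketched as follows. The commit message says try-except; this sketch uses try/finally, a closely related form that writes the checkpoint and then lets the exception continue to propagate. The names `run_with_checkpoint`, `work`, `state`, and `dumpfile` are hypothetical and do not come from webchecker.

```python
import pickle


def run_with_checkpoint(work, state, dumpfile):
    """Run work(state) and always write state to dumpfile afterwards."""
    try:
        work(state)
    finally:
        # Reached on normal completion and on any exception, so the
        # progress made so far is not lost when the run dies.
        with open(dumpfile, "wb") as f:
            pickle.dump(state, f)
```

A later run can load the pickle file and resume from the saved state instead of starting over.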
*	Give in to tabnanny	Guido van Rossum	1998-04-06	6	-1041/+850
*	Use a better way to bind the checkext instance variable to a check	Guido van Rossum	1998-03-05	1	-9/+8
	button widget, not involving a __getattr__() method but a callback on the widget.
*	Adapt to new webchecker structure. Due to better structure of	Guido van Rossum	1998-02-21	1	-59/+33
	getpage(), much less duplicate code is needed -- we only need to override readhtml().
*	Major overhaul. Don't use global variable (e.g. verbose); use	Guido van Rossum	1998-02-21	1	-130/+191
	instance variables. Make all global functions methods, for easy overriding. Restructure getpage() for easy overriding. Add save_pickle() method and load_pickle() global function to make it easier for other programs to emulate the toplevel interface.
*	Map .shtml to text/html.	Guido van Rossum	1997-10-07	1	-0/+1
*	A variant on webchecker that creates a mirror copy of a remote site.	Guido van Rossum	1997-10-06	1	-0/+131
*	Several changes:	Guido van Rossum	1997-10-06	1	-6/+24
	- Change the code that looks for robots.txt to always look in /, even if the "root" path is somewhere deep down below.
	- Add link processing in <AREA> tags.
	- Change safeclose() to avoid crashing when the file has no geturl() method.
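The first item can be illustrated with a small sketch in present-day Python 3, assuming urllib.parse.urljoin is an acceptable stand-in for whatever the original code did. The helper name `robots_url` and the example URL are hypothetical.

```python
from urllib.parse import urljoin


def robots_url(root):
    """Return the robots.txt URL for the server that hosts `root`."""
    # A reference starting with "/" replaces the whole path of `root`,
    # so the result always points at the top level of the server.
    return urljoin(root, "/robots.txt")


if __name__ == "__main__":
    print(robots_url("http://www.example.com/deep/down/below/index.html"))
```

However deep the root path is, the lookup lands on the same per-server file.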
*	Tiny script to play with it on a Mac.	Guido van Rossum	1997-05-28	1	-0/+7
*	Scroll to top of info window when done.	Guido van Rossum	1997-05-09	1	-0/+1
*	Avoid the fancy handler for error 401 (request authentication).	Guido van Rossum	1997-05-07	1	-4/+7
*	Restructured Checker class to get rid of 'ext' table.	Guido van Rossum	1997-02-02	2	-177/+165
	Links are now either in 'todo' or 'done', and ext links are handled more like local links except that no further links are gathered (and sometimes they aren't checked, e.g. for mailto and news URLs). The -x option reverses its meaning: it disables checking of ext links (they are moved to 'done' without checking). A new 'errors' table collects pages with bad links as we go -- redundant, but useful for the GUI version which needs to report this as we go. Some new methods, including reset(). New checkpoint format.
	Adapted the GUI to the changes in the Checker class. Added Quit and "Start over" buttons, and a checkbox to disable checking external links. The details window now also shows bad links emanating from the selected page.
	Miscellaneous small changes.
*	Add root URL entry box, separate start/stop/step buttons.	Guido van Rossum	1997-02-01	1	-54/+131
	If the user selects an item in 'To check', start checking there.
*	Process <img> and <frame> tags. Don't bother skipping second href.	Guido van Rossum	1997-02-01	1	-3/+12
*	Check in another copy of tktools.py...	Guido van Rossum	1997-01-31	1	-0/+367
*	Tk interface to webchecker. Not fully featured yet, but usable.	Guido van Rossum	1997-01-31	1	-0/+329
*	Spin off checking of external page in a subroutine.	Guido van Rossum	1997-01-31	1	-17/+20
	Increase MAXPAGE to 150K. Add back printing of __doc__ for usage message.
*	Many misc changes.	Guido van Rossum	1997-01-31	1	-95/+142
	- Faster HTML parser derived from SGMLparser (Fred Gansevles).
	- All manipulations of todo, done, ext, bad are done via methods, so a derived class can override. Also moved the 'done' marking to dopage(), so run() is much simpler.
	- Added a method status() which returns a string containing the summary counts; added a "total" count.
	- Drop the guessing of the file type before opening the document -- we still need to check those links for validity!
	- Added a subroutine to close a connection which first slurps up the remaining data when it's an ftp URL -- apparently closing an ftp connection without reading till the end makes it hang.
	- Added -n option to skip running (only useful with -R).
	- The Checker object now has an instance variable which is set to 1 when it is changed. This is not pickled.