path: root/Lib/robotparser.py
Commit message | Author | Age | Files | Lines
* Fix typos in comments, documentation and test method names | Martin Panter | 2016-05-08 | 1 | -1/+1
* Issue 21469: Mitigate risk of false positives with robotparser. | Raymond Hettinger | 2014-05-13 | 1 | -2/+12
    * Repair the broken link to norobots-rfc.txt.
    * HTTP response codes >= 500 are treated as a failed read rather than as a not found. Not found means that we can assume the entire site is allowed. A 5xx server error tells us nothing.
    * A successful read() or parse() updates the mtime (which is defined to be "the time the robots.txt file was last fetched").
    * The can_fetch() method returns False unless we've had a read() with a 2xx or 4xx response. This avoids false positives in the case where a user calls can_fetch() before calling read().
    * I don't see any easy way to test this patch without hitting internet resources that might change or without use of mock objects that wouldn't provide much reassurance.
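The can_fetch() behaviour described in this commit can be seen with a short sketch. It assumes the Python 3 descendant of this module, urllib.robotparser, and feeds rules through parse() so that nothing is fetched over the network. The example.com URL and the rule lines are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical URL, used only for illustration.
URL = "http://example.com/index.html"

rp = RobotFileParser()

# No read() or parse() has happened yet, so the parser knows nothing
# about the site and refuses rather than risk a false positive.
before = rp.can_fetch("*", URL)

# parse() counts as a successful load and records the fetch time.
rp.parse(["User-agent: *", "Disallow: /private/"])
after = rp.can_fetch("*", URL)
fetched_at = rp.mtime()

print(before, after, fetched_at > 0)
```

Calling parse() directly stands in for a read() that received a 2xx response; a real crawler would call set_url() and read() instead.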
* #17403: urllib.parse.robotparser normalizes the urls before adding to ruleline. | Senthil Kumaran | 2013-05-29 | 1 | -0/+1
    This helps in handling certain types of invalid URLs in a conservative manner.
* Merged revisions 83238 via svnmerge from svn+ssh://pythondev@svn.python.org/python/branches/py3k | Georg Brandl | 2010-08-01 | 1 | -2/+4
    r83238 | georg.brandl | 2010-07-29 19:55:01 +0200 (Thu, 29 Jul 2010) | 1 line
    #4108: the first default entry (User-agent: *) wins.
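A minimal sketch of the "first default entry wins" rule from #4108, assuming the Python 3 module urllib.robotparser. The two rule blocks and the agent name AnyBot are invented for illustration.

```python
from urllib.robotparser import RobotFileParser

# Two "User-agent: *" blocks; only the first should be used as the default.
rules = [
    "User-agent: *",
    "Disallow: /a",
    "",
    "User-agent: *",
    "Disallow: /b",
]

rp = RobotFileParser()
rp.parse(rules)

# The first default block disallows /a ...
blocked_a = not rp.can_fetch("AnyBot", "http://example.com/a")
# ... and the second default block is ignored, so /b stays allowed.
allowed_b = rp.can_fetch("AnyBot", "http://example.com/b")

print(blocked_a, allowed_b)
```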
* Merged revisions 83209 via svnmerge from svn+ssh://pythondev@svn.python.org/python/branches/py3k | Senthil Kumaran | 2010-07-28 | 1 | -1/+6
    r83209 | senthil.kumaran | 2010-07-28 21:57:56 +0530 (Wed, 28 Jul 2010) | 3 lines
    Fix Issue6325 - robotparse to honor urls with query strings.
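What "honoring query strings" means in practice can be sketched as follows, assuming the Python 3 module urllib.robotparser. The /cgi path and its action parameter are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# A rule whose path includes a query string (hypothetical path and parameter).
rules = [
    "User-agent: *",
    "Disallow: /cgi?action=delete",
]

rp = RobotFileParser()
rp.parse(rules)

# The query string takes part in the match, so only the "delete" URL is blocked.
delete_ok = rp.can_fetch("*", "http://example.com/cgi?action=delete")
view_ok = rp.can_fetch("*", "http://example.com/cgi?action=view")

print(delete_ok, view_ok)
```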
* Close issue 3437 - missing state change when Allow lines are processed. | Skip Montanaro | 2008-07-27 | 1 | -0/+5
    Adds test cases which use Allow: as well.
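A small sketch of a rule set in which one entry consists only of an Allow: line, which is the kind of input this state-change fix concerns. It assumes the Python 3 module urllib.robotparser; the agent names GoodBot and OtherBot are invented.

```python
from urllib.robotparser import RobotFileParser

# GoodBot's entry has only an Allow: line; everyone else is shut out.
rules = [
    "User-agent: GoodBot",
    "Allow: /",
    "",
    "User-agent: *",
    "Disallow: /",
]

rp = RobotFileParser()
rp.parse(rules)

# The Allow-only entry must be recorded for GoodBot to be let through.
good = rp.can_fetch("GoodBot", "http://example.com/page.html")
other = rp.can_fetch("OtherBot", "http://example.com/page.html")

print(good, other)
```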
* #1778443 robotparser fixes from Aristotelis Mikropoulos | Benjamin Peterson | 2008-07-12 | 1 | -6/+3
* Get rid of _test(), _main(), _debug() and _check(). | Skip Montanaro | 2008-04-28 | 1 | -93/+12
    Tests are no longer needed (better set available in Lib/test/test_robotparser.py). Clean up a few PEP 8 nits (compound statements on a single line, whitespace around operators).
* fixes 813986 | Skip Montanaro | 2007-08-28 | 1 | -0/+5
* Patch #1555098: use str.join() instead of repeated string concatenation in robotparser. | Georg Brandl | 2007-03-13 | 1 | -9/+6
* Patch #1014237: Consistently return booleans throughout. | Martin v. Löwis | 2004-08-23 | 1 | -10/+10
* Replace str.find() != -1 with the more readable "in" operator. | Raymond Hettinger | 2004-05-04 | 1 | -1/+1
* SF patch #911431: robot.txt must be robots.txt | Raymond Hettinger | 2004-03-13 | 1 | -2/+2
    (Contributed by George Yoshida.)
* Get rid of many apply() calls. | Guido van Rossum | 2003-02-27 | 1 | -1/+1
* Remove import of re, it is not used | Neal Norwitz | 2002-05-31 | 1 | -1/+1
* Patch 560023 adding docstrings. 2.2 Candidate (after verifying modules were not updated after 2.2). | Raymond Hettinger | 2002-05-29 | 1 | -0/+17
* Convert a pile of obvious "yes/no" functions to return bool. | Tim Peters | 2002-04-04 | 1 | -6/+6
* Correctly set default entry in all cases. | Martin v. Löwis | 2002-03-18 | 1 | -6/+9
* Patch #499513: use readline() instead of readlines(). | Martin v. Löwis | 2002-03-18 | 1 | -16/+6
    Removed the unnecessary redirection limit code which is already in FancyURLopener.
* Correct various errors: | Martin v. Löwis | 2002-02-28 | 1 | -6/+16
    - Use substring search, not re search for user-agent and paths.
    - Consider * entry last. Unquote, then requote URLs.
    - Treat empty Disallow as "allow everything".
    Add test cases. Fixes #523041
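Two of the rules listed in this commit, "consider * entry last" and "treat empty Disallow as allow everything", can be illustrated together. This is a sketch assuming the Python 3 module urllib.robotparser; the agent names FriendlyBot and OtherBot are invented.

```python
from urllib.robotparser import RobotFileParser

# The catch-all entry comes first in the file but is consulted last.
# FriendlyBot's empty Disallow means "nothing is disallowed".
rules = [
    "User-agent: *",
    "Disallow: /",
    "",
    "User-agent: FriendlyBot",
    "Disallow:",
]

rp = RobotFileParser()
rp.parse(rules)

friendly = rp.can_fetch("FriendlyBot", "http://example.com/docs/")
other = rp.can_fetch("OtherBot", "http://example.com/docs/")

print(friendly, other)
```

If the * entry were matched in file order, FriendlyBot would be blocked by the first block before its own entry was ever considered.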
* Remove unused import (PyChecker) | Andrew M. Kuchling | 2001-08-13 | 1 | -1/+0
* Whitespace normalization. | Tim Peters | 2001-02-15 | 1 | -1/+1
* The bulk of the credit for these changes goes to Bastian Kleineidam | Skip Montanaro | 2001-02-12 | 1 | -34/+89
    * restores urllib as the file fetcher (closes bug #132000)
    * allows checking URLs with empty paths (closes patches #103511 and 103721)
    * properly handle user agents with versions (e.g., SpamMeister/1.5)
    * added several more tests
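The handling of versioned user agents and empty URL paths mentioned in this commit can be sketched as follows, assuming the Python 3 module urllib.robotparser. SpamMeister/1.5 comes from the commit message; OtherBot/2.0 and the example.com URL are invented.

```python
from urllib.robotparser import RobotFileParser

rules = [
    "User-agent: SpamMeister",
    "Disallow: /",
]

rp = RobotFileParser()
rp.parse(rules)

# The version suffix ("/1.5") is dropped before matching the agent name,
# and a URL with an empty path is checked as if it were "/".
spam = rp.can_fetch("SpamMeister/1.5", "http://example.com")

# No entry applies to this agent and there is no "*" entry, so it is allowed.
other = rp.can_fetch("OtherBot/2.0", "http://example.com")

print(spam, other)
```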
* String method conversion. | Eric S. Raymond | 2001-02-09 | 1 | -8/+8
* Whitespace normalization. | Tim Peters | 2001-01-21 | 1 | -10/+10
* added __all__ lists to a number of Python modules | Skip Montanaro | 2001-01-20 | 1 | -0/+2
    added test script and expected output file as well
    this closes patch 103297.
    __all__ attributes will be added to other modules without first submitting a patch, just adding the necessary line to the test script to verify more-or-less correct implementation.
* rewrite of robotparser.py by Bastian Kleineidam. Closes patch 102229. | Skip Montanaro | 2001-01-20 | 1 | -60/+179
* Skip Montanaro: | Guido van Rossum | 2000-03-27 | 1 | -17/+17
    The robotparser.py module currently lives in Tools/webchecker. In preparation for its migration to Lib, I made the following changes:
    * renamed the test() function _test
    * corrected the URLs in _test() so they refer to actual documents
    * added an "if __name__ == '__main__'" catcher to invoke _test() when run as a main program
    * added doc strings for the two main methods, parse and can_fetch
    * replaced usage of regsub and regex with corresponding re code
* Give in to tabnanny | Guido van Rossum | 1998-04-06 | 1 | -60/+60
* Skip Montanaro's robots.txt parser. | Guido van Rossum | 1997-01-30 | 1 | -0/+97