* Repair the broken link to norobots-rfc.txt.
* HTTP response codes >= 500 are treated as a failed read rather than as a
  "not found". Not found means that we can assume the entire site is
  allowed; a 5xx server error tells us nothing.
* A successful read() or parse() updates the mtime (which is defined to be
  "the time the robots.txt file was last fetched").
* The can_fetch() method returns False unless we've had a read() with a 2xx
  or 4xx response. This avoids false positives in the case where a user
  calls can_fetch() before calling read().
* I don't see any easy way to test this patch without hitting internet
  resources that might change, or without using mock objects that wouldn't
  provide much reassurance.
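
A minimal sketch of the resulting behavior, using the Python 3 module path
urllib.robotparser (the bot name and example.com URLs are illustrative):

    import urllib.robotparser

    rp = urllib.robotparser.RobotFileParser("https://example.com/robots.txt")

    # No fetch has happened yet, so can_fetch() conservatively reports
    # False rather than guessing that the whole site is allowed.
    print(rp.can_fetch("MyBot/1.0", "https://example.com/page.html"))  # False

    rp.read()          # a 2xx or 4xx response settles the allow/disallow
                       # state; a 5xx response leaves can_fetch() at False
    print(rp.mtime())  # the time the robots.txt file was last fetched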
This helps in handling certain types of invalid URLs in a conservative manner.
Merged revisions 83238 via svnmerge from
svn+ssh://pythondev@svn.python.org/python/branches/py3k
........
  r83238 | georg.brandl | 2010-07-29 19:55:01 +0200 (Thu, 29 Jul 2010) | 1 line

  #4108: the first default entry (User-agent: *) wins.
........
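
A small demonstration of the rule (robots.txt content and bot name are
invented): when two User-agent: * groups are present, only the first one is
consulted.

    import urllib.robotparser

    rp = urllib.robotparser.RobotFileParser()
    rp.parse([
        "User-agent: *",
        "Disallow: /private/",
        "",
        "User-agent: *",
        "Disallow:",
    ])

    # The first default entry wins, so /private/ stays disallowed even
    # though the second entry would allow everything.
    print(rp.can_fetch("AnyBot", "https://example.com/private/page"))  # False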
Merged revisions 83209 via svnmerge from
svn+ssh://pythondev@svn.python.org/python/branches/py3k
........
  r83209 | senthil.kumaran | 2010-07-28 21:57:56 +0530 (Wed, 28 Jul 2010) | 3 lines

  Fix Issue6325 - robotparser to honor URLs with query strings.
........
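
A sketch of what the fix enables (example URLs invented): rules whose paths
carry a query string now take part in matching.

    import urllib.robotparser

    rp = urllib.robotparser.RobotFileParser()
    rp.parse([
        "User-agent: *",
        "Disallow: /some/path?name=value",
    ])

    print(rp.can_fetch("*", "http://example.com/some/path?name=value"))  # False
    print(rp.can_fetch("*", "http://example.com/some/path"))             # True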
Adds test cases which use Allow: as well.
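
For instance (robots.txt contents invented for illustration), an Allow:
line can open up a subtree inside a broader Disallow:

    import urllib.robotparser

    rp = urllib.robotparser.RobotFileParser()
    rp.parse([
        "User-agent: *",
        "Allow: /public/",
        "Disallow: /",
    ])

    print(rp.can_fetch("*", "http://example.com/public/page.html"))  # True
    print(rp.can_fetch("*", "http://example.com/secret.html"))       # False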
Remove tests that are no longer needed (a better set is available in
Lib/test/test_robotparser.py). Clean up a few PEP 8 nits (compound
statements on a single line, whitespace around operators).
concatenation in robotparser.
(Contributed by George Yoshida.)
not updated after 2.2).
unnecessary redirection limit code which is already in FancyURLopener.
- Use substring search, not re search, for user-agent and paths.
- Consider the * entry last. Unquote, then requote URLs.
- Treat an empty Disallow as "allow everything".
Add test cases. Fixes #523041.
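
Two of these rules are easy to demonstrate with the Python 3 module path
urllib.robotparser (robots.txt contents and bot names invented): an empty
Disallow allows everything, and a specific user-agent entry is consulted
before the * entry.

    import urllib.robotparser

    # An empty Disallow means "allow everything".
    rp = urllib.robotparser.RobotFileParser()
    rp.parse(["User-agent: *", "Disallow:"])
    print(rp.can_fetch("AnyBot", "http://example.com/anything"))  # True

    # A matching specific entry takes precedence over the * entry.
    rp = urllib.robotparser.RobotFileParser()
    rp.parse([
        "User-agent: GoodBot",
        "Disallow:",
        "",
        "User-agent: *",
        "Disallow: /",
    ])
    print(rp.can_fetch("GoodBot", "http://example.com/page"))   # True
    print(rp.can_fetch("OtherBot", "http://example.com/page"))  # False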
* restores urllib as the file fetcher (closes bug #132000)
* allows checking URLs with empty paths (closes patches #103511 and #103721)
* properly handles user agents with versions (e.g., SpamMeister/1.5)
* added several more tests
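
For example, an entry for SpamMeister also matches the versioned agent
string from the commit message (sketched against the current Python 3
module path urllib.robotparser; the commit itself predates the urllib
package):

    import urllib.robotparser

    rp = urllib.robotparser.RobotFileParser()
    rp.parse(["User-agent: SpamMeister", "Disallow: /"])

    # The /1.5 version suffix is stripped before matching.
    print(rp.can_fetch("SpamMeister/1.5", "http://example.com/"))  # False

    # A URL with an empty path is treated as "/".
    print(rp.can_fetch("SpamMeister/1.5", "http://example.com"))   # False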
Added an __all__ attribute to robotparser; added a test script and expected
output file as well. This closes patch 103297. __all__ attributes will be
added to other modules without first submitting a patch, just adding the
necessary line to the test script to verify a more-or-less correct
implementation.
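
A minimal version of the kind of check such a test script performs, shown
against the Python 3 module path:

    import urllib.robotparser as mod

    # Every name exported via __all__ should exist in the module.
    for name in mod.__all__:
        assert hasattr(mod, name), name

    print(mod.__all__)  # ['RobotFileParser']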
The robotparser.py module currently lives in Tools/webchecker. In
preparation for its migration to Lib, I made the following changes:
* renamed the test() function to _test()
* corrected the URLs in _test() so they refer to actual documents
* added an "if __name__ == '__main__'" catcher to invoke _test()
  when run as a main program
* added docstrings for the two main methods, parse and can_fetch
* replaced usage of regsub and regex with corresponding re code
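
The catcher mentioned above is the standard idiom, sketched here with a
stub _test():

    def _test():
        # exercise the parser against a few known documents
        ...

    if __name__ == '__main__':
        _test()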