| Commit message (Collapse) | Author | Age | Files | Lines |
|
or request rate (GH-11791)
Co-Authored-By: Tal Einat <taleinat+github@gmail.com>
|
The urllib.robotparser's __str__ representation now includes wildcard
entries and the "Crawl-delay" and "Request-rate" fields. Also removes extra
newlines that were being appended to the end of the string.
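A minimal sketch of what the change above looks like from the caller's side, assuming a Python version that includes it. The robots.txt content here is made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt with only a wildcard entry.
ROBOTS_TXT = """\
User-agent: *
Crawl-delay: 2
Request-rate: 3/15
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# str() renders the parsed rules back into robots.txt-style text,
# including the wildcard entry and its delay/rate fields.
text = str(parser)
print(text)
```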
|
crawl_delay and request_rate
Initial patch by Peter Wirtz.
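As a rough illustration of crawl_delay and request_rate, the sketch below parses an in-memory robots.txt. The file content and the "ExampleBot" user agent are assumptions made up for this example.

```python
from urllib.robotparser import RobotFileParser

parser = RobotFileParser()
parser.parse([
    "User-agent: *",
    "Crawl-delay: 5",
    "Request-rate: 10/60",   # 10 requests per 60 seconds
    "Disallow: /tmp/",
])

# "ExampleBot" has no entry of its own, so the wildcard entry applies.
delay = parser.crawl_delay("ExampleBot")
rate = parser.request_rate("ExampleBot")

print(delay)                        # delay in seconds, as an int
print(rate.requests, rate.seconds)  # fields of the returned named tuple
```

Both methods return None when the matching entry does not set the field, so callers should check for that before using the values.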
|
extensions.
Patch by Nikolay Bogoychev.
|
if/else expression).
Suggested by: Tal Einat
|
* Repair the broken link to norobots-rfc.txt.
* HTTP response codes >= 500 are treated as a failed read rather than as not
found. Not found means that we can assume the entire site is allowed. A 5xx
server error tells us nothing.
* A successful read() or parse() updates the mtime (which is defined to be "the
time the robots.txt file was last fetched").
* The can_fetch() method returns False unless we've had a read() with a 2xx or
4xx response. This avoids false positives in the case where a user calls
can_fetch() before calling read().
* I don't see any easy way to test this patch without hitting internet
resources that might change or without use of mock objects that wouldn't
provide much reassurance.
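The mtime and can_fetch() behaviour described in the list above can be observed without network access by calling parse() directly. This is only a sketch with made-up rules and URLs; it does not exercise the read() path or the 5xx handling.

```python
from urllib.robotparser import RobotFileParser

parser = RobotFileParser()

# Nothing has been read or parsed yet, so the parser answers conservatively.
mtime_before = parser.mtime()
allowed_before = parser.can_fetch("*", "http://example.com/index.html")

# A successful parse() records the fetch time and loads the rules.
parser.parse(["User-agent: *", "Disallow: /private/"])
mtime_after = parser.mtime()
allowed_after = parser.can_fetch("*", "http://example.com/index.html")
private_after = parser.can_fetch("*", "http://example.com/private/data")

print(allowed_before, allowed_after, private_after)
```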
|
This helps in handling certain types of invalid URLs in a conservative manner.
|
via svnmerge from
svn+ssh://pythondev@svn.python.org/python/trunk
........
r65209 | raymond.hettinger | 2008-07-23 19:08:18 -0500 (Wed, 23 Jul 2008) | 1 line
Finish-up the partial conversion from int to Py_ssize_t for deque indices and length.
........
r65210 | raymond.hettinger | 2008-07-23 19:53:49 -0500 (Wed, 23 Jul 2008) | 1 line
Parse to the correct datatype.
........
r65211 | benjamin.peterson | 2008-07-23 21:27:46 -0500 (Wed, 23 Jul 2008) | 1 line
fix spacing
........
r65212 | benjamin.peterson | 2008-07-23 21:31:28 -0500 (Wed, 23 Jul 2008) | 1 line
fix markup
........
r65213 | benjamin.peterson | 2008-07-23 21:45:37 -0500 (Wed, 23 Jul 2008) | 1 line
add some documentation for 2to3
........
r65214 | raymond.hettinger | 2008-07-24 00:38:48 -0500 (Thu, 24 Jul 2008) | 1 line
Finish conversion from int to Py_ssize_t.
........
r65215 | raymond.hettinger | 2008-07-24 02:04:55 -0500 (Thu, 24 Jul 2008) | 1 line
Convert from long to Py_ssize_t.
........
r65216 | georg.brandl | 2008-07-24 02:09:21 -0500 (Thu, 24 Jul 2008) | 2 lines
Fix indentation.
........
r65225 | benjamin.peterson | 2008-07-25 11:55:37 -0500 (Fri, 25 Jul 2008) | 1 line
teach .bzrignore about doc tools
........
r65226 | benjamin.peterson | 2008-07-25 12:02:11 -0500 (Fri, 25 Jul 2008) | 1 line
document default value for fillvalue
........
r65233 | raymond.hettinger | 2008-07-25 13:43:33 -0500 (Fri, 25 Jul 2008) | 1 line
Issue 1592: Better error reporting for operations on closed shelves.
........
r65239 | benjamin.peterson | 2008-07-25 16:59:53 -0500 (Fri, 25 Jul 2008) | 1 line
fix indentation
........
r65246 | andrew.kuchling | 2008-07-26 08:08:19 -0500 (Sat, 26 Jul 2008) | 1 line
This sentence continues to bug me; rewrite it for the second time
........
r65247 | andrew.kuchling | 2008-07-26 08:09:06 -0500 (Sat, 26 Jul 2008) | 1 line
Remove extra words
........
r65255 | skip.montanaro | 2008-07-26 19:49:02 -0500 (Sat, 26 Jul 2008) | 3 lines
Close issue 3437 - missing state change when Allow lines are processed.
Adds test cases which use Allow: as well.
........
r65256 | skip.montanaro | 2008-07-26 19:50:41 -0500 (Sat, 26 Jul 2008) | 2 lines
note robotparser bug fix.
........
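For context on r65255 above, here is a small sketch of how Allow: lines interact with Disallow: lines in the current urllib.robotparser. The rules and URLs are invented for illustration; the parser applies the first rule whose path prefix matches.

```python
from urllib.robotparser import RobotFileParser

parser = RobotFileParser()
parser.parse([
    "User-agent: *",
    "Allow: /public/",   # checked first, so it wins for /public/...
    "Disallow: /",       # everything else is off limits
])

public_ok = parser.can_fetch("*", "http://example.com/public/page.html")
other_ok = parser.can_fetch("*", "http://example.com/secret.html")
print(public_ok, other_ok)
```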
|
The solution is to convert bytes to text via UTF-8. I'm not entirely
sure if this is safe, but it looks like robots.txt is expected to be
ASCII.
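A sketch of the bytes-to-text step described above, assuming the raw robots.txt body has already been fetched as bytes. The payload below is a stand-in for a real HTTP response body.

```python
from urllib.robotparser import RobotFileParser

# Stand-in for the bytes returned by reading an HTTP response.
raw = b"User-agent: *\nDisallow: /cgi-bin/\n"

# Decode to text first; pure-ASCII content decodes identically as UTF-8.
lines = raw.decode("utf-8").splitlines()

parser = RobotFileParser()
parser.parse(lines)

blocked = not parser.can_fetch("*", "http://example.com/cgi-bin/form")
print(lines, blocked)
```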
|
It consists of code from urllib, urllib2, urlparse, and robotparser.
The old modules have all been removed. The new package has five
submodules: urllib.parse, urllib.request, urllib.response,
urllib.error, and urllib.robotparser. The urllib.request.urlopen()
function uses the url opener from urllib2.
Note that the unittests have not been renamed for the
beta, but they will be renamed in the future.
Joint work with Senthil Kumaran.
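To make the new layout concrete, the sketch below imports one name from each of the submodules that can be exercised without network access. The URL is a placeholder and no request is actually sent.

```python
from urllib.error import HTTPError, URLError       # exception classes
from urllib.parse import urlsplit                   # URL parsing helpers
from urllib.request import Request                  # request objects and openers
from urllib.robotparser import RobotFileParser      # robots.txt parsing

parts = urlsplit("http://example.com/robots.txt")
request = Request("http://example.com/robots.txt")  # built only, never opened
parser = RobotFileParser(request.full_url)

print(parts.netloc, parts.path)
print(request.host)
print(issubclass(HTTPError, URLError))
```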
|