author | Berker Peksag <berker.peksag@gmail.com> | 2015-10-08 09:27:06 (GMT)
---|---|---
committer | Berker Peksag <berker.peksag@gmail.com> | 2015-10-08 09:27:06 (GMT)
commit | 960e848f0d32399824d61dce2ff5736bb596ed44 (patch) |
tree | 7274633785f0e6b8d0ce64375bdca6351b0d32db /Doc |
parent | 2137dc15737e286e576d69ad10124973c3f9ba1e (diff) |
Issue #16099: RobotFileParser now supports Crawl-delay and Request-rate
extensions.
Patch by Nikolay Bogoychev.
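As a rough sketch of what the new API enables (not part of this commit; the robots.txt content and the example.com URL below are hypothetical), both extensions can be exercised without network access by feeding ``parse()`` an in-memory file:

```python
import urllib.robotparser

# Hypothetical robots.txt exercising both extensions this commit documents.
# Request-rate is written as <requests>/<seconds>.
ROBOTS_TXT = """\
User-agent: *
Crawl-delay: 6
Request-rate: 3/20
Disallow: /cgi-bin/
"""

rp = urllib.robotparser.RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())   # parse() takes an iterable of lines

print(rp.crawl_delay("*"))          # 6
rrate = rp.request_rate("*")        # namedtuple of (requests, seconds)
if rrate is not None:               # None when the entry is absent or malformed
    print(rrate.requests, rrate.seconds)   # 3 20
print(rp.can_fetch("*", "http://example.com/cgi-bin/search"))   # False
```

The doctest added in the diff below shows the same calls against a live URL fetched with ``read()``.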
Diffstat (limited to 'Doc')
-rw-r--r-- | Doc/library/urllib.robotparser.rst | 30
-rw-r--r-- | Doc/whatsnew/3.6.rst | 8
2 files changed, 36 insertions, 2 deletions
```diff
diff --git a/Doc/library/urllib.robotparser.rst b/Doc/library/urllib.robotparser.rst
index f179de2..c2e1bef 100644
--- a/Doc/library/urllib.robotparser.rst
+++ b/Doc/library/urllib.robotparser.rst
@@ -53,15 +53,41 @@ structure of :file:`robots.txt` files, see http://www.robotstxt.org/orig.html.
       Sets the time the ``robots.txt`` file was last fetched to the current
       time.
 
+   .. method:: crawl_delay(useragent)
 
-The following example demonstrates basic use of the RobotFileParser class.
+      Returns the value of the ``Crawl-delay`` parameter from ``robots.txt``
+      for the *useragent* in question.  If there is no such parameter or it
+      doesn't apply to the *useragent* specified or the ``robots.txt`` entry
+      for this parameter has invalid syntax, return ``None``.
+
+      .. versionadded:: 3.6
+
+   .. method:: request_rate(useragent)
+
+      Returns the contents of the ``Request-rate`` parameter from
+      ``robots.txt`` in the form of a :func:`~collections.namedtuple`
+      ``(requests, seconds)``.  If there is no such parameter or it doesn't
+      apply to the *useragent* specified or the ``robots.txt`` entry for this
+      parameter has invalid syntax, return ``None``.
+
+      .. versionadded:: 3.6
+
+
+The following example demonstrates basic use of the :class:`RobotFileParser`
+class::
 
    >>> import urllib.robotparser
    >>> rp = urllib.robotparser.RobotFileParser()
    >>> rp.set_url("http://www.musi-cal.com/robots.txt")
    >>> rp.read()
+   >>> rrate = rp.request_rate("*")
+   >>> rrate.requests
+   3
+   >>> rrate.seconds
+   20
+   >>> rp.crawl_delay("*")
+   6
    >>> rp.can_fetch("*", "http://www.musi-cal.com/cgi-bin/search?city=San+Francisco")
    False
    >>> rp.can_fetch("*", "http://www.musi-cal.com/")
    True
-
diff --git a/Doc/whatsnew/3.6.rst b/Doc/whatsnew/3.6.rst
index dd35c9a..3080820 100644
--- a/Doc/whatsnew/3.6.rst
+++ b/Doc/whatsnew/3.6.rst
@@ -119,6 +119,14 @@ datetime
 (Contributed by Ashley Anderson in :issue:`12006`.)
 
 
+urllib.robotparser
+------------------
+
+:class:`~urllib.robotparser.RobotFileParser` now supports ``Crawl-delay`` and
+``Request-rate`` extensions.
+(Contributed by Nikolay Bogoychev in :issue:`16099`.)
+
+
 Optimizations
 =============
 
```
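One behavior worth noting in the documentation above: both new methods are specified to return ``None`` rather than raise when the parameter is missing or its syntax is invalid. A minimal sketch of that fallback (again with hypothetical input, not from the commit):

```python
import urllib.robotparser

# Hypothetical robots.txt: Request-rate is absent entirely, and Crawl-delay
# has a non-numeric value, which the parser discards as invalid syntax.
rp = urllib.robotparser.RobotFileParser()
rp.parse("""\
User-agent: *
Crawl-delay: fast
Disallow: /private/
""".splitlines())

print(rp.crawl_delay("*"))    # None -- entry present but not valid
print(rp.request_rate("*"))   # None -- no such parameter
```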