author     Berker Peksag <berker.peksag@gmail.com>  2015-10-08 09:27:06 (GMT)
committer  Berker Peksag <berker.peksag@gmail.com>  2015-10-08 09:27:06 (GMT)
commit     960e848f0d32399824d61dce2ff5736bb596ed44 (patch)
tree       7274633785f0e6b8d0ce64375bdca6351b0d32db /Doc
parent     2137dc15737e286e576d69ad10124973c3f9ba1e (diff)
Issue #16099: RobotFileParser now supports Crawl-delay and Request-rate
extensions. Patch by Nikolay Bogoychev.
Diffstat (limited to 'Doc')
-rw-r--r--  Doc/library/urllib.robotparser.rst | 30
-rw-r--r--  Doc/whatsnew/3.6.rst               |  8
2 files changed, 36 insertions(+), 2 deletions(-)
diff --git a/Doc/library/urllib.robotparser.rst b/Doc/library/urllib.robotparser.rst
index f179de2..c2e1bef 100644
--- a/Doc/library/urllib.robotparser.rst
+++ b/Doc/library/urllib.robotparser.rst
@@ -53,15 +53,41 @@ structure of :file:`robots.txt` files, see http://www.robotstxt.org/orig.html.
       Sets the time the ``robots.txt`` file was last fetched to the current
       time.
 
+   .. method:: crawl_delay(useragent)
 
-The following example demonstrates basic use of the RobotFileParser class.
+      Returns the value of the ``Crawl-delay`` parameter from ``robots.txt``
+      for the *useragent* in question. If there is no such parameter or it
+      doesn't apply to the *useragent* specified or the ``robots.txt`` entry
+      for this parameter has invalid syntax, return ``None``.
+
+      .. versionadded:: 3.6
+
+   .. method:: request_rate(useragent)
+
+      Returns the contents of the ``Request-rate`` parameter from
+      ``robots.txt`` in the form of a :func:`~collections.namedtuple`
+      ``(requests, seconds)``. If there is no such parameter or it doesn't
+      apply to the *useragent* specified or the ``robots.txt`` entry for this
+      parameter has invalid syntax, return ``None``.
+
+      .. versionadded:: 3.6
+
+
+The following example demonstrates basic use of the :class:`RobotFileParser`
+class::
 
    >>> import urllib.robotparser
    >>> rp = urllib.robotparser.RobotFileParser()
    >>> rp.set_url("http://www.musi-cal.com/robots.txt")
    >>> rp.read()
+   >>> rrate = rp.request_rate("*")
+   >>> rrate.requests
+   3
+   >>> rrate.seconds
+   20
+   >>> rp.crawl_delay("*")
+   6
    >>> rp.can_fetch("*", "http://www.musi-cal.com/cgi-bin/search?city=San+Francisco")
    False
    >>> rp.can_fetch("*", "http://www.musi-cal.com/")
    True
-
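
The doctest added above depends on a live fetch of http://www.musi-cal.com/robots.txt. As a minimal sketch of the same two accessors without any network access, the snippet below feeds hypothetical in-memory ``robots.txt`` lines to the parser's ``parse()`` method; the file contents are an illustrative assumption chosen to match the doctest values, not part of the commit::

   import urllib.robotparser

   # Hypothetical robots.txt contents exercising both new extensions.
   lines = [
       "User-agent: *",
       "Crawl-delay: 6",
       "Request-rate: 3/20",
       "Disallow: /cgi-bin/",
   ]

   rp = urllib.robotparser.RobotFileParser()
   rp.parse(lines)                       # parse in-memory lines; no fetch

   print(rp.crawl_delay("*"))            # 6
   rrate = rp.request_rate("*")          # has .requests and .seconds fields
   print(rrate.requests, rrate.seconds)  # 3 20
   print(rp.can_fetch("*", "/cgi-bin/search"))  # False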
diff --git a/Doc/whatsnew/3.6.rst b/Doc/whatsnew/3.6.rst
index dd35c9a..3080820 100644
--- a/Doc/whatsnew/3.6.rst
+++ b/Doc/whatsnew/3.6.rst
@@ -119,6 +119,14 @@ datetime
 (Contributed by Ashley Anderson in :issue:`12006`.)
 
+urllib.robotparser
+------------------
+
+:class:`~urllib.robotparser.RobotFileParser` now supports ``Crawl-delay`` and
+``Request-rate`` extensions.
+(Contributed by Nikolay Bogoychev in :issue:`16099`.)
+
+
 Optimizations
 =============
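
As a usage sketch (a hypothetical helper, not part of this patch), a crawler might combine the new accessors with ``can_fetch()``, pacing itself by ``Crawl-delay`` when present and otherwise spreading requests evenly across the ``Request-rate`` window::

   import time

   def polite_urls(rp, useragent, urls):
       """Yield the URLs we may fetch, sleeping between them as advised."""
       delay = rp.crawl_delay(useragent)
       rate = rp.request_rate(useragent)
       if delay is None and rate is not None:
           # Assumption: even spacing satisfies "rate.requests per
           # rate.seconds"; robots.txt does not mandate a schedule.
           delay = rate.seconds / rate.requests
       for url in urls:
           if rp.can_fetch(useragent, url):
               yield url
               if delay:
                   time.sleep(delay)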