path: root/Doc/library/urllib.robotparser.rst
author     Senthil Kumaran <orsenthil@gmail.com>   2008-06-23 04:41:59 (GMT)
committer  Senthil Kumaran <orsenthil@gmail.com>   2008-06-23 04:41:59 (GMT)
commit     aca8fd7a9dc96143e592076fab4d89cc1691d03f (patch)
tree       f18d273e3f72b917139e07f3e6e4d72a5119fd94 /Doc/library/urllib.robotparser.rst
parent     d11a44312f2e80a9c4979063ce94233f924dcc5b (diff)
Documentation updates for the urllib package.  Modified the documentation
for the module renames:

    urllib, urllib2 -> urllib.request, urllib.error
    urlparse        -> urllib.parse
    RobotParser     -> urllib.robotparser

Updated tutorial references and other module references (http.client.rst,
ftplib.rst, contextlib.rst), and updated the examples in the urllib2-howto.

Addresses Issue3142.
Diffstat (limited to 'Doc/library/urllib.robotparser.rst')
-rw-r--r--   Doc/library/urllib.robotparser.rst   73
1 file changed, 73 insertions, 0 deletions
diff --git a/Doc/library/urllib.robotparser.rst b/Doc/library/urllib.robotparser.rst
new file mode 100644
index 0000000..e351c56
--- /dev/null
+++ b/Doc/library/urllib.robotparser.rst
@@ -0,0 +1,73 @@
+
+:mod:`urllib.robotparser` --- Parser for robots.txt
+====================================================
+
+.. module:: urllib.robotparser
+ :synopsis: Loads a robots.txt file and answers questions about
+ fetchability of other URLs.
+.. sectionauthor:: Skip Montanaro <skip@pobox.com>
+
+
+.. index::
+ single: WWW
+ single: World Wide Web
+ single: URL
+ single: robots.txt
+
+This module provides a single class, :class:`RobotFileParser`, which answers
+questions about whether or not a particular user agent can fetch a URL on the
+Web site that published the :file:`robots.txt` file. For more details on the
+structure of :file:`robots.txt` files, see http://www.robotstxt.org/orig.html.
+
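+For example, a minimal :file:`robots.txt` file in that format might look
+like this (the paths below are made up for illustration)::
+
+ User-agent: *
+ Disallow: /cgi-bin/
+ Disallow: /private/
+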
+
+.. class:: RobotFileParser()
+
+ This class provides a set of methods to read, parse and answer questions
+ about a single :file:`robots.txt` file.
+
+
+ .. method:: set_url(url)
+
+ Sets the URL referring to a :file:`robots.txt` file.
+
+
+ .. method:: read()
+
+ Reads the :file:`robots.txt` URL and feeds it to the parser.
+
+
+ .. method:: parse(lines)
+
+ Parses the lines argument, which should be a list of lines from a
+ :file:`robots.txt` file.
+
+
+ .. method:: can_fetch(useragent, url)
+
+ Returns ``True`` if the *useragent* is allowed to fetch the *url*
+ according to the rules contained in the parsed :file:`robots.txt`
+ file.
+
+
+ .. method:: mtime()
+
+ Returns the time the :file:`robots.txt` file was last fetched. This is
+ useful for long-running web spiders that need to check for new
+ :file:`robots.txt` files periodically.
+
+
+ .. method:: modified()
+
+ Sets the time the :file:`robots.txt` file was last fetched to the current
+ time.
+
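+Together, :meth:`mtime` and :meth:`modified` let a long-running spider decide
+when to re-read the file.  A minimal sketch (the one-hour threshold is an
+arbitrary choice for illustration, not part of the module)::
+
+ import time
+ import urllib.robotparser
+
+ rp = urllib.robotparser.RobotFileParser("http://www.musi-cal.com/robots.txt")
+ rp.read()
+ rp.modified()   # record when the file was fetched
+
+ # ... later, elsewhere in the crawl loop ...
+ if time.time() - rp.mtime() > 3600:
+     rp.read()       # the rules may have changed
+     rp.modified()   # record the new fetch time
+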
+The following example demonstrates basic use of the :class:`RobotFileParser`
+class. ::
+
+ >>> import urllib.robotparser
+ >>> rp = urllib.robotparser.RobotFileParser()
+ >>> rp.set_url("http://www.musi-cal.com/robots.txt")
+ >>> rp.read()
+ >>> rp.can_fetch("*", "http://www.musi-cal.com/cgi-bin/search?city=San+Francisco")
+ False
+ >>> rp.can_fetch("*", "http://www.musi-cal.com/")
+ True
+
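+A parser need not fetch the file itself; :meth:`parse` accepts lines obtained
+by any means.  A short sketch with inline rules (the rules and URLs here are
+made up for the example)::
+
+ >>> import urllib.robotparser
+ >>> lines = ["User-agent: *", "Disallow: /private/"]
+ >>> rp = urllib.robotparser.RobotFileParser()
+ >>> rp.parse(lines)
+ >>> rp.can_fetch("*", "http://www.example.com/private/page.html")
+ False
+ >>> rp.can_fetch("*", "http://www.example.com/index.html")
+ True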