author | Senthil Kumaran <orsenthil@gmail.com> | 2008-06-23 04:41:59 (GMT) |
---|---|---|
committer | Senthil Kumaran <orsenthil@gmail.com> | 2008-06-23 04:41:59 (GMT) |
commit | aca8fd7a9dc96143e592076fab4d89cc1691d03f (patch) | |
tree | f18d273e3f72b917139e07f3e6e4d72a5119fd94 /Doc/library/urllib.robotparser.rst | |
parent | d11a44312f2e80a9c4979063ce94233f924dcc5b (diff) | |
Documentation updates for urllib package. Modified the documentation for the
urllib,urllib2 -> urllib.request,urllib.error
urlparse -> urllib.parse
RobotParser -> urllib.robotparser
Updated tutorial references and other module references (http.client.rst,
ftplib.rst,contextlib.rst)
Updated the examples in the urllib2-howto
Addresses Issue3142.
Diffstat (limited to 'Doc/library/urllib.robotparser.rst')
-rw-r--r-- | Doc/library/urllib.robotparser.rst | 73 |
1 files changed, 73 insertions, 0 deletions
diff --git a/Doc/library/urllib.robotparser.rst b/Doc/library/urllib.robotparser.rst
new file mode 100644
index 0000000..e351c56
--- /dev/null
+++ b/Doc/library/urllib.robotparser.rst
@@ -0,0 +1,73 @@
+
+:mod:`urllib.robotparser` --- Parser for robots.txt
+====================================================
+
+.. module:: urllib.robotparser
+   :synopsis: Loads a robots.txt file and answers questions about
+              fetchability of other URLs.
+.. sectionauthor:: Skip Montanaro <skip@pobox.com>
+
+
+.. index::
+   single: WWW
+   single: World Wide Web
+   single: URL
+   single: robots.txt
+
+This module provides a single class, :class:`RobotFileParser`, which answers
+questions about whether or not a particular user agent can fetch a URL on the
+Web site that published the :file:`robots.txt` file. For more details on the
+structure of :file:`robots.txt` files, see http://www.robotstxt.org/orig.html.
+
+
+.. class:: RobotFileParser()
+
+   This class provides a set of methods to read, parse and answer questions
+   about a single :file:`robots.txt` file.
+
+
+   .. method:: set_url(url)
+
+      Sets the URL referring to a :file:`robots.txt` file.
+
+
+   .. method:: read()
+
+      Reads the :file:`robots.txt` URL and feeds it to the parser.
+
+
+   .. method:: parse(lines)
+
+      Parses the lines argument.
+
+
+   .. method:: can_fetch(useragent, url)
+
+      Returns ``True`` if the *useragent* is allowed to fetch the *url*
+      according to the rules contained in the parsed :file:`robots.txt`
+      file.
+
+
+   .. method:: mtime()
+
+      Returns the time the ``robots.txt`` file was last fetched. This is
+      useful for long-running web spiders that need to check for new
+      ``robots.txt`` files periodically.
+
+
+   .. method:: modified()
+
+      Sets the time the ``robots.txt`` file was last fetched to the current
+      time.
+
+The following example demonstrates basic use of the RobotFileParser class. ::
+
+   >>> import urllib.robotparser
+   >>> rp = urllib.robotparser.RobotFileParser()
+   >>> rp.set_url("http://www.musi-cal.com/robots.txt")
+   >>> rp.read()
+   >>> rp.can_fetch("*", "http://www.musi-cal.com/cgi-bin/search?city=San+Francisco")
+   False
+   >>> rp.can_fetch("*", "http://www.musi-cal.com/")
+   True
+
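
The example in the new file depends on fetching a live robots.txt over the network. As a rough offline sketch of the same methods, the snippet below passes rules to ``parse`` as a list of lines and then queries ``can_fetch``. The robots.txt rules, the ``ROBOTS_TXT`` name and the ``www.example.com`` URLs are assumptions made up for illustration; they are not part of this commit.

```python
import urllib.robotparser

# Hypothetical robots.txt content, invented for this illustration.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin/
"""

rp = urllib.robotparser.RobotFileParser()

# parse() takes an iterable of lines, so no network access is needed.
rp.parse(ROBOTS_TXT.splitlines())

# Record "now" as the time the rules were last fetched, then read it back.
rp.modified()
last_fetched = rp.mtime()

# Paths under /cgi-bin/ are disallowed for every user agent; the root is not.
allowed_home = rp.can_fetch("*", "http://www.example.com/")
allowed_search = rp.can_fetch(
    "*", "http://www.example.com/cgi-bin/search?city=San+Francisco"
)

print(allowed_home, allowed_search)
```

Feeding lines directly like this can be handy in tests, or when the robots.txt body has already been downloaded by other means.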