author | Georg Brandl <georg@python.org> | 2007-08-15 14:28:01 (GMT) |
---|---|---|
committer | Georg Brandl <georg@python.org> | 2007-08-15 14:28:01 (GMT) |
commit | 8ec7f656134b1230ab23003a94ba3266d7064122 (patch) | |
tree | bc730d5fb3302dc375edd26b26f750d609b61d72 /Doc/library/robotparser.rst | |
parent | f56181ff53ba00b7bed3997a4dccd9a1b6217b57 (diff) | |
Move the 2.6 reST doc tree in place.
Diffstat (limited to 'Doc/library/robotparser.rst')
-rw-r--r-- | Doc/library/robotparser.rst | 71 |
1 files changed, 71 insertions, 0 deletions
diff --git a/Doc/library/robotparser.rst b/Doc/library/robotparser.rst
new file mode 100644
index 0000000..1a66955
--- /dev/null
+++ b/Doc/library/robotparser.rst
@@ -0,0 +1,71 @@
+
+:mod:`robotparser` --- Parser for robots.txt
+=============================================
+
+.. module:: robotparser
+   :synopsis: Loads a robots.txt file and answers questions about fetchability of other URLs.
+.. sectionauthor:: Skip Montanaro <skip@mojam.com>
+
+
+.. index::
+   single: WWW
+   single: World Wide Web
+   single: URL
+   single: robots.txt
+
+This module provides a single class, :class:`RobotFileParser`, which answers
+questions about whether or not a particular user agent can fetch a URL on the
+Web site that published the :file:`robots.txt` file. For more details on the
+structure of :file:`robots.txt` files, see
+http://www.robotstxt.org/wc/norobots.html.
+
+
+.. class:: RobotFileParser()
+
+   This class provides a set of methods to read, parse and answer questions about a
+   single :file:`robots.txt` file.
+
+
+.. method:: RobotFileParser.set_url(url)
+
+   Sets the URL referring to a :file:`robots.txt` file.
+
+
+.. method:: RobotFileParser.read()
+
+   Reads the :file:`robots.txt` URL and feeds it to the parser.
+
+
+.. method:: RobotFileParser.parse(lines)
+
+   Parses the lines argument.
+
+
+.. method:: RobotFileParser.can_fetch(useragent, url)
+
+   Returns ``True`` if the *useragent* is allowed to fetch the *url* according to
+   the rules contained in the parsed :file:`robots.txt` file.
+
+
+.. method:: RobotFileParser.mtime()
+
+   Returns the time the ``robots.txt`` file was last fetched. This is useful for
+   long-running web spiders that need to check for new ``robots.txt`` files
+   periodically.
+
+
+.. method:: RobotFileParser.modified()
+
+   Sets the time the ``robots.txt`` file was last fetched to the current time.
+
+The following example demonstrates basic use of the RobotFileParser class. ::
+
+   >>> import robotparser
+   >>> rp = robotparser.RobotFileParser()
+   >>> rp.set_url("http://www.musi-cal.com/robots.txt")
+   >>> rp.read()
+   >>> rp.can_fetch("*", "http://www.musi-cal.com/cgi-bin/search?city=San+Francisco")
+   False
+   >>> rp.can_fetch("*", "http://www.musi-cal.com/")
+   True
+
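As a hedged illustration of the methods this patch documents, the sketch below passes rules straight to ``parse()`` instead of fetching them with ``read()``, so it runs without network access. It is written against Python 3, where the same class is importable from ``urllib.robotparser``; under the 2.6 tree this commit targets, the import would be ``import robotparser``. The host ``www.example.com``, the ``ROBOTS_TXT`` rules, and the ``build_parser`` helper are assumptions made up for the example and are not part of the patch.

```python
# Python 3 spelling; in the 2.6 docs added by this diff the module is `robotparser`.
from urllib.robotparser import RobotFileParser

# Hypothetical rules for an imaginary site (not taken from the patch).
ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin/
"""


def build_parser(text):
    """Return a RobotFileParser loaded from in-memory robots.txt text."""
    rp = RobotFileParser()
    # parse() expects an iterable of lines, as described in the documentation above.
    rp.parse(text.splitlines())
    return rp


rp = build_parser(ROBOTS_TXT)

# Paths under /cgi-bin/ are disallowed for every user agent.
print(rp.can_fetch("*", "http://www.example.com/cgi-bin/search?city=San+Francisco"))  # False
# Anything else is allowed.
print(rp.can_fetch("*", "http://www.example.com/"))  # True

# Record "now" as the last-fetched time, then read it back with mtime().
rp.modified()
print(rp.mtime() > 0)  # True
```

A long-running spider could compare ``mtime()`` against the current time to decide when to load the rules again; the refresh policy itself is left out of this sketch.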