author     Georg Brandl <georg@python.org>   2008-06-23 11:23:31 (GMT)
committer  Georg Brandl <georg@python.org>   2008-06-23 11:23:31 (GMT)
commit     0f7ede45693be57ba51c7aa23a0d841f160de874 (patch)
tree       42f8f578bdf60432c9056b2e300529efb1d9c6b4 /Doc/library
parent     aca8fd7a9dc96143e592076fab4d89cc1691d03f (diff)
Review the doc changes for the urllib package creation.
Diffstat (limited to 'Doc/library')
-rw-r--r--   Doc/library/contextlib.rst          |  4
-rw-r--r--   Doc/library/http.client.rst         |  3
-rw-r--r--   Doc/library/robotparser.rst         | 73
-rw-r--r--   Doc/library/urllib.error.rst        | 42
-rw-r--r--   Doc/library/urllib.parse.rst        | 58
-rw-r--r--   Doc/library/urllib.request.rst      | 22
-rw-r--r--   Doc/library/urllib.robotparser.rst  | 12
7 files changed, 68 insertions, 146 deletions
diff --git a/Doc/library/contextlib.rst b/Doc/library/contextlib.rst
index 2cd97c2..74a68cf 100644
--- a/Doc/library/contextlib.rst
+++ b/Doc/library/contextlib.rst
@@ -98,9 +98,9 @@ Functions provided:
    And lets you write code like this::
 
       from contextlib import closing
-      import urllib.request
+      from urllib.request import urlopen
 
-      with closing(urllib.request.urlopen('http://www.python.org')) as page:
+      with closing(urlopen('http://www.python.org')) as page:
           for line in page:
               print(line)
diff --git a/Doc/library/http.client.rst b/Doc/library/http.client.rst
index 1ea3576..bcda4c9 100644
--- a/Doc/library/http.client.rst
+++ b/Doc/library/http.client.rst
@@ -13,8 +13,7 @@
 This module defines classes which implement the client side of the HTTP and
 HTTPS protocols.  It is normally not used directly --- the module
-:mod:`urllib.request`
-uses it to handle URLs that use HTTP and HTTPS.
+:mod:`urllib.request` uses it to handle URLs that use HTTP and HTTPS.
 
 .. note::
diff --git a/Doc/library/robotparser.rst b/Doc/library/robotparser.rst
deleted file mode 100644
index cce7966..0000000
--- a/Doc/library/robotparser.rst
+++ /dev/null
@@ -1,73 +0,0 @@
-
-:mod:`robotparser` --- Parser for robots.txt
-=============================================
-
-.. module:: robotparser
-   :synopsis: Loads a robots.txt file and answers questions about
-              fetchability of other URLs.
-.. sectionauthor:: Skip Montanaro <skip@pobox.com>
-
-
-.. index::
-   single: WWW
-   single: World Wide Web
-   single: URL
-   single: robots.txt
-
-This module provides a single class, :class:`RobotFileParser`, which answers
-questions about whether or not a particular user agent can fetch a URL on the
-Web site that published the :file:`robots.txt` file.  For more details on the
-structure of :file:`robots.txt` files, see http://www.robotstxt.org/orig.html.
-
-
-.. class:: RobotFileParser()
-
-   This class provides a set of methods to read, parse and answer questions
-   about a single :file:`robots.txt` file.
-
-
-   .. method:: set_url(url)
-
-      Sets the URL referring to a :file:`robots.txt` file.
-
-
-   .. method:: read()
-
-      Reads the :file:`robots.txt` URL and feeds it to the parser.
-
-
-   .. method:: parse(lines)
-
-      Parses the lines argument.
-
-
-   .. method:: can_fetch(useragent, url)
-
-      Returns ``True`` if the *useragent* is allowed to fetch the *url*
-      according to the rules contained in the parsed :file:`robots.txt`
-      file.
-
-
-   .. method:: mtime()
-
-      Returns the time the ``robots.txt`` file was last fetched.  This is
-      useful for long-running web spiders that need to check for new
-      ``robots.txt`` files periodically.
-
-
-   .. method:: modified()
-
-      Sets the time the ``robots.txt`` file was last fetched to the current
-      time.
-
-The following example demonstrates basic use of the RobotFileParser class. ::
-
-   >>> import robotparser
-   >>> rp = robotparser.RobotFileParser()
-   >>> rp.set_url("http://www.musi-cal.com/robots.txt")
-   >>> rp.read()
-   >>> rp.can_fetch("*", "http://www.musi-cal.com/cgi-bin/search?city=San+Francisco")
-   False
-   >>> rp.can_fetch("*", "http://www.musi-cal.com/")
-   True
-
diff --git a/Doc/library/urllib.error.rst b/Doc/library/urllib.error.rst
index 1cbfe7d..bd76860 100644
--- a/Doc/library/urllib.error.rst
+++ b/Doc/library/urllib.error.rst
@@ -2,47 +2,47 @@
 ==================================================================
 
 .. module:: urllib.error
-   :synopsis: Next generation URL opening library.
+   :synopsis: Exception classes raised by urllib.request.
 .. moduleauthor:: Jeremy Hylton <jhylton@users.sourceforge.net>
 .. sectionauthor:: Senthil Kumaran <orsenthil@gmail.com>
 
-The :mod:`urllib.error` module defines exception classes raise by
-urllib.request. The base exception class is URLError, which inherits from
-IOError.
+The :mod:`urllib.error` module defines the exception classes for exceptions
+raised by :mod:`urllib.request`.  The base exception class is :exc:`URLError`,
+which inherits from :exc:`IOError`.
 
 The following exceptions are raised by :mod:`urllib.error` as appropriate:
 
-
 .. exception:: URLError
 
-   The handlers raise this exception (or derived exceptions) when they run into a
-   problem.  It is a subclass of :exc:`IOError`.
+   The handlers raise this exception (or derived exceptions) when they run into
+   a problem.  It is a subclass of :exc:`IOError`.
 
    .. attribute:: reason
 
-      The reason for this error.  It can be a message string or another exception
-      instance (:exc:`socket.error` for remote URLs, :exc:`OSError` for local
-      URLs).
+      The reason for this error.  It can be a message string or another
+      exception instance (:exc:`socket.error` for remote URLs, :exc:`OSError`
+      for local URLs).
 
 
 .. exception:: HTTPError
 
-   Though being an exception (a subclass of :exc:`URLError`), an :exc:`HTTPError`
-   can also function as a non-exceptional file-like return value (the same thing
-   that :func:`urlopen` returns).  This is useful when handling exotic HTTP
-   errors, such as requests for authentication.
+   Though being an exception (a subclass of :exc:`URLError`), an
+   :exc:`HTTPError` can also function as a non-exceptional file-like return
+   value (the same thing that :func:`urlopen` returns).  This is useful when
+   handling exotic HTTP errors, such as requests for authentication.
 
    .. attribute:: code
 
-      An HTTP status code as defined in `RFC 2616 <http://www.faqs.org/rfcs/rfc2616.html>`_.
-      This numeric value corresponds to a value found in the dictionary of
-      codes as found in :attr:`http.server.BaseHTTPRequestHandler.responses`.
+      An HTTP status code as defined in `RFC 2616
+      <http://www.faqs.org/rfcs/rfc2616.html>`_.  This numeric value corresponds
+      to a value found in the dictionary of codes as found in
+      :attr:`http.server.BaseHTTPRequestHandler.responses`.
 
 .. exception:: ContentTooShortError(msg[, content])
 
-   This exception is raised when the :func:`urlretrieve` function detects that the
-   amount of the downloaded data is less than the expected amount (given by the
-   *Content-Length* header).  The :attr:`content` attribute stores the downloaded
-   (and supposedly truncated) data.
+   This exception is raised when the :func:`urlretrieve` function detects that
+   the amount of the downloaded data is less than the expected amount (given by
+   the *Content-Length* header).  The :attr:`content` attribute stores the
+   downloaded (and supposedly truncated) data.
diff --git a/Doc/library/urllib.parse.rst b/Doc/library/urllib.parse.rst
index affa406..a5463e6 100644
--- a/Doc/library/urllib.parse.rst
+++ b/Doc/library/urllib.parse.rst
@@ -20,13 +20,12 @@ to an absolute URL given a "base URL."
 The module has been designed to match the Internet RFC on Relative Uniform
 Resource Locators (and discovered a bug in an earlier draft!). It supports the
 following URL schemes: ``file``, ``ftp``, ``gopher``, ``hdl``, ``http``,
-``https``, ``imap``, ``mailto``, ``mms``, ``news``, ``nntp``, ``prospero``,
-``rsync``, ``rtsp``, ``rtspu``, ``sftp``, ``shttp``, ``sip``, ``sips``,
-``snews``, ``svn``, ``svn+ssh``, ``telnet``, ``wais``.
+``https``, ``imap``, ``mailto``, ``mms``, ``news``, ``nntp``, ``prospero``,
+``rsync``, ``rtsp``, ``rtspu``, ``sftp``, ``shttp``, ``sip``, ``sips``,
+``snews``, ``svn``, ``svn+ssh``, ``telnet``, ``wais``.
 
 The :mod:`urllib.parse` module defines the following functions:
 
-
 .. function:: urlparse(urlstring[, default_scheme[, allow_fragments]])
 
    Parse a URL into six components, returning a 6-tuple.  This corresponds to the
@@ -92,11 +91,11 @@ The :mod:`urllib.parse` module defines the following functions:
 
 .. function:: urlunparse(parts)
 
-   Construct a URL from a tuple as returned by ``urlparse()``. The *parts* argument
-   can be any six-item iterable. This may result in a slightly different, but
-   equivalent URL, if the URL that was parsed originally had unnecessary delimiters
-   (for example, a ? with an empty query; the RFC states that these are
-   equivalent).
+   Construct a URL from a tuple as returned by ``urlparse()``.  The *parts*
+   argument can be any six-item iterable.  This may result in a slightly
+   different, but equivalent URL, if the URL that was parsed originally had
+   unnecessary delimiters (for example, a ``?`` with an empty query; the RFC
+   states that these are equivalent).
 
 
 .. function:: urlsplit(urlstring[, default_scheme[, allow_fragments]])
@@ -140,19 +139,19 @@ The :mod:`urllib.parse` module defines the following functions:
 
 .. function:: urlunsplit(parts)
 
-   Combine the elements of a tuple as returned by :func:`urlsplit` into a complete
-   URL as a string. The *parts* argument can be any five-item iterable. This may
-   result in a slightly different, but equivalent URL, if the URL that was parsed
-   originally had unnecessary delimiters (for example, a ? with an empty query; the
-   RFC states that these are equivalent).
+   Combine the elements of a tuple as returned by :func:`urlsplit` into a
+   complete URL as a string.  The *parts* argument can be any five-item
+   iterable.  This may result in a slightly different, but equivalent URL, if the
+   URL that was parsed originally had unnecessary delimiters (for example, a ?
+   with an empty query; the RFC states that these are equivalent).
 
 
 .. function:: urljoin(base, url[, allow_fragments])
 
    Construct a full ("absolute") URL by combining a "base URL" (*base*) with
    another URL (*url*). Informally, this uses components of the base URL, in
-   particular the addressing scheme, the network location and (part of) the path,
-   to provide missing components in the relative URL. For example:
+   particular the addressing scheme, the network location and (part of) the
+   path, to provide missing components in the relative URL.  For example:
 
       >>> from urllib.parse import urljoin
       >>> urljoin('http://www.cwi.nl/%7Eguido/Python.html', 'FAQ.html')
@@ -178,10 +177,10 @@ The :mod:`urllib.parse` module defines the following functions:
 
 .. function:: urldefrag(url)
 
-   If *url* contains a fragment identifier, returns a modified version of *url*
-   with no fragment identifier, and the fragment identifier as a separate string.
-   If there is no fragment identifier in *url*, returns *url* unmodified and an
-   empty string.
+   If *url* contains a fragment identifier, return a modified version of *url*
+   with no fragment identifier, and the fragment identifier as a separate
+   string.  If there is no fragment identifier in *url*, return *url* unmodified
+   and an empty string.
 
 
 .. function:: quote(string[, safe])
@@ -195,9 +194,10 @@ The :mod:`urllib.parse` module defines the following functions:
 
 .. function:: quote_plus(string[, safe])
 
-   Like :func:`quote`, but also replaces spaces by plus signs, as required for
-   quoting HTML form values. Plus signs in the original string are escaped unless
-   they are included in *safe*. It also does not have *safe* default to ``'/'``.
+   Like :func:`quote`, but also replace spaces by plus signs, as required for
+   quoting HTML form values.  Plus signs in the original string are escaped
+   unless they are included in *safe*.  It also does not have *safe* default to
+   ``'/'``.
 
 
 .. function:: unquote(string)
@@ -209,7 +209,7 @@ The :mod:`urllib.parse` module defines the following functions:
 
 .. function:: unquote_plus(string)
 
-   Like :func:`unquote`, but also replaces plus signs by spaces, as required for
+   Like :func:`unquote`, but also replace plus signs by spaces, as required for
    unquoting HTML form values.
@@ -254,7 +254,6 @@ The result objects from the :func:`urlparse` and :func:`urlsplit` functions are
 subclasses of the :class:`tuple` type. These subclasses add the attributes
 described in those functions, as well as provide an additional method:
 
-
 .. method:: ParseResult.geturl()
 
    Return the re-combined version of the original URL as a string. This may differ
@@ -279,13 +278,12 @@ described in those functions, as well as provide an additional method:
 
 The following classes provide the implementations of the parse results::
 
-
 .. class:: BaseResult
 
-   Base class for the concrete result classes. This provides most of the attribute
-   definitions. It does not provide a :meth:`geturl` method. It is derived from
-   :class:`tuple`, but does not override the :meth:`__init__` or :meth:`__new__`
-   methods.
+   Base class for the concrete result classes.  This provides most of the
+   attribute definitions.  It does not provide a :meth:`geturl` method.  It is
+   derived from :class:`tuple`, but does not override the :meth:`__init__` or
+   :meth:`__new__` methods.
 
 
 .. class:: ParseResult(scheme, netloc, path, params, query, fragment)
diff --git a/Doc/library/urllib.request.rst b/Doc/library/urllib.request.rst
index 4262836..d124d9a 100644
--- a/Doc/library/urllib.request.rst
+++ b/Doc/library/urllib.request.rst
@@ -7,9 +7,9 @@
 .. sectionauthor:: Moshe Zadka <moshez@users.sourceforge.net>
 
 
-The :mod:`urllib.request` module defines functions and classes which help in opening
-URLs (mostly HTTP) in a complex world --- basic and digest authentication,
-redirections, cookies and more.
+The :mod:`urllib.request` module defines functions and classes which help in
+opening URLs (mostly HTTP) in a complex world --- basic and digest
+authentication, redirections, cookies and more.
 
 The :mod:`urllib.request` module defines the following functions:
@@ -180,7 +180,7 @@ The following classes are provided:
    the ``User-Agent`` header, which is used by a browser to identify itself --
    some HTTP servers only allow requests coming from common browsers as opposed
    to scripts. For example, Mozilla Firefox may identify itself as ``"Mozilla/5.0
-   (X11; U; Linux i686) Gecko/20071127 Firefox/2.0.0.11"``, while :mod:`urllib2`'s
+   (X11; U; Linux i686) Gecko/20071127 Firefox/2.0.0.11"``, while :mod:`urllib`'s
    default user agent string is ``"Python-urllib/2.6"`` (on Python 2.6).
 
    The final two arguments are only of interest for correct handling of third-party
@@ -1005,10 +1005,11 @@ HTTPErrorProcessor Objects
 
    For non-200 error codes, this simply passes the job on to the
   :meth:`protocol_error_code` handler methods, via :meth:`OpenerDirector.error`.
-   Eventually, :class:`urllib2.HTTPDefaultErrorHandler` will raise an
+   Eventually, :class:`HTTPDefaultErrorHandler` will raise an
    :exc:`HTTPError` if no other handler handles the error.
 
-.. _urllib2-examples:
+
+.. _urllib-request-examples:
 
 Examples
 --------
@@ -1180,15 +1181,18 @@ The following example uses no proxies at all, overriding environment settings::
 using the :mod:`ftplib` module, subclassing :class:`FancyURLOpener`, or changing
 *_urlopener* to meet your needs.
 
+
+
 :mod:`urllib.response` --- Response classes used by urllib.
 ===========================================================
 
+
 .. module:: urllib.response
    :synopsis: Response classes used by urllib.
 
 The :mod:`urllib.response` module defines functions and classes which define a
-minimal file like interface, including read() and readline(). The typical
-response object is an addinfourl instance, which defines and info() method and
-that returns headers and a geturl() method that returns the url.
+minimal file like interface, including ``read()`` and ``readline()``.  The
+typical response object is an addinfourl instance, which defines and ``info()``
+method and that returns headers and a ``geturl()`` method that returns the url.
 
 Functions defined by this module are used internally by the
 :mod:`urllib.request` module.
diff --git a/Doc/library/urllib.robotparser.rst b/Doc/library/urllib.robotparser.rst
index e351c56..0cac2ad 100644
--- a/Doc/library/urllib.robotparser.rst
+++ b/Doc/library/urllib.robotparser.rst
@@ -1,9 +1,8 @@
-
 :mod:`urllib.robotparser` --- Parser for robots.txt
 ====================================================
 
 .. module:: urllib.robotparser
-   :synopsis: Loads a robots.txt file and answers questions about
+   :synopsis: Load a robots.txt file and answer questions about
               fetchability of other URLs.
 .. sectionauthor:: Skip Montanaro <skip@pobox.com>
 
@@ -25,42 +24,37 @@ structure of :file:`robots.txt` files, see http://www.robotstxt.org/orig.html.
    This class provides a set of methods to read, parse and answer questions
    about a single :file:`robots.txt` file.
 
-
    .. method:: set_url(url)
 
      Sets the URL referring to a :file:`robots.txt` file.
 
-
   .. method:: read()
 
      Reads the :file:`robots.txt` URL and feeds it to the parser.
 
-
   .. method:: parse(lines)
 
      Parses the lines argument.
 
-
   .. method:: can_fetch(useragent, url)
 
      Returns ``True`` if the *useragent* is allowed to fetch the *url*
      according to the rules contained in the parsed :file:`robots.txt`
      file.
 
-
   .. method:: mtime()
 
      Returns the time the ``robots.txt`` file was last fetched.  This is
      useful for long-running web spiders that need to check for new
      ``robots.txt`` files periodically.
 
-
   .. method:: modified()
 
      Sets the time the ``robots.txt`` file was last fetched to the current
      time.
 
-The following example demonstrates basic use of the RobotFileParser class. ::
+
+The following example demonstrates basic use of the RobotFileParser class.
 
    >>> import urllib.robotparser
   >>> rp = urllib.robotparser.RobotFileParser()
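
The sketch below is not part of the commit; it is a minimal illustration of how the relocated APIs described in the patch fit together: urlopen() now lives in urllib.request, the exception classes live in urllib.error, and closing() is used exactly as in the contextlib example above. The URL is only a placeholder.

    from contextlib import closing
    from urllib.error import HTTPError, URLError
    from urllib.request import urlopen

    # Illustrative only; not part of the patch above.
    try:
        # closing() ensures the response object is closed when the block exits.
        with closing(urlopen('http://www.python.org/')) as page:
            body = page.read()
    except HTTPError as err:
        # HTTPError is also a file-like response; err.code holds the HTTP status.
        print('server could not fulfill the request:', err.code)
    except URLError as err:
        # err.reason may be a message string or another exception instance.
        print('failed to reach the server:', err.reason)
    else:
        print(len(body), 'bytes read')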
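
Likewise a small sketch, also not part of the commit, exercising the urllib.parse functions whose descriptions are reflowed above; the URLs are illustrative only.

    from urllib.parse import (quote_plus, unquote_plus, urldefrag, urljoin,
                              urlparse, urlunparse)

    # urlparse() splits a URL into six components with named attributes.
    parts = urlparse('http://www.cwi.nl/%7Eguido/Python.html?lang=en#intro')
    print(parts.scheme, parts.netloc, parts.path)

    # urlunparse() rebuilds an equivalent URL from the six-item tuple.
    print(urlunparse(parts))

    # urljoin() resolves a relative reference against a base URL.
    print(urljoin('http://www.cwi.nl/%7Eguido/Python.html', 'FAQ.html'))

    # urldefrag() separates the fragment identifier from the rest of the URL.
    print(urldefrag('http://www.python.org/doc#history'))

    # quote_plus()/unquote_plus() do form-style escaping; spaces become '+'.
    encoded = quote_plus('San Francisco, CA')
    print(encoded, '->', unquote_plus(encoded))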