author: Georg Brandl <georg@python.org> 2008-06-23 11:23:31 (GMT)
committer: Georg Brandl <georg@python.org> 2008-06-23 11:23:31 (GMT)
commit: 0f7ede45693be57ba51c7aa23a0d841f160de874 (patch)
tree: 42f8f578bdf60432c9056b2e300529efb1d9c6b4 /Doc/library
parent: aca8fd7a9dc96143e592076fab4d89cc1691d03f (diff)
Review the doc changes for the urllib package creation.
Diffstat (limited to 'Doc/library')
-rw-r--r-- Doc/library/contextlib.rst 4
-rw-r--r-- Doc/library/http.client.rst 3
-rw-r--r-- Doc/library/robotparser.rst 73
-rw-r--r-- Doc/library/urllib.error.rst 42
-rw-r--r-- Doc/library/urllib.parse.rst 58
-rw-r--r-- Doc/library/urllib.request.rst 22
-rw-r--r-- Doc/library/urllib.robotparser.rst 12
7 files changed, 68 insertions, 146 deletions
diff --git a/Doc/library/contextlib.rst b/Doc/library/contextlib.rst
index 2cd97c2..74a68cf 100644
--- a/Doc/library/contextlib.rst
+++ b/Doc/library/contextlib.rst
@@ -98,9 +98,9 @@ Functions provided:
And lets you write code like this::
from contextlib import closing
- import urllib.request
+ from urllib.request import urlopen
- with closing(urllib.request.urlopen('http://www.python.org')) as page:
+ with closing(urlopen('http://www.python.org')) as page:
for line in page:
print(line)
diff --git a/Doc/library/http.client.rst b/Doc/library/http.client.rst
index 1ea3576..bcda4c9 100644
--- a/Doc/library/http.client.rst
+++ b/Doc/library/http.client.rst
@@ -13,8 +13,7 @@
This module defines classes which implement the client side of the HTTP and
HTTPS protocols. It is normally not used directly --- the module
-:mod:`urllib.request`
-uses it to handle URLs that use HTTP and HTTPS.
+:mod:`urllib.request` uses it to handle URLs that use HTTP and HTTPS.
.. note::
diff --git a/Doc/library/robotparser.rst b/Doc/library/robotparser.rst
deleted file mode 100644
index cce7966..0000000
--- a/Doc/library/robotparser.rst
+++ /dev/null
@@ -1,73 +0,0 @@
-
-:mod:`robotparser` --- Parser for robots.txt
-=============================================
-
-.. module:: robotparser
- :synopsis: Loads a robots.txt file and answers questions about
- fetchability of other URLs.
-.. sectionauthor:: Skip Montanaro <skip@pobox.com>
-
-
-.. index::
- single: WWW
- single: World Wide Web
- single: URL
- single: robots.txt
-
-This module provides a single class, :class:`RobotFileParser`, which answers
-questions about whether or not a particular user agent can fetch a URL on the
-Web site that published the :file:`robots.txt` file. For more details on the
-structure of :file:`robots.txt` files, see http://www.robotstxt.org/orig.html.
-
-
-.. class:: RobotFileParser()
-
- This class provides a set of methods to read, parse and answer questions
- about a single :file:`robots.txt` file.
-
-
- .. method:: set_url(url)
-
- Sets the URL referring to a :file:`robots.txt` file.
-
-
- .. method:: read()
-
- Reads the :file:`robots.txt` URL and feeds it to the parser.
-
-
- .. method:: parse(lines)
-
- Parses the lines argument.
-
-
- .. method:: can_fetch(useragent, url)
-
- Returns ``True`` if the *useragent* is allowed to fetch the *url*
- according to the rules contained in the parsed :file:`robots.txt`
- file.
-
-
- .. method:: mtime()
-
- Returns the time the ``robots.txt`` file was last fetched. This is
- useful for long-running web spiders that need to check for new
- ``robots.txt`` files periodically.
-
-
- .. method:: modified()
-
- Sets the time the ``robots.txt`` file was last fetched to the current
- time.
-
-The following example demonstrates basic use of the RobotFileParser class. ::
-
- >>> import robotparser
- >>> rp = robotparser.RobotFileParser()
- >>> rp.set_url("http://www.musi-cal.com/robots.txt")
- >>> rp.read()
- >>> rp.can_fetch("*", "http://www.musi-cal.com/cgi-bin/search?city=San+Francisco")
- False
- >>> rp.can_fetch("*", "http://www.musi-cal.com/")
- True
-
diff --git a/Doc/library/urllib.error.rst b/Doc/library/urllib.error.rst
index 1cbfe7d..bd76860 100644
--- a/Doc/library/urllib.error.rst
+++ b/Doc/library/urllib.error.rst
@@ -2,47 +2,47 @@
==================================================================
.. module:: urllib.error
- :synopsis: Next generation URL opening library.
+ :synopsis: Exception classes raised by urllib.request.
.. moduleauthor:: Jeremy Hylton <jhylton@users.sourceforge.net>
.. sectionauthor:: Senthil Kumaran <orsenthil@gmail.com>
-The :mod:`urllib.error` module defines exception classes raise by
-urllib.request. The base exception class is URLError, which inherits from
-IOError.
+The :mod:`urllib.error` module defines the exception classes for exceptions
+raised by :mod:`urllib.request`. The base exception class is :exc:`URLError`,
+which inherits from :exc:`IOError`.
The following exceptions are raised by :mod:`urllib.error` as appropriate:
-
.. exception:: URLError
- The handlers raise this exception (or derived exceptions) when they run into a
- problem. It is a subclass of :exc:`IOError`.
+ The handlers raise this exception (or derived exceptions) when they run into
+ a problem. It is a subclass of :exc:`IOError`.
.. attribute:: reason
- The reason for this error. It can be a message string or another exception
- instance (:exc:`socket.error` for remote URLs, :exc:`OSError` for local
- URLs).
+ The reason for this error. It can be a message string or another
+ exception instance (:exc:`socket.error` for remote URLs, :exc:`OSError`
+ for local URLs).
.. exception:: HTTPError
- Though being an exception (a subclass of :exc:`URLError`), an :exc:`HTTPError`
- can also function as a non-exceptional file-like return value (the same thing
- that :func:`urlopen` returns). This is useful when handling exotic HTTP
- errors, such as requests for authentication.
+ Though being an exception (a subclass of :exc:`URLError`), an
+ :exc:`HTTPError` can also function as a non-exceptional file-like return
+ value (the same thing that :func:`urlopen` returns). This is useful when
+ handling exotic HTTP errors, such as requests for authentication.
.. attribute:: code
- An HTTP status code as defined in `RFC 2616 <http://www.faqs.org/rfcs/rfc2616.html>`_.
- This numeric value corresponds to a value found in the dictionary of
- codes as found in :attr:`http.server.BaseHTTPRequestHandler.responses`.
+ An HTTP status code as defined in `RFC 2616
+ <http://www.faqs.org/rfcs/rfc2616.html>`_. This numeric value corresponds
+ to a value found in the dictionary of codes as found in
+ :attr:`http.server.BaseHTTPRequestHandler.responses`.
.. exception:: ContentTooShortError(msg[, content])
- This exception is raised when the :func:`urlretrieve` function detects that the
- amount of the downloaded data is less than the expected amount (given by the
- *Content-Length* header). The :attr:`content` attribute stores the downloaded
- (and supposedly truncated) data.
+ This exception is raised when the :func:`urlretrieve` function detects that
+ the amount of the downloaded data is less than the expected amount (given by
+ the *Content-Length* header). The :attr:`content` attribute stores the
+ downloaded (and supposedly truncated) data.
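
Note (illustration only, not part of the patch): the exception hierarchy described in the urllib.error hunk above can be sketched as follows. The helper name ``describe``, the example URL and the hand-built exception instances are assumptions made for this sketch; in real code the exceptions would be raised by ``urllib.request.urlopen``. The sketch assumes a current Python 3, where ``IOError`` is an alias of ``OSError``.

```python
import io
from urllib.error import HTTPError, URLError


def describe(exc):
    """Return a short description of a urllib error (hypothetical helper)."""
    # Check HTTPError first: it is a subclass of URLError, so the
    # more specific class has to be tested before the base class.
    if isinstance(exc, HTTPError):
        return "HTTP error %d" % exc.code
    if isinstance(exc, URLError):
        return "URL error: %s" % (exc.reason,)
    raise TypeError("not a urllib error")


# Build the exceptions by hand so that no network access is needed.
http_exc = HTTPError("http://example.invalid/", 404, "Not Found", {},
                     io.BytesIO(b""))
url_exc = URLError("no route to host")

print(describe(http_exc))
print(describe(url_exc))
```

Because ``HTTPError`` derives from ``URLError``, an ``except URLError`` clause placed before ``except HTTPError`` would catch both; ordering the checks from most to least specific avoids that.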
diff --git a/Doc/library/urllib.parse.rst b/Doc/library/urllib.parse.rst
index affa406..a5463e6 100644
--- a/Doc/library/urllib.parse.rst
+++ b/Doc/library/urllib.parse.rst
@@ -20,13 +20,12 @@ to an absolute URL given a "base URL."
The module has been designed to match the Internet RFC on Relative Uniform
Resource Locators (and discovered a bug in an earlier draft!). It supports the
following URL schemes: ``file``, ``ftp``, ``gopher``, ``hdl``, ``http``,
-``https``, ``imap``, ``mailto``, ``mms``, ``news``, ``nntp``, ``prospero``,
-``rsync``, ``rtsp``, ``rtspu``, ``sftp``, ``shttp``, ``sip``, ``sips``,
-``snews``, ``svn``, ``svn+ssh``, ``telnet``, ``wais``.
+``https``, ``imap``, ``mailto``, ``mms``, ``news``, ``nntp``, ``prospero``,
+``rsync``, ``rtsp``, ``rtspu``, ``sftp``, ``shttp``, ``sip``, ``sips``,
+``snews``, ``svn``, ``svn+ssh``, ``telnet``, ``wais``.
The :mod:`urllib.parse` module defines the following functions:
-
.. function:: urlparse(urlstring[, default_scheme[, allow_fragments]])
Parse a URL into six components, returning a 6-tuple. This corresponds to the
@@ -92,11 +91,11 @@ The :mod:`urllib.parse` module defines the following functions:
.. function:: urlunparse(parts)
- Construct a URL from a tuple as returned by ``urlparse()``. The *parts* argument
- can be any six-item iterable. This may result in a slightly different, but
- equivalent URL, if the URL that was parsed originally had unnecessary delimiters
- (for example, a ? with an empty query; the RFC states that these are
- equivalent).
+ Construct a URL from a tuple as returned by ``urlparse()``. The *parts*
+ argument can be any six-item iterable. This may result in a slightly
+ different, but equivalent URL, if the URL that was parsed originally had
+ unnecessary delimiters (for example, a ``?`` with an empty query; the RFC
+ states that these are equivalent).
.. function:: urlsplit(urlstring[, default_scheme[, allow_fragments]])
@@ -140,19 +139,19 @@ The :mod:`urllib.parse` module defines the following functions:
.. function:: urlunsplit(parts)
- Combine the elements of a tuple as returned by :func:`urlsplit` into a complete
- URL as a string. The *parts* argument can be any five-item iterable. This may
- result in a slightly different, but equivalent URL, if the URL that was parsed
- originally had unnecessary delimiters (for example, a ? with an empty query; the
- RFC states that these are equivalent).
+ Combine the elements of a tuple as returned by :func:`urlsplit` into a
+ complete URL as a string. The *parts* argument can be any five-item
+ iterable. This may result in a slightly different, but equivalent URL, if the
+ URL that was parsed originally had unnecessary delimiters (for example, a ?
+ with an empty query; the RFC states that these are equivalent).
.. function:: urljoin(base, url[, allow_fragments])
Construct a full ("absolute") URL by combining a "base URL" (*base*) with
another URL (*url*). Informally, this uses components of the base URL, in
- particular the addressing scheme, the network location and (part of) the path,
- to provide missing components in the relative URL. For example:
+ particular the addressing scheme, the network location and (part of) the
+ path, to provide missing components in the relative URL. For example:
>>> from urllib.parse import urljoin
>>> urljoin('http://www.cwi.nl/%7Eguido/Python.html', 'FAQ.html')
@@ -178,10 +177,10 @@ The :mod:`urllib.parse` module defines the following functions:
.. function:: urldefrag(url)
- If *url* contains a fragment identifier, returns a modified version of *url*
- with no fragment identifier, and the fragment identifier as a separate string.
- If there is no fragment identifier in *url*, returns *url* unmodified and an
- empty string.
+ If *url* contains a fragment identifier, return a modified version of *url*
+ with no fragment identifier, and the fragment identifier as a separate
+ string. If there is no fragment identifier in *url*, return *url* unmodified
+ and an empty string.
.. function:: quote(string[, safe])
@@ -195,9 +194,10 @@ The :mod:`urllib.parse` module defines the following functions:
.. function:: quote_plus(string[, safe])
- Like :func:`quote`, but also replaces spaces by plus signs, as required for
- quoting HTML form values. Plus signs in the original string are escaped unless
- they are included in *safe*. It also does not have *safe* default to ``'/'``.
+ Like :func:`quote`, but also replace spaces by plus signs, as required for
+ quoting HTML form values. Plus signs in the original string are escaped
+ unless they are included in *safe*. It also does not have *safe* default to
+ ``'/'``.
.. function:: unquote(string)
@@ -209,7 +209,7 @@ The :mod:`urllib.parse` module defines the following functions:
.. function:: unquote_plus(string)
- Like :func:`unquote`, but also replaces plus signs by spaces, as required for
+ Like :func:`unquote`, but also replace plus signs by spaces, as required for
unquoting HTML form values.
@@ -254,7 +254,6 @@ The result objects from the :func:`urlparse` and :func:`urlsplit` functions are
subclasses of the :class:`tuple` type. These subclasses add the attributes
described in those functions, as well as provide an additional method:
-
.. method:: ParseResult.geturl()
Return the re-combined version of the original URL as a string. This may differ
@@ -279,13 +278,12 @@ described in those functions, as well as provide an additional method:
The following classes provide the implementations of the parse results::
-
.. class:: BaseResult
- Base class for the concrete result classes. This provides most of the attribute
- definitions. It does not provide a :meth:`geturl` method. It is derived from
- :class:`tuple`, but does not override the :meth:`__init__` or :meth:`__new__`
- methods.
+ Base class for the concrete result classes. This provides most of the
+ attribute definitions. It does not provide a :meth:`geturl` method. It is
+ derived from :class:`tuple`, but does not override the :meth:`__init__` or
+ :meth:`__new__` methods.
.. class:: ParseResult(scheme, netloc, path, params, query, fragment)
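
Note (illustration only, not part of the patch): a minimal sketch of several functions touched by the urllib.parse hunks above, assuming a current Python 3. The URLs and strings are made-up sample values.

```python
from urllib.parse import (quote, quote_plus, unquote_plus, urldefrag,
                          urlparse, urlunparse)

# urlparse() returns a 6-tuple subclass; urlunparse() reassembles it.
url = "http://www.python.org/doc/index.html?x=1#top"
parts = urlparse(url)
rebuilt = urlunparse(parts)

# urldefrag() splits off the fragment identifier, if there is one.
base, fragment = urldefrag("http://www.python.org/doc/#intro")

# quote_plus() turns spaces into plus signs and escapes literal pluses;
# unlike quote(), its *safe* argument does not default to '/'.
form_value = quote_plus("San Francisco + Bay")
slash_quote = quote("/")
slash_quote_plus = quote_plus("/")

# unquote_plus() reverses the transformation.
original = unquote_plus(form_value)

print(len(parts), rebuilt)
print(base, fragment)
print(form_value, slash_quote, slash_quote_plus, original)
```

The round trip through ``urlparse()`` and ``urlunparse()`` reproduces this particular URL exactly because it contains no unnecessary delimiters.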
diff --git a/Doc/library/urllib.request.rst b/Doc/library/urllib.request.rst
index 4262836..d124d9a 100644
--- a/Doc/library/urllib.request.rst
+++ b/Doc/library/urllib.request.rst
@@ -7,9 +7,9 @@
.. sectionauthor:: Moshe Zadka <moshez@users.sourceforge.net>
-The :mod:`urllib.request` module defines functions and classes which help in opening
-URLs (mostly HTTP) in a complex world --- basic and digest authentication,
-redirections, cookies and more.
+The :mod:`urllib.request` module defines functions and classes which help in
+opening URLs (mostly HTTP) in a complex world --- basic and digest
+authentication, redirections, cookies and more.
The :mod:`urllib.request` module defines the following functions:
@@ -180,7 +180,7 @@ The following classes are provided:
the ``User-Agent`` header, which is used by a browser to identify itself --
some HTTP servers only allow requests coming from common browsers as opposed
to scripts. For example, Mozilla Firefox may identify itself as ``"Mozilla/5.0
- (X11; U; Linux i686) Gecko/20071127 Firefox/2.0.0.11"``, while :mod:`urllib2`'s
+ (X11; U; Linux i686) Gecko/20071127 Firefox/2.0.0.11"``, while :mod:`urllib`'s
default user agent string is ``"Python-urllib/2.6"`` (on Python 2.6).
The final two arguments are only of interest for correct handling of third-party
@@ -1005,10 +1005,11 @@ HTTPErrorProcessor Objects
For non-200 error codes, this simply passes the job on to the
:meth:`protocol_error_code` handler methods, via :meth:`OpenerDirector.error`.
- Eventually, :class:`urllib2.HTTPDefaultErrorHandler` will raise an
+ Eventually, :class:`HTTPDefaultErrorHandler` will raise an
:exc:`HTTPError` if no other handler handles the error.
-.. _urllib2-examples:
+
+.. _urllib-request-examples:
Examples
--------
@@ -1180,15 +1181,18 @@ The following example uses no proxies at all, overriding environment settings::
using the :mod:`ftplib` module, subclassing :class:`FancyURLOpener`, or changing
*_urlopener* to meet your needs.
+
+
:mod:`urllib.response` --- Response classes used by urllib.
===========================================================
+
.. module:: urllib.response
:synopsis: Response classes used by urllib.
The :mod:`urllib.response` module defines functions and classes which define a
-minimal file like interface, including read() and readline(). The typical
-response object is an addinfourl instance, which defines and info() method and
-that returns headers and a geturl() method that returns the url.
+minimal file like interface, including ``read()`` and ``readline()``. The
+typical response object is an addinfourl instance, which defines and ``info()``
+method and that returns headers and a ``geturl()`` method that returns the url.
Functions defined by this module are used internally by the
:mod:`urllib.request` module.
diff --git a/Doc/library/urllib.robotparser.rst b/Doc/library/urllib.robotparser.rst
index e351c56..0cac2ad 100644
--- a/Doc/library/urllib.robotparser.rst
+++ b/Doc/library/urllib.robotparser.rst
@@ -1,9 +1,8 @@
-
:mod:`urllib.robotparser` --- Parser for robots.txt
====================================================
.. module:: urllib.robotparser
- :synopsis: Loads a robots.txt file and answers questions about
+ :synopsis: Load a robots.txt file and answer questions about
fetchability of other URLs.
.. sectionauthor:: Skip Montanaro <skip@pobox.com>
@@ -25,42 +24,37 @@ structure of :file:`robots.txt` files, see http://www.robotstxt.org/orig.html.
This class provides a set of methods to read, parse and answer questions
about a single :file:`robots.txt` file.
-
.. method:: set_url(url)
Sets the URL referring to a :file:`robots.txt` file.
-
.. method:: read()
Reads the :file:`robots.txt` URL and feeds it to the parser.
-
.. method:: parse(lines)
Parses the lines argument.
-
.. method:: can_fetch(useragent, url)
Returns ``True`` if the *useragent* is allowed to fetch the *url*
according to the rules contained in the parsed :file:`robots.txt`
file.
-
.. method:: mtime()
Returns the time the ``robots.txt`` file was last fetched. This is
useful for long-running web spiders that need to check for new
``robots.txt`` files periodically.
-
.. method:: modified()
Sets the time the ``robots.txt`` file was last fetched to the current
time.
-The following example demonstrates basic use of the RobotFileParser class. ::
+
+The following example demonstrates basic use of the RobotFileParser class.
>>> import urllib.robotparser
>>> rp = urllib.robotparser.RobotFileParser()