author Jeremy Hylton <jeremy@alum.mit.edu> 2004-08-07 17:40:50 (GMT)
committer Jeremy Hylton <jeremy@alum.mit.edu> 2004-08-07 17:40:50 (GMT)
commit 5d9c3031c805ffb634688a6fcae0e7790688ce53
tree 43acb77a31e43498fe410844c30b83461fa66e4e /Lib/urllib2.py
parent 1baa2480215a2cd168e2fde10e640650c1807496
Fix urllib2.urlopen() handling of chunked content encoding.
The change to use the newer httplib interface admitted the possibility that we'd get an HTTP/1.1 chunked response, but the code didn't handle it correctly. The raw socket object can't be passed to addinfourl(), because it would read the undecoded response. Instead, addinfourl() must call HTTPResponse.read(), which will handle the decoding.

One extra wrinkle is that the HTTPResponse object can't be passed to addinfourl() either, because it doesn't implement readline() or readlines(). As a quick hack, use socket._fileobject(), which implements those methods on top of a read buffer. (suggested by mwh)

Finally, add some tests based on test_urllibnet. Thanks to Andrew Sawyers for originally reporting the chunked problem.
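To make the problem concrete: with Transfer-Encoding: chunked, the body on the wire is a series of hexadecimal size lines, each followed by that many bytes and a CRLF, ending with a zero-size chunk. Reading r.fp directly returns all of that framing, while HTTPResponse.read() strips it. The sketch below is a minimal Python 3 decoder written only for illustration, not httplib's own code; decode_chunked is a hypothetical name, and the sketch assumes a well-formed body and simply discards any trailer headers.

```python
import io


def decode_chunked(raw):
    """Decode an HTTP/1.1 chunked body from a binary file-like object.

    Hypothetical helper for illustration. Each chunk is
    '<hex-size>[;extensions]\\r\\n<data>\\r\\n'; a zero-size chunk ends the body.
    """
    body = bytearray()
    while True:
        size_line = raw.readline()
        if not size_line:
            raise ValueError("connection closed before the last chunk")
        # Chunk extensions after ';' are ignored.
        size = int(size_line.split(b";", 1)[0].strip(), 16)
        if size == 0:
            break
        data = raw.read(size)
        if len(data) != size:
            raise ValueError("truncated chunk")
        body += data
        # Every chunk's data is followed by a CRLF that is framing, not payload.
        if raw.read(2) != b"\r\n":
            raise ValueError("missing CRLF after chunk data")
    # Skip optional trailer headers up to the terminating blank line.
    while raw.readline() not in (b"\r\n", b"\n", b""):
        pass
    return bytes(body)


wire = b"5\r\nHello\r\n7\r\n, world\r\n0\r\n\r\n"
print(wire)                              # what reading r.fp directly would see
print(decode_chunked(io.BytesIO(wire)))  # what read() should return
```

The second print shows b'Hello, world'; the first shows the size lines and CRLFs that addinfourl() would otherwise have handed to the caller.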
Diffstat (limited to 'Lib/urllib2.py')
-rw-r--r-- Lib/urllib2.py 16
1 files changed, 14 insertions, 2 deletions
diff --git a/Lib/urllib2.py b/Lib/urllib2.py
index c525f8c..9ec8b9b 100644
--- a/Lib/urllib2.py
+++ b/Lib/urllib2.py
@@ -997,8 +997,20 @@ class AbstractHTTPHandler(BaseHandler):
             raise URLError(err)
 
         # Pick apart the HTTPResponse object to get the addinfourl
-        # object initialized properly
-        resp = addinfourl(r.fp, r.msg, req.get_full_url())
+        # object initialized properly.
+
+        # Wrap the HTTPResponse object in socket's file object adapter
+        # for Windows. That adapter calls recv(), so delegate recv()
+        # to read(). This weird wrapping allows the returned object to
+        # have readline() and readlines() methods.
+
+        # XXX It might be better to extract the read buffering code
+        # out of socket._fileobject() and into a base class.
+
+        r.recv = r.read
+        fp = socket._fileobject(r)
+
+        resp = addinfourl(fp, r.msg, req.get_full_url())
         resp.code = r.status
         resp.msg = r.reason
         return resp
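The wrapping in this patch works because socket._fileobject only asks its underlying object for recv(n). Aliasing r.recv = r.read therefore lets it buffer already-decoded bytes from HTTPResponse.read() and split them into lines. socket._fileobject is a private Python 2 class that no longer exists in Python 3, so the sketch below is a simplified, hypothetical re-creation of that buffering role, not its actual code. ReadlineAdapter and FakeResponse are illustrative names, and FakeResponse stands in for an object that, like HTTPResponse here, exposes only read().

```python
import io


class ReadlineAdapter:
    """Hypothetical stand-in for socket._fileobject's role in the patch:
    pull bytes through source.recv(n) into a buffer and serve
    read(), readline() and readlines() from that buffer."""

    def __init__(self, source, bufsize=8192):
        self._source = source
        self._bufsize = bufsize
        self._buf = b""

    def _fill(self):
        # Returns False once the source is exhausted (recv returns b"").
        data = self._source.recv(self._bufsize)
        self._buf += data
        return bool(data)

    def read(self, n=-1):
        if n < 0:
            while self._fill():
                pass
            data, self._buf = self._buf, b""
            return data
        while len(self._buf) < n and self._fill():
            pass
        data, self._buf = self._buf[:n], self._buf[n:]
        return data

    def readline(self):
        while b"\n" not in self._buf and self._fill():
            pass
        idx = self._buf.find(b"\n")
        end = len(self._buf) if idx < 0 else idx + 1
        line, self._buf = self._buf[:end], self._buf[end:]
        return line

    def readlines(self):
        lines = []
        while True:
            line = self.readline()
            if not line:
                return lines
            lines.append(line)


class FakeResponse:
    """Hypothetical object that, like the 2004 HTTPResponse, has only read()."""

    def __init__(self, body):
        self._stream = io.BytesIO(body)

    def read(self, n=-1):
        return self._stream.read(n)


r = FakeResponse(b"line one\nline two\nlast")
r.recv = r.read            # the same delegation trick as the patch
fp = ReadlineAdapter(r)
print(fp.readline())       # b'line one\n'
print(fp.readlines())      # [b'line two\n', b'last']
```

The design choice mirrors the patch's XXX comment: the line-splitting logic lives in a generic buffering layer that needs only a recv()-style byte source, so any object with a suitable read() can borrow it.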