Commit ef91bb26 authored by Martin Panter's avatar Martin Panter

Issue #12319: Always send file request bodies using chunked encoding

The previous attempt to determine the file’s Content-Length gave a false
positive for pipes on Windows.

Also, drop the special case for sending zero-length iterable bodies.
parent 8f96a306
...@@ -240,17 +240,17 @@ HTTPConnection Objects ...@@ -240,17 +240,17 @@ HTTPConnection Objects
The *headers* argument should be a mapping of extra HTTP headers to send The *headers* argument should be a mapping of extra HTTP headers to send
with the request. with the request.
If *headers* contains neither Content-Length nor Transfer-Encoding, a If *headers* contains neither Content-Length nor Transfer-Encoding,
Content-Length header will be added automatically if possible. If but there is a request body, one of those
header fields will be added automatically. If
*body* is ``None``, the Content-Length header is set to ``0`` for *body* is ``None``, the Content-Length header is set to ``0`` for
methods that expect a body (``PUT``, ``POST``, and ``PATCH``). If methods that expect a body (``PUT``, ``POST``, and ``PATCH``). If
*body* is a string or bytes-like object, the Content-Length header is *body* is a string or a bytes-like object that is not also a
set to its length. If *body* is a binary :term:`file object` :term:`file <file object>`, the Content-Length header is
supporting :meth:`~io.IOBase.seek`, this will be used to determine set to its length. Any other type of *body* (files
its size. Otherwise, the Content-Length header is not added and iterables in general) will be chunk-encoded, and the
automatically. In cases where determining the Content-Length up Transfer-Encoding header will automatically be set instead of
front is not possible, the body will be chunk-encoded and the Content-Length.
Transfer-Encoding header will automatically be set.
The *encode_chunked* argument is only relevant if Transfer-Encoding is The *encode_chunked* argument is only relevant if Transfer-Encoding is
specified in *headers*. If *encode_chunked* is ``False``, the specified in *headers*. If *encode_chunked* is ``False``, the
...@@ -260,19 +260,18 @@ HTTPConnection Objects ...@@ -260,19 +260,18 @@ HTTPConnection Objects
.. note:: .. note::
Chunked transfer encoding has been added to the HTTP protocol Chunked transfer encoding has been added to the HTTP protocol
version 1.1. Unless the HTTP server is known to handle HTTP 1.1, version 1.1. Unless the HTTP server is known to handle HTTP 1.1,
the caller must either specify the Content-Length or must use a the caller must either specify the Content-Length, or must pass a
body representation whose length can be determined automatically. :class:`str` or bytes-like object that is not also a file as the
body representation.
.. versionadded:: 3.2 .. versionadded:: 3.2
*body* can now be an iterable. *body* can now be an iterable.
.. versionchanged:: 3.6 .. versionchanged:: 3.6
If neither Content-Length nor Transfer-Encoding are set in If neither Content-Length nor Transfer-Encoding are set in
*headers* and Content-Length cannot be determined, *body* will now *headers*, file and iterable *body* objects are now chunk-encoded.
be automatically chunk-encoded. The *encode_chunked* argument The *encode_chunked* argument was added.
was added. No attempt is made to determine the Content-Length for file
The Content-Length for binary file objects is determined with seek.
No attempt is made to determine the Content-Length for text file
objects. objects.
.. method:: HTTPConnection.getresponse() .. method:: HTTPConnection.getresponse()
......
...@@ -187,12 +187,11 @@ The following classes are provided: ...@@ -187,12 +187,11 @@ The following classes are provided:
server, or ``None`` if no such data is needed. Currently HTTP server, or ``None`` if no such data is needed. Currently HTTP
requests are the only ones that use *data*. The supported object requests are the only ones that use *data*. The supported object
types include bytes, file-like objects, and iterables. If no types include bytes, file-like objects, and iterables. If no
``Content-Length`` header has been provided, :class:`HTTPHandler` will ``Content-Length`` nor ``Transfer-Encoding`` header field
try to determine the length of *data* and set this header accordingly. has been provided, :class:`HTTPHandler` will set these headers according
If this fails, ``Transfer-Encoding: chunked`` as specified in to the type of *data*. ``Content-Length`` will be used to send
:rfc:`7230`, Section 3.3.1 will be used to send the data. See bytes objects, while ``Transfer-Encoding: chunked`` as specified in
:meth:`http.client.HTTPConnection.request` for details on the :rfc:`7230`, Section 3.3.1 will be used to send files and other iterables.
supported object types and on how the content length is determined.
For an HTTP POST request method, *data* should be a buffer in the For an HTTP POST request method, *data* should be a buffer in the
standard :mimetype:`application/x-www-form-urlencoded` format. The standard :mimetype:`application/x-www-form-urlencoded` format. The
...@@ -256,8 +255,8 @@ The following classes are provided: ...@@ -256,8 +255,8 @@ The following classes are provided:
.. versionchanged:: 3.6 .. versionchanged:: 3.6
Do not raise an error if the ``Content-Length`` has not been Do not raise an error if the ``Content-Length`` has not been
provided and could not be determined. Fall back to use chunked provided and *data* is neither ``None`` nor a bytes object.
transfer encoding instead. Fall back to use chunked transfer encoding instead.
.. class:: OpenerDirector() .. class:: OpenerDirector()
......
...@@ -579,8 +579,8 @@ The :class:`~unittest.mock.Mock` class has the following improvements: ...@@ -579,8 +579,8 @@ The :class:`~unittest.mock.Mock` class has the following improvements:
urllib.request urllib.request
-------------- --------------
If a HTTP request has a non-empty body but no Content-Length header If a HTTP request has a file or iterable body (other than a
and the content length cannot be determined up front, rather than bytes object) but no Content-Length header, rather than
throwing an error, :class:`~urllib.request.AbstractHTTPHandler` now throwing an error, :class:`~urllib.request.AbstractHTTPHandler` now
falls back to use chunked transfer encoding. falls back to use chunked transfer encoding.
(Contributed by Demian Brecht and Rolf Krahl in :issue:`12319`.) (Contributed by Demian Brecht and Rolf Krahl in :issue:`12319`.)
...@@ -935,6 +935,13 @@ Changes in the Python API ...@@ -935,6 +935,13 @@ Changes in the Python API
This behavior has also been backported to earlier Python versions This behavior has also been backported to earlier Python versions
by Setuptools 26.0.0. by Setuptools 26.0.0.
* In the :mod:`urllib.request` module and the
:meth:`http.client.HTTPConnection.request` method, if no Content-Length
header field has been specified and the request body is a file object,
it is now sent with HTTP 1.1 chunked encoding. If a file object has to
be sent to a HTTP 1.0 server, the Content-Length value now has to be
specified by the caller. See :issue:`12319`.
Changes in the C API Changes in the C API
-------------------- --------------------
......
...@@ -805,35 +805,21 @@ class HTTPConnection: ...@@ -805,35 +805,21 @@ class HTTPConnection:
def _get_content_length(body, method): def _get_content_length(body, method):
"""Get the content-length based on the body. """Get the content-length based on the body.
If the body is "empty", we set Content-Length: 0 for methods If the body is None, we set Content-Length: 0 for methods that expect
that expect a body (RFC 7230, Section 3.3.2). If the body is a body (RFC 7230, Section 3.3.2). We also set the Content-Length for
set for other methods, we set the header provided we can any method if the body is a str or bytes-like object and not a file.
figure out what the length is.
""" """
if not body: if body is None:
# do an explicit check for not None here to distinguish # do an explicit check for not None here to distinguish
# between unset and set but empty # between unset and set but empty
if method.upper() in _METHODS_EXPECTING_BODY or body is not None: if method.upper() in _METHODS_EXPECTING_BODY:
return 0 return 0
else: else:
return None return None
if hasattr(body, 'read'): if hasattr(body, 'read'):
# file-like object. # file-like object.
if HTTPConnection._is_textIO(body): return None
# text streams are unpredictable because it depends on
# character encoding and line ending translation.
return None
else:
# Is it seekable?
try:
curpos = body.tell()
sz = body.seek(0, io.SEEK_END)
except (TypeError, AttributeError, OSError):
return None
else:
body.seek(curpos)
return sz - curpos
try: try:
# does it implement the buffer protocol (bytes, bytearray, array)? # does it implement the buffer protocol (bytes, bytearray, array)?
...@@ -1266,8 +1252,7 @@ class HTTPConnection: ...@@ -1266,8 +1252,7 @@ class HTTPConnection:
# the caller passes encode_chunked=True or the following # the caller passes encode_chunked=True or the following
# conditions hold: # conditions hold:
# 1. content-length has not been explicitly set # 1. content-length has not been explicitly set
# 2. the length of the body cannot be determined # 2. the body is a file or iterable, but not a str or bytes-like
# (e.g. it is a generator or unseekable file)
# 3. Transfer-Encoding has NOT been explicitly set by the caller # 3. Transfer-Encoding has NOT been explicitly set by the caller
if 'content-length' not in header_names: if 'content-length' not in header_names:
...@@ -1280,7 +1265,7 @@ class HTTPConnection: ...@@ -1280,7 +1265,7 @@ class HTTPConnection:
encode_chunked = False encode_chunked = False
content_length = self._get_content_length(body, method) content_length = self._get_content_length(body, method)
if content_length is None: if content_length is None:
if body: if body is not None:
if self.debuglevel > 0: if self.debuglevel > 0:
print('Unable to determine size of %r' % body) print('Unable to determine size of %r' % body)
encode_chunked = True encode_chunked = True
......
...@@ -381,6 +381,16 @@ class TransferEncodingTest(TestCase): ...@@ -381,6 +381,16 @@ class TransferEncodingTest(TestCase):
# same request # same request
self.assertNotIn('content-length', [k.lower() for k in headers]) self.assertNotIn('content-length', [k.lower() for k in headers])
def test_empty_body(self):
# Zero-length iterable should be treated like any other iterable
conn = client.HTTPConnection('example.com')
conn.sock = FakeSocket(b'')
conn.request('POST', '/', ())
_, headers, body = self._parse_request(conn.sock.data)
self.assertEqual(headers['Transfer-Encoding'], 'chunked')
self.assertNotIn('content-length', [k.lower() for k in headers])
self.assertEqual(body, b"0\r\n\r\n")
def _make_body(self, empty_lines=False): def _make_body(self, empty_lines=False):
lines = self.expected_body.split(b' ') lines = self.expected_body.split(b' ')
for idx, line in enumerate(lines): for idx, line in enumerate(lines):
...@@ -652,7 +662,9 @@ class BasicTest(TestCase): ...@@ -652,7 +662,9 @@ class BasicTest(TestCase):
def test_send_file(self): def test_send_file(self):
expected = (b'GET /foo HTTP/1.1\r\nHost: example.com\r\n' expected = (b'GET /foo HTTP/1.1\r\nHost: example.com\r\n'
b'Accept-Encoding: identity\r\nContent-Length:') b'Accept-Encoding: identity\r\n'
b'Transfer-Encoding: chunked\r\n'
b'\r\n')
with open(__file__, 'rb') as body: with open(__file__, 'rb') as body:
conn = client.HTTPConnection('example.com') conn = client.HTTPConnection('example.com')
...@@ -1717,7 +1729,7 @@ class RequestBodyTest(TestCase): ...@@ -1717,7 +1729,7 @@ class RequestBodyTest(TestCase):
self.assertEqual("5", message.get("content-length")) self.assertEqual("5", message.get("content-length"))
self.assertEqual(b'body\xc1', f.read()) self.assertEqual(b'body\xc1', f.read())
def test_file_body(self): def test_text_file_body(self):
self.addCleanup(support.unlink, support.TESTFN) self.addCleanup(support.unlink, support.TESTFN)
with open(support.TESTFN, "w") as f: with open(support.TESTFN, "w") as f:
f.write("body") f.write("body")
...@@ -1726,10 +1738,8 @@ class RequestBodyTest(TestCase): ...@@ -1726,10 +1738,8 @@ class RequestBodyTest(TestCase):
message, f = self.get_headers_and_fp() message, f = self.get_headers_and_fp()
self.assertEqual("text/plain", message.get_content_type()) self.assertEqual("text/plain", message.get_content_type())
self.assertIsNone(message.get_charset()) self.assertIsNone(message.get_charset())
# Note that the length of text files is unpredictable # No content-length will be determined for files; the body
# because it depends on character encoding and line ending # will be sent using chunked transfer encoding instead.
# translation. No content-length will be set, the body
# will be sent using chunked transfer encoding.
self.assertIsNone(message.get("content-length")) self.assertIsNone(message.get("content-length"))
self.assertEqual("chunked", message.get("transfer-encoding")) self.assertEqual("chunked", message.get("transfer-encoding"))
self.assertEqual(b'4\r\nbody\r\n0\r\n\r\n', f.read()) self.assertEqual(b'4\r\nbody\r\n0\r\n\r\n', f.read())
...@@ -1743,8 +1753,9 @@ class RequestBodyTest(TestCase): ...@@ -1743,8 +1753,9 @@ class RequestBodyTest(TestCase):
message, f = self.get_headers_and_fp() message, f = self.get_headers_and_fp()
self.assertEqual("text/plain", message.get_content_type()) self.assertEqual("text/plain", message.get_content_type())
self.assertIsNone(message.get_charset()) self.assertIsNone(message.get_charset())
self.assertEqual("5", message.get("content-length")) self.assertEqual("chunked", message.get("Transfer-Encoding"))
self.assertEqual(b'body\xc1', f.read()) self.assertNotIn("Content-Length", message)
self.assertEqual(b'5\r\nbody\xc1\r\n0\r\n\r\n', f.read())
class HTTPResponseTest(TestCase): class HTTPResponseTest(TestCase):
......
...@@ -913,40 +913,50 @@ class HandlerTests(unittest.TestCase): ...@@ -913,40 +913,50 @@ class HandlerTests(unittest.TestCase):
self.assertEqual(req.unredirected_hdrs["Spam"], "foo") self.assertEqual(req.unredirected_hdrs["Spam"], "foo")
def test_http_body_file(self): def test_http_body_file(self):
# A regular file - Content Length is calculated unless already set. # A regular file - chunked encoding is used unless Content Length is
# already set.
h = urllib.request.AbstractHTTPHandler() h = urllib.request.AbstractHTTPHandler()
o = h.parent = MockOpener() o = h.parent = MockOpener()
file_obj = tempfile.NamedTemporaryFile(mode='w+b', delete=False) file_obj = tempfile.NamedTemporaryFile(mode='w+b', delete=False)
file_path = file_obj.name file_path = file_obj.name
file_obj.write(b"Something\nSomething\nSomething\n")
file_obj.close() file_obj.close()
self.addCleanup(os.unlink, file_path)
for headers in {}, {"Content-Length": 30}: with open(file_path, "rb") as f:
with open(file_path, "rb") as f: req = Request("http://example.com/", f, {})
req = Request("http://example.com/", f, headers) newreq = h.do_request_(req)
newreq = h.do_request_(req) te = newreq.get_header('Transfer-encoding')
self.assertEqual(int(newreq.get_header('Content-length')), 30) self.assertEqual(te, "chunked")
self.assertFalse(newreq.has_header('Content-length'))
os.unlink(file_path) with open(file_path, "rb") as f:
req = Request("http://example.com/", f, {"Content-Length": 30})
newreq = h.do_request_(req)
self.assertEqual(int(newreq.get_header('Content-length')), 30)
self.assertFalse(newreq.has_header("Transfer-encoding"))
def test_http_body_fileobj(self): def test_http_body_fileobj(self):
# A file object - Content Length is calculated unless already set. # A file object - chunked encoding is used
# unless Content Length is already set.
# (Note that there are some subtle differences to a regular # (Note that there are some subtle differences to a regular
# file, that is why we are testing both cases.) # file, that is why we are testing both cases.)
h = urllib.request.AbstractHTTPHandler() h = urllib.request.AbstractHTTPHandler()
o = h.parent = MockOpener() o = h.parent = MockOpener()
file_obj = io.BytesIO() file_obj = io.BytesIO()
file_obj.write(b"Something\nSomething\nSomething\n")
for headers in {}, {"Content-Length": 30}: req = Request("http://example.com/", file_obj, {})
file_obj.seek(0) newreq = h.do_request_(req)
req = Request("http://example.com/", file_obj, headers) self.assertEqual(newreq.get_header('Transfer-encoding'), 'chunked')
newreq = h.do_request_(req) self.assertFalse(newreq.has_header('Content-length'))
self.assertEqual(int(newreq.get_header('Content-length')), 30)
headers = {"Content-Length": 30}
req = Request("http://example.com/", file_obj, headers)
newreq = h.do_request_(req)
self.assertEqual(int(newreq.get_header('Content-length')), 30)
self.assertFalse(newreq.has_header("Transfer-encoding"))
file_obj.close() file_obj.close()
...@@ -959,9 +969,7 @@ class HandlerTests(unittest.TestCase): ...@@ -959,9 +969,7 @@ class HandlerTests(unittest.TestCase):
h = urllib.request.AbstractHTTPHandler() h = urllib.request.AbstractHTTPHandler()
o = h.parent = MockOpener() o = h.parent = MockOpener()
cmd = [sys.executable, "-c", cmd = [sys.executable, "-c", r"pass"]
r"import sys; "
r"sys.stdout.buffer.write(b'Something\nSomething\nSomething\n')"]
for headers in {}, {"Content-Length": 30}: for headers in {}, {"Content-Length": 30}:
with subprocess.Popen(cmd, stdout=subprocess.PIPE) as proc: with subprocess.Popen(cmd, stdout=subprocess.PIPE) as proc:
req = Request("http://example.com/", proc.stdout, headers) req = Request("http://example.com/", proc.stdout, headers)
...@@ -983,8 +991,6 @@ class HandlerTests(unittest.TestCase): ...@@ -983,8 +991,6 @@ class HandlerTests(unittest.TestCase):
def iterable_body(): def iterable_body():
yield b"one" yield b"one"
yield b"two"
yield b"three"
for headers in {}, {"Content-Length": 11}: for headers in {}, {"Content-Length": 11}:
req = Request("http://example.com/", iterable_body(), headers) req = Request("http://example.com/", iterable_body(), headers)
...@@ -996,6 +1002,14 @@ class HandlerTests(unittest.TestCase): ...@@ -996,6 +1002,14 @@ class HandlerTests(unittest.TestCase):
else: else:
self.assertEqual(int(newreq.get_header('Content-length')), 11) self.assertEqual(int(newreq.get_header('Content-length')), 11)
def test_http_body_empty_seq(self):
# Zero-length iterable body should be treated like any other iterable
h = urllib.request.AbstractHTTPHandler()
h.parent = MockOpener()
req = h.do_request_(Request("http://example.com/", ()))
self.assertEqual(req.get_header("Transfer-encoding"), "chunked")
self.assertFalse(req.has_header("Content-length"))
def test_http_body_array(self): def test_http_body_array(self):
# array.array Iterable - Content Length is calculated # array.array Iterable - Content Length is calculated
......
...@@ -52,10 +52,9 @@ Library ...@@ -52,10 +52,9 @@ Library
- Issue #12319: Chunked transfer encoding support added to - Issue #12319: Chunked transfer encoding support added to
http.client.HTTPConnection requests. The http.client.HTTPConnection requests. The
urllib.request.AbstractHTTPHandler class does not enforce a Content-Length urllib.request.AbstractHTTPHandler class does not enforce a Content-Length
header any more. If a HTTP request has a non-empty body, but no header any more. If a HTTP request has a file or iterable body, but no
Content-Length header, and the content length cannot be determined Content-Length header, the library now falls back to use chunked transfer-
up front, rather than throwing an error, the library now falls back encoding.
to use chunked transfer encoding.
- A new version of typing.py from https://github.com/python/typing: - A new version of typing.py from https://github.com/python/typing:
- Collection (only for 3.6) (Issue #27598) - Collection (only for 3.6) (Issue #27598)
......
Markdown is supported
0%
or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment