Commit e53d977e authored by Senthil Kumaran

Explain the use of charset parameter with Content-Type header: issue11082

parents df2aecbf 6b3434ae
...@@ -512,9 +512,10 @@ task isn't already covered by the URL parsing functions above. ...@@ -512,9 +512,10 @@ task isn't already covered by the URL parsing functions above.
Convert a mapping object or a sequence of two-element tuples, which may Convert a mapping object or a sequence of two-element tuples, which may
either be a :class:`str` or a :class:`bytes`, to a "percent-encoded" either be a :class:`str` or a :class:`bytes`, to a "percent-encoded"
string. The resultant string must be converted to bytes using the string. If the resultant string is to be used as a *data* for POST
user-specified encoding before it is sent to :func:`urlopen` as the optional operation with :func:`urlopen` function, then it should be properly encoded
*data* argument. to bytes, otherwise it would result in a :exc:`TypeError`.
The resulting string is a series of ``key=value`` pairs separated by ``'&'`` The resulting string is a series of ``key=value`` pairs separated by ``'&'``
characters, where both *key* and *value* are quoted using :func:`quote_plus` characters, where both *key* and *value* are quoted using :func:`quote_plus`
above. When a sequence of two-element tuples is used as the *query* above. When a sequence of two-element tuples is used as the *query*
......
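A minimal sketch of the behaviour the reworded paragraph describes: :func:`urllib.parse.urlencode` returns a ``str``, which has to be encoded to bytes before it can serve as POST *data*. The field names and values are hypothetical, chosen only for illustration.

```python
import urllib.parse

# Hypothetical form fields; a sequence of two-element tuples keeps the order.
fields = [("spam", 1), ("eggs", 2), ("name", "café au lait")]

# urlencode() quotes keys and values with quote_plus() and returns a str.
query = urllib.parse.urlencode(fields)

# The percent-encoded result is plain ASCII, so encoding it to bytes is lossless.
body = query.encode("ascii")

print(query)
print(type(body).__name__)
```

Non-ASCII characters in the values are percent-encoded as UTF-8 by default; the *encoding* argument of ``urlencode`` changes that choice.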
...@@ -2,9 +2,10 @@ ...@@ -2,9 +2,10 @@
============================================================= =============================================================
.. module:: urllib.request .. module:: urllib.request
:synopsis: Next generation URL opening library. :synopsis: Extensible library for opening URLs.
.. moduleauthor:: Jeremy Hylton <jeremy@alum.mit.edu> .. moduleauthor:: Jeremy Hylton <jeremy@alum.mit.edu>
.. sectionauthor:: Moshe Zadka <moshez@users.sourceforge.net> .. sectionauthor:: Moshe Zadka <moshez@users.sourceforge.net>
.. sectionauthor:: Senthil Kumaran <senthil@uthcode.com>
The :mod:`urllib.request` module defines functions and classes which help in The :mod:`urllib.request` module defines functions and classes which help in
...@@ -20,16 +21,26 @@ The :mod:`urllib.request` module defines the following functions: ...@@ -20,16 +21,26 @@ The :mod:`urllib.request` module defines the following functions:
Open the URL *url*, which can be either a string or a Open the URL *url*, which can be either a string or a
:class:`Request` object. :class:`Request` object.
*data* may be a bytes object specifying additional data to send to the *data* must be a bytes object specifying additional data to be sent to the
server, or ``None`` if no such data is needed. *data* may also be an server, or ``None`` if no such data is needed. *data* may also be an
iterable object and in that case Content-Length value must be specified in iterable object and in that case Content-Length value must be specified in
the headers. Currently HTTP requests are the only ones that use *data*; the the headers. Currently HTTP requests are the only ones that use *data*; the
HTTP request will be a POST instead of a GET when the *data* parameter is HTTP request will be a POST instead of a GET when the *data* parameter is
provided. *data* should be a buffer in the standard provided.
*data* should be a buffer in the standard
:mimetype:`application/x-www-form-urlencoded` format. The :mimetype:`application/x-www-form-urlencoded` format. The
:func:`urllib.parse.urlencode` function takes a mapping or sequence of :func:`urllib.parse.urlencode` function takes a mapping or sequence of
2-tuples and returns a string in this format. urllib.request module uses 2-tuples and returns a string in this format. It should be encoded to bytes
HTTP/1.1 and includes ``Connection:close`` header in its HTTP requests. before being used as the *data* parameter. The charset parameter in
``Content-Type`` header may be used to specify the encoding. If charset
parameter is not sent with the Content-Type header, the server following the
HTTP 1.1 recommendation may assume that the data is encoded in ISO-8859-1
encoding. It is advisable to use charset parameter with encoding used in
``Content-Type`` header with the :class:`Request`.
urllib.request module uses HTTP/1.1 and includes ``Connection:close`` header
in its HTTP requests.
The optional *timeout* parameter specifies a timeout in seconds for The optional *timeout* parameter specifies a timeout in seconds for
blocking operations like the connection attempt (if not specified, blocking operations like the connection attempt (if not specified,
...@@ -66,9 +77,10 @@ The :mod:`urllib.request` module defines the following functions: ...@@ -66,9 +77,10 @@ The :mod:`urllib.request` module defines the following functions:
are handled through the proxy when they are set. are handled through the proxy when they are set.
The legacy ``urllib.urlopen`` function from Python 2.6 and earlier has been The legacy ``urllib.urlopen`` function from Python 2.6 and earlier has been
discontinued; :func:`urlopen` corresponds to the old ``urllib2.urlopen``. discontinued; :func:`urllib.request.urlopen` corresponds to the old
Proxy handling, which was done by passing a dictionary parameter to ``urllib2.urlopen``. Proxy handling, which was done by passing a dictionary
``urllib.urlopen``, can be obtained by using :class:`ProxyHandler` objects. parameter to ``urllib.urlopen``, can be obtained by using
:class:`ProxyHandler` objects.
.. versionchanged:: 3.2 .. versionchanged:: 3.2
*cafile* and *capath* were added. *cafile* and *capath* were added.
...@@ -83,10 +95,11 @@ The :mod:`urllib.request` module defines the following functions: ...@@ -83,10 +95,11 @@ The :mod:`urllib.request` module defines the following functions:
.. function:: install_opener(opener) .. function:: install_opener(opener)
Install an :class:`OpenerDirector` instance as the default global opener. Install an :class:`OpenerDirector` instance as the default global opener.
Installing an opener is only necessary if you want urlopen to use that opener; Installing an opener is only necessary if you want urlopen to use that
otherwise, simply call :meth:`OpenerDirector.open` instead of :func:`urlopen`. opener; otherwise, simply call :meth:`OpenerDirector.open` instead of
The code does not check for a real :class:`OpenerDirector`, and any class with :func:`~urllib.request.urlopen`. The code does not check for a real
the appropriate interface will work. :class:`OpenerDirector`, and any class with the appropriate interface will
work.
.. function:: build_opener([handler, ...]) .. function:: build_opener([handler, ...])
...@@ -138,13 +151,21 @@ The following classes are provided: ...@@ -138,13 +151,21 @@ The following classes are provided:
*url* should be a string containing a valid URL. *url* should be a string containing a valid URL.
*data* may be a bytes object specifying additional data to send to the *data* must be a bytes object specifying additional data to send to the
server, or ``None`` if no such data is needed. Currently HTTP requests are server, or ``None`` if no such data is needed. Currently HTTP requests are
the only ones that use *data*; the HTTP request will be a POST instead of a the only ones that use *data*; the HTTP request will be a POST instead of a
GET when the *data* parameter is provided. *data* should be a buffer in the GET when the *data* parameter is provided. *data* should be a buffer in the
standard :mimetype:`application/x-www-form-urlencoded` format. The standard :mimetype:`application/x-www-form-urlencoded` format.
:func:`urllib.parse.urlencode` function takes a mapping or sequence of
2-tuples and returns a string in this format. The :func:`urllib.parse.urlencode` function takes a mapping or sequence of
2-tuples and returns a string in this format. It should be encoded to bytes
before being used as the *data* parameter. The charset parameter in
``Content-Type`` header may be used to specify the encoding. If charset
parameter is not sent with the Content-Type header, the server following the
HTTP 1.1 recommendation may assume that the data is encoded in ISO-8859-1
encoding. It is advisable to use charset parameter with encoding used in
``Content-Type`` header with the :class:`Request`.
*headers* should be a dictionary, and will be treated as if *headers* should be a dictionary, and will be treated as if
:meth:`add_header` was called with each key and value as arguments. :meth:`add_header` was called with each key and value as arguments.
...@@ -156,8 +177,11 @@ The following classes are provided: ...@@ -156,8 +177,11 @@ The following classes are provided:
:mod:`urllib`'s default user agent string is :mod:`urllib`'s default user agent string is
``"Python-urllib/2.6"`` (on Python 2.6). ``"Python-urllib/2.6"`` (on Python 2.6).
The following two arguments, *origin_req_host* and *unverifiable*, An example of using ``Content-Type`` header with *data* argument would be
are only of interest for correct handling of third-party HTTP cookies: sending a dictionary like ``{"Content-Type":" application/x-www-form-urlencoded;charset=utf-8"}``
The final two arguments are only of interest for correct handling
of third-party HTTP cookies:
*origin_req_host* should be the request-host of the origin *origin_req_host* should be the request-host of the origin
transaction, as defined by :rfc:`2965`. It defaults to transaction, as defined by :rfc:`2965`. It defaults to
...@@ -1107,8 +1131,9 @@ every :class:`Request`. To change this:: ...@@ -1107,8 +1131,9 @@ every :class:`Request`. To change this::
opener.open('http://www.example.com/') opener.open('http://www.example.com/')
Also, remember that a few standard headers (:mailheader:`Content-Length`, Also, remember that a few standard headers (:mailheader:`Content-Length`,
:mailheader:`Content-Type` and :mailheader:`Host`) are added when the :mailheader:`Content-Type` without charset parameter and :mailheader:`Host`)
:class:`Request` is passed to :func:`urlopen` (or :meth:`OpenerDirector.open`). are added when the :class:`Request` is passed to :func:`urlopen` (or
:meth:`OpenerDirector.open`).
.. _urllib-examples: .. _urllib-examples:
...@@ -1126,9 +1151,12 @@ from urlencode is encoded to bytes before it is sent to urlopen as data:: ...@@ -1126,9 +1151,12 @@ from urlencode is encoded to bytes before it is sent to urlopen as data::
>>> import urllib.request >>> import urllib.request
>>> import urllib.parse >>> import urllib.parse
>>> params = urllib.parse.urlencode({'spam': 1, 'eggs': 2, 'bacon': 0}) >>> data = urllib.parse.urlencode({'spam': 1, 'eggs': 2, 'bacon': 0})
>>> params = params.encode('utf-8') >>> data = data.encode('utf-8')
>>> f = urllib.request.urlopen("http://www.musi-cal.com/cgi-bin/query", params) >>> request = urllib.request.Request("http://requestb.in/xrbl82xr")
>>> # adding charset parameter to the Content-Type header.
>>> request.add_header("Content-Type","application/x-www-form-urlencoded;charset=utf-8")
>>> f = urllib.request.urlopen(request, data)
>>> print(f.read().decode('utf-8')) >>> print(f.read().decode('utf-8'))
The following example uses an explicitly specified HTTP proxy, overriding The following example uses an explicitly specified HTTP proxy, overriding
......
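The documented example can also be checked without touching the network. The sketch below builds the same kind of :class:`Request` and inspects it instead of opening it; the URL is a placeholder, not the one used in the commit.

```python
import urllib.parse
import urllib.request

# Encode the form fields, then convert the resulting str to bytes.
data = urllib.parse.urlencode({"spam": 1, "eggs": 2, "bacon": 0}).encode("utf-8")

# Placeholder URL (an assumption); nothing is sent in this sketch.
request = urllib.request.Request(
    "http://www.example.com/cgi-bin/query",
    data=data,
    # State the charset explicitly so the server need not guess.
    headers={"Content-Type": "application/x-www-form-urlencoded;charset=utf-8"},
)

method = request.get_method()
# Request stores header names capitalized, hence "Content-type".
content_type = request.get_header("Content-type")

print(method)
print(content_type)
```

Passing ``request`` to ``urllib.request.urlopen`` would then send the POST with the charset parameter included.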
...@@ -1172,8 +1172,9 @@ class AbstractHTTPHandler(BaseHandler): ...@@ -1172,8 +1172,9 @@ class AbstractHTTPHandler(BaseHandler):
if request.data is not None: # POST if request.data is not None: # POST
data = request.data data = request.data
if isinstance(data, str): if isinstance(data, str):
raise TypeError("POST data should be bytes" msg = "POST data should be bytes or an iterable of bytes."\
" or an iterable of bytes. It cannot be str.") "It cannot be str"
raise TypeError(msg)
if not request.has_header('Content-type'): if not request.has_header('Content-type'):
request.add_unredirected_header( request.add_unredirected_header(
'Content-type', 'Content-type',
......