Commit d05c9ff8 authored by Alexandre Vassalotti

Issue #6784: Strings from Python 2 can now be unpickled as bytes objects.

Initial patch by Merlijn van Deen.

I've added a few unrelated docstring fixes in the patch while I was at
it, which makes the documentation for pickle a bit more consistent.
parent ee07b947
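As a rough illustration of the behaviour described in the commit message, the sketch below loads one Python 2 `str` pickle three ways. The protocol 1 byte string is the one used in the tests added by this commit; the variable names are illustrative only, and a Python 3 interpreter that includes this change is assumed.

```python
import pickle

# Pickle produced by Python 2 for 'a\x00\xa0' with protocol=1
# (the same data appears in the tests added by this commit).
py2_pickle = b'U\x03a\x00\xa0.'

# New behaviour: encoding="bytes" returns the 8-bit string as a bytes object.
as_bytes = pickle.loads(py2_pickle, encoding="bytes")

# Naming a real codec still decodes the data into a str, as before.
as_latin1 = pickle.loads(py2_pickle, encoding="latin-1")

# The defaults ("ASCII", "strict") cannot decode the byte 0xa0.
try:
    pickle.loads(py2_pickle)
    default_failed = False
except UnicodeDecodeError:
    default_failed = True

print(as_bytes, repr(as_latin1), default_failed)
```

Passing `encoding="bytes"` is mainly useful when the Python 2 strings held binary data rather than text, so no single codec would be correct for them.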
...@@ -173,7 +173,7 @@ The :mod:`pickle` module provides the following constants: ...@@ -173,7 +173,7 @@ The :mod:`pickle` module provides the following constants:
An integer, the default :ref:`protocol version <pickle-protocols>` used An integer, the default :ref:`protocol version <pickle-protocols>` used
for pickling. May be less than :data:`HIGHEST_PROTOCOL`. Currently the for pickling. May be less than :data:`HIGHEST_PROTOCOL`. Currently the
default protocol is 3, a new protocol designed for Python 3.0. default protocol is 3, a new protocol designed for Python 3.
The :mod:`pickle` module provides the following functions to make the pickling The :mod:`pickle` module provides the following functions to make the pickling
...@@ -184,9 +184,9 @@ process more convenient: ...@@ -184,9 +184,9 @@ process more convenient:
Write a pickled representation of *obj* to the open :term:`file object` *file*. Write a pickled representation of *obj* to the open :term:`file object` *file*.
This is equivalent to ``Pickler(file, protocol).dump(obj)``. This is equivalent to ``Pickler(file, protocol).dump(obj)``.
The optional *protocol* argument tells the pickler to use the given protocol; The optional *protocol* argument tells the pickler to use the given
supported protocols are 0, 1, 2, 3. The default protocol is 3; a protocol; supported protocols are 0, 1, 2, 3. The default protocol is 3; a
backward-incompatible protocol designed for Python 3.0. backward-incompatible protocol designed for Python 3.
Specifying a negative protocol version selects the highest protocol version Specifying a negative protocol version selects the highest protocol version
supported. The higher the protocol used, the more recent the version of supported. The higher the protocol used, the more recent the version of
...@@ -198,64 +198,66 @@ process more convenient: ...@@ -198,64 +198,66 @@ process more convenient:
interface. interface.
If *fix_imports* is true and *protocol* is less than 3, pickle will try to If *fix_imports* is true and *protocol* is less than 3, pickle will try to
map the new Python 3.x names to the old module names used in Python 2.x, map the new Python 3 names to the old module names used in Python 2, so
so that the pickle data stream is readable with Python 2.x. that the pickle data stream is readable with Python 2.
.. function:: dumps(obj, protocol=None, \*, fix_imports=True) .. function:: dumps(obj, protocol=None, \*, fix_imports=True)
Return the pickled representation of the object as a :class:`bytes` Return the pickled representation of the object as a :class:`bytes` object,
object, instead of writing it to a file. instead of writing it to a file.
The optional *protocol* argument tells the pickler to use the given protocol; The optional *protocol* argument tells the pickler to use the given
supported protocols are 0, 1, 2, 3. The default protocol is 3; a protocol; supported protocols are 0, 1, 2, 3 and 4. The default protocol
backward-incompatible protocol designed for Python 3.0. is 3; a backward-incompatible protocol designed for Python 3.
Specifying a negative protocol version selects the highest protocol version Specifying a negative protocol version selects the highest protocol version
supported. The higher the protocol used, the more recent the version of supported. The higher the protocol used, the more recent the version of
Python needed to read the pickle produced. Python needed to read the pickle produced.
If *fix_imports* is true and *protocol* is less than 3, pickle will try to If *fix_imports* is true and *protocol* is less than 3, pickle will try to
map the new Python 3.x names to the old module names used in Python 2.x, map the new Python 3 names to the old module names used in Python 2, so
so that the pickle data stream is readable with Python 2.x. that the pickle data stream is readable with Python 2.
.. function:: load(file, \*, fix_imports=True, encoding="ASCII", errors="strict") .. function:: load(file, \*, fix_imports=True, encoding="ASCII", errors="strict")
Read a pickled object representation from the open :term:`file object` *file* Read a pickled object representation from the open :term:`file object`
and return the reconstituted object hierarchy specified therein. This is *file* and return the reconstituted object hierarchy specified therein.
equivalent to ``Unpickler(file).load()``. This is equivalent to ``Unpickler(file).load()``.
The protocol version of the pickle is detected automatically, so no protocol The protocol version of the pickle is detected automatically, so no
argument is needed. Bytes past the pickled object's representation are protocol argument is needed. Bytes past the pickled object's
ignored. representation are ignored.
The argument *file* must have two methods, a read() method that takes an The argument *file* must have two methods, a read() method that takes an
integer argument, and a readline() method that requires no arguments. Both integer argument, and a readline() method that requires no arguments. Both
methods should return bytes. Thus *file* can be an on-disk file opened methods should return bytes. Thus *file* can be an on-disk file opened for
for binary reading, a :class:`io.BytesIO` object, or any other custom object binary reading, a :class:`io.BytesIO` object, or any other custom object
that meets this interface. that meets this interface.
Optional keyword arguments are *fix_imports*, *encoding* and *errors*, Optional keyword arguments are *fix_imports*, *encoding* and *errors*,
which are used to control compatibility support for pickle stream generated which are used to control compatibility support for pickle stream generated
by Python 2.x. If *fix_imports* is true, pickle will try to map the old by Python 2. If *fix_imports* is true, pickle will try to map the old
Python 2.x names to the new names used in Python 3.x. The *encoding* and Python 2 names to the new names used in Python 3. The *encoding* and
*errors* tell pickle how to decode 8-bit string instances pickled by Python *errors* tell pickle how to decode 8-bit string instances pickled by Python
2.x; these default to 'ASCII' and 'strict', respectively. 2; these default to 'ASCII' and 'strict', respectively. The *encoding* can
be 'bytes' to read these 8-bit string instances as bytes objects.
.. function:: loads(bytes_object, \*, fix_imports=True, encoding="ASCII", errors="strict") .. function:: loads(bytes_object, \*, fix_imports=True, encoding="ASCII", errors="strict")
Read a pickled object hierarchy from a :class:`bytes` object and return the Read a pickled object hierarchy from a :class:`bytes` object and return the
reconstituted object hierarchy specified therein. reconstituted object hierarchy specified therein.
The protocol version of the pickle is detected automatically, so no protocol The protocol version of the pickle is detected automatically, so no
argument is needed. Bytes past the pickled object's representation are protocol argument is needed. Bytes past the pickled object's
ignored. representation are ignored.
Optional keyword arguments are *fix_imports*, *encoding* and *errors*, Optional keyword arguments are *fix_imports*, *encoding* and *errors*,
which are used to control compatibility support for pickle stream generated which are used to control compatibility support for pickle stream generated
by Python 2.x. If *fix_imports* is true, pickle will try to map the old by Python 2. If *fix_imports* is true, pickle will try to map the old
Python 2.x names to the new names used in Python 3.x. The *encoding* and Python 2 names to the new names used in Python 3. The *encoding* and
*errors* tell pickle how to decode 8-bit string instances pickled by Python *errors* tell pickle how to decode 8-bit string instances pickled by Python
2.x; these default to 'ASCII' and 'strict', respectively. 2; these default to 'ASCII' and 'strict', respectively. The *encoding* can
be 'bytes' to read these 8-bit string instances as bytes objects.
The :mod:`pickle` module defines three exceptions: The :mod:`pickle` module defines three exceptions:
...@@ -290,9 +292,9 @@ The :mod:`pickle` module exports two classes, :class:`Pickler` and ...@@ -290,9 +292,9 @@ The :mod:`pickle` module exports two classes, :class:`Pickler` and
This takes a binary file for writing a pickle data stream. This takes a binary file for writing a pickle data stream.
The optional *protocol* argument tells the pickler to use the given protocol; The optional *protocol* argument tells the pickler to use the given
supported protocols are 0, 1, 2, 3. The default protocol is 3; a protocol; supported protocols are 0, 1, 2, 3 and 4. The default protocol
backward-incompatible protocol designed for Python 3.0. is 3; a backward-incompatible protocol designed for Python 3.
Specifying a negative protocol version selects the highest protocol version Specifying a negative protocol version selects the highest protocol version
supported. The higher the protocol used, the more recent the version of supported. The higher the protocol used, the more recent the version of
...@@ -300,11 +302,12 @@ The :mod:`pickle` module exports two classes, :class:`Pickler` and ...@@ -300,11 +302,12 @@ The :mod:`pickle` module exports two classes, :class:`Pickler` and
The *file* argument must have a write() method that accepts a single bytes The *file* argument must have a write() method that accepts a single bytes
argument. It can thus be an on-disk file opened for binary writing, a argument. It can thus be an on-disk file opened for binary writing, a
:class:`io.BytesIO` instance, or any other custom object that meets this interface. :class:`io.BytesIO` instance, or any other custom object that meets this
interface.
If *fix_imports* is true and *protocol* is less than 3, pickle will try to If *fix_imports* is true and *protocol* is less than 3, pickle will try to
map the new Python 3.x names to the old module names used in Python 2.x, map the new Python 3 names to the old module names used in Python 2, so
so that the pickle data stream is readable with Python 2.x. that the pickle data stream is readable with Python 2.
.. method:: dump(obj) .. method:: dump(obj)
...@@ -366,16 +369,17 @@ The :mod:`pickle` module exports two classes, :class:`Pickler` and ...@@ -366,16 +369,17 @@ The :mod:`pickle` module exports two classes, :class:`Pickler` and
The argument *file* must have two methods, a read() method that takes an The argument *file* must have two methods, a read() method that takes an
integer argument, and a readline() method that requires no arguments. Both integer argument, and a readline() method that requires no arguments. Both
methods should return bytes. Thus *file* can be an on-disk file object opened methods should return bytes. Thus *file* can be an on-disk file object
for binary reading, a :class:`io.BytesIO` object, or any other custom object opened for binary reading, a :class:`io.BytesIO` object, or any other
that meets this interface. custom object that meets this interface.
Optional keyword arguments are *fix_imports*, *encoding* and *errors*, Optional keyword arguments are *fix_imports*, *encoding* and *errors*,
which are used to control compatibility support for pickle stream generated which are used to control compatibility support for pickle stream generated
by Python 2.x. If *fix_imports* is true, pickle will try to map the old by Python 2. If *fix_imports* is true, pickle will try to map the old
Python 2.x names to the new names used in Python 3.x. The *encoding* and Python 2 names to the new names used in Python 3. The *encoding* and
*errors* tell pickle how to decode 8-bit string instances pickled by Python *errors* tell pickle how to decode 8-bit string instances pickled by Python
2.x; these default to 'ASCII' and 'strict', respectively. 2; these default to 'ASCII' and 'strict', respectively. The *encoding* can
be 'bytes' to read these 8-bit string instances as bytes objects.
.. method:: load() .. method:: load()
......
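The *fix_imports* behaviour documented above can be observed directly. This is a minimal sketch, assuming a Python 3 interpreter; it pickles the built-in `len`, whose module is named `builtins` in Python 3 and `__builtin__` in Python 2. The variable names are illustrative.

```python
import pickle

# With fix_imports=True and protocol < 3, the Python 2 module name is written.
py2_readable = pickle.dumps(len, protocol=2, fix_imports=True)

# With fix_imports=False, the Python 3 module name is kept as-is.
py3_names = pickle.dumps(len, protocol=2, fix_imports=False)

print(b"__builtin__" in py2_readable)
print(b"builtins" in py3_names)
```

Only the first stream should be expected to load under Python 2, since the second refers to a module name that does not exist there.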
...@@ -348,24 +348,25 @@ class _Pickler: ...@@ -348,24 +348,25 @@ class _Pickler:
def __init__(self, file, protocol=None, *, fix_imports=True): def __init__(self, file, protocol=None, *, fix_imports=True):
"""This takes a binary file for writing a pickle data stream. """This takes a binary file for writing a pickle data stream.
The optional protocol argument tells the pickler to use the The optional *protocol* argument tells the pickler to use the
given protocol; supported protocols are 0, 1, 2, 3 and 4. The given protocol; supported protocols are 0, 1, 2, 3 and 4. The
default protocol is 3; a backward-incompatible protocol designed for default protocol is 3; a backward-incompatible protocol designed
Python 3. for Python 3.
Specifying a negative protocol version selects the highest Specifying a negative protocol version selects the highest
protocol version supported. The higher the protocol used, the protocol version supported. The higher the protocol used, the
more recent the version of Python needed to read the pickle more recent the version of Python needed to read the pickle
produced. produced.
The file argument must have a write() method that accepts a single The *file* argument must have a write() method that accepts a
bytes argument. It can thus be a file object opened for binary single bytes argument. It can thus be a file object opened for
writing, a io.BytesIO instance, or any other custom object that binary writing, a io.BytesIO instance, or any other custom
meets this interface. object that meets this interface.
If fix_imports is True and protocol is less than 3, pickle will try to If *fix_imports* is True and *protocol* is less than 3, pickle
map the new Python 3 names to the old module names used in Python 2, will try to map the new Python 3 names to the old module names
so that the pickle data stream is readable with Python 2. used in Python 2, so that the pickle data stream is readable
with Python 2.
""" """
if protocol is None: if protocol is None:
protocol = DEFAULT_PROTOCOL protocol = DEFAULT_PROTOCOL
...@@ -389,10 +390,9 @@ class _Pickler: ...@@ -389,10 +390,9 @@ class _Pickler:
"""Clears the pickler's "memo". """Clears the pickler's "memo".
The memo is the data structure that remembers which objects the The memo is the data structure that remembers which objects the
pickler has already seen, so that shared or recursive objects are pickler has already seen, so that shared or recursive objects
pickled by reference and not by value. This method is useful when are pickled by reference and not by value. This method is
re-using picklers. useful when re-using picklers.
""" """
self.memo.clear() self.memo.clear()
...@@ -975,8 +975,14 @@ class _Unpickler: ...@@ -975,8 +975,14 @@ class _Unpickler:
encoding="ASCII", errors="strict"): encoding="ASCII", errors="strict"):
"""This takes a binary file for reading a pickle data stream. """This takes a binary file for reading a pickle data stream.
The protocol version of the pickle is detected automatically, so no The protocol version of the pickle is detected automatically, so
proto argument is needed. no proto argument is needed.
The argument *file* must have two methods, a read() method that
takes an integer argument, and a readline() method that requires
no arguments. Both methods should return bytes. Thus *file*
can be a binary file object opened for reading, a io.BytesIO
object, or any other custom object that meets this interface.
The file-like object must have two methods, a read() method The file-like object must have two methods, a read() method
that takes an integer argument, and a readline() method that that takes an integer argument, and a readline() method that
...@@ -985,13 +991,14 @@ class _Unpickler: ...@@ -985,13 +991,14 @@ class _Unpickler:
reading, a BytesIO object, or any other custom object that reading, a BytesIO object, or any other custom object that
meets this interface. meets this interface.
Optional keyword arguments are *fix_imports*, *encoding* and *errors*, Optional keyword arguments are *fix_imports*, *encoding* and
which are used to control compatibility support for pickle stream *errors*, which are used to control compatibility support for
generated by Python 2.x. If *fix_imports* is True, pickle will try to pickle stream generated by Python 2. If *fix_imports* is True,
map the old Python 2.x names to the new names used in Python 3.x. The pickle will try to map the old Python 2 names to the new names
*encoding* and *errors* tell pickle how to decode 8-bit string used in Python 3. The *encoding* and *errors* tell pickle how
instances pickled by Python 2.x; these default to 'ASCII' and to decode 8-bit string instances pickled by Python 2; these
'strict', respectively. default to 'ASCII' and 'strict', respectively. *encoding* can be
'bytes' to read these 8-bit string instances as bytes objects.
""" """
self._file_readline = file.readline self._file_readline = file.readline
self._file_read = file.read self._file_read = file.read
...@@ -1139,6 +1146,15 @@ class _Unpickler: ...@@ -1139,6 +1146,15 @@ class _Unpickler:
self.append(unpack('>d', self.read(8))[0]) self.append(unpack('>d', self.read(8))[0])
dispatch[BINFLOAT[0]] = load_binfloat dispatch[BINFLOAT[0]] = load_binfloat
def _decode_string(self, value):
# Used to allow strings from Python 2 to be decoded either as
# bytes or Unicode strings. This should be used only with the
# STRING, BINSTRING and SHORT_BINSTRING opcodes.
if self.encoding == "bytes":
return value
else:
return value.decode(self.encoding, self.errors)
def load_string(self): def load_string(self):
data = self.readline()[:-1] data = self.readline()[:-1]
# Strip outermost quotes # Strip outermost quotes
...@@ -1146,8 +1162,7 @@ class _Unpickler: ...@@ -1146,8 +1162,7 @@ class _Unpickler:
data = data[1:-1] data = data[1:-1]
else: else:
raise UnpicklingError("the STRING opcode argument must be quoted") raise UnpicklingError("the STRING opcode argument must be quoted")
self.append(codecs.escape_decode(data)[0] self.append(self._decode_string(codecs.escape_decode(data)[0]))
.decode(self.encoding, self.errors))
dispatch[STRING[0]] = load_string dispatch[STRING[0]] = load_string
def load_binstring(self): def load_binstring(self):
...@@ -1156,8 +1171,7 @@ class _Unpickler: ...@@ -1156,8 +1171,7 @@ class _Unpickler:
if len < 0: if len < 0:
raise UnpicklingError("BINSTRING pickle has negative byte count") raise UnpicklingError("BINSTRING pickle has negative byte count")
data = self.read(len) data = self.read(len)
value = str(data, self.encoding, self.errors) self.append(self._decode_string(data))
self.append(value)
dispatch[BINSTRING[0]] = load_binstring dispatch[BINSTRING[0]] = load_binstring
def load_binbytes(self): def load_binbytes(self):
...@@ -1191,8 +1205,7 @@ class _Unpickler: ...@@ -1191,8 +1205,7 @@ class _Unpickler:
def load_short_binstring(self): def load_short_binstring(self):
len = self.read(1)[0] len = self.read(1)[0]
data = self.read(len) data = self.read(len)
value = str(data, self.encoding, self.errors) self.append(self._decode_string(data))
self.append(value)
dispatch[SHORT_BINSTRING[0]] = load_short_binstring dispatch[SHORT_BINSTRING[0]] = load_short_binstring
def load_short_binbytes(self): def load_short_binbytes(self):
......
...@@ -969,113 +969,107 @@ class StackObject(object): ...@@ -969,113 +969,107 @@ class StackObject(object):
return self.name return self.name
pyint = StackObject( pyint = pylong = StackObject(
name='int', name='int',
obtype=int, obtype=int,
doc="A short (as opposed to long) Python integer object.") doc="A Python integer object.")
pylong = StackObject(
name='long',
obtype=int,
doc="A long (as opposed to short) Python integer object.")
pyinteger_or_bool = StackObject( pyinteger_or_bool = StackObject(
name='int_or_bool', name='int_or_bool',
obtype=(int, bool), obtype=(int, bool),
doc="A Python integer object (short or long), or " doc="A Python integer or boolean object.")
"a Python bool.")
pybool = StackObject( pybool = StackObject(
name='bool', name='bool',
obtype=(bool,), obtype=bool,
doc="A Python bool object.") doc="A Python boolean object.")
pyfloat = StackObject( pyfloat = StackObject(
name='float', name='float',
obtype=float, obtype=float,
doc="A Python float object.") doc="A Python float object.")
pystring = StackObject( pybytes_or_str = pystring = StackObject(
name='string', name='bytes_or_str',
obtype=bytes, obtype=(bytes, str),
doc="A Python (8-bit) string object.") doc="A Python bytes or (Unicode) string object.")
pybytes = StackObject( pybytes = StackObject(
name='bytes', name='bytes',
obtype=bytes, obtype=bytes,
doc="A Python bytes object.") doc="A Python bytes object.")
pyunicode = StackObject( pyunicode = StackObject(
name='str', name='str',
obtype=str, obtype=str,
doc="A Python (Unicode) string object.") doc="A Python (Unicode) string object.")
pynone = StackObject( pynone = StackObject(
name="None", name="None",
obtype=type(None), obtype=type(None),
doc="The Python None object.") doc="The Python None object.")
pytuple = StackObject( pytuple = StackObject(
name="tuple", name="tuple",
obtype=tuple, obtype=tuple,
doc="A Python tuple object.") doc="A Python tuple object.")
pylist = StackObject( pylist = StackObject(
name="list", name="list",
obtype=list, obtype=list,
doc="A Python list object.") doc="A Python list object.")
pydict = StackObject( pydict = StackObject(
name="dict", name="dict",
obtype=dict, obtype=dict,
doc="A Python dict object.") doc="A Python dict object.")
pyset = StackObject( pyset = StackObject(
name="set", name="set",
obtype=set, obtype=set,
doc="A Python set object.") doc="A Python set object.")
pyfrozenset = StackObject( pyfrozenset = StackObject(
name="frozenset", name="frozenset",
obtype=frozenset, obtype=frozenset,
doc="A Python frozenset object.") doc="A Python frozenset object.")
anyobject = StackObject( anyobject = StackObject(
name='any', name='any',
obtype=object, obtype=object,
doc="Any kind of object whatsoever.") doc="Any kind of object whatsoever.")
markobject = StackObject( markobject = StackObject(
name="mark", name="mark",
obtype=StackObject, obtype=StackObject,
doc="""'The mark' is a unique object. doc="""'The mark' is a unique object.
Opcodes that operate on a variable number of objects Opcodes that operate on a variable number of objects
generally don't embed the count of objects in the opcode, generally don't embed the count of objects in the opcode,
or pull it off the stack. Instead the MARK opcode is used or pull it off the stack. Instead the MARK opcode is used
to push a special marker object on the stack, and then to push a special marker object on the stack, and then
some other opcodes grab all the objects from the top of some other opcodes grab all the objects from the top of
the stack down to (but not including) the topmost marker the stack down to (but not including) the topmost marker
object. object.
""") """)
stackslice = StackObject( stackslice = StackObject(
name="stackslice", name="stackslice",
obtype=StackObject, obtype=StackObject,
doc="""An object representing a contiguous slice of the stack. doc="""An object representing a contiguous slice of the stack.
This is used in conjunction with markobject, to represent all This is used in conjunction with markobject, to represent all
of the stack following the topmost markobject. For example, of the stack following the topmost markobject. For example,
the POP_MARK opcode changes the stack from the POP_MARK opcode changes the stack from
[..., markobject, stackslice] [..., markobject, stackslice]
to to
[...] [...]
No matter how many objects are on the stack after the topmost No matter how many objects are on the stack after the topmost
markobject, POP_MARK gets rid of all of them (including the markobject, POP_MARK gets rid of all of them (including the
topmost markobject too). topmost markobject too).
""") """)
############################################################################## ##############################################################################
# Descriptors for pickle opcodes. # Descriptors for pickle opcodes.
...@@ -1212,7 +1206,7 @@ opcodes = [ ...@@ -1212,7 +1206,7 @@ opcodes = [
code='L', code='L',
arg=decimalnl_long, arg=decimalnl_long,
stack_before=[], stack_before=[],
stack_after=[pylong], stack_after=[pyint],
proto=0, proto=0,
doc="""Push a long integer. doc="""Push a long integer.
...@@ -1230,7 +1224,7 @@ opcodes = [ ...@@ -1230,7 +1224,7 @@ opcodes = [
code='\x8a', code='\x8a',
arg=long1, arg=long1,
stack_before=[], stack_before=[],
stack_after=[pylong], stack_after=[pyint],
proto=2, proto=2,
doc="""Long integer using one-byte length. doc="""Long integer using one-byte length.
...@@ -1241,7 +1235,7 @@ opcodes = [ ...@@ -1241,7 +1235,7 @@ opcodes = [
code='\x8b', code='\x8b',
arg=long4, arg=long4,
stack_before=[], stack_before=[],
stack_after=[pylong], stack_after=[pyint],
proto=2, proto=2,
doc="""Long integer using four-byte length. doc="""Long integer using four-byte length.
...@@ -1254,45 +1248,50 @@ opcodes = [ ...@@ -1254,45 +1248,50 @@ opcodes = [
code='S', code='S',
arg=stringnl, arg=stringnl,
stack_before=[], stack_before=[],
stack_after=[pystring], stack_after=[pybytes_or_str],
proto=0, proto=0,
doc="""Push a Python string object. doc="""Push a Python string object.
The argument is a repr-style string, with bracketing quote characters, The argument is a repr-style string, with bracketing quote characters,
and perhaps embedded escapes. The argument extends until the next and perhaps embedded escapes. The argument extends until the next
newline character. (Actually, they are decoded into a str instance newline character. These are usually decoded into a str instance
using the encoding given to the Unpickler constructor, or the default, using the encoding given to the Unpickler constructor, or the default,
'ASCII'.) 'ASCII'. If the encoding given was 'bytes' however, they will be
decoded as bytes object instead.
"""), """),
I(name='BINSTRING', I(name='BINSTRING',
code='T', code='T',
arg=string4, arg=string4,
stack_before=[], stack_before=[],
stack_after=[pystring], stack_after=[pybytes_or_str],
proto=1, proto=1,
doc="""Push a Python string object. doc="""Push a Python string object.
There are two arguments: the first is a 4-byte little-endian signed int There are two arguments: the first is a 4-byte little-endian
giving the number of bytes in the string, and the second is that many signed int giving the number of bytes in the string, and the
bytes, which are taken literally as the string content. (Actually, second is that many bytes, which are taken literally as the string
they are decoded into a str instance using the encoding given to the content. These are usually decoded into a str instance using the
Unpickler constructor, or the default, 'ASCII'.) encoding given to the Unpickler constructor, or the default,
'ASCII'. If the encoding given was 'bytes' however, they will be
decoded as bytes object instead.
"""), """),
I(name='SHORT_BINSTRING', I(name='SHORT_BINSTRING',
code='U', code='U',
arg=string1, arg=string1,
stack_before=[], stack_before=[],
stack_after=[pystring], stack_after=[pybytes_or_str],
proto=1, proto=1,
doc="""Push a Python string object. doc="""Push a Python string object.
There are two arguments: the first is a 1-byte unsigned int giving There are two arguments: the first is a 1-byte unsigned int giving
the number of bytes in the string, and the second is that many bytes, the number of bytes in the string, and the second is that many
which are taken literally as the string content. (Actually, they bytes, which are taken literally as the string content. These are
are decoded into a str instance using the encoding given to the usually decoded into a str instance using the encoding given to
Unpickler constructor, or the default, 'ASCII'.) the Unpickler constructor, or the default, 'ASCII'. If the
encoding given was 'bytes' however, they will be decoded as bytes
object instead.
"""), """),
# Bytes (protocol 3 only; older protocols don't support bytes at all) # Bytes (protocol 3 only; older protocols don't support bytes at all)
......
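To see which of the opcodes described above a given pickle actually uses, `pickletools.genops` can walk the stream. The sketch below is only an illustration: it reuses the protocol 2 test data from this commit, and the variable names are made up for the example.

```python
import pickletools

# Python 2 pickle of 'a\x00\xa0' with protocol=2 (from this commit's tests).
py2_pickle = b'\x80\x02U\x03a\x00\xa0.'

# genops yields (opcode, arg, pos) triples; collect only the opcode names.
opcode_names = [opcode.name for opcode, arg, pos in pickletools.genops(py2_pickle)]
print(opcode_names)
```

A `SHORT_BINSTRING` entry in the list marks a Python 2 8-bit string, which is the kind of value the *encoding* argument of the unpickler applies to.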
...@@ -1305,6 +1305,35 @@ class AbstractPickleTests(unittest.TestCase): ...@@ -1305,6 +1305,35 @@ class AbstractPickleTests(unittest.TestCase):
dumped = self.dumps(set([3]), 2) dumped = self.dumps(set([3]), 2)
self.assertEqual(dumped, DATA6) self.assertEqual(dumped, DATA6)
def test_load_python2_str_as_bytes(self):
# From Python 2: pickle.dumps('a\x00\xa0', protocol=0)
self.assertEqual(self.loads(b"S'a\\x00\\xa0'\n.",
encoding="bytes"), b'a\x00\xa0')
# From Python 2: pickle.dumps('a\x00\xa0', protocol=1)
self.assertEqual(self.loads(b'U\x03a\x00\xa0.',
encoding="bytes"), b'a\x00\xa0')
# From Python 2: pickle.dumps('a\x00\xa0', protocol=2)
self.assertEqual(self.loads(b'\x80\x02U\x03a\x00\xa0.',
encoding="bytes"), b'a\x00\xa0')
def test_load_python2_unicode_as_str(self):
# From Python 2: pickle.dumps(u'π', protocol=0)
self.assertEqual(self.loads(b'V\\u03c0\n.',
encoding='bytes'), 'π')
# From Python 2: pickle.dumps(u'π', protocol=1)
self.assertEqual(self.loads(b'X\x02\x00\x00\x00\xcf\x80.',
encoding="bytes"), 'π')
# From Python 2: pickle.dumps(u'π', protocol=2)
self.assertEqual(self.loads(b'\x80\x02X\x02\x00\x00\x00\xcf\x80.',
encoding="bytes"), 'π')
def test_load_long_python2_str_as_bytes(self):
# From Python 2: pickle.dumps('x' * 300, protocol=1)
self.assertEqual(self.loads(pickle.BINSTRING +
struct.pack("<I", 300) +
b'x' * 300 + pickle.STOP,
encoding='bytes'), b'x' * 300)
def test_large_pickles(self): def test_large_pickles(self):
# Test the correctness of internal buffering routines when handling # Test the correctness of internal buffering routines when handling
# large data. # large data.
...@@ -1566,7 +1595,6 @@ class AbstractPickleTests(unittest.TestCase): ...@@ -1566,7 +1595,6 @@ class AbstractPickleTests(unittest.TestCase):
unpickled = self.loads(self.dumps(method, proto)) unpickled = self.loads(self.dumps(method, proto))
self.assertEqual(method(obj), unpickled(obj)) self.assertEqual(method(obj), unpickled(obj))
def test_c_methods(self): def test_c_methods(self):
global Subclass global Subclass
class Subclass(tuple): class Subclass(tuple):
......
...@@ -83,13 +83,17 @@ class PyPicklerUnpicklerObjectTests(AbstractPicklerUnpicklerObjectTests): ...@@ -83,13 +83,17 @@ class PyPicklerUnpicklerObjectTests(AbstractPicklerUnpicklerObjectTests):
class PyDispatchTableTests(AbstractDispatchTableTests): class PyDispatchTableTests(AbstractDispatchTableTests):
pickler_class = pickle._Pickler pickler_class = pickle._Pickler
def get_dispatch_table(self): def get_dispatch_table(self):
return pickle.dispatch_table.copy() return pickle.dispatch_table.copy()
class PyChainDispatchTableTests(AbstractDispatchTableTests): class PyChainDispatchTableTests(AbstractDispatchTableTests):
pickler_class = pickle._Pickler pickler_class = pickle._Pickler
def get_dispatch_table(self): def get_dispatch_table(self):
return collections.ChainMap({}, pickle.dispatch_table) return collections.ChainMap({}, pickle.dispatch_table)
......
...@@ -293,6 +293,7 @@ Kushal Das ...@@ -293,6 +293,7 @@ Kushal Das
Jonathan Dasteel Jonathan Dasteel
Pierre-Yves David Pierre-Yves David
A. Jesse Jiryu Davis A. Jesse Jiryu Davis
Merlijn van Deen
John DeGood John DeGood
Ned Deily Ned Deily
Vincent Delft Vincent Delft
......
...@@ -23,6 +23,10 @@ Library ...@@ -23,6 +23,10 @@ Library
- Issue #19296: Silence compiler warning in dbm_open - Issue #19296: Silence compiler warning in dbm_open
- Issue #6784: Strings from Python 2 can now be unpickled as bytes
objects by setting the encoding argument of Unpickler to be 'bytes'.
Initial patch by Merlijn van Deen.
- Issue #19839: Fix regression in bz2 module's handling of non-bzip2 data at - Issue #19839: Fix regression in bz2 module's handling of non-bzip2 data at
EOF, and analogous bug in lzma module. EOF, and analogous bug in lzma module.
......
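The NEWS entry for Issue #6784 refers to the *encoding* argument of `Unpickler`. A minimal sketch of that class-based usage follows, assuming a Python 3 interpreter with this change. The byte stream is hand-written for the example to represent the Python 2 dict `{'key': 'val'}` at protocol 2; it was not captured from a real Python 2 session.

```python
import io
import pickle

# Hand-written protocol 2 stream: an empty dict, two SHORT_BINSTRING
# values ('key' and 'val'), then SETITEM and STOP.
stream = io.BytesIO(b'\x80\x02}q\x00U\x03keyq\x01U\x03valq\x02s.')

# encoding="bytes" applies to every 8-bit string in the stream,
# including those nested inside containers.
restored = pickle.Unpickler(stream, encoding="bytes").load()
print(restored)
```

Both keys and values come back as bytes objects, so code that looks entries up by `str` keys would need to be adjusted accordingly.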