Commit 4e18ac85 authored by Nadeem Vawda

Merge heads

parents 98fe1a0c 200e00a9
...@@ -37,14 +37,18 @@ All of the classes in this module may safely be accessed from multiple threads. ...@@ -37,14 +37,18 @@ All of the classes in this module may safely be accessed from multiple threads.
*fileobj*), or operate directly on a named file (named by *filename*). *fileobj*), or operate directly on a named file (named by *filename*).
Exactly one of these two parameters should be provided. Exactly one of these two parameters should be provided.
The *mode* argument can be either ``'r'`` for reading (default), or ``'w'`` The *mode* argument can be either ``'r'`` for reading (default), ``'w'`` for
for writing. overwriting, or ``'a'`` for appending. If *fileobj* is provided, a mode of
``'w'`` does not truncate the file, and is instead equivalent to ``'a'``.
The *buffering* argument is ignored. Its use is deprecated. The *buffering* argument is ignored. Its use is deprecated.
If *mode* is ``'w'``, *compresslevel* can be a number between ``1`` and If *mode* is ``'w'`` or ``'a'``, *compresslevel* can be a number between
``9`` specifying the level of compression: ``1`` produces the least ``1`` and ``9`` specifying the level of compression: ``1`` produces the
compression, and ``9`` (default) produces the most compression. least compression, and ``9`` (default) produces the most compression.
If *mode* is ``'r'``, the input file may be the concatenation of multiple
compressed streams.
:class:`BZ2File` provides all of the members specified by the :class:`BZ2File` provides all of the members specified by the
:class:`io.BufferedIOBase`, except for :meth:`detach` and :meth:`truncate`. :class:`io.BufferedIOBase`, except for :meth:`detach` and :meth:`truncate`.
...@@ -70,6 +74,10 @@ All of the classes in this module may safely be accessed from multiple threads. ...@@ -70,6 +74,10 @@ All of the classes in this module may safely be accessed from multiple threads.
.. versionchanged:: 3.3 .. versionchanged:: 3.3
The *fileobj* argument to the constructor was added. The *fileobj* argument to the constructor was added.
.. versionchanged:: 3.3
The ``'a'`` (append) mode was added, along with support for reading
multi-stream files.
Incremental (de)compression Incremental (de)compression
--------------------------- ---------------------------
...@@ -106,14 +114,20 @@ Incremental (de)compression ...@@ -106,14 +114,20 @@ Incremental (de)compression
incrementally. For one-shot compression, use the :func:`decompress` function incrementally. For one-shot compression, use the :func:`decompress` function
instead. instead.
.. note::
This class does not transparently handle inputs containing multiple
compressed streams, unlike :func:`decompress` and :class:`BZ2File`. If
you need to decompress a multi-stream input with :class:`BZ2Decompressor`,
you must use a new decompressor for each stream.
.. method:: decompress(data) .. method:: decompress(data)
Provide data to the decompressor object. Returns a chunk of decompressed Provide data to the decompressor object. Returns a chunk of decompressed
data if possible, or an empty byte string otherwise. data if possible, or an empty byte string otherwise.
Attempting to decompress data after the end of stream is reached raises Attempting to decompress data after the end of the current stream is
an :exc:`EOFError`. If any data is found after the end of the stream, it reached raises an :exc:`EOFError`. If any data is found after the end of
is ignored and saved in the :attr:`unused_data` attribute. the stream, it is ignored and saved in the :attr:`unused_data` attribute.
.. attribute:: eof .. attribute:: eof
...@@ -127,6 +141,9 @@ Incremental (de)compression ...@@ -127,6 +141,9 @@ Incremental (de)compression
Data found after the end of the compressed stream. Data found after the end of the compressed stream.
If this attribute is accessed before the end of the stream has been
reached, its value will be ``b''``.
One-shot (de)compression One-shot (de)compression
------------------------ ------------------------
...@@ -145,5 +162,11 @@ One-shot (de)compression ...@@ -145,5 +162,11 @@ One-shot (de)compression
Decompress *data*. Decompress *data*.
If *data* is the concatenation of multiple compressed streams, decompress
all of the streams.
For incremental decompression, use a :class:`BZ2Decompressor` instead. For incremental decompression, use a :class:`BZ2Decompressor` instead.
.. versionchanged:: 3.3
Support for multi-stream inputs was added.
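As a rough illustration of the documented multi-stream behaviour, the sketch below concatenates two independently compressed payloads and hands the result to `bz2.decompress()`. The payloads and variable names are invented for the example, and it assumes Python 3.3 or later, where this change applies.

```python
import bz2

# Two independently compressed streams, joined byte-for-byte.
# The payloads are illustrative only.
first = bz2.compress(b"hello ")
second = bz2.compress(b"world")
combined = first + second

# With multi-stream support, decompress() processes every stream
# in the input rather than stopping after the first one.
text = bz2.decompress(combined)
print(text)
```

Before this change, the same call would have returned only the contents of the first stream and silently dropped the rest.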
...@@ -76,6 +76,10 @@ class BZ2File(io.BufferedIOBase): ...@@ -76,6 +76,10 @@ class BZ2File(io.BufferedIOBase):
mode = "wb" mode = "wb"
mode_code = _MODE_WRITE mode_code = _MODE_WRITE
self._compressor = BZ2Compressor() self._compressor = BZ2Compressor()
elif mode in ("a", "ab"):
mode = "ab"
mode_code = _MODE_WRITE
self._compressor = BZ2Compressor()
else: else:
raise ValueError("Invalid mode: {!r}".format(mode)) raise ValueError("Invalid mode: {!r}".format(mode))
...@@ -161,14 +165,25 @@ class BZ2File(io.BufferedIOBase): ...@@ -161,14 +165,25 @@ class BZ2File(io.BufferedIOBase):
def _fill_buffer(self): def _fill_buffer(self):
if self._buffer: if self._buffer:
return True return True
if self._decompressor.unused_data:
rawblock = self._decompressor.unused_data
else:
rawblock = self._fp.read(_BUFFER_SIZE)
if not rawblock:
if self._decompressor.eof: if self._decompressor.eof:
self._mode = _MODE_READ_EOF self._mode = _MODE_READ_EOF
self._size = self._pos self._size = self._pos
return False return False
rawblock = self._fp.read(_BUFFER_SIZE) else:
if not rawblock:
raise EOFError("Compressed file ended before the " raise EOFError("Compressed file ended before the "
"end-of-stream marker was reached") "end-of-stream marker was reached")
# Continue to next stream.
if self._decompressor.eof:
self._decompressor = BZ2Decompressor()
self._buffer = self._decompressor.decompress(rawblock) self._buffer = self._decompressor.decompress(rawblock)
return True return True
...@@ -384,9 +399,15 @@ def decompress(data): ...@@ -384,9 +399,15 @@ def decompress(data):
""" """
if len(data) == 0: if len(data) == 0:
return b"" return b""
result = b""
while True:
decomp = BZ2Decompressor() decomp = BZ2Decompressor()
result = decomp.decompress(data) result += decomp.decompress(data)
if not decomp.eof: if not decomp.eof:
raise ValueError("Compressed data ended before the " raise ValueError("Compressed data ended before the "
"end-of-stream marker was reached") "end-of-stream marker was reached")
if not decomp.unused_data:
return result return result
# There is unused data left over. Proceed to next stream.
data = decomp.unused_data
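The loop added to `decompress()` above can be approximated with the public `BZ2Decompressor` API, which is what the documentation note suggests for callers who need incremental control. This is a minimal sketch rather than the module's own code; the helper name `decompress_streams` is hypothetical.

```python
import bz2


def decompress_streams(data):
    """Decompress *data*, which may hold several concatenated bz2 streams.

    Hypothetical helper: a fresh BZ2Decompressor is created per stream,
    and whatever follows the end of a stream is picked up from
    unused_data and fed to the next decompressor.
    """
    parts = []
    while data:
        decomp = bz2.BZ2Decompressor()
        parts.append(decomp.decompress(data))
        if not decomp.eof:
            raise ValueError("Compressed data ended before the "
                             "end-of-stream marker was reached")
        # Bytes after the end of this stream belong to the next one.
        data = decomp.unused_data
    return b"".join(parts)


payload = bz2.compress(b"spam") + bz2.compress(b"eggs")
result = decompress_streams(payload)
```

Collecting the pieces in a list and joining once avoids repeated bytes concatenation, which is a design choice of this sketch and not something the commit itself does.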
...@@ -84,9 +84,9 @@ class BZ2FileTest(BaseTest): ...@@ -84,9 +84,9 @@ class BZ2FileTest(BaseTest):
else: else:
return self.DATA return self.DATA
def createTempFile(self, crlf=False): def createTempFile(self, crlf=False, streams=1):
with open(self.filename, "wb") as f: with open(self.filename, "wb") as f:
f.write(self.getData(crlf)) f.write(self.getData(crlf) * streams)
def testRead(self): def testRead(self):
# "Test BZ2File.read()" # "Test BZ2File.read()"
...@@ -95,6 +95,26 @@ class BZ2FileTest(BaseTest): ...@@ -95,6 +95,26 @@ class BZ2FileTest(BaseTest):
self.assertRaises(TypeError, bz2f.read, None) self.assertRaises(TypeError, bz2f.read, None)
self.assertEqual(bz2f.read(), self.TEXT) self.assertEqual(bz2f.read(), self.TEXT)
def testReadMultiStream(self):
# "Test BZ2File.read() with a multi stream archive"
self.createTempFile(streams=5)
with BZ2File(self.filename) as bz2f:
self.assertRaises(TypeError, bz2f.read, None)
self.assertEqual(bz2f.read(), self.TEXT * 5)
def testReadMonkeyMultiStream(self):
# "Test BZ2File.read() with a multi stream archive in which stream"
# "end is alined with internal buffer size"
buffer_size = bz2._BUFFER_SIZE
bz2._BUFFER_SIZE = len(self.DATA)
try:
self.createTempFile(streams=5)
with BZ2File(self.filename) as bz2f:
self.assertRaises(TypeError, bz2f.read, None)
self.assertEqual(bz2f.read(), self.TEXT * 5)
finally:
bz2._BUFFER_SIZE = buffer_size
def testRead0(self): def testRead0(self):
# "Test BBZ2File.read(0)" # "Test BBZ2File.read(0)"
self.createTempFile() self.createTempFile()
...@@ -114,6 +134,18 @@ class BZ2FileTest(BaseTest): ...@@ -114,6 +134,18 @@ class BZ2FileTest(BaseTest):
text += str text += str
self.assertEqual(text, self.TEXT) self.assertEqual(text, self.TEXT)
def testReadChunk10MultiStream(self):
# "Test BZ2File.read() in chunks of 10 bytes with a multi stream archive"
self.createTempFile(streams=5)
with BZ2File(self.filename) as bz2f:
text = b''
while 1:
str = bz2f.read(10)
if not str:
break
text += str
self.assertEqual(text, self.TEXT * 5)
def testRead100(self): def testRead100(self):
# "Test BZ2File.read(100)" # "Test BZ2File.read(100)"
self.createTempFile() self.createTempFile()
...@@ -151,6 +183,15 @@ class BZ2FileTest(BaseTest): ...@@ -151,6 +183,15 @@ class BZ2FileTest(BaseTest):
for line in sio.readlines(): for line in sio.readlines():
self.assertEqual(bz2f.readline(), line) self.assertEqual(bz2f.readline(), line)
def testReadLineMultiStream(self):
# "Test BZ2File.readline() with a multi stream archive"
self.createTempFile(streams=5)
with BZ2File(self.filename) as bz2f:
self.assertRaises(TypeError, bz2f.readline, None)
sio = BytesIO(self.TEXT * 5)
for line in sio.readlines():
self.assertEqual(bz2f.readline(), line)
def testReadLines(self): def testReadLines(self):
# "Test BZ2File.readlines()" # "Test BZ2File.readlines()"
self.createTempFile() self.createTempFile()
...@@ -159,6 +200,14 @@ class BZ2FileTest(BaseTest): ...@@ -159,6 +200,14 @@ class BZ2FileTest(BaseTest):
sio = BytesIO(self.TEXT) sio = BytesIO(self.TEXT)
self.assertEqual(bz2f.readlines(), sio.readlines()) self.assertEqual(bz2f.readlines(), sio.readlines())
def testReadLinesMultiStream(self):
# "Test BZ2File.readlines() with a multi stream archive"
self.createTempFile(streams=5)
with BZ2File(self.filename) as bz2f:
self.assertRaises(TypeError, bz2f.readlines, None)
sio = BytesIO(self.TEXT * 5)
self.assertEqual(bz2f.readlines(), sio.readlines())
def testIterator(self): def testIterator(self):
# "Test iter(BZ2File)" # "Test iter(BZ2File)"
self.createTempFile() self.createTempFile()
...@@ -166,6 +215,13 @@ class BZ2FileTest(BaseTest): ...@@ -166,6 +215,13 @@ class BZ2FileTest(BaseTest):
sio = BytesIO(self.TEXT) sio = BytesIO(self.TEXT)
self.assertEqual(list(iter(bz2f)), sio.readlines()) self.assertEqual(list(iter(bz2f)), sio.readlines())
def testIteratorMultiStream(self):
# "Test iter(BZ2File) with a multi stream archive"
self.createTempFile(streams=5)
with BZ2File(self.filename) as bz2f:
sio = BytesIO(self.TEXT * 5)
self.assertEqual(list(iter(bz2f)), sio.readlines())
def testClosedIteratorDeadlock(self): def testClosedIteratorDeadlock(self):
# "Test that iteration on a closed bz2file releases the lock." # "Test that iteration on a closed bz2file releases the lock."
# http://bugs.python.org/issue3309 # http://bugs.python.org/issue3309
...@@ -217,6 +273,17 @@ class BZ2FileTest(BaseTest): ...@@ -217,6 +273,17 @@ class BZ2FileTest(BaseTest):
self.assertRaises(IOError, bz2f.write, b"a") self.assertRaises(IOError, bz2f.write, b"a")
self.assertRaises(IOError, bz2f.writelines, [b"a"]) self.assertRaises(IOError, bz2f.writelines, [b"a"])
def testAppend(self):
# "Test BZ2File.write()"
with BZ2File(self.filename, "w") as bz2f:
self.assertRaises(TypeError, bz2f.write)
bz2f.write(self.TEXT)
with BZ2File(self.filename, "a") as bz2f:
self.assertRaises(TypeError, bz2f.write)
bz2f.write(self.TEXT)
with open(self.filename, 'rb') as f:
self.assertEqual(self.decompress(f.read()), self.TEXT * 2)
def testSeekForward(self): def testSeekForward(self):
# "Test BZ2File.seek(150, 0)" # "Test BZ2File.seek(150, 0)"
self.createTempFile() self.createTempFile()
...@@ -225,6 +292,14 @@ class BZ2FileTest(BaseTest): ...@@ -225,6 +292,14 @@ class BZ2FileTest(BaseTest):
bz2f.seek(150) bz2f.seek(150)
self.assertEqual(bz2f.read(), self.TEXT[150:]) self.assertEqual(bz2f.read(), self.TEXT[150:])
def testSeekForwardMultiStream(self):
# "Test BZ2File.seek(150, 0) across stream boundaries"
self.createTempFile(streams=2)
with BZ2File(self.filename) as bz2f:
self.assertRaises(TypeError, bz2f.seek)
bz2f.seek(len(self.TEXT) + 150)
self.assertEqual(bz2f.read(), self.TEXT[150:])
def testSeekBackwards(self): def testSeekBackwards(self):
# "Test BZ2File.seek(-150, 1)" # "Test BZ2File.seek(-150, 1)"
self.createTempFile() self.createTempFile()
...@@ -233,6 +308,16 @@ class BZ2FileTest(BaseTest): ...@@ -233,6 +308,16 @@ class BZ2FileTest(BaseTest):
bz2f.seek(-150, 1) bz2f.seek(-150, 1)
self.assertEqual(bz2f.read(), self.TEXT[500-150:]) self.assertEqual(bz2f.read(), self.TEXT[500-150:])
def testSeekBackwardsMultiStream(self):
# "Test BZ2File.seek(-150, 1) across stream boundaries"
self.createTempFile(streams=2)
with BZ2File(self.filename) as bz2f:
readto = len(self.TEXT) + 100
while readto > 0:
readto -= len(bz2f.read(readto))
bz2f.seek(-150, 1)
self.assertEqual(bz2f.read(), self.TEXT[100-150:] + self.TEXT)
def testSeekBackwardsFromEnd(self): def testSeekBackwardsFromEnd(self):
# "Test BZ2File.seek(-150, 2)" # "Test BZ2File.seek(-150, 2)"
self.createTempFile() self.createTempFile()
...@@ -240,6 +325,13 @@ class BZ2FileTest(BaseTest): ...@@ -240,6 +325,13 @@ class BZ2FileTest(BaseTest):
bz2f.seek(-150, 2) bz2f.seek(-150, 2)
self.assertEqual(bz2f.read(), self.TEXT[len(self.TEXT)-150:]) self.assertEqual(bz2f.read(), self.TEXT[len(self.TEXT)-150:])
def testSeekBackwardsFromEndMultiStream(self):
# "Test BZ2File.seek(-1000, 2) across stream boundaries"
self.createTempFile(streams=2)
with BZ2File(self.filename) as bz2f:
bz2f.seek(-1000, 2)
self.assertEqual(bz2f.read(), (self.TEXT * 2)[-1000:])
def testSeekPostEnd(self): def testSeekPostEnd(self):
# "Test BZ2File.seek(150000)" # "Test BZ2File.seek(150000)"
self.createTempFile() self.createTempFile()
...@@ -248,6 +340,14 @@ class BZ2FileTest(BaseTest): ...@@ -248,6 +340,14 @@ class BZ2FileTest(BaseTest):
self.assertEqual(bz2f.tell(), len(self.TEXT)) self.assertEqual(bz2f.tell(), len(self.TEXT))
self.assertEqual(bz2f.read(), b"") self.assertEqual(bz2f.read(), b"")
def testSeekPostEndMultiStream(self):
# "Test BZ2File.seek(150000)"
self.createTempFile(streams=5)
with BZ2File(self.filename) as bz2f:
bz2f.seek(150000)
self.assertEqual(bz2f.tell(), len(self.TEXT) * 5)
self.assertEqual(bz2f.read(), b"")
def testSeekPostEndTwice(self): def testSeekPostEndTwice(self):
# "Test BZ2File.seek(150000) twice" # "Test BZ2File.seek(150000) twice"
self.createTempFile() self.createTempFile()
...@@ -257,6 +357,15 @@ class BZ2FileTest(BaseTest): ...@@ -257,6 +357,15 @@ class BZ2FileTest(BaseTest):
self.assertEqual(bz2f.tell(), len(self.TEXT)) self.assertEqual(bz2f.tell(), len(self.TEXT))
self.assertEqual(bz2f.read(), b"") self.assertEqual(bz2f.read(), b"")
def testSeekPostEndTwiceMultiStream(self):
# "Test BZ2File.seek(150000) twice with a multi stream archive"
self.createTempFile(streams=5)
with BZ2File(self.filename) as bz2f:
bz2f.seek(150000)
bz2f.seek(150000)
self.assertEqual(bz2f.tell(), len(self.TEXT) * 5)
self.assertEqual(bz2f.read(), b"")
def testSeekPreStart(self): def testSeekPreStart(self):
# "Test BZ2File.seek(-150, 0)" # "Test BZ2File.seek(-150, 0)"
self.createTempFile() self.createTempFile()
...@@ -265,6 +374,14 @@ class BZ2FileTest(BaseTest): ...@@ -265,6 +374,14 @@ class BZ2FileTest(BaseTest):
self.assertEqual(bz2f.tell(), 0) self.assertEqual(bz2f.tell(), 0)
self.assertEqual(bz2f.read(), self.TEXT) self.assertEqual(bz2f.read(), self.TEXT)
def testSeekPreStartMultiStream(self):
# "Test BZ2File.seek(-150, 0) with a multi stream archive"
self.createTempFile(streams=2)
with BZ2File(self.filename) as bz2f:
bz2f.seek(-150)
self.assertEqual(bz2f.tell(), 0)
self.assertEqual(bz2f.read(), self.TEXT * 2)
def testFileno(self): def testFileno(self):
# "Test BZ2File.fileno()" # "Test BZ2File.fileno()"
self.createTempFile() self.createTempFile()
...@@ -510,6 +627,11 @@ class FuncTest(BaseTest): ...@@ -510,6 +627,11 @@ class FuncTest(BaseTest):
# "Test decompress() function with incomplete data" # "Test decompress() function with incomplete data"
self.assertRaises(ValueError, bz2.decompress, self.DATA[:-10]) self.assertRaises(ValueError, bz2.decompress, self.DATA[:-10])
def testDecompressMultiStream(self):
# "Test decompress() function for data with multiple streams"
text = bz2.decompress(self.DATA * 5)
self.assertEqual(text, self.TEXT * 5)
def test_main(): def test_main():
support.run_unittest( support.run_unittest(
BZ2FileTest, BZ2FileTest,
......
...@@ -161,6 +161,9 @@ Core and Builtins ...@@ -161,6 +161,9 @@ Core and Builtins
Library Library
------- -------
- Issue #1625: BZ2File and bz2.decompress() now support multi-stream files.
Initial patch by Nir Aides.
- Issue #8796: codecs.open() calls the builtin open() function instead of using - Issue #8796: codecs.open() calls the builtin open() function instead of using
StreamReaderWriter. Deprecate StreamReader, StreamWriter, StreamReaderWriter, StreamReaderWriter. Deprecate StreamReader, StreamWriter, StreamReaderWriter,
StreamRecoder and EncodedFile() of the codec module. Use the builtin open() StreamRecoder and EncodedFile() of the codec module. Use the builtin open()
......
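To show how the new `'a'` mode and multi-stream reading fit together, here is a small round-trip sketch. It assumes Python 3.3 or later; the temporary directory and the file name `example.bz2` are placeholders chosen for the example.

```python
import bz2
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.bz2")  # placeholder file name

    # Mode 'w' creates (or overwrites) the file with a single stream.
    with bz2.BZ2File(path, "w") as f:
        f.write(b"first\n")

    # Mode 'a' appends a second, independent compressed stream.
    with bz2.BZ2File(path, "a") as f:
        f.write(b"second\n")

    # Reading returns the data from both streams back to back.
    with bz2.BZ2File(path) as f:
        contents = f.read()

print(contents)
```

Each append adds a complete stream with its own header and end-of-stream marker, so a file written this way is exactly the kind of multi-stream input the reading code in this commit is meant to handle.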