Commit 22dfbbec authored by Joe Tsai's avatar Joe Tsai Committed by Nigel Tao

compress/flate: improve inflate speed by reading more bits at a time

The flate library guarantees that the Reader will never read more
bytes than is necessary. This way, the underlying io.Reader will
be left exactly after the last byte of the DEFLATE stream.
Formats like gzip depend on this behavior being true.

As such, inflate conservatively reads the minimum symbol length in
huffSym leading to many individual calls to moreBits. However, if we
take advantage of the fact that every block *must* end with the EOB
symbol, we can choose to read the length of the EOB symbol.
Since the EOB symbol is also the most rare symbol (occuring exactly
once) in a block, we can hypothesize that it is almost as long as
the max symbol length, allowing huffSym to ask for more bits at the
start of every loop. This increases the probabilty that the Huffman
code is decoded on the first iteration of the outer for-loop.

benchmark                              old MB/s     new MB/s     speedup
BenchmarkDecodeDigitsSpeed1e4-4        51.05        54.31        1.06x
BenchmarkDecodeDigitsSpeed1e5-4        58.86        62.24        1.06x
BenchmarkDecodeDigitsSpeed1e6-4        59.63        63.13        1.06x
BenchmarkDecodeDigitsDefault1e4-4      51.94        54.61        1.05x
BenchmarkDecodeDigitsDefault1e5-4      63.70        69.13        1.09x
BenchmarkDecodeDigitsDefault1e6-4      66.08        71.43        1.08x
BenchmarkDecodeDigitsCompress1e4-4     52.25        54.56        1.04x
BenchmarkDecodeDigitsCompress1e5-4     63.34        68.30        1.08x
BenchmarkDecodeDigitsCompress1e6-4     66.84        70.64        1.06x
BenchmarkDecodeTwainSpeed1e4-4         50.74        53.40        1.05x
BenchmarkDecodeTwainSpeed1e5-4         60.77        67.03        1.10x
BenchmarkDecodeTwainSpeed1e6-4         62.08        69.78        1.12x
BenchmarkDecodeTwainDefault1e4-4       53.45        56.40        1.06x
BenchmarkDecodeTwainDefault1e5-4       73.54        79.05        1.07x
BenchmarkDecodeTwainDefault1e6-4       77.68        83.65        1.08x
BenchmarkDecodeTwainCompress1e4-4      53.21        56.15        1.06x
BenchmarkDecodeTwainCompress1e5-4      73.82        77.76        1.05x
BenchmarkDecodeTwainCompress1e6-4      79.23        83.30        1.05x

Change-Id: Ie194925c827988a380b8c2fdd13b13c4faa5d397
Reviewed-on: https://go-review.googlesource.com/15651Reviewed-by: default avatarNigel Tao <nigeltao@golang.org>
parent 24375757
......@@ -465,6 +465,14 @@ func (f *decompressor) readHuffman() error {
return CorruptInputError(f.roffset)
}
// As an optimization, we can initialize the min bits to read at a time
// for the HLIT tree to the length of the EOB marker since we know that
// every block must terminate with one. This preserves the property that
// we never read any extra bytes after the end of the DEFLATE stream.
if f.h1.min < f.bits[endBlockMarker] {
f.h1.min = f.bits[endBlockMarker]
}
return nil
}
......
Markdown is supported
0%
or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment