strconv: Add benchmarks for quote and unquote
These functions are currently relatively slow. They were initially used in
zodbdump and zodbrestore, where their speed did not matter much. With bstr and
ustr, however, quote is used in repr, for example, so their being much slower
than builtin string escaping starts to be an issue.

Tatuya Kamada reports at nexedi/pygolang!21 (comment 170833):

    ### 3. `u` seems slow with large arrays especially when `repr` it

    I have faced a slowness while testing `u`, `b` with python 2.7, especially with `repr`.

    ```python
    >>> timeit.timeit("from golang import b,u; u('あ'*199998)", number=10)
    2.02020001411438
    >>> timeit.timeit("from golang import b,u; repr(u('あ'*199998))", number=10)
    54.60263395309448
    ```

    `bytes`(str) is very fast.

    ```python
    >>> timeit.timeit("from golang import b,u; bytes('あ'*199998)", number=10)
    0.000392913818359375
    >>> timeit.timeit("from golang import b,u; repr(bytes('あ'*199998))", number=10)
    0.4604980945587158
    ```

    `b` is much faster than `u`, but still the repr seems slow.

    ```
    >>> timeit.timeit("from golang import b,u; b('あ'*199998)", number=10)
    0.0009968280792236328
    >>> timeit.timeit("from golang import b,u; repr(b('あ'*199998))", number=10)
    25.498882055282593
    ```

The "repr" part of this problem arises because both bstr.__repr__ and
ustr.__repr__ use custom quoting routines, which are currently implemented in
pure Python in the strconv module:

    https://lab.nexedi.com/kirr/pygolang/blob/300d7dfa/golang/_golang_str.pyx#L282-291
    https://lab.nexedi.com/kirr/pygolang/blob/300d7dfa/golang/_golang_str.pyx#L582-591
    https://lab.nexedi.com/kirr/pygolang/blob/300d7dfa/golang/_golang_str.pyx#L941-970
    https://lab.nexedi.com/kirr/pygolang/blob/300d7dfa/golang/strconv.py#L31-92

The fix would be to move strconv.py to Cython and to rework it accordingly, so
that quoting no longer uses Python-level constructs internally.
Working on that was not a priority, but soon I will need to move strconv to
Cython for another reason: to be able to break the import cycle between _golang
and strconv.

Since we will start moving strconv to Cython anyway, it makes sense to add a
strconv benchmark first, to see where we are and how further changes will help
performance-wise.

Currently we are at

    name                 time/op
    quote[a]              910µs ± 0%
    quote[\u03b1]        1.23ms ± 0%
    quote[\u65e5]         800µs ± 0%
    quote[\U0001f64f]    1.06ms ± 1%
    stdquote             1.17µs ± 0%
    unquote[a]           1.33ms ± 1%
    unquote[\u03b1]       952µs ± 2%
    unquote[\u65e5]       613µs ± 2%
    unquote[\U0001f64f]  3.62ms ± 1%
    stdunquote            788ns ± 0%

i.e. on py2 quoting is ~ 1000x slower than builtin string escaping, and
unquoting is even slower.

On py3 the situation is better, but still not good:

    name                 time/op
    quote[a]              579µs ± 1%
    quote[\u03b1]         942µs ± 1%
    quote[\u65e5]         595µs ± 0%
    quote[\U0001f64f]     274µs ± 1%
    stdquote             2.70µs ± 0%
    unquote[a]            696µs ± 1%
    unquote[\u03b1]       763µs ± 0%
    unquote[\u65e5]       474µs ± 1%
    unquote[\U0001f64f]   187µs ± 0%
    stdunquote            808ns ± 0%

δ(py2, py3) for reference:

    name                 py2 time/op  py3 time/op  delta
    quote[a]              910µs ± 0%   579µs ± 1%   -36.42%  (p=0.008 n=5+5)
    quote[\u03b1]        1.23ms ± 0%  0.94ms ± 1%   -23.17%  (p=0.008 n=5+5)
    quote[\u65e5]         800µs ± 0%   595µs ± 0%   -25.63%  (p=0.016 n=4+5)
    quote[\U0001f64f]    1.06ms ± 1%  0.27ms ± 1%   -74.23%  (p=0.008 n=5+5)
    stdquote             1.17µs ± 0%  2.70µs ± 0%  +129.71%  (p=0.008 n=5+5)
    unquote[a]           1.33ms ± 1%  0.70ms ± 1%   -47.71%  (p=0.008 n=5+5)
    unquote[\u03b1]       952µs ± 2%   763µs ± 0%   -19.82%  (p=0.008 n=5+5)
    unquote[\u65e5]       613µs ± 2%   474µs ± 1%   -22.76%  (p=0.008 n=5+5)
    unquote[\U0001f64f]  3.62ms ± 1%  0.19ms ± 0%   -94.84%  (p=0.016 n=5+4)
    stdunquote            788ns ± 0%   808ns ± 0%    +2.59%  (p=0.016 n=4+5)
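For a rough feel of why a pure-Python quoting loop lags so far behind builtin
escaping, here is a minimal sketch that can be run without pygolang. It is not
the benchmark added by this commit and not the strconv code: `naive_quote` and
`time_per_call` are hypothetical names, the escaping rules are simplified, and
the 1000-character input is an assumed size. It targets Python 3.

```python
import timeit


def naive_quote(s):
    """Hypothetical stand-in for a pure-Python quoting routine.

    Not pygolang's strconv.quote: the escaping rules are simplified and only
    illustrate the per-character Python-level work such a routine performs.
    """
    out = ['"']
    for ch in s:
        if ch == '"' or ch == '\\':
            out.append('\\' + ch)
        elif ch == '\n':
            out.append('\\n')
        elif ch == '\t':
            out.append('\\t')
        elif ch == '\r':
            out.append('\\r')
        elif ord(ch) < 0x20 or ord(ch) == 0x7f:
            # remaining control characters are emitted as \xNN
            out.append('\\x%02x' % ord(ch))
        else:
            out.append(ch)
    out.append('"')
    return ''.join(out)


def time_per_call(func, arg, number=200):
    """Return seconds per call of func(arg), taking the best of 3 repeats."""
    best = min(timeit.repeat(lambda: func(arg), number=number, repeat=3))
    return best / number


if __name__ == '__main__':
    s = 'a' * 1000  # assumed input size, chosen only for illustration
    t_naive = time_per_call(naive_quote, s)
    t_std = time_per_call(repr, s)  # builtin escaping, for comparison
    print('naive_quote  %10.2f us/op' % (t_naive * 1e6))
    print('repr         %10.2f us/op' % (t_std * 1e6))
    print('ratio        %10.0fx' % (t_naive / t_std))
```

The absolute numbers will differ from the tables above, since the input,
machine and routine are all different; only the order-of-magnitude gap between
the per-character Python loop and the builtin is of interest here.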