golang/strconv_test.py · 90f0e0ff69ef261040e61573efb5bdd1fa91d2e6 · Carlos Ramos Carreño / pygolang

strconv: Add benchmarks for quote and unquote · 90f0e0ff
Kirill Smelkov authored Jun 23, 2023
These functions are currently relatively slow. They were initially used
in zodbdump and zodbrestore, where their speed did not matter much, but
with bstr and ustr, since e.g. quote is used in repr, the fact that they do
not perform at a speed similar to builtin string escaping is starting to be an
issue. Tatuya Kamada reports at nexedi/pygolang!21 (comment 170833):

    ### 3. `u` seems slow with large arrays especially when `repr` it

    I have faced a slowness while testing `u`, `b` with python 2.7, especially with `repr`.

    ```python
    >>> timeit.timeit("from golang import b,u; u('あ'*199998)", number=10)
    2.02020001411438
    >>> timeit.timeit("from golang import b,u; repr(u('あ'*199998))", number=10)
    54.60263395309448
    ```

    `bytes`(str) is very fast.

    ```python
    >>> timeit.timeit("from golang import b,u; bytes('あ'*199998)", number=10)
    0.000392913818359375
    >>> timeit.timeit("from golang import b,u; repr(bytes('あ'*199998))", number=10)
    0.4604980945587158
    ```

    `b` is much faster than `u`, but still the repr seems slow.

    ```python
    >>> timeit.timeit("from golang import b,u; b('あ'*199998)", number=10)
    0.0009968280792236328
    >>> timeit.timeit("from golang import b,u; repr(b('あ'*199998))", number=10)
    25.498882055282593
    ```

The "repr" part of this problem arises because both bstr.__repr__ and
ustr.__repr__ use custom quoting routines, which are currently implemented in
pure Python in the strconv module:

https://lab.nexedi.com/kirr/pygolang/blob/300d7dfa/golang/_golang_str.pyx#L282-291
https://lab.nexedi.com/kirr/pygolang/blob/300d7dfa/golang/_golang_str.pyx#L582-591
https://lab.nexedi.com/kirr/pygolang/blob/300d7dfa/golang/_golang_str.pyx#L941-970
https://lab.nexedi.com/kirr/pygolang/blob/300d7dfa/golang/strconv.py#L31-92

The fix would be to move strconv.py to Cython and to correspondingly rework it
to avoid using python-level constructs during quoting internally.

Working on that was not a priority, but soon I will need to move strconv to
Cython for another reason: to be able to break the import cycle between _golang
and strconv.

So it makes sense to add a strconv benchmark first - since we'll start moving it
to Cython anyway - to see where we are and how further changes will help
performance-wise.
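For illustration only, here is a minimal sketch of this kind of measurement using the standard timeit module. It is not the benchmark added by this commit: `toy_quote`, `time_per_op` and the 1000-character input are hypothetical stand-ins, and `toy_quote` is a simplified per-character escaping loop, not pygolang's strconv.quote.

```python
import timeit

def toy_quote(s):
    # Hypothetical pure-Python quoting loop (NOT pygolang's strconv.quote):
    # every character goes through interpreter-level branching and appends.
    out = ['"']
    for ch in s:
        if ch == '"' or ch == '\\':
            out.append('\\' + ch)               # escape quote and backslash
        elif ' ' <= ch <= '~':
            out.append(ch)                      # printable ASCII kept as-is
        elif ord(ch) > 0xffff:
            out.append('\\U%08x' % ord(ch))     # outside the BMP
        else:
            out.append('\\u%04x' % ord(ch))     # everything else
    out.append('"')
    return ''.join(out)

def time_per_op(func, arg, number=200, repeat=5):
    # Best-of-N wall time for one call of func(arg), in seconds.
    best = min(timeit.repeat(lambda: func(arg), number=number, repeat=repeat))
    return best / number

text = 'a' * 1000    # assumed input size, chosen only for this sketch
t_toy = time_per_op(toy_quote, text)
t_std = time_per_op(repr, text)    # builtin escaping as the baseline
print('toy_quote: %8.2f us/op' % (t_toy * 1e6))
print('repr:      %8.2f us/op' % (t_std * 1e6))
```

Taking the minimum over several repeats reduces noise from other activity on the machine; the absolute numbers will differ from the tables in this message.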

Currently, on py2, we are at:

    name                 time/op
    quote[a]              910µs ± 0%
    quote[\u03b1]        1.23ms ± 0%
    quote[\u65e5]         800µs ± 0%
    quote[\U0001f64f]    1.06ms ± 1%
    stdquote             1.17µs ± 0%
    unquote[a]           1.33ms ± 1%
    unquote[\u03b1]       952µs ± 2%
    unquote[\u65e5]       613µs ± 2%
    unquote[\U0001f64f]  3.62ms ± 1%
    stdunquote            788ns ± 0%

i.e. on py2 quoting is ~1000x slower than builtin string escaping, and unquoting is
even slower.

On py3 the situation is better, but still not good:

    name                 time/op
    quote[a]              579µs ± 1%
    quote[\u03b1]         942µs ± 1%
    quote[\u65e5]         595µs ± 0%
    quote[\U0001f64f]     274µs ± 1%
    stdquote             2.70µs ± 0%
    unquote[a]            696µs ± 1%
    unquote[\u03b1]       763µs ± 0%
    unquote[\u65e5]       474µs ± 1%
    unquote[\U0001f64f]   187µs ± 0%
    stdunquote            808ns ± 0%
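
The slowdown factors can be read off the two tables above. The sketch below does the arithmetic for the `[a]` rows, assuming the std* rows serve as the builtin baseline for them; the `slowdown` helper is a name introduced here for illustration.

```python
# time/op in microseconds, copied from the py2 and py3 tables above
py2 = {'quote[a]': 910.0, 'stdquote': 1.17,
       'unquote[a]': 1330.0, 'stdunquote': 0.788}
py3 = {'quote[a]': 579.0, 'stdquote': 2.70,
       'unquote[a]': 696.0, 'stdunquote': 0.808}

def slowdown(times, kind):
    # Ratio of the strconv routine to its builtin counterpart.
    return times['%s[a]' % kind] / times['std%s' % kind]

for name, times in (('py2', py2), ('py3', py3)):
    print('%s: quote %.0fx, unquote %.0fx'
          % (name, slowdown(times, 'quote'), slowdown(times, 'unquote')))
# prints:
# py2: quote 778x, unquote 1688x
# py3: quote 214x, unquote 861x
```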

δ(py2, py3) for reference:

    name                 py2 time/op  py3 time/op  delta
    quote[a]              910µs ± 0%   579µs ± 1%   -36.42%  (p=0.008 n=5+5)
    quote[\u03b1]        1.23ms ± 0%  0.94ms ± 1%   -23.17%  (p=0.008 n=5+5)
    quote[\u65e5]         800µs ± 0%   595µs ± 0%   -25.63%  (p=0.016 n=4+5)
    quote[\U0001f64f]    1.06ms ± 1%  0.27ms ± 1%   -74.23%  (p=0.008 n=5+5)
    stdquote             1.17µs ± 0%  2.70µs ± 0%  +129.71%  (p=0.008 n=5+5)
    unquote[a]           1.33ms ± 1%  0.70ms ± 1%   -47.71%  (p=0.008 n=5+5)
    unquote[\u03b1]       952µs ± 2%   763µs ± 0%   -19.82%  (p=0.008 n=5+5)
    unquote[\u65e5]       613µs ± 2%   474µs ± 1%   -22.76%  (p=0.008 n=5+5)
    unquote[\U0001f64f]  3.62ms ± 1%  0.19ms ± 0%   -94.84%  (p=0.016 n=5+4)
    stdunquote            788ns ± 0%   808ns ± 0%    +2.59%  (p=0.016 n=4+5)