    strconv: Add benchmarks for quote and unquote · 23f0a47c
    Kirill Smelkov authored
    These functions are currently relatively slow. They were initially used
    in zodbdump and zodbrestore, where their speed did not matter much, but
    with bstr and ustr, since e.g. quote is used in repr, the fact that they do
    not perform at a speed similar to builtin string escaping starts to be an
    issue. Tatuya Kamada reports at nexedi/pygolang!21 (comment 170833):
    
        ### 3. `u` seems slow with large arrays especially when `repr` it
    
        I have faced a slowness while testing `u`, `b` with python 2.7, especially with `repr`.
    
        ```python
        >>> timeit.timeit("from golang import b,u; u('あ'*199998)", number=10)
        2.02020001411438
        >>> timeit.timeit("from golang import b,u; repr(u('あ'*199998))", number=10)
        54.60263395309448
        ```
    
        `bytes`(str) is very fast.
    
        ```python
        >>> timeit.timeit("from golang import b,u; bytes('あ'*199998)", number=10)
        0.000392913818359375
        >>> timeit.timeit("from golang import b,u; repr(bytes('あ'*199998))", number=10)
        0.4604980945587158
        ```
    
        `b` is much faster than `u`, but still the repr seems slow.
    
        ```python
        >>> timeit.timeit("from golang import b,u; b('あ'*199998)", number=10)
        0.0009968280792236328
        >>> timeit.timeit("from golang import b,u; repr(b('あ'*199998))", number=10)
        25.498882055282593
        ```
    
    The "repr" part of this problem arises because both bstr.__repr__ and
    ustr.__repr__ use custom quoting routines, which are currently implemented in
    pure Python in the strconv module:
    
    https://lab.nexedi.com/kirr/pygolang/blob/300d7dfa/golang/_golang_str.pyx#L282-291
    https://lab.nexedi.com/kirr/pygolang/blob/300d7dfa/golang/_golang_str.pyx#L582-591
    https://lab.nexedi.com/kirr/pygolang/blob/300d7dfa/golang/_golang_str.pyx#L941-970
    https://lab.nexedi.com/kirr/pygolang/blob/300d7dfa/golang/strconv.py#L31-92
    
    The fix would be to move strconv.py to Cython and to correspondingly rework it
    to avoid using python-level constructs during quoting internally.
    
    Working on that was not a priority, but soon I will need to move strconv to
    Cython for another reason: to be able to break the import cycle between _golang
    and strconv.
    
    So it makes sense to add a strconv benchmark first - since we'll start moving it
    to Cython anyway - to see where we are and how further changes will help
    performance-wise.
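
For illustration only, a minimal sketch of this kind of measurement is shown below. It is not the code added in strconv_test.py: `py_quote` is a hypothetical, simplified stand-in for a pure-Python quoting routine, `bench` is a hypothetical helper built on the standard `timeit` module, and the builtin `repr` plays the role of builtin string escaping. The sketch assumes Python 3.

```python
import timeit


def py_quote(s):
    # Hypothetical stand-in for a pure-Python quoting routine (NOT the real
    # strconv.quote): escapes backslash, double quote and every character
    # outside printable ASCII, one character at a time.
    out = []
    for ch in s:
        if ch in ('\\', '"'):
            out.append('\\' + ch)
        elif ' ' <= ch <= '~':
            out.append(ch)
        elif ord(ch) <= 0xffff:
            out.append('\\u%04x' % ord(ch))
        else:
            out.append('\\U%08x' % ord(ch))
    return '"' + ''.join(out) + '"'


def bench(f, arg, number=20):
    # Hypothetical helper: best-of-3 average time per call, in seconds.
    return min(timeit.repeat(lambda: f(arg), number=number, repeat=3)) / number


data = 'a' * 10000
t_py = bench(py_quote, data)   # per-character work done at Python level
t_std = bench(repr, data)      # builtin escaping, implemented in C

print('py_quote  %10.2f us/op' % (t_py * 1e6))
print('repr      %10.2f us/op' % (t_std * 1e6))
```

The absolute numbers depend on the machine and interpreter; the point of the sketch is only to put a Python-level quoting loop and the builtin escaping side by side on the same input.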
    
    Currently, on py2, we are at
    
        name                 time/op
        quote[a]              910µs ± 0%
        quote[\u03b1]        1.23ms ± 0%
        quote[\u65e5]         800µs ± 0%
        quote[\U0001f64f]    1.06ms ± 1%
        stdquote             1.17µs ± 0%
        unquote[a]           1.33ms ± 1%
        unquote[\u03b1]       952µs ± 2%
        unquote[\u65e5]       613µs ± 2%
        unquote[\U0001f64f]  3.62ms ± 1%
        stdunquote            788ns ± 0%
    
    i.e. on py2 quoting is ~1000x slower than builtin string escaping, and unquoting is
    even slower.
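
As a quick check of that estimate, the ratios can be recomputed from the py2 table above. The snippet below only redoes the arithmetic on the rounded time/op values copied from that table; the variable names are illustrative.

```python
# time/op values copied from the py2 table, converted to seconds
quote = {
    'a':          910e-6,
    r'\u03b1':    1.23e-3,
    r'\u65e5':    800e-6,
    r'\U0001f64f': 1.06e-3,
}
unquote = {
    'a':          1.33e-3,
    r'\u03b1':    952e-6,
    r'\u65e5':    613e-6,
    r'\U0001f64f': 3.62e-3,
}
stdquote = 1.17e-6
stdunquote = 788e-9

# slowdown of each benchmark relative to the builtin counterpart
quote_ratio = {k: t / stdquote for k, t in quote.items()}
unquote_ratio = {k: t / stdunquote for k, t in unquote.items()}

for k in quote:
    print('quote[%s]    %6.0fx    unquote[%s]  %6.0fx'
          % (k, quote_ratio[k], k, unquote_ratio[k]))
```

With these inputs the quote ratios fall roughly between 700x and 1050x, and the unquote ratios between roughly 780x and 4600x.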
    
    On py3 the situation is better, but still not good:
    
        name                 time/op
        quote[a]              579µs ± 1%
        quote[\u03b1]         942µs ± 1%
        quote[\u65e5]         595µs ± 0%
        quote[\U0001f64f]     274µs ± 1%
        stdquote             2.70µs ± 0%
        unquote[a]            696µs ± 1%
        unquote[\u03b1]       763µs ± 0%
        unquote[\u65e5]       474µs ± 1%
        unquote[\U0001f64f]   187µs ± 0%
        stdunquote            808ns ± 0%
    
    δ(py2, py3) for reference:
    
        name                 py2 time/op  py3 time/op  delta
        quote[a]              910µs ± 0%   579µs ± 1%   -36.42%  (p=0.008 n=5+5)
        quote[\u03b1]        1.23ms ± 0%  0.94ms ± 1%   -23.17%  (p=0.008 n=5+5)
        quote[\u65e5]         800µs ± 0%   595µs ± 0%   -25.63%  (p=0.016 n=4+5)
        quote[\U0001f64f]    1.06ms ± 1%  0.27ms ± 1%   -74.23%  (p=0.008 n=5+5)
        stdquote             1.17µs ± 0%  2.70µs ± 0%  +129.71%  (p=0.008 n=5+5)
        unquote[a]           1.33ms ± 1%  0.70ms ± 1%   -47.71%  (p=0.008 n=5+5)
        unquote[\u03b1]       952µs ± 2%   763µs ± 0%   -19.82%  (p=0.008 n=5+5)
        unquote[\u65e5]       613µs ± 2%   474µs ± 1%   -22.76%  (p=0.008 n=5+5)
        unquote[\U0001f64f]  3.62ms ± 1%  0.19ms ± 0%   -94.84%  (p=0.016 n=5+4)
        stdunquote            788ns ± 0%   808ns ± 0%    +2.59%  (p=0.016 n=4+5)
strconv_test.py 5.79 KB