Commits · 3f2215685c3bb9a9f6f6ba326545b3f913e14825 · Kirill Smelkov / pygolang

20 Dec, 2024 10 commits

golang_str: tests: Make test_strings_mod_and_format more robust with upcoming unicode=ustr · 3f221568

Kirill Smelkov authored Dec 17, 2024

Previously test_strings_mod_and_format was testing % and .format via
comparing bstr and ustr results with the corresponding result for unicode.
This works reasonably ok. However under gpython, when unicode is
replaced with ustr, the test will no longer compare results of bstr/ustr
methods with something good and external - indeed in that case the bstr/ustr
result of e.g. % will be compared to the result of ustr % which opens the
door for bugs to stay unnoticed.

-> Adjust the test, similarly to 9a075b17 (golang_str: tests: Make
test_strings_methods more robust with upcoming unicode=ustr), to
explicitly provide expected result for all entries in the test vector.
We make sure those results are good and match std python because we also
assert that unicode % and .format match it.
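
As an illustration of the approach described above, here is a minimal sketch of a test vector that spells out the expected result for every entry instead of deriving it from another string type. The entries and the helper name `check_mod` are hypothetical and are not taken from pygolang's test suite.

```python
# Each entry: (format string, arguments, explicitly spelled-out expected result).
# The entries are illustrative only; they are not pygolang's actual test vector.
testv = [
    ("%s",      ("abc",),   "abc"),
    ("%s = %d", ("x", 5),   "x = 5"),
    ("%5.2f",   (3.14159,), " 3.14"),
    ("%(k)s",   {"k": "v"}, "v"),
]

def check_mod(testv):
    """Assert that `fmt % args` matches the explicit expectation for every entry."""
    for fmt, args, expected in testv:
        got = fmt % args
        assert got == expected, (fmt, args, got, expected)
    return len(testv)
```

Because the expectation is a literal, the check stays meaningful even if the type on the left-hand side of `%` is later swapped for another implementation.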

3f221568

golang_str: Fix ustr.translate on sequence · d76d5e1a

Kirill Smelkov authored May 06, 2024

NumPy uses s.translate(str) and under gpython/py3 with str patched to be
ustr it breaks with:

      File ".../numpy-1.24.4-py3.9-linux-x86_64.egg/numpy/core/_string_helpers.py", line 40, in english_lower
        lowered = s.translate(LOWER_TABLE)
      File "golang/_golang_str.pyx", line 909, in golang._golang._pyustr.translate
    AttributeError: 'str' object has no attribute 'items'

https://docs.python.org/3/library/stdtypes.html#str.translate documents
translate to work on both mappings and sequences, so my usage of
table.items() in ff24be3d (golang_str: bstr/ustr string methods) was not
correct.

-> Fix it by reworking ustr.translate to use our proxy mapping instead
of going through all items of original table in the beginning.
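
To illustrate the idea of a proxy table, here is a minimal sketch in plain Python. `TableProxy` is a hypothetical name and not pygolang's implementation; the bytes-to-str normalization is an assumption added only to give the proxy something to do. It relies on the documented behaviour that `str.translate` accepts any object implementing `__getitem__` and treats `LookupError` as "leave the character unchanged".

```python
class TableProxy(object):
    """Hypothetical adapter: forwards lookups to a mapping or a sequence table."""

    def __init__(self, table):
        self._table = table

    def __getitem__(self, codepoint):
        # dict raises KeyError, str/list raise IndexError; both are LookupError
        # subclasses, which str.translate interprets as "keep the character".
        value = self._table[codepoint]
        if isinstance(value, bytes):        # assumption: normalize bytes values
            value = value.decode("utf-8")
        return value

# A sequence table in the style of NumPy's LOWER_TABLE: 256 characters,
# with ASCII uppercase letters replaced by their lowercase counterparts.
LOWER = "".join(chr(c + 32) if 65 <= c <= 90 else chr(c) for c in range(256))

def lower_ascii(s):
    return s.translate(TableProxy(LOWER))
```

The proxy never calls `.items()`, so it works the same way for dict tables and for sequence tables.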

d76d5e1a

golang_str: tests: Fix thinko wrt \u in tests · b31c5fa2

Kirill Smelkov authored May 06, 2024

On py2 \u does not work in str literals - only in unicode ones.

This corrects all tests that were doing x32 incorrectly due to the thinko.

b31c5fa2

golang_str: Fix bstr/ustr .__str__ to always return bstr/ustr even for subclasses · d4dcf5dd

Kirill Smelkov authored May 02, 2023

This behaviour is provided by builtin str and we were not following it:

    $ python3
    Python 3.11.2 (main, Mar 13 2023, 12:18:29) [GCC 12.2.0] on linux
    Type "help", "copyright", "credits" or "license" for more information.
    >>> class SSS(str): pass
    ...
    >>> z = SSS('abc')
    >>> z
    'abc'
    >>> type(z)
    <class '__main__.SSS'>
    >>> q = str(z)
    >>> q
    'abc'
    >>> type(q)
    <class 'str'>
    >>> r = z.__str__()
    >>> r
    'abc'
    >>> type(r)
    <class 'str'>                       <-- NOTE str, not __main__.SSS

    $ gpython               # with str patched to be ustr
    >>> class SSS(str): pass
    >>> z = SSS('abc')
    >>> z
    'abc'
    >>> type(z)
    <class '__main__.SSS'>
    >>> q = str(z)
    >>> q
    'abc'
    >>> type(q)
    <class 'str'>
    >>> r = z.__str__()
    >>> r
    'abc'
    >>> type(r)
    <class '__main__.SSS'>              <-- NOTE not str

which leads to crash during IPython startup on py3.11:

    $ gpython -m IPython    # with str patched to be ustr
    Traceback (most recent call last):
      File "/home/kirr/src/tools/go/py3.venv/bin/gpython", line 8, in <module>
        sys.exit(main())
                 ^^^^^^
      File "/home/kirr/src/tools/go/pygolang-master/gpython/__init__.py", line 478, in main
        pymain(argv, init)
      File "/home/kirr/src/tools/go/pygolang-master/gpython/__init__.py", line 291, in pymain
        run(mmain)
      File "/home/kirr/src/tools/go/pygolang-master/gpython/__init__.py", line 162, in run
        runpy._run_module_as_main(mod)
      File "<frozen runpy>", line 198, in _run_module_as_main
      File "<frozen runpy>", line 88, in _run_code
      File "/home/kirr/src/tools/go/py3.venv/lib/python3.11/site-packages/IPython/__main__.py", line 15, in <module>
        start_ipython()
      File "/home/kirr/src/tools/go/py3.venv/lib/python3.11/site-packages/IPython/__init__.py", line 128, in start_ipython
        return launch_new_instance(argv=argv, **kwargs)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "/home/kirr/src/tools/go/py3.venv/lib/python3.11/site-packages/traitlets/config/application.py", line 1042, in launch_instance
        app.initialize(argv)
      File "/home/kirr/src/tools/go/py3.venv/lib/python3.11/site-packages/traitlets/config/application.py", line 113, in inner
        return method(app, *args, **kwargs)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "/home/kirr/src/tools/go/py3.venv/lib/python3.11/site-packages/IPython/terminal/ipapp.py", line 279, in initialize
        self.init_shell()
      File "/home/kirr/src/tools/go/py3.venv/lib/python3.11/site-packages/IPython/terminal/ipapp.py", line 293, in init_shell
        self.shell = self.interactive_shell_class.instance(parent=self,
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "/home/kirr/src/tools/go/py3.venv/lib/python3.11/site-packages/traitlets/config/configurable.py", line 551, in instance
        inst = cls(*args, **kwargs)
               ^^^^^^^^^^^^^^^^^^^^
      File "/home/kirr/src/tools/go/py3.venv/lib/python3.11/site-packages/IPython/terminal/interactiveshell.py", line 856, in __init__
        self.init_prompt_toolkit_cli()
      File "/home/kirr/src/tools/go/py3.venv/lib/python3.11/site-packages/IPython/terminal/interactiveshell.py", line 648, in init_prompt_toolkit_cli
        **self._extra_prompt_options(),
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "/home/kirr/src/tools/go/py3.venv/lib/python3.11/site-packages/IPython/terminal/interactiveshell.py", line 751, in _extra_prompt_options
        "lexer": IPythonPTLexer(),
                 ^^^^^^^^^^^^^^^^
      File "/home/kirr/src/tools/go/py3.venv/lib/python3.11/site-packages/IPython/terminal/ptutils.py", line 177, in __init__
        self.python_lexer = PygmentsLexer(l.Python3Lexer)
                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "/home/kirr/src/tools/go/py3.venv/lib/python3.11/site-packages/prompt_toolkit/lexers/pygments.py", line 198, in __init__
        self.pygments_lexer = pygments_lexer_cls(
                              ^^^^^^^^^^^^^^^^^^^
      File "/home/kirr/src/tools/go/py3.venv/lib/python3.11/site-packages/pygments/lexer.py", line 647, in __call__
        cls._tokens = cls.process_tokendef('', cls.get_tokendefs())
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "/home/kirr/src/tools/go/py3.venv/lib/python3.11/site-packages/pygments/lexer.py", line 586, in process_tokendef
        cls._process_state(tokendefs, processed, state)
      File "/home/kirr/src/tools/go/py3.venv/lib/python3.11/site-packages/pygments/lexer.py", line 549, in _process_state
        tokens.extend(cls._process_state(unprocessed, processed,
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "/home/kirr/src/tools/go/py3.venv/lib/python3.11/site-packages/pygments/lexer.py", line 533, in _process_state
        assert type(state) is str, "wrong state name %r (%r)" % (state, type(state))
               ^^^^^^^^^^^^^^^^^^
    AssertionError: wrong state name 'keywords' (<class 'pygments.lexer.include'>)

    If you suspect this is an IPython 8.12.0 bug, please report it at:
        https://github.com/ipython/ipython/issues
    or send an email to the mailing list at ipython-dev@python.org

    You can print a more detailed traceback right now with "%tb", or use "%debug"
    to interactively debug it.

    Extra-detailed tracebacks for bug-reporting purposes can be enabled via:
        c.Application.verbose_crash=True

Here pygments defines

    class include(str):
        pass

and wants `str(obj)` to return str, not include, if obj is an instance of include.

-> Adjust bstr/ustr .__str__() to always return bstr/ustr even for
subclasses.

For consistency, do the same for .__unicode__ . In case a
subclass wants its __str__, or __unicode__ to return self
without casting to bstr/ustr, it can override those methods.
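
The behaviour described above can be sketched in plain Python. `BaseText` is a hypothetical stand-in for bstr/ustr and is not pygolang's code; it only shows a `__str__` that always casts to the base type, plus a subclass that opts out by overriding it.

```python
class BaseText(str):
    """Hypothetical str subclass standing in for bstr/ustr."""

    def __str__(self):
        # Always return an exact BaseText, even if self is a subclass instance.
        return BaseText(str.__str__(self))

class Tagged(BaseText):
    """Subclass that inherits the casting __str__ (like pygments' include)."""
    pass

class KeepsSelf(BaseText):
    """Subclass that explicitly wants __str__ to return self."""

    def __str__(self):
        return self
```

With this, `type(str(Tagged("abc")))` is `BaseText` rather than `Tagged`, which mirrors what builtin str does for its own subclasses.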

d4dcf5dd

golang_str: Fix bstr/ustr __add__ and friends to return NotImplemented wrt unsupported types · aa5d2f91

Kirill Smelkov authored May 10, 2024

In bbbb58f0 (golang_str: bstr/ustr support for + and *) I've added
support for binary string operations, but similarly to __eq__ did not
handle correctly the case for arbitrary arguments that potentially
define __radd__ and similar.

As a result it breaks when running e.g. bstr + pyparsing.Regex

      File ".../pyparsing-2.4.7-py2.7.egg/pyparsing.py", line 6591, in pyparsing_common
        _full_ipv6_address = (_ipv6_part + (':' + _ipv6_part) * 7).setName("full IPv6 address")
      File "golang/_golang_str.pyx", line 469, in golang._golang._pybstr.__add__
        return pyb(zbytes.__add__(a, _pyb_coerce(b)))
      File "golang/_golang_str.pyx", line 243, in golang._golang._pyb_coerce
        raise TypeError("b: coerce: invalid type %s" % type(x))
    TypeError: b: coerce: invalid type <class 'pyparsing.Regex'>

because pyparsing.Regex is a type that does not inherit from str, but defines
its own __radd__ to handle str + Regex as Regex.

-> Fix it by returning NotImplemented from __add__ and other operations
where it is needed so that bstr and ustr behave in the same way as builtin str
wrt third types, while taking care to keep the bstr/ustr promise that

    only explicit conversion through `b` and `u` accept objects with buffer interface. Automatic coercion does not.
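
A minimal sketch of the protocol involved, using hypothetical classes `Text` (standing in for bstr/ustr) and `Pattern` (loosely modelled on the pyparsing.Regex case). Neither is the real implementation; the point is only that returning NotImplemented lets Python fall back to the right operand's `__radd__`.

```python
class Text(object):
    """Hypothetical string-like type standing in for bstr/ustr."""

    def __init__(self, s):
        self.s = s

    def __add__(self, other):
        if isinstance(other, Text):
            return Text(self.s + other.s)
        if isinstance(other, str):
            return Text(self.s + other)
        # Unknown type: do not raise, let Python try other.__radd__(self).
        return NotImplemented

class Pattern(object):
    """Hypothetical third-party type that handles `string + Pattern` itself."""

    def __init__(self, parts):
        self.parts = list(parts)

    def __radd__(self, left):
        return Pattern([left] + self.parts)
```

If neither side handles the operation, Python itself raises TypeError, so callers still see the familiar error for truly unsupported operands.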

aa5d2f91

golang_str: Fix bstr/ustr __eq__ and friends to return NotImplemented wrt non-string types · 09694757

Kirill Smelkov authored May 08, 2024

In 54c2a3cf (golang_str: Teach bstr/ustr to compare wrt any
string with automatic coercion) I've added __eq__, __ne__, __lt__ etc
methods to our strings, but made __lt__ and other comparisons raise
TypeError against any non-string type. My idea was to mimic user-visible
py3 behaviour such as

    >>> "abc" > 1
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: '>' not supported between instances of 'str' and 'int'

However it turned out that the implementation was not exactly matching
what Python is doing internally, which led to incorrect behaviour when
bstr or ustr is compared wrt another type with its own __cmp__. In the
general case for `a op b` Python first queries a.__op__(b) and
b.__op'__(a) and sometimes other methods before going to .__cmp__. This
relies on the methods to return NotImplemented instead of raising an
exception: if a trial raises TypeError everything is stopped and that
TypeError is propagated to the caller.

Jérome reports a real breakage due to this when bstr is compared wrt
distutils.version.LooseVersion . LooseVersion is basically

    class LooseVersion(Version):
        def __cmp__ (self, other):
            if isinstance(other, StringType):
                other = LooseVersion(other)

            return cmp(self.version, other.version)

but due to my thinko on `LooseVersion < bstr` the control flow was not
getting into that LooseVersion.__cmp__ because bstr.__gt__ was tried
first and raised TypeError.

-> Fix all comparison operations to return NotImplemented instead of
raising TypeError and make sure in the tests that this behaviour exactly
matches what native str type does.
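
The difference between raising and returning NotImplemented can be sketched as follows. This is a py3 analogue with the operands mirrored (the string-like object is on the left), since py3 has no __cmp__; `RaisingText`, `CooperativeText` and `Version` are hypothetical names, not pygolang or distutils code.

```python
class Version(object):
    """Hypothetical version type that can compare itself against string-likes."""

    def __init__(self, s):
        self.key = tuple(int(p) for p in str(s).split("."))

    def __gt__(self, other):
        other_key = other.key if isinstance(other, Version) else Version(other).key
        return self.key > other_key

class RaisingText(object):
    """String-like whose __lt__ raises for unknown types (the old behaviour)."""

    def __init__(self, s):
        self.s = s

    def __str__(self):
        return self.s

    def __lt__(self, other):
        if isinstance(other, str):
            return self.s < other
        raise TypeError("'<' not supported")    # stops the comparison protocol

class CooperativeText(RaisingText):
    """String-like whose __lt__ returns NotImplemented (the fixed behaviour)."""

    def __lt__(self, other):
        if isinstance(other, str):
            return self.s < other
        return NotImplemented                   # Python then tries other.__gt__
```

With `CooperativeText("1.9") < Version("1.10")` Python falls back to `Version.__gt__`, which compares `(1, 10) > (1, 9)`; with `RaisingText` the TypeError is raised before `Version` is ever consulted.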

The fix is needed not only for py2 because added test_strings_cmp_wrt_distutils_LooseVersion
was failing on py3 as well without the fix.

/reported-by @jerome
/reported-on nexedi/slapos!1575 (comment 206080)

09694757

golang_str: Add ustr.decode for symmetry with bstr.decode and because gpy2 breaks without it · da4b857b

Kirill Smelkov authored May 10, 2024

Without working unicode.decode gpython/py2 with unicode replaced by ustr
fails when running ERP5 as follows:

    $ /srv/slapgrid/slappart49/t/ekg/i/5/bin/runTestSuite --help
    No handlers could be found for logger "SecurityInfo"
    Traceback (most recent call last):
      File "/srv/slapgrid/slappart49/t/ekg/soft/b5048b47894a7612651c7fe81c2c8636/bin/.runTestSuite.pyexe", line 296, in <module>
        main()
      File "/srv/slapgrid/slappart49/t/ekg/soft/b5048b47894a7612651c7fe81c2c8636/parts/pygolang/gpython/__init__.py", line 484, in main
        pymain(argv, init)
      File "/srv/slapgrid/slappart49/t/ekg/soft/b5048b47894a7612651c7fe81c2c8636/parts/pygolang/gpython/__init__.py", line 292, in pymain
        run(mmain)
      File "/srv/slapgrid/slappart49/t/ekg/soft/b5048b47894a7612651c7fe81c2c8636/parts/pygolang/gpython/__init__.py", line 192, in run
        _execfile(filepath, mmain.__dict__)
      File "/srv/slapgrid/slappart49/t/ekg/soft/b5048b47894a7612651c7fe81c2c8636/parts/pygolang/gpython/__init__.py", line 339, in _execfile
        six.exec_(code, globals, locals)
      File "/srv/slapgrid/slappart49/t/ekg/soft/b5048b47894a7612651c7fe81c2c8636/eggs/six-1.16.0-py2.7.egg/six.py", line 735, in exec_
        exec("""exec _code_ in _globs_, _locs_""")
      File "<string>", line 1, in <module>
      File "/srv/slapgrid/slappart49/t/ekg/soft/b5048b47894a7612651c7fe81c2c8636/bin/runTestSuite", line 10, in <module>
        from Products.ERP5Type.tests.runTestSuite import main; sys.exit(main())
      File "/srv/slapgrid/slappart49/t/ekg/soft/b5048b47894a7612651c7fe81c2c8636/parts/erp5/product/ERP5Type/__init__.py", line 96, in <module>
        from . import ZopePatch
      File "/srv/slapgrid/slappart49/t/ekg/soft/b5048b47894a7612651c7fe81c2c8636/parts/erp5/product/ERP5Type/ZopePatch.py", line 75, in <module>
        from Products.ERP5Type.patches import ZopePageTemplateUtils
      File "/srv/slapgrid/slappart49/t/ekg/soft/b5048b47894a7612651c7fe81c2c8636/parts/erp5/product/ERP5Type/patches/ZopePageTemplateUtils.py", line 58, in <module>
        convertToUnicode(u'', 'text/xml', ())
      File "/srv/slapgrid/slappart49/t/ekg/soft/b5048b47894a7612651c7fe81c2c8636/eggs/Zope-4.8.9+slapospatched002-py2.7.egg/Products/PageTemplates/utils.py", line 73, in convertToUnicode
        return source.decode(encoding), encoding
    AttributeError: unreadable attribute

and in general, if we treat bstr and ustr as two different
representations of the same entity, then since we have bstr.decode, having
ustr.decode is also needed for symmetry, with both operations converting
the bytes representation of the string into unicode.

Now there is full symmetry in between bstr/ustr and encode/decode. Quoting updated encode/decode text:

    Encode encodes unicode representation of the string into bytes, leaving string domain.
    Decode decodes bytes representation of the string into ustr, staying inside string domain.

    Both bstr and ustr are accepted by encode and decode treating them as two
    different representations of the same entity.

    On encoding, for bstr, the string representation is first converted to
    unicode and encoded to bytes from there. For ustr unicode representation
    of the string is directly encoded.

    On decoding, for ustr, the string representation is first converted to
    bytes and decoded to unicode from there. For bstr bytes representation of
    the string is directly decoded.

da4b857b

golang_str: Adjust bstr/ustr .encode() and .__bytes__ to leave string domain into bytes · 6f26b32c

Kirill Smelkov authored May 07, 2024

Initially in 023907ee (golang_str: bstr/ustr encode/decode) I
implemented things in such a way that (b|u)str.__bytes__ were giving
bstr and ustr.encode() was giving bstr as well. My logic here was that
bstr is based on bytes and it is ok to give that.

However this logic did not pass backward compatibility test: for example
when LXML is imported it does

    cdef bytes _FILENAME_ENCODING = (sys.getfilesystemencoding() or sys.getdefaultencoding() or 'ascii').encode("UTF-8")

and under gpython/py3 with unicode patched to be ustr it breaks with

      File "/srv/slapgrid/slappart47/srv/runner/software/7f1663e8148f227ce3c6a38fc52796e2/bin/runwsgi", line 4, in <module>
        from Products.ERP5.bin.zopewsgi import runwsgi; sys.exit(runwsgi())
      File "/srv/slapgrid/slappart47/srv/runner/software/7f1663e8148f227ce3c6a38fc52796e2/parts/erp5/product/ERP5/__init__.py", line 36, in <module>
        from Products.ERP5Type.Utils import initializeProduct, updateGlobals
      File "/srv/slapgrid/slappart47/srv/runner/software/7f1663e8148f227ce3c6a38fc52796e2/parts/erp5/product/ERP5Type/__init__.py", line 42, in <module>
        from .patches import pylint
      File "/srv/slapgrid/slappart47/srv/runner/software/7f1663e8148f227ce3c6a38fc52796e2/parts/erp5/product/ERP5Type/patches/pylint.py", line 524, in <module>
        __import__(module_name, fromlist=[module_name], level=0))
      File "src/lxml/sax.py", line 18, in init lxml.sax
      File "src/lxml/etree.pyx", line 154, in init lxml.etree
    TypeError: Expected bytes, got golang.bstr

The breakage highlights a thinko in my previous reasoning: yes bstr is based on
bytes, but bstr has different semantics compared to bytes: even though e.g.
__getitem__ works the same way for bytes on py2, it works differently compared
to py3. This way if on py3 a program is doing bytes(x) or x.encode() it then
expects the result to have bytes semantics of current python which is not the
case if the result is bstr.

-> Fix that by adjusting .encode() and .__bytes__() to produce bytes type of
   current python and leave string domain.
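
To make the "different semantics" point concrete, here is a small py3 sketch. `StrLikeBytes` is a hypothetical bytes subclass with string-like indexing; it is not pygolang's bstr, and `checksum` is an invented example of code that relies on native bytes semantics.

```python
class StrLikeBytes(bytes):
    """Hypothetical bytes subclass whose items are 1-byte strings, not ints."""

    def __getitem__(self, idx):
        item = bytes.__getitem__(self, idx)
        if isinstance(item, int):           # py3: single index yields an int
            item = bytes([item])
        return StrLikeBytes(item)

def checksum(buf):
    """Typical py3 code relying on bytes semantics: buf[i] is an int."""
    return sum(buf[i] for i in range(len(buf))) % 256
```

`checksum(b"abc")` works on native bytes, while `checksum(StrLikeBytes(b"abc"))` raises TypeError because the items are no longer integers. This is the kind of breakage that returning native bytes from `.encode()` and `.__bytes__()` avoids.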

For some time I contemplated introducing a third type, e.g. bvec, also
based on bytes but with bytes semantic, such that bvec.decode would
return back to pygolang strings domain. But due to the fact that bytes semantic
is different in between py2 and py3, it would mean that bvec provided by
pygolang would need to have different behaviours dependent on current python
version which is undesirable.

In the end with leaving into native bytes the "bytes inconsistency" problem is
left to remain under std python with pygolang targeting only to fix strings
inconsistency in between py2 and py3 and providing the same semantic for
bstr and ustr on all python versions.

It also does not harm that bytes.decode() returns std unicode instead of ustr:
for programs that run under unpatched python we have u() to convert the result
to ustr, while under gpython std unicode is actually ustr which makes
bytes.decode() behaviour still quite ok.

P.S. we enable bstr.encode for consistency and because under py2, if not
enabled, it will break when running pytest under gpython in

          File ".../_pytest/assertion/rewrite.py", line 352, in <module>
            RN = "\r\n".encode("utf-8")
        AttributeError: unreadable attribute

6f26b32c

golang_str: Fix iter(bstr) to yield byte instead of unicode character · 8d76276c

Kirill Smelkov authored May 07, 2024

In a72c1c1a (golang_str: bstr/ustr iteration) things were initially
implemented to follow Go semantic exactly with bytestring iteration
yielding unicode characters as explained in
https://blog.golang.org/strings. However this makes bstr not a 100%
drop-in compatible replacement for std str under py2, and even though my
initial testing was saying this change does not affect programs in
practice it turned out to be not the case.

For example, with bstr.__iter__ yielding unicode characters, running
gpython on py2 with builtin str patched to be bstr will sometimes break
when importing uuid.

There uuid reads 16 bytes from /dev/random, then wants to iterate
those 16 bytes as single bytes, and expects the length
of the resulting sequence to be exactly 16:

     int = long(('%02x'*16) % tuple(map(ord, bytes)), 16)

     ( https://github.com/python/cpython/blob/2.7-0-g8d21aa21f2c/Lib/uuid.py#L147 )

which breaks if some of the read bytes are higher than 0x7f.

Even though this particular problem could be worked-around with
patching uuid, there is no evidence that there will be no similar
problems later, which could be many.

-> So adjust bstr semantic instead to follow semantic of str under py2
   and introduce uiter() primitive to still be able to iterate
   bytestrings as unicode characters.

This hopefully makes bstr fully compatible with str on py2 while
still providing a reasonably good approach for processing strings the
Go-way when needed.

Add biter as well for symmetry.
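
The two iteration modes can be sketched in plain py3 as follows. `byte_iter` and `char_iter` are hypothetical helpers in the spirit of biter and uiter; they are not pygolang's implementations, and the use of `errors="replace"` for invalid UTF-8 is an assumption made only for this sketch.

```python
def byte_iter(data):
    """Yield the input one byte at a time, as length-1 bytes objects."""
    for i in range(len(data)):
        yield data[i:i + 1]

def char_iter(data):
    """Yield unicode characters obtained by decoding the input as UTF-8."""
    for ch in data.decode("utf-8", errors="replace"):
        yield ch

# 16 bytes that all have the high bit set: 8 two-byte UTF-8 characters.
data = u"\u03b1\u03b2".encode("utf-8") * 4

def uuid_style_hex(items):
    """Format 16 items the way the uuid snippet above does."""
    return ("%02x" * 16) % tuple(map(ord, items))
```

Byte iteration yields 16 items and the uuid-style formatting succeeds. Character iteration yields only 8 items for the same data, so the same formatting fails with TypeError, which is the breakage described above.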

See

    nexedi/pygolang!21 (comment 170754)
    nexedi/pygolang!21 (comment 170782)
    ...

and

    nexedi/pygolang!21 (comment 206044)

for discussion on iter(bstr) topic.

8d76276c

strconv: Optimize quoting lightly · a11cb5dc

Kirill Smelkov authored Jun 26, 2023

Add type annotations and use C-level objects instead of py-ones where it
is easy to do. We are not all-good yet, but this already brings some noticeable speedup:

    name                 old time/op  new time/op  delta
    quote[a]              786µs ± 1%    10µs ± 0%   -98.76%  (p=0.016 n=4+5)
    quote[\u03b1]        1.12ms ± 0%  0.41ms ± 0%   -63.37%  (p=0.008 n=5+5)
    quote[\u65e5]         738µs ± 2%   258µs ± 0%   -65.07%  (p=0.016 n=4+5)
    quote[\U0001f64f]     920µs ± 1%    78µs ± 0%   -91.46%  (p=0.016 n=5+4)
    stdquote             1.19µs ± 0%  1.19µs ± 0%     ~      (p=0.794 n=5+5)
    unquote[a]           1.08ms ± 0%  1.08ms ± 1%     ~      (p=0.548 n=5+5)
    unquote[\u03b1]       797µs ± 0%   807µs ± 1%    +1.23%  (p=0.008 n=5+5)
    unquote[\u65e5]       522µs ± 0%   520µs ± 1%     ~      (p=0.056 n=5+5)
    unquote[\U0001f64f]  3.21ms ± 0%  3.14ms ± 0%    -2.13%  (p=0.008 n=5+5)
    stdunquote            815ns ± 0%   836ns ± 0%    +2.63%  (p=0.008 n=5+5)

a11cb5dc

16 Dec, 2024 14 commits

golang, strconv: Switch them to cimport each other at pyx level · e5c513bf

Kirill Smelkov authored Jun 26, 2023

Since 50b8cb7e (strconv: Move functionality related to UTF8
encode/decode into _golang_str) both golang_str and strconv import each
other.

Before this patch that import was done at py level at runtime from
outside to workaround the import cycle. As a result strconv
functionality is not available while golang is only being imported.
So far it was not a problem, but when builtin string types become
patched with bstr and ustr, it will become a problem because string
repr starts to be used at import time, which for pybstr is implemented
via strconv.quote .

-> Fix this by switching golang and strconv to cimport each other at pyx
level. There, similarly to C, the cycle works just ok out of the box.

This also automatically helps performance a bit:

    name                 old time/op  new time/op  delta
    quote[a]              805µs ± 0%   786µs ± 1%    -2.40%  (p=0.016 n=5+4)
    quote[\u03b1]        1.21ms ± 0%  1.12ms ± 0%    -7.47%  (p=0.008 n=5+5)
    quote[\u65e5]         785µs ± 0%   738µs ± 2%    -5.97%  (p=0.016 n=5+4)
    quote[\U0001f64f]    1.04ms ± 0%  0.92ms ± 1%   -11.73%  (p=0.008 n=5+5)
    stdquote             1.18µs ± 0%  1.19µs ± 0%    +0.54%  (p=0.008 n=5+5)
    unquote[a]           1.26ms ± 0%  1.08ms ± 0%   -14.66%  (p=0.008 n=5+5)
    unquote[\u03b1]       911µs ± 1%   797µs ± 0%   -12.55%  (p=0.008 n=5+5)
    unquote[\u65e5]       592µs ± 0%   522µs ± 0%   -11.81%  (p=0.008 n=5+5)
    unquote[\U0001f64f]  3.46ms ± 0%  3.21ms ± 0%    -7.34%  (p=0.008 n=5+5)
    stdunquote            812ns ± 1%   815ns ± 0%     ~      (p=0.183 n=5+5)

e5c513bf

strconv: Move it to pyx · 2684dc94

Kirill Smelkov authored Jun 26, 2023

So far this is plain code movement with no type annotations added and
internal from-strconv imports still being done via py level.

As expected this does not help practically for performance yet:

    name                 old time/op  new time/op  delta
    quote[a]              910µs ± 0%   805µs ± 0%   -11.54%  (p=0.008 n=5+5)
    quote[\u03b1]        1.23ms ± 0%  1.21ms ± 0%    -1.24%  (p=0.008 n=5+5)
    quote[\u65e5]         800µs ± 0%   785µs ± 0%    -1.86%  (p=0.016 n=4+5)
    quote[\U0001f64f]    1.06ms ± 1%  1.04ms ± 0%    -1.92%  (p=0.008 n=5+5)
    stdquote             1.17µs ± 0%  1.18µs ± 0%    +0.80%  (p=0.008 n=5+5)
    unquote[a]           1.33ms ± 1%  1.26ms ± 0%    -5.13%  (p=0.008 n=5+5)
    unquote[\u03b1]       952µs ± 2%   911µs ± 1%    -4.25%  (p=0.008 n=5+5)
    unquote[\u65e5]       613µs ± 2%   592µs ± 0%    -3.48%  (p=0.008 n=5+5)
    unquote[\U0001f64f]  3.62ms ± 1%  3.46ms ± 0%    -4.32%  (p=0.008 n=5+5)
    stdunquote            788ns ± 0%   812ns ± 1%    +3.07%  (p=0.016 n=4+5)

2684dc94

unicode/utf8: Start of the package (stub) · cd69a8ad

Kirill Smelkov authored Jun 26, 2023

We will soon need to use error rune codepoint from both golang_str.pyx
and strconv.pyx - so we need to move that definition into shared place.
What fits best is unicode/utf8, so start that package and move the
constant there.

cd69a8ad

*: uint8_t -> byte, unicode-codepoint -> rune · bd662e01

Kirill Smelkov authored Jun 26, 2023

We added byte and rune types in the previous patch. Let's use them now
throughout whole codebase where appropriate.

Currently the only place where unicode-codepoint is used is
_utf8_decode_rune. uint8_t was used in many places.

bd662e01

golang, libgolang: Add byte / rune types · 7505febc

Kirill Smelkov authored Jun 26, 2023

Those types are the base when working with byte- and unicode strings.
It will be clearer to use them explicitly instead of uint8_t and int32_t
when processing string.

7505febc

strconv: Add benchmarks for quote and unquote · 23f0a47c

Kirill Smelkov authored Jun 23, 2023

These functions are currently relatively slow. They were initially used
in zodbdump and zodbrestore, where their speed did not matter much, but
with bstr and ustr, since e.g. quote is used in repr, their not
performing at a speed similar to builtin string escaping starts to be an
issue. Tatuya Kamada reports at nexedi/pygolang!21 (comment 170833) :

    ### 3. `u` seems slow with large arrays especially when `repr` it

    I have faced a slowness while testing `u`, `b` with python 2.7, especially with `repr`.

    ```python
    >>> timeit.timeit("from golang import b,u; u('あ'*199998)", number=10)
    2.02020001411438
    >>> timeit.timeit("from golang import b,u; repr(u('あ'*199998))", number=10)
    54.60263395309448
    ```

    `bytes`(str) is very fast.

    ```python
    >>> timeit.timeit("from golang import b,u; bytes('あ'*199998)", number=10)
    0.000392913818359375
    >>> timeit.timeit("from golang import b,u; repr(bytes('あ'*199998))", number=10)
    0.4604980945587158
    ```

    `b` is much faster than `u`, but still the repr seems slow.

    ```
    >>> timeit.timeit("from golang import b,u; b('あ'*199998)", number=10)
    0.0009968280792236328
    >>> timeit.timeit("from golang import b,u; repr(b('あ'*199998))", number=10)
    25.498882055282593
    ```

The "repr" part of this problem is due to the fact that both bstr.__repr__ and
ustr.__repr__ use custom quoting routines which currently are implemented in
pure python in strconv module:

https://lab.nexedi.com/kirr/pygolang/blob/300d7dfa/golang/_golang_str.pyx#L282-291
https://lab.nexedi.com/kirr/pygolang/blob/300d7dfa/golang/_golang_str.pyx#L582-591
https://lab.nexedi.com/kirr/pygolang/blob/300d7dfa/golang/_golang_str.pyx#L941-970
https://lab.nexedi.com/kirr/pygolang/blob/300d7dfa/golang/strconv.py#L31-92

The fix would be to move strconv.py to Cython and to correspondingly rework it
to avoid using python-level constructs during quoting internally.

Working on that was not a priority, but soon I will need to move strconv to
Cython for another reason: to be able to break import cycle in between _golang
and strconv.

So it makes sense to add strconv benchmark first - since we'll start moving it
to Cython anyway - to see where we are and how further changes will help
performance-wise.

Currently we are at

    name                 time/op
    quote[a]              910µs ± 0%
    quote[\u03b1]        1.23ms ± 0%
    quote[\u65e5]         800µs ± 0%
    quote[\U0001f64f]    1.06ms ± 1%
    stdquote             1.17µs ± 0%
    unquote[a]           1.33ms ± 1%
    unquote[\u03b1]       952µs ± 2%
    unquote[\u65e5]       613µs ± 2%
    unquote[\U0001f64f]  3.62ms ± 1%
    stdunquote            788ns ± 0%

i.e. on py2 quoting is ~ 1000x slower than builtin string escaping, and unquoting is
even slower.

On py3 the situation is better, but still not good:

    name                 time/op
    quote[a]              579µs ± 1%
    quote[\u03b1]         942µs ± 1%
    quote[\u65e5]         595µs ± 0%
    quote[\U0001f64f]     274µs ± 1%
    stdquote             2.70µs ± 0%
    unquote[a]            696µs ± 1%
    unquote[\u03b1]       763µs ± 0%
    unquote[\u65e5]       474µs ± 1%
    unquote[\U0001f64f]   187µs ± 0%
    stdunquote            808ns ± 0%

δ(py2, py3) for the reference:

    name                 py2 time/op  py3 time/op  delta
    quote[a]              910µs ± 0%   579µs ± 1%   -36.42%  (p=0.008 n=5+5)
    quote[\u03b1]        1.23ms ± 0%  0.94ms ± 1%   -23.17%  (p=0.008 n=5+5)
    quote[\u65e5]         800µs ± 0%   595µs ± 0%   -25.63%  (p=0.016 n=4+5)
    quote[\U0001f64f]    1.06ms ± 1%  0.27ms ± 1%   -74.23%  (p=0.008 n=5+5)
    stdquote             1.17µs ± 0%  2.70µs ± 0%  +129.71%  (p=0.008 n=5+5)
    unquote[a]           1.33ms ± 1%  0.70ms ± 1%   -47.71%  (p=0.008 n=5+5)
    unquote[\u03b1]       952µs ± 2%   763µs ± 0%   -19.82%  (p=0.008 n=5+5)
    unquote[\u65e5]       613µs ± 2%   474µs ± 1%   -22.76%  (p=0.008 n=5+5)
    unquote[\U0001f64f]  3.62ms ± 1%  0.19ms ± 0%   -94.84%  (p=0.016 n=5+4)
    stdunquote            788ns ± 0%   808ns ± 0%    +2.59%  (p=0.016 n=4+5)
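
For readers who want to reproduce the general shape of such a comparison without pygolang, here is a rough sketch using only the standard library. `naive_quote` is a hypothetical pure-Python quoting loop and is not strconv.quote; `bench` is likewise an invented helper. Absolute numbers will differ from the tables above.

```python
import timeit

def naive_quote(s):
    """Hypothetical pure-Python quoting loop (NOT pygolang's strconv.quote)."""
    out = ['"']
    for ch in s:
        if ch in ('"', '\\'):
            out.append('\\' + ch)
        elif ch.isprintable():
            out.append(ch)
        elif ord(ch) <= 0xffff:
            out.append('\\u%04x' % ord(ch))
        else:
            out.append('\\U%08x' % ord(ch))
    out.append('"')
    return ''.join(out)

def bench(fn, arg, number=200):
    """Return the average time of fn(arg) in seconds over `number` calls."""
    return timeit.timeit(lambda: fn(arg), number=number) / number

sample = u"\u03b1" * 1000
t_naive = bench(naive_quote, sample)    # per-character Python-level loop
t_std = bench(repr, sample)             # builtin escaping implemented in C
```

Comparing `t_naive` with `t_std` gives a feel for why a per-character Python loop is costly when it sits behind `repr`.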

23f0a47c

golang_str: pybstr -> _pybstr ; pyustr -> _pyustr · e27197ce

Kirill Smelkov authored May 01, 2023

And let pybstr/pyustr point to the version of bstr/ustr types that is actually in use:

- when bytes/unicode are not patched -> to _pybstr/_pyustr
- when bytes/unicode will be patched -> to bytes/unicode to where original
  _pybstr/_pyustr were copied during bytes/unicode patching.

At runtime the code uses pybstr/pyustr instead of _pybstr/_pyustr.

e27197ce

golang_str: Invoke bytes/unicode methods via zbytes/zunicode · d02a0f21

Kirill Smelkov authored Mar 26, 2023

GPython will patch builtin bytes and unicode types.
zbytes and zunicode will refer to original unpatched types.
We will use them to invoke original bytes/unicode methods.

NOTE we will test against bytes/unicode - not zbytes/zunicode - when
inspecting type of objects. In other words we will use original
bytes/unicode types only to refer to their original methods and code.

d02a0f21

golang_str: Switch bstr/ustr to cdef classes · 758727a4

Kirill Smelkov authored Mar 26, 2023

For gpython to switch builtin str/unicode to bstr/ustr we will need
bstr/ustr to have exactly the same C layout as builtin string types.
This is possible to achieve only via `cdef class`. It is also good to
switch to `cdef class` for RAM savings - from https://github.com/cython/cython/pull/5212#issuecomment-1387659026 :

    # what Cython does at runtime for `class MyBytes(bytes)`
    In [3]: MyBytes = type('MyBytes', (bytes,), {'__slots__': ()})

    In [4]: MyBytes
    Out[4]: __main__.MyBytes

    In [5]: a = bytes(b'123')

    In [6]: b = MyBytes(b'123')

    In [7]: a
    Out[7]: b'123'

    In [8]: b
    Out[8]: b'123'

    In [9]: a == b
    Out[9]: True

    In [10]: import sys

    In [11]: sys.getsizeof(a)
    Out[11]: 36

    In [12]: sys.getsizeof(b)
    Out[12]: 52

So with `cdef class` we gain more control and optimize memory usage.

This was not done before because cython forbids to `cdef class X(bytes)` due to
https://github.com/cython/cython/issues/711. We work it around in setup.py with
draft for proper patch pre-posted to upstream in https://github.com/cython/cython/pull/5212 .

758727a4

golang_str: tests: Make test_strings_methods more robust with upcoming unicode=ustr · 9a075b17

Kirill Smelkov authored May 01, 2023

Previously test_strings_methods was testing a method via comparing bstr
and ustr results of .method() with similar result of unicode.method().
This works reasonably ok. However under gpython, when unicode will be
replaced with ustr, it will no longer compare results of bstr/ustr
methods with something good and external - indeed in that case bstr/ustr
.method() will be compared to result of ustr.method() which opens the
door for bugs to stay unnoticed.

-> Adjust the test to explicitly provide expected result for all entries
in the test vector. We make sure those results are good and match std
python because we also assert that unicode.method() matches it.

9a075b17

golang_str: Fix bstr.decode to handle 'string-escape' codec properly · cd632a66

Kirill Smelkov authored May 01, 2023

On py2 str.decode('string-escape') returns str, not unicode and this
property is actually being used and relied upon by Lib/pickle.py:

https://github.com/python/cpython/blob/v2.7.18-0-g8d21aa21f2c/Lib/pickle.py#L967-L977

We promised bstr to be drop-in replacement for str on py2, so let's
adjust its behaviour to match the original because if we do not,
unpickling strings will break when str is replaced by bstr under
gpython.

Do not add bstr.encode yet until we hit a real case where it is actually used.

cd632a66

golang_str: tests: Adjust test_strings_index2 not to depend on repr(ustr|bstr) · e4cbdfae

Kirill Smelkov authored Apr 30, 2023

repr(ustr|bstr) will change behaviour depending on whether we are
running under regular python, or gpython with string types replaced by
bstr/ustr. But this test is completely orthogonal to that. -> Let's
untie it from particular repr behaviour by emitting verified items in
quoted form + asserting their types in the code.

e4cbdfae

fixup! golang_str: bstr/ustr pickle support · aa20637f

Kirill Smelkov authored Apr 30, 2023

In ebd18f3f the code was ok but there is a thinko in test: it needs to
test all pickle protocols from 0 to _including_ HIGHEST_PROTOCOL.
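
A minimal sketch of the inclusive protocol loop, using only the standard
pickle module. The helper name `roundtrip_all_protocols` is made up for
illustration and is not the name used in the pygolang tests:

```python
import pickle

def roundtrip_all_protocols(obj):
    """Pickle and unpickle obj with every protocol, HIGHEST_PROTOCOL included."""
    # range() excludes its end, hence the +1 to cover HIGHEST_PROTOCOL itself
    protocols = list(range(0, pickle.HIGHEST_PROTOCOL + 1))
    for proto in protocols:
        data = pickle.dumps(obj, proto)
        assert pickle.loads(data) == obj, proto
    return protocols
```

Writing `range(0, pickle.HIGHEST_PROTOCOL)` instead would silently skip the
newest protocol, which is the kind of thinko the fixup addresses.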

aa20637f

Sync with master · fdd73156

Kirill Smelkov authored Dec 16, 2024

fdd73156

04 Dec, 2024 3 commits

golang: Add support for @func(Class) and @func to be used over @property · 91a434d5

Kirill Smelkov authored Nov 29, 2024

Since the beginning of pygolang it has been possible to define methods
separately from their class. For example

    @func(MyClass)
    def my_method(self, ...):
        ...

will define MyClass.my_method(*). This works for regular functions and
staticmethod/classmethod as well. But support for properties was missing
because there was no use case so far.

-> Add support for properties as well as I hit the need for it during my
work on wendelin.core monitoring.

Test class changed to inherit from object since on py2 properties work
only for new-style classes.

(*) see afa46cf5 (Turn pygopath into full pygolang) and 942ee900
    (golang: Deprecate @method(cls) in favour of @func(cls)) for details.
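
To illustrate the idea of attaching a property to a class from outside,
here is a self-contained sketch. The `attach` decorator is hypothetical and
much simpler than pygolang's `@func(cls)`: it only handles plain functions
and properties and does not set up any defer frame:

```python
def attach(cls):
    """Hypothetical decorator: install the decorated function or property on cls."""
    def deco(f):
        # A property object has no __name__ of its own; take it from the getter.
        name = f.fget.__name__ if isinstance(f, property) else f.__name__
        setattr(cls, name, f)
        return f
    return deco

class Point(object):    # new-style class, required for properties on py2
    def __init__(self, x, y):
        self.x, self.y = x, y

@attach(Point)
@property
def norm2(self):
    return self.x**2 + self.y**2
```

With this in place `Point(3, 4).norm2` is computed through the property even
though `norm2` was defined outside of the class body.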

/reviewed-by @levin.zimmermann
/reviewed-on nexedi/pygolang!31

91a434d5

golang: Make @func to be idempotent · 5302558e

Kirill Smelkov authored Nov 29, 2024

i.e. make double call of func(func(f)) to return exactly the same as func(f).

This is correct to do as the first func call already returns a wrapper
that sets up an additional frame for defer. The second func call, if doing
the same, would wrap the thing one more time and there would be two
frames for defer, but defer needs only one to work correctly.

So far we had no case when such double func calls would appear in
practice, because

    @func
    @func
    def f():
         ...

would immediately catch attention.

However in the next patch we will have this case to appear internally
when handling properties. So it is better to make sure beforehand no
waste of resources will happen.
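
A rough sketch of what idempotent wrapping can look like. This is not the
pygolang implementation: the marker attribute `_is_func_wrapper` is an
assumption made for the example, and `_goframe` here only forwards the call
instead of setting up a defer frame:

```python
import functools

def _goframe(f, *args, **kw):
    # Standalone worker with a recognizable name; a real implementation
    # would set up the frame for defer here before calling f.
    return f(*args, **kw)

def func(f):
    """Stand-in for @func: wrap f once, return existing wrappers unchanged."""
    if getattr(f, "_is_func_wrapper", False):
        return f                        # already wrapped -> func(func(f)) is func(f)

    @functools.wraps(f)
    def wrapper(*args, **kw):
        return _goframe(f, *args, **kw)

    wrapper._is_func_wrapper = True     # hypothetical marker used for detection
    return wrapper
```

With such a check the second `func` call costs one attribute lookup and adds
no extra frame.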

/reviewed-by @levin.zimmermann
/reviewed-on nexedi/pygolang!31

5302558e

golang: Adjust @func to wrap functions with standalone wrapper with recognizable name · 0c0f4b2a

Kirill Smelkov authored Nov 28, 2024

Since 5146eb0b (Add support for defer & recover) we have func, which for

    @func
    def f():
        ...

will turn f to be run with additional frame where defer can register calls.

This works ok, but so far the worker of the wrapper was defined inside
func itself - each time func was used, and also the worker had "no
speaking" name _. The latter was making tracebacks a bit harder to read.

-> Move the wrapper to be standalone function with _goframe name. This
removes a bit of import-time overhead when @func is called, and makes
tracebacks a bit more readable.

But my original motivation here is to be able to detect double
func(func(·)) calls and make it idempotent - see next patch for that.

/reviewed-by @levin.zimmermann
/reviewed-on nexedi/pygolang!31

0c0f4b2a

25 Sep, 2024 3 commits

gpython: Implement -v · 9434cf08

Kirill Smelkov authored Sep 23, 2024

Tracing import statements might be handy while debugging things
related to initialization. The implementation is a simple re-execution of
the underlying python with that same -v, like we already do for -O, -E and -X.

/reviewed-by @jerome
/reviewed-on nexedi/pygolang!30

9434cf08

gpython: Implement -X for non-gpython options · b7d0f6b2

Kirill Smelkov authored Sep 23, 2024

We already handle -X gpython.* starting from a6b993c8 (gpython: Add way
to run it with threads runtime). However any other non-gpython -X option
was leading to failure - for example:

    (z-dev) kirr@deca:~/src/tools/go/pygolang$ gpython -X faulthandler
    unknown option: '-X'

(well the error message was also not good)

However on py3 there are useful -X options that might be handy to use,
for example `-X faulthandler` and `-X importtime`.

-> Add support to pymain to handle those via reexecuting underlying
   interpreter like we already do for -O and -E.
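
The option handling described above can be sketched as follows. This is an
illustration only: `partition_options` is a made-up helper, it knows just
about -O, -E, -v and -X, and the actual re-execution step is left out:

```python
def partition_options(argv):
    """Return (reexec_opts, own_x_opts, rest) for argv without the program name."""
    reexec_opts, own_x_opts = [], []
    i = 0
    while i < len(argv):
        arg = argv[i]
        if arg in ("-O", "-E", "-v"):
            reexec_opts.append(arg)
        elif arg.startswith("-X"):
            opt = arg[2:]
            if not opt:                     # "-X opt" form: value is the next argument
                i += 1
                if i >= len(argv):
                    raise ValueError("-X requires an argument")
                opt = argv[i]
            if opt.startswith("gpython."):
                own_x_opts.append(opt)      # launcher-specific, handled in-process
            else:
                reexec_opts += ["-X", opt]  # passed to the underlying interpreter
        else:
            break                           # first non-option ends the scan
        i += 1
    return reexec_opts, own_x_opts, argv[i:]
```

For example `["-X", "faulthandler", "-v", "prog.py"]` yields
`["-X", "faulthandler", "-v"]` as the options to re-execute the interpreter
with, and `["prog.py"]` as the remaining arguments.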

/reviewed-by @jerome
/reviewed-on nexedi/pygolang!30

b7d0f6b2

gpython: Implement -E · 736143a5

Kirill Smelkov authored Sep 22, 2024

Let's teach gpython and pymain about -E (ignore $PYTHON* environment
variables) because new buildout runs python -E inside. Xavier reports:

    Since slapos was upgraded zc.buildout 3.0.1+slapos004, tests for
    slapos.rebootstrap and slapos.recipe.template fail because buildout now
    installs in develop with pip install --editable instead of python
    setup.py develop and in the process pip runs python -E, e.g.
    https://erp5js.nexedi.net/#/test_result_module/20240912-837A12F7/10

For the implementation use the same approach to reexecute underlying
interpreter with given low-level option as we already did for -O in
8564dfdd (gpython: Implement -O).

/reported-and-tested-by @xavier_thompson
/reviewed-by @jerome
/reviewed-on !30

736143a5

23 Sep, 2024 1 commit

golang: test: Fix for Pytest < 7 · 4fefae90

Kirill Smelkov authored Sep 22, 2024

In 74a9838c (golang: tests: Fix for Pytest ≥ 7.4) I fixed
test_defer_excchain_dump_pytest for Pytest ≥ 7.4 but missed that
pytest.version_tuple is not available for Pytest < 7.0(*) which started to
lead to pygolang test failures on py3 under SlapOS because there we
are still using pytest 4.6.11 :

    _______________________ test_defer_excchain_dump_pytest ________________________

        def test_defer_excchain_dump_pytest():
            # pytest 7.4 also changed traceback output format
            # similarly to ipython we do not need to test it becase we activate
            # pytest-related patch only on py2 for which latest pytest version is 4.6.11 .
            import pytest
    >       if six.PY3 and pytest.version_tuple >= (7,4):
    E       AttributeError: module 'pytest' has no attribute 'version_tuple'

https://stack.nexedi.com/test_result_module/20240920-666C5CF1/3

-> Fix that by checking pytest.version_tuple more carefully.

(*) see https://docs.pytest.org/en/stable/reference/reference.html#pytest-version-tuple
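
One way to check the version carefully is to treat a missing
`version_tuple` as "older than 7.0". The helper below is a sketch, not the
code from the patch; it takes the module as an argument so it can be shown
with stand-in objects instead of a real pytest:

```python
import types

def pytest_at_least(pytest_mod, want):
    """Return True if pytest_mod reports a version >= want.

    version_tuple exists only on pytest >= 7.0; when it is missing the
    module is treated as older than anything we ask about.
    """
    have = getattr(pytest_mod, "version_tuple", None)
    if have is None:
        return False
    # compare only the leading numeric components we were asked about
    return tuple(have[:len(want)]) >= tuple(want)

# stand-ins for an old and a new pytest module
old_pytest = types.SimpleNamespace(__version__="4.6.11")
new_pytest = types.SimpleNamespace(version_tuple=(7, 4, 2))
```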

/reviewed-by @jerome
/reviewed-on !29

4fefae90

20 Jun, 2024 3 commits

golang: Fix `@func(cls) def name` not to set `name` in calling context · 30f06b4a

Kirill Smelkov authored Jun 18, 2024

This is take 2 after 924a808c (golang: Fix `@func(cls) def name` not to
override `name` in calling context). There we fixed it not to override
name if name was already set, but for the case of unset name it was
still set. The following example was thus not working correctly as
builtin `next` was shadowed:

    class BitSync: ...

    @func(BitSync)
    def next(): ...         # this was shadowing access to builtin next

    def peek(seq):
        return next(...)    # here next was taken not from builtin, but
                            # from result of above shadowing

To solve the problem in the patch from 2019 I initially contemplated
patching bytecode because python unconditionally does STORE_NAME after a
function is defined with decorator:

    In [2]: c = """
       ...: @fff
       ...: def ccc():
       ...:     return 1
       ...: """

    In [3]: cc = compile(c, "file", "exec")

    In [4]: dis(cc)
      2           0 LOAD_NAME                0 (fff)
                  3 LOAD_CONST               0 (<code object ccc at 0x7fafe58d0130, file "file", line 2>)
                  6 MAKE_FUNCTION            0
                  9 CALL_FUNCTION            1
                 12 STORE_NAME               1 (ccc)	<-- NOTE means: ccc = what fff() call returns
                 15 LOAD_CONST               1 (None)
                 18 RETURN_VALUE

However after hitting this problem for real again and taking a fresh
look I found a way to arrange for the good end result without bytecode
magic: if name is initially unset @func can install its own custom
object, which, when overwritten by normal python codeflow of invoking
STORE_NAME after decorator, unsets the attribute.

That works quite ok and the patch with the fix is small.

/cc @jerome
/proposed-for-review-on nexedi/pygolang!28

30f06b4a

gpython: tests: Remove grepv utility · c3880bf4

Kirill Smelkov authored Jun 20, 2024

After previous patch it became unused. Should we need it again we can
revert hereby commit or write it anew.

c3880bf4

gpython: tests: Fix test of warning filters. · d10200c8

Carlos Ramos Carreño authored Jun 19, 2024

The tests for gpython's handling of warning filters assumed that
the warnings passed in the command line were located on the top
of the warning filters list.
This is not true in the presence of automatically imported modules
that set warning filters, such as `_distutils_hack`, which
[used to filter deprecation warnings from distutils](https://github.com/pypa/setuptools/commit/5d60ccefb48329b7cedfe6d78fc1cb95683104b6).

We fix it by comparing against a regex which allows extra filters
above or below the ones we set.
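
A sketch of such a tolerant check, assuming the dumped output lists one
filter per line as `- <filter>` after a `warnings.filters:` header, as in
the failure shown in this message. The helper name and the exact regex are
illustrative and may differ from what the test uses:

```python
import re

def has_filters_in_order(output, wanted):
    """True if every entry of wanted occurs in output as a whole "- <filter>"
    line, in the given order; other filter lines may surround them."""
    extra = r"(?:^- .*\n)*?"        # any number of unrelated filter lines
    pattern = extra.join(r"^- %s\n" % re.escape(w) for w in wanted)
    pattern = r"^warnings\.filters:\n" + extra + pattern
    return re.search(pattern, output, re.MULTILINE) is not None
```

Compared to `startswith`, this keeps passing when an automatically imported
module installs its own filter before the ones set on the command line.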

--------
kirr: setuptools in between v55 to v60.3.1 was installing 'ignore'
'distutils deprecated' DeprecationWarning filter referenced above.
As the result with such setuptools test_pymain was failing:

    >       assert _.startswith(
                b"sys.warnoptions: ['ignore', 'world', 'error::SyntaxWarning']\n\n" + \
                b"warnings.filters:\n" + \
                b"- error::SyntaxWarning::*\n" + \
                b"- ignore::Warning::*\n"), _
    E       AssertionError: b"sys.warnoptions: ['ignore', 'world', 'error::SyntaxWarning']
    E
    E         warnings.filters:
    E         - ignore:.+ distutils\\b.+ deprec...::PendingDeprecationWarning::*	<-- NOTE
    E         - ignore::ImportWarning::*
    E         - ignore::ResourceWarning::*
    E         - ignore::PEP440Warning::*
    E         "

    ...

Since now we only selectively check for the presence of gpython
should-be installed filters, it is also ok to remove explicit `grep -v
for ignore:sys.exc_clear:DeprecationWarning:threading` on py2.

/reviewed-by @kirr
/reviewed-on !1

d10200c8

07 Jun, 2024 1 commit

setup: Require a minimum setuptools version for Python 3. · 0d53a6d1

Carlos Ramos Carreño authored Jun 06, 2024

When the `setuptools_dso` module is used having older versions of
setuptools installed, the `wheel` module (one of its dependencies)
sets up a handler for the root logging that writes to stdout
(https://github.com/pypa/wheel/issues/622).
This breaks the expected output of the programs, and thus the
wendelin.core tests.

However, if `setuptools.logging` is available, `wheel` will use it
instead and it won't set up a handler.
Thus, I added an explicit dependency on a version of setuptools above
60.2 for Python 3, as that is the first version that provides this
module.

--------
kirr:

Move setuptools pinning close to setuptools_dso requirement to which it
relates and add corresponding comment. Setuptools pinning also fixes e.g.
the following test failure inside pygolang itself:

    golang/golang_str_test.py::test_strings_print FAILED

    ============================== FAILURES ==============================
    _________________________ test_strings_print _________________________

        def test_strings_print():
            outok = readfile(dir_testprog + "/golang_test_str.txt")
            retcode, stdout, stderr = _pyrun(["golang_test_str.py"],
                                        cwd=dir_testprog, stdout=PIPE, stderr=PIPE)
            assert retcode == 0, (stdout, stderr)
            assert stderr == b""
    >       assertDoc(outok, stdout)

    golang/golang_str_test.py:121:
    _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

    want = 'print(qq(b)): "привет αβγ b"\nprint(qq(u)): "привет αβγ u"\n'
    got = 'Extend DSO search path to \'PYGOLANG/golang/runtime\'\nprint(qq(b)): "привет αβγ b"\nprint(qq(u)): "привет αβγ u"\n'

        def assertDoc(want, got):
            want = u(want)
            got  = u(got)

            # normalize got to PYGOLANG
            udir_pygolang = abbrev_home(dir_pygolang)    # /home/x/.../pygolang -> ~/.../pygolang
            got = got.replace(dir_pygolang,  "PYGOLANG") # /home/x/.../pygolang -> PYGOLANG
            got = got.replace(udir_pygolang, "PYGOLANG") # ~/.../pygolang       -> PYGOLANG

            # got: normalize PYGOLANG\a\b\c -> PYGOLANG/a/b/c
            #                a\b\c\d.py  -> a/b/c/d.py
            def _(m):
                return m.group(0).replace(os.path.sep, '/')
            got = re.sub(r"(?<=PYGOLANG)[^\s]+(?=\s)",  _, got)
            got = re.sub(r"([\w\\\.]+)(?=\.py)",        _, got)

            # want: process conditionals
            # PY39(...) -> ...   if py ≥ 3.9 else ø  (inline)
            # `... +PY39` -> ... if py ≥ 3.9 else ø  (whole line)
            # `... -PY39` -> ... if py < 3.9 else ø  (whole line)
            have = {}  # 'PYxy' -> y/n
            for minor in (9,10,11):
                have['PY3%d' % minor] = (sys.version_info >= (3, minor))
            for x, havex in have.items():
                want = re.sub(r"%s\((.*)\)" % x, r"\1" if havex else "", want)
                r = re.compile(r'^(?P<main>.*?) +(?P<y>(\+|-))%s$' % x)
                v = []
                for l in want.splitlines():
                    m = r.match(l)
                    if m is not None:
                        l = m.group('main')
                        y = {'+':True, '-':False}[m.group('y')]
                        if (y and not havex) or (havex and not y):
                            continue
                    v.append(l)
                want = '\n'.join(v)+'\n'

            # want: ^$ -> <BLANKLINE>
            while "\n\n" in want:
                want = want.replace("\n\n", "\n<BLANKLINE>\n")

            X = doctest.OutputChecker()
            if not X.check_output(want, got, doctest.ELLIPSIS):
                # output_difference wants Example object with .want attr
                class Ex: pass
                _ = Ex()
                _.want = want
    >           fail("not equal:\n" + X.output_difference(_, got,
                            doctest.ELLIPSIS | doctest.REPORT_UDIFF))
    E           Failed: not equal:
    E           Expected:
    E               print(qq(b)): "привет αβγ b"
    E               print(qq(u)): "привет αβγ u"
    E           Got:
    E               Extend DSO search path to 'PYGOLANG/golang/runtime'
    E               print(qq(b)): "привет αβγ b"
    E               print(qq(u)): "привет αβγ u"

see e.g. https://stack.nexedi.com/test_result_module/20240509-389AC427/3
for details.

/reviewed-by @kirr
/reviewed-on nexedi/pygolang!27

0d53a6d1

19 Apr, 2024 5 commits

time: Redo timers properly · 044deb35

Kirill Smelkov authored Apr 15, 2024

Background: in 2019 in 9c260fde (time: New package that mirrors Go's
time) and b073f6df (time: Move/Port timers to C++/Pyx nogil) I've added
basic timers - with proper API but with very dumb implementation that
was spawning one thread per each timer. There were just a few timers in
the users and this was working, surprisingly, relatively ok...

... until 2023 where I was working on XLTE that needs to organize 100Hz
polling of Amarisoft eNodeB service to retrieve information about flows
on Data Radio Bearers:

    xlte@2a016d48
    https://lab.nexedi.com/kirr/xlte/-/blob/8e606c64/amari/drb.py

There each request comes with its own deadline - to catch "no reply",
and the deadlines are implemented via timers. So there are 100 threads
created every second which adds visible overhead, consumes a lot of
virtual address space and RSS for thread stacks, all of which should be unnecessary.

We were tolerating even that for some time, but recently Joanne approached me
with reports that xamari program, that does the polling, is leaking memory.

With that, and because it was hard to find what is actually leaking,
I've started to remove uncertainties, and there is a lot of uncertainty
in what is going on when lots of threads are being created over and over.

In the end the leak turned out to be likely a different thing (see
nexedi/pygolang!24, still
discovered while working on hereby patch), but all of the above was
enough motivation to finally start redoing the timers properly.

--------

When timers are done more or less properly, there is usually a queue
of armed timers and a loop that picks entries from that queue to fire
them. I was initially trying to do the simple thing and
use std::priority_queue for that, because priority_queue is internally
heap, and heaps can provide O(log(n)) insertion and removal of arbitrary
element, plus O(1) "pick top element to process". Exactly what would
suit. However I quickly found that even in 2024, std::priority_queue
does not provide removal operation at all, and there is no such thing as
e.g. std::sift_heap, that would help to implement that manually. Which
is surprising, because e.g. libevent implements all that just ok via
sifting up/down upon removal in logarithmic complexity:

https://github.com/libevent/libevent/blob/80e25c02/minheap-internal.h#L96-L115

The lack of an efficient removal operation turned out to be a blocker to
use std::priority_queue because most of the timers that are armed for
timeouts never expire: upon successful completion of the covered
operation the timer is stopped. In other words the timer is removed
from the timer queue, and removal is one of the most frequent
operations.
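
For reference, removal of an arbitrary element from a binary heap in
O(log(n)) can be sketched as below. This is an illustration in Python, not
code from libevent or pygolang; the class name and the position map are
choices made for the example:

```python
class TimerHeap:
    """Min-heap of (deadline, timer_id) with O(log n) removal of any entry."""

    def __init__(self):
        self._heap = []     # list of (deadline, timer_id)
        self._pos = {}      # timer_id -> index in self._heap

    def __len__(self):
        return len(self._heap)

    def push(self, timer_id, deadline):
        self._heap.append((deadline, timer_id))
        self._pos[timer_id] = len(self._heap) - 1
        self._sift_up(len(self._heap) - 1)

    def peek(self):
        return self._heap[0]            # O(1): entry with the earliest deadline

    def remove(self, timer_id):
        i = self._pos.pop(timer_id)
        last = self._heap.pop()
        if i < len(self._heap):         # removed entry was not the last one
            self._heap[i] = last        # fill the hole with the last entry
            self._pos[last[1]] = i
            self._sift_up(i)            # at most one of the two sifts moves it
            self._sift_down(self._pos[last[1]])

    def _swap(self, i, j):
        h = self._heap
        h[i], h[j] = h[j], h[i]
        self._pos[h[i][1]] = i
        self._pos[h[j][1]] = j

    def _sift_up(self, i):
        while i > 0:
            parent = (i - 1) // 2
            if self._heap[i] >= self._heap[parent]:
                break
            self._swap(i, parent)
            i = parent

    def _sift_down(self, i):
        n = len(self._heap)
        while True:
            small = i
            for child in (2 * i + 1, 2 * i + 2):
                if child < n and self._heap[child] < self._heap[small]:
                    small = child
            if small == i:
                break
            self._swap(i, small)
            i = small
```

The position map is what makes removal cheap: without it, finding the entry
to remove would already cost O(n).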

So, if std::priority_queue cannot work, we would need to either bring in
another implementation of a heap, or, if we are to bring something,
bring and use something else that is more suitable for implementing
timers.

That reminded me that in 2005 for my Navy project, I already implemented
custom timer wheel to handle timeouts after reading https://lwn.net/Articles/152436/ .
Contrary to heaps, such timer wheels provide O(1) insertion and removal
of timers and work generally faster. But this time I did not want to
delve into implementing all that myself again and tried to look around
of what is available out there.
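
To show where the O(1) comes from, here is a deliberately simplified
single-level wheel. It is a sketch only: real implementations such as the
kernel one or Ratas are hierarchical, and the class below limits delays to
one wheel revolution to stay short:

```python
class TimerWheel:
    """Single-level timer wheel: nslots buckets, one bucket per tick."""

    def __init__(self, nslots=8):
        self.nslots = nslots
        self.now = 0                                    # current tick
        self._slots = [dict() for _ in range(nslots)]   # slot -> {timer_id: callback}
        self._slot_of = {}                              # timer_id -> slot index

    def schedule(self, timer_id, delay, callback):
        if not 1 <= delay <= self.nslots:
            raise ValueError("delay must be within one wheel revolution")
        self.cancel(timer_id)                           # re-arming replaces the old entry
        slot = (self.now + delay) % self.nslots
        self._slots[slot][timer_id] = callback          # O(1) insertion
        self._slot_of[timer_id] = slot

    def cancel(self, timer_id):
        slot = self._slot_of.pop(timer_id, None)
        if slot is None:
            return False                                # unknown or already fired
        del self._slots[slot][timer_id]                 # O(1) removal
        return True

    def tick(self):
        """Advance time by one tick and fire everything in the current slot."""
        self.now += 1
        slot = self.now % self.nslots
        due, self._slots[slot] = self._slots[slot], {}
        for timer_id in due:
            del self._slot_of[timer_id]
        for callback in due.values():
            callback()
        return len(due)
```

Both arming and cancelling touch a single dictionary entry regardless of how
many timers are pending, which matches the usage pattern where most timers
are stopped before they expire.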

There was an update to kernel timer-wheel implementation described at
https://lwn.net/Articles/646950/ and from that a project called
Timeout.c was also found that provides implementation for such a wheel
for user space: https://25thandclement.com/~william/projects/timeout.c.html .

However when we are to pick third-party code, we should be ready to
understand it and fix bugs there on our own. And the audit of timeout.c
did not go very smoothly - there are many platform-dependent places,
and the issue tracker shows signs that sometimes not everything is ok
with the implementation. With that I've looked around a bit more and
found the more compact and more portable Ratas library, with good structure
and description, whose audit went better:

    https://www.snellman.net/blog/archive/2016-07-27-ratas-hierarchical-timer-wheel
    https://github.com/jsnell/ratas

Here, after going through the code, I feel capable of understanding
issues and fixing bugs myself should that become necessary.

And the benchmark comparison of Timeout.c and Ratas shows that they
should be of the same order regarding performance:

https://lab.nexedi.com/kirr/misc/-/blob/4f51fd6/bench/time-wheel/ratas-vs-timeout.pdf
ratas@382321d2
timeout@d6f15744

which makes Ratas the winner for me.

Having timer-wheel implementation, the rest is just technique to glue it
all together. One implementation aspect deserves to be mentioned though:

The timer loop uses Semaphore.acquire, recently modernized to also
accept a timeout, to sleep until the next timer is due while still being
able to be woken up early if a new timer is armed with an earlier
expiration time.
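
The sleep-with-wakeup pattern can be sketched in plain Python as below.
This is not the libgolang code: `threading.Semaphore` stands in for the
runtime semaphore, a heapq queue stands in for the timer wheel, and the
`TimerLoop` class is made up for the example:

```python
import heapq
import threading
import time

class TimerLoop:
    """Hypothetical timer loop built around Semaphore.acquire(timeout=...)."""

    def __init__(self):
        self._mu = threading.Lock()
        self._queue = []                    # heap of (deadline, seq, callback)
        self._seq = 0                       # tie-breaker so callbacks are never compared
        self._wakeup = threading.Semaphore(0)
        self._stopped = False
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def arm(self, delay, callback):
        deadline = time.monotonic() + delay
        with self._mu:
            self._seq += 1
            heapq.heappush(self._queue, (deadline, self._seq, callback))
            is_earliest = self._queue[0][1] == self._seq
        if is_earliest:
            self._wakeup.release()          # interrupt the sleep: next deadline moved closer

    def stop(self):
        with self._mu:
            self._stopped = True
        self._wakeup.release()
        self._thread.join()

    def _run(self):
        while True:
            with self._mu:
                if self._stopped:
                    return
                now = time.monotonic()
                due = []
                while self._queue and self._queue[0][0] <= now:
                    due.append(heapq.heappop(self._queue)[2])
                # None means "sleep until somebody wakes us up"
                timeout = self._queue[0][0] - now if self._queue else None
            if due:
                for callback in due:
                    callback()
                continue                    # callbacks took time: recompute the timeout
            # sleep until the next deadline, or until arm()/stop() releases the semaphore
            self._wakeup.acquire(timeout=timeout)
```

A single thread serves all timers this way, and arming a timer with an
earlier deadline does not have to wait for the current sleep to finish.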

Other than that the changes are mostly straightforward. Please see the
patch itself for details.

Regarding how much more efficient the new implementation is compared to
what we had before, benchmarks are added to measure arming timers that do
not fire, and, for symmetry, arming timers that do fire. We are most
interested in the first benchmark, because it shows how cheap or
expensive it is to use timers to implement timeouts, but the second one
is also useful to have to see the overhead of the whole timers machinery.

On my machine under py3.11, after this patch they show:

    name              time/op
    timer_arm_cancel   805ns ± 0%
    timer_arm_fire    9.63µs ± 0%

and before the patch the benchmarks simply do not run till the end
because they run out of memory due to huge number of threads being
created.

Still with the following test program we can measure the effect new
timers implementation has:

    ---- 8< ----
    from golang import time

    def main():
        δt_rate = 1*time.millisecond

        tprev = time.now()
        tnext = tprev + δt_rate
        while 1:
            timer = time.Timer(5*time.second)
            _ = timer.stop()
            assert _ is True

            t = time.now()
            δtsleep = tnext - t
            #print('sleep %.3f ms' % (δtsleep/time.millisecond))
            time.sleep(δtsleep)
            tprev = tnext
            tnext += δt_rate

    main()
    ---- 8< ----

This program creates/arms and cancels a timer 1000 times per second.

Before hereby patch this program consumes ~ 30% of CPU, while after
hereby patch this program consumes ~ 7-8% of CPU.

For the reference just a sleep part of that program, with all code
related to timers removed consumes ~5% of CPU, while the consumption of
plain sleep(1ms) in C and directly using system calls

    ---- 8< ----
    #include <unistd.h>

    int main() {
        while (1) {
            usleep(1000);
        }
        return 0;
    }
    ---- 8< ----

is ~ 3-4% of CPU on my machine.

/cc @jerome
/cc ORS team (@jhuge, @lu.xu, @tomo, @xavier_thompson, @Daetalus)
/proposed-for-review-on nexedi/pygolang!26

044deb35

time: Rearrange code a bit · 82c55254

Kirill Smelkov authored Apr 15, 2024

In the next patch we will add reworked implementation of timers - that
will no longer use dumb approach to work via threads - and in that
implementation it will make sense to regroup the organization of code a
bit for better clarity. Prepare for that:

- move Timer.reset to stay in between _new_timer and stop. This will be
  handy because Timer.reset will interact with both the event loop
  (coming right before new) and stop.

- move new_timer to the place where we commonly keep wrapper
  "create-timer or ticker" routines to improve signal/noise ratio for
  the place where actual interaction in between code parts happens.

For the reference my general approach to order things is to go from high
level to down and group things by interaction along the way. This way
things turn out to be the most easily readable and understandable.

/proposed-for-review-on nexedi/pygolang!26

82c55254

libgolang: Adjust and require runtimes to provide semaphores with timeout · ae9b6f7d

Kirill Smelkov authored Apr 15, 2024

Previously libgolang was specifying its runtime, among other primitives,
to provide semaphore implementation with acquire and release methods.
The release should be non-blocking operation, and the acquire should be
blocking until the semaphore is acquired.

However for efficient implementation of timers, we will need to have
semaphore acquire that can also be instructed to time out.

-> Adjust thread and gevent runtimes to provide that and adjust runtime
interface specification to require that.

This is generally a backward incompatible change, but given that there
are just a few libgolang runtimes, it, hopefully, should not cause any
real breakage. So I think it is ok to do it this way.

For the reference - contrary to runtimes - the public user API of
libgolang and pygolang - that most of the pygolang users actually use -
is not changed at all. In other words there is no backward-compatibility
issue for regular pygolang/libgolang users because for them pygolang
stays 100% backward compatible.

/proposed-for-review-on nexedi/pygolang!26

ae9b6f7d

time: test: Add test for stop on func-based Timer · fb065b64

Kirill Smelkov authored Apr 15, 2024

I was working on Timer-related topics and started to suspect that stop
might panic when draining the timer channel if func != nil. That
turned out to be not true - the code is correct as it is, but it
generally helps to have tests covering questionable functionality.

/proposed-for-review-on nexedi/pygolang!26

fb065b64

time: test: Explicitly release Timer/Ticker resources · 9fafad8e

Kirill Smelkov authored Apr 15, 2024

In the light of discovered memory leaks (see nexedi/pygolang!24),
it is better to explicitly make sure that resources allocated by every
test are explicitly released. Even though timers are released
automatically on their expiration, there is generally no guarantee that
the tests will finish after all timers are expired. And even more so for
Ticker - without explicit stop, the ticker continues to be active
forever.

So stop all created timers and tickers where we can in the tests.

/proposed-for-review-on nexedi/pygolang!26

9fafad8e