- 03 May, 2020 1 commit
-
-
Kirill Smelkov authored
qq is used to quote strings or byte-strings. The following example illustrates the problem we are currently hitting in zodbtools with Python3:

    >>> "hello %s" % qq("мир")
    'hello "мир"'

    >>> b"hello %s" % qq("мир")
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: %b requires a bytes-like object, or an object that implements __bytes__, not 'str'

    >>> "hello %s" % qq(b("мир"))
    'hello "мир"'

    >>> b"hello %s" % qq(b("мир"))
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: %b requires a bytes-like object, or an object that implements __bytes__, not 'str'

i.e. one way or another, if the type of the format string and the type of what qq returns do not match, a TypeError is raised. We want qq(obj) to be usable with both string and bytestring formats.

For that let's teach qq to return special str- and bytes-derived types that know how to automatically convert str->bytes and bytes->str via b/u correspondingly. This way formatting works for whatever combination of types the format and qq had, and the whole result has the same type as the format.

For now we teach only qq to use the new types and don't generally expose _str and _unicode to be returned by b and u yet. However we might do so in the future after incrementally gaining a bit more experience.

/proposed-for-review-on: nexedi/pygolang!1
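The idea of a str-derived type that also formats into bytestrings can be sketched as follows. This is only an illustration for Python 3.5+, not pygolang's actual implementation: `QuotedText` and `quote` are hypothetical names, and quoting is reduced to wrapping the text in double quotes without any escaping.

```python
class QuotedText(str):
    """Hypothetical str subclass that can also be used in bytes formatting."""

    def __bytes__(self):
        # bytes %-formatting calls __bytes__ on %s arguments that are
        # neither bytes nor bytearray.
        return self.encode("utf-8", "surrogateescape")


def quote(obj):
    """Simplified stand-in for qq: wrap text in double quotes, no escaping."""
    if isinstance(obj, bytes):
        obj = obj.decode("utf-8", "surrogateescape")
    return QuotedText('"' + str(obj) + '"')


text = "hello %s" % quote("мир")    # str format   -> str result
data = b"hello %s" % quote("мир")   # bytes format -> bytes result
data_from_bytes = b"hello %s" % quote("мир".encode("utf-8"))
```

With such a type the result always has the type of the format string, whatever was passed to the quoting function.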
-
- 29 Apr, 2020 3 commits
-
-
Kirill Smelkov authored
i.e. unicode on py3 and bytes on py2. This makes it more likely to work for "format string %s ..." % qq(object) with format string being str and object of arbitrary type. However more on this topic in the next patch.
-
Kirill Smelkov authored
i.e. b(b(·)) and u(u(·)) are always identity. This property already holds and is easy to verify just by code review. However it might become less obvious once we start to tweak b and u. -> Add explicit test.
-
Kirill Smelkov authored
The new home is https://lab.nexedi.com/nexedi/pygolang [1] https://stack.nexedi.com
-
- 16 Apr, 2020 3 commits
-
-
Kirill Smelkov authored
A build fix wrt gevent-1.5 + benchmarks for nogil go and channels.
-
Kirill Smelkov authored
Starting from gevent >= 1.5 '*.pxd' files for gevent API are no longer provided, at least in released gevent wheels. This broke pygolang:

    Error compiling Cython file:
    ------------------------------------------------------------
    ...

    # Gevent runtime uses gevent's greenlets and semaphores.
    # When sema.acquire() blocks, gevent switches us from current to another greenlet.

    IF not PYPY:
        from gevent._greenlet cimport Greenlet
        ^
    ------------------------------------------------------------

    golang/runtime/_runtime_gevent.pyx:28:4: 'gevent/_greenlet.pxd' not found

Since gevent upstream refuses to restore Cython level access[1], let's fix the build by using gevent bits via Python-level.

Even when used via py import gevent-1.5 brings speed improvement compared to gevent-1.4 (used via cimport):

    (on i7@2.6GHz, gevent runtime)

                      gevent-1.4    gevent-1.5
                      (cimport)     (py import)

    name              old time/op   new time/op   delta
    pyx_select_nogil  9.47µs ± 0%   8.74µs ± 0%    -7.70%  (p=0.000 n=10+9)
    pyx_go_nogil      14.3µs ± 1%   12.0µs ± 1%   -16.52%  (p=0.000 n=10+10)
    pyx_chan_nogil    7.10µs ± 1%   6.32µs ± 1%   -10.89%  (p=0.000 n=10+10)
    go                16.0µs ± 2%   13.4µs ± 1%   -16.37%  (p=0.000 n=10+10)
    chan              7.50µs ± 0%   6.79µs ± 0%    -9.53%  (p=0.000 n=10+10)
    select            10.8µs ± 1%   10.0µs ± 1%    -6.78%  (p=0.000 n=10+10)

Using gevent-1.5 could have been even faster via cimport (it is still possible to compile and test against gevent installed in development mode via `pip install -e` because pxd files are there in gevent worktree and tarball):

                      gevent-1.5    gevent-1.5
                      (py import)   (cimport)

    name              old time/op   new time/op   delta
    pyx_select_nogil  8.74µs ± 0%   7.90µs ± 1%   -9.60%  (p=0.000 n=9+10)
    pyx_go_nogil      12.0µs ± 1%   11.2µs ± 2%   -6.35%  (p=0.000 n=10+10)
    pyx_chan_nogil    6.32µs ± 1%   5.89µs ± 0%   -6.80%  (p=0.000 n=10+9)
    go                13.4µs ± 1%   12.4µs ± 1%   -7.54%  (p=0.000 n=10+9)
    chan              6.79µs ± 0%   6.42µs ± 0%   -5.47%  (p=0.000 n=10+10)
    select            10.0µs ± 1%    9.4µs ± 1%   -6.39%  (p=0.000 n=10+10)

but we cannot use cimport to access gevent-1.5 universally, since pxd are not shipped in gevent wheel releases.

In the future we might want to change the plain version check into a compile-time check of whether gevent/_greenlet.pxd is actually present, and use the faster access if yes. Requesting gevent to be installed in non-binary form might also be an option worth trying.

However plain version check should be ok for now.

[1] https://github.com/gevent/gevent/issues/1568
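The compile-time presence check mentioned as a possible future direction could look roughly like the sketch below. It is an assumption about how such a check might be written, not code from pygolang: `package_ships_file` and `use_cimport` are hypothetical names, and only standard-library calls are used.

```python
import importlib.util
import os


def package_ships_file(package, relpath):
    """Report whether an installed package directory contains relpath."""
    try:
        # find_spec locates a top-level package without importing it
        spec = importlib.util.find_spec(package)
    except (ImportError, ValueError):
        return False
    if spec is None or not spec.submodule_search_locations:
        return False
    return any(os.path.exists(os.path.join(d, relpath))
               for d in spec.submodule_search_locations)


# Hypothetical build-time decision: cimport only if the pxd is really there.
use_cimport = package_ships_file("gevent", "_greenlet.pxd")
```

Such a check would pick the faster path for development installs while still working with released wheels.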
-
Kirill Smelkov authored
on i7@2.6GHz it looks like:

    thread runtime:

    name              time/op
    pyx_select_nogil  2.70µs ±13%
    pyx_go_nogil      15.9µs ± 1%
    pyx_chan_nogil    2.79µs ± 2%
    go                17.6µs ± 0%
    chan              3.05µs ± 4%
    select            3.62µs ± 4%

    gevent runtime (gevent-1.4.0):

    name              time/op
    pyx_select_nogil  9.39µs ± 1%
    pyx_go_nogil      15.1µs ± 2%
    pyx_chan_nogil    7.10µs ± 1%
    go                16.6µs ± 1%
    chan              7.47µs ± 1%
    select            10.7µs ± 0%
-
- 15 Apr, 2020 1 commit
-
-
Kirill Smelkov authored
Just a single change to expose time constants to C++ users (2476f47e).
-
- 05 Mar, 2020 1 commit
-
-
Kirill Smelkov authored
This makes them available for C++ users as well.
-
- 28 Feb, 2020 4 commits
-
-
Kirill Smelkov authored
-
Kirill Smelkov authored
On macOS and Windows, Python2 is built with --enable-unicode=ucs2, which makes it use UTF-16 encoding for unicode characters, and so for characters at U+10000 and above it uses surrogate encoding with _2_ unicode points, for example:

    >>> import sys
    >>> sys.maxunicode
    65535               <-- NOTE indicates UCS2 build
    >>> s = u'\U00012345'
    >>> s
    u'\U00012345'
    >>> s.encode('utf-8')
    '\xf0\x92\x8d\x85'
    >>> len(s)
    2                   <-- NOTE _not_ 1
    >>> s[0]
    u'\ud808'
    >>> s[1]
    u'\udf45'

This leads to e.g. b tests failing for

    # tbytes                tunicode
    (b"\xf0\x90\x8c\xbc",   u'\U0001033c'), # Valid 4 Octet Sequence '𐌼'

    >       assert b(tunicode) == tbytes
    E       AssertionError: assert '\xed\xa0\x80\xed\xbc\xbc' == '\xf0\x90\x8c\xbc'
    E         - \xed\xa0\x80\xed\xbc\xbc
    E         + \xf0\x90\x8c\xbc

because on a UCS2 python build u'\U0001033c' is represented as 2 unicode points:

    >>> s = u'\U0001033c'
    >>> len(s)
    2
    >>> s[0]
    u'\ud800'
    >>> s[1]
    u'\udf3c'
    >>> s[0].encode('utf-8')
    '\xed\xa0\x80'
    >>> s[1].encode('utf-8')
    '\xed\xbc\xbc'

-> Fix it by detecting UCS2 build and working around by manually combining such surrogate unicode pairs appropriately.

A reference on the subject: https://matthew-brett.github.io/pydagogue/python_unicode.html#utf-16-ucs2-builds-of-python-and-32-bit-unicode-code-points
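The arithmetic behind combining a surrogate pair can be sketched as below. This is a standalone illustration in Python 3 (where the UCS2 problem itself does not occur), not the code of the fix; `combine_surrogates` and `utf8_of_pair` are hypothetical helper names.

```python
def combine_surrogates(hi, lo):
    """Combine a UTF-16 surrogate pair (two integers) into one code point."""
    if not (0xD800 <= hi <= 0xDBFF and 0xDC00 <= lo <= 0xDFFF):
        raise ValueError("not a high/low surrogate pair")
    # each surrogate carries 10 bits of the code point offset above U+10000
    return 0x10000 + ((hi - 0xD800) << 10) + (lo - 0xDC00)


def utf8_of_pair(hi, lo):
    """UTF-8 encoding of the character represented by the pair."""
    return chr(combine_surrogates(hi, lo)).encode("utf-8")


# The two pairs shown in the sessions above.
rune_a = combine_surrogates(0xD808, 0xDF45)   # u'\ud808' u'\udf45'
rune_b = combine_surrogates(0xD800, 0xDF3C)   # u'\ud800' u'\udf3c'
```

Encoding the combined code point, rather than each half separately, yields the 4-octet UTF-8 sequence that the failing test expected.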
-
Kirill Smelkov authored
This is a preparatory step for the next patch where we'll be fixing strconv for Python2 builds with --enable-unicode=ucs2, where a unicode character can take _2_ unicode points.

In that general case relying on unicode objects to represent runes is not good, because many things generally do not work for U+10000 and above, e.g. ord breaks:

    >>> import sys
    >>> sys.maxunicode
    65535               <-- NOTE indicates UCS2 build
    >>> s = u'\U00012345'
    >>> s
    u'\U00012345'
    >>> s.encode('utf-8')
    '\xf0\x92\x8d\x85'
    >>> len(s)
    2                   <-- NOTE _not_ 1
    >>> ord(s)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: ord() expected a character, but string of length 2 found

so we switch to representing runes as integers, similarly to what Go does.
-
Kirill Smelkov authored
Commit 8af78fc5 (pyx.build: v↑ setuptools_dso (1.2 -> 1.4)) upgraded setuptools_dso to 1.4. However, starting from https://github.com/mdavidsaver/setuptools_dso/commit/3f3ff746 setuptools_dso uses multiprocessing, and so pyx.build started to hang when run under gpython. This is a known gevent problem - see e.g. https://github.com/gevent/gevent/issues/993. The problem manifested itself as the pyx.build unit test hanging under Python3.

Fix it by installing gevent multiprocessing plugin which is automatically used/activated by gevent.monkey.patch_all(). geventmp says it is pre-alpha, but by using it we can unhang pyx.build tests, which is a better state than before.

Another future possibility would be to use https://github.com/jgehrcke/gipc wrapped into a multiprocessing-compatible API.
-
- 27 Feb, 2020 3 commits
-
-
Kirill Smelkov authored
This is top-level documentation for error chaining that was promised and marked as TODO in

- fd95c88a (golang, errors, fmt: Error chaining (C++/Pyx))
- 17798442 (golang: Expose error at Py level)
- 78d0c76f (golang: Teach pyerror to be a base class)
- 337de0d7 (golang, errors, fmt: Error chaining (Python))
- 03f88c0b (errors: Take .__cause__ into account)
-
Kirill Smelkov authored
This provides top-level documentation for b and u that was promised and marked as TODO in bcb95cd5 (golang: Provide b, u for strings).
-
Kirill Smelkov authored
Pychan provides __eq__ (see 2c8063f4 "*: Channels must be compared by ==, not by "is" even for nilchan"), but does not provide __ne__. At the same time in 17798442 (golang: Expose error at Py level) we had to define both pyerror.__eq__ and pyerror.__ne__ because without __ne__ pyerror != pyerror was not working correctly.

As it turns out pychan != pychan already works ok, because pychan does not have a base class, and for that case Cython automatically generates __ne__ based on __eq__:

    https://github.com/cython/cython/blob/0.29.14-629-ga73815042/Cython/Compiler/ModuleNode.py#L1963-L1976
    https://github.com/cython/cython/commit/b75d2942afab

Add a corresponding comment and extend tests to make sure it is indeed so.
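The kind of check such a test makes can be sketched with a plain Python 3 class. Note the assumption: in plain Python 3 the default `__ne__` inverts `__eq__`, which is analogous to, but not the same mechanism as, the `__ne__` that Cython generates for cdef classes. `ChanSketch` is a hypothetical stand-in, not pychan.

```python
class ChanSketch:
    """Hypothetical channel-like wrapper that defines __eq__ but not __ne__."""

    def __init__(self, ident):
        self.ident = ident

    def __eq__(self, other):
        if not isinstance(other, ChanSketch):
            return NotImplemented
        return self.ident == other.ident

    def __hash__(self):
        return hash(self.ident)


ch1 = ChanSketch(1)
ch1_again = ChanSketch(1)   # different object wrapping the same channel id
ch2 = ChanSketch(2)

same_is_equal = (ch1 == ch1_again)
same_is_not_unequal = not (ch1 != ch1_again)   # != derived from __eq__
different_is_unequal = (ch1 != ch2)
```

Testing both `==` and `!=` explicitly guards against the two operators silently disagreeing.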
-
- 20 Feb, 2020 1 commit
-
-
Kirill Smelkov authored
Go version does not provide this, but the topic of sync.RWMutex downgrading has been raised several times, at least:

    https://github.com/golang/go/issues/4026
    https://github.com/golang/go/issues/23513
    https://groups.google.com/forum/#!topic/golang-nuts/MmIDUzl8HA0
    ...

Atomic downgrading is often useful to avoid the race window in between Unlock and RLock and, as a consequence, the need to recheck things after RLock. We can put this complexity and logic into a well-defined RWMutex primitive instead of leaving it to be solved by every RWMutex user.
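Why downgrading needs to be atomic can be seen from a minimal readers-writer mutex sketch built on `threading.Condition`. This is an illustration under assumptions, not pygolang's implementation: the class and method names (`RWMutexSketch`, `unlock_to_rlock`, ...) are hypothetical.

```python
import threading


class RWMutexSketch:
    """Readers-writer mutex with writer preference and atomic downgrade."""

    def __init__(self):
        self._cond = threading.Condition()
        self._readers = 0            # number of active readers
        self._writer = False         # whether a writer holds the mutex
        self._writers_waiting = 0    # writers blocked in lock()

    def rlock(self):
        with self._cond:
            # writer preference: new readers wait while a writer is queued
            while self._writer or self._writers_waiting > 0:
                self._cond.wait()
            self._readers += 1

    def runlock(self):
        with self._cond:
            self._readers -= 1
            if self._readers == 0:
                self._cond.notify_all()

    def lock(self):
        with self._cond:
            self._writers_waiting += 1
            try:
                while self._writer or self._readers > 0:
                    self._cond.wait()
            finally:
                self._writers_waiting -= 1
            self._writer = True

    def unlock(self):
        with self._cond:
            self._writer = False
            self._cond.notify_all()

    def unlock_to_rlock(self):
        # Both state changes happen under one critical section, so no other
        # writer can acquire the mutex in between.
        with self._cond:
            self._writer = False
            self._readers += 1
            self._cond.notify_all()


mu = RWMutexSketch()
mu.lock()               # write: e.g. update shared state
mu.unlock_to_rlock()    # keep reading the state that was just written
mu.runlock()
```

With separate `unlock()` followed by `rlock()`, another writer could modify the state between the two calls, which is exactly the recheck burden described above.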
-
- 17 Feb, 2020 1 commit
-
-
Kirill Smelkov authored
Provide sync.RWMutex that can be useful for cases when there are multiple simultaneous readers and less frequent writer(s). This implements a readers-writer mutex with preference for writers, similarly to the Go version.
-
- 12 Feb, 2020 1 commit
-
-
Kirill Smelkov authored
Only io.EOF and io.ErrUnexpectedEOF for now. Moved here from wcfs from wendelin.core.
-
- 11 Feb, 2020 3 commits
-
-
Kirill Smelkov authored
A Python error can have links to other errors by means of both .Unwrap() and .__cause__. Both ways are explicit and so should be treated by e.g. errors.Is as present in the error's chain.

It is a bit unclear, at least initially, how to linearise and order error chain traversal at divergence points - for exception objects where both .Unwrap() and .__cause__ are not None. However a closer look suggests a linearisation rule: traverse into .__cause__ after going through the .Unwrap() part - please see details in the documentation added to _error.pyx

-> Teach errors.Is to do this traversal. This way e.g. an exception raised as

    raise X from Y

will be treated by errors.Is as being both X and Y, even if any of X or Y also has its own error chain via .Unwrap().

Top-level documentation is TODO.
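One possible reading of this linearisation rule is sketched below. It is not the code from _error.pyx: `iter_error_chain`, `is_in_chain` and `WrappedError` are hypothetical names, and the matching is plain `==`.

```python
def iter_error_chain(err):
    """Yield err, then its .Unwrap() part, then its .__cause__ part."""
    if err is None:
        return
    yield err
    unwrap = getattr(err, "Unwrap", None)
    if callable(unwrap):
        yield from iter_error_chain(unwrap())
    yield from iter_error_chain(getattr(err, "__cause__", None))


def is_in_chain(err, target):
    """Simplified errors.Is-like check over the linearised chain."""
    return any(e == target for e in iter_error_chain(err))


class WrappedError(Exception):
    """Hypothetical error that wraps another one via .Unwrap()."""

    def __init__(self, text, inner):
        super().__init__(text)
        self._inner = inner

    def Unwrap(self):
        return self._inner


inner = KeyError("inner")
cause = ValueError("cause")
try:
    raise WrappedError("outer", inner) from cause
except WrappedError as e:
    outer = e

chain = list(iter_error_chain(outer))   # outer, then inner, then cause
```

Here the error raised via `raise ... from cause` matches both its wrapped error and its cause, with the `.Unwrap()` part visited first.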
-
Kirill Smelkov authored
Following errors model in Go and fd95c88a (golang, errors, fmt: Error chaining (C++/Pyx)) let's add support at Python-level for errors to wrap each other and to be inspected/unwrapped:

- an error can additionally provide a way to unwrap itself, if it provides .Unwrap() method. .__cause__ is not taken into account yet, but will be in a follow-up patch;
- errors.Is(err) tests whether an item in error's chain matches target;
- `fmt.Errorf("... : %w", ... err)` is similar to `"... : %s" % (..., err)` but resulting error, when unwrapped, will return err.
- errors.Unwrap is not exposed as chaining through both .Unwrap() and .__cause__ will need more than just "current element" as unwrapping state (i.e. errors.Unwrap API is insufficient - see next patch), and in practice users of errors.Unwrap() are very rare.

Support for error chaining through .__cause__ will follow in the next patch.

Top-level documentation is TODO.

See https://blog.golang.org/go1.13-errors for error chaining overview.
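The relation between `%w` formatting and unwrapping described in the list above can be sketched as follows. This is a simplified illustration, not pygolang's fmt.Errorf: `errorf_sketch` and `WrapErrorSketch` are hypothetical names, and the sketch assumes `%w`, when used, is the last verb of the format and matches the last argument.

```python
class WrapErrorSketch(Exception):
    """Hypothetical error that remembers the error it was built from."""

    def __init__(self, text, err):
        super().__init__(text)
        self._err = err

    def Unwrap(self):
        return self._err


def errorf_sketch(format, *argv):
    """Format like %-formatting; a trailing %w additionally wraps its argument."""
    # %w is rendered exactly as %s would render the error
    text = format.replace("%w", "%s") % argv
    if format.endswith("%w") and argv:
        return WrapErrorSketch(text, argv[-1])
    return Exception(text)


base = ValueError("missing")
err = errorf_sketch("load %s: %w", "cfg", base)
plain = errorf_sketch("load %s: %s", "cfg", base)   # same text, no wrapping
```

Both errors carry the same text, but only the `%w` variant lets the original error be recovered through `.Unwrap()`.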
-
Kirill Smelkov authored
It is surprising to have an exception class that cannot be derived from. Besides, in the future we'll use subclassing from golang.error as an indicator that an error is "well-defined" (in simple words: it does not need a traceback to be interpreted).
-
- 10 Feb, 2020 1 commit
-
-
Kirill Smelkov authored
The first step to expose errors and error chaining to Python:

- Add pyerror that wraps a pyx/nogil C-level error and is exposed as golang.error at py level.
- py errors must be compared by ==, not by "is"
- Add (py) errors.New to create a new error from text.
- a C-level error that has .Unwrap, is exposed with .Unwrap at py level, but full py-level chaining will be implemented in a follow-up patch.
- py error does not support inheritance yet.

Top-level documentation is TODO.
-
- 06 Feb, 2020 1 commit
-
-
Kirill Smelkov authored
Following errors model in Go, let's add support for errors to wrap other errors and to be inspected/unwrapped:

- an error can additionally provide a way to unwrap itself, if it implements errorWrapper interface;
- errors.Unwrap(err) tries to extract wrapped error;
- errors.Is(err) tests whether an item in error's chain matches target;
- `fmt.errorf("... : %w", ... err)` is similar to `fmt.errorf("... : %s", ... err.c_str())` but resulting error, when unwrapped, will return err.

Add C++ implementation for the above + tests.

Python analogs will follow in the next patches.

Top-level documentation is TODO.

See https://blog.golang.org/go1.13-errors for error chaining overview.
-
- 04 Feb, 2020 16 commits
-
-
Kirill Smelkov authored
Package cxx was added in 9785f2d3 (cxx: New package), but the interface that cxx::dict provided turned out to be not optimal: dict.get was returning (v, ok), and so was dict.pop.

Correct dict.get and dict.pop to return just the value, and, similarly to channels API, provide additional dict.get_ and dict.pop_ - extended versions that also return ok:

    dict.get(k)  -> v
    dict.pop(k)  -> v
    dict.get_(k) -> (v, ok)
    dict.pop_(k) -> (v, ok)

This time add tests.
-
Kirill Smelkov authored
Follow the scheme established and used for all other packages, because we will soon have fmt pyx part which, if named as fmt.pyx, will intersect and conflict with fmt.py .
-
Kirill Smelkov authored
errors.New was added in a245ab56 (errors: New package) without test.
-
Kirill Smelkov authored
-
Kirill Smelkov authored
-
Kirill Smelkov authored
-
Kirill Smelkov authored
Makes it easier to understand which test failed, and where, when one fails.
-
Kirill Smelkov authored
Currently libgolang_test.cpp contains tests for code in libgolang.cpp and for code that lives in other libgolang packages - sync, fmt, etc. It is becoming tight and we are going to split libgolang_test.cpp and move package tests to their corresponding files - e.g. to sync_test.cpp and the like. Move common assertion utilities into a shared header before that as a preparatory step.
-
Kirill Smelkov authored
Just use builtins and cimported things that we have at pyx level.
-
Kirill Smelkov authored
u is the preferred way to make sure an object is a unicode string.
-
Kirill Smelkov authored
This will allow integrating qq with u in the next patch. Moving to compiled code for string processing functions is also generally better for performance.
-
Kirill Smelkov authored
With Python3 I've got tired of constantly using .encode() and .decode(); getting an exception if the original argument was unicode on e.g. b.decode(); getting an exception on raw bytes that are invalid UTF-8; not being able to use bytes literals with non-ASCII characters, etc.

So instead of this pain provide two functions that make sure an object is either bytes or unicode:

- b converts str/unicode/bytes s to UTF-8 encoded bytestring.

      Bytes input is preserved as-is:

          b(bytes_input) == bytes_input

      Unicode input is UTF-8 encoded. The encoding always succeeds.
      b is reverse operation to u - the following invariant is always true:

          b(u(bytes_input)) == bytes_input

- u converts str/unicode/bytes s to unicode string.

      Unicode input is preserved as-is:

          u(unicode_input) == unicode_input

      Bytes input is UTF-8 decoded. The decoding always succeeds and input
      information is not lost: non-valid UTF-8 bytes are decoded into
      surrogate codes ranging from U+DC80 to U+DCFF.
      u is reverse operation to b - the following invariant is always true:

          u(b(unicode_input)) == unicode_input

NOTE: encoding _and_ decoding *never* fail nor lose information. This is achieved by using 'surrogateescape' error handler on Python3, and providing manual fallback that behaves the same way on Python2.

The naming is chosen with the idea so that b(something) resembles b"something", and u(something) resembles u"something".

This, even being only a part of the strings solution discussed in [1], should help handle byte- and unicode- strings in a more robust and distraction-free way.

Top-level documentation is TODO.

[1] nexedi/zodbtools!13
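The Python3 half of this scheme can be sketched with the standard 'surrogateescape' error handler. This is a minimal illustration, not pygolang's b and u: the names `b_sketch` and `u_sketch` are hypothetical, the Python2 fallback is omitted, and unicode input containing lone surrogates outside U+DC80..U+DCFF is not handled here.

```python
def b_sketch(s):
    """Return s as UTF-8 bytes; bytes input is returned as-is."""
    if isinstance(s, bytes):
        return s
    # surrogate codes U+DC80..U+DCFF are turned back into the raw bytes
    return s.encode("utf-8", "surrogateescape")


def u_sketch(s):
    """Return s as str; str input is returned as-is."""
    if isinstance(s, str):
        return s
    # bytes that are not valid UTF-8 become surrogate codes U+DC80..U+DCFF
    return s.decode("utf-8", "surrogateescape")


raw = b"\xffabc \xd0\xbc\xd0\xb8\xd1\x80"   # starts with an invalid UTF-8 byte
text = u_sketch(raw)                         # decoding does not fail
roundtrip = b_sketch(text)                   # the invalid byte is restored
```

The round trip through text gives back exactly the original bytes, which is the `b(u(bytes_input)) == bytes_input` invariant from the description.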
-
Kirill Smelkov authored
This continues 60f6db6f (libgolang: Provide nil as alias for nullptr and NULL): I've tried to compile pygolang with Clang on my Debian 10 workstation and got:

    $ CC=clang CXX=clang++ python setup.py build_dso -i

    In file included from ./golang/fmt.h:32:
    ./golang/libgolang.h:381:11: error: unknown type name 'nullptr_t'; did you mean 'std::nullptr_t'?
    constexpr nullptr_t nil = nullptr;
              ^~~~~~~~~
              std::nullptr_t
    /usr/bin/../lib/gcc/x86_64-linux-gnu/8/../../../../include/x86_64-linux-gnu/c++/8/bits/c++config.h:242:29: note: 'std::nullptr_t' declared here
      typedef decltype(nullptr)     nullptr_t;
                                    ^
    :
    In file included from ./golang/context.h
    In file included from golang/runtime/libgolang.cpp:30:
    ./golang/libgolang.h:381:11: error: unknown type name 'nullptr_t'; did you mean 'std::nullptr_t'?
    constexpr nullptr_t nil = nullptr;
              ^~~~~~~~~
              std::nullptr_t
    /usr/bin/../lib/gcc/x86_64-linux-gnu/8/../../../../include/x86_64-linux-gnu/c++/8/bits/c++config.h:242:29: note: 'std::nullptr_t' declared here
      typedef decltype(nullptr)     nullptr_t;
                                    ^
    :39:
    ./golang/libgolang.h:381:11: error: unknown type
    In file included from golang/fmt.cpp:25:
    In file included from ./golang/fmt.h:32:
    ./golang/libgolang.h:421:17: error: unknown type name 'nullptr_t'; did you mean 'std::nullptr_t'?
        inline chan(nullptr_t) { _ch = nil; }
                    ^~~~~~~~~
                    std::nullptr_t
    ...

It seems with GCC and Clang under macOS nullptr_t is automatically provided in builtin namespace, while with older Clang on Linux (clang version 7.0.1-8) only in std:: namespace - rightfully as nullptr_t is described to be present there:

    https://en.cppreference.com/w/cpp/types/nullptr_t

This way we either have to correct all occurrences of nullptr_t to std::nullptr_t, or do something similar with providing nil under golang:: . To reduce noise I prefer the latter and let it be named as Nil.
-
Kirill Smelkov authored
The code was assigning nil to local, _not_ global _tblockforever. As a result _tblockforever was left set with a test hook even after leaving test context. Fix it.

The bug was there starting from 3b241983 (Port/move channels to C/C++/Pyx).

Had to change `= nil` to `= NULL` because with nil Cython complains as

        def __exit__(pypanicWhenBlocked t, typ, val, tb):
            global _tblockforever
            _tblockforever = nil
                             ^
    ------------------------------------------------------------

    golang/_golang_test.pyx:86:25: Cannot assign type 'nullptr_t' to 'void (*)(void) nogil'

This is https://github.com/cython/cython/issues/3314.
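The class of bug described here, assigning to a local instead of a module-level variable, can be reproduced in plain Python, whose scoping rule Cython follows for `global`. The names below (`_hook`, `install`, `reset_buggy`, `reset_fixed`, `current`) are hypothetical and only illustrate the pattern.

```python
_hook = None   # module-level test hook, analogous in role to _tblockforever


def install(fn):
    global _hook
    _hook = fn


def reset_buggy():
    # Without a `global` declaration this binds a new local name;
    # the module-level _hook stays set.
    _hook = None


def reset_fixed():
    global _hook
    _hook = None


def current():
    return _hook


install(print)
reset_buggy()
still_installed = current() is print   # the hook leaked past the "reset"
reset_fixed()
cleared = current() is None
```

The buggy variant runs without any error, which is why such a leftover hook can go unnoticed until a later test is affected.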
-
Kirill Smelkov authored
It's a leftover originating from b073f6df (time: Move/Port timers to C++/Pyx nogil).
-
Kirill Smelkov authored
Instead of `pyctx.ctx = nil` it was just `ctx = nil` - i.e. assign nil to local variable instead of changing pyctx instance data.

We were not observing this bug because Cython, for C++ fields of cdef classes, automatically emits in-place destructor calls in generated __dealloc__

    https://github.com/cython/cython/blob/0.29.14-11-g8c620c388/Cython/Compiler/ModuleNode.py#L1477-L1478

and so this way there was no leak. However we want to be explicit and the code was not correct. Fix it.

The bug was there from 2a359791 (context: Move/Port context package to C++/Pyx nogil).
-