wcfs: client: Adjust Cython part to accept both bytes and str input, and yield bstr output
wcfs/client/_wcfs.pyx provides a Cython wrapper over the C++ WCFS client, which works with bytes-based std::string messages. On py2 everything works ok, but on py3, because of this, the wrapper rejects str given as an input argument, e.g. as follows:

```python
_____________________________ test_join_autostart ______________________________

    @func
    def test_join_autostart():
        zurl = testzurl
        with raises(RuntimeError, match="wcfs: join .*: server not running"):
            wcfs.join(zurl, autostart=False)
        assert wcfs._wcregistry == {}
        def _():
            assert wcfs._wcregistry == {}
        defer(_)
>       wc = wcfs.join(zurl, autostart=True)

wcfs/wcfs_test.py:164:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
wcfs/__init__.py:225: in join
    wc = WCFS(mntpt, fwcfs, wcsrv)
../../venvs/wendelin.core/lib/python3.9/site-packages/decorator.py:232: in fun
    return caller(func, *(extras + args), **kw)
../pygolang/golang/__init__.py:125: in _
    return f(*argv, **kw)
wcfs/__init__.py:167: in __init__
    wc.mountpoint = mountpoint
wcfs/client/_wcfs.pyx:44: in wendelin.wcfs.client._wcfs.PyWCFS.mountpoint.__set__
    def __set__(PyWCFS pywc, string v):
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

>   ???
E   TypeError: expected bytes, str found
```

This happens because by default Cython maps std::string to bytes on the Python side.

-> Fix it by accepting both str and bytes as input for all method arguments that are strings. For returned strings, take care to return them with string semantics, not as the plain bytes to which Cython converts std::string by default, because the calling code expects returned messages to behave as strings. We still return the data as a bytestring, not unicode, because the rest of the testsuite also assumes that messages received from the WCFS server are binary.

NOTE: even though I was the one who originally suggested in private to use the `cython: c_string_type=str, c_string_encoding=utf8` directives so that the str type is accepted as input, later, after taking a broader look, I realized that there are two problems with these directives.
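To make both problems concrete, here is a minimal pure-Python model of the conversions. It is only a sketch. It assumes plain `bytes` as a stand-in for pygolang's bstr and does not use Cython or pygolang. The helper names `to_cstring`, `from_cstring_directives` and `from_cstring_by_hand` are hypothetical and are not part of the actual patch.

- The input side accepts str or bytes.
- The directive-like output side decodes every reply as UTF-8.
- The by-hand output side keeps the raw bytes.

```python
# Hypothetical pure-Python model of the string conversions discussed here.
# Not the real _wcfs.pyx code: plain `bytes` stands in for pygolang's bstr.

def to_cstring(s):
    """Accept str or bytes input and return the bytes that would be passed
    to C++ as std::string (str is encoded as UTF-8)."""
    if isinstance(s, bytes):
        return s
    if isinstance(s, str):
        return s.encode("utf-8")
    raise TypeError("expected str or bytes, got %s" % type(s).__name__)


def from_cstring_directives(data):
    """Model of c_string_type=str, c_string_encoding=utf8: every returned
    std::string is decoded to unicode, so invalid UTF-8 raises."""
    return data.decode("utf-8")


def from_cstring_by_hand(data):
    """Model of the by-hand conversion: return the data as a bytestring,
    unchanged, so whatever the server sent reaches the caller."""
    return bytes(data)


if __name__ == "__main__":
    # Input: str and bytes are both accepted and give the same bytes.
    assert to_cstring("/dev/shm/wcfs") == to_cstring(b"/dev/shm/wcfs")

    # Output: a hypothetical reply from WCFS containing invalid UTF-8.
    garbage = b"ok \xff\xfe"
    print(repr(from_cstring_by_hand(garbage)))  # raw bytes are preserved
    try:
        from_cstring_directives(garbage)
    except UnicodeDecodeError as e:
        print("directives approach fails:", e.reason)
```

Running the model prints the preserved raw bytes for the by-hand variant. For the directive-like variant it reports a decode failure instead of the message.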
First, the directives affect not only the input: any returned std::string also becomes unicode, instead of the bytes/bytestring returned previously. However, as explained above, the higher level expects binary semantics from returned messages. Second, if WCFS sends a message with invalid UTF-8 data, the client raises an exception instead of returning the sent data to the caller. This makes debugging more difficult. When WCFS sends some garbage, the last thing I want is a UnicodeDecodeError. I would rather see the actual message, together with a higher-level assert that reports the message as unexpected and gives details.

So do all the in- and out- conversions by hand instead, controlling the desired semantics ourselves.

On py3 the implementation depends on nexedi/pygolang!21, but on py2 it works both with and without pygolang bstr patches.

Preliminary history: vnmabus/wendelin.core@47c27b03

Co-authored-by: Carlos Ramos Carreño <carlos.ramos@nexedi.com>