    golang_str: Adjust bstr/ustr .encode() and .__bytes__ to leave string domain into bytes · 6f26b32c
    Kirill Smelkov authored
    Initially, in 023907ee (golang_str: bstr/ustr encode/decode), I
    implemented things so that (b|u)str.__bytes__ returned bstr and
    ustr.encode() returned bstr as well. My reasoning was that bstr is
    based on bytes, so it is ok to return that.
    
    However, this logic did not pass the backward compatibility test: for
    example, when LXML is imported it does
    
        cdef bytes _FILENAME_ENCODING = (sys.getfilesystemencoding() or sys.getdefaultencoding() or 'ascii').encode("UTF-8")
    
    and under gpython/py3 with unicode patched to be ustr it breaks with
    
          File "/srv/slapgrid/slappart47/srv/runner/software/7f1663e8148f227ce3c6a38fc52796e2/bin/runwsgi", line 4, in <module>
            from Products.ERP5.bin.zopewsgi import runwsgi; sys.exit(runwsgi())
          File "/srv/slapgrid/slappart47/srv/runner/software/7f1663e8148f227ce3c6a38fc52796e2/parts/erp5/product/ERP5/__init__.py", line 36, in <module>
            from Products.ERP5Type.Utils import initializeProduct, updateGlobals
          File "/srv/slapgrid/slappart47/srv/runner/software/7f1663e8148f227ce3c6a38fc52796e2/parts/erp5/product/ERP5Type/__init__.py", line 42, in <module>
            from .patches import pylint
          File "/srv/slapgrid/slappart47/srv/runner/software/7f1663e8148f227ce3c6a38fc52796e2/parts/erp5/product/ERP5Type/patches/pylint.py", line 524, in <module>
            __import__(module_name, fromlist=[module_name], level=0))
          File "src/lxml/sax.py", line 18, in init lxml.sax
          File "src/lxml/etree.pyx", line 154, in init lxml.etree
        TypeError: Expected bytes, got golang.bstr
    
    The breakage highlights a thinko in my previous reasoning: yes, bstr is based
    on bytes, but bstr has different semantics compared to bytes: even though e.g.
    bstr.__getitem__ works the same way as for bytes on py2, it works differently
    from bytes on py3. So if a program on py3 does bytes(x) or x.encode(), it
    expects the result to have the bytes semantics of the current python, which is
    not the case if the result is bstr.
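    
    A minimal py3 sketch of that mismatch. StrLikeBytes is a hypothetical
    stand-in written only for illustration (it is not pygolang's bstr); it
    assumes string-like indexing, where x[i] gives a one-element object
    instead of an int:

```python
class StrLikeBytes(bytes):
    """Hypothetical bytes subclass with string-like indexing (illustration only)."""

    def __getitem__(self, idx):
        x = bytes.__getitem__(self, idx)
        if isinstance(idx, slice):
            return StrLikeBytes(x)          # slice -> same type
        return StrLikeBytes(bytes((x,)))    # index -> 1-element object, not int


def first_byte_is_slash(b):
    # Typical py3 code: relies on bytes[i] being an int.
    return b[0] == 0x2f


native = b"/tmp"
strlike = StrLikeBytes(b"/tmp")

print(first_byte_is_slash(native))    # native py3 bytes semantics
print(first_byte_is_slash(strlike))   # string-like semantics: comparison with int fails
```

    Both objects pass isinstance(x, bytes), yet code written for py3 bytes
    gets a different answer from the second one.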
    
    -> Fix that by adjusting .encode() and .__bytes__() to produce the bytes type
       of the current python, thereby leaving the string domain.
    
    For some time I contemplated introducing a third type, e.g. bvec, also based
    on bytes but with bytes semantics, whose bvec.decode would return back to the
    pygolang strings domain. But because bytes semantics differ between py2 and
    py3, a bvec provided by pygolang would need to behave differently depending
    on the current python version, which is undesirable.
    
    In the end, by leaving into native bytes, the "bytes inconsistency" problem
    is left to std python, with pygolang aiming only to fix the strings
    inconsistency between py2 and py3 and to provide the same semantics for
    bstr and ustr on all python versions.
    
    It also does no harm that bytes.decode() returns std unicode instead of ustr:
    for programs that run under unpatched python we have u() to convert the result
    to ustr, while under gpython std unicode is actually ustr, which keeps
    bytes.decode() behaviour quite ok.
    
    P.S. we enable bstr.encode for consistency and because, if it is not enabled,
    running pytest under gpython on py2 breaks in
              File ".../_pytest/assertion/rewrite.py", line 352, in <module>
                RN = "\r\n".encode("utf-8")
            AttributeError: unreadable attribute
golang_str_test.py 101 KB