    golang_str: Fix iter(bstr) to yield byte instead of unicode character · 8d76276c
    Kirill Smelkov authored
    In a72c1c1a (golang_str: bstr/ustr iteration) things were initially
    implemented to follow Go semantics exactly, with bytestring iteration
    yielding unicode characters as explained in
    https://blog.golang.org/strings. However, this makes bstr not a 100%
    drop-in compatible replacement for std str under py2, and even though my
    initial testing suggested that this change does not affect programs in
    practice, that turned out not to be the case.
    
    For example, with bstr.__iter__ yielding unicode characters, running
    gpython on py2 with builtin str patched to be bstr will sometimes break
    when importing uuid:
    
    There, uuid reads 16 bytes from /dev/random, iterates those 16 bytes
    as single bytes, and expects the resulting sequence to have exactly
    16 items:
    
         int = long(('%02x'*16) % tuple(map(ord, bytes)), 16)
    
         ( https://github.com/python/cpython/blob/2.7-0-g8d21aa21f2c/Lib/uuid.py#L147 )
    
    which breaks if some of the read bytes are higher than 0x7f.
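
    The failure mode can be sketched in plain Python 3. This is only an
    illustration, not pygolang code: `bytes` stands in for py2 str, and
    `.decode("utf-8")` stands in for character-wise iteration. The sample
    data is made up so that the high bytes happen to form valid UTF-8.

```python
# 16 raw bytes, as uuid would read them. Eight of them are > 0x7f and
# form four 2-byte UTF-8 sequences (0xc3 0xa9 encodes one character).
raw = b"\xc3\xa9" * 4 + b"abcdefgh"
assert len(raw) == 16

# Byte-wise iteration (py2 str semantics): one item per byte.
by_byte = list(raw)                      # 16 ints on py3

# Character-wise iteration (Go `for range` semantics), emulated via decode.
by_char = list(raw.decode("utf-8"))      # 12 characters

fmt = "%02x" * 16                        # 16 format slots, as in uuid.py

hex_ok = fmt % tuple(by_byte)            # works: 16 values for 16 slots

try:
    fmt % tuple(map(ord, by_char))       # only 12 values for 16 slots
    failed = False
except TypeError:
    # "not enough arguments for format string"
    failed = True
```

    With byte-wise iteration the format string gets its 16 values; with
    character-wise iteration it gets fewer, and the `%` operation raises.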
    
    Even though this particular problem could be worked around by
    patching uuid, there is no guarantee that similar problems,
    potentially many, will not show up later.
    
    -> So adjust bstr semantics instead to follow the semantics of str
       under py2 and introduce a uiter() primitive to still be able to
       iterate bytestrings as unicode characters.
    
    This hopefully makes bstr fully compatible with str on py2 while
    still providing a reasonably good approach to processing strings the
    Go way when needed.
    
    Add biter as well for symmetry.
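
    The difference between the two iteration modes can be sketched as
    follows. `biter_sketch` and `uiter_sketch` are hypothetical stand-ins
    written against plain Python 3 `bytes`; they are not the pygolang
    functions, and the handling of undecodable bytes via
    `surrogateescape` is an assumption of this sketch only.

```python
def biter_sketch(data):
    """Yield each byte of `data` as a length-1 bytes object."""
    for i in range(len(data)):
        yield data[i:i + 1]


def uiter_sketch(data):
    """Yield the unicode characters obtained by decoding `data` as UTF-8."""
    # Assumption: an undecodable byte is kept as one surrogate-escaped
    # item, so iteration never raises on arbitrary input.
    for ch in data.decode("utf-8", errors="surrogateescape"):
        yield ch


s = "мир".encode("utf-8")        # 3 characters, 6 bytes

as_bytes = list(biter_sketch(s))
as_chars = list(uiter_sketch(s))

print(len(as_bytes))   # number of bytes
print(len(as_chars))   # number of characters
```

    Byte-wise iteration always yields len(data) items, which is what
    code written for py2 str expects; character-wise iteration stays
    available as an explicit opt-in.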
    
    See
    
        nexedi/pygolang!21 (comment 170754)
        nexedi/pygolang!21 (comment 170782)
        ...
    
    and
    
        nexedi/pygolang!21 (comment 206044)
    
    for discussion on iter(bstr) topic.
gpython_test.py 17 KB