    Add support for Python bytes · 2fe0e876
    Kirill Smelkov authored
    In Python, bytes is an immutable, read-only array of bytes. It is
    also hashable, and so, unlike Go's []byte, it can be used as a dict
    key. Thus the closest approximation of Python bytes in Go is a type
    derived from Go's string: it is distinct from string, yet inherits
    string's immutability and its ability to be used as a map key. So:
    
    - add the ogórek.Bytes type to represent Python bytes
    - add support for decoding BINBYTES* pickle opcodes (these are protocol 3 opcodes)
    - add support for encoding ogórek.Bytes via those BINBYTES* opcodes
    - for protocols <= 2, where there are no opcodes to directly represent
      bytes, adopt the same approach as Python: pickle bytes as
    
    	_codecs.encode(byt.decode('latin1'), 'latin1')
    
      this way unpickling it on Python3 will give bytes, while unpickling it
      on Python2 will give str:
    
    	In [1]: sys.version
    	Out[1]: '3.6.6 (default, Jun 27 2018, 14:44:17) \n[GCC 8.1.0]'
    
    	In [2]: byt = b'\x01\x02\x03'
    
    	In [3]: _codecs.encode(byt.decode('latin1'), 'latin1')
    	Out[3]: b'\x01\x02\x03'
    
      ---
    
    	In [1]: sys.version
    	Out[1]: '2.7.15+ (default, Aug 31 2018, 11:56:52) \n[GCC 8.2.0]'
    
    	In [2]: byt = b'\x01\x02\x03'
    
    	In [3]: _codecs.encode(byt.decode('latin1'), 'latin1')
    	Out[3]: '\x01\x02\x03'
    
    - correspondingly, teach the decoder to recognize such calls to
      _codecs.encode as representing bytes and to decode them
      appropriately.
    
    - since we now have to emit byt.decode('latin1') as UNICODE, add a (so
      far internal) `type unicode(string)` that instructs the ogórek encoder
      to always emit the string with UNICODE opcodes (a regular string is
      encoded to a unicode pickle object only for protocol >= 3).
    
    - for []byte encoding, preserve the current behavior: even though the
      dispatching in Encoder.encode changes, the end result is the same, and
      []byte is still encoded as a regular string.

      This was added in 555efd8f "first draft of dumb pickle encoder", and
      even though it might not be a good choice, changing it is a topic for
      another patch.
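    A minimal sketch of why a string-derived type fits, assuming a
    stand-alone `type Bytes string` rather than ogórek's actual source:
    Bytes stays immutable and comparable, and because it is a distinct
    type, a Bytes key and a string key with the same contents do not
    collide in a map keyed by interface{}.

```go
package main

import "fmt"

// Bytes is a hypothetical stand-alone copy of the idea behind ogórek.Bytes:
// a distinct type derived from string, hence immutable and usable as a map key.
type Bytes string

func main() {
	m := map[interface{}]string{}
	m[Bytes("\x01\x02\x03")] = "bytes key"
	m["\x01\x02\x03"] = "string key"

	// Same underlying data, but different dynamic types, so two distinct keys.
	fmt.Println(len(m))
	fmt.Println(m[Bytes("\x01\x02\x03")])
}
```

    The same would not work with []byte, which is not comparable and
    panics at run time when used as an interface{} map key.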
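    A sketch of the BINBYTES* wire format, assuming hypothetical helpers
    encodeBytes and decodeBytes (not ogórek's API): SHORT_BINBYTES ('C')
    carries a 1-byte length and BINBYTES ('B') a 4-byte little-endian
    length, both followed by the raw data. The protocol 4 BINBYTES8
    opcode is deliberately left out.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// Pickle opcodes for bytes objects (protocol 3), as defined by CPython's pickle module.
const (
	opBinbytes      = 'B' // BINBYTES: 4-byte little-endian length, then the data
	opShortBinbytes = 'C' // SHORT_BINBYTES: 1-byte length, then the data
)

// encodeBytes (hypothetical) appends the protocol 3 representation of b to dst,
// using SHORT_BINBYTES when the length fits into one byte. Payloads of 4 GiB or
// more would need BINBYTES8 (protocol 4) and are not handled here.
func encodeBytes(dst []byte, b string) []byte {
	if len(b) < 256 {
		dst = append(dst, opShortBinbytes, byte(len(b)))
	} else {
		var n [4]byte
		binary.LittleEndian.PutUint32(n[:], uint32(len(b)))
		dst = append(dst, opBinbytes)
		dst = append(dst, n[:]...)
	}
	return append(dst, b...)
}

// decodeBytes (hypothetical) parses one SHORT_BINBYTES or BINBYTES opcode at the
// start of src and returns its payload plus the unconsumed input.
func decodeBytes(src []byte) (string, []byte, error) {
	if len(src) == 0 {
		return "", nil, errors.New("empty input")
	}
	var n int
	switch src[0] {
	case opShortBinbytes:
		if len(src) < 2 {
			return "", nil, errors.New("truncated SHORT_BINBYTES length")
		}
		n, src = int(src[1]), src[2:]
	case opBinbytes:
		if len(src) < 5 {
			return "", nil, errors.New("truncated BINBYTES length")
		}
		n, src = int(binary.LittleEndian.Uint32(src[1:5])), src[5:]
	default:
		return "", nil, fmt.Errorf("unexpected opcode %#x", src[0])
	}
	if len(src) < n {
		return "", nil, errors.New("truncated payload")
	}
	return string(src[:n]), src[n:], nil
}

func main() {
	enc := encodeBytes(nil, "\x01\x02\x03")
	fmt.Printf("%q\n", enc)
	payload, _, err := decodeBytes(enc)
	fmt.Println([]byte(payload), err)
}
```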
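    For protocols <= 2, the bytes are emitted as a call to
    _codecs.encode. The sketch below builds a protocol 2 pickle by hand
    to show the opcode sequence; picklePy2Bytes, latin1Decode and
    appendBinunicode are hypothetical names, and unlike CPython's pickler
    the output carries no memo (BINPUT) opcodes, which unpickling does not
    require. Protocols 0 and 1 would use different opcodes for the
    arguments and the tuple.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// latin1Decode mimics Python's bytes.decode('latin1'): every byte becomes the
// code point with the same numeric value. The result is a UTF-8 Go string.
func latin1Decode(b string) string {
	runes := make([]rune, len(b))
	for i := 0; i < len(b); i++ {
		runes[i] = rune(b[i])
	}
	return string(runes)
}

// appendBinunicode emits BINUNICODE ('X'): a 4-byte little-endian length of the
// UTF-8 data, followed by the data itself.
func appendBinunicode(dst []byte, s string) []byte {
	var n [4]byte
	binary.LittleEndian.PutUint32(n[:], uint32(len(s)))
	dst = append(dst, 'X')
	dst = append(dst, n[:]...)
	return append(dst, s...)
}

// picklePy2Bytes (hypothetical) returns a protocol 2 pickle equivalent to
// _codecs.encode(b.decode('latin1'), 'latin1'), without memo opcodes.
func picklePy2Bytes(b string) []byte {
	out := []byte{0x80, 2}                       // PROTO 2
	out = append(out, "c_codecs\nencode\n"...)   // GLOBAL _codecs.encode
	out = appendBinunicode(out, latin1Decode(b)) // 1st argument: the text
	out = appendBinunicode(out, "latin1")        // 2nd argument: the codec name
	out = append(out, 0x86)                      // TUPLE2: build (text, codec)
	out = append(out, 'R', '.')                  // REDUCE (the call), then STOP
	return out
}

func main() {
	fmt.Printf("%q\n", picklePy2Bytes("\x01\x02\x03"))
}
```

    Loading that output with pickle.loads should give b'\x01\x02\x03' on
    Python 3 and the str '\x01\x02\x03' on Python 2, in line with the
    sessions in the commit message.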
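    On the decoding side, the call has to be recognized and reversed. This
    is a sketch only: decodeCodecsEncode is a hypothetical helper that
    assumes the decoder has already resolved the GLOBAL to a module and
    name and collected the REDUCE arguments. It accepts exactly the
    'latin1' codec name that Python's pickler emits and rejects code
    points that do not fit into a byte.

```go
package main

import "fmt"

// Bytes is a hypothetical stand-in for ogórek.Bytes.
type Bytes string

// decodeCodecsEncode (hypothetical) recognizes _codecs.encode(text, 'latin1')
// and turns it back into Bytes. matched reports whether the call was the bytes
// idiom at all; if not, the caller should handle it as a generic call.
func decodeCodecsEncode(module, name string, args []interface{}) (b Bytes, matched bool, err error) {
	if module != "_codecs" || name != "encode" || len(args) != 2 {
		return "", false, nil
	}
	text, ok1 := args[0].(string)
	codec, ok2 := args[1].(string)
	if !ok1 || !ok2 || codec != "latin1" {
		return "", false, nil
	}
	// latin1 encoding: every code point must fit into a single byte.
	buf := make([]byte, 0, len(text))
	for _, r := range text {
		if r > 0xff {
			return "", true, fmt.Errorf("_codecs.encode: %q is not representable in latin1", r)
		}
		buf = append(buf, byte(r))
	}
	return Bytes(buf), true, nil
}

func main() {
	b, ok, err := decodeCodecsEncode("_codecs", "encode", []interface{}{"\x01\x02\x03", "latin1"})
	fmt.Println(ok, err, []byte(b))
}
```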
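    The resulting encoder dispatch can be summarized with a heavily
    simplified sketch. opcodeFamily and its return labels are
    hypothetical and informal (a trailing * stands for the family of
    related opcodes); the point is only the mapping from Go type and
    protocol to pickle representation, including []byte staying on the
    string path.

```go
package main

import "fmt"

// Hypothetical stand-ins for the types discussed above.
type Bytes string   // Python bytes
type unicode string // internal: always pickled as a unicode object

// opcodeFamily (hypothetical, heavily simplified) reports which pickle
// representation each Go type would be encoded with for a given protocol.
func opcodeFamily(v interface{}, proto int) string {
	switch v.(type) {
	case Bytes:
		if proto >= 3 {
			return "BINBYTES*"
		}
		return "_codecs.encode"
	case unicode:
		return "UNICODE*"
	case string, []byte:
		// []byte keeps its previous behavior and is handled like string.
		if proto >= 3 {
			return "UNICODE*"
		}
		return "STRING*"
	}
	return "unsupported"
}

func main() {
	values := []interface{}{Bytes("\x01"), unicode("abc"), "abc", []byte("abc")}
	for _, proto := range []int{2, 3} {
		for _, v := range values {
			fmt.Printf("proto %d: %T -> %s\n", proto, v, opcodeFamily(v, proto))
		}
	}
}
```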