    Fix/Add support for []byte (= bytearray on Python side) · 30ebda02
    Kirill Smelkov authored
    Since 2013 (starting from 555efd8f "first draft of dumb pickle encoder"),
    the state of ogórek with respect to []byte was:
    
    1. []byte was encoded as string
    2. there was no way to decode a pickle object into []byte
    
    Then, since

    - []byte encoding was never explicitly tested, and
    - I could not find any use of such encoding when searching through all the
      Free / Open-Source software that uses ogórek (I searched via "Uses" of
      NewEncoder on godoc):

      https://sourcegraph.com/github.com/kisielk/og-rek/-/blob/encode.go#L48:6=&tab=references:external

    it is likely that []byte encoding support was added only for its own sake
    and for convenience, and was then never used. It is also likely that the
    original author no longer uses the ogórek encoder:
    
    	https://github.com/kisielk/og-rek/pull/52#issuecomment-423639026
    
    For those reasons I tend to think that it should be relatively safe to
    change how []byte is handled:
    
    - the reason to change []byte handling is that []byte is currently a kind
      of exception: we can only encode it, but cannot decode anything into it.
      Currently an encode/decode roundtrip for []byte gives string, which breaks
      the property, held for all other basic types, that encode/decode is the
      identity.
    
    - on a similar topic, on encoding strings are assumed to be UTF-8 and are
      emitted with UNICODE opcodes for protocol >= 3. Passing arbitrary bytes
      there does not seem right.
    
    - as to how to change []byte handling: sadly []byte cannot be used as the
      counterpart of Python's bytes. In fact, in the previous patch we just
      added ogórek.Bytes as the counterpart for Python bytes. We did not use
      []byte for that because, contrary to Python bytes, []byte cannot be used
      as a dict key.
    
    - the most natural counterpart for Go's []byte is thus Python's
      bytearray:
    
    	https://docs.python.org/3/library/stdtypes.html#bytearray-objects
    
      which is "a mutable counterpart to bytes objects"
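
    The last two points can be illustrated with a small stdlib-only sketch.
    The Bytes type below is a hypothetical string-based stand-in used only
    for illustration (an assumption, not ogórek's own definition), and
    canBeMapKey is a helper invented for this example. It shows that a
    string-based type can serve as a map key, as Python bytes can serve as a
    dict key, while []byte cannot, and that []byte is mutable in place, as
    Python bytearray is:

```go
package main

import (
	"fmt"
	"reflect"
)

// Bytes is a hypothetical immutable byte-string type built on a Go string.
// Assumption: it only stands in for the role ogórek.Bytes plays; it is not
// the library's definition.
type Bytes string

// canBeMapKey reports whether values of v's type are comparable, which is
// what Go requires of map keys. (Helper invented for this example.)
func canBeMapKey(v interface{}) bool {
	return reflect.TypeOf(v).Comparable()
}

func main() {
	fmt.Println(canBeMapKey(Bytes("abc")))  // true
	fmt.Println(canBeMapKey([]byte("abc"))) // false: slices are not comparable

	// A string-based type works as a map key, like Python bytes in a dict.
	m := map[Bytes]int{Bytes("\x00\xff"): 1}
	fmt.Println(m[Bytes("\x00\xff")]) // 1

	// []byte can be modified in place, like Python bytearray.
	b := []byte("abc")
	b[0] = 'x'
	fmt.Println(string(b)) // xbc
}
```

    Writing map[[]byte]int directly is rejected by the Go compiler for the
    same reason, which is why the check above goes through reflect.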
    
    So add decoding of Python's bytearray into []byte, and correspondingly
    change the encoder to emit []byte as bytearray.
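
    As a side illustration of the UTF-8 concern raised above: arbitrary
    []byte data is not necessarily valid UTF-8, so emitting it through an
    opcode meant for text is questionable. A minimal sketch using only the
    standard library (looksLikeText is a helper invented for this example,
    not part of ogórek):

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// looksLikeText reports whether data is valid UTF-8, i.e. whether it could
// safely be treated as a text string. (Helper invented for this example.)
func looksLikeText(data []byte) bool {
	return utf8.Valid(data)
}

func main() {
	text := []byte("ogórek")              // UTF-8 encoded text
	raw := []byte{0xff, 0xfe, 0x00, 0x80} // arbitrary binary data

	fmt.Println(looksLikeText(text)) // true
	fmt.Println(looksLikeText(raw))  // false
}
```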
    
    P.S.
    
    This changes encoder semantics wrt []byte. If some ogórek use breaks
    somewhere because of it, we could add an encoder option to restore the
    backward-compatible behaviour. However, since I suspect no one was
    actually encoding []byte into pickles, I prefer that we wait for someone
    to speak up first instead of loading EncoderConfig with confusing options
    that nobody will ever use.