Commit 06e06939 authored by Kirill Smelkov, committed by Kamil Kisiel

encoder: Allow to specify pickle protocol version

Pickle has several protocol versions, 0 to 4. Python2, for example,
understands only versions 0 - 2. However, we currently emit opcodes from
higher versions unconditionally, for example STACK_GLOBAL (from
version 4) when encoding a Class, which makes pickles generated by
ogórek impossible to decode on Python2.
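
To make the incompatibility concrete, here is a minimal sketch of how a class reference could be encoded depending on the protocol. The helper `encodeClassRef` is hypothetical and is not ogórek's actual code; the opcode values are those of Python's pickle module.

```go
package main

import (
	"errors"
	"fmt"
)

// Pickle opcodes used below (values as defined by Python's pickle module).
const (
	opGlobal          = 'c'    // protocol 0: "c" module "\n" name "\n"
	opShortBinUnicode = '\x8c' // protocol 4: 1-byte length followed by UTF-8 bytes
	opStackGlobal     = '\x93' // protocol 4: takes module and name from the stack
)

// encodeClassRef is a hypothetical helper for illustration only.
// It returns the opcodes referencing module.name at the given protocol.
func encodeClassRef(proto int, module, name string) ([]byte, error) {
	if proto >= 4 {
		if len(module) > 255 || len(name) > 255 {
			return nil, errors.New("name too long for SHORT_BINUNICODE")
		}
		// Push module and name as strings, then combine them with STACK_GLOBAL.
		out := []byte{opShortBinUnicode, byte(len(module))}
		out = append(out, module...)
		out = append(out, opShortBinUnicode, byte(len(name)))
		out = append(out, name...)
		return append(out, opStackGlobal), nil
	}
	// GLOBAL is a text opcode that every protocol version understands.
	return []byte(fmt.Sprintf("%c%s\n%s\n", opGlobal, module, name)), nil
}

func main() {
	for _, proto := range []int{2, 4} {
		p, err := encodeClassRef(proto, "foo", "Bar")
		fmt.Printf("protocol %d: %q (err: %v)\n", proto, p, err)
	}
}
```

Only the first form can be read by a Python2 unpickler, which is why the choice has to depend on the requested protocol.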

Similarly, protocol 0 specifies that only text opcodes should be used,
yet we currently unconditionally emit e.g. BININT (from protocol 1)
when encoding integers.
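
As an illustration of the difference, the sketch below encodes a 32-bit integer either as the text opcode INT or as the binary opcode BININT. The helper `encodeInt32` is hypothetical and only shows the idea; it is not how ogórek is actually structured.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"strconv"
)

const (
	opInt    = 'I' // protocol 0: decimal text terminated by "\n"
	opBinint = 'J' // protocol 1: 4-byte little-endian signed integer
)

// encodeInt32 is a hypothetical helper for illustration only.
// It picks the integer opcode that the given protocol allows.
func encodeInt32(proto int, v int32) []byte {
	if proto >= 1 {
		out := []byte{opBinint, 0, 0, 0, 0}
		binary.LittleEndian.PutUint32(out[1:], uint32(v))
		return out
	}
	// Protocol 0 permits text opcodes only.
	out := []byte{opInt}
	out = strconv.AppendInt(out, int64(v), 10)
	return append(out, '\n')
}

func main() {
	for _, proto := range []int{0, 1} {
		fmt.Printf("protocol %d: %q\n", proto, encodeInt32(proto, 5))
	}
}
```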

Switching to always using protocol 0 opcodes would not be good either,
since many opcodes for efficiently encoding integers, booleans, unicode
etc. are available only in protocol versions 2 and 4.

For this reason, similarly to Python[1], let's allow users to specify the
desired pickle protocol when creating an Encoder with a config. For
backward compatibility and common sense, the protocol version that plain
NewEncoder selects is 2.

This commit adds only the user interface described above and the testing
infrastructure for verifying the result of encoding an object at a
particular protocol version.

For now, the encoder output matches what it should generate for only a
few of the pickle test vectors. The next patches will fix the encoder
step by step.

[1] https://docs.python.org/3/library/pickle.html#pickle.dump
parent 93075d82
@@ -10,6 +10,8 @@ import (
 	"strings"
 )
+const highestProtocol = 4 // highest protocol version we support generating
 type TypeError struct {
 	typ string
 }
@@ -26,6 +28,9 @@ type Encoder struct {
 // EncoderConfig allows to tune Encoder.
 type EncoderConfig struct {
+	// Protocol specifies which pickle protocol version should be used.
+	Protocol int
 	// PersistentRef, if !nil, will be used by encoder to encode objects as persistent references.
 	//
 	// Whenever the encoders sees pointer to a Go struct object, it will call
@@ -39,7 +44,10 @@ type EncoderConfig struct {
 // NewEncoder returns a new Encoder struct with default values
 func NewEncoder(w io.Writer) *Encoder {
-	return NewEncoderWithConfig(w, &EncoderConfig{})
+	return NewEncoderWithConfig(w, &EncoderConfig{
+		// allow both Python2 and Python3 to decode what ogórek produces by default
+		Protocol: 2,
+	})
 }
 // NewEncoderWithConfig is similar to NewEncoder, but allows specifying the encoder configuration.
@@ -49,6 +57,18 @@ func NewEncoderWithConfig(w io.Writer, config *EncoderConfig) *Encoder {
 // Encode writes the pickle encoding of v to w, the encoder's writer
 func (e *Encoder) Encode(v interface{}) error {
+	proto := e.config.Protocol
+	if !(0 <= proto && proto <= highestProtocol) {
+		return fmt.Errorf("pickle: encode: invalid protocol %d", proto)
+	}
+	// protocol >= 2 -> emit PROTO <protocol>
+	if proto >= 2 {
+		err := e.emit(opProto, byte(proto))
+		if err != nil {
+			return err
+		}
+	}
 	rv := reflectValueOf(v)
 	err := e.encode(rv)
 	if err != nil {
......
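
The protocol check and PROTO header added to Encode can be tried in isolation with the standalone sketch below. `protoHeader` is a hypothetical function written for this illustration; it mirrors the logic in the hunk above but is not part of ogórek. The PROTO opcode value `\x80` is the one defined by Python's pickle module.

```go
package main

import "fmt"

const (
	highestProtocol = 4      // highest protocol version, as in the diff above
	opProto         = '\x80' // PROTO opcode, followed by one byte with the version
)

// protoHeader is a hypothetical helper for illustration only.
// It returns the bytes that precede the encoded object at the given protocol.
func protoHeader(proto int) ([]byte, error) {
	if proto < 0 || proto > highestProtocol {
		return nil, fmt.Errorf("pickle: encode: invalid protocol %d", proto)
	}
	// Protocols 0 and 1 have no header; PROTO exists since protocol 2.
	if proto < 2 {
		return nil, nil
	}
	return []byte{opProto, byte(proto)}, nil
}

func main() {
	for _, proto := range []int{0, 2, 4, 5} {
		h, err := protoHeader(proto)
		fmt.Printf("protocol %d: header %q, err: %v\n", proto, h, err)
	}
}
```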