Commits · 0d666f6a71671d21f37e3a6dd0df05d81cebb66e · Jérome Perrin / zodbtools

01 Sep, 2023 1 commit

test/gen_testdata: Adjust it to match current testdata/ state · 0d666f6a

Kirill Smelkov authored Sep 01, 2023

In 80559a94 ("zodbdump: support --pretty option with a format to show
pickles disassembly") we added support for zodbdump --pretty and
adjusted files in testdata/ to be named like 1.zdump.{raw,zpickledis}.ok
instead of just 1.zdump.ok. However, it seems that this renaming and the
generation of 1.zdump.zpickledis.ok were done by hand, because rerunning
gen_testdata.py still regenerates the old 1.zdump.ok. It seems that
during nexedi/zodbtools!22 I missed that gen_testdata.py had not been
updated.

-> Fix it.

Running gen_testdata.py with py2 and ZODB 5.8.1 regenerates *.fs and
*.ok files in testdata/ in exactly the same state as they were before.

08 Sep, 2022 1 commit

Port zodbtools to py3 · 7ae5ff82

Kirill Smelkov authored Sep 08, 2022

The penultimate patch needs `bstr` from pygolang to work correctly (see
kirr/pygolang@c9648c44), but it won't hurt to merge this without waiting
for the pygolang bits: without bstr, zodbtools continues to work on py2,
and it is only py3 mode that will not fully work.

Previous discussions and py3 porting attempts:

- nexedi/zodbtools!8 (comment 73726)
- nexedi/zodbtools!12
- conversation from nexedi/zodbtools!13 (comment 81553) to nexedi/zodbtools!13 (comment 81874)
- nexedi/zodbtools!19 (comment 129023)
- kirr/zodbtools@42799cf6 (comment 166403)

/reviewed-by @jerome
/reviewed-on nexedi/zodbtools!23

07 Sep, 2022 7 commits

analyze: test: Fix tidmin thinko in "empty range" test · 65ebbe7b

Kirill Smelkov authored Sep 07, 2022

The empty-range test added in b4824ad5 (analyze: fix ZeroDivisionErrors when
report is empty) intended to use TID 0xffffffffffffffff, but used the plain
string 'ffffffffffffffff' instead. It passed on py2 partly by luck,
but on py3 it fails because tidmin has the wrong type:

    _______________________________ test_zodbanalyze _______________________________

    tmpdir = local('/tmp/pytest-of-kirr/pytest-30/test_zodbanalyze0')
    capsys = <_pytest.capture.CaptureFixture object at 0x7f7bb3f9a4f0>

        def test_zodbanalyze(tmpdir, capsys):
            ...

            # empty range
            report(
    >           analyze(
                    tfs1,
                    use_dbm=False,
                    delta_fs=False,
                    tidmin="ffffffffffffffff",
                    tidmax=None,
                ),
                csv=False,
            )

    zodbtools/test/test_analyze.py:68:
    _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
    ../../venv/py3.venv/lib/python3.9/site-packages/decorator.py:232: in fun
        return caller(func, *(extras + args), **kw)
    ../../../tools/go/pygolang/golang/__init__.py:103: in _
        return f(*argv, **kw)
    zodbtools/zodbanalyze.py:181: in analyze
        fsi = fs.iterator(tidmin, tidmax)
    ../ZODB/src/ZODB/FileStorage/FileStorage.py:1381: in iterator
        return FileIterator(self._file_name, start, stop)
    _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

    self = <ZODB.FileStorage.FileStorage.FileIterator object at 0x7f7bb348c6d0>
    filename = '/tmp/pytest-of-kirr/pytest-30/test_zodbanalyze0/1.fs'
    start = 'ffffffffffffffff', stop = None, pos = 4

        def __init__(self, filename, start=None, stop=None, pos=4):
            assert isinstance(filename, STRING_TYPES)
            file = open(filename, 'rb')
            self._file = file
            self._file_name = filename
            if file.read(4) != packed_version:
                raise FileStorageFormatError(file.name)
            file.seek(0, 2)
            self._file_size = file.tell()
            if (pos < 4) or pos > self._file_size:
                raise ValueError("Given position is greater than the file size",
                                 pos, self._file_size)
            self._pos = pos
    >       assert start is None or isinstance(start, bytes)
    E       AssertionError

    ../ZODB/src/ZODB/FileStorage/FileStorage.py:1816: AssertionError
    ------------------------------ Captured log call -------------------------------
    ERROR    ZODB.FileStorage:FileStorage.py:480 loading index
    UnicodeDecodeError: 'ascii' codec can't decode byte 0xb7 in position 25: ordinal not in range(128)

    The above exception was the direct cause of the following exception:

    Traceback (most recent call last):
      File "/home/kirr/src/wendelin/z/ZODB/src/ZODB/FileStorage/FileStorage.py", line 478, in _restore_index
        info = fsIndex.load(index_name)
      File "/home/kirr/src/wendelin/z/ZODB/src/ZODB/fsIndex.py", line 138, in load
        v = unpickler.load()
    SystemError: <built-in method read of _io.BufferedReader object at 0x7f7bb3df03b0> returned a result with an error set
    ERROR    ZODB.FileStorage:FileStorage.py:480 loading index
    UnicodeDecodeError: 'ascii' codec can't decode byte 0xb7 in position 25: ordinal not in range(128)

    ...

-> Fix it by properly preparing tidmin in the test as an 8-byte binary string.
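For illustration, a minimal sketch of turning the hex TID into the 8-byte big-endian binary form that FileIterator asserts on. `tid_from_hex` is a hypothetical helper; ZODB also offers `ZODB.utils.p64` for packing an integer, and the actual test may use that or another helper instead.

```python
import binascii
import struct


def tid_from_hex(hextid):
    """Convert a 16-digit hex TID into 8 raw bytes (hypothetical helper)."""
    tid = binascii.unhexlify(hextid)
    if len(tid) != 8:
        raise ValueError("TID must be 8 bytes, got %d" % len(tid))
    return tid


# Wrong: a 16-character text string; on py3 FileIterator rejects it via
#   assert start is None or isinstance(start, bytes)
bad_tidmin = "ffffffffffffffff"

# Right: 8 raw bytes, the same as packing 0xffffffffffffffff big-endian
tidmin = tid_from_hex("ffffffffffffffff")
packed = struct.pack(">Q", 0xffffffffffffffff)
```

The resulting `tidmin` is what would then be passed as `analyze(..., tidmin=tidmin, ...)`.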

*: Fix working on py3 by using bstr bytestring instead of raw bytes · 9861c136

Kirill Smelkov authored Sep 07, 2022

e.g. for ObjectData .hashfunc:

In many contexts we need .hashfunc to behave like a string, e.g. for
accessing hashRegistry by key. In many other contexts, e.g. when
zodbdump input is parsed or emitted, it is more handy to handle it as
raw bytes.

If .hashfunc is of type str, it breaks the second mode. If it is of
type bytes, it breaks the first mode.

Also, in many places it is hard to constantly encode/decode between str
and bytes, especially where an object is sometimes used in a string
context and sometimes in a binary context.

-> Fix it all in one go by using the bytestring type from pygolang,
which provides both unicode string and binary semantics simultaneously.

This needs bstr from pygolang (see kirr/pygolang@c9648c44),
but even if pygolang comes without bstr, with this patch zodbtools
continues to work on py2; it is only py3 mode that won't work.
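To make the idea concrete, here is a toy `DualStr` type: a bytes subclass that also compares and hashes like the corresponding text string. It is a hypothetical illustration only, not pygolang's `bstr`, which has much richer semantics; it just shows how one value can serve both as a key into a str-keyed registry (like hashRegistry) and as an operand of `b'...' % ...` formatting.

```python
class DualStr(bytes):
    """Toy bytes type that also behaves like its UTF-8 decoded str.

    Hypothetical illustration, not pygolang's bstr.
    """

    def __str__(self):
        # str(x) and '%s' % x yield text instead of "b'...'"
        return self.decode('utf-8')

    def __eq__(self, other):
        if isinstance(other, str):
            return str(self) == other
        return bytes.__eq__(self, other)

    def __ne__(self, other):
        eq = self.__eq__(other)
        return eq if eq is NotImplemented else not eq

    def __hash__(self):
        # hash like the str, so lookups in str-keyed dicts find the entry
        return hash(str(self))


registry = {'sha1': 'SHA-1 hash function'}   # string context
hashfunc = DualStr(b'sha1')

found = registry[hashfunc]                   # str-keyed lookup works
line = b'obj %s' % hashfunc                  # binary formatting works
text = 'unknown hash function %s' % hashfunc # text formatting works
```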

The list of test failures before this patch is provided below:

    _______________________________ test_zodbanalyze _______________________________

    tmpdir = local('/tmp/pytest-of-kirr/pytest-22/test_zodbanalyze0')
    capsys = <_pytest.capture.CaptureFixture object at 0x7f3de6835c70>

        def test_zodbanalyze(tmpdir, capsys):
            tfs1 = fs1_testdata_py23(tmpdir,
                            os.path.join(os.path.dirname(__file__), "testdata", "1.fs"))

            for use_dbm in (False, True):
    >           report(
                    analyze(
                        tfs1,
                        use_dbm=use_dbm,
                        delta_fs=False,
                        tidmin=None,
                        tidmax=None,
                    ),
                    csv=False,
                )

    zodbtools/test/test_analyze.py:30:
    _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

    rep = <zodbtools.zodbanalyze.Report object at 0x7f3de5e16b20>, csv = False

        def report(rep, csv=False):
            ...
                    print (fmtp % (t_display, rep.TYPEMAP[t], rep.TYPESIZE[t],
                                   pct, rep.TYPESIZE[t] * 1.0 / rep.TYPEMAP[t],
    >                              rep.COIDSMAP[t], rep.CBYTESMAP[t],
                                   rep.FOIDSMAP.get(t, 0), rep.FBYTESMAP.get(t, 0)))
    E               KeyError: b'persistent.mapping.PersistentMapping'

    zodbtools/zodbanalyze.py:147: KeyError

    ____________________________ test_zodbcommit[!zext] ____________________________

    zext = <function zext.<locals>._ at 0x7f3deb5c3e50>

        @func
        def test_zodbcommit(zext):
            tmpd = mkdtemp('', 'zodbcommit.')
            defer(lambda: rmtree(tmpd))

            stor = storageFromURL('%s/2.fs' % tmpd)
            defer(stor.close)

            head = stor.lastTransaction()

            # commit some transactions via zodbcommit and verify if storage dump gives
            # what is expected.
            t1 = Transaction(z64, ' ', b'user name', b'description ...', zext(dumps({'a': 'b'}, _protocol)), [
                ObjectData(p64(1), b'data1', 'sha1', sha1(b'data1')),
                ObjectData(p64(2), b'data2', 'sha1', sha1(b'data2'))])

            t1.tid = zodbcommit(stor, head, t1)

            t2 = Transaction(z64, ' ', b'user2', b'desc2', b'', [
                ObjectDelete(p64(2))])

            t2.tid = zodbcommit(stor, t1.tid, t2)

            buf = BytesIO()
            zodbdump(stor, p64(u64(head)+1), None, out=buf)
            dumped = buf.getvalue()

    >       assert dumped == b''.join([_.zdump() for _ in (t1, t2)])

    zodbtools/test/test_commit.py:61:
    _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
    zodbtools/test/test_commit.py:61: in <listcomp>
        assert dumped == b''.join([_.zdump() for _ in (t1, t2)])
    zodbtools/zodbdump.py:521: in zdump
        z += obj.zdump()
    _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

    self = <zodbtools.zodbdump.ObjectData object at 0x7f3de5d26d90>

        def zdump(self):
            data = self.data
            hashonly = isinstance(data, HashOnly)
            if hashonly:
                size = data.size
            else:
                size = len(data)
    >       z = b'obj %s %d %s:%s' % (ashex(self.oid), size, self.hashfunc, ashex(self.hash_))
    E       TypeError: %b requires a bytes-like object, or an object that implements __bytes__, not 'str'

    zodbtools/zodbdump.py:569: TypeError

    _______________________________ test_dumpreader ________________________________

        def test_dumpreader():
            in_ = b"""\
        txn 0123456789abcdef " "
        user "my name"
        description "o la-la..."
        extension "zzz123 def"
        obj 0000000000000001 delete
        obj 0000000000000002 from 0123456789abcdee
        obj 0000000000000003 54 adler32:01234567 -
        obj 0000000000000004 4 sha1:9865d483bc5a94f2e30056fc256ed3066af54d04
        ZZZZ
        obj 0000000000000005 9 crc32:52fdeac5
        ABC

        DEF!

        txn 0123456789abcdf0 " "
        user "author2"
        description "zzz"
        extension "qqq"

        """

            r = DumpReader(BytesIO(in_))
    >       t1 = r.readtxn()

    zodbtools/test/test_dump.py:78:
    _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
    zodbtools/zodbdump.py:443: in readtxn
        self._badline('unknown hash function %s' % qq(hashfunc))
    _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

    self = <zodbtools.zodbdump.DumpReader object at 0x7f3de5d69cd0>
    msg = 'unknown hash function "adler32"'

        def _badline(self, msg):
    >       raise RuntimeError("%s+%d: invalid line: %s (%s)" % (_ioname(self._r), self.lineno, msg, qq(self._line)))
    E       RuntimeError: +7: invalid line: unknown hash function "adler32" ("obj 0000000000000003 54 adler32:01234567 -")

    zodbtools/zodbdump.py:382: RuntimeError

    ___________________________ test_zodbrestore[!zext] ____________________________

    tmpdir = local('/tmp/pytest-of-kirr/pytest-22/test_zodbrestore__zext_0')
    zext = <function zext.<locals>._ at 0x7f3de5d6ddc0>

        @func
        def test_zodbrestore(tmpdir, zext):
            zkind = '_!zext' if zext.disabled else ''

            # restore from testdata/1.zdump.ok and verify it gives result that is
            # bit-to-bit identical to testdata/1.fs
            tdata = dirname(__file__) + "/testdata"
            @func
            def _():
                zdump = open("%s/1%s.zdump.raw.ok" % (tdata, zkind), 'rb')
                defer(zdump.close)

                stor = storageFromURL('%s/2.fs' % tmpdir)
                defer(stor.close)

                zodbrestore(stor, zdump)
    >       _()

    zodbtools/test/test_restore.py:49:
    _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
    ../../venv/py3.venv/lib/python3.9/site-packages/decorator.py:232: in fun
        return caller(func, *(extras + args), **kw)
    ../../../tools/go/pygolang/golang/__init__.py:103: in _
        return f(*argv, **kw)
    zodbtools/test/test_restore.py:48: in _
        zodbrestore(stor, zdump)
    zodbtools/zodbrestore.py:39: in zodbrestore
        txn = zr.readtxn()
    zodbtools/zodbdump.py:443: in readtxn
        self._badline('unknown hash function %s' % qq(hashfunc))
    _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

    self = <zodbtools.zodbdump.DumpReader object at 0x7f3de5d79e20>
    msg = 'unknown hash function "sha1"'

        def _badline(self, msg):
    >       raise RuntimeError("%s+%d: invalid line: %s (%s)" % (_ioname(self._r), self.lineno, msg, qq(self._line)))
    E       RuntimeError: /home/kirr/src/wendelin/z/zodbtools/zodbtools/test/testdata/1_!zext.zdump.raw.ok+5: invalid line: unknown hash function "sha1" ("obj 0000000000000000 61 sha1:664e6de0f153d8eaeda638d616a320c6e3c5feb1")

    zodbtools/zodbdump.py:382: RuntimeError

zodbcommit: Fix stdin reading on py3 · b21fbe23

Kirill Smelkov authored Sep 07, 2022

Zodbcommit reads input in zodbdump format from stdin and then uses
zodbdump.DumpReader to parse that input. The parser works on binary
data.

However, zodbcommit was preparing that input data by mixing bytes and
strings, which fails on py3:

    (py3.venv) kirr@deca:~/src/wendelin/z/zodbtools$ zodb commit 1.fs 00
    Ignoring index for /home/kirr/src/wendelin/z/zodbtools/1.fs
    aaa
    Traceback (most recent call last):
      File "/home/kirr/src/wendelin/venv/py3.venv/bin/zodb", line 33, in <module>
        sys.exit(load_entry_point('zodbtools', 'console_scripts', 'zodb')())
      File "/home/kirr/src/wendelin/z/zodbtools/zodbtools/zodb.py", line 129, in main
        return command_module.main(argv)
      File "/home/kirr/src/wendelin/venv/py3.venv/lib/python3.9/site-packages/decorator.py", line 232, in fun
        return caller(func, *(extras + args), **kw)
      File "/home/kirr/src/tools/go/pygolang/golang/__init__.py", line 103, in _
        return f(*argv, **kw)
      File "/home/kirr/src/wendelin/z/zodbtools/zodbtools/zodbcommit.py", line 222, in main
        zin += sys.stdin.read()
    TypeError: can't concat str to bytes

-> Fix it by reading stdin in binary mode.
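A minimal sketch of reading stdin in binary mode on both Pythons: on py3 the text-mode `sys.stdin` exposes the underlying binary stream as `.buffer`, while on py2 `sys.stdin` already yields `str`, which is bytes there. `read_binary` is a hypothetical name; zodbtools has its own helper for this (the traceback in commit fa00c283 shows zodbrestore calling `asbinstream(sys.stdin)`), whose implementation is not reproduced here.

```python
import io
import sys


def read_binary(stdin=None):
    """Read all of stdin as bytes (hypothetical helper)."""
    if stdin is None:
        stdin = sys.stdin
    # py3 text streams wrap a binary buffer; py2 streams are already binary
    binstream = getattr(stdin, 'buffer', stdin)
    return binstream.read()


# Usage with a simulated py3 text-mode stdin:
fake_stdin = io.TextIOWrapper(io.BytesIO(b'txn 0123456789abcdef " "\n'))
zin = b''
zin += read_binary(fake_stdin)   # bytes += bytes, no TypeError
```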

There is no test for now, as zodbcommit.main is not yet covered by tests (hopefully it will be).

zodbdump: Fix pickle disassembly on py3 · 69dc6de1

Kirill Smelkov authored Sep 07, 2022

pickletools.dis, which is used to handle --pretty=zpickledis (*),
expects the output stream to be text-like, not binary. We were passing a
binary stream to it. As a result, pickle disassembly was failing on py3:

    _______________________ test_zodbdump[!zext-zpickledis] ________________________

    tmpdir = local('/tmp/pytest-of-kirr/pytest-11/test_zodbdump__zext_zpickledis0')
    zext = <function zext.<locals>._ at 0x7f538b508670>, pretty = 'zpickledis'

        @mark.parametrize('pretty', ('raw', 'zpickledis'))
        def test_zodbdump(tmpdir, zext, pretty):
            tdir  = dirname(__file__)
            zkind = '_!zext' if zext.disabled else ''
            tfs1  = fs1_testdata_py23(tmpdir, '%s/testdata/1%s.fs' % (tdir, zkind))
            stor  = FileStorage(tfs1, read_only=True)

            with open('%s/testdata/1%s.zdump.%s.ok' % (tdir, zkind, pretty), 'rb') as f:
                dumpok = f.read()

            out = BytesIO()
    >       zodbdump(stor, None, None, pretty=pretty, out=out)

    zodbtools/test/test_dump.py:48:
    _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
    zodbtools/zodbdump.py:165: in zodbdump
        pickletools.dis(dataf, disf) # class
    _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

    pickle = <_io.BytesIO object at 0x7f538b577130>
    out = <_io.BytesIO object at 0x7f538b49f8b0>, memo = {}, indentlevel = 4
    annotate = 0

        def dis(pickle, out=None, memo=None, indentlevel=4, annotate=0):
            """Produce a symbolic disassembly of a pickle..."""
            ...
            for opcode, arg, pos in genops(pickle):
                if pos is not None:
    >               print("%5d:" % pos, end=' ', file=out)
    E               TypeError: a bytes-like object is required, not 'str'

    /usr/lib/python3.9/pickletools.py:2450: TypeError

-> Fix it by letting pickletools.dis emit its output to StringIO instead of BytesIO.
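A minimal sketch of the fix's idea: let `pickletools.dis` write text into a `StringIO`, then encode the result before appending it to binary zodbdump output. The helper name and the choice of UTF-8 are assumptions for illustration.

```python
import io
import pickle
import pickletools


def disassemble(data):
    """Return a pickletools disassembly of pickle bytes, encoded as bytes."""
    disf = io.StringIO()            # dis prints text, so give it a text stream
    pickletools.dis(io.BytesIO(data), disf)
    return disf.getvalue().encode('utf-8')


out = io.BytesIO()                  # binary output, like zodbdump's out
out.write(disassemble(pickle.dumps({'a': 1}, protocol=2)))
```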

(*) see 80559a94 "zodbdump: support --pretty option with a format to show
    pickles disassembly"

tests: Adjust testdata FileStorage for current Python on the fly · e825f80f

Kirill Smelkov authored Sep 07, 2022

FileStorage/py2 uses `FS21` magic in file header, whereas
FileStorage/py3 uses `FS30` magic:

    https://github.com/zopefoundation/ZODB/blob/0e72b8b13657/src/ZODB/_compat.py#L39
    https://github.com/zopefoundation/ZODB/blob/0e72b8b13657/src/ZODB/_compat.py#L74

If, upon opening the database, the file magic does not match what ZODB
expects, the open is rejected:

    https://github.com/zopefoundation/ZODB/blob/0e72b8b13657/src/ZODB/FileStorage/FileStorage.py#L88
    https://github.com/zopefoundation/ZODB/blob/0e72b8b13657/src/ZODB/FileStorage/FileStorage.py#L1625-L1630

The idea is that a database written from Python 2 is rejected when
opened from Python 3, and vice versa, because str/bytes semantics
changed between py2 and py3.

As a result, many zodbtools tests currently fail on py3 when they try
to access the prepared FileStorage databases in testdata/, because those
databases were originally prepared on py2. Here is, for example, how
test_zodbdump fails:

    ___________________________ test_zodbdump[zext-raw] ____________________________

    zext = <function zext.<locals>._ at 0x7f28530bf9d0>, pretty = 'raw'

        @mark.parametrize('pretty', ('raw', 'zpickledis'))
        def test_zodbdump(zext, pretty):
            tdir  = dirname(__file__)
            zkind = '_!zext' if zext.disabled else ''
    >       stor  = FileStorage('%s/testdata/1%s.fs' % (tdir, zkind), read_only=True)

    zodbtools/test/test_dump.py:41:
    _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
    ../ZODB/src/ZODB/FileStorage/FileStorage.py:315: in __init__
        self._pos, self._oid, tid = read_index(
    _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

    file = <_io.BufferedReader name='/home/kirr/src/wendelin/z/zodbtools/zodbtools/test/testdata/1.fs'>
    name = '/home/kirr/src/wendelin/z/zodbtools/zodbtools/test/testdata/1.fs'
    index = <ZODB.fsIndex.fsIndex object at 0x7f2852fee2b0>, tindex = {}
    stop = b'\xff\xff\xff\xff\xff\xff\xff\xff'
    ltid = b'\x00\x00\x00\x00\x00\x00\x00\x00', start = 4
    maxoid = b'\x00\x00\x00\x00\x00\x00\x00\x00', recover = 0, read_only = True

        def read_index(file, name, index, tindex, stop=b'\377'*8,
                       ltid=z64, start=4, maxoid=z64, recover=0, read_only=0):
            """Scan the file storage and update the index."""
            ...
            if file_size:
                if file_size < start:
                    raise FileStorageFormatError(file.name)
                seek(0)
                if read(4) != packed_version:
    >               raise FileStorageFormatError(name)
    E               ZODB.FileStorage.FileStorage.FileStorageFormatError: /home/kirr/src/wendelin/z/zodbtools/zodbtools/test/testdata/1.fs

    ../ZODB/src/ZODB/FileStorage/FileStorage.py:1630: FileStorageFormatError

Since zodbtools primarily works on raw data without decoding stored
pickles, unlike Zope or ERP5, it should not be a problem for zodbtools
to work on py3 with a database that was prepared on py2.

-> Adjust all tests to use FileStorage data generated on the fly based
on original files in testdata/ but with FileStorage header being
rewritten to match current python.
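The approach can be sketched as follows: copy a testdata `.fs` file into a temporary directory and overwrite its 4-byte header magic with the one the running Python expects. This is a simplified, hypothetical stand-in for the `fs1_testdata_py23` helper that the tests call; it hardcodes the `FS21`/`FS30` values quoted above instead of taking them from ZODB, and it does not copy any `.index` file.

```python
import os
import sys

FS_MAGIC_PY2 = b'FS21'
FS_MAGIC_PY3 = b'FS30'


def fs1_copy_for_current_python(tmpdir, path):
    """Copy FileStorage file at path into tmpdir, fixing the header magic."""
    with open(path, 'rb') as f:
        data = f.read()
    if data[:4] not in (FS_MAGIC_PY2, FS_MAGIC_PY3):
        raise ValueError('%s: not a FileStorage file' % path)
    magic = FS_MAGIC_PY2 if sys.version_info[0] == 2 else FS_MAGIC_PY3
    # only the header changes; transaction records and pickles stay raw
    dst = os.path.join(str(tmpdir), os.path.basename(path))
    with open(dst, 'wb') as f:
        f.write(magic + data[4:])
    return dst
```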

util += writefile · 3cb93096

Kirill Smelkov authored Sep 07, 2022

A counterpart to readfile - to write a file instead of reading it.
We will need this function in the next patch.
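A minimal sketch of such a readfile/writefile pair, assuming both operate on whole files as bytes. The exact signatures in zodbtools.util may differ; these bodies are illustrative only.

```python
def readfile(path):
    """Return the whole content of the file at path as bytes."""
    with open(path, 'rb') as f:
        return f.read()


def writefile(path, data):
    """Write bytes data to the file at path, replacing previous content."""
    with open(path, 'wb') as f:
        f.write(data)
```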

util: Factor readfile function into here · adec18bd

Kirill Smelkov authored Sep 07, 2022

Soon we will need to use it not only from test_restore.py.

29 Mar, 2022 1 commit

zodbdump: support --pretty option with a format to show pickles disassembly · 80559a94

Jérome Perrin authored Mar 20, 2022

Showing pickle disassembly can sometimes be useful to analyse details of
the pickle content. We realized that in some data structures used in
ERP5 the same string was saved multiple times in the same pickle, and that
by using the exact same string object (i.e. for which `s1 is s2` is True),
the pickle stores the string only once and becomes a bit smaller. For
reference, the context was nexedi/erp5!1560 (comment 154825).

This introduces a new --pretty option that we will be able to extend
later with more output formats.

Co-authored-by: Kirill Smelkov <kirr@nexedi.com>
Reviewed-on: nexedi/zodbtools!22

01 Apr, 2021 1 commit

zodbrestore: Mark restore-with-extension tests as xfail on ZODB4 · aa7e1966

Jérome Perrin authored Mar 18, 2021

@kirr wrote (!19 (comment 129442))

For reference: contrary to ZODB5, restore tests on ZODB4 are currently
[broken](https://nexedijs.erp5.net/#/test_result_module/20210317-B3AC205A/2).
The restored file is not bit-to-bit identical to the original.

The problem is that on commit/restore, we need to save
user/description/extension. For extension `zodbdump.Transaction` provides
.extension_bytes, which ZODB5 uses to save its raw copy. However ZODB4 goes
through `.extension` and pickles it:

https://lab.nexedi.com/nexedi/zodbtools/blob/129afa67/zodbtools/zodbdump.py#L425-453
https://github.com/zopefoundation/ZODB/blob/4/src/ZODB/BaseStorage.py#L220-L240

This leads to an unpickle/repickle round-trip and a different extension being committed on restore:

```diff
diff --git a/1zdump b/2zdump
index 5033bc1..a3a32aa 100644
--- a/1zdump
+++ b/2zdump
@@ -10,7 +10,7 @@ q^A.
txn 0285cbac3d0369e6 " "
user "user0.0"
description "step 0.0"
-extension "\x80\x02}q\x01(U\tx-cookieSU\x05RF9IEU\vx-generatorq\x02U\fzodb/py2 (f)u."
+extension "}q\x01(U\tx-cookieSU\x05RF9IEU\vx-generatorU\fzodb/py2 (f)u."
obj 0000000000000000 98 sha1:eba252d1984f975ecb636bc1b3a89c953dd20527
...
```

What might save us is for Transaction.extension to somehow return a
dict-subclass object that pickles to exactly the bytes remembered when
it was created. However, after briefly checking, I could not find a mechanism
to do so yet...

@jerome wrote (!19 (comment 129479))

@kirr we already have pytest fixtures to test differently depending on whether
the ZODB version has support for extension_bytes, so what about using it in the
test and testing restoring the extension bytes version of the dump only for
ZODB5 ?

@kirr wrote (!19 (comment 129482))

@jerome, yes we have this, but I believe we should actually fix zodbrestore to
be reliable whichever ZODB version is used. For ZODB5 it works. For ZODB4-wc2 we can
adjust ZODB code to use extension_bytes similarly to how ZODB5 does. But
unpatched ZODB4 is currently out of luck. As it was decided that Nexedi will
use both ZODB4 and ZODB4-wc2, I think we should fix zodbrestore to work on all
those versions to be reliable.

/cc @tomo

@kirr:

-> No universal ZODB4 fix for now (this would require monkey-patching ZODB in
several places), so mark the "restore with extension" test as xfail, similarly to
how we already do for the "dump with extension" test.

This brings -ZODB4 and -ZODB4-wc2 tests back to PASS state.

Even though on ZODB4 the extension is not restored bit-to-bit exactly, it is
restored to a dictionary equal to the one that was used to produce the
dump. Not ideal, but in practice no information is lost.
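The difference between "bit-to-bit" and "equal dictionary" can be illustrated with the standard `pickle` module: re-pickling an unpickled extension with different settings yields different bytes for an equal dict. The dict content is taken from the diff in the quoted comment; the protocol numbers are an illustrative assumption, loosely modeled on the `\x80\x02` header present in the original extension and absent from the restored one (ZODB4/py2 uses `_protocol = 1`, as shown in commit c59a54ca).

```python
import pickle

ext = {'x-cookie': 'RF9IE', 'x-generator': 'zodb/py2 (f)'}

original = pickle.dumps(ext, protocol=2)                     # starts with PROTO 2
restored = pickle.dumps(pickle.loads(original), protocol=1)  # no PROTO opcode

bit_exact = original == restored           # bytes differ
same_dict = pickle.loads(restored) == ext  # information preserved
```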

One more reason to switch to ZODB5...

16 Mar, 2021 2 commits

zodbcommit: Provide full context when reporting errors · 129afa67

Kirill Smelkov authored Mar 16, 2021

In the previous patch we taught the object copy handler to report more
details, but it was still incomplete: the error was missing details
about which operation was being run, commit or restore of a particular
transaction.

It can also be noted that other errors reported from that
function lack such context.

-> So fix it universally, at least for zodbcommit for now: set the top-level
runctx to describe what we are doing, and use that runctx when
generating errors. Runctx describes what we are running, and could
also be used later for logging and tracing. That's why it is called runctx
instead of just errctx for "error context".

TODO: currently only exceptions that we explicitly raise get
the context. If an exception is raised by something that we call, the
context won't be added. It would be good to later rework error handling
and append such context to any raised error. Defer and
https://lab.nexedi.com/kirr/go123/blob/863c4602/xerr/__init__.py have
something preliminary for this.

The particular error when restoring a missing object copy becomes

    ValueError: /tmp/demo002868462/δ0285cbac75555580/δ.fs: restore 0285cbacb70a3db3 @0285cbacb258bf66: object 0000000000000003: copy from @0285cbac70a3d733: no data

instead of older

    ValueError: /tmp/demo358030847/δ0285cbac75555580/δ.fs: object 0000000000000003: copy from @0285cbac70a3d733: no data
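A sketch of how such a message can be assembled: a run context string describing the operation is prepended to each explicitly raised error. The helper names are hypothetical; the values are the ones from the new message quoted above.

```python
def make_runctx(stor_name, op, tid, at):
    """Describe what is being run, e.g. 'restore <tid> @<at>' on a storage."""
    return '%s: %s %s @%s' % (stor_name, op, tid, at)


def copy_nodata_error(runctx, oid, copy_from):
    """Build the error for an object copy whose source has no data."""
    return ValueError('%s: object %s: copy from @%s: no data'
                      % (runctx, oid, copy_from))


runctx = make_runctx('/tmp/demo002868462/δ0285cbac75555580/δ.fs',
                     'restore', '0285cbacb70a3db3', '0285cbacb258bf66')
err = copy_nodata_error(runctx, '0000000000000003', '0285cbac70a3d733')
```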

/reviewed-by @jerome
/reviewed-on !20

zodbcommit: Robustify copy handling · fa00c283

Kirill Smelkov authored Mar 16, 2021

When zodbdump input says to copy an object, we first load that object.
However, if the object does not exist, loadBefore raises POSKeyError, and if
the object was deleted at the copied-from revision, loadBefore returns None.

-> Handle that explicitly to provide failure details to the user, so
that instead of cryptic

    === RUN   TestLoad/δstart=0285cbac75555580
    Traceback (most recent call last):
      File "/usr/lib/python2.7/runpy.py", line 174, in _run_module_as_main
        "__main__", fname, loader, pkg_name)
      File "/usr/lib/python2.7/runpy.py", line 72, in _run_code
        exec code in run_globals
      File "/home/kirr/src/wendelin/z/zodbtools/zodbtools/zodb.py", line 133, in <module>
        main()
      File "/home/kirr/src/wendelin/z/zodbtools/zodbtools/zodb.py", line 129, in main
        return command_module.main(argv)
      File "<decorator-gen-6>", line 2, in main
      File "/home/kirr/src/tools/go/pygolang/golang/__init__.py", line 103, in _
        return f(*argv, **kw)
      File "/home/kirr/src/wendelin/z/zodbtools/zodbtools/zodbrestore.py", line 94, in main
        zodbrestore(stor, asbinstream(sys.stdin), _)
      File "/home/kirr/src/wendelin/z/zodbtools/zodbtools/zodbrestore.py", line 43, in zodbrestore
        zodbcommit(stor, at, txn)
      File "/home/kirr/src/wendelin/z/zodbtools/zodbtools/zodbcommit.py", line 122, in zodbcommit
        _()
      File "/home/kirr/src/wendelin/z/zodbtools/zodbtools/zodbcommit.py", line 91, in _
        data, _, _ = stor.loadBefore(obj.oid, p64(u64(obj.copy_from)+1))
    TypeError: 'NoneType' object is not iterable
        xtesting.go:483: /tmp/demo009767458/δ0285cbac75555580/δ.fs: zpyrestore: exit status 1

it fails with something more understandable:

    === RUN   TestLoad/δstart=0285cbac75555580
    Traceback (most recent call last):
      File "/usr/lib/python2.7/runpy.py", line 174, in _run_module_as_main
        "__main__", fname, loader, pkg_name)
      File "/usr/lib/python2.7/runpy.py", line 72, in _run_code
        exec code in run_globals
      File "/home/kirr/src/wendelin/z/zodbtools/zodbtools/zodb.py", line 133, in <module>
        main()
      File "/home/kirr/src/wendelin/z/zodbtools/zodbtools/zodb.py", line 129, in main
        return command_module.main(argv)
      File "<decorator-gen-6>", line 2, in main
      File "/home/kirr/src/tools/go/pygolang/golang/__init__.py", line 103, in _
        return f(*argv, **kw)
      File "/home/kirr/src/wendelin/z/zodbtools/zodbtools/zodbrestore.py", line 94, in main
        zodbrestore(stor, asbinstream(sys.stdin), _)
      File "/home/kirr/src/wendelin/z/zodbtools/zodbtools/zodbrestore.py", line 43, in zodbrestore
        zodbcommit(stor, at, txn)
      File "/home/kirr/src/wendelin/z/zodbtools/zodbtools/zodbcommit.py", line 129, in zodbcommit
        _()
      File "/home/kirr/src/wendelin/z/zodbtools/zodbtools/zodbcommit.py", line 97, in _
        (stor.getName(), ashex(obj.oid), ashex(obj.copy_from)))
    ValueError: /tmp/demo358030847/δ0285cbac75555580/δ.fs: object 0000000000000003: copy from @0285cbac70a3d733: no data
        xtesting.go:483: /tmp/demo358030847/δ0285cbac75555580/δ.fs: zpyrestore: exit status 1

For the implementation it would be easier to use loadAt
(https://github.com/zopefoundation/ZODB/pull/323), but we don't have
that yet.
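A sketch of the explicit handling, run against a small in-memory fake storage so that it is self-contained. `FakeStorage`, `load_copy_from`, and the local `POSKeyError` class are hypothetical stand-ins (the real exception is `ZODB.POSException.POSKeyError`); only the two failure modes, the `loadBefore(oid, p64(u64(copy_from)+1))` call, and the error text come from the description and tracebacks above.

```python
import binascii
import struct


class POSKeyError(KeyError):
    """Stand-in for ZODB.POSException.POSKeyError."""


def p64(v):
    return struct.pack('>Q', v)


def u64(b):
    return struct.unpack('>Q', b)[0]


def ashex(b):
    return binascii.hexlify(b).decode('ascii')


def load_copy_from(stor, oid, copy_from):
    """Load data of oid as of revision copy_from, failing with details."""
    try:
        # loadBefore(oid, tid) returns the revision right before tid
        xdata = stor.loadBefore(oid, p64(u64(copy_from) + 1))
    except POSKeyError:
        xdata = None                    # object does not exist at all
    if xdata is None:                   # missing, or deleted at copy_from
        raise ValueError('%s: object %s: copy from @%s: no data'
                         % (stor.getName(), ashex(oid), ashex(copy_from)))
    data, _, _ = xdata
    return data


class FakeStorage(object):
    """Hypothetical storage: oid 1 has data, oid 3 is deleted, others missing."""

    def getName(self):
        return 'fake.fs'

    def loadBefore(self, oid, before):
        if oid == p64(1):
            return (b'data1', p64(0x10), None)
        if oid == p64(3):
            return None
        raise POSKeyError(oid)
```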

/reviewed-by @jerome
/reviewed-on nexedi/zodbtools!20

15 Mar, 2021 4 commits

restore, commit: Help += note that those tools are low-level · 4275f2e9

Kirill Smelkov authored Mar 15, 2021

Suggested by @jerome here: nexedi/zodbtools!19 (comment 129181)

Co-authored-with: Jérome Perrin <jerome@nexedi.com>
/reviewed-on nexedi/zodbtools!19

zodbrestore - Tool to restore content of a ZODB database from zodbdump output · b944e0ee

Kirill Smelkov authored Mar 10, 2021

Zodbrestore is the long-awaited counterpart to zodbdump.
Its implementation is internally based on a reworked zodbcommit.

For FileStorage, a test verifies that the restored database is bit-to-bit
identical to the original.

For NEO this won't exactly be the case, as NEO does not implement
IStorageRestoreable: there is only tpc_begin(tid=...) but no restore().

/helped-by @jerome
/reviewed-on nexedi/zodbtools!19

zodbcommit: Prepare to compute current serial of an oid lazily · e7b82a96

Kirill Smelkov authored Mar 10, 2021

The current serial will not be needed on the new codepaths to be added to
zodbcommit in the next patch.

-> Move the computation into a function, so that it is triggered only from
places where knowing the current serial is actually needed.
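A minimal sketch of the idea, assuming the computation is wrapped into a memoized closure. The names are hypothetical, and the actual patch may simply call a plain function wherever the serial is needed, without caching.

```python
def lazy(compute):
    """Return a function that runs compute() on first call and caches it."""
    cache = []

    def get():
        if not cache:
            cache.append(compute())
        return cache[0]
    return get


calls = []


def compute_current_serial():
    calls.append(1)          # stands in for a storage lookup
    return b'\x00' * 8


current_serial = lazy(compute_current_serial)
# nothing is computed yet: codepaths that never need the serial pay nothing
```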

/reviewed-by @jerome
/reviewed-on !19

zodbcommit: Don't forget to call tpc_abort on an error · 67b42fa7

Kirill Smelkov authored Mar 10, 2021

The two-phase commit protocol assumes that tpc_begin is followed either by
a successful tpc_vote + tpc_finish, or by tpc_abort. We were not
calling tpc_abort on an error, potentially leaving the storage in the
"commit is in progress" state.
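A minimal sketch of the pattern, exercised with a recording fake storage. `commit_txn` and `RecordingStorage` are hypothetical; keeping `tpc_finish` outside the `try` is a simplifying choice of this sketch, not necessarily what zodbcommit does.

```python
def commit_txn(stor, txn, store_objects):
    """Run two-phase commit, calling tpc_abort if anything before finish fails."""
    stor.tpc_begin(txn)
    try:
        store_objects(stor, txn)
        stor.tpc_vote(txn)
    except BaseException:
        stor.tpc_abort(txn)   # do not leave the storage "commit in progress"
        raise
    return stor.tpc_finish(txn)


class RecordingStorage(object):
    """Hypothetical storage that only records two-phase commit calls."""

    def __init__(self):
        self.calls = []

    def tpc_begin(self, txn):
        self.calls.append('tpc_begin')

    def tpc_vote(self, txn):
        self.calls.append('tpc_vote')

    def tpc_finish(self, txn):
        self.calls.append('tpc_finish')
        return b'\x00' * 7 + b'\x01'

    def tpc_abort(self, txn):
        self.calls.append('tpc_abort')
```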

/reviewed-by @jerome
/reviewed-on !19

10 Mar, 2021 2 commits

Drop support for ZODB3 · c59a54ca

Kirill Smelkov authored Mar 09, 2021

The Nexedi stack is dropping support for that old ZODB version; see e.g.

- slapos@70d05199
- neoppod@3a8f6f03
- wendelin.core@0802da2b

Regarding test/gen_testdata.py: even though ZODB4 uses zodbpickle, and
so should be able to load pickles encoded with protocol 3 even on
python2, in practice it does not work so well: ZODB4 tests fail if I set

    --- a/src/ZODB/_compat.py
    +++ b/src/ZODB/_compat.py
    @@ -34,7 +34,7 @@
         HIGHEST_PROTOCOL = cPickle.HIGHEST_PROTOCOL
         IMPORT_MAPPING = {}
         NAME_MAPPING = {}
    -    _protocol = 1
    +    _protocol = 3
         FILESTORAGE_MAGIC = b"FS21"
     else:
         # Python 3.x: can't use stdlib's pickle because

-> so continue to preserve protocol < 3 when generating the test
database for compatibility - now with ZODB4/py2.

/reviewed-by @jerome
/reviewed-on !18

tox: Don't run tests agains ZODB+PR183 anymore · 986baf02

Kirill Smelkov authored Mar 09, 2021

The patch that provides raw-extension functionality was merged into ZODB 5.6:

https://github.com/zopefoundation/ZODB/commit/2f8cc67a3ba3

So when testing with ZODB5 >= 5.6, the tests will exercise the code path
that uses txn.extension_bytes, and when testing with ZODB4, the tests
will exercise the code path that works around the lack of txn.extension_bytes.

/reviewed-by @jerome
/reviewed-on !18

986baf02

02 Nov, 2020 1 commit

Add way to run tests via nxdtest · 518537ea

Kirill Smelkov authored Oct 15, 2020

Nxdtest[1] is a tox-like tool to run tests under the Nexedi testing
infrastructure.

[1] https://lab.nexedi.com/nexedi/nxdtest

/reviewed-on !17

518537ea

30 Apr, 2020 1 commit

More python3 support · 05de3cb4

Kirill Smelkov authored Apr 30, 2020

Flushing changes from yet another attempt. Still not completely there yet, but closer.

Reviewed-by: @jerome
Reviewed-on: !16

05de3cb4

29 Apr, 2020 6 commits

tidrange: test: Fix for py3 · 2236aaaf

Kirill Smelkov authored Apr 29, 2020

ashex gives bytes, whereas reference_tid was str.

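A short illustration of why that test broke on py3: bytes never compare equal to str there, so the comparison is simply False rather than an error. The tid value below is made up for the example; `codecs.encode(..., 'hex')` is what the py3-fixed ashex relies on.

```python
import codecs

tid = b'\x01\x23\x45\x67\x89\xab\xcd\xef'   # example tid, not from the real test
hex_tid = codecs.encode(tid, 'hex')          # bytes on py3, like ashex()

assert hex_tid != '0123456789abcdef'         # str reference: never equal on py3
assert hex_tid == b'0123456789abcdef'        # bytes reference: matches
```
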
2236aaaf

*: dict.keys() returns sequence, not [] on py3 · 7851a964

Kirill Smelkov authored Apr 29, 2020

The returned dict_keys view cannot be indexed, e.g.

    In [5]: d = {1:2}

    In [6]: kv = d.keys()

    In [7]: kv
    Out[7]: dict_keys([1])

    In [8]: kv[0]
    ---------------------------------------------------------------------------
    TypeError                                 Traceback (most recent call last)
    <ipython-input-8-643f90e1910b> in <module>()
    ----> 1 kv[0]

    TypeError: 'dict_keys' object is not subscriptable

-> Use list(dict.keys()) in places where we need random access.
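
A small example of the fix: wrapping the view in `list()` gives an indexable list on both py2 and py3, while membership tests and `len()` keep working on the view itself.

```python
d = {1: 2}

kv = d.keys()            # a dict_keys view on py3, a list on py2
first = list(kv)[0]      # list() makes it indexable on both

assert first == 1
assert 1 in kv and len(kv) == 1   # no list() needed for these
```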

7851a964

*: Pass bytes literal into BytesIO · 2f9e0623

Kirill Smelkov authored Apr 29, 2020

Otherwise it breaks with str on py3:

	In [1]: from io import BytesIO

	In [2]: BytesIO("abc")
	---------------------------------------------------------------------------
	TypeError                                 Traceback (most recent call last)
	<ipython-input-2-52a130edd46d> in <module>()
	----> 1 BytesIO("abc")

	TypeError: a bytes-like object is required, not 'str'

2f9e0623

zodbdump: Use bytes to emit its output · d3152c78

Kirill Smelkov authored Apr 29, 2020

Zodbdump format is text-binary and is saved into files opened in binary
mode. -> We have to emit bytes - not strings - into it, since otherwise
on Python3 it would break.

This needs qq support from pygolang[1] to be able to use qq with both
string and bytestring format, e.g. for

	 "hello %s" % qq(name),	and
	b"hello %s" % qq(name)

to give the same output regardless of whether name is str or bytes.

[1] pygolang!1
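
Without qq, the underlying problem looks like this on py3 (3.5+, where PEP 461 added `%` formatting for bytes): a bytes template rejects a str argument. `as_bytes` below is a hypothetical helper that only coerces str to UTF-8 bytes; it does none of the quoting that pygolang's qq performs.

```python
def as_bytes(x):
    """Hypothetical helper: accept str or bytes, return UTF-8 encoded bytes."""
    if isinstance(x, bytes):
        return x
    return x.encode('utf-8')

# b"hello %s" % "world" raises TypeError on py3; coercing first works for both.
assert b"hello %s" % as_bytes("world") == b"hello world"
assert b"hello %s" % as_bytes(b"world") == b"hello world"
```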

d3152c78

*: Zodbdump format is semi text-binary: Mark it as such + handle zdump output as binary · ddd5fd03

Kirill Smelkov authored Apr 29, 2020

The zodbdump format is already described as semi text-binary in the top-level
zodbdump.py documentation. However, the zdump() docstring referred to it
as "text". Fix that, and use binary mode in places where zdump output is
loaded/saved.

ddd5fd03

*: Don't use %r to print/report lines/bytes to outside · bc608aea

Kirill Smelkov authored Apr 29, 2020

%r has different output for strings and bytes on python3:

	In [1]: a = 'hello'
	In [2]: b = b'hello'

	In [3]: repr(a)
	Out[3]: "'hello'"

	In [4]: repr(b)
	Out[4]: "b'hello'"

-> Use qq, whose output is the same regardless of whether the input is a string or bytes.
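
To make the idea concrete, here is a simplified, hypothetical stand-in for qq (`quote` is not pygolang's API, and real qq produces Go-style quoting with more escapes). It normalizes both input types to the same text, so str and bytes print identically, unlike `repr()`.

```python
def quote(s):
    """Hypothetical simplified qq-like quoting: same output for str and bytes."""
    if not isinstance(s, bytes):
        s = s.encode('utf-8')
    # Escape backslash and double quote before decoding.
    s = s.replace(b'\\', b'\\\\').replace(b'"', b'\\"')
    # Bytes that are not valid UTF-8 become \xNN escapes.
    return '"' + s.decode('utf-8', 'backslashreplace') + '"'

assert repr('hello') != repr(b'hello')              # "'hello'" vs "b'hello'"
assert quote('hello') == quote(b'hello') == '"hello"'
```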

bc608aea

13 Mar, 2020 1 commit

zodbinfo: Provide "head" as command to query DB head; Turn "last_tid" into... · a2e4dd23

Kirill Smelkov authored Mar 13, 2020

zodbinfo: Provide "head" as command to query DB head; Turn "last_tid" into deprecated alias for head

Similar to the Go version: kirr/neo@151d8b79.
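
A sketch of one way to keep a deprecated command name as an alias. Everything here is hypothetical: the command functions, the `commands` table, the plain dict standing in for a database, and the use of `DeprecationWarning`; the commit does not say how zodbinfo reports the deprecation.

```python
import warnings

def cmd_head(db):
    """Hypothetical command: return the DB head (last committed tid)."""
    return db['head']

def cmd_last_tid(db):
    # Deprecated alias: warn, then delegate to the new command.
    warnings.warn('"last_tid" is deprecated; use "head" instead',
                  DeprecationWarning, stacklevel=2)
    return cmd_head(db)

commands = {'head': cmd_head, 'last_tid': cmd_last_tid}
```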

a2e4dd23

14 Feb, 2020 1 commit

test/gen_testdata: Fix for ZODB5 > 5.5.1 + preserve database compatibility with ZODB3/py2 · 0b6f99da

Kirill Smelkov authored Feb 07, 2020

Starting with the upcoming ZODB 5.5.2, ZODB tries to preserve the
`extension_bytes` transaction metadata property in raw form, as it
was stored on disk in the database:

    https://github.com/zopefoundation/ZODB/commit/2f8cc67a

However, when running test/gen_testdata.py with a ZODB that has that patch,
it now breaks (gen_testdata.py refuses to work if it detects that ZODB does
not properly support the .extension_bytes property, because we want it to be
present in the generated test database [1,2]):

    $ ./gen_testdata.py
    Traceback (most recent call last):
      File "./gen_testdata.py", line 230, in <module>
        main()
      File "./gen_testdata.py", line 224, in main
        gen_testdb("%s.fs" % dbname, zext=zext)
      File "./gen_testdata.py", line 194, in gen_testdb
        stor.tpc_begin(txn)
      File "/home/kirr/src/wendelin/z/ZODB/src/ZODB/BaseStorage.py", line 193, in tpc_begin
        ext = transaction.extension_bytes
    AttributeError: 'Transaction' object has no attribute 'extension_bytes'

The breakage happens because, as specified in the ZODB interfaces [3,4], a storage
requires a ZODB.IStorageTransactionMetaData instance, not the transaction.ITransaction
instance gen_testdata.py was using. The script used to work before only by luck.

The fix is to convert the transaction instance into a storage transaction metadata
object at the place where we talk to the storage at raw level.

HOWEVER, when checking the regenerated database and its dump, I noticed:

ZODB >= 5.4.0 uses pickle protocol 3 on both python2 and python3

    https://github.com/zopefoundation/ZODB/commit/12ee41c4

In other words, it saves e.g. the OID of an object as a pickle binary, which decodes
as bytes on py3, and as zodbpickle.binary on py2 when decoding via zodbpickle.
However, it results in a *DecodeError* when decoding on py2 with the standard
pickle module. The latter means that ZODB3 will _fail_ to load data from the test
database, because ZODB3, contrary to ZODB4 and ZODB5, uses the std pickle module,
not zodbpickle.

We still care about ZODB3, and in particular it is included in the
zodbtools test matrix:

    https://lab.nexedi.com/nexedi/zodbtools/blob/7bc0385e/tox.ini#L9-14

so we cannot break it.

-> Temporarily patch ZODB at runtime to make sure it emits data with an
older protocol and without using zodbpickle.binary for oids, so that the
generated test database can be loaded on ZODB3 as well.
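
A generic sketch of "temporarily patch at runtime" using the stdlib `unittest.mock.patch.object`, which restores the original attribute when the block exits. The `compat` namespace is a hypothetical stand-in for a module-level setting such as the `_protocol` variable shown in the ZODB diff above, and the value 2 is only an example; this is not how gen_testdata.py actually patches ZODB.

```python
from types import SimpleNamespace
from unittest import mock

# Hypothetical stand-in for a module-level setting like ZODB._compat._protocol.
compat = SimpleNamespace(_protocol=3)

def generate():
    return compat._protocol   # code run under the patch sees the patched value

with mock.patch.object(compat, '_protocol', 2):
    assert generate() == 2
assert compat._protocol == 3  # original value restored on exit
```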

gen_testdata.py now works with the latest ZODB, and produces exactly the
same bit-for-bit output as before.

[1] https://lab.nexedi.com/nexedi/zodbtools/blob/7bc0385e/zodbtools/test/gen_testdata.py#L215-217
[2] https://lab.nexedi.com/nexedi/zodbtools/blob/7bc0385e/zodbtools/test/testutil.py#L31-63
[3] https://github.com/zopefoundation/ZODB/blob/5.5.1-35-gb5895a5c2/src/ZODB/interfaces.py#L815-L818
[4] https://github.com/zopefoundation/ZODB/blob/5.5.1-35-gb5895a5c2/src/ZODB/interfaces.py#L538-L575

/reviewed-on !15

0b6f99da

09 Jul, 2019 1 commit

tox: Don't duplicate setup.py on which for-tests dependencies we need · 7bc0385e

Kirill Smelkov authored Jul 08, 2019

-> Use .[test] to refer to them.
https://stackoverflow.com/a/41398850/9456786

/reviewed-by @jerome
/reviewed-on !14

7bc0385e

03 Jun, 2019 1 commit

More python3 compatibility · b44f9c0d

Kirill Smelkov authored Jun 02, 2019

@jerome, I was trying to make zodbtools work with Python3 and along that road picked some bits of your work from !12. At present the migration to Python3 is not complete, and even though I now have an answer to how to handle strings in both python2/3 in a compatible and reasonable way (I can share details if you are interested), I have to put that work on hold for some time and use https://pypi.org/project/pep3134 directly in wcfs tests, since getting all string details right, even after figuring out how to do it, will take time. Anyway, the bits presented here should be ready for master and could be merged now. Could you please have a look?

Thanks beforehand,  
Kirill

/reviewed-on !13

b44f9c0d

24 May, 2019 8 commits

zodbdump: Default out to stdout in binary mode · c5f20201

Kirill Smelkov authored May 24, 2019

Zodbdump format is mixed text+binary so dumping to unicode stdout won't
work.

Based on patch by Jérome Perrin.

c5f20201

*: s.decode('hex') -> fromhex(s) · b508f108

Kirill Smelkov authored May 24, 2019

Because on Py3:

        def test_dumpreader():
            in_ = b"""\
        txn 0123456789abcdef " "
        user "my name"
        description "o la-la..."
        extension "zzz123 def"
        obj 0000000000000001 delete
        obj 0000000000000002 from 0123456789abcdee
        obj 0000000000000003 54 adler32:01234567 -
        obj 0000000000000004 4 sha1:9865d483bc5a94f2e30056fc256ed3066af54d04
        ZZZZ
        obj 0000000000000005 9 crc32:52fdeac5
        ABC

        DEF!

        txn 0123456789abcdf0 " "
        user "author2"
        description "zzz"
        extension "qqq"

        """

            r = DumpReader(BytesIO(in_))
            t1 = r.readtxn()
            assert isinstance(t1, Transaction)
    >       assert t1.tid == '0123456789abcdef'.decode('hex')
    E       AttributeError: 'str' object has no attribute 'decode'

    test/test_dump.py:77: AttributeError

Based on patch by Jérome Perrin.
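
A sketch of a py2/py3-compatible fromhex built on `codecs.decode`, which the ashex commit further down says zodbtools' fromhex already uses; the exact body of zodbtools' util.fromhex is not shown here, so treat this as an approximation. On py3, the hex codec accepts an ASCII-only str as input.

```python
import codecs

def fromhex(s):
    # Decode a hex string into bytes; unlike s.decode('hex'), this runs on py3 too.
    return codecs.decode(s, 'hex')

assert fromhex('0123456789abcdef') == b'\x01\x23\x45\x67\x89\xab\xcd\xef'
```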

b508f108

utils: Initialize hashers with bytes · 1418c86f

Kirill Smelkov authored May 24, 2019

	self = <zodbtools.util.CRC32Hasher object at 0x7f887ae465f8>

	    def __init__(self):
	>       self._h = crc32('')
	E       TypeError: a bytes-like object is required, not 'str'

	util.py:208: TypeError

Based on patch by Jérome Perrin.
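
A sketch of an incremental CRC-32 hasher initialized from a bytes literal, assuming `crc32` comes from the stdlib `zlib` module. Only the `__init__` line is visible in the traceback above; the `update`/`hexdigest` methods here are illustrative, not copied from zodbtools/util.py.

```python
import zlib

class CRC32Hasher(object):
    """Sketch of an incremental CRC-32 hasher (methods are illustrative)."""

    def __init__(self):
        self._h = zlib.crc32(b'')      # bytes literal; '' raises TypeError on py3

    def update(self, data):
        self._h = zlib.crc32(data, self._h)   # continue from the running value

    def hexdigest(self):
        return '%08x' % (self._h & 0xffffffff)  # mask: py2 crc32 may be negative
```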

1418c86f

*: Pass bytes - not unicode - literals to sha1() · a7eee284

Kirill Smelkov authored May 24, 2019

	data = 'data1'

	    def sha1(data):
	        m = hashlib.sha1()
	>       m.update(data)
	E       TypeError: Unicode-objects must be encoded before hashing

	zodbtools/util.py:38: TypeError

Based on patch by Jérome Perrin.
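
The fix on the caller side is just to pass bytes. The sketch below assumes `sha1` returns the raw digest; what zodbtools' util.sha1 actually returns is not visible in the traceback.

```python
import hashlib

def sha1(data):
    m = hashlib.sha1()
    m.update(data)      # data must be bytes on py3, e.g. b'data1', not 'data1'
    return m.digest()   # assumption: raw 20-byte digest

digest = sha1(b'data1')
assert len(digest) == 20
```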

a7eee284

util: Fix ashex for Python3 · 7a7370e6

Kirill Smelkov authored May 24, 2019

	s = b'\x03\xc4\x85v\x00\x00\x00\x00'

	    def ashex(s):
	>       return s.encode('hex')
	E       AttributeError: 'bytes' object has no attribute 'encode'

	zodbtools/util.py:29: AttributeError

s.encode('hex') used to work on Py2 but fails on Py3:

	In [1]: s = "abc"

	In [2]: b = b"def"

	In [3]: s.encode('hex')
	---------------------------------------------------------------------------
	LookupError                               Traceback (most recent call last)
	<ipython-input-3-75ae843597fe> in <module>()
	----> 1 s.encode('hex')

	LookupError: 'hex' is not a text encoding; use codecs.encode() to handle arbitrary codecs

	In [4]: b.encode('hex')
	---------------------------------------------------------------------------
	AttributeError                            Traceback (most recent call last)
	<ipython-input-4-ec2fccff20bc> in <module>()
	----> 1 b.encode('hex')

	AttributeError: 'bytes' object has no attribute 'encode'

	In [5]: import codecs

	In [6]: codecs.encode(b, 'hex')
	Out[6]: b'646566'

	In [7]: codecs.encode(s, 'hex')
	---------------------------------------------------------------------------
	TypeError                                 Traceback (most recent call last)
	/usr/lib/python3.7/encodings/hex_codec.py in hex_encode(input, errors)
	     14     assert errors == 'strict'
	---> 15     return (binascii.b2a_hex(input), len(input))
	     16

	TypeError: a bytes-like object is required, not 'str'

	The above exception was the direct cause of the following exception:

	TypeError                                 Traceback (most recent call last)
	<ipython-input-7-7fcb16cead4f> in <module>()
	----> 1 codecs.encode(s, 'hex')

	TypeError: encoding with 'hex' codec failed (TypeError: a bytes-like object is required, not 'str')

After the patch, ashex works with bytes and raises for str.
fromhex does not need to be changed: it already uses the codecs.decode way, as
originally added in dd959b28 (zodbdump += DumpReader - to read/parse zodbdump
stream).

Based on patch by Jérome Perrin.
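
Following the `codecs.encode(b, 'hex')` line in the transcript above, a py3-compatible ashex can be sketched as follows. It mirrors the behavior described (bytes in, bytes out, TypeError for str), though the real util.ashex may differ in detail.

```python
import codecs

def ashex(s):
    """Hex-encode bytes; on py3 this returns bytes and raises TypeError for str."""
    return codecs.encode(s, 'hex')

assert ashex(b'def') == b'646566'
```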

7a7370e6

*: cStringIO.StringIO -> io.BytesIO · 62b21d01

Kirill Smelkov authored May 24, 2019

There is no cStringIO on Python3:

	test_dump.py:26: in <module>
	    from cStringIO import StringIO
	E   ModuleNotFoundError: No module named 'cStringIO'

Based on patch by Jérome Perrin.

62b21d01

zodb: rework command driver for python3 compatibility · 00a534ef

Jérome Perrin authored Jan 30, 2019

This makes the zodb command driver tests added in the previous patch pass
on both python2 and python3.

00a534ef

test: add a test for zodb command and help driver · 2d94ae9d

Jérome Perrin authored Jan 30, 2019

----

kirr: factor running `zodb ...` into zodbrun + add test for `zodb -h`.

Added test currently passes on py2, but fails on py3:

	out = <_io.TextIOWrapper encoding='UTF-8'>

	    def usage(out):
	        print("""\
	    Zodb is a tool for managing ZODB databases.

	    Usage:

	        zodb command [arguments]

	    The commands are:
	    """, file=out)

	        cmdv = command_dict.keys()
	>       cmdv.sort()
	E       AttributeError: 'dict_keys' object has no attribute 'sort'

	zodbtools/zodb.py:55: AttributeError

It will be fixed in the next patch.
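
A minimal illustration of a py2/py3-compatible replacement for `cmdv = command_dict.keys(); cmdv.sort()`: `sorted()` accepts any iterable and returns a new list. `sorted_commands` is a hypothetical name; the actual change made in the rework commit is not reproduced here.

```python
def sorted_commands(command_dict):
    # dict_keys has no .sort() on py3; sorted() returns a new list on py2 and py3.
    return sorted(command_dict.keys())

assert sorted_commands({'info': 1, 'dump': 2, 'analyze': 3}) == ['analyze', 'dump', 'info']
```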

2d94ae9d

07 Mar, 2019 1 commit

zodbtools v0.0.0.dev8 · bcaf3984

Jérome Perrin authored Mar 06, 2019

bcaf3984