Commit c58c20d6 authored by Kirill Smelkov's avatar Kirill Smelkov

wcfs: tests: xbtree.py package for inspecting/manipulating internal structure of BTrees

To handle invalidations, WCFS will need to detect changes to both ZBlk
objects and to ZBigFile.blktab BTree that is mapping file blocks to ZBlk
objects. And with BTree detecting changes is much more complex, because
when a BTree changes, it might be rebalanced, or keys migrated from one
tree/bucket node to another tree/bucket node. In other words a BTree
change might be not only a change to a {}key->value dictionary, but also
a change to BTree topology.

Because there are many BTree topologies that correspond to the same
{}key->value state, a change from kv₁ to kv₂, even if kv₁ and kv₂ are
close to each other, might be accompanied by a dramatic change to
topology of the tree. This creates a need for thoroughly testing the
BTree difference algorithm because many of BTree topologies changes are
tricky, and if a simple algorithm works on relatively stable topology
updates, it does not necessarily mean that that same algorithm will
continue to work correctly in the general case.

So, as a preparatory step, here comes xbtree.py package, that can be
used to inspect tree topologies, to create trees with specified topology
and to manipulate topology of an existing tree. This package will be
used in tests for upcoming ΔBtail.

For debugging, and also since those tests will involve both Go and
Python parts, it creates the need to be able to specify and exchange
topology of a tree via compact string. This package also defines so
called "topology encoding" to do so.

Some preliminar history:

fb56193f    X fix metric to keep Z <- N order stable over key^
809304d1    X "B:" indicates ø bucket with k&b, "B" - ø bucket with only keys
9eca74ec    X Teach AllStructs to emit topologies with values
1b962f03    X Restructure: found bug that it was not marking objects as modified
9181c5d9    X Restructure; verify that it marks as changed only modifed nodes
e9902c4a    X improve `xbtree topoview`

For the reference xbtree.py package documentation is quoted below.

---- 8< ----

Package xbtree provides utilities for inspecting/manipulating internal
structure of integer-keyed BTrees.

It will be primarily used to help verify ΔBTail in WCFS.

- `Tree` represents a tree node.
- `Bucket` represents a bucket node.
- `StructureOf` returns internal structure of ZODB BTree represented as Tree
  and Bucket nodes.
- `Restructure` reorganizes ZODB BTree instance according to specified topology
  structure.

- `AllStructs` generates all possible BTree topology structures with given keys.

Topology encoding
-----------------

Topology encoding provides way to represent structure of a Tree as path-like string.

TopoEncode converts Tree into its topology-encoded representation, while
TopoDecode decodes topology-encoded string back into Tree.

The following example illustrates topology encoding represented by string
"T3/T-T/B1-T5/B-B7,8,9":

      [ 3 ]             T3/         represents Tree([3])
       / \
     [ ] [ ]            T-T/        represents two empty Tree([])
      ↓   ↓
     |1|[ 5 ]           B1-T5/      represent Bucket([1]) and Tree([5])
         / \
        || |7|8|9|      B-B7,8,9    represents empty Bucket([]) and Bucket([7,8,9])

Topology encoding specification:

A Tree is encoded by level-order traversal, delimiting layers with "/".
Inside a layer Tree and Bucket nodes are signalled as

    "T<keys>"           ; Tree
    "B<keys>"           ; Bucket with only keys
    "B<keys+values>"    ; Bucket with keys and values

Keys are represented as ","-delimited list of integers. For example Tree
or Bucket with [1,3,5] keys are represented as

    "T1,3,5"        ; Tree([1,3,5])
    "B1,3,5"        ; Bucket([1,3,5])

Keys+values are represented as ","-delimited list of "<key>:<value>" pairs. For
example Bucket corresponding to {1:1, 2:4, 3:9} is represented as

    "B1:1,2:4,3:9"  ; Bucket([1,2,3], [1,4,9])

Empty keys+values are represented as ":" - an empty Bucket for key->value
mapping is represented as

    "B:"            ; Bucket([], [])

Nodes inside one layer are delimited with "-". For example a layer consisting
of an empty Tree, a Tree with [1,3] keys, and Bucket with [4,5] keys is
represented as

    "T-T1,3-B4,5"   ; layer with Tree([]), Tree([1,3]) and Bucket([4,5])

A layer consists of nodes that are followed by node-node links from upper layer
in left-to-right order.

Visualization
-------------

The following visualization utilities are provided to help understand BTrees
better:

- `topoview` displays BTree structure given its topology-encoded representation.
- `Tree.graphviz` returns Tree graph representation in dot language.
parent fbf15309
......@@ -346,7 +346,7 @@ setup(
],
extras_require = {
'test': ['pytest'],
'test': ['pytest', 'scipy'],
},
cmdclass = {'build_ext': build_ext,
......
This diff is collapsed.
This diff is collapsed.
Markdown is supported
0%
or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment