Commits · 46f3f3fd9807445bb37ca54a014aa5ada6161de0 · Levin Zimmermann / wendelin.core

28 Oct, 2021 16 commits

wcfs: Add FileSock FUSE utility · 46f3f3fd

Kirill Smelkov authored Oct 26, 2021

FileSock is a bidirectional channel associated with an opened file.

FileSock provides streaming write/read operations for the filesystem server;
they are matched with the corresponding read/write operations on the filesystem user side.

WCFS will use FileSock to implement exchange over .wcfs/zhead and,
later, head/watch files.
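
A minimal sketch of the idea, not the actual FileSock API: the `fileSock` and `userEnd` types and all their methods below are hypothetical, and plain `io.Pipe` from the standard library stands in for the real transport. One pipe is used per direction, so a write on one side is matched by a read on the other.

```go
package main

import (
	"fmt"
	"io"
)

// fileSock is a hypothetical stand-in for the server side of the channel.
type fileSock struct {
	toUser   *io.PipeWriter // server writes -> user reads
	fromUser *io.PipeReader // user writes  -> server reads
}

// userEnd is a hypothetical stand-in for what read/write on the opened
// file would be mapped to on the filesystem user side.
type userEnd struct {
	r *io.PipeReader
	w *io.PipeWriter
}

// newFileSock creates both ends, connected by two in-memory pipes.
func newFileSock() (*fileSock, *userEnd) {
	sr, uw := io.Pipe() // user -> server
	ur, sw := io.Pipe() // server -> user
	return &fileSock{toUser: sw, fromUser: sr}, &userEnd{r: ur, w: uw}
}

func (s *fileSock) Write(p []byte) (int, error) { return s.toUser.Write(p) }
func (s *fileSock) Read(p []byte) (int, error)  { return s.fromUser.Read(p) }
func (u *userEnd) Write(p []byte) (int, error)  { return u.w.Write(p) }
func (u *userEnd) Read(p []byte) (int, error)   { return u.r.Read(p) }

// echoOnce sends msg from the server side and returns what the user side reads.
func echoOnce(msg string) string {
	srv, usr := newFileSock()
	go func() { srv.Write([]byte(msg)) }()
	buf := make([]byte, len(msg))
	io.ReadFull(usr, buf)
	return string(buf)
}

func main() {
	fmt.Println(echoOnce("hello"))
}
```

`io.Pipe` is synchronous: a write blocks until the other side has read the data, which is why the sketch writes from a separate goroutine.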

Some preliminary history:

b17aeb8c    X Change FileSock to use xio.Pipe which is io.Pipe + support for IO cancellation


wcfs: zdata: ΔFtail · f980471f

Kirill Smelkov authored Oct 26, 2021

ΔFtail builds on ΔBtail and provides ZBigFile-level history that WCFS
will use to compute which blocks of a ZBigFile need to be invalidated in
OS file cache given raw ZODB changes on ZODB invalidation message.

It also will be used by WCFS to implement isolation protocol, where on
every FUSE READ request WCFS will query ΔFtail to find out revision of
corresponding file block.

Quoting ΔFtail documentation:

---- 8< ----

ΔFtail provides ZBigFile-level history tail.

It translates ZODB object-level changes to information about which blocks of
which ZBigFile were modified, and provides service to query that information.

ΔFtail class documentation
~~~~~~~~~~~~~~~~~~~~~~~~~~

ΔFtail represents tail of revisional changes to files.

It semantically consists of

    []δF			; rev ∈ (tail, head]

where δF represents a change in files space

    δF:
    	.rev↑
    	{} file ->  {}blk | EPOCH

Only files and blocks explicitly requested to be tracked are guaranteed to
be present. In particular a block that was not explicitly requested to be
tracked, even if it was changed in δZ, is not guaranteed to be present in δF.

After file epoch (file creation, deletion, or any other change to file
object) previous track requests for that file become forgotten and have no
further effect.

ΔFtail provides the following operations:

  .Track(file, blk, path, zblk)	- add file and block reached via BTree path to tracked set.

  .Update(δZ) -> δF				- update files δ tail given raw ZODB changes
  .ForgetPast(revCut)			- forget changes ≤ revCut
  .SliceByRev(lo, hi) -> []δF		- query for all files changes with rev ∈ (lo, hi]
  .SliceByFileRev(file, lo, hi) -> []δfile	- query for changes of a file with rev ∈ (lo, hi]
  .BlkRevAt(file, #blk, at) -> blkrev	- query for what is last revision that changed
    					  file[#blk] as of @at database state.

where δfile represents a change to one file

    δfile:
    	.rev↑
    	{}blk | EPOCH

See also zodb.ΔTail and xbtree.ΔBtail

Concurrency

ΔFtail is safe to use in single-writer / multiple-readers mode. That is at
any time there should be either only sole writer, or, potentially several
simultaneous readers. The table below classifies operations:

    Writers:  Update, ForgetPast
    Readers:  Track + all queries (SliceByRev, SliceByFileRev, BlkRevAt)

Note that, in particular, it is correct to run multiple Track and queries
requests simultaneously.
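
One way a caller might enforce this single-writer / multiple-readers discipline is with a `sync.RWMutex`. The sketch below is an illustration only: the `guarded` type and its methods are hypothetical and do not correspond to ΔFtail internals. Writer-class operations take the exclusive lock, reader-class operations take the shared lock.

```go
package main

import (
	"fmt"
	"sync"
)

// guarded is a hypothetical history container protected by an RWMutex.
type guarded struct {
	mu   sync.RWMutex
	revs []int64
}

// update plays the role of a "writer" operation (like Update or ForgetPast):
// it excludes all other writers and readers.
func (g *guarded) update(rev int64) {
	g.mu.Lock()
	defer g.mu.Unlock()
	g.revs = append(g.revs, rev)
}

// countUpTo plays the role of a "reader" operation (like the queries):
// several of them may run at the same time.
func (g *guarded) countUpTo(at int64) int {
	g.mu.RLock()
	defer g.mu.RUnlock()
	n := 0
	for _, r := range g.revs {
		if r <= at {
			n++
		}
	}
	return n
}

func main() {
	g := &guarded{}
	for rev := int64(1); rev <= 3; rev++ {
		g.update(rev)
	}

	// Run several readers in parallel; each writes to its own result slot.
	var wg sync.WaitGroup
	results := make([]int, 4)
	for i := range results {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			results[i] = g.countUpTo(2)
		}(i)
	}
	wg.Wait()
	fmt.Println(results)
}
```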

ΔFtail organization
~~~~~~~~~~~~~~~~~~~

ΔFtail leverages:

    - ΔBtail to track changes to ZBigFile.blktab BTree, and
    - ΔZtail to track changes to ZBlk objects and to ZBigFile object itself.

then every query merges ΔBtail and ΔZtail data on the fly to provide
ZBigFile-level result.

Merging on the fly, contrary to computing and maintaining vδF data, is done
to avoid complexity of recomputing vδF when tracking set changes. Most of
ΔFtail complexity is, thus, located in ΔBtail, which implements BTree diff
and handles complexity of recomputing vδB when set of tracked blocks
changes after new track requests.

Changes to ZBigFile object indicate epochs. Epochs could be:

    - file creation or deletion,
    - change of ZBigFile.blksize,
    - change of ZBigFile.blktab to point to another BTree.

Epochs represent major changes to file history where file is assumed to
change so dramatically, that practically it can be considered to be a
"whole" change. In particular, WCFS, upon seeing a ZBigFile epoch,
invalidates all data in corresponding OS-level cache for the file.

The only historical data, that ΔFtail maintains by itself, is history of
epochs. That history does not need to be recomputed when more blocks become
tracked and is thus easy to maintain. It also can be maintained only in
ΔFtail because ΔBtail and ΔZtail do not "know" anything about ZBigFile.

Concurrency

In order to allow multiple Track and queries requests to be served in
parallel, ΔFtail bases its concurrency promise on ΔBtail guarantees +
snapshot-style access for vδE and ztrackInBlk in queries:

1. Track calls ΔBtail.Track and quickly updates .byFile, .byRoot and
   _RootTrack indices under a lock.

2. BlkRevAt queries ΔBtail.GetAt and then combines retrieved information
   about zblk with vδE and δZ.

3. SliceByFileRev queries ΔBtail.SliceByRootRev and then merges retrieved
   vδT data with vδZ, vδE and ztrackInBlk.

4. In queries vδE is retrieved/built in snapshot style similarly to how vδT
   is built in ΔBtail. Note that vδE needs to be built only the first time,
   and does not need to be further rebuilt, so the logic in ΔFtail is simpler
   compared to ΔBtail.

5. for ztrackInBlk - that is used by SliceByFileRev query - an atomic
   snapshot is retrieved for objects of interest. This allows to hold
   δFtail.mu lock for relatively brief time without blocking other parallel
   Track/queries requests for long.

Combined this organization allows non-overlapping queries/track-requests
to run simultaneously. (This property is essential to WCFS because otherwise
WCFS would not be able to serve several non-overlapping READ requests to one
file in parallel.)

See also "Concurrency" in ΔBtail organization for more details.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Some preliminary history:

kirr/wendelin.core@ef74aebc    X ΔFtail: Keep reference to ZBigFile via Oid, not via *ZBigFile
kirr/wendelin.core@bf9a7405    X No longer rely on ZODB cache invariant for invalidations
kirr/wendelin.core@46340069    X found by Random
kirr/wendelin.core@e7b598c6    X start of ΔFtail.SliceByFileRev rework to function via merging δB and δZ histories on the fly
kirr/wendelin.core@59c83009    X ΔFtail.SliceByFileRoot tests started to work draftly after "on-the-fly" rework
kirr/wendelin.core@210e9b07    X Fix ΔBtail.SliceByRootRev (lo,hi] handling
kirr/wendelin.core@bf3ace66    X ΔFtail: Rebuild vδE after first track
kirr/wendelin.core@46624787    X ΔFtail: `go test -failfast -short -v -run Random -randseed=1626793016249041295` discovered problems
kirr/wendelin.core@786dd336    X Size no longer tracks [0,∞) since we start tracking when zfile is non-empty
kirr/wendelin.core@4f707117    X test that shows problem of SliceByRootRev where untracked blocks are not added uniformly into whole history
kirr/wendelin.core@c0b7e4c3    X ΔFtail.SliceByFileRev: Fix untracked entries to be present uniformly in result
kirr/wendelin.core@aac37c11    X zdata: Introduce T to start removing duplication in tests
kirr/wendelin.core@bf411aa9    X zdata: Deduplicate zfile loading
kirr/wendelin.core@b74dda09    X Start switching Track from Track(key) to Track(keycov)
kirr/wendelin.core@aa0288ce    X Switch SliceByRootRev to vδTSnapForTracked
kirr/wendelin.core@588a512a    X zdata: Switch SliceByFileRev not to clone Zinblk
kirr/wendelin.core@8b5d8523    X Move tracking of which blocks were accessed from wcfs to ΔFtail
kirr/wendelin.core@30f5ddc7    ΔFtail += .Epoch in δf
kirr/wendelin.core@22f5f096    X Rework ΔFtail so that BlkRevAt works with ZBigFile checkout from any at ∈ (tail, head]
kirr/wendelin.core@0853cc9f    X ΔFtail + tests
kirr/wendelin.core@124688f9    X ΔFtail fixes
kirr/wendelin.core@d85bb82c    ΔFtail concurrency


wcfs: xbtree: ΔBtail · 2ab4be93

Kirill Smelkov authored Oct 26, 2021

ΔBtail provides BTree-level history tail that WCFS - via ΔFtail - will
use to compute which blocks of a ZBigFile need to be invalidated in OS
file cache given raw ZODB changes on ZODB invalidation message.

It also will be used by WCFS to implement isolation protocol, where on
every FUSE READ request WCFS will query ΔBtail - again via ΔFtail - to
find out revision of corresponding file block.

Quoting ΔBtail documentation:

---- 8< ----

ΔBtail provides BTree-level history tail.

It translates ZODB object-level changes to information about which keys of
which BTree were modified, and provides service to query that information.

ΔBtail class documentation
~~~~~~~~~~~~~~~~~~~~~~~~~~

ΔBtail represents tail of revisional changes to BTrees.

It semantically consists of

    []δB			; rev ∈ (tail, head]

where δB represents a change in BTrees space

    δB:
    	.rev↑
    	{} root -> {}(key, δvalue)

It covers only changes to keys from tracked subset of BTrees parts.
In particular a key that was not explicitly requested to be tracked, even if
it was changed in δZ, is not guaranteed to be present in δB.

ΔBtail provides the following operations:

  .Track(path)	- start tracking tree nodes and keys; root=path[0], keys=path[-1].(lo,hi]

  .Update(δZ) -> δB				- update BTree δ tail given raw ZODB changes
  .ForgetPast(revCut)			- forget changes ≤ revCut
  .SliceByRev(lo, hi) -> []δB		- query for all trees changes with rev ∈ (lo, hi]
  .SliceByRootRev(root, lo, hi) -> []δT	- query for changes of a tree with rev ∈ (lo, hi]
  .GetAt(root, key, at) -> (value, rev)	- get root[key] @at assuming root[key] ∈ tracked

where δT represents a change to one tree

    δT:
    	.rev↑
    	{}(key, δvalue)

An example for tracked set is a set of visited BTree paths.
There is no requirement that tracked set belongs to only one single BTree.

See also zodb.ΔTail and zdata.ΔFtail

Concurrency

ΔBtail is safe to use in single-writer / multiple-readers mode. That is at
any time there should be either only sole writer, or, potentially several
simultaneous readers. The table below classifies operations:

    Writers:  Update, ForgetPast
    Readers:  Track + all queries (SliceByRev, SliceByRootRev, GetAt)

Note that, in particular, it is correct to run multiple Track and queries
requests simultaneously.

ΔBtail organization
~~~~~~~~~~~~~~~~~~~

ΔBtail keeps raw ZODB history in ΔZtail and uses BTree-diff algorithm(*) to
turn δZ into BTree-level diff. For each tracked BTree a separate ΔTtail is
maintained with tree-level history in ΔTtail.vδT .

Because it is very computationally expensive(+) to find out for an object to
which BTree it belongs, ΔBtail cannot provide full BTree-level history given
just ΔZtail with δZ changes. Due to this ΔBtail requires help from
users, which are expected to call ΔBtail.Track(treepath) to let ΔBtail know
that such and such ZODB objects constitute a path from root of a tree to some
of its leaf. After Track call the objects from the path and tree keys, that
are covered by leaf node, become tracked: from now-on ΔBtail will detect
and provide BTree-level changes caused by any change of tracked tree objects
or tracked keys. This guarantee can be provided because ΔBtail now knows
that such and such objects belong to a particular tree.

To manage knowledge which tree part is tracked ΔBtail uses PPTreeSubSet.
This data-structure represents so-called PP-connected set of tree nodes:
simply speaking it builds on some leafs and then includes parent(leaf),
parent(parent(leaf)), etc. In other words it's a "parent"-closure of the
leafs. The property of being PP-connected means that starting from any node
from such set, it is always possible to reach root node by traversing
.parent links, and that every intermediate node went-through during
traversal also belongs to the set.

A new Track request potentially grows tracked keys coverage. Due to this,
on a query, ΔBtail needs to recompute potentially whole vδT of the affected
tree. This recomputation is managed by "vδTSnapForTracked*" and "_rebuild"
functions and uses the same treediff algorithm, that Update is using, but
modulo PPTreeSubSet corresponding to δ key coverage. Update also potentially
needs to rebuild whole vδT history, not only append new δT, because a
change to tracked tree nodes can result in growth of tracked key coverage.

Queries are relatively straightforward code that work on vδT snapshot. The
main complexity, besides BTree-diff algorithm, lies in recomputing vδT when
set of tracked keys changes, and in handling that recomputation in such a way
that multiple Track and queries requests could be all served in parallel.

Concurrency

In order to allow multiple Track and queries requests to be served in
parallel ΔBtail employs special organization of vδT rebuild process where
complexity of concurrency is reduced to math on merging updates to vδT and
trackSet, and on key range lookup:

1. vδT is managed under read-copy-update (RCU) discipline: before making
   any vδT change the mutator atomically clones whole vδT and applies its
   change to the clone. This way a query, once it retrieves vδT snapshot,
   does not need to further synchronize with vδT mutators, and can rely on
   that retrieved vδT snapshot will remain immutable.

2. a Track request goes through 3 states: "new", "handle-in-progress" and
   "handled". At each state keys/nodes of the Track are maintained in:

   - ΔTtail.ktrackNew and .trackNew       for "new",
   - ΔTtail.krebuildJobs                  for "handle-in-progress", and
   - ΔBtail.trackSet                      for "handled".

   trackSet keeps nodes, and implicitly keys, from all handled Track
   requests. For all keys, covered by trackSet, vδT is fully computed.

   a new Track(keycov, path) is remembered in ktrackNew and trackNew to be
   further processed when a query should need keys from keycov. vδT is not
   yet providing data for keycov keys.

   when a Track request starts to be processed, its keys and nodes are moved
   from ktrackNew/trackNew into krebuildJobs. vδT is not yet providing data
   for requested-to-be-tracked keys.

   all trackSet, trackNew/ktrackNew and krebuildJobs are completely disjoint:

    trackSet ^ trackNew     = ø
    trackSet ^ krebuildJobs = ø
    trackNew ^ krebuildJobs = ø

3. when a query is served, it needs to retrieve vδT snapshot that takes
   related previous Track requests into account. Retrieving such snapshots
   is implemented in vδTSnapForTracked*() family of functions: there it
   checks ktrackNew/trackNew, and if those sets overlap with query's keys
   of interest, run vδT rebuild for keys queued in ktrackNew.

   the main part of that rebuild can be run without any locks, because it
   does not use nor modify any ΔBtail data, and for δ(vδT) it just computes
   a fresh full vδT build modulo retrieved ktrackNew. Only after that
   computation is complete, ΔBtail is locked again to quickly merge in
   δ(vδT) update back into vδT.

   This organization is based on the fact that

    vδT/(T₁∪T₂) = vδT/T₁ | vδT/T₂

     ( i.e. vδT computed for tracked set being union of T₁ and T₂ is the
       same as merge of vδT computed for tracked set T₁ and vδT computed
      for tracked set T₂ )

   and that

    trackSet | (δPP₁|δPP₂) = (trackSet|δPP₁) | (trackSet|δPP₂)

    ( i.e. tracking set updated for union of δPP₁ and δPP₂ is the same
      as union of tracking set updated with δPP₁ and tracking set updated
      with δPP₂ )

   these merge properties allow to run computation for δ(vδT) and δ(trackSet)
   independently and with ΔBtail unlocked, which in turn enables running
   several Track/queries in parallel.

4. while vδT rebuild is being run, krebuildJobs keeps corresponding keycov
   entry to indicate in-progress rebuild. Should a query need vδT for keys
   from that job, it first waits for corresponding job(s) to complete.

Explained rebuild organization allows non-overlapping queries/track-requests
to run simultaneously. (This property is essential to WCFS because otherwise
WCFS would not be able to serve several non-overlapping READ requests to one
file in parallel.)
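
The read-copy-update discipline from item 1 can be sketched as follows. This is a generic illustration, not the ΔBtail code: the `rcuSlice` type and its methods are hypothetical, and a plain `[]int64` stands in for vδT. A mutator clones the data, changes the clone and publishes it atomically; a reader that has loaded a snapshot never sees it change.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// rcuSlice keeps an immutable []int64 snapshot that readers load without locks.
type rcuSlice struct {
	mu   sync.Mutex   // serializes mutators only
	snap atomic.Value // holds []int64; a stored slice is never modified in place
}

func newRCUSlice() *rcuSlice {
	r := &rcuSlice{}
	r.snap.Store([]int64(nil))
	return r
}

// load returns the current snapshot; the caller must treat it as read-only.
func (r *rcuSlice) load() []int64 {
	return r.snap.Load().([]int64)
}

// add clones the current data, appends to the clone and publishes the clone.
func (r *rcuSlice) add(v int64) {
	r.mu.Lock()
	defer r.mu.Unlock()
	old := r.load()
	clone := make([]int64, len(old), len(old)+1)
	copy(clone, old)
	r.snap.Store(append(clone, v))
}

func main() {
	r := newRCUSlice()
	r.add(1)
	s1 := r.load() // snapshot taken before the next mutation
	r.add(2)
	s2 := r.load()
	fmt.Println(len(s1), len(s2)) // the old snapshot is unaffected by the later add
}
```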

--------

(*) implemented in treediff.go
(+) full database scan

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Some preliminary history:

kirr/wendelin.core@877e64a9    X wcfs: Fix tests to pass again
kirr/wendelin.core@c32055fc    X wcfs/xbtree: ΔBtail tests += ø -> Tree; Tree -> ø
kirr/wendelin.core@78f2f88b    X wcfs/xbtree: Fix treediff(a, ø)
kirr/wendelin.core@5324547c    X wcfs/xbtree: root(a) must stay in trackSet even after treediff(a,ø)
kirr/wendelin.core@f65f775b    X wcfs/xbtree: treediff(ø, b)
kirr/wendelin.core@c75b1c6f    X wcfs/xbtree: Start killing holeIdx
kirr/wendelin.core@0fa06cbd    X kadj must be taken into account as kadj^δZ
kirr/wendelin.core@ef5e5183    X treediff ret += δtkeycov
kirr/wendelin.core@f30826a6    X another bug in δtkeyconv computation
kirr/wendelin.core@0917380e    X wcfs: assert that keycov only grow
kirr/wendelin.core@502e05c2    X found why TestΔBTailAllStructs was not effective to find δtkeycov bugs
kirr/wendelin.core@450ba707    X Fix rebuild with ø @at2
kirr/wendelin.core@f60528c9    X ΔBtail.Clone had bug that it was aliasing klon and orig data
kirr/wendelin.core@9d20f8e8    X treediff: Fix BUG while computing AB coverage
kirr/wendelin.core@ddb28043    X rebuild: Don't return nil for empty ΔPPTreeSubSet - that leads to SIGSEGV
kirr/wendelin.core@324241eb    X rebuild: tests: Don't reflect.DeepEqual in inner loop
kirr/wendelin.core@8f6e2b1e    X rebuild: tests: Don't access ZODB in XGetδKV
kirr/wendelin.core@2c0b4793    X rebuild: tests: Don't access ZODB in xtrackKeys
kirr/wendelin.core@8f0e37f2    X rebuild: tests: Precompute kadj10·kadj21
kirr/wendelin.core@271d953d    X rebuild: tests: Move ΔBtail.Clone test out of hot inner loop into separate test
kirr/wendelin.core@a87cc6de    X rebuild: tests: Don't recompute trackSet(keys1R2) several times
kirr/wendelin.core@01433e96    X rebuild: tests: Don't compute keyCover in trackSet
kirr/wendelin.core@7371f9c5    X rebuild: tests: Inline _assertTrack
kirr/wendelin.core@3e9164b3    X rebuild: tests: Don't exercise keys from keys2 that already became tracked after Track(keys1) + Update
kirr/wendelin.core@e9c4b619    X rebuild: tests: Random testing
kirr/wendelin.core@d0fe680a    X δbtail += ForgetPast
kirr/wendelin.core@210e9b07    X Fix ΔBtail.SliceByRootRev (lo,hi] handling
kirr/wendelin.core@855ab4b8    X ΔBtail: Goodbye .KVAtTail
kirr/wendelin.core@2f5582e6    X ΔBtail: Tweak tests to run faster in normal mode
kirr/wendelin.core@cf352737    X random testing found another failing test for rebuild...
kirr/wendelin.core@7f7e34e0    X wcfs/xbtree: Fix update not to add duplicate extra point if rebuild  - called by Update - already added it
kirr/wendelin.core@6ad0052c    X ΔBtail.Track: No need to return error
kirr/wendelin.core@aafcacdf    X xbtree: GetAt test
kirr/wendelin.core@784a6761    X xbtree: Fix KAdj definition after treediff was reworked this summer to base decisions on node keycoverage instead of particular node keys
kirr/wendelin.core@0bb1c22e    X xbtree: Verify that ForgetPast clones vδT on trim
kirr/wendelin.core@a8945cbf    X Start reworking rebuild routines not to modify data inplace
kirr/wendelin.core@b74dda09    X Start switching Track from Track(key) to Track(keycov)
kirr/wendelin.core@dea85e87    X Switch GetAt to vδTSnapForTrackedKey
kirr/wendelin.core@aa0288ce    X Switch SliceByRootRev to vδTSnapForTracked
kirr/wendelin.core@c4366b14    X xbtree: tests: Also verify state of ΔTtail.ktrackNew
kirr/wendelin.core@b98706ad    X Track should be nop if keycov/path is already in krebuildJobs
kirr/wendelin.core@e141848a    X test.go  ↑ timeout  10m -> 20m
kirr/wendelin.core@423f77be    X wcfs: Goodby holeIdx
kirr/wendelin.core@37c2e806    X wcfs: Teach treediff to compute not only δtrack (set of nodes), but also δ for track-key coverage
kirr/wendelin.core@52c72dbb    X ΔBtail.rebuild started to work draftly
kirr/wendelin.core@c9f13fc7    X Get rebuild tests to run in a sane time; Add proper random-based testing for rebuild
kirr/wendelin.core@c7f1e3c9    X xbtree: Factor testing infrastructure bits into xbtree/xbtreetest
kirr/wendelin.core@7602c1f4    ΔBtail concurrency


wcfs: xbtree: BTree-diff algorithm · 80153aa5

Kirill Smelkov authored Oct 26, 2021

This algorithm will be internally used by ΔBtail in the next patch.

The algorithm would be simple, if we would need to diff two trees
completely. However in ΔBtail only subpart of BTree nodes are tracked(*)
and the diff has to work modulo that tracking set.

No tests now because ΔBtail tests will cover treediff functionality as well.

Some preliminary history:

78f2f88b    X wcfs/xbtree: Fix treediff(a, ø)
5324547c    X wcfs/xbtree: root(a) must stay in trackSet even after treediff(a,ø)
f65f775b    X wcfs/xbtree: treediff(ø, b)
c75b1c6f    X wcfs/xbtree: Start killing holeIdx
ef5e5183    X treediff ret += δtkeycov
9d20f8e8    X treediff: Fix BUG while computing AB coverage
ddb28043    X rebuild: Don't return nil for empty ΔPPTreeSubSet - that leads to SIGSEGV
f68398c9    X wcfs: Move treediff into its own file

(*) because full BTree scan is needed to discover all of its nodes.

Quoting treediff documentation:

---- 8< ----

treediff provides diff for BTrees

Use δZConnectTracked + treediff to compute BTree-diff caused by δZ:

    δZConnectTracked(δZ, trackSet)                         -> δZTC, δtopsByRoot
    treediff(root, δtops, δZTC, trackSet, zconn{Old,New})  -> δT, δtrack, δtkeycov

δZConnectTracked computes BTree-connected closure of δZ modulo tracked set
and also returns δtopsByRoot to indicate which tree objects were changed and
in which subtree parts. With that information one can call treediff for each
changed root to compute BTree-diff and δ for trackSet itself.

BTree diff algorithm

diffT, diffB and δMerge constitute the diff algorithm implementation.
diff(A,B) works on pair of A and B whole key ranges split into regions
covered by tree nodes. The splitting represents current state of recursion
into corresponding tree. If a node in particular key range is Bucket, that
bucket contributes to δ- in case of A, and to δ+ in case of B. If a node in
particular key range is Tree, the algorithm may want to expand that tree
node into its children and to recurse into some of the children.

There are two phases:

- Phase 1 expands A top->down driven by δZTC, adds reached buckets to δ-,
  and queues key regions of those buckets to be processed on B.

- Phase 2 starts processing from queued key regions, expands them on B and
  adds reached buckets to δ+. Then it iterates to reach consistency in between
  A and B because processing buckets on B side may increase δ key coverage,
  and so corresponding key ranges have to be again processed on A. Which in
  turn may increase δ key coverage again, and needs to be processed on B side,
  etc...

The final δ is merge of δ- and δ+.

diffT has more detailed explanation of phase 1 and phase 2 logic.
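
The last step, merging δ- and δ+ into the final δ, can be illustrated on flat key->value maps. This sketch deliberately ignores tree structure, tracking and the two-phase expansion; `mergeδ`, `δvalue` and the `ø` marker for an absent value are hypothetical names chosen for the example, and string values are an assumption.

```go
package main

import (
	"fmt"
	"sort"
)

// del marks an absent value on one of the sides.
const del = "ø"

// δvalue describes how the value of one key changed.
type δvalue struct{ Old, New string }

// mergeδ combines δ- (entries of buckets reached on A) and δ+ (entries of
// buckets reached on B) into the final δ. Keys that have the same value on
// both sides cancel out.
func mergeδ(δminus, δplus map[int64]string) map[int64]δvalue {
	δ := map[int64]δvalue{}
	for k, a := range δminus {
		b, ok := δplus[k]
		if !ok {
			b = del
		}
		if a != b {
			δ[k] = δvalue{a, b}
		}
	}
	for k, b := range δplus {
		if _, ok := δminus[k]; !ok {
			δ[k] = δvalue{del, b}
		}
	}
	return δ
}

func main() {
	δminus := map[int64]string{1: "a", 2: "b", 3: "c"}
	δplus := map[int64]string{2: "b", 3: "d", 4: "e"}
	δ := mergeδ(δminus, δplus)

	keys := make([]int64, 0, len(δ))
	for k := range δ {
		keys = append(keys, k)
	}
	sort.Slice(keys, func(i, j int) bool { return keys[i] < keys[j] })
	for _, k := range keys {
		fmt.Printf("%d: %s -> %s\n", k, δ[k].Old, δ[k].New)
	}
}
```

In this example key 2 cancels out, key 1 is reported as deleted, key 3 as changed and key 4 as added.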


wcfs: xbtree: blib += PPTreeSubSet, ΔPPTreeSubSet · 27df5a3b

Kirill Smelkov authored Oct 26, 2021

These data structures will be used in ΔBtail to maintain the set of tracked
BTree nodes, and to represent δ to such set.

Some preliminary history:

kirr/wendelin.core@78f2f88b    X wcfs/xbtree: Fix treediff(a, ø)
kirr/wendelin.core@5324547c    X wcfs/xbtree: root(a) must stay in trackSet even after treediff(a,ø)
kirr/wendelin.core@f65f775b    X wcfs/xbtree: treediff(ø, b)
kirr/wendelin.core@66bc41ce    X Fix bug in PPTreeSubSet.Difference  - it was always leaving root node alive
kirr/wendelin.core@ddb28043    X rebuild: Don't return nil for empty ΔPPTreeSubSet - that leads to SIGSEGV
kirr/wendelin.core@a87cc6de    X rebuild: tests: Don't recompute trackSet(keys1R2) several times

Quoting PPTreeSubSet and ΔPPTreeSubSet documentation:

---- 8< ----

PPTreeSubSet represents PP-connected subset of tree node objects.

It is

    PP(xleafs)

where PP(node) maps node to {node, node.parent, node.parent.parent, ...} up
to top root from where the node is reached.

The nodes in the set are represented by their Oid.

Usually PPTreeSubSet is built as PP(some-leafs), but in general the starting
nodes are arbitrary. PPTreeSubSet can also have many root nodes, thus not
necessarily representing a subset of a single tree.

Usual set operations are provided: Union, Difference and Intersection.

Nodes can be added into the set via AddPath. Path is reverse operation - it
returns path to tree node given its oid.

Every node in the set comes with .parent pointer.
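
A minimal sketch of such a parent-closed set, assuming a plain node -> parent map. The `Oid` type, `ppSet`, `addPath` and `path` below are hypothetical stand-ins for illustration; the sketch omits the set operations and any consistency checks.

```go
package main

import "fmt"

// Oid is a hypothetical stand-in for a ZODB object identifier.
type Oid int64

// noParent marks a root node.
const noParent Oid = -1

// ppSet maps every node in the set to its parent, so the set is closed
// under "parent" by construction.
type ppSet map[Oid]Oid

// addPath adds all nodes of a root -> ... -> node path; path[0] is the root.
func (s ppSet) addPath(path []Oid) {
	parent := noParent
	for _, oid := range path {
		s[oid] = parent
		parent = oid
	}
}

// path is the reverse operation: it returns the path from the root to oid,
// or nil if oid is not in the set.
func (s ppSet) path(oid Oid) []Oid {
	if _, ok := s[oid]; !ok {
		return nil
	}
	var p []Oid
	for n := oid; n != noParent; n = s[n] {
		p = append(p, n)
	}
	// p was collected leaf-first; reverse it to get root-first order.
	for i, j := 0, len(p)-1; i < j; i, j = i+1, j-1 {
		p[i], p[j] = p[j], p[i]
	}
	return p
}

func main() {
	s := ppSet{}
	s.addPath([]Oid{100, 41, 22})
	s.addPath([]Oid{100, 42, 43, 44})
	fmt.Println(s.path(44))
	fmt.Println(len(s)) // the root is shared by both paths
}
```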

~~~~

ΔPPTreeSubSet represents a change to PPTreeSubSet.

It can be applied via PPTreeSubSet.ApplyΔ .

The result B of applying δ to A is:

    B = A.xDifference(δ.Del).xUnion(δ.Add)		(*)

(*) NOTE δ.Del and δ.Add might have their leafs starting from non-leaf nodes in A/B.
    This situation arises when δ represents a change in path to particular
    node, but that node itself does not change, for example:

           c*             c
          / \            /
        41*  42         41
         |    |         | \
        22   43        46  43
              |         |   |
             44        22  44

    Here nodes {c, 41} are changed, node 42 is unlinked, and node 46 is added.
    Nodes 43 and 44 stay unchanged.

        δ.Del = c-42-43   | c-41-22
        δ.Add = c-41-43   | c-41-46-22

    The second component with "-22" builds from leaf, but the first
    component with "-43" builds from non-leaf node.

        ΔnchildNonLeafs = {43: +1}

    Only complete result of applying all

        - xfixup(-1, ΔnchildNonLeafs)
        - δ.Del,
        - δ.Add, and
        - xfixup(+1, ΔnchildNonLeafs)

    produces correctly PP-connected set.


wcfs: xbtree: blib += RangedMap, RangedKeySet · 1f2cd49d

Kirill Smelkov authored Oct 26, 2021

RangedMap is Key->VALUE map with adjacent keys mapped to the same value coalesced into Ranges.
RangedKeySet is set of Keys with adjacent keys coalesced into Ranges.

These data structures will be needed for ΔBtail.

For now the implementation is simple since it keeps whole map in a
linear slice because both RangedMap and RangedKeySet will be used in
ΔBtail to keep something proportional to δ of a change, which is assumed
to be small or medium most of the time.
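
A sketch of the coalescing idea for the key-set case, kept in a linear slice as described above. The half-open `[lo, hi)` representation and the names `keyRange`, `rangedKeySet`, `add` and `has` are assumptions made for this example, not the package API.

```go
package main

import "fmt"

// keyRange is a half-open range [lo, hi) of keys.
type keyRange struct{ lo, hi int64 }

// rangedKeySet keeps sorted, non-overlapping, non-adjacent ranges in a slice.
type rangedKeySet struct{ rangev []keyRange }

// add inserts key k and coalesces it with overlapping or adjacent ranges.
func (s *rangedKeySet) add(k int64) {
	r := keyRange{k, k + 1}
	var out []keyRange
	inserted := false
	for _, e := range s.rangev {
		switch {
		case e.hi < r.lo: // e is strictly before r and not adjacent
			out = append(out, e)
		case r.hi < e.lo: // e is strictly after r and not adjacent
			if !inserted {
				out = append(out, r)
				inserted = true
			}
			out = append(out, e)
		default: // e overlaps or touches r: merge it into r
			if e.lo < r.lo {
				r.lo = e.lo
			}
			if e.hi > r.hi {
				r.hi = e.hi
			}
		}
	}
	if !inserted {
		out = append(out, r)
	}
	s.rangev = out
}

// has reports whether key k is in the set (linear scan).
func (s *rangedKeySet) has(k int64) bool {
	for _, e := range s.rangev {
		if e.lo <= k && k < e.hi {
			return true
		}
	}
	return false
}

func main() {
	s := &rangedKeySet{}
	for _, k := range []int64{1, 2, 3, 7, 5, 6} {
		s.add(k)
	}
	fmt.Println(s.rangev) // keys 1..3 and 5..7 end up as two ranges
}
```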

Some preliminary history:

kirr/wendelin.core@6ea5920a X xbtree: Less copy/garbage in RangedKeySet ops
kirr/wendelin.core@3ecacd99 X need to keep Value first so that sizeof(set-entry) = sizeof(KeyRange)
kirr/wendelin.core@a5b9b19b X SetRange draftly works
kirr/wendelin.core@ed2de0de X Tests for Get
kirr/wendelin.core@3b7b69e6 X fixes for empty set/range
kirr/wendelin.core@6972f999 X xbtree/blib: RangedMap, RangedSet += IntersectsRange, Intersection
kirr/wendelin.core@57be0126 X RangedMap - like RangedSet but for dict


wcfs: tests: Tree-based testing environment · b87edcfe

Kirill Smelkov authored Oct 26, 2021

Add treeenv.go that combines Treegen and client side access to ZODB with
committed trees as extension to testing.T . The environment allows to
easily see which tree update was committed, what is the difference in
terms of KV, what is the state of updated tree and state of pointed-to
ZBlk objects.

This will be used to test upcoming ΔBtail and ΔFtail.

Main functionality is in treeenv.go; the other added files are to
support that.

Some preliminary history:

kirr/wendelin.core@f07502fc X xbtreetest: Teach T & Commit to automatically provide At in symbolic form
kirr/wendelin.core@0d62b05e X Adjust to btree.VGet & friends signature change to include keycov in visit callback
kirr/wendelin.core@588a512a X zdata: Switch SliceByFileRev not to clone Zinblk
kirr/wendelin.core@e9c4b619 X rebuild: tests: Random testing
kirr/wendelin.core@43090ac7 X tests: Factor-out tree-test-env into tTreeEnv
kirr/wendelin.core@d4a523b2 X δbtail: tests: Run much faster with live ZODB cache
kirr/wendelin.core@271d953d X rebuild: tests: Move ΔBtail.Clone test out of hot inner loop into separate test
kirr/wendelin.core@c32055fc X wcfs/xbtree: ΔBtail tests += ø -> Tree; Tree -> ø
kirr/wendelin.core@5324547c X wcfs/xbtree: root(a) must stay in trackSet even after treediff(a,ø)
kirr/wendelin.core@8f6e2b1e X rebuild: tests: Don't access ZODB in XGetδKV


wcfs: Set package · b13ee09b

Kirill Smelkov authored Oct 26, 2021

Lacking generics we have set.go.in and instantiation for Set[int64],
Set[string], Set[Oid] and Set[Tid] - that will be used in follow-up
patches.

The set.go.in itself is mostly a generalized copy from git-backup:

https://lab.nexedi.com/kirr/git-backup/blob/c9db60e8/set.go
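
For illustration, one instantiation of such a set written without generics might look like the sketch below. The type name `SetI64` and the method names are assumptions for this example and are not claimed to match what set.go.in generates.

```go
package main

import (
	"fmt"
	"sort"
)

// SetI64 is a hypothetical set of int64 built on a map with empty values.
type SetI64 map[int64]struct{}

// Add adds v to the set.
func (s SetI64) Add(v int64) { s[v] = struct{}{} }

// Has reports whether v is in the set.
func (s SetI64) Has(v int64) bool {
	_, ok := s[v]
	return ok
}

// Union returns a new set with elements of both s and t.
func (s SetI64) Union(t SetI64) SetI64 {
	u := SetI64{}
	for v := range s {
		u.Add(v)
	}
	for v := range t {
		u.Add(v)
	}
	return u
}

// Elements returns the set elements in increasing order.
func (s SetI64) Elements() []int64 {
	ev := make([]int64, 0, len(s))
	for v := range s {
		ev = append(ev, v)
	}
	sort.Slice(ev, func(i, j int) bool { return ev[i] < ev[j] })
	return ev
}

func main() {
	a := SetI64{}
	a.Add(3)
	a.Add(1)
	b := SetI64{}
	b.Add(2)
	b.Add(3)
	fmt.Println(a.Union(b).Elements())
}
```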


wcfs: tests: Treegen functionality · a8595565

Kirill Smelkov authored Oct 26, 2021

treegen.go and treegen.py together provide a way

- to commit a particular BTree topology into ZODB, and
- to generate set of random tree topologies that all correspond to particular {k->v} dict.

This will be used in upcoming ΔBtail and ΔFtail tests.

See treegen.py documentation for details.

Some preliminary history:

kirr/wendelin.core@9eca74ec    X Teach AllStructs to emit topologies with values
kirr/wendelin.core@1b962f03    X Restructure: found bug that it was not marking objects as modified
kirr/wendelin.core@2139af2c    X treegen: Verify that tree actually saved to storage is what was requested
kirr/wendelin.core@b5e39d4a    X wcfs/treegen: allstructs: Do not keep all tree structures in memory
kirr/wendelin.core@e9c4b619    X rebuild: tests: Random testing
kirr/wendelin.core@c32055fc    X wcfs/xbtree: ΔBtail tests += ø -> Tree; Tree -> ø
kirr/wendelin.core@4300d88a    X wcfs/xbtreetest/treegen.py: Fix it on ZODB4


wcfs: xbtree: blib: Start of package · 37fb6d28

Kirill Smelkov authored Oct 26, 2021

This will be the place to keep BTree-related utilities.
For now it provides only type aliases since Go lacks generics.


wcfs: tests: xbtree.py package for inspecting/manipulating internal structure of BTrees · 0e829874

Kirill Smelkov authored Oct 26, 2021

To handle invalidations, WCFS will need to detect changes to both ZBlk
objects and to ZBigFile.blktab BTree that is mapping file blocks to ZBlk
objects. With a BTree, detecting changes is much more complex, because
when a BTree changes, it might be rebalanced, or keys migrated from one
tree/bucket node to another tree/bucket node. In other words a BTree
change might be not only a change to a {}key->value dictionary, but also
a change to BTree topology.

Because there are many BTree topologies that correspond to the same
{}key->value state, a change from kv₁ to kv₂, even if kv₁ and kv₂ are
close to each other, might be accompanied by a dramatic change to
topology of the tree. This creates a need for thoroughly testing the
BTree difference algorithm because many of BTree topologies changes are
tricky, and if a simple algorithm works on relatively stable topology
updates, it does not necessarily mean that that same algorithm will
continue to work correctly in the general case.

So, as a preparatory step, here comes xbtree.py package, that can be
used to inspect tree topologies, to create trees with specified topology
and to manipulate topology of an existing tree. This package will be
used in tests for upcoming ΔBtail.

For debugging, and also since those tests will involve both Go and
Python parts, it creates the need to be able to specify and exchange
topology of a tree via compact string. This package also defines so
called "topology encoding" to do so.

Some preliminary history:

kirr/wendelin.core@fb56193f    X fix metric to keep Z <- N order stable over key^
kirr/wendelin.core@809304d1    X "B:" indicates ø bucket with k&b, "B" - ø bucket with only keys
kirr/wendelin.core@9eca74ec    X Teach AllStructs to emit topologies with values
kirr/wendelin.core@1b962f03    X Restructure: found bug that it was not marking objects as modified
kirr/wendelin.core@9181c5d9    X Restructure; verify that it marks as changed only modifed nodes
kirr/wendelin.core@e9902c4a    X improve `xbtree topoview`

For reference, the xbtree.py package documentation is quoted below.

---- 8< ----

Package xbtree provides utilities for inspecting/manipulating internal
structure of integer-keyed BTrees.

It will be primarily used to help verify ΔBtail in WCFS.

- `Tree` represents a tree node.
- `Bucket` represents a bucket node.
- `StructureOf` returns internal structure of ZODB BTree represented as Tree
  and Bucket nodes.
- `Restructure` reorganizes ZODB BTree instance according to specified topology
  structure.

- `AllStructs` generates all possible BTree topology structures with given keys.

Topology encoding
-----------------

Topology encoding provides way to represent structure of a Tree as path-like string.

TopoEncode converts Tree into its topology-encoded representation, while
TopoDecode decodes topology-encoded string back into Tree.

The following example illustrates topology encoding represented by string
"T3/T-T/B1-T5/B-B7,8,9":

      [ 3 ]             T3/         represents Tree([3])
       / \
     [ ] [ ]            T-T/        represents two empty Tree([])
      ↓   ↓
     |1|[ 5 ]           B1-T5/      represent Bucket([1]) and Tree([5])
         / \
        || |7|8|9|      B-B7,8,9    represents empty Bucket([]) and Bucket([7,8,9])

Topology encoding specification:

A Tree is encoded by level-order traversal, delimiting layers with "/".
Inside a layer Tree and Bucket nodes are signalled as

    "T<keys>"           ; Tree
    "B<keys>"           ; Bucket with only keys
    "B<keys+values>"    ; Bucket with keys and values

Keys are represented as ","-delimited list of integers. For example Tree
or Bucket with [1,3,5] keys are represented as

    "T1,3,5"        ; Tree([1,3,5])
    "B1,3,5"        ; Bucket([1,3,5])

Keys+values are represented as ","-delimited list of "<key>:<value>" pairs. For
example Bucket corresponding to {1:1, 2:4, 3:9} is represented as

    "B1:1,2:4,3:9"  ; Bucket([1,2,3], [1,4,9])

Empty keys+values are represented as ":" - an empty Bucket for key->value
mapping is represented as

    "B:"            ; Bucket([], [])

Nodes inside one layer are delimited with "-". For example a layer consisting
of an empty Tree, a Tree with [1,3] keys, and Bucket with [4,5] keys is
represented as

    "T-T1,3-B4,5"   ; layer with Tree([]), Tree([1,3]) and Bucket([4,5])

A layer consists of nodes that are followed by node-node links from upper layer
in left-to-right order.
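
As an illustration of the specification above, here is a minimal sketch of a
decoder that splits a topology-encoded string into layers and checks the
node-node links between adjacent layers. It is not xbtree's TopoDecode: the
names `topo_layers` and `check_links` are hypothetical, keys are assumed to be
non-negative (so that "-" is unambiguous as node delimiter), values are kept as
raw strings, and the link check assumes that a Tree with n keys has n+1
children, as in the example diagram.

```python
def topo_layers(text):
    """Split topology-encoded text into layers of (kind, keys, values) nodes.

    kind is "T" or "B". values is None for a Tree or a keys-only Bucket,
    and a list of raw value strings for a key->value Bucket.
    """
    layers = []
    for layer in text.split("/"):            # "/" delimits layers
        nodes = []
        for node in layer.split("-"):        # "-" delimits nodes in a layer
            kind, body = node[:1], node[1:]
            if kind not in ("T", "B"):
                raise ValueError("invalid node %r" % node)
            keys, values = [], None
            if kind == "B" and ":" in body:  # Bucket with keys and values
                values = []
                if body != ":":              # "B:" is the empty k->v Bucket
                    for item in body.split(","):
                        k, v = item.split(":")
                        keys.append(int(k))
                        values.append(v)
            elif body:                       # Tree or keys-only Bucket
                keys = [int(k) for k in body.split(",")]
            nodes.append((kind, keys, values))
        layers.append(nodes)
    return layers


def check_links(layers):
    """Verify that every layer has as many nodes as the upper layer links to.

    Assumption (hypothetical check): a Tree with n keys has n+1 children.
    """
    for upper, lower in zip(layers, layers[1:]):
        nlinks = sum(len(keys) + 1 for kind, keys, _ in upper if kind == "T")
        if nlinks != len(lower):
            raise ValueError("expected %d nodes, got %d" % (nlinks, len(lower)))


layers = topo_layers("T3/T-T/B1-T5/B-B7,8,9")
check_links(layers)
```

With this sketch the example string decodes into four layers, the last one
holding an empty Bucket and a Bucket with keys 7, 8 and 9.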

Visualization
-------------

The following visualization utilities are provided to help understand BTrees
better:

- `topoview` displays BTree structure given its topology-encoded representation.
- `Tree.graphviz` returns Tree graph representation in dot language.

0e829874

wcfs: tests: Start verifying state of OS file cache · d81d2cbb

Kirill Smelkov authored Oct 26, 2021

For WCFS to be efficient it will have to carefully preserve OS cache on
file invalidations. As a preparatory step, establish infrastructure for
verifying the state of OS file cache and start asserting on OS cache state
in a couple of places.

See comments added to tFile constructor that describe how OS cache state
verification is set up.

Some preliminary history:

kirr/wendelin.core@8293025b X Thoughts on how to avoid readahead touching pages of neighbour block
kirr/wendelin.core@3054e4a3 X not touching neighbour block works via setting MADV_RANDOM in last 1/4 of every block
kirr/wendelin.core@18362227 X #5 access still triggers read to #4 ?
kirr/wendelin.core@17dbf94e X Provide mlock2 fallback for Ubuntu
kirr/wendelin.core@d134c0b9 X wcfs: test: try to live with only hard memlock limit adjusted
kirr/wendelin.core@c2423296 X Fix mlock2 build on Debian 8

d81d2cbb

wcfs: Initial implementation of basic filesystem · e3f2ee2d

Kirill Smelkov authored Oct 26, 2021

Provide filesystem view of in-ZODB ZBigFiles, but do not implement support for
invalidations nor isolation protocol yet. In particular, because ZODB
invalidations are not yet handled, the filesystem does not update its data in
accordance with ZODB updates, and instead provides stale data view that
corresponds to the state of ZODB at the time when wcfs was mounted.

The main parts of this patch are:

- wcfs/wcfs.go is filesystem implementation itself together with overview.
- wcfs/__init__.py is python wrapper to spawn and interoperate with that filesystem.
- wcfs/wcfs_test.py is tests.

Some preliminary history:

kirr/wendelin.core@fe7efb94    X start of wcfs
kirr/wendelin.core@878b2787    X draft loading
kirr/wendelin.core@d58c71e8    X don't overalign end by 1 blksize if end is already aligned
kirr/wendelin.core@29c9f13d    X readBlk: Fix thinko in already case
kirr/wendelin.core@59552328    X wcfs: Care to disable OS polling on us
kirr/wendelin.core@c00d94c7    X workaround lack of exception chaining on Python2 with xdefer
kirr/wendelin.core@0398e23d    X bytearray turned out to be copying data
kirr/wendelin.core@7a837040    X print wcfs.py py-level traceback on SIGBUS (e.g. wcfs.go aborting due to bug/panic)
kirr/wendelin.core@661b871f    X make sure tests don't get stuck even if wcfs gets killed -9 ...
kirr/wendelin.core@2c043d29    X More effort to unmount failed wcfs.go
kirr/wendelin.core@1ccc4478    X Use `with gil` + regular py code instead of PyGILState_Ensure/PyGILState_Release/PyRun_SimpleString
kirr/wendelin.core@5dc9c791    X wcfs: Kill xdefer
kirr/wendelin.core@91e9eba8    X wcfs: test: Register tFile to tDB early
kirr/wendelin.core@a7138fef    X wcfs: mkdir /tmp/wcfs with sticky bit
kirr/wendelin.core@1eec76d0    X wcfs: try to set sticky for /tmp/wcfs even if the directory already exists
kirr/wendelin.core@c2c35851    X wcfs: tests: Factor-out waiting for a general condition to become true into waitfor
kirr/wendelin.core@78f36993    X wcfs: test: Fix thinko in getting /sys/fs/fuse/connection/<X> for wcfs
kirr/wendelin.core@bc9eb16f    X wcfs: tests: Don't use testmntpt everywhere
kirr/wendelin.core@6dec74e7    X wcfs: tests: Split tDB into -> tDB + tWCFS
kirr/wendelin.core@3a6bd764    X wcfs: tests: Run `fusermount -u` the second time if we had to kill wcfs
kirr/wendelin.core@112720f3    X wcfs: tests: Print which files are still opened on wcfs if `fusermount -u` fails
kirr/wendelin.core@bb40185b    X wcfs: Take $WENDELIN_CORE_WCFS_OPTIONS into account not only from under join
kirr/wendelin.core@03a9ef33    X wcfs: Remove credentials from zurl when computing wcfs mountpoint
kirr/wendelin.core@68ee5bdc    X wcfs: lsof tweaks
kirr/wendelin.core@21671879    X wcfs: Teach entrypoint frontend to handle subcommands: serve, status, stop
kirr/wendelin.core@b0642b80    X wcfs: Switch mountpoints from /tmp/wcfs/* to /dev/shm/*
kirr/wendelin.core@b0ca031f    X wcfs: Teach join/serve to start successfully even after unclean wcfs shutdown
kirr/wendelin.core@5bfa8cf8    X wcfs: Add start to spawn a Server that can be later stopped  (draft)
kirr/wendelin.core@5fcec261    X wcfs: Run fusermount and friends with /bin:/usr/bin always on path
kirr/wendelin.core@669d7a20    fixup! X wcfs: Run fusermount and friends with /bin:/usr/bin always on path
kirr/wendelin.core@6b22f8c4    X wcfs: Teach start to start successfully even after unclean wcfs shutdown
kirr/wendelin.core@15389db0    X wcfs: Tune _fuse_unmount to include `fusermount -u` error message into raised exception
kirr/wendelin.core@153c002a    X wcfs: _fuse_unmount: Try first `kill -TERM` before `kill -QUIT` wcfs
kirr/wendelin.core@3244f3a6    X wcfs: lsof +D misbehaves - don't use it
kirr/wendelin.core@a126e709    X wcfs: Put client log into its own logger
kirr/wendelin.core@ac303d1e    X wcfs: tests: -v  ->  show only wcfs.py logs verbosely
kirr/wendelin.core@d671a9e9    X wcfs: Give more time to stop wcfs server

e3f2ee2d

wcfs: Add zdata package to load ZBlk/ZBigFile data · 2c152d41

Kirill Smelkov authored Oct 22, 2021

Add functionality to load objects from ZODB as saved by py wendelin.core.
Mostly straightforward code.
The main part is in zblk.go .

Contrary to python implementation, go can load ZBlk1's subobjects in
parallel, which, given scalable ZODB storage, can be significantly
faster compared to serially loading all ZData subobjects as py code
does.
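
The difference between the two loading strategies can be sketched as follows.
This is only an illustration in Python using the standard library, not the code
from zblk.go: `load` stands for a hypothetical function that fetches one
subobject's bytes from storage, and the in-memory `storage` dict is a stand-in
for a real ZODB storage.

```python
from concurrent.futures import ThreadPoolExecutor


def load_serial(load, oids):
    """Load chunks one after another and concatenate them."""
    return b"".join(load(oid) for oid in oids)


def load_parallel(load, oids, workers=8):
    """Load chunks concurrently; pool.map keeps the results in input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return b"".join(pool.map(load, oids))


# hypothetical stand-in for a storage: oid -> chunk data
storage = {1: b"aa", 2: b"bb", 3: b"cc"}
data = load_parallel(storage.__getitem__, [1, 2, 3])
```

Both functions return the same bytes; the parallel variant only pays off when
the storage can serve several load requests at the same time.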

TODO test wrt data saved by Python3.

Some preliminary history:

kirr/wendelin.core@878b2787 X draft loading
kirr/wendelin.core@bf9a7405 X No longer rely on ZODB cache invariant for invalidations
kirr/wendelin.core@0d62b05e X Adjust to btree.VGet & friends signature change to include keycov in visit callback
kirr/wendelin.core@b74dda09 X Start switching Track from Track(key) to Track(keycov)

2c152d41

wcfs: Initial stub · 2163fcaf

Kirill Smelkov authored Oct 22, 2021

Add initial stub for WCFS program and tests.
WCFS functionality will be added step-by-step in follow-up commits.

Some preliminary history:

0ae88a32       X .nxdtest: Verify Go bits with GOMAXPROCS=1,2,`nproc`
23528eb4       X wcfs: make it to use go modules for dependencies

2163fcaf

lib/zodb: Teach zstor_2zurl about ZEO, NEO and Demo storages · a05db040

Kirill Smelkov authored Oct 28, 2021

In 6637d216 (lib/zodb: Add zstor_2zurl - way to convert a ZODB storage
into URL to access it) we added zstor_2zurl function to convert a ZODB
storage client object into a URL to access the storage. At that time
the function knew how to understand FileStorage only. Let's add support
for other storages that WCFS will need to support now.

NEO URI scheme matches the one currently used on ZODB/go side. It
semantically needs neoppod!18 to be also applied to NEO/py side, but we
do not care for now that that patch is not merged (yet, or forever)
because extracted ZURL is used only with WCFS which uses NEO/go.

NEO support also depends on custom patch to remember SSL credentials on
NEO Client:

kirr/neo@a2f192cb

Some preliminary history:

kirr/wendelin.core@5cb39463    fixup! X wcfs/zeo started to work locally
kirr/wendelin.core@1cf3b228    X zstor_2zurl += NEO
kirr/wendelin.core@7f8fa32a    X lib/zodb: zstor_2zurl += NEO/SSL support
kirr/wendelin.core@e26524df    X wcfs, lib/zodb: DemoStorage support

a05db040

25 Oct, 2021 8 commits

setup: Split virtmem into its own DSO · 622fb217

Kirill Smelkov authored Oct 22, 2021

Upcoming libwcfs (C++ part of WCFS client) will need to use virtmem code
and link to libvirtmem.

622fb217

setup: Add build dependency information · 9742fe7b

Kirill Smelkov authored Nov 03, 2020

Manually, because there is no automatic dependency tracking in
setuptools...

Dependency tracking is needed to avoid miscompilation after incremental
update under SlapOS/buildout/testnode/... when e.g. only .h was changed.

9742fe7b

setup: Teach cython to resolve `cimport wendelin.*` starting from top-level · 58f2af44

Kirill Smelkov authored Mar 02, 2020

This is similar to e870781d (Top-level in-tree import redirector) but
for upcoming pyx modules.

58f2af44

setup: Switch building of Python extensions to Pygolang · d95b6635

Kirill Smelkov authored Oct 22, 2021

Soon we are going to split virtmem code into its own DSO to which
bigfile extension will link. As plain setuptools does not support such
dynamic linking, we are going to use setuptools_dso instead. But more:
some of our upcoming extensions and DSOs will need to use Cython and C++
parts of Pygolang. Prepare for that and use Extensions and DSO from
golang.pyx.build to support that right from the start.

d95b6635

setup: Factor common code to build a py extension into Ext · 8e19af30

Kirill Smelkov authored Oct 22, 2021

Currently we have only one extension wendelin.bigfile._bigfile, but we
are going to add more, both Python extensions and non-Python DSOs. Start
preparing for that by factoring out common code.

8e19af30

fixup! lib/zodb: Add tests for critical ZODB properties that Wendelin.core 2 will depend on · c45c2de8

Kirill Smelkov authored Oct 22, 2021

    lib/tests/testprog/zloadrace.py:90:1 'ZODB.FileStorage.FileStorage' imported but unused

This amends commit c37a989d.

c45c2de8

bigfile/virtmem: Print traceback on segmentation fault · 25fa18f4

Kirill Smelkov authored Aug 11, 2021

Do what we can do without gdb and then tail to regular segmentation
fault. With core file gdb can still be used, but it is handy if we
already can get traceback of the crash into the log automatically.

TODO better use https://github.com/ianlancetaylor/libbacktrace because
backtrace_symbols often does not provide symbolic information.

We do not do this now because libbacktrace is not always automatically
installed.

25fa18f4

build: Force rebuild *.t programs · e7cea028

Kirill Smelkov authored Aug 16, 2021

This makes sure that those programs are always built afresh instead of
being stuck at an outdated build. This is needed because the corresponding
test .c file includes many other .c files and we don't implement dependency
tracking.

e7cea028

01 Apr, 2021 3 commits

tests: Reset transaction synchronizers before every test run · fe369d32

Kirill Smelkov authored Apr 01, 2021

Otherwise, e.g. after a failing test that closed its storage and DB but not
all Connections, another test, just by starting a new transaction, would
invoke synchronization on that unclosed connection, which would try to
access the closed storage and likely fail.

Fixes e.g. https://nexedijs.erp5.net/#/test_result_module/20210401-31B27B3D/5

Crash scenario is the same as described in 5a5ed2c7 (tests: Force-close
ZODB connections in teardown, that testing code forgot to explicitly
close). Only now we try to isolate tests from each other not only for
different modules, but also for tests inside the same module.

fe369d32

lib/zodb: Add tests for critical ZODB properties that Wendelin.core 2 will depend on · c37a989d

Kirill Smelkov authored Apr 01, 2021

The tests verify that there are no concurrency bugs around load,
Connection.open and invalidations. See e.g.

https://github.com/zopefoundation/ZODB/issues/290
https://github.com/zopefoundation/ZEO/issues/155

By including the tests into wendelin.core, we will have CI coverage for
all supported storages (FileStorage, ZEO, NEO), and for all supported
ZODB (currently ZODB4, ZODB4-wc2 and ZODB5).

ZEO5 is known to currently fail zloadrace.
However, even though ZODB#290 was fixed, ZEO5 turned out to also fail on zopenrace:

        def test_zodb_zopenrace():
            # exercises ZODB.Connection + particular storage implementation
    >       zopenrace.main()

    lib/tests/test_zodb.py:382:
    _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
    <decorator-gen-1>:2: in main
        ???
    ../../tools/go/pygolang/golang/__init__.py:103: in _
        return f(*argv, **kw)
    lib/tests/testprog/zopenrace.py:115: in main
        test(zstor)
    <decorator-gen-2>:2: in test
        ???
    ../../tools/go/pygolang/golang/__init__.py:103: in _
        return f(*argv, **kw)
    lib/tests/testprog/zopenrace.py:201: in test
        wg.wait()
    golang/_sync.pyx:246: in golang._sync.PyWorkGroup.wait
        ???
    golang/_sync.pyx:226: in golang._sync.PyWorkGroup.go.pyrunf
        ???
    lib/tests/testprog/zopenrace.py:165: in T1
        t1()
    _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

        def t1():
            transaction.begin()
            zconn = db.open()

            root = zconn.root()
            obj1 = root['obj1']
            obj2 = root['obj2']

            # obj1 - reload it from zstor
            # obj2 - get it from zconn cache
            obj1._p_invalidate()

            # both objects must have the same values
            i1 = obj1.i
            i2 = obj2.i
            if i1 != i2:
    >           raise AssertionError("T1: obj1.i (%d)  !=  obj2.i (%d)" % (i1, i2))
    E           AssertionError: T1: obj1.i (3)  !=  obj2.i (2)

    lib/tests/testprog/zopenrace.py:156: AssertionError

c37a989d

*: tests: don't hang on exception in non-main thread · 08e0c9fb

Kirill Smelkov authored Oct 11, 2020

Previously if an assert or something failed in spawned thread, the main
thread was usually spinning indefinitely = tests hang. -> Switch all
threading places to use sync.WorkGroup and this way if a thread fails,
all other threads are canceled and the exception is reported back to
wg.wait in main thread.

Since we start to go this route, NotifyChannel is reworked to fully use
channels instead of busy-waiting.
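
The behaviour described above (a failure in one thread cancels the others and
is reported back to the waiting thread) can be sketched with the Python standard
library alone. This is an analog for illustration, not pygolang's
sync.WorkGroup: `run_group`, `failing` and `waiting` are hypothetical names, and
cancellation is modelled with a `threading.Event` instead of a context.

```python
import threading
from concurrent.futures import ThreadPoolExecutor, FIRST_EXCEPTION, wait


def run_group(*funcs):
    """Run every f(cancel) in its own thread; re-raise a worker failure here."""
    cancel = threading.Event()
    with ThreadPoolExecutor(max_workers=len(funcs)) as pool:
        futures = [pool.submit(f, cancel) for f in funcs]
        # returns as soon as one worker raises, or when all of them finish
        done, _ = wait(futures, return_when=FIRST_EXCEPTION)
        errors = [f.exception() for f in done if f.exception() is not None]
        if errors:
            cancel.set()          # ask the remaining workers to stop
    # leaving the with-block waits for all workers to finish
    if errors:
        raise errors[0]


def failing(cancel):
    raise AssertionError("T1: check failed")


def waiting(cancel):
    # without cancellation this worker would keep the test blocked
    cancel.wait(10)


try:
    run_group(failing, waiting)
except AssertionError as e:
    reported = str(e)
```

The main thread no longer spins: the assertion raised in `failing` reaches the
caller of `run_group`, and `waiting` is released as soon as the failure is seen.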

08e0c9fb

26 Mar, 2021 1 commit

.gitignore += *_dsoinfo.py · 5f28b72c

Kirill Smelkov authored Mar 26, 2021

setuptools_dso 2 started to emit those autogenerated files.
See https://github.com/mdavidsaver/setuptools_dso/pull/15 for details.

5f28b72c

08 Mar, 2021 3 commits

tox: v↑ NEO (1.9 -> 1.12) · 95b012d3

Kirill Smelkov authored Mar 08, 2021

NEO 1.9 was released in 2018 and is outdated by now. NEO 1.12 is
currently the latest NEO release.

95b012d3

Require Zodbtools · d62a297c

Kirill Smelkov authored Mar 08, 2021

After switching to ZODB >= 4 in the previous commit, we can safely
require zodbtools, because there is now no conflict between
ZODB3/ZODB eggs.

d62a297c

Drop support for ZODB3 · 0802da2b

Kirill Smelkov authored Mar 08, 2021

It's been a while since last ZODB3 3.10.7 release in 2016 and the last
commit in upstream ZODB3 repository (3.10 branch) is from 2017. The
world switched since then to ZODB4 and to ZODB5 after that.

We were still requiring ZODB3, because ZODB3 3.11 egg was just a
dependency on newer ZODB, ZEO, BTrees and persistent; and this way we
could be supporting all ZODB3.10.x and ZODB4 and ZODB5 via ZODB3.11.

However upcoming Wendelin.core 2, for its proper working, needs MVCC
semantic as implemented in ZODB5. This forces us, even for ZODB4, to
backport non-trivial bits from ZODB5 (see [1]). Maintaining ZODB3
support at this point becomes impractical, because, to our knowledge,
there is no wendelin.core user that plans to continue using ZODB3
without switching to at least ZODB4 in the near future.

So goodbye ZODB3. Even though ZODB still stays with us, it gives a
feeling similar to [2], because in 2014, when I was myself learning
ZODB, it was through ZODB3 - still at the time when all ZODB bits were
living together in one place.

[1] nexedi/ZODB!1
[2] https://lists.osuosl.org/pipermail/darcs-users/2008-September/014095.html

0802da2b

11 Dec, 2020 1 commit

tests: Don't try to access db.storage when automatically closing connections · fd6b5252

Kirill Smelkov authored Dec 11, 2020

DB.close() does `del self.storage`.

https://github.com/zopefoundation/ZODB/blob/5.6.0-14-g0eae10cd0/src/ZODB/DB.py#L646

This way if DB was closed, but some conn(s) were not, it will crash in
teardown as e.g. below:

    _____________ ERROR at teardown of test_bigfile_zblk1_zdata_reuse ______________

        def teardown_module():
    >       testdb.teardown()

    bigfile/tests/test_filezodb.py:58:
    _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

    self = <wendelin.lib.testing.TestDB_ZEO object at 0x7fb9c0216350>

        def teardown(self):
            # close connections that test code forgot to close
            for connref, tb in self.connv:
                conn = connref()
                if conn is None:
                    continue
                if not conn.opened:
                    continue    # still alive, but closed
                print("W: testdb: teardown: %s left not closed by test code"
                      "; opened by:\n%s" % (conn, tb), file=sys.stderr)

                db = conn.db()
    >           stor = db.storage
    E           AttributeError: 'DB' object has no attribute 'storage'

    lib/testing.py:217: AttributeError

The fix is simple - don't use db.storage at all, because it is not actually used in that code.

fd6b5252

17 Nov, 2020 2 commits

bigfile/py: Garbage-collect BigFile <=> BigFileH cycles · a6a8f5ba

Kirill Smelkov authored Nov 08, 2020

Since ZBigFile keeps references to fileh objects that are created
through it, it forms a file <=> fileh cycle that is not collected without
cyclic GC:

https://lab.nexedi.com/nexedi/wendelin.core/blob/v0.13-52-ga702d41/bigfile/file_zodb.py#L497
https://lab.nexedi.com/nexedi/wendelin.core/blob/v0.13-52-ga702d41/bigfile/file_zodb.py#L566-571

We did not notice this leak until now because it is small, but with
upcoming wendelin.core 2 it is important to release a fileh, because
there is WCFS connection associated with fileh, and if fileh is not
released, that connection also stays alive, keeping on-WCFS resources
still being used, and preventing WCFS from being unmounted cleanly.

-> Add cyclic GC support to PyBigFile / PyBigFileH

NOTE: we still don't allow PyVMA <=> PyBigFileH cycles to be collected,
because fileh_close called from fileh.__del__ asserts that there are no
live mappings left. See added comments for details. There is no
known practical need to use such cycles, so this should be ok.
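
The file <=> fileh cycle described above can be reproduced in pure Python to
show why reference counting alone does not release it. This is a minimal sketch
under assumptions: `File` and `FileH` are hypothetical stand-ins, not the real
ZBigFile or PyBigFileH types, and the observed behaviour relies on CPython's
reference counting.

```python
import gc
import weakref


class File(object):
    def __init__(self):
        self.filehs = []            # file -> fileh references

    def fileh_open(self):
        fileh = FileH(self)
        self.filehs.append(fileh)
        return fileh


class FileH(object):
    def __init__(self, file):
        self.file = file            # fileh -> file reference closes the cycle


def demo():
    """Return (alive without cyclic GC, released after a cyclic GC run)."""
    was_enabled = gc.isenabled()
    gc.disable()                    # leave only reference counting at work
    try:
        f = File()
        fh = f.fileh_open()
        ref = weakref.ref(fh)
        del f, fh                   # drop all external references
        leaked = ref() is not None  # the cycle keeps fileh alive
        gc.collect()                # an explicit cyclic GC run breaks the cycle
        released = ref() is None
    finally:
        if was_enabled:
            gc.enable()
    return leaked, released
```

Pure-Python classes take part in cyclic GC automatically; types implemented in
C, such as the ones this patch touches, need explicit support for it.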

See also other patches on cyclic GC topic:

- 450ad804 (bigarray: ArrayRef support for BigArray)  // adds cyclic GC support for PyVMA
- d97641d2 (bigfile/py: Properly untrack PyVMA from GC before dealloc)

/proposed-for-review-on nexedi/wendelin.core!12

a6a8f5ba

bigfile/py: Move PyVMA's support for cyclic GC close to pyvma_dealloc · 7cc35422

Kirill Smelkov authored Nov 09, 2020

The logic in pyvma_traverse and pyvma_clear needs to be synchronized
with PyVMA deallocation. In the next patch we'll be amending this
logic, and keeping all those functions together will help the reader.

For reference: PyVMA support for cyclic GC was introduced in
450ad804 (bigarray: ArrayRef support for BigArray). See also d97641d2
(bigfile/py: Properly untrack PyVMA from GC before dealloc).

/proposed-for-review-on nexedi/wendelin.core!12

7cc35422

03 Nov, 2020 2 commits

t/tfault-run: Require bash · a702d410

Kirill Smelkov authored Nov 03, 2020

Otherwise when /bin/sh is dash it fails with

    t/tfault-run: 35: test: on_pagefault: unexpected operator

a702d410

t/tfault-run: Clear state from previous run before starting · cf92dfca

Kirill Smelkov authored Nov 03, 2020

Otherwise, if previous test.fault failed, tfault-run fails to start, e.g.

    >>> test.fault
    $ make test.fault # MAKEFLAGS=-j1
    x86_64-linux-gnu-gcc -pthread -g -Wall -D_GNU_SOURCE -std=gnu99 -fplan9-extensions -Wno-declaration-after-statement -Wno-error=declaration-after-statement  -Iinclude -I3rdparty/ccan -I3rdparty/include   bigfile/tests/tfault.c lib/bug.c lib/utils.c 3rdparty/ccan/ccan/tap/tap.c  -o bigfile/tests/tfault.t
    t/tfault-run bigfile/tests/tfault.t faultr on_pagefault
    mkdir: cannot create directory ‘t/tfault-run.faultr’: File exists
    Makefile:186: recipe for target 'faultr.tfault' failed
    make: *** [faultr.tfault] Error 1
    rm bigfile/tests/tfault.t
    error   test.fault      0.433s  # 1t 1e 0f 0s

cf92dfca

02 Nov, 2020 1 commit

Add way to run tests via nxdtest · 19de3fe2

Kirill Smelkov authored Oct 15, 2020

Nxdtest[1] is tox-like tool to run tests under Nexedi testing
infrastructure. See [2] for details.

[1] https://lab.nexedi.com/nexedi/nxdtest
[2] nexedi/slapos!839

19de3fe2

11 Sep, 2020 1 commit

3rdparty/ccan: Update for build fix wrt recent gcc/binutils · 35cb1446

Kirill Smelkov authored Sep 11, 2020

We need the following patch of mine:

   http://git.ozlabs.org/?p=ccan;a=commitdiff;h=b97c7f0841f5173a07a2571f2c99f944d8405a90

35cb1446

17 May, 2020 2 commits

lib/zodb: Fix typos · b195812c

Kirill Smelkov authored May 17, 2020

b195812c

lib/zodb: Try to clarify zconn_at intent · 50a42028

Kirill Smelkov authored May 17, 2020

It is hard for people to understand the current wording, so let's expand
zconn_at description to additionally explain what it is providing with a
second set of words, which, hopefully, lowers potential ambiguity a bit.

/reported-by @jwolf083

50a42028