Commits · 8e7eab330154be8d3203692a1064bfdae0989ee0 · nexedi / ZODB

30 Nov, 2020 1 commit

[ZODB4] Backport the way MVCC is handled from ZODB5 · 8e7eab33

Kirill Smelkov authored Nov 29, 2020

This backports to ZODB4 Connection ZODB5's approach to handle MVCC via
always calling storage.loadBefore() instead of "load for latest version
+ loadBefore if we were notified of database changes" approach.

Why?
----

Short answer: because Wendelin.core 2 needs to know at which particular
database state application-level ZODB connection is viewing the
database, and it is hard to implement such functionality correctly
without this backport. Please see appendix for the explanation.

What
----

This backports to ZODB4 the minimum necessary part of upstream commit 227953b9
(Simplify MVCC by determining transaction start time using lastTransaction) +
follow-up correctness fixes:

https://github.com/zopefoundation/ZODB/issues/50
https://github.com/zopefoundation/ZODB/pull/56
https://github.com/zopefoundation/ZODB/pull/291
https://github.com/zopefoundation/ZODB/pull/307

In short:

- a Connection is always opened with explicitly corresponding to a particular database revision
- Connection uses only loadBefore with that revision to load objects
- every time a Connection is (re)opened, the result of queued invalidations and
  explicit query to storage.lastTransaction is carefully merged to refresh
  Connection's idea about which database state it corresponds to.

The "careful" in last point is important. Historically ZODB5 was first reworked
in commit 227953b9 (https://github.com/zopefoundation/ZODB/pull/56) to always
call lastTransaction to refresh state of Connection view. Since there
was no proper synchronisation with respect to process of handling
invalidations, that lead to data corruption issue due to race in
Connection.open() vs invalidations:

https://github.com/zopefoundation/ZODB/issues/290

That race and data corruption was fixed in commit b5895a5c
(https://github.com/zopefoundation/ZODB/pull/291) by way of avoiding
lastTransaction call and relying only on invalidations channel when
refreshing Connection view.

This fix in turn led to another data corruption issue because in
presence of client-server reconnections, ZODB storage drivers can partly
skip notifying client with detailed invalidation messages:

https://github.com/zopefoundation/ZODB/pull/291#issuecomment-581047061

A fix to that issue (https://github.com/zopefoundation/ZODB/pull/307)
proposed to change back to query storage for lastTransaction on every
Connection refresh, but to implement careful merging of lastTransaction
result and data from invalidation channel. However it was found that the
"careful merging" can work correctly only if we require from storage
drivers a particular ordering of invalidation events wrt lastTransaction
return and result:

https://github.com/zopefoundation/ZODB/pull/307#discussion_r434145034

While ZEO was already complying with that requirements, NEO had to be
fixed to support that:

https://github.com/zopefoundation/ZODB/pull/307#discussion_r434166238
neoppod@a7d101ec
neoppod@96a5c01f

Patch details
-------------

We change Connection._txn_time to be a "before" for the database state
to which Connection view corresponds. This state is hooked to be
initialized and updated in Connection._flush_invalidations - the
function that is called from both explicit Connection (re)open and at
transaction boundaries via Connection.afterCompletion hook.

Objects loading is factored into Connection._load which replaces old
"load + check invalidated + fallback to loadBefore" game in
Connection._setstate.

Connection.open now calls Connection._flush_invalidations
unconditionally - even if it was global cache reset event - because
besides invalidation flushes the latter is now responsible for querying
storage lastTransaction.

TmpStore - a "storage" that provides runtime support for savepoints - is
refactored correspondingly to delegate loading of original objects back
to underlying Connection.

DB.close is modified - similarly to ZODB5 - to release DB's Connections
carefully with preventing connections from DB poll from implicitly
starting new transactions via afterCompletion hook.

ZODB.nxd_patches is introduced to indicate to client software that this
particular patch is present and can be relied upon.

Tests are updated correspondingly. In 227953b9 Jim talks about
converting many tests - because

	"Lots of tests didn't clean up databases and connections properly"

and because new MVCC approach

	"makes database and connection hygiene a bit more important,
	especially for tests, because a connection will continue to interact
	with storages if it isn't properly closed, which can lead to errors if
	the storage is closed."

but finally implementing automatic cleanup at transaction boundaries
because there are too many tests to fix. We backport only automatic
cleanup + necessary explicit test fixes to keep the diff minimal.

All tests pass. This includes tests for ZODB itself, ZEO and NEO test
over hereby modified ZODB(*), my test programs from

https://github.com/zopefoundation/ZODB/issues/290	and
https://github.com/zopefoundation/ZEO/issues/155

and ERP5 tests. Upcoming wendelin.core 2 also work with this change.

(*) ZEO, NEO and ERP5 tests fail sometimes, but there is no regression
here because ZEO, NEO and ERP5 tests are failing regularly, and in the
same way, even with unmodified ZODB.

Appendix. zconn_at
------------------

This appendix provides motivation for the backport:

For wendelin.core v2 we need a way to know at which particular database
state application-level ZODB connection is viewing the database. Knowing
that state, WCFS client library interacts with WCFS filesystem server
and, in simple terms, requests the server to provide data as of that
particular database state. Let us call the function that for a client
ZODB connection returns database state corresponding to its database
view zconn_at.

Now here is the problem: zconn_at is relatively easy to implement for
ZODB5 - see e.g. here:

https://lab.nexedi.com/nexedi/wendelin.core/blob/v0.13-54-ga6a8f5b/lib/zodb.py#L142-181
wendelin.core@3bd82127

however, for ZODB4, since its operational models is not
directly MVCC, it is not that straightforward. Still, even for older
ZODB4, for every client connection, there _is_ such at that corresponds
to that connection view of the database.

We need ZODB4 support, because ZODB4 is currently the version that
Nexedi uses, and my understanding is that it will stay like this for not
a small time. I have the feeling that ZODB5 was reworked in better
direction, but without caring enough about quality which resulted in
concurrency bugs with data corruption effects like

https://github.com/zopefoundation/ZODB/issues/290
https://github.com/zopefoundation/ZEO/issues/155
etc.

Even though the first one is now fixed (but it broke other parts and so
both ZODB had to be fixed again _and_ NEO had to be fixed for that ZODB
fix to work currently), I feel that upgrading into ZODB5 for Nexedi will
require some non-negligible amount of QA work, and thus it is better if
we move step-by-step - even if we eventually upgrade to ZODB5 - it is
better we first migrate wendelin.core 1 -> wendelin.core 2 with keeping
current version of ZODB.

Now please note what would happen if zconn_at gives, even a bit, wrong
answer: wcfs client will ask wcfs server to provide array data as of
different database state compared to current on-client ZODB connection.
This will result in that data accessed via ZBigArray will _not_
correspond to all other data accessed via regular ZODB mechanism.
It is, in other words, a data corruptions.
In some scenarios it can be harmless, but generally it is the worst
that can happen to a database.

It is good to keep in mind ZODB issue290 when imagining corner cases
that zconn_at has to deal with. Even though that issue is ZODB5 only, it
shows what kind of bugs it can be in zconn_at implementation for ZODB4.

Just for the reference: in Wendelin-based systems there is usually constant
stream of database ingestions coming from many places simultaneously. Plus many
activities crunching on the DB at the same time as well. And the more clients a
system handles, the more there will be level-of-concurrency increase. This
means that the problem of correctly handling concurrency issues in zconn_at is
not purely theoretical, but has direct relation to our systems.

--------

With this backport, zconn_at for ZODB4 becomes trivial and robust to implement:

https://lab.nexedi.com/kirr/wendelin.core/blob/484071b3/lib/zodb.py#L183-195

I would like to thank Joshua Wölfel whose internship helped this topic
to shape up:

https://www.erp5.com/project_section/wendelin-ia-project/forum/Joshua-internship-D8b7NNhWfz

/cc @nexedi, @jwolf083Signed-off-by: Kirill Smelkov <kirr@nexedi.com>

8e7eab33

20 Nov, 2020 2 commits

[ZODB4] IStorage: Require lastTransaction() to invalidate DB before returning ... · 40116375

Kirill Smelkov authored Jun 08, 2020

... and to provide correct "current" view for load().

This is ZODB4 backport of https://github.com/zopefoundation/ZODB/pull/313,
which itself is just a more explicit language of
https://github.com/zopefoundation/ZODB/commit/4a6b0283#diff-881ceb274f9e538d4144950eefce8682R685-R695

It has been amended with ZODB4-specific

      It is guaranteed that after lastTransaction returns, "current" view of
      the storage as observed by load() is ≥ returned tid.

because on ZODB<5 - contrary to ZODB5 - there is "load current" for
which correct semantic must be also provided:
https://github.com/zopefoundation/ZODB/pull/307#discussion_r436662348

Original description follows:

---- 8< ----

Because if lastTransaction() returns tid, for which local database
handle has not yet been updated with invalidations, it could lead to
data corruption due to concurrency issues similar to
https://github.com/zopefoundation/ZODB/issues/290:

- DB refreshes a Connection for new transaction;
- zstor.lastTransaction() is called to obtain database view for this connection.
- objects in live-cache for this Connection are invalidated with
  invalidations that were queued through DB.invalidate() calls from
  storage.
- if lastTransaction does not guarantee that all DB invalidations for
  transactions with ID ≤ returned tid have been completed, it can be
  that:

	incomplete set of objects are invalidated in live cache

  i.e. data corruption.

This particular data corruption has been hit when working on core of
ZODB and was not immediately noticed:

https://github.com/zopefoundation/ZODB/pull/307#pullrequestreview-423017996

this fact justifies the importance of explicitly stating what IStorage
guarantees are / must be in the interface.

This guarantee

- already holds for FileStorage (no database mutations from outside of
  single process);
- is already true for ZEO4 and ZEO5
  https://github.com/zopefoundation/ZODB/pull/307#pullrequestreview-423017996
  https://github.com/zopefoundation/ZODB/pull/307#discussion_r434166238
- holds for RelStorage because it implements IMVCCStorage natively;
- is *not* currently true for NEO because NEO sets zstor.last_tid before
  calling DB.invalidate:

  https://lab.nexedi.com/nexedi/neoppod/blob/fc58c089/neo/client/handlers/master.py#L109-124

However NEO is willing to change and already prepared the fix to provide
this guarantee because of data corruption scenario that can happen
without it:

  https://github.com/zopefoundation/ZODB/pull/307#discussion_r436662348
  neoppod@a7d101ec
  neoppod@a7d101ec (comment 112238)
  neoppod@a7d101ec (comment 113331)

In other words all storages that, to my knowledge, are in current use
are either already providing specified semantic, or will be shortly
fixed to provide it.

This way we can fix up the interface and make the semantic clear.

/cc @jamadden, @vpelletier, @arnaud-fontaine, @jwolf083, @klawlf82, @gitzit, @jimfulton

40116375

[ZODB4] setup: tests_require += zope.testrunner · 998c8f86

Kirill Smelkov authored Nov 20, 2020

Zope.testrunner is used to run ZODB tests (see e.g. tox.ini). It was
specified as explicit dependency in tox.ini->deps and in buildout.cfg.
However if in SlapOS we install just ZODB[test] zope.testrunner won't be
installed.

-> Fix is by moving zope.testrunner dependency into setup, so that
ZODB[test] installs it. This is the same as what ZODB5 does:

https://github.com/zopefoundation/ZODB/blob/5.6.0-14-g0eae10cd0/setup.py#L48

998c8f86

31 Jul, 2020 1 commit

[ZODB4] Start of 4-nxd branch · d644e63b

Kirill Smelkov authored Jul 31, 2020

Upstream considers 4 flavour to be "dead":

https://github.com/zopefoundation/ZEO/pull/161#pullrequestreview-447245642

Let's keep on 4-nxd branch in our fork until ZODB4 is used on our side.

d644e63b

31 Mar, 2020 1 commit

FileStorage: Save committed transaction to disk even if changed data is empty · fdf9e7a2

Kirill Smelkov authored Mar 13, 2020

[ This is ZODB4 backport of commit bb9bf539
(https://github.com/zopefoundation/ZODB/pull/298) ]

ZODB tries to avoid saving empty transactions to storage on
`transaction.commit()`. The way it works is: if no objects were changed
during ongoing transaction, ZODB.Connection does not join current
TransactionManager, and transaction.commit() performs two-phase commit
protocol only on joined DataManagers. In other words if no objects were
changed, no tpc_*() methods are called at all on ZODB.Connection at
transaction.commit() time.

This way application servers like Zope/ZServer/ERP5/... can have
something as

try:
# process incoming request
transaction.commit() # processed ok
except:
transaction.abort()
# problem: log + reraise

in top-level code to process requests without creating many on-disk
transactions with empty data changes just because read-only requests
were served.

Everything is working as intended.

However at storage level, FileStorage currently also checks whether
transaction that is being committed also comes with empty data changes,
and _skips_ saving transaction into disk *at all* for such cases, even
if it has been explicitly told to commit the transaction via two-phase
commit protocol calls done at storage level.

This creates the situation, where contrary to promise in
ZODB/interfaces.py(*), after successful tpc_begin/tpc_vote/tpc_finish()
calls made at storage level, transaction is _not_ made permanent,
despite tid of "committed" transaction being returned to caller. In other
words FileStorage, when asked to commit a transaction, even if one with
empty data changes, reports "ok" and gives transaction ID to the caller,
without creating corresponding transaction record on disk.

This behaviour is

a) redundant to application-level avoidance to create empty transaction
on storage described in the beginning, and

b) creates problems:

The first problem is that application that works at storage-level might
be interested in persisting transaction, even with empty changes to
data, just because it wants to save the metadata similarly to e.g.
`git commit --allow-empty`.

The other problem is that an application view and data in database
become inconsistent: an application is told that a transaction was
created with corresponding transaction ID, but if the storage is
actually inspected, e.g. by iteration, the transaction is not there.
This, in particular, can create problems if TID of committed transaction
is reported elsewhere and that second database client does not find the
transaction it was told should exist.

I hit this particular problem with wendelin.core. In wendelin.core,
there is custom virtual memory layer that keeps memory in sync with
data in ZODB. At commit time, the memory is inspected for being dirtied,
and if a page was changed, virtual memory layer joins current
transaction _and_ forces corresponding ZODB.Connection - via which it
will be saving data into ZODB objects - to join the transaction too,
because it would be too late to join ZODB.Connection after 2PC process
has begun(+). One of the format in which data are saved tries to
optimize disk space usage, and it actually might happen, that even if
data in RAM were dirtied, the data itself stayed the same and so nothing
should be saved into ZODB. However ZODB.Connection is already joined
into transaction and it is hard not to join it because joining a
DataManager when the 2PC is already ongoing does not work.

This used to work ok with wendelin.core 1, but with wendelin.core 2 -
where separate virtual filesystem is also connected to the database to
provide base layer for arrays mappings - this creates problem, because
when wcfs (the filesystem) is told to synchronize to view the database
@tid of committed transaction, it can wait forever waiting for that, or
later, transaction to appear on disk in the database, creating
application-level deadlock.

I agree that some more effort might be made at wendelin.core side to
avoid committing transactions with empty data at storage level.

However the most clean way to fix this problem in my view is to fix
FileStorage itself, because if at storage level it was asked to commit
something, it should not silently skip doing so and dropping even non-empty
metadata + returning ok and committed transaction ID to the caller.

As described in the beginning this should not create problems for
application-level ZODB users, while at storage-level the implementation
is now consistently matching interface and common sense.

----

(*) tpc_finish: Finish the transaction, making any transaction changes permanent.
Changes must be made permanent at this point.
...

https://github.com/zopefoundation/ZODB/blob/5.5.1-35-gb5895a5c2/src/ZODB/interfaces.py#L828-L831

(+) https://lab.nexedi.com/kirr/wendelin.core/blob/9ff5ed32/bigfile/file_zodb.py#L788-822

fdf9e7a2

30 Apr, 2018 1 commit
- Appveyor (#157) including a fix for Python 3.6 · 7a1a4911
  Jim Fulton authored Apr 09, 2017
```
Tests running and passing on windows.

(cherry picked from commit 87fd29eb)
```
  7a1a4911
08 Apr, 2017 2 commits
- Fix date in changelog · 5b6c92a2
  Julien Muchembled authored Apr 08, 2017
  
  5b6c92a2
- Changelog for PR #153 · c9f750f3
  Jim Fulton authored Apr 08, 2017
```
This backports the following commits:
  6d6730574934d4ffa08b6afccb4d3bde6cf46194
  244bb92b06770da76805696a6c0ebd815bf67bc9
```
  c9f750f3
03 Apr, 2017 1 commit

FileStorage: Report problem on read-only open of non-existent file · 56c96a11

Kirill Smelkov authored Apr 02, 2017

... instead of silently creating empty database on such opens.

Use-case for this are utilities like e.g. zodbdump and zodbcmp which
expect such storage opens to fail so that the tool can know there is no
such storage and report it to user.

In contrast current state is: read-only opens get created-on-the-fly
empty storage with no content, but which can be iterated over without
getting any error.

This way e.g. `zodbdump non-existent.fs` produces empty output _and_
exit code 0 which is not what caller expects.

(cherry picked from commit 30bbabf19e837b4ca9e35fab5d5c1278b078fe45)

56c96a11

06 Feb, 2017 1 commit
- 4.4.5 · 60ce5795
  Julien Muchembled authored Feb 06, 2017
  
  60ce5795
02 Feb, 2017 5 commits

Changelog for PR #136 · 7c0d963f
Julien Muchembled authored Jan 14, 2017

7c0d963f

Fix deadlock with storages that "sync" on a new transaction · 9821696f

Julien Muchembled authored Jan 13, 2017

This backports a change from commit 227953b977a9e195c4ce9bbb9acd9c5ee60c333a.

NEO, as well as ZEO+server_sync (ERP5 backports this feature with a
monkey-patch), pings the server (primary master node in the case of NEO) on
new transactions. However, this round-trip is actually performed by the thread
that also does tasks requiring to lock the DB, like processing of invalidations.

Since transaction 1.6.1 (more precisely commit e581a120a6), IStorage.sync()
is called indirectly by DB.open() when a transaction has already begun,
and the DB must not be locked when this happens.

9821696f

Don't manipulate Connection state after it has been returned to the pool. · 465b3502

Jason Madden authored Feb 01, 2017

Doing so leads to race conditions.

In particular, there can be an AttributeError.

See https://github.com/zodb/zodbshootout/issues/26 for details.

(cherry picked from commit f8cf23ec)

465b3502

Don't require persistent at setup time. We don't build native code that needs... · 98612054

Jason Madden authored Jan 28, 2017

Don't require persistent at setup time. We don't build native code that needs those headers anymore. Fixes #119.

(cherry picked from commit d7dae8b1)

98612054

use buildout to build so we don't have 2 environments to manage · 6a7d20fc

Jim Fulton authored Sep 08, 2016

This is also a workaround for pypa/setuptools#864, since the default pypy3 on
Travis is quite old.

(cherry picked from commit def73970b7873a4460719b4466e97e935e55bc9a
                       and 5dcca55e2834d570f66b59f792578cc112187f27)

6a7d20fc

27 Nov, 2016 2 commits
- 4.4.4 · ee291e45
  Jim Fulton authored Nov 27, 2016
  
  ee291e45
- Merge pull request #134 from zopefoundation/transaction-203 · d268d538
  Jim Fulton authored Nov 27, 2016
```
Fixed to work with transaction 2.0.3.
```
  d268d538
25 Nov, 2016 1 commit
- Fixed to work with transaction 2.0.3. · 9a348a89
  Jim Fulton authored Nov 25, 2016
  
  9a348a89
27 Sep, 2016 2 commits

Merge pull request #121 from zopefoundation/4-simplify-README · 977e24bf
Jim Fulton authored Sep 27, 2016
```
Simplify the README file to avoid out of date information
```
977e24bf

Simplify the README file to avoid out of date information · 77f2ff75

Jim Fulton authored Sep 27, 2016

We have the same information scattered around in different places,
which increases the chance that some of it will be out of date.

See the end of:
https://groups.google.com/forum/#!topic/zodb/fy6RRVAF9-s

I want to try to avoid duplicating zodb.org.

77f2ff75

12 Sep, 2016 2 commits
- Merge pull request #116 from NextThought/114-for-4 · f618bffd
  Jim Fulton authored Sep 12, 2016
```
Clear Connection.transaction_manager on close. Fixes #114 for ZODB 4.
```
  f618bffd
- Clear Connection.transaction_manager on close. Fixes #114 · 760e22a9
  Jason Madden authored Sep 12, 2016
  
  760e22a9
09 Sep, 2016 1 commit
- Merge pull request #98 from zopefoundation/issue97 · 27f3a173
  Jim Fulton authored Sep 09, 2016
```
Call _p_resolveConflict() even if a conflicting change doesn't change the state
```
  27f3a173
21 Aug, 2016 2 commits
- Changelog for PR #98 · 7d436f39
  Julien Muchembled authored Aug 20, 2016
  
  7d436f39
- Call _p_resolveConflict() even if a conflicting change doesn't change the state · b74eef76
  Julien Muchembled authored Aug 20, 2016
```
This reverts to the behaviour of 3.10.3 and older.
```
  b74eef76
04 Aug, 2016 1 commit
- 4.4.3 · b53c7019
  Jim Fulton authored Aug 04, 2016
  
  b53c7019
27 Jul, 2016 2 commits
- Merge pull request #96 from zopefoundation/checkTransactionalUndoIterator · a2da8235
  Jim Fulton authored Jul 27, 2016
```
checkTransactionalUndoIterator: do not expect iterator to return sorted oids
```
  a2da8235
- checkTransactionalUndoIterator: do not expect iterator to return sorted oids · 9d418f12
  Julien Muchembled authored Jul 27, 2016
  
  9d418f12
26 Jul, 2016 1 commit
- Merge pull request #88 from zopefoundation/fstail-offset · 37e445a5
  Jim Fulton authored Jul 26, 2016
```
fstail: print the txn offset and header size, instead of only the data offset
```
  37e445a5
13 Jul, 2016 1 commit
- Changelog for PR #88 · 4392f902
  Julien Muchembled authored Jul 13, 2016
  
  4392f902
12 Jul, 2016 8 commits

fstail: print the txn offset and header size, instead of only the data offset · 3807ace8

Julien Muchembled authored Jul 10, 2016

Before:

    2016-07-01 09:41:50.416574: hash=d7101c5ee7b8e412d7b6d54873204421e09b7f34
    user='' description='' length=1629 offset=58990284

After:

    2016-07-01 09:41:50.416574: hash=d7101c5ee7b8e412d7b6d54873204421e09b7f34
    user='' description='' length=1629 offset=58990261 (+23)

The structure of a FileStorage DB is such that it's easy to revert the last
transactions, by truncating the file at the right offset. With the above
change, `fstail` can now be used to get this offset.

In the above example:

    truncate -s 58990261 Data.fs

would delete the transaction and all those after.

3807ace8

Merge pull request #89 from zopefoundation/undo-refactor · 75bae1a6
Jim Fulton authored Jul 12, 2016
```
Refactored FileStorage transactional undo
```
75bae1a6
removed out of date comment · b563487e
Jim Fulton authored Jul 12, 2016

b563487e
Merge pull request #86 from NextThought/handle-serials4 · e080bdcc
Jim Fulton authored Jul 12, 2016
```
Fix handle_all_serials for the new and old protocols.
```
e080bdcc
Update comment. [skip ci] · c13649da
Jason Madden authored Jul 12, 2016

c13649da
changes · d64a0cbf
Jim Fulton authored Jul 12, 2016

d64a0cbf

Refactored FileStorage transactional undo · d717a685

Jim Fulton authored Jul 12, 2016

As part of a project to provide object-level commit locks for ZEO, I'm
refactiring FileStorage to maintain transaction-specific data in
Tranaction.data.  This involved undo.  In trying to figure this out, I
found:

- A bug in _undoDataInfo, which I verified with some tests and

- _transactionalUndoRecord was maddeningly difficult to reason about
  (and thus change).

I was concerned less by the bug than my inability to know whether a
change to the code would be correct.

So I refactored the code, mainly transactionalUndoRecord, to make the
code easier to understand, fixing some logic errors (I'm pretty sure)
along the way.  This included lots of comments. (Comments are much
easier to compose when you're working out logic you didn't
understand.)

In addition to makeing the code cleaner, it allows undo to be handled
in cases that weren't handled before.

d717a685

Long lines. Grrrr. · a6c1713d
Jim Fulton authored Jul 12, 2016

a6c1713d

09 Jul, 2016 1 commit
- Fix handle_all_serials for the new and old protocols. · 87748b15
  Jason Madden authored Jul 09, 2016
  
  87748b15
08 Jul, 2016 1 commit
- 4.4.2 · 45831d69
  Jim Fulton authored Jul 08, 2016
  
  45831d69