• Kirill Smelkov's avatar
    [ZODB4] Backport the way MVCC is handled from ZODB5 · 8e7eab33
    Kirill Smelkov authored
    This backports to ZODB4 Connection ZODB5's approach to handle MVCC via
    always calling storage.loadBefore() instead of "load for latest version
    + loadBefore if we were notified of database changes" approach.
    
    Why?
    ----
    
    Short answer: because Wendelin.core 2 needs to know at which particular
    database state application-level ZODB connection is viewing the
    database, and it is hard to implement such functionality correctly
    without this backport. Please see appendix for the explanation.
    
    What
    ----
    
    This backports to ZODB4 the minimum necessary part of upstream commit 227953b9
    (Simplify MVCC by determining transaction start time using lastTransaction) +
    follow-up correctness fixes:
    
    https://github.com/zopefoundation/ZODB/issues/50
    https://github.com/zopefoundation/ZODB/pull/56
    https://github.com/zopefoundation/ZODB/pull/291
    https://github.com/zopefoundation/ZODB/pull/307
    
    In short:
    
    - a Connection is always opened with explicitly corresponding to a particular database revision
    - Connection uses only loadBefore with that revision to load objects
    - every time a Connection is (re)opened, the result of queued invalidations and
      explicit query to storage.lastTransaction is carefully merged to refresh
      Connection's idea about which database state it corresponds to.
    
    The "careful" in last point is important. Historically ZODB5 was first reworked
    in commit 227953b9 (https://github.com/zopefoundation/ZODB/pull/56) to always
    call lastTransaction to refresh state of Connection view. Since there
    was no proper synchronisation with respect to process of handling
    invalidations, that lead to data corruption issue due to race in
    Connection.open() vs invalidations:
    
    https://github.com/zopefoundation/ZODB/issues/290
    
    That race and data corruption was fixed in commit b5895a5c
    (https://github.com/zopefoundation/ZODB/pull/291) by way of avoiding
    lastTransaction call and relying only on invalidations channel when
    refreshing Connection view.
    
    This fix in turn led to another data corruption issue because in
    presence of client-server reconnections, ZODB storage drivers can partly
    skip notifying client with detailed invalidation messages:
    
    https://github.com/zopefoundation/ZODB/pull/291#issuecomment-581047061
    
    A fix to that issue (https://github.com/zopefoundation/ZODB/pull/307)
    proposed to change back to query storage for lastTransaction on every
    Connection refresh, but to implement careful merging of lastTransaction
    result and data from invalidation channel. However it was found that the
    "careful merging" can work correctly only if we require from storage
    drivers a particular ordering of invalidation events wrt lastTransaction
    return and result:
    
    https://github.com/zopefoundation/ZODB/pull/307#discussion_r434145034
    
    While ZEO was already complying with that requirements, NEO had to be
    fixed to support that:
    
    https://github.com/zopefoundation/ZODB/pull/307#discussion_r434166238
    neoppod@a7d101ec
    neoppod@96a5c01f
    
    Patch details
    -------------
    
    We change Connection._txn_time to be a "before" for the database state
    to which Connection view corresponds. This state is hooked to be
    initialized and updated in Connection._flush_invalidations - the
    function that is called from both explicit Connection (re)open and at
    transaction boundaries via Connection.afterCompletion hook.
    
    Objects loading is factored into Connection._load which replaces old
    "load + check invalidated + fallback to loadBefore" game in
    Connection._setstate.
    
    Connection.open now calls Connection._flush_invalidations
    unconditionally - even if it was global cache reset event - because
    besides invalidation flushes the latter is now responsible for querying
    storage lastTransaction.
    
    TmpStore - a "storage" that provides runtime support for savepoints - is
    refactored correspondingly to delegate loading of original objects back
    to underlying Connection.
    
    DB.close is modified - similarly to ZODB5 - to release DB's Connections
    carefully with preventing connections from DB poll from implicitly
    starting new transactions via afterCompletion hook.
    
    ZODB.nxd_patches is introduced to indicate to client software that this
    particular patch is present and can be relied upon.
    
    Tests are updated correspondingly. In 227953b9 Jim talks about
    converting many tests - because
    
    	"Lots of tests didn't clean up databases and connections properly"
    
    and because new MVCC approach
    
    	"makes database and connection hygiene a bit more important,
    	especially for tests, because a connection will continue to interact
    	with storages if it isn't properly closed, which can lead to errors if
    	the storage is closed."
    
    but finally implementing automatic cleanup at transaction boundaries
    because there are too many tests to fix. We backport only automatic
    cleanup + necessary explicit test fixes to keep the diff minimal.
    
    All tests pass. This includes tests for ZODB itself, ZEO and NEO test
    over hereby modified ZODB(*), my test programs from
    
    https://github.com/zopefoundation/ZODB/issues/290	and
    https://github.com/zopefoundation/ZEO/issues/155
    
    and ERP5 tests. Upcoming wendelin.core 2 also work with this change.
    
    (*) ZEO, NEO and ERP5 tests fail sometimes, but there is no regression
    here because ZEO, NEO and ERP5 tests are failing regularly, and in the
    same way, even with unmodified ZODB.
    
    Appendix. zconn_at
    ------------------
    
    This appendix provides motivation for the backport:
    
    For wendelin.core v2 we need a way to know at which particular database
    state application-level ZODB connection is viewing the database. Knowing
    that state, WCFS client library interacts with WCFS filesystem server
    and, in simple terms, requests the server to provide data as of that
    particular database state. Let us call the function that for a client
    ZODB connection returns database state corresponding to its database
    view zconn_at.
    
    Now here is the problem: zconn_at is relatively easy to implement for
    ZODB5 - see e.g. here:
    
    https://lab.nexedi.com/nexedi/wendelin.core/blob/v0.13-54-ga6a8f5b/lib/zodb.py#L142-181
    wendelin.core@3bd82127
    
    however, for ZODB4, since its operational models is not
    directly MVCC, it is not that straightforward. Still, even for older
    ZODB4, for every client connection, there _is_ such at that corresponds
    to that connection view of the database.
    
    We need ZODB4 support, because ZODB4 is currently the version that
    Nexedi uses, and my understanding is that it will stay like this for not
    a small time. I have the feeling that ZODB5 was reworked in better
    direction, but without caring enough about quality which resulted in
    concurrency bugs with data corruption effects like
    
    https://github.com/zopefoundation/ZODB/issues/290
    https://github.com/zopefoundation/ZEO/issues/155
    etc.
    
    Even though the first one is now fixed (but it broke other parts and so
    both ZODB had to be fixed again _and_ NEO had to be fixed for that ZODB
    fix to work currently), I feel that upgrading into ZODB5 for Nexedi will
    require some non-negligible amount of QA work, and thus it is better if
    we move step-by-step - even if we eventually upgrade to ZODB5 - it is
    better we first migrate wendelin.core 1 -> wendelin.core 2 with keeping
    current version of ZODB.
    
    Now please note what would happen if zconn_at gives, even a bit, wrong
    answer: wcfs client will ask wcfs server to provide array data as of
    different database state compared to current on-client ZODB connection.
    This will result in that data accessed via ZBigArray will _not_
    correspond to all other data accessed via regular ZODB mechanism.
    It is, in other words, a data corruptions.
    In some scenarios it can be harmless, but generally it is the worst
    that can happen to a database.
    
    It is good to keep in mind ZODB issue290 when imagining corner cases
    that zconn_at has to deal with. Even though that issue is ZODB5 only, it
    shows what kind of bugs it can be in zconn_at implementation for ZODB4.
    
    Just for the reference: in Wendelin-based systems there is usually constant
    stream of database ingestions coming from many places simultaneously. Plus many
    activities crunching on the DB at the same time as well. And the more clients a
    system handles, the more there will be level-of-concurrency increase. This
    means that the problem of correctly handling concurrency issues in zconn_at is
    not purely theoretical, but has direct relation to our systems.
    
    --------
    
    With this backport, zconn_at for ZODB4 becomes trivial and robust to implement:
    
    https://lab.nexedi.com/kirr/wendelin.core/blob/484071b3/lib/zodb.py#L183-195
    
    I would like to thank Joshua Wölfel whose internship helped this topic
    to shape up:
    
    https://www.erp5.com/project_section/wendelin-ia-project/forum/Joshua-internship-D8b7NNhWfz
    
    /cc @nexedi, @jwolf083Signed-off-by: Kirill Smelkov's avatarKirill Smelkov <kirr@nexedi.com>
    8e7eab33
DB.py 35.4 KB