• Kirill Smelkov's avatar
    FileStorage: Save committed transaction to disk even if changed data is empty · fdf9e7a2
    Kirill Smelkov authored
    [ This is ZODB4 backport of commit bb9bf539
      (https://github.com/zopefoundation/ZODB/pull/298) ]
    
    ZODB tries to avoid saving empty transactions to storage on
    `transaction.commit()`. The way it works is: if no objects were changed
    during ongoing transaction, ZODB.Connection does not join current
    TransactionManager, and transaction.commit() performs two-phase commit
    protocol only on joined DataManagers. In other words if no objects were
    changed, no tpc_*() methods are called at all on ZODB.Connection at
    transaction.commit() time.
    
    This way application servers like Zope/ZServer/ERP5/... can have
    something as
    
        try:
            # process incoming request
            transaction.commit()    # processed ok
        except:
            transaction.abort()
            # problem: log + reraise
    
    in top-level code to process requests without creating many on-disk
    transactions with empty data changes just because read-only requests
    were served.
    
    Everything is working as intended.
    
    However at storage level, FileStorage currently also checks whether
    transaction that is being committed also comes with empty data changes,
    and _skips_ saving transaction into disk *at all* for such cases, even
    if it has been explicitly told to commit the transaction via two-phase
    commit protocol calls done at storage level.
    
    This creates the situation, where contrary to promise in
    ZODB/interfaces.py(*), after successful tpc_begin/tpc_vote/tpc_finish()
    calls made at storage level, transaction is _not_ made permanent,
    despite tid of "committed" transaction being returned to caller. In other
    words FileStorage, when asked to commit a transaction, even if one with
    empty data changes, reports "ok" and gives transaction ID to the caller,
    without creating corresponding transaction record on disk.
    
    This behaviour is
    
    a) redundant to application-level avoidance to create empty transaction
       on storage described in the beginning, and
    
    b) creates problems:
    
    The first problem is that application that works at storage-level might
    be interested in persisting transaction, even with empty changes to
    data, just because it wants to save the metadata similarly to e.g.
    `git commit --allow-empty`.
    
    The other problem is that an application view and data in database
    become inconsistent: an application is told that a transaction was
    created with corresponding transaction ID, but if the storage is
    actually inspected, e.g. by iteration, the transaction is not there.
    This, in particular, can create problems if TID of committed transaction
    is reported elsewhere and that second database client does not find the
    transaction it was told should exist.
    
    I hit this particular problem with wendelin.core. In wendelin.core,
    there is custom virtual memory layer that keeps memory in sync with
    data in ZODB. At commit time, the memory is inspected for being dirtied,
    and if a page was changed, virtual memory layer joins current
    transaction _and_ forces corresponding ZODB.Connection - via which it
    will be saving data into ZODB objects - to join the transaction too,
    because it would be too late to join ZODB.Connection after 2PC process
    has begun(+). One of the format in which data are saved tries to
    optimize disk space usage, and it actually might happen, that even if
    data in RAM were dirtied, the data itself stayed the same and so nothing
    should be saved into ZODB. However ZODB.Connection is already joined
    into transaction and it is hard not to join it because joining a
    DataManager when the 2PC is already ongoing does not work.
    
    This used to work ok with wendelin.core 1, but with wendelin.core 2 -
    where separate virtual filesystem is also connected to the database to
    provide base layer for arrays mappings - this creates problem, because
    when wcfs (the filesystem) is told to synchronize to view the database
    @tid of committed transaction, it can wait forever waiting for that, or
    later, transaction to appear on disk in the database, creating
    application-level deadlock.
    
    I agree that some more effort might be made at wendelin.core side to
    avoid committing transactions with empty data at storage level.
    
    However the most clean way to fix this problem in my view is to fix
    FileStorage itself, because if at storage level it was asked to commit
    something, it should not silently skip doing so and dropping even non-empty
    metadata + returning ok and committed transaction ID to the caller.
    
    As described in the beginning this should not create problems for
    application-level ZODB users, while at storage-level the implementation
    is now consistently matching interface and common sense.
    
    ----
    
    (*) tpc_finish: Finish the transaction, making any transaction changes permanent.
        Changes must be made permanent at this point.
        ...
    
        https://github.com/zopefoundation/ZODB/blob/5.5.1-35-gb5895a5c2/src/ZODB/interfaces.py#L828-L831
    
    (+) https://lab.nexedi.com/kirr/wendelin.core/blob/9ff5ed32/bigfile/file_zodb.py#L788-822
    fdf9e7a2
FileStorage.py 72.2 KB