BUGS.rst 1.41 KB
Julien Muchembled committed
Although NEO is considered ready for production use in most cases, there are
a few bugs to be aware of, because they concern basic features of ZODB (marked with Z),
or promised features of NEO (marked with N).

All the listed bugs will be fixed with high priority.

(N) A backup cell may be wrongly marked as corrupted while checking replicas

This happens in the following conditions:

1. a backup cluster starts to check replicas while a cell is outdated
2. this cell gets updated, but only up to a tid smaller than the max tid
   to check (this can't happen for a non-backup cluster)
3. the cluster actually starts to check the related partition
4. the cell is checked completely before it can replicate up to the max tid
   to check

Sometimes, it causes the master to crash::

    File "neo/lib/", line 72, in dispatch
      method(conn, *args, **kw)
    File "neo/master/handlers/", line 93, in notifyReplicationDone
      cell_list = app.backup_app.notifyReplicationDone(node, offset, tid)
    File "neo/master/", line 337, in notifyReplicationDone
      assert cell.isReadable()
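
As a rough illustration only (a toy model, not NEO code), the four conditions
can be pictured as a check that compares a cell against the reference up to
the max tid before the cell has replicated that far. The names
``check_replica`` and ``scenario`` and the small integer tids are hypothetical
and chosen for the example; the real check works on partitions and is more
involved.

```python
def check_replica(reference_tids, cell_tids, max_tid):
    """Naively compare both sides up to max_tid (hypothetical helper)."""
    expected = [t for t in reference_tids if t <= max_tid]
    found = [t for t in cell_tids if t <= max_tid]
    return "ok" if found == expected else "corrupted"


def scenario(replicate_fully_before_check):
    """Replay the conditions with made-up tids 1..5."""
    reference = [1, 2, 3, 4, 5]
    max_tid = 5          # 1. check requested while the cell is outdated
    cell = [1, 2, 3]     # 2. cell updated, but only up to tid 3 < max_tid
    if replicate_fully_before_check:
        cell = list(reference)  # the cell catches up before being checked
    # 3. and 4. the partition is checked with whatever the cell holds now
    return check_replica(reference, cell, max_tid)


print(scenario(False))  # corrupted
print(scenario(True))   # ok
```

In this simplified picture the cell holds no bad data; it is only behind, and
the check misreads the missing transactions as corruption.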

Workaround: make sure all cells are up-to-date before checking replicas.

Found by running testBackupNodeLost many times:

- either a failureException: 12 != 11
- or the above assert failure, in which case the unit test freezes