Although NEO is considered ready for production use in most cases, there are
a few bugs to be aware of, because they concern basic features of ZODB (marked
with Z) or promised features of NEO (marked with N).

All the listed bugs will be fixed with high priority.

(N) A backup cell may be wrongly marked as corrupted while checking replicas
----------------------------------------------------------------------------

This happens in the following conditions:

1. a backup cluster starts to check replicas while a cell is outdated
2. this cell becomes up-to-date, but only up to a tid smaller than the max tid
   to check (this can't happen for a non-backup cluster)
3. the cluster actually starts to check the related partition
4. the cell is checked completely before it can replicate up to the max tid
   to check
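
The sequence above can be illustrated with a small, self-contained toy model. This is only a sketch: ``Cell``, ``check_partition`` and the sample data are hypothetical names invented for the illustration and are not part of NEO's code base. It shows how comparing a cell against the full range up to the max tid reports a mismatch when the cell has only replicated part of that range, even though the data it holds are correct.

```python
from dataclasses import dataclass, field


@dataclass
class Cell:
    """Toy stand-in for a backup cell (hypothetical, not NEO's class)."""
    name: str
    last_tid: int = 0                      # highest tid replicated so far
    data: dict = field(default_factory=dict)


def check_partition(reference, cell, max_tid):
    """Compare the cell with the reference for all tids <= max_tid.

    Returns True when both sides hold the same data in that range.
    """
    expected = {tid: v for tid, v in reference.items() if tid <= max_tid}
    actual = {tid: v for tid, v in cell.data.items() if tid <= max_tid}
    return expected == actual


reference = {1: "a", 2: "b", 3: "c"}

# Step 1: a check up to tid 3 is scheduled while the cell is outdated.
max_tid = 3
cell = Cell("backup-1")

# Step 2: the cell catches up, but only to tid 2.
cell.data = {1: "a", 2: "b"}
cell.last_tid = 2

# Steps 3-4: the check completes before tid 3 is replicated.
wrongly_corrupted = not check_partition(reference, cell, max_tid)

# For comparison only: restricted to what the cell has replicated,
# the same data compare equal.
consistent_so_far = check_partition(reference, cell, cell.last_tid)
```

In this model ``wrongly_corrupted`` is true although nothing the cell holds is wrong; the comparison bounded by ``last_tid`` is shown only to make that point, not as a description of how NEO is or will be fixed.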

Sometimes, it causes the master to crash::

    File "neo/lib/handler.py", line 72, in dispatch
      method(conn, *args, **kw)
    File "neo/master/handlers/storage.py", line 93, in notifyReplicationDone
      cell_list = app.backup_app.notifyReplicationDone(node, offset, tid)
    File "neo/master/backup_app.py", line 337, in notifyReplicationDone
      assert cell.isReadable()

Workaround: make sure all cells are up-to-date before checking replicas.
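
As a rough sketch of this workaround, an operator script could refuse to start a check while any cell is not up-to-date. The function name, the mapping layout and the state strings below are assumptions made for the example; how the cell states are actually obtained from a running cluster is left out.

```python
def cells_blocking_check(cell_states):
    """Return the cells that are not up-to-date, sorted.

    `cell_states` is a hypothetical mapping of
    (partition offset, node name) -> state string.
    An empty result means it should be safe to start checking replicas.
    """
    return sorted(
        cell for cell, state in cell_states.items()
        if state != "UP_TO_DATE"
    )


# Example snapshot (made-up data).
snapshot = {
    (0, "storage-1"): "UP_TO_DATE",
    (0, "storage-2"): "OUT_OF_DATE",
    (1, "storage-1"): "UP_TO_DATE",
}
blocking = cells_blocking_check(snapshot)
if blocking:
    print("Do not check replicas yet; waiting for:", blocking)
```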

Found by running testBackupNodeLost many times:

- either a failureException: 12 != 11
- or the above assert failure, in which case the unit test freezes