  1. 31 Mar, 2017 2 commits
    • Fix race when tweak touches partitions that are being reported as replicated · 87c5178b
      Julien Muchembled authored
      The bug could lead to data corruption (if a partition is wrongly marked as
      UP_TO_DATE) or crashes (assertion failure on either the storage or the master).
      
      The protocol is extended to handle the following scenario:
      
          S                                    M
          partition 0 outdated
            <-- UnfinishedTransactions ------>
          replication of partition 0 ...
          partition 1 outdated
            --- UnfinishedTransactions ...
          ... replication finished
            --- ReplicationDone ...
                                               tweak
            <-- partition 1 discarded --------
                                               tweak
            <-- partition 1 outdated ---------
                ... UnfinishedTransactions -->
                ... ReplicationDone --------->
      
      The master can't simply mark all outdated cells as being updatable when it
      receives an UnfinishedTransactions packet.
    • Forbid read-accesses to cells that are actually non-readable · 64afd7d2
      Julien Muchembled authored
      After an attempt to read from a non-readable cell, which happens when a client has
      a newer or older PT than the storage's, the client now retries the read.
      
      This bugfix is for all kinds of read-access except undoLog, which can still
      report incomplete results.
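
The retry behaviour described above can be pictured with a minimal sketch. All names here (NonReadableCell, read_with_retry, refresh_partition_table) are hypothetical and not taken from NEO; refreshing the partition table before retrying is an assumption, since the commit message only says that the client retries.

```python
class NonReadableCell(Exception):
    """Hypothetical error raised when a storage refuses a read on a non-readable cell."""


def read_with_retry(read, refresh_partition_table, max_attempts=3):
    """Call read() until it succeeds, refreshing the PT after each refusal."""
    for _ in range(max_attempts):
        try:
            return read()
        except NonReadableCell:
            # Assumption: the PTs disagreed, so update the client's view before retrying.
            refresh_partition_table()
    raise NonReadableCell("cell still not readable after %d attempts" % max_attempts)


# Usage: the first attempt hits a non-readable cell, the second one succeeds.
attempts = []


def fake_read():
    attempts.append(1)
    if len(attempts) == 1:
        raise NonReadableCell()
    return b"object data"


refreshes = []
result = read_with_retry(fake_read, lambda: refreshes.append(1))
```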
  2. 23 Mar, 2017 1 commit
  3. 18 Mar, 2017 1 commit
    • master: fix crash when a transaction begins while a storage node starts operation · 781b4eb5
      Julien Muchembled authored
      Traceback (most recent call last):
        ...
        File "neo/lib/handler.py", line 72, in dispatch
          method(conn, *args, **kw)
        File "neo/master/handlers/client.py", line 70, in askFinishTransaction
          conn.getPeerId(),
        File "neo/master/transactions.py", line 387, in prepare
          assert node_list, (ready, failed)
      AssertionError: (set([]), frozenset([]))
      
      Master log leading to the crash:
        PACKET    #0x0009 StartOperation                 > S1
        PACKET    #0x0004 BeginTransaction               < C1
        DEBUG     Begin <...>
        PACKET    #0x0004 AnswerBeginTransaction         > C1
        PACKET    #0x0001 NotifyReady                    < S1
      
      It was wrong to process BeginTransaction before receiving NotifyReady.
      
      The changes in the storage are cosmetic: the 'ready' attribute has become
      redundant with 'operational'.
  4. 21 Feb, 2017 1 commit
    • Implement deadlock avoidance · 092992db
      Julien Muchembled authored
      This is a first version with several optimizations possible:
      - improve EventQueue (or implement a specific queue) to minimize deadlocks
      - turn the RebaseObject packet into a notification
      
      Sorting oids could also be useful to reduce the probability of deadlocks,
      but that would never be enough to avoid them completely, even if there's a
      single storage. For example:
      
      1. C1 does a first store (x or y)
      2. C2 stores x and y; one is delayed
      3. C1 stores the other -> deadlock
         When solving the deadlock, the data of the first store may only
         exist on the storage.
      
      2 functional tests are removed because they're redundant,
      either with ZODB tests or with the new threaded tests.
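
The three-step example above can be replayed with a toy lock table, to show why sorting oids is not enough: C1's second store is not known when its first one is issued. This is only an illustrative sketch under assumed names (LockTable, find_cycle); it is not how NEO tracks write locks.

```python
def find_cycle(waits_for):
    """Return True if the wait-for graph (client -> client it waits for) has a cycle."""
    for start in waits_for:
        seen = set()
        node = start
        while node in waits_for:
            if node in seen:
                return True
            seen.add(node)
            node = waits_for[node]
    return False


class LockTable:
    """Toy model of a single storage: one write lock per oid."""

    def __init__(self):
        self.owner = {}      # oid -> client holding the write lock
        self.waits_for = {}  # client -> client it is blocked on

    def store(self, client, oid):
        holder = self.owner.setdefault(oid, client)
        if holder != client:
            # The store is delayed until the holder releases the lock.
            self.waits_for[client] = holder
            return False
        return True


locks = LockTable()
locks.store("C1", "x")          # 1. C1 does a first store (here: x)
c2_x = locks.store("C2", "x")   # 2. C2 stores x and y in sorted order; x is delayed
c2_y = locks.store("C2", "y")   #    ... and y is granted
c1_y = locks.store("C1", "y")   # 3. C1 stores the other oid and is delayed too
deadlocked = find_cycle(locks.waits_for)
```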
  5. 14 Feb, 2017 1 commit
  6. 18 Jan, 2017 1 commit
  7. 04 Jan, 2017 1 commit
    • qa: rewrite testReplicationBlockedByUnfinished as a threaded test · d3cb8888
      Julien Muchembled authored
      It is extended to check that the storage is only notified about the
      transactions that existed at the time it asked for them. Otherwise,
      Replicator.transactionFinished would be called more than once, and
      `self.ttid_set.remove(ttid)` would raise KeyError.
      
      The functional version also contained an annoying 'sleep(10)'.
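
The KeyError mentioned above is easy to reproduce with a toy stand-in. ToyReplicator below is hypothetical; it only mirrors the `self.ttid_set.remove(ttid)` call quoted in the message, and returning whether the set is empty is an added convenience for the demonstration.

```python
class ToyReplicator:
    """Toy stand-in that only tracks the transactions known at request time."""

    def __init__(self, ttid_list):
        self.ttid_set = set(ttid_list)

    def transactionFinished(self, ttid):
        # set.remove raises KeyError for an element that is not in the set.
        self.ttid_set.remove(ttid)
        return not self.ttid_set  # True once nothing blocks replication anymore


replicator = ToyReplicator(["ttid-1"])
unblocked = replicator.transactionFinished("ttid-1")

# A notification about a transaction that began after the request must not arrive.
try:
    replicator.transactionFinished("ttid-2")
    raised = False
except KeyError:
    raised = True
```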
  8. 22 Dec, 2016 1 commit
  9. 29 Nov, 2016 1 commit
  10. 27 Nov, 2016 2 commits
  11. 21 Mar, 2016 1 commit
    • master: do never abort a prepared transaction · 7ee7ff4e
      Julien Muchembled authored
      This fixes the following crash (for example when a client disconnects during
      tpc_finish):
      
      Traceback (most recent call last):
        ...
        File "neo/master/handlers/storage.py", line 68, in answerInformationLocked
          self.app.tm.lock(ttid, conn.getUUID())
        File "neo/master/transactions.py", line 338, in lock
          if self._ttid_dict[ttid].lock(uuid) and self._queue[0][1] == ttid:
      IndexError: list index out of range
  12. 25 Jan, 2016 1 commit
  13. 01 Dec, 2015 1 commit
    • Safer DB truncation, new 'truncate' ctl command · d3c8b76d
      Julien Muchembled authored
      With the previous commit, the request to truncate the DB was not stored
      persistently, which means that this operation was still vulnerable to the case
      where the master is restarted after some nodes, but not all, have already
      truncated. The master didn't have the information to fix this and the result
      was a DB partially truncated.
      
      -> On a Truncate packet, a storage node only stores the tid somewhere, to send
         it back to the master, which stays in RECOVERING state as long as any node
         has a different value than that of the node with the latest partition table.
      
      We also want to make sure that there is no unfinished data, because a user may
      truncate at a tid higher than a locked one.
      
      -> Truncation is now effective at the end of the VERIFYING phase, just before
         returning the last ids to the master.
      
      Finally, all nodes should be truncated, so that an offline node cannot come back
      with a different history. Currently, this would not be an issue since
      replication always restarts from the beginning, but later we'd like nodes to
      remember where they stopped replicating.
      
      -> If a truncation is requested, the master waits for all nodes to be pending,
         even if it was previously started (the user can still force the cluster to
         start with neoctl). And any lost node during verification also causes the
         master to go back to recovery.
      
      Obviously, the protocol has been changed to split the LastIDs packet and
      introduce a new Recovery packet, since it does not make sense anymore to ask
      for last ids during recovery.
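
The rule stated for the Truncate packet (stay in RECOVERING while any node reports a different value than the node with the latest partition table) could be sketched as follows. The function name and the (ptid, truncate_tid) tuple layout are assumptions made for illustration, not NEO's data structures.

```python
def must_stay_in_recovery(nodes):
    """Decide whether the master has to remain in RECOVERING state.

    nodes: list of (ptid, truncate_tid) pairs, one per connected storage node.
    truncate_tid is None when the node has no pending truncation.
    """
    # The node with the latest partition table is the reference.
    _, reference = max(nodes, key=lambda node: node[0])
    return any(truncate_tid != reference for _, truncate_tid in nodes)


# One node has not stored the truncation request yet: keep recovering.
partially_informed = must_stay_in_recovery([(5, 100), (4, None)])
# Every node reports the same value: recovery may proceed.
all_agree = must_stay_in_recovery([(5, 100), (4, 100)])
```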
  14. 30 Nov, 2015 2 commits
    • Perform DB truncation during recovery, send PT to storages before verification · 3e3eab5b
      Julien Muchembled authored
      Currently, the database may only be truncated when leaving backup mode, but
      the issue will be the same when neoctl gets a new command to truncate at an
      arbitrary tid: we want to be sure that all nodes are truncated before anything
      else.
      
      Therefore, we stop sending Truncate orders before stopping operation because
      nodes could fail/exit before actually processing them. Truncation must also
      happen before asking nodes their last ids.
      
      With this commit, if a truncation is requested:
      - this is always the first thing done when a storage node connects to the
        primary master during the RECOVERING phase,
      - and the cluster does not start automatically if there are missing nodes,
        unless an admin forces it.
      
      Other changes:
      - Connections to storage nodes don't need to be aborted anymore when leaving
        backup mode.
      - The master always initiates communication when a storage node identifies,
        which simplifies code and reduces the number of exchanged packets.
    • Go back/stay in RECOVERING state when the partition table can't be operational · e1f9a7da
      Julien Muchembled authored
      This fixes several cases where the partition table could become corrupt and
      the whole cluster could get stuck in VERIFYING state.
      
      This also reduces the probability of having out-of-date cells when restarting
      several storage nodes simultaneously.
      
      Finally, if a master node becomes primary again, a cluster must not be started
      automatically if nodes with readable cells are missing, in order to avoid
      a split of the database. This could happen if this master node was previously
      forced to start it.
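
The last condition above (no automatic start while nodes with readable cells are missing) might be checked along these lines. This is a hedged sketch: the dict-based partition table and the function name are hypothetical simplifications, not NEO's PartitionTable API.

```python
def can_start_automatically(partition_table, pending_nodes):
    """Return True only if every node holding a readable cell is present.

    partition_table: dict mapping partition -> {node: readable (bool)}.
    pending_nodes: set of storage nodes currently connected to the master.
    """
    for cells in partition_table.values():
        for node, readable in cells.items():
            if readable and node not in pending_nodes:
                return False
    return True


pt = {
    0: {"S1": True, "S2": False},
    1: {"S2": True},
}
with_all_nodes = can_start_automatically(pt, {"S1", "S2"})
without_s2 = can_start_automatically(pt, {"S1"})
```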
  15. 21 May, 2015 1 commit
  16. 07 Jan, 2014 1 commit
  17. 21 Aug, 2012 2 commits
  18. 20 Aug, 2012 3 commits
    • Drop some unused imports. · 4fcd8ddc
      Vincent Pelletier authored
    • neoctl: new 'print ids' command · 95216790
      Julien Muchembled authored
    • More bugfixes to backup mode · 08742377
      Julien Muchembled authored
      - catch OperationFailure
      - reset transaction manager when leaving backup mode
      - send appropriate target tid to a storage that updates an outdated cell
      - clean up partition table when leaving BACKINGUP state unexpectedly
      - make sure all readable cells of a partition have the same 'backup_tid'
        if they have the same data, so that we know when internal replication is
        finished when leaving backup mode
      - fix storage not finishing internal replication when leaving backup mode
  19. 20 Mar, 2012 1 commit
  20. 13 Mar, 2012 1 commit
  21. 12 Mar, 2012 1 commit
    • New feature to check that partitions are replicated properly · 04f72a4c
      Julien Muchembled authored
      This includes an API change of Node.isIdentified, which now tells whether
      identification packets have been exchanged or not.
      All handlers must be updated to implement '_acceptIdentification' instead of
      overriding EventHandler.acceptIdentification: this patch only does it for
      StorageOperationHandler
  22. 24 Feb, 2012 1 commit
    • Implements backup using specialised storage nodes and relying on replication · 8e3c7b01
      Julien Muchembled authored
      Replication is also fully reimplemented:
      - It is not done anymore on whole partitions.
      - It runs at lowest priority so as not to degrade performance for client nodes.
      
      Schema of MySQL table is changed to optimize storage layout: rows are now
      grouped by age, for good partial replication performance.
      This certainly also speeds up simple loads/stores.
  23. 10 Jan, 2012 1 commit
  24. 26 Oct, 2011 1 commit
  25. 08 Feb, 2011 1 commit
  26. 17 Jan, 2011 1 commit
  27. 11 Jan, 2011 1 commit
    • Master transaction manager use TTID as index. · 2c3bea29
      Grégory Wisniewski authored
      - AnswerInformationLocked gives the ttid instead of the tid
      - Master transaction manager always uses ttid in data structures
      - It no longer makes sense to check if the tid is greater than the last
        generated one, as it never comes back from a storage; just check that the
        ttid is known by the transaction manager.
      - Rename all tid variables that now hold a ttid
      - Transaction manager's queue contains ttids, but the corresponding tids are
        increasing to keep commit order.
      - Adjust tests
      
      git-svn-id: https://svn.erp5.org/repos/neo/trunk@2613 71dcc9de-d417-0410-9af5-da40c76e7ee4
  28. 22 Dec, 2010 1 commit
  29. 14 Dec, 2010 2 commits
  30. 07 Dec, 2010 1 commit
  31. 08 Nov, 2010 2 commits
  32. 05 Nov, 2010 1 commit
    • Ignore some requests, based on connection state. · 07b48079
      Vincent Pelletier authored
      Some requests can be safely ignored when received over a closed connection.
      This was previously done explicitly in handlers, but it turns out it would
      cause a lot of code duplication. Instead, define the policy on a packet
      type basis, and apply it to all packets upon reception, before passing them
      to the handler.
      Also, protect request handlers when they respond, as connection might be
      closed.
      
      git-svn-id: https://svn.erp5.org/repos/neo/trunk@2419 71dcc9de-d417-0410-9af5-da40c76e7ee4
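
A per-packet-type policy of the kind described in this message could look like the sketch below. The class names, the attribute name and the choice of which packet is ignorable are all hypothetical; they are not NEO's actual packet definitions.

```python
class Packet:
    # Default policy: the packet is processed even if the connection is closed.
    ignore_on_closed_connection = False


class ReadRequest(Packet):
    # Hypothetical request whose answer is useless once the peer is gone.
    ignore_on_closed_connection = True


class LockNotification(Packet):
    # Hypothetical packet that must always reach its handler.
    pass


class Connection:
    def __init__(self, closed=False):
        self.closed = closed


def receive(conn, packet, handler):
    """Apply the per-type policy once, before dispatching to the handler."""
    if conn.closed and packet.ignore_on_closed_connection:
        return None
    return handler(conn, packet)


handled = []


def handler(conn, packet):
    handled.append(type(packet).__name__)
    return True


receive(Connection(closed=True), ReadRequest(), handler)       # dropped
receive(Connection(closed=True), LockNotification(), handler)  # still handled
receive(Connection(closed=False), ReadRequest(), handler)      # handled
```

Keeping the flag on the packet class means handlers need no per-request checks, which is the duplication the message says it wanted to avoid.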