1. 27 Apr, 2019 2 commits
    • Julien Muchembled's avatar
      Make the number of replicas modifiable when the cluster is running · ef5fc508
      Julien Muchembled authored
      neoctl gets a new command to change the number of replicas.
      
      The number of replicas becomes a new partition table attribute and
      like the PT id, it is stored in the config table. On the other side,
      the configuration value for the number of partitions is dropped,
      since it can be computed from the partition table, which is
      always stored in full.
      
      The -p/-r master options now only apply at database creation.
      
      Some implementation notes:
      
      - The protocol is slightly optimized in that the master now sends
        automatically the whole partition tables to the admin & client
        nodes upon connection, like for storage nodes.
        This makes the protocol more consistent, and the master is the
        only remaining node requesting partition tables, during recovery.
      
      - Some parts become tricky because app.pt can be None in more cases.
        For example, the extra condition in NodeManager.update
        (before app.pt.dropNode) was added for this is the reason.
        Or the 'loadPartitionTable' method (storage) that is not inlined
        because of unit tests.
        Overall, this commit simplifies more than it complicates.
      
      - In the master handlers, we stop hijacking the 'connectionCompleted'
        method for tasks to be performed (often send the full partition
        table) on handler switches.
      
      - The admin's 'bootstrapped' flag could have been removed earlier:
        race conditions can't happen since the AskNodeInformation packet
        was removed (commit d048a52d).
      ef5fc508
    • Julien Muchembled's avatar
      New --new-nid storage option for fast cloning · 27e3f620
      Julien Muchembled authored
      It is often faster to set up replicas by stopping a node (and any
      underlying database server like MariaDB) and do a raw copy of the
      database (e.g. with rsync). So far, it required to stop the whole
      cluster and use tools like 'mysql' or sqlite3' to edit:
      - the 'pt' table in databases,
      - the 'config.nid' values of the new nodes.
      
      With this new option, if you already have 1 replica, you can set up
      new replicas with such fast raw copy, and without interruption of
      service. Obviously, this implies less redundancy during the operation.
      27e3f620
  2. 21 Mar, 2019 1 commit
  3. 11 Mar, 2019 1 commit
  4. 05 Dec, 2018 1 commit
  5. 07 Aug, 2018 1 commit
    • Julien Muchembled's avatar
      Use argparse instead of optparse · 9f1e4eef
      Julien Muchembled authored
      Besides the use of another module for option parsing, the main change is that
      there's no more Config class that mixes configuration for different components.
      Application classes now takes a simple 'dict' with parsed values.
      
      The changes in 'neoctl' are somewhat ugly, because command-line options are not
      defined on the command-line class, but this component is likely to disappear
      in the future.
      
      It remains possible to pass options via a configuration file. The code is a bit
      complex but isolated in neo.lib.config
      
      For SSL, the code may be simpler if we change for a single --ssl option that
      takes 3 paths. Not done to not break compatibility. Hence, the hack with
      an extra OptionList class in neo.lib.app
      
      A new functional test tests the 'neomigrate' script, instead of just the
      internal API to migrate data.
      9f1e4eef
  6. 22 Jun, 2018 1 commit
    • Julien Muchembled's avatar
      Maximize resiliency by taking into account the topology of storage nodes · 97af23cc
      Julien Muchembled authored
      This commit adds a contraint when tweaking the partition table with replicas,
      so that cells of each partition are assigned as far as possible from each
      other, e.g. not on the same machine even if each one has several disks, and
      in any case not on the same storage device.
      
      Currently, the topology path of each node is automatically calculated by the
      storage backend. Both MySQL and SQLite return a 2-tuple (host, st_dev).
      To be improved:
      - Add a storage option to override the path: the 'tweak' algorithm can already
        handle topology paths of any length, so something like (room, machine, disk)
        could be done easily.
      - Write OS-specific code to determine the real hardware behind st_dev
        (e.g. 2 different 'st_dev' values may actually refer to the same disk,
         because of layers like partitioning, device-mapper, loop, btrfs subvolumes,
         and so on).
      - Make 'neoctl' report in some way if the PT is optimal. Meanwhile,
        if it isn't, the master only logs a WARNING during tweak.
      97af23cc
  7. 05 Dec, 2017 1 commit
  8. 07 Nov, 2017 1 commit
  9. 12 Jun, 2017 1 commit
  10. 24 Apr, 2017 1 commit
  11. 23 Mar, 2017 1 commit
  12. 18 Mar, 2017 1 commit
    • Julien Muchembled's avatar
      master: fix crash when a transaction begins while a storage node starts operation · 781b4eb5
      Julien Muchembled authored
      Traceback (most recent call last):
        ...
        File "neo/lib/handler.py", line 72, in dispatch
          method(conn, *args, **kw)
        File "neo/master/handlers/client.py", line 70, in askFinishTransaction
          conn.getPeerId(),
        File "neo/master/transactions.py", line 387, in prepare
          assert node_list, (ready, failed)
      AssertionError: (set([]), frozenset([]))
      
      Master log leading to the crash:
        PACKET    #0x0009 StartOperation                 > S1
        PACKET    #0x0004 BeginTransaction               < C1
        DEBUG     Begin <...>
        PACKET    #0x0004 AnswerBeginTransaction         > C1
        PACKET    #0x0001 NotifyReady                    < S1
      
      It was wrong to process BeginTransaction before receiving NotifyReady.
      
      The changes in the storage are cosmetics: the 'ready' attribute has become
      redundant with 'operational'.
      781b4eb5
  13. 14 Mar, 2017 1 commit
  14. 02 Mar, 2017 1 commit
    • Julien Muchembled's avatar
      storage: fix PT updates in case of late AnswerUnfinishedTransactions · a74937c8
      Julien Muchembled authored
      This is done by moving
              self.replicator.populate()
      after the switch to MasterOperationHandler, so that the latter is not delayed.
      
      This change comes with some refactoring of the main loop,
      to clean up app.checker and app.replicator properly (like app.tm).
      
      Another option could have been to process notifications with the last handler,
      instead of the first one. But if possible, cleaning up the whole code to not
      delay handlers anymore looks the best option.
      a74937c8
  15. 14 Feb, 2017 2 commits
  16. 02 Feb, 2017 2 commits
  17. 18 Jan, 2017 1 commit
  18. 01 Dec, 2016 1 commit
  19. 27 Nov, 2016 1 commit
    • Julien Muchembled's avatar
      Fix identification issues, including a race condition causing id conflicts · 9385706f
      Julien Muchembled authored
      The added test describes how the new id timestamps fix the race condition.
      These timestamps could be any unique opaque values, and the protocol is
      extended to exchange them along with node ids.
      
      Internally, nodes also reuse timestamps as a marker to identify the first
      NotifyNodeInformation packets from the master: since this packet is a complete
      list of nodes in the cluster, any other node in the node manager has left the
      cluster definitely and is removed.
      
      The secondary masters didn't receive update about master nodes.
      It's also useless to send them information about non-master nodes.
      9385706f
  20. 25 Jan, 2016 1 commit
  21. 01 Dec, 2015 1 commit
    • Julien Muchembled's avatar
      Safer DB truncation, new 'truncate' ctl command · d3c8b76d
      Julien Muchembled authored
      With the previous commit, the request to truncate the DB was not stored
      persistently, which means that this operation was still vulnerable to the case
      where the master is restarted after some nodes, but not all, have already
      truncated. The master didn't have the information to fix this and the result
      was a DB partially truncated.
      
      -> On a Truncate packet, a storage node only stores the tid somewhere, to send
         it back to the master, which stays in RECOVERING state as long as any node
         has a different value than that of the node with the latest partition table.
      
      We also want to make sure that there is no unfinished data, because a user may
      truncate at a tid higher than a locked one.
      
      -> Truncation is now effective at the end on the VERIFYING phase, just before
         returning the last ids to the master.
      
      At last all nodes should be truncated, to avoid that an offline node comes back
      with a different history. Currently, this would not be an issue since
      replication is always restart from the beginning, but later we'd like they
      remember where they stopped to replicate.
      
      -> If a truncation is requested, the master waits for all nodes to be pending,
         even if it was previously started (the user can still force the cluster to
         start with neoctl). And any lost node during verification also causes the
         master to go back to recovery.
      
      Obviously, the protocol has been changed to split the LastIDs packet and
      introduce a new Recovery, since it does not make sense anymore to ask last ids
      during recovery.
      d3c8b76d
  22. 30 Nov, 2015 1 commit
    • Julien Muchembled's avatar
      Perform DB truncation during recovery, send PT to storages before verification · 3e3eab5b
      Julien Muchembled authored
      Currently, the database may only be truncated when leaving backup mode, but
      the issue will be the same when neoctl gets a new command to truncate at an
      arbitrary tid: we want to be sure that all nodes are truncated before anything
      else.
      
      Therefore, we stop sending Truncate orders before stopping operation because
      nodes could fail/exit before actually processing them. Truncation must also
      happen before asking nodes their last ids.
      
      With this commit, if a truncation is requested:
      - this is always the first thing done when a storage node connects to the
        primary master during the RECOVERING phase,
      - and the cluster does not start automatically if there are missing nodes,
        unless an admin forces it.
      
      Other changes:
      - Connections to storage nodes don't need to be aborted anymore when leaving
        backup mode.
      - The master always initiates communication when a storage node identifies,
        which simplifies code and reduces the number of exchanged packets.
      3e3eab5b
  23. 05 Oct, 2015 1 commit
  24. 24 Sep, 2015 2 commits
  25. 12 Aug, 2015 2 commits
  26. 24 Jun, 2015 1 commit
  27. 21 May, 2015 1 commit
  28. 05 May, 2015 1 commit
  29. 30 Jul, 2014 1 commit
  30. 22 Jul, 2014 1 commit
  31. 04 Jul, 2014 1 commit
  32. 03 Jun, 2014 1 commit
  33. 07 Jan, 2014 1 commit
  34. 23 Aug, 2012 1 commit
  35. 21 Aug, 2012 1 commit