  1. 16 Jul, 2018 1 commit
  2. 22 Jun, 2018 2 commits
    • Maximize resiliency by taking into account the topology of storage nodes · 97af23cc
      This commit adds a constraint when tweaking the partition table with replicas,
      so that cells of each partition are assigned as far as possible from each
      other, e.g. not on the same machine even if each one has several disks, and
      in any case not on the same storage device.
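      The placement rule can be sketched as a greedy choice over topology paths,
      assuming each node is described by a tuple such as (host, st_dev) and that
      replicas are picked one at a time, preferring the candidate whose path
      shares the shortest prefix with the nodes already chosen. The greedy
      strategy and the names `pick_replicas` and `shared_prefix` are hypothetical
      illustrations, not the actual 'tweak' algorithm.

```python
from typing import Dict, List, Sequence, Tuple


def shared_prefix(a: Sequence[object], b: Sequence[object]) -> int:
    """Number of leading components two topology paths have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def pick_replicas(paths: Dict[str, Tuple[object, ...]], count: int) -> List[str]:
    """Greedily choose up to `count` nodes whose paths are as far apart as possible.

    Hypothetical helper: at each step, the candidate whose closest
    already-chosen node shares the shortest path prefix wins; ties are
    broken by node name.
    """
    chosen: List[str] = []
    candidates = sorted(paths)
    while len(chosen) < count and candidates:
        def closeness(node: str) -> int:
            return max((shared_prefix(paths[node], paths[c]) for c in chosen),
                       default=0)
        best = min(candidates, key=lambda node: (closeness(node), node))
        chosen.append(best)
        candidates.remove(best)
    return chosen
```

      For example, with nodes {"s1": ("hostA", 2049), "s2": ("hostA", 2050),
      "s3": ("hostB", 2049)}, asking for 2 replicas gives ["s1", "s3"]: the
      second replica goes to the other host rather than to another disk of hostA.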
      
      Currently, the topology path of each node is automatically calculated by the
      storage backend. Both MySQL and SQLite return a 2-tuple (host, st_dev).
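      A minimal sketch of how such a 2-tuple could be obtained for the file or
      directory holding a node's database, assuming the host name comes from
      `socket.gethostname()` and the device number from `os.stat()`. The
      function name `topology_path` is hypothetical, and the real backends may
      compute these values differently.

```python
import os
import socket
from typing import Tuple


def topology_path(db_path: str) -> Tuple[str, int]:
    """Return (host, st_dev) for the device holding db_path.

    Hypothetical helper: st_dev identifies a filesystem device, which is not
    necessarily a distinct physical disk (partitioning, device-mapper, etc.).
    """
    return socket.gethostname(), os.stat(db_path).st_dev
```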
      To be improved:
      - Add a storage option to override the path: the 'tweak' algorithm can already
        handle topology paths of any length, so something like (room, machine, disk)
        could be done easily.
      - Write OS-specific code to determine the real hardware behind st_dev
        (e.g. 2 different 'st_dev' values may actually refer to the same disk,
         because of layers like partitioning, device-mapper, loop, btrfs subvolumes,
         and so on).
      - Make 'neoctl' report in some way if the PT is optimal. Meanwhile,
        if it isn't, the master only logs a WARNING during tweak.
      Julien Muchembled committed
    • storage: also commit updated cell TID at each replicated chunk of 'obj' records · d4ea398d
      This is a follow-up of commit b3dd6973
      ("Optimize resumption of replication by starting from a greater TID").
      I missed the case where a storage node is restarted while it is replicating:
      it lost the TID where it was interrupted.
      
      Although we already committed after each replicated chunk, which avoids
      transferring all the data again from the beginning, the node could still
      waste time checking that the data were already replicated.
      Julien Muchembled committed
  3. 21 Jun, 2018 1 commit
  4. 19 Jun, 2018 4 commits
  5. 04 Jun, 2018 1 commit
  6. 31 May, 2018 1 commit
    • tests/cluster: speedup waiting a bit · d08c83d4
      NEO functional tests use pdb.wait() in a few places, for example in
      NEOCluster.run(), .start() and .expectCondition(). The wait
      implementation uses polling with an exponentially growing wait period.
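      Such a polling loop can be sketched as follows, assuming a 0.1s initial
      period multiplied by 1.5 after each unsuccessful check. The factor is
      inferred from the logged values below, which it only approximately
      matches, and `wait_until` with its parameters is a hypothetical name, not
      the code in neo/tests/cluster.py. The injectable `sleep` and `clock`
      arguments make the schedule observable without really sleeping.

```python
import time
from typing import Callable


def wait_until(condition: Callable[[], bool], timeout: float,
               first: float = 0.1, growth: float = 1.5,
               sleep: Callable[[float], None] = time.sleep,
               clock: Callable[[], float] = time.monotonic) -> bool:
    """Poll `condition` until it holds or `timeout` seconds have elapsed.

    With growth > 1 the period grows exponentially; growth == 1 gives a
    fixed polling period.
    """
    deadline = clock() + timeout
    next_sleep = first
    while not condition():
        if clock() >= deadline:
            return False
        sleep(next_sleep)
        next_sleep *= growth
    return True
```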
      
      With the following instrumentation
      
      	--- a/neo/tests/cluster.py
      	+++ b/neo/tests/cluster.py
      	@@ -236,6 +236,7 @@ def wait(self, test, timeout):
      	                         return False
      	             finally:
      	                 cluster_dict.release()
      	+            print 'next_sleep:', next_sleep
      	             sleep(next_sleep)
      	         return True
      
      during execution of functional tests it is not uncommon to see the
      following sleep periods
      
      	next_sleep: 0.1
      	next_sleep: 0.1
      	next_sleep: 0.15
      	next_sleep: 0.225
      	next_sleep: 0.3375
      	next_sleep: 0.50625
      	next_sleep: 0.1
      	next_sleep: 0.1
      	next_sleep: 0.15
      	next_sleep: 0.225
      	next_sleep: 0.3375
      	next_sleep: 0.50625
      	next_sleep: 0.1
      	next_sleep: 0.1
      	next_sleep: 0.1
      	next_sleep: 0.15
      	next_sleep: 0.225
      	next_sleep: 0.3375
      	next_sleep: 0.1
      	next_sleep: 0.1
      	next_sleep: 0.1
      	next_sleep: 0.1
      	next_sleep: 0.1
      	next_sleep: 0.1
      	next_sleep: 0.1
      	next_sleep: 0.1
      	next_sleep: 0.1
      	next_sleep: 0.1
      	next_sleep: 0.15
      	next_sleep: 0.225
      	next_sleep: 0.3375
      	next_sleep: 0.50625
      
      .
      
      Without reworking the wait mechanism to use real notifications instead
      of polling, we observed that the exponential progression tends to produce
      overly coarse sleeps. The initial 0.1s interval was also found to be too
      long.
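      Why coarse sleeps hurt can be made concrete: a poller only notices a
      condition at the end of the sleep in progress, so the delay can approach a
      whole period. This sketch compares, under the assumption of a 0.1s x 1.5^k
      schedule (inferred from the log above) versus a fixed 0.01s period, when a
      condition that becomes true at a given instant is first seen;
      `detection_time`, `exponential` and `fixed` are hypothetical helpers.

```python
from itertools import count, repeat
from typing import Iterable, Iterator


def detection_time(becomes_true_at: float, periods: Iterable[float]) -> float:
    """First polling instant at or after `becomes_true_at`.

    The poller checks at t=0, then again after each sleep taken from `periods`,
    so a condition that turns true mid-sleep is only seen when the sleep ends.
    """
    t = 0.0
    sleeps = iter(periods)
    while t < becomes_true_at:
        t += next(sleeps)
    return t


def exponential(first: float = 0.1, growth: float = 1.5) -> Iterator[float]:
    """Growing periods: first, first*growth, first*growth**2, ..."""
    return (first * growth ** k for k in count())


def fixed(period: float = 0.01) -> Iterator[float]:
    """A constant polling period."""
    return repeat(period)
```

      For a condition that becomes true at t = 1s, the growing schedule notices
      it at about 1.32s, whereas the fixed 10ms schedule notices it within about
      10ms.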
      
      This patch removes the exponential growth of the period and reduces the
      period by one order of magnitude. Functional test timings on my computer
      are thus:
      
      before patch:
      
      	Functional tests
      
      	28 Tests, 0 Failed
      
      	Title                     : TestRunner
      	Date                      : 2018-04-04
      	Node                      : deco
      	Machine                   : x86_64
      	System                    : Linux
      	Python                    : 2.7.14
      
      	Directory                 : /tmp/neo_tests/1522868674.115798
      	Status                    : 100.000%
      	NEO_TESTS_ADAPTER         : SQLite
      
      	                               NEO TESTS REPORT
      
      	              Test Module |  run  | unexpected | expected | skipped |  time
      	--------------------------+-------+------------+----------+---------+----------
      	                   Client |    6  |       .    |      .   |     .   |   8.51s
      	                  Cluster |    7  |       .    |      .   |     .   |   9.84s
      	                   Master |    4  |       .    |      .   |     .   |   9.68s
      	                  Storage |   11  |       .    |      .   |     .   |  20.76s
      	--------------------------+-------+------------+----------+---------+----------
      	     neo.tests.functional |       |            |          |         |
      	--------------------------+-------+------------+----------+---------+----------
      	                  Summary |   28  |       .    |      .   |     .   |  48.79s
      	--------------------------+-------+------------+----------+---------+----------
      
      after patch:
      
      	Functional tests
      
      	28 Tests, 0 Failed
      
      	Title                     : TestRunner
      	Date                      : 2018-04-04
      	Node                      : deco
      	Machine                   : x86_64
      	System                    : Linux
      	Python                    : 2.7.14
      
      	Directory                 : /tmp/neo_tests/1522868527.624376
      	Status                    : 100.000%
      	NEO_TESTS_ADAPTER         : SQLite
      
      	                               NEO TESTS REPORT
      
      	              Test Module |  run  | unexpected | expected | skipped |  time
      	--------------------------+-------+------------+----------+---------+----------
      	                   Client |    6  |       .    |      .   |     .   |   7.38s
      	                  Cluster |    7  |       .    |      .   |     .   |   7.05s
      	                   Master |    4  |       .    |      .   |     .   |   8.22s
      	                  Storage |   11  |       .    |      .   |     .   |  19.22s
      	--------------------------+-------+------------+----------+---------+----------
      	     neo.tests.functional |       |            |          |         |
      	--------------------------+-------+------------+----------+---------+----------
      	                  Summary |   28  |       .    |      .   |     .   |  41.87s
      	--------------------------+-------+------------+----------+---------+----------
      
      in other words, a ~14% improvement in the total time to run the functional tests.
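      The relative savings can be rechecked from the two reports above as
      (before - after) / before for each row; this is a worked check of the
      published timings, not an additional measurement.

```python
# Timings in seconds, copied from the "before" and "after" reports.
before = {"Client": 8.51, "Cluster": 9.84, "Master": 9.68, "Storage": 20.76, "Summary": 48.79}
after = {"Client": 7.38, "Cluster": 7.05, "Master": 8.22, "Storage": 19.22, "Summary": 41.87}


def reduction_percent(old: float, new: float) -> float:
    """Relative time saved, in percent of the old duration."""
    return 100.0 * (old - new) / old


for module in before:
    print(f"{module:>8}: {reduction_percent(before[module], after[module]):5.1f}%")
```

      This prints about 14.2% for the Summary row, with per-module savings
      ranging from about 7.4% (Storage) to about 28.4% (Cluster).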
      
      /reviewed-by @vpelletier, @jm
      /reviewed-on !10
      Kirill Smelkov committed
  7. 30 May, 2018 6 commits
    • protocol: update packet docstrings · 9f0f2afe
      /reviewed-on !9
      Julien Muchembled committed
    • Bump protocol version · f62f9bc9
      Julien Muchembled committed
    • protocol: a single byte is more than enough to encode enums · 52db5607
      Julien Muchembled committed
    • protocol: small cleanup in packet registration · a00ab78b
      I made a mistake in commit 13a64cfe
      ("Simplify definition of packets by computing automatically their codes").
      My intention was that the code of an answer packet would keep differing
      from the code of its request only by the highest bit, which is what this
      commit now implements.
      
      Before:
        0x0001, 0x8002   Ask1, Answer1
        0x0003           Notify2
        0x0004, 0x8005   Ask3, Answer3
        0x0006, 0x8007   Ask4, Answer4
      
      After:
        0x0001, 0x8001   Ask1, Answer1
        0x0002           Notify2
        0x0003, 0x8003   Ask3, Answer3
        0x0004, 0x8004   Ask4, Answer4
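      Both numbering schemes can be reproduced with a small sketch, assuming
      packets are registered in order as (name, has_answer) pairs and that an
      answer is marked by the highest bit (0x8000) of a 16-bit code.
      `assign_codes` and its `shared` flag are hypothetical, not NEO's packet
      registration code.

```python
from typing import Dict, List, Tuple

ANSWER_BIT = 0x8000  # highest bit of a 16-bit packet code


def assign_codes(packets: List[Tuple[str, bool]], shared: bool) -> Dict[str, int]:
    """Number packets sequentially, starting at 1.

    shared=False mimics the old scheme, where an answer consumed its own
    number; shared=True gives an answer the same number as its request.
    """
    codes: Dict[str, int] = {}
    next_code = 1
    for name, has_answer in packets:
        codes[name] = next_code
        next_code += 1
        if has_answer:
            answer_name = name.replace("Ask", "Answer", 1)
            if shared:
                codes[answer_name] = codes[name] | ANSWER_BIT
            else:
                codes[answer_name] = next_code | ANSWER_BIT
                next_code += 1
    return codes


PACKETS = [("Ask1", True), ("Notify2", False), ("Ask3", True), ("Ask4", True)]
```

      With shared=True, a request and its answer differ only by ANSWER_BIT, and
      no number is skipped, which leaves more of the code space unused.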
      
      This makes the protocol easier to document.
      
      And by not wasting the range of possible values, it seems we have enough
      space to shrink to a single byte.
      
      This also removes code that became meaningless now that codes are generated
      automatically.
      Julien Muchembled committed
    • Optimize resumption of replication by starting from a greater TID · b3dd6973
      Although data that are already transferred aren't transferred again, checking
      that the data are there for a whole partition can still be a lot of work for
      big databases. This commit is a major performance improvement: a storage
      node that gets disconnected for a short time now becomes fully operational
      almost instantaneously, because it only has to replicate the new data.
      Before, the time to recover depended on the size of the DB.
      
      For OUT_OF_DATE cells, the difficult part was that they are writable and
      can therefore contain holes, so we can't just take the last TID in trans/obj
      (we wrongly did that at the beginning, and then committed
      6b1f198f as a workaround). We solve that
      by storing the TID up to which the cell was up-to-date: this value is
      initialized from the last TIDs in trans/obj when the state switches from
      UP_TO_DATE/FEEDING to OUT_OF_DATE.
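      The bookkeeping for one cell might look like the following sketch. The
      `Cell` class, its method names, and the choice of the larger of the two
      last TIDs when the cell leaves UP_TO_DATE/FEEDING are illustrative
      assumptions, not NEO's code. What it shows is that writes received while
      OUT_OF_DATE advance the last TIDs in trans/obj, but not the stored resume
      point, so the hole they leave is still replicated.

```python
from dataclasses import dataclass
from typing import Optional

OUT_OF_DATE, UP_TO_DATE, FEEDING = "OUT_OF_DATE", "UP_TO_DATE", "FEEDING"


@dataclass
class Cell:
    state: str
    last_trans_tid: int = 0  # last TID stored in 'trans' for the partition
    last_obj_tid: int = 0    # last TID stored in 'obj' for the partition
    out_of_date_tid: Optional[int] = None  # only meaningful when OUT_OF_DATE

    def set_state(self, new_state: str) -> None:
        if new_state == OUT_OF_DATE and self.state in (UP_TO_DATE, FEEDING):
            # Complete up to its last stored TID (assumption: take the larger).
            self.out_of_date_tid = max(self.last_trans_tid, self.last_obj_tid)
        elif new_state != OUT_OF_DATE:
            self.out_of_date_tid = None
        self.state = new_state

    def store(self, tid: int) -> None:
        """A client write: the cell is writable even when OUT_OF_DATE."""
        self.last_trans_tid = max(self.last_trans_tid, tid)
        self.last_obj_tid = max(self.last_obj_tid, tid)

    def replicated_up_to(self, tid: int) -> None:
        """Replication progress advances the resume point."""
        if self.state == OUT_OF_DATE:
            self.out_of_date_tid = max(self.out_of_date_tid or 0, tid)

    def resume_tid(self) -> int:
        """Where replication restarts; 0 stands for ZERO_TID (e.g. a new cell)."""
        assert self.state == OUT_OF_DATE
        return self.out_of_date_tid or 0
```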
      
      There's actually one such OUT_OF_DATE TID per assigned cell (backends store
      these values in the 'pt' table). Otherwise, a cell that still has a lot to
      replicate would cause all other cells to resume from a very small TID, or
      even ZERO_TID; the worst case is when a new cell is assigned to a node
      (as a result of tweak).
      
      For UP_TO_DATE cells of a backup cluster, replication was resumed from the
      maximum TID at which all assigned cells are known to be fully replicated.
      Like for OUT_OF_DATE cells, the presence of a late cell could cause a lot of
      extra work for others, the worst case being when setting up a backup cluster
      (it always restarted from ZERO_TID as long as at least 1 cell was still empty).
      Because UP_TO_DATE cells are guaranteed to have no holes, there's no need to
      store extra information: we simply look at the last TIDs in trans/obj.
      We even handle trans and obj independently, to minimize the work on one
      table (trans, since it is processed first) when the other (obj) is late.
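      For UP_TO_DATE cells of a backup cluster, the resume points can then be
      read directly from the data, one per table. A sketch with `sqlite3`,
      assuming hypothetical tables `trans(part, tid)` and `obj(part, tid, oid)`;
      the schema and the `resume_points` helper are illustrative, not NEO's.

```python
import sqlite3
from typing import Tuple


def resume_points(db: sqlite3.Connection, part: int) -> Tuple[int, int]:
    """Return (trans_tid, obj_tid) to resume replication of one partition.

    UP_TO_DATE cells have no holes, so the last TID present in each table is
    a safe starting point; 0 (ZERO_TID) means the table is still empty.
    """
    (trans_tid,) = db.execute(
        "SELECT COALESCE(MAX(tid), 0) FROM trans WHERE part=?", (part,)).fetchone()
    (obj_tid,) = db.execute(
        "SELECT COALESCE(MAX(tid), 0) FROM obj WHERE part=?", (part,)).fetchone()
    return trans_tid, obj_tid
```

      Because each cell gets its own pair, an empty cell (0, 0) no longer drags
      the others back to ZERO_TID, as a single cluster-wide minimum would.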
      
      There's a small change in the protocol so that OUT_OF_DATE enum value equals 0.
      This way, backends can store the OUT_OF_DATE TID (verbatim) in the same column
      as the cell state.
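      One possible way to share the column is sketched below, under loud
      assumptions: the numeric values of the other states, the `encode`/`decode`
      helpers, and the convention of storing other states negated are
      hypothetical and not taken from the backends. A non-negative value is read
      as an OUT_OF_DATE cell whose value is its TID; a negative value is another
      state. Since OUT_OF_DATE is 0, an OUT_OF_DATE cell with no progress stores
      0, which is also ZERO_TID, so both readings agree.

```python
from typing import Tuple

# Hypothetical enum values; only OUT_OF_DATE == 0 comes from the text.
OUT_OF_DATE, UP_TO_DATE, FEEDING, CORRUPTED = 0, 1, 2, 3


def encode(state: int, out_of_date_tid: int = 0) -> int:
    """Pack a cell state (and its OUT_OF_DATE TID) into one integer column."""
    if state == OUT_OF_DATE:
        return out_of_date_tid  # TIDs are >= 0 and stored verbatim
    return -state  # other states are stored as negative numbers


def decode(value: int) -> Tuple[int, int]:
    """Return (state, tid); tid is only meaningful for OUT_OF_DATE."""
    if value >= 0:
        return OUT_OF_DATE, value
    return -value, 0
```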
      
      Note about MySQL changes in commit ca58ccd7:
      what we did there as a workaround is no longer one. We now do so much on
      the Python side that it's unlikely we could reduce the number of queries
      by using GROUP BY. We even stopped doing that for SQLite.
      Julien Muchembled committed
    • importer: update comment about a workaround for ZODB3 · fa9664ee
      Julien Muchembled committed
  8. 25 May, 2018 1 commit
  9. 24 May, 2018 9 commits
  10. 17 May, 2018 1 commit
  11. 16 May, 2018 5 commits
  12. 15 May, 2018 1 commit
  13. 11 May, 2018 3 commits
  14. 07 May, 2018 4 commits