- 27 Apr, 2019 8 commits
-
-
Julien Muchembled authored
The following 2 operations can be onerous, so they should not be directly usable without some kind of confirmation by the user:
- Dropping a node now requires stopping it first.
- Tweaking no longer excludes DOWN nodes automatically, because a node could go DOWN between the moment the user sends the tweak command and the actual tweak by the master.
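The effect of both rules can be pictured with a small Python sketch; the names below are illustrative stand-ins, not NEO's actual API:

    class NodeStillRunning(Exception):
        """Raised instead of silently removing a live node."""

    def drop_node(node_state):
        # Dropping now requires the node to be stopped first, so a live
        # node cannot be removed by a single unconfirmed command.
        if node_state == 'RUNNING':
            raise NodeStillRunning('stop the node before dropping it')

    def nodes_to_tweak(all_nodes):
        # Tweak no longer filters out DOWN nodes automatically: the state
        # seen by the user may be stale by the time the master tweaks.
        return list(all_nodes)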
-
Julien Muchembled authored
-
Julien Muchembled authored
Initially, I wanted to do the simulation inside neoctl, but it has no knowledge of the topology (the master doesn't send the devpath values of storage nodes). Therefore, the work is delegated to the master node, which implies a change of the protocol.
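A rough sketch of the resulting flow, with hypothetical helper names (the real protocol and algorithm differ):

    def compute_cell_changes(pt, drop_list):
        return []  # stub: stands in for the real balancing algorithm

    def apply_cell_changes(pt, changed):
        pass       # stub: would broadcast NotifyPartitionChanges

    def tweak(pt, drop_list=(), dry_run=False):
        # Runs on the master, the only node knowing the full topology
        # (devpath values of storage nodes).
        changed = compute_cell_changes(pt, drop_list)
        if not dry_run:
            apply_cell_changes(pt, changed)
        return changed  # returned to neoctl so it can display a simulation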
-
Julien Muchembled authored
-
Julien Muchembled authored
-
Julien Muchembled authored
This stops abusing ProtocolError, which disconnects the admin node needlessly. The many 'if ... raise RuntimeError' in neo/neoctl/neoctl.py could be turned into assertions.
-
Julien Muchembled authored
neoctl gets a new command to change the number of replicas. The number of replicas becomes a new partition table attribute and, like the PT id, it is stored in the config table. On the other side, the configuration value for the number of partitions is dropped, since it can be computed from the partition table (illustrated below), which is always stored in full. The -p/-r master options now only apply at database creation.
Some implementation notes:
- The protocol is slightly optimized in that the master now automatically sends the whole partition table to the admin & client nodes upon connection, like for storage nodes. This makes the protocol more consistent, and the master is the only remaining node requesting partition tables, during recovery.
- Some parts become tricky because app.pt can be None in more cases. For example, this is the reason the extra condition in NodeManager.update (before app.pt.dropNode) was added, and why the 'loadPartitionTable' method (storage) is not inlined, because of unit tests. Overall, this commit simplifies more than it complicates.
- In the master handlers, we stop hijacking the 'connectionCompleted' method for tasks to be performed (often sending the full partition table) on handler switches.
- The admin's 'bootstrapped' flag could have been removed earlier: race conditions can't happen since the AskNodeInformation packet was removed (commit d048a52d).
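As a rough illustration of why the partition count no longer needs its own configuration entry (made-up data, not NEO's real structures):

    # The partition table maps each partition to its cells.
    partition_table = {
        0: [('S1', 'UP_TO_DATE'), ('S2', 'UP_TO_DATE')],
        1: [('S2', 'UP_TO_DATE'), ('S1', 'UP_TO_DATE')],
    }
    num_partitions = len(partition_table)       # derived from the stored PT
    num_replicas = len(partition_table[0]) - 1  # stored in config, like the PT id
    print(num_partitions, num_replicas)         # -> 2 1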
-
Julien Muchembled authored
It is often faster to set up replicas by stopping a node (and any underlying database server like MariaDB) and doing a raw copy of the database (e.g. with rsync). So far, this required stopping the whole cluster and using tools like 'mysql' or 'sqlite3' to edit:
- the 'pt' table in databases,
- the 'config.nid' values of the new nodes.
With this new option, if you already have 1 replica, you can set up new replicas with such a fast raw copy, and without interruption of service. Obviously, this implies less redundancy during the operation.
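For the SQLite backend, the manual edit that used to be necessary could look like this (a hedged sketch: table and key names as described above, to be adapted to the real schema):

    import sqlite3

    def set_nid(db_path, new_nid):
        # Each raw-copied database must get its own node id, otherwise two
        # storage nodes would identify themselves identically.
        con = sqlite3.connect(db_path)
        with con:
            con.execute("UPDATE config SET value=? WHERE name='nid'",
                        (str(new_nid),))
        con.close()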
-
- 21 Mar, 2019 1 commit
-
-
Julien Muchembled authored
This breaks compatibility, but it was mentioned from the beginning that these options are only there for testing purposes. TODO: rename all remaining occurrences of UUID into NID in the code
-
- 11 Mar, 2019 1 commit
-
-
Julien Muchembled authored
-
- 26 Feb, 2019 1 commit
-
-
Julien Muchembled authored
-
- 31 Dec, 2018 1 commit
-
-
Julien Muchembled authored
In functional tests (or anything reusing this framework), the mapping could be incorrect at the beginning of logs.
-
- 05 Dec, 2018 1 commit
-
-
Julien Muchembled authored
neolog has new options: -N for the old behaviour, and -C to show the cluster name.
-
- 07 Nov, 2018 2 commits
-
-
Julien Muchembled authored
This fixes stuck replication when a client loses connection to the master during a commit.
-
Julien Muchembled authored
-
- 07 Aug, 2018 1 commit
-
-
Julien Muchembled authored
Besides the use of another module for option parsing, the main change is that there's no more Config class that mixes configuration for different components. Application classes now take a simple 'dict' with parsed values.
The changes in 'neoctl' are somewhat ugly, because command-line options are not defined on the command-line class, but this component is likely to disappear in the future.
It remains possible to pass options via a configuration file. The code is a bit complex but isolated in neo.lib.config.
For SSL, the code may be simpler if we changed to a single --ssl option that takes 3 paths. Not done so as not to break compatibility. Hence the hack with an extra OptionList class in neo.lib.app.
A new functional test tests the 'neomigrate' script, instead of just the internal API to migrate data.
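A minimal sketch of the new style (argparse used for illustration; NEO's actual option machinery lives in neo.lib.config):

    import argparse

    class Application(object):
        # Application classes now take a plain dict of parsed values.
        def __init__(self, config):
            self.bind = config['bind']

    parser = argparse.ArgumentParser()
    parser.add_argument('--bind', default='127.0.0.1:10000')
    app = Application(vars(parser.parse_args([])))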
-
- 22 Jun, 2018 1 commit
-
-
Julien Muchembled authored
This commit adds a constraint when tweaking the partition table with replicas, so that cells of each partition are assigned as far from each other as possible, e.g. not on the same machine even if each one has several disks, and in any case not on the same storage device.
Currently, the topology path of each node is automatically calculated by the storage backend. Both MySQL and SQLite return a 2-tuple (host, st_dev), sketched below.
To be improved:
- Add a storage option to override the path: the 'tweak' algorithm can already handle topology paths of any length, so something like (room, machine, disk) could be done easily.
- Write OS-specific code to determine the real hardware behind st_dev (e.g. 2 different 'st_dev' values may actually refer to the same disk, because of layers like partitioning, device-mapper, loop, btrfs subvolumes, and so on).
- Make 'neoctl' report in some way whether the PT is optimal. Meanwhile, if it isn't, the master only logs a WARNING during tweak.
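A rough sketch of how such a default (host, st_dev) path can be derived (illustrative, not the exact backend code):

    import os
    import socket

    def default_devpath(database_file):
        # st_dev identifies the device holding the database, but layers
        # like device-mapper or loop devices may hide the real disk.
        return socket.gethostname(), os.stat(database_file).st_dev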
-
- 29 Mar, 2018 1 commit
-
-
Julien Muchembled authored
This is a follow-up of commit 2ca7c335, which changed 'tweak' not to discard readable cells too quickly. The scenario of a storage being lost while it has feeding cells was forgotten. These must be discarded immediately, otherwise we end up with more up-to-date cells than wanted. Without the change in outdate(), testSafeTweak would end with: UU.|U.U|UUU
Once replication is optimized not to always restart checking cells from the beginning:
- Remembering that an out-of-date cell was feeding could be a safer option, but it may not be worth the extra complexity.
- Another possibility may be to replace the FEEDING state by an automatic partial tweak that only discards excess up-to-date cells whenever a cell becomes up-to-date.
-
- 02 Mar, 2018 3 commits
-
-
Julien Muchembled authored
Before, it waited for upstream activity until all partitions were touched. However, when upstream is idle, the backup cluster could remain stuck forever if it was interrupted while some cells were still late.
-
Julien Muchembled authored
The 'min_tid < new_tid' assertion failed when jumping to the past.
-
Julien Muchembled authored
Given that:
- read locks are only taken by transactions (not replication),
- in backup mode, storage nodes stay in the UP_TO_DATE state, even if partitions are synchronized up to different tids,
there was a race condition with the master node replying to LastTransaction with a TID that may not be replicated yet by all replicas, potentially causing such replicas to reply OidDoesNotExist or OidNotFound if a client asks them for data too early.
IOW, even if the cluster does contain the data up to `getBackupTid(max)`, it is only readable by NEO clients up to `getBackupTid(min)` as long as the cluster is in the BACKINGUP state.
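An illustration of the rule with made-up per-cell replication positions:

    # tid each replica of a partition has replicated up to (hypothetical)
    backup_tid_per_cell = [1000, 990, 995]

    getBackupTid = lambda f: f(backup_tid_per_cell)
    assert getBackupTid(max) == 1000  # data exists in the cluster up to here
    assert getBackupTid(min) == 990   # but is only safely readable up to here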
-
- 12 Jun, 2017 1 commit
-
-
Julien Muchembled authored
The most important change is that it no longer discards readable cells too quickly. A partition can now have multiple FEEDING cells, to avoid going below the wanted level of replication. The new algorithm is also better at minimizing the amount of replication.
-
- 12 May, 2017 1 commit
-
-
Julien Muchembled authored
Since it's no longer worth keeping track of the last connection activity (which, btw, ignored TCP ACKs, i.e. timeouts could theoretically be triggered before all the data were actually sent), the semantics of closeClient have also changed. Before this commit, the 1-minute timeout was reset whenever there was activity (connection still used as server). Now, it happens exactly 100 seconds after the connection stops being used as a client.
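The new semantics, sketched with stand-in names (simplified, not the actual Connection code):

    import time

    CLIENT_IDLE_TIMEOUT = 100  # seconds

    class Connection(object):
        _close_at = None

        def closeClient(self):
            # The countdown starts when the connection stops being used as
            # a client; it is no longer reset by server-side activity.
            self._close_at = time.time() + CLIENT_IDLE_TIMEOUT

        def mustBeClosed(self):
            return self._close_at is not None and time.time() >= self._close_at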
-
- 10 May, 2017 1 commit
-
-
Julien Muchembled authored
Now, the primary master is the running master with None displayed in the last column. Before, it could be the id timestamp of when it was secondary, which was obsolete information.
-
- 02 May, 2017 1 commit
-
-
Julien Muchembled authored
This fixes the following crash:

Traceback (most recent call last):
  ...
  File "neo/master/handlers/identification.py", line 94, in requestIdentification
    uuid = app.getNewUUID(uuid, address, node_type)
  File "neo/master/app.py", line 449, in getNewUUID
    assert uuid != self.uuid
AssertionError
-
- 27 Apr, 2017 1 commit
-
-
Julien Muchembled authored
-
- 25 Apr, 2017 2 commits
-
-
Julien Muchembled authored
-
Julien Muchembled authored
-
- 24 Apr, 2017 3 commits
-
-
Julien Muchembled authored
The election is not a separate process anymore. It happens during the RECOVERING phase, and timeouts are not used anymore.
Each master node keeps a timestamp of when it started to play the primary role, and the node with the smallest timestamp is elected. The election stops when the cluster is started: as long as it is operational, the primary master can't be deposed.
An election must happen whenever the cluster is not operational anymore, to handle the case of a network cut between a primary master and all other nodes: then another (secondary) master node takes over, and when the initial primary master is back, it loses against the new primary master if the cluster is already started.
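The core rule can be summed up in a few lines (illustrative sketch only):

    def elect_primary(masters):
        # masters: iterable of (node, started_playing_primary_timestamp);
        # the node that started playing the primary role first wins.
        return min(masters, key=lambda m: m[1])[0]

    assert elect_primary([('M2', 15.0), ('M1', 3.2)]) == 'M1'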
-
Julien Muchembled authored
-
Julien Muchembled authored
In order to do that correctly, this commit contains several other changes:
When connecting to a primary master, a full node list always follows the identification. For storage nodes, this means that they now know all nodes during the RECOVERING phase.
The initial full node list now always contains a node tuple for:
- the server-side node (i.e. the primary master): on a master, this is done by always having a node describing itself in its node manager;
- the client-side node, to make sure it gets an id timestamp: now an admin node also receives a node for itself.
-
- 31 Mar, 2017 5 commits
-
-
Julien Muchembled authored
The bug could lead to data corruption (if a partition is wrongly marked as UP_TO_DATE) or crashes (assertion failure on either the storage or the master). The protocol is extended to handle the following scenario:

     S                                     M
         partition 0 outdated
     <-- UnfinishedTransactions ------>
         replication of partition 0 ...
         partition 1 outdated
     --- UnfinishedTransactions ...
     ... replication finished
     --- ReplicationDone ...
                                           tweak
     <-- partition 1 discarded --------
                                           tweak
     <-- partition 1 outdated ---------
     ... UnfinishedTransactions -->
     ... ReplicationDone --------->

The master can't simply mark all outdated cells as being updatable when it receives an UnfinishedTransactions packet.
-
Julien Muchembled authored
After an attempt to read from a non-readable cell, which happens when a client has a newer or older PT than the storage's, the client now retries to read. This bugfix is for all kinds of read access except undoLog, which can still report incomplete results.
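The retry logic, sketched with hypothetical names:

    class NonReadableCell(Exception):
        """Stand-in for the error a storage returns on a PT mismatch."""

    def load(oid, read_from_storage, refresh_partition_table):
        while True:
            try:
                return read_from_storage(oid)
            except NonReadableCell:
                # Client and storage disagree on the PT: refresh and retry.
                refresh_partition_table()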
-
Julien Muchembled authored
This reverts commit bddc1802, to fix the following storage crash:

Traceback (most recent call last):
  ...
  File "neo/lib/handler.py", line 72, in dispatch
    method(conn, *args, **kw)
  File "neo/storage/handlers/master.py", line 44, in notifyPartitionChanges
    app.pt.update(ptid, cell_list, app.nm)
  File "neo/lib/pt.py", line 231, in update
    assert node is not None, 'No node found for uuid ' + uuid_str(uuid)
AssertionError: No node found for uuid S3

Partition table updates must also be processed with InitializationHandler when nodes remain in the PENDING state because they're not added to the cluster.
-
Julien Muchembled authored
-
Julien Muchembled authored
-
- 23 Mar, 2017 2 commits
-
-
Julien Muchembled authored
It becomes possible to answer with several packets:
- the last is the usual associated answer packet;
- all other (previously sent) packets are notifications.
Connection.send does not return the packet id anymore. This is not useful enough, and the caller can inspect the sent packet (getId).
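Schematically (stand-in callables, not the real Connection API):

    def answer_in_several_packets(notify, answer, intermediate, final):
        # All packets sent before the final one are plain notifications;
        # only the last packet is the usual associated answer.
        for payload in intermediate:
            notify(payload)
        answer(final)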
-
Julien Muchembled authored
-
- 18 Mar, 2017 1 commit
-
-
Julien Muchembled authored
Traceback (most recent call last):
  ...
  File "neo/lib/handler.py", line 72, in dispatch
    method(conn, *args, **kw)
  File "neo/master/handlers/client.py", line 70, in askFinishTransaction
    conn.getPeerId(),
  File "neo/master/transactions.py", line 387, in prepare
    assert node_list, (ready, failed)
AssertionError: (set([]), frozenset([]))

Master log leading to the crash:

PACKET #0x0009 StartOperation > S1
PACKET #0x0004 BeginTransaction < C1
DEBUG Begin <...>
PACKET #0x0004 AnswerBeginTransaction > C1
PACKET #0x0001 NotifyReady < S1

It was wrong to process BeginTransaction before receiving NotifyReady. The changes in the storage are cosmetic: the 'ready' attribute has become redundant with 'operational'.
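The essence of the fix, as a simplified sketch (names are illustrative):

    class Master(object):
        def __init__(self, expected_storages):
            self.expected = frozenset(expected_storages)
            self.ready = set()
            self.queued = []

        def notifyReady(self, storage):
            self.ready.add(storage)

        def beginTransaction(self, request):
            # Do not process BeginTransaction before every expected
            # storage has sent NotifyReady; queue the request instead.
            if not self.expected.issubset(self.ready):
                self.queued.append(request)
                return None
            return 'ttid'  # stand-in for allocating a real transaction id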
-
- 14 Mar, 2017 1 commit
-
-
Julien Muchembled authored
An issue that happened for the first time on a storage node didn't always cause other nodes to flush their logs, which made debugging difficult.
-