Commits · 5f5d6516aa6cf18c6c7bd4a706c62a702ce09233 · Kirill Smelkov / neo

29 Nov, 2016 1 commit
- . · 5f5d6516
  Kirill Smelkov authored Nov 29, 2016
  
  5f5d6516
28 Nov, 2016 5 commits
- . · dae00a2b
  Kirill Smelkov authored Nov 28, 2016
  
  dae00a2b
- . · 63875e79
  Kirill Smelkov authored Nov 28, 2016
  
  63875e79
- . · 69a02de9
  Kirill Smelkov authored Nov 28, 2016
  
  69a02de9
- X proto += .id_timestamp · bac889e6
  Kirill Smelkov authored Nov 28, 2016
```
See 9385706f (Fix identification issues, including a race condition
causing id conflicts).
```
  bac889e6
- . · 4bedd3fc
  Kirill Smelkov authored Nov 28, 2016
  
  4bedd3fc
27 Nov, 2016 11 commits

Bump protocol version · 8eb14b01
Julien Muchembled authored Nov 27, 2016

8eb14b01

Fix identification issues, including a race condition causing id conflicts · 9385706f

Julien Muchembled authored Nov 24, 2016

The added test describes how the new id timestamps fix the race condition.
These timestamps could be any unique opaque values, and the protocol is
extended to exchange them along with node ids.

Internally, nodes also reuse timestamps as a marker to identify the first
NotifyNodeInformation packet from the master: since this packet is a complete
list of nodes in the cluster, any other node in the node manager has
definitively left the cluster and is removed.
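
The pruning rule described above can be illustrated with a minimal sketch. It is not NEO's actual code: `ToyNodeManager`, its `update` signature, and the use of `None` as the "no packet seen yet" marker are assumptions made for illustration only.

```python
class ToyNodeManager:
    """Hypothetical stand-in for a node manager; not NEO's real class."""

    def __init__(self):
        self.nodes = {}            # node id -> id timestamp
        self.id_timestamp = None   # None means: no notification seen yet

    def update(self, timestamp, node_list):
        """Apply a notification given as [(node_id, id_timestamp), ...]."""
        if self.id_timestamp is None:
            # First notification: assumed to be the complete cluster list,
            # so any node it does not mention is considered gone for good.
            listed = {node_id for node_id, _ in node_list}
            for node_id in list(self.nodes):
                if node_id not in listed:
                    del self.nodes[node_id]
        # Remember the timestamp; later notifications are incremental.
        self.id_timestamp = timestamp
        for node_id, id_timestamp in node_list:
            self.nodes[node_id] = id_timestamp
```

In this sketch only the first call prunes unknown nodes; later calls merely add or refresh entries.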

The secondary masters didn't receive updates about master nodes.
It's also useless to send them information about non-master nodes.

9385706f

protocol: simplify definition of Struct-based items · 54e819ff
Julien Muchembled authored Nov 24, 2016

54e819ff

Remove AskNodeInformation packet · d048a52d

Julien Muchembled authored Nov 25, 2016

When client (including backup master) and admin nodes are identified,
the primary master now automatically sends them all nodes with
NotifyNodeInformation, as it does for storage nodes.

d048a52d

master: fix crashes in identification due to buggy nodes · 35664759
Julien Muchembled authored Nov 24, 2016
```
- check address conflicts
- on invalid values, reject peer instead of dying
```
35664759

lib.node: fix NodeManager accessors returning identified nodes · e7cccf01

Julien Muchembled authored Nov 23, 2016

Listing connected/connecting nodes with a UUID is used:
- in one place by storage nodes: here, it does not matter if we skip nodes that
  aren't really identified
- in many places by the master, only for server connections, in which case we
  have equivalence with real identification

So in practice, NodeManager is only simplified to reuse the 'identified'
property of nodes.

e7cccf01

lib.node: code refactoring · 5941b27d
Julien Muchembled authored Nov 23, 2016

5941b27d
storage: only accept clients that are known by the master · c17f5f91
Julien Muchembled authored Nov 23, 2016
```
Therefore, a client node in the node manager is always RUNNING.
```
c17f5f91

Give new ids to clients whose ids were already reallocated · d752aadb

Julien Muchembled authored Nov 21, 2016

Although the change applies to any node with a temporary id (all but storage),
only clients don't have addresses and are therefore not recognizable.

After a client is disconnected from the master and before it reconnects, another
client may join the cluster and "steal" the id of the first client. This issue
leads to stuck clients, failing in a loop with exceptions like the following one:

    ERROR ZODB.Connection Couldn't load state for 0x0251
    Traceback (most recent call last):
      File "ZODB/Connection.py", line 860, in setstate
        self._setstate(obj)
      File "ZODB/Connection.py", line 901, in _setstate
        p, serial = self._storage.load(obj._p_oid, '')
      File "neo/client/Storage.py", line 82, in load
        return self.app.load(oid)[:2]
      File "neo/client/app.py", line 353, in load
        data, tid, next_tid, _ = self._loadFromStorage(oid, tid, before_tid)
      File "neo/client/app.py", line 373, in _loadFromStorage
        for node, conn in self.cp.iterateForObject(oid, readable=True):
      File "neo/client/pool.py", line 91, in iterateForObject
        pt = self.app.pt
      File "neo/client/app.py", line 145, in __getattr__
        self._getMasterConnection()
      File "neo/client/app.py", line 214, in _getMasterConnection
        result = self.master_conn = self._connectToPrimaryNode()
      File "neo/client/app.py", line 246, in _connectToPrimaryNode
        handler=handler)
      File "neo/lib/threaded_app.py", line 154, in _ask
        _handlePacket(qconn, qpacket, kw, handler)
      File "neo/lib/threaded_app.py", line 135, in _handlePacket
        handler.dispatch(conn, packet, kw)
      File "neo/lib/handler.py", line 66, in dispatch
        method(conn, *args, **kw)
      File "neo/lib/handler.py", line 188, in error
        getattr(self, Errors[code])(conn, message)
      File "neo/client/handlers/__init__.py", line 23, in protocolError
        raise StorageError("protocol error: %s" % message)
    StorageError: protocol error: already connected
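
The id "stealing" scenario and the fix named in the commit title can be sketched as follows. This is a toy model under stated assumptions, not NEO's master code: `ToyMaster`, `identify`, `disconnect`, and the lowest-free-id allocation policy are all hypothetical.

```python
class ToyMaster:
    """Hypothetical id allocator; names are illustrative, not NEO's API."""

    def __init__(self):
        self.connected = {}  # node id -> connection token

    def _free_id(self):
        # Assumed policy: reuse the lowest id not currently in use.
        node_id = 1
        while node_id in self.connected:
            node_id += 1
        return node_id

    def identify(self, conn, claimed_id=None):
        """Return the id granted to `conn`."""
        if claimed_id is None or claimed_id in self.connected:
            # Either a new client, or a returning client whose id was
            # reallocated meanwhile: grant a fresh id instead of failing
            # with "already connected".
            node_id = self._free_id()
        else:
            node_id = claimed_id
        self.connected[node_id] = conn
        return node_id

    def disconnect(self, node_id):
        self.connected.pop(node_id, None)
```

With this model, a client that lost its id to a newcomer simply receives another id on reconnection rather than being rejected.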

d752aadb

spelling: oudated -> outdated · b62b8dc3
Julien Muchembled authored Nov 27, 2016

b62b8dc3
Fix spelling mistakes · 6e32ebb7
Julien Muchembled authored Nov 21, 2016

6e32ebb7

25 Nov, 2016 7 commits
- . · 0f30552f
  Kirill Smelkov authored Nov 25, 2016
  
  0f30552f
- . · 8c564e42
  Kirill Smelkov authored Nov 25, 2016
  
  8c564e42
- coverage: CacheItem.__repr__ (client) · b61f8745
  Julien Muchembled authored Nov 24, 2016
  
  b61f8745
- New neotestrunner option for code coverage testing · 5de0ff3a
  Julien Muchembled authored Nov 24, 2016
  
  5de0ff3a
- . · fddbe14c
  Kirill Smelkov authored Nov 25, 2016
  
  fddbe14c
- . · e8954823
  Kirill Smelkov authored Nov 25, 2016
  
  e8954823
- Merge branch 'master' into t · 2998b840
  Kirill Smelkov authored Nov 25, 2016
```
* master:
  coverage: CacheItem.__repr__ (client)
  New neotestrunner option for code coverage testing
```
  2998b840
24 Nov, 2016 2 commits
- coverage: CacheItem.__repr__ (client) · 46491261
  Julien Muchembled authored Nov 24, 2016
  
  46491261
- New neotestrunner option for code coverage testing · c3145ff1
  Julien Muchembled authored Nov 24, 2016
  
  c3145ff1
23 Nov, 2016 10 commits

. · 8c736e77
Kirill Smelkov authored Nov 23, 2016

8c736e77
. · fa68b9e4
Kirill Smelkov authored Nov 23, 2016

fa68b9e4
Merge branch 'x/go' into t · 0b751f74
Kirill Smelkov authored Nov 23, 2016
```
* x/go:
  .
  .
  .
  X notes on partition table
  .
  .
  .
  .
  .
  .
```
0b751f74
. · 0d0ce246
Kirill Smelkov authored Nov 23, 2016

0d0ce246

Merge branch 'master' into x/go · f84a1095

Kirill Smelkov authored Nov 23, 2016

* master:
  client: fix item eviction from cache, which could break loading from storage
  Bump protocol version for new read-only mode in BACKUPING state
  backup: Teach cluster in BACKUPING state to also serve regular ZODB clients in read-only mode
  tests/threaded: Add handy shortcuts to NEOCluster to concisely check cluster properties in tests

f84a1095

. · cb46ccd2
Kirill Smelkov authored Nov 23, 2016

cb46ccd2
. · 6c996814
Kirill Smelkov authored Nov 23, 2016

6c996814
. · d0c3276a
Kirill Smelkov authored Nov 23, 2016

d0c3276a
rm zodbdump -> moved to https://lab.nexedi.com/nexedi/zodbtools · 6a74de9c
Kirill Smelkov authored Nov 23, 2016

6a74de9c

Merge branch 'master' into t · 5d2baac5

Kirill Smelkov authored Nov 23, 2016

* master:
  client: fix item eviction from cache, which could break loading from storage
  Bump protocol version for new read-only mode in BACKUPING state
  backup: Teach cluster in BACKUPING state to also serve regular ZODB clients in read-only mode
  tests/threaded: Add handy shortcuts to NEOCluster to concisely check cluster properties in tests

5d2baac5

21 Nov, 2016 2 commits

client: fix item eviction from cache, which could break loading from storage · 4ef05b9e

Julien Muchembled authored Nov 18, 2016

`ClientCache._oid_dict` shall not have empty values. For a given oid, when the
last item is removed from the cache, the oid must be removed as well to free
memory. In some cases, this was not done.

A consequence of this bug is the following exception:

    ERROR ZODB.Connection Couldn't load state for 0x02d1e1e4
    Traceback (most recent call last):
      File "ZODB/Connection.py", line 860, in setstate
        self._setstate(obj)
      File "ZODB/Connection.py", line 901, in _setstate
        p, serial = self._storage.load(obj._p_oid, '')
      File "neo/client/Storage.py", line 82, in load
        return self.app.load(oid)[:2]
      File "neo/client/app.py", line 358, in load
        self._cache.store(oid, data, tid, next_tid)
      File "neo/client/cache.py", line 228, in store
        prev = item_list[-1]
    IndexError: list index out of range
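
The invariant stated above (no empty values in the oid index) can be illustrated with a minimal sketch. `ToyCache` and its methods are hypothetical and far simpler than NEO's `ClientCache`; only the `_oid_dict` name is taken from the commit message.

```python
class ToyCache:
    """Hypothetical miniature of an oid -> items index."""

    def __init__(self):
        self._oid_dict = {}  # oid -> non-empty list of tids, oldest first

    def store(self, oid, tid):
        # A missing key means "nothing cached for this oid".
        self._oid_dict.setdefault(oid, []).append(tid)

    def remove(self, oid, tid):
        item_list = self._oid_dict[oid]
        item_list.remove(tid)
        if not item_list:
            # Last item for this oid is gone: drop the key as well, so no
            # empty list is left behind for later code to index with [-1].
            del self._oid_dict[oid]
```

If `remove` left an empty list in place, later code that assumes a non-empty list, such as the `item_list[-1]` access in the traceback, would raise `IndexError`.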

4ef05b9e

Bump protocol version for new read-only mode in BACKUPING state · 2b3993f1
Julien Muchembled authored Nov 21, 2016

2b3993f1

18 Nov, 2016 1 commit
- . · fe823e9c
  Kirill Smelkov authored Nov 18, 2016
  
  fe823e9c
15 Nov, 2016 1 commit

backup: Teach cluster in BACKUPING state to also serve regular ZODB clients in read-only mode · d4944062

Kirill Smelkov authored Nov 10, 2016

A backup cluster for tids <= backup_tid has all data to provide regular
read-only ZODB service. Having regular ZODB access to the data can be
handy e.g. for externally verifying data for consistency between
main and backup clusters. Peeking around without disturbing the main
cluster might also be useful sometimes.

In this patch:

- master & storage nodes are taught:

  * to instantiate a read-only or regular client service handler depending on
    cluster state:

      RUNNING   -> regular
      BACKINGUP -> read-only

  * in the read-only client handler:

    + to reject write-related operations
    + to provide read operations, but adjust semantics as if last_tid in the
      database were backup_tid

- new READ_ONLY_ACCESS protocol error code is introduced so that client can
  raise POSException.ReadOnlyError upon receiving it.
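
The handler selection and error mapping listed above can be sketched roughly as follows. Everything here is a simplified assumption rather than the patch's code: the handler classes, `handler_for`, `client_store`, the tuple-based replies, and the local `ReadOnlyError` class (a stand-in for `POSException.ReadOnlyError`) are hypothetical.

```python
class ReadOnlyError(Exception):
    """Local stand-in for POSException.ReadOnlyError (assumption)."""


READ_ONLY_ACCESS = "READ_ONLY_ACCESS"  # placeholder value for the error code


class RegularHandler:
    def __init__(self, last_tid):
        self.last_tid = last_tid

    def ask_last_tid(self):
        return ("ok", self.last_tid)

    def ask_store(self, oid, data):
        return ("ok", None)


class ReadOnlyHandler(RegularHandler):
    def __init__(self, backup_tid):
        # Reads behave as if the database ended at backup_tid.
        super().__init__(backup_tid)

    def ask_store(self, oid, data):
        # Write-related operations are rejected with the new error code.
        return ("error", READ_ONLY_ACCESS)


def handler_for(state, last_tid, backup_tid):
    """Pick a client service handler from the cluster state."""
    if state == "RUNNING":
        return RegularHandler(last_tid)
    if state == "BACKINGUP":
        return ReadOnlyHandler(backup_tid)
    raise ValueError("unsupported cluster state: %r" % (state,))


def client_store(handler, oid, data):
    """Client side: translate READ_ONLY_ACCESS into ReadOnlyError."""
    status, payload = handler.ask_store(oid, data)
    if status == "error" and payload == READ_ONLY_ACCESS:
        raise ReadOnlyError("cluster only accepts read-only access")
    return payload
```

For example, `handler_for("BACKINGUP", last_tid=50, backup_tid=40).ask_last_tid()` reports 40 in this sketch, and a store through the same handler raises `ReadOnlyError` on the client side.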

I have not implemented a back-channel for invalidations in read-only mode (yet?).
As a result, once a client connects to a cluster in backup state, it won't see
new data fetched by the backup cluster from upstream after the client connected.

The reason invalidations are not implemented is that for now (imho)
there is no ready-made infrastructure to get updates from the
replicating node on transaction-by-transaction basis (it currently only
notifies when whole batch is done). For consistency verification (main
reason for this patch) we also don't need invalidations to work, as in
that task we always connect afresh to backup. So I simply only put
relevant TODOs about invalidations for now.

The patch is not very polished but should work.

/reviewed-on nexedi/neoppod!4

d4944062