1. 31 Dec, 2018 4 commits
  2. 05 Dec, 2018 1 commit
  3. 21 Nov, 2018 4 commits
    • fixup! client: discard late answers to lockless writes · 8ef1ddba
      Since commit 50e7fe52,
      some code can be simplified.
      Julien Muchembled committed
    • client: fix race condition between Storage.load() and invalidations · a2e278d5
      This fixes a bug that could manifest as follows:
      
        Traceback (most recent call last):
          File "neo/client/app.py", line 432, in load
            self._cache.store(oid, data, tid, next_tid)
          File "neo/client/cache.py", line 223, in store
            assert item.tid == tid, (item, tid)
        AssertionError: (<CacheItem oid='\x00\x00\x00\x00\x00\x00\x00\x01' tid='\x03\xcb\xc6\xca\xfd\xc7\xda\xee' next_tid='\x03\xcb\xc6\xca\xfd\xd8\t\x88' data='...' counter=1 level=1 expire=10000 prev=<...> next=<...>>, '\x03\xcb\xc6\xca\xfd\xd8\t\x88')
      
      The big changes in the threaded test framework are required because the test
      must reproduce a race condition between client threads, which conflicts with
      the serialization of epoll events (deadlock).
      Julien Muchembled committed
    • client: fix race condition in refcounting dispatched answer packets · 743026d5
      This was found when stress-testing a big cluster. One client node was stuck:
      
        (Pdb) pp app.dispatcher.__dict__
        {'lock_acquire': <built-in method acquire of thread.lock object at 0x7f788c6e4250>,
        'lock_release': <built-in method release of thread.lock object at 0x7f788c6e4250>,
        'message_table': {140155667614608: {},
                          140155668875280: {},
                          140155671145872: {},
                          140155672381008: {},
                          140155672381136: {},
                          140155672381456: {},
                          140155673002448: {},
                          140155673449680: {},
                          140155676093648: {170: <neo.lib.locking.SimpleQueue object at 0x7f788a109c58>},
                          140155677536464: {},
                          140155679224336: {},
                          140155679876496: {},
                          140155680702992: {},
                          140155681851920: {},
                          140155681852624: {},
                          140155682773584: {},
                          140155685988880: {},
                          140155693061328: {},
                          140155693062224: {},
                          140155693074960: {},
                          140155696334736: {278: <neo.lib.locking.SimpleQueue object at 0x7f788a109c58>},
                          140155696411408: {},
                          140155696414160: {},
                          140155696576208: {},
                          140155722373904: {}},
        'queue_dict': {140155673622936: 1, 140155689147480: 2}}
      
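      140155673622936 should not be in queue_dict: no pending message in
      message_table references a queue with that id (the only queue listed,
      at 0x7f788a109c58, is 140155689147480).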
      Julien Muchembled committed
    • More RTMIN+2 (log) information for clients and connections · 7e456329
      Julien Muchembled committed
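The inconsistency shown in the `pp app.dispatcher.__dict__` dump of commit 743026d5 can be checked mechanically. The following is a minimal sketch, not NEO code: `find_stale_refcounts` is a hypothetical helper, and it assumes (as the dump suggests) that `message_table` maps a connection id to `{msg_id: queue}` and that `queue_dict` maps `id(queue)` to the number of pending answers for that queue. Plain `object()` instances stand in for `SimpleQueue`.

```python
from collections import Counter


def find_stale_refcounts(message_table, queue_dict):
    """Compare recorded refcounts with the queues actually referenced.

    Hypothetical diagnostic helper (not part of NEO). Assumes that
    message_table maps connection id -> {msg_id: queue} and that
    queue_dict maps id(queue) -> number of pending answers.
    Returns {queue_id: (recorded, actual)} for every mismatch.
    """
    # Recount how many pending messages reference each queue.
    actual = Counter(
        id(queue)
        for pending in message_table.values()
        for queue in pending.values()
    )
    return {
        qid: (queue_dict.get(qid, 0), actual.get(qid, 0))
        for qid in set(queue_dict) | set(actual)
        if queue_dict.get(qid, 0) != actual.get(qid, 0)
    }


# Stand-ins for SimpleQueue objects; both stay alive so their ids differ.
live_queue, leaked_queue = object(), object()
message_table = {1: {}, 2: {170: live_queue}, 3: {278: live_queue}}
queue_dict = {id(leaked_queue): 1, id(live_queue): 2}

stale = find_stale_refcounts(message_table, queue_dict)
```

With this data, shaped like the dump, only the entry for `leaked_queue` is reported: its recorded count is 1 but nothing in `message_table` references it. That mirrors the 140155673622936 entry above.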
  4. 15 Nov, 2018 3 commits
  5. 08 Nov, 2018 15 commits
  6. 07 Nov, 2018 8 commits
  7. 05 Nov, 2018 2 commits
  8. 06 Sep, 2018 2 commits
    • storage: fix assertion failure in case of connection reset with a client node · 652f1f0d
      Here is what happened after simulating a network failure between a client
      (C8) and a storage node (S2):
      
      C8
      
      DEBUG   recv failed for <SSLSocketConnectorIPv6 at 0x7f8198027f90 fileno 17 ('xxxx:xxxx:120:cd8::90a1', 53970), opened to ('xxxx:xxxx:60:4c2c::25c3', 39085)>: ECONNRESET (Connection reset by peer)
      DEBUG   connection closed for <MTClientConnection(uuid=S2, address=[xxxx:xxxx:60:4c2c::25c3]:39085, handler=StorageEventHandler, closed, client) at 7f81939a0950>
      DEBUG   connection started for <MTClientConnection(uuid=S2, address=[xxxx:xxxx:60:4c2c::25c3]:39085, handler=StorageEventHandler, fd=17, on_close=onConnectionClosed, connecting, client) at 7f8192eb17d0>
      PACKET  #0x0000 RequestIdentification          > S2 ([xxxx:xxxx:60:4c2c::25c3]:39085)        | (<EnumItem CLIENT (2)>, -536870904, None, '...', [], 1535555463.455761)
      DEBUG   SSL handshake done for <SSLSocketConnectorIPv6 at 0x7f8192eb1850 fileno 17 ('xxxx:xxxx:120:cd8::90a1', 54014), opened to ('xxxx:xxxx:60:4c2c::25c3', 39085)>: ECDHE-RSA-AES256-GCM-SHA384 256
      DEBUG   connection completed for <MTClientConnection(uuid=S2, address=[xxxx:xxxx:60:4c2c::25c3]:39085, handler=StorageEventHandler, fd=17, on_close=onConnectionClosed, client) at 7f8192eb17d0> (from xxxx:xxxx:120:cd8::90a1:54014)
      DEBUG   <SSLSocketConnectorIPv6 at 0x7f8192eb1850 fileno 17 ('xxxx:xxxx:120:cd8::90a1', 54014), opened to ('xxxx:xxxx:60:4c2c::25c3', 39085)> closed in recv
      DEBUG   connection closed for <MTClientConnection(uuid=S2, address=[xxxx:xxxx:60:4c2c::25c3]:39085, handler=StorageEventHandler, closed, client) at 7f8192eb17d0>
      ERROR   Connection to <StorageNode(uuid=S2, address=[xxxx:xxxx:60:4c2c::25c3]:39085, state=RUNNING, connection=None, not identified) at 7f81a8874690> failed
      
      S2
      
      DEBUG   accepted a connection from xxxx:xxxx:120:cd8::90a1:54014
      DEBUG   SSL handshake done for <SSLSocketConnectorIPv6 at 0x7f657144a910 fileno 22 ('xxxx:xxxx:60:4c2c::25c3', 39085), opened from ('xxxx:xxxx:120:cd8::90a1', 54014)>: ECDHE-RSA-AES256-GCM-SHA384 256
      DEBUG   connection completed for <ServerConnection(uuid=None, address=[xxxx:xxxx:120:cd8::90a1]:54014, handler=IdentificationHandler, fd=22, server) at 7f657144a090> (from xxxx:xxxx:60:4c2c::25c3:39085)
      PACKET  #0x0000 RequestIdentification          < None ([xxxx:xxxx:120:cd8::90a1]:54014)         | (<EnumItem CLIENT (2)>, -536870904, None, '...', [], 1535555463.455761)
      DEBUG   connection closed for <ServerConnection(uuid=None, address=[xxxx:xxxx:120:cd8::90a1]:54014, handler=IdentificationHandler, closed, server) at 7f657144a090>
      WARNING A connection was lost during identification
      ERROR   Pre-mortem data:
      ERROR   Traceback (most recent call last):
      ERROR     File "neo/storage/app.py", line 194, in run
      ERROR       self._run()
      ERROR     File "neo/storage/app.py", line 225, in _run
      ERROR       self.doOperation()
      ERROR     File "neo/storage/app.py", line 310, in doOperation
      ERROR       poll()
      ERROR     File "neo/storage/app.py", line 134, in _poll
      ERROR       self.em.poll(1)
      ERROR     File "neo/lib/event.py", line 160, in poll
      ERROR       to_process.process()
      ERROR     File "neo/lib/connection.py", line 499, in process
      ERROR       self._handlers.handle(self, self._queue.pop(0))
      ERROR     File "neo/lib/connection.py", line 85, in handle
      ERROR       self._handle(connection, packet)
      ERROR     File "neo/lib/connection.py", line 100, in _handle
      ERROR       pending[0][1].packetReceived(connection, packet)
      ERROR     File "neo/lib/handler.py", line 123, in packetReceived
      ERROR       self.dispatch(*args)
      ERROR     File "neo/lib/handler.py", line 72, in dispatch
      ERROR       method(conn, *args, **kw)
      ERROR     File "neo/storage/handlers/identification.py", line 56, in requestIdentification
      ERROR       assert not node.isConnected(), node
      ERROR   AssertionError: <ClientNode(uuid=C8, state=RUNNING, connection=<ServerConnection(uuid=C8, address=[xxxx:xxxx:120:cd8::90a1]:53970, handler=ClientOperationHandler, fd=18, on_close=onConnectionClosed, server) at 7f657147d7d0>) at 7f65714d6cd0>
      Julien Muchembled committed
  9. 03 Sep, 2018 1 commit
    • qa: document a rare random failure in testExport · b54c1c68
      Traceback (most recent call last):
        File "neo/tests/functional/testClient.py", line 241, in testExport
          self.assertEqual(dump, self.__dump(neo_db.storage, list))
        File "neo/tests/functional/testClient.py", line 210, in __dump
          for t in storage.iterator()}
        File "neo/tests/functional/testClient.py", line 207, in <dictcomp>
          return {u64(t.tid): sorted((u64(o.oid), o.data_txn and u64(o.data_txn),
        File "neo/client/iterator.py", line 69, in iterator
          max_tid, chunk = app.transactionLog(start, stop, CHUNK_LENGTH)
        File "neo/client/app.py", line 841, in transactionLog
          Packets.AskTIDsFrom(start, stop, limit, offset))
        File "neo/client/app.py", line 296, in _askStorageForRead
          return askStorage(conn, packet)
        File "neo/client/app.py", line 164, in _askStorage
          return self._ask(conn, packet, handler=self.storage_handler, **kw)
        File "neo/lib/threaded_app.py", line 144, in _ask
          _handlePacket(qconn, qpacket, kw, handler)
        File "neo/lib/threaded_app.py", line 133, in _handlePacket
          handler.dispatch(conn, packet, kw)
        File "neo/lib/handler.py", line 72, in dispatch
          method(conn, *args, **kw)
        File "neo/lib/handler.py", line 208, in error
          getattr(self, Errors[code])(conn, message)
        File "neo/lib/handler.py", line 227, in backendNotImplemented
          raise NotImplementedError(message)
      NotImplementedError: neo.storage.database.importer.ImporterDatabaseManager does not implement getReplicationTIDList
      Julien Muchembled committed