1. 08 Nov, 2018 15 commits
  2. 07 Nov, 2018 8 commits
  3. 05 Nov, 2018 2 commits
  4. 06 Sep, 2018 2 commits
    • storage: fix assertion failure in case of connection reset with a client node · 652f1f0d
      Here is what happened after simulating a network failure between a client and
      a storage:
      
      C8
      
      DEBUG   recv failed for <SSLSocketConnectorIPv6 at 0x7f8198027f90 fileno 17 ('xxxx:xxxx:120:cd8::90a1', 53970), opened to ('xxxx:xxxx:60:4c2c::25c3', 39085)>: ECONNRESET (Connection reset by peer)
      DEBUG   connection closed for <MTClientConnection(uuid=S2, address=[xxxx:xxxx:60:4c2c::25c3]:39085, handler=StorageEventHandler, closed, client) at 7f81939a0950>
      DEBUG   connection started for <MTClientConnection(uuid=S2, address=[xxxx:xxxx:60:4c2c::25c3]:39085, handler=StorageEventHandler, fd=17, on_close=onConnectionClosed, connecting, client) at 7f8192eb17d0>
      PACKET  #0x0000 RequestIdentification          > S2 ([xxxx:xxxx:60:4c2c::25c3]:39085)        | (<EnumItem CLIENT (2)>, -536870904, None, '...', [], 1535555463.455761)
      DEBUG   SSL handshake done for <SSLSocketConnectorIPv6 at 0x7f8192eb1850 fileno 17 ('xxxx:xxxx:120:cd8::90a1', 54014), opened to ('xxxx:xxxx:60:4c2c::25c3', 39085)>: ECDHE-RSA-AES256-GCM-SHA384 256
      DEBUG   connection completed for <MTClientConnection(uuid=S2, address=[xxxx:xxxx:60:4c2c::25c3]:39085, handler=StorageEventHandler, fd=17, on_close=onConnectionClosed, client) at 7f8192eb17d0> (from xxxx:xxxx:120:cd8::90a1:54014)
      DEBUG   <SSLSocketConnectorIPv6 at 0x7f8192eb1850 fileno 17 ('xxxx:xxxx:120:cd8::90a1', 54014), opened to ('xxxx:xxxx:60:4c2c::25c3', 39085)> closed in recv
      DEBUG   connection closed for <MTClientConnection(uuid=S2, address=[xxxx:xxxx:60:4c2c::25c3]:39085, handler=StorageEventHandler, closed, client) at 7f8192eb17d0>
      ERROR   Connection to <StorageNode(uuid=S2, address=[xxxx:xxxx:60:4c2c::25c3]:39085, state=RUNNING, connection=None, not identified) at 7f81a8874690> failed
      
      S2
      
      DEBUG   accepted a connection from xxxx:xxxx:120:cd8::90a1:54014
      DEBUG   SSL handshake done for <SSLSocketConnectorIPv6 at 0x7f657144a910 fileno 22 ('xxxx:xxxx:60:4c2c::25c3', 39085), opened from ('xxxx:xxxx:120:cd8::90a1', 54014)>: ECDHE-RSA-AES256-GCM-SHA384 256
      DEBUG   connection completed for <ServerConnection(uuid=None, address=[xxxx:xxxx:120:cd8::90a1]:54014, handler=IdentificationHandler, fd=22, server) at 7f657144a090> (from xxxx:xxxx:60:4c2c::25c3:39085)
      PACKET  #0x0000 RequestIdentification          < None ([xxxx:xxxx:120:cd8::90a1]:54014)         | (<EnumItem CLIENT (2)>, -536870904, None, '...', [], 1535555463.455761)
      DEBUG   connection closed for <ServerConnection(uuid=None, address=[xxxx:xxxx:120:cd8::90a1]:54014, handler=IdentificationHandler, closed, server) at 7f657144a090>
      WARNING A connection was lost during identification
      ERROR   Pre-mortem data:
      ERROR   Traceback (most recent call last):
      ERROR     File "neo/storage/app.py", line 194, in run
      ERROR       self._run()
      ERROR     File "neo/storage/app.py", line 225, in _run
      ERROR       self.doOperation()
      ERROR     File "neo/storage/app.py", line 310, in doOperation
      ERROR       poll()
      ERROR     File "neo/storage/app.py", line 134, in _poll
      ERROR       self.em.poll(1)
      ERROR     File "neo/lib/event.py", line 160, in poll
      ERROR       to_process.process()
      ERROR     File "neo/lib/connection.py", line 499, in process
      ERROR       self._handlers.handle(self, self._queue.pop(0))
      ERROR     File "neo/lib/connection.py", line 85, in handle
      ERROR       self._handle(connection, packet)
      ERROR     File "neo/lib/connection.py", line 100, in _handle
      ERROR       pending[0][1].packetReceived(connection, packet)
      ERROR     File "neo/lib/handler.py", line 123, in packetReceived
      ERROR       self.dispatch(*args)
      ERROR     File "neo/lib/handler.py", line 72, in dispatch
      ERROR       method(conn, *args, **kw)
      ERROR     File "neo/storage/handlers/identification.py", line 56, in requestIdentification
      ERROR       assert not node.isConnected(), node
      ERROR   AssertionError: <ClientNode(uuid=C8, state=RUNNING, connection=<ServerConnection(uuid=C8, address=[xxxx:xxxx:120:cd8::90a1]:53970, handler=ClientOperationHandler, fd=18, on_close=onConnectionClosed, server) at 7f657147d7d0>) at 7f65714d6cd0>
      Julien Muchembled committed
  5. 03 Sep, 2018 1 commit
    • qa: document a rare random failure in testExport · b54c1c68
      Traceback (most recent call last):
        File "neo/tests/functional/testClient.py", line 241, in testExport
          self.assertEqual(dump, self.__dump(neo_db.storage, list))
        File "neo/tests/functional/testClient.py", line 210, in __dump
          for t in storage.iterator()}
        File "neo/tests/functional/testClient.py", line 207, in <dictcomp>
          return {u64(t.tid): sorted((u64(o.oid), o.data_txn and u64(o.data_txn),
        File "neo/client/iterator.py", line 69, in iterator
          max_tid, chunk = app.transactionLog(start, stop, CHUNK_LENGTH)
        File "neo/client/app.py", line 841, in transactionLog
          Packets.AskTIDsFrom(start, stop, limit, offset))
        File "neo/client/app.py", line 296, in _askStorageForRead
          return askStorage(conn, packet)
        File "neo/client/app.py", line 164, in _askStorage
          return self._ask(conn, packet, handler=self.storage_handler, **kw)
        File "neo/lib/threaded_app.py", line 144, in _ask
          _handlePacket(qconn, qpacket, kw, handler)
        File "neo/lib/threaded_app.py", line 133, in _handlePacket
          handler.dispatch(conn, packet, kw)
        File "neo/lib/handler.py", line 72, in dispatch
          method(conn, *args, **kw)
        File "neo/lib/handler.py", line 208, in error
          getattr(self, Errors[code])(conn, message)
        File "neo/lib/handler.py", line 227, in backendNotImplemented
          raise NotImplementedError(message)
      NotImplementedError: neo.storage.database.importer.ImporterDatabaseManager does not implement getReplicationTIDList
      Julien Muchembled committed
  6. 13 Aug, 2018 1 commit
  7. 07 Aug, 2018 2 commits
    • Use argparse instead of optparse · 9f1e4eef
      Besides the use of another module for option parsing, the main change is that
      there's no more Config class that mixes configuration for different components.
      Application classes now take a simple 'dict' with parsed values.
      
      The changes in 'neoctl' are somewhat ugly, because command-line options are not
      defined on the command-line class, but this component is likely to disappear
      in the future.
      
      It remains possible to pass options via a configuration file. The code is a bit
      complex but isolated in neo.lib.config.
      
      For SSL, the code could be simpler if we switched to a single --ssl option that
      takes 3 paths. This was not done, to avoid breaking compatibility; hence the
      hack with an extra OptionList class in neo.lib.app.
      
      A new functional test tests the 'neomigrate' script, instead of just the
      internal API to migrate data.
      Julien Muchembled committed
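The pattern described in this commit message (an argparse parser whose result is handed to the application as a plain dict) might look roughly like the sketch below. The option names and the `Application` class are hypothetical and only illustrate the idea; they are not taken from NEO's code.

```python
import argparse


def build_parser():
    # Hypothetical options, chosen for illustration only.
    parser = argparse.ArgumentParser(description="example node")
    parser.add_argument("-c", "--cluster", required=True,
                        help="name of the cluster")
    parser.add_argument("-m", "--masters", default="",
                        help="space-separated list of master addresses")
    parser.add_argument("-r", "--replicas", type=int, default=0,
                        help="number of replicas")
    return parser


class Application:
    """Hypothetical application class that takes a dict of parsed values."""

    def __init__(self, config):
        self.cluster = config["cluster"]
        self.masters = config["masters"].split()
        self.replicas = config["replicas"]


def main(argv=None):
    args = build_parser().parse_args(argv)
    # vars() turns the argparse.Namespace into a plain dict, so the
    # application does not depend on any shared configuration class.
    return Application(vars(args))
```

Because the application only sees a dict, tests can construct it directly, for example `Application({"cluster": "test", "masters": "", "replicas": 1})`, without going through the command line.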
    • neolog: use argparse instead of optparse · 56d0b764
      Julien Muchembled committed
  8. 06 Aug, 2018 1 commit
    • Add comment about dormant bug when sending a lot of data to a slow node · 05e19861
      This mainly concerns the storage node, and depending on how its polling loop is
      changed, the following crash could happen again during replication:
      
        File "neo/scripts/neostorage.py", line 66, in main
          app.run()
        File "neo/storage/app.py", line 147, in run
          self._run()
        File "neo/storage/app.py", line 178, in _run
          self.doOperation()
        File "neo/storage/app.py", line 258, in doOperation
          _poll(0)
        File "neo/lib/event.py", line 231, in _poll
          conn.writable()
        File "neo/lib/connection.py", line 418, in writable
          if self.connector.send():
        File "neo/lib/connector.py", line 179, in send
          n = self.socket.send(msg)
        File "ssl.py", line 719, in send
          v = self._sslobj.write(data)
      OverflowError: string longer than 2147483647 byte
      Julien Muchembled committed
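The traceback shows a single SSL write being handed more than 2147483647 bytes. One generic way to stay under such a limit is to cap how much is passed to each `send()` call. The sketch below illustrates that idea only; `MAX_CHUNK`, `send_bounded` and `send_all` are hypothetical names, and this is not what the commit does (it only adds a comment).

```python
import socket

MAX_CHUNK = 1 << 20  # hypothetical cap per send() call, far below 2**31 - 1


def send_bounded(sock, buf, offset=0, max_chunk=MAX_CHUNK):
    """Pass at most max_chunk bytes of buf, starting at offset, to sock.send().

    Returns the new offset. The memoryview slice avoids copying the buffer.
    """
    view = memoryview(buf)[offset:offset + max_chunk]
    return offset + sock.send(view)


def send_all(sock, buf, max_chunk=MAX_CHUNK):
    """Blocking loop, for brevity; an event loop would instead call
    send_bounded() once each time the socket is reported writable."""
    offset = 0
    while offset < len(buf):
        offset = send_bounded(sock, buf, offset, max_chunk)
    return offset


if __name__ == "__main__":
    a, b = socket.socketpair()
    send_all(a, b"hello world", max_chunk=4)
    print(b.recv(64))
    a.close()
    b.close()
```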
  9. 30 Jul, 2018 1 commit
  10. 16 Jul, 2018 1 commit
  11. 22 Jun, 2018 2 commits
    • Maximize resiliency by taking into account the topology of storage nodes · 97af23cc
      This commit adds a constraint when tweaking the partition table with replicas,
      so that cells of each partition are assigned as far as possible from each
      other, e.g. not on the same machine even if each one has several disks, and
      in any case not on the same storage device.
      
      Currently, the topology path of each node is automatically calculated by the
      storage backend. Both MySQL and SQLite return a 2-tuple (host, st_dev).
      To be improved:
      - Add a storage option to override the path: the 'tweak' algorithm can already
        handle topology paths of any length, so something like (room, machine, disk)
        could be done easily.
      - Write OS-specific code to determine the real hardware behind st_dev
        (e.g. 2 different 'st_dev' values may actually refer to the same disk,
         because of layers like partitioning, device-mapper, loop, btrfs subvolumes,
         and so on).
      - Make 'neoctl' report in some way if the PT is optimal. Meanwhile,
        if it isn't, the master only logs a WARNING during tweak.
      Julien Muchembled committed
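As a rough illustration of the topology idea, the sketch below builds a (host, st_dev) path and compares two paths by the number of leading components they share: the fewer shared components, the farther apart two cells are. The function names are hypothetical, using `socket.gethostname()` for the host part is an assumption, and this is not NEO's actual 'tweak' algorithm.

```python
import os
import socket


def topology_path(data_dir):
    """Return a (host, st_dev) tuple for the device holding data_dir.

    Assumption: the host component is the local hostname.
    """
    return socket.gethostname(), os.stat(data_dir).st_dev


def common_prefix_len(path_a, path_b):
    """Number of leading components shared by two topology paths.

    Works for paths of any length, e.g. (room, machine, disk).
    """
    n = 0
    for a, b in zip(path_a, path_b):
        if a != b:
            break
        n += 1
    return n


def pick_farthest(candidates, used):
    """Pick the candidate path sharing the least topology with those in `used`."""
    return min(
        candidates,
        key=lambda c: max((common_prefix_len(c, u) for u in used), default=0),
    )
```

With `used = [("h1", 1)]`, a candidate on another machine such as `("h2", 1)` is preferred over `("h1", 2)`, which is merely another disk on the same machine.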
    • storage: also commit updated cell TID at each replicated chunk of 'obj' records · d4ea398d
      This is a follow-up of commit b3dd69730cf0e4273e1be33ee3a5ee382836b3b3
      ("Optimize resumption of replication by starting from a greater TID").
      I missed the case where a storage node is restarted while it is replicating:
      it lost the TID where it was interrupted.
      
      Although we commit after each replicated chunk, which avoids transferring
      all the data again from the beginning, the node could still waste time
      checking that the data were already replicated.
      Julien Muchembled committed
  12. 21 Jun, 2018 1 commit
  13. 19 Jun, 2018 3 commits