1. 27 Jul, 2016 5 commits
    • Julien Muchembled's avatar
      38583af9
    • Julien Muchembled's avatar
      client: do not limit the number of open connections to storage nodes · 77132157
      Julien Muchembled authored
      There was a bug that connections were not maintained during a TPC,
      which caused transactions to be aborted when the limit was reached.
      
      Given that oids are spreaded evenly over all partitions, and that clients always
      write to all cells of each involved partitions, clients would spend their time
      reconnecting to storage nodes as soon as the limit is reached. So such feature
      really looks counter-productive.
      77132157
    • Julien Muchembled's avatar
    • Julien Muchembled's avatar
      client: fix conflict of node id by never reading from storage without being connected to the master · 11d83ad9
      Julien Muchembled authored
      Client nodes ignored the state of the connection to the master node when reading
      data from storage, as long as their partition tables were recent enough. This
      way, they were able to finish read-only transactions even if they could't reach
      the master, which could be useful for high availability. The downside is that
      the master node ignored that their node ids were still used, which causes "uuid"
      conflicts when reallocating them.
      
      Rejected solutions:
      - An unused NEO Storage should not insist in staying connected to master node.
      - Reverting to big random node identifiers is a lot of work and it would make
        debugging annoying (see commit 23fad3af).
      - Always increasing node ids could have been a simple solution if we accepted
        that the cluster dies after that all 2^24 possible ids were allocated.
      
      Given that reading from storage without being connected to the master can only
      be useful to finish the current transaction (because we always ping the master
      at the beginning of every transaction), keeping such feature is not worth the
      effort.
      
      This commit fixes id conflicts in a very simple way, by clearing the partition
      table upon primary node failure, which forces reconnection to the master before
      querying any storage node. In such case, we raise a special exception that will
      cause the transaction to be restarted, so that the user does not get errors for
      temporary connection failures.
      11d83ad9
    • Julien Muchembled's avatar
      storage: add comment about the idea to lock an oid before reporting a resolvable conflict · 4e17456b
      Julien Muchembled authored
      Currently, another argument not to lock is that we would not be able to test
      incremental resolution anymore. We can think about this again when deadlock
      resolution is implemented.
      4e17456b
  2. 24 Jul, 2016 5 commits
    • Julien Muchembled's avatar
      Fix race conditions in EventManager between _poll/connection_dict and (un)registration · 8b91706a
      Julien Muchembled authored
      The following error was reported on a client node:
      
          #0x0000 Error                   < None (2001:...:2051)
          1 (Retry Later)
          connection closed for <MTClientConnection(uuid=None, address=2001:...:2051, handler=PrimaryNotificationsHandler, closed, client) at 7f1ea7c42f90>
          Event Manager:
          connection started for <MTClientConnection(uuid=None, address=2001:...:2051, handler=PrimaryNotificationsHandler, fd=13, on_close=onConnectionClosed, connecting, client) at 7f1ea7c25c10>
          #0x0000 RequestIdentification          > None (2001:...:2051)
            Readers: []
            Writers: []
            Connections:
              13: <MTClientConnection(uuid=None, address=2001:...:2051, handler=PrimaryNotificationsHandler, fd=13, on_close=onConnectionClosed, connecting, client) at 7f1ea7c25c10> (pending=False)
          Node manager : 1 nodes
          * None |   MASTER | 2001:...:2051 | UNKNOWN
          <ClientCache history_size=0 oid_count=0 size=0 time=0 queue_length=[0] (life_time=10000 max_history_size=100000 max_size=20971520)>
          poll raised, retrying
          Traceback (most recent call last):
            File "neo/lib/threaded_app.py", line 93, in _run
              poll(1)
            File "neo/lib/event.py", line 134, in poll
              self._poll(0)
            File "neo/lib/event.py", line 164, in _poll
              conn = self.connection_dict[fd]
          KeyError: 13
      
      which means that:
      - while the poll thread is getting a (13, EPOLLIN) event because it is
        closed (aborted by the master)
      - another thread processes the error packet, by closing it in
        PrimaryBootstrapHandler.notReady
      - next, the poll thread resumes the execution of EpollEventManager._poll
        and fails to find fd=13 in self.connection_dict
      
      So here, we have a race condition between epoll_wait and any further use
      of connection_dict to map returned fds.
      
      However, what commit a4731a0c does to handle
      the case of fd reallocation only works for mono-threaded applications.
      In EPOLLIN, wrapping 'self.connection_dict[fd]' the same way as for other
      events is not enough. For example:
      - case 1:
        - thread 1: epoll returns fd=13
        - thread 2: close(13)
        - thread 2: open(13)
        - thread 1: self.connection_dict[13] does not match
                    but this would be handled by the 'unregistered' list
      - case 2:
        - thread 1: reset 'unregistered'
        - thread 2: close(13)
        - thread 2: open(13)
        - thread 1: epoll returns fd=13
        - thread 1: self.connection_dict[13] matches
                    but it would be wrongly ignored by 'unregistered'
      - case 3:
        - thread 1: about to call readable/writable/onTimeout on a connection
        - thread 2: this connection is closed
        - thread 1: readable/writable/onTimeout wrongly called on a closed connection
      
      We could protect _poll() with a lock, and make unregister() use wakeup() so
      that it gets a chance to acquire it, but that causes threaded tests to deadlock
      (continuing in this direction seems too complicated).
      
      So we have to deal with the fact that there can be race conditions at any time
      and there's no way to make 'connection_dict' match exactly what epoll returns.
      We solve this by preventing fd reallocation inside _poll(), which is fortunately
      possible with sockets, using 'shutdown': the closing of fds is delayed.
      
      For above case 3, readable/writable/onTimeout for MTClientConnection are also
      changed to test whether the connection is still open while it has the lock.
      Just for safety, we do the same for 'process'.
      
      At last, another kind of race condition that this commit also fixes concerns
      the use of itervalues() on EventManager.connection_dict.
      8b91706a
    • Julien Muchembled's avatar
      Indent many lines before any real change · 4a0b936f
      Julien Muchembled authored
      This is a preliminary commit, without any functional change,
      just to make the next one easier to review.
      4a0b936f
    • Julien Muchembled's avatar
      client: remove redundant check of new connections to the master · 9f4dd15e
      Julien Muchembled authored
      We already have logs when a connection fails,
      and ask() raises ConnectionClosed if the connection is closed.
      9f4dd15e
    • Vincent Pelletier's avatar
      e791dc3f
    • Vincent Pelletier's avatar
      b7e0ec7f
  3. 13 Jul, 2016 1 commit
  4. 17 Jun, 2016 2 commits
  5. 15 Jun, 2016 2 commits
  6. 08 Jun, 2016 2 commits
  7. 26 May, 2016 1 commit
    • Julien Muchembled's avatar
      client: fix the count of history items in the cache · e61b017f
      Julien Muchembled authored
      Cache items are stored in double-linked chains. In order to quickly know the
      number of history items, an extra attribute is used to count them. It was not
      always decremented when a history item was removed.
      
      This led to the following exception:
        <ClientCache history_size=100000 oid_count=1959 size=20970973 time=2849049 queue_length=[1, 7, 738, 355, 480, 66, 255, 44, 3, 5, 2, 1, 3, 4, 2, 2] (life_time=10000 max_history_size=100000 max_size=20971520)>
        poll raised, retrying
        Traceback (most recent call last):
          ...
          File "neo/client/handlers/master.py", line 137, in packetReceived
            cache.store(oid, data, tid, None)
          File "neo/client/cache.py", line 247, in store
            self._add(head)
          File "neo/client/cache.py", line 129, in _add
            self._remove(head)
          File "neo/client/cache.py", line 136, in _remove
            level = item.level
        AttributeError: 'NoneType' object has no attribute 'level'
      e61b017f
  8. 25 Apr, 2016 1 commit
  9. 20 Apr, 2016 2 commits
    • Julien Muchembled's avatar
    • Julien Muchembled's avatar
      storage: fix crash when trying to replicate from an unreachable node · 8b07ff98
      Julien Muchembled authored
      This fixes the following issue:
      
      WARNING replication aborted for partition 1
      DEBUG   connection started for <ClientConnection(uuid=None, address=...:43776, handler=StorageOperationHandler, fd=10, on_close=onConnectionClosed, connecting, client) at 7f5d2067fdd0>
      DEBUG   connect failed for <SocketConnectorIPv6 at 0x7f5d2067fe10 fileno 10 ('::', 0), opened to ('...', 43776)>: ENETUNREACH (Network is unreachable)
      WARNING replication aborted for partition 5
      DEBUG   connection started for <ClientConnection(uuid=None, address=...:43776, handler=StorageOperationHandler, fd=10, on_close=onConnectionClosed, connecting, client) at 7f5d1c409510>
      PACKET  #0x0000 RequestIdentification          > None (...:43776)  | (<EnumItem STORAGE (1)>, None, ('...', 60533), '...')
      ERROR   Pre-mortem data:
      ERROR   Traceback (most recent call last):
      ERROR     File "neo/storage/app.py", line 157, in run
      ERROR       self._run()
      ERROR     File "neo/storage/app.py", line 197, in _run
      ERROR       self.doOperation()
      ERROR     File "neo/storage/app.py", line 285, in doOperation
      ERROR       poll()
      ERROR     File "neo/storage/app.py", line 95, in _poll
      ERROR       self.em.poll(1)
      ERROR     File "neo/lib/event.py", line 121, in poll
      ERROR       self._poll(blocking)
      ERROR     File "neo/lib/event.py", line 165, in _poll
      ERROR       if conn.readable():
      ERROR     File "neo/lib/connection.py", line 481, in readable
      ERROR       self._closure()
      ERROR     File "neo/lib/connection.py", line 539, in _closure
      ERROR       self.close()
      ERROR     File "neo/lib/connection.py", line 531, in close
      ERROR       handler.connectionClosed(self)
      ERROR     File "neo/lib/handler.py", line 135, in connectionClosed
      ERROR       self.connectionLost(conn, NodeStates.TEMPORARILY_DOWN)
      ERROR     File "neo/storage/handlers/storage.py", line 59, in connectionLost
      ERROR       replicator.abort()
      ERROR     File "neo/storage/replicator.py", line 339, in abort
      ERROR       self._nextPartition()
      ERROR     File "neo/storage/replicator.py", line 260, in _nextPartition
      ERROR       None if name else app.uuid, app.server, name or app.name))
      ERROR     File "neo/lib/connection.py", line 562, in ask
      ERROR       raise ConnectionClosed
      ERROR   ConnectionClosed
      8b07ff98
  10. 18 Apr, 2016 1 commit
  11. 01 Apr, 2016 1 commit
  12. 31 Mar, 2016 1 commit
  13. 30 Mar, 2016 2 commits
  14. 28 Mar, 2016 2 commits
  15. 22 Mar, 2016 2 commits
  16. 21 Mar, 2016 3 commits
  17. 09 Mar, 2016 2 commits
  18. 08 Mar, 2016 2 commits
  19. 04 Mar, 2016 3 commits