First step of the manual cluster startup implementation:

The recovery stage now runs only once, when the master switches to the primary state. Each time the cluster loses its operational status (a cell has no up-to-date node), the verification stage restarts.

git-svn-id: https://svn.erp5.org/repos/neo/branches/prototype3@794 71dcc9de-d417-0410-9af5-da40c76e7ee4
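The control flow described above can be sketched in isolation. This is a minimal illustration, not the NEO implementation: the class `PrimaryMasterSketch`, the scripted `verify_outcomes` list, the `rounds` bound, and the method name `playPrimaryRole` are assumptions made for the example. Only `recoverStatus`, `verifyData`, `provideService` and `VerificationFailure` are names taken from the diff.

```python
class VerificationFailure(Exception):
    """Raised when verification cannot complete (stand-in for NEO's exception)."""


class PrimaryMasterSketch:
    """Hypothetical stand-in for the master Application; not NEO's real class."""

    def __init__(self, verify_outcomes):
        # Scripted results for successive verifyData() calls:
        # True means success, False means a VerificationFailure is raised.
        self.verify_outcomes = list(verify_outcomes)
        self.log = []

    def recoverStatus(self):
        self.log.append('recover')

    def verifyData(self):
        self.log.append('verify')
        if not self.verify_outcomes.pop(0):
            raise VerificationFailure

    def provideService(self):
        self.log.append('service')

    def playPrimaryRole(self, rounds):
        # Recovery runs once, when switching to the primary state.
        self.recoverStatus()
        # The loop in the diff is `while 1`; it is bounded here so the sketch ends.
        for _ in range(rounds):
            try:
                self.verifyData()
            except VerificationFailure:
                # Restart verification only; recovery is not repeated.
                continue
            self.provideService()
        return self.log


master = PrimaryMasterSketch([False, True, True])
print(master.playPrimaryRole(3))
```

In this sketch, a failed verification leads straight to another verification pass, and `recoverStatus()` appears only once in the log regardless of how many passes fail.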

Commit a81824a0 authored Jul 06, 2009 by Grégory Wisniewski

Showing with 12 additions and 16 deletions

neo/master/app.py +12 -16

--- a/neo/master/app.py
+++ b/neo/master/app.py
@@ -528,23 +528,20 @@ class Application(object):
        """Verify the data in storage nodes and clean them up, if necessary."""
        logging.info('start to verify data')

-        em = self.em
-        nm = self.nm
+        em, nm = self.em, self.nm
+        self.changeClusterState(protocol.VERIFYING)

-        # Wait ask/request primary master exchange with the last storage node
-        # because it have to be in the verification state
-        t = time()
-        while time() < t + 1:
+        # wait for any missing node
+        while not self.pt.operational():
            em.poll(1)

-        self.changeClusterState(protocol.VERIFYING)
-
        # FIXME this part has a potential problem that the write buffers can
        # be very huge. Thus it would be better to flush the buffers from time
        # to time _without_ reading packets.

        # Send the current partition table to storage and admin nodes, so that
        # all nodes share the same view.
+        # FIXME: the admin must ask itself the partition table
        for conn in em.getConnectionList():
            uuid = conn.getUUID()
            if uuid is not None:
@@ -690,15 +687,14 @@ class Application(object):
            if node.getState() == RUNNING_STATE:
                node.setState(TEMPORARILY_DOWN_STATE)

+        # recover the cluster status at startup
+        self.recoverStatus()
+
        while 1:
-            recovering = True
-            while recovering:
-                self.recoverStatus()
-                recovering = False
-                try:
-                    self.verifyData()
-                except VerificationFailure:
-                    recovering = True
+            try:
+                self.verifyData()
+            except VerificationFailure:
+                continue
            self.provideService()

    def playSecondaryRole(self):