Products.CMFActivity.ActivityTool: Improve behaviour on single-node instances.

- Ignore node preference when spawning activities. Otherwise, activities
  which are not spawned with a preferred node get an effective priority
  penalty compared to same-priority activities spawned *with* a node
  preference, despite both being executed by the same processing node.
- Break the activity processing loop when the current processing node is
  also the activity validation node. This avoids pathological cases of
  activity accumulation, for example when reindexing an entire site:
  _recurseCallMethod is spawned in processing_node=0, but
  immediateReindexObject is spawned in processing_node=-1 because of its
  serialization_tag dependency. With such a loop, _recurseCallMethod is
  executed over and over, piling up indexation activities until
  _recurseCallMethod stops respawning itself. In turn, this accumulation
  leads to increased overhead and decreased activity processing efficiency.

This may also allow multi-node instances to more reliably use the validation
node as a processing node.

The cost of these changes for multi-node instances should be minimal (no
extra IO necessary, minimal extra code).

A possible drawback on single-node instances is that the tic period may
matter more, because process_timer will return more often.
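
The accumulation described above can be illustrated with a toy model. This is only a sketch under simplifying assumptions, not the CMFActivity implementation: the `simulate` function, its parameters, and the idea that each loop iteration executes exactly one activity are all hypothetical. It counts how many activities wait for validation (processing_node=-1) at the worst moment, with and without the loop.

```python
from collections import deque


def simulate(can_loop, batches=3, per_batch=4):
    """Return the peak number of activities waiting for validation.

    Toy model (assumption): each timer call validates everything that is
    pending, then executes validated activities one at a time.
    """
    validated = deque([("recurse", batches)])  # stands for processing_node=0
    unvalidated = []                           # stands for processing_node=-1
    peak = 0
    while validated or unvalidated:
        # Timer call: validation (distribution) happens first...
        validated.extend(unvalidated)
        unvalidated = []
        # ...then validated activities are executed.
        while validated:
            kind, remaining = validated.popleft()
            if kind == "recurse":
                # Spawn reindexing work that still needs validation.
                unvalidated.extend([("reindex", 0)] * per_batch)
                if remaining > 1:
                    # Self-respawn, directly executable.
                    validated.append(("recurse", remaining - 1))
            peak = max(peak, len(unvalidated))
            if not can_loop:
                break  # return to the timer so validation can run
    return peak


print(simulate(can_loop=True))   # all batches pile up before validation
print(simulate(can_loop=False))  # at most one batch waits at a time
```

In this model, looping lets every batch accumulate before any validation happens, while returning after each iteration keeps the unvalidated backlog bounded by a single batch, at the price of more timer calls.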

041642d0 · Vincent Pelletier · a55b0f78 · 041642d0
Commit 041642d0 authored Dec 21, 2021 by Vincent Pelletier

Showing with 25 additions and 5 deletions

product/CMFActivity/ActivityTool.py +25 -5

--- a/product/CMFActivity/ActivityTool.py
+++ b/product/CMFActivity/ActivityTool.py
@@ -1293,8 +1293,9 @@ class ActivityTool (BaseTool):
            self.registerNode(currentNode)
            processing_node_list = self.getNodeList(role=ROLE_PROCESSING)
+            is_distributing_node = self.getDistributingNode() == currentNode
            # only distribute when we are the distributingNode
-            if self.getDistributingNode() == currentNode:
+            if is_distributing_node:
              self.distribute(len(processing_node_list))
            # SkinsTool uses a REQUEST cache to store skin objects, as
@@ -1310,7 +1311,20 @@ class ActivityTool (BaseTool):
            # the processing_node numbers are the indices of the elements
            # in the node tuple +1 because processing_node starts form 1
            if currentNode in processing_node_list:
-              self.tic(processing_node_list.index(currentNode) + 1)
+              self.tic(
+                processing_node=processing_node_list.index(currentNode) + 1,
+                # Tell tic it cannot keep processing activities forever when
+                # current node is also the distribution node: it must return
+                # in order for validation to happen. Otherwise, there can be
+                # an accumulation of activities in processing_node=-1 which
+                # would otherwise be possible to execute.
+                # To minimize the impact of this control, this is computed
+                # here, so if the current node is given the distribution node
+                # role while already in the "tic" method, it will keep looping.
+                # Node role changes are rare and several are dangerous on a
+                # busy cluster, so it should not be a big drawback.
+                can_loop=not is_distributing_node,
+              )
          except:
            # Catch ALL exception to avoid killing timerserver.
            LOG('ActivityTool', ERROR, 'process_timer received an exception',
@@ -1330,7 +1344,7 @@ class ActivityTool (BaseTool):
        activity.distribute(aq_inner(self), node_count)
    security.declarePublic('tic')
-    def tic(self, processing_node=1, force=0):
+    def tic(self, processing_node=1, force=0, can_loop=True):
      """
        Starts again an activity
        processing_node starts from 1 (there is not node 0)
@@ -1371,6 +1385,8 @@ class ActivityTool (BaseTool):
                break
            else:
              break
+            if not can_loop:
+              break
          finally:
            is_running_lock.release()
      finally:
@@ -1462,9 +1478,13 @@ class ActivityTool (BaseTool):
        elif node != 'same':
          kw['node'] = self.getFamilyId(node)
          break
+        processing_node_list = self.getNodeList(role=ROLE_PROCESSING)
+        if len(processing_node_list) < 2:
+          # If there is less than 2 processing nodes at the time this activity
+          # is spawned, then ignore "same" (including implicit "same").
+          break
        try:
-          kw['node'] = 1 + self.getNodeList(
-            role=ROLE_PROCESSING).index(getCurrentNode())
+          kw['node'] = 1 + processing_node_list.index(getCurrentNode())
        except ValueError:
          pass
        break
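
The node-preference rule introduced by the last hunk can be restated as a small standalone function. This is a hedged sketch: `preferred_node` is a hypothetical helper written for illustration, not part of ActivityTool, and it only mirrors the "same" node case shown in the diff.

```python
def preferred_node(processing_node_list, current_node):
    """Return the 1-based preferred processing node, or None for no preference.

    Hypothetical helper mirroring the hunk above.
    """
    if len(processing_node_list) < 2:
        # With fewer than two processing nodes, a preference changes nothing
        # about where the activity runs, so it is ignored.
        return None
    try:
        # processing_node numbers start from 1, hence the offset.
        return 1 + processing_node_list.index(current_node)
    except ValueError:
        # The current node is not a processing node: no preference.
        return None


print(preferred_node(["node-a"], "node-a"))
print(preferred_node(["node-a", "node-b"], "node-b"))
```

Returning no preference on single-node instances means all activities of a given priority compete on equal terms, which is the point of the first item in the commit message.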