nexedi / erp5 / Merge requests / !1526

Merged
Created Dec 22, 2021 by Vincent Pelletier (@vpelletier, Owner)

Products.CMFActivity.ActivityTool: Improve behaviour on single-node instances.


Background:

I investigated abnormal activity spawning patterns on Romain's dev instance when reindexing the entire site, which contains about 1 million documents:

  • The main reindexation phase was spawning indexation activities which were not being validated, so once all _recursiveReindexObject activities were done in SQLQueue there were over a million indexation activities in SQLDict. This is because ActivityTool.tic keeps looping as long as it finds activities to run. This is perfectly fine when another process is doing activity validation, but when the cluster consists of a single Zope process it completely freezes activity validation. This not only causes such activity accumulation, but also makes any interactive use of the site impossible: indexation activities spawned by interactive use are never validated either.
  • When by chance some recursiveReindexObject activities in SQLDict did get validated, they were not executed for as long as _recursiveReindexObject activities existed in SQLQueue. This is because recursiveReindexObject activities are spawned without node preference, whereas _recursiveReindexObject activities are spawned with one. These choices make sense, but they also mean that the effective priority of the former is 3, while that of the latter is 2. Combined with the fact that they are spawned in different queues, and the fact that _recursiveReindexObject respawns itself and is immediately validated (inserted with processing_node=0), this means that SQLDict is never executed for as long as _recursiveReindexObject activities exist.
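
The starvation described in the second point can be reproduced with a toy model. This is only a sketch under assumptions: `simulate`, the two deques, and the "lowest effective priority runs first" rule are illustrative stand-ins, not CMFActivity code. Only the activity names and the effective priorities 2 and 3 come from the description above.

```python
from collections import deque

def simulate(respawns, dict_backlog):
    """Return the order in which a single node executes activities.

    Hypothetical model: each entry is (effective_priority, name) and the
    queue whose head has the lowest effective priority always runs first.
    """
    sql_queue = deque([(2, "_recursiveReindexObject")])
    sql_dict = deque([(3, "recursiveReindexObject")] * dict_backlog)
    executed = []
    while sql_queue or sql_dict:
        candidates = [q for q in (sql_queue, sql_dict) if q]
        queue = min(candidates, key=lambda q: q[0][0])
        priority, name = queue.popleft()
        executed.append(name)
        # The SQLQueue activity respawns itself already validated, so it
        # is immediately eligible again with the same effective priority.
        if name == "_recursiveReindexObject" and respawns > 0:
            respawns -= 1
            sql_queue.append((priority, name))
    return executed

order = simulate(respawns=3, dict_backlog=2)
```

In this model every `recursiveReindexObject` entry ends up at the tail of `order`, however long the respawn chain is.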

The first point is fixed by having ActivityTool.process_timer tell ActivityTool.tic whether it is allowed to keep executing activities, and disallowing it when the current node is the validation node. The internal logic of breaking the iteration when a queue could execute activities is preserved, so that activity validation happens before queue priorities are recomputed.
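
A minimal sketch of this control flow, assuming a much simplified node: the `ToyNode` class, its counters, and the `keep_going` parameter name are hypothetical, and the per-queue iteration of the real `tic` is left out. Only the `process_timer` / `tic` split mirrors the description.

```python
class ToyNode:
    """Hypothetical single-process model; not the real ActivityTool."""

    def __init__(self, is_validation_node, pending, batch_size=2):
        self.is_validation_node = is_validation_node
        self.pending = pending        # spawned but not yet validated
        self.validated = 0            # validated and ready to execute
        self.executed = 0
        self.batch_size = batch_size

    def distribute(self):
        # Validation step: every pending activity becomes executable.
        self.validated += self.pending
        self.pending = 0

    def execute_batch(self):
        # Execute up to batch_size validated activities.
        count = min(self.batch_size, self.validated)
        self.validated -= count
        self.executed += count
        return count > 0

    def tic(self, keep_going):
        # Execute batches; stop after the first one unless allowed to loop.
        while self.execute_batch():
            if not keep_going:
                break

    def process_timer(self):
        if self.is_validation_node:
            self.distribute()
        # A node that also validates returns after one batch, so that the
        # next timer call can validate again before more work is executed.
        self.tic(keep_going=not self.is_validation_node)
```

With `ToyNode(True, pending=5)`, each `process_timer` call validates and then executes a single batch, whereas a node created with `is_validation_node=False` drains everything already validated in one call.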

The second point is fixed by not setting the same-node preference when spawning activities at a time when there is a single processing node. This is done at activity insertion because it seems easier to do there, with a very low overhead, than during priority computations later in the activity's lifecycle. This means that a cluster temporarily configured with a single processing node will trigger this condition for all activities spawned during that period, but I believe this is exceedingly rare, and the temporary performance loss from sub-optimal node selection in such a transitory configuration should be negligible. Explicit node family choices are obeyed independently of the number of processing nodes.
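
The insertion-time decision can be sketched as follows. The helper name `resolve_node_preference`, the `"same"` marker, and the use of `None` for "no preference" are assumptions made for illustration; the actual change lives in the activity insertion code and may be shaped differently.

```python
def resolve_node_preference(requested, processing_node_count):
    """Return the node preference to store with a newly spawned activity.

    Hypothetical helper: `requested` is None (no preference), "same"
    (prefer the spawning node), or an explicit node family name.
    """
    if requested == "same" and processing_node_count <= 1:
        # A single processing node: preferring it brings no benefit and
        # only skews the effective priority, so drop the preference.
        return None
    # Explicit node families (and "same" on multi-node clusters) are
    # kept regardless of the number of processing nodes.
    return requested
```

For example, `resolve_node_preference("same", 1)` returns `None`, while an explicit family name passed with a node count of 1 is returned unchanged.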

These changes should have no noticeable performance impact on setups with multiple processing nodes.

They should also improve the behaviour of a node configured both as validation node and as processing node (which is historically not a recommended setup), as such a node will no longer completely stall validation for as long as there are processable activities. I would still recommend against such a setup, as it necessarily increases validation latency, which has a negative effect on activity performance.

With these changes, the activity spawning & execution pattern on Romain's single-node instance was much more stable.

/cc @jm @romain

Source branch: cmfactivity_single_zope