Skip to content

GitLab

  • Projects
  • Groups
  • Snippets
  • Help
    • Loading...
  • Help
    • Help
    • Support
    • Community forum
    • Submit feedback
    • Contribute to GitLab
  • Sign in / Register
erp5 erp5
  • Project overview
    • Project overview
    • Details
    • Activity
    • Releases
  • Repository
    • Repository
    • Files
    • Commits
    • Branches
    • Tags
    • Contributors
    • Graph
    • Compare
  • Labels
    • Labels
  • Merge requests 136
    • Merge requests 136
  • CI/CD
    • CI/CD
    • Pipelines
    • Jobs
    • Schedules
  • Operations
    • Operations
    • Environments
  • Analytics
    • Analytics
    • CI/CD
    • Repository
    • Value Stream
  • Wiki
    • Wiki
  • Snippets
    • Snippets
  • Members
    • Members
  • Activity
  • Graph
  • Jobs
  • Commits
Collapse sidebar
  • nexedi
  • erp5erp5
  • Merge requests
  • !1491

Merged
Created Sep 17, 2021 by Vincent Pelletier@vpelletierOwner

CMFActivity.Activity.SQLBase: Reduce the number of deadlocks

  • Overview 14
  • Commits 1
  • Changes 1

MariaDB seems to be using inconsistent lock acquisition order when executing the activity reservation queries. As a consequence, it produces internal deadlocks, which it detects. Upon detection, it kills one of the involved query, which causes message reservation to fail, despite the presence of executable activities.

To avoid depending on MariaDB internal lock acquisition order, acquire an explicit table-scoped lock before running the activity reservation queries.

On an otherwise-idle 31 processing node cluster with the following activities spawned, designed to stress activity reservation queries (many ultra-short activities being executed one at a time):

active_getTitle = context.getPortalObject().portal_catalog.activate(
  activity='SQLQueue',
  priority=5,
  tag='foo',
).getTitle
for _ in xrange(40000):
  active_getTitle()

the results are:

  • a 26% shorter activity execution time: from 206s with the original code to 152s
  • a 100% reduction in reported deadlocks from 300 with the original code to 0

There is room for further improvements at a later time:

  • tweaking the amount of time spent waiting for this new lock to be available, set for now at 1s.
  • possibly bypassing this lock altogether when there are too few processing nodes simultaneously enabled, or even in an adaptive reaction to deadlock errors actually happening.
  • cover more write accesses to these tables with the same lock

From a production environment, it appears that the getReservedMessageList method alone is involved in 95% of these deadlocks, so for now this change only targets this method.

/cc @jm @georgios.dagkakis

Assignee
Assign to
Reviewer
Request review from
None
Milestone
None
Assign milestone
Time tracking
Source branch: cmfactivity_sqllock
GitLab Nexedi Edition | About GitLab | About Nexedi | 沪ICP备2021021310号-2 | 沪ICP备2021021310号-7