    CMFActivity: new activate() parameter to prefer executing on the same node · 6dba0a9e
    Authored by Julien Muchembled
    The goal is to make better use of the ZODB Storage cache. It is common to do
    processing on a data set in several sequential transactions: in such a case, by
    continuing execution of these messages on the same node, data is loaded from
    ZODB only once. Without this, and if there are many other messages to process,
    processing always continues on a random node, causing much more load on ZODB.
    
    To prevent nodes from having too much work to do, or too little compared to
    other nodes, this new parameter is only a hint for CMFActivity. It remains
    possible for a node to execute a message that was intended for another node.
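
    For example (a minimal sketch; `documents` and the priority value are purely
    illustrative, the `node` argument of activate() is what this commit adds):

      for document in documents:
          # Prefer the node that spawned this activity, so that objects already
          # present in its ZODB Storage cache can be reused.
          document.activate(node='same', priority=3).recursiveReindexObject()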
    
    Before this commit, a processing node selects the first message(s) according to
    the following ordering:
    
      priority, date
    
    and now:
    
      priority, node_preference, date
    
    where node_preference is:
    
      -1 -> same node
       0 -> no preferred node
       1 -> another node
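
    Conceptually, the new sort key can be sketched in Python as follows (the
    `messages`, `m.node`, `m.priority`, `m.date` and `current_node` names are
    illustrative only; the real ordering is done in SQL, as described below):

      def node_preference(node, current_node):
          if node == current_node:
              return -1   # message assigned to this very node: pick it first
          if not node:
              return 0    # no preferred node
          return 1        # intended for another node: pick it last

      messages.sort(key=lambda m:
          (m.priority, node_preference(m.node, current_node), m.date))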
    
    The implementation is tricky for 2 reasons:
    - MariaDB can't order this way in a single simple query, so there is 1
      subquery for each case, potentially fetching 3 times the wanted maximum
      number of messages, and the resulting union is then ordered and filtered
      (see the sketch after this list).
    - MariaDB also can't efficiently select only the messages intended for other
      nodes, so the 3rd subquery returns messages for any node, potentially
      duplicating results from the first 2 subqueries. This works because they'll
      be ordered last. Unfortunately, this requires extra indices.
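
    Roughly, the selection query has the following shape (a simplified sketch
    written as a Python string for illustration; the table and column names are
    assumptions, and the reservation/locking step is omitted):

      limit = 10
      node_id = 3
      subquery = ("(SELECT uid, priority, node, date FROM message"
                  " WHERE processing_node = 0 AND %s"
                  " ORDER BY priority, date LIMIT %d)")
      query = " UNION ALL ".join((
          subquery % ("node = %d" % node_id, limit),  # same node     (-1)
          subquery % ("node = 0", limit),             # no preference ( 0)
          subquery % ("1", limit),                    # any node      (+1), may duplicate
      ))
      # The union (up to 3 * limit rows) is then ordered by
      # (priority, node_preference, date) and truncated to `limit` rows.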
    
    In any case, message reservation must be very efficient, or MariaDB deadlocks
    happen quickly, and locking an activity table during reservation reduces
    parallelism too much.
    
    In addition to better cache efficiency, this new feature can be used as a
    workaround for a bug affecting serialization_tag, causing IntegrityError when
    reindexing many new objects. If you have 2 recursive reindexations, one for a
    document and one for one of its lines, and if there are so many messages that
    grouping is split between these 2 messages, then you end up with 2 nodes
    indexing the same line in parallel: for some tables, the DELETE+INSERT pattern
    conflicts, since InnoDB does not take any lock when deleting a non-existent row.
    
    If you have many activities creating such documents, you can combine this with
    grouping and an appropriate priority to make sure that such a pair of messages
    won't be executed on different nodes, except maybe at the end (when there's no
    document to create anymore; then activity reexecution may be enough).
    For example:
    
      from Products.CMFActivity.ActivityTool import getCurrentNode
      portal.setPlacelessDefaultReindexParameters(
        activate_kw={'node': 'same', 'priority': priority},
        group_id=getCurrentNode())
    
    where `priority` is the same as that of the activity containing the above code,
    which can also use grouping without increasing the probability of IntegrityError.
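
    A hypothetical sketch of how such a document-creating activity could itself be
    spawned (the module, script and group method names are invented; only
    `priority` and the optional grouping matter here):

      portal.person_module.activate(
          priority=priority,
          group_method_id='person_module/PersonModule_createDocumentList',
          ).PersonModule_createDocument(document_id)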