Commit 6b2e0b92 authored by Julien Muchembled's avatar Julien Muchembled

searchAndActivate: rework 'activity_count' parameter

- It's not deprecated anymore.
- When activities are grouped at CMFActivity level, it specifies a number of
  activities that are generated at a time instead of a number of activity
  groups.
- The special None value means no limit.
parent c42c1d38
Pipeline #31510 failed with stage
in 0 seconds
...@@ -1349,26 +1349,33 @@ class CatalogTool (UniqueObject, ZCatalog, CMFCoreCatalogTool, ActiveObject): ...@@ -1349,26 +1349,33 @@ class CatalogTool (UniqueObject, ZCatalog, CMFCoreCatalogTool, ActiveObject):
document returned by catalog) as first positional argument. document returned by catalog) as first positional argument.
Use 'packet_size' parameter to limit the size of each group (default: 30). Use 'packet_size' parameter to limit the size of each group (default: 30).
'activity_count' parameter is deprecated. The maximum number of activities that are generated by this method
Its value should be hardcoded because CMFActivity can now handle many (before activating itself) can be tweaked with 'activity_count'.
activities efficiently and any tweak should benefit to everyone. Except as a way to limit the number of processing nodes, this should be
However, there are still rare cases where one want to limit the number rarely used because CMFActivity can handle many activities efficiently
of processing nodes, to minimize latency of high-priority activities. and we should rather have good default values.
The special None value means no limit, which can be useful when a catalog
search is so slow and doesn't return too many results.
'packet_size' is deprecated when used without 'select_method_id'.
""" """
catalog_kw = kw.copy() catalog_kw = kw.copy()
select_method_id = catalog_kw.pop('select_method_id', None) select_method_id = catalog_kw.pop('select_method_id', None)
limit = catalog_kw.pop('activity_count', 0)
if select_method_id: if select_method_id:
packet_size = catalog_kw.pop('packet_size', 30) packet_size = catalog_kw.pop('packet_size', 30)
limit = packet_size * catalog_kw.pop('activity_count', 100) if limit is not None:
limit = packet_size * (limit or 100)
elif 'packet_size' in catalog_kw: # BBB elif 'packet_size' in catalog_kw: # BBB
assert not group_kw, (kw, group_kw) assert not group_kw, (kw, group_kw)
packet_size = catalog_kw.pop('packet_size') packet_size = catalog_kw.pop('packet_size')
group_method_cost = 1. / packet_size group_method_cost = 1. / packet_size
limit = packet_size * catalog_kw.pop('activity_count', 100) if limit == 0:
limit = 100 * packet_size
else: else:
group_method_cost = group_kw.get('group_method_cost', .034) # 30 objects group_method_cost = group_kw.get('group_method_cost', .034) # 30 objects
limit = catalog_kw.pop('activity_count', None) or \ if limit == 0:
100 * int(ceil(1 / group_method_cost)) limit = 100 * int(ceil(1 / group_method_cost))
if min_uid: if min_uid:
catalog_kw['min_uid'] = SimpleQuery(uid=min_uid, catalog_kw['min_uid'] = SimpleQuery(uid=min_uid,
comparison_operator='>') comparison_operator='>')
......
  • About no-limit behaviour: what is then the added value of calling searchAndActivate ?

    In other word, why not then directly doing:

    for row in portal_catalog(...):
      row.activate(...).doFoo(...)

    This allows getting rid of searchAndActivate's expensive sort on uid, so it should get better performance, and this code seems simple enough that it should not require wrapping it in a function.

    The only benefit I can think of when calling into code which already wraps searchAndActive and sometimes runs fast queries and/or with large result sets, and sometimes runs slow queries with small result sets. Which seems rather niche.

  • Oops how I missed the useless sort ? Anyway, at least in my case, it didn't slow down:

    • with sort: Query_time: 10590.548910 Lock_time: 0.000064 Rows_sent: 97636 Rows_examined: 2383154
    • without: Query_time: 10622.009253 Lock_time: 0.000099 Rows_sent: 97635 Rows_examined: 2381524

    BTW, the query is for a migration script that will be only run twice (1 to simulate & 1 for real; it's not worth optimizing):

      self.portal_catalog._searchAndActivate(script_id,
        method_kw=dict(...),
        activate_kw=dict(active_process=active_process, tag=script_id, priority=11),
        portal_type='Simulation Movement',
        strict_resource_uid=[a list of 6 service uids],
        group_by='parent_uid')

    The explain by MariaDB 10.1:

    +------+-------------+----------------------------------------+--------+---------------------+------------+---------+-------------------------------------------------+---------+-----------------------------------------------------------+
    | id   | select_type | table                                  | type   | possible_keys       | key        | key_len | ref                                             | rows    | Extra                                                     |
    +------+-------------+----------------------------------------+--------+---------------------+------------+---------+-------------------------------------------------+---------+-----------------------------------------------------------+
    |    1 | SIMPLE      | related_strict_resource_uid_1_category | range  | PRIMARY,Membership  | Membership | 16      | NULL                                            | 2350894 | Using where; Using index; Using temporary; Using filesort |
    |    1 | SIMPLE      | catalog                                | eq_ref | PRIMARY,Portal Type | PRIMARY    | 8       | erp5.related_strict_resource_uid_1_category.uid |       1 | Using where                                               |
    +------+-------------+----------------------------------------+--------+---------------------+------------+---------+-------------------------------------------------+---------+-----------------------------------------------------------+

    About no-limit behaviour: what is then the added value of calling searchAndActivate ?

    Indeed, not much. In addition to the reason you gave:

    • it sets a default grouping
    • it's not trivial anymore when select_method_id is used
    • in my use case, the code was written from the beginning with _searchAndActivate (and no perf issue on a smaller instance) and I didn't want to rewrite all the above code
  • at least in my case, it didn't slow down:

    Oh, indeed, when and there is no limit, the sort happens last, and with such small(-ish) result set the sort should be fast.

    In addition to the reason you gave:

    All good points, thanks. So this change looks good to me, and thank you for the follow-up change disabling the sort.

Markdown is supported
0%
or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment