Commit 7daaf0a5 authored by Vincent Pelletier

Work around poor UPDATE use of index.

The UPDATE query is expected to use the existing index on (processing_node,
priority, date) both for WHERE and ORDER BY, as EXPLAIN-ing the equivalent
SELECT suggests:

MariaDB [erp5]> explain select uid from message_queue WHERE processing_node=0 AND date <= '2013-06-06 22:22:49' ORDER BY priority, date LIMIT 1;
+------+-------------+---------------+------+----------------------------------------------------------+-------------------------------+---------+-------+-------+--------------------------+
| id   | select_type | table         | type | possible_keys                                            | key                           | key_len | ref   | rows  | Extra                    |
+------+-------------+---------------+------+----------------------------------------------------------+-------------------------------+---------+-------+-------+--------------------------+
|    1 | SIMPLE      | message_queue | ref  | processing_node_processing,processing_node_priority_date | processing_node_priority_date | 2       | const | 26622 | Using where; Using index |
+------+-------------+---------------+------+----------------------------------------------------------+-------------------------------+---------+-------+-------+--------------------------+

If it weren't using the index for ORDER BY, "Extra" would contain
"Using filesort".

Still, UPDATE behaves differently:

  # User@Host: user[user] @  [10.0.0.3]
  # Thread_id: 1635880  Schema: erp5  QC_hit: No
  # Query_time: 2.668405  Lock_time: 2.460698  Rows_sent: 0  Rows_examined: 49263
  # Full_scan: No  Full_join: No  Tmp_table: No  Tmp_table_on_disk: No
  # Filesort: Yes  Filesort_on_disk: No  Merge_passes: 0
  SET TIMESTAMP=1370557446;
  UPDATE
    message_queue
  SET
    processing_node=12
  WHERE
    processing_node=0
    AND DATE <= '2013-06-06 22:24:04'
    ORDER BY
    priority, DATE
  LIMIT 1;

So change the UPDATE..SELECT pattern into a SELECT FOR UPDATE..UPDATE
pattern, so SELECT's correct execution plan is used.
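
The two-step pattern can be sketched outside of ZSQL as follows. This is only an illustration, not the code from this commit: `reserve_messages` and its parameters are hypothetical names, the cursor is assumed to be a DB-API cursor using `%s` placeholders, and the caller is assumed to run both statements in one transaction and commit afterwards.

```python
def reserve_messages(cursor, table, processing_node, to_date, count):
    """Reserve up to `count` pending rows for `processing_node`.

    Step 1 locks candidate rows with SELECT ... FOR UPDATE, so the
    SELECT execution plan (index on processing_node, priority, date)
    is the one that does the filtering and ordering.
    Step 2 assigns the locked rows by uid, with no ORDER BY or LIMIT.
    """
    # Table names cannot be bound as parameters; `table` must be trusted.
    cursor.execute(
        "SELECT uid FROM " + table + " "
        "WHERE processing_node = 0 AND date <= %s "
        "ORDER BY priority, date LIMIT %s FOR UPDATE",
        (to_date, count),
    )
    uid_list = [row[0] for row in cursor.fetchall()]
    if uid_list:
        placeholders = ", ".join(["%s"] * len(uid_list))
        cursor.execute(
            "UPDATE " + table + " SET processing_node = %s "
            "WHERE uid IN (" + placeholders + ")",
            [processing_node] + uid_list,
        )
    # The caller is expected to commit, which releases the row locks.
    return uid_list
```

The UPDATE only touches rows already identified by uid, so its own choice of plan for ORDER BY no longer matters.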
parent 3b74878d
@@ -193,19 +193,36 @@ class SQLBase(Queue):
       This number is guaranted not to be exceeded.
       If None (or not given) no limit apply.
     """
-    select = activity_tool.SQLBase_selectReservedMessageList
-    if group_method_id:
-      reserve = limit - 1
-    else:
-      result = select(table=self.sql_table, count=limit,
-                      processing_node=processing_node)
-      reserve = limit - len(result)
-    if reserve:
-      activity_tool.SQLBase_reserveMessageList(table=self.sql_table,
-        count=reserve, processing_node=processing_node, to_date=date,
-        group_method_id=group_method_id)
-      result = select(table=self.sql_table,
-                      processing_node=processing_node, count=limit)
+    assert limit
+    # Do not check already-assigned messages when trying to reserve more
+    # activities, because in such case we will find one reserved activity.
+    result = activity_tool.SQLBase_selectReservedMessageList(
+      table=self.sql_table,
+      count=limit,
+      processing_node=processing_node,
+      group_method_id=group_method_id,
+    )
+    limit -= len(result)
+    if limit:
+      reservable = activity_tool.SQLBase_getReservableMessageList(
+        table=self.sql_table,
+        count=limit,
+        processing_node=processing_node,
+        to_date=date,
+        group_method_id=group_method_id,
+      )
+      if reservable:
+        activity_tool.SQLBase_reserveMessageList(
+          uid=[x.uid for x in reservable],
+          table=self.sql_table,
+          processing_node=processing_node,
+        )
+        # DC.ZRDB.Results.Results does not implement concatenation
+        # Implement an imperfect (but cheap) concatenation. Do not update
+        # __items__ nor _data_dictionary.
+        assert result._names == reservable._names, (result._names,
+          reservable._names)
+        result._data += reservable._data
     return result

   def makeMessageListAvailable(self, activity_tool, uid_list):
......
<dtml-comment>
title:
connection_id:cmf_activity_sql_connection
max_rows:0
max_cache:0
cache_time:0
class_name:
class_file:
</dtml-comment>
<params>table
processing_node
to_date
count
group_method_id
</params>
SELECT
*
FROM
<dtml-var table>
WHERE
processing_node=0
AND date <= <dtml-sqlvar to_date type="datetime">
<dtml-if expr="group_method_id is not None">
AND group_method_id = <dtml-sqlvar group_method_id type="string">
</dtml-if>
ORDER BY
<dtml-comment>
During normal operation, sorting by date (as 2nd criteria) is fairer
for users and reduce the probability to do the same work several times
(think of an object that is modified several times in a short period of time).
However, current implementation is not optimal when reindexing a whole site
with several mount points (to different ZEO servers), because modules may not
be processed in parallel. If you want to speed up ERP5Site_reindexAll,
consider:
- ordering by 'priority, RAND()' temporarily;
- or better, hack ERP5Site_reindexAll so that all reindex messages have
identical/random dates (hint: add optional parameter to Folder_reindexAll
and Folder_reindexObjectList in order to forward a date from
ERP5Site_reindexAll, e.g. current date would work if MySQL
shuffles enough lines with same priority/date).
- or even better, use NEO <http://www.neoppod.org/>
For higher concurrency than 10 or 20 nodes of activity, it might be required
to add a random start point to reduce the risk of MySQL locks.
</dtml-comment>
priority, date
LIMIT <dtml-sqlvar count type="int">
FOR UPDATE
@@ -9,40 +9,13 @@ class_file:
 </dtml-comment>
 <params>table
 processing_node
-to_date
-count
-group_method_id
+uid
 </params>
 UPDATE
 <dtml-var table>
 SET
 processing_node=<dtml-sqlvar processing_node type="int">
 WHERE
-processing_node=0
-AND date <= <dtml-sqlvar to_date type="datetime">
-<dtml-if expr="group_method_id is not None">
-AND group_method_id = <dtml-sqlvar group_method_id type="string">
-</dtml-if>
-ORDER BY
-<dtml-comment>
-During normal operation, sorting by date (as 2nd criteria) is fairer
-for users and reduce the probability to do the same work several times
-(think of an object that is modified several times in a short period of time).
-However, current implementation is not optimal when reindexing a whole site
-with several mount points (to different ZEO servers), because modules may not
-be processed in parallel. If you want to speed up ERP5Site_reindexAll,
-consider:
-- ordering by 'priority, RAND()' temporarily;
-- or better, hack ERP5Site_reindexAll so that all reindex messages have
-identical/random dates (hint: add optional parameter to Folder_reindexAll
-and Folder_reindexObjectList in order to forward a date from
-ERP5Site_reindexAll, e.g. current date would work if MySQL
-shuffles enough lines with same priority/date).
-- or even better, use NEO <http://www.neoppod.org/>
-For higher concurrency than 10 or 20 nodes of activity, it might be required
-to add a random start point to reduce the risk of MySQL locks.
-</dtml-comment>
-priority, date
-LIMIT <dtml-sqlvar count type="int">
+<dtml-sqltest uid type="int" multiple>
 <dtml-var sql_delimiter>
 COMMIT
@@ -9,6 +9,7 @@ class_file:
 </dtml-comment>
 <params>table
 processing_node
+group_method_id
 count</params>
 SELECT
 *
@@ -16,6 +17,9 @@ FROM
 <dtml-var table>
 WHERE
 processing_node = <dtml-sqlvar processing_node type="int">
+<dtml-if expr="group_method_id is not None">
+AND group_method_id = <dtml-sqlvar group_method_id type="string">
+</dtml-if>
 <dtml-if expr="count is not None">
 LIMIT <dtml-sqlvar count type="int">
 </dtml-if>