Commit 254a3712 authored by Boris Kocherov's avatar Boris Kocherov Committed by Vincent Pelletier

for SQLCatalog caching only transactional_cache_decorator is used

caching_instance_method not used
parent 15a947a6
......@@ -80,11 +80,9 @@ def noReadOnlyTransactionCache():
yield
try:
from Products.ERP5Type.Cache import \
readOnlyTransactionCache, caching_instance_method
readOnlyTransactionCache
except ImportError:
LOG('SQLCatalog', WARNING, 'Count not import caching_instance_method, expect slowness.')
def caching_instance_method(*args, **kw):
return lambda method: method
readOnlyTransactionCache = noReadOnlyTransactionCache
try:
......@@ -1092,10 +1090,7 @@ class Catalog(Folder,
pass
return tuple(result_list)
@caching_instance_method(id='SQLCatalog.getColumnIds',
cache_factory='erp5_content_long',
cache_id_generator=generateCatalogCacheId,
)
@transactional_cache_decorator('SQLCatalog.getColumnIds')
def _getColumnIds(self):
keys = set()
add_key = keys.add
......@@ -1121,10 +1116,6 @@ class Catalog(Folder,
security.declarePrivate('getColumnMap')
@transactional_cache_decorator('SQLCatalog.getColumnMap')
@caching_instance_method(id='SQLCatalog.getColumnMap',
cache_factory='erp5_content_long',
cache_id_generator=generateCatalogCacheId,
)
def getColumnMap(self):
"""
Calls the show column method and returns dictionnary of
......@@ -1139,10 +1130,6 @@ class Catalog(Folder,
security.declarePrivate('getResultColumnIds')
@transactional_cache_decorator('SQLCatalog.getResultColumnIds')
@caching_instance_method(id='SQLCatalog.getResultColumnIds',
cache_factory='erp5_content_long',
cache_id_generator=generateCatalogCacheId,
)
def getResultColumnIds(self):
"""
Calls the show column method and returns dictionnary of
......@@ -1156,10 +1143,6 @@ class Catalog(Folder,
security.declarePrivate('getSortColumnIds')
@transactional_cache_decorator('SQLCatalog.getSortColumnIds')
@caching_instance_method(id='SQLCatalog.getSortColumnIds',
cache_factory='erp5_content_long',
cache_id_generator=generateCatalogCacheId,
)
def getSortColumnIds(self):
"""
Calls the show column method and returns dictionnary of
......@@ -2014,10 +1997,7 @@ class Catalog(Folder,
"""
return self.sql_catalog_scriptable_keys
@caching_instance_method(id='SQLCatalog.getTableIndex',
cache_factory='erp5_content_long',
cache_id_generator=generateCatalogCacheId,
)
@transactional_cache_decorator('SQLCatalog.getTableIndex')
def _getTableIndex(self, table):
table_index = {}
method = getattr(self, self.sql_catalog_index, '')
......@@ -2549,10 +2529,6 @@ class Catalog(Folder,
return kw
@transactional_cache_decorator('SQLCatalog._getSearchKeyDict')
@caching_instance_method(id='SQLCatalog._getSearchKeyDict',
cache_factory='erp5_content_long',
cache_id_generator=generateCatalogCacheId,
)
def _getSearchKeyDict(self):
result = {}
search_key_column_dict = {
......
  • Hello,

    I'm not sure about the purpose of this patch, and it is not explained by the commit. There is surely good reason, good intentions.

    But it has been identified in very important project that this makes tons of additional queries to mariadb and it is clearly part of performance regression.

    Now we do update on another project, and we see regularly some "show columns from category" and similar queries in debug threads.

    I think it's great to have contributions, and I know when reviewing it's sometimes not at all obvious that some code will makes things slower.

    But, @vpelletier, why this was not immediately reverted once it has been identified has a clear performance issue ? May be it was wished to take time to do more measurements, but it's seems clear that doing about 15 (value coming from a project) additionals ping-pong with MariaDB for most transactions can be considered as a performance crime. I even heard about millions of such "show columns from xxx" done every day on a big project.

    So please revert it immediately, I can also revert tomorrow. We should apply same policy as patches introducing regression in unit test. We revert, then we take time to rework, this ensure keeping good quality in erp5@master. This also shows that it is urgent that we have back performance tests.

    Then for the purpose of this patch itself : 1 - please explain 2 - may be there is ways to use cache cookies. I mean, we might still continue using caching_method, but force all zopes to recalculate the value under some condition (for example when catalog configuration is changed) by adding a cache cookie in the key of the cache. 3 - if 2) doesn't help, we can help finding solutions. 4 - rework the patch

    /cc @vpelletier @bk @jm @tb

  • why this was not immediately reverted once it has been identified has a clear performance issue ?

    Has it been identified as a clear performance issue ? By who ?

    In my experience, although this was a visible and surpicious behaviour, working around it (with a very dirty patch not worthy of master) did not yield any measurable performance improvements, meaning its effect is at worse minor compared to the other issues we were experiencing.

    These major issues were about fulltext indexing (DELETE/UPDATE/SELECT causing deadlocks and general slowness, at least on MyISAM) whose volume increased a lot, and about new DISTINCT keyword added in some existing queries (to trigger a side effect, nothing to do with eliminating duplicate rows) which were then behaving very poorly on an older MariaDB. The latter is fixed in master, and for the former there is not really anything to fix in master (@kazuhiko can describe these much better than me).

    Maybe you do trigger a case where this issue is actually the performance-limiting factor. All I am saying that if you believe I did already know of it as a proven performance issue, it is completely false. And as a consequence, the task of improving this was on a low priority on my list.

    1 - please explain

    I should also have asked for a better commit message, as indeed this one does not reflect the purpose but only the execution. Sorry.

    The reason is that caches are bad[1]: they get out of sync with actual data.

    The most visible part of this cache was when Zope was started without MySQL available: all queries would execute as SELECT ... WHERE 1. And nowadays, thanks to catalog rejecting unknown columns instead of silently dropping th conditions, these queries would just raise.

    Schema change is totally outside of the reach of ERP5 (it may change when [dis]connecting, it may change because someone ran "alter table"...). The question is not about how normal it is to change schema outside of ERP5 control, the question is how bad the effect is. And the effect is that ERP5 can immediately emit errors and prevent any (person_module - based) authentication, which I think is not acceptable, even as the result of frowned-upon actions (emergency rescue actions are rarely based on good development practices).

    So:

    • I prefer to not revert this commit (at least not on public master), as it goes in the right direction: make ERP5 work (or at least recover gracefuly) when MySQL does not work.

      But I won't prevent you from reverting if you insist.

    • I totally agree that 15 queries just to know the table schema is silly. Especially nowadays as there is an information_schema database containing pseudo-tables containing all the data we need in a way which can be trivially querried. I have worked today in emergency mode on such patch, and think I have something reasonable.

      I should have requested this to be done before applying the patch. Sorry.

    • I prefer to have clear measures that this issue alone causes performance problems (especially in the single-query version) before adding any cache.

      I am deeply sorry I could not get around to testing this before you hit performance issues, be they related or not to this.

    [1] Remember: caches are among the hardest things to get right in computer science. They look simple, but cause persistent-but-also-transcient bugs where past usage influences current state (aka, total debugging nightmare), and they ignore (or require duplicate) security checks. My opinion is that there should to be a very serious reason, weighted by careful measurements and verified by peer-review and/or tests before a cache can be added.

    Edited by Vincent Pelletier
  • Thanks for your detailed answer. And from here, I was not well informed that performance drop you had were mostly coming from other things, sorry for this. So I understand better now why you didn't pushed quickly a fix.

    I totally agree with the idea to improve ERP5 behavior when "MySQL does not work". In particular, not working authentication even though you have account in acl_users is non-sense.

    Though using cache could be very useful for performance. It's good when we can avoid a cache by very fast algorithm. But here a cache must be faster than a ping-pong with MariaDB, or this should means the cache is not efficient. A single query like you suggest would surely help, but we should be drastic. If it comes with 0.1% performance loss, it's not good, something should be improved to compensate.

    Here with the many queries, for getColumnMap I get following values, on a server that is not crazy fast (time in seconds) :

    2016-07-13 08:25:07,337 INFO PERF getColumnMap 0.214618
    2016-07-13 08:25:07,702 INFO PERF getColumnMap 0.23792
    2016-07-13 08:25:08,034 INFO PERF getColumnMap 0.240443
    2016-07-13 08:25:08,599 INFO PERF getColumnMap 0.246215
    2016-07-13 08:25:09,246 INFO PERF getColumnMap 0.421924

    (it always takes more than 0.2 seconds).

    If I display a Person object, this server display it in 0.622 seconds (and getColumnMap is called in this transaction). So 30% of the time is spent doing the 15 round-trips to MariaDB. So this clearly show that we have huge regression here.

    So I will revert today.

    Usage of information_schema to have information in a single query is very good. But even with this, I believe cache is essential. getColumnMap is used for almost all transactions, and an additional round-trip must have non-negligible effect. So when update of schema is done by ERP5, it should be easy to force recalculation of getColumnMap through cache cookie. Then for cases outside of the control of ERP5, may be there is way to catch exceptions, increase cache cookie to force refresh of getColumnMap, and then relaunch query (unit test needed for this).

    Edited by Sebastien Robin
  • I'm surprised by these figures. Granted, I'm used to a very fast server, but in my experience the round-trip is around 1ms rather than 0.25s, so the total wasted time becomes then somewhat acceptable (as in: there are many other things we can improve elsewhere).

    I'll get in touch with you (after a few more customer requests) to make you test my single-query patch, to know the improvement magnitude.

  • Finally after checking, Vincent had some patch to get schema in a single sql query. It just need some work to check tests. I'm looking at this for having patch applied Friday in last case.

Markdown is supported
0%
or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment