Commit f554c05a authored by Vincent Pelletier

ERP5Type,CopySupport: Allow immediately indexing new subobjects

"immediate", in this context, means "during the same transaction".
Normally, indexation always happens in a transaction different from the
one which did the indexation-inducing action (modifying a property,
creating a document, explicitly requesting indexation). This is because
SQL and object databases do not have the same approach to conflict
resolution: in SQL, the last one wins, and ordering happens based on locks.
In ZODB, conflict resolution is stricter in that to modify an object
a transaction must have started with the same revision of that object as
the one which is current at the time it is trying to commit. As both
databases must be kept consistent, one interpretation must be enforced
onto the other: the ZODB interpretation. So delayed indexation, plus
careful activity sequencing (serialization_tag) is required.
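
As an illustration of the ZODB-side rule described above, here is a minimal sketch of revision-based conflict detection. It is a toy in-memory model, not ZODB code: `ToyObjectStore`, `ConflictError` and the transaction dictionaries are hypothetical names introduced only for this example.

```python
class ConflictError(Exception):
    """Raised when an object changed since the transaction started."""


class ToyObjectStore(object):
    """Hypothetical in-memory store with ZODB-like revision checks."""

    def __init__(self):
        self._data = {}  # oid -> (revision, value)

    def begin(self):
        # A transaction remembers the revision of every object it loads.
        return {'seen': {}, 'changed': {}}

    def load(self, txn, oid):
        revision, value = self._data.get(oid, (0, None))
        txn['seen'][oid] = revision
        return value

    def store(self, txn, oid, value):
        txn['changed'][oid] = value

    def commit(self, txn):
        # Refuse to commit if any modified object has a different revision
        # than the one this transaction started from.
        for oid in txn['changed']:
            current_revision = self._data.get(oid, (0, None))[0]
            if txn['seen'].get(oid, 0) != current_revision:
                raise ConflictError(oid)
        for oid, value in txn['changed'].items():
            revision = self._data.get(oid, (0, None))[0]
            self._data[oid] = (revision + 1, value)


store = ToyObjectStore()
first = store.begin()
second = store.begin()
store.load(first, 'doc')
store.load(second, 'doc')
store.store(first, 'doc', 'from first')
store.store(second, 'doc', 'from second')
store.commit(first)
try:
    store.commit(second)
    second_committed = True
except ConflictError:
    # Unlike SQL's "last one wins", the second writer is rejected.
    second_committed = False
```

In this model the second transaction cannot commit because it started from a revision which is no longer current, which is the interpretation that delayed indexation enforces on the SQL side.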

But in very specific cases, it is actually safe to index a document
immediately: when creating that document. This is because the only
conflict which may then happen is if two transactions produce the same
path, and ZODB will prevent the transaction from committing altogether,
preventing any conflict resolution from happening. Pasting a document
falls into this category as well, for the same reason.

In turn, this feature removes the need to call "immediate" reindexation
methods, which allows restricting their availability later, preventing
API misuse and compromised catalog consistency.

Two variants of "immediate" indexation are available:
- internal to the method which creates the document under consideration
- delayed to a caller-controlled, but mandatory, point later in the current
  transaction, by using a context manager (in the Python sense) object.
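
The two variants listed above can be sketched without any Zope dependency. This is a simplified model under stated assumptions: `ReindexContext` stands in for `ImmediateReindexContextManager` (minus security declarations), while `Document`, `new_content` and the `catalog` list are hypothetical helpers which only record the state each document had when it was indexed.

```python
class ReindexContext(object):
    """Stand-in for ImmediateReindexContextManager (no Zope security)."""

    def __init__(self):
        self._stack = []

    def __enter__(self):
        self._stack.append([])
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        for method in self._stack.pop():
            method()

    def append(self, method):
        if not self._stack:
            raise RuntimeError('context must be entered before use')
        self._stack[-1].append(method)


catalog = []  # hypothetical catalog: (id, state) recorded at indexation time


class Document(object):
    def __init__(self, id):
        self.id = id
        self.state = 'draft'

    def confirm(self):
        self.state = 'confirmed'

    def immediateReindexObject(self):
        catalog.append((self.id, self.state))


def new_content(id, immediate_reindex=False):
    """Hypothetical creation method mirroring the dispatch in this commit."""
    document = Document(id)
    method = document.immediateReindexObject
    if isinstance(immediate_reindex, ReindexContext):
        immediate_reindex.append(method)  # variant 2: index on context exit
    elif immediate_reindex:
        method()  # variant 1: index inside the creating method
    return document


# Variant 1: indexed as soon as it is created, so still in "draft" state.
new_content('a', immediate_reindex=True).confirm()

# Variant 2: indexed when leaving the "with" block, after confirm().
with ReindexContext() as reindex_context:
    new_content('b', immediate_reindex=reindex_context).confirm()
```

With the first variant the catalog sees the document as it was at creation time, while the second lets the caller finish its changes before the first indexation.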
parent 35c38086
......@@ -12,6 +12,7 @@
#
##############################################################################
from functools import partial
from OFS import Moniker
from zExceptions import BadRequest
from AccessControl import ClassSecurityInfo, getSecurityManager
......@@ -27,6 +28,7 @@ from Products.ERP5Type import Permissions
from Acquisition import aq_base, aq_inner, aq_parent
from Products.ERP5Type.Accessor.Constant import PropertyGetter as ConstantGetter
from Products.ERP5Type.Globals import PersistentMapping, MessageDialog
from Products.ERP5Type.ImmediateReindexContextManager import ImmediateReindexContextManager
from Products.ERP5Type.Utils import get_request
from Products.ERP5Type.Message import translateString
from Products.CMFCore.WorkflowCore import WorkflowException
......@@ -422,16 +424,17 @@ class CopyContainer:
path_item_list=previous_path,
new_id=self.id)
def _duplicate(self, cp, reindex_kw=None):
def _duplicate(self, cp, reindex_kw=None, immediate_reindex=False):
_, result = self.__duplicate(
cp,
duplicate=True,
is_indexable=None,
reindex_kw=reindex_kw,
immediate_reindex=immediate_reindex,
)
return result
def __duplicate(self, cp, duplicate, is_indexable, reindex_kw):
def __duplicate(self, cp, duplicate, is_indexable, reindex_kw, immediate_reindex):
try:
cp = _cb_decode(cp)
except:
......@@ -506,6 +509,16 @@ class CopyContainer:
if not set_owner:
# try to make ownership implicit if possible
new_ob.manage_changeOwnershipType(explicit=0)
method = new_ob.immediateReindexObject
if reindex_kw is not None:
method = partial(method, **reindex_kw)
if isinstance(immediate_reindex, ImmediateReindexContextManager):
immediate_reindex.append(method)
elif immediate_reindex:
# Immediately reindexing document that we just pasted is safe, as no
# other transaction can by definition see it, so there cannot be a race
# condition leading to stale catalog content.
method()
return op, result
def _postDuplicate(self):
......@@ -534,7 +547,7 @@ class CopyContainer:
self.isIndexable = ConstantGetter('isIndexable', value=False)
self.__recurse('_setNonIndexable')
def manage_pasteObjects(self, cb_copy_data=None, is_indexable=None, reindex_kw=None, REQUEST=None):
def manage_pasteObjects(self, cb_copy_data=None, is_indexable=None, reindex_kw=None, immediate_reindex=False, REQUEST=None):
"""Paste previously copied objects into the current object.
If calling manage_pasteObjects from python code, pass the result of a
......@@ -543,6 +556,18 @@ class CopyContainer:
If is_indexable is False, we will avoid indexing the pasted objects and
subobjects
immediate_reindex (bool)
Immediately (=during current transaction) reindex created document, so
it is possible to find it in catalog before transaction ends.
Note: this does not apply to subobjects which may be created during
pasting. Only the topmost object will be immediately reindexed. Any
subobject will be reindexed later, using activities.
If a ImmediateReindexContextManager instance is given, a context (in
python sense) must have been entered with it, and indexation will
occur when that context is exited, allowing further changes before
first indexation (ex: workflow state change, property change).
"""
cp = None
if cb_copy_data is not None:
......@@ -556,6 +581,7 @@ class CopyContainer:
duplicate=False,
is_indexable=is_indexable,
reindex_kw=reindex_kw,
immediate_reindex=immediate_reindex,
)
if REQUEST is None:
return result
......
......@@ -21,6 +21,7 @@
#
##############################################################################
from functools import partial
import zope.interface
from Products.ERP5Type.Globals import InitializeClass
from AccessControl import ClassSecurityInfo, getSecurityManager
......@@ -34,6 +35,7 @@ from Products.ERP5Type import interfaces, Constraint, Permissions, PropertySheet
from Products.ERP5Type.Base import getClassPropertyList
from Products.ERP5Type.UnrestrictedMethod import UnrestrictedMethod
from Products.ERP5Type.Utils import deprecated, createExpressionContext
from Products.ERP5Type.ImmediateReindexContextManager import ImmediateReindexContextManager
from Products.ERP5Type.XMLObject import XMLObject
from Products.ERP5Type.Cache import CachingMethod
from Products.ERP5Type.dynamic.accessor_holder import getPropertySheetValueList, \
......@@ -42,6 +44,7 @@ from Products.ERP5Type.dynamic.accessor_holder import getPropertySheetValueList,
ERP5TYPE_SECURITY_GROUP_ID_GENERATION_SCRIPT = 'ERP5Type_asSecurityGroupId'
from TranslationProviderBase import TranslationProviderBase
from Products.ERP5Type.Accessor.Constant import PropertyGetter as ConstantGetter
from Products.ERP5Type.Accessor.Translation import TRANSLATION_DOMAIN_CONTENT_TRANSLATION
from zLOG import LOG, ERROR
from Products.CMFCore.exceptions import zExceptions_Unauthorized
......@@ -359,11 +362,25 @@ class ERP5TypeInformation(XMLObject,
def constructInstance(self, container, id, created_by_builder=0,
temp_object=0, compute_local_role=None,
notify_workflow=True, is_indexable=None,
activate_kw=None, reindex_kw=None, **kw):
activate_kw=None, reindex_kw=None,
immediate_reindex=False, **kw):
"""
Build a "bare" instance of the appropriate type in
'container', using 'id' as its id.
Call the init_script for the portal_type.
immediate_reindex (bool, ImmediateReindexContextManager)
Immediately (=during current transaction) reindex created document, so
it is possible to find it in catalog before transaction ends.
Note: this does not apply to subobjects which may be created during
document construction. Only the topmost object will be immediately
reindexed. Any subobject will be reindexed later, using activities.
If a ImmediateReindexContextManager instance is given, a context (in
python sense) must have been entered with it, and indexation will
occur when that context is exited, allowing further changes before
first indexation (ex: workflow state change, property change).
Returns the object.
"""
if compute_local_role is None:
......@@ -426,6 +443,17 @@ class ERP5TypeInformation(XMLObject,
if kw:
ob._edit(force_update=1, **kw)
if not temp_object:
method = ob.immediateReindexObject
if reindex_kw is not None:
method = partial(method, **reindex_kw)
if isinstance(immediate_reindex, ImmediateReindexContextManager):
immediate_reindex.append(method)
elif immediate_reindex:
# Immediately reindexing document that we just created is safe, as no
# other transaction can by definition see it, so there cannot be a race
# condition leading to stale catalog content.
method()
return ob
def _getPropertyHolder(self):
......
#############################################################################
#
# Copyright (c) 2018 Nexedi SA and Contributors. All Rights Reserved.
# Vincent Pelletier <vincent@nexedi.com>
#
# WARNING: This program as such is intended to be used by professional
# programmers who take the whole responsability of assessing all potential
# consequences resulting from its eventual inadequacies and bugs
# End users who are looking for a ready-to-use solution with commercial
# garantees and support are strongly adviced to contract a Free Software
# Service Company
#
# This program is Free Software; you can redistribute it and/or
# modify it under the terms of the GNU General Public License
# as published by the Free Software Foundation; either version 2
# of the License, or (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software
# Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
#
##############################################################################
from AccessControl import ClassSecurityInfo
from Products.ERP5Type.Globals import InitializeClass

class ImmediateReindexContextManager(object):
  """
  Immediately reindex given object(s) upon leaving context.

  Pass an instance of this class as "immediate_reindex" argument on methods
  having one to delay indexation a bit (ex: to let you change object state,
  change some properties).

  Example usage:
    from Products.ERP5Type.ImmediateReindexContextManager import ImmediateReindexContextManager
    with ImmediateReindexContextManager() as immediate_reindex_context_manager:
      document = context.newContent(
        immediate_reindex=immediate_reindex_context_manager,
        ...
      )
      document.confirm()
    # document will be indexed as already confirmed
  """
  security = ClassSecurityInfo()

  def __init__(self):
    # Detect and tolerate (but track) context nesting.
    self.__context_stack = []

  def __enter__(self):
    self.__context_stack.append([])
    return self

  def __exit__(self, exc_type, exc_value, traceback):
    for method in self.__context_stack.pop():
      method()

  # Note: if you want to reuse this class, pay extra attention to security.
  # It is critical that:
  # - the class can be imported and instantiated from restricted python
  # - "append" method cannot be called from anywhere but products
  # - ImmediateReindexContextManager stays a class on its own (even if it
  #   should become an empty subclass), only used for indexation-related
  #   methods upon document creation.
  # Otherwise, misuse will happen.
  security.declarePrivate('append')
  def append(self, method):
    """
    Queue indexation method for execution upon context exit.

    May only be called by places which just bound document into its container,
    like constructInstance or object paste handler.
    DO NOT CALL THIS ANYWHERE ELSE !
    """
    try:
      self.__context_stack[-1].append(method)
    except IndexError:
      raise RuntimeError(
        'ImmediateReindexContextManager must be entered '
        '(see "with" statement) before it can be used',
      )

InitializeClass(ImmediateReindexContextManager)
......@@ -175,6 +175,7 @@ allow_module('Products.ERP5Type.Error')
allow_module('Products.ERP5Type.Errors')
allow_module('Products.ERP5Type.JSONEncoder')
allow_module('Products.ERP5Type.Log')
allow_module('Products.ERP5Type.ImmediateReindexContextManager')
ModuleSecurityInfo('Products.ERP5Type.JSON').declarePublic('dumps', 'loads')
ModuleSecurityInfo('Products.ERP5Type.Constraint').declarePublic('PropertyTypeValidity')
ModuleSecurityInfo('Products.ERP5Type.DiffUtils').declarePublic('DiffFile')
......
  • I am afraid that the new API is too rigid. Currently, I'm stuck because I'd like to immediately reindex a newly-created delivery at the end of the post-build script. The only way to use the new API is to subclass SimulatedDeliveryBuilder and it's not as simple as overriding the _createDelivery because I don't want the immediate reindex to happen so early. For performance reasons, I also don't reindex immediately when it's useless.

    If we decide to keep the API, the best thing to do is to add a TALES field on the builder that is evaluated only for newly-created delivery:

    • return None to not reindex immediately
    • reindex immediately otherwise and call .serialize() on the returned value (I need that too and I expect it's always required in such case)

    Then for my use case, it would be:

    context.getSpecialiseValue(portal_type="Sale Trade Condition") if context.getSimulationState() == 'planned' else None

    (this is actually strongly related to the delivery select method, which searches for appropriate existing invoices to extend, the STC being an easy and quite specific criterion)

    But I wonder what's the issue in doing something much simpler:

    • drop the new API, i.e. revert
    • add a check in immediateReindexObject that the uid is not in the catalog

    /cc @jerome

  • add a TALES field on the builder that [...]

    but such field is likely to be incomprehensible. And if ever it does not cover all use cases, we'd end up with more complex code for nothing.

    I could also convert the script to an external method, which shows that the new API does not prevent misuse.

    • drop the new API, i.e. revert

    What is the utility of removing this API, breaking any code which uses it ?

    • add a check in immediateReindexObject that the uid is not in the catalog

    Isn't there a race-condition (presence check will likely be under transaction isolation) ?

    • drop the new API, i.e. revert

    What is the utility of removing this API, breaking any code which uses it ?

    Less code to maintain and devs who need the discussed feature would not waste time with it.

    Making the immediate reindex private already broke existing code, but it was quite easy to fix it. Doing the contrary would be even easier.

    • add a check in immediateReindexObject that the uid is not in the catalog

    Isn't there a race-condition (presence check will likely be under transaction isolation) ?

    Ah right. I have 2 other ideas:

    • simply check that _p_serial is z64
    • when a document is created, add it to a transactional variable (a list looks best, as it's unlikely to be read), and later immediateReindexObject checks that the document is in the transactional variable
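
    The two ideas above could look roughly like the sketch below. Everything in it is a stand-in: `FakeDocument`, `transactional_variable`, `register_new_document` and `immediate_reindex` are hypothetical names, and `Z64` is assumed to equal `ZODB.utils.z64`, the serial a persistent object is expected to carry until its first commit.

```python
Z64 = b'\0' * 8  # assumed equal to ZODB.utils.z64 (never-committed serial)


class FakeDocument(object):
    """Hypothetical stand-in for a persistent document."""

    def __init__(self, path, serial=Z64):
        self.path = path
        self._p_serial = serial
        self.indexed = False


# Stand-in for a per-transaction mapping, emptied at transaction boundaries.
transactional_variable = {}


def register_new_document(document):
    """Second idea: remember documents created during this transaction."""
    transactional_variable.setdefault('new_document_list', []).append(document)


def is_new_by_serial(document):
    """First idea: a never-committed document still has serial z64."""
    return document._p_serial == Z64


def is_new_by_registry(document):
    new_document_list = transactional_variable.get('new_document_list', ())
    return any(document is candidate for candidate in new_document_list)


def immediate_reindex(document, is_new=is_new_by_serial):
    """Index right away, but only documents created in this transaction."""
    if not is_new(document):
        raise RuntimeError('unsafe immediate reindex of %s' % document.path)
    document.indexed = True


fresh = FakeDocument('sale_order_module/1')
immediate_reindex(fresh)  # accepted: never committed

committed = FakeDocument('sale_order_module/2', serial=b'\0' * 7 + b'\x01')
try:
    immediate_reindex(committed)
    rejected = False
except RuntimeError:
    rejected = True
```

    Either check runs entirely on in-memory state of the current transaction, so unlike a catalog presence check it does not depend on SQL transaction isolation.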
  • Making the immediate reindex private already broke existing code

    Besides the fact that this method allowed casual use to break the catalog (which I believe is not being discussed here), it also allowed casual use to cause poor performance:

    • it does not allow grouping, so indexation SQL overhead is maximal and throughput is minimal.
    • it exposes the current transaction to SQL locks (ex: indexation will trigger category indexation ZSQLMethod, which deletes rows before inserting them back, which puts locks), so its performance will be affected by current indexation load. This is fine when the current transaction is an activity, but is bad for regular transactions.

    So using such feature is still bad practice, and I see nothing wrong in putting barriers against its use.

    If it were just me, I would just drop this method entirely and declare all code using it as a design mistake of some sort. Ex: why do we use a data model in which a catalog lookup is required in order to find the latest post in a given forum thread, knowing that we will get back to the thread immediately after posting ?

    Allowing some cases of immediate indexation is a compromise, and it only covers cases which existed in the code back when I implemented this change.

  • Besides the fact that this method allowed casual use to break the catalog (which I believe is not being discussed here)

    No. That's exactly what I am discussing: make immediateReindexObject safe.

    casual use to cause poor performance

    Many things can cause poor performance, starting with inefficient algorithms, and we don't try to prevent that because it's not possible. The poor performance of immediateReindexObject is relative: in some cases, that's the best compromise.

    My new use case for immediateReindexObject is for a builder and the cost of immediateReindexObject is really negligible and overall, we'll have a huge performance gain for invoice building.

    If it were just me, I would just drop this method entirely and declare all code using it as a design mistake of some sort.

    I guess you include the new API in all code using it ?

    Ex: why do we use a data model in which a catalog lookup is required in order to find the latest post in a given forum thread, knowing that we will get back to the thread immediately after posting ?

    I'm really open to alternatives but if we can't find any, it's not constructive to reject improvements to immediate reindexing.

  • No. That's exactly what I am discussing: make immediateReindexObject safe.

    Except that by making it private it prevents casual use altogether.

    But I get your point: independently from the discussion about method visibility, it is better to make it always safe.

    Many things can cause poor performance, [...]

    Did you notice my use of "casual" ? It is a very important part of my description.

    I guess you include the new API in all code using it ?

    How would any code call a non-existent method ? So yes, any code, including whatever you mean by "new API" (maybe the code added by the present 4-years-old commit ?).

    it's not constructive to reject improvements to immediate reindexing.

    Have I rejected anything ? I am providing context to what led to the present change, to explain all the aspects this is trying to cover as they may not be obvious from the implementation.

    4 posts in, I still do not have the beginning of an idea about what makes your use-case different. All I can tell is that you are very unhappy with the current state of the code. You then provided some snippets about how you would solve the non-described issue (I have no idea how they fit in the picture), and explained how you want to remove code introduced by the present commit.

    So, if you are actually looking for ideas, could you describe (to someone who is not familiar with simulation call tree - I probably do not need to know about specific script names, just about transaction chronology and areas of responsibility):

    • Where is/are the document(s) to index being created ?
    • Where is the first point in that transaction that indexation is possible ? (ex: after a state change)
    • Where is the last point in that transaction that indexation is possible ? (ex: before entering another area of responsibility, which contains some catalog lookup)
    • How do these points relate to each other in terms of call stack ?

    This should at least illustrate why the immediate_reindex argument of newContent cannot be used, and maybe give me some ideas.

  • The call tree in my new use case is:

    build
      ...
        _processDeliveryGroup
          _createDelivery
      callAfterBuildingScript

    All the above methods belong to BuilderMixin.

    It's in the script called by callAfterBuildingScript (the script id is configured in a field of the builder) that the invoice (if created) is moved to 'planned' state, and the immediate reindex has to be done after that. In some cases, the script moves the created invoice to a different state and there is no need for an immediate reindex. In some other cases, no invoice is created: the builder extends an already planned & indexed invoice.

    The invoice is created in _createDelivery.

    We currently don't have a custom class for our builder: we use SimulatedDeliveryBuilder.

  • Thank you, this seems very clear to me.

    Would it make sense to provide a way for whoever chooses/implements the after-building script to also customise more invoice creation arguments ? For example to provide stuff like activate_kw={'node': ..., 'priority': ...} [1].

    If it is, then maybe immediate_reindex could be a specific case of the more general feature of "give control over newContent arguments". Then, maybe control can be handed over to a script chosen in the same way as the current post-building script, which would prepare arguments for newContent, which could include something to pass as the immediate_reindex argument.

    Some pseudo-code as illustration:

    # ERP5Site_customiseBuild(build_callback)
    with ImmediateReindexContextManager() as foo:
      ... = build_callback(invoice_new_content_kw={'immediate_reindex': foo, ...}, ...)
      # here, retrieve everything the after-building logic needs: maybe it is returned by "build_callback", maybe "build_callback" takes a mutable argument where it stores created invoice and whatever else...
      # then, carry on with code taken from the after-building script:
      ...
      invoice_value.plan()
    # here the invoice is indexed and can be found, and the after-building logic carries on
    portal_catalog(...)
    ...

    Then the call-stack would become something like:

    build
      ...
        ERP5Site_customiseBuild
          ...
            _processDeliveryGroup
              _createDelivery
        [back in ERP5Site_customiseBuild for post-building duties]

    Of course there are many possible ways in which this would be unacceptable. Maybe the after-building script contains some nested scopes/loops/... Maybe we really do not want to interrupt the build call chain. Maybe it would take far too much time to implement such a change. Maybe it would be an unacceptable API change.

    Also, if it makes no sense to provide a way to customise more newContent arguments besides immediate_reindex or if there are already ways to control these arguments, then this approach may be overkill.

    [1] Just to be extra-clear: I am of course not suggesting that the indexation activity spawned by newContent would do anything good for the after-building script's needs, it just happens to be the kind of activity spawned by newContent and activity arguments are the kind of things an instance's admin may want some control on in order to scale better.

  • I forgot to describe a usual call stack of build in the call tree (case of local build):

    CMFActivity (activated by Delivery.localBuild)
      Delivery._localBuild
        BusinessLink.build
          BuilderMixin.build
            ...

    And there is no API to customize there, for example to insert a context manager. That would become possible with your suggestion.

    Also, if it makes no sense to provide a way to customise more newContent arguments besides immediate_reindex or if there are already ways to control these arguments, then this approach may be overkill.

    BusinessLink.build accepts an activate_kw parameter that is forwarded to all newContent (delivery, line & cell) and it's trivial to add it to localBuild & _localBuild.

    Such an API change for something you don't want to encourage looks crazy to me.


    For my use case, I think there's actually a way to go faster than with immediate reindexation. Currently, the builder is called by activity whenever it needs invoices to be indexed, so I could add activity dependencies to prevent concurrent builds (and the dependency would be named after the involved STC, instead of calling serialize on it).

    But I don't want to optimize like that because that would make the code more complex & fragile for nothing. "fragile" because it's easy to break code in the worst possible way (race condition) with a missed/wrong dependency.

    And in the case of customer projects, why pay more in initial dev & maintenance when it's already fast enough ?


    I think more and more that it's wrong to put any barrier against immediate reindex. It's actually very useful in customer projects because it's a quick and robust way to avoid race conditions in places where the performance cost is negligible.

    For many years, I didn't know that it is safe for new objects and I did something crazy (same project, I obfuscated parts with [...]):

        while 1:
          r = portal.sale_order_module.searchFolder(**kw)
          if len(r) == 1:
            path = r[0].path
          elif reference:
            # XXX: Fallback mechanism that is at least required for [...]
            for r in portal.portal_activities.getMessageList(
                  tag="[...]_createOrder:"+reference):
              path = '/'.join(r.object_path)
              break
            else:
              # We must retry in case there was a race condition with an activity
              # that have just reindexed the SO for the first time.
              portal.erp5_sql_connection().query('ROLLBACK\0BEGIN')
              reference = None
              continue
          else:
            raise [...]Failure("Sales Order does not exist")
          getTransactionalVariable()["[...]_document"] = path
          return portal.restrictedTraverse(path)

    It's for an interface. A first RPC asks to create a document in the ERP and a second RPC asks to do something on it; the RPC identifies the document by a reference instead of its id.


    Lastly, note that declaring private is more annoying than dissuasive: I could convert my post-build script to an external method.

    For me it becomes clear. If immediateReindexObject can be safe, it's a good thing to make it easy to use.

  • Such an API change for something you don't want to encourage looks crazy to me.

    If the alternative is making immediateReindex public, I think what I am suggesting is still better at dissuading its use.

    Of course as I wrote before, my ideal world in a vacuum with friction-less spherical cows would not have any immediate reindex ability to begin with. But we are not in that world, so we need to expose just enough that we can do what we have to do. Which means exposing it more than at the time of the current commit.

    it's a quick and robust way to avoid race conditions in places where the performance cost is negligible.

    There are asterisks after "robust", which I would be happy to get rid of (independently from the visibility discussion).

    the RPC identifies the document by a reference instead of its id

    Isn't this kind of a weakness in the design of this RPC ?

    For example, here are the design changes I would (hopefully) suggest if such interface design was presented to me (before anything was implemented, hopefully), off the top of my head (there may be more possibilities):

    • in a REST design, the location of a created resource can be signaled using a Location header in a 201 Created response. Then the second call could be made based on that locator.
    • maybe we could rely on the fact that the client has to be robust to transient issues, by responding with a 5xx status, which the client could then retry. Or maybe use a dummy 307 Temporary Redirect to the current locator if the client supports it properly (unfortunately this is not common).
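
    To make the first option concrete, here is a minimal sketch of a creation endpoint answering 201 Created with a Location header, written as a bare WSGI callable so it runs with the standard library only. `create_order_app`, `call_app` and the hard-coded id are hypothetical; the only assumption is that the id of a freshly created document is known without any catalog lookup.

```python
def create_order_app(environ, start_response):
    """Hypothetical WSGI endpoint: create a document, return its locator."""
    # Assumption: the id is known as soon as the document is created, so
    # building the locator requires no catalog lookup.
    new_id = '42'  # placeholder for the id of the freshly created document
    location = '/sale_order_module/' + new_id
    start_response('201 Created', [
        ('Location', location),
        ('Content-Length', '0'),
    ])
    return [b'']


def call_app(app, method='POST', path='/sale_order_module'):
    """Tiny harness: call the app and capture status, headers and body."""
    captured = {}

    def start_response(status, headers):
        captured['status'] = status
        captured['headers'] = dict(headers)

    environ = {'REQUEST_METHOD': method, 'PATH_INFO': path}
    body = b''.join(app(environ, start_response))
    return captured['status'], captured['headers'], body


status, headers, body = call_app(create_order_app)
```

    The second RPC would then address the document through the returned locator (a path traversal) rather than through a reference which must first be found in the catalog.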

    Once all such options are eliminated, then we reach the level of internal hacks, where we need to force ERP5 to behave in ways which are not natural to its design, like immediate reindexation. I would not be shocked if I have to expose it in a project-specific way then, without having to push such change to generic code.

    Of course this does not help when the specification & implementation already exist. But I think generic ERP5 should be pushing future designs to take a "more natural" direction based on what we learned (from having broken our catalog, ...).

    And of course, there are certainly non-interface cases where this does not help, like the simulation case (and I'm happily taking your word for it that there is a big gain to be had and no reasonable alternative, but I have no clue either way myself in this context).

    Lastly, note that declaring private is more annoying than dissuasive: I could convert my post-build script to an external method.

    Is this a fair dichotomy ? Having the method public fails even harder at being dissuasive: it is not even annoying.

    For me it becomes clear. If immediateReindexObject can be safe, it's a good thing to make it easy to use.

    So you seem to be saying that the solution I propose could be practically implemented, then you present an unrelated example which I think illustrates how a lack of dissuasion/annoyance leads to bad designs, and then you still decide that it is best to stick to your initial opinion while just discarding the other aspects I have presented ?

    Have I just been wasting my time replying here ?

    @romain : I believe you are the one where the buck stops for ERP5 design/API decisions. Please arbitrate.
