Catalog-safe isUser check (!709) · Merge requests · nexedi / erp5

Catalog-safe isUser check

The goal of this merge request is to get rid of the circular dependency of catalog on itself.

To index a document, the code must decide how to index its security among the roles_and_users and viewable_* columns. This depends on whether the local role is granted to a user or to a security group. To decide whether a role is granted to a user, the existing catalog looks the user up using PAS. And of course, with the ERP5{,Login}UserManager plugin enabled, PAS ends up querying the catalog to find out whether the looked-up user exists.

This means that if a user document (typically: a Person) grants roles to the user it represents before it is first indexed, it will be mis-indexed: it will not be found by PAS until it is indexed, so the role granted to the user will be considered to be granted to a group, causing a security_uid allocation. In turn, security_uid allocations just pile up (because of a fundamental design limitation, nothing can automatically and safely free a security_uid).
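To make the failure mode concrete, here is a toy model of the loop described above. It is only a sketch: ToyCatalog, is_user and index are hypothetical names, not ERP5 or PAS APIs, and the real catalog is far more involved. It only shows that a lookup which depends on the index mis-classifies the user on first indexation and leaves an allocation behind.

```python
# Toy model only: ToyCatalog and its methods are hypothetical, not ERP5 APIs.
class ToyCatalog(object):
    def __init__(self):
        self.indexed_user_ids = set()    # users the catalog already knows about
        self.next_security_uid = 1
        self.security_uid_by_group = {}  # never shrinks: nothing frees an entry

    def is_user(self, name):
        # Stand-in for the PAS lookup, which itself asks the catalog.
        return name in self.indexed_user_ids

    def index(self, user_id, local_role_grantee_list):
        """Index a user document; return how each grantee was classified."""
        result = {}
        for grantee in local_role_grantee_list:
            if self.is_user(grantee):
                result[grantee] = 'user'
            else:
                # Considered a group: allocate a security_uid, never freed.
                if grantee not in self.security_uid_by_group:
                    self.security_uid_by_group[grantee] = self.next_security_uid
                    self.next_security_uid += 1
                result[grantee] = 'group'
        # Only now does the user become visible to later lookups.
        self.indexed_user_ids.add(user_id)
        return result

catalog = ToyCatalog()
first = catalog.index('P123', ['P123'])   # first indexation: user not found yet
second = catalog.index('P123', ['P123'])  # reindexation: user is now found
```

In this model the first call classifies 'P123' as a group and the second as a user, while the entry in security_uid_by_group stays around.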

For live document creation, this can be avoided by adding a python expression on the role declaration to prevent assigning any role to a user until it can be found by PAS, and interactions or activity dependency chains to re-compute roles once the user can be found. But for later reindexation from an incomplete catalog (e.g. full reindexation from ZODB only), nothing can really be done: the roles are already granted, so the catalog will try to index them.

The solution implemented here is to guess, by just looking at the value the role is granted to, whether it is a user identifier or a group. It also provides a way to override the default implementation per-project using a restricted python script.
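The shape of that decision can be sketched as follows. This is not the code from the merge request: default_is_user, is_user and the project_override parameter are hypothetical names, and the default heuristic shown (a 'P' followed by digits) is only an assumption chosen for illustration. The point is that the answer comes from the value alone, with a per-project hook taking precedence.

```python
def default_is_user(name):
    # Hypothetical default guess, for illustration only: the real default
    # implemented in the merge request is not reproduced here.
    return name.startswith('P') and name[1:].isdigit()

def is_user(name, project_override=None):
    """Classify name as a user id (True) or a group (False), without any
    catalog or PAS lookup."""
    if project_override is not None:
        # A project-specific callable takes precedence over the default.
        return bool(project_override(name))
    return default_is_user(name)

# Hypothetical project rule: user ids all start with 'user-'.
override = lambda name: name.startswith('user-')
results = [
    is_user('P42'),
    is_user('F1_G1'),
    is_user('user-7', override),
    is_user('P42', override),
]
```

Because nothing here queries the catalog, the result is the same whether or not the user document has been indexed yet.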

There are 2 approaches implemented in this merge request:

the "no decision, no improvement by default, most compatible" approach just moves the existing PAS lookup behind the customised script lookup.
the "try to improve default situation" approach, implemented in the topmost commit, goes further by trying to detect groups: as user identifiers were, until very recently (ERP5 Login introduction and the semantic split of Person.reference into user_id, login and pure reference), very unstructured values, this implementation cannot really recognise a user id, but instead tries to recognise all possible security groups. It does so by enumerating all atomic security groups, and by deconstructing composed security group identifiers used in local roles and checking whether each component is a valid group. Withdrawn: not feasible in generic code. Instead, a Zope-instance-specific script is required when non-generic viewable_... columns are used.
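For illustration, the group-detection idea of the second approach could look roughly like this. It is a sketch under assumptions: is_group is a hypothetical name, and the '_' separator and trailing '*' marker are inferred from the evaluation script, which rejects atomic group ids containing '_' or ending with '*'. It is not the withdrawn implementation.

```python
def is_group(name, atomic_group_id_set, separator='_', wildcard='*'):
    """Return True when every component of name is a known atomic group.

    Assumption: composed group ids join atomic ids with `separator`, and a
    component may carry a trailing `wildcard` marker.
    """
    for component in name.split(separator):
        if component.endswith(wildcard):
            component = component[:-len(wildcard)]
        if component not in atomic_group_id_set:
            return False
    return True

atomic_group_id_set = {'F1', 'G1', 'S1'}
composed = is_group('F1_G1*', atomic_group_id_set)
user_like = is_group('P123', atomic_group_id_set)
mixed = is_group('F1_P123', atomic_group_id_set)
```

Anything that fails this check would then be treated as a user id, which is why the set of atomic groups has to be complete and cheap to enumerate.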

To evaluate the viability of that second option, please run the following code (once!) on representative, real-world instances and send me the output:

from pprint import pformat
portal = context.getPortalObject()
security_mapping = getattr(
  portal,
  'ERP5Type_getSecurityCategoryMapping',
  lambda: ( # BBB
    (
      'ERP5Type_getSecurityCategoryFromAssignment',
      portal.getPortalAssignmentBaseCategoryList(),
    ),
  ),
)()
print 'security_mapping:', pformat(security_mapping)

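timing = []
# Run the enumeration twice and time each pass; warnings are only printed on
# the second pass (count == 1) so that they are not duplicated.
for count in xrange(2):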
  BEGIN = DateTime()
  base_category_id_set = {
    base_category
    for _, base_category_list in security_mapping
    for base_category in base_category_list
  }
  portal_categories = portal.portal_categories
  todo_list = []
  for base_category_id in base_category_id_set:
    try:
      base_category_value = portal_categories[base_category_id]
    except KeyError:
      if count:
        print 'Base category declared in category mapping, but missing in portal_categories: %r' % (base_category_id, )
      continue
    todo_list.extend(base_category_value.objectValues())
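  # Seed with the groups declared in the zodb_groups PAS plugin, then walk the
  # category trees to collect category-based group ids.
  group_id_set = set(portal.acl_users.zodb_groups.listGroupIds())
  while todo_list:
    category_value = todo_list.pop()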
    group_id = (
      category_value.getProperty('codification') or
      category_value.getProperty('reference') or
      category_value.getId()
    )
    if '_' in group_id or group_id[-1] == '*':
      if count:
        print 'Malformed group id on %s: %r' % (
          category_value.getPath(),
          group_id,
        )
      continue
    group_id_set.add(group_id)
    todo_list.extend(category_value.objectValues())
  END = DateTime()
  # DateTime subtraction yields days; convert to seconds.
  timing.append((END - BEGIN) * 86400)
print 'groups:', len(group_id_set)
print 'first run: %.2fs, second run: %.2fs' % tuple(timing)
return printed

/cc @jerome @jm @kazuhiko @romain @seb @tb