Catalog-safe isUser check
The goal of this merge request is to get rid of the catalog's circular dependency on itself.
To index a document, the code must decide how to index its security among the roles_and_users and viewable_* columns. This depends on whether the local role is granted to a user or to a security group. To decide whether a role is granted to a user, the existing catalog looks the user up using PAS. And of course, with the ERP5{,Login}UserManager plugin enabled, PAS ends up querying the catalog to find out whether the looked-up user exists.
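A toy model may make the circularity concrete. The sketch below is a plain-Python assumption, not ERP5 code: ToyCatalog, user_rows and group_rows are hypothetical names, and user_exists stands in for the PAS lookup which, with ERP5{,Login}UserManager, is answered by the catalog itself.

```python
class ToyCatalog:
    """Toy model (hypothetical, not ERP5 code) of the circular user lookup."""

    def __init__(self):
        self.indexed_user_id_set = set()  # users the catalog can already find
        self.user_rows = []   # (role, user_id): security indexed per user
        self.group_rows = []  # (role, group_id): security indexed per group

    def user_exists(self, user_id):
        # Stand-in for PAS with ERP5{,Login}UserManager: the answer comes
        # from the very catalog that is being filled.
        return user_id in self.indexed_user_id_set

    def index_local_roles(self, local_role_dict):
        for principal_id, role_list in local_role_dict.items():
            for role in role_list:
                if self.user_exists(principal_id):
                    self.user_rows.append((role, principal_id))
                else:
                    self.group_rows.append((role, principal_id))

    def index_user_document(self, user_id, local_role_dict):
        # The document's security is computed before the document itself is
        # recorded, so its own user cannot be found yet.
        self.index_local_roles(local_role_dict)
        self.indexed_user_id_set.add(user_id)


catalog = ToyCatalog()
catalog.index_user_document('jdoe', {'jdoe': ['Owner']})
print(catalog.group_rows)  # [('Owner', 'jdoe')]: the user was taken for a group
```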
This means that if a user document (typically a Person) grants roles to the user it represents before it is first indexed, it will be mis-indexed: PAS will not find the user until the document is indexed, so the role granted to the user will be considered to be granted to a group, causing a security_uid allocation. In turn, security_uid allocations just pile up (because of a fundamental design limitation, nothing can automatically and safely free a security_uid).
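The pile-up can be modelled under a hypothetical assumption, for illustration only: one security_uid is allocated per distinct set of (role, group) pairs and is never freed. In this model, the unique user id of each mis-indexed user document creates its own combination, so leftover uids grow with the number of such documents.

```python
class SecurityUidAllocator:
    """Hypothetical model: one security_uid per distinct set of
    (role, group) pairs, never freed."""

    def __init__(self):
        self._uid_by_key = {}

    def get_security_uid(self, role_group_pair_set):
        key = frozenset(role_group_pair_set)
        if key not in self._uid_by_key:
            # New combination: allocate the next uid. Nothing in this class
            # ever deletes an entry, mirroring the limitation described above.
            self._uid_by_key[key] = len(self._uid_by_key) + 1
        return self._uid_by_key[key]

    def allocated_count(self):
        return len(self._uid_by_key)


allocator = SecurityUidAllocator()
allocator.get_security_uid({('Auditor', 'G1')})                     # uid 1
allocator.get_security_uid({('Auditor', 'G1'), ('Owner', 'jdoe')})  # uid 2: user id taken as a group
allocator.get_security_uid({('Auditor', 'G1')})                     # correct reindex reuses uid 1
print(allocator.allocated_count())  # 2: uid 2 lingers
```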
For live document creation, this can be avoided by adding a Python expression to the role declaration that prevents assigning any role to a user until the user can be found by PAS, together with interactions or activity dependency chains that re-compute roles once the user can be found. But for later reindexation from an incomplete catalog (e.g. a full reindexation from the ZODB alone), nothing can really be done: the roles are already granted, so the catalog will try to index them.
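A sketch of the live-creation workaround, with hypothetical function names: owner_role_condition plays the part of the Python expression on the role declaration, and the second call to compute_local_roles plays the part of the re-computation that an interaction or activity would trigger.

```python
def owner_role_condition(user_id, user_exists):
    # Hypothetical guard standing in for the expression on the role
    # declaration: grant nothing until PAS can find the user.
    return user_exists(user_id)


def compute_local_roles(user_id, user_exists):
    local_role_dict = {}
    if owner_role_condition(user_id, user_exists):
        local_role_dict[user_id] = ['Owner']
    return local_role_dict


known_user_id_set = set()
print(compute_local_roles('jdoe', known_user_id_set.__contains__))  # {}
known_user_id_set.add('jdoe')  # the user document has now been indexed
# Re-computation, e.g. triggered by an interaction or an activity.
print(compute_local_roles('jdoe', known_user_id_set.__contains__))  # {'jdoe': ['Owner']}
```

Nothing in such a guard helps a reindexation from the ZODB: the roles computed by the second call are what gets stored, and a later reindexation only reads them back.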
The solution implemented here is to guess, just by looking at the value a role is granted to, whether it is a user identifier or a group. It also provides a way to override the default implementation per project using a restricted Python script.
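The per-project override can be sketched as a lookup with a fallback, in the same getattr style the evaluation script uses for ERP5Type_getSecurityCategoryMapping. HOOK_ID, is_user and FakePortal are hypothetical names, not those used by this merge request; for the "most compatible" approach, default_is_user would be the existing PAS lookup.

```python
HOOK_ID = 'Example_isUserId'  # hypothetical script id


def is_user(portal, principal_id, default_is_user):
    """Decide whether principal_id is a user, letting a project override it."""
    hook = getattr(portal, HOOK_ID, None)
    if hook is None:
        # No per-project script: use the generic default implementation.
        return default_is_user(principal_id)
    return bool(hook(principal_id))


class FakePortal(object):
    """Hypothetical stand-in for the portal object."""


portal = FakePortal()
print(is_user(portal, 'jdoe', lambda principal_id: False))  # False (default)
setattr(portal, HOOK_ID, lambda principal_id: not principal_id.startswith('G'))
print(is_user(portal, 'jdoe', lambda principal_id: False))  # True (override)
```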
There are 2 approaches implemented in this merge request:
- the "no decision, no improvement by default, most compatible" approach just moves the existing PAS lookup behind the customised script lookup.
- the "try to improve default situation" approach, implemented in the topmost commit, goes further by trying to detect groups. As user identifiers were, until very recently (ERP5 Login introduction and the semantic split of Person.reference into user_id, login and pure reference), very unstructured values, this implementation cannot really recognise a user id; instead, it tries to recognise all possible security groups. It does so by enumerating all atomic security groups, then deconstructing the composed security group identifiers used in local roles and checking whether each component is a valid group.
  Withdrawn: not feasible in generic code. Instead, a Zope-instance-specific script is required when non-generic viewable_... columns are used.
To evaluate the viability of that second option, please run the following code (once!) as a Zope Python Script on representative, real-world instances and send me the output:
```python
# Zope Python Script (restricted Python 2): context, DateTime and printed are
# provided by the script environment.
from pprint import pformat
portal = context.getPortalObject()
# Use the instance's security category mapping script if it exists, else the
# historical assignment-based default.
security_mapping = getattr(
  portal,
  'ERP5Type_getSecurityCategoryMapping',
  lambda: ( # BBB
    (
      'ERP5Type_getSecurityCategoryFromAssignment',
      portal.getPortalAssignmentBaseCategoryList(),
    ),
  ),
)()
print 'security_mapping:', pformat(security_mapping)
timing = []
# Two timed passes; diagnostics are only printed on the second pass
# (count == 1) so that each appears once.
for count in xrange(2):
  BEGIN = DateTime()
  # All base categories used to build security groups.
  base_category_id_set = {
    base_category
    for _, base_category_list in security_mapping
    for base_category in base_category_list
  }
  portal_categories = portal.portal_categories
  todo_list = []
  for base_category_id in base_category_id_set:
    try:
      base_category_value = portal_categories[base_category_id]
    except KeyError:
      if count:
        print 'Base category declared in category mapping, but missing in portal_categories: %r' % (base_category_id, )
      continue
    todo_list.extend(base_category_value.objectValues())
  # Start from the explicitly defined ZODB groups, then add one atomic group
  # per category, walking the category trees depth-first.
  group_id_set = set(portal.acl_users.zodb_groups.listGroupIds())
  while todo_list:
    category_value = todo_list.pop()
    group_id = (
      category_value.getProperty('codification') or
      category_value.getProperty('reference') or
      category_value.getId()
    )
    # Ids containing '_' or ending with '*' are reported as malformed, and
    # the subcategories of that category are skipped.
    if '_' in group_id or group_id[-1] == '*':
      if count:
        print 'Malformed group id on %s: %r' % (
          category_value.getPath(),
          group_id,
        )
      continue
    group_id_set.add(group_id)
    todo_list.extend(category_value.objectValues())
  END = DateTime()
  # DateTime differences are expressed in days.
  timing.append((END - BEGIN) * 86400)
print 'groups:', len(group_id_set)
print 'first run: %.2fs, second run: %.2fs' % tuple(timing)
return printed
```
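For the withdrawn approach, the component check on a composed identifier can be sketched as follows, given a set of atomic group ids such as the group_id_set collected by the script above. The id format is an assumption inferred from the script's malformed-id check: atomic ids joined with '_', with a component possibly carrying a trailing '*' that is stripped before lookup. looks_like_group is a hypothetical name.

```python
def looks_like_group(principal_id, atomic_group_id_set):
    """Withdrawn heuristic, sketched: is every component a known group?

    Assumed (hypothetical) id format: atomic group ids joined with '_',
    where a component may carry a trailing '*' that is stripped first.
    """
    for component in principal_id.split('_'):
        if component.endswith('*'):
            component = component[:-1]
        if component not in atomic_group_id_set:
            return False
    return True


atomic_group_id_set = {'G1', 'F2'}
print(looks_like_group('G1_F2*', atomic_group_id_set))  # True
print(looks_like_group('jdoe', atomic_group_id_set))    # False
```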