Draft: Rework DMS without base data
General context
Part of the work on DMS requested by JP. This is perhaps the hardest MR: it intends to fully drop base data from DMS. Instead, we only use the data property and enforce the ERP5 principle of "storing what the user entered". That means every uploaded file is now stored directly in data, with content_type updated, and all accesses go to these same properties. Before, the behaviour was mixed: only the first upload was guaranteed to update data, which was never accessed in practice anyway (the only exception being the "Download original document" action), and a conversion to a base format was performed instead. In some cases this conversion led to data or formatting losses, which serve no purpose before the document is edited; it also stored the data twice, wasting disk space.
What is included?
Dropping base data has implications everywhere, and I have not yet tried to reduce the number of commits, though this is planned before merging. Most of the changes should be straightforward, but at least one review would be helpful, since the work has spanned months and I have likely forgotten some of the things I did at the beginning.
As much as possible, this MR only replaces base data:
- drop all calls to [get|set|has]BaseData, accesses to base_data, calls to [get|set|has]BaseContentType and accesses to base_content_type;
- delete External Processing Workflow;
- drop IBaseConvertable interface;
but I made opportunistic improvements here and there when they helped, even if they were not strictly required. The ones I can think of are:
- add a new Portal Type group ooo_document;
- merge all OOo conversion chains that were spread out in the code into the main _convert function;
- add an ITextConvertable interface, which mostly replaces IBaseConvertable;
Backward compatibility
See commit tsoulard/erp5@9d270406. The previous source of truth was base data, and I do not want to guess whether data matches it, via Cloudooo or otherwise. So on every call to [get|set|has]BaseData, I copy base_data to data, delete the old property if needed, and return data. This seems the least costly way to migrate, with no need for an upgrader script or anything similar. Adjustments might be needed on this side: consider the current implementation a minimal working version. For instance, it is unclear to me whether we also want the migration to happen on [get|has]Data when previous base data exists.
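To make the lazy migration easier to review, here is a minimal standalone sketch of the idea. It is not the code from the commit: the class `LegacyDocument` and the helper `_migrateBaseData` are hypothetical names, the document is a plain Python object rather than an ERP5 one, and the behaviour of the setter (writing to data after migrating) is an assumption.

```python
class LegacyDocument:
    """Hypothetical stand-in for a DMS document (not the real ERP5 class)."""

    def __init__(self, data=None, base_data=None):
        self.data = data
        if base_data is not None:
            # Only documents created before this MR carry the legacy property.
            self.base_data = base_data

    def _migrateBaseData(self):
        # base_data was the previous source of truth, so when it exists it
        # overwrites data, and the old property is then deleted.
        legacy = self.__dict__.pop("base_data", None)
        if legacy is not None:
            self.data = legacy

    def getBaseData(self):
        self._migrateBaseData()
        return self.data

    def hasBaseData(self):
        self._migrateBaseData()
        return self.data is not None

    def setBaseData(self, value):
        # Assumption: the legacy setter now writes to data.
        self._migrateBaseData()
        self.data = value


doc = LegacyDocument(data=b"original upload", base_data=b"converted")
migrated = doc.getBaseData()
```

After the first legacy accessor call, `doc.data` holds the former base data and the `base_data` attribute is gone, so later calls are no-ops on the migration side. The open question about [get|has]Data would amount to calling the same helper from those accessors too.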
MR readiness
Getting all tests to work has been the biggest nightmare. As of now, I am left with only three failing tests. testContributionRegistryTool cannot be solved with the new interaction workflow: since we now guess the content type automatically from the filename immediately after upload, the predicate on both content type and filename always matches. I think I will simply update the test, maybe to use a different predicate. The causes of the other two failures are still unidentified and hard to debug. The OfficeJS test failure may be due to the change in document, but I am unsure. Anyway, solving these tests is unlikely to change the whole MR.
Secondly, the code in this MR has not been rebased on recent ERP5 with the new BT5 format. I will surely get bored doing it, so most likely this will be one of my few MRs to use a merge commit (so that I do not rebase) with a BT5 re-export. I am open to other ideas, but will not spend time changing all the syntax.
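The testContributionRegistryTool issue can be illustrated with a small standalone sketch. Everything here is an assumption made for illustration: `matches_both` and `upload` are hypothetical helpers, the standard `mimetypes` module stands in for whatever ERP5 actually uses to guess the content type, and the predicate values (`.pdf`, `application/pdf`) are not taken from the test.

```python
import mimetypes


def matches_both(filename, content_type):
    # Hypothetical predicate requiring both a filename and a content type match.
    return filename.endswith(".pdf") and content_type == "application/pdf"


def upload(filename, content_type=None, guess=True):
    # Assumption: when no content type is supplied, it is guessed from the
    # filename right after upload (guess=False mimics the old behaviour).
    if content_type is None and guess:
        content_type = mimetypes.guess_type(filename)[0]
    return filename, content_type


old = upload("report.pdf", guess=False)  # content type left unset
new = upload("report.pdf")               # content type guessed from the name

old_match = matches_both(*old)
new_match = matches_both(*new)
```

Under these assumptions `old_match` is false and `new_match` is true: once the content type is derived from the filename, a predicate on both properties can no longer be told apart from a predicate on the filename alone.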
What follows?
After the merge, the work on DMS is not finished, but I decided to split it since this MR is already quite some work. As for the other subjects JP would like to see: interfaces, portal type groups, and work on OfficeJS sync and WebDAV should be expected later.