WIP: Wrapper as Pdata for big file when install business template
-
-
-
-
-
-
resolved all discussions
Toggle commit list -
marked as a Work In Progress
Toggle commit list -
When i upgrade/install erp5_notebook in an instance which use neo storage instead of zeo,
Neo stopped immediately, as consequence, erp5_notebook can't be installed/upgraded.
The reason is in erp5_notebook, there has a one big file scipy.data.bin 157MB which can't be installed as it is in neo
The solution suggested by @jm is to wrapper it by Pdata when install a BT5: 7784f78c Which i think, is however a big change.
Imagine below case:
In your BT5, there has a big file coincidentally, after you install it,
When you access a property like data by calling getData()
Before this change, what you get is a string.
After this change, what you get is a Pdata object or AcquisitionWrapper
So i'd like let you know this change and your opinion on it.
-
There's no problem. The BT5 system must restore things as they were, and when the author of this object added it in a skin folder, it was originally in Pdata form.
Let's merge.
-
It's normal that file data are PData, this is what happens when "large" files are uploaded ( because
_read_data
is called ), it's not good that business template uses string.I think you just got unlucky to hit that bug with text conversion at the same time, but it's really supposed to be PData ( and
ERP5/mixin/cached_convertable.py
handles PData correctly )About business template test, maybe we could merge this with
test_text_file_import_export
? -
product/ERP5/Document/TextDocument.py
159 159 if mime_type == 'text/html': 160 160 mime_type = 'text/x-html-safe' 161 161 if src_mimetype != "image/svg+xml": 162 result = portal_transforms.convertToData(mime_type, text_content, 162 # For Big File, text_content is a PData AcquisitionWrapper 163 # convertToData require a string 164 result = portal_transforms.convertToData(mime_type, str(text_content), -
how can text content become
Pdata
? do you still have a traceback ?I think we will need a test for that.
edit: the message below is for an independant bug I noticed while looking at this code but does not seem important in this discussion
I tried to reproduce with tests from
ERP5OOo/tests/testDms.py
, I started to add this:def test_html_to_text_pdata(self): web_page_portal_type = 'Web Page' module = self.portal.getDefaultModule(web_page_portal_type) web_page = module.newContent(portal_type=web_page_portal_type) html_content = """ <p>blablabla</p>""" * 10000 file_like = StringIO.StringIO() file_like.write(html_content) setattr(file_like, 'filename', 'something.htm') web_page.edit(file=file_like) import pdb; pdb.set_trace() safe_html = web_page.convert('html')[1]
and notice that there is
.data
that is aPdata
(that looks good, but maybe it's not used in web pages) and also the same content in.text_content
:(Pdb) web_page.data <OFS.Image.Pdata object at 0x7fb549d46d70> (Pdb) len(str(web_page.data)) 230000 (Pdb) type(web_page.text_content) <type 'str'> (Pdb) len(web_page.text_content) 230000
Of course, because property is duplicated, this kind of problem happen:
(Pdb) web_page.setTextContent('ahah') (Pdb) len(web_page.text_content) 4 (Pdb) len(str(web_page.data)) 230000
note that this test is strange, because there is no way to
edit(file=
on a web page, but the contribute action on the front page create web pages with this issue:if I edit the web page ( with
WebPage_viewEditor
), then onlytext_content
is updated:This looks like a bug, but it's another story ( I filled this in bug module as https://nexedijs.erp5.net/#/bug_module/20191010-25F131?page=form&view=view ).
What I'm wondering is how we can get text content to be
Pdata
(but I don't think that's bad that text content isPdata
) -
This
.data
and.text_content
is not important in the context of this discussion.
-
-
how can text content become
Pdata
? do you still have a traceback ?it happens when install erp5_officejs bt5, SQLCatalog_deferFullTextIndexActivity is failed for some big files
for example this file:
gadget_officejs_appstore_css_jquery-mobile_jquery-mobile_latest_css's length is 255910
you can check traceback in this test result with UNKNOWN result
-
product/ERP5/tests/testBusinessTemplate.py
3401 3402 """Tests the Title of the Template Tool.""" 3402 3403 self.assertEqual('Template Tool', self.getTemplateTool().Title()) 3403 3404 3405 def test_00_checkFileisPDataInstance(self): 3406 self.assertIsInstance(self.portal.portal_skins.erp5_ckeditor.ckeditor['LICENSE.md'].data, Pdata) 3407 -
I realised that this test is still passed without my changes in BusinessTemplate.
I need to think more, something is not clear for me
-
Indeed. Look at:
elif obj.__class__.__name__ in ('File', 'Image'): if "data" in obj.__dict__: File._setData.__func__(obj, obj.data)
in
ObjectTemplateItem.install
. We should remove it. Could it solve the bug found by Jerome ?And the test should contain a second check for the object in the BT:
self.assertIsInstance(self.portal.portal_templates .getInstalledBusinessTemplate('erp5_xhtml_style')._skin_item._objects ['portal_skins/erp5_ckeditor/ckeditor/LICENSE.md'].data, Pdata)
BTW, your assertIsInstance line exceeds 80 chars.
-
.... but then this
elif obj.__class__.__name__ in ('File', 'Image'): if "data" in obj.__dict__: File._setData.__func__(obj, obj.data)
should already be installing
portal_skins/erp5_notebook/scipy.data
withPData
. I must be missing something. -
It would. NEO breaks before.
-
-
Ah thank you, I understand, this is this patch that would make
text_content
aPdata
.Before the patch, only
data
property can bePData
,text_content
was alwaysstr
. If we change this, then it becomes a big change and probably other things would break. For example, can we useWebPage_viewEditor
iftext_content
isPdata
?Maybe the logic should be something like:
if isinstance(obj, OFSFile) and property_name == "data": data = obj._read_data(data)[0]
-
I would like to know why neo did not work. I know that having a large size of string in a persistent object is a bad idea, it should be Pdata, but it is still a bug of neo.
It is not a bug of NEO, and this is not even specific to NEO. There has always been an arbitrary limit to the size of network packets in the NEO protocol (64M), but we consider that exceeding this limit is either a misuse or a bug and we don't want think to get worse with a OOM. msgpack does the same, and ZEO5 has failing tests with last version of msgpack.
We're open to increase this limit whenever it is reasonable. Here, it isn't.
-
For example, can we use
WebPage_viewEditor
iftext_content
isPdata
?I'm afraid we have no choice and make it work.
-
I'm afraid we have no choice and make it work.
It seems a good thing if we can make text_content (and maybe all string properties) support very large text by using
Pdata
but in my understanding this is not blocking for the problem of installing erp5_notebook, because scipy.data is anOFS.Image.File
, which should be using.data
property, not.text_content
, so we can limit this to "business template installation should usePdata
forOFS.Image.File.data
" and handle text content later. -
OK. And it looks like anyway that types with text_content attribute are unlikely to be big.
Xiaowu, what's the status with
isinstance(obj, OFSFile) and property_name == "data"
and with the 3 lines removed in install() ? -
added 1 commit
- 7ceb0a6e - BusinessTemplate: File's data property is imported as PData
Toggle commit list -
what's the status with
isinstance(obj, OFSFile) and
....Test still running,
-
@arnau the logic to use
Pdata
if it's big,str
otherwise is in https://github.com/zopefoundation/Zope/blob/2.13.27/src/OFS/Image.py#L513File_view
has amy_file
field, when edit is called, something like this happens:parts/erp5/product/ERP5Type/Base.py(254)__call__() -> result = self.__dict__['_m'](instance, *args, **kw) parts/erp5/product/ERP5/Document/File.py(117)_edit() -> self._setFile(file_object, precondition=precondition) parts/erp5/product/ERP5Type/Base.py(254)__call__() -> result = self.__dict__['_m'](instance, *args, **kw) parts/erp5/product/ERP5/Document/File.py(148)_setFile() -> CMFFile._edit(self, precondition=precondition, file=data) eggs/Products.CMFDefault-2.2.4-py2.7.egg/Products/CMFDefault/File.py(173)_edit() -> self.manage_upload(file) eggs/Zope2-2.13.27+slapospatched001-py2.7.egg/OFS/Image.py(492)manage_upload() -> data, size = self._read_data(file) > eggs/Zope2-2.13.27+slapospatched001-py2.7.egg/OFS/Image.py(515)_read_data() -> import transaction
setFile
(and alsosetData
which is usessetFile
) are special setters.setTextContent
on web page is a simple setter for string property. -
Toggle commit list
-
-
added 27 commits
-
9eda82d9...f675c0fb - 26 commits from branch
nexedi:master
- c0118634 - BusinessTemplate: Import Big File's data property as Pdata
Toggle commit list -
9eda82d9...f675c0fb - 26 commits from branch
-
what's the status of this ? looks good to me. So the problem was that when we first import the business template and create the
ObjectTemplateItem
we are using a very large string and not aPData
and even if the.data
attribute of the file was already properly created asPData
theObjectTemplateItem
(_path_item
of the business template document ) was a very large string, is it ? -
when we first import the business template and create the
ObjectTemplateItem
we are using a very large string and not aPData
yes
.data
attribute of the file was already properly created asPData
as i understand, when first import bt5,
.data
attribute was not yet set as PData, it happens after import bt5 inelif obj.__class__.__name__ in ('File', 'Image'): if "data" in obj.__dict__: File._setData.__func__(obj, obj.data)
and NEO break before it.
So in this MR, when first import bt5, we set
.data
attribute as PData asap.I'll rebase it against master, if all is ok, i'll merge.
-
added 23 commits
-
c0118634...969c3e23 - 22 commits from branch
nexedi:master
- 37e1ab36 - BusinessTemplate: Import Big File's data property as Pdata
Toggle commit list -
c0118634...969c3e23 - 22 commits from branch
-
Please register or sign in to post a comment