Skip to content
  • This project
    • Loading...
  • Sign in

nexedi / erp5

Go to a project
Toggle navigation
Toggle navigation pinning
  • Projects
  • Groups
  • Snippets
  • Help
  • Project
  • Activity
  • Repository
  • Pipelines
  • Graphs
  • Merge Requests 111
  • Wiki
  • Snippets
  • Network
  • Create a new issue
  • Jobs
  • Commits
  • Issue Boards
Open
Merge Request !953 opened 2019-10-08 14:10:02 UTC by Xiaowu Zhang@xiaowu.zhang

WIP: Wrapper as Pdata for big file when install business template

Edited 2019-10-30 08:14:53 UTC
Check out branch Download as
  • Email Patches
  • Plain Diff
Request to merge xiaowu.zhang:bug into master (173 commits behind)
×

Check out, review, and merge locally

Step 1. Fetch and check out the branch for this merge request

git fetch https://lab.nexedi.com/xiaowu.zhang/erp5.git bug
git checkout -b xiaowu.zhang/erp5-bug FETCH_HEAD

Step 2. Review the changes locally

Step 3. Merge the branch and fix any conflicts that come up

git checkout master
git merge --no-ff xiaowu.zhang/erp5-bug

Step 4. Push the result of the merge to GitLab

git push origin master

Note that pushing to GitLab requires write access to this repository.

Tip: You can also checkout merge requests locally by following these guidelines.

Checking ability to merge automatically…
  • Discussion 25
  • Commits 1
  • Changes 2
  • {{ resolvedDiscussionCount }}/{{ discussionCount }} {{ resolvedCountText }} resolved
  • 3656bbc4b81cef3c1415dc0f7eb4e11c?s=80&d=identicon
    Toggle discussion
    Xiaowu Zhang
    @xiaowu.zhang started a discussion on an outdated diff 2019-10-08 14:25:38 UTC
    Resolved by Xiaowu Zhang 2019-10-08 15:07:38 UTC
    product/ERP5/Document/TextDocument.py
    159 159 if mime_type == 'text/html':
    160 160 mime_type = 'text/x-html-safe'
    161 161 if src_mimetype != "image/svg+xml":
    162 # For Big File, text_content is a PData AcquisitionWrapper
    163 if not isinstance(text_content, str):
    164 text_content = str(text_content)
    162 165 result = portal_transforms.convertToData(mime_type, text_content,
    163 166 object=self, context=self,
    • @xiaowu.zhang Xiaowu Zhang
      @xiaowu.zhang commented 2019-10-08 14:25:38 UTC
      Developer

      This commit fix error such as:

      ('/erp5_portal_eb1fb2f2504535ab2441b96e45fb7c4e/web_page_module/gadget_hr_expense_xlsx_core_min_js', 'SQLCatalog_deferFullTextIndexActivity', -2, 1), ('/erp5_portal_eb1fb2f2504535ab2441b96e45fb7c4e/web_page_module/gadget_hr_fullcalendar_js', 'SQLCatalog_deferFullTextIndexActivity', -2, 1)]
      Last error message:
      TypeError
      argument 1 must be string or buffer, not Acquisition.ImplicitAcquisitionWrapper
      
      

      Full error trace is:

      test result 20191007

      Edited 2019-10-08 14:27:35 UTC
  • @xiaowu.zhang Xiaowu Zhang
    @xiaowu.zhang commented 2019-10-08 14:31:36 UTC
    Developer

    @jm can you pls check? especially this commit:af0c926c thx

  • C911061fd96783a1cdab76ca0ef49a66?s=80&d=identicon
    Toggle discussion
    Julien Muchembled
    @jm started a discussion on an outdated diff 2019-10-08 14:39:10 UTC
    Resolved by Xiaowu Zhang 2019-10-08 15:07:33 UTC
    product/ERP5/tests/testBusinessTemplate.py
    3401 3408 """Tests the Title of the Template Tool."""
    3402 3409 self.assertEqual('Template Tool', self.getTemplateTool().Title())
    3403 3410
    3411 def test_00_checkFileisPDataInstance(self):
    3412 """Test Check File Class"""
    3413 sequence_list = SequenceList()
    3414 sequence_string = '\
    3415 CheckFileIsImportedAsPData \
    3416 '
    3417 sequence_list.addSequenceString(sequence_string)
    3418 sequence_list.play(self)
    • @jm Julien Muchembled
      @jm commented 2019-10-08 14:39:10 UTC
      Owner

      don't complicate with a step (that won't be reused elsewhere) and sequences

  • @xiaowu.zhang Xiaowu Zhang
    @xiaowu.zhang

    added 2 commits

    • 4f1f1ffb - BusinessTemplate: File is imported as PData
    • c50b1418 - TextDocument: convert to string for big file's text content

    Compare with previous version

    2019-10-08 14:57:38 UTC

    added 2 commits

    • 4f1f1ffb - BusinessTemplate: File is imported as PData
    • c50b1418 - TextDocument: convert to string for big file's text content

    Compare with previous version

    Toggle commit list
  • @xiaowu.zhang Xiaowu Zhang
    @xiaowu.zhang

    added 2 commits

    • 7784f78c - BusinessTemplate: File is imported as PData
    • 828687cc - TextDocument: convert to string for big file's text content

    Compare with previous version

    2019-10-08 15:05:08 UTC

    added 2 commits

    • 7784f78c - BusinessTemplate: File is imported as PData
    • 828687cc - TextDocument: convert to string for big file's text content

    Compare with previous version

    Toggle commit list
  • @xiaowu.zhang Xiaowu Zhang
    @xiaowu.zhang

    resolved all discussions

    2019-10-08 15:07:38 UTC

    resolved all discussions

    Toggle commit list
  • @xiaowu.zhang Xiaowu Zhang
    @xiaowu.zhang

    marked as a Work In Progress

    2019-10-09 09:28:39 UTC

    marked as a Work In Progress

    Toggle commit list
  • @xiaowu.zhang Xiaowu Zhang
    @xiaowu.zhang commented 2019-10-09 13:04:47 UTC
    Developer

    @nexedi

    When i upgrade/install erp5_notebook in an instance which use neo storage instead of zeo,

    Neo stopped immediately, as consequence, erp5_notebook can't be installed/upgraded.

    The reason is in erp5_notebook, there has a one big file scipy.data.bin 157MB which can't be installed as it is in neo

    The solution suggested by @jm is to wrapper it by Pdata when install a BT5: 7784f78c Which i think, is however a big change.

    Imagine below case:

    In your BT5, there has a big file coincidentally, after you install it,

    When you access a property like data by calling getData()

    Before this change, what you get is a string.

    After this change, what you get is a Pdata object or AcquisitionWrapper

    So i'd like let you know this change and your opinion on it.

  • @jm Julien Muchembled
    @jm commented 2019-10-09 14:00:22 UTC
    Owner

    There's no problem. The BT5 system must restore things as they were, and when the author of this object added it in a skin folder, it was originally in Pdata form.

    Let's merge.

  • @yusei Yusei Tahara
    @yusei commented 2019-10-10 01:17:14 UTC
    Owner

    @nexedi I would like to know why neo did not work. I know that having a large size of string in a persistent object is a bad idea, it should be Pdata, but it is still a bug of neo.

  • @jerome Jérome Perrin
    @jerome commented 2019-10-10 03:46:39 UTC
    Owner

    It's normal that file data are PData, this is what happens when "large" files are uploaded ( because _read_data is called ), it's not good that business template uses string.

    I think you just got unlucky to hit that bug with text conversion at the same time, but it's really supposed to be PData ( and ERP5/mixin/cached_convertable.py handles PData correctly )

    About business template test, maybe we could merge this with test_text_file_import_export ?

  • 0d26e5885b4c844763ca1fc0f7f4f444?s=80&d=identicon
    Toggle discussion
    Jérome Perrin
    @jerome started a discussion on an outdated diff 2019-10-10 05:06:02 UTC
    Last updated by Jérome Perrin 2019-10-10 05:10:45 UTC
    product/ERP5/Document/TextDocument.py
    159 159 if mime_type == 'text/html':
    160 160 mime_type = 'text/x-html-safe'
    161 161 if src_mimetype != "image/svg+xml":
    162 result = portal_transforms.convertToData(mime_type, text_content,
    162 # For Big File, text_content is a PData AcquisitionWrapper
    163 # convertToData require a string
    164 result = portal_transforms.convertToData(mime_type, str(text_content),
    • @jerome Jérome Perrin
      @jerome commented 2019-10-10 05:06:02 UTC
      Owner

      how can text content become Pdata ? do you still have a traceback ?

      I think we will need a test for that.


      edit: the message below is for an independant bug I noticed while looking at this code but does not seem important in this discussion

      I tried to reproduce with tests from ERP5OOo/tests/testDms.py , I started to add this:

        def test_html_to_text_pdata(self):
          web_page_portal_type = 'Web Page'
          module = self.portal.getDefaultModule(web_page_portal_type)
          web_page = module.newContent(portal_type=web_page_portal_type)
      
          html_content = """
            <p>blablabla</p>""" * 10000
          file_like = StringIO.StringIO()
          file_like.write(html_content)
          setattr(file_like, 'filename', 'something.htm')
          web_page.edit(file=file_like)
      
          import pdb; pdb.set_trace()
          safe_html = web_page.convert('html')[1]
      

      and notice that there is .data that is a Pdata (that looks good, but maybe it's not used in web pages) and also the same content in .text_content:

      (Pdb) web_page.data
      <OFS.Image.Pdata object at 0x7fb549d46d70>
      (Pdb) len(str(web_page.data))
      230000
      (Pdb) type(web_page.text_content)
      <type 'str'>
      (Pdb) len(web_page.text_content)
      230000
      

      Of course, because property is duplicated, this kind of problem happen:

      (Pdb) web_page.setTextContent('ahah')
      (Pdb) len(web_page.text_content)
      4
      (Pdb) len(str(web_page.data))
      230000
      

      note that this test is strange, because there is no way to edit(file= on a web page, but the contribute action on the front page create web pages with this issue:

      image

      image

      if I edit the web page ( with WebPage_viewEditor ), then only text_content is updated:

      image

      This looks like a bug, but it's another story ( I filled this in bug module as https://nexedijs.erp5.net/#/bug_module/20191010-25F131?page=form&view=view ).

      What I'm wondering is how we can get text content to be Pdata (but I don't think that's bad that text content is Pdata)

      Edited 2019-10-10 07:03:30 UTC
    • @jerome Jérome Perrin
      @jerome commented 2019-10-10 05:10:45 UTC
      Owner

      This .data and .text_content is not important in the context of this discussion.

  • @xiaowu.zhang Xiaowu Zhang
    @xiaowu.zhang commented 2019-10-10 08:11:47 UTC
    Developer

    how can text content become Pdata ? do you still have a traceback ?

    it happens when install erp5_officejs bt5, SQLCatalog_deferFullTextIndexActivity is failed for some big files

    Screenshot_2019-10-10_at_10.00.03_AM

    for example this file:

    gadget_officejs_appstore_css_jquery-mobile_jquery-mobile_latest_css's length is 255910

    Screenshot_2019-10-10_at_10.05.59_AM

    you can check traceback in this test result with UNKNOWN result

  • 3656bbc4b81cef3c1415dc0f7eb4e11c?s=80&d=identicon
    Toggle discussion
    Xiaowu Zhang
    @xiaowu.zhang started a discussion on an outdated diff 2019-10-10 08:47:40 UTC
    Last updated by Julien Muchembled 2019-10-10 10:45:35 UTC
    product/ERP5/tests/testBusinessTemplate.py
    3401 3402 """Tests the Title of the Template Tool."""
    3402 3403 self.assertEqual('Template Tool', self.getTemplateTool().Title())
    3403 3404
    3405 def test_00_checkFileisPDataInstance(self):
    3406 self.assertIsInstance(self.portal.portal_skins.erp5_ckeditor.ckeditor['LICENSE.md'].data, Pdata)
    3407
    • @xiaowu.zhang Xiaowu Zhang
      @xiaowu.zhang commented 2019-10-10 08:47:40 UTC
      Developer

      I realised that this test is still passed without my changes in BusinessTemplate.

      I need to think more, something is not clear for me

    • @jm Julien Muchembled
      @jm commented 2019-10-10 09:31:09 UTC
      Owner

      Indeed. Look at:

              elif obj.__class__.__name__ in ('File', 'Image'):
                if "data" in obj.__dict__:
                  File._setData.__func__(obj, obj.data)
      

      in ObjectTemplateItem.install. We should remove it. Could it solve the bug found by Jerome ?

      And the test should contain a second check for the object in the BT:

          self.assertIsInstance(self.portal.portal_templates
            .getInstalledBusinessTemplate('erp5_xhtml_style')._skin_item._objects
            ['portal_skins/erp5_ckeditor/ckeditor/LICENSE.md'].data, Pdata)
      

      BTW, your assertIsInstance line exceeds 80 chars.

    • @jerome Jérome Perrin
      @jerome commented 2019-10-10 10:05:07 UTC
      Owner

      .... but then this

              elif obj.__class__.__name__ in ('File', 'Image'):
                if "data" in obj.__dict__:
                  File._setData.__func__(obj, obj.data)
      

      should already be installing portal_skins/erp5_notebook/scipy.data with PData. I must be missing something.

    • @jm Julien Muchembled
      @jm commented 2019-10-10 10:45:35 UTC
      Owner

      It would. NEO breaks before.

  • @jerome Jérome Perrin
    @jerome commented 2019-10-10 08:49:01 UTC
    Owner

    Ah thank you, I understand, this is this patch that would make text_content a Pdata.

    Before the patch, only data property can be PData, text_content was always str. If we change this, then it becomes a big change and probably other things would break. For example, can we use WebPage_viewEditor if text_content is Pdata ?

    Maybe the logic should be something like:

        if isinstance(obj, OFSFile) and property_name == "data":
          data = obj._read_data(data)[0]
    
  • @jm Julien Muchembled
    @jm commented 2019-10-10 09:15:25 UTC
    Owner

    I would like to know why neo did not work. I know that having a large size of string in a persistent object is a bad idea, it should be Pdata, but it is still a bug of neo.

    It is not a bug of NEO, and this is not even specific to NEO. There has always been an arbitrary limit to the size of network packets in the NEO protocol (64M), but we consider that exceeding this limit is either a misuse or a bug and we don't want think to get worse with a OOM. msgpack does the same, and ZEO5 has failing tests with last version of msgpack.

    We're open to increase this limit whenever it is reasonable. Here, it isn't.

  • @jm Julien Muchembled
    @jm commented 2019-10-10 09:35:16 UTC
    Owner

    For example, can we use WebPage_viewEditor if text_content is Pdata ?

    I'm afraid we have no choice and make it work.

  • @jerome Jérome Perrin
    @jerome commented 2019-10-10 09:55:57 UTC
    Owner

    I'm afraid we have no choice and make it work.

    It seems a good thing if we can make text_content (and maybe all string properties) support very large text by using Pdata but in my understanding this is not blocking for the problem of installing erp5_notebook, because scipy.data is an OFS.Image.File , which should be using .data property, not .text_content, so we can limit this to "business template installation should use Pdata for OFS.Image.File.data" and handle text content later.

  • @jm Julien Muchembled
    @jm commented 2019-10-10 13:46:06 UTC
    Owner

    OK. And it looks like anyway that types with text_content attribute are unlikely to be big.

    Xiaowu, what's the status with isinstance(obj, OFSFile) and property_name == "data" and with the 3 lines removed in install() ?

  • @xiaowu.zhang Xiaowu Zhang
    @xiaowu.zhang

    added 1 commit

    • 7ceb0a6e - BusinessTemplate: File's data property is imported as PData

    Compare with previous version

    2019-10-10 15:43:35 UTC

    added 1 commit

    • 7ceb0a6e - BusinessTemplate: File's data property is imported as PData

    Compare with previous version

    Toggle commit list
  • @xiaowu.zhang Xiaowu Zhang
    @xiaowu.zhang commented 2019-10-10 15:47:02 UTC
    Developer

    what's the status with isinstance(obj, OFSFile) and....

    Test still running,

    Edited 2019-10-10 15:47:09 UTC
  • @arnau Arnaud Fontaine
    @arnau commented 2019-10-10 20:50:45 UTC
    Developer

    It's normal that file data are PData, this is what happens when "large" files are uploaded ( because _read_data is called ), it's not good that business template uses string.

    @jerome Just out of curiosity, but where is this done? I could not find it...

  • @jerome Jérome Perrin
    @jerome commented 2019-10-11 03:59:36 UTC
    Owner

    @arnau the logic to use Pdata if it's big, str otherwise is in https://github.com/zopefoundation/Zope/blob/2.13.27/src/OFS/Image.py#L513

    File_view has a my_file field, when edit is called, something like this happens:

      parts/erp5/product/ERP5Type/Base.py(254)__call__()
    -> result = self.__dict__['_m'](instance, *args, **kw)
      parts/erp5/product/ERP5/Document/File.py(117)_edit()
    -> self._setFile(file_object, precondition=precondition)
      parts/erp5/product/ERP5Type/Base.py(254)__call__()
    -> result = self.__dict__['_m'](instance, *args, **kw)
      parts/erp5/product/ERP5/Document/File.py(148)_setFile()
    -> CMFFile._edit(self, precondition=precondition, file=data)
      eggs/Products.CMFDefault-2.2.4-py2.7.egg/Products/CMFDefault/File.py(173)_edit()
    -> self.manage_upload(file)
      eggs/Zope2-2.13.27+slapospatched001-py2.7.egg/OFS/Image.py(492)manage_upload()
    -> data, size = self._read_data(file)
    > eggs/Zope2-2.13.27+slapospatched001-py2.7.egg/OFS/Image.py(515)_read_data()
    -> import transaction
    

    setFile (and also setData which is uses setFile) are special setters. setTextContent on web page is a simple setter for string property.

  • @xiaowu.zhang Xiaowu Zhang
    @xiaowu.zhang

    added 1 commit

    • 78e60a3e - BusinessTemplate: : is missed :-

    Compare with previous version

    2019-10-11 08:16:45 UTC

    added 1 commit

    • 78e60a3e - BusinessTemplate: : is missed :-

    Compare with previous version

    Toggle commit list
  • @arnau Arnaud Fontaine
    @arnau commented 2019-10-11 09:48:03 UTC
    Developer

    @jerome Thanks. I have missed it when reading OFS.Image source code :P.

  • @xiaowu.zhang Xiaowu Zhang
    @xiaowu.zhang

    added 1 commit

    • 9eda82d9 - BusinessTemplate: _setData also set size attribute

    Compare with previous version

    2019-10-15 13:43:42 UTC

    added 1 commit

    • 9eda82d9 - BusinessTemplate: _setData also set size attribute

    Compare with previous version

    Toggle commit list
  • @xiaowu.zhang Xiaowu Zhang
    @xiaowu.zhang

    added 27 commits

    • 9eda82d9...f675c0fb - 26 commits from branch nexedi:master
    • c0118634 - BusinessTemplate: Import Big File's data property as Pdata

    Compare with previous version

    2019-10-17 08:57:54 UTC

    added 27 commits

    • 9eda82d9...f675c0fb - 26 commits from branch nexedi:master
    • c0118634 - BusinessTemplate: Import Big File's data property as Pdata

    Compare with previous version

    Toggle commit list
  • @jerome Jérome Perrin
    @jerome commented 2019-10-28 03:59:40 UTC
    Owner

    what's the status of this ? looks good to me. So the problem was that when we first import the business template and create the ObjectTemplateItem we are using a very large string and not a PData and even if the .data attribute of the file was already properly created as PData the ObjectTemplateItem ( _path_item of the business template document ) was a very large string, is it ?

  • @xiaowu.zhang Xiaowu Zhang
    @xiaowu.zhang commented 2019-10-28 09:03:09 UTC
    Developer

    when we first import the business template and create the ObjectTemplateItem we are using a very large string and not a PData

    yes

    .data attribute of the file was already properly created as PData

    as i understand, when first import bt5, .data attribute was not yet set as PData, it happens after import bt5 in

    elif obj.__class__.__name__ in ('File', 'Image'):
              if "data" in obj.__dict__:
                File._setData.__func__(obj, obj.data)
    

    and NEO break before it.

    So in this MR, when first import bt5, we set .data attribute as PData asap.

    I'll rebase it against master, if all is ok, i'll merge.

  • @xiaowu.zhang Xiaowu Zhang
    @xiaowu.zhang

    added 23 commits

    • c0118634...969c3e23 - 22 commits from branch nexedi:master
    • 37e1ab36 - BusinessTemplate: Import Big File's data property as Pdata

    Compare with previous version

    2019-10-30 08:14:53 UTC

    added 23 commits

    • c0118634...969c3e23 - 22 commits from branch nexedi:master
    • 37e1ab36 - BusinessTemplate: Import Big File's data property as Pdata

    Compare with previous version

    Toggle commit list
  • Write
  • Preview
Styling with Markdown is supported
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
  • Please register or sign in to post a comment
Assignee
No assignee
Assign to
None
Milestone
None
Assign milestone
Time tracking
0
Labels
None
Assign labels
  • View labels
35
35 participants
+ 28 more
Reference: nexedi/erp5!953
GitLab Nexedi Edition | About GitLab | About Nexedi | 沪ICP备14008524号