1. 03 Mar, 2015 6 commits
    • Kirill Smelkov's avatar
      BigFile: Basic tests · 9bf0d1e1
      Kirill Smelkov authored
      So far BigFile was not unit-tested and because of recent BigFile patches
      and fixes Romain suggested to write tests for it.
      
      We test:
      
          - working with BigFile via its public interface:
      
            * GET/PUT both in plain and with ranges variants,
            *.getData()/.getSize(), and
            * recently-introduced ._appendData()
      
          - that BigFile correctly handles situations where .data is either
            None or str or BTreeData and that migration automatically happens
            to BTreeData on append.
      
      ~~~~
      
      Unlike common case, BigFile more directly works on REQUEST and RESPONSE
      (instead of plain object publishing), so to test it we need not only call
      methods and compare return values but first prepare proper
      request/response, set them up and analyze response headers and content
      after method invocation happened.
      
      For preparing request/response Zope provides utility
      Testing.makerequest() and its 2 variations but for our case they all
      turned out to be not flexible enough - e.g. Testing.makerequest()
      hardcodes request stdin=sys.stdin
      
          https://github.com/zopefoundation/Zope/blob/master/src/Testing/makerequest.py#L56
      
      (and we need to provide it to e.g. upload via PUT), makerequest from
      Products.CMFCore.tests.test_CookieCrumbler hardcodes request environment
      
          http://svn.zope.org/Products.CMFCore/branches/2.2/Products/CMFCore/tests/test_CookieCrumbler.py?revision=126491&view=markup
      
      (and we need it for convenient way to set request headers), etc, so first we
      introduce our own makerequest-alike that
      
         1. always redirects stdout to stringio,
         2. stdin content can be specified and is processed,
         3. returns actual request object (not wrapped portal).
      
      on top of that we introduce two convenience helpers GET and PUT to prepare
      same-named request and then a function to generally invoke a request on object
      and check results - i.e. given object and request, find appropriate method,
      call it appropriately, verify return value, http status code, response body and
      check asserted headers. All that in one line - to keep signal-to-noise ratio high.
      
      ~~~~
      
      There are still some things to fix and improve:
      
         - Zope translates 308 http status code (which BigFile PUT with range
           query returns) to 500 because that code is experimental:
      
           https://github.com/zopefoundation/Zope/blob/master/src/ZPublisher/HTTPResponse.py#L226
           https://github.com/zopefoundation/Zope/blob/master/src/ZPublisher/HTTPResponse.py#L64
      
         - It is not clear (to me) what PUT range query should return for
           empty file. In HTTP/1.1 ranges are specified as both start and end
           inclusive so currently for empty-file case BigFile code returns "0--1"
           (= "0" - "-1") but that is not valid according to HTTP/1.1 spec
      
              http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.35.1
      
           and again, judging from spec, it is not clear how to represent
           range "empty".
      
           For now I've left "0--1" checked as correct, but left a note in
           tests this is dubiously so.
      
         - Support for 'If-Range' and multiple ranges in 'Range' headers is
           not tested.
      
         - There are no scalability tests, i.e. "let's write a lot of data
           into BigFile and see how underlying BTreeData behaves"
      
      So all it is is basic tests so we know general BigFile logic and
      interface work.
      
      Test is done as a "live test" under erp5_big_file bt5 as per Sebastien
      suggestion.
      Helped-by: Sebastien Robin's avatarSebastien Robin <seb@nexedi.com>
      9bf0d1e1
    • Kirill Smelkov's avatar
      bt5/erp5_big_file: Regenerate · e454ebfb
      Kirill Smelkov authored
      Rebuild this bt5 afresh from current ERP5.
      e454ebfb
    • Kirill Smelkov's avatar
      BigFile: Fix most errors reported by pyflakes · d087fba3
      Kirill Smelkov authored
          $ pyflakes product/ERP5/Document/BigFile.py
      
          product/ERP5/Document/BigFile.py:27: 'getToolByName' imported but unused
          product/ERP5/Document/BigFile.py:180: undefined name 'DateTime'
          product/ERP5/Document/BigFile.py:325: local variable 'filename' is assigned to but never used
          product/ERP5/Document/BigFile.py:360: local variable 'data' is assigned to but never used
      
      getToolByName is not used. For DateTime we add appropriate import. data
      was there unused from the beginning - from 00f696ee (Allow to upload in
      chunk.) - for query_range we just return range = [0, current_size-1] and
      data is left unused.
      
      I did not remove filename in
      
          # need to return it as attachment
          filename = self.getStandardFilename(format=format)
          RESPONSE.setHeader('Cache-Control', 'Private') # workaround for Internet Explorer's bug
          RESPONSE.setHeader('Accept-Ranges', 'bytes')
      
      because as the comment says it tries to workaround some IE bug and I
      have no clue is filename needed in that case and was forgotten to be
      appended or it is the other way.
      Reviewed-by: Romain Courteaud's avatarRomain Courteaud <romain@nexedi.com>
      d087fba3
    • Kirill Smelkov's avatar
      BigFile: Factor out code to append data chunk to ._appendData() · f126d0bf
      Kirill Smelkov authored
      So that data could be appended on server code via direct calls too.
      
      NOTE previously ._read_data() accepted data=None argument and callers
      were either providing it with current .data to append or None to forget
      old content and just add new fresh one.
      
      We could drop data=None from _read_data() signature, but leave it as is
      for compatibility with outside code (e.g. zope's
      OFS.Image.File.manage_upload() calls ._read_data(file) without any data
      argument and in that case file content should be recreated, not
      appended).
      
      On the other hand we rework our code in .PUT() so for both "new content"
      and "append range" in the end it always does "append" operation. For it
      to work this way "new content" simply truncates the file before
      proceeding to "append".
      Reviewed-by: Romain Courteaud's avatarRomain Courteaud <romain@nexedi.com>
      f126d0bf
    • Kirill Smelkov's avatar
      BigFile: .data can be BTreeData or None or (possibly non-empty) str · f54e3fa6
      Kirill Smelkov authored
      Current BigFile code in many places assumes .data property is either
      None or BTreeData instance. But as already shown in 4d8f0c33 (BigFile:
      Fix non-range download of just created big file) this is not true and
      .data can be an str.
      
      This leads to situations where code wants to act on an str, like on a
      BTreeData instance, e.g.
      
          def _range_request_handler():
              ...
      
              if data is not None:
                  RESPONSE.setHeader('Last-Modified', rfc1123_date(data._p_mtime))
      
      or
      
          def _read_data(... data=None):
              ...
      
              if data is None:
                  btree = BTreeData()
              else:
                  btree = data
      
              ...
              btree.write(...)
      
      and other places, and in all those situation we'll get AttributeError
      (str has neither ._p_mtime nor .write) and it will crash.
      
      ~~~~
      
      .data can be str at least because '' is the default value for `data`
      field in Data property sheet. From this proposition the code could be
      reorganised to work in "data is either BTreeData or empty (= None or '')"
      
      But we discussed with Romain and his idea is that non empty strings have
      to be too supported because of compatibility reasons and because of
      desire to support possible future automatic File-based documents
      migration to BigFiles.
      
      From this perspective for BigFile the invariant thus
      
          .data is either BTreeData or str (empty or not) or None.
      
      This patch goes through whole BigFile code and corrects places to either
      properly support str case, or None (in e.g. computing len(data) in
      index_html).
      
      In _read_data() if data is previously str - that means we are appending
      content to this file and thus it is a good idea to first convert str
      (empty or not) to BTreeData and then proceed with appending.
      Helped-by: Vincent Pelletier's avatarVincent Pelletier <vincent@nexedi.com>
      Reviewed-by: Romain Courteaud's avatarRomain Courteaud <romain@nexedi.com>
      f54e3fa6
    • Kirill Smelkov's avatar
      BigFile: Factor out "get .data mtime" into helper · 28225e1b
      Kirill Smelkov authored
      Since .data can be BTreeData or None (or as we'll see next and str),
      ._p_mtime() is not always defined on it and in several places current
      code has branches from where to get it.
      
      Move this logic out to a separate helper and the code which needs to
      know mtime gets streamlined.
      Suggested-by: Vincent Pelletier's avatarVincent Pelletier <vincent@nexedi.com>
      28225e1b
  2. 02 Mar, 2015 1 commit
  3. 23 Feb, 2015 1 commit
    • Kirill Smelkov's avatar
      BigFile: Fix non-range download of just created big file · 4d8f0c33
      Kirill Smelkov authored
      If in erp5 I go to big_file_module and 'Add Big File' action and then try to
      download just-created empty bigfile I get a crash:
      
          curl ... --data 'format=raw'  http://localhost:8889/erp5/big_file_module/18
      
          ...
          <h2>Site Error</h2>
      
          <p>An error was encountered while publishing this
           resource.</p>
      
          <p>
            <strong>Error Type:
              AttributeError
            </strong>
            <br />
            <strong>Error Value:
              'str' object has no attribute 'iterate'
            </strong>
            <br />
          </p>
      
      with exception traceback
      
          Traceback (innermost last):
            Module ZPublisher.Publish, line 138, in publish
              request, bind=1)
            Module ZPublisher.mapply, line 77, in mapply
              if debug is not None: return debug(object,args,context)
            Module ZPublisher.Publish, line 48, in call_object
              result=apply(object,args) # Type s<cr> to step into published object.
            Module Products.ERP5.Document.BigFile, line 297, in index_html
              for chunk in data.iterate():
          AttributeError: 'str' object has no attribute 'iterate'
      
      I've compared BigFile code with the sample place in File code from Zope/src/OFS
      (which is base class for BigFile)
      
          https://github.com/zopefoundation/Zope/blob/2.13/src/OFS/Image.py#L420
      
      and in index_html(), if we requested the data itself, there it sees whether
      self.data is either
      
          1) simply bytes (= str in python2), or
      
          2) linked-list of Pdata
      
      and in BigFile we currently miss handling 1) case.
      
      ~~~~
      
      BigFile, it looks, was copied-and-modified from Zope.OFS.Image.File first in
      65121be7 (Support streaming big file in DMS.) Then in index_html download there
      was only an 'iterate over btree chunks' case. Later in dff53681 (Get
      modification date from btree.) a case for
      
          if data is None:
            return ''
      
      was added before btree iteration.
      
      Here we also restore original Zope code for returning file content if it is
      string instance directly, because as it is experimentally observed, that case
      can also happen.
      
      The patch does not add tests, because currently BigFile class does not have
      tests at all (at least I could not find them).
      Reviewed-by: Romain Courteaud's avatarRomain Courteaud <romain@nexedi.com>
      4d8f0c33
  4. 21 Feb, 2015 3 commits
  5. 18 Feb, 2015 1 commit
  6. 17 Feb, 2015 28 commits