1. 15 Apr, 2020 3 commits
      lib/zodb: Add zstor_2zurl - way to convert a ZODB storage into URL to access it · 6637d216
      Kirill Smelkov authored
      Wendelin.core 2 will need to spawn a WCFS filesystem server that accesses
      the same ZODB database as the program that spawns it. The database
      argument passed to WCFS is given in the form of a URL[1,2]. Even though
      zodburi provides a way to convert a URL into a ZODB storage instance,
      there is currently no way to do the reverse, i.e. to convert a ZODB
      storage instance into a URL to access it(*). So we have to build it on
      our own.
      
      Provide a zstor_2zurl stub that currently works for FileStorage only.
      ZEO and NEO support is TODO.
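      A minimal sketch of what a FileStorage-only zstor_2zurl could look
      like. It assumes that a FileStorage object exposes the path of its data
      file as ._file_name (an implementation detail, not public API) and that
      the target URL form is zodburi's file:// scheme. FakeFileStorage is a
      hypothetical stand-in so the example runs without ZODB installed; a
      real version would more likely check isinstance(zstor, FileStorage).

```python
import os


def zstor_2zurl(zstor):
    """Return a zodburi-style URL to access zstor (sketch, FileStorage only).

    Assumption: a FileStorage-like object keeps its data file path in
    ._file_name. ZEO and NEO are not handled (TODO, as in the text).
    """
    path = getattr(zstor, "_file_name", None)
    if path is not None:
        # FileStorage URLs have the form file:///path/to/Data.fs
        return "file://" + os.path.abspath(path)
    raise NotImplementedError("don't know how to convert %r into URL" % (zstor,))


class FakeFileStorage(object):
    """Hypothetical stand-in for ZODB.FileStorage.FileStorage."""

    def __init__(self, path):
        self._file_name = path


print(zstor_2zurl(FakeFileStorage("/tmp/data.fs")))  # file:///tmp/data.fs
```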
      
      In the future we might want to move this functionality into
      zodbtools/py.
      
      [1] https://lab.nexedi.com/nexedi/zodbtools/blob/a2e4dd23/zodbtools/help.py#L27-53
      [2] https://lab.nexedi.com/kirr/neo/blob/3d909114/go/zodb/zodbtools/help.go#L25-51
      
      (*) unlike ZODB/go, where this functionality is provided out of the box:
          https://godoc.org/lab.nexedi.com/kirr/neo/go/zodb#IStorage
      lib/zodb: Add patch to ZODB.Connection to support callback on connection DB view change · 959ae2d0
      Kirill Smelkov authored
      Wendelin.core 2 will need to hook into the moment when a client
      ZODB.Connection changes its database view, and readjust the WCFS-level
      client connection accordingly.
      
      ZODB.Connection can change its view either on connection reopen or,
      even without reopen, at the start of a new transaction.
      
      This patch implements ZODB.Connection.onResyncCallback for ZODB5 only.
      
      ZODB4 and ZODB3 support is TODO.
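      The patching idea can be sketched as wrapping the Connection methods
      that may change the database view so that they invoke registered
      callbacks afterwards. This is an illustration only: ToyConnection and
      add_resync_hook are hypothetical, treating open() and newTransaction()
      as the hook points follows the description above (reopen and start of
      a new transaction) but is an assumption about ZODB5 internals, and the
      callback signature (no arguments) is assumed.

```python
import functools


def add_resync_hook(cls, *methnames):
    """Patch cls so that each named method, after it runs, calls every
    callback registered on the instance via conn.onResyncCallback(f)."""

    def on_resync_callback(self, f):
        # Per-instance list of callbacks, created lazily.
        self.__dict__.setdefault("_resync_callbacks", []).append(f)

    cls.onResyncCallback = on_resync_callback

    def make_wrapper(orig):
        @functools.wraps(orig)
        def wrapper(self, *argv, **kw):
            result = orig(self, *argv, **kw)
            # The view may have changed: notify all registered callbacks.
            for f in self.__dict__.get("_resync_callbacks", ()):
                f()
            return result
        return wrapper

    for name in methnames:
        setattr(cls, name, make_wrapper(getattr(cls, name)))


class ToyConnection(object):
    """Hypothetical stand-in for ZODB.Connection.Connection."""

    def open(self):
        pass

    def newTransaction(self):
        pass


add_resync_hook(ToyConnection, "open", "newTransaction")
conn = ToyConnection()
calls = []
conn.onResyncCallback(lambda: calls.append("resync"))
conn.open()
conn.newTransaction()
print(calls)  # ['resync', 'resync']
```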
      lib/zodb: Add zconn_at draft (ZODB5 only) · 3bd82127
      Kirill Smelkov authored
      For wendelin.core v2 we need a way to know at which particular database
      state an application-level ZODB connection is viewing the database.
      Knowing that state, the WCFS client library will interact with the WCFS
      filesystem server and, in simple terms, request the server to provide
      data as of that particular database state.
      
      Unlike ZODB/go[1], ZODB/py does not provide the functionality to
      obtain the DB state of a connection's view, so we have to build it
      ourselves. Let us call the function that, for a client ZODB connection,
      returns the database state corresponding to its database view zconn_at.
      
      It is relatively easy to implement zconn_at for ZODB5, since ZODB5
      adopted MVCC uniformly, and this patch does just that. However, even
      with ZODB5, all currently released ZODB5 versions have a race between
      Connection.open() and invalidations[2], so the first ZODB5 release
      with which zconn_at as implemented here will work reliably should be
      the upcoming ZODB 5.5.2.
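      A sketch of the ZODB5 case, assuming that the per-connection MVCC
      storage adapter (zconn._storage) keeps the exclusive upper bound of the
      connection's view in a ._start attribute; both attributes are ZODB5
      internals and an assumption here. The view sees transactions strictly
      before ._start, so the state it corresponds to is ._start - 1. p64 and
      u64 are reimplemented with struct to keep the example self-contained
      (ZODB.utils provides functions with the same names).

```python
import struct


def p64(v):
    """Pack an unsigned 64-bit integer into an 8-byte big-endian tid."""
    return struct.pack(">Q", v)


def u64(b):
    """Unpack an 8-byte big-endian tid into an integer."""
    return struct.unpack(">Q", b)[0]


def before2at(before):
    """Convert an exclusive 'before' bound into the inclusive 'at' state.

    A view that sees data committed strictly before tid `before`
    corresponds to the database state as of tid `before - 1`.
    """
    return p64(u64(before) - 1)


def zconn_at(zconn):
    """Return the tid of the DB state zconn is viewing (ZODB5-only sketch)."""
    # Assumption: ZODB5's MVCC adapter instance stores the view bound in ._start.
    return before2at(zconn._storage._start)


print(u64(before2at(p64(258))))  # 257
```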
      
      It is TODO to implement zconn_at for ZODB4 and ZODB3, which organize
      things differently.
      
      Please note what would happen if zconn_at gave an incorrect answer,
      even slightly: the wcfs client would ask the wcfs server to provide
      array data as of a database state different from that of the current
      on-client ZODB connection. As a result, data accessed via ZBigArray
      would _not_ correspond to all other data accessed via the regular ZODB
      mechanism. In other words, it would be data corruption.
      
      [1] https://godoc.org/lab.nexedi.com/kirr/neo/go/zodb#Connection
      [2] https://github.com/zopefoundation/ZODB/issues/290
  2. 01 Apr, 2020 1 commit
  3. 18 Dec, 2019 1 commit
  4. 12 Jul, 2019 1 commit
      *: Use defer for dbclose & friends · 5c8340d2
      Kirill Smelkov authored
      For tests this makes sure that if one test fails, it won't make the
      following tests fail just because they cannot lock the test database
      that the failed test left open.

      For regular code (demo_zbigarray.py) this is also a good thing to do:
      always close the database regardless of whether an exception was
      raised before the program reached the end of main.
      
      Pygolang becomes a regular (not test-only) dependency. As a regular
      dependency it is currently required only by demo_zbigarray.py, but it
      will also be used in the upcoming wcfs, so adding pygolang to
      wendelin.core dependencies aligns with the plan.
      
      dbclose now uses defer almost everywhere. There are still a few places
      in tests where one test function opens/closes the test database
      multiple times; those were not (yet?) converted.
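      The commit uses pygolang's defer (decorating a function with @func and
      calling defer(db.close) inside it). The same guarantee can be sketched
      with the standard library's contextlib.ExitStack, whose callbacks run
      in LIFO order on any exit from the with block, much like deferred
      calls. This is a stdlib analogue, not the commit's code; ToyDB and main
      are hypothetical names used only for illustration.

```python
import contextlib


class ToyDB(object):
    """Hypothetical stand-in for a ZODB DB object."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True


def main(db, fail=False):
    with contextlib.ExitStack() as stack:
        # Like defer(db.close): registered right after open, it runs on
        # normal return as well as when an exception propagates.
        stack.callback(db.close)
        if fail:
            raise RuntimeError("simulated failure before end of main")
        return "ok"
```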
  5. 29 Oct, 2018 1 commit
      lib.xnumpy.structured: New utility to create structured view of an array · 32ca80e2
      Kirill Smelkov authored
      Structured creates a view of the array, interpreting its minor axis as fully covered by a dtype.

      It is similar to arr.view(dtype) + the corresponding reshape, but does
      not have the limitations of ndarray.view(). For example:
      
        In [1]: a = np.arange(3*3, dtype=np.int32).reshape((3,3))
      
        In [2]: a
        Out[2]:
        array([[0, 1, 2],
               [3, 4, 5],
               [6, 7, 8]], dtype=int32)
      
        In [3]: b = a[:2,:2]
      
        In [4]: b
        Out[4]:
        array([[0, 1],
               [3, 4]], dtype=int32)
      
        In [5]: dtxy = np.dtype([('x', np.int32), ('y', np.int32)])
      
        In [6]: dtxy
        Out[6]: dtype([('x', '<i4'), ('y', '<i4')])
      
        In [7]: b.view(dtxy)
        ---------------------------------------------------------------------------
        ValueError                                Traceback (most recent call last)
        <ipython-input-66-af98529aa150> in <module>()
        ----> 1 b.view(dtxy)
      
        ValueError: To change to a dtype of a different size, the array must be C-contiguous
      
        In [8]: structured(b, dtxy)
        Out[8]: array([(0, 1), (3, 4)], dtype=[('x', '<i4'), ('y', '<i4')])
      
      Structured always creates a view and never copies data.
      
      Here is the original context, where separately playing with .shape and .dtype
      was not enough, since it was creating an array copy and OOM'ing the machine:
      
      klaus/wendelin@cbe4938b
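      One possible way to build such a structured view, sketched here and not
      necessarily how lib.xnumpy does it: describe the same memory to numpy
      through __array_interface__ as an array of raw void items of
      dtype.itemsize bytes, dropping the minor axis from shape and strides,
      and then .view() it as the target dtype. Since the itemsize no longer
      changes at that point, .view() works for any strides. The sketch
      assumes the minor axis is contiguous and exactly covers the dtype;
      structured_sketch and _ArrayInterface are hypothetical names.

```python
import numpy as np


class _ArrayInterface(object):
    # Minimal holder: numpy builds an array from __array_interface__ without
    # copying; .base keeps the original array alive.
    def __init__(self, interface, base):
        self.__array_interface__ = interface
        self.base = base


def structured_sketch(arr, dtype):
    """Return a view of arr with its minor axis reinterpreted as one dtype item."""
    dtype = np.dtype(dtype)
    if arr.ndim < 1:
        raise ValueError("need at least 1-dimensional array")
    if arr.shape[-1] * arr.itemsize != dtype.itemsize:
        raise ValueError("minor axis does not exactly cover dtype")
    if arr.shape[-1] > 1 and arr.strides[-1] != arr.itemsize:
        raise ValueError("minor axis is not contiguous")
    iface = {
        "version": 3,
        "shape": arr.shape[:-1],
        "strides": arr.strides[:-1],
        "typestr": "|V%d" % dtype.itemsize,       # raw items of the right size
        "data": arr.__array_interface__["data"],  # (pointer, readonly)
    }
    raw = np.asarray(_ArrayInterface(iface, arr))
    # Same itemsize, so this .view() is allowed regardless of strides.
    return raw.view(dtype)


a = np.arange(3 * 3, dtype=np.int32).reshape((3, 3))
b = a[:2, :2]
dtxy = np.dtype([("x", np.int32), ("y", np.int32)])
s = structured_sketch(b, dtxy)
print(s["x"], s["y"])  # [0 3] [1 4]
```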
  6. 17 Apr, 2018 1 commit
      tests: Explicitly close ZODB connections for places with warnings found by previous patch · 01b995a4
      Kirill Smelkov authored
      bigfile/tests/test_filezodb.py ........W: testdb: teardown: <Connection at 7f8fe2b43b90> left not closed by test code; opened by:
        ...
        File "/home/kirr/src/wendelin/wendelin.core/bigfile/tests/test_filezodb.py", line 754, in test_bigfile_zblk1_zdata_reuse
          _test_bigfile_zblk1_zdata_reuse()
        File "/home/kirr/src/wendelin/wendelin.core/bigfile/tests/test_filezodb.py", line 759, in _test_bigfile_zblk1_zdata_reuse
          root = dbopen()
        File "/home/kirr/src/wendelin/wendelin.core/bigfile/tests/test_filezodb.py", line 47, in dbopen
          return testdb.dbopen()
        File "/home/kirr/src/wendelin/wendelin.core/lib/testing.py", line 188, in dbopen
          self.connv.append( (weakref.ref(conn), ''.join(traceback.format_stack())) )
      
      lib/tests/test_zodb.py .W: testdb: teardown: <Connection at 7f8fe26f13d0> left not closed by test code; opened by:
        ...
        File "/home/kirr/src/wendelin/wendelin.core/lib/tests/test_zodb.py", line 49, in test_deactivate_btree
          root = dbopen()
        File "/home/kirr/src/wendelin/wendelin.core/lib/tests/test_zodb.py", line 30, in dbopen
          return testdb.dbopen()
        File "/home/kirr/src/wendelin/wendelin.core/lib/testing.py", line 188, in dbopen
          self.connv.append( (weakref.ref(conn), ''.join(traceback.format_stack())) )
  7. 24 Oct, 2017 1 commit
      Relicense to GPLv3+ with wide exception for all Free Software / Open Source... · f11386a4
      Kirill Smelkov authored
      Relicense to GPLv3+ with wide exception for all Free Software / Open Source projects + Business options.
      
      Nexedi stack is licensed under Free Software licenses with various exceptions
      that cover three business cases:
      
      - Free Software
      - Proprietary Software
      - Rebranding
      
      As long as one intends to develop Free Software based on Nexedi stack, no
      license cost is involved. Developing proprietary software based on Nexedi stack
      may require a proprietary exception license. Rebranding Nexedi stack is
      prohibited unless rebranding license is acquired.
      
      Through this licensing approach, Nexedi expects to encourage Free Software
      development without restrictions and at the same time create a framework for
      proprietary software to contribute to the long term sustainability of the
      Nexedi stack.
      
      Please see https://www.nexedi.com/licensing for details, rationale and options.
  8. 14 Aug, 2016 1 commit
      bigfile/zodb/ZBlk1: Don't miss to deactivate/free internal .chunktab buckets in loadblkdata() · 542917d1
      Kirill Smelkov authored
      13c0c17c (bigfile/zodb: Format #1 which is optimized for small changes)
      used a BTree to organize ZBlk1 block's chunks and, for loadblkdata(),
      added "TODO we are missing to free internal BTree structures on data load".
      
      #3, among other things, showed that even when we deactivate ZData
      objects, we still keep them as ghosts occupying memory, and the same
      happens for IOBucket objects.
      
      This all happens because there is no proper way to deactivate a whole
      btree, including its internal bucket objects. And since internal
      buckets are not deactivated, they stay in the pickle cache and thus
      hold references to ZData objects, and the ZData objects in turn, even
      if explicitly deactivated, stay in memory.
      
      We can fix all this by implementing a whole-btree deactivation procedure.
      
      To do so we need to iterate over all btree buckets recursively, but
      unfortunately there is no BTree API to access/iterate a btree's buckets.
      We can, however, still get references to the first, top-level buckets
      via gc.get_referents(btree) and then scan further buckets without hacks.
      
      gc.get_referents(btree) is a hack, but

      - it works in O(1)  (we only get pointers from btree, without scanning
        all gc-tracked objects and deducing them)
      - it works reliably if we filter out non-interesting objects.

      So in the end it works.
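      The whole-btree deactivation can be sketched as a depth-first walk that
      discovers each node's children with gc.get_referents(), filters them
      with a caller-supplied predicate, and deactivates every visited node.
      ToyNode is a hypothetical stand-in for BTree/bucket/ZData objects; it
      uses __slots__, whose values CPython's gc traversal reports as
      referents. With real persistent objects a ghost has no loaded
      children, so a real version would probably have to _p_activate() a
      node before scanning it; that step is omitted here.

```python
import gc


def deactivate_btree(btree, is_node):
    """Deactivate btree and every node reachable from it (sketch).

    gc.get_referents() discovers a node's direct referents; is_node filters
    out the non-interesting ones (keys, types, None, ...).
    """
    stack = [btree]
    seen = set()
    while stack:
        node = stack.pop()
        if id(node) in seen:
            continue
        seen.add(id(node))
        # Collect children before deactivating: a real ghost drops its
        # state, and with it the references to its children.
        stack.extend(ref for ref in gc.get_referents(node) if is_node(ref))
        node._p_deactivate()


class ToyNode(object):
    """Hypothetical stand-in for BTree/Bucket/ZData persistent objects."""
    # With __slots__, CPython's gc traversal reports slot values as referents.
    __slots__ = ("left", "right", "active")

    def __init__(self, left=None, right=None):
        self.left, self.right, self.active = left, right, True

    def _p_deactivate(self):
        self.active = False


leaf1, leaf2, leaf3 = ToyNode(), ToyNode(), ToyNode()
inner = ToyNode(leaf1, leaf2)
root = ToyNode(inner, leaf3)
deactivate_btree(root, lambda obj: isinstance(obj, ToyNode))
print([n.active for n in (root, inner, leaf1, leaf2, leaf3)])  # all False
```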
      
      Before the patch, loading more and more ZBlk1 data with objgraph
      instrumentation showed the following:
      
          #                                    Nobj        δ
          wendelin.bigfile.file_zodb.ZData     7168      +512
          BTrees.IOBTree.IOBucket               238       +17
          BTrees.IOBTree.IOBTree                 14        +1
      
      and after this patch we now have
      
          BTrees.IOBTree.IOBTree                 14        +1
      
      We cannot remove that "+1 IOBTree", since ZBlk1 holds a direct
      reference to it (via .chunktab), and we have to keep ZBlk1 live with
      ._v_zfile and ._v_zblk set for invalidation to work. "+1 IOBTree" is,
      however, small (144 bytes per 2M, ≈ 0.007%), so we can neglect it the
      same way we neglect keeping ZBlk1 live for each block.
  9. 02 Jun, 2015 1 commit
      *: It is not safe to use multiply.reduce() - it overflows · 73926487
      Kirill Smelkov authored
      e.g.
      
          In [1]: multiply.reduce((1<<30, 1<<30, 1<<30))
          Out[1]: 0
      
      instead of
      
          In [2]: (1<<30) * (1<<30) * (1<<30)
          Out[2]: 1237940039285380274899124224
      
          In [3]: 1<<90
          Out[3]: 1237940039285380274899124224
      
      Also, multiply.reduce returns numpy.int64 instead of a Python int:
      
          In [4]: type( multiply.reduce([1,2,3]) )
          Out[4]: numpy.int64
      
      which also leads to overflow-related problems if we further compute
      with this value and other integers and the result exceeds int64: it
      becomes float:
      
          In [5]: idx0_stop = 18446744073709551615
      
          In [6]: stride0   = numpy.int64(1)
      
          In [7]: byte0_stop = idx0_stop * stride0
      
          In [8]: byte0_stop
          Out[8]: 1.8446744073709552e+19
      
      and then it becomes a real problem for BigArray.__getitem__()
      
          wendelin.core/bigarray/__init__.py:326: RuntimeWarning: overflow encountered in long_scalars
            page0_min  = min(byte0_start, byte0_stop+byte0_stride) // pagesize # TODO -> fileh.pagesize
      
      and then
      
          >           vma0 = self._fileh.mmap(page0_min, page0_max-page0_min+1)
          E           TypeError: integer argument expected, got float
      
      ~~~~
      
      So just avoid multiply.reduce() and implement our own mul() properly,
      the same way sum() is built into Python, and thus avoid
      overflow-related problems.
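      A minimal sketch of such a mul(), assuming it only needs to multiply
      integers (for example array shapes): every item is converted to a
      Python int before multiplying, so numpy integer scalars cannot wrap
      around either. This is an illustration, not necessarily the exact
      function added by the commit. On Python 3.8+ math.prod() is a stdlib
      alternative for plain Python ints, but unlike this sketch it would keep
      numpy.int64 items as they are.

```python
import functools
import operator

import numpy as np


def mul(seq):
    """Return the product of the items of seq as an exact Python int.

    Like builtin sum(), but for multiplication. Items are converted with
    int() first, so numpy integer scalars cannot overflow either; int()
    would silently truncate floats, so this is meant for integers only.
    """
    return functools.reduce(operator.mul, (int(x) for x in seq), 1)


print(mul((1 << 30, 1 << 30, 1 << 30)) == 1 << 90)  # True
print(type(mul(np.array([1, 2, 3]))).__name__)       # int
```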