1. 15 Apr, 2020 3 commits
    • Kirill Smelkov's avatar
      lib/zodb: Add patch to ZODB.Connection to support callback on connection DB view change · 959ae2d0
      Kirill Smelkov authored
      Wendelin.core 2 will need to hook into when client ZODB.Connection
      changes its database view and readjust WCFS-level client connection
      ZODB.Connection can change its view on either connection reopen, or even
      without reopen on start of new transaction.
      This patch implements ZODB.Connection.onResyncCallback for ZODB5 only.
      ZODB4 and ZODB3 support is TODO.
    • Kirill Smelkov's avatar
      lib/zodb: Add zconn_at draft (ZODB5 only) · 3bd82127
      Kirill Smelkov authored
      For wendelin.core v2 we need a way to know at which particular database
      state application-level ZODB connection is viewing the database. Knowing
      that state, WCFS client library will interact with WCFS filesystem server
      and, in simple terms, request the server to provide data as of that
      particular database state.
      Contrary to ZODB/go[1] ZODB/py does not provide the functionality to
      obtain DB state of connection view, so we have to build it ourselves.
      Let us call the function that for a client ZODB connection returns
      database state corresponding to its database view as zconn_at.
      It is relatively easy to implement zconn_at for ZODB5, since ZODB5
      adopted MVCC uniformly and this patch does just that. However even with
      ZODB5 currently all released ZODB5 versions have race in
      Connection.open() vs invalidations[2], and so the first ZODB5 release
      with which zconn_at implemented here will work reliable should be
      upcoming ZODB 5.5.2
      It is TODO to implement zconn_at for ZODB4 and ZODB3, which organize
      things differently.
      Please note what would happen if zconn_at gives, even a bit, incorrect
      answer: wcfs client will ask wcfs server to provide array data as of
      different database state compared to current on-client ZODB connection.
      This will result in that data accessed via ZBigArray will _not_
      correspond to all other data accessed via regular ZODB mechanism.
      It is, in other words, would be a data corruptions.
      [1] https://godoc.org/lab.nexedi.com/kirr/neo/go/zodb#Connection
      [2] https://github.com/zopefoundation/ZODB/issues/290
    • Kirill Smelkov's avatar
      lib/zodb: Add zmajor - way to know under which ZODB 3, 4 or 5 we are running · 8c0b7471
      Kirill Smelkov authored
      This will be needed in the following patches to know how to inject
      zconn_at or zconn resync functionality into particular ZODB version.
  2. 01 Apr, 2020 2 commits
  3. 18 Dec, 2019 3 commits
    • Kirill Smelkov's avatar
      bigfile/zodb: Factor-out LivePersistent into -> lib/zodb · c02776e9
      Kirill Smelkov authored
      It was from long-ago marked as "XXX move to common place".
    • Kirill Smelkov's avatar
      *: Add package-level documentation to ZODB-related packages · d5e0d2f9
      Kirill Smelkov authored
      Add package-level documentation to
        - bigfile/file_zodb.py,
        - bigarray/array_zodb.py, and
        - lib/zodb.py
      The most interesting read is file_zodb.py .
      Slightly improve documenation for functions in a couple of places.
      Improving documentation was long overdue and it is improved only slightly by this commit.
    • Kirill Smelkov's avatar
      tests: Keep ZEO test database on /tmp/ · 70c998c1
      Kirill Smelkov authored
      We already keep FileStorage test database on /tmp/ and NEO itself (via
      neo.tests.functional.NEOCluster) also keeps test data on tmpfs. However
      test database for ZEO was created in current directory and was wearing
      out SSD unnecessarily.
      FIXME zeo_forker currently does not provide API to keep all server files
      in particular place. This way server conf and log are still emitted in
      current directory, but at least we move data.fs away. Since conf and log
      are uniquely named, e.g. server-<ΧΧΧ>.conf and tmpYYY.log, and it was
      only that Data.fs was named non-uniquely, by moving Data.fs into unique
      per-server place, this also helps with-ZEO tests to execute correctly in
      parallel with `tox -p`.
  4. 12 Jul, 2019 1 commit
    • Kirill Smelkov's avatar
      *: Use defer for dbclose & friends · 5c8340d2
      Kirill Smelkov authored
      For tests this makes sure that if one test fails, it won't make following
      tests fail just because the next test will fail trying to lock test database.
      For regular code (demo_zbigarray.py) this is also a good thing to do -
      to always close the database irregardless of whether an exception was
      raised before program reached end of main.
      Pygolang becomes regular - not test only - dependency. Being regular
      dependency is currently required only by demo_zbigarray.py, but it will
      be also used in upcoming wcfs, so adding pygolang into wendelin.core
      dependencies aligns with the plan.
      dbclose now uses defer almost everywhere - there are still few places in
      tests, where one test function is opening/closing test database multiple
      times - those were not (yet ?) converted.
  5. 29 Oct, 2018 2 commits
    • Kirill Smelkov's avatar
      lib.xnumpy.structured: New utility to create structured view of an array · 32ca80e2
      Kirill Smelkov authored
      Structured creates view of the array interpreting its minor axis as fully covered by a dtype.
      It is similar to arr.view(dtype) + corresponding reshape, but does
      not have limitations of ndarray.view(). For example:
        In [1]: a = np.arange(3*3, dtype=np.int32).reshape((3,3))
        In [2]: a
        array([[0, 1, 2],
               [3, 4, 5],
               [6, 7, 8]], dtype=int32)
        In [3]: b = a[:2,:2]
        In [4]: b
        array([[0, 1],
               [3, 4]], dtype=int32)
        In [5]: dtxy = np.dtype([('x', np.int32), ('y', np.int32)])
        In [6]: dtxy
        Out[6]: dtype([('x', '<i4'), ('y', '<i4')])
        In [7]: b.view(dtxy)
        ValueError                                Traceback (most recent call last)
        <ipython-input-66-af98529aa150> in <module>()
        ----> 1 b.view(dtxy)
        ValueError: To change to a dtype of a different size, the array must be C-contiguous
        In [8]: structured(b, dtxy)
        Out[8]: array([(0, 1), (3, 4)], dtype=[('x', '<i4'), ('y', '<i4')])
      Structured always creates view and never copies data.
      Here is original context where separately playing with .shape and .dtype
      was not enough, since it was creating array copy and OOM'ing the machine:
    • Kirill Smelkov's avatar
      bigarray: Factor-out our custom numpy.lib.stride_tricks.as_strided-alike into lib/xnumpy.py · 6a5dfefa
      Kirill Smelkov authored
      We are going to use this code in another place, so move this out to
      dommon place as a preparatory step first.
      On a related note: Since ArrayRef is generic and quite independent from
      BigArray (it only supports it, but equally it supports just other - e.g.
      plain arrays), the proper place for it might be also to be lib/xnumpy.py .
      We might get to this topic a bit later.
  6. 17 Apr, 2018 2 commits
    • Kirill Smelkov's avatar
      tests: Explicitly close ZODB connections for places with warnings found by previous patch · 01b995a4
      Kirill Smelkov authored
      bigfile/tests/test_filezodb.py ........W: testdb: teardown: <Connection at 7f8fe2b43b90> left not closed by test code; opened by:
        File "/home/kirr/src/wendelin/wendelin.core/bigfile/tests/test_filezodb.py", line 754, in test_bigfile_zblk1_zdata_reuse
        File "/home/kirr/src/wendelin/wendelin.core/bigfile/tests/test_filezodb.py", line 759, in _test_bigfile_zblk1_zdata_reuse
          root = dbopen()
        File "/home/kirr/src/wendelin/wendelin.core/bigfile/tests/test_filezodb.py", line 47, in dbopen
          return testdb.dbopen()
        File "/home/kirr/src/wendelin/wendelin.core/lib/testing.py", line 188, in dbopen
          self.connv.append( (weakref.ref(conn), ''.join(traceback.format_stack())) )
      lib/tests/test_zodb.py .W: testdb: teardown: <Connection at 7f8fe26f13d0> left not closed by test code; opened by:
        File "/home/kirr/src/wendelin/wendelin.core/lib/tests/test_zodb.py", line 49, in test_deactivate_btree
          root = dbopen()
        File "/home/kirr/src/wendelin/wendelin.core/lib/tests/test_zodb.py", line 30, in dbopen
          return testdb.dbopen()
        File "/home/kirr/src/wendelin/wendelin.core/lib/testing.py", line 188, in dbopen
          self.connv.append( (weakref.ref(conn), ''.join(traceback.format_stack())) )
    • Kirill Smelkov's avatar
      tests: Force-close ZODB connections in teardown, that testing code forgot to explicitly close · 5a5ed2c7
      Kirill Smelkov authored
      If a test forgets to explicitly close ZODB connection it was using, this
      connection stays alive in transaction synchronizers (it is a weakset),
      and continues to be used on e.g. transaction.commit() when all
      synchronizers are invoked. This could lead to crashes like below when
      underlying ZODB storage was closed by test module teardown and testing
      moved on to another test module:
          $ WENDELIN_CORE_TEST_DB="<neo>" py.test  bigfile/tests/test_filezodb.py::test_bigfile_zblk1_zdata_reuse lib/tests/test_zodb.py
          ======= test session starts ========
          platform linux2 -- Python 2.7.14+, pytest-3.5.0, py-1.5.3, pluggy-0.6.0
          rootdir: /home/kirr/src/wendelin/wendelin.core, inifile:
          collected 2 items
          bigfile/tests/test_filezodb.py .                                     [ 50%]
          lib/tests/test_zodb.py F                                             [100%]
          ______ test_deactivate_btree _______
              def test_deactivate_btree():
                  root = dbopen()
                  # init btree with many leaf nodes
                  leafv = []
                  root['btree'] = B = IOBTree()
                  for i in range(10000):
                      B[i] = xi = XInt(i)
          >       transaction.commit()
          _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
          ../venv/z5/local/lib/python2.7/site-packages/transaction/_manager.py:131: in commit
              return self.get().commit()
          ../venv/z5/local/lib/python2.7/site-packages/transaction/_transaction.py:316: in commit
              self._synchronizers.map(lambda s: s.afterCompletion(self))
          ../venv/z5/local/lib/python2.7/site-packages/transaction/weakset.py:62: in map
          ../venv/z5/local/lib/python2.7/site-packages/transaction/_transaction.py:316: in <lambda>
              self._synchronizers.map(lambda s: s.afterCompletion(self))
          ../venv/z5/local/lib/python2.7/site-packages/ZODB/Connection.py:757: in afterCompletion
              self.newTransaction(transaction, False)
          ../venv/z5/local/lib/python2.7/site-packages/ZODB/Connection.py:737: in newTransaction
              invalidated = self._storage.poll_invalidations()
          ../venv/z5/local/lib/python2.7/site-packages/ZODB/mvccadapter.py:131: in poll_invalidations
              self._start = p64(u64(self._storage.lastTransaction()) + 1)
          _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
          self = <neo.client.Storage.Storage object at 0x7ffa1be8d410>
              def lastTransaction(self):
                  # Used in ZODB unit tests
          >       return self.app.last_tid
          E       AttributeError: 'NoneType' object has no attribute 'last_tid'
          ../../neo/src/lab.nexedi.com/kirr/neo/neo/client/Storage.py:181: AttributeError
      where NEO's Storage.app is None because the storage was closed.
      To avoid such kind of failures make sure TestDB.teardown() always closes
      all ZODB connections that were ever opened via TestDB.dbopen().
      Add a warning about such force-closing with information about corresponding
      connection and code place that created it, so that it is easy to
      understand which test needs a fix.
      /suggested-by @jm
  7. 16 Apr, 2018 1 commit
    • Kirill Smelkov's avatar
      tests: Teach to run with ZEO5 · 808b59b7
      Kirill Smelkov authored
      Since 7fc4ec66 (tests: Allow to test with ZEO & NEO ZODB storages) we
      can run the tests with either FileStorage, ZEO or NEO. But ZEO test
      adapter started to fail with ZEO5:
      	self = <wendelin.lib.testing.TestDB_ZEO object at 0x7f1feb5091d0>
      	    def setup(self):
      	        port  = self.zeo_forker.get_port()
      	        zconf = self.zeo_forker.ZEOConfig(('', port))
      	        self.addr, self.adminaddr, self.pid, self.path = \
      	>               self.zeo_forker.start_zeo_server(zeo_conf=zconf, port=port)
      	E       ValueError: need more than 2 values to unpack
      This is because in ZEO5 forker.start_zeo_server() was reworked to return only
      addr and stop closure instead of returning all details and relying on caller to
      implement stop itself.
      Adapt the test to detect ZEO5 and use new calling convention.
  8. 21 Feb, 2018 1 commit
  9. 24 Oct, 2017 1 commit
    • Kirill Smelkov's avatar
      Relicense to GPLv3+ with wide exception for all Free Software / Open Source... · f11386a4
      Kirill Smelkov authored
      Relicense to GPLv3+ with wide exception for all Free Software / Open Source projects + Business options.
      Nexedi stack is licensed under Free Software licenses with various exceptions
      that cover three business cases:
      - Free Software
      - Proprietary Software
      - Rebranding
      As long as one intends to develop Free Software based on Nexedi stack, no
      license cost is involved. Developing proprietary software based on Nexedi stack
      may require a proprietary exception license. Rebranding Nexedi stack is
      prohibited unless rebranding license is acquired.
      Through this licensing approach, Nexedi expects to encourage Free Software
      development without restrictions and at the same time create a framework for
      proprietary software to contribute to the long term sustainability of the
      Nexedi stack.
      Please see https://www.nexedi.com/licensing for details, rationale and options.
  10. 14 Aug, 2016 1 commit
    • Kirill Smelkov's avatar
      bigfile/zodb/ZBlk1: Don't miss to deactivate/free internal .chunktab buckets in loadblkdata() · 542917d1
      Kirill Smelkov authored
      13c0c17c (bigfile/zodb: Format #1 which is optimized for small changes)
      used BTree to organize ZBlk1 block's chunks and for loadblkdata() added
      "TODO we are missing to free internal BTree structures on data load".
      #3 besides other
      things showed that even when we deactivate ZData objects, we are still
      keeping them as ghosts occupying memory and the same for IOBucket
      This all happens because there is no proper way to deactivate whole
      btree - including internal buckets objects. And since internal buckets
      are not deactivated, they stay in picklecache and thus hold a reference
      to ZData objects and ZData objects in turn, even if explicitly
      deactivated, stay in memory.
      We can fix this all via implementing whole-btree deactivation procedure.
      To do so we need to iterate over all btree buckets recursively, but
      unfortunately there is no BTree API to access/iterate btree's buckets.
      We can however still get reference to first top-level buckets via
      gc.get_referents(btree) and then scan buckets further without hacks.
      gc.get_referents(btree) is a hack, but
      - it works in O(1)  (we only get pointers from btree, not scanning all
        gcable objects and deducing them)
      - it works reliable if we filter out non-interesting objects.
      So in the end it works.
      Before the patch loading more and more ZBlk1 data with objgraph
      instrumentation was showing itself like
          #                                    Nobj        δ
          wendelin.bigfile.file_zodb.ZData     7168      +512
          BTrees.IOBTree.IOBucket               238       +17
          BTrees.IOBTree.IOBTree                 14        +1
      and after this patch we now have
          BTrees.IOBTree.IOBTree                 14        +1
      we cannot remove that "IOBTree + 1", since ZBlk1 is holding direct
      reference on it (via .chunktab) and we have to keep ZBlk1 live with
      ._v_zfile and ._v_zblk set for invalidation to work. "+1 IOBtree" is
      however small - 144 bytes per 2M (= 0.006%) so we can neglect that the
      same way we neglect keeping ZBlk1 staying live for each block.
  11. 23 Sep, 2015 2 commits
    • Kirill Smelkov's avatar
      lib/mem: Allow memcpy destination to be bigger than source · 520cbc6f
      Kirill Smelkov authored
      i.e. it is ok to copy smaller data into larger buffer.
    • Kirill Smelkov's avatar
      lib/mem: Allow memcpy & friends to work on arbitrary-length buffer · a35106c2
      Kirill Smelkov authored
      - not only multiple of 8. We can do it by using uint8 typed arrays, and
      it does not hurt performance:
      In [1]: from wendelin.lib.mem import bzero, memset, memcpy
      In [2]: A = bytearray(2*1024*1024)
      In [3]: B = bytearray(2*1024*1024)
              memcpy(B, A)    bzero(A)        memset(A, 0xff)
      old:    718 µs          227 µs / 1116   228 µs / 1055 (*)
      new:    718 µs          176 µs / 1080   175 µs / 1048
          (*) the second number comes from e.g.
              In [8]: timeit bzero(A)
              The slowest run took 4.63 times longer than the fastest.
              This could mean that an intermediate result is being cached
              10000 loops, best of 3: 228 µs per loop
              so the second number is more realistic and says performance
              stays aproximately the same and only slightly improves.
  12. 06 Aug, 2015 4 commits
  13. 26 Jun, 2015 1 commit
    • Kirill Smelkov's avatar
      tests: Allow to test with ZEO & NEO ZODB storages · 7fc4ec66
      Kirill Smelkov authored
      Previously we were always testing with DBs backed up by FileStorage. Now
      we provide a way to run the testsuite with user selected storage
          $ WENDELIN_CORE_TEST_DB="<fs>"   make test.py     # test with temporary db with FileStorage
          $ WENDELIN_CORE_TEST_DB="<zeo>"  make test.py     # ----------//---------- with ZEO
          $ WENDELIN_CORE_TEST_DB="<neo>"  make test.py     # ----------//---------- with NEO
          $ WENDELIN_CORE_TEST_DB=neo://db@master  make test.py     # test with externally provided DB
      Default is still to run tests with FileStorage.
      /cc @jm
  14. 25 Jun, 2015 2 commits
  15. 02 Jun, 2015 1 commit
    • Kirill Smelkov's avatar
      *: It is not safe to use multiply.reduce() - it overflows · 73926487
      Kirill Smelkov authored
          In [1]: multiply.reduce((1<<30, 1<<30, 1<<30))
          Out[1]: 0
      instead of
          In [2]: (1<<30) * (1<<30) * (1<<30)
          Out[2]: 1237940039285380274899124224
          In [3]: 1<<90
          Out[3]: 1237940039285380274899124224
      also multiply.reduce returns int64, instead of python int:
          In [4]: type( multiply.reduce([1,2,3]) )
          Out[4]: numpy.int64
      which also leads to overflow-related problems if we further compute with
      this value and other integers and results exceeds int64 - it becomes
          In [5]: idx0_stop = 18446744073709551615
          In [6]: stride0   = numpy.int64(1)
          In [7]: byte0_stop = idx0_stop * stride0
          In [8]: byte0_stop
          Out[8]: 1.8446744073709552e+19
      and then it becomes a real problem for BigArray.__getitem__()
          wendelin.core/bigarray/__init__.py:326: RuntimeWarning: overflow encountered in long_scalars
            page0_min  = min(byte0_start, byte0_stop+byte0_stride) // pagesize # TODO -> fileh.pagesize
      and then
          >           vma0 = self._fileh.mmap(page0_min, page0_max-page0_min+1)
          E           TypeError: integer argument expected, got float
      So just avoid multiple.reduce() and do our own mul() properly the same
      way sum() is builtin into python, and we avoid overflow-related
  16. 03 Apr, 2015 4 commits
    • Kirill Smelkov's avatar
      bigfile: Basic benchmarks · bb9d8bf1
      Kirill Smelkov authored
          - for virtual memory subsytem
          - for ZBigFiles
      They are not currently great, e.g. for virtmem we have in-kernel
      overhead of page clearing - in perf profiles, for bigfile_mmap compared
      to file_read kernel's clear_page_c raises significantly.
      That is the worker for clearing page memory and we currently cannot
      avoid that - any memory obtained from kernel (MAP_ANONYMOUS, mmap(file)
      with hole, etc...) comes pre-initialized to zeros to userspace.
      This can be seen in the benchmarks as well: file_readbig differs from
      file_read in only that the latter uses 1 small buffer and the first
      allocates large memory (cleared by kernel + python does the memset).
          bigfile/tests/bench_virtmem.py@125::bench_file_mmap_adler32     0.47  (0.86 0.49 0.47)
          bigfile/tests/bench_virtmem.py@126::bench_file_read_adler32     0.69  (1.11 0.71 0.69)
          bigfile/tests/bench_virtmem.py@127::bench_file_readbig_adler32  1.41  (1.70 1.42 1.41)
          bigfile/tests/bench_virtmem.py@128::bench_bigfile_mmap_adler32  1.42  (1.45 1.42 1.51)
          bigfile/tests/bench_virtmem.py@130::bench_file_mmap_md5         1.52  (1.91 1.54 1.52)
          bigfile/tests/bench_virtmem.py@131::bench_file_read_md5         1.73  (2.10 1.75 1.73)
          bigfile/tests/bench_virtmem.py@132::bench_file_readbig_md5      2.44  (2.73 2.46 2.44)
          bigfile/tests/bench_virtmem.py@133::bench_bigfile_mmap_md5      2.40  (2.48 2.40 2.53)
      There is MAP_UNINITIALIZED which works only for non-mmu targets and only
      if explicitly allowed when configuring kernel (off by default).
      There were patches to disable that pages zeroing, as it gives
      significant speedup for people's workloads, e.g. [1,2] but all of them
      did not got merged for security reasons.
      [1] http://marc.info/?t=132691315900001&r=1&w=2
      [2] http://thread.gmane.org/gmane.linux.kernel/548926
      For ZBigFile - it is the storage who is dominating in profiles.
    • Kirill Smelkov's avatar
      lib/mem: Python utilities to zero, set & copy memory · 699b1375
      Kirill Smelkov authored
      Like C bzero / memset & memcopy - but work on python buffers. We
      leverage NumPy for doing actual work, and this way NumPy becomes a
      Having NumPy as a dependency is ok - we'll for sure need it later as we
      are trying to build out-of-core ndarrays.
    • Kirill Smelkov's avatar
      lib/utils: Small C utilities we'll use · 3e5e78cd
      Kirill Smelkov authored
      Like taking an exact integer log2, upcasting pointers for C-style
      inheritance done in a Plan9 way, and wrappers to functions which should
      never fail.
    • Kirill Smelkov's avatar
      lib/bug: Small utilities to say there is a warning, or a todo, or a bug on condition · db28e53f
      Kirill Smelkov authored
      Modelled by ones used in Linux kernel.