- 15 Apr, 2020 3 commits
-
-
Kirill Smelkov authored
Wendelin.core 2 will need to hook into when client ZODB.Connection changes its database view and readjust WCFS-level client connection accordingly. ZODB.Connection can change its view on either connection reopen, or even without reopen on start of new transaction. This patch implements ZODB.Connection.onResyncCallback for ZODB5 only. ZODB4 and ZODB3 support is TODO.
-
Kirill Smelkov authored
For wendelin.core v2 we need a way to know at which particular database state an application-level ZODB connection is viewing the database. Knowing that state, the WCFS client library will interact with the WCFS filesystem server and, in simple terms, request the server to provide data as of that particular database state.

Contrary to ZODB/go[1], ZODB/py does not provide the functionality to obtain the DB state of a connection's view, so we have to build it ourselves. Let us call zconn_at the function that, for a client ZODB connection, returns the database state corresponding to its database view.

It is relatively easy to implement zconn_at for ZODB5, since ZODB5 adopted MVCC uniformly, and this patch does just that. However, all currently released ZODB5 versions have a race in Connection.open() vs invalidations[2], and so the first ZODB5 release with which the zconn_at implemented here will work reliably should be the upcoming ZODB 5.5.2. It is TODO to implement zconn_at for ZODB4 and ZODB3, which organize things differently.

Please note what would happen if zconn_at gives an even slightly incorrect answer: the wcfs client will ask the wcfs server to provide array data as of a different database state than the one viewed by the current on-client ZODB connection. As a result, data accessed via ZBigArray will _not_ correspond to all other data accessed via the regular ZODB mechanism. In other words, it would be data corruption.

[1] https://godoc.org/lab.nexedi.com/kirr/neo/go/zodb#Connection
[2] https://github.com/zopefoundation/ZODB/issues/290
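A minimal sketch of the tid arithmetic behind zconn_at, assuming the connection's MVCC storage keeps an exclusive upper bound `_start` equal to "last seen transaction + 1" (the traceback quoted further down in this log shows mvccadapter setting `self._start` this way). `FakeMVCCStorage` and `zconn_at_sketch` are hypothetical names used for illustration only; this is not the code from the patch.

```python
import struct

def p64(v):
    """Pack an integer transaction ID into the 8-byte big-endian form used for tids."""
    return struct.pack(">Q", v)

def u64(b):
    """Unpack an 8-byte big-endian tid into an integer."""
    return struct.unpack(">Q", b)[0]

def before2at(before):
    """Convert an exclusive `before` bound into the last tid included in the view."""
    return p64(u64(before) - 1)

class FakeMVCCStorage:
    """Hypothetical stand-in for the per-connection MVCC storage instance."""
    def __init__(self, last_tid):
        # assumption: the view covers everything strictly before _start
        self._start = p64(u64(last_tid) + 1)

def zconn_at_sketch(storage):
    """Return the database state (tid) that the given storage view corresponds to."""
    return before2at(storage._start)

last = p64(0x0285CBAC258BF266)
print(zconn_at_sketch(FakeMVCCStorage(last)) == last)
```

An off-by-one here is exactly the kind of "even slightly incorrect" answer the message warns about, which is why the conversion is kept in one small function.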
-
Kirill Smelkov authored
This will be needed in the following patches to know how to inject zconn_at or zconn resync functionality into particular ZODB version.
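One way such version-based selection could look, as a sketch only: `zodb_major` and `select_impl` are hypothetical helpers, and the version string is passed in explicitly (in a real setup it could come from e.g. `importlib.metadata.version("ZODB")`).

```python
def zodb_major(version):
    """Return the major component of a version string such as '5.5.1'."""
    return int(version.split(".", 1)[0])

def select_impl(version, impls):
    """Pick the implementation registered for the major version.

    `impls` maps major version -> implementation; versions without an
    entry are reported as not supported yet.
    """
    zmajor = zodb_major(version)
    try:
        return impls[zmajor]
    except KeyError:
        raise NotImplementedError("ZODB%d is not supported" % zmajor)

# only ZODB5 has an implementation in this sketch
impls = {5: "zconn_at for ZODB5"}
print(select_impl("5.5.1", impls))
```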
-
- 01 Apr, 2020 2 commits
-
-
Kirill Smelkov authored
-
Kirill Smelkov authored
-
- 18 Dec, 2019 3 commits
-
-
Kirill Smelkov authored
It had long been marked as "XXX move to common place".
-
Kirill Smelkov authored
Add package-level documentation to

    - bigfile/file_zodb.py,
    - bigarray/array_zodb.py, and
    - lib/zodb.py

The most interesting read is file_zodb.py. Slightly improve documentation for functions in a couple of places. Improving documentation was long overdue, and this commit improves it only slightly.
-
Kirill Smelkov authored
We already keep FileStorage test database on /tmp/ and NEO itself (via neo.tests.functional.NEOCluster) also keeps test data on tmpfs. However test database for ZEO was created in current directory and was wearing out SSD unnecessarily. FIXME zeo_forker currently does not provide API to keep all server files in particular place. This way server conf and log are still emitted in current directory, but at least we move data.fs away. Since conf and log are uniquely named, e.g. server-<ΧΧΧ>.conf and tmpYYY.log, and it was only that Data.fs was named non-uniquely, by moving Data.fs into unique per-server place, this also helps with-ZEO tests to execute correctly in parallel with `tox -p`.
-
- 12 Jul, 2019 1 commit
-
-
Kirill Smelkov authored
For tests this makes sure that if one test fails, the following tests won't fail just because they cannot lock the test database. For regular code (demo_zbigarray.py) this is also a good thing to do: always close the database regardless of whether an exception was raised before the program reached the end of main.

Pygolang becomes a regular, not test-only, dependency. Being a regular dependency is currently required only by demo_zbigarray.py, but it will also be used in the upcoming wcfs, so adding pygolang to wendelin.core dependencies aligns with the plan.

dbclose now uses defer almost everywhere. There are still a few places in tests where one test function opens and closes the test database multiple times; those were not (yet ?) converted.
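The commit itself relies on pygolang's defer. The same "always close" guarantee can be sketched with only the standard library via try/finally; `FakeDB` and `run_with_db` below are hypothetical names for illustration and are not part of wendelin.core.

```python
class FakeDB:
    """Hypothetical stand-in for an opened test database."""
    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True

def run_with_db(db, work):
    """Run work(db) and close db afterwards, whether or not work raised."""
    try:
        return work(db)
    finally:
        db.close()

def failing_work(db):
    raise RuntimeError("test failed")

db = FakeDB()
try:
    run_with_db(db, failing_work)
except RuntimeError:
    pass
print(db.closed)
```

With the database closed on every exit path, a failing test no longer leaves the database locked for the tests that follow.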
-
- 29 Oct, 2018 2 commits
-
-
Kirill Smelkov authored
Structured creates view of the array interpreting its minor axis as fully covered by a dtype.

It is similar to arr.view(dtype) + corresponding reshape, but does not have limitations of ndarray.view(). For example:

    In [1]: a = np.arange(3*3, dtype=np.int32).reshape((3,3))

    In [2]: a
    Out[2]:
    array([[0, 1, 2],
           [3, 4, 5],
           [6, 7, 8]], dtype=int32)

    In [3]: b = a[:2,:2]

    In [4]: b
    Out[4]:
    array([[0, 1],
           [3, 4]], dtype=int32)

    In [5]: dtxy = np.dtype([('x', np.int32), ('y', np.int32)])

    In [6]: dtxy
    Out[6]: dtype([('x', '<i4'), ('y', '<i4')])

    In [7]: b.view(dtxy)
    ---------------------------------------------------------------------------
    ValueError                                Traceback (most recent call last)
    <ipython-input-66-af98529aa150> in <module>()
    ----> 1 b.view(dtxy)

    ValueError: To change to a dtype of a different size, the array must be C-contiguous

    In [8]: structured(b, dtxy)
    Out[8]: array([(0, 1), (3, 4)], dtype=[('x', '<i4'), ('y', '<i4')])

Structured always creates view and never copies data.

Here is original context where separately playing with .shape and .dtype was not enough, since it was creating array copy and OOM'ing the machine:

klaus/wendelin@cbe4938b
-
Kirill Smelkov authored
We are going to use this code in another place, so move it out to a common place as a preparatory step first.

On a related note: since ArrayRef is generic and quite independent from BigArray (it only supports it, but it equally supports other arrays, e.g. plain ones), the proper place for it might also be lib/xnumpy.py. We might get to this topic a bit later.
-
- 17 Apr, 2018 2 commits
-
-
Kirill Smelkov authored
    bigfile/tests/test_filezodb.py ........W: testdb: teardown: <Connection at 7f8fe2b43b90> left not closed by test code; opened by:
    ...
      File "/home/kirr/src/wendelin/wendelin.core/bigfile/tests/test_filezodb.py", line 754, in test_bigfile_zblk1_zdata_reuse
        _test_bigfile_zblk1_zdata_reuse()
      File "/home/kirr/src/wendelin/wendelin.core/bigfile/tests/test_filezodb.py", line 759, in _test_bigfile_zblk1_zdata_reuse
        root = dbopen()
      File "/home/kirr/src/wendelin/wendelin.core/bigfile/tests/test_filezodb.py", line 47, in dbopen
        return testdb.dbopen()
      File "/home/kirr/src/wendelin/wendelin.core/lib/testing.py", line 188, in dbopen
        self.connv.append( (weakref.ref(conn), ''.join(traceback.format_stack())) )

    lib/tests/test_zodb.py .W: testdb: teardown: <Connection at 7f8fe26f13d0> left not closed by test code; opened by:
    ...
      File "/home/kirr/src/wendelin/wendelin.core/lib/tests/test_zodb.py", line 49, in test_deactivate_btree
        root = dbopen()
      File "/home/kirr/src/wendelin/wendelin.core/lib/tests/test_zodb.py", line 30, in dbopen
        return testdb.dbopen()
      File "/home/kirr/src/wendelin/wendelin.core/lib/testing.py", line 188, in dbopen
        self.connv.append( (weakref.ref(conn), ''.join(traceback.format_stack())) )
-
Kirill Smelkov authored
If a test forgets to explicitly close ZODB connection it was using, this connection stays alive in transaction synchronizers (it is a weakset), and continues to be used on e.g. transaction.commit() when all synchronizers are invoked. This could lead to crashes like below when underlying ZODB storage was closed by test module teardown and testing moved on to another test module:

    $ WENDELIN_CORE_TEST_DB="<neo>" py.test bigfile/tests/test_filezodb.py::test_bigfile_zblk1_zdata_reuse lib/tests/test_zodb.py
    ======= test session starts ========
    platform linux2 -- Python 2.7.14+, pytest-3.5.0, py-1.5.3, pluggy-0.6.0
    rootdir: /home/kirr/src/wendelin/wendelin.core, inifile:
    collected 2 items

    bigfile/tests/test_filezodb.py .        [ 50%]
    lib/tests/test_zodb.py F                [100%]

    ______ test_deactivate_btree _______

        def test_deactivate_btree():
            root = dbopen()

            # init btree with many leaf nodes
            leafv = []
            root['btree'] = B = IOBTree()
            for i in range(10000):
                B[i] = xi = XInt(i)
                leafv.append(xi)
    >       transaction.commit()

    lib/tests/test_zodb.py:56:
    _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
    ../venv/z5/local/lib/python2.7/site-packages/transaction/_manager.py:131: in commit
        return self.get().commit()
    ../venv/z5/local/lib/python2.7/site-packages/transaction/_transaction.py:316: in commit
        self._synchronizers.map(lambda s: s.afterCompletion(self))
    ../venv/z5/local/lib/python2.7/site-packages/transaction/weakset.py:62: in map
        f(elt)
    ../venv/z5/local/lib/python2.7/site-packages/transaction/_transaction.py:316: in <lambda>
        self._synchronizers.map(lambda s: s.afterCompletion(self))
    ../venv/z5/local/lib/python2.7/site-packages/ZODB/Connection.py:757: in afterCompletion
        self.newTransaction(transaction, False)
    ../venv/z5/local/lib/python2.7/site-packages/ZODB/Connection.py:737: in newTransaction
        invalidated = self._storage.poll_invalidations()
    ../venv/z5/local/lib/python2.7/site-packages/ZODB/mvccadapter.py:131: in poll_invalidations
        self._start = p64(u64(self._storage.lastTransaction()) + 1)
    _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

    self = <neo.client.Storage.Storage object at 0x7ffa1be8d410>

        def lastTransaction(self):
            # Used in ZODB unit tests
    >       return self.app.last_tid
    E       AttributeError: 'NoneType' object has no attribute 'last_tid'

    ../../neo/src/lab.nexedi.com/kirr/neo/neo/client/Storage.py:181: AttributeError

where NEO's Storage.app is None because the storage was closed.

----

To avoid such kind of failures make sure TestDB.teardown() always closes all ZODB connections that were ever opened via TestDB.dbopen(). Add a warning about such force-closing with information about corresponding connection and code place that created it, so that it is easy to understand which test needs a fix.

/suggested-by @jm
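The teardown logic described above can be sketched as follows. The `connv` bookkeeping mirrors the line visible in the warning output of the neighbouring commit (`weakref.ref(conn)` plus the formatted stack); `FakeConnection` and `SketchTestDB` are hypothetical stand-ins, not the real lib/testing.py classes.

```python
import traceback
import weakref

class FakeConnection:
    """Hypothetical stand-in for a ZODB connection."""
    def __init__(self):
        self.opened = True

    def close(self):
        self.opened = False

class SketchTestDB:
    def __init__(self):
        # list of (weak reference to connection, stack of the code that opened it)
        self.connv = []

    def dbopen(self):
        conn = FakeConnection()
        self.connv.append((weakref.ref(conn), ''.join(traceback.format_stack())))
        return conn

    def teardown(self):
        """Force-close connections left open; return one warning per such connection."""
        warnings = []
        for wconn, tb in self.connv:
            conn = wconn()
            if conn is None or not conn.opened:
                continue  # already garbage-collected or properly closed by the test
            warnings.append("W: testdb: teardown: %r left not closed by test code; "
                            "opened by:\n%s" % (conn, tb))
            conn.close()
        self.connv = []
        return warnings

testdb = SketchTestDB()
good = testdb.dbopen()
leaked = testdb.dbopen()
good.close()
for w in testdb.teardown():
    print(w.splitlines()[0])
```

Keeping only weak references means the bookkeeping itself never prolongs the life of a connection that the test has already dropped.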
-
- 16 Apr, 2018 1 commit
-
-
Kirill Smelkov authored
Since 7fc4ec66 (tests: Allow to test with ZEO & NEO ZODB storages) we can run the tests with either FileStorage, ZEO or NEO. But ZEO test adapter started to fail with ZEO5:

    self = <wendelin.lib.testing.TestDB_ZEO object at 0x7f1feb5091d0>

        def setup(self):
            port  = self.zeo_forker.get_port()
            zconf = self.zeo_forker.ZEOConfig(('', port))
            self.addr, self.adminaddr, self.pid, self.path = \
    >               self.zeo_forker.start_zeo_server(zeo_conf=zconf, port=port)
    E       ValueError: need more than 2 values to unpack

This is because in ZEO5 forker.start_zeo_server() was reworked to return only addr and stop closure instead of returning all details and relying on caller to implement stop itself.

Adapt the test to detect ZEO5 and use new calling convention.
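A sketch of how both calling conventions might be normalized to a single `(addr, stop)` pair. Detecting the convention by the length of the returned tuple is an assumption made for this example (the commit does not say how ZEO5 is detected), and `start_zeo`, `legacy_stop` and the fake forkers are hypothetical.

```python
def start_zeo(start_zeo_server, legacy_stop, **kw):
    """Start a ZEO server and return (addr, stop) for either return convention.

    start_zeo_server: the forker function to call.
    legacy_stop:      callable(adminaddr, pid) that stops an old-style server;
                      hypothetical, standing in for the caller-implemented stop.
    """
    ret = start_zeo_server(**kw)
    if len(ret) == 2:
        # new convention: (addr, stop closure)
        addr, stop = ret
    else:
        # old convention: (addr, adminaddr, pid, path); the caller stops the server itself
        addr, adminaddr, pid, path = ret

        def stop():
            legacy_stop(adminaddr, pid)
    return addr, stop

# fake forkers used only to exercise the sketch
stopped = []

def fake_new(**kw):
    return ("addr5", lambda: stopped.append("new"))

def fake_old(**kw):
    return ("addr4", "admin4", 1234, "/path")

for forker in (fake_new, fake_old):
    addr, stop = start_zeo(forker, lambda adminaddr, pid: stopped.append(("old", pid)))
    stop()
print(stopped)
```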
-
- 21 Feb, 2018 1 commit
-
-
Kirill Smelkov authored
This allows e.g. to open `neo://cluster@master?compress=false`, that is, a URI with options, which our current simplified opening code does not support. Keep old dbstoropen around as the fallback to work when zodbtools/zodburi are not available, since we still want to try to support ZODB 3.10.
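To illustrate what a URI with options carries, here is a standard-library sketch that splits such a URI into scheme, location and options. It is not how zodbtools/zodburi resolve URIs; `split_storage_uri` is a hypothetical helper.

```python
from urllib.parse import urlsplit, parse_qs

def split_storage_uri(uri):
    """Split a storage URI into (scheme, location, options dict)."""
    u = urlsplit(uri)
    # keep the last value if an option is given more than once
    options = {k: v[-1] for k, v in parse_qs(u.query).items()}
    return u.scheme, u.netloc, options

print(split_storage_uri("neo://cluster@master?compress=false"))
```

A simplified opener that only looks at the scheme and location would silently drop the `compress=false` part, which is the limitation the message refers to.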
-
- 24 Oct, 2017 1 commit
-
-
Kirill Smelkov authored
Relicense to GPLv3+ with wide exception for all Free Software / Open Source projects + Business options. Nexedi stack is licensed under Free Software licenses with various exceptions that cover three business cases: - Free Software - Proprietary Software - Rebranding As long as one intends to develop Free Software based on Nexedi stack, no license cost is involved. Developing proprietary software based on Nexedi stack may require a proprietary exception license. Rebranding Nexedi stack is prohibited unless rebranding license is acquired. Through this licensing approach, Nexedi expects to encourage Free Software development without restrictions and at the same time create a framework for proprietary software to contribute to the long term sustainability of the Nexedi stack. Please see https://www.nexedi.com/licensing for details, rationale and options.
-
- 14 Aug, 2016 1 commit
-
-
Kirill Smelkov authored
13c0c17c (bigfile/zodb: Format #1 which is optimized for small changes) used BTree to organize ZBlk1 block's chunks and for loadblkdata() added "TODO we are missing to free internal BTree structures on data load".

#3 besides other things showed that even when we deactivate ZData objects, we are still keeping them as ghosts occupying memory, and the same holds for IOBucket objects.

This all happens because there is no proper way to deactivate a whole btree, including its internal bucket objects. And since internal buckets are not deactivated, they stay in picklecache and thus hold a reference to ZData objects, and ZData objects in turn, even if explicitly deactivated, stay in memory.

We can fix this all by implementing a whole-btree deactivation procedure. To do so we need to iterate over all btree buckets recursively, but unfortunately there is no BTree API to access/iterate btree's buckets. We can however still get a reference to the first top-level buckets via gc.get_referents(btree) and then scan buckets further without hacks.

gc.get_referents(btree) is a hack, but

- it works in O(1) (we only get pointers from btree, not scanning all gcable objects and deducing them)
- it works reliably if we filter out non-interesting objects.

So in the end it works.

Before the patch, loading more and more ZBlk1 data with objgraph instrumentation was showing itself like

    #                                   Nobj    δ
    wendelin.bigfile.file_zodb.ZData    7168    +512
    BTrees.IOBTree.IOBucket             238     +17
    BTrees.IOBTree.IOBTree              14      +1

and after this patch we now have

    BTrees.IOBTree.IOBTree              14      +1

We cannot remove that "IOBTree +1", since ZBlk1 is holding a direct reference to it (via .chunktab) and we have to keep ZBlk1 live with ._v_zfile and ._v_zblk set for invalidation to work.

"+1 IOBTree" is however small - 144 bytes per 2M (= 0.006%) - so we can neglect it the same way we neglect keeping ZBlk1 staying live for each block.
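The traversal idea can be tried on a toy structure. In the sketch below `Bucket` is a hypothetical list subclass standing in for BTree nodes (real BTrees are C objects from the BTrees package); what is illustrated is only the combination of gc.get_referents() with filtering out non-interesting objects.

```python
import gc

class Bucket(list):
    """Toy stand-in for a btree node: holds plain values and/or child buckets."""
    deactivated = False

    def _p_deactivate(self):
        self.deactivated = True

def all_buckets(root):
    """Collect root and every Bucket reachable from it via gc.get_referents()."""
    seen = {}
    stack = [root]
    while stack:
        node = stack.pop()
        if id(node) in seen:
            continue
        seen[id(node)] = node
        for ref in gc.get_referents(node):
            # filter out everything that is not a bucket (values, dicts, the class itself)
            if isinstance(ref, Bucket):
                stack.append(ref)
    return list(seen.values())

leaf1 = Bucket([1, 2])
leaf2 = Bucket([3])
root = Bucket([Bucket([leaf1]), leaf2])

for b in all_buckets(root):
    b._p_deactivate()
print(leaf1.deactivated, leaf2.deactivated)
```

Each get_referents() call only follows pointers held by one object, so the cost is proportional to the number of nodes visited rather than to the number of all tracked objects.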
-
- 23 Sep, 2015 2 commits
-
-
Kirill Smelkov authored
i.e. it is ok to copy smaller data into larger buffer.
-
Kirill Smelkov authored
- not only multiple of 8.

We can do it by using uint8 typed arrays, and it does not hurt performance:

    In [1]: from wendelin.lib.mem import bzero, memset, memcpy
    In [2]: A = bytearray(2*1024*1024)
    In [3]: B = bytearray(2*1024*1024)

            memcpy(B, A)    bzero(A)        memset(A, 0xff)

    old:    718 µs          227 µs / 1116   228 µs / 1055   (*)
    new:    718 µs          176 µs / 1080   175 µs / 1048

(*) the second number comes from e.g.

    In [8]: timeit bzero(A)
    The slowest run took 4.63 times longer than the fastest. This could mean that an intermediate result is being cached
    10000 loops, best of 3: 228 µs per loop

so the second number is more realistic and says performance stays approximately the same and only slightly improves.
-
- 06 Aug, 2015 4 commits
-
-
Kirill Smelkov authored
Mutex lock/unlock should not fail if mutex was correctly initialized/used.
-
Kirill Smelkov authored
We factored out SIGSEGV block/restore from fileh_dirty_writeout() to all functions in cb7a7055 (bigfile/virtmem: Block/restore SIGSEGV in non-pagefault-handling function). The restoration however just sets whole thread sigmask. It could be possible that between block/restore calls procmask for other signals could be changed, and this way - setting procmask directly - we will overwrite them. So be careful, and when restoring SIGSEGV mask, touch mask bit for only that signal. ( we need xsigismember helper to get this done, which is also introduced in this patch )
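The patch itself is C code. The same "touch only one signal's bit" idea can be sketched in Python with signal.pthread_sigmask (Unix only); `block_signal` and `restore_signal` are hypothetical helper names, and SIGUSR1/SIGUSR2 are used in the demonstration instead of SIGSEGV.

```python
import signal

def block_signal(sig):
    """Block `sig` in the calling thread and return the previous signal mask."""
    return signal.pthread_sigmask(signal.SIG_BLOCK, {sig})

def restore_signal(sig, saved_mask):
    """Restore only the bit for `sig` to its state in saved_mask.

    Other signals are left as they are now, so changes made to them
    between block and restore are not overwritten.
    """
    how = signal.SIG_BLOCK if sig in saved_mask else signal.SIG_UNBLOCK
    signal.pthread_sigmask(how, {sig})

saved = block_signal(signal.SIGUSR1)
# meanwhile some other code changes the mask for a different signal
signal.pthread_sigmask(signal.SIG_BLOCK, {signal.SIGUSR2})
restore_signal(signal.SIGUSR1, saved)
# SIGUSR2 stays blocked; a plain SIG_SETMASK with `saved` would have unblocked it
print(signal.SIGUSR2 in signal.pthread_sigmask(signal.SIG_BLOCK, set()))
```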
-
Kirill Smelkov authored
The mistake was there from the beginning - from 3e5e78cd (lib/utils: Small C utilities we'll use).
-
Kirill Smelkov authored
We'll need this for functions which return an error code directly instead of setting errno - e.g. pthread_sigmask().
-
- 26 Jun, 2015 1 commit
-
-
Kirill Smelkov authored
Previously we were always testing with DBs backed up by FileStorage. Now we provide a way to run the testsuite with user selected storage backend:

    $ WENDELIN_CORE_TEST_DB="<fs>" make test.py   # test with temporary db with FileStorage
    $ WENDELIN_CORE_TEST_DB="<zeo>" make test.py  # ----------//---------- with ZEO
    $ WENDELIN_CORE_TEST_DB="<neo>" make test.py  # ----------//---------- with NEO

    $ WENDELIN_CORE_TEST_DB=neo://db@master make test.py    # test with externally provided DB

Default is still to run tests with FileStorage.

/cc @jm
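A sketch of how the WENDELIN_CORE_TEST_DB value might be interpreted. The variable name and the `<fs>`, `<zeo>`, `<neo>` values come from the message above; `select_test_db` and its return format are hypothetical.

```python
import os

def select_test_db(environ=None):
    """Decide which test database to use from WENDELIN_CORE_TEST_DB.

    Returns ("temporary", backend) for the built-in temporary databases
    and ("external", uri) for an externally provided one.
    """
    environ = os.environ if environ is None else environ
    spec = environ.get("WENDELIN_CORE_TEST_DB", "<fs>")  # default: FileStorage
    builtin = {"<fs>": "FileStorage", "<zeo>": "ZEO", "<neo>": "NEO"}
    if spec in builtin:
        return ("temporary", builtin[spec])
    return ("external", spec)

print(select_test_db({}))
print(select_test_db({"WENDELIN_CORE_TEST_DB": "neo://db@master"}))
```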
-
- 25 Jun, 2015 2 commits
-
-
Kirill Smelkov authored
Done in a manual, hacky way for now. The clean solution would be to reuse e.g. repoze.zodbconn[1] or zodburi[2] and teach them to support NEO. But for now we can't -- those eggs depend on ZODB, and we still use ZODB3 for maintaining compatibility with both ZODB3.10 and ZODB4.

/cc @jm

[1] https://pypi.python.org/pypi/repoze.zodbconn
[2] https://pypi.python.org/pypi/zodburi
-
Kirill Smelkov authored
Factor out those routines to open a ZODB database to common place. The reason for doing so is that we'll soon teach dbopen to automatically recognize several protocols, e.g. neo:// and zeo:// and this way, clients who use dbopen() could automatically access storages besides FileStorage.
-
- 02 Jun, 2015 1 commit
-
-
Kirill Smelkov authored
e.g.

    In [1]: multiply.reduce((1<<30, 1<<30, 1<<30))
    Out[1]: 0

instead of

    In [2]: (1<<30) * (1<<30) * (1<<30)
    Out[2]: 1237940039285380274899124224

    In [3]: 1<<90
    Out[3]: 1237940039285380274899124224

also multiply.reduce returns int64, instead of python int:

    In [4]: type( multiply.reduce([1,2,3]) )
    Out[4]: numpy.int64

which also leads to overflow-related problems if we further compute with this value and other integers and the result exceeds int64 - it becomes float:

    In [5]: idx0_stop = 18446744073709551615

    In [6]: stride0 = numpy.int64(1)

    In [7]: byte0_stop = idx0_stop * stride0

    In [8]: byte0_stop
    Out[8]: 1.8446744073709552e+19

and then it becomes a real problem for BigArray.__getitem__()

    wendelin.core/bigarray/__init__.py:326: RuntimeWarning: overflow encountered in long_scalars
      page0_min  = min(byte0_start, byte0_stop+byte0_stride) // pagesize # TODO -> fileh.pagesize

and then

    >       vma0 = self._fileh.mmap(page0_min, page0_max-page0_min+1)
    E       TypeError: integer argument expected, got float

~~~~

So just avoid multiply.reduce() and do our own mul() properly, the same way sum() is built into python, and we avoid overflow-related problems.
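A pure-Python product in the spirit of the built-in sum() could look like the sketch below; the exact signature of the mul() added by the commit is not shown in this log, so the `start` parameter here is an assumption.

```python
from functools import reduce
import operator

def mul(iterable, start=1):
    """Multiply all items together using Python integers, which never overflow."""
    return reduce(operator.mul, iterable, start)

print(mul((1 << 30, 1 << 30, 1 << 30)) == 1 << 90)
print(type(mul([1, 2, 3])))
```

Because the arithmetic stays in Python's arbitrary-precision int, the result neither wraps around to 0 nor turns into a float when combined with other integers.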
-
- 03 Apr, 2015 4 commits
-
-
Kirill Smelkov authored
- for virtual memory subsystem
- for ZBigFiles

They are not currently great, e.g. for virtmem we have in-kernel overhead of page clearing - in perf profiles, for bigfile_mmap compared to file_read, kernel's clear_page_c rises significantly.

That is the worker for clearing page memory and we currently cannot avoid that - any memory obtained from kernel (MAP_ANONYMOUS, mmap(file) with hole, etc...) comes pre-initialized to zeros to userspace.

This can be seen in the benchmarks as well: file_readbig differs from file_read only in that the latter uses 1 small buffer and the former allocates large memory (cleared by kernel + python does the memset).

    bigfile/tests/bench_virtmem.py@125::bench_file_mmap_adler32     0.47  (0.86 0.49 0.47)
    bigfile/tests/bench_virtmem.py@126::bench_file_read_adler32     0.69  (1.11 0.71 0.69)
    bigfile/tests/bench_virtmem.py@127::bench_file_readbig_adler32  1.41  (1.70 1.42 1.41)
    bigfile/tests/bench_virtmem.py@128::bench_bigfile_mmap_adler32  1.42  (1.45 1.42 1.51)

    bigfile/tests/bench_virtmem.py@130::bench_file_mmap_md5         1.52  (1.91 1.54 1.52)
    bigfile/tests/bench_virtmem.py@131::bench_file_read_md5         1.73  (2.10 1.75 1.73)
    bigfile/tests/bench_virtmem.py@132::bench_file_readbig_md5      2.44  (2.73 2.46 2.44)
    bigfile/tests/bench_virtmem.py@133::bench_bigfile_mmap_md5      2.40  (2.48 2.40 2.53)

There is MAP_UNINITIALIZED which works only for non-mmu targets and only if explicitly allowed when configuring kernel (off by default).

There were patches to disable that page zeroing, as it gives significant speedup for people's workloads, e.g. [1,2], but none of them got merged for security reasons.

[1] http://marc.info/?t=132691315900001&r=1&w=2
[2] http://thread.gmane.org/gmane.linux.kernel/548926

~~~~

For ZBigFile - it is the storage that dominates in profiles.
-
Kirill Smelkov authored
Like C bzero / memset & memcpy - but work on python buffers. We leverage NumPy for doing the actual work, and this way NumPy becomes a dependency. Having NumPy as a dependency is ok - we'll for sure need it later as we are trying to build out-of-core ndarrays.
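For illustration, the three helpers can be sketched with only the standard library by viewing buffers as unsigned bytes through memoryview. This is a stdlib analogue, not the NumPy-based implementation from the commit; the error raised for an oversized source is an assumption. Byte-wise views work for buffers of any size, and copying smaller data into a larger buffer is allowed.

```python
def _bytes_view(buf):
    """Return a flat view of buf as unsigned bytes (buf must be C-contiguous)."""
    return memoryview(buf).cast("B")

def memset(buf, c):
    """Fill every byte of buf with the value c (0..255)."""
    m = _bytes_view(buf)
    m[:] = bytes([c]) * len(m)

def bzero(buf):
    """Fill buf with zero bytes."""
    memset(buf, 0)

def memcpy(dst, src):
    """Copy all of src to the beginning of dst; dst may be larger than src."""
    d = _bytes_view(dst)
    s = _bytes_view(src)
    if len(s) > len(d):
        raise ValueError("source is larger than destination")
    d[:len(s)] = s

A = bytearray(7)          # size is deliberately not a multiple of 8
memset(A, 0xff)
B = bytearray(10)
memcpy(B, b"xyz")
print(A, B)
```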
-
Kirill Smelkov authored
Like taking an exact integer log2, upcasting pointers for C-style inheritance done in a Plan9 way, and wrappers to functions which should never fail.
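The utilities in this commit are C code. As an illustration of what "exact integer log2" means, here is a Python sketch; `ilog2_exact` is a hypothetical name and raising ValueError for non-powers of two is an assumption about the intended behaviour.

```python
def ilog2_exact(x):
    """Return n such that 2**n == x; fail if x is not an exact power of two."""
    if x <= 0 or x & (x - 1):
        raise ValueError("%r is not a power of two" % (x,))
    return x.bit_length() - 1

print(ilog2_exact(4096))
```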
-
Kirill Smelkov authored
Modelled after the ones used in the Linux kernel.
-