- 17 Sep, 2024 22 commits
-
-
Kirill Smelkov authored
Pinning is a critical operation whose failure will soon lead to the client being killed with SIGBUS. WCFS correctness also fundamentally depends on a pin operation, once started, being handled by the client. -> Rework the READ handler not to cancel the pin if a READ interrupt comes in from the OS client. Do this by organizing WatchLink.serveCtx and running pins under this context instead of under the READ context. Later we will adjust pins to also cancel this context on any error. Test is, hopefully, TODO.
-
Kirill Smelkov authored
When serve is completing and about to exit, it sends an error message to the client without any timeout. So if the client is not reading from the channel, wcfs will get stuck waiting for the message to be consumed. -> Fix that by trying to send that last error for at most 1 second and ignoring errors, if any. Test is, hopefully, TODO.
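The bounded final send can be illustrated with a minimal Python sketch. This is not the wcfs code (which is written in Go); `send_last_error` and the use of `queue.Queue` as a stand-in for the client channel are assumptions made for illustration only.

```python
import queue

def send_last_error(chan, err, timeout=1.0):
    """Try to hand `err` over to `chan` for at most `timeout` seconds.

    Returns True if the message was queued, False if the peer never made
    room for it. The failure is deliberately swallowed: the sender is
    exiting anyway and must not block forever on a non-reading client.
    """
    try:
        chan.put(err, timeout=timeout)   # blocks only while the queue is full
        return True
    except queue.Full:
        return False
```

With a client that never reads, the call returns after the timeout instead of hanging, which is the property the fix is after.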
-
Kirill Smelkov authored
Bring in more structure: - final watchlink cleanup is done in its own block - cancelling spawned handlers is done in another block - add more comments explaining things
-
Kirill Smelkov authored
Previously we were using .sk.CloseRead() to interrupt sk.Read(), but that is not necessary since .sk, relying on xio.Pipe, implements xio.Reader natively with full support for cancellation. The original code to cancel via CloseRead comes from mid 2019 and predates go123@7ad867a3 go123@0e368363 go123@0bdac628 go123@9db4dfac go123@d2dc6c09. And in b17aeb8c and 6f0cdaff (wcfs: Provide isolation to clients), it seems, I forgot to update the WatchLink.serve code accordingly. Do that now because it simplifies code flow organization a bit.
-
Kirill Smelkov authored
So far we were testing only against a faulty client that reads the pin notification ok, but does not reply to it. But there could be more problems:

1) a client does not read the pin notification at all
2) a client closes the watchlink abruptly after reading the pin notification
3) a client replies to the pin notification, but the reply is not "ack"

The first problem, if not handled, leads to the whole set of clients becoming stuck on reading the same block as the faulty client. The other problems also indicate breakage of the isolation protocol on the client side, meaning that wcfs can no longer be sure that it provides good uncorrupted data to the client.

In the first case, similarly to the "no reply" situation, we need to kill the client to make progress while maintaining safety. In cases 2 and 3 we cannot maintain safety if the faulty client remains in the set of live and served clients, so it is also logical to send SIGBUS/SIGKILL to it.

Killing a client with SIGBUS is similar to how the OS kernel sends SIGBUS when a memory-mapped file is accessed and loading file data results in EIO. It is also similar to wendelin.core 1, where SIGBUS is raised if loading a file block results in an error.

Extend tests to cover all explained scenarios.
-
Kirill Smelkov authored
wcfs: tests: Add test to exercise faulty client that does not reply to pin triggered by readPinWatchers

Levin writes: This patch extends the test scope of 'test_wcfs_pintimeout_kill'. Before this patch, the test only ensured that a client that does not respond to pin requests during the initial watch request [1] is killed. Now it also tests that a faulty client is killed when a block is invalidated. Since there are no other situations where the WCFS server sends pin requests to a client, the tests now cover all situations where a faulty client might not respond. This patch therefore aims to increase confidence that WCFS is not blocked by a faulty client.

[1] See nexedi/wendelin.core!18

Preliminary history: levin.zimmermann/wendelin.core@9d42efff

Co-authored-by: Levin Zimmermann <levin.zimmermann@nexedi.com>
-
Kirill Smelkov authored
We will need to use this utility from several places in the next patch.
-
Kirill Smelkov authored
Currently assertBlk uses the default timeout() to wait for a READ operation to complete. That works well everywhere except in faulty protection tests, where the wcfs server first needs to wait for its own pintimeout to kill the faulty client and only then returns the read result to all non-faulty clients. So the corresponding test, where one client fails to properly handle a pin notification triggered by a READ operation, will need to use a longer timeout for the good client when doing assertBlk. Adjust assertBlk to allow specifying a custom timeout as a preparatory step for that.
-
Kirill Smelkov authored
And make sure that the good client can set up its watch ok even though there is simultaneously a faulty client that should get killed.
-
Kirill Smelkov authored
If we don't, the whole testing process will be killed once wcfs is taught to kill clients that do not handle pin notifications well. Use multiprocessing to do so and to be able to interoperate with the spawned test process by sending/receiving objects to/from it.

Preliminary history: levin.zimmermann/wendelin.core@aef0f0e1

Co-authored-by: Levin Zimmermann <levin.zimmermann@nexedi.com>
-
Kirill Smelkov authored
If wcfs kills client that did not respond to pin notification in pintimeout time, we need to wait strictly _more_ than that time to detect whether client was killed or not. And in practice, due to noise in operating system load and other factors, that waiting time should be significantly greater to detect lack of expected event. However we were waiting for exactly 1·pintimeout time and were claiming that there was no pinkill event right after that. -> Wait for 2·pintimeout instead of 1·pintimeout to make pinkill detection robust.
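The waiting rule can be sketched as a small helper. This is a hedged illustration, not the actual test code: `wait_pinkill` and the use of a `threading.Event` to signal "client was killed" are hypothetical names chosen for the example.

```python
import threading

def wait_pinkill(killed, pintimeout, factor=2):
    """Wait up to factor * pintimeout for the `killed` event.

    Waiting exactly 1 * pintimeout would race with the server, which only
    starts killing once pintimeout has fully elapsed. The extra margin
    absorbs scheduling noise. Returns True if the kill was observed.
    """
    return killed.wait(timeout=factor * pintimeout)
```

A test would then assert on the return value: True where a pinkill is expected, False where the client is expected to survive.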
-
Kirill Smelkov authored
The default "pin timeout" is 30s and we are going to add many tests that exercise pinkilling functionality soon. If every such test takes 2·pintimeout time = 60s, it will result in significant time increase needed to run WCFS tests. Avoid that by adjusting pin timeout to one order of magnitude smaller pintimeout=3s during faulty protection tests.
-
Kirill Smelkov authored
This testing helper limits the whole test time to detect FUSE-related deadlocks by aborting the FUSE connection on timeout. It has been working well so far. But soon we will need pinkill-related tests, where a timeout will need to be detected independently of the FUSE connection. Expose tWCFS.ctx for tests to be able to use this context and do things limited in time. Adjust FUSE aborting to correlate exactly with cancellation of this context.
-
Kirill Smelkov authored
We are going to add more tests on this topic + supporting infrastructure. It makes sense to move everything related to dedicated test file first as a preparatory step because wcfs_test.py feels already overloaded. Plain code movement.
-
Kirill Smelkov authored
WCFS allows issuing simultaneous watch requests, and when two watch requests are simultaneously issued for the same file there was a race in their handling: the code was relying on w.atMu.W to protect setupWatch from concurrent readPinWatcher, and also, seemingly, from another setupWatch running on the same file. But there is a bug in that: lacking an atomic primitive to downgrade RWMutex from wlock to rlock, atMu.W was first fully unlocked and then rlocked again. The code was prepared wrt readPinWatcher starting to run in that unlock->rlock time window, but it was not prepared wrt another setupWatch starting to run on the same file in that pause. -> Fix that by using a dedicated Watch.setupMu lock that protects setupWatch from setupWatch. Test is, hopefully, TODO. My mistake from 6f0cdaff (wcfs: Provide isolation to clients)
-
Kirill Smelkov authored
Inside readPinWatchers: https://lab.nexedi.com/nexedi/wendelin.core/-/blob/wendelin.core-2.0.alpha3-26-g79e6f7b9/wcfs/wcfs.go#L1536-1591 if δFtail.BlkRevAt would return an error, then f.watchMu was not RUnlocked back, and wg.Wait was not called at all. -> Fix that by scheduling unlock and wg wait right after f.watchMu is rlocked and workgroup is created. Test is, hopefully, TODO. My mistake from 6f0cdaff (wcfs: Provide isolation to clients)
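The shape of this fix, scheduling the unlock and the wait immediately after acquiring the lock so that every exit path runs them, can be sketched in Python with `try`/`finally`. The real code is Go and uses `defer` with an RWMutex and a workgroup; here a plain `threading.Lock` and threads stand in, and `notify_watchers` and `rev_at` are hypothetical names.

```python
import threading

def notify_watchers(lock, watchers, rev_at):
    """Call every watcher with its revision, in parallel, under `lock`.

    `rev_at(w)` may raise. Cleanup is registered right after the lock is
    taken, so an error can neither leave the lock held nor leave already
    spawned workers unjoined.
    """
    lock.acquire()
    workers = []
    try:
        for w in watchers:
            rev = rev_at(w)                      # may raise midway
            t = threading.Thread(target=w, args=(rev,))
            t.start()
            workers.append(t)
    finally:
        for t in workers:                        # analogous to wg.Wait()
            t.join()
        lock.release()                           # analogous to RUnlock()
```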
-
Kirill Smelkov authored
The code was already behaving like that but there was XXX to do it. Add test to verify it is actually done. Opened WatchLink handle is released after RELEASE because read in WatchLink.serve, after RELEASE, returns EOF and then the code inside WCFS does all necessary WatchLink-related cleanup: https://lab.nexedi.com/nexedi/wendelin.core/-/blob/wendelin.core-2.0.alpha3-26-g79e6f7b9/wcfs/wcfs.go#L1828-1872
-
Kirill Smelkov authored
This was marked as TODO in server code and not implemented. Without this cleanup zheadSockTab was growing indefinitely after every open/close and leaking memory. -> Fix it via registering RELEASE handler to FUSE and removing corresponding zheadSockTab entry from there.
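As a rough illustration of the leak and its fix, the sketch below models a table of open handles that is only kept bounded if the release path removes entries. `HandleTab` and its methods are hypothetical and only mirror the idea of zheadSockTab; they are not the wcfs data structures.

```python
import threading

class HandleTab:
    """Table of currently open handles, guarded by its own lock."""

    def __init__(self):
        self._mu = threading.Lock()
        self._tab = {}
        self._next = 0

    def open(self, obj):
        # Register a new handle; without a matching release() the table
        # grows by one entry per open, forever.
        with self._mu:
            fh = self._next
            self._next += 1
            self._tab[fh] = obj
            return fh

    def release(self, fh):
        # Called from the RELEASE path: drop the entry so it can be freed.
        with self._mu:
            self._tab.pop(fh, None)

    def __len__(self):
        with self._mu:
            return len(self._tab)
```

After any number of open/release cycles the table is empty again, which is what a leak test can assert on.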
-
Kirill Smelkov authored
Report there the number of inside-WCFS instances, e.g. the number of tracked BigFiles, WatchLinks etc, and also counts of events, for example how many times a pin event happened. Soon we will need these statistics to implement tests, e.g. for pinkilling and other functionality, and they might also be useful to have in general.
-
Kirill Smelkov authored
ZWatcher says it does not need to lock wlinkMu because it is already holding zheadMu and setupWatch runs with zheadMu locked. That is indeed true, but the mistake here is that it is not only setupWatch that accesses wlinkTab. For example WatchNode.Open registers new entries there only under wlinkMu: https://lab.nexedi.com/nexedi/wendelin.core/-/blob/wendelin.core-2.0.alpha3-26-g79e6f7b9/wcfs/wcfs.go#L1819-1822 -> Fix it by always using wlinkMu when accessing wlinkTab. My mistake from 6f0cdaff (wcfs: Provide isolation to clients) Test is, hopefully, TODO.
-
Kirill Smelkov authored
Previously we were protecting access to zheadSockTab with zheadMu because this table was accessed from only two places: when opening .wcfs/zhead and in zwatcher. Soon we are going to add another place that will access this table, and still using the big zheadMu seems less and less logical. -> Switch to using a dedicated lock to protect the table of .wcfs/zhead opens as a preparatory step for that.
-
Kirill Smelkov authored
Currently zwatcher failure leads to wcfs starting to provide stale data instead of uptodate data. Fix that by detecting zwatcher failures and explicitly switching the filesystem to a mode where any access to anything returns "input/output error". Zwatcher can fail on e.g. failure to retrieve transactions from ZODB storage or any other failure. With this patch we make sure this does not go unnoticed.
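The "fail loudly instead of serving stale data" behaviour can be sketched as follows. This is an assumption-laden toy model, not the wcfs implementation: `FailableFS`, `watcher_failed` and `read` are hypothetical names.

```python
import errno
import threading

class FailableFS:
    """Toy filesystem front-end that turns into EIO-only mode on failure."""

    def __init__(self):
        self._failed = threading.Event()

    def watcher_failed(self, exc):
        # Called once the background watcher cannot continue; from now on
        # no data may be served because it could be stale.
        self._failed.set()

    def read(self, load):
        if self._failed.is_set():
            raise OSError(errno.EIO, "input/output error")
        return load()
```

Every access path checks the same flag, so a watcher failure cannot go unnoticed by clients.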
-
- 15 Sep, 2024 1 commit
-
-
Kirill Smelkov authored
go-fuse added functionality to handle Init.MaxPages in https://github.com/hanwen/go-fuse/commit/265a39266958.
-
- 23 Jul, 2024 3 commits
-
-
Levin Zimmermann authored
We need to drop client-specific options so that NEO URI that only differ due to client options while actually pointing to the same NEO server are equal after normalization. -------- kirr: See nexedi/neoppod!18 for the discussion on this subject. /reviewed-by @kirr /reviewed-on nexedi/wendelin.core!28
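A minimal sketch of such normalization, using only the standard library, is shown below. The set of client-only option names is purely hypothetical (the real list is defined by NEO and discussed in nexedi/neoppod!18), and `normalize_zurl` is not the wendelin.core function.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Hypothetical examples of options that only affect the client side.
CLIENT_ONLY_OPTIONS = {"compress", "read_only"}

def normalize_zurl(zurl, client_only=CLIENT_ONLY_OPTIONS):
    """Drop client-specific query options and sort the remaining ones,
    so that URLs pointing to the same server compare equal."""
    u = urlsplit(zurl)
    query = sorted((k, v) for k, v in parse_qsl(u.query) if k not in client_only)
    return urlunsplit((u.scheme, u.netloc, u.path, urlencode(query), u.fragment))
```

Two URLs that differ only in the dropped options, or in option order, then map to the same normalized string.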
-
Levin Zimmermann authored
NEO/go and NEO/py URI format diverged over time: - neo@8c974485 However with nexedi/neoppod!21 a common solution was found. With neo!7 NEO/go and NEO/py URI formats are in sync again. We therefore now need to update 'wendelin.core' to support the finally agreed on URI format. /reviewed-by @kirr /reviewed-on nexedi/wendelin.core!28
-
Levin Zimmermann authored
With kirr/neo@95572d6a we synchronized NEO/go URI format with NEO/py URI format. We need this new NEO/go version to apply this synchronization to 'wendelin.core' ZODB tools (what we'll do in the next patches). /reviewed-by @kirr /reviewed-on nexedi/wendelin.core!28
-
- 22 Jul, 2024 1 commit
-
-
Kirill Smelkov authored
This semantically reverts 99f262dd (bigfile/zodb: Make auto format the default) for wendelin.core-1 mode because in non-WCFS mode there are known problems with data corruption on BTree topology changes(*), and auto mode actually does change those topologies by first setting ZBigFile[blk] -> ZBlk1 and then updating the same block to point to a ZBlk0 object. Avoid triggering those problems and use auto as default only in WCFS mode, which should handle invalidations with all those BTree topology changes well. The patch is based on a suggestion by Levin Zimmermann: nexedi/wendelin.core!20 (comment 212405) We have to move _default_use_wcfs because now it is invoked at module import time and needs to be already defined at the time of the call. (*) see nexedi/wendelin.core@8c32c9f6 for details. /reviewed-by @levin.zimmermann /reviewed-on nexedi/wendelin.core!29
-
- 25 Jun, 2024 6 commits
-
-
Carlos Ramos Carreño authored
Strings cannot be directly hashed without encoding them first, or an error will be raised:

```python
______________________________ test_zsync_resync _______________________________

    @func
    def test_zsync_resync():
        zstor = testdb.getZODBStorage()
        defer(zstor.close)
>       db, zconn, wconn = _zsync_setup(zstor)

wcfs/client/_wczsync_test.py:112:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
../../venvs/wendelin.core/lib/python3.9/site-packages/decorator.py:232: in fun
    return caller(func, *(extras + args), **kw)
../pygolang/golang/__init__.py:125: in _
    return f(*argv, **kw)
wcfs/client/_wczsync_test.py:53: in _zsync_setup
    wc = wcfs.join(zurl)
wcfs/__init__.py:201: in join
    mntpt = _mntpt_4zurl(zurl)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

zurl = 'file:///srv/slapgrid/slappart66/tmp/testdb_fs.xstpbg49/1.fs'

    def _mntpt_4zurl(zurl):
        # normalize zurl so that even if we have e.g. two neos:// urls coming
        # with different paths to ssl keys, or with different order in the list of
        # masters, we still have them associated with the same wcfs mountpoint.
        zurl = zurl_normalize_main(zurl)

        m = hashlib.sha1()
>       m.update(zurl)
E       TypeError: Strings must be encoded before hashing
```

We fix this error by encoding the string as UTF8 before hashing it.

--------
kirr:

Use b instead of doing

    if isinstance(zurl, six.text_type):
        zurl = zurl.encode("utf-8")

wcfs already takes this approach of using b in other places - for example in tDB.change:

    # change schedules zf to be changed according to changeDelta at commit.
    #
    # changeDelta: {} blk -> data.
    # data can be both bytes and unicode.               <-- NOTE
    def change(t, zf, changeDelta):
        assert isinstance(zf, ZBigFile)
        zfDelta = t._changed.setdefault(zf, {})
        for blk, data in six.iteritems(changeDelta):
            data = b(data)                              <-- NOTE
            ...

/reviewed-by @kirr
/reviewed-on nexedi/wendelin.core!27
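For readers without pygolang at hand, the effect of encoding before hashing can be shown with the standard library alone. This is only a sketch: `sha1_of_zurl` is a hypothetical helper, and the explicit `isinstance` check stands in for the `b` call that the actual patch uses.

```python
import hashlib

def sha1_of_zurl(zurl):
    """Return the hex SHA-1 of zurl, accepting both str and bytes."""
    if isinstance(zurl, str):
        zurl = zurl.encode("utf-8")   # hashlib only accepts bytes-like input
    return hashlib.sha1(zurl).hexdigest()
```

Both the text and the bytes form of the same URL then hash to the same digest, so the derived mountpoint does not depend on the string type used by the caller.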
-
Carlos Ramos Carreño authored
Some modules and methods have changed names in Python 3. The `thread` module has been renamed to `_thread` and the old name gives an error when run on Python 3:

```python
Traceback:
/opt/slapgrid/b0df76c24a1d2728ccf3e276f07c1790/parts/python3/lib/python3.9/importlib/__init__.py:127: in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
wcfs/client/client_test.py:32: in <module>
    from wendelin.wcfs.wcfs_test import tDB, tAt, timeout, eprint
wcfs/wcfs_test.py:44: in <module>
    from thread import get_ident as gettid
E   ModuleNotFoundError: No module named 'thread'
```

In a similar vein, the `items` method of dictionaries plays the same role as the old `iteritems`. We use the `six` module to paper over these differences.

/reviewed-by @kirr
/reviewed-on nexedi/wendelin.core!27
-
Carlos Ramos Carreño authored
The builtin `zip` in Python 3 returns an iterator, not a list. Thus, one cannot directly call `len` on the object returned by `zip`, or we will have errors like the following one:

```python
Traceback (most recent call last):
  File "/srv/slapgrid/slappart66/git/wendelin.core/wcfs/internal/xbtree/xbtreetest/treegen.py", line 617, in <module>
    main()
  File "/srv/slapgrid/slappart66/git/wendelin.core/wcfs/internal/xbtree/xbtreetest/treegen.py", line 613, in main
    cmd(argv)
  File "/srv/slapgrid/slappart66/venvs/wendelin.core/lib/python3.9/site-packages/decorator.py", line 232, in fun
    return caller(func, *(extras + args), **kw)
  File "/srv/slapgrid/slappart66/git/pygolang/golang/__init__.py", line 125, in _
    return f(*argv, **kw)
  File "/srv/slapgrid/slappart66/git/wendelin.core/wcfs/internal/xbtree/xbtreetest/treegen.py", line 589, in cmd_trees
    TreesSrv(zstor, r)
  File "/srv/slapgrid/slappart66/venvs/wendelin.core/lib/python3.9/site-packages/decorator.py", line 232, in fun
    return caller(func, *(extras + args), **kw)
  File "/srv/slapgrid/slappart66/git/pygolang/golang/__init__.py", line 125, in _
    return f(*argv, **kw)
  File "/srv/slapgrid/slappart66/git/wendelin.core/wcfs/internal/xbtree/xbtreetest/treegen.py", line 234, in TreesSrv
    treetxtPrev = zctx.ztreetxt(ztree)
  File "/srv/slapgrid/slappart66/venvs/wendelin.core/lib/python3.9/site-packages/decorator.py", line 232, in fun
    return caller(func, *(extras + args), **kw)
  File "/srv/slapgrid/slappart66/git/pygolang/golang/__init__.py", line 125, in _
    return f(*argv, **kw)
  File "/srv/slapgrid/slappart66/git/wendelin.core/wcfs/internal/xbtree/xbtreetest/treegen.py", line 536, in ztreetxt
    return zctx.TopoEncode(xbtree.StructureOf(ztree))
  File "/srv/slapgrid/slappart66/venvs/wendelin.core/lib/python3.9/site-packages/decorator.py", line 232, in fun
    return caller(func, *(extras + args), **kw)
  File "/srv/slapgrid/slappart66/git/pygolang/golang/__init__.py", line 125, in _
    return f(*argv, **kw)
  File "/srv/slapgrid/slappart66/git/wendelin.core/wcfs/internal/xbtree/xbtreetest/treegen.py", line 542, in TopoEncode
    return xbtree.TopoEncode(tree, zctx.vencode)
  File "/srv/slapgrid/slappart66/git/wendelin.core/wcfs/internal/xbtree.py", line 797, in TopoEncode
    for nodev in _walkBFS(tree):
  File "/srv/slapgrid/slappart66/git/wendelin.core/wcfs/internal/xbtree.py", line 701, in _walkBFS
    for level in __walkBFS(tree):
  File "/srv/slapgrid/slappart66/git/wendelin.core/wcfs/internal/xbtree.py", line 724, in __walkBFS
    assert len(rv) == len(rn.node.children)
TypeError: object of type 'zip' has no len()
```

Thus, we have to create a list from the result of `zip` before calling `len` on it.

--------
kirr:

There were only two places where zip was used to build a list. All other places where zip is used - both in wcfs/xbtree and in other packages - are calling zip to iterate over zip result:

    (py39.venv) kirr@deca:~/src/wendelin/wendelin.core$ git grep -w zip
    bigarray/__init__.py: for n, s in zip(self.shape, self.stridev):
    bigarray/__init__.py: for n, s in zip(a.shape, a.strides):
    bigarray/array_zodb.py:BigArray_defaults = dict(zip(reversed(_.args), reversed(_.defaults)))
    wcfs/internal/xbtree.py: for i, (klo, khi) in enumerate(zip(v[:-1], v[1:])): # (klo, khi) = [] of (k_i, k_{i+1})
    wcfs/internal/xbtree.py: kvv = ['%s:%s' % (k,v) for (k,v) in zip(b.keyv, b.valuev)]
    wcfs/internal/xbtree.py: for (j,i) in zip(jv, iv):
    wcfs/internal/xbtree.py: for (child, k) in zip(node.children[1:], node.keyv):
    wcfs/internal/xbtree.py: for (k,v) in zip(node.keyv, node.valuev):
    wcfs/internal/xbtree.py: for (xlo, xhi) in zip(ksplitv[:-1], ksplitv[1:]): # (klo, s1), (s1, s2), ..., (sN, khi)
    wcfs/internal/xbtree.py: for (xlo, xhi) in zip(ksplitv[:-1], ksplitv[1:]): # (klo, s1), (s1, s2), ..., (sN, khi)
    wcfs/internal/xbtree.py: for (k,vtxt) in zip(node.keyv, vtxtv)])
    wcfs/internal/xbtree/xbtreetest/treegen.py: for (k,v) in zip(node.keyv, node.valuev):
    wcfs/internal/xbtree_test.py: for (child, childOK) in zip(kids, children):
    wcfs/internal/xbtree_test.py: for (i,(k,v)) in enumerate(zip(keys, values)):

    # handled in hereby patch
    wcfs/internal/xbtree.py: rv = list(zip(v[:-1], v[1:])) # (klo,k1), (k1,k2), ..., (kN,khi)
    wcfs/internal/xbtree.py: rv = list(zip(v[:-1], v[1:])) # (klo,k1), (k1,k2), ..., (kN,khi)

/reviewed-by @kirr
/reviewed-on nexedi/wendelin.core!27
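The difference is easy to reproduce in isolation. The helper below is a hypothetical stand-in (`adjacent_pairs` is not a function from xbtree.py) that mirrors the `list(zip(v[:-1], v[1:]))` pattern from the patch.

```python
def adjacent_pairs(v):
    """Return [(v0, v1), (v1, v2), ...] as a real list.

    Wrapping zip in list() is what makes len() and indexing work on
    Python 3, where zip itself is a lazy iterator.
    """
    return list(zip(v[:-1], v[1:]))
```

Code that only iterates over the result can keep using bare `zip`; materializing is needed only where the length or random access is required.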
-
Carlos Ramos Carreño authored
`numpy.object` was an alias for the builtin `object`, so we can use `object` instead: ```python _________________________ test_bigarray_noobject[tRAM] _________________________ testbig = <bigarray.tests.test_basic.tRAM object at 0x7f6d114ead60> def test_bigarray_noobject(testbig): Zh = testbig.fopen() # NOTE str & unicode are fixed-size types - if size is not explicitly given # it will become S0 or U0 > obj_dtypev = [numpy.object, 'O', 'i4, O', [('x', 'i4'), ('y', 'i4, O')]] bigarray/tests/test_basic.py:110: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ attr = 'object' def __getattr__(attr): # Warn for expired attributes, and return a dummy function # that always raises an exception. import warnings import math try: msg = __expired_functions__[attr] except KeyError: pass else: warnings.warn(msg, DeprecationWarning, stacklevel=2) def _expired(*args, **kwds): raise RuntimeError(msg) return _expired # Emit warnings for deprecated attributes try: val, msg = __deprecated_attrs__[attr] except KeyError: pass else: warnings.warn(msg, DeprecationWarning, stacklevel=2) return val if attr in __future_scalars__: # And future warnings for those that will change, but also give # the AttributeError warnings.warn( f"In the future `np.{attr}` will be defined as the " "corresponding NumPy scalar.", FutureWarning, stacklevel=2) if attr in __former_attrs__: > raise AttributeError(__former_attrs__[attr]) E AttributeError: module 'numpy' has no attribute 'object'. E `np.object` was a deprecated alias for the builtin `object`. To avoid this error in existing code, use `object` by itself. Doing this will not modify any behavior and is safe. 
E The aliases was originally deprecated in NumPy 1.20; for more details and guidance see the original release note at: E https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations ``` -------- kirr: On py2: In [1]: import numpy In [2]: numpy.__version__ Out[2]: '1.16.6' In [3]: numpy.object Out[3]: object In [4]: numpy.object is object Out[4]: True this change is, thus, indeed safe to make. /reviewed-by @kirr /reviewed-on nexedi/wendelin.core!27
-
Carlos Ramos Carreño authored
NEO is still not ported to Python 3. Importing NEO globally thus makes pytest tests fail during the assert-rewriting step:

```python
../../venvs/wendelin.core/lib/python3.9/site-packages/_pytest/assertion/rewrite.py:178: in exec_module
    exec(co, module.__dict__)
lib/tests/test_zodb.py:36: in <module>
    from neo.client.Storage import Storage as NEOStorage
../neoppod/neo/client/__init__.py:52: in <module>
    from . import app # set up signal handlers early enough to do it in the main thread
E     File "/srv/slapgrid/slappart66/git/neoppod/neo/client/app.py", line 356
E       except NEOStorageReadRetry, e:
E       ^
E   SyntaxError: invalid syntax
```

An MR adding enough support to not fail at import time is proposed in nexedi/neoppod!24. However, that MR will not be reviewed until the vacation period is over. In the meantime, and as a preliminary step to make running NEO tests optional, the import has been moved inside the function loading NEO. Thus, only the tests that require NEO will fail.

--------
kirr: Add TODO to revert to import NEO globally after lab.nexedi.com/nexedi/neoppod/-/merge_requests/24 is landed.

/reviewed-by @kirr
/reviewed-on nexedi/wendelin.core!27
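Deferring an import to call time can be sketched as below. The function name `load_backend` is hypothetical and the module is looked up by name through `importlib` so that the example stays runnable without NEO installed; the actual patch simply places the `from neo... import ...` statement inside the function that needs it.

```python
import importlib

def load_backend(modname):
    """Import `modname` only when a test actually asks for that backend.

    A module that cannot be imported then breaks only the callers that
    need it, instead of breaking collection of the whole test file.
    """
    return importlib.import_module(modname)   # raises ImportError if unavailable
```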
-
Carlos Ramos Carreño authored
`ZEO[test]` should be installed when testing, so that `zope.testing` is installed. Otherwise, an import error may be raised when running the test if `zope.testing` has not been manually installed. -------- kirr: Adjust tox.ini to no longer install zope.testing explicitly since we are now requesting ZEO[test] under our own tests extra, and tox.ini installs .[test] as the primary package. /reviewed-by @kirr /reviewed-on nexedi/wendelin.core!27
-
- 21 Jun, 2024 1 commit
-
-
Kirill Smelkov authored
Because of the way wendelin.core organizes its in-tree python importing redirector (see wendelin.py) it is possible to import the same module twice with python thinking it is importing two different modules. For example when installed in develop mode python resolves the following imports to the same bigfile/__init__.py import wendelin.bigfile import bigfile but tries to load that module twice and independently. Which leads to virtmem DSO, linked to from under bigfile/_bigfile extension, being initialized twice and complaining about that because only single gil hook should be requested to be installed: (py39.venv) kirr@deca:~/src/wendelin/wendelin.core$ python Python 3.9.19+ (heads/3.9:40d77b93672, Apr 12 2024, 06:40:05) [GCC 12.2.0] on linux Type "help", "copyright", "credits" or "license" for more information. >>> import wendelin.bigfile >>> import bigfile python: bigfile/virtmem.c:106: virt_lock_hookgil: Assertion `!(virtmem_gilhooks)' failed. Аварийный останов This problem was there from day 1, but it was not creating issues in practice because wendelin.core users do `import wendelin...` and there was also no problem with running pytest in the source tree. However with py39 and pytest8 we see that running pytest somehow started to unconditionally import things from under two namespaces which leads to inability to run tests even when instructing pytest to collect them via python-modules namespace instead of filesystem: (py39.venv) kirr@deca:~/src/wendelin/wendelin.core$ pytest -vsx --pyargs wendelin.bigfile.tests.test_basic ======================== test session starts ======================== platform linux -- Python 3.9.19+, pytest-8.2.2, pluggy-1.5.0 -- /home/kirr/src/wendelin/venv/py39.venv/bin/python3.9 cachedir: .pytest_cache rootdir: /home/kirr/src/wendelin/wendelin.core configfile: pyproject.toml collecting ... python3.9: bigfile/virtmem.c:106: virt_lock_hookgil: Assertion `!(virtmem_gilhooks)' failed. 
Fatal Python error: Aborted Current thread 0x00007fa172a60740 (most recent call first): File "<frozen importlib._bootstrap>", line 228 in _call_with_frames_removed File "<frozen importlib._bootstrap_external>", line 1173 in create_module File "<frozen importlib._bootstrap>", line 565 in module_from_spec File "<frozen importlib._bootstrap>", line 666 in _load_unlocked File "<frozen importlib._bootstrap>", line 986 in _find_and_load_unlocked File "<frozen importlib._bootstrap>", line 1007 in _find_and_load File "/home/kirr/src/wendelin/wendelin.core/bigfile/__init__.py", line 31 in <module> File "<frozen importlib._bootstrap>", line 228 in _call_with_frames_removed File "<frozen importlib._bootstrap_external>", line 850 in exec_module File "<frozen importlib._bootstrap>", line 680 in _load_unlocked File "<frozen importlib._bootstrap>", line 986 in _find_and_load_unlocked File "<frozen importlib._bootstrap>", line 1007 in _find_and_load File "<frozen importlib._bootstrap>", line 1030 in _gcd_import File "<frozen importlib._bootstrap>", line 228 in _call_with_frames_removed File "<frozen importlib._bootstrap>", line 972 in _find_and_load_unlocked File "<frozen importlib._bootstrap>", line 1007 in _find_and_load File "<frozen importlib._bootstrap>", line 1030 in _gcd_import File "<frozen importlib._bootstrap>", line 228 in _call_with_frames_removed File "<frozen importlib._bootstrap>", line 972 in _find_and_load_unlocked File "<frozen importlib._bootstrap>", line 1007 in _find_and_load File "<frozen importlib._bootstrap>", line 1030 in _gcd_import File "/home/kirr/local/py3.9/lib/python3.9/importlib/__init__.py", line 127 in import_module File "/home/kirr/src/wendelin/venv/py39.venv/lib/python3.9/site-packages/_pytest/pathlib.py", line 591 in import_path File "/home/kirr/src/wendelin/venv/py39.venv/lib/python3.9/site-packages/_pytest/python.py", line 492 in importtestmodule ... 
File "/home/kirr/src/wendelin/venv/py39.venv/lib/python3.9/site-packages/_pytest/runner.py", line 567 in collect_one_node File "/home/kirr/src/wendelin/venv/py39.venv/lib/python3.9/site-packages/_pytest/main.py", line 837 in _collect_one_node File "/home/kirr/src/wendelin/venv/py39.venv/lib/python3.9/site-packages/_pytest/main.py", line 974 in genitems File "/home/kirr/src/wendelin/venv/py39.venv/lib/python3.9/site-packages/_pytest/main.py", line 811 in perform_collect File "/home/kirr/src/wendelin/venv/py39.venv/lib/python3.9/site-packages/_pytest/main.py", line 349 in pytest_collection ... File "/home/kirr/src/wendelin/venv/py39.venv/lib/python3.9/site-packages/_pytest/config/__init__.py", line 178 in main File "/home/kirr/src/wendelin/venv/py39.venv/lib/python3.9/site-packages/_pytest/config/__init__.py", line 206 in console_main File "/home/kirr/src/wendelin/venv/py39.venv/bin/pytest", line 8 in <module> Аварийный останов (образ памяти сброшен на диск) This happens because wendelin.bigfile is importing wendelin.bigfile._bigfile as `from ._bigfile import ...` which under pytest leads to importing both wendelin.bigfile._bigfile and bigfile._bigfile and further conflicting when setting up GIL hooks. -> Fix this issue by avoiding relative imports and always referring to wendelin.core modules with `wendelin.` prefix. The list of places where relative imports were used was small and found via $ git grep -w import |grep '\s\.' bigfile/__init__.py:from ._bigfile import BigFile, WRITEOUT_STORE, WRITEOUT_MARKSTORED, ram_reclaim wcfs/__init__.py:from .client._wcfs import \ Everywhere else we were already importing things from under wendelin namespace via fully specified module path. After the fix both $ pytest -vsx --pyargs wendelin.bigfile.tests.test_basic and $ pytest -vsx bigfile/tests/test_basic.py start to work ok from inside the worktree. /reported-and-tested-by @vnmabus /reviewed-by @levin.zimmermann /reviewed-on nexedi/wendelin.core!26
-
- 07 Jun, 2024 1 commit
-
-
Carlos Ramos Carreño authored
When building an editable wheel it is not necessary that `build_packages` (or even `run`) is called before calling `get_outputs` (notice the following in https://setuptools.pypa.io/en/latest/userguide/extension.html#supporting-sdists-and-editable-installs-in-build-sub-commands : "Please note that custom sub-commands SHOULD NOT rely on `run()` being executed (or not) to provide correct return values for `get_outputs()`, `get_output_mapping()` or `get_source_files()`. The `get_*` methods should work independently of `run()`.").

Our implementation relied on the call to `build_packages` to set the name of the synthetic init file. This commit uses a property of the object instead, to compute that name whenever it is necessary.

With this change, it is now possible to make editable wheels.

--------
kirr:

Without the fix `pip install -e` fails as follows on py3:

    Traceback (most recent call last):
      File "/home/kirr/src/wendelin/venv/py39.venv/lib/python3.9/site-packages/setuptools/command/editable_wheel.py", line 155, in run
        self._create_wheel_file(bdist_wheel)
      File "/home/kirr/src/wendelin/venv/py39.venv/lib/python3.9/site-packages/setuptools/command/editable_wheel.py", line 357, in _create_wheel_file
        files, mapping = self._run_build_commands(dist_name, unpacked, lib, tmp)
      File "/home/kirr/src/wendelin/venv/py39.venv/lib/python3.9/site-packages/setuptools/command/editable_wheel.py", line 281, in _run_build_commands
        files, mapping = self._collect_build_outputs()
      File "/home/kirr/src/wendelin/venv/py39.venv/lib/python3.9/site-packages/setuptools/command/editable_wheel.py", line 266, in _collect_build_outputs
        files.extend(cmd.get_outputs() or [])
      File "<string>", line 137, in get_outputs
      File "/home/kirr/src/wendelin/venv/py39.venv/lib/python3.9/site-packages/setuptools/command/build_py.py", line 78, in __getattr__
        return orig.build_py.__getattr__(self, attr)
      File "/home/kirr/src/wendelin/venv/py39.venv/lib/python3.9/site-packages/setuptools/_distutils/cmd.py", line 107, in __getattr__
        raise AttributeError(attr)
    AttributeError: initfile

/reviewed-by @kirr
/reviewed-on nexedi/wendelin.core!25
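The idea of deriving the file name on demand, rather than storing it as a side effect of a build step, can be sketched with a plain class. This is not the setuptools command from setup.py: `BuildPySketch`, its constructor arguments and the path layout are assumptions; only the attribute name `initfile` is taken from the traceback.

```python
import os

class BuildPySketch:
    """Stand-in for a build sub-command with a synthetic __init__.py."""

    def __init__(self, build_lib, pkg="wendelin"):
        self.build_lib = build_lib
        self.pkg = pkg
        self.ran = False

    @property
    def initfile(self):
        # Computed whenever it is needed, so it exists even if run() or
        # build_packages() were never called.
        return os.path.join(self.build_lib, self.pkg, "__init__.py")

    def run(self):
        self.ran = True          # the real command would write the file here

    def get_outputs(self):
        return [self.initfile]   # works independently of run()
```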
-
- 31 May, 2024 2 commits
-
-
Carlos Ramos Carreño authored
The interface of the function `search_include_directories` has changed in Cython 3.0a7 in https://github.com/cython/cython/commit/f3f7b612. This updates the replacement used by wendelin so that it works for both newer and older versions. Note that wendelin.core still does not work with Cython >= 3, as that version refuses to compile Python functions that can throw C++ exceptions (apparently, mixing C++ exceptions and Cython-generated code is not considered safe). /reviewed-by @kirr /reviewed-on nexedi/wendelin.core!24
-
Carlos Ramos Carreño authored
NEO is still not ported to Python 3, so Python 3 tests should not use this backend. /reviewed-by @kirr /reviewed-on nexedi/wendelin.core!24
-
- 03 Apr, 2024 2 commits
-
-
Levin Zimmermann authored
If a user doesn't explicitly declare a ZBlk format, it can be assumed that this user wants the best ratio between consumed storage space and data access speed. Currently the best ratio between these two is provided by the new 'auto' (heuristic) format. In case of small appends this format helps reduce storage space, and in any other case it just behaves like ZBlk0 [1]. Therefore this default ensures a fast access speed [2], but also avoids a massive data growth in case of many small appends [3].

[1] An exception to this: in its current implementation a block behaves like ZBlk1 (slow access) as long as it isn't fully filled up yet.
[2] This was stated as the reason why ZBlk1 as the default format was reverted in 0b68f178.
[3] This was perhaps the reason why ZBlk1 was set to be the default format in 9ae42085. The massive storage space consumption can already be a problem with a few arrays to which small data is regularly appended, as can easily happen with Wendelin development instances.

/reviewed-by @kirr
/reviewed-on !20
-
Levin Zimmermann authored
There are two formats to save data with a ZBigFile: ZBlk0 and ZBlk1. They trade access time against disk-space growth differently: ZBlk1 is better regarding disk space, while ZBlk0 has a better access time. Wendelin.core users may not always know, or care, which format fits their data better. In this case it may be easier for users to just let the program select the ZBlk format automatically. With this patch and the new 'auto' (for heuristic) option of the 'ZBlk' argument of ZBigFile, this is now possible.

The 'auto' option isn't really a new ZBlk format in itself; it just tries to automatically select the best ZBlk format according to the characteristics of the changes that the user applies to the ZBigFile. In its current implementation, the heuristic tackles the use-case of large arrays with many small append-only changes. In this case 'auto' is smaller in space than ZBlk0, but faster to read than ZBlk1. It does so by initially using ZBlk1 until a blk is filled up. Once a blk is full, it switches to ZBlk0, as was recommended by @kirr in !20 (comment 196084).

With this patch comes a test (bigfile/tests/bench_zblkfmt) that creates benchmarks for different combinations and zblk formats.
The test aims to check how the 'heuristic' format performs in contrast to 'ZBlk0' and 'ZBlk1':

    BenchmarkAppendSize/zblk=ZBlk0/change_count=500/change_percentage_set=[0.014]   1   538.1 MB
    BenchmarkAppendRandRead/zblk=ZBlk0/change_count=500/change_percentage_set=[0.014]   6   2.085 ms/blk
    BenchmarkAppendSize/zblk=ZBlk1/change_count=500/change_percentage_set=[0.014]   1   16.8 MB
    BenchmarkAppendRandRead/zblk=ZBlk1/change_count=500/change_percentage_set=[0.014]   6   14.564 ms/blk
    BenchmarkAppendSize/zblk=auto/change_count=500/change_percentage_set=[0.014]   1   29.4 MB
    BenchmarkAppendRandRead/zblk=auto/change_count=500/change_percentage_set=[0.014]   6   2.119 ms/blk
    BenchmarkRandWriteSize/zblk=ZBlk0/arrsize=1000000/change_count=500/change_percentage_set=[0.2]   1   1021.1 MB
    BenchmarkRandWriteRandRead/zblk=ZBlk0/arrsize=1000000/change_count=500/change_percentage_set=[0.2]   3   2.324 ms/blk
    BenchmarkRandWriteSize/zblk=ZBlk1/arrsize=1000000/change_count=500/change_percentage_set=[0.2]   1   216.2 MB
    BenchmarkRandWriteRandRead/zblk=ZBlk1/arrsize=1000000/change_count=500/change_percentage_set=[0.2]   3   15.317 ms/blk
    BenchmarkRandWriteSize/zblk=auto/arrsize=1000000/change_count=500/change_percentage_set=[0.2]   1   219.8 MB
    BenchmarkRandWriteRandRead/zblk=auto/arrsize=1000000/change_count=500/change_percentage_set=[0.2]   3   14.027 ms/blk
    BenchmarkRandWriteSize/zblk=ZBlk0/arrsize=1000000/change_count=500/change_percentage_set=[1]   1   1048.6 MB
    BenchmarkRandWriteRandRead/zblk=ZBlk0/arrsize=1000000/change_count=500/change_percentage_set=[1]   3   2.126 ms/blk
    BenchmarkRandWriteSize/zblk=ZBlk1/arrsize=1000000/change_count=500/change_percentage_set=[1]   1   1070.4 MB
    BenchmarkRandWriteRandRead/zblk=ZBlk1/arrsize=1000000/change_count=500/change_percentage_set=[1]   3   14.284 ms/blk
    BenchmarkRandWriteSize/zblk=auto/arrsize=1000000/change_count=500/change_percentage_set=[1]   1   1070.3 MB
    BenchmarkRandWriteRandRead/zblk=auto/arrsize=1000000/change_count=500/change_percentage_set=[1]   3   14.072 ms/blk
    BenchmarkRandWriteSize/zblk=ZBlk0/arrsize=1000000/change_count=500/change_percentage_set=[0.2,1]   1   1046.4 MB
    BenchmarkRandWriteRandRead/zblk=ZBlk0/arrsize=1000000/change_count=500/change_percentage_set=[0.2,1]   3   2.137 ms/blk
    BenchmarkRandWriteSize/zblk=ZBlk1/arrsize=1000000/change_count=500/change_percentage_set=[0.2,1]   1   638.2 MB
    BenchmarkRandWriteRandRead/zblk=ZBlk1/arrsize=1000000/change_count=500/change_percentage_set=[0.2,1]   3   14.083 ms/blk
    BenchmarkRandWriteSize/zblk=auto/arrsize=1000000/change_count=500/change_percentage_set=[0.2,1]   1   639.5 MB
    BenchmarkRandWriteRandRead/zblk=auto/arrsize=1000000/change_count=500/change_percentage_set=[0.2,1]   3   13.937 ms/blk

and post-processed with benchstat from 3 such runs:

                                                                                             │    x.log     │
                                                                                             │      B       │
    AppendSize/zblk=ZBlk0/change_count=500/change_percentage_set=[0.014]                       513.2Mi ± 0%
    AppendSize/zblk=ZBlk1/change_count=500/change_percentage_set=[0.014]                       16.02Mi ± 0%
    AppendSize/zblk=auto/change_count=500/change_percentage_set=[0.014]                        28.04Mi ± 0%
    RandWriteSize/zblk=ZBlk0/arrsize=1000000/change_count=500/change_percentage_set=[0.2]      973.8Mi ± 0%
    RandWriteSize/zblk=ZBlk1/arrsize=1000000/change_count=500/change_percentage_set=[0.2]      206.2Mi ± 0%
    RandWriteSize/zblk=auto/arrsize=1000000/change_count=500/change_percentage_set=[0.2]       209.6Mi ± 0%
    RandWriteSize/zblk=ZBlk0/arrsize=1000000/change_count=500/change_percentage_set=[1]        1000.0Mi ± 0%
    RandWriteSize/zblk=ZBlk1/arrsize=1000000/change_count=500/change_percentage_set=[1]        1020.8Mi ± 0%
    RandWriteSize/zblk=auto/arrsize=1000000/change_count=500/change_percentage_set=[1]         1020.7Mi ± 0%
    RandWriteSize/zblk=ZBlk0/arrsize=1000000/change_count=500/change_percentage_set=[0.2,1]    997.9Mi ± 0%
    RandWriteSize/zblk=ZBlk1/arrsize=1000000/change_count=500/change_percentage_set=[0.2,1]    608.6Mi ± 0%
    RandWriteSize/zblk=auto/arrsize=1000000/change_count=500/change_percentage_set=[0.2,1]     609.9Mi ± 0%
    geomean                                                                                    353.0Mi

                                                                                                 │    x.log    │
                                                                                                 │   ms/blk    │
    AppendRandRead/zblk=ZBlk0/change_count=500/change_percentage_set=[0.014]                       2.094 ± 12%
    AppendRandRead/zblk=ZBlk1/change_count=500/change_percentage_set=[0.014]                       14.47 ± 1%
    AppendRandRead/zblk=auto/change_count=500/change_percentage_set=[0.014]                        2.168 ± 2%
    RandWriteRandRead/zblk=ZBlk0/arrsize=1000000/change_count=500/change_percentage_set=[0.2]      2.324 ± 1%
    RandWriteRandRead/zblk=ZBlk1/arrsize=1000000/change_count=500/change_percentage_set=[0.2]      13.73 ± 12%
    RandWriteRandRead/zblk=auto/arrsize=1000000/change_count=500/change_percentage_set=[0.2]       13.60 ± 3%
    RandWriteRandRead/zblk=ZBlk0/arrsize=1000000/change_count=500/change_percentage_set=[1]        2.125 ± 2%
    RandWriteRandRead/zblk=ZBlk1/arrsize=1000000/change_count=500/change_percentage_set=[1]        14.18 ± 3%
    RandWriteRandRead/zblk=auto/arrsize=1000000/change_count=500/change_percentage_set=[1]         14.17 ± 1%
    RandWriteRandRead/zblk=ZBlk0/arrsize=1000000/change_count=500/change_percentage_set=[0.2,1]    2.118 ± 1%
    RandWriteRandRead/zblk=ZBlk1/arrsize=1000000/change_count=500/change_percentage_set=[0.2,1]    13.85 ± 2%
    RandWriteRandRead/zblk=auto/arrsize=1000000/change_count=500/change_percentage_set=[0.2,1]     13.80 ± 1%
    geomean                                                                                        6.423

See !20 and kirr/wendelin.core@da765ef7...0c6f0850 for the preliminary history of this patch.

Co-authored-by: Kirill Smelkov <kirr@nexedi.com>
Fix typo.
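The core idea of the 'auto' heuristic described in this commit message can be sketched as follows. This is a deliberately simplified illustration, not the code from the patch: the function name `choose_zblk_format` is hypothetical, and the criterion used here for "a blk is filled up" (non-zero data reaches the end of the block) is an assumption made only for the example.

```python
def choose_zblk_format(blkdata: bytes, blksize: int) -> str:
    """Pick a ZBlk format for one block (simplified, hypothetical sketch).

    While a block is still being filled by small appends, a space-efficient
    format is preferred; once the block is full, a fast-to-read format is used.
    """
    # Assumption: trailing zero bytes mean "not written yet".
    used = len(blkdata.rstrip(b"\0"))
    if used >= blksize:
        return "ZBlk0"  # full block: favour access time
    return "ZBlk1"      # partially filled block: favour disk space


# A block that is only partially filled, and the same block once it is full.
partial = choose_zblk_format(b"abc" + b"\0" * 5, 8)
full = choose_zblk_format(b"abcdefgh", 8)
```

This matches the benchmark pattern above only qualitatively: append-only workloads end up mostly with full blocks that read as fast as ZBlk0, while random small writes keep behaving like ZBlk1.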
-
- 29 Mar, 2024 1 commit
-
-
Kirill Smelkov authored
This is a utility function that we will need in the next patch to see how similar the data of two blocks are. We use numpy for the implementation because this code will be hot, and if we don't use optimized C routines writeout will become very slow. Quoting draft patch kirr/wendelin.core@3f631932 :

    -> Also optimize ndelta computation - when done in plain python just this part was taking a lot of time as timing for initial writeup showed:

        writeup with ZBlk0: ~20-25s
        writeup with ZBlk1: ~20-30s
        writeup with auto: was ~ 120s

    now, after switching to numpy for ndelta computation, whole runtime with 'auto' is taking ~ 35s. The whole runtime, if I observe benchmark execution correctly, is dominated by database writeup.

/reviewed-by @levin.zimmermann
/reviewed-on !20
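To illustrate why numpy helps here, below is a minimal sketch of a block-difference counter. It assumes that "ndelta" means the number of byte positions at which two blocks differ, with any length difference counted as differing bytes; the function names `ndelta_py` and `ndelta_np` are hypothetical and the real utility may be defined differently. The pure-Python fallback is included only so that the sketch runs without numpy.

```python
try:
    import numpy as np
except ImportError:  # keep the sketch runnable when numpy is not installed
    np = None


def ndelta_py(a: bytes, b: bytes) -> int:
    """Plain-Python version: one interpreter-level comparison per byte (slow)."""
    return sum(x != y for x, y in zip(a, b)) + abs(len(a) - len(b))


def ndelta_np(a: bytes, b: bytes) -> int:
    """numpy version: the comparison loop runs in optimized C code."""
    n = min(len(a), len(b))
    tail = abs(len(a) - len(b))  # bytes present in only one of the blocks
    if n == 0:
        return tail
    va = np.frombuffer(a, dtype=np.uint8, count=n)
    vb = np.frombuffer(b, dtype=np.uint8, count=n)
    return int(np.count_nonzero(va != vb)) + tail


# Use the fast path when available.
ndelta = ndelta_np if np is not None else ndelta_py

d_same_len = ndelta(b"abcd", b"abxd")  # one differing position
d_appended = ndelta(b"abc", b"abcde")  # two bytes exist only in the second block
```

Both versions compute the same value; only the cost per byte differs, which is what matters when the function is called for every written block.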
-