- 17 Jul, 2023 8 commits
-
-
Levin Zimmermann authored
/reviewed-by @kirr /reviewed-on kirr/neo!5 * t-fix-flaky-testload: fixup! neonet/newlink: Fix lost conn in encoding detector go/neo/neonet: Fix client handshake not to accept server encoding if it is different from what client indicated go/neo/neonet: Demonstrate problem in handshake with NEO/py go/neo/neonet: Dedicate an error type to indicate "protocol version mismatch" as handshake failure cause fixup! client_test: Keep NEO srv logs if test fails fixup! client_test/NEOSrv += LogContent for better debug neonet/newlink: Fix lost conn in encoding detector client_test: Keep NEO srv logs if test fails client_test += print NEO server log if >=1 test(s) failed client_test/NEOSrv += LogContent for better debug
-
Kirill Smelkov authored
* master: go/neo/neonet: Fix client handshake not to accept server encoding if it is different from what client indicated go/neo/neonet: Demonstrate problem in handshake with NEO/py go/neo/neonet: Dedicate an error type to indicate "protocol version mismatch" as handshake failure cause
-
Kirill Smelkov authored
-
Levin Zimmermann authored
go/neo/neonet: Fix client handshake not to accept server encoding if it is different from what client indicated If the peers encoding is different than our encoding two different scenarios can happen, because the handshake order is undefined (e.g. we don't know if our handshake is received before the peer sends its handshake): 1. Our handshake is received before peer sends its handshake, NEO/py closes connection if it sees unexpected magic, version, etc. 2. The client already sends a handshake before it proceeds our handshake. In this case it initally sends us it version, we can extract its encoding, and only later, once it proceeded our handshake with the bad encoding, it closes the connection. Before this patch case (2) wasn't handled correctly by the automatic encoding detection of 'DialLink'. 'DialLink' simply accepted the different-than-expected encoding, but once the peer proceeded the nodes handshake the peer closed the connection and the initially established and returned link was immediately closed again. Due to this it was good luck whether connecting with a peer different with an encoding different from the expected one worked or didn't work (it depended on which handshake was faster). Now 'DialLink' should reliably find the correct encoding and return a stable link. -------- kirr: this is based on the following original patch by Levin: f6b59772 I updated documentation throughout correspondingly and also added corresponding handshake-specific test in the previous patch. See kirr/neo!5 and b2da69e2 (go/neo/neonet: Demonstrate problem in handshake with NEO/py) for more in-depth description of the problem.
-
Kirill Smelkov authored
Levin Zimmerman discovered that sometimes NEO/py accepts our handshake hello with encoding 'M', then replies its owh handshake ehlo with encoding 'N' and then further terminates the connection. In other words it looks like that the handshake went successful, but it actually did not and NEO/py terminates the link after some time. This manifests itself e.g. as infrequent TestLoad failures on t branch with the following output: === RUN TestLoad/py/!ssl I: runneo.py: /tmp/neo776618506/1 !ssl: started master(s): 127.0.0.1:21151 === RUN TestLoad/py/!ssl/enc=N(dialTryOrder=N,M) client_test.go:598: skip: does not excercise client redial === RUN TestLoad/py/!ssl/enc=N(dialTryOrder=M,N) xtesting.go:330: load 0285cbac258bf266:0000000000000000: returned err unexpected: have: neo://127.0.0.1:21151/1: load 0285cbac258bf266:0000000000000000: dial S1: dial 127.0.0.1:40345 (STORAGE): 127.0.0.1:56678 - 127.0.0.1:40345: request identification: 127.0.0.1:56678 - 127.0.0.1:40345 .1: recv: EOF want: nil xtesting.go:330: load 0285cbac258bf266:0000000000000000: returned tid unexpected: 0000000000000000 ; want: 0285cbac258bf266 xtesting.go:330: load 0285cbac258bf266:0000000000000000: returned buf = nil xtesting.go:339: load 0285cbac258bf265:0000000000000000: returned err unexpected: have: neo://127.0.0.1:21151/1: load 0285cbac258bf265:0000000000000000: dial S1: dial 127.0.0.1:40345 (STORAGE): 127.0.0.1:56688 - 127.0.0.1:40345: request identification: 127.0.0.1:56688 - 127.0.0.1:40345 .1: recv: EOF want: neo://127.0.0.1:21151/1: load 0285cbac258bf265:0000000000000000: 0000000000000000: object was not yet created ... client_test.go:588: NEO log tail: log file 'storage_0.log' tail: 2023-07-17 17:21:57.1519 DEBUG S1 connection completed for <ServerConnection(nid=None, address=127.0.0.1:51230, handler=IdentificationHandler, fd=20, server) at 7f3583fd4f50> (from 127.0.0.1:40345) 2023-07-17 17:21:57.1537 WARNING S1 Protocol version mismatch with <ServerConnection(nid=None, address=127.0.0.1:51230, handler=IdentificationHandler, fd=20, server) at 7f3583fd4f50> 2023-07-17 17:21:57.1548 DEBUG S1 connection closed for <ServerConnection(nid=None, address=127.0.0.1:51230, handler=IdentificationHandler, closed, server) at 7f3583fd4f50> 2023-07-17 17:21:57.1551 WARNING S1 A connection was lost during identification 2023-07-17 17:22:00.1582 DEBUG S1 accepted a connection from 127.0.0.1:51236 2023-07-17 17:22:00.1585 DEBUG S1 connection completed for <ServerConnection(nid=None, address=127.0.0.1:51236, handler=IdentificationHandler, fd=20, server) at 7f3583fd4e90> (from 127.0.0.1:40345) 2023-07-17 17:22:00.1604 WARNING S1 Protocol version mismatch with <ServerConnection(nid=None, address=127.0.0.1:51236, handler=IdentificationHandler, fd=20, server) at 7f3583fd4e90> 2023-07-17 17:22:00.1622 DEBUG S1 connection closed for <ServerConnection(nid=None, address=127.0.0.1:51236, handler=IdentificationHandler, closed, server) at 7f3583fd4e90> 2023-07-17 17:22:00.1625 WARNING S1 A connection was lost during identification 2023-07-17 17:22:03.1663 DEBUG S1 accepted a connection from 127.0.0.1:51238 2023-07-17 17:22:03.1666 DEBUG S1 connection completed for <ServerConnection(nid=None, address=127.0.0.1:51238, handler=IdentificationHandler, fd=20, server) at 7f3583fd4d10> (from 127.0.0.1:40345) 2023-07-17 17:22:03.1674 WARNING S1 Protocol version mismatch with <ServerConnection(nid=None, address=127.0.0.1:51238, handler=IdentificationHandler, fd=20, server) at 7f3583fd4d10> 2023-07-17 17:22:03.1688 DEBUG S1 connection closed for <ServerConnection(nid=None, address=127.0.0.1:51238, handler=IdentificationHandler, closed, server) at 7f3583fd4d10> 2023-07-17 17:22:03.1691 WARNING S1 A connection was lost during identification 2023-07-17 17:22:06.1714 DEBUG S1 accepted a connection from 127.0.0.1:57072 2023-07-17 17:22:06.1719 DEBUG S1 connection completed for <ServerConnection(nid=None, address=127.0.0.1:57072, handler=IdentificationHandler, fd=20, server) at 7f3583fd4b50> (from 127.0.0.1:40345) 2023-07-17 17:22:06.1727 WARNING S1 Protocol version mismatch with <ServerConnection(nid=None, address=127.0.0.1:57072, handler=IdentificationHandler, fd=20, server) at 7f3583fd4b50> 2023-07-17 17:22:06.1738 DEBUG S1 connection closed for <ServerConnection(nid=None, address=127.0.0.1:57072, handler=IdentificationHandler, closed, server) at 7f3583fd4b50> 2023-07-17 17:22:06.1738 WARNING S1 A connection was lost during identification log file 'master_0.log' tail: 2023-07-17 17:21:21.0799 PACKET M1 #0x0012 NotifyNodeInformation > A1 (127.0.0.1:37906) 2023-07-17 17:21:21.0799 PACKET M1 ! C0 | CLIENT | | RUNNING | 2023-07-17 14:21:21.079838 2023-07-17 17:21:21.0800 PACKET M1 #0x0102 NotifyNodeInformation > S1 (127.0.0.1:37918) 2023-07-17 17:21:21.0800 PACKET M1 ! C0 | CLIENT | | RUNNING | 2023-07-17 14:21:21.079838 2023-07-17 17:21:21.0801 DEBUG M1 Handler changed on <ServerConnection(nid=None, address=127.0.0.1:37966, handler=ClientServiceHandler, fd=18, server) at 7f3584245910> 2023-07-17 17:21:21.0802 PACKET M1 #0x0001 AnswerRequestIdentification > C0 (127.0.0.1:37966) 2023-07-17 17:21:21.0804 PACKET M1 #0x0000 NotifyNodeInformation > C0 (127.0.0.1:37966) 2023-07-17 17:21:21.0804 PACKET M1 ! C0 | CLIENT | | RUNNING | 2023-07-17 14:21:21.079838 2023-07-17 17:21:21.0804 PACKET M1 ! M1 | MASTER | 127.0.0.1:21151 | RUNNING | None 2023-07-17 17:21:21.0804 PACKET M1 ! S1 | STORAGE | 127.0.0.1:40345 | RUNNING | 2023-07-17 14:21:18.737469 2023-07-17 17:21:21.0805 PACKET M1 #0x0002 NotifyPartitionTable > C0 (127.0.0.1:37966) 2023-07-17 17:21:21.0810 PACKET M1 #0x0003 LastTransaction < C0 (127.0.0.1:37966) 2023-07-17 17:21:21.0811 PACKET M1 #0x0003 AnswerLastTransaction > C0 (127.0.0.1:37966) 2023-07-17 17:22:06.2053 DEBUG M1 <SocketConnectorIPv4 at 0x7f3584252d10 fileno 18 ('127.0.0.1', 21151), opened from ('127.0.0.1', 37966)> closed in recv 2023-07-17 17:22:06.2056 DEBUG M1 connection closed for <ServerConnection(nid=C0, address=127.0.0.1:37966, handler=ClientServiceHandler, closed, server) at 7f3584245910> 2023-07-17 17:22:06.2058 PACKET M1 #0x0014 NotifyNodeInformation > A1 (127.0.0.1:37906) 2023-07-17 17:22:06.2058 PACKET M1 ! C0 | CLIENT | | UNKNOWN | 2023-07-17 14:21:21.079838 2023-07-17 17:22:06.2059 PACKET M1 #0x0104 NotifyNodeInformation > S1 (127.0.0.1:37918) 2023-07-17 17:22:06.2059 PACKET M1 ! C0 | CLIENT | | UNKNOWN | 2023-07-17 14:21:21.079838 The problem is due to that my analysis from e407f725 (go/neo/neonet: Rework handshake to differentiate client and server parts) turned out to be incorrect. Quoting that patch: -> Rework handshake so that client always sends its hello first, and only then the server side replies. This matches actual NEO/py behaviour: https://lab.nexedi.com/nexedi/neoppod/blob/v1.12-67-g261dd4b4/neo/lib/connector.py#L293-294 even though the "NEO protocol" states that Handshake transmissions are not ordered with respect to each other and can go in parallel. ( https://neo.nexedi.com/P-NEO-Protocol.Specification.2019?portal_skin=CI_slideshow#/9/2 ) If I recall correctly that sentence was authored by me in 2018 based on previous understanding of should-be full symmetry in-between client and server. so here "This matches actual NEO/py behaviour" was wrong: even though https://lab.nexedi.com/nexedi/neoppod/blob/v1.12-67-g261dd4b4/neo/lib/connector.py#L293-294 indeed says that # The NEO protocol is such that a client connection is always the # first to send a packet, as soon as the connection is established, in reality it is not the case as SocketConnector always queues handshake hello upon its creation before receiving anything from remote side: https://lab.nexedi.com/nexedi/neoppod/blob/v1.12-93-gfd87e153/neo/lib/connector.py#L77-78 . In practice this leads to that in non-SSL case NEO/py server might be fast enough to send its prepared hello before receiving hello from us. Levin also explains at kirr/neo!5 (comment 187429): I think what happens is this: the NEO protocol doesn't specify in which order handshakes happen after initial dial. If the peer sends a handshake before receiving our handshake and if this peers handshake is received by us, 'DialLink' assumes everything is fine (no err is returned), it breaks the loop and returns the link. But then, very little time later, when the peer finally receives our handshake, this looks strange for the peer and it closes the connection. So in my understanding this should be fixed by explicitly comparing the encodings between our expected one and what the peer provided us. If encodings don't match we should retry with a new encoding in order to prevent the peer from closing the connection. For me this also explains why sometimes the tests passed and sometimes didn't: it depended on which node was faster ('race condition'). -> In this patch we add correspondig handshake test that demonstrates this problem. It currently fails as --- FAIL: TestHandshake (0.01s) --- FAIL: TestHandshake/enc=N (0.00s) newlink_test.go:154: handshake encoding mismatch: client: unexpected error: have: <nil> "<nil>" want: &neonet._HandshakeError{LocalRole:1, LocalAddr:net.pipeAddr{}, RemoteAddr:net.pipeAddr{}, Err:(*neonet._EncodingMismatchError)(0xc0000a4190)} "pipe - pipe: handshake (client): protocol encoding mismatch: peer = 'M' ; our side = 'N'" --- FAIL: TestHandshake/enc=M (0.00s) newlink_test.go:154: handshake encoding mismatch: client: unexpected error: have: <nil> "<nil>" want: &neonet._HandshakeError{LocalRole:1, LocalAddr:net.pipeAddr{}, RemoteAddr:net.pipeAddr{}, Err:(*neonet._EncodingMismatchError)(0xc0001a22cc)} "pipe - pipe: handshake (client): protocol encoding mismatch: peer = 'N' ; our side = 'M'" We will fix it in the next patch. /reported-by @levin.zimmermann /reported-on kirr/neo!5
-
Kirill Smelkov authored
go/neo/neonet: Dedicate an error type to indicate "protocol version mismatch" as handshake failure cause We will soon need to detect if a handshake failure was due to mismatch of protocol encodings and that would require introduction of dedicated error type for that cause. As a preparatory step first refactor "version mismatch cause" to follow the same style for symmetry.
-
Kirill Smelkov authored
-
Kirill Smelkov authored
LogContent -> LogTail py/neolog -> `python -m neo.scripts.neolog` FIXME for glog FIXME for admin_0.log and other log files
-
- 07 Jul, 2023 1 commit
-
-
Levin Zimmermann authored
If the peers encoding is different than our encoding two different scenarios can happen, because the handshake order is undefined (e.g. we don't know if our handshake is received before the peer sends its handshake): 1. Our handshake is received before peer sends its handshake, NEO/py closes connection if it sees unexpected magic, version, etc. 2. The client already sends a handshake before it proceeds our handshake. In this case it initally sends us it version, we can extract its encoding, and only later, once it proceeded our handshake with the bad encoding, it closes the connection. Before this patch case (2) wasn't handled correctly by the automatic encoding detection of 'DialLink'. 'DialLink' simply accepted the different-than-expected encoding, but once the peer proceeded the nodes handshake the peer closed the connection and the initially established and returned link was immediately closed again. Due to this it was good luck whether connecting with a peer different with an encoding different from the expected one worked or didn't work (it depended on which handshake was faster). Now 'DialLink' should reliably find the correct encoding and return a stable link.
-
- 29 Jun, 2023 3 commits
-
-
Levin Zimmermann authored
If a test fails we may want to invest further why this test fails and reading the NEO server log files can be an efficient way for these investigations. Therefore we should keep log files if a test fails.
-
Levin Zimmermann authored
-
Levin Zimmermann authored
If tests fail we may want to check NEO logs. The new function LogContent of the NEOSrv interface provides an API to fetch the relevant parts of the log file of a NEO server. This API can be used by test code in order to automatically print relevant server log information if a test fails.
-
- 24 May, 2023 2 commits
-
-
Kirill Smelkov authored
For example option `compress` controls kind of compression that _client_ performs when saving data to server. Similarly cache-size, logfile and read-only adjust on-client behaviour, not server. From nexedi/neoppod!18 (comment 124725) : In general the most correct thing to do is: - use host part for where to connect (host:port, list of host ports, UNIX socket, etc) - use path part to identify a database or other on-server resource - use query part for parameters that are passed to remote server (e.g. `storage` in case of ZEO) - use fragment part for local parameters that are not passed to remote server (e.g. local `logfile`) - use credentials part for things required to authenticate/encrypt. To normalize an URL wcfs client would drop credentials and fragment, but keep host, path and query. Fragments are documented not to be sent to remote side and to be evaluated by local side only. -> Move options that control client behaviour to fragment.
-
Kirill Smelkov authored
Because list of masters and cluster name must be already present in netloc and path. Previously e.g. neo://α,β,γ/db?master_nodes=a,b,c" would mean to use master nodes {a,b,c} not {α,β,γ}. Now it is treated as invalid URI to remove ambiguity. Same for cluster name.
-
- 18 Jan, 2023 3 commits
-
-
Kirill Smelkov authored
* master: go/neo: Expand user prefix in TLS key/cert paths go/neo/interal += xpath, xfilepath
-
Levin Zimmermann authored
This patch fixes a discrepancy between NEO/py and NEO/go: NEO/py expands the '~' and the '~username' prefix in the file path of the TLS certificate/key files [1]. This syntax is used in NEO/py SlapOS SR [2]. We need to fix this discrepancy in NEO/go in order to use TLS encryption with NEO + WCFS. [1] https://lab.nexedi.com/nexedi/neoppod/blob/7c539f0f/neo/lib/config.py#L149 and https://lab.nexedi.com/nexedi/neoppod/blob/fa63d856/neo/lib/app.py#L25-31 [2] https://lab.nexedi.com/nexedi/slapos/blob/397726e1/stack/erp5/instance-zodb-base.cfg.in#L18-20 and https://lab.nexedi.com/nexedi/slapos/blob/a8150a1a/software/neoppod/instance-neo-input-schema.json#L62 /reviewed-by @kirr /reviewed-on kirr/neo!1
-
Levin Zimmermann authored
The xfilepath package supports resolving filepaths with a user prefix to absolute paths: it converts '~' and '~username' to $HOME of user (as it's done by for instance bash). No builtin golang module supports this functionality [1]. We need this functionality in order to imitate the behaviour of NEO/py in NEO/go [2]. --- [1] https://stackoverflow.com/questions/47261719/how-can-i-resolve-a-relative-path-to-absolute-path-in-golang) [2] nexedi/slapos!1307 (comment 17574) /reviewed-by @kirr /reviewed-on kirr/neo!1
-
- 04 Nov, 2022 1 commit
-
-
Kirill Smelkov authored
-
- 03 Nov, 2022 1 commit
-
-
Kirill Smelkov authored
* master: go/neo/t/neotest: Use python -c 'print ...' in a way that works on both py2 and py3 go/neo/t/tzodb.py: Fix zhash for Python3
-
- 18 May, 2022 2 commits
-
-
Kirill Smelkov authored
Without parenthesis it was failing on py3: (neo) (py3.venv) (g.env) kirr@deca:~/src/neo/src/lab.nexedi.com/kirr/neo/go/neo/t$ ./neotest info-local date: Wed, 18 May 2022 11:05:50 +0300 xnode: kirr@deca.navytux.spb.ru (2401:5180:0:af::1 192.168.0.3 (+ 1·ipv4)) uname: Linux deca 5.10.0-13-amd64 #1 SMP Debian 5.10.106-1 (2022-03-17) x86_64 GNU/Linux cpu: Intel(R) Core(TM) i7-7600U CPU @ 2.80GHz File "<string>", line 1 print '%.2fGHz' % (400000 / 1E6) ^ SyntaxError: invalid syntax
-
Kirill Smelkov authored
On py3 dict.keys() returns iterator instead of list: $ ./tzodb.py zhash Traceback (most recent call last): File "/home/kirr/src/neo/src/lab.nexedi.com/kirr/neo/go/neo/t/./tzodb.py", line 141, in <module> main() File "/home/kirr/src/neo/src/lab.nexedi.com/kirr/neo/go/neo/t/./tzodb.py", line 138, in main zhash() File "/home/kirr/src/neo/src/lab.nexedi.com/kirr/neo/go/neo/t/./tzodb.py", line 59, in zhash optv, argv = getopt(sys.argv[2:], "h", ["help", "check=", "bench="] + hashRegistry.keys()) TypeError: can only concatenate list (not "dict_keys") to list
-
- 25 Nov, 2021 2 commits
-
-
Kirill Smelkov authored
* master: go/internal/xurl: New package (draft)
-
Kirill Smelkov authored
With xurl.ParseQuery being simpler analog of url.ParseQuery. Simpler: It returns regular map instead of url.Values by not allowing duplicates.
-
- 24 Nov, 2021 1 commit
-
-
Kirill Smelkov authored
Instead of requiring users to explicitly mark their tests as finished in particular subject-to-deadlock places, tracetest was fixed to unconditionally detect "extra sends" deadlocks automatically: kirr/go123@3b19f68c
-
- 19 Nov, 2021 2 commits
-
-
Kirill Smelkov authored
-
Kirill Smelkov authored
-
- 12 Nov, 2021 1 commit
-
-
Kirill Smelkov authored
go get: upgraded github.com/fsnotify/fsnotify v1.4.10-0.20200417215612-7f4cf4dd2b52 => v1.5.1 go get: upgraded github.com/golang/glog v0.0.0-20160126235308-23def4e6c14b => v1.0.0 go get: upgraded github.com/gwenn/gosqlite v0.0.0-20201008200117-82e079acf5b6 => v0.0.0-20211101095637-b18efb2e44c8 go get: upgraded github.com/gwenn/yacr v0.0.0-20200112083327-bbe82c1f4d60 => v0.0.0-20211101095056-492fb0c571bc go get: upgraded github.com/kisielk/og-rek v1.1.0 => v1.2.0 go get: upgraded github.com/soheilhy/cmux v0.1.5-0.20210205191134-5ec6847320e5 => v0.1.5 go get: upgraded github.com/tinylib/msgp v1.1.5 => v1.1.6 go get: upgraded golang.org/x/net v0.0.0-20210226172049-e18ecbb05110 => v0.0.0-20211111160137-58aab5ef257a go get: upgraded golang.org/x/sys v0.0.0-20210301091718-77cc2087c03b => v0.0.0-20211111213525-f221eed1c01e go get: upgraded golang.org/x/text v0.3.5 => v0.3.7 go get: upgraded lab.nexedi.com/kirr/go123 v0.0.0-20210128150852-c20e95f0f789 => v0.0.0-20210906140734-c9eb28d9e408 Highlights: - og-rek brings support for pickle protocol 5 - go123 brings in support for Go1.17
-
- 04 Oct, 2021 13 commits
-
-
Kirill Smelkov authored
* master: go/zodb/btree: Change V<op> family to also provide visited node key coverage on visit callback go/zodb/btree: Add KeyRange type go/zodb/btree: Introduce constants for min/max key value go/zodb/btree: tests: Don't forget to close storage go/zodb/btree: Cosmetics
-
Kirill Smelkov authored
WCFS needs to know key coverage for every visited node. WARNING: this is API change.
-
Kirill Smelkov authored
KeyRange represents [lo,hi) key range. It simplifies working with ranges of keys. We will use it in the next commit. KeyRange originated in WCFS and was copied from there: https://lab.nexedi.com/kirr/wendelin.core/blob/57be0126/wcfs/internal/xbtree/blib/keyrange.go
-
Kirill Smelkov authored
We were already using math.Min<Key> in one place, but the number of such places is going to increase. -> Keep min/max definition in only one place.
-
Kirill Smelkov authored
NOTE: db was already being closed in the test's code.
-
Kirill Smelkov authored
-
Kirill Smelkov authored
* t+btree: . . . . . . . X Start reworking BTree to provide keycov on visit callback
-
Kirill Smelkov authored
-
Kirill Smelkov authored
-
Kirill Smelkov authored
-
Kirill Smelkov authored
-
Kirill Smelkov authored
-
Kirill Smelkov authored
-