wcfs/wcfs_faultyprot_test.py · e145384aeb4cb99005fabc7d12e57fb78dd39006 · Kirill Smelkov / wendelin.core

wcfs: tests: Extend faulty protection tests with more kinds of faulty clients · e145384a

Kirill Smelkov authored Sep 16, 2024

So far we were testing only against a faulty client that reads the pin
notification OK but does not reply to it. But there could be more
problems:

1) a client does not read the pin notification at all
2) a client closes the watchlink abruptly after reading the pin notification
3) a client replies to the pin notification, but the reply is not "ack"
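The list above can be modelled as a small table of client behaviours, which is roughly the shape a parametrized test needs. This is a hypothetical sketch: the `Faulty` enum and the `react` helper are illustrative names, not taken from wcfs_faultyprot_test.py. The pin line format `<tag> pin <bigfileX> #<blk> @<at>` and the expected `<tag> ack` reply follow the wcfs watchlink protocol as I understand it, so treat the exact format as an assumption.

```python
import enum
from typing import Optional, Tuple


class Faulty(enum.Enum):
    """Hypothetical labels for the kinds of faulty clients (not wcfs names)."""
    NO_REPLY = enum.auto()  # reads the pin notification, never replies (already tested)
    NO_READ = enum.auto()   # 1) does not read the pin notification at all
    CLOSE = enum.auto()     # 2) closes watchlink right after reading the notification
    NOT_ACK = enum.auto()   # 3) replies, but with something other than "ack"


def react(kind: Optional[Faulty], pin: str) -> Tuple[bool, Optional[str]]:
    """Model one client's reaction to a pin notification.

    Returns (read, reply): whether the notification was consumed, and the
    line written back (None = nothing written, "" = watchlink closed).
    kind=None models a well-behaved client. Assumes `pin` looks like
    "<tag> pin <bigfileX> #<blk> @<at>" and that the expected reply is "<tag> ack".
    """
    tag = pin.split(" ", 1)[0]
    if kind is Faulty.NO_READ:
        return False, None          # notification stays unread
    if kind is Faulty.NO_REPLY:
        return True, None           # read, but silence afterwards
    if kind is Faulty.CLOSE:
        return True, ""             # read, then EOF on the watchlink
    if kind is Faulty.NOT_ACK:
        return True, f"{tag} hello" # read, but the reply is not "ack"
    return True, f"{tag} ack"       # correct behaviour
```

For example, `react(Faulty.CLOSE, "2 pin 0000000000000001 #3 @00000000000000ab")` gives `(True, "")`, while `react(None, ...)` on the same line gives `(True, "2 ack")`.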

The first problem, if not handled, leads to the whole set of clients
becoming stuck on reading the same block as the faulty client. The other
problems indicate a breakage of the isolation protocol on the client
side, after which wcfs can no longer be sure that it provides good,
uncorrupted data to that client.

In the first case, similarly to the "no reply" situation, we need to kill
the client to make progress while still maintaining safety. In cases 2
and 3 we cannot maintain safety if the faulty client remains in the set
of live and served clients, so it is also logical to send SIGBUS/SIGKILL
to it.
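The policy above boils down to one rule: any outcome of a pin exchange other than a timely "ack" gets the client killed. Below is a minimal sketch of that decision. The `verdict` function, its parameters and the "ok"/"kill" strings are hypothetical and are not the actual wcfs code. The `"<tag> ack"` reply format is assumed from the watchlink protocol.

```python
from typing import Optional


def verdict(read: bool, reply: Optional[str], tag: str) -> str:
    """Decide what to do with a client after one pin exchange (hypothetical).

    read:  whether the client consumed the pin notification before the timeout
    reply: line received back (None = timed out waiting, "" = watchlink closed)
    tag:   request tag of the pin notification; the expected reply is "<tag> ack"
    Returns "ok", or "kill" meaning the client is sent SIGBUS/SIGKILL.
    """
    if not read or reply is None:
        # stuck client ("no read" or "no reply"): kill it so that other
        # clients waiting on the same block can make progress
        return "kill"
    if reply == "":
        # case 2: watchlink closed, client isolation can no longer be ensured
        return "kill"
    if reply != f"{tag} ack":
        # case 3: protocol violated, the reply is not "ack"
        return "kill"
    return "ok"
```

Keeping one branch per failure mode, instead of a single `read and reply == ack` check, mirrors the list of faulty-client kinds. That layout makes it easy to see that every kind ends in the same action.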

Killing a client with SIGBUS is similar to how the OS kernel sends SIGBUS
when a memory-mapped file is accessed and loading the file data results in
EIO. It is also similar to wendelin.core 1, where SIGBUS is raised if
loading a file block results in an error.
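The kernel behaviour referred to above can be observed from Python on a POSIX system. An EIO is hard to provoke on demand, so this sketch uses a stand-in: it truncates the file under an existing mapping, which also leaves the kernel unable to supply data for the page, and the access then ends in SIGBUS. The access runs in a forked child so that only the child dies. The helper name `touch_truncated_mapping` is illustrative. The sketch assumes Linux-like SIGBUS semantics for pages past end-of-file.

```python
import mmap
import os
import signal
import tempfile


def touch_truncated_mapping() -> int:
    """Read a mmapped page whose file data was truncated away, in a child.

    Returns the number of the signal that terminated the child, or 0 if the
    child exited normally (i.e. no signal was delivered).
    """
    fd, path = tempfile.mkstemp()
    try:
        os.ftruncate(fd, mmap.PAGESIZE)      # file is exactly one page long
        m = mmap.mmap(fd, mmap.PAGESIZE)     # map that page (shared, read/write)
        os.ftruncate(fd, 0)                  # no file data backs the page anymore
        pid = os.fork()
        if pid == 0:                         # child
            try:
                m[0]                         # kernel cannot load the page -> SIGBUS
            finally:
                os._exit(0)                  # reached only if no signal arrived
        _, status = os.waitpid(pid, 0)       # parent: wait for the child
        m.close()
        return os.WTERMSIG(status) if os.WIFSIGNALED(status) else 0
    finally:
        os.close(fd)
        os.unlink(path)
```

On Linux, `touch_truncated_mapping() == signal.SIGBUS` is expected to hold. A wcfs client killed for a protocol violation sees the same kind of signal as it would if the mapped file data could not be loaded.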

Extend tests to cover all explained scenarios.


