software/slaprunner: Fix resiliency (runner-sshd)
/reviewed-on nexedi/slapos!465
-
Developer
@guillaume.hervier Good to see that this issue is now fixed
👍 The commit message is not good however, as runner-sshd is a base service of the runner itself, and has nothing to do with the resiliency (the resilient stack deploys its own ssh server). Did you have time to check also the 2nd bug I've reported about the duplicate certificate authority service ? -
Indeed, i should have named the commit something like "Fix resilient webrunner"
About the duplicate
certificate-authority
bug, the problem is that this part is defined at multiples locations because of stack/monitor also defining this part, which create some conflicts on what part takes precedence, which makes the presence of twocertificate-authority
services, one from the slaprunner SR, one from the monitor stack...I don't actually have a good fix for that as the problem is mostly about parts conflicts between SRs and how we should name them, which I think could lead to bigger problems
-
Developer
which I think could lead to bigger problems
Indeed, for the moment we have something that "happens to work", but by using a bug in the Software Release.
which create some conflicts on what part takes precedence,
I don't know if you already digged in buildout's code (if not it will happen some day for sure), but in reality it is really simple (and stupid). You can overwrite options by re-defining its section in a higher-level template. For a same section name, the highest/latest section wins.
Which means that the latest
certificate-authority
declared wins. Now, if I look to a resilient webrunner .installed-switch-software-type, I have :- a section
[certificate-authority]
installing the service ~/etc/service/certificate_authority - a section
[certificate-authority-service]
installing the service ~/etc/service/certificate_authority-f570d062fd7fa79dc5303876e0c4575e - certificate_authority-f570d062fd7fa79dc5303876e0c4575e wrapping a ~/srv/runner/instance/slappart2/bin/certificate_authority that doesn't exist
In your commit 142d35d8 , I think you intended to make the section
[certificate-authority]
create a script in ~/bin, and you wanted to install the service with a new section[certificate-authority-service]
.As there is no ~/bin/certificate_authority but a ~/etc/service/certificate_authority, it seems that the
[certificate-authority]
section installed is the one from monitor, which means that the monitor buildout is overwriting the common instance-runner.cfg buildout.Because I was interested in the issue, I played a bit to understand what was happening. The result is here : Nicolas/slapos@0c519a52 . It seems fixing this issue of duplicated
[certificate-authority]
. Please give me your opinion on this commit. If it is positive, I'll run tests on it then merge.In conclusion, the bugs introduced by !409 (merged) could have been spotted before or right after merging into master, thanks to test results. At that time tests were broken, this is why we found it out so long after. So if someone breaks tests again, please crush them.
- a section