Fix collective.recipe.shelloutput running "too early"
Our softwares using sshd were sometimes failing in tests, because the way they publish the key fingerprint was racy.
It is based on collective.recipe.shelloutput, which, as we can see in the recipe code, operates at the __init__ step.
We are using collective.recipe.shelloutput to capture the output of ssh-keygen -lf $KEY, and this must run after the file $KEY is generated (it is generated by another plone.recipe.command section). We were trying to run collective.recipe.shelloutput after plone.recipe.command, but that was incorrect anyway, because collective.recipe.shelloutput reads the file at the __init__ step, whereas plone.recipe.command creates the file at a later step.
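To make the ordering problem concrete, here is a minimal sketch of the two phases. It does not use the real recipes: CreateFile and ReadFile are hypothetical stand-ins, and run_buildout only assumes that every part is initialised before any part is installed.

```python
import os
import tempfile


class CreateFile:
    """Hypothetical stand-in for a part that creates a file when installed."""

    def __init__(self, path):
        self.path = path

    def install(self):
        with open(self.path, "w") as f:
            f.write("key material\n")


class ReadFile:
    """Hypothetical stand-in for a part that reads a file when initialised."""

    def __init__(self, path):
        try:
            with open(path) as f:
                self.output = f.read().strip()
        except OSError as e:
            # Assumption: the failure is captured as text instead of raised.
            self.output = "Error %s" % e

    def install(self):
        pass


def run_buildout(path):
    # Phase 1: every part is initialised, whatever the order of the parts.
    creator = CreateFile(path)
    reader = ReadFile(path)
    # Phase 2: parts are installed; the file only appears here.
    creator.install()
    reader.install()
    return reader.output


if __name__ == "__main__":
    key_path = os.path.join(tempfile.mkdtemp(), "key")
    print(run_buildout(key_path))  # first run: the file did not exist yet
    print(run_buildout(key_path))  # second run: the file is read
```

Reordering the two parts changes nothing in this sketch, because the read happens in phase 1 and the write in phase 2.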
As we could see in the test suite, it was failing when slapos node instance ran only once, but it was sometimes working, when slapos node instance ran more than once, for example because a promise failed and slapos node instance was retried.
Since collective.recipe.shelloutput does not take into account the exit code of the command but simply captures, with "Error ...", whatever the command might output on stderr, we add another step checking that the captured output is not "Error ..." and, if it is, causing a buildout error so that slapos node instance is retried and then succeeds.
What should happen now is:
- collective.recipe.shelloutput reads the key fingerprint, the file is not present so it captures "Error ..."
- plone.recipe.command creates the key
- plone.recipe.command checks that the captured fingerprint is not "Error ...", it fails
- buildout restarts
- collective.recipe.shelloutput reads the key fingerprint correctly.
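The sequence above can be sketched as follows. This is only an illustration under assumptions: BuildoutError, check_fingerprint and process_instance are hypothetical names, the check assumes that a failed capture starts with "Error", and the two callables stand in for the real recipes.

```python
class BuildoutError(Exception):
    """Hypothetical stand-in for the error that makes buildout fail."""


def check_fingerprint(captured):
    # Assumption: a failed capture is recognisable by its "Error" prefix.
    if captured.startswith("Error"):
        raise BuildoutError("fingerprint not available yet: %r" % captured)
    return captured


def process_instance(read_fingerprint, create_key, max_runs=3):
    """Repeat the steps until the check passes, as a retried run would."""
    for run in range(1, max_runs + 1):
        captured = read_fingerprint()  # capture step, runs first
        create_key()                   # key creation step, runs afterwards
        try:
            return run, check_fingerprint(captured)
        except BuildoutError:
            continue                   # the whole run is retried
    raise BuildoutError("still failing after %d runs" % max_runs)
```

With a key that does not exist before the first run, process_instance is expected to succeed on the second run, which matches the restart described in the list.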
Slaprunner has been heavily modified, because it was using a sshkeys_authority which was incompatible with this, as it uses symlinks for keys. Since we don't know what the purpose of sshkeys_authority is, we rewrote that software to use simple commands instead of that "ssh keys authority".
The changes were merged into master. The source branch has been removed.
- If we write a dedicated recipe to read fingerprint, maybe we can find a solution that does not rely on making buildout fail on first run to retry and succeed on second run.
- I thought about removing completely the ssh key authority, but it's still used in
- The ssh code is duplicated, I am starting to feel it would make sense to have a ssh stack that we could add easily in softwares (maybe also in testnode, since ssh is more comfortable than shellinabox)
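Regarding the first point above, one half of a dedicated recipe could be a helper that honours the exit code of ssh-keygen instead of capturing an error message as if it were output. The sketch below is hypothetical: read_fingerprint is not an existing function, and the run parameter exists only so that the command can be replaced in tests. It does not address the other half of the problem, namely at which step the fingerprint should be read.

```python
import subprocess


def read_fingerprint(key_path, run=subprocess.run):
    """Return the output of `ssh-keygen -lf key_path`, raising on failure."""
    result = run(
        ["ssh-keygen", "-lf", key_path],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    if result.returncode != 0:
        # Fail loudly instead of publishing the error text as a fingerprint.
        raise RuntimeError(
            "ssh-keygen failed (%d): %s"
            % (result.returncode, result.stderr.strip())
        )
    return result.stdout.strip()
```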
 slaprunner-promise
 slaprunner-supervisord-wrapper
 runner-sshd-add-authorized-key
-runner-sshd-graceful
 runner-sshd-promise
-runner-sshkeys-authority
-runner-sshkeys-authority-service
-runner-sshkeys-sshd
-runner-sshkeys-sshd-service
For changes in instance-runner-import.cfg.in, if resiliency tests pass then it is OK for me.
Thanks, I made new test suites: SlapOS.SlapRunner.ResilienceTest-!681 and SlapOS.SlapRunner.ResilienceTest-!681.ERP5, cloning the "Master" ones. On the cloned ones, I changed the distributor to "Deploy test".
Is there a reason not to use "Deploy test" as distributor on the resilience Master test suites?
I think we should use only https://stack.nexedi.com/test_status
And if one needs to link to the current / latest test result, the following link (example) can be used:
I am observing strange errors in the logs of SlapOS.SlapRunner.ResilienceTest-!681.ERP5 test:
OSError: [Errno 39] Directory not empty: '/srv/slapgrid/slappart11/srv/testnode/cth/inst/test0-0/../../shared/gcc/b38ab3c08c9358a296259bfd5e127740__compile__/gcc-5.5.0'
OSError: [Errno 2] No such file or directory: '/srv/slapgrid/slappart11/srv/testnode/cth/inst/test0-0/../../shared/gcc/b38ab3c08c9358a296259bfd5e127740__compile__/gcc-5.5.0/conftest.c'
It seems we have an issue like two webrunners installing in the same folder.
These changes passed on SlapOS.SlapRunner.ResilienceTest-Master, so it should mean that basic functionality of resilient slaprunner is still OK.
SlapOS.SlapRunner.ResilienceTest-Master.ERP5 never passed since we renamed the test, so let's ignore it for now.
I amended the commit, because the changes in slaprunner were wrong: runner's flask app also reads the sshkeys_authority section of slapos.cfg and it failed with:
...
  File "/data/slappart11_testnode/cqg/inst/test0-0/tmp/soft/bc201c5c2ec114015e5d1d445eec4b4a/eggs/gunicorn-19.7.1-py2.7.egg/gunicorn/util.py", line 352, in import_app
    __import__(module)
  File "/data/slappart11_testnode/cqg/inst/test0-0/tmp/soft/bc201c5c2ec114015e5d1d445eec4b4a/eggs/slapos.toolbox-0.101-py2.7.egg/slapos/runner/run.py", line 185, in <module>
    run()
  File "/data/slappart11_testnode/cqg/inst/test0-0/tmp/soft/bc201c5c2ec114015e5d1d445eec4b4a/eggs/slapos.toolbox-0.101-py2.7.egg/slapos/runner/run.py", line 104, in run
    config.setConfig()
  File "/data/slappart11_testnode/cqg/inst/test0-0/tmp/soft/bc201c5c2ec114015e5d1d445eec4b4a/eggs/slapos.toolbox-0.101-py2.7.egg/slapos/runner/run.py", line 43, in setConfig
    configuration_dict = dict(configuration_parser.items(section))
  File "/srv/slapgrid/slappart11/srv/testnode/cqg/shared/python2.7/ce4d6d304315ea57b8193ebd859d9b0c/lib/python2.7/ConfigParser.py", line 642, in items
    raise NoSectionError(section)
ConfigParser.NoSectionError: No section: 'sshkeys_authority'
I pushed a simple fix for this. I was thinking to "remove more" sshkeys_authority in the slapos.toolbox part of slaprunner, this is used in https://lab.nexedi.com/nexedi/slapos.toolbox/blob/76c05ae8b0fb9365311c73a2050ee9470027e2e3/slapos/runner/templates/manageRepository.html#L51-56 and https://lab.nexedi.com/nexedi/slapos.toolbox/blob/76c05ae8b0fb9365311c73a2050ee9470027e2e3/slapos/runner/views.py#L116-126 (since slapos.toolbox@67a1d495), but I don't understand this code: I don't see how one can clone a git repository with a ssh public key, so I prefer to keep this change minimal and leave this as is.
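For reference, the NoSectionError in the traceback comes from calling items() on a section that is absent from the parsed file. The sketch below shows one way a reader could tolerate a missing section. It is not the fix that was pushed, read_section is a hypothetical helper, and it uses the Python 3 configparser module whereas the traceback comes from the Python 2.7 ConfigParser module.

```python
import configparser


def read_section(parser, section):
    """Return the section as a dict, or an empty dict when it is absent."""
    if not parser.has_section(section):
        return {}
    return dict(parser.items(section))
```

Whether silently ignoring a missing section is acceptable depends on the caller: it hides the error shown in the traceback, but it would also hide a genuinely broken configuration.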
marked as a Work In Progress
There was another failure
> s0[RootSoftwareInstance]: Promise 'slaprunner_frontend.py' failed with output: 'https://[2001:67c:1254:44:b176::ea90]:50005/login' is not available (returned 502, expected 200).
I'm retrying with slapos.core!175, it's possible that this is a random failure.
unmarked as a Work In Progress
merged