resilient: find doesn't fail if files have wrong permissions
We have webrunners where some files in the partitions don't belong to the correct user. In this case, find fails and we get this error:
```
Traceback (most recent call last):
  File "/opt/slapgrid/01ce71861bb9c5883cdbdf8af74c5d1f/bin/runner-exporter", line 83, in <module>
    sys.exit(slapos.resilient.runner_exporter.runExport())
  File "/opt/slapgrid/01ce71861bb9c5883cdbdf8af74c5d1f/eggs/slapos.toolbox-0.100-py2.7.egg/slapos/resilient/runner_exporter.py", line 182, in runExport
    modified_file_list = getBackupFilesModifiedDuringExportList(args, export_start_date)
  File "/opt/slapgrid/01ce71861bb9c5883cdbdf8af74c5d1f/eggs/slapos.toolbox-0.100-py2.7.egg/slapos/resilient/runner_exporter.py", line 106, in getBackupFilesModifiedDuringExportList
    'find', 'instance', '-cmin', str(export_time / 60), '-type', 'f', '-path', '*/srv/backup/*'
  File "/opt/slapgrid/01ce71861bb9c5883cdbdf8af74c5d1f/parts/python2.7/lib/python2.7/subprocess.py", line 223, in check_output
    raise CalledProcessError(retcode, cmd, output=output)
subprocess.CalledProcessError: Command '('find', 'instance', '-cmin', '0.335502366225', '-type', 'f', '-path', '*/srv/backup/*')' returned non-zero exit status 1
```
Make find not descend into directories with wrong permissions.
I have a thought: when we rsync files, we use the options `-og`, which preserve the group and the owner of the files. Using the
`-perm` option seems to be the correct way to do it, but can it create other issues in the resiliency process (iow: I think it's not OK to create a file belonging to root)?
Maybe it is OK, as `man rsync` says:
```
-o, --owner    preserve owner (super-user only)
```
which would mean this switch is useless in our case.
How is it possible that a non-root unix user created directories owned by root? Wasn't someone using the root user by mistake? (In which case maybe we don't want to support this and we just chown the files back to this webrunner user.)
@jerome indeed, it's not possible by default; the user owning this partition has special rights in the sudoers file because compiling ChromiumOS requires root permissions. I agree this is an odd case and shouldn't happen most of the time. Still, it exists and this webrunner instance is failing.
@Nicolas the rsync process is not launched as the root user but as the slapuser, so preserving owner and group will only work if the file belongs to that user. Anyway, rsync is only used later, to sync what has been backed up. Here the problem is just find going through all the instance directories and failing on some root-owned files. Those files are not backed up at all.
Hi @rafael ,
I think if there are files that don't belong to the slapuser, resiliency should crash (then the user will know something is wrong in their slaprunner).
This said, I know that in the case of the slaprunner building nayuos, we need to be root for some compilation steps, which leads to this special case. Then backing up only the files that the slapuser can at least read seems OK to me FOR THIS SPECIAL CASE.
To think further (I don't say we shouldn't do anything about this, I just put my thoughts here so we have a base for discussion): these files belonging to root will, after the copy to runner1, belong to the slapuser (as I said, I don't think root ownership is preserved). Thus after takeover, building nayuos won't work until someone spends time finding out which permission should be put on which file.
So if we want to go the most correct way, we should indeed check what these files belonging to root are:
- if they are compilation residues, then they should be excluded from the backup.
- if they are compilation results (iow, important to keep), then they should be chowned at the end of the compilation (as we are root at that moment).
- if they are binaries used for the compilation, they should either belong to the nayuos SR, with whatever it takes to make them belong to root later (promise raising, documentation on how to build nayuos, ...), in which case they don't need to be backed up (they will be recreated on runner1 by resiliency); or be created at the system level, in which case they don't belong in the backup either (maybe they can be created by an ansible playbook?).
I don't know how much effort we want to put into this, so as I said before, backing up the files that we can at least read is OK for me.
Everyone, please remember that this find is used in the `getBackupFilesModifiedDuringExportList` function. This function searches all the instance directories for recently modified files whose path looks like `*/srv/backup/*`. But while doing this search, find goes inside all the instance directories (backup, non-backup, etc.). The problem here is not that we want to back up files belonging to root. We just want resiliency not to fail because a few files belong to root.
Maybe another way is to explicitly look only into the backup directories:
```python
'find', 'instance/slappart*/srv/backup', '-cmin', str(export_time / 60), '-type', 'f'
```
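One detail with that suggestion: find(1) does not expand shell globs itself, so `instance/slappart*/srv/backup` would have to be expanded before the call. A minimal sketch of what that could look like (`build_find_command` is a hypothetical helper, not the actual runner_exporter code):

```python
import glob


def build_find_command(export_time, instance_root='instance'):
    # find(1) receives literal arguments, so expand the
    # 'slappart*/srv/backup' pattern with the glob module first,
    # then pass only the existing backup directories to find.
    backup_dirs = sorted(glob.glob('%s/slappart*/srv/backup' % instance_root))
    return ['find'] + backup_dirs + [
        '-cmin', str(export_time / 60.), '-type', 'f']
```

The resulting list can then be passed to `subprocess.check_output` as before; find never walks the non-backup parts of the partitions, so root-owned files elsewhere can no longer make it fail.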
Thanks for making me notice that the issue happens during `getBackupFilesModifiedDuringExportList`, I totally missed that. Could you also tell me whether we want to back up these files or not? If not, I think we should exclude the problematic folders from the backup by adding their paths to the file `srv/exporter.exclude`, and update `getBackupFilesModifiedDuringExportList` to support excluded files in the call to `find` itself, not afterwards (by using the `! -path` switch). If we want them in the backup, then there is another bug, as the rsync shouldn't have returned an error...
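For reference, the `! -path` approach could be built like this (a sketch only; the helper name and the way exclude patterns are mapped to predicates are assumptions, not the actual slapos.toolbox code):

```python
def build_find_with_excludes(export_time, excluded_patterns=()):
    # The current command, restricted to backup paths...
    command = ['find', 'instance', '-cmin', str(export_time / 60.),
               '-type', 'f', '-path', '*/srv/backup/*']
    # ...plus one '! -path' predicate per pattern (e.g. read from
    # srv/exporter.exclude), so find skips those files during the
    # walk instead of the caller filtering the output afterwards.
    for pattern in excluded_patterns:
        command += ['!', '-path', '*/%s*' % pattern]
    return command
```

Note that `! -path` only filters matched files; unlike `-prune`, it does not stop find from descending into the excluded directories, so it would not by itself avoid the permission errors discussed above.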
@rafael: sorry, I shouldn't have said "crash", but "fail gracefully" :) . Then the failure is detected by a promise that alerts the user. I mean, exporting a backup that we can't rebuild on the other site and discovering this too late is much worse than having resiliency alert the user that something is wrong (and in that case we are still in the it-works case, because we have monitoring).
The more I think about this case, the more I am convinced that the issue is with the NayuOS software release, and not with the resiliency code (for example, why does it store files that the slapuser can't read in the backup folder? If it is a real backup, then it's good that `getBackupFilesModifiedDuringExportList` raises on them. If they are not to be backed up, then they shouldn't be stored there.).
Those files belonging to root are not backed up at all (actually, almost nothing is backed up there):
```
$ cat instance/slappart0/srv/exporter.exclude
parts/
parts/**
```
The files belonging to root are under
@Nicolas the problem is that find doesn't have an exclude-file-list option like rsync. Besides, for me it makes no sense to look inside the `srv/backup` directory and exclude the files listed in
`srv/exporter.exclude` (because exporter.exclude lists what should be excluded when copying into srv/backup). What do you think about my suggestion:
```python
'find', 'instance/slappart*/srv/backup', '-cmin', str(export_time / 60), '-type', 'f'
```
Do you remember why we didn't do it like this from the beginning? This restricts find to only the meaningful directories.
`'find', 'instance/slappart*/srv/backup'` seems very good to me.
Why it wasn't done before I don't know, but as far as I know there is no strong reason for the pattern
Sorry, I meant: for the pattern