Commit 7f1e3771 authored by Marco Mariani

pbs signature check: only sort by filenames, test for initial backup

parent 2355a629
......@@ -134,9 +134,10 @@ class Recipe(GenericSlapRecipe, Notify, Callback):
comments = ['', 'Pull data from a PBS *-export instance.', '']
rdiff_wrapper_template = textwrap.dedent("""\
#!/bin/sh
# %(comment)s
%(comment)s
LC_ALL=C
export LC_ALL
is_first_backup=$(test -d %(local_directory)s/rdiff-backup-data || echo yes)
RDIFF_BACKUP="%(rdiffbackup_binary)s"
$RDIFF_BACKUP %(rdiffbackup_parameter)s
if [ ! $? -eq 0 ]; then
......@@ -146,11 +147,17 @@ class Recipe(GenericSlapRecipe, Notify, Callback):
$RDIFF_BACKUP --check-destination-dir %(local_directory)s
if [ ! $? -eq 0 ]; then
# Here, two possibilities:
# * The first backup failed. It is safe to remove it since there is nothing valuable there.
# * The backup has been complete, but is now in a really weird state. Not safe to remove it.
if [ -n "$is_first_backup" ]; then
:
# The first backup failed, and check-destination as well.
# we may want to remove the backup.
else
:
# The backup command has failed, while transferring an increment, and check-destination as well.
# XXX We may need to publish the failure and ask the equeue, re-run this script again,
# instead do a push to the clone.
fi
fi
else
# Everything's okay, cleaning up...
$RDIFF_BACKUP --remove-older-than %(remove_backup_older_than)s --force %(local_directory)s
......@@ -158,7 +165,7 @@ class Recipe(GenericSlapRecipe, Notify, Callback):
if [ -e /srv/slapgrid/slappart17/srv/backup/pbs/COMP-1867-slappart6-runner-2/backup.signature ]; then
cd %(local_directory)s
find -type f ! -name backup.signature ! -wholename "./rdiff-backup-data/*" -print0 | xargs -0 sha256sum | LC_ALL=C sort -k 66 > ../proof.signature
find -type f ! -name backup.signature ! -wholename "./rdiff-backup-data/*" -print0 | xargs -P4 -0 sha256sum | LC_ALL=C sort -k 66 > ../proof.signature
diff -ruw backup.signature ../proof.signature > ../backup.diff
# XXX If there is a difference on the backup, we should publish the
# failure and ask the equeue, re-run this script again,
......
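
The first-backup detection in the wrapper above relies on a flag that is either `yes` or empty. The pattern can be sketched in isolation as follows. This is a minimal sketch, not the recipe's code: the `detect_first_backup` helper and the temporary directory standing in for `%(local_directory)s` are hypothetical. The flag has to be expanded and tested with `-n "$var"`, because a bare word inside `[ ... ]` is a non-empty string and therefore always true.

```shell
#!/bin/sh
# Hypothetical helper: prints "yes" when $1 has no rdiff-backup-data
# subdirectory (no backup has completed there yet), nothing otherwise.
detect_first_backup () {
  test -d "$1/rdiff-backup-data" || echo yes
}

# Throwaway directory standing in for the real backup destination.
dest=$(mktemp -d)

is_first_backup=$(detect_first_backup "$dest")
if [ -n "$is_first_backup" ]; then
  echo "first backup: nothing valuable in $dest yet"
else
  echo "existing backup: do not remove $dest"
fi
```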
......@@ -53,7 +53,7 @@ mode = 0644
recipe = hexagonit.recipe.download
url = ${:_profile_base_location_}/template/runner-import.sh.jinja2
download-only = true
md5sum = 7d3c42b9cc457f41f6be72c765b8aadf
md5sum = c0d05a26b06ce172efaad03c52ef92ca
filename = runner-import.sh.jinja2
mode = 0644
......@@ -68,7 +68,7 @@ mode = 0644
recipe = hexagonit.recipe.download
url = ${:_profile_base_location_}/template/runner-export.sh.jinja2
download-only = true
md5sum = 072a6a15b17b364e709d89468a6ac180
md5sum = e8aee339d411226bc4145dc71b629582
filename = runner-export.sh.jinja2
mode = 0644
......
......@@ -24,4 +24,4 @@ if [ -d {{ directory['backup'] }}/runner/software ]; then
rm {{ directory['backup'] }}/runner/software/*
fi
cd {{ directory['backup'] }} && find -type f ! -name backup.signature -print0 | xargs -0 sha256sum | LC_ALL=C sort -k 66 > backup.signature
cd {{ directory['backup'] }} && find -type f ! -name backup.signature -print0 | xargs -P4 -0 sha256sum | LC_ALL=C sort -k 66 > backup.signature
......@@ -18,7 +18,7 @@ restore_element () {
write_backup_proof () {
cd {{ directory['backup'] }}
find -type f ! -name backup.signature ! -wholename "./rdiff-backup-data/*" -print0 | xargs -0 sha256sum | LC_ALL=C sort -k 66 > {{ directory['srv'] }}/proof.signature
find -type f ! -name backup.signature ! -wholename "./rdiff-backup-data/*" -print0 | xargs -P4 -0 sha256sum | LC_ALL=C sort -k 66 > {{ directory['srv'] }}/proof.signature
diff -ruw {{ directory['backup'] }} {{ directory['srv'] }}/proof.signature > {{ directory['srv'] }}/backup.diff
}
......
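
The commit title mentions sorting by filenames only. For comparison, POSIX `sort -k N` selects the N-th blank-separated field rather than the N-th character, so a key starting at the second field is one way to order `sha256sum` output by path. The following is a minimal sketch under that assumption; the file names and the temporary directory are illustrative and do not come from the templates.

```shell
#!/bin/sh
# Build a throwaway directory with three small files (names are illustrative).
tmpdir=$(mktemp -d)
cd "$tmpdir" || exit 1
printf 'one\n' > b.txt
printf 'two\n' > a.txt
mkdir sub
printf 'three\n' > sub/c.txt

# sha256sum prints "<digest>  <path>". With -k 2 the sort key starts at the
# second blank-separated field, so lines are ordered by path, not by digest.
sorted=$(find . -type f -print0 | xargs -0 sha256sum | LC_ALL=C sort -k 2)
printf '%s\n' "$sorted"
```

Because the key runs from the second field to the end of the line, paths containing spaces are still compared in full.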
  • I think that using xargs -P4 is a bad idea. It's terrible for HDDs and counter-productive when the machine has fewer than 4 cores, and neither condition is rare for backup machines. A single core can also often checksum at close to SSD speed, or even faster.

    Currently, I see a machine with 4 processes in D-state, checksumming at very low speed. It's a VM and I don't know yet what kind of disk it is.

    This commit is not the only one adding this flag.

    /cc @Nicolas @vpelletier @rafael @seb

  • +1 for removing -P4 for checksum operations.

  • +1. If I remember correctly, I always wanted background processes to consume as few resources as possible; this probably went unnoticed during the review.

  • Changed in 7f8c4418

  • Thank you
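
The change the reviewers agreed on can be sketched as the same pipeline with `-P4` dropped, so that `xargs` runs one `sha256sum` at a time and the disk sees a single sequential reader. This is a minimal sketch and not the content of 7f8c4418; the `write_signature` function, its arguments, and the temporary paths are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: checksum every regular file under $1 and write a
# sorted manifest to $2 (an absolute path outside $1, so it is not scanned).
write_signature () {
  (
    cd "$1" || exit 1
    # No -P option: xargs starts sha256sum processes one after another,
    # instead of keeping four of them competing for the same disk.
    # The sort invocation is the same one used in the diff.
    find . -type f ! -name backup.signature -print0 \
      | xargs -0 sha256sum \
      | LC_ALL=C sort -k 66
  ) > "$2"
}

# Usage with throwaway paths (illustrative only).
src=$(mktemp -d)
out=$(mktemp)
printf 'hello\n' > "$src/a.txt"
printf 'world\n' > "$src/b.txt"
write_signature "$src" "$out"
cat "$out"
```

The resulting manifest can be checked later with `sha256sum -c` from inside the same directory.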
