Draft: stack/resilient: use restic instead of rdiff-backup for resiliency.
Why is rdiff-backup bad?
rdiff-backup does nothing clever across different file names. If we already have 20250101.index and a new, 99% similar 20250102.index arrives, rdiff-backup still transfers 100% of the new file.
Also, rdiff-backup compresses 100% of removed files, and it transfers nothing at all while it is compressing.
(See rdiff-backup!9 (closed) or ZODB!2 (merged) for details. They are just ad-hoc workarounds, not a solution, and we need more for other use cases.)
Why restic then?
restic deduplicates content cleverly, even across different file names. It also transfers and compresses simultaneously.
But restic does not have 'pull' backup?
PBS runs a rest-server in append-only mode that listens on a unix socket. PBS then opens a reverse SSH tunnel to Theia-0, which launches restic against a unix socket on its side; that socket is tunneled to the rest-server socket on the PBS side.
This way, Theia-0 cannot initiate the restic process by itself, and other users on Theia-0 or PBS cannot communicate with this temporary rest-server, because it does not listen on a TCP port.
Also, even if Theia-0 is compromised, old snapshots on PBS cannot be deleted, because rest-server runs in append-only mode.
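The flow above can be sketched as the three command lines involved. This is only an illustration, not the actual pull script: the function names, the host name and the example paths are hypothetical, and it assumes a rest-server version that accepts `unix:` in `--listen`, an OpenSSH version that can forward unix sockets with `-R`, and a restic version that understands the `rest:http+unix://` repository scheme.

```python
from __future__ import annotations

from pathlib import PurePosixPath


def rest_server_cmd(repo: PurePosixPath) -> list[str]:
    """PBS side: serve the repository on a unix socket, append-only."""
    sock = repo / "rest-server.sock"
    return [
        "rest-server",
        "--path", str(repo),
        "--listen", f"unix:{sock}",  # assumption: unix socket support in --listen
        "--append-only",             # old snapshots cannot be deleted by the client
    ]


def reverse_ssh_cmd(repo: PurePosixPath, theia_host: str, theia_sock: str) -> list[str]:
    """PBS side: expose the local rest-server socket as theia_sock on Theia-0.

    No remote command is given because the sshd on the Theia side is
    expected to run its ForceCommand wrapper.
    """
    sock = repo / "rest-server.sock"
    return ["ssh", "-R", f"{theia_sock}:{sock}", theia_host]


def restic_repo_url(theia_sock: str) -> str:
    """Theia side: repository URL that goes through the tunneled socket."""
    # assumption: the repository is served at the root path of rest-server
    return f"rest:http+unix://{theia_sock}:/"
```

Because only PBS can start the SSH session, the socket on the Theia side exists only for the duration of a pull.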
Compatibility?
If a backup directory handled by rdiff-backup already exists, its latest content is automatically imported into the restic repository on the first pull after the upgrade.
Once any restic snapshot gets deleted, the existing rdiff-backup directory is automatically deleted too, as it should already be too old by then.
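The two rules above amount to a small decision table. The sketch below is a hypothetical model of it, not the code in this merge request: the function name and action labels are invented, and "first pull after upgrade" is approximated as "the restic repository does not exist yet".

```python
from __future__ import annotations


def migration_actions(
    rdiff_dir_exists: bool,
    restic_repo_exists: bool,
    snapshot_deleted: bool,
) -> list[str]:
    """Return the migration steps to run during one pull (labels are illustrative)."""
    actions: list[str] = []
    # First pull after upgrade: seed the restic repository from rdiff-backup.
    if rdiff_dir_exists and not restic_repo_exists:
        actions.append("import-rdiff-latest")
    # A restic snapshot was already deleted, so the rdiff-backup data is older
    # than the retention window and can go.
    if rdiff_dir_exists and snapshot_deleted:
        actions.append("delete-rdiff-dir")
    return actions
```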
Directories and files
PBS
- srv/backup/pbs/(name)/ : root directory of rdiff-backup
- srv/backup/pbs/(name)/rdiff-backup-data/ : rdiff-backup internal data and old backups
- srv/backup/pbs/(name).restic/ : NEW restic repository
- srv/backup/pbs/(name).restic/rest-server.sock: NEW unix socket for rest-server
Theia
- srv/backup/something: target directory of backup / restore
- srv/backup/ssh.sock: NEW unix socket for SSH reverse tunnel from PBS
- bin/restic-wrapper: NEW wrapper script, specified as ForceCommand in etc/resilient-sshd.conf
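For reference, the PBS-side layout above can be derived from the instance root and the backup name. This is a minimal sketch; the class name, the `root` parameter and the example values are hypothetical and only the relative paths come from the list above.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath


@dataclass(frozen=True)
class PbsLayout:
    root: PurePosixPath  # instance root directory (hypothetical parameter)
    name: str            # the "(name)" part of the paths

    @property
    def rdiff_root(self) -> PurePosixPath:
        """Root directory of rdiff-backup (legacy)."""
        return self.root / "srv/backup/pbs" / self.name

    @property
    def rdiff_data(self) -> PurePosixPath:
        """rdiff-backup internal data and old backups (legacy)."""
        return self.rdiff_root / "rdiff-backup-data"

    @property
    def restic_repo(self) -> PurePosixPath:
        """NEW restic repository, a sibling of the rdiff-backup root."""
        return self.root / "srv/backup/pbs" / f"{self.name}.restic"

    @property
    def rest_server_sock(self) -> PurePosixPath:
        """NEW unix socket for rest-server, kept inside the repository."""
        return self.restic_repo / "rest-server.sock"
```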
Are the latest files / directories themselves no longer on PBS?
You can mount restic repositories via FUSE. (ref: Restore using mount)
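As a rough illustration, browsing the latest backup on PBS could look like the sketch below. It only builds the command line; the function names and example paths are hypothetical, the repository password handling is left out, and the `snapshots/latest` entry is what restic's FUSE mount is expected to expose.

```python
from __future__ import annotations

from pathlib import PurePosixPath


def restic_mount_cmd(repo: PurePosixPath, mountpoint: PurePosixPath) -> list[str]:
    """Command that mounts the local restic repository read-only via FUSE."""
    return ["restic", "-r", str(repo), "mount", str(mountpoint)]


def latest_snapshot_dir(mountpoint: PurePosixPath) -> PurePosixPath:
    """Where the most recent snapshot should appear inside the mount."""
    return mountpoint / "snapshots" / "latest"
```

The mount stays available only while the restic process is running, so it suits occasional inspection rather than permanent access.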
Design question
- DONE: can rest-server keep running permanently, controlled by supervisord?
  - pros: easy to control the process. pull/push scripts become simpler.
  - cons: uses more file descriptors. Is CPU / memory usage negligible while idle? Is there no additional potential security risk?
- any other backup tool than restic?
Removed parts
- resilient-genstatrss-wrapper
  - It was rdiff-backup specific, using slapos.toolbox/slapos/resilient/rdiffBackupStat2RSS.py. It was not automatically called before.
  - Now the pull/push script's stdout is automatically put in an RSS feed, with the same URL as before, at the end of the script.
- pbs-push-history-log
  - It contained only the output of the last push. Now push/pull outputs are embedded in the feed above, and they can be found in var/log/equeue.log as well.