Commit e6c4baa9 authored by Kirill Smelkov

readme: Explain the original motivation for git-backup

Even though we have the readme, people keep wondering why git-backup
was created at all. Rafael suggested sharing the original announcement
email "so we dont forget later on why we have this."

-> Do that.

/suggested-and-reviewed-by @rafael
/reviewed-on !10
/reviewed-on https://www.erp5.com/group_section/forum/Gitlab-backup-zDVMZqaMAK/view?list_start=8&reset=1#2074720302
parent 3327aedf
@@ -17,6 +17,15 @@ and for backup to have history and be otherwise managed as a usual Git
repository. In particular it is possible to use standard git pull/push to
synchronize backups in several places.
The original motivation for git-backup was to manage backups of `lab.nexedi.com`__
while deduplicating the content of forks and tracking the whole history of the
site. The latter property is similar to ZODB, where Nexedi used to "never pack"
and keep the whole history of the whole site. Please see the Appendix for more
details.
__ https://lab.nexedi.com
Backup workflow is:
1. create backup repository::
@@ -59,3 +68,86 @@ for details.
__ git-backup.go
__ contrib/gitlab-backup
--------
Appendix. Original announcement
===============================
:Subject: [Nexedi] [ANNOUNCE] Program to backup several Git repositories into 1
:From: Kirill Smelkov <kirr@nexedi.com>
:Date: Mon, 31 Aug 2015 22:36:31 +0300
Hi All,
Recently we had discussion with Kazuhiko on current GitLab backup state.
GitLab approach is to create tarball for every repository and then
create one big tar file containing everything. In presence of forks this
results in waste of disk space which gets worse the more forks and
personal repositories we have.
Even today, when a lot of development happens not yet on GitLab, 1
standard GitLab backup takes ~ 3GB, which creates pressure for storage
and consequently forces admin to make compromises wrt how long to keep
backup history. Again, this will become more heavy as we move more and
more to GitLab.
So clearly something has to be done.
With this email I propose the idea to backup Git hosting via Git itself.
For this we need to pull all hosted objects (from all git repositories)
into 1 git database and then leverage Git's good ability to deduplicate
and pack content. Plus we need to carefully remember which refs from
which repositories point to which objects so we can properly restore.
That's basically all. I've tried to do a POC which is available here:
https://lab.nexedi.cn/kirr/git-backup
and contains more details. The main program[1] is generic + there is
concrete driver to backup GitLab repositories together with database
dump and everything else[2].
It has been tested by me on our GitLab instance manually for some time
already and preliminary results are::

                                    GitLab      POC
    time of 1st run                 2m25s       7m41s
    backup size after 1st run       3013MB      363MB
    time of 2nd run                 1m28s       1m52s
    (with small commit)
    backup size increase            +3013MB     +4MB (*)
    after 2nd run

(*) I've tracked this +4MB to the fact that git leaves empty directory
refs/backup/<dir>/ if e.g. refs/backup/<dir>/some-ref was deleted and
<dir> becomes empty. This can be improved in git itself or worked around
in the tool. Actual data growth in db objects is a few kilobytes.
In other words backup size is already ~10 times smaller compared to
GitLab default and because size increase on incremental runs is small on
average, it creates practical ability to store backup history forever,
just like we do with histories in usual Git repositories.
Restoration process has also been verified manually, and besides that, on
each restore run, the program verifies extracted git repositories for
connectivity correctness. So in my view this should be safe to use.
...
I welcome feedback, questions and review of the tool. If all goes well
and we use it on our GitLab instance for some time ok, my idea is to
make the announcement to a wider audience.
...
| Thanks,
| Kirill
| [1] https://lab.nexedi.cn/kirr/git-backup/blob/master/git-backup
| [2] https://lab.nexedi.cn/kirr/git-backup/blob/master/contrib/gitlab-backup