    gitlab-backup: Dump DB ourselves · 6fa6df4b
    Kirill Smelkov authored
    The reason to do this is that we want to have more control over the DB
    dump process. The current problems which led to this decision are:
    
        1. DB dump is one large file whose size grows over time. This is not
           friendly to git;
    
        2. DB dump is currently not git/rsync friendly - when PostgreSQL
           does a dump, it just copies internal pages for data to output.
           And the internal ordering changes every time a row is updated.
    
            http://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/bin/pg_dump/pg_dump.c;h=aa01d6a6;hb=HEAD#l1590
            http://stackoverflow.com/questions/24622579/does-or-can-the-postgresql-copy-to-command-guarantee-a-particular-row-order
    
    Both 1 and 2 currently bring our backup tool to its knees. We'll be
    handling those issues in the following patches.
    
    For now we perform the dump manually and switch from dumping in
    plain-text SQL to dumping in PostgreSQL native "directory" format, where
    there is a small table of contents with the schema (toc.dat) and the
    output of `COPY <table> TO stdout` for each table in a separate file.
    
        http://www.postgresql.org/docs/9.5/static/app-pgdump.html
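
    A minimal sketch of such a dump step, not the actual gitlab-backup
    code: the helper name db_dump, the PG_DUMP override and the paths in
    the usage comment are assumptions made for illustration. Only the
    pg_dump options -Fd (directory format) and -f (output directory) are
    real.

```shell
# Hypothetical helper: dump database "$1" into directory "$2" using
# PostgreSQL's "directory" format (toc.dat + one data file per table).
# PG_DUMP may be overridden, e.g. to point at a specific pg_dump binary.
db_dump() {
    db=$1
    outdir=$2
    "${PG_DUMP:-pg_dump}" -Fd -f "$outdir" "$db"
}

# Usage (database name and path are assumptions):
#   db_dump gitlabhq_production /tmp/backup/database.pgdump
```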
    
    On restore we convert the dump back to plain-text SQL with pg_restore and
    give this plain-text SQL to gitlab, so it thinks it restores it the usual
    way.
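
    The conversion back to SQL could look roughly like this. It is a
    sketch under assumptions: the helper name db_dump_to_sql, the
    PG_RESTORE override and the paths are hypothetical. It relies on
    pg_restore writing an SQL script to the -f file when no target
    database (-d) is given.

```shell
# Hypothetical helper: turn a "directory"-format dump "$1" back into a
# plain-text SQL script "$2". No -d is passed, so pg_restore does not
# connect to any database and only emits the script.
db_dump_to_sql() {
    dumpdir=$1
    sqlfile=$2
    "${PG_RESTORE:-pg_restore}" -f "$sqlfile" "$dumpdir"
}

# Usage (paths are assumptions):
#   db_dump_to_sql /tmp/backup/database.pgdump /tmp/restore/database.sql
```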
    
    NOTE: backward compatibility is preserved - if the restore part sees a
        backup made by an older version of gitlab-backup, which dumped
        database.sql in plain text, it restores it correctly.
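
    One way such detection might be done is sketched below. The layout
    is an assumption: database.sql and toc.dat are named above, but the
    directory name database.pgdump and the helper db_backup_format are
    hypothetical.

```shell
# Hypothetical helper: report which DB dump format a backup directory "$1"
# contains. Prints "plain" for an old-style database.sql, "directory" for
# a pg_dump directory-format dump; fails for anything else.
db_backup_format() {
    backup=$1
    if [ -f "$backup/database.sql" ]; then
        echo plain
    elif [ -f "$backup/database.pgdump/toc.dat" ]; then
        echo directory
    else
        echo "unknown backup layout: $backup" >&2
        return 1
    fi
}
```

    The restore part could then branch on the printed value and feed
    either the existing database.sql or the pg_restore output to gitlab.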
    
    NOTE2: gitlab-backup now supports only PostgreSQL (i.e. not MySQL).
        Adding support for other databases is possible, but requires a custom
        handler for every DB (or maybe just a fallback to the usual plaintext).
    
    NOTE3: even though we split the DB into separate tables, this does not
        currently help with problem #1, as in GitLab it is mostly just one
        table which occupies the whole space.
    
    /cc @kazuhiko