    restore: Extract packs in multiple workers · ff2f0b67
    Kirill Smelkov authored
    This lets us leverage multiple CPUs on a system for pack extraction,
    which is a computation-heavy operation.
    
    The way to do it is more-or-less classical:
    
        - main worker prepares requests for pack extraction jobs
    
        - there are multiple pack-extraction workers, which read requests
          from jobs queue and perform them
    
        - at the end we wait for everything to stop and collect errors,
          optionally signalling the whole thing to cancel if we see an error
          coming (it is only a signal and we still have to wait for
          everything to stop).
    
    The default number of workers is N(CPU) on the system, because we spawn
    a separate `git pack-objects ...` for every request.
    
    We also now explicitly limit the number of CPUs each `git pack-objects ...`
    can use to 1. This way control over how many resources to use is in
    git-backup's hands. Git also packs better this way (when using only
    1 thread), because when deltifying all objects are considered against
    each other, not only the objects inside 1 thread's object pool. And even
    when pack.threads is not 1, the first "objects counting" phase of
    packing is serial, wasting all but 1 core.
    
    On lab.nexedi.com we already use pack.threads=1 by default in the global
    gitconfig, but the above change makes the code universal.
    
    Time to restore nexedi/ from lab.nexedi.com backup:
    
    2CPU laptop:
    
        before (pack.threads=1)     10m11s
        before (pack.threads=NCPU)   9m13s
        after  -j1                  10m11s
        after                        6m17s
    
    8CPU system (with other load present, noisy):
    
        before (pack.threads=1)     ~5m
        after                       ~1m30s
git-backup.go 34.7 KB