    Teach gitlab-workhorse to serve requests to get raw blobs · 1b274d0d
    Kirill Smelkov authored
    Currently GitLab serves requests to get raw blobs via Ruby-on-Rails code and
    Unicorn. Because RoR/Unicorn is relatively heavyweight, in an environment with
    many simultaneous requests for raw blobs this is very slow and the server is
    constantly overloaded.
    
    On the other hand, to get raw blob content we do not need anything from the RoR
    framework - we only need access to the project's git repository on the filesystem,
    and to know whether access to that data should be granted or not. That means it
    is possible to adjust the Nginx frontend to route '.../raw/...' requests to a
    more lightweight and performant program which does this particular task, and
    that will be a net win.
    
    As gitlab-workhorse is written in Go, and Go has good concurrency/parallelism
    support and is generally much faster than Ruby, adding the raw blob serving task
    to it makes sense.
    
    In this patch we add the infrastructure to process GET requests for '/raw/...'
    (a rough sketch follows the list):
    
    - extract the project, ref and path from the URL
    - query the auth backend for whether download access should be granted
    - emit the blob content by spawning an external `git cat-file`
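
    To make the flow concrete, here is a rough sketch in Go of how these three
    steps could fit together. Everything below is a simplification for
    illustration only: handleGetBlob, repoRoot and the always-allow
    queryAuthBackend stub are hypothetical and are not the actual upstream.go
    code.

      package main

      import (
          "net/http"
          "os/exec"
          "regexp"
      )

      // /root/slapos/raw/master/software/wendelin/software.cfg
      //   -> project = root/slapos, ref = master, path = software/wendelin/software.cfg
      var rawRe = regexp.MustCompile(`^/([\w.-]+/[\w.-]+)/raw/([^/]+)/(.+)$`)

      // assumption: bare repositories live under this directory
      const repoRoot = "/var/opt/gitlab/git-data/repositories"

      // queryAuthBackend stands in for the real check: the actual code has to ask
      // the RoR auth backend whether download access should be granted.
      func queryAuthBackend(r *http.Request, project string) bool {
          return true // stub: always allow, for illustration only
      }

      func handleGetBlob(w http.ResponseWriter, r *http.Request) {
          // 1. extract project / ref and path from the URL
          m := rawRe.FindStringSubmatch(r.URL.Path)
          if m == nil {
              http.NotFound(w, r)
              return
          }
          project, ref, path := m[1], m[2], m[3]

          // 2. query the auth backend for whether download access should be granted
          if !queryAuthBackend(r, project) {
              http.Error(w, "Forbidden", http.StatusForbidden)
              return
          }

          // 3. emit the blob content by spawning an external `git cat-file`
          //    (buffered here to keep the sketch short; real code would stream
          //    and mimic the headers RoR sends)
          gitDir := repoRoot + "/" + project + ".git"
          out, err := exec.Command("git", "--git-dir="+gitDir,
              "cat-file", "blob", ref+":"+path).Output()
          if err != nil {
              http.NotFound(w, r)
              return
          }
          w.Header().Set("Content-Type", "text/plain; charset=utf-8")
          w.Write(out)
      }

      func main() {
          http.HandleFunc("/", handleGetBlob)
          http.ListenAndServe(":8181", nil)
      }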
    
    I've tried to mimic the output to be as close as possible to what the RoR code
    emits, with the idea that for users the change should be transparent.
    
    As in this patch we still query the auth backend for every request to get a blob,
    the RoR code remains heavily loaded, so essentially there is no speedup yet:
    
      (on an 8-CPU i7-3770S with 16GB of RAM)
    
      # request goes to unicorn  (9 unicorn workers)
      $ ./wrk -c40 -d10 -t1 --latency https://[2001:67c:1254:e:8b::c776]:7777/root/slapos/raw/master/software/wendelin/software.cfg
      Running 10s test @ https://[2001:67c:1254:e:8b::c776]:7777/root/slapos/raw/master/software/wendelin/software.cfg
        1 threads and 40 connections
        Thread Stats   Avg      Stdev     Max   +/- Stdev
          Latency   553.06ms  166.39ms   1.29s    80.06%
          Req/Sec    69.53     23.12   140.00     71.72%
        Latency Distribution
           50%  525.41ms
           75%  615.63ms
           90%  774.48ms
           99%    1.05s
        695 requests in 10.02s, 1.38MB read
      Requests/sec:     69.38
      Transfer/sec:    141.47KB
    
      # request goes to gitlab-workhorse with the following added to nginx conf
      # location ~ ^/[\w\.-]+/[\w\.-]+/raw/ {
      #   error_page 418 = @gitlab-workhorse;
      #   return 418;
      # }
      $ ./wrk -c40 -d10 -t1 --latency https://[2001:67c:1254:e:8b::c776]:7777/root/slapos/raw/master/software/wendelin/software.cfg
      Running 10s test @ https://[2001:67c:1254:e:8b::c776]:7777/root/slapos/raw/master/software/wendelin/software.cfg
        1 threads and 40 connections
        Thread Stats   Avg      Stdev     Max   +/- Stdev
          Latency   549.37ms  220.53ms   1.69s    84.74%
          Req/Sec    71.01     25.49   160.00     70.71%
        Latency Distribution
           50%  514.66ms
           75%  584.32ms
           90%  767.46ms
           99%    1.37s
        709 requests in 10.01s, 1.26MB read
      Requests/sec:     70.83
      Transfer/sec:    128.79KB
    
    In the next patch we'll cache requests to the auth backend, and that will
    improve performance dramatically.
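
    For a rough idea of what that could look like (an assumption about the
    follow-up, not the actual next patch), a minimal auth-reply cache keyed by
    project and session with a short TTL might be enough:

      package main

      import (
          "fmt"
          "sync"
          "time"
      )

      type cacheEntry struct {
          allowed bool
          expires time.Time
      }

      // authCache remembers auth backend verdicts for a short time so that not
      // every raw blob request has to hit the RoR backend.
      type authCache struct {
          mu  sync.Mutex
          ttl time.Duration
          m   map[string]cacheEntry
      }

      func newAuthCache(ttl time.Duration) *authCache {
          return &authCache{ttl: ttl, m: make(map[string]cacheEntry)}
      }

      // Get returns a cached verdict for key, or calls query and remembers its result.
      func (c *authCache) Get(key string, query func() bool) bool {
          c.mu.Lock()
          if e, ok := c.m[key]; ok && time.Now().Before(e.expires) {
              c.mu.Unlock()
              return e.allowed
          }
          c.mu.Unlock()

          // not cached (or expired): ask the backend, then remember the verdict
          allowed := query()

          c.mu.Lock()
          c.m[key] = cacheEntry{allowed: allowed, expires: time.Now().Add(c.ttl)}
          c.mu.Unlock()
          return allowed
      }

      func main() {
          c := newAuthCache(30 * time.Second)
          ok := c.Get("root/slapos|<session>", func() bool {
              return true // stand-in for the real auth backend request
          })
          fmt.Println("download allowed:", ok)
      }

    Keying on the session as well as the project would avoid reusing one user's
    verdict for another.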