Commit 1597f3fd authored by Gabriel Mazetto's avatar Gabriel Mazetto

Merge branch '10579-geo-document-how-our-git-push-proxy-works' into 'master'

Geo: Document how our Git push proxy works

Closes #10579

See merge request gitlab-org/gitlab-ee!10518
parents aa672175 f4eb7f77
...@@ -2,10 +2,48 @@ ...@@ -2,10 +2,48 @@
Geo connects GitLab instances together. One GitLab instance is Geo connects GitLab instances together. One GitLab instance is
designated as a **primary** node and can be run with multiple designated as a **primary** node and can be run with multiple
**secondary** nodes. Geo orchestrates quite a few components that are **secondary** nodes. Geo orchestrates quite a few components that can be seen on
described in more detail below. the diagram below and are described in more detail within this document.
## Database replication ![Geo Architecture Diagram](../administration/geo/replication/img/geo_architecture.png)
## Replication layer
Geo handles replication for different components:
- [Database](#database-replication): includes the entire application, except cache and jobs.
- [Git repositories](#repository-replication): includes both projects and wikis.
- [Uploaded blobs](#uploads-replication): includes anything from images attached on issues
to raw logs and assets from CI.
With the exception of the Database replication, on a *secondary* node, everything is coordinated
by the [Geo Log Cursor](#geo-log-cursor).
### Geo Log Cursor daemon
The [Geo Log Cursor daemon](#geo-log-cursor-daemon) is a separate process running on
each **secondary** node. It monitors the [Geo Event Log](#geo-event-log)
for new events and creates background jobs for each specific event type.
For example when a repository is updated, the Geo **primary** node creates
a Geo event with an associated repository updated event. The Geo Log Cursor daemon
picks the event up and schedules a `Geo::ProjectSyncWorker` job which will
use the `Geo::RepositorySyncService` and `Geo::WikiSyncService` classes
to update the repository and the wiki respectively.
The Geo Log Cursor daemon can operate in High Availability mode automatically.
The daemon will try to acquire a lock from time to time and once acquired, it
will behave as the *active* daemon.
Any additional running daemons on the same node, will be in standby
mode, ready to resume work if the *active* daemon releases its lock.
We use the [`ExclusiveLease`](https://www.rubydoc.info/github/gitlabhq/gitlabhq/Gitlab/ExclusiveLease) lock type with a small TTL, that is renewed at every
pooling cycle. That allows us to implement this global lock with a timeout.
At the end of the pooling cycle, if the daemon can't renew and/or reacquire
the lock, it switches to standby mode.
### Database replication
Geo uses [streaming replication](#streaming-replication) to replicate Geo uses [streaming replication](#streaming-replication) to replicate
the database from the **primary** to the **secondary** nodes. This the database from the **primary** to the **secondary** nodes. This
...@@ -13,7 +51,7 @@ replication gives the **secondary** nodes access to all the data saved ...@@ -13,7 +51,7 @@ replication gives the **secondary** nodes access to all the data saved
in the database. So users can log in on the **secondary** and read all in the database. So users can log in on the **secondary** and read all
the issues, merge requests, etc. on the **secondary** node. the issues, merge requests, etc. on the **secondary** node.
## Repository replication ### Repository replication
Geo also replicates repositories. Each **secondary** node keeps track of Geo also replicates repositories. Each **secondary** node keeps track of
the state of every repository in the [tracking database](#tracking-database). the state of every repository in the [tracking database](#tracking-database).
...@@ -23,7 +61,7 @@ There are a few ways a repository gets replicated by the: ...@@ -23,7 +61,7 @@ There are a few ways a repository gets replicated by the:
- [Repository Sync worker](#repository-sync-worker). - [Repository Sync worker](#repository-sync-worker).
- [Geo Log Cursor](#geo-log-cursor). - [Geo Log Cursor](#geo-log-cursor).
### Project Registry #### Project Registry
The `Geo::ProjectRegistry` class defines the model used to track the The `Geo::ProjectRegistry` class defines the model used to track the
state of repository replication. For each project in the main state of repository replication. For each project in the main
...@@ -32,15 +70,15 @@ database, one record in the tracking database is kept. ...@@ -32,15 +70,15 @@ database, one record in the tracking database is kept.
It records the following about repositories: It records the following about repositories:
- The last time they were synced. - The last time they were synced.
- The last time they were synced successfully. - The last time they were successfully synced.
- If they need to be resynced. - If they need to be resynced.
- When retry should be attempted. - When a retry should be attempted.
- The number of retries. - The number of retries.
- If and when the they were verified. - If and when they were verified.
It also stores these attributes for project wikis in dedicated columns. It also stores these attributes for project wikis in dedicated columns.
### Repository Sync worker #### Repository Sync worker
The `Geo::RepositorySyncWorker` class runs periodically in the The `Geo::RepositorySyncWorker` class runs periodically in the
background and it searches the `Geo::ProjectRegistry` model for background and it searches the `Geo::ProjectRegistry` model for
...@@ -59,26 +97,12 @@ times, Geo does a so-called _redownload_. It will do a clean clone ...@@ -59,26 +97,12 @@ times, Geo does a so-called _redownload_. It will do a clean clone
into the `@geo-temporary` directory in the root of the storage. When into the `@geo-temporary` directory in the root of the storage. When
it's successful, we replace the main repo with the newly cloned one. it's successful, we replace the main repo with the newly cloned one.
### Geo Log Cursor ### Uploads replication
The [Geo Log Cursor](#geo-log-cursor) is a separate process running on
each **secondary** node. It monitors the [Geo Event Log](#geo-event-log)
and handles all of the events. When it sees an unhandled event, it
starts a background worker to handle that event, depending on the type
of event.
When a repository receives an update, the Geo **primary** node creates
a Geo event with an associated repository updated event. The cursor
picks that up, and schedules a `Geo::ProjectSyncWorker` job which will
use the `Geo::RepositorySyncService` class and `Geo::WikiSyncService`
class to update the repository and the wiki.
## Uploads replication
File uploads are also being replicated to the **secondary** node. To File uploads are also being replicated to the **secondary** node. To
track the state of syncing, the `Geo::FileRegistry` model is used. track the state of syncing, the `Geo::FileRegistry` model is used.
### File Registry #### File Registry
Similar to the [Project Registry](#project-registry), there is a Similar to the [Project Registry](#project-registry), there is a
`Geo::FileRegistry` model that tracks the synced uploads. `Geo::FileRegistry` model that tracks the synced uploads.
...@@ -86,7 +110,7 @@ Similar to the [Project Registry](#project-registry), there is a ...@@ -86,7 +110,7 @@ Similar to the [Project Registry](#project-registry), there is a
CI Job Artifacts are synced in a similar way as uploads or LFS CI Job Artifacts are synced in a similar way as uploads or LFS
objects, but they are tracked by `Geo::JobArtifactRegistry` model. objects, but they are tracked by `Geo::JobArtifactRegistry` model.
### File Download Dispatch worker #### File Download Dispatch worker
Also similar to the [Repository Sync worker](#repository-sync-worker), Also similar to the [Repository Sync worker](#repository-sync-worker),
there is a `Geo::FileDownloadDispatchWorker` class that is run there is a `Geo::FileDownloadDispatchWorker` class that is run
...@@ -113,7 +137,7 @@ Authorization: GL-Geo <access_key>:<JWT payload> ...@@ -113,7 +137,7 @@ Authorization: GL-Geo <access_key>:<JWT payload>
``` ```
The **primary** node uses the `access_key` field to look up the The **primary** node uses the `access_key` field to look up the
corresponding Geo **secondary** node and decrypts the JWT payload, corresponding **secondary** node and decrypts the JWT payload,
which contains additional information to identify the file which contains additional information to identify the file
request. This ensures that the **secondary** node downloads the right request. This ensures that the **secondary** node downloads the right
file for the right database ID. For example, for an LFS object, the file for the right database ID. For example, for an LFS object, the
...@@ -133,6 +157,28 @@ NOTE: **Note:** ...@@ -133,6 +157,28 @@ NOTE: **Note:**
JWT requires synchronized clocks between the machines JWT requires synchronized clocks between the machines
involved, otherwise it may fail with an encryption error. involved, otherwise it may fail with an encryption error.
## Git Push to Geo secondary
The Git Push Proxy exists as a functionality built inside the `gitlab-shell` component.
It is active on a **secondary** node only. It allows the user that has cloned a repository
from the secondary node to push to the same URL.
Git `push` requests directed to a **secondary** node will be sent over to the **primary** node,
while `pull` requests will continue to be served by the **secondary** node for maximum efficiency.
HTTPS and SSH requests are handled differently:
- With HTTPS, we will give the user a `HTTP 302 Redirect` pointing to the project on the **primary** node.
The git client is wise enough to understand that status code and process the redirection.
- With SSH, because there is no equivalent way to perform a redirect, we have to proxy the request.
This is done inside [`gitlab-shell`](https://gitlab.com/gitlab-org/gitlab-shell), by first translating the request
to the HTTP protocol, and then proxying it to the **primary** node.
The [`gitlab-shell`](https://gitlab.com/gitlab-org/gitlab-shell) daemon knows when to proxy based on the response
from `/api/v4/allowed`. A special `HTTP 300` status code is returned and we execute a "custom action",
specified in the response body. The response contains additional data that allows the proxied `push` operation
to happen on the **primary** node.
## Using the Tracking Database ## Using the Tracking Database
Along with the main database that is replicated, a Geo **secondary** Along with the main database that is replicated, a Geo **secondary**
......
Markdown is supported
0%
or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment