@@ -72,7 +72,7 @@ Learn how to install, configure, update, and maintain your GitLab instance.
- [Branded login page](../customization/branded_login_page.md): Customize the login page with your own logo, title, and description.
- [Welcome message](../customization/welcome_message.md): Add a custom welcome message to the sign-in page.
- ["New Project" page](../customization/new_project_page.md): Customize the text to be displayed on the page that opens whenever your users create a new project.
- [Additional custom email text](https://docs.gitlab.com/ee/user/admin_area/settings/email.html#custom-additional-text-premium-only): Add additional custom text to emails sent from GitLab. **[PREMIUM ONLY]**
- [Database](#database-replication): includes the entire application, except cache and jobs.
- [Git repositories](#repository-replication): includes both projects and wikis.
- [Uploaded blobs](#uploads-replication): includes anything from images attached on issues
  to raw logs and assets from CI.
With the exception of database replication, everything on a **secondary** node is coordinated
by the [Geo Log Cursor](#geo-log-cursor-daemon).
### Geo Log Cursor daemon
The [Geo Log Cursor daemon](#geo-log-cursor-daemon) is a separate process running on
each **secondary** node. It monitors the [Geo Event Log](#geo-event-log)
for new events and creates background jobs for each specific event type.
For example, when a repository is updated, the Geo **primary** node creates
a Geo event with an associated repository updated event. The Geo Log Cursor daemon
picks the event up and schedules a `Geo::ProjectSyncWorker` job, which
uses the `Geo::RepositorySyncService` and `Geo::WikiSyncService` classes
to update the repository and the wiki respectively.
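The event-to-job flow described above can be sketched as a small dispatch loop. This is only an illustration in Python, not GitLab's implementation: the event dictionaries, the `repository_updated` type string, the `HANDLERS` table, and the `process_events` function are all hypothetical names, and the event log is assumed to be ordered by ascending `id`.

```python
from collections import deque


def schedule_project_sync(event, queue):
    """Hypothetical handler: enqueue a project sync job for the event's project."""
    queue.append(("Geo::ProjectSyncWorker", event["project_id"]))


# Hypothetical mapping from event type to the handler that schedules its job.
HANDLERS = {"repository_updated": schedule_project_sync}


def process_events(event_log, last_seen_id, queue):
    """Schedule jobs for events newer than last_seen_id; return the new cursor."""
    for event in event_log:
        if event["id"] <= last_seen_id:
            continue  # already processed in an earlier cycle
        handler = HANDLERS.get(event["type"])
        if handler is not None:
            handler(event, queue)
        last_seen_id = event["id"]  # advance the cursor even for unhandled types
    return last_seen_id
```

Keeping the cursor separate from the handlers means a new event type only needs a new entry in the table.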
The Geo Log Cursor daemon can operate in High Availability mode automatically.
The daemon tries to acquire a lock from time to time and, once it has acquired
the lock, behaves as the *active* daemon.
Any additional daemons running on the same node stay in standby
mode, ready to resume work if the *active* daemon releases its lock.
We use the [`ExclusiveLease`](https://www.rubydoc.info/github/gitlabhq/gitlabhq/Gitlab/ExclusiveLease) lock type with a small TTL that is renewed at every
polling cycle. That allows us to implement this global lock with a timeout.
At the end of the polling cycle, if the daemon can't renew or reacquire
the lock, it switches to standby mode.
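The active/standby behavior can be illustrated with a toy lease that expires unless it is renewed. The sketch below is written in Python with an in-memory lease and an injectable clock. It does not reproduce the `ExclusiveLease` API, and the `InMemoryLease` and `CursorDaemon` classes and their methods are hypothetical.

```python
import uuid


class InMemoryLease:
    """Toy lease: at most one holder, expiring ttl seconds after the last renewal."""

    def __init__(self, ttl, clock):
        self.ttl = ttl
        self.clock = clock  # callable returning the current time in seconds
        self.holder = None
        self.expires_at = 0.0

    def try_obtain(self, token):
        """Take the lease if it is free or expired; return True on success."""
        now = self.clock()
        if self.holder is None or now >= self.expires_at:
            self.holder = token
            self.expires_at = now + self.ttl
            return True
        return False

    def renew(self, token):
        """Extend the lease, but only for the current, unexpired holder."""
        now = self.clock()
        if self.holder == token and now < self.expires_at:
            self.expires_at = now + self.ttl
            return True
        return False


class CursorDaemon:
    """Hypothetical daemon that is active only while it holds the lease."""

    def __init__(self, lease):
        self.lease = lease
        self.token = uuid.uuid4().hex
        self.active = False

    def poll_once(self):
        """Run one polling cycle: renew or reacquire, else fall back to standby."""
        self.active = self.lease.renew(self.token) or self.lease.try_obtain(self.token)
        return self.active
```

Because the lease has a TTL, a daemon that crashes without releasing it blocks the standby daemons only until the lease expires.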
### Database replication
Geo uses [streaming replication](#streaming-replication) to replicate
the database from the **primary** to the **secondary** nodes. This
...
@@ -13,7 +51,7 @@ replication gives the **secondary** nodes access to all the data saved
in the database. Users can therefore log in on the **secondary** node and read all
the issues, merge requests, and so on.
### Repository replication
Geo also replicates repositories. Each **secondary** node keeps track of
the state of every repository in the [tracking database](#tracking-database).
...
@@ -23,7 +61,7 @@ There are a few ways a repository gets replicated by the:
@@ -11,37 +11,38 @@ All Geo nodes have the following settings:
| Setting | Description |
| --------| ----------- |
| Primary | Marks a Geo node as the **primary** node. There can be only one **primary** node; make sure that you first add the **primary** node and then all the others. |
| Name | The unique identifier for the Geo node. Must match the setting `gitlab_rails[geo_node_name]` in `/etc/gitlab/gitlab.rb`. The setting defaults to `external_url` with a trailing slash. |
| URL | The instance's user-facing URL. |
The node you're reading from is indicated with a green `Current node` label, and
the **primary** node is given a blue `Primary` label. Remember that you can only make
changes on the **primary** node!
## **Secondary** node settings
**Secondary** nodes have a number of additional settings available:
| Setting | Description |
|---------------------------|-------------|
| Selective synchronization | Enable Geo [selective sync](https://docs.gitlab.com/ee/administration/geo/replication/configuration.html#selective-synchronization) for this **secondary** node. |
| Repository sync capacity | Number of concurrent requests this **secondary** node will make to the **primary** node when backfilling repositories. |
| File sync capacity | Number of concurrent requests this **secondary** node will make to the **primary** node when backfilling files. |
## Geo backfill
**Secondary** nodes are notified of changes to repositories and files by the **primary** node,
and will always attempt to synchronize those changes as quickly as possible.
Backfill is the act of populating the **secondary** node with repositories and files that
existed *before* the **secondary** node was added to the database. Since there may be
extremely large numbers of repositories and files, it's infeasible to attempt to
download them all at once, so GitLab places an upper limit on the concurrency of
these operations.
Backfill completes faster with a higher maximum concurrency, but higher
values place more strain on the **primary** node. From [GitLab 10.2](https://gitlab.com/gitlab-org/gitlab-ee/merge_requests/3107),
the limits are configurable. If your **primary** node has lots of surplus capacity,
you can increase the values to complete backfill in a shorter time. If it's
under heavy load and backfill is reducing its availability for normal requests,
you can decrease them.
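The trade-off between backfill duration and load on the **primary** node comes from capping the number of requests in flight. A minimal Python sketch of such a cap follows. The `backfill` function and its `sync_one` and `capacity` parameters are hypothetical, and a thread pool is used here only as a convenient way to bound concurrency, not because GitLab does it this way.

```python
from concurrent.futures import ThreadPoolExecutor


def backfill(items, sync_one, capacity):
    """Sync every item with at most `capacity` calls to sync_one in flight.

    Results are returned in the same order as `items`.
    """
    with ThreadPoolExecutor(max_workers=capacity) as pool:
        return list(pool.map(sync_one, items))
```

Raising `capacity` shortens the total run when the server has headroom. Lowering it reduces the number of simultaneous requests the server has to handle.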
...
@@ -55,3 +56,15 @@ which is used by users. Internal URL does not need to be a private address.
Internal URL defaults to External URL, but you can customize it under
**Admin area > Geo Nodes**.
## Multiple secondary nodes behind a load balancer
In GitLab 11.11, **secondary** nodes can use identical external URLs as long as
a unique `name` is set for each Geo node. The `gitlab.rb` setting
`gitlab_rails[geo_node_name]` must:
- Be set for each GitLab instance that runs `unicorn`, `sidekiq`, or `geo_logcursor`.
- Match a Geo node name.
The load balancer must use sticky sessions in order to avoid authentication