Commit 361824d7 authored by Mike Lewis

Merge branch 'docs/refactor-geo-documentation' into 'master'

Refactor Geo documentation landing page

Closes gitlab-ce#50501

See merge request gitlab-org/gitlab-ee!7353
parents 532e4aad a43d7978
# Geo (Geo Replication) **[PREMIUM ONLY]** # Geo Replication **[PREMIUM ONLY]**
> **Notes:** Geo is the solution for widely distributed development teams.
> - Geo is part of [GitLab Premium][ee]
> - Introduced in GitLab Enterprise Edition 8.9
> We recommend you use it with at least GitLab Enterprise Edition 10.0 for
> basic Geo features, or latest version for a better experience
> - You should make sure that all nodes run the same GitLab version
> - Geo requires PostgreSQL 9.6 and Git 2.9 in addition to GitLab's usual
> [minimum requirements][install-requirements]
> - Using Geo in combination with High Availability (HA) is considered **Generally Available** (GA) in GitLab Enterprise Edition 10.4
>
> **Note:**
> Geo changes significantly from release to release. Upgrades **are**
> supported and [documented](#updating-the-geo-nodes), but you should ensure that
> you're following the right version of the documentation for your installation!
> The best way to do this is to follow the documentation from the `/help` endpoint
> on your **primary** node, but you can also navigate to [this page on GitLab.com](https://gitlab.com/gitlab-org/gitlab-ee/blob/master/doc/gitlab-geo/README.md)
> and choose the appropriate release from the `tags` dropdown, e.g., `v10.0.0-ee`.
Geo allows you to replicate your GitLab instance to other geographical
locations as a read-only fully operational version.
## Overview ## Overview
If you have two or more teams geographically spread out, but your GitLab Fetching large repositories can take a long time for teams located far from a single GitLab instance.
instance is in a single location, fetching large repositories can take a long
time.
Your Geo instance can be used for cloning and fetching projects, in addition to Geo provides local, read-only instances of your GitLab instances, reducing the time it takes to clone and fetch large repositories and speeding up development.
reading any data. This will make working with large repositories over large
distances much faster.
![Geo overview](img/geo_overview.png) > - Geo is part of [GitLab Premium](https://about.gitlab.com/pricing/).
> - Introduced in GitLab Enterprise Edition 8.9.
> - We recommend you use:
> - At least GitLab Enterprise Edition 10.0 for basic Geo features.
> - The latest version for a better experience.
> - Make sure that all nodes run the same GitLab version.
> - Geo requires PostgreSQL 9.6 and Git 2.9, in addition to GitLab's usual [minimum requirements](../../../install/requirements.md).
> - Using Geo in combination with [High Availability](../../high_availability/README.md) is considered **Generally Available** (GA) in [GitLab Premium](https://about.gitlab.com/pricing/) 10.4.
When Geo is enabled, we refer to your original instance as a **primary** node For a video introduction to Geo, see [Introduction to GitLab Geo - GitLab Features](https://www.youtube.com/watch?v=-HDLxSjEh6w).
and the replicated read-only ones as **secondaries**.
Keep in mind that: CAUTION: **Caution:**
Geo undergoes significant changes from release to release. Upgrades **are** supported and [documented](#updating-geo), but you should ensure that you're using the right version of the documentation for your installation.
To make sure you're using the right version of the documentation, navigate to [the source version of this page on GitLab.com](https://gitlab.com/gitlab-org/gitlab-ee/blob/master/doc/administration/geo/replication/index.md) and choose the appropriate release from the **Switch branch/tag** dropdown. For example, [`v11.2.3-ee`](https://gitlab.com/gitlab-org/gitlab-ee/blob/v11.2.3-ee/doc/administration/geo/replication/index.md).
## Use cases
Implementing Geo provides the following benefits:
- Reduce from minutes to seconds the time taken for your distributed developers to clone and fetch large repositories and projects.
- Enable all of your developers to contribute ideas and work in parallel, no matter where they are.
- Balance the load between your primary and secondary nodes, or offload your automated tests to the Geo secondary node.
In addition, it:
- Can be used for cloning and fetching projects, in addition to reading any data available in the GitLab web interface (see [current limitations](#current-limitations)).
- Overcomes slow connections between distant offices, saving time by improving speed for distributed teams.
- Helps reduce the loading time for automated tasks, custom integrations, and internal workflows.
- Can quickly fail over to a Geo secondary node in a [disaster recovery](../disaster_recovery/index.md) scenario.
- Allows [planned failover](../disaster_recovery/planned_failover.md) to a Geo secondary node.
Geo provides:
- Secondaries talk to the primary to get user data for logins (API) and to - Read-only secondary nodes: Maintain one primary GitLab node while still enabling a read-only secondary node for each of your distributed teams.
replicate repositories, LFS Objects and Attachments (HTTPS + JWT). - Authentication system hooks: The secondary node receives all authentication data (like user accounts and logins) from the primary instance.
- Since GitLab Premium 10.0, the primary no longer talks to - An intuitive UI: Secondary nodes utilize the same web interface your team has grown accustomed to. In addition, there are visual notifications that block write operations and make it clear that a user is on a secondary node.
secondaries to notify for changes (API).
## Use-cases ## How it works
- Can be used for cloning and fetching projects, in addition Your Geo instance can be used for cloning and fetching projects, in addition to reading any data. This will make working with large repositories over large distances much faster.
to reading any data available in the GitLab web interface (see [current limitations](#current-limitations))
- Overcomes slow connection between distant offices, saving time by
improving speed for distributed teams
- Helps reducing the loading time for automated tasks,
custom integrations and internal workflows
- Quickly failover to a Geo secondary in a [Disaster Recovery][disaster-recovery] scenario
- Allows [planned failover] to a Geo secondary
## Architecture ![Geo overview](img/geo_overview.png)
When Geo is enabled, the:
- Original instance is known as the **primary** node.
- Replicated read-only nodes are known as **secondary** nodes.
Keep in mind that:
The following diagram illustrates the underlying architecture of Geo - Secondary nodes talk to the primary node to:
([source diagram]). - Get user data for logins (API).
- Replicate repositories, LFS Objects, and Attachments (HTTPS + JWT).
- Since GitLab Premium 10.0, the primary node no longer talks to secondary nodes to notify for changes (API).
### Architecture
The following diagram illustrates the underlying architecture of Geo.
![Geo architecture](img/geo_architecture.png) ![Geo architecture](img/geo_architecture.png)
In this diagram, there is one Geo primary node and one secondary. The In this diagram:
secondary clones repositories via git over HTTPS. Attachments, LFS objects, and
other files are downloaded via HTTPS using the GitLab API to authenticate, - There is one primary node and one secondary node.
with a special endpoint protected by JWT. - The secondary node clones repositories via Git over HTTPS. Attachments, LFS objects, and other files are downloaded via HTTPS using the GitLab API to authenticate, with a special endpoint protected by JWT.
- Writes to the database and Git repositories can only be performed on the primary node. The secondary node receives database updates via PostgreSQL streaming replication.
Writes to the database and Git repositories can only be performed on the Geo Note that the secondary node needs two different PostgreSQL databases:
primary node. The secondary node receives database updates via PostgreSQL
streaming replication.
Note that the secondary needs two different PostgreSQL databases: a read-only - A read-only database instance that streams data from the main GitLab database.
instance that streams data from the main GitLab database and another used - [Another database instance](#geo-tracking-database) used internally by the secondary node to record what data has been replicated.
internally by the secondary node to record what data has been replicated.
In the secondary nodes there is an additional daemon: Geo Log Cursor. In the secondary nodes, there is an additional daemon: [Geo Log Cursor](#geo-log-cursor).
## Geo Recommendations ## Geo Recommendations
We highly recommend that you install Geo on an operating system that supports We highly recommend that you install Geo on an operating system that supports OpenSSH 6.9 or higher. The following operating systems are known to ship with a current version of OpenSSH:
OpenSSH 6.9 or higher. The following operating systems are known to ship with a
current version of OpenSSH:
* CentOS 7.4 - [CentOS](https://www.centos.org) 7.4
* Ubuntu 16.04 - [Ubuntu](https://www.ubuntu.com) 16.04
Note that CentOS 6 and 7.0 ship with an old version of OpenSSH that do not NOTE: **Note:**
support a feature that Geo requires. See the [documentation on Geo SSH CentOS 6 and 7.0 ship with an old version of OpenSSH that does not support [fast lookup of authorized SSH keys in the database](../../operations/fast_ssh_key_lookup.md), which Geo requires.
access][fast-ssh-lookup] for more details.
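
As a rough illustration of the OpenSSH 6.9 requirement above, the following Python sketch checks a version banner against that minimum. The function names and the sample banner strings are assumptions made for this example; they are not part of GitLab or OpenSSH.

```python
import re

# Minimum OpenSSH version named in the recommendation above.
MINIMUM_OPENSSH = (6, 9)


def parse_openssh_version(banner):
    """Extract (major, minor) from a banner such as 'OpenSSH_7.4p1, ...'.

    Returns None when the banner does not contain an OpenSSH version.
    """
    match = re.search(r"OpenSSH_(\d+)\.(\d+)", banner)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def meets_geo_minimum(banner, minimum=MINIMUM_OPENSSH):
    """Return True when the banner reports at least the minimum version."""
    version = parse_openssh_version(banner)
    # Tuples compare element by element, so (6, 10) >= (6, 9) holds.
    return version is not None and version >= minimum


# Illustrative banners only; check the real output on each node.
new_enough = meets_geo_minimum("OpenSSH_7.4p1, OpenSSL 1.0.2k-fips  26 Jan 2017")
too_old = meets_geo_minimum("OpenSSH_5.3p1, OpenSSL 1.0.1e-fips 11 Feb 2013")
```

To use this against a real host, pass in the output of `ssh -V`, which writes its banner to standard error rather than standard output.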
### Firewall rules ### Firewall rules
@@ -112,204 +116,164 @@ If you wish to terminate SSL at the GitLab application server instead, use TCP p
### LDAP ### LDAP
We recommend that if you use LDAP on your primary that you also set up a We recommend that if you use LDAP on your primary node, you also set up a secondary LDAP server for the secondary node. Otherwise, users will not be able to perform Git operations over HTTP(s) on the secondary node using HTTP Basic Authentication. However, Git via SSH and personal access tokens will still work.
secondary LDAP server for the secondary Geo node. Otherwise, users will not be
able to perform Git operations over HTTP(s) on the **secondary** Geo node
using HTTP Basic Authentication. However, Git via SSH and personal access
tokens will still work.
Check with your LDAP provider for instructions on on how to set up Check with your LDAP provider for instructions on how to set up replication. For example, OpenLDAP provides [these instructions](https://www.openldap.org/doc/admin24/replication.html).
replication. For example, OpenLDAP provides [these
instructions][ldap-replication].
### Geo Tracking Database ### Geo Tracking Database
We use the tracking database as metadata to control what needs to be The tracking database instance is used as metadata to control what needs to be updated on the disk of the local instance. For example:
updated on the disk of the local instance (for example, download new assets,
fetch new LFS Objects or fetch changes from a repository that has recently been - Download new assets.
updated). - Fetch new LFS Objects.
- Fetch changes from a repository that has recently been updated.
Because the replicated instance is read-only, we need this additional instance Because the replicated database instance is read-only, we need this additional database instance for each secondary node.
per secondary location.
### Geo Log Cursor ### Geo Log Cursor
This daemon reads a log of events replicated by the primary node to the secondary This daemon:
database and updates the Geo Tracking Database with changes that need to be
executed.
When something is marked to be updated in the tracking database, asynchronous - Reads a log of events replicated by the primary node to the secondary database instance.
jobs running on the secondary node will execute the required operations and - Updates the Geo Tracking Database instance with changes that need to be executed.
update the state.
This new architecture allows us to be resilient to connectivity issues between the When something is marked to be updated in the tracking database instance, asynchronous jobs running on the secondary node will execute the required operations and update the state.
nodes. It doesn't matter if it was just a few minutes or days. The secondary
instance will be able to replay all the events in the correct order and get in This new architecture allows GitLab to be resilient to connectivity issues between the nodes. It doesn't matter how long the secondary node is disconnected from the primary node as it will be able to replay all the events in the correct order and become synchronized with the primary node again.
sync again.
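
The replay behaviour described above can be pictured with a small toy model in Python. This is only a sketch of the general idea (an ordered event log plus a saved cursor position); the class and function names are hypothetical and do not reflect how the Geo Log Cursor is implemented in GitLab.

```python
from dataclasses import dataclass, field


@dataclass
class Event:
    id: int          # position in the replicated event log (increasing)
    project_id: int  # project whose repository changed


@dataclass
class TrackingDatabase:
    last_event_id: int = 0                            # saved cursor position
    pending_resync: set = field(default_factory=set)  # projects marked for update


def advance_cursor(event_log, tracking):
    """Mark every event newer than the saved cursor, in log order.

    Returns the number of events processed in this pass.
    """
    new_events = sorted(
        (e for e in event_log if e.id > tracking.last_event_id),
        key=lambda e: e.id,
    )
    for event in new_events:
        tracking.pending_resync.add(event.project_id)
        tracking.last_event_id = event.id
    return len(new_events)


log = [Event(1, 10), Event(2, 11)]
tracking = TrackingDatabase()
first_pass = advance_cursor(log, tracking)

# Events keep accumulating while the secondary is disconnected.
log += [Event(3, 12), Event(4, 10)]
second_pass = advance_cursor(log, tracking)
```

Because the cursor position is stored, the second pass picks up only the events that arrived during the outage, regardless of how long the outage lasted.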
## Setup instructions ## Setup instructions
These instructions assume you have a working instance of GitLab. They will These instructions assume you have a working instance of GitLab. They guide you through:
guide you through making your existing instance the primary Geo node and
adding secondary Geo nodes. 1. Making your existing instance the primary node.
1. Adding secondary nodes.
The steps below should be followed in the order they appear. **Make sure the CAUTION: **Caution:**
GitLab version is the same on all nodes.** The steps below should be followed in the order they appear. **Make sure the GitLab version is the same on all nodes.**
### Using Omnibus GitLab ### Using Omnibus GitLab
If you installed GitLab using the Omnibus packages (highly recommended): If you installed GitLab using the Omnibus packages (highly recommended):
1. [Install GitLab Enterprise Edition][install-ee] on the server that will serve 1. [Install GitLab Enterprise Edition](https://about.gitlab.com/installation/) on the server that will serve as the **secondary** node. Do not create an account or log in to the new secondary node.
as the **secondary** Geo node. Do not create an account or login to the new 1. [Upload the GitLab License](../../../user/admin_area/license.md) on the **primary** node to unlock Geo. The license must be for [GitLab Premium](https://about.gitlab.com/pricing/) or higher.
secondary node. 1. [Set up the database replication](database.md) (`primary (read-write) <-> secondary (read-only)` topology).
1. [Upload the GitLab License][upload-license] on the **primary** 1. [Configure fast lookup of authorized SSH keys in the database](../../operations/fast_ssh_key_lookup.md). This step is required and needs to be done on **both** the primary and secondary nodes.
Geo node to unlock Geo. 1. [Configure GitLab](configuration.md) to set the primary and secondary nodes.
1. [Set up the database replication][database] (`primary (read-write) <-> 1. Optional: [Configure a secondary LDAP server](../../auth/ldap.md) for the secondary node. See [notes on LDAP](#ldap).
secondary (read-only)` topology). 1. [Follow the "Using a Geo Server" guide](using_a_geo_server.md).
1. [Configure fast lookup of authorized SSH keys in the database][fast-ssh-lookup],
this step is required and needs to be done on both the primary AND secondary nodes.
1. [Configure GitLab][configuration] to set the primary and secondary nodes.
1. Optional: [Configure a secondary LDAP server][config-ldap]
for the secondary. See [notes on LDAP](#ldap).
1. [Follow the "Using a Geo Server" guide][using-geo].
### Using GitLab installed from source ### Using GitLab installed from source
If you installed GitLab from source: If you installed GitLab from source:
1. [Install GitLab Enterprise Edition][install-ee-source] on the server that 1. [Install GitLab Enterprise Edition](../../../install/installation.md) on the server that will serve as the **secondary** node. Do not create an account or log in to the new secondary node.
will serve as the **secondary** Geo node. Do not create an account or login 1. [Upload the GitLab License](../../../user/admin_area/license.md) on the **primary** node to unlock Geo. The license must be for [GitLab Premium](https://about.gitlab.com/pricing/) or higher.
to the new secondary node. 1. [Set up the database replication](database_source.md) (`primary (read-write) <-> secondary (read-only)` topology).
1. [Upload the GitLab License][upload-license] on the **primary** 1. [Configure fast lookup of authorized SSH keys in the database](../../operations/fast_ssh_key_lookup.md). Do this step for **both** primary and secondary nodes.
Geo node to unlock Geo. 1. [Configure GitLab](configuration_source.md) to set the primary and secondary nodes.
1. [Set up the database replication][database-source] (`primary (read-write) 1. [Follow the "Using a Geo Server" guide](using_a_geo_server.md).
<-> secondary (read-only)` topology).
1. [Configure fast lookup of authorized SSH keys in the database][fast-ssh-lookup], ## Post-installation documentation
do this step for both primary AND secondary nodes.
1. [Configure GitLab][configuration-source] to set the primary and secondary After installing GitLab on the secondary nodes and performing the initial configuration, see the following documentation for post-installation information.
nodes.
1. [Follow the "Using a Geo Server" guide][using-geo]. ### Configuring Geo
## Configuring Geo For information on configuring Geo, see:
Read through the [Geo configuration][configuration] documentation. - [Geo configuration (GitLab Omnibus)](configuration.md).
- [Geo configuration (source)](configuration_source.md).
## Updating the Geo nodes ### Updating Geo
Read how to [update your Geo nodes to the latest GitLab version][updating-geo]. For information on how to update your Geo nodes to the latest GitLab version, see [Updating the Geo nodes](updating_the_geo_nodes.md).
## Configuring Geo HA ### Configuring Geo high availability
Read through the [Geo High Availability documentation][ha]. For information on configuring Geo for high availability, see [Geo High Availability](high_availability.md).
## Configuring Geo with Object storage ### Configuring Geo with Object Storage
When you have object storage enabled, please consult the For information on configuring Geo with object storage, see [Geo with Object storage](object_storage.md).
[Geo with Object Storage][object-storage] documentation.
## Disaster Recovery ### Disaster Recovery
Read through the [Disaster Recovery documentation][disaster-recovery] how to use Geo to mitigate data-loss and For information on using Geo in disaster recovery situations to mitigate data-loss and restore services, see [Disaster Recovery](../disaster_recovery/index.md).
restore services in a disaster scenario.
### Replicating the Container Registry ### Replicating the Container Registry
Read how to [replicate the Container Registry][docker-registry]. For more information on how to replicate the Container Registry, see [Docker Registry for a secondary node](docker_registry.md).
### Security Review
For more information on Geo security, see [Geo security review](security_review.md).
### Tuning Geo
For more information on tuning Geo, see [Tuning Geo](tuning.md).
## Current limitations ## Current limitations
> **IMPORTANT**: This list of limitations tracks only the latest version. If you are in an older version, CAUTION: **Caution:**
extra limitations may be in place. This list of limitations only reflects the latest version of GitLab. If you are using an older version, extra limitations may be in place.
- Pushing code to a secondary redirects the request to the primary instead of handling it directly [gitlab-ee#1381](https://gitlab.com/gitlab-org/gitlab-ee/issues/1381): - Pushing code to a secondary node redirects the request to the primary node instead of handling it directly [gitlab-ee#1381](https://gitlab.com/gitlab-org/gitlab-ee/issues/1381):
* Push via HTTP and SSH supported - Push via HTTP and SSH supported.
* Git LFS also supported - Git LFS also supported.
- The primary node has to be online for OAuth login to happen (existing sessions and Git are not affected) - The primary node has to be online for OAuth login to happen. Existing sessions and Git are not affected.
- The installation takes multiple manual steps that together can take about an hour depending on circumstances; we are - The installation takes multiple manual steps that together can take about an hour depending on circumstances. We are working on improving this experience. See [gitlab-org/omnibus-gitlab#2978](https://gitlab.com/gitlab-org/omnibus-gitlab/issues/2978) for details.
working on improving this experience, see [gitlab-org/omnibus-gitlab#2978] for details. - Real-time updates of issues/merge requests (for example, via long polling) don't work on the secondary node.
- Real-time updates of issues/merge requests (e.g. via long polling) doesn't work on the secondary - [Selective synchronization](configuration.md#selective-synchronization) applies only to files and repositories. Other datasets are replicated to the secondary node in full, making it inappropriate for use as an access control mechanism.
- [Selective synchronization](configuration.md#selective-synchronization)
applies only to files and repositories. Other datasets are replicated to the
secondary in full, making it inappropriate for use as an access control
mechanism.
### Limitations on replication ### Limitations on replication
Only the following items are replicated to the secondary. Data not on this list is unavailable on the secondary. Failing over without manually replicating it will cause the data to be **lost**: Only the following items are replicated to the secondary node:
- All database content (e.g. snippets, epics, issues, merge requests, groups, project metadata, etc) - All database content. For example, snippets, epics, issues, merge requests, groups, and project metadata.
- Project repositories - Project repositories.
- Project wiki repositories - Project wiki repositories.
- User uploads (e.g. attachments to issues, merge requests and epics, avatars, etc) - User uploads. For example, attachments to issues, merge requests, epics, and avatars.
- CI job artifacts and traces - CI job artifacts and traces.
### Examples of unreplicated data DANGER: **Danger:**
Data not on this list is unavailable on the secondary node. Failing over without manually replicating data not on this list will cause the data to be **lost**.
Take special note that these GitLab features are both commonly used, and **not** ### Examples of data not replicated
replicated by Geo at present. If you wish to use them on the secondary, or to
execute a failover successfully, you will need to replicate their data using
some other means.
- [Elasticsearch integration](../../../integration/elasticsearch.md) Take special note that these examples of GitLab features are both:
- [Container Registry](../../container_registry.md) ([Object Storage][docker-registry] can mitigate this)
- [GitLab Pages](../../pages/index.md) - Commonly used.
- [Mattermost integration](https://docs.gitlab.com/omnibus/gitlab-mattermost/) - **Not** replicated by Geo at present.
Examples include:
- [Elasticsearch integration](../../../integration/elasticsearch.md).
- [Container Registry](../../container_registry.md). [Object Storage](object_storage.md) can mitigate this.
- [GitLab Pages](../../pages/index.md).
- [Mattermost integration](https://docs.gitlab.com/omnibus/gitlab-mattermost/).
CAUTION: **Caution:**
If you wish to use them on a secondary node, or to execute a failover successfully, you will need to replicate their data using some other means.
## Frequently Asked Questions ## Frequently Asked Questions
Read more in the [Geo FAQ][faq]. For answers to common questions, see the [Geo FAQ](faq.md).
## Log files ## Log files
Since GitLab 9.5, Geo stores structured log messages in a `geo.log` file. For Since GitLab 9.5, Geo stores structured log messages in a `geo.log` file. For Omnibus installations, this file is at `/var/log/gitlab/gitlab-rails/geo.log`.
Omnibus installations, this file can be found in
`/var/log/gitlab/gitlab-rails/geo.log`. This file contains information about This file contains information about when Geo attempts to sync repositories and files. Each line in the file contains a separate JSON entry that can be ingested into Elasticsearch, Splunk, etc.
when Geo attempts to sync repositories and files. Each line in the file contains a
separate JSON entry that can be ingested into Elasticsearch, Splunk, etc. For For example:
example:
```json ```json
{"severity":"INFO","time":"2017-08-06T05:40:16.104Z","message":"Repository update","project_id":1,"source":"repository","resync_repository":true,"resync_wiki":true,"class":"Gitlab::Geo::LogCursor::Daemon","cursor_delay_s":0.038} {"severity":"INFO","time":"2017-08-06T05:40:16.104Z","message":"Repository update","project_id":1,"source":"repository","resync_repository":true,"resync_wiki":true,"class":"Gitlab::Geo::LogCursor::Daemon","cursor_delay_s":0.038}
``` ```
This message shows that Geo detected that a repository update was needed for project 1. This message shows that Geo detected that a repository update was needed for project `1`.
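
Since each line of `geo.log` is a separate JSON entry, the file can be processed line by line. The following Python sketch parses the sample entry shown above and lists the projects flagged for a repository resync. The helper names are assumptions for this example, and only the fields visible in the sample entry are used.

```python
import json


def parse_geo_log(lines):
    """Yield one dict per line, skipping blank lines and invalid JSON."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            yield json.loads(line)
        except json.JSONDecodeError:
            continue


def projects_needing_resync(entries):
    """Return sorted project IDs from entries with resync_repository set."""
    return sorted(
        {
            entry["project_id"]
            for entry in entries
            if entry.get("resync_repository") and "project_id" in entry
        }
    )


# The sample entry from this page; a real script would read the log file.
sample_lines = [
    '{"severity":"INFO","time":"2017-08-06T05:40:16.104Z",'
    '"message":"Repository update","project_id":1,"source":"repository",'
    '"resync_repository":true,"resync_wiki":true,'
    '"class":"Gitlab::Geo::LogCursor::Daemon","cursor_delay_s":0.038}',
]

entries = list(parse_geo_log(sample_lines))
resync_ids = projects_needing_resync(entries)
```

On an Omnibus installation, the same functions could be fed from `open("/var/log/gitlab/gitlab-rails/geo.log")` in place of `sample_lines`.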
## Security of Geo
Read the [security review][security-review] page.
## Tuning Geo
Read the [Geo tuning][tunning] documentation.
## Troubleshooting ## Troubleshooting
Read the [troubleshooting document][troubleshooting]. For troubleshooting steps, see [Geo Troubleshooting](troubleshooting.md).
[ee]: https://about.gitlab.com/pricing/ "GitLab Enterprise Edition landing page"
[install-requirements]: ../../../install/requirements.md
[install-ee]: https://about.gitlab.com/downloads-ee/ "GitLab Enterprise Edition Omnibus packages downloads page"
[install-ee-source]: https://docs.gitlab.com/ee/install/installation.html "GitLab Enterprise Edition installation from source"
[disaster-recovery]: ../disaster_recovery/index.md
[planned failover]: ../disaster_recovery/planned_failover.md
[fast-ssh-lookup]: ../../operations/fast_ssh_key_lookup.md
[upload-license]: ../../../user/admin_area/license.md
[database]: database.md
[database-source]: database_source.md
[configuration]: configuration.md
[configuration-source]: configuration_source.md
[config-ldap]: ../../auth/ldap.md
[using-geo]: using_a_geo_server.md
[updating-geo]: updating_the_geo_nodes.md
[ha]: high_availability.md
[object-storage]: object_storage.md
[docker-registry]: docker_registry.md
[faq]: faq.md
[security-review]: security_review.md
[tunning]: tuning.md
[troubleshooting]: troubleshooting.md
[source diagram]: https://docs.google.com/drawings/d/1Abw0P_H0Ew1-2Lj_xPDRWP87clGIke-1fil7_KQqrtE/edit
[ldap-replication]: https://www.openldap.org/doc/admin24/replication.html
[gitlab-org/gitlab-ee#3912]: https://gitlab.com/gitlab-org/gitlab-ee/issues/3912
[gitlab-org/omnibus-gitlab#2978]: https://gitlab.com/gitlab-org/omnibus-gitlab/issues/2978