Commit d9e80413 authored by Nick Thomas

Merge branch 'tc-geo-doc-tracking-db' into 'master'

Revise GitLab Geo documentation

Closes #3365 and #3400

See merge request gitlab-org/gitlab-ee!2858
parents ae19f96c e5d7dc25
...@@ -56,9 +56,9 @@ components below. ...@@ -56,9 +56,9 @@ components below.
### High Availability with Sentinel ### High Availability with Sentinel
>**Notes:** >**Notes:**
- Starting with GitLab `8.11`, you can configure a list of Redis Sentinel - Starting with GitLab 8.11, you can configure a list of Redis Sentinel
servers that will monitor a group of Redis servers to provide failover support. servers that will monitor a group of Redis servers to provide failover support.
- Starting with GitLab `8.14`, the Omnibus GitLab Enterprise Edition package - Starting with GitLab 8.14, the Omnibus GitLab Enterprise Edition package
comes with Redis Sentinel daemon built-in. comes with Redis Sentinel daemon built-in.
High Availability with Redis requires a few things: High Availability with Redis requires a few things:
......
# GitLab Geo # GitLab Geo
NOTE: GitLab Geo is in ALPHA development. It is considered experimental and >**Note:**
GitLab Geo is in **Beta** development. It is considered experimental and
not production-ready. It will undergo significant changes over the next year, not production-ready. It will undergo significant changes over the next year,
and there is significant chance of data loss. For the latest updates, check the and there is significant chance of data loss. For the latest updates, check the
[meta issue](https://gitlab.com/gitlab-org/gitlab-ee/issues/846). [meta issue](https://gitlab.com/gitlab-org/gitlab-ee/issues/846).
...@@ -32,9 +33,10 @@ and the replicated read-only ones as **secondaries**. ...@@ -32,9 +33,10 @@ and the replicated read-only ones as **secondaries**.
Keep in mind that: Keep in mind that:
- Secondaries talk to primary to get user data for logins (API), and to - Secondaries talk to primary to authorize user logins (OAuth),
clone/pull from repositories (HTTP(S)/SSH). to synchronize data (database replication), and to clone/pull from
- Primary talks to secondaries to notify for changes (API). repositories (SSH).
- Primary talks to secondaries to get their status (API).
## Use-cases ## Use-cases
......
...@@ -3,12 +3,15 @@ ...@@ -3,12 +3,15 @@
After you set up the [database replication and configure the GitLab Geo nodes][req], After you set up the [database replication and configure the GitLab Geo nodes][req],
there are a few things to consider: there are a few things to consider:
1. When you create a new project in the primary node, the Git repository will 1. Users need an extra step to be able to fetch code from the secondary and push
appear in the secondary only _after_ the first `git push`. to primary:
1. You need an extra step to be able to fetch code from the `secondary` and push
to `primary`: 1. Clone the repository as normal do from the secondary node:
```bash
git clone git@secondary.gitlab.example.com:user/repo.git
```
1. Clone your repository as you would normally do from the `secondary` node
1. Change the remote push URL following this example: 1. Change the remote push URL following this example:
```bash ```bash
......
...@@ -33,8 +33,9 @@ can be summed up to: ...@@ -33,8 +33,9 @@ can be summed up to:
1. Configure the primary node 1. Configure the primary node
1. Replicate some required configurations between the primary and the secondaries 1. Replicate some required configurations between the primary and the secondaries
1. Start GitLab in the secondary node's machine 1. Configure a second, tracking database on each secondary
1. Configure every secondary node in the primary's Admin screen 1. Configure every secondary node in the primary's Admin screen
1. Start GitLab on the secondary node's machine
### Prerequisites ### Prerequisites
...@@ -49,6 +50,9 @@ first two steps of the [Setup instructions](README.md#setup-instructions): ...@@ -49,6 +50,9 @@ first two steps of the [Setup instructions](README.md#setup-instructions):
1. Your nodes must have an NTP service running to synchronize the clocks. 1. Your nodes must have an NTP service running to synchronize the clocks.
You can use different timezones, but the hour relative to UTC can't be more You can use different timezones, but the hour relative to UTC can't be more
than 60 seconds off from each node. than 60 seconds off from each node.
1. You have set up another PostgreSQL database that can store writes for the secondary.
Note that this MUST be on another instance, since the primary replicated database
is read-only.
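The 60-second clock tolerance in the prerequisites above can be checked with a short script. This is a minimal sketch and not part of GitLab: the `skew_ok` function name is hypothetical, and it assumes you have collected each node's UTC epoch time yourself at roughly the same moment (for example by running `date -u +%s` on each node).

```bash
# Hypothetical helper, not shipped with GitLab.
# Usage: skew_ok PRIMARY_EPOCH SECONDARY_EPOCH [TOLERANCE_SECONDS]
# Succeeds (exit status 0) when the two UTC epoch timestamps differ by at
# most the tolerance, which defaults to the 60 seconds required above.
skew_ok() {
  a=$1
  b=$2
  tolerance=${3:-60}
  delta=$(( a - b ))
  # Take the absolute value of the difference.
  if [ "$delta" -lt 0 ]; then
    delta=$(( 0 - delta ))
  fi
  [ "$delta" -le "$tolerance" ]
}

# Example: `date -u +%s` prints seconds since the Unix epoch.
# Here the local clock is compared with itself, so the check passes.
now=$(date -u +%s)
if skew_ok "$now" "$now"; then
  echo "clocks are within tolerance"
fi
```

Because epoch seconds are independent of the configured time zone, this comparison matches the requirement that nodes may use different time zones as long as their UTC times agree.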
Some of the following steps require to configure the primary and secondary Some of the following steps require to configure the primary and secondary
nodes almost at the same time. For your convenience make sure you have SSH nodes almost at the same time. For your convenience make sure you have SSH
...@@ -106,7 +110,7 @@ sensitive data in the database. Any secondary node must have the ...@@ -106,7 +110,7 @@ sensitive data in the database. Any secondary node must have the
sudo -i sudo -i
``` ```
1. Added in GitLab 9.1: Execute the command below to display current encryption key and copy it: 1. Execute the command below to display the current encryption key and copy it:
``` ```
gitlab-rake geo:db:show_encryption_key gitlab-rake geo:db:show_encryption_key
...@@ -134,16 +138,21 @@ sensitive data in the database. Any secondary node must have the ...@@ -134,16 +138,21 @@ sensitive data in the database. Any secondary node must have the
### Step 4. Regenerating the authorized keys in the secondary node ### Step 4. Regenerating the authorized keys in the secondary node
> **IMPORTANT:** Since GitLab 10.0, `~/.ssh/authorized_keys` can no longer
> be used, and this step is deprecated. Instead, follow the
> instructions on [configuring SSH authorization via database lookups](../administration/operations/speed_up_ssh.html)
> (for both primary AND secondary nodes).
Regenerate the keys for `~/.ssh/authorized_keys` Regenerate the keys for `~/.ssh/authorized_keys`
(HTTPS clone will still work without this extra step). (HTTPS clone will still work without this extra step).
On the **secondary** node where the database is [already replicated](./database.md), 1. On the **secondary** node where the database is [already replicated](./database.md),
run: run:
``` ```
# For Omnibus installations # For Omnibus installations
gitlab-rake gitlab:shell:setup gitlab-rake gitlab:shell:setup
``` ```
This will enable `git` operations to authorize against your existing users. This will enable `git` operations to authorize against your existing users.
New users and SSH keys updated after this step, will be replicated automatically. New users and SSH keys updated after this step, will be replicated automatically.
...@@ -187,10 +196,10 @@ The two most obvious issues that replication can have here are: ...@@ -187,10 +196,10 @@ The two most obvious issues that replication can have here are:
### Step 6. Replicating the repositories data ### Step 6. Replicating the repositories data
Lastly, getting a new secondary Geo node up and running, will also require the Getting a new secondary Geo node up and running, will also require the
repositories data to be synced. repositories data to be synced.
With GitLab **9.0** the syncing process starts automatically from the With GitLab 9.0 the syncing process starts automatically from the
secondary node after the **Add Node** button is pressed. secondary node after the **Add Node** button is pressed.
Currently, this is what is synced: Currently, this is what is synced:
...@@ -212,7 +221,7 @@ repository shards you must duplicate the same configuration on the secondary. ...@@ -212,7 +221,7 @@ repository shards you must duplicate the same configuration on the secondary.
Disabling a secondary node stops the syncing process. Disabling a secondary node stops the syncing process.
With GitLab **8.14** this process is started manually from the primary node. With GitLab 8.14 this process is started manually from the primary node.
You can start the syncing process by clicking the "Backfill all repositories" You can start the syncing process by clicking the "Backfill all repositories"
button on `Admin > Geo Nodes` screen. button on `Admin > Geo Nodes` screen.
...@@ -259,7 +268,7 @@ Point your users to the [after setup steps](after_setup.md). ...@@ -259,7 +268,7 @@ Point your users to the [after setup steps](after_setup.md).
## Selective replication ## Selective replication
With GitLab **9.5**, GitLab Geo now supports the first iteration of selective With GitLab 9.5, GitLab Geo now supports the first iteration of selective
replication, which allows admins to choose which namespaces should be replication, which allows admins to choose which namespaces should be
replicated by secondary nodes. replicated by secondary nodes.
......
...@@ -22,9 +22,9 @@ in your testing/production environment. ...@@ -22,9 +22,9 @@ in your testing/production environment.
## Setting up GitLab ## Setting up GitLab
>**Notes:** >**Notes:**
- Don't setup any custom authentication in the secondary nodes, this will be - **Do not** setup any custom authentication in the secondary nodes, this will be
handled by the primary node. handled by the primary node.
- Do not add anything in the secondaries Geo nodes admin area - **Do not** add anything in the secondaries Geo nodes admin area
(**Admin Area ➔ Geo Nodes**). This is handled solely by the primary node. (**Admin Area ➔ Geo Nodes**). This is handled solely by the primary node.
After having installed GitLab Enterprise Edition in the instance that will serve After having installed GitLab Enterprise Edition in the instance that will serve
...@@ -33,9 +33,9 @@ next steps can be summed up to: ...@@ -33,9 +33,9 @@ next steps can be summed up to:
1. Configure the primary node 1. Configure the primary node
1. Replicate some required configurations between the primary and the secondaries 1. Replicate some required configurations between the primary and the secondaries
1. Configure a second, tracking database for each secondary 1. Configure a second, tracking database on each secondary
1. Start GitLab in the secondary node's machine
1. Configure every secondary node in the primary's Admin screen 1. Configure every secondary node in the primary's Admin screen
1. Start GitLab on the secondary node's machine
### Prerequisites ### Prerequisites
...@@ -47,6 +47,9 @@ first two steps of the [Setup instructions](README.md#setup-instructions): ...@@ -47,6 +47,9 @@ first two steps of the [Setup instructions](README.md#setup-instructions):
1. You have set up the database replication. 1. You have set up the database replication.
1. Your secondary node is allowed to communicate via HTTP/HTTPS and SSH with 1. Your secondary node is allowed to communicate via HTTP/HTTPS and SSH with
your primary node (make sure your firewall is not blocking that). your primary node (make sure your firewall is not blocking that).
1. Your nodes must have an NTP service running to synchronize the clocks.
You can use different timezones, but the hour relative to UTC can't be more
than 60 seconds off from each node.
1. You have set up another PostgreSQL database that can store writes for the secondary. 1. You have set up another PostgreSQL database that can store writes for the secondary.
Note that this MUST be on another instance, since the primary replicated database Note that this MUST be on another instance, since the primary replicated database
is read-only. is read-only.
...@@ -117,56 +120,37 @@ sensitive data in the database. Any secondary node must have the ...@@ -117,56 +120,37 @@ sensitive data in the database. Any secondary node must have the
sudo -i sudo -i
``` ```
1. (This step is required only if you want to enable the new Disaster Recovery 1. Open the secrets file and paste the value of `db_key_base` you copied in the
feature in Alpha shipped in GitLab 9.0) Create `database_geo.yml` with the previous step:
information of your secondary PostgreSQL database.
Note that this must be a totally different instance from the primary, since this
is where the secondary will track its internal state:
``` ```
sudo cp /home/git/gitlab/config/database_geo.yml.postgresql /home/git/gitlab/config/database_geo.yml editor /etc/gitlab/gitlab-secrets.json
``` ```
1. (This step is required only if you want to enable the new Disaster Recovery 1. Save and close the file.
feature in Alpha shipped in GitLab 9.0) Edit the content of `database_geo.yml`
in `production:` to be like the following:
```yaml
#
# PRODUCTION
#
production:
adapter: postgresql
encoding: unicode
database: gitlabhq_geo_production
pool: 10
username: gitlab_geo
# password:
host: /var/opt/gitlab/geo-postgresql
port: 5431
```
1. (This step is required only if you want to enable the new Disaster Recovery ### Step 4. Regenerating the authorized keys in the secondary node
feature in Alpha shipped in GitLab 9.0) Create the database
`gitlabhq_geo_production` in that PostgreSQL instance.
1. (This step is required only if you want to enable the new Disaster Recovery > **IMPORTANT:** Since GitLab 10.0 `~/.ssh/authorized_keys` no longer
feature in Alpha shipped in GitLab 9.0) Set up the Geo tracking database: > can be used, and this step is deprecated. Instead, follow the
> instructions on [configuring SSH authorization via database lookups](../administration/operations/speed_up_ssh.html)
> (for both primary AND secondary nodes).
``` Regenerate the keys for `~/.ssh/authorized_keys`
bundle exec rake geo:db:migrate (HTTPS clone will still work without this extra step).
```
1. Open the secrets file and paste the value of `db_key_base` you copied in the 1. On the **secondary** node where the database is [already replicated](./database.md),
previous step: run:
``` ```
editor /home/git/gitlab/config/secrets.yml # Installations from source
sudo -u git -H bundle exec rake gitlab:shell:setup RAILS_ENV=production
``` ```
1. Save and close the file. This will enable `git` operations to authorize against your existing users.
New users and SSH keys updated after this step, will be replicated automatically.
### Step 4. Enabling the secondary GitLab node ### Step 5. Enabling the secondary GitLab node
1. SSH into the **secondary** node and login as root: 1. SSH into the **secondary** node and login as root:
...@@ -193,6 +177,7 @@ sensitive data in the database. Any secondary node must have the ...@@ -193,6 +177,7 @@ sensitive data in the database. Any secondary node must have the
in your browser. in your browser.
1. Add the secondary node by providing its full URL and the public SSH key 1. Add the secondary node by providing its full URL and the public SSH key
you created previously. **Do NOT** check the box 'This is a primary node'. you created previously. **Do NOT** check the box 'This is a primary node'.
1. Added in GitLab 9.5: Choose which namespaces should be replicated by the secondary node. Leave blank to replicate all. Read more in [selective replication](#selective-replication).
1. Click the **Add node** button. 1. Click the **Add node** button.
--- ---
...@@ -210,12 +195,12 @@ The two most obvious issues that replication can have here are: ...@@ -210,12 +195,12 @@ The two most obvious issues that replication can have here are:
[Troubleshooting](configuration.md#troubleshooting) section) [Troubleshooting](configuration.md#troubleshooting) section)
- Instance is firewalled (check your firewall rules) - Instance is firewalled (check your firewall rules)
### Step 5. Replicating the repositories data ### Step 6. Replicating the repositories data
Getting a new secondary Geo node up and running, will also require the Getting a new secondary Geo node up and running, will also require the
repositories data to be synced. repositories data to be synced.
With GitLab **9.0** the syncing process starts automatically from the With GitLab 9.0 the syncing process starts automatically from the
secondary node after the **Add Node** button is pressed. secondary node after the **Add Node** button is pressed.
Currently, this is what is synced: Currently, this is what is synced:
...@@ -230,11 +215,14 @@ You can monitor the status of the syncing process on a secondary node ...@@ -230,11 +215,14 @@ You can monitor the status of the syncing process on a secondary node
by visiting the primary node's **Admin Area ➔ Geo Nodes** (`/admin/geo_nodes`) by visiting the primary node's **Admin Area ➔ Geo Nodes** (`/admin/geo_nodes`)
in your browser. in your browser.
Please note that if `git_data_dirs` is customized on the primary for multiple
repository shards you must duplicate the same configuration on the secondary.
![GitLab Geo dashboard](img/geo-node-dashboard.png) ![GitLab Geo dashboard](img/geo-node-dashboard.png)
Disabling a secondary node stops the syncing process. Disabling a secondary node stops the syncing process.
With GitLab **8.14** this process is started manually from the primary node. With GitLab 8.14 this process is started manually from the primary node.
You can start the syncing process by clicking the "Backfill all repositories" You can start the syncing process by clicking the "Backfill all repositories"
button on `Admin > Geo Nodes` screen. button on `Admin > Geo Nodes` screen.
...@@ -267,22 +255,6 @@ While active repositories will be eventually replicated, if you don't rsync, ...@@ -267,22 +255,6 @@ While active repositories will be eventually replicated, if you don't rsync,
the files, any archived/inactive repositories will not get in the secondary node the files, any archived/inactive repositories will not get in the secondary node
as Geo doesn't run any routine task to look for missing repositories. as Geo doesn't run any routine task to look for missing repositories.
### Step 6. Regenerating the authorized keys in the secondary node
The final step is to regenerate the keys for `~/.ssh/authorized_keys`
(HTTPS clone will still work without this extra step).
On the **secondary** node where the database is [already replicated](./database.md),
run:
```
# Installations from source
sudo -u git -H bundle exec rake gitlab:shell:setup RAILS_ENV=production
```
This will enable `git` operations to authorize against your existing users.
New users and SSH keys updated after this step, will be replicated automatically.
## Next steps ## Next steps
Your nodes should now be ready to use. You can login to the secondary node Your nodes should now be ready to use. You can login to the secondary node
...@@ -295,6 +267,10 @@ If your installation isn't working properly, check the ...@@ -295,6 +267,10 @@ If your installation isn't working properly, check the
Point your users to the [after setup steps](after_setup.md). Point your users to the [after setup steps](after_setup.md).
## Selective replication
Read [Selective replication](configuration.md#selective-replication).
## Adding another secondary Geo node ## Adding another secondary Geo node
To add another Geo node in an already Geo configured infrastructure, just follow To add another Geo node in an already Geo configured infrastructure, just follow
......
...@@ -8,7 +8,7 @@ from source, follow the ...@@ -8,7 +8,7 @@ from source, follow the
1. [Install GitLab Enterprise Edition][install-ee] on the server that will serve 1. [Install GitLab Enterprise Edition][install-ee] on the server that will serve
as the secondary Geo node. Do not login or set up anything else in the as the secondary Geo node. Do not login or set up anything else in the
secondary node for the moment. secondary node for the moment.
1. **Setup the database replication (`primary (read-write) <-> secondary (read-only)` topology).** 1. **Setup the database replication topology:** `primary (read-write) <-> secondary (read-only)`
1. [Configure GitLab](configuration.md) to set the primary and secondary nodes. 1. [Configure GitLab](configuration.md) to set the primary and secondary nodes.
1. [Follow the after setup steps](after_setup.md). 1. [Follow the after setup steps](after_setup.md).
...@@ -24,14 +24,14 @@ in your testing/production environment. ...@@ -24,14 +24,14 @@ in your testing/production environment.
## PostgreSQL replication ## PostgreSQL replication
The GitLab primary node where the write operations happen will connect to The GitLab primary node where the write operations happen will connect to
`primary` database server, and the secondary ones which are read-only will primary database server, and the secondary ones which are read-only will
connect to `secondary` database servers (which are read-only too). connect to secondary database servers (which are read-only too).
>**Note:** >**Note:**
In many databases documentation you will see `primary` being references as `master` In many databases documentation you will see "primary" being referenced as "master"
and `secondary` as either `slave` or `standby` server (read-only). and "secondary" as either "slave" or "standby" server (read-only).
New for GitLab 9.4: We recommend using [PostgreSQL replication Since GitLab 9.4: We recommend using [PostgreSQL replication
slots](https://medium.com/@tk512/replication-slots-in-postgresql-b4b03d277c75) slots](https://medium.com/@tk512/replication-slots-in-postgresql-b4b03d277c75)
to ensure the primary retains all the data necessary for the secondaries to to ensure the primary retains all the data necessary for the secondaries to
recover. See below for more details. recover. See below for more details.
...@@ -40,17 +40,17 @@ recover. See below for more details. ...@@ -40,17 +40,17 @@ recover. See below for more details.
The following guide assumes that: The following guide assumes that:
- You are using PostgreSQL 9.2 or later which includes the - You are using PostgreSQL 9.6 or later which includes the
[`pg_basebackup` tool][pgback]. If you are using Omnibus it includes the required [`pg_basebackup` tool][pgback]. If you are using Omnibus it includes the required
PostgreSQL version for Geo. PostgreSQL version for Geo.
- You have a primary server already set up (the GitLab server you are - You have a primary server already set up (the GitLab server you are
replicating from), running Omnibus' PostgreSQL (or equivalent version), and you replicating from), running Omnibus' PostgreSQL (or equivalent version), and you
have a new secondary server set up on the same OS and PostgreSQL version. If have a new secondary server set up on the same OS and PostgreSQL version. Also
you are using Omnibus, make sure the GitLab version is the same on all nodes. make sure the GitLab version is the same on all nodes.
- The IP of the primary server for our examples will be `1.2.3.4`, whereas the - The IP of the primary server for our examples will be `1.2.3.4`, whereas the
secondary's IP will be `5.6.7.8`. Note that the primary and secondary servers secondary's IP will be `5.6.7.8`. Note that the primary and secondary servers
MUST be able to communicate over these addresses. These IP addresses can either **must** be able to communicate over these addresses (using HTTPS & SSH).
be public or private. These IP addresses can either be public or private.
### Step 1. Configure the primary server ### Step 1. Configure the primary server
...@@ -177,13 +177,71 @@ The following guide assumes that: ...@@ -177,13 +177,71 @@ The following guide assumes that:
\q \q
``` ```
1. Added in GitLab 9.1: Edit `/etc/gitlab/gitlab.rb` and add the following: 1. Edit `/etc/gitlab/gitlab.rb` and add the following:
```ruby ```ruby
geo_secondary_role['enable'] = true geo_secondary_role['enable'] = true
```
1. Optional since GitLab 9.1, and required for GitLab 10.0 or higher:
[Enable tracking database on the secondary server](#enable-tracking-database-on-the-secondary-server)
1. Otherwise, continue to [initiate the replication process](#step-3-initiate-the-replication-process).
#### Enable tracking database on the secondary server
Geo secondary nodes use a tracking database to keep track of replication status and recover
automatically from some replication issues.
It was added in GitLab 9.1, and it is required since GitLab 10.0.
> **IMPORTANT:** For this feature to work correctly, all nodes must have
their clocks synchronized. It is not required for all nodes to be set to
the same time zone, but when the respective times are converted to UTC time,
the clocks must be synchronized to within 60 seconds of each other.
1. Set up a clock synchronization service in your Linux distro.
This can easily be done via any NTP-compatible daemon. For example,
here are [instructions for setting up NTP with Ubuntu](https://help.ubuntu.com/lts/serverguide/NTP.html).
1. Edit `/etc/gitlab/gitlab.rb` and add the following:
```ruby
geo_postgresql['enable'] = true geo_postgresql['enable'] = true
``` ```
1. Create `database_geo.yml` with the information of your secondary PostgreSQL
database. Note that GitLab will set up another database instance separate
from the primary, since this is where the secondary will track its internal
state:
```
sudo cp /opt/gitlab/embedded/service/gitlab-rails/config/database_geo.yml.postgresql /opt/gitlab/embedded/service/gitlab-rails/config/database_geo.yml
```
1. Edit the content of `database_geo.yml` in `production:` like the example below:
```yaml
#
# PRODUCTION
#
production:
adapter: postgresql
encoding: unicode
database: gitlabhq_geo_production
pool: 10
username: gitlab_geo
# password:
host: /var/opt/gitlab/geo-postgresql
```
1. Set up the Geo tracking database:
```
sudo gitlab-rake geo:db:migrate
```
1. [Reconfigure GitLab][] for the changes to take effect. 1. [Reconfigure GitLab][] for the changes to take effect.
1. Continue to [initiate the replication process](#step-3-initiate-the-replication-process). 1. Continue to [initiate the replication process](#step-3-initiate-the-replication-process).
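Before running the migration, it can help to confirm that `database_geo.yml` contains the keys shown in the example above. The following is a minimal sketch under stated assumptions: `check_geo_db_config` is a hypothetical helper, not a GitLab command, and it only looks for indented `key:` lines as plain text rather than parsing YAML.

```bash
# Hypothetical helper, not shipped with GitLab.
# Usage: check_geo_db_config PATH_TO_database_geo.yml
# Prints each expected key that is missing and returns non-zero if any are.
# This is a plain-text check for indented "key:" lines, not a YAML parser,
# so commented-out entries such as "# password:" are ignored.
check_geo_db_config() {
  file=$1
  missing=0
  for key in adapter encoding database pool username host; do
    if ! grep -Eq "^[[:space:]]+${key}:" "$file"; then
      echo "missing key: ${key}"
      missing=1
    fi
  done
  return "$missing"
}
```

With the Omnibus path used above, it could be called as `check_geo_db_config /opt/gitlab/embedded/service/gitlab-rails/config/database_geo.yml`. A zero exit status only means the key names are present; it does not validate their values.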
......
...@@ -8,7 +8,7 @@ using the Omnibus GitLab packages, follow the ...@@ -8,7 +8,7 @@ using the Omnibus GitLab packages, follow the
1. [Install GitLab Enterprise Edition][install-ee-source] on the server that 1. [Install GitLab Enterprise Edition][install-ee-source] on the server that
will serve as the secondary Geo node. Do not login or set up anything else will serve as the secondary Geo node. Do not login or set up anything else
in the secondary node for the moment. in the secondary node for the moment.
1. **Setup the database replication (`primary (read-write) <-> secondary (read-only)` topology).** 1. **Setup the database replication topology:** `primary (read-write) <-> secondary (read-only)`
1. [Configure GitLab](configuration_source.md) to set the primary and secondary 1. [Configure GitLab](configuration_source.md) to set the primary and secondary
nodes. nodes.
1. [Follow the after setup steps](after_setup.md). 1. [Follow the after setup steps](after_setup.md).
...@@ -25,12 +25,17 @@ in your testing/production environment. ...@@ -25,12 +25,17 @@ in your testing/production environment.
## PostgreSQL replication ## PostgreSQL replication
The GitLab primary node where the write operations happen will connect to The GitLab primary node where the write operations happen will connect to
`primary` database server, and the secondary ones which are read-only will primary database server, and the secondary ones which are read-only will
connect to `secondary` database servers (which are read-only too). connect to secondary database servers (which are read-only too).
>**Note:** >**Note:**
In many databases documentation you will see `primary` being references as `master` In many databases documentation you will see "primary" being referenced as "master"
and `secondary` as either `slave` or `standby` server (read-only). and "secondary" as either "slave" or "standby" server (read-only).
Since GitLab 9.4: We recommend using [PostgreSQL replication
slots](https://medium.com/@tk512/replication-slots-in-postgresql-b4b03d277c75)
to ensure the primary retains all the data necessary for the secondaries to
recover. See below for more details.
### Prerequisites ### Prerequisites
...@@ -41,13 +46,15 @@ The following guide assumes that: ...@@ -41,13 +46,15 @@ The following guide assumes that:
PostgreSQL version for Geo. PostgreSQL version for Geo.
- You have a primary server already set up (the GitLab server you are - You have a primary server already set up (the GitLab server you are
replicating from), and you have a new secondary server set up on the same OS replicating from), and you have a new secondary server set up on the same OS
and PostgreSQL version. and PostgreSQL version. Also make sure the GitLab version is the same on all nodes.
- The IP of the primary server for our examples will be `1.2.3.4`, whereas the - The IP of the primary server for our examples will be `1.2.3.4`, whereas the
secondary's IP will be `5.6.7.8`. secondary's IP will be `5.6.7.8`. Note that the primary and secondary servers
**must** be able to communicate over these addresses. These IP addresses can either
be public or private.
### Step 1. Configure the primary server ### Step 1. Configure the primary server
1. SSH into your database **primary** server and login as root: 1. SSH into your GitLab **primary** server and login as root:
``` ```
sudo -i sudo -i
...@@ -125,10 +132,11 @@ The following guide assumes that: ...@@ -125,10 +132,11 @@ The following guide assumes that:
1. Now that the PostgreSQL server is set up to accept remote connections, run 1. Now that the PostgreSQL server is set up to accept remote connections, run
`netstat -plnt` to make sure that PostgreSQL is listening to the server's `netstat -plnt` to make sure that PostgreSQL is listening to the server's
public IP. public IP.
1. Continue to [set up the secondary server](#step-2-configure-the-secondary-server).
### Step 2. Configure the secondary server ### Step 2. Configure the secondary server
1. SSH into your database **secondary** server and login as root: 1. SSH into your GitLab **secondary** server and login as root:
``` ```
sudo -i sudo -i
...@@ -151,7 +159,7 @@ The following guide assumes that: ...@@ -151,7 +159,7 @@ The following guide assumes that:
``` ```
1. Edit `postgresql.conf` to configure the secondary for streaming replication 1. Edit `postgresql.conf` to configure the secondary for streaming replication
(for Debian/Ubuntu that would be `/etc/postgresql/9.x/main/postgresql.conf`): (for Debian/Ubuntu that would be `/etc/postgresql/9.*/main/postgresql.conf`):
```bash ```bash
wal_level = hot_standby wal_level = hot_standby
...@@ -162,7 +170,61 @@ The following guide assumes that: ...@@ -162,7 +170,61 @@ The following guide assumes that:
``` ```
1. Restart PostgreSQL for the changes to take effect. 1. Restart PostgreSQL for the changes to take effect.
1. Continue to [initiate the replication process](#step-3-initiate-the-replication-process).
1. Optional since GitLab 9.1, and required for GitLab 10.0 or higher:
[Enable tracking database on the secondary server](#enable-tracking-database-on-the-secondary-server)
1. Otherwise, continue to [initiate the replication process](#step-3-initiate-the-replication-process).
#### Enable tracking database on the secondary server
Geo secondary nodes use a tracking database to keep track of replication status and recover
automatically from some replication issues.
It was added in GitLab 9.1, and it is required since GitLab 10.0.
> **IMPORTANT:** For this feature to work correctly, all nodes must have
their clocks synchronized. It is not required for all nodes to be set to
the same time zone, but when the respective times are converted to UTC time,
the clocks must be synchronized to within 60 seconds of each other.
1. Set up a clock synchronization service in your Linux distro.
This can easily be done via any NTP-compatible daemon. For example,
here are [instructions for setting up NTP with Ubuntu](https://help.ubuntu.com/lts/serverguide/NTP.html).
1. Create `database_geo.yml` with the information of your secondary PostgreSQL
database. Note that GitLab will set up another database instance separate
from the primary, since this is where the secondary will track its internal
state:
```
sudo cp /home/git/gitlab/config/database_geo.yml.postgresql /home/git/gitlab/config/database_geo.yml
```
1. Edit the content of `database_geo.yml` in `production:` like the example below:
```yaml
#
# PRODUCTION
#
production:
adapter: postgresql
encoding: unicode
database: gitlabhq_geo_production
pool: 10
username: gitlab_geo
# password:
host: /var/opt/gitlab/geo-postgresql
```
1. Create the database `gitlabhq_geo_production` in that PostgreSQL
instance.
1. Set up the Geo tracking database:
```
bundle exec rake geo:db:migrate
```
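The 60-second clock requirement noted above can be sanity-checked with a small script. The following is a minimal sketch, not part of GitLab: the helper name `clock_skew_ok` and the sample timestamps are hypothetical, and it assumes you have already collected each node's UTC epoch time (for example with `date -u +%s` on each node).

```bash
#!/bin/sh
# Hypothetical helper, not part of GitLab: succeeds when two UTC epoch
# timestamps differ by no more than max_skew seconds (default 60).
clock_skew_ok() {
  a="$1"
  b="$2"
  max_skew="${3:-60}"
  diff=$(( a - b ))
  # Take the absolute value of the difference.
  if [ "$diff" -lt 0 ]; then
    diff=$(( 0 - diff ))
  fi
  [ "$diff" -le "$max_skew" ]
}

# Made-up timestamps for illustration; in practice, collect each value
# with `date -u +%s` on the node in question.
primary_time=1500000000
secondary_time=1500000042

if clock_skew_ok "$primary_time" "$secondary_time"; then
  echo "clocks within 60 seconds"
else
  echo "clock skew too large"
fi
```

Comparing epoch seconds sidesteps time zones entirely, which matches the requirement that only the UTC-converted times need to agree.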
### Step 3. Initiate the replication process ### Step 3. Initiate the replication process
......
# GitLab Geo Disaster Recovery # GitLab Geo Disaster Recovery
> **Note:** > **Note:**
This is not officially supported yet, please don't use as your only GitLab Geo Disaster Recovery is in **Alpha** development. Please don't
Disaster Recovery strategy as you may lose data. use as your only Disaster Recovery strategy as you may lose data.
GitLab Geo replicates your database and your Git repositories. We will GitLab Geo replicates your database and your Git repositories. We will
support and replicate more data in the future, that will enable you to support and replicate more data in the future, that will enable you to
......
...@@ -25,12 +25,12 @@ primary node. ...@@ -25,12 +25,12 @@ primary node.
## How long does it take to have a commit replicated to a secondary node? ## How long does it take to have a commit replicated to a secondary node?
All replication operations are asynchronous and are queued to be dispatched in All replication operations are asynchronous and are queued to be dispatched in
a batched request every 10 seconds. Besides that, it depends on a lot of other a batched request every 10 minutes. Besides that, it depends on a lot of other
factors including the amount of traffic, how big your commit is, the factors including the amount of traffic, how big your commit is, the
connectivity between your nodes, your hardware, etc. connectivity between your nodes, your hardware, etc.
## What happens if the SSH server runs at a different port? ## What if the SSH server runs at a different port?
We send the clone url from the primary server to any secondaries, so it We send the clone url from the primary server to any secondaries, so it
doesn't matter. If primary is running on port `2200` clone url will reflect doesn't matter. If primary is running on port `2200`, clone url will reflect
that. that.
...@@ -6,6 +6,9 @@ single source of truth, Geo needs to be configured to perform SSH fingerprint ...@@ -6,6 +6,9 @@ single source of truth, Geo needs to be configured to perform SSH fingerprint
lookups via database lookup. This approach is also much faster than scanning a lookups via database lookup. This approach is also much faster than scanning a
file. file.
>**Note:**
GitLab 10.0 and higher require database lookups for SSH keys.
Note this feature is only available on operating systems that support OpenSSH Note this feature is only available on operating systems that support OpenSSH
6.9 and above. For CentOS 6 and 7, see the [instructions on building custom 6.9 and above. For CentOS 6 and 7, see the [instructions on building custom
version of OpenSSH for your server] version of OpenSSH for your server]
......
...@@ -10,7 +10,9 @@ all you need to do is update GitLab itself: ...@@ -10,7 +10,9 @@ all you need to do is update GitLab itself:
1. Log into each node (primary and secondaries) 1. Log into each node (primary and secondaries)
1. [Update GitLab][update] 1. [Update GitLab][update]
1. Test primary and secondary nodes, and check version in each. 1. [Update tracking database on secondary node](#update-tracking-database-on-secondary-node) when
the tracking database is enabled.
1. [Test](#check-status-after-updating) primary and secondary nodes, and check version in each.
## Special update notes for 9.0.x ## Special update notes for 9.0.x
...@@ -149,68 +151,18 @@ everything is working correctly: ...@@ -149,68 +151,18 @@ everything is working correctly:
1. Test the data replication by pushing code to the primary and see if it 1. Test the data replication by pushing code to the primary and see if it
is received by the secondaries is received by the secondaries
## Enable tracking database ## Update tracking database on secondary node
NOTE: This step is required only if you want to enable the new Disaster After updating a secondary node, you might need to run migrations on
Recovery feature in Alpha shipped in GitLab 9.0. the tracking database. The tracking database was added in GitLab 9.1,
and it is required since 10.0.
Geo secondary nodes now can keep track of replication status and recover 1. Run database migrations on tracking database
automatically from some replication issues. To get this feature enabled,
you need to activate the Tracking Database.
> **IMPORTANT:** For this feature to work correctly, all nodes must be
with their clocks synchronized. It is not required for all nodes to be set to
the same time zone, but when the respective times are converted to UTC time,
the clocks must be synchronized to within 60 seconds of each other.
1. Setup clock synchronization service in your Linux distro.
This can easily be done via any NTP-compatible daemon. For example,
here are [instructions for setting up NTP with Ubuntu](https://help.ubuntu.com/lts/serverguide/NTP.html).
1. Edit `/etc/gitlab/gitlab.rb`:
```
geo_postgresql['enable'] = true
```
1. Create `database_geo.yml` with the information of your secondary PostgreSQL
database. Note that GitLab will set up another database instance separate
from the primary, since this is where the secondary will track its internal
state:
```
sudo cp /opt/gitlab/embedded/service/gitlab-rails/config/database_geo.yml.postgresql /opt/gitlab/embedded/service/gitlab-rails/config/database_geo.yml
```
1. Edit the content of `database_geo.yml` in `production:` like the example below:
```yaml
#
# PRODUCTION
#
production:
adapter: postgresql
encoding: unicode
database: gitlabhq_geo_production
pool: 10
username: gitlab_geo
# password:
host: /var/opt/gitlab/geo-postgresql
port: 5431
```
1. Reconfigure GitLab:
```
sudo gitlab-ctl start
sudo gitlab-ctl reconfigure
```
1. Set up the Geo tracking database:
``` ```
sudo gitlab-rake geo:db:migrate sudo gitlab-rake geo:db:migrate
``` ```
1. Repeat this step for every secondary node
[update]: ../update/README.md [update]: ../update/README.md