Commit 027c60bd authored by James Ramsay

Separate Disaster Recovery docs from Geo docs

Move the disaster recovery documentation into the administration docs
to separate it from the geo documentation because they are separate
features.
parent 40ae4b80
## Bring a demoted primary back online
After a fail-over, it is possible to fail back to the demoted primary to restore your original configuration.
This process consists of two steps: making the old primary a secondary, and promoting that secondary back to a primary.
### Configure the former primary to be a secondary
Since the former primary will be out of sync with the current primary, the first
step is to bring the former primary up to date. There is one downside: any uploads and repositories
that were deleted on the current primary while the former primary was offline will not be deleted from its disk, but the overall sync will be much faster. As an alternative, you can set up a [GitLab instance from scratch](https://docs.gitlab.com/ee/gitlab-geo/#setup-instructions) to work around this downside.
1. SSH into the former primary that has fallen behind.
1. Make sure all the services are up by running the command:
```bash
sudo gitlab-ctl start
```
Note: If you [disabled the primary permanently](index.md#step-2-permanently-disable-the-primary), you need to undo those steps now. For Debian/Ubuntu you just need to run `sudo systemctl enable gitlab-runsvdir`. For CentOS 6, you need to install a GitLab instance from scratch and set it up as a secondary node by following the [setup instructions](../../gitlab-geo/README.md#setup-instructions). In this case you don't need the step below.
1. [Set up database replication](../../gitlab-geo/database.md). In this documentation, primary
refers to the current primary, and secondary refers to the former primary.
If you have lost your original primary, follow the
[setup instructions](../../gitlab-geo/README.md#setup-instructions) to set up a new secondary.
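Once replication is set up, you can verify that the former primary is working correctly as a secondary before planning the fail-over. A minimal sketch, assuming an Omnibus installation that ships the Geo check Rake task:
```bash
# Run on the former primary (now a secondary) to verify its Geo
# configuration and replication health; output lists any failing checks.
sudo gitlab-rake gitlab:geo:check
```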
### Promote the secondary to primary
When initial replication is complete and the primary and secondary are closely in sync, you can perform a [planned fail-over](planned-fail-over.md).
### Restore the secondary node
If your objective is to have two nodes again, you need to bring your secondary node back online as well by repeating the first step ([Configure the former primary to be a secondary](#configure-the-former-primary-to-be-a-secondary)) for the secondary node.
# Disaster Recovery
> **Note:** Disaster Recovery for multi-secondary configurations is in
> **Alpha** development. Do not use this as your only Disaster Recovery
> strategy as you may lose data.
GitLab Geo replicates your database and your Git repositories. We will
support and replicate more data in the future, which will enable you to
fail over with minimal effort in a disaster situation.
See [Geo current limitations](../../gitlab-geo/README.md#current-limitations)
for more information.
## Promoting a secondary Geo replica in a single-secondary configuration
We don't currently provide an automated way to promote a Geo replica and do a
fail-over, but you can do it manually if you have `root` access to the machine.
This process promotes a secondary Geo replica to a primary. To regain
geographical redundancy as quickly as possible, you should add a new secondary
immediately after following these instructions.
### Step 1. Allow replication to finish if possible
If the secondary is still replicating data from the primary, follow
[the Planned Failover doc](planned-fail-over.md) as closely as possible in
order to avoid unnecessary data loss.
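To gauge how close the secondary is to being fully synced, you can check its replication status from the command line. A sketch, assuming your GitLab version provides the `geo:status` Rake task; otherwise use Admin Area > Geo Nodes:
```bash
# Run on the secondary to print replication progress for repositories,
# LFS objects, attachments, and other replicated data.
sudo gitlab-rake geo:status
```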
### Step 2. Permanently disable the primary
**Warning: If a primary goes offline, there may be data saved on the primary
that has not been replicated to the secondary. This data should be treated
as lost if you proceed.**
If an outage happens on your primary, you should do everything possible to
avoid a split-brain situation where writes can occur to two different GitLab
instances, complicating recovery efforts. To prepare for the fail-over, you
must first disable the primary.
1. SSH into your **primary** to stop and disable GitLab, if possible.
```bash
sudo gitlab-ctl stop
```
Prevent GitLab from starting up again if the server unexpectedly reboots:
```bash
sudo systemctl disable gitlab-runsvdir
```
On some operating systems, such as CentOS 6, there is no easy way to prevent
GitLab from starting if the machine reboots
(see [Omnibus issue #3058](https://gitlab.com/gitlab-org/omnibus-gitlab/issues/3058)).
It may be safest to uninstall the GitLab package completely:
```bash
yum remove gitlab-ee
```
1. If you do not have SSH access to your primary, take the machine offline and
prevent it from rebooting by any means at your disposal.
Since there are many ways you may prefer to accomplish this, we will avoid a
single recommendation (one possible approach is sketched after this list). You may need to:
* Reconfigure load balancers
* Change DNS records (e.g. point the primary DNS record to the secondary node in order to stop usage of the primary)
* Stop virtual servers
* Block traffic through a firewall
* Revoke object storage permissions from the primary
* Physically disconnect a machine
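For example, if the primary runs on AWS, one option is to revoke the inbound rules on its security group. A sketch, assuming the AWS CLI is configured and that `sg-0123456789abcdef0` is a placeholder for the primary's security group:
```bash
# Revoke public inbound HTTPS and SSH on the old primary's security
# group so no reads or writes can reach it (group ID is hypothetical).
aws ec2 revoke-security-group-ingress --group-id sg-0123456789abcdef0 \
  --protocol tcp --port 443 --cidr 0.0.0.0/0
aws ec2 revoke-security-group-ingress --group-id sg-0123456789abcdef0 \
  --protocol tcp --port 22 --cidr 0.0.0.0/0
```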
### Step 3. Promoting a secondary Geo replica
1. SSH into your **secondary** and log in as root:
```bash
sudo -i
```
1. Edit `/etc/gitlab/gitlab.rb` to reflect its new status as primary.
Remove the following line:
```ruby
## REMOVE THIS LINE
geo_secondary_role['enable'] = true
```
A new secondary should not be added at this time. If you want to add a new
secondary, do this after you have completed the entire process of promoting
the secondary to the primary.
1. Promote the secondary to primary. Execute:
```bash
gitlab-ctl promote-to-primary-node
```
1. Verify you can connect to the newly promoted primary using the URL used
previously for the secondary (a connectivity sketch follows this list).
1. Success! The secondary has now been promoted to primary.
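As a quick connectivity check, you can request the sign-in page and confirm a successful response. A minimal sketch; the hostname is a placeholder for the URL previously used by the secondary:
```bash
# Expect "200" from the newly promoted primary (replace the hostname
# with the URL previously used for the secondary).
curl --silent --output /dev/null --write-out "%{http_code}\n" \
  https://secondary.example.com/users/sign_in
```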
### Step 4. (Optional) Updating the primary domain's DNS record
Updating the DNS records for the primary domain to point to the secondary
avoids having to update every reference to the primary domain, such as Git
remotes and API URLs, to the secondary domain.
1. SSH into your **secondary** and log in as root:
```bash
sudo -i
```
1. Update the primary domain's DNS record.
After updating the primary domain's DNS records to point to the secondary,
edit `/etc/gitlab/gitlab.rb` on the secondary to reflect the new URL:
```ruby
# Change the existing external_url configuration
external_url 'https://gitlab.example.com'
```
1. Reconfigure the secondary node for the change to take effect:
```bash
gitlab-ctl reconfigure
```
1. Execute the command below to update the newly promoted primary node URL:
```bash
gitlab-rake geo:update_primary_node_url
```
This command will use the changed `external_url` configuration defined
in `/etc/gitlab/gitlab.rb`.
1. Verify you can connect to the newly promoted primary using the primary URL.
If you updated the DNS records for the primary domain, these changes may
not have propagated yet, depending on the previous DNS records' TTL.
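You can check what a resolver currently returns for the primary domain with `dig`. A sketch, assuming `gitlab.example.com` is the primary domain:
```bash
# Show the address the primary domain currently resolves to, plus the
# remaining TTL; compare the address against the secondary's.
dig +noall +answer gitlab.example.com
```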
### Step 5. (Optional) Add secondary Geo replicas to a promoted primary
Promoting a secondary to primary using the process above does not enable
GitLab Geo on the new primary.
To bring a new secondary online, follow the
[Geo setup instructions](../../gitlab-geo/README.md#setup-instructions).
## Promoting a secondary Geo replica in multi-secondary configurations
Disaster Recovery does not yet support systems with multiple
secondary Geo replicas (e.g. one primary and two or more secondaries). We are
working on it, see [#4284](https://gitlab.com/gitlab-org/gitlab-ee/issues/4284)
for details.
## Troubleshooting
### I followed the disaster recovery instructions and now two-factor auth is broken!
The setup instructions for Geo prior to 10.5 failed to replicate the
`otp_key_base` secret, which is used to encrypt the two-factor authentication
secrets stored in the database. If it differs between primary and secondary
nodes, users with two-factor authentication enabled won't be able to log in
after a fail-over.
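To confirm whether this is the problem, compare the secret on both nodes before deciding how to proceed. A sketch, assuming Omnibus installations, where secrets are stored in `/etc/gitlab/gitlab-secrets.json`:
```bash
# Run on both nodes and compare the output; differing values confirm
# the otp_key_base mismatch described above.
sudo grep otp_key_base /etc/gitlab/gitlab-secrets.json
```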
If you still have access to the old primary node, you can follow the
instructions in the
[Upgrading to GitLab 10.5](../../gitlab-geo/updating_the_geo_nodes.md#upgrading-to-gitlab-105)
section to resolve the error. Otherwise, the secret is lost and you'll need to
[reset two-factor authentication for all users](../../security/two_factor_authentication.md#disabling-2fa-for-everyone).
# Disaster Recovery for Planned Fail-Over
A planned fail-over is similar to a disaster recovery scenario, except that you
are able to notify users of the maintenance window and allow data to finish
replicating to secondaries.
Please read this entire document as well as [Disaster Recovery](index.md)
before proceeding.
### Notify users of scheduled maintenance
1. On the primary, in Admin Area > Messages, add a broadcast message.
Check Admin Area > Geo Nodes to estimate how long it will take to finish syncing.
```
We are doing scheduled maintenance at XX:XX UTC, expected to take less than 1 hour.
```
1. On the secondary, you may need to clear the cache for the broadcast message to show up.
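A hedged sketch of clearing the cache on the secondary, using the standard Omnibus Rake task:
```bash
# Clear the Rails cache so the new broadcast message is picked up.
sudo gitlab-rake cache:clear
```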
### Block primary traffic
1. At the scheduled time, using your cloud provider or your node's firewall, block HTTP and SSH traffic to/from the primary except for your IP and the secondary's IP.
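With `iptables` on the primary node, this might look like the following sketch; both addresses are placeholders for your own IP and the secondary's IP:
```bash
# Allow the administrator's IP and the secondary's IP (placeholders),
# then drop all other inbound SSH, HTTP, and HTTPS traffic.
iptables -A INPUT -p tcp -s 203.0.113.5 -m multiport --dports 22,80,443 -j ACCEPT
iptables -A INPUT -p tcp -s 203.0.113.6 -m multiport --dports 22,80,443 -j ACCEPT
iptables -A INPUT -p tcp -m multiport --dports 22,80,443 -j DROP
```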
### Allow replication to finish as much as possible
1. On the secondary, navigate to Admin Area > Geo Nodes and wait until all replication progress is 100% on the secondary "Current node".
1. Navigate to Admin Area > Monitoring > Background Jobs > Queues and wait until the "geo" queues drop, ideally to 0.
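You can also inspect the queues from the command line on the secondary. A sketch using Sidekiq's public API via `gitlab-rails runner`; exact queue names vary between GitLab versions:
```bash
# Print the number of jobs remaining in each Sidekiq queue whose
# name starts with "geo".
sudo gitlab-rails runner '
  Sidekiq::Queue.all
    .select { |q| q.name.start_with?("geo") }
    .each { |q| puts "#{q.name}: #{q.size}" }
'
```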
### Promote the secondary
1. Finally, follow [Disaster Recovery](index.md) to promote the secondary to a primary.
......@@ -20,7 +20,8 @@ Learn how to install, configure, update, and maintain your GitLab instance.
- **(Starter/Premium)** [Omnibus support for log forwarding](https://docs.gitlab.com/omnibus/settings/logs.html#udp-log-shipping-gitlab-enterprise-edition-only)
- [High Availability](high_availability/README.md): Configure multiple servers for scaling or high availability.
- [High Availability on AWS](../university/high-availability/aws/README.md): Set up GitLab HA on Amazon AWS.
- **(Premium)** [GitLab Geo](../gitlab-geo/README.md): Replicate your GitLab instance to other geographical locations as a read-only fully operational version.
- **(Premium)** [Geo](../gitlab-geo/README.md): Replicate your GitLab instance to other geographical locations as a read-only fully operational version.
- **(Premium)** [Disaster Recovery](disaster_recovery/index.md): Quickly fail-over to a different site with minimal effort in a disaster situation.
- **(Premium)** [Pivotal Tile](../install/pivotal/index.md): Deploy GitLab as a pre-configured appliance using Ops Manager (BOSH) for Pivotal Cloud Foundry.
### Configuring GitLab
......
......@@ -51,7 +51,9 @@ to reading any data available in the GitLab web interface (see [current limitati
improving speed for distributed teams
- Helps reduce the loading time for automated tasks,
custom integrations and internal workflows
- A Geo secondary can be promoted to become the primary in a [Disaster Recovery](disaster-recovery.md) scenario
- Quickly fail-over to a Geo secondary in a
[Disaster Recovery](../administration/disaster_recovery/index.md) scenario
- Allows [planned fail-over](../administration/disaster_recovery/planned-fail-over.md) to a Geo secondary
## Architecture
......@@ -191,10 +193,6 @@ Read through the [Geo High Availability documentation](ha.md).
When you have object storage enabled, please consult the
[Geo with Object Storage](object_storage.md) documentation.
## Restore demoted primary geo node
Read how to [Bring a demoted primary back](bring-primary-back.md)
## Replicating the Container Registry
Read how to [replicate the Container Registry](docker_registry.md).
......
## Bring a demoted primary back online
After a failover, it is possible to fail back to the demoted primary to restore your original configuration.
This process consists of two steps: making the old primary a secondary, and promoting that secondary back to a primary.
### Configure the former primary to be a secondary
Since the former primary will be out of sync with the current primary, the first
step is to bring the former primary up to date. There is one downside: any uploads and repositories
that were deleted on the current primary while the former primary was offline will not be deleted from its disk, but the overall sync will be much faster. As an alternative, you can set up a [GitLab instance from scratch](https://docs.gitlab.com/ee/gitlab-geo/#setup-instructions) to work around this downside.
1. SSH into the former primary that has fallen behind.
1. Make sure all the services are up by running the command:
```bash
sudo gitlab-ctl start
```
Note: If you [disabled the primary permanently](https://docs.gitlab.com/ee/gitlab-geo/disaster-recovery.html#step-2-permanently-disable-the-primary), you need to undo those steps now. For Debian/Ubuntu you just need to run `sudo systemctl enable gitlab-runsvdir`. For CentOS 6, you need to install a GitLab instance from scratch and set it up as a secondary node by following the [setup instructions](https://docs.gitlab.com/ee/gitlab-geo/#setup-instructions). In this case you don't need the step below.
1. [Set up the database replication](database.md). In this documentation, primary
refers to the current primary, and secondary refers to the former primary.
If you have lost your original primary, follow the
[setup instructions](README.md#setup-instructions) to set up a new secondary.
### Promote the secondary to primary
When initial replication is complete and the primary and secondary are closely in sync, you can perform a [planned failover](planned-failover.md).
### Restore the secondary node
If your objective is to have two nodes again, you need to bring your secondary node back online as well by repeating the first step ([Configure the former primary to be a secondary](#configure-the-former-primary-to-be-a-secondary)) for the secondary node.
This document was moved to [another location](../administration/disaster_recovery/bring-primary-back.md).
......@@ -90,7 +90,7 @@ with the same credentials as used in the primary.
GitLab integrates with the system-installed SSH daemon, designating a user
(typically named git) through which all access requests are handled.
In a [Disaster Recovery](disaster-recovery.md) situation, GitLab system
In a [Disaster Recovery](../administration/disaster_recovery/index.md) situation, GitLab system
administrators will promote a secondary Geo replica to a primary and they can
update the DNS records for the primary domain to point to the secondary to prevent
the need to update all references to the primary domain to the secondary domain,
......
# GitLab Geo Disaster Recovery
> **Note:** Disaster Recovery is in **Alpha** development. Do not use this as
> your only Disaster Recovery strategy as you may lose data.
GitLab Geo replicates your database and your Git repositories. We will
support and replicate more data in the future, which will enable you to
fail over with minimal effort in a disaster situation.
See [current limitations](README.md#current-limitations) for more information.
## Promoting a secondary Geo replica in a single-secondary configuration
We don't currently provide an automated way to promote a Geo replica and do a
fail-over, but you can do it manually if you have `root` access to the machine.
This process promotes a secondary Geo replica to a primary. To regain
geographical redundancy as quickly as possible, you should add a new secondary
immediately after following these instructions.
### Step 1. Allow replication to finish if possible
If the secondary is still replicating data from the primary, follow
[the Planned Failover doc](planned-failover.md) as closely as possible in
order to avoid unnecessary data loss.
### Step 2. Permanently disable the primary
**Warning: If a primary goes offline, there may be data saved on the primary
that has not been replicated to the secondary. This data should be treated
as lost if you proceed.**
If an outage happens on your primary, you should do everything possible to
avoid a split-brain situation where writes can occur to two different GitLab
instances, complicating recovery efforts. To prepare for the failover, you
must first disable the primary.
1. SSH into your **primary** to stop and disable GitLab, if possible.
```bash
sudo gitlab-ctl stop
```
Prevent GitLab from starting up again if the server unexpectedly reboots:
```bash
sudo systemctl disable gitlab-runsvdir
```
On some operating systems, such as CentOS 6, there is no easy way to prevent
GitLab from starting if the machine reboots
(see [Omnibus issue #3058](https://gitlab.com/gitlab-org/omnibus-gitlab/issues/3058)).
It may be safest to uninstall the GitLab package completely:
```bash
yum remove gitlab-ee
```
1. If you do not have SSH access to your primary, take the machine offline and
prevent it from rebooting by any means at your disposal.
Since there are many ways you may prefer to accomplish this, we will avoid a
single recommendation. You may need to:
* Reconfigure load balancers
* Change DNS records (e.g. point the primary DNS record to the secondary node in order to stop usage of the primary)
* Stop virtual servers
* Block traffic through a firewall
* Revoke object storage permissions from the primary
* Physically disconnect a machine
### Step 3. Promoting a secondary Geo replica
1. SSH into your **secondary** and log in as root:
```bash
sudo -i
```
1. Edit `/etc/gitlab/gitlab.rb` to reflect its new status as primary.
Remove the following line:
```ruby
## REMOVE THIS LINE
geo_secondary_role['enable'] = true
```
A new secondary should not be added at this time. If you want to add a new
secondary, do this after you have completed the entire process of promoting
the secondary to the primary.
1. Promote the secondary to primary. Execute:
```bash
gitlab-ctl promote-to-primary-node
```
1. Verify you can connect to the newly promoted primary using the URL used
previously for the secondary.
1. Success! The secondary has now been promoted to primary.
### Step 4. (Optional) Updating the primary domain's DNS record
Updating the DNS records for the primary domain to point to the secondary
avoids having to update every reference to the primary domain, such as Git
remotes and API URLs, to the secondary domain.
1. SSH into your **secondary** and log in as root:
```bash
sudo -i
```
1. Update the primary domain's DNS record.
After updating the primary domain's DNS records to point to the secondary,
edit `/etc/gitlab/gitlab.rb` on the secondary to reflect the new URL:
```ruby
# Change the existing external_url configuration
external_url 'https://gitlab.example.com'
```
1. Reconfigure the secondary node for the change to take effect:
```bash
gitlab-ctl reconfigure
```
1. Execute the command below to update the newly promoted primary node URL:
```bash
gitlab-rake geo:update_primary_node_url
```
This command will use the changed `external_url` configuration defined
in `/etc/gitlab/gitlab.rb`.
1. Verify you can connect to the newly promoted primary using the primary URL.
If you updated the DNS records for the primary domain, these changes may
not have propagated yet, depending on the previous DNS records' TTL.
### Step 5. (Optional) Add secondary Geo replicas to a promoted primary
Promoting a secondary to primary using the process above does not enable
GitLab Geo on the new primary.
To bring a new secondary online, follow the [GitLab Geo setup instructions](README.md#setup-instructions).
## Promoting a secondary Geo replica in multi-secondary configurations
Disaster Recovery does not yet support systems with multiple
secondary Geo replicas (e.g. one primary and two or more secondaries). We are
working on it, see [#4284](https://gitlab.com/gitlab-org/gitlab-ee/issues/4284)
for details.
This document was moved to [another location](../administration/disaster_recovery/index.md).
......@@ -2,26 +2,10 @@
## Can I use Geo in a disaster recovery situation?
There are limitations to what we replicate (see
Yes, but there are limitations to what we replicate (see
[What data is replicated to a secondary node?](#what-data-is-replicated-to-a-secondary-node)).
In an extreme data-loss situation you can make a secondary Geo into your
primary, but this is not officially supported yet.
If you still want to proceed, see our step-by-step instructions on how to
manually [promote a secondary node](disaster-recovery.md) into primary.
## I followed the disaster recovery instructions and now two-factor auth is broken!
The setup instructions for GitLab Geo prior to 10.5 failed to replicate the
`otp_key_base` secret, which is used to encrypt the two-factor authentication
secrets stored in the database. If it differs between primary and secondary
nodes, users with two-factor authentication enabled won't be able to log in
after a DR failover.
If you still have access to the old primary node, you can follow the
instructions in the [Upgrading to GitLab 10.5](updating_the_geo_nodes.md#upgrading-to-gitlab-105)
section to resolve the error. Otherwise, the secret is lost and you'll need to
[reset two-factor authentication for all users](../security/two_factor_authentication.md#disabling-2fa-for-everyone).
Read the documentation for [Disaster Recovery](../administration/disaster_recovery/index.md).
## What data is replicated to a secondary node?
......
# GitLab Geo Planned Failover
A planned failover is similar to a disaster recovery scenario, except that you
are able to notify users of the maintenance window and allow data to finish
replicating to secondaries.
Please read this entire document as well as
[GitLab Geo Disaster Recovery](disaster-recovery.md) before proceeding.
### Notify users of scheduled maintenance
1. On the primary, in Admin Area > Messages, add a broadcast message.
Check Admin Area > Geo Nodes to estimate how long it will take to finish syncing.
```
We are doing scheduled maintenance at XX:XX UTC, expected to take less than 1 hour.
```
1. On the secondary, you may need to clear the cache for the broadcast message to show up.
### Block primary traffic
1. At the scheduled time, using your cloud provider or your node's firewall, block HTTP and SSH traffic to/from the primary except for your IP and the secondary's IP.
### Allow replication to finish as much as possible
1. On the secondary, navigate to Admin Area > Geo Nodes and wait until all replication progress is 100% on the secondary "Current node".
1. Navigate to Admin Area > Monitoring > Background Jobs > Queues and wait until the "geo" queues drop, ideally to 0.
### Promote the secondary
1. Finally, follow [GitLab Geo Disaster Recovery](disaster-recovery.md) to promote the secondary to a primary.
This document was moved to [another location](../administration/disaster_recovery/planned-fail-over.md).