Commit f8853569 authored by Michael Kozono, committed by Nick Thomas

Geo: Minor improvements to Disaster Recovery and Planned Failover docs

parent e61cc7c4
@@ -79,6 +79,10 @@ must disable the primary.
- Revoke object storage permissions from the primary
- Physically disconnect a machine
1. If you plan to
[update the primary domain DNS record](#step-4-optional-updating-the-primary-domain-dns-record),
you may wish to lower the TTL now to speed up propagation.
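Before lowering the TTL, it can help to see what value resolvers are currently caching. The following is a minimal sketch, assuming `dig` is installed; `answer_ttl` is a hypothetical helper name, and the answer line in the offline demonstration is made up (it uses a documentation-range IP address).

```bash
# Extract the TTL (second column) from the first record in
# `dig +noall +answer` output, which has the form:
#   NAME  TTL  CLASS  TYPE  DATA
answer_ttl() {
  awk 'NF >= 5 { print $2; exit }'
}

# Against a live resolver you might run (hostname is a placeholder):
#   dig +noall +answer gitlab.example.com A | answer_ttl

# Offline demonstration with a made-up answer line:
printf 'gitlab.example.com.\t3600\tIN\tA\t203.0.113.10\n' | answer_ttl
```

A resolver that has already cached the record may keep it for up to the old TTL, so lowering the value is most useful when done well ahead of the DNS change.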
### Step 3. Promoting a secondary Geo replica
1. SSH in to your **secondary** and log in as root:
@@ -146,6 +150,10 @@ secondary domain, like changing Git remotes and API URLs.
external_url 'https://gitlab.example.com'
```
NOTE: **Note**
Changing `external_url` won't prevent access via the old secondary URL, as
long as the secondary DNS records are still intact.
1. Reconfigure the secondary node for the change to take effect:
```bash
......
@@ -187,38 +187,45 @@ Until a [read-only mode][ce-19739] is implemented, updates must be prevented
from happening manually. Note that your **secondary** still needs read-only
access to the primary for the duration of the maintenance window.
1. At the scheduled time, using your cloud provider or your node's firewall, block
all HTTP, HTTPS and SSH traffic to/from the primary, **except** for your IP and
the secondary's IP.
For instance, if your secondary originates all its traffic from `5.6.7.8` and
your IP is `100.0.0.1`, you might run the following commands on the server(s)
making up your primary node:
```
sudo iptables -A INPUT -p tcp -s 5.6.7.8 --destination-port 22 -j ACCEPT
sudo iptables -A INPUT -p tcp -s 100.0.0.1 --destination-port 22 -j ACCEPT
sudo iptables -A INPUT -p tcp --destination-port 22 -j REJECT
sudo iptables -A INPUT -p tcp -s 5.6.7.8 --destination-port 80 -j ACCEPT
sudo iptables -A INPUT -p tcp -s 100.0.0.1 --destination-port 80 -j ACCEPT
sudo iptables -A INPUT -p tcp --destination-port 80 -j REJECT
sudo iptables -A INPUT -p tcp -s 5.6.7.8 --destination-port 443 -j ACCEPT
sudo iptables -A INPUT -p tcp -s 100.0.0.1 --destination-port 443 -j ACCEPT
sudo iptables -A INPUT -p tcp --destination-port 443 -j REJECT
```
From this point, users will be unable to view their data or make changes on the
**primary** node. They will also be unable to log in to the **secondary** node,
but existing sessions will work for the remainder of the maintenance period, and
public data will be accessible throughout.
1. Verify the primary is blocked to HTTP traffic by visiting it in a browser via
another IP. The server should refuse the connection.
1. Verify the primary is blocked to Git over SSH traffic by attempting to pull an
existing Git repository with an SSH remote URL. The server should refuse the
connection.
1. Disable non-Geo periodic background jobs on the primary node by navigating
to **Admin Area ➔ Monitoring ➔ Background Jobs ➔ Cron**, pressing `Disable All`,
and then pressing `Enable` for the `geo_sidekiq_cron_config_worker` cron job.
This job will re-enable several other cron jobs that are essential for planned
failover to complete successfully.
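The firewall step above follows one pattern per port: accept each allowed source IP, then reject everything else. The sketch below generalizes that pattern under a few assumptions: `print_block_rules` is a hypothetical helper name, and it only prints the commands so they can be reviewed before being run with `sudo`. It passes `-p tcp` on the REJECT rule as well, because iptables only accepts `--destination-port` together with a protocol match.

```bash
# Print (do not execute) iptables rules that let only the given source IPs
# reach each listed TCP port, and reject all other traffic to that port.
# Usage: print_block_rules "PORT [PORT...]" IP [IP...]
print_block_rules() {
  pbr_ports=$1
  shift
  for pbr_port in $pbr_ports; do
    # One ACCEPT rule per allowed source address.
    for pbr_ip in "$@"; do
      echo "iptables -A INPUT -p tcp -s $pbr_ip --destination-port $pbr_port -j ACCEPT"
    done
    # Final REJECT rule for everyone else on this port.
    echo "iptables -A INPUT -p tcp --destination-port $pbr_port -j REJECT"
  done
}

# Example with the addresses used above: secondary 5.6.7.8, your IP 100.0.0.1.
print_block_rules "22 80 443" 5.6.7.8 100.0.0.1
```

Because the rules are appended with `-A`, order matters: the ACCEPT rules for a port must be added before its REJECT rule, which is why the helper emits them in that sequence.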
## Finish replicating and verifying all data
@@ -230,7 +237,6 @@ failover to complete successfully.
before it is completed will cause the work to be lost!
1. On the **primary**, navigate to **Admin Area ➔ Geo Nodes** and wait for the
following conditions to be true of the **secondary** you are failing over to:
* All replication meters reach 100% replicated, 0% failures
* All verification meters reach 100% verified, 0% failures
* Database replication lag is 0ms
@@ -256,6 +262,8 @@ begin to diverge from the old one. If problems do arise at this point, failing
back to the old primary [is possible][bring-primary-back], but likely to result
in the loss of any data uploaded to the new primary in the meantime.
Don't forget to remove the broadcast message after failover is complete.
[bring-primary-back]: bring_primary_back.md
[ce-19739]: https://gitlab.com/gitlab-org/gitlab-ce/issues/19739
[container-registry]: ../replication/container_registry.md
......