Commit 92ee313c authored by Douglas Barbosa Alexandre, committed by Marcel Amirault

Update docs to promote a Geo secondary site with the new single command

parent ac9b0aa0
@@ -5,27 +5,27 @@ info: To determine the technical writer assigned to the Stage/Group associated w
type: howto
---
# Bring a demoted primary site back online **(PREMIUM SELF)**
After a failover, it is possible to fail back to the demoted **primary** site to
restore your original configuration. This process consists of two steps:
1. Making the old **primary** site a **secondary** site.
1. Promoting a **secondary** site to a **primary** site.
WARNING:
If you have any doubts about the consistency of the data on this site, we recommend setting it up from scratch.
## Configure the former **primary** site to be a **secondary** site
Since the former **primary** site will be out of sync with the current **primary** site, the first step is to bring the former **primary** site up to date. Note that deletion of data stored on disk, like
repositories and uploads, is not replayed when bringing the former **primary** site back
into sync, which may result in increased disk usage.
Alternatively, you can [set up a new **secondary** GitLab instance](../setup/index.md) to avoid this.
To bring the former **primary** site up to date:
1. SSH into the former **primary** site that has fallen behind.
1. Make sure all the services are up:
```shell
@@ -33,36 +33,36 @@ To bring the former **primary** node up to date:
```
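The command itself is collapsed out of the diff hunk above. As a sketch, assuming an Omnibus installation managed by `gitlab-ctl`, bringing all services up looks like:

```shell
# Start all Omnibus-managed GitLab services (assumption: Omnibus installation)
sudo gitlab-ctl start
```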
NOTE:
If you [disabled the **primary** site permanently](index.md#step-2-permanently-disable-the-primary-site),
you need to undo those steps now. For Debian/Ubuntu you just need to run
`sudo systemctl enable gitlab-runsvdir`. For CentOS 6, you need to install
the GitLab instance from scratch and set it up as a **secondary** site by
following [Setup instructions](../setup/index.md). In this case, you don't need to follow the next step.
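As a sketch of undoing that permanent disable on Debian/Ubuntu (assuming systemd manages the `gitlab-runsvdir` unit):

```shell
# Re-enable the GitLab service supervisor at boot
sudo systemctl enable gitlab-runsvdir
# Start it now rather than waiting for a reboot (assumption: you want the services up immediately)
sudo systemctl start gitlab-runsvdir
```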
NOTE:
If you [changed the DNS records](index.md#step-4-optional-updating-the-primary-domain-dns-record)
for this site during the disaster recovery procedure, you may need to [block
all the writes to this site](planned_failover.md#prevent-updates-to-the-primary-node)
during this procedure.
1. [Set up database replication](../setup/database.md). In this case, the **secondary** site
refers to the former **primary** site.
1. If [PgBouncer](../../postgresql/pgbouncer.md) was enabled on the **current secondary** site
(when it was a primary site) disable it by editing `/etc/gitlab/gitlab.rb`
and running `sudo gitlab-ctl reconfigure`.
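A minimal sketch of that edit, assuming PgBouncer is managed by Omnibus on this site (adjust to your own topology):

```ruby
# /etc/gitlab/gitlab.rb on the former primary (now secondary) site
pgbouncer['enable'] = false
```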
1. You can then set up database replication on the **secondary** site.
If you have lost your original **primary** site, follow the
[setup instructions](../setup/index.md) to set up a new **secondary** site.
## Promote the **secondary** site to **primary** site
When the initial replication is complete and the **primary** site and **secondary** site are
closely in sync, you can do a [planned failover](planned_failover.md).
## Restore the **secondary** site
If your objective is to have two sites again, you need to bring your **secondary**
site back online as well by repeating the first step
([configure the former **primary** site to be a **secondary** site](#configure-the-former-primary-site-to-be-a-secondary-site))
for the **secondary** site.
@@ -16,36 +16,36 @@ For the latest updates, check the [Disaster Recovery epic for complete maturity]
Multi-secondary configurations require the complete re-synchronization and re-configuration of all non-promoted secondaries and
cause downtime.
## Promoting a **secondary** Geo site in single-secondary configurations
We don't currently provide an automated way to promote a Geo replica and do a
failover, but you can do it manually if you have `root` access to the machine.
This process promotes a **secondary** Geo site to a **primary** site. To regain
geographic redundancy as quickly as possible, you should add a new **secondary** site
immediately after following these instructions.
### Step 1. Allow replication to finish if possible
If the **secondary** site is still replicating data from the **primary** site, follow
[the planned failover docs](planned_failover.md) as closely as possible in
order to avoid unnecessary data loss.
### Step 2. Permanently disable the **primary** site
WARNING:
If the **primary** site goes offline, there may be data saved on the **primary** site
that has not been replicated to the **secondary** site. This data should be treated
as lost if you proceed.
If an outage on the **primary** site happens, you should do everything possible to
avoid a split-brain situation where writes can occur in two different GitLab
instances, complicating recovery efforts. So to prepare for the failover, we
must disable the **primary** site.
- If you have SSH access:
1. SSH into the **primary** site to stop and disable GitLab:
```shell
sudo gitlab-ctl stop
@@ -57,35 +57,35 @@ must disable the **primary** node.
sudo systemctl disable gitlab-runsvdir
```
- If you do not have SSH access to the **primary** site, take the machine offline and
prevent it from rebooting by any means at your disposal.
You might need to:
- Reconfigure the load balancers.
- Change DNS records (for example, point the primary DNS record to the
**secondary** site to stop usage of the **primary** site).
- Stop the virtual servers.
- Block traffic through a firewall.
- Revoke object storage permissions from the **primary** site.
- Physically disconnect a machine.
If you plan to [update the primary domain DNS record](#step-4-optional-updating-the-primary-domain-dns-record),
you may wish to lower the TTL now to speed up propagation.
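For example, you can check the record's current TTL with `dig` before lowering it (`gitlab.example.com` is a placeholder for your primary domain):

```shell
# The second field of the answer line is the remaining TTL in seconds
dig +noall +answer gitlab.example.com
```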
### Step 3. Promoting a **secondary** site
WARNING:
In GitLab 13.2 and 13.3, promoting a secondary site to a primary while the
secondary is paused fails. Do not pause replication before promoting a
secondary. If the secondary site is paused, be sure to resume before promoting.
This issue has been fixed in GitLab 13.4 and later.
Note the following when promoting a secondary:
- If replication was paused on the secondary site (for example as a part of
upgrading, while you were running a version of GitLab earlier than 13.4), you
_must_ [enable the site by using the database](../replication/troubleshooting.md#message-activerecordrecordinvalid-validation-failed-enabled-geo-primary-node-cannot-be-disabled)
before proceeding. If the secondary site
[has been paused](../../geo/index.md#pausing-and-resuming-replication), the promotion
performs a point-in-time recovery to the last known state.
Data that was created on the primary while the secondary was paused is lost.
@@ -99,7 +99,32 @@ Note the following when promoting a secondary:
for more information, see this
[troubleshooting advice](../replication/troubleshooting.md#errors-when-using---skip-preflight-checks-or---force).
#### Promoting a **secondary** site running on a single node running GitLab 14.5 and later
1. SSH in to your **secondary** node and execute:
- To promote the secondary node to primary:
```shell
sudo gitlab-ctl geo promote
```
- To promote the secondary node to primary **without any further confirmation**:
```shell
sudo gitlab-ctl geo promote --force
```
1. Verify you can connect to the newly-promoted **primary** site using the URL used
previously for the **secondary** site.
1. If successful, the **secondary** site is now promoted to the **primary** site.
#### Promoting a **secondary** site running on a single node running GitLab 14.4 and earlier
WARNING:
The `gitlab-ctl promote-to-primary-node` and `gitlab-ctl promote-db` commands are
deprecated in GitLab 14.5 and later, and are scheduled to [be removed in GitLab 15.0](https://gitlab.com/gitlab-org/gitlab/-/issues/345207).
Use `gitlab-ctl geo promote` instead.
1. SSH in to your **secondary** node and log in as root:
@@ -116,7 +141,7 @@ Note the following when promoting a secondary:
roles ['geo_secondary_role']
```
1. Promote the **secondary** site to the **primary** site:
- To promote the secondary node to primary along with [preflight checks](planned_failover.md#preflight-checks):
@@ -146,18 +171,57 @@ Note the following when promoting a secondary:
gitlab-ctl promote-to-primary-node --force
```
1. Verify you can connect to the newly-promoted **primary** site using the URL used
previously for the **secondary** site.
1. If successful, the **secondary** site is now promoted to the **primary** site.
#### Promoting a **secondary** site with multiple nodes running GitLab 14.5 and later
1. SSH to every Sidekiq, PostgreSQL, and Gitaly node in the **secondary** site and run one of the following commands:
- To promote the secondary node to primary:
```shell
sudo gitlab-ctl geo promote
```
- To promote the secondary node to primary **without any further confirmation**:
```shell
sudo gitlab-ctl geo promote --force
```
1. SSH into each Rails node on your **secondary** site and run one of the following commands:
- To promote the secondary node to primary:
```shell
sudo gitlab-ctl geo promote
```
- To promote the secondary node to primary **without any further confirmation**:
```shell
sudo gitlab-ctl geo promote --force
```
1. Verify you can connect to the newly-promoted **primary** site using the URL used
previously for the **secondary** site.
1. If successful, the **secondary** site is now promoted to the **primary** site.
#### Promoting a **secondary** site with multiple nodes running GitLab 14.4 and earlier
WARNING:
The `gitlab-ctl promote-to-primary-node` and `gitlab-ctl promote-db` commands are
deprecated in GitLab 14.5 and later, and are scheduled to [be removed in GitLab 15.0](https://gitlab.com/gitlab-org/gitlab/-/issues/345207).
Use `gitlab-ctl geo promote` instead.
The `gitlab-ctl promote-to-primary-node` command cannot be used yet in
conjunction with multiple nodes, as it can only perform changes on
a **secondary** with only a single node. Instead, you must
do this manually.
1. SSH in to the database node in the **secondary** site and trigger PostgreSQL to
promote to read-write:
```shell
@@ -187,16 +251,54 @@ do this manually.
1. Verify you can connect to the newly-promoted **primary** using the URL used
previously for the **secondary**.
1. If successful, the **secondary** site is now promoted to the **primary** site.
#### Promoting a **secondary** site with a Patroni standby cluster running GitLab 14.5 and later
1. SSH to every Sidekiq, PostgreSQL, and Gitaly node in the **secondary** site and run one of the following commands:
- To promote the secondary node to primary:
```shell
sudo gitlab-ctl geo promote
```
- To promote the secondary node to primary **without any further confirmation**:
```shell
sudo gitlab-ctl geo promote --force
```
1. SSH into each Rails node on your **secondary** site and run one of the following commands:
- To promote the secondary node to primary:
```shell
sudo gitlab-ctl geo promote
```
- To promote the secondary node to primary **without any further confirmation**:
```shell
sudo gitlab-ctl geo promote --force
```
1. Verify you can connect to the newly-promoted **primary** site using the URL used
previously for the **secondary** site.
1. If successful, the **secondary** site is now promoted to the **primary** site.
#### Promoting a **secondary** site with a Patroni standby cluster running GitLab 14.4 and earlier
WARNING:
The `gitlab-ctl promote-to-primary-node` and `gitlab-ctl promote-db` commands are
deprecated in GitLab 14.5 and later, and are scheduled to [be removed in GitLab 15.0](https://gitlab.com/gitlab-org/gitlab/-/issues/345207).
Use `gitlab-ctl geo promote` instead.
The `gitlab-ctl promote-to-primary-node` command cannot be used yet in
conjunction with a Patroni standby cluster, as it can only perform changes on
a **secondary** with only a single node. Instead, you must do this manually.
1. SSH in to the Standby Leader database node in the **secondary** site and trigger PostgreSQL to
promote to read-write:
```shell
@@ -230,9 +332,81 @@ do this manually.
1. Verify you can connect to the newly-promoted **primary** using the URL used
previously for the **secondary**.
1. If successful, the **secondary** site is now promoted to the **primary** site.
#### Promoting a **secondary** site with an external PostgreSQL database running GitLab 14.5 and later
The `gitlab-ctl geo promote` command can be used in conjunction with
an external PostgreSQL database, but it can only perform changes on
a **secondary** PostgreSQL database managed by Omnibus.
You must promote the replica database associated with the **secondary**
site first.
1. Promote the replica database associated with the **secondary** site. This
sets the database to read-write. The instructions vary depending on where your database is hosted:
- [Amazon RDS](https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/USER_ReadRepl.html#USER_ReadRepl.Promote)
- [Azure PostgreSQL](https://docs.microsoft.com/en-us/azure/postgresql/howto-read-replicas-portal#stop-replication)
- [Google Cloud SQL](https://cloud.google.com/sql/docs/mysql/replication/manage-replicas#promote-replica)
- For other external PostgreSQL databases, save the following script in your
secondary node, for example `/tmp/geo_promote.sh`, and modify the connection
parameters to match your environment. Then, execute it to promote the replica:
```shell
#!/bin/bash
PG_SUPERUSER=postgres
# The path to your pg_ctl binary. You may need to adjust this path to match
# your PostgreSQL installation
PG_CTL_BINARY=/usr/lib/postgresql/10/bin/pg_ctl
# The path to your PostgreSQL data directory. You may need to adjust this
# path to match your PostgreSQL installation. You can also run
# `SHOW data_directory;` from PostgreSQL to find your data directory
PG_DATA_DIRECTORY=/etc/postgresql/10/main
# Promote the PostgreSQL database and allow read/write operations
sudo -u $PG_SUPERUSER $PG_CTL_BINARY -D $PG_DATA_DIRECTORY promote
```
1. SSH to every Sidekiq, PostgreSQL, and Gitaly node in the **secondary** site and run one of the following commands:
- To promote the secondary node to primary:
```shell
sudo gitlab-ctl geo promote
```
- To promote the secondary node to primary **without any further confirmation**:
```shell
sudo gitlab-ctl geo promote --force
```
1. SSH into each Rails node on your **secondary** site and run one of the following commands:
- To promote the secondary node to primary:
```shell
sudo gitlab-ctl geo promote
```
- To promote the secondary node to primary **without any further confirmation**:
```shell
sudo gitlab-ctl geo promote --force
```
1. Verify you can connect to the newly-promoted **primary** site using the URL used
previously for the **secondary** site.
1. If successful, the **secondary** site is now promoted to the **primary** site.
#### Promoting a **secondary** site with an external PostgreSQL database running GitLab 14.4 and earlier
WARNING:
The `gitlab-ctl promote-to-primary-node` and `gitlab-ctl promote-db` commands are
deprecated in GitLab 14.5 and later, and are scheduled to [be removed in GitLab 15.0](https://gitlab.com/gitlab-org/gitlab/-/issues/345207).
Use `gitlab-ctl geo promote` instead.
The `gitlab-ctl promote-to-primary-node` command cannot be used in conjunction with
an external PostgreSQL database, as it can only perform changes on a **secondary**
@@ -287,23 +461,23 @@ required:
1. Verify you can connect to the newly-promoted **primary** using the URL used
previously for the **secondary**.
1. If successful, the **secondary** site is now promoted to the **primary** site.
### Step 4. (Optional) Updating the primary domain DNS record
Update the DNS records for the primary domain to point to the **secondary** site
to prevent the need to update all references to the primary domain to the
secondary domain, like changing Git remotes and API URLs.
1. SSH into the **secondary** site and log in as root:
```shell
sudo -i
```
1. Update the primary domain's DNS record. After updating the primary domain's
DNS records to point to the **secondary** site, edit `/etc/gitlab/gitlab.rb` on the
**secondary** site to reflect the new URL:
```ruby
# Change the existing external_url configuration
@@ -314,13 +488,13 @@ secondary domain, like changing Git remotes and API URLs.
Changing `external_url` does not prevent access via the old secondary URL, as
long as the secondary DNS records are still intact.
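The `external_url` line itself is collapsed out of the diff above; as a sketch with a placeholder domain, the edited setting looks like:

```ruby
# Use the primary domain on the newly promoted site (placeholder domain)
external_url 'https://gitlab.example.com'
```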
1. Reconfigure the **secondary** site for the change to take effect:
```shell
gitlab-ctl reconfigure
```
1. Execute the command below to update the newly promoted **primary** site URL:
```shell
gitlab-rake geo:update_primary_node_url
@@ -335,14 +509,14 @@ secondary domain, like changing Git remotes and API URLs.
To determine if you need to do this, search for the
`gitlab_rails["geo_node_name"]` setting in your `/etc/gitlab/gitlab.rb`
file. If it is commented out with `#` or not found at all, then you
need to update the **primary** site's name in the database. You can search for it
like so:
```shell
grep "geo_node_name" /etc/gitlab/gitlab.rb
```
To update the **primary** site's name in the database:
```shell
gitlab-rails runner 'Gitlab::Geo.primary_node.update!(name: GeoNode.current_node_name)'
@@ -352,12 +526,12 @@ secondary domain, like changing Git remotes and API URLs.
If you updated the DNS records for the primary domain, these changes may
not have propagated yet, depending on the previous DNS records' TTL.
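One way to check propagation is to query a public resolver directly (placeholder domain and resolver):

```shell
# Ask a public resolver which IP the primary domain currently resolves to
dig @8.8.8.8 +short gitlab.example.com
```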
### Step 5. (Optional) Add **secondary** Geo site to a promoted **primary** site
Promoting a **secondary** site to **primary** site using the process above does not enable
Geo on the new **primary** site.
To bring a new **secondary** site online, follow the [Geo setup instructions](../index.md#setup-instructions).
### Step 6. (Optional) Removing the secondary's tracking database
@@ -376,13 +550,13 @@ for the changes to take effect.
## Promoting secondary Geo replica in multi-secondary configurations
If you have more than one **secondary** site and you need to promote one of them, we suggest you follow
[Promoting a **secondary** Geo site in single-secondary configurations](#promoting-a-secondary-geo-site-in-single-secondary-configurations)
and after that you also need two extra steps.
### Step 1. Prepare the new **primary** site to serve one or more **secondary** sites
1. SSH into the new **primary** site and log in as root:
```shell
sudo -i
@@ -442,13 +616,13 @@ and after that you also need two extra steps.
### Step 2. Initiate the replication process
Now we need to make each **secondary** site listen to changes on the new **primary** site. To do that you need
to [initiate the replication process](../setup/database.md#step-3-initiate-the-replication-process) again but this time
for another **primary** site. All the old replication settings are overwritten.
## Promoting a secondary Geo cluster in GitLab Cloud Native Helm Charts
When updating a Cloud Native Geo deployment, the process for updating any node that is external to the secondary Kubernetes cluster does not differ from the non Cloud Native approach. As such, you can always defer to [Promoting a secondary Geo site in single-secondary configurations](#promoting-a-secondary-geo-site-in-single-secondary-configurations) for more information.
The following sections assume you are using the `gitlab` namespace. If you used a different namespace when setting up your cluster, you should also replace `--namespace gitlab` with your namespace.
@@ -489,13 +663,45 @@ must disable the **primary** site:
- Revoke object storage permissions from the **primary** site.
- Physically disconnect a machine.
### Step 2. Promote all **secondary** sites external to the cluster
WARNING:
If the secondary site [has been paused](../../geo/index.md#pausing-and-resuming-replication), this performs
a point-in-time recovery to the last known state.
Data that was created on the primary while the secondary was paused is lost.
If you are running GitLab 14.5 and later:
1. SSH to every Sidekiq, PostgreSQL, and Gitaly node in the **secondary** site and run one of the following commands:
- To promote the secondary node to primary:
```shell
sudo gitlab-ctl geo promote
```
- To promote the secondary node to primary **without any further confirmation**:
```shell
sudo gitlab-ctl geo promote --force
```
1. SSH into each Rails node on your **secondary** site and run one of the following commands:
- To promote the secondary node to primary:
```shell
sudo gitlab-ctl geo promote
```
- To promote the secondary node to primary **without any further confirmation**:
```shell
sudo gitlab-ctl geo promote --force
```
If you are running GitLab 14.4 and earlier:
1. SSH in to the database node in the **secondary** and trigger PostgreSQL to
promote to read-write:
@@ -522,8 +728,6 @@ Data that was created on the primary while the secondary was paused is lost.
After making these changes, [reconfigure GitLab](../../restart_gitlab.md#omnibus-gitlab-reconfigure) on the database node.
1. Find the task runner pod:
```shell
@@ -536,6 +740,8 @@ Data that was created on the primary while the secondary was paused is lost.
kubectl --namespace gitlab exec -ti gitlab-geo-task-runner-XXX -- gitlab-rake geo:set_secondary_as_primary
```
### Step 3. Promote the **secondary** cluster
1. Update the existing cluster configuration.
You can retrieve the existing configuration with Helm:
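The Helm command itself is collapsed out of the diff; as a sketch, assuming a release named `gitlab-geo` in the `gitlab` namespace:

```shell
# Dump the release's current values so they can be edited and re-applied
helm --namespace gitlab get values gitlab-geo > gitlab.yaml
```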
@@ -204,4 +204,4 @@ in the loss of any data uploaded to the new **primary** in the meantime.
Don't forget to remove the broadcast message after the failover is complete.
Finally, you can bring the [old site back as a secondary](bring_primary_back.md#configure-the-former-primary-site-to-be-a-secondary-site).
@@ -66,13 +66,13 @@ promote a Geo replica and perform a failover.
NOTE:
GitLab 13.9 through GitLab 14.3 are affected by a bug in which the Geo secondary site statuses will appear to stop updating and become unhealthy. For more information, see [Geo Admin Area shows 'Unhealthy' after enabling Maintenance Mode](../../replication/troubleshooting.md#geo-admin-area-shows-unhealthy-after-enabling-maintenance-mode).
On the **secondary** site:
1. On the top bar, select **Menu > Admin**.
1. On the left sidebar, select **Geo > Nodes** to see its status.
Replicated objects (shown in green) should be close to 100%,
and there should be no failures (shown in red). If a large proportion of
objects aren't yet replicated (shown in gray), consider giving the site more
time to complete.
![Replication status](../../replication/img/geo_dashboard_v14_0.png)
@@ -85,20 +85,20 @@ You can use the
[Geo status API](../../../../api/geo_nodes.md#retrieve-project-sync-or-verification-failures-that-occurred-on-the-current-node)
to review failed objects and the reasons for failure.
A common cause of replication failures is the data being missing on the
**primary** site - you can resolve these failures by restoring the data from backup,
or removing references to the missing data.
The maintenance window won't end until Geo replication and verification is
completely finished. To keep the window as short as possible, you should
ensure these processes are as close to 100% as possible during active use.
If the **secondary** site is still replicating data from the **primary** site,
follow these steps to avoid unnecessary data loss:
1. Until a [read-only mode](https://gitlab.com/gitlab-org/gitlab/-/issues/14609)
is implemented, you must manually prevent updates to the
**primary**. Your **secondary** site still needs read-only
access to the **primary** site during the maintenance window:
1. At the scheduled time, using your cloud provider or your node's firewall, block
all HTTP, HTTPS and SSH traffic to/from the **primary** node, **except** for your IP and
@@ -121,18 +121,18 @@ follow these steps to avoid unnecessary data loss:
```
From this point, users are unable to view their data or make changes on the
**primary** site. They are also unable to log in to the **secondary** site.
However, existing sessions need to work for the remainder of the maintenance period, and
so public data is accessible throughout.
1. Verify the **primary** site is blocked to HTTP traffic by visiting it in browser via
another IP. The server should refuse connection.
1. Verify the **primary** site is blocked to Git over SSH traffic by attempting to pull an
existing Git repository with an SSH remote URL. The server should refuse
connection.
1. On the **primary** site:
1. On the top bar, select **Menu > Admin**.
1. On the left sidebar, select **Monitoring > Background Jobs**.
1. On the Sidekiq dashboard, select **Cron**.
@@ -150,7 +150,7 @@ follow these steps to avoid unnecessary data loss:
1. If you are manually replicating any
[data not managed by Geo](../../replication/datatypes.md#limitations-on-replicationverification),
trigger the final replication process now.
1. On the **primary** site:
1. On the top bar, select **Menu > Admin**.
1. On the left sidebar, select **Monitoring > Background Jobs**.
1. On the Sidekiq dashboard, select **Queues**, and wait for all queues except
@@ -165,7 +165,7 @@ follow these steps to avoid unnecessary data loss:
- Database replication lag is 0ms.
- The Geo log cursor is up to date (0 events behind).
1. On the **secondary** site:
1. On the top bar, select **Menu > Admin**.
1. On the left sidebar, select **Monitoring > Background Jobs**.
1. On the Sidekiq dashboard, select **Queues**, and wait for all the `geo`
@@ -173,14 +173,14 @@ follow these steps to avoid unnecessary data loss:
1. [Run an integrity check](../../../raketasks/check.md) to verify the integrity
of CI artifacts, LFS objects, and uploads in file storage.
At this point, your **secondary** site contains an up-to-date copy of everything the
**primary** site has, meaning nothing is lost when you fail over.
1. In this final step, you need to permanently disable the **primary** site.
WARNING:
When the **primary** site goes offline, there may be data saved on the **primary** site
that has not been replicated to the **secondary** site. This data should be treated
as lost if you proceed.
NOTE:
@@ -189,9 +189,9 @@ follow these steps to avoid unnecessary data loss:
When performing a failover, we want to avoid a split-brain situation where
writes can occur in two different GitLab instances. So to prepare for the
failover, you must disable the **primary** site:
- If you have SSH access to the **primary** site, stop and disable GitLab:
```shell
sudo gitlab-ctl stop
@@ -214,19 +214,58 @@ follow these steps to avoid unnecessary data loss:
from starting if the machine reboots as `root` with
`initctl stop gitlab-runsvdir && echo 'manual' > /etc/init/gitlab-runsvdir.override && initctl reload-configuration`.
- If you do not have SSH access to the **primary** site, take the machine offline and
prevent it from rebooting. Since there are many ways you may prefer to accomplish
this, we avoid a single recommendation. You may need to:
- Reconfigure the load balancers.
- Change DNS records (for example, point the **primary** DNS record to the
**secondary** site to stop using the **primary** site).
- Stop the virtual servers.
- Block traffic through a firewall.
- Revoke object storage permissions from the **primary** site.
- Physically disconnect a machine.
### Promoting the **secondary** site running GitLab 14.5 and later
1. SSH to every Sidekiq, PostgreSQL, and Gitaly node in the **secondary** site and run one of the following commands:
- To promote the secondary node to primary:
```shell
sudo gitlab-ctl geo promote
```
- To promote the secondary node to primary **without any further confirmation**:
```shell
sudo gitlab-ctl geo promote --force
```
1. SSH into each Rails node on your **secondary** site and run one of the following commands:
- To promote the secondary node to primary:
```shell
sudo gitlab-ctl geo promote
```
- To promote the secondary node to primary **without any further confirmation**:
```shell
sudo gitlab-ctl geo promote --force
```
1. Verify you can connect to the newly promoted **primary** site using the URL used
previously for the **secondary** site.
1. If successful, the **secondary** site is now promoted to the **primary** site.
### Promoting the **secondary** site running GitLab 14.4 and earlier
WARNING:
The `gitlab-ctl promote-to-primary-node` and `gitlab-ctl promote-db` commands are
deprecated in GitLab 14.5 and later, and are scheduled to [be removed in GitLab 15.0](https://gitlab.com/gitlab-org/gitlab/-/issues/345207).
Use `gitlab-ctl geo promote` instead.
NOTE:
A new **secondary** should not be added at this time. If you want to add a new
@@ -243,13 +282,13 @@ perform changes on a **secondary** with only a single machine. Instead, you must
do this manually.
WARNING:
In GitLab 13.2 and 13.3, promoting a secondary site to a primary while the
secondary is paused fails. Do not pause replication before promoting a
secondary. If the site is paused, be sure to resume before promoting. This
issue has been fixed in GitLab 13.4 and later.
WARNING:
If the secondary site [has been paused](../../../geo/index.md#pausing-and-resuming-replication), this performs
a point-in-time recovery to the last known state.
Data that was created on the primary while the secondary was paused is lost.
@@ -291,6 +330,6 @@ Data that was created on the primary while the secondary was paused is lost.
### Next steps
To regain geographic redundancy as quickly as possible, you should
[add a new **secondary** site](../../setup/index.md). To
do that, you can re-add the old **primary** as a new secondary and bring it back
online.
@@ -54,10 +54,10 @@ promote a Geo replica and perform a failover.
NOTE:
GitLab 13.9 through GitLab 14.3 are affected by a bug in which the Geo secondary site statuses will appear to stop updating and become unhealthy. For more information, see [Geo Admin Area shows 'Unhealthy' after enabling Maintenance Mode](../../replication/troubleshooting.md#geo-admin-area-shows-unhealthy-after-enabling-maintenance-mode).
On the **secondary** site, navigate to the **Admin Area > Geo** dashboard to
review its status. Replicated objects (shown in green) should be close to 100%,
and there should be no failures (shown in red). If a large proportion of
objects aren't yet replicated (shown in gray), consider giving the site more
time to complete.
![Replication status](../../replication/img/geo_dashboard_v14_0.png)
@@ -70,20 +70,20 @@ You can use the
[Geo status API](../../../../api/geo_nodes.md#retrieve-project-sync-or-verification-failures-that-occurred-on-the-current-node)
to review failed objects and the reasons for failure.
A common cause of replication failures is the data being missing on the
**primary** site - you can resolve these failures by restoring the data from backup,
or removing references to the missing data.
The maintenance window won't end until Geo replication and verification is
completely finished. To keep the window as short as possible, you should
ensure these processes are as close to 100% as possible during active use.
If the **secondary** site is still replicating data from the **primary** site,
follow these steps to avoid unnecessary data loss:
1. Until a [read-only mode](https://gitlab.com/gitlab-org/gitlab/-/issues/14609)
is implemented, you must manually prevent updates to the
**primary**. Your **secondary** site still needs read-only
access to the **primary** site during the maintenance window:
1. At the scheduled time, using your cloud provider or your node's firewall, block
all HTTP, HTTPS and SSH traffic to/from the **primary** node, **except** for your IP and
@@ -106,18 +106,18 @@ follow these steps to avoid unnecessary data loss:
```
From this point, users are unable to view their data or make changes on the
**primary** site. They are also unable to log in to the **secondary** site.
However, existing sessions need to work for the remainder of the maintenance period, and
so public data is accessible throughout.
1. Verify the **primary** site is blocked to HTTP traffic by visiting it in browser via
another IP. The server should refuse connection.
1. Verify the **primary** site is blocked to Git over SSH traffic by attempting to pull an
existing Git repository with an SSH remote URL. The server should refuse
connection.
1. On the **primary** site:
1. On the top bar, select **Menu > Admin**.
1. On the left sidebar, select **Monitoring > Background Jobs**.
1. On the Sidekiq dashboard, select **Cron**.
@@ -135,7 +135,7 @@ follow these steps to avoid unnecessary data loss:
1. If you are manually replicating any
[data not managed by Geo](../../replication/datatypes.md#limitations-on-replicationverification),
trigger the final replication process now.
1. On the **primary** site:
1. On the top bar, select **Menu > Admin**.
1. On the left sidebar, select **Monitoring > Background Jobs**.
1. On the Sidekiq dashboard, select **Queues**, and wait for all queues except
@@ -143,14 +143,14 @@ follow these steps to avoid unnecessary data loss:
These queues contain work that has been submitted by your users; failing over These queues contain work that has been submitted by your users; failing over
before it is completed causes the work to be lost. before it is completed causes the work to be lost.
1. On the left sidebar, select **Geo > Nodes** and wait for the 1. On the left sidebar, select **Geo > Nodes** and wait for the
following conditions to be true of the **secondary** node you are failing over to: following conditions to be true of the **secondary** site you are failing over to:
- All replication meters reach 100% replicated, 0% failures. - All replication meters reach 100% replicated, 0% failures.
- All verification meters reach 100% verified, 0% failures. - All verification meters reach 100% verified, 0% failures.
- Database replication lag is 0ms. - Database replication lag is 0ms.
- The Geo log cursor is up to date (0 events behind). - The Geo log cursor is up to date (0 events behind).
1. On the **secondary** node: 1. On the **secondary** site:
1. On the top bar, select **Menu > Admin**. 1. On the top bar, select **Menu > Admin**.
1. On the left sidebar, select **Monitoring > Background Jobs**. 1. On the left sidebar, select **Monitoring > Background Jobs**.
1. On the Sidekiq dashboard, select **Queues**, and wait for all the `geo` 1. On the Sidekiq dashboard, select **Queues**, and wait for all the `geo`
...@@ -158,14 +158,14 @@ follow these steps to avoid unnecessary data loss: ...@@ -158,14 +158,14 @@ follow these steps to avoid unnecessary data loss:
1. [Run an integrity check](../../../raketasks/check.md) to verify the integrity 1. [Run an integrity check](../../../raketasks/check.md) to verify the integrity
of CI artifacts, LFS objects, and uploads in file storage (see the sketch after this list). of CI artifacts, LFS objects, and uploads in file storage (see the sketch after this list).
At this point, your **secondary** node contains an up-to-date copy of everything the At this point, your **secondary** site contains an up-to-date copy of everything the
**primary** node has, meaning nothing is lost when you fail over. **primary** site has, meaning nothing is lost when you fail over.
1. In this final step, you need to permanently disable the **primary** node. 1. In this final step, you need to permanently disable the **primary** site.
WARNING: WARNING:
When the **primary** node goes offline, there may be data saved on the **primary** node When the **primary** site goes offline, there may be data saved on the **primary** site
that has not been replicated to the **secondary** node. This data should be treated that has not been replicated to the **secondary** site. This data should be treated
as lost if you proceed. as lost if you proceed.
NOTE: NOTE:
...@@ -174,9 +174,9 @@ follow these steps to avoid unnecessary data loss: ...@@ -174,9 +174,9 @@ follow these steps to avoid unnecessary data loss:
When performing a failover, we want to avoid a split-brain situation where When performing a failover, we want to avoid a split-brain situation where
writes can occur in two different GitLab instances. So to prepare for the writes can occur in two different GitLab instances. So to prepare for the
failover, you must disable the **primary** node: failover, you must disable the **primary** site:
- If you have SSH access to the **primary** node, stop and disable GitLab: - If you have SSH access to the **primary** site, stop and disable GitLab:
```shell ```shell
sudo gitlab-ctl stop sudo gitlab-ctl stop
...@@ -199,19 +199,19 @@ follow these steps to avoid unnecessary data loss: ...@@ -199,19 +199,19 @@ follow these steps to avoid unnecessary data loss:
from starting if the machine reboots as `root` with from starting if the machine reboots as `root` with
`initctl stop gitlab-runsvdir && echo 'manual' > /etc/init/gitlab-runsvdir.override && initctl reload-configuration`. `initctl stop gitlab-runsvdir && echo 'manual' > /etc/init/gitlab-runsvdir.override && initctl reload-configuration`.
- If you do not have SSH access to the **primary** node, take the machine offline and - If you do not have SSH access to the **primary** site, take the machine offline and
prevent it from rebooting. Since there are many ways you may prefer to accomplish prevent it from rebooting. Since there are many ways you may prefer to accomplish
this, we avoid a single recommendation. You may need to: this, we avoid a single recommendation. You may need to:
- Reconfigure the load balancers. - Reconfigure the load balancers.
- Change DNS records (for example, point the **primary** DNS record to the - Change DNS records (for example, point the **primary** DNS record to the
**secondary** node to stop using the **primary** node). **secondary** site to stop using the **primary** site).
- Stop the virtual servers. - Stop the virtual servers.
- Block traffic through a firewall (see the sketch after this list). - Block traffic through a firewall (see the sketch after this list).
- Revoke object storage permissions from the **primary** node. - Revoke object storage permissions from the **primary** site.
- Physically disconnect a machine. - Physically disconnect a machine.
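For the firewall option above, the following is a minimal sketch only: the IP address, host names, repository path, and port list are placeholders, and the rules assume an `iptables`-based Linux host, so adapt them to your environment. The last two commands mirror the earlier verification steps by confirming that a non-allowed host can no longer connect.

```shell
# Sketch: block inbound HTTP, HTTPS, and SSH to the primary, allowing only one
# admin IP (203.0.113.10 is a placeholder).
sudo iptables -A INPUT -p tcp -s 203.0.113.10 -m multiport --dports 22,80,443 -j ACCEPT
sudo iptables -A INPUT -p tcp -m multiport --dports 22,80,443 -j DROP

# From a host that is not allowed, both checks should fail (time out or be refused).
curl --connect-timeout 5 --head https://primary.example.com
git ls-remote git@primary.example.com:group/project.git
```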
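The integrity check mentioned earlier in this list is documented in the linked Rake task page; as a reminder only, the usual commands on an Omnibus installation look like this (task names can vary by version, so treat them as a sketch):

```shell
# Integrity checks for file storage on the secondary site (see the linked
# Rake task documentation for the authoritative list and options).
sudo gitlab-rake gitlab:artifacts:check
sudo gitlab-rake gitlab:lfs:check
sudo gitlab-rake gitlab:uploads:check
```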
### Promoting the **secondary** node ### Promoting the **secondary** site
Note the following when promoting a secondary: Note the following when promoting a secondary:
...@@ -222,9 +222,35 @@ Note the following when promoting a secondary: ...@@ -222,9 +222,35 @@ Note the following when promoting a secondary:
error during this process, read error during this process, read
[the troubleshooting advice](../../replication/troubleshooting.md#fixing-errors-during-a-failover-or-when-promoting-a-secondary-to-a-primary-node). [the troubleshooting advice](../../replication/troubleshooting.md#fixing-errors-during-a-failover-or-when-promoting-a-secondary-to-a-primary-node).
To promote the secondary node: To promote the secondary site running GitLab 14.5 and later:
1. SSH in to your **secondary** node and log in as root: 1. SSH in to your **secondary** node and run one of the following commands:
- To promote the secondary node to primary:
```shell
sudo gitlab-ctl geo promote
```
- To promote the secondary node to primary **without any further confirmation**:
```shell
sudo gitlab-ctl geo promote --force
```
1. Verify you can connect to the newly promoted **primary** site using the URL used
previously for the **secondary** site.
If successful, the **secondary** site is now promoted to the **primary** site.
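For example, a quick check from a workstation, assuming the former **secondary** site is reachable at `gitlab.example.com` (a placeholder URL); the health endpoints may be restricted to allow-listed monitoring IPs:

```shell
# Sketch: confirm the promoted site answers on its existing URL.
curl --head https://gitlab.example.com/users/sign_in

# Health probe (may require the calling IP to be on the monitoring allow list).
curl https://gitlab.example.com/-/readiness
```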
To promote the secondary site running GitLab 14.4 and earlier:
WARNING:
The `gitlab-ctl promote-to-primary-node` and `gitlab-ctl promote-db` commands are
deprecated in GitLab 14.5 and later, and are scheduled to [be removed in GitLab 15.0](https://gitlab.com/gitlab-org/gitlab/-/issues/345207).
Use `gitlab-ctl geo promote` instead.
1. SSH in to your **secondary** site and log in as root:
```shell ```shell
sudo -i sudo -i
...@@ -275,20 +301,20 @@ To promote the secondary node: ...@@ -275,20 +301,20 @@ To promote the secondary node:
gitlab-ctl promote-to-primary-node --skip-preflight-check gitlab-ctl promote-to-primary-node --skip-preflight-check
``` ```
You can also promote the secondary node to primary **without any further confirmation**, even when preflight checks fail: You can also promote the secondary site to primary **without any further confirmation**, even when preflight checks fail:
```shell ```shell
sudo gitlab-ctl promote-to-primary-node --force sudo gitlab-ctl promote-to-primary-node --force
``` ```
1. Verify you can connect to the newly promoted **primary** node using the URL used 1. Verify you can connect to the newly promoted **primary** site using the URL used
previously for the **secondary** node. previously for the **secondary** site.
If successful, the **secondary** node has now been promoted to the **primary** node. If successful, the **secondary** site is now promoted to the **primary** site.
### Next steps ### Next steps
To regain geographic redundancy as quickly as possible, you should To regain geographic redundancy as quickly as possible, you should
[add a new **secondary** node](../../setup/index.md). To [add a new **secondary** site](../../setup/index.md). To
do that, you can re-add the old **primary** as a new secondary and bring it back do that, you can re-add the old **primary** as a new secondary and bring it back
online. online.
...@@ -28,7 +28,7 @@ To disable Geo, you need to first remove all your secondary Geo sites, which mea ...@@ -28,7 +28,7 @@ To disable Geo, you need to first remove all your secondary Geo sites, which mea
anymore on these sites. You can follow our docs to [remove your secondary Geo sites](remove_geo_site.md). anymore on these sites. You can follow our docs to [remove your secondary Geo sites](remove_geo_site.md).
If the current site that you want to keep using is a secondary site, you need to first promote it to primary. If the current site that you want to keep using is a secondary site, you need to first promote it to primary.
You can use our steps on [how to promote a secondary site](../disaster_recovery/#step-3-promoting-a-secondary-node) You can use our steps on [how to promote a secondary site](../disaster_recovery/#step-3-promoting-a-secondary-site)
to do that. to do that.
## Remove the primary site from the UI ## Remove the primary site from the UI
......
...@@ -683,7 +683,7 @@ when promoting a secondary to a primary node with strategies to resolve them. ...@@ -683,7 +683,7 @@ when promoting a secondary to a primary node with strategies to resolve them.
### Message: ActiveRecord::RecordInvalid: Validation failed: Name has already been taken ### Message: ActiveRecord::RecordInvalid: Validation failed: Name has already been taken
When [promoting a **secondary** node](../disaster_recovery/index.md#step-3-promoting-a-secondary-node), When [promoting a **secondary** site](../disaster_recovery/index.md#step-3-promoting-a-secondary-site),
you might encounter the following error: you might encounter the following error:
```plaintext ```plaintext
...@@ -751,7 +751,7 @@ This can be fixed in the database. ...@@ -751,7 +751,7 @@ This can be fixed in the database.
### Message: ``NoMethodError: undefined method `secondary?' for nil:NilClass`` ### Message: ``NoMethodError: undefined method `secondary?' for nil:NilClass``
When [promoting a **secondary** node](../disaster_recovery/index.md#step-3-promoting-a-secondary-node), When [promoting a **secondary** site](../disaster_recovery/index.md#step-3-promoting-a-secondary-site),
you might encounter the following error: you might encounter the following error:
```plaintext ```plaintext
...@@ -767,13 +767,13 @@ Tasks: TOP => geo:set_secondary_as_primary ...@@ -767,13 +767,13 @@ Tasks: TOP => geo:set_secondary_as_primary
(See full trace by running task with --trace) (See full trace by running task with --trace)
``` ```
This command is intended to be executed on a secondary node only, and this error This command is intended to be executed on a secondary site only, and this error
is displayed if you attempt to run this command on a primary node. is displayed if you attempt to run this command on a primary site.
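Before running Geo Rake tasks, you can confirm which role the current site has. One option, shown as a sketch (output and availability depend on your GitLab version), is the Geo health check task:

```shell
# Reports the Geo configuration of the current node, including whether it is
# acting as a primary or a secondary.
sudo gitlab-rake gitlab:geo:check
```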
### Message: `sudo: gitlab-pg-ctl: command not found` ### Message: `sudo: gitlab-pg-ctl: command not found`
When When
[promoting a **secondary** node with multiple servers](../disaster_recovery/index.md#promoting-a-secondary-node-with-multiple-servers), [promoting a **secondary** site with multiple nodes](../disaster_recovery/index.md#promoting-a-secondary-site-with-multiple-nodes-running-gitlab-144-and-earlier),
you need to run the `gitlab-pg-ctl` command to promote the PostgreSQL you need to run the `gitlab-pg-ctl` command to promote the PostgreSQL
read-replica database. read-replica database.
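For reference, a heavily hedged sketch of the promotion step this error refers to; whether these commands are available, and their exact form, depends on your GitLab version, so verify each against the linked disaster recovery steps before running them.

```shell
# Assumption: on multi-node secondaries running GitLab 14.4 and earlier, the
# database promotion is typically wrapped by this (now deprecated) command:
sudo gitlab-ctl promote-db

# The helper named in the error message; the subcommand shown is an assumption
# to verify against your version's documentation:
sudo gitlab-pg-ctl promote
```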
......