Commit cad61ad8 authored by Marcin Sedlak-Jakubowski's avatar Marcin Sedlak-Jakubowski

Merge branch 'eread/clarify-recovery-with-praefect' into 'master'

Clarify how to use Praefect for data recovery

See merge request gitlab-org/gitlab!38861
parents a4d44520 6119623d
...@@ -469,6 +469,7 @@ TypeScript ...@@ -469,6 +469,7 @@ TypeScript
Twilio Twilio
Twitter Twitter
Ubuntu Ubuntu
unapplied
unarchive unarchive
unarchived unarchived
unarchives unarchives
......
...@@ -1041,40 +1041,63 @@ strategy in the future. ...@@ -1041,40 +1041,63 @@ strategy in the future.
## Primary Node Failure ## Primary Node Failure
Praefect recovers from a failing primary Gitaly node by promoting a healthy secondary as the new primary. To minimize data loss, Praefect elects the secondary with the least unreplicated writes from the primary. There can still be some unreplicated writes, leading to data loss. Gitaly Cluster recovers from a failing primary Gitaly node by promoting a healthy secondary as the
new primary.
To minimize data loss, Gitaly Cluster:
- Switches repositories that are outdated on the new primary to [read-only mode](#read-only-mode).
- Elects the secondary with the least unreplicated writes from the primary to be the new primary.
Because there can still be some unreplicated writes, [data loss can occur](#check-for-data-loss).
### Read-only mode ### Read-only mode
If the primary is not fully up to date, Praefect switches: > - Introduced in GitLab 13.0 as [generally available](https://about.gitlab.com/handbook/product/gitlab-the-product/#generally-available-ga).
> - Between GitLab 13.0 and GitLab 13.2, read-only mode applied to the whole virtual storage and occurred whenever failover occurred.
> - [In GitLab 13.3 and later](https://gitlab.com/gitlab-org/gitaly/-/issues/2862), read-only mode applies on a per-repository basis and only occurs if a new primary is out of date.
When Gitaly Cluster switches to a new primary, repositories enter read-only mode if they are out of
date. This can happen after failing over to an outdated secondary. Read-only mode eases data
recovery efforts by preventing writes that may conflict with the unreplicated writes on other nodes.
To enable writes again, an administrator can:
- The virtual storage to read-only mode in GitLab 13.2 and earlier. 1. [Check](#check-for-data-loss) for data loss.
- The affected repositories to read-only mode in 1. Attempt to [recover](#recover-missing-data) missing data.
[GitLab 13.3](https://gitlab.com/gitlab-org/gitaly/-/issues/2862) and later. 1. Either [enable writes](#enable-writes-or-accept-data-loss) in the virtual storage or
[accept data loss](#enable-writes-or-accept-data-loss) if necessary, depending on the version of
GitLab.
This can happen after failing over to an outdated secondary. Read-only mode eases data ### Check for data loss
recovery efforts by preventing writes that may conflict with the unreplicated writes.
To re-enable the repositories for writes, the administrator can attempt to The Praefect `dataloss` sub-command identifies replicas that are likely to be outdated. This is
[recover the missing data](#recover-missing-data) from an up-to-date replica. If useful for identifying potential data loss after a failover. The following parameters are
recovering the data is not possible, the repository can be enabled for writes again by available:
[accepting the data loss](#accept-data-loss).
### Checking for data loss - `-virtual-storage` that specifies which virtual storage to check. The default behavior is to
display outdated replicas of read-only repositories as they generally require administrator
action.
- In GitLab 13.3 and later, `-partially-replicated` that specifies whether to display a list of
[outdated replicas of writable repositories](#outdated-replicas-of-writable-repositories).
The Praefect `dataloss` sub-command identifies replicas that are likely to be outdated. This is useful for identifying potential data loss after a failover. NOTE: **Note:**
`dataloss` is still in beta and the output format is subject to change.
To check for outdated replicas of read-only repositories, run:
```shell ```shell
sudo /opt/gitlab/embedded/bin/praefect -config /var/opt/gitlab/praefect/config.toml dataloss [-virtual-storage <virtual-storage>] [-writable] sudo /opt/gitlab/embedded/bin/praefect -config /var/opt/gitlab/praefect/config.toml dataloss [-virtual-storage <virtual-storage>]
``` ```
- `-virtual-storage` specifies which virtual storage to check. Every configured virtual storage is checked if none is specified. Every configured virtual storage is checked if none is specified:
- `-writable` specifies whether to report outdated replicas of writable repositories. Repository is writable if the primary has the latest changes. Secondaries might be temporarily outdated while they are waiting to replicate the latest changes. To reduce the noise, default behavior is to display outdated replicas of read-only repositories as they generally require administrator action.
```shell ```shell
sudo /opt/gitlab/embedded/bin/praefect -config /var/opt/gitlab/praefect/config.toml dataloss sudo /opt/gitlab/embedded/bin/praefect -config /var/opt/gitlab/praefect/config.toml dataloss
``` ```
The number of potentially unapplied changes to repositories are listed for each replica. Listed repositories might have the latest changes but it is not guaranteed. Only outdated replicas of read-only repositories are listed by default. The number of potentially unapplied changes to repositories is listed for each replica. Listed
repositories might have the latest changes but it is not guaranteed. Only outdated replicas of
read-only repositories are listed by default. For example:
```shell ```shell
Virtual storage: default Virtual storage: default
...@@ -1085,7 +1108,29 @@ Virtual storage: default ...@@ -1085,7 +1108,29 @@ Virtual storage: default
gitaly-3 is behind by 2 changes or less gitaly-3 is behind by 2 changes or less
``` ```
Set the `-writable` flag also list the outdated replicas of writable repositories. A confirmation is printed out when every repository is writable. For example:
```shell
Virtual storage: default
Primary: gitaly-1
All repositories are writable!
```
#### Outdated replicas of writable repositories
> [Introduced](https://gitlab.com/gitlab-org/gitaly/-/issues/3019) in GitLab 13.3.
To also list information for outdated replicas of writable repositories, use the
`-partially-replicated` parameter.
A repository is writable if the primary has the latest changes. Secondaries might be temporarily
outdated while they are waiting to replicate the latest changes.
```shell
sudo /opt/gitlab/embedded/bin/praefect -config /var/opt/gitlab/praefect/config.toml dataloss [-virtual-storage <virtual-storage>] [-partially-replicated]
```
Example output:
```shell ```shell
Virtual storage: default Virtual storage: default
...@@ -1098,15 +1143,8 @@ Virtual storage: default ...@@ -1098,15 +1143,8 @@ Virtual storage: default
gitaly-2 is behind by 1 change or less gitaly-2 is behind by 1 change or less
``` ```
A confirmation is printed out when every repository is writable. With the `-partially-replicated` flag set, a confirmation is printed out if every replica is fully up to date.
For example:
```shell
Virtual storage: default
Primary: gitaly-1
All repositories are writable!
```
With the `-writable` flag set, a confirmation is printed out if every replica is fully up to date.
```shell ```shell
Virtual storage: default Virtual storage: default
...@@ -1114,17 +1152,15 @@ Virtual storage: default ...@@ -1114,17 +1152,15 @@ Virtual storage: default
All repositories are up to date! All repositories are up to date!
``` ```
NOTE: **Note:** ### Check repository checksums
`dataloss` is still in beta and the output format is subject to change.
### Checking repository checksums
To check a project's repository checksums across on all Gitaly nodes, run the To check a project's repository checksums across on all Gitaly nodes, run the
[replicas Rake task](../raketasks/praefect.md#replica-checksums) on the main GitLab node. [replicas Rake task](../raketasks/praefect.md#replica-checksums) on the main GitLab node.
### Recover missing data ### Recover missing data
The Praefect `reconcile` sub-command can be used to recover unreplicated changes from another replica. The source must be on a later generation than the target storage. The Praefect `reconcile` sub-command can be used to recover unreplicated changes from another replica.
The source must be on a later version than the target storage.
```shell ```shell
sudo /opt/gitlab/embedded/bin/praefect -config /var/opt/gitlab/praefect/config.toml reconcile -virtual <virtual-storage> -reference <up-to-date-storage> -target <outdated-storage> -f sudo /opt/gitlab/embedded/bin/praefect -config /var/opt/gitlab/praefect/config.toml reconcile -virtual <virtual-storage> -reference <up-to-date-storage> -target <outdated-storage> -f
...@@ -1132,30 +1168,30 @@ sudo /opt/gitlab/embedded/bin/praefect -config /var/opt/gitlab/praefect/config.t ...@@ -1132,30 +1168,30 @@ sudo /opt/gitlab/embedded/bin/praefect -config /var/opt/gitlab/praefect/config.t
Refer to [Backend Node Recovery](#backend-node-recovery) section for more details on the `reconcile` sub-command. Refer to [Backend Node Recovery](#backend-node-recovery) section for more details on the `reconcile` sub-command.
### Accept data loss ### Enable writes or accept data loss
Praefect provides the following subcommands to re-enable writes: Praefect provides the following subcommands to re-enable writes:
- In GitLab 13.2 and earlier, `enable-writes` to re-enable virtual storage for writes - In GitLab 13.2 and earlier, `enable-writes` to re-enable virtual storage for writes after data
after data recovery attempts. recovery attempts.
```shell ```shell
sudo /opt/gitlab/embedded/bin/praefect -config /var/opt/gitlab/praefect/config.toml enable-writes -virtual-storage <virtual-storage> sudo /opt/gitlab/embedded/bin/praefect -config /var/opt/gitlab/praefect/config.toml enable-writes -virtual-storage <virtual-storage>
``` ```
- [In GitLab 13.3](https://gitlab.com/gitlab-org/gitaly/-/merge_requests/2415) and - [In GitLab 13.3](https://gitlab.com/gitlab-org/gitaly/-/merge_requests/2415) and later,
later, `accept-dataloss` to accept data loss and re-enable writes for repositories `accept-dataloss` to accept data loss and re-enable writes for repositories after data recovery
after data recovery attempts have failed. Accepting data loss causes current version attempts have failed. Accepting data loss causes current version of the repository on the
of the repository on the authoritative storage to be considered latest. Other storages authoritative storage to be considered latest. Other storages are brought up to date with the
are brought up to date with the authoritative storage by scheduling replication jobs. authoritative storage by scheduling replication jobs.
```shell ```shell
sudo /opt/gitlab/embedded/bin/praefect -config /var/opt/gitlab/praefect/config.toml accept-dataloss -virtual-storage <virtual-storage> -repository <relative-path> -authoritative-storage <storage-name> sudo /opt/gitlab/embedded/bin/praefect -config /var/opt/gitlab/praefect/config.toml accept-dataloss -virtual-storage <virtual-storage> -repository <relative-path> -authoritative-storage <storage-name>
``` ```
CAUTION: **Caution:** CAUTION: **Caution:**
`accept-dataloss` causes permanent data loss by overwriting other versions of the `accept-dataloss` causes permanent data loss by overwriting other versions of the repository. Data
repository. Data recovery efforts must be performed before using it. [recovery efforts](#recover-missing-data) must be performed before using it.
## Backend Node Recovery ## Backend Node Recovery
......
Markdown is supported
0%
or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment