Commit cad61ad8 authored by Marcin Sedlak-Jakubowski's avatar Marcin Sedlak-Jakubowski

Merge branch 'eread/clarify-recovery-with-praefect' into 'master'

Clarify how to use Praefect for data recovery

See merge request gitlab-org/gitlab!38861
parents a4d44520 6119623d
......@@ -469,6 +469,7 @@ TypeScript
Twilio
Twitter
Ubuntu
unapplied
unarchive
unarchived
unarchives
......
......@@ -1041,40 +1041,63 @@ strategy in the future.
## Primary Node Failure
Praefect recovers from a failing primary Gitaly node by promoting a healthy secondary as the new primary. To minimize data loss, Praefect elects the secondary with the least unreplicated writes from the primary. There can still be some unreplicated writes, leading to data loss.
Gitaly Cluster recovers from a failing primary Gitaly node by promoting a healthy secondary as the
new primary.
To minimize data loss, Gitaly Cluster:
- Switches repositories that are outdated on the new primary to [read-only mode](#read-only-mode).
- Elects the secondary with the least unreplicated writes from the primary to be the new primary.
Because there can still be some unreplicated writes, [data loss can occur](#check-for-data-loss).
### Read-only mode
If the primary is not fully up to date, Praefect switches:
> - Introduced in GitLab 13.0 as [generally available](https://about.gitlab.com/handbook/product/gitlab-the-product/#generally-available-ga).
> - Between GitLab 13.0 and GitLab 13.2, read-only mode applied to the whole virtual storage and occurred whenever failover occurred.
> - [In GitLab 13.3 and later](https://gitlab.com/gitlab-org/gitaly/-/issues/2862), read-only mode applies on a per-repository basis and only occurs if a new primary is out of date.
When Gitaly Cluster switches to a new primary, repositories enter read-only mode if they are out of
date. This can happen after failing over to an outdated secondary. Read-only mode eases data
recovery efforts by preventing writes that may conflict with the unreplicated writes on other nodes.
To enable writes again, an administrator can:
- The virtual storage to read-only mode in GitLab 13.2 and earlier.
- The affected repositories to read-only mode in
[GitLab 13.3](https://gitlab.com/gitlab-org/gitaly/-/issues/2862) and later.
1. [Check](#check-for-data-loss) for data loss.
1. Attempt to [recover](#recover-missing-data) missing data.
1. Either [enable writes](#enable-writes-or-accept-data-loss) in the virtual storage or
[accept data loss](#enable-writes-or-accept-data-loss) if necessary, depending on the version of
GitLab.
This can happen after failing over to an outdated secondary. Read-only mode eases data
recovery efforts by preventing writes that may conflict with the unreplicated writes.
### Check for data loss
To re-enable the repositories for writes, the administrator can attempt to
[recover the missing data](#recover-missing-data) from an up-to-date replica. If
recovering the data is not possible, the repository can be enabled for writes again by
[accepting the data loss](#accept-data-loss).
The Praefect `dataloss` sub-command identifies replicas that are likely to be outdated. This is
useful for identifying potential data loss after a failover. The following parameters are
available:
### Checking for data loss
- `-virtual-storage` that specifies which virtual storage to check. The default behavior is to
display outdated replicas of read-only repositories as they generally require administrator
action.
- In GitLab 13.3 and later, `-partially-replicated` that specifies whether to display a list of
[outdated replicas of writable repositories](#outdated-replicas-of-writable-repositories).
The Praefect `dataloss` sub-command identifies replicas that are likely to be outdated. This is useful for identifying potential data loss after a failover.
NOTE: **Note:**
`dataloss` is still in beta and the output format is subject to change.
To check for outdated replicas of read-only repositories, run:
```shell
sudo /opt/gitlab/embedded/bin/praefect -config /var/opt/gitlab/praefect/config.toml dataloss [-virtual-storage <virtual-storage>] [-writable]
sudo /opt/gitlab/embedded/bin/praefect -config /var/opt/gitlab/praefect/config.toml dataloss [-virtual-storage <virtual-storage>]
```
- `-virtual-storage` specifies which virtual storage to check. Every configured virtual storage is checked if none is specified.
- `-writable` specifies whether to report outdated replicas of writable repositories. Repository is writable if the primary has the latest changes. Secondaries might be temporarily outdated while they are waiting to replicate the latest changes. To reduce the noise, default behavior is to display outdated replicas of read-only repositories as they generally require administrator action.
Every configured virtual storage is checked if none is specified:
```shell
sudo /opt/gitlab/embedded/bin/praefect -config /var/opt/gitlab/praefect/config.toml dataloss
```
The number of potentially unapplied changes to repositories are listed for each replica. Listed repositories might have the latest changes but it is not guaranteed. Only outdated replicas of read-only repositories are listed by default.
The number of potentially unapplied changes to repositories is listed for each replica. Listed
repositories might have the latest changes but it is not guaranteed. Only outdated replicas of
read-only repositories are listed by default. For example:
```shell
Virtual storage: default
......@@ -1085,7 +1108,29 @@ Virtual storage: default
gitaly-3 is behind by 2 changes or less
```
Set the `-writable` flag also list the outdated replicas of writable repositories.
A confirmation is printed out when every repository is writable. For example:
```shell
Virtual storage: default
Primary: gitaly-1
All repositories are writable!
```
#### Outdated replicas of writable repositories
> [Introduced](https://gitlab.com/gitlab-org/gitaly/-/issues/3019) in GitLab 13.3.
To also list information for outdated replicas of writable repositories, use the
`-partially-replicated` parameter.
A repository is writable if the primary has the latest changes. Secondaries might be temporarily
outdated while they are waiting to replicate the latest changes.
```shell
sudo /opt/gitlab/embedded/bin/praefect -config /var/opt/gitlab/praefect/config.toml dataloss [-virtual-storage <virtual-storage>] [-partially-replicated]
```
Example output:
```shell
Virtual storage: default
......@@ -1098,15 +1143,8 @@ Virtual storage: default
gitaly-2 is behind by 1 change or less
```
A confirmation is printed out when every repository is writable.
```shell
Virtual storage: default
Primary: gitaly-1
All repositories are writable!
```
With the `-writable` flag set, a confirmation is printed out if every replica is fully up to date.
With the `-partially-replicated` flag set, a confirmation is printed out if every replica is fully up to date.
For example:
```shell
Virtual storage: default
......@@ -1114,17 +1152,15 @@ Virtual storage: default
All repositories are up to date!
```
NOTE: **Note:**
`dataloss` is still in beta and the output format is subject to change.
### Checking repository checksums
### Check repository checksums
To check a project's repository checksums across on all Gitaly nodes, run the
[replicas Rake task](../raketasks/praefect.md#replica-checksums) on the main GitLab node.
### Recover missing data
The Praefect `reconcile` sub-command can be used to recover unreplicated changes from another replica. The source must be on a later generation than the target storage.
The Praefect `reconcile` sub-command can be used to recover unreplicated changes from another replica.
The source must be on a later version than the target storage.
```shell
sudo /opt/gitlab/embedded/bin/praefect -config /var/opt/gitlab/praefect/config.toml reconcile -virtual <virtual-storage> -reference <up-to-date-storage> -target <outdated-storage> -f
......@@ -1132,30 +1168,30 @@ sudo /opt/gitlab/embedded/bin/praefect -config /var/opt/gitlab/praefect/config.t
Refer to [Backend Node Recovery](#backend-node-recovery) section for more details on the `reconcile` sub-command.
### Accept data loss
### Enable writes or accept data loss
Praefect provides the following subcommands to re-enable writes:
- In GitLab 13.2 and earlier, `enable-writes` to re-enable virtual storage for writes
after data recovery attempts.
- In GitLab 13.2 and earlier, `enable-writes` to re-enable virtual storage for writes after data
recovery attempts.
```shell
sudo /opt/gitlab/embedded/bin/praefect -config /var/opt/gitlab/praefect/config.toml enable-writes -virtual-storage <virtual-storage>
```
- [In GitLab 13.3](https://gitlab.com/gitlab-org/gitaly/-/merge_requests/2415) and
later, `accept-dataloss` to accept data loss and re-enable writes for repositories
after data recovery attempts have failed. Accepting data loss causes current version
of the repository on the authoritative storage to be considered latest. Other storages
are brought up to date with the authoritative storage by scheduling replication jobs.
- [In GitLab 13.3](https://gitlab.com/gitlab-org/gitaly/-/merge_requests/2415) and later,
`accept-dataloss` to accept data loss and re-enable writes for repositories after data recovery
attempts have failed. Accepting data loss causes current version of the repository on the
authoritative storage to be considered latest. Other storages are brought up to date with the
authoritative storage by scheduling replication jobs.
```shell
sudo /opt/gitlab/embedded/bin/praefect -config /var/opt/gitlab/praefect/config.toml accept-dataloss -virtual-storage <virtual-storage> -repository <relative-path> -authoritative-storage <storage-name>
```
CAUTION: **Caution:**
`accept-dataloss` causes permanent data loss by overwriting other versions of the
repository. Data recovery efforts must be performed before using it.
`accept-dataloss` causes permanent data loss by overwriting other versions of the repository. Data
[recovery efforts](#recover-missing-data) must be performed before using it.
## Backend Node Recovery
......
Markdown is supported
0%
or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment