Commit 4f1eebd1 authored by Marcia Ramos

Merge branch '341377-documentation-for-idempotency-with-lb' into 'master'

Update documentation for idempotency jobs

See merge request gitlab-org/gitlab!71205
parents 87707291 7073609f
......@@ -154,12 +154,6 @@ A good example of that would be a cache expiration worker.
A job scheduled for an idempotent worker is [deduplicated](#deduplication) when
an unstarted job with the same arguments is already in the queue.
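As a rough illustration of the rule above, the following minimal sketch drops a job when an unstarted job with the same worker class and arguments is already queued. `SketchQueue`, its key format, and the worker name `CacheExpirationWorker` are hypothetical and exist only for this example; it is not how GitLab implements deduplication, which tracks duplicates in Redis.

```ruby
require 'digest'
require 'json'

# Hypothetical in-memory queue, used only to illustrate deduplication.
class SketchQueue
  def initialize
    @pending = {} # idempotency key => unstarted job
  end

  # Returns true if the job was enqueued, false if it was dropped as a duplicate.
  def push(worker_class, args)
    key = idempotency_key(worker_class, args)
    return false if @pending.key?(key)

    @pending[key] = { class: worker_class, args: args }
    true
  end

  # Starts the oldest pending job. Once started, an identical job can be queued again.
  def pop
    key, job = @pending.first
    @pending.delete(key) if key
    job
  end

  def size
    @pending.size
  end

  private

  # Assumption: the key is a digest of the worker class name and its arguments.
  def idempotency_key(worker_class, args)
    Digest::SHA256.hexdigest("#{worker_class}:#{JSON.generate(args)}")
  end
end

queue = SketchQueue.new
queue.push('CacheExpirationWorker', [42]) # enqueued
queue.push('CacheExpirationWorker', [42]) # same arguments, dropped
queue.push('CacheExpirationWorker', [43]) # different arguments, enqueued
```

After a job is popped (started), pushing the same class and arguments succeeds again, which mirrors the "unstarted job" condition.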
WARNING:
For [data consistency jobs](#job-data-consistency-strategies), the deduplication is not compatible with the
`data_consistency` attribute set to `:sticky` or `:delayed`.
The reason for this is that deduplication always takes into account the latest binary replication pointer, not the first one.
There is an [open issue](https://gitlab.com/gitlab-org/gitlab/-/issues/325291) to improve this.
### Ensuring a worker is idempotent
Make sure the worker tests pass using the following shared example:
......@@ -285,6 +279,55 @@ module AuthorizedProjectUpdate
end
```
### Deduplication with load balancing
> [Introduced](https://gitlab.com/groups/gitlab-org/-/epics/6763) in GitLab 14.4.
Jobs that declare either `:sticky` or `:delayed` data consistency
are eligible for database load-balancing.
In both cases, jobs are [scheduled in the future](#scheduling-jobs-in-the-future) with a short delay (1 second).
This minimizes the chance of replication lag after a write.
If you really want to deduplicate jobs eligible for load balancing,
specify the `including_scheduled: true` argument when defining the deduplication strategy:
```ruby
class DelayedIdempotentWorker
  include ApplicationWorker

  data_consistency :delayed
  deduplicate :until_executing, including_scheduled: true
  idempotent!

  # ...
end
```
#### Preserve the latest WAL location for idempotent jobs
> - [Introduced](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/69372) in GitLab 14.3.
> - [Enabled on GitLab.com](https://gitlab.com/gitlab-org/gitlab/-/issues/338350) in GitLab 14.4.
The deduplication always takes into account the latest binary replication pointer, not the first one.
This happens because we drop the same job scheduled for the second time, and its Write-Ahead Log (WAL) location is lost.
This could lead to comparing the old WAL location and reading from a stale replica.
To support both deduplication and maintaining data consistency with load balancing,
we are preserving the latest WAL location for idempotent jobs in Redis.
This way we are always comparing the latest binary replication pointer,
making sure that we read from the replica that is fully caught up.
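The idea can be sketched as follows, assuming PostgreSQL-style WAL locations such as `0/16B3748`. `Lsn`, `WalLocationStore`, `replica_caught_up?`, and the `job-key` string are hypothetical names for this illustration, and a plain Hash stands in for Redis; this is not GitLab's actual implementation.

```ruby
# Converts a PostgreSQL-style WAL location ("high/low" in hex) to an integer
# so that two locations can be compared.
module Lsn
  def self.to_i(lsn)
    high, low = lsn.split('/').map { |part| part.to_i(16) }
    (high << 32) + low
  end
end

# Hypothetical stand-in for Redis: keeps the latest WAL location per job key.
class WalLocationStore
  def initialize
    @locations = {}
  end

  # Called for the first job and for every dropped duplicate.
  # Only a newer location replaces the stored one.
  def record(key, wal_location)
    current = @locations[key]
    if current.nil? || Lsn.to_i(wal_location) > Lsn.to_i(current)
      @locations[key] = wal_location
    end
    @locations[key]
  end

  def latest(key)
    @locations[key]
  end
end

# A replica is safe to read from only if it has replayed up to the required location.
def replica_caught_up?(replica_location, required_location)
  Lsn.to_i(replica_location) >= Lsn.to_i(required_location)
end

store = WalLocationStore.new
store.record('job-key', '0/16B3748') # first job
store.record('job-key', '0/16B3A00') # duplicate dropped, newer location preserved

replica_location = '0/16B3800'
stale_check = replica_caught_up?(replica_location, '0/16B3748')              # first location only
safe_check  = replica_caught_up?(replica_location, store.latest('job-key')) # preserved latest location
```

In this sketch the replica has replayed past the first job's location but not past the duplicate's, so checking against only the first location would allow a stale read, while checking against the preserved latest location would not.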
FLAG:
On self-managed GitLab, by default this feature is not available.
To make it available,
ask an administrator to [enable the `preserve_latest_wal_locations_for_idempotent_jobs` flag](../administration/feature_flags.md).
However, this feature flag is related to GitLab development and is not intended to be used by GitLab administrators.
On GitLab.com, this feature is available but can be configured by GitLab.com administrators only.
## Limited capacity worker
It is possible to limit the number of concurrent running jobs for a worker class
......@@ -553,11 +596,6 @@ class DelayedWorker
end
```
For [idempotent jobs](#idempotent-jobs), the deduplication is not compatible with the
`data_consistency` attribute set to `:sticky` or `:delayed`.
The reason for this is that deduplication always takes into account the latest binary replication pointer, not the first one.
There is an [open issue](https://gitlab.com/gitlab-org/gitlab/-/issues/325291) to improve this.
### `feature_flag` property
The `feature_flag` property allows you to toggle a job's `data_consistency`,
......@@ -583,6 +621,12 @@ class DelayedWorker
end
```
### Data consistency with idempotent jobs
For [idempotent jobs](#idempotent-jobs) that declare either `:sticky` or `:delayed` data consistency, we are
[preserving the latest WAL location](#preserve-the-latest-wal-location-for-idempotent-jobs) while deduplicating,
ensuring that we read from the replica that is fully caught up.
## Jobs with External Dependencies
Most background jobs in the GitLab application communicate with other GitLab
......