Commit 76667af2 authored by Changzheng Liu, committed by Dylan Griffith

Remove target_name parameter from Elasticsearch rake tasks

parent 9da4bd72
......@@ -88,7 +88,7 @@ updated automatically.
Since Elasticsearch can read and use indices created in the previous major version, you don't need to change anything in the GitLab configuration when upgrading Elasticsearch.
The only thing worth noting is that if you have created your current index before GitLab 13.0, you might want to [reclaim the production index name](#reclaiming-the-gitlab-production-index-name) or reindex from scratch (which will implicitly create an alias). The latter might be faster depending on the GitLab instance size. Once you do that, you'll be able to perform zero-downtime reindexing and you will benefit from any future features that will make use of the alias.
The only thing worth noting is that if you have created your current index before GitLab 13.0, you might want to reindex from scratch (which will implicitly create an alias) in order to use some features, for example [Zero downtime reindexing](#zero-downtime-reindexing). Once you do that, you'll be able to perform zero-downtime reindexing and will benefit from any future features that make use of the alias.
## Elasticsearch repository indexer
......@@ -300,139 +300,12 @@ To disable the Elasticsearch integration:
## Zero downtime reindexing
The idea behind this reindexing method is to leverage Elasticsearch index alias
feature to atomically swap between two indices. We'll refer to each index as
`primary` (online and used by GitLab for read/writes) and `secondary`
(offline, for reindexing purpose).
Instead of connecting directly to the `primary` index, we'll set up an index
alias so that we can change the underlying index at will.
NOTE:
Any index attached to the production alias is deemed a `primary` and will be
used by the GitLab Advanced Search integration.
### Pause the indexing
In the **Admin Area > Settings > Advanced Search** section, select the
**Pause Elasticsearch Indexing** setting, and then save your change.
With this, all updates that should happen on your Elasticsearch index will be
buffered and caught up after resuming.
The indexing will also be automatically paused when the [**Trigger cluster reindexing**](#trigger-the-reindex-via-the-advanced-search-administration) button is used, and resumes when the reindexing completes or aborts.
### Setup
NOTE:
If your index was created with GitLab 13.0 or greater, you can directly
[trigger the reindex](#trigger-the-reindex-via-the-advanced-search-administration).
This process involves several shell commands and curl invocations, so a good
initial setup will help for later:
```shell
# You can find this value under Admin Area > Settings > Advanced Search > URL
export CLUSTER_URL="http://localhost:9200"
export PRIMARY_INDEX="gitlab-production"
export SECONDARY_INDEX="gitlab-production-$(date +%s)"
```
### Reclaiming the `gitlab-production` index name
WARNING:
It is highly recommended that you take a snapshot of your cluster to ensure
there is a recovery path if anything goes wrong.
Due to a technical limitation, there will be a slight downtime because of the
fact that we need to reclaim the current `primary` index to be used as the alias.
To reclaim the `gitlab-production` index name, you need to first create a `secondary` index and then trigger the re-index from `primary`.
#### Creating a secondary index
To create a secondary index, run the following Rake task. The `SKIP_ALIAS`
environment variable will disable the automatic creation of the Elasticsearch
alias, which would conflict with the existing index under `$PRIMARY_INDEX`, and will
not create a separate Issue index:
```shell
# Omnibus installation
sudo SKIP_ALIAS=1 gitlab-rake "gitlab:elastic:create_empty_index[$SECONDARY_INDEX]"
# Source installation
SKIP_ALIAS=1 bundle exec rake "gitlab:elastic:create_empty_index[$SECONDARY_INDEX]"
```
The index should be created successfully, with the latest index options and
mappings.
#### Trigger the re-index from `primary`
To trigger the re-index from `primary` index:
1. Use the Elasticsearch [Reindex API](https://www.elastic.co/guide/en/elasticsearch/reference/7.6/docs-reindex.html):
```shell
curl --request POST \
--header 'Content-Type: application/json' \
--data "{ \"source\": { \"index\": \"$PRIMARY_INDEX\" }, \"dest\": { \"index\": \"$SECONDARY_INDEX\" } }" \
"$CLUSTER_URL/_reindex?slices=auto&wait_for_completion=false"
```
There will be an output like:
```plaintext
{"task":"3qw_Tr0YQLq7PF16Xek8YA:1012"}
```
Note the `task` value, as it will be useful to follow the reindex progress.
1. Wait for the reindex process to complete by checking the `completed` value.
Using the `task` value from the previous step:
```shell
export TASK_ID=3qw_Tr0YQLq7PF16Xek8YA:1012
curl "$CLUSTER_URL/_tasks/$TASK_ID?pretty"
```
The output will be like:
```plaintext
{"completed":false, …}
```
After the returned value is `true`, continue to the next step.
1. Ensure that the secondary index has data in it. You can use the
Elasticsearch API to look for the index size and compare our two indices:
```shell
curl "$CLUSTER_URL/$PRIMARY_INDEX/_count"   # => 123123
curl "$CLUSTER_URL/$SECONDARY_INDEX/_count" # => 123123
```
NOTE:
Comparing the document count is more accurate than using the index size, as improvements to the storage might cause the new index to be smaller than the original one.
1. After you are confident your `secondary` index is valid, you can proceed to
the creation of the alias.
```shell
# Delete the original index
curl --request DELETE $CLUSTER_URL/$PRIMARY_INDEX
# Create the alias and add the `secondary` index to it
curl --request POST \
--header 'Content-Type: application/json' \
--data "{\"actions\":[{\"add\":{\"index\":\"$SECONDARY_INDEX\",\"alias\":\"$PRIMARY_INDEX\"}}]}" \
$CLUSTER_URL/_aliases
```
The reindexing is now completed. Your GitLab instance is now ready to use the [automated in-cluster reindexing](#trigger-the-reindex-via-the-advanced-search-administration) feature for future reindexing.
1. Resume the indexing
Under **Admin Area > Settings > Advanced Search**, uncheck the **Pause Elasticsearch Indexing** setting and save.
The idea behind this reindexing method is to leverage the [Elasticsearch reindex API](https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-reindex.html)
and Elasticsearch index alias feature to perform the operation. We set up an index alias which connects to a
`primary` index which is used by GitLab for reads/writes. When reindexing process starts, we temporarily pause
the writes to the `primary` index. Then, we create another index and invoke the Reindex API which migrates the
index data onto the new index. Once the reindexing job is complete, we switch to the new index by connecting the
index alias to it which becomes the new `primary` index. At the end, we unpause the writes and normal operation resumes.
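The alias switch described above can be sketched as a single request to the Elasticsearch `_aliases` endpoint that removes the alias from the old index and adds it to the new one in the same call. This is only an illustration of the request body, not GitLab's implementation: the `alias_swap_body` helper and the index names are hypothetical.

```ruby
require "json"

# Hypothetical helper (not part of GitLab): builds the body for
# POST /_aliases that moves an alias from one index to another.
# Elasticsearch applies all actions in one request atomically.
def alias_swap_body(alias_name:, old_index:, new_index:)
  {
    actions: [
      { remove: { index: old_index, alias: alias_name } },
      { add: { index: new_index, alias: alias_name } }
    ]
  }
end

# Index names below are made up for the example.
body = alias_swap_body(
  alias_name: "gitlab-production",
  old_index: "gitlab-production-old",
  new_index: "gitlab-production-new"
)

puts JSON.pretty_generate(body)
```

Because both actions travel in one request, readers of the alias never observe a moment where it points at no index.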
### Trigger the reindex via the Advanced Search administration
......@@ -538,17 +411,14 @@ The following are some available Rake tasks:
| [`sudo gitlab-rake gitlab:elastic:index_projects`](https://gitlab.com/gitlab-org/gitlab/blob/master/ee/lib/tasks/gitlab/elastic.rake) | Iterates over all projects and queues Sidekiq jobs to index them in the background. |
| [`sudo gitlab-rake gitlab:elastic:index_projects_status`](https://gitlab.com/gitlab-org/gitlab/blob/master/ee/lib/tasks/gitlab/elastic.rake) | Determines the overall status of the indexing. It is done by counting the total number of indexed projects, dividing by a count of the total number of projects, then multiplying by 100. |
| [`sudo gitlab-rake gitlab:elastic:clear_index_status`](https://gitlab.com/gitlab-org/gitlab/blob/master/ee/lib/tasks/gitlab/elastic.rake) | Deletes all instances of IndexStatus for all projects. Note that this command will result in a complete wipe of the index, and it should be used with caution. |
| [`sudo gitlab-rake gitlab:elastic:create_empty_index[<TARGET_NAME>]`](https://gitlab.com/gitlab-org/gitlab/blob/master/ee/lib/tasks/gitlab/elastic.rake) | Generates empty indexes (the default index and a separate issues index) and assigns an alias for each on the Elasticsearch side only if it doesn't already exist. |
| [`sudo gitlab-rake gitlab:elastic:delete_index[<TARGET_NAME>]`](https://gitlab.com/gitlab-org/gitlab/blob/master/ee/lib/tasks/gitlab/elastic.rake) | Removes the GitLab indexes and aliases (if they exist) on the Elasticsearch instance. |
| [`sudo gitlab-rake gitlab:elastic:recreate_index[<TARGET_NAME>]`](https://gitlab.com/gitlab-org/gitlab/blob/master/ee/lib/tasks/gitlab/elastic.rake) | Wrapper task for `gitlab:elastic:delete_index[<TARGET_NAME>]` and `gitlab:elastic:create_empty_index[<TARGET_NAME>]`. |
| [`sudo gitlab-rake gitlab:elastic:create_empty_index`](https://gitlab.com/gitlab-org/gitlab/blob/master/ee/lib/tasks/gitlab/elastic.rake) | Generates empty indexes (the default index and a separate issues index) and assigns an alias for each on the Elasticsearch side only if it doesn't already exist. |
| [`sudo gitlab-rake gitlab:elastic:delete_index`](https://gitlab.com/gitlab-org/gitlab/blob/master/ee/lib/tasks/gitlab/elastic.rake) | Removes the GitLab indexes and aliases (if they exist) on the Elasticsearch instance. |
| [`sudo gitlab-rake gitlab:elastic:recreate_index`](https://gitlab.com/gitlab-org/gitlab/blob/master/ee/lib/tasks/gitlab/elastic.rake) | Wrapper task for `gitlab:elastic:delete_index` and `gitlab:elastic:create_empty_index`. |
| [`sudo gitlab-rake gitlab:elastic:index_snippets`](https://gitlab.com/gitlab-org/gitlab/blob/master/ee/lib/tasks/gitlab/elastic.rake) | Performs an Elasticsearch import that indexes the snippets data. |
| [`sudo gitlab-rake gitlab:elastic:projects_not_indexed`](https://gitlab.com/gitlab-org/gitlab/blob/master/ee/lib/tasks/gitlab/elastic.rake) | Displays which projects are not indexed. |
| [`sudo gitlab-rake gitlab:elastic:reindex_cluster`](https://gitlab.com/gitlab-org/gitlab/blob/master/ee/lib/tasks/gitlab/elastic.rake) | Schedules a zero-downtime cluster reindexing task. This feature should be used with an index that was created after GitLab 13.0. |
| [`sudo gitlab-rake gitlab:elastic:mark_reindex_failed`](https://gitlab.com/gitlab-org/gitlab/blob/master/ee/lib/tasks/gitlab/elastic.rake)`] | Mark the most recent re-index job as failed. |
NOTE:
The `TARGET_NAME` parameter is optional and will use the default index/alias name from the current `RAILS_ENV` if not set.
### Environment variables
In addition to the Rake tasks, there are some environment variables that can be used to modify the process:
......
---
title: Remove target_name parameter from Elasticsearch rake tasks
merge_request: 52958
author:
type: changed
......@@ -60,14 +60,11 @@ namespace :gitlab do
end
desc "GitLab | Elasticsearch | Create empty indexes and assigns an alias for each"
task :create_empty_index, [:target_name] => [:environment] do |t, args|
task create_empty_index: [:environment] do |t, args|
with_alias = ENV["SKIP_ALIAS"].nil?
options = {}
# only create an index at the specified name
options[:index_name] = args[:target_name] unless with_alias
helper = Gitlab::Elastic::Helper.new(target_name: args[:target_name])
helper = Gitlab::Elastic::Helper.default
index_name = helper.create_empty_index(with_alias: with_alias, options: options)
# with_alias is used to support interacting with a specific index (such as when reclaiming the production index
......@@ -89,8 +86,8 @@ namespace :gitlab do
end
desc "GitLab | Elasticsearch | Delete all indexes"
task :delete_index, [:target_name] => [:environment] do |t, args|
helper = Gitlab::Elastic::Helper.new(target_name: args[:target_name])
task delete_index: [:environment] do |t, args|
helper = Gitlab::Elastic::Helper.default
if helper.delete_index
puts "Index/alias '#{helper.target_name}' has been deleted".color(:green)
......@@ -115,7 +112,7 @@ namespace :gitlab do
end
desc "GitLab | Elasticsearch | Recreate indexes"
task :recreate_index, [:target_name] => [:environment] do |t, args|
task recreate_index: [:environment] do |t, args|
Rake::Task["gitlab:elastic:delete_index"].invoke(*args)
Rake::Task["gitlab:elastic:create_empty_index"].invoke(*args)
end
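To illustrate the signature change in the hunks above, here is a minimal standalone sketch of the two Rake declaration styles. The task names `with_arg` and `without_arg` and the `received` hash are hypothetical and exist only for this example; the real tasks also depend on `:environment`, which is left out here.

```ruby
require "rake"

extend Rake::DSL

received = {}

# Old style: the task declares a positional :target_name argument.
task :with_arg, [:target_name] do |_t, args|
  received[:with_arg] = args[:target_name]
end

# New style: no declared arguments, so looking up :target_name yields nil.
task :without_arg do |_t, args|
  received[:without_arg] = args[:target_name]
end

Rake::Task[:with_arg].invoke("gitlab-test")
Rake::Task[:without_arg].invoke

puts received.inspect
```

With the argument gone, the tasks in this commit always operate on the default helper rather than on a caller-supplied name.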
......
......@@ -21,18 +21,14 @@ RSpec.describe 'gitlab:elastic namespace rake tasks', :elastic do
end
context 'when SKIP_ALIAS environment variable is set' do
let(:secondary_index_name) { "gitlab-test-#{Time.now.strftime("%Y%m%d-%H%M")}"}
before do
stub_env('SKIP_ALIAS', '1')
end
after do
es_helper.delete_index(index_name: secondary_index_name)
es_helper.client.indices.delete(index: "#{es_helper.target_name}*")
end
subject { run_rake_task('gitlab:elastic:create_empty_index', secondary_index_name) }
it 'does not alias the new index' do
expect { subject }.not_to change { es_helper.alias_exists?(name: es_helper.target_name) }
end
......@@ -44,10 +40,6 @@ RSpec.describe 'gitlab:elastic namespace rake tasks', :elastic do
expect { subject }.not_to change { es_helper.index_exists?(index_name: migration_index_name) }
end
it 'creates an index at the specified name' do
expect { subject }.to change { es_helper.index_exists?(index_name: secondary_index_name) }.from(false).to(true)
end
Gitlab::Elastic::Helper::ES_SEPARATE_CLASSES.each do |class_name|
describe "#{class_name}" do
it "does not create a standalone index" do
......