Since Elasticsearch can read and use indices created in the previous major version, you don't need to change anything in the GitLab configuration when upgrading Elasticsearch.
The only thing worth noting is that if you have created your current index before GitLab 13.0, you might want to [reclaim the production index name](#reclaiming-the-gitlab-production-index-name) or reindex from scratch (which will implicitly create an alias). The latter might be faster depending on the GitLab instance size. Once you do that, you'll be able to perform zero-downtime reindexing and you will benefit from any future features that will make use of the alias.
The only thing worth noting is that if you have created your current index before GitLab 13.0, you might want to reindex from scratch (which will implicitly create an alias) in order to use some features, for example [Zero downtime reindexing](#zero-downtime-reindexing). Once you do that, you'll be able to perform zero-downtime reindexing and will benefit from any future features that make use of the alias.
## Elasticsearch repository indexer
...
...
@@ -300,139 +300,12 @@ To disable the Elasticsearch integration:
## Zero downtime reindexing
The idea behind this reindexing method is to leverage Elasticsearch index alias
feature to atomically swap between two indices. We'll refer to each index as
`primary` (online and used by GitLab for read/writes) and `secondary`
(offline, for reindexing purpose).
Instead of connecting directly to the `primary` index, we'll setup an index
alias such as we can change the underlying index at will.
NOTE:
Any index attached to the production alias is deemed a `primary` and will be
used by the GitLab Advanced Search integration.
### Pause the indexing
In the **Admin Area > Settings > Advanced Search** section, select the
**Pause Elasticsearch Indexing** setting, and then save your change.
With this, all updates that should happen on your Elasticsearch index will be
buffered and caught up after resuming.
The indexing will also be automatically paused when the [**Trigger cluster reindexing**](#trigger-the-reindex-via-the-advanced-search-administration) button is used, and resumes when the reindexing completes or aborts.
### Setup
NOTE:
If your index was created with GitLab 13.0 or greater, you can directly
[trigger the reindex](#trigger-the-reindex-via-the-advanced-search-administration).
This process involves several shell commands and curl invocations, so a good
initial setup will help for later:
```shell
# You can find this value under Admin Area > Settings > Advanced Search > URL
Comparing the document count is more accurate than using the index size, as improvements to the storage might cause the new index to be smaller than the original one.
1. After you are confident your `secondary` index is valid, you can process to
the creation of the alias.
```shell
# Delete the original index
curl --request DELETE $CLUSTER_URL/$PRIMARY_INDEX
# Create the alias and add the `secondary` index to it
The reindexing is now completed. Your GitLab instance is now ready to use the [automated in-cluster reindexing](#trigger-the-reindex-via-the-advanced-search-administration) feature for future reindexing.
1. Resume the indexing
Under **Admin Area > Settings > Advanced Search**, uncheck the **Pause Elasticsearch Indexing** setting and save.
The idea behind this reindexing method is to leverage the [Elasticsearch reindex API](https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-reindex.html)
and Elasticsearch index alias feature to perform the operation. We set up an index alias which connects to a
`primary` index which is used by GitLab for reads/writes. When reindexing process starts, we temporarily pause
the writes to the `primary` index. Then, we create another index and invoke the Reindex API which migrates the
index data onto the new index. Once the reindexing job is complete, we switch to the new index by connecting the
index alias to it which becomes the new `primary` index. At the end, we unpause the writes and normal operation resumes.
### Trigger the reindex via the Advanced Search administration
...
...
@@ -538,17 +411,14 @@ The following are some available Rake tasks:
| [`sudo gitlab-rake gitlab:elastic:index_projects`](https://gitlab.com/gitlab-org/gitlab/blob/master/ee/lib/tasks/gitlab/elastic.rake) | Iterates over all projects and queues Sidekiq jobs to index them in the background. |
| [`sudo gitlab-rake gitlab:elastic:index_projects_status`](https://gitlab.com/gitlab-org/gitlab/blob/master/ee/lib/tasks/gitlab/elastic.rake) | Determines the overall status of the indexing. It is done by counting the total number of indexed projects, dividing by a count of the total number of projects, then multiplying by 100. |
| [`sudo gitlab-rake gitlab:elastic:clear_index_status`](https://gitlab.com/gitlab-org/gitlab/blob/master/ee/lib/tasks/gitlab/elastic.rake) | Deletes all instances of IndexStatus for all projects. Note that this command will result in a complete wipe of the index, and it should be used with caution. |
| [`sudo gitlab-rake gitlab:elastic:create_empty_index[<TARGET_NAME>]`](https://gitlab.com/gitlab-org/gitlab/blob/master/ee/lib/tasks/gitlab/elastic.rake) | Generates empty indexes (the default index and a separate issues index) and assigns an alias for each on the Elasticsearch side only if it doesn't already exist. |
| [`sudo gitlab-rake gitlab:elastic:delete_index[<TARGET_NAME>]`](https://gitlab.com/gitlab-org/gitlab/blob/master/ee/lib/tasks/gitlab/elastic.rake) | Removes the GitLab indexes and aliases (if they exist) on the Elasticsearch instance. |
| [`sudo gitlab-rake gitlab:elastic:recreate_index[<TARGET_NAME>]`](https://gitlab.com/gitlab-org/gitlab/blob/master/ee/lib/tasks/gitlab/elastic.rake) | Wrapper task for `gitlab:elastic:delete_index[<TARGET_NAME>]` and `gitlab:elastic:create_empty_index[<TARGET_NAME>]`. |
| [`sudo gitlab-rake gitlab:elastic:create_empty_index`](https://gitlab.com/gitlab-org/gitlab/blob/master/ee/lib/tasks/gitlab/elastic.rake) | Generates empty indexes (the default index and a separate issues index) and assigns an alias for each on the Elasticsearch side only if it doesn't already exist. |
| [`sudo gitlab-rake gitlab:elastic:delete_index`](https://gitlab.com/gitlab-org/gitlab/blob/master/ee/lib/tasks/gitlab/elastic.rake) | Removes the GitLab indexes and aliases (if they exist) on the Elasticsearch instance. |
| [`sudo gitlab-rake gitlab:elastic:recreate_index`](https://gitlab.com/gitlab-org/gitlab/blob/master/ee/lib/tasks/gitlab/elastic.rake) | Wrapper task for `gitlab:elastic:delete_index` and `gitlab:elastic:create_empty_index`. |
| [`sudo gitlab-rake gitlab:elastic:index_snippets`](https://gitlab.com/gitlab-org/gitlab/blob/master/ee/lib/tasks/gitlab/elastic.rake) | Performs an Elasticsearch import that indexes the snippets data. |
| [`sudo gitlab-rake gitlab:elastic:projects_not_indexed`](https://gitlab.com/gitlab-org/gitlab/blob/master/ee/lib/tasks/gitlab/elastic.rake) | Displays which projects are not indexed. |
| [`sudo gitlab-rake gitlab:elastic:reindex_cluster`](https://gitlab.com/gitlab-org/gitlab/blob/master/ee/lib/tasks/gitlab/elastic.rake) | Schedules a zero-downtime cluster reindexing task. This feature should be used with an index that was created after GitLab 13.0. |
| [`sudo gitlab-rake gitlab:elastic:mark_reindex_failed`](https://gitlab.com/gitlab-org/gitlab/blob/master/ee/lib/tasks/gitlab/elastic.rake)`] | Mark the most recent re-index job as failed. |
NOTE:
The `TARGET_NAME` parameter is optional and will use the default index/alias name from the current `RAILS_ENV` if not set.
### Environment variables
In addition to the Rake tasks, there are some environment variables that can be used to modify the process: