Commit bbde8d09 authored by Dmitry Gruzd

Advanced Search: Estimate ES cluster size

This change introduces a new Rake task, estimate_cluster_size,
which estimates the required Elasticsearch cluster size.
parent 2d12dc1c
......@@ -56,6 +56,12 @@ A few notes on CPU and storage:
to any spinning media for Elasticsearch. In testing, nodes that use SSD storage
see boosts in both query and indexing performance.
- We've introduced the [`estimate_cluster_size`](#gitlab-advanced-search-rake-tasks)
Rake task to estimate the Advanced Search storage requirements in advance, which
- The [`estimate_cluster_size`](#gitlab-advanced-search-rake-tasks) Rake task estimates the
Advanced Search storage requirements in advance. The Rake task uses total repository size
for the calculation. [Introduced](https://gitlab.com/gitlab-org/gitlab/-/issues/221177) in GitLab 13.10.
Keep in mind, these are **minimum requirements** for Elasticsearch.
Heavily-used Elasticsearch clusters will likely require considerably more
resources.
......@@ -421,8 +427,9 @@ The following are some available Rake tasks:
| [`sudo gitlab-rake gitlab:elastic:index_snippets`](https://gitlab.com/gitlab-org/gitlab/blob/master/ee/lib/tasks/gitlab/elastic.rake) | Performs an Elasticsearch import that indexes the snippets data. |
| [`sudo gitlab-rake gitlab:elastic:projects_not_indexed`](https://gitlab.com/gitlab-org/gitlab/blob/master/ee/lib/tasks/gitlab/elastic.rake) | Displays which projects are not indexed. |
| [`sudo gitlab-rake gitlab:elastic:reindex_cluster`](https://gitlab.com/gitlab-org/gitlab/blob/master/ee/lib/tasks/gitlab/elastic.rake) | Schedules a zero-downtime cluster reindexing task. This feature should be used with an index that was created after GitLab 13.0. |
| [`sudo gitlab-rake gitlab:elastic:mark_reindex_failed`](https://gitlab.com/gitlab-org/gitlab/blob/master/ee/lib/tasks/gitlab/elastic.rake)`] | Mark the most recent re-index job as failed. |
| [`sudo gitlab-rake gitlab:elastic:list_pending_migrations`](https://gitlab.com/gitlab-org/gitlab/blob/master/ee/lib/tasks/gitlab/elastic.rake)`] | List pending migrations. Pending migrations include those that have not yet started, have started but not finished, and those that are halted. |
| [`sudo gitlab-rake gitlab:elastic:mark_reindex_failed`](https://gitlab.com/gitlab-org/gitlab/blob/master/ee/lib/tasks/gitlab/elastic.rake) | Mark the most recent re-index job as failed. |
| [`sudo gitlab-rake gitlab:elastic:list_pending_migrations`](https://gitlab.com/gitlab-org/gitlab/blob/master/ee/lib/tasks/gitlab/elastic.rake) | List pending migrations. Pending migrations include those that have not yet started, have started but not finished, and those that are halted. |
| [`sudo gitlab-rake gitlab:elastic:estimate_cluster_size`](https://gitlab.com/gitlab-org/gitlab/blob/master/ee/lib/tasks/gitlab/elastic.rake) | Get an estimate of cluster size based on the total repository size. |
### Environment variables
......
---
title: 'Advanced Search: Estimate Elasticsearch cluster size'
merge_request: 54430
author:
type: added
......@@ -163,6 +163,21 @@ namespace :gitlab do
end
end
desc "GitLab | Elasticsearch | Estimate Cluster size"
task estimate_cluster_size: :environment do
  include ActionView::Helpers::NumberHelper
  total_size = Namespace::RootStorageStatistics.sum(:repository_size).to_i
  total_size_human = number_to_human_size(total_size, delimiter: ',', precision: 1, significant: false)
  estimated_cluster_size = total_size * 0.5
  estimated_cluster_size_human = number_to_human_size(estimated_cluster_size, delimiter: ',', precision: 1, significant: false)
  puts "This GitLab instance repository size is #{total_size_human}."
  puts "By our estimates for such repository size, your cluster size should be at least #{estimated_cluster_size_human}.".color(:green)
  puts 'Please note that it is possible to index only selected namespaces/projects by using Elasticsearch indexing restrictions.'
end
def project_id_batches(&blk)
relation = Project.all
......
......@@ -231,4 +231,18 @@ RSpec.describe 'gitlab:elastic namespace rake tasks', :elastic do
end
end
end
describe 'estimate_cluster_size' do
  subject { run_rake_task('gitlab:elastic:estimate_cluster_size') }
  before do
    create(:namespace_root_storage_statistics, repository_size: 1.megabyte)
    create(:namespace_root_storage_statistics, repository_size: 10.megabyte)
    create(:namespace_root_storage_statistics, repository_size: 30.megabyte)
  end
  it 'outputs estimates' do
    expect { subject }.to output(/your cluster size should be at least 20.5 MB/).to_stdout
  end
end
end
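
The arithmetic behind the task and its spec can be reproduced outside Rails. The following is a minimal sketch in plain Ruby, not the task itself. It assumes the estimate is the summed repository size multiplied by the 0.5 factor used in the task. The names `ESTIMATE_FACTOR`, `estimated_cluster_size_bytes`, and `to_megabytes` are hypothetical helpers introduced for illustration, and the rounding is a simplified stand-in for `number_to_human_size`.

```ruby
# Hypothetical constants for this sketch; 0.5 mirrors the multiplier in the Rake task.
ESTIMATE_FACTOR = 0.5
MEGABYTE = 1024 * 1024

# Sum the per-namespace repository sizes (in bytes) and apply the factor.
def estimated_cluster_size_bytes(repository_sizes)
  repository_sizes.sum * ESTIMATE_FACTOR
end

# Simplified formatting helper: bytes to megabytes with one decimal place.
def to_megabytes(bytes)
  (bytes.to_f / MEGABYTE).round(1)
end

# Same fixture sizes as the spec: 1 MB, 10 MB, and 30 MB.
sizes = [1, 10, 30].map { |mb| mb * MEGABYTE }
estimate = estimated_cluster_size_bytes(sizes)

puts "Repository size: #{to_megabytes(sizes.sum)} MB"
puts "Estimated minimum cluster size: #{to_megabytes(estimate)} MB"
```

With the spec's fixtures, the total is 41 MB, so the estimate is 20.5 MB, which matches the value the spec expects in the task output.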