Force Postgres to avoid trigram indexes when in a group

When filtering issues with a search string in a group, we observed on GitLab.com that Postgres was using an inefficient query plan, preferring the (global) trigram indexes on description and title, rather than using a filter on the restricted set of issues within the group. Change the callers of the IssuableFinder to use a CTE in this case to fence the rest of the query from the LIKE filters, so that the optimiser is forced to perform the filter in the order we prefer. This will only force the use of a CTE when: 1. The use_cte_for_search params is truthy. 2. We are using Postgres. 3. We have passed the `search` param. The third item is important - searching issues using the search box does not use the finder in this way, but contructs a query and appends `full_search` to that. For some reason, this query does not suffer from the same issue. Currenly, we only pass this param when filtering issuables (issues or MRs) in a group context.

Force Postgres to avoid trigram indexes when in a group
When filtering issues with a search string in a group, we observed on GitLab.com that Postgres was using an inefficient query plan, preferring the (global) trigram indexes on description and title, rather than using a filter on the restricted set of issues within the group. Change the callers of the IssuableFinder to use a CTE in this case to fence the rest of the query from the LIKE filters, so that the optimiser is forced to perform the filter in the order we prefer. This will only force the use of a CTE when: 1. The use_cte_for_search params is truthy. 2. We are using Postgres. 3. We have passed the `search` param. The third item is important - searching issues using the search box does not use the finder in this way, but contructs a query and appends `full_search` to that. For some reason, this query does not suffer from the same issue. Currenly, we only pass this param when filtering issuables (issues or MRs) in a group context.
c03386c3 · Sean McGivern · 57e6a98c · c03386c3 · c03386c3 · c03386c3
Commit c03386c3 authored Jun 05, 2018 by Sean McGivern
3 changed files
--- a/app/controllers/concerns/issuable_collections.rb
+++ b/app/controllers/concerns/issuable_collections.rb
@@ -95,6 +95,7 @@ module IssuableCollections
    elsif @group
      @filter_params[:group_id] = @group.id
      @filter_params[:include_subgroups] = true
+      @filter_params[:use_cte_for_search] = true
    end

    @filter_params.permit(finder_type.valid_params)

--- a/app/finders/issuable_finder.rb
+++ b/app/finders/issuable_finder.rb
@@ -23,6 +23,7 @@
 #     created_before: datetime
 #     updated_after: datetime
 #     updated_before: datetime
+#     use_cte_for_search: boolean
 #
 class IssuableFinder
  prepend FinderWithCrossProjectAccess
@@ -54,6 +55,7 @@ class IssuableFinder
      sort
      state
      include_subgroups
+      use_cte_for_search
    ]
  end

@@ -74,19 +76,21 @@ class IssuableFinder
    items = init_collection
    items = filter_items(items)

-    # Filtering by project HAS TO be the last because we use the project IDs yielded by the issuable query thus far
-    items = by_project(items)
+    # This has to be last as we may use a CTE as an optimization fence by
+    # passing the use_cte_for_search param
+    # https://www.postgresql.org/docs/current/static/queries-with.html
+    items = by_search(items)

    sort(items)
  end

  def filter_items(items)
+    items = by_project(items)
    items = by_scope(items)
    items = by_created_at(items)
    items = by_updated_at(items)
    items = by_state(items)
    items = by_group(items)
-    items = by_search(items)
    items = by_assignee(items)
    items = by_author(items)
    items = by_non_archived(items)
@@ -107,7 +111,6 @@ class IssuableFinder
  #
  def count_by_state
    count_params = params.merge(state: nil, sort: nil)
-    labels_count = label_names.any? ? label_names.count : 1
    finder = self.class.new(current_user, count_params)
    counts = Hash.new(0)

@@ -116,6 +119,11 @@ class IssuableFinder
    # per issuable, so we have to count those in Ruby - which is bad, but still
    # better than performing multiple queries.
    #
+    # This does not apply when we are using a CTE for the search, as the labels
+    # GROUP BY is inside the subquery in that case, so we set labels_count to 1.
+    labels_count = label_names.any? ? label_names.count : 1
+    labels_count = 1 if use_cte_for_search?
+
    finder.execute.reorder(nil).group(:state).count.each do |key, value|
      counts[Array(key).last.to_sym] += value / labels_count
    end
@@ -326,8 +334,24 @@ class IssuableFinder
    items
  end

+  def use_cte_for_search?
+    return false unless search
+    return false unless Gitlab::Database.postgresql?
+
+    params[:use_cte_for_search]
+  end
+
  def by_search(items)
-    search ? items.full_search(search) : items
+    return items unless search
+
+    if use_cte_for_search?
+      cte = Gitlab::SQL::RecursiveCTE.new(klass.table_name)
+      cte << items
+
+      items = klass.with(cte.to_arel).from(klass.table_name)
+    end
+
+    items.full_search(search)
  end

  def by_iids(items)

--- a/changelogs/unreleased/46648-timeout-searching-group-issues.yml
+++ b/changelogs/unreleased/46648-timeout-searching-group-issues.yml
+---
+title: Improve performance of group issues filtering on GitLab.com
+merge_request: 19429
+author:
+type: performance