Utility for efficient IN SQL queries

35555705 · Adam Hegyi · 3fb23ada · 35555705 · 35555705 · 35555705
Commit 35555705 authored Sep 14, 2021 by Adam Hegyi
14 changed files
--- a/doc/development/database/efficient_in_operator_queries.md
+++ b/doc/development/database/efficient_in_operator_queries.md
+---
+stage: Enablement
+group: Database
+info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#assignments
+---
+# Efficient `IN` operator queries
+This document describes a technique for building efficient ordered database queries with the `IN`
+SQL operator and the usage of a GitLab utility module to help apply the technique.
+NOTE:
+The described technique makes heavy use of
+[keyset pagination](pagination_guidelines.md#keyset-pagination).
+It's advised to get familiar with the topic first.
+## Motivation
+In GitLab, many domain objects like `Issue` live under nested hierarchies of projects and groups.
+To fetch nested database records for domain objects at the group-level,
+we often perform queries with the `IN` SQL operator.
+We are usually interested in ordering the records by some attributes
+and limiting the number of records using `ORDER BY` and `LIMIT` clauses for performance.
+Pagination may be used to fetch subsequent records.
+Example tasks requiring querying nested domain objects from the group level:
+- Show first 20 issues by creation date or due date from the group `gitlab-org`.
+- Show first 20 merge_requests by merged at date from the group `gitlab-com`.
+Unfortunately, ordered group-level queries typically perform badly
+as their executions require heavy I/O, memory, and computations.
+Let's do an in-depth examination of executing one such query.
+### Performance problems with `IN` queries
+Consider the task of fetching the twenty oldest created issues
+from the group `gitlab-org` with the following query:
+```sql
+SELECT "issues".*
+FROM "issues"
+WHERE "issues"."project_id" IN
+    (SELECT "projects"."id"
+     FROM "projects"
+     WHERE "projects"."namespace_id" IN
+         (SELECT traversal_ids[array_length(traversal_ids, 1)] AS id
+          FROM "namespaces"
+          WHERE (traversal_ids @> ('{9970}'))))
+ORDER BY "issues"."created_at" ASC,
+         "issues"."id" ASC
+LIMIT 20
+```
+NOTE:
+For pagination, ordering by the `created_at` column is not enough,
+we must add the `id` column as a
+[tie-breaker](pagination_performance_guidelines.md#tie-breaker-column).
+The execution of the query can be largely broken down into three steps:
+1. The database accesses both `namespaces` and `projects` tables
+   to find all projects from all groups in the group hierarchy.
+1. The database retrieves `issues` records for each project causing heavy disk I/O.
+   Ideally, an appropriate index configuration should optimize this process.
+1. The database sorts the `issues` rows in memory by `created_at` and returns `LIMIT 20` rows to
+   the end-user. For large groups, this final step requires both large memory and CPU resources.
+<details>
+<summary>Expand this sentence to see the execution plan for this DB query.</summary>
+<pre><code>
+ Limit  (cost=90170.07..90170.12 rows=20 width=1329) (actual time=967.597..967.607 rows=20 loops=1)
+   Buffers: shared hit=239127 read=3060
+   I/O Timings: read=336.879
+   ->  Sort  (cost=90170.07..90224.02 rows=21578 width=1329) (actual time=967.596..967.603 rows=20 loops=1)
+         Sort Key: issues.created_at, issues.id
+         Sort Method: top-N heapsort  Memory: 74kB
+         Buffers: shared hit=239127 read=3060
+         I/O Timings: read=336.879
+         ->  Nested Loop  (cost=1305.66..89595.89 rows=21578 width=1329) (actual time=4.709..797.659 rows=241534 loops=1)
+               Buffers: shared hit=239121 read=3060
+               I/O Timings: read=336.879
+               ->  HashAggregate  (cost=1305.10..1360.22 rows=5512 width=4) (actual time=4.657..5.370 rows=1528 loops=1)
+                     Group Key: projects.id
+                     Buffers: shared hit=2597
+                     ->  Nested Loop  (cost=576.76..1291.32 rows=5512 width=4) (actual time=2.427..4.244 rows=1528 loops=1)
+                           Buffers: shared hit=2597
+                           ->  HashAggregate  (cost=576.32..579.06 rows=274 width=25) (actual time=2.406..2.447 rows=265 loops=1)
+                                 Group Key: namespaces.traversal_ids[array_length(namespaces.traversal_ids, 1)]
+                                 Buffers: shared hit=334
+                                 ->  Bitmap Heap Scan on namespaces  (cost=141.62..575.63 rows=274 width=25) (actual time=1.933..2.330 rows=265 loops=1)
+                                       Recheck Cond: (traversal_ids @> '{9970}'::integer[])
+                                       Heap Blocks: exact=243
+                                       Buffers: shared hit=334
+                                       ->  Bitmap Index Scan on index_namespaces_on_traversal_ids  (cost=0.00..141.55 rows=274 width=0) (actual time=1.897..1.898 rows=265 loops=1)
+                                             Index Cond: (traversal_ids @> '{9970}'::integer[])
+                                             Buffers: shared hit=91
+                           ->  Index Only Scan using index_projects_on_namespace_id_and_id on projects  (cost=0.44..2.40 rows=20 width=8) (actual time=0.004..0.006 rows=6 loops=265)
+                                 Index Cond: (namespace_id = (namespaces.traversal_ids)[array_length(namespaces.traversal_ids, 1)])
+                                 Heap Fetches: 51
+                                 Buffers: shared hit=2263
+               ->  Index Scan using index_issues_on_project_id_and_iid on issues  (cost=0.57..10.57 rows=544 width=1329) (actual time=0.114..0.484 rows=158 loops=1528)
+                     Index Cond: (project_id = projects.id)
+                     Buffers: shared hit=236524 read=3060
+                     I/O Timings: read=336.879
+ Planning Time: 7.750 ms
+ Execution Time: 967.973 ms
+(36 rows)
+</code></pre>
+</details>
+The performance of the query depends on the number of rows in the database.
+On average, we can say the following:
+- Number of groups in the group-hierarchy: less than 1 000
+- Number of projects: less than 5 000
+- Number of issues: less than 100 000
+From the list, it's apparent that the number of `issues` records has
+the largest impact on the performance.
+As per normal usage, we can say that the number of issue records grows
+at a faster rate than the `namespaces` and the `projects` records.
+This problem affects most of our group-level features where records are listed
+in a specific order, such as group-level issues, merge requests pages, and APIs.
+For very large groups the database queries can easily time out, causing HTTP 500 errors.
+## Optimizing ordered `IN` queries
+In the talk
+["How to teach an elephant to dance rock'n'roll"](https://www.youtube.com/watch?v=Ha38lcjVyhQ),
+Maxim Boguk demonstrated a technique to optimize a special class of ordered `IN` queries,
+such as our ordered group-level queries.
+A typical ordered `IN` query may look like this:
+```sql
+SELECT t.* FROM t
+WHERE t.fkey IN (value_set)
+ORDER BY t.pkey
+LIMIT N;
+```
+Here's the key insight used in the technique: we need at most `|value_set| + N` record lookups,
+rather than retrieving all records satisfying the condition `t.fkey IN value_set` (`|value_set|`
+is the number of values in `value_set`).
+We adopted and generalized the technique for use in GitLab by implementing utilities in the
+`Gitlab::Pagination::Keyset::InOperatorOptimization` class to facilitate building efficient `IN`
+queries.
+### Requirements
+The technique is not a drop-in replacement for the existing group-level queries using `IN` operator.
+The technique can only optimize `IN` queries that satisfy the following requirements:
+- `LIMIT` is present, which usually means that the query is paginated
+  (offset or keyset pagination).
+- The column used with the `IN` query and the columns in the `ORDER BY`
+  clause are covered with a database index. The columns in the index must be
+  in the following order: `column_for_the_in_query`, `order by column 1`, and
+  `order by column 2`.
+- The columns in the `ORDER BY` clause are distinct
+  (the combination of the columns uniquely identifies one particular column in the table).
+WARNING:
+This technique will not improve the performance of the `COUNT(*)` queries.
+## The `InOperatorOptimization` module
+> [Introduced](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/67352) in GitLab 14.3.
+The `Gitlab::Pagination::Keyset::InOperatorOptimization` module implements utilities for applying a generalized version of
+the efficient `IN` query technique described in the previous section.
+To build optimized, ordered `IN` queries that meet [the requirements](#requirements),
+use the utility class `QueryBuilder` from the module.
+NOTE:
+The generic keyset pagination module introduced in the merge request
+[51481](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/51481)
+plays a fundamental role in the generalized implementation of the technique
+in `Gitlab::Pagination::Keyset::InOperatorOptimization`.
+### Basic usage of `QueryBuilder`
+To illustrate a basic usage, we will build a query that
+fetches 20 issues with the oldest `created_at` from the group `gitlab-org`.
+The following ActiveRecord query would produce a query similar to
+[the unoptimized query](#performance-problems-with-in-queries) that we examined earlier:
+```ruby
+scope = Issue
+  .where(project_id: Group.find(9970).all_projects.select(:id)) # `gitlab-org` group and its subgroups
+  .order(:created_at, :id)
+  .limit(20)
+```
+Instead, use the query builder `InOperatorOptimization::QueryBuilder` to produce an optimized
+version:
+```ruby
+scope = Issue.order(:created_at, :id)
+array_scope = Group.find(9970).all_projects.select(:id)
+array_mapping_scope = -> (id_expression) { Issue.where(Issue.arel_table[:project_id].eq(id_expression)) }
+finder_query = -> (created_at_expression, id_expression) { Issue.where(Issue.arel_table[:id].eq(id_expression)) }
+Gitlab::Pagination::Keyset::InOperatorOptimization::QueryBuilder.new(
+  scope: scope,
+  array_scope: array_scope,
+  array_mapping_scope: array_mapping_scope,
+  finder_query: finder_query
+).execute.limit(20)
+```
+- `scope` represents the original `ActiveRecord::Relation` object without the `IN` query. The
+  relation should define an order which must be supported by the
+  [keyset pagination library](keyset_pagination.md#usage).
+- `array_scope` contains the `ActiveRecord::Relation` object, which represents the original
+  `IN (subquery)`. The select values must contain the columns by which the subquery is "connected"
+  to the main query: the `id` of the project record.
+- `array_mapping_scope` defines a lambda returning an `ActiveRecord::Relation` object. The lambda
+  matches (`=`) single select values from the `array_scope`. The lambda yields as many
+  arguments as the select values defined in the `array_scope`. The arguments are Arel SQL expressions.
+- `finder_query` loads the actual record row from the database. It must also be a lambda, where
+  the order by column expressions is available for locating the record. In this example, the
+  yielded values are `created_at` and `id` SQL expressions. Finding a record is very fast via the
+  primary key, so we don't use the `created_at` value.
+The following database index on the `issues` table must be present
+to make the query execute efficiently:
+```sql
+"idx_issues_on_project_id_and_created_at_and_id" btree (project_id, created_at, id)
+```
+<details>
+<summary>Expand this sentence to see the SQL query.</summary>
+<pre><code>
+SELECT "issues".*
+FROM
+  (WITH RECURSIVE "array_cte" AS MATERIALIZED
+     (SELECT "projects"."id"
+ FROM "projects"
+ WHERE "projects"."namespace_id" IN
+     (SELECT traversal_ids[array_length(traversal_ids, 1)] AS id
+      FROM "namespaces"
+      WHERE (traversal_ids @> ('{9970}')))),
+                  "recursive_keyset_cte" AS (  -- initializer row start
+                                               (SELECT NULL::issues AS records,
+                                                       array_cte_id_array,
+                                                       issues_created_at_array,
+                                                       issues_id_array,
+                                                       0::bigint AS COUNT
+                                                FROM
+                                                  (SELECT ARRAY_AGG("array_cte"."id") AS array_cte_id_array,
+                                                          ARRAY_AGG("issues"."created_at") AS issues_created_at_array,
+                                                          ARRAY_AGG("issues"."id") AS issues_id_array
+                                                   FROM
+                                                     (SELECT "array_cte"."id"
+                                                      FROM array_cte) array_cte
+                                                   LEFT JOIN LATERAL
+                                                     (SELECT "issues"."created_at",
+                                                             "issues"."id"
+                                                      FROM "issues"
+                                                      WHERE "issues"."project_id" = "array_cte"."id"
+                                                      ORDER BY "issues"."created_at" ASC, "issues"."id" ASC
+                                                      LIMIT 1) issues ON TRUE
+                                                   WHERE "issues"."created_at" IS NOT NULL
+                                                     AND "issues"."id" IS NOT NULL) array_scope_lateral_query
+                                                LIMIT 1)
+                                                -- initializer row finished
+                                             UNION ALL
+                                               (SELECT
+                                                  -- result row start
+                                                  (SELECT issues -- record finder query as the first column
+                                                   FROM "issues"
+                                                   WHERE "issues"."id" = recursive_keyset_cte.issues_id_array[position]
+                                                   LIMIT 1),
+                                                   array_cte_id_array,
+                                                   recursive_keyset_cte.issues_created_at_array[:position_query.position-1]||next_cursor_values.created_at||recursive_keyset_cte.issues_created_at_array[position_query.position+1:],
+                                                   recursive_keyset_cte.issues_id_array[:position_query.position-1]||next_cursor_values.id||recursive_keyset_cte.issues_id_array[position_query.position+1:],
+                                                   recursive_keyset_cte.count + 1
+                                                -- result row finished
+                                                FROM recursive_keyset_cte,
+                                                     LATERAL
+                                                  -- finding the cursor values of the next record start
+                                                  (SELECT created_at,
+                                                          id,
+                                                          position
+                                                   FROM UNNEST(issues_created_at_array, issues_id_array) WITH
+                                                   ORDINALITY AS u(created_at, id, position)
+                                                   WHERE created_at IS NOT NULL
+                                                     AND id IS NOT NULL
+                                                   ORDER BY "created_at" ASC, "id" ASC
+                                                   LIMIT 1) AS position_query,
+                                                  -- finding the cursor values of the next record end
+                                                  -- finding the next cursor values (next_cursor_values_query) start
+                                                             LATERAL
+                                                  (SELECT "record"."created_at",
+                                                          "record"."id"
+                                                   FROM (
+                                                         VALUES (NULL,
+                                                                 NULL)) AS nulls
+                                                   LEFT JOIN
+                                                     (SELECT "issues"."created_at",
+                                                             "issues"."id"
+                                                      FROM (
+                                                              (SELECT "issues"."created_at",
+                                                                      "issues"."id"
+                                                               FROM "issues"
+                                                               WHERE "issues"."project_id" = recursive_keyset_cte.array_cte_id_array[position]
+                                                                 AND recursive_keyset_cte.issues_created_at_array[position] IS NULL
+                                                                 AND "issues"."created_at" IS NULL
+                                                                 AND "issues"."id" > recursive_keyset_cte.issues_id_array[position]
+                                                               ORDER BY "issues"."created_at" ASC, "issues"."id" ASC)
+                                                            UNION ALL
+                                                              (SELECT "issues"."created_at",
+                                                                      "issues"."id"
+                                                               FROM "issues"
+                                                               WHERE "issues"."project_id" = recursive_keyset_cte.array_cte_id_array[position]
+                                                                 AND recursive_keyset_cte.issues_created_at_array[position] IS NOT NULL
+                                                                 AND "issues"."created_at" IS NULL
+                                                               ORDER BY "issues"."created_at" ASC, "issues"."id" ASC)
+                                                            UNION ALL
+                                                              (SELECT "issues"."created_at",
+                                                                      "issues"."id"
+                                                               FROM "issues"
+                                                               WHERE "issues"."project_id" = recursive_keyset_cte.array_cte_id_array[position]
+                                                                 AND recursive_keyset_cte.issues_created_at_array[position] IS NOT NULL
+                                                                 AND "issues"."created_at" > recursive_keyset_cte.issues_created_at_array[position]
+                                                               ORDER BY "issues"."created_at" ASC, "issues"."id" ASC)
+                                                            UNION ALL
+                                                              (SELECT "issues"."created_at",
+                                                                      "issues"."id"
+                                                               FROM "issues"
+                                                               WHERE "issues"."project_id" = recursive_keyset_cte.array_cte_id_array[position]
+                                                                 AND recursive_keyset_cte.issues_created_at_array[position] IS NOT NULL
+                                                                 AND "issues"."created_at" = recursive_keyset_cte.issues_created_at_array[position]
+                                                                 AND "issues"."id" > recursive_keyset_cte.issues_id_array[position]
+                                                               ORDER BY "issues"."created_at" ASC, "issues"."id" ASC)) issues
+                                                      ORDER BY "issues"."created_at" ASC, "issues"."id" ASC
+                                                      LIMIT 1) record ON TRUE
+                                                   LIMIT 1) AS next_cursor_values))
+                                                  -- finding the next cursor values (next_cursor_values_query) END
+SELECT (records).*
+   FROM "recursive_keyset_cte" AS "issues"
+   WHERE (COUNT <> 0)) issues -- filtering out the initializer row
+LIMIT 20
+</code></pre>
+</details>
+### Using the `IN` query optimization
+#### Adding more filters
+In this example, let's add an extra filter by `milestone_id`.
+Be careful when adding extra filters to the query. If the column is not covered by the same index,
+then the query might perform worse than the non-optimized query. The `milestone_id` column in the
+`issues` table is currently covered by a different index:
+```sql
+"index_issues_on_milestone_id" btree (milestone_id)
+```
+Adding the `miletone_id = X` filter to the `scope` argument or to the optimized scope causes bad performance.
+Example (bad):
+```ruby
+Gitlab::Pagination::Keyset::InOperatorOptimization::QueryBuilder.new(
+  scope: scope,
+  array_scope: array_scope,
+  array_mapping_scope: array_mapping_scope,
+  finder_query: finder_query
+).execute
+  .where(milestone_id: 5)
+  .limit(20)
+```
+To address this concern, we could define another index:
+```sql
+"idx_issues_on_project_id_and_milestone_id_and_created_at_and_id" btree (project_id, milestone_id, created_at, id)
+```
+Adding more indexes to the `issues` table could significantly affect the performance of
+the `UPDATE` queries. In this case, it's better to rely on the original query. It means that if we
+want to use the optimization for the unfiltered page we need to add extra logic in the application code:
+```ruby
+if optimization_possible? # no extra params or params covered with the same index as the ORDER BY clause
+  run_optimized_query
+else
+  run_normal_in_query
+end
+```
+#### Multiple `IN` queries
+Let's assume that we want to extend the group-level queries to include only incident and test case
+issue types.
+The original ActiveRecord query would look like this:
+```ruby
+scope = Issue
+  .where(project_id: Group.find(9970).all_projects.select(:id)) # `gitlab-org` group and its subgroups
+  .where(issue_type: [:incident, :test_case]) # 1, 2
+  .order(:created_at, :id)
+  .limit(20)
+```
+To construct the array scope, we'll need to take the Cartesian product of the `project_id IN` and
+the `issue_type IN` queries. `issue_type` is an ActiveRecord enum, so we need to
+construct the following table:
+| `project_id` | `issue_type_value` |
+| ------------ | ------------------ |
+| 2            | 1                  |
+| 2            | 2                  |
+| 5            | 1                  |
+| 5            | 2                  |
+| 10           | 1                  |
+| 10           | 2                  |
+| 9            | 1                  |
+| 9            | 2                  |
+For the `issue_types` query we can construct a value list without querying a table:
+```ruby
+value_list = Arel::Nodes::ValuesList.new([[Issue.issue_types[:incident]],[Issue.issue_types[:test_case]]])
+issue_type_values = Arel::Nodes::Grouping.new(value_list).as('issue_type_values (value)').to_sql
+array_scope = Group
+  .find(9970)
+  .all_projects
+  .from("#{Project.table_name}, #{issue_type_values}")
+  .select(:id, :value)
+```
+Building the `array_mapping_scope` query requires two arguments: `id` and `issue_type_value`:
+```ruby
+array_mapping_scope = -> (id_expression, issue_type_value_expression) { Issue.where(Issue.arel_table[:project_id].eq(id_expression)).where(Issue.arel_table[:issue_type].eq(issue_type_value_expression)) }
+```
+The `scope` and the `finder` queries don't change:
+```ruby
+scope = Issue.order(:created_at, :id)
+finder_query = -> (created_at_expression, id_expression) { Issue.where(Issue.arel_table[:id].eq(id_expression)) }
+Gitlab::Pagination::Keyset::InOperatorOptimization::QueryBuilder.new(
+  scope: scope,
+  array_scope: array_scope,
+  array_mapping_scope: array_mapping_scope,
+  finder_query: finder_query
+).execute.limit(20)
+```
+<details>
+<summary>Expand this sentence to see the SQL query.</summary>
+<pre><code lang='sql'>
+SELECT "issues".*
+FROM
+  (WITH RECURSIVE "array_cte" AS MATERIALIZED
+     (SELECT "projects"."id", "value"
+      FROM projects, (
+                      VALUES (1), (2)) AS issue_type_values (value)
+      WHERE "projects"."namespace_id" IN
+          (WITH RECURSIVE "base_and_descendants" AS (
+                                                       (SELECT "namespaces".*
+                                                        FROM "namespaces"
+                                                        WHERE "namespaces"."type" = 'Group'
+                                                          AND "namespaces"."id" = 9970)
+                                                     UNION
+                                                       (SELECT "namespaces".*
+                                                        FROM "namespaces", "base_and_descendants"
+                                                        WHERE "namespaces"."type" = 'Group'
+                                                          AND "namespaces"."parent_id" = "base_and_descendants"."id")) SELECT "id"
+           FROM "base_and_descendants" AS "namespaces")),
+                  "recursive_keyset_cte" AS (
+                                               (SELECT NULL::issues AS records,
+                                                       array_cte_id_array,
+                                                       array_cte_value_array,
+                                                       issues_created_at_array,
+                                                       issues_id_array,
+                                                       0::bigint AS COUNT
+                                                FROM
+                                                  (SELECT ARRAY_AGG("array_cte"."id") AS array_cte_id_array,
+                                                          ARRAY_AGG("array_cte"."value") AS array_cte_value_array,
+                                                          ARRAY_AGG("issues"."created_at") AS issues_created_at_array,
+                                                          ARRAY_AGG("issues"."id") AS issues_id_array
+                                                   FROM
+                                                     (SELECT "array_cte"."id",
+                                                             "array_cte"."value"
+                                                      FROM array_cte) array_cte
+                                                   LEFT JOIN LATERAL
+                                                     (SELECT "issues"."created_at",
+                                                             "issues"."id"
+                                                      FROM "issues"
+                                                      WHERE "issues"."project_id" = "array_cte"."id"
+                                                        AND "issues"."issue_type" = "array_cte"."value"
+                                                      ORDER BY "issues"."created_at" ASC, "issues"."id" ASC
+                                                      LIMIT 1) issues ON TRUE
+                                                   WHERE "issues"."created_at" IS NOT NULL
+                                                     AND "issues"."id" IS NOT NULL) array_scope_lateral_query
+                                                LIMIT 1)
+                                             UNION ALL
+                                               (SELECT
+                                                  (SELECT issues
+                                                   FROM "issues"
+                                                   WHERE "issues"."id" = recursive_keyset_cte.issues_id_array[POSITION]
+                                                   LIMIT 1), array_cte_id_array,
+                                                             array_cte_value_array,
+                                                             recursive_keyset_cte.issues_created_at_array[:position_query.position-1]||next_cursor_values.created_at||recursive_keyset_cte.issues_created_at_array[position_query.position+1:], recursive_keyset_cte.issues_id_array[:position_query.position-1]||next_cursor_values.id||recursive_keyset_cte.issues_id_array[position_query.position+1:], recursive_keyset_cte.count + 1
+                                                FROM recursive_keyset_cte,
+                                                     LATERAL
+                                                  (SELECT created_at,
+                                                          id,
+                                                          POSITION
+                                                   FROM UNNEST(issues_created_at_array, issues_id_array) WITH
+                                                   ORDINALITY AS u(created_at, id, POSITION)
+                                                   WHERE created_at IS NOT NULL
+                                                     AND id IS NOT NULL
+                                                   ORDER BY "created_at" ASC, "id" ASC
+                                                   LIMIT 1) AS position_query,
+                                                             LATERAL
+                                                  (SELECT "record"."created_at",
+                                                          "record"."id"
+                                                   FROM (
+                                                         VALUES (NULL,
+                                                                 NULL)) AS nulls
+                                                   LEFT JOIN
+                                                     (SELECT "issues"."created_at",
+                                                             "issues"."id"
+                                                      FROM (
+                                                              (SELECT "issues"."created_at",
+                                                                      "issues"."id"
+                                                               FROM "issues"
+                                                               WHERE "issues"."project_id" = recursive_keyset_cte.array_cte_id_array[POSITION]
+                                                                 AND "issues"."issue_type" = recursive_keyset_cte.array_cte_value_array[POSITION]
+                                                                 AND recursive_keyset_cte.issues_created_at_array[POSITION] IS NULL
+                                                                 AND "issues"."created_at" IS NULL
+                                                                 AND "issues"."id" > recursive_keyset_cte.issues_id_array[POSITION]
+                                                               ORDER BY "issues"."created_at" ASC, "issues"."id" ASC)
+                                                            UNION ALL
+                                                              (SELECT "issues"."created_at",
+                                                                      "issues"."id"
+                                                               FROM "issues"
+                                                               WHERE "issues"."project_id" = recursive_keyset_cte.array_cte_id_array[POSITION]
+                                                                 AND "issues"."issue_type" = recursive_keyset_cte.array_cte_value_array[POSITION]
+                                                                 AND recursive_keyset_cte.issues_created_at_array[POSITION] IS NOT NULL
+                                                                 AND "issues"."created_at" IS NULL
+                                                               ORDER BY "issues"."created_at" ASC, "issues"."id" ASC)
+                                                            UNION ALL
+                                                              (SELECT "issues"."created_at",
+                                                                      "issues"."id"
+                                                               FROM "issues"
+                                                               WHERE "issues"."project_id" = recursive_keyset_cte.array_cte_id_array[POSITION]
+                                                                 AND "issues"."issue_type" = recursive_keyset_cte.array_cte_value_array[POSITION]
+                                                                 AND recursive_keyset_cte.issues_created_at_array[POSITION] IS NOT NULL
+                                                                 AND "issues"."created_at" > recursive_keyset_cte.issues_created_at_array[POSITION]
+                                                               ORDER BY "issues"."created_at" ASC, "issues"."id" ASC)
+                                                            UNION ALL
+                                                              (SELECT "issues"."created_at",
+                                                                      "issues"."id"
+                                                               FROM "issues"
+                                                               WHERE "issues"."project_id" = recursive_keyset_cte.array_cte_id_array[POSITION]
+                                                                 AND "issues"."issue_type" = recursive_keyset_cte.array_cte_value_array[POSITION]
+                                                                 AND recursive_keyset_cte.issues_created_at_array[POSITION] IS NOT NULL
+                                                                 AND "issues"."created_at" = recursive_keyset_cte.issues_created_at_array[POSITION]
+                                                                 AND "issues"."id" > recursive_keyset_cte.issues_id_array[POSITION]
+                                                               ORDER BY "issues"."created_at" ASC, "issues"."id" ASC)) issues
+                                                      ORDER BY "issues"."created_at" ASC, "issues"."id" ASC
+                                                      LIMIT 1) record ON TRUE
+                                                   LIMIT 1) AS next_cursor_values)) SELECT (records).*
+   FROM "recursive_keyset_cte" AS "issues"
+   WHERE (COUNT <> 0)) issues
+LIMIT 20
+</code>
+</details>
+NOTE:
+To make the query efficient, the following columns need to be covered with an index: `project_id`, `issue_type`, `created_at`, and `id`.
+#### Batch iteration
+Batch iteration over the records is possible via the keyset `Iterator` class.
+```ruby
+scope = Issue.order(:created_at, :id)
+array_scope = Group.find(9970).all_projects.select(:id)
+array_mapping_scope = -> (id_expression) { Issue.where(Issue.arel_table[:project_id].eq(id_expression)) }
+finder_query = -> (created_at_expression, id_expression) { Issue.where(Issue.arel_table[:id].eq(id_expression)) }
+opts = {
+  in_operator_optimization_options: {
+    array_scope: array_scope,
+    array_mapping_scope: array_mapping_scope,
+    finder_query: finder_query
+  }
+}
+Gitlab::Pagination::Keyset::Iterator.new(scope: scope, **opts).each_batch(of: 100) do |records|
+  puts records.select(:id).map { |r| [r.id] }
+end
+```
+#### Keyset pagination
+The optimization works out of the box with GraphQL and the `keyset_paginate` helper method.
+Read more about [keyset pagination](database/keyset_pagination.md).
+```ruby
+array_scope = Group.find(9970).all_projects.select(:id)
+array_mapping_scope = -> (id_expression) { Issue.where(Issue.arel_table[:project_id].eq(id_expression)) }
+finder_query = -> (created_at_expression, id_expression) { Issue.where(Issue.arel_table[:id].eq(id_expression)) }
+opts = {
+  in_operator_optimization_options: {
+    array_scope: array_scope,
+    array_mapping_scope: array_mapping_scope,
+    finder_query: finder_query
+  }
+}
+issues = Issue
+  .order(:created_at, :id)
+  .keyset_paginate(per_page: 20, keyset_order_options: opts)
+  .records
+```
+#### Offset pagination with Kaminari
+The `ActiveRecord` scope produced by the `InOperatorOptimization` class can be used in
+[offset-paginated](database/pagination_guidelines.md#offset-pagination)
+queries.
+```ruby
+Gitlab::Pagination::Keyset::InOperatorOptimization::QueryBuilder
+  .new(...)
+  .execute
+  .page(1)
+  .per(20)
+  .without_count
+```
+## Generalized `IN` optimization technique
+Let's dive into how `QueryBuilder` builds the optimized query
+to fetch the twenty oldest created issues from the group `gitlab-org`
+using the generalized `IN` optimization technique.
+### Array CTE
+As the first step, we use a common table expression (CTE) for collecting the `projects.id` values.
+This is done by wrapping the incoming `array_scope` ActiveRecord relation parameter with a CTE.
+```sql
+WITH array_cte AS MATERIALIZED (
+  SELECT "projects"."id"
+   FROM "projects"
+   WHERE "projects"."namespace_id" IN
+       (SELECT traversal_ids[array_length(traversal_ids, 1)] AS id
+        FROM "namespaces"
+        WHERE (traversal_ids @> ('{9970}')))
+)
+```
+This query produces the following result set with only one column (`projects.id`):
+| ID  |
+| --- |
+| 9   |
+| 2   |
+| 5   |
+| 10  |
+### Array mapping
+For each project (that is, each record storing a project ID in `array_cte`),
+we will fetch the cursor value identifying the first issue respecting the `ORDER BY` clause.
+As an example, let's pick the first record `ID=9` from `array_cte`.
+The following query should fetch the cursor value `(created_at, id)` identifying
+the first issue record respecting the `ORDER BY` clause for the project with `ID=9`:
+```sql
+SELECT "issues"."created_at", "issues"."id"
+FROM "issues"."project_id"=9
+ORDER BY "issues"."created_at" ASC, "issues"."id" ASC
+LIMIT 1;
+```
+We will use `LATERAL JOIN` to loop over the records in the `array_cte` and find the
+cursor value for each project. The query would be built using the `array_mapping_scope` lambda
+function.
+```sql
+SELECT ARRAY_AGG("array_cte"."id") AS array_cte_id_array,
+  ARRAY_AGG("issues"."created_at") AS issues_created_at_array,
+  ARRAY_AGG("issues"."id") AS issues_id_array
+FROM (
+  SELECT "array_cte"."id" FROM array_cte
+) array_cte
+LEFT JOIN LATERAL
+(
+  SELECT "issues"."created_at", "issues"."id"
+  FROM "issues"
+  WHERE "issues"."project_id" = "array_cte"."id"
+  ORDER BY "issues"."created_at" ASC, "issues"."id" ASC
+  LIMIT 1
+) issues ON TRUE
+```
+Since we have an index on `project_id`, `created_at`, and `id`,
+index-only scans should quickly locate all the cursor values.
+This is how the query could be translated to Ruby:
+```ruby
+created_at_values = []
+id_values = []
+project_ids.map do |project_id|
+  created_at, id = Issue.select(:created_at, :id).where(project_id: project_id).order(:created_at, :id).limit(1).first # N+1 but it's fast
+  created_at_values << created_at
+  id_values << id
+end
+```
+This is what the result set would look like:
+| `project_ids` | `created_at_values` | `id_values` |
+| ------------- | ------------------- | ----------- |
+| 2             | 2020-01-10          | 5           |
+| 5             | 2020-01-05          | 4           |
+| 10            | 2020-01-15          | 7           |
+| 9             | 2020-01-05          | 3           |
+The table shows the cursor values (`created_at, id`) of the first record for each project
+respecting the `ORDER BY` clause.
+At this point, we have the initial data. To start collecting the actual records from the database,
+we'll use a recursive CTE query where each recursion locates one row until
+the `LIMIT` is reached or no more data can be found.
+Here's an outline of the steps we will take in the recursive CTE query
+(expressing the steps in SQL is non-trivial but will be explained next):
+1. Sort the initial resultset according to the `ORDER BY` clause.
+1. Pick the top cursor to fetch the record, this is our first record. In the example,
+this cursor would be (`2020-01-05`, `3`) for `project_id=9`.
+1. We can use (`2020-01-05`, `3`) to fetch the next issue respecting the `ORDER BY` clause
+`project_id=9` filter. This produces an updated resultset.
+  | `project_ids` | `created_at_values` | `id_values` |
+  | ------------- | ------------------- | ----------- |
+  | 2             | 2020-01-10          | 5           |
+  | 5             | 2020-01-05          | 4           |
+  | 10            | 2020-01-15          | 7           |
+  | **9**         | **2020-01-06**      | **6**       |
+1. Repeat 1 to 3 with the updated resultset until we have fetched `N=20` records.
+### Initializing the recursive CTE query
+For the initial recursive query, we'll need to produce exactly one row, we call this the
+initializer query (`initializer_query`).
+Use `ARRAY_AGG` function to compact the initial result set into a single row
+and use the row as the initial value for the recursive CTE query:
+Example initializer row:
+| `records`      | `project_ids`   | `created_at_values` | `id_values` | `Count` | `Position` |
+| -------------- | --------------- | ------------------- | ----------- | ------- | ---------- |
+| `NULL::issues` | `[9, 2, 5, 10]` | `[...]`             | `[...]`     | `0`     | `NULL`     |
+- The `records` column contains our sorted database records, and the initializer query sets the
+first value to `NULL`, which is filtered out later.
+- The `count` column tracks the number of records found. We use this column to filter out the
+initializer row from the result set.
+### Recursive portion of the CTE query
+The result row is produced with the following steps:
+1. [Order the keyset arrays.](#order-the-keyset-arrays)
+1. [Find the next cursor.](#find-the-next-cursor)
+1. [Produce a new row.](#produce-a-new-row)
+#### Order the keyset arrays
+Order the keyset arrays according to the original `ORDER BY` clause with `LIMIT 1` using the
+`UNNEST [] WITH ORDINALITY` table function. The function locates the "lowest" keyset cursor
+values and gives us the array position. These cursor values are used to locate the record.
+NOTE:
+At this point, we haven't read anything from the database tables, because we relied on
+fast index-only scans.
+| `project_ids` | `created_at_values` | `id_values` |
+| ------------- | ------------------- | ----------- |
+| 2             | 2020-01-10          | 5           |
+| 5             | 2020-01-05          | 4           |
+| 10            | 2020-01-15          | 7           |
+| 9             | 2020-01-05          | 3           |
+The first row is the 4th one (`position = 4`), because it has the lowest `created_at` and
+`id` values. The `UNNEST` function also exposes the position using an extra column (note:
+PostgreSQL uses 1-based index).
+Demonstration of the `UNNEST [] WITH ORDINALITY` table function:
+```sql
+SELECT position FROM unnest('{2020-01-10, 2020-01-05, 2020-01-15, 2020-01-05}'::timestamp[], '{5, 4, 7, 3}'::int[])
+  WITH ORDINALITY AS t(created_at, id, position) ORDER BY created_at ASC, id ASC LIMIT 1;
+```
+Result:
+```sql
+position
+----------
+         4
+(1 row)
+```
+#### Find the next cursor
+Now, let's find the next cursor values (`next_cursor_values_query`) for the project with `id = 9`.
+To do that, we build a keyset pagination SQL query. Find the next row after
+`created_at = 2020-01-05` and `id = 3`. Because we order by two database columns, there can be two
+cases:
+- There are rows with `created_at = 2020-01-05` and `id > 3`.
+- There are rows with `created_at > 2020-01-05`.
+Generating this query is done by the generic keyset pagination library. After the query is done,
+we have a temporary table with the next cursor values:
+| `created_at` | ID  |
+| ------------ | --- |
+| 2020-01-06   | 6   |
+#### Produce a new row
+As the final step, we need to produce a new row by manipulating the initializer row
+(`data_collector_query` method). Two things happen here:
+- Read the full row from the DB and return it in the `records` columns. (`result_collector_columns`
+method)
+- Replace the cursor values at the current position with the results from the keyset query.
+Reading the full row from the database is only one index scan by the primary key. We use the
+ActiveRecord query passed as the `finder_query`:
+```sql
+(SELECT "issues".* FROM issues WHERE id = id_values[position] LIMIT 1)
+```
+By adding parentheses, the result row can be put into the `records` column.
+Replacing the cursor values at `position` can be done via standard PostgreSQL array operators:
+```sql
+-- created_at_values column value
+created_at_values[:position-1]||next_cursor_values.created_at||created_at_values[position+1:]
+-- id_values column value
+id_values[:position-1]||next_cursor_values.id||id_values[position+1:]
+```
+The Ruby equivalent would be the following:
+```ruby
+id_values[0..(position - 1)] + [next_cursor_values.id] + id_values[(position + 1)..-1]
+```
+After this, the recursion starts again by finding the next lowest cursor value.
+### Finalizing the query
+For producing the final `issues` rows, we're going to wrap the query with another `SELECT` statement:
+```sql
+SELECT "issues".*
+FROM (
+  SELECT (records).* -- similar to ruby splat operator
+  FROM recursive_keyset_cte
+  WHERE recursive_keyset_cte.count <> 0 -- filter out the initializer row
+) AS issues
+```
+### Performance comparison
+Assuming that we have the correct database index in place, we can compare the query performance by
+looking at the number of database rows accessed by the query.
+- Number of groups: 100
+- Number of projects: 500
+- Number of issues (in the group hierarchy): 50 000
+Standard `IN` query:
+| Query                    | Entries read from index | Rows read from the table | Rows sorted in memory |
+| ------------------------ | ----------------------- | ------------------------ | --------------------- |
+| group hierarchy subquery | 100                     | 0                        | 0                     |
+| project lookup query     | 500                     | 0                        | 0                     |
+| issue lookup query       | 50 000                  | 20                       | 50 000                |
+Optimized `IN` query:
+| Query                    | Entries read from index | Rows read from the table | Rows sorted in memory |
+| ------------------------ | ----------------------- | ------------------------ | --------------------- |
+| group hierarchy subquery | 100                     | 0                        | 0                     |
+| project lookup query     | 500                     | 0                        | 0                     |
+| issue lookup query       | 519                     | 20                       | 10 000                |
+The group and project queries are not using sorting, the necessary columns are read from database
+indexes. These values are accessed frequently so it's very likely that most of the data will be
+in the PostgreSQL's buffer cache.
+The optimized `IN` query will read maximum 519 entries (cursor values) from the index:
+- 500 index-only scans for populating the arrays for each project. The cursor values of the first
+record will be here.
+- Maximum 19 additional index-only scans for the consecutive records.
+The optimized `IN` query will sort the array (cursor values per project array) 20 times, which
+means we'll sort 20 x 500 rows. However, this might be a less memory-intensive task than
+sorting 10 000 rows at once.
+Performance comparison for the `gitlab-org` group:
+| Query                | Number of 8K Buffers involved | Uncached execution time | Cached execution time |
+| -------------------- | ----------------------------- | ----------------------- | --------------------- |
+| `IN` query           | 240833                        | 1.2s                    | 660ms                 |
+| Optimized `IN` query | 9783                          | 450ms                   | 22ms                  |
+NOTE:
+Before taking measurements, the group lookup query was executed separately in order to make
+the group data available in the buffer cache. Since it's a frequently called query, it's going to
+hit many shared buffers during the query execution in the production environment.
--- a/doc/development/database/index.md
+++ b/doc/development/database/index.md
@@ -62,6 +62,7 @@ info: To determine the technical writer assigned to the Stage/Group associated w
 - [Query performance guidelines](../query_performance.md)
 - [Pagination guidelines](pagination_guidelines.md)
  - [Pagination performance guidelines](pagination_performance_guidelines.md)
+- [Efficient `IN` operator queries](efficient_in_operator_queries.md)
 ## Case studies

--- a/lib/gitlab/pagination/keyset/column_order_definition.rb
+++ b/lib/gitlab/pagination/keyset/column_order_definition.rb
@@ -173,6 +173,18 @@ module Gitlab
          distinct
        end
+        def order_direction_as_sql_string
+          sql_string = ascending_order? ? +'ASC' : +'DESC'
+          if nulls_first?
+            sql_string << ' NULLS FIRST'
+          elsif nulls_last?
+            sql_string << ' NULLS LAST'
+          end
+          sql_string
+        end
        private
        attr_reader :reversed_order_expression, :nullable, :distinct

--- a/lib/gitlab/pagination/keyset/in_operator_optimization/array_scope_columns.rb
+++ b/lib/gitlab/pagination/keyset/in_operator_optimization/array_scope_columns.rb
+# frozen_string_literal: true
+module Gitlab
+  module Pagination
+    module Keyset
+      module InOperatorOptimization
+        class ArrayScopeColumns
+          ARRAY_SCOPE_CTE_NAME = 'array_cte'
+          def initialize(columns)
+            validate_columns!(columns)
+            array_scope_table = Arel::Table.new(ARRAY_SCOPE_CTE_NAME)
+            @columns = columns.map do |column|
+              ColumnData.new(column, "array_scope_#{column}", array_scope_table)
+            end
+          end
+          def array_scope_cte_name
+            ARRAY_SCOPE_CTE_NAME
+          end
+          def array_aggregated_columns
+            columns.map(&:array_aggregated_column)
+          end
+          def array_aggregated_column_names
+            columns.map(&:array_aggregated_column_name)
+          end
+          def arel_columns
+            columns.map(&:arel_column)
+          end
+          def array_lookup_expressions_by_position(table_name)
+            columns.map do |column|
+              Arel.sql("#{table_name}.#{column.array_aggregated_column_name}[position]")
+            end
+          end
+          private
+          attr_reader :columns
+          def validate_columns!(columns)
+            if columns.blank?
+              msg = <<~MSG
+              No array columns were given.
+              Make sure you explicitly select the columns in the array_scope parameter.
+              Example: Project.select(:id)
+              MSG
+              raise StandardError, msg
+            end
+          end
+        end
+      end
+    end
+  end
+end
--- a/lib/gitlab/pagination/keyset/in_operator_optimization/column_data.rb
+++ b/lib/gitlab/pagination/keyset/in_operator_optimization/column_data.rb
+# frozen_string_literal: true
+module Gitlab
+  module Pagination
+    module Keyset
+      module InOperatorOptimization
+        class ColumnData
+          attr_reader :original_column_name, :as, :arel_table
+          def initialize(original_column_name, as, arel_table)
+            @original_column_name = original_column_name.to_s
+            @as = as.to_s
+            @arel_table = arel_table
+          end
+          def projection
+            arel_column.as(as)
+          end
+          def arel_column
+            arel_table[original_column_name]
+          end
+          def arel_column_as
+            arel_table[as]
+          end
+          def array_aggregated_column_name
+            "#{arel_table.name}_#{original_column_name}_array"
+          end
+          def array_aggregated_column
+            Arel::Nodes::NamedFunction.new('ARRAY_AGG', [arel_column]).as(array_aggregated_column_name)
+          end
+        end
+      end
+    end
+  end
+end
--- a/lib/gitlab/pagination/keyset/in_operator_optimization/order_by_columns.rb
+++ b/lib/gitlab/pagination/keyset/in_operator_optimization/order_by_columns.rb
+# frozen_string_literal: true
+module Gitlab
+  module Pagination
+    module Keyset
+      module InOperatorOptimization
+        class OrderByColumns
+          include Enumerable
+          # This class exposes collection methods for the order by columns
+          #
+          # Example: by modelling the `issues.created_at ASC, issues.id ASC` ORDER BY
+          # SQL clause, this class will receive two ColumnOrderDefinition objects
+          def initialize(columns, arel_table)
+            @columns = columns.map do |column|
+              ColumnData.new(column.attribute_name, "order_by_columns_#{column.attribute_name}", arel_table)
+            end
+          end
+          def arel_columns
+            columns.map(&:arel_column)
+          end
+          def array_aggregated_columns
+            columns.map(&:array_aggregated_column)
+          end
+          def array_aggregated_column_names
+            columns.map(&:array_aggregated_column_name)
+          end
+          def original_column_names
+            columns.map(&:original_column_name)
+          end
+          def original_column_names_as_arel_string
+            columns.map { |c| Arel.sql(c.original_column_name) }
+          end
+          def original_column_names_as_tmp_tamble
+            temp_table = Arel::Table.new('record')
+            original_column_names.map { |c| temp_table[c] }
+          end
+          def cursor_values(table_name)
+            columns.each_with_object({}) do |column, hash|
+              hash[column.original_column_name] = Arel.sql("#{table_name}.#{column.array_aggregated_column_name}[position]")
+            end
+          end
+          def array_lookup_expressions_by_position(table_name)
+            columns.map do |column|
+              Arel.sql("#{table_name}.#{column.array_aggregated_column_name}[position]")
+            end
+          end
+          def replace_value_in_array_by_position_expressions
+            columns.map do |column|
+              name = "#{QueryBuilder::RECURSIVE_CTE_NAME}.#{column.array_aggregated_column_name}"
+              new_value = "next_cursor_values.#{column.original_column_name}"
+              "#{name}[:position_query.position-1]||#{new_value}||#{name}[position_query.position+1:]"
+            end
+          end
+          def each(&block)
+            columns.each(&block)
+          end
+          private
+          attr_reader :columns
+        end
+      end
+    end
+  end
+end
--- a/lib/gitlab/pagination/keyset/in_operator_optimization/query_builder.rb
+++ b/lib/gitlab/pagination/keyset/in_operator_optimization/query_builder.rb
+# frozen_string_literal: true
+module Gitlab
+  module Pagination
+    module Keyset
+      module InOperatorOptimization
+        # rubocop: disable CodeReuse/ActiveRecord
+        class QueryBuilder
+          UnsupportedScopeOrder = Class.new(StandardError)
+          RECURSIVE_CTE_NAME = 'recursive_keyset_cte'
+          RECORDS_COLUMN = 'records'
+          # This class optimizes slow database queries (PostgreSQL specific) where the
+          # IN SQL operator is used with sorting.
+          #
+          # Arguments:
+          # scope - ActiveRecord::Relation supporting keyset pagination
+          # array_scope - ActiveRecord::Relation for the `IN` subselect
+          # array_mapping_scope - Lambda for connecting scope with array_scope
+          # finder_query - ActiveRecord::Relation for finding one row by the passed in cursor values
+          # values - keyset cursor values (optional)
+          #
+          # Example ActiveRecord query: Issues in the namespace hierarchy
+          # > scope = Issue
+          # >  .where(project_id: Group.find(9970).all_projects.select(:id))
+          # >  .order(:created_at, :id)
+          # >  .limit(20);
+          #
+          # Optimized version:
+          #
+          # > scope = Issue.where({}).order(:created_at, :id) # base scope
+          # > array_scope = Group.find(9970).all_projects.select(:id)
+          # > array_mapping_scope = -> (id_expression) { Issue.where(Issue.arel_table[:project_id].eq(id_expression)) }
+          #
+          # # finding the record by id is good enough, we can ignore the created_at_expression
+          # > finder_query = -> (created_at_expression, id_expression) { Issue.where(Issue.arel_table[:id].eq(id_expression)) }
+          #
+          # > Gitlab::Pagination::Keyset::InOperatorOptimization::QueryBuilder.new(
+          # >   scope: scope,
+          # >   array_scope: array_scope,
+          # >   array_mapping_scope: array_mapping_scope,
+          # >   finder_query: finder_query
+          # > ).execute.limit(20)
+          def initialize(scope:, array_scope:, array_mapping_scope:, finder_query:, values: {})
+            @scope, success = Gitlab::Pagination::Keyset::SimpleOrderBuilder.build(scope)
+            unless success
+              error_message = <<~MSG
+              The order on the scope does not support keyset pagination. You might need to define a custom Order object.\n
+              See https://docs.gitlab.com/ee/development/database/keyset_pagination.html#complex-order-configuration\n
+              Or the Gitlab::Pagination::Keyset::Order class for examples
+              MSG
+              raise(UnsupportedScopeOrder, error_message)
+            end
+            @order = Gitlab::Pagination::Keyset::Order.extract_keyset_order_object(scope)
+            @array_scope = array_scope
+            @array_mapping_scope = array_mapping_scope
+            @finder_query = finder_query
+            @values = values
+            @model = @scope.model
+            @table_name = @model.table_name
+            @arel_table = @model.arel_table
+          end
+          def execute
+            selector_cte = Gitlab::SQL::CTE.new(:array_cte, array_scope)
+            cte = Gitlab::SQL::RecursiveCTE.new(RECURSIVE_CTE_NAME, union_args: { remove_duplicates: false, remove_order: false })
+            cte << initializer_query
+            cte << data_collector_query
+            q = cte
+              .apply_to(model.where({})
+              .with(selector_cte.to_arel))
+              .select(result_collector_final_projections)
+              .where("count <> 0") # filter out the initializer row
+            model.from(q.arel.as(table_name))
+          end
+          private
+          attr_reader :array_scope, :scope, :order, :array_mapping_scope, :finder_query, :values, :model, :table_name, :arel_table
+          def initializer_query
+            array_column_names = array_scope_columns.array_aggregated_column_names + order_by_columns.array_aggregated_column_names
+            projections = [
+              *result_collector_initializer_columns,
+              *array_column_names,
+              '0::bigint AS count'
+            ]
+            model.select(projections).from(build_column_arrays_query).limit(1)
+          end
+          # This query finds the first cursor values for each item in the array CTE.
+          #
+          # array_cte:
+          #
+          # |project_id|
+          # |----------|
+          # |         1|
+          # |         2|
+          # |         3|
+          # |         4|
+          #
+          # For each project_id, find the first issues row by respecting the created_at, id order.
+          #
+          # The `array_mapping_scope` parameter defines how the `array_scope` and the `scope` can be combined.
+          #
+          # scope = Issue.where({}) # empty scope
+          # array_mapping_scope = Issue.where(project_id: X)
+          #
+          # scope.merge(array_mapping_scope) # Issue.where(project_id: X)
+          #
+          # X will be replaced with a value from the `array_cte` temporary table.
+          #
+          # |created_at|id|
+          # |----------|--|
+          # |2020-01-15| 2|
+          # |2020-01-07| 3|
+          # |2020-01-07| 4|
+          # |2020-01-10| 5|
+          def build_column_arrays_query
+            q = Arel::SelectManager.new
+              .project(array_scope_columns.array_aggregated_columns + order_by_columns.array_aggregated_columns)
+              .from(array_cte)
+              .join(Arel.sql("LEFT JOIN LATERAL (#{initial_keyset_query.to_sql}) #{table_name} ON TRUE"))
+            order_by_columns.each { |column| q.where(column.arel_column.not_eq(nil)) }
+            q.as('array_scope_lateral_query')
+          end
+          def array_cte
+            Arel::SelectManager.new
+              .project(array_scope_columns.arel_columns)
+              .from(Arel.sql(array_scope_columns.array_scope_cte_name))
+              .as(array_scope_columns.array_scope_cte_name)
+          end
+          def initial_keyset_query
+            keyset_scope = scope.merge(array_mapping_scope.call(*array_scope_columns.arel_columns))
+            order
+              .apply_cursor_conditions(keyset_scope, values, use_union_optimization: true)
+              .reselect(*order_by_columns.arel_columns)
+              .limit(1)
+          end
+          def data_collector_query
+            array_column_list = array_scope_columns.array_aggregated_column_names
+            order_column_value_arrays = order_by_columns.replace_value_in_array_by_position_expressions
+            select = [
+              *result_collector_columns,
+              *array_column_list,
+              *order_column_value_arrays,
+              "#{RECURSIVE_CTE_NAME}.count + 1"
+            ]
+            from = <<~SQL
+              #{RECURSIVE_CTE_NAME},
+              #{array_order_query.lateral.as('position_query').to_sql},
+              #{ensure_one_row(next_cursor_values_query).lateral.as('next_cursor_values').to_sql}
+            SQL
+            model.select(select).from(from)
+          end
+          # NULL guard. This method ensures that NULL values are returned when the passed in scope returns 0 rows.
+          # Example query: returns issues.id or NULL
+          #
+          # SELECT issues.id FROM (VALUES (NULL)) nulls (id)
+          # LEFT JOIN (SELECT id FROM issues WHERE id = 1 LIMIT 1) issues ON TRUE
+          # LIMIT 1
+          def ensure_one_row(query)
+            q = Arel::SelectManager.new
+            q.projections = order_by_columns.original_column_names_as_tmp_tamble
+            null_values = [nil] * order_by_columns.count
+            from = Arel::Nodes::Grouping.new(Arel::Nodes::ValuesList.new([null_values])).as('nulls')
+            q.from(from)
+            q.join(Arel.sql("LEFT JOIN (#{query.to_sql}) record ON TRUE"))
+            q.limit = 1
+            q
+          end
+          # This subquery finds the cursor values for the next record by sorting the generated cursor arrays in memory and taking the first element.
+          # It combines the cursor arrays (UNNEST) together and sorts them according to the originally defined ORDER BY clause.
+          #
+          # Example: issues in the group hierarchy with ORDER BY created_at, id
+          #
+          # |project_id|  |created_at|id| # 2 arrays combined: issues_created_at_array, issues_id_array
+          # |----------|  |----------|--|
+          # |         1|  |2020-01-15| 2|
+          # |         2|  |2020-01-07| 3|
+          # |         3|  |2020-01-07| 4|
+          # |         4|  |2020-01-10| 5|
+          #
+          # The query will return the cursor values: (2020-01-07, 3) and the array position: 1
+          # From the position, we can tell that the record belongs to the project with id 2.
+          def array_order_query
+            q = Arel::SelectManager.new
+              .project([*order_by_columns.original_column_names_as_arel_string, Arel.sql('position')])
+              .from("UNNEST(#{list(order_by_columns.array_aggregated_column_names)}) WITH ORDINALITY AS u(#{list(order_by_columns.original_column_names)}, position)")
+            order_by_columns.each { |column| q.where(Arel.sql(column.original_column_name).not_eq(nil)) } # ignore rows where all columns are NULL
+            q.order(Arel.sql(order_by_without_table_references)).take(1)
+          end
+          # This subquery finds the next cursor values after the previously determined position (from array_order_query).
+          # The current cursor values are passed in as SQL literals since the actual values are encoded into SQL arrays.
+          #
+          # Example: issues in the group hierarchy with ORDER BY created_at, id
+          #
+          # |project_id|  |created_at|id| # 2 arrays combined: issues_created_at_array, issues_id_array
+          # |----------|  |----------|--|
+          # |         1|  |2020-01-15| 2|
+          # |         2|  |2020-01-07| 3|
+          # |         3|  |2020-01-07| 4|
+          # |         4|  |2020-01-10| 5|
+          #
+          # Assuming that the determined position is 1, the cursor values will be the following:
+          # - Filter: project_id = 2
+          # - created_at = 2020-01-07
+          # - id = 3
+          def next_cursor_values_query
+            cursor_values = order_by_columns.cursor_values(RECURSIVE_CTE_NAME)
+            array_mapping_scope_columns = array_scope_columns.array_lookup_expressions_by_position(RECURSIVE_CTE_NAME)
+            keyset_scope = scope
+              .reselect(*order_by_columns.arel_columns)
+              .merge(array_mapping_scope.call(*array_mapping_scope_columns))
+            order
+              .apply_cursor_conditions(keyset_scope, cursor_values, use_union_optimization: true)
+              .reselect(*order_by_columns.arel_columns)
+              .limit(1)
+          end
+          # Generates an ORDER BY clause by using the column position index and the original order clauses.
+          # This method is used to sort the collected arrays in SQL.
+          # Example: "issues".created_at DESC , "issues".id ASC => 1 DESC, 2 ASC
+          def order_by_without_table_references
+            order.column_definitions.each_with_index.map do |column_definition, i|
+              "#{i + 1} #{column_definition.order_direction_as_sql_string}"
+            end.join(", ")
+          end
+          def result_collector_initializer_columns
+            ["NULL::#{table_name} AS #{RECORDS_COLUMN}"]
+          end
+          def result_collector_columns
+            query = finder_query
+              .call(*order_by_columns.array_lookup_expressions_by_position(RECURSIVE_CTE_NAME))
+              .select("#{table_name}")
+              .limit(1)
+            ["(#{query.to_sql})"]
+          end
+          def result_collector_final_projections
+            ["(#{RECORDS_COLUMN}).*"]
+          end
+          def array_scope_columns
+            @array_scope_columns ||= ArrayScopeColumns.new(array_scope.select_values)
+          end
+          def order_by_columns
+            @order_by_columns ||= OrderByColumns.new(order.column_definitions, arel_table)
+          end
+          def list(array)
+            array.join(', ')
+          end
+        end
+        # rubocop: enable CodeReuse/ActiveRecord
+      end
+    end
+  end
+end
--- a/lib/gitlab/pagination/keyset/iterator.rb
+++ b/lib/gitlab/pagination/keyset/iterator.rb
@@ -6,12 +6,13 @@ module Gitlab
      class Iterator
        UnsupportedScopeOrder = Class.new(StandardError)
-        def initialize(scope:, use_union_optimization: true)
+        def initialize(scope:, use_union_optimization: true, in_operator_optimization_options: nil)
          @scope, success = Gitlab::Pagination::Keyset::SimpleOrderBuilder.build(scope)
          raise(UnsupportedScopeOrder, 'The order on the scope does not support keyset pagination') unless success
          @order = Gitlab::Pagination::Keyset::Order.extract_keyset_order_object(scope)
-          @use_union_optimization = use_union_optimization
+          @use_union_optimization = in_operator_optimization_options ? false : use_union_optimization
+          @in_operator_optimization_options = in_operator_optimization_options
        end
        # rubocop: disable CodeReuse/ActiveRecord
@@ -19,11 +20,10 @@ module Gitlab
          cursor_attributes = {}
          loop do
-            current_scope = scope.dup.limit(of)
+            current_scope = scope.dup
-            relation = order
+            relation = order.apply_cursor_conditions(current_scope, cursor_attributes, keyset_options)
-              .apply_cursor_conditions(current_scope, cursor_attributes, { use_union_optimization: @use_union_optimization })
+            relation = relation.reorder(order) unless @in_operator_optimization_options
-              .reorder(order)
+            relation = relation.limit(of)
-              .limit(of)
            yield relation
@@ -38,6 +38,13 @@ module Gitlab
        private
        attr_reader :scope, :order
+        def keyset_options
+          {
+            use_union_optimization: @use_union_optimization,
+            in_operator_optimization_options: @in_operator_optimization_options
+          }
+        end
      end
    end
  end

--- a/lib/gitlab/pagination/keyset/order.rb
+++ b/lib/gitlab/pagination/keyset/order.rb
@@ -152,15 +152,24 @@ module Gitlab
        end
        # rubocop: disable CodeReuse/ActiveRecord
-        def apply_cursor_conditions(scope, values = {}, options = { use_union_optimization: false })
+        def apply_cursor_conditions(scope, values = {}, options = { use_union_optimization: false, in_operator_optimization_options: nil })
          values ||= {}
          transformed_values = values.with_indifferent_access
-          scope = apply_custom_projections(scope)
+          scope = apply_custom_projections(scope.dup)
          where_values = build_where_values(transformed_values)
          if options[:use_union_optimization] && where_values.size > 1
            build_union_query(scope, where_values).reorder(self)
+          elsif options[:in_operator_optimization_options]
+            opts = options[:in_operator_optimization_options]
+            Gitlab::Pagination::Keyset::InOperatorOptimization::QueryBuilder.new(
+              **{
+                scope: scope.reorder(self),
+                values: values
+              }.merge(opts)
+            ).execute
          else
            scope.where(build_or_query(where_values)) # rubocop: disable CodeReuse/ActiveRecord
          end
@@ -187,7 +196,7 @@ module Gitlab
          columns = Arel::Nodes::Grouping.new(column_definitions.map(&:column_expression))
          values = Arel::Nodes::Grouping.new(column_definitions.map do |column_definition|
            value = values[column_definition.attribute_name]
-            Arel::Nodes.build_quoted(value, column_definition.column_expression)
+            build_quoted(value, column_definition.column_expression)
          end)
          if column_definitions.first.ascending_order?
@@ -197,6 +206,12 @@ module Gitlab
          end
        end
+        def build_quoted(value, column_expression)
+          return value if value.instance_of?(Arel::Nodes::SqlLiteral)
+          Arel::Nodes.build_quoted(value, column_expression)
+        end
        # Adds extra columns to the SELECT clause
        def apply_custom_projections(scope)
          additional_projections = column_definitions.select(&:add_to_projections).map do |column_definition|

--- a/spec/lib/gitlab/pagination/keyset/column_order_definition_spec.rb
+++ b/spec/lib/gitlab/pagination/keyset/column_order_definition_spec.rb
@@ -185,4 +185,25 @@ RSpec.describe Gitlab::Pagination::Keyset::ColumnOrderDefinition do
      end
    end
  end
+  describe "#order_direction_as_sql_string" do
+    let(:nulls_last_order) do
+      described_class.new(
+        attribute_name: :name,
+        column_expression: Project.arel_table[:name],
+        order_expression: Gitlab::Database.nulls_last_order('merge_request_metrics.merged_at', :desc),
+        reversed_order_expression: Gitlab::Database.nulls_first_order('merge_request_metrics.merged_at', :asc),
+        order_direction: :desc,
+        nullable: :nulls_last, # null values are always last
+        distinct: false
+      )
+    end
+    it { expect(project_name_column.order_direction_as_sql_string).to eq('ASC') }
+    it { expect(project_name_column.reverse.order_direction_as_sql_string).to eq('DESC') }
+    it { expect(project_name_lower_column.order_direction_as_sql_string).to eq('DESC') }
+    it { expect(project_name_lower_column.reverse.order_direction_as_sql_string).to eq('ASC') }
+    it { expect(nulls_last_order.order_direction_as_sql_string).to eq('DESC NULLS LAST') }
+    it { expect(nulls_last_order.reverse.order_direction_as_sql_string).to eq('ASC NULLS FIRST') }
+  end
 end
--- a/spec/lib/gitlab/pagination/keyset/in_operator_optimization/array_scope_columns_spec.rb
+++ b/spec/lib/gitlab/pagination/keyset/in_operator_optimization/array_scope_columns_spec.rb
+# frozen_string_literal: true
+require 'spec_helper'
+RSpec.describe Gitlab::Pagination::Keyset::InOperatorOptimization::ArrayScopeColumns do
+  let(:columns) { [:relative_position, :id] }
+  subject(:array_scope_columns) { described_class.new(columns) }
+  it 'builds array column names' do
+    expect(array_scope_columns.array_aggregated_column_names).to eq(%w[array_cte_relative_position_array array_cte_id_array])
+  end
+  context 'when no columns are given' do
+    let(:columns) { [] }
+    it { expect { array_scope_columns }.to raise_error /No array columns were given/ }
+  end
+end
--- a/spec/lib/gitlab/pagination/keyset/in_operator_optimization/column_data_spec.rb
+++ b/spec/lib/gitlab/pagination/keyset/in_operator_optimization/column_data_spec.rb
+# frozen_string_literal: true
+require 'spec_helper'
+RSpec.describe Gitlab::Pagination::Keyset::InOperatorOptimization::ColumnData do
+  subject(:column_data) { described_class.new('id', 'issue_id', Issue.arel_table) }
+  describe '#array_aggregated_column_name' do
+    it { expect(column_data.array_aggregated_column_name).to eq('issues_id_array') }
+  end
+  describe '#projection' do
+    it 'returns the Arel projection for the column with a new alias' do
+      expect(column_data.projection.to_sql).to eq('"issues"."id" AS issue_id')
+    end
+  end
+  it 'accepts symbols for original_column_name and as' do
+    column_data = described_class.new(:id, :issue_id, Issue.arel_table)
+    expect(column_data.projection.to_sql).to eq('"issues"."id" AS issue_id')
+  end
+end
--- a/spec/lib/gitlab/pagination/keyset/in_operator_optimization/order_by_columns_spec.rb
+++ b/spec/lib/gitlab/pagination/keyset/in_operator_optimization/order_by_columns_spec.rb
+# frozen_string_literal: true
+require 'spec_helper'
+RSpec.describe Gitlab::Pagination::Keyset::InOperatorOptimization::OrderByColumns do
+  let(:columns) do
+    [
+      Gitlab::Pagination::Keyset::ColumnOrderDefinition.new(
+        attribute_name: :relative_position,
+        order_expression: Issue.arel_table[:relative_position].desc
+      ),
+      Gitlab::Pagination::Keyset::ColumnOrderDefinition.new(
+        attribute_name: :id,
+        order_expression: Issue.arel_table[:id].desc
+      )
+    ]
+  end
+  subject(:order_by_columns) { described_class.new(columns, Issue.arel_table) }
+  describe '#array_aggregated_column_names' do
+    it { expect(order_by_columns.array_aggregated_column_names).to eq(%w[issues_relative_position_array issues_id_array]) }
+  end
+  describe '#original_column_names' do
+    it { expect(order_by_columns.original_column_names).to eq(%w[relative_position id]) }
+  end
+  describe '#cursor_values' do
+    it 'returns the keyset pagination cursor values from the column arrays as SQL expression' do
+      expect(order_by_columns.cursor_values('tbl')).to eq({
+        "id" => "tbl.issues_id_array[position]",
+        "relative_position" => "tbl.issues_relative_position_array[position]"
+      })
+    end
+  end
+end
--- a/spec/lib/gitlab/pagination/keyset/in_operator_optimization/query_builder_spec.rb
+++ b/spec/lib/gitlab/pagination/keyset/in_operator_optimization/query_builder_spec.rb
+# frozen_string_literal: true
+require 'spec_helper'
+RSpec.describe Gitlab::Pagination::Keyset::InOperatorOptimization::QueryBuilder do
+  let_it_be(:two_weeks_ago) { 2.weeks.ago }
+  let_it_be(:three_weeks_ago) { 3.weeks.ago }
+  let_it_be(:four_weeks_ago) { 4.weeks.ago }
+  let_it_be(:five_weeks_ago) { 5.weeks.ago }
+  let_it_be(:top_level_group) { create(:group) }
+  let_it_be(:sub_group_1) { create(:group, parent: top_level_group) }
+  let_it_be(:sub_group_2) { create(:group, parent: top_level_group) }
+  let_it_be(:sub_sub_group_1) { create(:group, parent: sub_group_2) }
+  let_it_be(:project_1) { create(:project, group: top_level_group) }
+  let_it_be(:project_2) { create(:project, group: top_level_group) }
+  let_it_be(:project_3) { create(:project, group: sub_group_1) }
+  let_it_be(:project_4) { create(:project, group: sub_group_2) }
+  let_it_be(:project_5) { create(:project, group: sub_sub_group_1) }
+  let_it_be(:issues) do
+    [
+      create(:issue, project: project_1, created_at: three_weeks_ago, relative_position: 5),
+      create(:issue, project: project_1, created_at: two_weeks_ago),
+      create(:issue, project: project_2, created_at: two_weeks_ago, relative_position: 15),
+      create(:issue, project: project_2, created_at: two_weeks_ago),
+      create(:issue, project: project_3, created_at: four_weeks_ago),
+      create(:issue, project: project_4, created_at: five_weeks_ago, relative_position: 10),
+      create(:issue, project: project_5, created_at: four_weeks_ago)
+    ]
+  end
+  shared_examples 'correct ordering examples' do
+    let(:iterator) do
+      Gitlab::Pagination::Keyset::Iterator.new(
+        scope: scope.limit(batch_size),
+        in_operator_optimization_options: in_operator_optimization_options
+      )
+    end
+    it 'returns records in correct order' do
+      all_records = []
+      iterator.each_batch(of: batch_size) do |records|
+        all_records.concat(records)
+      end
+      expect(all_records).to eq(expected_order)
+    end
+  end
+  context 'when ordering by issues.id DESC' do
+    let(:scope) { Issue.order(id: :desc) }
+    let(:expected_order) { issues.sort_by(&:id).reverse }
+    let(:in_operator_optimization_options) do
+      {
+        array_scope: Project.where(namespace_id: top_level_group.self_and_descendants.select(:id)).select(:id),
+        array_mapping_scope: -> (id_expression) { Issue.where(Issue.arel_table[:project_id].eq(id_expression)) },
+        finder_query: -> (id_expression) { Issue.where(Issue.arel_table[:id].eq(id_expression)) }
+      }
+    end
+    context 'when iterating records one by one' do
+      let(:batch_size) { 1 }
+      it_behaves_like 'correct ordering examples'
+    end
+    context 'when iterating records with LIMIT 3' do
+      let(:batch_size) { 3 }
+      it_behaves_like 'correct ordering examples'
+    end
+    context 'when loading records at once' do
+      let(:batch_size) { issues.size + 1 }
+      it_behaves_like 'correct ordering examples'
+    end
+  end
+  context 'when ordering by issues.relative_position DESC NULLS LAST, id DESC' do
+    let(:scope) { Issue.order(order) }
+    let(:expected_order) { scope.to_a }
+    let(:order) do
+      # NULLS LAST ordering requires custom Order object for keyset pagination:
+      # https://docs.gitlab.com/ee/development/database/keyset_pagination.html#complex-order-configuration
+      Gitlab::Pagination::Keyset::Order.build([
+        Gitlab::Pagination::Keyset::ColumnOrderDefinition.new(
+          attribute_name: :relative_position,
+          column_expression: Issue.arel_table[:relative_position],
+          order_expression: Gitlab::Database.nulls_last_order('relative_position', :desc),
+          reversed_order_expression: Gitlab::Database.nulls_first_order('relative_position', :asc),
+          order_direction: :desc,
+          nullable: :nulls_last,
+          distinct: false
+        ),
+        Gitlab::Pagination::Keyset::ColumnOrderDefinition.new(
+          attribute_name: :id,
+          order_expression: Issue.arel_table[:id].desc,
+          nullable: :not_nullable,
+          distinct: true
+        )
+      ])
+    end
+    let(:in_operator_optimization_options) do
+      {
+        array_scope: Project.where(namespace_id: top_level_group.self_and_descendants.select(:id)).select(:id),
+        array_mapping_scope: -> (id_expression) { Issue.where(Issue.arel_table[:project_id].eq(id_expression)) },
+        finder_query: -> (_relative_position_expression, id_expression) { Issue.where(Issue.arel_table[:id].eq(id_expression)) }
+      }
+    end
+    context 'when iterating records one by one' do
+      let(:batch_size) { 1 }
+      it_behaves_like 'correct ordering examples'
+    end
+    context 'when iterating records with LIMIT 3' do
+      let(:batch_size) { 3 }
+      it_behaves_like 'correct ordering examples'
+    end
+  end
+  context 'when ordering by issues.created_at DESC, issues.id ASC' do
+    let(:scope) { Issue.order(created_at: :desc, id: :asc) }
+    let(:expected_order) { issues.sort_by { |issue| [issue.created_at.to_f * -1, issue.id] } }
+    let(:in_operator_optimization_options) do
+      {
+        array_scope: Project.where(namespace_id: top_level_group.self_and_descendants.select(:id)).select(:id),
+        array_mapping_scope: -> (id_expression) { Issue.where(Issue.arel_table[:project_id].eq(id_expression)) },
+        finder_query: -> (_created_at_expression, id_expression) { Issue.where(Issue.arel_table[:id].eq(id_expression)) }
+      }
+    end
+    context 'when iterating records one by one' do
+      let(:batch_size) { 1 }
+      it_behaves_like 'correct ordering examples'
+    end
+    context 'when iterating records with LIMIT 3' do
+      let(:batch_size) { 3 }
+      it_behaves_like 'correct ordering examples'
+    end
+    context 'when loading records at once' do
+      let(:batch_size) { issues.size + 1 }
+      it_behaves_like 'correct ordering examples'
+    end
+  end
+  context 'pagination support' do
+    let(:scope) { Issue.order(id: :desc) }
+    let(:expected_order) { issues.sort_by(&:id).reverse }
+    let(:options) do
+      {
+        scope: scope,
+        array_scope: Project.where(namespace_id: top_level_group.self_and_descendants.select(:id)).select(:id),
+        array_mapping_scope: -> (id_expression) { Issue.where(Issue.arel_table[:project_id].eq(id_expression)) },
+        finder_query: -> (id_expression) { Issue.where(Issue.arel_table[:id].eq(id_expression)) }
+      }
+    end
+    context 'offset pagination' do
+      subject(:optimized_scope) { described_class.new(**options).execute }
+      it 'paginates the scopes' do
+        first_page = optimized_scope.page(1).per(2)
+        expect(first_page).to eq(expected_order[0...2])
+        second_page = optimized_scope.page(2).per(2)
+        expect(second_page).to eq(expected_order[2...4])
+        third_page = optimized_scope.page(3).per(2)
+        expect(third_page).to eq(expected_order[4...6])
+      end
+    end
+    context 'keyset pagination' do
+      def paginator(cursor = nil)
+        scope.keyset_paginate(cursor: cursor, per_page: 2, keyset_order_options: options)
+      end
+      it 'paginates correctly' do
+        first_page = paginator.records
+        expect(first_page).to eq(expected_order[0...2])
+        cursor_for_page_2 = paginator.cursor_for_next_page
+        second_page = paginator(cursor_for_page_2).records
+        expect(second_page).to eq(expected_order[2...4])
+        cursor_for_page_3 = paginator(cursor_for_page_2).cursor_for_next_page
+        third_page = paginator(cursor_for_page_3).records
+        expect(third_page).to eq(expected_order[4...6])
+      end
+    end
+  end
+  it 'raises error when unsupported scope is passed' do
+    scope = Issue.order(Issue.arel_table[:id].lower.desc)
+    options = {
+      scope: scope,
+      array_scope: Project.where(namespace_id: top_level_group.self_and_descendants.select(:id)).select(:id),
+      array_mapping_scope: -> (id_expression) { Issue.where(Issue.arel_table[:project_id].eq(id_expression)) },
+      finder_query: -> (id_expression) { Issue.where(Issue.arel_table[:id].eq(id_expression)) }
+    }
+    expect { described_class.new(**options).execute }.to raise_error(/The order on the scope does not support keyset pagination/)
+  end
+end