Commit 5a640061 authored by Adam Hegyi's avatar Adam Hegyi

Enhance in operator optimziation ORDER BY support

This change adds support to use the in operator optimization with SQL
expressions in the ORDER BY clause.
parent 737ecf8b
......@@ -589,6 +589,87 @@ LIMIT 20
NOTE:
To make the query efficient, the following columns need to be covered with an index: `project_id`, `issue_type`, `created_at`, and `id`.
#### Using calculated ORDER BY expression
The following example orders epic records by the duration between the creation time and closed
time. It is calculated with the following formula:
```sql
SELECT EXTRACT('epoch' FROM epics.closed_at - epics.created_at) FROM epics
```
The query above returns the duration in seconds (`double precision`) between the two timestamp
columns in seconds. To order the records by this expression, you must reference it
in the `ORDER BY` clause:
```sql
SELECT EXTRACT('epoch' FROM epics.closed_at - epics.created_at)
FROM epics
ORDER BY EXTRACT('epoch' FROM epics.closed_at - epics.created_at) DESC
```
To make this ordering efficient on the group-level with the in-operator optimization, use a
custom `ORDER BY` configuration. Since the duration is not a distinct value (no unique index
present), you must add a tie-breaker column (`id`).
The following example shows the final `ORDER BY` clause:
```sql
ORDER BY extract('epoch' FROM epics.closed_at - epics.created_at) DESC, epics.id DESC
```
Snippet for loading records ordered by the calcualted duration:
```ruby
arel_table = Epic.arel_table
order = Gitlab::Pagination::Keyset::Order.build([
Gitlab::Pagination::Keyset::ColumnOrderDefinition.new(
attribute_name: 'duration_in_seconds',
order_expression: Arel.sql('EXTRACT(EPOCH FROM epics.closed_at - epics.created_at)').desc,
distinct: false,
sql_type: 'double precision' # important for calculated SQL expressions
),
Gitlab::Pagination::Keyset::ColumnOrderDefinition.new(
attribute_name: 'id',
order_expression: arel_table[:id].desc
)
])
records = Gitlab::Pagination::Keyset::InOperatorOptimization::QueryBuilder.new(
scope: Epic.where.not(closed_at: nil).reorder(order), # filter out NULL values
array_scope: Group.find(9970).self_and_descendants.select(:id),
array_mapping_scope: -> (id_expression) { Epic.where(Epic.arel_table[:group_id].eq(id_expression)) }
).execute.limit(20)
puts records.pluck(:duration_in_seconds, :id) # other columnns are not available
```
Building the query requires quite a bit of configuration. For the order configuration you
can find more information within the
[complex order configuration](keyset_pagination.md#complex-order-configuration)
section for keyset paginated database queries.
The query requires a specialized database index:
```sql
CREATE INDEX index_epics_on_duration ON epics USING btree (group_id, EXTRACT(EPOCH FROM epics.closed_at - epics.created_at) DESC, id DESC) WHERE (closed_at IS NOT NULL);
```
Notice that the `finder_query` parameter is not used. The query only returns the `ORDER BY` columns
which are the `duration_in_seconds` (calculated column) and the `id` columns. This is a limitation
of the feature, defining the `finder_query` with calculated `ORDER BY` expressions is not supported.
To get the complete database records, an extra query can be invoked by the returned `id` column:
```ruby
records_by_id = records.index_by(&:id)
complete_records = Epic.where(id: records_by_id.keys).index_by(&:id)
# Printing the complete records according to the `ORDER BY` clause
records_by_id.each do |id, _|
puts complete_records[id].attributes
end
```
#### Batch iteration
Batch iteration over the records is possible via the keyset `Iterator` class.
......
......@@ -114,6 +114,20 @@ module Gitlab
# - When the order is a calculated expression or the column is in another table (JOIN-ed)
#
# If the add_to_projections is true, the query builder will automatically add the column to the SELECT values
#
# **sql_type**
#
# The SQL type of the column or SQL expression. This is an optional field which is only required when using the
# column with the InOperatorOptimization class.
#
# Example: When the order expression is a calculated SQL expression.
#
# {
# attribute_name: 'id_times_count',
# order_expression: Arel.sql('(id * count)').asc,
# sql_type: 'integer' # the SQL type here must match with the type of the produced data by the order_expression. Putting 'text' here would be incorrect.
# }
#
class ColumnOrderDefinition
REVERSED_ORDER_DIRECTIONS = { asc: :desc, desc: :asc }.freeze
REVERSED_NULL_POSITIONS = { nulls_first: :nulls_last, nulls_last: :nulls_first }.freeze
......@@ -122,7 +136,8 @@ module Gitlab
attr_reader :attribute_name, :column_expression, :order_expression, :add_to_projections, :order_direction
def initialize(attribute_name:, order_expression:, column_expression: nil, reversed_order_expression: nil, nullable: :not_nullable, distinct: true, order_direction: nil, add_to_projections: false)
# rubocop: disable Metrics/ParameterLists
def initialize(attribute_name:, order_expression:, column_expression: nil, reversed_order_expression: nil, nullable: :not_nullable, distinct: true, order_direction: nil, sql_type: nil, add_to_projections: false)
@attribute_name = attribute_name
@order_expression = order_expression
@column_expression = column_expression || calculate_column_expression(order_expression)
......@@ -130,8 +145,10 @@ module Gitlab
@reversed_order_expression = reversed_order_expression || calculate_reversed_order(order_expression)
@nullable = parse_nullable(nullable, distinct)
@order_direction = parse_order_direction(order_expression, order_direction)
@sql_type = sql_type
@add_to_projections = add_to_projections
end
# rubocop: enable Metrics/ParameterLists
def reverse
self.class.new(
......@@ -185,6 +202,12 @@ module Gitlab
sql_string
end
def sql_type
raise Gitlab::Pagination::Keyset::SqlTypeMissingError.for_column(self) if @sql_type.nil?
@sql_type
end
private
attr_reader :reversed_order_expression, :nullable, :distinct
......
......@@ -4,23 +4,35 @@ module Gitlab
module Pagination
module Keyset
module InOperatorOptimization
# This class is used for wrapping an Arel column with
# convenient helper methods in order to make the query
# building for the InOperatorOptimization a bit cleaner.
class ColumnData
attr_reader :original_column_name, :as, :arel_table
def initialize(original_column_name, as, arel_table)
@original_column_name = original_column_name.to_s
# column - name of the DB column
# as - custom alias for the column
# arel_table - relation where the column is located
def initialize(column, as, arel_table)
@original_column_name = column
@as = as.to_s
@arel_table = arel_table
end
# Generates: `issues.name AS my_alias`
def projection
arel_column.as(as)
end
# Generates: issues.name`
def arel_column
arel_table[original_column_name]
end
# overridden in OrderByColumnData class
alias_method :column_expression, :arel_column
# Generates: `issues.my_alias`
def arel_column_as
arel_table[as]
end
......@@ -29,8 +41,9 @@ module Gitlab
"#{arel_table.name}_#{original_column_name}_array"
end
# Generates: SELECT ARRAY_AGG(...) AS issues_name_array
def array_aggregated_column
Arel::Nodes::NamedFunction.new('ARRAY_AGG', [arel_column]).as(array_aggregated_column_name)
Arel::Nodes::NamedFunction.new('ARRAY_AGG', [column_expression]).as(array_aggregated_column_name)
end
end
end
......
# frozen_string_literal: true
module Gitlab
module Pagination
module Keyset
module InOperatorOptimization
class OrderByColumnData < ColumnData
extend ::Gitlab::Utils::Override
attr_reader :column
# column - a ColumnOrderDefinition object
# as - custom alias for the column
# arel_table - relation where the column is located
def initialize(column, as, arel_table)
super(column.attribute_name.to_s, as, arel_table)
@column = column
end
override :arel_column
def arel_column
column.column_expression
end
override :column_expression
def column_expression
arel_table[original_column_name]
end
def column_for_projection
column.column_expression.as(original_column_name)
end
end
end
end
end
end
......@@ -9,16 +9,16 @@ module Gitlab
# This class exposes collection methods for the order by columns
#
# Example: by modelling the `issues.created_at ASC, issues.id ASC` ORDER BY
# Example: by modeling the `issues.created_at ASC, issues.id ASC` ORDER BY
# SQL clause, this class will receive two ColumnOrderDefinition objects
def initialize(columns, arel_table)
@columns = columns.map do |column|
ColumnData.new(column.attribute_name, "order_by_columns_#{column.attribute_name}", arel_table)
OrderByColumnData.new(column, "order_by_columns_#{column.attribute_name}", arel_table)
end
end
def arel_columns
columns.map(&:arel_column)
columns.map(&:column_for_projection)
end
def array_aggregated_columns
......
......@@ -120,7 +120,7 @@ module Gitlab
.from(array_cte)
.join(Arel.sql("LEFT JOIN LATERAL (#{initial_keyset_query.to_sql}) #{table_name} ON TRUE"))
order_by_columns.each { |column| q.where(column.arel_column.not_eq(nil)) }
order_by_columns.each { |column| q.where(column.column_expression.not_eq(nil)) }
q.as('array_scope_lateral_query')
end
......@@ -231,7 +231,7 @@ module Gitlab
order
.apply_cursor_conditions(keyset_scope, cursor_values, use_union_optimization: true)
.reselect(*order_by_columns.arel_columns)
.reselect(*order_by_columns.map(&:column_for_projection))
.limit(1)
end
......
......@@ -12,11 +12,7 @@ module Gitlab
end
def initializer_columns
order_by_columns.map do |column|
column_name = column.original_column_name.to_s
type = model.columns_hash[column_name].sql_type
"NULL::#{type} AS #{column_name}"
end
order_by_columns.map { |column_data| null_with_type_cast(column_data) }
end
def columns
......@@ -30,6 +26,15 @@ module Gitlab
private
attr_reader :model, :order_by_columns
def null_with_type_cast(column_data)
column_name = column_data.original_column_name.to_s
active_record_column = model.columns_hash[column_name]
type = active_record_column ? active_record_column.sql_type : column_data.column.sql_type
"NULL::#{type} AS #{column_name}"
end
end
end
end
......
......@@ -9,6 +9,8 @@ module Gitlab
RECORDS_COLUMN = 'records'
def initialize(finder_query, model, order_by_columns)
verify_order_by_attributes_on_model!(model, order_by_columns)
@finder_query = finder_query
@order_by_columns = order_by_columns
@table_name = model.table_name
......@@ -34,6 +36,20 @@ module Gitlab
private
attr_reader :finder_query, :order_by_columns, :table_name
def verify_order_by_attributes_on_model!(model, order_by_columns)
order_by_columns.map(&:column).each do |column|
unless model.columns_hash[column.attribute_name.to_s]
text = <<~TEXT
The "RecordLoaderStrategy" does not support the following ORDER BY column because
it's not available on the \"#{model.table_name}\" table: #{column.attribute_name}
Omit the "finder_query" parameter to use the "OrderValuesLoaderStrategy".
TEXT
raise text
end
end
end
end
end
end
......
# frozen_string_literal: true
module Gitlab
module Pagination
module Keyset
class SqlTypeMissingError < StandardError
def self.for_column(column)
message = <<~TEXT
The "sql_type" attribute is not set for the following column definition:
#{column.attribute_name}
See the ColumnOrderDefinition class for more context.
TEXT
new(message)
end
end
end
end
end
# frozen_string_literal: true
require 'spec_helper'
RSpec.describe Gitlab::Pagination::Keyset::InOperatorOptimization::OrderByColumnData do
let(:arel_table) { Issue.arel_table }
let(:column) do
Gitlab::Pagination::Keyset::ColumnOrderDefinition.new(
attribute_name: :id,
column_expression: arel_table[:id],
order_expression: arel_table[:id].desc
)
end
subject(:column_data) { described_class.new(column, 'column_alias', arel_table) }
describe '#arel_column' do
it 'delegates to column_expression' do
expect(column_data.arel_column).to eq(column.column_expression)
end
end
describe '#column_for_projection' do
it 'returns the expression with AS using the original column name' do
expect(column_data.column_for_projection.to_sql).to eq('"issues"."id" AS id')
end
end
describe '#projection' do
it 'returns the expression with AS using the specified column lias' do
expect(column_data.projection.to_sql).to eq('"issues"."id" AS column_alias')
end
end
end
......@@ -33,14 +33,14 @@ RSpec.describe Gitlab::Pagination::Keyset::InOperatorOptimization::QueryBuilder
]
end
shared_examples 'correct ordering examples' do
let(:iterator) do
Gitlab::Pagination::Keyset::Iterator.new(
scope: scope.limit(batch_size),
in_operator_optimization_options: in_operator_optimization_options
)
end
let(:iterator) do
Gitlab::Pagination::Keyset::Iterator.new(
scope: scope.limit(batch_size),
in_operator_optimization_options: in_operator_optimization_options
)
end
shared_examples 'correct ordering examples' do |opts = {}|
let(:all_records) do
all_records = []
iterator.each_batch(of: batch_size) do |records|
......@@ -49,8 +49,10 @@ RSpec.describe Gitlab::Pagination::Keyset::InOperatorOptimization::QueryBuilder
all_records
end
it 'returns records in correct order' do
expect(all_records).to eq(expected_order)
unless opts[:skip_finder_query_test]
it 'returns records in correct order' do
expect(all_records).to eq(expected_order)
end
end
context 'when not passing the finder query' do
......@@ -248,4 +250,57 @@ RSpec.describe Gitlab::Pagination::Keyset::InOperatorOptimization::QueryBuilder
expect { described_class.new(**options).execute }.to raise_error(/The order on the scope does not support keyset pagination/)
end
context 'when ordering by SQL expression' do
let(:order) do
# ORDER BY (id * 10), id
Gitlab::Pagination::Keyset::Order.build([
Gitlab::Pagination::Keyset::ColumnOrderDefinition.new(
attribute_name: 'id_multiplied_by_ten',
order_expression: Arel.sql('(id * 10)').asc,
sql_type: 'integer'
),
Gitlab::Pagination::Keyset::ColumnOrderDefinition.new(
attribute_name: :id,
order_expression: Issue.arel_table[:id].asc
)
])
end
let(:scope) { Issue.reorder(order) }
let(:expected_order) { issues.sort_by(&:id) }
let(:in_operator_optimization_options) do
{
array_scope: Project.where(namespace_id: top_level_group.self_and_descendants.select(:id)).select(:id),
array_mapping_scope: -> (id_expression) { Issue.where(Issue.arel_table[:project_id].eq(id_expression)) }
}
end
context 'when iterating records one by one' do
let(:batch_size) { 1 }
it_behaves_like 'correct ordering examples', skip_finder_query_test: true
end
context 'when iterating records with LIMIT 3' do
let(:batch_size) { 3 }
it_behaves_like 'correct ordering examples', skip_finder_query_test: true
end
context 'when passing finder query' do
let(:batch_size) { 3 }
it 'raises error, loading complete rows are not supported with SQL expressions' do
in_operator_optimization_options[:finder_query] = -> (_, _) { Issue.select(:id, '(id * 10)').where(id: -1) }
expect(in_operator_optimization_options[:finder_query]).not_to receive(:call)
expect do
iterator.each_batch(of: batch_size) { |records| records.to_a }
end.to raise_error /The "RecordLoaderStrategy" does not support/
end
end
end
end
......@@ -31,4 +31,41 @@ RSpec.describe Gitlab::Pagination::Keyset::InOperatorOptimization::Strategies::O
])
end
end
context 'when an SQL expression is given' do
context 'when the sql_type attribute is missing' do
let(:order) do
Gitlab::Pagination::Keyset::Order.build([
Gitlab::Pagination::Keyset::ColumnOrderDefinition.new(
attribute_name: 'id_times_ten',
order_expression: Arel.sql('id * 10').asc
)
])
end
let(:keyset_scope) { Project.order(order) }
it 'raises error' do
expect { strategy.initializer_columns }.to raise_error(Gitlab::Pagination::Keyset::SqlTypeMissingError)
end
end
context 'when the sql_type_attribute is present' do
let(:order) do
Gitlab::Pagination::Keyset::Order.build([
Gitlab::Pagination::Keyset::ColumnOrderDefinition.new(
attribute_name: 'id_times_ten',
order_expression: Arel.sql('id * 10').asc,
sql_type: 'integer'
)
])
end
let(:keyset_scope) { Project.order(order) }
it 'returns the initializer columns' do
expect(strategy.initializer_columns).to eq(['NULL::integer AS id_times_ten'])
end
end
end
end
Markdown is supported
0%
or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment