Commit d3d90778 authored by Sean McGivern's avatar Sean McGivern

Add a helper to rename a column using a background migration

This works the same way as change_column_type_using_background_migration, but
for renaming a column. It takes a table, not a relation, to match its concurrent
counterpart.

Also, generalise the cleanup migrations to reduce code duplication.
parent 4d6d1f05
...@@ -195,22 +195,22 @@ end ...@@ -195,22 +195,22 @@ end
And that's it, we're done! And that's it, we're done!
## Changing Column Types For Large Tables ## Changing The Schema For Large Tables
While `change_column_type_concurrently` can be used for changing the type of a While `change_column_type_concurrently` and `rename_column_concurrently` can be
column without downtime it doesn't work very well for large tables. Because all used for changing the schema of a table without downtime it doesn't work very
of the work happens in sequence the migration can take a very long time to well for large tables. Because all of the work happens in sequence the migration
complete, preventing a deployment from proceeding. can take a very long time to complete, preventing a deployment from proceeding.
`change_column_type_concurrently` can also produce a lot of pressure on the They can also produce a lot of pressure on the database due to it rapidly
database due to it rapidly updating many rows in sequence. updating many rows in sequence.
To reduce database pressure you should instead use To reduce database pressure you should instead use
`change_column_type_using_background_migration` when migrating a column in a `change_column_type_using_background_migration` or `rename_column_concurrently`
large table (e.g. `issues`). This method works similar to when migrating a column in a large table (e.g. `issues`). These methods work
`change_column_type_concurrently` but uses background migration to spread the similarly to the concurrent counterparts but uses background migration to spread
work / load over a longer time period, without slowing down deployments. the work / load over a longer time period, without slowing down deployments.
Usage of this method is fairly simple: For example, to change the column type using a background migration:
```ruby ```ruby
class ExampleMigration < ActiveRecord::Migration class ExampleMigration < ActiveRecord::Migration
...@@ -296,6 +296,15 @@ class MigrateRemainingIssuesClosedAt < ActiveRecord::Migration ...@@ -296,6 +296,15 @@ class MigrateRemainingIssuesClosedAt < ActiveRecord::Migration
end end
``` ```
The same applies to `rename_column_using_background_migration`:
1. Create a migration using the helper, which will schedule background
migrations to spread the writes over a longer period of time.
2. In the next monthly release, create a clean-up migration to steal from the
Sidekiq queues, migrate any missing rows, and cleanup the rename. This
migration should skip the steps after stealing from the Sidekiq queues if the
column has already been renamed.
For more information, see [the documentation on cleaning up background For more information, see [the documentation on cleaning up background
migrations](background_migrations.md#cleaning-up). migrations](background_migrations.md#cleaning-up).
......
# frozen_string_literal: true
module Gitlab
module BackgroundMigration
# Background migration for cleaning up a concurrent column rename.
class CleanupConcurrentRename < CleanupConcurrentSchemaChange
RESCHEDULE_DELAY = 10.minutes
def cleanup_concurrent_schema_change(table, old_column, new_column)
cleanup_concurrent_column_rename(table, old_column, new_column)
end
end
end
end
# frozen_string_literal: true
module Gitlab
module BackgroundMigration
# Base class for cleaning up concurrent schema changes.
class CleanupConcurrentSchemaChange
include Database::MigrationHelpers
# table - The name of the table the migration is performed for.
# old_column - The name of the old (to drop) column.
# new_column - The name of the new column.
def perform(table, old_column, new_column)
return unless column_exists?(table, new_column)
rows_to_migrate = define_model_for(table)
.where(new_column => nil)
.where
.not(old_column => nil)
if rows_to_migrate.any?
BackgroundMigrationWorker.perform_in(
RESCHEDULE_DELAY,
self.class.name,
[table, old_column, new_column]
)
else
cleanup_concurrent_schema_change(table, old_column, new_column)
end
end
# These methods are necessary so we can re-use the migration helpers in
# this class.
def connection
ActiveRecord::Base.connection
end
def method_missing(name, *args, &block)
connection.__send__(name, *args, &block) # rubocop: disable GitlabSecurity/PublicSend
end
def respond_to_missing?(*args)
connection.respond_to?(*args) || super
end
def define_model_for(table)
Class.new(ActiveRecord::Base) do
self.table_name = table
end
end
end
end
end
...@@ -2,52 +2,12 @@ ...@@ -2,52 +2,12 @@
module Gitlab module Gitlab
module BackgroundMigration module BackgroundMigration
# Background migration for cleaning up a concurrent column rename. # Background migration for cleaning up a concurrent column type changeb.
class CleanupConcurrentTypeChange class CleanupConcurrentTypeChange < CleanupConcurrentSchemaChange
include Database::MigrationHelpers
RESCHEDULE_DELAY = 10.minutes RESCHEDULE_DELAY = 10.minutes
# table - The name of the table the migration is performed for. def cleanup_concurrent_schema_change(table, old_column, new_column)
# old_column - The name of the old (to drop) column. cleanup_concurrent_column_type_change(table, old_column)
# new_column - The name of the new column.
def perform(table, old_column, new_column)
return unless column_exists?(:issues, new_column)
rows_to_migrate = define_model_for(table)
.where(new_column => nil)
.where
.not(old_column => nil)
if rows_to_migrate.any?
BackgroundMigrationWorker.perform_in(
RESCHEDULE_DELAY,
'CleanupConcurrentTypeChange',
[table, old_column, new_column]
)
else
cleanup_concurrent_column_type_change(table, old_column)
end
end
# These methods are necessary so we can re-use the migration helpers in
# this class.
def connection
ActiveRecord::Base.connection
end
def method_missing(name, *args, &block)
connection.__send__(name, *args, &block) # rubocop: disable GitlabSecurity/PublicSend
end
def respond_to_missing?(*args)
connection.respond_to?(*args) || super
end
def define_model_for(table)
Class.new(ActiveRecord::Base) do
self.table_name = table
end
end end
end end
end end
......
...@@ -596,6 +596,97 @@ module Gitlab ...@@ -596,6 +596,97 @@ module Gitlab
end end
end end
# Renames a column using a background migration.
#
# Because this method uses a background migration it's more suitable for
# large tables. For small tables it's better to use
# `rename_column_concurrently` since it can complete its work in a much
# shorter amount of time and doesn't rely on Sidekiq.
#
# Example usage:
#
# rename_column_using_background_migration(
# :users,
# :feed_token,
# :rss_token
# )
#
# table - The name of the database table containing the column.
#
# old - The old column name.
#
# new - The new column name.
#
# type - The type of the new column. If no type is given the old column's
# type is used.
#
# batch_size - The number of rows to schedule in a single background
# migration.
#
# interval - The time interval between every background migration.
def rename_column_using_background_migration(
table,
old_column,
new_column,
type: nil,
batch_size: 10_000,
interval: 10.minutes
)
check_trigger_permissions!(table)
old_col = column_for(table, old_column)
new_type = type || old_col.type
max_index = 0
add_column(table, new_column, new_type,
limit: old_col.limit,
precision: old_col.precision,
scale: old_col.scale)
# We set the default value _after_ adding the column so we don't end up
# updating any existing data with the default value. This isn't
# necessary since we copy over old values further down.
change_column_default(table, new_column, old_col.default) if old_col.default
install_rename_triggers(table, old_column, new_column)
model = Class.new(ActiveRecord::Base) do
self.table_name = table
include ::EachBatch
end
# Schedule the jobs that will copy the data from the old column to the
# new one. Rows with NULL values in our source column are skipped since
# the target column is already NULL at this point.
model.where.not(old_column => nil).each_batch(of: batch_size) do |batch, index|
start_id, end_id = batch.pluck('MIN(id), MAX(id)').first
max_index = index
BackgroundMigrationWorker.perform_in(
index * interval,
'CopyColumn',
[table, old_column, new_column, start_id, end_id]
)
end
# Schedule the renaming of the column to happen (initially) 1 hour after
# the last batch finished.
BackgroundMigrationWorker.perform_in(
(max_index * interval) + 1.hour,
'CleanupConcurrentRename',
[table, old_column, new_column]
)
if perform_background_migration_inline?
# To ensure the schema is up to date immediately we perform the
# migration inline in dev / test environments.
Gitlab::BackgroundMigration.steal('CopyColumn')
Gitlab::BackgroundMigration.steal('CleanupConcurrentRename')
end
end
def perform_background_migration_inline? def perform_background_migration_inline?
Rails.env.test? || Rails.env.development? Rails.env.test? || Rails.env.development?
end end
......
...@@ -1178,6 +1178,61 @@ describe Gitlab::Database::MigrationHelpers do ...@@ -1178,6 +1178,61 @@ describe Gitlab::Database::MigrationHelpers do
end end
end end
describe '#rename_column_using_background_migration' do
let!(:issue) { create(:issue, :closed, closed_at: Time.zone.now) }
it 'renames a column using a background migration' do
expect(model)
.to receive(:add_column)
.with(
'issues',
:closed_at_timestamp,
:datetime_with_timezone,
limit: anything,
precision: anything,
scale: anything
)
expect(model)
.to receive(:install_rename_triggers)
.with('issues', :closed_at, :closed_at_timestamp)
expect(BackgroundMigrationWorker)
.to receive(:perform_in)
.ordered
.with(
10.minutes,
'CopyColumn',
['issues', :closed_at, :closed_at_timestamp, issue.id, issue.id]
)
expect(BackgroundMigrationWorker)
.to receive(:perform_in)
.ordered
.with(
1.hour + 10.minutes,
'CleanupConcurrentRename',
['issues', :closed_at, :closed_at_timestamp]
)
expect(Gitlab::BackgroundMigration)
.to receive(:steal)
.ordered
.with('CopyColumn')
expect(Gitlab::BackgroundMigration)
.to receive(:steal)
.ordered
.with('CleanupConcurrentRename')
model.rename_column_using_background_migration(
'issues',
:closed_at,
:closed_at_timestamp
)
end
end
describe '#perform_background_migration_inline?' do describe '#perform_background_migration_inline?' do
it 'returns true in a test environment' do it 'returns true in a test environment' do
allow(Rails.env) allow(Rails.env)
......
Markdown is supported
0%
or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment