Commit 3584f7dd authored by Douwe Maan

Merge branch 'zero-downtime-migrations' into 'master'

Prepare for zero downtime migrations

See merge request !9976
parents 6ce2cc61 223d8a3d
...@@ -4,28 +4,53 @@ When writing migrations for GitLab, you have to take into account that ...@@ -4,28 +4,53 @@ When writing migrations for GitLab, you have to take into account that
these will be run by hundreds of thousands of organizations of all sizes, some with these will be run by hundreds of thousands of organizations of all sizes, some with
many years of data in their database. many years of data in their database.
In addition, having to take a server offline for an upgrade small or big is In addition, having to take a server offline for an upgrade small or big is a
a big burden for most organizations. For this reason it is important that your big burden for most organizations. For this reason it is important that your
migrations are written carefully, can be applied online and adhere to the style guide below. migrations are written carefully, can be applied online and adhere to the style
guide below.
Migrations should not require GitLab installations to be taken offline unless Migrations are **not** allowed to require GitLab installations to be taken
_absolutely_ necessary - see the ["What Requires Downtime?"](what_requires_downtime.md) offline unless _absolutely necessary_. Downtime assumptions should be based on
page. If a migration requires downtime, this should be clearly mentioned during the behaviour of a migration when performed using PostgreSQL, as various
the review process, as well as being documented in the monthly release post. For operations in MySQL may require downtime without there being alternatives.
more information, see the "Downtime Tagging" section below.
When downtime is necessary the migration has to be approved by:
1. The VP of Engineering
1. A Backend Lead
1. A Database Specialist
An up-to-date list of people holding these titles can be found at
<https://about.gitlab.com/team/>.
The document ["What Requires Downtime?"](what_requires_downtime.md) specifies
various database operations, whether they require downtime and how to
work around that whenever possible.
When writing your migrations, also consider that databases might have stale data When writing your migrations, also consider that databases might have stale data
or inconsistencies and guard for that. Try to make as little assumptions as possible or inconsistencies and guard for that. Try to make as few assumptions as
about the state of the database. possible about the state of the database.
Please don't depend on GitLab-specific code since it can change in future
versions. If needed copy-paste GitLab code into the migration to make it forward
compatible.
## Commit Guidelines
Please don't depend on GitLab specific code since it can change in future versions. Each migration **must** be added in its own commit with a descriptive commit
If needed copy-paste GitLab code into the migration to make it forward compatible. message. If a commit adds a migration it _should only_ include the migration and
any corresponding changes to `db/schema.rb`. This makes it easy to revert a
database migration without accidentally reverting other changes.
## Downtime Tagging ## Downtime Tagging
Every migration must specify if it requires downtime or not, and if it should Every migration must specify if it requires downtime or not, and if it should
require downtime it must also specify a reason for this. To do so, add the require downtime it must also specify a reason for this. This is required even
following two constants to the migration class' body: if 99% of the migrations won't require downtime as this makes it easier to find
the migrations that _do_ require downtime.
To tag a migration, add the following two constants to the migration class'
body:
* `DOWNTIME`: a boolean that when set to `true` indicates the migration requires * `DOWNTIME`: a boolean that when set to `true` indicates the migration requires
downtime. downtime.
...@@ -50,12 +75,53 @@ from a migration class. ...@@ -50,12 +75,53 @@ from a migration class.
## Reversibility ## Reversibility
Your migration should be reversible. This is very important, as it should Your migration **must be** reversible. This is very important, as it should
be possible to downgrade in case of a vulnerability or bugs. be possible to downgrade in case of a vulnerability or bugs.
In your migration, add a comment describing how the reversibility of the In your migration, add a comment describing how the reversibility of the
migration was tested. migration was tested.
## Multi Threading
Sometimes a migration might need to use multiple Ruby threads to speed up a
migration. For this to work your migration needs to include the module
`Gitlab::Database::MultiThreadedMigration`:
```ruby
class MyMigration < ActiveRecord::Migration
include Gitlab::Database::MigrationHelpers
include Gitlab::Database::MultiThreadedMigration
end
```
You can then use the method `with_multiple_threads` to perform work in separate
threads. For example:
```ruby
class MyMigration < ActiveRecord::Migration
include Gitlab::Database::MigrationHelpers
include Gitlab::Database::MultiThreadedMigration
def up
with_multiple_threads(4) do
disable_statement_timeout
# ...
end
end
end
```
Here the call to `disable_statement_timeout` will use the connection local to
the `with_multiple_threads` block, instead of re-using the global connection
pool. This ensures each thread has its own connection object, and won't time
out when trying to obtain one.
**NOTE:** PostgreSQL has a maximum number of connections that it allows. This
limit can vary from installation to installation. As a result it's recommended
you do not use more than 32 threads in a single migration. Usually 4-8 threads
should be more than enough.
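
The thread-local connection idea can be sketched without Rails. The following is a minimal imitation under stated assumptions: `FakeConnection`, `ToyMigration`, `CONNECTION_KEY` and the `Queue`-based pool are hypothetical stand-ins invented for illustration, not the helper added by this commit.

```ruby
# Toy stand-in for a database connection; not part of GitLab or Rails.
class FakeConnection
  attr_reader :id

  def initialize(id)
    @id = id
  end
end

class ToyMigration
  CONNECTION_KEY = :toy_thread_local_connection
  GLOBAL_CONNECTION = FakeConnection.new(:global)

  # Prefer the connection stored for the current thread and fall back to
  # the shared one when no thread-local connection is set.
  def connection
    Thread.current[CONNECTION_KEY] || GLOBAL_CONNECTION
  end

  # Starts `thread_count` threads, each holding its own connection while
  # the given block runs.
  def with_multiple_threads(thread_count)
    pool = Queue.new
    thread_count.times { |i| pool << FakeConnection.new(i) }

    threads = Array.new(thread_count) do
      Thread.new do
        conn = pool.pop
        begin
          Thread.current[CONNECTION_KEY] = conn
          yield
        ensure
          Thread.current[CONNECTION_KEY] = nil
        end
      end
    end

    threads.each(&:join)
  end
end

migration = ToyMigration.new
seen = Queue.new
migration.with_multiple_threads(4) { seen << migration.connection.id }
ids = Array.new(4) { seen.pop }.sort
```

Inside the block every thread sees a different connection, while code outside the block still gets the shared one.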
## Removing indices ## Removing indices
When removing an index make sure to use the method `remove_concurrent_index` instead When removing an index make sure to use the method `remove_concurrent_index` instead
...@@ -78,7 +144,10 @@ end ...@@ -78,7 +144,10 @@ end
## Adding indices ## Adding indices
If you need to add an unique index please keep in mind there is possibility of existing duplicates. If it is possible write a separate migration for handling this situation. It can be just removing or removing with overwriting all references to these duplicates depend on situation. If you need to add a unique index please keep in mind there is the possibility
of existing duplicates being present in the database. This means that you should
always _first_ add a migration that removes any duplicates, before adding the
unique index.
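
As a rough illustration of what "removing duplicates first" amounts to, here is a plain-Ruby sketch. The row data and the `duplicate_ids` helper are hypothetical; a real migration would do this in SQL against the actual table.

```ruby
# Returns the ids of rows that duplicate another row's value for `key`,
# keeping the row with the lowest id in every group.
def duplicate_ids(rows, key)
  rows
    .group_by { |row| row[key] }
    .values
    .flat_map { |group| group.map { |row| row[:id] }.sort.drop(1) }
end

rows = [
  { id: 1, name: 'ruby' },
  { id: 2, name: 'rails' },
  { id: 3, name: 'ruby' },
  { id: 4, name: 'ruby' }
]

to_delete = duplicate_ids(rows, :name)
remaining = rows.reject { |row| to_delete.include?(row[:id]) }
```

Only once `remaining` holds one row per name would a unique index on `name` be safe to add.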
When adding an index make sure to use the method `add_concurrent_index` instead When adding an index make sure to use the method `add_concurrent_index` instead
of the regular `add_index` method. The `add_concurrent_index` method of the regular `add_index` method. The `add_concurrent_index` method
...@@ -90,17 +159,22 @@ so: ...@@ -90,17 +159,22 @@ so:
```ruby ```ruby
class MyMigration < ActiveRecord::Migration class MyMigration < ActiveRecord::Migration
include Gitlab::Database::MigrationHelpers include Gitlab::Database::MigrationHelpers
disable_ddl_transaction! disable_ddl_transaction!
def change def up
add_concurrent_index :table, :column
end
def down
remove_index :table, :column if index_exists?(:table, :column)
end end
end end
``` ```
## Adding Columns With Default Values ## Adding Columns With Default Values
When adding columns with default values you should use the method When adding columns with default values you must use the method
`add_column_with_default`. This method ensures the table is updated without `add_column_with_default`. This method ensures the table is updated without
requiring downtime. This method is not reversible so you must manually define requiring downtime. This method is not reversible so you must manually define
the `up` and `down` methods in your migration class. the `up` and `down` methods in your migration class.
...@@ -123,6 +197,9 @@ class MyMigration < ActiveRecord::Migration ...@@ -123,6 +197,9 @@ class MyMigration < ActiveRecord::Migration
end end
``` ```
Keep in mind that this operation can easily take 10-15 minutes to complete on
larger installations (e.g. GitLab.com). As a result you should only add default
values if absolutely necessary.
## Integer column type ## Integer column type
...@@ -147,13 +224,15 @@ add_column(:projects, :foo, :integer, default: 10, limit: 8) ...@@ -147,13 +224,15 @@ add_column(:projects, :foo, :integer, default: 10, limit: 8)
## Testing ## Testing
Make sure that your migration works with MySQL and PostgreSQL with data. An empty database does not guarantee that your migration is correct. Make sure that your migration works with MySQL and PostgreSQL with data. An
empty database does not guarantee that your migration is correct.
Make sure your migration can be reversed. Make sure your migration can be reversed.
## Data migration ## Data migration
Please prefer Arel and plain SQL over usual ActiveRecord syntax. In case of using plain SQL you need to quote all input manually with `quote_string` helper. Please prefer Arel and plain SQL over usual ActiveRecord syntax. In case of
using plain SQL you need to quote all input manually with `quote_string` helper.
Example with Arel: Example with Arel:
...@@ -177,3 +256,17 @@ select_all("SELECT name, COUNT(id) as cnt FROM tags GROUP BY name HAVING COUNT(i ...@@ -177,3 +256,17 @@ select_all("SELECT name, COUNT(id) as cnt FROM tags GROUP BY name HAVING COUNT(i
execute("DELETE FROM tags WHERE id IN(#{duplicate_ids.join(",")})") execute("DELETE FROM tags WHERE id IN(#{duplicate_ids.join(",")})")
end end
``` ```
If you need more complex logic you can define and use models local to a
migration. For example:
```ruby
class MyMigration < ActiveRecord::Migration
class Project < ActiveRecord::Base
self.table_name = 'projects'
end
end
```
When doing so be sure to explicitly set the model's table name so it's not
derived from the class name or namespace.
...@@ -2,7 +2,8 @@ ...@@ -2,7 +2,8 @@
When working with a database certain operations can be performed without taking When working with a database certain operations can be performed without taking
GitLab offline, others do require a downtime period. This guide describes GitLab offline, others do require a downtime period. This guide describes
various operations and their impact. various operations, their impact, and how to perform them without requiring
downtime.
## Adding Columns ## Adding Columns
...@@ -41,50 +42,156 @@ information on how to use this method. ...@@ -41,50 +42,156 @@ information on how to use this method.
## Dropping Columns ## Dropping Columns
On PostgreSQL you can safely remove an existing column without the need for Removing columns is tricky because running GitLab processes may still be using
downtime. When you drop a column in PostgreSQL it's not immediately removed, the columns. To work around this you will need two separate merge requests and
instead it is simply disabled. The data is removed on the next vacuum run. releases: one to ignore and then remove the column, and one to remove the ignore
rule.
On MySQL this operation requires downtime. ### Step 1: Ignoring The Column
While database wise dropping a column may be fine on PostgreSQL this operation The first step is to ignore the column in the application code. This is
still requires downtime because the application code may still be using the necessary because Rails caches the columns and re-uses this cache in various
column that was removed. For example, consider the following migration: places. This can be done by including the `IgnorableColumn` module into the
model, followed by defining the columns to ignore. For example, to ignore
`updated_at` in the User model you'd use the following:
```ruby ```ruby
class MyMigration < ActiveRecord::Migration class User < ActiveRecord::Base
def change include IgnorableColumn
remove_column :projects, :dummy
end ignore_column :updated_at
end end
``` ```
Now imagine that the GitLab instance is running and actively uses the `dummy` Once added you should create a _post-deployment_ migration that removes the
column. If we were to run the migration this would result in the GitLab instance column. Both these changes should be submitted in the same merge request.
producing errors whenever it tries to use the `dummy` column.
As a result of the above downtime _is_ required when removing a column, even ### Step 2: Removing The Ignore Rule
when using PostgreSQL.
Once the changes from step 1 have been released & deployed you can set up a
separate merge request that removes the ignore rule. This merge request can
simply remove the `ignore_column` line, and the `include IgnorableColumn` line
if no other `ignore_column` calls remain.
## Renaming Columns ## Renaming Columns
Renaming columns requires downtime as running GitLab instances will continue Renaming columns the normal way requires downtime as an application may continue
using the old column name until a new version is deployed. This can result using the old column name during/after a database migration. To rename a column
in the instance producing errors, which in turn can impact the user experience. without requiring downtime we need two migrations: a regular migration, and a
post-deployment migration. Both these migrations can go in the same release.
## Changing Column Constraints ### Step 1: Add The Regular Migration
First we need to create the regular migration. This migration should use
`Gitlab::Database::MigrationHelpers#rename_column_concurrently` to perform the
renaming. For example:
```ruby
# A regular migration in db/migrate
class RenameUsersUpdatedAtToUpdatedAtTimestamp < ActiveRecord::Migration
include Gitlab::Database::MigrationHelpers
disable_ddl_transaction!
def up
rename_column_concurrently :users, :updated_at, :updated_at_timestamp
end
def down
cleanup_concurrent_column_rename :users, :updated_at_timestamp, :updated_at
end
end
```
This will take care of renaming the column, ensuring data stays in sync, copying
over indexes and foreign keys, etc.
**NOTE:** if a column has one or more indexes whose names do not contain the
name of the original column, the above procedure will fail. In this case you
will first need to rename these indexes.
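
The restriction in the note follows from how the name of the copied index is derived: the old column name is substituted inside the existing index name. A minimal sketch of that rule, using a hypothetical standalone helper `renamed_index_name` (the column names `tag_id` and `label_id` are made up for the example):

```ruby
# Derives the name of the copied index by replacing the old column name
# with the new one. Raises when the index name does not mention the old
# column, because there is then nothing to substitute.
def renamed_index_name(index_name, old_column, new_column)
  old_column = old_column.to_s
  new_column = new_column.to_s

  unless index_name.include?(old_column)
    raise ArgumentError,
          "The index #{index_name} does not mention #{old_column}; rename it first."
  end

  index_name.gsub(old_column, new_column)
end

new_name = renamed_index_name('index_users_on_updated_at',
                              :updated_at, :updated_at_timestamp)

begin
  renamed_index_name('ci_taggings_idx', :tag_id, :label_id)
  failed = false
rescue ArgumentError
  failed = true
end
```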
Generally changing column constraints requires checking all rows in the table to ### Step 2: Add A Post-Deployment Migration
see if they meet the new constraint, unless a constraint is _removed_. For
example, changing a column that previously allowed NULL values to not allow NULL
values requires the database to verify all existing rows.
The specific behaviour varies a bit between databases but in general the safest The renaming procedure requires some cleaning up in a post-deployment migration.
approach is to assume changing constraints requires downtime. We can perform this cleanup using
`Gitlab::Database::MigrationHelpers#cleanup_concurrent_column_rename`:
```ruby
# A post-deployment migration in db/post_migrate
class CleanupUsersUpdatedAtRename < ActiveRecord::Migration
include Gitlab::Database::MigrationHelpers
disable_ddl_transaction!
def up
cleanup_concurrent_column_rename :users, :updated_at, :updated_at_timestamp
end
def down
rename_column_concurrently :users, :updated_at_timestamp, :updated_at
end
end
```
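
Both migrations have to agree on the name of the triggers. In this commit the helper derives that name from a SHA256 digest of the table and column names, so the cleanup step can recompute it rather than store it. A plain-Ruby sketch of the idea (the helper itself uses ActiveSupport's `String#first`; `[0, 12]` is used here so the snippet runs without Rails):

```ruby
require 'digest'

# Returns the (base) trigger name for a rename of `old` to `new` on `table`.
def rename_trigger_name(table, old, new)
  'trigger_' + Digest::SHA256.hexdigest("#{table}_#{old}_#{new}")[0, 12]
end

installed = rename_trigger_name(:users, :updated_at, :updated_at_timestamp)
recomputed = rename_trigger_name('users', 'updated_at', 'updated_at_timestamp')
swapped = rename_trigger_name(:users, :updated_at_timestamp, :updated_at)
```

The name only matches when the table, old column and new column are passed in the same order as during the rename.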
## Changing Column Constraints
Adding or removing a NOT NULL clause (or another constraint) can typically be
done without requiring downtime. However, this does require that any application
changes are deployed _first_. Thus, changing the constraints of a column should
happen in a post-deployment migration.
## Changing Column Types ## Changing Column Types
This operation requires downtime. Changing the type of a column can be done using
`Gitlab::Database::MigrationHelpers#change_column_type_concurrently`. This
method works similarly to `rename_column_concurrently`. For example, let's say
we want to change the type of `users.username` from `string` to `text`.
### Step 1: Create A Regular Migration
A regular migration is used to create a new column with a temporary name along
with setting up some triggers to keep data in sync. Such a migration would look
as follows:
```ruby
# A regular migration in db/migrate
class ChangeUsersUsernameStringToText < ActiveRecord::Migration
include Gitlab::Database::MigrationHelpers
disable_ddl_transaction!
def up
change_column_type_concurrently :users, :username, :text
end
def down
cleanup_concurrent_column_type_change :users, :username
end
end
```
### Step 2: Create A Post Deployment Migration
Next we need to clean up our changes using a post-deployment migration:
```ruby
# A post-deployment migration in db/post_migrate
class ChangeUsersUsernameStringToTextCleanup < ActiveRecord::Migration
include Gitlab::Database::MigrationHelpers
disable_ddl_transaction!
def up
cleanup_concurrent_column_type_change :users, :username
end
def down
change_column_type_concurrently :users, :username, :string
end
end
```
And that's it, we're done!
## Adding Indexes ## Adding Indexes
...@@ -101,12 +208,19 @@ Migrations can take advantage of this by using the method ...@@ -101,12 +208,19 @@ Migrations can take advantage of this by using the method
```ruby ```ruby
class MyMigration < ActiveRecord::Migration class MyMigration < ActiveRecord::Migration
def change def up
add_concurrent_index :projects, :column_name add_concurrent_index :projects, :column_name
end end
def down
remove_index(:projects, :column_name) if index_exists?(:projects, :column_name)
end
end end
``` ```
Note that `add_concurrent_index` can not be reversed automatically, thus you
need to manually define `up` and `down`.
When running this on PostgreSQL the `CONCURRENTLY` option mentioned above is When running this on PostgreSQL the `CONCURRENTLY` option mentioned above is
used. On MySQL this method produces a regular `CREATE INDEX` query. used. On MySQL this method produces a regular `CREATE INDEX` query.
...@@ -125,43 +239,54 @@ This operation is safe as there's no code using the table just yet. ...@@ -125,43 +239,54 @@ This operation is safe as there's no code using the table just yet.
## Dropping Tables ## Dropping Tables
This operation requires downtime as application code may still be using the Dropping tables can be done safely using a post-deployment migration, but only
table. if the application no longer uses the table.
## Adding Foreign Keys ## Adding Foreign Keys
Adding foreign keys acquires an exclusive lock on both the source and target Adding foreign keys usually works in 3 steps:
tables in PostgreSQL. This requires downtime as otherwise the entire application
grinds to a halt for the duration of the operation. 1. Start a transaction
1. Run `ALTER TABLE` to add the constraint(s)
1. Check all existing data
On MySQL this operation also requires downtime _unless_ foreign key checks are Because `ALTER TABLE` typically acquires an exclusive lock until the end of a
disabled. Because this means checks aren't enforced this is not ideal, as such transaction this means this approach would require downtime.
one should assume MySQL also requires downtime.
GitLab allows you to work around this by using
`Gitlab::Database::MigrationHelpers#add_concurrent_foreign_key`. This method
ensures that when PostgreSQL is used no downtime is needed.
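
On PostgreSQL the approach is to add the constraint as `NOT VALID` and to validate existing rows in a separate statement. The sketch below only builds the two SQL strings; `foreign_key_statements` and the `fk_...` naming scheme are hypothetical, and real code must quote identifiers properly.

```ruby
# Builds the two PostgreSQL statements: first add the constraint without
# checking existing rows, then validate it in a separate step.
def foreign_key_statements(source, target, column:, on_delete: nil)
  key_name = "fk_#{source}_#{column}" # hypothetical naming scheme
  on_delete_clause = on_delete ? " ON DELETE #{on_delete}" : ''

  [
    "ALTER TABLE #{source} ADD CONSTRAINT #{key_name} " \
      "FOREIGN KEY (#{column}) REFERENCES #{target} (id)" \
      "#{on_delete_clause} NOT VALID;",
    "ALTER TABLE #{source} VALIDATE CONSTRAINT #{key_name};"
  ]
end

add_sql, validate_sql = foreign_key_statements(:issues, :projects,
                                               column: :project_id,
                                               on_delete: 'CASCADE')
plain_sql, = foreign_key_statements(:issues, :projects, column: :project_id)
```

Leaving out `on_delete` drops the `ON DELETE` clause entirely instead of emitting an empty one.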
## Removing Foreign Keys ## Removing Foreign Keys
This operation should not require downtime on both PostgreSQL and MySQL. This operation does not require downtime.
## Updating Data ## Data Migrations
Updating data should generally be safe. The exception to this is data that's Data migrations can be tricky. The usual approach to migrate data is to take a 3
being migrated from one version to another while the application still produces step approach:
data in the old version.
For example, imagine the application writes the string `'dog'` to a column but 1. Migrate the initial batch of data
it really is meant to write `'cat'` instead. One might think that the following 1. Deploy the application code
migration is all that is needed to solve this problem: 1. Migrate any remaining data
```ruby Usually this works, but not always. For example, if a field's format is to be
class MyMigration < ActiveRecord::Migration changed from JSON to something else we have a bit of a problem. If we were to
def up change existing data before deploying application code we'll most likely run
execute("UPDATE some_table SET column = 'cat' WHERE column = 'dog';") into errors. On the other hand, if we were to migrate after deploying the
end application code we could run into the same problems.
end
``` If you merely need to correct some invalid data, then a post-deployment
migration is usually enough. If you need to change the format of data (e.g. from
JSON to something else) it's typically best to add a new column for the new data
format, and have the application use that. In such a case the procedure would
be:
Unfortunately this is not enough. Because the application is still running and 1. Add a new column in the new format
using the old value this may result in the table still containing rows where 1. Copy over existing data to this new column
`column` is set to `dog`, even after the migration finished. 1. Deploy the application code
1. In a post-deployment migration, copy over any remaining data
In these cases downtime _is_ required, even for rarely updated tables. In general there is no one-size-fits-all solution, therefore it's best to
discuss these kinds of migrations in a merge request to make sure they are
implemented in the best way possible.
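
The new-column procedure can be walked through with an in-memory stand-in for a table. Everything here is hypothetical (the `settings_json` and `settings_list` columns, the `backfill` helper, the target format); it only shows why a second pass after deployment is needed.

```ruby
require 'json'

# In-memory stand-in for a table; `settings_json` is the old column.
rows = [
  { id: 1, settings_json: '{"theme":"dark"}' },
  { id: 2, settings_json: '{"theme":"light"}' }
]

# Copies data into the new column for every row that has not been
# migrated yet, so it is safe to run more than once.
def backfill(rows)
  rows.each do |row|
    next if row[:settings_list]

    pairs = JSON.parse(row[:settings_json]).map { |key, value| "#{key}=#{value}" }
    row[:settings_list] = pairs.join(',')
  end
end

# Steps 1 and 2: add the new column, then copy over existing data.
rows.each { |row| row[:settings_list] = nil }
backfill(rows)

# Step 3: while the new code is being deployed, an old process may still
# write a row that only fills the old column.
rows << { id: 3, settings_json: '{"theme":"blue"}', settings_list: nil }

# Step 4: the post-deployment pass picks up whatever is left.
backfill(rows)
```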
...@@ -48,6 +48,23 @@ GitLab provides official Docker images for both Community and Enterprise ...@@ -48,6 +48,23 @@ GitLab provides official Docker images for both Community and Enterprise
editions. They are based on the Omnibus package and instructions on how to editions. They are based on the Omnibus package and instructions on how to
update them are in [a separate document][omnidocker]. update them are in [a separate document][omnidocker].
## Upgrading without downtime
Starting with GitLab 9.1.0 it's possible to upgrade to a newer version of GitLab
without having to take your GitLab instance offline. However, for this to work
there are the following requirements:
1. You can only upgrade 1 release at a time. For example, if 9.1.15 is the last
release of 9.1 then you can safely upgrade from that version to 9.2.0.
However, if you are running 9.1.14 you first need to upgrade to 9.1.15.
2. You have to use [post-deployment
migrations](../development/post_deployment_migrations.md).
3. You are using PostgreSQL. If you are using MySQL you will still need downtime
when upgrading.
This applies to major, minor, and patch releases unless stated otherwise in a
release post.
## Upgrading between editions ## Upgrading between editions
GitLab comes in two flavors: [Community Edition][ce] which is MIT licensed, GitLab comes in two flavors: [Community Edition][ce] which is MIT licensed,
......
...@@ -89,7 +89,8 @@ module Gitlab ...@@ -89,7 +89,8 @@ module Gitlab
ADD CONSTRAINT #{key_name} ADD CONSTRAINT #{key_name}
FOREIGN KEY (#{column}) FOREIGN KEY (#{column})
REFERENCES #{target} (id) REFERENCES #{target} (id)
ON DELETE #{on_delete} NOT VALID; #{on_delete ? "ON DELETE #{on_delete}" : ''}
NOT VALID;
EOF EOF
# Validate the existing constraint. This can potentially take a very # Validate the existing constraint. This can potentially take a very
...@@ -258,6 +259,245 @@ module Gitlab ...@@ -258,6 +259,245 @@ module Gitlab
raise error raise error
end end
end end
# Renames a column without requiring downtime.
#
# Concurrent renames work by using database triggers to ensure both the
# old and new column are in sync. However, this method will _not_ remove
# the triggers or the old column automatically; this needs to be done
# manually in a post-deployment migration. This can be done using the
# method `cleanup_concurrent_column_rename`.
#
# table - The name of the database table containing the column.
# old - The old column name.
# new - The new column name.
# type - The type of the new column. If no type is given the old column's
# type is used.
def rename_column_concurrently(table, old, new, type: nil)
if transaction_open?
raise 'rename_column_concurrently can not be run inside a transaction'
end
trigger_name = rename_trigger_name(table, old, new)
quoted_table = quote_table_name(table)
quoted_old = quote_column_name(old)
quoted_new = quote_column_name(new)
if Database.postgresql?
install_rename_triggers_for_postgresql(trigger_name, quoted_table,
quoted_old, quoted_new)
else
install_rename_triggers_for_mysql(trigger_name, quoted_table,
quoted_old, quoted_new)
end
old_col = column_for(table, old)
new_type = type || old_col.type
add_column(table, new, new_type,
limit: old_col.limit,
default: old_col.default,
null: old_col.null,
precision: old_col.precision,
scale: old_col.scale)
update_column_in_batches(table, new, Arel::Table.new(table)[old])
copy_indexes(table, old, new)
copy_foreign_keys(table, old, new)
end
# Changes the type of a column concurrently.
#
# table - The table containing the column.
# column - The name of the column to change.
# new_type - The new column type.
def change_column_type_concurrently(table, column, new_type)
temp_column = "#{column}_for_type_change"
rename_column_concurrently(table, column, temp_column, type: new_type)
end
# Performs cleanup of a concurrent type change.
#
# table - The table containing the column.
# column - The name of the column to change.
def cleanup_concurrent_column_type_change(table, column)
temp_column = "#{column}_for_type_change"
transaction do
# This has to be performed in a transaction as otherwise we might have
# inconsistent data.
cleanup_concurrent_column_rename(table, column, temp_column)
rename_column(table, temp_column, column)
end
end
# Cleans up a concurrent column rename.
#
# This method takes care of removing previously installed triggers as well
# as removing the old column.
#
# table - The name of the database table.
# old - The name of the old column.
# new - The name of the new column.
def cleanup_concurrent_column_rename(table, old, new)
trigger_name = rename_trigger_name(table, old, new)
if Database.postgresql?
remove_rename_triggers_for_postgresql(table, trigger_name)
else
remove_rename_triggers_for_mysql(trigger_name)
end
remove_column(table, old)
end
# Installs the triggers necessary to perform a concurrent column rename on
# PostgreSQL.
def install_rename_triggers_for_postgresql(trigger, table, old, new)
execute <<-EOF.strip_heredoc
CREATE OR REPLACE FUNCTION #{trigger}()
RETURNS trigger AS
$BODY$
BEGIN
NEW.#{new} := NEW.#{old};
RETURN NEW;
END;
$BODY$
LANGUAGE 'plpgsql'
VOLATILE
EOF
execute <<-EOF.strip_heredoc
CREATE TRIGGER #{trigger}
BEFORE INSERT OR UPDATE
ON #{table}
FOR EACH ROW
EXECUTE PROCEDURE #{trigger}()
EOF
end
# Installs the triggers necessary to perform a concurrent column rename on
# MySQL.
def install_rename_triggers_for_mysql(trigger, table, old, new)
execute <<-EOF.strip_heredoc
CREATE TRIGGER #{trigger}_insert
BEFORE INSERT
ON #{table}
FOR EACH ROW
SET NEW.#{new} = NEW.#{old}
EOF
execute <<-EOF.strip_heredoc
CREATE TRIGGER #{trigger}_update
BEFORE UPDATE
ON #{table}
FOR EACH ROW
SET NEW.#{new} = NEW.#{old}
EOF
end
# Removes the triggers used for renaming a PostgreSQL column concurrently.
def remove_rename_triggers_for_postgresql(table, trigger)
execute("DROP TRIGGER #{trigger} ON #{table}")
execute("DROP FUNCTION #{trigger}()")
end
# Removes the triggers used for renaming a MySQL column concurrently.
def remove_rename_triggers_for_mysql(trigger)
execute("DROP TRIGGER #{trigger}_insert")
execute("DROP TRIGGER #{trigger}_update")
end
# Returns the (base) name to use for triggers when renaming columns.
def rename_trigger_name(table, old, new)
'trigger_' + Digest::SHA256.hexdigest("#{table}_#{old}_#{new}").first(12)
end
# Returns an Array containing the indexes for the given column
def indexes_for(table, column)
column = column.to_s
indexes(table).select { |index| index.columns.include?(column) }
end
# Returns an Array containing the foreign keys for the given column.
def foreign_keys_for(table, column)
column = column.to_s
foreign_keys(table).select { |fk| fk.column == column }
end
# Copies all indexes for the old column to a new column.
#
# table - The table containing the columns and indexes.
# old - The old column.
# new - The new column.
def copy_indexes(table, old, new)
old = old.to_s
new = new.to_s
indexes_for(table, old).each do |index|
new_columns = index.columns.map do |column|
column == old ? new : column
end
# This is necessary as we can't properly rename indexes such as
# "ci_taggings_idx".
unless index.name.include?(old)
raise "The index #{index.name} can not be copied as it does not "\
"mention the old column. You have to rename this index manually first."
end
name = index.name.gsub(old, new)
options = {
unique: index.unique,
name: name,
length: index.lengths,
order: index.orders
}
# These options are not supported by MySQL, so we only add them if
# they were previously set.
options[:using] = index.using if index.using
options[:where] = index.where if index.where
unless index.opclasses.blank?
opclasses = index.opclasses.dup
# Copy the operator classes for the old column (if any) to the new
# column.
opclasses[new] = opclasses.delete(old) if opclasses[old]
options[:opclasses] = opclasses
end
add_concurrent_index(table, new_columns, options)
end
end
# Copies all foreign keys for the old column to the new column.
#
# table - The table containing the columns and indexes.
# old - The old column.
# new - The new column.
def copy_foreign_keys(table, old, new)
foreign_keys_for(table, old).each do |fk|
add_concurrent_foreign_key(fk.from_table,
fk.to_table,
column: new,
on_delete: fk.on_delete)
end
end
# Returns the column for the given table and column name.
def column_for(table, name)
name = name.to_s
columns(table).find { |column| column.name == name }
end
end end
end end
end end
module Gitlab
module Database
module MultiThreadedMigration
MULTI_THREAD_AR_CONNECTION = :thread_local_ar_connection
# This overwrites the default connection method so that every thread can
# use a thread-local connection, while still supporting all of Rails'
# migration methods.
def connection
Thread.current[MULTI_THREAD_AR_CONNECTION] ||
ActiveRecord::Base.connection
end
# Starts a connection pool with N connections, along with N threads each
# using a single connection from that pool. The provided block is yielded
# from inside each thread.
#
# Example:
#
# with_multiple_threads(4) do
# execute('SELECT ...')
# end
#
# thread_count - The number of threads to start.
#
# join - When set to true this method will join the threads, blocking the
# caller until all threads have finished running.
#
# Returns an Array containing the started threads.
def with_multiple_threads(thread_count, join: true)
pool = Gitlab::Database.create_connection_pool(thread_count)
threads = Array.new(thread_count) do
Thread.new do
pool.with_connection do |connection|
begin
Thread.current[MULTI_THREAD_AR_CONNECTION] = connection
yield
ensure
Thread.current[MULTI_THREAD_AR_CONNECTION] = nil
end
end
end
end
threads.each(&:join) if join
threads
end
end
end
end
...@@ -338,4 +338,392 @@ describe Gitlab::Database::MigrationHelpers, lib: true do ...@@ -338,4 +338,392 @@ describe Gitlab::Database::MigrationHelpers, lib: true do
end end
end end
end end
describe '#rename_column_concurrently' do
context 'in a transaction' do
it 'raises RuntimeError' do
allow(model).to receive(:transaction_open?).and_return(true)
expect { model.rename_column_concurrently(:users, :old, :new) }.
to raise_error(RuntimeError)
end
end
context 'outside a transaction' do
let(:old_column) do
double(:column,
type: :integer,
limit: 8,
default: 0,
null: false,
precision: 5,
scale: 1)
end
let(:trigger_name) { model.rename_trigger_name(:users, :old, :new) }
before do
allow(model).to receive(:transaction_open?).and_return(false)
allow(model).to receive(:column_for).and_return(old_column)
# Since MySQL and PostgreSQL use different quoting styles we'll just
# stub the methods used for this to make testing easier.
allow(model).to receive(:quote_column_name) { |name| name.to_s }
allow(model).to receive(:quote_table_name) { |name| name.to_s }
end
context 'using MySQL' do
it 'renames a column concurrently' do
allow(Gitlab::Database).to receive(:postgresql?).and_return(false)
expect(model).to receive(:install_rename_triggers_for_mysql).
with(trigger_name, 'users', 'old', 'new')
expect(model).to receive(:add_column).
with(:users, :new, :integer,
limit: old_column.limit,
default: old_column.default,
null: old_column.null,
precision: old_column.precision,
scale: old_column.scale)
expect(model).to receive(:update_column_in_batches)
expect(model).to receive(:copy_indexes).with(:users, :old, :new)
expect(model).to receive(:copy_foreign_keys).with(:users, :old, :new)
model.rename_column_concurrently(:users, :old, :new)
end
end
context 'using PostgreSQL' do
it 'renames a column concurrently' do
allow(Gitlab::Database).to receive(:postgresql?).and_return(true)
expect(model).to receive(:install_rename_triggers_for_postgresql).
with(trigger_name, 'users', 'old', 'new')
expect(model).to receive(:add_column).
with(:users, :new, :integer,
limit: old_column.limit,
default: old_column.default,
null: old_column.null,
precision: old_column.precision,
scale: old_column.scale)
expect(model).to receive(:update_column_in_batches)
expect(model).to receive(:copy_indexes).with(:users, :old, :new)
expect(model).to receive(:copy_foreign_keys).with(:users, :old, :new)
model.rename_column_concurrently(:users, :old, :new)
end
end
end
end
describe '#cleanup_concurrent_column_rename' do
it 'cleans up the renaming procedure for PostgreSQL' do
allow(Gitlab::Database).to receive(:postgresql?).and_return(true)
expect(model).to receive(:remove_rename_triggers_for_postgresql).
with(:users, /trigger_.{12}/)
expect(model).to receive(:remove_column).with(:users, :old)
model.cleanup_concurrent_column_rename(:users, :old, :new)
end
it 'cleans up the renaming procedure for MySQL' do
allow(Gitlab::Database).to receive(:postgresql?).and_return(false)
expect(model).to receive(:remove_rename_triggers_for_mysql).
with(/trigger_.{12}/)
expect(model).to receive(:remove_column).with(:users, :old)
model.cleanup_concurrent_column_rename(:users, :old, :new)
end
end
describe '#change_column_type_concurrently' do
it 'changes the column type' do
expect(model).to receive(:rename_column_concurrently).
with('users', 'username', 'username_for_type_change', type: :text)
model.change_column_type_concurrently('users', 'username', :text)
end
end
describe '#cleanup_concurrent_column_type_change' do
it 'cleans up the type changing procedure' do
expect(model).to receive(:cleanup_concurrent_column_rename).
with('users', 'username', 'username_for_type_change')
expect(model).to receive(:rename_column).
with('users', 'username_for_type_change', 'username')
model.cleanup_concurrent_column_type_change('users', 'username')
end
end
describe '#install_rename_triggers_for_postgresql' do
it 'installs the triggers for PostgreSQL' do
expect(model).to receive(:execute).
with(/CREATE OR REPLACE FUNCTION foo()/m)
expect(model).to receive(:execute).
with(/CREATE TRIGGER foo/m)
model.install_rename_triggers_for_postgresql('foo', :users, :old, :new)
end
end
describe '#install_rename_triggers_for_mysql' do
it 'installs the triggers for MySQL' do
expect(model).to receive(:execute).
with(/CREATE TRIGGER foo_insert.+ON users/m)
expect(model).to receive(:execute).
with(/CREATE TRIGGER foo_update.+ON users/m)
model.install_rename_triggers_for_mysql('foo', :users, :old, :new)
end
end
describe '#remove_rename_triggers_for_postgresql' do
it 'removes the function and trigger' do
expect(model).to receive(:execute).with('DROP TRIGGER foo ON bar')
expect(model).to receive(:execute).with('DROP FUNCTION foo()')
model.remove_rename_triggers_for_postgresql('bar', 'foo')
end
end
describe '#remove_rename_triggers_for_mysql' do
it 'removes the triggers' do
expect(model).to receive(:execute).with('DROP TRIGGER foo_insert')
expect(model).to receive(:execute).with('DROP TRIGGER foo_update')
model.remove_rename_triggers_for_mysql('foo')
end
end
describe '#rename_trigger_name' do
it 'returns a String' do
expect(model.rename_trigger_name(:users, :foo, :bar)).
to match(/trigger_.{12}/)
end
end
describe '#indexes_for' do
it 'returns the indexes for a column' do
idx1 = double(:idx, columns: %w(project_id))
idx2 = double(:idx, columns: %w(user_id))
allow(model).to receive(:indexes).with('table').and_return([idx1, idx2])
expect(model.indexes_for('table', :user_id)).to eq([idx2])
end
end
describe '#foreign_keys_for' do
it 'returns the foreign keys for a column' do
fk1 = double(:fk, column: 'project_id')
fk2 = double(:fk, column: 'user_id')
allow(model).to receive(:foreign_keys).with('table').and_return([fk1, fk2])
expect(model.foreign_keys_for('table', :user_id)).to eq([fk2])
end
end
describe '#copy_indexes' do
context 'using a regular index using a single column' do
it 'copies the index' do
index = double(:index,
columns: %w(project_id),
name: 'index_on_issues_project_id',
using: nil,
where: nil,
opclasses: {},
unique: false,
lengths: [],
orders: [])
allow(model).to receive(:indexes_for).with(:issues, 'project_id').
and_return([index])
expect(model).to receive(:add_concurrent_index).
with(:issues,
%w(gl_project_id),
unique: false,
name: 'index_on_issues_gl_project_id',
length: [],
order: [])
model.copy_indexes(:issues, :project_id, :gl_project_id)
end
end
context 'using a regular index with multiple columns' do
it 'copies the index' do
index = double(:index,
columns: %w(project_id foobar),
name: 'index_on_issues_project_id_foobar',
using: nil,
where: nil,
opclasses: {},
unique: false,
lengths: [],
orders: [])
allow(model).to receive(:indexes_for).with(:issues, 'project_id').
and_return([index])
expect(model).to receive(:add_concurrent_index).
with(:issues,
%w(gl_project_id foobar),
unique: false,
name: 'index_on_issues_gl_project_id_foobar',
length: [],
order: [])
model.copy_indexes(:issues, :project_id, :gl_project_id)
end
end
context 'using an index with a WHERE clause' do
it 'copies the index' do
index = double(:index,
columns: %w(project_id),
name: 'index_on_issues_project_id',
using: nil,
where: 'foo',
opclasses: {},
unique: false,
lengths: [],
orders: [])
allow(model).to receive(:indexes_for).with(:issues, 'project_id').
and_return([index])
expect(model).to receive(:add_concurrent_index).
with(:issues,
%w(gl_project_id),
unique: false,
name: 'index_on_issues_gl_project_id',
length: [],
order: [],
where: 'foo')
model.copy_indexes(:issues, :project_id, :gl_project_id)
end
end
context 'using an index with a USING clause' do
it 'copies the index' do
index = double(:index,
columns: %w(project_id),
name: 'index_on_issues_project_id',
where: nil,
using: 'foo',
opclasses: {},
unique: false,
lengths: [],
orders: [])
allow(model).to receive(:indexes_for).with(:issues, 'project_id').
and_return([index])
expect(model).to receive(:add_concurrent_index).
with(:issues,
%w(gl_project_id),
unique: false,
name: 'index_on_issues_gl_project_id',
length: [],
order: [],
using: 'foo')
model.copy_indexes(:issues, :project_id, :gl_project_id)
end
end
context 'using an index with custom operator classes' do
it 'copies the index' do
index = double(:index,
columns: %w(project_id),
name: 'index_on_issues_project_id',
using: nil,
where: nil,
opclasses: { 'project_id' => 'bar' },
unique: false,
lengths: [],
orders: [])
allow(model).to receive(:indexes_for).with(:issues, 'project_id').
and_return([index])
expect(model).to receive(:add_concurrent_index).
with(:issues,
%w(gl_project_id),
unique: false,
name: 'index_on_issues_gl_project_id',
length: [],
order: [],
opclasses: { 'gl_project_id' => 'bar' })
model.copy_indexes(:issues, :project_id, :gl_project_id)
end
end
context 'using an index whose name does not contain the source column' do
it 'raises RuntimeError' do
index = double(:index,
columns: %w(project_id),
name: 'index_foobar_index',
using: nil,
where: nil,
opclasses: {},
unique: false,
lengths: [],
orders: [])
allow(model).to receive(:indexes_for).with(:issues, 'project_id').
and_return([index])
expect { model.copy_indexes(:issues, :project_id, :gl_project_id) }.
to raise_error(RuntimeError)
end
end
end
describe '#copy_foreign_keys' do
it 'copies foreign keys from one column to another' do
fk = double(:fk,
from_table: 'issues',
to_table: 'projects',
on_delete: :cascade)
allow(model).to receive(:foreign_keys_for).with(:issues, :project_id).
and_return([fk])
expect(model).to receive(:add_concurrent_foreign_key).
with('issues', 'projects', column: :gl_project_id, on_delete: :cascade)
model.copy_foreign_keys(:issues, :project_id, :gl_project_id)
end
end
describe '#column_for' do
it 'returns a column object for an existing column' do
column = model.column_for(:users, :id)
expect(column.name).to eq('id')
end
it 'returns nil when a column does not exist' do
expect(model.column_for(:users, :kittens)).to be_nil
end
end
end
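The `#copy_indexes` examples above pin down how the arguments for the copied index are derived from the original one. The sketch below restates those expectations as a plain Ruby function. It is an illustration inferred from the specs only, not the helper's actual implementation: `IndexSketch` and `copied_index_arguments` are hypothetical names, and the error message is made up.

```ruby
# Hypothetical value object with the same readers as the index doubles used
# in the specs above.
IndexSketch = Struct.new(:columns, :name, :using, :where, :opclasses,
                         :unique, :lengths, :orders, keyword_init: true)

# Returns the column list and the options Hash a copied index would be
# created with when column `old` is replaced by column `new`.
def copied_index_arguments(index, old, new)
  old = old.to_s
  new = new.to_s

  # A new name can only be derived when the old column is part of the name.
  unless index.name.include?(old)
    raise "index #{index.name} does not mention column #{old} in its name"
  end

  columns = index.columns.map { |column| column == old ? new : column }

  options = {
    unique: index.unique,
    name: index.name.gsub(old, new),
    length: index.lengths,
    order: index.orders
  }

  # Optional clauses are only passed along when the source index has them.
  options[:using] = index.using if index.using
  options[:where] = index.where if index.where

  unless index.opclasses.empty?
    options[:opclasses] =
      index.opclasses.each_with_object({}) do |(column, opclass), copy|
        copy[column == old ? new : column] = opclass
      end
  end

  [columns, options]
end
```

A bare `raise` with a String produces a `RuntimeError`, which matches the error class the last example above expects.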
require 'spec_helper'
describe Gitlab::Database::MultiThreadedMigration do
let(:migration) do
Class.new { include Gitlab::Database::MultiThreadedMigration }.new
end
describe '#connection' do
after do
Thread.current[described_class::MULTI_THREAD_AR_CONNECTION] = nil
end
it 'returns the thread-local connection if present' do
Thread.current[described_class::MULTI_THREAD_AR_CONNECTION] = 10
expect(migration.connection).to eq(10)
end
it 'returns the global connection if no thread-local connection was set' do
expect(migration.connection).to eq(ActiveRecord::Base.connection)
end
end
describe '#with_multiple_threads' do
it 'starts multiple threads and yields the supplied block in every thread' do
output = Queue.new
migration.with_multiple_threads(2) do
output << migration.connection.execute('SELECT 1')
end
expect(output.size).to eq(2)
end
it 'joins the threads when the join parameter is set' do
expect_any_instance_of(Thread).to receive(:join).and_call_original
migration.with_multiple_threads(1) { }
end
end
end
require 'spec_helper'
describe IgnorableColumn do
let :base_class do
Class.new do
def self.columns
# This method does not have access to "double"
[Struct.new(:name).new('id'), Struct.new(:name).new('title')]
end
end
end
let :model do
Class.new(base_class) do
include IgnorableColumn
end
end
describe '.columns' do
it 'returns the columns, excluding the ignored ones' do
model.ignore_column(:title)
expect(model.columns.map(&:name)).to eq(%w(id))
end
end
describe '.ignored_columns' do
it 'returns a Set' do
expect(model.ignored_columns).to be_an_instance_of(Set)
end
it 'returns the names of the ignored columns' do
model.ignore_column(:title)
expect(model.ignored_columns).to eq(Set.new(%w(title)))
end
end
end
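The `IgnorableColumn` concern itself is not shown in this part of the diff. A module that would satisfy the examples above might look roughly like the following sketch. It is an assumption written in plain Ruby (no `ActiveSupport::Concern`), and `IgnorableColumnSketch` is a hypothetical name rather than the code added by this merge request.

```ruby
require 'set'

# Hypothetical, dependency-free approximation of a concern that hides
# selected columns from a model's column list.
module IgnorableColumnSketch
  def self.included(base)
    base.extend(ClassMethods)
  end

  module ClassMethods
    # Returns the parent's columns minus the ignored ones.
    def columns
      super.reject { |column| ignored_columns.include?(column.name) }
    end

    # The Set of ignored column names, stored as Strings.
    def ignored_columns
      @ignored_columns ||= Set.new
    end

    # Marks one or more columns as ignored.
    def ignore_column(*names)
      ignored_columns.merge(names.map(&:to_s))
    end
  end
end
```

Extending the including class places `ClassMethods` ahead of the parent's singleton class in the lookup chain, which is why `super` inside `columns` reaches the parent's `columns` method.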