Skip to content
Projects
Groups
Snippets
Help
Loading...
Help
Support
Keyboard shortcuts
?
Submit feedback
Contribute to GitLab
Sign in / Register
Toggle navigation
G
gitlab-ce
Project overview
Project overview
Details
Activity
Releases
Repository
Repository
Files
Commits
Branches
Tags
Contributors
Graph
Compare
Issues
0
Issues
0
List
Boards
Labels
Milestones
Merge Requests
1
Merge Requests
1
Analytics
Analytics
Repository
Value Stream
Wiki
Wiki
Snippets
Snippets
Members
Members
Collapse sidebar
Close sidebar
Activity
Graph
Create a new issue
Commits
Issue Boards
Open sidebar
nexedi
gitlab-ce
Commits
530115f9
Commit
530115f9
authored
Aug 27, 2019
by
GitLab Bot
Browse files
Options
Browse Files
Download
Plain Diff
Automatic merge of gitlab-org/gitlab-ce master
parents
c83296ec
190dba6d
Changes
2
Hide whitespace changes
Inline
Side-by-side
Showing
2 changed files
with
99 additions
and
44 deletions
+99
-44
doc/development/database_review.md
doc/development/database_review.md
+1
-1
doc/development/migration_style_guide.md
doc/development/migration_style_guide.md
+98
-43
No files found.
doc/development/database_review.md
View file @
530115f9
...
@@ -91,7 +91,7 @@ and details for a database reviewer:
...
@@ -91,7 +91,7 @@ and details for a database reviewer:
concurrent index/foreign key helpers (with transactions disabled)
concurrent index/foreign key helpers (with transactions disabled)
-
Check consistency with
`db/schema.rb`
and that migrations are
[
reversible
](
migration_style_guide.md#reversibility
)
-
Check consistency with
`db/schema.rb`
and that migrations are
[
reversible
](
migration_style_guide.md#reversibility
)
-
Check queries timing (If any): Queries executed in a migration
-
Check queries timing (If any): Queries executed in a migration
need to fit comfortabl
e
within
`15s`
- preferably much less than that - on GitLab.com.
need to fit comfortabl
y
within
`15s`
- preferably much less than that - on GitLab.com.
-
Check
[
background migrations
](
background_migrations.md
)
:
-
Check
[
background migrations
](
background_migrations.md
)
:
-
For data migrations, establish a time estimate for execution
-
For data migrations, establish a time estimate for execution
-
They should only be used when migrating data in larger tables.
-
They should only be used when migrating data in larger tables.
...
...
doc/development/migration_style_guide.md
View file @
530115f9
# Migration Style Guide
# Migration Style Guide
When writing migrations for GitLab, you have to take into account that
When writing migrations for GitLab, you have to take into account that
these will be r
a
n by hundreds of thousands of organizations of all sizes, some with
these will be r
u
n by hundreds of thousands of organizations of all sizes, some with
many years of data in their database.
many years of data in their database.
In addition, having to take a server offline for an upgrade small or big is a
In addition, having to take a server offline for an upgrade small or big is a
big burden for most organizations. For this reason it is important that your
big burden for most organizations. For this reason
,
it is important that your
migrations are written carefully, can be applied online and adhere to the style
migrations are written carefully, can be applied online
,
and adhere to the style
guide below.
guide below.
Migrations are
**not**
allowed to require GitLab installations to be taken
Migrations are
**not**
allowed to require GitLab installations to be taken
...
@@ -85,7 +85,38 @@ be possible to downgrade in case of a vulnerability or bugs.
...
@@ -85,7 +85,38 @@ be possible to downgrade in case of a vulnerability or bugs.
In your migration, add a comment describing how the reversibility of the
In your migration, add a comment describing how the reversibility of the
migration was tested.
migration was tested.
## Multi Threading
## Atomicity
By default, migrations are single transaction. That is, a transaction is opened
at the beginning of the migration, and committed after all steps are processed.
Running migrations in a single transaction makes sure that if one of the steps fails,
none of the steps will be executed, leaving the database in valid state.
Therefore, either:
-
Put all migrations in one single-transaction migration.
-
If necessary, put most actions in one migration and create a separate migration
for the steps that cannot be done in a single transaction.
For example, if you create an empty table and need to build an index for it,
it is recommended to use a regular single-transaction migration and the default
rails schema statement:
[
`add_index`
](
https://api.rubyonrails.org/v5.2/classes/ActiveRecord/ConnectionAdapters/SchemaStatements.html#method-i-add_index
)
.
This is a blocking operation, but it won't cause problems because the table is not yet used,
and therefore it does not have any records yet.
## Heavy operations in a single transaction
When using a single-transaction migration, a transaction will hold on a database connection
for the duration of the migration, so you must make sure the actions in the migration
do not take too much time: In general, queries executed in a migration need to fit comfortably
within
`15s`
on GitLab.com.
In case you need to insert, update, or delete a significant amount of data, you:
-
Must disable the single transaction with
`disable_ddl_transaction!`
.
-
Should consider doing it in a
[
Background Migration
](
background_migrations.md
)
.
## Multi-Threading
Sometimes a migration might need to use multiple Ruby threads to speed up a
Sometimes a migration might need to use multiple Ruby threads to speed up a
migration. For this to work your migration needs to include the module
migration. For this to work your migration needs to include the module
...
@@ -122,16 +153,16 @@ pool. This ensures each thread has its own connection object, and won't time
...
@@ -122,16 +153,16 @@ pool. This ensures each thread has its own connection object, and won't time
out when trying to obtain one.
out when trying to obtain one.
**NOTE:**
PostgreSQL has a maximum amount of connections that it allows. This
**NOTE:**
PostgreSQL has a maximum amount of connections that it allows. This
limit can vary from installation to installation. As a result it's recommended
limit can vary from installation to installation. As a result
,
it's recommended
you do not use more than 32 threads in a single migration. Usually 4-8 threads
you do not use more than 32 threads in a single migration. Usually
,
4-8 threads
should be more than enough.
should be more than enough.
## Removing indexes
## Removing indexes
When removing an index make sure to use the method
`remove_concurrent_index`
instea
d
If the table is not empty when removing an index, make sure to use the metho
d
of the regular
`remove_index`
method. The
`remove_concurrent_index`
method
`remove_concurrent_index`
instead of the regular
`remove_index`
method.
automatically drops concurrent indexes when using PostgreSQL, removing the
The
`remove_concurrent_index`
method drops indexes concurrently, so no locking is required,
need for downtime. To use this method
you must disable single-transaction mode
and there is no need for downtime. To use this method,
you must disable single-transaction mode
by calling the method
`disable_ddl_transaction!`
in the body of your migration
by calling the method
`disable_ddl_transaction!`
in the body of your migration
class like so:
class like so:
...
@@ -149,19 +180,25 @@ end
...
@@ -149,19 +180,25 @@ end
Note that it is not necessary to check if the index exists prior to
Note that it is not necessary to check if the index exists prior to
removing it.
removing it.
For a small table (such as an empty one or one with less than
`1,000`
records),
it is recommended to use
`remove_index`
in a single-transaction migration,
combining it with other operations that don't require
`disable_ddl_transaction!`
.
## Adding indexes
## Adding indexes
If you need to add a unique index please keep in mind there is the possibility
If you need to add a unique index
,
please keep in mind there is the possibility
of existing duplicates being present in the database. This means that should
of existing duplicates being present in the database. This means that should
always _first_ add a migration that removes any duplicates, before adding the
always _first_ add a migration that removes any duplicates, before adding the
unique index.
unique index.
When adding an index make sure to use the method
`add_concurrent_index`
instead
When adding an index to a non-empty table make sure to use the method
of the regular
`add_index`
method. The
`add_concurrent_index`
method
`add_concurrent_index`
instead of the regular
`add_index`
method.
automatically creates concurrent indexes when using PostgreSQL, removing the
The
`add_concurrent_index`
method automatically creates concurrent indexes
need for downtime. To use this method you must disable transactions by calling
when using PostgreSQL, removing the need for downtime.
the method
`disable_ddl_transaction!`
in the body of your migration class like
so:
To use this method, you must disable single-transactions mode
by calling the method
`disable_ddl_transaction!`
in the body of your migration
class like so:
```
ruby
```
ruby
class
MyMigration
<
ActiveRecord
::
Migration
[
4.2
]
class
MyMigration
<
ActiveRecord
::
Migration
[
4.2
]
...
@@ -179,16 +216,20 @@ class MyMigration < ActiveRecord::Migration[4.2]
...
@@ -179,16 +216,20 @@ class MyMigration < ActiveRecord::Migration[4.2]
end
end
```
```
For a small table (such as an empty one or one with less than
`1,000`
records),
it is recommended to use
`add_index`
in a single-transaction migration, combining it with other
operations that don't require
`disable_ddl_transaction!`
.
## Adding foreign-key constraints
## Adding foreign-key constraints
When adding a foreign-key constraint to either an existing or
new
When adding a foreign-key constraint to either an existing or
a new column also
column remember to also add a
index on the column.
remember to add an
index on the column.
This is
**required**
for all foreign-keys, e.g., to support efficient cascading
This is
**required**
for all foreign-keys, e.g., to support efficient cascading
deleting: when a lot of rows in a table get deleted, the referenced records need
deleting: when a lot of rows in a table get deleted, the referenced records need
to be deleted too. The database has to look for corresponding records in the
to be deleted too. The database has to look for corresponding records in the
referenced table. Without an index, this will result in a sequential scan on the
referenced table. Without an index, this will result in a sequential scan on the
table which can take a long time.
table
,
which can take a long time.
Here's an example where we add a new column with a foreign key
Here's an example where we add a new column with a foreign key
constraint. Note it includes
`index: true`
to create an index for it.
constraint. Note it includes
`index: true`
to create an index for it.
...
@@ -202,13 +243,17 @@ class Migration < ActiveRecord::Migration[4.2]
...
@@ -202,13 +243,17 @@ class Migration < ActiveRecord::Migration[4.2]
end
end
```
```
When adding a foreign-key constraint to an existing column
, we
When adding a foreign-key constraint to an existing column
in a non-empty table,
have to employ
`add_concurrent_foreign_key`
and
`add_concurrent_index`
we
have to employ
`add_concurrent_foreign_key`
and
`add_concurrent_index`
instead of
`add_reference`
.
instead of
`add_reference`
.
For an empty table (such as a fresh one), it is recommended to use
`add_reference`
in a single-transaction migration, combining it with other
operations that don't require
`disable_ddl_transaction!`
.
## Adding Columns With Default Values
## Adding Columns With Default Values
When adding columns with default values
you must use the method
When adding columns with default values
to non-empty tables, you must use
`add_column_with_default`
. This method ensures the table is updated without
`add_column_with_default`
. This method ensures the table is updated without
requiring downtime. This method is not reversible so you must manually define
requiring downtime. This method is not reversible so you must manually define
the
`up`
and
`down`
methods in your migration class.
the
`up`
and
`down`
methods in your migration class.
...
@@ -232,10 +277,14 @@ end
...
@@ -232,10 +277,14 @@ end
```
```
Keep in mind that this operation can easily take 10-15 minutes to complete on
Keep in mind that this operation can easily take 10-15 minutes to complete on
larger installations (e.g. GitLab.com). As a result you should only add default
larger installations (e.g. GitLab.com). As a result, you should only add
values if absolutely necessary. There is a RuboCop cop that will fail if this
default values if absolutely necessary. There is a RuboCop cop that will fail if
method is used on some tables that are very large on GitLab.com, which would
this method is used on some tables that are very large on GitLab.com, which
cause other issues.
would cause other issues.
For a small table (such as an empty one or one with less than
`1,000`
records),
use
`add_column`
and
`change_column_default`
in a single-transaction migration,
combining it with other operations that don't require
`disable_ddl_transaction!`
.
## Updating an existing column
## Updating an existing column
...
@@ -253,8 +302,10 @@ update_column_in_batches(:projects, :foo, 10) do |table, query|
...
@@ -253,8 +302,10 @@ update_column_in_batches(:projects, :foo, 10) do |table, query|
end
end
```
```
To perform a computed update, the value can be wrapped in
`Arel.sql`
, so Arel
If a computed update is needed, the value can be wrapped in
`Arel.sql`
, so Arel
treats it as an SQL literal. The below example is the same as the one above, but
treats it as an SQL literal. It's also a required deprecation for
[
Rails 6
](
https://gitlab.com/gitlab-org/gitlab-ce/issues/61451
)
.
The below example is the same as the one above, but
the value is set to the product of the
`bar`
and
`baz`
columns:
the value is set to the product of the
`bar`
and
`baz`
columns:
```
ruby
```
ruby
...
@@ -275,12 +326,12 @@ staging environment - or asking someone else to do so for you - beforehand.
...
@@ -275,12 +326,12 @@ staging environment - or asking someone else to do so for you - beforehand.
By default, an integer column can hold up to a 4-byte (32-bit) number. That is
By default, an integer column can hold up to a 4-byte (32-bit) number. That is
a max value of 2,147,483,647. Be aware of this when creating a column that will
a max value of 2,147,483,647. Be aware of this when creating a column that will
hold file sizes in byte units. If you are tracking file size in bytes this
hold file sizes in byte units. If you are tracking file size in bytes
,
this
restricts the maximum file size to just over 2GB.
restricts the maximum file size to just over 2GB.
To allow an integer column to hold up to an 8-byte (64-bit) number, explicitly
To allow an integer column to hold up to an 8-byte (64-bit) number, explicitly
set the limit to 8-bytes. This will allow the column to hold a value up to
set the limit to 8-bytes. This will allow the column to hold a value up to
9,223,372,036,854,775,807
.
`9,223,372,036,854,775,807`
.
Rails migration example:
Rails migration example:
...
@@ -294,9 +345,11 @@ add_column(:projects, :foo, :integer, default: 10, limit: 8)
...
@@ -294,9 +345,11 @@ add_column(:projects, :foo, :integer, default: 10, limit: 8)
## Timestamp column type
## Timestamp column type
By default, Rails uses the
`timestamp`
data type that stores timestamp data without timezone information.
By default, Rails uses the
`timestamp`
data type that stores timestamp data
The
`timestamp`
data type is used by calling either the
`add_timestamps`
or the
`timestamps`
method.
without timezone information. The
`timestamp`
data type is used by calling
Also Rails converts the
`:datetime`
data type to the
`timestamp`
one.
either the
`add_timestamps`
or the
`timestamps`
method.
Also, Rails converts the
`:datetime`
data type to the
`timestamp`
one.
Example:
Example:
...
@@ -317,14 +370,16 @@ def up
...
@@ -317,14 +370,16 @@ def up
end
end
```
```
Instead of using these methods one should use the following methods to store timestamps with timezones:
Instead of using these methods, one should use the following methods to store
timestamps with timezones:
-
`add_timestamps_with_timezone`
-
`add_timestamps_with_timezone`
-
`timestamps_with_timezone`
-
`timestamps_with_timezone`
This ensures all timestamps have a time zone specified. This in turn means existing timestamps won't
This ensures all timestamps have a time zone specified. This, in turn, means
suddenly use a different timezone when the system's timezone changes. It also makes it very clear which
existing timestamps won't suddenly use a different timezone when the system's
timezone was used in the first place.
timezone changes. It also makes it very clear which timezone was used in the
first place.
## Storing JSON in database
## Storing JSON in database
...
@@ -359,7 +414,7 @@ Make sure your migration can be reversed.
...
@@ -359,7 +414,7 @@ Make sure your migration can be reversed.
## Data migration
## Data migration
Please prefer Arel and plain SQL over usual ActiveRecord syntax. In case of
Please prefer Arel and plain SQL over usual ActiveRecord syntax. In case of
using plain SQL you need to quote all input manually with
`quote_string`
helper.
using plain SQL
,
you need to quote all input manually with
`quote_string`
helper.
Example with Arel:
Example with Arel:
...
@@ -384,7 +439,7 @@ select_all("SELECT name, COUNT(id) as cnt FROM tags GROUP BY name HAVING COUNT(i
...
@@ -384,7 +439,7 @@ select_all("SELECT name, COUNT(id) as cnt FROM tags GROUP BY name HAVING COUNT(i
end
end
```
```
If you need more complex logic you can define and use models local to a
If you need more complex logic
,
you can define and use models local to a
migration. For example:
migration. For example:
```
ruby
```
ruby
...
@@ -395,13 +450,13 @@ class MyMigration < ActiveRecord::Migration[4.2]
...
@@ -395,13 +450,13 @@ class MyMigration < ActiveRecord::Migration[4.2]
end
end
```
```
When doing so be sure to explicitly set the model's table name so it's not
When doing so be sure to explicitly set the model's table name
,
so it's not
derived from the class name or namespace.
derived from the class name or namespace.
### Renaming reserved paths
### Renaming reserved paths
When a new route for projects is introduced
tha
t could conflict with any
When a new route for projects is introduced
, i
t could conflict with any
existing records. The path for th
is
records should be renamed, and the
existing records. The path for th
ese
records should be renamed, and the
related data should be moved on disk.
related data should be moved on disk.
Since we had to do this a few times already, there are now some helpers to help
Since we had to do this a few times already, there are now some helpers to help
...
...
Write
Preview
Markdown
is supported
0%
Try again
or
attach a new file
Attach a file
Cancel
You are about to add
0
people
to the discussion. Proceed with caution.
Finish editing this message first!
Cancel
Please
register
or
sign in
to comment