Commit a0334a5f authored by Matthias Kaeppler's avatar Matthias Kaeppler

Clarify migration types in style guide

Also add flow chart to support decision making.
parent f98a5d04
...@@ -8,7 +8,7 @@ info: "See the Technical Writers assigned to Development Guidelines: https://abo ...@@ -8,7 +8,7 @@ info: "See the Technical Writers assigned to Development Guidelines: https://abo
# Background migrations # Background migrations
Background migrations should be used to perform data migrations whenever a Background migrations should be used to perform data migrations whenever a
migration exceeds [the time limits in our guidelines](database_review.md#timing-guidelines-for-migrations). For example, you can use background migration exceeds [the time limits in our guidelines](migration_style_guide.md#how-long-a-migration-should-take). For example, you can use background
migrations to migrate data that's stored in a single JSON column migrations to migrate data that's stored in a single JSON column
to a separate table instead. to a separate table instead.
...@@ -18,7 +18,7 @@ migrations automatically reschedule themselves for a later point in time. ...@@ -18,7 +18,7 @@ migrations automatically reschedule themselves for a later point in time.
## When To Use Background Migrations ## When To Use Background Migrations
You should use a background migration when you migrate _data_ in tables that have You should use a background migration when you migrate _data_ in tables that have
so many rows that the process would exceed [the time limits in our guidelines](database_review.md#timing-guidelines-for-migrations) if performed using a regular Rails migration. so many rows that the process would exceed [the time limits in our guidelines](migration_style_guide.md#how-long-a-migration-should-take) if performed using a regular Rails migration.
- Background migrations should be used when migrating data in [high-traffic tables](migration_style_guide.md#high-traffic-tables). - Background migrations should be used when migrating data in [high-traffic tables](migration_style_guide.md#high-traffic-tables).
- Background migrations may also be used when executing numerous single-row queries - Background migrations may also be used when executing numerous single-row queries
......
...@@ -216,7 +216,7 @@ test its execution using `CREATE INDEX CONCURRENTLY` in the `#database-lab` Slac ...@@ -216,7 +216,7 @@ test its execution using `CREATE INDEX CONCURRENTLY` in the `#database-lab` Slac
it's suggested to treat background migrations as post migrations: it's suggested to treat background migrations as post migrations:
place them in `db/post_migrate` instead of `db/migrate`. Keep in mind place them in `db/post_migrate` instead of `db/migrate`. Keep in mind
that post migrations are executed post-deployment in production. that post migrations are executed post-deployment in production.
- Check [timing guidelines for migrations](#timing-guidelines-for-migrations) - Check [timing guidelines for migrations](migration_style_guide.md#how-long-a-migration-should-take)
- Check migrations are reversible and implement a `#down` method - Check migrations are reversible and implement a `#down` method
- Check data migrations: - Check data migrations:
- Establish a time estimate for execution on GitLab.com. - Establish a time estimate for execution on GitLab.com.
...@@ -234,18 +234,3 @@ test its execution using `CREATE INDEX CONCURRENTLY` in the `#database-lab` Slac ...@@ -234,18 +234,3 @@ test its execution using `CREATE INDEX CONCURRENTLY` in the `#database-lab` Slac
to queries (changing the query, schema or adding indexes and similar) to queries (changing the query, schema or adding indexes and similar)
- General guideline is for queries to come in below [100ms execution time](query_performance.md#timing-guidelines-for-queries) - General guideline is for queries to come in below [100ms execution time](query_performance.md#timing-guidelines-for-queries)
- Avoid N+1 problems and minimize the [query count](merge_request_performance_guidelines.md#query-counts). - Avoid N+1 problems and minimize the [query count](merge_request_performance_guidelines.md#query-counts).
### Timing guidelines for migrations
In general, migrations for a single deploy shouldn't take longer than
1 hour for GitLab.com. The following guidelines are not hard rules, they were
estimated to keep migration timing to a minimum.
NOTE:
Keep in mind that all runtimes should be measured against GitLab.com.
| Migration Type | Execution Time Recommended | Notes |
|----|----|---|
| Regular migrations on `db/migrate` | `3 minutes` | A valid exception are index creation as this can take a long time. |
| Post migrations on `db/post_migrate` | `10 minutes` | |
| Background migrations | --- | Since these are suitable for larger tables, it's not possible to set a precise timing guideline, however, any single query must stay below [`1 second` execution time](query_performance.md#timing-guidelines-for-queries) with cold caches. |
...@@ -31,11 +31,66 @@ Please don't depend on GitLab-specific code since it can change in future ...@@ -31,11 +31,66 @@ Please don't depend on GitLab-specific code since it can change in future
versions. If needed copy-paste GitLab code into the migration to make it forward versions. If needed copy-paste GitLab code into the migration to make it forward
compatible. compatible.
For GitLab.com, please take into consideration that regular migrations (under `db/migrate`) ## Choose an appropriate migration type
are run before [Canary is deployed](https://gitlab.com/gitlab-com/gl-infra/readiness/-/tree/master/library/canary/#configuration-and-deployment),
and [post-deployment migrations](post_deployment_migrations.md) (`db/post_migrate`) are run after the deployment to production has finished. The first step before adding a new migration should be to decide which type is most appropriate.
There are currently three kinds of migrations you can create, depending on the kind of
work it needs to perform and how long it takes to complete:
1. [**Regular schema migrations.**](#create-a-regular-schema-migration) These are traditional Rails migrations in `db/migrate` that run _before_ new application code is deployed
(for GitLab.com before [Canary is deployed](https://gitlab.com/gitlab-com/gl-infra/readiness/-/tree/master/library/canary/#configuration-and-deployment)).
This means that they should be relatively fast, no more than a few minutes, so as not to unnecessarily delay a deployment.
One exception is a migration that takes longer but is absolutely critical for the application to operate correctly.
For example, you might have indices that enforce unique tuples, or that are needed for query performance in critical parts of the application. In cases where the migration would be unacceptably slow, however, a better option might be to guard the feature with a [feature flag](feature_flags/index.md)
and perform a post-deployment migration instead. The feature can then be turned on after the migration finishes.
1. [**Post-deployment migrations.**](post_deployment_migrations.md) These are Rails migrations in `db/post_migrate` and
run _after_ new application code has been deployed (for GitLab.com after the production deployment has finished).
They can be used for schema changes that aren't critical for the application to operate, or data migrations that take at most a few minutes.
Common examples for schema changes that should run post-deploy include:
- Clean-ups, like removing unused columns.
- Adding non-critical indices on high-traffic tables.
- Adding non-critical indices that take a long time to create.
1. [**Background migrations.**](background_migrations.md) These aren't regular Rails migrations, but application code that is
executed via Sidekiq jobs, although a post-deployment migration is used to schedule them. Use them only for data migrations that
exceed the timing guidelines for post-deploy migrations. Background migrations should _not_ change the schema.
Use the following diagram to guide your decision, but keep in mind that it is just a tool, and
the final outcome will always be dependent on the specific changes being made:
```mermaid
graph LR
A{Schema<br/>changed?}
A -->|Yes| C{Critical to<br/>speed or<br/>behavior?}
A -->|No| D{Is it fast?}
C -->|Yes| H{Is it fast?}
C -->|No| F[Post-deploy migration]
H -->|Yes| E[Regular migration]
H -->|No| I[Post-deploy migration<br/>+ feature flag]
D -->|Yes| F[Post-deploy migration]
D -->|No| G[Background migration]
```
### How long a migration should take
In general, all migrations for a single deploy shouldn't take longer than
1 hour for GitLab.com. The following guidelines are not hard rules, they were
estimated to keep migration duration to a minimum.
NOTE:
Keep in mind that all durations should be measured against GitLab.com.
| Migration Type | Recommended Duration | Notes |
|----|----|---|
| Regular migrations | `<= 3 minutes` | A valid exception are changes without which application functionality or performance would be severely degraded and which cannot be delayed. |
| Post-deployment migrations | `<= 10 minutes` | A valid exception are schema changes, since they must not happen in background migrations. |
| Background migrations | `> 10 minutes` | Since these are suitable for larger tables, it's not possible to set a precise timing guideline, however, any single query must stay below [`1 second` execution time](query_performance.md#timing-guidelines-for-queries) with cold caches. |
## Create database migrations ## Create a regular schema migration
To create a migration you can use the following Rails generator: To create a migration you can use the following Rails generator:
......
...@@ -18,7 +18,7 @@ When you are optimizing your SQL queries, there are two dimensions to pay attent ...@@ -18,7 +18,7 @@ When you are optimizing your SQL queries, there are two dimensions to pay attent
| Query Type | Maximum Query Time | Notes | | Query Type | Maximum Query Time | Notes |
|----|----|---| |----|----|---|
| General queries | `100ms` | This is not a hard limit, but if a query is getting above it, it is important to spend time understanding why it can or cannot be optimized. | | General queries | `100ms` | This is not a hard limit, but if a query is getting above it, it is important to spend time understanding why it can or cannot be optimized. |
| Queries in a migration | `100ms` | This is different than the total [migration time](database_review.md#timing-guidelines-for-migrations). | | Queries in a migration | `100ms` | This is different than the total [migration time](migration_style_guide.md#how-long-a-migration-should-take). |
| Concurrent operations in a migration | `5min` | Concurrent operations do not block the database, but they block the GitLab update. This includes operations such as `add_concurrent_index` and `add_concurrent_foreign_key`. | | Concurrent operations in a migration | `5min` | Concurrent operations do not block the database, but they block the GitLab update. This includes operations such as `add_concurrent_index` and `add_concurrent_foreign_key`. |
| Background migrations | `1s` | | | Background migrations | `1s` | |
| Service Ping | `1s` | See the [Service Ping docs](service_ping/index.md#developing-and-testing-service-ping) for more details. | | Service Ping | `1s` | See the [Service Ping docs](service_ping/index.md#developing-and-testing-service-ping) for more details. |
......
Markdown is supported
0%
or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment