Add query performance page to the docs

Add a page for query performance guidelines with timing suggestions and a section on cold and warm cache. Link to the new doc from other database doc locations mentioning query times.

Commit 6e03208e authored Nov 10, 2020 by Steve Abrams
5 changed files
--- a/doc/development/database/index.md
+++ b/doc/development/database/index.md
@@ -59,6 +59,7 @@ info: To determine the technical writer assigned to the Stage/Group associated w
 - [Client-side connection-pool](client_side_connection_pool.md)
 - [Updating multiple values](setting_multiple_values.md)
 - [Constraints naming conventions](constraint_naming_convention.md)
+- [Query performance guidelines](../query_performance.md)

 ## Case studies


--- a/doc/development/database_review.md
+++ b/doc/development/database_review.md
@@ -158,8 +158,8 @@ test its execution using `CREATE INDEX CONCURRENTLY` in the `#database-lab` Slac
    - Maintainer: After the merge request is merged, notify Release Managers about it on `#f_upcoming_release` Slack channel.
  - Check consistency with `db/structure.sql` and that migrations are [reversible](migration_style_guide.md#reversibility)
  - Check that the relevant version files under `db/schema_migrations` were added or removed.
-  - Check queries timing (If any): Queries executed in a migration
-    need to fit comfortably within `15s` - preferably much less than that - on GitLab.com.
+  - Check query timing (if any): In a single transaction, the cumulative time of queries executed in a migration
+    needs to fit comfortably within `15s` - preferably much less than that - on GitLab.com.
  - For column removals, make sure the column has been [ignored in a previous release](what_requires_downtime.md#dropping-columns)
 - Check [background migrations](background_migrations.md):
  - Establish a time estimate for execution on GitLab.com. For historical purposes,
@@ -190,7 +190,7 @@ test its execution using `CREATE INDEX CONCURRENTLY` in the `#database-lab` Slac
  - For given queries, review parameters regarding data distribution
  - [Check query plans](understanding_explain_plans.md) and suggest improvements
    to queries (changing the query, schema or adding indexes and similar)
-  - General guideline is for queries to come in below 100ms execution time
+  - General guideline is for queries to come in below [100ms execution time](query_performance.md#timing-guidelines-for-queries)
  - Avoid N+1 problems and minimalize the [query count](merge_request_performance_guidelines.md#query-counts).

 ### Timing guidelines for migrations
@@ -206,4 +206,4 @@ Keep in mind that all runtimes should be measured against GitLab.com.
 |----|----|---|
 | Regular migrations on `db/migrate` | `3 minutes` | A valid exception are index creation as this can take a long time. |
 | Post migrations on `db/post_migrate` | `10 minutes` | |
-| Background migrations | --- | Since these are suitable for larger tables, it's not possible to set a precise timing guideline, however, any single query must stay below `1 second` execution time with cold caches. |
+| Background migrations | --- | Since these are suitable for larger tables, it's not possible to set a precise timing guideline, however, any single query must stay below [`1 second` execution time](query_performance.md#timing-guidelines-for-queries) with cold caches. |
--- a/doc/development/migration_style_guide.md
+++ b/doc/development/migration_style_guide.md
@@ -153,8 +153,9 @@ and therefore it does not have any records yet.

 When using a single-transaction migration, a transaction will hold on a database connection
 for the duration of the migration, so you must make sure the actions in the migration
-do not take too much time: In general, queries executed in a migration need to fit comfortably
-within `15s` on GitLab.com.
+do not take too much time: GitLab.com’s production database has a `15s` timeout, so
+in general, the cumulative execution time in a migration should fit comfortably
+within that limit. Individual query timings should fit within the [standard limit](query_performance.md#timing-guidelines-for-queries).

 In case you need to insert, update, or delete a significant amount of data, you:


--- a/doc/development/product_analytics/usage_ping.md
+++ b/doc/development/product_analytics/usage_ping.md
@@ -631,7 +631,7 @@ Paste the SQL query into `#database-lab` to see how the query performs at scale.

 - `#database-lab` is a Slack channel which uses a production-sized environment to test your queries.
 - GitLab.com’s production database has a 15 second timeout.
- Any single query must stay below 1 second execution time with cold caches.
+- Any single query must stay below [1 second execution time](../query_performance.md#timing-guidelines-for-queries) with cold caches.
 - Add a specialized index on columns involved to reduce the execution time.

 In order to have an understanding of the query's execution we add in the MR description the following information:

--- a/doc/development/query_performance.md
+++ b/doc/development/query_performance.md
+---
+stage: Enablement
+group: Database
+info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#designated-technical-writers
+---
+
+# Query performance guidelines
+
+This document describes various guidelines to follow when optimizing SQL queries.
+
+When you are optimizing your SQL queries, there are two dimensions to pay attention to:
+
+1. The query execution time. This is paramount as it reflects how the user experiences GitLab.
+1. The query plan. Optimizing the query plan is important in allowing queries to independently scale over time. For example, we analyze plans so we can recognize, before a query degrades, that an index will keep it performing well as the table grows.
+
+## Timing guidelines for queries
+
+| Query Type | Maximum Query Time | Notes |
+|----|----|---|
+| General queries | `100ms` | This is not a hard limit, but if a query is getting above it, it is important to spend time understanding why it can or cannot be optimized. |
+| Queries in a migration | `100ms` | This is different from the total [migration time](database_review.md#timing-guidelines-for-migrations). |
+| Concurrent operations in a migration | `5min` | Concurrent operations do not block the database, but they block the GitLab update. This includes operations such as `add_concurrent_index` and `add_concurrent_foreign_key`. |
+| Background migrations | `1s` |  |
+| Usage Ping | `1s` | See the [usage ping docs](product_analytics/usage_ping.md#developing-and-testing-usage-ping) for more details. |
+
+- When analyzing your query's performance, note whether the time you are seeing is on a [cold or warm cache](#cold-and-warm-cache). These guidelines apply to both cache types.
+- When working with batched queries, change the range and batch size to see how it affects the query timing and caching.
+- If an existing query is already underperforming, make an effort to improve it. If it is too complex or would stall development, create a follow-up so it can be addressed in a timely manner. You can always ask the database reviewer or maintainer for help and guidance.
+
+## Cold and warm cache
+
+When evaluating query performance, it is important to understand the difference between
+cold and warm cached queries.
+
+The first time a query is made, it is made on a "cold cache", meaning it needs
+to read from disk. If you run the query again, the data can be read from the
+cache, or what PostgreSQL calls shared buffers. This is the "warm cache" query.
+
+When analyzing an [`EXPLAIN` plan](understanding_explain_plans.md), you can see
+the difference not only in the timing, but also in the `Buffers` output, which
+appears when you run your explain with `EXPLAIN (ANALYZE, BUFFERS)`. The [#database-lab](understanding_explain_plans.md#database-lab)
+tool automatically includes these options.
+
+If you are making a warm cache query, you see only shared buffer hits and no reads.
+
+For example, in #database-lab:
+
+```plaintext
+Shared buffers:
+  - hits: 36467 (~284.90 MiB) from the buffer pool
+  - reads: 0 from the OS file cache, including disk I/O
+```
+
+Or in the explain plan from `psql`:
+
+```plaintext
+Buffers: shared hit=7323
+```
+
+If the cache is cold, you will also see `reads`.
+
+In #database-lab:
+
+```plaintext
+Shared buffers:
+  - hits: 17204 (~134.40 MiB) from the buffer pool
+  - reads: 15229 (~119.00 MiB) from the OS file cache, including disk I/O
+```
+
+In `psql`:
+
+```plaintext
+Buffers: shared hit=7202 read=121
+```