The term `SHOULD` per the [RFC 2119](https://www.ietf.org/rfc/rfc2119.txt) means:
> This word, or the adjective "RECOMMENDED", mean that there
> may exist valid reasons in particular circumstances to ignore a
> particular item, but the full implications must be understood and
> carefully weighed before choosing a different course.
Ideally, each of these tradeoffs should be documented
in the separate issues, labelled accordingly and linked
to original issue and epic.
## Impact Analysis
**Summary:** think about the impact your merge request may have on performance
...
...
@@ -44,6 +59,64 @@ should ask one of the merge request reviewers to review your changes. You can
find a list of these reviewers at <https://about.gitlab.com/company/team/>. A reviewer
in turn can request a performance specialist to review the changes.
## Think outside of the box
Everyone has their own perception how the new feature is going to be used.
Always consider how users might be using the feature instead. Usually,
users test our features in a very unconventional way,
like by brute forcing or abusing edge conditions that we have.
## Data set
The data set that will be processed by the merge request should be known
and documented. The feature should clearly document what the expected
data set is for this feature to process, and what problems it might cause.
If you would think about the following example that puts
a strong emphasis of data set being processed.
The problem is simple: you want to filter a list of files from
some git repository. Your feature requests a list of all files
from the repository and perform search for the set of files.
As an author you should in context of that problem consider
the following:
1. What repositories are going to be supported?
1. How long it will take for big repositories like Linux kernel?
1. Is there something that we can do differently to not process such a
big data set?
1. Should we build some fail-safe mechanism to contain
computational complexity? Usually it is better to degrade
the service for a single user instead of all users.
## Query plans and database structure
The query plan can answer the questions whether we need additional
indexes, or whether we perform expensive filtering (i.e. using sequential scans).
Each query plan should be run against substantional size of data set.
For example if you look for issues with specific conditions,
you should consider validating the query against
a small number (a few hundred) and a big number (100_000) of issues.
See how the query will behave if the result will be a few
and a few thousand.
This is needed as we have users using GitLab for very big projects and
in a very unconventional way. Even, if it seems that it is unlikely
that such big data set will be used, it is still plausible that one
of our customers will have the problem with the feature.
Understanding ahead of time how it is going to behave at scale even if we accept it,
is the desired outcome. We should always have a plan or understanding what it takes
to optimise feature to the magnitude of higher usage patterns.
Every database structure should be optimised and sometimes even over-described
to be prepared to be easily extended. The hardest part after some point is
data migration. Migrating millions of rows will always be troublesome and
can have negative impact on application.
To better understand how to get help with the query plan reviews
read this section on [how to prepare the merge request for a database review](https://docs.gitlab.com/ee/development/database_review.html#how-to-prepare-the-merge-request-for-a-database-review).
## Query Counts
**Summary:** a merge request **should not** increase the number of executed SQL
...
...
@@ -172,3 +245,107 @@ Caching data per transaction can be done using
`Gitlab::SafeRequestStore` to avoid having to remember to check
`RequestStore.active?`). Caching data in Redis can be done using [Rails' caching