Commit d60e8d68 authored by Achilleas Pipinellis's avatar Achilleas Pipinellis

Merge branch 'weimeng-master-patch-79445' into 'master'

Remove legacy content from Availability page

See merge request gitlab-org/gitlab!29878
parents 9363545a 01249e20
...@@ -4,136 +4,10 @@ type: reference, concepts ...@@ -4,136 +4,10 @@ type: reference, concepts
# Availability # Availability
GitLab offers high availability options for organizations that require
the fault tolerance and redundancy necessary to maintain high-uptime operations.
Please consult our [scaling documentation](../scaling) if you want to resolve
performance bottlenecks you encounter in individual GitLab components without
incurring the additional complexity costs associated with maintaining a
highly-available architecture.
On this page, we present examples of self-managed instances which demonstrate
how GitLab can be scaled out and made highly available. These examples progress
from simple to complex as scaling or highly-available components are added.
For larger setups serving 2,000 or more users, we provide
[reference architectures](../scaling/index.md#reference-architectures) based on GitLab's
experience with GitLab.com and internal scale testing that aim to achieve the
right balance of scalability and availability.
For detailed insight into how GitLab scales and configures GitLab.com, you can
watch [this 1 hour Q&A](https://www.youtube.com/watch?v=uCU8jdYzpac)
with [John Northrup](https://gitlab.com/northrup), and live questions coming
in from some of our customers.
GitLab offers a number of options to manage availability and resiliency. Below are the options to consider with trade-offs. GitLab offers a number of options to manage availability and resiliency. Below are the options to consider with trade-offs.
| Event | GitLab Feature | Recovery Point Objective (RPO) | Recovery Time Objective (RTO) | Cost | | Event | GitLab Feature | Recovery Point Objective (RPO) | Recovery Time Objective (RTO) | Cost |
| ----- | -------------- | --- | --- | ---- | | ----- | -------------- | --- | --- | ---- |
| Availability Zone failure | "GitLab HA" | No loss | No loss | 2x Git storage, multiple nodes balanced across AZ's | | Availability Zone failure | "GitLab HA" | No loss | No loss | 2x Git storage, multiple nodes balanced across AZ's |
| Region failure | "GitLab Disaster Recovery" | 5-10 minutes | 30 minutes | 2x primary cost | | Region failure | [GitLab Geo Disaster Recovery](../geo/disaster_recovery/index.md) | 5-10 minutes | 30 minutes | 2x primary cost |
| All failures | Backup/Restore | Last backup | Hours to Days | Cost of storing the backups | | All failures | Backup/Restore | Last backup | Hours to Days | Cost of storing the backups |
## High availability
### Omnibus installation with automatic database failover
By adding automatic failover for database systems, we can enable higher uptime with an additional layer of complexity.
- For PostgreSQL, we provide repmgr for server cluster management and failover
and a combination of [PgBouncer](../high_availability/pgbouncer.md) and [Consul](../high_availability/consul.md) for
database client cutover.
- For Redis, we use [Redis Sentinel](../high_availability/redis.md) for server failover and client cutover.
You can also optionally run [additional Sidekiq processes on dedicated hardware](../high_availability/sidekiq.md)
and configure individual Sidekiq processes to
[process specific background job queues](../operations/extra_sidekiq_processes.md)
if you need to scale out background job processing.
### GitLab Geo
GitLab Geo allows you to replicate your GitLab instance to other geographical locations as a read-only fully operational instance that can also be promoted in case of disaster.
This configuration is supported in [GitLab Premium and Ultimate](https://about.gitlab.com/pricing/).
References:
- [Geo Documentation](../geo/replication/index.md)
- [GitLab Geo with a highly available configuration](../geo/replication/high_availability.md)
## GitLab components and configuration instructions
The GitLab application depends on the following [components](../../development/architecture.md#component-diagram).
It can also depend on several third party services depending on
your environment setup. Here we'll detail both in the order in which
you would typically configure them along with our recommendations for
their use and configuration.
### Third party services
Here's some details of several third party services a typical environment
will depend on. The services can be provided by numerous applications
or providers and further advice can be given on how best to select.
These should be configured first, before the [GitLab components](#gitlab-components).
| Component | Description | Configuration instructions |
|--------------------------------------------------------|---------------------------------------------------------------------------------------------------------------------|---------------------------------------------------------|
| [Load Balancer(s)](../high_availability/load_balancer.md)[^6] | Handles load balancing for the GitLab nodes where required | [Load balancer HA configuration](../high_availability/load_balancer.md) |
| [Cloud Object Storage service](../high_availability/object_storage.md)[^4] | Recommended store for shared data objects | [Cloud Object Storage configuration](../high_availability/object_storage.md) |
| [NFS](../high_availability/nfs.md)[^5] [^7] | Shared disk storage service. Can be used as an alternative for Gitaly or Object Storage. Required for GitLab Pages | [NFS configuration](../high_availability/nfs.md) |
### GitLab components
Next are all of the components provided directly by GitLab. As mentioned
earlier, they are presented in the typical order you would configure
them.
| Component | Description | Configuration instructions |
|---------------------------------------------------------------------------------------------------------------------|---------------------------------------------------------------------|---------------------------------------------------------------|
| [Consul](../../development/architecture.md#consul)[^3] | Service discovery and health checks/failover | [Consul HA configuration](../high_availability/consul.md) **(PREMIUM ONLY)** |
| [PostgreSQL](../../development/architecture.md#postgresql) | Database | [Database HA configuration](../high_availability/database.md) |
| [PgBouncer](../../development/architecture.md#pgbouncer) | Database Pool Manager | [PgBouncer HA configuration](../high_availability/pgbouncer.md) **(PREMIUM ONLY)** |
| [Redis](../../development/architecture.md#redis)[^3] with Redis Sentinel | Key/Value store for shared data with HA watcher service | [Redis HA configuration](../high_availability/redis.md) |
| [Gitaly](../../development/architecture.md#gitaly)[^2] [^5] [^7] | Recommended high-level storage for Git repository data | [Gitaly HA configuration](../high_availability/gitaly.md) |
| [Sidekiq](../../development/architecture.md#sidekiq) | Asynchronous/Background jobs | [Sidekiq configuration](../high_availability/sidekiq.md) |
| [GitLab application nodes](../../development/architecture.md#unicorn)[^1] | (Unicorn / Puma, Workhorse) - Web-requests (UI, API, Git over HTTP) | [GitLab app HA/scaling configuration](../high_availability/gitlab.md) |
| [Prometheus](../../development/architecture.md#prometheus) and [Grafana](../../development/architecture.md#grafana) | GitLab environment monitoring | [Monitoring node for scaling/HA](../high_availability/monitoring_node.md) |
In some cases, components can be combined on the same nodes to reduce complexity as well.
[^1]: In our architectures we run each GitLab Rails node using the Puma webserver
and have its number of workers set to 90% of available CPUs along with 4 threads.
[^2]: Gitaly node requirements are dependent on customer data, specifically the number of
projects and their sizes. We recommend 2 nodes as an absolute minimum for HA environments
and at least 4 nodes should be used when supporting 50,000 or more users.
We also recommend that each Gitaly node should store no more than 5TB of data
and have the number of [`gitaly-ruby` workers](../gitaly/index.md#gitaly-ruby)
set to 20% of available CPUs. Additional nodes should be considered in conjunction
with a review of expected data size and spread based on the recommendations above.
[^3]: Recommended Redis setup differs depending on the size of the architecture.
For smaller architectures (up to 5,000 users) we suggest one Redis cluster for all
classes and that Redis Sentinel is hosted alongside Consul.
For larger architectures (10,000 users or more) we suggest running a separate
[Redis Cluster](../high_availability/redis.md#running-multiple-redis-clusters) for the Cache class
and another for the Queues and Shared State classes respectively. We also recommend
that you run the Redis Sentinel clusters separately as well for each Redis Cluster.
[^4]: For data objects such as LFS, Uploads, Artifacts, etc. We recommend a [Cloud Object Storage service](../object_storage.md)
over NFS where possible, due to better performance and availability.
[^5]: NFS can be used as an alternative for both repository data (replacing Gitaly) and
object storage but this isn't typically recommended for performance reasons. Note however it is required for
[GitLab Pages](https://gitlab.com/gitlab-org/gitlab-pages/issues/196).
[^6]: Our architectures have been tested and validated with [HAProxy](https://www.haproxy.org/)
as the load balancer. However other reputable load balancers with similar feature sets
should also work instead but be aware these aren't validated.
[^7]: We strongly recommend that any Gitaly and / or NFS nodes are set up with SSD disks over
HDD with a throughput of at least 8,000 IOPS for read operations and 2,000 IOPS for write
as these components have heavy I/O. These IOPS values are recommended only as a starter
as with time they may be adjusted higher or lower depending on the scale of your
environment's workload. If you're running the environment on a Cloud provider
you may need to refer to their documentation on how configure IOPS correctly.
...@@ -19,7 +19,7 @@ See [Running Gitaly on its own server](../gitaly/index.md#running-gitaly-on-its- ...@@ -19,7 +19,7 @@ See [Running Gitaly on its own server](../gitaly/index.md#running-gitaly-on-its-
in Gitaly documentation. in Gitaly documentation.
Continue configuration of other components by going back to the Continue configuration of other components by going back to the
[High Availability](../availability/index.md#gitlab-components-and-configuration-instructions) page. [Scaling](../scaling/index.md#components-provided-by-omnibus-gitlab) page.
## Enable Monitoring ## Enable Monitoring
......
...@@ -89,7 +89,7 @@ Advanced configuration options are supported and can be added if ...@@ -89,7 +89,7 @@ Advanced configuration options are supported and can be added if
needed. needed.
Continue configuration of other components by going back to the Continue configuration of other components by going back to the
[High Availability](../availability/index.md#gitlab-components-and-configuration-instructions) page. [Scaling](../scaling/index.md#components-provided-by-omnibus-gitlab) page.
### High Availability with GitLab Omnibus **(PREMIUM ONLY)** ### High Availability with GitLab Omnibus **(PREMIUM ONLY)**
......
...@@ -5,40 +5,17 @@ type: reference, concepts ...@@ -5,40 +5,17 @@ type: reference, concepts
# Scaling # Scaling
GitLab supports a number of scaling options to ensure that your self-managed GitLab supports a number of scaling options to ensure that your self-managed
instance is able to scale out to meet your organization's needs when scaling up instance is able to scale to meet your organization's needs.
a single-box GitLab installation is no longer practical or feasible.
Please consult our [high availability documentation](../availability/index.md) On this page, we present examples of self-managed instances which demonstrate
if your organization requires fault tolerance and redundancy features, such as how GitLab can be scaled up, scaled out or made highly available. These
automatic database system failover. examples progress from simple to complex as scaling or highly-available
components are added.
## GitLab components and scaling instructions For detailed insight into how GitLab scales and configures GitLab.com, you can
watch [this 1 hour Q&A](https://www.youtube.com/watch?v=uCU8jdYzpac)
Here's a list of components directly provided by Omnibus GitLab or installed as with [John Northrup](https://gitlab.com/northrup), and live questions coming
part of a source installation and their configuration instructions for scaling. in from some of our customers.
| Component | Description | Configuration instructions |
|-----------|-------------|----------------------------|
| [PostgreSQL](../../development/architecture.md#postgresql) | Database | [PostgreSQL configuration](https://docs.gitlab.com/omnibus/settings/database.html) |
| [Redis](../../development/architecture.md#redis) | Key/value store for fast data lookup and caching | [Redis configuration](../high_availability/redis.md) |
| [GitLab application services](../../development/architecture.md#unicorn) | Unicorn/Puma, Workhorse, GitLab Shell - serves front-end requests (UI, API, Git over HTTP/SSH) | [GitLab app scaling configuration](../high_availability/gitlab.md) |
| [PgBouncer](../../development/architecture.md#pgbouncer) | Database connection pooler | [PgBouncer configuration](../high_availability/pgbouncer.md#running-pgbouncer-as-part-of-a-non-ha-gitlab-installation) **(PREMIUM ONLY)** |
| [Sidekiq](../../development/architecture.md#sidekiq) | Asynchronous/background jobs | [Sidekiq configuration](../high_availability/sidekiq.md) |
| [Gitaly](../../development/architecture.md#gitaly) | Provides access to Git repositories | [Gitaly configuration](../gitaly/index.md#running-gitaly-on-its-own-server) |
| [Prometheus](../../development/architecture.md#prometheus) and [Grafana](../../development/architecture.md#grafana) | GitLab environment monitoring | [Monitoring node for scaling](../high_availability/monitoring_node.md) |
## Third-party services used for scaling
Here's a list of third-party services you may require as part of scaling GitLab.
The services can be provided by numerous applications or vendors and further
advice is given on how best to select the right choice for your organization's
needs.
| Component | Description | Configuration instructions |
|-----------|-------------|----------------------------|
| Load balancer(s) | Handles load balancing, typically when you have multiple GitLab application services nodes | [Load balancer configuration](../high_availability/load_balancer.md) |
| Object storage service | Recommended store for shared data objects | [Cloud Object Storage configuration](../high_availability/object_storage.md) |
| NFS | Shared disk storage service. Can be used as an alternative for Gitaly or Object Storage. Required for GitLab Pages | [NFS configuration](../high_availability/nfs.md) |
## Reference architectures ## Reference architectures
...@@ -76,7 +53,7 @@ how much automation you use, mirroring, and repo/change size. Additionally the ...@@ -76,7 +53,7 @@ how much automation you use, mirroring, and repo/change size. Additionally the
shown memory values are given directly by [GCP machine types](https://cloud.google.com/compute/docs/machine-types). shown memory values are given directly by [GCP machine types](https://cloud.google.com/compute/docs/machine-types).
On different cloud vendors a best effort like for like can be used. On different cloud vendors a best effort like for like can be used.
### Under 1,000 users ### Up to 1,000 users
From 1 to 1,000 users, a single-node [Omnibus](https://docs.gitlab.com/omnibus/) setup with frequent backups is adequate. From 1 to 1,000 users, a single-node [Omnibus](https://docs.gitlab.com/omnibus/) setup with frequent backups is adequate.
Please refer to the [installation documentation](../../install/README.md) and [backup/restore documentation](https://docs.gitlab.com/omnibus/settings/backups.html#backup-and-restore-omnibus-gitlab-configuration). Please refer to the [installation documentation](../../install/README.md) and [backup/restore documentation](https://docs.gitlab.com/omnibus/settings/backups.html#backup-and-restore-omnibus-gitlab-configuration).
...@@ -85,11 +62,15 @@ This solution is appropriate for many teams that have a single server at their d ...@@ -85,11 +62,15 @@ This solution is appropriate for many teams that have a single server at their d
You can also optionally configure GitLab to use an [external PostgreSQL service](../external_database.md) or an [external object storage service](../high_availability/object_storage.md) for added performance and reliability at a relatively low complexity cost. You can also optionally configure GitLab to use an [external PostgreSQL service](../external_database.md) or an [external object storage service](../high_availability/object_storage.md) for added performance and reliability at a relatively low complexity cost.
### 1,000 to 1,999 users ### Up to 2,000 users
For 1,000 to 1,999 users, defining a reference architecture for this scale is [being worked on](https://gitlab.com/gitlab-org/quality/performance/-/issues/223).
### 2,000 users NOTE: **Note:** The 2,000-user reference architecture documented below is
designed to help your organization achieve a highly-available GitLab deployment.
If you do not have the expertise or need to maintain a highly-available
environment, you can have a simpler and less costly-to-operate environment by
deploying two or more GitLab Rails servers, external load balancing, an NFS
server, a PostgreSQL server and a Redis server. A reference architecture with
this alternative in mind is [being worked on](https://gitlab.com/gitlab-org/quality/performance/-/issues/223).
- **Supported users (approximate):** 2,000 - **Supported users (approximate):** 2,000
- **Test RPS rates:** API: 40 RPS, Web: 4 RPS, Git: 4 RPS - **Test RPS rates:** API: 40 RPS, Web: 4 RPS, Git: 4 RPS
...@@ -110,7 +91,7 @@ For 1,000 to 1,999 users, defining a reference architecture for this scale is [b ...@@ -110,7 +91,7 @@ For 1,000 to 1,999 users, defining a reference architecture for this scale is [b
| External load balancing node[^6] | 1 | 2 vCPU, 1.8GB Memory | n1-highcpu-2 | c5.large | | External load balancing node[^6] | 1 | 2 vCPU, 1.8GB Memory | n1-highcpu-2 | c5.large |
| Internal load balancing node[^6] | 1 | 2 vCPU, 1.8GB Memory | n1-highcpu-2 | c5.large | | Internal load balancing node[^6] | 1 | 2 vCPU, 1.8GB Memory | n1-highcpu-2 | c5.large |
### 5,000 users ### Up to 5,000 users
- **Supported users (approximate):** 5,000 - **Supported users (approximate):** 5,000
- **Test RPS rates:** API: 100 RPS, Web: 10 RPS, Git: 10 RPS - **Test RPS rates:** API: 100 RPS, Web: 10 RPS, Git: 10 RPS
...@@ -131,7 +112,7 @@ For 1,000 to 1,999 users, defining a reference architecture for this scale is [b ...@@ -131,7 +112,7 @@ For 1,000 to 1,999 users, defining a reference architecture for this scale is [b
| External load balancing node[^6] | 1 | 2 vCPU, 1.8GB Memory | n1-highcpu-2 | c5.large | | External load balancing node[^6] | 1 | 2 vCPU, 1.8GB Memory | n1-highcpu-2 | c5.large |
| Internal load balancing node[^6] | 1 | 2 vCPU, 1.8GB Memory | n1-highcpu-2 | c5.large | | Internal load balancing node[^6] | 1 | 2 vCPU, 1.8GB Memory | n1-highcpu-2 | c5.large |
### 10,000 users ### Up to 10,000 users
- **Supported users (approximate):** 10,000 - **Supported users (approximate):** 10,000
- **Test RPS rates:** API: 200 RPS, Web: 20 RPS, Git: 20 RPS - **Test RPS rates:** API: 200 RPS, Web: 20 RPS, Git: 20 RPS
...@@ -155,7 +136,7 @@ For 1,000 to 1,999 users, defining a reference architecture for this scale is [b ...@@ -155,7 +136,7 @@ For 1,000 to 1,999 users, defining a reference architecture for this scale is [b
| External load balancing node[^6] | 1 | 2 vCPU, 1.8GB Memory | n1-highcpu-2 | c5.large | | External load balancing node[^6] | 1 | 2 vCPU, 1.8GB Memory | n1-highcpu-2 | c5.large |
| Internal load balancing node[^6] | 1 | 2 vCPU, 1.8GB Memory | n1-highcpu-2 | c5.large | | Internal load balancing node[^6] | 1 | 2 vCPU, 1.8GB Memory | n1-highcpu-2 | c5.large |
### 25,000 users ### Up to 25,000 users
- **Supported users (approximate):** 25,000 - **Supported users (approximate):** 25,000
- **Test RPS rates:** API: 500 RPS, Web: 50 RPS, Git: 50 RPS - **Test RPS rates:** API: 500 RPS, Web: 50 RPS, Git: 50 RPS
...@@ -179,7 +160,7 @@ For 1,000 to 1,999 users, defining a reference architecture for this scale is [b ...@@ -179,7 +160,7 @@ For 1,000 to 1,999 users, defining a reference architecture for this scale is [b
| External load balancing node[^6] | 1 | 2 vCPU, 1.8GB Memory | n1-highcpu-2 | c5.large | | External load balancing node[^6] | 1 | 2 vCPU, 1.8GB Memory | n1-highcpu-2 | c5.large |
| Internal load balancing node[^6] | 1 | 4 vCPU, 3.6GB Memory | n1-highcpu-4 | c5.xlarge | | Internal load balancing node[^6] | 1 | 4 vCPU, 3.6GB Memory | n1-highcpu-4 | c5.xlarge |
### 50,000 users ### Up to 50,000 users
- **Supported users (approximate):** 50,000 - **Supported users (approximate):** 50,000
- **Test RPS rates:** API: 1000 RPS, Web: 100 RPS, Git: 100 RPS - **Test RPS rates:** API: 1000 RPS, Web: 100 RPS, Git: 100 RPS
...@@ -203,6 +184,43 @@ For 1,000 to 1,999 users, defining a reference architecture for this scale is [b ...@@ -203,6 +184,43 @@ For 1,000 to 1,999 users, defining a reference architecture for this scale is [b
| External load balancing node[^6] | 1 | 2 vCPU, 1.8GB Memory | n1-highcpu-2 | c5.large | | External load balancing node[^6] | 1 | 2 vCPU, 1.8GB Memory | n1-highcpu-2 | c5.large |
| Internal load balancing node[^6] | 1 | 8 vCPU, 7.2GB Memory | n1-highcpu-8 | c5.2xlarge | | Internal load balancing node[^6] | 1 | 8 vCPU, 7.2GB Memory | n1-highcpu-8 | c5.2xlarge |
## Configuring GitLab to scale
### Components not provided by Omnibus GitLab
Depending on your system architecture, you may require some components which are
not provided in Omnibus GitLab. If required, these should be configured before
setting up components provided by GitLab. Advice on how to select the right
solution for your organization is provided in the configuration instructions
listed below.
| Component | Description | Configuration instructions |
|-----------|-------------|----------------------------|
| Load balancer(s)[^6] | Handles load balancing, typically when you have multiple GitLab application services nodes | [Load balancer configuration](../high_availability/load_balancer.md)[^6] |
| Object storage service[^4] | Recommended store for shared data objects | [Cloud Object Storage configuration](../object_storage.md) |
| NFS[^5] [^7] | Shared disk storage service. Can be used as an alternative for Gitaly or Object Storage. Required for GitLab Pages | [NFS configuration](../high_availability/nfs.md) |
### Components provided by Omnibus GitLab
The following components are provided by Omnibus GitLab. They are listed in the
order you'll typically configure them if they are required by your
[reference architecture](#reference-architectures) of choice.
| Component | Description | Configuration instructions |
|-----------|-------------|----------------------------|
| [Consul](../../development/architecture.md#consul)[^3] | Service discovery and health checks/failover | [Consul HA configuration](../high_availability/consul.md) **(PREMIUM ONLY)** |
| [PostgreSQL](../../development/architecture.md#postgresql) | Database | [PostgreSQL configuration](https://docs.gitlab.com/omnibus/settings/database.html) |
| [PgBouncer](../../development/architecture.md#pgbouncer) | Database connection pooler | [PgBouncer configuration](../high_availability/pgbouncer.md#running-pgbouncer-as-part-of-a-non-ha-gitlab-installation) **(PREMIUM ONLY)** |
| Repmgr | PostgreSQL cluster management and failover | [PostgreSQL and Repmgr configuration](../high_availability/database.md) |
| [Redis](../../development/architecture.md#redis)[^3] | Key/value store for fast data lookup and caching | [Redis configuration](../high_availability/redis.md) |
| Redis Sentinel | High availability for Redis | [Redis Sentinel configuration](../high_availability/redis.md) |
| [Gitaly](../../development/architecture.md#gitaly)[^2] [^5] [^7] | Provides access to Git repositories | [Gitaly configuration](../gitaly/index.md#running-gitaly-on-its-own-server) |
| [Sidekiq](../../development/architecture.md#sidekiq) | Asynchronous/background jobs | [Sidekiq configuration](../high_availability/sidekiq.md) |
| [GitLab application services](../../development/architecture.md#unicorn)[^1] | Unicorn/Puma, Workhorse, GitLab Shell - serves front-end requests (UI, API, Git over HTTP/SSH) | [GitLab app scaling configuration](../high_availability/gitlab.md) |
| [Prometheus](../../development/architecture.md#prometheus) and [Grafana](../../development/architecture.md#grafana) | GitLab environment monitoring | [Monitoring node for scaling](../high_availability/monitoring_node.md) |
## Footnotes
[^1]: In our architectures we run each GitLab Rails node using the Puma webserver [^1]: In our architectures we run each GitLab Rails node using the Puma webserver
and have its number of workers set to 90% of available CPUs along with 4 threads. and have its number of workers set to 90% of available CPUs along with 4 threads.
......
Markdown is supported
0%
or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment