Commit 6a21b5fd authored by Grant Young, committed by Tanya Pazitny

Better handle docker errors

Add specs to dummy manifest upload errors
Add specs to dummy delete errors
Fix crash associated with docker error when deleting tags
parent c6c52893
......@@ -82,12 +82,12 @@ Complete the following installation steps in order. A link at the end of each
section will bring you back to the Scalable Architecture Examples section so
you can continue with the next step.
1. [PostgreSQL](database.md#postgresql-in-a-scaled-environment) with [PgBouncer](https://docs.gitlab.com/ee/administration/high_availability/pgbouncer.html)
1. [Redis](redis.md#redis-in-a-scaled-environment)
1. [Gitaly](gitaly.md) (recommended) and / or [NFS](nfs.md)[^4]
1. [GitLab application nodes](gitlab.md)
- With [Object Storage service enabled](../gitaly/index.md#eliminating-nfs-altogether)[^3]
1. [Load Balancer(s)](load_balancer.md)[^2]
1. [Monitoring node (Prometheus and Grafana)](monitoring_node.md)
### Full Scaling
......@@ -98,8 +98,8 @@ is split into separate Sidekiq and Unicorn/Workhorse nodes. One indication that
this architecture is required is if Sidekiq queues begin to periodically increase
in size, indicating that there is contention or there are not enough resources.
- 1 or more PostgreSQL nodes
- 1 or more Redis nodes
- 1 or more Gitaly storage servers
- 1 or more Object Storage services[^3] and / or NFS storage server[^4]
- 2 or more Sidekiq nodes
......@@ -182,6 +182,7 @@ the basis of the GitLab.com architecture. While this scales well it also comes
with the added complexity of many more nodes to configure, manage, and monitor.
- 3 PostgreSQL nodes
- 1 or more PgBouncer nodes (with associated internal load balancers)
- 4 or more Redis nodes (2 separate clusters for persistent and cache data)
- 3 Consul nodes
- 3 Sentinel nodes
......@@ -228,16 +229,17 @@ users are, how much automation you use, mirroring, and repo/change size.
| ----------------------------|-------|-----------------------|---------------|
| GitLab Rails <br> - Puma workers on each node set to 90% of available CPUs with 16 threads | 3 | 32 vCPU, 28.8GB Memory | n1-highcpu-32 |
| PostgreSQL | 3 | 4 vCPU, 15GB Memory | n1-standard-4 |
| PgBouncer | 3 | 2 vCPU, 1.8GB Memory | n1-highcpu-2 |
| Gitaly <br> - Gitaly Ruby workers on each node set to 20% of available CPUs | X[^1] | 16 vCPU, 60GB Memory | n1-standard-16 |
| Redis Cache + Sentinel <br> - Cache maxmemory set to 90% of available memory | 3 | 4 vCPU, 15GB Memory | n1-standard-4 |
| Redis Persistent + Sentinel | 3 | 4 vCPU, 15GB Memory | n1-standard-4 |
| Sidekiq | 4 | 4 vCPU, 15GB Memory | n1-standard-4 |
| Consul | 3 | 2 vCPU, 1.8GB Memory | n1-highcpu-2 |
| NFS Server[^4] | 1 | 4 vCPU, 3.6GB Memory | n1-highcpu-4 |
| S3 Object Storage[^3] | - | - | - |
| Monitoring node | 1 | 4 vCPU, 3.6GB Memory | n1-highcpu-4 |
| External load balancing node[^2] | 1 | 2 vCPU, 1.8GB Memory | n1-highcpu-2 |
| Internal load balancing node[^2] | 1 | 2 vCPU, 1.8GB Memory | n1-highcpu-2 |
NOTE: **Note:** Memory values are given directly by GCP machine sizes. On other cloud
vendors, a like-for-like machine size can be used on a best-effort basis.
......@@ -255,16 +257,17 @@ vendors a best effort like for like can be used.
| ----------------------------|-------|-----------------------|---------------|
| GitLab Rails <br> - Puma workers on each node set to 90% of available CPUs with 16 threads | 7 | 32 vCPU, 28.8GB Memory | n1-highcpu-32 |
| PostgreSQL | 3 | 8 vCPU, 30GB Memory | n1-standard-8 |
| PgBouncer | 3 | 2 vCPU, 1.8GB Memory | n1-highcpu-2 |
| Gitaly <br> - Gitaly Ruby workers on each node set to 20% of available CPUs | X[^1] | 32 vCPU, 120GB Memory | n1-standard-32 |
| Redis Cache + Sentinel <br> - Cache maxmemory set to 90% of available memory | 3 | 4 vCPU, 15GB Memory | n1-standard-4 |
| Redis Persistent + Sentinel | 3 | 4 vCPU, 15GB Memory | n1-standard-4 |
| Sidekiq | 4 | 4 vCPU, 15GB Memory | n1-standard-4 |
| Consul | 3 | 2 vCPU, 1.8GB Memory | n1-highcpu-2 |
| NFS Server[^4] | 1 | 4 vCPU, 3.6GB Memory | n1-highcpu-4 |
| S3 Object Storage[^3] | - | - | - |
| Monitoring node | 1 | 4 vCPU, 3.6GB Memory | n1-highcpu-4 |
| External load balancing node[^2] | 1 | 2 vCPU, 1.8GB Memory | n1-highcpu-2 |
| Internal load balancing node[^2] | 1 | 4 vCPU, 3.6GB Memory | n1-highcpu-4 |
NOTE: **Note:** Memory values are given directly by GCP machine sizes. On other cloud
vendors, a like-for-like machine size can be used on a best-effort basis.
......@@ -284,16 +287,17 @@ may be adjusted prior to certification based on performance testing.
| ----------------------------|-------|-----------------------|---------------|
| GitLab Rails <br> - Puma workers on each node set to 90% of available CPUs with 16 threads | 15 | 32 vCPU, 28.8GB Memory | n1-highcpu-32 |
| PostgreSQL | 3 | 8 vCPU, 30GB Memory | n1-standard-8 |
| PgBouncer | 3 | 2 vCPU, 1.8GB Memory | n1-highcpu-2 |
| Gitaly <br> - Gitaly Ruby workers on each node set to 20% of available CPUs | X[^1] | 64 vCPU, 240GB Memory | n1-standard-64 |
| Redis Cache + Sentinel <br> - Cache maxmemory set to 90% of available memory | 3 | 4 vCPU, 15GB Memory | n1-standard-4 |
| Redis Persistent + Sentinel | 3 | 4 vCPU, 15GB Memory | n1-standard-4 |
| Sidekiq | 4 | 4 vCPU, 15GB Memory | n1-standard-4 |
| Consul | 3 | 2 vCPU, 1.8GB Memory | n1-highcpu-2 |
| NFS Server[^4] | 1 | 4 vCPU, 3.6GB Memory | n1-highcpu-4 |
| S3 Object Storage[^3] | - | - | - |
| Monitoring node | 1 | 4 vCPU, 3.6GB Memory | n1-highcpu-4 |
| External load balancing node[^2] | 1 | 2 vCPU, 1.8GB Memory | n1-highcpu-2 |
| Internal load balancing node[^2] | 1 | 8 vCPU, 7.2GB Memory | n1-highcpu-8 |
NOTE: **Note:** Memory values are given directly by GCP machine sizes. On other cloud
vendors, a like-for-like machine size can be used on a best-effort basis.
......@@ -305,18 +309,18 @@ vendors a best effort like for like can be used.
project counts and sizes.
[^2]: Our architectures have been tested and validated with [HAProxy](https://www.haproxy.org/)
    as the load balancer. However, other reputable load balancers with similar feature sets
    should also work here, but be aware that these aren't validated.
[^3]: For data objects such as LFS, Uploads, and Artifacts, we recommend S3 Object Storage
    over NFS where possible, due to better performance and availability. Several types of objects
    are supported for S3 storage - [Job artifacts](../job_artifacts.md#using-object-storage),
    [LFS](../lfs/lfs_administration.md#storing-lfs-objects-in-remote-object-storage),
    [Uploads](../uploads.md#using-object-storage-core-only),
    [Merge Request Diffs](../merge_request_diffs.md#using-object-storage),
    [Packages](../packages/index.md#using-object-storage) (Optional Feature),
    [Dependency Proxy](../packages/dependency_proxy.md#using-object-storage) (Optional Feature).
[^4]: NFS storage server is still required for [GitLab Pages](https://gitlab.com/gitlab-org/gitlab-pages/issues/196)
    and optionally for CI Job Incremental Logging
    ([can be switched to use Redis instead](https://docs.gitlab.com/ee/administration/job_logs.html#new-incremental-logging-architecture)).
......@@ -135,7 +135,8 @@ The recommended configuration for a PostgreSQL HA requires:
- `repmgrd` - A service to monitor the cluster and handle failover in case of a failure
- `Consul` agent - Used for service discovery, to alert other nodes when failover occurs
- A minimum of three `Consul` server nodes
- A minimum of one `pgbouncer` service node, but it's recommended to have one per database node
- An internal load balancer (TCP) is required when there is more than one `pgbouncer` service node
You also need to take into consideration the underlying network topology,
making sure you have redundant connectivity between all Database and GitLab instances,
......@@ -155,13 +156,13 @@ Database nodes run two services with PostgreSQL:
On failure, the old master node is automatically evicted from the cluster, and should be rejoined manually once recovered.
- Consul. Monitors the status of each node in the database cluster and tracks its health in a service definition on the Consul cluster.
Alongside each PgBouncer, there is a Consul agent that watches the status of the PostgreSQL service. If that status changes, Consul runs a script which updates the configuration and reloads PgBouncer.
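If you want to see what the Consul agents are tracking for the `postgresql` service, you can query the agent's HTTP API. This is a minimal sketch, assuming the default Consul HTTP API address of `127.0.0.1:8500` on the local agent:

```sh
# Query the local Consul agent for the health of the postgresql service it watches.
# Assumes the default HTTP API address of 127.0.0.1:8500.
curl --silent http://127.0.0.1:8500/v1/health/service/postgresql
```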
##### Connection flow
Each service in the package comes with a set of [default ports](https://docs.gitlab.com/omnibus/package-information/defaults.html#ports). You may need to make specific firewall rules for the connections listed below:
- Application servers connect to either PgBouncer directly via its [default port](https://docs.gitlab.com/omnibus/package-information/defaults.html#pgbouncer) or via a configured Internal Load Balancer (TCP) that serves multiple PgBouncers.
- PgBouncer connects to the primary database server's [PostgreSQL default port](https://docs.gitlab.com/omnibus/package-information/defaults.html#postgresql)
- Repmgr connects to the database servers' [PostgreSQL default port](https://docs.gitlab.com/omnibus/package-information/defaults.html#postgresql)
- Postgres secondaries connect to the primary database server's [PostgreSQL default port](https://docs.gitlab.com/omnibus/package-information/defaults.html#postgresql)
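As an illustration of the firewall rules mentioned above, the following is a sketch using `ufw`, assuming the default ports and the example `10.6.0.0/16` private range used later in this document; adapt it to your own network and firewall tooling:

```sh
# On the PgBouncer / internal load balancer nodes: allow the private network to reach PgBouncer.
sudo ufw allow from 10.6.0.0/16 to any port 6432 proto tcp

# On the database nodes: allow the private network to reach PostgreSQL.
sudo ufw allow from 10.6.0.0/16 to any port 5432 proto tcp
```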
......@@ -499,7 +500,7 @@ attributes set, but the following need to be set.
# Disable PostgreSQL on the application node
postgresql['enable'] = false
gitlab_rails['db_host'] = 'PGBOUNCER_NODE' or 'INTERNAL_LOAD_BALANCER'
gitlab_rails['db_port'] = 6432
gitlab_rails['db_password'] = 'POSTGRESQL_USER_PASSWORD'
gitlab_rails['auto_migrate'] = false
......@@ -533,7 +534,8 @@ Here we'll show you some fully expanded example configurations.
##### Example recommended setup
This example uses 3 Consul servers, 3 PgBouncer servers (with an associated internal load balancer),
3 PostgreSQL servers, and 1 application node.
We start with all servers on the same 10.6.0.0/16 private network range; they
can connect to each other freely on those addresses.
......@@ -543,14 +545,16 @@ Here is a list and description of each machine and the assigned IP:
- `10.6.0.11`: Consul 1
- `10.6.0.12`: Consul 2
- `10.6.0.13`: Consul 3
- `10.6.0.20`: Internal Load Balancer
- `10.6.0.21`: PgBouncer 1
- `10.6.0.22`: PgBouncer 2
- `10.6.0.23`: PgBouncer 3
- `10.6.0.31`: PostgreSQL master
- `10.6.0.32`: PostgreSQL secondary
- `10.6.0.33`: PostgreSQL secondary
- `10.6.0.41`: GitLab application
All passwords are set to `toomanysecrets`; please do not use this password or derived hashes. The `external_url` for GitLab is `http://gitlab.example.com`.
Please note that after the initial configuration, if a failover occurs, the PostgreSQL master will change to one of the available secondaries until it is failed back.
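To check which node currently holds the master role, you can inspect the repmgr cluster state from any database node, for example:

```sh
# Show the current master and standby roles as tracked by repmgr.
gitlab-ctl repmgr cluster show
```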
......@@ -566,10 +570,45 @@ consul['configuration'] = {
  server: true,
  retry_join: %w(10.6.0.11 10.6.0.12 10.6.0.13)
}
consul['monitoring_service_discovery'] = true
```
[Reconfigure Omnibus GitLab][reconfigure GitLab] for the changes to take effect.
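Once all three Consul servers have been reconfigured, you can optionally verify that the cluster has formed; a quick check, assuming the bundled Consul binary location:

```sh
# List cluster members from any Consul server node; all three should report as alive.
/opt/gitlab/embedded/bin/consul members
```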
##### Example recommended setup for PgBouncer servers
On each server edit `/etc/gitlab/gitlab.rb`:
```ruby
# Disable all components except PgBouncer and Consul agent
roles ['pgbouncer_role']

# Configure PgBouncer
pgbouncer['admin_users'] = %w(pgbouncer gitlab-consul)
pgbouncer['users'] = {
  'gitlab-consul': {
    password: '5e0e3263571e3704ad655076301d6ebe'
  },
  'pgbouncer': {
    password: '771a8625958a529132abe6f1a4acb19c'
  }
}

consul['watchers'] = %w(postgresql)
consul['enable'] = true
consul['configuration'] = {
  retry_join: %w(10.6.0.11 10.6.0.12 10.6.0.13)
}
consul['monitoring_service_discovery'] = true
```
[Reconfigure Omnibus GitLab][reconfigure GitLab] for the changes to take effect.
##### Internal load balancer setup
An internal load balancer (TCP) is then required to be set up to serve each PgBouncer node (in this example, on the IP `10.6.0.20`). An example of how to do this can be found in the [PgBouncer Configure Internal Load Balancer](pgbouncer.md#configure-the-internal-load-balancer) section.
##### Example recommended setup for PostgreSQL servers
###### Primary node
......@@ -589,9 +628,6 @@ postgresql['shared_preload_libraries'] = 'repmgr_funcs'
# Disable automatic database migrations
gitlab_rails['auto_migrate'] = false
postgresql['pgbouncer_user_password'] = '771a8625958a529132abe6f1a4acb19c'
postgresql['sql_user_password'] = '450409b85a0223a214b5fb1484f34d0f'
postgresql['max_wal_senders'] = 4
......@@ -599,9 +635,13 @@ postgresql['max_wal_senders'] = 4
postgresql['trust_auth_cidr_addresses'] = %w(10.6.0.0/16)
repmgr['trust_auth_cidr_addresses'] = %w(10.6.0.0/16)
# Configure the Consul agent
consul['services'] = %w(postgresql)
consul['enable'] = true
consul['configuration'] = {
  retry_join: %w(10.6.0.11 10.6.0.12 10.6.0.13)
}
consul['monitoring_service_discovery'] = true
```
[Reconfigure Omnibus GitLab][reconfigure GitLab] for the changes to take effect.
......@@ -626,18 +666,15 @@ On the server edit `/etc/gitlab/gitlab.rb`:
```ruby
external_url 'http://gitlab.example.com'
gitlab_rails['db_host'] = '10.6.0.20' # Internal Load Balancer for PgBouncer nodes
gitlab_rails['db_port'] = 6432
gitlab_rails['db_password'] = 'toomanysecrets'
gitlab_rails['auto_migrate'] = false
postgresql['enable'] = false
pgbouncer['enable'] = false
consul['enable'] = true
# Configure Consul agent
consul['watchers'] = %w(postgresql)
......@@ -661,7 +698,7 @@ consul['configuration'] = {
After deploying the configuration follow these steps:
1. On `10.6.0.31`, our primary database
Enable the `pg_trgm` extension
......@@ -673,7 +710,7 @@ After deploying the configuration follow these steps:
CREATE EXTENSION pg_trgm;
```
1. On `10.6.0.32`, our first standby database
Make this node a standby of the primary
......@@ -681,7 +718,7 @@ After deploying the configuration follow these steps:
gitlab-ctl repmgr standby setup 10.6.0.31
```
1. On `10.6.0.33`, our second standby database
Make this node a standby of the primary
......@@ -689,7 +726,7 @@ After deploying the configuration follow these steps:
gitlab-ctl repmgr standby setup 10.6.0.31
```
1. On `10.6.0.41`, our application server
Set `gitlab-consul` user's PgBouncer password to `toomanysecrets`
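The Omnibus helper typically used for this step is `gitlab-ctl write-pgpass`; the following is a sketch only, with the flags and the `--host` value (the internal load balancer in this example) to be verified against your GitLab version:

```sh
# Write a .pgpass entry so the gitlab-consul host user can authenticate to PgBouncer.
# You will be prompted for the password (toomanysecrets in this example).
# --host is assumed here to be the internal load balancer fronting the PgBouncer nodes.
gitlab-ctl write-pgpass --host 10.6.0.20 --database pgbouncer --user pgbouncer --hostuser gitlab-consul
```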
......@@ -705,7 +742,7 @@ After deploying the configuration follow these steps:
#### Example minimal setup
This example uses 3 PostgreSQL servers, and 1 application node (with PgBouncer set up alongside).
It differs from the [recommended setup](#example-recommended-setup) by moving the Consul servers into the same servers we use for PostgreSQL.
The trade-off is between reducing server counts and the increased operational complexity of needing to deal with PostgreSQL [failover](#failover-procedure) and [restore](#restore-procedure) procedures in addition to [Consul outage recovery](consul.md#outage-recovery) on the same set of machines.
......@@ -4,13 +4,9 @@ type: reference
# Working with the bundled PgBouncer service
As part of its High Availability stack, GitLab Premium includes a bundled version of [PgBouncer](https://pgbouncer.github.io/) that can be managed through `/etc/gitlab/gitlab.rb`. PgBouncer is used to seamlessly migrate database connections between servers in a failover scenario. Additionally, it can be used in a non-HA setup to pool connections, speeding up response time while reducing resource usage.
In an HA setup, it's recommended to run a PgBouncer node separately for each database node, with an internal load balancer (TCP) serving each accordingly.
## Operations
......@@ -18,7 +14,7 @@ It is recommended to run PgBouncer alongside the `gitlab-rails` service, or on i
1. Make sure you collect [`CONSUL_SERVER_NODES`](database.md#consul-information), [`CONSUL_PASSWORD_HASH`](database.md#consul-information), and [`PGBOUNCER_PASSWORD_HASH`](database.md#pgbouncer-information) before executing the next step.
1. On each node, edit the `/etc/gitlab/gitlab.rb` config file and replace the values noted in the `# START user configuration` section as shown below:
```ruby
# Disable all components except PgBouncer and Consul agent
......@@ -67,7 +63,7 @@ It is recommended to run PgBouncer alongside the `gitlab-rails` service, or on i
#### PgBouncer Checkpoint
1. Ensure each node is talking to the current master:
```sh
gitlab-ctl pgb-console # You will be prompted for PGBOUNCER_PASSWORD
......@@ -100,6 +96,41 @@ It is recommended to run PgBouncer alongside the `gitlab-rails` service, or on i
(2 rows)
```
#### Configure the internal load balancer
If you're running more than one PgBouncer node, as recommended, you'll need to set up a TCP internal load balancer to serve each node correctly. This can be done with any reputable TCP load balancer.
As an example, here's how you could do it with [HAProxy](https://www.haproxy.org/):
```
global
    log /dev/log local0
    log localhost local1 notice
    log stdout format raw local0

defaults
    log global
    default-server inter 10s fall 3 rise 2
    balance leastconn

frontend internal-pgbouncer-tcp-in
    bind *:6432
    mode tcp
    option tcplog

    default_backend pgbouncer

backend pgbouncer
    mode tcp
    option tcp-check

    server pgbouncer1 <ip>:6432 check
    server pgbouncer2 <ip>:6432 check
    server pgbouncer3 <ip>:6432 check
```
Refer to your preferred Load Balancer's documentation for further guidance.
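Once the load balancer is in place and an application node's `gitlab_rails['db_host']` points at it, a simple end-to-end check is to open a database console from that application node; a successful connection confirms the application to load balancer to PgBouncer to PostgreSQL path:

```sh
# Run on an application node configured to use the internal load balancer as db_host.
sudo gitlab-rails dbconsole
```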
### Running PgBouncer as part of a non-HA GitLab installation
1. Generate PGBOUNCER_USER_PASSWORD_HASH with the command `gitlab-ctl pg-password-md5 pgbouncer`