Commit 981ffd39 authored by Drew Blessing

Add HA architecture examples

parent ddbbd3bd
@@ -19,21 +19,102 @@ solution. And the more complex the solution, the more work is involved in
setting up and maintaining it. High availability is not free and every HA
solution should balance the costs against the benefits.

# Architecture

There are many options when choosing a highly available GitLab architecture. We
recommend engaging with GitLab Support to choose the best architecture for your
use case. This page describes several options and guidelines based on
experience with GitLab.com and EE on-premises customers.

## GitLab Components

The following components need to be considered for an HA environment. In many
cases, components can be combined on the same nodes to reduce complexity.

- Unicorn/Workhorse - Web requests (UI, API, Git over HTTP)
- Sidekiq - Asynchronous/background jobs
- PostgreSQL - Database
- Consul - Database service discovery and health checks/failover
- PGBouncer - Database pool manager
- Redis - Key/value store (user sessions, cache, queue for Sidekiq)
- Sentinel - Redis health check/failover manager

## Architecture Examples
For all examples below, we recommend running Consul and Redis Sentinel on
dedicated nodes. If Consul runs on the PostgreSQL nodes, or Sentinel on the
Redis nodes, high resource usage by PostgreSQL or Redis could prevent them from
communicating with the other Consul and Sentinel nodes. Those nodes may then
conclude that a failure has occurred and trigger an unnecessary automated
failover. Isolating Consul and Sentinel from the services they monitor reduces
the chance of split-brain.
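
Why three Consul/Sentinel nodes rather than two? Both systems only act with a majority: Consul servers replicate their state with the Raft consensus protocol, and a Redis Sentinel must be authorized by a majority of Sentinels before it starts a failover. The Python sketch below is a simplified majority-vote model (it ignores Sentinel's separate `quorum` setting and all network timing); the function names are illustrative, not part of any GitLab or Consul API.

```python
def majority(voters: int) -> int:
    """Smallest number of voters that forms a strict majority of `voters`."""
    return voters // 2 + 1


def tolerated_failures(voters: int) -> int:
    """How many voters can be lost while the survivors still hold a majority."""
    return voters - majority(voters)


def partition_can_act(side: int, voters: int) -> bool:
    """True if one side of a network partition holds a majority and may fail over."""
    return side >= majority(voters)


if __name__ == "__main__":
    for n in (2, 3, 5):
        print(f"{n} nodes: majority={majority(n)}, tolerates {tolerated_failures(n)} failure(s)")

    # A 3-node cluster split 2|1: only the larger side can act, so the two
    # sides can never both promote a new primary.
    print(partition_can_act(2, 3), partition_can_act(1, 3))  # True False
    # A 2-node cluster split 1|1: neither side has a majority, so no failover happens.
    print(partition_can_act(1, 2))  # False
```

With two nodes, losing either one leaves no majority, so automated failover stops. With three, one node can fail (or be starved of resources) while the other two still agree.
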

The examples below largely do not address high availability of NFS. Some
enterprises have access to NFS appliances that manage availability themselves,
which is the best-case scenario. In the future, GitLab may offer a more
user-friendly solution for HA storage ([GitLab HA Storage](https://gitlab.com/gitlab-org/omnibus-gitlab/issues/2472)).

Many variations are possible between these examples. Work with GitLab Support
to understand the best starting point for your workload, and adapt from there.

### Horizontal
This is the simplest form of high availability and scaling. It requires the
fewest individual servers (virtual or physical) but has some trade-offs and
limits.

This architecture works well for many GitLab customers. Larger customers may
notice that certain events cause contention or high load, for example cloning
many large repositories with binary files, high API usage, or a large number of
enqueued Sidekiq jobs. If this happens, consider moving to a hybrid or fully
distributed architecture, depending on what is causing the contention.

- 2 PostgreSQL nodes
- 2 Redis nodes
- 3 Consul/Sentinel nodes
- 2 or more GitLab application nodes (Unicorn, Workhorse, Sidekiq, PGBouncer)
- 1 NFS server/appliance

![Horizontal architecture diagram](../img/high_availability/horizontal.png)
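
As a sanity check, the Horizontal layout can be written down as data and compared with the component list at the top of this page. The Python sketch below is illustrative only: the role names are hypothetical labels, "2 or more" is taken at its minimum of 2, and NFS is added as a required component because every example includes it. It totals the nodes, confirms every component is hosted somewhere, and flags roles that have a single node.

```python
# Components from "GitLab Components", plus the NFS storage used by every example.
REQUIRED = {"Unicorn", "Workhorse", "Sidekiq", "PostgreSQL", "Consul",
            "PGBouncer", "Redis", "Sentinel", "NFS"}

# (hypothetical role name, minimum node count, components on each node of that role)
HORIZONTAL = [
    ("postgresql", 2, {"PostgreSQL"}),
    ("redis", 2, {"Redis"}),
    ("consul-sentinel", 3, {"Consul", "Sentinel"}),
    ("application", 2, {"Unicorn", "Workhorse", "Sidekiq", "PGBouncer"}),
    ("nfs", 1, {"NFS"}),
]


def total_nodes(layout):
    """Sum of the node counts across all roles."""
    return sum(count for _, count, _ in layout)


def missing_components(layout, required):
    """Required components that no role in the layout hosts."""
    hosted = set().union(*(components for _, _, components in layout))
    return required - hosted


def single_node_roles(layout):
    """Roles with only one node: each one is a single point of failure."""
    return [role for role, count, _ in layout if count < 2]


print(total_nodes(HORIZONTAL))                   # 10
print(missing_components(HORIZONTAL, REQUIRED))  # set()
print(single_node_roles(HORIZONTAL))             # ['nfs']
```

The only single-node role is NFS, which matches the note above that these examples rely on the NFS server or appliance for its own availability.
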
### Hybrid
In this architecture, certain components run on dedicated nodes so that high
resource usage by one component does not interfere with the others. In larger
environments, this is a good architecture to consider if you have, or foresee,
contention caused by certain workloads.
- 2 PostgreSQL nodes
- 2 Redis nodes
- 3 Consul/Sentinel nodes
- 2 or more Sidekiq nodes
- 2 or more Web nodes (Unicorn, Workhorse, PGBouncer)
- 1 or more NFS servers/appliances

![Hybrid architecture diagram](../img/high_availability/hybrid.png)
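
The key difference in the Hybrid layout is that no node runs both Sidekiq and the web-serving components. A minimal Python sketch of that check, using a hypothetical role-list representation with "2 or more" and "1 or more" taken at their minimums, compares the Hybrid layout with the combined application role of the Horizontal layout:

```python
# (hypothetical role name, minimum node count, components on each node of that role)
HYBRID = [
    ("postgresql", 2, {"PostgreSQL"}),
    ("redis", 2, {"Redis"}),
    ("consul-sentinel", 3, {"Consul", "Sentinel"}),
    ("sidekiq", 2, {"Sidekiq"}),
    ("web", 2, {"Unicorn", "Workhorse", "PGBouncer"}),
    ("nfs", 1, {"NFS"}),
]

# The Horizontal layout's combined application role, for comparison.
HORIZONTAL_APP = [("application", 2, {"Unicorn", "Workhorse", "Sidekiq", "PGBouncer"})]


def colocated_roles(layout, a, b):
    """Roles whose nodes run both component `a` and component `b`."""
    return [role for role, _, comps in layout if a in comps and b in comps]


print(sum(count for _, count, _ in HYBRID))                  # 12
print(colocated_roles(HYBRID, "Sidekiq", "Unicorn"))          # []
print(colocated_roles(HORIZONTAL_APP, "Sidekiq", "Unicorn"))  # ['application']
```

In this model, a backlog of Sidekiq jobs competes only with other background jobs rather than with web requests, at the cost of two more nodes than the Horizontal minimum of ten.
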
### Fully Distributed
This architecture scales to hundreds of thousands of users and projects and is
the basis of the GitLab.com architecture. While it scales well, it also adds the
complexity of many more nodes to configure, manage, and monitor.
- 2 PostgreSQL nodes
- 4 or more Redis nodes (2 separate clusters for persistent and cache data)
- 3 Consul nodes
- 3 Sentinel nodes
- Multiple dedicated Sidekiq nodes (split into real-time, best effort, ASAP,
  CI Pipeline and Pull Mirror sets)
- 2 or more Git nodes (Git over SSH/Git over HTTP)
- 2 or more API nodes (all requests to `/api`)
- 2 or more Web nodes (all other web requests)
- 2 or more NFS servers/appliances

![Fully Distributed architecture diagram](../img/high_availability/fully-distributed.png)
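
In the Fully Distributed layout, the load balancer decides which node pool serves each request. This page does not give GitLab.com's actual routing rules, so the Python sketch below is an assumption-based illustration: requests under `/api` go to the API nodes, Git's smart HTTP endpoints (`info/refs`, `git-upload-pack`, `git-receive-pack`) and `ssh://` URLs go to the Git nodes, and everything else goes to the Web nodes. The pool names are hypothetical, other traffic such as Git LFS is ignored, and in practice SSH is usually routed by TCP port rather than by URL.

```python
from urllib.parse import urlsplit

# Path endings used by Git's smart HTTP protocol (clone, fetch, push).
GIT_HTTP_SUFFIXES = ("/info/refs", "/git-upload-pack", "/git-receive-pack")


def node_pool(url: str) -> str:
    """Return the hypothetical node pool ("git", "api" or "web") for a request URL."""
    parts = urlsplit(url)
    if parts.scheme == "ssh":
        return "git"  # Git over SSH
    path = parts.path
    if path == "/api" or path.startswith("/api/"):
        return "api"  # all requests to /api
    if path.endswith(GIT_HTTP_SUFFIXES):
        return "git"  # Git over HTTP
    return "web"      # all other web requests


for url in (
    "https://gitlab.example.com/api/v4/projects",
    "https://gitlab.example.com/group/project.git/info/refs?service=git-upload-pack",
    "https://gitlab.example.com/group/project.git/git-receive-pack",
    "ssh://git@gitlab.example.com/group/project.git",
    "https://gitlab.example.com/group/project/-/issues",
):
    print(node_pool(url), url)
```

The API check matches `/api` exactly or the `/api/` prefix, so an unrelated path such as `/apidocs` stays on the Web nodes.
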

The following pages outline the steps necessary to configure each component
separately:
1. [Configure the database](database.md)
1. [Configure Redis](redis.md)
@@ -42,22 +123,3 @@ Follow the steps below to configure an active/active setup:
1. [Configure the GitLab application servers](gitlab.md)
1. [Configure the load balancers](load_balancer.md)