Commit 13d9b357 authored by Nick Thomas

Merge branch 'db-service-discovery' into 'master'

Add service discovery for the DB load balancer

See merge request gitlab-org/gitlab-ee!5883
parents be874b29 e457de0f
@@ -13,6 +13,11 @@ production:
# hosts:
#   - host1.example.com
#   - host2.example.com
# discover:
#   nameserver: 1.2.3.4
#   port: 8600
#   record: secondary.postgresql.service.consul
#   interval: 300
#
# Development specific
@@ -8,5 +8,9 @@ if ActiveRecord::Base.connected? && ActiveRecord::Base.connection.table_exists?(
end
Gitlab::Database::LoadBalancing.configure_proxy
# Service discovery must be started after configuring the proxy, as service
# discovery depends on this.
Gitlab::Database::LoadBalancing.start_service_discovery
end
end
@@ -125,6 +125,9 @@ after_fork do |server, worker|
defined?(::Prometheus::Client.reinitialize_on_pid_change) &&
  Prometheus::Client.reinitialize_on_pid_change
defined?(Gitlab::Database::LoadBalancing) &&
  Gitlab::Database::LoadBalancing.start_service_discovery
# if preload_app is true, then you may also want to check and
# restart any other shared sockets/descriptors such as Memcached,
# and Redis. TokyoCabinet file handles are safe to reuse
@@ -99,6 +99,76 @@ the following. This will balance the load between `host1.example.com` and
1. Save the file and [restart GitLab][] for the changes to take effect.
## Service Discovery
> [Introduced][ee-5883] in [GitLab Premium][eep] 11.0.
Service discovery allows GitLab to automatically retrieve a list of secondary
databases to use, instead of having to manually specify these in the
`database.yml` configuration file. Service discovery works by periodically
checking a DNS A record, using the IPs returned by this record as the addresses
for the secondaries. For service discovery to work, all you need is a DNS server
and an A record containing the IP addresses of your secondaries.
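As a rough illustration of the lookup described above (a sketch, not
necessarily how GitLab performs it internally), the following uses Ruby's
standard `resolv` library to fetch an A record and return its addresses. The
helper name `secondary_addresses` is hypothetical, and the nameserver, port and
record in the usage comment are placeholder values.

```ruby
require 'resolv'

# Hypothetical helper: returns the sorted IPv4 addresses found in the given
# A record. `resolver` is expected to behave like a Resolv::DNS instance.
def secondary_addresses(resolver, record)
  resolver
    .getresources(record, Resolv::DNS::Resource::IN::A)
    .map { |resource| resource.address.to_s }
    .sort
end

# Usage (assumes a DNS server, such as Consul, listening on localhost:8600):
#
#   resolver = Resolv::DNS.new(nameserver_port: [['localhost', 8600]])
#   secondary_addresses(resolver, 'secondary.postgresql.service.consul')
```

Sorting the addresses makes it easy to compare two lookups and detect whether
the list of secondaries changed.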
To use service discovery you need to change your `database.yml` configuration
file so it looks like the following:
```yaml
production:
  username: gitlab
  database: gitlab
  encoding: unicode
  load_balancing:
    discover:
      nameserver: localhost
      record: secondary.postgresql.service.consul
      port: 8600
      interval: 60
      disconnect_timeout: 120
```
Here the `discover:` section specifies the configuration details to use for
service discovery.
### Configuration
The following options can be set:
| Option | Description | Default |
|----------------------|---------------------------------------------------------------------------------------------------|-----------|
| `nameserver` | The nameserver to use for looking up the DNS record. | localhost |
| `record` | The A record to look up. This option is required for service discovery to work. | |
| `port` | The port of the nameserver. | 8600 |
| `interval` | The minimum time in seconds between checking the DNS record. | 60 |
| `disconnect_timeout` | The time in seconds after which an old connection is closed, after the list of hosts was updated. | 120 |
The `interval` value specifies the _minimum_ time between checks. If the A
record has a TTL greater than this value, then service discovery will honor said
TTL. For example, if the TTL of the A record is 90 seconds, then service
discovery will wait at least 90 seconds before checking the A record again.
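The rule above amounts to taking the larger of the two values. A minimal
sketch, assuming a hypothetical helper `wait_time` and a TTL that may be absent
(`nil`) when the lookup returns no records:

```ruby
# Hypothetical helper: the configured interval acts as a lower bound, and a
# larger TTL takes precedence. A missing TTL falls back to the interval.
def wait_time(ttl, interval)
  [ttl || interval, interval].max
end

wait_time(90, 60)  # => 90
wait_time(10, 60)  # => 60
wait_time(nil, 60) # => 60
```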
When the list of hosts is updated, it might take a while for the old connections
to be terminated. The `disconnect_timeout` setting can be used to enforce an
upper limit on the time it will take to terminate all old database connections.
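The idea behind `disconnect_timeout` can be sketched as a polling loop with a
deadline. This is an illustration under stated assumptions rather than the
exact implementation: `disconnect_when_idle` and its `poll` parameter are
hypothetical names, and `pool` is assumed to be any object that responds to
`connections` and `disconnect!`, with each connection responding to `in_use?`.

```ruby
# Hypothetical helper: waits until no connection is in use, or until
# `timeout` seconds have passed, and then disconnects the pool either way.
def disconnect_when_idle(pool, timeout: 120, poll: 2)
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout

  while Process.clock_gettime(Process::CLOCK_MONOTONIC) <= deadline
    # Stop waiting as soon as every connection has been checked back in.
    break if pool.connections.none?(&:in_use?)

    sleep(poll)
  end

  pool.disconnect!
end
```

A monotonic clock is used so that changes to the system time do not shorten or
extend the wait.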
### Forking
If you use an application server that forks, such as Unicorn, you _have to_
update your Unicorn configuration to start service discovery _after_ a fork.
Threads do not survive a fork, so without this change service discovery only
runs in the parent process. If you are using Unicorn, add the following to
your Unicorn configuration file:
```ruby
after_fork do |server, worker|
  defined?(Gitlab::Database::LoadBalancing) &&
    Gitlab::Database::LoadBalancing.start_service_discovery
end
```
This will ensure that service discovery is started in both the parent and all
child processes.
## Balancing queries
Read-only `SELECT` queries will be balanced among all the secondary hosts.
@@ -198,3 +268,4 @@ production:
[wikipedia]: https://en.wikipedia.org/wiki/Load_balancing_(computing)
[db-req]: ../install/requirements.md#database
[ee-3526]: https://gitlab.com/gitlab-org/gitlab-ee/merge_requests/3526
[ee-5883]: https://gitlab.com/gitlab-org/gitlab-ee/merge_requests/5883
---
title: Add service discovery for the DB load balancer
merge_request:
author:
type: added
@@ -50,6 +50,22 @@ module Gitlab
  configuration['hosts'] || []
end

def self.service_discovery_enabled?
  configuration.dig('discover', 'record').present?
end

def self.service_discovery_configuration
  conf = configuration['discover'] || {}

  {
    nameserver: conf['nameserver'] || 'localhost',
    port: conf['port'] || 8600,
    record: conf['record'],
    interval: conf['interval'] || 60,
    disconnect_timeout: conf['disconnect_timeout'] || 120
  }
end

def self.log(level, message)
  Rails.logger.tagged(LOG_TAG) do
    Rails.logger.send(level, message)
@@ -63,15 +79,22 @@ module Gitlab
# Returns true if load balancing is to be enabled.
def self.enable?
  return false unless ::License.feature_available?(:db_load_balancing)
  return false if program_name == 'rake' || Sidekiq.server?
  return false unless Database.postgresql?

  hosts.any? || service_discovery_enabled?
end

def self.program_name
  @program_name ||= File.basename($0)
end

def self.start_service_discovery
  return unless service_discovery_enabled?

  ServiceDiscovery.new(service_discovery_configuration).start
end

# Configures proxying of requests.
def self.configure_proxy
  self.proxy = ConnectionProxy.new(hosts)
@@ -3,7 +3,7 @@ module Gitlab
module LoadBalancing
# A single database host used for load balancing.
class Host
attr_reader :pool, :last_checked_at, :intervals, :load_balancer, :host
delegate :connection, :release_connection, to: :pool delegate :connection, :release_connection, to: :pool
@@ -34,6 +34,22 @@ module Gitlab
  @intervals = (interval..(interval * 2)).step(0.5).to_a
end

# Disconnects the pool, once all connections are no longer in use.
#
# timeout - The time after which the pool should be forcefully
#           disconnected.
def disconnect!(timeout = 120)
  start_time = Metrics::System.monotonic_time

  while (Metrics::System.monotonic_time - start_time) <= timeout
    break if pool.connections.none?(&:in_use?)

    sleep(2)
  end

  pool.disconnect!
end

def offline!
  LoadBalancing.log(:warn, "Marking host #{@host} as offline")
@@ -3,8 +3,6 @@ module Gitlab
module LoadBalancing
# A list of database hosts to use for connections.
class HostList
attr_reader :hosts
# hosts - The list of secondary hosts to add.
def initialize(hosts = [])
  @hosts = hosts.shuffle
@@ -12,8 +10,23 @@ module Gitlab
  @mutex = Mutex.new
end

def hosts
  @mutex.synchronize { @hosts }
end

def length
  @mutex.synchronize { @hosts.length }
end

def host_names
  @mutex.synchronize { @hosts.map(&:host) }
end

def hosts=(hosts)
  @mutex.synchronize do
    @hosts = hosts.shuffle
    @index = 0
  end
end

# Returns the next available host.
@@ -22,6 +35,8 @@ module Gitlab
# hosts were available.
def next
  @mutex.synchronize do
    break if @hosts.empty?

    started_at = @index

    loop do
# frozen_string_literal: true

require 'resolv'

module Gitlab
  module Database
    module LoadBalancing
      # Service discovery of secondary database hosts.
      #
      # Service discovery works by periodically looking up a DNS record. If the
      # DNS record returns a new list of hosts, this class will update the load
      # balancer with said hosts. Requests may continue to use the old hosts
      # until they complete.
      class ServiceDiscovery
        attr_reader :resolver, :interval, :record, :disconnect_timeout

        MAX_SLEEP_ADJUSTMENT = 10

        # nameserver - The nameserver to use for DNS lookups.
        # port - The port of the nameserver.
        # record - The DNS record to look up for retrieving the secondaries.
        # interval - The time to wait between lookups.
        # disconnect_timeout - The time after which an old host should be
        #                      forcefully disconnected.
        def initialize(nameserver:, port:, record:, interval: 60, disconnect_timeout: 120)
          @resolver = Resolv::DNS.new(nameserver_port: [[nameserver, port]])
          @interval = interval
          @record = record
          @disconnect_timeout = disconnect_timeout
        end

        def start
          Thread.new do
            loop do
              interval =
                begin
                  refresh_if_necessary
                rescue => error
                  # Any exceptions that might occur should be reported to
                  # Sentry, instead of silently terminating this thread.
                  Raven.capture_exception(error)

                  Rails.logger.error(
                    "Service discovery encountered an error: #{error.message}"
                  )

                  self.interval
                end

              # We slightly randomize the sleep() interval. This should reduce
              # the likelihood of _all_ processes refreshing at the same time,
              # possibly putting unnecessary pressure on the DNS server.
              sleep(interval + rand(MAX_SLEEP_ADJUSTMENT))
            end
          end
        end

        # Refreshes the hosts, but only if the DNS record returned a new list of
        # addresses.
        #
        # The return value is the amount of time (in seconds) to wait before
        # checking the DNS record for any changes.
        def refresh_if_necessary
          interval, from_dns = addresses_from_dns

          current = addresses_from_load_balancer

          replace_hosts(from_dns) if from_dns != current

          interval
        end

        # Replaces all the hosts in the load balancer with the new ones,
        # disconnecting the old connections.
        #
        # addresses - An Array of IP addresses to use for the new hosts.
        def replace_hosts(addresses)
          old_hosts = load_balancer.host_list.hosts

          load_balancer.host_list.hosts = addresses.map do |addr|
            Host.new(addr, load_balancer)
          end

          # We must explicitly disconnect the old connections, otherwise we may
          # leak database connections over time. For example, if a request
          # started just before we added the new hosts it will use an old
          # host/connection. While this connection will be checked in and out,
          # it won't be explicitly disconnected.
          old_hosts.each do |host|
            host.disconnect!(disconnect_timeout)
          end
        end

        # Returns an Array containing:
        #
        # 1. The time to wait for the next check.
        # 2. An array containing the IP addresses of the DNS record.
        def addresses_from_dns
          resources =
            resolver.getresources(record, Resolv::DNS::Resource::IN::A)

          # Addresses are sorted so we can directly compare the old and new
          # addresses, without having to use any additional data structures.
          addresses = resources.map { |r| r.address.to_s }.sort

          [new_wait_time_for(resources), addresses]
        end

        def new_wait_time_for(resources)
          wait = resources.first&.ttl || interval

          # The preconfigured interval acts as a minimum amount of time to
          # wait.
          wait < interval ? interval : wait
        end

        def addresses_from_load_balancer
          load_balancer.host_list.host_names.sort
        end

        def load_balancer
          LoadBalancing.proxy.load_balancer
        end
      end
    end
  end
end
@@ -23,6 +23,23 @@ describe Gitlab::Database::LoadBalancing::HostList do
end
end
describe '#host_names' do
it 'returns the host names of all hosts' do
expect(host_list.host_names).to eq(%w[localhost localhost])
end
end
describe '#hosts=' do
it 'updates the list of hosts to use' do
host_list.hosts = [
Gitlab::Database::LoadBalancing::Host.new('foo', load_balancer)
]
expect(host_list.length).to eq(1)
expect(host_list.hosts[0].host).to eq('foo')
end
end
describe '#next' do
it 'returns a host' do
expect(host_list.next)
@@ -48,5 +65,9 @@ describe Gitlab::Database::LoadBalancing::HostList do
expect(host_list.next).to be_nil
end
it 'returns nil if no hosts are available' do
expect(described_class.new.next).to be_nil
end
end
end
@@ -20,6 +20,39 @@ describe Gitlab::Database::LoadBalancing::Host, :postgresql do
end
end
describe '#disconnect!' do
it 'disconnects the pool' do
connection = double(:connection, in_use?: false)
pool = double(:pool, connections: [connection])
allow(host)
.to receive(:pool)
.and_return(pool)
expect(host)
.not_to receive(:sleep)
expect(host.pool)
.to receive(:disconnect!)
host.disconnect!
end
it 'disconnects the pool when waiting for connections takes too long' do
connection = double(:connection, in_use?: true)
pool = double(:pool, connections: [connection])
allow(host)
.to receive(:pool)
.and_return(pool)
expect(host.pool)
.to receive(:disconnect!)
host.disconnect!(1)
end
end
describe '#release_connection' do
it 'releases the current connection from the pool' do
expect(host.pool).to receive(:release_connection)
@@ -310,4 +343,10 @@ describe Gitlab::Database::LoadBalancing::Host, :postgresql do
expect(host.query_and_release('SELECT 10 AS number')).to eq({})
end
end
describe '#host' do
it 'returns the hostname' do
expect(host.host).to eq('localhost')
end
end
end
# frozen_string_literal: true
require 'spec_helper'
describe Gitlab::Database::LoadBalancing::ServiceDiscovery do
let(:service) do
described_class.new(nameserver: 'localhost', port: 8600, record: 'foo')
end
describe '#start' do
before do
allow(service)
.to receive(:loop)
.and_yield
end
it 'starts service discovery in a new thread' do
expect(service)
.to receive(:refresh_if_necessary)
.and_return(5)
expect(service)
.to receive(:rand)
.and_return(2)
expect(service)
.to receive(:sleep)
.with(7)
service.start.join
end
it 'reports exceptions to Sentry' do
error = StandardError.new
expect(service)
.to receive(:refresh_if_necessary)
.and_raise(error)
expect(Raven)
.to receive(:capture_exception)
.with(error)
expect(service)
.to receive(:rand)
.and_return(2)
expect(service)
.to receive(:sleep)
.with(62)
service.start.join
end
end
describe '#refresh_if_necessary' do
context 'when a refresh is necessary' do
before do
allow(service)
.to receive(:addresses_from_load_balancer)
.and_return(%w[localhost])
allow(service)
.to receive(:addresses_from_dns)
.and_return([10, %w[foo bar]])
end
it 'refreshes the load balancer hosts' do
expect(service)
.to receive(:replace_hosts)
.with(%w[foo bar])
expect(service.refresh_if_necessary).to eq(10)
end
end
context 'when a refresh is not necessary' do
before do
allow(service)
.to receive(:addresses_from_load_balancer)
.and_return(%w[localhost])
allow(service)
.to receive(:addresses_from_dns)
.and_return([10, %w[localhost]])
end
it 'does not refresh the load balancer hosts' do
expect(service)
.not_to receive(:replace_hosts)
expect(service.refresh_if_necessary).to eq(10)
end
end
end
describe '#replace_hosts' do
let(:load_balancer) do
Gitlab::Database::LoadBalancing::LoadBalancer.new(%w[foo])
end
before do
allow(service)
.to receive(:load_balancer)
.and_return(load_balancer)
end
it 'replaces the hosts of the load balancer' do
service.replace_hosts(%w[bar])
expect(load_balancer.host_list.host_names).to eq(%w[bar])
end
it 'disconnects the old connections' do
host = load_balancer.host_list.hosts.first
allow(service)
.to receive(:disconnect_timeout)
.and_return(2)
expect(host)
.to receive(:disconnect!)
.with(2)
service.replace_hosts(%w[bar])
end
end
describe '#addresses_from_dns' do
it 'returns a TTL and ordered list of IP addresses' do
res1 = double(:resource, address: '255.255.255.0', ttl: 90)
res2 = double(:resource, address: '127.0.0.1', ttl: 90)
allow(service.resolver)
.to receive(:getresources)
.with('foo', Resolv::DNS::Resource::IN::A)
.and_return([res1, res2])
expect(service.addresses_from_dns)
.to eq([90, %w[127.0.0.1 255.255.255.0]])
end
end
describe '#new_wait_time_for' do
it 'returns the DNS TTL if greater than the default interval' do
res = double(:resource, ttl: 90)
expect(service.new_wait_time_for([res])).to eq(90)
end
it 'returns the default interval if greater than the DNS TTL' do
res = double(:resource, ttl: 10)
expect(service.new_wait_time_for([res])).to eq(60)
end
it 'returns the default interval if no resources are given' do
expect(service.new_wait_time_for([])).to eq(60)
end
end
describe '#addresses_from_load_balancer' do
it 'returns the ordered host names of the load balancer' do
load_balancer = Gitlab::Database::LoadBalancing::LoadBalancer.new(%w[b a])
allow(service)
.to receive(:load_balancer)
.and_return(load_balancer)
expect(service.addresses_from_load_balancer).to eq(%w[a b])
end
end
end
@@ -142,6 +142,18 @@ describe Gitlab::Database::LoadBalancing do
expect(described_class.enable?).to eq(true)
end
it 'returns true when service discovery is enabled' do
allow(described_class).to receive(:hosts).and_return([])
allow(Sidekiq).to receive(:server?).and_return(false)
allow(Gitlab::Database).to receive(:postgresql?).and_return(true)
allow(described_class)
.to receive(:service_discovery_enabled?)
.and_return(true)
expect(described_class.enable?).to eq(true)
end
context 'without a license' do
before do
License.destroy_all
@@ -197,4 +209,75 @@ describe Gitlab::Database::LoadBalancing do
expect(described_class.active_record_models).to be_an_instance_of(Array)
end
end
describe '.service_discovery_enabled?' do
it 'returns true if service discovery is enabled' do
allow(described_class)
.to receive(:configuration)
.and_return('discover' => { 'record' => 'foo' })
expect(described_class.service_discovery_enabled?).to eq(true)
end
it 'returns false if service discovery is disabled' do
expect(described_class.service_discovery_enabled?).to eq(false)
end
end
describe '.service_discovery_configuration' do
context 'when no configuration is provided' do
it 'returns a default configuration Hash' do
expect(described_class.service_discovery_configuration).to eq(
nameserver: 'localhost',
port: 8600,
record: nil,
interval: 60,
disconnect_timeout: 120
)
end
end
context 'when configuration is provided' do
it 'returns a Hash including the custom configuration' do
allow(described_class)
.to receive(:configuration)
.and_return('discover' => { 'record' => 'foo' })
expect(described_class.service_discovery_configuration).to eq(
nameserver: 'localhost',
port: 8600,
record: 'foo',
interval: 60,
disconnect_timeout: 120
)
end
end
end
describe '.start_service_discovery' do
it 'does not start if service discovery is disabled' do
expect(Gitlab::Database::LoadBalancing::ServiceDiscovery)
.not_to receive(:new)
described_class.start_service_discovery
end
it 'starts service discovery if enabled' do
allow(described_class)
.to receive(:service_discovery_enabled?)
.and_return(true)
instance = double(:instance)
expect(Gitlab::Database::LoadBalancing::ServiceDiscovery)
.to receive(:new)
.with(an_instance_of(Hash))
.and_return(instance)
expect(instance)
.to receive(:start)
described_class.start_service_discovery
end
end
end