Commit 623db07d authored by Yorick Peterse

Add support for load balancing database queries

This adds support for balancing queries amongst multiple database hosts.
Web requests will stick to using the primary for a little while after a
write took place, removing the need for synchronous replication. Load
balancing is disabled for Sidekiq since using this could lead to race
conditions, and Sidekiq mostly performs writes anyway.

== Balancing

Balancing is done using a simple round-robin algorithm. The first time a
connection is needed the first host is used, then the second, third,
etc. This logic resets to the first host once reaching the end of the
hosts list.

The code added in this commit _only_ load balances queries sent from
models. The code does _not_ touch ActiveRecord::Base.connection. This
means that direct use of this method will result in the queries being
sent to the primary.

== Configuration

Configuration is done by adding a YAML section to config/database.yml.
For example:

    production:
      load_balancing:
        hosts:
          - 10.0.0.1
          - 10.0.0.2

All hosts will use the same authentication credentials.

== Sticking

When a write is performed the query is sent to the primary. Any queries
executed after this point are also sent to the primary. At the end of a
request some session details are stored for the current user. These
details are used to stick to the primary for as long as necessary (or
until the data expires).

This prevents the user from running into cases where they write data to
the primary, read from the secondary, and the data isn't available yet
(e.g. leading to an HTTP 404 error).

== Overhead

The load balancing code has minimal overhead. Instead of parsing raw SQL
queries it hooks into Rails specific methods to determine what host to
use for a query.

== Transactions

Transactions are always executed on the primary, even if they don't
perform any writes. Once a transaction completes a session will stick to
the primary. This is based on transactions almost always being used for
writes (there's little benefit to using a transaction for only reads).

== Prepared Statements

Prepared statements don't work well when queries are being distributed
amongst hosts. As a result GitLab will automatically disable prepared
statements when load balancing is enabled. Disabling prepared statements
has no impact on response timings, and may even reduce the memory usage
of PostgreSQL.

== Failovers

The load balancing code is capable of dealing with database failovers.
In the event of a secondary being unavailable the load balancer will
mark it as offline and use the next available secondary. If no
secondaries are available the primary is used instead.

Secondaries that are marked as offline are checked again automatically,
preventing a host from being marked as offline forever.

In the event of a connection error when writing to the primary the
code will suspend the caller, then retry the operation up to 3 times.
Every retry the sleep time will increase exponentially.

All of this means that in the event of a DB restart or failover some
requests may take a bit longer to complete, instead of the application
immediately returning an error.
parent f2f5cb67
---
title: Add support for load balancing database queries
merge_request:
author:
......@@ -9,7 +9,11 @@ production:
# username: git
# password:
# host: localhost
# port: 5432
# port: 5432
# load_balancing:
#   hosts:
#     - host1.example.com
#     - host2.example.com
#
# Development specific
......
if Gitlab::Database::LoadBalancing.enable?
  Gitlab::Database.disable_prepared_statements

  Gitlab::Application.configure do |config|
    config.middleware.use(Gitlab::Database::LoadBalancing::RackMiddleware)
  end

  Gitlab::Database::LoadBalancing.configure_proxy
end
......@@ -78,6 +78,7 @@
- [Container Registry](administration/container_registry.md) Configure Docker Registry with GitLab.
- [Repository restrictions](user/admin_area/settings/account_and_limit_settings.md#repository-size-limit) Define size restrictions for your repositories to limit the space they occupy in your storage device. Includes LFS objects.
- [Auditor users](administration/auditor_users.md) Create auditor users, with read-only access to the entire system.
- [Database load balancing](administration/database_load_balancing.md) Distribute database queries amongst multiple database servers.
## Contributor documentation
......
# Database Load Balancing
GitLab Enterprise Edition allows you to distribute read-only queries amongst
multiple database servers. This can be used to reduce the load on the primary
database, and increase responsiveness.
For load balancing to work you will need PostgreSQL 9.2 or newer; MySQL is not
supported. You also need to make sure that you have at least 1 secondary
in [hot standby][hot-standby] mode.
Load balancing also requires that the hosts configured in `config/database.yml`
**always** point to the primary, even after a database failover. Furthermore,
the additional hosts to balance load amongst must **always** point to secondary
databases. This means that you should put a load balancer in front of every
database, and have GitLab connect to those load balancers.
For example, say you have a primary ("db1.gitlab.com") and two secondaries,
"db2.gitlab.com" and "db3.gitlab.com". For this setup you will need to have 3
load balancers, one for every host. For example:
* primary.gitlab.com forwards to db1.gitlab.com
* secondary1.gitlab.com forwards to db2.gitlab.com
* secondary2.gitlab.com forwards to db3.gitlab.com
Now let's say that a failover happens and db2 becomes the new primary. This
means forwarding should now happen as follows:
* primary.gitlab.com forwards to db2.gitlab.com
* secondary1.gitlab.com forwards to db1.gitlab.com
* secondary2.gitlab.com forwards to db3.gitlab.com
GitLab does not take care of this for you, so you will need to do so yourself.
Finally, load balancing requires that GitLab can connect to all hosts using the
same credentials and port as configured in `config/database.yml`. Using
different ports or credentials for different hosts is not supported.
## Enabling Load Balancing
Load balancing is configured in `config/database.yml`. For the environment in
which you want to use load balancing you'll need to add the following:
```yaml
load_balancing:
  hosts:
    - host1
    - host2
    - etc
```
For example, for the "production" environment:
```yaml
production:
  username: gitlab
  database: gitlab
  encoding: unicode
  load_balancing:
    hosts:
      - host1.example.com
      - host2.example.com
```
This will balance the load between `host1.example.com` and `host2.example.com`.
## Balancing Queries
Read-only `SELECT` queries will be balanced amongst all the secondary hosts.
Everything else (including transactions) will be executed on the primary.
Queries such as `SELECT ... FOR UPDATE` are also executed on the primary.
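The round-robin selection of secondaries can be pictured with the minimal sketch below. It is an illustration only: the class name `RoundRobinList`, the method `next_host`, and the optional block used as an "is this host online?" check are hypothetical names, not GitLab's API. The real host list also shuffles the hosts on start-up, which this sketch leaves out.

```ruby
# Hypothetical, simplified round-robin host list (illustration only).
class RoundRobinList
  def initialize(hosts)
    @hosts = hosts
    @index = 0
    @mutex = Mutex.new
  end

  # Returns the next host in rotation. When a block is given it acts as an
  # availability check and hosts for which it returns false are skipped.
  # Returns nil once every host has been tried without success.
  def next_host
    @mutex.synchronize do
      @hosts.length.times do
        host = @hosts[@index]
        @index = (@index + 1) % @hosts.length

        return host if !block_given? || yield(host)
      end

      nil
    end
  end
end

list = RoundRobinList.new(%w[host1.example.com host2.example.com])
list.next_host # first call: "host1.example.com"
list.next_host # second call: "host2.example.com"
list.next_host # wraps around to "host1.example.com"
```

When `next_host` returns `nil` a caller would fall back to the primary, which matches the behaviour described under "Failover Handling".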
## Prepared Statements
Prepared statements don't work well with load balancing and are disabled
automatically when load balancing is enabled. This should have no impact on
response timings.
## Primary Sticking
After a write has been performed GitLab will stick to using the primary for a
certain period of time, scoped to the user that performed the write. GitLab will
revert to using secondaries once they have caught up, or after 30 seconds,
whichever comes first.
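The decision described above can be sketched as follows. This is an in-memory illustration under stated assumptions: `StickingStore`, `stick` and the integer "positions" are hypothetical stand-ins. GitLab itself stores the primary's write location in Redis and asks PostgreSQL whether each secondary has replayed up to that location.

```ruby
# Hypothetical in-memory model of primary sticking (illustration only).
class StickingStore
  EXPIRATION = 30 # seconds, mirroring the limit mentioned above

  # clock - A callable returning the current time in seconds (injectable so
  #         the behaviour can be exercised without waiting).
  def initialize(clock: -> { Time.now.to_f })
    @clock = clock
    @locations = {}
  end

  # Records the write position of the primary for the given user.
  def stick(user_id, location)
    @locations[user_id] = [location, @clock.call + EXPIRATION]
  end

  # Returns true if the user should keep reading from the primary.
  #
  # replica_positions - The replay positions of all secondaries.
  def use_primary?(user_id, replica_positions)
    location, expires_at = @locations[user_id]
    return false unless location

    caught_up = replica_positions.all? { |position| position >= location }

    if caught_up || @clock.call >= expires_at
      @locations.delete(user_id)
      false
    else
      true
    end
  end
end
```

Scoping the stored position to a user means that one user's write does not force other users onto the primary.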
## Failover Handling
In the event of a failover or an unresponsive database, the load balancer will
try to use the next available host. If no secondaries are available the
operation is performed on the primary instead.
In the event of a connection error being produced when writing data, the
operation will be retried up to 3 times using an exponential back-off.
When using load balancing you should be able to safely restart a database server
without it immediately leading to errors being presented to the users.
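As an illustration of retrying with a back-off, consider the sketch below. The helper `with_backoff`, its keyword arguments and the use of `IOError` as the retryable error are assumptions made for this example, not the method GitLab uses. This sketch doubles the delay between attempts, whereas the method added in this commit squares it (2, 4, then 16 seconds).

```ruby
# Hypothetical retry helper (illustration only).
#
# attempts   - The total number of times the block may be run.
# base_delay - The delay in seconds before the second attempt.
# retry_on   - The error classes that should trigger another attempt.
# sleeper    - A callable used for waiting (injectable for testing).
def with_backoff(attempts: 3, base_delay: 2, retry_on: [IOError],
                 sleeper: ->(seconds) { sleep(seconds) })
  delay = base_delay
  tries = 0

  begin
    tries += 1
    yield
  rescue *retry_on
    # Give up and re-raise once all attempts have been used.
    raise if tries >= attempts

    sleeper.call(delay)
    delay *= 2
    retry
  end
end
```

Errors not listed in `retry_on` are raised immediately, so only connection-style failures delay the request.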
## Logging
The load balancer logs various messages, such as:
* When a host is marked as offline
* When a host comes back online
* When all secondaries are offline
Each log message contains the tag `[DB-LB]` to make searching/filtering of such
log entries easier.
[hot-standby]: https://www.postgresql.org/docs/9.6/static/hot-standby.html
......@@ -103,6 +103,14 @@ module Gitlab
      ActiveRecord::ConnectionAdapters::ConnectionPool.new(spec)
    end

    # Disables prepared statements for the current database connection.
    def self.disable_prepared_statements
      config = ActiveRecord::Base.configurations[Rails.env]
      config['prepared_statements'] = false

      ActiveRecord::Base.establish_connection(config)
    end

    def self.connection
      ActiveRecord::Base.connection
    end
......
module Gitlab
module Database
module LoadBalancing
# The connection proxy to use for load balancing (if enabled).
cattr_accessor :proxy
LOG_TAG = 'DB-LB'.freeze
# The exceptions raised for connection errors.
CONNECTION_ERRORS = if defined?(PG)
[
PG::ConnectionBad,
PG::ConnectionDoesNotExist,
PG::ConnectionException,
PG::ConnectionFailure,
PG::UnableToSend,
# During a failover this error may be raised when
# writing to a primary.
PG::ReadOnlySqlTransaction
].freeze
else
[].freeze
end
# Returns the additional hosts to use for load balancing.
def self.hosts
hash = ActiveRecord::Base.configurations[Rails.env]['load_balancing']
if hash
hash['hosts'] || []
else
[]
end
end
def self.log(level, message)
Rails.logger.tagged(LOG_TAG) do
Rails.logger.send(level, message)
end
end
def self.pool_size
ActiveRecord::Base.configurations[Rails.env]['pool']
end
# Returns true if load balancing is to be enabled.
def self.enable?
program_name != 'rake' && !hosts.empty? && !Sidekiq.server? &&
Database.postgresql?
end
def self.program_name
File.basename($0)
end
# Configures proxying of requests.
def self.configure_proxy
self.proxy = ConnectionProxy.new(hosts)
# ActiveRecordProxy's methods are made available as class methods in
# ActiveRecord::Base, while still allowing the use of `super`.
ActiveRecord::Base.singleton_class.prepend(ActiveRecordProxy)
# The above will only patch newly defined models, so we also need to
# patch existing ones.
active_record_models.each do |model|
model.singleton_class.prepend(ModelProxy)
end
end
def self.active_record_models
ActiveRecord::Base.descendants
end
end
end
end
module Gitlab
  module Database
    module LoadBalancing
      # Module injected into ActiveRecord::Base to allow proxying of subclasses.
      module ActiveRecordProxy
        def inherited(by)
          super(by)

          # The methods in ModelProxy will become available as class methods for
          # the class defined in `by`.
          by.singleton_class.prepend(ModelProxy)
        end
      end
    end
  end
end
module Gitlab
module Database
module LoadBalancing
# Redirecting of ActiveRecord connections.
#
# The ConnectionProxy class redirects ActiveRecord connection requests to
# the right load balancer pool, depending on the type of query.
class ConnectionProxy
attr_reader :load_balancer
# These methods perform writes after which we need to stick to the
# primary.
STICKY_WRITES = %i(
delete
delete_all
insert
transaction
update
update_all
).freeze
# hosts - The hosts to use for load balancing.
def initialize(hosts = [])
@load_balancer = LoadBalancer.new(hosts)
end
def select(*args)
read_using_load_balancer(:select, *args)
end
def select_all(arel, name = nil, binds = [])
if arel.respond_to?(:locked) && arel.locked
# SELECT ... FOR UPDATE queries should be sent to the primary.
write_using_load_balancer(:select_all, arel, name, binds,
sticky: true)
else
read_using_load_balancer(:select_all, arel, name, binds)
end
end
STICKY_WRITES.each do |name|
define_method(name) do |*args, &block|
write_using_load_balancer(name, *args, sticky: true, &block)
end
end
# Delegates all unknown messages to a read-write connection.
def method_missing(name, *args, &block)
write_using_load_balancer(name, *args, &block)
end
# Performs a read using the load balancer.
#
# name - The name of the method to call on a connection object.
def read_using_load_balancer(name, *args, &block)
method = Session.current.use_primary? ? :read_write : :read
@load_balancer.send(method) do |connection|
connection.send(name, *args, &block)
end
end
# Performs a write using the load balancer.
#
# name - The name of the method to call on a connection object.
# sticky - If set to true the session will stick to the primary after
# the write.
def write_using_load_balancer(name, *args, sticky: false, &block)
result = @load_balancer.read_write do |connection|
# Sticking has to be enabled before calling the method. Not doing so
# could lead to methods called in a block still being performed on a
# secondary instead of on a primary (when necessary).
Session.current.use_primary! if sticky
connection.send(name, *args, &block)
end
result
end
end
end
end
end
module Gitlab
module Database
module LoadBalancing
# A single database host used for load balancing.
class Host
attr_reader :pool
delegate :connection, :release_connection, to: :pool
# host - The address of the database.
def initialize(host)
@host = host
@pool = Database.create_connection_pool(LoadBalancing.pool_size, host)
@online = true
end
def offline!
LoadBalancing.log(:warn, "Marking host #{@host} as offline")
@online = false
@pool.disconnect!
end
# Returns true if the host is online.
def online?
return true if @online
begin
retried = 0
@online = begin
connection.active?
rescue
if retried < 3
release_connection
retried += 1
retry
else
false
end
end
LoadBalancing.log(:info, "Host #{@host} came back online") if @online
@online
ensure
release_connection
end
end
# Returns true if this host has caught up to the given transaction
# write location.
#
# location - The transaction write location as reported by a primary.
def caught_up?(location)
string = connection.quote(location)
# In case the host is a primary pg_last_xlog_replay_location() returns
# NULL. The recovery check ensures we treat the host as up-to-date in
# such a case.
query = "SELECT NOT pg_is_in_recovery() OR " \
"pg_xlog_location_diff(pg_last_xlog_replay_location(), #{string}) >= 0 AS result"
row = connection.select_all(query).first
row && row['result'] == 't'
ensure
release_connection
end
end
end
end
end
module Gitlab
module Database
module LoadBalancing
# A list of database hosts to use for connections.
class HostList
attr_reader :hosts
# hosts - The list of secondary hosts to add.
def initialize(hosts = [])
@hosts = hosts.shuffle
@index = 0
@mutex = Mutex.new
end
def length
@hosts.length
end
# Returns the next available host.
#
# Returns a Gitlab::Database::LoadBalancing::Host instance, or nil if no
# hosts were available.
def next
@mutex.synchronize do
started_at = @index
loop do
host = @hosts[@index]
@index = (@index + 1) % @hosts.length
return host if host.online?
# Return nil once we have cycled through all hosts and none were
# available.
return if @index == started_at
end
end
end
end
end
end
end
module Gitlab
module Database
module LoadBalancing
# Load balancing for ActiveRecord connections.
#
# Each host in the load balancer uses the same credentials as the primary
# database.
#
# This class *requires* that `ActiveRecord::Base.connection` always
# returns a connection to the primary.
class LoadBalancer
CACHE_KEY = :gitlab_load_balancer_host
attr_reader :host_list
# hosts - The hostnames/addresses of the additional databases.
def initialize(hosts = [])
@host_list = HostList.new(hosts.map { |addr| Host.new(addr) })
end
# Yields a connection that can be used for reads.
#
# If no secondaries were available this method will use the primary
# instead.
def read(&block)
conflict_retried = 0
while host
begin
return yield host.connection
rescue => error
if serialization_failure?(error)
# This error can occur when a query conflicts. See
# https://www.postgresql.org/docs/current/static/hot-standby.html#HOT-STANDBY-CONFLICT
# for more information.
#
# In this event we'll cycle through the secondaries at most 3
# times before using the primary instead.
if conflict_retried < @host_list.length * 3
conflict_retried += 1
release_host
else
break
end
elsif connection_error?(error)
host.offline!
release_host
else
raise error
end
end
end
LoadBalancing.
log(:warn, 'No secondaries were available, using primary instead')
read_write(&block)
end
# Yields a connection that can be used for both reads and writes.
def read_write
# In the event of a failover the primary may be briefly unavailable.
# Instead of immediately grinding to a halt we'll retry the operation
# a few times.
retry_with_backoff do
yield ActiveRecord::Base.connection
end
end
# Returns a host to use for queries.
#
# Hosts are scoped per thread so that multiple threads don't
# accidentally re-use the same host + connection.
def host
RequestStore[CACHE_KEY] ||= @host_list.next
end
# Releases the host and connection for the current thread.
def release_host
RequestStore[CACHE_KEY]&.release_connection
RequestStore.delete(CACHE_KEY)
end
def release_primary_connection
ActiveRecord::Base.connection_pool.release_connection
end
# Returns the transaction write location of the primary.
def primary_write_location
read_write do |connection|
row = connection.
select_all('SELECT pg_current_xlog_insert_location()::text AS location').
first
if row
row['location']
else
raise 'Failed to determine the write location of the primary database'
end
end
end
# Returns true if all hosts have caught up to the given transaction
# write location.
def all_caught_up?(location)
@host_list.hosts.all? { |host| host.caught_up?(location) }
end
# Yields a block, retrying it upon error using an exponential backoff.
def retry_with_backoff(retries = 3, time = 2)
retried = 0
last_error = nil
while retried < retries
begin
return yield
rescue => error
raise error unless connection_error?(error)
# We need to release the primary connection as otherwise Rails
# will keep raising errors when using the connection.
release_primary_connection
last_error = error
sleep(time)
retried += 1
time **= 2
end
end
raise last_error
end
def connection_error?(error)
case error
when ActiveRecord::StatementInvalid, ActionView::Template::Error
# After connecting to the DB Rails will wrap query errors using this
# class.
connection_error?(error.original_exception)
when *CONNECTION_ERRORS
true
else
# When PG tries to set the client encoding but fails due to a
# connection error it will raise a PG::Error instance. Catching that
# would catch all errors (even those we don't want), so instead we
# check for the message of the error.
error.message.start_with?('invalid encoding name:')
end
end
def serialization_failure?(error)
if error.respond_to?(:original_exception)
serialization_failure?(error.original_exception)
else
error.is_a?(PG::TRSerializationFailure)
end
end
end
end
end
end
module Gitlab
  module Database
    module LoadBalancing
      # Module injected into models in order to redirect connections to a
      # ConnectionProxy.
      module ModelProxy
        def connection
          LoadBalancing.proxy
        end
      end
    end
  end
end
module Gitlab
module Database
module LoadBalancing
# Rack middleware for managing load balancing.
class RackMiddleware
SESSION_KEY = :gitlab_load_balancer
# The number of seconds after which a session should stop reading from
# the primary.
EXPIRATION = 30
def initialize(app)
@app = app
end
def call(env)
# Ensure that any state that may have been set before the first request
# doesn't linger around.
clear
user = user_for_request(env)
check_primary_requirement(user) if user
result = @app.call(env)
assign_primary_for_user(user) if Session.current.use_primary? && user
result
ensure
clear
end
# Checks if we need to use the primary for the current user.
def check_primary_requirement(user)
location = last_write_location_for(user)
return unless location
if load_balancer.all_caught_up?(location)
delete_write_location_for(user)
else
Session.current.use_primary!
end
end
def assign_primary_for_user(user)
set_write_location_for(user, load_balancer.primary_write_location)
end
def clear
load_balancer.release_host
Session.clear_session
end
def load_balancer
LoadBalancing.proxy.load_balancer
end
# Returns the User object for the currently authenticated user, if any.
def user_for_request(env)
api = env['api.endpoint']
warden = env['warden']
if api && api.respond_to?(:current_user)
# The current request is an API request. In this case we can use our
# `current_user` helper method.
api.current_user
elsif warden && warden.user
# Used by the Rails app, and sometimes by the API.
warden.user
else
nil
end
end
def last_write_location_for(user)
Gitlab::Redis.with do |redis|
redis.get(redis_key_for(user))
end
end
def delete_write_location_for(user)
Gitlab::Redis.with do |redis|
redis.del(redis_key_for(user))
end
end
def set_write_location_for(user, location)
Gitlab::Redis.with do |redis|
redis.set(redis_key_for(user), location, ex: EXPIRATION)
end
end
def redis_key_for(user)
"database-load-balancing/write-location/#{user.id}"
end
end
end
end
end
module Gitlab
  module Database
    module LoadBalancing
      # Tracking of load balancing state per user session.
      #
      # A session starts at the beginning of a request and ends once the request
      # has been completed. Sessions can be used to keep track of what hosts
      # should be used for queries.
      class Session
        CACHE_KEY = :gitlab_load_balancer_session

        def self.current
          RequestStore[CACHE_KEY] ||= new
        end

        def self.clear_session
          RequestStore.delete(CACHE_KEY)
        end

        def initialize
          @use_primary = false
        end

        def use_primary?
          @use_primary
        end

        def use_primary!
          @use_primary = true
        end
      end
    end
  end
end
require 'spec_helper'
describe Gitlab::Database::LoadBalancing::ActiveRecordProxy do
describe '#inherited' do
it 'adds the ModelProxy module to the singleton class' do
base = Class.new do
include Gitlab::Database::LoadBalancing::ActiveRecordProxy
end
model = Class.new(base)
expect(model.included_modules).to include(described_class)
end
end
end
require 'spec_helper'
describe Gitlab::Database::LoadBalancing::ConnectionProxy do
let(:proxy) { described_class.new }
describe '#select' do
it 'performs a read' do
expect(proxy).to receive(:read_using_load_balancer).with(:select, 'foo')
proxy.select('foo')
end
end
describe '#select_all' do
describe 'using a SELECT query' do
it 'runs the query on a secondary' do
arel = double(:arel)
expect(proxy).to receive(:read_using_load_balancer).
with(:select_all, arel, 'foo', [])
proxy.select_all(arel, 'foo')
end
end
describe 'using a SELECT FOR UPDATE query' do
it 'runs the query on the primary and sticks to it' do
arel = double(:arel, locked: true)
expect(proxy).to receive(:write_using_load_balancer).
with(:select_all, arel, 'foo', [], sticky: true)
proxy.select_all(arel, 'foo')
end
end
end
Gitlab::Database::LoadBalancing::ConnectionProxy::STICKY_WRITES.each do |name|
describe "#{name}" do
it 'runs the query on the primary and sticks to it' do
expect(proxy).to receive(:write_using_load_balancer).
with(name, 'foo', sticky: true)
proxy.send(name, 'foo')
end
end
end
# We have an extra test for #transaction here to make sure that nested queries
# are also sent to a primary.
describe '#transaction' do
after do
Gitlab::Database::LoadBalancing::Session.clear_session
end
it 'runs the transaction and any nested queries on the primary' do
primary = double(:connection)
allow(primary).to receive(:transaction).and_yield
allow(primary).to receive(:select)
expect(proxy.load_balancer).to receive(:read_write).
twice.and_yield(primary)
# This expectation is put in place to ensure no read is performed.
expect(proxy.load_balancer).not_to receive(:read)
proxy.transaction { proxy.select('true') }
expect(Gitlab::Database::LoadBalancing::Session.current.use_primary?).
to eq(true)
end
end
describe '#method_missing' do
it 'runs the query on the primary without sticking to it' do
expect(proxy).to receive(:write_using_load_balancer).
with(:foo, 'foo')
proxy.foo('foo')
end
end
describe '#read_using_load_balancer' do
let(:session) { double(:session) }
let(:connection) { double(:connection) }
before do
allow(Gitlab::Database::LoadBalancing::Session).to receive(:current).
and_return(session)
end
describe 'with a regular session' do
it 'uses a secondary' do
allow(session).to receive(:use_primary?).and_return(false)
expect(connection).to receive(:foo).with('foo')
expect(proxy.load_balancer).to receive(:read).and_yield(connection)
proxy.read_using_load_balancer(:foo, 'foo')
end
end
describe 'with a session using the primary' do
it 'uses the primary' do
allow(session).to receive(:use_primary?).and_return(true)
expect(connection).to receive(:foo).with('foo')
expect(proxy.load_balancer).to receive(:read_write).
and_yield(connection)
proxy.read_using_load_balancer(:foo, 'foo')
end
end
end
describe '#write_using_load_balancer' do
let(:session) { double(:session) }
let(:connection) { double(:connection) }
before do
allow(Gitlab::Database::LoadBalancing::Session).to receive(:current).
and_return(session)
end
it 'uses the primary' do
expect(proxy.load_balancer).to receive(:read_write).and_yield(connection)
expect(connection).to receive(:foo).with('foo')
expect(session).not_to receive(:use_primary!)
proxy.write_using_load_balancer(:foo, 'foo')
end
it 'sticks to the primary when sticking is enabled' do
expect(proxy.load_balancer).to receive(:read_write).and_yield(connection)
expect(connection).to receive(:foo).with('foo')
expect(session).to receive(:use_primary!)
proxy.write_using_load_balancer(:foo, 'foo', sticky: true)
end
end
end
require 'spec_helper'
describe Gitlab::Database::LoadBalancing::HostList do
before do
allow(Gitlab::Database).to receive(:create_connection_pool).
and_return(ActiveRecord::Base.connection_pool)
end
let(:host_list) do
hosts = Array.new(2) do
Gitlab::Database::LoadBalancing::Host.new('localhost')
end
described_class.new(hosts)
end
describe '#length' do
it 'returns the number of hosts in the list' do
expect(host_list.length).to eq(2)
end
end
describe '#next' do
it 'returns a host' do
expect(host_list.next).
to be_an_instance_of(Gitlab::Database::LoadBalancing::Host)
end
it 'cycles through all available hosts' do
expect(host_list.next).to eq(host_list.hosts[0])
expect(host_list.next).to eq(host_list.hosts[1])
expect(host_list.next).to eq(host_list.hosts[0])
end
it 'skips hosts that are offline' do
allow(host_list.hosts[0]).to receive(:online?).and_return(false)
expect(host_list.next).to eq(host_list.hosts[1])
end
it 'returns nil if no hosts are online' do
host_list.hosts.each do |host|
allow(host).to receive(:online?).and_return(false)
end
expect(host_list.next).to be_nil
end
end
end
require 'spec_helper'
describe Gitlab::Database::LoadBalancing::Host do
let(:host) { described_class.new('localhost') }
before do
allow(Gitlab::Database).to receive(:create_connection_pool).
and_return(ActiveRecord::Base.connection_pool)
end
describe '#connection' do
it 'returns a connection from the pool' do
expect(host.pool).to receive(:connection)
host.connection
end
end
describe '#release_connection' do
it 'releases the current connection from the pool' do
expect(host.pool).to receive(:release_connection)
host.release_connection
end
end
describe '#offline!' do
it 'marks the host as offline' do
expect(host.pool).to receive(:disconnect!)
host.offline!
end
end
describe '#online?' do
let(:error) { Class.new(RuntimeError) }
before do
allow(host.pool).to receive(:disconnect!)
end
it 'returns true when the host is online' do
expect(host).not_to receive(:connection)
expect(host).not_to receive(:release_connection)
expect(host.online?).to eq(true)
end
it 'returns true when the host was marked as offline but is online again' do
connection = double(:connection, active?: true)
allow(host).to receive(:connection).and_return(connection)
host.offline!
expect(host).to receive(:release_connection)
expect(host.online?).to eq(true)
end
it 'returns false when the host is offline' do
connection = double(:connection, active?: false)
allow(host).to receive(:connection).and_return(connection)
expect(host).to receive(:release_connection)
host.offline!
expect(host.online?).to eq(false)
end
it 'returns false when a connection could not be established' do
expect(host).to receive(:connection).exactly(4).times.and_raise(error)
expect(host).to receive(:release_connection).exactly(4).times
host.offline!
expect(host.online?).to eq(false)
end
it 'retries when a connection error is thrown' do
connection = double(:connection, active?: true)
raised = false
allow(host).to receive(:connection) do
unless raised
raised = true
raise error.new
end
connection
end
expect(host).to receive(:release_connection).twice
host.offline!
expect(host.online?).to eq(true)
end
end
describe '#caught_up?' do
let(:connection) { double(:connection) }
before do
allow(connection).to receive(:quote).and_return('foo')
end
it 'returns true when a host has caught up' do
allow(host).to receive(:connection).and_return(connection)
expect(connection).to receive(:select_all).and_return([{ 'result' => 't' }])
expect(host.caught_up?('foo')).to eq(true)
end
it 'returns false when a host has not caught up' do
allow(host).to receive(:connection).and_return(connection)
expect(connection).to receive(:select_all).and_return([{ 'result' => 'f' }])
expect(host.caught_up?('foo')).to eq(false)
end
end
end
require 'spec_helper'
describe Gitlab::Database::LoadBalancing::LoadBalancer do
let(:lb) { described_class.new(%w(localhost localhost)) }
before do
allow(Gitlab::Database).to receive(:create_connection_pool).
and_return(ActiveRecord::Base.connection_pool)
end
after do
RequestStore.delete(described_class::CACHE_KEY)
end
describe '#read' do
let(:conflict_error) { Class.new(RuntimeError) }
before do
stub_const(
'Gitlab::Database::LoadBalancing::LoadBalancer::PG::TRSerializationFailure',
conflict_error
)
end
it 'yields a connection for a read' do
connection = double(:connection)
host = double(:host)
allow(lb).to receive(:host).and_return(host)
expect(host).to receive(:connection).and_return(connection)
expect { |b| lb.read(&b) }.to yield_with_args(connection)
end
it 'marks hosts that are offline' do
allow(lb).to receive(:connection_error?).and_return(true)
expect(lb.host_list.hosts[0]).to receive(:offline!)
expect(lb).to receive(:release_host)
raised = false
returned = lb.read do
unless raised
raised = true
raise
end
10
end
expect(returned).to eq(10)
end
it 'retries a query in the event of a serialization failure' do
raised = false
expect(lb).to receive(:release_host)
returned = lb.read do
unless raised
raised = true
raise conflict_error.new
end
10
end
expect(returned).to eq(10)
end
it 'retries every host at most 3 times when a query conflict is raised' do
expect(lb).to receive(:release_host).exactly(6).times
expect(lb).to receive(:read_write)
lb.read { raise conflict_error.new }
end
it 'uses the primary if no secondaries are available' do
allow(lb).to receive(:connection_error?).and_return(true)
lb.host_list.hosts.each do |host|
expect(host).to receive(:online?).and_return(false)
end
expect(lb).to receive(:read_write).and_call_original
expect { |b| lb.read(&b) }.
to yield_with_args(ActiveRecord::Base.connection)
end
end
describe '#read_write' do
it 'yields a connection for a write' do
expect { |b| lb.read_write(&b) }.
to yield_with_args(ActiveRecord::Base.connection)
end
it 'uses a retry with exponential backoffs' do
expect(lb).to receive(:retry_with_backoff).and_yield
lb.read_write { 10 }
end
end
describe '#host' do
it 'returns the secondary host to use' do
expect(lb.host).to be_an_instance_of(Gitlab::Database::LoadBalancing::Host)
end
it 'stores the host in a thread-local variable' do
RequestStore.delete(described_class::CACHE_KEY)
expect(lb.host_list).to receive(:next).once.and_call_original
lb.host
lb.host
end
end
describe '#release_host' do
it 'releases the host and its connection' do
lb.host
lb.release_host
expect(RequestStore[described_class::CACHE_KEY]).to be_nil
end
end
describe '#release_primary_connection' do
it 'releases the connection to the primary' do
expect(ActiveRecord::Base.connection_pool).to receive(:release_connection)
lb.release_primary_connection
end
end
describe '#primary_write_location' do
if Gitlab::Database.postgresql?
it 'returns a String' do
expect(lb.primary_write_location).to be_an_instance_of(String)
end
end
it 'raises an error if the write location could not be retrieved' do
connection = double(:connection)
allow(lb).to receive(:read_write).and_yield(connection)
allow(connection).to receive(:select_all).and_return([])
expect { lb.primary_write_location }.to raise_error(RuntimeError)
end
end
describe '#all_caught_up?' do
it 'returns true if all hosts caught up to the write location' do
lb.host_list.hosts.each do |host|
expect(host).to receive(:caught_up?).with('foo').and_return(true)
end
expect(lb.all_caught_up?('foo')).to eq(true)
end
it 'returns false if a host has not yet caught up' do
expect(lb.host_list.hosts[0]).to receive(:caught_up?).
with('foo').
and_return(true)
expect(lb.host_list.hosts[1]).to receive(:caught_up?).
with('foo').
and_return(false)
expect(lb.all_caught_up?('foo')).to eq(false)
end
end
describe '#retry_with_backoff' do
it 'returns the value returned by the block' do
value = lb.retry_with_backoff { 10 }
expect(value).to eq(10)
end
it 're-raises errors not related to database connections' do
expect(lb).not_to receive(:sleep) # to make sure we're not retrying
expect { lb.retry_with_backoff { raise 'boop' } }.
to raise_error(RuntimeError)
end
it 'retries the block when a connection error is raised' do
allow(lb).to receive(:connection_error?).and_return(true)
expect(lb).to receive(:sleep).with(2)
expect(lb).to receive(:release_primary_connection)
raised = false
returned = lb.retry_with_backoff do
unless raised
raised = true
raise
end
10
end
expect(returned).to eq(10)
end
it 're-raises the connection error if the retries did not succeed' do
allow(lb).to receive(:connection_error?).and_return(true)
expect(lb).to receive(:sleep).with(2).ordered
expect(lb).to receive(:sleep).with(4).ordered
expect(lb).to receive(:sleep).with(16).ordered
expect(lb).to receive(:release_primary_connection).exactly(3).times
expect { lb.retry_with_backoff { raise } }.to raise_error(RuntimeError)
end
end
describe '#connection_error?' do
before do
stub_const('Gitlab::Database::LoadBalancing::LoadBalancer::CONNECTION_ERRORS',
[NotImplementedError])
end
it 'returns true for a connection error' do
error = NotImplementedError.new
expect(lb.connection_error?(error)).to eq(true)
end
it 'returns true for a wrapped connection error' do
original = NotImplementedError.new
wrapped = ActiveRecord::StatementInvalid.new('boop', original)
expect(lb.connection_error?(wrapped)).to eq(true)
end
it 'returns true for a wrapped connection error from a view' do
original = NotImplementedError.new
wrapped = ActionView::Template::Error.new('boop', original)
expect(lb.connection_error?(wrapped)).to eq(true)
end
it 'returns true for deeply wrapped/nested errors' do
original = NotImplementedError.new
middle = ActiveRecord::StatementInvalid.new('boop', original)
top = ActionView::Template::Error.new('boop', middle)
expect(lb.connection_error?(top)).to eq(true)
end
it 'returns true for an invalid encoding error' do
error = RuntimeError.new('invalid encoding name: unicode')
expect(lb.connection_error?(error)).to eq(true)
end
it 'returns false for errors not related to database connections' do
error = RuntimeError.new
expect(lb.connection_error?(error)).to eq(false)
end
end
describe '#serialization_failure?' do
let(:conflict_error) { Class.new(RuntimeError) }
before do
stub_const(
'Gitlab::Database::LoadBalancing::LoadBalancer::PG::TRSerializationFailure',
conflict_error
)
end
it 'returns true for a serialization error' do
expect(lb.serialization_failure?(conflict_error.new)).to eq(true)
end
it 'returns true for a wrapped error' do
wrapped = ActionView::Template::Error.new('boop', conflict_error.new)
expect(lb.serialization_failure?(wrapped)).to eq(true)
end
end
end
require 'spec_helper'
describe Gitlab::Database::LoadBalancing::ModelProxy do
describe '#connection' do
it 'returns a connection proxy' do
dummy = Class.new do
include Gitlab::Database::LoadBalancing::ModelProxy
end
proxy = double(:proxy)
expect(Gitlab::Database::LoadBalancing).to receive(:proxy).
and_return(proxy)
expect(dummy.new.connection).to eq(proxy)
end
end
end
require 'spec_helper'
describe Gitlab::Database::LoadBalancing::RackMiddleware, :redis do
let(:app) { double(:app) }
let(:middleware) { described_class.new(app) }
after do
Gitlab::Database::LoadBalancing::Session.clear_session
end
describe '#call' do
let(:lb) { double(:lb) }
let(:user) { double(:user, id: 42) }
before do
expect(app).to receive(:call).with(an_instance_of(Hash))
allow(middleware).to receive(:load_balancer).and_return(lb)
expect(middleware).to receive(:clear).twice
end
context 'when the primary was used' do
it 'assigns the user to the primary' do
allow(middleware).to receive(:user_for_request).and_return(user)
allow(middleware).to receive(:last_write_location_for).
with(user).
and_return('123')
allow(lb).to receive(:all_caught_up?).with('123').and_return(false)
expect(middleware).to receive(:assign_primary_for_user).with(user)
middleware.call({})
end
end
context 'when the primary was not used' do
it 'does not assign the user to the primary' do
allow(middleware).to receive(:user_for_request).and_return(user)
allow(middleware).to receive(:last_write_location_for).
with(user).
and_return('123')
allow(lb).to receive(:all_caught_up?).with('123').and_return(true)
expect(middleware).not_to receive(:assign_primary_for_user)
middleware.call({})
end
end
end
describe '#check_primary_requirement' do
let(:lb) { double(:lb) }
let(:user) { double(:user, id: 42) }
before do
allow(middleware).to receive(:load_balancer).and_return(lb)
end
it 'marks the primary as the host to use when necessary' do
expect(middleware).to receive(:last_write_location_for).
with(user).
and_return('foo')
expect(lb).to receive(:all_caught_up?).with('foo').and_return(false)
expect(Gitlab::Database::LoadBalancing::Session.current).
to receive(:use_primary!)
middleware.check_primary_requirement(user)
end
it 'does not use the primary when there is no cached write location' do
expect(middleware).to receive(:last_write_location_for).
with(user).
and_return(nil)
expect(lb).not_to receive(:all_caught_up?)
expect(Gitlab::Database::LoadBalancing::Session.current).
not_to receive(:use_primary!)
middleware.check_primary_requirement(user)
end
it 'does not use the primary when all hosts have caught up' do
expect(middleware).to receive(:last_write_location_for).
with(user).
and_return('foo')
expect(lb).to receive(:all_caught_up?).with('foo').and_return(true)
expect(middleware).to receive(:delete_write_location_for).with(user)
middleware.check_primary_requirement(user)
end
end
describe '#assign_primary_for_user' do
it 'stores primary instance details for the current user' do
user = double(:user, id: 42)
lb = double(:load_balancer, primary_write_location: '123')
allow(middleware).to receive(:load_balancer).and_return(lb)
expect(middleware).to receive(:set_write_location_for).with(user, '123')
middleware.assign_primary_for_user(user)
end
end
describe '#clear' do
it 'clears the currently used host and session' do
proxy = double(:proxy)
lb = double(:lb)
allow(Gitlab::Database::LoadBalancing).to receive(:proxy).and_return(proxy)
allow(proxy).to receive(:load_balancer).and_return(lb)
expect(lb).to receive(:release_host)
middleware.clear
thread_key = Gitlab::Database::LoadBalancing::Session::CACHE_KEY
expect(RequestStore[thread_key]).to be_nil
end
end
describe '#load_balancer' do
it 'returns the load balancer' do
proxy = double(:proxy)
allow(Gitlab::Database::LoadBalancing).to receive(:proxy).and_return(proxy)
expect(proxy).to receive(:load_balancer)
middleware.load_balancer
end
end
describe '#user_for_request' do
let(:user) { double(:user, id: 42) }
it 'returns the current user for a Grape request' do
env = { 'api.endpoint' => double(:api, current_user: user) }
expect(middleware.user_for_request(env)).to eq(user)
end
it 'returns the current user for a Rails request' do
env = { 'warden' => double(:warden, user: user) }
expect(middleware.user_for_request(env)).to eq(user)
end
it 'returns nil if no user could be found' do
expect(middleware.user_for_request({})).to be_nil
end
end
describe '#last_write_location_for' do
it 'returns the last WAL write location for a user' do
user = double(:user, id: 42)
middleware.set_write_location_for(user, '123')
expect(middleware.last_write_location_for(user)).to eq('123')
end
end
describe '#delete_write_location_for' do
it 'removes the WAL write location from Redis' do
user = double(:user, id: 42)
middleware.set_write_location_for(user, '123')
middleware.delete_write_location_for(user)
expect(middleware.last_write_location_for(user)).to be_nil
end
end
describe '#set_write_location_for' do
it 'stores the WAL write location in Redis' do
user = double(:user, id: 42)
middleware.set_write_location_for(user, '123')
expect(middleware.last_write_location_for(user)).to eq('123')
end
end
describe '#redis_key_for' do
it 'returns a String' do
user = double(:user, id: 42)
expect(middleware.redis_key_for(user)).
to eq('database-load-balancing/write-location/42')
end
end
end
require 'spec_helper'
describe Gitlab::Database::LoadBalancing::Session do
after do
described_class.clear_session
end
describe '.current' do
it 'returns the current session' do
expect(described_class.current).to be_an_instance_of(described_class)
end
end
describe '.clear_session' do
it 'clears the current session' do
described_class.current
described_class.clear_session
expect(RequestStore[described_class::CACHE_KEY]).to be_nil
end
end
describe '#use_primary?' do
it 'returns true when the primary should be used' do
instance = described_class.new
instance.use_primary!
expect(instance.use_primary?).to eq(true)
end
it 'returns false when a secondary should be used' do
expect(described_class.new.use_primary?).to eq(false)
end
end
end
require 'spec_helper'
describe Gitlab::Database::LoadBalancing do
describe '.log' do
it 'logs a message' do
expect(Rails.logger).to receive(:info).with('boop')
described_class.log(:info, 'boop')
end
end
describe '.hosts' do
it 'returns a list of hosts' do
allow(ActiveRecord::Base.configurations[Rails.env]).to receive(:[]).
with('load_balancing').
and_return({ 'hosts' => %w(foo bar baz) })
expect(described_class.hosts).to eq(%w(foo bar baz))
end
end
describe '.pool_size' do
it 'returns an Integer' do
expect(described_class.pool_size).to be_a_kind_of(Integer)
end
end
describe '.enable?' do
it 'returns false when no hosts are specified' do
allow(described_class).to receive(:hosts).and_return([])
expect(described_class.enable?).to eq(false)
end
it 'returns false when Sidekiq is being used' do
allow(described_class).to receive(:hosts).and_return(%w(foo))
allow(Sidekiq).to receive(:server?).and_return(true)
expect(described_class.enable?).to eq(false)
end
it 'returns false when a database other than PostgreSQL is being used' do
allow(described_class).to receive(:hosts).and_return(%w(foo))
allow(Sidekiq).to receive(:server?).and_return(false)
allow(Gitlab::Database).to receive(:postgresql?).and_return(false)
expect(described_class.enable?).to eq(false)
end
it 'returns false when running inside a Rake task' do
expect(described_class).to receive(:program_name).and_return('rake')
expect(described_class.enable?).to eq(false)
end
it 'returns true when load balancing should be enabled' do
allow(described_class).to receive(:hosts).and_return(%w(foo))
allow(Sidekiq).to receive(:server?).and_return(false)
allow(Gitlab::Database).to receive(:postgresql?).and_return(true)
expect(described_class.enable?).to eq(true)
end
end
describe '.program_name' do
it 'returns a String' do
expect(described_class.program_name).to be_an_instance_of(String)
end
end
describe '.configure_proxy' do
after do
described_class.proxy = nil
end
it 'configures the connection proxy' do
model = double(:model)
expect(ActiveRecord::Base.singleton_class).to receive(:prepend).
with(Gitlab::Database::LoadBalancing::ActiveRecordProxy)
expect(described_class).to receive(:active_record_models).
and_return([model])
expect(model.singleton_class).to receive(:prepend).
with(Gitlab::Database::LoadBalancing::ModelProxy)
described_class.configure_proxy
end
end
describe '.active_record_models' do
it 'returns an Array' do
expect(described_class.active_record_models).to be_an_instance_of(Array)
end
end
end
@@ -167,4 +167,21 @@ describe Gitlab::Database, lib: true do
expect(MigrationTest.new.false_value).to eq 0
end
end
describe '#disable_prepared_statements' do
it 'disables prepared statements' do
config = {}
expect(ActiveRecord::Base.configurations).to receive(:[]).
with(Rails.env).
and_return(config)
expect(ActiveRecord::Base).to receive(:establish_connection).
with({ 'prepared_statements' => false })
described_class.disable_prepared_statements
expect(config['prepared_statements']).to eq(false)
end
end
end