1. 14 Mar, 2017 1 commit
    • Yorick Peterse's avatar
      Add support for load balancing database queries · 623db07d
      Yorick Peterse authored
      This adds support for balancing queries amongst multiple database hosts.
      Web requests will stick to using the primary for a little while after a
      write took place, removing the need for synchronous replication. Load
      balancing is disabled for Sidekiq since using this could lead to race
      conditions, and Sidekiq mostly performs writes anyway.
      
      == Balancing
      
      Balancing is done using a simple round-robin algorithm. The first time a
      connection is needed the first host is used, then the second, third,
      etc. This logic resets to the first host once reaching the end of the
      hosts list.
      
      The code added in this commit _only_ load balances queries sent from
      models. The code does _not_ touch ActiveRecord::Base.connection. This
      means that direct use of this method will result in the queries being
      sent to the primary.
      
      == Configuration
      
      Configuration is done by adding a YAML section to config/database.yml.
      For example:
      
          production:
            load_balancing:
              hosts:
                - 10.0.0.1
                - 10.0.0.2
      
      All hosts will use the same authentication credentials.
      
      == Sticking
      
      When a write is performed the query is sent to the primary. Any queries
      executed after this point are also sent to the primary. At the end of a
      request some session details are stored for the current user, these
      details are used to stick to the primary for as long as necessary (or
      until the data expires).
      
      This prevents the user from running into cases where they write data to
      the primary, read from the secondary, and the data isn't available yet
      (e.g. leading to an HTTP 404 error).
      
      == Overhead
      
      The load balancing code has minimal overhead. Instead of parsing raw SQL
      queries it hooks into Rails specific methods to determine what host to
      use for a query.
      
      == Transactions
      
      Transactions are always executed on the primary, even if they don't
      perform any writes. Once a transaction completes a session will stick to
      the primary. This is based on transactions almost always being used for
      writes (there's little benefit to using a transaction for only reads).
      
      == Prepared Statements
      
      Prepared statements don't work well when queries are being distributed
      amongst hosts. As a result GitLab will automatically disable prepared
      statements when load balancing is enabled. Disabling prepared statements
      has no impact on response timings, and may even reduce the memory usage
      of PostgreSQL.
      
      == Failovers
      
      The load balancing code is capable of dealing with database failovers.
      In the event of a secondary being unavailable the load balancer will
      mark it as offline and use the next available secondary. If no
      secondaries are available the primary is used instead.
      
      Secondaries that are marked as offline are checked again automatically,
      preventing a host from being marked as offline forever.
      
      In the event of a connection error when writing to the primary the
      code will suspend the caller, then retry the operation up to 3 times.
      Every retry the sleep time will increase exponentially.
      
      All of this means that in the event of a DB restart or failover some
      requests may take a bit longer to complete; instead of the application
      immediately returning an error.
      623db07d
  2. 10 Mar, 2017 20 commits
  3. 09 Mar, 2017 19 commits