    Handle outdated replicas in the DB load balancer · a2867b1b
    Yorick Peterse authored
    Instead of checking whether a replica is online by just running
    "SELECT 1", we now check its replication status. If a replica lags too
    far behind the primary in both time and data, we stop using it until it
    is back in sync. We also only check the status roughly every 30
    seconds. This reduces the overhead of the status checks, at the cost of
    the reported status potentially being out of date for 30 seconds or
    so.
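
As a rough illustration of the lag check described above, here is a minimal Python sketch. It is not the code from this commit: the names `ReplicaStatus` and `replica_usable` and both threshold values are hypothetical, and it assumes the reading that a replica is dropped only when it is behind in both time and data.

```python
from dataclasses import dataclass

# Hypothetical thresholds; the commit message does not state the real values.
MAX_LAG_SECONDS = 60.0
MAX_LAG_BYTES = 8 * 1024 * 1024


@dataclass
class ReplicaStatus:
    lag_seconds: float  # how far replay is behind the primary, in time
    lag_bytes: int      # how much data is still waiting to be replayed


def replica_usable(status: ReplicaStatus) -> bool:
    """Return False only when the replica is too far behind in both time and data."""
    too_old = status.lag_seconds > MAX_LAG_SECONDS
    too_much_data = status.lag_bytes > MAX_LAG_BYTES
    return not (too_old and too_much_data)
```

Requiring both conditions keeps a replica in rotation when the primary has simply been idle for a while: the time lag grows, but there is no unreplayed data.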
    
    Checking the replicas happens in a request, without any central
    coordination mechanism. This keeps the implementation simple and the
    processes independent of _another_ central service. To prevent all
    processes from doing the same work at the same time we randomize the
    checking intervals on a per-process basis. This won't prevent 2
    processes from checking at the same time, but it does reduce the
    likelihood of _all_ of them checking at the same time.
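
The randomized, per-process interval could look something like the following Python sketch. This is an assumption-laden illustration, not the commit's implementation: `StatusChecker`, `BASE_INTERVAL`, and `JITTER` are hypothetical names, and the 10 second jitter is an invented value; only the "roughly every 30 seconds" figure comes from the message above.

```python
import random
import time

BASE_INTERVAL = 30.0  # seconds; "roughly every 30 seconds" per the commit message
JITTER = 10.0         # hypothetical spread added on top of the base interval


class StatusChecker:
    """Caches a replica's status and re-checks it at a randomized interval."""

    def __init__(self, check, rng=random, clock=time.monotonic):
        self._check = check            # callable returning True if the replica is usable
        self._rng = rng
        self._clock = clock
        self._online = True
        self._next_check_at = clock()  # the first call checks immediately

    def online(self):
        now = self._clock()
        if now >= self._next_check_at:
            self._online = self._check()
            # Each process draws its own delay, so checks drift apart over time.
            self._next_check_at = now + BASE_INTERVAL + self._rng.uniform(0, JITTER)
        return self._online
```

Because the check runs inside `online()`, it happens during whichever request first asks for the status after the deadline passes, with no coordination between processes.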
    
    Fixes https://gitlab.com/gitlab-org/gitlab-ee/issues/2197, closes
    https://gitlab.com/gitlab-org/gitlab-ee/issues/2866
database_load_balancing.md 7.98 KB