Handle outdated replicas in the DB load balancer
Instead of checking whether a replica is online by just running "SELECT 1", we now check its replication status. If a replica is lagging too far behind, both in time and in the amount of data, we stop using it until it is back in sync with the primary.

We only check the status roughly every 30 seconds. This reduces the overhead of the status checks, at the cost of the status potentially lagging behind the real world by 30 seconds or so.

Checking the replicas happens in a request, without any central coordination mechanism. This keeps the implementation simple and the processes independent of _another_ central service. To prevent all processes from doing the same work at the same time, we randomize the checking intervals on a per-process basis. This won't prevent 2 processes from checking at the same time, but it does reduce the likelihood of _all_ of them checking at the same time.

Fixes https://gitlab.com/gitlab-org/gitlab-ee/issues/2197, closes https://gitlab.com/gitlab-org/gitlab-ee/issues/2866
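
The behaviour described above can be illustrated with a minimal sketch. It is not the code from this commit: the class name `ReplicaCheck`, the `fetch_lag` callable, the lag thresholds, and the 10-second jitter window are all assumptions made for the example. It also assumes one reading of "time and data wise", namely that a replica is skipped only when it exceeds both limits.

```python
import random
import time


class ReplicaCheck:
    """Caches a replica's status and refreshes it at a jittered interval.

    All names and defaults here are hypothetical, chosen for illustration.
    """

    def __init__(self, fetch_lag, max_lag_seconds=60.0,
                 max_lag_bytes=8 * 1024 ** 2, interval=30.0,
                 jitter=10.0, clock=time.monotonic, rng=random.uniform):
        self.fetch_lag = fetch_lag            # returns (lag_seconds, lag_bytes)
        self.max_lag_seconds = max_lag_seconds
        self.max_lag_bytes = max_lag_bytes
        self.interval = interval              # base delay between checks
        self.jitter = jitter                  # extra random delay, per process
        self.clock = clock
        self.rng = rng
        self.online = True
        self.next_check_at = clock()          # check on first use

    def _in_sync(self):
        try:
            lag_seconds, lag_bytes = self.fetch_lag()
        except Exception:
            # A replica we cannot query is treated as unusable.
            return False
        # Outdated only when BOTH limits are exceeded (assumption, see above).
        return (lag_seconds <= self.max_lag_seconds
                or lag_bytes <= self.max_lag_bytes)

    def usable(self):
        """Called from within a request; refreshes the status only when due."""
        now = self.clock()
        if now >= self.next_check_at:
            self.online = self._in_sync()
            # Randomizing the delay keeps independent processes from
            # all checking at the same moment.
            self.next_check_at = now + self.interval + self.rng(0.0, self.jitter)
        return self.online
```

Because each process holds its own `ReplicaCheck` and draws its own random delay, no shared state or central service is needed; the trade-off is that a process may keep using a stale status until its next check is due.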