    MDEV-5509: Seconds_Behind_Master incorrect in parallel replication · 8cc6e90d
    unknown authored
    The problem was a race between the SQL driver thread and the worker threads.
    The SQL driver thread would set rli->last_master_timestamp to zero to
    mark that it has caught up with the master, while the worker threads would
    set it to the timestamp of the executed event. This can happen out-of-order
    in parallel replication, causing the "caught up" status to be overwritten
    and Seconds_Behind_Master to wrongly grow when the slave is idle.
    
    To fix, introduce a separate flag rli->sql_thread_caught_up to mark that the
    SQL driver thread is caught up. This avoids issues with worker threads
    overwriting the SQL driver thread status. In parallel replication, SHOW
    SLAVE STATUS then additionally checks that all worker threads are idle
    before reporting Seconds_Behind_Master as 0 for an idle slave.
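
The race and the fix described above can be illustrated with a small,
single-threaded model. This is only a sketch, not the code in rpl_rli.cc:
last_master_timestamp and sql_thread_caught_up are the names used in the
commit message, while RelayLogInfoSketch, busy_workers, seconds_behind_old(),
seconds_behind_new() and replay_out_of_order() are hypothetical names
introduced here, and locking is left out entirely.

```cpp
#include <ctime>

// Hypothetical, simplified stand-in for the relay-log info structure.
struct RelayLogInfoSketch {
    std::time_t last_master_timestamp = 0;  // timestamp of the last executed event
    bool sql_thread_caught_up = true;       // flag introduced by the fix
    int busy_workers = 0;                   // assumed count of non-idle worker threads
};

// Old scheme: a zero timestamp doubles as the "caught up" marker.
long long seconds_behind_old(const RelayLogInfoSketch& rli, std::time_t now) {
    if (rli.last_master_timestamp == 0)
        return 0;
    long long diff = static_cast<long long>(now) -
                     static_cast<long long>(rli.last_master_timestamp);
    return diff < 0 ? 0 : diff;
}

// New scheme: report 0 only if the driver is caught up AND all workers are idle.
long long seconds_behind_new(const RelayLogInfoSketch& rli, std::time_t now) {
    if (rli.sql_thread_caught_up && rli.busy_workers == 0)
        return 0;
    if (rli.last_master_timestamp == 0)
        return 0;  // sketch simplification: no event has been executed yet
    long long diff = static_cast<long long>(now) -
                     static_cast<long long>(rli.last_master_timestamp);
    return diff < 0 ? 0 : diff;
}

struct ScenarioResult {
    long long old_scheme;
    long long new_scheme;
};

// Replays the problematic ordering: the driver marks "caught up" before the
// worker records the timestamp of the event it has just executed.
ScenarioResult replay_out_of_order(std::time_t event_ts, std::time_t now) {
    RelayLogInfoSketch rli;

    // 1. The driver reads one event and hands it to a worker.
    rli.sql_thread_caught_up = false;
    rli.busy_workers = 1;

    // 2. The driver reaches the end of the relay log and marks itself caught up.
    rli.last_master_timestamp = 0;    // old marker
    rli.sql_thread_caught_up = true;  // new marker

    // 3. The worker finishes afterwards and overwrites the timestamp.
    rli.last_master_timestamp = event_ts;
    rli.busy_workers = 0;

    return { seconds_behind_old(rli, now), seconds_behind_new(rli, now) };
}
```

With an event timestamp of 1000 and a status check at time 1060, the old
scheme reports 60 seconds of lag on an idle slave, while the flag-based check
reports 0 because the worker can no longer erase the "caught up" state.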
rpl_rli.cc 57.6 KB