• Dave Chinner's avatar
    xfs: CIL checkpoint flushes caches unconditionally · bad77c37
    Dave Chinner authored
    Currently every journal IO is issued as REQ_PREFLUSH | REQ_FUA to
    guarantee the ordering requirements the journal has w.r.t. metadata
    writeback. THe two ordering constraints are:
    
    1. we cannot overwrite metadata in the journal until we guarantee
    that the dirty metadata has been written back in place and is
    stable.
    
    2. we cannot write back dirty metadata until it has been written to
    the journal and guaranteed to be stable (and hence recoverable) in
    the journal.
    
    These rules apply to the atomic transactions recorded in the
    journal, not to the journal IO itself. Hence we need to ensure
    metadata is stable before we start writing a new transaction to the
    journal (guarantee #1), and we need to ensure the entire transaction
    is stable in the journal before we start metadata writeback
    (guarantee #2).
    
    The ordering guarantees of #1 are currently provided by REQ_PREFLUSH
    being added to every iclog IO. This causes the journal IO to issue a
    cache flush and wait for it to complete before issuing the write IO
    to the journal. Hence all completed metadata IO is guaranteed to be
    stable before the journal overwrites the old metadata.
    
    However, for long running CIL checkpoints that might do a thousand
    journal IOs, we don't need every single one of these iclog IOs to
    issue a cache flush - the cache flush done before the first iclog is
    submitted is sufficient to cover the entire range in the log that
    the checkpoint will overwrite because the CIL space reservation
    guarantees the tail of the log (completed metadata) is already
    beyond the range of the checkpoint write.
    
    Hence we only need a full cache flush between closing off the CIL
    checkpoint context (i.e. when the push switches it out) and issuing
    the first journal IO. Rather than plumbing this through to the
    journal IO, we can start this cache flush the moment the CIL context
    is owned exclusively by the push worker. The cache flush can be in
    progress while we process the CIL ready for writing, hence
    reducing the latency of the initial iclog write. This is especially
    true for large checkpoints, where we might have to process hundreds
    of thousands of log vectors before we issue the first iclog write.
    In these cases, it is likely the cache flush has already been
    completed by the time we have built the CIL log vector chain.
    Signed-off-by: default avatarDave Chinner <dchinner@redhat.com>
    Reviewed-by: default avatarChandan Babu R <chandanrlinux@gmail.com>
    Reviewed-by: default avatarDarrick J. Wong <djwong@kernel.org>
    Reviewed-by: default avatarBrian Foster <bfoster@redhat.com>
    Reviewed-by: default avatarAllison Henderson <allison.henderson@oracle.com>
    Signed-off-by: default avatarDarrick J. Wong <djwong@kernel.org>
    bad77c37
xfs_log_cil.c 38.9 KB