• Dave Chinner's avatar
    xfs: ATTR_REPLACE algorithm with LARP enabled needs rework · fdaf1bb3
    Dave Chinner authored
    We can't use the same algorithm for replacing an existing attribute
    when logging attributes. The existing algorithm is essentially:
    
    1. create new attr w/ INCOMPLETE
    2. atomically flip INCOMPLETE flags between old + new attribute
    3. remove old attr which is marked w/ INCOMPLETE
    
    This algorithm guarantees that we see either the old or new
    attribute, and if we fail after the atomic flag flip, we don't have
    to recover the removal of the old attr because we never see
    INCOMPLETE attributes in lookups.
    
    For logged attributes, however, this does not work. The logged
    attribute intents do not track the work that has been done as the
    transaction rolls, and hence the only recovery mechanism we have is
    "run the replace operation from scratch".
    
    This is further exacerbated by the attempt to avoid needing the
    INCOMPLETE flag to create an atomic swap. This means we can create
    a second active attribute of the same name before we remove the
    original. If we fail at any point after the create but before the
    removal has completed, we end up with duplicate attributes in
    the attr btree and recovery only tries to replace one of them.
    
    There are several other failure modes where we can leave partially
    allocated remote attributes that expose stale data, partially free
    remote attributes that enable UAF based stale data exposure, etc.
    
    TO fix this, we need a different algorithm for replace operations
    when LARP is enabled. Luckily, it's not that complex if we take the
    right first step. That is, the first thing we log is the attri
    intent with the new name/value pair and mark the old attr as
    INCOMPLETE in the same transaction.
    
    From there, we then remove the old attr and keep relogging the
    new name/value in the intent, such that we always know that we have
    to create the new attr in recovery. Once the old attr is removed,
    we then run a normal ATTR_CREATE operation relogging the intent as
    we go. If the new attr is local, then it gets created in a single
    atomic transaction that also logs the final intent done. If the new
    attr is remote, the we set INCOMPLETE on the new attr while we
    allocate and set the remote value, and then we clear the INCOMPLETE
    flag at in the last transaction taht logs the final intent done.
    
    If we fail at any point in this algorithm, log recovery will always
    see the same state on disk: the new name/value in the intent, and
    either an INCOMPLETE attr or no attr in the attr btree. If we find
    an INCOMPLETE attr, we run the full replace starting with removing
    the INCOMPLETE attr. If we don't find it, then we simply create the
    new attr.
    
    Notably, recovery of a failed create that has an INCOMPLETE flag set
    is now the same - we start with the lookup of the INCOMPLETE attr,
    and if that exists then we do the full replace recovery process,
    otherwise we just create the new attr.
    
    Hence changing the way we do the replace operation when LARP is
    enabled allows us to use the same log recovery algorithm for both
    the ATTR_CREATE and ATTR_REPLACE operations. This is also the same
    algorithm we use for runtime ATTR_REPLACE operations (except for the
    step setting up the initial conditions).
    
    The result is that:
    
    - ATTR_CREATE uses the same algorithm regardless of whether LARP is
      enabled or not
    - ATTR_REPLACE with larp=0 is identical to the old algorithm
    - ATTR_REPLACE with larp=1 runs an unmodified attr removal algorithm
      from the larp=0 code and then runs the unmodified ATTR_CREATE
      code.
    - log recovery when larp=1 runs the same ATTR_REPLACE algorithm as
      it uses at runtime.
    
    Because the state machine is now quite clean, changing the algorithm
    is really just a case of changing the initial state and how the
    states link together for the ATTR_REPLACE case. Hence it's not a
    huge amount of code for what is a fairly substantial rework
    of the attr logging and recovery algorithm....
    Signed-off-by: default avatarDave Chinner <dchinner@redhat.com>
    Reviewed-by: default avatarAllison Henderson <allison.henderson@oracle.com>
    Reviewed-by: default avatarDarrick J. Wong <djwong@kernel.org>
    Signed-off-by: default avatarDave Chinner <david@fromorbit.com>
    fdaf1bb3
xfs_trace.h 130 KB