• Kent Overstreet's avatar
    bcachefs: Fix needs_whiteout BUG_ON() in bkey_sort() · 5dfd3746
    Kent Overstreet authored
    Btree nodes are log structured; thus, we need to emit whiteouts when
    we're deleting a key that's been written out to disk.
    
    k->needs_whiteout tracks whether a key will need a whiteout when it's
    deleted, and this requires some careful handling; e.g. the key we're
    deleting may not have been written out to disk, but it may have
    overwritten a key that was - thus we need to carry this flag around on
    overwrites.
    
    Invariants:
    There may be multiple key for the same position in a given node (because
    of overwrites), but only one of them will be a live (non deleted) key,
    and only one key for a given position will have the needs_whiteout flag
    set.
    
    Additionally, we don't want to carry around whiteouts that need to be
    written in the main searchable part of a btree node - btree_iter_peek()
    will have to skip past them, and this can lead to an O(n^2) issues when
    doing sequential deletions (e.g. inode rm/truncate). So there's a
    separate region in the btree node buffer for unwritten whiteouts; these
    are merge sorted with the rest of the keys we're writing in the btree
    node write path.
    
    The unwritten whiteouts was a later optimization that bch2_sort_keys()
    didn't take into account; the unwritten whiteouts area means that we
    never have deleted keys with needs_whiteout set in the main searchable
    part of a btree node.
    
    That means we can simplify and optimize some sort paths, and eliminate
    an assertion that syzbot found:
    
    - Unless we're in the btree node write path, it's always ok to drop
      whiteouts when sorting
    - When sorting for a btree node write, we drop the whiteout if it's not
      from the unwritten whiteouts area, or if it's overwritten by a real
      key at the same position.
    
    This completely eliminates some tricky logic for propagating the
    needs_whiteout flag: syzbot was able to hit the assertion that checked
    that there shouldn't be more than one key at the same pos with
    needs_whiteout set, likely due to a combination of flipping on
    needs_whiteout on all written keys (they need whiteouts if overwritten),
    combined with not always dropping unneeded whiteouts, and the tricky
    logic in the sort path for preserving needs_whiteout that wasn't really
    needed.
    Signed-off-by: default avatarKent Overstreet <kent.overstreet@linux.dev>
    5dfd3746
bkey_sort.h 1.28 KB