    btrfs: add ro compat flags to inodes · 77eea05e
    Boris Burkov authored
    Currently, inode flags are fully backwards incompatible in btrfs. If we
    introduce a new inode flag, then tree-checker will detect it and fail.
    This can even cause us to fail to mount entirely. To make it possible to
    introduce new flags which can be read-only compatible, like VERITY, we
    add new ro flags to btrfs without treating them quite so harshly in
    tree-checker. A read-only file system can survive an unexpected flag,
    and can be mounted.
    
    As for the implementation, it unfortunately gets a little complicated.
    
    The on-disk representation of the inode, btrfs_inode_item, has an __le64
    for flags but the in-memory representation, btrfs_inode, uses a u32.
    David Sterba had the nice idea that we could reclaim those wasted 32 bits
    on disk and use them for the new ro_compat flags.
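
    As a rough illustration of that idea (a sketch only: the struct and
    helper names below are hypothetical, and the value is assumed to be
    already converted from __le64 to CPU byte order), the 64-bit on-disk
    field can be treated as two 32-bit halves:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical in-memory pair; not the layout of struct btrfs_inode. */
struct demo_inode_flags {
	uint32_t flags;     /* existing flags, low 32 bits on disk */
	uint32_t ro_flags;  /* ro compat flags, high 32 bits on disk */
};

/* Split a CPU-order 64-bit on-disk value into its two halves. */
static struct demo_inode_flags demo_split_flags(uint64_t disk_flags)
{
	struct demo_inode_flags f;

	f.flags = (uint32_t)disk_flags;
	f.ro_flags = (uint32_t)(disk_flags >> 32);
	return f;
}

/* Recombine the halves; the casts keep everything unsigned and 64 bits wide. */
static uint64_t demo_combine_flags(uint32_t flags, uint32_t ro_flags)
{
	return (uint64_t)flags | ((uint64_t)ro_flags << 32);
}

/* Round-trip check that also serves as a usage example. */
static void demo_split_selftest(void)
{
	struct demo_inode_flags f = demo_split_flags(0x0000000180000abcULL);

	assert(f.flags == 0x80000abcu);
	assert(f.ro_flags == 0x1u);
	assert(demo_combine_flags(f.flags, f.ro_flags) == 0x0000000180000abcULL);
}
```

    Because both halves are unsigned, recombining them cannot smear the
    top bit of the low half into the high half.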
    
    It turns out that the tree-checker code which checks for unknown flags
    is broken, and ignores the upper 32 bits we are hoping to use. The issue
    is that the flags use the literal 1 rather than 1ULL, so the flags are
    signed ints, and one of them is specifically (1 << 31). As a result, the
    mask which ORs the flags is a negative integer on machines where int is
    32 bit twos complement. When tree-checker evaluates the expression:
    
      btrfs_inode_flags(leaf, iitem) & ~BTRFS_INODE_FLAG_MASK
    
    The mask is something like 0x80000abc, which gets promoted to u64 with
    sign extension to 0xffffffff80000abc. Inverting that 64-bit mask leaves
    all the upper bits zeroed, so we can't detect unexpected flags there.
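
    The effect can be reproduced outside the kernel. The sketch below
    uses hypothetical names and the example value 0x80000abc from above
    rather than the real BTRFS_INODE_FLAG_MASK, and it builds the signed
    mask from INT32_MIN to avoid relying on what (1 << 31) evaluates to:

```c
#include <assert.h>
#include <stdint.h>

/* Signed mask with the bit pattern 0x80000abc (hypothetical example value). */
static const int32_t demo_signed_mask = INT32_MIN + 0xabc;
/* The same bit pattern held in an unsigned 32-bit type. */
static const uint32_t demo_unsigned_mask = 0x80000abcu;

/* Widening the negative mask to 64 bits sets all of the upper 32 bits. */
static uint64_t demo_widen_signed(void)
{
	return (uint64_t)demo_signed_mask;
}

/* Inverted signed mask: the upper 32 bits end up clear, so high bits pass. */
static uint64_t demo_unknown_signed(uint64_t flags)
{
	return flags & (uint64_t)~demo_signed_mask;
}

/* Widen first as unsigned, then invert: the upper 32 bits are all rejected. */
static uint64_t demo_unknown_unsigned(uint64_t flags)
{
	return flags & ~(uint64_t)demo_unsigned_mask;
}

static void demo_sign_selftest(void)
{
	uint64_t high_bit = 1ULL << 40;

	assert(demo_widen_signed() == 0xffffffff80000abcULL);
	assert(demo_unknown_signed(high_bit) == 0);           /* missed */
	assert(demo_unknown_unsigned(high_bit) == high_bit);  /* caught */
}
```

    Unknown bits in the low half are still caught in both variants; only
    the upper 32 bits slip through the signed version.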
    
    This suggests that we can't use those bits after all. Luckily, we have
    good reason to believe that they are zero anyway. Inode flags are
    metadata, which is always checksummed, so any bit flips that would
    introduce 1s would cause a checksum failure anyway (excluding the
    improbable case of the checksum being corrupted in exactly the
    matching way).
    
    Further, unless the 1 << 31 flag is used, the cast to u64 of the 32-bit
    inode flag should preserve its value and not add leading ones
    (at least for two's complement). The only place that flag
    (BTRFS_INODE_ROOT_ITEM_INIT) is used is in a special inode embedded in
    the root item, and indeed for that inode we see 0xffffffff80000000 as
    the flags on disk. However, that inode is never seen by tree checker,
    nor is it used in a context where verity might be meaningful.
    Theoretically, a future ro flag might cause trouble on that inode, so we
    should proactively clean up that mess before it does.
    
    With the introduction of the new ro flags, keep two separate unsigned
    masks and check them against the appropriate u32. Since we no longer run
    afoul of sign extension, this also stops writing out 0xffffffff80000000
    in root_item inodes going forward.
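
    A minimal sketch of such a two-mask check follows. The mask values,
    names and result codes are hypothetical, and tolerating an unknown ro
    flag only on a read-only mount is an assumption drawn from the
    description above, not a copy of the tree-checker logic:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical known-flag masks; both are unsigned 32-bit values. */
#define DEMO_FLAG_MASK     ((uint32_t)0x80000fffu)
#define DEMO_RO_FLAG_MASK  ((uint32_t)0x00000001u)

enum demo_result {
	DEMO_OK,
	DEMO_UNKNOWN_FLAG,     /* unknown incompatible flag: always rejected */
	DEMO_UNKNOWN_RO_FLAG   /* unknown ro flag on a writable mount */
};

/* Check each 32-bit half against its own mask. */
static enum demo_result demo_check_flags(uint32_t flags, uint32_t ro_flags,
					 bool read_only)
{
	if (flags & ~DEMO_FLAG_MASK)
		return DEMO_UNKNOWN_FLAG;
	/* Assumption: a read-only mount may ignore ro flags it does not know. */
	if (!read_only && (ro_flags & ~DEMO_RO_FLAG_MASK))
		return DEMO_UNKNOWN_RO_FLAG;
	return DEMO_OK;
}

static void demo_check_selftest(void)
{
	assert(demo_check_flags(0x1, 0x1, false) == DEMO_OK);
	assert(demo_check_flags(0x1000, 0, true) == DEMO_UNKNOWN_FLAG);
	assert(demo_check_flags(0, 0x2, false) == DEMO_UNKNOWN_RO_FLAG);
	assert(demo_check_flags(0, 0x2, true) == DEMO_OK);
}
```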

    Signed-off-by: Boris Burkov <boris@bur.io>
    Reviewed-by: David Sterba <dsterba@suse.com>
    Signed-off-by: David Sterba <dsterba@suse.com>
delayed-inode.c 49.9 KB