    fix bitmap corruption on close_range() with CLOSE_RANGE_UNSHARE · 9a2fa147
    Al Viro authored
    copy_fd_bitmaps(new, old, count) is expected to copy the first
    count/BITS_PER_LONG bits from old->full_fds_bits[] and fill
    the rest with zeroes.  What it actually does is copy enough words
    (BITS_TO_LONGS(count/BITS_PER_LONG)) and then memset the rest.
    That works fine, *if* all bits past the cutoff point are
    clear.  Otherwise we are risking garbage from the last word
    we'd copied.
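
The failure mode described above can be reproduced in userspace. The following is a minimal sketch, not the kernel code: copy_words_then_zero() and bit_is_set() are hypothetical names, and the BITS_PER_LONG and BITS_TO_LONGS macros are local stand-ins for the kernel ones. It copies whole words and zeroes the rest, so any bit past the cutoff that lives in the last copied word comes along with it.

```c
#include <assert.h>
#include <limits.h>
#include <string.h>

/* Local stand-ins for the kernel macros of the same name. */
#define BITS_PER_LONG    (sizeof(unsigned long) * CHAR_BIT)
#define BITS_TO_LONGS(n) (((n) + BITS_PER_LONG - 1) / BITS_PER_LONG)

/*
 * Hypothetical model of the word-granular copy: copy BITS_TO_LONGS(nbits)
 * whole words from src, then zero the remaining words of dst.  Bits between
 * nbits and the end of the last copied word are copied unmasked.
 */
static void copy_words_then_zero(unsigned long *dst, const unsigned long *src,
                                 size_t nbits, size_t dst_words)
{
    size_t words = BITS_TO_LONGS(nbits);

    assert(words <= dst_words);
    memcpy(dst, src, words * sizeof(unsigned long));
    memset(dst + words, 0, (dst_words - words) * sizeof(unsigned long));
}

/* Return 1 if bit nr is set in map, 0 otherwise. */
static int bit_is_set(const unsigned long *map, size_t nr)
{
    return (int)((map[nr / BITS_PER_LONG] >> (nr % BITS_PER_LONG)) & 1UL);
}
```

Asking for one bit from a source word that has bits 0 and 1 set leaves bit 1 set in the destination as well, which is exactly the "garbage from the last word" case.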
    
    For most of the callers that is true - expand_fdtable() has
    count equal to old->max_fds, so there are no open descriptors
    past count, let alone fully occupied words in ->open_fds[],
    which is what bits in ->full_fds_bits[] correspond to.
    
    The other caller (dup_fd()) passes sane_fdtable_size(old_fdt, max_fds),
    which is the smallest multiple of BITS_PER_LONG that covers all
    opened descriptors below max_fds.  In the common case (copying on
    fork()) max_fds is ~0U, so all opened descriptors will be below
    it and we are fine, for the same reason the call in expand_fdtable()
    is safe.
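
The size computation as worded above can be modelled as follows. This is a sketch of the description only, not the body of sane_fdtable_size(): toy_covering_size() is a hypothetical name, 64-bit words are assumed, and the open-descriptor bitmap is passed in as a plain array.

```c
#include <assert.h>
#include <stdint.h>

#define WORD_BITS 64u   /* assumption: 64-bit words */

/*
 * Hypothetical model: smallest multiple of WORD_BITS that covers every
 * open descriptor below max_fds.  open_fds has nwords words; bit n set
 * means descriptor n is open.
 */
static unsigned int toy_covering_size(const uint64_t *open_fds,
                                      unsigned int nwords,
                                      unsigned int max_fds)
{
    unsigned int limit = nwords * WORD_BITS;
    unsigned int end = 0;   /* one past the highest open descriptor seen */

    assert(nwords > 0);
    if (max_fds < limit)
        limit = max_fds;
    for (unsigned int fd = 0; fd < limit; fd++)
        if ((open_fds[fd / WORD_BITS] >> (fd % WORD_BITS)) & 1u)
            end = fd + 1;

    /* Round up to a whole number of words. */
    return (end + WORD_BITS - 1) / WORD_BITS * WORD_BITS;
}
```

With descriptors 0..127 open, a max_fds of ~0U gives 128 in this model, so nothing open lies past the cutoff. A max_fds of 64 gives 64, which leaves a fully occupied word just past the cutoff.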
    
    Unfortunately, there is a case where max_fds is less than that
    and where we might, indeed, end up with junk in ->full_fds_bits[] -
    close_range(from, to, CLOSE_RANGE_UNSHARE) with
    	* descriptor table being currently shared
    	* 'to' being above the current capacity of descriptor table
    	* 'from' being just under some chunk of opened descriptors.
    In that case we end up with observably wrong behaviour - e.g. spawn
    a child with CLONE_FILES, get all descriptors in range 0..127 open,
    then close_range(64, ~0U, CLOSE_RANGE_UNSHARE) and watch dup(0) end
    up with descriptor #128, despite #64 being observably not open.
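
Why a stale bit in ->full_fds_bits[] produces #128 rather than #64 can be seen in a toy two-level lookup. The sketch below is a simplified model and not the kernel's lookup code: struct toy_fdtable, toy_find_next_fd() and toy_after_unshare() are hypothetical, and a fixed capacity of four 64-bit words is assumed. A set bit in the summary word makes the search skip the corresponding word of open_fds[] without looking at it.

```c
#include <assert.h>
#include <stdint.h>

#define WORD_BITS 64u   /* assumption: 64-bit words */
#define NWORDS    4u    /* hypothetical capacity: 256 descriptors */

struct toy_fdtable {
    uint64_t open_fds[NWORDS];  /* bit n set: descriptor n is open */
    uint64_t full_fds_bits;     /* bit w set: open_fds[w] is claimed full */
};

/* Lowest clear bit in 'word' in [start, limit), or 'limit' if none. */
static unsigned int next_zero_in_word(uint64_t word, unsigned int start,
                                      unsigned int limit)
{
    for (unsigned int i = start; i < limit; i++)
        if (!((word >> i) & 1u))
            return i;
    return limit;
}

/* Two-level search: consult the summary first, then scan open_fds. */
static unsigned int toy_find_next_fd(const struct toy_fdtable *t,
                                     unsigned int start)
{
    unsigned int max_fds = NWORDS * WORD_BITS;
    unsigned int w = next_zero_in_word(t->full_fds_bits,
                                       start / WORD_BITS, NWORDS);

    if (w >= NWORDS)
        return max_fds;
    if (w * WORD_BITS > start)
        start = w * WORD_BITS;  /* skip words the summary claims are full */
    for (unsigned int fd = start; fd < max_fds; fd++)
        if (!((t->open_fds[fd / WORD_BITS] >> (fd % WORD_BITS)) & 1u))
            return fd;
    return max_fds;
}

/* State after the unshare: 0..63 open, 64 and up closed. */
static struct toy_fdtable toy_after_unshare(int stale_bit)
{
    struct toy_fdtable t = {
        .open_fds = { UINT64_MAX, 0, 0, 0 },
        .full_fds_bits = 1,
    };

    if (stale_bit)
        t.full_fds_bits |= 2;   /* leftover claim that word 1 is full */
    return t;
}
```

In this model, toy_find_next_fd(&t, 0) returns 128 when the stale bit is present and 64 when it is not, matching the behaviour described above.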
    
    The minimally invasive fix would be to deal with that in dup_fd().
    If this proves to add measurable overhead, we can go that way, but
    let's try to fix copy_fd_bitmaps() first.
    
    * new helper: bitmap_copy_and_expand(to, from, bits_to_copy, size).
    * make copy_fd_bitmaps() take the bitmap size in words, rather than
    bits; its 'count' argument is always a multiple of BITS_PER_LONG,
    so we are not losing any information, and that way we can use the
    same helper for all three bitmaps - compiler will see that count
    is a multiple of BITS_PER_LONG for the large ones, so it'll generate
    plain memcpy()+memset().
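
A userspace sketch of what such a helper could look like is given below. It is an illustration under assumptions, not the helper added by this commit: sketch_copy_and_expand() is a hypothetical name, both arguments are taken in bits, and the macros are local stand-ins for the kernel ones. The only difference from a plain memcpy()+memset() is the masking of the last copied word when bits_to_copy is not a multiple of BITS_PER_LONG.

```c
#include <assert.h>
#include <limits.h>
#include <string.h>

/* Local stand-ins for the kernel macros of the same name. */
#define BITS_PER_LONG    (sizeof(unsigned long) * CHAR_BIT)
#define BITS_TO_LONGS(n) (((n) + BITS_PER_LONG - 1) / BITS_PER_LONG)

/*
 * Hypothetical sketch: copy the first bits_to_copy bits of 'from' into
 * 'to' and clear every remaining bit of 'to' up to 'size' bits.
 */
static void sketch_copy_and_expand(unsigned long *to,
                                   const unsigned long *from,
                                   size_t bits_to_copy, size_t size)
{
    size_t copy = BITS_TO_LONGS(bits_to_copy);
    size_t total = BITS_TO_LONGS(size);

    assert(bits_to_copy <= size);
    memcpy(to, from, copy * sizeof(unsigned long));

    /* Drop bits past the cutoff that came along in the last copied word. */
    if (bits_to_copy % BITS_PER_LONG)
        to[copy - 1] &= ~0UL >> (BITS_PER_LONG - bits_to_copy % BITS_PER_LONG);

    memset(to + copy, 0, (total - copy) * sizeof(unsigned long));
}
```

When bits_to_copy is a multiple of BITS_PER_LONG the masking branch is never taken, which is the case the message expects to reduce to plain memcpy()+memset().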
    
    Reproducer added to tools/testing/selftests/core/close_range_test.c
    
    Cc: stable@vger.kernel.org
    Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
file.c 36.1 KB