    loop: Fix a race between loop detach and loop open · 18048c1a
    Gulam Mohamed authored
    1. Userspace runs "losetup -d", which uses the open() call to open
       the device
    2. The kernel receives the ioctl command "LOOP_CLR_FD", which calls
       the function loop_clr_fd()
    3. If LOOP_CLR_FD is the first command received at that time (no
       other opener of the device), the AUTOCLEAR flag is not set, so
       deletion of the loop device proceeds and scans the partitions
       (drop/add partitions)
    
            if (disk_openers(lo->lo_disk) > 1) {
                    lo->lo_flags |= LO_FLAGS_AUTOCLEAR;
                    loop_global_unlock(lo, true);
                    return 0;
            }
    
    4. Before scanning partitions, it checks whether any partition of
       the loop device is currently open
    5. If any partition is open, it returns EBUSY:

            if (disk->open_partitions)
                    return -EBUSY;

    6. So if any other command (such as blkid) opens a partition of the
       loop device after "LOOP_CLR_FD" is received and just before the
       above check for open_partitions, the partition scan does not
       proceed and EBUSY is returned, as shown in the code above
    7. But "__loop_clr_fd()" does not propagate this EBUSY error
    8. We have observed that this leaves the partitions of the loop
       device stale even after the device is detached, resulting in I/O
       errors on those partitions
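
    The failure mode in steps 5 to 8 can be sketched as a small
    user-space model. This is only an illustration, not the kernel
    code: struct toy_disk, toy_rescan() and toy_detach() are
    hypothetical names, and the model assumes that the detach path
    merely logs the scan failure and carries on.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/* Hypothetical stand-in for the disk state; not a kernel structure. */
struct toy_disk {
    int open_partitions;   /* partitions currently held open, e.g. by blkid */
    int stale_partitions;  /* partition entries still registered */
};

/* Models the partition rescan: it refuses to run while a partition is open. */
static int toy_rescan(struct toy_disk *d)
{
    assert(d != NULL);
    if (d->open_partitions)
        return -EBUSY;
    d->stale_partitions = 0;   /* old partition entries are dropped */
    return 0;
}

/* Models the detach path: the error is logged but never returned. */
static void toy_detach(struct toy_disk *d)
{
    int err = toy_rescan(d);

    if (err)
        fprintf(stderr, "toy_detach: partition scan failed (rc=%d)\n", err);
    /* The backing file would be dropped here regardless of err. */
}
```

    In this model, calling toy_detach() while open_partitions is
    non-zero leaves stale_partitions untouched, and the caller has no
    way to notice because toy_detach() returns void.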
    
    Fix:
    
    Defer the detach of the loop device to the release function, which is
    called when the last close happens, by setting LO_FLAGS_AUTOCLEAR in
    lo_flags at the time of detach, i.e. in the loop_clr_fd() function.
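
    The deferral can be pictured with a minimal user-space sketch. It is
    a toy model under stated assumptions, not the actual patch: struct
    toy_loop, TOY_FLAG_AUTOCLEAR, toy_open(), toy_clr_fd() and
    toy_release() are hypothetical names that stand in for the loop
    device, LO_FLAGS_AUTOCLEAR, open, loop_clr_fd() and the release
    function.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_FLAG_AUTOCLEAR 0x1u   /* stands in for LO_FLAGS_AUTOCLEAR */

/* Hypothetical stand-in for the loop device state. */
struct toy_loop {
    int openers;          /* opens of the device or of its partitions */
    unsigned int flags;
    bool bound;           /* true while a backing file is attached */
    int partitions;       /* partition entries currently visible */
};

static void toy_open(struct toy_loop *lo)
{
    lo->openers++;
}

/* Detach request: only mark the device, never tear it down here. */
static int toy_clr_fd(struct toy_loop *lo)
{
    if (!lo->bound)
        return -1;
    lo->flags |= TOY_FLAG_AUTOCLEAR;
    return 0;
}

/* Close: the teardown runs only on the last close of a marked device. */
static void toy_release(struct toy_loop *lo)
{
    assert(lo->openers > 0);
    if (--lo->openers > 0)
        return;
    if (lo->bound && (lo->flags & TOY_FLAG_AUTOCLEAR)) {
        lo->partitions = 0;   /* nobody has a partition open any more */
        lo->bound = false;
        lo->flags &= ~TOY_FLAG_AUTOCLEAR;
    }
}
```

    Because the teardown in this model runs only when openers reaches
    zero, no partition can be open at that point, so the situation that
    produced EBUSY in the steps above does not arise.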
    
    The test case involves the following two scripts:
    
    script1.sh:
    
    while [ 1 ];
    do
            losetup -P -f /home/opt/looptest/test10.img
            blkid /dev/loop0p1
    done
    
    script2.sh:
    
    while [ 1 ];
    do
            losetup -d /dev/loop0
    done
    
    Without the fix, the following I/O errors were observed:
    
    kernel: __loop_clr_fd: partition scan of loop0 failed (rc=-16)
    kernel: I/O error, dev loop0, sector 20971392 op 0x0:(READ) flags 0x80700
            phys_seg 1 prio class 0
    kernel: I/O error, dev loop0, sector 108868 op 0x0:(READ) flags 0x0
            phys_seg 1 prio class 0
    kernel: Buffer I/O error on dev loop0p1, logical block 27201, async page
            read

    Signed-off-by: Gulam Mohamed <gulam.mohamed@oracle.com>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Link: https://lore.kernel.org/r/20240618164042.343777-1-gulam.mohamed@oracle.com
    Signed-off-by: Jens Axboe <axboe@kernel.dk>