• Mauricio Faria de Oliveira's avatar
    Revert "dm mpath: fix stalls when handling invalid ioctls" · ec3b2cdc
    Mauricio Faria de Oliveira authored
    commit 47796938 upstream.
    
    This reverts commit a1989b33.
    
    That commit introduced a regression at least for the case of the SG_IO ioctl()
    running without CAP_SYS_RAWIO capability (e.g., unprivileged users) when there
    are no active paths: the ioctl() fails with the ENOTTY errno immediately rather
    than blocking due to queue_if_no_path until a path becomes active, for example.
    
    That case happens to be exercised by QEMU KVM guests with 'scsi-block' devices
    (qemu "-device scsi-block" [1], libvirt "<disk type='block' device='lun'>" [2])
    from multipath devices; which leads to SCSI/filesystem errors in such a guest.
    
    More general scenarios can hit that regression too. The following demonstration
    employs a SG_IO ioctl() with a standard SCSI INQUIRY command for this objective
    (some output & user changes omitted for brevity and comments added for clarity).
    
    Reverting that commit restores normal operation (queueing) in failing scenarios;
    tested on linux-next (next-20151022).
    
    1) Test-case is based on sg_simple0 [3] (just SG_IO; remove SG_GET_VERSION_NUM)
    
        $ cat sg_simple0.c
        ... see [3] ...
        $ sed '/SG_GET_VERSION_NUM/,/}/d' sg_simple0.c > sgio_inquiry.c
        $ gcc sgio_inquiry.c -o sgio_inquiry
    
    2) The ioctl() works fine with active paths present.
    
        # multipath -l 85ag56
        85ag56 (...) dm-19 IBM     ,2145
        size=60G features='1 queue_if_no_path' hwhandler='0' wp=rw
        |-+- policy='service-time 0' prio=0 status=active
        | |- 8:0:11:0  sdz  65:144  active undef running
        | `- 9:0:9:0   sdbf 67:144  active undef running
        `-+- policy='service-time 0' prio=0 status=enabled
          |- 8:0:12:0  sdae 65:224  active undef running
          `- 9:0:12:0  sdbo 68:32   active undef running
    
        $ ./sgio_inquiry /dev/mapper/85ag56
        Some of the INQUIRY command's response:
            IBM       2145              0000
        INQUIRY duration=0 millisecs, resid=0
    
    3) The ioctl() fails with ENOTTY errno with _no_ active paths present,
       for unprivileged users (rather than blocking due to queue_if_no_path).
    
        # for path in $(multipath -l 85ag56 | grep -o 'sd[a-z]\+'); \
              do multipathd -k"fail path $path"; done
    
        # multipath -l 85ag56
        85ag56 (...) dm-19 IBM     ,2145
        size=60G features='1 queue_if_no_path' hwhandler='0' wp=rw
        |-+- policy='service-time 0' prio=0 status=enabled
        | |- 8:0:11:0  sdz  65:144  failed undef running
        | `- 9:0:9:0   sdbf 67:144  failed undef running
        `-+- policy='service-time 0' prio=0 status=enabled
          |- 8:0:12:0  sdae 65:224  failed undef running
          `- 9:0:12:0  sdbo 68:32   failed undef running
    
        $ ./sgio_inquiry /dev/mapper/85ag56
        sg_simple0: Inquiry SG_IO ioctl error: Inappropriate ioctl for device
    
    4) dmesg shows that scsi_verify_blk_ioctl() failed for SG_IO (0x2285);
       it returns -ENOIOCTLCMD, later replaced with -ENOTTY in vfs_ioctl().
    
        $ dmesg
        <...>
        [] device-mapper: multipath: Failing path 65:144.
        [] device-mapper: multipath: Failing path 67:144.
        [] device-mapper: multipath: Failing path 65:224.
        [] device-mapper: multipath: Failing path 68:32.
        [] sgio_inquiry: sending ioctl 2285 to a partition!
    
    5) The ioctl() only works if the SYS_CAP_RAWIO capability is present
       (then queueing happens -- in this example, queue_if_no_path is set);
       this is due to a conditional check in scsi_verify_blk_ioctl().
    
        # capsh --drop=cap_sys_rawio -- -c './sgio_inquiry /dev/mapper/85ag56'
        sg_simple0: Inquiry SG_IO ioctl error: Inappropriate ioctl for device
    
        # ./sgio_inquiry /dev/mapper/85ag56 &
        [1] 72830
    
        # cat /proc/72830/stack
        [<c00000171c0df700>] 0xc00000171c0df700
        [<c000000000015934>] __switch_to+0x204/0x350
        [<c000000000152d4c>] msleep+0x5c/0x80
        [<c00000000077dfb0>] dm_blk_ioctl+0x70/0x170
        [<c000000000487c40>] blkdev_ioctl+0x2b0/0x9b0
        [<c0000000003128e4>] block_ioctl+0x64/0xd0
        [<c0000000002dd3b0>] do_vfs_ioctl+0x490/0x780
        [<c0000000002dd774>] SyS_ioctl+0xd4/0xf0
        [<c000000000009358>] system_call+0x38/0xd0
    
    6) This is the function call chain exercised in this analysis:
    
    SYSCALL_DEFINE3(ioctl, <...>) @ fs/ioctl.c
        -> do_vfs_ioctl()
            -> vfs_ioctl()
                ...
                error = filp->f_op->unlocked_ioctl(filp, cmd, arg);
                ...
                    -> dm_blk_ioctl() @ drivers/md/dm.c
                        -> multipath_ioctl() @ drivers/md/dm-mpath.c
                            ...
                            (bdev = NULL, due to no active paths)
                            ...
                            if (!bdev || <...>) {
                                int err = scsi_verify_blk_ioctl(NULL, cmd);
                                if (err)
                                    r = err;
                            }
                            ...
                                -> scsi_verify_blk_ioctl() @ block/scsi_ioctl.c
                                    ...
                                    if (bd && bd == bd->bd_contains) // not taken (bd = NULL)
                                        return 0;
                                    ...
                                    if (capable(CAP_SYS_RAWIO)) // not taken (unprivileged user)
                                        return 0;
                                    ...
                                    printk_ratelimited(KERN_WARNING
                                               "%s: sending ioctl %x to a partition!\n" <...>);
    
                                    return -ENOIOCTLCMD;
                                <-
                            ...
                            return r ? : <...>
                        <-
                ...
                if (error == -ENOIOCTLCMD)
                    error = -ENOTTY;
                 out:
                    return error;
                ...
    
    Links:
    [1] http://git.qemu.org/?p=qemu.git;a=commit;h=336a6915bc7089fb20fea4ba99972ad9a97c5f52
    [2] https://libvirt.org/formatdomain.html#elementsDisks (see 'disk' -> 'device')
    [3] http://tldp.org/HOWTO/SCSI-Generic-HOWTO/pexample.html (Revision 1.2, 2002-05-03)
    Signed-off-by: default avatarMauricio Faria de Oliveira <mauricfo@linux.vnet.ibm.com>
    Signed-off-by: default avatarMike Snitzer <snitzer@redhat.com>
    Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
    ec3b2cdc
dm-mpath.c 41.6 KB