• Peter Oberparleiter's avatar
    s390/cio: fix invalid -EBUSY on ccw_device_start · 5ef1dc40
    Peter Oberparleiter authored
    The s390 common I/O layer (CIO) returns an unexpected -EBUSY return code
    when drivers try to start I/O while a path-verification (PV) process is
    pending. This can lead to failed device initialization attempts with
    symptoms like broken network connectivity after boot.
    
    Fix this by replacing the -EBUSY return code with a deferred condition
    code 1 reply to make path-verification handling consistent from a
    driver's point of view.
    
    The problem can be reproduced semi-regularly using the following process,
    while repeating steps 2-3 as necessary (example assumes an OSA device
    with bus-IDs 0.0.a000-0.0.a002 on CHPID 0.02):
    
    1. echo 0.0.a000,0.0.a001,0.0.a002 >/sys/bus/ccwgroup/drivers/qeth/group
    2. echo 0 > /sys/bus/ccwgroup/devices/0.0.a000/online
    3. echo 1 > /sys/bus/ccwgroup/devices/0.0.a000/online ; \
       echo on > /sys/devices/css0/chp0.02/status
    
    Background information:
    
    The common I/O layer starts path-verification I/Os when it receives
    indications about changes in a device path's availability. This occurs
    for example when hardware events indicate a change in channel-path
    status, or when a manual operation such as a CHPID vary or configure
    operation is performed.
    
    If a driver attempts to start I/O while a PV is running, CIO reports a
    successful I/O start (ccw_device_start() return code 0). Then, after
    completion of PV, CIO synthesizes an interrupt response that indicates
    an asynchronous status condition that prevented the start of the I/O
    (deferred condition code 1).
    
    If a PV indication arrives while a device is busy with driver-owned I/O,
    PV is delayed until after I/O completion was reported to the driver's
    interrupt handler. To ensure that PV can be started eventually, CIO
    reports a device busy condition (ccw_device_start() return code -EBUSY)
    if a driver tries to start another I/O while PV is pending.
    
    In some cases this -EBUSY return code causes device drivers to consider
    a device not operational, resulting in failed device initialization.
    
    Note: The code that introduced the problem was added in 2003. Symptoms
    started appearing with the following CIO commit that causes a PV
    indication when a device is removed from the cio_ignore list after the
    associated parent subchannel device was probed, but before online
    processing of the CCW device has started:
    
    2297791c ("s390/cio: dont unregister subchannel from child-drivers")
    
    During boot, the cio_ignore list is modified by the cio_ignore dracut
    module [1] as well as Linux vendor-specific systemd service scripts[2].
    When combined, this commit and boot scripts cause a frequent occurrence
    of the problem during boot.
    
    [1] https://github.com/dracutdevs/dracut/tree/master/modules.d/81cio_ignore
    [2] https://github.com/SUSE/s390-tools/blob/master/cio_ignore.service
    
    Cc: stable@vger.kernel.org # v5.15+
    Fixes: 2297791c ("s390/cio: dont unregister subchannel from child-drivers")
    Tested-By: default avatarThorsten Winkler <twinkler@linux.ibm.com>
    Reviewed-by: default avatarThorsten Winkler <twinkler@linux.ibm.com>
    Signed-off-by: default avatarPeter Oberparleiter <oberpar@linux.ibm.com>
    Signed-off-by: default avatarHeiko Carstens <hca@linux.ibm.com>
    5ef1dc40
device_ops.c 26.6 KB