• Benjamin Coddington's avatar
    pnfs/blocklayout: handle transient devices · b3dce6a2
    Benjamin Coddington authored
    PNFS block/SCSI layouts should gracefully handle cases where block devices
    are not available when a layout is retrieved, or the block devices are
    removed while the client holds a layout.
    
    While setting up a layout segment, keep a record of an unavailable or
    un-parsable block device in cache with a flag so that subsequent layouts do
    not spam the server with GETDEVINFO.  We can reuse the current
    NFS_DEVICEID_UNAVAILABLE handling with one variation: instead of reusing
    the device, we will discard it and send a fresh GETDEVINFO after the
    timeout, since the lookup and validation of the device occurs within the
    GETDEVINFO response handling.
    
    A lookup of a layout segment that references an unavailable device will
    return a segment with the NFS_LSEG_UNAVAILABLE flag set.  This will allow
    the pgio layer to mark the layout with the appropriate fail bit, which
    forces subsequent IO to the MDS, and prevents spamming the server with
    LAYOUTGET, LAYOUTRETURN.
    
    Finally, when IO to a block device fails, look up the block device(s)
    referenced by the pgio header, and mark them as unavailable.
    Signed-off-by: default avatarBenjamin Coddington <bcodding@redhat.com>
    Signed-off-by: default avatarTrond Myklebust <trond.myklebust@primarydata.com>
    b3dce6a2
dev.c 13.1 KB