    Aic7xxx and Aic79xx driver Update · 36da50bc
    Justin T. Gibbs authored
    o Avoid pre-2.5.X mid-layer deadlock due to SCSI malloc fragmentation
    
    For pre-2.5.X kernels, attempt to calculate a safe value
    for our S/G list length.  In these kernels, the mid-layer
    uses SCSI malloc to allocate an S/G array dynamically when a
    command is issued.  This list, which is in an OS-dependent
    format that must later be copied to our private S/G list, is
    sized to hold just the number of segments needed for the
    current transfer.  The code that sizes the SCSI malloc pool
    does not take fragmentation of the pool into consideration.
    As a result, executing just a fraction of our concurrent
    transaction limit with list lengths approaching AH?_NSEG
    will quickly deplete the usable space in the SCSI malloc pool.
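    One way that usable space is lost can be shown with a simplified C model.
    This is an illustrative sketch, not the kernel's scsi_malloc code.  It
    assumes that the pool hands out memory in 512-byte units on 4096-byte
    pages and never lets one allocation span two pages.  The `toy_*` names
    are hypothetical.  The model compares how many equal-sized S/G arrays
    actually fit with a naive estimate based on total bytes.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Toy pool model (an assumption for illustration only): memory is handed
 * out in 512-byte units, and a single allocation may not cross a page.
 */
#define TOY_PAGE_SIZE      4096u
#define TOY_UNIT_SIZE       512u
#define TOY_UNITS_PER_PAGE (TOY_PAGE_SIZE / TOY_UNIT_SIZE)

/* Round a request up to whole 512-byte units. */
static unsigned toy_units_for(size_t bytes)
{
	return (unsigned)((bytes + TOY_UNIT_SIZE - 1) / TOY_UNIT_SIZE);
}

/*
 * Count how many identical requests of `bytes` fit in `npages` pages.
 * Because an allocation cannot span pages, each page holds
 * TOY_UNITS_PER_PAGE / units (integer division) of them. Any leftover
 * units on the page are unusable for this request size.
 */
static unsigned toy_allocs_that_fit(unsigned npages, size_t bytes)
{
	unsigned units = toy_units_for(bytes);

	assert(units > 0 && units <= TOY_UNITS_PER_PAGE);
	return npages * (TOY_UNITS_PER_PAGE / units);
}

/* Naive estimate that ignores page-boundary waste entirely. */
static unsigned toy_naive_estimate(unsigned npages, size_t bytes)
{
	return (unsigned)((npages * TOY_PAGE_SIZE) / bytes);
}
```

    Under these assumptions, eight pages hold only eight 2560-byte arrays,
    because three of the eight units on every page go unused.  Dividing the
    total bytes by the request size would predict twelve.  A 2048-byte
    request packs each page exactly, so both figures come out to sixteen.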
    
    Unfortunately, the mid-layer does not properly handle this
    SCSI malloc failure.  In kernels prior to 2.4.20, if the
    device that experienced the malloc failure is idle and
    never has any new I/O initiated (its block queue is not "kicked"),
    the process will hang indefinitely.  In 2.4.20 and later,
    the disk experiencing the failure is marked as a "starved
    device", but this only helps if I/O is initiated to or completes
    on that HBA.  If the failure was induced by another HBA, no
    other I/O is pending on that HBA, and no new transactions are
    queued, we are still susceptible to the hang.  (Also note that
    many 2.4.X kernels do not properly lock the "some_device_starved"
    and "device_starved" fields, which calls their overall
    effectiveness into question.)
    
    By sizing our S/G list to avoid SCSI malloc pool fragmentation,
    we will hopefully avoid this deadlock at least for configurations
    where our own HBAs are the only ones using the SCSI subsystem.
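    The sizing idea can be sketched in C under stated assumptions.  This is
    not the driver's implementation.  The sketch assumes a pool that allocates
    in 512-byte units and never lets an allocation span a page.  Under that
    assumption, a per-command array size that is a power of two between
    512 bytes and the page size packs pages with no leftover units.
    `pick_nseg`, `entry_size`, and `min_nseg` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper, not taken from the aic7xxx/aic79xx sources.
 * Assumes the per-command S/G array comes from a pool that hands out
 * 512-byte units and never lets one allocation span a page. Under that
 * assumption, power-of-two sizes from 512 bytes up to the page size
 * divide a page evenly, so the S/G limit is derived from such a size.
 *
 * Returns the number of S/G entries to advertise, or 0 if no candidate
 * size can hold at least `min_nseg` entries of `entry_size` bytes.
 */
static unsigned pick_nseg(size_t page_size, size_t entry_size,
			  unsigned min_nseg)
{
	size_t size;

	assert(entry_size > 0);
	/* Try the smallest sizes first: smaller arrays leave more pool room. */
	for (size = 512; size <= page_size; size *= 2) {
		unsigned nseg = (unsigned)(size / entry_size);

		if (nseg >= min_nseg)
			return nseg;
	}
	return 0;
}
```

    For example, take a 4096-byte page, 16-byte entries (a hypothetical entry
    size), and a floor of 128 segments.  The sketch settles on a 2048-byte
    array and returns 128.  Searching from the smallest size trades maximum
    transfer length for headroom in the pool.  A driver could instead prefer
    the largest size that still divides the page evenly.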