• Linus Torvalds's avatar
    module: warn about excessively long module waits · cb5b81bc
    Linus Torvalds authored
    Russell King reported that the arm cbc(aes) crypto module hangs when
    loaded, and Herbert Xu bisected it to commit 9b9879fc ("modules:
    catch concurrent module loads, treat them as idempotent"), and noted:
    
     "So what's happening here is that the first modprobe tries to load a
      fallback CBC implementation, in doing so it triggers a load of the
      exact same module due to module aliases.
    
      IOW we're loading aes-arm-bs which provides cbc(aes). However, this
      needs a fallback of cbc(aes) to operate, which is made out of the
      generic cbc module + any implementation of aes, or ecb(aes). The
      latter happens to also be provided by aes-arm-cb so that's why it
      tries to load the same module again"
    
    So loading the aes-arm-bs module ends up wanting to recursively load
    itself, and the recursive load then ends up waiting for the original
    module load to complete.
    
    This is a regression, in that it used to be that we just tried to load
    the module multiple times, and then as we went on to install it the
    second time we would instead just error out because the module name
    already existed.
    
    That is actually also exactly what the original "catch concurrent loads"
    patch did in commit 9828ed3f ("module: error out early on concurrent
    load of the same module file"), but it turns out that it ends up being
    racy, in that erroring out before the module has been fully initialized
    will cause failures in dependent module loading.
    
    See commit ac2263b5 (which was the revert of that "error out early")
    commit for details about why erroring out before the module has been
    initialized is actually fundamentally racy.
    
    Now, for the actual recursive module load (as opposed to just
    concurrently loading the same module twice), the race is not an issue.
    
    At the same time it's hard for the kernel to see that this is recursion,
    because the module load is always done from a usermode helper, so the
    recursion is not some simple callchain within the kernel.
    
    End result: this is not the real fix, but this at least adds a warning
    for the situation (admittedly much too late for all the debugging pain
    that Russell and Herbert went through) and if we can come to a
    resolution on how to detect the recursion properly, this re-organizes
    the code to make that easier.
    
    Link: https://lore.kernel.org/all/ZrFHLqvFqhzykuYw@shell.armlinux.org.uk/Reported-by: default avatarRussell King <linux@armlinux.org.uk>
    Debugged-by: default avatarHerbert Xu <herbert@gondor.apana.org.au>
    Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
    cb5b81bc
main.c 86.4 KB