• Gerald Schaefer's avatar
    s390/mm: fix asce_bits handling with dynamic pagetable levels · 723cacbd
    Gerald Schaefer authored
    There is a race with multi-threaded applications between context switch and
    pagetable upgrade. In switch_mm() a new user_asce is built from mm->pgd and
    mm->context.asce_bits, w/o holding any locks. A concurrent mmap with a
    pagetable upgrade on another thread in crst_table_upgrade() could already
    have set new asce_bits, but not yet the new mm->pgd. This would result in a
    corrupt user_asce in switch_mm(), and eventually in a kernel panic from a
    translation exception.
    
    Fix this by storing the complete asce instead of just the asce_bits, which
    can then be read atomically from switch_mm(), so that it either sees the
    old value or the new value, but no mixture. Both cases are OK. Having the
    old value would result in a page fault on access to the higher level memory,
    but the fault handler would see the new mm->pgd, if it was a valid access
    after the mmap on the other thread has completed. So as worst-case scenario
    we would have a page fault loop for the racing thread until the next time
    slice.
    
    Also remove dead code and simplify the upgrade/downgrade path, there are no
    upgrades from 2 levels, and only downgrades from 3 levels for compat tasks.
    There are also no concurrent upgrades, because the mmap_sem is held with
    down_write() in do_mmap, so the flush and table checks during upgrade can
    be removed.
    Reported-by: default avatarMichael Munday <munday@ca.ibm.com>
    Reviewed-by: default avatarMartin Schwidefsky <schwidefsky@de.ibm.com>
    Signed-off-by: default avatarGerald Schaefer <gerald.schaefer@de.ibm.com>
    Signed-off-by: default avatarMartin Schwidefsky <schwidefsky@de.ibm.com>
    723cacbd
pgalloc.h 3.76 KB