    powerpc: Check address limit on user-mode return (TIF_FSCHECK) · 3e378680
    Michael Ellerman authored
    set_fs() sets the addr_limit, which is used in access_ok() to
    determine if an address is a user or kernel address.
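    As a rough illustration of that check, here is a user-space model of an
    addr_limit-based access_ok(). The struct, the MODEL_* limit values and
    model_access_ok() are hypothetical stand-ins, not the real powerpc
    definitions. The comparison is written so that addr + size can never wrap.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical limits for illustration only; real values are arch-specific. */
#define MODEL_USER_DS   UINT64_C(0x0000ffffffffffff)
#define MODEL_KERNEL_DS UINT64_MAX

/* Per-thread state, loosely modelled on the kernel's addr_limit. */
struct model_thread {
    uint64_t addr_limit;   /* highest address model_access_ok() accepts */
};

/* True if [addr, addr + size) lies entirely at or below addr_limit.
 * Compares size - 1 against the remaining room instead of computing
 * addr + size, so a huge size cannot overflow past the check. */
static bool model_access_ok(const struct model_thread *t,
                            uint64_t addr, uint64_t size)
{
    if (addr > t->addr_limit)
        return false;
    return size == 0 || size - 1 <= t->addr_limit - addr;
}
```

    With the limit at MODEL_USER_DS a kernel-looking address such as
    0xc000000000000000 is rejected; raise the limit to MODEL_KERNEL_DS and
    the same call succeeds, which is exactly what set_fs() changes.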
    
    Some code paths use set_fs() to temporarily elevate the addr_limit so
    that kernel code can read/write kernel memory as if it were user
    memory. That is fine as long as the code can't ever return to
    userspace with the addr_limit still elevated.
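    The save/elevate/restore pattern described above can be sketched in user
    space as follows. model_get_fs(), model_set_fs(), model_kernel_access()
    and the MODEL_* values are hypothetical; the point is only that the old
    limit must be put back before control can reach userspace again.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical limits for illustration only. */
#define MODEL_USER_DS   UINT64_C(0x0000ffffffffffff)
#define MODEL_KERNEL_DS UINT64_MAX

/* Stands in for the current thread's addr_limit. */
static uint64_t model_addr_limit = MODEL_USER_DS;

static uint64_t model_get_fs(void) { return model_addr_limit; }
static void model_set_fs(uint64_t limit) { model_addr_limit = limit; }

/* Range check against the current limit, written to avoid overflow. */
static bool model_access_ok(uint64_t addr, size_t size)
{
    if (addr > model_addr_limit)
        return false;
    return size == 0 || (uint64_t)size - 1 <= model_addr_limit - addr;
}

/* Hypothetical kernel helper: temporarily elevate the limit so a kernel
 * address passes the user-access check, then restore it. */
static int model_kernel_access(uint64_t kaddr, size_t size)
{
    uint64_t old_fs = model_get_fs();   /* save the caller's limit */
    model_set_fs(MODEL_KERNEL_DS);      /* elevate: kernel addresses pass */
    int ret = model_access_ok(kaddr, size) ? 0 : -1; /* kernel: -EFAULT */
    model_set_fs(old_fs);               /* restore before returning */
    return ret;
}
```

    If the final model_set_fs(old_fs) were skipped on some error path, the
    elevated limit would leak out to the next user access; that is the bug
    class this commit guards against.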
    
    If that did happen, userspace could read/write kernel memory as if it
    were user memory, e.g. simply by calling write(2). In case it's not
    clear, that is very bad. It has also happened in the past due to bugs.
    
    Commit 5ea0727b ("x86/syscalls: Check address limit on user-mode
    return") added a mechanism to check the addr_limit value before
    returning to userspace. Any call to set_fs() sets a thread flag,
    TIF_FSCHECK, and if we see that on the return to userspace we go out
    of line to check that the addr_limit value is not elevated.
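    A user-space model of that mechanism, in the spirit of the generic
    addr_limit_user_check() helper: set_fs() marks the thread, and the
    return-to-user path only inspects addr_limit when the mark is present.
    The flag value, struct layout and function names are hypothetical, and
    the "killed" field merely records where the kernel would WARN and send
    SIGKILL.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical values for illustration; not the real powerpc definitions. */
#define MODEL_USER_DS     UINT64_C(0x0000ffffffffffff)
#define MODEL_KERNEL_DS   UINT64_MAX
#define MODEL_TIF_FSCHECK (1u << 0)

struct model_thread {
    uint64_t addr_limit;
    unsigned int flags;   /* stands in for thread_info->flags */
    bool killed;          /* records that SIGKILL would have been sent */
};

/* set_fs(): every call marks the thread so the exit path re-checks. */
static void model_set_fs(struct model_thread *t, uint64_t limit)
{
    t->addr_limit = limit;
    t->flags |= MODEL_TIF_FSCHECK;
}

/* Run on the way back to user mode. */
static void model_addr_limit_user_check(struct model_thread *t)
{
    if (!(t->flags & MODEL_TIF_FSCHECK))
        return;                      /* set_fs() never called: nothing to do */
    if (t->addr_limit != MODEL_USER_DS)
        t->killed = true;            /* kernel: WARN, then force SIGKILL */
    t->flags &= ~MODEL_TIF_FSCHECK;  /* re-arm for the next set_fs() */
}
```

    A balanced elevate/restore pair passes the check silently, while a
    leaked KERNEL_DS is caught, matching the "Killed" result shown below.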
    
    For further info see the above commit, as well as:
      https://lwn.net/Articles/722267/
      https://bugs.chromium.org/p/project-zero/issues/detail?id=990
    
    Verified to work on 64-bit Book3S using a POC that objdumps the system
    call handler, and a modified lkdtm_CORRUPT_USER_DS() that doesn't kill
    the caller.
    
    Before:
      $ sudo ./test-tif-fscheck
      ...
      0000000000000000 <.data>:
             0:       e1 f7 8a 79     rldicl. r10,r12,30,63
             4:       80 03 82 40     bne     0x384
             8:       00 40 8a 71     andi.   r10,r12,16384
             c:       78 0b 2a 7c     mr      r10,r1
            10:       10 fd 21 38     addi    r1,r1,-752
            14:       08 00 c2 41     beq-    0x1c
            18:       58 09 2d e8     ld      r1,2392(r13)
            1c:       00 00 41 f9     std     r10,0(r1)
            20:       70 01 61 f9     std     r11,368(r1)
            24:       78 01 81 f9     std     r12,376(r1)
            28:       70 00 01 f8     std     r0,112(r1)
            2c:       78 00 41 f9     std     r10,120(r1)
            30:       20 00 82 41     beq     0x50
            34:       a6 42 4c 7d     mftb    r10
    
    After:
      $ sudo ./test-tif-fscheck
      Killed
    
    And in dmesg:
      Invalid address limit on user-mode return
      WARNING: CPU: 1 PID: 3689 at ../include/linux/syscalls.h:260 do_notify_resume+0x140/0x170
      ...
      NIP [c00000000001ee50] do_notify_resume+0x140/0x170
      LR [c00000000001ee4c] do_notify_resume+0x13c/0x170
      Call Trace:
        do_notify_resume+0x13c/0x170 (unreliable)
        ret_from_except_lite+0x70/0x74
    
    Performance overhead is essentially zero in the usual case, because
    the bit is checked as part of the existing _TIF_USER_WORK_MASK check.
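    Why the fast path costs nothing extra can be sketched as follows: the
    exit path already ANDs the thread flags with one mask and branches out
    of line only if any bit is set, so adding the new flag to that mask adds
    no instruction in the common case. The bit positions, mask contents and
    function names below are hypothetical, not the real powerpc values.

```c
/* Hypothetical bit assignments for illustration only. */
#define MODEL_TIF_SIGPENDING   (1u << 1)
#define MODEL_TIF_NEED_RESCHED (1u << 2)
#define MODEL_TIF_FSCHECK      (1u << 3)

/* Folding the new flag into the existing mask keeps a single test. */
#define MODEL_USER_WORK_MASK \
    (MODEL_TIF_SIGPENDING | MODEL_TIF_NEED_RESCHED | MODEL_TIF_FSCHECK)

static unsigned int slow_path_calls;

/* Stand-in for do_notify_resume(): would handle whichever bits are set. */
static void model_do_notify_resume(unsigned int flags)
{
    (void)flags;
    slow_path_calls++;
}

/* Return-to-user path: one AND plus one branch, same as before. */
static void model_return_to_user(unsigned int flags)
{
    if (flags & MODEL_USER_WORK_MASK)
        model_do_notify_resume(flags);
}
```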

    Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>