1. 20 Apr, 2019 40 commits
    • Balbir Singh's avatar
      bpf: Fix selftests are changes for CVE 2019-7308 · 03f11a51
      Balbir Singh authored
      The changes to fix the CVE 2019-7308 make the bpf verifier stricter
      with respect to operations that were allowed earlier in unprivileged
      mode. Fixup the test cases so that the error messages now correctly
      reflect pointer arithmetic going out of range for tests.
      Signed-off-by: default avatarBalbir Singh <sblbir@amzn.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      03f11a51
    • Daniel Borkmann's avatar
      bpf: fix sanitation rewrite in case of non-pointers · a042c21a
      Daniel Borkmann authored
      commit 3612af78 upstream.
      
      Marek reported that he saw an issue with the below snippet in that
      timing measurements where off when loaded as unpriv while results
      were reasonable when loaded as privileged:
      
          [...]
          uint64_t a = bpf_ktime_get_ns();
          uint64_t b = bpf_ktime_get_ns();
          uint64_t delta = b - a;
          if ((int64_t)delta > 0) {
          [...]
      
      Turns out there is a bug where a corner case is missing in the fix
      d3bd7413 ("bpf: fix sanitation of alu op with pointer / scalar
      type from different paths"), namely fixup_bpf_calls() only checks
      whether aux has a non-zero alu_state, but it also needs to test for
      the case of BPF_ALU_NON_POINTER since in both occasions we need to
      skip the masking rewrite (as there is nothing to mask).
      
      Fixes: d3bd7413 ("bpf: fix sanitation of alu op with pointer / scalar type from different paths")
      Reported-by: default avatarMarek Majkowski <marek@cloudflare.com>
      Reported-by: default avatarArthur Fabre <afabre@cloudflare.com>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Link: https://lore.kernel.org/netdev/CAJPywTJqP34cK20iLM5YmUMz9KXQOdu1-+BZrGMAGgLuBWz7fg@mail.gmail.com/T/Acked-by: default avatarSong Liu <songliubraving@fb.com>
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: default avatarBalbir Singh <sblbir@amzn.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      a042c21a
    • Xu Yu's avatar
      bpf: do not restore dst_reg when cur_state is freed · 12462c88
      Xu Yu authored
      commit 0803278b upstream.
      
      Syzkaller hit 'KASAN: use-after-free Write in sanitize_ptr_alu' bug.
      
      Call trace:
      
        dump_stack+0xbf/0x12e
        print_address_description+0x6a/0x280
        kasan_report+0x237/0x360
        sanitize_ptr_alu+0x85a/0x8d0
        adjust_ptr_min_max_vals+0x8f2/0x1ca0
        adjust_reg_min_max_vals+0x8ed/0x22e0
        do_check+0x1ca6/0x5d00
        bpf_check+0x9ca/0x2570
        bpf_prog_load+0xc91/0x1030
        __se_sys_bpf+0x61e/0x1f00
        do_syscall_64+0xc8/0x550
        entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      Fault injection trace:
      
        kfree+0xea/0x290
        free_func_state+0x4a/0x60
        free_verifier_state+0x61/0xe0
        push_stack+0x216/0x2f0	          <- inject failslab
        sanitize_ptr_alu+0x2b1/0x8d0
        adjust_ptr_min_max_vals+0x8f2/0x1ca0
        adjust_reg_min_max_vals+0x8ed/0x22e0
        do_check+0x1ca6/0x5d00
        bpf_check+0x9ca/0x2570
        bpf_prog_load+0xc91/0x1030
        __se_sys_bpf+0x61e/0x1f00
        do_syscall_64+0xc8/0x550
        entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      When kzalloc() fails in push_stack(), free_verifier_state() will free
      current verifier state. As push_stack() returns, dst_reg was restored
      if ptr_is_dst_reg is false. However, as member of the cur_state,
      dst_reg is also freed, and error occurs when dereferencing dst_reg.
      Simply fix it by testing ret of push_stack() before restoring dst_reg.
      
      Fixes: 979d63d5 ("bpf: prevent out of bounds speculation on pointer arithmetic")
      Signed-off-by: default avatarXu Yu <xuyu@linux.alibaba.com>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      12462c88
    • Daniel Borkmann's avatar
      bpf: fix inner map masking to prevent oob under speculation · 0ed998d1
      Daniel Borkmann authored
      commit 9d5564dd upstream.
      
      During review I noticed that inner meta map setup for map in
      map is buggy in that it does not propagate all needed data
      from the reference map which the verifier is later accessing.
      
      In particular one such case is index masking to prevent out of
      bounds access under speculative execution due to missing the
      map's unpriv_array/index_mask field propagation. Fix this such
      that the verifier is generating the correct code for inlined
      lookups in case of unpriviledged use.
      
      Before patch (test_verifier's 'map in map access' dump):
      
        # bpftool prog dump xla id 3
           0: (62) *(u32 *)(r10 -4) = 0
           1: (bf) r2 = r10
           2: (07) r2 += -4
           3: (18) r1 = map[id:4]
           5: (07) r1 += 272                |
           6: (61) r0 = *(u32 *)(r2 +0)     |
           7: (35) if r0 >= 0x1 goto pc+6   | Inlined map in map lookup
           8: (54) (u32) r0 &= (u32) 0      | with index masking for
           9: (67) r0 <<= 3                 | map->unpriv_array.
          10: (0f) r0 += r1                 |
          11: (79) r0 = *(u64 *)(r0 +0)     |
          12: (15) if r0 == 0x0 goto pc+1   |
          13: (05) goto pc+1                |
          14: (b7) r0 = 0                   |
          15: (15) if r0 == 0x0 goto pc+11
          16: (62) *(u32 *)(r10 -4) = 0
          17: (bf) r2 = r10
          18: (07) r2 += -4
          19: (bf) r1 = r0
          20: (07) r1 += 272                |
          21: (61) r0 = *(u32 *)(r2 +0)     | Index masking missing (!)
          22: (35) if r0 >= 0x1 goto pc+3   | for inner map despite
          23: (67) r0 <<= 3                 | map->unpriv_array set.
          24: (0f) r0 += r1                 |
          25: (05) goto pc+1                |
          26: (b7) r0 = 0                   |
          27: (b7) r0 = 0
          28: (95) exit
      
      After patch:
      
        # bpftool prog dump xla id 1
           0: (62) *(u32 *)(r10 -4) = 0
           1: (bf) r2 = r10
           2: (07) r2 += -4
           3: (18) r1 = map[id:2]
           5: (07) r1 += 272                |
           6: (61) r0 = *(u32 *)(r2 +0)     |
           7: (35) if r0 >= 0x1 goto pc+6   | Same inlined map in map lookup
           8: (54) (u32) r0 &= (u32) 0      | with index masking due to
           9: (67) r0 <<= 3                 | map->unpriv_array.
          10: (0f) r0 += r1                 |
          11: (79) r0 = *(u64 *)(r0 +0)     |
          12: (15) if r0 == 0x0 goto pc+1   |
          13: (05) goto pc+1                |
          14: (b7) r0 = 0                   |
          15: (15) if r0 == 0x0 goto pc+12
          16: (62) *(u32 *)(r10 -4) = 0
          17: (bf) r2 = r10
          18: (07) r2 += -4
          19: (bf) r1 = r0
          20: (07) r1 += 272                |
          21: (61) r0 = *(u32 *)(r2 +0)     |
          22: (35) if r0 >= 0x1 goto pc+4   | Now fixed inlined inner map
          23: (54) (u32) r0 &= (u32) 0      | lookup with proper index masking
          24: (67) r0 <<= 3                 | for map->unpriv_array.
          25: (0f) r0 += r1                 |
          26: (05) goto pc+1                |
          27: (b7) r0 = 0                   |
          28: (b7) r0 = 0
          29: (95) exit
      
      
      Fixes: b2157399 ("bpf: prevent out-of-bounds speculation")
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: default avatarMartin KaFai Lau <kafai@fb.com>
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: default avatarVallish Vaidyeshwara <vallish@amazon.com>
      Signed-off-by: default avatarBalbir Singh <sblbir@amzn.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      0ed998d1
    • Daniel Borkmann's avatar
      bpf: fix sanitation of alu op with pointer / scalar type from different paths · 6588a490
      Daniel Borkmann authored
      commit d3bd7413 upstream.
      
      While 979d63d5 ("bpf: prevent out of bounds speculation on pointer
      arithmetic") took care of rejecting alu op on pointer when e.g. pointer
      came from two different map values with different map properties such as
      value size, Jann reported that a case was not covered yet when a given
      alu op is used in both "ptr_reg += reg" and "numeric_reg += reg" from
      different branches where we would incorrectly try to sanitize based
      on the pointer's limit. Catch this corner case and reject the program
      instead.
      
      Fixes: 979d63d5 ("bpf: prevent out of bounds speculation on pointer arithmetic")
      Reported-by: default avatarJann Horn <jannh@google.com>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: default avatarVallish Vaidyeshwara <vallish@amazon.com>
      Signed-off-by: default avatarBalbir Singh <sblbir@amzn.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      6588a490
    • Daniel Borkmann's avatar
      bpf: prevent out of bounds speculation on pointer arithmetic · ae03b6b1
      Daniel Borkmann authored
      commit 979d63d5 upstream.
      
      Jann reported that the original commit back in b2157399
      ("bpf: prevent out-of-bounds speculation") was not sufficient
      to stop CPU from speculating out of bounds memory access:
      While b2157399 only focussed on masking array map access
      for unprivileged users for tail calls and data access such
      that the user provided index gets sanitized from BPF program
      and syscall side, there is still a more generic form affected
      from BPF programs that applies to most maps that hold user
      data in relation to dynamic map access when dealing with
      unknown scalars or "slow" known scalars as access offset, for
      example:
      
        - Load a map value pointer into R6
        - Load an index into R7
        - Do a slow computation (e.g. with a memory dependency) that
          loads a limit into R8 (e.g. load the limit from a map for
          high latency, then mask it to make the verifier happy)
        - Exit if R7 >= R8 (mispredicted branch)
        - Load R0 = R6[R7]
        - Load R0 = R6[R0]
      
      For unknown scalars there are two options in the BPF verifier
      where we could derive knowledge from in order to guarantee
      safe access to the memory: i) While </>/<=/>= variants won't
      allow to derive any lower or upper bounds from the unknown
      scalar where it would be safe to add it to the map value
      pointer, it is possible through ==/!= test however. ii) another
      option is to transform the unknown scalar into a known scalar,
      for example, through ALU ops combination such as R &= <imm>
      followed by R |= <imm> or any similar combination where the
      original information from the unknown scalar would be destroyed
      entirely leaving R with a constant. The initial slow load still
      precedes the latter ALU ops on that register, so the CPU
      executes speculatively from that point. Once we have the known
      scalar, any compare operation would work then. A third option
      only involving registers with known scalars could be crafted
      as described in [0] where a CPU port (e.g. Slow Int unit)
      would be filled with many dependent computations such that
      the subsequent condition depending on its outcome has to wait
      for evaluation on its execution port and thereby executing
      speculatively if the speculated code can be scheduled on a
      different execution port, or any other form of mistraining
      as described in [1], for example. Given this is not limited
      to only unknown scalars, not only map but also stack access
      is affected since both is accessible for unprivileged users
      and could potentially be used for out of bounds access under
      speculation.
      
      In order to prevent any of these cases, the verifier is now
      sanitizing pointer arithmetic on the offset such that any
      out of bounds speculation would be masked in a way where the
      pointer arithmetic result in the destination register will
      stay unchanged, meaning offset masked into zero similar as
      in array_index_nospec() case. With regards to implementation,
      there are three options that were considered: i) new insn
      for sanitation, ii) push/pop insn and sanitation as inlined
      BPF, iii) reuse of ax register and sanitation as inlined BPF.
      
      Option i) has the downside that we end up using from reserved
      bits in the opcode space, but also that we would require
      each JIT to emit masking as native arch opcodes meaning
      mitigation would have slow adoption till everyone implements
      it eventually which is counter-productive. Option ii) and iii)
      have both in common that a temporary register is needed in
      order to implement the sanitation as inlined BPF since we
      are not allowed to modify the source register. While a push /
      pop insn in ii) would be useful to have in any case, it
      requires once again that every JIT needs to implement it
      first. While possible, amount of changes needed would also
      be unsuitable for a -stable patch. Therefore, the path which
      has fewer changes, less BPF instructions for the mitigation
      and does not require anything to be changed in the JITs is
      option iii) which this work is pursuing. The ax register is
      already mapped to a register in all JITs (modulo arm32 where
      it's mapped to stack as various other BPF registers there)
      and used in constant blinding for JITs-only so far. It can
      be reused for verifier rewrites under certain constraints.
      The interpreter's tmp "register" has therefore been remapped
      into extending the register set with hidden ax register and
      reusing that for a number of instructions that needed the
      prior temporary variable internally (e.g. div, mod). This
      allows for zero increase in stack space usage in the interpreter,
      and enables (restricted) generic use in rewrites otherwise as
      long as such a patchlet does not make use of these instructions.
      The sanitation mask is dynamic and relative to the offset the
      map value or stack pointer currently holds.
      
      There are various cases that need to be taken under consideration
      for the masking, e.g. such operation could look as follows:
      ptr += val or val += ptr or ptr -= val. Thus, the value to be
      sanitized could reside either in source or in destination
      register, and the limit is different depending on whether
      the ALU op is addition or subtraction and depending on the
      current known and bounded offset. The limit is derived as
      follows: limit := max_value_size - (smin_value + off). For
      subtraction: limit := umax_value + off. This holds because
      we do not allow any pointer arithmetic that would
      temporarily go out of bounds or would have an unknown
      value with mixed signed bounds where it is unclear at
      verification time whether the actual runtime value would
      be either negative or positive. For example, we have a
      derived map pointer value with constant offset and bounded
      one, so limit based on smin_value works because the verifier
      requires that statically analyzed arithmetic on the pointer
      must be in bounds, and thus it checks if resulting
      smin_value + off and umax_value + off is still within map
      value bounds at time of arithmetic in addition to time of
      access. Similarly, for the case of stack access we derive
      the limit as follows: MAX_BPF_STACK + off for subtraction
      and -off for the case of addition where off := ptr_reg->off +
      ptr_reg->var_off.value. Subtraction is a special case for
      the masking which can be in form of ptr += -val, ptr -= -val,
      or ptr -= val. In the first two cases where we know that
      the value is negative, we need to temporarily negate the
      value in order to do the sanitation on a positive value
      where we later swap the ALU op, and restore original source
      register if the value was in source.
      
      The sanitation of pointer arithmetic alone is still not fully
      sufficient as is, since a scenario like the following could
      happen ...
      
        PTR += 0x1000 (e.g. K-based imm)
        PTR -= BIG_NUMBER_WITH_SLOW_COMPARISON
        PTR += 0x1000
        PTR -= BIG_NUMBER_WITH_SLOW_COMPARISON
        [...]
      
      ... which under speculation could end up as ...
      
        PTR += 0x1000
        PTR -= 0 [ truncated by mitigation ]
        PTR += 0x1000
        PTR -= 0 [ truncated by mitigation ]
        [...]
      
      ... and therefore still access out of bounds. To prevent such
      case, the verifier is also analyzing safety for potential out
      of bounds access under speculative execution. Meaning, it is
      also simulating pointer access under truncation. We therefore
      "branch off" and push the current verification state after the
      ALU operation with known 0 to the verification stack for later
      analysis. Given the current path analysis succeeded it is
      likely that the one under speculation can be pruned. In any
      case, it is also subject to existing complexity limits and
      therefore anything beyond this point will be rejected. In
      terms of pruning, it needs to be ensured that the verification
      state from speculative execution simulation must never prune
      a non-speculative execution path, therefore, we mark verifier
      state accordingly at the time of push_stack(). If verifier
      detects out of bounds access under speculative execution from
      one of the possible paths that includes a truncation, it will
      reject such program.
      
      Given we mask every reg-based pointer arithmetic for
      unprivileged programs, we've been looking into how it could
      affect real-world programs in terms of size increase. As the
      majority of programs are targeted for privileged-only use
      case, we've unconditionally enabled masking (with its alu
      restrictions on top of it) for privileged programs for the
      sake of testing in order to check i) whether they get rejected
      in its current form, and ii) by how much the number of
      instructions and size will increase. We've tested this by
      using Katran, Cilium and test_l4lb from the kernel selftests.
      For Katran we've evaluated balancer_kern.o, Cilium bpf_lxc.o
      and an older test object bpf_lxc_opt_-DUNKNOWN.o and l4lb
      we've used test_l4lb.o as well as test_l4lb_noinline.o. We
      found that none of the programs got rejected by the verifier
      with this change, and that impact is rather minimal to none.
      balancer_kern.o had 13,904 bytes (1,738 insns) xlated and
      7,797 bytes JITed before and after the change. Most complex
      program in bpf_lxc.o had 30,544 bytes (3,817 insns) xlated
      and 18,538 bytes JITed before and after and none of the other
      tail call programs in bpf_lxc.o had any changes either. For
      the older bpf_lxc_opt_-DUNKNOWN.o object we found a small
      increase from 20,616 bytes (2,576 insns) and 12,536 bytes JITed
      before to 20,664 bytes (2,582 insns) and 12,558 bytes JITed
      after the change. Other programs from that object file had
      similar small increase. Both test_l4lb.o had no change and
      remained at 6,544 bytes (817 insns) xlated and 3,401 bytes
      JITed and for test_l4lb_noinline.o constant at 5,080 bytes
      (634 insns) xlated and 3,313 bytes JITed. This can be explained
      in that LLVM typically optimizes stack based pointer arithmetic
      by using K-based operations and that use of dynamic map access
      is not overly frequent. However, in future we may decide to
      optimize the algorithm further under known guarantees from
      branch and value speculation. Latter seems also unclear in
      terms of prediction heuristics that today's CPUs apply as well
      as whether there could be collisions in e.g. the predictor's
      Value History/Pattern Table for triggering out of bounds access,
      thus masking is performed unconditionally at this point but could
      be subject to relaxation later on. We were generally also
      brainstorming various other approaches for mitigation, but the
      blocker was always lack of available registers at runtime and/or
      overhead for runtime tracking of limits belonging to a specific
      pointer. Thus, we found this to be minimally intrusive under
      given constraints.
      
      With that in place, a simple example with sanitized access on
      unprivileged load at post-verification time looks as follows:
      
        # bpftool prog dump xlated id 282
        [...]
        28: (79) r1 = *(u64 *)(r7 +0)
        29: (79) r2 = *(u64 *)(r7 +8)
        30: (57) r1 &= 15
        31: (79) r3 = *(u64 *)(r0 +4608)
        32: (57) r3 &= 1
        33: (47) r3 |= 1
        34: (2d) if r2 > r3 goto pc+19
        35: (b4) (u32) r11 = (u32) 20479  |
        36: (1f) r11 -= r2                | Dynamic sanitation for pointer
        37: (4f) r11 |= r2                | arithmetic with registers
        38: (87) r11 = -r11               | containing bounded or known
        39: (c7) r11 s>>= 63              | scalars in order to prevent
        40: (5f) r11 &= r2                | out of bounds speculation.
        41: (0f) r4 += r11                |
        42: (71) r4 = *(u8 *)(r4 +0)
        43: (6f) r4 <<= r1
        [...]
      
      For the case where the scalar sits in the destination register
      as opposed to the source register, the following code is emitted
      for the above example:
      
        [...]
        16: (b4) (u32) r11 = (u32) 20479
        17: (1f) r11 -= r2
        18: (4f) r11 |= r2
        19: (87) r11 = -r11
        20: (c7) r11 s>>= 63
        21: (5f) r2 &= r11
        22: (0f) r2 += r0
        23: (61) r0 = *(u32 *)(r2 +0)
        [...]
      
      JIT blinding example with non-conflicting use of r10:
      
        [...]
         d5:	je     0x0000000000000106    _
         d7:	mov    0x0(%rax),%edi       |
         da:	mov    $0xf153246,%r10d     | Index load from map value and
         e0:	xor    $0xf153259,%r10      | (const blinded) mask with 0x1f.
         e7:	and    %r10,%rdi            |_
         ea:	mov    $0x2f,%r10d          |
         f0:	sub    %rdi,%r10            | Sanitized addition. Both use r10
         f3:	or     %rdi,%r10            | but do not interfere with each
         f6:	neg    %r10                 | other. (Neither do these instructions
         f9:	sar    $0x3f,%r10           | interfere with the use of ax as temp
         fd:	and    %r10,%rdi            | in interpreter.)
        100:	add    %rax,%rdi            |_
        103:	mov    0x0(%rdi),%eax
       [...]
      
      Tested that it fixes Jann's reproducer, and also checked that test_verifier
      and test_progs suite with interpreter, JIT and JIT with hardening enabled
      on x86-64 and arm64 runs successfully.
      
        [0] Speculose: Analyzing the Security Implications of Speculative
            Execution in CPUs, Giorgi Maisuradze and Christian Rossow,
            https://arxiv.org/pdf/1801.04084.pdf
      
        [1] A Systematic Evaluation of Transient Execution Attacks and
            Defenses, Claudio Canella, Jo Van Bulck, Michael Schwarz,
            Moritz Lipp, Benjamin von Berg, Philipp Ortner, Frank Piessens,
            Dmitry Evtyushkin, Daniel Gruss,
            https://arxiv.org/pdf/1811.05441.pdf
      
      Fixes: b2157399 ("bpf: prevent out-of-bounds speculation")
      Reported-by: default avatarJann Horn <jannh@google.com>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: default avatarVallish Vaidyeshwara <vallish@amazon.com>
      [some checkpatch cleanups and backported to 4.14 by sblbir]
      Signed-off-by: default avatarBalbir Singh <sblbir@amzn.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      ae03b6b1
    • Daniel Borkmann's avatar
      bpf: fix check_map_access smin_value test when pointer contains offset · 4b756d9f
      Daniel Borkmann authored
      commit b7137c4e upstream.
      
      In check_map_access() we probe actual bounds through __check_map_access()
      with offset of reg->smin_value + off for lower bound and offset of
      reg->umax_value + off for the upper bound. However, even though the
      reg->smin_value could have a negative value, the final result of the
      sum with off could be positive when pointer arithmetic with known and
      unknown scalars is combined. In this case we reject the program with
      an error such as "R<x> min value is negative, either use unsigned index
      or do a if (index >=0) check." even though the access itself would be
      fine. Therefore extend the check to probe whether the actual resulting
      reg->smin_value + off is less than zero.
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      [backported to 4.14 sblbir]
      Signed-off-by: default avatarBalbir Singh <sblbir@amzn.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      4b756d9f
    • Daniel Borkmann's avatar
      bpf: restrict unknown scalars of mixed signed bounds for unprivileged · 17efa653
      Daniel Borkmann authored
      commit 9d7eceed upstream.
      
      For unknown scalars of mixed signed bounds, meaning their smin_value is
      negative and their smax_value is positive, we need to reject arithmetic
      with pointer to map value. For unprivileged the goal is to mask every
      map pointer arithmetic and this cannot reliably be done when it is
      unknown at verification time whether the scalar value is negative or
      positive. Given this is a corner case, the likelihood of breaking should
      be very small.
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      [backported to 4.14 sblbir]
      Signed-off-by: default avatarBalbir Singh <sblbir@amzn.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      17efa653
    • Daniel Borkmann's avatar
      bpf: restrict stack pointer arithmetic for unprivileged · ba9d2e0c
      Daniel Borkmann authored
      commit e4298d25 upstream.
      
      Restrict stack pointer arithmetic for unprivileged users in that
      arithmetic itself must not go out of bounds as opposed to the actual
      access later on. Therefore after each adjust_ptr_min_max_vals() with
      a stack pointer as a destination we simulate a check_stack_access()
      of 1 byte on the destination and once that fails the program is
      rejected for unprivileged program loads. This is analog to map
      value pointer arithmetic and needed for masking later on.
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      [backported to 4.14 sblbir]
      Signed-off-by: default avatarBalbir Singh <sblbir@amzn.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      ba9d2e0c
    • Daniel Borkmann's avatar
      bpf: restrict map value pointer arithmetic for unprivileged · afb711a6
      Daniel Borkmann authored
      commit 0d6303db upstream.
      
      Restrict map value pointer arithmetic for unprivileged users in that
      arithmetic itself must not go out of bounds as opposed to the actual
      access later on. Therefore after each adjust_ptr_min_max_vals() with a
      map value pointer as a destination it will simulate a check_map_access()
      of 1 byte on the destination and once that fails the program is rejected
      for unprivileged program loads. We use this later on for masking any
      pointer arithmetic with the remainder of the map value space. The
      likelihood of breaking any existing real-world unprivileged eBPF
      program is very small for this corner case.
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      afb711a6
    • Daniel Borkmann's avatar
      bpf: enable access to ax register also from verifier rewrite · b101cf55
      Daniel Borkmann authored
      commit 9b73bfdd upstream.
      
      Right now we are using BPF ax register in JIT for constant blinding as
      well as in interpreter as temporary variable. Verifier will not be able
      to use it simply because its use will get overridden from the former in
      bpf_jit_blind_insn(). However, it can be made to work in that blinding
      will be skipped if there is prior use in either source or destination
      register on the instruction. Taking constraints of ax into account, the
      verifier is then open to use it in rewrites under some constraints. Note,
      ax register already has mappings in every eBPF JIT.
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      [backported to 4.14 sblbir]
      Signed-off-by: default avatarBalbir Singh <sblbir@amzn.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      b101cf55
    • Daniel Borkmann's avatar
      bpf: move tmp variable into ax register in interpreter · ee282977
      Daniel Borkmann authored
      commit 144cd91c upstream.
      
      This change moves the on-stack 64 bit tmp variable in ___bpf_prog_run()
      into the hidden ax register. The latter is currently only used in JITs
      for constant blinding as a temporary scratch register, meaning the BPF
      interpreter will never see the use of ax. Therefore it is safe to use
      it for the cases where tmp has been used earlier. This is needed to later
      on allow restricted hidden use of ax in both interpreter and JITs.
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      [backported to 4.14 sblbir]
      Signed-off-by: default avatarBalbir Singh <sblbir@amzn.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      ee282977
    • Daniel Borkmann's avatar
      bpf: move {prev_,}insn_idx into verifier env · 6a42c494
      Daniel Borkmann authored
      commit c08435ec upstream.
      
      Move prev_insn_idx and insn_idx from the do_check() function into
      the verifier environment, so they can be read inside the various
      helper functions for handling the instructions. It's easier to put
      this into the environment rather than changing all call-sites only
      to pass it along. insn_idx is useful in particular since this later
      on allows to hold state in env->insn_aux_data[env->insn_idx].
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: default avatarVallish Vaidyeshwara <vallish@amazon.com>
      [Backported to 4.14 by sblbir]
      Signed-off-by: default avatarBalbir Singh <sblbir@amzn.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      6a42c494
    • Alexei Starovoitov's avatar
      bpf: fix stack state printing in verifier log · 85614894
      Alexei Starovoitov authored
      commit 12a3cc84 upstream.
      
      fix incorrect stack state prints in print_verifier_state()
      
      Fixes: 638f5b90 ("bpf: reduce verifier memory consumption")
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Acked-by: default avatarJohn Fastabend <john.fastabend@gmail.com>
      Acked-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: default avatarBalbir Singh <sblbir@amzn.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      85614894
    • Craig Gallek's avatar
      bpf: fix verifier NULL pointer dereference · 86e5dd8c
      Craig Gallek authored
      commit 8c01c4f8 upstream.
      
      do_check() can fail early without allocating env->cur_state under
      memory pressure.  Syzkaller found the stack below on the linux-next
      tree because of this.
      
        kasan: CONFIG_KASAN_INLINE enabled
        kasan: GPF could be caused by NULL-ptr deref or user memory access
        general protection fault: 0000 [#1] SMP KASAN
        Dumping ftrace buffer:
           (ftrace buffer empty)
        Modules linked in:
        CPU: 1 PID: 27062 Comm: syz-executor5 Not tainted 4.14.0-rc7+ #106
        Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
        task: ffff8801c2c74700 task.stack: ffff8801c3e28000
        RIP: 0010:free_verifier_state kernel/bpf/verifier.c:347 [inline]
        RIP: 0010:bpf_check+0xcf4/0x19c0 kernel/bpf/verifier.c:4533
        RSP: 0018:ffff8801c3e2f5c8 EFLAGS: 00010202
        RAX: dffffc0000000000 RBX: 00000000fffffff4 RCX: 0000000000000000
        RDX: 0000000000000070 RSI: ffffffff817d5aa9 RDI: 0000000000000380
        RBP: ffff8801c3e2f668 R08: 0000000000000000 R09: 1ffff100387c5d9f
        R10: 00000000218c4e80 R11: ffffffff85b34380 R12: ffff8801c4dc6a28
        R13: 0000000000000000 R14: ffff8801c4dc6a00 R15: ffff8801c4dc6a20
        FS:  00007f311079b700(0000) GS:ffff8801db300000(0000) knlGS:0000000000000000
        CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        CR2: 00000000004d4a24 CR3: 00000001cbcd0000 CR4: 00000000001406e0
        DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
        DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
        Call Trace:
         bpf_prog_load+0xcbb/0x18e0 kernel/bpf/syscall.c:1166
         SYSC_bpf kernel/bpf/syscall.c:1690 [inline]
         SyS_bpf+0xae9/0x4620 kernel/bpf/syscall.c:1652
         entry_SYSCALL_64_fastpath+0x1f/0xbe
        RIP: 0033:0x452869
        RSP: 002b:00007f311079abe8 EFLAGS: 00000212 ORIG_RAX: 0000000000000141
        RAX: ffffffffffffffda RBX: 0000000000758020 RCX: 0000000000452869
        RDX: 0000000000000030 RSI: 0000000020168000 RDI: 0000000000000005
        RBP: 00007f311079aa20 R08: 0000000000000000 R09: 0000000000000000
        R10: 0000000000000000 R11: 0000000000000212 R12: 00000000004b7550
        R13: 00007f311079ab58 R14: 00000000004b7560 R15: 0000000000000000
        Code: df 48 c1 ea 03 80 3c 02 00 0f 85 e6 0b 00 00 4d 8b 6e 20 48 b8 00 00 00 00 00 fc ff df 49 8d bd 80 03 00 00 48 89 fa 48 c1 ea 03 <80> 3c 02 00 0f 85 b6 0b 00 00 49 8b bd 80 03 00 00 e8 d6 0c 26
        RIP: free_verifier_state kernel/bpf/verifier.c:347 [inline] RSP: ffff8801c3e2f5c8
        RIP: bpf_check+0xcf4/0x19c0 kernel/bpf/verifier.c:4533 RSP: ffff8801c3e2f5c8
        ---[ end trace c8d37f339dc64004 ]---
      
      Fixes: 638f5b90 ("bpf: reduce verifier memory consumption")
      Fixes: 1969db47 ("bpf: fix verifier memory leaks")
      Signed-off-by: default avatarCraig Gallek <kraig@google.com>
      Acked-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Acked-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarBalbir Singh <sblbir@amzn.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      86e5dd8c
    • Alexei Starovoitov's avatar
      bpf: fix verifier memory leaks · 534087e6
      Alexei Starovoitov authored
      commit 1969db47 upstream.
      
      fix verifier memory leaks
      
      Fixes: 638f5b90 ("bpf: reduce verifier memory consumption")
      Signed-off-by: default avatarAlexei Starovoitov <ast@fb.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarBalbir Singh <sblbir@amzn.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      534087e6
    • Alexei Starovoitov's avatar
      bpf: reduce verifier memory consumption · 28356c21
      Alexei Starovoitov authored
      commit 638f5b90 upstream.
      
      the verifier got progressively smarter over time and size of its internal
      state grew as well. Time to reduce the memory consumption.
      
      Before:
      sizeof(struct bpf_verifier_state) = 6520
      After:
      sizeof(struct bpf_verifier_state) = 896
      
      It's done by observing that majority of BPF programs use little to
      no stack whereas verifier kept all of 512 stack slots ready always.
      Instead dynamically reallocate struct verifier state when stack
      access is detected.
      Runtime difference before vs after is within a noise.
      The number of processed instructions stays the same.
      
      Cc: jakub.kicinski@netronome.com
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Acked-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      [Backported to 4.14 by sblbir]
      Signed-off-by: default avatarBalbir Singh <sblbir@amzn.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      28356c21
    • Mikulas Patocka's avatar
      dm: disable CRYPTO_TFM_REQ_MAY_SLEEP to fix a GFP_KERNEL recursion deadlock · 8991f1af
      Mikulas Patocka authored
      [ Upstream commit 432061b3 ]
      
      There's a XFS on dm-crypt deadlock, recursing back to itself due to the
      crypto subsystems use of GFP_KERNEL, reported here:
      https://bugzilla.kernel.org/show_bug.cgi?id=200835
      
      * dm-crypt calls crypt_convert in xts mode
      * init_crypt from xts.c calls kmalloc(GFP_KERNEL)
      * kmalloc(GFP_KERNEL) recurses into the XFS filesystem, the filesystem
      	tries to submit some bios and wait for them, causing a deadlock
      
      Fix this by updating both the DM crypt and integrity targets to no
      longer use the CRYPTO_TFM_REQ_MAY_SLEEP flag, which will change the
      crypto allocations from GFP_KERNEL to GFP_ATOMIC, therefore they can't
      recurse into a filesystem.  A GFP_ATOMIC allocation can fail, but
      init_crypt() in xts.c handles the allocation failure gracefully - it
      will fall back to preallocated buffer if the allocation fails.
      
      The crypto API maintainer says that the crypto API only needs to
      allocate memory when dealing with unaligned buffers and therefore
      turning CRYPTO_TFM_REQ_MAY_SLEEP off is safe (see this discussion:
      https://www.redhat.com/archives/dm-devel/2018-August/msg00195.html )
      
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarMikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: default avatarMike Snitzer <snitzer@redhat.com>
      Signed-off-by: default avatarSasha Levin (Microsoft) <sashal@kernel.org>
      8991f1af
    • Daniel Borkmann's avatar
      bpf: fix use after free in bpf_evict_inode · 02c2de9b
      Daniel Borkmann authored
      [ Upstream commit 1da6c4d9 ]
      
      syzkaller was able to generate the following UAF in bpf:
      
        BUG: KASAN: use-after-free in lookup_last fs/namei.c:2269 [inline]
        BUG: KASAN: use-after-free in path_lookupat.isra.43+0x9f8/0xc00 fs/namei.c:2318
        Read of size 1 at addr ffff8801c4865c47 by task syz-executor2/9423
      
        CPU: 0 PID: 9423 Comm: syz-executor2 Not tainted 4.20.0-rc1-next-20181109+
        #110
        Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
        Google 01/01/2011
        Call Trace:
          __dump_stack lib/dump_stack.c:77 [inline]
          dump_stack+0x244/0x39d lib/dump_stack.c:113
          print_address_description.cold.7+0x9/0x1ff mm/kasan/report.c:256
          kasan_report_error mm/kasan/report.c:354 [inline]
          kasan_report.cold.8+0x242/0x309 mm/kasan/report.c:412
          __asan_report_load1_noabort+0x14/0x20 mm/kasan/report.c:430
          lookup_last fs/namei.c:2269 [inline]
          path_lookupat.isra.43+0x9f8/0xc00 fs/namei.c:2318
          filename_lookup+0x26a/0x520 fs/namei.c:2348
          user_path_at_empty+0x40/0x50 fs/namei.c:2608
          user_path include/linux/namei.h:62 [inline]
          do_mount+0x180/0x1ff0 fs/namespace.c:2980
          ksys_mount+0x12d/0x140 fs/namespace.c:3258
          __do_sys_mount fs/namespace.c:3272 [inline]
          __se_sys_mount fs/namespace.c:3269 [inline]
          __x64_sys_mount+0xbe/0x150 fs/namespace.c:3269
          do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
          entry_SYSCALL_64_after_hwframe+0x49/0xbe
        RIP: 0033:0x457569
        Code: fd b3 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7
        48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff
        ff 0f 83 cb b3 fb ff c3 66 2e 0f 1f 84 00 00 00 00
        RSP: 002b:00007fde6ed96c78 EFLAGS: 00000246 ORIG_RAX: 00000000000000a5
        RAX: ffffffffffffffda RBX: 0000000000000005 RCX: 0000000000457569
        RDX: 0000000020000040 RSI: 0000000020000000 RDI: 0000000000000000
        RBP: 000000000072bf00 R08: 0000000020000340 R09: 0000000000000000
        R10: 0000000000200000 R11: 0000000000000246 R12: 00007fde6ed976d4
        R13: 00000000004c2c24 R14: 00000000004d4990 R15: 00000000ffffffff
      
        Allocated by task 9424:
          save_stack+0x43/0xd0 mm/kasan/kasan.c:448
          set_track mm/kasan/kasan.c:460 [inline]
          kasan_kmalloc+0xc7/0xe0 mm/kasan/kasan.c:553
          __do_kmalloc mm/slab.c:3722 [inline]
          __kmalloc_track_caller+0x157/0x760 mm/slab.c:3737
          kstrdup+0x39/0x70 mm/util.c:49
          bpf_symlink+0x26/0x140 kernel/bpf/inode.c:356
          vfs_symlink+0x37a/0x5d0 fs/namei.c:4127
          do_symlinkat+0x242/0x2d0 fs/namei.c:4154
          __do_sys_symlink fs/namei.c:4173 [inline]
          __se_sys_symlink fs/namei.c:4171 [inline]
          __x64_sys_symlink+0x59/0x80 fs/namei.c:4171
          do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
          entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
        Freed by task 9425:
          save_stack+0x43/0xd0 mm/kasan/kasan.c:448
          set_track mm/kasan/kasan.c:460 [inline]
          __kasan_slab_free+0x102/0x150 mm/kasan/kasan.c:521
          kasan_slab_free+0xe/0x10 mm/kasan/kasan.c:528
          __cache_free mm/slab.c:3498 [inline]
          kfree+0xcf/0x230 mm/slab.c:3817
          bpf_evict_inode+0x11f/0x150 kernel/bpf/inode.c:565
          evict+0x4b9/0x980 fs/inode.c:558
          iput_final fs/inode.c:1550 [inline]
          iput+0x674/0xa90 fs/inode.c:1576
          do_unlinkat+0x733/0xa30 fs/namei.c:4069
          __do_sys_unlink fs/namei.c:4110 [inline]
          __se_sys_unlink fs/namei.c:4108 [inline]
          __x64_sys_unlink+0x42/0x50 fs/namei.c:4108
          do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
          entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      In this scenario path lookup under RCU is racing with the final
      unlink in case of symlinks. As Linus puts it in his analysis:
      
        [...] We actually RCU-delay the inode freeing itself, but
        when we do the final iput(), the "evict()" function is called
        synchronously. Now, the simple fix would seem to just RCU-delay
        the kfree() of the symlink data in bpf_evict_inode(). Maybe
        that's the right thing to do. [...]
      
      Al suggested to piggy-back on the ->destroy_inode() callback in
      order to implement RCU deferral there which can then kfree() the
      inode->i_link eventually right before putting inode back into
      inode cache. By reusing free_inode_nonrcu() from there we can
      avoid the need for our own inode cache and just reuse generic
      one as we currently do.
      
      And in-fact on top of all this we should just get rid of the
      bpf_evict_inode() entirely. This means truncate_inode_pages_final()
      and clear_inode() will then simply be called by the fs core via
      evict(). Dropping the reference should really only be done when
      inode is unhashed and nothing reachable anymore, so it's better
      also moved into the final ->destroy_inode() callback.
      
      Fixes: 0f98621b ("bpf, inode: add support for symlinks and fix mtime/ctime")
      Reported-by: syzbot+fb731ca573367b7f6564@syzkaller.appspotmail.com
      Reported-by: syzbot+a13e5ead792d6df37818@syzkaller.appspotmail.com
      Reported-by: syzbot+7a8ba368b47fdefca61e@syzkaller.appspotmail.com
      Suggested-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      Analyzed-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Acked-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Acked-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      Link: https://lore.kernel.org/lkml/0000000000006946d2057bbd0eef@google.com/T/Signed-off-by: default avatarSasha Levin (Microsoft) <sashal@kernel.org>
      02c2de9b
    • Pi-Hsun Shih's avatar
      include/linux/swap.h: use offsetof() instead of custom __swapoffset macro · c43e2bdd
      Pi-Hsun Shih authored
      [ Upstream commit a4046c06 ]
      
      Use offsetof() to calculate offset of a field to take advantage of
      compiler built-in version when possible, and avoid UBSAN warning when
      compiling with Clang:
      
        UBSAN: Undefined behaviour in mm/swapfile.c:3010:38
        member access within null pointer of type 'union swap_header'
        CPU: 6 PID: 1833 Comm: swapon Tainted: G S                4.19.23 #43
        Call trace:
         dump_backtrace+0x0/0x194
         show_stack+0x20/0x2c
         __dump_stack+0x20/0x28
         dump_stack+0x70/0x94
         ubsan_epilogue+0x14/0x44
         ubsan_type_mismatch_common+0xf4/0xfc
         __ubsan_handle_type_mismatch_v1+0x34/0x54
         __se_sys_swapon+0x654/0x1084
         __arm64_sys_swapon+0x1c/0x24
         el0_svc_common+0xa8/0x150
         el0_svc_compat_handler+0x2c/0x38
         el0_svc_compat+0x8/0x18
      
      Link: http://lkml.kernel.org/r/20190312081902.223764-1-pihsun@chromium.orgSigned-off-by: default avatarPi-Hsun Shih <pihsun@chromium.org>
      Acked-by: default avatarMichal Hocko <mhocko@suse.com>
      Reviewed-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      c43e2bdd
    • Stanislaw Gruszka's avatar
      lib/div64.c: off by one in shift · 6dc75ccb
      Stanislaw Gruszka authored
      [ Upstream commit cdc94a37 ]
      
      fls counts bits starting from 1 to 32 (returns 0 for zero argument).  If
      we add 1 we shift right one bit more and loose precision from divisor,
      what cause function incorect results with some numbers.
      
      Corrected code was tested in user-space, see bugzilla:
         https://bugzilla.kernel.org/show_bug.cgi?id=202391
      
      Link: http://lkml.kernel.org/r/1548686944-11891-1-git-send-email-sgruszka@redhat.com
      Fixes: 658716d1 ("div64_u64(): improve precision on 32bit platforms")
      Signed-off-by: default avatarStanislaw Gruszka <sgruszka@redhat.com>
      Reported-by: default avatarSiarhei Volkau <lis8215@gmail.com>
      Tested-by: default avatarSiarhei Volkau <lis8215@gmail.com>
      Acked-by: default avatarOleg Nesterov <oleg@redhat.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      6dc75ccb
    • YueHaibing's avatar
      appletalk: Fix use-after-free in atalk_proc_exit · 0ba1fa56
      YueHaibing authored
      [ Upstream commit 6377f787 ]
      
      KASAN report this:
      
      BUG: KASAN: use-after-free in pde_subdir_find+0x12d/0x150 fs/proc/generic.c:71
      Read of size 8 at addr ffff8881f41fe5b0 by task syz-executor.0/2806
      
      CPU: 0 PID: 2806 Comm: syz-executor.0 Not tainted 5.0.0-rc7+ #45
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014
      Call Trace:
       __dump_stack lib/dump_stack.c:77 [inline]
       dump_stack+0xfa/0x1ce lib/dump_stack.c:113
       print_address_description+0x65/0x270 mm/kasan/report.c:187
       kasan_report+0x149/0x18d mm/kasan/report.c:317
       pde_subdir_find+0x12d/0x150 fs/proc/generic.c:71
       remove_proc_entry+0xe8/0x420 fs/proc/generic.c:667
       atalk_proc_exit+0x18/0x820 [appletalk]
       atalk_exit+0xf/0x5a [appletalk]
       __do_sys_delete_module kernel/module.c:1018 [inline]
       __se_sys_delete_module kernel/module.c:961 [inline]
       __x64_sys_delete_module+0x3dc/0x5e0 kernel/module.c:961
       do_syscall_64+0x147/0x600 arch/x86/entry/common.c:290
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      RIP: 0033:0x462e99
      Code: f7 d8 64 89 02 b8 ff ff ff ff c3 66 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 bc ff ff ff f7 d8 64 89 01 48
      RSP: 002b:00007fb2de6b9c58 EFLAGS: 00000246 ORIG_RAX: 00000000000000b0
      RAX: ffffffffffffffda RBX: 000000000073bf00 RCX: 0000000000462e99
      RDX: 0000000000000000 RSI: 0000000000000000 RDI: 00000000200001c0
      RBP: 0000000000000002 R08: 0000000000000000 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000246 R12: 00007fb2de6ba6bc
      R13: 00000000004bccaa R14: 00000000006f6bc8 R15: 00000000ffffffff
      
      Allocated by task 2806:
       set_track mm/kasan/common.c:85 [inline]
       __kasan_kmalloc.constprop.3+0xa0/0xd0 mm/kasan/common.c:496
       slab_post_alloc_hook mm/slab.h:444 [inline]
       slab_alloc_node mm/slub.c:2739 [inline]
       slab_alloc mm/slub.c:2747 [inline]
       kmem_cache_alloc+0xcf/0x250 mm/slub.c:2752
       kmem_cache_zalloc include/linux/slab.h:730 [inline]
       __proc_create+0x30f/0xa20 fs/proc/generic.c:408
       proc_mkdir_data+0x47/0x190 fs/proc/generic.c:469
       0xffffffffc10c01bb
       0xffffffffc10c0166
       do_one_initcall+0xfa/0x5ca init/main.c:887
       do_init_module+0x204/0x5f6 kernel/module.c:3460
       load_module+0x66b2/0x8570 kernel/module.c:3808
       __do_sys_finit_module+0x238/0x2a0 kernel/module.c:3902
       do_syscall_64+0x147/0x600 arch/x86/entry/common.c:290
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      Freed by task 2806:
       set_track mm/kasan/common.c:85 [inline]
       __kasan_slab_free+0x130/0x180 mm/kasan/common.c:458
       slab_free_hook mm/slub.c:1409 [inline]
       slab_free_freelist_hook mm/slub.c:1436 [inline]
       slab_free mm/slub.c:2986 [inline]
       kmem_cache_free+0xa6/0x2a0 mm/slub.c:3002
       pde_put+0x6e/0x80 fs/proc/generic.c:647
       remove_proc_entry+0x1d3/0x420 fs/proc/generic.c:684
       0xffffffffc10c031c
       0xffffffffc10c0166
       do_one_initcall+0xfa/0x5ca init/main.c:887
       do_init_module+0x204/0x5f6 kernel/module.c:3460
       load_module+0x66b2/0x8570 kernel/module.c:3808
       __do_sys_finit_module+0x238/0x2a0 kernel/module.c:3902
       do_syscall_64+0x147/0x600 arch/x86/entry/common.c:290
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      The buggy address belongs to the object at ffff8881f41fe500
       which belongs to the cache proc_dir_entry of size 256
      The buggy address is located 176 bytes inside of
       256-byte region [ffff8881f41fe500, ffff8881f41fe600)
      The buggy address belongs to the page:
      page:ffffea0007d07f80 count:1 mapcount:0 mapping:ffff8881f6e69a00 index:0x0
      flags: 0x2fffc0000000200(slab)
      raw: 02fffc0000000200 dead000000000100 dead000000000200 ffff8881f6e69a00
      raw: 0000000000000000 00000000800c000c 00000001ffffffff 0000000000000000
      page dumped because: kasan: bad access detected
      
      Memory state around the buggy address:
       ffff8881f41fe480: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
       ffff8881f41fe500: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      >ffff8881f41fe580: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                                           ^
       ffff8881f41fe600: fc fc fc fc fc fc fc fc fb fb fb fb fb fb fb fb
       ffff8881f41fe680: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      
      It should check the return value of atalk_proc_init fails,
      otherwise atalk_exit will trgger use-after-free in pde_subdir_find
      while unload the module.This patch fix error cleanup path of atalk_init
      Reported-by: default avatarHulk Robot <hulkci@huawei.com>
      Signed-off-by: default avatarYueHaibing <yuehaibing@huawei.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      0ba1fa56
    • Kevin Wang's avatar
      drm/amdkfd: use init_mqd function to allocate object for hid_mqd (CI) · 86c7f76a
      Kevin Wang authored
      [ Upstream commit cac734c2 ]
      
      if use the legacy method to allocate object, when mqd_hiq need to run
      uninit code, it will be cause WARNING call trace.
      
      eg: (s3 suspend test)
      [   34.918944] Call Trace:
      [   34.918948]  [<ffffffff92961dc1>] dump_stack+0x19/0x1b
      [   34.918950]  [<ffffffff92297648>] __warn+0xd8/0x100
      [   34.918951]  [<ffffffff9229778d>] warn_slowpath_null+0x1d/0x20
      [   34.918991]  [<ffffffffc03ce1fe>] uninit_mqd_hiq_sdma+0x4e/0x50 [amdgpu]
      [   34.919028]  [<ffffffffc03d0ef7>] uninitialize+0x37/0xe0 [amdgpu]
      [   34.919064]  [<ffffffffc03d15a6>] kernel_queue_uninit+0x16/0x30 [amdgpu]
      [   34.919086]  [<ffffffffc03d26c2>] pm_uninit+0x12/0x20 [amdgpu]
      [   34.919107]  [<ffffffffc03d4915>] stop_nocpsch+0x15/0x20 [amdgpu]
      [   34.919129]  [<ffffffffc03c1dce>] kgd2kfd_suspend.part.4+0x2e/0x50 [amdgpu]
      [   34.919150]  [<ffffffffc03c2667>] kgd2kfd_suspend+0x17/0x20 [amdgpu]
      [   34.919171]  [<ffffffffc03c103a>] amdgpu_amdkfd_suspend+0x1a/0x20 [amdgpu]
      [   34.919187]  [<ffffffffc02ec428>] amdgpu_device_suspend+0x88/0x3a0 [amdgpu]
      [   34.919189]  [<ffffffff922e22cf>] ? enqueue_entity+0x2ef/0xbe0
      [   34.919205]  [<ffffffffc02e8220>] amdgpu_pmops_suspend+0x20/0x30 [amdgpu]
      [   34.919207]  [<ffffffff925c56ff>] pci_pm_suspend+0x6f/0x150
      [   34.919208]  [<ffffffff925c5690>] ? pci_pm_freeze+0xf0/0xf0
      [   34.919210]  [<ffffffff926b45c6>] dpm_run_callback+0x46/0x90
      [   34.919212]  [<ffffffff926b49db>] __device_suspend+0xfb/0x2a0
      [   34.919213]  [<ffffffff926b4b9f>] async_suspend+0x1f/0xa0
      [   34.919214]  [<ffffffff922c918f>] async_run_entry_fn+0x3f/0x130
      [   34.919216]  [<ffffffff922b9d4f>] process_one_work+0x17f/0x440
      [   34.919217]  [<ffffffff922bade6>] worker_thread+0x126/0x3c0
      [   34.919218]  [<ffffffff922bacc0>] ? manage_workers.isra.25+0x2a0/0x2a0
      [   34.919220]  [<ffffffff922c1c31>] kthread+0xd1/0xe0
      [   34.919221]  [<ffffffff922c1b60>] ? insert_kthread_work+0x40/0x40
      [   34.919222]  [<ffffffff92974c1d>] ret_from_fork_nospec_begin+0x7/0x21
      [   34.919224]  [<ffffffff922c1b60>] ? insert_kthread_work+0x40/0x40
      [   34.919224] ---[ end trace 38cd9f65c963adad ]---
      Signed-off-by: default avatarKevin Wang <kevin1.wang@amd.com>
      Reviewed-by: default avatarOak Zeng <Oak.Zeng@amd.com>
      Signed-off-by: default avatarAlex Deucher <alexander.deucher@amd.com>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      86c7f76a
    • Yang Shi's avatar
      ARM: 8839/1: kprobe: make patch_lock a raw_spinlock_t · 5e3f6ba8
      Yang Shi authored
      [ Upstream commit 143c2a89 ]
      
      When running kprobe on -rt kernel, the below bug is caught:
      
      |BUG: sleeping function called from invalid context at kernel/locking/rtmutex.c:931
      |in_atomic(): 1, irqs_disabled(): 128, pid: 14, name: migration/0
      |Preemption disabled at:[<802f2b98>] cpu_stopper_thread+0xc0/0x140
      |CPU: 0 PID: 14 Comm: migration/0 Tainted: G O 4.8.3-rt2 #1
      |Hardware name: Freescale LS1021A
      |[<8025a43c>] (___might_sleep)
      |[<80b5b324>] (rt_spin_lock)
      |[<80b5c31c>] (__patch_text_real)
      |[<80b5c3ac>] (patch_text_stop_machine)
      |[<802f2920>] (multi_cpu_stop)
      
      Since patch_text_stop_machine() is called in stop_machine() which
      disables IRQ, sleepable lock should be not used in this atomic context,
       so replace patch_lock to raw lock.
      Signed-off-by: default avatarYang Shi <yang.shi@linaro.org>
      Signed-off-by: default avatarSebastian Andrzej Siewior <bigeasy@linutronix.de>
      Reviewed-by: default avatarArnd Bergmann <arnd@arndb.de>
      Signed-off-by: default avatarRussell King <rmk+kernel@armlinux.org.uk>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      5e3f6ba8
    • Ilia Mirkin's avatar
      drm/nouveau/volt/gf117: fix speedo readout register · 03e67ed3
      Ilia Mirkin authored
      [ Upstream commit fc782242 ]
      
      GF117 appears to use the same register as GK104 (but still with the
      general Fermi readout mechanism).
      
      Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108980Signed-off-by: default avatarIlia Mirkin <imirkin@alum.mit.edu>
      Signed-off-by: default avatarBen Skeggs <bskeggs@redhat.com>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      03e67ed3
    • Leo Yan's avatar
      coresight: cpu-debug: Support for CA73 CPUs · a68ee104
      Leo Yan authored
      [ Upstream commit a0f890ab ]
      
      This patch is to add the AMBA device ID for CA73 CPU, so that CPU debug
      module can be initialized successfully when a SoC contain CA73 CPUs.
      
      This patch has been verified on 96boards Hikey960.
      Signed-off-by: default avatarLeo Yan <leo.yan@linaro.org>
      Signed-off-by: default avatarMathieu Poirier <mathieu.poirier@linaro.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      a68ee104
    • Zhang Rui's avatar
      Revert "ACPI / EC: Remove old CLEAR_ON_RESUME quirk" · 3e3adeb2
      Zhang Rui authored
      [ Upstream commit b6a3e147 ]
      
      On some Samsung hardware, it is necessary to clear events accumulated by
      the EC during sleep. These ECs stop reporting GPEs until they are manually
      polled, if too many events are accumulated.
      Thus the CLEAR_ON_RESUME quirk is introduced to send EC query commands
      unconditionally after resume to clear all the EC query events on those
      platforms.
      
      Later, commit 4c237371 ("ACPI / EC: Remove old CLEAR_ON_RESUME quirk")
      removes the CLEAR_ON_RESUME quirk because we thought the new EC IRQ
      polling logic should handle this case.
      
      Now it has been proved that the EC IRQ Polling logic does not fix the
      issue actually because we got regression report on these Samsung
      platforms after removing the quirk.
      
      Thus revert commit 4c237371 ("ACPI / EC: Remove old CLEAR_ON_RESUME
      quirk") to introduce back the Samsung quirk in this patch.
      
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=44161Tested-by: default avatarOrtwin Glück <odi@odi.ch>
      Tested-by: default avatarFrancisco Cribari <cribari@gmail.com>
      Tested-by: default avatarBalazs Varga <balazs4web@gmail.com>
      Signed-off-by: default avatarZhang Rui <rui.zhang@intel.com>
      Signed-off-by: default avatarRafael J. Wysocki <rafael.j.wysocki@intel.com>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      3e3adeb2
    • Lars Persson's avatar
      crypto: axis - fix for recursive locking from bottom half · 56828309
      Lars Persson authored
      [ Upstream commit c34a8382 ]
      
      Clients may submit a new requests from the completion callback
      context. The driver was not prepared to receive a request in this
      state because it already held the request queue lock and a recursive
      lock error is triggered.
      
      Now all completions are queued up until we are ready to drop the queue
      lock and then delivered.
      
      The fault was triggered by TCP over an IPsec connection in the LTP
      test suite:
        LTP: starting tcp4_ipsec02 (tcp_ipsec.sh -p ah -m transport -s "100 1000 65535")
        BUG: spinlock recursion on CPU#1, genload/943
         lock: 0xbf3c3094, .magic: dead4ead, .owner: genload/943, .owner_cpu: 1
        CPU: 1 PID: 943 Comm: genload Tainted: G           O    4.9.62-axis5-devel #6
        Hardware name: Axis ARTPEC-6 Platform
         (unwind_backtrace) from [<8010d134>] (show_stack+0x18/0x1c)
         (show_stack) from [<803a289c>] (dump_stack+0x84/0x98)
         (dump_stack) from [<8016e164>] (do_raw_spin_lock+0x124/0x128)
         (do_raw_spin_lock) from [<804de1a4>] (artpec6_crypto_submit+0x2c/0xa0)
         (artpec6_crypto_submit) from [<804def38>] (artpec6_crypto_prepare_submit_hash+0xd0/0x54c)
         (artpec6_crypto_prepare_submit_hash) from [<7f3165f0>] (ah_output+0x2a4/0x3dc [ah4])
         (ah_output [ah4]) from [<805df9bc>] (xfrm_output_resume+0x178/0x4a4)
         (xfrm_output_resume) from [<805d283c>] (xfrm4_output+0xac/0xbc)
         (xfrm4_output) from [<80587928>] (ip_queue_xmit+0x140/0x3b4)
         (ip_queue_xmit) from [<805a13b4>] (tcp_transmit_skb+0x4c4/0x95c)
         (tcp_transmit_skb) from [<8059f218>] (tcp_rcv_state_process+0xdf4/0xdfc)
         (tcp_rcv_state_process) from [<805a7530>] (tcp_v4_do_rcv+0x64/0x1ac)
         (tcp_v4_do_rcv) from [<805a9724>] (tcp_v4_rcv+0xa34/0xb74)
         (tcp_v4_rcv) from [<80581d34>] (ip_local_deliver_finish+0x78/0x2b0)
         (ip_local_deliver_finish) from [<8058259c>] (ip_local_deliver+0xe4/0x104)
         (ip_local_deliver) from [<805d23ec>] (xfrm4_transport_finish+0xf4/0x144)
         (xfrm4_transport_finish) from [<805df564>] (xfrm_input+0x4f4/0x74c)
         (xfrm_input) from [<804de420>] (artpec6_crypto_task+0x208/0x38c)
         (artpec6_crypto_task) from [<801271b0>] (tasklet_action+0x60/0xec)
         (tasklet_action) from [<801266d4>] (__do_softirq+0xcc/0x3a4)
         (__do_softirq) from [<80126d20>] (irq_exit+0xf4/0x15c)
         (irq_exit) from [<801741e8>] (__handle_domain_irq+0x68/0xbc)
         (__handle_domain_irq) from [<801014f0>] (gic_handle_irq+0x50/0x94)
         (gic_handle_irq) from [<80657370>] (__irq_usr+0x50/0x80)
      Signed-off-by: default avatarLars Persson <larper@axis.com>
      Signed-off-by: default avatarHerbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      56828309
    • Hsin-Yi, Wang's avatar
      drm/panel: panel-innolux: set display off in innolux_panel_unprepare · 61bea6ad
      Hsin-Yi, Wang authored
      [ Upstream commit 46f3ceaf ]
      
      Move mipi_dsi_dcs_set_display_off() from innolux_panel_disable()
      to innolux_panel_unprepare(), so they are consistent with
      innolux_panel_enable() and innolux_panel_prepare().
      
      This also fixes some mode check and irq timeout issue in MTK dsi code.
      
      Since some dsi code (e.g. mtk_dsi) have following call trace:
      1. drm_panel_disable(), which calls innolux_panel_disable()
      2. switch to cmd mode
      3. drm_panel_unprepare(), which calls innolux_panel_unprepare()
      
      However, mtk_dsi needs to be in cmd mode to be able to send commands
      (e.g. mipi_dsi_dcs_set_display_off() and mipi_dsi_dcs_enter_sleep_mode()),
      so we need these functions to be called after the switch to cmd mode happens,
      i.e. in innolux_panel_unprepare.
      Signed-off-by: default avatarHsin-Yi, Wang <hsinyi@chromium.org>
      Signed-off-by: default avatarSean Paul <seanpaul@chromium.org>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190109065922.231753-1-hsinyi@chromium.orgSigned-off-by: default avatarSasha Levin <sashal@kernel.org>
      61bea6ad
    • Christophe Leroy's avatar
      lkdtm: Add tests for NULL pointer dereference · f2778b35
      Christophe Leroy authored
      [ Upstream commit 59a12205 ]
      
      Introduce lkdtm tests for NULL pointer dereference: check access or exec
      at NULL address, since these errors tend to be reported differently from
      the general fault error text. For example from x86:
      
          pr_alert("BUG: unable to handle kernel %s at %px\n",
              address < PAGE_SIZE ? "NULL pointer dereference" : "paging request",
              (void *)address);
      Signed-off-by: default avatarChristophe Leroy <christophe.leroy@c-s.fr>
      Signed-off-by: default avatarKees Cook <keescook@chromium.org>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      f2778b35
    • Christophe Leroy's avatar
      lkdtm: Print real addresses · b035faf5
      Christophe Leroy authored
      [ Upstream commit 4c411157 ]
      
      Today, when doing a lkdtm test before the readiness of the
      random generator, (ptrval) is printed instead of the address
      at which it perform the fault:
      
      [ 1597.337030] lkdtm: Performing direct entry EXEC_USERSPACE
      [ 1597.337142] lkdtm: attempting ok execution at (ptrval)
      [ 1597.337398] lkdtm: attempting bad execution at (ptrval)
      [ 1597.337460] kernel tried to execute user page (77858000) -exploit attempt? (uid: 0)
      [ 1597.344769] Unable to handle kernel paging request for instruction fetch
      [ 1597.351392] Faulting instruction address: 0x77858000
      [ 1597.356312] Oops: Kernel access of bad area, sig: 11 [#1]
      
      If the lkdtm test is done later on, it prints an hashed address.
      
      In both cases this is pointless. The purpose of the test is to
      ensure the kernel generates an Oops at the expected address,
      so real addresses needs to be printed. This patch fixes that.
      Signed-off-by: default avatarChristophe Leroy <christophe.leroy@c-s.fr>
      Signed-off-by: default avatarKees Cook <keescook@chromium.org>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      b035faf5
    • Dmitry Osipenko's avatar
      soc/tegra: pmc: Drop locking from tegra_powergate_is_powered() · 201aee30
      Dmitry Osipenko authored
      [ Upstream commit b6e1fd17 ]
      
      This fixes splats like the one below if CONFIG_DEBUG_ATOMIC_SLEEP=y
      and machine (Tegra30) booted with SMP=n or all secondary CPU's are put
      offline. Locking isn't needed because it protects atomic operation.
      
      BUG: sleeping function called from invalid context at kernel/locking/mutex.c:254
      in_atomic(): 1, irqs_disabled(): 128, pid: 0, name: swapper/0
      CPU: 0 PID: 0 Comm: swapper/0 Tainted: G         C        4.18.0-next-20180821-00180-gc3ebb6544e44-dirty #823
      Hardware name: NVIDIA Tegra SoC (Flattened Device Tree)
      [<c01134f4>] (unwind_backtrace) from [<c010db2c>] (show_stack+0x20/0x24)
      [<c010db2c>] (show_stack) from [<c0bd0f3c>] (dump_stack+0x94/0xa8)
      [<c0bd0f3c>] (dump_stack) from [<c0151df8>] (___might_sleep+0x13c/0x174)
      [<c0151df8>] (___might_sleep) from [<c0151ea0>] (__might_sleep+0x70/0xa8)
      [<c0151ea0>] (__might_sleep) from [<c0bec2b8>] (mutex_lock+0x2c/0x70)
      [<c0bec2b8>] (mutex_lock) from [<c0589844>] (tegra_powergate_is_powered+0x44/0xa8)
      [<c0589844>] (tegra_powergate_is_powered) from [<c0581a60>] (tegra30_cpu_rail_off_ready+0x30/0x74)
      [<c0581a60>] (tegra30_cpu_rail_off_ready) from [<c0122244>] (tegra30_idle_lp2+0xa0/0x108)
      [<c0122244>] (tegra30_idle_lp2) from [<c0853438>] (cpuidle_enter_state+0x140/0x540)
      [<c0853438>] (cpuidle_enter_state) from [<c08538a4>] (cpuidle_enter+0x40/0x4c)
      [<c08538a4>] (cpuidle_enter) from [<c01595e0>] (call_cpuidle+0x30/0x48)
      [<c01595e0>] (call_cpuidle) from [<c01599f8>] (do_idle+0x238/0x28c)
      [<c01599f8>] (do_idle) from [<c0159d28>] (cpu_startup_entry+0x28/0x2c)
      [<c0159d28>] (cpu_startup_entry) from [<c0be76c8>] (rest_init+0xd8/0xdc)
      [<c0be76c8>] (rest_init) from [<c1200f50>] (start_kernel+0x41c/0x430)
      Signed-off-by: default avatarDmitry Osipenko <digetx@gmail.com>
      Acked-by: default avatarJon Hunter <jonathanh@nvidia.com>
      Signed-off-by: default avatarThierry Reding <treding@nvidia.com>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      201aee30
    • Julia Cartwright's avatar
      iommu/dmar: Fix buffer overflow during PCI bus notification · e2f0e69f
      Julia Cartwright authored
      [ Upstream commit cffaaf0c ]
      
      Commit 57384592 ("iommu/vt-d: Store bus information in RMRR PCI
      device path") changed the type of the path data, however, the change in
      path type was not reflected in size calculations.  Update to use the
      correct type and prevent a buffer overflow.
      
      This bug manifests in systems with deep PCI hierarchies, and can lead to
      an overflow of the static allocated buffer (dmar_pci_notify_info_buf),
      or can lead to overflow of slab-allocated data.
      
         BUG: KASAN: global-out-of-bounds in dmar_alloc_pci_notify_info+0x1d5/0x2e0
         Write of size 1 at addr ffffffff90445d80 by task swapper/0/1
         CPU: 0 PID: 1 Comm: swapper/0 Tainted: G        W       4.14.87-rt49-02406-gd0a0e96 #1
         Call Trace:
          ? dump_stack+0x46/0x59
          ? print_address_description+0x1df/0x290
          ? dmar_alloc_pci_notify_info+0x1d5/0x2e0
          ? kasan_report+0x256/0x340
          ? dmar_alloc_pci_notify_info+0x1d5/0x2e0
          ? e820__memblock_setup+0xb0/0xb0
          ? dmar_dev_scope_init+0x424/0x48f
          ? __down_write_common+0x1ec/0x230
          ? dmar_dev_scope_init+0x48f/0x48f
          ? dmar_free_unused_resources+0x109/0x109
          ? cpumask_next+0x16/0x20
          ? __kmem_cache_create+0x392/0x430
          ? kmem_cache_create+0x135/0x2f0
          ? e820__memblock_setup+0xb0/0xb0
          ? intel_iommu_init+0x170/0x1848
          ? _raw_spin_unlock_irqrestore+0x32/0x60
          ? migrate_enable+0x27a/0x5b0
          ? sched_setattr+0x20/0x20
          ? migrate_disable+0x1fc/0x380
          ? task_rq_lock+0x170/0x170
          ? try_to_run_init_process+0x40/0x40
          ? locks_remove_file+0x85/0x2f0
          ? dev_prepare_static_identity_mapping+0x78/0x78
          ? rt_spin_unlock+0x39/0x50
          ? lockref_put_or_lock+0x2a/0x40
          ? dput+0x128/0x2f0
          ? __rcu_read_unlock+0x66/0x80
          ? __fput+0x250/0x300
          ? __rcu_read_lock+0x1b/0x30
          ? mntput_no_expire+0x38/0x290
          ? e820__memblock_setup+0xb0/0xb0
          ? pci_iommu_init+0x25/0x63
          ? pci_iommu_init+0x25/0x63
          ? do_one_initcall+0x7e/0x1c0
          ? initcall_blacklisted+0x120/0x120
          ? kernel_init_freeable+0x27b/0x307
          ? rest_init+0xd0/0xd0
          ? kernel_init+0xf/0x120
          ? rest_init+0xd0/0xd0
          ? ret_from_fork+0x1f/0x40
         The buggy address belongs to the variable:
          dmar_pci_notify_info_buf+0x40/0x60
      
      Fixes: 57384592 ("iommu/vt-d: Store bus information in RMRR PCI device path")
      Signed-off-by: default avatarJulia Cartwright <julia@ni.com>
      Signed-off-by: default avatarJoerg Roedel <jroedel@suse.de>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      e2f0e69f
    • Ard Biesheuvel's avatar
      crypto: sha512/arm - fix crash bug in Thumb2 build · 29b56343
      Ard Biesheuvel authored
      [ Upstream commit c6431650 ]
      
      The SHA512 code we adopted from the OpenSSL project uses a rather
      peculiar way to take the address of the round constant table: it
      takes the address of the sha256_block_data_order() routine, and
      substracts a constant known quantity to arrive at the base of the
      table, which is emitted by the same assembler code right before
      the routine's entry point.
      
      However, recent versions of binutils have helpfully changed the
      behavior of references emitted via an ADR instruction when running
      in Thumb2 mode: it now takes the Thumb execution mode bit into
      account, which is bit 0 af the address. This means the produced
      table address also has bit 0 set, and so we end up with an address
      value pointing 1 byte past the start of the table, which results
      in crashes such as
      
        Unable to handle kernel paging request at virtual address bf825000
        pgd = 42f44b11
        [bf825000] *pgd=80000040206003, *pmd=5f1bd003, *pte=00000000
        Internal error: Oops: 207 [#1] PREEMPT SMP THUMB2
        Modules linked in: sha256_arm(+) sha1_arm_ce sha1_arm ...
        CPU: 7 PID: 396 Comm: cryptomgr_test Not tainted 5.0.0-rc6+ #144
        Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015
        PC is at sha256_block_data_order+0xaaa/0xb30 [sha256_arm]
        LR is at __this_module+0x17fd/0xffffe800 [sha256_arm]
        pc : [<bf820bca>]    lr : [<bf824ffd>]    psr: 800b0033
        sp : ebc8bbe8  ip : faaabe1c  fp : 2fdd3433
        r10: 4c5f1692  r9 : e43037df  r8 : b04b0a5a
        r7 : c369d722  r6 : 39c3693e  r5 : 7a013189  r4 : 1580d26b
        r3 : 8762a9b0  r2 : eea9c2cd  r1 : 3e9ab536  r0 : 1dea4ae7
        Flags: Nzcv  IRQs on  FIQs on  Mode SVC_32  ISA Thumb  Segment user
        Control: 70c5383d  Table: 6b8467c0  DAC: dbadc0de
        Process cryptomgr_test (pid: 396, stack limit = 0x69e1fe23)
        Stack: (0xebc8bbe8 to 0xebc8c000)
        ...
        unwind: Unknown symbol address bf820bca
        unwind: Index not found bf820bca
        Code: 441a ea80 40f9 440a (f85e) 3b04
        ---[ end trace e560cce92700ef8a ]---
      
      Given that this affects older kernels as well, in case they are built
      with a recent toolchain, apply a minimal backportable fix, which is
      to emit another non-code label at the start of the routine, and
      reference that instead. (This is similar to the current upstream state
      of this file in OpenSSL)
      Signed-off-by: default avatarArd Biesheuvel <ard.biesheuvel@linaro.org>
      Signed-off-by: default avatarHerbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      29b56343
    • Ard Biesheuvel's avatar
      crypto: sha256/arm - fix crash bug in Thumb2 build · 92562a9f
      Ard Biesheuvel authored
      [ Upstream commit 69216a54 ]
      
      The SHA256 code we adopted from the OpenSSL project uses a rather
      peculiar way to take the address of the round constant table: it
      takes the address of the sha256_block_data_order() routine, and
      substracts a constant known quantity to arrive at the base of the
      table, which is emitted by the same assembler code right before
      the routine's entry point.
      
      However, recent versions of binutils have helpfully changed the
      behavior of references emitted via an ADR instruction when running
      in Thumb2 mode: it now takes the Thumb execution mode bit into
      account, which is bit 0 af the address. This means the produced
      table address also has bit 0 set, and so we end up with an address
      value pointing 1 byte past the start of the table, which results
      in crashes such as
      
        Unable to handle kernel paging request at virtual address bf825000
        pgd = 42f44b11
        [bf825000] *pgd=80000040206003, *pmd=5f1bd003, *pte=00000000
        Internal error: Oops: 207 [#1] PREEMPT SMP THUMB2
        Modules linked in: sha256_arm(+) sha1_arm_ce sha1_arm ...
        CPU: 7 PID: 396 Comm: cryptomgr_test Not tainted 5.0.0-rc6+ #144
        Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015
        PC is at sha256_block_data_order+0xaaa/0xb30 [sha256_arm]
        LR is at __this_module+0x17fd/0xffffe800 [sha256_arm]
        pc : [<bf820bca>]    lr : [<bf824ffd>]    psr: 800b0033
        sp : ebc8bbe8  ip : faaabe1c  fp : 2fdd3433
        r10: 4c5f1692  r9 : e43037df  r8 : b04b0a5a
        r7 : c369d722  r6 : 39c3693e  r5 : 7a013189  r4 : 1580d26b
        r3 : 8762a9b0  r2 : eea9c2cd  r1 : 3e9ab536  r0 : 1dea4ae7
        Flags: Nzcv  IRQs on  FIQs on  Mode SVC_32  ISA Thumb  Segment user
        Control: 70c5383d  Table: 6b8467c0  DAC: dbadc0de
        Process cryptomgr_test (pid: 396, stack limit = 0x69e1fe23)
        Stack: (0xebc8bbe8 to 0xebc8c000)
        ...
        unwind: Unknown symbol address bf820bca
        unwind: Index not found bf820bca
        Code: 441a ea80 40f9 440a (f85e) 3b04
        ---[ end trace e560cce92700ef8a ]---
      
      Given that this affects older kernels as well, in case they are built
      with a recent toolchain, apply a minimal backportable fix, which is
      to emit another non-code label at the start of the routine, and
      reference that instead. (This is similar to the current upstream state
      of this file in OpenSSL)
      Signed-off-by: default avatarArd Biesheuvel <ard.biesheuvel@linaro.org>
      Signed-off-by: default avatarHerbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      92562a9f
    • Vitaly Kuznetsov's avatar
      kernel: hung_task.c: disable on suspend · 507f2f17
      Vitaly Kuznetsov authored
      [ Upstream commit a1c6ca3c ]
      
      It is possible to observe hung_task complaints when system goes to
      suspend-to-idle state:
      
       # echo freeze > /sys/power/state
      
       PM: Syncing filesystems ... done.
       Freezing user space processes ... (elapsed 0.001 seconds) done.
       OOM killer disabled.
       Freezing remaining freezable tasks ... (elapsed 0.002 seconds) done.
       sd 0:0:0:0: [sda] Synchronizing SCSI cache
       INFO: task bash:1569 blocked for more than 120 seconds.
             Not tainted 4.19.0-rc3_+ #687
       "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
       bash            D    0  1569    604 0x00000000
       Call Trace:
        ? __schedule+0x1fe/0x7e0
        schedule+0x28/0x80
        suspend_devices_and_enter+0x4ac/0x750
        pm_suspend+0x2c0/0x310
      
      Register a PM notifier to disable the detector on suspend and re-enable
      back on wakeup.
      Signed-off-by: default avatarVitaly Kuznetsov <vkuznets@redhat.com>
      Signed-off-by: default avatarRafael J. Wysocki <rafael.j.wysocki@intel.com>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      507f2f17
    • Steve French's avatar
      cifs: fallback to older infolevels on findfirst queryinfo retry · 1d41cd12
      Steve French authored
      [ Upstream commit 3b7960ca ]
      
      In cases where queryinfo fails, we have cases in cifs (vers=1.0)
      where with backupuid mounts we retry the query info with findfirst.
      This doesn't work to some NetApp servers which don't support
      WindowsXP (and later) infolevel 261 (SMB_FIND_FILE_ID_FULL_DIR_INFO)
      so in this case use other info levels (in this case it will usually
      be level 257, SMB_FIND_FILE_DIRECTORY_INFO).
      
      (Also fixes some indentation)
      
      See kernel bugzilla 201435
      Signed-off-by: default avatarSteve French <stfrench@microsoft.com>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      1d41cd12
    • ndesaulniers@google.com's avatar
      compiler.h: update definition of unreachable() · 82017e26
      ndesaulniers@google.com authored
      [ Upstream commit fe0640eb ]
      
      Fixes the objtool warning seen with Clang:
      arch/x86/mm/fault.o: warning: objtool: no_context()+0x220: unreachable
      instruction
      
      Fixes commit 815f0ddb ("include/linux/compiler*.h: make compiler-*.h
      mutually exclusive")
      
      Josh noted that the fallback definition was meant to work around a
      pre-gcc-4.6 bug. GCC still needs to work around
      https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82365, so compiler-gcc.h
      defines its own version of unreachable().  Clang and ICC can use this
      shared definition.
      
      Link: https://github.com/ClangBuiltLinux/linux/issues/204Suggested-by: default avatarAndy Lutomirski <luto@amacapital.net>
      Suggested-by: default avatarJosh Poimboeuf <jpoimboe@redhat.com>
      Tested-by: default avatarNathan Chancellor <natechancellor@gmail.com>
      Signed-off-by: default avatarNick Desaulniers <ndesaulniers@google.com>
      Signed-off-by: default avatarMiguel Ojeda <miguel.ojeda.sandonis@gmail.com>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      82017e26
    • Sean Christopherson's avatar
      KVM: nVMX: restore host state in nested_vmx_vmexit for VMFail · 05e78850
      Sean Christopherson authored
      [ Upstream commit bd18bffc ]
      
      A VMEnter that VMFails (as opposed to VMExits) does not touch host
      state beyond registers that are explicitly noted in the VMFail path,
      e.g. EFLAGS.  Host state does not need to be loaded because VMFail
      is only signaled for consistency checks that occur before the CPU
      starts to load guest state, i.e. there is no need to restore any
      state as nothing has been modified.  But in the case where a VMFail
      is detected by hardware and not by KVM (due to deferring consistency
      checks to hardware), KVM has already loaded some amount of guest
      state.  Luckily, "loaded" only means loaded to KVM's software model,
      i.e. vmcs01 has not been modified.  So, unwind our software model to
      the pre-VMEntry host state.
      
      Not restoring host state in this VMFail path leads to a variety of
      failures because we end up with stale data in vcpu->arch, e.g. CR0,
      CR4, EFER, etc... will all be out of sync relative to vmcs01.  Any
      significant delta in the stale data is all but guaranteed to crash
      L1, e.g. emulation of SMEP, SMAP, UMIP, WP, etc... will be wrong.
      
      An alternative to this "soft" reload would be to load host state from
      vmcs12 as if we triggered a VMExit (as opposed to VMFail), but that is
      wildly inconsistent with respect to the VMX architecture, e.g. an L1
      VMM with separate VMExit and VMFail paths would explode.
      
      Note that this approach does not mean KVM is 100% accurate with
      respect to VMX hardware behavior, even at an architectural level
      (the exact order of consistency checks is microarchitecture specific).
      But 100% emulation accuracy isn't the goal (with this patch), rather
      the goal is to be consistent in the information delivered to L1, e.g.
      a VMExit should not fall-through VMENTER, and a VMFail should not jump
      to HOST_RIP.
      
      This technically reverts commit "5af41573 (KVM: nVMX: Fix mmu
      context after VMLAUNCH/VMRESUME failure)", but retains the core
      aspects of that patch, just in an open coded form due to the need to
      pull state from vmcs01 instead of vmcs12.  Restoring host state
      resolves a variety of issues introduced by commit "4f350c6d
      (kvm: nVMX: Handle deferred early VMLAUNCH/VMRESUME failure properly)",
      which remedied the incorrect behavior of treating VMFail like VMExit
      but in doing so neglected to restore arch state that had been modified
      prior to attempting nested VMEnter.
      
      A sample failure that occurs due to stale vcpu.arch state is a fault
      of some form while emulating an LGDT (due to emulated UMIP) from L1
      after a failed VMEntry to L3, in this case when running the KVM unit
      test test_tpr_threshold_values in L1.  L0 also hits a WARN in this
      case due to a stale arch.cr4.UMIP.
      
      L1:
        BUG: unable to handle kernel paging request at ffffc90000663b9e
        PGD 276512067 P4D 276512067 PUD 276513067 PMD 274efa067 PTE 8000000271de2163
        Oops: 0009 [#1] SMP
        CPU: 5 PID: 12495 Comm: qemu-system-x86 Tainted: G        W         4.18.0-rc2+ #2
        Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
        RIP: 0010:native_load_gdt+0x0/0x10
      
        ...
      
        Call Trace:
         load_fixmap_gdt+0x22/0x30
         __vmx_load_host_state+0x10e/0x1c0 [kvm_intel]
         vmx_switch_vmcs+0x2d/0x50 [kvm_intel]
         nested_vmx_vmexit+0x222/0x9c0 [kvm_intel]
         vmx_handle_exit+0x246/0x15a0 [kvm_intel]
         kvm_arch_vcpu_ioctl_run+0x850/0x1830 [kvm]
         kvm_vcpu_ioctl+0x3a1/0x5c0 [kvm]
         do_vfs_ioctl+0x9f/0x600
         ksys_ioctl+0x66/0x70
         __x64_sys_ioctl+0x16/0x20
         do_syscall_64+0x4f/0x100
         entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      L0:
        WARNING: CPU: 2 PID: 3529 at arch/x86/kvm/vmx.c:6618 handle_desc+0x28/0x30 [kvm_intel]
        ...
        CPU: 2 PID: 3529 Comm: qemu-system-x86 Not tainted 4.17.2-coffee+ #76
        Hardware name: Intel Corporation Kabylake Client platform/KBL S
        RIP: 0010:handle_desc+0x28/0x30 [kvm_intel]
      
        ...
      
        Call Trace:
         kvm_arch_vcpu_ioctl_run+0x863/0x1840 [kvm]
         kvm_vcpu_ioctl+0x3a1/0x5c0 [kvm]
         do_vfs_ioctl+0x9f/0x5e0
         ksys_ioctl+0x66/0x70
         __x64_sys_ioctl+0x16/0x20
         do_syscall_64+0x49/0xf0
         entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      Fixes: 5af41573 (KVM: nVMX: Fix mmu context after VMLAUNCH/VMRESUME failure)
      Fixes: 4f350c6d (kvm: nVMX: Handle deferred early VMLAUNCH/VMRESUME failure properly)
      Cc: Jim Mattson <jmattson@google.com>
      Cc: Krish Sadhukhan <krish.sadhukhan@oracle.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim KrÄmáŠ<rkrcmar@redhat.com>
      Cc: Wanpeng Li <wanpeng.li@hotmail.com>
      Signed-off-by: default avatarSean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      05e78850
    • Ronald Tschalär's avatar
      ACPI / SBS: Fix GPE storm on recent MacBookPro's · 00254adf
      Ronald Tschalär authored
      [ Upstream commit ca1721c5 ]
      
      On Apple machines, plugging-in or unplugging the power triggers a GPE
      for the EC. Since these machines expose an SBS device, this GPE ends
      up triggering the acpi_sbs_callback(). This in turn tries to get the
      status of the SBS charger. However, on MBP13,* and MBP14,* machines,
      performing the smbus-read operation to get the charger's status triggers
      the EC's GPE again. The result is an endless re-triggering and handling
      of that GPE, consuming significant CPU resources (> 50% in irq).
      
      In the end this is quite similar to commit 3031cdde (ACPI / SBS:
      Don't assume the existence of an SBS charger), except that on the above
      machines a status of all 1's is returned. And like there, we just want
      ignore the charger here.
      
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=198169Signed-off-by: default avatarRonald Tschalär <ronald@innovation.ch>
      Signed-off-by: default avatarRafael J. Wysocki <rafael.j.wysocki@intel.com>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      00254adf