1. 03 Jan, 2019 12 commits
    • David Ahern's avatar
      ipv6: Fix dump of specific table with strict checking · 73155879
      David Ahern authored
      Dump of a specific table with strict checking enabled is looping. The
      problem is that the end of the table dump is not marked in the cb. When
      dumping a specific table, cb args 0 and 1 are not used (they are the hash
      index and entry with an hash table index when dumping all tables). Re-use
      args[0] to hold a 'done' flag for the specific table dump.
      
      Fixes: 13e38901 ("net/ipv6: Plumb support for filtering route dumps")
      Reported-by: default avatarJakub Kicinski <jakub.kicinski@netronome.com>
      Signed-off-by: default avatarDavid Ahern <dsahern@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      73155879
    • David S. Miller's avatar
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf · be630043
      David S. Miller authored
      Alexei Starovoitov says:
      
      ====================
      pull-request: bpf 2019-01-02
      
      The following pull-request contains BPF updates for your *net* tree.
      
      The main changes are:
      
      1) prevent out of bounds speculation on pointer arithmetic, from Daniel.
      
      2) typo fix, from Xiaozhou.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      be630043
    • Alexei Starovoitov's avatar
      Merge branch 'prevent-oob-under-speculation' · a67825f5
      Alexei Starovoitov authored
      Daniel Borkmann says:
      
      ====================
      This set fixes an out of bounds case under speculative execution
      by implementing masking of pointer alu into the verifier. For
      details please see the individual patches.
      
      Thanks!
      
      v2 -> v3:
        - 8/9: change states_equal condition into old->speculative &&
          !cur->speculative, thanks Jakub!
        - 8/9: remove incorrect speculative state test in
          propagate_liveness(), thanks Jakub!
      v1 -> v2:
        - Typo fixes in commit msg and a comment, thanks David!
      ====================
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      a67825f5
    • Daniel Borkmann's avatar
      bpf: add various test cases to selftests · 80c9b2fa
      Daniel Borkmann authored
      Add various map value pointer related test cases to test_verifier
      kselftest to reflect recent changes and improve test coverage. The
      tests include basic masking functionality, unprivileged behavior
      on pointer arithmetic which goes oob, mixed bounds tests, negative
      unknown scalar but resulting positive offset for access and helper
      range, handling of arithmetic from multiple maps, various masking
      scenarios with subsequent map value access and others including two
      test cases from Jann Horn for prior fixes.
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      80c9b2fa
    • Daniel Borkmann's avatar
      bpf: prevent out of bounds speculation on pointer arithmetic · 979d63d5
      Daniel Borkmann authored
      Jann reported that the original commit back in b2157399
      ("bpf: prevent out-of-bounds speculation") was not sufficient
      to stop CPU from speculating out of bounds memory access:
      While b2157399 only focussed on masking array map access
      for unprivileged users for tail calls and data access such
      that the user provided index gets sanitized from BPF program
      and syscall side, there is still a more generic form affected
      from BPF programs that applies to most maps that hold user
      data in relation to dynamic map access when dealing with
      unknown scalars or "slow" known scalars as access offset, for
      example:
      
        - Load a map value pointer into R6
        - Load an index into R7
        - Do a slow computation (e.g. with a memory dependency) that
          loads a limit into R8 (e.g. load the limit from a map for
          high latency, then mask it to make the verifier happy)
        - Exit if R7 >= R8 (mispredicted branch)
        - Load R0 = R6[R7]
        - Load R0 = R6[R0]
      
      For unknown scalars there are two options in the BPF verifier
      where we could derive knowledge from in order to guarantee
      safe access to the memory: i) While </>/<=/>= variants won't
      allow to derive any lower or upper bounds from the unknown
      scalar where it would be safe to add it to the map value
      pointer, it is possible through ==/!= test however. ii) another
      option is to transform the unknown scalar into a known scalar,
      for example, through ALU ops combination such as R &= <imm>
      followed by R |= <imm> or any similar combination where the
      original information from the unknown scalar would be destroyed
      entirely leaving R with a constant. The initial slow load still
      precedes the latter ALU ops on that register, so the CPU
      executes speculatively from that point. Once we have the known
      scalar, any compare operation would work then. A third option
      only involving registers with known scalars could be crafted
      as described in [0] where a CPU port (e.g. Slow Int unit)
      would be filled with many dependent computations such that
      the subsequent condition depending on its outcome has to wait
      for evaluation on its execution port and thereby executing
      speculatively if the speculated code can be scheduled on a
      different execution port, or any other form of mistraining
      as described in [1], for example. Given this is not limited
      to only unknown scalars, not only map but also stack access
      is affected since both is accessible for unprivileged users
      and could potentially be used for out of bounds access under
      speculation.
      
      In order to prevent any of these cases, the verifier is now
      sanitizing pointer arithmetic on the offset such that any
      out of bounds speculation would be masked in a way where the
      pointer arithmetic result in the destination register will
      stay unchanged, meaning offset masked into zero similar as
      in array_index_nospec() case. With regards to implementation,
      there are three options that were considered: i) new insn
      for sanitation, ii) push/pop insn and sanitation as inlined
      BPF, iii) reuse of ax register and sanitation as inlined BPF.
      
      Option i) has the downside that we end up using from reserved
      bits in the opcode space, but also that we would require
      each JIT to emit masking as native arch opcodes meaning
      mitigation would have slow adoption till everyone implements
      it eventually which is counter-productive. Option ii) and iii)
      have both in common that a temporary register is needed in
      order to implement the sanitation as inlined BPF since we
      are not allowed to modify the source register. While a push /
      pop insn in ii) would be useful to have in any case, it
      requires once again that every JIT needs to implement it
      first. While possible, amount of changes needed would also
      be unsuitable for a -stable patch. Therefore, the path which
      has fewer changes, less BPF instructions for the mitigation
      and does not require anything to be changed in the JITs is
      option iii) which this work is pursuing. The ax register is
      already mapped to a register in all JITs (modulo arm32 where
      it's mapped to stack as various other BPF registers there)
      and used in constant blinding for JITs-only so far. It can
      be reused for verifier rewrites under certain constraints.
      The interpreter's tmp "register" has therefore been remapped
      into extending the register set with hidden ax register and
      reusing that for a number of instructions that needed the
      prior temporary variable internally (e.g. div, mod). This
      allows for zero increase in stack space usage in the interpreter,
      and enables (restricted) generic use in rewrites otherwise as
      long as such a patchlet does not make use of these instructions.
      The sanitation mask is dynamic and relative to the offset the
      map value or stack pointer currently holds.
      
      There are various cases that need to be taken under consideration
      for the masking, e.g. such operation could look as follows:
      ptr += val or val += ptr or ptr -= val. Thus, the value to be
      sanitized could reside either in source or in destination
      register, and the limit is different depending on whether
      the ALU op is addition or subtraction and depending on the
      current known and bounded offset. The limit is derived as
      follows: limit := max_value_size - (smin_value + off). For
      subtraction: limit := umax_value + off. This holds because
      we do not allow any pointer arithmetic that would
      temporarily go out of bounds or would have an unknown
      value with mixed signed bounds where it is unclear at
      verification time whether the actual runtime value would
      be either negative or positive. For example, we have a
      derived map pointer value with constant offset and bounded
      one, so limit based on smin_value works because the verifier
      requires that statically analyzed arithmetic on the pointer
      must be in bounds, and thus it checks if resulting
      smin_value + off and umax_value + off is still within map
      value bounds at time of arithmetic in addition to time of
      access. Similarly, for the case of stack access we derive
      the limit as follows: MAX_BPF_STACK + off for subtraction
      and -off for the case of addition where off := ptr_reg->off +
      ptr_reg->var_off.value. Subtraction is a special case for
      the masking which can be in form of ptr += -val, ptr -= -val,
      or ptr -= val. In the first two cases where we know that
      the value is negative, we need to temporarily negate the
      value in order to do the sanitation on a positive value
      where we later swap the ALU op, and restore original source
      register if the value was in source.
      
      The sanitation of pointer arithmetic alone is still not fully
      sufficient as is, since a scenario like the following could
      happen ...
      
        PTR += 0x1000 (e.g. K-based imm)
        PTR -= BIG_NUMBER_WITH_SLOW_COMPARISON
        PTR += 0x1000
        PTR -= BIG_NUMBER_WITH_SLOW_COMPARISON
        [...]
      
      ... which under speculation could end up as ...
      
        PTR += 0x1000
        PTR -= 0 [ truncated by mitigation ]
        PTR += 0x1000
        PTR -= 0 [ truncated by mitigation ]
        [...]
      
      ... and therefore still access out of bounds. To prevent such
      case, the verifier is also analyzing safety for potential out
      of bounds access under speculative execution. Meaning, it is
      also simulating pointer access under truncation. We therefore
      "branch off" and push the current verification state after the
      ALU operation with known 0 to the verification stack for later
      analysis. Given the current path analysis succeeded it is
      likely that the one under speculation can be pruned. In any
      case, it is also subject to existing complexity limits and
      therefore anything beyond this point will be rejected. In
      terms of pruning, it needs to be ensured that the verification
      state from speculative execution simulation must never prune
      a non-speculative execution path, therefore, we mark verifier
      state accordingly at the time of push_stack(). If verifier
      detects out of bounds access under speculative execution from
      one of the possible paths that includes a truncation, it will
      reject such program.
      
      Given we mask every reg-based pointer arithmetic for
      unprivileged programs, we've been looking into how it could
      affect real-world programs in terms of size increase. As the
      majority of programs are targeted for privileged-only use
      case, we've unconditionally enabled masking (with its alu
      restrictions on top of it) for privileged programs for the
      sake of testing in order to check i) whether they get rejected
      in its current form, and ii) by how much the number of
      instructions and size will increase. We've tested this by
      using Katran, Cilium and test_l4lb from the kernel selftests.
      For Katran we've evaluated balancer_kern.o, Cilium bpf_lxc.o
      and an older test object bpf_lxc_opt_-DUNKNOWN.o and l4lb
      we've used test_l4lb.o as well as test_l4lb_noinline.o. We
      found that none of the programs got rejected by the verifier
      with this change, and that impact is rather minimal to none.
      balancer_kern.o had 13,904 bytes (1,738 insns) xlated and
      7,797 bytes JITed before and after the change. Most complex
      program in bpf_lxc.o had 30,544 bytes (3,817 insns) xlated
      and 18,538 bytes JITed before and after and none of the other
      tail call programs in bpf_lxc.o had any changes either. For
      the older bpf_lxc_opt_-DUNKNOWN.o object we found a small
      increase from 20,616 bytes (2,576 insns) and 12,536 bytes JITed
      before to 20,664 bytes (2,582 insns) and 12,558 bytes JITed
      after the change. Other programs from that object file had
      similar small increase. Both test_l4lb.o had no change and
      remained at 6,544 bytes (817 insns) xlated and 3,401 bytes
      JITed and for test_l4lb_noinline.o constant at 5,080 bytes
      (634 insns) xlated and 3,313 bytes JITed. This can be explained
      in that LLVM typically optimizes stack based pointer arithmetic
      by using K-based operations and that use of dynamic map access
      is not overly frequent. However, in future we may decide to
      optimize the algorithm further under known guarantees from
      branch and value speculation. Latter seems also unclear in
      terms of prediction heuristics that today's CPUs apply as well
      as whether there could be collisions in e.g. the predictor's
      Value History/Pattern Table for triggering out of bounds access,
      thus masking is performed unconditionally at this point but could
      be subject to relaxation later on. We were generally also
      brainstorming various other approaches for mitigation, but the
      blocker was always lack of available registers at runtime and/or
      overhead for runtime tracking of limits belonging to a specific
      pointer. Thus, we found this to be minimally intrusive under
      given constraints.
      
      With that in place, a simple example with sanitized access on
      unprivileged load at post-verification time looks as follows:
      
        # bpftool prog dump xlated id 282
        [...]
        28: (79) r1 = *(u64 *)(r7 +0)
        29: (79) r2 = *(u64 *)(r7 +8)
        30: (57) r1 &= 15
        31: (79) r3 = *(u64 *)(r0 +4608)
        32: (57) r3 &= 1
        33: (47) r3 |= 1
        34: (2d) if r2 > r3 goto pc+19
        35: (b4) (u32) r11 = (u32) 20479  |
        36: (1f) r11 -= r2                | Dynamic sanitation for pointer
        37: (4f) r11 |= r2                | arithmetic with registers
        38: (87) r11 = -r11               | containing bounded or known
        39: (c7) r11 s>>= 63              | scalars in order to prevent
        40: (5f) r11 &= r2                | out of bounds speculation.
        41: (0f) r4 += r11                |
        42: (71) r4 = *(u8 *)(r4 +0)
        43: (6f) r4 <<= r1
        [...]
      
      For the case where the scalar sits in the destination register
      as opposed to the source register, the following code is emitted
      for the above example:
      
        [...]
        16: (b4) (u32) r11 = (u32) 20479
        17: (1f) r11 -= r2
        18: (4f) r11 |= r2
        19: (87) r11 = -r11
        20: (c7) r11 s>>= 63
        21: (5f) r2 &= r11
        22: (0f) r2 += r0
        23: (61) r0 = *(u32 *)(r2 +0)
        [...]
      
      JIT blinding example with non-conflicting use of r10:
      
        [...]
         d5:	je     0x0000000000000106    _
         d7:	mov    0x0(%rax),%edi       |
         da:	mov    $0xf153246,%r10d     | Index load from map value and
         e0:	xor    $0xf153259,%r10      | (const blinded) mask with 0x1f.
         e7:	and    %r10,%rdi            |_
         ea:	mov    $0x2f,%r10d          |
         f0:	sub    %rdi,%r10            | Sanitized addition. Both use r10
         f3:	or     %rdi,%r10            | but do not interfere with each
         f6:	neg    %r10                 | other. (Neither do these instructions
         f9:	sar    $0x3f,%r10           | interfere with the use of ax as temp
         fd:	and    %r10,%rdi            | in interpreter.)
        100:	add    %rax,%rdi            |_
        103:	mov    0x0(%rdi),%eax
       [...]
      
      Tested that it fixes Jann's reproducer, and also checked that test_verifier
      and test_progs suite with interpreter, JIT and JIT with hardening enabled
      on x86-64 and arm64 runs successfully.
      
        [0] Speculose: Analyzing the Security Implications of Speculative
            Execution in CPUs, Giorgi Maisuradze and Christian Rossow,
            https://arxiv.org/pdf/1801.04084.pdf
      
        [1] A Systematic Evaluation of Transient Execution Attacks and
            Defenses, Claudio Canella, Jo Van Bulck, Michael Schwarz,
            Moritz Lipp, Benjamin von Berg, Philipp Ortner, Frank Piessens,
            Dmitry Evtyushkin, Daniel Gruss,
            https://arxiv.org/pdf/1811.05441.pdf
      
      Fixes: b2157399 ("bpf: prevent out-of-bounds speculation")
      Reported-by: default avatarJann Horn <jannh@google.com>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      979d63d5
    • Daniel Borkmann's avatar
      bpf: fix check_map_access smin_value test when pointer contains offset · b7137c4e
      Daniel Borkmann authored
      In check_map_access() we probe actual bounds through __check_map_access()
      with offset of reg->smin_value + off for lower bound and offset of
      reg->umax_value + off for the upper bound. However, even though the
      reg->smin_value could have a negative value, the final result of the
      sum with off could be positive when pointer arithmetic with known and
      unknown scalars is combined. In this case we reject the program with
      an error such as "R<x> min value is negative, either use unsigned index
      or do a if (index >=0) check." even though the access itself would be
      fine. Therefore extend the check to probe whether the actual resulting
      reg->smin_value + off is less than zero.
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      b7137c4e
    • Daniel Borkmann's avatar
      bpf: restrict unknown scalars of mixed signed bounds for unprivileged · 9d7eceed
      Daniel Borkmann authored
      For unknown scalars of mixed signed bounds, meaning their smin_value is
      negative and their smax_value is positive, we need to reject arithmetic
      with pointer to map value. For unprivileged the goal is to mask every
      map pointer arithmetic and this cannot reliably be done when it is
      unknown at verification time whether the scalar value is negative or
      positive. Given this is a corner case, the likelihood of breaking should
      be very small.
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      9d7eceed
    • Daniel Borkmann's avatar
      bpf: restrict stack pointer arithmetic for unprivileged · e4298d25
      Daniel Borkmann authored
      Restrict stack pointer arithmetic for unprivileged users in that
      arithmetic itself must not go out of bounds as opposed to the actual
      access later on. Therefore after each adjust_ptr_min_max_vals() with
      a stack pointer as a destination we simulate a check_stack_access()
      of 1 byte on the destination and once that fails the program is
      rejected for unprivileged program loads. This is analog to map
      value pointer arithmetic and needed for masking later on.
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      e4298d25
    • Daniel Borkmann's avatar
      bpf: restrict map value pointer arithmetic for unprivileged · 0d6303db
      Daniel Borkmann authored
      Restrict map value pointer arithmetic for unprivileged users in that
      arithmetic itself must not go out of bounds as opposed to the actual
      access later on. Therefore after each adjust_ptr_min_max_vals() with a
      map value pointer as a destination it will simulate a check_map_access()
      of 1 byte on the destination and once that fails the program is rejected
      for unprivileged program loads. We use this later on for masking any
      pointer arithmetic with the remainder of the map value space. The
      likelihood of breaking any existing real-world unprivileged eBPF
      program is very small for this corner case.
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      0d6303db
    • Daniel Borkmann's avatar
      bpf: enable access to ax register also from verifier rewrite · 9b73bfdd
      Daniel Borkmann authored
      Right now we are using BPF ax register in JIT for constant blinding as
      well as in interpreter as temporary variable. Verifier will not be able
      to use it simply because its use will get overridden from the former in
      bpf_jit_blind_insn(). However, it can be made to work in that blinding
      will be skipped if there is prior use in either source or destination
      register on the instruction. Taking constraints of ax into account, the
      verifier is then open to use it in rewrites under some constraints. Note,
      ax register already has mappings in every eBPF JIT.
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      9b73bfdd
    • Daniel Borkmann's avatar
      bpf: move tmp variable into ax register in interpreter · 144cd91c
      Daniel Borkmann authored
      This change moves the on-stack 64 bit tmp variable in ___bpf_prog_run()
      into the hidden ax register. The latter is currently only used in JITs
      for constant blinding as a temporary scratch register, meaning the BPF
      interpreter will never see the use of ax. Therefore it is safe to use
      it for the cases where tmp has been used earlier. This is needed to later
      on allow restricted hidden use of ax in both interpreter and JITs.
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      144cd91c
    • Daniel Borkmann's avatar
      bpf: move {prev_,}insn_idx into verifier env · c08435ec
      Daniel Borkmann authored
      Move prev_insn_idx and insn_idx from the do_check() function into
      the verifier environment, so they can be read inside the various
      helper functions for handling the instructions. It's easier to put
      this into the environment rather than changing all call-sites only
      to pass it along. insn_idx is useful in particular since this later
      on allows to hold state in env->insn_aux_data[env->insn_idx].
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      c08435ec
  2. 02 Jan, 2019 4 commits
    • Eric Dumazet's avatar
      isdn: fix kernel-infoleak in capi_unlocked_ioctl · d63967e4
      Eric Dumazet authored
      Since capi_ioctl() copies 64 bytes after calling
      capi20_get_manufacturer() we need to ensure to not leak
      information to user.
      
      BUG: KMSAN: kernel-infoleak in _copy_to_user+0x16b/0x1f0 lib/usercopy.c:32
      CPU: 0 PID: 11245 Comm: syz-executor633 Not tainted 4.20.0-rc7+ #2
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       __dump_stack lib/dump_stack.c:77 [inline]
       dump_stack+0x173/0x1d0 lib/dump_stack.c:113
       kmsan_report+0x12e/0x2a0 mm/kmsan/kmsan.c:613
       kmsan_internal_check_memory+0x9d4/0xb00 mm/kmsan/kmsan.c:704
       kmsan_copy_to_user+0xab/0xc0 mm/kmsan/kmsan_hooks.c:601
       _copy_to_user+0x16b/0x1f0 lib/usercopy.c:32
       capi_ioctl include/linux/uaccess.h:177 [inline]
       capi_unlocked_ioctl+0x1a0b/0x1bf0 drivers/isdn/capi/capi.c:939
       do_vfs_ioctl+0xebd/0x2bf0 fs/ioctl.c:46
       ksys_ioctl fs/ioctl.c:713 [inline]
       __do_sys_ioctl fs/ioctl.c:720 [inline]
       __se_sys_ioctl+0x1da/0x270 fs/ioctl.c:718
       __x64_sys_ioctl+0x4a/0x70 fs/ioctl.c:718
       do_syscall_64+0xbc/0xf0 arch/x86/entry/common.c:291
       entry_SYSCALL_64_after_hwframe+0x63/0xe7
      RIP: 0033:0x440019
      Code: 18 89 d0 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 fb 13 fc ff c3 66 2e 0f 1f 84 00 00 00 00
      RSP: 002b:00007ffdd4659fb8 EFLAGS: 00000213 ORIG_RAX: 0000000000000010
      RAX: ffffffffffffffda RBX: 00000000004002c8 RCX: 0000000000440019
      RDX: 0000000020000080 RSI: 00000000c0044306 RDI: 0000000000000003
      RBP: 00000000006ca018 R08: 0000000000000000 R09: 00000000004002c8
      R10: 0000000000000000 R11: 0000000000000213 R12: 00000000004018a0
      R13: 0000000000401930 R14: 0000000000000000 R15: 0000000000000000
      
      Local variable description: ----data.i@capi_unlocked_ioctl
      Variable was created at:
       capi_ioctl drivers/isdn/capi/capi.c:747 [inline]
       capi_unlocked_ioctl+0x82/0x1bf0 drivers/isdn/capi/capi.c:939
       do_vfs_ioctl+0xebd/0x2bf0 fs/ioctl.c:46
      
      Bytes 12-63 of 64 are uninitialized
      Memory access of size 64 starts at ffff88807ac5fce8
      Data copied to user address 0000000020000080
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Reported-by: default avatarsyzbot <syzkaller@googlegroups.com>
      Cc: Karsten Keil <isdn@linux-pingi.de>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      d63967e4
    • Stefano Brivio's avatar
      ipv6: route: Fix return value of ip6_neigh_lookup() on neigh_create() error · 7adf3246
      Stefano Brivio authored
      In ip6_neigh_lookup(), we must not return errors coming from
      neigh_create(): if creation of a neighbour entry fails, the lookup should
      return NULL, in the same way as it's done in __neigh_lookup().
      
      Otherwise, callers legitimately checking for a non-NULL return value of
      the lookup function might dereference an invalid pointer.
      
      For instance, on neighbour table overflow, ndisc_router_discovery()
      crashes ndisc_update() by passing ERR_PTR(-ENOBUFS) as 'neigh' argument.
      Reported-by: default avatarJianlin Shi <jishi@redhat.com>
      Fixes: f8a1b43b ("net/ipv6: Create a neigh_lookup for FIB entries")
      Signed-off-by: default avatarStefano Brivio <sbrivio@redhat.com>
      Reviewed-by: default avatarDavid Ahern <dsahern@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      7adf3246
    • Eric Dumazet's avatar
      net/hamradio/6pack: use mod_timer() to rearm timers · 202700e3
      Eric Dumazet authored
      Using del_timer() + add_timer() is generally unsafe on SMP,
      as noticed by syzbot. Use mod_timer() instead.
      
      kernel BUG at kernel/time/timer.c:1136!
      invalid opcode: 0000 [#1] PREEMPT SMP KASAN
      CPU: 1 PID: 1026 Comm: kworker/u4:4 Not tainted 4.20.0+ #2
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Workqueue: events_unbound flush_to_ldisc
      RIP: 0010:add_timer kernel/time/timer.c:1136 [inline]
      RIP: 0010:add_timer+0xa81/0x1470 kernel/time/timer.c:1134
      Code: 4d 89 7d 40 48 c7 85 70 fe ff ff 00 00 00 00 c7 85 7c fe ff ff ff ff ff ff 48 89 85 90 fe ff ff e9 e6 f7 ff ff e8 cf 42 12 00 <0f> 0b e8 c8 42 12 00 0f 0b e8 c1 42 12 00 4c 89 bd 60 fe ff ff e9
      RSP: 0018:ffff8880a7fdf5a8 EFLAGS: 00010293
      RAX: ffff8880a7846340 RBX: dffffc0000000000 RCX: 0000000000000000
      RDX: 0000000000000000 RSI: ffffffff816f3ee1 RDI: ffff88808a514ff8
      RBP: ffff8880a7fdf760 R08: 0000000000000007 R09: ffff8880a7846c58
      R10: ffff8880a7846340 R11: 0000000000000000 R12: ffff88808a514ff8
      R13: ffff88808a514ff8 R14: ffff88808a514dc0 R15: 0000000000000030
      FS:  0000000000000000(0000) GS:ffff8880ae700000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 000000000061c500 CR3: 00000000994d9000 CR4: 00000000001406e0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Call Trace:
       decode_prio_command drivers/net/hamradio/6pack.c:903 [inline]
       sixpack_decode drivers/net/hamradio/6pack.c:971 [inline]
       sixpack_receive_buf drivers/net/hamradio/6pack.c:457 [inline]
       sixpack_receive_buf+0xf9c/0x1470 drivers/net/hamradio/6pack.c:434
       tty_ldisc_receive_buf+0x164/0x1c0 drivers/tty/tty_buffer.c:465
       tty_port_default_receive_buf+0x114/0x190 drivers/tty/tty_port.c:38
       receive_buf drivers/tty/tty_buffer.c:481 [inline]
       flush_to_ldisc+0x3b2/0x590 drivers/tty/tty_buffer.c:533
       process_one_work+0xd0c/0x1ce0 kernel/workqueue.c:2153
       worker_thread+0x143/0x14a0 kernel/workqueue.c:2296
       kthread+0x357/0x430 kernel/kthread.c:246
       ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:352
      
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Reported-by: default avatarsyzbot <syzkaller@googlegroups.com>
      Cc: Andreas Koensgen <ajk@comnets.uni-bremen.de>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      202700e3
    • Xue Chaojing's avatar
      net-next/hinic:add shutdown callback · 53fe3ed1
      Xue Chaojing authored
      If there is no shutdown callback, our board will report pcie UNF errors
      after restarting. This patch add shutdown callback for hinic.
      Signed-off-by: default avatarXue Chaojing <xuechaojing@huawei.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      53fe3ed1
  3. 01 Jan, 2019 8 commits
  4. 31 Dec, 2018 4 commits
    • Tyrel Datwyler's avatar
      ibmveth: fix DMA unmap error in ibmveth_xmit_start error path · 756af9c6
      Tyrel Datwyler authored
      Commit 33a48ab1 ("ibmveth: Fix DMA unmap error") fixed an issue in the
      normal code path of ibmveth_xmit_start() that was originally introduced by
      Commit 6e8ab30e ("ibmveth: Add scatter-gather support"). This original
      fix missed the error path where dma_unmap_page is wrongly called on the
      header portion in descs[0] which was mapped with dma_map_single. As a
      result a failure to DMA map any of the frags results in a dmesg warning
      when CONFIG_DMA_API_DEBUG is enabled.
      
      ------------[ cut here ]------------
      DMA-API: ibmveth 30000002: device driver frees DMA memory with wrong function
        [device address=0x000000000a430000] [size=172 bytes] [mapped as page] [unmapped as single]
      WARNING: CPU: 1 PID: 8426 at kernel/dma/debug.c:1085 check_unmap+0x4fc/0xe10
      ...
      <snip>
      ...
      DMA-API: Mapped at:
      ibmveth_start_xmit+0x30c/0xb60
      dev_hard_start_xmit+0x100/0x450
      sch_direct_xmit+0x224/0x490
      __qdisc_run+0x20c/0x980
      __dev_queue_xmit+0x1bc/0xf20
      
      This fixes the API misuse by unampping descs[0] with dma_unmap_single.
      
      Fixes: 6e8ab30e ("ibmveth: Add scatter-gather support")
      Signed-off-by: default avatarTyrel Datwyler <tyreld@linux.vnet.ibm.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      756af9c6
    • Heiner Kallweit's avatar
      r8169: fix WoL device wakeup enable · 3bd82645
      Heiner Kallweit authored
      In rtl8169_runtime_resume() we configure WoL but don't set the device
      to wakeup-enabled. This prevents PME generation once the cable is
      re-plugged. Fix this by moving the call to device_set_wakeup_enable()
      to __rtl8169_set_wol().
      
      Fixes: 433f9d0d ("r8169: improve saved_wolopts handling")
      Signed-off-by: default avatarHeiner Kallweit <hkallweit1@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      3bd82645
    • Cong Wang's avatar
      netrom: fix locking in nr_find_socket() · 7314f548
      Cong Wang authored
      nr_find_socket(), nr_find_peer() and nr_find_listener() lock the
      sock after finding it in the global list. However, the call path
      requires BH disabled for the sock lock consistently.
      
      Actually the locking is unnecessary at this point, we can just hold
      the sock refcnt to make sure it is not gone after we unlock the global
      list, and lock it later only when needed.
      
      Reported-and-tested-by: syzbot+f621cda8b7e598908efa@syzkaller.appspotmail.com
      Signed-off-by: default avatarCong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      7314f548
    • Cong Wang's avatar
      net/wan: fix a double free in x25_asy_open_tty() · d5c7c745
      Cong Wang authored
      When x25_asy_open() fails, it already cleans up by itself,
      so its caller doesn't need to free the memory again.
      
      It seems we still have to call x25_asy_free() to clear the SLF_INUSE
      bit, so just set these pointers to NULL after kfree().
      
      Reported-and-tested-by: syzbot+5e5e969e525129229052@syzkaller.appspotmail.com
      Fixes: 3b780bed ("x25_asy: Free x25_asy on x25_asy_open() failure.")
      Signed-off-by: default avatarCong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      d5c7c745
  5. 30 Dec, 2018 5 commits
  6. 29 Dec, 2018 7 commits
    • David S. Miller's avatar
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf · f7d18ef6
      David S. Miller authored
      Pablo Neira Ayuso says:
      
      ====================
      Netfilter fixes for net
      
      The following patchset contains Netfilter fixes for net, specifically
      fixes for the nf_conncount infrastructure which is causing troubles
      since 5c789e13 ("netfilter: nf_conncount: Add list lock and gc
      worker, and RCU for init tree search"). Patches aim to simplify this
      infrastructure while fixing up the problems:
      
      1) Use fixed size CONNCOUNT_SLOTS in nf_conncount, from Shawn Bohrer.
      
      2) Incorrect signedness in age calculation from find_or_evict(),
         from Florian Westphal.
      
      3) Proper locking for the garbage collector workqueue callback,
         first make a patch to count how many nodes can be collected
         without holding locks, then grab lock and release them. Also
         from Florian.
      
      4) Restart node lookup from the insertion path, after releasing nodes
         via packet path garbage collection. Shawn Bohrer described a scenario
         that may result in inserting a connection in an already dead list
         node. Patch from Florian.
      
      5) Merge lookup and add function to avoid a hold release and re-grab.
         From Florian.
      
      6) Be safe and iterate over the node lists under the spinlock.
      
      7) Speculative list nodes removal via garbage collection, check if
         list node got a connection while it was scheduled for deletion
         via gc.
      
      8) Accidental argument swap in find_next_bit() that leads to more
         frequent scheduling of the workqueue. From Florian Westphal.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      f7d18ef6
    • Scott Wood's avatar
      fsl/fman: Use GFP_ATOMIC in {memac,tgec}_add_hash_mac_address() · 0d9c9a23
      Scott Wood authored
      These functions are called from atomic context:
      
      [    9.150239] BUG: sleeping function called from invalid context at /home/scott/git/linux/mm/slab.h:421
      [    9.158159] in_atomic(): 1, irqs_disabled(): 0, pid: 4432, name: ip
      [    9.163128] CPU: 8 PID: 4432 Comm: ip Not tainted 4.20.0-rc2-00169-g63d86876 #29
      [    9.163130] Call Trace:
      [    9.170701] [c0000002e899a980] [c0000000009c1068] .dump_stack+0xa8/0xec (unreliable)
      [    9.177140] [c0000002e899aa10] [c00000000007a7b4] .___might_sleep+0x138/0x164
      [    9.184440] [c0000002e899aa80] [c0000000001d5bac] .kmem_cache_alloc_trace+0x238/0x30c
      [    9.191216] [c0000002e899ab40] [c00000000065ea1c] .memac_add_hash_mac_address+0x104/0x198
      [    9.199464] [c0000002e899abd0] [c00000000065a788] .set_multi+0x1c8/0x218
      [    9.206242] [c0000002e899ac80] [c0000000006615ec] .dpaa_set_rx_mode+0xdc/0x17c
      [    9.213544] [c0000002e899ad00] [c00000000083d2b0] .__dev_set_rx_mode+0x80/0xd4
      [    9.219535] [c0000002e899ad90] [c00000000083d334] .dev_set_rx_mode+0x30/0x54
      [    9.225271] [c0000002e899ae10] [c00000000083d4a0] .__dev_open+0x148/0x1c8
      [    9.230751] [c0000002e899aeb0] [c00000000083d934] .__dev_change_flags+0x19c/0x1e0
      [    9.230755] [c0000002e899af60] [c00000000083d9a4] .dev_change_flags+0x2c/0x80
      [    9.242752] [c0000002e899aff0] [c0000000008554ec] .do_setlink+0x350/0xf08
      [    9.248228] [c0000002e899b170] [c000000000857ad0] .rtnl_newlink+0x588/0x7e0
      [    9.253965] [c0000002e899b740] [c000000000852424] .rtnetlink_rcv_msg+0x3e0/0x498
      [    9.261440] [c0000002e899b820] [c000000000884790] .netlink_rcv_skb+0x134/0x14c
      [    9.267607] [c0000002e899b8e0] [c000000000851840] .rtnetlink_rcv+0x18/0x2c
      [    9.274558] [c0000002e899b950] [c000000000883c8c] .netlink_unicast+0x214/0x318
      [    9.281163] [c0000002e899ba00] [c000000000884220] .netlink_sendmsg+0x348/0x444
      [    9.287076] [c0000002e899bae0] [c00000000080d13c] .sock_sendmsg+0x2c/0x54
      [    9.287080] [c0000002e899bb50] [c0000000008106c0] .___sys_sendmsg+0x2d0/0x2d8
      [    9.298375] [c0000002e899bd30] [c000000000811a80] .__sys_sendmsg+0x5c/0xb0
      [    9.303939] [c0000002e899be20] [c0000000000006b0] system_call+0x60/0x6c
      Signed-off-by: default avatarScott Wood <oss@buserror.net>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      0d9c9a23
    • Jia-Ju Bai's avatar
      isdn: hisax: hfc_pci: Fix a possible concurrency use-after-free bug in HFCPCI_l1hw() · 7418e652
      Jia-Ju Bai authored
      In drivers/isdn/hisax/hfc_pci.c, the functions hfcpci_interrupt() and
      HFCPCI_l1hw() may be concurrently executed.
      
      HFCPCI_l1hw()
        line 1173: if (!cs->tx_skb)
      
      hfcpci_interrupt()
        line 942: spin_lock_irqsave();
        line 1066: dev_kfree_skb_irq(cs->tx_skb);
      
      Thus, a possible concurrency use-after-free bug may occur
      in HFCPCI_l1hw().
      
      To fix these bugs, the calls to spin_lock_irqsave() and
      spin_unlock_irqrestore() are added in HFCPCI_l1hw(), to protect the
      access to cs->tx_skb.
      Signed-off-by: default avatarJia-Ju Bai <baijiaju1990@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      7418e652
    • Yunsheng Lin's avatar
      ethtool: check the return value of get_regs_len · f9fc54d3
      Yunsheng Lin authored
      The return type for get_regs_len in struct ethtool_ops is int,
      the hns3 driver may return error when failing to get the regs
      len by sending cmd to firmware.
      Signed-off-by: default avatarYunsheng Lin <linyunsheng@huawei.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      f9fc54d3
    • Florian Westphal's avatar
      netfilter: nf_conncount: fix argument order to find_next_bit · a0072320
      Florian Westphal authored
      Size and 'next bit' were swapped, this bug could cause worker to
      reschedule itself even if system was idle.
      
      Fixes: 5c789e13 ("netfilter: nf_conncount: Add list lock and gc worker, and RCU for init tree search")
      Reviewed-by: default avatarShawn Bohrer <sbohrer@cloudflare.com>
      Signed-off-by: default avatarFlorian Westphal <fw@strlen.de>
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      a0072320
    • Pablo Neira Ayuso's avatar
      netfilter: nf_conncount: speculative garbage collection on empty lists · c80f10bc
      Pablo Neira Ayuso authored
      Instead of removing a empty list node that might be reintroduced soon
      thereafter, tentatively place the empty list node on the list passed to
      tree_nodes_free(), then re-check if the list is empty again before erasing
      it from the tree.
      
      [ Florian: rebase on top of pending nf_conncount fixes ]
      
      Fixes: 5c789e13 ("netfilter: nf_conncount: Add list lock and gc worker, and RCU for init tree search")
      Reviewed-by: default avatarShawn Bohrer <sbohrer@cloudflare.com>
      Signed-off-by: default avatarFlorian Westphal <fw@strlen.de>
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      c80f10bc
    • Pablo Neira Ayuso's avatar
      netfilter: nf_conncount: move all list iterations under spinlock · 2f971a8f
      Pablo Neira Ayuso authored
      Two CPUs may race to remove a connection from the list, the existing
      conn->dead will result in a use-after-free. Use the per-list spinlock to
      protect list iterations.
      
      As all accesses to the list now happen while holding the per-list lock,
      we no longer need to delay free operations with rcu.
      
      Joint work with Florian.
      
      Fixes: 5c789e13 ("netfilter: nf_conncount: Add list lock and gc worker, and RCU for init tree search")
      Reviewed-by: default avatarShawn Bohrer <sbohrer@cloudflare.com>
      Signed-off-by: default avatarFlorian Westphal <fw@strlen.de>
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      2f971a8f