1. 21 Dec, 2017 19 commits
    • Quentin Monnet's avatar
      selftests/bpf: fix Makefile for passing LLC to the command line · cd95a892
      Quentin Monnet authored
      Makefile has a LLC variable that is initialised to "llc", but can
      theoretically be overridden from the command line ("make LLC=llc-6.0").
      However, this fails because for LLVM probe check, "llc" is called
      directly. Use the $(LLC) variable instead to fix this.
      
      Fixes: 22c88526 ("bpf: improve selftests and add tests for meta pointer")
      Signed-off-by: default avatarQuentin Monnet <quentin.monnet@netronome.com>
      Signed-off-by: default avatarJakub Kicinski <jakub.kicinski@netronome.com>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      cd95a892
    • David S. Miller's avatar
      Merge branch 'net-zerocopy-fixes' · c50b7c47
      David S. Miller authored
      Saeed Mahameed says:
      
      ===================
      Mellanox, mlx5 fixes 2017-12-19
      
      The follwoing series includes some fixes for mlx5 core and etherent
      driver.
      
      Please pull and let me know if there is any problem.
      
      This series doesn't introduce any conflict with the ongoing mlx5 for-next
      submission.
      
      For -stable:
      
      kernels >= v4.7.y
          ("net/mlx5e: Fix possible deadlock of VXLAN lock")
          ("net/mlx5e: Add refcount to VXLAN structure")
          ("net/mlx5e: Prevent possible races in VXLAN control flow")
          ("net/mlx5e: Fix features check of IPv6 traffic")
      
      kernels >= v4.9.y
          ("net/mlx5: Fix error flow in CREATE_QP command")
          ("net/mlx5: Fix rate limit packet pacing naming and struct")
      
      kernels >= v4.13.y
          ("net/mlx5: FPGA, return -EINVAL if size is zero")
      
      kernels >= v4.14.y
          ("Revert "mlx5: move affinity hints assignments to generic code")
      
      All above patches apply and compile with no issues on corresponding -stable.
      ===================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      c50b7c47
    • Willem de Bruijn's avatar
      skbuff: skb_copy_ubufs must release uarg even without user frags · b90ddd56
      Willem de Bruijn authored
      skb_copy_ubufs creates a private copy of frags[] to release its hold
      on user frags, then calls uarg->callback to notify the owner.
      
      Call uarg->callback even when no frags exist. This edge case can
      happen when zerocopy_sg_from_iter finds enough room in skb_headlen
      to copy all the data.
      
      Fixes: 3ece7826 ("sock: skb_copy_ubufs support for compound pages")
      Signed-off-by: default avatarWillem de Bruijn <willemb@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      b90ddd56
    • Willem de Bruijn's avatar
      skbuff: orphan frags before zerocopy clone · 268b7906
      Willem de Bruijn authored
      Call skb_zerocopy_clone after skb_orphan_frags, to avoid duplicate
      calls to skb_uarg(skb)->callback for the same data.
      
      skb_zerocopy_clone associates skb_shinfo(skb)->uarg from frag_skb
      with each segment. This is only safe for uargs that do refcounting,
      which is those that pass skb_orphan_frags without dropping their
      shared frags. For others, skb_orphan_frags drops the user frags and
      sets the uarg to NULL, after which sock_zerocopy_clone has no effect.
      
      Qemu hangs were reported due to duplicate vhost_net_zerocopy_callback
      calls for the same data causing the vhost_net_ubuf_ref_>refcount to
      drop below zero.
      
      Link: http://lkml.kernel.org/r/<CAF=yD-LWyCD4Y0aJ9O0e_CHLR+3JOeKicRRTEVCPxgw4XOcqGQ@mail.gmail.com>
      Fixes: 1f8b977a ("sock: enable MSG_ZEROCOPY")
      Reported-by: default avatarAndreas Hartmann <andihartmann@01019freenet.de>
      Reported-by: default avatarDavid Hill <dhill@redhat.com>
      Signed-off-by: default avatarWillem de Bruijn <willemb@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      268b7906
    • Shaohua Li's avatar
      net: reevalulate autoflowlabel setting after sysctl setting · 513674b5
      Shaohua Li authored
      sysctl.ip6.auto_flowlabels is default 1. In our hosts, we set it to 2.
      If sockopt doesn't set autoflowlabel, outcome packets from the hosts are
      supposed to not include flowlabel. This is true for normal packet, but
      not for reset packet.
      
      The reason is ipv6_pinfo.autoflowlabel is set in sock creation. Later if
      we change sysctl.ip6.auto_flowlabels, the ipv6_pinfo.autoflowlabel isn't
      changed, so the sock will keep the old behavior in terms of auto
      flowlabel. Reset packet is suffering from this problem, because reset
      packet is sent from a special control socket, which is created at boot
      time. Since sysctl.ipv6.auto_flowlabels is 1 by default, the control
      socket will always have its ipv6_pinfo.autoflowlabel set, even after
      user set sysctl.ipv6.auto_flowlabels to 1, so reset packset will always
      have flowlabel. Normal sock created before sysctl setting suffers from
      the same issue. We can't even turn off autoflowlabel unless we kill all
      socks in the hosts.
      
      To fix this, if IPV6_AUTOFLOWLABEL sockopt is used, we use the
      autoflowlabel setting from user, otherwise we always call
      ip6_default_np_autolabel() which has the new settings of sysctl.
      
      Note, this changes behavior a little bit. Before commit 42240901
      (ipv6: Implement different admin modes for automatic flow labels), the
      autoflowlabel behavior of a sock isn't sticky, eg, if sysctl changes,
      existing connection will change autoflowlabel behavior. After that
      commit, autoflowlabel behavior is sticky in the whole life of the sock.
      With this patch, the behavior isn't sticky again.
      
      Cc: Martin KaFai Lau <kafai@fb.com>
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: Tom Herbert <tom@quantonium.net>
      Signed-off-by: default avatarShaohua Li <shli@fb.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      513674b5
    • Eric Garver's avatar
      openvswitch: Fix pop_vlan action for double tagged frames · c48e7473
      Eric Garver authored
      skb_vlan_pop() expects skb->protocol to be a valid TPID for double
      tagged frames. So set skb->protocol to the TPID and let skb_vlan_pop()
      shift the true ethertype into position for us.
      
      Fixes: 5108bbad ("openvswitch: add processing of L3 packets")
      Signed-off-by: default avatarEric Garver <e@erig.me>
      Reviewed-by: default avatarJiri Benc <jbenc@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      c48e7473
    • Ido Schimmel's avatar
      ipv6: Honor specified parameters in fibmatch lookup · 58acfd71
      Ido Schimmel authored
      Currently, parameters such as oif and source address are not taken into
      account during fibmatch lookup. Example (IPv4 for reference) before
      patch:
      
      $ ip -4 route show
      192.0.2.0/24 dev dummy0 proto kernel scope link src 192.0.2.1
      198.51.100.0/24 dev dummy1 proto kernel scope link src 198.51.100.1
      
      $ ip -6 route show
      2001:db8:1::/64 dev dummy0 proto kernel metric 256 pref medium
      2001:db8:2::/64 dev dummy1 proto kernel metric 256 pref medium
      fe80::/64 dev dummy0 proto kernel metric 256 pref medium
      fe80::/64 dev dummy1 proto kernel metric 256 pref medium
      
      $ ip -4 route get fibmatch 192.0.2.2 oif dummy0
      192.0.2.0/24 dev dummy0 proto kernel scope link src 192.0.2.1
      $ ip -4 route get fibmatch 192.0.2.2 oif dummy1
      RTNETLINK answers: No route to host
      
      $ ip -6 route get fibmatch 2001:db8:1::2 oif dummy0
      2001:db8:1::/64 dev dummy0 proto kernel metric 256 pref medium
      $ ip -6 route get fibmatch 2001:db8:1::2 oif dummy1
      2001:db8:1::/64 dev dummy0 proto kernel metric 256 pref medium
      
      After:
      
      $ ip -6 route get fibmatch 2001:db8:1::2 oif dummy0
      2001:db8:1::/64 dev dummy0 proto kernel metric 256 pref medium
      $ ip -6 route get fibmatch 2001:db8:1::2 oif dummy1
      RTNETLINK answers: Network is unreachable
      
      The problem stems from the fact that the necessary route lookup flags
      are not set based on these parameters.
      
      Instead of duplicating the same logic for fibmatch, we can simply
      resolve the original route from its copy and dump it instead.
      
      Fixes: 18c3a61c ("net: ipv6: RTM_GETROUTE: return matched fib result when requested")
      Signed-off-by: default avatarIdo Schimmel <idosch@mellanox.com>
      Acked-by: default avatarDavid Ahern <dsahern@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      58acfd71
    • David S. Miller's avatar
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf · 8b6ca2bf
      David S. Miller authored
      Daniel Borkmann says:
      
      ====================
      pull-request: bpf 2017-12-21
      
      The following pull-request contains BPF updates for your *net* tree.
      
      The main changes are:
      
      1) Fix multiple security issues in the BPF verifier mostly related
         to the value and min/max bounds tracking rework in 4.14. Issues
         range from incorrect bounds calculation in some BPF_RSH cases,
         to improper sign extension and reg size handling on 32 bit
         ALU ops, missing strict alignment checks on stack pointers, and
         several others that got fixed, from Jann, Alexei and Edward.
      
      2) Fix various build failures in BPF selftests on sparc64. More
         specifically, librt needed to be added to the libs to link
         against and few format string fixups for sizeof, from David.
      
      3) Fix one last remaining issue from BPF selftest build that was
         still occuring on s390x from the asm/bpf_perf_event.h include
         which could not find the asm/ptrace.h copy, from Hendrik.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      8b6ca2bf
    • Alexei Starovoitov's avatar
      bpf: do not allow root to mangle valid pointers · 82abbf8d
      Alexei Starovoitov authored
      Do not allow root to convert valid pointers into unknown scalars.
      In particular disallow:
       ptr &= reg
       ptr <<= reg
       ptr += ptr
      and explicitly allow:
       ptr -= ptr
      since pkt_end - pkt == length
      
      1.
      This minimizes amount of address leaks root can do.
      In the future may need to further tighten the leaks with kptr_restrict.
      
      2.
      If program has such pointer math it's likely a user mistake and
      when verifier complains about it right away instead of many instructions
      later on invalid memory access it's easier for users to fix their progs.
      
      3.
      when register holding a pointer cannot change to scalar it allows JITs to
      optimize better. Like 32-bit archs could use single register for pointers
      instead of a pair required to hold 64-bit scalars.
      
      4.
      reduces architecture dependent behavior. Since code:
      r1 = r10;
      r1 &= 0xff;
      if (r1 ...)
      will behave differently arm64 vs x64 and offloaded vs native.
      
      A significant chunk of ptr mangling was allowed by
      commit f1174f77 ("bpf/verifier: rework value tracking")
      yet some of it was allowed even earlier.
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      82abbf8d
    • Daniel Borkmann's avatar
      Merge branch 'bpf-verifier-sec-fixes' · 3db9128f
      Daniel Borkmann authored
      Alexei Starovoitov says:
      
      ====================
      This patch set addresses a set of security vulnerabilities
      in bpf verifier logic discovered by Jann Horn.
      All of the patches are candidates for 4.14 stable.
      ====================
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      3db9128f
    • Jann Horn's avatar
      selftests/bpf: add tests for recent bugfixes · 2255f8d5
      Jann Horn authored
      These tests should cover the following cases:
      
       - MOV with both zero-extended and sign-extended immediates
       - implicit truncation of register contents via ALU32/MOV32
       - implicit 32-bit truncation of ALU32 output
       - oversized register source operand for ALU32 shift
       - right-shift of a number that could be positive or negative
       - map access where adding the operation size to the offset causes signed
         32-bit overflow
       - direct stack access at a ~4GiB offset
      
      Also remove the F_LOAD_WITH_STRICT_ALIGNMENT flag from a bunch of tests
      that should fail independent of what flags userspace passes.
      Signed-off-by: default avatarJann Horn <jannh@google.com>
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      2255f8d5
    • Alexei Starovoitov's avatar
      bpf: fix integer overflows · bb7f0f98
      Alexei Starovoitov authored
      There were various issues related to the limited size of integers used in
      the verifier:
       - `off + size` overflow in __check_map_access()
       - `off + reg->off` overflow in check_mem_access()
       - `off + reg->var_off.value` overflow or 32-bit truncation of
         `reg->var_off.value` in check_mem_access()
       - 32-bit truncation in check_stack_boundary()
      
      Make sure that any integer math cannot overflow by not allowing
      pointer math with large values.
      
      Also reduce the scope of "scalar op scalar" tracking.
      
      Fixes: f1174f77 ("bpf/verifier: rework value tracking")
      Reported-by: default avatarJann Horn <jannh@google.com>
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      bb7f0f98
    • Jann Horn's avatar
      bpf: don't prune branches when a scalar is replaced with a pointer · 179d1c56
      Jann Horn authored
      This could be made safe by passing through a reference to env and checking
      for env->allow_ptr_leaks, but it would only work one way and is probably
      not worth the hassle - not doing it will not directly lead to program
      rejection.
      
      Fixes: f1174f77 ("bpf/verifier: rework value tracking")
      Signed-off-by: default avatarJann Horn <jannh@google.com>
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      179d1c56
    • Jann Horn's avatar
      bpf: force strict alignment checks for stack pointers · a5ec6ae1
      Jann Horn authored
      Force strict alignment checks for stack pointers because the tracking of
      stack spills relies on it; unaligned stack accesses can lead to corruption
      of spilled registers, which is exploitable.
      
      Fixes: f1174f77 ("bpf/verifier: rework value tracking")
      Signed-off-by: default avatarJann Horn <jannh@google.com>
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      a5ec6ae1
    • Jann Horn's avatar
      bpf: fix missing error return in check_stack_boundary() · ea25f914
      Jann Horn authored
      Prevent indirect stack accesses at non-constant addresses, which would
      permit reading and corrupting spilled pointers.
      
      Fixes: f1174f77 ("bpf/verifier: rework value tracking")
      Signed-off-by: default avatarJann Horn <jannh@google.com>
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      ea25f914
    • Jann Horn's avatar
      bpf: fix 32-bit ALU op verification · 468f6eaf
      Jann Horn authored
      32-bit ALU ops operate on 32-bit values and have 32-bit outputs.
      Adjust the verifier accordingly.
      
      Fixes: f1174f77 ("bpf/verifier: rework value tracking")
      Signed-off-by: default avatarJann Horn <jannh@google.com>
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      468f6eaf
    • Jann Horn's avatar
      bpf: fix incorrect tracking of register size truncation · 0c17d1d2
      Jann Horn authored
      Properly handle register truncation to a smaller size.
      
      The old code first mirrors the clearing of the high 32 bits in the bitwise
      tristate representation, which is correct. But then, it computes the new
      arithmetic bounds as the intersection between the old arithmetic bounds and
      the bounds resulting from the bitwise tristate representation. Therefore,
      when coerce_reg_to_32() is called on a number with bounds
      [0xffff'fff8, 0x1'0000'0007], the verifier computes
      [0xffff'fff8, 0xffff'ffff] as bounds of the truncated number.
      This is incorrect: The truncated number could also be in the range [0, 7],
      and no meaningful arithmetic bounds can be computed in that case apart from
      the obvious [0, 0xffff'ffff].
      
      Starting with v4.14, this is exploitable by unprivileged users as long as
      the unprivileged_bpf_disabled sysctl isn't set.
      
      Debian assigned CVE-2017-16996 for this issue.
      
      v2:
       - flip the mask during arithmetic bounds calculation (Ben Hutchings)
      v3:
       - add CVE number (Ben Hutchings)
      
      Fixes: b03c9f9f ("bpf/verifier: track signed and unsigned min/max values")
      Signed-off-by: default avatarJann Horn <jannh@google.com>
      Acked-by: default avatarEdward Cree <ecree@solarflare.com>
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      0c17d1d2
    • Jann Horn's avatar
      bpf: fix incorrect sign extension in check_alu_op() · 95a762e2
      Jann Horn authored
      Distinguish between
      BPF_ALU64|BPF_MOV|BPF_K (load 32-bit immediate, sign-extended to 64-bit)
      and BPF_ALU|BPF_MOV|BPF_K (load 32-bit immediate, zero-padded to 64-bit);
      only perform sign extension in the first case.
      
      Starting with v4.14, this is exploitable by unprivileged users as long as
      the unprivileged_bpf_disabled sysctl isn't set.
      
      Debian assigned CVE-2017-16995 for this issue.
      
      v3:
       - add CVE number (Ben Hutchings)
      
      Fixes: 48461135 ("bpf: allow access into map value arrays")
      Signed-off-by: default avatarJann Horn <jannh@google.com>
      Acked-by: default avatarEdward Cree <ecree@solarflare.com>
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      95a762e2
    • Edward Cree's avatar
      bpf/verifier: fix bounds calculation on BPF_RSH · 4374f256
      Edward Cree authored
      Incorrect signed bounds were being computed.
      If the old upper signed bound was positive and the old lower signed bound was
      negative, this could cause the new upper signed bound to be too low,
      leading to security issues.
      
      Fixes: b03c9f9f ("bpf/verifier: track signed and unsigned min/max values")
      Reported-by: default avatarJann Horn <jannh@google.com>
      Signed-off-by: default avatarEdward Cree <ecree@solarflare.com>
      Acked-by: default avatarAlexei Starovoitov <ast@kernel.org>
      [jannh@google.com: changed description to reflect bug impact]
      Signed-off-by: default avatarJann Horn <jannh@google.com>
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      4374f256
  2. 20 Dec, 2017 13 commits
  3. 19 Dec, 2017 8 commits
    • David Miller's avatar
      bpf: Fix tools and testing build. · 19c832ed
      David Miller authored
      I'm getting various build failures on sparc64.  The key is
      usually that the userland tools get built 32-bit.
      
      1) clock_gettime() is in librt, so that must be added to the link
         libraries.
      
      2) "sizeof(x)" must be printed with "%Z" printf prefix.
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      19c832ed
    • Moshe Shemesh's avatar
      net/mlx5: Stay in polling mode when command EQ destroy fails · a2fba188
      Moshe Shemesh authored
      During unload, on mlx5_stop_eqs we move command interface from events
      mode to polling mode, but if command interface EQ destroy fail we move
      back to events mode.
      That's wrong since even if we fail to destroy command interface EQ, we
      do release its irq, so no interrupts will be received.
      
      Fixes: e126ba97 ("mlx5: Add driver for Mellanox Connect-IB adapters")
      Signed-off-by: default avatarMoshe Shemesh <moshe@mellanox.com>
      Signed-off-by: default avatarSaeed Mahameed <saeedm@mellanox.com>
      a2fba188
    • Moshe Shemesh's avatar
      net/mlx5: Cleanup IRQs in case of unload failure · d6b2785c
      Moshe Shemesh authored
      When mlx5_stop_eqs fails to destroy any of the eqs it returns with an error.
      In such failure flow the function will return without
      releasing all EQs irqs and then pci_free_irq_vectors will fail.
      Fix by only warn on destroy EQ failure and continue to release other
      EQs and their irqs.
      
      It fixes the following kernel trace:
      kernel: kernel BUG at drivers/pci/msi.c:352!
      ...
      ...
      kernel: Call Trace:
      kernel: pci_disable_msix+0xd3/0x100
      kernel: pci_free_irq_vectors+0xe/0x20
      kernel: mlx5_load_one.isra.17+0x9f5/0xec0 [mlx5_core]
      
      Fixes: e126ba97 ("mlx5: Add driver for Mellanox Connect-IB adapters")
      Signed-off-by: default avatarMoshe Shemesh <moshe@mellanox.com>
      Signed-off-by: default avatarSaeed Mahameed <saeedm@mellanox.com>
      d6b2785c
    • Maor Gottlieb's avatar
      net/mlx5: Fix steering memory leak · 139ed6c6
      Maor Gottlieb authored
      Flow steering priority and namespace are software only objects that
      didn't have the proper destructors and were not freed during steering
      cleanup.
      
      Fix it by adding destructor functions for these objects.
      
      Fixes: bd71b08e ("net/mlx5: Support multiple updates of steering rules in parallel")
      Signed-off-by: default avatarMaor Gottlieb <maorg@mellanox.com>
      Signed-off-by: default avatarSaeed Mahameed <saeedm@mellanox.com>
      139ed6c6
    • Gal Pressman's avatar
      net/mlx5e: Prevent possible races in VXLAN control flow · 0c1cc8b2
      Gal Pressman authored
      When calling add/remove VXLAN port, a lock must be held in order to
      prevent race scenarios when more than one add/remove happens at the
      same time.
      Fix by holding our state_lock (mutex) as done by all other parts of the
      driver.
      Note that the spinlock protecting the radix-tree is still needed in
      order to synchronize radix-tree access from softirq context.
      
      Fixes: b3f63c3d ("net/mlx5e: Add netdev support for VXLAN tunneling")
      Signed-off-by: default avatarGal Pressman <galp@mellanox.com>
      Signed-off-by: default avatarSaeed Mahameed <saeedm@mellanox.com>
      0c1cc8b2
    • Gal Pressman's avatar
      net/mlx5e: Add refcount to VXLAN structure · 23f4cc2c
      Gal Pressman authored
      A refcount mechanism must be implemented in order to prevent unwanted
      scenarios such as:
      - Open an IPv4 VXLAN interface
      - Open an IPv6 VXLAN interface (different socket)
      - Remove one of the interfaces
      
      With current implementation, the UDP port will be removed from our VXLAN
      database and turn off the offloads for the other interface, which is
      still active.
      The reference count mechanism will only allow UDP port removals once all
      consumers are gone.
      
      Fixes: b3f63c3d ("net/mlx5e: Add netdev support for VXLAN tunneling")
      Signed-off-by: default avatarGal Pressman <galp@mellanox.com>
      Signed-off-by: default avatarSaeed Mahameed <saeedm@mellanox.com>
      23f4cc2c
    • Gal Pressman's avatar
      net/mlx5e: Fix possible deadlock of VXLAN lock · 63235141
      Gal Pressman authored
      mlx5e_vxlan_lookup_port is called both from mlx5e_add_vxlan_port (user
      context) and mlx5e_features_check (softirq), but the lock acquired does
      not disable bottom half and might result in deadlock. Fix it by simply
      replacing spin_lock() with spin_lock_bh().
      While at it, replace all unnecessary spin_lock_irq() to spin_lock_bh().
      
      lockdep's WARNING: inconsistent lock state
      [  654.028136] inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
      [  654.028229] swapper/5/0 [HC0[0]:SC1[9]:HE1:SE0] takes:
      [  654.028321]  (&(&vxlan_db->lock)->rlock){+.?.}, at: [<ffffffffa06e7f0e>] mlx5e_vxlan_lookup_port+0x1e/0x50 [mlx5_core]
      [  654.028528] {SOFTIRQ-ON-W} state was registered at:
      [  654.028607]   _raw_spin_lock+0x3c/0x70
      [  654.028689]   mlx5e_vxlan_lookup_port+0x1e/0x50 [mlx5_core]
      [  654.028794]   mlx5e_vxlan_add_port+0x2e/0x120 [mlx5_core]
      [  654.028878]   process_one_work+0x1e9/0x640
      [  654.028942]   worker_thread+0x4a/0x3f0
      [  654.029002]   kthread+0x141/0x180
      [  654.029056]   ret_from_fork+0x24/0x30
      [  654.029114] irq event stamp: 579088
      [  654.029174] hardirqs last  enabled at (579088): [<ffffffff818f475a>] ip6_finish_output2+0x49a/0x8c0
      [  654.029309] hardirqs last disabled at (579087): [<ffffffff818f470e>] ip6_finish_output2+0x44e/0x8c0
      [  654.029446] softirqs last  enabled at (579030): [<ffffffff810b3b3d>] irq_enter+0x6d/0x80
      [  654.029567] softirqs last disabled at (579031): [<ffffffff810b3c05>] irq_exit+0xb5/0xc0
      [  654.029684] other info that might help us debug this:
      [  654.029781]  Possible unsafe locking scenario:
      
      [  654.029868]        CPU0
      [  654.029908]        ----
      [  654.029947]   lock(&(&vxlan_db->lock)->rlock);
      [  654.030045]   <Interrupt>
      [  654.030090]     lock(&(&vxlan_db->lock)->rlock);
      [  654.030162]
       *** DEADLOCK ***
      
      Fixes: b3f63c3d ("net/mlx5e: Add netdev support for VXLAN tunneling")
      Signed-off-by: default avatarGal Pressman <galp@mellanox.com>
      Signed-off-by: default avatarSaeed Mahameed <saeedm@mellanox.com>
      63235141
    • Moni Shoua's avatar
      net/mlx5: Fix error flow in CREATE_QP command · dbff26e4
      Moni Shoua authored
      In error flow, when DESTROY_QP command should be executed, the wrong
      mailbox was set with data, not the one that is written to hardware,
      Fix that.
      
      Fixes: 09a7d9ec '{net,IB}/mlx5: QP/XRCD commands via mlx5 ifc'
      Signed-off-by: default avatarMoni Shoua <monis@mellanox.com>
      Signed-off-by: default avatarSaeed Mahameed <saeedm@mellanox.com>
      dbff26e4