1. 01 Dec, 2018 1 commit
    • Daniel Borkmann's avatar
      bpf: fix pointer offsets in context for 32 bit · b7df9ada
      Daniel Borkmann authored
      Currently, pointer offsets in three BPF context structures are
      broken in two scenarios: i) 32 bit compiled applications running
      on 64 bit kernels, and ii) LLVM compiled BPF programs running
      on 32 bit kernels. The latter is due to BPF target machine being
      strictly 64 bit. So in each of the cases the offsets will mismatch
      in verifier when checking / rewriting context access. Fix this by
      providing a helper macro __bpf_md_ptr() that will enforce padding
      up to 64 bit and proper alignment, and for context access a macro
      bpf_ctx_range_ptr() which will cover full 64 bit member range on
      32 bit archs. For flow_keys, we additionally need to force the
      size check to sizeof(__u64) as with other pointer types.
      
      Fixes: d58e468b ("flow_dissector: implements flow dissector BPF hook")
      Fixes: 4f738adb ("bpf: create tcp_bpf_ulp allowing BPF to monitor socket TX/RX data")
      Fixes: 2dbb9b9e ("bpf: Introduce BPF_PROG_TYPE_SK_REUSEPORT")
      Reported-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: default avatarDavid S. Miller <davem@davemloft.net>
      Tested-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      b7df9ada
  2. 30 Nov, 2018 1 commit
  3. 29 Nov, 2018 6 commits
    • Yonghong Song's avatar
      tools: bpftool: fix a bitfield pretty print issue · 528bff0c
      Yonghong Song authored
      Commit b12d6ec0 ("bpf: btf: add btf print functionality")
      added btf pretty print functionality to bpftool.
      There is a problem though in printing a bitfield whose type
      has modifiers.
      
      For example, for a type like
        typedef int ___int;
        struct tmp_t {
                int a:3;
                ___int b:3;
        };
      Suppose we have a map
        struct bpf_map_def SEC("maps") tmpmap = {
                .type = BPF_MAP_TYPE_HASH,
                .key_size = sizeof(__u32),
                .value_size = sizeof(struct tmp_t),
                .max_entries = 1,
        };
      and the hash table is populated with one element with
      key 0 and value (.a = 1 and .b = 2).
      
      In BTF, the struct member "b" will have a type "typedef" which
      points to an int type. The current implementation does not
      pass the bit offset during transition from typedef to int type,
      hence incorrectly print the value as
        $ bpftool m d id 79
        [{
                "key": 0,
                "value": {
                    "a": 0x1,
                    "b": 0x1
                }
            }
        ]
      
      This patch fixed the issue by carrying bit_offset along the type
      chain during bit_field print. The correct result can be printed as
        $ bpftool m d id 76
        [{
                "key": 0,
                "value": {
                    "a": 0x1,
                    "b": 0x2
                }
            }
        ]
      
      The kernel pretty print is implemented correctly and does not
      have this issue.
      
      Fixes: b12d6ec0 ("bpf: btf: add btf print functionality")
      Signed-off-by: default avatarYonghong Song <yhs@fb.com>
      Acked-by: default avatarSong Liu <songliubraving@fb.com>
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      528bff0c
    • Alexei Starovoitov's avatar
      Merge branch 'btf-check-name' · c2209c6d
      Alexei Starovoitov authored
      Yonghong Song says:
      
      ====================
      This patch set added name checking for PTR, ARRAY, VOLATILE, TYPEDEF,
      CONST, RESTRICT, STRUCT, UNION, ENUM and FWD types. Such a strict
      name checking makes BTF more sound in the kernel and future
      BTF-to-header-file converesion ([1]) less fragile.
      
      Patch #1 implemented btf_name_valid_identifier() for name checking
      which will be used in Patch #2.
      Patch #2 checked name validity for the above mentioned types.
      Patch #3 fixed two existing test_btf unit tests exposed by the strict
      name checking.
      Patch #4 added additional test cases.
      
      This patch set is against bpf tree.
      
      Patch #1 has been implemented in bpf-next commit
      Commit 2667a262 ("bpf: btf: Add BTF_KIND_FUNC
      and BTF_KIND_FUNC_PROTO"), so there is no need to apply this
      patch to bpf-next. In case this patch is applied to bpf-next,
      there will be a minor conflict like
        diff --cc kernel/bpf/btf.c
        index a09b2f94ab25,93c233ab2db6..000000000000
        --- a/kernel/bpf/btf.c
        +++ b/kernel/bpf/btf.c
        @@@ -474,7 -451,7 +474,11 @@@ static bool btf_name_valid_identifier(c
                return !*src;
          }
      
        ++<<<<<<< HEAD
         +const char *btf_name_by_offset(const struct btf *btf, u32 offset)
        ++=======
        + static const char *btf_name_by_offset(const struct btf *btf, u32 offset)
        ++>>>>>>> fa9566b0847d... bpf: btf: implement btf_name_valid_identifier()
          {
                if (!offset)
                        return "(anon)";
      Just resolve the conflict by taking the "const char ..." line.
      
      Patches #2, #3 and #4 can be applied to bpf-next without conflict.
      
      [1]: http://vger.kernel.org/lpc-bpf2018.html#session-2
      ====================
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      c2209c6d
    • Yonghong Song's avatar
      tools/bpf: add addition type tests to test_btf · d0848912
      Yonghong Song authored
      The following additional unit testcases are added to test_btf:
      ...
      BTF raw test[42] (typedef (invalid name, name_off = 0)): OK
      BTF raw test[43] (typedef (invalid name, invalid identifier)): OK
      BTF raw test[44] (ptr type (invalid name, name_off <> 0)): OK
      BTF raw test[45] (volatile type (invalid name, name_off <> 0)): OK
      BTF raw test[46] (const type (invalid name, name_off <> 0)): OK
      BTF raw test[47] (restrict type (invalid name, name_off <> 0)): OK
      BTF raw test[48] (fwd type (invalid name, name_off = 0)): OK
      BTF raw test[49] (fwd type (invalid name, invalid identifier)): OK
      BTF raw test[50] (array type (invalid name, name_off <> 0)): OK
      BTF raw test[51] (struct type (name_off = 0)): OK
      BTF raw test[52] (struct type (invalid name, invalid identifier)): OK
      BTF raw test[53] (struct member (name_off = 0)): OK
      BTF raw test[54] (struct member (invalid name, invalid identifier)): OK
      BTF raw test[55] (enum type (name_off = 0)): OK
      BTF raw test[56] (enum type (invalid name, invalid identifier)): OK
      BTF raw test[57] (enum member (invalid name, name_off = 0)): OK
      BTF raw test[58] (enum member (invalid name, invalid identifier)): OK
      ...
      
      Fixes: c0fa1b6c ("bpf: btf: Add BTF tests")
      Acked-by: default avatarMartin KaFai Lau <kafai@fb.com>
      Signed-off-by: default avatarYonghong Song <yhs@fb.com>
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      d0848912
    • Martin KaFai Lau's avatar
      tools/bpf: fix two test_btf unit test cases · 8800cd03
      Martin KaFai Lau authored
      There are two unit test cases, which should encode
      TYPEDEF type, but instead encode PTR type.
      The error is flagged out after enforcing name
      checking in the previous patch.
      
      Fixes: c0fa1b6c ("bpf: btf: Add BTF tests")
      Signed-off-by: default avatarMartin KaFai Lau <kafai@fb.com>
      Signed-off-by: default avatarYonghong Song <yhs@fb.com>
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      8800cd03
    • Yonghong Song's avatar
      bpf: btf: check name validity for various types · eb04bbb6
      Yonghong Song authored
      This patch added name checking for the following types:
       . BTF_KIND_PTR, BTF_KIND_ARRAY, BTF_KIND_VOLATILE,
         BTF_KIND_CONST, BTF_KIND_RESTRICT:
           the name must be null
       . BTF_KIND_STRUCT, BTF_KIND_UNION: the struct/member name
           is either null or a valid identifier
       . BTF_KIND_ENUM: the enum type name is either null or a valid
           identifier; the enumerator name must be a valid identifier.
       . BTF_KIND_FWD: the name must be a valid identifier
       . BTF_KIND_TYPEDEF: the name must be a valid identifier
      
      For those places a valid name is required, the name must be
      a valid C identifier. This can be relaxed later if we found
      use cases for a different (non-C) frontend.
      
      Fixes: 69b693f0 ("bpf: btf: Introduce BPF Type Format (BTF)")
      Acked-by: default avatarMartin KaFai Lau <kafai@fb.com>
      Signed-off-by: default avatarYonghong Song <yhs@fb.com>
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      eb04bbb6
    • Yonghong Song's avatar
      bpf: btf: implement btf_name_valid_identifier() · cdbb096a
      Yonghong Song authored
      Function btf_name_valid_identifier() have been implemented in
      bpf-next commit 2667a262 ("bpf: btf: Add BTF_KIND_FUNC and
      BTF_KIND_FUNC_PROTO"). Backport this function so later patch
      can use it.
      
      Fixes: 69b693f0 ("bpf: btf: Introduce BPF Type Format (BTF)")
      Signed-off-by: default avatarYonghong Song <yhs@fb.com>
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      cdbb096a
  4. 28 Nov, 2018 16 commits
    • David S. Miller's avatar
      Merge branch '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net-queue · d78a5ebd
      David S. Miller authored
      Jeff Kirsher says:
      
      ====================
      Intel Wired LAN Driver Fixes 2018-11-28
      
      This series contains fixes to igb, ixgbe and i40e.
      
      Yunjian Wang from Huawei resolves a variable that could potentially be
      NULL before it is used.
      
      Lihong fixes an i40e issue which goes back to 4.17 kernels, where
      deleting any of the MAC filters was causing the incorrect syncing for
      the PF.
      
      Josh Elsasser caught that there were missing enum values in the link
      capabilities for x550 devices, which was preventing link for 1000BaseLX
      SFP modules.
      
      Jan fixes the function header comments for XSK methods.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      d78a5ebd
    • Julian Wiedmann's avatar
      s390/qeth: fix length check in SNMP processing · 9a764c1e
      Julian Wiedmann authored
      The response for a SNMP request can consist of multiple parts, which
      the cmd callback stages into a kernel buffer until all parts have been
      received. If the callback detects that the staging buffer provides
      insufficient space, it bails out with error.
      This processing is buggy for the first part of the response - while it
      initially checks for a length of 'data_len', it later copies an
      additional amount of 'offsetof(struct qeth_snmp_cmd, data)' bytes.
      
      Fix the calculation of 'data_len' for the first part of the response.
      This also nicely cleans up the memcpy code.
      
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Signed-off-by: default avatarJulian Wiedmann <jwi@linux.ibm.com>
      Reviewed-by: default avatarUrsula Braun <ubraun@linux.ibm.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      9a764c1e
    • David S. Miller's avatar
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf · e9d8faf9
      David S. Miller authored
      Pablo Neira Ayuso says:
      
      ====================
      Netfilter fixes for net
      
      The following patchset contains Netfilter fixes for net:
      
      1) Disable BH while holding list spinlock in nf_conncount, from
         Taehee Yoo.
      
      2) List corruption in nf_conncount, also from Taehee.
      
      3) Fix race that results in leaving around an empty list node in
         nf_conncount, from Taehee Yoo.
      
      4) Proper chain handling for inactive chains from the commit path,
         from Florian Westphal. This includes a selftest for this.
      
      5) Do duplicate rule handles when replacing rules, also from Florian.
      
      6) Remove net_exit path in xt_RATEEST that results in splat, from Taehee.
      
      7) Possible use-after-free in nft_compat when releasing extensions.
         From Florian.
      
      8) Memory leak in xt_hashlimit, from Taehee.
      
      9) Call ip_vs_dst_notifier after ipv6_dev_notf, from Xin Long.
      
      10) Fix cttimeout with udplite and gre, from Florian.
      
      11) Preserve oif for IPv6 link-local generated traffic from mangle
          table, from Alin Nastac.
      
      12) Missing error handling in masquerade notifiers, from Taehee Yoo.
      
      13) Use mutex to protect registration/unregistration of masquerade
          extensions in order to prevent a race, from Taehee.
      
      14) Incorrect condition check in tree_nodes_free(), also from Taehee.
      
      15) Fix chain counter leak in rule replacement path, from Taehee.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      e9d8faf9
    • Pan Bian's avatar
      net: hisilicon: remove unexpected free_netdev · c7589401
      Pan Bian authored
      The net device ndev is freed via free_netdev when failing to register
      the device. The control flow then jumps to the error handling code
      block. ndev is used and freed again. Resulting in a use-after-free bug.
      Signed-off-by: default avatarPan Bian <bianpan2016@163.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      c7589401
    • Pan Bian's avatar
      rapidio/rionet: do not free skb before reading its length · cfc43519
      Pan Bian authored
      skb is freed via dev_kfree_skb_any, however, skb->len is read then. This
      may result in a use-after-free bug.
      
      Fixes: e6161d64 ("rapidio/rionet: rework driver initialization and removal")
      Signed-off-by: default avatarPan Bian <bianpan2016@163.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      cfc43519
    • Jan Sokolowski's avatar
      i40e: fix kerneldoc for xsk methods · 529eb362
      Jan Sokolowski authored
      One method, xsk_umem_setup, had an incorrect kernel doc
      description, which has been corrected.
      
      Also fixes small typos found in the comments.
      Signed-off-by: default avatarJan Sokolowski <jan.sokolowski@intel.com>
      Tested-by: default avatarAndrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: default avatarJeff Kirsher <jeffrey.t.kirsher@intel.com>
      529eb362
    • Josh Elsasser's avatar
      ixgbe: recognize 1000BaseLX SFP modules as 1Gbps · a8bf879a
      Josh Elsasser authored
      Add the two 1000BaseLX enum values to the X550's check for 1Gbps modules,
      allowing the core driver code to establish a link over this SFP type.
      
      This is done by the out-of-tree driver but the fix wasn't in mainline.
      
      Fixes: e23f3336 ("ixgbe: Fix 1G and 10G link stability for X550EM_x SFP+”)
      Fixes: 6a14ee0c ("ixgbe: Add X550 support function pointers")
      Signed-off-by: default avatarJosh Elsasser <jelsasser@appneta.com>
      Tested-by: default avatarAndrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: default avatarJeff Kirsher <jeffrey.t.kirsher@intel.com>
      a8bf879a
    • Lihong Yang's avatar
      i40e: Fix deletion of MAC filters · eab077aa
      Lihong Yang authored
      In __i40e_del_filter function, the flag __I40E_MACVLAN_SYNC_PENDING for
      the PF state is wrongly set for the VSI. Deleting any of the MAC filters
      has caused the incorrect syncing for the PF. Fix it by setting this state
      flag to the intended PF.
      
      CC: stable <stable@vger.kernel.org>
      Signed-off-by: default avatarLihong Yang <lihong.yang@intel.com>
      Tested-by: default avatarAndrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: default avatarJeff Kirsher <jeffrey.t.kirsher@intel.com>
      eab077aa
    • Yunjian Wang's avatar
      igb: fix uninitialized variables · e4c39f79
      Yunjian Wang authored
      This patch fixes the variable 'phy_word' may be used uninitialized.
      Signed-off-by: default avatarYunjian Wang <wangyunjian@huawei.com>
      Tested-by: default avatarAaron Brown <aaron.f.brown@intel.com>
      Signed-off-by: default avatarJeff Kirsher <jeffrey.t.kirsher@intel.com>
      e4c39f79
    • Taehee Yoo's avatar
      netfilter: nf_tables: deactivate expressions in rule replecement routine · ca089878
      Taehee Yoo authored
      There is no expression deactivation call from the rule replacement path,
      hence, chain counter is not decremented. A few steps to reproduce the
      problem:
      
         %nft add table ip filter
         %nft add chain ip filter c1
         %nft add chain ip filter c1
         %nft add rule ip filter c1 jump c2
         %nft replace rule ip filter c1 handle 3 accept
         %nft flush ruleset
      
      <jump c2> expression means immediate NFT_JUMP to chain c2.
      Reference count of chain c2 is increased when the rule is added.
      
      When rule is deleted or replaced, the reference counter of c2 should be
      decreased via nft_rule_expr_deactivate() which calls
      nft_immediate_deactivate().
      
      Splat looks like:
      [  214.396453] WARNING: CPU: 1 PID: 21 at net/netfilter/nf_tables_api.c:1432 nf_tables_chain_destroy.isra.38+0x2f9/0x3a0 [nf_tables]
      [  214.398983] Modules linked in: nf_tables nfnetlink
      [  214.398983] CPU: 1 PID: 21 Comm: kworker/1:1 Not tainted 4.20.0-rc2+ #44
      [  214.398983] Workqueue: events nf_tables_trans_destroy_work [nf_tables]
      [  214.398983] RIP: 0010:nf_tables_chain_destroy.isra.38+0x2f9/0x3a0 [nf_tables]
      [  214.398983] Code: 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 80 3c 02 00 0f 85 8e 00 00 00 48 8b 7b 58 e8 e1 2c 4e c6 48 89 df e8 d9 2c 4e c6 eb 9a <0f> 0b eb 96 0f 0b e9 7e fe ff ff e8 a7 7e 4e c6 e9 a4 fe ff ff e8
      [  214.398983] RSP: 0018:ffff8881152874e8 EFLAGS: 00010202
      [  214.398983] RAX: 0000000000000001 RBX: ffff88810ef9fc28 RCX: ffff8881152876f0
      [  214.398983] RDX: dffffc0000000000 RSI: 1ffff11022a50ede RDI: ffff88810ef9fc78
      [  214.398983] RBP: 1ffff11022a50e9d R08: 0000000080000000 R09: 0000000000000000
      [  214.398983] R10: 0000000000000000 R11: 0000000000000000 R12: 1ffff11022a50eba
      [  214.398983] R13: ffff888114446e08 R14: ffff8881152876f0 R15: ffffed1022a50ed6
      [  214.398983] FS:  0000000000000000(0000) GS:ffff888116400000(0000) knlGS:0000000000000000
      [  214.398983] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  214.398983] CR2: 00007fab9bb5f868 CR3: 000000012aa16000 CR4: 00000000001006e0
      [  214.398983] Call Trace:
      [  214.398983]  ? nf_tables_table_destroy.isra.37+0x100/0x100 [nf_tables]
      [  214.398983]  ? __kasan_slab_free+0x145/0x180
      [  214.398983]  ? nf_tables_trans_destroy_work+0x439/0x830 [nf_tables]
      [  214.398983]  ? kfree+0xdb/0x280
      [  214.398983]  nf_tables_trans_destroy_work+0x5f5/0x830 [nf_tables]
      [ ... ]
      
      Fixes: bb7b40ae ("netfilter: nf_tables: bogus EBUSY in chain deletions")
      Reported by: Christoph Anton Mitterer <calestyo@scientia.net>
      Link: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=914505
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=201791Signed-off-by: default avatarTaehee Yoo <ap420073@gmail.com>
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      ca089878
    • Bryan Whitehead's avatar
      lan743x: Enable driver to work with LAN7431 · 4df5ce9b
      Bryan Whitehead authored
      This driver was designed to work with both LAN7430 and LAN7431.
      The only difference between the two is the LAN7431 has support
      for external phy.
      
      This change adds LAN7431 to the list of recognized devices
      supported by this driver.
      
      Updates for v2:
          changed 'fixes' tag to match defined format
      
      fixes: 23f0703c ("lan743x: Add main source files for new lan743x driver")
      Signed-off-by: default avatarBryan Whitehead <Bryan.Whitehead@microchip.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      4df5ce9b
    • Jon Maloy's avatar
      tipc: fix lockdep warning during node delete · ec835f89
      Jon Maloy authored
      We see the following lockdep warning:
      
      [ 2284.078521] ======================================================
      [ 2284.078604] WARNING: possible circular locking dependency detected
      [ 2284.078604] 4.19.0+ #42 Tainted: G            E
      [ 2284.078604] ------------------------------------------------------
      [ 2284.078604] rmmod/254 is trying to acquire lock:
      [ 2284.078604] 00000000acd94e28 ((&n->timer)#2){+.-.}, at: del_timer_sync+0x5/0xa0
      [ 2284.078604]
      [ 2284.078604] but task is already holding lock:
      [ 2284.078604] 00000000f997afc0 (&(&tn->node_list_lock)->rlock){+.-.}, at: tipc_node_stop+0xac/0x190 [tipc]
      [ 2284.078604]
      [ 2284.078604] which lock already depends on the new lock.
      [ 2284.078604]
      [ 2284.078604]
      [ 2284.078604] the existing dependency chain (in reverse order) is:
      [ 2284.078604]
      [ 2284.078604] -> #1 (&(&tn->node_list_lock)->rlock){+.-.}:
      [ 2284.078604]        tipc_node_timeout+0x20a/0x330 [tipc]
      [ 2284.078604]        call_timer_fn+0xa1/0x280
      [ 2284.078604]        run_timer_softirq+0x1f2/0x4d0
      [ 2284.078604]        __do_softirq+0xfc/0x413
      [ 2284.078604]        irq_exit+0xb5/0xc0
      [ 2284.078604]        smp_apic_timer_interrupt+0xac/0x210
      [ 2284.078604]        apic_timer_interrupt+0xf/0x20
      [ 2284.078604]        default_idle+0x1c/0x140
      [ 2284.078604]        do_idle+0x1bc/0x280
      [ 2284.078604]        cpu_startup_entry+0x19/0x20
      [ 2284.078604]        start_secondary+0x187/0x1c0
      [ 2284.078604]        secondary_startup_64+0xa4/0xb0
      [ 2284.078604]
      [ 2284.078604] -> #0 ((&n->timer)#2){+.-.}:
      [ 2284.078604]        del_timer_sync+0x34/0xa0
      [ 2284.078604]        tipc_node_delete+0x1a/0x40 [tipc]
      [ 2284.078604]        tipc_node_stop+0xcb/0x190 [tipc]
      [ 2284.078604]        tipc_net_stop+0x154/0x170 [tipc]
      [ 2284.078604]        tipc_exit_net+0x16/0x30 [tipc]
      [ 2284.078604]        ops_exit_list.isra.8+0x36/0x70
      [ 2284.078604]        unregister_pernet_operations+0x87/0xd0
      [ 2284.078604]        unregister_pernet_subsys+0x1d/0x30
      [ 2284.078604]        tipc_exit+0x11/0x6f2 [tipc]
      [ 2284.078604]        __x64_sys_delete_module+0x1df/0x240
      [ 2284.078604]        do_syscall_64+0x66/0x460
      [ 2284.078604]        entry_SYSCALL_64_after_hwframe+0x49/0xbe
      [ 2284.078604]
      [ 2284.078604] other info that might help us debug this:
      [ 2284.078604]
      [ 2284.078604]  Possible unsafe locking scenario:
      [ 2284.078604]
      [ 2284.078604]        CPU0                    CPU1
      [ 2284.078604]        ----                    ----
      [ 2284.078604]   lock(&(&tn->node_list_lock)->rlock);
      [ 2284.078604]                                lock((&n->timer)#2);
      [ 2284.078604]                                lock(&(&tn->node_list_lock)->rlock);
      [ 2284.078604]   lock((&n->timer)#2);
      [ 2284.078604]
      [ 2284.078604]  *** DEADLOCK ***
      [ 2284.078604]
      [ 2284.078604] 3 locks held by rmmod/254:
      [ 2284.078604]  #0: 000000003368be9b (pernet_ops_rwsem){+.+.}, at: unregister_pernet_subsys+0x15/0x30
      [ 2284.078604]  #1: 0000000046ed9c86 (rtnl_mutex){+.+.}, at: tipc_net_stop+0x144/0x170 [tipc]
      [ 2284.078604]  #2: 00000000f997afc0 (&(&tn->node_list_lock)->rlock){+.-.}, at: tipc_node_stop+0xac/0x19
      [...}
      
      The reason is that the node timer handler sometimes needs to delete a
      node which has been disconnected for too long. To do this, it grabs
      the lock 'node_list_lock', which may at the same time be held by the
      generic node cleanup function, tipc_node_stop(), during module removal.
      Since the latter is calling del_timer_sync() inside the same lock, we
      have a potential deadlock.
      
      We fix this letting the timer cleanup function use spin_trylock()
      instead of just spin_lock(), and when it fails to grab the lock it
      just returns so that the timer handler can terminate its execution.
      This is safe to do, since tipc_node_stop() anyway is about to
      delete both the timer and the node instance.
      
      Fixes: 6a939f36 ("tipc: Auto removal of peer down node instance")
      Acked-by: default avatarYing Xue <ying.xue@windriver.com>
      Signed-off-by: default avatarJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      ec835f89
    • Bryan Whitehead's avatar
      lan743x: fix return value for lan743x_tx_napi_poll · cc592205
      Bryan Whitehead authored
      The lan743x driver, when under heavy traffic load, has been noticed
      to sometimes hang, or cause a kernel panic.
      
      Debugging reveals that the TX napi poll routine was returning
      the wrong value, 'weight'. Most other drivers return 0.
      And call napi_complete, instead of napi_complete_done.
      
      Additionally when creating the tx napi poll routine.
      Changed netif_napi_add, to netif_tx_napi_add.
      
      Updates for v3:
          changed 'fixes' tag to match defined format
      
      Updates for v2:
      use napi_complete, instead of napi_complete_done in
          lan743x_tx_napi_poll
      use netif_tx_napi_add, instead of netif_napi_add for
          registration of tx napi poll routine
      
      fixes: 23f0703c ("lan743x: Add main source files for new lan743x driver")
      Signed-off-by: default avatarBryan Whitehead <Bryan.Whitehead@microchip.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      cc592205
    • Colin Ian King's avatar
      net: via: via-velocity: fix spelling mistake "alignement" -> "alignment" · 4b5adba0
      Colin Ian King authored
      The text in array velocity_gstrings contains a spelling mistake,
      rename rx_frame_alignement_errors to rx_frame_alignment_errors.
      Signed-off-by: default avatarColin Ian King <colin.king@canonical.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      4b5adba0
    • Colin Ian King's avatar
      qed: fix spelling mistake "attnetion" -> "attention" · 1d510657
      Colin Ian King authored
      The text in array s_igu_fifo_error_strs contains a spelling mistake,
      fix it.
      Signed-off-by: default avatarColin Ian King <colin.king@canonical.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      1d510657
    • Lorenzo Bianconi's avatar
      net: thunderx: fix NULL pointer dereference in nic_remove · 24a6d2dd
      Lorenzo Bianconi authored
      Fix a possible NULL pointer dereference in nic_remove routine
      removing the nicpf module if nic_probe fails.
      The issue can be triggered with the following reproducer:
      
      $rmmod nicvf
      $rmmod nicpf
      
      [  521.412008] Unable to handle kernel access to user memory outside uaccess routines at virtual address 0000000000000014
      [  521.422777] Mem abort info:
      [  521.425561]   ESR = 0x96000004
      [  521.428624]   Exception class = DABT (current EL), IL = 32 bits
      [  521.434535]   SET = 0, FnV = 0
      [  521.437579]   EA = 0, S1PTW = 0
      [  521.440730] Data abort info:
      [  521.443603]   ISV = 0, ISS = 0x00000004
      [  521.447431]   CM = 0, WnR = 0
      [  521.450417] user pgtable: 4k pages, 48-bit VAs, pgdp = 0000000072a3da42
      [  521.457022] [0000000000000014] pgd=0000000000000000
      [  521.461916] Internal error: Oops: 96000004 [#1] SMP
      [  521.511801] Hardware name: GIGABYTE H270-T70/MT70-HD0, BIOS T49 02/02/2018
      [  521.518664] pstate: 80400005 (Nzcv daif +PAN -UAO)
      [  521.523451] pc : nic_remove+0x24/0x88 [nicpf]
      [  521.527808] lr : pci_device_remove+0x48/0xd8
      [  521.532066] sp : ffff000013433cc0
      [  521.535370] x29: ffff000013433cc0 x28: ffff810f6ac50000
      [  521.540672] x27: 0000000000000000 x26: 0000000000000000
      [  521.545974] x25: 0000000056000000 x24: 0000000000000015
      [  521.551274] x23: ffff8007ff89a110 x22: ffff000001667070
      [  521.556576] x21: ffff8007ffb170b0 x20: ffff8007ffb17000
      [  521.561877] x19: 0000000000000000 x18: 0000000000000025
      [  521.567178] x17: 0000000000000000 x16: 000000000000010ffc33ff98 x8 : 0000000000000000
      [  521.593683] x7 : 0000000000000000 x6 : 0000000000000001
      [  521.598983] x5 : 0000000000000002 x4 : 0000000000000003
      [  521.604284] x3 : ffff8007ffb17184 x2 : ffff8007ffb17184
      [  521.609585] x1 : ffff000001662118 x0 : ffff000008557be0
      [  521.614887] Process rmmod (pid: 1897, stack limit = 0x00000000859535c3)
      [  521.621490] Call trace:
      [  521.623928]  nic_remove+0x24/0x88 [nicpf]
      [  521.627927]  pci_device_remove+0x48/0xd8
      [  521.631847]  device_release_driver_internal+0x1b0/0x248
      [  521.637062]  driver_detach+0x50/0xc0
      [  521.640628]  bus_remove_driver+0x60/0x100
      [  521.644627]  driver_unregister+0x34/0x60
      [  521.648538]  pci_unregister_driver+0x24/0xd8
      [  521.652798]  nic_cleanup_module+0x14/0x111c [nicpf]
      [  521.657672]  __arm64_sys_delete_module+0x150/0x218
      [  521.662460]  el0_svc_handler+0x94/0x110
      [  521.666287]  el0_svc+0x8/0xc
      [  521.669160] Code: aa1e03e0 9102c295 d503201f f9404eb3 (b9401660)
      
      Fixes: 4863dea3 ("net: Adding support for Cavium ThunderX network controller")
      Signed-off-by: default avatarLorenzo Bianconi <lorenzo.bianconi@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      24a6d2dd
  5. 27 Nov, 2018 12 commits
    • Xin Long's avatar
      sctp: increase sk_wmem_alloc when head->truesize is increased · 0d32f177
      Xin Long authored
      I changed to count sk_wmem_alloc by skb truesize instead of 1 to
      fix the sk_wmem_alloc leak caused by later truesize's change in
      xfrm in Commit 02968ccf ("sctp: count sk_wmem_alloc by skb
      truesize in sctp_packet_transmit").
      
      But I should have also increased sk_wmem_alloc when head->truesize
      is increased in sctp_packet_gso_append() as xfrm does. Otherwise,
      sctp gso packet will cause sk_wmem_alloc underflow.
      
      Fixes: 02968ccf ("sctp: count sk_wmem_alloc by skb truesize in sctp_packet_transmit")
      Signed-off-by: default avatarXin Long <lucien.xin@gmail.com>
      Acked-by: default avatarMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      0d32f177
    • Colin Ian King's avatar
      firestream: fix spelling mistake: "Inititing" -> "Initializing" · a8842e97
      Colin Ian King authored
      There are spelling mistakes in debug messages, fix them.
      Signed-off-by: default avatarColin Ian King <colin.king@canonical.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      a8842e97
    • Heiner Kallweit's avatar
      net: phy: add workaround for issue where PHY driver doesn't bind to the device · c85ddeca
      Heiner Kallweit authored
      After switching the r8169 driver to use phylib some user reported that
      their network is broken. This was caused by the genphy PHY driver being
      used instead of the dedicated PHY driver for the RTL8211B. Users
      reported that loading the Realtek PHY driver module upfront fixes the
      issue. See also this mail thread:
      https://marc.info/?t=154279781800003&r=1&w=2
      The issue is quite weird and the root cause seems to be somewhere in
      the base driver core. The patch works around the issue and may be
      removed once the actual issue is fixed.
      
      The Fixes tag refers to the first reported occurrence of the issue.
      The issue itself may have been existing much longer and it may affect
      users of other network chips as well. Users typically will recognize
      this issue only if their PHY stops working when being used with the
      genphy driver.
      
      Fixes: f1e911d5 ("r8169: add basic phylib support")
      Signed-off-by: default avatarHeiner Kallweit <hkallweit1@gmail.com>
      Reviewed-by: default avatarAndrew Lunn <andrew@lunn.ch>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      c85ddeca
    • Bernd Eckstein's avatar
      usbnet: ipheth: fix potential recvmsg bug and recvmsg bug 2 · 45611c61
      Bernd Eckstein authored
      The bug is not easily reproducable, as it may occur very infrequently
      (we had machines with 20minutes heavy downloading before it occurred)
      However, on a virual machine (VMWare on Windows 10 host) it occurred
      pretty frequently (1-2 seconds after a speedtest was started)
      
      dev->tx_skb mab be freed via dev_kfree_skb_irq on a callback
      before it is set.
      
      This causes the following problems:
      - double free of the skb or potential memory leak
      - in dmesg: 'recvmsg bug' and 'recvmsg bug 2' and eventually
        general protection fault
      
      Example dmesg output:
      [  134.841986] ------------[ cut here ]------------
      [  134.841987] recvmsg bug: copied 9C24A555 seq 9C24B557 rcvnxt 9C25A6B3 fl 0
      [  134.841993] WARNING: CPU: 7 PID: 2629 at /build/linux-hwe-On9fm7/linux-hwe-4.15.0/net/ipv4/tcp.c:1865 tcp_recvmsg+0x44d/0xab0
      [  134.841994] Modules linked in: ipheth(OE) kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc aesni_intel aes_x86_64 crypto_simd glue_helper cryptd vmw_balloon intel_rapl_perf joydev input_leds serio_raw vmw_vsock_vmci_transport vsock shpchp i2c_piix4 mac_hid binfmt_misc vmw_vmci parport_pc ppdev lp parport autofs4 vmw_pvscsi vmxnet3 hid_generic usbhid hid vmwgfx ttm drm_kms_helper syscopyarea sysfillrect mptspi mptscsih sysimgblt ahci psmouse fb_sys_fops pata_acpi mptbase libahci e1000 drm scsi_transport_spi
      [  134.842046] CPU: 7 PID: 2629 Comm: python Tainted: G        W  OE    4.15.0-34-generic #37~16.04.1-Ubuntu
      [  134.842046] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 05/19/2017
      [  134.842048] RIP: 0010:tcp_recvmsg+0x44d/0xab0
      [  134.842048] RSP: 0018:ffffa6630422bcc8 EFLAGS: 00010286
      [  134.842049] RAX: 0000000000000000 RBX: ffff997616f4f200 RCX: 0000000000000006
      [  134.842049] RDX: 0000000000000007 RSI: 0000000000000082 RDI: ffff9976257d6490
      [  134.842050] RBP: ffffa6630422bd98 R08: 0000000000000001 R09: 000000000004bba4
      [  134.842050] R10: 0000000001e00c6f R11: 000000000004bba4 R12: ffff99760dee3000
      [  134.842051] R13: 0000000000000000 R14: ffff99760dee3514 R15: 0000000000000000
      [  134.842051] FS:  00007fe332347700(0000) GS:ffff9976257c0000(0000) knlGS:0000000000000000
      [  134.842052] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  134.842053] CR2: 0000000001e41000 CR3: 000000020e9b4006 CR4: 00000000003606e0
      [  134.842055] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [  134.842055] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      [  134.842057] Call Trace:
      [  134.842060]  ? aa_sk_perm+0x53/0x1a0
      [  134.842064]  inet_recvmsg+0x51/0xc0
      [  134.842066]  sock_recvmsg+0x43/0x50
      [  134.842070]  SYSC_recvfrom+0xe4/0x160
      [  134.842072]  ? __schedule+0x3de/0x8b0
      [  134.842075]  ? ktime_get_ts64+0x4c/0xf0
      [  134.842079]  SyS_recvfrom+0xe/0x10
      [  134.842082]  do_syscall_64+0x73/0x130
      [  134.842086]  entry_SYSCALL_64_after_hwframe+0x3d/0xa2
      [  134.842086] RIP: 0033:0x7fe331f5a81d
      [  134.842088] RSP: 002b:00007ffe8da98398 EFLAGS: 00000246 ORIG_RAX: 000000000000002d
      [  134.842090] RAX: ffffffffffffffda RBX: ffffffffffffffff RCX: 00007fe331f5a81d
      [  134.842094] RDX: 00000000000003fb RSI: 0000000001e00874 RDI: 0000000000000003
      [  134.842095] RBP: 00007fe32f642c70 R08: 0000000000000000 R09: 0000000000000000
      [  134.842097] R10: 0000000000000000 R11: 0000000000000246 R12: 00007fe332347698
      [  134.842099] R13: 0000000001b7e0a0 R14: 0000000001e00874 R15: 0000000000000000
      [  134.842103] Code: 24 fd ff ff e9 cc fe ff ff 48 89 d8 41 8b 8c 24 10 05 00 00 44 8b 45 80 48 c7 c7 08 bd 59 8b 48 89 85 68 ff ff ff e8 b3 c4 7d ff <0f> 0b 48 8b 85 68 ff ff ff e9 e9 fe ff ff 41 8b 8c 24 10 05 00
      [  134.842126] ---[ end trace b7138fc08c83147f ]---
      [  134.842144] general protection fault: 0000 [#1] SMP PTI
      [  134.842145] Modules linked in: ipheth(OE) kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc aesni_intel aes_x86_64 crypto_simd glue_helper cryptd vmw_balloon intel_rapl_perf joydev input_leds serio_raw vmw_vsock_vmci_transport vsock shpchp i2c_piix4 mac_hid binfmt_misc vmw_vmci parport_pc ppdev lp parport autofs4 vmw_pvscsi vmxnet3 hid_generic usbhid hid vmwgfx ttm drm_kms_helper syscopyarea sysfillrect mptspi mptscsih sysimgblt ahci psmouse fb_sys_fops pata_acpi mptbase libahci e1000 drm scsi_transport_spi
      [  134.842161] CPU: 7 PID: 2629 Comm: python Tainted: G        W  OE    4.15.0-34-generic #37~16.04.1-Ubuntu
      [  134.842162] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 05/19/2017
      [  134.842164] RIP: 0010:tcp_close+0x2c6/0x440
      [  134.842165] RSP: 0018:ffffa6630422bde8 EFLAGS: 00010202
      [  134.842167] RAX: 0000000000000000 RBX: ffff99760dee3000 RCX: 0000000180400034
      [  134.842168] RDX: 5c4afd407207a6c4 RSI: ffffe868495bd300 RDI: ffff997616f4f200
      [  134.842169] RBP: ffffa6630422be08 R08: 0000000016f4d401 R09: 0000000180400034
      [  134.842169] R10: ffffa6630422bd98 R11: 0000000000000000 R12: 000000000000600c
      [  134.842170] R13: 0000000000000000 R14: ffff99760dee30c8 R15: ffff9975bd44fe00
      [  134.842171] FS:  00007fe332347700(0000) GS:ffff9976257c0000(0000) knlGS:0000000000000000
      [  134.842173] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  134.842174] CR2: 0000000001e41000 CR3: 000000020e9b4006 CR4: 00000000003606e0
      [  134.842177] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [  134.842178] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      [  134.842179] Call Trace:
      [  134.842181]  inet_release+0x42/0x70
      [  134.842183]  __sock_release+0x42/0xb0
      [  134.842184]  sock_close+0x15/0x20
      [  134.842187]  __fput+0xea/0x220
      [  134.842189]  ____fput+0xe/0x10
      [  134.842191]  task_work_run+0x8a/0xb0
      [  134.842193]  exit_to_usermode_loop+0xc4/0xd0
      [  134.842195]  do_syscall_64+0xf4/0x130
      [  134.842197]  entry_SYSCALL_64_after_hwframe+0x3d/0xa2
      [  134.842197] RIP: 0033:0x7fe331f5a560
      [  134.842198] RSP: 002b:00007ffe8da982e8 EFLAGS: 00000246 ORIG_RAX: 0000000000000003
      [  134.842200] RAX: 0000000000000000 RBX: 00007fe32f642c70 RCX: 00007fe331f5a560
      [  134.842201] RDX: 00000000008f5320 RSI: 0000000001cd4b50 RDI: 0000000000000003
      [  134.842202] RBP: 00007fe32f6500f8 R08: 000000000000003c R09: 00000000009343c0
      [  134.842203] R10: 0000000000000000 R11: 0000000000000246 R12: 00007fe32f6500d0
      [  134.842204] R13: 00000000008f5320 R14: 00000000008f5320 R15: 0000000001cd4770
      [  134.842205] Code: c8 00 00 00 45 31 e4 49 39 fe 75 4d eb 50 83 ab d8 00 00 00 01 48 8b 17 48 8b 47 08 48 c7 07 00 00 00 00 48 c7 47 08 00 00 00 00 <48> 89 42 08 48 89 10 0f b6 57 34 8b 47 2c 2b 47 28 83 e2 01 80
      [  134.842226] RIP: tcp_close+0x2c6/0x440 RSP: ffffa6630422bde8
      [  134.842227] ---[ end trace b7138fc08c831480 ]---
      
      The proposed patch eliminates a potential racing condition.
      Before, usb_submit_urb was called and _after_ that, the skb was attached
      (dev->tx_skb). So, on a callback it was possible, however unlikely that the
      skb was freed before it was set. That way (because dev->tx_skb was not set
      to NULL after it was freed), it could happen that a skb from a earlier
      transmission was freed a second time (and the skb we should have freed did
      not get freed at all)
      
      Now we free the skb directly in ipheth_tx(). It is not passed to the
      callback anymore, eliminating the posibility of a double free of the same
      skb. Depending on the retval of usb_submit_urb() we use dev_kfree_skb_any()
      respectively dev_consume_skb_any() to free the skb.
      Signed-off-by: default avatarOliver Zweigle <Oliver.Zweigle@faro.com>
      Signed-off-by: default avatarBernd Eckstein <3ernd.Eckstein@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      45611c61
    • David S. Miller's avatar
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf · 93143f84
      David S. Miller authored
      Daniel Borkmann says:
      
      ====================
      pull-request: bpf 2018-11-27
      
      The following pull-request contains BPF updates for your *net* tree.
      
      The main changes are:
      
      1) Fix several bugs in BPF sparc JIT, that is, convergence for fused
         branches, initialization of frame pointer register, and moving all
         arguments into output registers from input registers in prologue
         to fix BPF to BPF calls, from David.
      
      2) Fix a bug in arm64 JIT for fetching BPF to BPF call addresses where
         they are not guaranteed to fit into imm field and therefore must be
         retrieved through prog aux data, from Daniel.
      
      3) Explicitly add all JITs to MAINTAINERS file with developers able to
         help out in feature development, fixes, review, etc.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      93143f84
    • David Miller's avatar
      sparc: Adjust bpf JIT prologue for PSEUDO calls. · 2b9034b5
      David Miller authored
      Move all arguments into output registers from input registers.
      
      This path is exercised by test_verifier.c's "calls: two calls with
      args" test.  Adjust BPF_TAILCALL_PROLOGUE_SKIP as needed.
      
      Let's also make the prologue length a constant size regardless of
      the combination of ->saw_frame_pointer and ->saw_tail_call
      settings.
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      2b9034b5
    • Daniel Borkmann's avatar
      bpf, doc: add entries of who looks over which jits · fa1e0c96
      Daniel Borkmann authored
      Make the high-level BPF JIT entry a general 'catch-all' and add
      architecture specific entries to make it more clear who actively
      maintains which BPF JIT compiler. The list (L) address implies
      that this eventually lands in the bpf patchwork bucket. Goal is
      that this set of responsible developers listed here is always up
      to date and a point of contact for helping out in e.g. feature
      development, fixes, review or testing patches in order to help
      long-term in ensuring quality of the BPF JITs and therefore BPF
      core under a given architecture. Every new JIT in future /must/
      have an entry here as well.
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Acked-by: default avatarNaveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
      Acked-by: default avatarSandipan Das <sandipan@linux.ibm.com>
      Acked-by: default avatarMartin Schwidefsky <schwidefsky@de.ibm.com>
      Acked-by: default avatarHeiko Carstens <heiko.carstens@de.ibm.com>
      Acked-by: default avatarDavid S. Miller <davem@davemloft.net>
      Acked-by: default avatarZi Shen Lim <zlim.lnx@gmail.com>
      Acked-by: default avatarPaul Burton <paul.burton@mips.com>
      Acked-by: default avatarJakub Kicinski <jakub.kicinski@netronome.com>
      Acked-by: default avatarWang YanQing <udknight@gmail.com>
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      fa1e0c96
    • David Miller's avatar
      sparc: Correct ctx->saw_frame_pointer logic. · e2ac579a
      David Miller authored
      We need to initialize the frame pointer register not just if it is
      seen as a source operand, but also if it is seen as the destination
      operand of a store or an atomic instruction (which effectively is a
      source operand).
      
      This is exercised by test_verifier's "non-invalid fp arithmetic"
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      e2ac579a
    • David Miller's avatar
      sparc: Fix JIT fused branch convergance. · c44768a3
      David Miller authored
      On T4 and later sparc64 cpus we can use the fused compare and branch
      instruction.
      
      However, it can only be used if the branch destination is in the range
      of a signed 10-bit immediate offset.  This amounts to 1024
      instructions forwards or backwards.
      
      After the commit referenced in the Fixes: tag, the largest possible
      size program seen by the JIT explodes by a significant factor.
      
      As a result of this convergance takes many more passes since the
      expanded "BPF_LDX | BPF_MSH | BPF_B" code sequence, for example,
      contains several embedded branch on condition instructions.
      
      On each pass, as suddenly new fused compare and branch instances
      become valid, this makes thousands more in range for the next pass.
      And so on and so forth.
      
      This is most greatly exemplified by "BPF_MAXINSNS: exec all MSH" which
      takes 35 passes to converge, and shrinks the image by about 64K.
      
      To decrease the cost of this number of convergance passes, do the
      convergance pass before we have the program image allocated, just like
      other JITs (such as x86) do.
      
      Fixes: e0cea7ce ("bpf: implement ld_abs/ld_ind in native bpf")
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      c44768a3
    • Alexei Starovoitov's avatar
      Merge branch 'arm64-jit-fixes' · fdac315d
      Alexei Starovoitov authored
      Daniel Borkmann says:
      
      ====================
      This set contains a fix for arm64 BPF JIT. First patch generalizes
      ppc64 way of retrieving subprog into bpf_jit_get_func_addr() as core
      code and uses the same on arm64 in second patch. Tested on both arm64
      and ppc64.
      ====================
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      fdac315d
    • Daniel Borkmann's avatar
      bpf, arm64: fix getting subprog addr from aux for calls · 8c11ea5c
      Daniel Borkmann authored
      The arm64 JIT has the same issue as ppc64 JIT in that the relative BPF
      to BPF call offset can be too far away from core kernel in that relative
      encoding into imm is not sufficient and could potentially be truncated,
      see also fd045f6c ("arm64: add support for module PLTs") which adds
      spill-over space for module_alloc() and therefore bpf_jit_binary_alloc().
      Therefore, use the recently added bpf_jit_get_func_addr() helper for
      properly fetching the address through prog->aux->func[off]->bpf_func
      instead. This also has the benefit to optimize normal helper calls since
      their address can use the optimized emission. Tested on Cavium ThunderX
      CN8890.
      
      Fixes: db496944 ("bpf: arm64: add JIT support for multi-function programs")
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      8c11ea5c
    • Daniel Borkmann's avatar
      bpf, ppc64: generalize fetching subprog into bpf_jit_get_func_addr · e2c95a61
      Daniel Borkmann authored
      Make fetching of the BPF call address from ppc64 JIT generic. ppc64
      was using a slightly different variant rather than through the insns'
      imm field encoding as the target address would not fit into that space.
      Therefore, the target subprog number was encoded into the insns' offset
      and fetched through fp->aux->func[off]->bpf_func instead. Given there
      are other JITs with this issue and the mechanism of fetching the address
      is JIT-generic, move it into the core as a helper instead. On the JIT
      side, we get information on whether the retrieved address is a fixed
      one, that is, not changing through JIT passes, or a dynamic one. For
      the former, JITs can optimize their imm emission because this doesn't
      change jump offsets throughout JIT process.
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Reviewed-by: default avatarSandipan Das <sandipan@linux.ibm.com>
      Tested-by: default avatarSandipan Das <sandipan@linux.ibm.com>
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      e2c95a61
  6. 26 Nov, 2018 4 commits
    • Taehee Yoo's avatar
      netfilter: nf_conncount: remove wrong condition check routine · 53ca0f2f
      Taehee Yoo authored
      All lists that reach the tree_nodes_free() function have both zero
      counter and true dead flag. The reason for this is that lists to be
      release are selected by nf_conncount_gc_list() which already decrements
      the list counter and sets on the dead flag. Therefore, this if statement
      in tree_nodes_free() is unnecessary and wrong.
      
      Fixes: 31568ec0 ("netfilter: nf_conncount: fix list_del corruption in conn_free")
      Signed-off-by: default avatarTaehee Yoo <ap420073@gmail.com>
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      53ca0f2f
    • Taehee Yoo's avatar
      netfilter: nat: fix double register in masquerade modules · 095faf45
      Taehee Yoo authored
      There is a reference counter to ensure that masquerade modules register
      notifiers only once. However, the existing reference counter approach is
      not safe, test commands are:
      
         while :
         do
         	   modprobe ip6t_MASQUERADE &
      	   modprobe nft_masq_ipv6 &
      	   modprobe -rv ip6t_MASQUERADE &
      	   modprobe -rv nft_masq_ipv6 &
         done
      
      numbers below represent the reference counter.
      --------------------------------------------------------
      CPU0        CPU1        CPU2        CPU3        CPU4
      [insmod]    [insmod]    [rmmod]     [rmmod]     [insmod]
      --------------------------------------------------------
      0->1
      register    1->2
                  returns     2->1
      			returns     1->0
                                                      0->1
                                                      register <--
                                          unregister
      --------------------------------------------------------
      
      The unregistation of CPU3 should be processed before the
      registration of CPU4.
      
      In order to fix this, use a mutex instead of reference counter.
      
      splat looks like:
      [  323.869557] watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [modprobe:1381]
      [  323.869574] Modules linked in: nf_tables(+) nf_nat_ipv6(-) nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 n]
      [  323.869574] irq event stamp: 194074
      [  323.898930] hardirqs last  enabled at (194073): [<ffffffff90004a0d>] trace_hardirqs_on_thunk+0x1a/0x1c
      [  323.898930] hardirqs last disabled at (194074): [<ffffffff90004a29>] trace_hardirqs_off_thunk+0x1a/0x1c
      [  323.898930] softirqs last  enabled at (182132): [<ffffffff922006ec>] __do_softirq+0x6ec/0xa3b
      [  323.898930] softirqs last disabled at (182109): [<ffffffff90193426>] irq_exit+0x1a6/0x1e0
      [  323.898930] CPU: 0 PID: 1381 Comm: modprobe Not tainted 4.20.0-rc2+ #27
      [  323.898930] RIP: 0010:raw_notifier_chain_register+0xea/0x240
      [  323.898930] Code: 3c 03 0f 8e f2 00 00 00 44 3b 6b 10 7f 4d 49 bc 00 00 00 00 00 fc ff df eb 22 48 8d 7b 10 488
      [  323.898930] RSP: 0018:ffff888101597218 EFLAGS: 00000206 ORIG_RAX: ffffffffffffff13
      [  323.898930] RAX: 0000000000000000 RBX: ffffffffc04361c0 RCX: 0000000000000000
      [  323.898930] RDX: 1ffffffff26132ae RSI: ffffffffc04aa3c0 RDI: ffffffffc04361d0
      [  323.898930] RBP: ffffffffc04361c8 R08: 0000000000000000 R09: 0000000000000001
      [  323.898930] R10: ffff8881015972b0 R11: fffffbfff26132c4 R12: dffffc0000000000
      [  323.898930] R13: 0000000000000000 R14: 1ffff110202b2e44 R15: ffffffffc04aa3c0
      [  323.898930] FS:  00007f813ed41540(0000) GS:ffff88811ae00000(0000) knlGS:0000000000000000
      [  323.898930] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  323.898930] CR2: 0000559bf2c9f120 CR3: 000000010bc80000 CR4: 00000000001006f0
      [  323.898930] Call Trace:
      [  323.898930]  ? atomic_notifier_chain_register+0x2d0/0x2d0
      [  323.898930]  ? down_read+0x150/0x150
      [  323.898930]  ? sched_clock_cpu+0x126/0x170
      [  323.898930]  ? nf_tables_core_module_init+0xe4/0xe4 [nf_tables]
      [  323.898930]  ? nf_tables_core_module_init+0xe4/0xe4 [nf_tables]
      [  323.898930]  register_netdevice_notifier+0xbb/0x790
      [  323.898930]  ? __dev_close_many+0x2d0/0x2d0
      [  323.898930]  ? __mutex_unlock_slowpath+0x17f/0x740
      [  323.898930]  ? wait_for_completion+0x710/0x710
      [  323.898930]  ? nf_tables_core_module_init+0xe4/0xe4 [nf_tables]
      [  323.898930]  ? up_write+0x6c/0x210
      [  323.898930]  ? nf_tables_core_module_init+0xe4/0xe4 [nf_tables]
      [  324.127073]  ? nf_tables_core_module_init+0xe4/0xe4 [nf_tables]
      [  324.127073]  nft_chain_filter_init+0x1e/0xe8a [nf_tables]
      [  324.127073]  nf_tables_module_init+0x37/0x92 [nf_tables]
      [ ... ]
      
      Fixes: 8dd33cc9 ("netfilter: nf_nat: generalize IPv4 masquerading support for nf_tables")
      Fixes: be6b635c ("netfilter: nf_nat: generalize IPv6 masquerading support for nf_tables")
      Signed-off-by: default avatarTaehee Yoo <ap420073@gmail.com>
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      095faf45
    • Taehee Yoo's avatar
      netfilter: add missing error handling code for register functions · 584eab29
      Taehee Yoo authored
      register_{netdevice/inetaddr/inet6addr}_notifier may return an error
      value, this patch adds the code to handle these error paths.
      Signed-off-by: default avatarTaehee Yoo <ap420073@gmail.com>
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      584eab29
    • Alin Nastac's avatar
      netfilter: ipv6: Preserve link scope traffic original oif · 508b0904
      Alin Nastac authored
      When ip6_route_me_harder is invoked, it resets outgoing interface of:
        - link-local scoped packets sent by neighbor discovery
        - multicast packets sent by MLD host
        - multicast packets send by MLD proxy daemon that sets outgoing
          interface through IPV6_PKTINFO ipi6_ifindex
      
      Link-local and multicast packets must keep their original oif after
      ip6_route_me_harder is called.
      Signed-off-by: default avatarAlin Nastac <alin.nastac@gmail.com>
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      508b0904