1. 31 Aug, 2021 8 commits
    • octeontx2-af: Fix loop in free and unmap counter · 6537e96d
      Subbaraya Sundeep authored
      When the given counter does not belong to the entry,
      the code ends up in an infinite loop because the loop
      cursor, entry, is not updated any further. This
      patch fixes that by updating entry on every iteration.
      
      Fixes: a958dd59 ("octeontx2-af: Map or unmap NPC MCAM entry and counter")
      Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
      Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      6537e96d
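The bug class described in this commit can be illustrated with a minimal userspace sketch. It is not the driver code: unmap_counter(), the entry2cntr array and NO_COUNTER are hypothetical names chosen for illustration. The point is only that the cursor is advanced before any "continue", so a non-matching entry cannot be revisited forever.

```c
#include <assert.h>
#include <stddef.h>

#define NO_COUNTER (-1)	/* hypothetical marker for "no counter mapped" */

/*
 * Unmap every entry that is mapped to counter "cntr".
 * entry2cntr is a hypothetical stand-in for an entry-to-counter map.
 * Returns the number of entries that were unmapped.
 */
static int unmap_counter(int *entry2cntr, size_t n_entries, int cntr)
{
	size_t entry = 0;
	int unmapped = 0;

	assert(entry2cntr != NULL);

	while (entry < n_entries) {
		size_t index = entry;

		/*
		 * Advance the cursor first. If this update came after the
		 * "continue" below, a non-matching entry would be examined
		 * again and again.
		 */
		entry = index + 1;

		if (entry2cntr[index] != cntr)
			continue;

		entry2cntr[index] = NO_COUNTER;
		unmapped++;
	}

	return unmapped;
}
```

Calling unmap_counter() on a map that mixes matching and non-matching entries terminates after one pass over the array.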
    • af_unix: fix potential NULL deref in unix_dgram_connect() · dc56ad70
      Eric Dumazet authored
      syzbot was able to trigger NULL deref in unix_dgram_connect() [1]
      
      This happens in
      
      	if (unix_peer(sk))
      		sk->sk_state = other->sk_state = TCP_ESTABLISHED; // crash because @other is NULL
      
      Because locks have been dropped, unix_peer() might be non NULL,
      while @other is NULL (AF_UNSPEC case)
      
      We need to move code around, so that we no longer access
      unix_peer() and sk_state while locks have been released.
      
      [1]
      general protection fault, probably for non-canonical address 0xdffffc0000000002: 0000 [#1] PREEMPT SMP KASAN
      KASAN: null-ptr-deref in range [0x0000000000000010-0x0000000000000017]
      CPU: 0 PID: 10341 Comm: syz-executor239 Not tainted 5.14.0-rc7-syzkaller #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      RIP: 0010:unix_dgram_connect+0x32a/0xc60 net/unix/af_unix.c:1226
      Code: 00 00 45 31 ed 49 83 bc 24 f8 05 00 00 00 74 69 e8 eb 5b a6 f9 48 8d 7d 12 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 <0f> b6 04 02 48 89 fa 83 e2 07 38 d0 7f 08 84 c0 0f 85 e0 07 00 00
      RSP: 0018:ffffc9000a89fcd8 EFLAGS: 00010202
      RAX: dffffc0000000000 RBX: 0000000000000004 RCX: 0000000000000000
      RDX: 0000000000000002 RSI: ffffffff87cf4ef5 RDI: 0000000000000012
      RBP: 0000000000000000 R08: 0000000000000000 R09: ffff88802e1917c3
      R10: ffffffff87cf4eba R11: 0000000000000001 R12: ffff88802e191740
      R13: 0000000000000000 R14: ffff88802e191d38 R15: ffff88802e1917c0
      FS:  00007f3eb0052700(0000) GS:ffff8880b9c00000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00000000004787d0 CR3: 0000000029c0a000 CR4: 00000000001506f0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Call Trace:
       __sys_connect_file+0x155/0x1a0 net/socket.c:1890
       __sys_connect+0x161/0x190 net/socket.c:1907
       __do_sys_connect net/socket.c:1917 [inline]
       __se_sys_connect net/socket.c:1914 [inline]
       __x64_sys_connect+0x6f/0xb0 net/socket.c:1914
       do_syscall_x64 arch/x86/entry/common.c:50 [inline]
       do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      RIP: 0033:0x446a89
      Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 a1 15 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 bc ff ff ff f7 d8 64 89 01 48
      RSP: 002b:00007f3eb0052208 EFLAGS: 00000246 ORIG_RAX: 000000000000002a
      RAX: ffffffffffffffda RBX: 00000000004cc4d8 RCX: 0000000000446a89
      RDX: 000000000000006e RSI: 0000000020000180 RDI: 0000000000000003
      RBP: 00000000004cc4d0 R08: 00007f3eb0052700 R09: 0000000000000000
      R10: 00007f3eb0052700 R11: 0000000000000246 R12: 00000000004cc4dc
      R13: 00007ffd791e79cf R14: 00007f3eb0052300 R15: 0000000000022000
      Modules linked in:
      ---[ end trace 4eb809357514968c ]---
      RIP: 0010:unix_dgram_connect+0x32a/0xc60 net/unix/af_unix.c:1226
      Code: 00 00 45 31 ed 49 83 bc 24 f8 05 00 00 00 74 69 e8 eb 5b a6 f9 48 8d 7d 12 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 <0f> b6 04 02 48 89 fa 83 e2 07 38 d0 7f 08 84 c0 0f 85 e0 07 00 00
      RSP: 0018:ffffc9000a89fcd8 EFLAGS: 00010202
      RAX: dffffc0000000000 RBX: 0000000000000004 RCX: 0000000000000000
      RDX: 0000000000000002 RSI: ffffffff87cf4ef5 RDI: 0000000000000012
      RBP: 0000000000000000 R08: 0000000000000000 R09: ffff88802e1917c3
      R10: ffffffff87cf4eba R11: 0000000000000001 R12: ffff88802e191740
      R13: 0000000000000000 R14: ffff88802e191d38 R15: ffff88802e1917c0
      FS:  00007f3eb0052700(0000) GS:ffff8880b9d00000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00007ffd791fe960 CR3: 0000000029c0a000 CR4: 00000000001506e0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      
      Fixes: 83301b53 ("af_unix: Set TCP_ESTABLISHED for datagram sockets too")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Cong Wang <cong.wang@bytedance.com>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      dc56ad70
    • dpaa2-eth: Replace strlcpy with strscpy · 995786ba
      Jason Wang authored
      strlcpy() should not be used because it doesn't limit the source
      length. As Linus says, it's a completely useless function if you
      can't implicitly trust the source string - but that is almost always
      why people think they should use it! All in all, the BSD function
      will lead to some potential bugs.
      
      strscpy(), by contrast, doesn't require reading memory from the src
      string beyond the specified "count" bytes, and its return value is
      easier to error-check than strlcpy()'s. In addition, the implementation
      is robust to the string changing out from underneath it, unlike the
      current strlcpy() implementation.
      
      Thus, we prefer using strscpy() instead of strlcpy().
      Signed-off-by: Jason Wang <wangborong@cdjrlc.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      995786ba
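The two properties this commit message relies on (no reads beyond "count" bytes of the source, and a return value that is easy to check) can be shown with a small userspace sketch. sketch_strscpy() is a hypothetical name and this is not the kernel implementation; it only mimics the documented contract of returning the copied length, or -E2BIG when the source did not fit.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>

/*
 * Userspace sketch of strscpy()-like semantics (hypothetical helper).
 * Copies at most count bytes including the terminating NUL.
 * Returns the number of characters copied (without the NUL),
 * or -E2BIG if count is 0 or the source had to be truncated.
 */
static ssize_t sketch_strscpy(char *dst, const char *src, size_t count)
{
	size_t i;

	assert(dst != NULL && src != NULL);

	if (count == 0)
		return -E2BIG;

	/* At most count bytes of src are ever read. */
	for (i = 0; i < count; i++) {
		dst[i] = src[i];
		if (src[i] == '\0')
			return (ssize_t)i;
	}

	/* The source did not fit: terminate dst and report truncation. */
	dst[count - 1] = '\0';
	return -E2BIG;
}
```

A caller can detect truncation with a single "< 0" test, whereas strlcpy() requires comparing its return value against the buffer size and has to walk the whole source string to compute that value.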
    • octeontx2-af: Use NDC TX for transmit packet data · a7314371
      Geetha sowjanya authored
      For better performance, set the hardware to use NDC TX for reading packet
      data specified by NIX_SEND_SG_S.
      Signed-off-by: Geetha sowjanya <gakula@marvell.com>
      Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      a7314371
    • net: bridge: use mld2r_ngrec instead of icmpv6_dataun · 6baeb395
      MichelleJin authored
      br_ip6_multicast_mld2_report function uses icmp6h
      to parse mld2_report packet.
      
      mld2r_ngrec defines mld2r_hdr.icmp6_dataun.un_data16[1]
      in include/net/mld.h.
      
      So, it is more compact to use mld2r rather than icmp6h.
      
      By doing printk test, it is confirmed that
      icmp6h->icmp6_dataun.un_data16[1] and mld2r->mld2r_ngrec are
      indeed equivalent.
      
      Also, sizeof(*mld2r) and sizeof(*icmp6h) are equivalent, too.
      Signed-off-by: MichelleJin <shjy180909@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      6baeb395
    • net: qualcomm: fix QCA7000 checksum handling · 429205da
      Stefan Wahren authored
      Based on tests the QCA7000 doesn't support checksum offloading. So assume
      ip_summed is CHECKSUM_NONE and let the kernel take care of the checksum
      handling. This fixes data transfer issues in noisy environments.
      Reported-by: Michael Heimpold <michael.heimpold@in-tech.com>
      Fixes: 291ab06e ("net: qualcomm: new Ethernet over SPI driver for QCA7000")
      Signed-off-by: Stefan Wahren <stefan.wahren@i2se.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      429205da
    • net: pasemi: Remove usage of the deprecated "pci-dma-compat.h" API · a16ef91a
      Christophe JAILLET authored
      In [1], Christoph Hellwig has proposed to remove the wrappers in
      include/linux/pci-dma-compat.h.
      
      Some reasons why this API should be removed have been given by Julia
      Lawall in [2].
      
      A coccinelle script has been used to perform the needed transformation.
      Only relevant parts are given below.
      
      An 'unlikely()' has been removed when calling 'dma_mapping_error()' because
      this function, which is inlined, already has such an annotation.
      
      @@ @@
      -    PCI_DMA_TODEVICE
      +    DMA_TO_DEVICE
      
      @@ @@
      -    PCI_DMA_FROMDEVICE
      +    DMA_FROM_DEVICE
      
      @@
      expression e1, e2, e3, e4;
      @@
      -    pci_map_single(e1, e2, e3, e4)
      +    dma_map_single(&e1->dev, e2, e3, e4)
      
      @@
      expression e1, e2, e3, e4;
      @@
      -    pci_unmap_single(e1, e2, e3, e4)
      +    dma_unmap_single(&e1->dev, e2, e3, e4)
      
      @@
      expression e1, e2, e3, e4, e5;
      @@
      -    pci_map_page(e1, e2, e3, e4, e5)
      +    dma_map_page(&e1->dev, e2, e3, e4, e5)
      
      @@
      expression e1, e2, e3, e4;
      @@
      -    pci_unmap_page(e1, e2, e3, e4)
      +    dma_unmap_page(&e1->dev, e2, e3, e4)
      
      @@
      expression e1, e2;
      @@
      -    pci_dma_mapping_error(e1, e2)
      +    dma_mapping_error(&e1->dev, e2)
      
      [1]: https://lore.kernel.org/kernel-janitors/20200421081257.GA131897@infradead.org/
      [2]: https://lore.kernel.org/kernel-janitors/alpine.DEB.2.22.394.2007120902170.2424@hadrien/
      
      Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
      Link: https://lore.kernel.org/r/bc6cd281eae024b26fd9c7ef6678d2d1dc9d74fd.1630150008.git.christophe.jaillet@wanadoo.fr
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      a16ef91a
    • net: sched: Fix qdisc_rate_table refcount leak when get tcf_block failed · c6607012
      Xiyu Yang authored
      The reference counting issue happens in one exception handling path of
      cbq_change_class(). When it fails to get the tcf_block, the function
      forgets to decrease the refcount of "rtab" that was increased by
      qdisc_get_rtab(), causing a refcount leak.
      
      Fix this issue by jumping to the "failure" label when getting the
      tcf_block fails.
      
      Fixes: 6529eaba ("net: sched: introduce tcf block infractructure")
      Signed-off-by: Xiyu Yang <xiyuyang19@fudan.edu.cn>
      Reviewed-by: Cong Wang <cong.wang@bytedance.com>
      Link: https://lore.kernel.org/r/1630252681-71588-1-git-send-email-xiyuyang19@fudan.edu.cn
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      c6607012
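The pattern behind this fix, releasing a reference on every error path by jumping to a shared cleanup label, can be sketched in plain C. Everything here is hypothetical: struct rate_table, get_rtab(), put_rtab() and change_class() are simplified stand-ins, and the block_ok flag merely simulates whether the later lookup succeeds.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a refcounted rate table. */
struct rate_table {
	int refcnt;
};

static struct rate_table *get_rtab(struct rate_table *rtab)
{
	rtab->refcnt++;
	return rtab;
}

static void put_rtab(struct rate_table *rtab)
{
	rtab->refcnt--;
}

/*
 * block_ok simulates whether a later setup step succeeds.
 * On success the new class keeps its reference; on failure
 * the reference taken at the top must be dropped again.
 */
static int change_class(struct rate_table *shared, int block_ok)
{
	struct rate_table *rtab;
	int err;

	assert(shared != NULL);
	rtab = get_rtab(shared);

	if (!block_ok) {
		err = -ENOMEM;
		/* A plain "return err;" here would leak the reference. */
		goto failure;
	}

	return 0;

failure:
	put_rtab(rtab);
	return err;
}
```

Routing every failure through one label keeps the acquire and release paired in a single place, which is why the fix is a jump rather than an extra release call at the failing site.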
  2. 30 Aug, 2021 32 commits
    • Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next · 19a31d79
      Jakub Kicinski authored
      Daniel Borkmann says:
      
      ====================
      bpf-next 2021-08-31
      
      We've added 116 non-merge commits during the last 17 day(s) which contain
      a total of 126 files changed, 6813 insertions(+), 4027 deletions(-).
      
      The main changes are:
      
      1) Add opaque bpf_cookie to perf link which the program can read out again,
         to be used in libbpf-based USDT library, from Andrii Nakryiko.
      
      2) Add bpf_task_pt_regs() helper to access userspace pt_regs, from Daniel Xu.
      
      3) Add support for UNIX stream type sockets for BPF sockmap, from Jiang Wang.
      
      4) Allow BPF TCP congestion control progs to call bpf_setsockopt() e.g. to switch
         to another congestion control algorithm during init, from Martin KaFai Lau.
      
      5) Extend BPF iterator support for UNIX domain sockets, from Kuniyuki Iwashima.
      
      6) Allow bpf_{set,get}sockopt() calls from setsockopt progs, from Prankur Gupta.
      
      7) Add bpf_get_netns_cookie() helper for BPF_PROG_TYPE_{SOCK_OPS,CGROUP_SOCKOPT}
         progs, from Xu Liu and Stanislav Fomichev.
      
      8) Support for __weak typed ksyms in libbpf, from Hao Luo.
      
      9) Shrink struct cgroup_bpf by 504 bytes through refactoring, from Dave Marchevsky.
      
      10) Fix a smatch complaint in verifier's narrow load handling, from Andrey Ignatov.
      
      11) Fix BPF interpreter's tail call count limit, from Daniel Borkmann.
      
      12) Big batch of improvements to BPF selftests, from Magnus Karlsson, Li Zhijian,
          Yucong Sun, Yonghong Song, Ilya Leoshkevich, Jussi Maki, others.
      
      13) Another big batch to revamp XDP samples in order to give them consistent look
          and feel, from Kumar Kartikeya Dwivedi.
      
      * https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (116 commits)
        MAINTAINERS: Remove self from powerpc BPF JIT
        selftests/bpf: Fix potential unreleased lock
        samples: bpf: Fix uninitialized variable in xdp_redirect_cpu
        selftests/bpf: Reduce more flakyness in sockmap_listen
        bpf: Fix bpf-next builds without CONFIG_BPF_EVENTS
        bpf: selftests: Add dctcp fallback test
        bpf: selftests: Add connect_to_fd_opts to network_helpers
        bpf: selftests: Add sk_state to bpf_tcp_helpers.h
        bpf: tcp: Allow bpf-tcp-cc to call bpf_(get|set)sockopt
        selftests: xsk: Preface options with opt
        selftests: xsk: Make enums lower case
        selftests: xsk: Generate packets from specification
        selftests: xsk: Generate packet directly in umem
        selftests: xsk: Simplify cleanup of ifobjects
        selftests: xsk: Decrease sending speed
        selftests: xsk: Validate tx stats on tx thread
        selftests: xsk: Simplify packet validation in xsk tests
        selftests: xsk: Rename worker_* functions that are not thread entry points
        selftests: xsk: Disassociate umem size with packets sent
        selftests: xsk: Remove end-of-test packet
        ...
      ====================
      
      Link: https://lore.kernel.org/r/20210830225618.11634-1-daniel@iogearbox.net
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      19a31d79
    • sch_htb: Fix inconsistency when leaf qdisc creation fails · ca49bfd9
      Maxim Mikityanskiy authored
      In HTB offload mode, qdiscs of leaf classes are grafted to netdev
      queues. sch_htb expects the dev_queue field of these qdiscs to point to
      the corresponding queues. However, qdisc creation may fail, and in that
      case noop_qdisc is used instead. Its dev_queue doesn't point to the
      right queue, so sch_htb can lose track of used netdev queues, which will
      cause internal inconsistencies.
      
      This commit fixes this bug by keeping track of the netdev queue inside
      struct htb_class. All reads of cl->leaf.q->dev_queue are replaced by the
      new field, the two values are synced on writes, and WARNs are added to
      assert equality of the two values.
      
      The driver API has changed: when TC_HTB_LEAF_DEL needs to move a queue,
      the driver used to pass the old and new queue IDs to sch_htb. Now that
      there is a new field (offload_queue) in struct htb_class that needs to
      be updated on this operation, the driver will pass the old class ID to
      sch_htb instead (it already knows the new class ID).
      
      Fixes: d03b195b ("sch_htb: Hierarchical QoS hardware offload")
      Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
      Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
      Link: https://lore.kernel.org/r/20210826115425.1744053-1-maximmi@nvidia.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      ca49bfd9
    • MAINTAINERS: Remove self from powerpc BPF JIT · fca35b11
      Sandipan Das authored
      Stepping down as I haven't had a chance to look into the powerpc
      BPF JIT compilers for a while.
      Signed-off-by: Sandipan Das <sandipan@linux.ibm.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Link: https://lore.kernel.org/bpf/20210827111905.396145-1-sandipan@linux.ibm.com
      fca35b11
    • net: ipv4: Fix the warning for dereference · 1b9fbe81
      Yajun Deng authored
      Add an if statement to avoid the warning.
      
      Dan Carpenter reports:
      The patch faf482ca: "net: ipv4: Move ip_options_fragment() out of
      loop" from Aug 23, 2021, leads to the following Smatch complaint:
      
          net/ipv4/ip_output.c:833 ip_do_fragment()
          warn: variable dereferenced before check 'iter.frag' (see line 828)
      Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
      Fixes: faf482ca ("net: ipv4: Move ip_options_fragment() out of loop")
      Link: https://lore.kernel.org/netdev/20210830073802.GR7722@kadam/T/#t
      Signed-off-by: Yajun Deng <yajun.deng@linux.dev>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      1b9fbe81
    • net: qrtr: make checks in qrtr_endpoint_post() stricter · aaa8e492
      Dan Carpenter authored
      These checks are still not strict enough.  The main problem is that if
      "cb->type == QRTR_TYPE_NEW_SERVER" is true then "len - hdrlen" is
      guaranteed to be 4 but we need at least 16 bytes.  In fact, we
      can reject everything smaller than sizeof(*pkt) which is 20 bytes.
      
      Also I don't like the ALIGN(size, 4).  It's better to just insist that
      the data needs to be aligned at the start.
      
      Fixes: 0baa99ee ("net: qrtr: Allow non-immediate node routing")
      Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      aaa8e492
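The stricter validation described in this commit might look roughly like the following userspace sketch. It is an illustration under stated assumptions, not the qrtr code: payload_ok() and struct ctrl_pkt are hypothetical, the 20-byte layout is chosen only to match the sizeof(*pkt) figure quoted in the message, and rejecting an empty payload is an extra assumption of this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 20-byte control packet; field names are illustrative. */
struct ctrl_pkt {
	uint32_t cmd;
	uint32_t service;
	uint32_t instance;
	uint32_t node;
	uint32_t port;
};

static_assert(sizeof(struct ctrl_pkt) == 20, "sketch assumes a 20-byte packet");

/*
 * len:    total bytes received
 * hdrlen: bytes taken by the header
 * size:   payload size announced in the header
 */
static bool payload_ok(size_t len, size_t hdrlen, size_t size,
		       bool is_new_server)
{
	/* The header itself must fit in the buffer. */
	if (hdrlen > len)
		return false;

	/*
	 * Insist that the payload is already 4-byte aligned instead of
	 * rounding it up, and that it fills the buffer exactly.
	 */
	if (size == 0 || (size & 3) != 0 || len - hdrlen != size)
		return false;

	/* A "new server" message must carry a full control packet. */
	if (is_new_server && size < sizeof(struct ctrl_pkt))
		return false;

	return true;
}
```

With these checks a 4-byte payload is accepted for ordinary data but rejected when the message claims to be a control packet.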
    • fix array-index-out-of-bounds in taprio_change · efe487fc
      Haimin Zhang authored
      syzbot reported an array-index-out-of-bounds in taprio_change:
      index 16 is out of range for type '__u16 [16]'.
      That's because mqprio->num_tc is larger than TC_MAX_QUEUE, so we check
      the return value of netdev_set_num_tc().
      
      Reported-by: syzbot+2b3e5fb6c7ef285a94f6@syzkaller.appspotmail.com
      Signed-off-by: Haimin Zhang <tcs_kernel@tencent.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      efe487fc
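Why checking the return value matters can be seen in a small userspace sketch. struct fake_netdev, set_num_tc() and apply_mqprio() are hypothetical stand-ins; only the limit of 16 traffic classes is taken from the report quoted in this commit.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define TC_MAX_QUEUE 16

/* Hypothetical, heavily simplified device state. */
struct fake_netdev {
	unsigned int num_tc;
	uint16_t tc_count[TC_MAX_QUEUE];
};

/* Refuses more traffic classes than the per-class arrays can hold. */
static int set_num_tc(struct fake_netdev *dev, unsigned int num_tc)
{
	if (num_tc > TC_MAX_QUEUE)
		return -EINVAL;

	dev->num_tc = num_tc;
	return 0;
}

static int apply_mqprio(struct fake_netdev *dev, unsigned int num_tc,
			const uint16_t *count)
{
	unsigned int i;
	int err;

	assert(dev != NULL && count != NULL);

	/*
	 * If this result were ignored, the loop below would write
	 * tc_count[16] for num_tc == 17, one element past the array.
	 */
	err = set_num_tc(dev, num_tc);
	if (err)
		return err;

	for (i = 0; i < num_tc; i++)
		dev->tc_count[i] = count[i];

	return 0;
}
```

The caller never needs its own copy of the limit; it only has to stop as soon as the setter reports an error.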
    • net: fix NULL pointer reference in cipso_v4_doi_free · e842cb60
      王贇 authored
      In netlbl_cipsov4_add_std(), when the allocation of 'doi_def->map.std'
      fails, we sometimes observe a panic:
      
        BUG: kernel NULL pointer dereference, address:
        ...
        RIP: 0010:cipso_v4_doi_free+0x3a/0x80
        ...
        Call Trace:
         netlbl_cipsov4_add_std+0xf4/0x8c0
         netlbl_cipsov4_add+0x13f/0x1b0
         genl_family_rcv_msg_doit.isra.15+0x132/0x170
         genl_rcv_msg+0x125/0x240
      
      This is because cipso_v4_doi_free() does not check
      'doi_def->map.std' when doi_def->type has the value 1, which
      is possible, since netlbl_cipsov4_add_std() hasn't initialized
      it before allocating 'doi_def->map.std'.
      
      This patch just adds the check to prevent a panic in similar
      cases.
      Reported-by: Abaci <abaci@linux.alibaba.com>
      Signed-off-by: Michael Wang <yun.wang@linux.alibaba.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      e842cb60
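A free routine that tolerates a partially constructed object can be sketched as follows. The types and names (struct doi_def, struct std_map, MAP_TYPE_TRANSLATED, doi_free()) are hypothetical simplifications, and the return value exists only so the behaviour can be checked; it is not part of the kernel function.

```c
#include <assert.h>
#include <stdlib.h>

#define MAP_TYPE_TRANSLATED 1	/* hypothetical name for the "type 1" case */

/* Hypothetical, simplified mapping tables. */
struct std_map {
	int *levels;
	int *categories;
};

struct doi_def {
	int type;
	struct std_map *std;
};

/*
 * Frees a definition that may be only partially set up.
 * Returns how many top-level objects were released
 * (0 for NULL, 1 for the definition alone, 2 with its map).
 */
static int doi_free(struct doi_def *def)
{
	int freed = 0;

	if (def == NULL)
		return 0;

	/* The map may never have been allocated, so test it first. */
	if (def->type == MAP_TYPE_TRANSLATED && def->std != NULL) {
		free(def->std->levels);
		free(def->std->categories);
		free(def->std);
		freed++;
	}

	free(def);
	return freed + 1;
}
```

With the NULL test in place, an error path can call the same free routine regardless of how far construction got.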
    • Merge branch 'inet-exceptions-less-predictable' · 63cad4c7
      David S. Miller authored
      Eric Dumazet says:
      
      ====================
      inet: make exception handling less predictible
      
      This second round of patches is addressing Keyu Man recommendations
      to make linux hosts more robust against a class of brute force attacks.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      63cad4c7
    • ipv4: make exception cache less predictible · 67d6d681
      Eric Dumazet authored
      Even after commit 6457378f ("ipv4: use siphash instead of Jenkins in
      fnhe_hashfun()"), an attacker can still use brute force to learn
      some secrets from a victim linux host.
      
      One way to defeat these attacks is to make the max depth of the hash
      table bucket a random value.
      
      Before this patch, each bucket of the hash table used to store exceptions
      could contain 6 items under attack.
      
      After the patch, each bucket contains a random number of items,
      between 6 and 10. The attacker can no longer infer secrets.
      
      This slightly increases the memory used by the hash table,
      by 50% on average; we do not expect this to be a problem.
      
      This patch is more complex than the prior one (IPv6 equivalent),
      because IPv4 was reusing the oldest entry.
      Since we need to be able to evict more than one entry per
      update_or_create_fnhe() call, I had to replace
      fnhe_oldest() with fnhe_remove_oldest().
      
      Also note that we will queue extra kfree_rcu() calls under stress,
      which hopefully won't be too big an issue.
      
      Fixes: 4895c771 ("ipv4: Add FIB nexthop exceptions.")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: Keyu Man <kman001@ucr.edu>
      Cc: Willy Tarreau <w@1wt.eu>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Reviewed-by: David Ahern <dsahern@kernel.org>
      Tested-by: David Ahern <dsahern@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      67d6d681
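The idea of a randomized bucket depth with repeated eviction of the oldest entry can be sketched in userspace C. All names (struct exc, push(), remove_oldest(), trim_bucket(), RECLAIM_DEPTH, RECLAIM_SPREAD) are hypothetical, rand() merely stands in for a kernel random source, and the 6 to 10 range is taken from the description in this commit.

```c
#include <assert.h>
#include <stdlib.h>

#define RECLAIM_DEPTH  6	/* lower bound quoted in the message */
#define RECLAIM_SPREAD 5	/* limits 6..10 inclusive */

/* Hypothetical exception entry in one hash bucket. */
struct exc {
	struct exc *next;
	unsigned long stamp;	/* smaller means older */
};

/* Pick a per-update depth limit that an attacker cannot predict. */
static int random_max_depth(void)
{
	return RECLAIM_DEPTH + rand() % RECLAIM_SPREAD;
}

static int push(struct exc **head, unsigned long stamp)
{
	struct exc *e = malloc(sizeof(*e));

	if (e == NULL)
		return -1;
	e->stamp = stamp;
	e->next = *head;
	*head = e;
	return 0;
}

/* Unlink and free the entry with the smallest stamp. */
static void remove_oldest(struct exc **head)
{
	struct exc **oldest = head;
	struct exc **p;
	struct exc *victim;

	assert(head != NULL && *head != NULL);

	for (p = &(*head)->next; *p != NULL; p = &(*p)->next)
		if ((*p)->stamp < (*oldest)->stamp)
			oldest = p;

	victim = *oldest;
	*oldest = victim->next;
	free(victim);
}

/* Evict as many entries as needed, not just one per update. */
static int trim_bucket(struct exc **head, int depth, int max_depth)
{
	while (depth > max_depth) {
		remove_oldest(head);
		depth--;
	}
	return depth;
}
```

Because the limit changes from one update to the next, a bucket may need to shed several entries at once, which is why a "remove the oldest" helper is looped here instead of reusing a single oldest slot.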
    • ipv6: make exception cache less predictible · a00df2ca
      Eric Dumazet authored
      Even after commit 4785305c ("ipv6: use siphash in rt6_exception_hash()"),
      an attacker can still use brute force to learn some secrets from a victim
      linux host.
      
      One way to defeat these attacks is to make the max depth of the hash
      table bucket a random value.
      
      Before this patch, each bucket of the hash table used to store exceptions
      could contain 6 items under attack.
      
      After the patch, each bucket contains a random number of items,
      between 6 and 10. The attacker can no longer infer secrets.
      
      This slightly increases the memory used by the hash table;
      we do not expect this to be a problem.
      
      The following patch deals with the same issue in IPv4.
      
      Fixes: 35732d01 ("ipv6: introduce a hash table to store dst cache")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: Keyu Man <kman001@ucr.edu>
      Cc: Wei Wang <weiwan@google.com>
      Cc: Martin KaFai Lau <kafai@fb.com>
      Reviewed-by: David Ahern <dsahern@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      a00df2ca
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next · 9dfa859d
      David S. Miller authored
      Pablo Neira Ayuso says:
      
      ====================
      Netfilter updates for net-next
      
      The following patchset contains Netfilter updates for net-next:
      
      1) Clean up and consolidate ct ecache infrastructure by merging ct and
         expect notifiers, from Florian Westphal.
      
      2) Missing counters and timestamp in nfnetlink_queue and _log conntrack
         information.
      
      3) Missing error check for xt_register_template() in iptables mangle,
         as an incremental fix for the previous pull request, also from
         Florian Westphal.
      
      4) Add netfilter hooks for the SRv6 lightweight tunnel driver, from
         Ryoga Sato. The hooks are enabled via nf_hooks_lwtunnel sysctl
         to make sure existing netfilter rulesets do not break. There is
         a static key to disable the hooks by default.
      
         The pktgen_bench_xmit_mode_netif_receive.sh shows no noticeable
         impact in the seg6_input path for non-netfilter users: similar
         numbers with and without this patch.
      
         This is a sample of the perf report output:
      
          11.67%  kpktgend_0       [ipv6]                    [k] ipv6_get_saddr_eval
           7.89%  kpktgend_0       [ipv6]                    [k] __ipv6_addr_label
           7.52%  kpktgend_0       [ipv6]                    [k] __ipv6_dev_get_saddr
           6.63%  kpktgend_0       [kernel.vmlinux]          [k] asm_exc_nmi
           4.74%  kpktgend_0       [ipv6]                    [k] fib6_node_lookup_1
           3.48%  kpktgend_0       [kernel.vmlinux]          [k] pskb_expand_head
           3.33%  kpktgend_0       [ipv6]                    [k] ip6_rcv_core.isra.29
           3.33%  kpktgend_0       [ipv6]                    [k] seg6_do_srh_encap
           2.53%  kpktgend_0       [ipv6]                    [k] ipv6_dev_get_saddr
           2.45%  kpktgend_0       [ipv6]                    [k] fib6_table_lookup
           2.24%  kpktgend_0       [kernel.vmlinux]          [k] ___cache_free
           2.16%  kpktgend_0       [ipv6]                    [k] ip6_pol_route
           2.11%  kpktgend_0       [kernel.vmlinux]          [k] __ipv6_addr_type
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      9dfa859d
    • Merge branch 'IXP46x-PTP-Timer' · 724812d8
      David S. Miller authored
      Linus Walleij says:
      
      ====================
      IXP46x PTP Timer clean-up and DT
      
      ChangeLog v2->v3:
      
      - Dropped the patch enabling compile tests: we are still dependent
        on some machine-specific headers. The plan is to get rid of this
        after device tree conversion. We include one of the compile testing
        fixes anyway, because it is nice to have fixed.
      
      - Rebased on the latest net-next
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      724812d8
    • ixp4xx_eth: Probe the PTP module from the device tree · e9e50622
      Linus Walleij authored
      This adds device tree probing support for the PTP module
      adjacent to the ethernet module. It is pretty
      straightforward: all resources are in the device tree as they
      come to the platform device.
      
      Cc: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      e9e50622
    • ixp4xx_eth: Add devicetree bindings · 323fb75d
      Linus Walleij authored
      This adds device tree bindings for the IXP46x PTP Timer, a companion
      to the IXP4xx ethernet in newer platforms.
      
      Cc: devicetree@vger.kernel.org
      Cc: Arnd Bergmann <arnd@arndb.de>
      Reviewed-by: Rob Herring <robh@kernel.org>
      Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      323fb75d
    • ixp4xx_eth: Stop referring to GPIOs · 13dc9319
      Linus Walleij authored
      The driver is being passed interrupts, then looking up the
      same interrupts as GPIOs a second time to convert them into
      interrupts and set properties on them.
      
      This is pointless: the GPIO and irqchip APIs of a GPIO chip
      are orthogonal. Just request the interrupts and be done
      with it, drop reliance on any GPIO functions or definitions.
      
      Use devres-managed functions and add a small devres quirk
      to unregister the clock as well, so that we can rely on devres
      to handle all the resources and cut down a bunch of
      boilerplate in the process.
      
      Cc: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      13dc9319
    • ixp4xx_eth: fix compile-testing · f52749a2
      Arnd Bergmann authored
      Change the driver to use portable integer types to avoid warnings
      during compile testing, including:
      
      drivers/net/ethernet/xscale/ixp4xx_eth.c:721:21: error: cast to 'u32 *' (aka 'unsigned int *') from smaller integer type 'int' [-Werror,-Wint-to-pointer-cast]
              memcpy_swab32(mem, (u32 *)((int)skb->data & ~3), bytes / 4);
                                 ^
      drivers/net/ethernet/xscale/ixp4xx_eth.c:963:12: error: incompatible pointer types passing 'u32 *' (aka 'unsigned int *') to parameter of type 'dma_addr_t *' (aka 'unsigned long long *') [-Werror,-Wincompatible-pointer-types]
                                                    &port->desc_tab_phys)))
                                                    ^~~~~~~~~~~~~~~~~~~~
      include/linux/dmapool.h:27:20: note: passing argument to parameter 'handle' here
                           dma_addr_t *handle);
                                       ^
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      f52749a2
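The first warning quoted in this commit comes from masking a pointer through "int", which is narrower than a pointer on 64-bit targets. A portable way to round a pointer down to a 4-byte boundary is to go through uintptr_t, as in this sketch. align_down_u32() is a hypothetical helper, and whether the actual patch uses exactly this type is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Round a byte pointer down to the previous 4-byte boundary.
 * uintptr_t can hold any object pointer, unlike int, so the
 * conversion does not truncate on 64-bit builds.
 */
static const uint32_t *align_down_u32(const void *p)
{
	assert(p != NULL);

	return (const uint32_t *)((uintptr_t)p & ~(uintptr_t)3);
}
```

The mask is written as ~(uintptr_t)3 so that it is widened to the pointer size before being inverted.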
    • ixp4xx_eth: make ptp support a platform driver · 9055a2f5
      Arnd Bergmann authored
      After the recent ixp4xx cleanups, the ptp driver has gained a
      build failure in some configurations:
      
      drivers/net/ethernet/xscale/ptp_ixp46x.c: In function 'ptp_ixp_init':
      drivers/net/ethernet/xscale/ptp_ixp46x.c:290:51: error: 'IXP4XX_TIMESYNC_BASE_VIRT' undeclared (first use in this function)
      
      Avoid the last bit of hardcoded constants from platform headers
      by turning the ptp driver bit into a platform driver and passing
      the IRQ and MMIO address as resources.
      
      This is a bit tricky:
      
      - The interface between the two drivers is now the new
        ixp46x_ptp_find() function, replacing the global
        ixp46x_phc_index variable. The call is done as late
        as possible, in hwtstamp_set(), to ensure that the
        ptp device is fully probed.
      
      - As the ptp driver is now called by the network driver, the
        link dependency is reversed, which in turn requires a small
        Makefile hack
      
      - The GPIO number is still left hardcoded. This is clearly not
        great, but it can be addressed later. Note that commit 98ac0cc2
        ("ARM: ixp4xx: Convert to MULTI_IRQ_HANDLER") changed the
        IRQ number to something meaningless. Passing the correct IRQ
        in a resource fixes this.
      
      - When the PTP driver is disabled, ethtool .get_ts_info()
        now correctly lists only software timestamping regardless
        of the hardware.
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      [Fix a missing include]
      Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      9055a2f5
    • David S. Miller's avatar
      Merge branch 'hns3-cleanups' · 27c77943
      David S. Miller authored
      Guangbin Huang says:
      
      ====================
      net: hns3: add some cleanups
      
      This series includes some cleanups for the HNS3 ethernet driver.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      27c77943
    • Hao Chen's avatar
      net: hns3: uniform parameter name of hclge_ptp_clean_tx_hwts() · 52d89333
      Hao Chen authored
      The parameter of hclge_ptp_clean_tx_hwts() is named "dev" in its declaration,
      but the definition uses the common name "hdev" like other functions, so
      rename it in the declaration to match.
      Signed-off-by: Hao Chen <chenhao288@hisilicon.com>
      Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      52d89333
    • Hao Chen's avatar
      net: hns3: use max() to simplify code · 38b99e1e
      Hao Chen authored
      Replace the "? :" statement with max() to simplify code.
      Signed-off-by: Hao Chen <chenhao288@hisilicon.com>
      Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      38b99e1e
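As an illustration of the kind of change the commit above describes, here is a minimal userspace sketch. It is not the driver code: `MAX_OF`, `pick_larger_ternary()` and `pick_larger_max()` are hypothetical names, and `MAX_OF` is only a simplified stand-in for the kernel's max() helper, which additionally checks types and avoids evaluating its arguments twice.

```c
#include <assert.h>

/* Hypothetical, simplified stand-in for the kernel's max() helper. */
#define MAX_OF(a, b) ((a) > (b) ? (a) : (b))

/* Before: the comparison is open-coded with "? :". */
static int pick_larger_ternary(int a, int b)
{
	return a > b ? a : b;
}

/* After: the same result, expressed through the helper. */
static int pick_larger_max(int a, int b)
{
	return MAX_OF(a, b);
}

/* Sanity check that both forms agree on a few inputs. */
static void check_equivalence(void)
{
	assert(pick_larger_ternary(3, 7) == pick_larger_max(3, 7));
	assert(pick_larger_ternary(-2, -5) == pick_larger_max(-2, -5));
	assert(pick_larger_ternary(4, 4) == pick_larger_max(4, 4));
}
```

The behavior is unchanged; the helper only makes the intent ("take the larger value") visible at a glance.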
    • Hao Chen's avatar
      net: hns3: modify a print format of hns3_dbg_queue_map() · 5aea2da5
      Hao Chen authored
      The type of tqp_vector->vector_irq is int, so modify its print format
      to "%d".
      Signed-off-by: Hao Chen <chenhao288@hisilicon.com>
      Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      5aea2da5
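The point of the commit above, matching the conversion specifier to the argument type, can be shown with a small userspace sketch. `format_irq()` is a hypothetical helper, not a function from the driver, and the commit text does not say which specifier was used before the change.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Format an int IRQ number with the matching "%d" conversion.
 * Returns the number of characters snprintf() would have written. */
static int format_irq(char *buf, size_t len, int vector_irq)
{
	assert(buf != NULL && len > 0);
	return snprintf(buf, len, "%d", vector_irq);
}
```

With "%d", a negative value such as an error code stored in the int prints with its sign, whereas an unsigned conversion would not match the argument type.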
    • Guangbin Huang's avatar
      net: hns3: refine function hclge_dbg_dump_tm_pri() · 04d96139
      Guangbin Huang authored
      To make dumping the info of every tm priority element more flexible,
      simpler and easier to maintain, add a struct hclge_dbg_item array for tm
      priority and fill in the string for each data item according to this array.
      Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      04d96139
    • Guangbin Huang's avatar
      net: hns3: reconstruct function hclge_ets_validate() · 161ad669
      Guangbin Huang authored
      This patch reconstructs function hclge_ets_validate() to reduce the code's
      cyclomatic complexity and make the code more concise.
      Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      161ad669
    • Peng Li's avatar
      net: hns3: reconstruct function hns3_self_test · 4c8dab1c
      Peng Li authored
      This patch reconstructs function hns3_self_test to reduce the code's
      cyclomatic complexity and make the code more concise.
      Signed-off-by: Peng Li <lipeng321@huawei.com>
      Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      4c8dab1c
    • Jiaran Zhang's avatar
      net: hns3: initialize each member of structure array on a separate line · 60fe9ff9
      Jiaran Zhang authored
      To make the format of each member initialization of structure array
      clearer, initialize each member on a separate line.
      Signed-off-by: Jiaran Zhang <zhangjiaran@huawei.com>
      Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      60fe9ff9
    • David S. Miller's avatar
      Merge branch 'bnxt_en-fw-messages' · 49f9df5b
      David S. Miller authored
      Michael Chan says:
      
      ====================
      bnxt_en: Implement new driver APIs to send FW messages
      
      The current driver APIs to send messages to the firmware allow only one
      outstanding message in flight.  There is only one buffer for the firmware
      response for each firmware channel.  To send a firmware message, all
      callers must take a mutex and it is released after the firmware response
      has been read.  This scheme does not allow multiple firmware messages
      in flight.  Firmware may take a long time to respond to some messages
      (e.g. NVRAM related ones) and this causes the mutex to be held for
      a long time, blocking other callers.
      
      This patchset introduces the new driver APIs to address the above
      shortcomings.  The new APIs are compatible with new and old firmware.
      But the new deferred firmware response mechanism will require newer
      firmware in order to allow multiple outstanding firmware commands.
      
      All callers are updated to use the new APIs.
      
      v2: Patch 4 and patch 9 updated to fix issues reported by test robot
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      49f9df5b
    • Edwin Peer's avatar
      bnxt_en: support multiple HWRM commands in flight · 68f684e2
      Edwin Peer authored
      Add infrastructure to maintain a pending list of HWRM commands awaiting
      completion and reduce the scope of the hwrm_cmd_lock mutex so that it
      protects only the request mailbox. The mailbox is free to use for one
      or more concurrent commands after receiving deferred response events.
      
      For uniformity and completeness, use the same pending list for
      collecting completions for commands that respond via a completion ring.
      These commands are only used for freeing rings and for IRQ test and
      we only support one such command in flight.
      
      Note deferred responses are also only supported on the main channel.
      The secondary channel (KONG) does not support deferred responses.
      Signed-off-by: Edwin Peer <edwin.peer@broadcom.com>
      Signed-off-by: Michael Chan <michael.chan@broadcom.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      68f684e2
    • Edwin Peer's avatar
      bnxt_en: remove legacy HWRM interface · b34695a8
      Edwin Peer authored
      There are no longer any callers relying on the old API.
      Signed-off-by: Edwin Peer <edwin.peer@broadcom.com>
      Signed-off-by: Michael Chan <michael.chan@broadcom.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      b34695a8
    • Edwin Peer's avatar
      bnxt_en: update all firmware calls to use the new APIs · bbf33d1d
      Edwin Peer authored
      The conversion follows this general pattern for most of the calls:
      
      1. The input message is changed from a stack variable initialized
      using bnxt_hwrm_cmd_hdr_init() to a pointer allocated and initialized
      using hwrm_req_init().
      
      2. If we don't need to read the firmware response, the hwrm_send_message()
      call is replaced with hwrm_req_send().
      
      3. If we need to read the firmware response, the mutex lock is replaced
      by hwrm_req_hold() to hold the response.  When the response is read, the
      mutex unlock is replaced by hwrm_req_drop().
      
      If additional DMA buffers are needed for firmware response data, the
      hwrm_req_dma_slice() is used instead of calling dma_alloc_coherent().
      
      Some minor refactoring is also done while doing these conversions.
      
      v2: Fix uninitialized variable warnings in __bnxt_hwrm_get_tx_rings()
      and bnxt_approve_mac()
      Signed-off-by: Edwin Peer <edwin.peer@broadcom.com>
      Signed-off-by: Michael Chan <michael.chan@broadcom.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      bbf33d1d
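The conversion pattern listed in the commit above can be sketched as a small userspace model. Only the names hwrm_req_init(), hwrm_req_send(), hwrm_req_hold() and hwrm_req_drop() come from the commit text; the `mock_*` functions below, their signatures, the `struct mock_req` layout and the ownership rule (send releases the request unless it is held) are assumptions made for illustration, not the driver's actual API.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical model of a managed request; not the driver's layout. */
struct mock_req {
	int type;   /* pretend command type */
	int held;   /* nonzero while the caller holds the response */
	int resp;   /* pretend firmware response */
};

/* Step 1: the request is allocated and initialized, not a stack variable. */
static int mock_req_init(struct mock_req **req, int type)
{
	*req = calloc(1, sizeof(**req));
	if (!*req)
		return -1;
	(*req)->type = type;
	return 0;
}

/* Step 3: hold the request so the response stays valid after sending. */
static int *mock_req_hold(struct mock_req *req)
{
	req->held = 1;
	return &req->resp;
}

/* Step 3: drop the held request once the response has been read. */
static void mock_req_drop(struct mock_req *req)
{
	free(req);
}

/* Send the request. Assumption of this model: an unheld request is
 * released by the send itself. */
static int mock_req_send(struct mock_req *req)
{
	assert(req != NULL);
	req->resp = req->type * 2;   /* fake firmware reply */
	if (!req->held)
		free(req);
	return 0;
}

/* Step 2: caller does not need the response. */
static int fire_and_forget(int type)
{
	struct mock_req *req;
	int rc = mock_req_init(&req, type);

	if (rc)
		return rc;
	return mock_req_send(req);
}

/* Step 3: caller reads the response between hold and drop. */
static int query(int type, int *out)
{
	struct mock_req *req;
	int *resp;
	int rc = mock_req_init(&req, type);

	if (rc)
		return rc;
	resp = mock_req_hold(req);
	rc = mock_req_send(req);
	if (!rc)
		*out = *resp;
	mock_req_drop(req);
	return rc;
}
```

In this model the hold/drop pair takes the place that the mutex lock/unlock pair had around reading the response, which is the substitution the commit describes.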
    • Edwin Peer's avatar
      bnxt_en: use link_lock instead of hwrm_cmd_lock to protect link_info · 3c10ed49
      Edwin Peer authored
      We currently use the hwrm_cmd_lock to serialize the update of the
      firmware's link status response data and the copying of link status data
      to the VF.  This won't work when we update the firmware message APIs, so
      we use the link_lock mutex instead.  All link_info data should be
      updated under the link_lock mutex.  Also add link_lock to functions that
      touch link_info in __bnxt_open_nic() and bnxt_probe_phy(). The locking
      is probably not strictly necessary during probe, but it's more consistent.
      Signed-off-by: Edwin Peer <edwin.peer@broadcom.com>
      Reviewed-by: Michael Chan <michael.chan@broadcom.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      3c10ed49
    • Edwin Peer's avatar
      bnxt_en: add support for HWRM request slices · 21380817
      Edwin Peer authored
      Slices are a mechanism for suballocating DMA mapped regions from the
      request buffer. Such regions can be used for indirect command data
      instead of creating new mappings with dma_alloc_coherent().
      
      The advantage of using a slice is that the lifetime of the slice is
      bound to the request and will be automatically unmapped when the
      request is consumed.
      
      A single external region is also supported. This allows for regions
      that will not fit inside the spare request buffer space such that
      the same API can be used consistently even for larger mappings.
      Signed-off-by: Edwin Peer <edwin.peer@broadcom.com>
      Signed-off-by: Michael Chan <michael.chan@broadcom.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      21380817
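The slice idea in the commit above, suballocating regions from the spare space of a request buffer so that they share the request's lifetime, can be modeled in userspace roughly as follows. This is a sketch under stated assumptions: `struct mock_req_buf`, `MOCK_BUF_SIZE` and the `mock_*` functions are hypothetical, no DMA mapping or alignment handling is modeled, and the single external region mentioned in the commit is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define MOCK_BUF_SIZE 256   /* hypothetical request buffer size */

/* Hypothetical request buffer: the request occupies the front, and
 * slices are carved out of whatever space remains behind it. */
struct mock_req_buf {
	unsigned char data[MOCK_BUF_SIZE];
	size_t used;   /* bytes taken by the request plus all slices */
};

/* Allocate a buffer whose first req_len bytes belong to the request. */
static struct mock_req_buf *mock_buf_new(size_t req_len)
{
	struct mock_req_buf *b;

	if (req_len > MOCK_BUF_SIZE)
		return NULL;
	b = calloc(1, sizeof(*b));
	if (b)
		b->used = req_len;
	return b;
}

/* Carve len bytes out of the spare space; NULL if it does not fit. */
static void *mock_slice(struct mock_req_buf *b, size_t len)
{
	void *p;

	assert(b != NULL);
	if (len > MOCK_BUF_SIZE - b->used)
		return NULL;
	p = b->data + b->used;
	b->used += len;
	return p;
}

/* Releasing the request releases every slice along with it. */
static void mock_buf_free(struct mock_req_buf *b)
{
	free(b);
}
```

Because slices live inside the request buffer, there is no separate free call per slice in this model; a caller whose region does not fit gets NULL and would need the larger external mapping instead.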
    • Edwin Peer's avatar
      bnxt_en: add HWRM request assignment API · ecddc29d
      Edwin Peer authored
      hwrm_req_replace() provides an assignment like operation to replace a
      managed HWRM request object with data from a pre-built source. This is
      useful for handling request data provided by higher layer HWRM clients.
      Signed-off-by: Edwin Peer <edwin.peer@broadcom.com>
      Signed-off-by: Michael Chan <michael.chan@broadcom.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      ecddc29d