1. 18 Jun, 2022 1 commit
    • Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf · 582573f1
      Jakub Kicinski authored
      Daniel Borkmann says:
      
      ====================
      pull-request: bpf 2022-06-17
      
      We've added 12 non-merge commits during the last 4 day(s) which contain
      a total of 14 files changed, 305 insertions(+), 107 deletions(-).
      
      The main changes are:
      
      1) Fix x86 JIT tailcall count offset on BPF-2-BPF call, from Jakub Sitnicki.
      
      2) Fix a kprobe_multi link bug which misplaces BPF cookies, from Jiri Olsa.
      
      3) Fix an infinite loop when processing a module's BTF, from Kumar Kartikeya Dwivedi.
      
      4) Fix getting a rethook only in RCU available context, from Masami Hiramatsu.
      
      5) Fix request socket refcount leak in sk lookup helpers, from Jon Maxwell.
      
      6) Fix xsk xmit behavior which wrongly adds skb to already full cq, from Ciara Loftus.
      
      * https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
        rethook: Reject getting a rethook if RCU is not watching
        fprobe, samples: Add use_trace option and show hit/missed counter
        bpf, docs: Update some of the JIT/maintenance entries
        selftest/bpf: Fix kprobe_multi bench test
        bpf: Force cookies array to follow symbols sorting
        ftrace: Keep address offset in ftrace_lookup_symbols
        selftests/bpf: Shuffle cookies symbols in kprobe multi test
        selftests/bpf: Test tail call counting with bpf2bpf and data on stack
        bpf, x86: Fix tail call count offset calculation on bpf2bpf call
        bpf: Limit maximum modifier chain length in btf_check_type_tags
        bpf: Fix request_sock leak in sk lookup helpers
        xsk: Fix generic transmit when completion queue reservation fails
      ====================
      
      Link: https://lore.kernel.org/r/20220617202119.2421-1-daniel@iogearbox.net
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
  2. 17 Jun, 2022 14 commits
    • rethook: Reject getting a rethook if RCU is not watching · c0f3bb40
      Masami Hiramatsu (Google) authored
      Since rethook_recycle() calls call_rcu() to reclaim the
      rethook_instance, the rethook must be set up in a context where RCU
      is available (non-idle). The rethook_recycle() call in the rethook
      trampoline handler is unavoidable, so the RCU availability check must
      be done before setting the rethook trampoline.
      
      This adds a rcu_is_watching() check in the rethook_try_get() so that
      it will return NULL if it is called when !rcu_is_watching().
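
      The guard described above can be sketched in userspace C. This is only
      an illustration under stated assumptions, not the kernel code:
      mock_rcu_is_watching(), struct node_pool and pool_try_get() are
      hypothetical stand-ins for rcu_is_watching() and rethook_try_get().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's rcu_is_watching(). */
static bool mock_rcu_watching = true;

static bool mock_rcu_is_watching(void)
{
	return mock_rcu_watching;
}

/* Hypothetical pool of return-hook nodes; not the kernel's struct rethook. */
struct node_pool {
	int free_nodes[4];
	size_t nr_free;
};

/*
 * Hand out a node only while RCU is watching, because releasing the node
 * later depends on an RCU callback. Returns NULL when RCU is not watching
 * or when the pool is empty.
 */
static int *pool_try_get(struct node_pool *pool)
{
	assert(pool != NULL);

	if (!mock_rcu_is_watching())
		return NULL;
	if (pool->nr_free == 0)
		return NULL;
	return &pool->free_nodes[--pool->nr_free];
}
```

      Checking before taking a node means a rejected caller leaves the pool
      untouched, so nothing has to be undone on that path.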
      
      Fixes: 54ecbe6f ("rethook: Add a generic return hook")
      Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org>
      Acked-by: Jiri Olsa <jolsa@kernel.org>
      Link: https://lore.kernel.org/bpf/165461827269.280167.7379263615545598958.stgit@devnote2
    • fprobe, samples: Add use_trace option and show hit/missed counter · c88dbbcd
      Masami Hiramatsu (Google) authored
      Add use_trace option to use trace_printk() instead of pr_info()
      so that the handler doesn't involve the RCU operations.
      And show the hit and missed counter so that the user can check
      how many times the probe handler hit and missed.
      Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Jiri Olsa <jolsa@kernel.org>
      Link: https://lore.kernel.org/bpf/165461826247.280167.11939123218334322352.stgit@devnote2
    • bpf, docs: Update some of the JIT/maintenance entries · 63ce81d1
      Daniel Borkmann authored
      Various minor updates around some of the BPF-related entries:
      
      JITs for ARM32/NFP/SPARC/X86-32 haven't seen updates in quite a while, thus
      for now, mark them as 'Odd Fixes' until they become more actively developed.
      
      JITs for POWERPC/S390 are in good shape and receive active development and
      review, thus bump to 'Supported' similar as we have with X86-64/ARM64.
      
      JITs for MIPS/RISC-V are in similar good shape as the ones mentioned above,
      but looked after mostly in spare time, thus leave for now in 'Maintained' state.
      
      Add Michael to PPC JIT given he's picking up the patches there, so it better
      reflects today's state.
      
      Also, I haven't done much reviewing around BPF sockmap/kTLS after John and I
      did the big rework back in the days to integrate sockmap with kTLS.
      
      These days, most of this is taken care of by John, Jakub {Sitnicki,Kicinski} and
      others in the community, so remove myself from these two.
      
      Lastly, move all BPF-related entries into one place, that is, move the sockmap
      one over near rest of BPF.
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Andrii Nakryiko <andrii@kernel.org>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: John Fastabend <john.fastabend@gmail.com>
      Link: https://lore.kernel.org/r/f9b8a63a0b48dc764bd4c50f87632889f5813f69.1655494758.git.daniel@iogearbox.net
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • ipv4: ping: fix bind address validity check · b4a028c4
      Riccardo Paolo Bestetti authored
      Commit 8ff978b8 ("ipv4/raw: support binding to nonlocal addresses")
      introduced a helper function to fold duplicated validity checks of bind
      addresses into inet_addr_valid_or_nonlocal(). However, this caused an
      unintended regression in ping_check_bind_addr(), which previously would
      reject binding to multicast and broadcast addresses, but now these are
      both incorrectly allowed as reported in [1].
      
      This patch restores the original check. A simple reordering is done to
      improve readability and make it evident that multicast and broadcast
      addresses should not be allowed. Also, add an early exit for INADDR_ANY
      which replaces lost behavior added by commit 0ce779a9 ("net: Avoid
      unnecessary inet_addr_type() call when addr is INADDR_ANY").
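
      The ordering described above (early exit for INADDR_ANY, then explicit
      rejection of multicast and broadcast) might look roughly like the
      following userspace sketch. bind_addr_acceptable() is a hypothetical
      helper; the kernel classifies addresses with inet_addr_type(), whereas
      this approximation only recognises the multicast range and the limited
      broadcast address.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical approximation of a ping bind-address check.
 * addr_be is an IPv4 address in network byte order.
 */
static bool bind_addr_acceptable(uint32_t addr_be)
{
	uint32_t addr = ntohl(addr_be);

	if (addr == INADDR_ANY)		/* early exit: wildcard bind is fine */
		return true;
	if (IN_MULTICAST(addr))		/* 224.0.0.0/4 is rejected */
		return false;
	if (addr == INADDR_BROADCAST)	/* 255.255.255.255 is rejected */
		return false;

	/* Local vs. nonlocal checks are omitted in this sketch. */
	return true;
}
```

      Putting the rejections ahead of any "nonlocal bind allowed" shortcut
      is what makes it evident that multicast and broadcast never pass.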
      
      Furthermore, this patch introduces regression selftests to catch these
      specific cases.
      
      [1] https://lore.kernel.org/netdev/CANP3RGdkAcDyAZoT1h8Gtuu0saq+eOrrTiWbxnOs+5zn+cpyKg@mail.gmail.com/
      
      Fixes: 8ff978b8 ("ipv4/raw: support binding to nonlocal addresses")
      Cc: Miaohe Lin <linmiaohe@huawei.com>
      Reported-by: Maciej Żenczykowski <maze@google.com>
      Signed-off-by: Carlos Llamas <cmllamas@google.com>
      Signed-off-by: Riccardo Paolo Bestetti <pbl@bestov.io>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • hamradio: 6pack: fix array-index-out-of-bounds in decode_std_command() · 2b04495e
      Xu Jia authored
      Hulk Robot reports an incorrect sp->rx_count_cooked value in decode_std_command().
      The likely cause is the earlier subtraction from sp->rx_count_cooked:
      the value appears to have been reset to 0 in between, which bypasses the
      preceding check.
      
      The situation is shown below:
      
               (Thread 1)			|  (Thread 2)
      decode_std_command()		| resync_tnc()
      ...					|
      if (rest == 2)			|
      	sp->rx_count_cooked -= 2;	|
      else if (rest == 3)			| ...
      					| sp->rx_count_cooked = 0;
      	sp->rx_count_cooked -= 1;	|
      for (i = 0; i < sp->rx_count_cooked; i++) // report error
      	checksum += sp->cooked_buf[i];
      
      sp->rx_count_cooked is a shared variable but is not protected by a lock.
      The same applies to sp->rx_count. This patch adds a lock to fix the bug.
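
      A minimal userspace sketch of the idea follows, assuming a pthread
      mutex as a stand-in for whatever lock the driver uses. struct rx_state,
      rx_resync() and rx_checksum() are hypothetical names, and the clamp to
      zero is a defensive choice of this sketch, not something stated above.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define COOKED_BUF_SIZE 400

/* Hypothetical, simplified receive state; not the driver's struct sixpack. */
struct rx_state {
	pthread_mutex_t lock;
	unsigned char cooked_buf[COOKED_BUF_SIZE];
	size_t rx_count_cooked;
};

/* Resync path: reset the shared counter under the lock. */
static void rx_resync(struct rx_state *st)
{
	assert(st != NULL);

	pthread_mutex_lock(&st->lock);
	st->rx_count_cooked = 0;
	pthread_mutex_unlock(&st->lock);
}

/*
 * Decode path: adjust the counter and sum the buffer inside one critical
 * section, so a concurrent reset cannot land between the two steps.
 */
static unsigned char rx_checksum(struct rx_state *st, size_t trailer)
{
	unsigned char sum = 0;
	size_t i;

	assert(st != NULL);

	pthread_mutex_lock(&st->lock);
	if (st->rx_count_cooked >= trailer)
		st->rx_count_cooked -= trailer;
	else
		st->rx_count_cooked = 0;	/* defensive clamp in this sketch */
	for (i = 0; i < st->rx_count_cooked && i < COOKED_BUF_SIZE; i++)
		sum += st->cooked_buf[i];
	pthread_mutex_unlock(&st->lock);

	return sum;
}
```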
      
      The fail log is shown below:
      =======================================================================
      UBSAN: array-index-out-of-bounds in drivers/net/hamradio/6pack.c:925:31
      index 400 is out of range for type 'unsigned char [400]'
      CPU: 3 PID: 7433 Comm: kworker/u10:1 Not tainted 5.18.0-rc5-00163-g4b97bac0 #2
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
      Workqueue: events_unbound flush_to_ldisc
      Call Trace:
       <TASK>
       dump_stack_lvl+0xcd/0x134
       ubsan_epilogue+0xb/0x50
       __ubsan_handle_out_of_bounds.cold+0x62/0x6c
       sixpack_receive_buf+0xfda/0x1330
       tty_ldisc_receive_buf+0x13e/0x180
       tty_port_default_receive_buf+0x6d/0xa0
       flush_to_ldisc+0x213/0x3f0
       process_one_work+0x98f/0x1620
       worker_thread+0x665/0x1080
       kthread+0x2e9/0x3a0
       ret_from_fork+0x1f/0x30
       ...
      Reported-by: Hulk Robot <hulkci@huawei.com>
      Signed-off-by: Xu Jia <xujia39@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tipc: fix use-after-free Read in tipc_named_reinit · 911600bf
      Hoang Le authored
      syzbot found the following issue on:
      ==================================================================
      BUG: KASAN: use-after-free in tipc_named_reinit+0x94f/0x9b0
      net/tipc/name_distr.c:413
      Read of size 8 at addr ffff88805299a000 by task kworker/1:9/23764
      
      CPU: 1 PID: 23764 Comm: kworker/1:9 Not tainted
      5.18.0-rc4-syzkaller-00878-g17d49e6e #0
      Hardware name: Google Compute Engine/Google Compute Engine,
      BIOS Google 01/01/2011
      Workqueue: events tipc_net_finalize_work
      Call Trace:
       <TASK>
       __dump_stack lib/dump_stack.c:88 [inline]
       dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
       print_address_description.constprop.0.cold+0xeb/0x495
      mm/kasan/report.c:313
       print_report mm/kasan/report.c:429 [inline]
       kasan_report.cold+0xf4/0x1c6 mm/kasan/report.c:491
       tipc_named_reinit+0x94f/0x9b0 net/tipc/name_distr.c:413
       tipc_net_finalize+0x234/0x3d0 net/tipc/net.c:138
       process_one_work+0x996/0x1610 kernel/workqueue.c:2289
       worker_thread+0x665/0x1080 kernel/workqueue.c:2436
       kthread+0x2e9/0x3a0 kernel/kthread.c:376
       ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:298
       </TASK>
      [...]
      ==================================================================
      
      In commit
      d966ddcc ("tipc: fix a deadlock when flushing scheduled work"),
      cancel_work_sync() is used to make sure that ONLY the work
      tipc_net_finalize_work() executing or pending on any CPU has completed
      before the tipc namespace is destroyed through tipc_exit_net(). However,
      this does not guarantee that the work is the last one queued, so the
      destroyed instance may be accessed by work that is enqueued later.

      To fix this completely, reorder the call to cancel_work_sync() so that
      tipc_net_finalize_work() is the last work queued, and it is then
      guaranteed to complete by cancel_work_sync().
      
      Reported-by: syzbot+47af19f3307fc9c5c82e@syzkaller.appspotmail.com
      Fixes: d966ddcc ("tipc: fix a deadlock when flushing scheduled work")
      Acked-by: Jon Maloy <jmaloy@redhat.com>
      Signed-off-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: Hoang Le <hoang.h.le@dektech.com.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • veth: Add updating of trans_start · e66e257a
      Jay Vosburgh authored
      Since commit 21a75f09 ("bonding: Fix ARP monitor validation"),
      the bonding ARP / ND link monitors depend on the trans_start time to
      determine link availability.  NETIF_F_LLTX drivers must update trans_start
      directly, which veth does not do.  This prevents use of the ARP or ND link
      monitors with veth interfaces in a bond.
      
      	Resolve this by having veth_xmit update the trans_start time.
      Reported-by: Jonathan Toppins <jtoppins@redhat.com>
      Tested-by: Jonathan Toppins <jtoppins@redhat.com>
      Signed-off-by: Jay Vosburgh <jay.vosburgh@canonical.com>
      Fixes: 21a75f09 ("bonding: Fix ARP monitor validation")
      Link: https://lore.kernel.org/netdev/b2fd4147-8f50-bebd-963a-1a3e8d1d9715@redhat.com/
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: fix data-race in dev_isalive() · cc26c266
      Eric Dumazet authored
      dev_isalive() is called under RTNL or dev_base_lock protection.
      
      This means that changes to dev->reg_state should be done with both locks held.
      
      syzbot reported:
      
      BUG: KCSAN: data-race in register_netdevice / type_show
      
      write to 0xffff888144ecf518 of 1 bytes by task 20886 on cpu 0:
      register_netdevice+0xb9f/0xdf0 net/core/dev.c:10050
      lapbeth_new_device drivers/net/wan/lapbether.c:414 [inline]
      lapbeth_device_event+0x4a0/0x6c0 drivers/net/wan/lapbether.c:456
      notifier_call_chain kernel/notifier.c:87 [inline]
      raw_notifier_call_chain+0x53/0xb0 kernel/notifier.c:455
      __dev_notify_flags+0x1d6/0x3a0
      dev_change_flags+0xa2/0xc0 net/core/dev.c:8607
      do_setlink+0x778/0x2230 net/core/rtnetlink.c:2780
      __rtnl_newlink net/core/rtnetlink.c:3546 [inline]
      rtnl_newlink+0x114c/0x16a0 net/core/rtnetlink.c:3593
      rtnetlink_rcv_msg+0x811/0x8c0 net/core/rtnetlink.c:6089
      netlink_rcv_skb+0x13e/0x240 net/netlink/af_netlink.c:2501
      rtnetlink_rcv+0x18/0x20 net/core/rtnetlink.c:6107
      netlink_unicast_kernel net/netlink/af_netlink.c:1319 [inline]
      netlink_unicast+0x58a/0x660 net/netlink/af_netlink.c:1345
      netlink_sendmsg+0x661/0x750 net/netlink/af_netlink.c:1921
      sock_sendmsg_nosec net/socket.c:714 [inline]
      sock_sendmsg net/socket.c:734 [inline]
      __sys_sendto+0x21e/0x2c0 net/socket.c:2119
      __do_sys_sendto net/socket.c:2131 [inline]
      __se_sys_sendto net/socket.c:2127 [inline]
      __x64_sys_sendto+0x74/0x90 net/socket.c:2127
      do_syscall_x64 arch/x86/entry/common.c:50 [inline]
      do_syscall_64+0x2b/0x70 arch/x86/entry/common.c:80
      entry_SYSCALL_64_after_hwframe+0x46/0xb0
      
      read to 0xffff888144ecf518 of 1 bytes by task 20423 on cpu 1:
      dev_isalive net/core/net-sysfs.c:38 [inline]
      netdev_show net/core/net-sysfs.c:50 [inline]
      type_show+0x24/0x90 net/core/net-sysfs.c:112
      dev_attr_show+0x35/0x90 drivers/base/core.c:2095
      sysfs_kf_seq_show+0x175/0x240 fs/sysfs/file.c:59
      kernfs_seq_show+0x75/0x80 fs/kernfs/file.c:162
      seq_read_iter+0x2c3/0x8e0 fs/seq_file.c:230
      kernfs_fop_read_iter+0xd1/0x2f0 fs/kernfs/file.c:235
      call_read_iter include/linux/fs.h:2052 [inline]
      new_sync_read fs/read_write.c:401 [inline]
      vfs_read+0x5a5/0x6a0 fs/read_write.c:482
      ksys_read+0xe8/0x1a0 fs/read_write.c:620
      __do_sys_read fs/read_write.c:630 [inline]
      __se_sys_read fs/read_write.c:628 [inline]
      __x64_sys_read+0x3e/0x50 fs/read_write.c:628
      do_syscall_x64 arch/x86/entry/common.c:50 [inline]
      do_syscall_64+0x2b/0x70 arch/x86/entry/common.c:80
      entry_SYSCALL_64_after_hwframe+0x46/0xb0
      
      value changed: 0x00 -> 0x01
      
      Reported by Kernel Concurrency Sanitizer on:
      CPU: 1 PID: 20423 Comm: udevd Tainted: G W 5.19.0-rc2-syzkaller-dirty #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • phy: aquantia: Fix AN when higher speeds than 1G are not advertised · 9b7fd167
      Claudiu Manoil authored
      Even when the eth port is restricted to speeds no higher than 1G, and
      the eth driver therefore requests the phy (via phylink) to advertise up
      to 1000BASET support, the aquantia phy device still advertises 2.5G
      and 5G speeds.
      Clear these advertising defaults when requested.
      
      Cc: Ondrej Spacek <ondrej.spacek@nxp.com>
      Fixes: 09c4c57f ("net: phy: aquantia: add support for auto-negotiation configuration")
      Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com>
      Link: https://lore.kernel.org/r/20220610084037.7625-1-claudiu.manoil@nxp.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • Merge branch 'bpf: Fix cookie values for kprobe multi' · a4a8b2ee
      Alexei Starovoitov authored
      Jiri Olsa says:
      
      ====================
      
      hi,
      there's a bug in the kprobe_multi link that makes cookies misplaced when
      using symbols to attach. The reason is that we sort symbols by name
      but not the corresponding cookie values. The current test did not find
      it because bpf_fentry_test* are already sorted by name.
      
      v3 changes:
        - fixed kprobe_multi bench test to filter out invalid entries
          from available_filter_functions
      
      v2 changes:
        - rebased on top of bpf/master
        - checking if cookies are defined later in swap function [Andrii]
        - added acks
      
      thanks,
      jirka
      ====================
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • selftest/bpf: Fix kprobe_multi bench test · 73006702
      Jiri Olsa authored
      With [1] the available_filter_functions file contains records
      starting with __ftrace_invalid_address___ and marking disabled
      entries.
      
      We need to filter them out for the bench test to pass only
      resolvable symbols to kernel.
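
      The filtering step can be illustrated with a small sketch. The prefix
      is the one quoted above; is_resolvable_entry() and filter_entries() are
      hypothetical helper names, and the selftest itself may be structured
      differently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Prefix that marks disabled entries, as quoted in the commit message. */
#define INVALID_PREFIX "__ftrace_invalid_address___"

/* True when the entry does not start with the invalid-address prefix. */
static bool is_resolvable_entry(const char *name)
{
	return strncmp(name, INVALID_PREFIX, strlen(INVALID_PREFIX)) != 0;
}

/* Compact names[] in place, keeping resolvable entries; returns the new count. */
static size_t filter_entries(const char **names, size_t n)
{
	size_t in, out = 0;

	assert(names != NULL || n == 0);

	for (in = 0; in < n; in++)
		if (is_resolvable_entry(names[in]))
			names[out++] = names[in];
	return out;
}
```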
      
      [1] commit b39181f7 ("ftrace: Add FTRACE_MCOUNT_MAX_OFFSET to avoid adding weak function")
      
      Fixes: b39181f7 ("ftrace: Add FTRACE_MCOUNT_MAX_OFFSET to avoid adding weak function")
      Signed-off-by: Jiri Olsa <jolsa@kernel.org>
      Link: https://lore.kernel.org/r/20220615112118.497303-5-jolsa@kernel.org
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • bpf: Force cookies array to follow symbols sorting · eb5fb032
      Jiri Olsa authored
      When a user specifies symbols and cookies for the kprobe_multi link
      interface, it's very likely the cookies will be misplaced and
      returned to the wrong functions (via the get_attach_cookie helper).

      The reason is that, to resolve the provided functions, we sort
      them before passing them to ftrace_lookup_symbols, but we do
      not apply the same sort to the cookie values.

      Fix this by using the sort_r function with a custom swap callback
      that swaps the cookie values as well.
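
      The technique of keeping a parallel array in step with a sort can be
      sketched in plain C. This is an assumption-laden illustration: a
      simple insertion sort stands in for the kernel's sort_r(), and
      struct multi_args, swap_sym_and_cookie() and sort_syms_with_cookies()
      are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical pairing of the two parallel arrays supplied by a caller. */
struct multi_args {
	const char **syms;
	unsigned long long *cookies;	/* may be NULL when no cookies are given */
};

/* Swap two symbols and, if cookies exist, the cookies at the same indexes. */
static void swap_sym_and_cookie(struct multi_args *a, size_t i, size_t j)
{
	const char *sym = a->syms[i];

	a->syms[i] = a->syms[j];
	a->syms[j] = sym;

	if (a->cookies) {
		unsigned long long cookie = a->cookies[i];

		a->cookies[i] = a->cookies[j];
		a->cookies[j] = cookie;
	}
}

/* Insertion sort by symbol name; every swap goes through the callback above. */
static void sort_syms_with_cookies(struct multi_args *a, size_t n)
{
	size_t i, j;

	assert(a != NULL && a->syms != NULL);

	for (i = 1; i < n; i++)
		for (j = i; j > 0 && strcmp(a->syms[j - 1], a->syms[j]) > 0; j--)
			swap_sym_and_cookie(a, j - 1, j);
}
```

      Because the cookie swap is checked inside the swap routine, the same
      sort works whether or not the caller supplied cookies.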
      
      Fixes: 0236fec5 ("bpf: Resolve symbols with ftrace_lookup_symbols for kprobe multi link")
      Signed-off-by: Jiri Olsa <jolsa@kernel.org>
      Link: https://lore.kernel.org/r/20220615112118.497303-4-jolsa@kernel.org
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • ftrace: Keep address offset in ftrace_lookup_symbols · eb1b2985
      Jiri Olsa authored
      We want to store the resolved address at the same index as
      the symbol string, because that is what the user code
      (bpf kprobe link) assumes.

      Also make sure we don't store duplicates that might be
      present in kallsyms.
      Acked-by: Song Liu <songliubraving@fb.com>
      Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org>
      Fixes: bed0d9a5 ("ftrace: Add ftrace_lookup_symbols function")
      Signed-off-by: Jiri Olsa <jolsa@kernel.org>
      Link: https://lore.kernel.org/r/20220615112118.497303-3-jolsa@kernel.org
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • selftests/bpf: Shuffle cookies symbols in kprobe multi test · ad884853
      Jiri Olsa authored
      There's a kernel bug that causes cookies to be misplaced and
      the reason we did not catch this with this test is that we
      provide bpf_fentry_test* functions already sorted by name.
      
      Shuffling function bpf_fentry_test2 deeper in the list and
      keeping the current cookie values as before will trigger
      the bug.
      
      The kernel fix is coming in following changes.
      Acked-by: Song Liu <songliubraving@fb.com>
      Signed-off-by: Jiri Olsa <jolsa@kernel.org>
      Link: https://lore.kernel.org/r/20220615112118.497303-2-jolsa@kernel.org
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
  3. 16 Jun, 2022 5 commits
  4. 15 Jun, 2022 13 commits
  5. 14 Jun, 2022 7 commits
    • netfs: fix up netfs_inode_init() docbook comment · 018ab4fa
      Linus Torvalds authored
      Commit e81fb419 ("netfs: Further cleanups after struct netfs_inode
      wrapper introduced") changed the argument types and names, and actually
      updated the comment too (although that was thanks to David Howells, not
      me: my original patch only changed the code).
      
      But the comment fixup didn't go quite far enough, and didn't change the
      argument name in the comment, resulting in
      
        include/linux/netfs.h:314: warning: Function parameter or member 'ctx' not described in 'netfs_inode_init'
        include/linux/netfs.h:314: warning: Excess function parameter 'inode' description in 'netfs_inode_init'
      
      during htmldoc generation.
      
      Fixes: e81fb419 ("netfs: Further cleanups after struct netfs_inode wrapper introduced")
      Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • ice: Fix memory corruption in VF driver · efe41860
      Przemyslaw Patynowski authored
      Disable the VF's RX/TX queues when the VF is disabled. A VF can have
      queues enabled when it requests a reset. If the PF driver assumes that
      the VF is disabled while the VF still has queues configured, the VF may
      unmap DMA resources. In that scenario the device can still map packets
      to memory, which ends up silently corrupting it.
      Previously, the VF driver could experience memory corruption, which led
      to a crash:
      [ 5119.170157] BUG: unable to handle kernel paging request at 00001b9780003237
      [ 5119.170166] PGD 0 P4D 0
      [ 5119.170173] Oops: 0002 [#1] PREEMPT_RT SMP PTI
      [ 5119.170181] CPU: 30 PID: 427592 Comm: kworker/u96:2 Kdump: loaded Tainted: G        W I      --------- -  - 4.18.0-372.9.1.rt7.166.el8.x86_64 #1
      [ 5119.170189] Hardware name: Dell Inc. PowerEdge R740/014X06, BIOS 2.3.10 08/15/2019
      [ 5119.170193] Workqueue: iavf iavf_adminq_task [iavf]
      [ 5119.170219] RIP: 0010:__page_frag_cache_drain+0x5/0x30
      [ 5119.170238] Code: 0f 0f b6 77 51 85 f6 74 07 31 d2 e9 05 df ff ff e9 90 fe ff ff 48 8b 05 49 db 33 01 eb b4 0f 1f 80 00 00 00 00 0f 1f 44 00 00 <f0> 29 77 34 74 01 c3 48 8b 07 f6 c4 80 74 0f 0f b6 77 51 85 f6 74
      [ 5119.170244] RSP: 0018:ffffa43b0bdcfd78 EFLAGS: 00010282
      [ 5119.170250] RAX: ffffffff896b3e40 RBX: ffff8fb282524000 RCX: 0000000000000002
      [ 5119.170254] RDX: 0000000049000000 RSI: 0000000000000000 RDI: 00001b9780003203
      [ 5119.170259] RBP: ffff8fb248217b00 R08: 0000000000000022 R09: 0000000000000009
      [ 5119.170262] R10: 2b849d6300000000 R11: 0000000000000020 R12: 0000000000000000
      [ 5119.170265] R13: 0000000000001000 R14: 0000000000000009 R15: 0000000000000000
      [ 5119.170269] FS:  0000000000000000(0000) GS:ffff8fb1201c0000(0000) knlGS:0000000000000000
      [ 5119.170274] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [ 5119.170279] CR2: 00001b9780003237 CR3: 00000008f3e1a003 CR4: 00000000007726e0
      [ 5119.170283] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [ 5119.170286] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      [ 5119.170290] PKRU: 55555554
      [ 5119.170292] Call Trace:
      [ 5119.170298]  iavf_clean_rx_ring+0xad/0x110 [iavf]
      [ 5119.170324]  iavf_free_rx_resources+0xe/0x50 [iavf]
      [ 5119.170342]  iavf_free_all_rx_resources.part.51+0x30/0x40 [iavf]
      [ 5119.170358]  iavf_virtchnl_completion+0xd8a/0x15b0 [iavf]
      [ 5119.170377]  ? iavf_clean_arq_element+0x210/0x280 [iavf]
      [ 5119.170397]  iavf_adminq_task+0x126/0x2e0 [iavf]
      [ 5119.170416]  process_one_work+0x18f/0x420
      [ 5119.170429]  worker_thread+0x30/0x370
      [ 5119.170437]  ? process_one_work+0x420/0x420
      [ 5119.170445]  kthread+0x151/0x170
      [ 5119.170452]  ? set_kthread_struct+0x40/0x40
      [ 5119.170460]  ret_from_fork+0x35/0x40
      [ 5119.170477] Modules linked in: iavf sctp ip6_udp_tunnel udp_tunnel mlx4_en mlx4_core nfp tls vhost_net vhost vhost_iotlb tap tun xt_CHECKSUM ipt_MASQUERADE xt_conntrack ipt_REJECT nf_reject_ipv4 nft_compat nft_counter nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_tables nfnetlink bridge stp llc rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache sunrpc intel_rapl_msr iTCO_wdt iTCO_vendor_support dell_smbios wmi_bmof dell_wmi_descriptor dcdbas kvm_intel kvm irqbypass intel_rapl_common isst_if_common skx_edac irdma nfit libnvdimm x86_pkg_temp_thermal i40e intel_powerclamp coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel ib_uverbs rapl ipmi_ssif intel_cstate intel_uncore mei_me pcspkr acpi_ipmi ib_core mei lpc_ich i2c_i801 ipmi_si ipmi_devintf wmi ipmi_msghandler acpi_power_meter xfs libcrc32c sd_mod t10_pi sg mgag200 drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ice ahci drm libahci crc32c_intel libata tg3 megaraid_sas
      [ 5119.170613]  i2c_algo_bit dm_mirror dm_region_hash dm_log dm_mod fuse [last unloaded: iavf]
      [ 5119.170627] CR2: 00001b9780003237
      
      Fixes: ec4f5a43 ("ice: Check if VF is disabled for Opcode and other operations")
      Signed-off-by: Przemyslaw Patynowski <przemyslawx.patynowski@intel.com>
      Co-developed-by: Slawomir Laba <slawomirx.laba@intel.com>
      Signed-off-by: Slawomir Laba <slawomirx.laba@intel.com>
      Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com>
      Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
    • ice: Fix queue config fail handling · be2af714
      Przemyslaw Patynowski authored
      Disable the VF's RX/TX queues when VIRTCHNL_OP_CONFIG_VSI_QUEUES fails.
      Not disabling them might lead to a scenario where the PF driver leaves
      VF queues enabled even though the VF's VSI failed queue config.
      In this scenario the VF should not have RX/TX queues enabled. If the PF
      failed to set up the VF's queues, the VF will reset due to TX timeouts
      in the VF driver.
      Initialize iterator 'i' to -1, so that if an error happens prior to
      configuring queues, the error path code will not disable queue 0. The
      loop that configures queues uses the same iterator, so the error path
      code will only disable queues that were configured.
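
      The iterator convention described above can be shown with a small
      mock. Everything here is hypothetical (struct vf_queues,
      configure_queues(), and the precheck_ok / fail_at parameters used to
      simulate failures); it only demonstrates why starting at -1 keeps the
      error path from touching queue 0.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_QUEUES 8

/* Hypothetical queue state; the real driver programs hardware instead. */
struct vf_queues {
	bool enabled[MAX_QUEUES];
};

/*
 * Configure queues 0..nr-1.
 * precheck_ok == false simulates an error before the loop starts.
 * fail_at in 0..nr-1 simulates a failure on that queue; -1 means none.
 */
static int configure_queues(struct vf_queues *q, int nr, bool precheck_ok,
			    int fail_at)
{
	int i = -1;	/* no queue has been configured yet */

	assert(q != NULL);

	if (!precheck_ok || nr < 0 || nr > MAX_QUEUES)
		goto error;

	for (i = 0; i < nr; i++) {
		if (i == fail_at)
			goto error;
		q->enabled[i] = true;
	}
	return 0;

error:
	/* Walk back from the failing index; with i == -1 the body never runs. */
	for (; i >= 0; i--)
		q->enabled[i] = false;
	return -1;
}
```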
      
      Fixes: 77ca27c4 ("ice: add support for virtchnl_queue_select.[tx|rx]_queues bitmap")
      Suggested-by: Slawomir Laba <slawomirx.laba@intel.com>
      Signed-off-by: Przemyslaw Patynowski <przemyslawx.patynowski@intel.com>
      Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com>
      Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
    • ice: Sync VLAN filtering features for DVM · 9542ef4f
      Roman Storozhenko authored
      VLAN filtering features, that is C-Tag and S-Tag, must in DVM mode
      both be enabled or both be disabled.
      If only one of the features is turned off/on, the other feature must
      be turned off/on automatically, and an appropriate message is issued
      to the kernel log.
      
      Fixes: 1babaf77 ("ice: Advertise 802.1ad VLAN filtering and offloads for PF netdev")
      Signed-off-by: Roman Storozhenko <roman.storozhenko@intel.com>
      Co-developed-by: Anatolii Gerasymenko <anatolii.gerasymenko@intel.com>
      Signed-off-by: Anatolii Gerasymenko <anatolii.gerasymenko@intel.com>
      Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel)
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
    • ice: Fix PTP TX timestamp offset calculation · 71a579f0
      Michal Michalik authored
      The offset was being incorrectly calculated for E822 - that led to
      collisions in choosing TX timestamp register location when more than
      one port was trying to use timestamping mechanism.
      
      In E822, each quad is logically split between ports: quad 0 holds the
      trackers for ports 0-3, quad 1 those for ports 4-7, etc. Each port should
      have a separate memory location for tracking timestamps. Because of the
      error, ports sharing a quad (ports 1 and 2 in quad 0, for example) were
      assigned the same offset (0), whereas neighbouring ports should get
      distinct offsets such as 0 and 16.
      
      Fix it by correctly calculating quad offset.
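
      As a rough illustration of a per-port offset within a quad, consider
      the sketch below. It assumes zero-based port numbers, four ports per
      quad and a stride of 16 entries per port; both constants are inferred
      from the description above rather than taken from the driver, and
      port_quad() / port_quad_offset() are hypothetical names.

```c
/* Both constants are inferred from the commit text, not from the driver. */
#define PORTS_PER_QUAD	4	/* quad 0 tracks ports 0-3, quad 1 ports 4-7 */
#define INDEX_PER_PORT	16	/* neighbouring ports are 16 entries apart */

/* Quad that holds the trackers for a given zero-based port. */
static unsigned int port_quad(unsigned int port)
{
	return port / PORTS_PER_QUAD;
}

/* Offset of the port's slice inside its quad; distinct for each port. */
static unsigned int port_quad_offset(unsigned int port)
{
	return (port % PORTS_PER_QUAD) * INDEX_PER_PORT;
}
```

      Deriving the offset from the port's position inside the quad, rather
      than from the quad index, is what gives ports in the same quad
      non-overlapping slices in this sketch.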
      
      Fixes: 3a749623 ("ice: implement basic E822 PTP support")
      Signed-off-by: Michal Michalik <michal.michalik@intel.com>
      Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel)
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
    • Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm · 24625f7d
      Linus Torvalds authored
      Pull kvm fixes from Paolo Bonzini:
       "While last week's pull request contained miscellaneous fixes for x86,
        this one covers other architectures, selftests changes, and a bigger
        series for APIC virtualization bugs that were discovered during 5.20
        development. The idea is to base 5.20 development for KVM on top of
        this tag.
      
        ARM64:
      
         - Properly reset the SVE/SME flags on vcpu load
      
         - Fix a vgic-v2 regression regarding accessing the pending state of a
           HW interrupt from userspace (and make the code common with vgic-v3)
      
         - Fix access to the idreg range for protected guests
      
         - Ignore 'kvm-arm.mode=protected' when using VHE
      
         - Return an error from kvm_arch_init_vm() on allocation failure
      
         - A bunch of small cleanups (comments, annotations, indentation)
      
        RISC-V:
      
         - Typo fix in arch/riscv/kvm/vmid.c
      
         - Remove broken reference pattern from MAINTAINERS entry
      
        x86-64:
      
         - Fix error in page tables with MKTME enabled
      
         - Dirty page tracking performance test extended to running a nested
           guest
      
         - Disable APICv/AVIC in cases that it cannot implement correctly"
      
      [ This merge also fixes a misplaced end parenthesis bug introduced in
        commit 3743c2f0 ("KVM: x86: inhibit APICv/AVIC on changes to APIC
        ID or APIC base") pointed out by Sean Christopherson ]
      
      Link: https://lore.kernel.org/all/20220610191813.371682-1-seanjc@google.com/
      
      * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (34 commits)
        KVM: selftests: Restrict test region to 48-bit physical addresses when using nested
        KVM: selftests: Add option to run dirty_log_perf_test vCPUs in L2
        KVM: selftests: Clean up LIBKVM files in Makefile
        KVM: selftests: Link selftests directly with lib object files
        KVM: selftests: Drop unnecessary rule for STATIC_LIBS
        KVM: selftests: Add a helper to check EPT/VPID capabilities
        KVM: selftests: Move VMX_EPT_VPID_CAP_AD_BITS to vmx.h
        KVM: selftests: Refactor nested_map() to specify target level
        KVM: selftests: Drop stale function parameter comment for nested_map()
        KVM: selftests: Add option to create 2M and 1G EPT mappings
        KVM: selftests: Replace x86_page_size with PG_LEVEL_XX
        KVM: x86: SVM: fix nested PAUSE filtering when L0 intercepts PAUSE
        KVM: x86: SVM: drop preempt-safe wrappers for avic_vcpu_load/put
        KVM: x86: disable preemption around the call to kvm_arch_vcpu_{un|}blocking
        KVM: x86: disable preemption while updating apicv inhibition
        KVM: x86: SVM: fix avic_kick_target_vcpus_fast
        KVM: x86: SVM: remove avic's broken code that updated APIC ID
        KVM: x86: inhibit APICv/AVIC on changes to APIC ID or APIC base
        KVM: x86: document AVIC/APICv inhibit reasons
        KVM: x86/mmu: Set memory encryption "value", not "mask", in shadow PDPTRs
        ...
    • xsk: Fix generic transmit when completion queue reservation fails · a6e944f2
      Ciara Loftus authored
      Two points of potential failure in the generic transmit function are:
      
        1. completion queue (cq) reservation failure.
        2. skb allocation failure
      
      Originally the cq reservation was performed first, followed by the skb
      allocation. Commit 67571640 ("xdp: fix possible cq entry leak")
      reversed the order because at the time there was no mechanism available
      to undo the cq reservation which could have led to possible cq entry leaks
      in the event of skb allocation failure. However if the skb allocation is
      performed first and the cq reservation then fails, the xsk skb destructor
      is called which blindly adds the skb address to the already full cq leading
      to undefined behavior.
      
      This commit restores the original order (cq reservation followed by skb
      allocation) and uses the xskq_prod_cancel helper to undo the cq reserve
      in event of skb allocation failure.
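
      The reserve-then-allocate order with rollback can be sketched with a
      mock queue. This is not the AF_XDP code: struct mock_cq, cq_reserve(),
      cq_cancel() and xmit_prepare() are hypothetical, cq_cancel() merely
      plays the role that xskq_prod_cancel has in the description above, and
      the alloc_ok flag exists only to simulate an allocation failure.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical completion queue that only counts reserved slots. */
struct mock_cq {
	unsigned int reserved;
	unsigned int capacity;
};

/* Reserve one completion slot; fails when the queue is already full. */
static int cq_reserve(struct mock_cq *cq)
{
	if (cq->reserved == cq->capacity)
		return -1;
	cq->reserved++;
	return 0;
}

/* Undo one reservation made by cq_reserve(). */
static void cq_cancel(struct mock_cq *cq)
{
	assert(cq->reserved > 0);
	cq->reserved--;
}

/*
 * Reserve the completion slot first, then allocate the buffer. If the
 * allocation fails, roll the reservation back so no slot leaks.
 */
static void *xmit_prepare(struct mock_cq *cq, size_t len, bool alloc_ok)
{
	void *buf;

	if (cq_reserve(cq) < 0)
		return NULL;	/* nothing allocated, nothing to undo */

	buf = alloc_ok ? malloc(len) : NULL;
	if (!buf) {
		cq_cancel(cq);
		return NULL;
	}
	return buf;
}
```

      With this order a full queue is detected before any buffer exists, so
      no destructor can run against a queue that has no room.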
      
      Fixes: 67571640 ("xdp: fix possible cq entry leak")
      Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Magnus Karlsson <magnus.karlsson@intel.com>
      Link: https://lore.kernel.org/bpf/20220614070746.8871-1-ciara.loftus@intel.com