1. 10 Mar, 2021 20 commits
    • macvlan: macvlan_count_rx() needs to be aware of preemption · dd4fa1da
      Eric Dumazet authored
      macvlan_count_rx() can be called from process context, so preemption
      must be disabled before calling u64_stats_update_begin().
      
      syzbot was able to spot this on 32bit arch:
      
      WARNING: CPU: 1 PID: 4632 at include/linux/seqlock.h:271 __seqprop_assert include/linux/seqlock.h:271 [inline]
      WARNING: CPU: 1 PID: 4632 at include/linux/seqlock.h:271 __seqprop_assert.constprop.0+0xf0/0x11c include/linux/seqlock.h:269
      Modules linked in:
      Kernel panic - not syncing: panic_on_warn set ...
      CPU: 1 PID: 4632 Comm: kworker/1:3 Not tainted 5.12.0-rc2-syzkaller #0
      Hardware name: ARM-Versatile Express
      Workqueue: events macvlan_process_broadcast
      Backtrace:
      [<82740468>] (dump_backtrace) from [<827406dc>] (show_stack+0x18/0x1c arch/arm/kernel/traps.c:252)
       r7:00000080 r6:60000093 r5:00000000 r4:8422a3c4
      [<827406c4>] (show_stack) from [<82751b58>] (__dump_stack lib/dump_stack.c:79 [inline])
      [<827406c4>] (show_stack) from [<82751b58>] (dump_stack+0xb8/0xe8 lib/dump_stack.c:120)
      [<82751aa0>] (dump_stack) from [<82741270>] (panic+0x130/0x378 kernel/panic.c:231)
       r7:830209b4 r6:84069ea4 r5:00000000 r4:844350d0
      [<82741140>] (panic) from [<80244924>] (__warn+0xb0/0x164 kernel/panic.c:605)
       r3:8404ec8c r2:00000000 r1:00000000 r0:830209b4
       r7:0000010f
      [<80244874>] (__warn) from [<82741520>] (warn_slowpath_fmt+0x68/0xd4 kernel/panic.c:628)
       r7:81363f70 r6:0000010f r5:83018e50 r4:00000000
      [<827414bc>] (warn_slowpath_fmt) from [<81363f70>] (__seqprop_assert include/linux/seqlock.h:271 [inline])
      [<827414bc>] (warn_slowpath_fmt) from [<81363f70>] (__seqprop_assert.constprop.0+0xf0/0x11c include/linux/seqlock.h:269)
       r8:5a109000 r7:0000000f r6:a568dac0 r5:89802300 r4:00000001
      [<81363e80>] (__seqprop_assert.constprop.0) from [<81364af0>] (u64_stats_update_begin include/linux/u64_stats_sync.h:128 [inline])
      [<81363e80>] (__seqprop_assert.constprop.0) from [<81364af0>] (macvlan_count_rx include/linux/if_macvlan.h:47 [inline])
      [<81363e80>] (__seqprop_assert.constprop.0) from [<81364af0>] (macvlan_broadcast+0x154/0x26c drivers/net/macvlan.c:291)
       r5:89802300 r4:8a927740
      [<8136499c>] (macvlan_broadcast) from [<81365020>] (macvlan_process_broadcast+0x258/0x2d0 drivers/net/macvlan.c:317)
       r10:81364f78 r9:8a86d000 r8:8a9c7e7c r7:8413aa5c r6:00000000 r5:00000000
       r4:89802840
      [<81364dc8>] (macvlan_process_broadcast) from [<802696a4>] (process_one_work+0x2d4/0x998 kernel/workqueue.c:2275)
       r10:00000008 r9:8404ec98 r8:84367a02 r7:ddfe6400 r6:ddfe2d40 r5:898dac80
       r4:8a86d43c
      [<802693d0>] (process_one_work) from [<80269dcc>] (worker_thread+0x64/0x54c kernel/workqueue.c:2421)
       r10:00000008 r9:8a9c6000 r8:84006d00 r7:ddfe2d78 r6:898dac94 r5:ddfe2d40
       r4:898dac80
      [<80269d68>] (worker_thread) from [<80271f40>] (kthread+0x184/0x1a4 kernel/kthread.c:292)
       r10:85247e64 r9:898dac80 r8:80269d68 r7:00000000 r6:8a9c6000 r5:89a2ee40
       r4:8a97bd00
      [<80271dbc>] (kthread) from [<80200114>] (ret_from_fork+0x14/0x20 arch/arm/kernel/entry-common.S:158)
      Exception stack(0x8a9c7fb0 to 0x8a9c7ff8)
      
      Fixes: 412ca155 ("macvlan: Move broadcasts into a work queue")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Herbert Xu <herbert@gondor.apana.org.au>
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • drop_monitor: Perform cleanup upon probe registration failure · 9398e9c0
      Ido Schimmel authored
      In the rare case that drop_monitor fails to register its probe on the
      'napi_poll' tracepoint, it will not deactivate its hysteresis timer as
      part of the error path. If the hysteresis timer was armed by the
      short-lived 'kfree_skb' probe and user space retries tracing, a
      warning will be emitted for trying to initialize an active object [1].
      
      Fix this by properly undoing all the operations that were done prior to
      probe registration, in both software and hardware code paths.
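The unwinding described above can be sketched in user-space C as a goto-based error path. This is only an illustration under stated assumptions: `struct trace_state`, `register_probe()` and `trace_on()` are hypothetical names, not code from net/core/drop_monitor.c, and the boolean flags merely stand in for the real timer and tracepoint probes.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the real resources. */
struct trace_state {
	bool timer_armed;      /* stands in for the hysteresis timer */
	bool kfree_skb_probe;  /* first probe */
	bool napi_poll_probe;  /* second probe */
};

/* Pretend registration helper: fails when should_fail is set, which
 * mimics an injected allocation failure. */
static int register_probe(bool *probe, bool should_fail)
{
	if (should_fail)
		return -ENOMEM;
	*probe = true;
	return 0;
}

/* Acquire resources in order; on failure, release them in reverse. */
static int trace_on(struct trace_state *st, bool fail_second_probe)
{
	int rc;

	st->timer_armed = true;                       /* step 1 */

	rc = register_probe(&st->kfree_skb_probe, false);        /* step 2 */
	if (rc)
		goto err_timer;

	rc = register_probe(&st->napi_poll_probe, fail_second_probe); /* step 3 */
	if (rc)
		goto err_unregister_first;

	return 0;

err_unregister_first:
	st->kfree_skb_probe = false;                  /* undo step 2 */
err_timer:
	st->timer_armed = false;                      /* undo step 1 */
	return rc;
}
```

With this shape, a failed second registration leaves no armed timer behind, so a later retry starts from a clean state.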
      
      Note that syzkaller managed to fail probe registration by injecting a
      slab allocation failure [2].
      
      [1]
      ODEBUG: init active (active state 0) object type: timer_list hint: sched_send_work+0x0/0x60 include/linux/list.h:135
      WARNING: CPU: 1 PID: 8649 at lib/debugobjects.c:505 debug_print_object+0x16e/0x250 lib/debugobjects.c:505
      Modules linked in:
      CPU: 1 PID: 8649 Comm: syz-executor.0 Not tainted 5.11.0-syzkaller #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      RIP: 0010:debug_print_object+0x16e/0x250 lib/debugobjects.c:505
      [...]
      Call Trace:
       __debug_object_init+0x524/0xd10 lib/debugobjects.c:588
       debug_timer_init kernel/time/timer.c:722 [inline]
       debug_init kernel/time/timer.c:770 [inline]
       init_timer_key+0x2d/0x340 kernel/time/timer.c:814
       net_dm_trace_on_set net/core/drop_monitor.c:1111 [inline]
       set_all_monitor_traces net/core/drop_monitor.c:1188 [inline]
       net_dm_monitor_start net/core/drop_monitor.c:1295 [inline]
       net_dm_cmd_trace+0x720/0x1220 net/core/drop_monitor.c:1339
       genl_family_rcv_msg_doit+0x228/0x320 net/netlink/genetlink.c:739
       genl_family_rcv_msg net/netlink/genetlink.c:783 [inline]
       genl_rcv_msg+0x328/0x580 net/netlink/genetlink.c:800
       netlink_rcv_skb+0x153/0x420 net/netlink/af_netlink.c:2502
       genl_rcv+0x24/0x40 net/netlink/genetlink.c:811
       netlink_unicast_kernel net/netlink/af_netlink.c:1312 [inline]
       netlink_unicast+0x533/0x7d0 net/netlink/af_netlink.c:1338
       netlink_sendmsg+0x856/0xd90 net/netlink/af_netlink.c:1927
       sock_sendmsg_nosec net/socket.c:652 [inline]
       sock_sendmsg+0xcf/0x120 net/socket.c:672
       ____sys_sendmsg+0x6e8/0x810 net/socket.c:2348
       ___sys_sendmsg+0xf3/0x170 net/socket.c:2402
       __sys_sendmsg+0xe5/0x1b0 net/socket.c:2435
       do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      [2]
       FAULT_INJECTION: forcing a failure.
       name failslab, interval 1, probability 0, space 0, times 1
       CPU: 1 PID: 8645 Comm: syz-executor.0 Not tainted 5.11.0-syzkaller #0
       Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
       Call Trace:
        dump_stack+0xfa/0x151
        should_fail.cold+0x5/0xa
        should_failslab+0x5/0x10
        __kmalloc+0x72/0x3f0
        tracepoint_add_func+0x378/0x990
        tracepoint_probe_register+0x9c/0xe0
        net_dm_cmd_trace+0x7fc/0x1220
        genl_family_rcv_msg_doit+0x228/0x320
        genl_rcv_msg+0x328/0x580
        netlink_rcv_skb+0x153/0x420
        genl_rcv+0x24/0x40
        netlink_unicast+0x533/0x7d0
        netlink_sendmsg+0x856/0xd90
        sock_sendmsg+0xcf/0x120
        ____sys_sendmsg+0x6e8/0x810
        ___sys_sendmsg+0xf3/0x170
        __sys_sendmsg+0xe5/0x1b0
        do_syscall_64+0x2d/0x70
        entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      Fixes: 70c69274 ("drop_monitor: Initialize timer and work item upon tracing enable")
      Fixes: 8ee2267a ("drop_monitor: Convert to using devlink tracepoint")
      Reported-by: syzbot+779559d6503f3a56213d@syzkaller.appspotmail.com
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Reviewed-by: Jiri Pirko <jiri@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf · 547fd083
      David S. Miller authored
      Daniel Borkmann says:
      
      ====================
      pull-request: bpf 2021-03-10
      
      The following pull-request contains BPF updates for your *net* tree.
      
      We've added 8 non-merge commits during the last 5 day(s) which contain
      a total of 11 files changed, 136 insertions(+), 17 deletions(-).
      
      The main changes are:
      
      1) Reject bogus use of vmlinux BTF as map/prog creation BTF, from Alexei Starovoitov.
      
      2) Fix allocation failure splat in x86 JIT for large progs. Also fix overwriting
         percpu cgroup storage from tracing programs when nested, from Yonghong Song.
      
      3) Fix rx queue retrieval in XDP for multi-queue veth, from Maciej Fijalkowski.
      
      4) Fix bpf_check_mtu() helper API before freeze to have mtu_len as custom skb/xdp
         L3 input length, from Jesper Dangaard Brouer.
      
      5) Fix inode_storage's lookup_elem return value upon having bad fd, from Tal Lossos.
      
      6) Fix bpftool and libbpf cross-build on MacOS, from Georgi Valkov.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ipv6: fix suspecious RCU usage warning · 28259bac
      Wei Wang authored
      Syzbot reported the suspicious RCU usage in nexthop_fib6_nh() when
      called from ipv6_route_seq_show(). The reason is ipv6_route_seq_start()
      calls rcu_read_lock_bh(), while nexthop_fib6_nh() calls
      rcu_dereference_rtnl().
      The fix proposed is to add a variant of nexthop_fib6_nh() to use
      rcu_dereference_bh_rtnl() for ipv6_route_seq_show().
      
      The reported trace is as follows:
      ./include/net/nexthop.h:416 suspicious rcu_dereference_check() usage!
      
      other info that might help us debug this:
      
      rcu_scheduler_active = 2, debug_locks = 1
      2 locks held by syz-executor.0/17895:
           at: seq_read+0x71/0x12a0 fs/seq_file.c:169
           at: seq_file_net include/linux/seq_file_net.h:19 [inline]
           at: ipv6_route_seq_start+0xaf/0x300 net/ipv6/ip6_fib.c:2616
      
      stack backtrace:
      CPU: 1 PID: 17895 Comm: syz-executor.0 Not tainted 4.15.0-syzkaller #0
      Call Trace:
       [<ffffffff849edf9e>] __dump_stack lib/dump_stack.c:17 [inline]
       [<ffffffff849edf9e>] dump_stack+0xd8/0x147 lib/dump_stack.c:53
       [<ffffffff8480b7fa>] lockdep_rcu_suspicious+0x153/0x15d kernel/locking/lockdep.c:5745
       [<ffffffff8459ada6>] nexthop_fib6_nh include/net/nexthop.h:416 [inline]
       [<ffffffff8459ada6>] ipv6_route_native_seq_show net/ipv6/ip6_fib.c:2488 [inline]
       [<ffffffff8459ada6>] ipv6_route_seq_show+0x436/0x7a0 net/ipv6/ip6_fib.c:2673
       [<ffffffff81c556df>] seq_read+0xccf/0x12a0 fs/seq_file.c:276
       [<ffffffff81dbc62c>] proc_reg_read+0x10c/0x1d0 fs/proc/inode.c:231
       [<ffffffff81bc28ae>] do_loop_readv_writev fs/read_write.c:714 [inline]
       [<ffffffff81bc28ae>] do_loop_readv_writev fs/read_write.c:701 [inline]
       [<ffffffff81bc28ae>] do_iter_read+0x49e/0x660 fs/read_write.c:935
       [<ffffffff81bc81ab>] vfs_readv+0xfb/0x170 fs/read_write.c:997
       [<ffffffff81c88847>] kernel_readv fs/splice.c:361 [inline]
       [<ffffffff81c88847>] default_file_splice_read+0x487/0x9c0 fs/splice.c:416
       [<ffffffff81c86189>] do_splice_to+0x129/0x190 fs/splice.c:879
       [<ffffffff81c86f66>] splice_direct_to_actor+0x256/0x890 fs/splice.c:951
       [<ffffffff81c8777d>] do_splice_direct+0x1dd/0x2b0 fs/splice.c:1060
       [<ffffffff81bc4747>] do_sendfile+0x597/0xce0 fs/read_write.c:1459
       [<ffffffff81bca205>] SYSC_sendfile64 fs/read_write.c:1520 [inline]
       [<ffffffff81bca205>] SyS_sendfile64+0x155/0x170 fs/read_write.c:1506
       [<ffffffff81015fcf>] do_syscall_64+0x1ff/0x310 arch/x86/entry/common.c:305
       [<ffffffff84a00076>] entry_SYSCALL_64_after_hwframe+0x42/0xb7
      
      Fixes: f88d8ea6 ("ipv6: Plumb support for nexthop object in a fib6_info")
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Signed-off-by: Wei Wang <weiwan@google.com>
      Cc: David Ahern <dsahern@kernel.org>
      Cc: Ido Schimmel <idosch@idosch.org>
      Cc: Petr Machata <petrm@nvidia.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Reviewed-by: Ido Schimmel <idosch@nvidia.com>
      Reviewed-by: David Ahern <dsahern@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'ip6ip6-crash' · c89489b4
      David S. Miller authored
      Daniel Borkmann says:
      
      ====================
      Fix ip6ip6 crash for collect_md skbs
      
      Fix a NULL pointer deref panic I ran into for regular ip6ip6 tunnel devices
      when collect_md populated skbs were redirected to them for xmit. See patches
      for further details, thanks!
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net, bpf: Fix ip6ip6 crash with collect_md populated skbs · a188bb56
      Daniel Borkmann authored
      I ran into a crash where an ip6ip6 tunnel device which was /not/ set to
      collect_md mode was receiving collect_md populated skbs for xmit.
      
      The BPF prog was populating the skb via bpf_skb_set_tunnel_key() which is
      assigning special metadata dst entry and then redirecting the skb to the
      device, taking ip6_tnl_start_xmit() -> ipxip6_tnl_xmit() -> ip6_tnl_xmit()
      and in the latter it performs a neigh lookup based on skb_dst(skb) where
      we trigger a NULL pointer dereference on dst->ops->neigh_lookup() since
      the md_dst_ops do not populate neigh_lookup callback with a fake handler.
      
      Transform the md_dst_ops into generic dst_blackhole_ops that can also be
      reused elsewhere when needed, and use them for the metadata dst entries as
      callback ops.
      
      Also, remove the dst_md_discard{,_out}() ops and rely on dst_discard{,_out}()
      from dst_init() which free the skb the same way modulo the splat. Given we
      will be able to recover just fine from there, avoid any potential splats
      iff this gets ever triggered in future (or worse, panic on warns when set).
      
      Fixes: f38a9eb1 ("dst: Metadata destinations")
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: Consolidate common blackhole dst ops · c4c877b2
      Daniel Borkmann authored
      Move generic blackhole dst ops to the core and use them from both
      ipv4_dst_blackhole_ops and ip6_dst_blackhole_ops where possible. No
      functional change otherwise. We need these also in other locations
      and having to define them over and over again is not great.
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge git://git.kernel.org:/pub/scm/linux/kernel/git/netdev/net · 05a59d79
      Linus Torvalds authored
      Pull networking fixes from David Miller:
      
       1) Fix transmissions in dynamic SMPS mode in ath9k, from Felix Fietkau.
      
       2) TX skb error handling fix in mt76 driver, also from Felix.
      
       3) Fix BPF_FETCH atomic in x86 JIT, from Brendan Jackman.
      
       4) Avoid double free of percpu pointers when freeing a cloned bpf prog.
          From Cong Wang.
      
       5) Use correct printf format for dma_addr_t in ath11k, from Geert
          Uytterhoeven.
      
       6) Fix resolve_btfids build with older toolchains, from Kun-Chuan
          Hsieh.
      
       7) Don't report truncated frames to mac80211 in mt76 driver, from
          Lorenzo Bianconi.
      
       8) Fix watchdog timeout on suspend/resume of stmmac, from Joakim Zhang.
      
       9) mscc ocelot needs NET_DEVLINK select in Kconfig, from Arnd Bergmann.
      
      10) Fix sign comparison bug in TCP_ZEROCOPY_RECEIVE getsockopt(), from
          Arjun Roy.
      
      11) Ignore routes with deleted nexthop object in mlxsw, from Ido
          Schimmel.
      
      12) Need to undo tcp early demux lookup sometimes in nf_nat, from
          Florian Westphal.
      
      13) Fix gro aggregation for udp encaps with zero csum, from Daniel
          Borkmann.
      
      14) Make sure to always use icmp*_ndo_send when necessary, from Jason A.
          Donenfeld.
      
      15) Fix TRSCER masks in sh_eth driver from Sergey Shtylyov.
      
      16) Prevent overly huge skb allocations in qrtr, from Pavel Skripkin.
      
      17) Prevent rx ring consumer index loss of sync in enetc, from Vladimir
          Oltean.
      
      18) Make sure textsearch control block is large enough, from Willem de
          Bruijn.
      
      19) Revert MAC changes to r8152 leading to instability, from Hayes Wang.
      
      20) Advance iov in 9p even for empty reads, from Jisheng Zhang.
      
      21) Double hook unregister in nftables, from Pablo Neira Ayuso.
      
      22) Fix memleak in ixgbe, from Dinghao Liu.
      
      23) Avoid dups in pkt scheduler class dumps, from Maximilian Heyne.
      
      24) Various mptcp fixes from Florian Westphal, Paolo Abeni, and Geliang
          Tang.
      
      25) Fix DOI refcount bugs in cipso, from Paul Moore.
      
      26) One too many irqsave in ibmvnic, from Junlin Yang.
      
      27) Fix infinite loop with MPLS gso segmenting via virtio_net, from
          Balazs Nemeth.
      
      * git://git.kernel.org:/pub/scm/linux/kernel/git/netdev/net: (164 commits)
        s390/qeth: fix notification for pending buffers during teardown
        s390/qeth: schedule TX NAPI on QAOB completion
        s390/qeth: improve completion of pending TX buffers
        s390/qeth: fix memory leak after failed TX Buffer allocation
        net: avoid infinite loop in mpls_gso_segment when mpls_hlen == 0
        net: check if protocol extracted by virtio_net_hdr_set_proto is correct
        net: dsa: xrs700x: check if partner is same as port in hsr join
        net: lapbether: Remove netif_start_queue / netif_stop_queue
        atm: idt77252: fix null-ptr-dereference
        atm: uPD98402: fix incorrect allocation
        atm: fix a typo in the struct description
        net: qrtr: fix error return code of qrtr_sendmsg()
        mptcp: fix length of ADD_ADDR with port sub-option
        net: bonding: fix error return code of bond_neigh_init()
        net: enetc: allow hardware timestamping on TX queues with tc-etf enabled
        net: enetc: set MAC RX FIFO to recommended value
        net: davicom: Use platform_get_irq_optional()
        net: davicom: Fix regulator not turned off on driver removal
        net: davicom: Fix regulator not turned off on failed probe
        net: dsa: fix switchdev objects on bridge master mistakenly being applied on ports
        ...
    • Merge git://git.kernel.org:/pub/scm/linux/kernel/git/davem/sparc · 6a30bedf
      Linus Torvalds authored
      Pull sparc fixes from David Miller:
       "Fix opcode filtering for exceptions, and clean up defconfig"
      
      * git://git.kernel.org:/pub/scm/linux/kernel/git/davem/sparc:
        sparc: sparc64_defconfig: remove duplicate CONFIGs
        sparc64: Fix opcode filtering in handling of no fault loads
    • sparc: sparc64_defconfig: remove duplicate CONFIGs · 69264b4a
      Corentin Labbe authored
      After my patch there is CONFIG_ATA defined twice.
      Remove the duplicate one.
      Same problem for CONFIG_HAPPYMEAL, except I added as builtin for boot
      test with NFS.
      Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
      Fixes: a57cdeb3 ("sparc: sparc64_defconfig: add necessary configs for qemu")
      Signed-off-by: Corentin Labbe <clabbe@baylibre.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • sparc64: Fix opcode filtering in handling of no fault loads · e5e8b80d
      Rob Gardner authored
      is_no_fault_exception() has two bugs which were discovered via random
      opcode testing with stress-ng. Both are caused by improper filtering
      of opcodes.
      
      The first bug can be triggered by a floating point store with a no-fault
      ASI, for instance "sta %f0, [%g0] #ASI_PNF", opcode C1A01040.
      
      The code first tests op3[5] (0x1000000), which denotes a floating
      point instruction, and then tests op3[2] (0x200000), which denotes a
      store instruction. But these bits are not mutually exclusive, and the
      above mentioned opcode has both bits set. The intent is to filter out
      stores, so the test for stores must be done first in order to have
      any effect.
      
      The second bug can be triggered by a floating point load with one of
      the invalid ASI values 0x8e or 0x8f, which pass this check in
      is_no_fault_exception():
           if ((asi & 0xf2) == ASI_PNF)
      
      An example instruction is "ldqa [%l7 + %o7] #ASI 0x8f, %f38",
      opcode CF95D1EF. Asi values greater than 0x8b (ASI_SNFL) are fatal
      in handle_ldf_stq(), and is_no_fault_exception() must not allow these
      invalid asi values to make it that far.
      
      In both of these cases, handle_ldf_stq() reacts by calling
      sun4v_data_access_exception() or spitfire_data_access_exception(),
      which call is_no_fault_exception() and result in infinite
      recursion.
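The bit arithmetic behind both bugs can be checked with a small user-space sketch. The helper names below are hypothetical and are not the kernel's; the bit masks and opcodes are the ones quoted above, ASI_PNF is taken as 0x82 (the value encoded in the first example opcode), and the immediate ASI field is assumed to sit in bits 12:5 of the instruction word as in the SPARC V9 load/store alternate-space format.

```c
#include <stdbool.h>
#include <stdint.h>

#define ASI_PNF    0x82u       /* primary no-fault ASI */
#define ASI_SNFL   0x8bu       /* highest valid no-fault ASI per the text */
#define OP3_FP     0x1000000u  /* op3[5]: floating point instruction */
#define OP3_STORE  0x200000u   /* op3[2]: store instruction */

/* Immediate ASI field, bits 12:5 of the instruction word. */
static unsigned int insn_asi(uint32_t insn)
{
	return (insn >> 5) & 0xffu;
}

/* True when both the floating point and the store bit are set, which is
 * the case the first bug failed to filter out. */
static bool is_fp_store(uint32_t insn)
{
	return (insn & OP3_FP) && (insn & OP3_STORE);
}

/* The ASI check quoted in the commit message. */
static bool asi_passes_mask(unsigned int asi)
{
	return (asi & 0xf2u) == ASI_PNF;
}
```

For the first example opcode, C1A01040, both op3 bits are set and the ASI field decodes to 0x82. For the second, CF95D1EF, the ASI field decodes to 0x8f, which is above ASI_SNFL yet still satisfies the mask test, as does 0x8e.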
      Signed-off-by: Rob Gardner <rob.gardner@oracle.com>
      Tested-by: Anatoly Pugachev <matorola@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 's390-qeth-fixes' · 85154557
      David S. Miller authored
      Julian Wiedmann says:
      
      ====================
      s390/qeth: fixes 2021-03-09
      
      please apply the following patch series to netdev's net tree.
      
      This brings one fix for a memleak in an error path of the setup code.
      Also several fixes for dealing with pending TX buffers - two for old
      bugs in their completion handling, and one recent regression in a
      teardown path.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • s390/qeth: fix notification for pending buffers during teardown · 7eefda7f
      Julian Wiedmann authored
      The cited commit reworked the state machine for pending TX buffers.
      In qeth_iqd_tx_complete() it turned PENDING into a transient state, and
      uses NEED_QAOB for buffers that get parked while waiting for their QAOB
      completion.
      
      But it failed to adjust the check in qeth_tx_complete_buf(). So if
      qeth_tx_complete_pending_bufs() is called during teardown to drain
      the parked TX buffers, we no longer raise a notification for af_iucv.
      
      Instead of updating the checked state, just move this code into
      qeth_tx_complete_pending_bufs() itself. This also gets rid of the
      special-case in the common TX completion path.
      
      Fixes: 8908f36d ("s390/qeth: fix af_iucv notification race")
      Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • s390/qeth: schedule TX NAPI on QAOB completion · 3e83d467
      Julian Wiedmann authored
      When a QAOB notifies us that a pending TX buffer has been delivered, the
      actual TX completion processing by qeth_tx_complete_pending_bufs()
      is done within the context of a TX NAPI instance. We shouldn't rely on
      this instance being scheduled by some other TX event, but just do it
      ourselves.
      
      qeth_qdio_handle_aob() is called from qeth_poll(), i.e. our main NAPI
      instance. To avoid touching the TX queue's NAPI instance
      before/after it is (un-)registered, reorder the code in qeth_open()
      and qeth_stop() accordingly.
      
      Fixes: 0da9581d ("qeth: exploit asynchronous delivery of storage blocks")
      Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • s390/qeth: improve completion of pending TX buffers · c20383ad
      Julian Wiedmann authored
      The current design attaches a pending TX buffer to a custom
      single-linked list, which is anchored at the buffer's slot on the
      TX ring. The buffer is then checked for final completion whenever
      this slot is processed during a subsequent TX NAPI poll cycle.
      
      But if there's insufficient traffic on the ring, we might never make
      enough progress to get back to this ring slot and discover the pending
      buffer's final TX completion. This is particularly harmful if the missing
      TX completion blocks the application from sending further traffic.
      
      So convert the custom single-linked list code to a per-queue list_head,
      and scan this list on every TX NAPI cycle.
      
      Fixes: 0da9581d ("qeth: exploit asynchronous delivery of storage blocks")
      Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • s390/qeth: fix memory leak after failed TX Buffer allocation · e7a36d27
      Julian Wiedmann authored
      When qeth_alloc_qdio_queues() fails to allocate one of the buffers that
      back an Output Queue, the 'out_freeoutqbufs' path will free all
      previously allocated buffers for this queue. But it fails to free the
      half-finished queue struct itself.
      
      Move the buffer allocation into qeth_alloc_output_queue(), and deal with
      such errors internally.
      
      Fixes: 0da9581d ("qeth: exploit asynchronous delivery of storage blocks")
      Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
      Reviewed-by: Alexandra Winter <wintera@linux.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'virtio_net-infinite-loop' · b005c9ef
      David S. Miller authored
      Balazs Nemeth says:
      
      ====================
      net: prevent infinite loop caused by incorrect proto from virtio_net_hdr_set_proto
      
      These patches prevent an infinite loop for gso packets with a protocol
      from virtio net hdr that doesn't match the protocol in the packet.
      Note that packets coming from a device without
      header_ops->parse_protocol being implemented will not be caught by
      the check in virtio_net_hdr_to_skb, but the infinite loop will still
      be prevented by the check in the gso layer.
      
      Changes from v2 to v3:
        - Remove unused *eth.
        - Use MPLS_HLEN to also check if the MPLS header length is a multiple
          of four.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: avoid infinite loop in mpls_gso_segment when mpls_hlen == 0 · d348ede3
      Balazs Nemeth authored
      A packet with skb_inner_network_header(skb) == skb_network_header(skb)
      and ETH_P_MPLS_UC will prevent mpls_gso_segment from pulling any headers
      from the packet. Subsequently, the call to skb_mac_gso_segment will
      again call mpls_gso_segment with the same packet leading to an infinite
      loop. In addition, ensure that the header length is a multiple of four,
      which should hold irrespective of the number of stacked labels.
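The two conditions named above (a non-zero header length that is a multiple of four) can be expressed as a single predicate. This is a minimal user-space sketch, not the kernel code: `mpls_hlen_ok()` is a hypothetical helper, and MPLS_HLEN is assumed to be 4, the size in bytes of one label stack entry.

```c
#include <stdbool.h>
#include <stddef.h>

#define MPLS_HLEN 4  /* bytes per MPLS label stack entry (assumed) */

/* Reject a zero-length MPLS header (nothing would be pulled, so the
 * segmentation path would see the same packet again) and any length
 * that is not a whole number of label entries. */
static bool mpls_hlen_ok(size_t mpls_hlen)
{
	return mpls_hlen != 0 && mpls_hlen % MPLS_HLEN == 0;
}
```

A stack of three labels gives 12 bytes and passes; lengths of 0 or 6 bytes are rejected.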
      Signed-off-by: Balazs Nemeth <bnemeth@redhat.com>
      Acked-by: Willem de Bruijn <willemb@google.com>
      Reviewed-by: David Ahern <dsahern@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: check if protocol extracted by virtio_net_hdr_set_proto is correct · 924a9bc3
      Balazs Nemeth authored
      For gso packets, virtio_net_hdr_set_proto sets the protocol (if it isn't
      set) based on the type in the virtio net hdr, but the skb could contain
      anything since it could come from packet_snd through a raw socket. If
      there is a mismatch between what virtio_net_hdr_set_proto sets and
      the actual protocol, then the skb could be handled incorrectly later
      on.
      
      An example where this poses an issue is with the subsequent call to
      skb_flow_dissect_flow_keys_basic which relies on skb->protocol being set
      correctly. A specially crafted packet could fool
      skb_flow_dissect_flow_keys_basic, preventing EINVAL from being returned.
      
      Avoid blindly trusting the information provided by the virtio net header
      by checking that the protocol in the packet actually matches the
      protocol set by virtio_net_hdr_set_proto. Note that since the protocol
      is only checked if skb->dev implements header_ops->parse_protocol,
      packets from devices without the implementation are not checked at this
      stage.
      
      Fixes: 9274124f ("net: stricter validation of untrusted gso packets")
      Signed-off-by: Balazs Nemeth <bnemeth@redhat.com>
      Acked-by: Willem de Bruijn <willemb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: xrs700x: check if partner is same as port in hsr join · 286a8624
      George McCollister authored
      Don't assign dp to partner if it's the same port that xrs700x_hsr_join
      was called with. The partner port is supposed to be the other port in
      the HSR/PRP redundant pair not the same port. This fixes an issue
      observed in testing where forwarding between redundant HSR ports on this
      switch didn't work depending on the order the ports were added to the
      hsr device.
      
      Fixes: bd62e6f5 ("net: dsa: xrs700x: add HSR offloading support")
      Signed-off-by: George McCollister <george.mccollister@gmail.com>
      Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  2. 09 Mar, 2021 7 commits
    • bpf, x86: Use kvmalloc_array instead kmalloc_array in bpf_jit_comp · de920fc6
      Yonghong Song authored
      x86 bpf_jit_comp.c used kmalloc_array to store jited addresses
      for each bpf insn. With a large bpf program, we have seen the
      following allocation failures in our production server:
      
          page allocation failure: order:5, mode:0x40cc0(GFP_KERNEL|__GFP_COMP),
                                   nodemask=(null),cpuset=/,mems_allowed=0"
          Call Trace:
          dump_stack+0x50/0x70
          warn_alloc.cold.120+0x72/0xd2
          ? __alloc_pages_direct_compact+0x157/0x160
          __alloc_pages_slowpath+0xcdb/0xd00
          ? get_page_from_freelist+0xe44/0x1600
          ? vunmap_page_range+0x1ba/0x340
          __alloc_pages_nodemask+0x2c9/0x320
          kmalloc_order+0x18/0x80
          kmalloc_order_trace+0x1d/0xa0
          bpf_int_jit_compile+0x1e2/0x484
          ? kmalloc_order_trace+0x1d/0xa0
          bpf_prog_select_runtime+0xc3/0x150
          bpf_prog_load+0x480/0x720
          ? __mod_memcg_lruvec_state+0x21/0x100
          __do_sys_bpf+0xc31/0x2040
          ? close_pdeo+0x86/0xe0
          do_syscall_64+0x42/0x110
          entry_SYSCALL_64_after_hwframe+0x44/0xa9
          RIP: 0033:0x7f2f300f7fa9
          Code: Bad RIP value.
      
      Dumped assembly:
      
          ffffffff810b6d70 <bpf_int_jit_compile>:
          ; {
          ffffffff810b6d70: e8 eb a5 b4 00        callq   0xffffffff81c01360 <__fentry__>
          ffffffff810b6d75: 41 57                 pushq   %r15
          ...
          ffffffff810b6f39: e9 72 fe ff ff        jmp     0xffffffff810b6db0 <bpf_int_jit_compile+0x40>
          ;       addrs = kmalloc_array(prog->len + 1, sizeof(*addrs), GFP_KERNEL);
          ffffffff810b6f3e: 8b 45 0c              movl    12(%rbp), %eax
          ;       return __kmalloc(bytes, flags);
          ffffffff810b6f41: be c0 0c 00 00        movl    $3264, %esi
          ;       addrs = kmalloc_array(prog->len + 1, sizeof(*addrs), GFP_KERNEL);
          ffffffff810b6f46: 8d 78 01              leal    1(%rax), %edi
          ;       if (unlikely(check_mul_overflow(n, size, &bytes)))
          ffffffff810b6f49: 48 c1 e7 02           shlq    $2, %rdi
          ;       return __kmalloc(bytes, flags);
          ffffffff810b6f4d: e8 8e 0c 1d 00        callq   0xffffffff81287be0 <__kmalloc>
          ;       if (!addrs) {
          ffffffff810b6f52: 48 85 c0              testq   %rax, %rax
      
      Change kmalloc_array() to kvmalloc_array() to avoid potential
      allocation failures for large bpf programs.
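
      The order:5 in the failure report matches simple arithmetic. The
      sketch below assumes 4 KiB pages and 4-byte entries (the shlq $2 in
      the disassembly multiplies the count by 4); sketch_get_order() and
      sketch_addrs_bytes() are hypothetical userspace helpers, not kernel
      functions.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u  /* assumption: 4 KiB pages */

/* Smallest order such that (1 << order) pages hold `bytes` bytes. */
static unsigned int sketch_get_order(size_t bytes)
{
    size_t pages = (bytes + SKETCH_PAGE_SIZE - 1) / SKETCH_PAGE_SIZE;
    unsigned int order = 0;

    while (((size_t)1 << order) < pages)
        order++;
    return order;
}

/* Size of the addrs array: (prog_len + 1) entries of 4 bytes each. */
static size_t sketch_addrs_bytes(unsigned int prog_len)
{
    return ((size_t)prog_len + 1) * sizeof(int32_t);
}
```

      Under these assumptions a program of 20000 instructions needs 80004
      bytes, which rounds up to 32 physically contiguous pages (order 5),
      while a 1000-instruction program fits in a single page. kvmalloc_array()
      can fall back to vmalloc() when such a contiguous allocation fails,
      and the memory must then be released with kvfree().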

      Signed-off-by: Yonghong Song <yhs@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Link: https://lore.kernel.org/bpf/20210309015647.3657852-1-yhs@fb.com
      de920fc6
    • Yonghong Song's avatar
      bpf: Don't do bpf_cgroup_storage_set() for kuprobe/tp programs · 05a68ce5
      Yonghong Song authored
      For kuprobe and tracepoint bpf programs, the kernel calls
      trace_call_bpf(), which calls BPF_PROG_RUN_ARRAY_CHECK()
      to run the program array. Currently, BPF_PROG_RUN_ARRAY_CHECK()
      also calls bpf_cgroup_storage_set() to set the percpu
      cgroup local storage to NULL. This is
      due to commit 394e40a2 ("bpf: extend bpf_prog_array to store
      pointers to the cgroup storage"), which modified
      __BPF_PROG_RUN_ARRAY() to call bpf_cgroup_storage_set();
      this macro is also used by BPF_PROG_RUN_ARRAY_CHECK().

      kuprobe and tracepoint programs are not allowed to call the
      bpf_get_local_storage() helper and hence do not
      access percpu cgroup local storage. Let us
      change BPF_PROG_RUN_ARRAY_CHECK() not to
      modify percpu cgroup local storage.
      
      The issue was observed while debugging [1], where
      percpu data is overwritten due to the
        preempt_disable -> migration_disable
      change. This patch does not completely fix the above issue,
      which will be addressed separately; e.g., multiple cgroup
      prog runs may still preempt each other. But it does fix
      any potential issue caused by a tracing program
      overwriting percpu cgroup storage:
       - on a busy system, a tracing program may run between
         bpf_cgroup_storage_set() and the cgroup prog run.
       - a kprobe program may be triggered by a helper in a cgroup prog
         before bpf_get_local_storage() is called.
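
      The overwrite scenario in the list above can be modelled in a few
      lines of userspace C. This is only a sketch: a single static pointer
      stands in for the percpu storage slot, and all sketch_* names are
      hypothetical rather than kernel symbols.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical single-CPU stand-in for the percpu cgroup storage pointer. */
static const char *sketch_percpu_storage;

static void sketch_storage_set(const char *storage)
{
    sketch_percpu_storage = storage;
}

/*
 * Tracing program array run. `touch_storage` selects the behaviour before
 * this change (true: storage is reset to NULL) or after it (false).
 */
static void sketch_run_tracing_array(bool touch_storage)
{
    if (touch_storage)
        sketch_storage_set(NULL);
    /* tracing programs would run here; they never read the storage */
}

/*
 * cgroup program run: set the storage, get interrupted by a tracing run
 * (for example a kprobe firing in a helper), then read the storage back.
 * The return value is what the cgroup program would see afterwards.
 */
static const char *sketch_run_cgroup_prog(const char *storage,
                                          bool tracing_touches_storage)
{
    sketch_storage_set(storage);
    sketch_run_tracing_array(tracing_touches_storage);
    return sketch_percpu_storage;
}
```

      With tracing_touches_storage set, the cgroup program reads back NULL
      instead of its own storage; with it cleared, the pointer survives the
      nested tracing run.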
      
       [1] https://lore.kernel.org/bpf/CAKH8qBuXCfUz=w8L+Fj74OaUpbosO29niYwTki7e3Ag044_aww@mail.gmail.com/T
      
      Fixes: 394e40a2 ("bpf: extend bpf_prog_array to store pointers to the cgroup storage")
      Signed-off-by: Yonghong Song <yhs@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: Roman Gushchin <guro@fb.com>
      Link: https://lore.kernel.org/bpf/20210309185028.3763817-1-yhs@fb.com
      05a68ce5
    • Linus Torvalds's avatar
      Merge tag 'gpio-fixes-for-v5.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux · 4b3d9f9c
      Linus Torvalds authored
      Pull gpio fixes from Bartosz Golaszewski:
       "A bunch of fixes for the GPIO subsystem. We have two regressions in
        the core code spotted right after the merge window, a series of fixes
        for ACPI GPIO and a subsequent fix for a related regression in
        gpio-pca953x + a minor tweak in .gitignore and a rework of handling of
        the gpio-line-names to remedy a regression in stm32mp151.
      
        Summary:
      
         - fix two regressions in core GPIO subsystem code: one NULL-pointer
           dereference and one list corruption
      
         - read GPIO line names from fwnode instead of using the generic
           device properties to fix a regression on stm32mp151
      
         - fixes to ACPI GPIO and gpio-pca953x to handle a regression in IRQ
           handling on Intel Galileo
      
         - update .gitignore in GPIO selftests"
      
      * tag 'gpio-fixes-for-v5.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux:
        gpiolib: Read "gpio-line-names" from a firmware node
        gpio: pca953x: Set IRQ type when handle Intel Galileo Gen 2
        gpiolib: acpi: Allow to find GpioInt() resource by name and index
        gpiolib: acpi: Add ACPI_GPIO_QUIRK_ABSOLUTE_NUMBER quirk
        gpiolib: acpi: Add missing IRQF_ONESHOT
        gpio: fix gpio-device list corruption
        gpio: fix NULL-deref-on-deregistration regression
        selftests: gpio: update .gitignore
      4b3d9f9c
    • Linus Torvalds's avatar
      Merge tag 'mips-fixes_5.12_1' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux · 9c39198a
      Linus Torvalds authored
      Pull MIPS fixes from Thomas Bogendoerfer:
      
       - fixes for boot breakage because of misaligned FDTs
      
       - fix for overwritten exception handlers
      
       - enable MIPS optimized crypto for all MIPS CPUs to improve wireguard
         performance
      
      * tag 'mips-fixes_5.12_1' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux:
        MIPS: kernel: Reserve exception base early to prevent corruption
        MIPS: vmlinux.lds.S: align raw appended dtb to 8 bytes
        crypto: mips/poly1305 - enable for all MIPS processors
        MIPS: boot/compressed: Copy DTB to aligned address
      9c39198a
    • Xie He's avatar
      net: lapbether: Remove netif_start_queue / netif_stop_queue · f7d9d485
      Xie He authored
      For the devices in this driver, the default qdisc is "noqueue",
      because their "tx_queue_len" is 0.
      
      In function "__dev_queue_xmit" in "net/core/dev.c", devices with the
      "noqueue" qdisc are specially handled. Packets are transmitted without
      being queued after a "dev->flags & IFF_UP" check. However, it's possible
      that even if this check succeeds, "ops->ndo_stop" may still have already
      been called. This is because in "__dev_close_many", "ops->ndo_stop" is
      called before clearing the "IFF_UP" flag.
      
      If we call "netif_stop_queue" in "ops->ndo_stop", then
      "__dev_queue_xmit" may see that the "IFF_UP" flag is still set, check
      "netif_xmit_stopped", and find that the queue is already stopped.
      In this case, it will complain that:
      "Virtual device ... asks to queue packet!"
      
      To prevent "__dev_queue_xmit" from generating this complaint, we should
      not call "netif_stop_queue" in "ops->ndo_stop".
      
      We also don't need to call "netif_start_queue" in "ops->ndo_open",
      because after a netdev is allocated and registered, the
      "__QUEUE_STATE_DRV_XOFF" flag is initially not set, so there is no need
      to call "netif_start_queue" to clear it.
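
      The window described above can be illustrated with a small userspace
      model. It is a sketch under simplifying assumptions: the close path is
      split into two explicit steps so the intermediate state can be
      inspected, and every sketch_* name is hypothetical.

```c
#include <stdbool.h>

#define SKETCH_IFF_UP 0x1u

struct sketch_dev {
    unsigned int flags;   /* only SKETCH_IFF_UP is modelled */
    bool queue_stopped;   /* models the __QUEUE_STATE_DRV_XOFF bit */
};

enum sketch_xmit_result {
    SKETCH_XMIT_OK,         /* packet handed to the driver */
    SKETCH_XMIT_DOWN,       /* device is not up */
    SKETCH_XMIT_COMPLAINT   /* "Virtual device ... asks to queue packet!" */
};

/* Transmit path for a "noqueue" device, reduced to the two checks. */
static enum sketch_xmit_result sketch_noqueue_xmit(const struct sketch_dev *dev)
{
    if (!(dev->flags & SKETCH_IFF_UP))
        return SKETCH_XMIT_DOWN;
    if (dev->queue_stopped)
        return SKETCH_XMIT_COMPLAINT;
    return SKETCH_XMIT_OK;
}

/* Close, step 1: ndo_stop runs while IFF_UP is still set. */
static void sketch_close_step1_ndo_stop(struct sketch_dev *dev,
                                        bool driver_stops_queue)
{
    if (driver_stops_queue)
        dev->queue_stopped = true;
}

/* Close, step 2: only now is the IFF_UP flag cleared. */
static void sketch_close_step2_clear_up(struct sketch_dev *dev)
{
    dev->flags &= ~SKETCH_IFF_UP;
}
```

      A transmit attempt between step 1 and step 2 yields
      SKETCH_XMIT_COMPLAINT when the driver stops the queue in its stop
      handler, and SKETCH_XMIT_OK when it does not.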
      
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Signed-off-by: Xie He <xie.he.0141@gmail.com>
      Acked-by: Martin Schiller <ms@dev.tdt.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      f7d9d485
    • Thomas Bogendoerfer's avatar
      MIPS: kernel: Reserve exception base early to prevent corruption · bd67b711
      Thomas Bogendoerfer authored
      BMIPS is one of the few platforms that do change the exception base.
      After commit 2dcb3964 ("memblock: do not start bottom-up allocations
      with kernel_end") we started seeing BMIPS boards fail to boot with the
      built-in FDT being corrupted.
      
      Before the cited commit, early allocations would be in the [kernel_end,
      RAM_END] range, but after that commit they are within [RAM_START +
      PAGE_SIZE, RAM_END].

      The custom exception base handler that is installed by
      bmips_ebase_setup() for BMIPS5000 CPUs ends up trampling on the
      memory region allocated by unflatten_and_copy_device_tree(), thus
      corrupting the FDT used by the kernel.

      To fix this, we need to perform an early reservation of the custom
      exception space. Additionally, we reserve the first 4k (1k for R3k) for
      either normal exception vector space (legacy CPUs) or special vectors
      like cache exceptions.
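
      Why an early reservation helps can be shown with a toy bottom-up
      allocator in userspace C. This is a sketch, not memblock: the sketch_*
      names are hypothetical, and any addresses passed in are illustrative
      rather than real BMIPS values.

```c
#include <stdbool.h>
#include <stdint.h>

struct sketch_region { uint64_t start, end; };  /* half-open: [start, end) */

static bool sketch_overlaps(struct sketch_region a, struct sketch_region b)
{
    return a.start < b.end && b.start < a.end;
}

/*
 * Bottom-up first fit: return the lowest address >= floor where `size`
 * bytes avoid every reserved region, or UINT64_MAX if nothing fits
 * below ram_end.
 */
static uint64_t sketch_alloc_bottom_up(uint64_t floor, uint64_t ram_end,
                                       uint64_t size,
                                       const struct sketch_region *reserved,
                                       int n_reserved)
{
    uint64_t addr = floor;
    bool moved = true;

    while (moved) {
        moved = false;
        for (int i = 0; i < n_reserved; i++) {
            struct sketch_region want = { addr, addr + size };

            if (sketch_overlaps(want, reserved[i])) {
                addr = reserved[i].end;  /* skip past the reservation */
                moved = true;
            }
        }
    }
    return (addr + size <= ram_end) ? addr : UINT64_MAX;
}
```

      With no reservations, an allocation starting at the floor can land
      exactly where a handler is later installed; once that range is passed
      in as reserved, the allocator skips past it.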
      
      Huge thanks to Serge for analysing and proposing a solution to this
      issue.
      
      Fixes: 2dcb3964 ("memblock: do not start bottom-up allocations with kernel_end")
      Reported-by: Kamal Dasu <kdasu.kdev@gmail.com>
      Debugged-by: Serge Semin <Sergey.Semin@baikalelectronics.ru>
      Acked-by: Mike Rapoport <rppt@linux.ibm.com>
      Tested-by: Florian Fainelli <f.fainelli@gmail.com>
      Reviewed-by: Serge Semin <fancer.lancer@gmail.com>
      Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      bd67b711
    • Linus Torvalds's avatar
      Merge git://git.kernel.org:/pub/scm/linux/kernel/git/davem/sparc · 987a0874
      Linus Torvalds authored
      Pull sparc updates from David Miller:
       "Just some more random bits from Al, including a conversion over to
        generic extables"
      
      * git://git.kernel.org:/pub/scm/linux/kernel/git/davem/sparc:
        sparc32: take ->thread.flags out
        sparc32: get rid of fake_swapper_regs
        sparc64: get rid of fake_swapper_regs
        sparc32: switch to generic extables
        sparc32: switch copy_user.S away from range exception table entries
        sparc32: get rid of range exception table entries in checksum_32.S
        sparc32: switch __bzero() away from range exception table entries
        sparc32: kill lookup_fault()
        sparc32: don't bother with lookup_fault() in __bzero()
      987a0874
  3. 08 Mar, 2021 13 commits