1. 29 Dec, 2020 1 commit
  2. 28 Dec, 2020 25 commits
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf · 4bfc4714
      David S. Miller authored
      Daniel Borkmann says:
      
      ====================
      pull-request: bpf 2020-12-28
      
      The following pull-request contains BPF updates for your *net* tree.
      
      There is a small merge conflict between bpf tree commit 69ca310f
      ("bpf: Save correct stopping point in file seq iteration") and net tree
      commit 66ed5944 ("bpf/task_iter: In task_file_seq_get_next use
      task_lookup_next_fd_rcu"). The get_files_struct() does not exist anymore
      in net, so take the hunk in HEAD and add the `info->tid = curr_tid` to
      the error path:
      
        [...]
                      curr_task = task_seq_get_next(ns, &curr_tid, true);
                      if (!curr_task) {
                              info->task = NULL;
                              info->tid = curr_tid;
                              return NULL;
                      }
      
                      /* set info->task and info->tid */
        [...]
      
      We've added 10 non-merge commits during the last 9 day(s) which contain
      a total of 11 files changed, 75 insertions(+), 20 deletions(-).
      
      The main changes are:
      
      1) Various AF_XDP fixes such as fill/completion ring leak on failed bind and
         fixing a race in skb mode's backpressure mechanism, from Magnus Karlsson.
      
      2) Fix latency spikes on lockdep enabled kernels by adding a rescheduling
         point to BPF hashtab initialization, from Eric Dumazet.
      
      3) Fix a splat in task iterator by saving the correct stopping point in the
         seq file iteration, from Jonathan Lemon.
      
      4) Fix BPF maps selftest by adding retries in case hashtab returns EBUSY
         errors on update/deletes, from Andrii Nakryiko.
      
      5) Fix BPF selftest error reporting to something more user friendly if the
         vmlinux BTF cannot be found, from Kamal Mostafa.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: hdlc_ppp: Fix issues when mod_timer is called while timer is running · 1fef7359
      Xie He authored
      ppp_cp_event is called directly or indirectly by ppp_rx with "ppp->lock"
      held. It may call mod_timer to add a new timer. However, at the same time
      ppp_timer may be already running and waiting for "ppp->lock". In this
      case, there's no need for ppp_timer to continue running and it can just
      exit.
      
      If we let ppp_timer continue running, it may call add_timer. This causes
      kernel panic because add_timer can't be called with a timer pending.
      This patch fixes this problem.
      
      Fixes: e022c2f0 ("WAN: new synchronous PPP implementation for generic HDLC.")
      Cc: Krzysztof Halasa <khc@pm.waw.pl>
      Signed-off-by: Xie He <xie.he.0141@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • atlantic: remove architecture depends · 9b22fece
      Leo Le Bouter authored
      This was tested on a RaptorCS Talos II with IBM POWER9 DD2.2 CPUs and an
      ASUS XG-C100F PCI-e card without any issue. Speeds of ~8Gbps could be
      attained with not-very-scientific (wget HTTP) both-ways measurements on
      a local network. No warning or error reported in kernel logs. The
      driver seems to be portable enough that it need not be gated like this.
      Signed-off-by: Léo Le Bouter <lle-bout@zaclys.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • erspan: fix version 1 check in gre_parse_header() · 085c7c4e
      Cong Wang authored
      Both version 0 and version 1 use ETH_P_ERSPAN, but version 0 does not
      have an erspan header. So the check in gre_parse_header() is wrong,
      we have to distinguish version 1 from version 0.
      
      We can just check the gre header length like is_erspan_type1().
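
      A minimal user-space sketch of the distinction described above. It
      assumes that a version 0 packet carries a 4-byte GRE header (the
      condition is_erspan_type1() tests, to my understanding); the helper
      name has_erspan_header() and the PROTO_* constants are hypothetical
      stand-ins for ETH_P_ERSPAN and ETH_P_ERSPAN2, and this is not the
      actual gre_parse_header() code:

```c
#include <stdbool.h>
#include <stdint.h>

#define PROTO_ERSPAN  0x88BE /* shared by version 0 and version 1 */
#define PROTO_ERSPAN2 0x22EB /* version 2 only */

/* Hypothetical helper: decide whether an ERSPAN header follows the GRE
 * header. Version 0 uses the same protocol number as version 1 but has
 * only a 4-byte GRE header and no ERSPAN header (assumption stated in
 * the lead-in), so the GRE header length is what tells them apart. */
static bool has_erspan_header(uint16_t proto, int gre_hdr_len)
{
	if (proto == PROTO_ERSPAN2)
		return true;
	return proto == PROTO_ERSPAN && gre_hdr_len != 4;
}
```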
      
      Fixes: cb73ee40 ("net: ip_gre: use erspan key field for tunnel lookup")
      Reported-by: syzbot+f583ce3d4ddf9836b27a@syzkaller.appspotmail.com
      Cc: William Tu <u9012063@gmail.com>
      Cc: Lorenzo Bianconi <lorenzo.bianconi@redhat.com>
      Signed-off-by: Cong Wang <cong.wang@bytedance.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: hns: fix return value check in __lb_other_process() · 5ede3ada
      Yunjian Wang authored
      The function skb_copy() can return NULL, so its return value
      needs to be checked.
      
      Fixes: b5996f11 ("net: add Hisilicon Network Subsystem basic ethernet support")
      Signed-off-by: Yunjian Wang <wangyunjian@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: sched: prevent invalid Scell_log shift count · bd1248f1
      Randy Dunlap authored
      Check Scell_log shift size in red_check_params() and modify all callers
      of red_check_params() to pass Scell_log.
      
      This prevents a shift out-of-bounds as detected by UBSAN:
        UBSAN: shift-out-of-bounds in ./include/net/red.h:252:22
        shift exponent 72 is too large for 32-bit type 'int'
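
      The kind of check described above can be sketched in user space as
      follows. The helper names scell_log_is_valid() and safe_shift_right()
      are hypothetical; the real check lives in red_check_params(), whose
      full signature is not reproduced here:

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: a shift count is only valid when it is smaller
 * than the bit width of the shifted type (32 for a 32-bit int). */
static bool scell_log_is_valid(uint8_t scell_log)
{
	return scell_log < sizeof(int) * CHAR_BIT;
}

/* Shift only after validating the count; report failure otherwise. */
static bool safe_shift_right(uint32_t value, uint8_t scell_log, uint32_t *out)
{
	assert(out != NULL);
	if (!scell_log_is_valid(scell_log))
		return false;
	*out = value >> scell_log;
	return true;
}
```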
      
      Fixes: 8afa10cb ("net_sched: red: Avoid illegal values")
      Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
      Reported-by: syzbot+97c5bd9cc81eca63d36e@syzkaller.appspotmail.com
      Cc: Nogah Frankel <nogahf@mellanox.com>
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Cc: Cong Wang <xiyou.wangcong@gmail.com>
      Cc: Jiri Pirko <jiri@resnulli.us>
      Cc: netdev@vger.kernel.org
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: neighbor: fix a crash caused by mod zero · a533b70a
      weichenchen authored
      pneigh_enqueue() tries to obtain a random delay by mod
      NEIGH_VAR(p, PROXY_DELAY). However, NEIGH_VAR(p, PROXY_DELAY)
      might be zero at that point because someone could write zero
      to /proc/sys/net/ipv4/neigh/[device]/proxy_delay after the
      callers check it.
      
      This patch uses prandom_u32_max() to get a random delay instead
      which avoids potential division by zero.
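
      For illustration, a bounded random value can be derived with a
      multiply and a shift instead of a modulo, which is, to my
      understanding, the approach prandom_u32_max() takes. In this sketch
      the random input is passed in as a parameter and the name
      scale_below() is hypothetical:

```c
#include <stdint.h>

/* Map a 32-bit random value into [0, bound) with a multiply and a shift.
 * No division is involved, so bound == 0 simply yields 0. */
static uint32_t scale_below(uint32_t rnd, uint32_t bound)
{
	return (uint32_t)(((uint64_t)rnd * bound) >> 32);
}
```

      With a bound of zero the result is always zero, whereas "rnd % 0"
      would trap.
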
      Signed-off-by: weichenchen <weichen.chen@linux.alibaba.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ipv4: Ignore ECN bits for fib lookups in fib_compute_spec_dst() · 21fdca22
      Guillaume Nault authored
      RT_TOS() only clears one of the ECN bits. Therefore, when
      fib_compute_spec_dst() resorts to a fib lookup, it can return
      different results depending on the value of the second ECN bit.
      
      For example, ECT(0) and ECT(1) packets could be treated differently.
      
        $ ip netns add ns0
        $ ip netns add ns1
        $ ip link add name veth01 netns ns0 type veth peer name veth10 netns ns1
        $ ip -netns ns0 link set dev lo up
        $ ip -netns ns1 link set dev lo up
        $ ip -netns ns0 link set dev veth01 up
        $ ip -netns ns1 link set dev veth10 up
      
        $ ip -netns ns0 address add 192.0.2.10/24 dev veth01
        $ ip -netns ns1 address add 192.0.2.11/24 dev veth10
      
        $ ip -netns ns1 address add 192.0.2.21/32 dev lo
        $ ip -netns ns1 route add 192.0.2.10/32 tos 4 dev veth10 src 192.0.2.21
        $ ip netns exec ns1 sysctl -wq net.ipv4.icmp_echo_ignore_broadcasts=0
      
      With TOS 4 and ECT(1), ns1 replies using source address 192.0.2.21
      (ping uses -Q to set all TOS and ECN bits):
      
        $ ip netns exec ns0 ping -c 1 -b -Q 5 192.0.2.255
        [...]
        64 bytes from 192.0.2.21: icmp_seq=1 ttl=64 time=0.544 ms
      
      But with TOS 4 and ECT(0), ns1 replies using source address 192.0.2.11
      because the "tos 4" route isn't matched:
      
        $ ip netns exec ns0 ping -c 1 -b -Q 6 192.0.2.255
        [...]
        64 bytes from 192.0.2.11: icmp_seq=1 ttl=64 time=0.597 ms
      
      After this patch the ECN bits don't affect the result anymore:
      
        $ ip netns exec ns0 ping -c 1 -b -Q 6 192.0.2.255
        [...]
        64 bytes from 192.0.2.21: icmp_seq=1 ttl=64 time=0.591 ms
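
      The masking difference behind these results can be sketched as
      follows. The macro and function names are hypothetical; the mask
      values are assumed to mirror the kernel's IPTOS_TOS_MASK (0x1E, used
      by RT_TOS()) and a variant with both ECN bits cleared:

```c
#include <stdint.h>

#define TOS_MASK_OLD 0x1E /* keeps bit 1, one of the two ECN bits */
#define ECN_MASK     0x03 /* the two ECN bits */
#define TOS_MASK_NEW (TOS_MASK_OLD & ~ECN_MASK) /* 0x1C: ignores ECN */

/* TOS value used for the route lookup before the fix (model). */
static uint8_t lookup_tos_old(uint8_t tos)
{
	return tos & TOS_MASK_OLD;
}

/* TOS value used for the route lookup once both ECN bits are ignored. */
static uint8_t lookup_tos_new(uint8_t tos)
{
	return tos & TOS_MASK_NEW;
}
```

      With "-Q 5" both variants yield 4 and match the "tos 4" route; with
      "-Q 6" the old mask yields 6 and misses it, while the new mask
      yields 4.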
      
      Fixes: 35ebf65e ("ipv4: Create and use fib_compute_spec_dst() helper.")
      Signed-off-by: Guillaume Nault <gnault@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: mvpp2: fix pkt coalescing int-threshold configuration · 4f374d2c
      Stefan Chulski authored
      The packet coalescing interrupt threshold has separate registers
      for each aggregated/CPU (sw-thread). The required value should
      be loaded for every thread, not only for the current CPU.
      
      Fixes: 213f428f ("net: mvpp2: add support for TX interrupts and RX queue distribution modes")
      Signed-off-by: Stefan Chulski <stefanc@marvell.com>
      Link: https://lore.kernel.org/r/1608748521-11033-1-git-send-email-stefanc@marvell.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • Merge branch 'net-ipa-fix-some-new-build-warnings' · bb2cc7d7
      Jakub Kicinski authored
      Alex Elder says:
      
      ====================
      net: ipa: fix some new build warnings
      
      I got a super friendly message from the Intel kernel test robot that
      pointed out that two patches I posted last week caused new build
      warnings.  I already had these problems fixed in my own tree but
      the fix was not included in what I sent out last week.
      ====================
      
      Link: https://lore.kernel.org/r/20201226213737.338928-1-elder@linaro.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: ipa: don't return a value from evt_ring_command() · 1ddf776b
      Alex Elder authored
      Callers of evt_ring_command() no longer care whether the command
      times out, and don't use what evt_ring_command() returns.  Redefine
      that function to have void return type.
      Reported-by: kernel test robot <lkp@intel.com>
      Fixes: 428b448e ("net: ipa: use state to determine event ring command success")
      Signed-off-by: Alex Elder <elder@linaro.org>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: ipa: don't return a value from gsi_channel_command() · 1169318b
      Alex Elder authored
      Callers of gsi_channel_command() no longer care whether the command
      times out, and don't use what gsi_channel_command() returns.  Redefine
      that function to have void return type.
      Reported-by: kernel test robot <lkp@intel.com>
      Fixes: 6ffddf3b ("net: ipa: use state to determine channel command success")
      Signed-off-by: Alex Elder <elder@linaro.org>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • Merge branch 'bnxt_en-bug-fixes' · bc4adf0e
      Jakub Kicinski authored
      Michael Chan says:
      
      ====================
      bnxt_en: Bug fixes.
      
      The first patch fixes recovery of fatal AER errors.  The second one
      fixes a potential array out of bounds issue.
      ====================
      
      Link: https://lore.kernel.org/r/1609096698-15009-1-git-send-email-michael.chan@broadcom.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • bnxt_en: Check TQM rings for maximum supported value. · a029a2fe
      Michael Chan authored
      TQM rings are hardware resources that require host context memory
      managed by the driver.  The driver supports up to 9 TQM rings and
      the number of rings to use is requested by firmware during run-time.
      Cap this number to the maximum supported to prevent accessing beyond
      the array.  Future firmware may request more than 9 TQM rings.  Define
      macros to remove the magic number 9 from the C code.
      
      Fixes: ac3158cb ("bnxt_en: Allocate TQM ring context memory according to fw specification.")
      Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
      Reviewed-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
      Signed-off-by: Michael Chan <michael.chan@broadcom.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • bnxt_en: Fix AER recovery. · fb1e6e56
      Vasundhara Volam authored
      A recent change skips sending firmware messages to the firmware when
      pci_channel_offline() is true during fatal AER error.  To make this
      complete, we need to move the re-initialization sequence to
      bnxt_io_resume(), otherwise the firmware messages to re-initialize
      will all be skipped.  In any case, it is more correct to re-initialize
      in bnxt_io_resume().
      
      Also, fix the reverse x-mas tree format when defining variables
      in bnxt_io_slot_reset().
      
      Fixes: b340dc68 ("bnxt_en: Avoid sending firmware messages when AER error is detected.")
      Reviewed-by: Edwin Peer <edwin.peer@broadcom.com>
      Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
      Signed-off-by: Michael Chan <michael.chan@broadcom.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • Merge branch '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue · 74f88c16
      Jakub Kicinski authored
      Tony Nguyen says:
      
      ====================
      Intel Wired LAN Driver Updates 2020-12-23
      
      Commit e086ba2f ("e1000e: disable s0ix entry and exit flows for ME
      systems") disabled S0ix flows for systems that have various incarnations of
      the i219-LM ethernet controller.  This was done because of some regressions
      caused by an earlier commit 632fbd5e ("e1000e: fix S0ix flows for
      cable connected case") with i219-LM controller.
      
      Per discussion with Intel architecture team this direction should be
      changed and allow S0ix flows to be used by default.  This patch series
      includes directional changes for their conclusions in
      https://lkml.org/lkml/2020/12/13/15.
      
      * '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue:
        e1000e: Export S0ix flags to ethtool
        Revert "e1000e: disable s0ix entry and exit flows for ME systems"
        e1000e: bump up timeout to wait when ME un-configures ULP mode
        e1000e: Only run S0ix flows if shutdown succeeded
      ====================
      
      Link: https://lore.kernel.org/r/20201223233625.92519-1-anthony.l.nguyen@intel.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: mptcp: cap forward allocation to 1M · e7579d5d
      Davide Caratti authored
      The following syzkaller reproducer:
      
       r0 = socket$inet_mptcp(0x2, 0x1, 0x106)
       bind$inet(r0, &(0x7f0000000080)={0x2, 0x4e24, @multicast2}, 0x10)
       connect$inet(r0, &(0x7f0000000480)={0x2, 0x4e24, @local}, 0x10)
       sendto$inet(r0, &(0x7f0000000100)="f6", 0xffffffe7, 0xc000, 0x0, 0x0)
      
      systematically triggers the following warning:
      
       WARNING: CPU: 2 PID: 8618 at net/core/stream.c:208 sk_stream_kill_queues+0x3fa/0x580
       Modules linked in:
       CPU: 2 PID: 8618 Comm: syz-executor Not tainted 5.10.0+ #334
       Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.1-4.module+el8.1.0+4066+0f1aadab 04/04
       RIP: 0010:sk_stream_kill_queues+0x3fa/0x580
       Code: df 48 c1 ea 03 0f b6 04 02 84 c0 74 04 3c 03 7e 40 8b ab 20 02 00 00 e9 64 ff ff ff e8 df f0 81 2
       RSP: 0018:ffffc9000290fcb0 EFLAGS: 00010293
       RAX: ffff888011cb8000 RBX: 0000000000000000 RCX: ffffffff86eecf0e
       RDX: 0000000000000000 RSI: ffffffff86eecf6a RDI: 0000000000000005
       RBP: 0000000000000e28 R08: ffff888011cb8000 R09: fffffbfff1f48139
       R10: ffffffff8fa409c7 R11: fffffbfff1f48138 R12: ffff8880215e6220
       R13: ffffffff8fa409c0 R14: ffffc9000290fd30 R15: 1ffff92000521fa2
       FS:  00007f41c78f4800(0000) GS:ffff88802d000000(0000) knlGS:0000000000000000
       CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
       CR2: 00007f95c803d088 CR3: 0000000025ed2000 CR4: 00000000000006f0
       Call Trace:
        __mptcp_destroy_sock+0x4f5/0x8e0
         mptcp_close+0x5e2/0x7f0
        inet_release+0x12b/0x270
        __sock_release+0xc8/0x270
        sock_close+0x18/0x20
        __fput+0x272/0x8e0
        task_work_run+0xe0/0x1a0
        exit_to_user_mode_prepare+0x1df/0x200
        syscall_exit_to_user_mode+0x19/0x50
        entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      Userspace programs can pass arbitrarily high values of 'len' to sendmsg(),
      which causes an integer overflow of 'amount'. Cap forward allocation to 1
      megabyte: higher values are not really useful.
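
      A minimal sketch of the capping idea, assuming the requested length
      arrives as a size_t and the reserved amount is tracked as an int.
      The names FORWARD_ALLOC_CAP and forward_alloc_amount() are
      hypothetical and do not come from the mptcp code:

```c
#include <stddef.h>

#define FORWARD_ALLOC_CAP ((size_t)1 << 20) /* 1 megabyte */

/* Clamp the requested length before narrowing it to int, so that a huge
 * 'len' such as 0xffffffe7 cannot overflow the signed amount. */
static int forward_alloc_amount(size_t len)
{
	size_t capped = len < FORWARD_ALLOC_CAP ? len : FORWARD_ALLOC_CAP;

	return (int)capped; /* always fits: capped <= 1 << 20 */
}
```

      The length used in the reproducer above, 0xffffffe7, is clamped to
      1048576 instead of being narrowed to a signed value directly.
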
      Suggested-by: Paolo Abeni <pabeni@redhat.com>
      Fixes: e93da928 ("mptcp: implement wmem reservation")
      Signed-off-by: Davide Caratti <dcaratti@redhat.com>
      Link: https://lore.kernel.org/r/3334d00d8b2faecafdfab9aa593efcbf61442756.1608584474.git.dcaratti@redhat.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • tun: fix return value when the number of iovs exceeds MAX_SKB_FRAGS · 950271d7
      Yunjian Wang authored
      Currently the tun_napi_alloc_frags() function returns -ENOMEM when the
      number of iovs exceeds MAX_SKB_FRAGS + 1. However this is inappropriate,
      we should use -EMSGSIZE instead of -ENOMEM.
      
      The distinction matters:
      1. the caller needs to drop the bad packet when -EMSGSIZE is returned,
         which means meeting a persistent failure.
      2. the caller can try again when -ENOMEM is returned, which means
         meeting a transient failure.
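
      The distinction can be sketched from the caller's point of view as
      below. MAX_FRAGS stands in for MAX_SKB_FRAGS (its value here is an
      assumption), and check_iov_count(), classify() and enum action are
      hypothetical names used only for illustration:

```c
#include <errno.h>
#include <stddef.h>

#define MAX_FRAGS 17 /* stand-in for MAX_SKB_FRAGS; value is an assumption */

enum action { ACCEPT, RETRY, DROP };

/* Reject requests that can never fit, using a persistent error code. */
static int check_iov_count(size_t iov_count)
{
	if (iov_count > (size_t)MAX_FRAGS + 1)
		return -EMSGSIZE;
	return 0;
}

/* How a caller might react to each error code. */
static enum action classify(int err)
{
	switch (err) {
	case 0:
		return ACCEPT;
	case -ENOMEM:
		return RETRY; /* transient: trying again may succeed */
	case -EMSGSIZE:
	default:
		return DROP; /* persistent: retrying cannot help */
	}
}
```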
      
      Fixes: 90e33d45 ("tun: enable napi_gro_frags() for TUN/TAP driver")
      Signed-off-by: Yunjian Wang <wangyunjian@huawei.com>
      Acked-by: Willem de Bruijn <willemb@google.com>
      Acked-by: Jason Wang <jasowang@redhat.com>
      Acked-by: Michael S. Tsirkin <mst@redhat.com>
      Link: https://lore.kernel.org/r/1608864736-24332-1-git-send-email-wangyunjian@huawei.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: ethernet: ti: cpts: fix ethtool output when no ptp_clock registered · 4614792e
      Grygorii Strashko authored
      The CPTS driver registers the PTP PHC clock when the first netif goes up
      and unregisters it when all netifs are down. Currently ethtool shows:
       - PTP PHC clock index 0 after boot until the first netif is up;
       - the last assigned PTP PHC clock index even if the PTP PHC clock is no
         longer registered after all netifs are down.
      
      This patch ensures that -1 is returned by ethtool when PTP PHC clock is not
      registered any more.
      
      Fixes: 8a2c9a5a ("net: ethernet: ti: cpts: rework initialization/deinitialization")
      Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
      Acked-by: Richard Cochran <richardcochran@gmail.com>
      Link: https://lore.kernel.org/r/20201224162405.28032-1-grygorii.strashko@ti.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • Merge branch 'net-sysfs-fix-race-conditions-in-the-xps-code' · 5ff3fda9
      Jakub Kicinski authored
      Antoine Tenart says:
      
      ====================
      net-sysfs: fix race conditions in the xps code
      
      This series fixes race conditions in the xps code, where out of bound
      accesses can occur when dev->num_tc is updated, triggering oops. The
      root cause is linked to locking issues. An explanation is given in each
      of the commit logs.
      
      We had a discussion on the v1 of this series about using the xps_map
      mutex instead of the rtnl lock. While that seemed a better compromise,
      v2 showed the added complexity wasn't best for fixes. So we decided to
      go back to v1 and use the rtnl lock.
      
      Because of this, the only differences between v1 and v3 are improvements
      in the commit messages.
      ====================
      
      Link: https://lore.kernel.org/r/20201223212323.3603139-1-atenart@kernel.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net-sysfs: take the rtnl lock when accessing xps_rxqs_map and num_tc · 4ae2bb81
      Antoine Tenart authored
      Accesses to dev->xps_rxqs_map (when using dev->num_tc) should be
      protected by the rtnl lock, like we do for netif_set_xps_queue. I didn't
      see an actual bug being triggered, but let's be safe here and take the
      rtnl lock while accessing the map in sysfs.
      
      Fixes: 8af2c06f ("net-sysfs: Add interface for Rx queue(s) map per Tx queue")
      Signed-off-by: Antoine Tenart <atenart@kernel.org>
      Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net-sysfs: take the rtnl lock when storing xps_rxqs · 2d57b4f1
      Antoine Tenart authored
      Two race conditions can be triggered when storing xps rxqs, resulting in
      various oops and invalid memory accesses:
      
      1. Calling netdev_set_num_tc while netif_set_xps_queue is running:
      
         - netif_set_xps_queue uses dev->num_tc as one of the parameters to
           compute the size of new_dev_maps when allocating it. dev->num_tc is
           also used to access the map, and the compiler may generate code to
           retrieve this field multiple times in the function.
      
         - netdev_set_num_tc sets dev->num_tc.
      
         If new_dev_maps is allocated using dev->num_tc and then dev->num_tc
         is set to a higher value through netdev_set_num_tc, later accesses to
         new_dev_maps in netif_set_xps_queue could lead to accessing memory
         outside of new_dev_maps; triggering an oops.
      
      2. Calling netif_set_xps_queue while netdev_set_num_tc is running:
      
         2.1. netdev_set_num_tc starts by resetting the xps queues,
              dev->num_tc isn't updated yet.
      
         2.2. netif_set_xps_queue is called, setting up the map with the
              *old* dev->num_tc.
      
         2.3. netdev_set_num_tc updates dev->num_tc.
      
         2.4. Later accesses to the map lead to out of bound accesses and
              oops.
      
         A similar issue can be found with netdev_reset_tc.
      
      One way of triggering this is to set an iface up (for which the driver
      uses netdev_set_num_tc in the open path, such as bnx2x) and writing to
      xps_rxqs in a concurrent thread. With the right timing an oops is
      triggered.
      
      Both issues have the same fix: netif_set_xps_queue, netdev_set_num_tc
      and netdev_reset_tc should be mutually exclusive. We do that by taking
      the rtnl lock in xps_rxqs_store.
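
      Why a stale traffic-class count leads to out-of-bounds accesses can
      be shown with a simplified, single-threaded model. It assumes a flat
      map with one entry per (id, tc) pair, indexed as id * num_tc + tc;
      that layout and all function names below are assumptions made for
      illustration, not the kernel's data structures:

```c
#include <stdbool.h>
#include <stddef.h>

/* Entries in a map sized for nr_ids queues/CPUs and num_tc classes. */
static size_t map_entries(size_t nr_ids, size_t num_tc)
{
	return nr_ids * num_tc;
}

/* Flat index of (id, tc) when num_tc is re-read at access time. */
static size_t map_index(size_t id, size_t tc, size_t num_tc)
{
	return id * num_tc + tc;
}

/* True if an access computed with num_tc_at_access stays inside a map
 * that was allocated with num_tc_at_alloc. */
static bool access_in_bounds(size_t nr_ids, size_t id, size_t tc,
			     size_t num_tc_at_alloc, size_t num_tc_at_access)
{
	return map_index(id, tc, num_tc_at_access) <
	       map_entries(nr_ids, num_tc_at_alloc);
}
```

      For a map allocated for 4 ids and 2 classes (8 entries), accessing
      (id 3, tc 1) is in bounds while the count is still 2 (index 7), but
      falls outside once the count has been raised to 4 (index 13).
      Holding one lock across both the allocation and the accesses rules
      out that interleaving.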
      
      Fixes: 8af2c06f ("net-sysfs: Add interface for Rx queue(s) map per Tx queue")
      Signed-off-by: Antoine Tenart <atenart@kernel.org>
      Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net-sysfs: take the rtnl lock when accessing xps_cpus_map and num_tc · fb250385
      Antoine Tenart authored
      Accesses to dev->xps_cpus_map (when using dev->num_tc) should be
      protected by the rtnl lock, like we do for netif_set_xps_queue. I didn't
      see an actual bug being triggered, but let's be safe here and take the
      rtnl lock while accessing the map in sysfs.
      
      Fixes: 184c449f ("net: Add support for XPS with QoS via traffic classes")
      Signed-off-by: Antoine Tenart <atenart@kernel.org>
      Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net-sysfs: take the rtnl lock when storing xps_cpus · 1ad58225
      Antoine Tenart authored
      Two race conditions can be triggered when storing xps cpus, resulting in
      various oops and invalid memory accesses:
      
      1. Calling netdev_set_num_tc while netif_set_xps_queue is running:
      
         - netif_set_xps_queue uses dev->num_tc as one of the parameters to
           compute the size of new_dev_maps when allocating it. dev->num_tc is
           also used to access the map, and the compiler may generate code to
           retrieve this field multiple times in the function.
      
         - netdev_set_num_tc sets dev->num_tc.
      
         If new_dev_maps is allocated using dev->num_tc and then dev->num_tc
         is set to a higher value through netdev_set_num_tc, later accesses to
         new_dev_maps in netif_set_xps_queue could lead to accessing memory
         outside of new_dev_maps; triggering an oops.
      
      2. Calling netif_set_xps_queue while netdev_set_num_tc is running:
      
         2.1. netdev_set_num_tc starts by resetting the xps queues,
              dev->num_tc isn't updated yet.
      
         2.2. netif_set_xps_queue is called, setting up the map with the
              *old* dev->num_tc.
      
         2.3. netdev_set_num_tc updates dev->num_tc.
      
         2.4. Later accesses to the map lead to out of bound accesses and
              oops.
      
         A similar issue can be found with netdev_reset_tc.
      
      One way of triggering this is to set an iface up (for which the driver
      uses netdev_set_num_tc in the open path, such as bnx2x) and writing to
      xps_cpus in a concurrent thread. With the right timing an oops is
      triggered.
      
      Both issues have the same fix: netif_set_xps_queue, netdev_set_num_tc
      and netdev_reset_tc should be mutually exclusive. We do that by taking
      the rtnl lock in xps_cpus_store.
      
      Fixes: 184c449f ("net: Add support for XPS with QoS via traffic classes")
      Signed-off-by: Antoine Tenart <atenart@kernel.org>
      Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • CDC-NCM: remove "connected" log message · 59b4a8fa
      Roland Dreier authored
      The cdc_ncm driver passes network connection notifications up to
      usbnet_link_change(), which is the right place for any logging.
      Remove the netdev_info() duplicating this from the driver itself.
      
      This stops devices such as my "TRENDnet USB 10/100/1G/2.5G LAN"
      (ID 20f4:e02b) adapter from spamming the kernel log with
      
          cdc_ncm 2-2:2.0 enp0s2u2c2: network connection: connected
      
      messages every 60 msec or so.
      Signed-off-by: Roland Dreier <roland@kernel.org>
      Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Link: https://lore.kernel.org/r/20201224032116.2453938-1-roland@kernel.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
  3. 24 Dec, 2020 3 commits
    • bpf: Use thread_group_leader() · a61daaf3
      Jonathan Lemon authored
      Instead of directly comparing task->tgid and task->pid, use the
      thread_group_leader() helper.  This helps with readability, and
      there should be no functional change.
      Signed-off-by: Jonathan Lemon <jonathan.lemon@gmail.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Andrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/bpf/20201218185032.2464558-3-jonathan.lemon@gmail.com
    • bpf: Save correct stopping point in file seq iteration · 69ca310f
      Jonathan Lemon authored
      On some systems, some variant of the following splat is
      repeatedly seen.  The common factor in all traces seems
      to be the entry point to task_file_seq_next().  With the
      patch, all warnings go away.
      
          rcu: INFO: rcu_sched self-detected stall on CPU
          rcu: 	26-....: (20992 ticks this GP) idle=d7e/1/0x4000000000000002 softirq=81556231/81556231 fqs=4876
          	(t=21033 jiffies g=159148529 q=223125)
          NMI backtrace for cpu 26
          CPU: 26 PID: 2015853 Comm: bpftool Kdump: loaded Not tainted 5.6.13-0_fbk4_3876_gd8d1f9bf80bb #1
          Hardware name: Quanta Twin Lakes MP/Twin Lakes Passive MP, BIOS F09_3A12 10/08/2018
          Call Trace:
           <IRQ>
           dump_stack+0x50/0x70
           nmi_cpu_backtrace.cold.6+0x13/0x50
           ? lapic_can_unplug_cpu.cold.30+0x40/0x40
           nmi_trigger_cpumask_backtrace+0xba/0xca
           rcu_dump_cpu_stacks+0x99/0xc7
           rcu_sched_clock_irq.cold.90+0x1b4/0x3aa
           ? tick_sched_do_timer+0x60/0x60
           update_process_times+0x24/0x50
           tick_sched_timer+0x37/0x70
           __hrtimer_run_queues+0xfe/0x270
           hrtimer_interrupt+0xf4/0x210
           smp_apic_timer_interrupt+0x5e/0x120
           apic_timer_interrupt+0xf/0x20
           </IRQ>
          RIP: 0010:get_pid_task+0x38/0x80
          Code: 89 f6 48 8d 44 f7 08 48 8b 00 48 85 c0 74 2b 48 83 c6 55 48 c1 e6 04 48 29 f0 74 19 48 8d 78 20 ba 01 00 00 00 f0 0f c1 50 20 <85> d2 74 27 78 11 83 c2 01 78 0c 48 83 c4 08 c3 31 c0 48 83 c4 08
          RSP: 0018:ffffc9000d293dc8 EFLAGS: 00000202 ORIG_RAX: ffffffffffffff13
          RAX: ffff888637c05600 RBX: ffffc9000d293e0c RCX: 0000000000000000
          RDX: 0000000000000001 RSI: 0000000000000550 RDI: ffff888637c05620
          RBP: ffffffff8284eb80 R08: ffff88831341d300 R09: ffff88822ffd8248
          R10: ffff88822ffd82d0 R11: 00000000003a93c0 R12: 0000000000000001
          R13: 00000000ffffffff R14: ffff88831341d300 R15: 0000000000000000
           ? find_ge_pid+0x1b/0x20
           task_seq_get_next+0x52/0xc0
           task_file_seq_get_next+0x159/0x220
           task_file_seq_next+0x4f/0xa0
           bpf_seq_read+0x159/0x390
           vfs_read+0x8a/0x140
           ksys_read+0x59/0xd0
           do_syscall_64+0x42/0x110
           entry_SYSCALL_64_after_hwframe+0x44/0xa9
          RIP: 0033:0x7f95ae73e76e
          Code: Bad RIP value.
          RSP: 002b:00007ffc02c1dbf8 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
          RAX: ffffffffffffffda RBX: 000000000170faa0 RCX: 00007f95ae73e76e
          RDX: 0000000000001000 RSI: 00007ffc02c1dc30 RDI: 0000000000000007
          RBP: 00007ffc02c1ec70 R08: 0000000000000005 R09: 0000000000000006
          R10: fffffffffffff20b R11: 0000000000000246 R12: 00000000019112a0
          R13: 0000000000000000 R14: 0000000000000007 R15: 00000000004283c0
      
      If unable to obtain the file structure for the current task,
      proceed to the next task number after the one returned from
      task_seq_get_next(), instead of the next task number from the
      original iterator.
      
      Also, save the stopping task number from task_seq_get_next()
      on failure in case of restarts.
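
      The effect of resuming from the wrong task number can be sketched
      with a simplified user-space model. Tasks are a sorted array, the
      lookup returns the first task whose id is at least the requested
      one, and the function counts how many lookups are needed. All names
      here (struct task, get_next(), next_with_files()) are hypothetical
      and only model the behaviour described above:

```c
#include <stddef.h>

struct task {
	unsigned int tid;
	int has_files;
};

/* First task with tid >= *tid in a sorted array; updates *tid on a hit. */
static const struct task *get_next(const struct task *tasks, size_t n,
				   unsigned int *tid, unsigned int *lookups)
{
	(*lookups)++;
	for (size_t i = 0; i < n; i++) {
		if (tasks[i].tid >= *tid) {
			*tid = tasks[i].tid;
			return &tasks[i];
		}
	}
	return NULL;
}

/* Find the next task that has files, starting at *saved_tid. Returns the
 * number of lookups performed. With resume_from_found == 0 the search
 * restarts just after the original iterator position (old behaviour);
 * otherwise it restarts just after the task the lookup returned. */
static unsigned int next_with_files(const struct task *tasks, size_t n,
				    unsigned int *saved_tid,
				    int resume_from_found,
				    const struct task **found)
{
	unsigned int lookups = 0;
	unsigned int curr_tid = *saved_tid;

	for (;;) {
		const struct task *t = get_next(tasks, n, &curr_tid, &lookups);

		if (!t || t->has_files) {
			*saved_tid = curr_tid; /* save the stopping point */
			*found = t;
			return lookups;
		}
		if (resume_from_found)
			curr_tid = curr_tid + 1;
		else
			curr_tid = ++(*saved_tid);
	}
}
```

      With an iterator at task number 1 and the next tasks being 1000
      (no files) and 1001 (has files), the old behaviour needs 1001
      lookups while the corrected one needs 2.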
      
      Fixes: eaaacd23 ("bpf: Add task and task/file iterator targets")
      Signed-off-by: Jonathan Lemon <jonathan.lemon@gmail.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Andrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/bpf/20201218185032.2464558-2-jonathan.lemon@gmail.com
    • selftests/bpf: Work-around EBUSY errors from hashmap update/delete · 11b844b0
      Andrii Nakryiko authored
      20b6cc34 ("bpf: Avoid hashtab deadlock with map_locked") introduced
      a possibility of getting EBUSY error on lock contention, which seems to happen
      very deterministically in test_maps when running 1024 threads on a low-CPU
      machine. In libbpf CI case, it's a 2 CPU VM and it's hitting this 100% of the
      time. Work around by retrying on EBUSY (and EAGAIN, while we are at it) after
      a small sleep. sched_yield() is too aggressive and fails even after 20 retries,
      so I went with usleep(1) for backoff.
      
      Also log actual error returned to make it easier to see what's going on.
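
      A retry loop of this kind can be sketched as follows. It assumes the
      operation reports failure as a negative errno value; the names
      map_op_fn, retry_on_busy() and busy_then_ok() are hypothetical and
      the code is not taken from test_maps:

```c
#define _DEFAULT_SOURCE /* for usleep() under strict -std modes */
#include <errno.h>
#include <unistd.h>

/* Hypothetical operation type: returns 0 or a negative errno value. */
typedef int (*map_op_fn)(void *ctx);

/* Retry op while it reports -EBUSY or -EAGAIN, sleeping briefly between
 * attempts. Returns the last result once it succeeds, fails with another
 * error, or max_attempts is exhausted. */
static int retry_on_busy(map_op_fn op, void *ctx, int max_attempts)
{
	int err = -EBUSY;

	for (int i = 0; i < max_attempts; i++) {
		err = op(ctx);
		if (err != -EBUSY && err != -EAGAIN)
			return err;
		usleep(1); /* small backoff, as in the description above */
	}
	return err;
}

/* Stand-in for a contended update: fails with -EBUSY until the counter
 * pointed to by ctx reaches zero. */
static int busy_then_ok(void *ctx)
{
	int *remaining = ctx;

	if (*remaining > 0) {
		(*remaining)--;
		return -EBUSY;
	}
	return 0;
}
```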
      
      Fixes: 20b6cc34 ("bpf: Avoid hashtab deadlock with map_locked")
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Song Liu <songliubraving@fb.com>
      Link: https://lore.kernel.org/bpf/20201223200652.3417075-1-andrii@kernel.org
  4. 23 Dec, 2020 11 commits