1. 16 Jun, 2021 2 commits
  2. 15 Jun, 2021 6 commits
    • David S. Miller
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf · a4f0377d
      David S. Miller authored
      Daniel Borkmann says:
      
      ====================
      pull-request: bpf 2021-06-15
      
      The following pull-request contains BPF updates for your *net* tree.
      
      We've added 5 non-merge commits during the last 11 day(s) which contain
      a total of 10 files changed, 115 insertions(+), 16 deletions(-).
      
      The main changes are:
      
      1) Fix marking incorrect umem ring as done in libbpf's
         xsk_socket__create_shared() helper, from Kev Jackson.
      
      2) Fix oob leakage under a spectre v1 type confusion
         attack, from Daniel Borkmann.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Aleksander Jan Bajkowski
      lantiq: net: fix duplicated skb in rx descriptor ring · 7ea6cd16
      Aleksander Jan Bajkowski authored
      The previous commit didn't fix the bug properly. By mistake, it replaces
      the pointer of the next skb in the descriptor ring instead of the current
      one. As a result, two descriptors are assigned the same skb. The error
      is seen during an iperf test, when skb_put() tries to insert a second
      packet and exceeds the available buffer.
      
      Fixes: c7718ee9 ("net: lantiq: fix memory corruption in RX ring ")
      Signed-off-by: Aleksander Jan Bajkowski <olek2@wp.pl>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Kristian Evensen
      qmi_wwan: Do not call netif_rx from rx_fixup · 057d4933
      Kristian Evensen authored
      When QMI_WWAN_FLAG_PASS_THROUGH is set, netif_rx() is called from
      qmi_wwan_rx_fixup(). When the call to netif_rx() is successful (which is
      most of the time), usbnet_skb_return() is then called from rx_process(),
      and usbnet_skb_return() calls netif_rx() a second time for the same
      skb.
      
      Simplify the code and avoid the redundant netif_rx() call by changing
      qmi_wwan_rx_fixup() to always return 1 when QMI_WWAN_FLAG_PASS_THROUGH
      is set. We then leave it up to the existing infrastructure to call
      netif_rx().
      Suggested-by: Bjørn Mork <bjorn@mork.no>
      Signed-off-by: Kristian Evensen <kristian.evensen@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Maciej Żenczykowski
      net: cdc_ncm: switch to eth%d interface naming · c1a3d406
      Maciej Żenczykowski authored
      This is meant to make the host side cdc_ncm interface consistently
      named just like the older CDC protocols: cdc_ether & cdc_ecm
      (and even rndis_host), which all use 'FLAG_ETHER | FLAG_POINTTOPOINT'.
      
      include/linux/usb/usbnet.h:
        #define FLAG_ETHER	0x0020		/* maybe use "eth%d" names */
        #define FLAG_WLAN	0x0080		/* use "wlan%d" names */
        #define FLAG_WWAN	0x0400		/* use "wwan%d" names */
        #define FLAG_POINTTOPOINT 0x1000	/* possibly use "usb%d" names */
      
      drivers/net/usb/usbnet.c @ line 1711:
        strcpy (net->name, "usb%d");
        ...
        // heuristic:  "usb%d" for links we know are two-host,
        // else "eth%d" when there's reasonable doubt.  userspace
        // can rename the link if it knows better.
        if ((dev->driver_info->flags & FLAG_ETHER) != 0 &&
            ((dev->driver_info->flags & FLAG_POINTTOPOINT) == 0 ||
             (net->dev_addr [0] & 0x02) == 0))
                strcpy (net->name, "eth%d");
        /* WLAN devices should always be named "wlan%d" */
        if ((dev->driver_info->flags & FLAG_WLAN) != 0)
                strcpy(net->name, "wlan%d");
        /* WWAN devices should always be named "wwan%d" */
        if ((dev->driver_info->flags & FLAG_WWAN) != 0)
                strcpy(net->name, "wwan%d");
      
      So by using FLAG_ETHER | FLAG_POINTTOPOINT, the interface is named
      either usb%d or eth%d, depending on whether the device's MAC address
      is globally unique.
      
      Without this, 2.5 Gbps Ethernet dongles (which all seem to use the
      cdc_ncm driver) end up being called usb%d instead of eth%d, even though
      they're definitely not two-host. (All 1 Gbps and 5 Gbps Ethernet USB
      dongles I've tested don't hit this problem because they use different
      drivers, primarily r8152 and aqc111.)
      
      Fixes tag is based purely on git blame, and is really just here to make
      sure this hits LTS branches newer than v4.5.
      
      Cc: Lorenzo Colitti <lorenzo@google.com>
      Fixes: 4d06dd53 ("cdc_ncm: do not call usbnet_link_change from cdc_ncm_bind")
      Signed-off-by: Maciej Żenczykowski <maze@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Changbin Du
      net: inline function get_net_ns_by_fd if NET_NS is disabled · e34492de
      Changbin Du authored
      The function get_net_ns_by_fd() could be inlined when NET_NS is not
      enabled.
      Signed-off-by: Changbin Du <changbin.du@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Jakub Kicinski
      ptp: improve max_adj check against unreasonable values · 475b92f9
      Jakub Kicinski authored
      Converting scaled PPM to PPB may (on 64-bit systems) result
      in a value larger than an s32 can hold (freq/scaled_ppm is a long).
      This means the kernel will not correctly reject unreasonably
      high ->freq values (e.g. > 4294967295 ppb, which corresponds to
      281474976645 scaled PPM).

      The conversion is equivalent to a division by ~66 (65.536),
      so the ppb value is always smaller than the scaled PPM value, but not
      small enough to assume that narrowing the type from long to s32 is okay.

      Note that reasonable user space (e.g. ptp4l) will not use such
      high values anyway (4289046510 ppb ~= 4.3x), so the fix is
      somewhat pedantic.
      
      Fixes: d39a7435 ("ptp: validate the requested frequency adjustment.")
      Fixes: d94ba80e ("ptp: Added a brand new class driver for ptp clocks.")
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Acked-by: Richard Cochran <richardcochran@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  3. 14 Jun, 2021 16 commits
    • Subash Abhinov Kasiviswanathan
      net: mhi_net: Update the transmit handler prototype · 2214fb53
      Subash Abhinov Kasiviswanathan authored
      Update the function prototype of mhi_ndo_xmit to match
      ndo_start_xmit. Otherwise, the mismatch leads to runtime failures when
      CFI (Control Flow Integrity) is enabled in the kernel.
      
      Fixes: 3ffec6a1 ("net: Add mhi-net driver")
      Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Daniel Borkmann
      bpf, selftests: Adjust few selftest outcomes wrt unreachable code · 973377ff
      Daniel Borkmann authored
      In almost all cases from test_verifier that have been changed in here, we've
      had an unreachable path with a load from a register which has an invalid
      address on purpose. This was basically to make sure that we never walk this
      path and to have the verifier complain if it would otherwise. Change it to
      match on the right error for unprivileged given we now test these paths
      under speculative execution.
      
      There's one case where we match on the exact number of insns_processed.
      Due to the extra path, this will of course mismatch on unprivileged. Thus,
      restrict the test->insn_processed check to privileged-only.
      
      In one other case, we result in a 'pointer comparison prohibited' error. This
      is similarly due to verifying an 'invalid' branch where we end up with a value
      pointer on one side of the comparison.
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Reviewed-by: John Fastabend <john.fastabend@gmail.com>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
    • Daniel Borkmann
      bpf: Fix leakage under speculation on mispredicted branches · 9183671a
      Daniel Borkmann authored
      The verifier only enumerates valid control-flow paths and skips paths that
      are unreachable in the non-speculative domain. And so it can miss issues
      under speculative execution on mispredicted branches.
      
      For example, a type confusion has been demonstrated with the following
      crafted program:
      
        // r0 = pointer to a map array entry
        // r6 = pointer to readable stack slot
        // r9 = scalar controlled by attacker
        1: r0 = *(u64 *)(r0) // cache miss
        2: if r0 != 0x0 goto line 4
        3: r6 = r9
        4: if r0 != 0x1 goto line 6
        5: r9 = *(u8 *)(r6)
        6: // leak r9
      
      Since line 3 runs iff r0 == 0 and line 5 runs iff r0 == 1, the verifier
      concludes that the pointer dereference on line 5 is safe. But: if the
      attacker trains both the branches to fall-through, such that the following
      is speculatively executed ...
      
        r6 = r9
        r9 = *(u8 *)(r6)
        // leak r9
      
      ... then the program will dereference an attacker-controlled value and could
      leak its content under speculative execution via side-channel. This requires
      mistraining the branch predictor, which can be rather tricky, because the
      branches are mutually exclusive. However, such training can be done at
      congruent addresses in user space using different branches that are not
      mutually exclusive. That is, by training branches in user space ...
      
        A:  if r0 != 0x0 goto line C
        B:  ...
        C:  if r0 != 0x0 goto line D
        D:  ...
      
      ... such that addresses A and C collide to the same CPU branch prediction
      entries in the PHT (pattern history table) as those of the BPF program's
      lines 2 and 4, respectively. A non-privileged attacker could simply brute
      force such collisions in the PHT until observing the attack succeeding.
      
      Alternative methods to mistrain the branch predictor are also possible that
      avoid brute forcing the collisions in the PHT. A reliable attack has been
      demonstrated, for example, using the following crafted program:
      
        // r0 = pointer to a [control] map array entry
        // r7 = *(u64 *)(r0 + 0), training/attack phase
        // r8 = *(u64 *)(r0 + 8), oob address
        // [...]
        // r0 = pointer to a [data] map array entry
        1: if r7 == 0x3 goto line 3
        2: r8 = r0
        // crafted sequence of conditional jumps to separate the conditional
        // branch in line 193 from the current execution flow
        3: if r0 != 0x0 goto line 5
        4: if r0 == 0x0 goto exit
        5: if r0 != 0x0 goto line 7
        6: if r0 == 0x0 goto exit
        [...]
        187: if r0 != 0x0 goto line 189
        188: if r0 == 0x0 goto exit
        // load any slowly-loaded value (due to cache miss in phase 3) ...
        189: r3 = *(u64 *)(r0 + 0x1200)
        // ... and turn it into known zero for verifier, while preserving slowly-
        // loaded dependency when executing:
        190: r3 &= 1
        191: r3 &= 2
        // speculatively bypassed phase dependency
        192: r7 += r3
        193: if r7 == 0x3 goto exit
        194: r4 = *(u8 *)(r8 + 0)
        // leak r4
      
      As can be seen, in training phase (phase != 0x3), the condition in line 1
      turns into false and therefore r8 with the oob address is overridden with
      the valid map value address, which in line 194 we can read out without
      issues. However, in attack phase, line 2 is skipped, and due to the cache
      miss in line 189 where the map value is (zeroed and later) added to the
      phase register, the condition in line 193 takes the fall-through path due
      to prior branch predictor training, where under speculation, it'll load the
      byte at oob address r8 (unknown scalar type at that point) which could then
      be leaked via side-channel.
      
      One way to mitigate these is to 'branch off' an unreachable path, meaning,
      the current verification path keeps following the is_branch_taken() path
      and we push the other branch to the verification stack. Given this is
      unreachable from the non-speculative domain, this branch's vstate is
      explicitly marked as speculative. This is needed for two reasons: i) if
      this path is solely seen from speculative execution, then we later on still
      want the dead code elimination to kick in in order to sanitize these
      instructions with jmp-1s, and ii) to ensure that paths walked in the
      non-speculative domain are not pruned from earlier walks of paths walked in
      the speculative domain. Additionally, for robustness, we mark the registers
      which have been part of the conditional as unknown in the speculative path
      given there should be no assumptions made on their content.
      
      The fix in here mitigates the type confusion attacks described earlier due to
      i) all code paths in the BPF program being explored and ii) existing
      verifier logic already ensuring that a given memory access instruction
      references one specific data structure.
      
      An alternative to this fix that has also been looked at in this scope was to
      mark aux->alu_state at the jump instruction with a BPF_JMP_TAKEN state as
      well as direction encoding (always-goto, always-fallthrough, unknown), such
      that mixing of different always-* directions themselves as well as mixing of
      always-* with unknown directions would cause a program rejection by the
      verifier, e.g. programs with constructs like 'if ([...]) { x = 0; } else
      { x = 1; }' with subsequent 'if (x == 1) { [...] }'. For unprivileged, this
      would result in only single direction always-* taken paths, and unknown taken
      paths being allowed, such that the former could be patched from a conditional
      jump to an unconditional jump (ja). Compared to this approach here, it would
      have two downsides: i) valid programs that otherwise are not performing any
      pointer arithmetic, etc, would potentially be rejected/broken, and ii) we are
      required to turn off path pruning for unprivileged, where both can be avoided
      in this work through pushing the invalid branch to the verification stack.
      
      The issue was originally discovered by Adam and Ofek, and later independently
      discovered and reported as a result of Benedict and Piotr's research work.
      
      Fixes: b2157399 ("bpf: prevent out-of-bounds speculation")
      Reported-by: Adam Morrison <mad@cs.tau.ac.il>
      Reported-by: Ofek Kirzner <ofekkir@gmail.com>
      Reported-by: Benedict Schlueter <benedict.schlueter@rub.de>
      Reported-by: Piotr Krysiuk <piotras@gmail.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Reviewed-by: John Fastabend <john.fastabend@gmail.com>
      Reviewed-by: Benedict Schlueter <benedict.schlueter@rub.de>
      Reviewed-by: Piotr Krysiuk <piotras@gmail.com>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
    • Daniel Borkmann
      bpf: Do not mark insn as seen under speculative path verification · fe9a5ca7
      Daniel Borkmann authored
      ... in such circumstances, we do not want to mark the instruction as seen given
      the goal is still to jmp-1 rewrite/sanitize dead code, if it is not reachable
      from the non-speculative path verification. We do however want to verify it for
      safety regardless.
      
      With the patch as-is all the insns that have been marked as seen before the
      patch will also be marked as seen after the patch (just with a potentially
      different non-zero count). An upcoming patch will also verify paths that are
      unreachable in the non-speculative domain, hence this extension is needed.
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Reviewed-by: John Fastabend <john.fastabend@gmail.com>
      Reviewed-by: Benedict Schlueter <benedict.schlueter@rub.de>
      Reviewed-by: Piotr Krysiuk <piotras@gmail.com>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
    • Daniel Borkmann
      bpf: Inherit expanded/patched seen count from old aux data · d203b0fd
      Daniel Borkmann authored
      Instead of relying on current env->pass_cnt, use the seen count from the
      old aux data in adjust_insn_aux_data(), and expand it to the new range of
      patched instructions. This change is valid given we always expand 1:n
      with n>=1, so what applies to the old/original instruction needs to apply
      for the replacement as well.
      
      Not relying on env->pass_cnt is a prerequisite for a later change where we
      want to avoid marking an instruction seen when verified under speculative
      execution path.
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Reviewed-by: John Fastabend <john.fastabend@gmail.com>
      Reviewed-by: Benedict Schlueter <benedict.schlueter@rub.de>
      Reviewed-by: Piotr Krysiuk <piotras@gmail.com>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
    • David S. Miller
      Merge tag 'for-net-2021-06-14' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth · 45deacc7
      David S. Miller authored
      Luiz Augusto von Dentz says:
      
      ====================
      bluetooth pull request for net:
      
       - Fix crash on SMP when debug is enabled
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Luiz Augusto von Dentz
      Bluetooth: SMP: Fix crash when receiving new connection when debug is enabled · 995fca15
      Luiz Augusto von Dentz authored
      When receiving a new connection, pchan->conn won't be initialized, so the
      code cannot use bt_dev_dbg(), as the pointer to hci_dev won't be
      accessible.
      
      Fixes: 2e1614f7 ("Bluetooth: SMP: Convert BT_ERR/BT_DBG to bt_dev_err/bt_dev_dbg")
      Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
      Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
    • Pavel Skripkin
      net: qrtr: fix OOB Read in qrtr_endpoint_post · ad9d24c9
      Pavel Skripkin authored
      Syzbot reported a slab-out-of-bounds Read in
      qrtr_endpoint_post. The problem was the wrong
      type of _size_:
      
      	if (len != ALIGN(size, 4) + hdrlen)
      		goto err;
      
      If size from qrtr_hdr is 4294967293 (0xfffffffd), the result of
      ALIGN(size, 4) will be 0. If len == hdrlen and size == 4294967293
      in the header, this check won't fail and
      
      	skb_put_data(skb, data + hdrlen, size);
      
      will read out of bounds from data, which is only an hdrlen-sized allocated block.
      
      Fixes: 194ccc88 ("net: qrtr: Support decoding incoming v2 packets")
      Reported-and-tested-by: syzbot+1917d778024161609247@syzkaller.appspotmail.com
      Signed-off-by: Pavel Skripkin <paskripkin@gmail.com>
      Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • David Ahern
      ipv4: Fix device used for dst_alloc with local routes · b87b04f5
      David Ahern authored
      Oliver reported a use case where deleting a VRF device can hang
      waiting for the refcnt to drop to 0. The root cause is that the dst
      is allocated against the VRF device but cached on the loopback
      device.
      
      The use case (added to the selftests) has an implicit VRF crossing
      due to the ordering of the FIB rules (lookup local is before the
      l3mdev rule, but the problem occurs even if the FIB rules are
      re-ordered with local after l3mdev because the VRF table does not
      have a default route to terminate the lookup). The end result
      is that the FIB lookup returns the loopback device as the nexthop,
      but the ingress device is in a VRF. The mismatch causes the dst to be
      allocated against the VRF device but then cached on the loopback.
      
      The fix is to bring over the trick used for IPv6 (see ip6_rt_get_dev_rcu):
      pick the dst alloc device based on the FIB lookup result, but with checks
      that the result has a nexthop device (e.g., not an unreachable or
      prohibit entry).
      
      Fixes: f5a0aab8 ("net: ipv4: dst for local input routes should use l3mdev if relevant")
      Reported-by: Oliver Herms <oliver.peter.herms@gmail.com>
      Signed-off-by: David Ahern <dsahern@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Pavel Skripkin
      net: caif: fix memory leak in ldisc_open · 58af3d3d
      Pavel Skripkin authored
      Syzbot reported a memory leak in tty_init_dev().
      The problem was a tty reference that was never put in ldisc_open():
      
      static int ldisc_open(struct tty_struct *tty)
      {
      ...
      	ser->tty = tty_kref_get(tty);
      ...
      	result = register_netdevice(dev);
      	if (result) {
      		rtnl_unlock();
      		free_netdev(dev);
      		return -ENODEV;
      	}
      ...
      }
      
      The ser pointer is the netdev's private_data, so after free_netdev()
      this pointer goes away while the tty reference is still held. So, fix
      it by adding tty_kref_put() before freeing the netdev.
      
      Reported-and-tested-by: syzbot+f303e045423e617d2cad@syzkaller.appspotmail.com
      Signed-off-by: Pavel Skripkin <paskripkin@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Rahul Lakkireddy
      cxgb4: fix wrong ethtool n-tuple rule lookup · 09427c19
      Rahul Lakkireddy authored
      The TID returned during successful filter creation is relative to
      the region in which the filter is created. Using it directly always
      returns Hi Prio/Normal filter region's entry for the first couple of
      entries, even though the rule is actually inserted in Hash region.
      Fix by analyzing in which region the filter has been inserted and
      save the absolute TID to be used for lookup later.
      
      Fixes: db43b30c ("cxgb4: add ethtool n-tuple filter deletion")
      Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Christophe JAILLET
      netxen_nic: Fix an error handling path in 'netxen_nic_probe()' · 49a10c7b
      Christophe JAILLET authored
      If an error occurs after a 'pci_enable_pcie_error_reporting()' call, it
      must be undone by a corresponding 'pci_disable_pcie_error_reporting()'
      call, as already done in the remove function.
      
      Fixes: e87ad553 ("netxen: support pci error handlers")
      Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Christophe JAILLET
      qlcnic: Fix an error handling path in 'qlcnic_probe()' · cb337660
      Christophe JAILLET authored
      If an error occurs after a 'pci_enable_pcie_error_reporting()' call, it
      must be undone by a corresponding 'pci_disable_pcie_error_reporting()'
      call, as already done in the remove function.
      
      Fixes: 451724c8 ("qlcnic: aer support")
      Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Jakub Kicinski
      ethtool: strset: fix message length calculation · e175aef9
      Jakub Kicinski authored
      Outer nest for ETHTOOL_A_STRSET_STRINGSETS is not accounted for.
      This may result in ETHTOOL_MSG_STRSET_GET producing a warning like:
      
          calculated message payload length (684) not sufficient
          WARNING: CPU: 0 PID: 30967 at net/ethtool/netlink.c:369 ethnl_default_doit+0x87a/0xa20
      
      and a splat.
      
      As usual with such warnings, three conditions must be met for the warning
      to trigger:
       - there must be no skb size rounding up (e.g. reply_size of 684);
       - string set must be per-device (so that the header gets populated);
       - the device name must be at least 12 characters long.
      
      All in all, with current user space it looks like reading priv flags
      is the only place this could potentially happen. Or with syzbot :)
      
      Reported-by: syzbot+59aa77b92d06cd5a54f2@syzkaller.appspotmail.com
      Fixes: 71921690 ("ethtool: provide string sets with STRSET_GET request")
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Alex Elder
      net: qualcomm: rmnet: don't over-count statistics · 994c393b
      Alex Elder authored
      The purpose of the loop using u64_stats_fetch_*_irq() is to ensure
      statistics on a given CPU are collected atomically. If one of the
      statistics values gets updated within the begin/retry window, the
      loop will run again.
      
      Currently the statistics totals are updated inside that window.
      This means that if the loop ever retries, the statistics for the
      CPU will be counted more than once.
      
      Fix this by taking a snapshot of a CPU's statistics inside the
      protected window, and then updating the counters with the snapshot
      values after exiting the loop.
      
      (Also add a newline at the end of this file...)
      
      Fixes: 192c4b5d ("net: qualcomm: rmnet: Add support for 64 bit stats")
      Signed-off-by: Alex Elder <elder@linaro.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Tyson Moore
      sch_cake: revise docs for RFC 8622 LE PHB support · 4f667b8e
      Tyson Moore authored
      Commit b8392808 ("sch_cake: add RFC 8622 LE PHB support to CAKE
      diffserv handling") added the LE mark to the Bulk tin. Update the
      comments to reflect the change.
      Signed-off-by: Tyson Moore <tyson@tyson.me>
      Acked-by: Toke Høiland-Jørgensen <toke@toke.dk>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  4. 12 Jun, 2021 1 commit
    • Changbin Du
      net: make get_net_ns return error if NET_NS is disabled · ea6932d7
      Changbin Du authored
      There is a panic in the socket ioctl SIOCGSKNS when NET_NS is not enabled.
      The reason is that nsfs tries to access ns->ops, but proc_ns_operations
      is not implemented in this case.
      
      [7.670023] Unable to handle kernel NULL pointer dereference at virtual address 00000010
      [7.670268] pgd = 32b54000
      [7.670544] [00000010] *pgd=00000000
      [7.671861] Internal error: Oops: 5 [#1] SMP ARM
      [7.672315] Modules linked in:
      [7.672918] CPU: 0 PID: 1 Comm: systemd Not tainted 5.13.0-rc3-00375-g6799d4f2 #16
      [7.673309] Hardware name: Generic DT based system
      [7.673642] PC is at nsfs_evict+0x24/0x30
      [7.674486] LR is at clear_inode+0x20/0x9c
      
      The same applies to the tun SIOCGSKNS command.

      To fix this problem, we make get_net_ns() return -EINVAL when NET_NS is
      disabled. Meanwhile, move it to the right place, net/core/net_namespace.c.
      Signed-off-by: Changbin Du <changbin.du@gmail.com>
      Fixes: c62cce2c ("net: add an ioctl to get a socket network namespace")
      Cc: Cong Wang <xiyou.wangcong@gmail.com>
      Cc: Jakub Kicinski <kuba@kernel.org>
      Cc: David Laight <David.Laight@ACULAB.COM>
      Cc: Christian Brauner <christian.brauner@ubuntu.com>
      Suggested-by: Jakub Kicinski <kuba@kernel.org>
      Acked-by: Christian Brauner <christian.brauner@ubuntu.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  5. 11 Jun, 2021 7 commits
  6. 10 Jun, 2021 8 commits
    • David S. Miller
      Merge branch 'mptcp-fixes' · 232e3683
      David S. Miller authored
      Mat Martineau says:
      
      ====================
      mptcp: More v5.13 fixes
      
      Here's another batch of MPTCP fixes for v5.13.
      
      Patch 1 cleans up memory accounting between the MPTCP-level socket and
      the subflows to more reliably transfer forward allocated memory under
      pressure.
      
      Patch 2 wakes up socket readers more reliably.
      
      Patch 3 changes a WARN_ONCE to a pr_debug.
      
      Patch 4 changes the selftests to only use syncookies in test cases where
      they do not cause spurious failures.
      
      Patch 5 modifies socket error reporting to avoid a possible soft lockup.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Paolo Abeni
      mptcp: fix soft lookup in subflow_error_report() · 499ada50
      Paolo Abeni authored
      Maxim reported a soft lockup in subflow_error_report():
      
       watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [swapper/0:0]
       RIP: 0010:native_queued_spin_lock_slowpath
       RSP: 0018:ffffa859c0003bc0 EFLAGS: 00000202
       RAX: 0000000000000101 RBX: 0000000000000001 RCX: 0000000000000000
       RDX: ffff9195c2772d88 RSI: 0000000000000000 RDI: ffff9195c2772d88
       RBP: ffff9195c2772d00 R08: 00000000000067b0 R09: c6e31da9eb1e44f4
       R10: ffff9195ef379700 R11: ffff9195edb50710 R12: ffff9195c2772d88
       R13: ffff9195f500e3d0 R14: ffff9195ef379700 R15: ffff9195ef379700
       FS:  0000000000000000(0000) GS:ffff91961f400000(0000) knlGS:0000000000000000
       CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
       CR2: 000000c000407000 CR3: 0000000002988000 CR4: 00000000000006f0
       Call Trace:
        <IRQ>
       _raw_spin_lock_bh
       subflow_error_report
       mptcp_subflow_data_available
       __mptcp_move_skbs_from_subflow
       mptcp_data_ready
       tcp_data_queue
       tcp_rcv_established
       tcp_v4_do_rcv
       tcp_v4_rcv
       ip_protocol_deliver_rcu
       ip_local_deliver_finish
       __netif_receive_skb_one_core
       netif_receive_skb
       rtl8139_poll 8139too
       __napi_poll
       net_rx_action
       __do_softirq
       __irq_exit_rcu
       common_interrupt
        </IRQ>
      
      The calling function - mptcp_subflow_data_available() - can be invoked
      from different contexts:
      - plain ssk socket lock
      - ssk socket lock + mptcp_data_lock
      - ssk socket lock + mptcp_data_lock + msk socket lock.
      
      Since subflow_error_report() tries to acquire the mptcp_data_lock, the
      latter two call chains will cause a soft lockup.

      This change addresses the issue by moving the error reporting call to
      outer functions, where the list of held locks is known and we can
      acquire only the needed one.
      Reported-by: Maxim Galaganov <max@internet.ru>
      Fixes: 15cc1045 ("mptcp: deliver ssk errors to msk")
      Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/199
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Paolo Abeni
      selftests: mptcp: enable syncookie only in absence of reorders · 2395da0e
      Paolo Abeni authored
      Syncookie validation may fail for out-of-order (OoO) packets, causing
      spurious resets and self-test failures, so let's force syncookies only
      for test iterations with no OoO packets.
      
      Fixes: fed61c4b ("selftests: mptcp: make 2nd net namespace use tcp syn cookies unconditionally")
      Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/198
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      2395da0e
    • Paolo Abeni
      mptcp: do not warn on bad input from the network · 61e71022
      Paolo Abeni authored
      warn_bad_map() produces a kernel WARN on bad input coming
      from the network. Use pr_debug() to avoid spamming the system
      log.
      
      Additionally, when the right bound check fails, warn_bad_map() reports
      the wrong ssn value; let's fix it.
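
A userspace sketch of the kind of mapping validation involved. The names (`struct model_map`, `model_before()`, `model_validate_mapping()`, `model_pr_debug()`) are hypothetical; `model_before()` mirrors the modulo-2^32 comparison performed by the kernel's TCP `before()` helper, so the checks stay correct across sequence-number wraparound. A failed check is logged at debug level instead of producing a WARN, and both bounds log the same `ssn` that was actually tested.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for pr_debug(): silent unless enabled. */
static bool model_debug = false;
#define model_pr_debug(...) \
	do { if (model_debug) fprintf(stderr, __VA_ARGS__); } while (0)

struct model_map {
	uint32_t subflow_seq;	/* first subflow seq covered by the mapping */
	uint32_t data_len;	/* number of bytes the mapping covers */
};

/* True if seq1 comes before seq2, modulo 2^32. */
static bool model_before(uint32_t seq1, uint32_t seq2)
{
	return (int32_t)(seq1 - seq2) < 0;
}

/* Accept ssn only if it lies in [subflow_seq, subflow_seq + data_len). */
static bool model_validate_mapping(const struct model_map *map, uint32_t ssn)
{
	uint32_t end = map->subflow_seq + map->data_len;	/* may wrap */

	if (model_before(ssn, map->subflow_seq)) {
		model_pr_debug("bad map: ssn=%" PRIu32 " below start %" PRIu32 "\n",
			       ssn, map->subflow_seq);
		return false;
	}
	if (!model_before(ssn, end)) {
		/* Log the ssn that failed the check, not a derived value. */
		model_pr_debug("bad map: ssn=%" PRIu32 " past end %" PRIu32 "\n",
			       ssn, end);
		return false;
	}
	return true;
}
```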
      
      Fixes: 648ef4b8 ("mptcp: Implement MPTCP receive path")
      Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/107
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      61e71022
    • Paolo Abeni
      mptcp: wake-up readers only for in sequence data · 99d1055c
      Paolo Abeni authored
      Currently we rely on the subflow->data_avail field, which is subject to
      races:
      
      	ssk1
      		skb len = 500 DSS(seq=1, len=1000, off=0)
      		# data_avail == MPTCP_SUBFLOW_DATA_AVAIL
      
      	ssk2
      		skb len = 500 DSS(seq=501, len=1000)
      		# data_avail == MPTCP_SUBFLOW_DATA_AVAIL
      
      	ssk1
      		skb len = 500 DSS(seq=1, len=1000, off=500)
      		# still data_avail == MPTCP_SUBFLOW_DATA_AVAIL,
      		# as the skb is covered by a pre-existing map,
      		# which was in-sequence at reception time.
      
      Instead, we can explicitly check whether some data has been received
      in sequence, propagating the info from __mptcp_move_skbs_from_subflow().
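
A simplified model of that idea, with the receive path reduced to byte counters. All names are hypothetical: `rx_move_skbs()` plays the role of `__mptcp_move_skbs_from_subflow()` and reports whether any bytes were appended in sequence at the MPTCP level, and `rx_data_ready()` wakes readers only in that case rather than trusting a per-subflow `data_avail` flag. Out-of-order data is simply rejected here; real MPTCP would queue it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct rx_msk {
	uint64_t ack_seq;	/* next expected MPTCP-level sequence number */
	uint64_t queued;	/* bytes available to readers */
	int wakeups;		/* reader wake-ups issued */
};

/* Append [seq, seq + len) if it extends the in-sequence data.
 * Returns true only when new in-sequence bytes were moved. */
static bool rx_move_skbs(struct rx_msk *msk, uint64_t seq, uint64_t len)
{
	if (seq > msk->ack_seq || seq + len <= msk->ack_seq)
		return false;	/* gap before it, or entirely old data */

	uint64_t fresh = seq + len - msk->ack_seq;	/* skip any overlap */
	msk->ack_seq += fresh;
	msk->queued += fresh;
	return true;
}

static void rx_data_ready(struct rx_msk *msk, uint64_t seq, uint64_t len)
{
	if (rx_move_skbs(msk, seq, len))
		msk->wakeups++;	/* wake readers only for in-sequence data */
}
```

For example, starting from `ack_seq = 1`, a chunk at seq 1 wakes readers, a chunk at seq 1001 (leaving a gap) does not, and the chunk at seq 501 that fills the gap does.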
      
      Additionally, add the 'ONCE' annotation to the 'data_avail' memory
      accesses, as the msk will read the field outside the subflow socket lock.
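
In the kernel, the 'ONCE' annotations are `READ_ONCE()` and `WRITE_ONCE()`. The macros below are a simplified, hypothetical userspace stand-in, not the kernel's definitions: for a scalar field, a volatile access stops the compiler from caching, repeating, or merging the load or store. They rely on the GCC/Clang `__typeof__` extension. The writer updates the field under its own (assumed) subflow lock, while a reader that does not hold that lock reads it exactly once.

```c
#include <assert.h>

/* Simplified stand-ins for READ_ONCE()/WRITE_ONCE(); not the kernel's code. */
#define MODEL_READ_ONCE(x)	(*(const volatile __typeof__(x) *)&(x))
#define MODEL_WRITE_ONCE(x, val) \
	do { *(volatile __typeof__(x) *)&(x) = (val); } while (0)

enum model_data_avail { MODEL_NODATA, MODEL_DATA_AVAIL };

struct model_subflow {
	enum model_data_avail data_avail;	/* written under the subflow lock */
};

/* Writer side: assumed to run with the subflow lock held. */
static void model_set_avail(struct model_subflow *sf, enum model_data_avail v)
{
	MODEL_WRITE_ONCE(sf->data_avail, v);
}

/* Reader side: the msk path, which does not hold the subflow lock. */
static int model_has_data(struct model_subflow *sf)
{
	return MODEL_READ_ONCE(sf->data_avail) == MODEL_DATA_AVAIL;
}
```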
      
      Fixes: 648ef4b8 ("mptcp: Implement MPTCP receive path")
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      99d1055c
    • Paolo Abeni
      mptcp: try harder to borrow memory from subflow under pressure · 72f96132
      Paolo Abeni authored
      If the host is under severe memory pressure and RX forward
      memory allocation for the msk fails, we try to borrow the
      required memory from the ingress subflow.
      
      The current attempt is a bit flaky: if skb->truesize is less
      than SK_MEM_QUANTUM, the ssk will not release any memory, and
      the next scheduling attempt will fail again.
      
      Instead, directly move the required number of pages from the
      ssk to the msk, if available.
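
The accounting change can be sketched with plain integers. In this hypothetical model, `MODEL_MEM_QUANTUM` stands in for `SK_MEM_QUANTUM` (assumed here to be one 4 KiB page), each socket's forward-allocated memory is a single counter, and the msk's own allocation is reduced to "its counter already covers the charge". When it does not, the msk takes the needed amount, rounded up to whole quanta, directly from the subflow's reserve if the subflow has that much.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_MEM_QUANTUM 4096	/* assumption: one 4 KiB page */

struct model_sock {
	int forward_alloc;	/* bytes reserved but not yet charged */
};

/* Round size up to a whole number of quanta. */
static int model_mem_quanta(int size)
{
	return (size + MODEL_MEM_QUANTUM - 1) / MODEL_MEM_QUANTUM;
}

/* Charge truesize bytes to the msk, borrowing whole quanta from the
 * ingress subflow when the msk's own reserve is too small. */
static bool model_rmem_schedule(struct model_sock *msk, struct model_sock *ssk,
				int truesize)
{
	if (msk->forward_alloc < truesize) {
		int amount = model_mem_quanta(truesize) * MODEL_MEM_QUANTUM;

		if (ssk->forward_alloc < amount)
			return false;	/* subflow cannot cover it: drop */
		ssk->forward_alloc -= amount;
		msk->forward_alloc += amount;
	}
	msk->forward_alloc -= truesize;
	return true;
}
```

Because the transfer is rounded up to whole quanta, a small skb (for instance 300 bytes) still moves a full quantum to the msk, which is what keeps the next schedule from failing in the same way.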
      
      Fixes: 9c3f94e1 ("mptcp: add missing memory scheduling in the rx path")
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      72f96132
    • David S. Miller
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf · 22488e45
      David S. Miller authored
      Pablo Neira Ayuso says:
      
      ====================
      Netfilter fixes for net
      
      The following patchset contains Netfilter fixes for net:
      
      1) Fix a crash when a stateful expression with its own gc callback
         is used in a set definition.
      
      2) Skip IPv6 packets from any link-local address in the IPv6 fib
         expression. Add a selftest for this scenario, from Florian Westphal.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      22488e45
    • David S. Miller
      Merge branch 'tcp-options-oob-fixes' · 0280f429
      David S. Miller authored
      Maxim Mikityanskiy says:
      
      ====================
      Fix out of bounds when parsing TCP options
      
      This series fixes out-of-bounds accesses in various places in the kernel
      where TCP options are parsed. Fortunately, many more occurrences don't
      have this bug.
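
A minimal sketch of bounds-safe TCP option parsing, the general technique these fixes apply. The function `model_find_tcp_option()` and its constants are hypothetical; only the option layout (EOL = 0, NOP = 1, otherwise kind, length, data) comes from the TCP specification. The key points are that the length byte is read only if it exists, and that a length below 2 or running past the options area stops parsing instead of advancing into out-of-bounds memory.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_TCPOPT_EOL 0	/* end of option list */
#define MODEL_TCPOPT_NOP 1	/* one-byte padding */

/* Return the offset of option `kind` with total size `size` inside
 * opts[0..len), or -1. Every byte is bounds-checked before it is read. */
static int model_find_tcp_option(const uint8_t *opts, int len,
				 uint8_t kind, uint8_t size)
{
	int i = 0;

	while (i < len) {
		uint8_t op = opts[i];

		if (op == MODEL_TCPOPT_EOL)
			break;
		if (op == MODEL_TCPOPT_NOP) {
			i++;
			continue;
		}
		if (len - i < 2)
			break;		/* no room for the length byte */

		uint8_t opsize = opts[i + 1];
		if (opsize < 2 || opsize > len - i)
			break;		/* malformed, or runs past the options */
		if (op == kind && opsize == size)
			return i;
		i += opsize;
	}
	return -1;
}
```

The `opsize < 2` check matters as much as the upper bound: a zero length would otherwise loop forever, and a length of 1 would make the next iteration read the option's own length byte as a kind.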
      
      v2 changes:
      
      synproxy: Added an early return when length < 0 to avoid calling
      skb_header_pointer() with a negative length.
      
      sch_cake: Added doff validation to avoid parsing garbage.
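
Both v2 changes come down to validating the header's data offset before deriving an options length from it. A minimal sketch, with the hypothetical helper `model_tcp_options_len()`: `doff` is the top four bits of byte 12 of the TCP header and counts 32-bit words, so any value below 5 yields a negative options length, and a value pointing past the available bytes would make the parser read garbage.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_TCP_HDR_MIN 20	/* fixed TCP header size in bytes */

/* Return the length of the TCP options area, or -1 if the header is
 * unusable. `hdr` points at a TCP header with `avail` bytes present. */
static int model_tcp_options_len(const uint8_t *hdr, size_t avail)
{
	if (avail < MODEL_TCP_HDR_MIN)
		return -1;

	unsigned int doff = hdr[12] >> 4;	/* data offset, 32-bit words */
	int length = (int)(doff * 4) - MODEL_TCP_HDR_MIN;

	if (length < 0)
		return -1;	/* doff < 5: return before using a negative length */
	if (doff * 4 > avail)
		return -1;	/* options would run past the available bytes */
	return length;
}
```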
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      0280f429