An error occurred fetching the project authors.
  1. 28 Jun, 2019 1 commit
    • John Hurley's avatar
      net: sched: refactor reinsert action · 720f22fe
      John Hurley authored
      The TC_ACT_REINSERT return type was added as an in-kernel only option to
      allow a packet ingress or egress redirect. This is used to avoid
      unnecessary skb clones in situations where they are not required. If a TC
      hook returns this code then the packet is 'reinserted' and no skb consume
      is carried out as no clone took place.
      
      This return type is only used in act_mirred. Rather than have the reinsert
      called from the main datapath, call it directly in act_mirred. Instead of
      returning TC_ACT_REINSERT, change the type to the new TC_ACT_CONSUMED
      which tells the caller that the packet has been stolen by another process
      and that no consume call is required.
      
      Moving all redirect calls to the act_mirred code is in preparation for
      tracking recursion created by act_mirred.
      Signed-off-by: default avatarJohn Hurley <john.hurley@netronome.com>
      Reviewed-by: default avatarSimon Horman <simon.horman@netronome.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      720f22fe
  2. 16 Jun, 2019 1 commit
  3. 05 Jun, 2019 1 commit
  4. 30 May, 2019 3 commits
    • Eric Dumazet's avatar
      net-gro: fix use-after-free read in napi_gro_frags() · a4270d67
      Eric Dumazet authored
      If a network driver provides to napi_gro_frags() an
      skb with a page fragment of exactly 14 bytes, the call
      to gro_pull_from_frag0() will 'consume' the fragment
      by calling skb_frag_unref(skb, 0), and the page might
      be freed and reused.
      
      Reading eth->h_proto at the end of napi_frags_skb() might
      read mangled data, or crash under specific debugging features.
      
      BUG: KASAN: use-after-free in napi_frags_skb net/core/dev.c:5833 [inline]
      BUG: KASAN: use-after-free in napi_gro_frags+0xc6f/0xd10 net/core/dev.c:5841
      Read of size 2 at addr ffff88809366840c by task syz-executor599/8957
      
      CPU: 1 PID: 8957 Comm: syz-executor599 Not tainted 5.2.0-rc1+ #32
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       __dump_stack lib/dump_stack.c:77 [inline]
       dump_stack+0x172/0x1f0 lib/dump_stack.c:113
       print_address_description.cold+0x7c/0x20d mm/kasan/report.c:188
       __kasan_report.cold+0x1b/0x40 mm/kasan/report.c:317
       kasan_report+0x12/0x20 mm/kasan/common.c:614
       __asan_report_load_n_noabort+0xf/0x20 mm/kasan/generic_report.c:142
       napi_frags_skb net/core/dev.c:5833 [inline]
       napi_gro_frags+0xc6f/0xd10 net/core/dev.c:5841
       tun_get_user+0x2f3c/0x3ff0 drivers/net/tun.c:1991
       tun_chr_write_iter+0xbd/0x156 drivers/net/tun.c:2037
       call_write_iter include/linux/fs.h:1872 [inline]
       do_iter_readv_writev+0x5f8/0x8f0 fs/read_write.c:693
       do_iter_write fs/read_write.c:970 [inline]
       do_iter_write+0x184/0x610 fs/read_write.c:951
       vfs_writev+0x1b3/0x2f0 fs/read_write.c:1015
       do_writev+0x15b/0x330 fs/read_write.c:1058
      
      Fixes: a50e233c ("net-gro: restore frag0 optimization")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Reported-by: default avatarsyzbot <syzkaller@googlegroups.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      a4270d67
    • Thomas Gleixner's avatar
      treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152 · 2874c5fd
      Thomas Gleixner authored
      Based on 1 normalized pattern(s):
      
        this program is free software you can redistribute it and or modify
        it under the terms of the gnu general public license as published by
        the free software foundation either version 2 of the license or at
        your option any later version
      
      extracted by the scancode license scanner the SPDX license identifier
      
        GPL-2.0-or-later
      
      has been chosen to replace the boilerplate/reference in 3029 file(s).
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Reviewed-by: default avatarAllison Randal <allison@lohutok.net>
      Cc: linux-spdx@vger.kernel.org
      Link: https://lkml.kernel.org/r/20190527070032.746973796@linutronix.deSigned-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      2874c5fd
    • Stephen Hemminger's avatar
      net: core: support XDP generic on stacked devices. · 458bf2f2
      Stephen Hemminger authored
      When a device is stacked like (team, bonding, failsafe or netvsc) the
      XDP generic program for the parent device was not called.
      
      Move the call to XDP generic inside __netif_receive_skb_core where
      it can be done multiple times for stacked case.
      
      Fixes: d4455169 ("net: xdp: support xdp generic on virtual devices")
      Signed-off-by: default avatarStephen Hemminger <sthemmin@microsoft.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      458bf2f2
  5. 16 May, 2019 1 commit
  6. 05 May, 2019 1 commit
  7. 11 Apr, 2019 1 commit
    • Si-Wei Liu's avatar
      failover: allow name change on IFF_UP slave interfaces · 8065a779
      Si-Wei Liu authored
      When a netdev appears through hot plug then gets enslaved by a failover
      master that is already up and running, the slave will be opened
      right away after getting enslaved. Today there's a race that userspace
      (udev) may fail to rename the slave if the kernel (net_failover)
      opens the slave earlier than when the userspace rename happens.
      Unlike bond or team, the primary slave of failover can't be renamed by
      userspace ahead of time, since the kernel initiated auto-enslavement is
      unable to, or rather, is never meant to be synchronized with the rename
      request from userspace.
      
      As the failover slave interfaces are not designed to be operated
      directly by userspace apps: IP configuration, filter rules with
      regard to network traffic passing and etc., should all be done on master
      interface. In general, userspace apps only care about the
      name of master interface, while slave names are less important as long
      as admin users can see reliable names that may carry
      other information describing the netdev. For e.g., they can infer that
      "ens3nsby" is a standby slave of "ens3", while for a
      name like "eth0" they can't tell which master it belongs to.
      
      Historically the name of IFF_UP interface can't be changed because
      there might be admin script or management software that is already
      relying on such behavior and assumes that the slave name can't be
      changed once UP. But failover is special: with the in-kernel
      auto-enslavement mechanism, the userspace expectation for device
      enumeration and bring-up order is already broken. Previously initramfs
      and various userspace config tools were modified to bypass failover
      slaves because of auto-enslavement and duplicate MAC address. Similarly,
      in case that users care about seeing reliable slave name, the new type
      of failover slaves needs to be taken care of specifically in userspace
      anyway.
      
      It's less risky to lift up the rename restriction on failover slave
      which is already UP. Although it's possible this change may potentially
      break userspace component (most likely configuration scripts or
      management software) that assumes slave name can't be changed while
      UP, it's relatively a limited and controllable set among all userspace
      components, which can be fixed specifically to listen for the rename
      events on failover slaves. Userspace component interacting with slaves
      is expected to be changed to operate on failover master interface
      instead, as the failover slave is dynamic in nature which may come and
      go at any point.  The goal is to make the role of failover slaves less
      relevant, and userspace components should only deal with failover master
      in the long run.
      
      Fixes: 30c8bd5a ("net: Introduce generic failover module")
      Signed-off-by: default avatarSi-Wei Liu <si-wei.liu@oracle.com>
      Reviewed-by: default avatarLiran Alon <liran.alon@oracle.com>
      Acked-by: default avatarSridhar Samudrala <sridhar.samudrala@intel.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      8065a779
  8. 05 Apr, 2019 2 commits
  9. 02 Apr, 2019 1 commit
  10. 29 Mar, 2019 1 commit
    • Alexander Lobakin's avatar
      net: core: netif_receive_skb_list: unlist skb before passing to pt->func · 9a5a90d1
      Alexander Lobakin authored
      __netif_receive_skb_list_ptype() leaves skb->next poisoned before passing
      it to pt_prev->func handler, what may produce (in certain cases, e.g. DSA
      setup) crashes like:
      
      [ 88.606777] CPU 0 Unable to handle kernel paging request at virtual address 0000000e, epc == 80687078, ra == 8052cc7c
      [ 88.618666] Oops[#1]:
      [ 88.621196] CPU: 0 PID: 0 Comm: swapper Not tainted 5.1.0-rc2-dlink-00206-g4192a172-dirty #1473
      [ 88.630885] $ 0 : 00000000 10000400 00000002 864d7850
      [ 88.636709] $ 4 : 87c0ddf0 864d7800 87c0ddf0 00000000
      [ 88.642526] $ 8 : 00000000 49600000 00000001 00000001
      [ 88.648342] $12 : 00000000 c288617b dadbee27 25d17c41
      [ 88.654159] $16 : 87c0ddf0 85cff080 80790000 fffffffd
      [ 88.659975] $20 : 80797b20 ffffffff 00000001 864d7800
      [ 88.665793] $24 : 00000000 8011e658
      [ 88.671609] $28 : 80790000 87c0dbc0 87cabf00 8052cc7c
      [ 88.677427] Hi : 00000003
      [ 88.680622] Lo : 7b5b4220
      [ 88.683840] epc : 80687078 vlan_dev_hard_start_xmit+0x1c/0x1a0
      [ 88.690532] ra : 8052cc7c dev_hard_start_xmit+0xac/0x188
      [ 88.696734] Status: 10000404	IEp
      [ 88.700422] Cause : 50000008 (ExcCode 02)
      [ 88.704874] BadVA : 0000000e
      [ 88.708069] PrId : 0001a120 (MIPS interAptiv (multi))
      [ 88.713005] Modules linked in:
      [ 88.716407] Process swapper (pid: 0, threadinfo=(ptrval), task=(ptrval), tls=00000000)
      [ 88.725219] Stack : 85f61c28 00000000 0000000e 80780000 87c0ddf0 85cff080 80790000 8052cc7c
      [ 88.734529] 87cabf00 00000000 00000001 85f5fb40 807b0000 864d7850 87cabf00 807d0000
      [ 88.743839] 864d7800 8655f600 00000000 85cff080 87c1c000 0000006a 00000000 8052d96c
      [ 88.753149] 807a0000 8057adb8 87c0dcc8 87c0dc50 85cfff08 00000558 87cabf00 85f58c50
      [ 88.762460] 00000002 85f58c00 864d7800 80543308 fffffff4 00000001 85f58c00 864d7800
      [ 88.771770] ...
      [ 88.774483] Call Trace:
      [ 88.777199] [<80687078>] vlan_dev_hard_start_xmit+0x1c/0x1a0
      [ 88.783504] [<8052cc7c>] dev_hard_start_xmit+0xac/0x188
      [ 88.789326] [<8052d96c>] __dev_queue_xmit+0x6e8/0x7d4
      [ 88.794955] [<805a8640>] ip_finish_output2+0x238/0x4d0
      [ 88.800677] [<805ab6a0>] ip_output+0xc8/0x140
      [ 88.805526] [<805a68f4>] ip_forward+0x364/0x560
      [ 88.810567] [<805a4ff8>] ip_rcv+0x48/0xe4
      [ 88.815030] [<80528d44>] __netif_receive_skb_one_core+0x44/0x58
      [ 88.821635] [<8067f220>] dsa_switch_rcv+0x108/0x1ac
      [ 88.827067] [<80528f80>] __netif_receive_skb_list_core+0x228/0x26c
      [ 88.833951] [<8052ed84>] netif_receive_skb_list+0x1d4/0x394
      [ 88.840160] [<80355a88>] lunar_rx_poll+0x38c/0x828
      [ 88.845496] [<8052fa78>] net_rx_action+0x14c/0x3cc
      [ 88.850835] [<806ad300>] __do_softirq+0x178/0x338
      [ 88.856077] [<8012a2d4>] irq_exit+0xbc/0x100
      [ 88.860846] [<802f8b70>] plat_irq_dispatch+0xc0/0x144
      [ 88.866477] [<80105974>] handle_int+0x14c/0x158
      [ 88.871516] [<806acfb0>] r4k_wait+0x30/0x40
      [ 88.876462] Code: afb10014 8c8200a0 00803025 <9443000c> 94a20468 00000000 10620042 00a08025 9605046a
      [ 88.887332]
      [ 88.888982] ---[ end trace eb863d007da11cf1 ]---
      [ 88.894122] Kernel panic - not syncing: Fatal exception in interrupt
      [ 88.901202] ---[ end Kernel panic - not syncing: Fatal exception in interrupt ]---
      
      Fix this by pulling skb off the sublist and zeroing skb->next pointer
      before calling ptype callback.
      
      Fixes: 88eb1944 ("net: core: propagate SKB lists through packet_type lookup")
      Reviewed-by: default avatarEdward Cree <ecree@solarflare.com>
      Signed-off-by: default avatarAlexander Lobakin <alobakin@dlink.ru>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      9a5a90d1
  11. 28 Mar, 2019 1 commit
  12. 24 Mar, 2019 2 commits
  13. 20 Mar, 2019 3 commits
    • Paolo Abeni's avatar
      net: remove 'fallback' argument from dev->ndo_select_queue() · a350ecce
      Paolo Abeni authored
      After the previous patch, all the callers of ndo_select_queue()
      provide as a 'fallback' argument netdev_pick_tx.
      The only exceptions are nested calls to ndo_select_queue(),
      which pass down the 'fallback' available in the current scope
      - still netdev_pick_tx.
      
      We can drop such argument and replace fallback() invocation with
      netdev_pick_tx(). This avoids an indirect call per xmit packet
      in some scenarios (TCP syn, UDP unconnected, XDP generic, pktgen)
      with device drivers implementing such ndo. It also clean the code
      a bit.
      
      Tested with ixgbe and CONFIG_FCOE=m
      
      With pktgen using queue xmit:
      threads		vanilla 	patched
      		(kpps)		(kpps)
      1		2334		2428
      2		4166		4278
      4		7895		8100
      
       v1 -> v2:
       - rebased after helper's name change
      Signed-off-by: default avatarPaolo Abeni <pabeni@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      a350ecce
    • Paolo Abeni's avatar
      packet: rework packet_pick_tx_queue() to use common code selection · b71b5837
      Paolo Abeni authored
      Currently packet_pick_tx_queue() is the only caller of
      ndo_select_queue() using a fallback argument other than
      netdev_pick_tx.
      
      Leveraging rx queue, we can obtain a similar queue selection
      behavior using core helpers. After this change, ndo_select_queue()
      is always invoked with netdev_pick_tx() as fallback.
      We can change ndo_select_queue() signature in a followup patch,
      dropping an indirect call per transmitted packet in some scenarios
      (e.g. TCP syn and XDP generic xmit)
      
      This changes slightly how af packet queue selection happens when
      PACKET_QDISC_BYPASS is set. It's now more similar to plan dev_queue_xmit()
      tacking in account both XPS and TC mapping.
      
       v1  -> v2:
        - rebased after helper name change
       RFC -> v1:
        - initialize sender_cpu to the expected value
      Signed-off-by: default avatarPaolo Abeni <pabeni@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      b71b5837
    • Paolo Abeni's avatar
      net: dev: rename queue selection helpers. · 4bd97d51
      Paolo Abeni authored
      With the following patches, we are going to use __netdev_pick_tx() in
      many modules. Rename it to netdev_pick_tx(), to make it clear is
      a public API.
      
      Also rename the existing netdev_pick_tx() to netdev_core_pick_tx(),
      to avoid name clashes.
      Suggested-by: default avatarEric Dumazet <edumazet@google.com>
      Suggested-by: default avatarDavid Miller <davem@davemloft.net>
      Signed-off-by: default avatarPaolo Abeni <pabeni@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      4bd97d51
  14. 24 Feb, 2019 2 commits
  15. 16 Feb, 2019 1 commit
    • Hauke Mehrtens's avatar
      net: Fix for_each_netdev_feature on Big endian · 3b89ea9c
      Hauke Mehrtens authored
      The features attribute is of type u64 and stored in the native endianes on
      the system. The for_each_set_bit() macro takes a pointer to a 32 bit array
      and goes over the bits in this area. On little Endian systems this also
      works with an u64 as the most significant bit is on the highest address,
      but on big endian the words are swapped. When we expect bit 15 here we get
      bit 47 (15 + 32).
      
      This patch converts it more or less to its own for_each_set_bit()
      implementation which works on 64 bit integers directly. This is then
      completely in host endianness and should work like expected.
      
      Fixes: fd867d51 ("net/core: generic support for disabling netdev features down stack")
      Signed-off-by: default avatarHauke Mehrtens <hauke.mehrtens@intel.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      3b89ea9c
  16. 06 Feb, 2019 2 commits
  17. 01 Feb, 2019 1 commit
  18. 29 Jan, 2019 1 commit
    • Josh Elsasser's avatar
      net: set default network namespace in init_dummy_netdev() · 35edfdc7
      Josh Elsasser authored
      Assign a default net namespace to netdevs created by init_dummy_netdev().
      Fixes a NULL pointer dereference caused by busy-polling a socket bound to
      an iwlwifi wireless device, which bumps the per-net BUSYPOLLRXPACKETS stat
      if napi_poll() received packets:
      
        BUG: unable to handle kernel NULL pointer dereference at 0000000000000190
        IP: napi_busy_loop+0xd6/0x200
        Call Trace:
          sock_poll+0x5e/0x80
          do_sys_poll+0x324/0x5a0
          SyS_poll+0x6c/0xf0
          do_syscall_64+0x6b/0x1f0
          entry_SYSCALL_64_after_hwframe+0x3d/0xa2
      
      Fixes: 7db6b048 ("net: Commonize busy polling code to focus on napi_id instead of socket")
      Signed-off-by: default avatarJosh Elsasser <jelsasser@appneta.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      35edfdc7
  19. 06 Jan, 2019 1 commit
    • Masahiro Yamada's avatar
      jump_label: move 'asm goto' support test to Kconfig · e9666d10
      Masahiro Yamada authored
      Currently, CONFIG_JUMP_LABEL just means "I _want_ to use jump label".
      
      The jump label is controlled by HAVE_JUMP_LABEL, which is defined
      like this:
      
        #if defined(CC_HAVE_ASM_GOTO) && defined(CONFIG_JUMP_LABEL)
        # define HAVE_JUMP_LABEL
        #endif
      
      We can improve this by testing 'asm goto' support in Kconfig, then
      make JUMP_LABEL depend on CC_HAS_ASM_GOTO.
      
      Ugly #ifdef HAVE_JUMP_LABEL will go away, and CONFIG_JUMP_LABEL will
      match to the real kernel capability.
      Signed-off-by: default avatarMasahiro Yamada <yamada.masahiro@socionext.com>
      Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
      Tested-by: default avatarSedat Dilek <sedat.dilek@gmail.com>
      e9666d10
  20. 15 Dec, 2018 1 commit
  21. 14 Dec, 2018 3 commits
  22. 06 Dec, 2018 6 commits
    • Petr Machata's avatar
      net: core: dev: Attach extack to NETDEV_PRE_UP · 40c900aa
      Petr Machata authored
      Drivers may need to validate configuration of a device that's about to
      be upped. Should the validation fail, there's currently no way to
      communicate details of the failure to the user, beyond an error number.
      
      To mend that, change __dev_open() to take an extack argument and pass it
      from __dev_change_flags() and dev_open(), where it was propagated in the
      previous patches.
      
      Change __dev_open() to call call_netdevice_notifiers_extack() so that
      the passed-in extack is attached to the NETDEV_PRE_UP notifier.
      Signed-off-by: default avatarPetr Machata <petrm@mellanox.com>
      Acked-by: default avatarJiri Pirko <jiri@mellanox.com>
      Reviewed-by: default avatarIdo Schimmel <idosch@mellanox.com>
      Reviewed-by: default avatarDavid Ahern <dsahern@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      40c900aa
    • Petr Machata's avatar
      net: core: dev: Add call_netdevice_notifiers_extack() · 26372605
      Petr Machata authored
      In order to propagate extack through NETDEV_PRE_UP, add a new function
      call_netdevice_notifiers_extack() that primes the extack field of the
      notifier info. Convert call_netdevice_notifiers() to a simple wrapper
      around the new function that passes NULL for extack.
      Signed-off-by: default avatarPetr Machata <petrm@mellanox.com>
      Acked-by: default avatarJiri Pirko <jiri@mellanox.com>
      Reviewed-by: default avatarIdo Schimmel <idosch@mellanox.com>
      Reviewed-by: default avatarDavid Ahern <dsahern@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      26372605
    • Petr Machata's avatar
      net: core: dev: Add extack argument to __dev_change_flags() · 6d040321
      Petr Machata authored
      In order to pass extack together with NETDEV_PRE_UP notifications, it's
      necessary to route the extack to __dev_open() from diverse (possibly
      indirect) callers. The last missing API is __dev_change_flags().
      
      Therefore extend __dev_change_flags() with and extra extack argument and
      update the two existing users.
      
      Since the function declaration line is changed anyway, name the struct
      net_device argument to placate checkpatch.
      Signed-off-by: default avatarPetr Machata <petrm@mellanox.com>
      Acked-by: default avatarJiri Pirko <jiri@mellanox.com>
      Reviewed-by: default avatarIdo Schimmel <idosch@mellanox.com>
      Reviewed-by: default avatarDavid Ahern <dsahern@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      6d040321
    • Petr Machata's avatar
      net: core: dev: Add extack argument to dev_change_flags() · 567c5e13
      Petr Machata authored
      In order to pass extack together with NETDEV_PRE_UP notifications, it's
      necessary to route the extack to __dev_open() from diverse (possibly
      indirect) callers. One prominent API through which the notification is
      invoked is dev_change_flags().
      
      Therefore extend dev_change_flags() with and extra extack argument and
      update all users. Most of the calls end up just encoding NULL, but
      several sites (VLAN, ipvlan, VRF, rtnetlink) do have extack available.
      
      Since the function declaration line is changed anyway, name the other
      function arguments to placate checkpatch.
      Signed-off-by: default avatarPetr Machata <petrm@mellanox.com>
      Acked-by: default avatarJiri Pirko <jiri@mellanox.com>
      Reviewed-by: default avatarIdo Schimmel <idosch@mellanox.com>
      Reviewed-by: default avatarDavid Ahern <dsahern@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      567c5e13
    • Petr Machata's avatar
      net: core: dev: Add extack argument to dev_open() · 00f54e68
      Petr Machata authored
      In order to pass extack together with NETDEV_PRE_UP notifications, it's
      necessary to route the extack to __dev_open() from diverse (possibly
      indirect) callers. One prominent API through which the notification is
      invoked is dev_open().
      
      Therefore extend dev_open() with and extra extack argument and update
      all users. Most of the calls end up just encoding NULL, but bond and
      team drivers have the extack readily available.
      Signed-off-by: default avatarPetr Machata <petrm@mellanox.com>
      Acked-by: default avatarJiri Pirko <jiri@mellanox.com>
      Reviewed-by: default avatarIdo Schimmel <idosch@mellanox.com>
      Reviewed-by: default avatarDavid Ahern <dsahern@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      00f54e68
    • Edward Cree's avatar
      net: use skb_list_del_init() to remove from RX sublists · 22f6bbb7
      Edward Cree authored
      list_del() leaves the skb->next pointer poisoned, which can then lead to
       a crash in e.g. OVS forwarding.  For example, setting up an OVS VXLAN
       forwarding bridge on sfc as per:
      
      ========
      $ ovs-vsctl show
      5dfd9c47-f04b-4aaa-aa96-4fbb0a522a30
          Bridge "br0"
              Port "br0"
                  Interface "br0"
                      type: internal
              Port "enp6s0f0"
                  Interface "enp6s0f0"
              Port "vxlan0"
                  Interface "vxlan0"
                      type: vxlan
                      options: {key="1", local_ip="10.0.0.5", remote_ip="10.0.0.4"}
          ovs_version: "2.5.0"
      ========
      (where 10.0.0.5 is an address on enp6s0f1)
      and sending traffic across it will lead to the following panic:
      ========
      general protection fault: 0000 [#1] SMP PTI
      CPU: 5 PID: 0 Comm: swapper/5 Not tainted 4.20.0-rc3-ehc+ #701
      Hardware name: Dell Inc. PowerEdge R710/0M233H, BIOS 6.4.0 07/23/2013
      RIP: 0010:dev_hard_start_xmit+0x38/0x200
      Code: 53 48 89 fb 48 83 ec 20 48 85 ff 48 89 54 24 08 48 89 4c 24 18 0f 84 ab 01 00 00 48 8d 86 90 00 00 00 48 89 f5 48 89 44 24 10 <4c> 8b 33 48 c7 03 00 00 00 00 48 8b 05 c7 d1 b3 00 4d 85 f6 0f 95
      RSP: 0018:ffff888627b437e0 EFLAGS: 00010202
      RAX: 0000000000000000 RBX: dead000000000100 RCX: ffff88862279c000
      RDX: ffff888614a342c0 RSI: 0000000000000000 RDI: 0000000000000000
      RBP: ffff888618a88000 R08: 0000000000000001 R09: 00000000000003e8
      R10: 0000000000000000 R11: ffff888614a34140 R12: 0000000000000000
      R13: 0000000000000062 R14: dead000000000100 R15: ffff888616430000
      FS:  0000000000000000(0000) GS:ffff888627b40000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00007f6d2bc6d000 CR3: 000000000200a000 CR4: 00000000000006e0
      Call Trace:
       <IRQ>
       __dev_queue_xmit+0x623/0x870
       ? masked_flow_lookup+0xf7/0x220 [openvswitch]
       ? ep_poll_callback+0x101/0x310
       do_execute_actions+0xaba/0xaf0 [openvswitch]
       ? __wake_up_common+0x8a/0x150
       ? __wake_up_common_lock+0x87/0xc0
       ? queue_userspace_packet+0x31c/0x5b0 [openvswitch]
       ovs_execute_actions+0x47/0x120 [openvswitch]
       ovs_dp_process_packet+0x7d/0x110 [openvswitch]
       ovs_vport_receive+0x6e/0xd0 [openvswitch]
       ? dst_alloc+0x64/0x90
       ? rt_dst_alloc+0x50/0xd0
       ? ip_route_input_slow+0x19a/0x9a0
       ? __udp_enqueue_schedule_skb+0x198/0x1b0
       ? __udp4_lib_rcv+0x856/0xa30
       ? __udp4_lib_rcv+0x856/0xa30
       ? cpumask_next_and+0x19/0x20
       ? find_busiest_group+0x12d/0xcd0
       netdev_frame_hook+0xce/0x150 [openvswitch]
       __netif_receive_skb_core+0x205/0xae0
       __netif_receive_skb_list_core+0x11e/0x220
       netif_receive_skb_list+0x203/0x460
       ? __efx_rx_packet+0x335/0x5e0 [sfc]
       efx_poll+0x182/0x320 [sfc]
       net_rx_action+0x294/0x3c0
       __do_softirq+0xca/0x297
       irq_exit+0xa6/0xb0
       do_IRQ+0x54/0xd0
       common_interrupt+0xf/0xf
       </IRQ>
      ========
      So, in all listified-receive handling, instead pull skbs off the lists with
       skb_list_del_init().
      
      Fixes: 9af86f93 ("net: core: fix use-after-free in __netif_receive_skb_list_core")
      Fixes: 7da517a3 ("net: core: Another step of skb receive list processing")
      Fixes: a4ca8b7d ("net: ipv4: fix drop handling in ip_list_rcv() and ip_list_rcv_finish()")
      Fixes: d8269e2c ("net: ipv6: listify ipv6_rcv() and ip6_rcv_finish()")
      Signed-off-by: default avatarEdward Cree <ecree@solarflare.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      22f6bbb7
  23. 04 Dec, 2018 1 commit
    • Qian Cai's avatar
      net/core: tidy up an error message · bf29e9e9
      Qian Cai authored
      netif_napi_add() could report an error like this below due to it allows
      to pass a format string for wildcarding before calling
      dev_get_valid_name(),
      
      "netif_napi_add() called with weight 256 on device eth%d"
      
      For example, hns_enet_drv module does this.
      
      hns_nic_try_get_ae
        hns_nic_init_ring_data
          netif_napi_add
        register_netdev
          dev_get_valid_name
      
      Hence, make it a bit more human-readable by using netdev_err_once()
      instead.
      Signed-off-by: default avatarQian Cai <cai@gmx.us>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      bf29e9e9
  24. 30 Nov, 2018 1 commit
    • Geneviève Bastien's avatar
      net: Add trace events for all receive exit points · b0e3f1bd
      Geneviève Bastien authored
      Trace events are already present for the receive entry points, to indicate
      how the reception entered the stack.
      
      This patch adds the corresponding exit trace events that will bound the
      reception such that all events occurring between the entry and the exit
      can be considered as part of the reception context. This greatly helps
      for dependency and root cause analyses.
      
      Without this, it is not possible with tracepoint instrumentation to
      determine whether a sched_wakeup event following a netif_receive_skb
      event is the result of the packet reception or a simple coincidence after
      further processing by the thread. It is possible using other mechanisms
      like kretprobes, but considering the "entry" points are already present,
      it would be good to add the matching exit events.
      
      In addition to linking packets with wakeups, the entry/exit event pair
      can also be used to perform network stack latency analyses.
      Signed-off-by: default avatarGeneviève Bastien <gbastien@versatic.net>
      CC: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      CC: Steven Rostedt <rostedt@goodmis.org>
      CC: Ingo Molnar <mingo@redhat.com>
      CC: David S. Miller <davem@davemloft.net>
      Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org> (tracing side)
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      b0e3f1bd
  25. 29 Nov, 2018 1 commit