1. 13 Jun, 2018 1 commit
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf · 60d061e3
      David S. Miller authored
      Pablo Neira Ayuso says:
      
      ====================
      Netfilter fixes for net
      
      The following patchset contains Netfilter patches for your net tree:
      
      1) Fix NULL pointer dereference from nf_nat_decode_session() if NAT is
         not loaded, from Prashant Bhole.
      
      2) Fix socket extension module autoload.
      
      3) In the dynset extension, don't bogusly reject sets that have the
         NFT_SET_EVAL flag set.
      
      4) Fix races with nf_tables module removal and netns exit path,
         patches from Florian Westphal.
      
      5) Don't hit BUG_ON if jumpstack goes too deep, instead hit
         WARN_ON_ONCE, from Taehee Yoo.
      
      6) Another NULL pointer dereference from ctnetlink, again if NAT is
         not loaded, from Florian Westphal.
      
      7) Fix x_tables match list corruption in xt_connmark module removal
         path, also from Florian.
      
      8) nf_conncount doesn't properly deal with conntrack zones, hence
         garbage collector may get rid of entries in a different zone.
         From Yi-Hung Wei.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
  2. 12 Jun, 2018 28 commits
    • xen/netfront: raise max number of slots in xennet_get_responses() · 57f230ab
      Juergen Gross authored
      The max number of slots used in xennet_get_responses() is set to
      MAX_SKB_FRAGS + (rx->status <= RX_COPY_THRESHOLD).
      
      In old kernel-xen MAX_SKB_FRAGS was 18, while nowadays it is 17. This
      difference is resulting in frequent messages "too many slots" and a
      reduced network throughput for some workloads (factor 10 below that of
      a kernel-xen based guest).
      
      Replacing MAX_SKB_FRAGS by XEN_NETIF_NR_SLOTS_MIN for calculation of
      the max number of slots to use solves that problem (tests showed no
      more messages "too many slots" and throughput was as high as with the
      kernel-xen based guest system).
      
      Replace MAX_SKB_FRAGS-2 by XEN_NETIF_NR_SLOTS_MIN-1 in
      netfront_tx_slot_available() for making it clearer what is really being
      tested without actually modifying the tested value.
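The slot arithmetic described above can be sketched as follows. This is only an illustration: the helper names are hypothetical, MAX_SKB_FRAGS = 17 is taken from the text, and the values 18 for XEN_NETIF_NR_SLOTS_MIN and 256 for RX_COPY_THRESHOLD are assumptions about the headers of that era.

```c
/* Illustrative constants; 17 comes from the commit message, the other
 * two values are assumptions for this sketch. */
#define MAX_SKB_FRAGS          17
#define XEN_NETIF_NR_SLOTS_MIN 18
#define RX_COPY_THRESHOLD      256

/* Old limit: MAX_SKB_FRAGS + (status <= RX_COPY_THRESHOLD). */
static int max_slots_old(int status)
{
    return MAX_SKB_FRAGS + (status <= RX_COPY_THRESHOLD);
}

/* New limit: the same expression based on XEN_NETIF_NR_SLOTS_MIN. */
static int max_slots_new(int status)
{
    return XEN_NETIF_NR_SLOTS_MIN + (status <= RX_COPY_THRESHOLD);
}
```

Under these assumed values the new limit is always one slot higher than the old one, which matches the 18-versus-17 difference the message describes.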
      Signed-off-by: Juergen Gross <jgross@suse.com>
      Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • smc: convert to ->poll_mask · c0129a06
      Cong Wang authored
      smc->clcsock is an internal TCP socket. Now that the TCP socket has been
      converted to ->poll_mask, ->poll no longer exists there, so convert the
      smc socket to ->poll_mask too.
      
      Fixes: 2c7d3dac ("net/tcp: convert to ->poll_mask")
      Reported-by: syzbot+f5066e369b2d5fff630f@syzkaller.appspotmail.com
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Ursula Braun <ubraun@linux.ibm.com>
      Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: stmmac: dwmac-meson8b: Fix an error handling path in 'meson8b_dwmac_probe()' · 760a6ed6
      Christophe JAILLET authored
      If 'of_device_get_match_data()' fails, we need to release some resources as
      done in the other error handling path of this function.
      
      Fixes: efacb568 ("net: stmmac: dwmac-meson: extend phy mode setting")
      Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tc-testing: ife: fix wrong teardown command in test b7b8 · 31962c8c
      Davide Caratti authored
      Fix failures in the 'teardown' stage of test b7b8, probably a leftover of
      commit 7c5995b3 ("tc-testing: fixed copy-pasting error in ife tests").
      
      Fixes: a56e6bcd ("tc-testing: updated ife test cases")
      Signed-off-by: Davide Caratti <dcaratti@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: thunderx: prevent concurrent data re-writing by nicvf_set_rx_mode · 469998c8
      Vadim Lomovtsev authored
      For each network interface, the Linux network stack issues an
      ndo_set_rx_mode call in order to configure MAC address filters (e.g. for
      multicast filtering). Currently the ThunderX NICVF driver has only one
      ordered workqueue to process such requests for all VFs.
      
      Because of that, a subsequent call to ndo_set_rx_mode can corrupt data
      that is currently in use by nicvf_set_rx_mode_task, which in turn can
      cause the following issue:
      [...]
      [   48.978341] Unable to handle kernel paging request at virtual address 1fffff0000000000
      [   48.986275] Mem abort info:
      [   48.989058]   Exception class = DABT (current EL), IL = 32 bits
      [   48.994965]   SET = 0, FnV = 0
      [   48.998020]   EA = 0, S1PTW = 0
      [   49.001152] Data abort info:
      [   49.004022]   ISV = 0, ISS = 0x00000004
      [   49.007869]   CM = 0, WnR = 0
      [   49.010826] [1fffff0000000000] address between user and kernel address ranges
      [   49.017963] Internal error: Oops: 96000004 [#1] SMP
      [...]
      [   49.072138] task: ffff800fdd675400 task.stack: ffff000026440000
      [   49.078051] PC is at prefetch_freepointer.isra.37+0x28/0x3c
      [   49.083613] LR is at kmem_cache_alloc_trace+0xc8/0x1fc
      [...]
      [   49.272684] [<ffff0000082738f0>] prefetch_freepointer.isra.37+0x28/0x3c
      [   49.279286] [<ffff000008276bc8>] kmem_cache_alloc_trace+0xc8/0x1fc
      [   49.285455] [<ffff0000082c0c0c>] alloc_fdtable+0x78/0x134
      [   49.290841] [<ffff0000082c15c0>] dup_fd+0x254/0x2f4
      [   49.295709] [<ffff0000080d1954>] copy_process.isra.38.part.39+0x64c/0x1168
      [   49.302572] [<ffff0000080d264c>] _do_fork+0xfc/0x3b0
      [   49.307524] [<ffff0000080d29e8>] SyS_clone+0x44/0x50
      [...]
      
      This patch prevents such concurrent data writes by using a spinlock.
      Reported-by: Dean Nelson <dnelson@redhat.com>
      Signed-off-by: Vadim Lomovtsev <Vadim.Lomovtsev@cavium.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: phy: mdio-gpio: Cut surplus includes · 909f1edc
      Linus Walleij authored
      The GPIO MDIO driver now needs only <linux/gpio/consumer.h>
      so cut the legacy <linux/gpio.h> and <linux/of_gpio.h>
      includes that are no longer used.
      Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'hv_netvsc-notification-and-namespace-fixes' · bfc17d00
      David S. Miller authored
      Stephen Hemminger says:
      
      ====================
      hv_netvsc: notification and namespace fixes
      
      This set of patches addresses two sets of fixes. First it backs out
      the common callback model which was merged in net-next without
      completing all the review feedback or getting maintainer approval.
      
      Then it fixes the transparent VF management code to handle network
      namespaces.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • hv_netvsc: move VF to same namespace as netvsc device · c0a41b88
      Stephen Hemminger authored
      When a VF is added, the paravirtual device is already present
      and may have been moved to another network namespace. For example,
      in some environments the management interface is put in another
      net namespace.
      
      The VF should get moved to where the netvsc device is when the
      VF is discovered. The user can move it later (if desired).
      Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • hv_netvsc: fix network namespace issues with VF support · 7bf7bb37
      Stephen Hemminger authored
      When finding the parent netvsc device, the search needs to be across
      all netvsc device instances (independent of network namespace).
      
      Find parent device of VF using upper_dev_get routine which
      searches only adjacent list.
      
      Fixes: e8ff40d4 ("hv_netvsc: improve VF device matching")
      Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
      
      netns aware byref
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • hv_netvsc: drop common code until callback model fixed · 8cde8f0c
      Stephen Hemminger authored
      The callback model of handling network failover is not suitable
      in the current form.
        1. It was merged without addressing all the review feedback.
        2. It was merged without approval of any of the netvsc maintainers.
        3. Design discussion on how to handle PV/VF fallback is still
           not complete.
        4. IMHO the code model using callbacks is trying to make
           something common which isn't.
      
      Revert the netvsc specific changes for now. Does not impact ongoing
      development of failover model for virtio.
      Revisit this after simpler library-based failover kernel
      routines are extracted.
      
      This reverts
      commit 9c6ffbac ("hv_netvsc: fix error return code in netvsc_probe()")
      and
      commit 1ff78076 ("netvsc: refactor notifier/event handling code to use the failover framework")
      Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'nfp-fixes' · 01a1a170
      David S. Miller authored
      Jakub Kicinski says:
      
      ====================
      nfp: fix a warning, stats, naming and route leak
      
      Various fixes for the NFP.  Patch 1 fixes a harmless GCC 8 warning.
      Patch 2 ensures statistics are correct after users decrease the number
      of channels/rings.  Patch 3 restores phys_port_name behaviour for flower:
      ndo_get_phys_port_name used to return -EOPNOTSUPP on one of the netdevs,
      and we need to keep it that way, otherwise interface names may change.
      Patch 4 fixes refcnt leak in flower tunnel offload code.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • nfp: flower: free dst_entry in route table · e62e51af
      Pieter Jansen van Vuuren authored
      We need to release the refcnt on dst_entry in the route table, otherwise
      we will leak the route.
      
      Fixes: 8e6a9046 ("nfp: flower vxlan neighbour offload")
      Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
      Signed-off-by: Louis Peens <louis.peens@netronome.com>
      Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • nfp: remove phys_port_name on flower's vNIC · fe06a64e
      Jakub Kicinski authored
      .ndo_get_phys_port_name was recently extended to support multi-vNIC
      FWs.  These are firmwares which can have more than one vNIC per PF
      without associated port (e.g. Adaptive Buffer Management FW), therefore
      we need a way of distinguishing the vNICs.  Unfortunately, it's too
      late to make flower use the same naming.  Flower users may depend on
      .ndo_get_phys_port_name returning -EOPNOTSUPP, for example the name
      udev gave the PF vNIC was just the bare PCI device-based name before
      the change, and will have 'nn0' appended after.
      
      To ensure flower's vNIC doesn't have phys_port_name attribute, add
      a flag to vNIC struct and set it in flower code.  New projects will
      not set the flag and will adhere to the naming scheme from the start.
      
      Fixes: 51c1df83 ("nfp: assign vNIC id as phys_port_name of vNICs which are not ports")
      Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
      Reviewed-by: Simon Horman <simon.horman@netronome.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • nfp: include all ring counters in interface stats · 29f534c4
      Jakub Kicinski authored
      We are gathering software statistics on per-ring basis.
      .ndo_get_stats64 handler adds the rings up.  Unfortunately
      we are currently only adding up active rings, which means
      that if user decreases the number of active rings the
      statistics from deactivated rings will no longer be counted
      and total interface statistics may go backwards.
      
      Always sum all possible rings, the stats are allocated
      statically for max number of rings, so we don't have to
      worry about them being removed.  We could add the stats
      up when user changes the ring count, but it seems unnecessary.
      Adding up inactive rings will be very quick since no datapath
      will be touching them.
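A minimal sketch of the difference between summing only active rings and summing all statically allocated rings. The struct layout, MAX_RINGS and the function names are hypothetical and are not taken from the nfp driver.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_RINGS 8 /* hypothetical maximum ring count */

struct ring_stats {
    uint64_t packets;
};

struct nic {
    /* Stats are sized for the maximum ring count, so entries of
     * deactivated rings stay valid and keep their last values. */
    struct ring_stats rings[MAX_RINGS];
    size_t active_rings;
};

/* Old behaviour: counters of deactivated rings drop out of the total. */
static uint64_t total_active_only(const struct nic *n)
{
    uint64_t sum = 0;

    for (size_t i = 0; i < n->active_rings; i++)
        sum += n->rings[i].packets;
    return sum;
}

/* Fixed behaviour: always walk every possible ring. */
static uint64_t total_all_rings(const struct nic *n)
{
    uint64_t sum = 0;

    for (size_t i = 0; i < MAX_RINGS; i++)
        sum += n->rings[i].packets;
    return sum;
}
```

With four rings that each counted 10 packets, lowering active_rings to 2 makes the first total fall from 40 to 20, while the second stays at 40.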
      
      Fixes: 164d1e9e ("nfp: add support for ethtool .set_channels")
      Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • nfp: don't pad strings in nfp_cpp_resource_find() to avoid gcc 8 warning · f8d0efb1
      Jakub Kicinski authored
      Once upon a time nfp_cpp_resource_find() took a name parameter,
      which could be any user-chosen string.  Resources are identified
      by a CRC32 hash of an 8-byte string, so we had to pad user input
      with zeros to make sure CRC32 gave the correct result.
      
      Since then nfp_cpp_resource_find() was made to operate on allocated
      resources only (struct nfp_resource).  We kzalloc those so there is
      no need to pad the strings and use memcmp.
      
      This avoids a GCC 8 stringop-truncation warning:
      
      In function ‘nfp_cpp_resource_find’,
          inlined from ‘nfp_resource_try_acquire’ at .../nfpcore/nfp_resource.c:153:8,
          inlined from ‘nfp_resource_acquire’ at .../nfpcore/nfp_resource.c:206:9:
          .../nfpcore/nfp_resource.c:108:2: warning: ‘strncpy’ output may be truncated copying 8 bytes from a string of length 8 [-Wstringop-truncation]
            strncpy(name_pad, res->name, sizeof(name_pad));
            ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
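The idea of comparing against a zero-allocated, fixed-size name without an intermediate padded copy might look roughly like this in userspace C. The struct, RES_NAME_SZ and both helpers are hypothetical stand-ins, with calloc() playing the role of kzalloc().

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define RES_NAME_SZ 8 /* the 8-byte name mentioned in the commit message */

struct resource {
    char name[RES_NAME_SZ + 1];
};

/* Zero-allocate the resource so the bytes after the name are already 0. */
static struct resource *resource_alloc(const char *name)
{
    struct resource *res = calloc(1, sizeof(*res));
    size_t len;

    if (!res)
        return NULL;
    len = strlen(name);
    if (len > RES_NAME_SZ)
        len = RES_NAME_SZ;
    memcpy(res->name, name, len);
    return res;
}

/* Compare the full fixed-size field; no strncpy() into a pad buffer. */
static bool resource_name_matches(const struct resource *res,
                                  const char *table_entry)
{
    return memcmp(table_entry, res->name, RES_NAME_SZ) == 0;
}
```

Because the allocation is zeroed, the name field is already padded, so a fixed-length memcmp() is enough.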
      Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Revert "net: do not allow changing SO_REUSEADDR/SO_REUSEPORT on bound sockets" · cdb8744d
      Bart Van Assche authored
      Revert the patch mentioned in the subject because it breaks at least
      the Avahi mDNS daemon. Specifically, that patch causes the Ubuntu 18.04
      Avahi daemon to fail to start:
      
      Jun 12 09:49:24 ubuntu-vm avahi-daemon[529]: Successfully called chroot().
      Jun 12 09:49:24 ubuntu-vm avahi-daemon[529]: Successfully dropped remaining capabilities.
      Jun 12 09:49:24 ubuntu-vm avahi-daemon[529]: No service file found in /etc/avahi/services.
      Jun 12 09:49:24 ubuntu-vm avahi-daemon[529]: SO_REUSEADDR failed: Structure needs cleaning
      Jun 12 09:49:24 ubuntu-vm avahi-daemon[529]: SO_REUSEADDR failed: Structure needs cleaning
      Jun 12 09:49:24 ubuntu-vm avahi-daemon[529]: Failed to create server: No suitable network protocol available
      Jun 12 09:49:24 ubuntu-vm avahi-daemon[529]: avahi-daemon 0.7 exiting.
      Jun 12 09:49:24 ubuntu-vm systemd[1]: avahi-daemon.service: Main process exited, code=exited, status=255/n/a
      Jun 12 09:49:24 ubuntu-vm systemd[1]: avahi-daemon.service: Failed with result 'exit-code'.
      Jun 12 09:49:24 ubuntu-vm systemd[1]: Failed to start Avahi mDNS/DNS-SD Stack.
      
      Fixes: f396922d ("net: do not allow changing SO_REUSEADDR/SO_REUSEPORT on bound sockets")
      Cc: Maciej Żenczykowski <maze@google.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
      Acked-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • netfilter: nf_conncount: Fix garbage collection with zones · 21ba8847
      Yi-Hung Wei authored
      Currently, we use check_hlist() for garbage collection. However, we
      use the ‘zone’ from the counted entry to query the existence of
      existing entries in the hlist. This could be wrong when they are in
      different zones, and this patch fixes this issue.
      
      Fixes: e59ea3df ("netfilter: xt_connlimit: honor conntrack zone if available")
      Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: xt_connmark: fix list corruption on rmmod · fc6ddbec
      Florian Westphal authored
      This needs to use xt_unregister_targets, else the new revision is left
      on the list, which then causes the list to point to a target struct that
      has been freed.
      
      Fixes: 472a73e0 ("netfilter: xt_conntrack: Support bit-shifting for CONNMARK & MARK targets.")
      Signed-off-by: default avatarFlorian Westphal <fw@strlen.de>
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      fc6ddbec
    • Florian Westphal's avatar
      netfilter: ctnetlink: avoid null pointer dereference · c05a45c0
      Florian Westphal authored
      Dan Carpenter points out that deref occurs after NULL check, we should
      re-fetch the pointer and check that instead.
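The general "re-fetch the pointer, then check it" pattern can be sketched as below. This is a simplified userspace illustration and not the ctnetlink code: the hook struct, the global pointer and try_load_module() are all hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

struct nat_hook {
    int (*parse_setup)(int attr);
};

static int double_attr(int attr)
{
    return attr * 2;
}

static struct nat_hook example_hook = { .parse_setup = double_attr };

/* Stands in for a hook pointer that is NULL while the module is absent. */
static struct nat_hook *registered_hook;
static bool module_available;

/* Hypothetical loader: registers the hook only if the module exists. */
static void try_load_module(void)
{
    if (module_available)
        registered_hook = &example_hook;
}

static int parse_nat_setup(int attr)
{
    struct nat_hook *hook = registered_hook;

    if (!hook) {
        try_load_module();
        /* Re-fetch and check again: the earlier NULL check says nothing
         * about the pointer's value after the load attempt. */
        hook = registered_hook;
        if (!hook)
            return -1;
    }
    return hook->parse_setup(attr);
}
```

The key point is that the dereference only ever uses a value that was checked after the last event that could have changed it.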
      
      Fixes: 2c205dd3 ("netfilter: add struct nf_nat_hook and use it")
      Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: nf_tables: use WARN_ON_ONCE instead of BUG_ON in nft_do_chain() · adc972c5
      Taehee Yoo authored
      When the chain depth is bigger than NFT_JUMP_STACK_SIZE, nft_do_chain()
      crashes. But there is no need to crash hard here.
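As a rough userspace analogy of warning once and refusing the jump instead of crashing, consider the sketch below. The names and the stack size of 16 are assumptions for illustration; this is not the nft_do_chain() code.

```c
#include <stdbool.h>
#include <stdio.h>

#define JUMP_STACK_SIZE 16 /* assumed value, for illustration only */

struct jump_stack {
    int depth;
    int chain[JUMP_STACK_SIZE];
};

/* Push a chain id; on overflow warn a single time and report failure
 * to the caller rather than aborting the whole program. */
static bool push_jump(struct jump_stack *s, int chain_id)
{
    static bool warned;

    if (s->depth >= JUMP_STACK_SIZE) {
        if (!warned) {
            fprintf(stderr, "jump stack too deep, refusing jump\n");
            warned = true;
        }
        return false;
    }
    s->chain[s->depth++] = chain_id;
    return true;
}
```

The caller can then pick a safe fallback, such as dropping the packet, when push_jump() returns false.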
      Suggested-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Taehee Yoo <ap420073@gmail.com>
      Acked-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: nf_tables: close race between netns exit and rmmod · 0a2cf5ee
      Florian Westphal authored
      If net namespace is exiting while nf_tables module is being removed
      we can oops:
      
       BUG: unable to handle kernel NULL pointer dereference at 0000000000000040
       IP: nf_tables_flowtable_event+0x43/0xf0 [nf_tables]
       PGD 0 P4D 0
       Oops: 0000 [#1] SMP PTI
       Modules linked in: nf_tables(-) nfnetlink [..]
        unregister_netdevice_notifier+0xdd/0x130
        nf_tables_module_exit+0x24/0x3a [nf_tables]
        SyS_delete_module+0x1c5/0x240
        do_syscall_64+0x74/0x190
      
      Avoid this by attempting to take reference on the net namespace from
      the notifiers.  If it fails the namespace is exiting already, and nft
      core is taking care of cleanup work.
      
      We also need to make sure the netdev hook type gets removed
      before netns ops removal, else notifier might be invoked with device
      event for a netns where net->nft was never initialised (because
      pernet ops was removed beforehand).
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: nf_tables: fix module unload race · 71ad00c5
      Florian Westphal authored
      We must first remove the nfnetlink protocol handler when nf_tables module
      is unloaded -- we don't want userspace to submit new change requests once
      we've started to tear down nft state.
      
      Furthermore, nfnetlink must not call any subsystem function after
      call_batch returned -EAGAIN.
      
      EAGAIN means the subsys mutex was dropped, so it's unlikely but possible that
      nf_tables subsystem was removed due to 'rmmod nf_tables' on another cpu.
      
      Therefore, we must abort batch completely and not move on to next part of
      the batch.
      
      Last, we can't invoke ->abort unless we've checked that the subsystem is
      still registered.
      
      Change netns exit path of nf_tables to make sure any incomplete
      transaction gets removed on exit.
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: nft_dynset: do not reject set updates with NFT_SET_EVAL · 215a31f1
      Pablo Neira Ayuso authored
      NFT_SET_EVAL signals to the kernel that this set can be updated from
      the evaluation path, even if there are no expressions attached to the
      element. Otherwise, set updates with no expressions fail. Update
      description to describe the right semantics.
      
      Fixes: 22fe54d5 ("netfilter: nf_tables: add support for dynamic set updates")
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: nft_socket: fix module autoload · 3fb61eca
      Pablo Neira Ayuso authored
      Add alias definition for module autoload when adding socket rules.
      
      Fixes: 554ced0a ("netfilter: nf_tables: add support for native socket matching")
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: fix null-ptr-deref in nf_nat_decode_session · 155fb5c5
      Prashant Bhole authored
      Add null check for nat_hook in nf_nat_decode_session()
      
      [  195.648098] UBSAN: Undefined behaviour in ./include/linux/netfilter.h:348:14
      [  195.651366] BUG: KASAN: null-ptr-deref in __xfrm_policy_check+0x208/0x1d70
      [  195.653888] member access within null pointer of type 'struct nf_nat_hook'
      [  195.653896] CPU: 3 PID: 0 Comm: swapper/3 Not tainted 4.17.0-rc6+ #5
      [  195.656320] Read of size 8 at addr 0000000000000008 by task ping/2469
      [  195.658715] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014
      [  195.658721] Call Trace:
      [  195.661087]
      [  195.669341]  <IRQ>
      [  195.670574]  dump_stack+0xc6/0x150
      [  195.672156]  ? dump_stack_print_info.cold.0+0x1b/0x1b
      [  195.674121]  ? ubsan_prologue+0x31/0x92
      [  195.676546]  ubsan_epilogue+0x9/0x49
      [  195.678159]  handle_null_ptr_deref+0x11a/0x130
      [  195.679800]  ? sprint_OID+0x1a0/0x1a0
      [  195.681322]  __ubsan_handle_type_mismatch_v1+0xd5/0x11d
      [  195.683146]  ? ubsan_prologue+0x92/0x92
      [  195.684642]  __xfrm_policy_check+0x18ef/0x1d70
      [  195.686294]  ? rt_cache_valid+0x118/0x180
      [  195.687804]  ? __xfrm_route_forward+0x410/0x410
      [  195.689463]  ? fib_multipath_hash+0x700/0x700
      [  195.691109]  ? kvm_sched_clock_read+0x23/0x40
      [  195.692805]  ? pvclock_clocksource_read+0xf6/0x280
      [  195.694409]  ? graph_lock+0xa0/0xa0
      [  195.695824]  ? pvclock_clocksource_read+0xf6/0x280
      [  195.697508]  ? pvclock_read_flags+0x80/0x80
      [  195.698981]  ? kvm_sched_clock_read+0x23/0x40
      [  195.700347]  ? sched_clock+0x5/0x10
      [  195.701525]  ? sched_clock_cpu+0x18/0x1a0
      [  195.702846]  tcp_v4_rcv+0x1d32/0x1de0
      [  195.704115]  ? lock_repin_lock+0x70/0x270
      [  195.707072]  ? pvclock_read_flags+0x80/0x80
      [  195.709302]  ? tcp_v4_early_demux+0x4b0/0x4b0
      [  195.711833]  ? lock_acquire+0x195/0x380
      [  195.714222]  ? ip_local_deliver_finish+0xfc/0x770
      [  195.716967]  ? raw_rcv+0x2b0/0x2b0
      [  195.718856]  ? lock_release+0xa00/0xa00
      [  195.720938]  ip_local_deliver_finish+0x1b9/0x770
      [...]
      
      Fixes: 2c205dd3 ("netfilter: add struct nf_nat_hook and use it")
      Signed-off-by: Prashant Bhole <bhole_prashant_q7@lab.ntt.co.jp>
      Acked-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • tcp: Do not reload skb pointer after skb_gro_receive(). · 6892286e
      David Miller authored
      This is not necessary.  skb_gro_receive() will never change what
      'head' points to.
      
      In its original implementation (see commit 71d93b39 ("net: Add
      skb_gro_receive")), it did:
      
      ====================
      +	*head = nskb;
      +	nskb->next = p->next;
      +	p->next = NULL;
      ====================
      
      This sequence was removed in commit 58025e46 ("net: gro: remove
      obsolete code from skb_gro_receive()")
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf · 0ca69d13
      David S. Miller authored
      Daniel Borkmann says:
      
      ====================
      pull-request: bpf 2018-06-12
      
      The following pull-request contains BPF updates for your *net* tree.
      
      The main changes are:
      
      1) Avoid an allocation warning in AF_XDP by adding __GFP_NOWARN for the
         umem setup, from Björn.
      
      2) Silence a warning in bpf fs when an application tries to open(2) a
         pinned bpf obj due to missing fops. Add a dummy open fop that continues
         to just bail out in such case, from Daniel.
      
      3) Fix a BPF selftest urandom_read build issue where gcc complains that
         it gets built twice, from Anders.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch '10GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net-queue · 93ba168a
      David S. Miller authored
      Jeff Kirsher says:
      
      ====================
      Intel Wired LAN Driver Updates 2018-06-11
      
      This series contains fixes to ixgbe IPsec and MACVLAN.
      
      Alex provides the 5 fixes in this series, starting with fixing an issue
      where num_rx_pools was not being populated until after the queues and
      interrupts were reinitialized when enabling MACVLAN interfaces.  Updated
      to use CONFIG_XFRM_OFFLOAD instead of CONFIG_XFRM, since the code
      requires CONFIG_XFRM_OFFLOAD to be enabled.  Moved the IPsec
      initialization function to be more consistent with the placement of
      similar initialization functions and before the call to reset the
      hardware, which will clean up any link issues that may have been
      introduced.  Fixed the boolean logic that was testing for transmit OR
      receive ready bits, when it should have been testing for transmit AND
      receive ready bits.  Fixed the bit definitions for SECTXSTAT and SECRXSTAT
      registers and ensure that if IPsec is disabled on the part, do not
      enable it.
      ====================
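The transmit/receive ready-bit fix mentioned in the summary above comes down to the difference between OR and AND when testing two status words. A small sketch, with hypothetical bit positions and function names that are not taken from the ixgbe register definitions:

```c
#include <stdbool.h>
#include <stdint.h>

#define TX_READY (1u << 0) /* hypothetical bit position */
#define RX_READY (1u << 1) /* hypothetical bit position */

/* Buggy form: reports ready when only one direction is ready. */
static bool ready_either(uint32_t tx_stat, uint32_t rx_stat)
{
    return (tx_stat & TX_READY) || (rx_stat & RX_READY);
}

/* Intended form: both directions must report ready. */
static bool ready_both(uint32_t tx_stat, uint32_t rx_stat)
{
    return (tx_stat & TX_READY) && (rx_stat & RX_READY);
}
```

The two forms only disagree when exactly one of the bits is set, which is the case the OR version gets wrong.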
      Signed-off-by: David S. Miller <davem@davemloft.net>
  3. 11 Jun, 2018 11 commits
    • net/ipv6: Ensure cfg is properly initialized in ipv6_create_tempaddr · 3f2d67b6
      David Ahern authored
      Valdis reported a BUG in ipv6_add_addr:
      
      [ 1820.832682] BUG: unable to handle kernel NULL pointer dereference at 0000000000000209
      [ 1820.832728] RIP: 0010:ipv6_add_addr+0x280/0xd10
      [ 1820.832732] Code: 49 8b 1f 0f 84 6a 0a 00 00 48 85 db 0f 84 4e 0a 00 00 48 8b 03 48 8b 53 08 49 89 45 00 49 8b 47 10
      49 89 55 08 48 85 c0 74 15 <48> 8b 50 08 48 8b 00 49 89 95 b8 01 00 00 49 89 85 b0 01 00 00 4c
      [ 1820.832847] RSP: 0018:ffffaa07c2fd7880 EFLAGS: 00010202
      [ 1820.832853] RAX: 0000000000000201 RBX: ffffaa07c2fd79b0 RCX: 0000000000000000
      [ 1820.832858] RDX: a4cfbfba2cbfa64c RSI: 0000000000000000 RDI: ffffffff8a8e9fa0
      [ 1820.832862] RBP: ffffaa07c2fd7920 R08: 000000000000017a R09: ffffffff8a555300
      [ 1820.832866] R10: 0000000000000000 R11: 0000000000000000 R12: ffff888d18e71c00
      [ 1820.832871] R13: ffff888d0a9b1200 R14: 0000000000000000 R15: ffffaa07c2fd7980
      [ 1820.832876] FS:  00007faa51bdb800(0000) GS:ffff888d1d400000(0000) knlGS:0000000000000000
      [ 1820.832880] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [ 1820.832885] CR2: 0000000000000209 CR3: 000000021e8f8001 CR4: 00000000001606e0
      [ 1820.832888] Call Trace:
      [ 1820.832898]  ? __local_bh_enable_ip+0x119/0x260
      [ 1820.832904]  ? ipv6_create_tempaddr+0x259/0x5a0
      [ 1820.832912]  ? __local_bh_enable_ip+0x139/0x260
      [ 1820.832921]  ipv6_create_tempaddr+0x2da/0x5a0
      [ 1820.832926]  ? ipv6_create_tempaddr+0x2da/0x5a0
      [ 1820.832941]  manage_tempaddrs+0x1a5/0x240
      [ 1820.832951]  inet6_addr_del+0x20b/0x3b0
      [ 1820.832959]  ? nla_parse+0xce/0x1e0
      [ 1820.832968]  inet6_rtm_deladdr+0xd9/0x210
      [ 1820.832981]  rtnetlink_rcv_msg+0x1d4/0x5f0
      
      Looking at the code I found 1 element (peer_pfx) of the newly introduced
      ifa6_config struct that is not initialized. Use a memset rather than hard
      coding an init for each struct element.
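A minimal sketch of why a memset-based init is more robust than assigning each member by hand. The struct below is a hypothetical stand-in and does not reproduce the layout of ifa6_config.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical config struct; only peer_pfx is a name from the text. */
struct addr_config {
    const void *pfx;
    const void *peer_pfx;
    unsigned int plen;
    unsigned int flags;
};

/* Zero the whole struct first, then fill in what the caller knows.
 * Members added later, or simply forgotten here, start out as 0/NULL
 * instead of holding stack garbage. */
static void addr_config_init(struct addr_config *cfg, const void *pfx,
                             unsigned int plen)
{
    memset(cfg, 0, sizeof(*cfg));
    cfg->pfx = pfx;
    cfg->plen = plen;
}
```

With per-member assignment, forgetting peer_pfx would leave it uninitialized, which is the kind of bug the report above describes.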
      Reported-by: Valdis Kletnieks <valdis.kletnieks@vt.edu>
      Fixes: e6464b8c ("net/ipv6: Convert ipv6_add_addr to struct ifa6_config")
      Signed-off-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tls: fix NULL pointer dereference on poll · f6fadff3
      Daniel Borkmann authored
      While hacking on kTLS, I ran into the following panic from an
      unprivileged netserver / netperf TCP session:
      
        BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
        PGD 800000037f378067 P4D 800000037f378067 PUD 3c0e61067 PMD 0
        Oops: 0010 [#1] SMP KASAN PTI
        CPU: 1 PID: 2289 Comm: netserver Not tainted 4.17.0+ #139
        Hardware name: LENOVO 20FBCTO1WW/20FBCTO1WW, BIOS N1FET47W (1.21 ) 11/28/2016
        RIP: 0010:          (null)
        Code: Bad RIP value.
        RSP: 0018:ffff88036abcf740 EFLAGS: 00010246
        RAX: dffffc0000000000 RBX: ffff88036f5f6800 RCX: 1ffff1006debed26
        RDX: ffff88036abcf920 RSI: ffff8803cb1a4f00 RDI: ffff8803c258c280
        RBP: ffff8803c258c280 R08: ffff8803c258c280 R09: ffffed006f559d48
        R10: ffff88037aacea43 R11: ffffed006f559d49 R12: ffff8803c258c280
        R13: ffff8803cb1a4f20 R14: 00000000000000db R15: ffffffffc168a350
        FS:  00007f7e631f4700(0000) GS:ffff8803d1c80000(0000) knlGS:0000000000000000
        CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        CR2: ffffffffffffffd6 CR3: 00000003ccf64005 CR4: 00000000003606e0
        DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
        DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
        Call Trace:
         ? tls_sw_poll+0xa4/0x160 [tls]
         ? sock_poll+0x20a/0x680
         ? do_select+0x77b/0x11a0
         ? poll_schedule_timeout.constprop.12+0x130/0x130
         ? pick_link+0xb00/0xb00
         ? read_word_at_a_time+0x13/0x20
         ? vfs_poll+0x270/0x270
         ? deref_stack_reg+0xad/0xe0
         ? __read_once_size_nocheck.constprop.6+0x10/0x10
        [...]
      
      Debugging further, it turns out that calling into ctx->sk_poll() is
      invalid since sk_poll itself, which was saved from the original TCP
      socket so that tls_sw_poll() could invoke it, is NULL.
      
      Looks like the recent conversion from poll to the poll_mask callback,
      started in 15252423 ("net: add support for ->poll_mask in proto_ops"),
      missed converting kTLS as well: TCP's ->poll was converted over to
      ->poll_mask in commit 2c7d3dac ("net/tcp: convert to ->poll_mask"),
      and therefore kTLS wrongly saved the old ->poll, which is now NULL.
      
      Convert kTLS over to use ->poll_mask instead. Also, instead of POLLIN |
      POLLRDNORM, use the proper EPOLLIN | EPOLLRDNORM bits, as is done in
      tcp_poll_mask(), whose return mask is mangled here.
      
      Fixes: 2c7d3dac ("net/tcp: convert to ->poll_mask")
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Dave Watson <davejwatson@fb.com>
      Tested-by: Dave Watson <davejwatson@fb.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      f6fadff3
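      The failure mode in the commit above (a wrapper saving a callback
      that the underlying ops table no longer provides) can be modelled in
      a few lines of userspace C. Everything here is a hypothetical
      simplification: struct ops, struct wrap_ctx, the EV_* bit values and
      the have_decrypted flag are assumptions for illustration, not the
      kTLS code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of an ops table that migrated from one callback
 * (poll) to another (poll_mask). */
typedef unsigned int (*poll_fn)(void *sk);

struct ops {
    poll_fn poll;      /* old hook: NULL once the conversion happened */
    poll_fn poll_mask; /* new hook */
};

struct wrap_ctx {
    poll_fn saved_poll_mask; /* underlying hook saved by the wrapper */
};

/* Illustrative bit values, not the kernel's EPOLL* constants. */
#define EV_IN     0x001u
#define EV_RDNORM 0x040u

static unsigned int base_poll_mask(void *sk)
{
    (void)sk;
    return 0; /* underlying socket reports nothing ready */
}

/* Save the hook that the underlying ops actually populate. Saving
 * base->poll here would store NULL and crash on the first call. */
void wrap_init(struct wrap_ctx *ctx, const struct ops *base)
{
    ctx->saved_poll_mask = base->poll_mask;
}

/* Call the saved hook, then add readiness for data held by the wrapper. */
unsigned int wrap_poll_mask(const struct wrap_ctx *ctx, void *sk,
                            int have_decrypted)
{
    unsigned int mask = ctx->saved_poll_mask(sk);

    if (have_decrypted)
        mask |= EV_IN | EV_RDNORM;
    return mask;
}

void wrap_demo(void)
{
    struct ops tcp_like = { .poll = NULL, .poll_mask = base_poll_mask };
    struct wrap_ctx ctx;

    wrap_init(&ctx, &tcp_like);
    assert(ctx.saved_poll_mask != NULL);
    assert(wrap_poll_mask(&ctx, NULL, 0) == 0);
    assert(wrap_poll_mask(&ctx, NULL, 1) == (EV_IN | EV_RDNORM));
}
```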
    • Björn Töpel's avatar
      xsk: silence warning on memory allocation failure · a343993c
      Björn Töpel authored
      syzkaller reported a warning from xdp_umem_pin_pages():
      
        WARNING: CPU: 1 PID: 4537 at mm/slab_common.c:996 kmalloc_slab+0x56/0x70 mm/slab_common.c:996
        ...
        __do_kmalloc mm/slab.c:3713 [inline]
        __kmalloc+0x25/0x760 mm/slab.c:3727
        kmalloc_array include/linux/slab.h:634 [inline]
        kcalloc include/linux/slab.h:645 [inline]
        xdp_umem_pin_pages net/xdp/xdp_umem.c:205 [inline]
        xdp_umem_reg net/xdp/xdp_umem.c:318 [inline]
        xdp_umem_create+0x5c9/0x10f0 net/xdp/xdp_umem.c:349
        xsk_setsockopt+0x443/0x550 net/xdp/xsk.c:531
        __sys_setsockopt+0x1bd/0x390 net/socket.c:1935
        __do_sys_setsockopt net/socket.c:1946 [inline]
        __se_sys_setsockopt net/socket.c:1943 [inline]
        __x64_sys_setsockopt+0xbe/0x150 net/socket.c:1943
        do_syscall_64+0x1b1/0x800 arch/x86/entry/common.c:287
        entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      This is a warning about attempting to allocate more than
      KMALLOC_MAX_SIZE memory. The request originates from userspace, and if
      the request is too big, the kernel is free to deny its allocation. In
      this patch, the failed allocation attempt is silenced with
      __GFP_NOWARN.
      
      Fixes: c0c77d8f ("xsk: add user memory registration support sockopt")
      Reported-by: syzbot+4abadc5d69117b346506@syzkaller.appspotmail.com
      Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      a343993c
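      __GFP_NOWARN is a kernel allocator flag and cannot be demonstrated in
      a standalone program. The userspace sketch below illustrates only the
      underlying idea from the commit above: an oversized, user-controlled
      request is denied with an error code and no diagnostic output. The
      cap MAX_ALLOC_BYTES and the function alloc_page_array() are
      hypothetical names chosen for this example.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical upper bound on a single allocation, playing the role
 * that an allocator size limit plays in the commit message. */
#define MAX_ALLOC_BYTES (4u * 1024u * 1024u)

/* Allocate an array of npgs page pointers on behalf of a caller that
 * controls npgs. Returns 0 on success or a negative errno value.
 * A request that is too large is refused quietly: the caller gets
 * -ENOMEM and nothing is logged. */
int alloc_page_array(size_t npgs, void ***out)
{
    *out = NULL;

    if (npgs == 0)
        return -EINVAL;
    if (npgs > MAX_ALLOC_BYTES / sizeof(void *))
        return -ENOMEM;

    *out = calloc(npgs, sizeof(void *));
    return *out ? 0 : -ENOMEM;
}

void alloc_page_array_demo(void)
{
    void **pgs;

    /* Absurdly large request: denied without any warning. */
    assert(alloc_page_array(SIZE_MAX / 2, &pgs) == -ENOMEM);
    assert(pgs == NULL);

    /* Reasonable request: succeeds. */
    assert(alloc_page_array(16, &pgs) == 0);
    assert(pgs != NULL);
    free(pgs);
}
```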
    • David S. Miller's avatar
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf · a08ce73b
      David S. Miller authored
      Pablo Neira Ayuso says:
      
      ====================
      Netfilter/IPVS fixes for net
      
      The following patchset contains Netfilter/IPVS fixes for your net tree:
      
      1) Reject non-null terminated helper names from xt_CT, from Gao Feng.
      
      2) Fix KASAN splat due to out-of-bound access from commit phase, from
         Alexey Kodanev.
      
      3) Missing conntrack hook registration on IPVS FTP helper, from Julian
         Anastasov.
      
      4) Incorrect skbuff allocation size in bridge nft_reject, from Taehee Yoo.
      
      5) Fix inverted check on packet xmit to non-local addresses, also from
         Julian.
      
      6) Fix ebtables alignment compat problems, from Alin Nastac.
      
      7) Hook mask checks are not correct in xt_set, from Serhey Popovych.
      
      8) Fix timeout listing of element in ipsets, from Jozsef.
      
      9) Cap maximum timeout value in ipset, also from Jozsef.
      
      10) Don't allow family option for hash:mac sets, from Florent Fourcot.
      
      11) Restrict ebtables to work with NFPROTO_BRIDGE targets only, from
          Florian.
      
      12) Another bug reported by KASAN in the rbtree set backend, from
          Taehee Yoo.
      
      13) Missing __IPS_MAX_BIT update doesn't include IPS_OFFLOAD_BIT.
          From Gao Feng.
      
      14) Missing initialization of match/target in ebtables, from Florian
          Westphal.
      
      15) Remove useless nft_dup.h file in include path, from C. Labbe.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      a08ce73b
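      Item 1 of the patchset above concerns names passed in fixed-size
      fields that lack a terminating NUL. A minimal sketch of such a check
      in portable C follows; the field size HELPER_NAME_LEN, the function
      name and the choice of -ENAMETOOLONG as the error are assumptions for
      this example, not details taken from the xt_CT patch.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/* Hypothetical size of a fixed-width name field supplied by userspace. */
#define HELPER_NAME_LEN 16

/* Return 0 if the field contains a NUL terminator somewhere inside its
 * HELPER_NAME_LEN bytes, or a negative errno value if it does not.
 * Without this check, later string functions would read past the end
 * of the field. */
int check_name_terminated(const char *name)
{
    if (memchr(name, '\0', HELPER_NAME_LEN) == NULL)
        return -ENAMETOOLONG;
    return 0;
}

void check_name_demo(void)
{
    char ok[HELPER_NAME_LEN] = "ftp"; /* remaining bytes are zero */
    char bad[HELPER_NAME_LEN];

    memset(bad, 'A', sizeof(bad));    /* no NUL anywhere in the field */

    assert(check_name_terminated(ok) == 0);
    assert(check_name_terminated(bad) == -ENAMETOOLONG);
}
```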
    • Zhouyang Jia's avatar
      net: dsa: add error handling for pskb_trim_rcsum · 349b71d6
      Zhouyang Jia authored
      When pskb_trim_rcsum fails, the lack of error-handling code may
      cause unexpected results.
      
      This patch adds error-handling code after calling pskb_trim_rcsum.
      Signed-off-by: Zhouyang Jia <jiazhouyang09@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      349b71d6
    • Julian Anastasov's avatar
      ipv6: allow PMTU exceptions to local routes · 09757646
      Julian Anastasov authored
      IPVS setups with local client and remote tunnel server need
      to create exception for the local virtual IP. What we do is to
      change PMTU from 64KB (on "lo") to 1460 in the common case.
      Suggested-by: Martin KaFai Lau <kafai@fb.com>
      Fixes: 45e4fd26 ("ipv6: Only create RTF_CACHE routes after encountering pmtu exception")
      Fixes: 7343ff31 ("ipv6: Don't create clones of host routes.")
      Signed-off-by: Julian Anastasov <ja@ssi.bg>
      Acked-by: David Ahern <dsahern@gmail.com>
      Acked-by: Martin KaFai Lau <kafai@fb.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      09757646
    • Alexander Duyck's avatar
      ixgbe: Fix bit definitions and add support for testing for ipsec support · 421d954c
      Alexander Duyck authored
      This patch addresses two issues. First it adds the correct bit definitions
      for the SECTXSTAT and SECRXSTAT registers. Then it makes use of those
      definitions to test whether IPsec has been disabled on the part; if so,
      we do not enable it.
      Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
      Reported-by: Andre Tomt <andre@tomt.net>
      Acked-by: Shannon Nelson <shannon.nelson@oracle.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
      421d954c
    • Alexander Duyck's avatar
      ixgbe: Avoid loopback and fix boolean logic in ipsec_stop_data · e9f655ee
      Alexander Duyck authored
      This patch fixes two issues. First we add an early test for the Tx and Rx
      security block ready bits. By doing this we can avoid the need for waits or
      loopback in the event that the security block is already flushed out.
      Secondly we fix the boolean logic that was testing for the Tx OR Rx ready
      bits being set and change it so that we only exit if the Tx AND Rx ready
      bits are both set.
      Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
      Acked-by: Shannon Nelson <shannon.nelson@oracle.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
      e9f655ee
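      The boolean-logic fix in the commit above comes down to the
      difference between "either ready bit is set" and "both ready bits are
      set". A small sketch, assuming one status word per direction; the bit
      positions TX_RDY and RX_RDY and both function names are illustrative
      and do not reflect the actual SECTXSTAT/SECRXSTAT layout.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative ready bits, one per status register. The real register
 * layout is not given in the commit message. */
#define TX_RDY (1u << 0)
#define RX_RDY (1u << 0)

/* Buggy variant: reports "done" as soon as either block is ready. */
int blocks_ready_either(uint32_t tx_stat, uint32_t rx_stat)
{
    return (tx_stat & TX_RDY) || (rx_stat & RX_RDY);
}

/* Fixed variant: reports "done" only when both blocks are ready. */
int blocks_ready_both(uint32_t tx_stat, uint32_t rx_stat)
{
    return (tx_stat & TX_RDY) && (rx_stat & RX_RDY);
}

void blocks_ready_demo(void)
{
    /* Tx is ready, Rx is still draining. */
    assert(blocks_ready_either(TX_RDY, 0) == 1); /* would exit too early */
    assert(blocks_ready_both(TX_RDY, 0) == 0);   /* keeps waiting */

    /* Both ready: the early test can skip the wait entirely. */
    assert(blocks_ready_both(TX_RDY, RX_RDY) == 1);
}
```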
    • Alexander Duyck's avatar
      ixgbe: Move ipsec init function to before reset call · de7a7e34
      Alexander Duyck authored
      This patch moves the IPsec init function in ixgbe_sw_init. This way it is a
      bit more consistent with the placement of similar initialization functions
      and is placed before the reset_hw call which should allow us to clean up
      any link issues that may be introduced by the fact that we force the link
      up if somehow the device had IPsec still enabled before the driver was
      loaded.
      
      In addition to the function move it is necessary to change the assignment
      of netdev->features. The easiest way to do this is to just test for the
      existence of adapter->ipsec and if it is present we set the feature bits.
      
      Fixes: 49a94d74 ("ixgbe: add ipsec engine start and stop routines")
      Reported-by: Andre Tomt <andre@tomt.net>
      Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
      Acked-by: Shannon Nelson <shannon.nelson@oracle.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
      de7a7e34
    • Alexander Duyck's avatar
      ixgbe: Use CONFIG_XFRM_OFFLOAD instead of CONFIG_XFRM · e433f3a5
      Alexander Duyck authored
      There is no point in adding code if CONFIG_XFRM is defined that we won't
      use unless CONFIG_XFRM_OFFLOAD is defined. So instead of leaving this code
      floating around I am replacing the ifdef with what I believe is the correct
      one so that we only include the code and variables if they will actually be
      used.
      Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
      Acked-by: Shannon Nelson <shannon.nelson@oracle.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
      e433f3a5
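      The effect of guarding code with the narrower config symbol, as the
      commit above describes, can be shown with ordinary preprocessor
      conditionals. The symbols CONFIG_BASE_FEATURE and
      CONFIG_BASE_FEATURE_OFFLOAD and struct adapter_sketch are hypothetical
      stand-ins, not the driver's real definitions.

```c
#include <assert.h>

/* Hypothetical config symbols. The base feature is enabled, the
 * offload part (the only user of the guarded code) is not. */
#define CONFIG_BASE_FEATURE 1
/* #define CONFIG_BASE_FEATURE_OFFLOAD 1 */

struct adapter_sketch {
    int id;
#ifdef CONFIG_BASE_FEATURE_OFFLOAD
    void *offload_state; /* compiled in only when offload is available */
#endif
};

/* Reports whether the offload path was compiled into this build. */
int adapter_has_offload(void)
{
#ifdef CONFIG_BASE_FEATURE_OFFLOAD
    return 1;
#else
    return 0;
#endif
}

void adapter_demo(void)
{
    /* With only the base symbol defined, no offload code or state is
     * carried around: the struct holds just its id. */
    assert(adapter_has_offload() == 0);
    assert(sizeof(struct adapter_sketch) == sizeof(int));
}
```

      Had the guards tested CONFIG_BASE_FEATURE instead, the unused member
      and code would have been built in even though nothing could use them.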
    • Alexander Duyck's avatar
      ixgbe: Fix setting of TC configuration for macvlan case · 646bb57c
      Alexander Duyck authored
      When we were enabling macvlan interfaces we weren't correctly configuring
      things until ixgbe_setup_tc was called a second time either by tweaking the
      number of queues or increasing the macvlan count past 15.
      
      The issue came down to the fact that num_rx_pools is not populated until
      after the queues and interrupts are reinitialized.
      
      Instead of trying to set it sooner we can just move the call to setup at
      least 1 traffic class to the SR-IOV/VMDq setup function so that we just set
      it for this one case. We already had a spot that was configuring the queues
      for TC 0 in the code here anyway so it makes sense to also set the number
      of TCs here as well.
      
      Fixes: 49cfbeb7 ("ixgbe: Fix handling of macvlan Tx offload")
      Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
      646bb57c