1. 11 Jun, 2019 30 commits
    • Yihao Wu's avatar
      NFSv4.1: Fix bug only first CB_NOTIFY_LOCK is handled · 11f08784
      Yihao Wu authored
      commit ba851a39 upstream.
      
      When a waiter is waked by CB_NOTIFY_LOCK, it will retry
      nfs4_proc_setlk(). The waiter may fail to nfs4_proc_setlk() and sleep
      again. However, the waiter is already removed from clp->cl_lock_waitq
      when handling CB_NOTIFY_LOCK in nfs4_wake_lock_waiter(). So any
      subsequent CB_NOTIFY_LOCK won't wake this waiter anymore. We should
      put the waiter back to clp->cl_lock_waitq before retrying.
      
      Cc: stable@vger.kernel.org #4.9+
      Signed-off-by: default avatarYihao Wu <wuyihao@linux.alibaba.com>
      Reviewed-by: default avatarJeff Layton <jlayton@kernel.org>
      Signed-off-by: default avatarAnna Schumaker <Anna.Schumaker@Netapp.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      11f08784
    • Yihao Wu's avatar
      NFSv4.1: Again fix a race where CB_NOTIFY_LOCK fails to wake a waiter · d1d8b49a
      Yihao Wu authored
      commit 52b042ab upstream.
      
      Commit b7dbcc0e "NFSv4.1: Fix a race where CB_NOTIFY_LOCK fails to wake a waiter"
      found this bug. However it didn't fix it.
      
      This commit replaces schedule_timeout() with wait_woken() and
      default_wake_function() with woken_wake_function() in function
      nfs4_retry_setlk() and nfs4_wake_lock_waiter(). wait_woken() uses
      memory barriers in its implementation to avoid potential race condition
      when putting a process into sleeping state and then waking it up.
      
      Fixes: a1d617d8 ("nfs: allow blocking locks to be awoken by lock callbacks")
      Cc: stable@vger.kernel.org #4.9+
      Signed-off-by: default avatarYihao Wu <wuyihao@linux.alibaba.com>
      Reviewed-by: default avatarJeff Layton <jlayton@kernel.org>
      Signed-off-by: default avatarAnna Schumaker <Anna.Schumaker@Netapp.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      d1d8b49a
    • Trond Myklebust's avatar
      SUNRPC: Fix a use after free when a server rejects the RPCSEC_GSS credential · 42b761f6
      Trond Myklebust authored
      commit 7987b694 upstream.
      
      The addition of rpc_check_timeout() to call_decode causes an Oops
      when the RPCSEC_GSS credential is rejected.
      The reason is that rpc_decode_header() will call xprt_release() in
      order to free task->tk_rqstp, which is needed by rpc_check_timeout()
      to check whether or not we should exit due to a soft timeout.
      
      The fix is to move the call to xprt_release() into call_decode() so
      we can perform it after rpc_check_timeout().
      Reported-by: default avatarOlga Kornievskaia <olga.kornievskaia@gmail.com>
      Reported-by: default avatarNick Bowler <nbowler@draconx.ca>
      Fixes: cea57789 ("SUNRPC: Clean up")
      Cc: stable@vger.kernel.org # v5.1+
      Signed-off-by: default avatarTrond Myklebust <trond.myklebust@hammerspace.com>
      Signed-off-by: default avatarAnna Schumaker <Anna.Schumaker@Netapp.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      42b761f6
    • Olga Kornievskaia's avatar
      SUNRPC fix regression in umount of a secure mount · ca416579
      Olga Kornievskaia authored
      commit ec6017d9 upstream.
      
      If call_status returns ENOTCONN, we need to re-establish the connection
      state after. Otherwise the client goes into an infinite loop of call_encode,
      call_transmit, call_status (ENOTCONN), call_encode.
      
      Fixes: c8485e4d ("SUNRPC: Handle ECONNREFUSED correctly in xprt_transmit()")
      Signed-off-by: default avatarOlga Kornievskaia <kolga@netapp.com>
      Cc: stable@vger.kernel.org # v2.6.29+
      Signed-off-by: default avatarAnna Schumaker <Anna.Schumaker@Netapp.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      ca416579
    • Helge Deller's avatar
      parisc: Fix crash due alternative coding for NP iopdir_fdc bit · d40905fa
      Helge Deller authored
      commit 527a1d1e upstream.
      
      According to the found documentation, data cache flushes and sync
      instructions are needed on the PCX-U+ (PA8200, e.g. C200/C240)
      platforms, while PCX-W (PA8500, e.g. C360) platforms aparently don't
      need those flushes when changing the IO PDIR data structures.
      
      We have no documentation for PCX-W+ (PA8600) and PCX-W2 (PA8700) CPUs,
      but Carlo Pisani reported that his C3600 machine (PA8600, PCX-W+) fails
      when the fdc instructions were removed. His firmware didn't set the NIOP
      bit, so one may assume it's a firmware bug since other C3750 machines
      had the bit set.
      
      Even if documentation (as mentioned above) states that PCX-W (PA8500,
      e.g.  J5000) does not need fdc flushes, Sven could show that an Adaptec
      29320A PCI-X SCSI controller reliably failed on a dd command during the
      first five minutes in his J5000 when fdc flushes were missing.
      
      Going forward, we will now NOT replace the fdc and sync assembler
      instructions by NOPS if:
      a) the NP iopdir_fdc bit was set by firmware, or
      b) we find a CPU up to and including a PCX-W+ (PA8600).
      
      This fixes the HPMC crashes on a C240 and C36XX machines. For other
      machines we rely on the firmware to set the bit when needed.
      
      In case one finds HPMC issues, people could try to boot their machines
      with the "no-alternatives" kernel option to turn off any alternative
      patching.
      Reported-by: default avatarSven Schnelle <svens@stackframe.org>
      Reported-by: default avatarCarlo Pisani <carlojpisani@gmail.com>
      Tested-by: default avatarSven Schnelle <svens@stackframe.org>
      Fixes: 3847dab7 ("parisc: Add alternative coding infrastructure")
      Signed-off-by: default avatarHelge Deller <deller@gmx.de>
      Cc: stable@vger.kernel.org # 5.0+
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      d40905fa
    • John David Anglin's avatar
      parisc: Use implicit space register selection for loading the coherence index of I/O pdirs · 5ca4fb12
      John David Anglin authored
      commit 63923d2c upstream.
      
      We only support I/O to kernel space. Using %sr1 to load the coherence
      index may be racy unless interrupts are disabled. This patch changes the
      code used to load the coherence index to use implicit space register
      selection. This saves one instruction and eliminates the race.
      
      Tested on rp3440, c8000 and c3750.
      Signed-off-by: default avatarJohn David Anglin <dave.anglin@bell.net>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarHelge Deller <deller@gmx.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      5ca4fb12
    • Eugeniy Paltsev's avatar
      ARC: mm: SIGSEGV userspace trying to access kernel virtual memory · 35f473d1
      Eugeniy Paltsev authored
      commit a8c715b4 upstream.
      
      As of today if userspace process tries to access a kernel virtual addres
      (0x7000_0000 to 0x7ffff_ffff) such that a legit kernel mapping already
      exists, that process hangs instead of being killed with SIGSEGV
      
      Fix that by ensuring that do_page_fault() handles kenrel vaddr only if
      in kernel mode.
      
      And given this, we can also simplify the code a bit. Now a vmalloc fault
      implies kernel mode so its failure (for some reason) can reuse the
      @no_context label and we can remove @bad_area_nosemaphore.
      
      Reproduce user test for original problem:
      
      ------------------------>8-----------------
       #include <stdlib.h>
       #include <stdint.h>
      
       int main(int argc, char *argv[])
       {
       	volatile uint32_t temp;
      
       	temp = *(uint32_t *)(0x70000000);
       }
      ------------------------>8-----------------
      
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarEugeniy Paltsev <Eugeniy.Paltsev@synopsys.com>
      Signed-off-by: default avatarVineet Gupta <vgupta@synopsys.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      35f473d1
    • Jann Horn's avatar
      habanalabs: fix debugfs code · de3cfea5
      Jann Horn authored
      commit 8438846c upstream.
      
      This fixes multiple things in the habanalabs debugfs code, in particular:
      
       - mmu_write() was unnecessarily verbose, copying around between multiple
         buffers
       - mmu_write() could write a user-specified, unbounded amount of userspace
         memory into a kernel buffer (out-of-bounds write)
       - multiple debugfs read handlers ignored the user-supplied count,
         potentially corrupting out-of-bounds userspace data
       - hl_device_read() was unnecessarily verbose
       - hl_device_write() could read uninitialized stack memory
       - multiple debugfs read handlers copied terminating null characters to
         userspace
      Signed-off-by: default avatarJann Horn <jannh@google.com>
      Reviewed-by: default avatarOded Gabbay <oded.gabbay@gmail.com>
      Signed-off-by: default avatarOded Gabbay <oded.gabbay@gmail.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      de3cfea5
    • Linus Torvalds's avatar
      rcu: locking and unlocking need to always be at least barriers · f4277a2b
      Linus Torvalds authored
      commit 66be4e66 upstream.
      
      Herbert Xu pointed out that commit bb73c52b ("rcu: Don't disable
      preemption for Tiny and Tree RCU readers") was incorrect in making the
      preempt_disable/enable() be conditional on CONFIG_PREEMPT_COUNT.
      
      If CONFIG_PREEMPT_COUNT isn't enabled, the preemption enable/disable is
      a no-op, but still is a compiler barrier.
      
      And RCU locking still _needs_ that compiler barrier.
      
      It is simply fundamentally not true that RCU locking would be a complete
      no-op: we still need to guarantee (for example) that things that can
      trap and cause preemption cannot migrate into the RCU locked region.
      
      The way we do that is by making it a barrier.
      
      See for example commit 386afc91 ("spinlocks and preemption points
      need to be at least compiler barriers") from back in 2013 that had
      similar issues with spinlocks that become no-ops on UP: they must still
      constrain the compiler from moving other operations into the critical
      region.
      
      Now, it is true that a lot of RCU operations already use READ_ONCE() and
      WRITE_ONCE() (which in practice likely would never be re-ordered wrt
      anything remotely interesting), but it is also true that that is not
      globally the case, and that it's not even necessarily always possible
      (ie bitfields etc).
      Reported-by: default avatarHerbert Xu <herbert@gondor.apana.org.au>
      Fixes: bb73c52b ("rcu: Don't disable preemption for Tiny and Tree RCU readers")
      Cc: stable@kernel.org
      Cc: Boqun Feng <boqun.feng@gmail.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f4277a2b
    • Jakub Kicinski's avatar
      net/tls: replace the sleeping lock around RX resync with a bit lock · 9749d62b
      Jakub Kicinski authored
      [ Upstream commit e52972c1 ]
      
      Commit 38030d7c ("net/tls: avoid NULL-deref on resync during device removal")
      tried to fix a potential NULL-dereference by taking the
      context rwsem.  Unfortunately the RX resync may get called
      from soft IRQ, so we can't use the rwsem to protect from
      the device disappearing.  Because we are guaranteed there
      can be only one resync at a time (it's called from strparser)
      use a bit to indicate resync is busy and make device
      removal wait for the bit to get cleared.
      
      Note that there is a leftover "flags" field in struct
      tls_context already.
      
      Fixes: 4799ac81 ("tls: Add rx inline crypto offload")
      Signed-off-by: default avatarJakub Kicinski <jakub.kicinski@netronome.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      9749d62b
    • Erez Alfasi's avatar
      net/mlx4_en: ethtool, Remove unsupported SFP EEPROM high pages query · e2465055
      Erez Alfasi authored
      [ Upstream commit 135dd959 ]
      
      Querying EEPROM high pages data for SFP module is currently
      not supported by our driver but is still tried, resulting in
      invalid FW queries.
      
      Set the EEPROM ethtool data length to 256 for SFP module to
      limit the reading for page 0 only and prevent invalid FW queries.
      
      Fixes: 7202da8b ("ethtool, net/mlx4_en: Cable info, get_module_info/eeprom ethtool support")
      Signed-off-by: default avatarErez Alfasi <ereza@mellanox.com>
      Signed-off-by: default avatarTariq Toukan <tariqt@mellanox.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      e2465055
    • David Ahern's avatar
      ipmr_base: Do not reset index in mr_table_dump · ad491eac
      David Ahern authored
      [ Upstream commit 7fcd1e03 ]
      
      e is the counter used to save the location of a dump when an
      skb is filled. Once the walk of the table is complete, mr_table_dump
      needs to return without resetting that index to 0. Dump of a specific
      table is looping because of the reset because there is no way to
      indicate the walk of the table is done.
      
      Move the reset to the caller so the dump of each table starts at 0,
      but the loop counter is maintained if a dump fills an skb.
      
      Fixes: e1cedae1 ("ipmr: Refactor mr_rtm_dumproute")
      Signed-off-by: default avatarDavid Ahern <dsahern@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      ad491eac
    • Matteo Croce's avatar
      cls_matchall: avoid panic when receiving a packet before filter set · cb6794b7
      Matteo Croce authored
      [ Upstream commit 25426043 ]
      
      When a matchall classifier is added, there is a small time interval in
      which tp->root is NULL. If we receive a packet in this small time slice
      a NULL pointer dereference will happen, leading to a kernel panic:
      
          # tc qdisc replace dev eth0 ingress
          # tc filter add dev eth0 parent ffff: matchall action gact drop
          Unable to handle kernel NULL pointer dereference at virtual address 0000000000000034
          Mem abort info:
            ESR = 0x96000005
            Exception class = DABT (current EL), IL = 32 bits
            SET = 0, FnV = 0
            EA = 0, S1PTW = 0
          Data abort info:
            ISV = 0, ISS = 0x00000005
            CM = 0, WnR = 0
          user pgtable: 4k pages, 39-bit VAs, pgdp = 00000000a623d530
          [0000000000000034] pgd=0000000000000000, pud=0000000000000000
          Internal error: Oops: 96000005 [#1] SMP
          Modules linked in: cls_matchall sch_ingress nls_iso8859_1 nls_cp437 vfat fat m25p80 spi_nor mtd xhci_plat_hcd xhci_hcd phy_generic sfp mdio_i2c usbcore i2c_mv64xxx marvell10g mvpp2 usb_common spi_orion mvmdio i2c_core sbsa_gwdt phylink ip_tables x_tables autofs4
          Process ksoftirqd/0 (pid: 9, stack limit = 0x0000000009de7d62)
          CPU: 0 PID: 9 Comm: ksoftirqd/0 Not tainted 5.1.0-rc6 #21
          Hardware name: Marvell 8040 MACCHIATOBin Double-shot (DT)
          pstate: 40000005 (nZcv daif -PAN -UAO)
          pc : mall_classify+0x28/0x78 [cls_matchall]
          lr : tcf_classify+0x78/0x138
          sp : ffffff80109db9d0
          x29: ffffff80109db9d0 x28: ffffffc426058800
          x27: 0000000000000000 x26: ffffffc425b0dd00
          x25: 0000000020000000 x24: 0000000000000000
          x23: ffffff80109dbac0 x22: 0000000000000001
          x21: ffffffc428ab5100 x20: ffffffc425b0dd00
          x19: ffffff80109dbac0 x18: 0000000000000000
          x17: 0000000000000000 x16: 0000000000000000
          x15: 0000000000000000 x14: 0000000000000000
          x13: ffffffbf108ad288 x12: dead000000000200
          x11: 00000000f0000000 x10: 0000000000000001
          x9 : ffffffbf1089a220 x8 : 0000000000000001
          x7 : ffffffbebffaa950 x6 : 0000000000000000
          x5 : 000000442d6ba000 x4 : 0000000000000000
          x3 : ffffff8008735ad8 x2 : ffffff80109dbac0
          x1 : ffffffc425b0dd00 x0 : ffffff8010592078
          Call trace:
           mall_classify+0x28/0x78 [cls_matchall]
           tcf_classify+0x78/0x138
           __netif_receive_skb_core+0x29c/0xa20
           __netif_receive_skb_one_core+0x34/0x60
           __netif_receive_skb+0x28/0x78
           netif_receive_skb_internal+0x2c/0xc0
           napi_gro_receive+0x1a0/0x1d8
           mvpp2_poll+0x928/0xb18 [mvpp2]
           net_rx_action+0x108/0x378
           __do_softirq+0x128/0x320
           run_ksoftirqd+0x44/0x60
           smpboot_thread_fn+0x168/0x1b0
           kthread+0x12c/0x130
           ret_from_fork+0x10/0x1c
          Code: aa0203f3 aa1e03e0 d503201f f9400684 (b9403480)
          ---[ end trace fc71e2ef7b8ab5a5 ]---
          Kernel panic - not syncing: Fatal exception in interrupt
          SMP: stopping secondary CPUs
          Kernel Offset: disabled
          CPU features: 0x002,00002000
          Memory Limit: none
          Rebooting in 1 seconds..
      
      Fix this by adding a NULL check in mall_classify().
      
      Fixes: ed76f5ed ("net: sched: protect filter_chain list with filter_chain_lock mutex")
      Signed-off-by: default avatarMatteo Croce <mcroce@redhat.com>
      Acked-by: default avatarCong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      cb6794b7
    • David Ahern's avatar
      neighbor: Call __ipv4_neigh_lookup_noref in neigh_xmit · a0c0f274
      David Ahern authored
      [ Upstream commit 4b2a2bfe ]
      
      Commit cd9ff4de changed the key for IFF_POINTOPOINT devices to
      INADDR_ANY but neigh_xmit which is used for MPLS encapsulations was not
      updated to use the altered key. The result is that every packet Tx does
      a lookup on the gateway address which does not find an entry, a new one
      is created only to find the existing one in the table right before the
      insert since arp_constructor was updated to reset the primary key. This
      is seen in the allocs and destroys counters:
          ip -s -4 ntable show | head -10 | grep alloc
      
      which increase for each packet showing the unnecessary overhread.
      
      Fix by having neigh_xmit use __ipv4_neigh_lookup_noref for NEIGH_ARP_TABLE.
      
      Fixes: cd9ff4de ("ipv4: Make neigh lookup keys for loopback/point-to-point devices be INADDR_ANY")
      Reported-by: default avatarAlan Maguire <alan.maguire@oracle.com>
      Signed-off-by: default avatarDavid Ahern <dsahern@gmail.com>
      Tested-by: default avatarAlan Maguire <alan.maguire@oracle.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      a0c0f274
    • David Ahern's avatar
      neighbor: Reset gc_entries counter if new entry is released before insert · 450b9d8e
      David Ahern authored
      [ Upstream commit 64c6f4bb ]
      
      Ian and Alan both reported seeing overflows after upgrades to 5.x kernels:
        neighbour: arp_cache: neighbor table overflow!
      
      Alan's mpls script helped get to the bottom of this bug. When a new entry
      is created the gc_entries counter is bumped in neigh_alloc to check if a
      new one is allowed to be created. ___neigh_create then searches for an
      existing entry before inserting the just allocated one. If an entry
      already exists, the new one is dropped in favor of the existing one. In
      this case the cleanup path needs to drop the gc_entries counter. There
      is no memory leak, only a counter leak.
      
      Fixes: 58956317 ("neighbor: Improve garbage collection")
      Reported-by: default avatarIan Kumlien <ian.kumlien@gmail.com>
      Reported-by: default avatarAlan Maguire <alan.maguire@oracle.com>
      Signed-off-by: default avatarDavid Ahern <dsahern@gmail.com>
      Tested-by: default avatarAlan Maguire <alan.maguire@oracle.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      450b9d8e
    • Nikita Danilov's avatar
      net: aquantia: fix wol configuration not applied sometimes · 713f03fa
      Nikita Danilov authored
      [ Upstream commit 930b9a05 ]
      
      WoL magic packet configuration sometimes does not work due to
      couple of leakages found.
      
      Mainly there was a regression introduced during readx_poll refactoring.
      
      Next, fw request waiting time was too small. Sometimes that
      caused sleep proxy config function to return with an error
      and to skip WoL configuration.
      At last, WoL data were passed to FW from not clean buffer.
      That could cause FW to accept garbage as a random configuration data.
      
      Fixes: 6a7f2277 ("net: aquantia: replace AQ_HW_WAIT_FOR with readx_poll_timeout_atomic")
      Signed-off-by: default avatarNikita Danilov <nikita.danilov@aquantia.com>
      Signed-off-by: default avatarIgor Russkikh <igor.russkikh@aquantia.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      713f03fa
    • Olivier Matz's avatar
      ipv6: fix EFAULT on sendto with icmpv6 and hdrincl · cba55c16
      Olivier Matz authored
      [ Upstream commit b9aa52c4 ]
      
      The following code returns EFAULT (Bad address):
      
        s = socket(AF_INET6, SOCK_RAW, IPPROTO_ICMPV6);
        setsockopt(s, SOL_IPV6, IPV6_HDRINCL, 1);
        sendto(ipv6_icmp6_packet, addr);   /* returns -1, errno = EFAULT */
      
      The IPv4 equivalent code works. A workaround is to use IPPROTO_RAW
      instead of IPPROTO_ICMPV6.
      
      The failure happens because 2 bytes are eaten from the msghdr by
      rawv6_probe_proto_opt() starting from commit 19e3c66b ("ipv6
      equivalent of "ipv4: Avoid reading user iov twice after
      raw_probe_proto_opt""), but at that time it was not a problem because
      IPV6_HDRINCL was not yet introduced.
      
      Only eat these 2 bytes if hdrincl == 0.
      
      Fixes: 715f504b ("ipv6: add IPV6_HDRINCL option for raw sockets")
      Signed-off-by: default avatarOlivier Matz <olivier.matz@6wind.com>
      Acked-by: default avatarNicolas Dichtel <nicolas.dichtel@6wind.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      cba55c16
    • Olivier Matz's avatar
      ipv6: use READ_ONCE() for inet->hdrincl as in ipv4 · c64c725d
      Olivier Matz authored
      [ Upstream commit 59e3e4b5 ]
      
      As it was done in commit 8f659a03 ("net: ipv4: fix for a race
      condition in raw_sendmsg") and commit 20b50d79 ("net: ipv4: emulate
      READ_ONCE() on ->hdrincl bit-field in raw_sendmsg()") for ipv4, copy the
      value of inet->hdrincl in a local variable, to avoid introducing a race
      condition in the next commit.
      Signed-off-by: default avatarOlivier Matz <olivier.matz@6wind.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      c64c725d
    • Tim Beale's avatar
      udp: only choose unbound UDP socket for multicast when not in a VRF · 43ec962d
      Tim Beale authored
      [ Upstream commit 82ba25c6 ]
      
      By default, packets received in another VRF should not be passed to an
      unbound socket in the default VRF. This patch updates the IPv4 UDP
      multicast logic to match the unicast VRF logic (in compute_score()),
      as well as the IPv6 mcast logic (in __udp_v6_is_mcast_sock()).
      
      The particular case I noticed was DHCP discover packets going
      to the 255.255.255.255 address, which are handled by
      __udp4_lib_mcast_deliver(). The previous code meant that running
      multiple different DHCP server or relay agent instances across VRFs
      did not work correctly - any server/relay agent in the default VRF
      received DHCP discover packets for all other VRFs.
      
      Fixes: 6da5b0f0 ("net: ensure unbound datagram socket to be chosen when not in a VRF")
      Signed-off-by: default avatarTim Beale <timbeale@catalyst.net.nz>
      Reviewed-by: default avatarDavid Ahern <dsahern@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      43ec962d
    • Hangbin Liu's avatar
      Revert "fib_rules: return 0 directly if an exactly same rule exists when NLM_F_EXCL not supplied" · 7c3234ec
      Hangbin Liu authored
      [ Upstream commit 4970b42d ]
      
      This reverts commit e9919a24.
      
      Nathan reported the new behaviour breaks Android, as Android just add
      new rules and delete old ones.
      
      If we return 0 without adding dup rules, Android will remove the new
      added rules and causing system to soft-reboot.
      
      Fixes: e9919a24 ("fib_rules: return 0 directly if an exactly same rule exists when NLM_F_EXCL not supplied")
      Reported-by: default avatarNathan Chancellor <natechancellor@gmail.com>
      Reported-by: default avatarYaro Slav <yaro330@gmail.com>
      Reported-by: default avatarMaciej Żenczykowski <zenczykowski@gmail.com>
      Signed-off-by: default avatarHangbin Liu <liuhangbin@gmail.com>
      Reviewed-by: default avatarNathan Chancellor <natechancellor@gmail.com>
      Tested-by: default avatarNathan Chancellor <natechancellor@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      7c3234ec
    • Paolo Abeni's avatar
      pktgen: do not sleep with the thread lock held. · 7b34227e
      Paolo Abeni authored
      [ Upstream commit 720f1de4 ]
      
      Currently, the process issuing a "start" command on the pktgen procfs
      interface, acquires the pktgen thread lock and never release it, until
      all pktgen threads are completed. The above can blocks indefinitely any
      other pktgen command and any (even unrelated) netdevice removal - as
      the pktgen netdev notifier acquires the same lock.
      
      The issue is demonstrated by the following script, reported by Matteo:
      
      ip -b - <<'EOF'
      	link add type dummy
      	link add type veth
      	link set dummy0 up
      EOF
      modprobe pktgen
      echo reset >/proc/net/pktgen/pgctrl
      {
      	echo rem_device_all
      	echo add_device dummy0
      } >/proc/net/pktgen/kpktgend_0
      echo count 0 >/proc/net/pktgen/dummy0
      echo start >/proc/net/pktgen/pgctrl &
      sleep 1
      rmmod veth
      
      Fix the above releasing the thread lock around the sleep call.
      
      Additionally we must prevent racing with forcefull rmmod - as the
      thread lock no more protects from them. Instead, acquire a self-reference
      before waiting for any thread. As a side effect, running
      
      rmmod pktgen
      
      while some thread is running now fails with "module in use" error,
      before this patch such command hanged indefinitely.
      
      Note: the issue predates the commit reported in the fixes tag, but
      this fix can't be applied before the mentioned commit.
      
      v1 -> v2:
       - no need to check for thread existence after flipping the lock,
         pktgen threads are freed only at net exit time
       -
      
      Fixes: 6146e6a4 ("[PKTGEN]: Removes thread_{un,}lock() macros.")
      Reported-and-tested-by: default avatarMatteo Croce <mcroce@redhat.com>
      Signed-off-by: default avatarPaolo Abeni <pabeni@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      7b34227e
    • Willem de Bruijn's avatar
      packet: unconditionally free po->rollover · f2bba1f6
      Willem de Bruijn authored
      [ Upstream commit afa0925c ]
      
      Rollover used to use a complex RCU mechanism for assignment, which had
      a race condition. The below patch fixed the bug and greatly simplified
      the logic.
      
      The feature depends on fanout, but the state is private to the socket.
      Fanout_release returns f only when the last member leaves and the
      fanout struct is to be freed.
      
      Destroy rollover unconditionally, regardless of fanout state.
      
      Fixes: 57f015f5 ("packet: fix crash in fanout_demux_rollover()")
      Reported-by: default avatarsyzbot <syzkaller@googlegroups.com>
      Diagnosed-by: default avatarDmitry Vyukov <dvyukov@google.com>
      Signed-off-by: default avatarWillem de Bruijn <willemb@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f2bba1f6
    • Russell King's avatar
      net: sfp: read eeprom in maximum 16 byte increments · bab4057c
      Russell King authored
      [ Upstream commit 28e74a7c ]
      
      Some SFP modules do not like reads longer than 16 bytes, so read the
      EEPROM in chunks of 16 bytes at a time.  This behaviour is not specified
      in the SFP MSAs, which specifies:
      
       "The serial interface uses the 2-wire serial CMOS E2PROM protocol
        defined for the ATMEL AT24C01A/02/04 family of components."
      
      and
      
       "As long as the SFP+ receives an acknowledge, it shall serially clock
        out sequential data words. The sequence is terminated when the host
        responds with a NACK and a STOP instead of an acknowledge."
      
      We must avoid breaking a read across a 16-bit quantity in the diagnostic
      page, thankfully all 16-bit quantities in that page are naturally
      aligned.
      Signed-off-by: default avatarRussell King <rmk+kernel@armlinux.org.uk>
      Reviewed-by: default avatarAndrew Lunn <andrew@lunn.ch>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      bab4057c
    • Zhu Yanjun's avatar
      net: rds: fix memory leak in rds_ib_flush_mr_pool · 023998c9
      Zhu Yanjun authored
      [ Upstream commit 85cb9287 ]
      
      When the following tests last for several hours, the problem will occur.
      
      Server:
          rds-stress -r 1.1.1.16 -D 1M
      Client:
          rds-stress -r 1.1.1.14 -s 1.1.1.16 -D 1M -T 30
      
      The following will occur.
      
      "
      Starting up....
      tsks   tx/s   rx/s  tx+rx K/s    mbi K/s    mbo K/s tx us/c   rtt us cpu
      %
        1      0      0       0.00       0.00       0.00    0.00 0.00 -1.00
        1      0      0       0.00       0.00       0.00    0.00 0.00 -1.00
        1      0      0       0.00       0.00       0.00    0.00 0.00 -1.00
        1      0      0       0.00       0.00       0.00    0.00 0.00 -1.00
      "
      >From vmcore, we can find that clean_list is NULL.
      
      >From the source code, rds_mr_flushd calls rds_ib_mr_pool_flush_worker.
      Then rds_ib_mr_pool_flush_worker calls
      "
       rds_ib_flush_mr_pool(pool, 0, NULL);
      "
      Then in function
      "
      int rds_ib_flush_mr_pool(struct rds_ib_mr_pool *pool,
                               int free_all, struct rds_ib_mr **ibmr_ret)
      "
      ibmr_ret is NULL.
      
      In the source code,
      "
      ...
      list_to_llist_nodes(pool, &unmap_list, &clean_nodes, &clean_tail);
      if (ibmr_ret)
              *ibmr_ret = llist_entry(clean_nodes, struct rds_ib_mr, llnode);
      
      /* more than one entry in llist nodes */
      if (clean_nodes->next)
              llist_add_batch(clean_nodes->next, clean_tail, &pool->clean_list);
      ...
      "
      When ibmr_ret is NULL, llist_entry is not executed. clean_nodes->next
      instead of clean_nodes is added in clean_list.
      So clean_nodes is discarded. It can not be used again.
      The workqueue is executed periodically. So more and more clean_nodes are
      discarded. Finally the clean_list is NULL.
      Then this problem will occur.
      
      Fixes: 1bc144b6 ("net, rds, Replace xlist in net/rds/xlist.h with llist")
      Signed-off-by: default avatarZhu Yanjun <yanjun.zhu@oracle.com>
      Acked-by: default avatarSantosh Shilimkar <santosh.shilimkar@oracle.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      023998c9
    • Maxime Chevallier's avatar
      net: mvpp2: Use strscpy to handle stat strings · 9d659f5c
      Maxime Chevallier authored
      [ Upstream commit d37acd5a ]
      
      Use a safe strscpy call to copy the ethtool stat strings into the
      relevant buffers, instead of a memcpy that will be accessing
      out-of-bound data.
      
      Fixes: 118d6298 ("net: mvpp2: add ethtool GOP statistics")
      Signed-off-by: default avatarMaxime Chevallier <maxime.chevallier@bootlin.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      9d659f5c
    • Ivan Khoronzhuk's avatar
      net: ethernet: ti: cpsw_ethtool: fix ethtool ring param set · ae4133df
      Ivan Khoronzhuk authored
      [ Upstream commit 09faf5a7 ]
      
      Fix ability to set RX descriptor number, the reason - initially
      "tx_max_pending" was set incorrectly, but the issue appears after
      adding sanity check, so fix is for "sanity" patch.
      
      Fixes: 37e2d99b ("ethtool: Ensure new ring parameters are within bounds during SRINGPARAM")
      Signed-off-by: default avatarIvan Khoronzhuk <ivan.khoronzhuk@linaro.org>
      Reviewed-by: default avatarGrygorii Strashko <grygorii.strashko@ti.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      ae4133df
    • Xin Long's avatar
      ipv6: fix the check before getting the cookie in rt6_get_cookie · f5622df0
      Xin Long authored
      [ Upstream commit b7999b07 ]
      
      In Jianlin's testing, netperf was broken with 'Connection reset by peer',
      as the cookie check failed in rt6_check() and ip6_dst_check() always
      returned NULL.
      
      It's caused by Commit 93531c67 ("net/ipv6: separate handling of FIB
      entries from dst based routes"), where the cookie can be got only when
      'c1'(see below) for setting dst_cookie whereas rt6_check() is called
      when !'c1' for checking dst_cookie, as we can see in ip6_dst_check().
      
      Since in ip6_dst_check() both rt6_dst_from_check() (c1) and rt6_check()
      (!c1) will check the 'from' cookie, this patch is to remove the c1 check
      in rt6_get_cookie(), so that the dst_cookie can always be set properly.
      
      c1:
        (rt->rt6i_flags & RTF_PCPU || unlikely(!list_empty(&rt->rt6i_uncached)))
      
      Fixes: 93531c67 ("net/ipv6: separate handling of FIB entries from dst based routes")
      Reported-by: default avatarJianlin Shi <jishi@redhat.com>
      Signed-off-by: default avatarXin Long <lucien.xin@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f5622df0
    • Xin Long's avatar
      ipv4: not do cache for local delivery if bc_forwarding is enabled · b470bcc4
      Xin Long authored
      [ Upstream commit 0a90478b ]
      
      With the topo:
      
          h1 ---| rp1            |
                |     route  rp3 |--- h3 (192.168.200.1)
          h2 ---| rp2            |
      
      If rp1 bc_forwarding is set while rp2 bc_forwarding is not, after
      doing "ping 192.168.200.255" on h1, then ping 192.168.200.255 on
      h2, and the packets can still be forwared.
      
      This issue was caused by the input route cache. It should only do
      the cache for either bc forwarding or local delivery. Otherwise,
      local delivery can use the route cache for bc forwarding of other
      interfaces.
      
      This patch is to fix it by not doing cache for local delivery if
      all.bc_forwarding is enabled.
      
      Note that we don't fix it by checking route cache local flag after
      rt_cache_valid() in "local_input:" and "ip_mkroute_input", as the
      common route code shouldn't be touched for bc_forwarding.
      
      Fixes: 5cbf777c ("route: add support for directed broadcast forwarding")
      Reported-by: default avatarJianlin Shi <jishi@redhat.com>
      Signed-off-by: default avatarXin Long <lucien.xin@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      b470bcc4
    • Neil Horman's avatar
      Fix memory leak in sctp_process_init · 8bcd5209
      Neil Horman authored
      [ Upstream commit 0a8dd9f6 ]
      
      syzbot found the following leak in sctp_process_init
      BUG: memory leak
      unreferenced object 0xffff88810ef68400 (size 1024):
        comm "syz-executor273", pid 7046, jiffies 4294945598 (age 28.770s)
        hex dump (first 32 bytes):
          1d de 28 8d de 0b 1b e3 b5 c2 f9 68 fd 1a 97 25  ..(........h...%
          00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
        backtrace:
          [<00000000a02cebbd>] kmemleak_alloc_recursive include/linux/kmemleak.h:55
      [inline]
          [<00000000a02cebbd>] slab_post_alloc_hook mm/slab.h:439 [inline]
          [<00000000a02cebbd>] slab_alloc mm/slab.c:3326 [inline]
          [<00000000a02cebbd>] __do_kmalloc mm/slab.c:3658 [inline]
          [<00000000a02cebbd>] __kmalloc_track_caller+0x15d/0x2c0 mm/slab.c:3675
          [<000000009e6245e6>] kmemdup+0x27/0x60 mm/util.c:119
          [<00000000dfdc5d2d>] kmemdup include/linux/string.h:432 [inline]
          [<00000000dfdc5d2d>] sctp_process_init+0xa7e/0xc20
      net/sctp/sm_make_chunk.c:2437
          [<00000000b58b62f8>] sctp_cmd_process_init net/sctp/sm_sideeffect.c:682
      [inline]
          [<00000000b58b62f8>] sctp_cmd_interpreter net/sctp/sm_sideeffect.c:1384
      [inline]
          [<00000000b58b62f8>] sctp_side_effects net/sctp/sm_sideeffect.c:1194
      [inline]
          [<00000000b58b62f8>] sctp_do_sm+0xbdc/0x1d60 net/sctp/sm_sideeffect.c:1165
          [<0000000044e11f96>] sctp_assoc_bh_rcv+0x13c/0x200
      net/sctp/associola.c:1074
          [<00000000ec43804d>] sctp_inq_push+0x7f/0xb0 net/sctp/inqueue.c:95
          [<00000000726aa954>] sctp_backlog_rcv+0x5e/0x2a0 net/sctp/input.c:354
          [<00000000d9e249a8>] sk_backlog_rcv include/net/sock.h:950 [inline]
          [<00000000d9e249a8>] __release_sock+0xab/0x110 net/core/sock.c:2418
          [<00000000acae44fa>] release_sock+0x37/0xd0 net/core/sock.c:2934
          [<00000000963cc9ae>] sctp_sendmsg+0x2c0/0x990 net/sctp/socket.c:2122
          [<00000000a7fc7565>] inet_sendmsg+0x64/0x120 net/ipv4/af_inet.c:802
          [<00000000b732cbd3>] sock_sendmsg_nosec net/socket.c:652 [inline]
          [<00000000b732cbd3>] sock_sendmsg+0x54/0x70 net/socket.c:671
          [<00000000274c57ab>] ___sys_sendmsg+0x393/0x3c0 net/socket.c:2292
          [<000000008252aedb>] __sys_sendmsg+0x80/0xf0 net/socket.c:2330
          [<00000000f7bf23d1>] __do_sys_sendmsg net/socket.c:2339 [inline]
          [<00000000f7bf23d1>] __se_sys_sendmsg net/socket.c:2337 [inline]
          [<00000000f7bf23d1>] __x64_sys_sendmsg+0x23/0x30 net/socket.c:2337
          [<00000000a8b4131f>] do_syscall_64+0x76/0x1a0 arch/x86/entry/common.c:3
      
      The problem was that the peer.cookie value points to an skb allocated
      area on the first pass through this function, at which point it is
      overwritten with a heap allocated value, but in certain cases, where a
      COOKIE_ECHO chunk is included in the packet, a second pass through
      sctp_process_init is made, where the cookie value is re-allocated,
      leaking the first allocation.
      
      Fix is to always allocate the cookie value, and free it when we are done
      using it.
      Signed-off-by: default avatarNeil Horman <nhorman@tuxdriver.com>
      Reported-by: syzbot+f7e9153b037eac9b1df8@syzkaller.appspotmail.com
      CC: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      CC: "David S. Miller" <davem@davemloft.net>
      CC: netdev@vger.kernel.org
      Acked-by: default avatarMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      8bcd5209
    • Vivien Didelot's avatar
      ethtool: fix potential userspace buffer overflow · 698d9dc3
      Vivien Didelot authored
      [ Upstream commit 0ee4e769 ]
      
      ethtool_get_regs() allocates a buffer of size ops->get_regs_len(),
      and pass it to the kernel driver via ops->get_regs() for filling.
      
      There is no restriction about what the kernel drivers can or cannot do
      with the open ethtool_regs structure. They usually set regs->version
      and ignore regs->len or set it to the same size as ops->get_regs_len().
      
      But if userspace allocates a smaller buffer for the registers dump,
      we would cause a userspace buffer overflow in the final copy_to_user()
      call, which uses the regs.len value potentially reset by the driver.
      
      To fix this, make this case obvious and store regs.len before calling
      ops->get_regs(), to only copy as much data as requested by userspace,
      up to the value returned by ops->get_regs_len().
      
      While at it, remove the redundant check for non-null regbuf.
      Signed-off-by: default avatarVivien Didelot <vivien.didelot@gmail.com>
      Reviewed-by: default avatarMichal Kubecek <mkubecek@suse.cz>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      698d9dc3
  2. 09 Jun, 2019 10 commits