1. 04 Jan, 2020 10 commits
  2. 21 Dec, 2019 30 commits
    • Greg Kroah-Hartman's avatar
      Linux 4.9.207 · 5b7a2c7d
      Greg Kroah-Hartman authored
      5b7a2c7d
    • Aaro Koskinen's avatar
      net: stmmac: don't stop NAPI processing when dropping a packet · 33457946
      Aaro Koskinen authored
      commit 07b39753 upstream.
      
      Currently, if we drop a packet, we exit from NAPI loop before the budget
      is consumed. In some situations this will make the RX processing stall
      e.g. when flood pinging the system with oversized packets, as the
      errorneous packets are not dropped efficiently.
      
      If we drop a packet, we should just continue to the next one as long as
      the budget allows.
      Signed-off-by: default avatarAaro Koskinen <aaro.koskinen@nokia.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      [acj: backport v4.9 -stable
      -adjust context]
      Signed-off-by: default avatarAviraj CJ <acj@cisco.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      33457946
    • Aaro Koskinen's avatar
      net: stmmac: use correct DMA buffer size in the RX descriptor · 7e3efae8
      Aaro Koskinen authored
      commit 583e6361 upstream.
      
      We always program the maximum DMA buffer size into the receive descriptor,
      although the allocated size may be less. E.g. with the default MTU size
      we allocate only 1536 bytes. If somebody sends us a bigger frame, then
      memory may get corrupted.
      
      Fix by using exact buffer sizes.
      Signed-off-by: default avatarAaro Koskinen <aaro.koskinen@nokia.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      [acj: backport to v4.9 -stable :
      - adjust context
      - skipped the section modifying non-existent functions in dwxgmac2_descs.c and
      hwif.h ]
      Signed-off-by: default avatarAviraj CJ <acj@cisco.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      7e3efae8
    • Mathias Nyman's avatar
      xhci: fix USB3 device initiated resume race with roothub autosuspend · d93f4bce
      Mathias Nyman authored
      commit 057d476f upstream.
      
      A race in xhci USB3 remote wake handling may force device back to suspend
      after it initiated resume siganaling, causing a missed resume event or warm
      reset of device.
      
      When a USB3 link completes resume signaling and goes to enabled (UO)
      state a interrupt is issued and the interrupt handler will clear the
      bus_state->port_remote_wakeup resume flag, allowing bus suspend.
      
      If the USB3 roothub thread just finished reading port status before
      the interrupt, finding ports still in suspended (U3) state, but hasn't
      yet started suspending the hub, then the xhci interrupt handler will clear
      the flag that prevented roothub suspend and allow bus to suspend, forcing
      all port links back to suspended (U3) state.
      
      Example case:
      usb_runtime_suspend() # because all ports still show suspended U3
        usb_suspend_both()
          hub_suspend();   # successful as hub->wakeup_bits not set yet
      ==> INTERRUPT
      xhci_irq()
        handle_port_status()
          clear bus_state->port_remote_wakeup
          usb_wakeup_notification()
            sets hub->wakeup_bits;
              kick_hub_wq()
      <== END INTERRUPT
            hcd_bus_suspend()
              xhci_bus_suspend() # success as port_remote_wakeup bits cleared
      
      Fix this by increasing roothub usage count during port resume to prevent
      roothub autosuspend, and by making sure bus_state->port_remote_wakeup
      flag is only cleared after resume completion is visible, i.e.
      after xhci roothub returned U0 or other non-U3 link state link on a
      get port status request.
      
      Issue rootcaused by Chiasheng Lee
      
      Cc: <stable@vger.kernel.org>
      Cc: Lee, Hou-hsun <hou-hsun.lee@intel.com>
      Reported-by: default avatarLee, Chiasheng <chiasheng.lee@intel.com>
      Signed-off-by: default avatarMathias Nyman <mathias.nyman@linux.intel.com>
      Link: https://lore.kernel.org/r/20191211142007.8847-3-mathias.nyman@linux.intel.comSigned-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      
      d93f4bce
    • Alex Deucher's avatar
      drm/radeon: fix r1xx/r2xx register checker for POT textures · 76829059
      Alex Deucher authored
      commit 008037d4 upstream.
      
      Shift and mask were reversed.  Noticed by chance.
      Tested-by: default avatarMeelis Roos <mroos@linux.ee>
      Reviewed-by: default avatarMichel Dänzer <mdaenzer@redhat.com>
      Signed-off-by: default avatarAlex Deucher <alexander.deucher@amd.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      76829059
    • Bart Van Assche's avatar
      scsi: iscsi: Fix a potential deadlock in the timeout handler · ebb8c240
      Bart Van Assche authored
      commit 5480e299 upstream.
      
      Some time ago the block layer was modified such that timeout handlers are
      called from thread context instead of interrupt context. Make it safe to
      run the iSCSI timeout handler in thread context. This patch fixes the
      following lockdep complaint:
      
      ================================
      WARNING: inconsistent lock state
      5.5.1-dbg+ #11 Not tainted
      --------------------------------
      inconsistent {IN-SOFTIRQ-W} -> {SOFTIRQ-ON-W} usage.
      kworker/7:1H/206 [HC0[0]:SC0[0]:HE1:SE1] takes:
      ffff88802d9827e8 (&(&session->frwd_lock)->rlock){+.?.}, at: iscsi_eh_cmd_timed_out+0xa6/0x6d0 [libiscsi]
      {IN-SOFTIRQ-W} state was registered at:
        lock_acquire+0x106/0x240
        _raw_spin_lock+0x38/0x50
        iscsi_check_transport_timeouts+0x3e/0x210 [libiscsi]
        call_timer_fn+0x132/0x470
        __run_timers.part.0+0x39f/0x5b0
        run_timer_softirq+0x63/0xc0
        __do_softirq+0x12d/0x5fd
        irq_exit+0xb3/0x110
        smp_apic_timer_interrupt+0x131/0x3d0
        apic_timer_interrupt+0xf/0x20
        default_idle+0x31/0x230
        arch_cpu_idle+0x13/0x20
        default_idle_call+0x53/0x60
        do_idle+0x38a/0x3f0
        cpu_startup_entry+0x24/0x30
        start_secondary+0x222/0x290
        secondary_startup_64+0xa4/0xb0
      irq event stamp: 1383705
      hardirqs last  enabled at (1383705): [<ffffffff81aace5c>] _raw_spin_unlock_irq+0x2c/0x50
      hardirqs last disabled at (1383704): [<ffffffff81aacb98>] _raw_spin_lock_irq+0x18/0x50
      softirqs last  enabled at (1383690): [<ffffffffa0e2efea>] iscsi_queuecommand+0x76a/0xa20 [libiscsi]
      softirqs last disabled at (1383682): [<ffffffffa0e2e998>] iscsi_queuecommand+0x118/0xa20 [libiscsi]
      
      other info that might help us debug this:
       Possible unsafe locking scenario:
      
             CPU0
             ----
        lock(&(&session->frwd_lock)->rlock);
        <Interrupt>
          lock(&(&session->frwd_lock)->rlock);
      
       *** DEADLOCK ***
      
      2 locks held by kworker/7:1H/206:
       #0: ffff8880d57bf928 ((wq_completion)kblockd){+.+.}, at: process_one_work+0x472/0xab0
       #1: ffff88802b9c7de8 ((work_completion)(&q->timeout_work)){+.+.}, at: process_one_work+0x476/0xab0
      
      stack backtrace:
      CPU: 7 PID: 206 Comm: kworker/7:1H Not tainted 5.5.1-dbg+ #11
      Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
      Workqueue: kblockd blk_mq_timeout_work
      Call Trace:
       dump_stack+0xa5/0xe6
       print_usage_bug.cold+0x232/0x23b
       mark_lock+0x8dc/0xa70
       __lock_acquire+0xcea/0x2af0
       lock_acquire+0x106/0x240
       _raw_spin_lock+0x38/0x50
       iscsi_eh_cmd_timed_out+0xa6/0x6d0 [libiscsi]
       scsi_times_out+0xf4/0x440 [scsi_mod]
       scsi_timeout+0x1d/0x20 [scsi_mod]
       blk_mq_check_expired+0x365/0x3a0
       bt_iter+0xd6/0xf0
       blk_mq_queue_tag_busy_iter+0x3de/0x650
       blk_mq_timeout_work+0x1af/0x380
       process_one_work+0x56d/0xab0
       worker_thread+0x7a/0x5d0
       kthread+0x1bc/0x210
       ret_from_fork+0x24/0x30
      
      Fixes: 287922eb ("block: defer timeouts to a workqueue")
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Keith Busch <keith.busch@intel.com>
      Cc: Lee Duncan <lduncan@suse.com>
      Cc: Chris Leech <cleech@redhat.com>
      Cc: <stable@vger.kernel.org>
      Link: https://lore.kernel.org/r/20191209173457.187370-1-bvanassche@acm.orgSigned-off-by: default avatarBart Van Assche <bvanassche@acm.org>
      Reviewed-by: default avatarLee Duncan <lduncan@suse.com>
      Signed-off-by: default avatarMartin K. Petersen <martin.petersen@oracle.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      ebb8c240
    • Hou Tao's avatar
      dm btree: increase rebalance threshold in __rebalance2() · e861c9bc
      Hou Tao authored
      commit 474e5595 upstream.
      
      We got the following warnings from thin_check during thin-pool setup:
      
        $ thin_check /dev/vdb
        examining superblock
        examining devices tree
          missing devices: [1, 84]
            too few entries in btree_node: 41, expected at least 42 (block 138, max_entries = 126)
        examining mapping tree
      
      The phenomenon is the number of entries in one node of details_info tree is
      less than (max_entries / 3). And it can be easily reproduced by the following
      procedures:
      
        $ new a thin pool
        $ presume the max entries of details_info tree is 126
        $ new 127 thin devices (e.g. 1~127) to make the root node being full
          and then split
        $ remove the first 43 (e.g. 1~43) thin devices to make the children
          reblance repeatedly
        $ stop the thin pool
        $ thin_check
      
      The root cause is that the B-tree removal procedure in __rebalance2()
      doesn't guarantee the invariance: the minimal number of entries in
      non-root node should be >= (max_entries / 3).
      
      Simply fix the problem by increasing the rebalance threshold to
      make sure the number of entries in each child will be greater
      than or equal to (max_entries / 3 + 1), so no matter which
      child is used for removal, the number will still be valid.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarHou Tao <houtao1@huawei.com>
      Acked-by: default avatarJoe Thornber <ejt@redhat.com>
      Signed-off-by: default avatarMike Snitzer <snitzer@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      e861c9bc
    • Navid Emamdoost's avatar
      dma-buf: Fix memory leak in sync_file_merge() · bdca5750
      Navid Emamdoost authored
      commit 6645d42d upstream.
      
      In the implementation of sync_file_merge() the allocated sync_file is
      leaked if number of fences overflows. Release sync_file by goto err.
      
      Fixes: a02b9dc9 ("dma-buf/sync_file: refactor fence storage in struct sync_file")
      Signed-off-by: default avatarNavid Emamdoost <navid.emamdoost@gmail.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarDaniel Vetter <daniel.vetter@ffwll.ch>
      Link: https://patchwork.freedesktop.org/patch/msgid/20191122220957.30427-1-navid.emamdoost@gmail.comSigned-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      bdca5750
    • Jiang Yi's avatar
      vfio/pci: call irq_bypass_unregister_producer() before freeing irq · 017df5ed
      Jiang Yi authored
      commit d567fb88 upstream.
      
      Since irq_bypass_register_producer() is called after request_irq(), we
      should do tear-down in reverse order: irq_bypass_unregister_producer()
      then free_irq().
      
      Specifically free_irq() may release resources required by the
      irqbypass del_producer() callback.  Notably an example provided by
      Marc Zyngier on arm64 with GICv4 that he indicates has the potential
      to wedge the hardware:
      
       free_irq(irq)
         __free_irq(irq)
           irq_domain_deactivate_irq(irq)
             its_irq_domain_deactivate()
               [unmap the VLPI from the ITS]
      
       kvm_arch_irq_bypass_del_producer(cons, prod)
         kvm_vgic_v4_unset_forwarding(kvm, irq, ...)
           its_unmap_vlpi(irq)
             [Unmap the VLPI from the ITS (again), remap the original LPI]
      Signed-off-by: default avatarJiang Yi <giangyi@amazon.com>
      Cc: stable@vger.kernel.org # v4.4+
      Fixes: 6d7425f1 ("vfio: Register/unregister irq_bypass_producer")
      Link: https://lore.kernel.org/kvm/20191127164910.15888-1-giangyi@amazon.comReviewed-by: default avatarMarc Zyngier <maz@kernel.org>
      Reviewed-by: default avatarEric Auger <eric.auger@redhat.com>
      [aw: commit log]
      Signed-off-by: default avatarAlex Williamson <alex.williamson@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      017df5ed
    • Dmitry Osipenko's avatar
      ARM: tegra: Fix FLOW_CTLR_HALT register clobbering by tegra_resume() · 52e81d8d
      Dmitry Osipenko authored
      commit d70f7d31 upstream.
      
      There is an unfortunate typo in the code that results in writing to
      FLOW_CTLR_HALT instead of FLOW_CTLR_CSR.
      
      Cc: <stable@vger.kernel.org>
      Acked-by: default avatarPeter De Schrijver <pdeschrijver@nvidia.com>
      Signed-off-by: default avatarDmitry Osipenko <digetx@gmail.com>
      Signed-off-by: default avatarThierry Reding <treding@nvidia.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      52e81d8d
    • Lihua Yao's avatar
      ARM: dts: s3c64xx: Fix init order of clock providers · f5c5a5d6
      Lihua Yao authored
      commit d60d0cff upstream.
      
      fin_pll is the parent of clock-controller@7e00f000, specify
      the dependency to ensure proper initialization order of clock
      providers.
      
      without this patch:
      [    0.000000] S3C6410 clocks: apll = 0, mpll = 0
      [    0.000000]  epll = 0, arm_clk = 0
      
      with this patch:
      [    0.000000] S3C6410 clocks: apll = 532000000, mpll = 532000000
      [    0.000000]  epll = 24000000, arm_clk = 532000000
      
      Cc: <stable@vger.kernel.org>
      Fixes: 3f6d439f ("clk: reverse default clk provider initialization order in of_clk_init()")
      Signed-off-by: default avatarLihua Yao <ylhuajnu@outlook.com>
      Reviewed-by: default avatarSylwester Nawrocki <s.nawrocki@samsung.com>
      Signed-off-by: default avatarKrzysztof Kozlowski <krzk@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f5c5a5d6
    • Pavel Shilovsky's avatar
      CIFS: Respect O_SYNC and O_DIRECT flags during reconnect · 5234d7e1
      Pavel Shilovsky authored
      commit 44805b0e upstream.
      
      Currently the client translates O_SYNC and O_DIRECT flags
      into corresponding SMB create options when openning a file.
      The problem is that on reconnect when the file is being
      re-opened the client doesn't set those flags and it causes
      a server to reject re-open requests because create options
      don't match. The latter means that any subsequent system
      call against that open file fail until a share is re-mounted.
      
      Fix this by properly setting SMB create options when
      re-openning files after reconnects.
      
      Fixes: 1013e760: ("SMB3: Don't ignore O_SYNC/O_DSYNC and O_DIRECT flags")
      Cc: Stable <stable@vger.kernel.org>
      Signed-off-by: default avatarPavel Shilovsky <pshilov@microsoft.com>
      Signed-off-by: default avatarSteve French <stfrench@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      5234d7e1
    • Max Filippov's avatar
      xtensa: fix TLB sanity checker · 1e4f2b36
      Max Filippov authored
      commit 36de10c4 upstream.
      
      Virtual and translated addresses retrieved by the xtensa TLB sanity
      checker must be consistent, i.e. correspond to the same state of the
      checked TLB entry. KASAN shadow memory is mapped dynamically using
      auto-refill TLB entries and thus may change TLB state between the
      virtual and translated address retrieval, resulting in false TLB
      insanity report.
      Move read_xtlb_translation close to read_xtlb_virtual to make sure that
      read values are consistent.
      
      Cc: stable@vger.kernel.org
      Fixes: a99e07ee ("xtensa: check TLB sanity on return to userspace")
      Signed-off-by: default avatarMax Filippov <jcmvbkbc@gmail.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      1e4f2b36
    • Jian-Hong Pan's avatar
      PCI/MSI: Fix incorrect MSI-X masking on resume · e3745fab
      Jian-Hong Pan authored
      commit e045fa29 upstream.
      
      When a driver enables MSI-X, msix_program_entries() reads the MSI-X Vector
      Control register for each vector and saves it in desc->masked.  Each
      register is 32 bits and bit 0 is the actual Mask bit.
      
      When we restored these registers during resume, we previously set the Mask
      bit if *any* bit in desc->masked was set instead of when the Mask bit
      itself was set:
      
        pci_restore_state
          pci_restore_msi_state
            __pci_restore_msix_state
              for_each_pci_msi_entry
                msix_mask_irq(entry, entry->masked)   <-- entire u32 word
                  __pci_msix_desc_mask_irq(desc, flag)
                    mask_bits = desc->masked & ~PCI_MSIX_ENTRY_CTRL_MASKBIT
                    if (flag)       <-- testing entire u32, not just bit 0
                      mask_bits |= PCI_MSIX_ENTRY_CTRL_MASKBIT
                    writel(mask_bits, desc_addr + PCI_MSIX_ENTRY_VECTOR_CTRL)
      
      This means that after resume, MSI-X vectors were masked when they shouldn't
      be, which leads to timeouts like this:
      
        nvme nvme0: I/O 978 QID 3 timeout, completion polled
      
      On resume, set the Mask bit only when the saved Mask bit from suspend was
      set.
      
      This should remove the need for 19ea025e ("nvme: Add quirk for Kingston
      NVME SSD running FW E8FK11.T").
      
      [bhelgaas: commit log, move fix to __pci_msix_desc_mask_irq()]
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=204887
      Link: https://lore.kernel.org/r/20191008034238.2503-1-jian-hong@endlessm.com
      Fixes: f2440d9a ("PCI MSI: Refactor interrupt masking code")
      Signed-off-by: default avatarJian-Hong Pan <jian-hong@endlessm.com>
      Signed-off-by: default avatarBjorn Helgaas <bhelgaas@google.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      e3745fab
    • Steffen Liebergeld's avatar
      PCI: Fix Intel ACS quirk UPDCR register address · 0d78b18e
      Steffen Liebergeld authored
      commit d8558ac8 upstream.
      
      According to documentation [0] the correct offset for the Upstream Peer
      Decode Configuration Register (UPDCR) is 0x1014.  It was previously defined
      as 0x1114.
      
      d99321b6 ("PCI: Enable quirks for PCIe ACS on Intel PCH root ports")
      intended to enforce isolation between PCI devices allowing them to be put
      into separate IOMMU groups.  Due to the wrong register offset the intended
      isolation was not fully enforced.  This is fixed with this patch.
      
      Please note that I did not test this patch because I have no hardware that
      implements this register.
      
      [0] https://www.intel.com/content/dam/www/public/us/en/documents/datasheets/4th-gen-core-family-mobile-i-o-datasheet.pdf (page 325)
      Fixes: d99321b6 ("PCI: Enable quirks for PCIe ACS on Intel PCH root ports")
      Link: https://lore.kernel.org/r/7a3505df-79ba-8a28-464c-88b83eefffa6@kernkonzept.comSigned-off-by: default avatarSteffen Liebergeld <steffen.liebergeld@kernkonzept.com>
      Signed-off-by: default avatarBjorn Helgaas <bhelgaas@google.com>
      Reviewed-by: default avatarAndrew Murray <andrew.murray@arm.com>
      Acked-by: default avatarAshok Raj <ashok.raj@intel.com>
      Cc: stable@vger.kernel.org	# v3.15+
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      0d78b18e
    • Greg Kroah-Hartman's avatar
      Revert "regulator: Defer init completion for a while after late_initcall" · 8a2ae3ab
      Greg Kroah-Hartman authored
      This reverts commit 8b8c8d69 which is
      commit 55576cf1 upstream.
      
      It's causing "odd" interactions with older kernels, so it probably isn't
      a good idea to cause timing changes there.  This has been reported to
      cause oopses on Pixel devices.
      Reported-by: default avatarSiddharth Kapoor <ksiddharth@google.com>
      Cc: Mark Brown <broonie@kernel.org>
      Cc: Lee Jones <lee.jones@linaro.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      8a2ae3ab
    • Guillaume Nault's avatar
      tcp: Protect accesses to .ts_recent_stamp with {READ,WRITE}_ONCE() · e7a3b025
      Guillaume Nault authored
      [ Upstream commit 721c8daf ]
      
      Syncookies borrow the ->rx_opt.ts_recent_stamp field to store the
      timestamp of the last synflood. Protect them with READ_ONCE() and
      WRITE_ONCE() since reads and writes aren't serialised.
      
      Use of .rx_opt.ts_recent_stamp for storing the synflood timestamp was
      introduced by a0f82f64 ("syncookies: remove last_synq_overflow from
      struct tcp_sock"). But unprotected accesses were already there when
      timestamp was stored in .last_synq_overflow.
      
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Signed-off-by: default avatarGuillaume Nault <gnault@redhat.com>
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      e7a3b025
    • Guillaume Nault's avatar
      tcp: tighten acceptance of ACKs not matching a child socket · 0c8cd7f6
      Guillaume Nault authored
      [ Upstream commit cb44a08f ]
      
      When no synflood occurs, the synflood timestamp isn't updated.
      Therefore it can be so old that time_after32() can consider it to be
      in the future.
      
      That's a problem for tcp_synq_no_recent_overflow() as it may report
      that a recent overflow occurred while, in fact, it's just that jiffies
      has grown past 'last_overflow' + TCP_SYNCOOKIE_VALID + 2^31.
      
      Spurious detection of recent overflows lead to extra syncookie
      verification in cookie_v[46]_check(). At that point, the verification
      should fail and the packet dropped. But we should have dropped the
      packet earlier as we didn't even send a syncookie.
      
      Let's refine tcp_synq_no_recent_overflow() to report a recent overflow
      only if jiffies is within the
      [last_overflow, last_overflow + TCP_SYNCOOKIE_VALID] interval. This
      way, no spurious recent overflow is reported when jiffies wraps and
      'last_overflow' becomes in the future from the point of view of
      time_after32().
      
      However, if jiffies wraps and enters the
      [last_overflow, last_overflow + TCP_SYNCOOKIE_VALID] interval (with
      'last_overflow' being a stale synflood timestamp), then
      tcp_synq_no_recent_overflow() still erroneously reports an
      overflow. In such cases, we have to rely on syncookie verification
      to drop the packet. We unfortunately have no way to differentiate
      between a fresh and a stale syncookie timestamp.
      
      In practice, using last_overflow as lower bound is problematic.
      If the synflood timestamp is concurrently updated between the time
      we read jiffies and the moment we store the timestamp in
      'last_overflow', then 'now' becomes smaller than 'last_overflow' and
      tcp_synq_no_recent_overflow() returns true, potentially dropping a
      valid syncookie.
      
      Reading jiffies after loading the timestamp could fix the problem,
      but that'd require a memory barrier. Let's just accommodate for
      potential timestamp growth instead and extend the interval using
      'last_overflow - HZ' as lower bound.
      Signed-off-by: default avatarGuillaume Nault <gnault@redhat.com>
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      0c8cd7f6
    • Guillaume Nault's avatar
      tcp: fix rejected syncookies due to stale timestamps · 3b4a534f
      Guillaume Nault authored
      [ Upstream commit 04d26e7b ]
      
      If no synflood happens for a long enough period of time, then the
      synflood timestamp isn't refreshed and jiffies can advance so much
      that time_after32() can't accurately compare them any more.
      
      Therefore, we can end up in a situation where time_after32(now,
      last_overflow + HZ) returns false, just because these two values are
      too far apart. In that case, the synflood timestamp isn't updated as
      it should be, which can trick tcp_synq_no_recent_overflow() into
      rejecting valid syncookies.
      
      For example, let's consider the following scenario on a system
      with HZ=1000:
      
        * The synflood timestamp is 0, either because that's the timestamp
          of the last synflood or, more commonly, because we're working with
          a freshly created socket.
      
        * We receive a new SYN, which triggers synflood protection. Let's say
          that this happens when jiffies == 2147484649 (that is,
          'synflood timestamp' + HZ + 2^31 + 1).
      
        * Then tcp_synq_overflow() doesn't update the synflood timestamp,
          because time_after32(2147484649, 1000) returns false.
          With:
            - 2147484649: the value of jiffies, aka. 'now'.
            - 1000: the value of 'last_overflow' + HZ.
      
        * A bit later, we receive the ACK completing the 3WHS. But
          cookie_v[46]_check() rejects it because tcp_synq_no_recent_overflow()
          says that we're not under synflood. That's because
          time_after32(2147484649, 120000) returns false.
          With:
            - 2147484649: the value of jiffies, aka. 'now'.
            - 120000: the value of 'last_overflow' + TCP_SYNCOOKIE_VALID.
      
          Of course, in reality jiffies would have increased a bit, but this
          condition will last for the next 119 seconds, which is far enough
          to accommodate for jiffie's growth.
      
      Fix this by updating the overflow timestamp whenever jiffies isn't
      within the [last_overflow, last_overflow + HZ] range. That shouldn't
      have any performance impact since the update still happens at most once
      per second.
      
      Now we're guaranteed to have fresh timestamps while under synflood, so
      tcp_synq_no_recent_overflow() can safely use it with time_after32() in
      such situations.
      
      Stale timestamps can still make tcp_synq_no_recent_overflow() return
      the wrong verdict when not under synflood. This will be handled in the
      next patch.
      
      For 64 bits architectures, the problem was introduced with the
      conversion of ->tw_ts_recent_stamp to 32 bits integer by commit
      cca9bab1 ("tcp: use monotonic timestamps for PAWS").
      The problem has always been there on 32 bits architectures.
      
      Fixes: cca9bab1 ("tcp: use monotonic timestamps for PAWS")
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Signed-off-by: default avatarGuillaume Nault <gnault@redhat.com>
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      3b4a534f
    • Eric Dumazet's avatar
      inet: protect against too small mtu values. · 67b02e37
      Eric Dumazet authored
      [ Upstream commit 501a90c9 ]
      
      syzbot was once again able to crash a host by setting a very small mtu
      on loopback device.
      
      Let's make inetdev_valid_mtu() available in include/net/ip.h,
      and use it in ip_setup_cork(), so that we protect both ip_append_page()
      and __ip_append_data()
      
      Also add a READ_ONCE() when the device mtu is read.
      
      Pairs this lockless read with one WRITE_ONCE() in __dev_set_mtu(),
      even if other code paths might write over this field.
      
      Add a big comment in include/linux/netdevice.h about dev->mtu
      needing READ_ONCE()/WRITE_ONCE() annotations.
      
      Hopefully we will add the missing ones in followup patches.
      
      [1]
      
      refcount_t: saturated; leaking memory.
      WARNING: CPU: 0 PID: 9464 at lib/refcount.c:22 refcount_warn_saturate+0x138/0x1f0 lib/refcount.c:22
      Kernel panic - not syncing: panic_on_warn set ...
      CPU: 0 PID: 9464 Comm: syz-executor850 Not tainted 5.4.0-syzkaller #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       __dump_stack lib/dump_stack.c:77 [inline]
       dump_stack+0x197/0x210 lib/dump_stack.c:118
       panic+0x2e3/0x75c kernel/panic.c:221
       __warn.cold+0x2f/0x3e kernel/panic.c:582
       report_bug+0x289/0x300 lib/bug.c:195
       fixup_bug arch/x86/kernel/traps.c:174 [inline]
       fixup_bug arch/x86/kernel/traps.c:169 [inline]
       do_error_trap+0x11b/0x200 arch/x86/kernel/traps.c:267
       do_invalid_op+0x37/0x50 arch/x86/kernel/traps.c:286
       invalid_op+0x23/0x30 arch/x86/entry/entry_64.S:1027
      RIP: 0010:refcount_warn_saturate+0x138/0x1f0 lib/refcount.c:22
      Code: 06 31 ff 89 de e8 c8 f5 e6 fd 84 db 0f 85 6f ff ff ff e8 7b f4 e6 fd 48 c7 c7 e0 71 4f 88 c6 05 56 a6 a4 06 01 e8 c7 a8 b7 fd <0f> 0b e9 50 ff ff ff e8 5c f4 e6 fd 0f b6 1d 3d a6 a4 06 31 ff 89
      RSP: 0018:ffff88809689f550 EFLAGS: 00010286
      RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
      RDX: 0000000000000000 RSI: ffffffff815e4336 RDI: ffffed1012d13e9c
      RBP: ffff88809689f560 R08: ffff88809c50a3c0 R09: fffffbfff15d31b1
      R10: fffffbfff15d31b0 R11: ffffffff8ae98d87 R12: 0000000000000001
      R13: 0000000000040100 R14: ffff888099041104 R15: ffff888218d96e40
       refcount_add include/linux/refcount.h:193 [inline]
       skb_set_owner_w+0x2b6/0x410 net/core/sock.c:1999
       sock_wmalloc+0xf1/0x120 net/core/sock.c:2096
       ip_append_page+0x7ef/0x1190 net/ipv4/ip_output.c:1383
       udp_sendpage+0x1c7/0x480 net/ipv4/udp.c:1276
       inet_sendpage+0xdb/0x150 net/ipv4/af_inet.c:821
       kernel_sendpage+0x92/0xf0 net/socket.c:3794
       sock_sendpage+0x8b/0xc0 net/socket.c:936
       pipe_to_sendpage+0x2da/0x3c0 fs/splice.c:458
       splice_from_pipe_feed fs/splice.c:512 [inline]
       __splice_from_pipe+0x3ee/0x7c0 fs/splice.c:636
       splice_from_pipe+0x108/0x170 fs/splice.c:671
       generic_splice_sendpage+0x3c/0x50 fs/splice.c:842
       do_splice_from fs/splice.c:861 [inline]
       direct_splice_actor+0x123/0x190 fs/splice.c:1035
       splice_direct_to_actor+0x3b4/0xa30 fs/splice.c:990
       do_splice_direct+0x1da/0x2a0 fs/splice.c:1078
       do_sendfile+0x597/0xd00 fs/read_write.c:1464
       __do_sys_sendfile64 fs/read_write.c:1525 [inline]
       __se_sys_sendfile64 fs/read_write.c:1511 [inline]
       __x64_sys_sendfile64+0x1dd/0x220 fs/read_write.c:1511
       do_syscall_64+0xfa/0x790 arch/x86/entry/common.c:294
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      RIP: 0033:0x441409
      Code: e8 ac e8 ff ff 48 83 c4 18 c3 0f 1f 80 00 00 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 eb 08 fc ff c3 66 2e 0f 1f 84 00 00 00 00
      RSP: 002b:00007fffb64c4f78 EFLAGS: 00000246 ORIG_RAX: 0000000000000028
      RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 0000000000441409
      RDX: 0000000000000000 RSI: 0000000000000006 RDI: 0000000000000005
      RBP: 0000000000073b8a R08: 0000000000000010 R09: 0000000000000010
      R10: 0000000000010001 R11: 0000000000000246 R12: 0000000000402180
      R13: 0000000000402210 R14: 0000000000000000 R15: 0000000000000000
      Kernel Offset: disabled
      Rebooting in 86400 seconds..
      
      Fixes: 1470ddf7 ("inet: Remove explicit write references to sk/inet in ip_append_data")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Reported-by: default avatarsyzbot <syzkaller@googlegroups.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      67b02e37
    • Taehee Yoo's avatar
      tipc: fix ordering of tipc module init and exit routine · dddfb252
      Taehee Yoo authored
      [ Upstream commit 9cf1cd8e ]
      
      In order to set/get/dump, the tipc uses the generic netlink
      infrastructure. So, when tipc module is inserted, init function
      calls genl_register_family().
      After genl_register_family(), set/get/dump commands are immediately
      allowed and these callbacks internally use the net_generic.
      net_generic is allocated by register_pernet_device() but this
      is called after genl_register_family() in the __init function.
      So, these callbacks would use un-initialized net_generic.
      
      Test commands:
          #SHELL1
          while :
          do
              modprobe tipc
              modprobe -rv tipc
          done
      
          #SHELL2
          while :
          do
              tipc link list
          done
      
      Splat looks like:
      [   59.616322][ T2788] kasan: CONFIG_KASAN_INLINE enabled
      [   59.617234][ T2788] kasan: GPF could be caused by NULL-ptr deref or user memory access
      [   59.618398][ T2788] general protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN PTI
      [   59.619389][ T2788] CPU: 3 PID: 2788 Comm: tipc Not tainted 5.4.0+ #194
      [   59.620231][ T2788] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
      [   59.621428][ T2788] RIP: 0010:tipc_bcast_get_broadcast_mode+0x131/0x310 [tipc]
      [   59.622379][ T2788] Code: c7 c6 ef 8b 38 c0 65 ff 0d 84 83 c9 3f e8 d7 a5 f2 e3 48 8d bb 38 11 00 00 48 b8 00 00 00 00
      [   59.622550][ T2780] NET: Registered protocol family 30
      [   59.624627][ T2788] RSP: 0018:ffff88804b09f578 EFLAGS: 00010202
      [   59.624630][ T2788] RAX: dffffc0000000000 RBX: 0000000000000011 RCX: 000000008bc66907
      [   59.624631][ T2788] RDX: 0000000000000229 RSI: 000000004b3cf4cc RDI: 0000000000001149
      [   59.624633][ T2788] RBP: ffff88804b09f588 R08: 0000000000000003 R09: fffffbfff4fb3df1
      [   59.624635][ T2788] R10: fffffbfff50318f8 R11: ffff888066cadc18 R12: ffffffffa6cc2f40
      [   59.624637][ T2788] R13: 1ffff11009613eba R14: ffff8880662e9328 R15: ffff8880662e9328
      [   59.624639][ T2788] FS:  00007f57d8f7b740(0000) GS:ffff88806cc00000(0000) knlGS:0000000000000000
      [   59.624645][ T2788] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [   59.625875][ T2780] tipc: Started in single node mode
      [   59.626128][ T2788] CR2: 00007f57d887a8c0 CR3: 000000004b140002 CR4: 00000000000606e0
      [   59.633991][ T2788] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [   59.635195][ T2788] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      [   59.636478][ T2788] Call Trace:
      [   59.637025][ T2788]  tipc_nl_add_bc_link+0x179/0x1470 [tipc]
      [   59.638219][ T2788]  ? lock_downgrade+0x6e0/0x6e0
      [   59.638923][ T2788]  ? __tipc_nl_add_link+0xf90/0xf90 [tipc]
      [   59.639533][ T2788]  ? tipc_nl_node_dump_link+0x318/0xa50 [tipc]
      [   59.640160][ T2788]  ? mutex_lock_io_nested+0x1380/0x1380
      [   59.640746][ T2788]  tipc_nl_node_dump_link+0x4fd/0xa50 [tipc]
      [   59.641356][ T2788]  ? tipc_nl_node_reset_link_stats+0x340/0x340 [tipc]
      [   59.642088][ T2788]  ? __skb_ext_del+0x270/0x270
      [   59.642594][ T2788]  genl_lock_dumpit+0x85/0xb0
      [   59.643050][ T2788]  netlink_dump+0x49c/0xed0
      [   59.643529][ T2788]  ? __netlink_sendskb+0xc0/0xc0
      [   59.644044][ T2788]  ? __netlink_dump_start+0x190/0x800
      [   59.644617][ T2788]  ? __mutex_unlock_slowpath+0xd0/0x670
      [   59.645177][ T2788]  __netlink_dump_start+0x5a0/0x800
      [   59.645692][ T2788]  genl_rcv_msg+0xa75/0xe90
      [   59.646144][ T2788]  ? __lock_acquire+0xdfe/0x3de0
      [   59.646692][ T2788]  ? genl_family_rcv_msg_attrs_parse+0x320/0x320
      [   59.647340][ T2788]  ? genl_lock_dumpit+0xb0/0xb0
      [   59.647821][ T2788]  ? genl_unlock+0x20/0x20
      [   59.648290][ T2788]  ? genl_parallel_done+0xe0/0xe0
      [   59.648787][ T2788]  ? find_held_lock+0x39/0x1d0
      [   59.649276][ T2788]  ? genl_rcv+0x15/0x40
      [   59.649722][ T2788]  ? lock_contended+0xcd0/0xcd0
      [   59.650296][ T2788]  netlink_rcv_skb+0x121/0x350
      [   59.650828][ T2788]  ? genl_family_rcv_msg_attrs_parse+0x320/0x320
      [   59.651491][ T2788]  ? netlink_ack+0x940/0x940
      [   59.651953][ T2788]  ? lock_acquire+0x164/0x3b0
      [   59.652449][ T2788]  genl_rcv+0x24/0x40
      [   59.652841][ T2788]  netlink_unicast+0x421/0x600
      [ ... ]
      
      Fixes: 7e436905 ("tipc: fix a slab object leak")
      Fixes: a62fbcce ("tipc: make subscriber server support net namespace")
      Signed-off-by: default avatarTaehee Yoo <ap420073@gmail.com>
      Acked-by: default avatarJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      dddfb252
    • Eric Dumazet's avatar
      tcp: md5: fix potential overestimation of TCP option space · e12119e7
      Eric Dumazet authored
      [ Upstream commit 9424e2e7 ]
      
      Back in 2008, Adam Langley fixed the corner case of packets for flows
      having all of the following options : MD5 TS SACK
      
      Since MD5 needs 20 bytes, and TS needs 12 bytes, no sack block
      can be cooked from the remaining 8 bytes.
      
      tcp_established_options() correctly sets opts->num_sack_blocks
      to zero, but returns 36 instead of 32.
      
      This means TCP cooks packets with 4 extra bytes at the end
      of options, containing unitialized bytes.
      
      Fixes: 33ad798c ("tcp: options clean up")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Reported-by: default avatarsyzbot <syzkaller@googlegroups.com>
      Acked-by: default avatarNeal Cardwell <ncardwell@google.com>
      Acked-by: default avatarSoheil Hassas Yeganeh <soheil@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      e12119e7
    • Aaron Conole's avatar
      openvswitch: support asymmetric conntrack · d1c79f98
      Aaron Conole authored
      [ Upstream commit 5d50aa83 ]
      
      The openvswitch module shares a common conntrack and NAT infrastructure
      exposed via netfilter.  It's possible that a packet needs both SNAT and
      DNAT manipulation, due to e.g. tuple collision.  Netfilter can support
      this because it runs through the NAT table twice - once on ingress and
      again after egress.  The openvswitch module doesn't have such capability.
      
      Like netfilter hook infrastructure, we should run through NAT twice to
      keep the symmetry.
      
      Fixes: 05752523 ("openvswitch: Interface with NAT.")
      Signed-off-by: default avatarAaron Conole <aconole@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      d1c79f98
    • Grygorii Strashko's avatar
      net: ethernet: ti: cpsw: fix extra rx interrupt · 4939f3c3
      Grygorii Strashko authored
      [ Upstream commit 51302f77 ]
      
      Now RX interrupt is triggered twice every time, because in
      cpsw_rx_interrupt() it is asked first and then disabled. So there will be
      pending interrupt always, when RX interrupt is enabled again in NAPI
      handler.
      
      Fix it by first disabling IRQ and then do ask.
      
      Fixes: 870915fe ("drivers: net: cpsw: remove disable_irq/enable_irq as irq can be masked from cpsw itself")
      Signed-off-by: default avatarGrygorii Strashko <grygorii.strashko@ti.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      4939f3c3
    • Nikolay Aleksandrov's avatar
      net: bridge: deny dev_set_mac_address() when unregistering · b979958e
      Nikolay Aleksandrov authored
      [ Upstream commit c4b4c421 ]
      
      We have an interesting memory leak in the bridge when it is being
      unregistered and is a slave to a master device which would change the
      mac of its slaves on unregister (e.g. bond, team). This is a very
      unusual setup but we do end up leaking 1 fdb entry because
      dev_set_mac_address() would cause the bridge to insert the new mac address
      into its table after all fdbs are flushed, i.e. after dellink() on the
      bridge has finished and we call NETDEV_UNREGISTER the bond/team would
      release it and will call dev_set_mac_address() to restore its original
      address and that in turn will add an fdb in the bridge.
      One fix is to check for the bridge dev's reg_state in its
      ndo_set_mac_address callback and return an error if the bridge is not in
      NETREG_REGISTERED.
      
      Easy steps to reproduce:
       1. add bond in mode != A/B
       2. add any slave to the bond
       3. add bridge dev as a slave to the bond
       4. destroy the bridge device
      
      Trace:
       unreferenced object 0xffff888035c4d080 (size 128):
         comm "ip", pid 4068, jiffies 4296209429 (age 1413.753s)
         hex dump (first 32 bytes):
           41 1d c9 36 80 88 ff ff 00 00 00 00 00 00 00 00  A..6............
           d2 19 c9 5e 3f d7 00 00 00 00 00 00 00 00 00 00  ...^?...........
         backtrace:
           [<00000000ddb525dc>] kmem_cache_alloc+0x155/0x26f
           [<00000000633ff1e0>] fdb_create+0x21/0x486 [bridge]
           [<0000000092b17e9c>] fdb_insert+0x91/0xdc [bridge]
           [<00000000f2a0f0ff>] br_fdb_change_mac_address+0xb3/0x175 [bridge]
           [<000000001de02dbd>] br_stp_change_bridge_id+0xf/0xff [bridge]
           [<00000000ac0e32b1>] br_set_mac_address+0x76/0x99 [bridge]
           [<000000006846a77f>] dev_set_mac_address+0x63/0x9b
           [<00000000d30738fc>] __bond_release_one+0x3f6/0x455 [bonding]
           [<00000000fc7ec01d>] bond_netdev_event+0x2f2/0x400 [bonding]
           [<00000000305d7795>] notifier_call_chain+0x38/0x56
           [<0000000028885d4a>] call_netdevice_notifiers+0x1e/0x23
           [<000000008279477b>] rollback_registered_many+0x353/0x6a4
           [<0000000018ef753a>] unregister_netdevice_many+0x17/0x6f
           [<00000000ba854b7a>] rtnl_delete_link+0x3c/0x43
           [<00000000adf8618d>] rtnl_dellink+0x1dc/0x20a
           [<000000009b6395fd>] rtnetlink_rcv_msg+0x23d/0x268
      
      Fixes: 43598813 ("bridge: add local MAC address to forwarding table (v2)")
      Reported-by: syzbot+2add91c08eb181fea1bf@syzkaller.appspotmail.com
      Signed-off-by: default avatarNikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      b979958e
    • Ivan Bornyakov's avatar
      nvme: host: core: fix precedence of ternary operator · c39c0be9
      Ivan Bornyakov authored
      commit e9a9853c upstream.
      
      Ternary operator have lower precedence then bitwise or, so 'cdw10' was
      calculated wrong.
      Signed-off-by: default avatarIvan Bornyakov <brnkv.i1@gmail.com>
      Reviewed-by: default avatarMax Gurtovoy <maxg@mellanox.com>
      Signed-off-by: default avatarKeith Busch <keith.busch@intel.com>
      Cc: Guenter Roeck <linux@roeck-us.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      c39c0be9
    • Konstantin Khorenko's avatar
      kernel/module.c: wakeup processes in module_wq on module unload · 5defd32b
      Konstantin Khorenko authored
      [ Upstream commit 5d603311 ]
      
      Fix the race between load and unload a kernel module.
      
      sys_delete_module()
       try_stop_module()
        mod->state = _GOING
      					add_unformed_module()
      					 old = find_module_all()
      					 (old->state == _GOING =>
      					  wait_event_interruptible())
      
      					 During pre-condition
      					 finished_loading() rets 0
      					 schedule()
      					 (never gets waken up later)
       free_module()
        mod->state = _UNFORMED
         list_del_rcu(&mod->list)
         (dels mod from "modules" list)
      
      return
      
      The race above leads to modprobe hanging forever on loading
      a module.
      
      Error paths on loading module call wake_up_all(&module_wq) after
      freeing module, so let's do the same on straight module unload.
      
      Fixes: 6e6de3de ("kernel/module.c: Only return -EEXIST for modules that have finished loading")
      Reviewed-by: default avatarPrarit Bhargava <prarit@redhat.com>
      Signed-off-by: default avatarKonstantin Khorenko <khorenko@virtuozzo.com>
      Signed-off-by: default avatarJessica Yu <jeyu@kernel.org>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      5defd32b
    • Eran Ben Elisha's avatar
      net/mlx5e: Fix SFF 8472 eeprom length · f265cbda
      Eran Ben Elisha authored
      [ Upstream commit c431f859 ]
      
      SFF 8472 eeprom length is 512 bytes. Fix module info return value to
      support 512 bytes read.
      
      Fixes: ace329f4 ("net/mlx5e: ethtool, Remove unsupported SFP EEPROM high pages query")
      Signed-off-by: default avatarEran Ben Elisha <eranbe@mellanox.com>
      Reviewed-by: default avatarAya Levin <ayal@mellanox.com>
      Signed-off-by: default avatarSaeed Mahameed <saeedm@mellanox.com>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      f265cbda
    • Pavel Tikhomirov's avatar
      sunrpc: fix crash when cache_head become valid before update · 5b67825b
      Pavel Tikhomirov authored
      [ Upstream commit 5fcaf698 ]
      
      I was investigating a crash in our Virtuozzo7 kernel which happened in
      in svcauth_unix_set_client. I found out that we access m_client field
      in ip_map structure, which was received from sunrpc_cache_lookup (we
      have a bit older kernel, now the code is in sunrpc_cache_add_entry), and
      these field looks uninitialized (m_client == 0x74 don't look like a
      pointer) but in the cache_head in flags we see 0x1 which is CACHE_VALID.
      
      It looks like the problem appeared from our previous fix to sunrpc (1):
      commit 4ecd55ea ("sunrpc: fix cache_head leak due to queued
      request")
      
      And we've also found a patch already fixing our patch (2):
      commit d58431ea ("sunrpc: don't mark uninitialised items as VALID.")
      
      Though the crash is eliminated, I think the core of the problem is not
      completely fixed:
      
      Neil in the patch (2) makes cache_head CACHE_NEGATIVE, before
      cache_fresh_locked which was added in (1) to fix crash. These way
      cache_is_valid won't say the cache is valid anymore and in
      svcauth_unix_set_client the function cache_check will return error
      instead of 0, and we don't count entry as initialized.
      
      But it looks like we need to remove cache_fresh_locked completely in
      sunrpc_cache_lookup:
      
      In (1) we've only wanted to make cache_fresh_unlocked->cache_dequeue so
      that cache_requests with no readers also release corresponding
      cache_head, to fix their leak.  We with Vasily were not sure if
      cache_fresh_locked and cache_fresh_unlocked should be used in pair or
      not, so we've guessed to use them in pair.
      
      Now we see that we don't want the CACHE_VALID bit set here by
      cache_fresh_locked, as "valid" means "initialized" and there is no
      initialization in sunrpc_cache_add_entry. Both expiry_time and
      last_refresh are not used in cache_fresh_unlocked code-path and also not
      required for the initial fix.
      
      So to conclude cache_fresh_locked was called by mistake, and we can just
      safely remove it instead of crutching it with CACHE_NEGATIVE. It looks
      ideologically better for me. Hope I don't miss something here.
      
      Here is our crash backtrace:
      [13108726.326291] BUG: unable to handle kernel NULL pointer dereference at 0000000000000074
      [13108726.326365] IP: [<ffffffffc01f79eb>] svcauth_unix_set_client+0x2ab/0x520 [sunrpc]
      [13108726.326448] PGD 0
      [13108726.326468] Oops: 0002 [#1] SMP
      [13108726.326497] Modules linked in: nbd isofs xfs loop kpatch_cumulative_81_0_r1(O) xt_physdev nfnetlink_queue bluetooth rfkill ip6table_nat nf_nat_ipv6 ip_vs_wrr ip_vs_wlc ip_vs_sh nf_conntrack_netlink ip_vs_sed ip_vs_pe_sip nf_conntrack_sip ip_vs_nq ip_vs_lc ip_vs_lblcr ip_vs_lblc ip_vs_ftp ip_vs_dh nf_nat_ftp nf_conntrack_ftp iptable_raw xt_recent nf_log_ipv6 xt_hl ip6t_rt nf_log_ipv4 nf_log_common xt_LOG xt_limit xt_TCPMSS xt_tcpmss vxlan ip6_udp_tunnel udp_tunnel xt_statistic xt_NFLOG nfnetlink_log dummy xt_mark xt_REDIRECT nf_nat_redirect raw_diag udp_diag tcp_diag inet_diag netlink_diag af_packet_diag unix_diag rpcsec_gss_krb5 xt_addrtype ip6t_rpfilter ipt_REJECT nf_reject_ipv4 ip6t_REJECT nf_reject_ipv6 ebtable_nat ebtable_broute nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_mangle ip6table_raw nfsv4
      [13108726.327173]  dns_resolver cls_u32 binfmt_misc arptable_filter arp_tables ip6table_filter ip6_tables devlink fuse_kio_pcs ipt_MASQUERADE nf_nat_masquerade_ipv4 xt_nat iptable_nat nf_nat_ipv4 xt_comment nf_conntrack_ipv4 nf_defrag_ipv4 xt_wdog_tmo xt_multiport bonding xt_set xt_conntrack iptable_filter iptable_mangle kpatch(O) ebtable_filter ebt_among ebtables ip_set_hash_ip ip_set nfnetlink vfat fat skx_edac intel_powerclamp coretemp intel_rapl iosf_mbi kvm_intel kvm irqbypass fuse pcspkr ses enclosure joydev sg mei_me hpwdt hpilo lpc_ich mei ipmi_si shpchp ipmi_devintf ipmi_msghandler xt_ipvs acpi_power_meter ip_vs_rr nfsv3 nfsd auth_rpcgss nfs_acl nfs lockd grace fscache nf_nat cls_fw sch_htb sch_cbq sch_sfq ip_vs em_u32 nf_conntrack tun br_netfilter veth overlay ip6_vzprivnet ip6_vznetstat ip_vznetstat
      [13108726.327817]  ip_vzprivnet vziolimit vzevent vzlist vzstat vznetstat vznetdev vzmon vzdev bridge pio_kaio pio_nfs pio_direct pfmt_raw pfmt_ploop1 ploop ip_tables ext4 mbcache jbd2 sd_mod crc_t10dif crct10dif_generic mgag200 i2c_algo_bit drm_kms_helper scsi_transport_iscsi 8021q syscopyarea sysfillrect garp sysimgblt fb_sys_fops mrp stp ttm llc bnx2x crct10dif_pclmul crct10dif_common crc32_pclmul crc32c_intel drm dm_multipath ghash_clmulni_intel uas aesni_intel lrw gf128mul glue_helper ablk_helper cryptd tg3 smartpqi scsi_transport_sas mdio libcrc32c i2c_core usb_storage ptp pps_core wmi sunrpc dm_mirror dm_region_hash dm_log dm_mod [last unloaded: kpatch_cumulative_82_0_r1]
      [13108726.328403] CPU: 35 PID: 63742 Comm: nfsd ve: 51332 Kdump: loaded Tainted: G        W  O   ------------   3.10.0-862.20.2.vz7.73.29 #1 73.29
      [13108726.328491] Hardware name: HPE ProLiant DL360 Gen10/ProLiant DL360 Gen10, BIOS U32 10/02/2018
      [13108726.328554] task: ffffa0a6a41b1160 ti: ffffa0c2a74bc000 task.ti: ffffa0c2a74bc000
      [13108726.328610] RIP: 0010:[<ffffffffc01f79eb>]  [<ffffffffc01f79eb>] svcauth_unix_set_client+0x2ab/0x520 [sunrpc]
      [13108726.328706] RSP: 0018:ffffa0c2a74bfd80  EFLAGS: 00010246
      [13108726.328750] RAX: 0000000000000001 RBX: ffffa0a6183ae000 RCX: 0000000000000000
      [13108726.328811] RDX: 0000000000000074 RSI: 0000000000000286 RDI: ffffa0c2a74bfcf0
      [13108726.328864] RBP: ffffa0c2a74bfe00 R08: ffffa0bab8c22960 R09: 0000000000000001
      [13108726.328916] R10: 0000000000000001 R11: 0000000000000001 R12: ffffa0a32aa7f000
      [13108726.328969] R13: ffffa0a6183afac0 R14: ffffa0c233d88d00 R15: ffffa0c2a74bfdb4
      [13108726.329022] FS:  0000000000000000(0000) GS:ffffa0e17f9c0000(0000) knlGS:0000000000000000
      [13108726.329081] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [13108726.332311] CR2: 0000000000000074 CR3: 00000026a1b28000 CR4: 00000000007607e0
      [13108726.334606] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [13108726.336754] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      [13108726.338908] PKRU: 00000000
      [13108726.341047] Call Trace:
      [13108726.343074]  [<ffffffff8a2c78b4>] ? groups_alloc+0x34/0x110
      [13108726.344837]  [<ffffffffc01f5eb4>] svc_set_client+0x24/0x30 [sunrpc]
      [13108726.346631]  [<ffffffffc01f2ac1>] svc_process_common+0x241/0x710 [sunrpc]
      [13108726.348332]  [<ffffffffc01f3093>] svc_process+0x103/0x190 [sunrpc]
      [13108726.350016]  [<ffffffffc07d605f>] nfsd+0xdf/0x150 [nfsd]
      [13108726.351735]  [<ffffffffc07d5f80>] ? nfsd_destroy+0x80/0x80 [nfsd]
      [13108726.353459]  [<ffffffff8a2bf741>] kthread+0xd1/0xe0
      [13108726.355195]  [<ffffffff8a2bf670>] ? create_kthread+0x60/0x60
      [13108726.356896]  [<ffffffff8a9556dd>] ret_from_fork_nospec_begin+0x7/0x21
      [13108726.358577]  [<ffffffff8a2bf670>] ? create_kthread+0x60/0x60
      [13108726.360240] Code: 4c 8b 45 98 0f 8e 2e 01 00 00 83 f8 fe 0f 84 76 fe ff ff 85 c0 0f 85 2b 01 00 00 49 8b 50 40 b8 01 00 00 00 48 89 93 d0 1a 00 00 <f0> 0f c1 02 83 c0 01 83 f8 01 0f 8e 53 02 00 00 49 8b 44 24 38
      [13108726.363769] RIP  [<ffffffffc01f79eb>] svcauth_unix_set_client+0x2ab/0x520 [sunrpc]
      [13108726.365530]  RSP <ffffa0c2a74bfd80>
      [13108726.367179] CR2: 0000000000000074
      
      Fixes: d58431ea ("sunrpc: don't mark uninitialised items as VALID.")
      Signed-off-by: default avatarPavel Tikhomirov <ptikhomirov@virtuozzo.com>
      Acked-by: default avatarNeilBrown <neilb@suse.de>
      Signed-off-by: default avatarJ. Bruce Fields <bfields@redhat.com>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      5b67825b
    • Tejun Heo's avatar
      workqueue: Fix missing kfree(rescuer) in destroy_workqueue() · 1a1e9ff5
      Tejun Heo authored
      commit 8efe1223 upstream.
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Reported-by: default avatarQian Cai <cai@lca.pw>
      Fixes: def98c84 ("workqueue: Fix spurious sanity check failures in destroy_workqueue()")
      Cc: Nobuhiro Iwamatsu <nobuhiro1.iwamatsu@toshiba.co.jp>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      1a1e9ff5