- 08 Mar, 2017 40 commits
-
-
Long Li authored
BugLink: http://bugs.launchpad.net/bugs/1665097 A PCI_EJECT message can arrive at the same time we are calling pci_scan_child_bus in the workqueue for the previous PCI_BUS_RELATIONS message or in create_root_hv_pci_bus(), in this case we could potentailly modify the bus from multiple places. Properly lock the bus access. Thanks Dexuan Cui <decui@microsoft.com> for pointing out the race condition in create_root_hv_pci_bus(). Signed-off-by: Long Li <longli@microsoft.com> Reported-by: Xiaofeng Wang <xiaofwan@redhat.com> Signed-off-by: Joseph Salisbury <joseph.salisbury@canonical.com> Acked-by: Tim Gardner <tim.gardner@canonical.com> Acked-by: Brad Figg <brad.figg@canonical.com> Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
-
Long Li authored
BugLink: http://bugs.launchpad.net/bugs/1665097 hv_pci_devices_present is called in hv_pci_remove when we remove a PCI device from host (e.g. by disabling SRIOV on a device). In hv_pci_remove, the bus is already removed before the call, so we don't need to rescan the bus in the workqueue scheduled from hv_pci_devices_present. By introducing status hv_pcibus_removed, we can avoid this situation. Signed-off-by: Long Li <longli@microsoft.com> Reported-by: Xiaofeng Wang <xiaofwan@redhat.com> Signed-off-by: Joseph Salisbury <joseph.salisbury@canonical.com> Acked-by: Tim Gardner <tim.gardner@canonical.com> Acked-by: Brad Figg <brad.figg@canonical.com> Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
-
Dexuan Cui authored
BugLink: http://bugs.launchpad.net/bugs/1665097 The devfn of 00:02.0 is 0x10. devfn_to_wslot(0x10) == 0x2, and wslot_to_devfn(0x2) should be 0x10, while it's 0x2 in the current code. Due to this, hv_eject_device_work() -> pci_get_domain_bus_and_slot() returns NULL and pci_stop_and_remove_bus_device() is not called. Later when the real device driver's .remove() is invoked by hv_pci_remove() -> pci_stop_root_bus(), some warnings can be noticed because the VM has lost the access to the underlying device at that time. Signed-off-by: Jake Oshins <jakeo@microsoft.com> Signed-off-by: Dexuan Cui <decui@microsoft.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Acked-by: Haiyang Zhang <haiyangz@microsoft.com> CC: stable@vger.kernel.org CC: K. Y. Srinivasan <kys@microsoft.com> CC: Stephen Hemminger <sthemmin@microsoft.com> Signed-off-by: Joseph Salisbury <joseph.salisbury@canonical.com> Acked-by: Tim Gardner <tim.gardner@canonical.com> Acked-by: Brad Figg <brad.figg@canonical.com> Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
-
Jack Morgenstein authored
BugLink: http://bugs.launchpad.net/bugs/1650058 Some Hypervisors detach VFs from VMs by instantly causing an FLR event to be generated for a VF. In the mlx4 case, this will cause that VF's comm channel to be disabled before the VM has an opportunity to invoke the VF device's "shutdown" method. The result is that the VF driver on the VM will experience a command timeout during the shutdown process when the Hypervisor does not deliver a command-completion event to the VM. To avoid FW command timeouts on the VM when the driver's shutdown method is invoked, we detect the absence of the VF's comm channel at the very start of the shutdown process. If the comm-channel has already been disabled, we cause all FW commands during the device shutdown process to immediately return success (and thus avoid all command timeouts). Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net> (cherry picked from commit d585df1c) Signed-off-by: Joseph Salisbury <joseph.salisbury@canonical.com> Acked-by: Tim Gardner <tim.gardner@canonical.com> Acked-by: Brad Figg <brad.figg@canonical.com> Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
-
Jack Morgenstein authored
BugLink: http://bugs.launchpad.net/bugs/1650058 Save the qp context flags byte containing the flag disabling vlan stripping in the RESET to INIT qp transition, rather than in the INIT to RTR transition. Per the firmware spec, the flags in this byte are active in the RESET to INIT transition. As a result of saving the flags in the incorrect qp transition, when switching dynamically from VGT to VST and back to VGT, the vlan remained stripped (as is required for VST) and did not return to not-stripped (as is required for VGT). Fixes: f0f829bf ("net/mlx4_core: Add immediate activate for VGT->VST->VGT") Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net> (cherry picked from commit 7c3945bc) Signed-off-by: Joseph Salisbury <joseph.salisbury@canonical.com> Acked-by: Tim Gardner <tim.gardner@canonical.com> Acked-by: Brad Figg <brad.figg@canonical.com> Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
-
Jack Morgenstein authored
BugLink: http://bugs.launchpad.net/bugs/1650058 In function mlx4_cq_completion() and mlx4_cq_event(), the radix_tree_lookup requires a rcu_read_lock. This is mandatory: if another core frees the CQ, it could run the radix_tree_node_rcu_free() call_rcu() callback while its being used by the radix tree lookup function. Additionally, in function mlx4_cq_event(), since we are adding the rcu lock around the radix-tree lookup, we no longer need to take the spinlock. Also, the synchronize_irq() call for the async event eliminates the need for incrementing the cq reference count in mlx4_cq_event(). Other changes: 1. In function mlx4_cq_free(), replace spin_lock_irq with spin_lock: we no longer take this spinlock in the interrupt context. The spinlock here, therefore, simply protects against different threads simultaneously invoking mlx4_cq_free() for different cq's. 2. In function mlx4_cq_free(), we move the radix tree delete to before the synchronize_irq() calls. This guarantees that we will not access this cq during any subsequent interrupts, and therefore can safely free the CQ after the synchronize_irq calls. The rcu_read_lock in the interrupt handlers only needs to protect against corrupting the radix tree; the interrupt handlers may access the cq outside the rcu_read_lock due to the synchronize_irq calls which protect against premature freeing of the cq. 3. In function mlx4_cq_event(), we change the mlx_warn message to mlx4_dbg. 4. We leave the cq reference count mechanism in place, because it is still needed for the cq completion tasklet mechanism. Fixes: 6d90aa5c ("net/mlx4_core: Make sure there are no pending async events when freeing CQ") Fixes: 225c7b1f ("IB/mlx4: Add a driver Mellanox ConnectX InfiniBand adapters") Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net> (cherry picked from commit 291c566a) Signed-off-by: Joseph Salisbury <joseph.salisbury@canonical.com> Acked-by: Tim Gardner <tim.gardner@canonical.com> Acked-by: Brad Figg <brad.figg@canonical.com> Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
-
Eugenia Emantayev authored
BugLink: http://bugs.launchpad.net/bugs/1650058 Single send WQE in RX buffer should be stamped with software ownership in order to prevent the flow of QP in error in FW once UPDATE_QP is called. Fixes: 9f519f68 ('mlx4_en: Not using Shared Receive Queues') Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com> Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net> (cherry picked from commit 6496bbf0) Signed-off-by: Joseph Salisbury <joseph.salisbury@canonical.com> Acked-by: Tim Gardner <tim.gardner@canonical.com> Acked-by: Brad Figg <brad.figg@canonical.com> Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
-
Greg Kroah-Hartman authored
BugLink: http://bugs.launchpad.net/bugs/1664960Signed-off-by: Tim Gardner <tim.gardner@canonical.com> Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
-
Andrey Ryabinin authored
BugLink: http://bugs.launchpad.net/bugs/1664960 commit 5351fbb1 upstream. page_flip_completed() dereferences 'work' variable after executing queue_work(). This is not safe as the 'work' item might be already freed by queued work: BUG: KASAN: use-after-free in page_flip_completed+0x3ff/0x490 at addr ffff8803dc010f90 Call Trace: __asan_report_load8_noabort+0x59/0x80 page_flip_completed+0x3ff/0x490 intel_finish_page_flip_mmio+0xe3/0x130 intel_pipe_handle_vblank+0x2d/0x40 gen8_irq_handler+0x4a7/0xed0 __handle_irq_event_percpu+0xf6/0x860 handle_irq_event_percpu+0x6b/0x160 handle_irq_event+0xc7/0x1b0 handle_edge_irq+0x1f4/0xa50 handle_irq+0x41/0x70 do_IRQ+0x9a/0x200 common_interrupt+0x89/0x89 Freed: kfree+0x113/0x4d0 intel_unpin_work_fn+0x29a/0x3b0 process_one_work+0x79e/0x1b70 worker_thread+0x611/0x1460 kthread+0x241/0x3a0 ret_from_fork+0x27/0x40 Move queue_work() after trace_i915_flip_complete() to fix this. Fixes: e5510fac ("drm/i915: add tracepoints for flip requests & completions") Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: http://patchwork.freedesktop.org/patch/msgid/20170126143211.24013-1-aryabinin@virtuozzo.com (cherry picked from commit 05c41f92) Signed-off-by: Jani Nikula <jani.nikula@intel.com> Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Tim Gardner <tim.gardner@canonical.com> Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
-
Takashi Iwai authored
BugLink: http://bugs.launchpad.net/bugs/1664960 commit 37a7ea4a upstream. snd_seq_pool_done() syncs with closing of all opened threads, but it aborts the wait loop with a timeout, and proceeds to the release resource even if not all threads have been closed. The timeout was 5 seconds, and if you run a crazy stuff, it can exceed easily, and may result in the access of the invalid memory address -- this is what syzkaller detected in a bug report. As a fix, let the code graduate from naiveness, simply remove the loop timeout. BugLink: http://lkml.kernel.org/r/CACT4Y+YdhDV2H5LLzDTJDVF-qiYHUHhtRaW4rbb4gUhTCQB81w@mail.gmail.comReported-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Tim Gardner <tim.gardner@canonical.com> Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
-
Takashi Iwai authored
BugLink: http://bugs.launchpad.net/bugs/1664960 commit 4842e98f upstream. When a sequencer queue is created in snd_seq_queue_alloc(),it adds the new queue element to the public list before referencing it. Thus the queue might be deleted before the call of snd_seq_queue_use(), and it results in the use-after-free error, as spotted by syzkaller. The fix is to reference the queue object at the right time. Reported-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Tim Gardner <tim.gardner@canonical.com> Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
-
Boris Ostrovsky authored
BugLink: http://bugs.launchpad.net/bugs/1664960 commit 74470954 upstream. rx_refill_timer should be deleted as soon as we disconnect from the backend since otherwise it is possible for the timer to go off before we get to xennet_destroy_queues(). If this happens we may dereference queue->rx.sring which is set to NULL in xennet_disconnect_backend(). Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Reviewed-by: Juergen Gross <jgross@suse.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Tim Gardner <tim.gardner@canonical.com> Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
-
ojab authored
BugLink: http://bugs.launchpad.net/bugs/1664960 commit ffdadd68 upstream. MPI2 controllers sometimes got lost (i.e. disappear from /sys/bus/pci/devices) if ASMP is enabled. Signed-off-by: Slava Kardakov <ojab@ojab.ru> Fixes: https://bugzilla.kernel.org/show_bug.cgi?id=60644Acked-by: Sreekanth Reddy <Sreekanth.Reddy@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Tim Gardner <tim.gardner@canonical.com> Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
-
Dave Carroll authored
BugLink: http://bugs.launchpad.net/bugs/1664960 commit 8af8e1c2 upstream. commit 78cbccd3 ("aacraid: Fix for KDUMP driver hang") caused a problem on older controllers which do not support MSI-x (namely ASR3405,ASR3805). This patch conditionalizes the previous patch to controllers which support MSI-x Fixes: 78cbccd3 ("aacraid: Fix for KDUMP driver hang") Reported-by: Arkadiusz Miskiewicz <a.miskiewicz@gmail.com> Signed-off-by: Dave Carroll <david.carroll@microsemi.com> Reviewed-by: Raghava Aditya Renukunta <RaghavaAditya.Renukunta@microsemi.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Tim Gardner <tim.gardner@canonical.com> Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
-
Steffen Maier authored
BugLink: http://bugs.launchpad.net/bugs/1664960 commit 2dfa6688 upstream. Dan Carpenter kindly reported: <quote> The patch d27a7cb9: "zfcp: trace on request for open and close of WKA port" from Aug 10, 2016, leads to the following static checker warning: drivers/s390/scsi/zfcp_fsf.c:1615 zfcp_fsf_open_wka_port() warn: 'req' was already freed. drivers/s390/scsi/zfcp_fsf.c 1609 zfcp_fsf_start_timer(req, ZFCP_FSF_REQUEST_TIMEOUT); 1610 retval = zfcp_fsf_req_send(req); 1611 if (retval) 1612 zfcp_fsf_req_free(req); ^^^ Freed. 1613 out: 1614 spin_unlock_irq(&qdio->req_q_lock); 1615 if (req && !IS_ERR(req)) 1616 zfcp_dbf_rec_run_wka("fsowp_1", wka_port, req->req_id); ^^^^^^^^^^^ Use after free. 1617 return retval; 1618 } Same thing for zfcp_fsf_close_wka_port() as well. </quote> Rather than relying on req being NULL (or ERR_PTR) for all cases where we don't want to trace or should not trace, simply check retval which is unconditionally initialized with -EIO != 0 and it can only become 0 on successful retval = zfcp_fsf_req_send(req). With that we can also remove the then again unnecessary unconditional initialization of req which was introduced with that earlier commit. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Suggested-by: Benjamin Block <bblock@linux.vnet.ibm.com> Signed-off-by: Steffen Maier <maier@linux.vnet.ibm.com> Fixes: d27a7cb9 ("zfcp: trace on request for open and close of WKA port") Reviewed-by: Benjamin Block <bblock@linux.vnet.ibm.com> Reviewed-by: Jens Remus <jremus@linux.vnet.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Tim Gardner <tim.gardner@canonical.com> Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
-
Ben Hutchings authored
BugLink: http://bugs.launchpad.net/bugs/1664960 Commit a50af86d "netvsc: reduce maximum GSO size" was wrongly backported to 4.4-stable. The maximum size needs to be set before the net device is registered, in netvsc_probe(). Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Cc: Stephen Hemminger <sthemmin@microsoft.com> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Tim Gardner <tim.gardner@canonical.com> Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
-
Thorsten Horstmann authored
BugLink: http://bugs.launchpad.net/bugs/1664960 commit da7061c8 upstream. The function ieee80211_ie_split_vendor doesn't return 0 on errors. Instead it returns any offset < ielen when WLAN_EID_VENDOR_SPECIFIC is found. The return value in mesh_add_vendor_ies must therefore be checked against ifmsh->ie_len and not 0. Otherwise all ifmsh->ie starting with WLAN_EID_VENDOR_SPECIFIC will be rejected. Fixes: 082ebb0c ("mac80211: fix mesh beacon format") Signed-off-by: Thorsten Horstmann <thorsten@defutech.de> Signed-off-by: Mathias Kretschmer <mathias.kretschmer@fit.fraunhofer.de> Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de> [sven@narfation.org: Add commit message] Signed-off-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Tim Gardner <tim.gardner@canonical.com> Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
-
Alexander Sverdlin authored
BugLink: http://bugs.launchpad.net/bugs/1664960 commit 97a98ae5 upstream. Asynchronous external abort is coded differently in DFSR with LPAE enabled. Fixes: 9254970c "ARM: 8447/1: catch pending imprecise abort on unmask". Signed-off-by: Alexander Sverdlin <alexander.sverdlin@nokia.com> Cc: Russell King <linux@armlinux.org.uk> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: linux-arm-kernel@lists.infradead.org Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Tim Gardner <tim.gardner@canonical.com> Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
-
Nicholas Bellinger authored
BugLink: http://bugs.launchpad.net/bugs/1664960 commit 9b2792c3 upstream. This patch addresses a long standing bug where the commit phase of COMPARE_AND_WRITE would result in a se_cmd->cmd_kref reference leak if se_cmd->scsi_status returned non SAM_STAT_GOOD. This would manifest first as a lost SCSI response, and eventual hung task during fabric driver logout or re-login, as existing shutdown logic waited for the COMPARE_AND_WRITE se_cmd->cmd_kref to reach zero. To address this bug, compare_and_write_post() has been changed to drop the incorrect !cmd->scsi_status conditional that was preventing *post_ret = 1 for being set during non SAM_STAT_GOOD status. This patch has been tested with SAM_STAT_CHECK_CONDITION status from normal target_complete_cmd() callback path, as well as the incoming __target_execute_cmd() submission failure path when se_cmd->execute_cmd() returns non zero status. Reported-by: Donald White <dew@datera.io> Cc: Donald White <dew@datera.io> Tested-by: Gary Guo <ghg@datera.io> Cc: Gary Guo <ghg@datera.io> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Tim Gardner <tim.gardner@canonical.com> Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
-
Nicholas Bellinger authored
BugLink: http://bugs.launchpad.net/bugs/1664960 commit c54eeffb upstream. This patch fixes a bug where incoming task management requests can be explicitly aborted during an active LUN_RESET, but who's struct work_struct are canceled in-flight before execution. This occurs when core_tmr_drain_tmr_list() invokes cancel_work_sync() for the incoming se_tmr_req->task_cmd->work, resulting in cmd->work for target_tmr_work() never getting invoked and the aborted TMR waiting indefinately within transport_wait_for_tasks(). To address this case, perform a CMD_T_ABORTED check early in transport_generic_handle_tmr(), and invoke the normal path via transport_cmd_check_stop_to_fabric() to complete any TMR kthreads blocked waiting for CMD_T_STOP in transport_wait_for_tasks(). Also, move the TRANSPORT_ISTATE_PROCESSING assignment earlier into transport_generic_handle_tmr() so the existing check in core_tmr_drain_tmr_list() avoids attempting abort the incoming se_tmr_req->task_cmd->work if it has already been queued into se_device->tmr_wq. Reported-by: Rob Millner <rlm@daterainc.com> Tested-by: Rob Millner <rlm@daterainc.com> Cc: Rob Millner <rlm@daterainc.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Tim Gardner <tim.gardner@canonical.com> Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
-
Nicholas Bellinger authored
BugLink: http://bugs.launchpad.net/bugs/1664960 commit 0583c261 upstream. This patch adds the missing target_complete_cmd() SCSI status parameter change in target_xcopy_do_work(), that was originally missing in commit 926317de. It correctly propigates up the correct SCSI status during EXTENDED_COPY exception cases, instead of always using the hardcoded SAM_STAT_CHECK_CONDITION from original code. This is required for ESX host environments that expect to hit SAM_STAT_RESERVATION_CONFLICT for certain scenarios, and SAM_STAT_CHECK_CONDITION results in non-retriable status for these cases. Reported-by: Nixon Vincent <nixon.vincent@calsoftinc.com> Tested-by: Nixon Vincent <nixon.vincent@calsoftinc.com> Cc: Nixon Vincent <nixon.vincent@calsoftinc.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Tim Gardner <tim.gardner@canonical.com> Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
-
Nicholas Bellinger authored
BugLink: http://bugs.launchpad.net/bugs/1664960 commit 391e2a6d upstream. After the v4.2+ RCU conversion to se_node_acl->lun_entry_hlist, a BUG_ON() was added in core_enable_device_list_for_node() to detect when the located orig->se_lun_acl contains an existing se_lun_acl pointer reference. However, this scenario can happen when a dynamically generated NodeACL is being converted to an explicit NodeACL, when the explicit NodeACL contains a different LUN mapping than the default provided by the WWN endpoint. So instead of triggering BUG_ON(), go ahead and fail instead following the original pre RCU conversion logic. Reported-by: Benjamin ESTRABAUD <ben.estrabaud@mpstor.com> Cc: Benjamin ESTRABAUD <ben.estrabaud@mpstor.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Tim Gardner <tim.gardner@canonical.com> Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
-
Dave Martin authored
BugLink: http://bugs.launchpad.net/bugs/1664960 commit 228dbbfb upstream. Ensure that if userspace supplies insufficient data to PTRACE_SETREGSET to fill all the registers, the thread's old registers are preserved. Fixes: 5be6f62b ("ARM: 6883/1: ptrace: Migrate to regsets framework") Signed-off-by: Dave Martin <Dave.Martin@arm.com> Acked-by: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Tim Gardner <tim.gardner@canonical.com> Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
-
Arnd Bergmann authored
BugLink: http://bugs.launchpad.net/bugs/1664960 commit b3f2d07f upstream. The use of ACCESS_ONCE() looks like a micro-optimization to force gcc to use an indexed load for the register address, but it has an absolutely detrimental effect on builds with gcc-5 and CONFIG_KASAN=y, leading to a very likely kernel stack overflow aside from very complex object code: hisilicon/hns/hns_dsaf_gmac.c: In function 'hns_gmac_update_stats': hisilicon/hns/hns_dsaf_gmac.c:419:1: error: the frame size of 2912 bytes is larger than 1024 bytes [-Werror=frame-larger-than=] hisilicon/hns/hns_dsaf_ppe.c: In function 'hns_ppe_reset_common': hisilicon/hns/hns_dsaf_ppe.c:390:1: error: the frame size of 1184 bytes is larger than 1024 bytes [-Werror=frame-larger-than=] hisilicon/hns/hns_dsaf_ppe.c: In function 'hns_ppe_get_regs': hisilicon/hns/hns_dsaf_ppe.c:621:1: error: the frame size of 3632 bytes is larger than 1024 bytes [-Werror=frame-larger-than=] hisilicon/hns/hns_dsaf_rcb.c: In function 'hns_rcb_get_common_regs': hisilicon/hns/hns_dsaf_rcb.c:970:1: error: the frame size of 2784 bytes is larger than 1024 bytes [-Werror=frame-larger-than=] hisilicon/hns/hns_dsaf_gmac.c: In function 'hns_gmac_get_regs': hisilicon/hns/hns_dsaf_gmac.c:641:1: error: the frame size of 5728 bytes is larger than 1024 bytes [-Werror=frame-larger-than=] hisilicon/hns/hns_dsaf_rcb.c: In function 'hns_rcb_get_ring_regs': hisilicon/hns/hns_dsaf_rcb.c:1021:1: error: the frame size of 2208 bytes is larger than 1024 bytes [-Werror=frame-larger-than=] hisilicon/hns/hns_dsaf_main.c: In function 'hns_dsaf_comm_init': hisilicon/hns/hns_dsaf_main.c:1209:1: error: the frame size of 1904 bytes is larger than 1024 bytes [-Werror=frame-larger-than=] hisilicon/hns/hns_dsaf_xgmac.c: In function 'hns_xgmac_get_regs': hisilicon/hns/hns_dsaf_xgmac.c:748:1: error: the frame size of 4704 bytes is larger than 1024 bytes [-Werror=frame-larger-than=] hisilicon/hns/hns_dsaf_main.c: In function 'hns_dsaf_update_stats': hisilicon/hns/hns_dsaf_main.c:2420:1: error: the frame size of 1088 bytes is larger than 1024 bytes [-Werror=frame-larger-than=] hisilicon/hns/hns_dsaf_main.c: In function 'hns_dsaf_get_regs': hisilicon/hns/hns_dsaf_main.c:2753:1: error: the frame size of 10768 bytes is larger than 1024 bytes [-Werror=frame-larger-than=] This does not seem to happen any more with gcc-7, but removing the ACCESS_ONCE seems safe anyway and it avoids a serious issue for some people. I have verified that with gcc-5.3.1, the object code we get is better in the new version both with and without CONFIG_KASAN, as we no longer allocate a 1344 byte stack frame for hns_dsaf_get_regs() but otherwise have practically identical object code. With gcc-7.0.0, removing ACCESS_ONCE has no effect, the object code is already good either way. This patch is probably not urgent to get into 4.11 as only KASAN=y builds with certain compilers are affected, but I still think it makes sense to backport into older kernels. Fixes: 511e6bc0 ("net: add Hisilicon Network Subsystem DSAF support") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Tim Gardner <tim.gardner@canonical.com> Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
-
Tejun Heo authored
BugLink: http://bugs.launchpad.net/bugs/1664960 commit 4d59b6cc upstream. Commit 513e3d2d ("cpumask: always use nr_cpu_ids in formatting and parsing functions") converted both cpumask printing and parsing functions to use nr_cpu_ids instead of nr_cpumask_bits. While this was okay for the printing functions as it just picked one of the two output formats that we were alternating between depending on a kernel config, doing the same for parsing wasn't okay. nr_cpumask_bits can be either nr_cpu_ids or NR_CPUS. We can always use nr_cpu_ids but that is a variable while NR_CPUS is a constant, so it can be more efficient to use NR_CPUS when we can get away with it. Converting the printing functions to nr_cpu_ids makes sense because it affects how the masks get presented to userspace and doesn't break anything; however, using nr_cpu_ids for parsing functions can incorrectly leave the higher bits uninitialized while reading in these masks from userland. As all testing and comparison functions use nr_cpumask_bits which can be larger than nr_cpu_ids, the parsed cpumasks can erroneously yield false negative results. This made the taskstats interface incorrectly return -EINVAL even when the inputs were correct. Fix it by restoring the parse functions to use nr_cpumask_bits instead of nr_cpu_ids. Link: http://lkml.kernel.org/r/20170206182442.GB31078@htj.duckdns.org Fixes: 513e3d2d ("cpumask: always use nr_cpu_ids in formatting and parsing functions") Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: Martin Steigerwald <martin.steigerwald@teamix.de> Debugged-by: Ben Hutchings <ben.hutchings@codethink.co.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Tim Gardner <tim.gardner@canonical.com> Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
-
Linus Torvalds authored
BugLink: http://bugs.launchpad.net/bugs/1664960 commit d966564f upstream. This reverts commit 020eb3da. Gabriel C reports that it causes his machine to not boot, and we haven't tracked down the reason for it yet. Since the bug it fixes has been around for a longish time, we're better off reverting the fix for now. Gabriel says: "It hangs early and freezes with a lot RCU warnings. I bisected it down to : > Ruslan Ruslichenko (1): > x86/ioapic: Restore IO-APIC irq_chip retrigger callback Reverting this one fixes the problem for me.. The box is a PRIMERGY TX200 S5 , 2 socket , 2 x E5520 CPU(s) installed" and Ruslan and Thomas are currently stumped. Reported-and-bisected-by: Gabriel C <nix.or.die@gmail.com> Cc: Ruslan Ruslichenko <rruslich@cisco.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Tim Gardner <tim.gardner@canonical.com> Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
-
Stephen Smalley authored
BugLink: http://bugs.launchpad.net/bugs/1664960 commit 0c461cb7 upstream. SELinux tries to support setting/clearing of /proc/pid/attr attributes from the shell by ignoring terminating newlines and treating an attribute value that begins with a NUL or newline as an attempt to clear the attribute. However, the test for clearing attributes has always been wrong; it has an off-by-one error, and this could further lead to reading past the end of the allocated buffer since commit bb646cdb ("proc_pid_attr_write(): switch to memdup_user()"). Fix the off-by-one error. Even with this fix, setting and clearing /proc/pid/attr attributes from the shell is not straightforward since the interface does not support multiple write() calls (so shells that write the value and newline separately will set and then immediately clear the attribute, requiring use of echo -n to set the attribute), whereas trying to use echo -n "" to clear the attribute causes the shell to skip the write() call altogether since POSIX says that a zero-length write causes no side effects. Thus, one must use echo -n to set and echo without -n to clear, as in the following example: $ echo -n unconfined_u:object_r:user_home_t:s0 > /proc/$$/attr/fscreate $ cat /proc/$$/attr/fscreate unconfined_u:object_r:user_home_t:s0 $ echo "" > /proc/$$/attr/fscreate $ cat /proc/$$/attr/fscreate Note the use of /proc/$$ rather than /proc/self, as otherwise the cat command will read its own attribute value, not that of the shell. There are no users of this facility to my knowledge; possibly we should just get rid of it. UPDATE: Upon further investigation it appears that a local process with the process:setfscreate permission can cause a kernel panic as a result of this bug. This patch fixes CVE-2017-2618. Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov> [PM: added the update about CVE-2017-2618 to the commit description] Signed-off-by: Paul Moore <paul@paul-moore.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: James Morris <james.l.morris@oracle.com> Signed-off-by: Tim Gardner <tim.gardner@canonical.com> Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
-
Vineet Gupta authored
BugLink: http://bugs.launchpad.net/bugs/1664960 commit a524c218 upstream. Reported-by: Jo-Philipp Wich <jo@mein.io> Fixes: 9aed02fe ("ARC: [arcompact] handle unaligned access delay slot") Cc: linux-kernel@vger.kernel.org Cc: linux-snps-arc@lists.infradead.org Signed-off-by: Vineet Gupta <vgupta@synopsys.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Tim Gardner <tim.gardner@canonical.com> Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
-
NeilBrown authored
BugLink: http://bugs.launchpad.net/bugs/1650336 There are two problems with refcounting of auth_gss messages. First, the reference on the pipe->pipe list (taken by a call to rpc_queue_upcall()) is not counted. It seems to be assumed that a message in pipe->pipe will always also be in pipe->in_downcall, where it is correctly reference counted. However there is no guaranty of this. I have a report of a NULL dereferences in rpc_pipe_read() which suggests a msg that has been freed is still on the pipe->pipe list. One way I imagine this might happen is: - message is queued for uid=U and auth->service=S1 - rpc.gssd reads this message and starts processing. This removes the message from pipe->pipe - message is queued for uid=U and auth->service=S2 - rpc.gssd replies to the first message. gss_pipe_downcall() calls __gss_find_upcall(pipe, U, NULL) and it finds the *second* message, as new messages are placed at the head of ->in_downcall, and the service type is not checked. - This second message is removed from ->in_downcall and freed by gss_release_msg() (even though it is still on pipe->pipe) - rpc.gssd tries to read another message, and dereferences a pointer to this message that has just been freed. I fix this by incrementing the reference count before calling rpc_queue_upcall(), and decrementing it if that fails, or normally in gss_pipe_destroy_msg(). It seems strange that the reply doesn't target the message more precisely, but I don't know all the details. In any case, I think the reference counting irregularity became a measureable bug when the extra arg was added to __gss_find_upcall(), hence the Fixes: line below. The second problem is that if rpc_queue_upcall() fails, the new message is not freed. gss_alloc_msg() set the ->count to 1, gss_add_msg() increments this to 2, gss_unhash_msg() decrements to 1, then the pointer is discarded so the memory never gets freed. Fixes: 9130b8db ("SUNRPC: allow for upcalls for same uid but different gss service") Cc: stable@vger.kernel.org Link: https://bugzilla.opensuse.org/show_bug.cgi?id=1011250Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> (cherry picked from commit 1cded9d2) Signed-off-by: Seth Forshee <seth.forshee@canonical.com> Acked-by: Stefan Bader <stefan.bader@canonical.com> Acked-by: Tim Gardner <tim.gardner@canonical.com> Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
-
Guenter Roeck authored
BugLink: http://bugs.launchpad.net/bugs/1664809 On a system with a defective USB device connected to an USB hub, an endless sequence of port connect events was observed. The sequence of events as observed is as follows: - Port reports connected event (port status=USB_PORT_STAT_CONNECTION). - Event handler debounces port and resets it by calling hub_port_reset(). - hub_port_reset() calls hub_port_wait_reset() to wait for the reset to complete. - The reset completes, but USB_PORT_STAT_CONNECTION is not immediately set in the port status register. - hub_port_wait_reset() returns -ENOTCONN. - Port initialization sequence is aborted. - A few milliseconds later, the port again reports a connected event, and the sequence repeats. This continues either forever or, randomly, stops if the connection is already re-established when the port status is read. It results in a high rate of udev events. This in turn destabilizes userspace since the above sequence holds the device mutex pretty much continuously and prevents userspace from actually reading the device status. To prevent the problem from happening, let's wait for the connection to be re-established after a port reset. If the device was actually disconnected, the code will still return an error, but it will do so only after the long reset timeout. Cc: Douglas Anderson <dianders@chromium.org> Signed-off-by: Guenter Roeck <linux@roeck-us.net> Acked-by: Alan Stern <stern@rowland.harvard.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> (cherry picked from commit 22547c4c) Signed-off-by: AceLan Kao <acelan.kao@canonical.com> Acked-by: Stefan Bader <stefan.bader@canonical.com> Acked-by: Tim Gardner <tim.gardner@canonical.com> Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
-
John Johansen authored
The lperms struct is uninitialized for use with auditing if there is an early failure due to a path name error. This can result in incorrect logging or in the extreme case apparmor killing the task with a signal which results in the failure in the referenced bug. BugLink: http://bugs.launchpad.net/bugs/1664912Signed-off-by: John Johansen <john.johansen@canonical.com> Acked-by: Stefan Bader <stefan.bader@canonical.com> Acked-by: Tim Gardner <tim.gardner@canonical.com> Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
-
Bryant G. Ly authored
BugLink: http://bugs.launchpad.net/bugs/1662551 This patch adds internal LIO sgl limit since the driver already sets a max transfer limit on transport layer of 1MB to the client. Cc: stable@vger.kernel.org Tested-by: Steven Royer <seroyer@linux.vnet.ibm.com> Signed-off-by: Bryant G. Ly <bryantly@linux.vnet.ibm.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org> (cherry picked from commit b22bc278) Signed-off-by: Tim Gardner <tim.gardner@canonical.com> Acked-by: Brad Figg <brad.figg@canonical.com> Acked-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com> Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
-
Long Li authored
BugLink: http://bugs.launchpad.net/bugs/1663687 On I/O errors, the Windows driver doesn't set data_transfer_length on error conditions other than SRB_STATUS_DATA_OVERRUN. In these cases we need to set data_transfer_length to 0, indicating there is no data transferred. On SRB_STATUS_DATA_OVERRUN, data_transfer_length is set by the Windows driver to the actual data transferred. Reported-by: Shiva Krishna <Shiva.Krishna@nimblestorage.com> Signed-off-by: Long Li <longli@microsoft.com> Reviewed-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Cc: <stable@vger.kernel.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> (cherry picked from linux-next commit 40630f46) Signed-off-by: Tim Gardner <tim.gardner@canonical.com> Acked-by: Stefan Bader <stefan.bader@canonical.com> Acked-by: Brad Figg <brad.figg@canonical.com> Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
-
Long Li authored
BugLink: http://bugs.launchpad.net/bugs/1663687 When sense message is present on error, we should pass along to the upper layer to decide how to deal with the error. This patch fixes connectivity issues with Fiber Channel devices. Signed-off-by: Long Li <longli@microsoft.com> Reviewed-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Cc: <stable@vger.kernel.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> (cherry picked from linux-next commit bba5dc33) Signed-off-by: Tim Gardner <tim.gardner@canonical.com> Acked-by: Stefan Bader <stefan.bader@canonical.com> Acked-by: Brad Figg <brad.figg@canonical.com> Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
-
Long Li authored
BugLink: http://bugs.launchpad.net/bugs/1663687 Properly set SRB flags when hosting device supports tagged queuing. This patch improves the performance on Fiber Channel disks. Signed-off-by: Long Li <longli@microsoft.com> Reviewed-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Cc: <stable@vger.kernel.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> (cherry picked from linux-next commit 3cd6d3d9) Signed-off-by: Tim Gardner <tim.gardner@canonical.com> Acked-by: Stefan Bader <stefan.bader@canonical.com> Acked-by: Brad Figg <brad.figg@canonical.com> Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
-
K. Y. Srinivasan authored
BugLink: http://bugs.launchpad.net/bugs/1663687 Enable multi-q support. We will allocate the outgoing channel using the following policy: 1. We will make every effort to pick a channel that is in the same NUMA node that is initiating the I/O 2. The mapping between the guest CPU and the outgoing channel is persistent. Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Reviewed-by: Long Li <longli@microsoft.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> (back ported from linux-next commit d86adf48) Signed-off-by: Tim Gardner <tim.gardner@canonical.com> Conflicts: drivers/scsi/storvsc_drv.c Acked-by: Stefan Bader <stefan.bader@canonical.com> Acked-by: Brad Figg <brad.figg@canonical.com> Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
-
K. Y. Srinivasan authored
BugLink: http://bugs.launchpad.net/bugs/1663687 Remove the artificially imposed restriction on max segment size. Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Reviewed-by: Long Li <longli@microsoft.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> (cherry picked from linux-next commit 97796528) Signed-off-by: Tim Gardner <tim.gardner@canonical.com> Acked-by: Stefan Bader <stefan.bader@canonical.com> Acked-by: Brad Figg <brad.figg@canonical.com> Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
-
K. Y. Srinivasan authored
BugLink: http://bugs.launchpad.net/bugs/1663687 Enable tracking of queue depth. Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Reviewed-by: Long Li <longli@microsoft.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> (cherry picked from linux-next commit f64dad26) Signed-off-by: Tim Gardner <tim.gardner@canonical.com> Acked-by: Stefan Bader <stefan.bader@canonical.com> Acked-by: Brad Figg <brad.figg@canonical.com> Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
-
Gabriel Krisman Bertazi authored
BugLink: http://bugs.launchpad.net/bugs/1662666 In blk_mq_map_swqueue, there is a memory optimization that frees the tags of a queue that has gone unmapped. Later, if that hctx is remapped after another topology change, the tags need to be reallocated. If this allocation fails, a simple WARN_ON triggers, but the block layer ends up with an active hctx without any corresponding set of tags. Then, any income IO to that hctx can trigger an Oops. I can reproduce it consistently by running IO, flipping CPUs on and off and eventually injecting a memory allocation failure in that path. In the fix below, if the system experiences a failed allocation of any hctx's tags, we remap all the ctxs of that queue to the hctx_0, which should always keep it's tags. There is a minor performance hit, since our mapping just got worse after the error path, but this is the simplest solution to handle this error path. The performance hit will disappear after another successful remap. I considered dropping the memory optimization all together, but it seemed a bad trade-off to handle this very specific error case. This should apply cleanly on top of Jens' for-next branch. The Oops is the one below: SP (3fff935ce4d0) is in userspace 1:mon> e cpu 0x1: Vector: 300 (Data Access) at [c000000fe99eb110] pc: c0000000005e868c: __sbitmap_queue_get+0x2c/0x180 lr: c000000000575328: __bt_get+0x48/0xd0 sp: c000000fe99eb390 msr: 900000010280b033 dar: 28 dsisr: 40000000 current = 0xc000000fe9966800 paca = 0xc000000007e80300 softe: 0 irq_happened: 0x01 pid = 11035, comm = aio-stress Linux version 4.8.0-rc6+ (root@bean) (gcc version 5.4.0 20160609 (Ubuntu/IBM 5.4.0-6ubuntu1~16.04.2) ) #3 SMP Mon Oct 10 20:16:53 CDT 2016 1:mon> s [c000000fe99eb3d0] c000000000575328 __bt_get+0x48/0xd0 [c000000fe99eb400] c000000000575838 bt_get.isra.1+0x78/0x2d0 [c000000fe99eb480] c000000000575cb4 blk_mq_get_tag+0x44/0x100 [c000000fe99eb4b0] c00000000056f6f4 __blk_mq_alloc_request+0x44/0x220 [c000000fe99eb500] c000000000570050 blk_mq_map_request+0x100/0x1f0 [c000000fe99eb580] c000000000574650 blk_mq_make_request+0xf0/0x540 [c000000fe99eb640] c000000000561c44 generic_make_request+0x144/0x230 [c000000fe99eb690] c000000000561e00 submit_bio+0xd0/0x200 [c000000fe99eb740] c0000000003ef740 ext4_io_submit+0x90/0xb0 [c000000fe99eb770] c0000000003e95d8 ext4_writepages+0x588/0xdd0 [c000000fe99eb910] c00000000025a9f0 do_writepages+0x60/0xc0 [c000000fe99eb940] c000000000246c88 __filemap_fdatawrite_range+0xf8/0x180 [c000000fe99eb9e0] c000000000246f90 filemap_write_and_wait_range+0x70/0xf0 [c000000fe99eba20] c0000000003dd844 ext4_sync_file+0x214/0x540 [c000000fe99eba80] c000000000364718 vfs_fsync_range+0x78/0x130 [c000000fe99ebad0] c0000000003dd46c ext4_file_write_iter+0x35c/0x430 [c000000fe99ebb90] c00000000038c280 aio_run_iocb+0x3b0/0x450 [c000000fe99ebce0] c00000000038dc28 do_io_submit+0x368/0x730 [c000000fe99ebe30] c000000000009404 system_call+0x38/0xec Signed-off-by: Gabriel Krisman Bertazi <krisman@linux.vnet.ibm.com> Cc: Brian King <brking@linux.vnet.ibm.com> Cc: Douglas Miller <dougmill@linux.vnet.ibm.com> Cc: linux-block@vger.kernel.org Cc: linux-scsi@vger.kernel.org Reviewed-by: Douglas Miller <dougmill@linux.vnet.ibm.com> Signed-off-by: Jens Axboe <axboe@fb.com> (cherry picked from commit d1b1cea1) Signed-off-by: Tim Gardner <tim.gardner@canonical.com> Acked-by: Stefan Bader <stefan.bader@canonical.com> Acked-by: Brad Figg <brad.figg@canonical.com> Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
-
Gabriel Krisman Bertazi authored
BugLink: http://bugs.launchpad.net/bugs/1662666 While stressing memory and IO at the same time we changed SMT settings, we were able to consistently trigger deadlocks in the mm system, which froze the entire machine. I think that under memory stress conditions, the large allocations performed by blk_mq_init_rq_map may trigger a reclaim, which stalls waiting on the block layer remmaping completion, thus deadlocking the system. The trace below was collected after the machine stalled, waiting for the hotplug event completion. The simplest fix for this is to make allocations in this path non-reclaimable, with GFP_NOIO. With this patch, We couldn't hit the issue anymore. This should apply on top of Jens's for-next branch cleanly. Changes since v1: - Use GFP_NOIO instead of GFP_NOWAIT. Call Trace: [c000000f0160aaf0] [c000000f0160ab50] 0xc000000f0160ab50 (unreliable) [c000000f0160acc0] [c000000000016624] __switch_to+0x2e4/0x430 [c000000f0160ad20] [c000000000b1a880] __schedule+0x310/0x9b0 [c000000f0160ae00] [c000000000b1af68] schedule+0x48/0xc0 [c000000f0160ae30] [c000000000b1b4b0] schedule_preempt_disabled+0x20/0x30 [c000000f0160ae50] [c000000000b1d4fc] __mutex_lock_slowpath+0xec/0x1f0 [c000000f0160aed0] [c000000000b1d678] mutex_lock+0x78/0xa0 [c000000f0160af00] [d000000019413cac] xfs_reclaim_inodes_ag+0x33c/0x380 [xfs] [c000000f0160b0b0] [d000000019415164] xfs_reclaim_inodes_nr+0x54/0x70 [xfs] [c000000f0160b0f0] [d0000000194297f8] xfs_fs_free_cached_objects+0x38/0x60 [xfs] [c000000f0160b120] [c0000000003172c8] super_cache_scan+0x1f8/0x210 [c000000f0160b190] [c00000000026301c] shrink_slab.part.13+0x21c/0x4c0 [c000000f0160b2d0] [c000000000268088] shrink_zone+0x2d8/0x3c0 [c000000f0160b380] [c00000000026834c] do_try_to_free_pages+0x1dc/0x520 [c000000f0160b450] [c00000000026876c] try_to_free_pages+0xdc/0x250 [c000000f0160b4e0] [c000000000251978] __alloc_pages_nodemask+0x868/0x10d0 [c000000f0160b6f0] [c000000000567030] blk_mq_init_rq_map+0x160/0x380 [c000000f0160b7a0] [c00000000056758c] blk_mq_map_swqueue+0x33c/0x360 [c000000f0160b820] [c000000000567904] blk_mq_queue_reinit+0x64/0xb0 [c000000f0160b850] [c00000000056a16c] blk_mq_queue_reinit_notify+0x19c/0x250 [c000000f0160b8a0] [c0000000000f5d38] notifier_call_chain+0x98/0x100 [c000000f0160b8f0] [c0000000000c5fb0] __cpu_notify+0x70/0xe0 [c000000f0160b930] [c0000000000c63c4] notify_prepare+0x44/0xb0 [c000000f0160b9b0] [c0000000000c52f4] cpuhp_invoke_callback+0x84/0x250 [c000000f0160ba10] [c0000000000c570c] cpuhp_up_callbacks+0x5c/0x120 [c000000f0160ba60] [c0000000000c7cb8] _cpu_up+0xf8/0x1d0 [c000000f0160bac0] [c0000000000c7eb0] do_cpu_up+0x120/0x150 [c000000f0160bb40] [c0000000006fe024] cpu_subsys_online+0x64/0xe0 [c000000f0160bb90] [c0000000006f5124] device_online+0xb4/0x120 [c000000f0160bbd0] [c0000000006f5244] online_store+0xb4/0xc0 [c000000f0160bc20] [c0000000006f0a68] dev_attr_store+0x68/0xa0 [c000000f0160bc60] [c0000000003ccc30] sysfs_kf_write+0x80/0xb0 [c000000f0160bca0] [c0000000003cbabc] kernfs_fop_write+0x17c/0x250 [c000000f0160bcf0] [c00000000030fe6c] __vfs_write+0x6c/0x1e0 [c000000f0160bd90] [c000000000311490] vfs_write+0xd0/0x270 [c000000f0160bde0] [c0000000003131fc] SyS_write+0x6c/0x110 [c000000f0160be30] [c000000000009204] system_call+0x38/0xec Signed-off-by: Gabriel Krisman Bertazi <krisman@linux.vnet.ibm.com> Cc: Brian King <brking@linux.vnet.ibm.com> Cc: Douglas Miller <dougmill@linux.vnet.ibm.com> Cc: linux-block@vger.kernel.org Cc: linux-scsi@vger.kernel.org Signed-off-by: Jens Axboe <axboe@fb.com> (cherry picked from commit 36e1f3d1) Signed-off-by: Tim Gardner <tim.gardner@canonical.com> Acked-by: Stefan Bader <stefan.bader@canonical.com> Acked-by: Brad Figg <brad.figg@canonical.com> Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
-