- 15 Mar, 2017 32 commits
-
-
Gavin Shan authored
commit d7d55536 upstream. The surprise hotplug is driven by interrupt in PowerNV PCI hotplug driver. In the interrupt handler, pnv_php_interrupt(), we bail when pnv_pci_get_presence_state() returns zero wrongly. It causes the presence change event is always ignored incorrectly. This fixes the issue by bailing on error (non-zero value) returned from pnv_pci_get_presence_state(). Fixes: 360aebd8 ("drivers/pci/hotplug: Support surprise hotplug in powernv driver") Reported-by: Hank Chang <hankmax0000@gmail.com> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Tested-by: Willie Liauw <williel@supermicro.com.tw> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Nicholas Bellinger authored
commit bd4e2d29 upstream. When transport_clear_lun_ref() is shutting down a se_lun via configfs with new I/O in-flight, it's possible to trigger a NULL pointer dereference in transport_lookup_cmd_lun() due to the fact percpu_ref_get() doesn't do any __PERCPU_REF_DEAD checking before incrementing lun->lun_ref.count after lun->lun_ref has switched to atomic_t mode. This results in a NULL pointer dereference as LUN shutdown code in core_tpg_remove_lun() continues running after the existing ->release() -> core_tpg_lun_ref_release() callback completes, and clears the RCU protected se_lun->lun_se_dev pointer. During the OOPs, the state of lun->lun_ref in the process which triggered the NULL pointer dereference looks like the following on v4.1.y stable code: struct se_lun { lun_link_magic = 4294932337, lun_status = TRANSPORT_LUN_STATUS_FREE, ..... lun_se_dev = 0x0, lun_sep = 0x0, ..... lun_ref = { count = { counter = 1 }, percpu_count_ptr = 3, release = 0xffffffffa02fa1e0 <core_tpg_lun_ref_release>, confirm_switch = 0x0, force_atomic = false, rcu = { next = 0xffff88154fa1a5d0, func = 0xffffffff8137c4c0 <percpu_ref_switch_to_atomic_rcu> } } } To address this bug, use percpu_ref_tryget_live() to ensure once __PERCPU_REF_DEAD is visable on all CPUs and ->lun_ref has switched to atomic_t, all new I/Os will fail to obtain a new lun->lun_ref reference. Also use an explicit percpu_ref_kill_and_confirm() callback to block on ->lun_ref_comp to allow the first stage and associated RCU grace period to complete, and then block on ->lun_ref_shutdown waiting for the final percpu_ref_put() to drop the last reference via transport_lun_remove_cmd() before continuing with core_tpg_remove_lun() shutdown. Reported-by: Rob Millner <rlm@daterainc.com> Tested-by: Rob Millner <rlm@daterainc.com> Cc: Rob Millner <rlm@daterainc.com> Tested-by: Vaibhav Tandon <vst@datera.io> Cc: Vaibhav Tandon <vst@datera.io> Tested-by: Bryant G. Ly <bryantly@linux.vnet.ibm.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Gavin Shan authored
commit 303529d6 upstream. The root port or PCIe switch downstream port might have been associated with driver other than pnv-php. The MSI or MSIx might also have been enabled by that driver (e.g. pcieport_drv). Attempt to enable MSI incurs below backtrace: PowerPC PowerNV PCI Hotplug Driver version: 0.1 ------------[ cut here ]------------ WARNING: CPU: 19 PID: 1004 at drivers/pci/msi.c:1071 \ __pci_enable_msi_range+0x84/0x4e0 NIP [c000000000665c34] __pci_enable_msi_range+0x84/0x4e0 LR [c000000000665c24] __pci_enable_msi_range+0x74/0x4e0 Call Trace: [c000000384d67600] [c000000000665c24] __pci_enable_msi_range+0x74/0x4e0 [c000000384d676e0] [d00000000aa31b04] pnv_php_register+0x564/0x5a0 [pnv_php] [c000000384d677c0] [d00000000aa31658] pnv_php_register+0xb8/0x5a0 [pnv_php] [c000000384d678a0] [d00000000aa31658] pnv_php_register+0xb8/0x5a0 [pnv_php] [c000000384d67980] [d00000000aa31dfc] pnv_php_init+0x60/0x98 [pnv_php] [c000000384d679f0] [c00000000000cfdc] do_one_initcall+0x6c/0x1d0 [c000000384d67ab0] [c000000000b92354] do_init_module+0x94/0x254 [c000000384d67b40] [c00000000019719c] load_module+0x258c/0x2c60 [c000000384d67d30] [c000000000197bb0] SyS_finit_module+0xf0/0x170 [c000000384d67e30] [c00000000000b184] system_call+0x38/0xe0 This fixes the issue by skipping enabling the surprise hotplug capability if the MSI or MSIx on the PCI slot's upstream port has been enabled by other driver. Fixes: 360aebd8 ("drivers/pci/hotplug: Support surprise hotplug in powernv driver") Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Tested-by: Vaibhav Jain <vaibhav@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Gavin Shan authored
commit 36c7c9da upstream. The WARN_ON() causes unnecessary backtrace when putting the parent slot, which is likely to be NULL. WARNING: CPU: 2 PID: 1071 at drivers/pci/hotplug/pnv_php.c:85 \ pnv_php_release+0xcc/0x150 [pnv_php] : Call Trace: [c0000003bc007c10] [d00000000ad613c4] pnv_php_release+0x144/0x150 [pnv_php] [c0000003bc007c40] [c0000000006641d8] pci_hp_deregister+0x238/0x330 [c0000003bc007cd0] [d00000000ad61440] pnv_php_unregister_one+0x70/0xa0 [pnv_php] [c0000003bc007d10] [d00000000ad614c0] pnv_php_unregister+0x50/0x80 [pnv_php] [c0000003bc007d40] [d00000000ad61e84] pnv_php_exit+0x50/0xcb4 [pnv_php] [c0000003bc007d70] [c00000000019499c] SyS_delete_module+0x1fc/0x2a0 [c0000003bc007e30] [c00000000000b184] system_call+0x38/0xe0 Fixes: 66725152 ("PCI/hotplug: PowerPC PowerNV PCI hotplug driver") Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Tested-by: Vaibhav Jain <vaibhav@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Jeff Layton authored
commit df963ea8 upstream. There's no reason a request should ever be on a s_unsafe list but not in the request tree. Link: http://tracker.ceph.com/issues/18474Signed-off-by: Jeff Layton <jlayton@redhat.com> Reviewed-by: Yan, Zheng <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Steven Rostedt (VMware) authored
commit 32677207 upstream. The child_exit errno needs to be shifted by 8 bits to compare against the return values for the bisect variables. Fixes: c5dacb88 ("ktest: Allow overriding bisect test results") Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Boris Brezillon authored
commit ee194289 upstream. at91sam9_ebi_get_config() is incorrectly converting timings in clock cycles into timings in nanoseconds by multiplying the cycle values by the clk rate instead of the clk period. at91sam9_ebi_xslate_config() has the same problem for the tdf_ns -> tdf_cycles conversion. Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com> Reported-by: Chris Leahy <leahycm@gmail.com> Fixes: 6a4ec4cd ("memory: add Atmel EBI (External Bus Interface) driver") Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Peter Zijlstra authored
commit 0695d7dc upstream. freeing of inodes must be RCU-delayed on all filesystems Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Eric W. Biederman authored
commit 93faccbb upstream. To support unprivileged users mounting filesystems two permission checks have to be performed: a test to see if the user allowed to create a mount in the mount namespace, and a test to see if the user is allowed to access the specified filesystem. The automount case is special in that mounting the original filesystem grants permission to mount the sub-filesystems, to any user who happens to stumble across the their mountpoint and satisfies the ordinary filesystem permission checks. Attempting to handle the automount case by using override_creds almost works. It preserves the idea that permission to mount the original filesystem is permission to mount the sub-filesystem. Unfortunately using override_creds messes up the filesystems ordinary permission checks. Solve this by being explicit that a mount is a submount by introducing vfs_submount, and using it where appropriate. vfs_submount uses a new mount internal mount flags MS_SUBMOUNT, to let sget and friends know that a mount is a submount so they can take appropriate action. sget and sget_userns are modified to not perform any permission checks on submounts. follow_automount is modified to stop using override_creds as that has proven problemantic. do_mount is modified to always remove the new MS_SUBMOUNT flag so that we know userspace will never by able to specify it. autofs4 is modified to stop using current_real_cred that was put in there to handle the previous version of submount permission checking. cifs is modified to pass the mountpoint all of the way down to vfs_submount. debugfs is modified to pass the mountpoint all of the way down to trace_automount by adding a new parameter. To make this change easier a new typedef debugfs_automount_t is introduced to capture the type of the debugfs automount function. Fixes: 069d5ac9 ("autofs: Fix automounts by using current_real_cred()->uid") Fixes: aeaa4a79 ("fs: Call d_automount with the filesystems creds") Reviewed-by: Trond Myklebust <trond.myklebust@primarydata.com> Reviewed-by: Seth Forshee <seth.forshee@canonical.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Bart Van Assche authored
commit 0a6fdbde upstream. Avoid that srp_process_rsp() overwrites the status information in ch if the SRP target response timed out and processing of another task management function has already started. Avoid that issuing multiple task management functions concurrently triggers list corruption. This patch prevents that the following stack trace appears in the system log: WARNING: CPU: 8 PID: 9269 at lib/list_debug.c:52 __list_del_entry_valid+0xbc/0xc0 list_del corruption. prev->next should be ffffc90004bb7b00, but was ffff8804052ecc68 CPU: 8 PID: 9269 Comm: sg_reset Tainted: G W 4.10.0-rc7-dbg+ #3 Call Trace: dump_stack+0x68/0x93 __warn+0xc6/0xe0 warn_slowpath_fmt+0x4a/0x50 __list_del_entry_valid+0xbc/0xc0 wait_for_completion_timeout+0x12e/0x170 srp_send_tsk_mgmt+0x1ef/0x2d0 [ib_srp] srp_reset_device+0x5b/0x110 [ib_srp] scsi_ioctl_reset+0x1c7/0x290 scsi_ioctl+0x12a/0x420 sd_ioctl+0x9d/0x100 blkdev_ioctl+0x51e/0x9f0 block_ioctl+0x38/0x40 do_vfs_ioctl+0x8f/0x700 SyS_ioctl+0x3c/0x70 entry_SYSCALL_64_fastpath+0x18/0xad Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Cc: Israel Rukshin <israelr@mellanox.com> Cc: Max Gurtovoy <maxg@mellanox.com> Cc: Laurence Oberman <loberman@redhat.com> Cc: Steve Feeley <Steve.Feeley@sandisk.com> Signed-off-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Bart Van Assche authored
commit 6cb72bc1 upstream. After srp_process_rsp() returns there is a short time during which the scsi_host_find_tag() call will return a pointer to the SCSI command that is being completed. If during that time a duplicate response is received, avoid that the following call stack appears: BUG: unable to handle kernel NULL pointer dereference at (null) IP: srp_recv_done+0x450/0x6b0 [ib_srp] Oops: 0000 [#1] SMP CPU: 10 PID: 0 Comm: swapper/10 Not tainted 4.10.0-rc7-dbg+ #1 Call Trace: <IRQ> __ib_process_cq+0x4b/0xd0 [ib_core] ib_poll_handler+0x1d/0x70 [ib_core] irq_poll_softirq+0xba/0x120 __do_softirq+0xba/0x4c0 irq_exit+0xbe/0xd0 smp_apic_timer_interrupt+0x38/0x50 apic_timer_interrupt+0x90/0xa0 </IRQ> RIP: srp_recv_done+0x450/0x6b0 [ib_srp] RSP: ffff88046f483e20 Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Cc: Israel Rukshin <israelr@mellanox.com> Cc: Max Gurtovoy <maxg@mellanox.com> Cc: Laurence Oberman <loberman@redhat.com> Cc: Steve Feeley <Steve.Feeley@sandisk.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Bart Van Assche authored
commit d6c58dc4 upstream. Tests have shown that the following error message is reported when using SG-GAPS registration with an mlx5 adapter: scsi host1: ib_srp: failed RECV status WR flushed (5) for CQE ffff880bd4270eb0 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 0f007806 2500002a ad9fafd1 scsi host1: ib_srp: reconnect succeeded mlx5_0:dump_cqe:262:(pid 7369): dump error cqe 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 0f007806 25000032 00105dd0 scsi host1: ib_srp: failed FAST REG status memory management operation error (6) for CQE ffff880b92860138 Hence avoid using SG-GAPS memory registrations. Additionally, always configure the blk_queue_virt_boundary() to avoid to trigger a mapping failure when using adapters that support SG-GAPS (e.g. mlx5). Fixes: commit ad8e66b4 ("IB/srp: fix mr allocation when the device supports sg gaps") Fixes: commit 509c5f33 ("IB/srp: Prevent mapping failures") Reported-by: Laurence Oberman <loberman@redhat.com> Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Cc: Israel Rukshin <israelr@mellanox.com> Cc: Max Gurtovoy <maxg@mellanox.com> Cc: Leon Romanovsky <leonro@mellanox.com> Cc: Mark Bloch <markb@mellanox.com> Cc: Yuval Shaia <yuval.shaia@oracle.com> Signed-off-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Leon Romanovsky authored
commit 0fd27a88 upstream. When we initialize buffer to create SRQ in kernel, the number of pages was less than actually used in following mlx5_fill_page_array(). Fixes: e126ba97 ("mlx5: Add driver for Mellanox Connect-IB adapters") Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Eli Cohen <eli@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Erez Shitrit authored
commit 2b084176 upstream. When sending packet to destination that was not resolved yet via path query, the driver keeps the skb and tries to re-send it again when the path is resolved. But when re-sending via dev_queue_xmit the kernel doesn't call to dev_hard_header, so IPoIB needs to keep 20 bytes in the skb and to put the destination address inside them. In that way the dev_start_xmit will have the correct destination, and the driver won't take the destination from the skb->data, while nothing exists there, which causes to packet be be dropped. The test flow is: 1. Run the SM on remote node, 2. Restart the driver. 4. Ping some destination, 3. Observe that first ICMP request will be dropped. Fixes: fc791b63 ("IB/ipoib: move back IB LL address into the hard header") Signed-off-by: Erez Shitrit <erezsh@mellanox.com> Signed-off-by: Noa Osherovich <noaos@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Tested-by: Yuval Shaia <yuval.shaia@oracle.com> Signed-off-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Feras Daoud authored
commit 0a0007f2 upstream. When calling set_mode from sys/fs, the call flow locks the sys/fs lock first and then tries to lock rtnl_lock (when calling ipoib_set_mod). On the other hand, the rmmod call flow takes the rtnl_lock first (when calling unregister_netdev) and then tries to take the sys/fs lock. Deadlock a->b, b->a. The problem starts when ipoib_set_mod frees it's rtnl_lck and tries to get it after that. set_mod: [<ffffffff8104f2bd>] ? check_preempt_curr+0x6d/0x90 [<ffffffff814fee8e>] __mutex_lock_slowpath+0x13e/0x180 [<ffffffff81448655>] ? __rtnl_unlock+0x15/0x20 [<ffffffff814fed2b>] mutex_lock+0x2b/0x50 [<ffffffff81448675>] rtnl_lock+0x15/0x20 [<ffffffffa02ad807>] ipoib_set_mode+0x97/0x160 [ib_ipoib] [<ffffffffa02b5f5b>] set_mode+0x3b/0x80 [ib_ipoib] [<ffffffff8134b840>] dev_attr_store+0x20/0x30 [<ffffffff811f0fe5>] sysfs_write_file+0xe5/0x170 [<ffffffff8117b068>] vfs_write+0xb8/0x1a0 [<ffffffff8117ba81>] sys_write+0x51/0x90 [<ffffffff8100b0f2>] system_call_fastpath+0x16/0x1b rmmod: [<ffffffff81279ffc>] ? put_dec+0x10c/0x110 [<ffffffff8127a2ee>] ? number+0x2ee/0x320 [<ffffffff814fe6a5>] schedule_timeout+0x215/0x2e0 [<ffffffff8127cc04>] ? vsnprintf+0x484/0x5f0 [<ffffffff8127b550>] ? string+0x40/0x100 [<ffffffff814fe323>] wait_for_common+0x123/0x180 [<ffffffff81060250>] ? default_wake_function+0x0/0x20 [<ffffffff8119661e>] ? ifind_fast+0x5e/0xb0 [<ffffffff814fe43d>] wait_for_completion+0x1d/0x20 [<ffffffff811f2e68>] sysfs_addrm_finish+0x228/0x270 [<ffffffff811f2fb3>] sysfs_remove_dir+0xa3/0xf0 [<ffffffff81273f66>] kobject_del+0x16/0x40 [<ffffffff8134cd14>] device_del+0x184/0x1e0 [<ffffffff8144e59b>] netdev_unregister_kobject+0xab/0xc0 [<ffffffff8143c05e>] rollback_registered+0xae/0x130 [<ffffffff8143c102>] unregister_netdevice+0x22/0x70 [<ffffffff8143c16e>] unregister_netdev+0x1e/0x30 [<ffffffffa02a91b0>] ipoib_remove_one+0xe0/0x120 [ib_ipoib] [<ffffffffa01ed95f>] ib_unregister_device+0x4f/0x100 [ib_core] [<ffffffffa021f5e1>] mlx4_ib_remove+0x41/0x180 [mlx4_ib] [<ffffffffa01ab771>] mlx4_remove_device+0x71/0x90 [mlx4_core] Fixes: 862096a8 ("IB/ipoib: Add more rtnl_link_ops callbacks") Cc: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Feras Daoud <ferasda@mellanox.com> Signed-off-by: Erez Shitrit <erezsh@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Eric W. Biederman authored
commit 1064f874 upstream. Ever since mount propagation was introduced in cases where a mount in propagated to parent mount mountpoint pair that is already in use the code has placed the new mount behind the old mount in the mount hash table. This implementation detail is problematic as it allows creating arbitrary length mount hash chains. Furthermore it invalidates the constraint maintained elsewhere in the mount code that a parent mount and a mountpoint pair will have exactly one mount upon them. Making it hard to deal with and to talk about this special case in the mount code. Modify mount propagation to notice when there is already a mount at the parent mount and mountpoint where a new mount is propagating to and place that preexisting mount on top of the new mount. Modify unmount propagation to notice when a mount that is being unmounted has another mount on top of it (and no other children), and to replace the unmounted mount with the mount on top of it. Move the MNT_UMUONT test from __lookup_mnt_last into __propagate_umount as that is the only call of __lookup_mnt_last where MNT_UMOUNT may be set on any mount visible in the mount hash table. These modifications allow: - __lookup_mnt_last to be removed. - attach_shadows to be renamed __attach_mnt and its shadow handling to be removed. - commit_tree to be simplified - copy_tree to be simplified The result is an easier to understand tree of mounts that does not allow creation of arbitrary length hash chains in the mount hash table. The result is also a very slight userspace visible difference in semantics. The following two cases now behave identically, where before order mattered: case 1: (explicit user action) B is a slave of A mount something on A/a , it will propagate to B/a and than mount something on B/a case 2: (tucked mount) B is a slave of A mount something on B/a and than mount something on A/a Histroically umount A/a would fail in case 1 and succeed in case 2. Now umount A/a succeeds in both configurations. This very small change in semantics appears if anything to be a bug fix to me and my survey of userspace leads me to believe that no programs will notice or care of this subtle semantic change. v2: Updated to mnt_change_mountpoint to not call dput or mntput and instead to decrement the counts directly. It is guaranteed that there will be other references when mnt_change_mountpoint is called so this is safe. v3: Moved put_mountpoint under mount_lock in attach_recursive_mnt As the locking in fs/namespace.c changed between v2 and v3. v4: Reworked the logic in propagate_mount_busy and __propagate_umount that detects when a mount completely covers another mount. v5: Removed unnecessary tests whose result is alwasy true in find_topper and attach_recursive_mnt. v6: Document the user space visible semantic difference. Fixes: b90fa9ae ("[PATCH] shared mount handling: bind and rbind") Tested-by: Andrei Vagin <avagin@virtuozzo.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Gavin Li authored
commit 8e290cec upstream. brcmf_sdio_fromevntchan() was being called on the the data frame rather than the software header, causing some frames to be mischaracterized as on the event channel rather than the data channel. This fixes a major performance regression (due to dropped packets). With this patch the download speed jumped from 1Mbit/s back up to 40MBit/s due to the sheer amount of packets being incorrectly processed. Fixes: c56caa9d ("brcmfmac: screening firmware event packet") Signed-off-by: Gavin Li <git@thegavinli.com> Acked-by: Arend van Spriel <arend.vanspriel@broadcom.com> [kvalo@codeaurora.org: improve commit logs based on email discussion] Signed-off-by: Kalle Valo <kvalo@codeaurora.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Andrew Donnellan authored
commit 171ed0fc upstream. Commit 14a3ae34 ("cxl: Prevent read/write to AFU config space while AFU not configured") introduced a rwsem to fix an invalid memory access that occurred when someone attempts to access the config space of an AFU on a vPHB whilst the AFU is deconfigured, such as during EEH recovery. It turns out that it's possible to run into a nested locking issue when EEH recovery fails and a full device hotplug is required. cxl_pci_error_detected() deconfigures the AFU, taking a writer lock on configured_rwsem. When EEH recovery fails, the EEH code calls pci_hp_remove_devices() to remove the device, which in turn calls cxl_remove() -> cxl_pci_remove_afu() -> pci_deconfigure_afu(), which tries to grab the writer lock that's already held. Standard rwsem semantics don't express what we really want to do here and don't allow for nested locking. Fix this by replacing the rwsem with an atomic_t which we can control more finely. Allow the AFU to be locked multiple times so long as there are no readers. Fixes: 14a3ae34 ("cxl: Prevent read/write to AFU config space while AFU not configured") Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Acked-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Andrew Donnellan authored
commit 14a3ae34 upstream. During EEH recovery, we deconfigure all AFUs whilst leaving the corresponding vPHB and virtual PCI device in place. If something attempts to interact with the AFU's PCI config space (e.g. running lspci) after the AFU has been deconfigured and before it's reconfigured, cxl_pcie_{read,write}_config() will read invalid values from the deconfigured struct cxl_afu and proceed to Oops when they try to dereference pointers that have been set to NULL during deconfiguration. Add a rwsem to struct cxl_afu so we can prevent interaction with config space while the AFU is deconfigured. Reported-by: Pradipta Ghosh <pradghos@in.ibm.com> Suggested-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com> Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Signed-off-by: Vaibhav Jain <vaibhav@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Thomas Petazzoni authored
commit 239a3b66 upstream. When TX descriptors are filled in, the buffer DMA address is split between the tx_desc->buf_phys_addr field (high-order bits) and tx_desc->packet_offset field (5 low-order bits). However, when we re-calculate the DMA address from the TX descriptor in mvpp2_txq_inc_put(), we do not take tx_desc->packet_offset into account. This means that when the DMA address is not aligned on a 32 bytes boundary, we end up calling dma_unmap_single() with a DMA address that was not the one returned by dma_map_single(). This inconsistency is detected by the kernel when DMA_API_DEBUG is enabled. We fix this problem by properly calculating the DMA address in mvpp2_txq_inc_put(). Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Heiko Carstens authored
commit 4920e3cf upstream. The current implementation of setup_randomness uses the stack address and therefore the pointer to the SYSIB 3.2.2 block as input data address. Furthermore the length of the input data is the number of virtual-machine description blocks which is typically one. This means that typically a single zero byte is fed to add_device_randomness. Fix both of these and use the address of the first virtual machine description block as input data address and also use the correct length. Fixes: bcfcbb6b ("s390: add system information as device randomness") Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Heiko Carstens authored
commit da8fd820 upstream. Commit bcfcbb6b ("s390: add system information as device randomness") intended to add some virtual machine specific information to the randomness pool. Unfortunately it uses the page allocator before it is ready to use. In result the page allocator always returns NULL and the setup_randomness function never adds anything to the randomness pool. To fix this use memblock_alloc and memblock_free instead. Fixes: bcfcbb6b ("s390: add system information as device randomness") Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Martin Schwidefsky authored
commit fb94a687 upstream. Return a sensible value if TASK_SIZE if called from a kernel thread. This gets us around an issue with copy_mount_options that does a magic size calculation "TASK_SIZE - (unsigned long)data" while in a kernel thread and data pointing to kernel space. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Peter Oberparleiter authored
commit 77759137 upstream. Prevent kernel crashes due to unhandled exceptions raised by the CHSC instruction which may for example be triggered by invalid ioctl data. Fixes: 64150adf ("s390/cio: Introduce generic synchronous CHSC IOCTL") Signed-off-by: Peter Oberparleiter <oberpar@linux.vnet.ibm.com> Reviewed-by: Sebastian Ott <sebott@linux.vnet.ibm.com> Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Michael Holzheu authored
commit a4a81d8e upstream. In binutils/libbfd (bfd/elf.c) it is enforced that all s390 specific ELF notes like e.g. NT_S390_PREFIX or NT_S390_CTRS have "LINUX" specified as note name. Otherwise the notes are ignored. For /proc/vmcore we currently use "CORE" for these notes. Up to now this has not been a real problem because the dump analysis tool "crash" does not check the note name. But it will break all programs that use libbfd for processing ELF notes. So fix this and use "LINUX" for all s390 specific notes to comply with libbfd. Reported-by: Philipp Rudo <prudo@linux.vnet.ibm.com> Reviewed-by: Philipp Rudo <prudo@linux.vnet.ibm.com> Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Gerald Schaefer authored
commit a63f53e3 upstream. Since commit dd22f551 "block: Change direct_access calling convention", the device size calculation in dcssblk_direct_access() is off-by-one. This results in bdev_direct_access() always returning -ENXIO because the returned value is not page aligned. Fix this by adding 1 to the dev_sz calculation. Fixes: dd22f551 ("block: Change direct_access calling convention") Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Julian Wiedmann authored
commit 1e4a382f upstream. For devices with multiple input queues, tiqdio_call_inq_handlers() iterates over all input queues and clears the device's DSCI during each iteration. If the DSCI is re-armed during one of the later iterations, we therefore do not scan the previous queues again. The re-arming also raises a new adapter interrupt. But its handler does not trigger a rescan for the device, as the DSCI has already been erroneously cleared. This can result in queue stalls on devices with multiple input queues. Fix it by clearing the DSCI just once, prior to scanning the queues. As the code is moved in front of the loop, we also need to access the DSCI directly (ie irq->dsci) instead of going via each queue's parent pointer to the same irq. This is not a functional change, and a follow-up patch will clean up the other users. In practice, this bug only affects CQ-enabled HiperSockets devices, ie. devices with sysfs-attribute "hsuid" set. Setting a hsuid is needed for AF_IUCV socket applications that use HiperSockets communication. Fixes: 104ea556 ("qdio: support asynchronous delivery of storage blocks") Reviewed-by: Ursula Braun <ubraun@linux.vnet.ibm.com> Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Dmitry Tunin authored
commit 441ad62d upstream. T: Bus=01 Lev=01 Prnt=01 Port=07 Cnt=04 Dev#= 5 Spd=12 MxCh= 0 D: Ver= 1.10 Cls=e0(wlcon) Sub=01 Prot=01 MxPS=64 #Cfgs= 1 P: Vendor=04ca ProdID=3018 Rev=00.01 C: #Ifs= 2 Cfg#= 1 Atr=e0 MxPwr=100mA I: If#= 0 Alt= 0 #EPs= 3 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb I: If#= 1 Alt= 0 #EPs= 2 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb Signed-off-by: Dmitry Tunin <hanipouspilot@gmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Chao Peng authored
commit 96794e4e upstream. Guest segment selector is 16 bit field and guest segment base is natural width field. Fix two incorrect invocations accordingly. Without this patch, build fails when aggressive inlining is used with ICC. Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Janosch Frank authored
commit e1e8a962 upstream. User controlled KVM guests do not support the dirty log, as they have no single gmap that we can check for changes. As they have no single gmap, kvm->arch.gmap is NULL and all further referencing to it for dirty checking will result in a NULL dereference. Let's return -EINVAL if a caller tries to sync dirty logs for a UCONTROL guest. Fixes: 15f36ebd ("KVM: s390: Add proper dirty bitmap support to S390 kvm.") Signed-off-by: Janosch Frank <frankja@linux.vnet.ibm.com> Reported-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Ian Abbott authored
commit 1c9c858e upstream. The MKS Instruments SCOM-0800 and SCOM-0801 cards (originally by Tenta Technologies) are 3U CompactPCI serial cards with 4 and 8 serial ports, respectively. The first 4 ports are implemented by an OX16PCI954 chip, and the second 4 ports are implemented by an OX16C954 chip on a local bus, bridged by the second PCI function of the OX16PCI954. The ports are jumper-selectable as RS-232 and RS-422/485, and the UARTs use a non-standard oscillator frequency of 20 MHz (base_baud = 1250000). Signed-off-by: Ian Abbott <abbotti@mev.co.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Alexander Popov authored
commit 82f2341c upstream. Currently N_HDLC line discipline uses a self-made singly linked list for data buffers and has n_hdlc.tbuf pointer for buffer retransmitting after an error. The commit be10eb75 ("tty: n_hdlc add buffer flushing") introduced racy access to n_hdlc.tbuf. After tx error concurrent flush_tx_queue() and n_hdlc_send_frames() can put one data buffer to tx_free_buf_list twice. That causes double free in n_hdlc_release(). Let's use standard kernel linked list and get rid of n_hdlc.tbuf: in case of tx error put current data buffer after the head of tx_buf_list. Signed-off-by: Alexander Popov <alex.popov@linux.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
- 12 Mar, 2017 8 commits
-
-
Greg Kroah-Hartman authored
-
Florian Westphal authored
commit e5072053 upstream. This further refines the changes made to conntrack gc_worker in commit e0df8cae ("netfilter: conntrack: refine gc worker heuristics"). The main idea of that change was to reduce the scan interval when evictions take place. However, on the reporters' setup, there are 1-2 million conntrack entries in total and roughly 8k new (and closing) connections per second. In this case we'll always evict at least one entry per gc cycle and scan interval is always at 1 jiffy because of this test: } else if (expired_count) { gc_work->next_gc_run /= 2U; next_run = msecs_to_jiffies(1); being true almost all the time. Given we scan ~10k entries per run its clearly wrong to reduce interval based on nonzero eviction count, it will only waste cpu cycles since a vast majorities of conntracks are not timed out. Thus only look at the ratio (scanned entries vs. evicted entries) to make a decision on whether to reduce or not. Because evictor is supposed to only kick in when system turns idle after a busy period, pick a high ratio -- this makes it 50%. We thus keep the idea of increasing scan rate when its likely that table contains many expired entries. In order to not let timed-out entries hang around for too long (important when using event logging, in which case we want to timely destroy events), we now scan the full table within at most GC_MAX_SCAN_JIFFIES (16 seconds) even in worst-case scenario where all timed-out entries sit in same slot. I tested this with a vm under synflood (with sysctl net.netfilter.nf_conntrack_tcp_timeout_syn_recv=3). While flood is ongoing, interval now stays at its max rate (GC_MAX_SCAN_JIFFIES / GC_MAX_BUCKETS_DIV -> 125ms). With feedback from Nicolas Dichtel. Reported-by: Denys Fedoryshchenko <nuclearcat@nuclearcat.com> Cc: Nicolas Dichtel <nicolas.dichtel@6wind.com> Fixes: b87a2f91 ("netfilter: conntrack: add gc worker to remove timed-out entries") Signed-off-by: Florian Westphal <fw@strlen.de> Tested-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Tested-by: Denys Fedoryshchenko <nuclearcat@nuclearcat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Florian Westphal authored
commit 524b698d upstream. Instead of breaking loop and instant resched, don't bother checking this in first place (the loop calls cond_resched for every bucket anyway). Suggested-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: Florian Westphal <fw@strlen.de> Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Yan, Zheng authored
commit d641df81 upstream. add_to_page_cache_lru() can fails, so the actual pages to read can be smaller than the initial size of osd request. We need to update osd request size in that case. Signed-off-by: Yan, Zheng <zyan@redhat.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
James Smart authored
commit 8ea73db4 upstream. Correct WQ creation for pagesize The driver was calculating the adapter command pagesize indicator from the system pagesize. However, the buffers the driver allocates are only one size (SLI4_PAGE_SIZE), so no calculation was necessary. Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <james.smart@broadcom.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Cc: Mauricio Faria de Oliveira <mauricfo@linux.vnet.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Ralf Baechle authored
commit ae2f5e5e upstream. Fix the following build error with binutils 2.25. CC arch/mips/mm/sc-ip22.o {standard input}: Assembler messages: {standard input}:132: Error: number (0x9000000080000000) larger than 32 bits {standard input}:159: Error: number (0x9000000080000000) larger than 32 bits {standard input}:200: Error: number (0x9000000080000000) larger than 32 bits scripts/Makefile.build:293: recipe for target 'arch/mips/mm/sc-ip22.o' failed make[1]: *** [arch/mips/mm/sc-ip22.o] Error 1 MIPS has used .set mips3 to temporarily switch the assembler to 64 bit mode in 64 bit kernels virtually forever. Binutils 2.25 broke this behavious partially by happily accepting 64 bit instructions in .set mips3 mode but puking on 64 bit constants when generating 32 bit ELF. Binutils 2.26 restored the old behaviour again. Fix build with binutils 2.25 by open coding the offending dli $1, 0x9000000080000000 as li $1, 0x9000 dsll $1, $1, 48 which is ugly be the only thing that will build on all binutils vintages. Signed-off-by: Ralf Baechle <ralf@linux-mips.org> Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Ralf Baechle authored
commit f9f1c8db upstream. Signed-off-by: Ralf Baechle <ralf@linux-mips.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Aneesh Kumar K.V authored
commit fda2d27d upstream. We will set LPCR with correct value for radix during int. This make sure we start with a sanitized value of LPCR. In case of kexec, cpus can have LPCR value based on the previous translation mode we were running. Fixes: fe036a06 ("powerpc/64/kexec: Fix MMU cleanup on radix") Acked-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-