- 26 Sep, 2014 40 commits
-
-
Paul Gortmaker authored
commit 7400ce7e (v3.4.92-76-g7400ce7e) was a backport of commit ebebd49a upstream ("8250/16?50: Add support for Broadcom TruManage redirected serial port") However, in the context of 3.4.x kernels, the pci setup code was expecting a struct uart_port and not a struct uart_8250_port, leading to the following concerning warnings: drivers/tty/serial/8250/8250_pci.c: In function ‘pci_brcm_trumanage_setup’: drivers/tty/serial/8250/8250_pci.c:1086:2: warning: passing argument 3 of ‘pci_default_setup’ from incompatible pointer type [enabled by default] int ret = pci_default_setup(priv, board, port, idx); ^ drivers/tty/serial/8250/8250_pci.c:1036:1: note: expected ‘struct uart_port *’ but argument is of type ‘struct uart_8250_port *’ pci_default_setup(struct serial_private *priv, ^ drivers/tty/serial/8250/8250_pci.c: At top level: drivers/tty/serial/8250/8250_pci.c:1746:3: warning: initialization from incompatible pointer type [enabled by default] .setup = pci_brcm_trumanage_setup, ^ drivers/tty/serial/8250/8250_pci.c:1746:3: warning: (near initialization for ‘pci_serial_quirks[56].setup’) [enabled by default] I'd also expect the initialization to not function correctly, and perhaps dereference random garbage due to this. Since the uart_port is a field within the uart_8250_port, the adaptation to fix these warnings is a straightforward removal of a layer of indirection. Cc: Stephen Hurd <shurd@broadcom.com> Cc: Michael Chan <mchan@broadcom.com> Cc: Ben Hutchings <ben@decadent.org.uk> Cc: Rui Xiang <rui.xiang@huawei.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by:
Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by:
Zefan Li <lizefan@huawei.com> (cherry picked from commit 82a938ab) (cherry picked from commit HEAD) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
-
Stefan Kristiansson authored
Prevents build issue with updated toolchain Reported-by:
Jack Thomasson <jkt@moonlitsw.com> Tested-by:
Christian Svensson <blue@cmd.nu> Signed-off-by:
Stefan Kristiansson <stefan.kristiansson@saunalahti.fi> Signed-off-by:
Jonas Bonn <jonas@southpole.se> (cherry picked from commit 160d8378) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
-
Johan Hovold authored
Make sure to verify the number of ports requested by subdriver to avoid writing beyond the end of fixed-size array in interface data. The current usb-serial implementation is limited to eight ports per interface but failed to verify that the number of ports requested by a subdriver (which could have been determined from device descriptors) did not exceed this limit. Cc: stable <stable@vger.kernel.org> Signed-off-by:
Johan Hovold <johan@kernel.org> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org> (cherry picked from commit 5654699f) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
-
Mark Rutland authored
On revisions of Cortex-A15 prior to r3p3, a CLREX instruction at PL1 may falsely trigger a watchpoint exception, leading to potential data aborts during exception return and/or livelock. This patch resolves the issue in the following ways: - Replacing our uses of CLREX with a dummy STREX sequence instead (as we did for v6 CPUs). - Removing the clrex code from v7_exit_coherency_flush and derivatives, since this only exists as a minor performance improvement when non-cached exclusives are in use (Linux doesn't use these). Benchmarking on a variety of ARM cores revealed no measurable performance difference with this change applied, so the change is performed unconditionally and no new Kconfig entry is added. Signed-off-by:
Mark Rutland <mark.rutland@arm.com> Signed-off-by:
Will Deacon <will.deacon@arm.com> Cc: stable@vger.kernel.org Signed-off-by:
Russell King <rmk+kernel@arm.linux.org.uk> (cherry picked from commit 2c32c65e) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
-
Mark Rutland authored
The ARMv6 and ARMv7 early abort handlers clear the exclusive monitors upon entry to the kernel, but this is redundant: - We clear the monitors on every exception return since commit 200b812d ("Clear the exclusive monitor when returning from an exception"), so this is not necessary to ensure the monitors are cleared before returning from a fault handler. - Any dummy STREX will target a temporary scratch area in memory, and may succeed or fail without corrupting useful data. Its status value will not be used. - Any other STREX in the kernel must be preceded by an LDREX, which will initialise the monitors consistently and will not depend on the earlier state of the monitors. Therefore we have no reason to care about the initial state of the exclusive monitors when a data abort is taken, and clearing the monitors prior to exception return (as we already do) is sufficient. This patch removes the redundant clearing of the exclusive monitors from the early abort handlers. Signed-off-by:
Mark Rutland <mark.rutland@arm.com> Acked-by:
Will Deacon <will.deacon@arm.com> Cc: stable@vger.kernel.org Signed-off-by:
Russell King <rmk+kernel@arm.linux.org.uk> (cherry picked from commit 85868313) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
-
Jiri Kosina authored
The report passed to us from transport driver could potentially be arbitrarily large, therefore we better sanity-check it so that magicmouse_emit_touch() gets only valid values of raw_id. Cc: stable@vger.kernel.org Reported-by:
Steven Vittitoe <scvitti@google.com> Signed-off-by:
Jiri Kosina <jkosina@suse.cz> (cherry picked from commit c54def7b) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
-
Stephen Hemminger authored
I have a j5 create (JUA210) USB 2 video device and adding it device id to SIS USB video gets it to work. Signed-off-by:
Stephen Hemminger <stephen@networkplumber.org> Cc: stable <stable@vger.kernel.org> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org> (cherry picked from commit 5b6b80ae) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
-
Benjamin Tissoires authored
Commit "HID: logitech: perform bounds checking on device_id early enough" unfortunately leaks some errors to dmesg which are not real ones: - if the report is not a DJ one, then there is not point in checking the device_id - the receiver (index 0) can also receive some notifications which can be safely ignored given the current implementation Move out the test regarding the report_id and also discards printing errors when the receiver got notified. Fixes: ad3e14d7 Cc: stable@vger.kernel.org Reported-and-tested-by:
Markus Trippelsdorf <markus@trippelsdorf.de> Signed-off-by:
Benjamin Tissoires <benjamin.tissoires@redhat.com> Signed-off-by:
Jiri Kosina <jkosina@suse.cz> (cherry picked from commit 5abfe85c) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
-
Jaša Bartelj authored
Added support to the ftdi_sio driver for ekey Converter USB which uses an FT232BM chip. Signed-off-by:
Jaša Bartelj <jasa.bartelj@gmail.com> Cc: stable <stable@vger.kernel.org> Signed-off-by:
Johan Hovold <johan@kernel.org> (cherry picked from commit 646907f5) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
-
Greg KH authored
This adds a new device id to the pl2303 driver for the ZTEK device. Reported-by:
Mike Chu <Mike-Chu@prolific.com.tw> Cc: stable <stable@vger.kernel.org> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by:
Johan Hovold <johan@kernel.org> (cherry picked from commit 91fcb1ce) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
-
Brennan Ashton authored
This VIA Telecom baseband processor is used is used by by u-blox in both the FW2770 and FW2760 products and may be used in others as well. This patch has been tested on both of these modem versions. Signed-off-by:
Brennan Ashton <bashton@brennanashton.com> Cc: stable <stable@vger.kernel.org> Signed-off-by:
Johan Hovold <johan@kernel.org> (cherry picked from commit d7730273) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
-
Max Filippov authored
Remove restoring a6 on some return paths and instead modify and restore it in a single place, using symbolic name. Correctly restore a7 from PT_AREG7 in case of illegal a6 value. Cc: stable@vger.kernel.org Signed-off-by:
Max Filippov <jcmvbkbc@gmail.com> (cherry picked from commit d1b6ba82) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
-
Max Filippov authored
Current definition of TLBTEMP_BASE_2 is always 32K above the TLBTEMP_BASE_1, whereas fast_second_level_miss handler for the TLBTEMP region analyzes virtual address bit (PAGE_SHIFT + DCACHE_ALIAS_ORDER) to determine TLBTEMP region where the fault happened. The size of the TLBTEMP region is also checked incorrectly: not 64K, but twice data cache way size (whicht may as well be less than the instruction cache way size). Fix TLBTEMP_BASE_2 to be TLBTEMP_BASE_1 + data cache way size. Provide TLBTEMP_SIZE that is a greater of doubled data cache way size or the instruction cache way size, and use it to determine if the second level TLB miss occured in the TLBTEMP region. Practical occurence of page faults in the TLBTEMP area is extremely rare, this code can be tested by deletion of all w[di]tlb instructions in the tlbtemp_mapping region. Cc: stable@vger.kernel.org Signed-off-by:
Max Filippov <jcmvbkbc@gmail.com> (cherry picked from commit 7128039f) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
-
Alan Douglas authored
Virtual address is translated to the XCHAL_KSEG_CACHED region in the dma_free_coherent, but is checked to be in the 0...XCHAL_KSEG_SIZE range. Change check for end of the range from 'addr >= X' to 'addr > X - 1' to handle the case of X == 0. Replace 'if (C) BUG();' construct with 'BUG_ON(C);'. Cc: stable@vger.kernel.org Signed-off-by:
Alan Douglas <adouglas@cadence.com> Signed-off-by:
Max Filippov <jcmvbkbc@gmail.com> (cherry picked from commit 1ca49463) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
-
Arjun Sreedharan authored
scc_bus_softreset not necessarily should return zero. Propagate the error code. Signed-off-by:
Arjun Sreedharan <arjun024@gmail.com> Signed-off-by:
Tejun Heo <tj@kernel.org> Cc: stable@vger.kernel.org (cherry picked from commit 4dc7c76c) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
-
Anton Blanchard authored
Hidden away in the last 8 bytes of the buffer_list page is a solitary statistic. It needs to be byte swapped or else ethtool -S will produce numbers that terrify the user. Since we do this in multiple places, create a helper function with a comment explaining what is going on. Signed-off-by:
Anton Blanchard <anton@samba.org> Cc: stable@vger.kernel.org Signed-off-by:
David S. Miller <davem@davemloft.net> (cherry picked from commit cbd52281) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
-
David S. Miller authored
The assumption was that update_mmu_cache() (and the equivalent for PMDs) would only be called when the PTE being installed will be accessible by the user. This is not true for code paths originating from remove_migration_pte(). There are dire consequences for placing a non-valid PTE into the TSB. The TLB miss frramework assumes thatwhen a TSB entry matches we can just load it into the TLB and return from the TLB miss trap. So if a non-valid PTE is in there, we will deadlock taking the TLB miss over and over, never satisfying the miss. Just exit early from update_mmu_cache() and friends in this situation. Based upon a report and patch from Christopher Alexander Tobias Schulze. Signed-off-by:
David S. Miller <davem@davemloft.net> (cherry picked from commit 18f38132) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
-
H. Peter Anvin authored
Make espfix64 a hidden Kconfig option. This fixes the x86-64 UML build which had broken due to the non-existence of init_espfix_bsp() in UML: since UML uses its own Kconfig, this option does not appear in the UML build. This also makes it possible to make support for 16-bit segments a configuration option, for the people who want to minimize the size of the kernel. Reported-by:
Ingo Molnar <mingo@kernel.org> Signed-off-by:
H. Peter Anvin <hpa@zytor.com> Cc: Richard Weinberger <richard@nod.at> Link: http://lkml.kernel.org/r/1398816946-3351-1-git-send-email-hpa@linux.intel.com (cherry picked from commit 197725de) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
-
Thomas Gleixner authored
The current deadlock detection logic does not work reliably due to the following early exit path: /* * Drop out, when the task has no waiters. Note, * top_waiter can be NULL, when we are in the deboosting * mode! */ if (top_waiter && (!task_has_pi_waiters(task) || top_waiter != task_top_pi_waiter(task))) goto out_unlock_pi; So this not only exits when the task has no waiters, it also exits unconditionally when the current waiter is not the top priority waiter of the task. So in a nested locking scenario, it might abort the lock chain walk and therefor miss a potential deadlock. Simple fix: Continue the chain walk, when deadlock detection is enabled. We also avoid the whole enqueue, if we detect the deadlock right away (A-A). It's an optimization, but also prevents that another waiter who comes in after the detection and before the task has undone the damage observes the situation and detects the deadlock and returns -EDEADLOCK, which is wrong as the other task is not in a deadlock situation. Signed-off-by:
Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Reviewed-by:
Steven Rostedt <rostedt@goodmis.org> Cc: Lai Jiangshan <laijs@cn.fujitsu.com> Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/20140522031949.725272460@linutronix.deSigned-off-by:
Thomas Gleixner <tglx@linutronix.de> (cherry picked from commit 397335f0) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
-
Nadav Amit authored
commit 682367c4 upstream. Recent Intel CPUs have 10 variable range MTRRs. Since operating systems sometime make assumptions on CPUs while they ignore capability MSRs, it is better for KVM to be consistent with recent CPUs. Reporting more MTRRs than actually supported has no functional implications. Signed-off-by:
Nadav Amit <namit@cs.technion.ac.il> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org> (cherry picked from commit 80bbfbaa)
-
Thomas Hellstrom authored
Commit "drm/vmwgfx: correct fb_fix_screeninfo.line_length", while fixing a vmwgfx fbdev bug, also writes the pitch to a supposedly read-only register: SVGA_REG_BYTES_PER_LINE, while it should be (and also in fact is) written to SVGA_REG_PITCHLOCK. This patch is Cc'd stable because of the unknown effects writing to this register might have, particularly on older device versions. v2: Updated log message. Cc: stable@vger.kernel.org Cc: Christopher Friedt <chrisfriedt@gmail.com> Tested-by:
Christopher Friedt <chrisfriedt@gmail.com> Signed-off-by:
Thomas Hellstrom <thellstrom@vmware.com> Reviewed-by:
Jakob Bornecrantz <jakob@vmware.com> (cherry picked from commit 4e578080) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
-
Jaganath Kanakkassery authored
The length check is invalid since the length varies with type of info response. This was introduced by the commit cb3b3152 Because of this, l2cap info rsp is not handled and command reject is sent. > ACL data: handle 11 flags 0x02 dlen 16 L2CAP(s): Info rsp: type 2 result 0 Extended feature mask 0x00b8 Enhanced Retransmission mode Streaming mode FCS Option Fixed Channels < ACL data: handle 11 flags 0x00 dlen 10 L2CAP(s): Command rej: reason 0 Command not understood Cc: stable@vger.kernel.org Signed-off-by:
Jaganath Kanakkassery <jaganath.k@samsung.com> Signed-off-by:
Chan-Yeol Park <chanyeol.park@samsung.com> Acked-by:
Johan Hedberg <johan.hedberg@intel.com> Signed-off-by:
Gustavo Padovan <gustavo.padovan@collabora.co.uk> ath9k_htc: Handle IDLE state transition properly Make sure that a chip reset is done when IDLE is turned off - this fixes authentication timeouts. Cc: stable@vger.kernel.org Reported-by:
Ignacy Gawedzki <i@lri.fr> Signed-off-by:
Sujith Manoharan <c_manoha@qca.qualcomm.com> Signed-off-by:
John W. Linville <linville@tuxdriver.com> ath9k: fix an RCU issue in calling ieee80211_get_tx_rates ath_txq_schedule is called outside of the drv_tx call, so it needs RCU protection. Signed-off-by:
Felix Fietkau <nbd@openwrt.org> Signed-off-by:
John W. Linville <linville@tuxdriver.com> Bluetooth: Fix invalid length check in l2cap_information_rsp() The length check is invalid since the length varies with type of info response. This was introduced by the commit cb3b3152 Because of this, l2cap info rsp is not handled and command reject is sent. > ACL data: handle 11 flags 0x02 dlen 16 L2CAP(s): Info rsp: type 2 result 0 Extended feature mask 0x00b8 Enhanced Retransmission mode Streaming mode FCS Option Fixed Channels < ACL data: handle 11 flags 0x00 dlen 10 L2CAP(s): Command rej: reason 0 Command not understood Cc: stable@vger.kernel.org Signed-off-by:
Jaganath Kanakkassery <jaganath.k@samsung.com> Signed-off-by:
Chan-Yeol Park <chanyeol.park@samsung.com> Acked-by:
Johan Hedberg <johan.hedberg@intel.com> Signed-off-by:
Gustavo Padovan <gustavo.padovan@collabora.co.uk> Merge branch 'for-john' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211 nl80211: fix attrbuf access race by allocating a separate one Since my commit 3713b4e3 ("nl80211: allow splitting wiphy information in dumps"), nl80211_dump_wiphy() uses the global nl80211_fam.attrbuf for parsing the incoming data. This wouldn't be a problem if it only did so on the first dump iteration which is locked against other commands in generic netlink, but due to space constraints in cb->args (the needed state doesn't fit) I decided to always parse the original message. That's racy though since nl80211_fam.attrbuf could be used by some other parsing in generic netlink concurrently. For now, fix this by allocating a separate parse buffer (it's a bit too big for the stack, currently 1448 bytes on 64-bit). For -next, I'll change the code to parse into the global buffer in the first round only and then allocate a smaller buffer to keep the data in cb->args. Reported-by:
Linus Torvalds <torvalds@linux-foundation.org> Acked-by:
David S. Miller <davem@davemloft.net> Acked-by:
John W. Linville <linville@tuxdriver.com> Signed-off-by:
Johannes Berg <johannes.berg@intel.com> (cherry picked from commit da9910ac 3f6fa3d4) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
-
Konrad Rzeszutek Wilk authored
The 'intel_idle_probe' probes the CPU and sets the CPU notifier. But if later on during the module initialization we fail (say in cpuidle_register_driver), we stop loading, but we neglect to unregister the CPU notifier. This means that during CPU hotplug events the system will fail: calling intel_idle_init+0x0/0x326 @ 1 intel_idle: MWAIT substates: 0x1120 intel_idle: v0.4 model 0x2A intel_idle: lapic_timer_reliable_states 0xffffffff intel_idle: intel_idle yielding to none initcall intel_idle_init+0x0/0x326 returned -19 after 14 usecs ... some time later, offlining and onlining a CPU: cpu 3 spinlock event irq 62 BUG: unable to ] __cpuidle_register_device+0x1c/0x120 PGD 99b8b067 PUD 99b95067 PMD 0 Oops: 0000 [#1] SMP Modules linked in: xen_evtchn nouveau mxm_wmi wmi radeon ttm i915 fbcon tileblit font atl1c bitblit softcursor drm_kms_helper video xen_blkfront xen_netfront fb_sys_fops sysimgblt sysfillrect syscopyarea xenfs xen_privcmd mperf CPU 0 Pid: 2302, comm: udevd Not tainted 3.8.0-rc3upstream-00249-g09ad159 #1 MSI MS-7680/H61M-P23 (MS-7680) RIP: e030:[<ffffffff814d956c>] [<ffffffff814d956c>] __cpuidle_register_device+0x1c/0x120 RSP: e02b:ffff88009dacfcb8 EFLAGS: 00010286 RAX: 0000000000000000 RBX: ffff880105380000 RCX: 000000000000001c RDX: 0000000000000000 RSI: 0000000000000055 RDI: ffff880105380000 RBP: ffff88009dacfce8 R08: ffffffff81a4f048 R09: 0000000000000008 R10: 0000000000000008 R11: 0000000000000000 R12: ffff880105380000 R13: 00000000ffffffdd R14: 0000000000000000 R15: ffffffff81a523d0 FS: 00007f37bd83b7a0(0000) GS:ffff880105200000(0000) knlGS:0000000000000000 CS: e033 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000008 CR3: 00000000a09ea000 CR4: 0000000000042660 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Process udevd (pid: 2302, threadinfo ffff88009dace000, task ffff88009afb47f0) Stack: ffffffff8107f2d0 ffffffff810c2fb7 ffff88009dacfce8 00000000ffffffea ffff880105380000 00000000ffffffdd ffff88009dacfd08 ffffffff814d9882 0000000000000003 ffff880105380000 ffff88009dacfd28 ffffffff81340afd Call Trace: [<ffffffff8107f2d0>] ? collect_cpu_info_local+0x30/0x30 [<ffffffff810c2fb7>] ? __might_sleep+0xe7/0x100 [<ffffffff814d9882>] cpuidle_register_device+0x32/0x70 [<ffffffff81340afd>] intel_idle_cpu_init+0xad/0x110 [<ffffffff81340bc8>] cpu_hotplug_notify+0x68/0x80 [<ffffffff8166023d>] notifier_call_chain+0x4d/0x70 [<ffffffff810bc369>] __raw_notifier_call_chain+0x9/0x10 [<ffffffff81094a4b>] __cpu_notify+0x1b/0x30 [<ffffffff81652cf7>] _cpu_up+0x103/0x14b [<ffffffff81652e18>] cpu_up+0xd9/0xec [<ffffffff8164a254>] store_online+0x94/0xd0 [<ffffffff814122fb>] dev_attr_store+0x1b/0x20 [<ffffffff81216404>] sysfs_write_file+0xf4/0x170 [<ffffffff811a1024>] vfs_write+0xb4/0x130 [<ffffffff811a17ea>] sys_write+0x5a/0xa0 [<ffffffff816643a9>] system_call_fastpath+0x16/0x1b Code: 03 18 00 c9 c3 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 48 83 ec 30 48 89 5d e8 4c 89 65 f0 48 89 fb 4c 89 6d f8 e8 84 08 00 00 <48> 8b 78 08 49 89 c4 e8 f8 7f c1 ff 89 c2 b8 ea ff ff ff 84 d2 RIP [<ffffffff814d956c>] __cpuidle_register_device+0x1c/0x120 RSP <ffff88009dacfcb8> This patch fixes that by moving the CPU notifier registration as the last item to be done by the module. Signed-off-by:
Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Reviewed-by:
Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Cc: 3.6+ <stable@vger.kernel.org> Signed-off-by:
Rafael J. Wysocki <rafael.j.wysocki@intel.com> (cherry picked from commit 6f8c2e79) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
-
Florian Westphal authored
local_df means 'ignore DF bit if set', so if its set we're allowed to perform ip fragmentation. This wasn't noticed earlier because the output path also drops such skbs (and emits needed icmp error) and because netfilter ip defrag did not set local_df until couple of days ago. Only difference is that DF-packets-larger-than MTU now discarded earlier (f.e. we avoid pointless netfilter postrouting trip). While at it, drop the repeated test ip_exceeds_mtu, checking it once is enough... Fixes: fe6cc55f ("net: ip, ipv6: handle gso skbs in forwarding path") Signed-off-by:
Florian Westphal <fw@strlen.de> Signed-off-by:
David S. Miller <davem@davemloft.net> (cherry picked from commit ca6c5d4a) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
-
Christopher Friedt authored
commit aa6de142 upstream. Previously, the vmwgfx_fb driver would allow users to call FBIOSET_VINFO, but it would not adjust the FINFO properly, resulting in distorted screen rendering. The patch corrects that behaviour. See https://bugs.gentoo.org/show_bug.cgi?id=494794 for examples. Signed-off-by:
Christopher Friedt <chrisfriedt@gmail.com> Reviewed-by:
Thomas Hellstrom <thellstrom@vmware.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org> (cherry picked from commit b8a0ddef)
-
Roger Quadros authored
OMAP3 doesn't contain "l3_init_clkdm" clock domain. Use the proper clock domains for USB Host and USB TLL modules. Gets rid of the following warnings during boot omap_hwmod: usb_host_hs: could not associate to clkdm l3_init_clkdm omap_hwmod: usb_tll_hs: could not associate to clkdm l3_init_clkdm Reported-by:
Nishanth Menon <nm@ti.com> Cc: Paul Walmsley <paul@pwsan.com> Signed-off-by:
Roger Quadros <rogerq@ti.com> Fixes: de231388 ("ARM: OMAP: USB: EHCI and OHCI hwmod structures for OMAP3") Cc: Keshava Munegowda <keshava_mgowda@ti.com> Cc: Partha Basak <parthab@india.ti.com> Signed-off-by:
Paul Walmsley <paul@pwsan.com> (cherry picked from commit c6c56697) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
-
Nicholas Santos authored
Patch to add the Formosa Industrial Computing, Inc. Infrared Receiver [IR605A/Q] to hid-ids.h and hid-quirks.c. This IR receiver causes about a 10 second timeout when the usbhid driver attempts to initialze the device. Adding this device to the quirks list with HID_QUIRK_NO_INIT_REPORTS removes the delay. Signed-off-by:
Nicholas Santos <nicholas.santos@gmail.com> [jkosina@suse.cz: fix ordering] Signed-off-by:
Jiri Kosina <jkosina@suse.cz> (cherry picked from commit 320cde19) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
-
Lai Jiangshan authored
When a kworker should die, the kworkre is notified through WORKER_DIE flag instead of kthread_should_stop(). This, IIRC, is primarily to keep the test synchronized inside worker_pool lock. WORKER_DIE is first set while holding pool->lock, the lock is dropped and kthread_stop() is called. Unfortunately, this means that there's a slight chance that the target kworker may see WORKER_DIE before kthread_stop() finishes and exits and frees the target task before or during kthread_stop(). Fix it by pinning the target task before setting WORKER_DIE and putting it after kthread_stop() is done. tj: Improved patch description and comment. Moved pinning above WORKER_DIE for better signify what it's protecting. CC: stable@vger.kernel.org Signed-off-by:
Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by:
Tejun Heo <tj@kernel.org> (cherry picked from commit 5bdfff96) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
-
Jason Gunthorpe authored
Some Atmel TPMs provide completely wrong timeouts from their TPM_CAP_PROP_TIS_TIMEOUT query. This patch detects that and returns new correct values via a DID/VID table in the TIS driver. Tested on ARM using an AT97SC3204T FW version 37.16 Cc: <stable@vger.kernel.org> [PHuewe: without this fix these 'broken' Atmel TPMs won't function on older kernels] Signed-off-by:
"Berg, Christopher" <Christopher.Berg@atmel.com> Signed-off-by:
Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Signed-off-by:
Peter Huewe <peterhuewe@gmx.de> (cherry picked from commit 8e54caf4) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
-
Linus Torvalds authored
The performance regression that Josef Bacik reported in the pathname lookup (see commit 99d263d4 "vfs: fix bad hashing of dentries") made me look at performance stability of the dcache code, just to verify that the problem was actually fixed. That turned up a few other problems in this area. There are a few cases where we exit RCU lookup mode and go to the slow serializing case when we shouldn't, Al has fixed those and they'll come in with the next VFS pull. But my performance verification also shows that link_path_walk() turns out to have a very unfortunate 32-bit store of the length and hash of the name we look up, followed by a 64-bit read of the combined hash_len field. That screws up the processor store to load forwarding, causing an unnecessary hickup in this critical routine. It's caused by the ugly calling convention for the "hash_name()" function, and easily fixed by just making hash_name() fill in the whole 'struct qstr' rather than passing it a pointer to just the hash value. With that, the profile for this function looks much smoother. Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org> Merge branch 'parisc-3.17-1' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux Pull parisc updates from Helge Deller: "The most important patch is a new Light Weigth Syscall (LWS) for 8, 16, 32 and 64 bit atomic CAS operations which is required in order to be able to implement the atomic gcc builtins on our platform. Other than that, we wire up the seccomp, getrandom and memfd_create syscalls, fixes a minor off-by-one bug and a wrong printk string" * 'parisc-3.17-1' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux: parisc: Implement new LWS CAS supporting 64 bit operations. parisc: Wire up seccomp, getrandom and memfd_create syscalls parisc: dino: fix %d confusingly prefixed with 0x in format string parisc: sys_hpux: NUL terminator is one past the end Merge tag 'ntb-3.17' of git://github.com/jonmason/ntb Pull ntb driver bugfixes from Jon Mason: "NTB driver fixes for queue spread and buffer alignment. Also, update to MAINTAINERS to reflect new e-mail address" * tag 'ntb-3.17' of git://github.com/jonmason/ntb: ntb: Add alignment check to meet hardware requirement MAINTAINERS: update NTB info NTB: correct the spread of queues over mw's Merge branch 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull ARM irq chip fixes from Thomas Gleixner: "Another pile of ARM specific irq chip fixlets: - off by one bugs in the crossbar driver - missing annotations - a bunch of "make it compile" updates I pulled the lot today from Jason, but it has been in -next for at least a week" * 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: irqchip: gic-v3: Declare rdist as __percpu pointer to __iomem pointer irqchip: gic: Make gic_default_routable_irq_domain_ops static irqchip: exynos-combiner: Fix compilation error on ARM64 irqchip: crossbar: Off by one bugs in init irqchip: gic-v3: Tag all low level accessors __maybe_unused irqchip: gic-v3: Only define gic_peek_irq() when building SMP Merge tag 'irqchip-urgent-3.17' of git://git.infradead.org/users/jcooper/linux into irq/urgent irqchip fixes for v3.17 from Jason Cooper - GIC/GICV3: Various fixlets - crossbar: Fix off-by-one bug - exynos-combiner: Fix arm64 build error ntb: Add alignment check to meet hardware requirement The NTB translate register must have the value to be BAR size aligned. This alignment check make sure that the DMA memory allocated has the proper alignment. Another requirement for NTB to function properly with memory window BAR size greater or equal to 4M is to use the CMA feature in 3.16 kernel with the appropriate CONFIG_CMA_ALIGNMENT and CONFIG_CMA_SIZE_MBYTES set. Signed-off-by:
Dave Jiang <dave.jiang@intel.com> Signed-off-by:
Jon Mason <jdmason@kudzu.us> MAINTAINERS: update NTB info Update my contact info to my personal email address and add Dave Jiang. Signed-off-by:
Jon Mason <jon.mason@intel.com> Signed-off-by:
Dave Jiang <dave.jiang@intel.com> NTB: correct the spread of queues over mw's The detection of an uneven number of queues on the given memory windows was not correct. The mw_num is zero based and the mod should be division to spread them evenly over the mw's. Signed-off-by:
Jon Mason <jon.mason@intel.com> Merge branches 'locking-urgent-for-linus' and 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull futex and timer fixes from Thomas Gleixner: "A oneliner bugfix for the jinxed futex code: - Drop hash bucket lock in the error exit path. I really could slap myself for intruducing that bug while fixing all the other horror in that code three month ago ... and the timer department is not too proud about the following fixes: - Deal with a long standing rounding bug in the timeval to jiffies conversion. It's a real issue and this fix fell through the cracks for quite some time. - Another round of alarmtimer fixes. Finally this code gets used more widely and the subtle issues hidden for quite some time are noticed and fixed. Nothing really exciting, just the itty bitty details which bite the serious users here and there" * 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: futex: Unlock hb->lock in futex_wait_requeue_pi() error path * 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: alarmtimer: Lock k_itimer during timer callback alarmtimer: Do not signal SIGEV_NONE timers alarmtimer: Return relative times in timer_gettime jiffies: Fix timeval conversion to jiffies parisc: Implement new LWS CAS supporting 64 bit operations. The current LWS cas only works correctly for 32bit. The new LWS allows for CAS operations of variable size. Signed-off-by:
Guy Martin <gmsoft@tuxicoman.be> Cc: <stable@vger.kernel.org> # 3.13+ Signed-off-by:
Helge Deller <deller@gmx.de> vfs: fix bad hashing of dentries Josef Bacik found a performance regression between 3.2 and 3.10 and narrowed it down to commit bfcfaa77 ("vfs: use 'unsigned long' accesses for dcache name comparison and hashing"). He reports: "The test case is essentially for (i = 0; i < 1000000; i++) mkdir("a$i"); On xfs on a fio card this goes at about 20k dir/sec with 3.2, and 12k dir/sec with 3.10. This is because we spend waaaaay more time in __d_lookup on 3.10 than in 3.2. The new hashing function for strings is suboptimal for < sizeof(unsigned long) string names (and hell even > sizeof(unsigned long) string names that I've tested). I broke out the old hashing function and the new one into a userspace helper to get real numbers and this is what I'm getting: Old hash table had 1000000 entries, 0 dupes, 0 max dupes New hash table had 12628 entries, 987372 dupes, 900 max dupes We had 11400 buckets with a p50 of 30 dupes, p90 of 240 dupes, p99 of 567 dupes for the new hash My test does the hash, and then does the d_hash into a integer pointer array the same size as the dentry hash table on my system, and then just increments the value at the address we got to see how many entries we overlap with. As you can see the old hash function ended up with all 1 million entries in their own bucket, whereas the new one they are only distributed among ~12.5k buckets, which is why we're using so much more CPU in __d_lookup". The reason for this hash regression is two-fold: - On 64-bit architectures the down-mixing of the original 64-bit word-at-a-time hash into the final 32-bit hash value is very simplistic and suboptimal, and just adds the two 32-bit parts together. In particular, because there is no bit shuffling and the mixing boundary is also a byte boundary, similar character patterns in the low and high word easily end up just canceling each other out. - the old byte-at-a-time hash mixed each byte into the final hash as it hashed the path component name, resulting in the low bits of the hash generally being a good source of hash data. That is not true for the word-at-a-time case, and the hash data is distributed among all the bits. The fix is the same in both cases: do a better job of mixing the bits up and using as much of the hash data as possible. We already have the "hash_32|64()" functions to do that. Reported-by:
Josef Bacik <jbacik@fb.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christoph Hellwig <hch@infradead.org> Cc: Chris Mason <clm@fb.com> Cc: linux-fsdevel@vger.kernel.org Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org> alarmtimer: Lock k_itimer during timer callback Locks the k_itimer's it_lock member when handling the alarm timer's expiry callback. The regular posix timers defined in posix-timers.c have this lock held during timout processing because their callbacks are routed through posix_timer_fn(). The alarm timers follow a different path, so they ought to grab the lock somewhere else. Cc: stable@vger.kernel.org Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@kernel.org> Cc: Richard Cochran <richardcochran@gmail.com> Cc: Prarit Bhargava <prarit@redhat.com> Cc: Sharvil Nanavati <sharvil@google.com> Signed-off-by:
Richard Larocque <rlarocque@google.com> Signed-off-by:
John Stultz <john.stultz@linaro.org> alarmtimer: Do not signal SIGEV_NONE timers Avoids sending a signal to alarm timers created with sigev_notify set to SIGEV_NONE by checking for that special case in the timeout callback. The regular posix timers avoid sending signals to SIGEV_NONE timers by not scheduling any callbacks for them in the first place. Although it would be possible to do something similar for alarm timers, it's simpler to handle this as a special case in the timeout. Prior to this patch, the alarm timer would ignore the sigev_notify value and try to deliver signals to the process anyway. Even worse, the sanity check for the value of sigev_signo is skipped when SIGEV_NONE was specified, so the signal number could be bogus. If sigev_signo was an unitialized value (as it often would be if SIGEV_NONE is used), then it's hard to predict which signal will be sent. Cc: stable@vger.kernel.org Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@kernel.org> Cc: Richard Cochran <richardcochran@gmail.com> Cc: Prarit Bhargava <prarit@redhat.com> Cc: Sharvil Nanavati <sharvil@google.com> Signed-off-by:
Richard Larocque <rlarocque@google.com> Signed-off-by:
John Stultz <john.stultz@linaro.org> alarmtimer: Return relative times in timer_gettime Returns the time remaining for an alarm timer, rather than the time at which it is scheduled to expire. If the timer has already expired or it is not currently scheduled, the it_value's members are set to zero. This new behavior matches that of the other posix-timers and the POSIX specifications. This is a change in user-visible behavior, and may break existing applications. Hopefully, few users rely on the old incorrect behavior. Cc: stable@vger.kernel.org Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@kernel.org> Cc: Richard Cochran <richardcochran@gmail.com> Cc: Prarit Bhargava <prarit@redhat.com> Cc: Sharvil Nanavati <sharvil@google.com> Signed-off-by:
Richard Larocque <rlarocque@google.com> [jstultz: minor style tweak] Signed-off-by:
John Stultz <john.stultz@linaro.org> jiffies: Fix timeval conversion to jiffies timeval_to_jiffies tried to round a timeval up to an integral number of jiffies, but the logic for doing so was incorrect: intervals corresponding to exactly N jiffies would become N+1. This manifested itself particularly repeatedly stopping/starting an itimer: setitimer(ITIMER_PROF, &val, NULL); setitimer(ITIMER_PROF, NULL, &val); would add a full tick to val, _even if it was exactly representable in terms of jiffies_ (say, the result of a previous rounding.) Doing this repeatedly would cause unbounded growth in val. So fix the math. Here's what was wrong with the conversion: we essentially computed (eliding seconds) jiffies = usec * (NSEC_PER_USEC/TICK_NSEC) by using scaling arithmetic, which took the best approximation of NSEC_PER_USEC/TICK_NSEC with denominator of 2^USEC_JIFFIE_SC = x/(2^USEC_JIFFIE_SC), and computed: jiffies = (usec * x) >> USEC_JIFFIE_SC and rounded this calculation up in the intermediate form (since we can't necessarily exactly represent TICK_NSEC in usec.) But the scaling arithmetic is a (very slight) *over*approximation of the true value; that is, instead of dividing by (1 usec/ 1 jiffie), we effectively divided by (1 usec/1 jiffie)-epsilon (rounding down). This would normally be fine, but we want to round timeouts up, and we did so by adding 2^USEC_JIFFIE_SC - 1 before the shift; this would be fine if our division was exact, but dividing this by the slightly smaller factor was equivalent to adding just _over_ 1 to the final result (instead of just _under_ 1, as desired.) In particular, with HZ=1000, we consistently computed that 10000 usec was 11 jiffies; the same was true for any exact multiple of TICK_NSEC. We could possibly still round in the intermediate form, adding something less than 2^USEC_JIFFIE_SC - 1, but easier still is to convert usec->nsec, round in nanoseconds, and then convert using time*spec*_to_jiffies. This adds one constant multiplication, and is not observably slower in microbenchmarks on recent x86 hardware. Tested: the following program: int main() { struct itimerval zero = {{0, 0}, {0, 0}}; /* Initially set to 10 ms. */ struct itimerval initial = zero; initial.it_interval.tv_usec = 10000; setitimer(ITIMER_PROF, &initial, NULL); /* Save and restore several times. */ for (size_t i = 0; i < 10; ++i) { struct itimerval prev; setitimer(ITIMER_PROF, &zero, &prev); /* on old kernels, this goes up by TICK_USEC every iteration */ printf("previous value: %ld %ld %ld %ld\n", prev.it_interval.tv_sec, prev.it_interval.tv_usec, prev.it_value.tv_sec, prev.it_value.tv_usec); setitimer(ITIMER_PROF, &prev, NULL); } return 0; } Cc: stable@vger.kernel.org Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Paul Turner <pjt@google.com> Cc: Richard Cochran <richardcochran@gmail.com> Cc: Prarit Bhargava <prarit@redhat.com> Reviewed-by:
Paul Turner <pjt@google.com> Reported-by:
Aaron Jacobs <jacobsa@google.com> Signed-off-by:
Andrew Hunter <ahh@google.com> [jstultz: Tweaked to apply to 3.17-rc] Signed-off-by:
John Stultz <john.stultz@linaro.org> futex: Unlock hb->lock in futex_wait_requeue_pi() error path futex_wait_requeue_pi() calls futex_wait_setup(). If futex_wait_setup() succeeds it returns with hb->lock held and preemption disabled. Now the sanity check after this does: if (match_futex(&q.key, &key2)) { ret = -EINVAL; goto out_put_keys; } which releases the keys but does not release hb->lock. So we happily return to user space with hb->lock held and therefor preemption disabled. Unlock hb->lock before taking the exit route. Reported-by:
Dave "Trinity" Jones <davej@redhat.com> Signed-off-by:
Thomas Gleixner <tglx@linutronix.de> Reviewed-by:
Darren Hart <dvhart@linux.intel.com> Reviewed-by:
Davidlohr Bueso <dave@stgolabs.net> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/alpine.DEB.2.10.1409112318500.4178@nanosSigned-off-by:
Thomas Gleixner <tglx@linutronix.de> irqchip: gic-v3: Declare rdist as __percpu pointer to __iomem pointer The __percpu __iomem annotations on the rdist base are contradictory and confuse static checkers such as sparse. This patch fixes the anotations so that rdist is described as a __percpu pointer to an __iomem pointer. Cc: Jason Cooper <jason@lakedaemon.net> Cc: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by:
Will Deacon <will.deacon@arm.com> Acked-by:
Marc Zyngier <marc.zyngier@arm.com> Link: https://lkml.kernel.org/r/1409062410-25891-9-git-send-email-will.deacon@arm.comSigned-off-by:
Jason Cooper <jason@lakedaemon.net> irqchip: gic: Make gic_default_routable_irq_domain_ops static The internal irq domain ops for the GIC are not used directly anywhere else, so make them static. This gets rid of a sparse warning on the file. Cc: Jason Cooper <jason@lakedaemon.net> Cc: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by:
Will Deacon <will.deacon@arm.com> Acked-by:
Marc Zyngier <marc.zyngier@arm.com> Link: https://lkml.kernel.org/r/1409062410-25891-8-git-send-email-will.deacon@arm.comSigned-off-by:
Jason Cooper <jason@lakedaemon.net> irqchip: exynos-combiner: Fix compilation error on ARM64 The following compilation error occurs on 64-bit Exynos7 SoC: drivers/irqchip/exynos-combiner.c: In function ‘combiner_irq_domain_map’: drivers/irqchip/exynos-combiner.c:162:2: error: implicit declaration of function ‘set_irq_flags’ [-Werror=implicit-function-declaration] set_irq_flags(irq, IRQF_VALID | IRQF_PROBE); ^ drivers/irqchip/exynos-combiner.c:162:21: error: ‘IRQF_VALID’ undeclared (first use in this function) set_irq_flags(irq, IRQF_VALID | IRQF_PROBE); ^ drivers/irqchip/exynos-combiner.c:162:21: note: each undeclared identifier is reported only once for each function it appears in drivers/irqchip/exynos-combiner.c:162:34: error: ‘IRQF_PROBE’ undeclared (first use in this function) set_irq_flags(irq, IRQF_VALID | IRQF_PROBE); Fix the build error by including linux/interrupt.h. Signed-off-by:
Naveen Krishna Chatradhi <ch.naveen@samsung.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Jason Cooper <jason@lakedaemon.net> Cc: Sudeep Holla <sudeep.holla@arm.com> Link: https://lkml.kernel.org/r/1409722329-18309-1-git-send-email-ch.naveen@samsung.comSigned-off-by:
Jason Cooper <jason@lakedaemon.net> parisc: Wire up seccomp, getrandom and memfd_create syscalls With secure computing we only support the SECCOMP_MODE_STRICT mode for now. Signed-off-by:
Helge Deller <deller@gmx.de> parisc: dino: fix %d confusingly prefixed with 0x in format string Signed-off-by:
Hans Wennborg <hans@hanshq.net> Signed-off-by:
Helge Deller <deller@gmx.de> parisc: sys_hpux: NUL terminator is one past the end We allocate "len" number of chars so we should put the NUL at "len - 1" to avoid corrupting memory. Btw, strlen_user() is different from the normal strlen() function because it includes NUL terminator in the count. Signed-off-by:
Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by:
Helge Deller <deller@gmx.de> irqchip: crossbar: Off by one bugs in init My static checker complains that the ">" should be ">=" or else we go beyond the end of the cb->irq_map[] array on the next line. Signed-off-by:
Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by:
Jason Cooper <jason@lakedaemon.net> irqchip: gic-v3: Tag all low level accessors __maybe_unused This is only really needed for gic_write_sgi1r in the !SMP case since it is only referenced in the SMP initialisation code but it seems better to have these functions all next to each other and declared consistently. Signed-off-by:
Mark Brown <broonie@linaro.org> Link: https://lkml.kernel.org/r/1406748194-21094-1-git-send-email-broonie@kernel.orgSigned-off-by:
Jason Cooper <jason@lakedaemon.net> irqchip: gic-v3: Only define gic_peek_irq() when building SMP If building with CONFIG_SMP disbled (for example, with allnoconfig) then GCC complains that the static function gic_peek_irq() is defined but not used since the only reference is in the SMP initialisation code. Fix this by moving the function definition inside the ifdef. Signed-off-by:
Mark Brown <broonie@linaro.org> Acked-by:
Marc Zyngier <marc.zyngier@arm.com> Link: https://lkml.kernel.org/r/1406480224-24628-1-git-send-email-broonie@kernel.orgSigned-off-by:
Jason Cooper <jason@lakedaemon.net> (cherry picked from commit 9226b5b4 99d263d4) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
-
Al Viro authored
D_HASH{MASK,BITS} are used once each, both in the same function (d_hash()). At this point they are actively misguiding - they imply that values are compiler constants, which is no longer true. Signed-off-by:
Al Viro <viro@zeniv.linux.org.uk> (cherry picked from commit 482db906) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
-
Tejun Heo authored
While a queue is being destroyed, all the blkgs are destroyed and its ->root_blkg pointer is set to NULL. If someone else starts to drain while the queue is in this state, the following oops happens. NULL pointer dereference at 0000000000000028 IP: [<ffffffff8144e944>] blk_throtl_drain+0x84/0x230 PGD e4a1067 PUD b773067 PMD 0 Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC Modules linked in: cfq_iosched(-) [last unloaded: cfq_iosched] CPU: 1 PID: 537 Comm: bash Not tainted 3.16.0-rc3-work+ #2 Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 task: ffff88000e222250 ti: ffff88000efd4000 task.ti: ffff88000efd4000 RIP: 0010:[<ffffffff8144e944>] [<ffffffff8144e944>] blk_throtl_drain+0x84/0x230 RSP: 0018:ffff88000efd7bf0 EFLAGS: 00010046 RAX: 0000000000000000 RBX: ffff880015091450 RCX: 0000000000000001 RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000 RBP: ffff88000efd7c10 R08: 0000000000000000 R09: 0000000000000001 R10: ffff88000e222250 R11: 0000000000000000 R12: ffff880015091450 R13: ffff880015092e00 R14: ffff880015091d70 R15: ffff88001508fc28 FS: 00007f1332650740(0000) GS:ffff88001fa80000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: 0000000000000028 CR3: 0000000009446000 CR4: 00000000000006e0 Stack: ffffffff8144e8f6 ffff880015091450 0000000000000000 ffff880015091d80 ffff88000efd7c28 ffffffff8144ae2f ffff880015091450 ffff88000efd7c58 ffffffff81427641 ffff880015091450 ffffffff82401f00 ffff880015091450 Call Trace: [<ffffffff8144ae2f>] blkcg_drain_queue+0x1f/0x60 [<ffffffff81427641>] __blk_drain_queue+0x71/0x180 [<ffffffff81429b3e>] blk_queue_bypass_start+0x6e/0xb0 [<ffffffff814498b8>] blkcg_deactivate_policy+0x38/0x120 [<ffffffff8144ec44>] blk_throtl_exit+0x34/0x50 [<ffffffff8144aea5>] blkcg_exit_queue+0x35/0x40 [<ffffffff8142d476>] blk_release_queue+0x26/0xd0 [<ffffffff81454968>] kobject_cleanup+0x38/0x70 [<ffffffff81454848>] kobject_put+0x28/0x60 [<ffffffff81427505>] blk_put_queue+0x15/0x20 [<ffffffff817d07bb>] scsi_device_dev_release_usercontext+0x16b/0x1c0 [<ffffffff810bc339>] execute_in_process_context+0x89/0xa0 [<ffffffff817d064c>] scsi_device_dev_release+0x1c/0x20 [<ffffffff817930e2>] device_release+0x32/0xa0 [<ffffffff81454968>] kobject_cleanup+0x38/0x70 [<ffffffff81454848>] kobject_put+0x28/0x60 [<ffffffff817934d7>] put_device+0x17/0x20 [<ffffffff817d11b9>] __scsi_remove_device+0xa9/0xe0 [<ffffffff817d121b>] scsi_remove_device+0x2b/0x40 [<ffffffff817d1257>] sdev_store_delete+0x27/0x30 [<ffffffff81792ca8>] dev_attr_store+0x18/0x30 [<ffffffff8126f75e>] sysfs_kf_write+0x3e/0x50 [<ffffffff8126ea87>] kernfs_fop_write+0xe7/0x170 [<ffffffff811f5e9f>] vfs_write+0xaf/0x1d0 [<ffffffff811f69bd>] SyS_write+0x4d/0xc0 [<ffffffff81d24692>] system_call_fastpath+0x16/0x1b 776687bc ("block, blk-mq: draining can't be skipped even if bypass_depth was non-zero") made it easier to trigger this bug by making blk_queue_bypass_start() drain even when it loses the first bypass test to blk_cleanup_queue(); however, the bug has always been there even before the commit as blk_queue_bypass_start() could race against queue destruction, win the initial bypass test but perform the actual draining after blk_cleanup_queue() already destroyed all blkgs. Fix it by skippping calling into policy draining if all the blkgs are already gone. Signed-off-by:
Tejun Heo <tj@kernel.org> Reported-by:
Shirish Pargaonkar <spargaonkar@suse.com> Reported-by:
Sasha Levin <sasha.levin@oracle.com> Reported-by:
Jet Chen <jet.chen@intel.com> Cc: stable@vger.kernel.org Tested-by:
Shirish Pargaonkar <spargaonkar@suse.com> Signed-off-by:
Jens Axboe <axboe@fb.com> Revert "bio: modify __bio_add_page() to accept pages that don't start a new segment" This reverts commit 254c4407. It causes crashes with cryptsetup, even after a few iterations and updates. Drop it for now. blkcg: don't call into policy draining if root_blkg is already gone While a queue is being destroyed, all the blkgs are destroyed and its ->root_blkg pointer is set to NULL. If someone else starts to drain while the queue is in this state, the following oops happens. NULL pointer dereference at 0000000000000028 IP: [<ffffffff8144e944>] blk_throtl_drain+0x84/0x230 PGD e4a1067 PUD b773067 PMD 0 Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC Modules linked in: cfq_iosched(-) [last unloaded: cfq_iosched] CPU: 1 PID: 537 Comm: bash Not tainted 3.16.0-rc3-work+ #2 Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 task: ffff88000e222250 ti: ffff88000efd4000 task.ti: ffff88000efd4000 RIP: 0010:[<ffffffff8144e944>] [<ffffffff8144e944>] blk_throtl_drain+0x84/0x230 RSP: 0018:ffff88000efd7bf0 EFLAGS: 00010046 RAX: 0000000000000000 RBX: ffff880015091450 RCX: 0000000000000001 RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000 RBP: ffff88000efd7c10 R08: 0000000000000000 R09: 0000000000000001 R10: ffff88000e222250 R11: 0000000000000000 R12: ffff880015091450 R13: ffff880015092e00 R14: ffff880015091d70 R15: ffff88001508fc28 FS: 00007f1332650740(0000) GS:ffff88001fa80000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: 0000000000000028 CR3: 0000000009446000 CR4: 00000000000006e0 Stack: ffffffff8144e8f6 ffff880015091450 0000000000000000 ffff880015091d80 ffff88000efd7c28 ffffffff8144ae2f ffff880015091450 ffff88000efd7c58 ffffffff81427641 ffff880015091450 ffffffff82401f00 ffff880015091450 Call Trace: [<ffffffff8144ae2f>] blkcg_drain_queue+0x1f/0x60 [<ffffffff81427641>] __blk_drain_queue+0x71/0x180 [<ffffffff81429b3e>] blk_queue_bypass_start+0x6e/0xb0 [<ffffffff814498b8>] blkcg_deactivate_policy+0x38/0x120 [<ffffffff8144ec44>] blk_throtl_exit+0x34/0x50 [<ffffffff8144aea5>] blkcg_exit_queue+0x35/0x40 [<ffffffff8142d476>] blk_release_queue+0x26/0xd0 [<ffffffff81454968>] kobject_cleanup+0x38/0x70 [<ffffffff81454848>] kobject_put+0x28/0x60 [<ffffffff81427505>] blk_put_queue+0x15/0x20 [<ffffffff817d07bb>] scsi_device_dev_release_usercontext+0x16b/0x1c0 [<ffffffff810bc339>] execute_in_process_context+0x89/0xa0 [<ffffffff817d064c>] scsi_device_dev_release+0x1c/0x20 [<ffffffff817930e2>] device_release+0x32/0xa0 [<ffffffff81454968>] kobject_cleanup+0x38/0x70 [<ffffffff81454848>] kobject_put+0x28/0x60 [<ffffffff817934d7>] put_device+0x17/0x20 [<ffffffff817d11b9>] __scsi_remove_device+0xa9/0xe0 [<ffffffff817d121b>] scsi_remove_device+0x2b/0x40 [<ffffffff817d1257>] sdev_store_delete+0x27/0x30 [<ffffffff81792ca8>] dev_attr_store+0x18/0x30 [<ffffffff8126f75e>] sysfs_kf_write+0x3e/0x50 [<ffffffff8126ea87>] kernfs_fop_write+0xe7/0x170 [<ffffffff811f5e9f>] vfs_write+0xaf/0x1d0 [<ffffffff811f69bd>] SyS_write+0x4d/0xc0 [<ffffffff81d24692>] system_call_fastpath+0x16/0x1b 776687bc ("block, blk-mq: draining can't be skipped even if bypass_depth was non-zero") made it easier to trigger this bug by making blk_queue_bypass_start() drain even when it loses the first bypass test to blk_cleanup_queue(); however, the bug has always been there even before the commit as blk_queue_bypass_start() could race against queue destruction, win the initial bypass test but perform the actual draining after blk_cleanup_queue() already destroyed all blkgs. Fix it by skippping calling into policy draining if all the blkgs are already gone. Signed-off-by:
Tejun Heo <tj@kernel.org> Reported-by:
Shirish Pargaonkar <spargaonkar@suse.com> Reported-by:
Sasha Levin <sasha.levin@oracle.com> Reported-by:
Jet Chen <jet.chen@intel.com> Cc: stable@vger.kernel.org Tested-by:
Shirish Pargaonkar <spargaonkar@suse.com> Signed-off-by:
Jens Axboe <axboe@fb.com> bio: modify __bio_add_page() to accept pages that don't start a new segment The original behaviour is to refuse to add a new page if the maximum number of segments has been reached, regardless of the fact the page we are going to add can be merged into the last segment or not. Unfortunately, when the system runs under heavy memory fragmentation conditions, a driver may try to add multiple pages to the last segment. The original code won't accept them and EBUSY will be reported to userspace. This patch modifies the function so it refuses to add a page only in case the latter starts a new segment and the maximum number of segments has already been reached. The bug can be easily reproduced with the st driver: 1) set CONFIG_SCSI_MPT2SAS_MAX_SGE or CONFIG_SCSI_MPT3SAS_MAX_SGE to 16 2) modprobe st buffer_kbs=1024 3) #dd if=/dev/zero of=/dev/st0 bs=1M count=10 dd: error writing `/dev/st0': Device or resource busy [ming.lei@canonical.com: update bi_iter.bi_size before recounting segments] Signed-off-by:
Maurizio Lombardi <mlombard@redhat.com> Signed-off-by:
Ming Lei <ming.lei@canonical.com> Tested-by:
Dongsu Park <dongsu.park@profitbricks.com> Tested-by:
Jet Chen <jet.chen@intel.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christoph Hellwig <hch@lst.de> Cc: Kent Overstreet <kmo@daterainc.com> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Jens Axboe <axboe@fb.com> block: fix SG_[GS]ET_RESERVED_SIZE ioctl when max_sectors is huge SG_GET_RESERVED_SIZE and SG_SET_RESERVED_SIZE ioctls access a reserved buffer in bytes as int type. The value needs to be capped at the request queue's max_sectors. But integer overflow is not correctly handled in the calculation when converting max_sectors from sectors to bytes. Signed-off-by:
Akinobu Mita <akinobu.mita@gmail.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: "James E.J. Bottomley" <JBottomley@parallels.com> Cc: Douglas Gilbert <dgilbert@interlog.com> Cc: linux-scsi@vger.kernel.org Reviewed-by:
Christoph Hellwig <hch@lst.de> Signed-off-by:
Jens Axboe <axboe@fb.com> block: fix BLKSECTGET ioctl when max_sectors is greater than USHRT_MAX BLKSECTGET ioctl loads the request queue's max_sectors as unsigned short value to the argument pointer. So if the max_sector is greater than USHRT_MAX, the upper 16 bits of that is just discarded. In such case, USHRT_MAX is more preferable than the lower 16 bits of max_sectors. Signed-off-by:
Akinobu Mita <akinobu.mita@gmail.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: "James E.J. Bottomley" <JBottomley@parallels.com> Cc: Douglas Gilbert <dgilbert@interlog.com> Cc: linux-scsi@vger.kernel.org Reviewed-by:
Christoph Hellwig <hch@lst.de> Signed-off-by:
Jens Axboe <axboe@fb.com> block/partitions/efi.c: kerneldoc fixing Adding function documentation and fixing kerneldoc warnings ('field: description' uniformization). Cc: Davidlohr Bueso <davidlohr@hp.com> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by:
Fabian Frederick <fabf@skynet.be> Signed-off-by:
Jens Axboe <axboe@fb.com> block/partitions/msdos.c: code clean-up checkpatch fixing: WARNING: Missing a blank line after declarations WARNING: space prohibited between function name and open parenthesis '(' ERROR: spaces required around that '<' (ctx:VxV) Cc: Jens Axboe <axboe@kernel.dk> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Fabian Frederick <fabf@skynet.be> Signed-off-by:
Jens Axboe <axboe@fb.com> block/partitions/amiga.c: replace nolevel printk by pr_err Also add no prefix pr_fmt to avoid any future default format update Cc: Jens Axboe <axboe@kernel.dk> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Fabian Frederick <fabf@skynet.be> Signed-off-by:
Jens Axboe <axboe@fb.com> block/partitions/aix.c: replace count*size kzalloc by kcalloc kcalloc manages count*sizeof overflow. Cc: Jens Axboe <axboe@kernel.dk> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Fabian Frederick <fabf@skynet.be> Signed-off-by:
Jens Axboe <axboe@fb.com> bio-integrity: add "bip_max_vcnt" into struct bio_integrity_payload Commit 08778795 ("block: Fix nr_vecs for inline integrity vectors") from Martin introduces the function bip_integrity_vecs(get the useful vectors) to fix the issue about nr_vecs for inline integrity vectors that reported by David Milburn. But it seems that bip_integrity_vecs() will return the wrong number if the bio is not based on any bio_set for some reason(bio->bi_pool == NULL), because in that case, the bip_inline_vecs[0] is malloced directly. So here we add the bip_max_vcnt to record the count of vector slots, and cleanup the function bip_integrity_vecs(). Signed-off-by:
Gu Zheng <guz.fnst@cn.fujitsu.com> Cc: Martin K. Petersen <martin.petersen@oracle.com> Cc: Kent Overstreet <kmo@daterainc.com> Signed-off-by:
Jens Axboe <axboe@fb.com> blk-mq: use percpu_ref for mq usage count Currently, blk-mq uses a percpu_counter to keep track of how many usages are in flight. The percpu_counter is drained while freezing to ensure that no usage is left in-flight after freezing is complete. blk_mq_queue_enter/exit() and blk_mq_[un]freeze_queue() implement this per-cpu gating mechanism. This type of code has relatively high chance of subtle bugs which are extremely difficult to trigger and it's way too hairy to be open coded in blk-mq. percpu_ref can serve the same purpose after the recent changes. This patch replaces the open-coded per-cpu usage counting and draining mechanism with percpu_ref. blk_mq_queue_enter() performs tryget_live on the ref and exit() performs put. blk_mq_freeze_queue() kills the ref and waits until the reference count reaches zero. blk_mq_unfreeze_queue() revives the ref and wakes up the waiters. Signed-off-by:
Tejun Heo <tj@kernel.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: Nicholas A. Bellinger <nab@linux-iscsi.org> Cc: Kent Overstreet <kmo@daterainc.com> Signed-off-by:
Jens Axboe <axboe@fb.com> blk-mq: collapse __blk_mq_drain_queue() into blk_mq_freeze_queue() Keeping __blk_mq_drain_queue() as a separate function doesn't buy us anything and it's gonna be further simplified. Let's flatten it into its caller. This patch doesn't make any functional change. Signed-off-by:
Tejun Heo <tj@kernel.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: Nicholas A. Bellinger <nab@linux-iscsi.org> Signed-off-by:
Jens Axboe <axboe@fb.com> blk-mq: decouble blk-mq freezing from generic bypassing blk_mq freezing is entangled with generic bypassing which bypasses blkcg and io scheduler and lets IO requests fall through the block layer to the drivers in FIFO order. This allows forward progress on IOs with the advanced features disabled so that those features can be configured or altered without worrying about stalling IO which may lead to deadlock through memory allocation. However, generic bypassing doesn't quite fit blk-mq. blk-mq currently doesn't make use of blkcg or ioscheds and it maps bypssing to freezing, which blocks request processing and drains all the in-flight ones. This causes problems as bypassing assumes that request processing is online. blk-mq works around this by conditionally allowing request processing for the problem case - during queue initialization. Another weirdity is that except for during queue cleanup, bypassing started on the generic side prevents blk-mq from processing new requests but doesn't drain the in-flight ones. This shouldn't break anything but again highlights that something isn't quite right here. The root cause is conflating blk-mq freezing and generic bypassing which are two different mechanisms. The only intersecting purpose that they serve is during queue cleanup. Let's properly separate blk-mq freezing from generic bypassing and simply use it where necessary. * request_queue->mq_freeze_depth is added and blk_mq_[un]freeze_queue() now operate on this counter instead of ->bypass_depth. The replacement for QUEUE_FLAG_BYPASS isn't added but the counter is tested directly. This will be further updated by later changes. * blk_mq_drain_queue() is dropped and "__" prefix is dropped from blk_mq_freeze_queue(). Queue cleanup path now calls blk_mq_freeze_queue() directly. * blk_queue_enter()'s fast path condition is simplified to simply check @q->mq_freeze_depth. Previously, the condition was !blk_queue_dying(q) && (!blk_queue_bypass(q) || !blk_queue_init_done(q)) mq_freeze_depth is incremented right after dying is set and blk_queue_init_done() exception isn't necessary as blk-mq doesn't start frozen, which only leaves the blk_queue_bypass() test which can be replaced by @q->mq_freeze_depth test. This change simplifies the code and reduces confusion in the area. Signed-off-by:
Tejun Heo <tj@kernel.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: Nicholas A. Bellinger <nab@linux-iscsi.org> Signed-off-by:
Jens Axboe <axboe@fb.com> block, blk-mq: draining can't be skipped even if bypass_depth was non-zero Currently, both blk_queue_bypass_start() and blk_mq_freeze_queue() skip queue draining if bypass_depth was already above zero. The assumption is that the one which bumped the bypass_depth should have performed draining already; however, there's nothing which prevents a new instance of bypassing/freezing from starting before the previous one finishes draining. The current code may allow the later bypassing/freezing instances to complete while there still are in-flight requests which haven't finished draining. Fix it by draining regardless of bypass_depth. We still skip draining from blk_queue_bypass_start() while the queue is initializing to avoid introducing excessive delays during boot. INIT_DONE setting is moved above the initial blk_queue_bypass_end() so that bypassing attempts can't slip inbetween. Signed-off-by:
Tejun Heo <tj@kernel.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: Nicholas A. Bellinger <nab@linux-iscsi.org> Signed-off-by:
Jens Axboe <axboe@fb.com> blk-mq: fix a memory ordering bug in blk_mq_queue_enter() blk-mq uses a percpu_counter to keep track of how many usages are in flight. The percpu_counter is drained while freezing to ensure that no usage is left in-flight after freezing is complete. blk_mq_queue_enter/exit() and blk_mq_[un]freeze_queue() implement this per-cpu gating mechanism; unfortunately, it contains a subtle bug - smp_wmb() in blk_mq_queue_enter() doesn't prevent prevent the cpu from fetching @q->bypass_depth before incrementing @q->mq_usage_counter and if freezing happens inbetween the caller can slip through and freezing can be complete while there are active users. Use smp_mb() instead so that bypass_depth and mq_usage_counter modifications and tests are properly interlocked. Signed-off-by:
Tejun Heo <tj@kernel.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: Nicholas A. Bellinger <nab@linux-iscsi.org> Signed-off-by:
Jens Axboe <axboe@fb.com> Merge branch 'for-3.17' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu into for-3.17/core Merge the percpu_ref changes from Tejun, he says they are stable now. percpu-refcount: implement percpu_ref_reinit() and percpu_ref_is_zero() Now that explicit invocation of percpu_ref_exit() is necessary to free the percpu counter, we can implement percpu_ref_reinit() which reinitializes a released percpu_ref. This can be used implement scalable gating switch which can be drained and then re-opened without worrying about memory allocation failures. percpu_ref_is_zero() is added to be used in a sanity check in percpu_ref_exit(). As this function will be useful for other purposes too, make it a public interface. v2: Use smp_read_barrier_depends() instead of smp_load_acquire(). We only need data dep barrier and smp_load_acquire() is stronger and heavier on some archs. Spotted by Lai Jiangshan. Signed-off-by:
Tejun Heo <tj@kernel.org> Cc: Kent Overstreet <kmo@daterainc.com> Cc: Christoph Lameter <cl@linux-foundation.org> Cc: Lai Jiangshan <laijs@cn.fujitsu.com> percpu-refcount: require percpu_ref to be exited explicitly Currently, a percpu_ref undoes percpu_ref_init() automatically by freeing the allocated percpu area when the percpu_ref is killed. While seemingly convenient, this has the following niggles. * It's impossible to re-init a released reference counter without going through re-allocation. * In the similar vein, it's impossible to initialize a percpu_ref count with static percpu variables. * We need and have an explicit destructor anyway for failure paths - percpu_ref_cancel_init(). This patch removes the automatic percpu counter freeing in percpu_ref_kill_rcu() and repurposes percpu_ref_cancel_init() into a generic destructor now named percpu_ref_exit(). percpu_ref_destroy() is considered but it gets confusing with percpu_ref_kill() while "exit" clearly indicates that it's the counterpart of percpu_ref_init(). All percpu_ref_cancel_init() users are updated to invoke percpu_ref_exit() instead and explicit percpu_ref_exit() calls are added to the destruction path of all percpu_ref users. Signed-off-by:
Tejun Heo <tj@kernel.org> Acked-by:
Benjamin LaHaise <bcrl@kvack.org> Cc: Kent Overstreet <kmo@daterainc.com> Cc: Christoph Lameter <cl@linux-foundation.org> Cc: Benjamin LaHaise <bcrl@kvack.org> Cc: Nicholas A. Bellinger <nab@linux-iscsi.org> Cc: Li Zefan <lizefan@huawei.com> percpu-refcount: use unsigned long for pcpu_count pointer percpu_ref->pcpu_count is a percpu pointer with a status flag in its lowest bit. As such, it always goes through arithmetic operations which is very cumbersome to do on a pointer. It has to be first casted to unsigned long and then back. Let's just make the field unsigned long so that we can skip the first casts. While at it, rename it to pcpu_counter_ptr to clarify that it's a pointer value. Signed-off-by:
Tejun Heo <tj@kernel.org> Cc: Kent Overstreet <kmo@daterainc.com> Cc: Christoph Lameter <cl@linux-foundation.org> percpu-refcount: add helpers for ->percpu_count accesses * All four percpu_ref_*() operations implemented in the header file perform the same operation to determine whether the percpu_ref is alive and extract the percpu pointer. Factor out the common logic into __pcpu_ref_alive(). This doesn't change the generated code. * There are a couple places in percpu-refcount.c which masks out PCPU_REF_DEAD to obtain the percpu pointer. Factor it out into pcpu_count_ptr(). * The above changes make the WARN_ON_ONCE() conditional at the top of percpu_ref_kill_and_confirm() the only user of REF_STATUS(). Test PCPU_REF_DEAD directly and remove REF_STATUS(). This patch doesn't introduce any functional change. Signed-off-by:
Tejun Heo <tj@kernel.org> Cc: Kent Overstreet <kmo@daterainc.com> Cc: Christoph Lameter <cl@linux-foundation.org> percpu-refcount: one bit is enough for REF_STATUS percpu-refcount currently reserves two lowest bits of its percpu pointer to indicate its state; however, only one bit is used for PCPU_REF_DEAD. Simplify it by removing PCPU_STATUS_BITS/MASK and testing PCPU_REF_DEAD directly. This also allows the compiler to choose a more efficient instruction depending on the architecture. Signed-off-by:
Tejun Heo <tj@kernel.org> Cc: Kent Overstreet <kmo@daterainc.com> Cc: Christoph Lameter <cl@linux-foundation.org> percpu-refcount, aio: use percpu_ref_cancel_init() in ioctx_alloc() ioctx_alloc() reaches inside percpu_ref and directly frees ->pcpu_count in its failure path, which is quite gross. percpu_ref has been providing a proper interface to do this, percpu_ref_cancel_init(), for quite some time now. Let's use that instead. This patch doesn't introduce any behavior changes. Signed-off-by:
Tejun Heo <tj@kernel.org> Acked-by:
Benjamin LaHaise <bcrl@kvack.org> Cc: Kent Overstreet <kmo@daterainc.com> workqueue: stronger test in process_one_work() After the recent changes, when POOL_DISASSOCIATED is cleared, the running worker's local CPU should be the same as pool->cpu without any exception even during cpu-hotplug. Update the sanity check in process_one_work() accordingly. This patch changes "(proposition_A && proposition_B && proposition_C)" to "(proposition_B && proposition_C)", so if the old compound proposition is true, the new one must be true too. so this will not hide any possible bug which can be caught by the old test. tj: Minor updates to the description. CC: Jason J. Herne <jjherne@linux.vnet.ibm.com> CC: Sasha Levin <sasha.levin@oracle.com> Signed-off-by:
Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by:
Tejun Heo <tj@kernel.org> workqueue: clear POOL_DISASSOCIATED in rebind_workers() The commit a9ab775b ("workqueue: directly restore CPU affinity of workers from CPU_ONLINE") moved the pool->lock into rebind_workers() without also moving "pool->flags &= ~POOL_DISASSOCIATED". There is nothing wrong with "pool->flags &= ~POOL_DISASSOCIATED" not being moved together, but there isn't any benefit either. We move it into rebind_workers() and achieve these benefits: 1) Better readability. POOL_DISASSOCIATED is cleared in rebind_workers() as expected. 2) When POOL_DISASSOCIATED is cleared, we can ensure that all the running workers of the pool are on the local CPU (pool->cpu). tj: Cosmetic updates to the code and description. Signed-off-by:
Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by:
Tejun Heo <tj@kernel.org> percpu: Use ALIGN macro instead of hand coding alignment calculation Signed-off-by:
Christoph Lameter <cl@linux.com> Signed-off-by:
Tejun Heo <tj@kernel.org> percpu: invoke __verify_pcpu_ptr() from the generic part of accessors and operations __verify_pcpu_ptr() is used to verify that a specified parameter is actually an percpu pointer by percpu accessor and operation implementations. Currently, where it's called isn't clearly defined and we just ensure that it's invoked at least once for all accessors and operations. The lack of clarity on when it should be called isn't nice and given that this is a completely generic issue, there's no reason to make archs worry about it. This patch updates __verify_pcpu_ptr() invocations such that it's always invoked from the final generic wrapper once per access or operation. As this is already the case for {raw|this}_cpu_*() definitions through __pcpu_size_*(), only the {raw|per|this}_cpu_ptr() accessors need to be updated. This change makes it unnecessary for archs to worry about __verify_pcpu_ptr(). x86's arch_raw_cpu_ptr() is updated accordingly. Signed-off-by:
Tejun Heo <tj@kernel.org> Cc: Christoph Lameter <cl@linux-foundation.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> percpu: preffity percpu header files percpu macros are difficult to read. It's partly because they're fairly complex but also because they simply lack visual and conventional consistency to an unusual degree. The preceding patches tried to organize macro definitions consistently by their roles. This patch makes the following cosmetic changes to improve overall readability. * Use consistent convention for multi-line macro definitions - "do {" or "({" are now put on their own lines and the line continuing '\' are all put on the same column. * Temp variables used inside macro are consistently given "__" prefix. * When a macro argument is passed to another macro or a function, putting extra parenthses around it doesn't help anything. Don't put them. * _this_cpu_generic_*() are renamed to this_cpu_generic_*() so that they're consistent with raw_cpu_generic_*(). * Reorganize raw_cpu_*() and this_cpu_*() definitions so that trivial wrappers are collected in one place after actual operation definitions. * Other misc cleanups including reorganizing comments. All changes in this patch are cosmetic and cause no functional difference. Signed-off-by:
Tejun Heo <tj@kernel.org> Acked-by:
Christoph Lameter <cl@linux.com> percpu: use raw_cpu_*() to define __this_cpu_*() __this_cpu_*() operations are the same as raw_cpu_*() operations except for the added __this_cpu_preempt_check(). Curiously, these were defined using __pcu_size_call_*() instead of being layered on top of raw_cpu_*(). Let's layer them so that __this_cpu_*() are defined in terms of raw_cpu_*(). It's simpler and less error-prone this way. This patch doesn't introduce any functional difference. Signed-off-by:
Tejun Heo <tj@kernel.org> Acked-by:
Christoph Lameter <cl@linux.com> percpu: reorder macros in percpu header files * In include/asm-generic/percpu.h, collect {raw|_this}_cpu_generic*() macros into one place. They were dispersed through {raw|this}_cpu_*_N() definitions and the visiual inconsistency was making following the code unnecessarily difficult. * In include/linux/percpu-defs.h, move __verify_pcpu_ptr() later in the file so that it's right above accessor definitions where it's actually used. This is pure reorganization. Signed-off-by:
Tejun Heo <tj@kernel.org> Acked-by:
Christoph Lameter <cl@linux.com> percpu: move {raw|this}_cpu_*() definitions to include/linux/percpu-defs.h We're in the process of moving all percpu accessors and operations to include/linux/percpu-defs.h so that they're available to arch headers without having to include full include/linux/percpu.h which may cause cyclic inclusion dependency. This patch moves {raw|this}_cpu_*() definitions from include/linux/percpu.h to include/linux/percpu-defs.h. The code is moved mostly verbatim; however, raw_cpu_*() are placed above this_cpu_*() which is more conventional as the raw operations may be used to defined other variants. This is pure reorganization. Signed-off-by:
Tejun Heo <tj@kernel.org> Acked-by:
Christoph Lameter <cl@linux.com> percpu: move generic {raw|this}_cpu_*_N() definitions to include/asm-generic/percpu.h {raw|this}_cpu_*_N() operations are expected to be provided by archs and the generic definitions are provided as fallbacks. As such, these firmly belong to include/asm-generic/percpu.h. Move the generic definitions to include/asm-generic/percpu.h. The code is moved mostly verbatim; however, raw_cpu_*_N() are placed above this_cpu_*_N() which is more conventional as the raw operations may be used to defined other variants. This is pure reorganization. Signed-off-by:
Tejun Heo <tj@kernel.org> Acked-by:
Christoph Lameter <cl@linux.com> percpu: only allow sized arch overrides for {raw|this}_cpu_*() ops Currently, percpu allows two separate methods for overriding {raw|this}_cpu_*() ops - for a given operation, an arch can provide whole replacement or sized sub operations to override specific parts of it. e.g. arch either can provide this_cpu_add() or this_cpu_add_4() to override only the 4 byte operation. While quite flexible on a glance, the dual-overriding scheme complicates the code path for no actual gain. It compilcates the already complex operation definitions and if an arch wants to override all sizes, it can easily provide all variants anyway. In fact, no arch is actually making use of whole operation override. Another oddity is that __this_cpu_*() operations are defined in the same way as raw_cpu_*() but ignores full overrides of the raw_cpu_*() and doesn't allow full operation override, so if an arch provides whole overrides for raw_cpu_*() operations __this_cpu_*() ends up using the generic implementations. More importantly, it takes away the layering between arch-specific and generic parts making it impossible for the generic part to implement arch-independent features on top of arch-specific overrides. This patch removes the support for whole operation overrides. As no arch is using it, this doesn't cause any actual difference. Signed-off-by:
Tejun Heo <tj@kernel.org> Acked-by:
Christoph Lameter <cl@linux.com> percpu: reorganize include/linux/percpu-defs.h Reorganize for better readability. * Accessor definitions are collected into one place and SMP and UP now define them in the same order. * Definitions are layered when possible - e.g. per_cpu() is now defined in terms of this_cpu_ptr(). * Rather pointless comment dropped. * per_cpu(), __raw_get_cpu_var() and __get_cpu_var() are defined in a way which can be shared between SMP and UP and moved out of CONFIG_SMP blocks. This patch doesn't introduce any functional difference. Signed-off-by:
Tejun Heo <tj@kernel.org> Cc: Christoph Lameter <cl@linux-foundation.org> percpu: move accessors from include/linux/percpu.h to percpu-defs.h include/linux/percpu-defs.h is gonna host all accessors and operations so that arch headers can make use of them too without worrying about circular dependency through include/linux/percpu.h. This patch moves the following accessors from include/linux/percpu.h to include/linux/percpu-defs.h. * get/put_cpu_var() * get/put_cpu_ptr() * per_cpu_ptr() This is pure reorgniazation. Signed-off-by:
Tejun Heo <tj@kernel.org> Acked-by:
Christoph Lameter <cl@linux.com> percpu: include/asm-generic/percpu.h should contain only arch-overridable parts The roles of the various percpu header files has become unclear. There are four header files involved. include/linux/percpu-defs.h include/linux/percpu.h include/asm-generic/percpu.h arch/*/include/asm/percpu.h The original intention for include/asm-generic/percpu.h is providing generic definitions for arch-overridable parts; however, it now hosts various stuff which can't be overridden by archs. Also, include/linux/percpu-defs.h was initially added to contain section and percpu variable definition macros so that arch header files can make use of them without worrying about introducing cyclic inclusion dependency by including include/linux/percpu.h; however, arch headers sometimes need to access percpu variables too and this is one of the reasons why some accessors were implemented in include/linux/asm-generic/percpu.h. Let's clear up the situation by making include/asm-generic/percpu.h contain only arch-overridable parts and moving accessors and operations into include/linux/percpu-defs. Note that this patch only moves things from include/asm-generic/percpu.h. include/linux/percpu.h will be taken care of by later patches. This patch moves the followings. * SHIFT_PERCPU_PTR() / VERIFY_PERCPU_PTR() * per_cpu() * raw_cpu_ptr() * this_cpu_ptr() * __get_cpu_var() * __raw_get_cpu_var() * __this_cpu_ptr() * PER_CPU_[SHARED_]ALIGNED_SECTION * PER_CPU_[SHARED_]ALIGNED_SECTION * PER_CPU_FIRST_SECTION This patch is pure reorganization. Signed-off-by:
Tejun Heo <tj@kernel.org> Acked-by:
Christoph Lameter <cl@linux.com> percpu: introduce arch_raw_cpu_ptr() Currently, archs can override raw_cpu_ptr() directly; however, we wanna build a layer of indirection in the generic part of percpu so that we can implement generic features there without affecting archs. Introduce arch_raw_cpu_ptr() which is used to define raw_cpu_ptr() by generic percpu code. The two are identical for now. x86 is currently the only arch which overrides raw_cpu_ptr() and is converted to define arch_raw_cpu_ptr() instead. This doesn't introduce any functional difference. Signed-off-by:
Tejun Heo <tj@kernel.org> Cc: Christoph Lameter <cl@linux-foundation.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> percpu: disallow archs from overriding SHIFT_PERCPU_PTR() It has been about half a decade since all archs started using the dynamic percpu allocator and thus the same SHIFT_PERCPU_PTR() implementation. There's no benefit in overriding SHIFT_PERCPU_PTR() anymore. Remove #ifndef around it to clarify that this is identical regardless of the arch. This patch doesn't cause any functional difference. Signed-off-by:
Tejun Heo <tj@kernel.org> Acked-by:
Christoph Lameter <cl@linux.com> (cherry picked from commit 2a1b4cf2 0b462c89) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
-
NeilBrown authored
Currently we don't abort recovery on a write error if the write error to the recovering device was triggerd by normal IO (as opposed to recovery IO). This means that for one bitmap region, the recovery might write to the recovering device for a few sectors, then not bother for subsequent sectors (as it never writes to failed devices). In this case the bitmap bit will be cleared, but it really shouldn't. The result is that if the recovering device fails and is then re-added (after fixing whatever hardware problem triggerred the failure), the second recovery won't redo the region it was in the middle of, so some of the device will not be recovered properly. If we abort the recovery, the region being processes will be cancelled (bit not cleared) and the whole region will be retried. As the bug can result in data corruption the patch is suitable for -stable. For kernels prior to 3.11 there is a conflict in raid10.c which will require care. Original-from: jiao hui <jiaohui@bwstor.com.cn> Reported-and-tested-by:
jiao hui <jiaohui@bwstor.com.cn> Signed-off-by:
NeilBrown <neilb@suse.de> Cc: stable@vger.kernel.org (cherry picked from commit 2446dba0) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
-
Eric W. Biederman authored
While invesgiating the issue where in "mount --bind -oremount,ro ..." would result in later "mount --bind -oremount,rw" succeeding even if the mount started off locked I realized that there are several additional mount flags that should be locked and are not. In particular MNT_NOSUID, MNT_NODEV, MNT_NOEXEC, and the atime flags in addition to MNT_READONLY should all be locked. These flags are all per superblock, can all be changed with MS_BIND, and should not be changable if set by a more privileged user. The following additions to the current logic are added in this patch. - nosuid may not be clearable by a less privileged user. - nodev may not be clearable by a less privielged user. - noexec may not be clearable by a less privileged user. - atime flags may not be changeable by a less privileged user. The logic with atime is that always setting atime on access is a global policy and backup software and auditing software could break if atime bits are not updated (when they are configured to be updated), and serious performance degradation could result (DOS attack) if atime updates happen when they have been explicitly disabled. Therefore an unprivileged user should not be able to mess with the atime bits set by a more privileged user. The additional restrictions are implemented with the addition of MNT_LOCK_NOSUID, MNT_LOCK_NODEV, MNT_LOCK_NOEXEC, and MNT_LOCK_ATIME mnt flags. Taken together these changes and the fixes for MNT_LOCK_READONLY should make it safe for an unprivileged user to create a user namespace and to call "mount --bind -o remount,... ..." without the danger of mount flags being changed maliciously. Cc: stable@vger.kernel.org Acked-by:
Serge E. Hallyn <serge.hallyn@ubuntu.com> Signed-off-by:
"Eric W. Biederman" <ebiederm@xmission.com> (cherry picked from commit 9566d674) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
-
Eric W. Biederman authored
Kenton Varda <kenton@sandstorm.io> discovered that by remounting a read-only bind mount read-only in a user namespace the MNT_LOCK_READONLY bit would be cleared, allowing an unprivileged user to the remount a read-only mount read-write. Correct this by replacing the mask of mount flags to preserve with a mask of mount flags that may be changed, and preserve all others. This ensures that any future bugs with this mask and remount will fail in an easy to detect way where new mount flags simply won't change. Cc: stable@vger.kernel.org Acked-by:
Serge E. Hallyn <serge.hallyn@ubuntu.com> Signed-off-by:
"Eric W. Biederman" <ebiederm@xmission.com> (cherry picked from commit a6138db8) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
-
Ralf Baechle authored
This fixes the following issue BUG: using smp_processor_id() in preemptible [00000000] code: kjournald/1761 caller is blast_dcache32+0x30/0x254 Call Trace: [<8047f02c>] dump_stack+0x8/0x34 [<802e7e40>] debug_smp_processor_id+0xe0/0xf0 [<80114d94>] blast_dcache32+0x30/0x254 [<80118484>] r4k_dma_cache_wback_inv+0x200/0x288 [<80110ff0>] mips_dma_map_sg+0x108/0x180 [<80355098>] ide_dma_prepare+0xf0/0x1b8 [<8034eaa4>] do_rw_taskfile+0x1e8/0x33c [<8035951c>] ide_do_rw_disk+0x298/0x3e4 [<8034a3c4>] do_ide_request+0x2e0/0x704 [<802bb0dc>] __blk_run_queue+0x44/0x64 [<802be000>] queue_unplugged.isra.36+0x1c/0x54 [<802beb94>] blk_flush_plug_list+0x18c/0x24c [<802bec6c>] blk_finish_plug+0x18/0x48 [<8026554c>] journal_commit_transaction+0x3b8/0x151c [<80269648>] kjournald+0xec/0x238 [<8014ac00>] kthread+0xb8/0xc0 [<8010268c>] ret_from_kernel_thread+0x14/0x1c Caches in most systems are identical - but not always, so we can't avoid the use of smp_call_function() by just looking at the boot CPU's data, have to fiddle with preemption instead. Signed-off-by:
Ralf Baechle <ralf@linux-mips.org> Cc: Markos Chandras <markos.chandras@imgtec.com> Cc: linux-mips@linux-mips.org Patchwork: https://patchwork.linux-mips.org/patch/5835 (cherry picked from commit ff522058) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
-
Aaro Koskinen authored
get_system_type() is not thread-safe on OCTEON. It uses static data, also more dangerous issue is that it's calling cvmx_fuse_read_byte() every time without any synchronization. Currently it's possible to get processes stuck looping forever in kernel simply by launching multiple readers of /proc/cpuinfo: (while true; do cat /proc/cpuinfo > /dev/null; done) & (while true; do cat /proc/cpuinfo > /dev/null; done) & ... Fix by initializing the system type string only once during the early boot. Signed-off-by:
Aaro Koskinen <aaro.koskinen@nsn.com> Cc: stable@vger.kernel.org Reviewed-by:
Markos Chandras <markos.chandras@imgtec.com> Patchwork: http://patchwork.linux-mips.org/patch/7437/Signed-off-by:
James Hogan <james.hogan@imgtec.com> MIPS: CPS: Initialize EVA before bringing up VPEs from secondary cores The CPS code is doing several memory loads when configuring the VPEs from secondary cores, so the segmentation control registers must be initialized in time otherwise the kernel will crash with strange TLB exceptions. Reviewed-by:
Paul Burton <paul.burton@imgtec.com> Signed-off-by:
Markos Chandras <markos.chandras@imgtec.com> Patchwork: http://patchwork.linux-mips.org/patch/7424/Signed-off-by:
James Hogan <james.hogan@imgtec.com> MIPS: Malta: EVA: Rename 'eva_entry' to 'platform_eva_init' Rename 'eva_entry' to 'platform_eva_init' as required by the new 'eva_init' macro in the eva.h header. Since this macro is now used in a platform dependent way, it must not depend on its caller so move the t1 register initialization inside this macro. Also set the .reorder assembler option in case the caller may have previously set .noreorder. This may allow a few assembler optimizations. Finally include missing headers and document the register usage for this macro. Reviewed-by:
Paul Burton <paul.burton@imgtec.com> Signed-off-by:
Markos Chandras <markos.chandras@imgtec.com> Patchwork: http://patchwork.linux-mips.org/patch/7423/Signed-off-by:
James Hogan <james.hogan@imgtec.com> MIPS: EVA: Add new EVA header Generic code may need to perform certain operations when EVA is enabled, for example, configure the segmentation registers during boot. In order to avoid using more CONFIG_EVA ifdefs in the arch code, such functions will be added in this header instead. Initially this header contains a macro which will be used by generic code later on during VPEs configuration on secondary cores. All it does is to call the platform specific EVA init code in case EVA is enabled. Reviewed-by:
Paul Burton <paul.burton@imgtec.com> Signed-off-by:
Markos Chandras <markos.chandras@imgtec.com> Patchwork: http://patchwork.linux-mips.org/patch/7422/Signed-off-by:
James Hogan <james.hogan@imgtec.com> MIPS: scall64-o32: Fix indirect syscall detection Commit 4c21b8fd (MIPS: seccomp: Handle indirect system calls (o32)) added indirect syscall detection for O32 processes running on MIPS64 but it did not work as expected. The reason is the the scall64-o32 implementation differs compared to scall32-o32. In the former, the v0 (syscall number) register contains the absolute syscall number (4000 + X) whereas in the latter it contains the relative syscall number (X). Fix the code to avoid doing an extra addition, and load the v0 register directly to the first argument for syscall_trace_enter. Moreover, set the .reorder assembler option in order to have better control on this part of the assembly code. Signed-off-by:
Markos Chandras <markos.chandras@imgtec.com> Patchwork: http://patchwork.linux-mips.org/patch/7481/ Cc: <stable@vger.kernel.org> # v3.15+ Signed-off-by:
James Hogan <james.hogan@imgtec.com> MIPS: syscall: Fix AUDIT value for O32 processes on MIPS64 On MIPS64, O32 processes set both TIF_32BIT_ADDR and TIF_32BIT_REGS so the previous condition treated O32 applications as N32 when evaluating seccomp filters. Fix the condition to check both TIF_32BIT_{REGS, ADDR} for the N32 AUDIT flag. Signed-off-by:
Markos Chandras <markos.chandras@imgtec.com> Patchwork: http://patchwork.linux-mips.org/patch/7480/ Cc: <stable@vger.kernel.org> # v3.15+ Signed-off-by:
James Hogan <james.hogan@imgtec.com> MIPS: Loongson: Fix COP2 usage for preemptible kernel In preemptible kernel, only TIF_USEDFPU flag is reliable to distinguish whether _init_fpu()/_restore_fp() is needed. Because the value of the CP0_Status.CU1 isn't changed during preemption. V2: Fix coding style. Signed-off-by:
Huacai Chen <chenhc@lemote.com> Cc: John Crispin <john@phrozen.org> Cc: Steven J. Hill <Steven.Hill@imgtec.com> Cc: Aurelien Jarno <aurelien@aurel32.net> Cc: linux-mips@linux-mips.org Cc: Fuxin Zhang <zhangfx@lemote.com> Cc: Zhangjin Wu <wuzhangjin@gmail.com> Patchwork: https://patchwork.linux-mips.org/patch/7515/Signed-off-by:
Ralf Baechle <ralf@linux-mips.org> MIPS: NL: Fix nlm_xlp_defconfig build error The nlm_xlp_defconfig build fails with ./arch/mips/include/asm/mach-netlogic/topology.h:15:0: error: "topology_core_id" redefined [-Werror] In file included from include/linux/smp.h:59:0, [ ...] from arch/mips/mm/dma-default.c:12: ./arch/mips/include/asm/smp.h:41:0: note: this is the location of the previous definition and similar errors. This is caused by commit bda4584c ("MIPS: Support CPU topology files in sysfs") which adds the defines to arch/mips/include/asm/smp.h. Remove the defines from arch/mips/include/asm/mach-netlogic/topology.h as no longer necessary. Signed-off-by:
Guenter Roeck <linux@roeck-us.net> Cc: Huacai Chen <chenhc@lemote.com> Cc: Andreas Herrmann <andreas.herrmann@caviumnetworks.com> Cc: linux-mips@linux-mips.org Cc: linux-kernel@vger.kernel.org Patchwork: https://patchwork.linux-mips.org/patch/7513/Signed-off-by:
Ralf Baechle <ralf@linux-mips.org> MIPS: Remove race window in page fault handling Multicore MIPSes without I/D hardware coherency suffered from a race condition in the page fault handler. The page table entry was published before any pending lazy D-cache flush was committed, hence it allowed execution of stale page cache data by other VPEs in the system. To make the cache handling safe we need to perform flushing already in the set_pte_at function. MIPSes without coherent I-caches can get a small increase in flushes due to the unavailability of the execute flag in set_pte_at. [ralf@linux-mips.org: outlining set_pte_at() saves a good k in a test build, so I moved its definition from pgtable.h to cache.c.] Signed-off-by:
Lars Persson <larper@axis.com> Cc: linux-mips@linux-mips.org Patchwork: https://patchwork.linux-mips.org/patch/7511/Signed-off-by:
Ralf Baechle <ralf@linux-mips.org> MIPS: Malta: Improve system memory detection for '{e, }memsize' >= 2G Using kstrtol to parse the "{e,}memsize" variables was wrong because this parses signed long numbers. In case of '{e,}memsize' >= 2G, the top bit is set, resulting to -ERANGE errors and possibly random system memory boundaries. We fix this by replacing "kstrtol" with "kstrtoul". We also improve the code to check the kstrtoul return value and print a warning if an error was returned. Signed-off-by:
Markos Chandras <markos.chandras@imgtec.com> Cc: <stable@vger.kernel.org> # v3.15+ Cc: linux-mips@linux-mips.org Patchwork: https://patchwork.linux-mips.org/patch/7543/Signed-off-by:
Ralf Baechle <ralf@linux-mips.org> MIPS: Alchemy: Fix db1200 PSC clock enablement Enable PSC0 (I2C/SPI) clock and leave PSC1 (Audio) alone. This patch restores functionality to both Audio and I2C/SPI. Signed-off-by:
Manuel Lauss <manuel.lauss@gmail.com> Cc: Linux-MIPS <linux-mips@linux-mips.org> Patchwork: https://patchwork.linux-mips.org/patch/7544/Signed-off-by:
Ralf Baechle <ralf@linux-mips.org> MIPS: BCM47XX: Fix reboot problem on BCM4705/BCM4785 This adds some code based on code from the Broadcom GPL tar to fix the reboot problems on BCM4705/BCM4785. I tried rebooting my device for ~10 times and have never seen a problem. This reverts the changes in the previous commit and adds the real fix as suggested by Rafał. Setting bit 22 in Reg 22, sel 4 puts the BIU (Bus Interface Unit) into async mode. The previous commit was 316cad5c [MIPS: BCM47XX: make reboot more relaiable] Signed-off-by:
Hauke Mehrtens <hauke@hauke-m.de> Cc: jogo@openwrt.org Cc: zajec5@gmail.com Cc: linux-mips@linux-mips.org Patchwork: https://patchwork.linux-mips.org/patch/7545/Signed-off-by:
Ralf Baechle <ralf@linux-mips.org> MIPS: Remove duplicated include from numa.c Signed-off-by:
Wei Yongjun <yongjun_wei@trendmicro.com.cn> Cc: Huacai Chen <chenhc@lemote.com> Cc: Wei Yongjun <yongjun_wei@trendmicro.com.cn> Cc: linux-mips@linux-mips.org Cc: linux-kernel@vger.kernel.org Patchwork: https://patchwork.linux-mips.org/patch/7537/Signed-off-by:
Ralf Baechle <ralf@linux-mips.org> MIPS: Add common plat_irq_dispatch declaration Add common declaration to get rid of following sparse warning: "symbol 'plat_irq_dispatch' was not declared. Should it be static?" Signed-off-by:
Sergey Ryazanov <ryazanov.s.a@gmail.com> Cc: Linux MIPS <linux-mips@linux-mips.org> Patchwork: https://patchwork.linux-mips.org/patch/7539/Signed-off-by:
Ralf Baechle <ralf@linux-mips.org> MIPS: MSP71xx: remove unused plat_irq_dispatch() argument Remove unused argument to make the plat_irq_dispatch() function declaration similar to the realization of other platforms. Signed-off-by:
Sergey Ryazanov <ryazanov.s.a@gmail.com> Cc: Linux MIPS <linux-mips@linux-mips.org> Patchwork: https://patchwork.linux-mips.org/patch/7538/Signed-off-by:
Ralf Baechle <ralf@linux-mips.org> MIPS: GIC: Remove useless parens from GICBIS(). Signed-off-by:
Ralf Baechle <ralf@linux-mips.org> MIPS: perf: Mark pmu interupt IRQF_NO_THREAD In RT kernel, I ran into the following calltrace, so PMU interrupts cannot be threaded in_atomic(): 1, irqs_disabled(): 1, pid: 0, name: swapper/0 INFO: lockdep is turned off. Call Trace: [<ffffffff8088595c>] dump_stack+0x1c/0x50 [<ffffffff801a958c>] __might_sleep+0x13c/0x148 [<ffffffff80891c54>] rt_spin_lock+0x3c/0xb0 [<ffffffff801ad29c>] __wake_up+0x3c/0x80 [<ffffffff80243ba4>] perf_event_wakeup+0x8c/0xf8 [<ffffffff80243c50>] perf_pending_event+0x40/0x78 [<ffffffff8023d88c>] irq_work_run+0x74/0xc0 [<ffffffff80152640>] mipsxx_pmu_handle_shared_irq+0x110/0x228 [<ffffffff8015276c>] mipsxx_pmu_handle_irq+0x14/0x30 [<ffffffff801ffda4>] handle_irq_event_percpu+0xbc/0x470 [<ffffffff80204478>] handle_percpu_irq+0x98/0xc8 [<ffffffff801ff284>] generic_handle_irq+0x4c/0x68 [<ffffffff8089748c>] do_IRQ+0x2c/0x48 [<ffffffff80105864>] plat_irq_dispatch+0x64/0xd0 [ralf@linux-mips.org: I don't see why based on this register dump the handler should be marked IRQF_NO_THREAD - but the handler is manipulating per-CPU resources so we don't want it to be rescheduled to another CPU.] Signed-off-by:
Yang Wei <Wei.Yang@windriver.com> Cc: a.p.zijlstra@chello.nl Cc: paulus@samba.org Cc: mingo@redhat.com Cc: acme@kernel.org Cc: linux-kernel@vger.kernel.org Cc: linux-mips@linux-mips.org Patchwork: https://patchwork.linux-mips.org/patch/7506/Signed-off-by:
Ralf Baechle <ralf@linux-mips.org> MIPS: kdump: Set correct value to kexec_indirection_page variable Since there is not indirection page in crash type, so the vaule of the head field of kimage structure is not equal to the address of indirection page but IND_DONE. so we have to set kexec_indirection_page variable to the address of the head field of image structure. [ralf@linux-mips.org: Don't add pointless empty line, fix trailing whitespace damage.] Signed-off-by:
Yang Wei <Wei.Yang@windriver.com> Cc: linux-mips@linux-mips.org Cc: linux-kernel@vger.kernel.org Patchwork: https://patchwork.linux-mips.org/patch/7499/Signed-off-by:
Ralf Baechle <ralf@linux-mips.org> MIPS: OCTEON: make get_system_type() thread-safe get_system_type() is not thread-safe on OCTEON. It uses static data, also more dangerous issue is that it's calling cvmx_fuse_read_byte() every time without any synchronization. Currently it's possible to get processes stuck looping forever in kernel simply by launching multiple readers of /proc/cpuinfo: (while true; do cat /proc/cpuinfo > /dev/null; done) & (while true; do cat /proc/cpuinfo > /dev/null; done) & ... Fix by initializing the system type string only once during the early boot. Signed-off-by:
Aaro Koskinen <aaro.koskinen@nsn.com> Cc: stable@vger.kernel.org Reviewed-by:
Markos Chandras <markos.chandras@imgtec.com> Patchwork: http://patchwork.linux-mips.org/patch/7437/Signed-off-by:
James Hogan <james.hogan@imgtec.com> (cherry picked from commit 33d9a530 60830868) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
-
Jarkko Sakkinen authored
Regression in 41ab999c. Call to tpm_chip_put is missing. This will cause TPM device driver not to unload if tmp_get_random() is called. Cc: <stable@vger.kernel.org> # 3.7+ Signed-off-by:
Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> Signed-off-by:
Peter Huewe <peterhuewe@gmx.de> (cherry picked from commit 3e14d83e) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
-
Paolo Bonzini authored
This reverts commit 682367c4, which causes 32-bit SMP Windows 7 guests to panic. SeaBIOS has a limit on the number of MTRRs that it can handle, and this patch exceeded the limit. Better revert it. Thanks to Nadav Amit for debugging the cause. Cc: stable@nongnu.org Reported-by:
Wanpeng Li <wanpeng.li@linux.intel.com> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com> (cherry picked from commit 0d234daf) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
-
H. Peter Anvin authored
Make espfix64 a hidden Kconfig option. This fixes the x86-64 UML build which had broken due to the non-existence of init_espfix_bsp() in UML: since UML uses its own Kconfig, this option does not appear in the UML build. This also makes it possible to make support for 16-bit segments a configuration option, for the people who want to minimize the size of the kernel. Reported-by:
Ingo Molnar <mingo@kernel.org> Signed-off-by:
H. Peter Anvin <hpa@zytor.com> Cc: Richard Weinberger <richard@nod.at> Link: http://lkml.kernel.org/r/1398816946-3351-1-git-send-email-hpa@linux.intel.com (cherry picked from commit 197725de) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
-