- 10 Oct, 2014 40 commits
-
-
Hugh Dickins authored
Commit f00cdc6d ("shmem: fix faulting into a hole while it's punched") was buggy: Sasha sent a lockdep report to remind us that grabbing i_mutex in the fault path is a no-no (write syscall may already hold i_mutex while faulting user buffer). We tried a completely different approach (see following patch) but that proved inadequate: good enough for a rational workload, but not good enough against trinity - which forks off so many mappings of the object that contention on i_mmap_mutex while hole-puncher holds i_mutex builds into serious starvation when concurrent faults force the puncher to fall back to single-page unmap_mapping_range() searches of the i_mmap tree. So return to the original umbrella approach, but keep away from i_mutex this time. We really don't want to bloat every shmem inode with a new mutex or completion, just to protect this unlikely case from trinity. So extend the original with wait_queue_head on stack at the hole-punch end, and wait_queue item on the stack at the fault end. This involves further use of i_lock to guard against the races: lockdep has been happy so far, and I see fs/inode.c:unlock_new_inode() holds i_lock around wake_up_bit(), which is comparable to what we do here. i_lock is more convenient, but we could switch to shmem's info->lock. This issue has been tagged with CVE-2014-4171, which will require commit f00cdc6d and this and the following patch to be backported: we suggest to 3.1+, though in fact the trinity forkbomb effect might go back as far as 2.6.16, when madvise(,,MADV_REMOVE) came in - or might not, since much has changed, with i_mmap_mutex a spinlock before 3.0. Anyone running trinity on 3.0 and earlier? I don't think we need care. Signed-off-by:
Hugh Dickins <hughd@google.com> Reported-by:
Sasha Levin <sasha.levin@oracle.com> Tested-by:
Sasha Levin <sasha.levin@oracle.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Konstantin Khlebnikov <koct9i@gmail.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Lukas Czerner <lczerner@redhat.com> Cc: Dave Jones <davej@redhat.com> Cc: <stable@vger.kernel.org> [3.1+] Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org> (cherry picked from commit 8e205f77) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
-
Hugh Dickins authored
Trinity finds that mmap access to a hole while it's punched from shmem can prevent the madvise(MADV_REMOVE) or fallocate(FALLOC_FL_PUNCH_HOLE) from completing, until the reader chooses to stop; with the puncher's hold on i_mutex locking out all other writers until it can complete. It appears that the tmpfs fault path is too light in comparison with its hole-punching path, lacking an i_data_sem to obstruct it; but we don't want to slow down the common case. Extend shmem_fallocate()'s existing range notification mechanism, so shmem_fault() can refrain from faulting pages into the hole while it's punched, waiting instead on i_mutex (when safe to sleep; or repeatedly faulting when not). [akpm@linux-foundation.org: coding-style fixes] Signed-off-by:
Hugh Dickins <hughd@google.com> Reported-by:
Sasha Levin <sasha.levin@oracle.com> Tested-by:
Sasha Levin <sasha.levin@oracle.com> Cc: Dave Jones <davej@redhat.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org> (cherry picked from commit f00cdc6d) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
-
Andy Whitcroft authored
The recently merged change (in v3.14-rc6) to ACPI resource detection (below) causes all zero length ACPI resources to be elided from the table: commit b355cee8 Author: Zhang Rui <rui.zhang@intel.com> Date: Thu Feb 27 11:37:15 2014 +0800 ACPI / resources: ignore invalid ACPI device resources This change has caused a regression in (at least) serial port detection for a number of machines (see LP#1313981 [1]). These seem to represent their IO regions (presumably incorrectly) as a zero length region. Reverting the above commit restores these serial devices. Only elide zero length resources which lie at address 0. Fixes: b355cee8 (ACPI / resources: ignore invalid ACPI device resources) Signed-off-by:
Andy Whitcroft <apw@canonical.com> Acked-by:
Zhang Rui <rui.zhang@intel.com> Cc: 3.14+ <stable@vger.kernel.org> # 3.14+ Signed-off-by:
Rafael J. Wysocki <rafael.j.wysocki@intel.com> (cherry picked from commit 867f9d46) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
-
Thomas Gleixner authored
The rx_poll code has the following gem: if (msg_ctrl_save & IF_MCONT_EOB) return num_rx_pkts; The EOB bit is the indicator for the hardware that this is the last configured FIFO object. But this object can contain valid data, if we manage to free up objects before the overrun case hits. Now if the code exits due to the EOB bit set, then this buffer is stale and the interrupt bit and NewDat bit of the buffer are still set. Results in a nice interrupt storm unless we come into an overrun situation where the MSGLST bit gets set. ksoftirqd/0-3 [000] ..s. 79.124101: c_can_poll: rx_poll: val: 00008001 pend 00008001 ksoftirqd/0-3 [000] ..s. 79.124176: c_can_poll: rx_poll: val: 00008000 pend 00008000 ksoftirqd/0-3 [000] ..s. 79.124187: c_can_poll: rx_poll: val: 00008002 pend 00008002 ksoftirqd/0-3 [000] ..s. 79.124256: c_can_poll: rx_poll: val: 00008000 pend 00008000 ksoftirqd/0-3 [000] ..s. 79.124267: c_can_poll: rx_poll: val: 00008000 pend 00008000 The amazing thing is that the check of the MSGLST (aka overrun bit) used to be after the check of the EOB bit. That was "fixed" in commit 5d0f801a(can: c_can: Fix RX message handling, handle lost message before EOB). But the author of this "fix" did not even understand that the EOB check is broken as well. Again a simple solution: Remove Signed-off-by:
Thomas Gleixner <tglx@linutronix.de> [mkl: adjusted subject and commit message] Signed-off-by:
Marc Kleine-Budde <mkl@pengutronix.de> (cherry picked from commit 710c5610) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
-
Christoph Lameter authored
Seems to be called with preemption enabled. Therefore it must use mod_zone_page_state instead. Signed-off-by:
Christoph Lameter <cl@linux.com> Reported-by:
Grygorii Strashko <grygorii.strashko@ti.com> Tested-by:
Grygorii Strashko <grygorii.strashko@ti.com> Cc: Tejun Heo <tj@kernel.org> Cc: Santosh Shilimkar <santosh.shilimkar@ti.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Mel Gorman <mel@csn.ul.ie> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org> (cherry picked from commit 83da7510) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
-
Raghavendra K T authored
Currently max_sane_readahead() returns zero on the cpu whose NUMA node has no local memory which leads to readahead failure. Fix this readahead failure by returning minimum of (requested pages, 512). Users running applications on a memory-less cpu which needs readahead such as streaming application see considerable boost in the performance. Result: fadvise experiment with FADV_WILLNEED on a PPC machine having memoryless CPU with 1GB testfile (12 iterations) yielded around 46.66% improvement. fadvise experiment with FADV_WILLNEED on a x240 machine with 1GB testfile 32GB* 4G RAM numa machine (12 iterations) showed no impact on the normal NUMA cases w/ patch. Kernel Avg Stddev base 7.4975 3.92% patched 7.4174 3.26% [Andrew: making return value PAGE_SIZE independent] Suggested-by:
Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by:
Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com> Acked-by:
Jan Kara <jack@suse.cz> Cc: Wu Fengguang <fengguang.wu@intel.com> Cc: David Rientjes <rientjes@google.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org> (cherry picked from commit 6d2be915) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
-
David Rientjes authored
The cached pageblock hint should be ignored when triggering compaction through /proc/sys/vm/compact_memory so all eligible memory is isolated. Manually invoking compaction is known to be expensive, there's no need to skip pageblocks based on heuristics (mainly for debugging). Signed-off-by:
David Rientjes <rientjes@google.com> Acked-by:
Mel Gorman <mgorman@suse.de> Cc: Rik van Riel <riel@redhat.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org> (cherry picked from commit 91ca9186) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
-
Joonsoo Kim authored
It is just for clean-up to reduce code size and improve readability. There is no functional change. Signed-off-by:
Joonsoo Kim <iamjoonsoo.kim@lge.com> Acked-by:
Vlastimil Babka <vbabka@suse.cz> Cc: Mel Gorman <mgorman@suse.de> Cc: Rik van Riel <riel@redhat.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org> (cherry picked from commit b6c75016) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
-
Joonsoo Kim authored
It is odd to drop the spinlock when we scan (SWAP_CLUSTER_MAX - 1) th pfn page. This may results in below situation while isolating migratepage. 1. try isolate 0x0 ~ 0x200 pfn pages. 2. When low_pfn is 0x1ff, ((low_pfn+1) % SWAP_CLUSTER_MAX) == 0, so drop the spinlock. 3. Then, to complete isolating, retry to aquire the lock. I think that it is better to use SWAP_CLUSTER_MAX th pfn for checking the criteria about dropping the lock. This has no harm 0x0 pfn, because, at this time, locked variable would be false. Signed-off-by:
Joonsoo Kim <iamjoonsoo.kim@lge.com> Acked-by:
Vlastimil Babka <vbabka@suse.cz> Cc: Mel Gorman <mgorman@suse.de> Cc: Rik van Riel <riel@redhat.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org> (cherry picked from commit be1aa03b) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
-
Lars Ellenberg authored
Since linux kernel 3.13, kthread_run() internally uses wait_for_completion_killable(). We sometimes may use kthread_run() while we still have a signal pending, which we used to kick our threads out of potentially blocking network functions, causing kthread_run() to mistake that as a new fatal signal and fail. Fix: flush_signals() before kthread_run(). Signed-off-by:
Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by:
Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by:
Jens Axboe <axboe@fb.com> (cherry picked from commit bbc1c5e8) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
-
Joonsoo Kim authored
suitable_migration_target() checks that pageblock is suitable for migration target. In isolate_freepages_block(), it is called on every page and this is inefficient. So make it called once per pageblock. suitable_migration_target() also checks if page is highorder or not, but it's criteria for highorder is pageblock order. So calling it once within pageblock range has no problem. Signed-off-by:
Joonsoo Kim <iamjoonsoo.kim@lge.com> Acked-by:
Vlastimil Babka <vbabka@suse.cz> Cc: Mel Gorman <mgorman@suse.de> Cc: Rik van Riel <riel@redhat.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org> (cherry picked from commit 01ead534) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
-
David Rientjes authored
Page migration will fail for memory that is pinned in memory with, for example, get_user_pages(). In this case, it is unnecessary to take zone->lru_lock or isolating the page and passing it to page migration which will ultimately fail. This is a racy check, the page can still change from under us, but in that case we'll just fail later when attempting to move the page. This avoids very expensive memory compaction when faulting transparent hugepages after pinning a lot of memory with a Mellanox driver. On a 128GB machine and pinning ~120GB of memory, before this patch we see the enormous disparity in the number of page migration failures because of the pinning (from /proc/vmstat): compact_pages_moved 8450 compact_pagemigrate_failed 15614415 0.05% of pages isolated are successfully migrated and explicitly triggering memory compaction takes 102 seconds. After the patch: compact_pages_moved 9197 compact_pagemigrate_failed 7 99.9% of pages isolated are now successfully migrated in this configuration and memory compaction takes less than one second. Signed-off-by:
David Rientjes <rientjes@google.com> Acked-by:
Hugh Dickins <hughd@google.com> Acked-by:
Mel Gorman <mgorman@suse.de> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Rik van Riel <riel@redhat.com> Cc: Greg Thelen <gthelen@google.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org> (cherry picked from commit 119d6d59) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
-
Dan Streetman authored
Add plist_requeue(), which moves the specified plist_node after all other same-priority plist_nodes in the list. This is essentially an optimized plist_del() followed by plist_add(). This is needed by swap, which (with the next patch in this set) uses a plist of available swap devices. When a swap device (either a swap partition or swap file) are added to the system with swapon(), the device is added to a plist, ordered by the swap device's priority. When swap needs to allocate a page from one of the swap devices, it takes the page from the first swap device on the plist, which is the highest priority swap device. The swap device is left in the plist until all its pages are used, and then removed from the plist when it becomes full. However, as described in man 2 swapon, swap must allocate pages from swap devices with the same priority in round-robin order; to do this, on each swap page allocation, swap uses a page from the first swap device in the plist, and then calls plist_requeue() to move that swap device entry to after any other same-priority swap devices. The next swap page allocation will again use a page from the first swap device in the plist and requeue it, and so on, resulting in round-robin usage of equal-priority swap devices. Also add plist_test_requeue() test function, for use by plist_test() to test plist_requeue() function. Signed-off-by:
Dan Streetman <ddstreet@ieee.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Peter Zijlstra <peterz@infradead.org> Acked-by:
Mel Gorman <mgorman@suse.de> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Shaohua Li <shli@fusionio.com> Cc: Hugh Dickins <hughd@google.com> Cc: Dan Streetman <ddstreet@ieee.org> Cc: Michal Hocko <mhocko@suse.cz> Cc: Christian Ehrhardt <ehrhardt@linux.vnet.ibm.com> Cc: Weijie Yang <weijieut@gmail.com> Cc: Rik van Riel <riel@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Bob Liu <bob.liu@oracle.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org> (cherry picked from commit a75f232c) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
-
Dan Streetman authored
Add PLIST_HEAD() to plist.h, equivalent to LIST_HEAD() from list.h, to define and initialize a struct plist_head. Add plist_for_each_continue() and plist_for_each_entry_continue(), equivalent to list_for_each_continue() and list_for_each_entry_continue(), to iterate over a plist continuing after the current position. Add plist_prev() and plist_next(), equivalent to (struct list_head*)->prev and ->next, implemented by list_prev_entry() and list_next_entry(), to access the prev/next struct plist_node entry. These are needed because unlike struct list_head, direct access of the prev/next struct plist_node isn't possible; the list must be navigated via the contained struct list_head. e.g. instead of accessing the prev by list_prev_entry(node, node_list) it can be accessed by plist_prev(node). Signed-off-by:
Dan Streetman <ddstreet@ieee.org> Acked-by:
Mel Gorman <mgorman@suse.de> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Shaohua Li <shli@fusionio.com> Cc: Hugh Dickins <hughd@google.com> Cc: Dan Streetman <ddstreet@ieee.org> Cc: Michal Hocko <mhocko@suse.cz> Cc: Christian Ehrhardt <ehrhardt@linux.vnet.ibm.com> Cc: Weijie Yang <weijieut@gmail.com> Cc: Rik van Riel <riel@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Bob Liu <bob.liu@oracle.com> Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org> (cherry picked from commit fd16618e) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
-
Julian Anastasov authored
commit 8f4e0a18 ("IPVS netns exit causes crash in conntrack") added second ip_vs_conn_drop_conntrack call instead of just adding the needed check. As result, the first call still can cause crash on netns exit. Remove it. Signed-off-by:
Julian Anastasov <ja@ssi.bg> Signed-off-by:
Hans Schillstrom <hans@schillstrom.com> Signed-off-by:
Simon Horman <horms@verge.net.au> (cherry picked from commit 2627b7e1) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
-
Guy Martin authored
The current LWS cas only works correctly for 32bit. The new LWS allows for CAS operations of variable size. Signed-off-by:
Guy Martin <gmsoft@tuxicoman.be> Cc: <stable@vger.kernel.org> # 3.13+ Signed-off-by:
Helge Deller <deller@gmx.de> (cherry picked from commit 89206491) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
-
Benjamin Tissoires authored
Commit "HID: logitech: perform bounds checking on device_id early enough" unfortunately leaks some errors to dmesg which are not real ones: - if the report is not a DJ one, then there is not point in checking the device_id - the receiver (index 0) can also receive some notifications which can be safely ignored given the current implementation Move out the test regarding the report_id and also discards printing errors when the receiver got notified. Fixes: ad3e14d7 Cc: stable@vger.kernel.org Reported-and-tested-by:
Markus Trippelsdorf <markus@trippelsdorf.de> Signed-off-by:
Benjamin Tissoires <benjamin.tissoires@redhat.com> Signed-off-by:
Jiri Kosina <jkosina@suse.cz> (cherry picked from commit 5abfe85c) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
-
Takashi Iwai authored
ALC1150 codec seems to need the COEF- and PLL-setups just like its compatible ALC882 codec. Some machines (e.g. SunMicro X10SAT) show the problem like too low output volumes unless the COEF setup is applied. Reported-and-tested-by:
Dana Goyette <danagoyette@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by:
Takashi Iwai <tiwai@suse.de> (cherry picked from commit acf08081) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
-
Jiri Kosina authored
Withtout this, ring initialization fails reliabily during resume with [drm:init_ring_common] *ERROR* render ring initialization failed ctl 0001f001 head ffffff8804 tail 00000000 start 000e4000 This is not a complete fix, but it is verified to make the ring initialization failures during resume much less likely. We were not able to root-cause this bug (likely HW-specific to Gen4 chips) yet. This is therefore used as a ducttape before problem is fully understood and proper fix created, so that people don't suffer from completely unusable systems in the meantime. The discussion and debugging is happening at https://bugs.freedesktop.org/show_bug.cgi?id=76554Signed-off-by:
Jiri Kosina <jkosina@suse.cz> Cc: stable@vger.kernel.org Signed-off-by:
Daniel Vetter <daniel.vetter@ffwll.ch> (cherry picked from commit ece4a17d) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
-
Paolo Bonzini authored
This reverts commit 682367c4, which causes 32-bit SMP Windows 7 guests to panic. SeaBIOS has a limit on the number of MTRRs that it can handle, and this patch exceeded the limit. Better revert it. Thanks to Nadav Amit for debugging the cause. Cc: stable@nongnu.org Reported-by:
Wanpeng Li <wanpeng.li@linux.intel.com> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com> (cherry picked from commit 0d234daf) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
-
Takashi Iwai authored
ALC269 & co have many vendor-specific setups with COEF verbs. However, some verbs seem specific to some codec versions and they result in the codec stalling. Typically, such a case can be avoided by checking the return value from reading a COEF. If the return value is -1, it implies that the COEF is invalid, thus it shouldn't be written. This patch adds the invalid COEF checks in appropriate places accessing ALC269 and its variants. The patch actually fixes the resume problem on Acer AO725 laptop. Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=52181Tested-by:
Francesco Muzio <muziofg@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by:
Takashi Iwai <tiwai@suse.de> (cherry picked from commit f3ee07d8) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
-
Jiri Kosina authored
device_index is a char type and the size of paired_dj_deivces is 7 elements, therefore proper bounds checking has to be applied to device_index before it is used. We are currently performing the bounds checking in logi_dj_recv_add_djhid_device(), which is too late, as malicious device could send REPORT_TYPE_NOTIF_DEVICE_UNPAIRED early enough and trigger the problem in one of the report forwarding functions called from logi_dj_raw_event(). Fix this by performing the check at the earliest possible ocasion in logi_dj_raw_event(). Cc: stable@vger.kernel.org Reported-by:
Ben Hawkes <hawkes@google.com> Reviewed-by:
Benjamin Tissoires <benjamin.tissoires@redhat.com> Signed-off-by:
Jiri Kosina <jkosina@suse.cz> (cherry picked from commit ad3e14d7) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
-
David S. Miller authored
pte_ERROR() is not used anywhere, delete it. For pgd_ERROR() and pmd_ERROR(), output something similar to x86, giving the address of the pgd/pmd as well as it's value. Also provide the caller, since these macros are invoked from pgd_clear_bad() and pmd_clear_bad() which provides little context as to what high level operation was occuring when the BAD state was detected. Signed-off-by:
David S. Miller <davem@davemloft.net> (cherry picked from commit fe866433) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
-
David S. Miller authored
Signed-off-by:
David S. Miller <davem@davemloft.net> (cherry picked from commit 0eef331a) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
-
David S. Miller authored
In commit b2d43834 ("sparc64: Make PAGE_OFFSET variable."), the MAX_PHYS_ADDRESS_BITS value was increased (to 47). This constant reference to '41UL' was missed. Signed-off-by:
David S. Miller <davem@davemloft.net> (cherry picked from commit ee73887e) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
-
David S. Miller authored
When _PAGE_SPECIAL and _PAGE_PMD_HUGE were added to the mask, the comment was not updated. Signed-off-by:
David S. Miller <davem@davemloft.net> (cherry picked from commit c2e4e676) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
-
Tobias Brunner authored
The SPI check introduced in ea9884b3 was intended for IPComp SAs but actually prevented AH SAs from getting installed (depending on the SPI). Fixes: ea9884b3 ("xfrm: check user specified spi for IPComp") Cc: Fan Du <fan.du@windriver.com> Signed-off-by:
Tobias Brunner <tobias@strongswan.org> Signed-off-by:
Steffen Klassert <steffen.klassert@secunet.com> (cherry picked from commit a0e5ef53) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
-
Andrew Hunter authored
timeval_to_jiffies tried to round a timeval up to an integral number of jiffies, but the logic for doing so was incorrect: intervals corresponding to exactly N jiffies would become N+1. This manifested itself particularly repeatedly stopping/starting an itimer: setitimer(ITIMER_PROF, &val, NULL); setitimer(ITIMER_PROF, NULL, &val); would add a full tick to val, _even if it was exactly representable in terms of jiffies_ (say, the result of a previous rounding.) Doing this repeatedly would cause unbounded growth in val. So fix the math. Here's what was wrong with the conversion: we essentially computed (eliding seconds) jiffies = usec * (NSEC_PER_USEC/TICK_NSEC) by using scaling arithmetic, which took the best approximation of NSEC_PER_USEC/TICK_NSEC with denominator of 2^USEC_JIFFIE_SC = x/(2^USEC_JIFFIE_SC), and computed: jiffies = (usec * x) >> USEC_JIFFIE_SC and rounded this calculation up in the intermediate form (since we can't necessarily exactly represent TICK_NSEC in usec.) But the scaling arithmetic is a (very slight) *over*approximation of the true value; that is, instead of dividing by (1 usec/ 1 jiffie), we effectively divided by (1 usec/1 jiffie)-epsilon (rounding down). This would normally be fine, but we want to round timeouts up, and we did so by adding 2^USEC_JIFFIE_SC - 1 before the shift; this would be fine if our division was exact, but dividing this by the slightly smaller factor was equivalent to adding just _over_ 1 to the final result (instead of just _under_ 1, as desired.) In particular, with HZ=1000, we consistently computed that 10000 usec was 11 jiffies; the same was true for any exact multiple of TICK_NSEC. We could possibly still round in the intermediate form, adding something less than 2^USEC_JIFFIE_SC - 1, but easier still is to convert usec->nsec, round in nanoseconds, and then convert using time*spec*_to_jiffies. This adds one constant multiplication, and is not observably slower in microbenchmarks on recent x86 hardware. Tested: the following program: int main() { struct itimerval zero = {{0, 0}, {0, 0}}; /* Initially set to 10 ms. */ struct itimerval initial = zero; initial.it_interval.tv_usec = 10000; setitimer(ITIMER_PROF, &initial, NULL); /* Save and restore several times. */ for (size_t i = 0; i < 10; ++i) { struct itimerval prev; setitimer(ITIMER_PROF, &zero, &prev); /* on old kernels, this goes up by TICK_USEC every iteration */ printf("previous value: %ld %ld %ld %ld\n", prev.it_interval.tv_sec, prev.it_interval.tv_usec, prev.it_value.tv_sec, prev.it_value.tv_usec); setitimer(ITIMER_PROF, &prev, NULL); } return 0; } Cc: stable@vger.kernel.org Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Paul Turner <pjt@google.com> Cc: Richard Cochran <richardcochran@gmail.com> Cc: Prarit Bhargava <prarit@redhat.com> Reviewed-by:
Paul Turner <pjt@google.com> Reported-by:
Aaron Jacobs <jacobsa@google.com> Signed-off-by:
Andrew Hunter <ahh@google.com> [jstultz: Tweaked to apply to 3.17-rc] Signed-off-by:
John Stultz <john.stultz@linaro.org> (cherry picked from commit d78c9300) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
-
Mel Gorman authored
This patch reverts 1ba6e0b5 ("mm: numa: split_huge_page: transfer the NUMA type from the pmd to the pte"). If a huge page is being split due a protection change and the tail will be in a PROT_NONE vma then NUMA hinting PTEs are temporarily created in the protected VMA. VM_RW|VM_PROTNONE |-----------------| ^ split here In the specific case above, it should get fixed up by change_pte_range() but there is a window of opportunity for weirdness to happen. Similarly, if a huge page is shrunk and split during a protection update but before pmd_numa is cleared then a pte_numa can be left behind. Instead of adding complexity trying to deal with the case, this patch will not mark PTEs NUMA when splitting a huge page. NUMA hinting faults will not be triggered which is marginal in comparison to the complexity in dealing with the corner cases during THP split. Cc: stable@vger.kernel.org Signed-off-by:
Mel Gorman <mgorman@suse.de> Acked-by:
Rik van Riel <riel@redhat.com> Acked-by:
Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org> (cherry picked from commit abc40bd2) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
-
Waiman Long authored
In __split_huge_page_map(), the check for page_mapcount(page) is invariant within the for loop. Because of the fact that the macro is implemented using atomic_read(), the redundant check cannot be optimized away by the compiler leading to unnecessary read to the page structure. This patch moves the invariant bug check out of the loop so that it will be done only once. On a 3.16-rc1 based kernel, the execution time of a microbenchmark that broke up 1000 transparent huge pages using munmap() had an execution time of 38,245us and 38,548us with and without the patch respectively. The performance gain is about 1%. Signed-off-by:
Waiman Long <Waiman.Long@hp.com> Acked-by:
Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Rik van Riel <riel@redhat.com> Cc: Scott J Norton <scott.norton@hp.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org> (cherry picked from commit f8303c25) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
-
Steven Rostedt (Red Hat) authored
Commit 651e22f2 "ring-buffer: Always reset iterator to reader page" fixed one bug but in the process caused another one. The reset is to update the header page, but that fix also changed the way the cached reads were updated. The cache reads are used to test if an iterator needs to be updated or not. A ring buffer iterator, when created, disables writes to the ring buffer but does not stop other readers or consuming reads from happening. Although all readers are synchronized via a lock, they are only synchronized when in the ring buffer functions. Those functions may be called by any number of readers. The iterator continues down when its not interrupted by a consuming reader. If a consuming read occurs, the iterator starts from the beginning of the buffer. The way the iterator sees that a consuming read has happened since its last read is by checking the reader "cache". The cache holds the last counts of the read and the reader page itself. Commit 651e22f2 changed what was saved by the cache_read when the rb_iter_reset() occurred, making the iterator never match the cache. Then if the iterator calls rb_iter_reset(), it will go into an infinite loop by checking if the cache doesn't match, doing the reset and retrying, just to see that the cache still doesn't match! Which should never happen as the reset is suppose to set the cache to the current value and there's locks that keep a consuming reader from having access to the data. Fixes: 651e22f2 "ring-buffer: Always reset iterator to reader page" Signed-off-by:
Steven Rostedt <rostedt@goodmis.org> (cherry picked from commit 24607f11) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
-
Peter Zijlstra authored
Oleg noticed that a cleanup by Sylvain actually uncovered a bug; by calling perf_event_free_task() when failing sched_fork() we will not yet have done the memset() on ->perf_event_ctxp[] and will therefore try and 'free' the inherited contexts, which are still in use by the parent process. This is bad.. Suggested-by:
Oleg Nesterov <oleg@redhat.com> Reported-by:
Oleg Nesterov <oleg@redhat.com> Reported-by:
Sylvain 'ythier' Hitier <sylvain.hitier@gmail.com> Signed-off-by:
Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: <stable@vger.kernel.org> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org> (cherry picked from commit 6c72e350) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
-
Jan Kara authored
We did not implement any bound on number of indirect ICBs we follow when loading inode. Thus corrupted medium could cause kernel to go into an infinite loop, possibly causing a stack overflow. Fix the possible stack overflow by removing recursion from __udf_read_inode() and limit number of indirect ICBs we follow to avoid infinite loops. Signed-off-by:
Jan Kara <jack@suse.cz> (cherry picked from commit c03aa9f6) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
-
Julian Anastasov authored
commit fc604767 ("ipvs: changes for local real server") from 2.6.37 introduced DNAT support to local real server but the IPv6 LOCAL_OUT handler ip_vs_local_reply6() is registered incorrectly as IPv4 hook causing any outgoing IPv4 traffic to be dropped depending on the IP header values. Chris tracked down the problem to CONFIG_IP_VS_IPV6=y Bug report: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1349768Reported-by:
Chris J Arges <chris.j.arges@canonical.com> Tested-by:
Chris J Arges <chris.j.arges@canonical.com> Signed-off-by:
Julian Anastasov <ja@ssi.bg> Signed-off-by:
Simon Horman <horms@verge.net.au> (cherry picked from commit eb90b0c7) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
-
Alex Gartrell authored
Previously, only the four high bits of the tclass were maintained in the ipv6 case. This matches the behavior of ipv4, though whether or not we should reflect ECN bits may be up for debate. Signed-off-by:
Alex Gartrell <agartrell@fb.com> Acked-by:
Julian Anastasov <ja@ssi.bg> Signed-off-by:
Simon Horman <horms@verge.net.au> (cherry picked from commit 76f084bc) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
-
NeilBrown authored
r1_bio->start_next_window is not initialised in the READ case, so allow_barrier may incorrectly decrement conf->current_window_requests which can cause raise_barrier() to block forever. Fixes: 79ef3a8a cc: stable@vger.kernel.org (v3.13+) Reported-by:
Brassow Jonathan <jbrassow@redhat.com> Signed-off-by:
NeilBrown <neilb@suse.de> (cherry picked from commit f0cc9a05) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
-
NeilBrown authored
If a devices is being recovered it is not InSync and is not Faulty. If a read error is experienced on that device, fix_read_error() will be called, but it ignores non-InSync devices. So it will neither fix the error nor fail the device. It is incorrect that fix_read_error() ignores non-InSync devices. It should only ignore Faulty devices. So fix it. This became a bug when we allowed reading from a device that was being recovered. It is suitable for any subsequent -stable kernel. Fixes: da8840a7 Cc: stable@vger.kernel.org (v3.5+) Reported-by:
Alexander Lyakas <alex.bolshoy@gmail.com> Tested-by:
Alexander Lyakas <alex.bolshoy@gmail.com> Signed-off-by:
NeilBrown <neilb@suse.de> (cherry picked from commit b8cb6b4c) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
-
NeilBrown authored
Both normal IO and resync IO can be retried with reschedule_retry() and so be counted into ->nr_queued, but only normal IO gets counted in ->nr_pending. Before the recent improvement to RAID1 resync there could only possibly have been one or the other on the queue. When handling a read failure it could only be normal IO. So when handle_read_error() called freeze_array() the fact that freeze_array only compares ->nr_queued against ->nr_pending was safe. But now that these two types can interleave, we can have both normal and resync IO requests queued, so we need to count them both in nr_pending. This error can lead to freeze_array() hanging if there is a read error, so it is suitable for -stable. Fixes: 79ef3a8a cc: stable@vger.kernel.org (v3.13+) Reported-by:
Brassow Jonathan <jbrassow@redhat.com> Signed-off-by:
NeilBrown <neilb@suse.de> (cherry picked from commit 34e97f17) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
-
NeilBrown authored
commit c6d119cf upstream. commit 79ef3a8a made it possible for reads to happen concurrently with resync. This means that we need to be more careful where read_balancing is allowed during resync - we can no longer be sure that any resync that has already started will definitely finish. So keep read_balancing to before recovery_cp, which is conservative but safe. This bug makes it possible to read from a device that doesn't have up-to-date data, so it can cause data corruption. So it is suitable for any kernel since 3.11. Fixes: 79ef3a8aSigned-off-by:
NeilBrown <neilb@suse.de> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org> (cherry picked from commit dd777df4) (cherry picked from commit HEAD) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
-
Hans Verkuil authored
commit 6a03dc92 upstream. This was caused by an uninitialized setup.config field. Based on a suggestion from Devin Heitmueller. Signed-off-by:
Hans Verkuil <hans.verkuil@cisco.com> Thanks-to: Devin Heitmueller <dheitmueller@kernellabs.com> Reported-by:
Scott Robinson <scott.robinson55@gmail.com> Tested-by:
Hans Verkuil <hans.verkuil@cisco.com> Signed-off-by:
Mauro Carvalho Chehab <m.chehab@samsung.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org> (cherry picked from commit 94146fe3) (cherry picked from commit HEAD) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
-