- 11 Oct, 2014 1 commit
-
-
Sasha Levin authored
Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
-
- 10 Oct, 2014 39 commits
-
-
Andreas Schwab authored
The time spent by a CPU under a given frequency is stored, in jiffies, in the per-CPU variable cpufreq_stats_table->time_in_state[i], i being the index of the frequency. This is what is displayed in the right-hand column of the following file:

  cat /sys/devices/system/cpu/cpuX/cpufreq/stats/time_in_state
  2301000 19835820
  2300000 3172
  [...]

cpufreq converts this jiffies delta to clock_t before returning it to the user, as in the file above, and that conversion is done using the API cputime64_to_clock_t(). Although it accidentally works on traditional tick-based cputime accounting, where cputime_t maps directly to jiffies, it doesn't work with other types of cputime accounting such as CONFIG_VIRT_CPU_ACCOUNTING_*, where cputime_t can map to nsecs or any granularity preferred by the architecture. For example, we get a buggy zero delta on full dynticks configurations:

  cat /sys/devices/system/cpu/cpuX/cpufreq/stats/time_in_state
  2301000 0
  2300000 0
  [...]

Fix this by using the proper jiffies_64 to clock_t conversion. Reported-and-tested-by:
Carsten Emde <C.Emde@osadl.org> Signed-off-by:
Andreas Schwab <schwab@linux-m68k.org> Signed-off-by:
Frederic Weisbecker <fweisbec@gmail.com> Acked-by:
Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by:
Rafael J. Wysocki <rafael.j.wysocki@intel.com> (cherry picked from commit a857c0b9) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
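A minimal sketch of the conversion described above, assuming the per-state counters keep their jiffies units (the helper name is illustrative; only the switch from cputime64_to_clock_t() to jiffies_64_to_clock_t() reflects the fix):

    #include <linux/jiffies.h>
    #include <linux/kernel.h>

    /* Format one "<freq_khz> <residency>" row of time_in_state.  The counter is
     * kept in jiffies, so convert with the jiffies_64 helper; the old
     * cputime64_to_clock_t() is only right when cputime_t happens to be
     * jiffies-based and can return 0 under CONFIG_VIRT_CPU_ACCOUNTING_*. */
    static int format_time_in_state(char *buf, unsigned int freq_khz, u64 jiffies_in_state)
    {
            return sprintf(buf, "%u %llu\n", freq_khz,
                           (unsigned long long)jiffies_64_to_clock_t(jiffies_in_state));
    }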
-
Johannes Berg authored
In testmode and vendor command reply/event SKBs we use the skb cb data to store nl80211 parameters between allocation and sending. This causes the code for CONFIG_NETLINK_MMAP to get confused, because it takes ownership of the skb cb data when the SKB is handed off to netlink, and it doesn't explicitly clear it. Clear the skb cb explicitly when we're done and before it gets passed to netlink to avoid this issue. Cc: stable@vger.kernel.org [this goes way back] Reported-by:
Assaf Azulay <assaf.azulay@intel.com> Reported-by:
David Spinadel <david.spinadel@intel.com> Signed-off-by:
Johannes Berg <johannes.berg@intel.com> (cherry picked from commit bd8c78e7) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
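A minimal sketch of the fix described above (the wrapper name is illustrative; the point is the memset() of skb->cb before netlink takes ownership):

    #include <linux/skbuff.h>
    #include <linux/string.h>
    #include <net/genetlink.h>

    /* nl80211 used skb->cb for its own bookkeeping between allocation and
     * sending; clear it before the skb is handed to netlink so that
     * CONFIG_NETLINK_MMAP doesn't interpret stale data as its own. */
    static int send_testmode_reply(struct sk_buff *skb, struct genl_info *info)
    {
            memset(skb->cb, 0, sizeof(skb->cb));
            return genlmsg_reply(skb, info);
    }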
-
NeilBrown authored
It has come to my attention (thanks Martin) that 'discard_zeroes_data' is only a hint. Some devices in some cases don't do what it says on the label. The use of DISCARD in RAID5 depends on reads from discarded regions being predictably zero. If a write to a previously discarded region performs a read-modify-write cycle, it assumes that the parity block was consistent with the data blocks. If all were zero, this would be the case. If some are and some aren't, it would not be, and this could lead to data corruption after a device failure when data needs to be reconstructed from the parity. As we cannot trust 'discard_zeroes_data', ignore it by default and so disallow DISCARD on all raid4/5/6 arrays. As many devices are trustworthy, and as there are benefits to using DISCARD, add a module parameter to override this caution and allow DISCARD to work if discard_zeroes_data is set. If a site wants to enable DISCARD on some arrays but not on others, it should select DISCARD support at the filesystem level and set the raid456 module parameter:

  raid456.devices_handle_discard_safely=Y

As this is a data-safety issue, I believe this patch is suitable for -stable. DISCARD support for RAID456 was added in 3.7. Cc: Shaohua Li <shli@kernel.org> Cc: "Martin K. Petersen" <martin.petersen@oracle.com> Cc: Mike Snitzer <snitzer@redhat.com> Cc: Heinz Mauelshagen <heinzm@redhat.com> Cc: stable@vger.kernel.org (3.7+) Acked-by:
Martin K. Petersen <martin.petersen@oracle.com> Acked-by:
Mike Snitzer <snitzer@redhat.com> Fixes: 620125f2 Signed-off-by:
NeilBrown <neilb@suse.de> (cherry picked from commit 8e0e99ba) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
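The opt-in knob can be a plain module parameter; a sketch, with only the parameter name taken from the message above:

    #include <linux/module.h>

    /* Off by default: DISCARD stays disabled on raid4/5/6 because
     * discard_zeroes_data is only a hint. */
    static bool devices_handle_discard_safely = false;
    module_param(devices_handle_discard_safely, bool, 0644);
    MODULE_PARM_DESC(devices_handle_discard_safely,
                     "Set to Y if all devices in each array reliably return zeroes on reads from discarded regions");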
-
Hans Verkuil authored
commit 58d75f4b upstream. The recent conversion of saa7134 to vb2 uncovered a poll() bug that broke the teletext applications alevt and mtt. These applications expect that calling poll() without having called VIDIOC_STREAMON will cause poll() to return POLLERR. That did not happen in vb2. This patch fixes that behavior. It also fixes what happens when poll() is called after STREAMON but before any buffers have been queued: in that case poll() will also return POLLERR, but only for capture queues, since output queues will always return POLLOUT anyway in that situation. This brings the vb2 behavior in line with the old videobuf behavior. Signed-off-by:
Hans Verkuil <hans.verkuil@cisco.com> Acked-by:
Laurent Pinchart <laurent.pinchart@ideasonboard.com> Signed-off-by:
Mauro Carvalho Chehab <mchehab@osg.samsung.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org> (cherry picked from commit f5d34b7c) (cherry picked from commit HEAD) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
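A simplified sketch of the poll() rule described above (the helper name is illustrative; the real check sits in the videobuf2 core poll implementation):

    #include <linux/list.h>
    #include <linux/poll.h>
    #include <linux/videodev2.h>
    #include <media/videobuf2-core.h>

    static unsigned int vb2_poll_precheck(struct vb2_queue *q)
    {
            if (!vb2_is_streaming(q))
                    return POLLERR;         /* STREAMON was never called */
            if (!V4L2_TYPE_IS_OUTPUT(q->type) && list_empty(&q->queued_list))
                    return POLLERR;         /* capture queue, nothing queued yet */
            return 0;                       /* output queues still report POLLOUT */
    }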
-
Oleg Nesterov authored
find_lock_task_mm() expects to be called under rcu or tasklist lock, but it seems that at least oom_unkillable_task()->task_in_mem_cgroup() and mem_cgroup_out_of_memory()->oom_badness() can call it locklessly. Perhaps we could fix the callers, but this patch simply adds the rcu lock into find_lock_task_mm(). This also allows us to simplify one of its callers, oom_kill_process(), a bit. Signed-off-by:
Oleg Nesterov <oleg@redhat.com> Cc: Sergey Dyasly <dserrg@gmail.com> Cc: Sameer Nanda <snanda@chromium.org> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Mandeep Singh Baines <msb@chromium.org> Cc: "Ma, Xindong" <xindong.ma@intel.com> Reviewed-by:
Michal Hocko <mhocko@suse.cz> Cc: "Tu, Xiaobing" <xiaobing.tu@intel.com> Acked-by:
David Rientjes <rientjes@google.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org> (cherry picked from commit 4d4048be) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
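The resulting function has roughly this shape (a sketch reconstructed from the description; the rcu_read_lock()/unlock() pair is the change):

    #include <linux/rcupdate.h>
    #include <linux/sched.h>

    struct task_struct *find_lock_task_mm(struct task_struct *p)
    {
            struct task_struct *t;

            rcu_read_lock();                        /* callers may be lockless */
            for_each_thread(p, t) {
                    task_lock(t);
                    if (likely(t->mm))
                            goto found;
                    task_unlock(t);
            }
            t = NULL;
    found:
            rcu_read_unlock();
            return t;
    }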
-
Oleg Nesterov authored
At least out_of_memory() calls has_intersects_mems_allowed() without even rcu_read_lock(); this is obviously buggy. Add the necessary rcu_read_lock(). This means that we cannot simply return from the loop; we need "bool ret" and "break". While at it, swap the names of the task_struct pointers (the argument and the local). This cleans up the code a little bit and avoids the unnecessary initialization. Signed-off-by:
Oleg Nesterov <oleg@redhat.com> Reviewed-by:
Sergey Dyasly <dserrg@gmail.com> Tested-by:
Sergey Dyasly <dserrg@gmail.com> Reviewed-by:
Sameer Nanda <snanda@chromium.org> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Mandeep Singh Baines <msb@chromium.org> Cc: "Ma, Xindong" <xindong.ma@intel.com> Reviewed-by:
Michal Hocko <mhocko@suse.cz> Cc: "Tu, Xiaobing" <xiaobing.tu@intel.com> Acked-by:
David Rientjes <rientjes@google.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org> (cherry picked from commit ad962441) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
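A sketch of the resulting shape (reconstructed from the description; the two intersects helpers are the existing cpuset/mempolicy ones):

    #include <linux/cpuset.h>
    #include <linux/mempolicy.h>
    #include <linux/rcupdate.h>
    #include <linux/sched.h>

    static bool has_intersects_mems_allowed(struct task_struct *start,
                                            const nodemask_t *mask)
    {
            struct task_struct *tsk;
            bool ret = false;

            rcu_read_lock();
            for_each_thread(start, tsk) {
                    ret = mask ? mempolicy_nodemask_intersects(tsk, mask)
                               : cpuset_mems_allowed_intersects(current, tsk);
                    if (ret)
                            break;          /* can't just return: unlock first */
            }
            rcu_read_unlock();
            return ret;
    }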
-
Oleg Nesterov authored
Change oom_kill.c to use for_each_thread() rather than the racy while_each_thread(), which can loop forever if we race with exit. Note also that most users were buggy even where while_each_thread() was fine: the task can exit even _before_ rcu_read_lock(). Fortunately the new for_each_thread() only requires a stable task_struct, so this change fixes both problems. Signed-off-by:
Oleg Nesterov <oleg@redhat.com> Reviewed-by:
Sergey Dyasly <dserrg@gmail.com> Tested-by:
Sergey Dyasly <dserrg@gmail.com> Reviewed-by:
Sameer Nanda <snanda@chromium.org> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Mandeep Singh Baines <msb@chromium.org> Cc: "Ma, Xindong" <xindong.ma@intel.com> Reviewed-by:
Michal Hocko <mhocko@suse.cz> Cc: "Tu, Xiaobing" <xiaobing.tu@intel.com> Acked-by:
David Rientjes <rientjes@google.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org> (cherry picked from commit 1da4db0c) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
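The conversion pattern itself is small; a sketch (the callback parameter stands in for whatever work the real loops do):

    #include <linux/rcupdate.h>
    #include <linux/sched.h>

    static void scan_threads(struct task_struct *g,
                             void (*fn)(struct task_struct *))
    {
            struct task_struct *t;

            /* Before: t = g; do { fn(t); } while_each_thread(g, t);
             * which can loop forever if g exits or execs during the walk. */
            rcu_read_lock();
            for_each_thread(g, t)           /* safe as long as g itself is stable */
                    fn(t);
            rcu_read_unlock();
    }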
-
Oleg Nesterov authored
while_each_thread() and next_thread() should die; almost every lockless usage is wrong.

1. Unless g == current, the lockless while_each_thread() is not safe. while_each_thread(g, t) can loop forever if g exits; next_thread() can't reach the unhashed thread in this case. Note that this can happen even if g is the group leader, since it can exec.

2. Even if while_each_thread() itself was correct, people often use it wrongly. It was never safe to just take rcu_read_lock() and loop unless you verify that pid_alive(g) == T; even the first next_thread() can point to already freed/reused memory.

This patch adds signal_struct->thread_head and task->thread_node to create a normal rcu-safe list with a stable head. The new for_each_thread(g, t) helper is always safe under rcu_read_lock() as long as this task_struct can't go away. Note: of course it is ugly to have both task_struct->thread_node and the old task_struct->thread_group; we will kill the latter later, after we change the users of while_each_thread() to use for_each_thread(). Perhaps we can kill it even before we convert all users; we could reimplement next_thread(t) using the new thread_head/thread_node. But we can't do this right now because it would lead to subtle behavioural changes. For example, do/while_each_thread() always sees at least one task, while for_each_thread() can do nothing if the whole thread group has died. Or thread_group_empty(): currently its semantics are not clear unless thread_group_leader(p), and we need to audit the callers before we can change it. So this patch adds the new interface, which has to coexist with the old one for some time; hopefully the next changes will be more or less straightforward and the old one will go away soon. Signed-off-by:
Oleg Nesterov <oleg@redhat.com> Reviewed-by:
Sergey Dyasly <dserrg@gmail.com> Tested-by:
Sergey Dyasly <dserrg@gmail.com> Reviewed-by:
Sameer Nanda <snanda@chromium.org> Acked-by:
David Rientjes <rientjes@google.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Mandeep Singh Baines <msb@chromium.org> Cc: "Ma, Xindong" <xindong.ma@intel.com> Cc: Michal Hocko <mhocko@suse.cz> Cc: "Tu, Xiaobing" <xiaobing.tu@intel.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org> (cherry picked from commit 0c740d0a) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
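The new helpers are essentially an RCU list walk with a stable head; a sketch of the interface described above:

    /* include/linux/sched.h (sketch): signal_struct gains a thread_head list,
     * each task_struct gains a thread_node linked into it. */
    #define __for_each_thread(signal, t)                                    \
            list_for_each_entry_rcu(t, &(signal)->thread_head, thread_node)

    #define for_each_thread(p, t)                                           \
            __for_each_thread((p)->signal, t)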
-
Oleg Nesterov authored
Cleanup and preparation for the next changes. Move the "if (clone_flags & CLONE_THREAD)" code down under "if (likely(p->pid))" and turn it into the "else" branch. This makes the process/thread initialization more symmetrical and removes one check. Signed-off-by:
Oleg Nesterov <oleg@redhat.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Michal Hocko <mhocko@suse.cz> Cc: Pavel Emelyanov <xemul@parallels.com> Cc: Sergey Dyasly <dserrg@gmail.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org> (cherry picked from commit 80628ca0) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
-
Florian Westphal authored
When loose tracking is enabled (the default), non-SYN packets cause creation of new conntracks in the established state with the default timeout for that state (5 days). This causes the table to fill up with UNREPLIED entries when the 'new ack' packet happened to be the last-ack of a previous, already timed-out connection. Consider:

  A 192.168.x.52792 > 10.184.y.80: F, 426:426(0) ack 9237 win 255
  B 10.184.y.80 > 192.168.x.52792: ., ack 427 win 123
  <61 second pause>
  C 10.184.y.80 > 192.168.x.52792: F, 9237:9237(0) ack 427 win 123
  D 192.168.x.52792 > 10.184.y.80: ., ack 9238 win 255

B moves the conntrack to CLOSE_WAIT and will kill it after the 60 second timeout, C is ignored (FIN set), but the last packet (D) causes a new ct with a 5-day timeout. Use the UNACK timeout (5 minutes) instead to get rid of these entries sooner when in ESTABLISHED state without having seen traffic in both directions. Signed-off-by:
Florian Westphal <fw@strlen.de> Acked-by:
Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> Signed-off-by:
Pablo Neira Ayuso <pablo@netfilter.org> (cherry picked from commit 6547a221) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
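A hedged sketch of the timeout choice described above (not the literal upstream diff; the state and status constants are the existing conntrack ones, the helper name is illustrative):

    #include <linux/netfilter/nf_conntrack_tcp.h>
    #include <net/netfilter/nf_conntrack.h>

    static unsigned int tcp_established_timeout(const struct nf_conn *ct,
                                                enum tcp_conntrack new_state,
                                                const unsigned int *timeouts)
    {
            unsigned int timeout = timeouts[new_state];

            /* Entry picked up mid-stream and not yet assured (i.e. we haven't
             * reliably seen the conversation in both directions): don't grant
             * the 5-day ESTABLISHED timeout, cap it at the UNACK value. */
            if (new_state == TCP_CONNTRACK_ESTABLISHED &&
                !test_bit(IPS_ASSURED_BIT, &ct->status) &&
                timeout > timeouts[TCP_CONNTRACK_UNACK])
                    timeout = timeouts[TCP_CONNTRACK_UNACK];

            return timeout;
    }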
-
Jiri Olsa authored
The commit '2814eb05 perf kmem: Remove die() calls' disabled the 'perf kmem' command on machines without NUMA support: it made the command fail if the '/sys/devices/system/node' directory wasn't found. Skip the NUMA-based initialization when the directory is not found and continue execution instead. Signed-off-by:
Jiri Olsa <jolsa@redhat.com> Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1379003976-5839-5-git-send-email-jolsa@redhat.com Signed-off-by:
Arnaldo Carvalho de Melo <acme@redhat.com> (cherry picked from commit 4921e320) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
-
Felipe Balbi authored
Currently we disable pm_runtime before all register accesses are done; this is dangerous and might lead to abort exceptions because the driver may try to access a register whose clock has long been gated. Fix that by making pm_runtime_put_sync() and pm_runtime_disable() the last thing we do before returning from our ->remove() method. Fixes: 72246da4 (usb: Introduce DesignWare USB3 DRD Driver) Cc: <stable@vger.kernel.org> # v3.2+ Signed-off-by:
Felipe Balbi <balbi@ti.com> (cherry picked from commit fed33afc) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
-
Felipe Balbi authored
Commit 71c731a2 (usb: host: xhci: Fix Compliance Mode on SN65LVP3502CP Hardware) implemented a workaround for a known issue with Texas Instruments' USB 3.0 redriver IC, but it left a condition where any xHCI host would be taken out of reset if a port was placed in compliance mode and there was no device connected to the port. That condition would trigger a fake connection to a non-existent device so that usbcore would issue a warm reset of the port, thus taking the link out of reset. This has the side effect of preventing any xHCI host connected to a Linux machine from starting and running the USB 3.0 Electrical Compliance Suite, because the port will mysteriously be taken out of compliance mode and, thus, xHCI won't step through the necessary compliance patterns for link validation. This patch fixes the issue by adding the missing check for XHCI_COMP_MODE_QUIRK inside xhci_hub_report_usb3_link_state() when PORT_CAS isn't set. This patch should be backported to all kernels containing commit 71c731a2. Fixes: 71c731a2 (usb: host: xhci: Fix Compliance Mode on SN65LVP3502CP Hardware) Cc: Alexis R. Cortes <alexis.cortes@ti.com> Cc: <stable@vger.kernel.org> # v3.2+ Signed-off-by:
Felipe Balbi <balbi@ti.com> Acked-by:
Mathias Nyman <mathias.nyman@linux.intel.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org> (cherry picked from commit 96908589) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
-
Tejun Heo authored
create_singlethread_workqueue() is a compat interface for single-threaded workqueues which maps to an ordered workqueue w/ rescuer in the current implementation. create_singlethread_workqueue() is currently implemented by invoking alloc_workqueue() w/ appropriate parameters. 8719dcea ("workqueue: reject adjusting max_active or applying attrs to ordered workqueues") introduced __WQ_ORDERED to protect ordered workqueues against dynamic attribute changes which can break ordering guarantees, but forgot to apply it to create_singlethread_workqueue(). This in itself is okay as nobody currently uses dynamic attribute changes on workqueues created with create_singlethread_workqueue(). However, 4c16bd32 ("workqueue: implement NUMA affinity for unbound workqueues") broke the single-threaded guarantee for ordered workqueues by allocating a separate pool_workqueue on each NUMA node by default. A later change, 8a2b7538 ("workqueue: fix ordered workqueues in NUMA setups"), fixed it by allocating only one global pool_workqueue if __WQ_ORDERED is set. Combined, the __WQ_ORDERED omission in create_singlethread_workqueue() became critical, breaking its single-threadedness and ordering guarantees. Let's make create_singlethread_workqueue() wrap alloc_ordered_workqueue() instead so that it inherits __WQ_ORDERED and implicitly tracks future ordered-workqueue changes. v2: I missed that __WQ_ORDERED now protects against pwq splitting across NUMA nodes and incorrectly described the patch as a nice-to-have fix to protect against future dynamic attribute usages. Oleg pointed out that this is actually a critical breakage due to 8a2b7538 ("workqueue: fix ordered workqueues in NUMA setups"). Signed-off-by:
Tejun Heo <tj@kernel.org> Reported-by:
Mike Anderson <mike.anderson@us.ibm.com> Cc: Oleg Nesterov <onestero@redhat.com> Cc: Gustavo Luiz Duarte <gduarte@redhat.com> Cc: Tomas Henzl <thenzl@redhat.com> Cc: stable@vger.kernel.org Fixes: 4c16bd32 ("workqueue: implement NUMA affinity for unbound workqueues") (cherry picked from commit e09c2c29) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
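The end result is essentially a one-line wrapper; a sketch (flag spelling may differ slightly across kernel versions):

    /* include/linux/workqueue.h (sketch): route the compat interface through
     * alloc_ordered_workqueue() so it picks up __WQ_ORDERED (and keeps the
     * rescuer), and automatically follows future ordered-workqueue fixes. */
    #define create_singlethread_workqueue(name)                             \
            alloc_ordered_workqueue("%s", WQ_MEM_RECLAIM, name)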
-
Emmanuel Grumbach authored
This reverts commit 43d826ca. This commit caused packet loss. Cc: <stable@vger.kernel.org> Signed-off-by:
Emmanuel Grumbach <emmanuel.grumbach@intel.com> (cherry picked from commit f47f46d7) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
-
Dave Martin authored
Copying a function with memcpy() and then trying to execute the result isn't trivially portable to Thumb. This patch modifies the kexec soft restart code to copy its assembler trampoline relocate_new_kernel() using fncpy() instead, so that relocate_new_kernel can be in the same ISA as the rest of the kernel without problems. Signed-off-by:
Dave Martin <Dave.Martin@arm.com> Acked-by:
Will Deacon <will.deacon@arm.com> Reported-by:
Taras Kondratiuk <taras.kondratiuk@linaro.org> Tested-by:
Taras Kondratiuk <taras.kondratiuk@linaro.org> Signed-off-by:
Russell King <rmk+kernel@arm.linux.org.uk> (cherry picked from commit e2ccba49) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
-
Shen Guang authored
When doing compliance testing with xHCI, we found that if we enable CONFIG_USB_SUSPEND and plug in a bad device which causes an over-current condition on the root port, software will not be notified. The reason is that the current code doesn't set hub->change_bits in hub_activate() when over-current happens, so hub_events() will not check the port status because it thinks nothing changed. If CONFIG_USB_SUSPEND is disabled, the interrupt pipe of the hub will report the change and set hub->event_bits, and then hub_events() will check what events happened. In that case over-current can be detected. Signed-off-by:
Shen Guang <shenguang10@gmail.com> Acked-by:
Alan Stern <stern@rowland.harvard.edu> Acked-by:
Sarah Sharp <sarah.a.sharp@linux.intel.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org> (cherry picked from commit 08d1dec6) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
-
Mathias Nyman authored
Resuming from hibernate (S4) will restart and re-initialize the xHC. The device contexts are freed and will be re-allocated later during device reset. USB core will disable link PM in device resume before device reset, which will try to change the max exit latency, accessing the device contexts before they are re-allocated. There is no need to zero (disable) the max exit latency when disabling hw LPM for a freshly re-initialized xHC, so check that the device context exists before doing anything. The max exit latency will be set again after device reset when USB core enables link PM. Reported-by:
Imre Deak <imre.deak@intel.com> Tested-by:
Imre Deak <imre.deak@intel.com> Cc: stable <stable@vger.kernel.org> Signed-off-by:
Mathias Nyman <mathias.nyman@linux.intel.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org> (cherry picked from commit 96044694) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
-
Benjamin Tissoires authored
Commit "HID: logitech: perform bounds checking on device_id early enough" unfortunately leaks some errors to dmesg which are not real ones: - if the report is not a DJ one, then there is not point in checking the device_id - the receiver (index 0) can also receive some notifications which can be safely ignored given the current implementation Move out the test regarding the report_id and also discards printing errors when the receiver got notified. Fixes: ad3e14d7 Cc: stable@vger.kernel.org Reported-and-tested-by:
Markus Trippelsdorf <markus@trippelsdorf.de> Signed-off-by:
Benjamin Tissoires <benjamin.tissoires@redhat.com> Signed-off-by:
Jiri Kosina <jkosina@suse.cz> (cherry picked from commit 5abfe85c) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
-
Mel Gorman authored
When kswapd is awake reclaiming, the per-cpu stat thresholds are lowered to get more accurate counts to avoid breaching watermarks. This threshold update iterates over all possible CPUs which is unnecessary. Only online CPUs need to be updated. If a new CPU is onlined, refresh_zone_stat_thresholds() will set the thresholds correctly. Signed-off-by:
Mel Gorman <mgorman@suse.de> Acked-by:
Johannes Weiner <hannes@cmpxchg.org> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org> (cherry picked from commit bb0b6dff) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
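A sketch of the loop change (the surrounding function approximates the per-zone threshold updater in mm/vmstat.c; names are illustrative):

    #include <linux/cpumask.h>
    #include <linux/mmzone.h>
    #include <linux/percpu.h>

    static void set_zone_stat_thresholds(struct zone *zone, int threshold)
    {
            int cpu;

            /* was: for_each_possible_cpu(cpu) -- offline CPUs don't need the
             * value, and refresh_zone_stat_thresholds() handles CPUs that
             * come online later. */
            for_each_online_cpu(cpu)
                    per_cpu_ptr(zone->pageset, cpu)->stat_threshold = threshold;
    }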
-
Hugh Dickins authored
Use ACCESS_ONCE() in handle_pte_fault() when getting the entry or orig_pte upon which all subsequent decisions and pte_same() tests will be made. I have no evidence that its lack is responsible for the mm/filemap.c:202 BUG_ON(page_mapped(page)) in __delete_from_page_cache() found by trinity, and I am not optimistic that it will fix it. But I have found no other explanation, and ACCESS_ONCE() here will surely not hurt. If gcc does re-access the pte before passing it down, then that would be disastrous for correct page fault handling, and certainly could explain the page_mapped() BUGs seen (concurrent fault causing page to be mapped in a second time on top of itself: mapcount 2 for a single pte). Signed-off-by:
Hugh Dickins <hughd@google.com> Cc: Sasha Levin <sasha.levin@oracle.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Konstantin Khlebnikov <koct9i@gmail.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org> (cherry picked from commit c0d73261) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
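The change itself is tiny; a sketch of the snapshot:

    #include <linux/compiler.h>
    #include <linux/mm.h>

    /* handle_pte_fault() (sketch): load the pte exactly once so the compiler
     * cannot legally re-read *pte after decisions and pte_same() comparisons
     * have been made against the first value. */
    static pte_t snapshot_pte(pte_t *pte)
    {
            return ACCESS_ONCE(*pte);
    }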
-
Hugh Dickins authored
Under shmem swapping load, I sometimes hit the VM_BUG_ON_PAGE(!PageLRU) in isolate_lru_pages() at mm/vmscan.c:1281! Commit 2457aec6 ("mm: non-atomically mark page accessed during page cache allocation where possible") looks like interrupted work-in-progress. mm/filemap.c's call to init_page_accessed() is fine, but not mm/shmem.c's - shmem_write_begin() is clearly wrong to use it after shmem_getpage(), when the page is always visible in radix_tree, and often already on LRU. Revert change to shmem_write_begin(), and use init_page_accessed() or mark_page_accessed() appropriately for SGP_WRITE in shmem_getpage_gfp(). SGP_WRITE also covers shmem_symlink(), which did not mark_page_accessed() before; but since many other filesystems use [__]page_symlink(), which did and does mark the page accessed, consider this as rectifying an oversight. Signed-off-by:
Hugh Dickins <hughd@google.com> Acked-by:
Mel Gorman <mgorman@suse.de> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Michal Hocko <mhocko@suse.cz> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Prabhakar Lad <prabhakar.csengg@gmail.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org> (cherry picked from commit 66d2f4d2) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
-
Mel Gorman authored
If a page is marked for immediate reclaim then it is moved to the tail of the LRU list. This occurs when the system is under enough memory pressure for pages under writeback to reach the end of the LRU, but we test for this using atomic operations on every writeback. This patch uses an optimistic non-atomic test first. It'll miss some pages in rare cases but the consequences are not severe enough to warrant such a penalty. While the function does not dominate profiles during a simple dd test, its cost is reduced:

  73048  0.7428  vmlinux-3.15.0-rc5-mmotm-20140513  end_page_writeback
  23740  0.2409  vmlinux-3.15.0-rc5-lessatomic      end_page_writeback

Signed-off-by:
Mel Gorman <mgorman@suse.de> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org> (cherry picked from commit 888cf2db) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
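The test-then-clear pattern, in isolation (a sketch; the real change sits inside end_page_writeback() in mm/filemap.c):

    #include <linux/page-flags.h>
    #include <linux/swap.h>

    static void writeback_done_handle_reclaim(struct page *page)
    {
            /* Cheap non-atomic test first; the atomic clear and rotation are
             * only paid for when PG_reclaim is actually set.  A rarely missed
             * page is acceptable, as the changelog above argues. */
            if (PageReclaim(page)) {
                    ClearPageReclaim(page);
                    rotate_reclaimable_page(page);
            }
    }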
-
Mel Gorman authored
Discarding buffers uses a bunch of atomic operations when discarding buffers because ...... I can't think of a reason. Use a cmpxchg loop to clear all the necessary flags. In most (all?) cases this will be a single atomic operation. [akpm@linux-foundation.org: move BUFFER_FLAGS_DISCARD into the .c file] Signed-off-by:
Mel Gorman <mgorman@suse.de> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Jan Kara <jack@suse.cz> Cc: Michal Hocko <mhocko@suse.cz> Cc: Hugh Dickins <hughd@google.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Theodore Ts'o <tytso@mit.edu> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Rik van Riel <riel@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org> (cherry picked from commit e7470ee8) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
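A sketch of the cmpxchg loop (the discard mask is abridged; per the [akpm] note, the full BUFFER_FLAGS_DISCARD definition lives in fs/buffer.c):

    #include <linux/buffer_head.h>

    /* Flags that stop being meaningful once the buffer is discarded (abridged). */
    #define BUFFER_FLAGS_DISCARD \
            (1UL << BH_Mapped | 1UL << BH_New | 1UL << BH_Req | \
             1UL << BH_Delay | 1UL << BH_Dirty | 1UL << BH_Uptodate)

    static void clear_discard_flags(struct buffer_head *bh)
    {
            unsigned long b_state, old;

            b_state = bh->b_state;
            for (;;) {
                    /* One atomic op in the common, uncontended case. */
                    old = cmpxchg(&bh->b_state, b_state,
                                  b_state & ~BUFFER_FLAGS_DISCARD);
                    if (old == b_state)
                            break;
                    b_state = old;
            }
    }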
-
Mel Gorman authored
shmem_getpage_gfp uses an atomic operation to set the SwapBacked field before the page is even added to the LRU or visible. This is unnecessary, as what could it possibly race against? Use an unlocked variant. Signed-off-by:
Mel Gorman <mgorman@suse.de> Acked-by:
Johannes Weiner <hannes@cmpxchg.org> Acked-by:
Rik van Riel <riel@redhat.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Jan Kara <jack@suse.cz> Cc: Michal Hocko <mhocko@suse.cz> Cc: Hugh Dickins <hughd@google.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Theodore Ts'o <tytso@mit.edu> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org> (cherry picked from commit 07a42788) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
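The switch to the non-atomic setter, as a sketch (the page here is newly allocated and not yet visible anywhere):

    #include <linux/page-flags.h>
    #include <linux/pagemap.h>

    static void shmem_prep_new_page(struct page *page)
    {
            /* Not yet on the LRU and not yet in the radix tree, so nothing can
             * observe or race with these flag updates. */
            __SetPageSwapBacked(page);      /* was: SetPageSwapBacked(page) */
            __set_page_locked(page);
    }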
-
Mel Gorman authored
Currently it's calculated once per zone in the zonelist. Signed-off-by:
Mel Gorman <mgorman@suse.de> Acked-by:
Johannes Weiner <hannes@cmpxchg.org> Reviewed-by:
Rik van Riel <riel@redhat.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Jan Kara <jack@suse.cz> Cc: Michal Hocko <mhocko@suse.cz> Cc: Hugh Dickins <hughd@google.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Theodore Ts'o <tytso@mit.edu> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org> (cherry picked from commit a6e21b14) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
-
Mel Gorman authored
A node/zone index is used to check if pages are compatible for merging but this happens unconditionally even if the buddy page is not free. Defer the calculation as long as possible. Ideally we would check the zone boundary but nodes can overlap. Signed-off-by:
Mel Gorman <mgorman@suse.de> Acked-by:
Johannes Weiner <hannes@cmpxchg.org> Acked-by:
Rik van Riel <riel@redhat.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Jan Kara <jack@suse.cz> Cc: Michal Hocko <mhocko@suse.cz> Cc: Hugh Dickins <hughd@google.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Theodore Ts'o <tytso@mit.edu> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org> (cherry picked from commit d34c5fa0) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
-
Mel Gorman authored
If a zone cannot be used for a dirty page then it gets marked "full" which is cached in the zlc and later potentially skipped by allocation requests that have nothing to do with dirty zones. Signed-off-by:
Mel Gorman <mgorman@suse.de> Acked-by:
Johannes Weiner <hannes@cmpxchg.org> Reviewed-by:
Rik van Riel <riel@redhat.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org> (cherry picked from commit 800a1e75) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
-
Mel Gorman authored
The zlc is used on NUMA machines to quickly skip over zones that are full. However it is always updated, even for the first zone scanned, when the zlc might not even be active. As it's a write to a bitmap that potentially bounces a cache line, it's deceptively expensive, even though most machines will not care. Only update the zlc if it was active. Signed-off-by:
Mel Gorman <mgorman@suse.de> Acked-by:
Johannes Weiner <hannes@cmpxchg.org> Reviewed-by:
Rik van Riel <riel@redhat.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org> (cherry picked from commit 65bb3719) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
-
Shaohua Li authored
We use the accessed bit to age a page at page reclaim time, and currently we also flush the TLB when doing so. But in some workloads TLB flush overhead is very heavy. In my simple multithreaded app with a lot of swap to several pcie SSDs, removing the tlb flush gives about 20% ~ 30% swapout speedup. Fortunately just removing the TLB flush is a valid optimization: on x86 CPUs, clearing the accessed bit without a TLB flush doesn't cause data corruption. It could cause incorrect page aging and the (mistaken) reclaim of hot pages, but the chance of that should be relatively low. So as a performance optimization don't flush the TLB when clearing the accessed bit, it will eventually be flushed by a context switch or a VM operation anyway. [ In the rare event of it not getting flushed for a long time the delay shouldn't really matter because there's no real memory pressure for swapout to react to. ] Suggested-by:
Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by:
Shaohua Li <shli@fusionio.com> Acked-by:
Rik van Riel <riel@redhat.com> Acked-by:
Mel Gorman <mgorman@suse.de> Acked-by:
Hugh Dickins <hughd@google.com> Acked-by:
Johannes Weiner <hannes@cmpxchg.org> Cc: linux-mm@kvack.org Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/20140408075809.GA1764@kernel.org [ Rewrote the changelog and the code comments. ] Signed-off-by:
Ingo Molnar <mingo@kernel.org> (cherry picked from commit b13b1d2d) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
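On x86 the primitive ends up as little more than the following (a sketch reconstructed from the description):

    #include <linux/mm.h>
    #include <asm/pgtable.h>

    /* Clearing the accessed bit for page aging no longer flushes the TLB: a
     * stale "young" bit can only cause slightly inaccurate aging, never data
     * corruption, and the TLB will be flushed by a context switch or another
     * VM operation eventually anyway. */
    int ptep_clear_flush_young(struct vm_area_struct *vma,
                               unsigned long address, pte_t *ptep)
    {
            return ptep_test_and_clear_young(vma, address, ptep);
    }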
-
David Rientjes authored
Async compaction terminates prematurely when need_resched(), see compact_checklock_irqsave(). This can never trigger, however, if the cond_resched() in isolate_migratepages_range() always takes care of the scheduling. If the cond_resched() actually triggers, then terminate this pageblock scan for async compaction as well. Signed-off-by:
David Rientjes <rientjes@google.com> Acked-by:
Mel Gorman <mgorman@suse.de> Acked-by:
Vlastimil Babka <vbabka@suse.cz> Cc: Mel Gorman <mgorman@suse.de> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org> (cherry picked from commit aeef4b83) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
-
Heesub Shin authored
Remove code lines currently not in use or never called. Signed-off-by:
Heesub Shin <heesub.shin@samsung.com> Acked-by:
Vlastimil Babka <vbabka@suse.cz> Cc: Dongjun Shin <d.j.shin@samsung.com> Cc: Sunghwan Yun <sunghwan.yun@samsung.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Mel Gorman <mgorman@suse.de> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com> Cc: Michal Nazarewicz <mina86@mina86.com> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Christoph Lameter <cl@linux.com> Cc: Rik van Riel <riel@redhat.com> Cc: Dongjun Shin <d.j.shin@samsung.com> Cc: Sunghwan Yun <sunghwan.yun@samsung.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org> (cherry picked from commit 13fb44e4) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
-
Fabian Frederick authored
Commit f9acc8c7 ("readahead: sanify file_ra_state names") left ra_submit with a single function call. Move ra_submit to internal.h and inline it to save some stack. Thanks to Andrew Morton for commenting different versions. Signed-off-by:
Fabian Frederick <fabf@skynet.be> Suggested-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org> (cherry picked from commit 29f175d1) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
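After the move, the helper is a small static inline, roughly (a sketch; the argument order follows the readahead code):

    /* mm/internal.h (sketch) */
    static inline unsigned long ra_submit(struct file_ra_state *ra,
                                          struct address_space *mapping,
                                          struct file *filp)
    {
            return __do_page_cache_readahead(mapping, filp,
                                             ra->start, ra->size, ra->async_size);
    }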
-
Al Viro authored
... it does that itself (via kmap_atomic()) Signed-off-by:
Al Viro <viro@zeniv.linux.org.uk> (cherry picked from commit 9e8c2af9) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
-
Sasha Levin authored
This patch removes read_cache_page_async(), which wasn't really needed anywhere, and simplifies the code around it a bit. read_cache_page_async() is useful when we want to read a page into the cache without waiting for the read to complete. This happens when the appropriate callback 'filler' doesn't complete its read operation and releases the page lock immediately, and instead queues a different completion routine to do that. This never actually happened anywhere in the code. read_cache_page_async() had 3 different callers:

- read_cache_page(), which is the sync version; it would just wait for the requested read to complete using wait_on_page_read().
- JFFS2 would call it from jffs2_gc_fetch_page(), but the filler function it supplied doesn't do any async reads and would complete before the filler function returns - making it actually a sync read.
- CRAMFS would call it using the read_mapping_page_async() wrapper, with a similar story to JFFS2 - the filler function doesn't do anything resembling async reads and would always complete before the filler function returns.

To sum it up, the code in mm/filemap.c never took advantage of having read_cache_page_async(). While there are filler callbacks that do async reads (such as the block one), we always called them via read_cache_page(). This patch adds a mandatory wait for the read to complete when adding a new page to the cache, and removes read_cache_page_async() and its wrappers. Signed-off-by:
Sasha Levin <sasha.levin@oracle.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org> (cherry picked from commit 67f9fd91) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
-
Johannes Weiner authored
The radix tree hole searching code is only used for page cache, for example the readahead code trying to get a picture of the area surrounding a fault. It sufficed to rely on the radix tree definition of holes, which is "empty tree slot". But this is about to change, as shadow page descriptors will be stored in the page cache after the actual pages get evicted from memory. Move the functions over to mm/filemap.c and make them native page cache operations, where they can later be adapted to handle the new definition of "page cache hole". Signed-off-by:
Johannes Weiner <hannes@cmpxchg.org> Reviewed-by:
Rik van Riel <riel@redhat.com> Reviewed-by:
Minchan Kim <minchan@kernel.org> Acked-by:
Mel Gorman <mgorman@suse.de> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Bob Liu <bob.liu@oracle.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Dave Chinner <david@fromorbit.com> Cc: Greg Thelen <gthelen@google.com> Cc: Hugh Dickins <hughd@google.com> Cc: Jan Kara <jack@suse.cz> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Luigi Semenzato <semenzato@google.com> Cc: Metin Doslu <metin@citusdata.com> Cc: Michel Lespinasse <walken@google.com> Cc: Ozgun Erdogan <ozgun@citusdata.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Roman Gushchin <klamm@yandex-team.ru> Cc: Ryan Mallon <rmallon@gmail.com> Cc: Tejun Heo <tj@kernel.org> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org> (cherry picked from commit e7b563bb) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
-
Johannes Weiner authored
Page cache radix tree slots are usually stabilized by the page lock, but shmem's swap cookies have no such thing. Because the overall truncation loop is lockless, the swap entry is currently confirmed by a tree lookup and then deleted by another tree lookup under the same tree lock region. Use radix_tree_delete_item() instead, which does the verification and deletion with only one lookup. This also allows removing the delete-only special case from shmem_radix_tree_replace(). Signed-off-by:
Johannes Weiner <hannes@cmpxchg.org> Reviewed-by:
Minchan Kim <minchan@kernel.org> Reviewed-by:
Rik van Riel <riel@redhat.com> Acked-by:
Mel Gorman <mgorman@suse.de> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Bob Liu <bob.liu@oracle.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Dave Chinner <david@fromorbit.com> Cc: Greg Thelen <gthelen@google.com> Cc: Hugh Dickins <hughd@google.com> Cc: Jan Kara <jack@suse.cz> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Luigi Semenzato <semenzato@google.com> Cc: Metin Doslu <metin@citusdata.com> Cc: Michel Lespinasse <walken@google.com> Cc: Ozgun Erdogan <ozgun@citusdata.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Roman Gushchin <klamm@yandex-team.ru> Cc: Ryan Mallon <rmallon@gmail.com> Cc: Tejun Heo <tj@kernel.org> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org> (cherry picked from commit 6dbaf22c) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
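With the new primitive, the verify-and-delete collapses into one locked lookup; a sketch of the resulting shmem_free_swap():

    #include <linux/fs.h>
    #include <linux/radix-tree.h>
    #include <linux/swap.h>
    #include <linux/swapops.h>

    static int shmem_free_swap(struct address_space *mapping,
                               pgoff_t index, void *radswap)
    {
            void *old;

            spin_lock_irq(&mapping->tree_lock);
            old = radix_tree_delete_item(&mapping->page_tree, index, radswap);
            spin_unlock_irq(&mapping->tree_lock);
            if (old != radswap)
                    return -ENOENT;         /* someone else got there first */
            free_swap_and_cache(radix_to_swp_entry(radswap));
            return 0;
    }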
-
Johannes Weiner authored
Provide a function that does not just delete an entry at a given index, but also allows passing in an expected item. Delete only if that item is still located at the specified index. This is handy when lockless tree traversals want to delete entries as well, because they don't have to do a second, locked lookup to verify that the slot has not changed under them before deleting the entry. Signed-off-by:
Johannes Weiner <hannes@cmpxchg.org> Reviewed-by:
Minchan Kim <minchan@kernel.org> Reviewed-by:
Rik van Riel <riel@redhat.com> Acked-by:
Mel Gorman <mgorman@suse.de> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Bob Liu <bob.liu@oracle.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Dave Chinner <david@fromorbit.com> Cc: Greg Thelen <gthelen@google.com> Cc: Hugh Dickins <hughd@google.com> Cc: Jan Kara <jack@suse.cz> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Luigi Semenzato <semenzato@google.com> Cc: Metin Doslu <metin@citusdata.com> Cc: Michel Lespinasse <walken@google.com> Cc: Ozgun Erdogan <ozgun@citusdata.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Roman Gushchin <klamm@yandex-team.ru> Cc: Ryan Mallon <rmallon@gmail.com> Cc: Tejun Heo <tj@kernel.org> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org> (cherry picked from commit 53c59f26) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
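The new call and its relationship to the existing delete, as a sketch of the interface described above:

    #include <linux/radix-tree.h>

    /* Delete the entry at @index only if it is still @item; returns the deleted
     * entry, or NULL if nothing matched.  Lockless walkers can thus verify and
     * delete with a single locked lookup. */
    void *radix_tree_delete_item(struct radix_tree_root *root,
                                 unsigned long index, void *item);

    /* The unconditional delete becomes a thin wrapper over it: */
    void *radix_tree_delete(struct radix_tree_root *root, unsigned long index)
    {
            return radix_tree_delete_item(root, index, NULL);
    }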
-
David Rientjes authored
The conditions that control the isolation mode in isolate_migratepages_range() do not change during the iteration, so extract them out and only compute the value once. This actually does have an effect: gcc doesn't optimize it itself because of cc->sync. Signed-off-by:
David Rientjes <rientjes@google.com> Cc: Mel Gorman <mgorman@suse.de> Acked-by:
Rik van Riel <riel@redhat.com> Acked-by:
Vlastimil Babka <vbabka@suse.cz> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org> (cherry picked from commit da1c67a7) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
-