- 16 Feb, 2011 1 commit
-
-
Ingo Molnar authored
Merge reason: the topic is ready - consolidate it into the more generic x86/mm tree and prevent conflicts. Signed-off-by: Ingo Molnar <mingo@elte.hu>
-
- 14 Feb, 2011 8 commits
-
-
Kamal Mostafa authored
Emit warning when "mem=nopentium" is specified on any arch other than x86_32 (the only that arch supports it). Signed-off-by: Kamal Mostafa <kamal@canonical.com> BugLink: http://bugs.launchpad.net/bugs/553464 Cc: Yinghai Lu <yinghai@kernel.org> Cc: Len Brown <len.brown@intel.com> Cc: Rafael J. Wysocki <rjw@sisk.pl> LKML-Reference: <1296783486-23033-2-git-send-email-kamal@canonical.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Cc: <stable@kernel.org>
-
Kamal Mostafa authored
Avoid removing all of memory and panicing when "mem={invalid}" is specified, e.g. mem=blahblah, mem=0, or mem=nopentium (on platforms other than x86_32). Signed-off-by: Kamal Mostafa <kamal@canonical.com> BugLink: http://bugs.launchpad.net/bugs/553464 Cc: Yinghai Lu <yinghai@kernel.org> Cc: Len Brown <len.brown@intel.com> Cc: Rafael J. Wysocki <rjw@sisk.pl> Cc: <stable@kernel.org> # .3x: as far back as it applies LKML-Reference: <1296783486-23033-1-git-send-email-kamal@canonical.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
-
Shaohua Li authored
This one isn't related to previous patch. If online cpus are below NUM_INVALIDATE_TLB_VECTORS, we don't need the lock. The comments in the code declares we don't need the check, but a hot lock still needs an atomic operation and expensive, so add the check here. Uses nr_cpu_ids here as suggested by Eric Dumazet. Signed-off-by: Shaohua Li <shaohua.li@intel.com> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Cc: Andi Kleen <andi@firstfloor.org> LKML-Reference: <1295232730.1949.710.camel@sli10-conroe> Signed-off-by: Ingo Molnar <mingo@elte.hu>
-
Shaohua Li authored
Make the maxium TLB invalidate vectors depend on NR_CPUS linearly, with a maximum of 32 vectors. We currently only have 8 vectors for TLB invalidation and that is clearly inadequate. If we have a lot of CPUs, the CPUs need share the 8 vectors and tlbstate_lock is used to protect them. flush_tlb_page() is heavily used in page reclaim, which will cause a lot of lock contention for tlbstate_lock. Andi Kleen suggested increasing the vectors number to 32, which should be good for current typical systems to reduce the tlbstate_lock contention. My test system has 4 sockets and 64G memory, and 64 CPUs. My workload creates 64 processes. Each process mmap reads a big empty sparse file. The total size of the files are 2*total_mem, so this will cause a lot of page reclaim. Below is the result I get from perf call-graph profiling: without the patch: ------------------ 24.25% usemem [kernel] [k] _raw_spin_lock | --- _raw_spin_lock | |--42.15%-- native_flush_tlb_others with the patch: ------------------ 14.96% usemem [kernel] [k] _raw_spin_lock | --- _raw_spin_lock |--13.89%-- native_flush_tlb_others So this heavily reduces the tlbstate_lock contention. Suggested-by: Andi Kleen <andi@firstfloor.org> Signed-off-by: Shaohua Li <shaohua.li@intel.com> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <1295232727.1949.709.camel@sli10-conroe> Signed-off-by: Ingo Molnar <mingo@elte.hu>
-
Shaohua Li authored
Add up to 32 invalidate_interrupt handlers. How many handlers are added depends on NUM_INVALIDATE_TLB_VECTORS. So if NUM_INVALIDATE_TLB_VECTORS is smaller than 32, we reduce code size. Signed-off-by: Shaohua Li <shaohua.li@intel.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Eric Dumazet <eric.dumazet@gmail.com> LKML-Reference: <1295232725.1949.708.camel@sli10-conroe> Signed-off-by: Ingo Molnar <mingo@elte.hu>
-
Shaohua Li authored
Cleanup the vector usage and make them continuous if possible. Signed-off-by: Shaohua Li <shaohua.li@intel.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Eric Dumazet <eric.dumazet@gmail.com> LKML-Reference: <1295232722.1949.707.camel@sli10-conroe> Signed-off-by: Ingo Molnar <mingo@elte.hu>
-
Ingo Molnar authored
Conflicts: arch/x86/mm/numa_64.c Merge reason: fix the conflict, update to latest -rc and pick up this dependent fix from Yinghai: e6d2e2b2: memblock: don't adjust size in memblock_find_base() Signed-off-by: Ingo Molnar <mingo@elte.hu>
-
David Miller authored
Commit c0e69a5b ("klist.c: bit 0 in pointer can't be used as flag") intended to make sure that all klist objects were at least pointer size aligned, but used the constant "4" which only works on 32-bit. Use "sizeof(void *)" which is correct in all cases. Signed-off-by: David S. Miller <davem@davemloft.net> Acked-by: Jesper Nilsson <jesper.nilsson@axis.com> Cc: stable <stable@kernel.org> Cc: Greg Kroah-Hartman <gregkh@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
- 13 Feb, 2011 6 commits
-
-
git://git.secretlab.ca/git/linux-2.6Linus Torvalds authored
* 'spi/merge' of git://git.secretlab.ca/git/linux-2.6: devicetree-discuss is moderated for non-subscribers MAINTAINERS: Add entry for GPIO subsystem dt: add documentation of ARM dt boot interface dt: Remove obsolete description of powerpc boot interface dt: Move device tree documentation out of powerpc directory spi/spi_sh_msiof: fix wrong address calculation, which leads to an Oops
-
git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6Linus Torvalds authored
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6: ALSA: hda - add quirk for Ordissimo EVE using a realtek ALC662 ALSA: hrtimer: remove superfluous tasklet invocation ALSA: hrtimer: handle delayed timer interrupts ALSA: HDA: Add subwoofer quirk for Acer Aspire 8942G ALSA: hda - Don't handle empty patch files ALSA: hda - Fix missing CA initialization for HDMI/DP ALSA: usbaudio - Enable the E-MU 0204 USB ALSA: hda - switch lfe with side in mixer for 4930g ASoC: Improve WM8994 digital power sequencing ASoC: Create an AIF1ADCDAT signal widget to match AIF2 asoc: davinci: da830/omap-l137: correct cpu_dai_name ASoC: fill in snd_soc_pcm_runtime.card before calling snd_soc_dai_link.init()
-
Linus Torvalds authored
This reverts commit 47970b1b. It turns out it breaks several distributions. Looks like the stricter selinux checks fail due to selinux policies not being set to allow the access - breaking X, but also lspci. So while the change was clearly the RightThing(tm) to do in theory, in practice we have backwards compatibility issues making it not work. Reported-by: Dave Young <hidave.darkstar@gmail.com> Acked-by: David Airlie <airlied@linux.ie> Acked-by: Alex Riesen <raa.lkml@gmail.com> Cc: Eric Paris <eparis@redhat.com> Cc: Chris Wright <chrisw@sous-sol.org> Cc: James Morris <jmorris@namei.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Takashi Iwai authored
-
Grant Likely authored
-
Paul Bolle authored
Signed-off-by: Paul Bolle <pebolle@tiscali.nl> Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
-
- 12 Feb, 2011 25 commits
-
-
Grant Likely authored
I'll probably regret this.... Signed-off-by: Grant Likely <grant.likely@secretlab.ca> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4Linus Torvalds authored
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: jbd2: call __jbd2_log_start_commit with j_state_lock write locked ext4: serialize unaligned asynchronous DIO ext4: make grpinfo slab cache names static ext4: Fix data corruption with multi-block writepages support ext4: fix up ext4 error handling ext4: unregister features interface on module unload ext4: fix panic on module unload when stopping lazyinit thread
-
Theodore Ts'o authored
On an SMP ARM system running ext4, I've received a report that the first J_ASSERT in jbd2_journal_commit_transaction has been triggering: J_ASSERT(journal->j_running_transaction != NULL); While investigating possible causes for this problem, I noticed that __jbd2_log_start_commit() is getting called with j_state_lock only read-locked, in spite of the fact that it's possible for it might j_commit_request. Fix this by grabbing the necessary information so we can test to see if we need to start a new transaction before dropping the read lock, and then calling jbd2_log_start_commit() which will grab the write lock. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
-
Eric Sandeen authored
ext4 has a data corruption case when doing non-block-aligned asynchronous direct IO into a sparse file, as demonstrated by xfstest 240. The root cause is that while ext4 preallocates space in the hole, mappings of that space still look "new" and dio_zero_block() will zero out the unwritten portions. When more than one AIO thread is going, they both find this "new" block and race to zero out their portion; this is uncoordinated and causes data corruption. Dave Chinner fixed this for xfs by simply serializing all unaligned asynchronous direct IO. I've done the same here. The difference is that we only wait on conversions, not all IO. This is a very big hammer, and I'm not very pleased with stuffing this into ext4_file_write(). But since ext4 is DIO_LOCKING, we need to serialize it at this high level. I tried to move this into ext4_ext_direct_IO, but by then we have the i_mutex already, and we will wait on the work queue to do conversions - which must also take the i_mutex. So that won't work. This was originally exposed by qemu-kvm installing to a raw disk image with a normal sector-63 alignment. I've tested a backport of this patch with qemu, and it does avoid the corruption. It is also quite a lot slower (14 min for package installs, vs. 8 min for well-aligned) but I'll take slow correctness over fast corruption any day. Mingming suggested that we can track outstanding conversions, and wait on those so that non-sparse files won't be affected, and I've implemented that here; unaligned AIO to nonsparse files won't take a perf hit. [tytso@mit.edu: Keep the mutex as a hashed array instead of bloating the ext4 inode] [tytso@mit.edu: Fix up namespace issues so that global variables are protected with an "ext4_" prefix.] Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
-
Eric Sandeen authored
In 2.6.37 I was running into oopses with repeated module loads & unloads. I tracked this down to: fb1813f4 ext4: use dedicated slab caches for group_info structures (this was in addition to the features advert unload problem) The kstrdup & subsequent kfree of the cache name was causing a double free. In slub, at least, if I read it right it allocates & frees the name itself, slab seems to do something different... so in slub I think we were leaking -our- cachep->name, and double freeing the one allocated by slub. After getting lost in slab/slub/slob a bit, I just looked at other sized-caches that get allocated. jbd2, biovec, sgpool all do it more or less the way jbd2 does. Below patch follows the jbd2 method of dynamically allocating a cache at mount time from a list of static names. (This might also possibly fix a race creating the caches with parallel mounts running). [Folded in a fix from Dan Carpenter which fixed an off-by-one error in the original patch] Cc: stable@kernel.org Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
-
Grant Likely authored
I'll probably regret this.... Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
-
git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds authored
* 'kvm-updates/2.6.38' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: SVM: Make sure KERNEL_GS_BASE is valid when loading gs_index
-
git://git.kernel.org/pub/scm/linux/kernel/git/bp/bpLinus Torvalds authored
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp: amd64_edac: Fix DIMMs per DCTs output
-
git://git.kernel.org/pub/scm/linux/kernel/git/teigland/dlmLinus Torvalds authored
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/dlm: dlm: use single thread workqueues
-
git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6Linus Torvalds authored
* git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6: cifs: don't always drop malformed replies on the floor (try #3) cifs: clean up checks in cifs_echo_request [CIFS] Do not send SMBEcho requests on new sockets until SMBNegotiate
-
git://git.kernel.org/pub/scm/linux/kernel/git/groeck/stagingLinus Torvalds authored
* 'hwmon-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/staging: hwmon: (emc1403) Fix I2C address range hwmon: (lm63) Consider LM64 temperature offset
-
Linus Torvalds authored
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6: pci: use security_capable() when checking capablities during config space read security: add cred argument to security_capable() tpm_tis: Use timeouts returned from TPM
-
Linus Torvalds authored
Merge branch 's5p-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/kgene/linux-samsung * 's5p-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/kgene/linux-samsung: ARM: SAMSUNG: Ensure struct sys_device is declared in plat/pm.h ARM: S5PV310: Cleanup System MMU ARM: S5PV310: Add support System MMU on SMDKV310
-
git://git.monstr.eu/linux-2.6-microblazeLinus Torvalds authored
* 'next' of git://git.monstr.eu/linux-2.6-microblaze: microblaze: Fix msr instruction detection microblaze: Fix pte_update function microblaze: Fix asm compilation warning microblaze: Fix IRQ flag handling for MSR=0
-
Julia Lawall authored
This code makes two calls to clk_get, then test both return values and fails if either failed. The problem is that in the first inner if, where the first call to clk_get has failed, it don't know if the second call has failed as well. So it don't know whether clk_get should be called on the result of the second call. Of course, it would be possible to test that value again. A simpler solution is just to test the result of calling clk_get directly after each call. The semantic match that finds this problem is as follows: (http://coccinelle.lip6.fr/) // <smpl> @r@ position p1,p2; expression e; statement S; @@ e = clk_get@p1(...) ... if@p2 (IS_ERR(e)) S @@ expression e; statement S; identifier l; position r.p1, p2 != r.p2; @@ *e = clk_get@p1(...) ... when != clk_put(e) *if@p2 (...) { ... when != clk_put(e) * return ...; }// </smpl> Signed-off-by: Julia Lawall <julia@diku.dk> Cc: Evgeniy Polyakov <johnpol@2ka.mipt.ru> Acked-by: Tony Lindgren <tony@atomide.com> Acked-by: Amit Kucheria <amit.kucheria@canonical.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
KAMEZAWA Hiroyuki authored
mem_cgroup_uncharge_page() should be called in all failure cases after mem_cgroup_charge_newpage() is called in huge_memory.c::collapse_huge_page() [ 4209.076861] BUG: Bad page state in process khugepaged pfn:1e9800 [ 4209.077601] page:ffffea0006b14000 count:0 mapcount:0 mapping: (null) index:0x2800 [ 4209.078674] page flags: 0x40000000004000(head) [ 4209.079294] pc:ffff880214a30000 pc->flags:2146246697418756 pc->mem_cgroup:ffffc9000177a000 [ 4209.082177] (/A) [ 4209.082500] Pid: 31, comm: khugepaged Not tainted 2.6.38-rc3-mm1 #1 [ 4209.083412] Call Trace: [ 4209.083678] [<ffffffff810f4454>] ? bad_page+0xe4/0x140 [ 4209.084240] [<ffffffff810f53e6>] ? free_pages_prepare+0xd6/0x120 [ 4209.084837] [<ffffffff8155621d>] ? rwsem_down_failed_common+0xbd/0x150 [ 4209.085509] [<ffffffff810f5462>] ? __free_pages_ok+0x32/0xe0 [ 4209.086110] [<ffffffff810f552b>] ? free_compound_page+0x1b/0x20 [ 4209.086699] [<ffffffff810fad6c>] ? __put_compound_page+0x1c/0x30 [ 4209.087333] [<ffffffff810fae1d>] ? put_compound_page+0x4d/0x200 [ 4209.087935] [<ffffffff810fb015>] ? put_page+0x45/0x50 [ 4209.097361] [<ffffffff8113f779>] ? khugepaged+0x9e9/0x1430 [ 4209.098364] [<ffffffff8107c870>] ? autoremove_wake_function+0x0/0x40 [ 4209.099121] [<ffffffff8113ed90>] ? khugepaged+0x0/0x1430 [ 4209.099780] [<ffffffff8107c236>] ? kthread+0x96/0xa0 [ 4209.100452] [<ffffffff8100dda4>] ? kernel_thread_helper+0x4/0x10 [ 4209.101214] [<ffffffff8107c1a0>] ? kthread+0x0/0xa0 [ 4209.101842] [<ffffffff8100dda0>] ? kernel_thread_helper+0x0/0x10 Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Acked-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> Reviewed-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Andrea Arcangeli <aarcange@redhat.com> Reviewed-by: Minchan Kim <minchan.kim@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Johannes Weiner authored
Commit 3e7d3449 ("mm: vmscan: reclaim order-0 and use compaction instead of lumpy reclaim") introduced an indefinite loop in shrink_zone(). It meant to break out of this loop when no pages had been reclaimed and not a single page was even scanned. The way it would detect the latter is by taking a snapshot of sc->nr_scanned at the beginning of the function and comparing it against the new sc->nr_scanned after the scan loop. But it would re-iterate without updating that snapshot, looping forever if sc->nr_scanned changed at least once since shrink_zone() was invoked. This is not the sole condition that would exit that loop, but it requires other processes to change the zone state, as the reclaimer that is stuck obviously can not anymore. This is only happening for higher-order allocations, where reclaim is run back to back with compaction. Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Reported-by: Michal Hocko <mhocko@suse.cz> Tested-by: Kent Overstreet<kent.overstreet@gmail.com> Reported-by: Kent Overstreet <kent.overstreet@gmail.com> Acked-by: Mel Gorman <mel@csn.ul.ie> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Rik van Riel <riel@redhat.com> Reviewed-by: Minchan Kim <minchan.kim@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Michel Lespinasse authored
If the page is going to be written to, __do_page needs to break COW. However, the old page (before breaking COW) was never mapped mapped into the current pte (__do_fault is only called when the pte is not present), so vmscan can't have marked the old page as PageMlocked due to being mapped in __do_fault's VMA. Therefore, __do_fault() does not need to worry about clearing PageMlocked() on the old page. Signed-off-by: Michel Lespinasse <walken@google.com> Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Acked-by: Hugh Dickins <hughd@google.com> Cc: Rik van Riel <riel@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Michel Lespinasse authored
vmscan can lazily find pages that are mapped within VM_LOCKED vmas, and set the PageMlocked bit on these pages, transfering them onto the unevictable list. When do_wp_page() breaks COW within a VM_LOCKED vma, it may need to clear PageMlocked on the old page and set it on the new page instead. This change fixes an issue where do_wp_page() was clearing PageMlocked on the old page while the pte was still pointing to it (as well as rmap). Therefore, we were not protected against vmscan immediately transfering the old page back onto the unevictable list. This could cause pages to get stranded there forever. I propose to move the corresponding code to the end of do_wp_page(), after the pte (and rmap) have been pointed to the new page. Additionally, we can use munlock_vma_page() instead of clear_page_mlock(), so that the old page stays mlocked if there are still other VM_LOCKED vmas mapping it. Signed-off-by: Michel Lespinasse <walken@google.com> Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Acked-by: Hugh Dickins <hughd@google.com> Cc: Rik van Riel <riel@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Yinghai Lu authored
While applying patch to use memblock to find aperture for 64bit x86. Ingo found system with 1g + force_iommu > No AGP bridge found > Node 0: aperture @ 38000000 size 32 MB > Aperture pointing to e820 RAM. Ignoring. > Your BIOS doesn't leave a aperture memory hole > Please enable the IOMMU option in the BIOS setup > This costs you 64 MB of RAM > Cannot allocate aperture memory hole (0,65536K) the corresponding code: addr = memblock_find_in_range(0, 1ULL<<32, aper_size, 512ULL<<20); if (addr == MEMBLOCK_ERROR || addr + aper_size > 0xffffffff) { printk(KERN_ERR "Cannot allocate aperture memory hole (%lx,%uK)\n", addr, aper_size>>10); return 0; } memblock_x86_reserve_range(addr, addr + aper_size, "aperture64") fails because memblock core code align the size with 512M. That could make size way too big. So don't align the size in that case. actually __memblock_alloc_base, the another caller already align that before calling that function. BTW. x86 does not use __memblock_alloc_base... Signed-off-by: Yinghai Lu <yinghai@kernel.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: David Miller <davem@davemloft.net> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Dave Airlie <airlied@linux.ie> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Soren Hansen authored
Commit 2a48fc0a ("block: autoconvert trivial BKL users to private mutex") replaced uses of the BKL in the nbd driver with mutex operations. Since then, I've been been seeing these lock ups: INFO: task qemu-nbd:16115 blocked for more than 120 seconds. "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. qemu-nbd D 0000000000000001 0 16115 16114 0x00000004 ffff88007d775d98 0000000000000082 ffff88007d775fd8 ffff88007d774000 0000000000013a80 ffff8800020347e0 ffff88007d775fd8 0000000000013a80 ffff880133730000 ffff880002034440 ffffea0004333db8 ffffffffa071c020 Call Trace: [<ffffffff815b9997>] __mutex_lock_slowpath+0xf7/0x180 [<ffffffff815b93eb>] mutex_lock+0x2b/0x50 [<ffffffffa071a21c>] nbd_ioctl+0x6c/0x1c0 [nbd] [<ffffffff812cb970>] blkdev_ioctl+0x230/0x730 [<ffffffff811967a1>] block_ioctl+0x41/0x50 [<ffffffff81175c03>] do_vfs_ioctl+0x93/0x370 [<ffffffff81175f61>] sys_ioctl+0x81/0xa0 [<ffffffff8100c0c2>] system_call_fastpath+0x16/0x1b Instrumenting the nbd module's ioctl handler with some extra logging clearly shows the NBD_DO_IT ioctl being invoked which is a long-lived ioctl in the sense that it doesn't return until another ioctl asks the driver to disconnect. However, that other ioctl blocks, waiting for the module-level mutex that replaced the BKL, and then we're stuck. This patch removes the module-level mutex altogether. It's clearly wrong, and as far as I can see, it's entirely unnecessary, since the nbd driver maintains per-device mutexes, and I don't see anything that would require a module-level (or kernel-level, for that matter) mutex. Signed-off-by: Soren Hansen <soren@linux2go.dk> Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Acked-by: Paul Clements <paul.clements@steeleye.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Jens Axboe <axboe@kernel.dk> Cc: <stable@kernel.org> [2.6.37.x] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Alexander Strakh authored
In file drivers/rtc/rtc-proc.c seq_open() can return -ENOMEM. 86 if (!try_module_get(THIS_MODULE)) 87 return -ENODEV; 88 89 return single_open(file, rtc_proc_show, rtc); In this case before exiting (line 89) from rtc_proc_open the module_put(THIS_MODULE) must be called. Found by Linux Device Drivers Verification Project Signed-off-by: Alexander Strakh <strakh@ispras.ru> Cc: Alessandro Zummo <a.zummo@towertech.it> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Roland Stigge authored
Add a mutex to register communication and handling. Without the mutex, GPIOs didn't switch as expected when toggled in a fast sequence of status changes of multiple outputs. Signed-off-by: Roland Stigge <stigge@antcom.de> Acked-by: Eric Miao <eric.y.miao@gmail.com> Cc: Grant Likely <grant.likely@secretlab.ca> Cc: Marc Zyngier <maz@misterjones.org> Cc: Ben Gardner <bgardner@wabtec.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Tejun Heo authored
The wake_up_process() call in ptrace_detach() is spurious and not interlocked with the tracee state. IOW, the tracee could be running or sleeping in any place in the kernel by the time wake_up_process() is called. This can lead to the tracee waking up unexpectedly which can be dangerous. The wake_up is spurious and should be removed but for now reduce its toxicity by only waking up if the tracee is in TRACED or STOPPED state. This bug can possibly be used as an attack vector. I don't think it will take too much effort to come up with an attack which triggers oops somewhere. Most sleeps are wrapped in condition test loops and should be safe but we have quite a number of places where sleep and wakeup conditions are expected to be interlocked. Although the window of opportunity is tiny, ptrace can be used by non-privileged users and with some loading the window can definitely be extended and exploited. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Roland McGrath <roland@redhat.com> Acked-by: Oleg Nesterov <oleg@redhat.com> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Boaz Harrosh authored
In commit fa0d7e3d ("fs: icache RCU free inodes"), we use rcu free inode instead of freeing the inode directly. It causes a crash when we rmmod immediately after we umount the volume[1]. So we need to call rcu_barrier after we kill_sb so that the inode is freed before we do rmmod. The idea is inspired by Aneesh Kumar. rcu_barrier will wait for all callbacks to end before preceding. The original patch was done by Tao Ma, but synchronize_rcu() is not enough here. 1. http://marc.info/?l=linux-fsdevel&m=129680863330185&w=2Tested-by: Tao Ma <boyu.mt@taobao.com> Signed-off-by: Boaz Harrosh <bharrosh@panasas.com> Cc: Nick Piggin <npiggin@kernel.dk> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Chris Mason <chris.mason@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-