- 02 Nov, 2011 40 commits
-
-
git://linux-nfs.org/~bfields/linuxLinus Torvalds authored
* 'for-3.2' of git://linux-nfs.org/~bfields/linux: nfsd4: typo logical vs bitwise negate in nfsd4_decode_share_access
-
git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linuxLinus Torvalds authored
* 'misc-3.2' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux: MAINTAINERS: Update entry for IA64 [IA64] gpio: GENERIC_GPIO default must be n [IA64[ add CONFIG_NET_VENDOR_INTEL=y to default config files where needed [IA64] agp/hp-agp: Allow binding user memory to the AGP GART [IA64] sn2: add missing put_cpu()
-
Linus Torvalds authored
Says Andrew: "60 patches. That's good enough for -rc1 I guess. I have quite a lot of detritus to be rechecked, work through maintainers, etc. - most of the remains of MM - rtc - various misc - cgroups - memcg - cpusets - procfs - ipc - rapidio - sysctl - pps - w1 - drivers/misc - aio" * akpm: (60 commits) memcg: replace ss->id_lock with a rwlock aio: allocate kiocbs in batches drivers/misc/vmw_balloon.c: fix typo in code comment drivers/misc/vmw_balloon.c: determine page allocation flag can_sleep outside loop w1: disable irqs in critical section drivers/w1/w1_int.c: multiple masters used same init_name drivers/power/ds2780_battery.c: fix deadlock upon insertion and removal drivers/power/ds2780_battery.c: add a nolock function to w1 interface drivers/power/ds2780_battery.c: create central point for calling w1 interface w1: ds2760 and ds2780, use ida for id and ida_simple_get() to get it pps gpio client: add missing dependency pps: new client driver using GPIO pps: default echo function include/linux/dma-mapping.h: add dma_zalloc_coherent() sysctl: make CONFIG_SYSCTL_SYSCALL default to n sysctl: add support for poll() RapidIO: documentation update drivers/net/rionet.c: fix ethernet address macros for LE platforms RapidIO: fix potential null deref in rio_setup_device() RapidIO: add mport driver for Tsi721 bridge ...
-
Andrew Bresticker authored
While back-porting Johannes Weiner's patch "mm: memcg-aware global reclaim" for an internal effort, we noticed a significant performance regression during page-reclaim heavy workloads due to high contention of the ss->id_lock. This lock protects idr map, and serializes calls to idr_get_next() in css_get_next() (which is used during the memcg hierarchy walk). Since idr_get_next() is just doing a look up, we need only serialize it with respect to idr_remove()/idr_get_new(). By making the ss->id_lock a rwlock, contention is greatly reduced and performance improves. Tested: cat a 256m file from a ramdisk in a 128m container 50 times on each core (one file + container per core) in parallel on a NUMA machine. Result is the time for the test to complete in 1 of the containers. Both kernels included Johannes' memcg-aware global reclaim patches. Before rwlock patch: 1710.778s After rwlock patch: 152.227s Signed-off-by: Andrew Bresticker <abrestic@google.com> Cc: Paul Menage <menage@gmail.com> Cc: Li Zefan <lizf@cn.fujitsu.com> Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Ying Han <yinghan@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Jeff Moyer authored
In testing aio on a fast storage device, I found that the context lock takes up a fair amount of cpu time in the I/O submission path. The reason is that we take it for every I/O submitted (see __aio_get_req). Since we know how many I/Os are passed to io_submit, we can preallocate the kiocbs in batches, reducing the number of times we take and release the lock. In my testing, I was able to reduce the amount of time spent in _raw_spin_lock_irq by .56% (average of 3 runs). The command I used to test this was: aio-stress -O -o 2 -o 3 -r 8 -d 128 -b 32 -i 32 -s 16384 <dev> I also tested the patch with various numbers of events passed to io_submit, and I ran the xfstests aio group of tests to ensure I didn't break anything. Signed-off-by: Jeff Moyer <jmoyer@redhat.com> Cc: Daniel Ehrenberg <dehrenberg@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Rakib Mullick authored
Fix typo in code comment. Signed-off-by: Rakib Mullick <rakib.mullick@gmail.com> Acked-by: Dmitry Torokhov <dtor@vmware.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Rakib Mullick authored
In vmballoon_reserve_page(), flags has been passed from the callee function (vmballoon_inflate here). So, we can determine can_sleep outside the loop. Signed-off-by: Rakib Mullick <rakib.mullick@gmail.com> Acked-by: Dmitry Torokhov <dtor@vmware.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Jan Weitzel authored
Interrupting w1_delay() in w1_read_bit() results in missing the low level on the w1 line and receiving "1" instead of "0". Add local_irq_save()/local_irq_restore() around the critical section Signed-off-by: Jan Weitzel <j.weitzel@phytec.de> Acked-by: Evgeniy Polyakov <zbr@ioremap.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Florian Faber authored
When using multiple masters, w1_int.c would use the .init_name from w1.c for all entities, which will fail when creating a corresponding sysfs entry. This patch uses the unique name previously generated. WARNING: at fs/sysfs/dir.c:451 sysfs_add_one+0x48/0x64() sysfs: cannot create duplicate filename '/devices/w1 bus master' Modules linked in: Call trace: [<9001a604>] warn_slowpath_common+0x34/0x44 [<9001a64c>] warn_slowpath_fmt+0x14/0x18 [<90078020>] sysfs_add_one+0x48/0x64 [<900784ec>] create_dir+0x40/0x68 [<9007857a>] sysfs_create_dir+0x66/0x78 [<900c1a8a>] kobject_add_internal+0x6e/0x104 [<900c1bc0>] kobject_add_varg+0x20/0x2c [<900c1c1c>] kobject_add+0x30/0x3c [<900dbd66>] device_add+0x6a/0x378 [<900dbb4a>] device_initialize+0x12/0x48 [<900dc080>] device_register+0xc/0x10 [<900f99be>] w1_add_master_device+0x162/0x274 [<90008e7a>] w1_gpio_probe+0x66/0xb4 [<9000030c>] kernel_init+0x0/0xe8 [<900dde54>] platform_drv_probe+0xc/0xe [<9000030c>] kernel_init+0x0/0xe8 [<900dd4f8>] driver_probe_device+0x6c/0xdc [<900dd5fc>] __driver_attach+0x34/0x48 [<900dcce8>] bus_for_each_dev+0x2c/0x48 [<900dd5c8>] __driver_attach+0x0/0x48 [<900dd38c>] driver_attach+0x10/0x14 [<900dd16a>] bus_add_driver+0x6a/0x18c [<900dd768>] driver_register+0x60/0xb8 [<90011594>] __initcall_w1_therm_init6+0x0/0x4 [<90008e00>] w1_gpio_init+0x0/0x14 [<9000030c>] kernel_init+0x0/0xe8 [<900ddf48>] platform_driver_register+0x30/0x38 [<90011594>] __initcall_w1_therm_init6+0x0/0x4 [<90008e00>] w1_gpio_init+0x0/0x14 [<9000030c>] kernel_init+0x0/0xe8 [<900ddf5e>] platform_driver_probe+0xe/0x3c [<90008e0c>] w1_gpio_init+0xc/0x14 [<90011594>] __initcall_w1_therm_init6+0x0/0x4 [<90008e00>] w1_gpio_init+0x0/0x14 [<900126d4>] do_one_initcall+0x34/0x130 [<90000372>] kernel_init+0x66/0xe8 [<90011594>] __initcall_w1_therm_init6+0x0/0x4 [<9001ca3e>] do_exit+0x0/0x3a6 [<9000030c>] kernel_init+0x0/0xe8 [<9001ca3e>] do_exit+0x0/0x3a6 ---[ end trace 5a9233884fead918 ]--- kobject_add_internal failed for w1 bus master with -EEXIST, don't try to register things with the same name in the same directory. Signed-off-by: Florian Faber <faber@faberman.de> Cc: Evgeniy Polyakov <zbr@ioremap.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Clifton Barnes authored
Fixes the deadlock when inserting and removing the ds2780. Signed-off-by: Clifton Barnes <cabarnes@indesign-llc.com> Cc: Evgeniy Polyakov <zbr@ioremap.net> Cc: <stable@kernel.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Clifton Barnes authored
Adds a nolock function to the w1 interface to avoid locking the mutex if needed. Signed-off-by: Clifton Barnes <cabarnes@indesign-llc.com> Cc: Evgeniy Polyakov <zbr@ioremap.net> Cc: <stable@kernel.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Clifton Barnes authored
Simply creates one point to call the w1 interface. Signed-off-by: Clifton Barnes <cabarnes@indesign-llc.com> Cc: Evgeniy Polyakov <zbr@ioremap.net> Cc: <stable@kernel.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Jonathan Cameron authored
Straightforward. As an aside, the ida_init calls are not needed as far as I can see needed. (DEFINE_IDA does the same already). Signed-off-by: Jonathan Cameron <jic23@cam.ac.uk> Cc: Evgeniy Polyakov <zbr@ioremap.net> Acked-by: Clifton Barnes <cabarnes@indesign-llc.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Heiko Carstens authored
Add "depends on GENERIC_HARDIRQS" to avoid compile breakage on s390: drivers/built-in.o: In function `pps_gpio_remove': linux-next/drivers/pps/clients/pps-gpio.c:189: undefined reference to `free_irq' Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: James Nuss <jamesnuss@nanometrics.ca> Cc: Rodolfo Giometti <giometti@enneenne.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
James Nuss authored
This client driver allows you to use a GPIO pin as a source for PPS signals. Platform data [1] are used to specify the GPIO pin number, label, assert event edge type, and whether clear events are captured. This driver is based on the work by Ricardo Martins who submitted an initial implementation [2] of a PPS IRQ client driver to the linuxpps mailing-list on Dec 3 2010. [1] include/linux/pps-gpio.h [2] http://ml.enneenne.com/pipermail/linuxpps/2010-December/004155.html [akpm@linux-foundation.org: remove unneeded cast of void*] Signed-off-by: James Nuss <jamesnuss@nanometrics.ca> Cc: Ricardo Martins <rasm@fe.up.pt> Acked-by: Rodolfo Giometti <giometti@linux.it> Signed-off-by: Ricardo Martins <rasm@fe.up.pt> Cc: Alexander Gordeev <lasaine@lvk.cs.msu.su> Cc: Igor Plyatov <plyatov@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
James Nuss authored
A default echo function has been provided so it is no longer an error when you specify PPS_ECHOASSERT or PPS_ECHOCLEAR without an explicit echo function. This allows some code re-use and also makes it easier to write client drivers since the default echo function does not normally need to change. Signed-off-by: James Nuss <jamesnuss@nanometrics.ca> Reviewed-by: Ben Gardiner <bengardiner@nanometrics.ca> Acked-by: Rodolfo Giometti <giometti@linux.it> Cc: Ricardo Martins <rasm@fe.up.pt> Cc: Alexander Gordeev <lasaine@lvk.cs.msu.su> Cc: Igor Plyatov <plyatov@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Andrew Morton authored
Lots of driver code does a dma_alloc_coherent() and then zeroes out the memory with a memset. Make it easy for them. Cc: Alexandre Bounine <alexandre.bounine@idt.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
WANG Cong authored
When I tried to send a patch to remove it, Andi told me we still need to keep compabitlies for old libc, so we can't remove this completely. Then just make it default to n and remove the doc from feature-removal-schedule.txt. Signed-off-by: WANG Cong <amwang@redhat.com> Cc: Eric Biederman <ebiederm@xmission.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Lucas De Marchi authored
Adding support for poll() in sysctl fs allows userspace to receive notifications of changes in sysctl entries. This adds a infrastructure to allow files in sysctl fs to be pollable and implements it for hostname and domainname. [akpm@linux-foundation.org: s/declare/define/ for definitions] Signed-off-by: Lucas De Marchi <lucas.demarchi@profusion.mobi> Cc: Greg KH <gregkh@suse.de> Cc: Kay Sievers <kay.sievers@vrfy.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Alexandre Bounine authored
Update rapidio.txt to reflect changes from recent patch. See http://marc.info/?l=linux-kernel&m=131285620113589&w=2 for details. Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com> Cc: Liu Gang <Gang.Liu@freescale.com> Cc: Micha Nelissen <micha@neli.hopto.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Alexandre Bounine authored
Modify Ethernet addess macros to be compatible with BE/LE platforms Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com> Cc: Chul Kim <chul.kim@idt.com> Cc: Kumar Gala <galak@kernel.crashing.org> Cc: Matt Porter <mporter@kernel.crashing.org> Cc: Li Yang <leoli@freescale.com> Cc: <stable@kernel.org> [2.6.39+] Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Alexandre Bounine authored
The "goto cleanup" path can deference "rswitch" when it is NULL. Reported-by: Dan Carpenter <error27@gmail.com> Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com> Cc: Dan Carpenter <error27@gmail.com> Cc: Kumar Gala <galak@kernel.crashing.org> Cc: Matt Porter <mporter@kernel.crashing.org> Cc: Chul Kim <chul.kim@idt.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Alexandre Bounine authored
Add RapidIO mport driver for IDT TSI721 PCI Express-to-SRIO bridge device. The driver provides full set of callback functions defined for mport devices in RapidIO subsystem. It also is compatible with current version of RIONET driver (Ethernet over RapidIO messaging services). This patch is applicable to kernel versions starting from 2.6.39. Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com> Signed-off-by: Chul Kim <chul.kim@idt.com> Cc: Kumar Gala <galak@kernel.crashing.org> Cc: Matt Porter <mporter@kernel.crashing.org> Cc: Li Yang <leoli@freescale.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Liu Gang authored
arch/powerpc/sysdev/fsl_rio.c: release rapidio port I/O region resource if port failed to initialize The "struct rio_mport" contains a member of master port I/O memory resource structure "struct resource iores". This resource will be read from device tree and be used for rapidio R/W transaction memory space. Rapidio requests the port I/O memory resource under the root resource "iomem_resource". struct rio_mport *port; port = kzalloc(sizeof(struct rio_mport), GFP_KERNEL); request_resource(&iomem_resource, &port->iores); When port failed to initialize, allocated "rio_mport" structure memory will be freed, and the port I/O memory resource structure pointer "&port->iores" will be invalid. If other requests resource under "iomem_resource", "&port->iores" node may be operated in the child resources list and this will cause the system to crash. So the requested port I/O memory resource should be released before freeing allocated "rio_mport" structure. Signed-off-by: Liu Gang <Gang.Liu@freescale.com> Acked-by: Alexandre Bounine <alexandre.bounine@idt.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Grant Likely <grant.likely@secretlab.ca> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Liu Gang authored
The discovered bit in PGCCSR register indicates if the device has been discovered by system host. In Rapidio systems, some agent devices can also be master devices. They can issue requests into the system. Signed-off-by: Liu Gang <Gang.Liu@freescale.com> Acked-by: Alexandre Bounine <alexandre.bounine@idt.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Will Drewry authored
Expand root=PARTUUID=UUID syntax to support selecting a root partition by integer offset from a known, unique partition. This approach provides similar properties to specifying a device and partition number, but using the UUID as the unique path prior to evaluating the offset. For example, root=PARTUUID=99DE9194-FC15-4223-9192-FC243948F88B/PARTNROFF=1 selects the partition with UUID 99DE.. then select the next partition. This change is motivated by a particular usecase in Chromium OS where the bootloader can easily determine what partition it is on (by UUID) but doesn't perform general partition table walking. That said, support for this model provides a direct mechanism for the user to modify the root partition to boot without specifically needing to extract each UUID or update the bootloader explicitly when the root partition UUID is changed (if it is recreated to be larger, for instance). Pinning to a /boot-style partition UUID allows the arbitrary root partition reconfiguration/modifications with slightly less ambiguity than just [dev][partition] and less stringency than the specific root partition UUID. [sfr@canb.auug.org.au: fix init sections warning] Signed-off-by: Will Drewry <wad@chromium.org> Cc: Kay Sievers <kay.sievers@vrfy.org> Cc: Randy Dunlap <rdunlap@xenotime.net> Cc: Namhyung Kim <namhyung@gmail.com> Cc: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Manfred Spraul authored
For the sysvsem undo, each task struct contains a sysv_sem structure with a pointer to the undo information. This pointer is only necessary if sysvipc is enabled - thus the pointer can be made conditional on CONFIG_SYSVIPC. Signed-off-by: Manfred Spraul <manfred@colorfullife.com> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Mike Galbraith <efault@gmx.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Manfred Spraul authored
include/linux/sem.h contains several structures that are only used within ipc/sem.c. The patch moves them into ipc/sem.c - there is no need to expose the structures to the whole kernel. No functional changes, only whitespace cleanups and 80-char per line fixes. Signed-off-by: Manfred Spraul <manfred@colorfullife.com> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Mike Galbraith <efault@gmx.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Manfred Spraul authored
semtimedop() does not handle spurious wakeups, it returns -EINTR to user space. Most other schedule() users would just loop and not return to user space. The patch adds such a loop to semtimedop() Signed-off-by: Manfred Spraul <manfred@colorfullife.com> Reported-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Mike Galbraith <efault@gmx.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Manfred Spraul authored
sys_semtimedop() may return -EIDRM although the semaphore operation completed successfully: thread 1: thread 2: semtimedop(), sleeps semop(): * acquires sem_lock() semtimedop() woken up due to timeout sem_lock() loops * notices that thread 2 could be completed. * performs the operations that thread 2 is sleeping on. * marks the semaphore operation as IN_WAKEUP * drops sem_lock(), does wakeup, sets return code to 0 * thread delayed due to interrupt, whatever * returns to user space * thread still delayed semctl(IPC_RMID) * acquires sem_lock() * ipc_rmid(), ipcp->deleted=1 * drops sem_lock() * thread finally continues - but seem_lock() now fails due to ipcp->deleted == 1 * returns -EIDRM instead of 0 The fix is trivial: Always use the return code in queue.status. In real world, the race probably doesn't matter: If the semaphore array is destroyed, the app is probably not interested if the last operation succeeded or was already cancelled. Signed-off-by: Manfred Spraul <manfred@colorfullife.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Mike Galbraith <efault@gmx.de> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Tejun Heo authored
It's often convenient to be able to release resource from IRQ context. Make ida_simple_*() use irqsave/restore spin ops so that they are IRQ safe. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Vasiliy Kulikov authored
fd* files are restricted to the task's owner, and other users may not get direct access to them. But one may open any of these files and run any setuid program, keeping opened file descriptors. As there are permission checks on open(), but not on readdir() and read(), operations on the kept file descriptors will not be checked. It makes it possible to violate procfs permission model. Reading fdinfo/* may disclosure current fds' position and flags, reading directory contents of fdinfo/ and fd/ may disclosure the number of opened files by the target task. This information is not sensible per se, but it can reveal some private information (like length of a password stored in a file) under certain conditions. Used existing (un)lock_trace functions to check for ptrace_may_access(), but instead of using EPERM return code from it use EACCES to be consistent with existing proc_pid_follow_link()/proc_pid_readlink() return code. If they differ, attacker can guess what fds exist by analyzing stat() return code. Patched handlers: stat() for fd/*, stat() and read() for fdindo/*, readdir() and lookup() for fd/ and fdinfo/. Signed-off-by: Vasiliy Kulikov <segoon@openwall.com> Cc: Cyrill Gorcunov <gorcunov@gmail.com> Cc: <stable@kernel.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Pavel Emelyanov authored
On reading sysctl dirs we should return -EISDIR instead of -EINVAL. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Al Viro <viro@ZenIV.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
David Rientjes authored
{get,put}_mems_allowed() exist so that general kernel code may locklessly access a task's set of allowable nodes without having the chance that a concurrent write will cause the nodemask to be empty on configurations where MAX_NUMNODES > BITS_PER_LONG. This could incur a significant delay, however, especially in low memory conditions because the page allocator is blocking and reclaim requires get_mems_allowed() itself. It is not atypical to see writes to cpuset.mems take over 2 seconds to complete, for example. In low memory conditions, this is problematic because it's one of the most imporant times to change cpuset.mems in the first place! The only way a task's set of allowable nodes may change is through cpusets by writing to cpuset.mems and when attaching a task to a generic code is not reading the nodemask with get_mems_allowed() at the same time, and then clearing all the old nodes. This prevents the possibility that a reader will see an empty nodemask at the same time the writer is storing a new nodemask. If at least one node remains unchanged, though, it's possible to simply set all new nodes and then clear all the old nodes. Changing a task's nodemask is protected by cgroup_mutex so it's guaranteed that two threads are not changing the same task's nodemask at the same time, so the nodemask is guaranteed to be stored before another thread changes it and determines whether a node remains set or not. Signed-off-by: David Rientjes <rientjes@google.com> Cc: Miao Xie <miaox@cn.fujitsu.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Nick Piggin <npiggin@kernel.dk> Cc: Paul Menage <paul@paulmenage.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
H Hartley Sweeten authored
warning: symbol 'swap_cgroup_ctrl' was not declared. Should it be static? Signed-off-by: H Hartley Sweeten <hsweeten@visionengravers.com> Cc: Paul Menage <paul@paulmenage.org> Cc: Li Zefan <lizf@cn.fujitsu.com> Acked-by: Balbir Singh <bsingharora@gmail.com> Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Steven Rostedt authored
Various code in memcontrol.c () calls this_cpu_read() on the calculations to be done from two different percpu variables, or does an open-coded read-modify-write on a single percpu variable. Disable preemption throughout these operations so that the writes go to the correct palces. [hannes@cmpxchg.org: added this_cpu to __this_cpu conversion] Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Cc: Greg Thelen <gthelen@google.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Balbir Singh <balbir@linux.vnet.ibm.com> Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Christoph Lameter <cl@linux.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Johannes Weiner authored
There is a potential race between a thread charging a page and another thread putting it back to the LRU list: charge: putback: SetPageCgroupUsed SetPageLRU PageLRU && add to memcg LRU PageCgroupUsed && add to memcg LRU The order of setting one flag and checking the other is crucial, otherwise the charge may observe !PageLRU while the putback observes !PageCgroupUsed and the page is not linked to the memcg LRU at all. Global memory pressure may fix this by trying to isolate and putback the page for reclaim, where that putback would link it to the memcg LRU again. Without that, the memory cgroup is undeletable due to a charge whose physical page can not be found and moved out. Signed-off-by: Johannes Weiner <jweiner@redhat.com> Cc: Ying Han <yinghan@google.com> Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> Cc: Balbir Singh <bsingharora@gmail.com> Cc: Michal Hocko <mhocko@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Johannes Weiner authored
Reclaim decides to skip scanning an active list when the corresponding inactive list is above a certain size in comparison to leave the assumed working set alone while there are still enough reclaim candidates around. The memcg implementation of comparing those lists instead reports whether the whole memcg is low on the requested type of inactive pages, considering all nodes and zones. This can lead to an oversized active list not being scanned because of the state of the other lists in the memcg, as well as an active list being scanned while its corresponding inactive list has enough pages. Not only is this wrong, it's also a scalability hazard, because the global memory state over all nodes and zones has to be gathered for each memcg and zone scanned. Make these calculations purely based on the size of the two LRU lists that are actually affected by the outcome of the decision. Signed-off-by: Johannes Weiner <jweiner@redhat.com> Reviewed-by: Rik van Riel <riel@redhat.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> Cc: Balbir Singh <bsingharora@gmail.com> Reviewed-by: Minchan Kim <minchan.kim@gmail.com> Reviewed-by: Ying Han <yinghan@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Igor Mammedov authored
If somebody is touching data too early, it might be easier to diagnose a problem when dereferencing NULL at mem->info.nodeinfo[node] than trying to understand why mem_cgroup_per_zone is [un|partly]initialized. Signed-off-by: Igor Mammedov <imammedo@redhat.com> Acked-by: Michal Hocko <mhocko@suse.cz> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
KAMEZAWA Hiroyuki authored
Before calling schedule_timeout(), task state should be changed. Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Acked-by: Michal Hocko <mhocko@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-