- 08 May, 2010 6 commits
-
-
Cyrill Gorcunov authored
RAW events are special and we should be ready for user passing in insane event index values. Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Lin Ming <ming.m.lin@intel.com> LKML-Reference: <20100508112717.315897547@openvz.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
-
Cyrill Gorcunov authored
The caller already has done such a check. And it was wrong anyway, it had to be '>=' rather than '>' Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Lin Ming <ming.m.lin@intel.com> LKML-Reference: <20100508112717.130386882@openvz.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
-
Cyrill Gorcunov authored
Steven reported: | | I'm getting: | | Pid: 3477, comm: perf Not tainted 2.6.34-rc6 #2727 | Call Trace: | [<ffffffff811c7565>] debug_smp_processor_id+0xd5/0xf0 | [<ffffffff81019874>] p4_hw_config+0x2b/0x15c | [<ffffffff8107acbc>] ? trace_hardirqs_on_caller+0x12b/0x14f | [<ffffffff81019143>] hw_perf_event_init+0x468/0x7be | [<ffffffff810782fd>] ? debug_mutex_init+0x31/0x3c | [<ffffffff810c68b2>] T.850+0x273/0x42e | [<ffffffff810c6cab>] sys_perf_event_open+0x23e/0x3f1 | [<ffffffff81009e6a>] ? sysret_check+0x2e/0x69 | [<ffffffff81009e32>] system_call_fastpath+0x16/0x1b | | When running perf record in latest tip/perf/core | Due to the fact that p4 counters are shared between HT threads we synthetically divide the whole set of counters into two non-intersected subsets. And while we're "borrowing" counters from these subsets we should not be preempted (well, strictly speaking in p4_hw_config we just pre-set reference to the subset which allow to save some cycles in schedule routine if it happens on the same cpu). So use get_cpu/put_cpu pair. Also p4_pmu_schedule_events should use smp_processor_id rather than raw_ version. This allow us to catch up preemption issue (if there will ever be). Reported-by: Steven Rostedt <rostedt@goodmis.org> Tested-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Lin Ming <ming.m.lin@intel.com> LKML-Reference: <20100508112716.963478928@openvz.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
-
Cyrill Gorcunov authored
If an event is not RAW we should not exit p4_hw_config early but call x86_setup_perfctr as well. Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Lin Ming <ming.m.lin@intel.com> Cc: Robert Richter <robert.richter@amd.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
-
Paul Mackerras authored
Commit 6bde9b6c ("perf: Add group scheduling transactional APIs") added code to allow a group to be scheduled in a single transaction. However, it introduced a bug in handling events whose pmu does not implement transactions -- at the end of scheduling in the events in the group, in the non-transactional case the code now falls through to the group_error label, and proceeds to unschedule all the events in the group and return failure. This fixes it by returning 0 (success) in the non-transactional case. Signed-off-by: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Lin Ming <ming.m.lin@intel.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: eranian@gmail.com LKML-Reference: <20100508105800.GB10650@brick.ozlabs.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
-
-
- 07 May, 2010 14 commits
-
-
Arnaldo Carvalho de Melo authored
It was x86 specific and imcomplete at that, improve the situation by making it clear where the example provided applies and by adding the URLs for the Intel and AMD manuals where this is discussed in depth. Acked-by: Robert Richter <robert.richter@amd.com> Cc: Cyrill Gorcunov <gorcunov@gmail.com> Cc: Frédéric Weisbecker <fweisbec@gmail.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Tom Zanussi <tzanussi@gmail.com> Cc: Robert Richter <robert.richter@amd.com> Reported-by: Robert Richter <robert.richter@amd.com LKML-Reference: <new-submission> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
Lin Ming authored
Convert to the transactional PMU API and remove the duplication of group_sched_in(). Reviewed-by: Stephane Eranian <eranian@google.com> Signed-off-by: Lin Ming <ming.m.lin@intel.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: David Miller <davem@davemloft.net> Cc: Paul Mackerras <paulus@samba.org> LKML-Reference: <1272002172.5707.61.camel@minggr.sh.intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
-
Lin Ming authored
Add group scheduling transactional APIs to struct pmu. These APIs will be implemented in arch code, based on Peter's idea as below. > the idea behind hw_perf_group_sched_in() is to not perform > schedulability tests on each event in the group, but to add the group > as a whole and then perform one test. > > Of course, when that test fails, you'll have to roll-back the whole > group again. > > So start_txn (or a better name) would simply toggle a flag in the pmu > implementation that will make pmu::enable() not perform the > schedulablilty test. > > Then commit_txn() will perform the schedulability test (so note the > method has to have a !void return value. > > This will allow us to use the regular > kernel/perf_event.c::group_sched_in() and all the rollback code. > Currently each hw_perf_group_sched_in() implementation duplicates all > the rolllback code (with various bugs). ->start_txn: Start group events scheduling transaction, set a flag to make pmu::enable() not perform the schedulability test, it will be performed at commit time. ->commit_txn: Commit group events scheduling transaction, perform the group schedulability as a whole ->cancel_txn: Stop group events scheduling transaction, clear the flag so pmu::enable() will perform the schedulability test. Reviewed-by: Stephane Eranian <eranian@google.com> Reviewed-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Lin Ming <ming.m.lin@intel.com> Cc: David Miller <davem@davemloft.net> Cc: Paul Mackerras <paulus@samba.org> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <1272002160.5707.60.camel@minggr.sh.intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
-
Peter Zijlstra authored
Rename perf_event_attr::precise to perf_event_attr::precise_ip and widen it to 2 bits. This new field describes the required precision of the PERF_SAMPLE_IP field: 0 - SAMPLE_IP can have arbitrary skid 1 - SAMPLE_IP must have constant skid 2 - SAMPLE_IP requested to have 0 skid 3 - SAMPLE_IP must have 0 skid And modify the Intel PEBS code accordingly. The PEBS implementation now supports up to precise_ip == 2, where we perform the IP fixup. Also s/PERF_RECORD_MISC_EXACT/&_IP/ to clarify its meaning, this bit should be set for each PERF_SAMPLE_IP field known to match the actual instruction triggering the event. This new scheme allows for a PEBS mode that uses the buffer for more than a single event. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Mackerras <paulus@samba.org> Cc: Stephane Eranian <eranian@google.com> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
-
Peter Zijlstra authored
Remove some duplicated logic. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
-
Peter Zijlstra authored
Its broken, we really should get PERF_SAMPLE_REGS sorted. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
-
Robert Richter authored
There may exist constraints with a cmask set to zero. In this case for_each_event_constraint() will not work properly. Now weight is used instead of the cmask for loop exit detection. Weight is always a value other than zero since the default contains the HWEIGHT from the counter mask and in other cases a value of zero does not fit too. This is in preparation of ibs event constraints that wont have a cmask. Signed-off-by: Robert Richter <robert.richter@amd.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <1271190201-25705-7-git-send-email-robert.richter@amd.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
-
Robert Richter authored
To reuse this function for events with different enable bit masks, this mask is part of the function's argument list now. The function will be used later to control ibs events too. Signed-off-by: Robert Richter <robert.richter@amd.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <1271190201-25705-6-git-send-email-robert.richter@amd.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
-
Robert Richter authored
The perfctr setup calls are in the corresponding .hw_config() functions now. This makes it possible to introduce config functions for other pmu events that are not perfctr specific. Also, all of a sudden the code looks much nicer. Signed-off-by: Robert Richter <robert.richter@amd.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <1271190201-25705-4-git-send-email-robert.richter@amd.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
-
Robert Richter authored
Move x86_setup_perfctr(), no other changes made. Signed-off-by: Robert Richter <robert.richter@amd.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <1271190201-25705-3-git-send-email-robert.richter@amd.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
-
Robert Richter authored
Split __hw_perf_event_init() to configure pmu events other than perfctrs. Perfctr code is moved to a separate function x86_setup_perfctr(). This and the following patches refactor the code. Split in multiple patches for better review. Signed-off-by: Robert Richter <robert.richter@amd.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <1271190201-25705-2-git-send-email-robert.richter@amd.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
-
Peter Zijlstra authored
Stephane reported a lockdep warning while using PERF_FORMAT_GROUP. The issue is that perf_event_read_group() takes faults while holding the ctx->mutex, while perf_event_release_kernel() can be called from munmap(). Which makes for an AB-BA deadlock. Except we can never establish the deadlock because we'll only ever call perf_event_release_kernel() after all file descriptors are dead so there is no concurrency possible. Reported-by: Stephane Eranian <eranian@google.com> Cc: Paul Mackerras <paulus@samba.org> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
-
Ingo Molnar authored
Merge reason: Resolve patch dependency Signed-off-by: Ingo Molnar <mingo@elte.hu>
-
Peter Zijlstra authored
Both Stephane and Corey reported that PERF_FORMAT_GROUP didn't work as expected if the task the counters were attached to quit before the read() call. The cause is that we unconditionally destroy the grouping when we remove counters from their context. Fix this by only doing this when we free the counter itself. Reported-by: Corey Ashford <cjashfor@linux.vnet.ibm.com> Reported-by: Stephane Eranian <eranian@google.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <1273160566.5605.404.camel@twins> Signed-off-by: Ingo Molnar <mingo@elte.hu>
-
- 06 May, 2010 1 commit
-
-
Dan Carpenter authored
The original code doesn't work because "call" is never NULL there. Signed-off-by: Dan Carpenter <error27@gmail.com> LKML-Reference: <20100320143911.GF5331@bicker> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
-
- 05 May, 2010 19 commits
-
-
git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/misc-2.6Linus Torvalds authored
* 'zerolen' of git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/misc-2.6: [MTD] Remove zero-length files mtdbdi.c and internal.ho
-
git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/libata-devLinus Torvalds authored
* 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/libata-dev: pata_pcmcia / ide-cs: Fix bad hashes for Transcend and kingston IDs libata: Fix several inaccuracies in developer's guide
-
Jeff Garzik authored
Both were "removed" in commit a33eb6b9. Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
-
Kristoffer Ericson authored
This patch fixes the bad hashes for one Kingston and one Transcend card. Thanks to komuro for pointing this out. Signed-off-by: Kristoffer Ericson <kristoffer.ericson@gmail.com> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
-
Sergei Shtylyov authored
Commit 6bfff31e (libata: kill probe_ent and related helpers) killed ata_device_add() but didn't remove references to it from the libata developer's guide. Commits 9363c382 (libata: rename SFF functions) and 5682ed33 (libata: rename SFF port ops) renamed the taskfile access methods but didn't update the developer's guide. Commit c9f75b04 (libata: kill ata_noop_dev_select()) didn't update the developer's guide as well. The guide also refers to the long gone ata_pio_data_xfer_noirq(), ata_pio_data_xfer(), and ata_mmio_data_xfer() -- replace those by the modern ata_sff_data_xfer_noirq(), ata_sff_data_xfer(), and ata_sff_data_xfer32(). Also, remove the reference to non-existant ata_port_stop()... Signed-off-by: Sergei Shtylyov <sshtylyov@ru.mvista.com> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
-
git://git.kernel.org/pub/scm/linux/kernel/git/penberg/slab-2.6Linus Torvalds authored
* 'slab-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/slab-2.6: slub: Fix bad boundary check in init_kmem_cache_nodes()
-
Zhang, Yanmin authored
Function init_kmem_cache_nodes is incorrect when checking upper limitation of kmalloc_caches. The breakage was introduced by commit 91efd773 ("dma kmalloc handling fixes"). Acked-by: Christoph Lameter <cl@linux-foundation.org> Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
-
Linus Torvalds authored
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6: KEYS: call_sbin_request_key() must write lock keyrings before modifying them KEYS: Use RCU dereference wrappers in keyring key type code KEYS: find_keyring_by_name() can gain access to a freed keyring
-
git://git.kernel.org/pub/scm/linux/kernel/git/tj/wqLinus Torvalds authored
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: workqueue: flush_delayed_work: keep the original workqueue for re-queueing
-
git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6Linus Torvalds authored
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: FEC: Fix kernel panic in fec_set_mac_address. ipv6: Fix default multicast hops setting. net: ep93xx_eth stops receiving packets drivers/net/phy: micrel phy driver dm9601: fix phy/eeprom write routine ppp_generic: handle non-linear skbs when passing them to pppd ppp_generic: pull 2 bytes so that PPP_PROTO(skb) is valid net: fix compile error due to double return type in SOCK_DEBUG net/usb: initiate sync sequence in sierra_net.c driver net/usb: remove default in Kconfig for sierra_net driver r8169: Fix rtl8169_rx_interrupt() e1000e: Fix oops caused by ASPM patch. net/sb1250: register mdio bus in probe sctp: Fix skb_over_panic resulting from multiple invalid parameter errors (CVE-2010-1173) (v4) p54pci: fix bugs in p54p_check_tx_ring
-
git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6Linus Torvalds authored
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6: ALSA: hda: Fix 0 dB for Packard Bell models using Conexant CX20549 (Venice) ALSA: hda - Add quirk for Dell Inspiron 19T using a Conexant CX20582 ALSA: take tu->qlock with irqs disabled ALSA: hda: Use olpc-xo-1_5 quirk for Toshiba Satellite P500-PSPGSC-01800T ALSA: hda: Use olpc-xo-1_5 quirk for Toshiba Satellite Pro T130-15F ALSA: hda - fix array indexing while creating inputs for Cirrus codecs ALSA: es968: fix wrong PnP dma index
-
git://git.kernel.org/pub/scm/linux/kernel/git/dtor/inputLinus Torvalds authored
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input: Input: joydev - allow binding to button-only devices Input: elantech - ignore high bits in the position coordinates Input: elantech - allow forcing Elantech protocol Input: elantech - fix firmware version check Input: ati_remote - add some missing devices from lirc_atiusb Input: eeti_ts - cancel pending work when going to suspend Input: Add support of Synaptics Clickpad device Revert "Input: ALPS - add signature for HP Pavilion dm3 laptops" Input: psmouse - ignore parity error for basic protocols
-
Dan Williams authored
The raid6 recovery code should immediately drop back to the optimized synchronous path when a p+q dma resource is not available. Otherwise we run the non-optimized/multi-pass async code in sync mode. Verified with raid6test (NDISKS=255) Applies to kernels >= 2.6.32. Cc: <stable@kernel.org> Acked-by: NeilBrown <neilb@suse.de> Reported-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Arnaldo Carvalho de Melo authored
Cc: Frédéric Weisbecker <fweisbec@gmail.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Tom Zanussi <tzanussi@gmail.com> LKML-Reference: <new-submission> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
Arnaldo Carvalho de Melo authored
Using explanation given by Ingo Molnar in the oprofile mailing list. Suggested-by: Nick Black <dank@qemfd.net> Cc: Frédéric Weisbecker <fweisbec@gmail.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Nick Black <dank@qemfd.net> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Tom Zanussi <tzanussi@gmail.com> LKML-Reference: <new-submission> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
Tom Zanussi authored
Fix a couple of inefficiencies and redundancies related to have_tracepoints() and its use when checking whether to write TRACE_INFO. First, there's no need to use get_tracepoints_path() in have_tracepoints() - we really just want the part that checks whether any attributes correspondo to tracepoints. Second, we really don't care about raw_samples per se - tracepoints are always raw_samples. In any case, the have_tracepoints() check should be sufficient to decide whether or not to write TRACE_INFO. Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@elte.hu>, Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Frederic Weisbecker <fweisbec@gmail.com> LKML-Reference: <1273030770.6383.6.camel@tropicana> Signed-off-by: Tom Zanussi <tzanussi@gmail.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
David Howells authored
call_sbin_request_key() creates a keyring and then attempts to insert a link to the authorisation key into that keyring, but does so without holding a write lock on the keyring semaphore. It will normally get away with this because it hasn't told anyone that the keyring exists yet. The new keyring, however, has had its serial number published, which means it can be accessed directly by that handle. This was found by a previous patch that adds RCU lockdep checks to the code that reads the keyring payload pointer, which includes a check that the keyring semaphore is actually locked. Without this patch, the following command: keyctl request2 user b a @s will provoke the following lockdep warning is displayed in dmesg: =================================================== [ INFO: suspicious rcu_dereference_check() usage. ] --------------------------------------------------- security/keys/keyring.c:727 invoked rcu_dereference_check() without protection! other info that might help us debug this: rcu_scheduler_active = 1, debug_locks = 0 2 locks held by keyctl/2076: #0: (key_types_sem){.+.+.+}, at: [<ffffffff811a5b29>] key_type_lookup+0x1c/0x71 #1: (keyring_serialise_link_sem){+.+.+.}, at: [<ffffffff811a6d1e>] __key_link+0x4d/0x3c5 stack backtrace: Pid: 2076, comm: keyctl Not tainted 2.6.34-rc6-cachefs #54 Call Trace: [<ffffffff81051fdc>] lockdep_rcu_dereference+0xaa/0xb2 [<ffffffff811a6d1e>] ? __key_link+0x4d/0x3c5 [<ffffffff811a6e6f>] __key_link+0x19e/0x3c5 [<ffffffff811a5952>] ? __key_instantiate_and_link+0xb1/0xdc [<ffffffff811a59bf>] ? key_instantiate_and_link+0x42/0x5f [<ffffffff811aa0dc>] call_sbin_request_key+0xe7/0x33b [<ffffffff8139376a>] ? mutex_unlock+0x9/0xb [<ffffffff811a5952>] ? __key_instantiate_and_link+0xb1/0xdc [<ffffffff811a59bf>] ? key_instantiate_and_link+0x42/0x5f [<ffffffff811aa6fa>] ? request_key_auth_new+0x1c2/0x23c [<ffffffff810aaf15>] ? cache_alloc_debugcheck_after+0x108/0x173 [<ffffffff811a9d00>] ? request_key_and_link+0x146/0x300 [<ffffffff810ac568>] ? kmem_cache_alloc+0xe1/0x118 [<ffffffff811a9e45>] request_key_and_link+0x28b/0x300 [<ffffffff811a89ac>] sys_request_key+0xf7/0x14a [<ffffffff81052c0b>] ? trace_hardirqs_on_caller+0x10c/0x130 [<ffffffff81394fb9>] ? trace_hardirqs_on_thunk+0x3a/0x3f [<ffffffff81001eeb>] system_call_fastpath+0x16/0x1b Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: James Morris <jmorris@namei.org>
-
David Howells authored
The keyring key type code should use RCU dereference wrappers, even when it holds the keyring's key semaphore. Reported-by: Vegard Nossum <vegard.nossum@gmail.com> Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Serge Hallyn <serue@us.ibm.com> Signed-off-by: James Morris <jmorris@namei.org>
-
Toshiyuki Okajima authored
find_keyring_by_name() can gain access to a keyring that has had its reference count reduced to zero, and is thus ready to be freed. This then allows the dead keyring to be brought back into use whilst it is being destroyed. The following timeline illustrates the process: |(cleaner) (user) | | free_user(user) sys_keyctl() | | | | key_put(user->session_keyring) keyctl_get_keyring_ID() | || //=> keyring->usage = 0 | | |schedule_work(&key_cleanup_task) lookup_user_key() | || | | kmem_cache_free(,user) | | . |[KEY_SPEC_USER_KEYRING] | . install_user_keyrings() | . || | key_cleanup() [<= worker_thread()] || | | || | [spin_lock(&key_serial_lock)] |[mutex_lock(&key_user_keyr..mutex)] | | || | atomic_read() == 0 || | |{ rb_ease(&key->serial_node,) } || | | || | [spin_unlock(&key_serial_lock)] |find_keyring_by_name() | | ||| | keyring_destroy(keyring) ||[read_lock(&keyring_name_lock)] | || ||| | |[write_lock(&keyring_name_lock)] ||atomic_inc(&keyring->usage) | |. ||| *** GET freeing keyring *** | |. ||[read_unlock(&keyring_name_lock)] | || || | |list_del() |[mutex_unlock(&key_user_k..mutex)] | || | | |[write_unlock(&keyring_name_lock)] ** INVALID keyring is returned ** | | . | kmem_cache_free(,keyring) . | . | atomic_dec(&keyring->usage) v *** DESTROYED *** TIME If CONFIG_SLUB_DEBUG=y then we may see the following message generated: ============================================================================= BUG key_jar: Poison overwritten ----------------------------------------------------------------------------- INFO: 0xffff880197a7e200-0xffff880197a7e200. First byte 0x6a instead of 0x6b INFO: Allocated in key_alloc+0x10b/0x35f age=25 cpu=1 pid=5086 INFO: Freed in key_cleanup+0xd0/0xd5 age=12 cpu=1 pid=10 INFO: Slab 0xffffea000592cb90 objects=16 used=2 fp=0xffff880197a7e200 flags=0x200000000000c3 INFO: Object 0xffff880197a7e200 @offset=512 fp=0xffff880197a7e300 Bytes b4 0xffff880197a7e1f0: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a ZZZZZZZZZZZZZZZZ Object 0xffff880197a7e200: 6a 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b jkkkkkkkkkkkkkkk Alternatively, we may see a system panic happen, such as: BUG: unable to handle kernel NULL pointer dereference at 0000000000000001 IP: [<ffffffff810e61a3>] kmem_cache_alloc+0x5b/0xe9 PGD 6b2b4067 PUD 6a80d067 PMD 0 Oops: 0000 [#1] SMP last sysfs file: /sys/kernel/kexec_crash_loaded CPU 1 ... Pid: 31245, comm: su Not tainted 2.6.34-rc5-nofixed-nodebug #2 D2089/PRIMERGY RIP: 0010:[<ffffffff810e61a3>] [<ffffffff810e61a3>] kmem_cache_alloc+0x5b/0xe9 RSP: 0018:ffff88006af3bd98 EFLAGS: 00010002 RAX: 0000000000000000 RBX: 0000000000000001 RCX: ffff88007d19900b RDX: 0000000100000000 RSI: 00000000000080d0 RDI: ffffffff81828430 RBP: ffffffff81828430 R08: ffff88000a293750 R09: 0000000000000000 R10: 0000000000000001 R11: 0000000000100000 R12: 00000000000080d0 R13: 00000000000080d0 R14: 0000000000000296 R15: ffffffff810f20ce FS: 00007f97116bc700(0000) GS:ffff88000a280000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000001 CR3: 000000006a91c000 CR4: 00000000000006e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Process su (pid: 31245, threadinfo ffff88006af3a000, task ffff8800374414c0) Stack: 0000000512e0958e 0000000000008000 ffff880037f8d180 0000000000000001 0000000000000000 0000000000008001 ffff88007d199000 ffffffff810f20ce 0000000000008000 ffff88006af3be48 0000000000000024 ffffffff810face3 Call Trace: [<ffffffff810f20ce>] ? get_empty_filp+0x70/0x12f [<ffffffff810face3>] ? do_filp_open+0x145/0x590 [<ffffffff810ce208>] ? tlb_finish_mmu+0x2a/0x33 [<ffffffff810ce43c>] ? unmap_region+0xd3/0xe2 [<ffffffff810e4393>] ? virt_to_head_page+0x9/0x2d [<ffffffff81103916>] ? alloc_fd+0x69/0x10e [<ffffffff810ef4ed>] ? do_sys_open+0x56/0xfc [<ffffffff81008a02>] ? system_call_fastpath+0x16/0x1b Code: 0f 1f 44 00 00 49 89 c6 fa 66 0f 1f 44 00 00 65 4c 8b 04 25 60 e8 00 00 48 8b 45 00 49 01 c0 49 8b 18 48 85 db 74 0d 48 63 45 18 <48> 8b 04 03 49 89 00 eb 14 4c 89 f9 83 ca ff 44 89 e6 48 89 ef RIP [<ffffffff810e61a3>] kmem_cache_alloc+0x5b/0xe9 This problem is that find_keyring_by_name does not confirm that the keyring is valid before accepting it. Skipping keyrings that have been reduced to a zero count seems the way to go. To this end, use atomic_inc_not_zero() to increment the usage count and skip the candidate keyring if that returns false. The following script _may_ cause the bug to happen, but there's no guarantee as the window of opportunity is small: #!/bin/sh LOOP=100000 USER=dummy_user /bin/su -c "exit;" $USER || { /usr/sbin/adduser -m $USER; add=1; } for ((i=0; i<LOOP; i++)) do /bin/su -c "echo '$i' > /dev/null" $USER done (( add == 1 )) && /usr/sbin/userdel -r $USER exit Note that the nominated user must not be in use. An alternative way of testing this may be: for ((i=0; i<100000; i++)) do keyctl session foo /bin/true || break done >&/dev/null as that uses a keyring named "foo" rather than relying on the user and user-session named keyrings. Reported-by: Toshiyuki Okajima <toshi.okajima@jp.fujitsu.com> Signed-off-by: David Howells <dhowells@redhat.com> Tested-by: Toshiyuki Okajima <toshi.okajima@jp.fujitsu.com> Acked-by: Serge Hallyn <serue@us.ibm.com> Signed-off-by: James Morris <jmorris@namei.org>
-