- 13 Apr, 2010 1 commit
-
-
Gui Jianfeng authored
Currently, IO Controller makes use of blkio.weight to assign weight for all devices. Here a new user interface "blkio.weight_device" is introduced to assign different weights for different devices. blkio.weight becomes the default value for devices which are not configured by "blkio.weight_device" You can use the following format to assigned specific weight for a given device: #echo "major:minor weight" > blkio.weight_device major:minor represents device number. And you can remove weight for a given device as following: #echo "major:minor 0" > blkio.weight_device V1->V2 changes: - use user interface "weight_device" instead of "policy" suggested by Vivek - rename some struct suggested by Vivek - rebase to 2.6-block "for-linus" branch - remove an useless list_empty check pointed out by Li Zefan - some trivial typo fix V2->V3 changes: - Move policy_*_node() functions up to get rid of forward declarations - rename related functions by adding prefix "blkio_" Signed-off-by: Gui Jianfeng <guijianfeng@cn.fujitsu.com> Acked-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
-
- 09 Apr, 2010 4 commits
-
-
Divyesh Shah authored
1) group_wait_time - This is the amount of time the cgroup had to wait to get a timeslice for one of its queues from when it became busy, i.e., went from 0 to 1 request queued. This is different from the io_wait_time which is the cumulative total of the amount of time spent by each IO in that cgroup waiting in the scheduler queue. This stat is a great way to find out any jobs in the fleet that are being starved or waiting for longer than what is expected (due to an IO controller bug or any other issue). 2) empty_time - This is the amount of time a cgroup spends w/o any pending requests. This stat is useful when a job does not seem to be able to use its assigned disk share by helping check if that is happening due to an IO controller bug or because the job is not submitting enough IOs. 3) idle_time - This is the amount of time spent by the IO scheduler idling for a given cgroup in anticipation of a better request than the exising ones from other queues/cgroups. All these stats are recorded using start and stop events. When reading these stats, we do not add the delta between the current time and the last start time if we're between the start and stop events. We avoid doing this to make sure that these numbers are always monotonically increasing when read. Since we're using sched_clock() which may use the tsc as its source, it may induce some inconsistency (due to tsc resync across cpus) if we included the current delta. Signed-off-by: Divyesh Shah<dpshah@google.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
-
Divyesh Shah authored
These stats are useful for getting a feel for the queue depth of the cgroup, i.e., how filled up its queues are at a given instant and over the existence of the cgroup. This ability is useful when debugging problems in the wild as it helps understand the application's IO pattern w/o having to read through the userspace code (coz its tedious or just not available) or w/o the ability to run blktrace (since you may not have root access and/or not want to disturb performance). Signed-off-by: Divyesh Shah<dpshah@google.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
-
Divyesh Shah authored
This includes both the number of bios merged into requests belonging to this cgroup as well as the number of requests merged together. In the past, we've observed different merging behavior across upstream kernels, some by design some actual bugs. This stat helps a lot in debugging such problems when applications report decreased throughput with a new kernel version. This needed adding an extra elevator function to capture bios being merged as I did not want to pollute elevator code with blkiocg knowledge and hence needed the accounting invocation to come from CFQ. Signed-off-by: Divyesh Shah<dpshah@google.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
-
Divyesh Shah authored
that include some minor fixes and addresses all comments. Changelog: (most based on Vivek Goyal's comments) o renamed blkiocg_reset_write to blkiocg_reset_stats o more clarification in the documentation on io_service_time and io_wait_time o Initialize blkg->stats_lock o rename io_add_stat to blkio_add_stat and declare it static o use bool for direction and sync o derive direction and sync info from existing rq methods o use 12 for major:minor string length o define io_service_time better to cover the NCQ case o add a separate reset_stats interface o make the indexed stats a 2d array to simplify macro and function pointer code o blkio.time now exports in jiffies as before o Added stats description in patch description and Documentation/cgroup/blkio-controller.txt o Prefix all stats functions with blkio and make them static as applicable o replace IO_TYPE_MAX with IO_TYPE_TOTAL o Moved #define constant to top of blk-cgroup.c o Pass dev_t around instead of char * o Add note to documentation file about resetting stats o use BLK_CGROUP_MODULE in addition to BLK_CGROUP config option in #ifdef statements o Avoid struct request specific knowledge in blk-cgroup. blk-cgroup.h now has rq_direction() and rq_sync() functions which are used by CFQ and when using io-controller at a higher level, bio_* functions can be added. Signed-off-by: Divyesh Shah<dpshah@google.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
-
- 06 Apr, 2010 1 commit
-
-
Matthew Garrett authored
One of the features of laptop-mode is that it forces a writeout of dirty pages if something else triggers a physical read or write from a device. The current implementation flushes pages on all devices, rather than only the one that triggered the flush. This patch alters the behaviour so that only the recently accessed block device is flushed, preventing other disks being spun up for no terribly good reason. Signed-off-by: Matthew Garrett <mjg@redhat.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
-
- 02 Apr, 2010 7 commits
-
-
Divyesh Shah authored
We also add start_time_ns and io_start_time_ns fields to struct request here to record the time when a request is created and when it is dispatched to device. We use ns uints here as ms and jiffies are not very useful for non-rotational media. Signed-off-by: Divyesh Shah<dpshah@google.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
-
Divyesh Shah authored
- io_service_time - io_wait_time - io_serviced - io_service_bytes These stats are accumulated per operation type helping us to distinguish between read and write, and sync and async IO. This patch does not increment any of these stats. Signed-off-by: Divyesh Shah<dpshah@google.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
-
Divyesh Shah authored
that info at request dispatch with other stats now. This patch removes the existing support for accounting sectors for a blkio_group. This will be added back differently in the next two patches. Signed-off-by: Divyesh Shah<dpshah@google.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
-
Jens Axboe authored
-
wzt.wzt@gmail.com authored
elevator_get() not check the name length, if the name length > sizeof(elv), elv will miss the '\0'. And elv buffer will be replace "-iosched" as something like aaaaaaaaa, then call request_module() can load an not trust module. Signed-off-by: Zhitong Wang <zhitong.wangzt@alibaba-inc.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
-
Philipp Reisner authored
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
-
Dan Carpenter authored
We take the spin_lock again in fail_all_cmds() so we need to unlock here. Signed-off-by: Dan Carpenter <error27@gmail.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
-
- 30 Mar, 2010 6 commits
-
-
Linus Torvalds authored
-
David Howells authored
Add a MAINTAINERS record for the key management facility. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Linus Torvalds authored
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6: CRED: Fix memory leak in error handling
-
git://git.kernel.org/pub/scm/linux/kernel/git/joern/logfsLinus Torvalds authored
* git://git.kernel.org/pub/scm/linux/kernel/git/joern/logfs: [LogFS] Erase new journal segments [LogFS] Move reserved segments with journal [LogFS] Clear PagePrivate when moving journal Simplify and fix pad_wbuf Prevent data corruption in logfs_rewrite_block() Use deactivate_locked_super Fix logfs_get_sb_final error path Write out both superblocks on mismatch Prevent schedule while atomic in __logfs_readdir Plug memory leak in writeseg_end_io Limit max_pages for insane devices Open segment file before using it
-
Linus Torvalds authored
Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86: Do not free zero sized per cpu areas x86: Make sure free_init_pages() frees pages on page boundary x86: Make smp_locks end with page alignment
-
Mathieu Desnoyers authored
Fix a memory leak on an OOM condition in prepare_usermodehelper_creds(). Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: James Morris <jmorris@namei.org>
-
- 29 Mar, 2010 21 commits
-
-
git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2Linus Torvalds authored
* 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2: ocfs2: Fix a race in o2dlm lockres mastery Ocfs2: Handle deletion of reflinked oprhan inodes correctly. Ocfs2: Journaling i_flags and i_orphaned_slot when adding inode to orphan dir. ocfs2: Clear undo bits when local alloc is freed ocfs2: Init meta_ac properly in ocfs2_create_empty_xattr_block. ocfs2: Fix the update of name_offset when removing xattrs ocfs2: Always try for maximum bits with new local alloc windows ocfs2: set i_mode on disk during acl operations ocfs2: Update i_blocks in reflink operations. ocfs2: Change bg_chain check for ocfs2_validate_gd_parent. [PATCH] Skip check for mandatory locks when unlocking
-
git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-clientLinus Torvalds authored
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: (28 commits) ceph: update discussion list address in MAINTAINERS ceph: some documentations fixes ceph: fix use after free on mds __unregister_request ceph: avoid loaded term 'OSD' in documention ceph: fix possible double-free of mds request reference ceph: fix session check on mds reply ceph: handle kmalloc() failure ceph: propagate mds session allocation failures to caller ceph: make write_begin wait propagate ERESTARTSYS ceph: fix snap rebuild condition ceph: avoid reopening osd connections when address hasn't changed ceph: rename r_sent_stamp r_stamp ceph: fix connection fault con_work reentrancy problem ceph: prevent dup stale messages to console for restarting mds ceph: fix pg pool decoding from incremental osdmap update ceph: fix mds sync() race with completing requests ceph: only release unused caps with mds requests ceph: clean up handle_cap_grant, handle_caps wrt session mutex ceph: fix session locking in handle_caps, ceph_check_caps ceph: drop unnecessary WARN_ON in caps migration ...
-
git://git.kernel.org/pub/scm/linux/kernel/git/jdelvare/stagingLinus Torvalds authored
* 'hwmon-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jdelvare/staging: hwmon: (asc7621) Add X58 entry in Kconfig hwmon: (w83793) Saving negative errors in unsigned hwmon: (coretemp) Add missing newline to dev_warn() message hwmon: (coretemp) Fix cpu model output
-
git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/libata-devLinus Torvalds authored
* 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/libata-dev: pata_via: fix VT6410/6415/6330 detection issue
-
git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6Linus Torvalds authored
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (33 commits) r8169: offical fix for CVE-2009-4537 (overlength frame DMAs) ipv6: Don't drop cache route entry unless timer actually expired. tulip: Add missing parens. r8169: fix broken register writes pcnet_cs: add new id bonding: fix broken multicast with round-robin mode drivers/net: Fix continuation lines e1000: do not modify tx_queue_len on link speed change net: ipmr/ip6mr: prevent out-of-bounds vif_table access ixgbe: Do not run all Diagnostic offline tests when VFs are active igb: use correct bits to identify if managability is enabled benet: Fix compile warnnings in drivers/net/benet/be_ethtool.c net: Add MSG_WAITFORONE flag to recvmmsg e1000e: do not modify tx_queue_len on link speed change igbvf: do not modify tx_queue_len on link speed change ipv4: Restart rt_intern_hash after emergency rebuild (v2) ipv4: Cleanup struct net dereference in rt_intern_hash net: fix netlink address dumping in IPv4/IPv6 tulip: Fix null dereference in uli526x_rx_packet() gianfar: fix undo of reserve() ...
-
git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc-2.6Linus Torvalds authored
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc-2.6: sparc64: Properly truncate pt_regs framepointer in perf callback. arch/sparc/kernel: Use set_cpus_allowed_ptr sparc: Fix use of uid16_t and gid16_t in asm/stat.h
-
Linus Torvalds authored
In commit 9df93939 ("ext3: Use bitops to read/modify EXT3_I(inode)->i_state") ext3 changed its internal 'i_state' variable to use bitops for its state handling. However, unline the same ext4 change, it didn't actually change the name of the field when it changed the semantics of it. As a result, an old use of 'i_state' remained in fs/ext3/ialloc.c that initialized the field to EXT3_STATE_NEW. And that does not work _at_all_ when we're now working with individually named bits rather than values that get masked. So the code tried to mark the state to be new, but in actual fact set the field to EXT3_STATE_JDATA. Which makes no sense at all, and screws up all the code that checks whether the inode was newly allocated. In particular, it made the xattr code unhappy, and caused various random behavior, like apparently https://bugzilla.redhat.com/show_bug.cgi?id=577911 So fix the initialization, and rename the field to match ext4 so that we don't have this happen again. Cc: James Morris <jmorris@namei.org> Cc: Stephen Smalley <sds@tycho.nsa.gov> Cc: Daniel J Walsh <dwalsh@redhat.com> Cc: Eric Paris <eparis@redhat.com> Cc: Jan Kara <jack@suse.cz> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Neil Horman authored
Official patch to fix the r8169 frame length check error. Based on this initial thread: http://marc.info/?l=linux-netdev&m=126202972828626&w=1 This is the official patch to fix the frame length problems in the r8169 driver. As noted in the previous thread, while this patch incurs a performance hit on the driver, its possible to improve performance dynamically by updating the mtu and rx_copybreak values at runtime to return performance to what it was for those NICS which are unaffected by the ideosyncracy (if there are any). Summary: A while back Eric submitted a patch for r8169 in which the proper allocated frame size was written to RXMaxSize to prevent the NIC from dmaing too much data. This was done in commit fdd7b4c3. A long time prior to that however, Francois posted 126fa4b9, which expiclitly disabled the MaxSize setting due to the fact that the hardware behaved in odd ways when overlong frames were received on NIC's supported by this driver. This was mentioned in a security conference recently: http://events.ccc.de/congress/2009/Fahrplan//events/3596.en.html It seems that if we can't enable frame size filtering, then, as Eric correctly noticed, we can find ourselves DMA-ing too much data to a buffer, causing corruption. As a result is seems that we are forced to allocate a frame which is ready to handle a maximally sized receive. This obviously has performance issues with it, so to mitigate that issue, this patch does two things: 1) Raises the copybreak value to the frame allocation size, which should force appropriately sized packets to get allocated on rx, rather than a full new 16k buffer. 2) This patch only disables frame filtering initially (i.e., during the NIC open), changing the MTU results in ring buffer allocation of a size in relation to the new mtu (along with a warning indicating that this is dangerous). Because of item (2), individuals who can't cope with the performance hit (or can otherwise filter frames to prevent the bug), or who have hardware they are sure is unaffected by this issue, can manually lower the copybreak and reset the mtu such that performance is restored easily. Signed-off-by: Neil Horman <nhorman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
For 32-bit processes, we save the full 64-bits of the regs in pt_regs. But unlike when the userspace actually does load and store instructions, the top 32-bits don't get automatically truncated by the cpu in kernel mode (because the kernel doesn't execute with PSTATE_AM address masking enabled). So we have to do it by hand. Reported-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jaswinder Singh Rajput authored
Intel X58 have asc7621a chip. So added X58 entry in Kconfig for asc7621. Also arranged existing models in ascending order. Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com> Signed-off-by: Jean Delvare <khali@linux-fr.org>
-
Dan Carpenter authored
"ret" is used to store the return value for watchdog_trigger() and it should be signed for the error handling to work. Signed-off-by: Dan Carpenter <error27@gmail.com> Acked-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Jean Delvare <khali@linux-fr.org>
-
Dean Nelson authored
Add missing newline to dev_warn() message string. This is more of an issue with older kernels that don't automatically add a newline if it was missing from the end of the previous line. Signed-off-by: Dean Nelson <dnelson@redhat.com> Cc: stable@kernel.org Signed-off-by: Jean Delvare <khali@linux-fr.org>
-
Prarit Bhargava authored
Avoid hex and decimal confusion when printing out the cpu model. Signed-off-by: Prarit Bhargava <prarit@redhat.com> Signed-off-by: Jean Delvare <khali@linux-fr.org>
-
Joern Engel authored
If the device contains on old logfs image and the journal is moved to segment that have never been used by the current logfs and not all journal segments are erased before the next mount, the old content can confuse mount code. To prevent this, always erase the new journal segments. Signed-off-by: Joern Engel <joern@logfs.org>
-
Joern Engel authored
Fixes a GC livelock. Signed-off-by: Joern Engel <joern@logfs.org>
-
Ian Campbell authored
This avoids an infinite loop in free_early_partial(). Add a warning to free_early_partial() to catch future problems. -v5: put back start > end back into WARN_ONCE() -v6: use one line for warning, suggested by Linus -v7: more tests -v8: remove the function name as suggested by Johannes WARN_ONCE() will print out that function name. Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Signed-off-by: Yinghai Lu <yinghai@kernel.org> Tested-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Tested-by: Joel Becker <joel.becker@oracle.com> Tested-by: Stanislaw Gruszka <sgruszka@redhat.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: David Miller <davem@davemloft.net> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> LKML-Reference: <1269830604-26214-4-git-send-email-yinghai@kernel.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
-
Yinghai Lu authored
When CONFIG_NO_BOOTMEM=y, it could use memory more effiently, or in a more compact fashion. Example: Allocated new RAMDISK: 00ec2000 - 0248ce57 Move RAMDISK from 000000002ea04000 - 000000002ffcee56 to 00ec2000 - 0248ce56 The new RAMDISK's end is not page aligned. Last page could be shared with other users. When free_init_pages are called for initrd or .init, the page could be freed and we could corrupt other data. code segment in free_init_pages(): | for (; addr < end; addr += PAGE_SIZE) { | ClearPageReserved(virt_to_page(addr)); | init_page_count(virt_to_page(addr)); | memset((void *)(addr & ~(PAGE_SIZE-1)), | POISON_FREE_INITMEM, PAGE_SIZE); | free_page(addr); | totalram_pages++; | } last half page could be used as one whole free page. So page align the boundaries. -v2: make the original initramdisk to be aligned, according to Johannes, otherwise we have the chance to lose one page. we still need to keep initrd_end not aligned, otherwise it could confuse decompressor. -v3: change to WARN_ON instead, suggested by Johannes. -v4: use PAGE_ALIGN, suggested by Johannes. We may fix that macro name later to PAGE_ALIGN_UP, and PAGE_ALIGN_DOWN Add comments about assuming ramdisk start is aligned in relocate_initrd(), change to re get ramdisk_image instead of save it to make diff smaller. Add warning for wrong range, suggested by Johannes. -v6: remove one WARN() We need to align beginning in free_init_pages() do not copy more than ramdisk_size, noticed by Johannes Reported-by: Stanislaw Gruszka <sgruszka@redhat.com> Tested-by: Stanislaw Gruszka <sgruszka@redhat.com> Signed-off-by: Yinghai Lu <yinghai@kernel.org> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: David Miller <davem@davemloft.net> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> LKML-Reference: <1269830604-26214-3-git-send-email-yinghai@kernel.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
-
Sage Weil authored
Signed-off-by: Sage Weil <sage@newdream.net>
-
Cheng Renquan authored
New documentation should have an entry in the 00-INDEX. Correct git urls. Signed-off-by: Cheng Renquan <crquan@gmail.com> Signed-off-by: Sage Weil <sage@newdream.net>
-
Yinghai Lu authored
Fix: ------------[ cut here ]------------ WARNING: at arch/x86/mm/init.c:342 free_init_pages+0x4c/0xfa() free_init_pages: range [0x40daf000, 0x40db5c24] is not aligned Modules linked in: Pid: 0, comm: swapper Not tainted 2.6.34-rc2-tip-03946-g4f16b23-dirty #50 Call Trace: [<40232e9f>] warn_slowpath_common+0x65/0x7c [<4021c9f0>] ? free_init_pages+0x4c/0xfa [<40881434>] ? _etext+0x0/0x24 [<40232eea>] warn_slowpath_fmt+0x24/0x27 [<4021c9f0>] free_init_pages+0x4c/0xfa [<40881434>] ? _etext+0x0/0x24 [<40d3f4bd>] alternative_instructions+0xf6/0x100 [<40d3fe4f>] check_bugs+0xbd/0xbf [<40d398a7>] start_kernel+0x2d5/0x2e4 [<40d390ce>] i386_start_kernel+0xce/0xd5 ---[ end trace 4eaa2a86a8e2da22 ]--- Comments in vmlinux.lds.S already said: | /* | * smp_locks might be freed after init | * start/end must be page aligned | */ Signed-off-by: Yinghai Lu <yinghai@kernel.org> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: David Miller <davem@davemloft.net> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> LKML-Reference: <1269830604-26214-2-git-send-email-yinghai@kernel.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
-
git://git.kernel.org/pub/scm/linux/kernel/git/davem/ide-2.6Linus Torvalds authored
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/ide-2.6: Revert "ide: skip probe if there are no devices on the port (v2)" Revert "via82cxxx: workaround h/w bugs"
-