Commits · 34d0f179d6dd711d3fc13c0820a456c59aae8048 · Kirill Smelkov / linux

13 Apr, 2010 1 commit

io-controller: Add a new interface "weight_device" for IO-Controller · 34d0f179

Gui Jianfeng authored Apr 13, 2010

Currently, IO Controller makes use of blkio.weight to assign weight for
all devices. Here a new user interface "blkio.weight_device" is introduced to
assign different weights for different devices. blkio.weight becomes the
default value for devices which are not configured by "blkio.weight_device"

You can use the following format to assigned specific weight for a given
device:
#echo "major:minor weight" > blkio.weight_device

major:minor represents device number.

And you can remove weight for a given device as following:
#echo "major:minor 0" > blkio.weight_device

V1->V2 changes:
- use user interface "weight_device" instead of "policy" suggested by Vivek
- rename some struct suggested by Vivek
- rebase to 2.6-block "for-linus" branch
- remove an useless list_empty check pointed out by Li Zefan
- some trivial typo fix

V2->V3 changes:
- Move policy_*_node() functions up to get rid of forward declarations
- rename related functions by adding prefix "blkio_"
Signed-off-by: Gui Jianfeng <guijianfeng@cn.fujitsu.com>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

34d0f179

09 Apr, 2010 4 commits

blkio: Add more debug-only per-cgroup stats · 812df48d

Divyesh Shah authored Apr 08, 2010

1) group_wait_time - This is the amount of time the cgroup had to wait to get a
timeslice for one of its queues from when it became busy, i.e., went from 0
to 1 request queued. This is different from the io_wait_time which is the
cumulative total of the amount of time spent by each IO in that cgroup waiting
in the scheduler queue. This stat is a great way to find out any jobs in the
fleet that are being starved or waiting for longer than what is expected (due
to an IO controller bug or any other issue).
2) empty_time - This is the amount of time a cgroup spends w/o any pending
requests. This stat is useful when a job does not seem to be able to use its
assigned disk share by helping check if that is happening due to an IO
controller bug or because the job is not submitting enough IOs.
3) idle_time - This is the amount of time spent by the IO scheduler idling
for a given cgroup in anticipation of a better request than the exising ones
from other queues/cgroups.

All these stats are recorded using start and stop events. When reading these
stats, we do not add the delta between the current time and the last start time
if we're between the start and stop events. We avoid doing this to make sure
that these numbers are always monotonically increasing when read. Since we're
using sched_clock() which may use the tsc as its source, it may induce some
inconsistency (due to tsc resync across cpus) if we included the current delta.

Signed-off-by: Divyesh Shah<dpshah@google.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

812df48d

blkio: Add io_queued and avg_queue_size stats · cdc1184c

Divyesh Shah authored Apr 08, 2010

These stats are useful for getting a feel for the queue depth of the cgroup,
i.e., how filled up its queues are at a given instant and over the existence of
the cgroup. This ability is useful when debugging problems in the wild as it
helps understand the application's IO pattern w/o having to read through the
userspace code (coz its tedious or just not available) or w/o the ability
to run blktrace (since you may not have root access and/or not want to disturb
performance).

Signed-off-by: Divyesh Shah<dpshah@google.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

cdc1184c

blkio: Add io_merged stat · 812d4026

Divyesh Shah authored Apr 08, 2010

This includes both the number of bios merged into requests belonging to this
cgroup as well as the number of requests merged together.
In the past, we've observed different merging behavior across upstream kernels,
some by design some actual bugs. This stat helps a lot in debugging such
problems when applications report decreased throughput with a new kernel
version.

This needed adding an extra elevator function to capture bios being merged as I
did not want to pollute elevator code with blkiocg knowledge and hence needed
the accounting invocation to come from CFQ.

Signed-off-by: Divyesh Shah<dpshah@google.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

812d4026

blkio: Changes to IO controller additional stats patches · 84c124da

Divyesh Shah authored Apr 09, 2010

that include some minor fixes and addresses all comments.

Changelog: (most based on Vivek Goyal's comments)
o renamed blkiocg_reset_write to blkiocg_reset_stats
o more clarification in the documentation on io_service_time and io_wait_time
o Initialize blkg->stats_lock
o rename io_add_stat to blkio_add_stat and declare it static
o use bool for direction and sync
o derive direction and sync info from existing rq methods
o use 12 for major:minor string length
o define io_service_time better to cover the NCQ case
o add a separate reset_stats interface
o make the indexed stats a 2d array to simplify macro and function pointer code
o blkio.time now exports in jiffies as before
o Added stats description in patch description and
  Documentation/cgroup/blkio-controller.txt
o Prefix all stats functions with blkio and make them static as applicable
o replace IO_TYPE_MAX with IO_TYPE_TOTAL
o Moved #define constant to top of blk-cgroup.c
o Pass dev_t around instead of char *
o Add note to documentation file about resetting stats
o use BLK_CGROUP_MODULE in addition to BLK_CGROUP config option in #ifdef
  statements
o Avoid struct request specific knowledge in blk-cgroup. blk-cgroup.h now has
  rq_direction() and rq_sync() functions which are used by CFQ and when using
  io-controller at a higher level, bio_* functions can be added.

Signed-off-by: Divyesh Shah<dpshah@google.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

84c124da

06 Apr, 2010 1 commit

laptop-mode: Make flushes per-device · 31373d09

Matthew Garrett authored Apr 06, 2010

One of the features of laptop-mode is that it forces a writeout of dirty
pages if something else triggers a physical read or write from a device.
The current implementation flushes pages on all devices, rather than only
the one that triggered the flush. This patch alters the behaviour so that
only the recently accessed block device is flushed, preventing other
disks being spun up for no terribly good reason.
Signed-off-by: Matthew Garrett <mjg@redhat.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

31373d09

02 Apr, 2010 7 commits

blkio: Increment the blkio cgroup stats for real now · 9195291e

Divyesh Shah authored Apr 01, 2010

We also add start_time_ns and io_start_time_ns fields to struct request
here to record the time when a request is created and when it is
dispatched to device. We use ns uints here as ms and jiffies are
not very useful for non-rotational media.

Signed-off-by: Divyesh Shah<dpshah@google.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

9195291e

blkio: Add io controller stats like · 303a3acb

Divyesh Shah authored Apr 01, 2010

- io_service_time
- io_wait_time
- io_serviced
- io_service_bytes

These stats are accumulated per operation type helping us to distinguish between
read and write, and sync and async IO. This patch does not increment any of
these stats.

Signed-off-by: Divyesh Shah<dpshah@google.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

303a3acb

blkio: Remove per-cfqq nr_sectors as we'll be passing · 9a0785b0

Divyesh Shah authored Apr 01, 2010

that info at request dispatch with other stats now. This patch removes the
existing support for accounting sectors for a blkio_group. This will be added
back differently in the next two patches.

Signed-off-by: Divyesh Shah<dpshah@google.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

9a0785b0

Merge branch 'for-linus' into for-2.6.35 · ed6b6dc7
Jens Axboe authored Apr 02, 2010

ed6b6dc7

Block: Fix block/elevator.c elevator_get() off-by-one error · a506aedc

wzt.wzt@gmail.com authored Apr 02, 2010

elevator_get() not check the name length, if the name length > sizeof(elv),
elv will miss the '\0'. And elv buffer will be replace "-iosched" as something
like aaaaaaaaa, then call request_module() can load an not trust module.
Signed-off-by: Zhitong Wang <zhitong.wangzt@alibaba-inc.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

a506aedc

drbd: lc_element_by_index() never returns NULL · b2b163dd

Philipp Reisner authored Apr 02, 2010

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

b2b163dd

cciss: unlock on error path · 61917bda

Dan Carpenter authored Apr 02, 2010

We take the spin_lock again in fail_all_cmds() so we need to unlock
here.
Signed-off-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

61917bda

30 Mar, 2010 6 commits

Linux 2.6.34-rc3 · 2eaa9cfd
Linus Torvalds authored Mar 30, 2010

2eaa9cfd

KEYS: Add MAINTAINERS record · e971461f

David Howells authored Mar 29, 2010

Add a MAINTAINERS record for the key management facility.
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

e971461f

Merge branch 'for-linus' of... · 246750ff

Linus Torvalds authored Mar 30, 2010

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6:
  CRED: Fix memory leak in error handling

246750ff

Merge git://git.kernel.org/pub/scm/linux/kernel/git/joern/logfs · 4660d3d2

Linus Torvalds authored Mar 30, 2010

* git://git.kernel.org/pub/scm/linux/kernel/git/joern/logfs:
  [LogFS] Erase new journal segments
  [LogFS] Move reserved segments with journal
  [LogFS] Clear PagePrivate when moving journal
  Simplify and fix pad_wbuf
  Prevent data corruption in logfs_rewrite_block()
  Use deactivate_locked_super
  Fix logfs_get_sb_final error path
  Write out both superblocks on mismatch
  Prevent schedule while atomic in __logfs_readdir
  Plug memory leak in writeseg_end_io
  Limit max_pages for insane devices
  Open segment file before using it

4660d3d2

Merge branch 'x86-fixes-for-linus' of... · be3fd3cc

Linus Torvalds authored Mar 30, 2010

Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip

* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86: Do not free zero sized per cpu areas
  x86: Make sure free_init_pages() frees pages on page boundary
  x86: Make smp_locks end with page alignment

be3fd3cc

CRED: Fix memory leak in error handling · 570b8fb5

Mathieu Desnoyers authored Mar 30, 2010

Fix a memory leak on an OOM condition in prepare_usermodehelper_creds().
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: James Morris <jmorris@namei.org>

570b8fb5

29 Mar, 2010 21 commits

Merge branch 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2 · 9623e5a2

Linus Torvalds authored Mar 29, 2010

* 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2:
  ocfs2: Fix a race in o2dlm lockres mastery
  Ocfs2: Handle deletion of reflinked oprhan inodes correctly.
  Ocfs2: Journaling i_flags and i_orphaned_slot when adding inode to orphan dir.
  ocfs2: Clear undo bits when local alloc is freed
  ocfs2: Init meta_ac properly in ocfs2_create_empty_xattr_block.
  ocfs2: Fix the update of name_offset when removing xattrs
  ocfs2: Always try for maximum bits with new local alloc windows
  ocfs2: set i_mode on disk during acl operations
  ocfs2: Update i_blocks in reflink operations.
  ocfs2: Change bg_chain check for ocfs2_validate_gd_parent.
  [PATCH] Skip check for mandatory locks when unlocking

9623e5a2

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client · 9f321603

Linus Torvalds authored Mar 29, 2010

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: (28 commits)
  ceph: update discussion list address in MAINTAINERS
  ceph: some documentations fixes
  ceph: fix use after free on mds __unregister_request
  ceph: avoid loaded term 'OSD' in documention
  ceph: fix possible double-free of mds request reference
  ceph: fix session check on mds reply
  ceph: handle kmalloc() failure
  ceph: propagate mds session allocation failures to caller
  ceph: make write_begin wait propagate ERESTARTSYS
  ceph: fix snap rebuild condition
  ceph: avoid reopening osd connections when address hasn't changed
  ceph: rename r_sent_stamp r_stamp
  ceph: fix connection fault con_work reentrancy problem
  ceph: prevent dup stale messages to console for restarting mds
  ceph: fix pg pool decoding from incremental osdmap update
  ceph: fix mds sync() race with completing requests
  ceph: only release unused caps with mds requests
  ceph: clean up handle_cap_grant, handle_caps wrt session mutex
  ceph: fix session locking in handle_caps, ceph_check_caps
  ceph: drop unnecessary WARN_ON in caps migration
  ...

9f321603

Merge branch 'hwmon-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jdelvare/staging · 9d54e2c0

Linus Torvalds authored Mar 29, 2010

* 'hwmon-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jdelvare/staging:
  hwmon: (asc7621) Add X58 entry in Kconfig
  hwmon: (w83793) Saving negative errors in unsigned
  hwmon: (coretemp) Add missing newline to dev_warn() message
  hwmon: (coretemp) Fix cpu model output

9d54e2c0

Merge branch 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/libata-dev · 7b128872
Linus Torvalds authored Mar 29, 2010
```
* 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/libata-dev:
  pata_via: fix VT6410/6415/6330 detection issue
```
7b128872

Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6 · 6631424f

Linus Torvalds authored Mar 29, 2010

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (33 commits)
  r8169: offical fix for CVE-2009-4537 (overlength frame DMAs)
  ipv6: Don't drop cache route entry unless timer actually expired.
  tulip: Add missing parens.
  r8169: fix broken register writes
  pcnet_cs: add new id
  bonding: fix broken multicast with round-robin mode
  drivers/net: Fix continuation lines
  e1000: do not modify tx_queue_len on link speed change
  net: ipmr/ip6mr: prevent out-of-bounds vif_table access
  ixgbe: Do not run all Diagnostic offline tests when VFs are active
  igb: use correct bits to identify if managability is enabled
  benet: Fix compile warnnings in drivers/net/benet/be_ethtool.c
  net: Add MSG_WAITFORONE flag to recvmmsg
  e1000e: do not modify tx_queue_len on link speed change
  igbvf: do not modify tx_queue_len on link speed change
  ipv4: Restart rt_intern_hash after emergency rebuild (v2)
  ipv4: Cleanup struct net dereference in rt_intern_hash
  net: fix netlink address dumping in IPv4/IPv6
  tulip: Fix null dereference in uli526x_rx_packet()
  gianfar: fix undo of reserve()
  ...

6631424f

Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc-2.6 · c45140a9

Linus Torvalds authored Mar 29, 2010

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc-2.6:
  sparc64: Properly truncate pt_regs framepointer in perf callback.
  arch/sparc/kernel: Use set_cpus_allowed_ptr
  sparc: Fix use of uid16_t and gid16_t in asm/stat.h

c45140a9

ext3: fix broken handling of EXT3_STATE_NEW · de329820

Linus Torvalds authored Mar 29, 2010

In commit 9df93939 ("ext3: Use bitops to read/modify
EXT3_I(inode)->i_state") ext3 changed its internal 'i_state' variable to
use bitops for its state handling.  However, unline the same ext4
change, it didn't actually change the name of the field when it changed
the semantics of it.

As a result, an old use of 'i_state' remained in fs/ext3/ialloc.c that
initialized the field to EXT3_STATE_NEW.  And that does not work
_at_all_ when we're now working with individually named bits rather than
values that get masked.  So the code tried to mark the state to be new,
but in actual fact set the field to EXT3_STATE_JDATA.  Which makes no
sense at all, and screws up all the code that checks whether the inode
was newly allocated.

In particular, it made the xattr code unhappy, and caused various random
behavior, like apparently

	https://bugzilla.redhat.com/show_bug.cgi?id=577911

So fix the initialization, and rename the field to match ext4 so that we
don't have this happen again.

Cc: James Morris <jmorris@namei.org>
Cc: Stephen Smalley <sds@tycho.nsa.gov>
Cc: Daniel J Walsh <dwalsh@redhat.com>
Cc: Eric Paris <eparis@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

de329820

r8169: offical fix for CVE-2009-4537 (overlength frame DMAs) · c0cd884a

Neil Horman authored Mar 29, 2010

Official patch to fix the r8169 frame length check error.

Based on this initial thread:
http://marc.info/?l=linux-netdev&m=126202972828626&w=1
This is the official patch to fix the frame length problems in the r8169
driver. As noted in the previous thread, while this patch incurs a performance
hit on the driver, its possible to improve performance dynamically by updating
the mtu and rx_copybreak values at runtime to return performance to what it was
for those NICS which are unaffected by the ideosyncracy (if there are any).

Summary:

A while back Eric submitted a patch for r8169 in which the proper
allocated frame size was written to RXMaxSize to prevent the NIC from dmaing too
much data. This was done in commit fdd7b4c3. A
long time prior to that however, Francois posted
126fa4b9, which expiclitly disabled the MaxSize
setting due to the fact that the hardware behaved in odd ways when overlong
frames were received on NIC's supported by this driver. This was mentioned in a
security conference recently:
http://events.ccc.de/congress/2009/Fahrplan//events/3596.en.html

It seems that if we can't enable frame size filtering, then, as Eric correctly
noticed, we can find ourselves DMA-ing too much data to a buffer, causing
corruption. As a result is seems that we are forced to allocate a frame which
is ready to handle a maximally sized receive.

This obviously has performance issues with it, so to mitigate that issue, this
patch does two things:

1) Raises the copybreak value to the frame allocation size, which should force
appropriately sized packets to get allocated on rx, rather than a full new 16k
buffer.

2) This patch only disables frame filtering initially (i.e., during the NIC
open), changing the MTU results in ring buffer allocation of a size in relation
to the new mtu (along with a warning indicating that this is dangerous).

Because of item (2), individuals who can't cope with the performance hit (or can
otherwise filter frames to prevent the bug), or who have hardware they are sure
is unaffected by this issue, can manually lower the copybreak and reset the mtu
such that performance is restored easily.
Signed-off-by: Neil Horman <nhorman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

c0cd884a

sparc64: Properly truncate pt_regs framepointer in perf callback. · 9e8307ec

David S. Miller authored Mar 29, 2010

For 32-bit processes, we save the full 64-bits of the regs in pt_regs.

But unlike when the userspace actually does load and store
instructions, the top 32-bits don't get automatically truncated by the
cpu in kernel mode (because the kernel doesn't execute with PSTATE_AM
address masking enabled).

So we have to do it by hand.
Reported-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

9e8307ec

hwmon: (asc7621) Add X58 entry in Kconfig · b00d8a7e

Jaswinder Singh Rajput authored Mar 29, 2010

Intel X58 have asc7621a chip. So added X58 entry in Kconfig for asc7621.
Also arranged existing models in ascending order.
Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
Signed-off-by: Jean Delvare <khali@linux-fr.org>

b00d8a7e

hwmon: (w83793) Saving negative errors in unsigned · 3f7cd7ea

Dan Carpenter authored Mar 29, 2010

"ret" is used to store the return value for watchdog_trigger() and it
should be signed for the error handling to work.
Signed-off-by: Dan Carpenter <error27@gmail.com>
Acked-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Jean Delvare <khali@linux-fr.org>

3f7cd7ea

hwmon: (coretemp) Add missing newline to dev_warn() message · 4d7a5644

Dean Nelson authored Mar 29, 2010

Add missing newline to dev_warn() message string. This is more of an issue
with older kernels that don't automatically add a newline if it was missing
from the end of the previous line.
Signed-off-by: Dean Nelson <dnelson@redhat.com>
Cc: stable@kernel.org
Signed-off-by: Jean Delvare <khali@linux-fr.org>

4d7a5644

hwmon: (coretemp) Fix cpu model output · fcc6a746

Prarit Bhargava authored Mar 29, 2010

Avoid hex and decimal confusion when printing out the cpu model.
Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Signed-off-by: Jean Delvare <khali@linux-fr.org>

fcc6a746

[LogFS] Erase new journal segments · 6be7fa06

Joern Engel authored Mar 29, 2010

If the device contains on old logfs image and the journal is moved to
segment that have never been used by the current logfs and not all
journal segments are erased before the next mount, the old content can
confuse mount code.  To prevent this, always erase the new journal
segments.
Signed-off-by: Joern Engel <joern@logfs.org>

6be7fa06

[LogFS] Move reserved segments with journal · 0943846a
Joern Engel authored Mar 29, 2010
```
Fixes a GC livelock.
Signed-off-by: Joern Engel <joern@logfs.org>
```
0943846a

x86: Do not free zero sized per cpu areas · eed63519

Ian Campbell authored Mar 28, 2010

This avoids an infinite loop in free_early_partial().

Add a warning to free_early_partial() to catch future problems.

-v5: put back start > end back into WARN_ONCE()
-v6: use one line for warning, suggested by Linus
-v7: more tests
-v8: remove the function name as suggested by Johannes
     WARN_ONCE() will print out that function name.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Tested-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Tested-by: Joel Becker <joel.becker@oracle.com>
Tested-by: Stanislaw Gruszka <sgruszka@redhat.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: David Miller <davem@davemloft.net>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
LKML-Reference: <1269830604-26214-4-git-send-email-yinghai@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

eed63519

x86: Make sure free_init_pages() frees pages on page boundary · c967da6a

Yinghai Lu authored Mar 28, 2010

When CONFIG_NO_BOOTMEM=y, it could use memory more effiently, or
in a more compact fashion.

Example:

 Allocated new RAMDISK: 00ec2000 - 0248ce57
 Move RAMDISK from 000000002ea04000 - 000000002ffcee56 to 00ec2000 - 0248ce56

The new RAMDISK's end is not page aligned.
Last page could be shared with other users.

When free_init_pages are called for initrd or .init, the page
could be freed and we could corrupt other data.

code segment in free_init_pages():

 |        for (; addr < end; addr += PAGE_SIZE) {
 |                ClearPageReserved(virt_to_page(addr));
 |                init_page_count(virt_to_page(addr));
 |                memset((void *)(addr & ~(PAGE_SIZE-1)),
 |                        POISON_FREE_INITMEM, PAGE_SIZE);
 |                free_page(addr);
 |                totalram_pages++;
 |        }

last half page could be used as one whole free page.

So page align the boundaries.

-v2: make the original initramdisk to be aligned, according to
     Johannes, otherwise we have the chance to lose one page.
     we still need to keep initrd_end not aligned, otherwise it could
     confuse decompressor.
-v3: change to WARN_ON instead, suggested by Johannes.
-v4: use PAGE_ALIGN, suggested by Johannes.
     We may fix that macro name later to PAGE_ALIGN_UP, and PAGE_ALIGN_DOWN
     Add comments about assuming ramdisk start is aligned
     in relocate_initrd(), change to re get ramdisk_image instead of save it
     to make diff smaller. Add warning for wrong range, suggested by Johannes.
-v6: remove one WARN()
     We need to align beginning in free_init_pages()
     do not copy more than ramdisk_size, noticed by Johannes
Reported-by: Stanislaw Gruszka <sgruszka@redhat.com>
Tested-by: Stanislaw Gruszka <sgruszka@redhat.com>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: David Miller <davem@davemloft.net>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
LKML-Reference: <1269830604-26214-3-git-send-email-yinghai@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

c967da6a

ceph: update discussion list address in MAINTAINERS · 82593f87
Sage Weil authored Mar 29, 2010
```
Signed-off-by: Sage Weil <sage@newdream.net>
```
82593f87

ceph: some documentations fixes · 8136b58d

Cheng Renquan authored Mar 29, 2010

New documentation should have an entry in the 00-INDEX.  Correct git
urls.
Signed-off-by: Cheng Renquan <crquan@gmail.com>
Signed-off-by: Sage Weil <sage@newdream.net>

8136b58d

x86: Make smp_locks end with page alignment · 596b711e

Yinghai Lu authored Mar 28, 2010

Fix:

 ------------[ cut here ]------------
 WARNING: at arch/x86/mm/init.c:342 free_init_pages+0x4c/0xfa()
 free_init_pages: range [0x40daf000, 0x40db5c24] is not aligned
 Modules linked in:
 Pid: 0, comm: swapper Not tainted
 2.6.34-rc2-tip-03946-g4f16b23-dirty #50 Call Trace:
  [<40232e9f>] warn_slowpath_common+0x65/0x7c
  [<4021c9f0>] ? free_init_pages+0x4c/0xfa
  [<40881434>] ? _etext+0x0/0x24
  [<40232eea>] warn_slowpath_fmt+0x24/0x27
  [<4021c9f0>] free_init_pages+0x4c/0xfa
  [<40881434>] ? _etext+0x0/0x24
  [<40d3f4bd>] alternative_instructions+0xf6/0x100
  [<40d3fe4f>] check_bugs+0xbd/0xbf
  [<40d398a7>] start_kernel+0x2d5/0x2e4
  [<40d390ce>] i386_start_kernel+0xce/0xd5
 ---[ end trace 4eaa2a86a8e2da22 ]---

Comments in vmlinux.lds.S already said:

 |        /*
 |         * smp_locks might be freed after init
 |         * start/end must be page aligned
 |         */
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: David Miller <davem@davemloft.net>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
LKML-Reference: <1269830604-26214-2-git-send-email-yinghai@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

596b711e

Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/ide-2.6 · ad4ba059

Linus Torvalds authored Mar 29, 2010

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/ide-2.6:
  Revert "ide: skip probe if there are no devices on the port (v2)"
  Revert "via82cxxx: workaround h/w bugs"

ad4ba059