Commits · 02507a80b35edd720480540d917e9f92cc371009 · Kirill Smelkov / linux

18 Jan, 2010 9 commits

[SCSI] mac_esp: fix PIO mode, take 2 · 02507a80

Finn Thain authored Dec 05, 2009

The mac_esp PIO algorithm no longer works in 2.6.31 and crashes my Centris
660av. So here's a better one.

Also, force async with esp_set_offset() rather than esp_slave_configure().

One of the SCSI drives I tested still doesn't like the PIO mode and fails
with "esp: esp0: Reconnect IRQ2 timeout" (the same drive works fine in
PDMA mode).

This failure happens when esp_reconnect_with_tag() tries to read in two
tag bytes but the chip only provides one (0x20). I don't know what causes
this. I decided not to waste any more time trying to fix it because the
best solution is to rip out the PIO mode altogether and use the DMA
engine.
Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>

02507a80

[SCSI] scsi_transport_fc: Remove capping from dev_loss_tmo · f2818663

Hannes Reinecke authored Dec 15, 2009

Currently dev_loss_tmo is capped by SCSI_DEVICE_BLOCK_MAX_TIMEOUT.
This causes problem with multipathing when the 'no_path_retry' setting
exceeds the dev_loss_tmo setting, as then the system might run into
a deadlock when all paths have been removed temporarily for longer
than dev_loss_tmo.
The principal reasons for the capping has been that we should
not allow a remote port to remain in status 'blocked' indefinitely,
so the capping is there to ensure that the port status is being reset
eventually.
However, the fast_io_fail_tmo will also move the remote port out of
the 'blocked' state, so for any HBA driver implementing both the
capping should really be on the fast_io_fail_tmo, and not on the
dev_loss_tmo.
This patch implements just that, ie the fast_io_fail_tmo is capped
to SCSI_DEVICE_BLOCK_TIMEOUT and the capping is removed from
dev_loss_tmo when fast_io_fail_tmo is set.
This allows us to synchronize the dev_loss_tmo setting to the
'no_path_retry' setting from multipathing thus avoiding the deadlock.
Signed-off-by: Hannes Reinecke <hare@suse.de>
Acked-by: James Smart  <james.smart@emulex.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>

f2818663

[SCSI] fusion: fix warning when not using procfs · e47c11c7

Erik Ekman authored Dec 14, 2009

Fixes the following warning:
drivers/message/fusion/mptbase.c:129: warning: 'mpt_proc_root_dir' defined but not used
also moves it from public data section since it is static.
Signed-off-by: Erik Ekman <erik@kryo.se>
Acked-by: "Desai, Kashyap" <Kashyap.Desai@lsi.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>

e47c11c7

[SCSI] ibmmca: fix buffer overflow · 340f0520

Roel Kluin authored Dec 08, 2009

Allows i == IM_MAX_HOSTS, which is out of range.
Signed-off-by: Roel Kluin <roel.kluin@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>

340f0520

[SCSI] u14-34f: fix buffer overflow · 4a02462a

Roel Kluin authored Dec 08, 2009

This allows i == MAX_INT_PARAM, which is out of range for ints[]
Signed-off-by: Roel Kluin <roel.kluin@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>

4a02462a

[SCSI] eata: fix buffer overflow · 8fe79162

Roel Kluin authored Dec 08, 2009

Allows i == MAX_INT_PARAM, which is out of range.
Signed-off-by: Roel Kluin <roel.kluin@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>

8fe79162

[SCSI] ibmvscsi: fix a typo in a source code comment · 9b7dac08

Bart Van Assche authored Dec 04, 2009

Signed-off-by: Bart Van Assche <bart.vanassche@gmail.com>
Acked-by: Brian King <brking@linux.vnet.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>

9b7dac08

[SCSI] libsrp: fix typo -- replace RDAM by RDMA · 40c4f3e4

Bart Van Assche authored Dec 04, 2009

Fixed a typo in libsrp.c: replaced two occurrences of 'RDAM' by 'RDMA'.
Signed-off-by: Bart Van Assche <bart.vanassche@gmail.com>
Acked-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>

40c4f3e4

[SCSI] eliminate potential kmalloc failure in scsi_get_vpd_page() · e3deec09

James Bottomley authored Nov 03, 2009

The best way to fix this is to eliminate the intenal kmalloc() and
make the caller allocate the required amount of storage.
Signed-off-by: James Bottomley <James.Bottomley@suse.de>

e3deec09

17 Jan, 2010 16 commits

[SCSI] aic79xx: check for non-NULL scb in ahd_handle_nonpkt_busfree · 534ef056

Hannes Reinecke authored Jan 15, 2010

When removing several devices aic79xx will occasionally Oops
in ahd_handle_nonpkt_busfree during rescan. Looking at the
code I found that we're indeed not checking if the scb in
question is NULL. So check for it before accessing it.
Signed-off-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>

534ef056

[SCSI] zfcp: Set hardware timeout as requested by BSG request. · 51375ee8

Swen Schillig authored Jan 14, 2010

The hardware used with zfcp provides a timer for CT and ELS requests
instead of an abort capability for these commands. To correctly handle
the FC BSG timeouts, pass the timeout from the BSG requests to the
hardware.
Signed-off-by: Swen Schillig <swen@vnet.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>

51375ee8

[SCSI] zfcp: Introduce bsg_timeout callback. · 491ca442

Swen Schillig authored Jan 14, 2010

Introduce a zfcp callback for timeouts triggered from FC BSG.  With
zfcp, the underlying hardware cannot abort CT or ELS requests, so
there is nothing to do when the block layer timeout expires.  To avoid
interference with the block layer timeout, simply indicate that the
block layer timer should be reset. The timer running in the hardware
for the pending CT or ELS request will return the request when it
expires.
Signed-off-by: Swen Schillig <swen@vnet.ibm.com>
Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>

491ca442

[SCSI] scsi_transport_fc: Allow LLD to reset FC BSG timeout · b8f08645

Swen Schillig authored Jan 14, 2010

The hardware used with zfcp cannot abort a currently pending CT or ELS
request. Therefore we need the option to postpone the timeout
triggered request abort within the fc layer, since there is nothing
zfcp can do to stop the request at this point.

Cc: James Smart <James.Smart@emulex.com>
Signed-off-by: Swen Schillig <swen@vnet.ibm.com>
Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>

b8f08645

[SCSI] zfcp: add missing compat ptr conversion · 9e2ab1fa

Heiko Carstens authored Jan 13, 2010

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>

9e2ab1fa

[SCSI] zfcp: Fix linebreak in hba trace · 5a3fb308

Christof Schmitt authored Jan 13, 2010

Advance the correct pointer when inserting the linebreak for the HBA
trace. It was missing in the output since the pointer to the output
buffer was never advanced, and the linebreak character was overwritten
later.
Reviewed-by: Swen Schillig <swen@vnet.ibm.com>
Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>

5a3fb308

[SCSI] zfcp: Issue zfcp_fc_wka_port_put after FC CT BSG request · f09d5454

Christof Schmitt authored Jan 13, 2010

The patch "zfcp: Simplify handling of ct and els requests"
accidentally removed the call to zfcp_fc_wka_port_put for FC CT BSG
requests, thus not issuing a "close" request for the WKA ports.
Introduce a CT specific handler to first call zfcp_fc_wka_port_put and
then continue with the generic handler when returning from FC CT BSG
requests.
Reviewed-by: Swen Schillig <swen@vnet.ibm.com>
Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>

f09d5454

[SCSI] qla2xxx: Update version number to 8.03.01-k10. · 22c24734

Giridhar Malavali authored Jan 12, 2010

Signed-off-by: Giridhar Malavali <giridhar.malavali@qlogic.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>

22c24734

[SCSI] fc-transport: Use packed modifier for fc_bsg_request structure. · eda05a28

Harish Zunjarrao authored Jan 12, 2010

The 32bit kernel does not add padding bytes in the fc_bsg_request structure
whereas the 64bit kernel adds padding bytes in the fc_bsg_request structure.
Due to this, structure elements gets mismatched with 32bit application and
64bit kernel.To resolve this, used packed modifier to avoid adding padding bytes.
Signed-off-by: Giridhar Malavali <giridhar.malavali@qlogic.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>

eda05a28

[SCSI] qla2xxx: Perform fast mailbox read of flash regardless of size nor address alignment. · 368bbe07
Andrew Vasquez authored Jan 12, 2010
```
Signed-off-by: Giridhar Malavali <giridhar.malavali@qlogic.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
```
368bbe07

[SCSI] qla2xxx: Correct FCP2 recovery handling. · f08b7251

Andrew Vasquez authored Jan 12, 2010

The driver did not account for non-tape devices needing to employ
proper FCP2 recovery. Driver now checks the FCP2-capable flag
only, rather than using a midlayer-determined flag (TYPE_TAPE).
Signed-off-by: Andrew Vasquez <andrew.vasquez@qlogic.com>
Signed-off-by: Giridhar Malavali <giridhar.malavali@qlogic.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>

f08b7251

[SCSI] scsi_lib: Fix bug in completion of bidi commands · 63c43b0e

Boaz Harrosh authored Dec 15, 2009

Because of the terrible structuring of scsi-bidi-commands
it breaks some of the life time rules of a scsi-command.
It is now not allowed to free up the block-request before
cleanup and partial deallocation of the scsi-command. (Which
is not so for none bidi commands)

The right fix to this problem would be to make bidi command
a first citizen by allocating a scsi_sdb pointer at scsi command
just like cmd->prot_sdb. The bidi sdb should be allocated/deallocated
as part of the get/put_command (Again like the prot_sdb) and the
current decoupling of scsi_cmnd and blk-request should be kept.

For now make sure scsi_release_buffers() is called before the
call to blk_end_request_all() which might cause the suicide of
the block requests. At best the leak of bidi buffers, at worse
a crash, as there is a race between the existence of the bidi_request
and the free of the associated bidi_sdb.

The reason this was never hit before is because only OSD has the potential
of doing asynchronous bidi commands. (So does bsg but it is never used)
And OSD clients just happen to do all their bidi commands synchronously, up
until recently.

CC: Stable Tree <stable@kernel.org>
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>

63c43b0e

[SCSI] mptsas: Fix issue with chain pools allocation on katmai · f1053a7c

Anatolij Gustschin authored Dec 12, 2009

Since commit 9d2e9d66
mptsas driver fails to allocate memory for the MPT chain buffers
for second LSI adapter on PPC440SPe Katmai platform:
...
ioc1: LSISAS1068E B3: Capabilities={Initiator}
mptbase: ioc1: ERROR - Unable to allocate Reply, Request, Chain Buffers!
mptbase: ioc1: ERROR - didn't initialize properly! (-3)
mptsas: probe of 0002:31:00.0 failed with error -3

This commit increased MPT_FC_CAN_QUEUE value but initChainBuffers()
doesn't differentiate between SAS and FC causing increased allocation
for SAS case, too. Later pci_alloc_consistent() fails to allocate
increased chain buffer pool size for SAS case.

Provide a fix by looking at the bus type and using appropriate
MPT_SAS_CAN_QUEUE value while calculation of the number of chain
buffers.
Signed-off-by: Anatolij Gustschin <agust@denx.de>
Acked-by: Kashyap Desai <kashyap.desai@lsi.com>
Cc: Stable Tree <stable@kernel.org>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>

f1053a7c

[SCSI] aacraid: fix File System going into read-only mode · cacb6dc3

Penchala Narasimha Reddy Chilakala, ERS-HCLTech authored Dec 21, 2009

These particular problems were reported by Cisco and SAP and customers
as well. Cisco reported on RHEL4 U6 and SAP reported on SLES9 SP4 and
SLES10 SP2. We added these fixes on RHEL4 U6 and gave a private build
to IBM and Cisco. Cisco and IBM tested it for more than 15 days and
they reported that they did not see the issue so far. Before the fix,
Cisco used to see the issue within 5 days. We generated a patch for
SLES9 SP4 and SLES10 SP2 and submitted to Novell. Novell applied the
patch and gave a test build to SAP. SAP tested and reported that the
build is working properly.

We also tested in our lab using the tools "dishogsync", which is IO
stress tool and the tool was provided by Cisco.

Issue1:  File System going into read-only mode

Root cause: The driver tends to not free the memory (FIB) when the
management request exits prematurely. The accumulation of such
un-freed memory causes the driver to fail to allocate anymore memory
(FIB) and hence return 0x70000 value to the upper layer, which puts
the file system into read only mode.

Fix details: The fix makes sure to free the memory (FIB) even if the
request exits prematurely hence ensuring the driver wouldn't run out
of memory (FIBs).


Issue2: False Raid Alert occurs

When the Physical Drives and Logical drives are reported as deleted or
added, even though there is no change done on the system

Root cause: Driver IOCTLs is signaled with EINTR while waiting on
response from the lower layers. Returning "EINTR" will never initiate
internal retry.

Fix details: The issue was fixed by replacing "EINTR" with
"ERESTARTSYS" for mid-layer retries.
Signed-off-by: Penchala Narasimha Reddy <ServeRAIDDriver@hcl.in>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>

cacb6dc3

[SCSI] lpfc: fix file permissions · e6622df3

James Bottomley authored Jan 07, 2010

lpfc_hbadisc.c and lpfc_hw4.h accidentally got set executable.
Reported-by: Thomas Backlund <tmb@mandriva.org>
Cc: James Smart <James.Smart@Emulex.Com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>

e6622df3

page allocator: update NR_FREE_PAGES only when necessary · 6ccf80eb

KOSAKI Motohiro authored Jan 15, 2010

commit f2260e6b (page allocator: update NR_FREE_PAGES only as necessary)
made one minor regression.  if __rmqueue() was failed, NR_FREE_PAGES stat
go wrong.  this patch fixes it.
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
Reported-by: Huang Shijie <shijie8@gmail.com>
Reviewed-by: Christoph Lameter <cl@linux-foundation.org>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

6ccf80eb

16 Jan, 2010 15 commits

Merge branch 'i2c-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jdelvare/staging · 1f0b8b95

Linus Torvalds authored Jan 16, 2010

* 'i2c-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jdelvare/staging:
  i2c: Do not use device name after device_unregister
  i2c/pca: Don't use *_interruptible
  i2c-ali1563: Remove sparse warnings
  i2c: Test off by one in {piix4,vt596}_transaction()
  i2c-core: Storage class should be before const qualifier

1f0b8b95

Merge branch 'x86-fixes-for-linus' of... · 330a518a

Linus Torvalds authored Jan 16, 2010

Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip

* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86, uv: Ensure hub revision set for all ACPI modes.
  x86, uv: Add function retrieving node controller revision number
  x86: xen: 64-bit kernel RPL should be 0
  x86: kernel_thread() -- initialize SS to a known state
  x86/agp: Fix agp_amd64_init and agp_amd64_cleanup
  x86: SGI UV: Fix mapping of MMIO registers
  x86: mce.h: Fix warning in header checks

330a518a

Merge branch 'core-fixes-for-linus' of... · 2a8249da

Linus Torvalds authored Jan 16, 2010

Merge branch 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip

* 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  futexes: Remove rw parameter from get_futex_key()

2a8249da

Merge branch 'perf-fixes-for-linus' of... · c6a93d33

Linus Torvalds authored Jan 16, 2010

Merge branch 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip

* 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  perf tools: Check if /dev/null can be used as the -o gcc argument
  perf tools: Move QUIET_STDERR def to before first use
  perf: Stop stack frame walking off kernel addresses boundaries

c6a93d33

Merge branch 'tracing-fixes-for-linus' of... · 6ccc347b

Linus Torvalds authored Jan 16, 2010

Merge branch 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip

* 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  tracing/filters: Add comment for match callbacks
  tracing/filters: Fix MATCH_FULL filter matching for PTR_STRING
  tracing/filters: Fix MATCH_MIDDLE_ONLY filter matching
  lib: Introduce strnstr()
  tracing/filters: Fix MATCH_END_ONLY filter matching
  tracing/filters: Fix MATCH_FRONT_ONLY filter matching
  ftrace: Fix MATCH_END_ONLY function filter
  tracing/x86: Derive arch from bits argument in recordmcount.pl
  ring-buffer: Add rb_list_head() wrapper around new reader page next field
  ring-buffer: Wrap a list.next reference with rb_list_head()

6ccc347b

revert "drivers/video/s3c-fb.c: fix clock setting for Samsung SoC Framebuffer" · eb29a5cc

Mark Brown authored Jan 15, 2010

Fix divide by zero and broken output.  Commit 600ce1a0 ("fix clock
setting for Samsung SoC Framebuffer") introduced a mandatory refresh
parameter to the platform data for the S3C framebuffer but did not
introduce any validation code, causing existing platforms (none of which
have refresh set) to divide by zero whenever the framebuffer is
configured, generating warnings and unusable output.

Ben Dooks noted several problems with the patch:

 - The platform data supplies the pixclk directly and should already
   have taken care of the refresh rate.
 - The addition of a window ID parameter doesn't help since only the
   root framebuffer can control the pixclk.
 - pixclk is specified in picoseconds (rather than Hz) as the patch
   assumed.

and suggests reverting the commit so do that.  Without fixing this no
mainline user of the driver will produce output.

[akpm@linux-foundation.org: don't revert the correct bit]
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Cc: InKi Dae <inki.dae@samsung.com>
Cc: Kyungmin Park <kmpark@infradead.org>
Cc: Krzysztof Helt <krzysztof.h1@poczta.fm>
Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Cc: Ben Dooks <ben-linux@fluff.org>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

eb29a5cc

nommu: fix shared mmap after truncate shrinkage problems · 7e660872

David Howells authored Jan 15, 2010

Fix a problem in NOMMU mmap with ramfs whereby a shared mmap can happen
over the end of a truncation.  The problem is that
ramfs_nommu_check_mappings() checks that the reduced file size against the
VMA tree, but not the vm_region tree.

The following sequence of events can cause the problem:

	fd = open("/tmp/x", O_RDWR|O_TRUNC|O_CREAT, 0600);
	ftruncate(fd, 32 * 1024);
	a = mmap(NULL, 32 * 1024, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
	b = mmap(NULL, 16 * 1024, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
	munmap(a, 32 * 1024);
	ftruncate(fd, 16 * 1024);
	c = mmap(NULL, 32 * 1024, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);

Mapping 'a' creates a vm_region covering 32KB of the file.  Mapping 'b'
sees that the vm_region from 'a' is covering the region it wants and so
shares it, pinning it in memory.

Mapping 'a' then goes away and the file is truncated to the end of VMA
'b'.  However, the region allocated by 'a' is still in effect, and has
_not_ been reduced.

Mapping 'c' is then created, and because there's a vm_region covering the
desired region, get_unmapped_area() is _not_ called to repeat the check,
and the mapping is granted, even though the pages from the latter half of
the mapping have been discarded.

However:

	d = mmap(NULL, 16 * 1024, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);

Mapping 'd' should work, and should end up sharing the region allocated by
'a'.

To deal with this, we shrink the vm_region struct during the truncation,
lest do_mmap_pgoff() take it as licence to share the full region
automatically without calling the get_unmapped_area() file op again.
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
Cc: Greg Ungerer <gerg@snapgear.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

7e660872

nommu: fix race between ramfs truncation and shared mmap · 81759b5b

David Howells authored Jan 15, 2010

Fix the race between the truncation of a ramfs file and an attempt to make
a shared mmap of region of that file.

The problem is that do_mmap_pgoff() calls f_op->get_unmapped_area() to
verify that the file region is made of contiguous pages and to find its
base address - but there isn't any locking to guarantee this region until
vma_prio_tree_insert() is called by add_vma_to_mm().

Note that moving the functionality into f_op->mmap() doesn't help as that
is also called before vma_prio_tree_insert().

Instead make ramfs_nommu_check_mappings() grab nommu_region_sem whilst it
does its checks.  This means that this function will wait whilst mmaps
take place.
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
Cc: Greg Ungerer <gerg@snapgear.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

81759b5b

nommu: don't need get_unmapped_area() for NOMMU · efc1a3b1

David Howells authored Jan 15, 2010

get_unmapped_area() is unnecessary for NOMMU as no-one calls it.
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
Cc: Greg Ungerer <gerg@snapgear.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

efc1a3b1

nommu: remove a superfluous check of vm_region::vm_usage · 779c1023

David Howells authored Jan 15, 2010

In split_vma(), there's no need to check if the VMA being split has a
region that's in use by more than one VMA because:

 (1) The preceding test prohibits splitting of non-anonymous VMAs and regions
     (eg: file or chardev backed VMAs).

 (2) Anonymous regions can't be mapped multiple times because there's no handle
     by which to refer to the already existing region.

 (3) If a VMA has previously been split, then the region backing it has also
     been split into two regions, each of usage 1.
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
Cc: Greg Ungerer <gerg@snapgear.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

779c1023

nommu: struct vm_region's vm_usage count need not be atomic · 1e2ae599

David Howells authored Jan 15, 2010

The vm_usage count field in struct vm_region does not need to be atomic as
it's only even modified whilst nommu_region_sem is write locked.
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
Cc: Greg Ungerer <gerg@snapgear.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

1e2ae599

nommu: fix SYSV SHM for NOMMU · ed5e5894

David Howells authored Jan 15, 2010

Commit c4caa778 ("file
->get_unmapped_area() shouldn't duplicate work of get_unmapped_area()")
broke SYSV SHM for NOMMU by taking away the pointer to
shm_get_unmapped_area() from shm_file_operations.

Put it back conditionally on CONFIG_MMU=n.

file->f_ops->get_unmapped_area() is used to find out the base address for a
mapping of a mappable chardev device or mappable memory-based file (such as a
ramfs file).  It needs to be called prior to file->f_ops->mmap() being called.
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
Cc: Greg Ungerer <gerg@snapgear.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

ed5e5894

sysdev: fix prototype for memory_sysdev_class show/store functions · 8ff410da

Wu Fengguang authored Jan 15, 2010

The function prototype mismatches in call stack:

                [<ffffffff81494268>] print_block_size+0x58/0x60
                [<ffffffff81487e3f>] sysdev_class_show+0x1f/0x30
                [<ffffffff811d629b>] sysfs_read_file+0xcb/0x1f0
                [<ffffffff81176328>] vfs_read+0xc8/0x180

Due to prototype mismatch, print_block_size() will sprintf() into
*attribute instead of *buf, hence user space will read the initial
zeros from *buf:
	$ hexdump /sys/devices/system/memory/block_size_bytes
	0000000 0000 0000 0000 0000
	0000008

After patch:
	cat /sys/devices/system/memory/block_size_bytes
	0x8000000

This complements commits c29af9636 and 4a0b2b4d.
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Greg Kroah-Hartman <gregkh@suse.de>
Cc: "Zheng, Shaohui" <shaohui.zheng@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

8ff410da

memory-hotplug: add 0x prefix to HEX block_size_bytes · ba168fc3

Wu Fengguang authored Jan 15, 2010

Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

ba168fc3

memcg: ensure list is empty at rmdir · fce66477

Daisuke Nishimura authored Jan 15, 2010

Current mem_cgroup_force_empty() only ensures mem->res.usage == 0 on
success.  But this doesn't guarantee memcg's LRU is really empty, because
there are some cases in which !PageCgrupUsed pages exist on memcg's LRU.

For example:
- Pages can be uncharged by its owner process while they are on LRU.
- race between mem_cgroup_add_lru_list() and __mem_cgroup_uncharge_common().

So there can be a case in which the usage is zero but some of the LRUs are not empty.

OTOH, mem_cgroup_del_lru_list(), which can be called asynchronously with
rmdir, accesses the mem_cgroup, so this access can cause a problem if it
races with rmdir because the mem_cgroup might have been freed by rmdir.

Actually, I saw a bug which seems to be caused by this race.

	[1530745.949906] BUG: unable to handle kernel NULL pointer dereference at 0000000000000230
	[1530745.950651] IP: [<ffffffff810fbc11>] mem_cgroup_del_lru_list+0x30/0x80
	[1530745.950651] PGD 3863de067 PUD 3862c7067 PMD 0
	[1530745.950651] Oops: 0002 [#1] SMP
	[1530745.950651] last sysfs file: /sys/devices/system/cpu/cpu7/cache/index1/shared_cpu_map
	[1530745.950651] CPU 3
	[1530745.950651] Modules linked in: configs ipt_REJECT xt_tcpudp iptable_filter ip_tables x_tables bridge stp nfsd nfs_acl auth_rpcgss exportfs autofs4 hidp rfcomm l2cap crc16 bluetooth lockd sunrpc ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp bnx2i cnic uio ipv6 cxgb3i cxgb3 mdio libiscsi_tcp libiscsi scsi_transport_iscsi dm_mirror dm_multipath scsi_dh video output sbs sbshc battery ac lp kvm_intel kvm sg ide_cd_mod cdrom serio_raw tpm_tis tpm tpm_bios acpi_memhotplug button parport_pc parport rtc_cmos rtc_core rtc_lib e1000 i2c_i801 i2c_core pcspkr dm_region_hash dm_log dm_mod ata_piix libata shpchp megaraid_mbox sd_mod scsi_mod megaraid_mm ext3 jbd uhci_hcd ohci_hcd ehci_hcd [last unloaded: freq_table]
	[1530745.950651] Pid: 19653, comm: shmem_test_02 Tainted: G   M       2.6.32-mm1-00701-g2b04386 #3 Express5800/140Rd-4 [N8100-1065]
	[1530745.950651] RIP: 0010:[<ffffffff810fbc11>]  [<ffffffff810fbc11>] mem_cgroup_del_lru_list+0x30/0x80
	[1530745.950651] RSP: 0018:ffff8803863ddcb8  EFLAGS: 00010002
	[1530745.950651] RAX: 00000000000001e0 RBX: ffff8803abc02238 RCX: 00000000000001e0
	[1530745.950651] RDX: 0000000000000000 RSI: ffff88038611a000 RDI: ffff8803abc02238
	[1530745.950651] RBP: ffff8803863ddcc8 R08: 0000000000000002 R09: ffff8803a04c8643
	[1530745.950651] R10: 0000000000000000 R11: ffffffff810c7333 R12: 0000000000000000
	[1530745.950651] R13: ffff880000017f00 R14: 0000000000000092 R15: ffff8800179d0310
	[1530745.950651] FS:  0000000000000000(0000) GS:ffff880017800000(0000) knlGS:0000000000000000
	[1530745.950651] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
	[1530745.950651] CR2: 0000000000000230 CR3: 0000000379d87000 CR4: 00000000000006e0
	[1530745.950651] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
	[1530745.950651] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
	[1530745.950651] Process shmem_test_02 (pid: 19653, threadinfo ffff8803863dc000, task ffff88038612a8a0)
	[1530745.950651] Stack:
	[1530745.950651]  ffffea00040c2fe8 0000000000000000 ffff8803863ddd98 ffffffff810c739a
	[1530745.950651] <0> 00000000863ddd18 000000000000000c 0000000000000000 0000000000000000
	[1530745.950651] <0> 0000000000000002 0000000000000000 ffff8803863ddd68 0000000000000046
	[1530745.950651] Call Trace:
	[1530745.950651]  [<ffffffff810c739a>] release_pages+0x142/0x1e7
	[1530745.950651]  [<ffffffff810c778f>] ? pagevec_move_tail+0x6e/0x112
	[1530745.950651]  [<ffffffff810c781e>] pagevec_move_tail+0xfd/0x112
	[1530745.950651]  [<ffffffff810c78a9>] lru_add_drain+0x76/0x94
	[1530745.950651]  [<ffffffff810dba0c>] exit_mmap+0x6e/0x145
	[1530745.950651]  [<ffffffff8103f52d>] mmput+0x5e/0xcf
	[1530745.950651]  [<ffffffff81043ea8>] exit_mm+0x11c/0x129
	[1530745.950651]  [<ffffffff8108fb29>] ? audit_free+0x196/0x1c9
	[1530745.950651]  [<ffffffff81045353>] do_exit+0x1f5/0x6b7
	[1530745.950651]  [<ffffffff8106133f>] ? up_read+0x2b/0x2f
	[1530745.950651]  [<ffffffff8137d187>] ? lockdep_sys_exit_thunk+0x35/0x67
	[1530745.950651]  [<ffffffff81045898>] do_group_exit+0x83/0xb0
	[1530745.950651]  [<ffffffff810458dc>] sys_exit_group+0x17/0x1b
	[1530745.950651]  [<ffffffff81002c1b>] system_call_fastpath+0x16/0x1b
	[1530745.950651] Code: 54 53 0f 1f 44 00 00 83 3d cc 29 7c 00 00 41 89 f4 75 63 eb 4e 48 83 7b 08 00 75 04 0f 0b eb fe 48 89 df e8 18 f3 ff ff 44 89 e2 <48> ff 4c d0 50 48 8b 05 2b 2d 7c 00 48 39 43 08 74 39 48 8b 4b
	[1530745.950651] RIP  [<ffffffff810fbc11>] mem_cgroup_del_lru_list+0x30/0x80
	[1530745.950651]  RSP <ffff8803863ddcb8>
	[1530745.950651] CR2: 0000000000000230
	[1530745.950651] ---[ end trace c3419c1bb8acc34f ]---
	[1530745.950651] Fixing recursive fault but reboot is needed!

The problem here is pages on LRU may contain pointer to stale memcg.  To
make res->usage to be 0, all pages on memcg must be uncharged or moved to
another(parent) memcg.  Moved page_cgroup have already removed from
original LRU, but uncharged page_cgroup contains pointer to memcg withou
PCG_USED bit.  (This asynchronous LRU work is for improving performance.)
If PCG_USED bit is not set, page_cgroup will never be added to memcg's
LRU.  So, about pages not on LRU, they never access stale pointer.  Then,
what we have to take care of is page_cgroup _on_ LRU list.  This patch
fixes this problem by making mem_cgroup_force_empty() visit all LRUs
before exiting its loop and guarantee there are no pages on its LRU.
Signed-off-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

fce66477