- 08 May, 2008 1 commit
-
-
Jens Axboe authored
This reverts commit c3270e57.
-
- 07 May, 2008 11 commits
-
-
Randy Dunlap authored
Fix fs/bio.c kernel-doc parameter warning: Warning(linux-2.6.25-git14//fs/bio.c:972): No description found for parameter 'reading' Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
-
Jens Axboe authored
Better than setting idx to some random value and it silences the same bogus gcc warning. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
-
Jens Axboe authored
get_part() is fairly expensive, as it O(N) loops over partitions to find the right one. In lots of normal IO paths we end up looking up the partition twice, to make matters even worse. Change the stat add code to accept a passed in partition instead. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
-
Jens Axboe authored
We currently set all processes to the best-effort scheduling class, regardless of what CPU scheduling class they belong to. Improve that so that we correctly track idle and rt scheduling classes as well. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
-
Jens Axboe authored
Original patch from Mikulas Patocka <mpatocka@redhat.com> Mike Anderson was doing an OLTP benchmark on a computer with 48 physical disks mapped to one logical device via device mapper. He found that there was a slowdown on request_queue->lock in function generic_unplug_device. The slowdown is caused by the fact that when some code calls unplug on the device mapper, device mapper calls unplug on all physical disks. These unplug calls take the lock, find that the queue is already unplugged, release the lock and exit. With the below patch, performance of the benchmark was increased by 18% (the whole OLTP application, not just block layer microbenchmarks). So I'm submitting this patch for upstream. I think the patch is correct, because when more threads call simultaneously plug and unplug, it is unspecified, if the queue is or isn't plugged (so the patch can't make this worse). And the caller that plugged the queue should unplug it anyway. (if it doesn't, there's 3ms timeout). Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
-
Jens Axboe authored
They tend to depend a lot on the workload, so not a clear-cut likely or unlikely fit. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
-
Miklos Szeredi authored
generic_file_splice_write() duplicates remove_suid() just because it doesn't hold i_mutex. But it grabs i_mutex inside splice_from_pipe() anyway, so this is rather pointless. Move locking to generic_file_splice_write() and call remove_suid() and __splice_from_pipe() instead. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
-
Jens Axboe authored
put_io_context() drops the RCU read lock before calling into cfq_dtor(), however we need to hold off freeing there before grabbing and dereferencing the first object on the list. So extend the rcu_read_lock() scope to cover the calling of cfq_dtor(), and optimize cfq_free_io_context() to use a new variant for call_for_each_cic() that assumes the RCU read lock is already held. Hit in the wild by Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
-
Jens Axboe authored
For most initialization purposes, calling blk_queue_init_tags() without the queue lock held is OK. Only if called for resizing an existing map must the lock be held. Ditto for tag cleanup, the maps are reference counted. So switch the general queue flag setting to the unlocked variant, but retain the locked variant for resizing. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
-
Jens Axboe authored
Concurrency isn't a big deal here since we have requests in flight at this point, but do the locked variant to set a better example. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
-
Linus Torvalds authored
This reverts commit 22eecde2. Uli reports that it breaks UML on x86-64 with the Fedora 8 gcc (gcc 4.1.2), causing a crash on startup. See http://marc.info/?l=linux-kernel&m=121011722806093&w=2 for a trace. Reported-by: Ulrich Drepper <drepper@redhat.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
- 06 May, 2008 28 commits
-
-
OGAWA Hirofumi authored
if ((drv->entry.next != drv->entry.prev) || (drv->entry.next != NULL)) { warns list_empty(&drv->entry). Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Cc: Greg KH <gregkh@suse.de> Cc: Len Brown <lenb@kernel.org> [ Version 2 totally redone based on suggestions from Linus & Greg ] Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Linus Torvalds authored
Commit 33dcdac2 ("kill ->put_inode") removed the final use of i_op->put_inode, but left the now totally unused "op" variable in iput(). Get rid of it. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Hugh Dickins authored
Fix warning from pmd_bad() at bootup on a HIGHMEM64G HIGHPTE x86_32. That came from 9fc34113 x86: debug pmd_bad(); but we understand now that the typecasting was wrong for PAE in the previous version: pagetable pages above 4GB looked bad and stopped Arjan from booting. And revert that cded932b x86: fix pmd_bad and pud_bad to support huge pages. It was the wrong way round: we shouldn't weaken every pmd_bad and pud_bad check to let huge pages slip through - in part they check that we _don't_ have a huge page where it's not expected. Put the x86 pmd_bad() and pud_bad() definitions back to what they have long been: they can be improved (x86_32 should use PTE_MASK, to stop PAE thinking junk in the upper word is good; and x86_64 should follow x86_32's stricter comparison, to stop thinking any subset of required bits is good); but that should be a later patch. Fix Hans' good observation that follow_page() will never find pmd_huge() because that would have already failed the pmd_bad test: test pmd_huge in between the pmd_none and pmd_bad tests. Tighten x86's pmd_huge() check? No, once it's a hugepage entry, it can get quite far from a good pmd: for example, PROT_NONE leaves it with only ACCESSED of the KERN_PGTABLE bits. However... though follow_page() contains this and another test for huge pages, so it's nice to keep it working on them, where does it actually get called on a huge page? get_user_pages() checks is_vm_hugetlb_page(vma) to to call alternative hugetlb processing, as does unmap_vmas() and others. Signed-off-by: Hugh Dickins <hugh@veritas.com> Earlier-version-tested-by: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Jeff Chua <jeff.chua.linux@gmail.com> Cc: Hans Rosenfeld <hans.rosenfeld@amd.com> Cc: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6Linus Torvalds authored
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: [PATCH] fix SMP ordering hole in fcntl_setlk() [PATCH] kill ->put_inode [PATCH] fix reservation discarding in affs
-
Al Viro authored
fcntl_setlk()/close() race prevention has a subtle hole - we need to make sure that if we *do* have an fcntl/close race on SMP box, the access to descriptor table and inode->i_flock won't get reordered. As it is, we get STORE inode->i_flock, LOAD descriptor table entry vs. STORE descriptor table entry, LOAD inode->i_flock with not a single lock in common on both sides. We do have BKL around the first STORE, but check in locks_remove_posix() is outside of BKL and for a good reason - we don't want BKL on common path of close(2). Solution is to hold ->file_lock around fcheck() in there; that orders us wrt removal from descriptor table that preceded locks_remove_posix() on close path and we either come first (in which case eviction will be handled by the close side) or we'll see the effect of close and do eviction ourselves. Note that even though it's read-only access, we do need ->file_lock here - rcu_read_lock() won't be enough to order the things. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
Christoph Hellwig authored
And with that last patch to affs killing the last put_inode instance we can finally, after many years of transition kill this racy and awkward interface. (It's kinda funny that even the description in Documentation/filesystems/vfs.txt was entirely wrong..) Also remove a very misleading comment above the defintion of struct super_operations. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
Roman Zippel authored
- remove affs_put_inode, so preallocations aren't discared unnecessarily often. - remove affs_drop_inode, it's called with a spinlock held, so it can't use a mutex. - make i_opencnt atomic - avoid direct b_count manipulations - a few allocation failure fixes, so that these are more gracefully handled now. Signed-off-by: Roman Zippel <zippel@linux-m68k.org> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/libata-devLinus Torvalds authored
* 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/libata-dev: (27 commits) pata_atiixp: Don't disable sata_inic162x: update intro comment, up the version and drop EXPERIMENTAL sata_inic162x: add cardbus support sata_inic162x: kill now unused SFF related stuff sata_inic162x: use IDMA for ATAPI commands sata_inic162x: use IDMA for non DMA ATA commands sata_inic162x: kill now unused bmdma related stuff sata_inic162x: use IDMA for ATA_PROT_DMA sata_inic162x: update TF read handling sata_inic162x: add / update constants sata_inic162x: misc clean ups sata_mv use hweight16() for bit counting (V2) sata_mv NCQ-EH for FIS-based switching sata_mv delayed eh handling libata: export ata_eh_analyze_ncq_error sata_mv new mv_port_intr function sata_mv fix mv_host_intr bug for hc_irq_cause sata_mv NCQ and SError fixes for mv_err_intr sata_mv rearrange mv_config_fbs sata_mv errata workaround for sata25 part 1 ...
-
Alan Cox authored
A couple of distributions (Fedora, Ubuntu) were having weird problems with the ATI IXP series PATA controllers being reported as simplex. At the heart of the problem is that both distros ignored the recommendations to load pata_acpi and ata_generic *AFTER* specific host drivers. The underlying cause however is that if you D3 and then D0 an ATI IXP it helpfully throws away some configuration and won't let you rewrite it. Add checks to ata_generic and pata_acpi to pin ATIIXP devices. Possibly the real answer here is to quirk them and pin them, but right now we can't do that before they've been pcim_enable()'d by a driver. I'm indebted to David Gero for this. His bug report not only reported the problem but identified the cause correctly and he had tested the right values to prove what was going on [If you backport this for 2.6.24 you will need to pull in the 2.6.25 removal of the bogus WARN_ON() in pcim_enagle] Signed-off-by: Alan Cox <alan@redhat.com> Tested-by: David Gero <davidg@havidave.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
-
Tejun Heo authored
sata_inic162x is now ready for production use. Bump the version, explain what's working and what's not and drop EXPERIMENTAL. Signed-off-by: Tejun Heo <htejun@gmail.com> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
-
Tejun Heo authored
When attached to cardbus, mmio region is at BAR 1. Other than that, everything else is the same. Add support for it. Signed-off-by: Tejun Heo <htejun@gmail.com> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
-
Tejun Heo authored
sata_inic162x now doesn't use any SFF features. Remove all SFF related stuff. * Mask unsolicited ATA interrupts. This removes our primary source of spurious interrupts and spurious interrupt handling can be tightened up. There's no need to clear ATA interrupts by reading status register either. * Don't dance with IDMA_CTL_ATA_NIEN and simplify accesses to IDMA_CTL. * Inherit from sata_port_ops instead of ata_sff_port_ops. * Don't initialize or use ioaddr. There's no need to map BAR0-4 anymore. Signed-off-by: Tejun Heo <htejun@gmail.com> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
-
Tejun Heo authored
Use IDMA for ATAPI commands. Write and some misc commands time out when executed using ATAPI_PROT_DMA but ATAPI_PROT_PIO works fine. As PIO is driven by DMA too, it doesn't make any noticeable difference for native SATA devices. inic_check_atapi_dma() is implemented to force PIO for those ATAPI commands. After this change, sata_inic162x issues all commands using IDMA. Signed-off-by: Tejun Heo <htejun@gmail.com> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
-
Tejun Heo authored
Use IDMA for PIO and non-data commands. This allows sata_inic162x to safely drive LBA48 devices. Kill inic_dev_config() which contains code to reject LBA48 devices. With this change, status checking in inic_qc_issue() to avoid hard lock up after hotplug can go away too. Signed-off-by: Tejun Heo <htejun@gmail.com> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
-
Tejun Heo authored
sata_inic162x doesn't use BMDMA anymore. Kill bmdma related stuff. * prdctl manipulation * port IRQ mask manipulation * inherit ATA_BASE_SHT instead of ATA_BMDMA_SHT * BMDMA methods Signed-off-by: Tejun Heo <htejun@gmail.com> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
-
Tejun Heo authored
The modified driver on initio site has enough clue on how to use IDMA. Use IDMA for ATA_PROT_DMA. * LBA48 now works as long as it uses DMA (LBA48 devices still aren't allowed as it can destroy data if PIO is used for any reason). * No need to mask IRQs for read DMAs as IDMA_DONE is properly raised after transfer to memory is actually completed. There will be some spurious interrupts but host_intr will handle it correctly and manipulating port IRQ mask interacts badly with the other port for some reason, so command type dependent port IRQ masking is not used anymore. Signed-off-by: Tejun Heo <htejun@gmail.com> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
-
Tejun Heo authored
inic162x can't reliably read back TF or at least we don't know how to do it yet. The only values which seem reliable are status and error. This patch updates access to TF. * implement inic_tf_read() which reads the TF area in mmio area * implement custom inic_qc_fill_rtf() which only returns true if status indicates device error. it'll be returning bogus addresses for device errors but it'll be able to report why it failed at least. * implement custom inic_check_ready() and use ata_wait_after_reset() instead of the SFF version. * use inic_tf_read() for classification. This is not perfect but it fixes hotplug detection failure and at least makes the driver report 0's instead of random garbages while reporting valid status and error for device errors. Signed-off-by: Tejun Heo <htejun@gmail.com> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
-
Tejun Heo authored
* add a bunch of constants, most are from the datasheet, a few undocumented ones are from initio's modified driver * HCTL_PWRDWN is bit 12 not 13 This is in preparation of further inic162x updates. Signed-off-by: Tejun Heo <htejun@gmail.com> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
-
Tejun Heo authored
* use larger indents for structure member definitions * kill unused variable @addr in inic_scr_write() * kill unnecessary flushes in inic_freeze/thaw() * kill buggy explicit kfree() on devres managed port private data This is in preparation of further inic162x updates. Signed-off-by: Tejun Heo <htejun@gmail.com> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
-
Mark Lord authored
Some tidying as suggested by Grant Grundler. Nuke local bit-counting function from sata_mv in favour of using hweight16(). Also add a short explanation for the 15msec timeout used when waiting for empty/idle. Signed-off-by: Mark Lord <mlord@pobox.com> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
-
Mark Lord authored
Convert sata_mv's EH for FIS-based switching (FBS) over to the sequence recommended by Marvell. This enables us to catch/analyze multiple failed links on a port-multiplier when using NCQ. To do this, we clear the ERR_DEV bit in the EDMA Halt-Conditions register, so that the EDMA engine doesn't self-disable on the first NCQ error. Our EH code sets the MV_PP_FLAG_DELAYED_EH flag to prevent new commands being queued while we await completion of all outstanding NCQ commands on all links of the failed PM. The SATA Test Control register tells us which links have failed, so we must only wait for any other active links to finish up before we stop the EDMA and run the .error_handler afterward. The patch also includes skeleton code for handling of non-NCQ FBS operation. This is more for documentation purposes right now, as that mode is not yet enabled in sata_mv. Signed-off-by: Mark Lord <mlord@pobox.com> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
-
Mark Lord authored
Introduce a new "delayed error handling" mechanism in sata_mv, to enable us to eventually deal with multiple simultaneous NCQ failures on a single host link when a PM is present. This involves a port flag (MV_PP_FLAG_DELAYED_EH) to prevent new commands being queued, and a pmp bitmap to indicate which pmp links had NCQ errors. The new mv_pmp_error_handler() uses those values to invoke ata_eh_analyze_ncq_error() on each failed link, prior to freezing the port and passing control to sata_pmp_error_handler(). This is based upon a strategy suggested by Tejun. For now, we just implement the delayed mechanism. The next patch in this series will add the multiple-NCQ EH code to take advantage of it. Signed-off-by: Mark Lord <mlord@pobox.com> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
-
Mark Lord authored
Export ata_eh_analyze_ncq_error() for subsequent use by sata_mv, as suggested by Tejun. Signed-off-by: Mark Lord <mlord@pobox.com> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
-
Mark Lord authored
Separate out the inner loop body of mv_host_intr() into it's own function called mv_port_intr(). This should help maintainabilty. Signed-off-by: Mark Lord <mlord@pobox.com> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
-
Mark Lord authored
Remove the unwanted reads of hc_irq_cause from mv_host_intr(), thereby removing a bug whereby we were not always reading it when needed.. Signed-off-by: Mark Lord <mlord@pobox.com> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
-
Mark Lord authored
Sigh. Undo some earlier changes to mv_port_intr(), so that we now read/clear SError again in all cases. Arrange the top of the function to be as close as possible to what we need for a later update (in this series) for ERR_DEV handling. Fix things so that libata-eh can attempt a READ_LOG_EXT_10H in response to a failed NCQ command, by just doing a local mv_eh_freeze() rather than ata_port_freeze(). This will now fully handle NCQ errors much of the time, but more fixes are needed for FBS/PMP, and for certain chip errata. Signed-off-by: Mark Lord <mlord@pobox.com> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
-
Mark Lord authored
Rearrange mv_config_fbs() to more closely follow the (corrected) datasheet recommendations for NCQ and FIS-based switching (FBS). Also, maintain a port flag to let us know when FBS is enabled. We will make more use of that flag later in this patch series. Signed-off-by: Mark Lord <mlord@pobox.com> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
-
Mark Lord authored
Part 1 of workaround for errata "sata#25" for the 60x1 series (the second half of this errata workaround is still in development. Bit22 of the GPIO port has to be set "on" when in NCQ mode. Signed-off-by: Mark Lord <mlord@pobox.com> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
-