Commits · e84987a1f941b8e2e3173bb38510ddf25cc8c7f0 · nexedi / linux

18 Mar, 2014 22 commits

Merge branch 'bcache-for-3.15' of git://evilpiepirate.org/~kent/linux-bcache into for-3.15/drivers · e84987a1

Jens Axboe authored Mar 18, 2014

Kent writes:

Jens, here's the bcache changes for 3.15. Lots of bugfixes, and some
refactoring and cleanups.

bcache: remove nested function usage · cb851149

John Sheu authored Mar 17, 2014

Uninlined nested functions can cause crashes when using ftrace, as they don't
follow the normal calling convention and confuse the ftrace function graph
tracer as it examines the stack.

Also, nested functions are supported as a gcc extension, but may fail on other
compilers (e.g. llvm).
Signed-off-by: John Sheu <john.sheu@gmail.com>

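For illustration only, here is a minimal sketch of this kind of refactoring. It is not the actual bcache change. A helper that GCC would accept as a nested function reads `threshold` and updates `matches` in its enclosing scope. Here it becomes an ordinary static function that receives that state through an explicit context struct. `count_ctx`, `count_if_above()` and `count_above()` are hypothetical names.

```c
#include <stddef.h>

/*
 * GCC-only shape, shown for comparison and not compiled here:
 *
 *   size_t count_above(const int *vals, size_t n, int threshold)
 *   {
 *           size_t matches = 0;
 *           void count_if_above(int v)       (nested function)
 *           {
 *                   if (v > threshold)
 *                           matches++;
 *           }
 *           ...
 *   }
 */

/* The state the nested function used to capture implicitly. */
struct count_ctx {
	int threshold;
	size_t matches;
};

/* Ordinary static function: standard C, normal calling convention. */
static void count_if_above(struct count_ctx *ctx, int v)
{
	if (v > ctx->threshold)
		ctx->matches++;
}

size_t count_above(const int *vals, size_t n, int threshold)
{
	struct count_ctx ctx = { .threshold = threshold, .matches = 0 };
	size_t i;

	for (i = 0; i < n; i++)
		count_if_above(&ctx, vals[i]);
	return ctx.matches;
}
```

The helper receives its captured state explicitly, so it needs no compiler extension.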

bcache: Kill bucket->gc_gen · 3a2fd9d5

Kent Overstreet authored Feb 27, 2014

gc_gen was a temporary used to recalculate last_gc, but since we only need
bucket->last_gc when gc isn't running (gc_mark_valid = 1), we can just update
last_gc directly.
Signed-off-by: Kent Overstreet <kmo@daterainc.com>

bcache: Kill unused freelist · 2531d9ee

Kent Overstreet authored Mar 17, 2014

This was originally added as an optimization that for various reasons isn't
needed anymore, but it does add a lot of nasty corner cases (and it was
responsible for some recently fixed bugs). Just get rid of it now.
Signed-off-by: Kent Overstreet <kmo@daterainc.com>

bcache: Rework btree cache reserve handling · 0a63b66d

Kent Overstreet authored Mar 17, 2014

This changes the bucket allocation reserves to use _real_ reserves - separate
freelists - instead of watermarks, which if nothing else makes the current code
saner to reason about and is going to be important in the future when we add
support for multiple btrees.

It also adds btree_check_reserve(), which checks (and locks) the reserves for
both bucket allocation and memory allocation for btree nodes. The old code just
kinda sorta assumed that, since (e.g. for btree node splits) it had the root
locked, no other threads could try to make use of the same reserve. This
technically should have been ok for memory allocation (we should always have a
reserve for memory allocation: the btree node cache is used as a reserve and we
preallocate it), but multiple btrees will mean that locking the root won't be
sufficient anymore, and for the bucket allocation reserve it was technically
possible for the old code to deadlock.
Signed-off-by: Kent Overstreet <kmo@daterainc.com>

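Here is a minimal sketch of the "real reserves" idea, assuming a simplified allocator. The old design kept one freelist and compared its length against a watermark. Here each reserve class gets its own freelist instead. An allocation may draw only from the general list or from its own reserve. The enum values, `FREELIST_SIZE`, and all function names are hypothetical and only loosely modelled on the description above.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical reserve classes: one dedicated reserve plus the general pool. */
enum reserve { RESERVE_BTREE, RESERVE_NONE, RESERVE_NR };

#define FREELIST_SIZE 8

/* Fixed-capacity stack of free bucket numbers. */
struct freelist {
	long buckets[FREELIST_SIZE];
	size_t nr;
};

struct allocator {
	struct freelist free[RESERVE_NR];
};

static bool freelist_push(struct freelist *fl, long b)
{
	if (fl->nr == FREELIST_SIZE)
		return false;
	fl->buckets[fl->nr++] = b;
	return true;
}

static bool freelist_pop(struct freelist *fl, long *b)
{
	if (fl->nr == 0)
		return false;
	*b = fl->buckets[--fl->nr];
	return true;
}

/*
 * Hand a newly freed bucket to the allocator. Dedicated reserves are
 * topped up first (RESERVE_BTREE precedes RESERVE_NONE in the enum), so
 * ordinary writes cannot drain the buckets btree nodes depend on.
 */
bool add_free_bucket(struct allocator *a, long b)
{
	int r;

	for (r = 0; r < RESERVE_NR; r++)
		if (freelist_push(&a->free[r], b))
			return true;
	return false;
}

/*
 * Allocate from the general freelist first, then from the caller's own
 * reserve. Returns -1 when both are empty (a real allocator would wait).
 */
long bucket_alloc(struct allocator *a, enum reserve reserve)
{
	long b;

	if (freelist_pop(&a->free[RESERVE_NONE], &b))
		return b;
	if (reserve != RESERVE_NONE && freelist_pop(&a->free[reserve], &b))
		return b;
	return -1;
}
```

A RESERVE_NONE caller can empty only the general list. The btree reserve stays intact until a btree allocation asks for it, and no count comparison is needed.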

bcache: Kill btree_io_wq · 56b30770

Kent Overstreet authored Jan 23, 2014

With the locking rework in the last patch, this shouldn't be needed anymore -
btree_node_write_work() only takes b->write_lock which is never held for very
long.
Signed-off-by: Kent Overstreet <kmo@daterainc.com>

bcache: btree locking rework · 2a285686

Kent Overstreet authored Mar 04, 2014

Add a new lock, b->write_lock, which is required to actually modify - or write -
a btree node; this lock is only held for short durations.

This means we can write out a btree node without taking b->lock, which _is_ held
for long durations - solving a deadlock when btree_flush_write() (from the
journalling code) is called with a btree node locked.

Right now this only occurs in bch_btree_set_root(), but with an upcoming
journalling rework it is going to happen a lot more.

This also means b->lock is now more of a read/intent lock instead of a
read/write lock - but not completely, since it still blocks readers. We may turn
it into a real intent lock at some point in the future.
Signed-off-by: Kent Overstreet <kmo@daterainc.com>


bcache: Fix a race when freeing btree nodes · 05335cff

Kent Overstreet authored Mar 17, 2014

This isn't a bulletproof fix; btree_node_free() -> bch_bucket_free() puts the
bucket on the unused freelist, where it can be reused right away without any
ordering requirements. It would be better to wait on at least a journal write to
go down before reusing the bucket. bch_btree_set_root() does this, and inserting
into non-leaf nodes is completely synchronous, so we should be ok, but future
patches are just going to get rid of the unused freelist - it was needed in the
past for various reasons but shouldn't be anymore.
Signed-off-by: Kent Overstreet <kmo@daterainc.com>

bcache: Add a real GC_MARK_RECLAIMABLE · 4fe6a816

Kent Overstreet authored Mar 13, 2014

This means the garbage collection code can better check for data and metadata
pointers to the same buckets.
Signed-off-by: Kent Overstreet <kmo@daterainc.com>

bcache: Add bch_keylist_init_single() · c13f3af9

Kent Overstreet authored Jan 08, 2014

This will potentially save us an allocation when we've got inode/dirent bkeys
that don't fit in the keylist's inline keys.
Signed-off-by: Kent Overstreet <kmo@daterainc.com>

bcache: Improve priority_stats · 15754020

Kent Overstreet authored Feb 25, 2014

Break down data into clean data/dirty data/metadata.
Signed-off-by: Kent Overstreet <kmo@daterainc.com>

bcache: Better alloc tracepoints · 7159b1ad

Kent Overstreet authored Feb 12, 2014

Change the invalidate tracepoint to indicate how much data we're invalidating,
and change the alloc tracepoints to indicate what offset they're for.
Signed-off-by: Kent Overstreet <kmo@daterainc.com>

bcache: Kill dead cgroup code · 3f5e0a34

Kent Overstreet authored Jan 23, 2014

This hasn't been used or even enabled in ages.
Signed-off-by: Kent Overstreet <kmo@daterainc.com>

bcache: stop moving_gc marking buckets that can't be moved. · 3f6ef381

Nicholas Swenson authored Jan 23, 2014

Signed-off-by: Nicholas Swenson <nks@daterainc.com>

bcache: Fix moving_pred() · 10d9dcf6

Kent Overstreet authored Feb 17, 2014

Avoid a potential null pointer deref (e.g. from check keys for cache misses)
Signed-off-by: Kent Overstreet <kmo@daterainc.com>

bcache: Fix moving_gc deadlocking with a foreground write · da415a09

Nicholas Swenson authored Jan 09, 2014

The deadlock happened because a foreground write slept, waiting for a bucket
to be allocated. Normally the gc would mark buckets available for invalidation,
but the moving_gc was stuck waiting for outstanding writes to complete, and
these writes used the bcache_wq, the same queue that foreground writes used.

This fix gives moving_gc its own work queue, so it can still finish moving
even if foreground writes are stuck waiting for allocation. It also makes the
work queue a parameter to the data_insert path, so moving_gc can use its own
workqueue for writes.
Signed-off-by: Nicholas Swenson <nks@daterainc.com>
Signed-off-by: Kent Overstreet <kmo@daterainc.com>

bcache: Fix discard granularity · 90db6919

Kent Overstreet authored Feb 10, 2014

blk_stack_limits() doesn't like a discard granularity of 0.
Signed-off-by: Kent Overstreet <kmo@daterainc.com>

bcache: Fix another bug recovering from unclean shutdown · 487dded8

Kent Overstreet authored Mar 17, 2014

The on-disk bucket gens are allowed to be out of date when we reuse buckets
that didn't have any live data in them. To deal with this, the initial gc has to
update the bucket gen when we find a pointer gen newer than the bucket's gen.

Unfortunately we weren't doing this for pointers in the journal that we're about
to replay.
Signed-off-by: Kent Overstreet <kmo@daterainc.com>

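Here is a minimal sketch of the fix-up rule. It assumes 8-bit generation numbers that wrap, so "newer" must be computed modulo 256 rather than with a plain `>`. The `struct bucket` layout and the function names are hypothetical. The sketch shows one point only: every pointer seen during the initial gc must be allowed to advance its bucket's gen. That includes pointers in journal entries that are about to be replayed.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical in-memory bucket; only the generation is modelled. */
struct bucket {
	uint8_t gen;
};

/*
 * a is newer than b if the forward distance from b to a, taken modulo
 * 256, is non-zero and less than half the range.
 */
static bool gen_newer(uint8_t a, uint8_t b)
{
	uint8_t d = (uint8_t)(a - b);

	return d != 0 && d < 128;
}

/*
 * Called for each pointer found during the initial gc (and, per this
 * fix, for keys about to be replayed from the journal): pull a possibly
 * stale on-disk bucket gen forward to the pointer's gen.
 */
void mark_pointer_gen(struct bucket *b, uint8_t ptr_gen)
{
	if (gen_newer(ptr_gen, b->gen))
		b->gen = ptr_gen;
}
```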

bcache: Fix a bug recovering from unclean shutdown · 0bd143fd

Kent Overstreet authored Mar 04, 2014

The code to fix up incorrect bucket prios incorrectly did not skip btree node
freeing keys.
Signed-off-by: Kent Overstreet <kmo@daterainc.com>

bcache: Fix a journalling reclaim after recovery bug · 27201cfd

Kent Overstreet authored Mar 13, 2014

On recovery we weren't correctly keeping track of which journal buckets had open
journal entries, so it was possible for them to be overwritten before we'd
written all new journal entries.
Signed-off-by: Kent Overstreet <kmo@daterainc.com>

bcache: Fix a null ptr deref in journal replay · 65ddf45a

Kent Overstreet authored Feb 24, 2014

Signed-off-by: Kent Overstreet <kmo@daterainc.com>

bcache: Fix a lockdep splat in an error path · 4fa03402

Kent Overstreet authored Mar 17, 2014

Signed-off-by: Kent Overstreet <kmo@daterainc.com>

13 Mar, 2014 9 commits

mtip32xx: mtip_async_complete() bug fixes · 5eb9291c

Sam Bradshaw authored Mar 13, 2014

This patch fixes 2 issues in the fast completion path:
1) Possible double completions / double dma_unmap_sg() calls due to lack
of atomicity in the check and subsequent dereference of the upper layer
callback function. Fixed with cmpxchg before unmap and callback.
2) Regression in unaligned IO constraining workaround for p420m devices.
Fixed by checking if IO is unaligned and using proper semaphore if so.
Signed-off-by: Sam Bradshaw <sbradshaw@micron.com>
Cc: stable@kernel.org
Signed-off-by: Jens Axboe <axboe@fb.com>

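Here is a minimal userspace sketch of the double-completion fix from item 1, using C11 atomics in place of the kernel's cmpxchg(). Whoever swaps the callback pointer from non-NULL to NULL owns the completion, and every other path backs off. `struct command`, `complete_command()` and `record_status()` are hypothetical names, not the driver's.

```c
#include <stdatomic.h>
#include <stddef.h>

typedef void (*completion_fn)(void *data, int status);

/* Simplified command slot; hypothetical, not the mtip32xx structure. */
struct command {
	_Atomic(completion_fn) async_callback;
	void *async_data;
	int unmap_count;   /* counts dma_unmap_sg()-style cleanups */
};

/* Example upper-layer callback: records the status it was given. */
static void record_status(void *data, int status)
{
	*(int *)data = status;
}

/*
 * May be reached from more than one completion path. Only the caller
 * whose compare-and-exchange moves the callback from non-NULL to NULL
 * proceeds, so the unmap and the upper-layer callback run exactly once.
 */
int complete_command(struct command *cmd, int status)
{
	completion_fn fn = atomic_load(&cmd->async_callback);

	if (fn == NULL ||
	    !atomic_compare_exchange_strong(&cmd->async_callback, &fn,
					    (completion_fn)0))
		return 0;          /* already completed elsewhere */

	cmd->unmap_count++;    /* stands in for unmapping the DMA segments */
	fn(cmd->async_data, status);
	return 1;
}
```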

mtip32xx: Unmap the DMA segments before completing the IO request · 368c89d7

Felipe Franciosi authored Mar 13, 2014

If the buffers are unmapped after completing a request, then stale data
might be in the request.
Signed-off-by: Felipe Franciosi <felipe@paradoxo.org>
Cc: stable@kernel.org
Signed-off-by: Jens Axboe <axboe@fb.com>

mtip32xx: Set queue bounce limit · 1044b1bb

Felipe Franciosi authored Mar 13, 2014

We need to set the queue bounce limit during the device initialization to
prevent excessive bouncing on 32 bit architectures.
Signed-off-by: Felipe Franciosi <felipe@paradoxo.org>
Cc: stable@kernel.org
Signed-off-by: Jens Axboe <axboe@fb.com>

nvme: Use pci_enable_msi_range() and pci_enable_msix_range() · be577fab

Alexander Gordeev authored Mar 04, 2014

As a result of the deprecation of the MSI-X/MSI enablement functions
pci_enable_msix() and pci_enable_msi_block(), all drivers
using these two interfaces need to be updated to use the
new pci_enable_msi_range() or pci_enable_msi_exact()
and pci_enable_msix_range() or pci_enable_msix_exact()
interfaces.
Signed-off-by: Alexander Gordeev <agordeev@redhat.com>
Cc: Keith Busch <keith.busch@intel.com>
Cc: Matthew Wilcox <willy@linux.intel.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: linux-nvme@lists.infradead.org
Cc: linux-pci@vger.kernel.org
Reviewed-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>

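The practical difference between the two interfaces lies mostly in the return convention. As generally documented, the old pci_enable_msix() returned 0 on success or a negative errno on failure. It could also return a positive count of the vectors actually available, leaving the driver to retry with fewer. The range variants take a minimum and maximum and return either the number of vectors enabled or a negative errno. The sketch below models only that convention in userspace. `model_enable_old()`, `model_enable_range()` and `avail` are hypothetical stand-ins for the PCI core and the hardware limit, not the kernel API.

```c
#include <errno.h>

/* Old-style model: 0 = got all nvec, >0 = only that many available. */
static int model_enable_old(int avail, int nvec)
{
	if (avail < 1)
		return -ENOSPC;
	return avail >= nvec ? 0 : avail;
}

/* Range-style model: grant up to maxvec, fail below minvec. */
int model_enable_range(int avail, int minvec, int maxvec)
{
	if (minvec < 1 || maxvec < minvec)
		return -EINVAL;
	if (avail < minvec)
		return -ENOSPC;
	return avail < maxvec ? avail : maxvec;
}

/* Retry loop drivers needed with the old convention. */
int old_style(int avail, int want)
{
	int nvec = want, ret;

	while ((ret = model_enable_old(avail, nvec)) > 0)
		nvec = ret;            /* retry with what was offered */
	return ret == 0 ? nvec : ret;
}

/* With the range convention the loop collapses to a single call. */
int new_style(int avail, int want)
{
	return model_enable_range(avail, 1, want);
}
```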

cciss: Fallback to MSI rather than to INTx if MSI-X failed · 371ff93a

Alexander Gordeev authored Feb 26, 2014

Currently the driver falls back to INTx mode when MSI-X
initialization failed. This is a suboptimal behaviour
for chips that also support MSI. This update changes that
behaviour and falls back to MSI mode in case MSI-X mode
initialization failed.
Signed-off-by: Alexander Gordeev <agordeev@redhat.com>
Cc: Mike Miller <mike.miller@hp.com>
Cc: iss_storagedev@hp.com
Cc: Jens Axboe <axboe@kernel.dk>
Cc: linux-pci@vger.kernel.org
Signed-off-by: Jens Axboe <axboe@fb.com>

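Here is the fallback order described above, reduced to a hypothetical sketch. `struct fake_dev` and its flags stand in for the results of the real enablement calls and are not the cciss code.

```c
/* Interrupt modes, least to most preferred. */
enum irq_mode { IRQ_INTX, IRQ_MSI, IRQ_MSIX };

/* Hypothetical probe results standing in for the pci_enable_* calls. */
struct fake_dev {
	int msix_ok;
	int msi_ok;
};

/* Try MSI-X, then MSI, and only then fall back to legacy INTx. */
enum irq_mode pick_irq_mode(const struct fake_dev *dev)
{
	if (dev->msix_ok)
		return IRQ_MSIX;
	if (dev->msi_ok)   /* the step the old code skipped */
		return IRQ_MSI;
	return IRQ_INTX;
}
```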

swim3: fix interruptible_sleep_on race · 106fd892

Arnd Bergmann authored Feb 26, 2014

interruptible_sleep_on is racy and going away. This replaces the one
caller in the swim3 driver with the equivalent race-free
wait_event_interruptible call. Since we're here already, this
also fixes the case where we get interrupted from atomic context,
which used to just spin in the loop.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Jens Axboe <axboe@fb.com>

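The race-free shape that wait_event_interruptible() provides can be illustrated in userspace with a POSIX condition variable. This is an analogy, not the kernel API. The condition is checked under the same lock the waker takes, and rechecked after every wakeup. A wakeup that arrives before the waiter goes to sleep is therefore not lost. With the sleep_on() pattern, by contrast, a wakeup that lands between the check and the sleep is missed. Signal interruption is not modelled here. `struct event`, `wait_done()` and `signal_done()` are hypothetical.

```c
#include <pthread.h>
#include <stdbool.h>

struct event {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	bool done;
};

/*
 * Waiter: test the condition under the lock and loop, so a wakeup that
 * happened before we got here (or a spurious one) is handled correctly.
 */
void wait_done(struct event *ev)
{
	pthread_mutex_lock(&ev->lock);
	while (!ev->done)
		pthread_cond_wait(&ev->cond, &ev->lock);
	pthread_mutex_unlock(&ev->lock);
}

/* Waker: set the condition and wake under the same lock. */
void signal_done(struct event *ev)
{
	pthread_mutex_lock(&ev->lock);
	ev->done = true;
	pthread_cond_broadcast(&ev->cond);
	pthread_mutex_unlock(&ev->lock);
}
```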

ataflop: fix sleep_on races · 7b8a3d22

Arnd Bergmann authored Feb 26, 2014

sleep_on() is inherently racy, and has been deprecated for a long time.
This fixes two instances in the atari floppy driver:

* fdc_wait/fdc_busy becomes an open-coded mutex. We cannot use the
  regular mutex since it gets released in interrupt context. The
  open-coded version using wait_event() and cmpxchg() is equivalent
  to the existing code but does the checks atomically, and we can
  now safely check the condition with irqs enabled.

* format_wait becomes a completion, which is the natural structure
  here. The format ioctl waits for the background task to either
  complete or abort.

This does not attempt to fix the preexisting bug of calling schedule
with local interrupts disabled.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Michael Schmitz <schmitz@biophys.uni-duesseldorf.de>
Signed-off-by: Jens Axboe <axboe@fb.com>

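Here is a minimal userspace sketch of the open-coded lock from the first bullet, using C11 atomics. A compare-and-exchange from 0 to 1 acquires the flag. Releasing it is a plain store, which any context may perform; in the driver, that is the interrupt handler. The kernel version sleeps on a wait queue until the exchange succeeds. To stay self-contained, this sketch yields the CPU between attempts instead. `fdc_busy` mirrors the name in the text, but the functions are hypothetical.

```c
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>

/* 0 = free, 1 = busy. */
static atomic_int fdc_busy;

/* One atomic test-and-set step: succeeds only if the flag was 0. */
static bool fdc_try_lock(void)
{
	int expected = 0;

	return atomic_compare_exchange_strong(&fdc_busy, &expected, 1);
}

/* Acquire: retry the atomic step until it succeeds. */
void fdc_lock(void)
{
	while (!fdc_try_lock())
		sched_yield();
}

/*
 * Release: unlike a mutex unlock, this need not run in the context that
 * took the lock. The driver would also wake the wait queue here.
 */
void fdc_unlock(void)
{
	atomic_store(&fdc_busy, 0);
}
```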

DAC960: remove sleep_on usage · 9c552e1d

Arnd Bergmann authored Feb 26, 2014

sleep_on and its variants are going away. The use of sleep_on() in
DAC960_V2_ExecuteUserCommand seems to be bogus because, by the time we get
there, the command has already completed and we just enter the timeout.
Based on this interpretation, I concluded that we can replace it with a
simple msleep(1000) and rearrange the code around it slightly.

The interruptible_sleep_on_timeout in DAC960_gam_ioctl seems equivalent
to the race-free version using wait_event_interruptible_timeout.
I left the driver to return -EINTR rather than -ERESTARTSYS to preserve
the timeout behavior.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Jens Axboe <axboe@fb.com>

mtip32xx: Use pci_enable_msi() instead of pci_enable_msi_range() · c94efe36

Alexander Gordeev authored Feb 25, 2014

Commit "mtip32xx: Use pci_enable_msix_range() instead of
pci_enable_msix()" was unnecessary, since pci_enable_msi()
function is not deprecated and is still preferable for
enabling the single MSI mode. This update reverts usage of
pci_enable_msi() function.

Besides, the changelog for that commit was bogus, since
mtip32xx driver uses MSI interrupt, not MSI-X.

Cc: Jens Axboe <axboe@kernel.dk>
Cc: Asai Thambi S P <asamymuthupa@micron.com>
Cc: linux-pci@vger.kernel.org
Signed-off-by: Jens Axboe <axboe@fb.com>

26 Feb, 2014 2 commits

bcache: Fix a shutdown bug · dabb4433

Kent Overstreet authored Feb 19, 2014

Shutdown wasn't cancelling/waiting on journal_write_work()
Signed-off-by: Kent Overstreet <kmo@daterainc.com>

bcache: Fix flash_dev_cache_miss() for real this time · 1b4eaf3d

Kent Overstreet authored Jan 16, 2014

The code was using sectors to count the number of sectors it was zeroing... but
then it passed it to bio_advance()... after it had been set to 0. Amusing...
Signed-off-by: Kent Overstreet <kmo@daterainc.com>

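Here is the shape of the bug, reduced to a hypothetical userspace sketch. The zeroing loop consumes a counter, and the advance then reuses it after it has already reached 0. `struct cursor` and both functions are illustrative stand-ins, not the bcache code or the bio API.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a bio: a buffer plus a current position. */
struct cursor {
	char *buf;
	size_t pos;
};

/* Buggy shape: the loop counts 'sectors' down to 0 ... */
void zero_sectors_buggy(struct cursor *c, unsigned sectors, size_t sector_size)
{
	char *p = c->buf + c->pos;

	while (sectors) {
		memset(p, 0, sector_size);
		p += sector_size;
		sectors--;
	}
	c->pos += (size_t)sectors * sector_size;   /* ... so this adds 0 */
}

/* Fixed shape: count down a copy and advance by the original count. */
void zero_sectors_fixed(struct cursor *c, unsigned sectors, size_t sector_size)
{
	unsigned left = sectors;
	char *p = c->buf + c->pos;

	while (left) {
		memset(p, 0, sector_size);
		p += sector_size;
		left--;
	}
	c->pos += (size_t)sectors * sector_size;
}
```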

21 Feb, 2014 7 commits

skd: Use pci_enable_msix_range() instead of pci_enable_msix() · a9df8625

Alexander Gordeev authored Feb 19, 2014

As a result of the deprecation of the MSI-X/MSI enablement functions
pci_enable_msix() and pci_enable_msi_block(), all drivers
using these two interfaces need to be updated to use the
new pci_enable_msi_range() and pci_enable_msix_range()
interfaces.
Signed-off-by: Alexander Gordeev <agordeev@redhat.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
Cc: Kyungmin Park <kyungmin.park@samsung.com>
Cc: linux-pci@vger.kernel.org
Signed-off-by: Jens Axboe <axboe@fb.com>

skd: Use unified access to skdev->msix_entries throughout the code · 1bc5ce5d

Alexander Gordeev authored Feb 19, 2014

Signed-off-by: Alexander Gordeev <agordeev@redhat.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
Cc: Kyungmin Park <kyungmin.park@samsung.com>
Cc: linux-pci@vger.kernel.org
Signed-off-by: Jens Axboe <axboe@fb.com>

skd: Fix incomplete cleanup of MSI-X interrupt · 46817769

Alexander Gordeev authored Feb 19, 2014

When enabling MSI-X interrupts fails due to lack of memory
the call to pci_disable_msix() is missed and the device is
left with MSI-X interrupts enabled while the driver assumes
otherwise. This update fixes the described misbehaviour and
cleans up the code of skd_release_msix() function.
Signed-off-by: Alexander Gordeev <agordeev@redhat.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
Cc: Kyungmin Park <kyungmin.park@samsung.com>
Cc: linux-pci@vger.kernel.org
Signed-off-by: Jens Axboe <axboe@fb.com>

skd: Fix out of array boundary access · c5e3035c

Alexander Gordeev authored Feb 19, 2014

When enabling MSI-X, interrupts are requested for SKD_MAX_MSIX_COUNT
entries in skdev->msix_entries array, while the number of actually
allocated entries is skdev->msix_count. This might lead to an out-of-bounds
access if the number of allocated entries is less than
SKD_MAX_MSIX_COUNT. This update fixes the described misbehaviour.
Signed-off-by: Alexander Gordeev <agordeev@redhat.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
Cc: Kyungmin Park <kyungmin.park@samsung.com>
Cc: linux-pci@vger.kernel.org
Signed-off-by: Jens Axboe <axboe@fb.com>

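Here is a minimal hypothetical sketch of the corrected loop bound. Interrupts are requested only for the entries that were actually allocated, not up to the compile-time maximum. `MAX_MSIX_COUNT`, its value, and the structures are illustrative stand-ins, not the skd definitions.

```c
#include <stddef.h>

#define MAX_MSIX_COUNT 16   /* hypothetical compile-time upper bound */

struct msix_entry_model {
	int vector;
	int requested;
};

struct dev_model {
	struct msix_entry_model *entries;  /* allocated with 'count' elements */
	size_t count;                      /* may be less than MAX_MSIX_COUNT */
};

/*
 * Request one IRQ per allocated entry. Looping to MAX_MSIX_COUNT instead
 * of dev->count would index past the end of 'entries' whenever fewer
 * entries were allocated.
 */
size_t request_irqs(struct dev_model *dev)
{
	size_t i;

	for (i = 0; i < dev->count; i++)
		dev->entries[i].requested = 1;
	return i;
}
```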

mtip32xx: Use pci_enable_msix_range() instead of pci_enable_msix() · f219ad82

Alexander Gordeev authored Feb 19, 2014

As a result of the deprecation of the MSI-X/MSI enablement functions
pci_enable_msix() and pci_enable_msi_block(), all drivers
using these two interfaces need to be updated to use the
new pci_enable_msi_range() and pci_enable_msix_range()
interfaces.
Signed-off-by: Alexander Gordeev <agordeev@redhat.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Asai Thambi S P <asamymuthupa@micron.com>
Cc: linux-pci@vger.kernel.org
Signed-off-by: Jens Axboe <axboe@fb.com>

mtip32xx: Remove superfluous call to pci_disable_msi() · cf91f39b

Alexander Gordeev authored Feb 19, 2014

There is no need to call pci_disable_msi() if
the previous call to pci_enable_msi() failed.
Signed-off-by: Alexander Gordeev <agordeev@redhat.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Asai Thambi S P <asamymuthupa@micron.com>
Cc: linux-pci@vger.kernel.org
Signed-off-by: Jens Axboe <axboe@fb.com>

drbd: Fix future possible NULL pointer dereference · f597f6b8

Andreas Gruenbacher authored Feb 19, 2014

Right now every resource has exactly one connection, but we are preparing
for dynamic connections, i.e. in the future there can be resources without
connections.

However smatch points this out as 'variable dereferenced before check',
which is correct.

This issue was introduced in
drbd: get_one_status(): Iterate over resource->devices instead of connection->peer_devices
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Andreas Gruenbacher <agruen@linbit.com>
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Jens Axboe <axboe@fb.com>

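Here is the pattern smatch flags, in a minimal hypothetical form. These are not the drbd structures. The buggy version reads through the pointer before testing it, so the later NULL check comes too late. The fix moves the check ahead of the first dereference.

```c
#include <stddef.h>

struct connection {
	int state;
};

struct resource {
	struct connection *conn;   /* may become NULL with dynamic connections */
};

/* Dereferenced before check: crashes if conn is NULL. */
int status_buggy(const struct resource *r)
{
	int state = r->conn->state;

	if (!r->conn)
		return -1;
	return state;
}

/* Check first, dereference afterwards. */
int status_fixed(const struct resource *r)
{
	if (!r->conn)
		return -1;
	return r->conn->state;
}
```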