Commits · b10ebd34cccae1b431caf1be54919aede2be7cbe · Kirill Smelkov / linux

08 Apr, 2014 2 commits

dm thin: fix rcu_read_lock being held in code that can sleep · b10ebd34

Joe Thornber authored Apr 08, 2014

Commit c140e1c4 ("dm thin: use per thin device deferred bio lists")
introduced the use of an rculist for all active thin devices.  The use
of rcu_read_lock() in process_deferred_bios() can result in a BUG if a
dm_bio_prison_cell must be allocated as a side-effect of bio_detain():

 BUG: sleeping function called from invalid context at mm/mempool.c:203
 in_atomic(): 1, irqs_disabled(): 0, pid: 6, name: kworker/u8:0
 3 locks held by kworker/u8:0/6:
   #0:  ("dm-" "thin"){.+.+..}, at: [<ffffffff8106be42>] process_one_work+0x192/0x550
   #1:  ((&pool->worker)){+.+...}, at: [<ffffffff8106be42>] process_one_work+0x192/0x550
   #2:  (rcu_read_lock){.+.+..}, at: [<ffffffff816360b5>] do_worker+0x5/0x4d0

We can't process deferred bios with the rcu lock held, since
dm_bio_prison_cell allocation may block if the bio-prison's cell mempool
is exhausted.

To fix:

- Introduce a refcount and completion field to each thin_c

- Add thin_get/put methods for adjusting the refcount.  If the refcount
  hits zero then the completion is triggered.

- Initialise refcount to 1 when creating thin_c

- When iterating the active_thins list we thin_get() whilst the rcu
  lock is held.

- After the rcu lock is dropped we process the deferred bios for that
  thin.

- When destroying a thin_c we thin_put() and then wait for the
  completion -- to avoid a race between the worker thread iterating
  from that thin_c and destroying the thin_c.
Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

b10ebd34

dm thin: irqsave must always be used with the pool->lock spinlock · 5e3283e2

Joe Thornber authored Apr 08, 2014

Commit c140e1c4 ("dm thin: use per thin device deferred bio lists")
incorrectly stopped disabling irqs when taking the pool's spinlock.

Irqs must be disabled when taking the pool's spinlock otherwise a thread
could spin_lock(), then get interrupted to service thin_endio() in
interrupt context, which would then deadlock in spin_lock_irqsave().
Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

5e3283e2

04 Apr, 2014 2 commits

dm cache: fix a lock-inversion · 0596661f

Joe Thornber authored Apr 03, 2014

When suspending a cache the policy is walked and the individual policy
hints written to the metadata via sync_metadata().  This led to this
lock order:

      policy->lock
        cache_metadata->root_lock

When loading the cache target the policy is populated while the metadata
lock is held:

      cache_metadata->root_lock
         policy->lock

Fix this potential lock-inversion (ABBA) deadlock in sync_metadata() by
ensuring the cache_metadata root_lock is held whilst all the hints are
written, rather than being repeatedly locked while policy->lock is held
(as was the case with each callout that policy_walk_mappings() made to
the old save_hint() method).

Found by turning on the CONFIG_PROVE_LOCKING ("Lock debugging: prove
locking correctness") build option.  However, it is not clear how the
LOCKDEP reported paths can lead to a deadlock since the two paths,
suspending a target and loading a target, never occur at the same time.
But that doesn't mean the same lock-inversion couldn't have occurred
elsewhere.
Reported-by: Marian Csontos <mcsontos@redhat.com>
Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Cc: stable@vger.kernel.org

0596661f

dm thin: sort the per thin deferred bios using an rb_tree · 67324ea1

Mike Snitzer authored Mar 21, 2014

A thin-pool will allocate blocks using FIFO order for all thin devices
which share the thin-pool.  Because of this simplistic allocation the
thin-pool's space can become fragmented quite easily; especially when
multiple threads are requesting blocks in parallel.

Sort each thin device's deferred_bio_list based on logical sector to
help reduce fragmentation of the thin-pool's ondisk layout.

The following tables illustrate the realized gains/potential offered by
sorting each thin device's deferred_bio_list.  An "io size"-sized random
read of the device would result in "seeks/io" fragments being read, with
an average "distance/seek" between each fragment.

Data was written to a single thin device using multiple threads via
iozone (8 threads, 64K for both the block_size and io_size).

unsorted:

     io size   seeks/io distance/seek
  --------------------------------------
          4k    0.000   0b
         16k    0.013   11m
         64k    0.065   11m
        256k    0.274   10m
          1m    1.109   10m
          4m    4.411   10m
         16m    17.097  11m
         64m    60.055  13m
        256m    148.798 25m
          1g    809.929 21m

sorted:

     io size   seeks/io distance/seek
  --------------------------------------
          4k    0.000   0b
         16k    0.000   1g
         64k    0.001   1g
        256k    0.003   1g
          1m    0.011   1g
          4m    0.045   1g
         16m    0.181   1g
         64m    0.747   1011m
        256m    3.299   1g
          1g    14.373  1g
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Acked-by: Joe Thornber <ejt@redhat.com>

67324ea1

31 Mar, 2014 2 commits

dm thin: use per thin device deferred bio lists · c140e1c4

Mike Snitzer authored Mar 20, 2014

The thin-pool previously only had a single deferred_bios list that would
collect bios for all thin devices in the pool.  Split this per-pool
deferred_bios list out to per-thin deferred_bios_list -- doing so
enables increased parallelism when processing deferred bios.  And now
that each thin device has it's own deferred_bios_list we can sort all
bios in the list using logical sector.  The requeue code in error
handling path is also cleaner as a side-effect.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Acked-by: Joe Thornber <ejt@redhat.com>

c140e1c4

dm thin: simplify pool_is_congested · 760fe67e

Mike Snitzer authored Mar 20, 2014

The pool is congested if the pool is in PM_OUT_OF_DATA_SPACE mode.  This
is more explicit/clear/efficient than inferring whether or not the pool
is congested by checking if retry_on_resume_list is empty.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Acked-by: Joe Thornber <ejt@redhat.com>

760fe67e

28 Mar, 2014 1 commit

dm thin: fix dangling bio in process_deferred_bios error path · fe76cd88

Mike Snitzer authored Mar 28, 2014

If unable to ensure_next_mapping() we must add the current bio, which
was removed from the @bios list via bio_list_pop, back to the
deferred_bios list before all the remaining @bios.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Acked-by: Joe Thornber <ejt@redhat.com>
Cc: stable@vger.kernel.org

fe76cd88

27 Mar, 2014 19 commits

dm mpath: print more useful warnings in multipath_message() · a356e426

Jose Castillo authored Jan 29, 2014

The warning message "Unrecognised multipath message received" is
displayed in two different situations in multipath_message(): when the
number of arguments passed is invalid and when the string passed in
argv[0] is not recognized.

Make it easier to identify where the problem is by making these warnings
more specific with additional context for each case.
Signed-off-by: Jose Castillo <jcastillo@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

a356e426

dm-mpath: do not activate failed paths · 3a017509

Hannes Reinecke authored Feb 28, 2014

activate_path() is run without a lock, so the path might be
set to failed before activate_path() had a chance to run.
This patch add a check for ->active in activate_path() to
avoid unnecessary overhead by calling functions which are known
to be failing.
Signed-off-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

3a017509

dm mpath: remove extra nesting in map function · 9bf59a61

Mike Snitzer authored Feb 28, 2014

Return early for case when no path exists, and when the
pathgroup isn't ready. This eliminates the need for
extra nesting for the the common case.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Hannes Reinecke <hare@suse.de>

9bf59a61

dm mpath: remove map_io() · 36fcffcc

Hannes Reinecke authored Feb 28, 2014

multipath_map() is now just a wrapper around map_io(), so we
can rename map_io() to multipath_map().
Signed-off-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Reviewed-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>

36fcffcc

dm mpath: reduce memory pressure when requeuing · e3bde04f

Hannes Reinecke authored Feb 28, 2014

When multipath needs to requeue I/O in the block layer the per-request
context shouldn't be allocated, as it will be freed immediately
afterwards anyway.  Avoiding this memory allocation will reduce memory
pressure during requeuing.
Signed-off-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Reviewed-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>

e3bde04f

dm mpath: remove process_queued_ios() · 3e9f1be1

Hannes Reinecke authored Feb 28, 2014

process_queued_ios() has served 3 functions:
  1) select pg and pgpath if none is selected
  2) start pg_init if requested
  3) dispatch queued IOs when pg is ready

Basically, a call to queue_work(process_queued_ios) can be replaced by
dm_table_run_md_queue_async(), which runs request queue and ends up
calling map_io(), which does 1), 2) and 3).

Exception is when !pg_ready() (which means either pg_init is running or
requested), then multipath_busy() prevents map_io() being called from
request_fn.

If pg_init is running, it should be ok as long as pg_init_done() does
the right thing when pg_init is completed, I.e.: restart pg_init if
!pg_ready() or call dm_table_run_md_queue_async() to kick map_io().

If pg_init is requested, we have to make sure the request is detected
and pg_init will be started.  pg_init is requested in 3 places:
  a) __choose_pgpath() in map_io()
  b) __choose_pgpath() in multipath_ioctl()
  c) pg_init retry in pg_init_done()
a) is ok because map_io() calls __pg_init_all_paths(), which does 2).
b) needs a call to __pg_init_all_paths(), which does 2).
c) needs a call to __pg_init_all_paths(), which does 2).

So this patch removes process_queued_ios() and ensures that
__pg_init_all_paths() is called at the appropriate locations.
Signed-off-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Reviewed-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>

3e9f1be1

dm mpath: push back requests instead of queueing · e8099177

Hannes Reinecke authored Feb 28, 2014

There is no reason why multipath needs to queue requests internally for
queue_if_no_path or pg_init; we should rather push them back onto the
request queue.

And while we're at it we can simplify the conditional statement in
map_io() to make it easier to read.

Since mpath no longer does internal queuing of I/O the table info no
longer emits the internal queue_size.  Instead it displays 1 if queuing
is being used or 0 if it is not.
Signed-off-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Reviewed-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>

e8099177

dm table: add dm_table_run_md_queue_async · 9974fa2c

Mike Snitzer authored Feb 28, 2014

Introduce dm_table_run_md_queue_async() to run the request_queue of the
mapped_device associated with a request-based DM table.

Also add dm_md_get_queue() wrapper to extract the request_queue from a
mapped_device.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>

9974fa2c

dm mpath: do not call pg_init when it is already running · 17f4ff45

Hannes Reinecke authored Feb 28, 2014

This patch moves condition checks as a preparation of following
patches and has no effect on behaviour.
process_queued_ios() is the only caller of __pg_init_all_paths()
and 2 condition checks are moved from outside to inside without
side effects.
Signed-off-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Reviewed-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>

17f4ff45

dm: use RCU_INIT_POINTER instead of rcu_assign_pointer in __unbind · 9cdb8520

Monam Agarwal authored Mar 23, 2014

Replace rcu_assign_pointer(p, NULL) with RCU_INIT_POINTER(p, NULL).

The rcu_assign_pointer() ensures that the initialization of a structure
is carried out before storing a pointer to that structure.  And in the
case of the NULL pointer, there is no structure to initialize.  So,
rcu_assign_pointer(p, NULL) can be safely converted to
RCU_INIT_POINTER(p, NULL).
Signed-off-by: Monam Agarwal <monamagarwal123@gmail.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

9cdb8520

dm: stop using bi_private · bfc6d41c

Mikulas Patocka authored Mar 04, 2014

Device mapper uses the bio structure's bi_private field as a pointer
to dm_target_io or dm_rq_clone_bio_info.  But a bio structure is
embedded in the dm_target_io and dm_rq_clone_bio_info structures, so the
pointer to the structure that contains the bio can be found with the
container_of() macro.

Remove the use of bi_private and use container_of() instead.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

bfc6d41c

dm: remove dm_get_mapinfo · d70ab4fb

Mikulas Patocka authored Mar 03, 2014

Remove dm_get_mapinfo() because no target uses it.  Targets can allocate
per-bio data using ti->per_bio_data_size, this is much more flexible
than union map_info.

Leave union map_info only for the request-based multipath target's use.
Also delete the unused "unsigned long long ll" field of union map_info.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

d70ab4fb

dm: make dm_table_alloc_md_mempools static · 473c36df

Mikulas Patocka authored Feb 13, 2014

Make the function dm_table_alloc_md_mempools static because it is not
called from another file.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

473c36df

dm: take care to copy the space map roots before locking the superblock · 5a32083d

Joe Thornber authored Mar 27, 2014

In theory copying the space map root can fail, but in practice it never
does because we're careful to check what size buffer is needed.

But make certain we're able to copy the space map roots before
locking the superblock.
Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Cc: stable@vger.kernel.org # drop dm-era and dm-cache changes as needed

5a32083d

dm transaction manager: fix corruption due to non-atomic transaction commit · a9d45396

Joe Thornber authored Mar 27, 2014

The persistent-data library used by dm-thin, dm-cache, etc is
transactional.  If anything goes wrong, such as an io error when writing
new metadata or a power failure, then we roll back to the last
transaction.

Atomicity when committing a transaction is achieved by:

a) Never overwriting data from the previous transaction.
b) Writing the superblock last, after all other metadata has hit the
   disk.

This commit and the following commit ("dm: take care to copy the space
map roots before locking the superblock") fix a bug associated with (b).
When committing it was possible for the superblock to still be written
in spite of an io error occurring during the preceeding metadata flush.
With these commits we're careful not to take the write lock out on the
superblock until after the metadata flush has completed.

Change the transaction manager's semantics for dm_tm_commit() to assume
all data has been flushed _before_ the single superblock that is passed
in.

As a prerequisite, split the block manager's block unlocking and
flushing by simplifying dm_bm_flush_and_unlock() to dm_bm_flush().  Now
the unlocking must be done separately.

This issue was discovered by forcing io errors at the crucial time
using dm-flakey.
Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Cc: stable@vger.kernel.org

a9d45396

dm cache: remove remainder of distinct discard block size · 64ab346a

Heinz Mauelshagen authored Mar 27, 2014

Discard block size not being equal to cache block size causes data
corruption by erroneously avoiding migrations in issue_copy() because
the discard state is being cleared for a group of cache blocks when it
should not.

Completely remove all code that enabled a distinction between the
cache block size and discard block size.
Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

64ab346a

dm cache: prevent corruption caused by discard_block_size > cache_block_size · d132cc6d

Mike Snitzer authored Mar 20, 2014

If the discard block size is larger than the cache block size we will
not properly quiesce IO to a region that is about to be discarded.  This
results in a race between a cache migration where no copy is needed, and
a write to an adjacent cache block that's within the same large discard
block.

Workaround this by limiting the discard_block_size to cache_block_size.
Also limit the max_discard_sectors to cache_block_size.

A more comprehensive fix that introduces range locking support in the
bio_prison and proper quiescing of a discard range that spans multiple
cache blocks is already in development.
Reported-by: Morgan Mears <Morgan.Mears@netapp.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Acked-by: Joe Thornber <ejt@redhat.com>
Acked-by: Heinz Mauelshagen <heinzm@redhat.com>
Cc: stable@vger.kernel.org

d132cc6d

dm bitset: only flush the current word if it has been dirtied · 428e4698

Joe Thornber authored Mar 03, 2014

This change offers a big performance boost for dm-era.
Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

428e4698

dm: add era target · eec40579

Joe Thornber authored Mar 03, 2014

dm-era is a target that behaves similar to the linear target.  In
addition it keeps track of which blocks were written within a user
defined period of time called an 'era'.  Each era target instance
maintains the current era as a monotonically increasing 32-bit
counter.

Use cases include tracking changed blocks for backup software, and
partially invalidating the contents of a cache to restore cache
coherency after rolling back a vendor snapshot.

dm-era is primarily expected to be paired with the dm-cache target.
Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

eec40579

25 Mar, 2014 4 commits

Linux 3.14-rc8 · b098d672
Linus Torvalds authored Mar 24, 2014

b098d672

Merge branch 'parisc-3.14' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux · 82231646

Linus Torvalds authored Mar 24, 2014

Pull parisc updates from Helge Deller:
 - revert parts of the latest patch regarding font selection with STICON
   console
 - wire up the utimes() syscall for parisc
 - remove the unused parisc tmpalias code and unnecessary arch*relax
   defines

* 'parisc-3.14' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
  parisc: locks: remove redundant arch_*_relax operations
  parisc: wire up sys_utimes
  parisc: Remove unused CONFIG_PARISC_TMPALIAS code
  partly revert commit 8a10bc9d: parisc/sti_console: prefer Linux fonts over built-in ROM fonts

82231646

Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc · 56f1f4b2

Linus Torvalds authored Mar 24, 2014

Pull sparc fixes from David Miller:

 1) Do serial locking in a way that makes things clear that these are
    IRQ spinlocks.

 2) Conversion to generic idle loop broke first generation Niagara
    machines, need to have %pil interrupts enabled during cpu yield
    hypervisor call.

 3) Do not use magic constants for iterations over tsb tables, from Doug
    Wilson.

 4) Fix erroneous truncation of 64-bit system call return values to
    32-bit.  From Dave Kleikamp.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc:
  sparc64: Make sure %pil interrupts are enabled during hypervisor yield.
  sparc64:tsb.c:use array size macro rather than number
  sparc64: don't treat 64-bit syscall return codes as 32-bit
  sparc: serial: Clean up the locking for -rt

56f1f4b2

Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · 8a109446

Linus Torvalds authored Mar 24, 2014

Pull networking fixes from David Miller:

 1) OpenVswitch's lookup_datapath() returns error pointers, so don't
    check against NULL.  From Jiri Pirko.

 2) pfkey_compile_policy() code path tries to do a GFP_KERNEL allocation
    under RCU locks, fix by using GFP_ATOMIC when necessary.  From
    Nikolay Aleksandrov.

 3) phy_suspend() indirectly passes uninitialized data into the ethtool
    get wake-on-land implementations.  Fix from Sebastian Hesselbarth.

 4) CPSW driver unregisters CPTS twice, fix from Benedikt Spranger.

 5) If SKB allocation of reply packet fails, vxlan's arp_reduce() defers
    a NULL pointer.  Fix from David Stevens.

 6) IPV6 neigh handling in vxlan doesn't validate the destination
    address properly, and it builds a packet with the src and dst
    reversed.  Fix also from David Stevens.

 7) Fix spinlock recursion during subscription failures in TIPC stack,
    from Erik Hugne.

 8) Revert buggy conversion of davinci_emac to devm_request_irq, from
    Chrstian Riesch.

 9) Wrong flags passed into forwarding database netlink notifications,
    from Nicolas Dichtel.

10) The netpoll neighbour soliciation handler checks wrong ethertype,
    needs to be ETH_P_IPV6 rather than ETH_P_ARP.  Fix from Li RongQing.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (34 commits)
  tipc: fix spinlock recursion bug for failed subscriptions
  vxlan: fix nonfunctional neigh_reduce()
  net: davinci_emac: Fix rollback of emac_dev_open()
  net: davinci_emac: Replace devm_request_irq with request_irq
  netpoll: fix the skb check in pkt_is_ns
  net: micrel : ks8851-ml: add vdd-supply support
  ip6mr: fix mfc notification flags
  ipmr: fix mfc notification flags
  rtnetlink: fix fdb notification flags
  tcp: syncookies: do not use getnstimeofday()
  netlink: fix setsockopt in mmap examples in documentation
  openvswitch: Correctly report flow used times for first 5 minutes after boot.
  via-rhine: Disable device in error path
  ATHEROS-ATL1E: Convert iounmap to pci_iounmap
  vxlan: fix potential NULL dereference in arp_reduce()
  cnic: Update version to 2.5.20 and copyright year.
  cnic,bnx2i,bnx2fc: Fix inconsistent use of page size
  cnic: Use proper ulp_ops for per device operations.
  net: cdc_ncm: fix control message ordering
  ipv6: ip6_append_data_mtu do not handle the mtu of the second fragment properly
  ...

8a109446

24 Mar, 2014 8 commits

tipc: fix spinlock recursion bug for failed subscriptions · a5d0e7c0

Erik Hugne authored Mar 24, 2014

If a topology event subscription fails for any reason, such as out
of memory, max number reached or because we received an invalid
request the correct behavior is to terminate the subscribers
connection to the topology server. This is currently broken and
produces the following oops:

[27.953662] tipc: Subscription rejected, illegal request
[27.955329] BUG: spinlock recursion on CPU#1, kworker/u4:0/6
[27.957066]  lock: 0xffff88003c67f408, .magic: dead4ead, .owner: kworker/u4:0/6, .owner_cpu: 1
[27.958054] CPU: 1 PID: 6 Comm: kworker/u4:0 Not tainted 3.14.0-rc6+ #5
[27.960230] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[27.960874] Workqueue: tipc_rcv tipc_recv_work [tipc]
[27.961430]  ffff88003c67f408 ffff88003de27c18 ffffffff815c0207 ffff88003de1c050
[27.962292]  ffff88003de27c38 ffffffff815beec5 ffff88003c67f408 ffffffff817f0a8a
[27.963152]  ffff88003de27c58 ffffffff815beeeb ffff88003c67f408 ffffffffa0013520
[27.964023] Call Trace:
[27.964292]  [<ffffffff815c0207>] dump_stack+0x45/0x56
[27.964874]  [<ffffffff815beec5>] spin_dump+0x8c/0x91
[27.965420]  [<ffffffff815beeeb>] spin_bug+0x21/0x26
[27.965995]  [<ffffffff81083df6>] do_raw_spin_lock+0x116/0x140
[27.966631]  [<ffffffff815c6215>] _raw_spin_lock_bh+0x15/0x20
[27.967256]  [<ffffffffa0008540>] subscr_conn_shutdown_event+0x20/0xa0 [tipc]
[27.968051]  [<ffffffffa000fde4>] tipc_close_conn+0xa4/0xb0 [tipc]
[27.968722]  [<ffffffffa00101ba>] tipc_conn_terminate+0x1a/0x30 [tipc]
[27.969436]  [<ffffffffa00089a2>] subscr_conn_msg_event+0x1f2/0x2f0 [tipc]
[27.970209]  [<ffffffffa0010000>] tipc_receive_from_sock+0x90/0xf0 [tipc]
[27.970972]  [<ffffffffa000fa79>] tipc_recv_work+0x29/0x50 [tipc]
[27.971633]  [<ffffffff8105dbf5>] process_one_work+0x165/0x3e0
[27.972267]  [<ffffffff8105e869>] worker_thread+0x119/0x3a0
[27.972896]  [<ffffffff8105e750>] ? manage_workers.isra.25+0x2a0/0x2a0
[27.973622]  [<ffffffff810648af>] kthread+0xdf/0x100
[27.974168]  [<ffffffff810647d0>] ? kthread_create_on_node+0x1a0/0x1a0
[27.974893]  [<ffffffff815ce13c>] ret_from_fork+0x7c/0xb0
[27.975466]  [<ffffffff810647d0>] ? kthread_create_on_node+0x1a0/0x1a0

The recursion occurs when subscr_terminate tries to grab the
subscriber lock, which is already taken by subscr_conn_msg_event.
We fix this by checking if the request to establish a new
subscription was successful, and if not we initiate termination of
the subscriber after we have released the subscriber lock.
Signed-off-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

a5d0e7c0

vxlan: fix nonfunctional neigh_reduce() · 4b29dba9

David Stevens authored Mar 24, 2014

The VXLAN neigh_reduce() code is completely non-functional since
check-in. Specific errors:

1) The original code drops all packets with a multicast destination address,
	even though neighbor solicitations are sent to the solicited-node
	address, a multicast address. The code after this check was never run.
2) The neighbor table lookup used the IPv6 header destination, which is the
	solicited node address, rather than the target address from the
	neighbor solicitation. So neighbor lookups would always fail if it
	got this far. Also for L3MISSes.
3) The code calls ndisc_send_na(), which does a send on the tunnel device.
	The context for neigh_reduce() is the transmit path, vxlan_xmit(),
	where the host or a bridge-attached neighbor is trying to transmit
	a neighbor solicitation. To respond to it, the tunnel endpoint needs
	to do a *receive* of the appropriate neighbor advertisement. Doing a
	send, would only try to send the advertisement, encapsulated, to the
	remote destinations in the fdb -- hosts that definitely did not do the
	corresponding solicitation.
4) The code uses the tunnel endpoint IPv6 forwarding flag to determine the
	isrouter flag in the advertisement. This has nothing to do with whether
	or not the target is a router, and generally won't be set since the
	tunnel endpoint is bridging, not routing, traffic.

	The patch below creates a proxy neighbor advertisement to respond to
neighbor solicitions as intended, providing proper IPv6 support for neighbor
reduction.
Signed-off-by: David L Stevens <dlstevens@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

4b29dba9

Merge branch 'davinci_emac' · 866b7cdf

David S. Miller authored Mar 24, 2014

Christian Riesch says:

====================
net: davinci_emac: Fix interrupt requests and error handling

since commit 6892b41d (Linux 3.11) the
davinci_emac driver is broken. After doing ifconfig down, ifconfig up,
requesting the interrupts for the driver fails. The interface remains dead
until the board is rebooted.

The first patch in this patchset reverts commit
6892b41d partially and makes the driver
useable again.

During the work on the first patch, a number of bugs in the error handling
of the driver's ndo_open code were found. The second patch fixes these bugs.

I believe the first patch meets the rules for stable kernels, I therefore added
the stable tag to this patch. The second patch is just cleanup, the code
that is fixed by this patch is only executed in case of an error.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

866b7cdf

net: davinci_emac: Fix rollback of emac_dev_open() · cd11cf50

Christian Riesch authored Mar 24, 2014

If an error occurs during the initialization in emac_dev_open() (the
driver's ndo_open function), interrupts, DMA descriptors etc. must be freed.
The current rollback code is buggy in several ways.

  1) Freeing the interrupts. The current code will not free all interrupts
     that were requested by the driver. Furthermore,  the code tries to do a
     platform_get_resource(priv->pdev, IORESOURCE_IRQ, -1) in its last
     iteration.

     This patch fixes these bugs.

  2) Wrong order of err: and rollback: labels. If the setup of the PHY in
     the code fails, the interrupts that have been requested before are
     not freed:

        request irq
                if requesting irqs fails, goto rollback
        setup phy
                if phy setup fails, goto err
        return 0

     rollback:
        free irqs
     err:

     This patch brings the code into the correct order.

  3) The code calls napi_enable() and emac_int_enable(), but does not
     undo both in case of an error.

     This patch adds calls of emac_int_disable() and napi_disable() to the
     rollback code.

  4) RX DMA descriptors are not freed in case of an error: Right before
     requesting the irqs, the function creates DMA descriptors for the
     RX channel. These RX descriptors are never freed when we jump to either
     rollback or err.

     This patch adds code for freeing the DMA descriptors in the case of
     an initialization error. This required a modification of
     cpdma_ctrl_stop() in davinci_cpdma.c: We must be able to call this
     function to free the DMA descriptors while the DMA channels are
     in IDLE state (before cpdma_ctlr_start() was called).

Tested on a custom board with the Texas Instruments AM1808.
Signed-off-by: Christian Riesch <christian.riesch@omicron.at>
Signed-off-by: David S. Miller <davem@davemloft.net>

cd11cf50

net: davinci_emac: Replace devm_request_irq with request_irq · 33b7107f

Christian Riesch authored Mar 24, 2014

In commit 6892b41d

Author: Lad, Prabhakar <prabhakar.csengg@gmail.com>
Date:   Tue Jun 25 21:24:51 2013 +0530
net: davinci: emac: Convert to devm_* api

the call of request_irq is replaced by devm_request_irq and the call
of free_irq is removed. But since interrupts are requested in
emac_dev_open, doing ifconfig up/down on the board requests the
interrupts again each time, causing devm_request_irq to fail. The
interface is dead until the device is rebooted.

This patch reverts said commit partially: It changes the driver back
to use request_irq instead of devm_request_irq, puts free_irq back in
place, but keeps the remaining changes of the original patch.
Reported-by: Jon Ringle <jon@ringle.org>
Signed-off-by: Christian Riesch <christian.riesch@omicron.at>
Cc: Lad, Prabhakar <prabhakar.csengg@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

33b7107f

netpoll: fix the skb check in pkt_is_ns · c27f0872

Li RongQing authored Mar 21, 2014

Neighbor Solicitation is ipv6 protocol, so we should check
skb->protocol with ETH_P_IPV6
Signed-off-by: Li RongQing <roy.qing.li@gmail.com>
Cc: WANG Cong <amwang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

c27f0872

sparc64: Make sure %pil interrupts are enabled during hypervisor yield. · cb3042d6

David S. Miller authored Mar 24, 2014

In arch_cpu_idle() we must enable %pil based interrupts before
potentially invoking the hypervisor cpu yield call.

As per the Hypervisor API documentation for cpu_yield:

	Interrupts which are blocked by some mechanism other that
	pstate.ie (for example %pil) are not guaranteed to cause
	a return from this service.

It seems that only first generation Niagara chips are hit by this
bug.  My best guess is that later chips implement this in hardware
and wake up anyways from %pil events, whereas in first generation
chips the yield is implemented completely in hypervisor code and
requires %pil to be enabled in order to wake properly from this
call.

Fixes: 87fa05ae ("sparc: Use generic idle loop")
Reported-by: Fabio M. Di Nitto <fabbione@fabbione.net>
Reported-by: Jan Engelhardt <jengelh@inai.de>
Tested-by: Jan Engelhardt <jengelh@inai.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

cb3042d6

net: micrel : ks8851-ml: add vdd-supply support · ebf4ad95

Nishanth Menon authored Mar 21, 2014

Few platforms use external regulator to keep the ethernet MAC supplied.
So, request and enable the regulator for driver functionality.

Fixes: 66fda75f (regulator: core: Replace direct ops->disable usage)
Reported-by: Russell King <rmk+kernel@arm.linux.org.uk>
Suggested-by: Markus Pargmann <mpa@pengutronix.de>
Signed-off-by: Nishanth Menon <nm@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ebf4ad95

23 Mar, 2014 2 commits

parisc: locks: remove redundant arch_*_relax operations · a34fe107

Will Deacon authored Feb 21, 2014

Now that the arch_{spin,read,write}_relax macros default to cpu_relax(),
remove the redundant definitions for parisc.

Cc: Helge Deller <deller@gmx.de>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Helge Deller <deller@gmx.de>

a34fe107

parisc: wire up sys_utimes · e9af8b7a

Helge Deller authored Mar 23, 2014

We seem to be nearly the only platform which does not provide the
sys_utimes syscall.  Adding it now makes our life much easier with
userspace applications (like dietlibc and e2fsprogs) since we then
behave like all other platforms too and don't need extra patches which
are hard to get upstream anyway because we are not a mainstream
architecture.
Signed-off-by: Helge Deller <deller@gmx.de>
Cc: stable@vger.kernel.org # v3.13

e9af8b7a