1. 01 May, 2014 1 commit
  2. 29 Apr, 2014 1 commit
  3. 15 Apr, 2014 1 commit
  4. 08 Apr, 2014 2 commits
    • Joe Thornber's avatar
      dm thin: fix rcu_read_lock being held in code that can sleep · b10ebd34
      Joe Thornber authored
      Commit c140e1c4 ("dm thin: use per thin device deferred bio lists")
      introduced the use of an rculist for all active thin devices.  The use
      of rcu_read_lock() in process_deferred_bios() can result in a BUG if a
      dm_bio_prison_cell must be allocated as a side-effect of bio_detain():
      
       BUG: sleeping function called from invalid context at mm/mempool.c:203
       in_atomic(): 1, irqs_disabled(): 0, pid: 6, name: kworker/u8:0
       3 locks held by kworker/u8:0/6:
         #0:  ("dm-" "thin"){.+.+..}, at: [<ffffffff8106be42>] process_one_work+0x192/0x550
         #1:  ((&pool->worker)){+.+...}, at: [<ffffffff8106be42>] process_one_work+0x192/0x550
         #2:  (rcu_read_lock){.+.+..}, at: [<ffffffff816360b5>] do_worker+0x5/0x4d0
      
      We can't process deferred bios with the rcu lock held, since
      dm_bio_prison_cell allocation may block if the bio-prison's cell mempool
      is exhausted.
      
      To fix:
      
      - Introduce a refcount and completion field to each thin_c
      
      - Add thin_get/put methods for adjusting the refcount.  If the refcount
        hits zero then the completion is triggered.
      
      - Initialise refcount to 1 when creating thin_c
      
      - When iterating the active_thins list we thin_get() whilst the rcu
        lock is held.
      
      - After the rcu lock is dropped we process the deferred bios for that
        thin.
      
      - When destroying a thin_c we thin_put() and then wait for the
        completion -- to avoid a race between the worker thread iterating
        from that thin_c and destroying the thin_c.
      Signed-off-by: default avatarJoe Thornber <ejt@redhat.com>
      Signed-off-by: default avatarMike Snitzer <snitzer@redhat.com>
      b10ebd34
    • Joe Thornber's avatar
      dm thin: irqsave must always be used with the pool->lock spinlock · 5e3283e2
      Joe Thornber authored
      Commit c140e1c4 ("dm thin: use per thin device deferred bio lists")
      incorrectly stopped disabling irqs when taking the pool's spinlock.
      
      Irqs must be disabled when taking the pool's spinlock otherwise a thread
      could spin_lock(), then get interrupted to service thin_endio() in
      interrupt context, which would then deadlock in spin_lock_irqsave().
      Signed-off-by: default avatarJoe Thornber <ejt@redhat.com>
      Signed-off-by: default avatarMike Snitzer <snitzer@redhat.com>
      5e3283e2
  5. 04 Apr, 2014 2 commits
    • Joe Thornber's avatar
      dm cache: fix a lock-inversion · 0596661f
      Joe Thornber authored
      When suspending a cache the policy is walked and the individual policy
      hints written to the metadata via sync_metadata().  This led to this
      lock order:
      
            policy->lock
              cache_metadata->root_lock
      
      When loading the cache target the policy is populated while the metadata
      lock is held:
      
            cache_metadata->root_lock
               policy->lock
      
      Fix this potential lock-inversion (ABBA) deadlock in sync_metadata() by
      ensuring the cache_metadata root_lock is held whilst all the hints are
      written, rather than being repeatedly locked while policy->lock is held
      (as was the case with each callout that policy_walk_mappings() made to
      the old save_hint() method).
      
      Found by turning on the CONFIG_PROVE_LOCKING ("Lock debugging: prove
      locking correctness") build option.  However, it is not clear how the
      LOCKDEP reported paths can lead to a deadlock since the two paths,
      suspending a target and loading a target, never occur at the same time.
      But that doesn't mean the same lock-inversion couldn't have occurred
      elsewhere.
      Reported-by: default avatarMarian Csontos <mcsontos@redhat.com>
      Signed-off-by: default avatarJoe Thornber <ejt@redhat.com>
      Signed-off-by: default avatarMike Snitzer <snitzer@redhat.com>
      Cc: stable@vger.kernel.org
      0596661f
    • Mike Snitzer's avatar
      dm thin: sort the per thin deferred bios using an rb_tree · 67324ea1
      Mike Snitzer authored
      A thin-pool will allocate blocks using FIFO order for all thin devices
      which share the thin-pool.  Because of this simplistic allocation the
      thin-pool's space can become fragmented quite easily; especially when
      multiple threads are requesting blocks in parallel.
      
      Sort each thin device's deferred_bio_list based on logical sector to
      help reduce fragmentation of the thin-pool's ondisk layout.
      
      The following tables illustrate the realized gains/potential offered by
      sorting each thin device's deferred_bio_list.  An "io size"-sized random
      read of the device would result in "seeks/io" fragments being read, with
      an average "distance/seek" between each fragment.
      
      Data was written to a single thin device using multiple threads via
      iozone (8 threads, 64K for both the block_size and io_size).
      
      unsorted:
      
           io size   seeks/io distance/seek
        --------------------------------------
                4k    0.000   0b
               16k    0.013   11m
               64k    0.065   11m
              256k    0.274   10m
                1m    1.109   10m
                4m    4.411   10m
               16m    17.097  11m
               64m    60.055  13m
              256m    148.798 25m
                1g    809.929 21m
      
      sorted:
      
           io size   seeks/io distance/seek
        --------------------------------------
                4k    0.000   0b
               16k    0.000   1g
               64k    0.001   1g
              256k    0.003   1g
                1m    0.011   1g
                4m    0.045   1g
               16m    0.181   1g
               64m    0.747   1011m
              256m    3.299   1g
                1g    14.373  1g
      Signed-off-by: default avatarMike Snitzer <snitzer@redhat.com>
      Acked-by: default avatarJoe Thornber <ejt@redhat.com>
      67324ea1
  6. 31 Mar, 2014 2 commits
  7. 28 Mar, 2014 1 commit
  8. 27 Mar, 2014 19 commits
  9. 25 Mar, 2014 4 commits
    • Linus Torvalds's avatar
      Linux 3.14-rc8 · b098d672
      Linus Torvalds authored
      b098d672
    • Linus Torvalds's avatar
      Merge branch 'parisc-3.14' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux · 82231646
      Linus Torvalds authored
      Pull parisc updates from Helge Deller:
       - revert parts of the latest patch regarding font selection with STICON
         console
       - wire up the utimes() syscall for parisc
       - remove the unused parisc tmpalias code and unnecessary arch*relax
         defines
      
      * 'parisc-3.14' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
        parisc: locks: remove redundant arch_*_relax operations
        parisc: wire up sys_utimes
        parisc: Remove unused CONFIG_PARISC_TMPALIAS code
        partly revert commit 8a10bc9d: parisc/sti_console: prefer Linux fonts over built-in ROM fonts
      82231646
    • Linus Torvalds's avatar
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc · 56f1f4b2
      Linus Torvalds authored
      Pull sparc fixes from David Miller:
      
       1) Do serial locking in a way that makes things clear that these are
          IRQ spinlocks.
      
       2) Conversion to generic idle loop broke first generation Niagara
          machines, need to have %pil interrupts enabled during cpu yield
          hypervisor call.
      
       3) Do not use magic constants for iterations over tsb tables, from Doug
          Wilson.
      
       4) Fix erroneous truncation of 64-bit system call return values to
          32-bit.  From Dave Kleikamp.
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc:
        sparc64: Make sure %pil interrupts are enabled during hypervisor yield.
        sparc64:tsb.c:use array size macro rather than number
        sparc64: don't treat 64-bit syscall return codes as 32-bit
        sparc: serial: Clean up the locking for -rt
      56f1f4b2
    • Linus Torvalds's avatar
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · 8a109446
      Linus Torvalds authored
      Pull networking fixes from David Miller:
      
       1) OpenVswitch's lookup_datapath() returns error pointers, so don't
          check against NULL.  From Jiri Pirko.
      
       2) pfkey_compile_policy() code path tries to do a GFP_KERNEL allocation
          under RCU locks, fix by using GFP_ATOMIC when necessary.  From
          Nikolay Aleksandrov.
      
       3) phy_suspend() indirectly passes uninitialized data into the ethtool
          get wake-on-land implementations.  Fix from Sebastian Hesselbarth.
      
       4) CPSW driver unregisters CPTS twice, fix from Benedikt Spranger.
      
       5) If SKB allocation of reply packet fails, vxlan's arp_reduce() defers
          a NULL pointer.  Fix from David Stevens.
      
       6) IPV6 neigh handling in vxlan doesn't validate the destination
          address properly, and it builds a packet with the src and dst
          reversed.  Fix also from David Stevens.
      
       7) Fix spinlock recursion during subscription failures in TIPC stack,
          from Erik Hugne.
      
       8) Revert buggy conversion of davinci_emac to devm_request_irq, from
          Chrstian Riesch.
      
       9) Wrong flags passed into forwarding database netlink notifications,
          from Nicolas Dichtel.
      
      10) The netpoll neighbour soliciation handler checks wrong ethertype,
          needs to be ETH_P_IPV6 rather than ETH_P_ARP.  Fix from Li RongQing.
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (34 commits)
        tipc: fix spinlock recursion bug for failed subscriptions
        vxlan: fix nonfunctional neigh_reduce()
        net: davinci_emac: Fix rollback of emac_dev_open()
        net: davinci_emac: Replace devm_request_irq with request_irq
        netpoll: fix the skb check in pkt_is_ns
        net: micrel : ks8851-ml: add vdd-supply support
        ip6mr: fix mfc notification flags
        ipmr: fix mfc notification flags
        rtnetlink: fix fdb notification flags
        tcp: syncookies: do not use getnstimeofday()
        netlink: fix setsockopt in mmap examples in documentation
        openvswitch: Correctly report flow used times for first 5 minutes after boot.
        via-rhine: Disable device in error path
        ATHEROS-ATL1E: Convert iounmap to pci_iounmap
        vxlan: fix potential NULL dereference in arp_reduce()
        cnic: Update version to 2.5.20 and copyright year.
        cnic,bnx2i,bnx2fc: Fix inconsistent use of page size
        cnic: Use proper ulp_ops for per device operations.
        net: cdc_ncm: fix control message ordering
        ipv6: ip6_append_data_mtu do not handle the mtu of the second fragment properly
        ...
      8a109446
  10. 24 Mar, 2014 7 commits
    • Erik Hugne's avatar
      tipc: fix spinlock recursion bug for failed subscriptions · a5d0e7c0
      Erik Hugne authored
      If a topology event subscription fails for any reason, such as out
      of memory, max number reached or because we received an invalid
      request the correct behavior is to terminate the subscribers
      connection to the topology server. This is currently broken and
      produces the following oops:
      
      [27.953662] tipc: Subscription rejected, illegal request
      [27.955329] BUG: spinlock recursion on CPU#1, kworker/u4:0/6
      [27.957066]  lock: 0xffff88003c67f408, .magic: dead4ead, .owner: kworker/u4:0/6, .owner_cpu: 1
      [27.958054] CPU: 1 PID: 6 Comm: kworker/u4:0 Not tainted 3.14.0-rc6+ #5
      [27.960230] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
      [27.960874] Workqueue: tipc_rcv tipc_recv_work [tipc]
      [27.961430]  ffff88003c67f408 ffff88003de27c18 ffffffff815c0207 ffff88003de1c050
      [27.962292]  ffff88003de27c38 ffffffff815beec5 ffff88003c67f408 ffffffff817f0a8a
      [27.963152]  ffff88003de27c58 ffffffff815beeeb ffff88003c67f408 ffffffffa0013520
      [27.964023] Call Trace:
      [27.964292]  [<ffffffff815c0207>] dump_stack+0x45/0x56
      [27.964874]  [<ffffffff815beec5>] spin_dump+0x8c/0x91
      [27.965420]  [<ffffffff815beeeb>] spin_bug+0x21/0x26
      [27.965995]  [<ffffffff81083df6>] do_raw_spin_lock+0x116/0x140
      [27.966631]  [<ffffffff815c6215>] _raw_spin_lock_bh+0x15/0x20
      [27.967256]  [<ffffffffa0008540>] subscr_conn_shutdown_event+0x20/0xa0 [tipc]
      [27.968051]  [<ffffffffa000fde4>] tipc_close_conn+0xa4/0xb0 [tipc]
      [27.968722]  [<ffffffffa00101ba>] tipc_conn_terminate+0x1a/0x30 [tipc]
      [27.969436]  [<ffffffffa00089a2>] subscr_conn_msg_event+0x1f2/0x2f0 [tipc]
      [27.970209]  [<ffffffffa0010000>] tipc_receive_from_sock+0x90/0xf0 [tipc]
      [27.970972]  [<ffffffffa000fa79>] tipc_recv_work+0x29/0x50 [tipc]
      [27.971633]  [<ffffffff8105dbf5>] process_one_work+0x165/0x3e0
      [27.972267]  [<ffffffff8105e869>] worker_thread+0x119/0x3a0
      [27.972896]  [<ffffffff8105e750>] ? manage_workers.isra.25+0x2a0/0x2a0
      [27.973622]  [<ffffffff810648af>] kthread+0xdf/0x100
      [27.974168]  [<ffffffff810647d0>] ? kthread_create_on_node+0x1a0/0x1a0
      [27.974893]  [<ffffffff815ce13c>] ret_from_fork+0x7c/0xb0
      [27.975466]  [<ffffffff810647d0>] ? kthread_create_on_node+0x1a0/0x1a0
      
      The recursion occurs when subscr_terminate tries to grab the
      subscriber lock, which is already taken by subscr_conn_msg_event.
      We fix this by checking if the request to establish a new
      subscription was successful, and if not we initiate termination of
      the subscriber after we have released the subscriber lock.
      Signed-off-by: default avatarErik Hugne <erik.hugne@ericsson.com>
      Reviewed-by: default avatarJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      a5d0e7c0
    • David Stevens's avatar
      vxlan: fix nonfunctional neigh_reduce() · 4b29dba9
      David Stevens authored
      The VXLAN neigh_reduce() code is completely non-functional since
      check-in. Specific errors:
      
      1) The original code drops all packets with a multicast destination address,
      	even though neighbor solicitations are sent to the solicited-node
      	address, a multicast address. The code after this check was never run.
      2) The neighbor table lookup used the IPv6 header destination, which is the
      	solicited node address, rather than the target address from the
      	neighbor solicitation. So neighbor lookups would always fail if it
      	got this far. Also for L3MISSes.
      3) The code calls ndisc_send_na(), which does a send on the tunnel device.
      	The context for neigh_reduce() is the transmit path, vxlan_xmit(),
      	where the host or a bridge-attached neighbor is trying to transmit
      	a neighbor solicitation. To respond to it, the tunnel endpoint needs
      	to do a *receive* of the appropriate neighbor advertisement. Doing a
      	send, would only try to send the advertisement, encapsulated, to the
      	remote destinations in the fdb -- hosts that definitely did not do the
      	corresponding solicitation.
      4) The code uses the tunnel endpoint IPv6 forwarding flag to determine the
      	isrouter flag in the advertisement. This has nothing to do with whether
      	or not the target is a router, and generally won't be set since the
      	tunnel endpoint is bridging, not routing, traffic.
      
      	The patch below creates a proxy neighbor advertisement to respond to
      neighbor solicitions as intended, providing proper IPv6 support for neighbor
      reduction.
      Signed-off-by: default avatarDavid L Stevens <dlstevens@us.ibm.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      4b29dba9
    • David S. Miller's avatar
      Merge branch 'davinci_emac' · 866b7cdf
      David S. Miller authored
      Christian Riesch says:
      
      ====================
      net: davinci_emac: Fix interrupt requests and error handling
      
      since commit 6892b41d (Linux 3.11) the
      davinci_emac driver is broken. After doing ifconfig down, ifconfig up,
      requesting the interrupts for the driver fails. The interface remains dead
      until the board is rebooted.
      
      The first patch in this patchset reverts commit
      6892b41d partially and makes the driver
      useable again.
      
      During the work on the first patch, a number of bugs in the error handling
      of the driver's ndo_open code were found. The second patch fixes these bugs.
      
      I believe the first patch meets the rules for stable kernels, I therefore added
      the stable tag to this patch. The second patch is just cleanup, the code
      that is fixed by this patch is only executed in case of an error.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      866b7cdf
    • Christian Riesch's avatar
      net: davinci_emac: Fix rollback of emac_dev_open() · cd11cf50
      Christian Riesch authored
      If an error occurs during the initialization in emac_dev_open() (the
      driver's ndo_open function), interrupts, DMA descriptors etc. must be freed.
      The current rollback code is buggy in several ways.
      
        1) Freeing the interrupts. The current code will not free all interrupts
           that were requested by the driver. Furthermore,  the code tries to do a
           platform_get_resource(priv->pdev, IORESOURCE_IRQ, -1) in its last
           iteration.
      
           This patch fixes these bugs.
      
        2) Wrong order of err: and rollback: labels. If the setup of the PHY in
           the code fails, the interrupts that have been requested before are
           not freed:
      
              request irq
                      if requesting irqs fails, goto rollback
              setup phy
                      if phy setup fails, goto err
              return 0
      
           rollback:
              free irqs
           err:
      
           This patch brings the code into the correct order.
      
        3) The code calls napi_enable() and emac_int_enable(), but does not
           undo both in case of an error.
      
           This patch adds calls of emac_int_disable() and napi_disable() to the
           rollback code.
      
        4) RX DMA descriptors are not freed in case of an error: Right before
           requesting the irqs, the function creates DMA descriptors for the
           RX channel. These RX descriptors are never freed when we jump to either
           rollback or err.
      
           This patch adds code for freeing the DMA descriptors in the case of
           an initialization error. This required a modification of
           cpdma_ctrl_stop() in davinci_cpdma.c: We must be able to call this
           function to free the DMA descriptors while the DMA channels are
           in IDLE state (before cpdma_ctlr_start() was called).
      
      Tested on a custom board with the Texas Instruments AM1808.
      Signed-off-by: default avatarChristian Riesch <christian.riesch@omicron.at>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      cd11cf50
    • Christian Riesch's avatar
      net: davinci_emac: Replace devm_request_irq with request_irq · 33b7107f
      Christian Riesch authored
      In commit 6892b41d
      
      Author: Lad, Prabhakar <prabhakar.csengg@gmail.com>
      Date:   Tue Jun 25 21:24:51 2013 +0530
      net: davinci: emac: Convert to devm_* api
      
      the call of request_irq is replaced by devm_request_irq and the call
      of free_irq is removed. But since interrupts are requested in
      emac_dev_open, doing ifconfig up/down on the board requests the
      interrupts again each time, causing devm_request_irq to fail. The
      interface is dead until the device is rebooted.
      
      This patch reverts said commit partially: It changes the driver back
      to use request_irq instead of devm_request_irq, puts free_irq back in
      place, but keeps the remaining changes of the original patch.
      Reported-by: default avatarJon Ringle <jon@ringle.org>
      Signed-off-by: default avatarChristian Riesch <christian.riesch@omicron.at>
      Cc: Lad, Prabhakar <prabhakar.csengg@gmail.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      33b7107f
    • Li RongQing's avatar
      netpoll: fix the skb check in pkt_is_ns · c27f0872
      Li RongQing authored
      Neighbor Solicitation is ipv6 protocol, so we should check
      skb->protocol with ETH_P_IPV6
      Signed-off-by: default avatarLi RongQing <roy.qing.li@gmail.com>
      Cc: WANG Cong <amwang@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      c27f0872
    • David S. Miller's avatar
      sparc64: Make sure %pil interrupts are enabled during hypervisor yield. · cb3042d6
      David S. Miller authored
      In arch_cpu_idle() we must enable %pil based interrupts before
      potentially invoking the hypervisor cpu yield call.
      
      As per the Hypervisor API documentation for cpu_yield:
      
      	Interrupts which are blocked by some mechanism other that
      	pstate.ie (for example %pil) are not guaranteed to cause
      	a return from this service.
      
      It seems that only first generation Niagara chips are hit by this
      bug.  My best guess is that later chips implement this in hardware
      and wake up anyways from %pil events, whereas in first generation
      chips the yield is implemented completely in hypervisor code and
      requires %pil to be enabled in order to wake properly from this
      call.
      
      Fixes: 87fa05ae ("sparc: Use generic idle loop")
      Reported-by: default avatarFabio M. Di Nitto <fabbione@fabbione.net>
      Reported-by: default avatarJan Engelhardt <jengelh@inai.de>
      Tested-by: default avatarJan Engelhardt <jengelh@inai.de>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      cb3042d6