1. 25 Jan, 2016 2 commits
    • Naoya Horiguchi's avatar
      mm: hugetlb: call huge_pte_alloc() only if ptep is null · 41d25989
      Naoya Horiguchi authored
      commit 0d777df5 upstream.
      
      Currently at the beginning of hugetlb_fault(), we call huge_pte_offset()
      and check whether the obtained *ptep is a migration/hwpoison entry or
      not.  And if not, then we get to call huge_pte_alloc().  This is racy
      because the *ptep could turn into migration/hwpoison entry after the
      huge_pte_offset() check.  This race results in BUG_ON in
      huge_pte_alloc().
      
      We don't have to call huge_pte_alloc() when the huge_pte_offset()
      returns non-NULL, so let's fix this bug with moving the code into else
      block.
      
      Note that the *ptep could turn into a migration/hwpoison entry after
      this block, but that's not a problem because we have another
      !pte_present check later (we never go into hugetlb_no_page() in that
      case.)
      
      Fixes: 290408d4 ("hugetlb: hugepage migration core")
      Signed-off-by: default avatarNaoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Acked-by: default avatarHillf Danton <hillf.zj@alibaba-inc.com>
      Acked-by: default avatarDavid Rientjes <rientjes@google.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarLuis Henriques <luis.henriques@canonical.com>
      41d25989
    • Michal Hocko's avatar
      mm, vmstat: allow WQ concurrency to discover memory reclaim doesn't make any progress · f98ab7a1
      Michal Hocko authored
      commit 373ccbe5 upstream.
      
      Tetsuo Handa has reported that the system might basically livelock in
      OOM condition without triggering the OOM killer.
      
      The issue is caused by internal dependency of the direct reclaim on
      vmstat counter updates (via zone_reclaimable) which are performed from
      the workqueue context.  If all the current workers get assigned to an
      allocation request, though, they will be looping inside the allocator
      trying to reclaim memory but zone_reclaimable can see stalled numbers so
      it will consider a zone reclaimable even though it has been scanned way
      too much.  WQ concurrency logic will not consider this situation as a
      congested workqueue because it relies that worker would have to sleep in
      such a situation.  This also means that it doesn't try to spawn new
      workers or invoke the rescuer thread if the one is assigned to the
      queue.
      
      In order to fix this issue we need to do two things.  First we have to
      let wq concurrency code know that we are in trouble so we have to do a
      short sleep.  In order to prevent from issues handled by 0e093d99
      ("writeback: do not sleep on the congestion queue if there are no
      congested BDIs or if significant congestion is not being encountered in
      the current zone") we limit the sleep only to worker threads which are
      the ones of the interest anyway.
      
      The second thing to do is to create a dedicated workqueue for vmstat and
      mark it WQ_MEM_RECLAIM to note it participates in the reclaim and to
      have a spare worker thread for it.
      Signed-off-by: default avatarMichal Hocko <mhocko@suse.com>
      Reported-by: default avatarTetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Cristopher Lameter <clameter@sgi.com>
      Cc: Joonsoo Kim <js1304@gmail.com>
      Cc: Arkadiusz Miskiewicz <arekm@maven.pl>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Cc: Ben Hutchings <ben@decadent.org.uk>
      [ luis: backported to 3.16, based on Ben's backport to 3.2:
        - use queue_delayed_work instead of queue_delayed_work_on in function
          vmstat_update()
        - change start_cpu_timer() instead of vmstat_shepherd()
        - adjusted context ]
      Signed-off-by: default avatarLuis Henriques <luis.henriques@canonical.com>
      f98ab7a1
  2. 18 Jan, 2016 13 commits
    • Naoya Horiguchi's avatar
      mm: hugetlb: fix hugepage memory leak caused by wrong reserve count · 60d786c5
      Naoya Horiguchi authored
      commit a88c7695 upstream.
      
      When dequeue_huge_page_vma() in alloc_huge_page() fails, we fall back on
      alloc_buddy_huge_page() to directly create a hugepage from the buddy
      allocator.
      
      In that case, however, if alloc_buddy_huge_page() succeeds we don't
      decrement h->resv_huge_pages, which means that successful
      hugetlb_fault() returns without releasing the reserve count.  As a
      result, subsequent hugetlb_fault() might fail despite that there are
      still free hugepages.
      
      This patch simply adds decrementing code on that code path.
      
      I reproduced this problem when testing v4.3 kernel in the following situation:
       - the test machine/VM is a NUMA system,
       - hugepage overcommiting is enabled,
       - most of hugepages are allocated and there's only one free hugepage
         which is on node 0 (for example),
       - another program, which calls set_mempolicy(MPOL_BIND) to bind itself to
         node 1, tries to allocate a hugepage,
       - the allocation should fail but the reserve count is still hold.
      Signed-off-by: default avatarNaoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Hillf Danton <hillf.zj@alibaba-inc.com>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      [ luis: backported to 3.16:
        - use 'chg' instead of 'gbl_chg'
        - adjusted context ]
      Signed-off-by: default avatarLuis Henriques <luis.henriques@canonical.com>
      60d786c5
    • Mikulas Patocka's avatar
      parisc iommu: fix panic due to trying to allocate too large region · 03dd7ad3
      Mikulas Patocka authored
      commit e46e31a3 upstream.
      
      When using the Promise TX2+ SATA controller on PA-RISC, the system often
      crashes with kernel panic, for example just writing data with the dd
      utility will make it crash.
      
      Kernel panic - not syncing: drivers/parisc/sba_iommu.c: I/O MMU @ 000000000000a000 is out of mapping resources
      
      CPU: 0 PID: 18442 Comm: mkspadfs Not tainted 4.4.0-rc2 #2
      Backtrace:
       [<000000004021497c>] show_stack+0x14/0x20
       [<0000000040410bf0>] dump_stack+0x88/0x100
       [<000000004023978c>] panic+0x124/0x360
       [<0000000040452c18>] sba_alloc_range+0x698/0x6a0
       [<0000000040453150>] sba_map_sg+0x260/0x5b8
       [<000000000c18dbb4>] ata_qc_issue+0x264/0x4a8 [libata]
       [<000000000c19535c>] ata_scsi_translate+0xe4/0x220 [libata]
       [<000000000c19a93c>] ata_scsi_queuecmd+0xbc/0x320 [libata]
       [<0000000040499bbc>] scsi_dispatch_cmd+0xfc/0x130
       [<000000004049da34>] scsi_request_fn+0x6e4/0x970
       [<00000000403e95a8>] __blk_run_queue+0x40/0x60
       [<00000000403e9d8c>] blk_run_queue+0x3c/0x68
       [<000000004049a534>] scsi_run_queue+0x2a4/0x360
       [<000000004049be68>] scsi_end_request+0x1a8/0x238
       [<000000004049de84>] scsi_io_completion+0xfc/0x688
       [<0000000040493c74>] scsi_finish_command+0x17c/0x1d0
      
      The cause of the crash is not exhaustion of the IOMMU space, there is
      plenty of free pages. The function sba_alloc_range is called with size
      0x11000, thus the pages_needed variable is 0x11. The function
      sba_search_bitmap is called with bits_wanted 0x11 and boundary size is
      0x10 (because dma_get_seg_boundary(dev) returns 0xffff).
      
      The function sba_search_bitmap attempts to allocate 17 pages that must not
      cross 16-page boundary - it can't satisfy this requirement
      (iommu_is_span_boundary always returns true) and fails even if there are
      many free entries in the IOMMU space.
      
      How did it happen that we try to allocate 17 pages that don't cross
      16-page boundary? The cause is in the function iommu_coalesce_chunks. This
      function tries to coalesce adjacent entries in the scatterlist. The
      function does several checks if it may coalesce one entry with the next,
      one of those checks is this:
      
      	if (startsg->length + dma_len > max_seg_size)
      		break;
      
      When it finishes coalescing adjacent entries, it allocates the mapping:
      
      sg_dma_len(contig_sg) = dma_len;
      dma_len = ALIGN(dma_len + dma_offset, IOVP_SIZE);
      sg_dma_address(contig_sg) =
      	PIDE_FLAG
      	| (iommu_alloc_range(ioc, dev, dma_len) << IOVP_SHIFT)
      	| dma_offset;
      
      It is possible that (startsg->length + dma_len > max_seg_size) is false
      (we are just near the 0x10000 max_seg_size boundary), so the funcion
      decides to coalesce this entry with the next entry. When the coalescing
      succeeds, the function performs
      	dma_len = ALIGN(dma_len + dma_offset, IOVP_SIZE);
      And now, because of non-zero dma_offset, dma_len is greater than 0x10000.
      iommu_alloc_range (a pointer to sba_alloc_range) is called and it attempts
      to allocate 17 pages for a device that must not cross 16-page boundary.
      
      To fix the bug, we must make sure that dma_len after addition of
      dma_offset and alignment doesn't cross the segment boundary. I.e. change
      	if (startsg->length + dma_len > max_seg_size)
      		break;
      to
      	if (ALIGN(dma_len + dma_offset + startsg->length, IOVP_SIZE) > max_seg_size)
      		break;
      
      This patch makes this change (it precalculates max_seg_boundary at the
      beginning of the function iommu_coalesce_chunks). I also added a check
      that the mapping length doesn't exceed dma_get_seg_boundary(dev) (it is
      not needed for Promise TX2+ SATA, but it may be needed for other devices
      that have dma_get_seg_boundary lower than dma_get_max_seg_size).
      Signed-off-by: default avatarMikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: default avatarHelge Deller <deller@gmx.de>
      Signed-off-by: default avatarLuis Henriques <luis.henriques@canonical.com>
      03dd7ad3
    • Alan Stern's avatar
      USB: add quirk for devices with broken LPM · 23b42ce5
      Alan Stern authored
      commit ad87e032 upstream.
      
      Some USB device / host controller combinations seem to have problems
      with Link Power Management.  For example, Steinar found that his xHCI
      controller wouldn't handle bandwidth calculations correctly for two
      video cards simultaneously when LPM was enabled, even though the bus
      had plenty of bandwidth available.
      
      This patch introduces a new quirk flag for devices that should remain
      disabled for LPM, and creates quirk entries for Steinar's devices.
      Signed-off-by: default avatarAlan Stern <stern@rowland.harvard.edu>
      Reported-by: default avatarSteinar H. Gunderson <sgunderson@bigfoot.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      [ luis: backported to 3.16: adjusted context ]
      Signed-off-by: default avatarLuis Henriques <luis.henriques@canonical.com>
      23b42ce5
    • Mathias Nyman's avatar
      xhci: fix usb2 resume timing and races. · 30f8b511
      Mathias Nyman authored
      commit f69115fd upstream.
      
      According to USB 2 specs ports need to signal resume for at least 20ms,
      in practice even longer, before moving to U0 state.
      Both host and devices can initiate resume.
      
      On device initiated resume, a port status interrupt with the port in resume
      state in issued. The interrupt handler tags a resume_done[port]
      timestamp with current time + USB_RESUME_TIMEOUT, and kick roothub timer.
      Root hub timer requests for port status, finds the port in resume state,
      checks if resume_done[port] timestamp passed, and set port to U0 state.
      
      On host initiated resume, current code sets the port to resume state,
      sleep 20ms, and finally sets the port to U0 state. This should also
      be changed to work in a similar way as the device initiated resume, with
      timestamp tagging, but that is not yet tested and will be a separate
      fix later.
      
      There are a few issues with this approach
      
      1. A host initiated resume will also generate a resume event. The event
         handler will find the port in resume state, believe it's a device
         initiated resume, and act accordingly.
      
      2. A port status request might cut the resume signalling short if a
         get_port_status request is handled during the host resume signalling.
         The port will be found in resume state. The timestamp is not set leading
         to time_after_eq(jiffies, timestamp) returning true, as timestamp = 0.
         get_port_status will proceed with moving the port to U0.
      
      3. If an error, or anything else happens to the port during device
         initiated resume signalling it will leave all the device resume
         parameters hanging uncleared, preventing further suspend, returning
         -EBUSY, and cause the pm thread to busyloop trying to enter suspend.
      
      Fix this by using the existing resuming_ports bitfield to indicate that
      resume signalling timing is taken care of.
      Check if the resume_done[port] is set before using it for timestamp
      comparison, and also clear out any resume signalling related variables
      if port is not in U0 or Resume state
      
      This issue was discovered when a PM thread busylooped, trying to runtime
      suspend the xhci USB 2 roothub on a Dell XPS
      Reported-by: default avatarDaniel J Blueman <daniel@quora.org>
      Tested-by: default avatarDaniel J Blueman <daniel@quora.org>
      Signed-off-by: default avatarMathias Nyman <mathias.nyman@linux.intel.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: default avatarLuis Henriques <luis.henriques@canonical.com>
      30f8b511
    • Kirill A. Shutemov's avatar
      vgaarb: fix signal handling in vga_get() · ccebddfb
      Kirill A. Shutemov authored
      commit 9f5bd308 upstream.
      
      There are few defects in vga_get() related to signal hadning:
      
        - we shouldn't check for pending signals for TASK_UNINTERRUPTIBLE
          case;
      
        - if we found pending signal we must remove ourself from wait queue
          and change task state back to running;
      
        - -ERESTARTSYS is more appropriate, I guess.
      Signed-off-by: default avatarKirill A. Shutemov <kirill@shutemov.name>
      Reviewed-by: default avatarDavid Herrmann <dh.herrmann@gmail.com>
      Signed-off-by: default avatarDave Airlie <airlied@redhat.com>
      Signed-off-by: default avatarLuis Henriques <luis.henriques@canonical.com>
      ccebddfb
    • Joe Thornber's avatar
      dm btree: fix bufio buffer leaks in dm_btree_del() error path · a94064d4
      Joe Thornber authored
      commit ed8b45a3 upstream.
      
      If dm_btree_del()'s call to push_frame() fails, e.g. due to
      btree_node_validator finding invalid metadata, the dm_btree_del() error
      path must unlock all frames (which have active dm-bufio buffers) that
      were pushed onto the del_stack.
      
      Otherwise, dm_bufio_client_destroy() will BUG_ON() because dm-bufio
      buffers have leaked, e.g.:
        device-mapper: bufio: leaked buffer 3, hold count 1, list 0
      Signed-off-by: default avatarJoe Thornber <ejt@redhat.com>
      Signed-off-by: default avatarMike Snitzer <snitzer@redhat.com>
      Signed-off-by: default avatarLuis Henriques <luis.henriques@canonical.com>
      a94064d4
    • Jan Stancek's avatar
      ipmi: move timer init to before irq is setup · 35f6d738
      Jan Stancek authored
      commit 27f972d3 upstream.
      
      We encountered a panic on boot in ipmi_si on a dell per320 due to an
      uninitialized timer as follows.
      
      static int smi_start_processing(void       *send_info,
                                      ipmi_smi_t intf)
      {
              /* Try to claim any interrupts. */
              if (new_smi->irq_setup)
                      new_smi->irq_setup(new_smi);
      
       --> IRQ arrives here and irq handler tries to modify uninitialized timer
      
          which triggers BUG_ON(!timer->function) in __mod_timer().
      
       Call Trace:
         <IRQ>
         [<ffffffffa0532617>] start_new_msg+0x47/0x80 [ipmi_si]
         [<ffffffffa053269e>] start_check_enables+0x4e/0x60 [ipmi_si]
         [<ffffffffa0532bd8>] smi_event_handler+0x1e8/0x640 [ipmi_si]
         [<ffffffff810f5584>] ? __rcu_process_callbacks+0x54/0x350
         [<ffffffffa053327c>] si_irq_handler+0x3c/0x60 [ipmi_si]
         [<ffffffff810efaf0>] handle_IRQ_event+0x60/0x170
         [<ffffffff810f245e>] handle_edge_irq+0xde/0x180
         [<ffffffff8100fc59>] handle_irq+0x49/0xa0
         [<ffffffff8154643c>] do_IRQ+0x6c/0xf0
         [<ffffffff8100ba53>] ret_from_intr+0x0/0x11
      
              /* Set up the timer that drives the interface. */
              setup_timer(&new_smi->si_timer, smi_timeout, (long)new_smi);
      
      The following patch fixes the problem.
      
      To: Openipmi-developer@lists.sourceforge.net
      To: Corey Minyard <minyard@acm.org>
      CC: linux-kernel@vger.kernel.org
      Signed-off-by: default avatarJan Stancek <jstancek@redhat.com>
      Signed-off-by: default avatarTony Camuso <tcamuso@redhat.com>
      Signed-off-by: default avatarCorey Minyard <cminyard@mvista.com>
      Signed-off-by: default avatarLuis Henriques <luis.henriques@canonical.com>
      35f6d738
    • Joe Thornber's avatar
      dm space map metadata: fix ref counting bug when bootstrapping a new space map · b934c4ec
      Joe Thornber authored
      commit 50dd842a upstream.
      
      When applying block operations (BOPs) do not remove them from the
      uncommitted BOP ring-buffer until after they've been applied -- in case
      we recurse.
      
      Also, perform BOP_INC operation, in dm_sm_metadata_create() and
      sm_metadata_extend(), in terms of the uncommitted BOP ring-buffer rather
      than using direct calls to sm_ll_inc().
      Signed-off-by: default avatarJoe Thornber <ejt@redhat.com>
      Signed-off-by: default avatarMike Snitzer <snitzer@redhat.com>
      Signed-off-by: default avatarLuis Henriques <luis.henriques@canonical.com>
      b934c4ec
    • Joe Thornber's avatar
      dm thin metadata: fix bug when taking a metadata snapshot · b8e3468d
      Joe Thornber authored
      commit 49e99fc7 upstream.
      
      When you take a metadata snapshot the btree roots for the mapping and
      details tree need to have their reference counts incremented so they
      persist for the lifetime of the metadata snap.
      
      The roots being incremented were those currently written in the
      superblock, which could possibly be out of date if concurrent IO is
      triggering new mappings, breaking of sharing, etc.
      
      Fix this by performing a commit with the metadata lock held while taking
      a metadata snapshot.
      Signed-off-by: default avatarJoe Thornber <ejt@redhat.com>
      Signed-off-by: default avatarMike Snitzer <snitzer@redhat.com>
      Signed-off-by: default avatarLuis Henriques <luis.henriques@canonical.com>
      b8e3468d
    • Takashi Iwai's avatar
      ALSA: hda - Fix noise problems on Thinkpad T440s · b7caa158
      Takashi Iwai authored
      commit 9a811230 upstream.
      
      Lenovo Thinkpad T440s suffers from constant background noises, and it
      seems to be a generic hardware issue on this model:
        https://forums.lenovo.com/t5/ThinkPad-T400-T500-and-newer-T/T440s-speaker-noise/td-p/1339883
      
      As the noise comes from the analog loopback path, disabling the path
      is the easy workaround.
      
      Also, the machine gives significant cracking noises at PM suspend.  A
      workaround found by trial-and-error is to disable the shutup callback
      currently used for ALC269-variant.
      
      This patch addresses these noise issues by introducing a new fixup
      chain.  Although the same workaround might be applicable to other
      Thinkpad models, it's applied only to T440s (17aa:220c) in this patch,
      so far, just to be safe (you chicken!).  As a compromise, a new model
      option string "tp440" is provided now, though, so that owners of other
      Thinkpad models can test it more easily.
      
      Bugzilla: https://bugzilla.opensuse.org/show_bug.cgi?id=958504Reported-and-tested-by: default avatarTim Hardeck <thardeck@suse.de>
      Signed-off-by: default avatarTakashi Iwai <tiwai@suse.de>
      [ luis: backported to 3.16: adjusted context ]
      Signed-off-by: default avatarLuis Henriques <luis.henriques@canonical.com>
      b7caa158
    • Oded Gabbay's avatar
      radeon: Fix VCE IB test on Big-Endian systems · 4b328046
      Oded Gabbay authored
      commit 361c32d3 upstream.
      
      This patch makes the VCE IB test pass on Big-Endian systems. It converts
      to little-endian the contents of the VCE message.
      Reviewed-by: default avatarChristian König <christian.koenig@amd.com>
      Signed-off-by: default avatarOded Gabbay <oded.gabbay@gmail.com>
      Signed-off-by: default avatarAlex Deucher <alexander.deucher@amd.com>
      [ luis: backported to 3.16: adjusted context ]
      Signed-off-by: default avatarLuis Henriques <luis.henriques@canonical.com>
      4b328046
    • Oded Gabbay's avatar
      radeon: Fix VCE ring test for Big-Endian systems · ec3c6a0f
      Oded Gabbay authored
      commit 687f4b98 upstream.
      
      This patch fixes the VCE ring test when running on Big-Endian machines.
      Every write to the ring needs to be translated to little-endian.
      Reviewed-by: default avatarChristian König <christian.koenig@amd.com>
      Signed-off-by: default avatarOded Gabbay <oded.gabbay@gmail.com>
      Signed-off-by: default avatarAlex Deucher <alexander.deucher@amd.com>
      [ luis: backported to 3.16: adjusted context ]
      Signed-off-by: default avatarLuis Henriques <luis.henriques@canonical.com>
      ec3c6a0f
    • Oded Gabbay's avatar
      radeon/cik: Fix GFX IB test on Big-Endian · 1fe8b129
      Oded Gabbay authored
      commit 5f3e226f upstream.
      
      This patch makes the IB test on the GFX ring pass for CI-based cards
      installed in Big-Endian machines.
      Reviewed-by: default avatarChristian König <christian.koenig@amd.com>
      Signed-off-by: default avatarOded Gabbay <oded.gabbay@gmail.com>
      Signed-off-by: default avatarAlex Deucher <alexander.deucher@amd.com>
      Signed-off-by: default avatarLuis Henriques <luis.henriques@canonical.com>
      1fe8b129
  3. 11 Jan, 2016 25 commits