1. 20 Aug, 2015 1 commit
    • Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/evalenti/linux-soc-thermal · 3bd8f7d8
      Linus Torvalds authored
      Pull thermal fixes from Eduardo Valentin:
       "Last minute fixes on the thermal-soc tree.  There is a fix of a long
        lasting bug in cpu cooling device, thanks to RMK for pushing
        this"
      
      * 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/evalenti/linux-soc-thermal:
        thermal/cpu_cooling: update policy limits if clipped_freq < policy->max
        thermal/cpu_cooling: rename max_freq as clipped_freq in notifier
        thermal/cpu_cooling: rename cpufreq_val as clipped_freq
        thermal/cpu_cooling: convert 'switch' block to 'if' block in notifier
        thermal/cpu_cooling: quit early after updating policy
        thermal/cpu_cooling: No need to initialize max_freq to 0
        thermal: cpu_cooling: fix lockdep problems in cpu_cooling
        thermal: power_allocator: do not use devm* interfaces
  2. 18 Aug, 2015 2 commits
  3. 17 Aug, 2015 5 commits
  4. 16 Aug, 2015 8 commits
  5. 15 Aug, 2015 12 commits
    • Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi · 1efdb5f0
      Linus Torvalds authored
      Pull SCSI fixes from James Bottomley:
       "This has two libfc fixes for bugs causing rare crashes, one iscsi fix
        for a potential hang on shutdown, and a fix for an I/O blocksize issue
        which caused a regression"
      
      * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
        sd: Fix maximum I/O size for BLOCK_PC requests
        libfc: Fix fc_fcp_cleanup_each_cmd()
        libfc: Fix fc_exch_recv_req() error path
        libiscsi: Fix host busy blocking during connection teardown
    • Merge tag 'topic/drm-fixes-2015-08-14' of git://anongit.freedesktop.org/drm-intel into drm-next · 7945dc58
      Dave Airlie authored
      Single MST fix from Maarten.
      
      * tag 'topic/drm-fixes-2015-08-14' of git://anongit.freedesktop.org/drm-intel:
        drm/dp/mst: Remove port after removing connector.
    • Merge tag 'drm-intel-fixes-2015-08-14' of git://anongit.freedesktop.org/drm-intel into drm-next · 3acceca9
      Dave Airlie authored
      Three display fixes for Intel.
      
      * tag 'drm-intel-fixes-2015-08-14' of git://anongit.freedesktop.org/drm-intel:
        drm/i915: Commit planes on each crtc separately.
        drm/i915: calculate primary visibility changes instead of calling from set_config
        drm/i915: Only dither on 6bpc panels
    • thermal/cpu_cooling: update policy limits if clipped_freq < policy->max · 1afb9c53
      Viresh Kumar authored
      policy->max is the maximum allowed frequency defined by user and
      clipped_freq is the maximum that thermal constraints allow.
      
      If clipped_freq is lower than policy->max, then we need to readjust
      policy->max.
      
      But, if clipped_freq is greater than policy->max, we don't need to do
      anything. We used to call cpufreq_verify_within_limits() in this case,
      but it doesn't change anything in this case.
      
      Let's skip this unnecessary call and add a comment that explains why.
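
      The rule described above can be modeled outside the kernel as a
      one-sided clamp.  This is only a userspace sketch, not the driver
      code: the function name effective_max_freq() is hypothetical, and
      both values are assumed to use the same unit (cpufreq normally
      works in kHz).

```c
/*
 * Userspace model of the rule described above (not the kernel code).
 *
 * policy_max:   maximum frequency allowed by the user.
 * clipped_freq: maximum frequency allowed by thermal constraints.
 *
 * Returns the limit that should end up in policy->max.
 */
unsigned int effective_max_freq(unsigned int policy_max,
                                unsigned int clipped_freq)
{
        /* Thermal limit is lower: the policy limit must be readjusted. */
        if (policy_max > clipped_freq)
                return clipped_freq;

        /* Thermal limit is not lower: leave the user's limit alone. */
        return policy_max;
}
```

      The second branch is the case the commit stops calling
      cpufreq_verify_within_limits() for, since the result would be
      unchanged.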
      Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
      Signed-off-by: Eduardo Valentin <edubezval@gmail.com>
    • thermal/cpu_cooling: rename max_freq as clipped_freq in notifier · abcbcc25
      Viresh Kumar authored
      That's what it is for, let's name it properly.
      Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
      Signed-off-by: Eduardo Valentin <edubezval@gmail.com>
    • thermal/cpu_cooling: rename cpufreq_val as clipped_freq · 59f0d218
      Viresh Kumar authored
      That's what it is for, let's name it properly.
      Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
      Signed-off-by: Eduardo Valentin <edubezval@gmail.com>
    • thermal/cpu_cooling: convert 'switch' block to 'if' block in notifier · a24af233
      Viresh Kumar authored
      We just need to take care of a single event here and there is no need to
      increase the indentation level of most of the code (which causes lines
      longer than 80 columns to break).
      
      Kill the switch block.
      Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
      Signed-off-by: Eduardo Valentin <edubezval@gmail.com>
    • thermal/cpu_cooling: quit early after updating policy · 166529c9
      Viresh Kumar authored
      If a valid cpufreq_dev is found for policy->cpu, we should update the
      policy and quit the for loop. There is no need to keep traversing the
      list of cpufreq_dev's.
      Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
      Signed-off-by: Eduardo Valentin <edubezval@gmail.com>
    • thermal/cpu_cooling: No need to initialize max_freq to 0 · 76fd38ce
      Viresh Kumar authored
      It's always set before being used, so don't initialize it.
      Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
      Signed-off-by: Eduardo Valentin <edubezval@gmail.com>
    • thermal: cpu_cooling: fix lockdep problems in cpu_cooling · 02373d7c
      Russell King authored
      A recent change to the cpu_cooling code introduced an AB-BA deadlock
      scenario between the cpufreq_policy_notifier_list rwsem and the
      cooling_cpufreq_lock.  This is caused by cooling_cpufreq_lock being held
      before the registration/removal of the notifier block (an operation
      which takes the rwsem), and the notifier code itself which takes the
      locks in the reverse order:
      
      ======================================================
      [ INFO: possible circular locking dependency detected ]
      3.18.0+ #1453 Not tainted
      -------------------------------------------------------
      rc.local/770 is trying to acquire lock:
       (cooling_cpufreq_lock){+.+.+.}, at: [<c04abfc4>] cpufreq_thermal_notifier+0x34/0xfc
      
      but task is already holding lock:
       ((cpufreq_policy_notifier_list).rwsem){++++.+}, at: [<c0042f04>]  __blocking_notifier_call_chain+0x34/0x68
      
      which lock already depends on the new lock.
      
      the existing dependency chain (in reverse order) is:
      
      -> #1 ((cpufreq_policy_notifier_list).rwsem){++++.+}:
             [<c06bc3b0>] down_write+0x44/0x9c
             [<c0043444>] blocking_notifier_chain_register+0x28/0xd8
             [<c04ad610>] cpufreq_register_notifier+0x68/0x90
             [<c04abe4c>] __cpufreq_cooling_register.part.1+0x120/0x180
             [<c04abf44>] __cpufreq_cooling_register+0x98/0xa4
             [<c04abf8c>] cpufreq_cooling_register+0x18/0x1c
             [<bf0046f8>] imx_thermal_probe+0x1c0/0x470 [imx_thermal]
             [<c037cef8>] platform_drv_probe+0x50/0xac
             [<c037b710>] driver_probe_device+0x114/0x234
             [<c037b8cc>] __driver_attach+0x9c/0xa0
             [<c0379d68>] bus_for_each_dev+0x5c/0x90
             [<c037b204>] driver_attach+0x24/0x28
             [<c037ae7c>] bus_add_driver+0xe0/0x1d8
             [<c037c0cc>] driver_register+0x80/0xfc
             [<c037cd80>] __platform_driver_register+0x50/0x64
             [<bf007018>] 0xbf007018
             [<c0008a5c>] do_one_initcall+0x88/0x1d8
             [<c0095da4>] load_module+0x1768/0x1ef8
             [<c0096614>] SyS_init_module+0xe0/0xf4
             [<c000ec00>] ret_fast_syscall+0x0/0x48
      
      -> #0 (cooling_cpufreq_lock){+.+.+.}:
             [<c00619f8>] lock_acquire+0xb0/0x124
             [<c06ba3b4>] mutex_lock_nested+0x5c/0x3d8
             [<c04abfc4>] cpufreq_thermal_notifier+0x34/0xfc
             [<c0042bf4>] notifier_call_chain+0x4c/0x8c
             [<c0042f20>] __blocking_notifier_call_chain+0x50/0x68
             [<c0042f58>] blocking_notifier_call_chain+0x20/0x28
             [<c04ae62c>] cpufreq_set_policy+0x7c/0x1d0
             [<c04af3cc>] store_scaling_governor+0x74/0x9c
             [<c04ad418>] store+0x90/0xc0
             [<c0175384>] sysfs_kf_write+0x54/0x58
             [<c01746b4>] kernfs_fop_write+0xdc/0x190
             [<c010dcc0>] vfs_write+0xac/0x1b4
             [<c010dfec>] SyS_write+0x44/0x90
             [<c000ec00>] ret_fast_syscall+0x0/0x48
      
      other info that might help us debug this:
      
       Possible unsafe locking scenario:
      
             CPU0                    CPU1
             ----                    ----
        lock((cpufreq_policy_notifier_list).rwsem);
                                     lock(cooling_cpufreq_lock);
                                     lock((cpufreq_policy_notifier_list).rwsem);
        lock(cooling_cpufreq_lock);
      
       *** DEADLOCK ***
      
      7 locks held by rc.local/770:
       #0:  (sb_writers#6){.+.+.+}, at: [<c010dda0>] vfs_write+0x18c/0x1b4
       #1:  (&of->mutex){+.+.+.}, at: [<c0174678>] kernfs_fop_write+0xa0/0x190
       #2:  (s_active#52){.+.+.+}, at: [<c0174680>] kernfs_fop_write+0xa8/0x190
       #3:  (cpu_hotplug.lock){++++++}, at: [<c0026a60>] get_online_cpus+0x34/0x90
       #4:  (cpufreq_rwsem){.+.+.+}, at: [<c04ad3e0>] store+0x58/0xc0
       #5:  (&policy->rwsem){+.+.+.}, at: [<c04ad3f8>] store+0x70/0xc0
       #6:  ((cpufreq_policy_notifier_list).rwsem){++++.+}, at: [<c0042f04>] __blocking_notifier_call_chain+0x34/0x68
      
      stack backtrace:
      CPU: 0 PID: 770 Comm: rc.local Not tainted 3.18.0+ #1453
      Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree)
      Backtrace:
      [<c00121c8>] (dump_backtrace) from [<c0012360>] (show_stack+0x18/0x1c)
       r6:c0b85a80 r5:c0b75630 r4:00000000 r3:00000000
      [<c0012348>] (show_stack) from [<c06b6c48>] (dump_stack+0x7c/0x98)
      [<c06b6bcc>] (dump_stack) from [<c06b42a4>] (print_circular_bug+0x28c/0x2d8)
       r4:c0b85a80 r3:d0071d40
      [<c06b4018>] (print_circular_bug) from [<c00613b0>] (__lock_acquire+0x1acc/0x1bb0)
       r10:c0b50660 r8:c09e6d80 r7:d0071d40 r6:c11d0f0c r5:00000007 r4:d0072240
      [<c005f8e4>] (__lock_acquire) from [<c00619f8>] (lock_acquire+0xb0/0x124)
       r10:00000000 r9:c04abfc4 r8:00000000 r7:00000000 r6:00000000 r5:c0a06f0c
       r4:00000000
      [<c0061948>] (lock_acquire) from [<c06ba3b4>] (mutex_lock_nested+0x5c/0x3d8)
       r10:ec853800 r9:c0a06ed4 r8:d0071d40 r7:c0a06ed4 r6:c11d0f0c r5:00000000
       r4:c04abfc4
      [<c06ba358>] (mutex_lock_nested) from [<c04abfc4>] (cpufreq_thermal_notifier+0x34/0xfc)
       r10:ec853800 r9:ec85380c r8:d00d7d3c r7:c0a06ed4 r6:d00d7d3c r5:00000000
       r4:fffffffe
      [<c04abf90>] (cpufreq_thermal_notifier) from [<c0042bf4>] (notifier_call_chain+0x4c/0x8c)
       r7:00000000 r6:00000000 r5:00000000 r4:fffffffe
      [<c0042ba8>] (notifier_call_chain) from [<c0042f20>] (__blocking_notifier_call_chain+0x50/0x68)
       r8:c0a072a4 r7:00000000 r6:d00d7d3c r5:ffffffff r4:c0a06fc8 r3:ffffffff
      [<c0042ed0>] (__blocking_notifier_call_chain) from [<c0042f58>] (blocking_notifier_call_chain+0x20/0x28)
       r7:ec98b540 r6:c13ebc80 r5:ed76e600 r4:d00d7d3c
      [<c0042f38>] (blocking_notifier_call_chain) from [<c04ae62c>] (cpufreq_set_policy+0x7c/0x1d0)
      [<c04ae5b0>] (cpufreq_set_policy) from [<c04af3cc>] (store_scaling_governor+0x74/0x9c)
       r7:ec98b540 r6:0000000c r5:ec98b540 r4:ed76e600
      [<c04af358>] (store_scaling_governor) from [<c04ad418>] (store+0x90/0xc0)
       r6:0000000c r5:ed76e6d4 r4:ed76e600
      [<c04ad388>] (store) from [<c0175384>] (sysfs_kf_write+0x54/0x58)
       r8:0000000c r7:d00d7f78 r6:ec98b540 r5:0000000c r4:ec853800 r3:0000000c
      [<c0175330>] (sysfs_kf_write) from [<c01746b4>] (kernfs_fop_write+0xdc/0x190)
       r6:ec98b540 r5:00000000 r4:00000000 r3:c0175330
      [<c01745d8>] (kernfs_fop_write) from [<c010dcc0>] (vfs_write+0xac/0x1b4)
       r10:0162aa70 r9:d00d6000 r8:0000000c r7:d00d7f78 r6:0162aa70 r5:0000000c
       r4:eccde500
      [<c010dc14>] (vfs_write) from [<c010dfec>] (SyS_write+0x44/0x90)
       r10:0162aa70 r8:0000000c r7:eccde500 r6:eccde500 r5:00000000 r4:00000000
      [<c010dfa8>] (SyS_write) from [<c000ec00>] (ret_fast_syscall+0x0/0x48)
       r10:00000000 r8:c000edc4 r7:00000004 r6:000216cc r5:0000000c r4:0162aa70
      
      Solve this by moving to finer grained locking - use one mutex to protect
      the cpufreq_dev_list as a whole, and a separate lock to ensure correct
      ordering of cpufreq notifier registration and removal.
      
      cooling_list_lock is taken within cooling_cpufreq_lock on
      (un)registration to preserve the behavior of the code, i.e. to
      atomically add/remove to the list and (un)register the notifier.
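
      The lock split described above can be illustrated with a small
      userspace model.  This is a sketch under stated assumptions, not
      the kernel patch: pthread mutexes stand in for the kernel mutexes,
      a counter stands in for cpufreq_dev_list, a flag stands in for
      notifier registration, and the model_*() function names are
      hypothetical.

```c
#include <pthread.h>

/* Userspace stand-ins for the two kernel mutexes named in the text. */
static pthread_mutex_t cooling_cpufreq_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t cooling_list_lock = PTHREAD_MUTEX_INITIALIZER;

static int dev_count;           /* models the length of cpufreq_dev_list */
static int notifier_registered; /* models cpufreq notifier registration */

/*
 * Registration path: the outer lock orders notifier (un)registration,
 * the inner lock protects only the list.  Returns the new device count.
 */
int model_register(void)
{
        int count;

        pthread_mutex_lock(&cooling_cpufreq_lock);

        pthread_mutex_lock(&cooling_list_lock);
        count = ++dev_count;
        pthread_mutex_unlock(&cooling_list_lock);

        /*
         * The list lock is already dropped here.  In the kernel this is
         * the step that would take the notifier rwsem.
         */
        if (count == 1)
                notifier_registered = 1;

        pthread_mutex_unlock(&cooling_cpufreq_lock);
        return count;
}

/*
 * Notifier path: in the kernel it runs with the notifier rwsem held,
 * so it takes only the list lock.  Returns the device count it saw.
 */
int model_notifier(void)
{
        int seen;

        pthread_mutex_lock(&cooling_list_lock);
        seen = dev_count;
        pthread_mutex_unlock(&cooling_list_lock);
        return seen;
}

/* Reports whether the modeled notifier has been registered. */
int model_notifier_registered(void)
{
        return notifier_registered;
}
```

      In this model the notifier path never touches
      cooling_cpufreq_lock, so the only nesting is cooling_cpufreq_lock
      outside cooling_list_lock, and the reverse order from the lockdep
      report cannot occur.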
      
      Fixes: 2dcd851f ("thermal: cpu_cooling: Update always cpufreq policy with
      Reviewed-by: Viresh Kumar <viresh.kumar@linaro.org>
      Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
      Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
      Signed-off-by: Eduardo Valentin <edubezval@gmail.com>
    • Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm · 45e38cff
      Linus Torvalds authored
      Pull KVM fixes from Paolo Bonzini:
       "Just two very small & simple patches"
      
      * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
        KVM: x86: Use adjustment in guest cycles when handling MSR_IA32_TSC_ADJUST
        KVM: x86: zero IDT limit on entry to SMM
    • Merge branch 'akpm' (patches from Andrew) · 8394a1b7
      Linus Torvalds authored
      Merge fixes from Andrew Morton:
       "11 fixes"
      
      * emailed patches from Andrew Morton <akpm@linux-foundation.org>:
        Update maintainers for DRM STI driver
        mm: cma: mark cma_bitmap_maxno() inline in header
        zram: fix pool name truncation
        memory-hotplug: fix wrong edge when hot add a new node
        .mailmap: Andrey Ryabinin has moved
        ipc/sem.c: update/correct memory barriers
        mm/hwpoison: fix panic due to split huge zero page
        ipc,sem: remove uneeded sem_undo_list lock usage in exit_sem()
        ipc,sem: fix use after free on IPC_RMID after a task using same semaphore set exits
        mm/hwpoison: fix fail isolate hugetlbfs page w/ refcount held
        mm/hwpoison: fix page refcount of unknown non LRU page
  6. 14 Aug, 2015 12 commits
    • Merge tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux · fbd9163f
      Linus Torvalds authored
      Pull clock fix from Stephen Boyd:
       "A one-liner for a regression found in the PXA clock driver"
      
      * tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux:
        clk: pxa: pxa3xx: fix CKEN register access
    • Update maintainers for DRM STI driver · 7f11c476
      Benjamin Gaignard authored
      Add Vincent Abriou and myself as maintainers.
      Signed-off-by: Benjamin Gaignard <benjamin.gaignard@linaro.org>
      Cc: Vincent Abriou <vincent.abriou@st.com>
      Cc: Dave Airlie <airlied@linux.ie>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: cma: mark cma_bitmap_maxno() inline in header · f21838e0
      Gregory Fong authored
      cma_bitmap_maxno() was marked as static and not static inline, which can
      cause warnings about this function not being used if this file is included
      in a file that does not call that function, and violates the conventions
      used elsewhere.  The two options are to move the function implementation
      back to mm/cma.c or make it inline here, and it's simple enough for the
      latter to make sense.
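
      The difference between the two qualifiers can be sketched as
      follows.  This is an illustration only: struct cma_model and
      cma_model_bitmap_maxno() are hypothetical stand-ins, and the
      shift is an assumed body rather than a quote of the kernel
      header.  With plain "static", compilers such as GCC typically
      warn about an unused function in every file that includes the
      header without calling the helper; "static inline" avoids that
      warning.

```c
/* Hypothetical stand-in for the fields the helper reads. */
struct cma_model {
        unsigned long count;        /* number of pages in the area */
        unsigned int order_per_bit; /* one bitmap bit covers 2^order pages */
};

/*
 * Header-style helper.  'static inline' lets every includer see the
 * definition without an unused-function warning when it is not called.
 */
static inline unsigned long
cma_model_bitmap_maxno(const struct cma_model *cma)
{
        return cma->count >> cma->order_per_bit;
}
```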
      Signed-off-by: Gregory Fong <gregory.0xf0@gmail.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Sasha Levin <sasha.levin@oracle.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • zram: fix pool name truncation · 4ce321f5
      Sergey Senozhatsky authored
      zram_meta_alloc() constructs a pool name for zs_create_pool() call as
      
          snprintf(pool_name, sizeof(pool_name), "zram%d", device_id);
      
      However, it defines pool name buffer to be only 8 bytes long (minus
      trailing zero), which means that we can have only 1000 pool names: zram0
      -- zram999.
      
      With CONFIG_ZSMALLOC_STAT enabled, an attempt to create a device zram1000
      can fail if device zram100 already exists, because snprintf() will
      truncate the new pool name to zram100 and pass it to debugfs_create_dir(),
      causing:
      
        debugfs dir <zram100> creation failed
        zram: Error creating memory pool
      
      ... and so on.
      
      Fix it by passing zram->disk->disk_name to zram_meta_alloc() instead of
      device_id.  We construct the zram%d name earlier and keep it as ->disk_name,
      so there is no need to snprintf() it again.
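
      The truncation is easy to reproduce in userspace.  The sketch
      below assumes an 8-byte buffer, as the description above states;
      the helper names format_pool_name() and pool_name_truncated() are
      hypothetical and not part of zram.

```c
#include <stdio.h>
#include <stddef.h>

/*
 * Formats "zram<id>" into buf.  Like snprintf(), returns the length the
 * full name needs, which may be larger than what fit into buf.
 */
int format_pool_name(char *buf, size_t size, int device_id)
{
        return snprintf(buf, size, "zram%d", device_id);
}

/* Returns 1 if an 8-byte buffer is too small for this device id. */
int pool_name_truncated(int device_id)
{
        char pool_name[8];
        int needed;

        needed = format_pool_name(pool_name, sizeof(pool_name), device_id);

        /* snprintf() truncated if the full name needs >= sizeof bytes. */
        return needed >= (int)sizeof(pool_name);
}
```

      With an 8-byte buffer, "zram999" (7 characters plus the
      terminator) still fits, while "zram1000" is cut to "zram100",
      which collides with an existing device of that name.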
      Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • memory-hotplug: fix wrong edge when hot add a new node · f9126ab9
      Xishi Qiu authored
      When we add a new node, the edge of memory may be wrong.
      
      e.g. system has 4 nodes, and node3 is movable, node3 mem:[24G-32G],
      
      1. hotremove the node3,
      2. then hotadd node3 with a part of memory, mem:[26G-30G],
      3. call hotadd_new_pgdat()
              free_area_init_node()
                      get_pfn_range_for_nid()
      4. it will return wrong start_pfn and end_pfn, because we have not
      updated the memblock.

      This patch also fixes a BUG_ON during hot-addition, please see
      http://marc.info/?l=linux-kernel&m=142961156129456&w=2
      Signed-off-by: Xishi Qiu <qiuxishi@huawei.com>
      Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Taku Izumi <izumi.taku@jp.fujitsu.com>
      Cc: Tang Chen <tangchen@cn.fujitsu.com>
      Cc: Gu Zheng <guz.fnst@cn.fujitsu.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • .mailmap: Andrey Ryabinin has moved · 2baf9e89
      Andrey Ryabinin authored
      Update my email address.
      Signed-off-by: Andrey Ryabinin <ryabinin.a.a@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • ipc/sem.c: update/correct memory barriers · 3ed1f8a9
      Manfred Spraul authored
      sem_lock() did not properly pair memory barriers:
      
      !spin_is_locked() and spin_unlock_wait() are both only control barriers.
      The code needs an acquire barrier, otherwise the cpu might perform read
      operations before the lock test.
      
      As no primitive exists inside <include/spinlock.h> and since it seems
      no one wants another primitive, the code creates a local primitive within
      ipc/sem.c.
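
      The pattern "test that the lock is free, then issue an acquire
      barrier before reading" has a close analogue in C11 atomics.  The
      sketch below is that analogue, not the kernel primitive:
      lock_held, protected_value, set_lock_held() and
      read_if_unlocked() are hypothetical names.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical flag standing in for "the lock is currently held". */
static atomic_bool lock_held;

/* Data that must only be read after the lock was observed free. */
static int protected_value;

/* Marks the modeled lock as held or free. */
void set_lock_held(bool held)
{
        atomic_store_explicit(&lock_held, held, memory_order_release);
}

/* Copies protected_value to *out only if the lock was observed free. */
bool read_if_unlocked(int *out)
{
        /* The test alone orders nothing for the reads that follow. */
        if (atomic_load_explicit(&lock_held, memory_order_relaxed))
                return false;

        /* Acquire fence: the read below cannot move before the test. */
        atomic_thread_fence(memory_order_acquire);

        *out = protected_value;
        return true;
}
```

      Without the fence, the read of protected_value could be performed
      before the lock test, which is the problem the paragraph above
      describes.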
      
      With regards to -stable:
      
      The change of sem_wait_array() is a bugfix, the change to sem_lock() is a
      nop (just a preprocessor redefinition to improve the readability).  The
      bugfix is necessary for all kernels that use sem_wait_array() (i.e.:
      starting from 3.10).
      Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
      Reported-by: Oleg Nesterov <oleg@redhat.com>
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Cc: Kirill Tkhai <ktkhai@parallels.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Cc: <stable@vger.kernel.org>	[3.10+]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/hwpoison: fix panic due to split huge zero page · 7f6bf39b
      Wanpeng Li authored
      Bug:
      
        ------------[ cut here ]------------
        kernel BUG at mm/huge_memory.c:1957!
        invalid opcode: 0000 [#1] SMP
        Modules linked in: snd_hda_codec_hdmi i915 rpcsec_gss_krb5 snd_hda_codec_realtek snd_hda_codec_generic nfsv4 dns_re
        CPU: 2 PID: 2576 Comm: test_huge Not tainted 4.2.0-rc5-mm1+ #27
        Hardware name: Dell Inc. OptiPlex 7020/0F5C5X, BIOS A03 01/08/2015
        task: ffff880204e3d600 ti: ffff8800db16c000 task.ti: ffff8800db16c000
        RIP: split_huge_page_to_list+0xdb/0x120
        Call Trace:
          memory_failure+0x32e/0x7c0
          madvise_hwpoison+0x8b/0x160
          SyS_madvise+0x40/0x240
          ? do_page_fault+0x37/0x90
          entry_SYSCALL_64_fastpath+0x12/0x71
        Code: ff f0 41 ff 4c 24 30 74 0d 31 c0 48 83 c4 08 5b 41 5c 41 5d c9 c3 4c 89 e7 e8 e2 58 fd ff 48 83 c4 08 31 c0
        RIP  split_huge_page_to_list+0xdb/0x120
         RSP <ffff8800db16fde8>
        ---[ end trace aee7ce0df8e44076 ]---
      
      Testcase:
      
          #define _GNU_SOURCE
          #include <stdlib.h>
          #include <stdio.h>
          #include <sys/mman.h>
          #include <unistd.h>
          #include <fcntl.h>
          #include <sys/types.h>
          #include <errno.h>
          #include <string.h>
      
          #define MB (1024*1024)
      
          int main(void)
          {
                  char *mem;
      
                  posix_memalign((void **)&mem, 2 * MB, 200 * MB);
      
                  madvise(mem, 200 * MB, MADV_HWPOISON);
      
                  free(mem);
      
                  return 0;
          }
      
      A huge zero page is allocated on a page fault without the
      FAULT_FLAG_WRITE flag.  get_user_pages_fast(), which is called in
      madvise_hwpoison(), will get the huge zero page if the page has not
      been allocated before.  The huge zero page is a transparent huge page;
      however, it is not an anonymous page.  memory_failure will split the
      huge zero page and trigger BUG_ON(is_huge_zero_page(page));
      
      After commit 98ed2b00 ("mm/memory-failure: give up error handling
      for non-tail-refcounted thp"), memory_failure will not catch non anon
      thp from the madvise_hwpoison path, and this bug occurs.
      
      Fix it by catching non anon thp in memory_failure in order to not split
      huge zero page in madvise_hwpoison path.
      
      After this patch:
      
        Injecting memory failure for page 0x202800 at 0x7fd8ae800000
        MCE: 0x202800: non anonymous thp
        [...]
      
      [akpm@linux-foundation.org: remove second split, per Wanpeng]
      Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
      Acked-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • ipc,sem: remove uneeded sem_undo_list lock usage in exit_sem() · a9795584
      Herton R. Krzesinski authored
      After we acquire the sma->sem_perm lock in exit_sem(), we are protected
      against a racing IPC_RMID operation.  Also at that point, we are the last
      user of sem_undo_list.  Therefore it isn't required that we acquire or use
      ulp->lock.
      Signed-off-by: Herton R. Krzesinski <herton@redhat.com>
      Acked-by: Manfred Spraul <manfred@colorfullife.com>
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Cc: Rafael Aquini <aquini@redhat.com>
      CC: Aristeu Rozanski <aris@redhat.com>
      Cc: David Jeffery <djeffery@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • ipc,sem: fix use after free on IPC_RMID after a task using same semaphore set exits · 602b8593
      Herton R. Krzesinski authored
      The current semaphore code allows a potential use after free: in
      exit_sem we may free the task's sem_undo_list while there is still
      another task looping through the same semaphore set and cleaning the
      sem_undo list at freeary function (the task called IPC_RMID for the same
      semaphore set).
      
      For example, with a test program [1] running which keeps forking a lot
      of processes (which then do a semop call with SEM_UNDO flag), and with
      the parent right after removing the semaphore set with IPC_RMID, and a
      kernel built with CONFIG_SLAB, CONFIG_SLAB_DEBUG and
      CONFIG_DEBUG_SPINLOCK, you can easily see something like the following
      in the kernel log:
      
         Slab corruption (Not tainted): kmalloc-64 start=ffff88003b45c1c0, len=64
         000: 6b 6b 6b 6b 6b 6b 6b 6b 00 6b 6b 6b 6b 6b 6b 6b  kkkkkkkk.kkkkkkk
         010: ff ff ff ff 6b 6b 6b 6b ff ff ff ff ff ff ff ff  ....kkkk........
         Prev obj: start=ffff88003b45c180, len=64
         000: 00 00 00 00 ad 4e ad de ff ff ff ff 5a 5a 5a 5a  .....N......ZZZZ
         010: ff ff ff ff ff ff ff ff c0 fb 01 37 00 88 ff ff  ...........7....
         Next obj: start=ffff88003b45c200, len=64
         000: 00 00 00 00 ad 4e ad de ff ff ff ff 5a 5a 5a 5a  .....N......ZZZZ
         010: ff ff ff ff ff ff ff ff 68 29 a7 3c 00 88 ff ff  ........h).<....
         BUG: spinlock wrong CPU on CPU#2, test/18028
         general protection fault: 0000 [#1] SMP
         Modules linked in: 8021q mrp garp stp llc nf_conntrack_ipv4 nf_defrag_ipv4 ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables binfmt_misc ppdev input_leds joydev parport_pc parport floppy serio_raw virtio_balloon virtio_rng virtio_console virtio_net iosf_mbi crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcspkr qxl ttm drm_kms_helper drm snd_hda_codec_generic i2c_piix4 snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep snd_seq snd_seq_device snd_pcm snd_timer snd soundcore crc32c_intel virtio_pci virtio_ring virtio pata_acpi ata_generic [last unloaded: speedstep_lib]
         CPU: 2 PID: 18028 Comm: test Not tainted 4.2.0-rc5+ #1
         Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.8.1-20150318_183358- 04/01/2014
         RIP: spin_dump+0x53/0xc0
         Call Trace:
           spin_bug+0x30/0x40
           do_raw_spin_unlock+0x71/0xa0
           _raw_spin_unlock+0xe/0x10
           freeary+0x82/0x2a0
           ? _raw_spin_lock+0xe/0x10
           semctl_down.clone.0+0xce/0x160
           ? __do_page_fault+0x19a/0x430
           ? __audit_syscall_entry+0xa8/0x100
           SyS_semctl+0x236/0x2c0
           ? syscall_trace_leave+0xde/0x130
           entry_SYSCALL_64_fastpath+0x12/0x71
         Code: 8b 80 88 03 00 00 48 8d 88 60 05 00 00 48 c7 c7 a0 2c a4 81 31 c0 65 8b 15 eb 40 f3 7e e8 08 31 68 00 4d 85 e4 44 8b 4b 08 74 5e <45> 8b 84 24 88 03 00 00 49 8d 8c 24 60 05 00 00 8b 53 04 48 89
         RIP  [<ffffffff810d6053>] spin_dump+0x53/0xc0
          RSP <ffff88003750fd68>
         ---[ end trace 783ebb76612867a0 ]---
         NMI watchdog: BUG: soft lockup - CPU#3 stuck for 22s! [test:18053]
         Modules linked in: 8021q mrp garp stp llc nf_conntrack_ipv4 nf_defrag_ipv4 ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables binfmt_misc ppdev input_leds joydev parport_pc parport floppy serio_raw virtio_balloon virtio_rng virtio_console virtio_net iosf_mbi crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcspkr qxl ttm drm_kms_helper drm snd_hda_codec_generic i2c_piix4 snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep snd_seq snd_seq_device snd_pcm snd_timer snd soundcore crc32c_intel virtio_pci virtio_ring virtio pata_acpi ata_generic [last unloaded: speedstep_lib]
         CPU: 3 PID: 18053 Comm: test Tainted: G      D         4.2.0-rc5+ #1
         Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.8.1-20150318_183358- 04/01/2014
         RIP: native_read_tsc+0x0/0x20
         Call Trace:
           ? delay_tsc+0x40/0x70
           __delay+0xf/0x20
           do_raw_spin_lock+0x96/0x140
           _raw_spin_lock+0xe/0x10
           sem_lock_and_putref+0x11/0x70
           SYSC_semtimedop+0x7bf/0x960
           ? handle_mm_fault+0xbf6/0x1880
           ? dequeue_task_fair+0x79/0x4a0
           ? __do_page_fault+0x19a/0x430
           ? kfree_debugcheck+0x16/0x40
           ? __do_page_fault+0x19a/0x430
           ? __audit_syscall_entry+0xa8/0x100
           ? do_audit_syscall_entry+0x66/0x70
           ? syscall_trace_enter_phase1+0x139/0x160
           SyS_semtimedop+0xe/0x10
           SyS_semop+0x10/0x20
           entry_SYSCALL_64_fastpath+0x12/0x71
         Code: 47 10 83 e8 01 85 c0 89 47 10 75 08 65 48 89 3d 1f 74 ff 7e c9 c3 0f 1f 44 00 00 55 48 89 e5 e8 87 17 04 00 66 90 c9 c3 0f 1f 00 <55> 48 89 e5 0f 31 89 c1 48 89 d0 48 c1 e0 20 89 c9 48 09 c8 c9
         Kernel panic - not syncing: softlockup: hung tasks
      
      I wasn't able to trigger any badness on a recent kernel without the
      proper debug config options enabled; however, I have softlockup reports
      on some kernel versions, in the semaphore code, which are similar to the
      above (the scenario is seen on some servers running IBM DB2, which uses
      semaphore syscalls).
      
      The patch here fixes the race against freeary, by acquiring or waiting
      on the sem_undo_list lock as necessary (exit_sem can race with freeary,
      while freeary sets un->semid to -1 and removes the same sem_undo from
      list_proc or when it removes the last sem_undo).
      
      After the patch I'm unable to reproduce the problem using the test case
      [1].
      
      [1] Test case used below:
      
          #include <stdio.h>
          #include <sys/types.h>
          #include <sys/ipc.h>
          #include <sys/sem.h>
          #include <sys/wait.h>
          #include <stdlib.h>
          #include <time.h>
          #include <unistd.h>
          #include <errno.h>
      
          #define NSEM 1
          #define NSET 5
      
          int sid[NSET];
      
          void thread()
          {
                  struct sembuf op;
                  int s;
                  uid_t pid = getuid();
      
                  s = rand() % NSET;
                  op.sem_num = pid % NSEM;
                  op.sem_op = 1;
                  op.sem_flg = SEM_UNDO;
      
                  semop(sid[s], &op, 1);
                  exit(EXIT_SUCCESS);
          }
      
          void create_set()
          {
                  int i, j;
                  pid_t p;
                  union {
                          int val;
                          struct semid_ds *buf;
                          unsigned short int *array;
                          struct seminfo *__buf;
                  } un;
      
                  /* Create and initialize semaphore set */
                  for (i = 0; i < NSET; i++) {
                          sid[i] = semget(IPC_PRIVATE , NSEM, 0644 | IPC_CREAT);
                          if (sid[i] < 0) {
                                  perror("semget");
                                  exit(EXIT_FAILURE);
                          }
                  }
                  un.val = 0;
                  for (i = 0; i < NSET; i++) {
                          for (j = 0; j < NSEM; j++) {
                                  if (semctl(sid[i], j, SETVAL, un) < 0)
                                          perror("semctl");
                          }
                  }
      
                  /* Launch threads that operate on semaphore set */
                  for (i = 0; i < NSEM * NSET * NSET; i++) {
                          p = fork();
                          if (p < 0)
                                  perror("fork");
                          if (p == 0)
                                  thread();
                  }
      
                  /* Free semaphore set */
                  for (i = 0; i < NSET; i++) {
                          if (semctl(sid[i], NSEM, IPC_RMID))
                                  perror("IPC_RMID");
                  }
      
                  /* Wait for forked processes to exit */
                  while (wait(NULL)) {
                          if (errno == ECHILD)
                                  break;
                  };
          }
      
          int main(int argc, char **argv)
          {
                  pid_t p;
      
                  srand(time(NULL));
      
                  while (1) {
                          p = fork();
                          if (p < 0) {
                                  perror("fork");
                                  exit(EXIT_FAILURE);
                          }
                          if (p == 0) {
                                  create_set();
                                  goto end;
                          }
      
                          /* Wait for forked processes to exit */
                          while (wait(NULL)) {
                                  if (errno == ECHILD)
                                          break;
                          };
                  }
          end:
                  return 0;
          }
      
      [akpm@linux-foundation.org: use normal comment layout]
      Signed-off-by: Herton R. Krzesinski <herton@redhat.com>
      Acked-by: Manfred Spraul <manfred@colorfullife.com>
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Cc: Rafael Aquini <aquini@redhat.com>
      CC: Aristeu Rozanski <aris@redhat.com>
      Cc: David Jeffery <djeffery@redhat.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      602b8593
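The locking idea described in the commit message above can be illustrated with a small user-space sketch. This is not the kernel patch: `struct undo`, `struct undo_list`, `remove_undo()` and `peek_semid()` are hypothetical stand-ins for sem_undo, sem_undo_list, the freeary() side and the exit_sem() side, and a C11 `atomic_flag` spinlock is assumed in place of the kernel spinlock. The point it shows is that the exiting side only reads the id, or concludes the list is empty, while holding the same lock the remover holds.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical user-space stand-ins, not the kernel structures. */
struct undo {
        int semid;              /* set to -1 once the set is removed */
        struct undo *next;
};

struct undo_list {
        atomic_flag lock;       /* minimal spinlock */
        struct undo *head;
};

static void ul_lock(struct undo_list *ulp)
{
        while (atomic_flag_test_and_set_explicit(&ulp->lock,
                                                 memory_order_acquire))
                ;               /* spin until the holder releases */
}

static void ul_unlock(struct undo_list *ulp)
{
        atomic_flag_clear_explicit(&ulp->lock, memory_order_release);
}

/* Remover side: invalidate the entry and unlink it under the lock. */
void remove_undo(struct undo_list *ulp, struct undo *un)
{
        assert(ulp && un);
        ul_lock(ulp);
        un->semid = -1;
        for (struct undo **pp = &ulp->head; *pp; pp = &(*pp)->next) {
                if (*pp == un) {
                        *pp = un->next;
                        break;
                }
        }
        ul_unlock(ulp);
}

/*
 * Exiting side: look at the first entry under the lock. Returns 1 and
 * stores the id if an entry exists, 0 if the list is empty. Taking the
 * lock in both cases means an in-flight remove_undo() has finished
 * before the caller acts on the result.
 */
int peek_semid(struct undo_list *ulp, int *semid)
{
        int found = 0;

        assert(ulp && semid);
        ul_lock(ulp);
        if (ulp->head) {
                *semid = ulp->head->semid;
                found = 1;
        }
        ul_unlock(ulp);
        return found;
}
```

A list would be set up with `struct undo_list l = { .lock = ATOMIC_FLAG_INIT, .head = &entry };`. Once `peek_semid()` reports an empty list, the caller can free the list without a remover still touching it.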
    • Wanpeng Li's avatar
      mm/hwpoison: fix fail isolate hugetlbfs page w/ refcount held · 03613808
      Wanpeng Li authored
      Hugetlbfs pages will get a refcount in get_any_page() or
      madvise_hwpoison() if soft offlining through madvise.  The refcount which
      is held by the soft offline path should be released if we fail to isolate
      hugetlbfs pages.
      
      Fix it by dropping the refcount on both isolation success and failure.
      
      Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
      Acked-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: <stable@vger.kernel.org>	[3.9+]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      03613808
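The rule stated in the commit message above (release the reference whether isolation succeeds or fails) can be sketched in plain C. Everything here is a hypothetical illustration, not the kernel code: `struct page_ref`, `get_ref()`, `put_ref()`, `offline_page()` and the two isolate callbacks are invented names, and a plain integer is assumed in place of the real page refcount.

```c
#include <assert.h>

/* Hypothetical stand-in for a page with a reference count. */
struct page_ref {
        int refcount;
};

void get_ref(struct page_ref *p)
{
        p->refcount++;
}

void put_ref(struct page_ref *p)
{
        assert(p->refcount > 0);        /* catch unbalanced drops */
        p->refcount--;
}

/* Illustrative isolate callbacks: one succeeds, one fails. */
int isolate_ok(struct page_ref *p)
{
        (void)p;
        return 0;
}

int isolate_fail(struct page_ref *p)
{
        (void)p;
        return -1;
}

/*
 * Take a reference, attempt isolation, then drop the reference on
 * both the success and the failure path so the count stays balanced.
 */
int offline_page(struct page_ref *p, int (*isolate)(struct page_ref *))
{
        int ret;

        get_ref(p);
        ret = isolate(p);
        put_ref(p);     /* dropped regardless of ret */
        return ret;
}
```

Calling `offline_page()` with either callback leaves `refcount` at its starting value. With a put only on the success path, a failed isolation would leave one reference behind.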
    • Wanpeng Li's avatar
      mm/hwpoison: fix page refcount of unknown non LRU page · 4f32be67
      Wanpeng Li authored
      After trying to drain pages from the pagevec/pageset, we try to get a
      reference to the page again. However, that reference is not dropped if
      the page is still not on the LRU list.
      
      Fix it by adding a put_page() call to drop the page reference taken by
      __get_any_page().
      
      Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
      Acked-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: <stable@vger.kernel.org>	[3.9+]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      4f32be67
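The retry path described in the commit message above can be sketched as a user-space analogue. The names `struct fake_page`, `drain()` and `grab_lru_page()` are hypothetical and the LRU state is reduced to a boolean; this only illustrates where the extra put belongs, under those assumptions, and is not the kernel function.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for a page; fields are illustrative only. */
struct fake_page {
        int refcount;
        bool on_lru;
        bool drain_moves_to_lru;        /* whether draining helps */
};

static void take_ref(struct fake_page *p)
{
        p->refcount++;
}

static void drop_ref(struct fake_page *p)
{
        assert(p->refcount > 0);
        p->refcount--;
}

/* Stand-in for draining the pagevec/pageset. */
static void drain(struct fake_page *p)
{
        if (p->drain_moves_to_lru)
                p->on_lru = true;
}

/*
 * Returns 0 with one reference held if the page ends up on the LRU,
 * or -EIO with no reference held otherwise.
 */
int grab_lru_page(struct fake_page *p)
{
        take_ref(p);
        if (p->on_lru)
                return 0;

        /* Not on the LRU yet: release, drain, and try once more. */
        drop_ref(p);
        drain(p);
        take_ref(p);
        if (!p->on_lru) {
                drop_ref(p);    /* the drop that was missing */
                return -EIO;
        }
        return 0;
}
```

The failure return leaves `refcount` unchanged, so the caller has nothing to clean up. On success the caller owns exactly one reference.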