1. 19 Jan, 2017 40 commits
    • Gabriel Krisman Bertazi's avatar
      blk-mq: Always schedule hctx->next_cpu · 6e8210ad
      Gabriel Krisman Bertazi authored
      commit c02ebfdd upstream.
      
      Commit 0e87e58b ("blk-mq: improve warning for running a queue on the
      wrong CPU") attempts to avoid triggering the WARN_ON in
      __blk_mq_run_hw_queue when the expected CPU is dead.  Problem is, in the
      last batch execution before round robin, blk_mq_hctx_next_cpu can
      schedule a dead CPU and also update next_cpu to the next alive CPU in
      the mask, which will trigger the WARN_ON despite the previous
      workaround.
      
      The following patch fixes this scenario by always scheduling the value
      in hctx->next_cpu.  This changes the moment when we round-robin the CPU
      running the hctx, but it really doesn't matter, since it still executes
      BLK_MQ_CPU_WORK_BATCH times in a row before switching to another CPU.
      
      Fixes: 0e87e58b ("blk-mq: improve warning for running a queue on the wrong CPU")
      Signed-off-by: default avatarGabriel Krisman Bertazi <krisman@linux.vnet.ibm.com>
      Signed-off-by: default avatarJens Axboe <axboe@fb.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      6e8210ad
    • Prarit Bhargava's avatar
      ACPI / APEI: Fix NMI notification handling · ddf0c377
      Prarit Bhargava authored
      commit a545715d upstream.
      
      When removing and adding cpu 0 on a system with GHES NMI the following stack
      trace is seen when re-adding the cpu:
      
      WARNING: CPU: 0 PID: 0 at arch/x86/kernel/apic/apic.c:1349 setup_local_APIC+
      Modules linked in: nfsv3 rpcsec_gss_krb5 nfsv4 nfs fscache coretemp intel_ra
      CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.9.0-rc6+ #2
      Call Trace:
       dump_stack+0x63/0x8e
       __warn+0xd1/0xf0
       warn_slowpath_null+0x1d/0x20
       setup_local_APIC+0x275/0x370
       apic_ap_setup+0xe/0x20
       start_secondary+0x48/0x180
       set_init_arg+0x55/0x55
       early_idt_handler_array+0x120/0x120
       x86_64_start_reservations+0x2a/0x2c
       x86_64_start_kernel+0x13d/0x14c
      
      During the cpu bringup, wakeup_cpu_via_init_nmi() is called and issues an
      NMI on CPU 0.  The GHES NMI handler, ghes_notify_nmi() runs the
      ghes_proc_irq_work work queue which ends up setting IRQ_WORK_VECTOR
      (0xf6).  The "faulty" IR line set at arch/x86/kernel/apic/apic.c:1349 is  also
      0xf6 (specifically APIC IRR for irqs 255 to 224 is 0x400000) which confirms
      that something has set the IRQ_WORK_VECTOR line prior to the APIC being
      initialized.
      
      Commit 2383844d ("GHES: Elliminate double-loop in the NMI handler")
      incorrectly modified the behavior such that the handler returns
      NMI_HANDLED only if an error was processed, and incorrectly runs the ghes
      work queue for every NMI.
      
      This patch modifies the ghes_proc_irq_work() to run as it did prior to
      2383844d ("GHES: Elliminate double-loop in the NMI handler") by
      properly returning NMI_HANDLED and only calling the work queue if
      NMI_HANDLED has been set.
      
      Fixes: 2383844d (GHES: Elliminate double-loop in the NMI handler)
      Signed-off-by: default avatarPrarit Bhargava <prarit@redhat.com>
      Reviewed-by: default avatarBorislav Petkov <bp@suse.de>
      Signed-off-by: default avatarRafael J. Wysocki <rafael.j.wysocki@intel.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      ddf0c377
    • Tejun Heo's avatar
      block: cfq_cpd_alloc() should use @gfp · 4af7970b
      Tejun Heo authored
      commit ebc4ff66 upstream.
      
      cfq_cpd_alloc() which is the cpd_alloc_fn implementation for cfq was
      incorrectly hard coding GFP_KERNEL instead of using the mask specified
      through the @gfp parameter.  This currently doesn't cause any actual
      issues because all current callers specify GFP_KERNEL.  Fix it.
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Reported-by: default avatarDan Carpenter <dan.carpenter@oracle.com>
      Fixes: e4a9bde9 ("blkcg: replace blkcg_policy->cpd_size with ->cpd_alloc/free_fn() methods")
      Signed-off-by: default avatarJens Axboe <axboe@fb.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      4af7970b
    • Denis Kirjanov's avatar
      cpufreq: powernv: Disable preemption while checking CPU throttling state · 2c1dd423
      Denis Kirjanov authored
      commit 8a10c06a upstream.
      
      With preemption turned on we can read incorrect throttling state
      while being switched to CPU on a different chip.
      
       BUG: using smp_processor_id() in preemptible [00000000] code: cat/7343
       caller is .powernv_cpufreq_throttle_check+0x2c/0x710
       CPU: 13 PID: 7343 Comm: cat Not tainted 4.8.0-rc5-dirty #1
       Call Trace:
       [c0000007d25b75b0] [c000000000971378] .dump_stack+0xe4/0x150 (unreliable)
       [c0000007d25b7640] [c0000000005162e4] .check_preemption_disabled+0x134/0x150
       [c0000007d25b76e0] [c0000000007b63ac] .powernv_cpufreq_throttle_check+0x2c/0x710
       [c0000007d25b7790] [c0000000007b6d18] .powernv_cpufreq_target_index+0x288/0x360
       [c0000007d25b7870] [c0000000007acee4] .__cpufreq_driver_target+0x394/0x8c0
       [c0000007d25b7920] [c0000000007b22ac] .cpufreq_set+0x7c/0xd0
       [c0000007d25b79b0] [c0000000007adf50] .store_scaling_setspeed+0x80/0xc0
       [c0000007d25b7a40] [c0000000007ae270] .store+0xa0/0x100
       [c0000007d25b7ae0] [c0000000003566e8] .sysfs_kf_write+0x88/0xb0
       [c0000007d25b7b70] [c0000000003553b8] .kernfs_fop_write+0x178/0x260
       [c0000007d25b7c10] [c0000000002ac3cc] .__vfs_write+0x3c/0x1c0
       [c0000007d25b7cf0] [c0000000002ad584] .vfs_write+0xc4/0x230
       [c0000007d25b7d90] [c0000000002aeef8] .SyS_write+0x58/0x100
       [c0000007d25b7e30] [c00000000000bfec] system_call+0x38/0xfc
      
      Fixes: 09a972d1 (cpufreq: powernv: Report cpu frequency throttling)
      Reviewed-by: default avatarGautham R. Shenoy <ego@linux.vnet.ibm.com>
      Signed-off-by: default avatarDenis Kirjanov <kda@linux-powerpc.org>
      Signed-off-by: default avatarRafael J. Wysocki <rafael.j.wysocki@intel.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      2c1dd423
    • NeilBrown's avatar
      NFSv4.1: nfs4_fl_prepare_ds must be careful about reporting success. · 33ebdfe9
      NeilBrown authored
      commit cfd278c2 upstream.
      
      Various places assume that if nfs4_fl_prepare_ds() turns a non-NULL 'ds',
      then ds->ds_clp will also be non-NULL.
      
      This is not necessasrily true in the case when the process received a fatal signal
      while nfs4_pnfs_ds_connect is waiting in nfs4_wait_ds_connect().
      In that case ->ds_clp may not be set, and the devid may not recently have been marked
      unavailable.
      
      So add a test for ds_clp == NULL and return NULL in that case.
      
      Fixes: c23266d5 ("NFS4.1 Fix data server connection race")
      Signed-off-by: default avatarNeilBrown <neilb@suse.com>
      Acked-by: default avatarOlga Kornievskaia <aglo@umich.edu>
      Acked-by: default avatarAdamson, Andy <William.Adamson@netapp.com>
      Signed-off-by: default avatarTrond Myklebust <trond.myklebust@primarydata.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      33ebdfe9
    • Trond Myklebust's avatar
      NFS: Fix a performance regression in readdir · 11804232
      Trond Myklebust authored
      commit 79f687a3 upstream.
      
      Ben Coddington reports that commit 311324ad, by adding the function
      nfs_dir_mapping_need_revalidate() that checks page cache validity on
      each call to nfs_readdir() causes a performance regression when
      the directory is being modified.
      
      If the directory is changing while we're iterating through the directory,
      POSIX does not require us to invalidate the page cache unless the user
      calls rewinddir(). However, we still do want to ensure that we use
      readdirplus in order to avoid a load of stat() calls when the user
      is doing an 'ls -l' workload.
      
      The fix should be to invalidate the page cache immediately when we're
      setting the NFS_INO_ADVISE_RDPLUS bit.
      Reported-by: default avatarBenjamin Coddington <bcodding@redhat.com>
      Fixes: 311324ad ("NFS: Be more aggressive in using readdirplus...")
      Reviewed-by: default avatarBenjamin Coddington <bcodding@redhat.com>
      Tested-by: default avatarBenjamin Coddington <bcodding@redhat.com>
      Signed-off-by: default avatarTrond Myklebust <trond.myklebust@primarydata.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      11804232
    • Trond Myklebust's avatar
      pNFS: Fix race in pnfs_wait_on_layoutreturn · 8ff851bf
      Trond Myklebust authored
      commit ee284e35 upstream.
      
      We must put the task to sleep while holding the inode->i_lock in order
      to ensure atomicity with the test for NFS_LAYOUT_RETURN.
      
      Fixes: 500d701f ("NFS41: make close wait for layoutreturn")
      Signed-off-by: default avatarTrond Myklebust <trond.myklebust@primarydata.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      8ff851bf
    • Neil Armstrong's avatar
      pinctrl: meson: fix gpio request disabling other modes · 7aae6e3c
      Neil Armstrong authored
      commit f24d311f upstream.
      
      The pinctrl_gpio_request is called with the "full" gpio number, already
      containing the base, then meson_pmx_request_gpio is then called with the
      final pin number.
      Remove the base addition when calling meson_pmx_disable_other_groups.
      
      Fixes: 6ac73095 ("pinctrl: add driver for Amlogic Meson SoCs")
      CC: Beniamino Galvani <b.galvani@gmail.com>
      Signed-off-by: default avatarNeil Armstrong <narmstrong@baylibre.com>
      Acked-by: default avatarKevin Hilman <khilman@baylibre.com>
      Acked-by: default avatarBeniamino Galvani <b.galvani@gmail.com>
      Signed-off-by: default avatarLinus Walleij <linus.walleij@linaro.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      7aae6e3c
    • Jeff Mahoney's avatar
      btrfs: fix error handling when run_delayed_extent_op fails · f0382c09
      Jeff Mahoney authored
      commit aa7c8da3 upstream.
      
      In __btrfs_run_delayed_refs, the error path when run_delayed_extent_op
      fails sets locked_ref->processing = 0 but doesn't re-increment
      delayed_refs->num_heads_ready.  As a result, we end up triggering
      the WARN_ON in btrfs_select_ref_head.
      
      Fixes: d7df2c79 (Btrfs: attach delayed ref updates to delayed ref heads)
      Reported-by: default avatarJon Nelson <jnelson-suse@jamponi.net>
      Signed-off-by: default avatarJeff Mahoney <jeffm@suse.com>
      Reviewed-by: default avatarLiu Bo <bo.li.liu@oracle.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f0382c09
    • Jeff Mahoney's avatar
      btrfs: fix locking when we put back a delayed ref that's too new · 205e997a
      Jeff Mahoney authored
      commit d0280996 upstream.
      
      In __btrfs_run_delayed_refs, when we put back a delayed ref that's too
      new, we have already dropped the lock on locked_ref when we set
      ->processing = 0.
      
      This patch keeps the lock to cover that assignment.
      
      Fixes: d7df2c79 (Btrfs: attach delayed ref updates to delayed ref heads)
      Signed-off-by: default avatarJeff Mahoney <jeffm@suse.com>
      Reviewed-by: default avatarLiu Bo <bo.li.liu@oracle.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      205e997a
    • Lukasz Odzioba's avatar
      x86/cpu: Fix bootup crashes by sanitizing the argument of the 'clearcpuid=' command-line option · 68b97d28
      Lukasz Odzioba authored
      commit dd853fd2 upstream.
      
      A negative number can be specified in the cmdline which will be used as
      setup_clear_cpu_cap() argument. With that we can clear/set some bit in
      memory predceeding boot_cpu_data/cpu_caps_cleared which may cause kernel
      to misbehave. This patch adds lower bound check to setup_disablecpuid().
      
      Boris Petkov reproduced a crash:
      
        [    1.234575] BUG: unable to handle kernel paging request at ffffffff858bd540
        [    1.236535] IP: memcpy_erms+0x6/0x10
      Signed-off-by: default avatarLukasz Odzioba <lukasz.odzioba@intel.com>
      Acked-by: default avatarBorislav Petkov <bp@suse.de>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: andi.kleen@intel.com
      Cc: bp@alien8.de
      Cc: dave.hansen@linux.intel.com
      Cc: luto@kernel.org
      Cc: slaoub@gmail.com
      Fixes: ac72e788 ("x86: add generic clearcpuid=... option")
      Link: http://lkml.kernel.org/r/1482933340-11857-1-git-send-email-lukasz.odzioba@intel.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      68b97d28
    • Johan Hovold's avatar
      USB: serial: ch341: fix modem-control and B0 handling · 0cf23324
      Johan Hovold authored
      commit 030ee7ae upstream.
      
      The modem-control signals are managed by the tty-layer during open and
      should not be asserted prematurely when set_termios is called from
      driver open.
      
      Also make sure that the signals are asserted only when changing speed
      from B0.
      
      Fixes: 664d5df9 ("USB: usb-serial ch341: support for DTR/RTS/CTS")
      Signed-off-by: default avatarJohan Hovold <johan@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      0cf23324
    • Johan Hovold's avatar
      USB: serial: ch341: fix resume after reset · 1d25a056
      Johan Hovold authored
      commit ce5e2928 upstream.
      
      Fix reset-resume handling which failed to resubmit the read and
      interrupt URBs, thereby leaving a port that was open before suspend in a
      broken state until closed and reopened.
      
      Fixes: 1ded7ea4 ("USB: ch341 serial: fix port number changed after
      resume")
      Fixes: 2bfd1c96 ("USB: serial: ch341: remove reset_resume callback")
      Signed-off-by: default avatarJohan Hovold <johan@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      1d25a056
    • Alex Deucher's avatar
    • Zhou Chengming's avatar
      sysctl: Drop reference added by grab_header in proc_sys_readdir · b9d66313
      Zhou Chengming authored
      commit 93362fa4 upstream.
      
      Fixes CVE-2016-9191, proc_sys_readdir doesn't drop reference
      added by grab_header when return from !dir_emit_dots path.
      It can cause any path called unregister_sysctl_table will
      wait forever.
      
      The calltrace of CVE-2016-9191:
      
      [ 5535.960522] Call Trace:
      [ 5535.963265]  [<ffffffff817cdaaf>] schedule+0x3f/0xa0
      [ 5535.968817]  [<ffffffff817d33fb>] schedule_timeout+0x3db/0x6f0
      [ 5535.975346]  [<ffffffff817cf055>] ? wait_for_completion+0x45/0x130
      [ 5535.982256]  [<ffffffff817cf0d3>] wait_for_completion+0xc3/0x130
      [ 5535.988972]  [<ffffffff810d1fd0>] ? wake_up_q+0x80/0x80
      [ 5535.994804]  [<ffffffff8130de64>] drop_sysctl_table+0xc4/0xe0
      [ 5536.001227]  [<ffffffff8130de17>] drop_sysctl_table+0x77/0xe0
      [ 5536.007648]  [<ffffffff8130decd>] unregister_sysctl_table+0x4d/0xa0
      [ 5536.014654]  [<ffffffff8130deff>] unregister_sysctl_table+0x7f/0xa0
      [ 5536.021657]  [<ffffffff810f57f5>] unregister_sched_domain_sysctl+0x15/0x40
      [ 5536.029344]  [<ffffffff810d7704>] partition_sched_domains+0x44/0x450
      [ 5536.036447]  [<ffffffff817d0761>] ? __mutex_unlock_slowpath+0x111/0x1f0
      [ 5536.043844]  [<ffffffff81167684>] rebuild_sched_domains_locked+0x64/0xb0
      [ 5536.051336]  [<ffffffff8116789d>] update_flag+0x11d/0x210
      [ 5536.057373]  [<ffffffff817cf61f>] ? mutex_lock_nested+0x2df/0x450
      [ 5536.064186]  [<ffffffff81167acb>] ? cpuset_css_offline+0x1b/0x60
      [ 5536.070899]  [<ffffffff810fce3d>] ? trace_hardirqs_on+0xd/0x10
      [ 5536.077420]  [<ffffffff817cf61f>] ? mutex_lock_nested+0x2df/0x450
      [ 5536.084234]  [<ffffffff8115a9f5>] ? css_killed_work_fn+0x25/0x220
      [ 5536.091049]  [<ffffffff81167ae5>] cpuset_css_offline+0x35/0x60
      [ 5536.097571]  [<ffffffff8115aa2c>] css_killed_work_fn+0x5c/0x220
      [ 5536.104207]  [<ffffffff810bc83f>] process_one_work+0x1df/0x710
      [ 5536.110736]  [<ffffffff810bc7c0>] ? process_one_work+0x160/0x710
      [ 5536.117461]  [<ffffffff810bce9b>] worker_thread+0x12b/0x4a0
      [ 5536.123697]  [<ffffffff810bcd70>] ? process_one_work+0x710/0x710
      [ 5536.130426]  [<ffffffff810c3f7e>] kthread+0xfe/0x120
      [ 5536.135991]  [<ffffffff817d4baf>] ret_from_fork+0x1f/0x40
      [ 5536.142041]  [<ffffffff810c3e80>] ? kthread_create_on_node+0x230/0x230
      
      One cgroup maintainer mentioned that "cgroup is trying to offline
      a cpuset css, which takes place under cgroup_mutex.  The offlining
      ends up trying to drain active usages of a sysctl table which apprently
      is not happening."
      The real reason is that proc_sys_readdir doesn't drop reference added
      by grab_header when return from !dir_emit_dots path. So this cpuset
      offline path will wait here forever.
      
      See here for details: http://www.openwall.com/lists/oss-security/2016/11/04/13
      
      Fixes: f0c3b509 ("[readdir] convert procfs")
      Reported-by: default avatarCAI Qian <caiqian@redhat.com>
      Tested-by: default avatarYang Shukui <yangshukui@huawei.com>
      Signed-off-by: default avatarZhou Chengming <zhouchengming1@huawei.com>
      Acked-by: default avatarAl Viro <viro@ZenIV.linux.org.uk>
      Signed-off-by: default avatarEric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      b9d66313
    • Akinobu Mita's avatar
      sysrq: attach sysrq handler correctly for 32-bit kernel · ca81117b
      Akinobu Mita authored
      commit 802c0388 upstream.
      
      The sysrq input handler should be attached to the input device which has
      a left alt key.
      
      On 32-bit kernels, some input devices which has a left alt key cannot
      attach sysrq handler.  Because the keybit bitmap in struct input_device_id
      for sysrq is not correctly initialized.  KEY_LEFTALT is 56 which is
      greater than BITS_PER_LONG on 32-bit kernels.
      
      I found this problem when using a matrix keypad device which defines
      a KEY_LEFTALT (56) but doesn't have a KEY_O (24 == 56%32).
      
      Cc: Jiri Slaby <jslaby@suse.com>
      Signed-off-by: default avatarAkinobu Mita <akinobu.mita@gmail.com>
      Acked-by: default avatarDmitry Torokhov <dmitry.torokhov@gmail.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      ca81117b
    • Richard Genoud's avatar
      tty/serial: atmel_serial: BUG: stop DMA from transmitting in stop_tx · 075f493a
      Richard Genoud authored
      commit 89d82324 upstream.
      
      If we don't disable the transmitter in atmel_stop_tx, the DMA buffer
      continues to send data until it is emptied.
      This cause problems with the flow control (CTS is asserted and data are
      still sent).
      
      So, disabling the transmitter in atmel_stop_tx is a sane thing to do.
      
      Tested on at91sam9g35-cm(DMA)
      Tested for regressions on sama5d2-xplained(Fifo) and at91sam9g20ek(PDC)
      Signed-off-by: default avatarRichard Genoud <richard.genoud@gmail.com>
      Acked-by: default avatarNicolas Ferre <nicolas.ferre@atmel.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      075f493a
    • Eric W. Biederman's avatar
      mnt: Protect the mountpoint hashtable with mount_lock · 4a6716f1
      Eric W. Biederman authored
      commit 3895dbf8 upstream.
      
      Protecting the mountpoint hashtable with namespace_sem was sufficient
      until a call to umount_mnt was added to mntput_no_expire.  At which
      point it became possible for multiple calls of put_mountpoint on
      the same hash chain to happen on the same time.
      
      Kristen Johansen <kjlx@templeofstupid.com> reported:
      > This can cause a panic when simultaneous callers of put_mountpoint
      > attempt to free the same mountpoint.  This occurs because some callers
      > hold the mount_hash_lock, while others hold the namespace lock.  Some
      > even hold both.
      >
      > In this submitter's case, the panic manifested itself as a GP fault in
      > put_mountpoint() when it called hlist_del() and attempted to dereference
      > a m_hash.pprev that had been poisioned by another thread.
      
      Al Viro observed that the simple fix is to switch from using the namespace_sem
      to the mount_lock to protect the mountpoint hash table.
      
      I have taken Al's suggested patch moved put_mountpoint in pivot_root
      (instead of taking mount_lock an additional time), and have replaced
      new_mountpoint with get_mountpoint a function that does the hash table
      lookup and addition under the mount_lock.   The introduction of get_mounptoint
      ensures that only the mount_lock is needed to manipulate the mountpoint
      hashtable.
      
      d_set_mounted is modified to only set DCACHE_MOUNTED if it is not
      already set.  This allows get_mountpoint to use the setting of
      DCACHE_MOUNTED to ensure adding a struct mountpoint for a dentry
      happens exactly once.
      
      Fixes: ce07d891 ("mnt: Honor MNT_LOCKED when detaching mounts")
      Reported-by: default avatarKrister Johansen <kjlx@templeofstupid.com>
      Suggested-by: default avatarAl Viro <viro@ZenIV.linux.org.uk>
      Acked-by: default avatarAl Viro <viro@ZenIV.linux.org.uk>
      Signed-off-by: default avatar"Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      4a6716f1
    • Augusto Mecking Caringi's avatar
      vme: Fix wrong pointer utilization in ca91cx42_slave_get · 836fd7c9
      Augusto Mecking Caringi authored
      commit c8a6a09c upstream.
      
      In ca91cx42_slave_get function, the value pointed by vme_base pointer is
      set through:
      
      *vme_base = ioread32(bridge->base + CA91CX42_VSI_BS[i]);
      
      So it must be dereferenced to be used in calculation of pci_base:
      
      *pci_base = (dma_addr_t)*vme_base + pci_offset;
      
      This bug was caught thanks to the following gcc warning:
      
      drivers/vme/bridges/vme_ca91cx42.c: In function ‘ca91cx42_slave_get’:
      drivers/vme/bridges/vme_ca91cx42.c:467:14: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
      *pci_base = (dma_addr_t)vme_base + pci_offset;
      Signed-off-by: default avatarAugusto Mecking Caringi <augustocaringi@gmail.com>
      Acked-By: default avatarMartyn Welch <martyn@welchs.me.uk>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      836fd7c9
    • Mathias Nyman's avatar
      xhci: fix deadlock at host remove by running watchdog correctly · d5fcd719
      Mathias Nyman authored
      commit d6169d04 upstream.
      
      If a URB is killed while the host is removed we can end up in a situation
      where the hub thread takes the roothub device lock, and waits for
      the URB to be given back by xhci-hcd, blocking the host remove code.
      
      xhci-hcd tries to stop the endpoint and give back the urb, but can't
      as the host is removed from PCI bus at the same time, preventing the normal
      way of giving back urb.
      
      Instead we need to rely on the stop command timeout function to give back
      the urb. This xhci_stop_endpoint_command_watchdog() timeout function
      used a XHCI_STATE_DYING flag to indicate if the timeout function is already
      running, but later this flag has been taking into use in other places to
      mark that xhci is dying.
      
      Remove checks for XHCI_STATE_DYING in xhci_urb_dequeue. We are still
      checking that reading from pci state does not return 0xffffffff or that
      host is not halted before trying to stop the endpoint.
      
      This whole area of stopping endpoints, giving back URBs, and the wathdog
      timeout need rework, this fix focuses on solving a specific deadlock
      issue that we can then send to stable before any major rework.
      Signed-off-by: default avatarMathias Nyman <mathias.nyman@linux.intel.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      d5fcd719
    • Vlad Tsyrklevich's avatar
      i2c: fix kernel memory disclosure in dev interface · ae76af25
      Vlad Tsyrklevich authored
      commit 30f939fe upstream.
      
      i2c_smbus_xfer() does not always fill an entire block, allowing
      kernel stack memory disclosure through the temp variable. Clear
      it before it's read to.
      Signed-off-by: default avatarVlad Tsyrklevich <vlad@tsyrklevich.net>
      Signed-off-by: default avatarWolfram Sang <wsa@the-dreams.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      ae76af25
    • John Garry's avatar
      i2c: print correct device invalid address · f64b9acc
      John Garry authored
      commit 6f724fb3 upstream.
      
      In of_i2c_register_device(), when the check for
      device address validity fails we print the info.addr,
      which has not been assigned properly.
      
      Fix this by printing the actual invalid address.
      Signed-off-by: default avatarJohn Garry <john.garry@huawei.com>
      Reviewed-by: default avatarVladimir Zapolskiy <vz@mleia.com>
      Signed-off-by: default avatarWolfram Sang <wsa@the-dreams.de>
      Fixes: b4e2f6ac ("i2c: apply DT flags when probing")
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f64b9acc
    • Guenter Roeck's avatar
      Input: elants_i2c - avoid divide by 0 errors on bad touchscreen data · 085f4ebe
      Guenter Roeck authored
      commit 1c3415a0 upstream.
      
      The following crash may be seen if bad data is received from the
      touchscreen.
      
      [ 2189.425150] elants_i2c i2c-ELAN0001:00: unknown packet ff ff ff ff
      [ 2189.430738] divide error: 0000 [#1] PREEMPT SMP
      [ 2189.434679] gsmi: Log Shutdown Reason 0x03
      [ 2189.434689] Modules linked in: ip6t_REJECT nf_reject_ipv6 rfcomm evdi
      uinput uvcvideo cmac videobuf2_vmalloc videobuf2_memops snd_hda_codec_hdmi
      i2c_dev videobuf2_core snd_soc_sst_cht_bsw_rt5645 snd_hda_intel
      snd_intel_sst_acpi btusb btrtl btbcm btintel bluetooth snd_soc_sst_acpi
      snd_hda_codec snd_intel_sst_core snd_hwdep snd_soc_sst_mfld_platform
      snd_hda_core snd_soc_rt5645 memconsole_x86_legacy memconsole zram snd_soc_rl6231
      fuse ip6table_filter iwlmvm iwlwifi iwl7000_mac80211 cfg80211 iio_trig_sysfs
      joydev cros_ec_sensors cros_ec_sensors_core industrialio_triggered_buffer
      kfifo_buf industrialio snd_seq_midi snd_seq_midi_event snd_rawmidi snd_seq
      snd_seq_device ppp_async ppp_generic slhc tun
      [ 2189.434866] CPU: 0 PID: 106 Comm: irq/184-ELAN000 Tainted: G        W
      3.18.0-13101-g57e8190 #1
      [ 2189.434883] Hardware name: GOOGLE Ultima, BIOS Google_Ultima.7287.131.43 07/20/2016
      [ 2189.434898] task: ffff88017a0b6d80 ti: ffff88017a2bc000 task.ti: ffff88017a2bc000
      [ 2189.434913] RIP: 0010:[<ffffffffbecc48d5>]  [<ffffffffbecc48d5>] elants_i2c_irq+0x190/0x200
      [ 2189.434937] RSP: 0018:ffff88017a2bfd98  EFLAGS: 00010293
      [ 2189.434948] RAX: 0000000000000000 RBX: ffff88017a967828 RCX: ffff88017a9678e8
      [ 2189.434962] RDX: 0000000000000000 RSI: 0000000000000246 RDI: 0000000000000000
      [ 2189.434975] RBP: ffff88017a2bfdd8 R08: 00000000000003e8 R09: 0000000000000000
      [ 2189.434989] R10: 0000000000000000 R11: 000000000044a2bd R12: ffff88017a991800
      [ 2189.435001] R13: ffffffffbe8a2a53 R14: ffff88017a0b6d80 R15: ffff88017a0b6d80
      [ 2189.435011] FS:  0000000000000000(0000) GS:ffff88017fc00000(0000) knlGS:0000000000000000
      [ 2189.435022] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
      [ 2189.435030] CR2: 00007f678d94b000 CR3: 000000003f41a000 CR4: 00000000001007f0
      [ 2189.435039] Stack:
      [ 2189.435044]  ffff88017a2bfda8 ffff88017a9678e8 646464647a2bfdd8 0000000006e09574
      [ 2189.435060]  0000000000000000 ffff88017a088b80 ffff88017a921000 ffffffffbe8a2a53
      [ 2189.435074]  ffff88017a2bfe08 ffffffffbe8a2a73 ffff88017a0b6d80 0000000006e09574
      [ 2189.435089] Call Trace:
      [ 2189.435101]  [<ffffffffbe8a2a53>] ? irq_thread_dtor+0xa9/0xa9
      [ 2189.435112]  [<ffffffffbe8a2a73>] irq_thread_fn+0x20/0x40
      [ 2189.435123]  [<ffffffffbe8a2be1>] irq_thread+0x14e/0x222
      [ 2189.435135]  [<ffffffffbee8cbeb>] ? __schedule+0x3b3/0x57a
      [ 2189.435145]  [<ffffffffbe8a29aa>] ? wake_threads_waitq+0x2d/0x2d
      [ 2189.435156]  [<ffffffffbe8a2a93>] ? irq_thread_fn+0x40/0x40
      [ 2189.435168]  [<ffffffffbe87c385>] kthread+0x10e/0x116
      [ 2189.435178]  [<ffffffffbe87c277>] ? __kthread_parkme+0x67/0x67
      [ 2189.435189]  [<ffffffffbee900ac>] ret_from_fork+0x7c/0xb0
      [ 2189.435199]  [<ffffffffbe87c277>] ? __kthread_parkme+0x67/0x67
      [ 2189.435208] Code: ff ff eb 73 0f b6 bb c1 00 00 00 83 ff 03 7e 13 49 8d 7c
      24 20 ba 04 00 00 00 48 c7 c6 8a cd 21 bf eb 4d 0f b6 83 c2 00 00 00 99 <f7> ff
      83 f8 37 75 15 48 6b f7 37 4c 8d a3 c4 00 00 00 4c 8d ac
      [ 2189.435312] RIP  [<ffffffffbecc48d5>] elants_i2c_irq+0x190/0x200
      [ 2189.435323]  RSP <ffff88017a2bfd98>
      [ 2189.435350] ---[ end trace f4945345a75d96dd ]---
      [ 2189.443841] Kernel panic - not syncing: Fatal exception
      [ 2189.444307] Kernel Offset: 0x3d800000 from 0xffffffff81000000
      	(relocation range: 0xffffffff80000000-0xffffffffbfffffff)
      [ 2189.444519] gsmi: Log Shutdown Reason 0x02
      
      The problem was seen with a 3.18 based kernel, but there is no reason
      to believe that the upstream code is safe.
      
      Fixes: 66aee900 ("Input: add support for Elan eKTH I2C touchscreens")
      Signed-off-by: default avatarGuenter Roeck <linux@roeck-us.net>
      Signed-off-by: default avatarDmitry Torokhov <dmitry.torokhov@gmail.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      085f4ebe
    • Johan Hovold's avatar
      USB: serial: ch341: fix open and resume after B0 · 214a8e98
      Johan Hovold authored
      commit a20047f3 upstream.
      
      The private baud_rate variable is used to configure the port at open and
      reset-resume and must never be set to (and left at) zero or reset-resume
      and all further open attempts will fail.
      
      Fixes: aa91def4 ("USB: ch341: set tty baud speed according to tty struct")
      Fixes: 664d5df9 ("USB: usb-serial ch341: support for DTR/RTS/CTS")
      Signed-off-by: default avatarJohan Hovold <johan@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      214a8e98
    • Johan Hovold's avatar
      USB: serial: ch341: fix control-message error handling · 802b4ef3
      Johan Hovold authored
      commit 2d5a9c72 upstream.
      
      A short control transfer would currently fail to be detected, something
      which could lead to stale buffer data being used as valid input.
      
      Check for short transfers, and make sure to log any transfer errors.
      
      Note that this also avoids leaking heap data to user space (TIOCMGET)
      and the remote device (break control).
      
      Fixes: 6ce76104 ("USB: Driver for CH341 USB-serial adaptor")
      Signed-off-by: default avatarJohan Hovold <johan@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      802b4ef3
    • Johan Hovold's avatar
      USB: serial: ch341: fix open error handling · bc74606d
      Johan Hovold authored
      commit f2950b78 upstream.
      
      Make sure to stop the interrupt URB before returning on errors during
      open.
      
      Fixes: 664d5df9 ("USB: usb-serial ch341: support for DTR/RTS/CTS")
      Signed-off-by: default avatarJohan Hovold <johan@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      bc74606d
    • Johan Hovold's avatar
      USB: serial: ch341: fix initial modem-control state · e29f709c
      Johan Hovold authored
      commit 4e2da446 upstream.
      
      DTR and RTS will be asserted by the tty-layer when the port is opened
      and deasserted on close (if HUPCL is set). Make sure the initial state
      is not-asserted before the port is first opened as well.
      
      Fixes: 664d5df9 ("USB: usb-serial ch341: support for DTR/RTS/CTS")
      Signed-off-by: default avatarJohan Hovold <johan@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      e29f709c
    • Johan Hovold's avatar
      USB: serial: kl5kusb105: fix line-state error handling · 3ef5bc0b
      Johan Hovold authored
      commit 146cc8a1 upstream.
      
      The current implementation failed to detect short transfers when
      attempting to read the line state, and also, to make things worse,
      logged the content of the uninitialised heap transfer buffer.
      
      Fixes: abf492e7 ("USB: kl5kusb105: fix DMA buffers on stack")
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Reviewed-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: default avatarJohan Hovold <johan@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      3ef5bc0b
    • Johannes Berg's avatar
      nl80211: fix sched scan netlink socket owner destruction · 4a1ecf37
      Johannes Berg authored
      commit 753aacfd upstream.
      
      A single netlink socket might own multiple interfaces *and* a
      scheduled scan request (which might belong to another interface),
      so when it goes away both may need to be destroyed.
      
      Remove the schedule_scan_stop indirection to fix this - it's only
      needed for interface destruction because of the way this works
      right now, with a single work taking care of all interfaces.
      
      Fixes: 93a1e86c ("nl80211: Stop scheduled scan if netlink client disappears")
      Signed-off-by: default avatarJohannes Berg <johannes.berg@intel.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      4a1ecf37
    • Steve Rutherford's avatar
      KVM: x86: Introduce segmented_write_std · 9d3875c0
      Steve Rutherford authored
      commit 129a72a0 upstream.
      
      Introduces segemented_write_std.
      
      Switches from emulated reads/writes to standard read/writes in fxsave,
      fxrstor, sgdt, and sidt.  This fixes CVE-2017-2584, a longstanding
      kernel memory leak.
      
      Since commit 283c95d0 ("KVM: x86: emulate FXSAVE and FXRSTOR",
      2016-11-09), which is luckily not yet in any final release, this would
      also be an exploitable kernel memory *write*!
      Reported-by: default avatarDmitry Vyukov <dvyukov@google.com>
      Fixes: 96051572
      Fixes: 283c95d0Suggested-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: default avatarSteve Rutherford <srutherford@google.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      9d3875c0
    • Radim Krčmář's avatar
      KVM: x86: emulate FXSAVE and FXRSTOR · 3490e72a
      Radim Krčmář authored
      commit 283c95d0 upstream.
      
      Internal errors were reported on 16 bit fxsave and fxrstor with ipxe.
      Old Intels don't have unrestricted_guest, so we have to emulate them.
      
      The patch takes advantage of the hardware implementation.
      
      AMD and Intel differ in saving and restoring other fields in first 32
      bytes.  A test wrote 0xff to the fxsave area, 0 to upper bits of MCSXR
      in the fxsave area, executed fxrstor, rewrote the fxsave area to 0xee,
      and executed fxsave:
      
        Intel (Nehalem):
          7f 1f 7f 7f ff 00 ff 07 ff ff ff ff ff ff 00 00
          ff ff ff ff ff ff 00 00 ff ff 00 00 ff ff 00 00
        Intel (Haswell -- deprecated FPU CS and FPU DS):
          7f 1f 7f 7f ff 00 ff 07 ff ff ff ff 00 00 00 00
          ff ff ff ff 00 00 00 00 ff ff 00 00 ff ff 00 00
        AMD (Opteron 2300-series):
          7f 1f 7f 7f ff 00 ee ee ee ee ee ee ee ee ee ee
          ee ee ee ee ee ee ee ee ff ff 00 00 ff ff 02 00
      
      fxsave/fxrstor will only be emulated on early Intels, so KVM can't do
      much to improve the situation.
      Signed-off-by: default avatarRadim Krčmář <rkrcmar@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      3490e72a
    • Radim Krčmář's avatar
      KVM: x86: add asm_safe wrapper · d9c4c1e7
      Radim Krčmář authored
      commit aabba3c6 upstream.
      
      Move the existing exception handling for inline assembly into a macro
      and switch its return values to X86EMUL type.
      Signed-off-by: default avatarRadim Krčmář <rkrcmar@redhat.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      d9c4c1e7
    • Radim Krčmář's avatar
      KVM: x86: add Align16 instruction flag · 4fa00902
      Radim Krčmář authored
      commit d3fe959f upstream.
      
      Needed for FXSAVE and FXRSTOR.
      Signed-off-by: default avatarRadim Krčmář <rkrcmar@redhat.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      4fa00902
    • David Matlack's avatar
      KVM: x86: flush pending lapic jump label updates on module unload · 1fc673d9
      David Matlack authored
      commit cef84c30 upstream.
      
      KVM's lapic emulation uses static_key_deferred (apic_{hw,sw}_disabled).
      These are implemented with delayed_work structs which can still be
      pending when the KVM module is unloaded. We've seen this cause kernel
      panics when the kvm_intel module is quickly reloaded.
      
      Use the new static_key_deferred_flush() API to flush pending updates on
      module unload.
      Signed-off-by: default avatarDavid Matlack <dmatlack@google.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      1fc673d9
    • David Matlack's avatar
      jump_labels: API for flushing deferred jump label updates · 3d27cd4b
      David Matlack authored
      commit b6416e61 upstream.
      
      Modules that use static_key_deferred need a way to synchronize with
      any delayed work that is still pending when the module is unloaded.
      Introduce static_key_deferred_flush() which flushes any pending
      jump label updates.
      Signed-off-by: default avatarDavid Matlack <dmatlack@google.com>
      Acked-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      3d27cd4b
    • Wanpeng Li's avatar
      KVM: eventfd: fix NULL deref irqbypass consumer · 34a55c9d
      Wanpeng Li authored
      commit 4f3dbdf4 upstream.
      
      Reported syzkaller:
      
          BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
          IP: irq_bypass_unregister_consumer+0x9d/0xb70 [irqbypass]
          PGD 0
      
          Oops: 0002 [#1] SMP
          CPU: 1 PID: 125 Comm: kworker/1:1 Not tainted 4.9.0+ #1
          Workqueue: kvm-irqfd-cleanup irqfd_shutdown [kvm]
          task: ffff9bbe0dfbb900 task.stack: ffffb61802014000
          RIP: 0010:irq_bypass_unregister_consumer+0x9d/0xb70 [irqbypass]
          Call Trace:
           irqfd_shutdown+0x66/0xa0 [kvm]
           process_one_work+0x16b/0x480
           worker_thread+0x4b/0x500
           kthread+0x101/0x140
           ? process_one_work+0x480/0x480
           ? kthread_create_on_node+0x60/0x60
           ret_from_fork+0x25/0x30
          RIP: irq_bypass_unregister_consumer+0x9d/0xb70 [irqbypass] RSP: ffffb61802017e20
          CR2: 0000000000000008
      
      The syzkaller folks reported a NULL pointer dereference that due to
      unregister an consumer which fails registration before. The syzkaller
      creates two VMs w/ an equal eventfd occasionally. So the second VM
      fails to register an irqbypass consumer. It will make irqfd as inactive
      and queue an workqueue work to shutdown irqfd and unregister the irqbypass
      consumer when eventfd is closed. However, the second consumer has been
      initialized though it fails registration. So the token(same as the first
      VM's) is taken to unregister the consumer through the workqueue, the
      consumer of the first VM is found and unregistered, then NULL deref incurred
      in the path of deleting consumer from the consumers list.
      
      This patch fixes it by making irq_bypass_register/unregister_consumer()
      looks for the consumer entry based on consumer pointer itself instead of
      token matching.
      Reported-by: default avatarDmitry Vyukov <dvyukov@google.com>
      Suggested-by: default avatarAlex Williamson <alex.williamson@redhat.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Alex Williamson <alex.williamson@redhat.com>
      Signed-off-by: default avatarWanpeng Li <wanpeng.li@hotmail.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      34a55c9d
    • Paolo Bonzini's avatar
      KVM: x86: fix emulation of "MOV SS, null selector" · 816307c8
      Paolo Bonzini authored
      commit 33ab9110 upstream.
      
      This is CVE-2017-2583.  On Intel this causes a failed vmentry because
      SS's type is neither 3 nor 7 (even though the manual says this check is
      only done for usable SS, and the dmesg splat says that SS is unusable!).
      On AMD it's worse: svm.c is confused and sets CPL to 0 in the vmcb.
      
      The fix fabricates a data segment descriptor when SS is set to a null
      selector, so that CPL and SS.DPL are set correctly in the VMCS/vmcb.
      Furthermore, only allow setting SS to a NULL selector if SS.RPL < 3;
      this in turn ensures CPL < 3 because RPL must be equal to CPL.
      
      Thanks to Andy Lutomirski and Willy Tarreau for help in analyzing
      the bug and deciphering the manuals.
      Reported-by: default avatarXiaohan Zhang <zhangxiaohan1@huawei.com>
      Fixes: 79d5b4c3Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      816307c8
    • Mike Kravetz's avatar
      mm/hugetlb.c: fix reservation race when freeing surplus pages · 1a46e6ec
      Mike Kravetz authored
      commit e5bbc8a6 upstream.
      
      return_unused_surplus_pages() decrements the global reservation count,
      and frees any unused surplus pages that were backing the reservation.
      
      Commit 7848a4bf ("mm/hugetlb.c: add cond_resched_lock() in
      return_unused_surplus_pages()") added a call to cond_resched_lock in the
      loop freeing the pages.
      
      As a result, the hugetlb_lock could be dropped, and someone else could
      use the pages that will be freed in subsequent iterations of the loop.
      This could result in inconsistent global hugetlb page state, application
      api failures (such as mmap) failures or application crashes.
      
      When dropping the lock in return_unused_surplus_pages, make sure that
      the global reservation count (resv_huge_pages) remains sufficiently
      large to prevent someone else from claiming pages about to be freed.
      
      Analyzed by Paul Cassella.
      
      Fixes: 7848a4bf ("mm/hugetlb.c: add cond_resched_lock() in return_unused_surplus_pages()")
      Link: http://lkml.kernel.org/r/1483991767-6879-1-git-send-email-mike.kravetz@oracle.comSigned-off-by: default avatarMike Kravetz <mike.kravetz@oracle.com>
      Reported-by: default avatarPaul Cassella <cassella@cray.com>
      Suggested-by: default avatarMichal Hocko <mhocko@kernel.org>
      Cc: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com>
      Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Aneesh Kumar <aneesh.kumar@linux.vnet.ibm.com>
      Cc: Hillf Danton <hillf.zj@alibaba-inc.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      1a46e6ec
    • Eric Ren's avatar
      ocfs2: fix crash caused by stale lvb with fsdlm plugin · 6bbb8ff3
      Eric Ren authored
      commit e7ee2c08 upstream.
      
      The crash happens rather often when we reset some cluster nodes while
      nodes contend fiercely to do truncate and append.
      
      The crash backtrace is below:
      
         dlm: C21CBDA5E0774F4BA5A9D4F317717495: dlm_recover_grant 1 locks on 971 resources
         dlm: C21CBDA5E0774F4BA5A9D4F317717495: dlm_recover 9 generation 5 done: 4 ms
         ocfs2: Begin replay journal (node 318952601, slot 2) on device (253,18)
         ocfs2: End replay journal (node 318952601, slot 2) on device (253,18)
         ocfs2: Beginning quota recovery on device (253,18) for slot 2
         ocfs2: Finishing quota recovery on device (253,18) for slot 2
         (truncate,30154,1):ocfs2_truncate_file:470 ERROR: bug expression: le64_to_cpu(fe->i_size) != i_size_read(inode)
         (truncate,30154,1):ocfs2_truncate_file:470 ERROR: Inode 290321, inode i_size = 732 != di i_size = 937, i_flags = 0x1
         ------------[ cut here ]------------
         kernel BUG at /usr/src/linux/fs/ocfs2/file.c:470!
         invalid opcode: 0000 [#1] SMP
         Modules linked in: ocfs2_stack_user(OEN) ocfs2(OEN) ocfs2_nodemanager ocfs2_stackglue(OEN) quota_tree dlm(OEN) configfs fuse sd_mod    iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi af_packet iscsi_ibft iscsi_boot_sysfs softdog xfs libcrc32c ppdev parport_pc pcspkr parport      joydev virtio_balloon virtio_net i2c_piix4 acpi_cpufreq button processor ext4 crc16 jbd2 mbcache ata_generic cirrus virtio_blk ata_piix               drm_kms_helper ahci syscopyarea libahci sysfillrect sysimgblt fb_sys_fops ttm floppy libata drm virtio_pci virtio_ring uhci_hcd virtio ehci_hcd       usbcore serio_raw usb_common sg dm_multipath dm_mod scsi_dh_rdac scsi_dh_emc scsi_dh_alua scsi_mod autofs4
         Supported: No, Unsupported modules are loaded
         CPU: 1 PID: 30154 Comm: truncate Tainted: G           OE   N  4.4.21-69-default #1
         Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.8.1-0-g4adadbd-20151112_172657-sheep25 04/01/2014
         task: ffff88004ff6d240 ti: ffff880074e68000 task.ti: ffff880074e68000
         RIP: 0010:[<ffffffffa05c8c30>]  [<ffffffffa05c8c30>] ocfs2_truncate_file+0x640/0x6c0 [ocfs2]
         RSP: 0018:ffff880074e6bd50  EFLAGS: 00010282
         RAX: 0000000000000074 RBX: 000000000000029e RCX: 0000000000000000
         RDX: 0000000000000001 RSI: 0000000000000246 RDI: 0000000000000246
         RBP: ffff880074e6bda8 R08: 000000003675dc7a R09: ffffffff82013414
         R10: 0000000000034c50 R11: 0000000000000000 R12: ffff88003aab3448
         R13: 00000000000002dc R14: 0000000000046e11 R15: 0000000000000020
         FS:  00007f839f965700(0000) GS:ffff88007fc80000(0000) knlGS:0000000000000000
         CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
         CR2: 00007f839f97e000 CR3: 0000000036723000 CR4: 00000000000006e0
         Call Trace:
           ocfs2_setattr+0x698/0xa90 [ocfs2]
           notify_change+0x1ae/0x380
           do_truncate+0x5e/0x90
           do_sys_ftruncate.constprop.11+0x108/0x160
           entry_SYSCALL_64_fastpath+0x12/0x6d
         Code: 24 28 ba d6 01 00 00 48 c7 c6 30 43 62 a0 8b 41 2c 89 44 24 08 48 8b 41 20 48 c7 c1 78 a3 62 a0 48 89 04 24 31 c0 e8 a0 97 f9 ff <0f> 0b 3d 00 fe ff ff 0f 84 ab fd ff ff 83 f8 fc 0f 84 a2 fd ff
         RIP  [<ffffffffa05c8c30>] ocfs2_truncate_file+0x640/0x6c0 [ocfs2]
      
      It's because ocfs2_inode_lock() get us stale LVB in which the i_size is
      not equal to the disk i_size.  We mistakenly trust the LVB because the
      underlaying fsdlm dlm_lock() doesn't set lkb_sbflags with
      DLM_SBF_VALNOTVALID properly for us.  But, why?
      
      The current code tries to downconvert lock without DLM_LKF_VALBLK flag
      to tell o2cb don't update RSB's LVB if it's a PR->NULL conversion, even
      if the lock resource type needs LVB.  This is not the right way for
      fsdlm.
      
      The fsdlm plugin behaves different on DLM_LKF_VALBLK, it depends on
      DLM_LKF_VALBLK to decide if we care about the LVB in the LKB.  If
      DLM_LKF_VALBLK is not set, fsdlm will skip recovering RSB's LVB from
      this lkb and set the right DLM_SBF_VALNOTVALID appropriately when node
      failure happens.
      
      The following diagram briefly illustrates how this crash happens:
      
      RSB1 is inode metadata lock resource with LOCK_TYPE_USES_LVB;
      
      The 1st round:
      
                   Node1                                    Node2
      RSB1: PR
                                                        RSB1(master): NULL->EX
      ocfs2_downconvert_lock(PR->NULL, set_lvb==0)
        ocfs2_dlm_lock(no DLM_LKF_VALBLK)
      
      - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
      
      dlm_lock(no DLM_LKF_VALBLK)
        convert_lock(overwrite lkb->lkb_exflags
                     with no DLM_LKF_VALBLK)
      
      RSB1: NULL                                        RSB1: EX
                                                        reset Node2
      dlm_recover_rsbs()
        recover_lvb()
      
      /* The LVB is not trustable if the node with EX fails and
       * no lock >= PR is left. We should set RSB_VALNOTVALID for RSB1.
       */
      
       if(!(kb_exflags & DLM_LKF_VALBLK)) /* This means we miss the chance to
                 return;                   * to invalid the LVB here.
                                           */
      
      The 2nd round:
      
               Node 1                                Node2
      RSB1(become master from recovery)
      
      ocfs2_setattr()
        ocfs2_inode_lock(NULL->EX)
          /* dlm_lock() return the stale lvb without setting DLM_SBF_VALNOTVALID */
          ocfs2_meta_lvb_is_trustable() return 1 /* so we don't refresh inode from disk */
        ocfs2_truncate_file()
            mlog_bug_on_msg(disk isize != i_size_read(inode))  /* crash! */
      
      The fix is quite straightforward.  We keep to set DLM_LKF_VALBLK flag
      for dlm_lock() if the lock resource type needs LVB and the fsdlm plugin
      is uesed.
      
      Link: http://lkml.kernel.org/r/1481275846-6604-1-git-send-email-zren@suse.comSigned-off-by: default avatarEric Ren <zren@suse.com>
      Reviewed-by: default avatarJoseph Qi <jiangqi903@gmail.com>
      Cc: Mark Fasheh <mfasheh@versity.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Junxiao Bi <junxiao.bi@oracle.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      6bbb8ff3
    • Dan Williams's avatar
      mm: fix devm_memremap_pages crash, use mem_hotplug_{begin, done} · 70429b97
      Dan Williams authored
      commit f931ab47 upstream.
      
      Both arch_add_memory() and arch_remove_memory() expect a single threaded
      context.
      
      For example, arch/x86/mm/init_64.c::kernel_physical_mapping_init() does
      not hold any locks over this check and branch:
      
          if (pgd_val(*pgd)) {
          	pud = (pud_t *)pgd_page_vaddr(*pgd);
          	paddr_last = phys_pud_init(pud, __pa(vaddr),
          				   __pa(vaddr_end),
          				   page_size_mask);
          	continue;
          }
      
          pud = alloc_low_page();
          paddr_last = phys_pud_init(pud, __pa(vaddr), __pa(vaddr_end),
          			   page_size_mask);
      
      The result is that two threads calling devm_memremap_pages()
      simultaneously can end up colliding on pgd initialization.  This leads
      to crash signatures like the following where the loser of the race
      initializes the wrong pgd entry:
      
          BUG: unable to handle kernel paging request at ffff888ebfff0000
          IP: memcpy_erms+0x6/0x10
          PGD 2f8e8fc067 PUD 0 /* <---- Invalid PUD */
          Oops: 0000 [#1] SMP DEBUG_PAGEALLOC
          CPU: 54 PID: 3818 Comm: systemd-udevd Not tainted 4.6.7+ #13
          task: ffff882fac290040 ti: ffff882f887a4000 task.ti: ffff882f887a4000
          RIP: memcpy_erms+0x6/0x10
          [..]
          Call Trace:
            ? pmem_do_bvec+0x205/0x370 [nd_pmem]
            ? blk_queue_enter+0x3a/0x280
            pmem_rw_page+0x38/0x80 [nd_pmem]
            bdev_read_page+0x84/0xb0
      
      Hold the standard memory hotplug mutex over calls to
      arch_{add,remove}_memory().
      
      Fixes: 41e94a85 ("add devm_memremap_pages")
      Link: http://lkml.kernel.org/r/148357647831.9498.12606007370121652979.stgit@dwillia2-desk3.amr.corp.intel.comSigned-off-by: default avatarDan Williams <dan.j.williams@intel.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      70429b97