1. 08 Apr, 2017 1 commit
    • Ilya Dryomov's avatar
      libceph: force GFP_NOIO for socket allocations · 86015377
      Ilya Dryomov authored
      commit 633ee407 upstream.
      
      sock_alloc_inode() allocates socket+inode and socket_wq with
      GFP_KERNEL, which is not allowed on the writeback path:
      
          Workqueue: ceph-msgr con_work [libceph]
          ffff8810871cb018 0000000000000046 0000000000000000 ffff881085d40000
          0000000000012b00 ffff881025cad428 ffff8810871cbfd8 0000000000012b00
          ffff880102fc1000 ffff881085d40000 ffff8810871cb038 ffff8810871cb148
          Call Trace:
          [<ffffffff816dd629>] schedule+0x29/0x70
          [<ffffffff816e066d>] schedule_timeout+0x1bd/0x200
          [<ffffffff81093ffc>] ? ttwu_do_wakeup+0x2c/0x120
          [<ffffffff81094266>] ? ttwu_do_activate.constprop.135+0x66/0x70
          [<ffffffff816deb5f>] wait_for_completion+0xbf/0x180
          [<ffffffff81097cd0>] ? try_to_wake_up+0x390/0x390
          [<ffffffff81086335>] flush_work+0x165/0x250
          [<ffffffff81082940>] ? worker_detach_from_pool+0xd0/0xd0
          [<ffffffffa03b65b1>] xlog_cil_force_lsn+0x81/0x200 [xfs]
          [<ffffffff816d6b42>] ? __slab_free+0xee/0x234
          [<ffffffffa03b4b1d>] _xfs_log_force_lsn+0x4d/0x2c0 [xfs]
          [<ffffffff811adc1e>] ? lookup_page_cgroup_used+0xe/0x30
          [<ffffffffa039a723>] ? xfs_reclaim_inode+0xa3/0x330 [xfs]
          [<ffffffffa03b4dcf>] xfs_log_force_lsn+0x3f/0xf0 [xfs]
          [<ffffffffa039a723>] ? xfs_reclaim_inode+0xa3/0x330 [xfs]
          [<ffffffffa03a62c6>] xfs_iunpin_wait+0xc6/0x1a0 [xfs]
          [<ffffffff810aa250>] ? wake_atomic_t_function+0x40/0x40
          [<ffffffffa039a723>] xfs_reclaim_inode+0xa3/0x330 [xfs]
          [<ffffffffa039ac07>] xfs_reclaim_inodes_ag+0x257/0x3d0 [xfs]
          [<ffffffffa039bb13>] xfs_reclaim_inodes_nr+0x33/0x40 [xfs]
          [<ffffffffa03ab745>] xfs_fs_free_cached_objects+0x15/0x20 [xfs]
          [<ffffffff811c0c18>] super_cache_scan+0x178/0x180
          [<ffffffff8115912e>] shrink_slab_node+0x14e/0x340
          [<ffffffff811afc3b>] ? mem_cgroup_iter+0x16b/0x450
          [<ffffffff8115af70>] shrink_slab+0x100/0x140
          [<ffffffff8115e425>] do_try_to_free_pages+0x335/0x490
          [<ffffffff8115e7f9>] try_to_free_pages+0xb9/0x1f0
          [<ffffffff816d56e4>] ? __alloc_pages_direct_compact+0x69/0x1be
          [<ffffffff81150cba>] __alloc_pages_nodemask+0x69a/0xb40
          [<ffffffff8119743e>] alloc_pages_current+0x9e/0x110
          [<ffffffff811a0ac5>] new_slab+0x2c5/0x390
          [<ffffffff816d71c4>] __slab_alloc+0x33b/0x459
          [<ffffffff815b906d>] ? sock_alloc_inode+0x2d/0xd0
          [<ffffffff8164bda1>] ? inet_sendmsg+0x71/0xc0
          [<ffffffff815b906d>] ? sock_alloc_inode+0x2d/0xd0
          [<ffffffff811a21f2>] kmem_cache_alloc+0x1a2/0x1b0
          [<ffffffff815b906d>] sock_alloc_inode+0x2d/0xd0
          [<ffffffff811d8566>] alloc_inode+0x26/0xa0
          [<ffffffff811da04a>] new_inode_pseudo+0x1a/0x70
          [<ffffffff815b933e>] sock_alloc+0x1e/0x80
          [<ffffffff815ba855>] __sock_create+0x95/0x220
          [<ffffffff815baa04>] sock_create_kern+0x24/0x30
          [<ffffffffa04794d9>] con_work+0xef9/0x2050 [libceph]
          [<ffffffffa04aa9ec>] ? rbd_img_request_submit+0x4c/0x60 [rbd]
          [<ffffffff81084c19>] process_one_work+0x159/0x4f0
          [<ffffffff8108561b>] worker_thread+0x11b/0x530
          [<ffffffff81085500>] ? create_worker+0x1d0/0x1d0
          [<ffffffff8108b6f9>] kthread+0xc9/0xe0
          [<ffffffff8108b630>] ? flush_kthread_worker+0x90/0x90
          [<ffffffff816e1b98>] ret_from_fork+0x58/0x90
          [<ffffffff8108b630>] ? flush_kthread_worker+0x90/0x90
      
      Use memalloc_noio_{save,restore}() to temporarily force GFP_NOIO here.
      
      Link: http://tracker.ceph.com/issues/19309Reported-by: default avatarSergey Jerusalimov <wintchester@gmail.com>
      Signed-off-by: default avatarIlya Dryomov <idryomov@gmail.com>
      Reviewed-by: default avatarJeff Layton <jlayton@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      86015377
  2. 31 Mar, 2017 17 commits
  3. 30 Mar, 2017 22 commits
    • Greg Kroah-Hartman's avatar
      Linux 4.9.19 · c8e13160
      Greg Kroah-Hartman authored
      c8e13160
    • Jiri Slaby's avatar
      crypto: algif_hash - avoid zero-sized array · bc959a40
      Jiri Slaby authored
      commit 62071194 upstream.
      
      With this reproducer:
        struct sockaddr_alg alg = {
                .salg_family = 0x26,
                .salg_type = "hash",
                .salg_feat = 0xf,
                .salg_mask = 0x5,
                .salg_name = "digest_null",
        };
        int sock, sock2;
      
        sock = socket(AF_ALG, SOCK_SEQPACKET, 0);
        bind(sock, (struct sockaddr *)&alg, sizeof(alg));
        sock2 = accept(sock, NULL, NULL);
        setsockopt(sock, SOL_ALG, ALG_SET_KEY, "\x9b\xca", 2);
        accept(sock2, NULL, NULL);
      
      ==== 8< ======== 8< ======== 8< ======== 8< ====
      
      one can immediatelly see an UBSAN warning:
      UBSAN: Undefined behaviour in crypto/algif_hash.c:187:7
      variable length array bound value 0 <= 0
      CPU: 0 PID: 15949 Comm: syz-executor Tainted: G            E      4.4.30-0-default #1
      ...
      Call Trace:
      ...
       [<ffffffff81d598fd>] ? __ubsan_handle_vla_bound_not_positive+0x13d/0x188
       [<ffffffff81d597c0>] ? __ubsan_handle_out_of_bounds+0x1bc/0x1bc
       [<ffffffffa0e2204d>] ? hash_accept+0x5bd/0x7d0 [algif_hash]
       [<ffffffffa0e2293f>] ? hash_accept_nokey+0x3f/0x51 [algif_hash]
       [<ffffffffa0e206b0>] ? hash_accept_parent_nokey+0x4a0/0x4a0 [algif_hash]
       [<ffffffff8235c42b>] ? SyS_accept+0x2b/0x40
      
      It is a correct warning, as hash state is propagated to accept as zero,
      but creating a zero-length variable array is not allowed in C.
      
      Fix this as proposed by Herbert -- do "?: 1" on that site. No sizeof or
      similar happens in the code there, so we just allocate one byte even
      though we do not use the array.
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      Cc: Herbert Xu <herbert@gondor.apana.org.au>
      Cc: "David S. Miller" <davem@davemloft.net> (maintainer:CRYPTO API)
      Reported-by: default avatarSasha Levin <sasha.levin@oracle.com>
      Signed-off-by: default avatarHerbert Xu <herbert@gondor.apana.org.au>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      bc959a40
    • Takashi Iwai's avatar
      fbcon: Fix vc attr at deinit · 3fd37725
      Takashi Iwai authored
      commit 8aac7f34 upstream.
      
      fbcon can deal with vc_hi_font_mask (the upper 256 chars) and adjust
      the vc attrs dynamically when vc_hi_font_mask is changed at
      fbcon_init().  When the vc_hi_font_mask is set, it remaps the attrs in
      the existing console buffer with one bit shift up (for 9 bits), while
      it remaps with one bit shift down (for 8 bits) when the value is
      cleared.  It works fine as long as the font gets updated after fbcon
      was initialized.
      
      However, we hit a bizarre problem when the console is switched to
      another fb driver (typically from vesafb or efifb to drmfb).  At
      switching to the new fb driver, we temporarily rebind the console to
      the dummy console, then rebind to the new driver.  During the
      switching, we leave the modified attrs as is.  Thus, the new fbcon
      takes over the old buffer as if it were to contain 8 bits chars
      (although the attrs are still shifted for 9 bits), and effectively
      this results in the yellow color texts instead of the original white
      color, as found in the bugzilla entry below.
      
      An easy fix for this is to re-adjust the attrs before leaving the
      fbcon at con_deinit callback.  Since the code to adjust the attrs is
      already present in the current fbcon code, in this patch, we simply
      factor out the relevant code, and call it from fbcon_deinit().
      
      Bugzilla: https://bugzilla.suse.com/show_bug.cgi?id=1000619Signed-off-by: default avatarTakashi Iwai <tiwai@suse.de>
      Signed-off-by: default avatarBartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      3fd37725
    • Daniel Vetter's avatar
      drm: reference count event->completion · c75fe789
      Daniel Vetter authored
      commit 24835e44 upstream.
      
      When writing the generic nonblocking commit code I assumed that
      through clever lifetime management I can assure that the completion
      (stored in drm_crtc_commit) only gets freed after it is completed. And
      that worked.
      
      I also wanted to make nonblocking helpers resilient against driver
      bugs, by having timeouts everywhere. And that worked too.
      
      Unfortunately taking boths things together results in oopses :( Well,
      at least sometimes: What seems to happen is that the drm event hangs
      around forever stuck in limbo land. The nonblocking helpers eventually
      time out, move on and release it. Now the bug I tested all this
      against is drivers that just entirely fail to deliver the vblank
      events like they should, and in those cases the event is simply
      leaked. But what seems to happen, at least sometimes, on i915 is that
      the event is set up correctly, but somohow the vblank fails to fire in
      time. Which means the event isn't leaked, it's still there waiting for
      eventually a vblank to fire. That tends to happen when re-enabling the
      pipe, and then the trap springs and the kernel oopses.
      
      The correct fix here is simply to refcount the crtc commit to make
      sure that the event sticks around even for drivers which only
      sometimes fail to deliver vblanks for some arbitrary reasons. Since
      crtc commits are already refcounted that's easy to do.
      
      References: https://bugs.freedesktop.org/show_bug.cgi?id=96781
      Cc: Jim Rees <rees@umich.edu>
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
      Cc: Jani Nikula <jani.nikula@linux.intel.com>
      Reviewed-by: default avatarMaarten Lankhorst <maarten.lankhorst@linux.intel.com>
      Signed-off-by: default avatarDaniel Vetter <daniel.vetter@intel.com>
      Link: http://patchwork.freedesktop.org/patch/msgid/20161221102331.31033-1-daniel.vetter@ffwll.ch
      Cc: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      c75fe789
    • Johannes Berg's avatar
      nl80211: fix dumpit error path RTNL deadlocks · 56769e7a
      Johannes Berg authored
      commit ea90e0dc upstream.
      
      Sowmini pointed out Dmitry's RTNL deadlock report to me, and it turns out
      to be perfectly accurate - there are various error paths that miss unlock
      of the RTNL.
      
      To fix those, change the locking a bit to not be conditional in all those
      nl80211_prepare_*_dump() functions, but make those require the RTNL to
      start with, and fix the buggy error paths. This also let me use sparse
      (by appropriately overriding the rtnl_lock/rtnl_unlock functions) to
      validate the changes.
      Reported-by: default avatarSowmini Varadhan <sowmini.varadhan@oracle.com>
      Reported-by: default avatarDmitry Vyukov <dvyukov@google.com>
      Signed-off-by: default avatarJohannes Berg <johannes.berg@intel.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      56769e7a
    • Marek Szyprowski's avatar
      drm/bridge: analogix dp: Fix runtime PM state on driver bind · 7b3c8b2a
      Marek Szyprowski authored
      commit f0a8b49c upstream.
      
      Analogix_dp_bind() can be called from component framework, which doesn't
      guarantee proper runtime PM state of the device during bind operation,
      so ensure that device is runtime active before doing any register access.
      This ensures that the power domain, to which DP module belongs, is turned
      on. While at it, also fix the unbalanced call to phy_power_on() in
      analogix_dp_bind() function.
      
      This patch solves the following kernel oops on Samsung Exynos5250 Snow
      board:
      
      Unhandled fault: imprecise external abort (0x406) at 0x00000000
      pgd = c0004000
      [00000000] *pgd=00000000
      Internal error: : 406 [#1] PREEMPT SMP ARM
      Modules linked in:
      CPU: 0 PID: 75 Comm: kworker/0:2 Not tainted 4.9.0 #1046
      Hardware name: SAMSUNG EXYNOS (Flattened Device Tree)
      Workqueue: events deferred_probe_work_func
      task: ee272300 task.stack: ee312000
      PC is at analogix_dp_enable_sw_function+0x18/0x2c
      LR is at analogix_dp_init_dp+0x2c/0x50
      ...
      [<c03fcb38>] (analogix_dp_enable_sw_function) from [<c03fa9c4>] (analogix_dp_init_dp+0x2c/0x50)
      [<c03fa9c4>] (analogix_dp_init_dp) from [<c03fab6c>] (analogix_dp_bind+0x184/0x42c)
      [<c03fab6c>] (analogix_dp_bind) from [<c03fdb84>] (component_bind_all+0xf0/0x218)
      [<c03fdb84>] (component_bind_all) from [<c03ed64c>] (exynos_drm_load+0x134/0x200)
      [<c03ed64c>] (exynos_drm_load) from [<c03d5058>] (drm_dev_register+0xa0/0xd0)
      [<c03d5058>] (drm_dev_register) from [<c03d66b8>] (drm_platform_init+0x58/0xb0)
      [<c03d66b8>] (drm_platform_init) from [<c03fe0c4>] (try_to_bring_up_master+0x14c/0x188)
      [<c03fe0c4>] (try_to_bring_up_master) from [<c03fe188>] (component_add+0x88/0x138)
      [<c03fe188>] (component_add) from [<c0403a38>] (platform_drv_probe+0x50/0xb0)
      [<c0403a38>] (platform_drv_probe) from [<c0402470>] (driver_probe_device+0x1f0/0x2a8)
      [<c0402470>] (driver_probe_device) from [<c0400a54>] (bus_for_each_drv+0x44/0x8c)
      [<c0400a54>] (bus_for_each_drv) from [<c04021f8>] (__device_attach+0x9c/0x100)
      [<c04021f8>] (__device_attach) from [<c04018e8>] (bus_probe_device+0x84/0x8c)
      [<c04018e8>] (bus_probe_device) from [<c0401d1c>] (deferred_probe_work_func+0x60/0x8c)
      [<c0401d1c>] (deferred_probe_work_func) from [<c012fc14>] (process_one_work+0x120/0x318)
      [<c012fc14>] (process_one_work) from [<c012fe34>] (process_scheduled_works+0x28/0x38)
      [<c012fe34>] (process_scheduled_works) from [<c0130048>] (worker_thread+0x204/0x4ac)
      [<c0130048>] (worker_thread) from [<c01352c4>] (kthread+0xd8/0xf4)
      [<c01352c4>] (kthread) from [<c0107978>] (ret_from_fork+0x14/0x3c)
      Code: e59035f0 e5935018 f57ff04f e3c55001 (f57ff04e)
      ---[ end trace 3d1d0d87796de344 ]---
      Reviewed-by: default avatarSean Paul <seanpaul@chromium.org>
      Signed-off-by: default avatarMarek Szyprowski <m.szyprowski@samsung.com>
      Signed-off-by: default avatarArchit Taneja <architt@codeaurora.org>
      Link: http://patchwork.freedesktop.org/patch/msgid/1483091866-1088-1-git-send-email-m.szyprowski@samsung.com
      Cc: Javier Martinez Canillas <javier@osg.samsung.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      7b3c8b2a
    • Dave Jiang's avatar
      device-dax: fix pmd/pte fault fallback handling · eae72468
      Dave Jiang authored
      commit 0134ed4f upstream.
      
      Jeff Moyer reports:
      
          With a device dax alignment of 4KB or 2MB, I get sigbus when running
          the attached fio job file for the current kernel (4.11.0-rc1+).  If
          I specify an alignment of 1GB, it works.
      
          I turned on debug output, and saw that it was failing in the huge
          fault code.
      
           dax dax1.0: dax_open
           dax dax1.0: dax_mmap
           dax dax1.0: dax_dev_huge_fault: fio: write (0x7f08f0a00000 -
           dax dax1.0: __dax_dev_pud_fault: phys_to_pgoff(0xffffffffcf60
           dax dax1.0: dax_release
      
          fio config for reproduce:
          [global]
          ioengine=dev-dax
          direct=0
          filename=/dev/dax0.0
          bs=2m
      
          [write]
          rw=write
      
          [read]
          stonewall
          rw=read
      
      The driver fails to fallback when taking a fault that is larger than
      the device alignment, or handling a larger fault when a smaller
      mapping is already established. While we could support larger
      mappings for a device with a smaller alignment, that change is
      too large for the immediate fix. The simplest change is to force
      fallback until the fault size matches the alignment.
      
      Fixes: dee41079 ("/dev/dax, core: file operations and dax-mmap")
      Cc: <stable@vger.kernel.org>
      Reported-by: default avatarJeff Moyer <jmoyer@redhat.com>
      Signed-off-by: default avatarDave Jiang <dave.jiang@intel.com>
      Signed-off-by: default avatarDan Williams <dan.j.williams@intel.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      eae72468
    • Ilya Dryomov's avatar
      libceph: don't set weight to IN when OSD is destroyed · 81ec3dc1
      Ilya Dryomov authored
      commit b581a585 upstream.
      
      Since ceph.git commit 4e28f9e63644 ("osd/OSDMap: clear osd_info,
      osd_xinfo on osd deletion"), weight is set to IN when OSD is deleted.
      This changes the result of applying an incremental for clients, not
      just OSDs.  Because CRUSH computations are obviously affected,
      pre-4e28f9e63644 servers disagree with post-4e28f9e63644 clients on
      object placement, resulting in misdirected requests.
      
      Mirrors ceph.git commit a6009d1039a55e2c77f431662b3d6cc5a8e8e63f.
      
      Fixes: 930c5328 ("libceph: apply new_state before new_up_client on incrementals")
      Link: http://tracker.ceph.com/issues/19122Signed-off-by: default avatarIlya Dryomov <idryomov@gmail.com>
      Reviewed-by: default avatarSage Weil <sage@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      81ec3dc1
    • K. Y. Srinivasan's avatar
      Drivers: hv: vmbus: Don't leak memory when a channel is rescinded · df1fe6c9
      K. Y. Srinivasan authored
      commit 5e030d5c upstream.
      
      When we close a channel that has been rescinded, we will leak memory since
      vmbus_teardown_gpadl() returns an error. Fix this so that we can properly
      cleanup the memory allocated to the ring buffers.
      
      Fixes: ccb61f8a ("Drivers: hv: vmbus: Fix a rescind handling bug")
      Signed-off-by: default avatarK. Y. Srinivasan <kys@microsoft.com>
      Cc: Dexuan Cui <decui@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      df1fe6c9
    • K. Y. Srinivasan's avatar
      Drivers: hv: vmbus: Don't leak channel ids · b1f6b0a5
      K. Y. Srinivasan authored
      commit 9a547602 upstream.
      
      If we cannot allocate memory for the channel, free the relid
      associated with the channel.
      Signed-off-by: default avatarK. Y. Srinivasan <kys@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      b1f6b0a5
    • Alexander Shishkin's avatar
      intel_th: Don't leak module refcount on failure to activate · 3076066b
      Alexander Shishkin authored
      commit e609ccef upstream.
      
      Output 'activation' may fail for the reasons of the output driver,
      for example, if msc's buffer is not allocated. We forget, however,
      to drop the module reference in this case. So each attempt at
      activation in this case leaks a reference, preventing the module
      from ever unloading.
      
      This patch adds the missing module_put() in the activation error
      path.
      Signed-off-by: default avatarAlexander Shishkin <alexander.shishkin@linux.intel.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      3076066b
    • Eric Biggers's avatar
      jbd2: don't leak memory if setting up journal fails · b176a6ee
      Eric Biggers authored
      commit cd9cb405 upstream.
      
      In journal_init_common(), if we failed to allocate the j_wbuf array, or
      if we failed to create the buffer_head for the journal superblock, we
      leaked the memory allocated for the revocation tables.  Fix this.
      
      Fixes: f0c9fd54Signed-off-by: default avatarEric Biggers <ebiggers@google.com>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      Reviewed-by: default avatarJan Kara <jack@suse.cz>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      b176a6ee
    • Dmitry Torokhov's avatar
      auxdisplay: img-ascii-lcd: add missing sentinel entry in img_ascii_lcd_matches · 90f39ad2
      Dmitry Torokhov authored
      commit abda288b upstream.
      
      The OF device table must be terminated, otherwise we'll be walking past it
      and into areas unknown.
      
      Fixes: 0cad855f ("auxdisplay: img-ascii-lcd: driver for simple ASCII...")
      Signed-off-by: default avatarDmitry Torokhov <dmitry.torokhov@gmail.com>
      Tested-by: default avatarFengguang Wu <fengguang.wu@intel.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      90f39ad2
    • Alex Deucher's avatar
      drm/amdgpu: reinstate oland workaround for sclk · 9740abe0
      Alex Deucher authored
      commit e11ddff6 upstream.
      
      Higher sclks seem to be unstable on some boards.
      
      bug: https://bugs.freedesktop.org/show_bug.cgi?id=100222Signed-off-by: default avatarAlex Deucher <alexander.deucher@amd.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      9740abe0
    • Ming Lei's avatar
      blk-mq: don't complete un-started request in timeout handler · 21d17f1b
      Ming Lei authored
      commit 95a49603 upstream.
      
      When iterating busy requests in timeout handler,
      if the STARTED flag of one request isn't set, that means
      the request is being processed in block layer or driver, and
      isn't submitted to hardware yet.
      
      In current implementation of blk_mq_check_expired(),
      if the request queue becomes dying, un-started requests are
      handled as being completed/freed immediately. This way is
      wrong, and can cause rq corruption or double allocation[1][2],
      when doing I/O and removing&resetting NVMe device at the sametime.
      
      This patch fixes several issues reported by Yi Zhang.
      
      [1]. oops log 1
      [  581.789754] ------------[ cut here ]------------
      [  581.789758] kernel BUG at block/blk-mq.c:374!
      [  581.789760] invalid opcode: 0000 [#1] SMP
      [  581.789761] Modules linked in: vfat fat ipmi_ssif intel_rapl sb_edac
      edac_core x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm nvme
      irqbypass crct10dif_pclmul nvme_core crc32_pclmul ghash_clmulni_intel
      intel_cstate ipmi_si mei_me ipmi_devintf intel_uncore sg ipmi_msghandler
      intel_rapl_perf iTCO_wdt mei iTCO_vendor_support mxm_wmi lpc_ich dcdbas shpchp
      pcspkr acpi_power_meter wmi nfsd auth_rpcgss nfs_acl lockd dm_multipath grace
      sunrpc ip_tables xfs libcrc32c sd_mod mgag200 i2c_algo_bit drm_kms_helper
      syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm ahci libahci
      crc32c_intel tg3 libata megaraid_sas i2c_core ptp fjes pps_core dm_mirror
      dm_region_hash dm_log dm_mod
      [  581.789796] CPU: 1 PID: 1617 Comm: kworker/1:1H Not tainted 4.10.0.bz1420297+ #4
      [  581.789797] Hardware name: Dell Inc. PowerEdge R730xd/072T6D, BIOS 2.2.5 09/06/2016
      [  581.789804] Workqueue: kblockd blk_mq_timeout_work
      [  581.789806] task: ffff8804721c8000 task.stack: ffffc90006ee4000
      [  581.789809] RIP: 0010:blk_mq_end_request+0x58/0x70
      [  581.789810] RSP: 0018:ffffc90006ee7d50 EFLAGS: 00010202
      [  581.789811] RAX: 0000000000000001 RBX: ffff8802e4195340 RCX: ffff88028e2f4b88
      [  581.789812] RDX: 0000000000001000 RSI: 0000000000001000 RDI: 0000000000000000
      [  581.789813] RBP: ffffc90006ee7d60 R08: 0000000000000003 R09: ffff88028e2f4b00
      [  581.789814] R10: 0000000000001000 R11: 0000000000000001 R12: 00000000fffffffb
      [  581.789815] R13: ffff88042abe5780 R14: 000000000000002d R15: ffff88046fbdff80
      [  581.789817] FS:  0000000000000000(0000) GS:ffff88047fc00000(0000) knlGS:0000000000000000
      [  581.789818] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  581.789819] CR2: 00007f64f403a008 CR3: 000000014d078000 CR4: 00000000001406e0
      [  581.789820] Call Trace:
      [  581.789825]  blk_mq_check_expired+0x76/0x80
      [  581.789828]  bt_iter+0x45/0x50
      [  581.789830]  blk_mq_queue_tag_busy_iter+0xdd/0x1f0
      [  581.789832]  ? blk_mq_rq_timed_out+0x70/0x70
      [  581.789833]  ? blk_mq_rq_timed_out+0x70/0x70
      [  581.789840]  ? __switch_to+0x140/0x450
      [  581.789841]  blk_mq_timeout_work+0x88/0x170
      [  581.789845]  process_one_work+0x165/0x410
      [  581.789847]  worker_thread+0x137/0x4c0
      [  581.789851]  kthread+0x101/0x140
      [  581.789853]  ? rescuer_thread+0x3b0/0x3b0
      [  581.789855]  ? kthread_park+0x90/0x90
      [  581.789860]  ret_from_fork+0x2c/0x40
      [  581.789861] Code: 48 85 c0 74 0d 44 89 e6 48 89 df ff d0 5b 41 5c 5d c3 48
      8b bb 70 01 00 00 48 85 ff 75 0f 48 89 df e8 7d f0 ff ff 5b 41 5c 5d c3 <0f>
      0b e8 71 f0 ff ff 90 eb e9 0f 1f 40 00 66 2e 0f 1f 84 00 00
      [  581.789882] RIP: blk_mq_end_request+0x58/0x70 RSP: ffffc90006ee7d50
      [  581.789889] ---[ end trace bcaf03d9a14a0a70 ]---
      
      [2]. oops log2
      [ 6984.857362] BUG: unable to handle kernel NULL pointer dereference at 0000000000000010
      [ 6984.857372] IP: nvme_queue_rq+0x6e6/0x8cd [nvme]
      [ 6984.857373] PGD 0
      [ 6984.857374]
      [ 6984.857376] Oops: 0000 [#1] SMP
      [ 6984.857379] Modules linked in: ipmi_ssif vfat fat intel_rapl sb_edac
      edac_core x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm
      irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel ipmi_si iTCO_wdt
      iTCO_vendor_support mxm_wmi ipmi_devintf intel_cstate sg dcdbas intel_uncore
      mei_me intel_rapl_perf mei pcspkr lpc_ich ipmi_msghandler shpchp
      acpi_power_meter wmi nfsd auth_rpcgss dm_multipath nfs_acl lockd grace sunrpc
      ip_tables xfs libcrc32c sd_mod mgag200 i2c_algo_bit drm_kms_helper syscopyarea
      sysfillrect crc32c_intel sysimgblt fb_sys_fops ttm nvme drm nvme_core ahci
      libahci i2c_core tg3 libata ptp megaraid_sas pps_core fjes dm_mirror
      dm_region_hash dm_log dm_mod
      [ 6984.857416] CPU: 7 PID: 1635 Comm: kworker/7:1H Not tainted
      4.10.0-2.el7.bz1420297.x86_64 #1
      [ 6984.857417] Hardware name: Dell Inc. PowerEdge R730xd/072T6D, BIOS 2.2.5 09/06/2016
      [ 6984.857427] Workqueue: kblockd blk_mq_run_work_fn
      [ 6984.857429] task: ffff880476e3da00 task.stack: ffffc90002e90000
      [ 6984.857432] RIP: 0010:nvme_queue_rq+0x6e6/0x8cd [nvme]
      [ 6984.857433] RSP: 0018:ffffc90002e93c50 EFLAGS: 00010246
      [ 6984.857434] RAX: 0000000000000000 RBX: ffff880275646600 RCX: 0000000000001000
      [ 6984.857435] RDX: 0000000000000fff RSI: 00000002fba2a000 RDI: ffff8804734e6950
      [ 6984.857436] RBP: ffffc90002e93d30 R08: 0000000000002000 R09: 0000000000001000
      [ 6984.857437] R10: 0000000000001000 R11: 0000000000000000 R12: ffff8804741d8000
      [ 6984.857438] R13: 0000000000000040 R14: ffff880475649f80 R15: ffff8804734e6780
      [ 6984.857439] FS:  0000000000000000(0000) GS:ffff88047fcc0000(0000) knlGS:0000000000000000
      [ 6984.857440] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [ 6984.857442] CR2: 0000000000000010 CR3: 0000000001c09000 CR4: 00000000001406e0
      [ 6984.857443] Call Trace:
      [ 6984.857451]  ? mempool_free+0x2b/0x80
      [ 6984.857455]  ? bio_free+0x4e/0x60
      [ 6984.857459]  blk_mq_dispatch_rq_list+0xf5/0x230
      [ 6984.857462]  blk_mq_process_rq_list+0x133/0x170
      [ 6984.857465]  __blk_mq_run_hw_queue+0x8c/0xa0
      [ 6984.857467]  blk_mq_run_work_fn+0x12/0x20
      [ 6984.857473]  process_one_work+0x165/0x410
      [ 6984.857475]  worker_thread+0x137/0x4c0
      [ 6984.857478]  kthread+0x101/0x140
      [ 6984.857480]  ? rescuer_thread+0x3b0/0x3b0
      [ 6984.857481]  ? kthread_park+0x90/0x90
      [ 6984.857489]  ret_from_fork+0x2c/0x40
      [ 6984.857490] Code: 8b bd 70 ff ff ff 89 95 50 ff ff ff 89 8d 58 ff ff ff 44
      89 95 60 ff ff ff e8 b7 dd 12 e1 8b 95 50 ff ff ff 48 89 85 68 ff ff ff <4c>
      8b 48 10 44 8b 58 18 8b 8d 58 ff ff ff 44 8b 95 60 ff ff ff
      [ 6984.857511] RIP: nvme_queue_rq+0x6e6/0x8cd [nvme] RSP: ffffc90002e93c50
      [ 6984.857512] CR2: 0000000000000010
      [ 6984.895359] ---[ end trace 2d7ceb528432bf83 ]---
      Reported-by: default avatarYi Zhang <yizhan@redhat.com>
      Tested-by: default avatarYi Zhang <yizhan@redhat.com>
      Reviewed-by: default avatarBart Van Assche <bart.vanassche@sandisk.com>
      Reviewed-by: default avatarHannes Reinecke <hare@suse.com>
      Signed-off-by: default avatarMing Lei <tom.leiming@gmail.com>
      Signed-off-by: default avatarJens Axboe <axboe@fb.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      21d17f1b
    • Tejun Heo's avatar
      cgroup, net_cls: iterate the fds of only the tasks which are being migrated · 62f6341c
      Tejun Heo authored
      commit a05d4fd9 upstream.
      
      The net_cls controller controls the classid field of each socket which
      is associated with the cgroup.  Because the classid is per-socket
      attribute, when a task migrates to another cgroup or the configured
      classid of the cgroup changes, the controller needs to walk all
      sockets and update the classid value, which was implemented by
      3b13758f ("cgroups: Allow dynamically changing net_classid").
      
      While the approach is not scalable, migrating tasks which have a lot
      of fds attached to them is rare and the cost is born by the ones
      initiating the operations.  However, for simplicity, both the
      migration and classid config change paths call update_classid() which
      scans all fds of all tasks in the target css.  This is an overkill for
      the migration path which only needs to cover a much smaller subset of
      tasks which are actually getting migrated in.
      
      On cgroup v1, this can lead to unexpected scalability issues when one
      tries to migrate a task or process into a net_cls cgroup which already
      contains a lot of fds.  Even if the migration traget doesn't have many
      to get scanned, update_classid() ends up scanning all fds in the
      target cgroup which can be extremely numerous.
      
      Unfortunately, on cgroup v2 which doesn't use net_cls, the problem is
      even worse.  Before bfc2cf6f ("cgroup: call subsys->*attach() only
      for subsystems which are actually affected by migration"), cgroup core
      would call the ->css_attach callback even for controllers which don't
      see actual migration to a different css.
      
      As net_cls is always disabled but still mounted on cgroup v2, whenever
      a process is migrated on the cgroup v2 hierarchy, net_cls sees
      identity migration from root to root and cgroup core used to call
      ->css_attach callback for those.  The net_cls ->css_attach ends up
      calling update_classid() on the root net_cls css to which all
      processes on the system belong to as the controller isn't used.  This
      makes any cgroup v2 migration O(total_number_of_fds_on_the_system)
      which is horrible and easily leads to noticeable stalls triggering RCU
      stall warnings and so on.
      
      The worst symptom is already fixed in upstream by bfc2cf6f
      ("cgroup: call subsys->*attach() only for subsystems which are
      actually affected by migration"); however, backporting that commit is
      too invasive and we want to avoid other cases too.
      
      This patch updates net_cls's cgrp_attach() to iterate fds of only the
      processes which are actually getting migrated.  This removes the
      surprising migration cost which is dependent on the total number of
      fds in the target cgroup.  As this leaves write_classid() the only
      user of update_classid(), open-code the helper into write_classid().
      Reported-by: default avatarDavid Goode <dgoode@fb.com>
      Fixes: 3b13758f ("cgroups: Allow dynamically changing net_classid")
      Cc: Nina Schiff <ninasc@fb.com>
      Cc: David S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      62f6341c
    • Viresh Kumar's avatar
      cpufreq: Restore policy min/max limits on CPU online · f5650846
      Viresh Kumar authored
      commit ff010472 upstream.
      
      On CPU online the cpufreq core restores the previous governor (or
      the previous "policy" setting for ->setpolicy drivers), but it does
      not restore the min/max limits at the same time, which is confusing,
      inconsistent and real pain for users who set the limits and then
      suspend/resume the system (using full suspend), in which case the
      limits are reset on all CPUs except for the boot one.
      
      Fix this by making cpufreq_online() restore the limits when an inactive
      policy is brought online.
      
      The commit log and patch are inspired from Rafael's earlier work.
      Reported-by: default avatarRafael J. Wysocki <rafael.j.wysocki@intel.com>
      Signed-off-by: default avatarViresh Kumar <viresh.kumar@linaro.org>
      Signed-off-by: default avatarRafael J. Wysocki <rafael.j.wysocki@intel.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f5650846
    • Neeraj Upadhyay's avatar
      arm64: kaslr: Fix up the kernel image alignment · b9ed800f
      Neeraj Upadhyay authored
      commit afd0e5a8 upstream.
      
      If kernel image extends across alignment boundary, existing
      code increases the KASLR offset by size of kernel image. The
      offset is masked after resizing. There are cases, where after
      masking, we may still have kernel image extending across
      boundary. This eventually results in only 2MB block getting
      mapped while creating the page tables. This results in data aborts
      while accessing unmapped regions during second relocation (with
      kaslr offset) in __primary_switch. To fix this problem, round up the
      kernel image size, by swapper block size, before adding it for
      correction.
      
      For example consider below case, where kernel image still crosses
      1GB alignment boundary, after masking the offset, which is fixed
      by rounding up kernel image size.
      
      SWAPPER_TABLE_SHIFT = 30
      Swapper using section maps with section size 2MB.
      CONFIG_PGTABLE_LEVELS = 3
      VA_BITS = 39
      
      _text  : 0xffffff8008080000
      _end   : 0xffffff800aa1b000
      offset : 0x1f35600000
      mask = ((1UL << (VA_BITS - 2)) - 1) & ~(SZ_2M - 1)
      
      (_text + offset) >> SWAPPER_TABLE_SHIFT = 0x3fffffe7c
      (_end + offset) >> SWAPPER_TABLE_SHIFT  = 0x3fffffe7d
      
      offset after existing correction (before mask) = 0x1f37f9b000
      (_text + offset) >> SWAPPER_TABLE_SHIFT = 0x3fffffe7d
      (_end + offset) >> SWAPPER_TABLE_SHIFT  = 0x3fffffe7d
      
      offset (after mask) = 0x1f37e00000
      (_text + offset) >> SWAPPER_TABLE_SHIFT = 0x3fffffe7c
      (_end + offset) >> SWAPPER_TABLE_SHIFT  = 0x3fffffe7d
      
      new offset w/ rounding up = 0x1f38000000
      (_text + offset) >> SWAPPER_TABLE_SHIFT = 0x3fffffe7d
      (_end + offset) >> SWAPPER_TABLE_SHIFT  = 0x3fffffe7d
      
      Fixes: f80fb3a3 ("arm64: add support for kernel ASLR")
      Reviewed-by: default avatarArd Biesheuvel <ard.biesheuvel@linaro.org>
      Signed-off-by: default avatarNeeraj Upadhyay <neeraju@codeaurora.org>
      Signed-off-by: default avatarSrinivas Ramana <sramana@codeaurora.org>
      Signed-off-by: default avatarWill Deacon <will.deacon@arm.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      b9ed800f
    • Nicolas Ferre's avatar
      ARM: at91: pm: cpu_idle: switch DDR to power-down mode · 2ab97521
      Nicolas Ferre authored
      commit 60b89f19 upstream.
      
      On some DDR controllers, compatible with the sama5d3 one,
      the sequence to enter/exit/re-enter the self-refresh mode adds
      more constrains than what is currently written in the at91_idle
      driver. An actual access to the DDR chip is needed between exit
      and re-enter of this mode which is somehow difficult to implement.
      This sequence can completely hang the SoC. It is particularly
      experienced on parts which embed a L2 cache if the code run
      between IDLE calls fits in it...
      
      Moreover, as the intention is to enter and exit pretty rapidly
      from IDLE, the power-down mode is a good candidate.
      
      So now we use power-down instead of self-refresh. As we can
      simplify the code for sama5d3 compatible DDR controllers,
      we instantiate a new sama5d3_ddr_standby() function.
      Signed-off-by: default avatarNicolas Ferre <nicolas.ferre@microchip.com>
      Fixes: 017b5522 ("ARM: at91: Add new binding for sama5d3-ddramc")
      Signed-off-by: default avatarAlexandre Belloni <alexandre.belloni@free-electrons.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      2ab97521
    • Romain Izard's avatar
      Revert "ARM: at91/dt: sama5d2: Use new compatible for ohci node" · ca5477ad
      Romain Izard authored
      commit 9e10889a upstream.
      
      This reverts commit cab43282 ("ARM: at91/dt: sama5d2: Use new
      compatible for ohci node")
      
      It depends from commit 7150bc9b ("usb: ohci-at91: Forcibly suspend
      ports while USB suspend") which was reverted and implemented
      differently. With the new implementation, the compatible string must
      remain the same.
      
      The compatible string introduced by this commit has been used in the
      default SAMA5D2 dtsi starting from Linux 4.8. As it has never been
      working correctly in an official release, removing it should not be
      breaking the stability rules.
      
      Fixes: cab43282 ("ARM: at91/dt: sama5d2: Use new compatible for ohci node")
      Signed-off-by: default avatarRomain Izard <romain.izard.pro@gmail.com>
      Signed-off-by: default avatarAlexandre Belloni <alexandre.belloni@free-electrons.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      ca5477ad
    • Koos Vriezen's avatar
      iommu/vt-d: Fix NULL pointer dereference in device_to_iommu · 352c0214
      Koos Vriezen authored
      commit 5003ae1e upstream.
      
      The function device_to_iommu() in the Intel VT-d driver
      lacks a NULL-ptr check, resulting in this oops at boot on
      some platforms:
      
       BUG: unable to handle kernel NULL pointer dereference at 00000000000007ab
       IP: [<ffffffff8132234a>] device_to_iommu+0x11a/0x1a0
       PGD 0
      
       [...]
      
       Call Trace:
         ? find_or_alloc_domain.constprop.29+0x1a/0x300
         ? dw_dma_probe+0x561/0x580 [dw_dmac_core]
         ? __get_valid_domain_for_dev+0x39/0x120
         ? __intel_map_single+0x138/0x180
         ? intel_alloc_coherent+0xb6/0x120
         ? sst_hsw_dsp_init+0x173/0x420 [snd_soc_sst_haswell_pcm]
         ? mutex_lock+0x9/0x30
         ? kernfs_add_one+0xdb/0x130
         ? devres_add+0x19/0x60
         ? hsw_pcm_dev_probe+0x46/0xd0 [snd_soc_sst_haswell_pcm]
         ? platform_drv_probe+0x30/0x90
         ? driver_probe_device+0x1ed/0x2b0
         ? __driver_attach+0x8f/0xa0
         ? driver_probe_device+0x2b0/0x2b0
         ? bus_for_each_dev+0x55/0x90
         ? bus_add_driver+0x110/0x210
         ? 0xffffffffa11ea000
         ? driver_register+0x52/0xc0
         ? 0xffffffffa11ea000
         ? do_one_initcall+0x32/0x130
         ? free_vmap_area_noflush+0x37/0x70
         ? kmem_cache_alloc+0x88/0xd0
         ? do_init_module+0x51/0x1c4
         ? load_module+0x1ee9/0x2430
         ? show_taint+0x20/0x20
         ? kernel_read_file+0xfd/0x190
         ? SyS_finit_module+0xa3/0xb0
         ? do_syscall_64+0x4a/0xb0
         ? entry_SYSCALL64_slow_path+0x25/0x25
       Code: 78 ff ff ff 4d 85 c0 74 ee 49 8b 5a 10 0f b6 9b e0 00 00 00 41 38 98 e0 00 00 00 77 da 0f b6 eb 49 39 a8 88 00 00 00 72 ce eb 8f <41> f6 82 ab 07 00 00 04 0f 85 76 ff ff ff 0f b6 4d 08 88 0e 49
       RIP  [<ffffffff8132234a>] device_to_iommu+0x11a/0x1a0
        RSP <ffffc90001457a78>
       CR2: 00000000000007ab
       ---[ end trace 16f974b6d58d0aad ]---
      
      Add the missing pointer check.
      
      Fixes: 1c387188 ("iommu/vt-d: Fix IOMMU lookup for SR-IOV Virtual Functions")
      Signed-off-by: default avatarKoos Vriezen <koos.vriezen@gmail.com>
      Signed-off-by: default avatarJoerg Roedel <jroedel@suse.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      352c0214
    • Ankur Arora's avatar
      xen/acpi: upload PM state from init-domain to Xen · bc63212d
      Ankur Arora authored
      commit 1914f0cd upstream.
      
      This was broken in commit cd979883 ("xen/acpi-processor:
      fix enabling interrupts on syscore_resume"). do_suspend (from
      xen/manage.c) and thus xen_resume_notifier never get called on
      the initial-domain at resume (it is if running as guest.)
      
      The rationale for the breaking change was that upload_pm_data()
      potentially does blocking work in syscore_resume(). This patch
      addresses the original issue by scheduling upload_pm_data() to
      execute in workqueue context.
      
      Cc: Stanislaw Gruszka <sgruszka@redhat.com>
      Based-on-patch-by: default avatarKonrad Wilk <konrad.wilk@oracle.com>
      Reviewed-by: default avatarKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Reviewed-by: default avatarStanislaw Gruszka <sgruszka@redhat.com>
      Signed-off-by: default avatarAnkur Arora <ankur.a.arora@oracle.com>
      Signed-off-by: default avatarBoris Ostrovsky <boris.ostrovsky@oracle.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      bc63212d