1. 31 Mar, 2017 17 commits
  2. 30 Mar, 2017 23 commits
    • Linux 4.10.7 · 55db23d3
      Greg Kroah-Hartman authored
    • crypto: algif_hash - avoid zero-sized array · 0dad3de8
      Jiri Slaby authored
      commit 62071194 upstream.
      
      With this reproducer:
        struct sockaddr_alg alg = {
                .salg_family = 0x26,
                .salg_type = "hash",
                .salg_feat = 0xf,
                .salg_mask = 0x5,
                .salg_name = "digest_null",
        };
        int sock, sock2;
      
        sock = socket(AF_ALG, SOCK_SEQPACKET, 0);
        bind(sock, (struct sockaddr *)&alg, sizeof(alg));
        sock2 = accept(sock, NULL, NULL);
        setsockopt(sock, SOL_ALG, ALG_SET_KEY, "\x9b\xca", 2);
        accept(sock2, NULL, NULL);
      
      ==== 8< ======== 8< ======== 8< ======== 8< ====
      
      one can immediately see an UBSAN warning:
      UBSAN: Undefined behaviour in crypto/algif_hash.c:187:7
      variable length array bound value 0 <= 0
      CPU: 0 PID: 15949 Comm: syz-executor Tainted: G            E      4.4.30-0-default #1
      ...
      Call Trace:
      ...
       [<ffffffff81d598fd>] ? __ubsan_handle_vla_bound_not_positive+0x13d/0x188
       [<ffffffff81d597c0>] ? __ubsan_handle_out_of_bounds+0x1bc/0x1bc
       [<ffffffffa0e2204d>] ? hash_accept+0x5bd/0x7d0 [algif_hash]
       [<ffffffffa0e2293f>] ? hash_accept_nokey+0x3f/0x51 [algif_hash]
       [<ffffffffa0e206b0>] ? hash_accept_parent_nokey+0x4a0/0x4a0 [algif_hash]
       [<ffffffff8235c42b>] ? SyS_accept+0x2b/0x40
      
      It is a correct warning, as hash state is propagated to accept as zero,
      but creating a zero-length variable array is not allowed in C.
      
      Fix this as proposed by Herbert -- do "?: 1" on that site. No sizeof or
      similar happens in the code there, so we just allocate one byte even
      though we do not use the array.
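
As a rough illustration of the "?: 1" idiom (not the actual algif_hash code), here is a minimal user-space sketch. The helper name state_buf_size() is hypothetical, and the sizeof is there only so the effect can be observed; the commit notes that the real call site takes no sizeof.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not kernel code: reserve a scratch buffer whose
 * length comes from a runtime value that may legitimately be zero.
 */
size_t state_buf_size(size_t statesize)
{
	/*
	 * "statesize ? statesize : 1" is the portable spelling of the GNU C
	 * shorthand "statesize ?: 1": a zero bound is bumped to one.
	 */
	char state[statesize ? statesize : 1];

	/* The array bound is now always positive. */
	assert(sizeof(state) >= 1);
	memset(state, 0, sizeof(state));
	return sizeof(state);	/* sizeof on a VLA is evaluated at run time */
}
```

Calling state_buf_size(0) reserves one byte instead of declaring a zero-length array, which is the undefined behaviour UBSAN flagged.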
      
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
      Cc: Herbert Xu <herbert@gondor.apana.org.au>
      Cc: "David S. Miller" <davem@davemloft.net> (maintainer:CRYPTO API)
      Reported-by: Sasha Levin <sasha.levin@oracle.com>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • fbcon: Fix vc attr at deinit · f9955dca
      Takashi Iwai authored
      commit 8aac7f34 upstream.
      
      fbcon can deal with vc_hi_font_mask (the upper 256 chars) and adjust
      the vc attrs dynamically when vc_hi_font_mask is changed at
      fbcon_init().  When the vc_hi_font_mask is set, it remaps the attrs in
      the existing console buffer with one bit shift up (for 9 bits), while
      it remaps with one bit shift down (for 8 bits) when the value is
      cleared.  It works fine as long as the font gets updated after fbcon
      was initialized.
      
      However, we hit a bizarre problem when the console is switched to
      another fb driver (typically from vesafb or efifb to drmfb).  When
      switching to the new fb driver, we temporarily rebind the console to
      the dummy console, then rebind to the new driver.  During the
      switch, we leave the modified attrs as is.  Thus, the new fbcon
      takes over the old buffer as if it contained 8-bit chars
      (although the attrs are still shifted for 9 bits), and effectively
      this results in yellow text instead of the original white
      text, as found in the bugzilla entry below.
      
      An easy fix for this is to re-adjust the attrs before leaving the
      fbcon at con_deinit callback.  Since the code to adjust the attrs is
      already present in the current fbcon code, in this patch, we simply
      factor out the relevant code, and call it from fbcon_deinit().
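
The one-bit attribute shift described above can be modelled in user space as follows. This is a simplified sketch, not the fbcon code: the function names are hypothetical and the masks are assumptions chosen to show a 16-bit cell with the character in the low byte and the attributes above it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Shift the attribute bits up by one to make room for a 9th character bit. */
uint16_t attr_shift_up(uint16_t c)
{
	return (uint16_t)(((c & 0xff00) << 1) | (c & 0xff));
}

/* Undo the shift so the cell is laid out for 8-bit characters again. */
uint16_t attr_shift_down(uint16_t c)
{
	return (uint16_t)(((c & 0xfe00) >> 1) | (c & 0xff));
}

/* Hypothetical stand-in for re-adjusting a whole console buffer at deinit. */
void restore_8bit_attrs(uint16_t *buf, size_t count)
{
	assert(buf != NULL || count == 0);
	for (size_t i = 0; i < count; i++)
		buf[i] = attr_shift_down(buf[i]);
}
```

If a buffer is left in the shifted layout and then read as 8-bit cells, every attribute appears doubled, which is consistent with the colour change reported in the bug.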
      
      Bugzilla: https://bugzilla.suse.com/show_bug.cgi?id=1000619
      Signed-off-by: Takashi Iwai <tiwai@suse.de>
      Signed-off-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • drm: reference count event->completion · 2a324104
      Daniel Vetter authored
      commit 24835e44 upstream.
      
      When writing the generic nonblocking commit code I assumed that
      through clever lifetime management I can assure that the completion
      (stored in drm_crtc_commit) only gets freed after it is completed. And
      that worked.
      
      I also wanted to make nonblocking helpers resilient against driver
      bugs, by having timeouts everywhere. And that worked too.
      
      Unfortunately taking both things together results in oopses :( Well,
      at least sometimes: What seems to happen is that the drm event hangs
      around forever stuck in limbo land. The nonblocking helpers eventually
      time out, move on and release it. Now the bug I tested all this
      against is drivers that just entirely fail to deliver the vblank
      events like they should, and in those cases the event is simply
      leaked. But what seems to happen, at least sometimes, on i915 is that
      the event is set up correctly, but somehow the vblank fails to fire in
      time. Which means the event isn't leaked, it's still there waiting for
      a vblank to eventually fire. That tends to happen when re-enabling the
      pipe, and then the trap springs and the kernel oopses.
      
      The correct fix here is simply to refcount the crtc commit to make
      sure that the event sticks around even for drivers which only
      sometimes fail to deliver vblanks for some arbitrary reasons. Since
      crtc commits are already refcounted that's easy to do.
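
The lifetime rule can be sketched in plain C as below. This is an illustration only: struct commit, commit_get() and commit_put() are hypothetical names, and a plain int stands in for the atomic reference count real kernel code would use.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a refcounted commit that embeds a completion. */
struct commit {
	int refcount;	/* plain int for brevity; real code needs an atomic count */
	bool freed;
};

/* Take an extra reference, e.g. on behalf of a pending event. */
void commit_get(struct commit *c)
{
	assert(!c->freed);
	c->refcount++;
}

/* Drop one reference; the object is released only when the last holder lets go. */
void commit_put(struct commit *c)
{
	assert(c->refcount > 0);
	if (--c->refcount == 0)
		c->freed = true;	/* stands in for freeing the memory */
}
```

With the event holding its own reference, a helper that times out and drops its reference no longer frees the completion out from under a vblank that fires late.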
      
      References: https://bugs.freedesktop.org/show_bug.cgi?id=96781
      Cc: Jim Rees <rees@umich.edu>
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
      Cc: Jani Nikula <jani.nikula@linux.intel.com>
      Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
      Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
      Link: http://patchwork.freedesktop.org/patch/msgid/20161221102331.31033-1-daniel.vetter@ffwll.ch
      Cc: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • xen: do not re-use pirq number cached in pci device msi msg data · 59758483
      Dan Streetman authored
      commit c74fd80f upstream.
      
      Revert the main part of commit:
      af42b8d1 ("xen: fix MSI setup and teardown for PV on HVM guests")
      
      That commit introduced reading the pci device's msi message data to see
      if a pirq was previously configured for the device's msi/msix, and re-use
      that pirq.  At the time, that was the correct behavior.  However, a
      later change to Qemu caused it to call into the Xen hypervisor to unmap
      all pirqs for a pci device, when the pci device disables its MSI/MSIX
      vectors; specifically the Qemu commit:
      c976437c7dba9c7444fb41df45468968aaa326ad
      ("qemu-xen: free all the pirqs for msi/msix when driver unload")
      
      Once Qemu added this pirq unmapping, it was no longer correct for the
      kernel to re-use the pirq number cached in the pci device msi message
      data.  All Qemu releases since 2.1.0 contain the patch that unmaps the
      pirqs when the pci device disables its MSI/MSIX vectors.
      
      This bug is causing failures to initialize multiple NVMe controllers
      under Xen, because the NVMe driver sets up a single MSIX vector for
      each controller (concurrently), and then after using that to talk to
      the controller for some configuration data, it disables the single MSIX
      vector and re-configures all the MSIX vectors it needs.  So the MSIX
      setup code tries to re-use the cached pirq from the first vector
      for each controller, but the hypervisor has already given away that
      pirq to another controller, and its initialization fails.
      
      This is discussed in more detail at:
      https://lists.xen.org/archives/html/xen-devel/2017-01/msg00447.html
      
      Fixes: af42b8d1 ("xen: fix MSI setup and teardown for PV on HVM guests")
      Signed-off-by: Dan Streetman <dan.streetman@canonical.com>
      Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
      Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • cpuidle: Validate cpu_dev in cpuidle_add_sysfs() · 53569305
      Vaidyanathan Srinivasan authored
      commit ad0a45fd upstream.
      
      If a given cpu is not in cpu_present and cpu hotplug
      is disabled, arch can skip setting up the cpu_dev.
      
      The arch cpuidle driver should pass the correct cpu mask
      for registration, but if the driver fails to do so, the
      error propagates and causes a crash like this:
      
      [   30.076045] Unable to handle kernel paging request for data at address 0x00000048
      [   30.076100] Faulting instruction address: 0xc0000000007b2f30
      cpu 0x4d: Vector: 300 (Data Access) at [c000003feb18b670]
          pc: c0000000007b2f30: kobject_get+0x20/0x70
          lr: c0000000007b3c94: kobject_add_internal+0x54/0x3f0
          sp: c000003feb18b8f0
         msr: 9000000000009033
         dar: 48
       dsisr: 40000000
        current = 0xc000003fd2ed8300
        paca    = 0xc00000000fbab500   softe: 0        irq_happened: 0x01
          pid   = 1, comm = swapper/0
      Linux version 4.11.0-rc2-svaidy+ (sv@sagarika) (gcc version 6.2.0
      20161005 (Ubuntu 6.2.0-5ubuntu12) ) #10 SMP Sun Mar 19 00:08:09 IST 2017
      enter ? for help
      [c000003feb18b960] c0000000007b3c94 kobject_add_internal+0x54/0x3f0
      [c000003feb18b9f0] c0000000007b43a4 kobject_init_and_add+0x64/0xa0
      [c000003feb18ba70] c000000000e284f4 cpuidle_add_sysfs+0xb4/0x130
      [c000003feb18baf0] c000000000e26038 cpuidle_register_device+0x118/0x1c0
      [c000003feb18bb30] c000000000e26c48 cpuidle_register+0x78/0x120
      [c000003feb18bbc0] c00000000168fd9c powernv_processor_idle_init+0x110/0x1c4
      [c000003feb18bc40] c00000000000cff8 do_one_initcall+0x68/0x1d0
      [c000003feb18bd00] c0000000016242f4 kernel_init_freeable+0x280/0x360
      [c000003feb18bdc0] c00000000000d864 kernel_init+0x24/0x160
      [c000003feb18be30] c00000000000b4e8 ret_from_kernel_thread+0x5c/0x74
      
      Validating cpu_dev fixes the crash and reports correct error message like:
      
      [   30.163506] Failed to register cpuidle device for cpu136
      [   30.173329] Registration of powernv driver failed.
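
The kind of validation described above can be sketched as an early NULL check. The types and the function name add_sysfs_entry() are hypothetical stand-ins, not the cpuidle code itself.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct device { int id; };	/* hypothetical stand-in for the kernel type */

/* Hypothetical registration step that needs a valid per-cpu device. */
int add_sysfs_entry(const struct device *cpu_dev)
{
	/* Bail out early instead of dereferencing a device that was never set up. */
	if (!cpu_dev)
		return -ENODEV;

	assert(cpu_dev->id >= 0);
	return 0;
}
```

Returning an error code lets the caller report a registration failure rather than faulting inside kobject_get().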
      
      Signed-off-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
      [ rjw: Comment massage ]
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • scsi: sd: Check for unaligned partial completion · a27142e6
      Damien Le Moal authored
      commit c46f0917 upstream.
      
      Commit f2e767bb ("mpt3sas: Force request partial completion
      alignment") did not consider the case of commands not operating on
      logical block size units (e.g. REQ_OP_ZONE_REPORT and its 64B aligned
      partial replies). In this case, forcing alignment of resid to the device
      logical block size can break the command result, e.g. in the case of
      REQ_OP_ZONE_REPORT, the exact number of zones reported by the device.
      
      Move the partial completion alignment check of mpt3sas to a generic
      implementation in sd_done(). The check is added within the default
      section of the initial req_op() switch case so that the report and reset
      zone commands are ignored. In addition, as sd_done() is not called for
      passthrough requests, resid corrections are not done as intended by the
      initial mpt3sas patch.
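
For commands that do operate on logical blocks, an alignment check of this kind might look like the sketch below. It is an assumption about the general shape of such a check, not the sd_done() code: align_resid() and its parameters are hypothetical names.

```c
#include <assert.h>

/*
 * Round a residual byte count up to the logical block size, capped at the
 * transfer length.  sector_size must be a power of two.
 */
unsigned int align_resid(unsigned int resid, unsigned int sector_size,
			 unsigned int bufflen)
{
	assert(sector_size != 0 && (sector_size & (sector_size - 1)) == 0);

	if (resid & (sector_size - 1)) {
		/* Unaligned: round up to the next block boundary. */
		unsigned int up = (resid + sector_size - 1) & ~(sector_size - 1);

		resid = up < bufflen ? up : bufflen;
	}
	return resid;
}
```

Rounding up means a partially transferred block is treated as not transferred at all. Applying the same rounding to a 64B-granular reply would discard valid data, which is why such commands have to be left out.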
      
      Fixes: f2e767bb ("mpt3sas: Force request partial completion alignment")
      Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
      Acked-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • device-dax: fix pmd/pte fault fallback handling · 66c08128
      Dave Jiang authored
      commit 0134ed4f upstream.
      
      Jeff Moyer reports:
      
          With a device dax alignment of 4KB or 2MB, I get sigbus when running
          the attached fio job file for the current kernel (4.11.0-rc1+).  If
          I specify an alignment of 1GB, it works.
      
          I turned on debug output, and saw that it was failing in the huge
          fault code.
      
           dax dax1.0: dax_open
           dax dax1.0: dax_mmap
           dax dax1.0: dax_dev_huge_fault: fio: write (0x7f08f0a00000 -
           dax dax1.0: __dax_dev_pud_fault: phys_to_pgoff(0xffffffffcf60
           dax dax1.0: dax_release
      
          fio config for reproduce:
          [global]
          ioengine=dev-dax
          direct=0
          filename=/dev/dax0.0
          bs=2m
      
          [write]
          rw=write
      
          [read]
          stonewall
          rw=read
      
      The driver fails to fall back when taking a fault that is larger than
      the device alignment, or handling a larger fault when a smaller
      mapping is already established. While we could support larger
      mappings for a device with a smaller alignment, that change is
      too large for the immediate fix. The simplest change is to force
      fallback until the fault size matches the alignment.
      
      Fixes: dee41079 ("/dev/dax, core: file operations and dax-mmap")
      Cc: <stable@vger.kernel.org>
      Reported-by: Jeff Moyer <jmoyer@redhat.com>
      Signed-off-by: Dave Jiang <dave.jiang@intel.com>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • libceph: don't set weight to IN when OSD is destroyed · 96aa12df
      Ilya Dryomov authored
      commit b581a585 upstream.
      
      Since ceph.git commit 4e28f9e63644 ("osd/OSDMap: clear osd_info,
      osd_xinfo on osd deletion"), weight is set to IN when OSD is deleted.
      This changes the result of applying an incremental for clients, not
      just OSDs.  Because CRUSH computations are obviously affected,
      pre-4e28f9e63644 servers disagree with post-4e28f9e63644 clients on
      object placement, resulting in misdirected requests.
      
      Mirrors ceph.git commit a6009d1039a55e2c77f431662b3d6cc5a8e8e63f.
      
      Fixes: 930c5328 ("libceph: apply new_state before new_up_client on incrementals")
      Link: http://tracker.ceph.com/issues/19122
      Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
      Reviewed-by: Sage Weil <sage@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • mmc: block: Fix is_waiting_last_req set incorrectly · 8b38e319
      Adrian Hunter authored
      commit 2602b740 upstream.
      
      Commit 15520111 ("mmc: core: Further fix thread wake-up") allowed a
      queue to release the host with is_waiting_last_req set to true. A queue
      waiting to claim the host will not reset it, which can result in the
      queue getting stuck in a loop.
      
      Fixes: 15520111 ("mmc: core: Further fix thread wake-up")
      Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
      Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Drivers: hv: vmbus: Don't leak memory when a channel is rescinded · f2a9bf4d
      K. Y. Srinivasan authored
      commit 5e030d5c upstream.
      
      When we close a channel that has been rescinded, we will leak memory since
      vmbus_teardown_gpadl() returns an error. Fix this so that we can properly
      clean up the memory allocated to the ring buffers.
      
      Fixes: ccb61f8a ("Drivers: hv: vmbus: Fix a rescind handling bug")
      Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
      Cc: Dexuan Cui <decui@microsoft.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Drivers: hv: vmbus: Don't leak channel ids · 84006577
      K. Y. Srinivasan authored
      commit 9a547602 upstream.
      
      If we cannot allocate memory for the channel, free the relid
      associated with the channel.
      
      Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • intel_th: Don't leak module refcount on failure to activate · f8dd767b
      Alexander Shishkin authored
      commit e609ccef upstream.
      
      Output 'activation' may fail for the reasons of the output driver,
      for example, if msc's buffer is not allocated. We forget, however,
      to drop the module reference in this case. So each attempt at
      activation in this case leaks a reference, preventing the module
      from ever unloading.
      
      This patch adds the missing module_put() in the activation error
      path.
      
      Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • jbd2: don't leak memory if setting up journal fails · 7bf105ac
      Eric Biggers authored
      commit cd9cb405 upstream.
      
      In journal_init_common(), if we failed to allocate the j_wbuf array, or
      if we failed to create the buffer_head for the journal superblock, we
      leaked the memory allocated for the revocation tables.  Fix this.
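
The error-path pattern behind this kind of fix can be sketched in user space as follows. Everything here is hypothetical (struct journal is heavily reduced, and the fail_wbuf flag exists only to simulate an allocation failure); it is not the jbd2 code.

```c
#include <assert.h>
#include <stdlib.h>

struct journal {		/* hypothetical, much reduced */
	void *revoke_table;
	void *wbuf;
};

/* fail_wbuf lets a caller simulate the second allocation failing. */
struct journal *journal_init(int fail_wbuf)
{
	struct journal *j = calloc(1, sizeof(*j));

	if (!j)
		return NULL;

	j->revoke_table = malloc(64);
	if (!j->revoke_table)
		goto err_free_journal;

	j->wbuf = fail_wbuf ? NULL : malloc(64);
	if (!j->wbuf)
		goto err_free_revoke;	/* undo the earlier allocation too */

	return j;

err_free_revoke:
	free(j->revoke_table);
err_free_journal:
	free(j);
	return NULL;
}

/* Release a fully initialised journal. */
void journal_destroy(struct journal *j)
{
	assert(j != NULL);
	free(j->wbuf);
	free(j->revoke_table);
	free(j);
}
```

Unwinding in reverse order of allocation means each later failure releases everything set up before it.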
      
      Fixes: f0c9fd54
      Signed-off-by: Eric Biggers <ebiggers@google.com>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • auxdisplay: img-ascii-lcd: add missing sentinel entry in img_ascii_lcd_matches · 8668c61b
      Dmitry Torokhov authored
      commit abda288b upstream.
      
      The OF device table must be terminated, otherwise we'll be walking past it
      and into areas unknown.
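
Why the terminator matters can be shown with a simplified table walk. The struct, the compatible strings and find_match() are hypothetical; the real struct of_device_id uses fixed-size character arrays and an empty entry as its sentinel, whereas this sketch uses a NULL pointer for brevity.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct match_entry {		/* hypothetical stand-in for struct of_device_id */
	const char *compatible;
	int data;
};

static const struct match_entry matches[] = {
	{ "vendor,board-a", 1 },	/* made-up compatible strings */
	{ "vendor,board-b", 2 },
	{ NULL, 0 }			/* sentinel: terminates the walk below */
};

/* Walk the table until the sentinel; without it the loop would run off the end. */
const struct match_entry *find_match(const char *compatible)
{
	assert(compatible != NULL);
	for (const struct match_entry *m = matches; m->compatible; m++)
		if (strcmp(m->compatible, compatible) == 0)
			return m;
	return NULL;
}
```

The loop has no length to consult, so the sentinel entry is the only thing that stops it.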
      
      Fixes: 0cad855f ("auxdisplay: img-ascii-lcd: driver for simple ASCII...")
      Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
      Tested-by: Fengguang Wu <fengguang.wu@intel.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • drm/amd/amdgpu: add POLARIS12 PCI ID · 67dfc085
      Evan Quan authored
      commit cf8c73af upstream.
      
      Signed-off-by: Evan Quan <evan.quan@amd.com>
      Reviewed-by: Junwei Zhang <Jerry.Zhang@amd.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • drm/amdgpu: reinstate oland workaround for sclk · a7a14362
      Alex Deucher authored
      commit e11ddff6 upstream.
      
      Higher sclks seem to be unstable on some boards.
      
      bug: https://bugs.freedesktop.org/show_bug.cgi?id=100222
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • cpsw/netcp: cpts depends on posix_timers · 51d3848c
      Arnd Bergmann authored
      commit 07fef362 upstream.
      
      With posix timers having become optional, we get a build error with
      the cpts time sync option of the CPSW driver:
      
      drivers/net/ethernet/ti/cpts.c: In function 'cpts_find_ts':
      drivers/net/ethernet/ti/cpts.c:291:23: error: implicit declaration of function 'ptp_classify_raw'; did you mean 'ptp_classifier_init'? [-Werror=implicit-function-declaration]
      
      This adds a hard dependency on PTP_CLOCK to avoid the problem, as
      building it without PTP support makes no sense anyway.
      
      Fixes: baa73d9e ("posix-timers: Make them configurable")
      Cc: Nicolas Pitre <nicolas.pitre@linaro.org>
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Acked-by: Nicolas Pitre <nico@linaro.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • blk-mq: don't complete un-started request in timeout handler · 16379a79
      Ming Lei authored
      commit 95a49603 upstream.
      
      When iterating busy requests in the timeout handler,
      if the STARTED flag of a request isn't set, that means
      the request is being processed in the block layer or driver, and
      hasn't been submitted to hardware yet.
      
      In the current implementation of blk_mq_check_expired(),
      if the request queue becomes dying, un-started requests are
      handled as being completed/freed immediately. This is
      wrong, and can cause rq corruption or double allocation[1][2]
      when doing I/O and removing and resetting an NVMe device at the same time.
      
      This patch fixes several issues reported by Yi Zhang.
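
The rule "leave un-started requests alone" can be modelled as below. This is a reduced sketch under stated assumptions, not blk_mq_check_expired(): struct request, its fields and check_expired() are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct request {		/* hypothetical, much reduced */
	bool started;		/* has it been handed to the hardware? */
	long deadline;
	bool timed_out;
};

/* Called for each busy request from the timeout scan. */
void check_expired(struct request *rq, long now)
{
	assert(rq != NULL);

	/* Not started yet: still owned by the block layer or driver. */
	if (!rq->started)
		return;

	if (now >= rq->deadline)
		rq->timed_out = true;
}
```

Because the un-started request is still owned by whoever is preparing it, completing or freeing it from the timeout scan would race with that owner.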
      
      [1]. oops log 1
      [  581.789754] ------------[ cut here ]------------
      [  581.789758] kernel BUG at block/blk-mq.c:374!
      [  581.789760] invalid opcode: 0000 [#1] SMP
      [  581.789761] Modules linked in: vfat fat ipmi_ssif intel_rapl sb_edac
      edac_core x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm nvme
      irqbypass crct10dif_pclmul nvme_core crc32_pclmul ghash_clmulni_intel
      intel_cstate ipmi_si mei_me ipmi_devintf intel_uncore sg ipmi_msghandler
      intel_rapl_perf iTCO_wdt mei iTCO_vendor_support mxm_wmi lpc_ich dcdbas shpchp
      pcspkr acpi_power_meter wmi nfsd auth_rpcgss nfs_acl lockd dm_multipath grace
      sunrpc ip_tables xfs libcrc32c sd_mod mgag200 i2c_algo_bit drm_kms_helper
      syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm ahci libahci
      crc32c_intel tg3 libata megaraid_sas i2c_core ptp fjes pps_core dm_mirror
      dm_region_hash dm_log dm_mod
      [  581.789796] CPU: 1 PID: 1617 Comm: kworker/1:1H Not tainted 4.10.0.bz1420297+ #4
      [  581.789797] Hardware name: Dell Inc. PowerEdge R730xd/072T6D, BIOS 2.2.5 09/06/2016
      [  581.789804] Workqueue: kblockd blk_mq_timeout_work
      [  581.789806] task: ffff8804721c8000 task.stack: ffffc90006ee4000
      [  581.789809] RIP: 0010:blk_mq_end_request+0x58/0x70
      [  581.789810] RSP: 0018:ffffc90006ee7d50 EFLAGS: 00010202
      [  581.789811] RAX: 0000000000000001 RBX: ffff8802e4195340 RCX: ffff88028e2f4b88
      [  581.789812] RDX: 0000000000001000 RSI: 0000000000001000 RDI: 0000000000000000
      [  581.789813] RBP: ffffc90006ee7d60 R08: 0000000000000003 R09: ffff88028e2f4b00
      [  581.789814] R10: 0000000000001000 R11: 0000000000000001 R12: 00000000fffffffb
      [  581.789815] R13: ffff88042abe5780 R14: 000000000000002d R15: ffff88046fbdff80
      [  581.789817] FS:  0000000000000000(0000) GS:ffff88047fc00000(0000) knlGS:0000000000000000
      [  581.789818] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  581.789819] CR2: 00007f64f403a008 CR3: 000000014d078000 CR4: 00000000001406e0
      [  581.789820] Call Trace:
      [  581.789825]  blk_mq_check_expired+0x76/0x80
      [  581.789828]  bt_iter+0x45/0x50
      [  581.789830]  blk_mq_queue_tag_busy_iter+0xdd/0x1f0
      [  581.789832]  ? blk_mq_rq_timed_out+0x70/0x70
      [  581.789833]  ? blk_mq_rq_timed_out+0x70/0x70
      [  581.789840]  ? __switch_to+0x140/0x450
      [  581.789841]  blk_mq_timeout_work+0x88/0x170
      [  581.789845]  process_one_work+0x165/0x410
      [  581.789847]  worker_thread+0x137/0x4c0
      [  581.789851]  kthread+0x101/0x140
      [  581.789853]  ? rescuer_thread+0x3b0/0x3b0
      [  581.789855]  ? kthread_park+0x90/0x90
      [  581.789860]  ret_from_fork+0x2c/0x40
      [  581.789861] Code: 48 85 c0 74 0d 44 89 e6 48 89 df ff d0 5b 41 5c 5d c3 48
      8b bb 70 01 00 00 48 85 ff 75 0f 48 89 df e8 7d f0 ff ff 5b 41 5c 5d c3 <0f>
      0b e8 71 f0 ff ff 90 eb e9 0f 1f 40 00 66 2e 0f 1f 84 00 00
      [  581.789882] RIP: blk_mq_end_request+0x58/0x70 RSP: ffffc90006ee7d50
      [  581.789889] ---[ end trace bcaf03d9a14a0a70 ]---
      
      [2]. oops log2
      [ 6984.857362] BUG: unable to handle kernel NULL pointer dereference at 0000000000000010
      [ 6984.857372] IP: nvme_queue_rq+0x6e6/0x8cd [nvme]
      [ 6984.857373] PGD 0
      [ 6984.857374]
      [ 6984.857376] Oops: 0000 [#1] SMP
      [ 6984.857379] Modules linked in: ipmi_ssif vfat fat intel_rapl sb_edac
      edac_core x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm
      irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel ipmi_si iTCO_wdt
      iTCO_vendor_support mxm_wmi ipmi_devintf intel_cstate sg dcdbas intel_uncore
      mei_me intel_rapl_perf mei pcspkr lpc_ich ipmi_msghandler shpchp
      acpi_power_meter wmi nfsd auth_rpcgss dm_multipath nfs_acl lockd grace sunrpc
      ip_tables xfs libcrc32c sd_mod mgag200 i2c_algo_bit drm_kms_helper syscopyarea
      sysfillrect crc32c_intel sysimgblt fb_sys_fops ttm nvme drm nvme_core ahci
      libahci i2c_core tg3 libata ptp megaraid_sas pps_core fjes dm_mirror
      dm_region_hash dm_log dm_mod
      [ 6984.857416] CPU: 7 PID: 1635 Comm: kworker/7:1H Not tainted
      4.10.0-2.el7.bz1420297.x86_64 #1
      [ 6984.857417] Hardware name: Dell Inc. PowerEdge R730xd/072T6D, BIOS 2.2.5 09/06/2016
      [ 6984.857427] Workqueue: kblockd blk_mq_run_work_fn
      [ 6984.857429] task: ffff880476e3da00 task.stack: ffffc90002e90000
      [ 6984.857432] RIP: 0010:nvme_queue_rq+0x6e6/0x8cd [nvme]
      [ 6984.857433] RSP: 0018:ffffc90002e93c50 EFLAGS: 00010246
      [ 6984.857434] RAX: 0000000000000000 RBX: ffff880275646600 RCX: 0000000000001000
      [ 6984.857435] RDX: 0000000000000fff RSI: 00000002fba2a000 RDI: ffff8804734e6950
      [ 6984.857436] RBP: ffffc90002e93d30 R08: 0000000000002000 R09: 0000000000001000
      [ 6984.857437] R10: 0000000000001000 R11: 0000000000000000 R12: ffff8804741d8000
      [ 6984.857438] R13: 0000000000000040 R14: ffff880475649f80 R15: ffff8804734e6780
      [ 6984.857439] FS:  0000000000000000(0000) GS:ffff88047fcc0000(0000) knlGS:0000000000000000
      [ 6984.857440] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [ 6984.857442] CR2: 0000000000000010 CR3: 0000000001c09000 CR4: 00000000001406e0
      [ 6984.857443] Call Trace:
      [ 6984.857451]  ? mempool_free+0x2b/0x80
      [ 6984.857455]  ? bio_free+0x4e/0x60
      [ 6984.857459]  blk_mq_dispatch_rq_list+0xf5/0x230
      [ 6984.857462]  blk_mq_process_rq_list+0x133/0x170
      [ 6984.857465]  __blk_mq_run_hw_queue+0x8c/0xa0
      [ 6984.857467]  blk_mq_run_work_fn+0x12/0x20
      [ 6984.857473]  process_one_work+0x165/0x410
      [ 6984.857475]  worker_thread+0x137/0x4c0
      [ 6984.857478]  kthread+0x101/0x140
      [ 6984.857480]  ? rescuer_thread+0x3b0/0x3b0
      [ 6984.857481]  ? kthread_park+0x90/0x90
      [ 6984.857489]  ret_from_fork+0x2c/0x40
      [ 6984.857490] Code: 8b bd 70 ff ff ff 89 95 50 ff ff ff 89 8d 58 ff ff ff 44
      89 95 60 ff ff ff e8 b7 dd 12 e1 8b 95 50 ff ff ff 48 89 85 68 ff ff ff <4c>
      8b 48 10 44 8b 58 18 8b 8d 58 ff ff ff 44 8b 95 60 ff ff ff
      [ 6984.857511] RIP: nvme_queue_rq+0x6e6/0x8cd [nvme] RSP: ffffc90002e93c50
      [ 6984.857512] CR2: 0000000000000010
      [ 6984.895359] ---[ end trace 2d7ceb528432bf83 ]---
      
      Reported-by: Yi Zhang <yizhan@redhat.com>
      Tested-by: Yi Zhang <yizhan@redhat.com>
      Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com>
      Reviewed-by: Hannes Reinecke <hare@suse.com>
      Signed-off-by: Ming Lei <tom.leiming@gmail.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • cgroup, net_cls: iterate the fds of only the tasks which are being migrated · fee328fe
      Tejun Heo authored
      commit a05d4fd9 upstream.
      
      The net_cls controller controls the classid field of each socket which
      is associated with the cgroup.  Because the classid is per-socket
      attribute, when a task migrates to another cgroup or the configured
      classid of the cgroup changes, the controller needs to walk all
      sockets and update the classid value, which was implemented by
      3b13758f ("cgroups: Allow dynamically changing net_classid").
      
      While the approach is not scalable, migrating tasks which have a lot
      of fds attached to them is rare and the cost is borne by the ones
      initiating the operations.  However, for simplicity, both the
      migration and classid config change paths call update_classid() which
      scans all fds of all tasks in the target css.  This is overkill for
      the migration path which only needs to cover a much smaller subset of
      tasks which are actually getting migrated in.
      
      On cgroup v1, this can lead to unexpected scalability issues when one
      tries to migrate a task or process into a net_cls cgroup which already
      contains a lot of fds.  Even if the migration target doesn't have many
      to get scanned, update_classid() ends up scanning all fds in the
      target cgroup which can be extremely numerous.
      
      Unfortunately, on cgroup v2 which doesn't use net_cls, the problem is
      even worse.  Before bfc2cf6f ("cgroup: call subsys->*attach() only
      for subsystems which are actually affected by migration"), cgroup core
      would call the ->css_attach callback even for controllers which don't
      see actual migration to a different css.
      
      As net_cls is always disabled but still mounted on cgroup v2, whenever
      a process is migrated on the cgroup v2 hierarchy, net_cls sees
      identity migration from root to root and cgroup core used to call
      ->css_attach callback for those.  The net_cls ->css_attach ends up
      calling update_classid() on the root net_cls css to which all
      processes on the system belong, as the controller isn't used.  This
      makes any cgroup v2 migration O(total_number_of_fds_on_the_system)
      which is horrible and easily leads to noticeable stalls triggering RCU
      stall warnings and so on.
      
      The worst symptom is already fixed in upstream by bfc2cf6f
      ("cgroup: call subsys->*attach() only for subsystems which are
      actually affected by migration"); however, backporting that commit is
      too invasive and we want to avoid other cases too.
      
      This patch updates net_cls's cgrp_attach() to iterate fds of only the
      processes which are actually getting migrated.  This removes the
      surprising migration cost which is dependent on the total number of
      fds in the target cgroup.  As this leaves write_classid() the only
      user of update_classid(), open-code the helper into write_classid().
      Reported-by: David Goode <dgoode@fb.com>
      Fixes: 3b13758f ("cgroups: Allow dynamically changing net_classid")
      Cc: Nina Schiff <ninasc@fb.com>
      Cc: David S. Miller <davem@davemloft.net>
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      fee328fe
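
The cost difference described in the net_cls commit above can be illustrated with a small user-space model. This is only a sketch under stated assumptions: `struct model_task` and both counting functions are hypothetical names invented for illustration and do not correspond to kernel structures or to the actual patch. It compares how many fds are visited when the whole target css is scanned with how many are visited when only the migrating tasks are scanned.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical user-space model; these are not the kernel's structures. */
struct model_task {
	size_t nr_fds;     /* open file descriptors held by the task */
	int    migrating;  /* non-zero if the task is part of this migration */
};

/* Old migration path: visit every fd of every task in the target css. */
size_t fds_scanned_whole_css(const struct model_task *tasks, size_t n)
{
	size_t scanned = 0;

	assert(tasks != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		scanned += tasks[i].nr_fds;
	return scanned;
}

/* New migration path: visit only the fds of tasks being migrated in. */
size_t fds_scanned_migrated_only(const struct model_task *tasks, size_t n)
{
	size_t scanned = 0;

	assert(tasks != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		if (tasks[i].migrating)
			scanned += tasks[i].nr_fds;
	return scanned;
}
```

In this model, migrating one small task into a css that already holds tasks with many fds costs only that task's fd count on the new path, while the old path pays for every fd in the css.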
    • Viresh Kumar's avatar
      cpufreq: Restore policy min/max limits on CPU online · 3742b9a0
      Viresh Kumar authored
      commit ff010472 upstream.
      
      On CPU online the cpufreq core restores the previous governor (or
      the previous "policy" setting for ->setpolicy drivers), but it does
      not restore the min/max limits at the same time, which is confusing,
      inconsistent and a real pain for users who set the limits and then
      suspend/resume the system (using full suspend), in which case the
      limits are reset on all CPUs except for the boot one.
      
      Fix this by making cpufreq_online() restore the limits when an inactive
      policy is brought online.
      
      The commit log and patch are inspired by Rafael's earlier work.
      Reported-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      3742b9a0
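
The behaviour described in the cpufreq commit above can be modelled in a few lines of user-space C. This is a sketch, not the kernel code: `struct model_policy`, `struct model_limits` and `model_policy_online()` are hypothetical names, and the model assumes that bringing a policy online first resets the active limits to the hardware range, after which the saved user limits are either recorded (new policy) or restored (inactive policy coming back online).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model types; not the kernel's struct cpufreq_policy. */
struct model_limits {
	unsigned int min_khz;
	unsigned int max_khz;
};

struct model_policy {
	struct model_limits cur;   /* limits currently in effect */
	struct model_limits user;  /* limits last requested by the user */
};

/*
 * Bring a policy online.
 * is_new != 0: policy is created for the first time.
 * is_new == 0: an inactive policy is re-onlined, e.g. after resume.
 * Assumption of this model: going online resets cur to the hardware range.
 */
void model_policy_online(struct model_policy *p, int is_new,
			 struct model_limits hw)
{
	assert(p != NULL);

	p->cur = hw;
	if (is_new)
		p->user = p->cur;   /* remember the initial limits */
	else
		p->cur = p->user;   /* restore what the user had set */
}
```

Without the restore branch, a re-onlined policy in this model would keep the hardware range and silently drop the user's limits, which is the inconsistency the commit message describes.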
    • Neeraj Upadhyay's avatar
      arm64: kaslr: Fix up the kernel image alignment · fc0af251
      Neeraj Upadhyay authored
      commit afd0e5a8 upstream.
      
      If the kernel image extends across an alignment boundary, the existing
      code increases the KASLR offset by the size of the kernel image and
      then masks the offset. In some cases the kernel image still extends
      across the boundary after masking. As a result, only a 2MB block gets
      mapped while creating the page tables, which causes data aborts
      when unmapped regions are accessed during the second relocation (with
      the KASLR offset) in __primary_switch. To fix this problem, round up the
      kernel image size to the swapper block size before adding it for
      correction.
      
      For example, consider the case below, where the kernel image still
      crosses a 1GB alignment boundary after masking the offset; rounding
      up the kernel image size fixes this.
      
      SWAPPER_TABLE_SHIFT = 30
      Swapper using section maps with section size 2MB.
      CONFIG_PGTABLE_LEVELS = 3
      VA_BITS = 39
      
      _text  : 0xffffff8008080000
      _end   : 0xffffff800aa1b000
      offset : 0x1f35600000
      mask = ((1UL << (VA_BITS - 2)) - 1) & ~(SZ_2M - 1)
      
      (_text + offset) >> SWAPPER_TABLE_SHIFT = 0x3fffffe7c
      (_end + offset) >> SWAPPER_TABLE_SHIFT  = 0x3fffffe7d
      
      offset after existing correction (before mask) = 0x1f37f9b000
      (_text + offset) >> SWAPPER_TABLE_SHIFT = 0x3fffffe7d
      (_end + offset) >> SWAPPER_TABLE_SHIFT  = 0x3fffffe7d
      
      offset (after mask) = 0x1f37e00000
      (_text + offset) >> SWAPPER_TABLE_SHIFT = 0x3fffffe7c
      (_end + offset) >> SWAPPER_TABLE_SHIFT  = 0x3fffffe7d
      
      new offset w/ rounding up = 0x1f38000000
      (_text + offset) >> SWAPPER_TABLE_SHIFT = 0x3fffffe7d
      (_end + offset) >> SWAPPER_TABLE_SHIFT  = 0x3fffffe7d
      
      Fixes: f80fb3a3 ("arm64: add support for kernel ASLR")
      Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Signed-off-by: Neeraj Upadhyay <neeraju@codeaurora.org>
      Signed-off-by: Srinivas Ramana <sramana@codeaurora.org>
      Signed-off-by: Will Deacon <will.deacon@arm.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      fc0af251
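
The worked example in the KASLR commit above can be reproduced with plain 64-bit arithmetic. The following is a user-space sketch that reuses the constants from the commit message; the function names (`crosses_boundary`, `correct_old`, `correct_new`, `round_up_pow2`) are illustrative and are not taken from the kernel source. It applies the correction once without and once with rounding the image size up to the 2MB block size.

```c
#include <assert.h>
#include <stdint.h>

/* Values taken from the example in the commit message. */
#define SWAPPER_TABLE_SHIFT 30
#define VA_BITS             39
#define SZ_2M               (UINT64_C(1) << 21)
#define KERNEL_TEXT         UINT64_C(0xffffff8008080000)
#define KERNEL_END          UINT64_C(0xffffff800aa1b000)

/* mask = ((1UL << (VA_BITS - 2)) - 1) & ~(SZ_2M - 1) */
uint64_t kaslr_mask(void)
{
	return ((UINT64_C(1) << (VA_BITS - 2)) - 1) & ~(SZ_2M - 1);
}

/* Round x up to a multiple of align; align must be a power of two. */
uint64_t round_up_pow2(uint64_t x, uint64_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return (x + align - 1) & ~(align - 1);
}

/* True if _text and _end land in different 1GB table entries. */
int crosses_boundary(uint64_t offset)
{
	return ((KERNEL_TEXT + offset) >> SWAPPER_TABLE_SHIFT) !=
	       ((KERNEL_END + offset) >> SWAPPER_TABLE_SHIFT);
}

/* Correction before the fix: add the raw image size, then mask. */
uint64_t correct_old(uint64_t offset)
{
	if (crosses_boundary(offset))
		offset = (offset + (KERNEL_END - KERNEL_TEXT)) & kaslr_mask();
	return offset;
}

/* Correction after the fix: add the image size rounded up to 2MB. */
uint64_t correct_new(uint64_t offset)
{
	if (crosses_boundary(offset))
		offset = (offset + round_up_pow2(KERNEL_END - KERNEL_TEXT,
						 SZ_2M)) & kaslr_mask();
	return offset;
}
```

With the starting offset 0x1f35600000, `correct_old()` yields 0x1f37e00000, which still crosses the 1GB boundary, while `correct_new()` yields 0x1f38000000, which does not. Both values match the figures listed in the commit message.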
    • Nicolas Ferre's avatar
      ARM: at91: pm: cpu_idle: switch DDR to power-down mode · f464f86d
      Nicolas Ferre authored
      commit 60b89f19 upstream.
      
      On some DDR controllers compatible with the sama5d3 one,
      the sequence to enter/exit/re-enter the self-refresh mode adds
      more constraints than what is currently written in the at91_idle
      driver. An actual access to the DDR chip is needed between exit
      and re-enter of this mode, which is somewhat difficult to implement.
      This sequence can completely hang the SoC. It is particularly
      experienced on parts which embed an L2 cache if the code run
      between IDLE calls fits in it...
      
      Moreover, as the intention is to enter and exit pretty rapidly
      from IDLE, the power-down mode is a good candidate.
      
      So now we use power-down instead of self-refresh. As we can
      simplify the code for sama5d3 compatible DDR controllers,
      we instantiate a new sama5d3_ddr_standby() function.
      Signed-off-by: Nicolas Ferre <nicolas.ferre@microchip.com>
      Fixes: 017b5522 ("ARM: at91: Add new binding for sama5d3-ddramc")
      Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      f464f86d
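
The switch from self-refresh to power-down in the at91 commit above amounts to selecting a different low-power command around the idle call. The sketch below models that idea in user-space C with a plain variable standing in for the controller's low-power register. Everything in it is an assumption made for illustration: the `MODEL_LPCB_*` field values, the function names, and the choice to restore the saved register value after idle are not taken from the commit message or a datasheet.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed encoding of a 2-bit low-power command field (illustrative). */
#define MODEL_LPCB_MASK          0x3u
#define MODEL_LPCB_SELF_REFRESH  0x1u
#define MODEL_LPCB_POWER_DOWN    0x2u

typedef void (*model_idle_fn)(void);

/* Replace the low-power command field with "power-down", keep other bits. */
uint32_t model_lpr_select_power_down(uint32_t lpr)
{
	return (lpr & ~MODEL_LPCB_MASK) | MODEL_LPCB_POWER_DOWN;
}

/*
 * Model of a standby helper: select power-down, idle, then put the
 * register back as it was. lpr_reg points at a stand-in for the register;
 * idle may be NULL, in which case no wait is performed.
 */
void model_ddr_standby(volatile uint32_t *lpr_reg, model_idle_fn idle)
{
	uint32_t saved;

	assert(lpr_reg != NULL);

	saved = *lpr_reg;
	*lpr_reg = model_lpr_select_power_down(saved);
	if (idle != NULL)
		idle();          /* stands in for waiting for an interrupt */
	*lpr_reg = saved;
}
```

The read-modify-write keeps all other register bits untouched, so only the low-power command changes for the duration of the idle period.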