1. 02 Sep, 2017 10 commits
  2. 30 Aug, 2017 30 commits
    • Greg Kroah-Hartman's avatar
      Linux 4.9.46 · 0eed54bd
      Greg Kroah-Hartman authored
      0eed54bd
    • Benjamin Herrenschmidt's avatar
      powerpc/mm: Ensure cpumask update is ordered · 5aa523a9
      Benjamin Herrenschmidt authored
      commit 1a92a80a upstream.
      
      There is no guarantee that the various isync's involved with
      the context switch will order the update of the CPU mask with
      the first TLB entry for the new context being loaded by the HW.
      
      Be safe here and add a memory barrier to order any subsequent
      load/store which may bring entries into the TLB.
      
      The corresponding barrier on the other side already exists as
      pte updates use pte_xchg() which uses __cmpxchg_u64 which has
      a sync after the atomic operation.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarBenjamin Herrenschmidt <benh@kernel.crashing.org>
      Reviewed-by: default avatarNicholas Piggin <npiggin@gmail.com>
      [mpe: Add comments in the code]
      [mpe: Backport to 4.12, minor context change]
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      5aa523a9
    • Lv Zheng's avatar
      ACPI: EC: Fix regression related to wrong ECDT initialization order · 5906715b
      Lv Zheng authored
      commit 98529b92 upstream.
      
      Commit 2a570840 (ACPI / EC: Fix a gap that ECDT EC cannot handle
      EC events) introduced acpi_ec_ecdt_start(), but that function is
      invoked before acpi_ec_query_init(), which is too early.  This causes
      the kernel to crash if an EC event occurs after boot, when ec_query_wq
      is not valid:
      
       BUG: unable to handle kernel NULL pointer dereference at 0000000000000102
       ...
       Workqueue: events acpi_ec_event_handler
       task: ffff9f539790dac0 task.stack: ffffb437c0e10000
       RIP: 0010:__queue_work+0x32/0x430
      
      Normally, the DSDT EC should always be valid, so acpi_ec_ecdt_start()
      is actually a no-op in the majority of cases.  However, commit
      c712bb58 (ACPI / EC: Add support to skip boot stage DSDT probe)
      caused the probing of the DSDT EC as the "boot EC" to be skipped when
      the ECDT EC is valid and uncovered the bug.
      
      Fix this issue by invoking acpi_ec_ecdt_start() after acpi_ec_query_init()
      in acpi_ec_init().
      
      Link: https://jira01.devtools.intel.com/browse/LCK-4348
      Fixes: 2a570840 (ACPI / EC: Fix a gap that ECDT EC cannot handle EC events)
      Fixes: c712bb58 (ACPI / EC: Add support to skip boot stage DSDT probe)
      Reported-by: default avatarWang Wendy <wendy.wang@intel.com>
      Tested-by: default avatarFeng Chenzhou <chenzhoux.feng@intel.com>
      Signed-off-by: default avatarLv Zheng <lv.zheng@intel.com>
      [ rjw: Changelog ]
      Signed-off-by: default avatarRafael J. Wysocki <rafael.j.wysocki@intel.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      5906715b
    • James Morse's avatar
      ACPI / APEI: Add missing synchronize_rcu() on NOTIFY_SCI removal · 3bc8e4f9
      James Morse authored
      commit 7d64f82c upstream.
      
      When removing a GHES device notified by SCI, list_del_rcu() is used,
      ghes_remove() should call synchronize_rcu() before it goes on to call
      kfree(ghes), otherwise concurrent RCU readers may still hold this list
      entry after it has been freed.
      Signed-off-by: default avatarJames Morse <james.morse@arm.com>
      Reviewed-by: default avatar"Huang, Ying" <ying.huang@intel.com>
      Fixes: 81e88fdc (ACPI, APEI, Generic Hardware Error Source POLL/IRQ/NMI notification type support)
      Signed-off-by: default avatarRafael J. Wysocki <rafael.j.wysocki@intel.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      3bc8e4f9
    • Joerg Roedel's avatar
      ACPI: ioapic: Clear on-stack resource before using it · 454cac5d
      Joerg Roedel authored
      commit e3d5092b upstream.
      
      The on-stack resource-window 'win' in setup_res() is not
      properly initialized. This causes the pointers in the
      embedded 'struct resource' to contain stale addresses.
      
      These pointers (in my case the ->child pointer) later get
      propagated to the global iomem_resources list, causing a #GP
      exception when the list is traversed in
      iomem_map_sanity_check().
      
      Fixes: c183619b (x86/irq, ACPI: Implement ACPI driver to support IOAPIC hotplug)
      Signed-off-by: default avatarJoerg Roedel <jroedel@suse.de>
      Signed-off-by: default avatarRafael J. Wysocki <rafael.j.wysocki@intel.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      454cac5d
    • Dave Jiang's avatar
      ntb: transport shouldn't disable link due to bogus values in SPADs · c1628774
      Dave Jiang authored
      commit f3fd2afe upstream.
      
      It seems that under certain scenarios the SPAD can have bogus values caused
      by an agent (i.e. BIOS or other software) that is not the kernel driver, and
      that causes memory window setup failure. This should not cause the link to
      be disabled because if we do that, the driver will never recover again. We
      have verified in testing that this issue happens and prevents proper link
      recovery.
      Signed-off-by: default avatarDave Jiang <dave.jiang@intel.com>
      Acked-by: default avatarAllen Hubbe <Allen.Hubbe@dell.com>
      Signed-off-by: default avatarJon Mason <jdmason@kudzu.us>
      Fixes: 84f76685 ("ntb: stop link work when we do not have memory")
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      c1628774
    • Logan Gunthorpe's avatar
      ntb: ntb_test: ensure the link is up before trying to configure the mws · 4d4f3547
      Logan Gunthorpe authored
      commit 0eb46345 upstream.
      
      After the link tests, there is a race on one side of the test for
      the link coming up. It's possible, in some cases, for the test script
      to write to the 'peer_trans' files before the link has come up.
      
      To fix this, we simply use the link event file to ensure both sides
      see the link as up before continuning.
      Signed-off-by: default avatarLogan Gunthorpe <logang@deltatee.com>
      Acked-by: default avatarAllen Hubbe <Allen.Hubbe@dell.com>
      Signed-off-by: default avatarJon Mason <jdmason@kudzu.us>
      Fixes: a9c59ef7 ("ntb_test: Add a selftest script for the NTB subsystem")
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      4d4f3547
    • Allen Hubbe's avatar
      ntb: no sleep in ntb_async_tx_submit · 7592db55
      Allen Hubbe authored
      commit 88931ec3 upstream.
      
      Do not sleep in ntb_async_tx_submit, which could deadlock.
      This reverts commit "8c874cc1"
      
      Fixes: 8c874cc1 ("NTB: Address out of DMA descriptor issue with NTB")
      Reported-by: default avatarJia-Ju Bai <baijiaju1990@163.com>
      Signed-off-by: default avatarAllen Hubbe <Allen.Hubbe@dell.com>
      Acked-by: default avatarDave Jiang <dave.jiang@intel.com>
      Signed-off-by: default avatarJon Mason <jdmason@kudzu.us>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      7592db55
    • Logan Gunthorpe's avatar
      NTB: ntb_test: fix bug printing ntb_perf results · bff04a46
      Logan Gunthorpe authored
      commit 07b0b22b upstream.
      
      The code mistakenly prints the local perf results for the remote test
      so the script reports identical results for both directions. Fix this
      by ensuring we print the remote result.
      Signed-off-by: default avatarLogan Gunthorpe <logang@deltatee.com>
      Fixes: a9c59ef7 ("ntb_test: Add a selftest script for the NTB subsystem")
      Acked-by: default avatarAllen Hubbe <Allen.Hubbe@dell.com>
      Signed-off-by: default avatarJon Mason <jdmason@kudzu.us>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      bff04a46
    • Logan Gunthorpe's avatar
      ntb_transport: fix bug calculating num_qps_mw · 471954c3
      Logan Gunthorpe authored
      commit 8e8496e0 upstream.
      
      A divide by zero error occurs if qp_count is less than mw_count because
      num_qps_mw is calculated to be zero. The calculation appears to be
      incorrect.
      
      The requirement is for num_qps_mw to be set to qp_count / mw_count
      with any remainder divided among the earlier mws.
      
      For example, if mw_count is 5 and qp_count is 12 then mws 0 and 1
      will have 3 qps per window and mws 2 through 4 will have 2 qps per window.
      Thus, when mw_num < qp_count % mw_count, num_qps_mw is 1 higher
      than when mw_num >= qp_count.
      Signed-off-by: default avatarLogan Gunthorpe <logang@deltatee.com>
      Fixes: e26a5843 ("NTB: Split ntb_hw_intel and ntb_transport drivers")
      Acked-by: default avatarAllen Hubbe <Allen.Hubbe@dell.com>
      Signed-off-by: default avatarJon Mason <jdmason@kudzu.us>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      471954c3
    • Logan Gunthorpe's avatar
      ntb_transport: fix qp count bug · 4743d1b3
      Logan Gunthorpe authored
      commit cb827ee6 upstream.
      
      In cases where there are more mw's than spads/2-2, the mw count gets
      reduced to match the limitation. ntb_transport also tries to ensure that
      there are fewer qps than mws but uses the full mw count instead of
      the reduced one. When this happens, the math in
      'ntb_transport_setup_qp_mw' will get confused and result in a kernel
      paging request bug.
      
      This patch fixes the bug by reducing qp_count to the reduced mw count
      instead of the full mw count.
      Signed-off-by: default avatarLogan Gunthorpe <logang@deltatee.com>
      Fixes: e26a5843 ("NTB: Split ntb_hw_intel and ntb_transport drivers")
      Acked-by: default avatarAllen Hubbe <Allen.Hubbe@dell.com>
      Signed-off-by: default avatarJon Mason <jdmason@kudzu.us>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      4743d1b3
    • Linus Torvalds's avatar
      Clarify (and fix) MAX_LFS_FILESIZE macros · b8fce382
      Linus Torvalds authored
      commit 0cc3b0ec upstream.
      
      We have a MAX_LFS_FILESIZE macro that is meant to be filled in by
      filesystems (and other IO targets) that know they are 64-bit clean and
      don't have any 32-bit limits in their IO path.
      
      It turns out that our 32-bit value for that limit was bogus.  On 32-bit,
      the VM layer is limited by the page cache to only 32-bit index values,
      but our logic for that was confusing and actually wrong.  We used to
      define that value to
      
      	(((loff_t)PAGE_SIZE << (BITS_PER_LONG-1))-1)
      
      which is actually odd in several ways: it limits the index to 31 bits,
      and then it limits files so that they can't have data in that last byte
      of a page that has the highest 31-bit index (ie page index 0x7fffffff).
      
      Neither of those limitations make sense.  The index is actually the full
      32 bit unsigned value, and we can use that whole full page.  So the
      maximum size of the file would logically be "PAGE_SIZE << BITS_PER_LONG".
      
      However, we do wan tto avoid the maximum index, because we have code
      that iterates over the page indexes, and we don't want that code to
      overflow.  So the maximum size of a file on a 32-bit host should
      actually be one page less than the full 32-bit index.
      
      So the actual limit is ULONG_MAX << PAGE_SHIFT.  That means that we will
      not actually be using the page of that last index (ULONG_MAX), but we
      can grow a file up to that limit.
      
      The wrong value of MAX_LFS_FILESIZE actually caused problems for Doug
      Nazar, who was still using a 32-bit host, but with a 9.7TB 2 x RAID5
      volume.  It turns out that our old MAX_LFS_FILESIZE was 8TiB (well, one
      byte less), but the actual true VM limit is one page less than 16TiB.
      
      This was invisible until commit c2a9737f ("vfs,mm: fix a dead loop
      in truncate_inode_pages_range()"), which started applying that
      MAX_LFS_FILESIZE limit to block devices too.
      
      NOTE! On 64-bit, the page index isn't a limiter at all, and the limit is
      actually just the offset type itself (loff_t), which is signed.  But for
      clarity, on 64-bit, just use the maximum signed value, and don't make
      people have to count the number of 'f' characters in the hex constant.
      
      So just use LLONG_MAX for the 64-bit case.  That was what the value had
      been before too, just written out as a hex constant.
      
      Fixes: c2a9737f ("vfs,mm: fix a dead loop in truncate_inode_pages_range()")
      Reported-and-tested-by: default avatarDoug Nazar <nazard@nazar.ca>
      Cc: Andreas Dilger <adilger@dilger.ca>
      Cc: Mark Fasheh <mfasheh@versity.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Dave Kleikamp <shaggy@kernel.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      b8fce382
    • Charles Milette's avatar
      staging: rtl8188eu: add RNX-N150NUB support · ab4be3a6
      Charles Milette authored
      commit f299aec6 upstream.
      
      Add support for USB Device Rosewill RNX-N150NUB.
      VendorID: 0x0bda, ProductID: 0xffef
      Signed-off-by: default avatarCharles Milette <charles.milette@gmail.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      ab4be3a6
    • Srinivas Pandruvada's avatar
      iio: hid-sensor-trigger: Fix the race with user space powering up sensors · 23caaf2f
      Srinivas Pandruvada authored
      commit f1664eaa upstream.
      
      It has been reported for a while that with iio-sensor-proxy service the
      rotation only works after one suspend/resume cycle. This required a wait
      in the systemd unit file to avoid race. I found a Yoga 900 where I could
      reproduce this.
      
      The problem scenerio is:
      - During sensor driver init, enable run time PM and also set a
        auto-suspend for 3 seconds.
      	This result in one runtime resume. But there is a check to avoid
      a powerup in this sequence, but rpm is active
      - User space iio-sensor-proxy tries to power up the sensor. Since rpm is
        active it will simply return. But sensors were not actually
      powered up in the prior sequence, so actaully the sensors will not work
      - After 3 seconds the auto suspend kicks
      
      If we add a wait in systemd service file to fire iio-sensor-proxy after
      3 seconds, then now everything will work as the runtime resume will
      actually powerup the sensor as this is a user request.
      
      To avoid this:
      - Remove the check to match user requested state, this will cause a
        brief powerup, but if the iio-sensor-proxy starts immediately it will
      still work as the sensors are ON.
      - Also move the autosuspend delay to place when user requested turn off
        of sensors, like after user finished raw read or buffer disable
      Signed-off-by: default avatarSrinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
      Tested-by: default avatarBastien Nocera <hadess@hadess.net>
      Signed-off-by: default avatarJonathan Cameron <Jonathan.Cameron@huawei.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      23caaf2f
    • Dragos Bogdan's avatar
      iio: imu: adis16480: Fix acceleration scale factor for adis16480 · b150ee06
      Dragos Bogdan authored
      commit fdd0d32e upstream.
      
      According to the datasheet, the range of the acceleration is [-10 g, + 10 g],
      so the scale factor should be 10 instead of 5.
      Signed-off-by: default avatarDragos Bogdan <dragos.bogdan@analog.com>
      Acked-by: default avatarLars-Peter Clausen <lars@metafoo.de>
      Signed-off-by: default avatarJonathan Cameron <Jonathan.Cameron@huawei.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      b150ee06
    • Martijn Coenen's avatar
      ANDROID: binder: fix proc->tsk check. · cbd854d9
      Martijn Coenen authored
      commit b2a6d1b9 upstream.
      
      Commit c4ea41ba ("binder: use group leader instead of open thread")'
      was incomplete and didn't update a check in binder_mmap(), causing all
      mmap() calls into the binder driver to fail.
      Signed-off-by: default avatarMartijn Coenen <maco@android.com>
      Tested-by: default avatarJohn Stultz <john.stultz@linaro.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      cbd854d9
    • Riley Andrews's avatar
      binder: Use wake up hint for synchronous transactions. · 8fb0b0ce
      Riley Andrews authored
      commit 00b40d61 upstream.
      
      Use wake_up_interruptible_sync() to hint to the scheduler binder
      transactions are synchronous wakeups. Disable preemption while waking
      to avoid ping-ponging on the binder lock.
      Signed-off-by: default avatarTodd Kjos <tkjos@google.com>
      Signed-off-by: default avatarOmprakash Dhyade <odhyade@codeaurora.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      8fb0b0ce
    • Todd Kjos's avatar
      binder: use group leader instead of open thread · 51050750
      Todd Kjos authored
      commit c4ea41ba upstream.
      
      The binder allocator assumes that the thread that
      called binder_open will never die for the lifetime of
      that proc. That thread is normally the group_leader,
      however it may not be. Use the group_leader instead
      of current.
      Signed-off-by: default avatarTodd Kjos <tkjos@google.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      51050750
    • Todd Kjos's avatar
      Revert "android: binder: Sanity check at binder ioctl" · eda70a55
      Todd Kjos authored
      commit a2b18708 upstream.
      
      This reverts commit a906d693.
      
      The patch introduced a race in the binder driver. An attempt to fix the
      race was submitted in "[PATCH v2] android: binder: fix dangling pointer
      comparison", however the conclusion in the discussion for that patch
      was that the original patch should be reverted.
      
      The reversion is being done as part of the fine-grained locking
      patchset since the patch would need to be refactored when
      proc->vmm_vm_mm is removed from struct binder_proc and added
      in the binder allocator.
      Signed-off-by: default avatarTodd Kjos <tkjos@google.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      eda70a55
    • Jeffy Chen's avatar
      Bluetooth: bnep: fix possible might sleep error in bnep_session · 242cea2d
      Jeffy Chen authored
      commit 25717382 upstream.
      
      It looks like bnep_session has same pattern as the issue reported in
      old rfcomm:
      
      	while (1) {
      		set_current_state(TASK_INTERRUPTIBLE);
      		if (condition)
      			break;
      		// may call might_sleep here
      		schedule();
      	}
      	__set_current_state(TASK_RUNNING);
      
      Which fixed at:
      	dfb2fae7 Bluetooth: Fix nested sleeps
      
      So let's fix it at the same way, also follow the suggestion of:
      https://lwn.net/Articles/628628/Signed-off-by: default avatarJeffy Chen <jeffy.chen@rock-chips.com>
      Reviewed-by: default avatarBrian Norris <briannorris@chromium.org>
      Reviewed-by: default avatarAL Yu-Chen Cho <acho@suse.com>
      Signed-off-by: default avatarMarcel Holtmann <marcel@holtmann.org>
      Cc: Jiri Slaby <jslaby@suse.cz>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      242cea2d
    • Jeffy Chen's avatar
      Bluetooth: cmtp: fix possible might sleep error in cmtp_session · ffb7640a
      Jeffy Chen authored
      commit f06d9773 upstream.
      
      It looks like cmtp_session has same pattern as the issue reported in
      old rfcomm:
      
      	while (1) {
      		set_current_state(TASK_INTERRUPTIBLE);
      		if (condition)
      			break;
      		// may call might_sleep here
      		schedule();
      	}
      	__set_current_state(TASK_RUNNING);
      
      Which fixed at:
      	dfb2fae7 Bluetooth: Fix nested sleeps
      
      So let's fix it at the same way, also follow the suggestion of:
      https://lwn.net/Articles/628628/Signed-off-by: default avatarJeffy Chen <jeffy.chen@rock-chips.com>
      Reviewed-by: default avatarBrian Norris <briannorris@chromium.org>
      Reviewed-by: default avatarAL Yu-Chen Cho <acho@suse.com>
      Signed-off-by: default avatarMarcel Holtmann <marcel@holtmann.org>
      Cc: Jiri Slaby <jslaby@suse.cz>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      ffb7640a
    • Jeffy Chen's avatar
      Bluetooth: hidp: fix possible might sleep error in hidp_session_thread · 1b5fcb3b
      Jeffy Chen authored
      commit 5da8e47d upstream.
      
      It looks like hidp_session_thread has same pattern as the issue reported in
      old rfcomm:
      
      	while (1) {
      		set_current_state(TASK_INTERRUPTIBLE);
      		if (condition)
      			break;
      		// may call might_sleep here
      		schedule();
      	}
      	__set_current_state(TASK_RUNNING);
      
      Which fixed at:
      	dfb2fae7 Bluetooth: Fix nested sleeps
      
      So let's fix it at the same way, also follow the suggestion of:
      https://lwn.net/Articles/628628/Signed-off-by: default avatarJeffy Chen <jeffy.chen@rock-chips.com>
      Tested-by: default avatarAL Yu-Chen Cho <acho@suse.com>
      Tested-by: default avatarRohit Vaswani <rvaswani@nvidia.com>
      Signed-off-by: default avatarMarcel Holtmann <marcel@holtmann.org>
      Cc: Jiri Slaby <jslaby@suse.cz>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      1b5fcb3b
    • Florian Westphal's avatar
      netfilter: nat: fix src map lookup · 5f81b1f5
      Florian Westphal authored
      commit 97772bcd upstream.
      
      When doing initial conversion to rhashtable I replaced the bucket
      walk with a single rhashtable_lookup_fast().
      
      When moving to rhlist I failed to properly walk the list of identical
      tuples, but that is what is needed for this to work correctly.
      The table contains the original tuples, so the reply tuples are all
      distinct.
      
      We currently decide that mapping is (not) in range only based on the
      first entry, but in case its not we need to try the reply tuple of the
      next entry until we either find an in-range mapping or we checked
      all the entries.
      
      This bug makes nat core attempt collision resolution while it might be
      able to use the mapping as-is.
      
      Fixes: 870190a9 ("netfilter: nat: convert nat bysrc hash to rhashtable")
      Reported-by: default avatarJaco Kroon <jaco@uls.co.za>
      Tested-by: default avatarJaco Kroon <jaco@uls.co.za>
      Signed-off-by: default avatarFlorian Westphal <fw@strlen.de>
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      5f81b1f5
    • Zhang Bo's avatar
      Revert "leds: handle suspend/resume in heartbeat trigger" · 090911a2
      Zhang Bo authored
      commit 436c4c45 upstream.
      
      This reverts commit 5ab92a7c.
      
      System cannot enter suspend mode because of heartbeat led trigger.
      In autosleep_wq, try_to_suspend function will try to enter suspend
      mode in specific period. it will get wakeup_count then call pm_notifier
      chain callback function and freeze processes.
      Heartbeat_pm_notifier is called and it call led_trigger_unregister to
      change the trigger of led device to none. It will send uevent message
      and the wakeup source count changed. As wakeup_count changed, suspend
      will abort.
      
      Fixes: 5ab92a7c ("leds: handle suspend/resume in heartbeat trigger")
      Signed-off-by: default avatarZhang Bo <bo.zhang@nxp.com>
      Acked-by: default avatarPavel Machek <pavel@ucw.cz>
      Reviewed-by: default avatarLinus Walleij <linus.walleij@linaro.org>
      Signed-off-by: default avatarJacek Anaszewski <jacek.anaszewski@gmail.com>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      090911a2
    • Vadim Lomovtsev's avatar
      net: sunrpc: svcsock: fix NULL-pointer exception · d4c5c26c
      Vadim Lomovtsev authored
      commit eebe53e8 upstream.
      
      While running nfs/connectathon tests kernel NULL-pointer exception
      has been observed due to races in svcsock.c.
      
      Race is appear when kernel accepts connection by kernel_accept
      (which creates new socket) and start queuing ingress packets
      to new socket. This happens in ksoftirq context which could run
      concurrently on a different core while new socket setup is not done yet.
      
      The fix is to re-order socket user data init sequence and add
      write/read barrier calls to be sure that we got proper values
      for callback pointers before actually calling them.
      
      Test results: nfs/connectathon reports '0' failed tests for about 200+ iterations.
      
      Crash log:
      ---<-snip->---
      [ 6708.638984] Unable to handle kernel NULL pointer dereference at virtual address 00000000
      [ 6708.647093] pgd = ffff0000094e0000
      [ 6708.650497] [00000000] *pgd=0000010ffff90003, *pud=0000010ffff90003, *pmd=0000010ffff80003, *pte=0000000000000000
      [ 6708.660761] Internal error: Oops: 86000005 [#1] SMP
      [ 6708.665630] Modules linked in: nfsv3 nfnetlink_queue nfnetlink_log nfnetlink rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache overlay xt_CONNSECMARK xt_SECMARK xt_conntrack iptable_security ip_tables ah4 xfrm4_mode_transport sctp tun binfmt_misc ext4 jbd2 mbcache loop tcp_diag udp_diag inet_diag rpcrdma ib_isert iscsi_target_mod ib_iser rdma_cm iw_cm libiscsi scsi_transport_iscsi ib_srpt target_core_mod ib_srp scsi_transport_srp ib_ipoib ib_ucm ib_uverbs ib_umad ib_cm ib_core nls_koi8_u nls_cp932 ts_kmp nf_conntrack_ipv4 nf_defrag_ipv4 nf_conntrack vfat fat ghash_ce sha2_ce sha1_ce cavium_rng_vf i2c_thunderx sg thunderx_edac i2c_smbus edac_core cavium_rng nfsd auth_rpcgss nfs_acl lockd grace sunrpc xfs libcrc32c nicvf nicpf ast i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops
      [ 6708.736446]  ttm drm i2c_core thunder_bgx thunder_xcv mdio_thunder mdio_cavium dm_mirror dm_region_hash dm_log dm_mod [last unloaded: stap_3c300909c5b3f46dcacd49aab3334af_87021]
      [ 6708.752275] CPU: 84 PID: 0 Comm: swapper/84 Tainted: G        W  OE   4.11.0-4.el7.aarch64 #1
      [ 6708.760787] Hardware name: www.cavium.com CRB-2S/CRB-2S, BIOS 0.3 Mar 13 2017
      [ 6708.767910] task: ffff810006842e80 task.stack: ffff81000689c000
      [ 6708.773822] PC is at 0x0
      [ 6708.776739] LR is at svc_data_ready+0x38/0x88 [sunrpc]
      [ 6708.781866] pc : [<0000000000000000>] lr : [<ffff0000029d7378>] pstate: 60000145
      [ 6708.789248] sp : ffff810ffbad3900
      [ 6708.792551] x29: ffff810ffbad3900 x28: ffff000008c73d58
      [ 6708.797853] x27: 0000000000000000 x26: ffff81000bbe1e00
      [ 6708.803156] x25: 0000000000000020 x24: ffff800f7410bf28
      [ 6708.808458] x23: ffff000008c63000 x22: ffff000008c63000
      [ 6708.813760] x21: ffff800f7410bf28 x20: ffff81000bbe1e00
      [ 6708.819063] x19: ffff810012412400 x18: 00000000d82a9df2
      [ 6708.824365] x17: 0000000000000000 x16: 0000000000000000
      [ 6708.829667] x15: 0000000000000000 x14: 0000000000000001
      [ 6708.834969] x13: 0000000000000000 x12: 722e736f622e676e
      [ 6708.840271] x11: 00000000f814dd99 x10: 0000000000000000
      [ 6708.845573] x9 : 7374687225000000 x8 : 0000000000000000
      [ 6708.850875] x7 : 0000000000000000 x6 : 0000000000000000
      [ 6708.856177] x5 : 0000000000000028 x4 : 0000000000000000
      [ 6708.861479] x3 : 0000000000000000 x2 : 00000000e5000000
      [ 6708.866781] x1 : 0000000000000000 x0 : ffff81000bbe1e00
      [ 6708.872084]
      [ 6708.873565] Process swapper/84 (pid: 0, stack limit = 0xffff81000689c000)
      [ 6708.880341] Stack: (0xffff810ffbad3900 to 0xffff8100068a0000)
      [ 6708.886075] Call trace:
      [ 6708.888513] Exception stack(0xffff810ffbad3710 to 0xffff810ffbad3840)
      [ 6708.894942] 3700:                                   ffff810012412400 0001000000000000
      [ 6708.902759] 3720: ffff810ffbad3900 0000000000000000 0000000060000145 ffff800f79300000
      [ 6708.910577] 3740: ffff000009274d00 00000000000003ea 0000000000000015 ffff000008c63000
      [ 6708.918395] 3760: ffff810ffbad3830 ffff800f79300000 000000000000004d 0000000000000000
      [ 6708.926212] 3780: ffff810ffbad3890 ffff0000080f88dc ffff800f79300000 000000000000004d
      [ 6708.934030] 37a0: ffff800f7930093c ffff000008c63000 0000000000000000 0000000000000140
      [ 6708.941848] 37c0: ffff000008c2c000 0000000000040b00 ffff81000bbe1e00 0000000000000000
      [ 6708.949665] 37e0: 00000000e5000000 0000000000000000 0000000000000000 0000000000000028
      [ 6708.957483] 3800: 0000000000000000 0000000000000000 0000000000000000 7374687225000000
      [ 6708.965300] 3820: 0000000000000000 00000000f814dd99 722e736f622e676e 0000000000000000
      [ 6708.973117] [<          (null)>]           (null)
      [ 6708.977824] [<ffff0000086f9fa4>] tcp_data_queue+0x754/0xc5c
      [ 6708.983386] [<ffff0000086fa64c>] tcp_rcv_established+0x1a0/0x67c
      [ 6708.989384] [<ffff000008704120>] tcp_v4_do_rcv+0x15c/0x22c
      [ 6708.994858] [<ffff000008707418>] tcp_v4_rcv+0xaf0/0xb58
      [ 6709.000077] [<ffff0000086df784>] ip_local_deliver_finish+0x10c/0x254
      [ 6709.006419] [<ffff0000086dfea4>] ip_local_deliver+0xf0/0xfc
      [ 6709.011980] [<ffff0000086dfad4>] ip_rcv_finish+0x208/0x3a4
      [ 6709.017454] [<ffff0000086e018c>] ip_rcv+0x2dc/0x3c8
      [ 6709.022328] [<ffff000008692fc8>] __netif_receive_skb_core+0x2f8/0xa0c
      [ 6709.028758] [<ffff000008696068>] __netif_receive_skb+0x38/0x84
      [ 6709.034580] [<ffff00000869611c>] netif_receive_skb_internal+0x68/0xdc
      [ 6709.041010] [<ffff000008696bc0>] napi_gro_receive+0xcc/0x1a8
      [ 6709.046690] [<ffff0000014b0fc4>] nicvf_cq_intr_handler+0x59c/0x730 [nicvf]
      [ 6709.053559] [<ffff0000014b1380>] nicvf_poll+0x38/0xb8 [nicvf]
      [ 6709.059295] [<ffff000008697a6c>] net_rx_action+0x2f8/0x464
      [ 6709.064771] [<ffff000008081824>] __do_softirq+0x11c/0x308
      [ 6709.070164] [<ffff0000080d14e4>] irq_exit+0x12c/0x174
      [ 6709.075206] [<ffff00000813101c>] __handle_domain_irq+0x78/0xc4
      [ 6709.081027] [<ffff000008081608>] gic_handle_irq+0x94/0x190
      [ 6709.086501] Exception stack(0xffff81000689fdf0 to 0xffff81000689ff20)
      [ 6709.092929] fde0:                                   0000810ff2ec0000 ffff000008c10000
      [ 6709.100747] fe00: ffff000008c70ef4 0000000000000001 0000000000000000 ffff810ffbad9b18
      [ 6709.108565] fe20: ffff810ffbad9c70 ffff8100169d3800 ffff810006843ab0 ffff81000689fe80
      [ 6709.116382] fe40: 0000000000000bd0 0000ffffdf979cd0 183f5913da192500 0000ffff8a254ce4
      [ 6709.124200] fe60: 0000ffff8a254b78 0000aaab10339808 0000000000000000 0000ffff8a0c2a50
      [ 6709.132018] fe80: 0000ffffdf979b10 ffff000008d6d450 ffff000008c10000 ffff000008d6d000
      [ 6709.139836] fea0: 0000000000000054 ffff000008cd3dbc 0000000000000000 0000000000000000
      [ 6709.147653] fec0: 0000000000000000 0000000000000000 0000000000000000 ffff81000689ff20
      [ 6709.155471] fee0: ffff000008085240 ffff81000689ff20 ffff000008085244 0000000060000145
      [ 6709.163289] ff00: ffff81000689ff10 ffff00000813f1e4 ffffffffffffffff ffff00000813f238
      [ 6709.171107] [<ffff000008082eb4>] el1_irq+0xb4/0x140
      [ 6709.175976] [<ffff000008085244>] arch_cpu_idle+0x44/0x11c
      [ 6709.181368] [<ffff0000087bf3b8>] default_idle_call+0x20/0x30
      [ 6709.187020] [<ffff000008116d50>] do_idle+0x158/0x1e4
      [ 6709.191973] [<ffff000008116ff4>] cpu_startup_entry+0x2c/0x30
      [ 6709.197624] [<ffff00000808e7cc>] secondary_start_kernel+0x13c/0x160
      [ 6709.203878] [<0000000001bc71c4>] 0x1bc71c4
      [ 6709.207967] Code: bad PC value
      [ 6709.211061] SMP: stopping secondary CPUs
      [ 6709.218830] Starting crashdump kernel...
      [ 6709.222749] Bye!
      ---<-snip>---
      Signed-off-by: default avatarVadim Lomovtsev <vlomovts@redhat.com>
      Reviewed-by: default avatarJeff Layton <jlayton@redhat.com>
      Signed-off-by: default avatarJ. Bruce Fields <bfields@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      d4c5c26c
    • Eric Biggers's avatar
      x86/mm: Fix use-after-free of ldt_struct · 3559de45
      Eric Biggers authored
      commit ccd5b323 upstream.
      
      The following commit:
      
        39a0526f ("x86/mm: Factor out LDT init from context init")
      
      renamed init_new_context() to init_new_context_ldt() and added a new
      init_new_context() which calls init_new_context_ldt().  However, the
      error code of init_new_context_ldt() was ignored.  Consequently, if a
      memory allocation in alloc_ldt_struct() failed during a fork(), the
      ->context.ldt of the new task remained the same as that of the old task
      (due to the memcpy() in dup_mm()).  ldt_struct's are not intended to be
      shared, so a use-after-free occurred after one task exited.
      
      Fix the bug by making init_new_context() pass through the error code of
      init_new_context_ldt().
      
      This bug was found by syzkaller, which encountered the following splat:
      
          BUG: KASAN: use-after-free in free_ldt_struct.part.2+0x10a/0x150 arch/x86/kernel/ldt.c:116
          Read of size 4 at addr ffff88006d2cb7c8 by task kworker/u9:0/3710
      
          CPU: 1 PID: 3710 Comm: kworker/u9:0 Not tainted 4.13.0-rc4-next-20170811 #2
          Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
          Call Trace:
           __dump_stack lib/dump_stack.c:16 [inline]
           dump_stack+0x194/0x257 lib/dump_stack.c:52
           print_address_description+0x73/0x250 mm/kasan/report.c:252
           kasan_report_error mm/kasan/report.c:351 [inline]
           kasan_report+0x24e/0x340 mm/kasan/report.c:409
           __asan_report_load4_noabort+0x14/0x20 mm/kasan/report.c:429
           free_ldt_struct.part.2+0x10a/0x150 arch/x86/kernel/ldt.c:116
           free_ldt_struct arch/x86/kernel/ldt.c:173 [inline]
           destroy_context_ldt+0x60/0x80 arch/x86/kernel/ldt.c:171
           destroy_context arch/x86/include/asm/mmu_context.h:157 [inline]
           __mmdrop+0xe9/0x530 kernel/fork.c:889
           mmdrop include/linux/sched/mm.h:42 [inline]
           exec_mmap fs/exec.c:1061 [inline]
           flush_old_exec+0x173c/0x1ff0 fs/exec.c:1291
           load_elf_binary+0x81f/0x4ba0 fs/binfmt_elf.c:855
           search_binary_handler+0x142/0x6b0 fs/exec.c:1652
           exec_binprm fs/exec.c:1694 [inline]
           do_execveat_common.isra.33+0x1746/0x22e0 fs/exec.c:1816
           do_execve+0x31/0x40 fs/exec.c:1860
           call_usermodehelper_exec_async+0x457/0x8f0 kernel/umh.c:100
           ret_from_fork+0x2a/0x40 arch/x86/entry/entry_64.S:431
      
          Allocated by task 3700:
           save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:59
           save_stack+0x43/0xd0 mm/kasan/kasan.c:447
           set_track mm/kasan/kasan.c:459 [inline]
           kasan_kmalloc+0xad/0xe0 mm/kasan/kasan.c:551
           kmem_cache_alloc_trace+0x136/0x750 mm/slab.c:3627
           kmalloc include/linux/slab.h:493 [inline]
           alloc_ldt_struct+0x52/0x140 arch/x86/kernel/ldt.c:67
           write_ldt+0x7b7/0xab0 arch/x86/kernel/ldt.c:277
           sys_modify_ldt+0x1ef/0x240 arch/x86/kernel/ldt.c:307
           entry_SYSCALL_64_fastpath+0x1f/0xbe
      
          Freed by task 3700:
           save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:59
           save_stack+0x43/0xd0 mm/kasan/kasan.c:447
           set_track mm/kasan/kasan.c:459 [inline]
           kasan_slab_free+0x71/0xc0 mm/kasan/kasan.c:524
           __cache_free mm/slab.c:3503 [inline]
           kfree+0xca/0x250 mm/slab.c:3820
           free_ldt_struct.part.2+0xdd/0x150 arch/x86/kernel/ldt.c:121
           free_ldt_struct arch/x86/kernel/ldt.c:173 [inline]
           destroy_context_ldt+0x60/0x80 arch/x86/kernel/ldt.c:171
           destroy_context arch/x86/include/asm/mmu_context.h:157 [inline]
           __mmdrop+0xe9/0x530 kernel/fork.c:889
           mmdrop include/linux/sched/mm.h:42 [inline]
           __mmput kernel/fork.c:916 [inline]
           mmput+0x541/0x6e0 kernel/fork.c:927
           copy_process.part.36+0x22e1/0x4af0 kernel/fork.c:1931
           copy_process kernel/fork.c:1546 [inline]
           _do_fork+0x1ef/0xfb0 kernel/fork.c:2025
           SYSC_clone kernel/fork.c:2135 [inline]
           SyS_clone+0x37/0x50 kernel/fork.c:2129
           do_syscall_64+0x26c/0x8c0 arch/x86/entry/common.c:287
           return_from_SYSCALL_64+0x0/0x7a
      
      Here is a C reproducer:
      
          #include <asm/ldt.h>
          #include <pthread.h>
          #include <signal.h>
          #include <stdlib.h>
          #include <sys/syscall.h>
          #include <sys/wait.h>
          #include <unistd.h>
      
          static void *fork_thread(void *_arg)
          {
              fork();
          }
      
          int main(void)
          {
              struct user_desc desc = { .entry_number = 8191 };
      
              syscall(__NR_modify_ldt, 1, &desc, sizeof(desc));
      
              for (;;) {
                  if (fork() == 0) {
                      pthread_t t;
      
                      srand(getpid());
                      pthread_create(&t, NULL, fork_thread, NULL);
                      usleep(rand() % 10000);
                      syscall(__NR_exit_group, 0);
                  }
                  wait(NULL);
              }
          }
      
      Note: the reproducer takes advantage of the fact that alloc_ldt_struct()
      may use vmalloc() to allocate a large ->entries array, and after
      commit:
      
        5d17a73a ("vmalloc: back off when the current task is killed")
      
      it is possible for userspace to fail a task's vmalloc() by
      sending a fatal signal, e.g. via exit_group().  It would be more
      difficult to reproduce this bug on kernels without that commit.
      
      This bug only affected kernels with CONFIG_MODIFY_LDT_SYSCALL=y.
      Signed-off-by: default avatarEric Biggers <ebiggers@google.com>
      Acked-by: default avatarDave Hansen <dave.hansen@linux.intel.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-mm@kvack.org
      Fixes: 39a0526f ("x86/mm: Factor out LDT init from context init")
      Link: http://lkml.kernel.org/r/20170824175029.76040-1-ebiggers3@gmail.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      3559de45
    • Nicholas Piggin's avatar
      timers: Fix excessive granularity of new timers after a nohz idle · 70b3fd5c
      Nicholas Piggin authored
      commit 2fe59f50 upstream.
      
      When a timer base is idle, it is forwarded when a new timer is added
      to ensure that granularity does not become excessive. When not idle,
      the timer tick is expected to increment the base.
      
      However there are several problems:
      
      - If an existing timer is modified, the base is forwarded only after
        the index is calculated.
      
      - The base is not forwarded by add_timer_on.
      
      - There is a window after a timer is restarted from a nohz idle, after
        it is marked not-idle and before the timer tick on this CPU, where a
        timer may be added but the ancient base does not get forwarded.
      
      These result in excessive granularity (a 1 jiffy timeout can blow out
      to 100s of jiffies), which cause the rcu lockup detector to trigger,
      among other things.
      
      Fix this by keeping track of whether the timer base has been idle
      since it was last run or forwarded, and if so then forward it before
      adding a new timer.
      
      There is still a case where mod_timer optimises the case of a pending
      timer mod with the same expiry time, where the timer can see excessive
      granularity relative to the new, shorter interval. A comment is added,
      but it's not changed because it is an important fastpath for
      networking.
      
      This has been tested and found to fix the RCU softlockup messages.
      
      Testing was also done with tracing to measure requested versus
      achieved wakeup latencies for all non-deferrable timers in an idle
      system (with no lockup watchdogs running). Wakeup latency relative to
      absolute latency is calculated (note this suffers from round-up skew
      at low absolute times) and analysed:
      
                   max     avg      std
      upstream   506.0    1.20     4.68
      patched      2.0    1.08     0.15
      
      The bug was noticed due to the lockup detector Kconfig changes
      dropping it out of people's .configs and resulting in larger base
      clk skew When the lockup detectors are enabled, no CPU can go idle for
      longer than 4 seconds, which limits the granularity errors.
      Sub-optimal timer behaviour is observable on a smaller scale in that
      case:
      
      	     max     avg      std
      upstream     9.0    1.05     0.19
      patched      2.0    1.04     0.11
      
      Fixes: Fixes: a683f390 ("timers: Forward the wheel clock whenever possible")
      Signed-off-by: default avatarNicholas Piggin <npiggin@gmail.com>
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Tested-by: default avatarJonathan Cameron <Jonathan.Cameron@huawei.com>
      Tested-by: default avatarDavid Miller <davem@davemloft.net>
      Cc: dzickus@redhat.com
      Cc: sfr@canb.auug.org.au
      Cc: mpe@ellerman.id.au
      Cc: Stephen Boyd <sboyd@codeaurora.org>
      Cc: linuxarm@huawei.com
      Cc: abdhalee@linux.vnet.ibm.com
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: akpm@linux-foundation.org
      Cc: paulmck@linux.vnet.ibm.com
      Cc: torvalds@linux-foundation.org
      Link: http://lkml.kernel.org/r/20170822084348.21436-1-npiggin@gmail.comSigned-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      70b3fd5c
    • Thomas Gleixner's avatar
      perf/x86/intel/rapl: Make package handling more robust · 3df3b2ef
      Thomas Gleixner authored
      commit dd86e373 upstream.
      
      The package management code in RAPL relies on package mapping being
      available before a CPU is started. This changed with:
      
        9d85eb91 ("x86/smpboot: Make logical package management more robust")
      
      because the ACPI/BIOS information turned out to be unreliable, but that
      left RAPL in broken state. This was not noticed because on a regular boot
      all CPUs are online before RAPL is initialized.
      
      A possible fix would be to reintroduce the mess which allocates a package
      data structure in CPU prepare and when it turns out to already exist in
      starting throw it away later in the CPU online callback. But that's a
      horrible hack and not required at all because RAPL becomes functional for
      perf only in the CPU online callback. That's correct because user space is
      not yet informed about the CPU being onlined, so nothing caan rely on RAPL
      being available on that particular CPU.
      
      Move the allocation to the CPU online callback and simplify the hotplug
      handling. At this point the package mapping is established and correct.
      
      This also adds a missing check for available package data in the
      event_init() function.
      Reported-by: default avatarYasuaki Ishimatsu <yasu.isimatu@gmail.com>
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Sebastian Siewior <bigeasy@linutronix.de>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Fixes: 9d85eb91 ("x86/smpboot: Make logical package management more robust")
      Link: http://lkml.kernel.org/r/20170131230141.212593966@linutronix.deSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      [ jwang: backport to 4.9 fix Null pointer deref during hotplug cpu.]
      Signed-off-by: default avatarJack Wang <jinpu.wang@profitbricks.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      3df3b2ef
    • Masami Hiramatsu's avatar
      perf probe: Fix --funcs to show correct symbols for offline module · bac83e5c
      Masami Hiramatsu authored
      commit eebc509b upstream.
      
      Fix --funcs (-F) option to show correct symbols for offline module.
      Since previous perf-probe uses machine__findnew_module_map() for offline
      module, even if user passes a module file (with full path) which is for
      other architecture, perf-probe always tries to load symbol map for
      current kernel module.
      
      This fix uses dso__new_map() to load the map from given binary as same
      as a map for user applications.
      Signed-off-by: default avatarMasami Hiramatsu <mhiramat@kernel.org>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/148350053478.19001.15435255244512631545.stgit@devboxSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Krister Johansen <kjlx@templeofstupid.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      bac83e5c
    • Mark Rutland's avatar
      perf/core: Fix group {cpu,task} validation · bde6608d
      Mark Rutland authored
      commit 64aee2a9 upstream.
      
      Regardless of which events form a group, it does not make sense for the
      events to target different tasks and/or CPUs, as this leaves the group
      inconsistent and impossible to schedule. The core perf code assumes that
      these are consistent across (successfully intialised) groups.
      
      Core perf code only verifies this when moving SW events into a HW
      context. Thus, we can violate this requirement for pure SW groups and
      pure HW groups, unless the relevant PMU driver happens to perform this
      verification itself. These mismatched groups subsequently wreak havoc
      elsewhere.
      
      For example, we handle watchpoints as SW events, and reserve watchpoint
      HW on a per-CPU basis at pmu::event_init() time to ensure that any event
      that is initialised is guaranteed to have a slot at pmu::add() time.
      However, the core code only checks the group leader's cpu filter (via
      event_filter_match()), and can thus install follower events onto CPUs
      violating thier (mismatched) CPU filters, potentially installing them
      into a CPU without sufficient reserved slots.
      
      This can be triggered with the below test case, resulting in warnings
      from arch backends.
      
        #define _GNU_SOURCE
        #include <linux/hw_breakpoint.h>
        #include <linux/perf_event.h>
        #include <sched.h>
        #include <stdio.h>
        #include <sys/prctl.h>
        #include <sys/syscall.h>
        #include <unistd.h>
      
        static int perf_event_open(struct perf_event_attr *attr, pid_t pid, int cpu,
      			   int group_fd, unsigned long flags)
        {
      	return syscall(__NR_perf_event_open, attr, pid, cpu, group_fd, flags);
        }
      
        char watched_char;
      
        struct perf_event_attr wp_attr = {
      	.type = PERF_TYPE_BREAKPOINT,
      	.bp_type = HW_BREAKPOINT_RW,
      	.bp_addr = (unsigned long)&watched_char,
      	.bp_len = 1,
      	.size = sizeof(wp_attr),
        };
      
        int main(int argc, char *argv[])
        {
      	int leader, ret;
      	cpu_set_t cpus;
      
      	/*
      	 * Force use of CPU0 to ensure our CPU0-bound events get scheduled.
      	 */
      	CPU_ZERO(&cpus);
      	CPU_SET(0, &cpus);
      	ret = sched_setaffinity(0, sizeof(cpus), &cpus);
      	if (ret) {
      		printf("Unable to set cpu affinity\n");
      		return 1;
      	}
      
      	/* open leader event, bound to this task, CPU0 only */
      	leader = perf_event_open(&wp_attr, 0, 0, -1, 0);
      	if (leader < 0) {
      		printf("Couldn't open leader: %d\n", leader);
      		return 1;
      	}
      
      	/*
      	 * Open a follower event that is bound to the same task, but a
      	 * different CPU. This means that the group should never be possible to
      	 * schedule.
      	 */
      	ret = perf_event_open(&wp_attr, 0, 1, leader, 0);
      	if (ret < 0) {
      		printf("Couldn't open mismatched follower: %d\n", ret);
      		return 1;
      	} else {
      		printf("Opened leader/follower with mismastched CPUs\n");
      	}
      
      	/*
      	 * Open as many independent events as we can, all bound to the same
      	 * task, CPU0 only.
      	 */
      	do {
      		ret = perf_event_open(&wp_attr, 0, 0, -1, 0);
      	} while (ret >= 0);
      
      	/*
      	 * Force enable/disble all events to trigger the erronoeous
      	 * installation of the follower event.
      	 */
      	printf("Opened all events. Toggling..\n");
      	for (;;) {
      		prctl(PR_TASK_PERF_EVENTS_DISABLE, 0, 0, 0, 0);
      		prctl(PR_TASK_PERF_EVENTS_ENABLE, 0, 0, 0, 0);
      	}
      
      	return 0;
        }
      
      Fix this by validating this requirement regardless of whether we're
      moving events.
      Signed-off-by: default avatarMark Rutland <mark.rutland@arm.com>
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Zhou Chengming <zhouchengming1@huawei.com>
      Link: http://lkml.kernel.org/r/1498142498-15758-1-git-send-email-mark.rutland@arm.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      bde6608d