1. 07 Sep, 2017 10 commits
    • Eric Biggers's avatar
      mm, uprobes: fix multiple free of ->uprobes_state.xol_area · 8c46edd7
      Eric Biggers authored
      commit 355627f5 upstream.
      
      Commit 7c051267 ("mm, fork: make dup_mmap wait for mmap_sem for
      write killable") made it possible to kill a forking task while it is
      waiting to acquire its ->mmap_sem for write, in dup_mmap().
      
      However, it was overlooked that this introduced an new error path before
      the new mm_struct's ->uprobes_state.xol_area has been set to NULL after
      being copied from the old mm_struct by the memcpy in dup_mm().  For a
      task that has previously hit a uprobe tracepoint, this resulted in the
      'struct xol_area' being freed multiple times if the task was killed at
      just the right time while forking.
      
      Fix it by setting ->uprobes_state.xol_area to NULL in mm_init() rather
      than in uprobe_dup_mmap().
      
      With CONFIG_UPROBE_EVENTS=y, the bug can be reproduced by the same C
      program given by commit 2b7e8665 ("fork: fix incorrect fput of
      ->exe_file causing use-after-free"), provided that a uprobe tracepoint
      has been set on the fork_thread() function.  For example:
      
          $ gcc reproducer.c -o reproducer -lpthread
          $ nm reproducer | grep fork_thread
          0000000000400719 t fork_thread
          $ echo "p $PWD/reproducer:0x719" > /sys/kernel/debug/tracing/uprobe_events
          $ echo 1 > /sys/kernel/debug/tracing/events/uprobes/enable
          $ ./reproducer
      
      Here is the use-after-free reported by KASAN:
      
          BUG: KASAN: use-after-free in uprobe_clear_state+0x1c4/0x200
          Read of size 8 at addr ffff8800320a8b88 by task reproducer/198
      
          CPU: 1 PID: 198 Comm: reproducer Not tainted 4.13.0-rc7-00015-g36fde05f #255
          Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-20170228_101828-anatol 04/01/2014
          Call Trace:
           dump_stack+0xdb/0x185
           print_address_description+0x7e/0x290
           kasan_report+0x23b/0x350
           __asan_report_load8_noabort+0x19/0x20
           uprobe_clear_state+0x1c4/0x200
           mmput+0xd6/0x360
           do_exit+0x740/0x1670
           do_group_exit+0x13f/0x380
           get_signal+0x597/0x17d0
           do_signal+0x99/0x1df0
           exit_to_usermode_loop+0x166/0x1e0
           syscall_return_slowpath+0x258/0x2c0
           entry_SYSCALL_64_fastpath+0xbc/0xbe
      
          ...
      
          Allocated by task 199:
           save_stack_trace+0x1b/0x20
           kasan_kmalloc+0xfc/0x180
           kmem_cache_alloc_trace+0xf3/0x330
           __create_xol_area+0x10f/0x780
           uprobe_notify_resume+0x1674/0x2210
           exit_to_usermode_loop+0x150/0x1e0
           prepare_exit_to_usermode+0x14b/0x180
           retint_user+0x8/0x20
      
          Freed by task 199:
           save_stack_trace+0x1b/0x20
           kasan_slab_free+0xa8/0x1a0
           kfree+0xba/0x210
           uprobe_clear_state+0x151/0x200
           mmput+0xd6/0x360
           copy_process.part.8+0x605f/0x65d0
           _do_fork+0x1a5/0xbd0
           SyS_clone+0x19/0x20
           do_syscall_64+0x22f/0x660
           return_from_SYSCALL_64+0x0/0x7a
      
      Note: without KASAN, you may instead see a "Bad page state" message, or
      simply a general protection fault.
      
      Link: http://lkml.kernel.org/r/20170830033303.17927-1-ebiggers3@gmail.com
      Fixes: 7c051267 ("mm, fork: make dup_mmap wait for mmap_sem for write killable")
      Signed-off-by: default avatarEric Biggers <ebiggers@google.com>
      Reported-by: default avatarOleg Nesterov <oleg@redhat.com>
      Acked-by: default avatarOleg Nesterov <oleg@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Konstantin Khlebnikov <koct9i@gmail.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      8c46edd7
    • Stephan Mueller's avatar
      crypto: algif_skcipher - only call put_page on referenced and used pages · 726bd348
      Stephan Mueller authored
      commit 445a5827 upstream.
      
      For asynchronous operation, SGs are allocated without a page mapped to
      them or with a page that is not used (ref-counted). If the SGL is freed,
      the code must only call put_page for an SG if there was a page assigned
      and ref-counted in the first place.
      
      This fixes a kernel crash when using io_submit with more than one iocb
      using the sendmsg and sendpage (vmsplice/splice) interface.
      Signed-off-by: default avatarStephan Mueller <smueller@chronox.de>
      Signed-off-by: default avatarHerbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      726bd348
    • Stephen Douthit's avatar
      i2c: ismt: Return EMSGSIZE for block reads with bogus length · 44c6b4a9
      Stephen Douthit authored
      commit ba201c4f upstream.
      
      Compare the number of bytes actually seen on the wire to the byte
      count field returned by the slave device.
      
      Previously we just overwrote the byte count returned by the slave
      with the real byte count and let the caller figure out if the
      message was sane.
      Signed-off-by: default avatarStephen Douthit <stephend@adiengineering.com>
      Tested-by: default avatarDan Priamo <danp@adiengineering.com>
      Acked-by: default avatarNeil Horman <nhorman@tuxdriver.com>
      Signed-off-by: default avatarWolfram Sang <wsa@the-dreams.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      44c6b4a9
    • Stephen Douthit's avatar
      i2c: ismt: Don't duplicate the receive length for block reads · 7a90bfae
      Stephen Douthit authored
      commit b6c159a9 upstream.
      
      According to Table 15-14 of the C2000 EDS (Intel doc #510524) the
      rx data pointed to by the descriptor dptr contains the byte count.
      
      desc->rxbytes reports all bytes read on the wire, including the
      "byte count" byte.  So if a device sends 4 bytes in response to a
      block read, on the wire and in the DMA buffer we see:
      
      count data1 data2 data3 data4
       0x04  0xde  0xad  0xbe  0xef
      
      That's what we want to return in data->block to the next level.
      
      Instead we were actually prefixing that with desc->rxbytes:
      
      bad
      count count data1 data2 data3 data4
       0x05  0x04  0xde  0xad  0xbe  0xef
      
      This was discovered while developing a BMC solution relying on the
      ipmi_ssif.c driver which was trying to interpret the bogus length
      field as part of the IPMI response.
      Signed-off-by: default avatarStephen Douthit <stephend@adiengineering.com>
      Tested-by: default avatarDan Priamo <danp@adiengineering.com>
      Acked-by: default avatarNeil Horman <nhorman@tuxdriver.com>
      Signed-off-by: default avatarWolfram Sang <wsa@the-dreams.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      7a90bfae
    • Ard Biesheuvel's avatar
      crypto: chacha20 - fix handling of chunked input · 6fb972d0
      Ard Biesheuvel authored
      commit 4de43726 upstream.
      
      Commit 9ae433bc ("crypto: chacha20 - convert generic and x86 versions
      to skcipher") ported the existing chacha20 code to use the new skcipher
      API, and introduced a bug along the way. Unfortunately, the tcrypt tests
      did not catch the error, and it was only found recently by Tobias.
      
      Stefan kindly diagnosed the error, and proposed a fix which is similar
      to the one below, with the exception that 'walk.stride' is used rather
      than the hardcoded block size. This does not actually matter in this
      case, but it's a better example of how to use the skcipher walk API.
      
      Fixes: 9ae433bc ("crypto: chacha20 - convert generic and x86 ...")
      Cc: Steffen Klassert <steffen.klassert@secunet.com>
      Reported-by: default avatarTobias Brunner <tobias@strongswan.org>
      Signed-off-by: default avatarArd Biesheuvel <ard.biesheuvel@linaro.org>
      Signed-off-by: default avatarHerbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      6fb972d0
    • Cameron Gutman's avatar
      Input: xpad - fix PowerA init quirk for some gamepad models · 6b31ae87
      Cameron Gutman authored
      commit f5308d1b upstream.
      
      The PowerA gamepad initialization quirk worked with the PowerA
      wired gamepad I had around (0x24c6:0x543a), but a user reported [0]
      that it didn't work for him, even though our gamepads shared the
      same vendor and product IDs.
      
      When I initially implemented the PowerA quirk, I wanted to avoid
      actually triggering the rumble action during init. My tests showed
      that my gamepad would work correctly even if it received a rumble
      of 0 intensity, so that's what I went with.
      
      Unfortunately, this apparently isn't true for all models (perhaps
      a firmware difference?). This non-working gamepad seems to require
      the real magic rumble packet that the Microsoft driver sends, which
      actually vibrates the gamepad. To counteract this effect, I still
      send the old zero-rumble PowerA quirk packet which cancels the
      rumble effect before the motors can spin up enough to vibrate.
      
      [0]: https://github.com/paroj/xpad/issues/48#issuecomment-313904867Reported-by: default avatarKyle Beauchamp <kyleabeauchamp@gmail.com>
      Tested-by: default avatarKyle Beauchamp <kyleabeauchamp@gmail.com>
      Fixes: 81093c98 ("Input: xpad - support some quirky Xbox One pads")
      Signed-off-by: default avatarCameron Gutman <aicommander@gmail.com>
      Signed-off-by: default avatarDmitry Torokhov <dmitry.torokhov@gmail.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      6b31ae87
    • Anthony Martin's avatar
      Input: synaptics - fix device info appearing different on reconnect · 2ed56448
      Anthony Martin authored
      commit 3f9db52d upstream.
      
      User-modified input settings no longer survive a suspend/resume cycle.
      Starting with 4.12, the touchpad is reinitialized on every reconnect
      because the hardware appears to be different. This can be reproduced
      by running the following as root:
      
          echo -n reconnect >/sys/devices/platform/i8042/serio1/drvctl
      
      A line like the following will show up in dmesg:
      
          [30378.295794] psmouse serio1: synaptics: hardware appears to be
                         different: id(149271-149271), model(114865-114865),
                         caps(d047b3-d047b1), ext(b40000-b40000).
      
      Note the single bit difference in caps: bit 1 (SYN_CAP_MULTIFINGER).
      
      This happens because we modify our stored copy of the device info
      capabilities when we enable advanced gesture mode but this change is
      not reflected in the actual hardware capabilities.
      
      It worked in the past because synaptics_query_hardware used to modify
      the stored synaptics_device_info struct instead of filling in a new
      one, as it does now.
      
      Fix it by no longer faking the SYN_CAP_MULTIFINGER bit when setting
      advanced gesture mode. This necessitated a small refactoring.
      
      Fixes: 6c53694f ("Input: synaptics - split device info into a separate structure")
      Signed-off-by: default avatarAnthony Martin <ality@pbrane.org>
      Signed-off-by: default avatarDmitry Torokhov <dmitry.torokhov@gmail.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      2ed56448
    • James Hogan's avatar
      irqchip: mips-gic: SYNC after enabling GIC region · 55a5a10c
      James Hogan authored
      commit 2c0e8382 upstream.
      
      A SYNC is required between enabling the GIC region and actually trying
      to use it, even if the first access is a read, otherwise its possible
      depending on the timing (and in my case depending on the precise
      alignment of certain kernel code) to hit CM bus errors on that first
      access.
      
      Add the SYNC straight after setting the GIC base.
      
      [paul.burton@imgtec.com:
        Changes later in this series increase our likelihood of hitting this
        by reducing the amount of code that runs between enabling the GIC &
        accessing it.]
      
      Fixes: a7057270 ("irqchip: mips-gic: Add device-tree support")
      Signed-off-by: default avatarJames Hogan <james.hogan@imgtec.com>
      Signed-off-by: default avatarPaul Burton <paul.burton@imgtec.com>
      Acked-by: default avatarMarc Zyngier <marc.zyngier@arm.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Jason Cooper <jason@lakedaemon.net>
      Cc: James Hogan <james.hogan@imgtec.com>
      Cc: linux-kernel@vger.kernel.org
      Cc: linux-mips@linux-mips.org
      Patchwork: https://patchwork.linux-mips.org/patch/17019/Signed-off-by: default avatarRalf Baechle <ralf@linux-mips.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      55a5a10c
    • Arnd Bergmann's avatar
      x86/io: Add "memory" clobber to insb/insw/insl/outsb/outsw/outsl · 69120897
      Arnd Bergmann authored
      commit 7206f9bf upstream.
      
      The x86 version of insb/insw/insl uses an inline assembly that does
      not have the target buffer listed as an output. This can confuse
      the compiler, leading it to think that a subsequent access of the
      buffer is uninitialized:
      
        drivers/net/wireless/wl3501_cs.c: In function ‘wl3501_mgmt_scan_confirm’:
        drivers/net/wireless/wl3501_cs.c:665:9: error: ‘sig.status’ is used uninitialized in this function [-Werror=uninitialized]
        drivers/net/wireless/wl3501_cs.c:668:12: error: ‘sig.cap_info’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
        drivers/net/sb1000.c: In function 'sb1000_rx':
        drivers/net/sb1000.c:775:9: error: 'st[0]' is used uninitialized in this function [-Werror=uninitialized]
        drivers/net/sb1000.c:776:10: error: 'st[1]' may be used uninitialized in this function [-Werror=maybe-uninitialized]
        drivers/net/sb1000.c:784:11: error: 'st[1]' may be used uninitialized in this function [-Werror=maybe-uninitialized]
      
      I tried to mark the exact input buffer as an output here, but couldn't
      figure it out. As suggested by Linus, marking all memory as clobbered
      however is good enough too. For the outs operations, I also add the
      memory clobber, to force the input to be written to local variables.
      This is probably already guaranteed by the "asm volatile", but it can't
      hurt to do this for symmetry.
      Suggested-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarArnd Bergmann <arnd@arndb.de>
      Acked-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tom Lendacky <thomas.lendacky@amd.com>
      Link: http://lkml.kernel.org/r/20170719125310.2487451-5-arnd@arndb.de
      Link: https://lkml.org/lkml/2017/7/12/605Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      69120897
    • Mark Rutland's avatar
      arm64: mm: abort uaccess retries upon fatal signal · 34ed3508
      Mark Rutland authored
      commit 289d07a2 upstream.
      
      When there's a fatal signal pending, arm64's do_page_fault()
      implementation returns 0. The intent is that we'll return to the
      faulting userspace instruction, delivering the signal on the way.
      
      However, if we take a fatal signal during fixing up a uaccess, this
      results in a return to the faulting kernel instruction, which will be
      instantly retried, resulting in the same fault being taken forever. As
      the task never reaches userspace, the signal is not delivered, and the
      task is left unkillable. While the task is stuck in this state, it can
      inhibit the forward progress of the system.
      
      To avoid this, we must ensure that when a fatal signal is pending, we
      apply any necessary fixup for a faulting kernel instruction. Thus we
      will return to an error path, and it is up to that code to make forward
      progress towards delivering the fatal signal.
      
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Laura Abbott <labbott@redhat.com>
      Reviewed-by: default avatarSteve Capper <steve.capper@arm.com>
      Tested-by: default avatarSteve Capper <steve.capper@arm.com>
      Reviewed-by: default avatarJames Morse <james.morse@arm.com>
      Tested-by: default avatarJames Morse <james.morse@arm.com>
      Signed-off-by: default avatarMark Rutland <mark.rutland@arm.com>
      Signed-off-by: default avatarWill Deacon <will.deacon@arm.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      34ed3508
  2. 30 Aug, 2017 30 commits
    • Greg Kroah-Hartman's avatar
      Linux 4.12.10 · 6371f030
      Greg Kroah-Hartman authored
      6371f030
    • Benjamin Herrenschmidt's avatar
      powerpc/mm: Ensure cpumask update is ordered · 849e9675
      Benjamin Herrenschmidt authored
      commit 1a92a80a upstream.
      
      There is no guarantee that the various isync's involved with
      the context switch will order the update of the CPU mask with
      the first TLB entry for the new context being loaded by the HW.
      
      Be safe here and add a memory barrier to order any subsequent
      load/store which may bring entries into the TLB.
      
      The corresponding barrier on the other side already exists as
      pte updates use pte_xchg() which uses __cmpxchg_u64 which has
      a sync after the atomic operation.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarBenjamin Herrenschmidt <benh@kernel.crashing.org>
      Reviewed-by: default avatarNicholas Piggin <npiggin@gmail.com>
      [mpe: Add comments in the code]
      [mpe: Backport to 4.12, minor context change]
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      849e9675
    • Lv Zheng's avatar
      ACPI: EC: Fix regression related to wrong ECDT initialization order · 53220a20
      Lv Zheng authored
      commit 98529b92 upstream.
      
      Commit 2a570840 (ACPI / EC: Fix a gap that ECDT EC cannot handle
      EC events) introduced acpi_ec_ecdt_start(), but that function is
      invoked before acpi_ec_query_init(), which is too early.  This causes
      the kernel to crash if an EC event occurs after boot, when ec_query_wq
      is not valid:
      
       BUG: unable to handle kernel NULL pointer dereference at 0000000000000102
       ...
       Workqueue: events acpi_ec_event_handler
       task: ffff9f539790dac0 task.stack: ffffb437c0e10000
       RIP: 0010:__queue_work+0x32/0x430
      
      Normally, the DSDT EC should always be valid, so acpi_ec_ecdt_start()
      is actually a no-op in the majority of cases.  However, commit
      c712bb58 (ACPI / EC: Add support to skip boot stage DSDT probe)
      caused the probing of the DSDT EC as the "boot EC" to be skipped when
      the ECDT EC is valid and uncovered the bug.
      
      Fix this issue by invoking acpi_ec_ecdt_start() after acpi_ec_query_init()
      in acpi_ec_init().
      
      Link: https://jira01.devtools.intel.com/browse/LCK-4348
      Fixes: 2a570840 (ACPI / EC: Fix a gap that ECDT EC cannot handle EC events)
      Fixes: c712bb58 (ACPI / EC: Add support to skip boot stage DSDT probe)
      Reported-by: default avatarWang Wendy <wendy.wang@intel.com>
      Tested-by: default avatarFeng Chenzhou <chenzhoux.feng@intel.com>
      Signed-off-by: default avatarLv Zheng <lv.zheng@intel.com>
      [ rjw: Changelog ]
      Signed-off-by: default avatarRafael J. Wysocki <rafael.j.wysocki@intel.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      53220a20
    • Hanjun Guo's avatar
      ACPI: APD: Fix HID for Hisilicon Hip07/08 · 6e80b88a
      Hanjun Guo authored
      commit f7f3dd5b upstream.
      
      ACPI HID for Hisilicon Hip07/08 should be HISI02A1/2,
      not HISI0A21/2, HISI02A1/2 was tested ok but was modified
      by the stupid typo when upstream the patches (by me),
      correct them to the right IDs (matching the IDs in
      drivers/i2c/busses/i2c-designware-platdrv.c).
      
      Fixes: 6e14cf36 (ACPI / APD: Add clock frequency for Hisilicon Hip07/08 I2C controller)
      Reported-by: default avatarTao Tian <tiantao6@huawei.com>
      Signed-off-by: default avatarHanjun Guo <hanjun.guo@linaro.org>
      Signed-off-by: default avatarRafael J. Wysocki <rafael.j.wysocki@intel.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      6e80b88a
    • Dave Jiang's avatar
      ntb: transport shouldn't disable link due to bogus values in SPADs · 49fa8c02
      Dave Jiang authored
      commit f3fd2afe upstream.
      
      It seems that under certain scenarios the SPAD can have bogus values caused
      by an agent (i.e. BIOS or other software) that is not the kernel driver, and
      that causes memory window setup failure. This should not cause the link to
      be disabled because if we do that, the driver will never recover again. We
      have verified in testing that this issue happens and prevents proper link
      recovery.
      Signed-off-by: default avatarDave Jiang <dave.jiang@intel.com>
      Acked-by: default avatarAllen Hubbe <Allen.Hubbe@dell.com>
      Signed-off-by: default avatarJon Mason <jdmason@kudzu.us>
      Fixes: 84f76685 ("ntb: stop link work when we do not have memory")
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      49fa8c02
    • Logan Gunthorpe's avatar
      ntb: ntb_test: ensure the link is up before trying to configure the mws · ab75f027
      Logan Gunthorpe authored
      commit 0eb46345 upstream.
      
      After the link tests, there is a race on one side of the test for
      the link coming up. It's possible, in some cases, for the test script
      to write to the 'peer_trans' files before the link has come up.
      
      To fix this, we simply use the link event file to ensure both sides
      see the link as up before continuning.
      Signed-off-by: default avatarLogan Gunthorpe <logang@deltatee.com>
      Acked-by: default avatarAllen Hubbe <Allen.Hubbe@dell.com>
      Signed-off-by: default avatarJon Mason <jdmason@kudzu.us>
      Fixes: a9c59ef7 ("ntb_test: Add a selftest script for the NTB subsystem")
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      ab75f027
    • Linus Torvalds's avatar
      Clarify (and fix) MAX_LFS_FILESIZE macros · 03e58884
      Linus Torvalds authored
      commit 0cc3b0ec upstream.
      
      We have a MAX_LFS_FILESIZE macro that is meant to be filled in by
      filesystems (and other IO targets) that know they are 64-bit clean and
      don't have any 32-bit limits in their IO path.
      
      It turns out that our 32-bit value for that limit was bogus.  On 32-bit,
      the VM layer is limited by the page cache to only 32-bit index values,
      but our logic for that was confusing and actually wrong.  We used to
      define that value to
      
      	(((loff_t)PAGE_SIZE << (BITS_PER_LONG-1))-1)
      
      which is actually odd in several ways: it limits the index to 31 bits,
      and then it limits files so that they can't have data in that last byte
      of a page that has the highest 31-bit index (ie page index 0x7fffffff).
      
      Neither of those limitations make sense.  The index is actually the full
      32 bit unsigned value, and we can use that whole full page.  So the
      maximum size of the file would logically be "PAGE_SIZE << BITS_PER_LONG".
      
      However, we do wan tto avoid the maximum index, because we have code
      that iterates over the page indexes, and we don't want that code to
      overflow.  So the maximum size of a file on a 32-bit host should
      actually be one page less than the full 32-bit index.
      
      So the actual limit is ULONG_MAX << PAGE_SHIFT.  That means that we will
      not actually be using the page of that last index (ULONG_MAX), but we
      can grow a file up to that limit.
      
      The wrong value of MAX_LFS_FILESIZE actually caused problems for Doug
      Nazar, who was still using a 32-bit host, but with a 9.7TB 2 x RAID5
      volume.  It turns out that our old MAX_LFS_FILESIZE was 8TiB (well, one
      byte less), but the actual true VM limit is one page less than 16TiB.
      
      This was invisible until commit c2a9737f ("vfs,mm: fix a dead loop
      in truncate_inode_pages_range()"), which started applying that
      MAX_LFS_FILESIZE limit to block devices too.
      
      NOTE! On 64-bit, the page index isn't a limiter at all, and the limit is
      actually just the offset type itself (loff_t), which is signed.  But for
      clarity, on 64-bit, just use the maximum signed value, and don't make
      people have to count the number of 'f' characters in the hex constant.
      
      So just use LLONG_MAX for the 64-bit case.  That was what the value had
      been before too, just written out as a hex constant.
      
      Fixes: c2a9737f ("vfs,mm: fix a dead loop in truncate_inode_pages_range()")
      Reported-and-tested-by: default avatarDoug Nazar <nazard@nazar.ca>
      Cc: Andreas Dilger <adilger@dilger.ca>
      Cc: Mark Fasheh <mfasheh@versity.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Dave Kleikamp <shaggy@kernel.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      03e58884
    • Joerg Roedel's avatar
      iommu: Fix wrong freeing of iommu_device->dev · 0b9a3f30
      Joerg Roedel authored
      commit 2926a2aa upstream.
      
      The struct iommu_device has a 'struct device' embedded into
      it, not as a pointer, but the whole struct. In the
      conversion of the iommu drivers to use struct iommu_device
      it was forgotten that the relase function for that struct
      device simply calls kfree() on the pointer.
      
      This frees memory that was never allocated and causes memory
      corruption.
      
      To fix this issue, use a pointer to struct device instead of
      embedding the whole struct. This needs some updates in the
      iommu sysfs code as well as the Intel VT-d and AMD IOMMU
      driver.
      Reported-by: default avatarSebastian Ott <sebott@linux.vnet.ibm.com>
      Fixes: 39ab9555 ('iommu: Add sysfs bindings for struct iommu_device')
      Signed-off-by: default avatarJoerg Roedel <jroedel@suse.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      0b9a3f30
    • Charles Milette's avatar
      staging: rtl8188eu: add RNX-N150NUB support · 75005bf8
      Charles Milette authored
      commit f299aec6 upstream.
      
      Add support for USB Device Rosewill RNX-N150NUB.
      VendorID: 0x0bda, ProductID: 0xffef
      Signed-off-by: default avatarCharles Milette <charles.milette@gmail.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      75005bf8
    • Lorenzo Bianconi's avatar
      iio: magnetometer: st_magn: remove ihl property for LSM303AGR · 91628e2a
      Lorenzo Bianconi authored
      commit 8b35a5f8 upstream.
      
      Remove IRQ active low support for LSM303AGR since the sensor does not
      support that capability for data-ready line
      
      Fixes: a9fd053b (iio: st_sensors: support active-low interrupts)
      Signed-off-by: default avatarLorenzo Bianconi <lorenzo.bianconi@st.com>
      Reviewed-by: default avatarLinus Walleij <linus.walleij@linaro.org>
      Signed-off-by: default avatarJonathan Cameron <Jonathan.Cameron@huawei.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      91628e2a
    • Lorenzo Bianconi's avatar
      iio: magnetometer: st_magn: fix status register address for LSM303AGR · e59c095c
      Lorenzo Bianconi authored
      commit 541ee9b2 upstream.
      
      Fixes: 97865fe4 (iio: st_sensors: verify interrupt event to status)
      Signed-off-by: default avatarLorenzo Bianconi <lorenzo.bianconi@st.com>
      Reviewed-by: default avatarLinus Walleij <linus.walleij@linaro.org>
      Signed-off-by: default avatarJonathan Cameron <Jonathan.Cameron@huawei.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      e59c095c
    • Srinivas Pandruvada's avatar
      iio: hid-sensor-trigger: Fix the race with user space powering up sensors · fc7957b6
      Srinivas Pandruvada authored
      commit f1664eaa upstream.
      
      It has been reported for a while that with iio-sensor-proxy service the
      rotation only works after one suspend/resume cycle. This required a wait
      in the systemd unit file to avoid race. I found a Yoga 900 where I could
      reproduce this.
      
      The problem scenerio is:
      - During sensor driver init, enable run time PM and also set a
        auto-suspend for 3 seconds.
      	This result in one runtime resume. But there is a check to avoid
      a powerup in this sequence, but rpm is active
      - User space iio-sensor-proxy tries to power up the sensor. Since rpm is
        active it will simply return. But sensors were not actually
      powered up in the prior sequence, so actaully the sensors will not work
      - After 3 seconds the auto suspend kicks
      
      If we add a wait in systemd service file to fire iio-sensor-proxy after
      3 seconds, then now everything will work as the runtime resume will
      actually powerup the sensor as this is a user request.
      
      To avoid this:
      - Remove the check to match user requested state, this will cause a
        brief powerup, but if the iio-sensor-proxy starts immediately it will
      still work as the sensors are ON.
      - Also move the autosuspend delay to place when user requested turn off
        of sensors, like after user finished raw read or buffer disable
      Signed-off-by: default avatarSrinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
      Tested-by: default avatarBastien Nocera <hadess@hadess.net>
      Signed-off-by: default avatarJonathan Cameron <Jonathan.Cameron@huawei.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      fc7957b6
    • Dragos Bogdan's avatar
      iio: imu: adis16480: Fix acceleration scale factor for adis16480 · a1d7b7e7
      Dragos Bogdan authored
      commit fdd0d32e upstream.
      
      According to the datasheet, the range of the acceleration is [-10 g, + 10 g],
      so the scale factor should be 10 instead of 5.
      Signed-off-by: default avatarDragos Bogdan <dragos.bogdan@analog.com>
      Acked-by: default avatarLars-Peter Clausen <lars@metafoo.de>
      Signed-off-by: default avatarJonathan Cameron <Jonathan.Cameron@huawei.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      a1d7b7e7
    • Martijn Coenen's avatar
      ANDROID: binder: fix proc->tsk check. · bf9b9d3b
      Martijn Coenen authored
      commit b2a6d1b9 upstream.
      
      Commit c4ea41ba ("binder: use group leader instead of open thread")'
      was incomplete and didn't update a check in binder_mmap(), causing all
      mmap() calls into the binder driver to fail.
      Signed-off-by: default avatarMartijn Coenen <maco@android.com>
      Tested-by: default avatarJohn Stultz <john.stultz@linaro.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      bf9b9d3b
    • Riley Andrews's avatar
      binder: Use wake up hint for synchronous transactions. · f6fc60d9
      Riley Andrews authored
      commit 00b40d61 upstream.
      
      Use wake_up_interruptible_sync() to hint to the scheduler binder
      transactions are synchronous wakeups. Disable preemption while waking
      to avoid ping-ponging on the binder lock.
      Signed-off-by: default avatarTodd Kjos <tkjos@google.com>
      Signed-off-by: default avatarOmprakash Dhyade <odhyade@codeaurora.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f6fc60d9
    • Todd Kjos's avatar
      binder: use group leader instead of open thread · 7771e3f4
      Todd Kjos authored
      commit c4ea41ba upstream.
      
      The binder allocator assumes that the thread that
      called binder_open will never die for the lifetime of
      that proc. That thread is normally the group_leader,
      however it may not be. Use the group_leader instead
      of current.
      Signed-off-by: default avatarTodd Kjos <tkjos@google.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      7771e3f4
    • Todd Kjos's avatar
      Revert "android: binder: Sanity check at binder ioctl" · 62ccb816
      Todd Kjos authored
      commit a2b18708 upstream.
      
      This reverts commit a906d693.
      
      The patch introduced a race in the binder driver. An attempt to fix the
      race was submitted in "[PATCH v2] android: binder: fix dangling pointer
      comparison", however the conclusion in the discussion for that patch
      was that the original patch should be reverted.
      
      The reversion is being done as part of the fine-grained locking
      patchset since the patch would need to be refactored when
      proc->vmm_vm_mm is removed from struct binder_proc and added
      in the binder allocator.
      Signed-off-by: default avatarTodd Kjos <tkjos@google.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      62ccb816
    • Jeffy Chen's avatar
      Bluetooth: bnep: fix possible might sleep error in bnep_session · b42c44ad
      Jeffy Chen authored
      commit 25717382 upstream.
      
      It looks like bnep_session has same pattern as the issue reported in
      old rfcomm:
      
      	while (1) {
      		set_current_state(TASK_INTERRUPTIBLE);
      		if (condition)
      			break;
      		// may call might_sleep here
      		schedule();
      	}
      	__set_current_state(TASK_RUNNING);
      
      Which fixed at:
      	dfb2fae7 Bluetooth: Fix nested sleeps
      
      So let's fix it at the same way, also follow the suggestion of:
      https://lwn.net/Articles/628628/Signed-off-by: default avatarJeffy Chen <jeffy.chen@rock-chips.com>
      Reviewed-by: default avatarBrian Norris <briannorris@chromium.org>
      Reviewed-by: default avatarAL Yu-Chen Cho <acho@suse.com>
      Signed-off-by: default avatarMarcel Holtmann <marcel@holtmann.org>
      Cc: Jiri Slaby <jslaby@suse.cz>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      b42c44ad
    • Jeffy Chen's avatar
      Bluetooth: cmtp: fix possible might sleep error in cmtp_session · b7418962
      Jeffy Chen authored
      commit f06d9773 upstream.
      
      It looks like cmtp_session has same pattern as the issue reported in
      old rfcomm:
      
      	while (1) {
      		set_current_state(TASK_INTERRUPTIBLE);
      		if (condition)
      			break;
      		// may call might_sleep here
      		schedule();
      	}
      	__set_current_state(TASK_RUNNING);
      
      Which fixed at:
      	dfb2fae7 Bluetooth: Fix nested sleeps
      
      So let's fix it at the same way, also follow the suggestion of:
      https://lwn.net/Articles/628628/Signed-off-by: default avatarJeffy Chen <jeffy.chen@rock-chips.com>
      Reviewed-by: default avatarBrian Norris <briannorris@chromium.org>
      Reviewed-by: default avatarAL Yu-Chen Cho <acho@suse.com>
      Signed-off-by: default avatarMarcel Holtmann <marcel@holtmann.org>
      Cc: Jiri Slaby <jslaby@suse.cz>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      b7418962
    • Jeffy Chen's avatar
      Bluetooth: hidp: fix possible might sleep error in hidp_session_thread · e792d2d4
      Jeffy Chen authored
      commit 5da8e47d upstream.
      
      It looks like hidp_session_thread has same pattern as the issue reported in
      old rfcomm:
      
      	while (1) {
      		set_current_state(TASK_INTERRUPTIBLE);
      		if (condition)
      			break;
      		// may call might_sleep here
      		schedule();
      	}
      	__set_current_state(TASK_RUNNING);
      
      Which fixed at:
      	dfb2fae7 Bluetooth: Fix nested sleeps
      
      So let's fix it at the same way, also follow the suggestion of:
      https://lwn.net/Articles/628628/Signed-off-by: default avatarJeffy Chen <jeffy.chen@rock-chips.com>
      Tested-by: default avatarAL Yu-Chen Cho <acho@suse.com>
      Tested-by: default avatarRohit Vaswani <rvaswani@nvidia.com>
      Signed-off-by: default avatarMarcel Holtmann <marcel@holtmann.org>
      Cc: Jiri Slaby <jslaby@suse.cz>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      e792d2d4
    • Mateusz Jurczyk's avatar
      netfilter: nfnetlink: Improve input length sanitization in nfnetlink_rcv · 1eb33a1b
      Mateusz Jurczyk authored
      commit f55ce7b0 upstream.
      
      Verify that the length of the socket buffer is sufficient to cover the
      nlmsghdr structure before accessing the nlh->nlmsg_len field for further
      input sanitization. If the client only supplies 1-3 bytes of data in
      sk_buff, then nlh->nlmsg_len remains partially uninitialized and
      contains leftover memory from the corresponding kernel allocation.
      Operating on such data may result in indeterminate evaluation of the
      nlmsg_len < NLMSG_HDRLEN expression.
      
      The bug was discovered by a runtime instrumentation designed to detect
      use of uninitialized memory in the kernel. The patch prevents this and
      other similar tools (e.g. KMSAN) from flagging this behavior in the future.
      Signed-off-by: default avatarMateusz Jurczyk <mjurczyk@google.com>
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      Cc: Florian Westphal <fw@strlen.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      1eb33a1b
    • Florian Westphal's avatar
      netfilter: nat: fix src map lookup · 8b504107
      Florian Westphal authored
      commit 97772bcd upstream.
      
      When doing initial conversion to rhashtable I replaced the bucket
      walk with a single rhashtable_lookup_fast().
      
      When moving to rhlist I failed to properly walk the list of identical
      tuples, but that is what is needed for this to work correctly.
      The table contains the original tuples, so the reply tuples are all
      distinct.
      
      We currently decide that mapping is (not) in range only based on the
      first entry, but in case its not we need to try the reply tuple of the
      next entry until we either find an in-range mapping or we checked
      all the entries.
      
      This bug makes nat core attempt collision resolution while it might be
      able to use the mapping as-is.
      
      Fixes: 870190a9 ("netfilter: nat: convert nat bysrc hash to rhashtable")
      Reported-by: default avatarJaco Kroon <jaco@uls.co.za>
      Tested-by: default avatarJaco Kroon <jaco@uls.co.za>
      Signed-off-by: default avatarFlorian Westphal <fw@strlen.de>
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      8b504107
    • Florian Westphal's avatar
      netfilter: expect: fix crash when putting uninited expectation · f5263887
      Florian Westphal authored
      commit 36ac344e upstream.
      
      We crash in __nf_ct_expect_check, it calls nf_ct_remove_expect on the
      uninitialised expectation instead of existing one, so del_timer chokes
      on random memory address.
      
      Fixes: ec0e3f01 ("netfilter: nf_ct_expect: Add nf_ct_remove_expect()")
      Reported-by: default avatarSergey Kvachonok <ravenexp@gmail.com>
      Tested-by: default avatarSergey Kvachonok <ravenexp@gmail.com>
      Cc: Gao Feng <fgao@ikuai8.com>
      Signed-off-by: default avatarFlorian Westphal <fw@strlen.de>
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f5263887
    • Vadim Lomovtsev's avatar
      net: sunrpc: svcsock: fix NULL-pointer exception · 4909a7b7
      Vadim Lomovtsev authored
      commit eebe53e8 upstream.
      
      While running nfs/connectathon tests kernel NULL-pointer exception
      has been observed due to races in svcsock.c.
      
      Race is appear when kernel accepts connection by kernel_accept
      (which creates new socket) and start queuing ingress packets
      to new socket. This happens in ksoftirq context which could run
      concurrently on a different core while new socket setup is not done yet.
      
      The fix is to re-order socket user data init sequence and add
      write/read barrier calls to be sure that we got proper values
      for callback pointers before actually calling them.
      
      Test results: nfs/connectathon reports '0' failed tests for about 200+ iterations.
      
      Crash log:
      ---<-snip->---
      [ 6708.638984] Unable to handle kernel NULL pointer dereference at virtual address 00000000
      [ 6708.647093] pgd = ffff0000094e0000
      [ 6708.650497] [00000000] *pgd=0000010ffff90003, *pud=0000010ffff90003, *pmd=0000010ffff80003, *pte=0000000000000000
      [ 6708.660761] Internal error: Oops: 86000005 [#1] SMP
      [ 6708.665630] Modules linked in: nfsv3 nfnetlink_queue nfnetlink_log nfnetlink rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache overlay xt_CONNSECMARK xt_SECMARK xt_conntrack iptable_security ip_tables ah4 xfrm4_mode_transport sctp tun binfmt_misc ext4 jbd2 mbcache loop tcp_diag udp_diag inet_diag rpcrdma ib_isert iscsi_target_mod ib_iser rdma_cm iw_cm libiscsi scsi_transport_iscsi ib_srpt target_core_mod ib_srp scsi_transport_srp ib_ipoib ib_ucm ib_uverbs ib_umad ib_cm ib_core nls_koi8_u nls_cp932 ts_kmp nf_conntrack_ipv4 nf_defrag_ipv4 nf_conntrack vfat fat ghash_ce sha2_ce sha1_ce cavium_rng_vf i2c_thunderx sg thunderx_edac i2c_smbus edac_core cavium_rng nfsd auth_rpcgss nfs_acl lockd grace sunrpc xfs libcrc32c nicvf nicpf ast i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops
      [ 6708.736446]  ttm drm i2c_core thunder_bgx thunder_xcv mdio_thunder mdio_cavium dm_mirror dm_region_hash dm_log dm_mod [last unloaded: stap_3c300909c5b3f46dcacd49aab3334af_87021]
      [ 6708.752275] CPU: 84 PID: 0 Comm: swapper/84 Tainted: G        W  OE   4.11.0-4.el7.aarch64 #1
      [ 6708.760787] Hardware name: www.cavium.com CRB-2S/CRB-2S, BIOS 0.3 Mar 13 2017
      [ 6708.767910] task: ffff810006842e80 task.stack: ffff81000689c000
      [ 6708.773822] PC is at 0x0
      [ 6708.776739] LR is at svc_data_ready+0x38/0x88 [sunrpc]
      [ 6708.781866] pc : [<0000000000000000>] lr : [<ffff0000029d7378>] pstate: 60000145
      [ 6708.789248] sp : ffff810ffbad3900
      [ 6708.792551] x29: ffff810ffbad3900 x28: ffff000008c73d58
      [ 6708.797853] x27: 0000000000000000 x26: ffff81000bbe1e00
      [ 6708.803156] x25: 0000000000000020 x24: ffff800f7410bf28
      [ 6708.808458] x23: ffff000008c63000 x22: ffff000008c63000
      [ 6708.813760] x21: ffff800f7410bf28 x20: ffff81000bbe1e00
      [ 6708.819063] x19: ffff810012412400 x18: 00000000d82a9df2
      [ 6708.824365] x17: 0000000000000000 x16: 0000000000000000
      [ 6708.829667] x15: 0000000000000000 x14: 0000000000000001
      [ 6708.834969] x13: 0000000000000000 x12: 722e736f622e676e
      [ 6708.840271] x11: 00000000f814dd99 x10: 0000000000000000
      [ 6708.845573] x9 : 7374687225000000 x8 : 0000000000000000
      [ 6708.850875] x7 : 0000000000000000 x6 : 0000000000000000
      [ 6708.856177] x5 : 0000000000000028 x4 : 0000000000000000
      [ 6708.861479] x3 : 0000000000000000 x2 : 00000000e5000000
      [ 6708.866781] x1 : 0000000000000000 x0 : ffff81000bbe1e00
      [ 6708.872084]
      [ 6708.873565] Process swapper/84 (pid: 0, stack limit = 0xffff81000689c000)
      [ 6708.880341] Stack: (0xffff810ffbad3900 to 0xffff8100068a0000)
      [ 6708.886075] Call trace:
      [ 6708.888513] Exception stack(0xffff810ffbad3710 to 0xffff810ffbad3840)
      [ 6708.894942] 3700:                                   ffff810012412400 0001000000000000
      [ 6708.902759] 3720: ffff810ffbad3900 0000000000000000 0000000060000145 ffff800f79300000
      [ 6708.910577] 3740: ffff000009274d00 00000000000003ea 0000000000000015 ffff000008c63000
      [ 6708.918395] 3760: ffff810ffbad3830 ffff800f79300000 000000000000004d 0000000000000000
      [ 6708.926212] 3780: ffff810ffbad3890 ffff0000080f88dc ffff800f79300000 000000000000004d
      [ 6708.934030] 37a0: ffff800f7930093c ffff000008c63000 0000000000000000 0000000000000140
      [ 6708.941848] 37c0: ffff000008c2c000 0000000000040b00 ffff81000bbe1e00 0000000000000000
      [ 6708.949665] 37e0: 00000000e5000000 0000000000000000 0000000000000000 0000000000000028
      [ 6708.957483] 3800: 0000000000000000 0000000000000000 0000000000000000 7374687225000000
      [ 6708.965300] 3820: 0000000000000000 00000000f814dd99 722e736f622e676e 0000000000000000
      [ 6708.973117] [<          (null)>]           (null)
      [ 6708.977824] [<ffff0000086f9fa4>] tcp_data_queue+0x754/0xc5c
      [ 6708.983386] [<ffff0000086fa64c>] tcp_rcv_established+0x1a0/0x67c
      [ 6708.989384] [<ffff000008704120>] tcp_v4_do_rcv+0x15c/0x22c
      [ 6708.994858] [<ffff000008707418>] tcp_v4_rcv+0xaf0/0xb58
      [ 6709.000077] [<ffff0000086df784>] ip_local_deliver_finish+0x10c/0x254
      [ 6709.006419] [<ffff0000086dfea4>] ip_local_deliver+0xf0/0xfc
      [ 6709.011980] [<ffff0000086dfad4>] ip_rcv_finish+0x208/0x3a4
      [ 6709.017454] [<ffff0000086e018c>] ip_rcv+0x2dc/0x3c8
      [ 6709.022328] [<ffff000008692fc8>] __netif_receive_skb_core+0x2f8/0xa0c
      [ 6709.028758] [<ffff000008696068>] __netif_receive_skb+0x38/0x84
      [ 6709.034580] [<ffff00000869611c>] netif_receive_skb_internal+0x68/0xdc
      [ 6709.041010] [<ffff000008696bc0>] napi_gro_receive+0xcc/0x1a8
      [ 6709.046690] [<ffff0000014b0fc4>] nicvf_cq_intr_handler+0x59c/0x730 [nicvf]
      [ 6709.053559] [<ffff0000014b1380>] nicvf_poll+0x38/0xb8 [nicvf]
      [ 6709.059295] [<ffff000008697a6c>] net_rx_action+0x2f8/0x464
      [ 6709.064771] [<ffff000008081824>] __do_softirq+0x11c/0x308
      [ 6709.070164] [<ffff0000080d14e4>] irq_exit+0x12c/0x174
      [ 6709.075206] [<ffff00000813101c>] __handle_domain_irq+0x78/0xc4
      [ 6709.081027] [<ffff000008081608>] gic_handle_irq+0x94/0x190
      [ 6709.086501] Exception stack(0xffff81000689fdf0 to 0xffff81000689ff20)
      [ 6709.092929] fde0:                                   0000810ff2ec0000 ffff000008c10000
      [ 6709.100747] fe00: ffff000008c70ef4 0000000000000001 0000000000000000 ffff810ffbad9b18
      [ 6709.108565] fe20: ffff810ffbad9c70 ffff8100169d3800 ffff810006843ab0 ffff81000689fe80
      [ 6709.116382] fe40: 0000000000000bd0 0000ffffdf979cd0 183f5913da192500 0000ffff8a254ce4
      [ 6709.124200] fe60: 0000ffff8a254b78 0000aaab10339808 0000000000000000 0000ffff8a0c2a50
      [ 6709.132018] fe80: 0000ffffdf979b10 ffff000008d6d450 ffff000008c10000 ffff000008d6d000
      [ 6709.139836] fea0: 0000000000000054 ffff000008cd3dbc 0000000000000000 0000000000000000
      [ 6709.147653] fec0: 0000000000000000 0000000000000000 0000000000000000 ffff81000689ff20
      [ 6709.155471] fee0: ffff000008085240 ffff81000689ff20 ffff000008085244 0000000060000145
      [ 6709.163289] ff00: ffff81000689ff10 ffff00000813f1e4 ffffffffffffffff ffff00000813f238
      [ 6709.171107] [<ffff000008082eb4>] el1_irq+0xb4/0x140
      [ 6709.175976] [<ffff000008085244>] arch_cpu_idle+0x44/0x11c
      [ 6709.181368] [<ffff0000087bf3b8>] default_idle_call+0x20/0x30
      [ 6709.187020] [<ffff000008116d50>] do_idle+0x158/0x1e4
      [ 6709.191973] [<ffff000008116ff4>] cpu_startup_entry+0x2c/0x30
      [ 6709.197624] [<ffff00000808e7cc>] secondary_start_kernel+0x13c/0x160
      [ 6709.203878] [<0000000001bc71c4>] 0x1bc71c4
      [ 6709.207967] Code: bad PC value
      [ 6709.211061] SMP: stopping secondary CPUs
      [ 6709.218830] Starting crashdump kernel...
      [ 6709.222749] Bye!
      ---<-snip>---
      Signed-off-by: default avatarVadim Lomovtsev <vlomovts@redhat.com>
      Reviewed-by: default avatarJeff Layton <jlayton@redhat.com>
      Signed-off-by: default avatarJ. Bruce Fields <bfields@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      4909a7b7
    • Eric Biggers's avatar
      x86/mm: Fix use-after-free of ldt_struct · a8da876c
      Eric Biggers authored
      commit ccd5b323 upstream.
      
      The following commit:
      
        39a0526f ("x86/mm: Factor out LDT init from context init")
      
      renamed init_new_context() to init_new_context_ldt() and added a new
      init_new_context() which calls init_new_context_ldt().  However, the
      error code of init_new_context_ldt() was ignored.  Consequently, if a
      memory allocation in alloc_ldt_struct() failed during a fork(), the
      ->context.ldt of the new task remained the same as that of the old task
      (due to the memcpy() in dup_mm()).  ldt_struct's are not intended to be
      shared, so a use-after-free occurred after one task exited.
      
      Fix the bug by making init_new_context() pass through the error code of
      init_new_context_ldt().
      
      This bug was found by syzkaller, which encountered the following splat:
      
          BUG: KASAN: use-after-free in free_ldt_struct.part.2+0x10a/0x150 arch/x86/kernel/ldt.c:116
          Read of size 4 at addr ffff88006d2cb7c8 by task kworker/u9:0/3710
      
          CPU: 1 PID: 3710 Comm: kworker/u9:0 Not tainted 4.13.0-rc4-next-20170811 #2
          Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
          Call Trace:
           __dump_stack lib/dump_stack.c:16 [inline]
           dump_stack+0x194/0x257 lib/dump_stack.c:52
           print_address_description+0x73/0x250 mm/kasan/report.c:252
           kasan_report_error mm/kasan/report.c:351 [inline]
           kasan_report+0x24e/0x340 mm/kasan/report.c:409
           __asan_report_load4_noabort+0x14/0x20 mm/kasan/report.c:429
           free_ldt_struct.part.2+0x10a/0x150 arch/x86/kernel/ldt.c:116
           free_ldt_struct arch/x86/kernel/ldt.c:173 [inline]
           destroy_context_ldt+0x60/0x80 arch/x86/kernel/ldt.c:171
           destroy_context arch/x86/include/asm/mmu_context.h:157 [inline]
           __mmdrop+0xe9/0x530 kernel/fork.c:889
           mmdrop include/linux/sched/mm.h:42 [inline]
           exec_mmap fs/exec.c:1061 [inline]
           flush_old_exec+0x173c/0x1ff0 fs/exec.c:1291
           load_elf_binary+0x81f/0x4ba0 fs/binfmt_elf.c:855
           search_binary_handler+0x142/0x6b0 fs/exec.c:1652
           exec_binprm fs/exec.c:1694 [inline]
           do_execveat_common.isra.33+0x1746/0x22e0 fs/exec.c:1816
           do_execve+0x31/0x40 fs/exec.c:1860
           call_usermodehelper_exec_async+0x457/0x8f0 kernel/umh.c:100
           ret_from_fork+0x2a/0x40 arch/x86/entry/entry_64.S:431
      
          Allocated by task 3700:
           save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:59
           save_stack+0x43/0xd0 mm/kasan/kasan.c:447
           set_track mm/kasan/kasan.c:459 [inline]
           kasan_kmalloc+0xad/0xe0 mm/kasan/kasan.c:551
           kmem_cache_alloc_trace+0x136/0x750 mm/slab.c:3627
           kmalloc include/linux/slab.h:493 [inline]
           alloc_ldt_struct+0x52/0x140 arch/x86/kernel/ldt.c:67
           write_ldt+0x7b7/0xab0 arch/x86/kernel/ldt.c:277
           sys_modify_ldt+0x1ef/0x240 arch/x86/kernel/ldt.c:307
           entry_SYSCALL_64_fastpath+0x1f/0xbe
      
          Freed by task 3700:
           save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:59
           save_stack+0x43/0xd0 mm/kasan/kasan.c:447
           set_track mm/kasan/kasan.c:459 [inline]
           kasan_slab_free+0x71/0xc0 mm/kasan/kasan.c:524
           __cache_free mm/slab.c:3503 [inline]
           kfree+0xca/0x250 mm/slab.c:3820
           free_ldt_struct.part.2+0xdd/0x150 arch/x86/kernel/ldt.c:121
           free_ldt_struct arch/x86/kernel/ldt.c:173 [inline]
           destroy_context_ldt+0x60/0x80 arch/x86/kernel/ldt.c:171
           destroy_context arch/x86/include/asm/mmu_context.h:157 [inline]
           __mmdrop+0xe9/0x530 kernel/fork.c:889
           mmdrop include/linux/sched/mm.h:42 [inline]
           __mmput kernel/fork.c:916 [inline]
           mmput+0x541/0x6e0 kernel/fork.c:927
           copy_process.part.36+0x22e1/0x4af0 kernel/fork.c:1931
           copy_process kernel/fork.c:1546 [inline]
           _do_fork+0x1ef/0xfb0 kernel/fork.c:2025
           SYSC_clone kernel/fork.c:2135 [inline]
           SyS_clone+0x37/0x50 kernel/fork.c:2129
           do_syscall_64+0x26c/0x8c0 arch/x86/entry/common.c:287
           return_from_SYSCALL_64+0x0/0x7a
      
      Here is a C reproducer:
      
          #include <asm/ldt.h>
          #include <pthread.h>
          #include <signal.h>
          #include <stdlib.h>
          #include <sys/syscall.h>
          #include <sys/wait.h>
          #include <unistd.h>
      
          static void *fork_thread(void *_arg)
          {
              fork();
          }
      
          int main(void)
          {
              struct user_desc desc = { .entry_number = 8191 };
      
              syscall(__NR_modify_ldt, 1, &desc, sizeof(desc));
      
              for (;;) {
                  if (fork() == 0) {
                      pthread_t t;
      
                      srand(getpid());
                      pthread_create(&t, NULL, fork_thread, NULL);
                      usleep(rand() % 10000);
                      syscall(__NR_exit_group, 0);
                  }
                  wait(NULL);
              }
          }
      
      Note: the reproducer takes advantage of the fact that alloc_ldt_struct()
      may use vmalloc() to allocate a large ->entries array, and after
      commit:
      
        5d17a73a ("vmalloc: back off when the current task is killed")
      
      it is possible for userspace to fail a task's vmalloc() by
      sending a fatal signal, e.g. via exit_group().  It would be more
      difficult to reproduce this bug on kernels without that commit.
      
      This bug only affected kernels with CONFIG_MODIFY_LDT_SYSCALL=y.
      Signed-off-by: default avatarEric Biggers <ebiggers@google.com>
      Acked-by: default avatarDave Hansen <dave.hansen@linux.intel.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-mm@kvack.org
      Fixes: 39a0526f ("x86/mm: Factor out LDT init from context init")
      Link: http://lkml.kernel.org/r/20170824175029.76040-1-ebiggers3@gmail.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      a8da876c
    • Nicholas Piggin's avatar
      timers: Fix excessive granularity of new timers after a nohz idle · 2e11eede
      Nicholas Piggin authored
      commit 2fe59f50 upstream.
      
      When a timer base is idle, it is forwarded when a new timer is added
      to ensure that granularity does not become excessive. When not idle,
      the timer tick is expected to increment the base.
      
      However there are several problems:
      
      - If an existing timer is modified, the base is forwarded only after
        the index is calculated.
      
      - The base is not forwarded by add_timer_on.
      
      - There is a window after a timer is restarted from a nohz idle, after
        it is marked not-idle and before the timer tick on this CPU, where a
        timer may be added but the ancient base does not get forwarded.
      
      These result in excessive granularity (a 1 jiffy timeout can blow out
      to 100s of jiffies), which cause the rcu lockup detector to trigger,
      among other things.
      
      Fix this by keeping track of whether the timer base has been idle
      since it was last run or forwarded, and if so then forward it before
      adding a new timer.
      
      There is still a case where mod_timer optimises the case of a pending
      timer mod with the same expiry time, where the timer can see excessive
      granularity relative to the new, shorter interval. A comment is added,
      but it's not changed because it is an important fastpath for
      networking.
      
      This has been tested and found to fix the RCU softlockup messages.
      
      Testing was also done with tracing to measure requested versus
      achieved wakeup latencies for all non-deferrable timers in an idle
      system (with no lockup watchdogs running). Wakeup latency relative to
      absolute latency is calculated (note this suffers from round-up skew
      at low absolute times) and analysed:
      
                   max     avg      std
      upstream   506.0    1.20     4.68
      patched      2.0    1.08     0.15
      
      The bug was noticed due to the lockup detector Kconfig changes
      dropping it out of people's .configs and resulting in larger base
      clk skew When the lockup detectors are enabled, no CPU can go idle for
      longer than 4 seconds, which limits the granularity errors.
      Sub-optimal timer behaviour is observable on a smaller scale in that
      case:
      
      	     max     avg      std
      upstream     9.0    1.05     0.19
      patched      2.0    1.04     0.11
      
      Fixes: Fixes: a683f390 ("timers: Forward the wheel clock whenever possible")
      Signed-off-by: default avatarNicholas Piggin <npiggin@gmail.com>
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Tested-by: default avatarJonathan Cameron <Jonathan.Cameron@huawei.com>
      Tested-by: default avatarDavid Miller <davem@davemloft.net>
      Cc: dzickus@redhat.com
      Cc: sfr@canb.auug.org.au
      Cc: mpe@ellerman.id.au
      Cc: Stephen Boyd <sboyd@codeaurora.org>
      Cc: linuxarm@huawei.com
      Cc: abdhalee@linux.vnet.ibm.com
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: akpm@linux-foundation.org
      Cc: paulmck@linux.vnet.ibm.com
      Cc: torvalds@linux-foundation.org
      Link: http://lkml.kernel.org/r/20170822084348.21436-1-npiggin@gmail.comSigned-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      2e11eede
    • Mark Rutland's avatar
      perf/core: Fix group {cpu,task} validation · 2c0dc7f0
      Mark Rutland authored
      commit 64aee2a9 upstream.
      
      Regardless of which events form a group, it does not make sense for the
      events to target different tasks and/or CPUs, as this leaves the group
      inconsistent and impossible to schedule. The core perf code assumes that
      these are consistent across (successfully intialised) groups.
      
      Core perf code only verifies this when moving SW events into a HW
      context. Thus, we can violate this requirement for pure SW groups and
      pure HW groups, unless the relevant PMU driver happens to perform this
      verification itself. These mismatched groups subsequently wreak havoc
      elsewhere.
      
      For example, we handle watchpoints as SW events, and reserve watchpoint
      HW on a per-CPU basis at pmu::event_init() time to ensure that any event
      that is initialised is guaranteed to have a slot at pmu::add() time.
      However, the core code only checks the group leader's cpu filter (via
      event_filter_match()), and can thus install follower events onto CPUs
      violating thier (mismatched) CPU filters, potentially installing them
      into a CPU without sufficient reserved slots.
      
      This can be triggered with the below test case, resulting in warnings
      from arch backends.
      
        #define _GNU_SOURCE
        #include <linux/hw_breakpoint.h>
        #include <linux/perf_event.h>
        #include <sched.h>
        #include <stdio.h>
        #include <sys/prctl.h>
        #include <sys/syscall.h>
        #include <unistd.h>
      
        static int perf_event_open(struct perf_event_attr *attr, pid_t pid, int cpu,
      			   int group_fd, unsigned long flags)
        {
      	return syscall(__NR_perf_event_open, attr, pid, cpu, group_fd, flags);
        }
      
        char watched_char;
      
        struct perf_event_attr wp_attr = {
      	.type = PERF_TYPE_BREAKPOINT,
      	.bp_type = HW_BREAKPOINT_RW,
      	.bp_addr = (unsigned long)&watched_char,
      	.bp_len = 1,
      	.size = sizeof(wp_attr),
        };
      
        int main(int argc, char *argv[])
        {
      	int leader, ret;
      	cpu_set_t cpus;
      
      	/*
      	 * Force use of CPU0 to ensure our CPU0-bound events get scheduled.
      	 */
      	CPU_ZERO(&cpus);
      	CPU_SET(0, &cpus);
      	ret = sched_setaffinity(0, sizeof(cpus), &cpus);
      	if (ret) {
      		printf("Unable to set cpu affinity\n");
      		return 1;
      	}
      
      	/* open leader event, bound to this task, CPU0 only */
      	leader = perf_event_open(&wp_attr, 0, 0, -1, 0);
      	if (leader < 0) {
      		printf("Couldn't open leader: %d\n", leader);
      		return 1;
      	}
      
      	/*
      	 * Open a follower event that is bound to the same task, but a
      	 * different CPU. This means that the group should never be possible to
      	 * schedule.
      	 */
      	ret = perf_event_open(&wp_attr, 0, 1, leader, 0);
      	if (ret < 0) {
      		printf("Couldn't open mismatched follower: %d\n", ret);
      		return 1;
      	} else {
      		printf("Opened leader/follower with mismastched CPUs\n");
      	}
      
      	/*
      	 * Open as many independent events as we can, all bound to the same
      	 * task, CPU0 only.
      	 */
      	do {
      		ret = perf_event_open(&wp_attr, 0, 0, -1, 0);
      	} while (ret >= 0);
      
      	/*
      	 * Force enable/disble all events to trigger the erronoeous
      	 * installation of the follower event.
      	 */
      	printf("Opened all events. Toggling..\n");
      	for (;;) {
      		prctl(PR_TASK_PERF_EVENTS_DISABLE, 0, 0, 0, 0);
      		prctl(PR_TASK_PERF_EVENTS_ENABLE, 0, 0, 0, 0);
      	}
      
      	return 0;
        }
      
      Fix this by validating this requirement regardless of whether we're
      moving events.
      Signed-off-by: default avatarMark Rutland <mark.rutland@arm.com>
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Zhou Chengming <zhouchengming1@huawei.com>
      Link: http://lkml.kernel.org/r/1498142498-15758-1-git-send-email-mark.rutland@arm.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      2c0dc7f0
    • Steven Rostedt (VMware)'s avatar
      ftrace: Check for null ret_stack on profile function graph entry function · aa2da6c4
      Steven Rostedt (VMware) authored
      commit a8f0f9e4 upstream.
      
      There's a small race when function graph shutsdown and the calling of the
      registered function graph entry callback. The callback must not reference
      the task's ret_stack without first checking that it is not NULL. Note, when
      a ret_stack is allocated for a task, it stays allocated until the task exits.
      The problem here, is that function_graph is shutdown, and a new task was
      created, which doesn't have its ret_stack allocated. But since some of the
      functions are still being traced, the callbacks can still be called.
      
      The normal function_graph code handles this, but starting with commit
      8861dd30 ("ftrace: Access ret_stack->subtime only in the function
      profiler") the profiler code references the ret_stack on function entry, but
      doesn't check if it is NULL first.
      
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=196611
      
      Fixes: 8861dd30 ("ftrace: Access ret_stack->subtime only in the function profiler")
      Reported-by: lilydjwg@gmail.com
      Signed-off-by: default avatarSteven Rostedt (VMware) <rostedt@goodmis.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      aa2da6c4
    • Christoph Hellwig's avatar
      virtio_pci: fix cpu affinity support · 1b8ca885
      Christoph Hellwig authored
      commit ba74b6f7 upstream.
      
      Commit 0b0f9dc5 ("Revert "virtio_pci: use shared interrupts for
      virtqueues"") removed the adjustment of the pre_vectors for the virtio
      MSI-X vector allocation which was added in commit fb5e31d9 ("virtio:
      allow drivers to request IRQ affinity when creating VQs"). This will
      lead to an incorrect assignment of MSI-X vectors, and potential
      deadlocks when offlining cpus.
      Signed-off-by: default avatarChristoph Hellwig <hch@lst.de>
      Fixes: 0b0f9dc5 ("Revert "virtio_pci: use shared interrupts for virtqueues")
      Reported-by: default avatarYASUAKI ISHIMATSU <yasu.isimatu@gmail.com>
      Signed-off-by: default avatarMichael S. Tsirkin <mst@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      1b8ca885
    • Steven Rostedt (VMware)'s avatar
      ring-buffer: Have ring_buffer_alloc_read_page() return error on offline CPU · 78f2e29f
      Steven Rostedt (VMware) authored
      commit a7e52ad7 upstream.
      
      Chunyu Hu reported:
        "per_cpu trace directories and files are created for all possible cpus,
         but only the cpus which have ever been on-lined have their own per cpu
         ring buffer (allocated by cpuhp threads). While trace_buffers_open, the
         open handler for trace file 'trace_pipe_raw' is always trying to access
         field of ring_buffer_per_cpu, and would panic with the NULL pointer.
      
         Align the behavior of trace_pipe_raw with trace_pipe, that returns -NODEV
         when openning it if that cpu does not have trace ring buffer.
      
         Reproduce:
         cat /sys/kernel/debug/tracing/per_cpu/cpu31/trace_pipe_raw
         (cpu31 is never on-lined, this is a 16 cores x86_64 box)
      
         Tested with:
         1) boot with maxcpus=14, read trace_pipe_raw of cpu15.
            Got -NODEV.
         2) oneline cpu15, read trace_pipe_raw of cpu15.
            Get the raw trace data.
      
         Call trace:
         [ 5760.950995] RIP: 0010:ring_buffer_alloc_read_page+0x32/0xe0
         [ 5760.961678]  tracing_buffers_read+0x1f6/0x230
         [ 5760.962695]  __vfs_read+0x37/0x160
         [ 5760.963498]  ? __vfs_read+0x5/0x160
         [ 5760.964339]  ? security_file_permission+0x9d/0xc0
         [ 5760.965451]  ? __vfs_read+0x5/0x160
         [ 5760.966280]  vfs_read+0x8c/0x130
         [ 5760.967070]  SyS_read+0x55/0xc0
         [ 5760.967779]  do_syscall_64+0x67/0x150
         [ 5760.968687]  entry_SYSCALL64_slow_path+0x25/0x25"
      
      This was introduced by the addition of the feature to reuse reader pages
      instead of re-allocating them. The problem is that the allocation of a
      reader page (which is per cpu) does not check if the cpu is online and set
      up for the ring buffer.
      
      Link: http://lkml.kernel.org/r/1500880866-1177-1-git-send-email-chuhu@redhat.com
      
      Fixes: 73a757e6 ("ring-buffer: Return reader page back into existing ring buffer")
      Reported-by: default avatarChunyu Hu <chuhu@redhat.com>
      Signed-off-by: default avatarSteven Rostedt (VMware) <rostedt@goodmis.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      78f2e29f