1. 30 May, 2018 40 commits
    • Petr Vorel's avatar
      ima: Fallback to the builtin hash algorithm · 5cace4b6
      Petr Vorel authored
      [ Upstream commit ab60368a ]
      
      IMA requires having it's hash algorithm be compiled-in due to it's
      early use.  The default IMA algorithm is protected by Kconfig to be
      compiled-in.
      
      The ima_hash kernel parameter allows to choose the hash algorithm. When
      the specified algorithm is not available or available as a module, IMA
      initialization fails, which leads to a kernel panic (mknodat syscall calls
      ima_post_path_mknod()).  Therefore as fallback we force IMA to use
      the default builtin Kconfig hash algorithm.
      
      Fixed crash:
      
      $ grep CONFIG_CRYPTO_MD4 .config
      CONFIG_CRYPTO_MD4=m
      
      [    0.000000] Command line: BOOT_IMAGE=/boot/vmlinuz-4.12.14-2.3-default root=UUID=74ae8202-9ca7-4e39-813b-22287ec52f7a video=1024x768-16 plymouth.ignore-serial-consoles console=ttyS0 console=tty resume=/dev/disk/by-path/pci-0000:00:07.0-part3 splash=silent showopts ima_hash=md4
      ...
      [    1.545190] ima: Can not allocate md4 (reason: -2)
      ...
      [    2.610120] BUG: unable to handle kernel NULL pointer dereference at           (null)
      [    2.611903] IP: ima_match_policy+0x23/0x390
      [    2.612967] PGD 0 P4D 0
      [    2.613080] Oops: 0000 [#1] SMP
      [    2.613080] Modules linked in: autofs4
      [    2.613080] Supported: Yes
      [    2.613080] CPU: 0 PID: 1 Comm: systemd Not tainted 4.12.14-2.3-default #1
      [    2.613080] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.0.0-prebuilt.qemu-project.org 04/01/2014
      [    2.613080] task: ffff88003e2d0040 task.stack: ffffc90000190000
      [    2.613080] RIP: 0010:ima_match_policy+0x23/0x390
      [    2.613080] RSP: 0018:ffffc90000193e88 EFLAGS: 00010296
      [    2.613080] RAX: 0000000000000000 RBX: 000000000000000c RCX: 0000000000000004
      [    2.613080] RDX: 0000000000000010 RSI: 0000000000000001 RDI: ffff880037071728
      [    2.613080] RBP: 0000000000008000 R08: 0000000000000000 R09: 0000000000000000
      [    2.613080] R10: 0000000000000008 R11: 61c8864680b583eb R12: 00005580ff10086f
      [    2.613080] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000008000
      [    2.613080] FS:  00007f5c1da08940(0000) GS:ffff88003fc00000(0000) knlGS:0000000000000000
      [    2.613080] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [    2.613080] CR2: 0000000000000000 CR3: 0000000037002000 CR4: 00000000003406f0
      [    2.613080] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [    2.613080] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      [    2.613080] Call Trace:
      [    2.613080]  ? shmem_mknod+0xbf/0xd0
      [    2.613080]  ima_post_path_mknod+0x1c/0x40
      [    2.613080]  SyS_mknod+0x210/0x220
      [    2.613080]  entry_SYSCALL_64_fastpath+0x1a/0xa5
      [    2.613080] RIP: 0033:0x7f5c1bfde570
      [    2.613080] RSP: 002b:00007ffde1c90dc8 EFLAGS: 00000246 ORIG_RAX: 0000000000000085
      [    2.613080] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f5c1bfde570
      [    2.613080] RDX: 0000000000000000 RSI: 0000000000008000 RDI: 00005580ff10086f
      [    2.613080] RBP: 00007ffde1c91040 R08: 00005580ff10086f R09: 0000000000000000
      [    2.613080] R10: 0000000000104000 R11: 0000000000000246 R12: 00005580ffb99660
      [    2.613080] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000002
      [    2.613080] Code: 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 41 57 41 56 44 8d 14 09 41 55 41 54 55 53 44 89 d3 09 cb 48 83 ec 38 48 8b 05 c5 03 29 01 <4c> 8b 20 4c 39 e0 0f 84 d7 01 00 00 4c 89 44 24 08 89 54 24 20
      [    2.613080] RIP: ima_match_policy+0x23/0x390 RSP: ffffc90000193e88
      [    2.613080] CR2: 0000000000000000
      [    2.613080] ---[ end trace 9a9f0a8a73079f6a ]---
      [    2.673052] Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000009
      [    2.673052]
      [    2.675337] Kernel Offset: disabled
      [    2.676405] ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000009
      Signed-off-by: default avatarPetr Vorel <pvorel@suse.cz>
      Signed-off-by: default avatarMimi Zohar <zohar@linux.vnet.ibm.com>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      5cace4b6
    • Jiandi An's avatar
      ima: Fix Kconfig to select TPM 2.0 CRB interface · edf3bf9e
      Jiandi An authored
      [ Upstream commit fac37c62 ]
      
      TPM_CRB driver provides TPM CRB 2.0 support.  If it is built as a
      module, the TPM chip is registered after IMA init.  tpm_pcr_read() in
      IMA fails and displays the following message even though eventually
      there is a TPM chip on the system.
      
      ima: No TPM chip found, activating TPM-bypass! (rc=-19)
      
      Fix IMA Kconfig to select TPM_CRB so TPM_CRB driver is built in the kernel
      and initializes before IMA.
      Signed-off-by: default avatarJiandi An <anjiandi@codeaurora.org>
      Signed-off-by: default avatarMimi Zohar <zohar@linux.vnet.ibm.com>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      edf3bf9e
    • Karthikeyan Periyasamy's avatar
      ath10k: Fix kernel panic while using worker (ath10k_sta_rc_update_wk) · d1dbe5db
      Karthikeyan Periyasamy authored
      [ Upstream commit 8b2d93dd ]
      
      When attempt to run worker (ath10k_sta_rc_update_wk) after the station object
      (ieee80211_sta) delete will trigger the kernel panic.
      
      This problem arise in AP + Mesh configuration, Where the current node AP VAP
      and neighbor node mesh VAP MAC address are same. When the current mesh node
      try to establish the mesh link with neighbor node, driver peer creation for
      the neighbor mesh node fails due to duplication MAC address. Already the AP
      VAP created with same MAC address.
      
      It is caused by the following scenario steps.
      
      Steps:
      1. In above condition, ath10k driver sta_state callback (ath10k_sta_state)
         fails to do the state change for a station from IEEE80211_STA_NOTEXIST
         to IEEE80211_STA_NONE due to peer creation fails. Sta_state callback is
         called from ieee80211_add_station() to handle the new station
         (neighbor mesh node) request from the wpa_supplicant.
      2. Concurrently ath10k receive the sta_rc_update callback notification from
         the mesh_neighbour_update() to handle the beacon frames of the above
         neighbor mesh node. since its atomic callback, ath10k driver queue the
         work (ath10k_sta_rc_update_wk) to handle rc update.
      3. Due to driver sta_state callback fails (step 1), mac80211 free the station
         object.
      4. When the worker (ath10k_sta_rc_update_wk) scheduled to run, it will access
         the station object which is already deleted. so it will trigger kernel
         panic.
      
      Added the peer exist check in sta_rc_update callback before queue the work.
      
      Kernel Panic log:
      
      Unable to handle kernel NULL pointer dereference at virtual address 00000000
      pgd = c0204000
      [00000000] *pgd=00000000
      Internal error: Oops: 17 [#1] PREEMPT SMP ARM
      CPU: 1 PID: 1833 Comm: kworker/u4:2 Not tainted 3.14.77 #1
      task: dcef0000 ti: d72b6000 task.ti: d72b6000
      PC is at pwq_activate_delayed_work+0x10/0x40
      LR is at pwq_activate_delayed_work+0xc/0x40
      pc : [<c023f988>]    lr : [<c023f984>]    psr: 40000193
      sp : d72b7f18  ip : 0000007a  fp : d72b6000
      r10: 00000000  r9 : dd404414  r8 : d8c31998
      r7 : d72b6038  r6 : 00000004  r5 : d4907ec8  r4 : dcee1300
      r3 : ffffffe0  r2 : 00000000  r1 : 00000001  r0 : 00000000
      Flags: nZcv  IRQs off  FIQs on  Mode SVC_32  ISA ARM  Segment kernel
      Control: 10c5787d  Table: 595bc06a  DAC: 00000015
      ...
      Process kworker/u4:2 (pid: 1833, stack limit = 0xd72b6238)
      Stack: (0xd72b7f18 to 0xd72b8000)
      7f00:                                                       00000001 dcee1300
      7f20: 00000001 c02410dc d8c31980 dd404400 dd404400 c0242790 d8c31980 00000089
      7f40: 00000000 d93e1340 00000000 d8c31980 c0242568 00000000 00000000 00000000
      7f60: 00000000 c02474dc 00000000 00000000 000000f8 d8c31980 00000000 00000000
      7f80: d72b7f80 d72b7f80 00000000 00000000 d72b7f90 d72b7f90 d72b7fac d93e1340
      7fa0: c0247404 00000000 00000000 c0208d20 00000000 00000000 00000000 00000000
      7fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
      7fe0: 00000000 00000000 00000000 00000000 00000013 00000000 00000000 00000000
      [<c023f988>] (pwq_activate_delayed_work) from [<c02410dc>] (pwq_dec_nr_in_flight+0x58/0xc4)
      [<c02410dc>] (pwq_dec_nr_in_flight) from [<c0242790>] (worker_thread+0x228/0x360)
      [<c0242790>] (worker_thread) from [<c02474dc>] (kthread+0xd8/0xec)
      [<c02474dc>] (kthread) from [<c0208d20>] (ret_from_fork+0x14/0x34)
      Code: e92d4038 e1a05000 ebffffbc[69210.619376] SMP: failed to stop secondary CPUs
      Rebooting in 3 seconds..
      Signed-off-by: default avatarKarthikeyan Periyasamy <periyasa@codeaurora.org>
      Signed-off-by: default avatarKalle Valo <kvalo@codeaurora.org>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      d1dbe5db
    • Leon Romanovsky's avatar
      net/mlx5: Protect from command bit overflow · 3a97320c
      Leon Romanovsky authored
      [ Upstream commit 957f6ba8 ]
      
      The system with CONFIG_UBSAN enabled on produces the following error
      during driver initialization. The reason to it that max_reg_cmds can be
      larger enough to cause to "1 << max_reg_cmds" overflow the unsigned long.
      
      ================================================================================
      UBSAN: Undefined behaviour in drivers/net/ethernet/mellanox/mlx5/core/cmd.c:1805:42
      signed integer overflow:
      -2147483648 - 1 cannot be represented in type 'int'
      CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.15.0-rc2-00032-g06cda2358d9b-dirty #724
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014
      Call Trace:
       dump_stack+0xe9/0x18f
       ? dma_virt_alloc+0x81/0x81
       ubsan_epilogue+0xe/0x4e
       handle_overflow+0x187/0x20c
       mlx5_cmd_init+0x73a/0x12b0
       mlx5_load_one+0x1c3d/0x1d30
       init_one+0xd02/0xf10
       pci_device_probe+0x26c/0x3b0
       driver_probe_device+0x622/0xb40
       __driver_attach+0x175/0x1b0
       bus_for_each_dev+0xef/0x190
       bus_add_driver+0x2db/0x490
       driver_register+0x16b/0x1e0
       __pci_register_driver+0x177/0x1b0
       init+0x6d/0x92
       do_one_initcall+0x15b/0x270
       kernel_init_freeable+0x2d8/0x3d0
       kernel_init+0x14/0x190
       ret_from_fork+0x24/0x30
      ================================================================================
      Signed-off-by: default avatarLeon Romanovsky <leonro@mellanox.com>
      Signed-off-by: default avatarSaeed Mahameed <saeedm@mellanox.com>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      3a97320c
    • Michael Ellerman's avatar
      selftests: Print the test we're running to /dev/kmsg · e1448fac
      Michael Ellerman authored
      [ Upstream commit 88893cf7 ]
      
      Some tests cause the kernel to print things to the kernel log
      buffer (ie. printk), in particular oops and warnings etc. However when
      running all the tests in succession it's not always obvious which
      test(s) caused the kernel to print something.
      
      We can narrow it down by printing which test directory we're running
      in to /dev/kmsg, if it's writable.
      
      Example output:
      
        [  170.149149] kselftest: Running tests in powerpc
        [  305.300132] kworker/dying (71) used greatest stack depth: 7776 bytes
                       left
        [  808.915456] kselftest: Running tests in pstore
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: default avatarShuah Khan <shuahkh@osg.samsung.com>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      e1448fac
    • Frank Asseg's avatar
      tools/thermal: tmon: fix for segfault · 0bc0aa27
      Frank Asseg authored
      [ Upstream commit 6c59f64b ]
      
      Fixes a segfault occurring when e.g. <TAB> is pressed multiple times in the
      ncurses tmon application. The segfault is caused by incrementing
      cur_thermal_record in the main function without checking if it's value reached
      NR_THERMAL_RECORD immediately. Since the boundary check only occurred in
      update_thermal_data a race condition existed, which lead to an attempted read
      beyond the last element of the trec array.
      
      The fix was implemented by moving the cur_thermal_record incrementation to the
      update_thermal_data function using a temporary variable on which the boundary
      condition is checked before updating cur_thread_record, so that the variable is
      never incremented beyond the trec array's boundary.
      
      It seems the segfault does not occur on every machine: On a HP EliteBook G4 the
      segfault happens, while it does not happen on a Thinkpad T540p.
      Signed-off-by: default avatarFrank Asseg <frank.asseg@objecthunter.net>
      Signed-off-by: default avatarJiri Kosina <jkosina@suse.cz>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      0bc0aa27
    • Michael Ellerman's avatar
      powerpc/perf: Fix kernel address leak via sampling registers · 55f43f3b
      Michael Ellerman authored
      [ Upstream commit e1ebd0e5 ]
      
      Current code in power_pmu_disable() does not clear the sampling
      registers like Sampling Instruction Address Register (SIAR) and
      Sampling Data Address Register (SDAR) after disabling the PMU. Since
      these are userspace readable and could contain kernel addresses, add
      code to explicitly clear the content of these registers.
      
      Also add a "context synchronizing instruction" to enforce no further
      updates to these registers as suggested by Power ISA v3.0B. From
      section 9.4, on page 1108:
      
        "If an mtspr instruction is executed that changes the value of a
        Performance Monitor register other than SIAR, SDAR, and SIER, the
        change is not guaranteed to have taken effect until after a
        subsequent context synchronizing instruction has been executed (see
        Chapter 11. "Synchronization Requirements for Context Alterations"
        on page 1133)."
      Signed-off-by: default avatarMadhavan Srinivasan <maddy@linux.vnet.ibm.com>
      [mpe: Massage change log and add ISA reference]
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      55f43f3b
    • Madhavan Srinivasan's avatar
      powerpc/perf: Prevent kernel address leak to userspace via BHRB buffer · 83fb2a49
      Madhavan Srinivasan authored
      [ Upstream commit bb19af81 ]
      
      The current Branch History Rolling Buffer (BHRB) code does not check
      for any privilege levels before updating the data from BHRB. This
      could leak kernel addresses to userspace even when profiling only with
      userspace privileges. Add proper checks to prevent it.
      Acked-by: default avatarBalbir Singh <bsingharora@gmail.com>
      Signed-off-by: default avatarMadhavan Srinivasan <maddy@linux.vnet.ibm.com>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      83fb2a49
    • Alexandre Belloni's avatar
      rtc: hctosys: Ensure system time doesn't overflow time_t · 0cf2a214
      Alexandre Belloni authored
      [ Upstream commit b3a5ac42 ]
      
      On 32bit platforms, time_t is still a signed 32bit long. If it is
      overflowed, userspace and the kernel cant agree on the current system time.
      This causes multiple issues, in particular with systemd:
      https://github.com/systemd/systemd/issues/1143
      
      A good workaround is to simply avoid using hctosys which is something I
      greatly encourage as the time is better set by userspace.
      
      However, many distribution enable it and use systemd which is rendering the
      system unusable in case the RTC holds a date after 2038 (and more so after
      2106). Many drivers have workaround for this case and they should be
      eliminated so there is only one place left to fix when userspace is able to
      cope with dates after the 31bit overflow.
      Acked-by: default avatarArnd Bergmann <arnd@arndb.de>
      Signed-off-by: default avatarAlexandre Belloni <alexandre.belloni@bootlin.com>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      0cf2a214
    • Guenter Roeck's avatar
      hwmon: (nct6775) Fix writing pwmX_mode · 634ac348
      Guenter Roeck authored
      [ Upstream commit 415eb2a1 ]
      
      pwmX_mode is defined in the ABI as 0=DC mode, 1=pwm mode. The chip
      register bit is set to 1 for DC mode. This got mixed up, and writing
      1 into pwmX_mode resulted in DC mode enabled. Fix it up by using
      the ABI definition throughout the driver for consistency.
      
      Fixes: 77eb5b37 ("hwmon: (nct6775) Add support for pwm, pwm_mode, ... ")
      Signed-off-by: default avatarGuenter Roeck <linux@roeck-us.net>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      634ac348
    • Helge Deller's avatar
      parisc/pci: Switch LBA PCI bus from Hard Fail to Soft Fail mode · c33badb9
      Helge Deller authored
      [ Upstream commit b845f66f ]
      
      Carlo Pisani noticed that his C3600 workstation behaved unstable during heavy
      I/O on the PCI bus with a VIA VT6421 IDE/SATA PCI card.
      
      To avoid such instability, this patch switches the LBA PCI bus from Hard Fail
      mode into Soft Fail mode. In this mode the bus will return -1UL for timed out
      MMIO transactions, which is exactly how the x86 (and most other architectures)
      PCI busses behave.
      
      This patch is based on a proposal by Grant Grundler and Kyle McMartin 10
      years ago:
      https://www.spinics.net/lists/linux-parisc/msg01027.html
      
      Cc: Carlo Pisani <carlojpisani@gmail.com>
      Cc: Kyle McMartin <kyle@mcmartin.ca>
      Reviewed-by: default avatarGrant Grundler <grantgrundler@gmail.com>
      Signed-off-by: default avatarHelge Deller <deller@gmx.de>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      c33badb9
    • Greg Ungerer's avatar
      m68k: set dma and coherent masks for platform FEC ethernets · b1a557b9
      Greg Ungerer authored
      [ Upstream commit f61e6431 ]
      
      As of commit 205e1b7f ("dma-mapping: warn when there is no
      coherent_dma_mask") the Freescale FEC driver is issuing the following
      warning on driver initialization on ColdFire systems:
      
      WARNING: CPU: 0 PID: 1 at ./include/linux/dma-mapping.h:516 0x40159e20
      Modules linked in:
      CPU: 0 PID: 1 Comm: swapper Not tainted 4.16.0-rc7-dirty #4
      Stack from 41833dd8:
              41833dd8 40259c53 40025534 40279e26 00000003 00000000 4004e514 41827000
              400255de 40244e42 00000204 40159e20 00000009 00000000 00000000 4024531d
              40159e20 40244e42 00000204 00000000 00000000 00000000 00000007 00000000
              00000000 40279e26 4028d040 40226576 4003ae88 40279e26 418273f6 41833ef8
              7fffffff 418273f2 41867028 4003c9a2 4180ac6c 00000004 41833f8c 4013e71c
              40279e1c 40279e26 40226c16 4013ced2 40279e26 40279e58 4028d040 00000000
      Call Trace:
              [<40025534>] 0x40025534
       [<4004e514>] 0x4004e514
       [<400255de>] 0x400255de
       [<40159e20>] 0x40159e20
       [<40159e20>] 0x40159e20
      
      It is not fatal, the driver and the system continue to function normally.
      
      As per the warning the coherent_dma_mask is not set on this device.
      There is nothing special about the DMA memory coherency on this hardware
      so we can just set the mask to 32bits in the platform data for the FEC
      ethernet devices.
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      b1a557b9
    • Michael Ellerman's avatar
      powerpc/mpic: Check if cpu_possible() in mpic_physmask() · f8240a4f
      Michael Ellerman authored
      [ Upstream commit 0834d627 ]
      
      In mpic_physmask() we loop over all CPUs up to 32, then get the hard
      SMP processor id of that CPU.
      
      Currently that's possibly walking off the end of the paca array, but
      in a future patch we will change the paca array to be an array of
      pointers, and in that case we will get a NULL for missing CPUs and
      oops. eg:
      
        Unable to handle kernel paging request for data at address 0x88888888888888b8
        Faulting instruction address: 0xc00000000004e380
        Oops: Kernel access of bad area, sig: 11 [#1]
        ...
        NIP .mpic_set_affinity+0x60/0x1a0
        LR  .irq_do_set_affinity+0x48/0x100
      
      Fix it by checking the CPU is possible, this also fixes the code if
      there are gaps in the CPU numbering which probably never happens on
      mpic systems but who knows.
      Debugged-by: default avatarNicholas Piggin <npiggin@gmail.com>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f8240a4f
    • Lenny Szubowicz's avatar
      ACPI: acpi_pad: Fix memory leak in power saving threads · 30ae6240
      Lenny Szubowicz authored
      [ Upstream commit 8b29d29a ]
      
      Fix once per second (round_robin_time) memory leak of about 1 KB in
      each acpi_pad kernel idling thread that is activated.
      
      Found by testing with kmemleak.
      Signed-off-by: default avatarLenny Szubowicz <lszubowi@redhat.com>
      Signed-off-by: default avatarRafael J. Wysocki <rafael.j.wysocki@intel.com>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      30ae6240
    • Dan Carpenter's avatar
      xen/acpi: off by one in read_acpi_id() · a1b03156
      Dan Carpenter authored
      [ Upstream commit c37a3c94 ]
      
      If acpi_id is == nr_acpi_bits, then we access one element beyond the end
      of the acpi_psd[] array or we set one bit beyond the end of the bit map
      when we do __set_bit(acpi_id, acpi_id_present);
      
      Fixes: 59a56802 ("xen/acpi-processor: C and P-state driver that uploads said data to hypervisor.")
      Signed-off-by: default avatarDan Carpenter <dan.carpenter@oracle.com>
      Reviewed-by: default avatarJoao Martins <joao.m.martins@oracle.com>
      Reviewed-by: default avatarJuergen Gross <jgross@suse.com>
      Signed-off-by: default avatarBoris Ostrovsky <boris.ostrovsky@oracle.com>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      a1b03156
    • Jeff Mahoney's avatar
      btrfs: fix lockdep splat in btrfs_alloc_subvolume_writers · 93c13652
      Jeff Mahoney authored
      [ Upstream commit 8a5a916d ]
      
      While running btrfs/011, I hit the following lockdep splat.
      
      This is the important bit:
         pcpu_alloc+0x1ac/0x5e0
         __percpu_counter_init+0x4e/0xb0
         btrfs_init_fs_root+0x99/0x1c0 [btrfs]
         btrfs_get_fs_root.part.54+0x5b/0x150 [btrfs]
         resolve_indirect_refs+0x130/0x830 [btrfs]
         find_parent_nodes+0x69e/0xff0 [btrfs]
         btrfs_find_all_roots_safe+0xa0/0x110 [btrfs]
         btrfs_find_all_roots+0x50/0x70 [btrfs]
         btrfs_qgroup_prepare_account_extents+0x53/0x90 [btrfs]
         btrfs_commit_transaction+0x3ce/0x9b0 [btrfs]
      
      The percpu_counter_init call in btrfs_alloc_subvolume_writers
      uses GFP_KERNEL, which we can't do during transaction commit.
      
      This switches it to GFP_NOFS.
      
      ========================================================
      WARNING: possible irq lock inversion dependency detected
      4.12.14-kvmsmall #8 Tainted: G        W
      --------------------------------------------------------
      kswapd0/50 just changed the state of lock:
       (&delayed_node->mutex){+.+.-.}, at: [<ffffffffc06994fa>] __btrfs_release_delayed_node+0x3a/0x1f0 [btrfs]
      but this lock took another, RECLAIM_FS-unsafe lock in the past:
       (pcpu_alloc_mutex){+.+.+.}
      
      and interrupts could create inverse lock ordering between them.
      
      other info that might help us debug this:
      Chain exists of:
        &delayed_node->mutex --> &found->groups_sem --> pcpu_alloc_mutex
      
       Possible interrupt unsafe locking scenario:
      
             CPU0                    CPU1
             ----                    ----
        lock(pcpu_alloc_mutex);
                                     local_irq_disable();
                                     lock(&delayed_node->mutex);
                                     lock(&found->groups_sem);
        <Interrupt>
          lock(&delayed_node->mutex);
      
       *** DEADLOCK ***
      
      2 locks held by kswapd0/50:
       #0:  (shrinker_rwsem){++++..}, at: [<ffffffff811dc11f>] shrink_slab+0x7f/0x5b0
       #1:  (&type->s_umount_key#30){+++++.}, at: [<ffffffff8126dec6>] trylock_super+0x16/0x50
      
      the shortest dependencies between 2nd lock and 1st lock:
         -> (pcpu_alloc_mutex){+.+.+.} ops: 4904 {
            HARDIRQ-ON-W at:
                                __mutex_lock+0x4e/0x8c0
                                pcpu_alloc+0x1ac/0x5e0
                                alloc_kmem_cache_cpus.isra.70+0x25/0xa0
                                __do_tune_cpucache+0x2c/0x220
                                do_tune_cpucache+0x26/0xc0
                                enable_cpucache+0x6d/0xf0
                                kmem_cache_init_late+0x42/0x75
                                start_kernel+0x343/0x4cb
                                x86_64_start_kernel+0x127/0x134
                                secondary_startup_64+0xa5/0xb0
            SOFTIRQ-ON-W at:
                                __mutex_lock+0x4e/0x8c0
                                pcpu_alloc+0x1ac/0x5e0
                                alloc_kmem_cache_cpus.isra.70+0x25/0xa0
                                __do_tune_cpucache+0x2c/0x220
                                do_tune_cpucache+0x26/0xc0
                                enable_cpucache+0x6d/0xf0
                                kmem_cache_init_late+0x42/0x75
                                start_kernel+0x343/0x4cb
                                x86_64_start_kernel+0x127/0x134
                                secondary_startup_64+0xa5/0xb0
            RECLAIM_FS-ON-W at:
                                   __kmalloc+0x47/0x310
                                   pcpu_extend_area_map+0x2b/0xc0
                                   pcpu_alloc+0x3ec/0x5e0
                                   alloc_kmem_cache_cpus.isra.70+0x25/0xa0
                                   __do_tune_cpucache+0x2c/0x220
                                   do_tune_cpucache+0x26/0xc0
                                   enable_cpucache+0x6d/0xf0
                                   __kmem_cache_create+0x1bf/0x390
                                   create_cache+0xba/0x1b0
                                   kmem_cache_create+0x1f8/0x2b0
                                   ksm_init+0x6f/0x19d
                                   do_one_initcall+0x50/0x1b0
                                   kernel_init_freeable+0x201/0x289
                                   kernel_init+0xa/0x100
                                   ret_from_fork+0x3a/0x50
            INITIAL USE at:
                               __mutex_lock+0x4e/0x8c0
                               pcpu_alloc+0x1ac/0x5e0
                               alloc_kmem_cache_cpus.isra.70+0x25/0xa0
                               setup_cpu_cache+0x2f/0x1f0
                               __kmem_cache_create+0x1bf/0x390
                               create_boot_cache+0x8b/0xb1
                               kmem_cache_init+0xa1/0x19e
                               start_kernel+0x270/0x4cb
                               x86_64_start_kernel+0x127/0x134
                               secondary_startup_64+0xa5/0xb0
          }
          ... key      at: [<ffffffff821d8e70>] pcpu_alloc_mutex+0x70/0xa0
          ... acquired at:
         pcpu_alloc+0x1ac/0x5e0
         __percpu_counter_init+0x4e/0xb0
         btrfs_init_fs_root+0x99/0x1c0 [btrfs]
         btrfs_get_fs_root.part.54+0x5b/0x150 [btrfs]
         resolve_indirect_refs+0x130/0x830 [btrfs]
         find_parent_nodes+0x69e/0xff0 [btrfs]
         btrfs_find_all_roots_safe+0xa0/0x110 [btrfs]
         btrfs_find_all_roots+0x50/0x70 [btrfs]
         btrfs_qgroup_prepare_account_extents+0x53/0x90 [btrfs]
         btrfs_commit_transaction+0x3ce/0x9b0 [btrfs]
         transaction_kthread+0x176/0x1b0 [btrfs]
         kthread+0x102/0x140
         ret_from_fork+0x3a/0x50
      
        -> (&fs_info->commit_root_sem){++++..} ops: 1566382 {
           HARDIRQ-ON-W at:
                              down_write+0x3e/0xa0
                              cache_block_group+0x287/0x420 [btrfs]
                              find_free_extent+0x106c/0x12d0 [btrfs]
                              btrfs_reserve_extent+0xd8/0x170 [btrfs]
                              cow_file_range.isra.66+0x133/0x470 [btrfs]
                              run_delalloc_range+0x121/0x410 [btrfs]
                              writepage_delalloc.isra.50+0xfe/0x180 [btrfs]
                              __extent_writepage+0x19a/0x360 [btrfs]
                              extent_write_cache_pages.constprop.56+0x249/0x3e0 [btrfs]
                              extent_writepages+0x4d/0x60 [btrfs]
                              do_writepages+0x1a/0x70
                              __filemap_fdatawrite_range+0xa7/0xe0
                              btrfs_rename+0x5ee/0xdb0 [btrfs]
                              vfs_rename+0x52a/0x7e0
                              SyS_rename+0x351/0x3b0
                              do_syscall_64+0x79/0x1e0
                              entry_SYSCALL_64_after_hwframe+0x42/0xb7
           HARDIRQ-ON-R at:
                              down_read+0x35/0x90
                              caching_thread+0x57/0x560 [btrfs]
                              normal_work_helper+0x1c0/0x5e0 [btrfs]
                              process_one_work+0x1e0/0x5c0
                              worker_thread+0x44/0x390
                              kthread+0x102/0x140
                              ret_from_fork+0x3a/0x50
           SOFTIRQ-ON-W at:
                              down_write+0x3e/0xa0
                              cache_block_group+0x287/0x420 [btrfs]
                              find_free_extent+0x106c/0x12d0 [btrfs]
                              btrfs_reserve_extent+0xd8/0x170 [btrfs]
                              cow_file_range.isra.66+0x133/0x470 [btrfs]
                              run_delalloc_range+0x121/0x410 [btrfs]
                              writepage_delalloc.isra.50+0xfe/0x180 [btrfs]
                              __extent_writepage+0x19a/0x360 [btrfs]
                              extent_write_cache_pages.constprop.56+0x249/0x3e0 [btrfs]
                              extent_writepages+0x4d/0x60 [btrfs]
                              do_writepages+0x1a/0x70
                              __filemap_fdatawrite_range+0xa7/0xe0
                              btrfs_rename+0x5ee/0xdb0 [btrfs]
                              vfs_rename+0x52a/0x7e0
                              SyS_rename+0x351/0x3b0
                              do_syscall_64+0x79/0x1e0
                              entry_SYSCALL_64_after_hwframe+0x42/0xb7
           SOFTIRQ-ON-R at:
                              down_read+0x35/0x90
                              caching_thread+0x57/0x560 [btrfs]
                              normal_work_helper+0x1c0/0x5e0 [btrfs]
                              process_one_work+0x1e0/0x5c0
                              worker_thread+0x44/0x390
                              kthread+0x102/0x140
                              ret_from_fork+0x3a/0x50
           INITIAL USE at:
                             down_write+0x3e/0xa0
                             cache_block_group+0x287/0x420 [btrfs]
                             find_free_extent+0x106c/0x12d0 [btrfs]
                             btrfs_reserve_extent+0xd8/0x170 [btrfs]
                             cow_file_range.isra.66+0x133/0x470 [btrfs]
                             run_delalloc_range+0x121/0x410 [btrfs]
                             writepage_delalloc.isra.50+0xfe/0x180 [btrfs]
                             __extent_writepage+0x19a/0x360 [btrfs]
                             extent_write_cache_pages.constprop.56+0x249/0x3e0 [btrfs]
                             extent_writepages+0x4d/0x60 [btrfs]
                             do_writepages+0x1a/0x70
                             __filemap_fdatawrite_range+0xa7/0xe0
                             btrfs_rename+0x5ee/0xdb0 [btrfs]
                             vfs_rename+0x52a/0x7e0
                             SyS_rename+0x351/0x3b0
                             do_syscall_64+0x79/0x1e0
                             entry_SYSCALL_64_after_hwframe+0x42/0xb7
         }
         ... key      at: [<ffffffffc0729578>] __key.61970+0x0/0xfffffffffff9aa88 [btrfs]
         ... acquired at:
         cache_block_group+0x287/0x420 [btrfs]
         find_free_extent+0x106c/0x12d0 [btrfs]
         btrfs_reserve_extent+0xd8/0x170 [btrfs]
         btrfs_alloc_tree_block+0x12f/0x4c0 [btrfs]
         btrfs_create_tree+0xbb/0x2a0 [btrfs]
         btrfs_create_uuid_tree+0x37/0x140 [btrfs]
         open_ctree+0x23c0/0x2660 [btrfs]
         btrfs_mount+0xd36/0xf90 [btrfs]
         mount_fs+0x3a/0x160
         vfs_kern_mount+0x66/0x150
         btrfs_mount+0x18c/0xf90 [btrfs]
         mount_fs+0x3a/0x160
         vfs_kern_mount+0x66/0x150
         do_mount+0x1c1/0xcc0
         SyS_mount+0x7e/0xd0
         do_syscall_64+0x79/0x1e0
         entry_SYSCALL_64_after_hwframe+0x42/0xb7
      
       -> (&found->groups_sem){++++..} ops: 2134587 {
          HARDIRQ-ON-W at:
                            down_write+0x3e/0xa0
                            __link_block_group+0x34/0x130 [btrfs]
                            btrfs_read_block_groups+0x33d/0x7b0 [btrfs]
                            open_ctree+0x2054/0x2660 [btrfs]
                            btrfs_mount+0xd36/0xf90 [btrfs]
                            mount_fs+0x3a/0x160
                            vfs_kern_mount+0x66/0x150
                            btrfs_mount+0x18c/0xf90 [btrfs]
                            mount_fs+0x3a/0x160
                            vfs_kern_mount+0x66/0x150
                            do_mount+0x1c1/0xcc0
                            SyS_mount+0x7e/0xd0
                            do_syscall_64+0x79/0x1e0
                            entry_SYSCALL_64_after_hwframe+0x42/0xb7
          HARDIRQ-ON-R at:
                            down_read+0x35/0x90
                            btrfs_calc_num_tolerated_disk_barrier_failures+0x113/0x1f0 [btrfs]
                            open_ctree+0x207b/0x2660 [btrfs]
                            btrfs_mount+0xd36/0xf90 [btrfs]
                            mount_fs+0x3a/0x160
                            vfs_kern_mount+0x66/0x150
                            btrfs_mount+0x18c/0xf90 [btrfs]
                            mount_fs+0x3a/0x160
                            vfs_kern_mount+0x66/0x150
                            do_mount+0x1c1/0xcc0
                            SyS_mount+0x7e/0xd0
                            do_syscall_64+0x79/0x1e0
                            entry_SYSCALL_64_after_hwframe+0x42/0xb7
          SOFTIRQ-ON-W at:
                            down_write+0x3e/0xa0
                            __link_block_group+0x34/0x130 [btrfs]
                            btrfs_read_block_groups+0x33d/0x7b0 [btrfs]
                            open_ctree+0x2054/0x2660 [btrfs]
                            btrfs_mount+0xd36/0xf90 [btrfs]
                            mount_fs+0x3a/0x160
                            vfs_kern_mount+0x66/0x150
                            btrfs_mount+0x18c/0xf90 [btrfs]
                            mount_fs+0x3a/0x160
                            vfs_kern_mount+0x66/0x150
                            do_mount+0x1c1/0xcc0
                            SyS_mount+0x7e/0xd0
                            do_syscall_64+0x79/0x1e0
                            entry_SYSCALL_64_after_hwframe+0x42/0xb7
          SOFTIRQ-ON-R at:
                            down_read+0x35/0x90
                            btrfs_calc_num_tolerated_disk_barrier_failures+0x113/0x1f0 [btrfs]
                            open_ctree+0x207b/0x2660 [btrfs]
                            btrfs_mount+0xd36/0xf90 [btrfs]
                            mount_fs+0x3a/0x160
                            vfs_kern_mount+0x66/0x150
                            btrfs_mount+0x18c/0xf90 [btrfs]
                            mount_fs+0x3a/0x160
                            vfs_kern_mount+0x66/0x150
                            do_mount+0x1c1/0xcc0
                            SyS_mount+0x7e/0xd0
                            do_syscall_64+0x79/0x1e0
                            entry_SYSCALL_64_after_hwframe+0x42/0xb7
          INITIAL USE at:
                           down_write+0x3e/0xa0
                           __link_block_group+0x34/0x130 [btrfs]
                           btrfs_read_block_groups+0x33d/0x7b0 [btrfs]
                           open_ctree+0x2054/0x2660 [btrfs]
                           btrfs_mount+0xd36/0xf90 [btrfs]
                           mount_fs+0x3a/0x160
                           vfs_kern_mount+0x66/0x150
                           btrfs_mount+0x18c/0xf90 [btrfs]
                           mount_fs+0x3a/0x160
                           vfs_kern_mount+0x66/0x150
                           do_mount+0x1c1/0xcc0
                           SyS_mount+0x7e/0xd0
                           do_syscall_64+0x79/0x1e0
                           entry_SYSCALL_64_after_hwframe+0x42/0xb7
        }
        ... key      at: [<ffffffffc0729488>] __key.59101+0x0/0xfffffffffff9ab78 [btrfs]
        ... acquired at:
         find_free_extent+0xcb4/0x12d0 [btrfs]
         btrfs_reserve_extent+0xd8/0x170 [btrfs]
         btrfs_alloc_tree_block+0x12f/0x4c0 [btrfs]
         __btrfs_cow_block+0x110/0x5b0 [btrfs]
         btrfs_cow_block+0xd7/0x290 [btrfs]
         btrfs_search_slot+0x1f6/0x960 [btrfs]
         btrfs_lookup_inode+0x2a/0x90 [btrfs]
         __btrfs_update_delayed_inode+0x65/0x210 [btrfs]
         btrfs_commit_inode_delayed_inode+0x121/0x130 [btrfs]
         btrfs_evict_inode+0x3fe/0x6a0 [btrfs]
         evict+0xc4/0x190
         __dentry_kill+0xbf/0x170
         dput+0x2ae/0x2f0
         SyS_rename+0x2a6/0x3b0
         do_syscall_64+0x79/0x1e0
         entry_SYSCALL_64_after_hwframe+0x42/0xb7
      
      -> (&delayed_node->mutex){+.+.-.} ops: 5580204 {
         HARDIRQ-ON-W at:
                          __mutex_lock+0x4e/0x8c0
                          btrfs_delayed_update_inode+0x46/0x6e0 [btrfs]
                          btrfs_update_inode+0x83/0x110 [btrfs]
                          btrfs_dirty_inode+0x62/0xe0 [btrfs]
                          touch_atime+0x8c/0xb0
                          do_generic_file_read+0x818/0xb10
                          __vfs_read+0xdc/0x150
                          vfs_read+0x8a/0x130
                          SyS_read+0x45/0xa0
                          do_syscall_64+0x79/0x1e0
                          entry_SYSCALL_64_after_hwframe+0x42/0xb7
         SOFTIRQ-ON-W at:
                          __mutex_lock+0x4e/0x8c0
                          btrfs_delayed_update_inode+0x46/0x6e0 [btrfs]
                          btrfs_update_inode+0x83/0x110 [btrfs]
                          btrfs_dirty_inode+0x62/0xe0 [btrfs]
                          touch_atime+0x8c/0xb0
                          do_generic_file_read+0x818/0xb10
                          __vfs_read+0xdc/0x150
                          vfs_read+0x8a/0x130
                          SyS_read+0x45/0xa0
                          do_syscall_64+0x79/0x1e0
                          entry_SYSCALL_64_after_hwframe+0x42/0xb7
         IN-RECLAIM_FS-W at:
                             __mutex_lock+0x4e/0x8c0
                             __btrfs_release_delayed_node+0x3a/0x1f0 [btrfs]
                             btrfs_evict_inode+0x22c/0x6a0 [btrfs]
                             evict+0xc4/0x190
                             dispose_list+0x35/0x50
                             prune_icache_sb+0x42/0x50
                             super_cache_scan+0x139/0x190
                             shrink_slab+0x262/0x5b0
                             shrink_node+0x2eb/0x2f0
                             kswapd+0x2eb/0x890
                             kthread+0x102/0x140
                             ret_from_fork+0x3a/0x50
         INITIAL USE at:
                         __mutex_lock+0x4e/0x8c0
                         btrfs_delayed_update_inode+0x46/0x6e0 [btrfs]
                         btrfs_update_inode+0x83/0x110 [btrfs]
                         btrfs_dirty_inode+0x62/0xe0 [btrfs]
                         touch_atime+0x8c/0xb0
                         do_generic_file_read+0x818/0xb10
                         __vfs_read+0xdc/0x150
                         vfs_read+0x8a/0x130
                         SyS_read+0x45/0xa0
                         do_syscall_64+0x79/0x1e0
                         entry_SYSCALL_64_after_hwframe+0x42/0xb7
       }
       ... key      at: [<ffffffffc072d488>] __key.56935+0x0/0xfffffffffff96b78 [btrfs]
       ... acquired at:
         __lock_acquire+0x264/0x11c0
         lock_acquire+0xbd/0x1e0
         __mutex_lock+0x4e/0x8c0
         __btrfs_release_delayed_node+0x3a/0x1f0 [btrfs]
         btrfs_evict_inode+0x22c/0x6a0 [btrfs]
         evict+0xc4/0x190
         dispose_list+0x35/0x50
         prune_icache_sb+0x42/0x50
         super_cache_scan+0x139/0x190
         shrink_slab+0x262/0x5b0
         shrink_node+0x2eb/0x2f0
         kswapd+0x2eb/0x890
         kthread+0x102/0x140
         ret_from_fork+0x3a/0x50
      
      stack backtrace:
      CPU: 1 PID: 50 Comm: kswapd0 Tainted: G        W        4.12.14-kvmsmall #8 SLE15 (unreleased)
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.0.0-prebuilt.qemu-project.org 04/01/2014
      Call Trace:
       dump_stack+0x78/0xb7
       print_irq_inversion_bug.part.38+0x19f/0x1aa
       check_usage_forwards+0x102/0x120
       ? ret_from_fork+0x3a/0x50
       ? check_usage_backwards+0x110/0x110
       mark_lock+0x16c/0x270
       __lock_acquire+0x264/0x11c0
       ? pagevec_lookup_entries+0x1a/0x30
       ? truncate_inode_pages_range+0x2b3/0x7f0
       lock_acquire+0xbd/0x1e0
       ? __btrfs_release_delayed_node+0x3a/0x1f0 [btrfs]
       __mutex_lock+0x4e/0x8c0
       ? __btrfs_release_delayed_node+0x3a/0x1f0 [btrfs]
       ? __btrfs_release_delayed_node+0x3a/0x1f0 [btrfs]
       ? btrfs_evict_inode+0x1f6/0x6a0 [btrfs]
       __btrfs_release_delayed_node+0x3a/0x1f0 [btrfs]
       btrfs_evict_inode+0x22c/0x6a0 [btrfs]
       evict+0xc4/0x190
       dispose_list+0x35/0x50
       prune_icache_sb+0x42/0x50
       super_cache_scan+0x139/0x190
       shrink_slab+0x262/0x5b0
       shrink_node+0x2eb/0x2f0
       kswapd+0x2eb/0x890
       kthread+0x102/0x140
       ? mem_cgroup_shrink_node+0x2c0/0x2c0
       ? kthread_create_on_node+0x40/0x40
       ret_from_fork+0x3a/0x50
      Signed-off-by: default avatarJeff Mahoney <jeffm@suse.com>
      Reviewed-by: default avatarLiu Bo <bo.liu@linux.alibaba.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      93c13652
    • Filipe Manana's avatar
      Btrfs: fix copy_items() return value when logging an inode · b037a568
      Filipe Manana authored
      [ Upstream commit 8434ec46 ]
      
      When logging an inode, at tree-log.c:copy_items(), if we call
      btrfs_next_leaf() at the loop which checks for the need to log holes, we
      need to make sure copy_items() returns the value 1 to its caller and
      not 0 (on success). This is because the path the caller passed was
      released and is now different from what is was before, and the caller
      expects a return value of 0 to mean both success and that the path
      has not changed, while a return value of 1 means both success and
      signals the caller that it can not reuse the path, it has to perform
      another tree search.
      
      Even though this is a case that should not be triggered on normal
      circumstances or very rare at least, its consequences can be very
      unpredictable (especially when replaying a log tree).
      
      Fixes: 16e7549f ("Btrfs: incompatible format change to remove hole extents")
      Signed-off-by: default avatarFilipe Manana <fdmanana@suse.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      b037a568
    • Qu Wenruo's avatar
      btrfs: tests/qgroup: Fix wrong tree backref level · a77b0cfe
      Qu Wenruo authored
      [ Upstream commit 3c0efdf0 ]
      
      The extent tree of the test fs is like the following:
      
       BTRFS info (device (null)): leaf 16327509003777336587 total ptrs 1 free space 3919
        item 0 key (4096 168 4096) itemoff 3944 itemsize 51
                extent refs 1 gen 1 flags 2
                tree block key (68719476736 0 0) level 1
                                                 ^^^^^^^
                ref#0: tree block backref root 5
      
      And it's using an empty tree for fs tree, so there is no way that its
      level can be 1.
      
      For REAL (created by mkfs) fs tree backref with no skinny metadata, the
      result should look like:
      
       item 3 key (30408704 EXTENT_ITEM 4096) itemoff 3845 itemsize 51
               refs 1 gen 4 flags TREE_BLOCK
               tree block key (256 INODE_ITEM 0) level 0
                                                 ^^^^^^^
               tree block backref root 5
      
      Fix the level to 0, so it won't break later tree level checker.
      
      Fixes: faa2dbf0 ("Btrfs: add sanity tests for new qgroup accounting code")
      Signed-off-by: default avatarQu Wenruo <wqu@suse.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      a77b0cfe
    • Vicente Bergas's avatar
      Bluetooth: btusb: Add USB ID 7392:a611 for Edimax EW-7611ULB · b18cda27
      Vicente Bergas authored
      [ Upstream commit a41e0796 ]
      
      This WiFi/Bluetooth USB dongle uses a Realtek chipset, so, use btrtl for it.
      
      Product information:
      https://wikidevi.com/wiki/Edimax_EW-7611ULB
      
      >From /sys/kernel/debug/usb/devices
      T:  Bus=02 Lev=02 Prnt=02 Port=00 Cnt=01 Dev#=  3 Spd=480  MxCh= 0
      D:  Ver= 2.10 Cls=ef(misc ) Sub=02 Prot=01 MxPS=64 #Cfgs=  1
      P:  Vendor=7392 ProdID=a611 Rev= 2.00
      S:  Manufacturer=Realtek
      S:  Product=Edimax Wi-Fi N150 Bluetooth4.0 USB Adapter
      S:  SerialNumber=00e04c000001
      C:* #Ifs= 3 Cfg#= 1 Atr=e0 MxPwr=500mA
      A:  FirstIf#= 0 IfCount= 2 Cls=e0(wlcon) Sub=01 Prot=01
      I:* If#= 0 Alt= 0 #EPs= 3 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb
      E:  Ad=81(I) Atr=03(Int.) MxPS=  16 Ivl=1ms
      E:  Ad=02(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
      E:  Ad=82(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
      I:* If#= 1 Alt= 0 #EPs= 2 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb
      E:  Ad=03(O) Atr=01(Isoc) MxPS=   0 Ivl=1ms
      E:  Ad=83(I) Atr=01(Isoc) MxPS=   0 Ivl=1ms
      I:  If#= 1 Alt= 1 #EPs= 2 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb
      E:  Ad=03(O) Atr=01(Isoc) MxPS=   9 Ivl=1ms
      E:  Ad=83(I) Atr=01(Isoc) MxPS=   9 Ivl=1ms
      I:  If#= 1 Alt= 2 #EPs= 2 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb
      E:  Ad=03(O) Atr=01(Isoc) MxPS=  17 Ivl=1ms
      E:  Ad=83(I) Atr=01(Isoc) MxPS=  17 Ivl=1ms
      I:  If#= 1 Alt= 3 #EPs= 2 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb
      E:  Ad=03(O) Atr=01(Isoc) MxPS=  25 Ivl=1ms
      E:  Ad=83(I) Atr=01(Isoc) MxPS=  25 Ivl=1ms
      I:  If#= 1 Alt= 4 #EPs= 2 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb
      E:  Ad=03(O) Atr=01(Isoc) MxPS=  33 Ivl=1ms
      E:  Ad=83(I) Atr=01(Isoc) MxPS=  33 Ivl=1ms
      I:  If#= 1 Alt= 5 #EPs= 2 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb
      E:  Ad=03(O) Atr=01(Isoc) MxPS=  49 Ivl=1ms
      E:  Ad=83(I) Atr=01(Isoc) MxPS=  49 Ivl=1ms
      I:* If#= 2 Alt= 0 #EPs= 6 Cls=ff(vend.) Sub=ff Prot=ff Driver=rtl8723bu
      E:  Ad=84(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
      E:  Ad=05(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
      E:  Ad=06(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
      E:  Ad=87(I) Atr=03(Int.) MxPS=  64 Ivl=500us
      E:  Ad=08(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
      E:  Ad=09(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
      Tested-by: default avatarVicente Bergas <vicencb@gmail.com>
      Signed-off-by: default avatarVicente Bergas <vicencb@gmail.com>
      Signed-off-by: default avatarMarcel Holtmann <marcel@holtmann.org>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      b18cda27
    • Florian Fainelli's avatar
      net: bgmac: Fix endian access in bgmac_dma_tx_ring_free() · 7dd86043
      Florian Fainelli authored
      [ Upstream commit 60d6e6f0 ]
      
      bgmac_dma_tx_ring_free() assigns the ctl1 word which is a litle endian
      32-bit word without using proper accessors, fix this, and because a
      length cannot be negative, use unsigned int while at it.
      
      Fixes: 9cde9450 ("bgmac: implement scatter/gather support")
      Signed-off-by: default avatarFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      7dd86043
    • Bryan O'Donoghue's avatar
      rtc: snvs: Fix usage of snvs_rtc_enable · d9bc96c4
      Bryan O'Donoghue authored
      [ Upstream commit 1485991c ]
      
      commit 179a502f ("rtc: snvs: add Freescale rtc-snvs driver") introduces
      the SNVS RTC driver with a function snvs_rtc_enable().
      
      snvs_rtc_enable() can return an error on the enable path however this
      driver does not currently trap that failure on the probe() path and
      consequently if enabling the RTC fails we encounter a later error spinning
      forever in rtc_write_sync_lp().
      
      [   36.093481] [<c010d630>] (__irq_svc) from [<c0c2e9ec>] (_raw_spin_unlock_irqrestore+0x34/0x44)
      [   36.102122] [<c0c2e9ec>] (_raw_spin_unlock_irqrestore) from [<c072e32c>] (regmap_read+0x4c/0x5c)
      [   36.110938] [<c072e32c>] (regmap_read) from [<c085d0f4>] (rtc_write_sync_lp+0x6c/0x98)
      [   36.118881] [<c085d0f4>] (rtc_write_sync_lp) from [<c085d160>] (snvs_rtc_alarm_irq_enable+0x40/0x4c)
      [   36.128041] [<c085d160>] (snvs_rtc_alarm_irq_enable) from [<c08567b4>] (rtc_timer_do_work+0xd8/0x1a8)
      [   36.137291] [<c08567b4>] (rtc_timer_do_work) from [<c01441b8>] (process_one_work+0x28c/0x76c)
      [   36.145840] [<c01441b8>] (process_one_work) from [<c01446cc>] (worker_thread+0x34/0x58c)
      [   36.153961] [<c01446cc>] (worker_thread) from [<c014aee4>] (kthread+0x138/0x150)
      [   36.161388] [<c014aee4>] (kthread) from [<c0107e14>] (ret_from_fork+0x14/0x20)
      [   36.168635] rcu_sched kthread starved for 2602 jiffies! g496 c495 f0x2 RCU_GP_WAIT_FQS(3) ->state=0x0 ->cpu=0
      [   36.178564] rcu_sched       R  running task        0     8      2 0x00000000
      [   36.185664] [<c0c288b0>] (__schedule) from [<c0c29134>] (schedule+0x3c/0xa0)
      [   36.192739] [<c0c29134>] (schedule) from [<c0c2db80>] (schedule_timeout+0x78/0x4e0)
      [   36.200422] [<c0c2db80>] (schedule_timeout) from [<c01a7ab0>] (rcu_gp_kthread+0x648/0x1864)
      [   36.208800] [<c01a7ab0>] (rcu_gp_kthread) from [<c014aee4>] (kthread+0x138/0x150)
      [   36.216309] [<c014aee4>] (kthread) from [<c0107e14>] (ret_from_fork+0x14/0x20)
      
      This patch fixes by parsing the result of rtc_write_sync_lp() and
      propagating both in the probe and elsewhere. If the RTC doesn't start we
      don't proceed loading the driver and don't get into this loop mess later
      on.
      
      Fixes: 179a502f ("rtc: snvs: add Freescale rtc-snvs driver")
      Signed-off-by: default avatarBryan O'Donoghue <pure.logic@nexus-software.ie>
      Acked-by: default avatarShawn Guo <shawn.guo@linaro.org>
      Signed-off-by: default avatarAlexandre Belloni <alexandre.belloni@bootlin.com>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      d9bc96c4
    • David S. Miller's avatar
      sparc64: Make atomic_xchg() an inline function rather than a macro. · 659db521
      David S. Miller authored
      [ Upstream commit d13864b6 ]
      
      This avoids a lot of -Wunused warnings such as:
      
      ====================
      kernel/debug/debug_core.c: In function ‘kgdb_cpu_enter’:
      ./arch/sparc/include/asm/cmpxchg_64.h:55:22: warning: value computed is not used [-Wunused-value]
       #define xchg(ptr,x) ((__typeof__(*(ptr)))__xchg((unsigned long)(x),(ptr),sizeof(*(ptr))))
      
      ./arch/sparc/include/asm/atomic_64.h:86:30: note: in expansion of macro ‘xchg’
       #define atomic_xchg(v, new) (xchg(&((v)->counter), new))
                                    ^~~~
      kernel/debug/debug_core.c:508:4: note: in expansion of macro ‘atomic_xchg’
          atomic_xchg(&kgdb_active, cpu);
          ^~~~~~~~~~~
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      659db521
    • David Howells's avatar
      fscache: Fix hanging wait on page discarded by writeback · 61b4c050
      David Howells authored
      [ Upstream commit 2c984257 ]
      
      If the fscache asynchronous write operation elects to discard a page that's
      pending storage to the cache because the page would be over the store limit
      then it needs to wake the page as someone may be waiting on completion of
      the write.
      
      The problem is that the store limit may be updated by a different
      asynchronous operation - and so may miss the write - and that the store
      limit may not even get updated until later by the netfs.
      
      Fix the kernel hang by making fscache_write_op() mark as written any pages
      that are over the limit.
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      61b4c050
    • Sean Christopherson's avatar
      KVM: VMX: raise internal error for exception during invalid protected mode state · ffd0502d
      Sean Christopherson authored
      [ Upstream commit add5ff7a ]
      
      Exit to userspace with KVM_INTERNAL_ERROR_EMULATION if we encounter
      an exception in Protected Mode while emulating guest due to invalid
      guest state.  Unlike Big RM, KVM doesn't support emulating exceptions
      in PM, i.e. PM exceptions are always injected via the VMCS.  Because
      we will never do VMRESUME due to emulation_required, the exception is
      never realized and we'll keep emulating the faulting instruction over
      and over until we receive a signal.
      
      Exit to userspace iff there is a pending exception, i.e. don't exit
      simply on a requested event. The purpose of this check and exit is to
      aid in debugging a guest that is in all likelihood already doomed.
      Invalid guest state in PM is extremely limited in normal operation,
      e.g. it generally only occurs for a few instructions early in BIOS,
      and any exception at this time is all but guaranteed to be fatal.
      Non-vectored interrupts, e.g. INIT, SIPI and SMI, can be cleanly
      handled/emulated, while checking for vectored interrupts, e.g. INTR
      and NMI, without hitting false positives would add a fair amount of
      complexity for almost no benefit (getting hit by lightning seems
      more likely than encountering this specific scenario).
      
      Add a WARN_ON_ONCE to vmx_queue_exception() if we try to inject an
      exception via the VMCS and emulation_required is true.
      Signed-off-by: default avatarSean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: default avatarRadim Krčmář <rkrcmar@redhat.com>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      ffd0502d
    • Davidlohr Bueso's avatar
      sched/rt: Fix rq->clock_update_flags < RQCF_ACT_SKIP warning · 094c145f
      Davidlohr Bueso authored
      [ Upstream commit d29a2064 ]
      
      While running rt-tests' pi_stress program I got the following splat:
      
        rq->clock_update_flags < RQCF_ACT_SKIP
        WARNING: CPU: 27 PID: 0 at kernel/sched/sched.h:960 assert_clock_updated.isra.38.part.39+0x13/0x20
      
        [...]
      
        <IRQ>
        enqueue_top_rt_rq+0xf4/0x150
        ? cpufreq_dbs_governor_start+0x170/0x170
        sched_rt_rq_enqueue+0x65/0x80
        sched_rt_period_timer+0x156/0x360
        ? sched_rt_rq_enqueue+0x80/0x80
        __hrtimer_run_queues+0xfa/0x260
        hrtimer_interrupt+0xcb/0x220
        smp_apic_timer_interrupt+0x62/0x120
        apic_timer_interrupt+0xf/0x20
        </IRQ>
      
        [...]
      
        do_idle+0x183/0x1e0
        cpu_startup_entry+0x5f/0x70
        start_secondary+0x192/0x1d0
        secondary_startup_64+0xa5/0xb0
      
      We can get rid of it be the "traditional" means of adding an
      update_rq_clock() call after acquiring the rq->lock in
      do_sched_rt_period_timer().
      
      The case for the RT task throttling (which this workload also hits)
      can be ignored in that the skip_update call is actually bogus and
      quite the contrary (the request bits are removed/reverted).
      
      By setting RQCF_UPDATED we really don't care if the skip is happening
      or not and will therefore make the assert_clock_updated() check happy.
      Signed-off-by: default avatarDavidlohr Bueso <dbueso@suse.de>
      Reviewed-by: default avatarMatt Fleming <matt@codeblueprint.co.uk>
      Acked-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: dave@stgolabs.net
      Cc: linux-kernel@vger.kernel.org
      Cc: rostedt@goodmis.org
      Link: http://lkml.kernel.org/r/20180402164954.16255-1-dave@stgolabs.netSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      094c145f
    • Jun Piao's avatar
      ocfs2/dlm: don't handle migrate lockres if already in shutdown · 6b6933fd
      Jun Piao authored
      [ Upstream commit bb34f24c ]
      
      We should not handle migrate lockres if we are already in
      'DLM_CTXT_IN_SHUTDOWN', as that will cause lockres remains after leaving
      dlm domain.  At last other nodes will get stuck into infinite loop when
      requsting lock from us.
      
      The problem is caused by concurrency umount between nodes.  Before
      receiveing N1's DLM_BEGIN_EXIT_DOMAIN_MSG, N2 has picked up N1 as the
      migrate target.  So N2 will continue sending lockres to N1 even though
      N1 has left domain.
      
              N1                             N2 (owner)
                                             touch file
      
          access the file,
          and get pr lock
      
                                             begin leave domain and
                                             pick up N1 as new owner
      
          begin leave domain and
          migrate all lockres done
      
                                             begin migrate lockres to N1
      
          end leave domain, but
          the lockres left
          unexpectedly, because
          migrate task has passed
      
      [piaojun@huawei.com: v3]
        Link: http://lkml.kernel.org/r/5A9CBD19.5020107@huawei.com
      Link: http://lkml.kernel.org/r/5A99F028.2090902@huawei.comSigned-off-by: default avatarJun Piao <piaojun@huawei.com>
      Reviewed-by: default avatarYiwen Jiang <jiangyiwen@huawei.com>
      Reviewed-by: default avatarJoseph Qi <jiangqi903@gmail.com>
      Reviewed-by: default avatarChangwei Ge <ge.changwei@h3c.com>
      Cc: Mark Fasheh <mark@fasheh.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Junxiao Bi <junxiao.bi@oracle.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      6b6933fd
    • Nikolay Borisov's avatar
      btrfs: Fix possible softlock on single core machines · e1ccd597
      Nikolay Borisov authored
      [ Upstream commit 1e1c50a9 ]
      
      do_chunk_alloc implements a loop checking whether there is a pending
      chunk allocation and if so causes the caller do loop. Generally this
      loop is executed only once, however testing with btrfs/072 on a single
      core vm machines uncovered an extreme case where the system could loop
      indefinitely. This is due to a missing cond_resched when loop which
      doesn't give a chance to the previous chunk allocator finish its job.
      
      The fix is to simply add the missing cond_resched.
      
      Fixes: 6d74119f ("Btrfs: avoid taking the chunk_mutex in do_chunk_alloc")
      Signed-off-by: default avatarNikolay Borisov <nborisov@suse.com>
      Reviewed-by: default avatarDavid Sterba <dsterba@suse.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      e1ccd597
    • Liu Bo's avatar
      Btrfs: fix NULL pointer dereference in log_dir_items · b487d1cd
      Liu Bo authored
      [ Upstream commit 80c0b421 ]
      
      0, 1 and <0 can be returned by btrfs_next_leaf(), and when <0 is
      returned, path->nodes[0] could be NULL, log_dir_items lacks such a
      check for <0 and we may run into a null pointer dereference panic.
      
      Fixes: e02119d5 ("Btrfs: Add a write ahead tree log to optimize synchronous operations")
      Reviewed-by: default avatarNikolay Borisov <nborisov@suse.com>
      Signed-off-by: default avatarLiu Bo <bo.liu@linux.alibaba.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      b487d1cd
    • Liu Bo's avatar
      Btrfs: bail out on error during replay_dir_deletes · 0c3968cc
      Liu Bo authored
      [ Upstream commit b98def7c ]
      
      If errors were returned by btrfs_next_leaf(), replay_dir_deletes needs
      to bail out, otherwise @ret would be forced to be 0 after 'break;' and
      the caller won't be aware of it.
      
      Fixes: e02119d5 ("Btrfs: Add a write ahead tree log to optimize synchronous operations")
      Reviewed-by: default avatarNikolay Borisov <nborisov@suse.com>
      Signed-off-by: default avatarLiu Bo <bo.liu@linux.alibaba.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      0c3968cc
    • Huang Ying's avatar
      mm: fix races between address_space dereference and free in page_evicatable · 532618b4
      Huang Ying authored
      [ Upstream commit e92bb4dd ]
      
      When page_mapping() is called and the mapping is dereferenced in
      page_evicatable() through shrink_active_list(), it is possible for the
      inode to be truncated and the embedded address space to be freed at the
      same time.  This may lead to the following race.
      
      CPU1                                                CPU2
      
      truncate(inode)                                     shrink_active_list()
        ...                                                 page_evictable(page)
        truncate_inode_page(mapping, page);
          delete_from_page_cache(page)
            spin_lock_irqsave(&mapping->tree_lock, flags);
              __delete_from_page_cache(page, NULL)
                page_cache_tree_delete(..)
                  ...                                         mapping = page_mapping(page);
                  page->mapping = NULL;
                  ...
            spin_unlock_irqrestore(&mapping->tree_lock, flags);
            page_cache_free_page(mapping, page)
              put_page(page)
                if (put_page_testzero(page)) -> false
      - inode now has no pages and can be freed including embedded address_space
      
                                                              mapping_unevictable(mapping)
      							  test_bit(AS_UNEVICTABLE, &mapping->flags);
      - we've dereferenced mapping which is potentially already free.
      
      Similar race exists between swap cache freeing and page_evicatable()
      too.
      
      The address_space in inode and swap cache will be freed after a RCU
      grace period.  So the races are fixed via enclosing the page_mapping()
      and address_space usage in rcu_read_lock/unlock().  Some comments are
      added in code to make it clear what is protected by the RCU read lock.
      
      Link: http://lkml.kernel.org/r/20180212081227.1940-1-ying.huang@intel.comSigned-off-by: default avatar"Huang, Ying" <ying.huang@intel.com>
      Reviewed-by: default avatarJan Kara <jack@suse.cz>
      Reviewed-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: "Huang, Ying" <ying.huang@intel.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Michal Hocko <mhocko@suse.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      532618b4
    • Claudio Imbrenda's avatar
      mm/ksm: fix interaction with THP · 35de741a
      Claudio Imbrenda authored
      [ Upstream commit 77da2ba0 ]
      
      This patch fixes a corner case for KSM.  When two pages belong or
      belonged to the same transparent hugepage, and they should be merged,
      KSM fails to split the page, and therefore no merging happens.
      
      This bug can be reproduced by:
      * making sure ksm is running (in case disabling ksmtuned)
      * enabling transparent hugepages
      * allocating a THP-aligned 1-THP-sized buffer
        e.g. on amd64: posix_memalign(&p, 1<<21, 1<<21)
      * filling it with the same values
        e.g. memset(p, 42, 1<<21)
      * performing madvise to make it mergeable
        e.g. madvise(p, 1<<21, MADV_MERGEABLE)
      * waiting for KSM to perform a few scans
      
      The expected outcome is that the all the pages get merged (1 shared and
      the rest sharing); the actual outcome is that no pages get merged (1
      unshared and the rest volatile)
      
      The reason of this behaviour is that we increase the reference count
      once for both pages we want to merge, but if they belong to the same
      hugepage (or compound page), the reference counter used in both cases is
      the one of the head of the compound page.  This means that
      split_huge_page will find a value of the reference counter too high and
      will fail.
      
      This patch solves this problem by testing if the two pages to merge
      belong to the same hugepage when attempting to merge them.  If so, the
      hugepage is split safely.  This means that the hugepage is not split if
      not necessary.
      
      Link: http://lkml.kernel.org/r/1521548069-24758-1-git-send-email-imbrenda@linux.vnet.ibm.comSigned-off-by: default avatarClaudio Imbrenda <imbrenda@linux.vnet.ibm.com>
      Co-authored-by: default avatarGerald Schaefer <gerald.schaefer@de.ibm.com>
      Reviewed-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      35de741a
    • Esben Haabendal's avatar
      dp83640: Ensure against premature access to PHY registers after reset · 1d4902ad
      Esben Haabendal authored
      [ Upstream commit 76327a35 ]
      
      The datasheet specifies a 3uS pause after performing a software
      reset. The default implementation of genphy_soft_reset() does not
      provide this, so implement soft_reset with the needed pause.
      Signed-off-by: default avatarEsben Haabendal <eha@deif.com>
      Reviewed-by: default avatarAndrew Lunn <andrew@lunn.ch>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      1d4902ad
    • Dave Carroll's avatar
      scsi: aacraid: Insure command thread is not recursively stopped · 48b7c436
      Dave Carroll authored
      [ Upstream commit 1c6b41fb ]
      
      If a recursive IOP_RESET is invoked, usually due to the eh_thread
      handling errors after the first reset, be sure we flag that the command
      thread has been stopped to avoid an Oops of the form;
      
       [ 336.620256] CPU: 28 PID: 1193 Comm: scsi_eh_0 Kdump: loaded Not tainted 4.14.0-49.el7a.ppc64le #1
       [ 336.620297] task: c000003fd630b800 task.stack: c000003fd61a4000
       [ 336.620326] NIP: c000000000176794 LR: c00000000013038c CTR: c00000000024bc10
       [ 336.620361] REGS: c000003fd61a7720 TRAP: 0300 Not tainted (4.14.0-49.el7a.ppc64le)
       [ 336.620395] MSR: 9000000000009033 <SF,HV,EE,ME,IR,DR,RI,LE> CR: 22084022 XER: 20040000
       [ 336.620435] CFAR: c000000000130388 DAR: 0000000000000000 DSISR: 40000000 SOFTE: 1
       [ 336.620435] GPR00: c00000000013038c c000003fd61a79a0 c0000000014c7e00 0000000000000000
       [ 336.620435] GPR04: 000000000000000c 000000000000000c 9000000000009033 0000000000000477
       [ 336.620435] GPR08: 0000000000000477 0000000000000000 0000000000000000 c008000010f7d940
       [ 336.620435] GPR12: c00000000024bc10 c000000007a33400 c0000000001708a8 c000003fe3b881d8
       [ 336.620435] GPR16: c000003fe3b88060 c000003fd61a7d10 fffffffffffff000 000000000000001e
       [ 336.620435] GPR20: 0000000000000001 c000000000ebf1a0 0000000000000001 c000003fe3b88000
       [ 336.620435] GPR24: 0000000000000003 0000000000000002 c000003fe3b88840 c000003fe3b887e8
       [ 336.620435] GPR28: c000003fe3b88000 c000003fc8181788 0000000000000000 c000003fc8181700
       [ 336.620750] NIP [c000000000176794] exit_creds+0x34/0x160
       [ 336.620775] LR [c00000000013038c] __put_task_struct+0x8c/0x1f0
       [ 336.620804] Call Trace:
       [ 336.620817] [c000003fd61a79a0] [c000003fe3b88000] 0xc000003fe3b88000 (unreliable)
       [ 336.620853] [c000003fd61a79d0] [c00000000013038c] __put_task_struct+0x8c/0x1f0
       [ 336.620889] [c000003fd61a7a00] [c000000000171418] kthread_stop+0x1e8/0x1f0
       [ 336.620922] [c000003fd61a7a40] [c008000010f7448c] aac_reset_adapter+0x14c/0x8d0 [aacraid]
       [ 336.620959] [c000003fd61a7b00] [c008000010f60174] aac_eh_host_reset+0x84/0x100 [aacraid]
       [ 336.621010] [c000003fd61a7b30] [c000000000864f24] scsi_try_host_reset+0x74/0x180
       [ 336.621046] [c000003fd61a7bb0] [c000000000867ac0] scsi_eh_ready_devs+0xc00/0x14d0
       [ 336.625165] [c000003fd61a7ca0] [c0000000008699e0] scsi_error_handler+0x550/0x730
       [ 336.632101] [c000003fd61a7dc0] [c000000000170a08] kthread+0x168/0x1b0
       [ 336.639031] [c000003fd61a7e30] [c00000000000b528] ret_from_kernel_thread+0x5c/0xb4
       [ 336.645971] Instruction dump:
       [ 336.648743] 384216a0 7c0802a6 fbe1fff8 f8010010 f821ffd1 7c7f1b78 60000000 60000000
       [ 336.657056] 39400000 e87f0838 f95f0838 7c0004ac <7d401828> 314affff 7d40192d 40c2fff4
       [ 336.663997] -[ end trace 4640cf8d4945ad95 ]-
      
      So flag when the thread is stopped by setting the thread pointer to NULL.
      Signed-off-by: default avatarDave Carroll <david.carroll@microsemi.com>
      Reviewed-by: default avatarRaghava Aditya Renukunta <raghavaaditya.renukunta@microsemi.com>
      Signed-off-by: default avatarMartin K. Petersen <martin.petersen@oracle.com>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      48b7c436
    • Shunyong Yang's avatar
      cpufreq: CPPC: Initialize shared perf capabilities of CPUs · 11f34297
      Shunyong Yang authored
      [ Upstream commit 8913315e ]
      
      When multiple CPUs are related in one cpufreq policy, the first online
      CPU will be chosen by default to handle cpufreq operations. Let's take
      cpu0 and cpu1 as an example.
      
      When cpu0 is offline, policy->cpu will be shifted to cpu1. cpu1's perf
      capabilities should be initialized. Otherwise, perf capabilities are 0s
      and speed change can not take effect.
      
      This patch copies perf capabilities of the first online CPU to other
      shared CPUs when policy shared type is CPUFREQ_SHARED_TYPE_ANY.
      Acked-by: default avatarViresh Kumar <viresh.kumar@linaro.org>
      Signed-off-by: default avatarShunyong Yang <shunyong.yang@hxt-semitech.com>
      Signed-off-by: default avatarRafael J. Wysocki <rafael.j.wysocki@intel.com>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      11f34297
    • Carlos Maiolino's avatar
      Force log to disk before reading the AGF during a fstrim · 72422ef8
      Carlos Maiolino authored
      [ Upstream commit 8c81dd46 ]
      
      Forcing the log to disk after reading the agf is wrong, we might be
      calling xfs_log_force with XFS_LOG_SYNC with a metadata lock held.
      
      This can cause a deadlock when racing a fstrim with a filesystem
      shutdown.
      
      The deadlock has been identified due a miscalculation bug in device-mapper
      dm-thin, which returns lack of space to its users earlier than the device itself
      really runs out of space, changing the device-mapper volume into an error state.
      
      The problem happened while filling the filesystem with a single file,
      triggering the bug in device-mapper, consequently causing an IO error
      and shutting down the filesystem.
      
      If such file is removed, and fstrim executed before the XFS finishes the
      shut down process, the fstrim process will end up holding the buffer
      lock, and going to sleep on the cil wait queue.
      
      At this point, the shut down process will try to wake up all the threads
      waiting on the cil wait queue, but for this, it will try to hold the
      same buffer log already held my the fstrim, locking up the filesystem.
      Signed-off-by: default avatarCarlos Maiolino <cmaiolino@redhat.com>
      Reviewed-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      72422ef8
    • Jens Axboe's avatar
      sr: get/drop reference to device in revalidate and check_events · 651a8964
      Jens Axboe authored
      [ Upstream commit 2d097c50 ]
      
      We can't just use scsi_cd() to get the scsi_cd structure, we have
      to grab a live reference to the device. For both callbacks, we're
      not inside an open where we already hold a reference to the device.
      
      This fixes device removal/addition under concurrent device access,
      which otherwise could result in the below oops.
      
      NULL pointer dereference at 0000000000000010
      PGD 0 P4D 0
      Oops: 0000 [#1] PREEMPT SMP
      Modules linked in:
      sr 12:0:0:0: [sr2] scsi-1 drive
       scsi_debug crc_t10dif crct10dif_generic crct10dif_common nvme nvme_core sb_edac xl
      sr 12:0:0:0: Attached scsi CD-ROM sr2
       sr_mod cdrom btrfs xor zstd_decompress zstd_compress xxhash lzo_compress zlib_defc
      sr 12:0:0:0: Attached scsi generic sg7 type 5
       igb ahci libahci i2c_algo_bit libata dca [last unloaded: crc_t10dif]
      CPU: 43 PID: 4629 Comm: systemd-udevd Not tainted 4.16.0+ #650
      Hardware name: Dell Inc. PowerEdge T630/0NT78X, BIOS 2.3.4 11/09/2016
      RIP: 0010:sr_block_revalidate_disk+0x23/0x190 [sr_mod]
      RSP: 0018:ffff883ff357bb58 EFLAGS: 00010292
      RAX: ffffffffa00b07d0 RBX: ffff883ff3058000 RCX: ffff883ff357bb66
      RDX: 0000000000000003 RSI: 0000000000007530 RDI: ffff881fea631000
      RBP: 0000000000000000 R08: ffff881fe4d38400 R09: 0000000000000000
      R10: 0000000000000000 R11: 00000000000001b6 R12: 000000000800005d
      R13: 000000000800005d R14: ffff883ffd9b3790 R15: 0000000000000000
      FS:  00007f7dc8e6d8c0(0000) GS:ffff883fff340000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 0000000000000010 CR3: 0000003ffda98005 CR4: 00000000003606e0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Call Trace:
       ? __invalidate_device+0x48/0x60
       check_disk_change+0x4c/0x60
       sr_block_open+0x16/0xd0 [sr_mod]
       __blkdev_get+0xb9/0x450
       ? iget5_locked+0x1c0/0x1e0
       blkdev_get+0x11e/0x320
       ? bdget+0x11d/0x150
       ? _raw_spin_unlock+0xa/0x20
       ? bd_acquire+0xc0/0xc0
       do_dentry_open+0x1b0/0x320
       ? inode_permission+0x24/0xc0
       path_openat+0x4e6/0x1420
       ? cpumask_any_but+0x1f/0x40
       ? flush_tlb_mm_range+0xa0/0x120
       do_filp_open+0x8c/0xf0
       ? __seccomp_filter+0x28/0x230
       ? _raw_spin_unlock+0xa/0x20
       ? __handle_mm_fault+0x7d6/0x9b0
       ? list_lru_add+0xa8/0xc0
       ? _raw_spin_unlock+0xa/0x20
       ? __alloc_fd+0xaf/0x160
       ? do_sys_open+0x1a6/0x230
       do_sys_open+0x1a6/0x230
       do_syscall_64+0x5a/0x100
       entry_SYSCALL_64_after_hwframe+0x3d/0xa2
      Reviewed-by: default avatarLee Duncan <lduncan@suse.com>
      Reviewed-by: default avatarJan Kara <jack@suse.cz>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      651a8964
    • Tom Abraham's avatar
      swap: divide-by-zero when zero length swap file on ssd · 69ccb50c
      Tom Abraham authored
      [ Upstream commit a06ad633 ]
      
      Calling swapon() on a zero length swap file on SSD can lead to a
      divide-by-zero.
      
      Although creating such files isn't possible with mkswap and they woud be
      considered invalid, it would be better for the swapon code to be more
      robust and handle this condition gracefully (return -EINVAL).
      Especially since the fix is small and straightforward.
      
      To help with wear leveling on SSD, the swapon syscall calculates a
      random position in the swap file using modulo p->highest_bit, which is
      set to maxpages - 1 in read_swap_header.
      
      If the swap file is zero length, read_swap_header sets maxpages=1 and
      last_page=0, resulting in p->highest_bit=0 and we divide-by-zero when we
      modulo p->highest_bit in swapon syscall.
      
      This can be prevented by having read_swap_header return zero if
      last_page is zero.
      
      Link: http://lkml.kernel.org/r/5AC747C1020000A7001FA82C@prv-mh.provo.novell.comSigned-off-by: default avatarThomas Abraham <tabraham@suse.com>
      Reported-by: <Mark.Landis@Teradata.com>
      Reviewed-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Cc: Randy Dunlap <rdunlap@infradead.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      69ccb50c
    • Danilo Krummrich's avatar
      fs/proc/proc_sysctl.c: fix potential page fault while unregistering sysctl table · ea8a1f4a
      Danilo Krummrich authored
      [ Upstream commit a0b0d1c3 ]
      
      proc_sys_link_fill_cache() does not take currently unregistering sysctl
      tables into account, which might result into a page fault in
      sysctl_follow_link() - add a check to fix it.
      
      This bug has been present since v3.4.
      
      Link: http://lkml.kernel.org/r/20180228013506.4915-1-danilokrummrich@dk-develop.de
      Fixes: 0e47c99d ("sysctl: Replace root_list with links between sysctl_table_sets")
      Signed-off-by: default avatarDanilo Krummrich <danilokrummrich@dk-develop.de>
      Acked-by: default avatarKees Cook <keescook@chromium.org>
      Reviewed-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Cc: "Luis R . Rodriguez" <mcgrof@kernel.org>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      ea8a1f4a
    • Joerg Roedel's avatar
      x86/pgtable: Don't set huge PUD/PMD on non-leaf entries · dd0b7b0c
      Joerg Roedel authored
      [ Upstream commit e3e28812 ]
      
      The pmd_set_huge() and pud_set_huge() functions are used from
      the generic ioremap() code to establish large mappings where this
      is possible.
      
      But the generic ioremap() code does not check whether the
      PMD/PUD entries are already populated with a non-leaf entry,
      so that any page-table pages these entries point to will be
      lost.
      
      Further, on x86-32 with SHARED_KERNEL_PMD=0, this causes a
      BUG_ON() in vmalloc_sync_one() when PMD entries are synced
      from swapper_pg_dir to the current page-table. This happens
      because the PMD entry from swapper_pg_dir was promoted to a
      huge-page entry while the current PGD still contains the
      non-leaf entry. Because both entries are present and point
      to a different page, the BUG_ON() triggers.
      
      This was actually triggered with pti-x32 enabled in a KVM
      virtual machine by the graphics driver.
      
      A real and better fix for that would be to improve the
      page-table handling in the generic ioremap() code. But that is
      out-of-scope for this patch-set and left for later work.
      Reported-by: default avatarDavid H. Gutteridge <dhgutteridge@sympatico.ca>
      Signed-off-by: default avatarJoerg Roedel <jroedel@suse.de>
      Reviewed-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: David Laight <David.Laight@aculab.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Eduardo Valentin <eduval@amazon.com>
      Cc: Greg KH <gregkh@linuxfoundation.org>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Pavel Machek <pavel@ucw.cz>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Waiman Long <llong@redhat.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: aliguori@amazon.com
      Cc: daniel.gruss@iaik.tugraz.at
      Cc: hughd@google.com
      Cc: keescook@google.com
      Cc: linux-mm@kvack.org
      Link: http://lkml.kernel.org/r/20180411152437.GC15462@8bytes.orgSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      dd0b7b0c
    • Rich Felker's avatar
      sh: fix debug trap failure to process signals before return to user · 8971f318
      Rich Felker authored
      [ Upstream commit 96a59899 ]
      
      When responding to a debug trap (breakpoint) in userspace, the
      kernel's trap handler raised SIGTRAP but returned from the trap via a
      code path that ignored pending signals, resulting in an infinite loop
      re-executing the trapping instruction.
      Signed-off-by: default avatarRich Felker <dalias@libc.org>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      8971f318