1. 11 Feb, 2020 40 commits
    • eventfd: track eventfd_signal() recursion depth · eaef83c4
      Jens Axboe authored
      commit b5e683d5 upstream.
      
      eventfd use cases from aio and io_uring can deadlock due to circular
      or recursive calling, when eventfd_signal() tries to grab the waitqueue
      lock. On top of that, it's also possible to construct notification
      chains that are deep enough that we could blow the stack.
      
      Add a percpu counter that tracks the percpu recursion depth, warn if we
      exceed it. The counter is also exposed so that users of eventfd_signal()
      can do the right thing if it's non-zero in the context where it is
      called.
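
      The recursion guard described above can be sketched in user space. This
      is only an illustration: a thread-local counter stands in for the per-CPU
      counter, and guarded_signal(), signal_in_progress() and the callback
      types are hypothetical names, not the kernel API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for the per-CPU counter: one depth value per thread. */
static _Thread_local unsigned int signal_depth;

/* Lets callers check whether they are already inside a signal path. */
static bool signal_in_progress(void)
{
    return signal_depth != 0;
}

typedef void (*notify_fn)(void *ctx);

/*
 * Run a notification callback, refusing to recurse.
 * Returns true if the callback ran, false if recursion was detected.
 */
static bool guarded_signal(notify_fn fn, void *ctx)
{
    assert(fn != NULL);
    if (signal_in_progress()) {
        fprintf(stderr, "warning: recursive signal suppressed\n");
        return false;
    }
    signal_depth++;
    fn(ctx);
    signal_depth--;
    return true;
}

/* Example bookkeeping for a notification chain. */
struct chain_stats {
    int ran;            /* callbacks that actually executed */
    int nested_refused; /* nested signals that were refused */
};

static void leaf_notify(void *ctx)
{
    ((struct chain_stats *)ctx)->ran++;
}

/* A callback that tries to signal again from inside the signal path. */
static void chained_notify(void *ctx)
{
    struct chain_stats *s = ctx;

    s->ran++;
    if (!guarded_signal(leaf_notify, s))
        s->nested_refused++;
}
```

      A caller that sees signal_in_progress() return true could defer its
      notification instead of issuing it inline, which is the kind of "right
      thing" the exposed counter makes possible.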
      
      Cc: stable@vger.kernel.org # 4.19+
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • bcache: add readahead cache policy options via sysfs interface · e608cd92
      Coly Li authored
      commit 038ba8cc upstream.
      
      In 2007, high-performance SSDs were still expensive. To save more
      space for real workloads and metadata, readahead I/Os for
      non-metadata were bypassed and not cached on SSD.
      
      Nowadays SSD prices have dropped a lot and larger SSDs are available
      at more affordable prices. It is no longer necessary to always bypass
      normal readahead I/Os to save SSD space.
      
      This patch adds options for readahead data cache policies via sysfs
      file /sys/block/bcache<N>/readahead_cache_policy, the options are,
      - "all": cache all readahead data I/Os.
      - "meta-only": only cache meta data, and bypass other regular I/Os.
      
      If users want bcache to continue caching readahead requests only for
      metadata and to bypass regular data readahead, they should write
      "meta-only" to this sysfs file. By default, bcache now goes back to
      caching all readahead requests.
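
      The two policy values can be illustrated with a small sketch. The enum
      and function names below are hypothetical and only model the decision
      described above; they are not the bcache implementation, and the parser
      does not handle the trailing newline that a sysfs write usually carries.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical model of the two readahead cache policies. */
enum readahead_policy {
    RA_POLICY_ALL,       /* cache all readahead data I/Os */
    RA_POLICY_META_ONLY  /* cache only metadata readahead */
};

/* Parse the policy string; returns 0 on success, -1 on an unknown value. */
static int parse_readahead_policy(const char *s, enum readahead_policy *out)
{
    assert(s != NULL && out != NULL);
    if (strcmp(s, "all") == 0) {
        *out = RA_POLICY_ALL;
        return 0;
    }
    if (strcmp(s, "meta-only") == 0) {
        *out = RA_POLICY_META_ONLY;
        return 0;
    }
    return -1;
}

/* Decide whether a readahead request should bypass the cache. */
static bool readahead_should_bypass(enum readahead_policy policy,
                                    bool is_metadata)
{
    if (policy == RA_POLICY_ALL)
        return false;
    /* meta-only: regular data readahead bypasses the cache. */
    return !is_metadata;
}
```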
      
      Cc: stable@vger.kernel.org
      Signed-off-by: Coly Li <colyli@suse.de>
      Acked-by: Eric Wheeler <bcache@linux.ewheeler.net>
      Cc: Michael Lyle <mlyle@lyle.org>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • watchdog: fix UAF in reboot notifier handling in watchdog core code · 1ca3742a
      Vladis Dronov authored
      commit 69503e58 upstream.
      
      After the commit 44ea3942 ("drivers/watchdog: make use of
      devm_register_reboot_notifier()") the struct notifier_block reboot_nb in
      the struct watchdog_device is removed from the reboot notifiers chain at
      the time watchdog's chardev is closed. But at least in i6300esb.c case
      reboot_nb is embedded in the struct esb_dev which can be freed on its
      device removal and before the chardev is closed, thus UAF at reboot:
      
      [    7.728581] esb_probe: esb_dev.watchdog_device ffff91316f91ab28
      ts# uname -r                            note the address ^^^
      5.5.0-rc5-ae6088-wdog
      ts# ./openwdog0 &
      [1] 696
      ts# opened /dev/watchdog0, sleeping 10s...
      ts# echo 1 > /sys/devices/pci0000\:00/0000\:00\:09.0/remove
      [  178.086079] devres:rel_nodes: dev ffff91317668a0b0 data ffff91316f91ab28
                 esb_dev.watchdog_device.reboot_nb memory is freed here ^^^
      ts# ...woken up
      [  181.459010] devres:rel_nodes: dev ffff913171781000 data ffff913174a1dae8
      [  181.460195] devm_unreg_reboot_notifier: res ffff913174a1dae8 nb ffff91316f91ab78
                                           attempt to use memory already freed ^^^
      [  181.461063] devm_unreg_reboot_notifier: nb->call 6b6b6b6b6b6b6b6b
      [  181.461243] devm_unreg_reboot_notifier: nb->next 6b6b6b6b6b6b6b6b
                      freed memory is filled with a slub poison ^^^
      [1]+  Done                    ./openwdog0
      ts# reboot
      [  229.921862] systemd-shutdown[1]: Rebooting.
      [  229.939265] notifier_call_chain: nb ffffffff9c6c2f20 nb->next ffffffff9c6d50c0
      [  229.943080] notifier_call_chain: nb ffffffff9c6d50c0 nb->next 6b6b6b6b6b6b6b6b
      [  229.946054] notifier_call_chain: nb 6b6b6b6b6b6b6b6b INVAL
      [  229.957584] general protection fault: 0000 [#1] SMP
      [  229.958770] CPU: 0 PID: 1 Comm: systemd-shutdow Not tainted 5.5.0-rc5-ae6088-wdog
      [  229.960224] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), ...
      [  229.963288] RIP: 0010:notifier_call_chain+0x66/0xd0
      [  229.969082] RSP: 0018:ffffb20dc0013d88 EFLAGS: 00010246
      [  229.970812] RAX: 000000000000002e RBX: 6b6b6b6b6b6b6b6b RCX: 00000000000008b3
      [  229.972929] RDX: 0000000000000000 RSI: 0000000000000096 RDI: ffffffff9ccc46ac
      [  229.975028] RBP: 0000000000000001 R08: 0000000000000000 R09: 00000000000008b3
      [  229.977039] R10: 0000000000000001 R11: ffffffff9c26c740 R12: 0000000000000000
      [  229.979155] R13: 6b6b6b6b6b6b6b6b R14: 0000000000000000 R15: 00000000fffffffa
      ...   slub_debug=FZP poison ^^^
      [  229.989089] Call Trace:
      [  229.990157]  blocking_notifier_call_chain+0x43/0x59
      [  229.991401]  kernel_restart_prepare+0x14/0x30
      [  229.992607]  kernel_restart+0x9/0x30
      [  229.993800]  __do_sys_reboot+0x1d2/0x210
      [  230.000149]  do_syscall_64+0x3d/0x130
      [  230.001277]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
      [  230.002639] RIP: 0033:0x7f5461bdd177
      [  230.016402] Modules linked in: i6300esb
      [  230.050261] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
      
      Fix the crash by reverting 44ea3942 so unregister_reboot_notifier()
      is called when watchdog device is removed. This also makes handling of
      the reboot notifier unified with the handling of the restart handler,
      which is freed with unregister_restart_handler() in the same place.
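
      The ordering requirement can be shown with a toy notifier chain. This is
      a simplified user-space sketch: struct notifier, the chain helpers and
      wdt_remove() are hypothetical names, not the kernel's notifier API. The
      point is only that a notifier embedded in a device structure has to be
      unlinked before that structure's memory is released.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical singly linked notifier chain. */
struct notifier {
    struct notifier *next;
};

static struct notifier *reboot_chain;

static void register_notifier(struct notifier *nb)
{
    assert(nb != NULL);
    nb->next = reboot_chain;
    reboot_chain = nb;
}

static void unregister_notifier(struct notifier *nb)
{
    struct notifier **pp = &reboot_chain;

    while (*pp) {
        if (*pp == nb) {
            *pp = nb->next;   /* unlink from the chain */
            nb->next = NULL;
            return;
        }
        pp = &(*pp)->next;
    }
}

static bool chain_contains(const struct notifier *nb)
{
    for (const struct notifier *p = reboot_chain; p; p = p->next)
        if (p == nb)
            return true;
    return false;
}

/* The notifier lives inside the device structure, as reboot_nb does. */
struct wdt_device {
    struct notifier reboot_nb;
};

/*
 * Device removal: unlink the embedded notifier first. Only after this
 * returns is it safe for the owner to free the device structure.
 */
static void wdt_remove(struct wdt_device *wdd)
{
    unregister_notifier(&wdd->reboot_nb);
}
```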
      
      Fixes: 44ea3942 ("drivers/watchdog: make use of devm_register_reboot_notifier()")
      Cc: stable@vger.kernel.org # v4.15+
      Signed-off-by: Vladis Dronov <vdronov@redhat.com>
      Reviewed-by: Guenter Roeck <linux@roeck-us.net>
      Link: https://lore.kernel.org/r/20200108125347.6067-1-vdronov@redhat.com
      Signed-off-by: Guenter Roeck <linux@roeck-us.net>
      Signed-off-by: Wim Van Sebroeck <wim@linux-watchdog.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • xen/balloon: Support xend-based toolstack take two · 9a69878d
      Juergen Gross authored
      commit eda4eabf upstream.
      
      Commit 3aa6c19d ("xen/balloon: Support xend-based toolstack")
      tried to fix a regression with running on rather ancient Xen versions.
      Unfortunately the fix was based on the assumption that xend would
      just use another Xenstore node, but in reality only some downstream
      versions of xend are doing that. The upstream xend does not write
      that Xenstore node at all, so the problem must be fixed in another
      way.
      
      The easiest way to achieve that is to fall back to the behavior
      before commit 96edd61d ("xen/balloon: don't online new memory
      initially") in case the static memory maximum can't be read.
      
      This is achieved by setting static_max to the current number of
      memory pages known by the system resulting in target_diff becoming
      zero.
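
      The fallback arithmetic can be sketched as follows. The helper names and
      parameters are hypothetical, and the real driver works on its own
      balloon statistics; this only shows why falling back to the current page
      count makes the difference zero.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical helper: pick the static maximum, in pages. If the
 * toolstack did not provide a readable value, fall back to the number
 * of pages the system currently knows about.
 */
static unsigned long effective_static_max(bool static_max_readable,
                                          unsigned long static_max_pages,
                                          unsigned long current_pages)
{
    return static_max_readable ? static_max_pages : current_pages;
}

/* Difference between the static maximum and the current page count. */
static unsigned long target_diff(unsigned long static_max,
                                 unsigned long current_pages)
{
    /* Assumption of this sketch: the maximum is never below current. */
    assert(static_max >= current_pages);
    return static_max - current_pages;
}
```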
      
      Fixes: 3aa6c19d ("xen/balloon: Support xend-based toolstack")
      Signed-off-by: Juergen Gross <jgross@suse.com>
      Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: <stable@vger.kernel.org> # 4.13
      Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • tools/kvm_stat: Fix kvm_exit filter name · d85e2964
      Gavin Shan authored
      commit 5fcf3a55 upstream.
      
      The filter name is fixed to "exit_reason" for some kvm_exit events, no
      matter what architecture we have. Actually, the filter name ("exit_reason")
      is only applicable to x86, meaning it's broken on other architectures
      including aarch64.
      
      This fixes the issue by providing various kvm_exit filter names, depending
      on the architecture we're on. Afterwards, the variable filter name is picked
      and applied through ioctl(fd, SET_FILTER).
      Reported-by: Andrew Jones <drjones@redhat.com>
      Signed-off-by: Gavin Shan <gshan@redhat.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • media: rc: ensure lirc is initialized before registering input device · 3696eddd
      Sean Young authored
      commit 080d89f5 upstream.
      
      Once rc_open is called on the input device, lirc events can be delivered.
      Ensure lirc is ready to do so else we might get this:
      
      Registered IR keymap rc-hauppauge
      rc rc0: Hauppauge WinTV PVR-350 as
      /devices/pci0000:00/0000:00:1e.0/0000:04:00.0/i2c-0/0-0018/rc/rc0
      input: Hauppauge WinTV PVR-350 as
      /devices/pci0000:00/0000:00:1e.0/0000:04:00.0/i2c-0/0-0018/rc/rc0/input9
      BUG: kernel NULL pointer dereference, address: 0000000000000038
      PGD 0 P4D 0
      Oops: 0000 [#1] SMP PTI
      CPU: 1 PID: 17 Comm: kworker/1:0 Not tainted 5.3.11-300.fc31.x86_64 #1
      Hardware name:  /DG43NB, BIOS NBG4310H.86A.0096.2009.0903.1845 09/03/2009
      Workqueue: events ir_work [ir_kbd_i2c]
      RIP: 0010:ir_lirc_scancode_event+0x3d/0xb0
      Code: a6 b4 07 00 00 49 81 c6 b8 07 00 00 55 53 e8 ba a7 9d ff 4c 89
      e7 49 89 45 00 e8 5e 7a 25 00 49 8b 1e 48 89 c5 4c 39 f3 74 58 <8b> 43
      38 8b 53 40 89 c1 2b 4b 3c 39 ca 72 41 21 d0 49 8b 7d 00 49
      RSP: 0018:ffffaae2000b3d88 EFLAGS: 00010017
      RAX: 0000000000000002 RBX: 0000000000000000 RCX: 0000000000000019
      RDX: 0000000000000001 RSI: 006e801b1f26ce6a RDI: ffff9e39797c37b4
      RBP: 0000000000000002 R08: 0000000000000001 R09: 0000000000000001
      R10: 0000000000000001 R11: 0000000000000001 R12: ffff9e39797c37b4
      R13: ffffaae2000b3db8 R14: ffff9e39797c37b8 R15: ffff9e39797c33d8
      FS:  0000000000000000(0000) GS:ffff9e397b680000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 0000000000000038 CR3: 0000000035844000 CR4: 00000000000006e0
      Call Trace:
      ir_do_keydown+0x8e/0x2b0
      rc_keydown+0x52/0xc0
      ir_work+0xb8/0x130 [ir_kbd_i2c]
      process_one_work+0x19d/0x340
      worker_thread+0x50/0x3b0
      kthread+0xfb/0x130
      ? process_one_work+0x340/0x340
      ? kthread_park+0x80/0x80
      ret_from_fork+0x35/0x40
      Modules linked in: rc_hauppauge tuner msp3400 saa7127 saa7115 ivtv(+)
      tveeprom cx2341x v4l2_common videodev mc i2c_algo_bit ir_kbd_i2c
      ip_tables firewire_ohci e1000e serio_raw firewire_core ata_generic
      crc_itu_t pata_acpi pata_jmicron fuse
      CR2: 0000000000000038
      ---[ end trace c67c2697a99fa74b ]---
      RIP: 0010:ir_lirc_scancode_event+0x3d/0xb0
      Code: a6 b4 07 00 00 49 81 c6 b8 07 00 00 55 53 e8 ba a7 9d ff 4c 89
      e7 49 89 45 00 e8 5e 7a 25 00 49 8b 1e 48 89 c5 4c 39 f3 74 58 <8b> 43
      38 8b 53 40 89 c1 2b 4b 3c 39 ca 72 41 21 d0 49 8b 7d 00 49
      RSP: 0018:ffffaae2000b3d88 EFLAGS: 00010017
      RAX: 0000000000000002 RBX: 0000000000000000 RCX: 0000000000000019
      RDX: 0000000000000001 RSI: 006e801b1f26ce6a RDI: ffff9e39797c37b4
      RBP: 0000000000000002 R08: 0000000000000001 R09: 0000000000000001
      R10: 0000000000000001 R11: 0000000000000001 R12: ffff9e39797c37b4
      R13: ffffaae2000b3db8 R14: ffff9e39797c37b8 R15: ffff9e39797c33d8
      FS:  0000000000000000(0000) GS:ffff9e397b680000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 0000000000000038 CR3: 0000000035844000 CR4: 00000000000006e0
      rc rc0: lirc_dev: driver ir_kbd_i2c registered at minor = 0, scancode
      receiver, no transmitter
      tuner-simple 0-0061: creating new instance
      tuner-simple 0-0061: type set to 2 (Philips NTSC (FI1236,FM1236 and
      compatibles))
      ivtv0: Registered device video0 for encoder MPG (4096 kB)
      ivtv0: Registered device video32 for encoder YUV (2048 kB)
      ivtv0: Registered device vbi0 for encoder VBI (1024 kB)
      ivtv0: Registered device video24 for encoder PCM (320 kB)
      ivtv0: Registered device radio0 for encoder radio
      ivtv0: Registered device video16 for decoder MPG (1024 kB)
      ivtv0: Registered device vbi8 for decoder VBI (64 kB)
      ivtv0: Registered device vbi16 for decoder VOUT
      ivtv0: Registered device video48 for decoder YUV (1024 kB)
      
      Cc: stable@vger.kernel.org
      Tested-by: Nick French <nickfrench@gmail.com>
      Reported-by: Nick French <nickfrench@gmail.com>
      Signed-off-by: Sean Young <sean@mess.org>
      Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • drm/rect: Avoid division by zero · f2c1ddb8
      Ville Syrjälä authored
      commit 433480c1 upstream.
      
      Check for zero width/height destination rectangle in
      drm_rect_clip_scaled() to avoid a division by zero.
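
      A guard of this kind might look like the sketch below. The function name
      and the simplified rounding are assumptions made for illustration; the
      real drm_rect_clip_scaled() helper handles rounding differently.

```c
#include <stdint.h>

/*
 * Hypothetical simplified helper: return the source size that remains
 * after clipping `clip` units off a destination of size `dst`, scaled
 * as src * (dst - clip) / dst.
 */
static uint32_t clip_scaled_sketch(uint32_t src, uint32_t dst, uint32_t clip)
{
    /* A zero-sized destination would otherwise divide by zero. */
    if (dst == 0)
        return 0;

    /* Never clip more than the destination size. */
    if (clip > dst)
        clip = dst;

    return (uint32_t)(((uint64_t)src * (dst - clip)) / dst);
}
```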
      
      Cc: stable@vger.kernel.org
      Fixes: f96bdf56 ("drm/rect: Handle rounding errors in drm_rect_clip_scaled, v3.")
      Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
      Cc: Benjamin Gaignard <benjamin.gaignard@st.com>
      Cc: Daniel Vetter <daniel@ffwll.ch>
      Testcase: igt/kms_selftest/drm_rect_clip_scaled_div_by_zero
      Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20191122175623.13565-2-ville.syrjala@linux.intel.com
      Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
      Reviewed-by: Benjamin Gaignard <benjamin.gaignard@st.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • gfs2: fix O_SYNC write handling · 4b67a516
      Andreas Gruenbacher authored
      commit 6e5e41e2 upstream.
      
      In gfs2_file_write_iter, for direct writes, the error checking in the buffered
      write fallback case is incomplete.  This can cause inode write errors to go
      undetected.  Fix and clean up gfs2_file_write_iter along the way.
      
      Based on a proposed fix by Christoph Hellwig <hch@lst.de>.
      
      Fixes: 967bcc91 ("gfs2: iomap direct I/O support")
      Cc: stable@vger.kernel.org # v4.19+
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • gfs2: move setting current->backing_dev_info · e57e77e9
      Christoph Hellwig authored
      commit 4c0e8dda upstream.
      
      Set current->backing_dev_info just around the buffered write calls to
      prepare for the next fix.
      
      Fixes: 967bcc91 ("gfs2: iomap direct I/O support")
      Cc: stable@vger.kernel.org # v4.19+
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • sunrpc: expiry_time should be seconds not timeval · a90c2c5e
      Roberto Bergantinos Corpas authored
      commit 3d96208c upstream.
      
      When upcalling gssproxy, cache_head.expiry_time is set as a
      timeval, not seconds since boot. As such, RPC cache expiry
      logic will not clean expired objects created under
      auth.rpcsec.context cache.
      
      This has proven to cause kernel memory leaks in the field. Fix this by
      using the 64-bit variants of getboottime/timespec.
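
      The unit mismatch can be illustrated with plain integers. The function
      names are hypothetical and the values in the usage note are made up; the
      sketch only shows why a wall-clock value stored in a field that is
      compared against seconds since boot never appears to expire.

```c
#include <stdbool.h>
#include <stdint.h>

/* Convert a wall-clock expiry (seconds since the epoch) to seconds since boot. */
static int64_t expiry_since_boot(int64_t expiry_wallclock_sec,
                                 int64_t boot_wallclock_sec)
{
    return expiry_wallclock_sec - boot_wallclock_sec;
}

/* An entry is expired once its boot-relative expiry lies in the past. */
static bool cache_entry_expired(int64_t expiry_boot_sec, int64_t now_boot_sec)
{
    return expiry_boot_sec < now_boot_sec;
}
```

      With a boot time of 1600000000 and an expiry one hour later, the
      converted value is 3600 and the entry is expired at an uptime of 7200
      seconds. Storing the raw wall-clock value 1600003600 instead would keep
      the entry alive for decades of uptime.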
      
      Expiration times have worked this way since 2010's c5b29f88 "sunrpc:
      use seconds since boot in expiry cache".  The gssproxy code introduced
      in 2012 added gss_proxy_save_rsc and introduced the bug.  That's a while
      for this to lurk, but it required a bit of an extreme case to make it
      obvious.
      Signed-off-by: Roberto Bergantinos Corpas <rbergant@redhat.com>
      Cc: stable@vger.kernel.org
      Fixes: 030d794b "SUNRPC: Use gssproxy upcall for server..."
      Tested-By: Frank Sorenson <sorenson@redhat.com>
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • mwifiex: fix unbalanced locking in mwifiex_process_country_ie() · eab22172
      Brian Norris authored
      commit 65b1aae0 upstream.
      
      We called rcu_read_lock(), so we need to call rcu_read_unlock() before
      we return.
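
      The pattern behind this kind of fix is a single exit path that always
      drops the lock. The sketch below uses a plain counter as a stand-in for
      the RCU read-side critical section; the mock lock helpers, the length
      limit and the function name are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an RCU read-side critical section. */
static int read_lock_depth;

static void mock_read_lock(void)
{
    read_lock_depth++;
}

static void mock_read_unlock(void)
{
    assert(read_lock_depth > 0); /* catches an unbalanced unlock */
    read_lock_depth--;
}

#define MAX_COUNTRY_IE_LEN 6 /* hypothetical limit for this sketch */

/*
 * Returns 0 on success, -1 if the element is missing or too long.
 * Every path, including the early error exits, releases the lock.
 */
static int process_country_ie_sketch(const unsigned char *ie, size_t len)
{
    int ret = 0;

    mock_read_lock();

    if (ie == NULL) {
        ret = -1;
        goto out;
    }
    if (len > MAX_COUNTRY_IE_LEN) {
        ret = -1;
        goto out;
    }
    /* The element would be consumed here while the lock is held. */

out:
    mock_read_unlock();
    return ret;
}
```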
      
      Fixes: 3d94a4a8 ("mwifiex: fix possible heap overflow in mwifiex_process_country_ie()")
      Cc: stable@vger.kernel.org
      Cc: huangwen <huangwenabc@gmail.com>
      Cc: Ganapathi Bhat <ganapathi.bhat@nxp.com>
      Signed-off-by: Brian Norris <briannorris@chromium.org>
      Acked-by: Ganapathi Bhat <ganapathi.bhat@nxp.com>
      Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • iwlwifi: don't throw error when trying to remove IGTK · d07acc5e
      Luca Coelho authored
      commit 197288d5 upstream.
      
      The IGTK keys are only removed by mac80211 after it has already
      removed the AP station.  This causes the driver to throw an error
      because mac80211 is trying to remove the IGTK when the station doesn't
      exist anymore.
      
      The firmware is aware that the station has been removed and can deal
      with it the next time we try to add an IGTK for a station, so we
      shouldn't try to remove the key if the station ID is
      IWL_MVM_INVALID_STA.  Do this by removing the check for mvm_sta before
      calling iwl_mvm_send_sta_igtk() and check return from that function
      gracefully if the station ID is invalid.
      
      Cc: stable@vger.kernel.org # 4.12+
      Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
      Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • ARM: tegra: Enable PLLP bypass during Tegra124 LP1 · 8ca9b9f3
      Stephen Warren authored
      commit 1a3388d5 upstream.
      
      For a little over a year, U-Boot has configured the flow controller to
      perform automatic RAM re-repair on off->on power transitions of the CPU
      rail[1]. This is mandatory for correct operation of Tegra124. However,
      RAM re-repair relies on certain clocks, which the kernel must enable and
      leave running. PLLP is one of those clocks. This clock is shut down
      during LP1 in order to save power. Enable bypass (which I believe routes
      osc_div_clk, essentially the crystal clock, to the PLL output) so that
      this clock signal toggles even though the PLL is not active. This is
      required so that LP1 power mode (system suspend) operates correctly.
      
      The bypass configuration must then be undone when resuming from LP1, so
      that all peripheral clocks run at the expected rate. Without this, many
      peripherals won't work correctly; for example, the UART baud rate would
      be incorrect.
      
      NVIDIA's downstream kernel code only does this if not compiled for
      Tegra30, so the added code is made conditional upon the chip ID.
      NVIDIA's downstream code makes this change conditional upon the active
      CPU cluster. The upstream kernel currently doesn't support cluster
      switching, so this patch doesn't test the active CPU cluster ID.
      
      [1] 3cc7942a4ae5 ARM: tegra: implement RAM repair
      Reported-by: Jonathan Hunter <jonathanh@nvidia.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Stephen Warren <swarren@nvidia.com>
      Signed-off-by: Thierry Reding <treding@nvidia.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Btrfs: fix race between adding and putting tree mod seq elements and nodes · 18d07e43
      Filipe Manana authored
      commit 7227ff4d upstream.
      
      There is a race between adding and removing elements to the tree mod log
      list and rbtree that can lead to use-after-free problems.
      
      Consider the following example that explains how/why the problem happens:
      
      1) Task A has mod log element with sequence number 200. It currently is
         the only element in the mod log list;
      
      2) Task A calls btrfs_put_tree_mod_seq() because it no longer needs to
         access the tree mod log. When it enters the function, it initializes
         'min_seq' to (u64)-1. Then it acquires the lock 'tree_mod_seq_lock'
         before checking if there are other elements in the mod seq list.
         Since the list is empty, 'min_seq' remains set to (u64)-1. Then it
         unlocks the lock 'tree_mod_seq_lock';
      
      3) Before task A acquires the lock 'tree_mod_log_lock', task B adds
         itself to the mod seq list through btrfs_get_tree_mod_seq() and gets a
         sequence number of 201;
      
      4) Some other task, name it task C, modifies a btree and because there are
         elements in the mod seq list, it adds a tree mod elem to the tree
         mod log rbtree. That node added to the mod log rbtree is assigned
         a sequence number of 202;
      
      5) Task B, which is doing fiemap and resolving indirect back references,
         calls btrfs get_old_root(), with 'time_seq' == 201, which in turn
         calls tree_mod_log_search() - the search returns the mod log node
         from the rbtree with sequence number 202, created by task C;
      
      6) Task A now acquires the lock 'tree_mod_log_lock', starts iterating
         the mod log rbtree and finds the node with sequence number 202. Since
         202 is less than the previously computed 'min_seq', (u64)-1, it
         removes the node and frees it;
      
      7) Task B still has a pointer to the node with sequence number 202, and
         it dereferences the pointer itself and through the call to
         __tree_mod_log_rewind(), resulting in a use-after-free problem.
      
      This issue can be triggered sporadically with the test case generic/561
      from fstests, and it happens more frequently with a higher number of
      duperemove processes. When it happens to me, it either freezes the VM or
      it produces a trace like the following before crashing:
      
        [ 1245.321140] general protection fault: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC PTI
        [ 1245.321200] CPU: 1 PID: 26997 Comm: pool Not tainted 5.5.0-rc6-btrfs-next-52 #1
        [ 1245.321235] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-0-ga698c8995f-prebuilt.qemu.org 04/01/2014
        [ 1245.321287] RIP: 0010:rb_next+0x16/0x50
        [ 1245.321307] Code: ....
        [ 1245.321372] RSP: 0018:ffffa151c4d039b0 EFLAGS: 00010202
        [ 1245.321388] RAX: 6b6b6b6b6b6b6b6b RBX: ffff8ae221363c80 RCX: 6b6b6b6b6b6b6b6b
        [ 1245.321409] RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffff8ae221363c80
        [ 1245.321439] RBP: ffff8ae20fcc4688 R08: 0000000000000002 R09: 0000000000000000
        [ 1245.321475] R10: ffff8ae20b120910 R11: 00000000243f8bb1 R12: 0000000000000038
        [ 1245.321506] R13: ffff8ae221363c80 R14: 000000000000075f R15: ffff8ae223f762b8
        [ 1245.321539] FS:  00007fdee1ec7700(0000) GS:ffff8ae236c80000(0000) knlGS:0000000000000000
        [ 1245.321591] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        [ 1245.321614] CR2: 00007fded4030c48 CR3: 000000021da16003 CR4: 00000000003606e0
        [ 1245.321642] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
        [ 1245.321668] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
        [ 1245.321706] Call Trace:
        [ 1245.321798]  __tree_mod_log_rewind+0xbf/0x280 [btrfs]
        [ 1245.321841]  btrfs_search_old_slot+0x105/0xd00 [btrfs]
        [ 1245.321877]  resolve_indirect_refs+0x1eb/0xc60 [btrfs]
        [ 1245.321912]  find_parent_nodes+0x3dc/0x11b0 [btrfs]
        [ 1245.321947]  btrfs_check_shared+0x115/0x1c0 [btrfs]
        [ 1245.321980]  ? extent_fiemap+0x59d/0x6d0 [btrfs]
        [ 1245.322029]  extent_fiemap+0x59d/0x6d0 [btrfs]
        [ 1245.322066]  do_vfs_ioctl+0x45a/0x750
        [ 1245.322081]  ksys_ioctl+0x70/0x80
        [ 1245.322092]  ? trace_hardirqs_off_thunk+0x1a/0x1c
        [ 1245.322113]  __x64_sys_ioctl+0x16/0x20
        [ 1245.322126]  do_syscall_64+0x5c/0x280
        [ 1245.322139]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
        [ 1245.322155] RIP: 0033:0x7fdee3942dd7
        [ 1245.322177] Code: ....
        [ 1245.322258] RSP: 002b:00007fdee1ec6c88 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
        [ 1245.322294] RAX: ffffffffffffffda RBX: 00007fded40210d8 RCX: 00007fdee3942dd7
        [ 1245.322314] RDX: 00007fded40210d8 RSI: 00000000c020660b RDI: 0000000000000004
        [ 1245.322337] RBP: 0000562aa89e7510 R08: 0000000000000000 R09: 00007fdee1ec6d44
        [ 1245.322369] R10: 0000000000000073 R11: 0000000000000246 R12: 00007fdee1ec6d48
        [ 1245.322390] R13: 00007fdee1ec6d40 R14: 00007fded40210d0 R15: 00007fdee1ec6d50
        [ 1245.322423] Modules linked in: ....
        [ 1245.323443] ---[ end trace 01de1e9ec5dff3cd ]---
      
      Fix this by ensuring that btrfs_put_tree_mod_seq() computes the minimum
      sequence number and iterates the rbtree while holding the lock
      'tree_mod_log_lock' in write mode. Also get rid of the 'tree_mod_seq_lock'
      lock, since it is now redundant.
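
      The idea of the fix, doing the minimum computation and the pruning in
      one critical section, can be sketched with fixed-size arrays. This is a
      single-threaded model with hypothetical names: a boolean stands in for
      'tree_mod_log_lock' and arrays replace the list and the rbtree.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ITEMS 8

/* Hypothetical model of the tree mod log state. */
struct mod_log {
    bool locked;                     /* stand-in for tree_mod_log_lock */
    uint64_t active_seq[MAX_ITEMS];  /* sequence numbers held by tasks */
    size_t n_active;
    uint64_t node_seq[MAX_ITEMS];    /* sequence numbers of log nodes */
    size_t n_nodes;
};

static void log_lock(struct mod_log *l)
{
    assert(!l->locked);
    l->locked = true;
}

static void log_unlock(struct mod_log *l)
{
    assert(l->locked);
    l->locked = false;
}

/*
 * Drop `seq` from the active list, then prune every node older than the
 * smallest remaining active sequence number. Both steps run under the
 * same lock, so no new element can appear between them.
 */
static void put_mod_seq(struct mod_log *l, uint64_t seq)
{
    uint64_t min_seq = UINT64_MAX;
    size_t i, kept = 0;

    log_lock(l);

    for (i = 0; i < l->n_active; i++)
        if (l->active_seq[i] != seq)
            l->active_seq[kept++] = l->active_seq[i];
    l->n_active = kept;

    for (i = 0; i < l->n_active; i++)
        if (l->active_seq[i] < min_seq)
            min_seq = l->active_seq[i];

    kept = 0;
    for (i = 0; i < l->n_nodes; i++)
        if (l->node_seq[i] >= min_seq)
            l->node_seq[kept++] = l->node_seq[i];
    l->n_nodes = kept;

    log_unlock(l);
}
```

      In the scenario from the example, task B's element 201 is already on
      the list when task A puts 200, so the minimum is 201 and the node with
      sequence number 202 survives until task B puts its own element.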
      
      Fixes: bd989ba3 ("Btrfs: add tree modification log functions")
      Fixes: 097b8a7c ("Btrfs: join tree mod log code with the code holding back delayed refs")
      CC: stable@vger.kernel.org # 4.4+
      Reviewed-by: Josef Bacik <josef@toxicpanda.com>
      Reviewed-by: Nikolay Borisov <nborisov@suse.com>
      Signed-off-by: Filipe Manana <fdmanana@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • btrfs: set trans->dirty in btrfs_commit_transaction · 19ddbec7
      Josef Bacik authored
      commit d62b23c9 upstream.
      
      If we abort a transaction we have the following sequence
      
      if (!trans->dirty && list_empty(&trans->new_bgs))
      	return;
      WRITE_ONCE(trans->transaction->aborted, err);
      
      The idea being if we didn't modify anything with our trans handle then
      we don't really need to abort the whole transaction, maybe the other
      trans handles are fine and we can carry on.
      
      However in the case of create_snapshot we add a pending_snapshot object
      to our transaction and then commit the transaction.  We don't actually
      modify anything.  sync() behaves the same way, attach to an existing
      transaction and commit it.  This means that if we have an IO error in
      the right places we could abort the committing transaction with our
      trans->dirty being not set and thus not set transaction->aborted.
      
      This is a problem because in the create_snapshot() case we depend on
      pending->error being set to something, or btrfs_commit_transaction
      returning an error.
      
      If we are not the trans handle that gets to commit the transaction, and
      we're waiting on the commit to happen we get our return value from
      cur_trans->aborted.  If this was not set to anything because sync() hit
      an error in the transaction commit before it could modify anything then
      cur_trans->aborted would be 0.  Thus we'd return 0 from
      btrfs_commit_transaction() in create_snapshot.
      
      This is a problem because we then try to do things with
      pending_snapshot->snap, which will be NULL because we didn't create the
      snapshot, and then we'll get a NULL pointer dereference like the
      following
      
      "BUG: kernel NULL pointer dereference, address: 00000000000001f0"
      RIP: 0010:btrfs_orphan_cleanup+0x2d/0x330
      Call Trace:
       ? btrfs_mksubvol.isra.31+0x3f2/0x510
       btrfs_mksubvol.isra.31+0x4bc/0x510
       ? __sb_start_write+0xfa/0x200
       ? mnt_want_write_file+0x24/0x50
       btrfs_ioctl_snap_create_transid+0x16c/0x1a0
       btrfs_ioctl_snap_create_v2+0x11e/0x1a0
       btrfs_ioctl+0x1534/0x2c10
       ? free_debug_processing+0x262/0x2a3
       do_vfs_ioctl+0xa6/0x6b0
       ? do_sys_open+0x188/0x220
       ? syscall_trace_enter+0x1f8/0x330
       ksys_ioctl+0x60/0x90
       __x64_sys_ioctl+0x16/0x20
       do_syscall_64+0x4a/0x1b0
      
      In order to fix this we need to make sure anybody who calls
      commit_transaction has trans->dirty set so that they properly set the
      trans->transaction->aborted value, so any waiters know bad
      things happened.
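
      The effect of the dirty flag on the abort path can be modelled in a few
      lines. The structures and functions below are hypothetical reductions of
      the real ones, kept only to show why a clean handle leaves 'aborted' at
      zero and why marking the handle dirty at commit time changes that.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical reductions of the transaction structures. */
struct transaction {
    int aborted;  /* 0 means waiters will see success */
};

struct trans_handle {
    bool dirty;
    bool has_new_bgs;  /* stand-in for !list_empty(&trans->new_bgs) */
    struct transaction *transaction;
};

/* Mirrors the abort sequence quoted in the commit message. */
static void abort_transaction(struct trans_handle *t, int err)
{
    assert(t != NULL && t->transaction != NULL);
    if (!t->dirty && !t->has_new_bgs)
        return;
    t->transaction->aborted = err;
}

/*
 * Sketch of the fix: the committer marks its handle dirty up front, so a
 * failure during commit is always recorded for waiters.
 */
static int commit_transaction(struct trans_handle *t, int io_error)
{
    t->dirty = true;
    if (io_error) {
        abort_transaction(t, io_error);
        return io_error;
    }
    return 0;
}
```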
      
      This was found while I was running generic/475 with my modified
      fsstress, it reproduced within a few runs.  I ran with this patch all
      night and didn't see the problem again.
      
      CC: stable@vger.kernel.org # 4.4+
      Signed-off-by: Josef Bacik <josef@toxicpanda.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Btrfs: fix missing hole after hole punching and fsync when using NO_HOLES · 587292a1
      Filipe Manana authored
      commit 0e56315c upstream.
      
      When using the NO_HOLES feature, if we punch a hole into a file and then
      fsync it, there are cases where a subsequent fsync will miss the fact that
      a hole was punched, resulting in the holes not existing after replaying
      the log tree.
      
      Essentially these cases all imply that tree-log.c:copy_items() is not
      invoked for the leafs that delimit holes, because nothing changed those
      leafs in the current transaction. And it's precisely in copy_items() that
      we currently detect and log holes, which works as long as the holes are
      between file extent items in the input leaf or between the beginning of
      input leaf and the previous leaf or between the last item in the leaf
      and the next leaf.
      
      First example where we miss a hole:
      
        *) The extent items of the inode span multiple leafs;
      
        *) The punched hole covers a range that affects only the extent items of
           the first leaf;
      
        *) The fsync operation is done in full mode (BTRFS_INODE_NEEDS_FULL_SYNC
           is set in the inode's runtime flags).
      
        That results in the hole not existing after replaying the log tree.
      
        For example, if the fs/subvolume tree has the following layout for a
        particular inode:
      
            Leaf N, generation 10:
      
            [ ... INODE_ITEM INODE_REF EXTENT_ITEM (0 64K) EXTENT_ITEM (64K 128K) ]
      
            Leaf N + 1, generation 10:
      
            [ EXTENT_ITEM (128K 64K) ... ]
      
        If at transaction 11 we punch a hole covering the range [0, 128K[, we end
        up dropping the two extent items from leaf N, but we don't touch the other
        leaf, so we end up in the following state:
      
            Leaf N, generation 11:
      
            [ ... INODE_ITEM INODE_REF ]
      
            Leaf N + 1, generation 10:
      
            [ EXTENT_ITEM (128K 64K) ... ]
      
        A full fsync after punching the hole will only process leaf N because it
        was modified in the current transaction, but not leaf N + 1, since it
        was not modified in the current transaction (generation 10 and not 11).
        As a result the fsync will not log any holes, because it didn't process
        any leaf with extent items.
      
      Second example where we will miss a hole:
      
        *) An inode has its items spanning 5 (or more) leafs;
      
        *) A hole is punched and it covers only the extent items of the 3rd
           leaf. This results in deleting the entire leaf and not touching any
           of the other leafs.
      
        So the only leaf that is modified in the current transaction, when
        punching the hole, is the first leaf, which contains the inode item.
        During the full fsync, the only leaf that is passed to copy_items()
        is that first leaf, and that's not enough for the hole detection
        code in copy_items() to determine there's a hole between the last
        file extent item in the 2nd leaf and the first file extent item in
        the 3rd leaf (which was the 4th leaf before punching the hole).
      
      Fix this by scanning all leafs and punching holes as necessary when doing a
      full fsync (less common than a non-full fsync) when the NO_HOLES feature
      is enabled. The lack of explicit file extent items to mark holes makes it
      necessary to scan existing extents to determine if holes exist.
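
The scan described above can be pictured with a small userspace sketch. This is not the tree-log.c code: `find_holes` is a hypothetical helper, and it assumes extents are given as (offset, length) pairs in bytes. Any gap between the end of one extent and the start of the next, or before the end of the file, is reported as a hole.

```python
def find_holes(extents, file_size):
    """Return (offset, length) ranges of [0, file_size) not covered by an extent.

    `extents` is an iterable of (offset, length) pairs. This is an assumed
    representation for illustration, not the on-disk btrfs item format.
    """
    holes = []
    expected = 0  # end offset of the data seen so far
    for offset, length in sorted(extents):
        if offset > expected:
            # Nothing describes [expected, offset): with NO_HOLES this is a hole.
            holes.append((expected, offset - expected))
        expected = max(expected, offset + length)
    if expected < file_size:
        holes.append((expected, file_size - expected))
    return holes


K = 1024
# After punching [0, 128K), assume one extent remains: offset 128K, length 64K.
print(find_holes([(128 * K, 64 * K)], 192 * K))  # [(0, 131072)]
```

The point of the sketch is that the hole is only visible when every remaining extent is examined, including those in leafs that the current transaction did not modify.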
      
      A test case for fstests follows soon.
      
      Fixes: 16e7549f ("Btrfs: incompatible format change to remove hole extents")
      CC: stable@vger.kernel.org # 4.4+
      Reviewed-by: Josef Bacik <josef@toxicpanda.com>
      Signed-off-by: Filipe Manana <fdmanana@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      587292a1
    • Vasily Averin's avatar
      jbd2_seq_info_next should increase position index · 68e81e14
      Vasily Averin authored
      commit 1a8e9cf4 upstream.
      
      If the seq_file .next function does not change the position index,
      a read after some lseek can generate unexpected output.
      
      The script below generates endless output:
       $ q=;while read -r r;do echo "$((++q)) $r";done </proc/fs/jbd2/DEV/info
      
      https://bugzilla.kernel.org/show_bug.cgi?id=206283
      
      Fixes: 1f4aace6 ("fs/seq_file.c: simplify seq_file iteration code and interface")
      Cc: stable@kernel.org
      Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Link: https://lore.kernel.org/r/d13805e5-695e-8ac3-b678-26ca2313629f@virtuozzo.com
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      68e81e14
    • Trond Myklebust's avatar
      NFS: Directory page cache pages need to be locked when read · 729c1232
      Trond Myklebust authored
      commit 114de382 upstream.
      
      When an NFS directory page cache page is removed from the page cache,
      its contents are freed through a call to nfs_readdir_clear_array().
      To prevent the removal of the page cache entry until after we've
      finished reading it, we must take the page lock.
      
      Fixes: 11de3b11 ("NFS: Fix a memory leak in nfs_readdir")
      Cc: stable@vger.kernel.org # v2.6.37+
      Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
      Reviewed-by: Benjamin Coddington <bcodding@redhat.com>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      729c1232
    • Trond Myklebust's avatar
      NFS: Fix memory leaks and corruption in readdir · 68b17243
      Trond Myklebust authored
      commit 4b310319 upstream.
      
      nfs_readdir_xdr_to_array() must not exit without having initialised
      the array, so that the page cache deletion routines can safely
      call nfs_readdir_clear_array().
      Furthermore, we should ensure that if we exit nfs_readdir_filler()
      with an error, we free up any page contents to prevent a leak
      if we try to fill the page again.
      
      Fixes: 11de3b11 ("NFS: Fix a memory leak in nfs_readdir")
      Cc: stable@vger.kernel.org # v2.6.37+
      Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
      Reviewed-by: Benjamin Coddington <bcodding@redhat.com>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      68b17243
    • Arun Easi's avatar
      scsi: qla2xxx: Fix unbound NVME response length · 7a33aeda
      Arun Easi authored
      commit 00fe717e upstream.
      
      In certain cases, when the response length is less than 32, NVME response
      data is supplied inline in the IOCB. This is indicated by some combination
      of state flags. There was an instance where a high, and incorrect, response
      length was indicated, causing the driver to overrun buffers. Fix this by
      checking and limiting the response payload length.
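
The "check and limit" idea can be sketched in userspace as follows. This is not the qla2xxx code: `copy_response` and its parameters are hypothetical names, and the sketch simply assumes the destination buffer size is the bound that must never be exceeded, whatever length the hardware reports.

```python
def copy_response(payload: bytes, reported_len: int, dest: bytearray) -> int:
    """Copy response bytes into dest without trusting reported_len.

    Returns the number of bytes actually copied. The length is bounded by the
    destination buffer and by the data that is really available.
    """
    n = max(0, min(reported_len, len(dest), len(payload)))
    dest[:n] = payload[:n]  # same-length slice assignment, dest never grows
    return n


dest = bytearray(32)
copied = copy_response(bytes(range(64)), reported_len=200, dest=dest)
print(copied, len(dest))  # 32 32
```

Without the `min()` bound, a reported length of 200 would write past the 32-byte destination, which is the overrun the message describes.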
      
      Fixes: 7401bc18 ("scsi: qla2xxx: Add FC-NVMe command handling")
      Cc: stable@vger.kernel.org
      Link: https://lore.kernel.org/r/20200124045014.23554-1-hmadhani@marvell.com
      Signed-off-by: Arun Easi <aeasi@marvell.com>
      Signed-off-by: Himanshu Madhani <hmadhani@marvell.com>
      Reviewed-by: Ewan D. Milne <emilne@redhat.com>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      7a33aeda
    • Chuhong Yuan's avatar
      crypto: picoxcell - adjust the position of tasklet_init and fix missed tasklet_kill · 148c920e
      Chuhong Yuan authored
      commit 7f8c36fe upstream.
      
      Since the tasklet needs to be initialized before registering the IRQ
      handler, adjust the position of tasklet_init to fix the wrong order.
      
      Besides, to fix the missed tasklet_kill, this patch adds a helper
      function and uses devm_add_action to kill the tasklet automatically.
      
      Fixes: ce921368 ("crypto: picoxcell - add support for the picoxcell crypto engines")
      Signed-off-by: Chuhong Yuan <hslester96@gmail.com>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      148c920e
    • Herbert Xu's avatar
      crypto: api - Fix race condition in crypto_spawn_alg · 8b0a3e01
      Herbert Xu authored
      commit 73669cc5 upstream.
      
      The function crypto_spawn_alg is racy because it drops the lock
      before shooting the dying algorithm.  The algorithm could disappear
      altogether before we shoot it.
      
      This patch fixes it by moving the shooting into the locked section.
      
      Fixes: 6bfd4809 ("[CRYPTO] api: Added spawns")
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      8b0a3e01
    • Tudor Ambarus's avatar
      crypto: atmel-aes - Fix counter overflow in CTR mode · ede3b239
      Tudor Ambarus authored
      commit 781a08d9 upstream.
      
      A 32 bit counter is not supported by either of our AES IPs; all implement
      a 16 bit block counter. Drop the 32 bit block counter logic.
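
One consequence of a 16 bit hardware block counter is that a request has to be split where the low 16 bits of the counter would wrap, so that software can carry into the upper bits. The sketch below only illustrates that arithmetic. `blocks_before_wrap` is a hypothetical helper, not the driver's function, and it assumes the counter occupies the low bits of the last IV word.

```python
AES_BLOCK_SIZE = 16  # bytes per AES block


def blocks_before_wrap(iv_low32: int, blocks: int) -> int:
    """How many of `blocks` can be processed before a 16-bit counter wraps."""
    start = iv_low32 & 0xFFFF      # the part the hardware actually increments
    room = 0x10000 - start         # blocks left until the low 16 bits reach zero
    return min(blocks, room)


# Counter two blocks away from the wrap, ten blocks requested.
first = blocks_before_wrap(0x0000FFFE, 10)
print(first, first * AES_BLOCK_SIZE)  # 2 32
```

The remaining blocks would be handled in a second pass after the upper bits of the counter have been incremented in software.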
      
      Fixes: fcac8365 ("crypto: atmel-aes - fix the counter overflow in CTR mode")
      Signed-off-by: Tudor Ambarus <tudor.ambarus@microchip.com>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      ede3b239
    • Herbert Xu's avatar
      crypto: pcrypt - Do not clear MAY_SLEEP flag in original request · c90aa32d
      Herbert Xu authored
      commit e8d99826 upstream.
      
      We should not be modifying the original request's MAY_SLEEP flag
      upon completion.  It makes no sense to do so anyway.
      
      Reported-by: Eric Biggers <ebiggers@kernel.org>
      Fixes: 5068c7a8 ("crypto: pcrypt - Add pcrypt crypto...")
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      Tested-by: Eric Biggers <ebiggers@kernel.org>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      c90aa32d
    • Ard Biesheuvel's avatar
      crypto: ccp - set max RSA modulus size for v3 platform devices as well · f28e641b
      Ard Biesheuvel authored
      commit 11548f5a upstream.
      
      AMD Seattle incorporates a non-PCI version of the v3 CCP crypto
      accelerator, and this version was left behind when the maximum
      RSA modulus size was parameterized in order to support v5 hardware
      which supports larger moduli than v3 hardware does. Due to this
      oversight, RSA acceleration no longer works at all on these systems.
      
      Fix this by setting the .rsamax property to the appropriate value
      for v3 platform hardware.
      
      Fixes: e28c190d ("csrypto: ccp - Expand RSA support for a v5 ccp")
      Cc: Gary R Hook <gary.hook@amd.com>
      Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
      Acked-by: Gary R Hook <gary.hook@amd.com>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      f28e641b
    • Toke Høiland-Jørgensen's avatar
      samples/bpf: Don't try to remove user's homedir on clean · a1e311be
      Toke Høiland-Jørgensen authored
      commit b2e5e93a upstream.
      
      The 'clean' rule in the samples/bpf Makefile tries to remove backup
      files (ending in ~). However, if no such files exist, it will instead try
      to remove the user's home directory. While the attempt is mostly harmless,
      it does lead to a somewhat scary warning like this:
      
      rm: cannot remove '~': Is a directory
      
      Fix this by using find instead of shell expansion to locate any actual
      backup files that need to be removed.
      
      Fixes: b62a796c ("samples/bpf: allow make to be run from samples/bpf/ directory")
      Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Link: https://lore.kernel.org/bpf/157952560126.1683545.7273054725976032511.stgit@toke.dk
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      a1e311be
    • Steven Rostedt (VMware)'s avatar
      ftrace: Protect ftrace_graph_hash with ftrace_sync · 0948d629
      Steven Rostedt (VMware) authored
      [ Upstream commit 54a16ff6 ]
      
      As the function_graph tracer can run when RCU is not "watching", its hash
      cannot be protected by synchronize_rcu(); it requires running a task on each
      CPU before it can be freed. schedule_on_each_cpu(ftrace_sync) needs to be
      used instead.
      
      Link: https://lore.kernel.org/r/20200205131110.GT2935@paulmck-ThinkPad-P72
      
      Cc: stable@vger.kernel.org
      Fixes: b9b0c831 ("ftrace: Convert graph filter to use hash tables")
      Reported-by: "Paul E. McKenney" <paulmck@kernel.org>
      Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      0948d629
    • Steven Rostedt (VMware)'s avatar
      ftrace: Add comment to why rcu_dereference_sched() is open coded · c03d2359
      Steven Rostedt (VMware) authored
      [ Upstream commit 16052dd5 ]
      
      Because the function graph tracer can execute in sections where RCU is not
      "watching", the rcu_dereference_sched() for the hash needs to be open coded.
      This is fine because the RCU "flavor" of the ftrace hash is protected by
      its own RCU handling (it does its own little synchronization on every CPU
      and does not rely on RCU sched).
      
      Acked-by: Joel Fernandes (Google) <joel@joelfernandes.org>
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      c03d2359
    • Amol Grover's avatar
      tracing: Annotate ftrace_graph_notrace_hash pointer with __rcu · 30afa80b
      Amol Grover authored
      [ Upstream commit fd0e6852 ]
      
      Fix the following instances of sparse errors:
      kernel/trace/ftrace.c:5667:29: error: incompatible types in comparison
      kernel/trace/ftrace.c:5813:21: error: incompatible types in comparison
      kernel/trace/ftrace.c:5868:36: error: incompatible types in comparison
      kernel/trace/ftrace.c:5870:25: error: incompatible types in comparison
      
      Use rcu_dereference_protected to dereference the newly annotated pointer.
      
      Link: http://lkml.kernel.org/r/20200205055701.30195-1-frextrite@gmail.com
      Signed-off-by: Amol Grover <frextrite@gmail.com>
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      30afa80b
    • Amol Grover's avatar
      tracing: Annotate ftrace_graph_hash pointer with __rcu · f144ad2e
      Amol Grover authored
      [ Upstream commit 24a9729f ]
      
      Fix the following instances of sparse errors:
      kernel/trace/ftrace.c:5664:29: error: incompatible types in comparison
      kernel/trace/ftrace.c:5785:21: error: incompatible types in comparison
      kernel/trace/ftrace.c:5864:36: error: incompatible types in comparison
      kernel/trace/ftrace.c:5866:25: error: incompatible types in comparison
      
      Use rcu_dereference_protected to access the __rcu annotated pointer.
      
      Link: http://lkml.kernel.org/r/20200201072703.17330-1-frextrite@gmail.com
      Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
      Signed-off-by: Amol Grover <frextrite@gmail.com>
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      f144ad2e
    • Herbert Xu's avatar
      padata: Remove broken queue flushing · dc34710a
      Herbert Xu authored
      [ Upstream commit 07928d9b ]
      
      The function padata_flush_queues is fundamentally broken because
      it cannot force padata users to complete the request that is
      underway.  IOW padata has to passively wait for the completion
      of any outstanding work.
      
      As it stands flushing is used in two places.  Its use in padata_stop
      is simply unnecessary because nothing depends on the queues to
      be flushed afterwards.
      
      The other use in padata_replace is more substantial as we depend
      on it to free the old pd structure.  This patch instead uses the
      pd->refcnt to dynamically free the pd structure once all requests
      are complete.
      
      Fixes: 2b73b07a ("padata: Flush the padata queues actively")
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      Reviewed-by: Daniel Jordan <daniel.m.jordan@oracle.com>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      dc34710a
    • Mikulas Patocka's avatar
      dm writecache: fix incorrect flush sequence when doing SSD mode commit · a9992966
      Mikulas Patocka authored
      commit aa950920 upstream.
      
      When committing state, the function writecache_flush does the following:
      1. write metadata (writecache_commit_flushed)
      2. flush disk cache (writecache_commit_flushed)
      3. wait for data writes to complete (writecache_wait_for_ios)
      4. increase superblock seq_count
      5. write the superblock
      6. flush disk cache
      
      It may happen that at step 3, when we wait for some write to finish, the
      disk may report the write as finished, but the write only hit the disk
      cache and it is not yet stored in persistent storage. At step 5 we write
      the superblock - it may happen that the superblock is written before the
      write that we waited for in step 3. If the machine crashes, it may result
      in incorrect data being returned after reboot.
      
      In order to fix the bug, we must swap steps 2 and 3 in the above sequence,
      so that we first wait for writes to complete and then flush the disk
      cache.
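
The ordering problem can be reproduced with a toy model of a disk that has a volatile write cache. Everything here is an assumption made for illustration: the `Disk` class, the item names, and the simplification that "waiting for data writes" means the data lands in the cache at that moment. The function reports what is already durable at the moment the superblock write is issued.

```python
class Disk:
    """Toy disk: writes land in a volatile cache until flush() persists them."""

    def __init__(self):
        self.cache = []
        self.persistent = set()

    def write(self, name):
        self.cache.append(name)

    def flush(self):
        self.persistent.update(self.cache)
        self.cache.clear()


def commit(wait_before_flush: bool) -> set:
    """Run the commit sequence; return what is durable when the superblock is written."""
    disk = Disk()
    disk.write("metadata")        # step 1: write metadata
    if wait_before_flush:         # fixed order: wait for data, then flush
        disk.write("data")
        disk.flush()
    else:                         # old order: flush, then wait for data
        disk.flush()
        disk.write("data")
    durable = set(disk.persistent)
    disk.write("superblock")      # steps 4-5: bump seq_count, write superblock
    disk.flush()                  # step 6: flush disk cache
    return durable


print(sorted(commit(wait_before_flush=False)))  # ['metadata']
print(sorted(commit(wait_before_flush=True)))   # ['data', 'metadata']
```

In the old order the data write is still only in the cache when the superblock is written, so a crash at that point can leave a superblock that refers to data which never reached persistent storage.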
      
      Fixes: 48debafe ("dm: add writecache target")
      Cc: stable@vger.kernel.org # 4.18+
      Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      a9992966
    • Mike Snitzer's avatar
      dm: fix potential for q->make_request_fn NULL pointer · 9eb75d69
      Mike Snitzer authored
      commit 47ace7e0 upstream.
      
      Move blk_queue_make_request() to dm.c:alloc_dev() so that
      q->make_request_fn is never NULL during the lifetime of a DM device
      (even one that is created without a DM table).
      
      Otherwise generic_make_request() will crash simply by doing:
        dmsetup create -n test
        mount /dev/dm-N /mnt
      
      While at it, move ->congested_data initialization out of
      dm.c:alloc_dev() and into the bio-based specific init method.
      
      Reported-by: Stefan Bader <stefan.bader@canonical.com>
      BugLink: https://bugs.launchpad.net/bugs/1860231
      Fixes: ff36ab34 ("dm: remove request-based logic from make_request_fn wrapper")
      Depends-on: c12c9a3c ("dm: various cleanups to md->queue initialization code")
      Cc: stable@vger.kernel.org
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      9eb75d69
    • Milan Broz's avatar
      dm crypt: fix benbi IV constructor crash if used in authenticated mode · 607d7cf2
      Milan Broz authored
      commit 4ea9471f upstream.
      
      If benbi IV is used in AEAD construction, for example:
        cryptsetup luksFormat <device> --cipher twofish-xts-benbi --key-size 512 --integrity=hmac-sha256
      the constructor uses wrong skcipher function and crashes:
      
       BUG: kernel NULL pointer dereference, address: 00000014
       ...
       EIP: crypt_iv_benbi_ctr+0x15/0x70 [dm_crypt]
       Call Trace:
        ? crypt_subkey_size+0x20/0x20 [dm_crypt]
        crypt_ctr+0x567/0xfc0 [dm_crypt]
        dm_table_add_target+0x15f/0x340 [dm_mod]
      
      Fix this by properly using crypt_aead_blocksize() in this case.
      
      Fixes: ef43aa38 ("dm crypt: add cryptographic data integrity protection (authenticated encryption)")
      Cc: stable@vger.kernel.org # v4.12+
      Link: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=941051
      Reported-by: Jerad Simpson <jbsimpson@gmail.com>
      Signed-off-by: Milan Broz <gmazyland@gmail.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      607d7cf2
    • Joe Thornber's avatar
      dm space map common: fix to ensure new block isn't already in use · 1fac9f57
      Joe Thornber authored
      commit 4feaef83 upstream.
      
      The space-maps track the reference counts for disk blocks allocated by
      both the thin-provisioning and cache targets.  There are variants for
      tracking metadata blocks and data blocks.
      
      Transactionality is implemented by never touching blocks from the
      previous transaction, so we can rollback in the event of a crash.
      
      When allocating a new block we need to ensure the block is free (has
      reference count of 0) in both the current and previous transaction.
      Prior to this fix we were doing this by searching for a free block in
      the previous transaction, and relying on a 'begin' counter to track
      where the last allocation in the current transaction was.  This
      'begin' field was not being updated in all code paths (eg, increment
      of a data block reference count due to breaking sharing of a neighbour
      block in the same btree leaf).
      
      This fix keeps the 'begin' field, but now it's just a hint to speed up
      the search.  Instead the current transaction is searched for a free
      block, and then the old transaction is double checked to ensure it's
      free.  Much simpler.
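
The allocation rule described above can be sketched as follows. This is a simplified model and not the persistent-data code: reference counts are plain lists, `new_block` is a hypothetical name, and wrapping the search around the end of the device is an assumption made to keep the example short.

```python
def new_block(current_refs, old_refs, begin=0):
    """Return a block that is free in both transactions, or None if none exists.

    current_refs / old_refs hold per-block reference counts for the current
    and previous transaction. `begin` is only a hint for where to start.
    """
    n = len(current_refs)
    for i in range(n):
        b = (begin + i) % n
        # Search the current transaction, then double check the old one.
        if current_refs[b] == 0 and old_refs[b] == 0:
            return b
    return None


current = [1, 0, 0, 0]
old = [0, 1, 0, 0]
print(new_block(current, old))  # 2
```

Block 1 is free in the current transaction but still referenced by the previous one, so it is skipped. Correctness no longer depends on `begin` being up to date, since a stale hint only changes where the search starts.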
      
      This fixes reports of sm_disk_new_block()'s BUG_ON() triggering when
      DM thin-provisioning's snapshots are heavily used.
      
      Reported-by: Eric Wheeler <dm-devel@lists.ewheeler.net>
      Cc: stable@vger.kernel.org
      Signed-off-by: Joe Thornber <ejt@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      1fac9f57
    • Dmitry Fomichev's avatar
      dm zoned: support zone sizes smaller than 128MiB · 4ae8d3a5
      Dmitry Fomichev authored
      commit b3996295 upstream.
      
      dm-zoned is observed to log failed kernel assertions and not work
      correctly when operating against a device with a zone size smaller
      than 128MiB (e.g. 32768 bits per 4K block). The reason is that the
      bitmap size per zone is calculated as zero with such a small zone
      size. Fix this problem and also make the code related to zone bitmap
      management be able to handle per zone bitmaps smaller than a single
      block.
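
The zero-sized bitmap follows from integer division. The sketch below assumes one bitmap bit per 4K block in a zone and uses hypothetical helper names. Rounding up is shown only as one way to avoid the zero result and is not claimed to be the exact change made in dm-zoned.

```python
BLOCK_SIZE = 4096                 # 4K blocks
BITS_PER_BLOCK = BLOCK_SIZE * 8   # 32768 bits fit in one 4K bitmap block
MIB = 1 << 20


def bitmap_blocks_floor(zone_bytes: int) -> int:
    """Bitmap blocks per zone using truncating division (the failing case)."""
    return (zone_bytes // BLOCK_SIZE) // BITS_PER_BLOCK


def bitmap_blocks_ceil(zone_bytes: int) -> int:
    """Bitmap blocks per zone, rounded up so small zones still get one block."""
    zone_blocks = zone_bytes // BLOCK_SIZE
    return (zone_blocks + BITS_PER_BLOCK - 1) // BITS_PER_BLOCK


for size in (64 * MIB, 128 * MIB, 256 * MIB):
    print(size // MIB, bitmap_blocks_floor(size), bitmap_blocks_ceil(size))
# 64 0 1
# 128 1 1
# 256 2 2
```

A 128MiB zone holds exactly 32768 blocks, which is why it is the smallest size for which the truncating calculation is non-zero.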
      
      A dm-zoned-tools patch is required to properly format dm-zoned devices
      with zone sizes smaller than 128MiB.
      
      Fixes: 3b1a94c8 ("dm zoned: drive-managed zoned block device target")
      Cc: stable@vger.kernel.org
      Signed-off-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
      Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      4ae8d3a5
    • Michael Ellerman's avatar
      of: Add OF_DMA_DEFAULT_COHERENT & select it on powerpc · 8a7c008c
      Michael Ellerman authored
      commit dabf6b36 upstream.
      
      There's an OF helper called of_dma_is_coherent(), which checks if a
      device has a "dma-coherent" property to see if the device is coherent
      for DMA.
      
      But on some platforms devices are coherent by default, and on some
      platforms it's not possible to update existing device trees to add the
      "dma-coherent" property.
      
      So add a Kconfig symbol to allow arch code to tell
      of_dma_is_coherent() that devices are coherent by default, regardless
      of the presence of the property.
      
      Select that symbol on powerpc when NOT_COHERENT_CACHE is not set, ie.
      when the system has a coherent cache.
      
      Fixes: 92ea637e ("of: introduce of_dma_is_coherent() helper")
      Cc: stable@vger.kernel.org # v3.16+
      Reported-by: Christian Zigotzky <chzigotzky@xenosoft.de>
      Tested-by: Christian Zigotzky <chzigotzky@xenosoft.de>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>
      Signed-off-by: Rob Herring <robh@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      8a7c008c
    • Rafael J. Wysocki's avatar
      PM: core: Fix handling of devices deleted during system-wide resume · 76d587bd
      Rafael J. Wysocki authored
      commit 0552e05f upstream.
      
      If a device is deleted by one of its system-wide resume callbacks
      (for example, because it does not appear to be present or accessible
      any more) along with its children, the resume of the children may
      continue, potentially leading to use-after-free errors and other
      issues.
      
      Namely, if the device's children are resumed asynchronously, their
      resume may have been scheduled already before the device's callback
      runs and so the device may be deleted while dpm_wait_for_superior()
      is being executed for them.  The memory taken up by the parent device
      object may be freed then while dpm_wait() is waiting for the parent's
      resume callback to complete, which leads to a use-after-free.
      Moreover, the resume of the children is really not expected to
      continue after they have been unregistered, so it must be terminated
      right away in that case.
      
      To address this problem, modify dpm_wait_for_superior() to check
      if the target device is still there in the system-wide PM list of
      devices and if so, to increment its parent's reference counter, both
      under dpm_list_mtx which prevents device_del() running for the child
      from dropping the parent's reference counter prematurely.
      
      If the device is not present in the system-wide PM list of devices
      any more, the resume of it cannot continue, so check that again after
      dpm_wait() returns, which means that the parent's callback has been
      completed, and pass the result of that check to the caller of
      dpm_wait_for_superior() to allow it to abort the device's resume
      if it is not there any more.
      
      Link: https://lore.kernel.org/linux-pm/1579568452-27253-1-git-send-email-chanho.min@lge.com
      Reported-by: Chanho Min <chanho.min@lge.com>
      Cc: All applicable <stable@vger.kernel.org>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      76d587bd
    • Chengguang Xu's avatar
      f2fs: code cleanup for f2fs_statfs_project() · 487da4d9
      Chengguang Xu authored
      commit bf2cbd3c upstream.
      
      Call min_not_zero() to simplify the complicated prjquota
      limit comparison in f2fs_statfs_project().
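
For readers unfamiliar with the helper, the semantics of the kernel's min_not_zero() macro can be mirrored in a few lines. The function below is a userspace illustration only. The interpretation that a limit of 0 means "no limit set" is the usual quota convention and is stated here as an assumption.

```python
def min_not_zero(x: int, y: int) -> int:
    """Smaller of x and y, ignoring zeros; returns 0 only if both are 0."""
    if x == 0:
        return y
    if y == 0:
        return x
    return min(x, y)


# Soft limit unset (0), hard limit 500: the effective limit is 500.
print(min_not_zero(0, 500))    # 500
# Both set: the smaller one applies.
print(min_not_zero(300, 500))  # 300
```

This replaces a chain of explicit "is the soft limit set, is the hard limit set, which is smaller" comparisons with a single call.
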
      
      Signed-off-by: Chengguang Xu <cgxu519@mykernel.net>
      Reviewed-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      487da4d9
    • Chengguang Xu's avatar
      f2fs: fix miscounted block limit in f2fs_statfs_project() · d3811818
      Chengguang Xu authored
      commit acdf2172 upstream.
      
      statfs calculates Total/Used/Avail disk space in block units,
      so we should translate the soft/hard prjquota limits to block units
      as well.
      
      The test results below show that the block/inode numbers of
      Total/Used/Avail from the df command are all correct after
      applying this patch.
      
      [root@localhost quota-tools]# ./repquota -P /dev/sdb1
      d3811818