1. 31 May, 2019 40 commits
    • mmc: core: Verify SD bus width · aeae78ba
      Raul E Rangel authored
      [ Upstream commit 9e4be8d0 ]
      
      The SD Physical Layer Spec says the following: Since the SD Memory Card
      shall support at least the two bus modes 1-bit or 4-bit width, then any SD
      Card shall set at least bits 0 and 2 (SD_BUS_WIDTH="0101").
      
      This change verifies the card has specified a bus width.
      
      AMD SDHC Device 7806 can get into a bad state after a card disconnect
      where anything transferred via the DATA lines will always result in a
      zero filled buffer. Currently the driver will continue without error if
      the HC is in this condition. A block device will be created, but reading
      from it will result in a zero buffer. This makes it seem like the SD
      device has been erased, when in actuality the data is never getting
      copied from the DATA lines to the data buffer.
      
      SCR is the first command in the SD initialization sequence that uses the
      DATA lines. By checking whether the response is valid, we can abort
      mounting the card when it is not.
      Reviewed-by: Avri Altman <avri.altman@wdc.com>
      Signed-off-by: Raul E Rangel <rrangel@chromium.org>
      Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
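The bus-width rule quoted from the SD spec can be sketched as a small userspace check. This is only an illustration, not the driver code: the constant names and `check_sd_bus_widths()` are hypothetical, and the field is assumed to be already extracted from the SCR.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical names for the two mandatory bits of SD_BUS_WIDTHS ("0101"). */
#define BUS_WIDTH_1BIT (1u << 0)
#define BUS_WIDTH_4BIT (1u << 2)

/*
 * Return 0 when the 4-bit SD_BUS_WIDTHS field advertises both mandatory
 * widths, or -EINVAL otherwise. A zero-filled SCR buffer decodes to an
 * all-zero field, so it is rejected.
 */
int check_sd_bus_widths(uint8_t bus_widths)
{
    const uint8_t required = BUS_WIDTH_1BIT | BUS_WIDTH_4BIT;

    if ((bus_widths & required) != required)
        return -EINVAL;
    return 0;
}
```

With a check of this kind, a host controller stuck returning zeros fails early instead of exposing a block device that reads back as empty.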
    • gfs2: Fix occasional glock use-after-free · fb9132e8
      Andreas Gruenbacher authored
      [ Upstream commit 9287c645 ]
      
      This patch has to do with the life cycle of glocks and buffers.  When
      gfs2 metadata or journaled data is queued to be written, a gfs2_bufdata
      object is assigned to track the buffer, and that is queued to various
      lists, including the glock's gl_ail_list to indicate it's on the active
      items list.  Once the page associated with the buffer has been written,
      it is removed from the ail list, but its life isn't over until a revoke
      has been successfully written.
      
      So after the block is written, its bufdata object is moved from the
      glock's gl_ail_list to a file-system-wide list of pending revokes,
      sd_log_le_revoke.  At that point the glock still needs to track how many
      revokes it contributed to that list (in gl_revokes) so that things like
      glock go_sync can ensure all the metadata has been not only written, but
      also revoked before the glock is granted to a different node.  This is
      to guarantee journal replay doesn't replay the block once the glock has
      been granted to another node.
      
      Ross Lagerwall recently discovered a race in which an inode could be
      evicted, and its glock freed after its ail list had been synced, but
      while it still had unwritten revokes on the sd_log_le_revoke list.  The
      evict decremented the glock reference count to zero, which allowed the
      glock to be freed.  After the revoke was written, function
      revoke_lo_after_commit tried to adjust the glock's gl_revokes counter
      and clear its GLF_LFLUSH flag, at which time it referenced the freed
      glock.
      
      This patch fixes the problem by incrementing the glock reference count
      in gfs2_add_revoke when the glock's first bufdata object is moved from
      the glock to the global revokes list. Later, when the glock's last such
      bufdata object is freed, the reference count is decremented. This
      guarantees that whichever process finishes last (the revoke writing or
      the evict) will properly free the glock, and neither will reference the
      glock after it has been freed.
      Reported-by: Ross Lagerwall <ross.lagerwall@citrix.com>
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
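The reference-counting scheme described in this commit can be modelled roughly as follows. This is a single-threaded sketch with hypothetical names (`glock_model`, `add_revoke`, `revoke_done`, `glock_put`); the real code uses the kernel's own locking and reference primitives.

```c
#include <stdbool.h>

/* Hypothetical, single-threaded stand-in for a glock. */
struct glock_model {
    int ref;      /* reference count; the object is freed at zero */
    int revokes;  /* revokes this glock still has on the global list */
    bool freed;   /* records that the final reference was dropped */
};

/* Drop one reference; the last one "frees" the glock. */
void glock_put(struct glock_model *gl)
{
    if (--gl->ref == 0)
        gl->freed = true;   /* stands in for actually freeing the glock */
}

/* Called when a bufdata object moves to the global revokes list. */
void add_revoke(struct glock_model *gl)
{
    if (gl->revokes++ == 0)
        gl->ref++;          /* the first pending revoke pins the glock */
}

/* Called after a revoke has been written and its bufdata is freed. */
void revoke_done(struct glock_model *gl)
{
    if (--gl->revokes == 0)
        glock_put(gl);      /* the last pending revoke drops the pin */
}
```

In this model an evict that calls `glock_put()` while revokes are pending leaves the object alive, and whichever side finishes last performs the free.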
    • IB/hfi1: Fix WQ_MEM_RECLAIM warning · 652fa249
      Mike Marciniszyn authored
      [ Upstream commit 4c4b1996 ]
      
      The work_item cancels that occur when a QP is destroyed can elicit the
      following trace:
      
       workqueue: WQ_MEM_RECLAIM ipoib_wq:ipoib_cm_tx_reap [ib_ipoib] is flushing !WQ_MEM_RECLAIM hfi0_0:_hfi1_do_send [hfi1]
       WARNING: CPU: 7 PID: 1403 at kernel/workqueue.c:2486 check_flush_dependency+0xb1/0x100
       Call Trace:
        __flush_work.isra.29+0x8c/0x1a0
        ? __switch_to_asm+0x40/0x70
        __cancel_work_timer+0x103/0x190
        ? schedule+0x32/0x80
        iowait_cancel_work+0x15/0x30 [hfi1]
        rvt_reset_qp+0x1f8/0x3e0 [rdmavt]
        rvt_destroy_qp+0x65/0x1f0 [rdmavt]
        ? _cond_resched+0x15/0x30
        ib_destroy_qp+0xe9/0x230 [ib_core]
        ipoib_cm_tx_reap+0x21c/0x560 [ib_ipoib]
        process_one_work+0x171/0x370
        worker_thread+0x49/0x3f0
        kthread+0xf8/0x130
        ? max_active_store+0x80/0x80
        ? kthread_bind+0x10/0x10
        ret_from_fork+0x35/0x40
      
      Since QP destruction frees memory, hfi1_wq should have the WQ_MEM_RECLAIM flag.
      
      The hfi1_wq does not allocate memory with GFP_KERNEL or otherwise become
      entangled with memory reclaim, so this flag is appropriate.
      
      Fixes: 0a226edd ("staging/rdma/hfi1: Use parallel workqueue for SDMA engines")
      Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
      Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
      Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
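The warning in the trace comes from a dependency rule that can be approximated in a few lines. The model below is a simplification with hypothetical names and a made-up flag value; it is not the kernel's check_flush_dependency() itself.

```c
#include <stdbool.h>

/* Hypothetical flag value; the kernel's WQ_MEM_RECLAIM has its own encoding. */
#define MODEL_WQ_MEM_RECLAIM 0x1u

struct wq_model {
    unsigned int flags;
};

/*
 * Simplified dependency rule: work running on a reclaim-safe queue must
 * not wait on a queue that lacks the same guarantee.
 */
bool flush_would_warn(const struct wq_model *flusher,
                      const struct wq_model *target)
{
    return (flusher->flags & MODEL_WQ_MEM_RECLAIM) &&
           !(target->flags & MODEL_WQ_MEM_RECLAIM);
}
```

In the trace, ipoib_wq plays the flusher and the hfi1 queue the target, so giving the target the flag makes the condition false.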
    • io_uring: use cpu_online() to check p->sq_thread_cpu instead of cpu_possible() · b6e7eed8
      Shenghui Wang authored
      [ Upstream commit 7889f44d ]
      
      This issue was found by running the liburing/test/io_uring_setup test.

      When the test runs, the testcase "attempt to bind to invalid cpu" does not
      pass, with messages like:
         io_uring_setup(1, 0xbfc2f7c8), \
      flags: IORING_SETUP_SQPOLL|IORING_SETUP_SQ_AFF, \
      resv: 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000, \
      sq_thread_cpu: 2
         expected -1, got 3
         FAIL
      
      On my system, there is:
         CPU(s) possible : 0-3
         CPU(s) online   : 0-1
         CPU(s) offline  : 2-3
         CPU(s) present  : 0-1
      
      The sq_thread_cpu 2 is offline on my system, so the bind should fail.
      But cpu_possible() will pass the check. We shouldn't be able to bind
      to an offline cpu. Use cpu_online() to do the check.
      
      After the change, the testcase runs as expected: EINVAL is returned
      for an offline CPU.
      Reviewed-by: Jeff Moyer <jmoyer@redhat.com>
      Signed-off-by: Shenghui Wang <shhuiw@foxmail.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
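The difference between the two checks can be illustrated with plain bitmasks mirroring the reporter's system (possible 0-3, online 0-1). The struct and function names here are hypothetical; the kernel uses its cpumask helpers rather than raw integers.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical CPU masks: bit N set means CPU N is in the set. */
struct cpu_masks {
    unsigned long possible;
    unsigned long online;
};

/* Test whether cpu is a member of mask, rejecting out-of-range indexes. */
bool cpu_in_mask(unsigned long mask, unsigned int cpu)
{
    return cpu < 8 * sizeof(mask) && ((mask >> cpu) & 1ul);
}

/* Accept an SQ thread CPU only if it is currently online. */
int validate_sq_thread_cpu(const struct cpu_masks *m, unsigned int cpu)
{
    if (!cpu_in_mask(m->online, cpu))
        return -EINVAL;
    return 0;
}
```

CPU 2 is in the possible mask but not the online mask, so a possible-only check would accept it while the online check returns -EINVAL.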
    • gfs2: fix race between gfs2_freeze_func and unmount · e983250f
      Abhi Das authored
      [ Upstream commit 8f918219 ]
      
      As part of the freeze operation, gfs2_freeze_func() is left blocking
      on a request to hold the sd_freeze_gl in SH. This glock is held in EX
      by the gfs2_freeze() code.
      
      A subsequent call to gfs2_unfreeze() releases the EXclusively held
      sd_freeze_gl, which allows gfs2_freeze_func() to acquire it in SH and
      resume its operation.
      
      gfs2_unfreeze(), however, doesn't wait for gfs2_freeze_func() to complete.
      If a umount is issued right after unfreeze, it could result in an
      inconsistent filesystem because some journal data (statfs update) isn't
      written out.
      
      Refer to commit 24972557 for a more detailed explanation of how
      freeze/unfreeze work.
      
      This patch causes gfs2_unfreeze() to wait for gfs2_freeze_func() to
      complete before returning to the user.
      Signed-off-by: Abhi Das <adas@redhat.com>
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • NFS: make nfs_match_client killable · dab2c7d9
      Roberto Bergantinos Corpas authored
      [ Upstream commit 950a578c ]
      
      Currently we don't do anything with the return value of
      nfs_wait_client_init_complete in nfs_match_client. As a
      consequence, if we get a fatal signal while the client is not
      fully initialised, we loop back to the "again" label.

      This has been shown to cause soft lockups in some scenarios
      (network interfaces that are configured but have no carrier).
      Signed-off-by: Roberto Bergantinos Corpas <rbergant@redhat.com>
      Reviewed-by: Benjamin Coddington <bcodding@redhat.com>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • afs: Fix getting the afs.fid xattr · 86edf5af
      David Howells authored
      [ Upstream commit a2f611a3 ]
      
      The AFS3 FID is three 32-bit unsigned numbers and is represented as three
      up-to-8-hex-digit numbers separated by colons to the afs.fid xattr.
      However, with the advent of support for YFS, the FID is now a 64-bit volume
      number, a 96-bit vnode/inode number and a 32-bit uniquifier (as before).
      Whilst the sprintf in afs_xattr_get_fid() has been partially updated (it
      currently ignores the upper 32 bits of the 96-bit vnode number), the size
      of the stack-based buffer has not been increased to match, thereby allowing
      stack corruption to occur.
      
      Fix this by increasing the buffer size appropriately and conditionally
      including the upper part of the vnode number if it is non-zero.  The latter
      requires the lower part to be zero-padded if the upper part is non-zero.
      
      Fixes: 3b6492df ("afs: Increase to 64-bit volume ID and 96-bit vnode ID for YFS")
      Signed-off-by: David Howells <dhowells@redhat.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
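The formatting rule described in this commit (a buffer sized for the widest FID, with the lower vnode part zero-padded only when an upper part precedes it) might look like this in userspace. The struct layout and names are assumptions made for illustration, not the afs definitions.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for the FID fields described in the commit. */
struct fid_model {
    uint64_t vid;        /* 64-bit volume number */
    uint32_t vnode_hi;   /* upper 32 bits of the 96-bit vnode number */
    uint64_t vnode;      /* lower 64 bits of the vnode number */
    uint32_t unique;     /* 32-bit uniquifier */
};

/* Worst case: 16 + ':' + 24 + ':' + 8 hex digits, plus the terminator. */
#define FID_TEXT_MAX (16 + 1 + 24 + 1 + 8 + 1)

/* Format the FID into buf and return the length written (excluding NUL). */
int format_fid(const struct fid_model *fid, char buf[FID_TEXT_MAX])
{
    int len = snprintf(buf, FID_TEXT_MAX, "%llx:",
                       (unsigned long long)fid->vid);

    if (fid->vnode_hi)
        /* The lower part is zero-padded once an upper part precedes it. */
        len += snprintf(buf + len, FID_TEXT_MAX - len, "%x%016llx",
                        (unsigned int)fid->vnode_hi,
                        (unsigned long long)fid->vnode);
    else
        len += snprintf(buf + len, FID_TEXT_MAX - len, "%llx",
                        (unsigned long long)fid->vnode);

    len += snprintf(buf + len, FID_TEXT_MAX - len, ":%x",
                    (unsigned int)fid->unique);
    return len;
}
```

Without the padding, an upper part of 0xa followed by a lower part of 0x2 would print as "a2" and be indistinguishable from the plain vnode number 0xa2.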
    • cxgb4: Fix error path in cxgb4_init_module · 49842a9e
      YueHaibing authored
      [ Upstream commit a3147770 ]
      
      BUG: unable to handle kernel paging request at ffffffffa016a270
      PGD 3270067 P4D 3270067 PUD 3271063 PMD 230bbd067 PTE 0
      Oops: 0000 [#1
      CPU: 0 PID: 6134 Comm: modprobe Not tainted 5.1.0+ #33
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.9.3-0-ge2fc41e-prebuilt.qemu-project.org 04/01/2014
      RIP: 0010:atomic_notifier_chain_register+0x24/0x60
      Code: 1f 80 00 00 00 00 55 48 89 e5 41 54 49 89 f4 53 48 89 fb e8 ae b4 38 01 48 8b 53 38 48 8d 4b 38 48 85 d2 74 20 45 8b 44 24 10 <44> 3b 42 10 7e 08 eb 13 44 39 42 10 7c 0d 48 8d 4a 08 48 8b 52 08
      RSP: 0018:ffffc90000e2bc60 EFLAGS: 00010086
      RAX: 0000000000000292 RBX: ffffffff83467240 RCX: ffffffff83467278
      RDX: ffffffffa016a260 RSI: ffffffff83752140 RDI: ffffffff83467240
      RBP: ffffc90000e2bc70 R08: 0000000000000000 R09: 0000000000000001
      R10: 0000000000000000 R11: 00000000014fa61f R12: ffffffffa01c8260
      R13: ffff888231091e00 R14: 0000000000000000 R15: ffffc90000e2be78
      FS:  00007fbd8d7cd540(0000) GS:ffff888237a00000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: ffffffffa016a270 CR3: 000000022c7e3000 CR4: 00000000000006f0
      Call Trace:
       register_inet6addr_notifier+0x13/0x20
       cxgb4_init_module+0x6c/0x1000 [cxgb4
       ? 0xffffffffa01d7000
       do_one_initcall+0x6c/0x3cc
       ? do_init_module+0x22/0x1f1
       ? rcu_read_lock_sched_held+0x97/0xb0
       ? kmem_cache_alloc_trace+0x325/0x3b0
       do_init_module+0x5b/0x1f1
       load_module+0x1db1/0x2690
       ? m_show+0x1d0/0x1d0
       __do_sys_finit_module+0xc5/0xd0
       __x64_sys_finit_module+0x15/0x20
       do_syscall_64+0x6b/0x1d0
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      If pci_register_driver fails, registering inet6addr_notifier is
      pointless. This patch fixes the error path in cxgb4_init_module.
      
      Fixes: b5a02f50 ("cxgb4 : Update ipv6 address handling api")
      Signed-off-by: YueHaibing <yuehaibing@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
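The ordering problem can be shown with a generic init function that takes its registration steps as callbacks. Everything here is hypothetical scaffolding (`init_ops`, `module_init_model`, the stub functions); it illustrates the ordering principle only, not the cxgb4 code.

```c
#include <errno.h>

/* Hypothetical hooks standing in for the real registration calls. */
struct init_ops {
    int (*register_driver)(void);
    int (*register_notifier)(void);
    void (*unregister_driver)(void);
};

/*
 * Register the driver first. The notifier is registered only after that
 * succeeded, and a notifier failure unwinds the driver registration.
 */
int module_init_model(const struct init_ops *ops)
{
    int ret = ops->register_driver();

    if (ret < 0)
        return ret;     /* the notifier was never registered */

    ret = ops->register_notifier();
    if (ret < 0)
        ops->unregister_driver();
    return ret;
}

/* Stubs for trying the sketch out. */
int notifier_calls;
int failing_driver(void) { return -ENODEV; }
int ok_driver(void) { return 0; }
int counting_notifier(void) { notifier_calls++; return 0; }
void noop_unregister(void) { }
```

When the driver step fails, no notifier is left pointing into module memory that is about to be unloaded, which is the situation the oops above shows.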
    • gfs2: Fix lru_count going negative · dcbdbe13
      Ross Lagerwall authored
      [ Upstream commit 7881ef3f ]
      
      Under certain conditions, lru_count may drop below zero resulting in
      a large amount of log spam like this:
      
      vmscan: shrink_slab: gfs2_dump_glock+0x3b0/0x630 [gfs2] \
          negative objects to delete nr=-1
      
      This happens as follows:
      1) A glock is moved from lru_list to the dispose list and lru_count is
         decremented.
      2) The dispose function calls cond_resched() and drops the lru lock.
      3) Another thread takes the lru lock and tries to add the same glock to
         lru_list, checking if the glock is on an lru list.
      4) It is on a list (actually the dispose list) and so it avoids
         incrementing lru_count.
      5) The glock is moved to lru_list.
      6) The original thread doesn't dispose it because it has been re-added
         to the lru list but the lru_count has still decreased by one.
      
      Fix by checking if the LRU flag is set on the glock rather than checking
      if the glock is on some list and rearrange the code so that the LRU flag
      is added/removed precisely when the glock is added/removed from lru_list.
      Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
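The flag-based accounting described in the fix can be sketched as below. This is a single-threaded model with hypothetical names and no locking; it only shows why tying the counter to a dedicated flag keeps it consistent when a glock sits on the dispose list.

```c
#include <stdbool.h>

/* Hypothetical model: the counter for lru_list. */
struct lru_model {
    int count;
};

/* Hypothetical model: a glock with its LRU flag. */
struct glock_lru_model {
    bool lru_flag;   /* set exactly while the glock is on lru_list */
};

/* Add to lru_list; the flag, not generic list membership, gates the count. */
void lru_add(struct lru_model *lru, struct glock_lru_model *gl)
{
    if (!gl->lru_flag) {
        gl->lru_flag = true;
        lru->count++;
    }
}

/* Remove from lru_list, for example when moving to the dispose list. */
void lru_remove(struct lru_model *lru, struct glock_lru_model *gl)
{
    if (gl->lru_flag) {
        gl->lru_flag = false;
        lru->count--;
    }
}
```

Replaying the numbered sequence against this model, the re-add in step 3 sees the flag cleared and increments the counter, so the later removal cannot push it below zero.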
    • Revert "btrfs: Honour FITRIM range constraints during free space trim" · 93d91ef8
      David Sterba authored
      This reverts commit eb432217.
      
      There is currently no corresponding patch in master due to additional
      changes that would be significantly different from plain revert in the
      respective stable branch.
      
      The range argument was not handled correctly and could cause trim to
      overlap allocated areas or reach beyond the end of the device. The
      address space that fitrim normally operates on is in logical
      coordinates, while the discards are done on the physical device extents.
      This distinction cannot be made with the current ioctl interface and
      caused the confusion.
      
      The bug depends on the layout of block groups and does not always
      happen. The whole-fs trim (run by default by the fstrim tool) is not
      affected.
      Signed-off-by: David Sterba <dsterba@suse.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • netfilter: ctnetlink: Resolve conntrack L3-protocol flush regression · dc9ddd15
      Kristian Evensen authored
      commit f8e60898 upstream.
      
      Commit 59c08c69 ("netfilter: ctnetlink: Support L3 protocol-filter
      on flush") introduced a user-space regression when flushing connection
      track entries. Before this commit, the nfgen_family field was not used
      by the kernel and all entries were removed. Since this commit,
      nfgen_family is used to filter out entries that should not be removed.
      One example of a broken tool is conntrack. conntrack always sets
      nfgen_family to AF_INET, so after 59c08c69 only IPv4 entries were
      removed with the -F parameter.
      
      Pablo Neira Ayuso suggested using nfgenmsg->version to resolve the
      regression, and this commit implements his suggestion. nfgenmsg->version
      is so far set to zero, so it is well-suited to be used as a flag for
      selecting old or new flush behavior. If version is 0, nfgen_family is
      ignored and all entries are flushed. If user-space sets the version to one
      (or any value other than 0), then the new behavior is used. As version
      can only have two valid values, I chose not to add a new
      NFNETLINK_VERSION-constant.
      
      Fixes: 59c08c69 ("netfilter: ctnetlink: Support L3 protocol-filter on flush")
      Reported-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
      Suggested-by: Pablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: Kristian Evensen <kristian.evensen@gmail.com>
      Tested-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
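The version-based selection between old and new flush behavior can be expressed as a small predicate. The function name and the family constants are hypothetical stand-ins, and treating an unspecified family as "match everything" is an assumption of this sketch rather than something the commit message states.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the address-family constants. */
enum { FAM_UNSPEC = 0, FAM_INET = 2, FAM_INET6 = 10 };

/* Decide whether a flush request should remove an entry of entry_family. */
bool flush_matches(uint8_t version, uint8_t nfgen_family, uint8_t entry_family)
{
    if (version == 0)
        return true;    /* old behavior: nfgen_family is ignored */
    if (nfgen_family == FAM_UNSPEC)
        return true;    /* assumption: no family given means no filter */
    return nfgen_family == entry_family;    /* new behavior: filter */
}
```

A tool such as conntrack, which sends version 0 with AF_INET, therefore flushes IPv6 entries again, while a caller that sets the version opts in to per-family filtering.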
    • acct_on(): don't mess with freeze protection · a71be911
      Al Viro authored
      commit 9419a319 upstream.
      
      What happens there is that we are replacing file->path.mnt of
      a file we'd just opened with a clone and we need the write
      count contribution to be transferred from original mount to
      new one.  That's it.  We do *NOT* want any kind of freeze
      protection for the duration of switchover.
      
      IOW, we should just use __mnt_{want,drop}_write() for that
      switchover; no need to bother with mnt_{want,drop}_write()
      there.
      Tested-by: Amir Goldstein <amir73il@gmail.com>
      Reported-by: syzbot+2a73a6ea9507b7112141@syzkaller.appspotmail.com
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • at76c50x-usb: Don't register led_trigger if usb_register_driver failed · f3919070
      YueHaibing authored
      commit 09ac2694 upstream.
      
      Syzkaller reports this:
      
      [ 1213.468581] BUG: unable to handle kernel paging request at fffffbfff83bf338
      [ 1213.469530] #PF error: [normal kernel read fault]
      [ 1213.469530] PGD 237fe4067 P4D 237fe4067 PUD 237e60067 PMD 1c868b067 PTE 0
      [ 1213.473514] Oops: 0000 [#1] SMP KASAN PTI
      [ 1213.473514] CPU: 0 PID: 6321 Comm: syz-executor.0 Tainted: G         C        5.1.0-rc3+ #8
      [ 1213.473514] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014
      [ 1213.473514] RIP: 0010:strcmp+0x31/0xa0
      [ 1213.473514] Code: 00 00 00 00 fc ff df 55 53 48 83 ec 08 eb 0a 84 db 48 89 ef 74 5a 4c 89 e6 48 89 f8 48 89 fa 48 8d 6f 01 48 c1 e8 03 83 e2 07 <42> 0f b6 04 28 38 d0 7f 04 84 c0 75 50 48 89 f0 48 89 f2 0f b6 5d
      [ 1213.473514] RSP: 0018:ffff8881f2b7f950 EFLAGS: 00010246
      [ 1213.473514] RAX: 1ffffffff83bf338 RBX: ffff8881ea6f7240 RCX: ffffffff825350c6
      [ 1213.473514] RDX: 0000000000000000 RSI: ffffffffc1ee19c0 RDI: ffffffffc1df99c0
      [ 1213.473514] RBP: ffffffffc1df99c1 R08: 0000000000000001 R09: 0000000000000004
      [ 1213.473514] R10: 0000000000000000 R11: ffff8881de353f00 R12: ffff8881ee727900
      [ 1213.473514] R13: dffffc0000000000 R14: 0000000000000001 R15: ffffffffc1eeaaf0
      [ 1213.473514] FS:  00007fa66fa01700(0000) GS:ffff8881f7200000(0000) knlGS:0000000000000000
      [ 1213.473514] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [ 1213.473514] CR2: fffffbfff83bf338 CR3: 00000001ebb9e005 CR4: 00000000007606f0
      [ 1213.473514] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [ 1213.473514] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      [ 1213.473514] PKRU: 55555554
      [ 1213.473514] Call Trace:
      [ 1213.473514]  led_trigger_register+0x112/0x3f0
      [ 1213.473514]  led_trigger_register_simple+0x7a/0x110
      [ 1213.473514]  ? 0xffffffffc1c10000
      [ 1213.473514]  at76_mod_init+0x77/0x1000 [at76c50x_usb]
      [ 1213.473514]  do_one_initcall+0xbc/0x47d
      [ 1213.473514]  ? perf_trace_initcall_level+0x3a0/0x3a0
      [ 1213.473514]  ? kasan_unpoison_shadow+0x30/0x40
      [ 1213.473514]  ? kasan_unpoison_shadow+0x30/0x40
      [ 1213.473514]  do_init_module+0x1b5/0x547
      [ 1213.473514]  load_module+0x6405/0x8c10
      [ 1213.473514]  ? module_frob_arch_sections+0x20/0x20
      [ 1213.473514]  ? kernel_read_file+0x1e6/0x5d0
      [ 1213.473514]  ? find_held_lock+0x32/0x1c0
      [ 1213.473514]  ? cap_capable+0x1ae/0x210
      [ 1213.473514]  ? __do_sys_finit_module+0x162/0x190
      [ 1213.473514]  __do_sys_finit_module+0x162/0x190
      [ 1213.473514]  ? __ia32_sys_init_module+0xa0/0xa0
      [ 1213.473514]  ? __mutex_unlock_slowpath+0xdc/0x690
      [ 1213.473514]  ? wait_for_completion+0x370/0x370
      [ 1213.473514]  ? vfs_write+0x204/0x4a0
      [ 1213.473514]  ? do_syscall_64+0x18/0x450
      [ 1213.473514]  do_syscall_64+0x9f/0x450
      [ 1213.473514]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
      [ 1213.473514] RIP: 0033:0x462e99
      [ 1213.473514] Code: f7 d8 64 89 02 b8 ff ff ff ff c3 66 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 bc ff ff ff f7 d8 64 89 01 48
      [ 1213.473514] RSP: 002b:00007fa66fa00c58 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
      [ 1213.473514] RAX: ffffffffffffffda RBX: 000000000073bf00 RCX: 0000000000462e99
      [ 1213.473514] RDX: 0000000000000000 RSI: 0000000020000300 RDI: 0000000000000003
      [ 1213.473514] RBP: 00007fa66fa00c70 R08: 0000000000000000 R09: 0000000000000000
      [ 1213.473514] R10: 0000000000000000 R11: 0000000000000246 R12: 00007fa66fa016bc
      [ 1213.473514] R13: 00000000004bcefa R14: 00000000006f6fb0 R15: 0000000000000004
      
      If usb_register fails, there is no need to call led_trigger_register_simple.
      Reported-by: Hulk Robot <hulkci@huawei.com>
      Fixes: 1264b951 ("at76c50x-usb: add driver")
      Signed-off-by: YueHaibing <yuehaibing@huawei.com>
      Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • batman-adv: mcast: fix multicast tt/tvlv worker locking · 2eec11f7
      Linus Lüssing authored
      commit a3c7cd0c upstream.
      
      Syzbot has reported some issues with the locking assumptions made for
      the multicast tt/tvlv worker: It was able to trigger the WARN_ON() in
      batadv_mcast_mla_tt_retract() and batadv_mcast_mla_tt_add().
      While this has been hard or impossible for us to reproduce so far, it seems
      that the delayed_work_pending() check we use might not be quite safe from
      reordering.
      
      Therefore this patch adds an explicit, new spinlock to protect the
      update of the mla_list and flags in bat_priv and then removes the
      WARN_ON(delayed_work_pending()).
      
      Reported-by: syzbot+83f2d54ec6b7e417e13f@syzkaller.appspotmail.com
      Reported-by: syzbot+050927a651272b145a5d@syzkaller.appspotmail.com
      Reported-by: syzbot+979ffc89b87309b1b94b@syzkaller.appspotmail.com
      Reported-by: syzbot+f9f3f388440283da2965@syzkaller.appspotmail.com
      Fixes: cbebd363 ("batman-adv: Use own timer for multicast TT and TVLV updates")
      Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue>
      Signed-off-by: Sven Eckelmann <sven@narfation.org>
      Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • bpf: devmap: fix use-after-free Read in __dev_map_entry_free · 45d7cd7c
      Eric Dumazet authored
      commit 2baae354 upstream.
      
      synchronize_rcu() is fine when the rcu callbacks only need
      to free memory (kfree_rcu(), or rcu callbacks that call kfree() directly).

      __dev_map_entry_free() is a bit more complex, so we need to make
      sure that all queued __dev_map_entry_free() callbacks have completed.

      syzbot report:
      
      BUG: KASAN: use-after-free in dev_map_flush_old kernel/bpf/devmap.c:365 [inline]
      BUG: KASAN: use-after-free in __dev_map_entry_free+0x2a8/0x300 kernel/bpf/devmap.c:379
      Read of size 8 at addr ffff8801b8da38c8 by task ksoftirqd/1/18
      
      CPU: 1 PID: 18 Comm: ksoftirqd/1 Not tainted 4.17.0+ #39
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
        __dump_stack lib/dump_stack.c:77 [inline]
        dump_stack+0x1b9/0x294 lib/dump_stack.c:113
        print_address_description+0x6c/0x20b mm/kasan/report.c:256
        kasan_report_error mm/kasan/report.c:354 [inline]
        kasan_report.cold.7+0x242/0x2fe mm/kasan/report.c:412
        __asan_report_load8_noabort+0x14/0x20 mm/kasan/report.c:433
        dev_map_flush_old kernel/bpf/devmap.c:365 [inline]
        __dev_map_entry_free+0x2a8/0x300 kernel/bpf/devmap.c:379
        __rcu_reclaim kernel/rcu/rcu.h:178 [inline]
        rcu_do_batch kernel/rcu/tree.c:2558 [inline]
        invoke_rcu_callbacks kernel/rcu/tree.c:2818 [inline]
        __rcu_process_callbacks kernel/rcu/tree.c:2785 [inline]
        rcu_process_callbacks+0xe9d/0x1760 kernel/rcu/tree.c:2802
        __do_softirq+0x2e0/0xaf5 kernel/softirq.c:284
        run_ksoftirqd+0x86/0x100 kernel/softirq.c:645
        smpboot_thread_fn+0x417/0x870 kernel/smpboot.c:164
        kthread+0x345/0x410 kernel/kthread.c:240
        ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:412
      
      Allocated by task 6675:
        save_stack+0x43/0xd0 mm/kasan/kasan.c:448
        set_track mm/kasan/kasan.c:460 [inline]
        kasan_kmalloc+0xc4/0xe0 mm/kasan/kasan.c:553
        kmem_cache_alloc_trace+0x152/0x780 mm/slab.c:3620
        kmalloc include/linux/slab.h:513 [inline]
        kzalloc include/linux/slab.h:706 [inline]
        dev_map_alloc+0x208/0x7f0 kernel/bpf/devmap.c:102
        find_and_alloc_map kernel/bpf/syscall.c:129 [inline]
        map_create+0x393/0x1010 kernel/bpf/syscall.c:453
        __do_sys_bpf kernel/bpf/syscall.c:2351 [inline]
        __se_sys_bpf kernel/bpf/syscall.c:2328 [inline]
        __x64_sys_bpf+0x303/0x510 kernel/bpf/syscall.c:2328
        do_syscall_64+0x1b1/0x800 arch/x86/entry/common.c:290
        entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      Freed by task 26:
        save_stack+0x43/0xd0 mm/kasan/kasan.c:448
        set_track mm/kasan/kasan.c:460 [inline]
        __kasan_slab_free+0x11a/0x170 mm/kasan/kasan.c:521
        kasan_slab_free+0xe/0x10 mm/kasan/kasan.c:528
        __cache_free mm/slab.c:3498 [inline]
        kfree+0xd9/0x260 mm/slab.c:3813
        dev_map_free+0x4fa/0x670 kernel/bpf/devmap.c:191
        bpf_map_free_deferred+0xba/0xf0 kernel/bpf/syscall.c:262
        process_one_work+0xc64/0x1b70 kernel/workqueue.c:2153
        worker_thread+0x181/0x13a0 kernel/workqueue.c:2296
        kthread+0x345/0x410 kernel/kthread.c:240
        ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:412
      
      The buggy address belongs to the object at ffff8801b8da37c0
        which belongs to the cache kmalloc-512 of size 512
      The buggy address is located 264 bytes inside of
        512-byte region [ffff8801b8da37c0, ffff8801b8da39c0)
      The buggy address belongs to the page:
      page:ffffea0006e368c0 count:1 mapcount:0 mapping:ffff8801da800940 index:0xffff8801b8da3540
      flags: 0x2fffc0000000100(slab)
      raw: 02fffc0000000100 ffffea0007217b88 ffffea0006e30cc8 ffff8801da800940
      raw: ffff8801b8da3540 ffff8801b8da3040 0000000100000004 0000000000000000
      page dumped because: kasan: bad access detected
      
      Memory state around the buggy address:
        ffff8801b8da3780: fc fc fc fc fc fc fc fc fb fb fb fb fb fb fb fb
        ffff8801b8da3800: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      > ffff8801b8da3880: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                                                     ^
        ffff8801b8da3900: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
        ffff8801b8da3980: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
      
      Fixes: 546ac1ff ("bpf: add devmap, a map for storing net device references")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: syzbot+457d3e2ffbcf31aee5c0@syzkaller.appspotmail.com
      Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
      Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • ssb: Fix possible NULL pointer dereference in ssb_host_pcmcia_exit · 3b924789
      YueHaibing authored
      commit b2c01aab upstream.
      
      Syzkaller reports this:
      
      kasan: GPF could be caused by NULL-ptr deref or user memory access
      general protection fault: 0000 [#1] SMP KASAN PTI
      CPU: 0 PID: 4492 Comm: syz-executor.0 Not tainted 5.0.0-rc7+ #45
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014
      RIP: 0010:sysfs_remove_file_ns+0x27/0x70 fs/sysfs/file.c:468
      Code: 00 00 00 41 54 55 48 89 fd 53 49 89 d4 48 89 f3 e8 ee 76 9c ff 48 8d 7d 30 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 <80> 3c 02 00 75 2d 48 89 da 48 b8 00 00 00 00 00 fc ff df 48 8b 6d
      RSP: 0018:ffff8881e9d9fc00 EFLAGS: 00010206
      RAX: dffffc0000000000 RBX: ffffffff900367e0 RCX: ffffffff81a95952
      RDX: 0000000000000006 RSI: ffffc90001405000 RDI: 0000000000000030
      RBP: 0000000000000000 R08: fffffbfff1fa22ed R09: fffffbfff1fa22ed
      R10: 0000000000000001 R11: fffffbfff1fa22ec R12: 0000000000000000
      R13: ffffffffc1abdac0 R14: 1ffff1103d3b3f8b R15: 0000000000000000
      FS:  00007fe409dc1700(0000) GS:ffff8881f1200000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 0000001b2d721000 CR3: 00000001e98b6005 CR4: 00000000007606f0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      PKRU: 55555554
      Call Trace:
       sysfs_remove_file include/linux/sysfs.h:519 [inline]
       driver_remove_file+0x40/0x50 drivers/base/driver.c:122
       pcmcia_remove_newid_file drivers/pcmcia/ds.c:163 [inline]
       pcmcia_unregister_driver+0x7d/0x2b0 drivers/pcmcia/ds.c:209
       ssb_modexit+0xa/0x1b [ssb]
       __do_sys_delete_module kernel/module.c:1018 [inline]
       __se_sys_delete_module kernel/module.c:961 [inline]
       __x64_sys_delete_module+0x3dc/0x5e0 kernel/module.c:961
       do_syscall_64+0x147/0x600 arch/x86/entry/common.c:290
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      RIP: 0033:0x462e99
      Code: f7 d8 64 89 02 b8 ff ff ff ff c3 66 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 bc ff ff ff f7 d8 64 89 01 48
      RSP: 002b:00007fe409dc0c58 EFLAGS: 00000246 ORIG_RAX: 00000000000000b0
      RAX: ffffffffffffffda RBX: 000000000073bf00 RCX: 0000000000462e99
      RDX: 0000000000000000 RSI: 0000000000000000 RDI: 00000000200000c0
      RBP: 0000000000000002 R08: 0000000000000000 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000246 R12: 00007fe409dc16bc
      R13: 00000000004bccaa R14: 00000000006f6bc8 R15: 00000000ffffffff
      Modules linked in: ssb(-) 3c59x nvme_core macvlan tap pata_hpt3x3 rt2x00pci null_blk tsc40 pm_notifier_error_inject notifier_error_inject mdio cdc_wdm nf_reject_ipv4 ath9k_common ath9k_hw ath pppox ppp_generic slhc ehci_platform wl12xx wlcore tps6507x_ts ioc4 nf_synproxy_core ide_gd_mod ax25 can_dev iwlwifi can_raw atm tm2_touchkey can_gw can sundance adp5588_keys rt2800mmio rt2800lib rt2x00mmio rt2x00lib eeprom_93cx6 pn533 lru_cache elants_i2c ip_set nfnetlink gameport tipc hampshire nhc_ipv6 nhc_hop nhc_udp nhc_fragment nhc_routing nhc_mobility nhc_dest 6lowpan silead brcmutil nfc mt76_usb mt76 mac80211 iptable_security iptable_raw iptable_mangle iptable_nat nf_nat_ipv4 nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_filter bpfilter ip6_vti ip_gre sit hsr veth vxcan batman_adv cfg80211 rfkill chnl_net caif nlmon vcan bridge stp llc ip6_gre ip6_tunnel tunnel6 tun joydev mousedev serio_raw ide_pci_generic piix floppy ide_core sch_fq_codel ip_tables x_tables ipv6
       [last unloaded: 3c59x]
      Dumping ftrace buffer:
         (ftrace buffer empty)
      ---[ end trace 3913cbf8011e1c05 ]---
      
      ssb_modinit does not fail SSB init when ssb_host_pcmcia_init fails,
      but in ssb_modexit, ssb_host_pcmcia_exit calls pcmcia_unregister_driver
      unconditionally, which may trigger a NULL pointer dereference as above.
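
      The failure mode described above can be modelled in plain C. This is a
      minimal userspace sketch, not the ssb code: struct fake_driver,
      fake_register(), fake_unregister(), host_init() and host_exit() are
      hypothetical stand-ins, and remembering the init result is only one
      possible way to keep the exit path away from a driver that was never
      registered.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a driver object; not a kernel structure. */
struct fake_driver {
	bool registered;
	int unregister_calls;
};

/* Pretend registration; fails when should_fail is set. */
int fake_register(struct fake_driver *drv, bool should_fail)
{
	assert(drv != NULL);
	if (should_fail)
		return -1;
	drv->registered = true;
	return 0;
}

/* Pretend unregistration; counts calls so misuse is observable. */
void fake_unregister(struct fake_driver *drv)
{
	assert(drv != NULL);
	drv->unregister_calls++;
	drv->registered = false;
}

/* Result of the last init, consulted by the exit path. */
static int host_init_failed;

int host_init(struct fake_driver *drv, bool should_fail)
{
	host_init_failed = fake_register(drv, should_fail);
	return host_init_failed;
}

/* Only undo the registration if it actually happened. */
void host_exit(struct fake_driver *drv)
{
	if (!host_init_failed)
		fake_unregister(drv);
}
```

      Calling host_exit() after a failed host_init() leaves unregister_calls
      at zero, whereas an unconditional fake_unregister() would operate on a
      driver that was never set up.
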
      Reported-by: Hulk Robot <hulkci@huawei.com>
      Fixes: 399500da ("ssb: pick PCMCIA host code support from b43 driver")
      Signed-off-by: YueHaibing <yuehaibing@huawei.com>
      Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      3b924789
    • Alexander Potapenko's avatar
      media: vivid: use vfree() instead of kfree() for dev->bitmap_cap · b82e0f50
      Alexander Potapenko authored
      commit dad7e270 upstream.
      
      syzkaller reported crashes on kfree() called from
      vivid_vid_cap_s_selection(). This looks like a simple typo, as
      dev->bitmap_cap is allocated with vzalloc() throughout the file.
      
      Fixes: ef834f78 ("[media] vivid: add the video capture and output parts")
      Signed-off-by: Alexander Potapenko <glider@google.com>
      Reported-by: Syzbot <syzbot+6c0effb5877f6b0344e2@syzkaller.appspotmail.com>
      Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
      Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      b82e0f50
    • Hans Verkuil's avatar
      media: vb2: add waiting_in_dqbuf flag · da757ae6
      Hans Verkuil authored
      commit d65842f7 upstream.
      
      Calling VIDIOC_DQBUF can release the core serialization lock pointed to
      by vb2_queue->lock if it has to wait for a new buffer to arrive.
      
      However, if userspace dup()ped the video device filehandle, then it is
      possible to read or call DQBUF from two filehandles at the same time.
      
      It is also possible to call REQBUFS from one filehandle while the other
      is waiting for a buffer. This will remove all the buffers and reallocate
      new ones. Removing all the buffers isn't the problem here (that's already
      handled correctly by DQBUF), but the reallocating part is: DQBUF isn't
      aware that the buffers have changed.
      
      This is fixed by setting a flag whenever the lock is released while waiting
      for a buffer to arrive. And checking the flag where needed so we can return
      -EBUSY.
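
      The flag-based approach described above can be sketched as a
      single-threaded model. Everything here is hypothetical (struct
      toy_queue and the toy_* helpers are not the vb2 API); the sketch only
      illustrates the idea of marking the window in which the lock is
      dropped and refusing a conflicting operation with -EBUSY during it.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified queue; the real vb2_queue has many more fields. */
struct toy_queue {
	bool waiting_in_dqbuf;	/* set while the lock is dropped during a wait */
	unsigned int num_buffers;
};

/* Called just before the waiter drops the lock and goes to sleep. */
void toy_wait_begin(struct toy_queue *q)
{
	assert(q != NULL);
	q->waiting_in_dqbuf = true;
}

/* Called once the waiter holds the lock again. */
void toy_wait_end(struct toy_queue *q)
{
	assert(q != NULL);
	q->waiting_in_dqbuf = false;
}

/* Reallocating buffers is refused while another caller is waiting. */
int toy_reqbufs(struct toy_queue *q, unsigned int count)
{
	assert(q != NULL);
	if (q->waiting_in_dqbuf)
		return -EBUSY;
	q->num_buffers = count;
	return 0;
}
```

      In this model a toy_reqbufs() call issued between toy_wait_begin() and
      toy_wait_end() fails and leaves the buffer count untouched, so the
      waiter never sees buffers change underneath it.
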
      Signed-off-by: Hans Verkuil <hverkuil@xs4all.nl>
      Reported-by: Syzbot <syzbot+4180ff9ca6810b06c1e9@syzkaller.appspotmail.com>
      Reviewed-by: Tomasz Figa <tfiga@chromium.org>
      Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
      Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      da757ae6
    • YueHaibing's avatar
      media: serial_ir: Fix use-after-free in serial_ir_init_module · 0270d8b8
      YueHaibing authored
      commit 56cd26b6 upstream.
      
      Syzkaller reported this:
      
      BUG: KASAN: use-after-free in sysfs_remove_file_ns+0x5f/0x70 fs/sysfs/file.c:468
      Read of size 8 at addr ffff8881dc7ae030 by task syz-executor.0/6249
      
      CPU: 1 PID: 6249 Comm: syz-executor.0 Not tainted 5.0.0-rc8+ #3
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014
      Call Trace:
       __dump_stack lib/dump_stack.c:77 [inline]
       dump_stack+0xfa/0x1ce lib/dump_stack.c:113
       print_address_description+0x65/0x270 mm/kasan/report.c:187
       kasan_report+0x149/0x18d mm/kasan/report.c:317
       ? 0xffffffffc1728000
       sysfs_remove_file_ns+0x5f/0x70 fs/sysfs/file.c:468
       sysfs_remove_file include/linux/sysfs.h:519 [inline]
       driver_remove_file+0x40/0x50 drivers/base/driver.c:122
       remove_bind_files drivers/base/bus.c:585 [inline]
       bus_remove_driver+0x186/0x220 drivers/base/bus.c:725
       driver_unregister+0x6c/0xa0 drivers/base/driver.c:197
       serial_ir_init_module+0x169/0x1000 [serial_ir]
       do_one_initcall+0xfa/0x5ca init/main.c:887
       do_init_module+0x204/0x5f6 kernel/module.c:3460
       load_module+0x66b2/0x8570 kernel/module.c:3808
       __do_sys_finit_module+0x238/0x2a0 kernel/module.c:3902
       do_syscall_64+0x147/0x600 arch/x86/entry/common.c:290
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      RIP: 0033:0x462e99
      Code: f7 d8 64 89 02 b8 ff ff ff ff c3 66 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 bc ff ff ff f7 d8 64 89 01 48
      RSP: 002b:00007f9450132c58 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
      RAX: ffffffffffffffda RBX: 000000000073bf00 RCX: 0000000000462e99
      RDX: 0000000000000000 RSI: 0000000020000100 RDI: 0000000000000003
      RBP: 00007f9450132c70 R08: 0000000000000000 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000246 R12: 00007f94501336bc
      R13: 00000000004bcefa R14: 00000000006f6fb0 R15: 0000000000000004
      
      Allocated by task 6249:
       set_track mm/kasan/common.c:85 [inline]
       __kasan_kmalloc.constprop.3+0xa0/0xd0 mm/kasan/common.c:495
       kmalloc include/linux/slab.h:545 [inline]
       kzalloc include/linux/slab.h:740 [inline]
       bus_add_driver+0xc0/0x610 drivers/base/bus.c:651
       driver_register+0x1bb/0x3f0 drivers/base/driver.c:170
       serial_ir_init_module+0xe8/0x1000 [serial_ir]
       do_one_initcall+0xfa/0x5ca init/main.c:887
       do_init_module+0x204/0x5f6 kernel/module.c:3460
       load_module+0x66b2/0x8570 kernel/module.c:3808
       __do_sys_finit_module+0x238/0x2a0 kernel/module.c:3902
       do_syscall_64+0x147/0x600 arch/x86/entry/common.c:290
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      Freed by task 6249:
       set_track mm/kasan/common.c:85 [inline]
       __kasan_slab_free+0x130/0x180 mm/kasan/common.c:457
       slab_free_hook mm/slub.c:1430 [inline]
       slab_free_freelist_hook mm/slub.c:1457 [inline]
       slab_free mm/slub.c:3005 [inline]
       kfree+0xe1/0x270 mm/slub.c:3957
       kobject_cleanup lib/kobject.c:662 [inline]
       kobject_release lib/kobject.c:691 [inline]
       kref_put include/linux/kref.h:67 [inline]
       kobject_put+0x146/0x240 lib/kobject.c:708
       bus_remove_driver+0x10e/0x220 drivers/base/bus.c:732
       driver_unregister+0x6c/0xa0 drivers/base/driver.c:197
       serial_ir_init_module+0x14c/0x1000 [serial_ir]
       do_one_initcall+0xfa/0x5ca init/main.c:887
       do_init_module+0x204/0x5f6 kernel/module.c:3460
       load_module+0x66b2/0x8570 kernel/module.c:3808
       __do_sys_finit_module+0x238/0x2a0 kernel/module.c:3902
       do_syscall_64+0x147/0x600 arch/x86/entry/common.c:290
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      The buggy address belongs to the object at ffff8881dc7ae000
       which belongs to the cache kmalloc-256 of size 256
      The buggy address is located 48 bytes inside of
       256-byte region [ffff8881dc7ae000, ffff8881dc7ae100)
      The buggy address belongs to the page:
      page:ffffea000771eb80 count:1 mapcount:0 mapping:ffff8881f6c02e00 index:0x0
      flags: 0x2fffc0000000200(slab)
      raw: 02fffc0000000200 ffffea0007d14800 0000000400000002 ffff8881f6c02e00
      raw: 0000000000000000 00000000800c000c 00000001ffffffff 0000000000000000
      page dumped because: kasan: bad access detected
      
      Memory state around the buggy address:
       ffff8881dc7adf00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
       ffff8881dc7adf80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
      >ffff8881dc7ae000: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                                           ^
       ffff8881dc7ae080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
       ffff8881dc7ae100: fc fc fc fc fc fc fc fc 00 00 00 00 00 00 00 00
      
      The serial_ir_init error path already performs the cleanup, so there is
      no need for serial_ir_init_module to call serial_ir_exit and do it again;
      doing so triggers a use-after-free.
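
      A small userspace model of the rule stated above: if the inner init
      function unwinds its own work on failure, the outer init function
      should only propagate the error. The names (struct toy_module and the
      toy_* functions) are hypothetical and do not correspond to the
      serial_ir code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical module state; counts how often teardown actually runs. */
struct toy_module {
	bool driver_registered;
	int unregister_calls;
};

void toy_unregister(struct toy_module *m)
{
	assert(m != NULL);
	assert(m->driver_registered);	/* a second call would trip this */
	m->driver_registered = false;
	m->unregister_calls++;
}

/* Inner init: undoes its own work before returning an error. */
int toy_init(struct toy_module *m, bool fail_late)
{
	assert(m != NULL);
	m->driver_registered = true;
	if (fail_late) {
		toy_unregister(m);
		return -1;
	}
	return 0;
}

/* Outer init: on failure it propagates the error, with no second teardown. */
int toy_init_module(struct toy_module *m, bool fail_late)
{
	int ret = toy_init(m, fail_late);

	if (ret)
		return ret;	/* toy_init() already cleaned up */
	return 0;
}
```

      Adding a toy_unregister() call to the failure branch of
      toy_init_module() would hit the assertion, which is the toy equivalent
      of the double teardown in the report.
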
      
      Fixes: fa5dc29c ("[media] lirc_serial: move out of staging and rename to serial_ir")
      Reported-by: Hulk Robot <hulkci@huawei.com>
      Signed-off-by: YueHaibing <yuehaibing@huawei.com>
      Signed-off-by: Sean Young <sean@mess.org>
      Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      0270d8b8
    • YueHaibing's avatar
      media: cpia2: Fix use-after-free in cpia2_exit · 8c103d2b
      YueHaibing authored
      commit dea37a97 upstream.
      
      Syzkaller reported this:
      
      BUG: KASAN: use-after-free in sysfs_remove_file_ns+0x5f/0x70 fs/sysfs/file.c:468
      Read of size 8 at addr ffff8881f59a6b70 by task syz-executor.0/8363
      
      CPU: 0 PID: 8363 Comm: syz-executor.0 Not tainted 5.0.0-rc8+ #3
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014
      Call Trace:
       __dump_stack lib/dump_stack.c:77 [inline]
       dump_stack+0xfa/0x1ce lib/dump_stack.c:113
       print_address_description+0x65/0x270 mm/kasan/report.c:187
       kasan_report+0x149/0x18d mm/kasan/report.c:317
       sysfs_remove_file_ns+0x5f/0x70 fs/sysfs/file.c:468
       sysfs_remove_file include/linux/sysfs.h:519 [inline]
       driver_remove_file+0x40/0x50 drivers/base/driver.c:122
       usb_remove_newid_files drivers/usb/core/driver.c:212 [inline]
       usb_deregister+0x12a/0x3b0 drivers/usb/core/driver.c:1005
       cpia2_exit+0xa/0x16 [cpia2]
       __do_sys_delete_module kernel/module.c:1018 [inline]
       __se_sys_delete_module kernel/module.c:961 [inline]
       __x64_sys_delete_module+0x3dc/0x5e0 kernel/module.c:961
       do_syscall_64+0x147/0x600 arch/x86/entry/common.c:290
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      RIP: 0033:0x462e99
      Code: f7 d8 64 89 02 b8 ff ff ff ff c3 66 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 bc ff ff ff f7 d8 64 89 01 48
      RSP: 002b:00007f86f3754c58 EFLAGS: 00000246 ORIG_RAX: 00000000000000b0
      RAX: ffffffffffffffda RBX: 000000000073bf00 RCX: 0000000000462e99
      RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000020000300
      RBP: 0000000000000002 R08: 0000000000000000 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000246 R12: 00007f86f37556bc
      R13: 00000000004bcca9 R14: 00000000006f6b48 R15: 00000000ffffffff
      
      Allocated by task 8363:
       set_track mm/kasan/common.c:85 [inline]
       __kasan_kmalloc.constprop.3+0xa0/0xd0 mm/kasan/common.c:495
       kmalloc include/linux/slab.h:545 [inline]
       kzalloc include/linux/slab.h:740 [inline]
       bus_add_driver+0xc0/0x610 drivers/base/bus.c:651
       driver_register+0x1bb/0x3f0 drivers/base/driver.c:170
       usb_register_driver+0x267/0x520 drivers/usb/core/driver.c:965
       0xffffffffc1b4817c
       do_one_initcall+0xfa/0x5ca init/main.c:887
       do_init_module+0x204/0x5f6 kernel/module.c:3460
       load_module+0x66b2/0x8570 kernel/module.c:3808
       __do_sys_finit_module+0x238/0x2a0 kernel/module.c:3902
       do_syscall_64+0x147/0x600 arch/x86/entry/common.c:290
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      Freed by task 8363:
       set_track mm/kasan/common.c:85 [inline]
       __kasan_slab_free+0x130/0x180 mm/kasan/common.c:457
       slab_free_hook mm/slub.c:1430 [inline]
       slab_free_freelist_hook mm/slub.c:1457 [inline]
       slab_free mm/slub.c:3005 [inline]
       kfree+0xe1/0x270 mm/slub.c:3957
       kobject_cleanup lib/kobject.c:662 [inline]
       kobject_release lib/kobject.c:691 [inline]
       kref_put include/linux/kref.h:67 [inline]
       kobject_put+0x146/0x240 lib/kobject.c:708
       bus_remove_driver+0x10e/0x220 drivers/base/bus.c:732
       driver_unregister+0x6c/0xa0 drivers/base/driver.c:197
       usb_register_driver+0x341/0x520 drivers/usb/core/driver.c:980
       0xffffffffc1b4817c
       do_one_initcall+0xfa/0x5ca init/main.c:887
       do_init_module+0x204/0x5f6 kernel/module.c:3460
       load_module+0x66b2/0x8570 kernel/module.c:3808
       __do_sys_finit_module+0x238/0x2a0 kernel/module.c:3902
       do_syscall_64+0x147/0x600 arch/x86/entry/common.c:290
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      The buggy address belongs to the object at ffff8881f59a6b40
       which belongs to the cache kmalloc-256 of size 256
      The buggy address is located 48 bytes inside of
       256-byte region [ffff8881f59a6b40, ffff8881f59a6c40)
      The buggy address belongs to the page:
      page:ffffea0007d66980 count:1 mapcount:0 mapping:ffff8881f6c02e00 index:0x0
      flags: 0x2fffc0000000200(slab)
      raw: 02fffc0000000200 dead000000000100 dead000000000200 ffff8881f6c02e00
      raw: 0000000000000000 00000000800c000c 00000001ffffffff 0000000000000000
      page dumped because: kasan: bad access detected
      
      Memory state around the buggy address:
       ffff8881f59a6a00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
       ffff8881f59a6a80: 00 00 00 00 00 00 00 00 00 00 fc fc fc fc fc fc
      >ffff8881f59a6b00: fc fc fc fc fc fc fc fc fb fb fb fb fb fb fb fb
                                                                   ^
       ffff8881f59a6b80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
       ffff8881f59a6c00: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
      
      cpia2_init does not check the return value of cpia2_usb_init. If that
      fails in usb_register_driver, the cleanup has already been done via
      driver_unregister, so there is no need to call cpia2_usb_cleanup on
      module exit.
      Reported-by: Hulk Robot <hulkci@huawei.com>
      Signed-off-by: YueHaibing <yuehaibing@huawei.com>
      Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
      Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      8c103d2b
    • Jiufei Xue's avatar
      fbdev: fix WARNING in __alloc_pages_nodemask bug · e630b9ac
      Jiufei Xue authored
      commit 8c40292b upstream.
      
      Syzkaller hit 'WARNING in __alloc_pages_nodemask' bug.
      
      WARNING: CPU: 1 PID: 1473 at mm/page_alloc.c:4377
      __alloc_pages_nodemask+0x4da/0x2130
      Kernel panic - not syncing: panic_on_warn set ...
      
      Call Trace:
       alloc_pages_current+0xb1/0x1e0
       kmalloc_order+0x1f/0x60
       kmalloc_order_trace+0x1d/0x120
       fb_alloc_cmap_gfp+0x85/0x2b0
       fb_set_user_cmap+0xff/0x370
       do_fb_ioctl+0x949/0xa20
       fb_ioctl+0xdd/0x120
       do_vfs_ioctl+0x186/0x1070
       ksys_ioctl+0x89/0xa0
       __x64_sys_ioctl+0x74/0xb0
       do_syscall_64+0xc8/0x550
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      This is a warning about order >= MAX_ORDER, and the order comes from a
      userspace ioctl. Add the __GFP_NOWARN flag to silence this warning.
      Signed-off-by: Jiufei Xue <jiufei.xue@linux.alibaba.com>
      Signed-off-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      e630b9ac
    • Amir Goldstein's avatar
      ovl: relax WARN_ON() for overlapping layers use case · ebe6000e
      Amir Goldstein authored
      commit acf3062a upstream.
      
      This nasty little syzbot repro:
      https://syzkaller.appspot.com/x/repro.syz?x=12c7a94f400000
      
      Creates overlay mounts where the same directory is both in upper and lower
      layers. Simplified example:
      
        mkdir foo work
        mount -t overlay none foo -o"lowerdir=.,upperdir=foo,workdir=work"
      
      The repro runs several threads in parallel that attempt to chdir into foo
      and attempt to symlink/rename/exec/mkdir the file bar.
      
      The repro hits a WARN_ON() I placed in ovl_instantiate(), which suggests
      that an overlay inode already exists in cache and is hashed by the pointer
      of the real upper dentry that ovl_create_real() has just created. At the
      point of the WARN_ON(), both the overlay dir inode lock and the upper dir
      inode lock are held, so at first I did not see how this was possible.
      
      On a closer look, I see that after ovl_create_real(), because of the
      overlapping upper and lower layers, a lookup by another thread can find the
      file foo/bar that was just created in upper layer, at overlay path
      foo/foo/bar and hash an overlay inode with the new real dentry as lower
      dentry. This is possible because the overlay directory foo/foo is not
      locked and the upper dentry foo/bar is in dcache, so ovl_lookup() can find
      it without taking upper dir inode shared lock.
      
      Overlapping layers is considered a wrong setup which would result in
      unexpected behavior, but it shouldn't crash the kernel and it shouldn't
      trigger WARN_ON() either, so relax this WARN_ON() and leave a pr_warn()
      instead to cover all cases of failure to get an overlay inode.
      
      The error returned from failure to insert new inode to cache with
      inode_insert5() was changed to -EEXIST, to distinguish from the error
      -ENOMEM returned on failure to get/allocate inode with iget5_locked().
      
      Reported-by: syzbot+9c69c282adc4edd2b540@syzkaller.appspotmail.com
      Fixes: 01b39dcc ("ovl: use inode_insert5() to hash a newly...")
      Signed-off-by: Amir Goldstein <amir73il@gmail.com>
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      ebe6000e
    • Will Deacon's avatar
      arm64: errata: Add workaround for Cortex-A76 erratum #1463225 · 2f71a4e3
      Will Deacon authored
      commit 969f5ea6 upstream.
      
      Revisions of the Cortex-A76 CPU prior to r4p0 are affected by an erratum
      that can prevent interrupts from being taken when single-stepping.
      
      This patch implements a software workaround to prevent userspace from
      effectively being able to disable interrupts.
      
      Cc: <stable@vger.kernel.org>
      Cc: Marc Zyngier <marc.zyngier@arm.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Signed-off-by: Will Deacon <will.deacon@arm.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      
      2f71a4e3
    • Shile Zhang's avatar
      fbdev: fix divide error in fb_var_to_videomode · 4aeac859
      Shile Zhang authored
      commit cf84807f upstream.
      
      Fix the following divide-by-zero error found by Syzkaller:
      
        divide error: 0000 [#1] SMP PTI
        CPU: 7 PID: 8447 Comm: test Kdump: loaded Not tainted 4.19.24-8.al7.x86_64 #1
        Hardware name: Alibaba Cloud Alibaba Cloud ECS, BIOS rel-1.12.0-0-ga698c8995f-prebuilt.qemu.org 04/01/2014
        RIP: 0010:fb_var_to_videomode+0xae/0xc0
        Code: 04 44 03 46 78 03 4e 7c 44 03 46 68 03 4e 70 89 ce d1 ee 69 c0 e8 03 00 00 f6 c2 01 0f 45 ce 83 e2 02 8d 34 09 0f 45 ce 31 d2 <41> f7 f0 31 d2 f7 f1 89 47 08 f3 c3 66 0f 1f 44 00 00 0f 1f 44 00
        RSP: 0018:ffffb7e189347bf0 EFLAGS: 00010246
        RAX: 00000000e1692410 RBX: ffffb7e189347d60 RCX: 0000000000000000
        RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffb7e189347c10
        RBP: ffff99972a091c00 R08: 0000000000000000 R09: 0000000000000000
        R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000100
        R13: 0000000000010000 R14: 00007ffd66baf6d0 R15: 0000000000000000
        FS:  00007f2054d11740(0000) GS:ffff99972fbc0000(0000) knlGS:0000000000000000
        CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        CR2: 00007f205481fd20 CR3: 00000004288a0001 CR4: 00000000001606a0
        Call Trace:
         fb_set_var+0x257/0x390
         ? lookup_fast+0xbb/0x2b0
         ? fb_open+0xc0/0x140
         ? chrdev_open+0xa6/0x1a0
         do_fb_ioctl+0x445/0x5a0
         do_vfs_ioctl+0x92/0x5f0
         ? __alloc_fd+0x3d/0x160
         ksys_ioctl+0x60/0x90
         __x64_sys_ioctl+0x16/0x20
         do_syscall_64+0x5b/0x190
         entry_SYSCALL_64_after_hwframe+0x44/0xa9
        RIP: 0033:0x7f20548258d7
        Code: 44 00 00 48 8b 05 b9 15 2d 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 89 15 2d 00 f7 d8 64 89 01 48
      
      It can be triggered easily with the following test code:
      
        #include <linux/fb.h>
        #include <fcntl.h>
        #include <sys/ioctl.h>
        int main(void)
        {
                struct fb_var_screeninfo var = {.activate = 0x100, .pixclock = 60};
                int fd = open("/dev/fb0", O_RDWR);
                if (fd < 0)
                        return 1;
      
                if (ioctl(fd, FBIOPUT_VSCREENINFO, &var))
                        return 1;
      
                return 0;
        }
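
      The test program above sets only .activate and .pixclock, so every
      geometry and margin field is zero, and the register dump shows zero in
      R08 at the faulting div instruction. A minimal sketch of a guarded
      refresh-rate computation follows, assuming the divisors are the
      horizontal and vertical totals. struct toy_timing and toy_refresh()
      are hypothetical and are not the fbdev implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical subset of the timing fields; not the real fb_var_screeninfo. */
struct toy_timing {
	uint32_t pixclock;	/* pixel clock period in picoseconds */
	uint32_t xres, left_margin, right_margin, hsync_len;
	uint32_t yres, upper_margin, lower_margin, vsync_len;
};

/* Returns the refresh rate in Hz, or 0 when it cannot be computed. */
uint32_t toy_refresh(const struct toy_timing *t)
{
	uint64_t pixclock_hz;
	uint32_t htotal, vtotal;

	assert(t != NULL);
	if (t->pixclock == 0)
		return 0;

	/* picoseconds per pixel -> kHz -> Hz, in 64 bits to avoid wrapping */
	pixclock_hz = (1000000000ULL / t->pixclock) * 1000ULL;

	htotal = t->xres + t->right_margin + t->hsync_len + t->left_margin;
	vtotal = t->yres + t->lower_margin + t->vsync_len + t->upper_margin;

	/* Userspace controls these sums, so a zero divisor must be rejected. */
	if (htotal == 0 || vtotal == 0)
		return 0;

	return (uint32_t)(pixclock_hz / htotal / vtotal);
}
```

      With only pixclock set, toy_refresh() returns 0 instead of dividing by
      a zero total.
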
      Signed-off-by: Shile Zhang <shile.zhang@linux.alibaba.com>
      Cc: Fredrik Noring <noring@nocrew.org>
      Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
      Reviewed-by: Mukesh Ojha <mojha@codeaurora.org>
      Signed-off-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      4aeac859
    • Tobin C. Harding's avatar
      btrfs: sysfs: don't leak memory when failing add fsid · 53804824
      Tobin C. Harding authored
      commit e3277335 upstream.
      
      A failed call to kobject_init_and_add() must be followed by a call to
      kobject_put().  Currently in the error path when adding fs_devices we
      are missing this call.  This could be fixed by calling
      btrfs_sysfs_remove_fsid() if btrfs_sysfs_add_fsid() returns an error or
      by adding a call to kobject_put() directly in btrfs_sysfs_add_fsid().
      Here we choose the second option because it prevents the slightly
      unusual error path handling requirements of kobject from leaking out
      into btrfs functions.
      
      Add a call to kobject_put() in the error path of kobject_init_and_add().
      This causes the release method to be called if kobject_init_and_add()
      fails.  open_ctree() is the function that calls btrfs_sysfs_add_fsid(),
      and the error handling in that function is already written with the
      assumption that the release method is called during the error path of
      open_ctree() (as seen by the call to btrfs_sysfs_remove_fsid() under the
      fail_fsdev_sysfs label).
      
      Cc: stable@vger.kernel.org # v4.4+
      Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Tobin C. Harding <tobin@kernel.org>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      53804824
    • Tobin C. Harding's avatar
      btrfs: sysfs: Fix error path kobject memory leak · b2a48467
      Tobin C. Harding authored
      commit 450ff834 upstream.
      
      If a call to kobject_init_and_add() fails we must call kobject_put()
      otherwise we leak memory.
      
      Calling kobject_put() when kobject_init_and_add() fails drops the
      refcount back to 0 and calls the ktype release method (which in turn
      calls the percpu destroy and kfree).
      
      Add call to kobject_put() in the error path of call to
      kobject_init_and_add().
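
      The reference-counting rule above can be illustrated with a toy model.
      This is a sketch under stated assumptions, not the kobject
      implementation: struct toy_obj and the toy_* functions are
      hypothetical, and the only property modelled is that the initial
      reference is held even when the "add" step fails, so a final put is
      needed to run the release callback.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical refcounted object; a toy model, not struct kobject. */
struct toy_obj {
	int refcount;
	bool released;
	void (*release)(struct toy_obj *obj);
};

void toy_release(struct toy_obj *obj)
{
	obj->released = true;	/* real code would free resources here */
}

/* Takes the initial reference, then optionally fails the "add" step. */
int toy_init_and_add(struct toy_obj *obj, bool fail_add)
{
	assert(obj != NULL);
	obj->refcount = 1;
	obj->released = false;
	obj->release = toy_release;
	return fail_add ? -1 : 0;	/* the reference is held either way */
}

/* Drops one reference; the last put runs the release callback. */
void toy_put(struct toy_obj *obj)
{
	assert(obj != NULL && obj->refcount > 0);
	if (--obj->refcount == 0)
		obj->release(obj);
}

/* Error path as described: a failed init-and-add is followed by a put. */
int toy_create(struct toy_obj *obj, bool fail_add)
{
	int ret = toy_init_and_add(obj, fail_add);

	if (ret)
		toy_put(obj);
	return ret;
}
```

      Without the toy_put() in toy_create(), a failed call would return with
      refcount still at 1 and the release callback never run, which is the
      leak in miniature.
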
      
      Cc: stable@vger.kernel.org # v4.4+
      Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Tobin C. Harding <tobin@kernel.org>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      b2a48467
    • Filipe Manana's avatar
      Btrfs: fix race between ranged fsync and writeback of adjacent ranges · ffd658ad
      Filipe Manana authored
      commit 0c713cba upstream.
      
      When we do a full fsync (the bit BTRFS_INODE_NEEDS_FULL_SYNC is set in the
      inode) that happens to be ranged, which happens during a msync() or writes
      for files opened with O_SYNC for example, we can end up with a corrupt log,
      due to different file extent items representing ranges that overlap with
      each other, or hit some assertion failures.
      
      When doing a ranged fsync we only flush delalloc and wait for ordered
      extents within that range. If, while we are logging items from our inode,
      ordered extents for adjacent ranges complete, we end up in a race that can
      make us insert file extent items that overlap with others we logged
      previously, or hit the assertion failures.
      
      For example, if tree-log.c:copy_items() receives a leaf that has the
      following file extent items, all with a length of 4K, so that there
      is an implicit hole in the range 68K to 72K - 1:
      
        (257 EXTENT_ITEM 64K), (257 EXTENT_ITEM 72K), (257 EXTENT_ITEM 76K), ...
      
      It copies them to the log tree. However due to the need to detect implicit
      holes, it may release the path, in order to look at the previous leaf to
      detect an implicit hole, and then later it will search again in the tree
      for the first file extent item key, with the goal of locking again the
      leaf (which might have changed due to concurrent changes to other inodes).
      
      However when it locks again the leaf containing the first key, the key
      corresponding to the extent at offset 72K may not be there anymore since
      there is an ordered extent for that range that is finishing (that is,
      somewhere in the middle of btrfs_finish_ordered_io()), and it just
      removed the file extent item but has not yet replaced it with a new file
      extent item, so the part of copy_items() that does hole detection will
      decide that there is a hole in the range starting from 68K to 76K - 1,
      and therefore insert a file extent item to represent that hole, having
      a key offset of 68K. After that we now have a log tree with 2 different
      extent items that have overlapping ranges:
      
       1) The file extent item copied before copy_items() released the path,
          which has a key offset of 72K and a length of 4K, representing the
          file range 72K to 76K - 1.
      
       2) And a file extent item representing a hole that has a key offset of
          68K and a length of 8K, representing the range 68K to 76K - 1. This
          item was inserted after releasing the path, and overlaps with the
          extent item inserted before.
      
      The overlapping extent items can cause all sorts of unpredictable and
      incorrect behaviour, either when replayed or if a fast (non full) fsync
      happens later, which can trigger a BUG_ON() when calling
      btrfs_set_item_key_safe() through __btrfs_drop_extents(), producing a
      trace like the following:
      
        [61666.783269] ------------[ cut here ]------------
        [61666.783943] kernel BUG at fs/btrfs/ctree.c:3182!
        [61666.784644] invalid opcode: 0000 [#1] PREEMPT SMP
        (...)
        [61666.786253] task: ffff880117b88c40 task.stack: ffffc90008168000
        [61666.786253] RIP: 0010:btrfs_set_item_key_safe+0x7c/0xd2 [btrfs]
        [61666.786253] RSP: 0018:ffffc9000816b958 EFLAGS: 00010246
        [61666.786253] RAX: 0000000000000000 RBX: 000000000000000f RCX: 0000000000030000
        [61666.786253] RDX: 0000000000000000 RSI: ffffc9000816ba4f RDI: ffffc9000816b937
        [61666.786253] RBP: ffffc9000816b998 R08: ffff88011dae2428 R09: 0000000000001000
        [61666.786253] R10: 0000160000000000 R11: 6db6db6db6db6db7 R12: ffff88011dae2418
        [61666.786253] R13: ffffc9000816ba4f R14: ffff8801e10c4118 R15: ffff8801e715c000
        [61666.786253] FS:  00007f6060a18700(0000) GS:ffff88023f5c0000(0000) knlGS:0000000000000000
        [61666.786253] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        [61666.786253] CR2: 00007f6060a28000 CR3: 0000000213e69000 CR4: 00000000000006e0
        [61666.786253] Call Trace:
        [61666.786253]  __btrfs_drop_extents+0x5e3/0xaad [btrfs]
        [61666.786253]  ? time_hardirqs_on+0x9/0x14
        [61666.786253]  btrfs_log_changed_extents+0x294/0x4e0 [btrfs]
        [61666.786253]  ? release_extent_buffer+0x38/0xb4 [btrfs]
        [61666.786253]  btrfs_log_inode+0xb6e/0xcdc [btrfs]
        [61666.786253]  ? lock_acquire+0x131/0x1c5
        [61666.786253]  ? btrfs_log_inode_parent+0xee/0x659 [btrfs]
        [61666.786253]  ? arch_local_irq_save+0x9/0xc
        [61666.786253]  ? btrfs_log_inode_parent+0x1f5/0x659 [btrfs]
        [61666.786253]  btrfs_log_inode_parent+0x223/0x659 [btrfs]
        [61666.786253]  ? arch_local_irq_save+0x9/0xc
        [61666.786253]  ? lockref_get_not_zero+0x2c/0x34
        [61666.786253]  ? rcu_read_unlock+0x3e/0x5d
        [61666.786253]  btrfs_log_dentry_safe+0x60/0x7b [btrfs]
        [61666.786253]  btrfs_sync_file+0x317/0x42c [btrfs]
        [61666.786253]  vfs_fsync_range+0x8c/0x9e
        [61666.786253]  SyS_msync+0x13c/0x1c9
        [61666.786253]  entry_SYSCALL_64_fastpath+0x18/0xad
      
      A sample of a corrupt log tree leaf with overlapping extents I got from
      running btrfs/072:
      
            item 14 key (295 108 200704) itemoff 2599 itemsize 53
                    extent data disk bytenr 0 nr 0
                    extent data offset 0 nr 458752 ram 458752
            item 15 key (295 108 659456) itemoff 2546 itemsize 53
                    extent data disk bytenr 4343541760 nr 770048
                    extent data offset 606208 nr 163840 ram 770048
            item 16 key (295 108 663552) itemoff 2493 itemsize 53
                    extent data disk bytenr 4343541760 nr 770048
                    extent data offset 610304 nr 155648 ram 770048
            item 17 key (295 108 819200) itemoff 2440 itemsize 53
                    extent data disk bytenr 4334788608 nr 4096
                    extent data offset 0 nr 4096 ram 4096
      
      The file extent item at offset 659456 (item 15) ends at offset 823296
      (659456 + 163840) while the next file extent item (item 16) starts at
      offset 663552.
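
      The overlap can be checked mechanically from the dump above. This is
      only a sketch in Python, not kernel code: the tuples are transcribed by
      hand from the sample leaf, and find_overlaps() is a hypothetical helper
      name. It assumes each item's length is the "nr" value on its
      "extent data offset" line.

```python
# File extent items from the sample leaf: (item number, file offset, length).
# The length is the "nr" value on each item's "extent data offset" line.
ITEMS = [
    (14, 200704, 458752),
    (15, 659456, 163840),
    (16, 663552, 155648),
    (17, 819200, 4096),
]


def find_overlaps(items):
    """Return (item, end, next_item, next_start) for items that run past their successor."""
    overlaps = []
    for (num, start, length), (next_num, next_start, _) in zip(items, items[1:]):
        end = start + length  # exclusive end offset of this item
        if end > next_start:
            overlaps.append((num, end, next_num, next_start))
    return overlaps


print(find_overlaps(ITEMS))  # [(15, 823296, 16, 663552)]
```

      Only item 15 is reported: items 14 and 16 end exactly where their
      successors start.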
      
      Another problem that the race can trigger is a failure in the
      assertions at tree-log.c:copy_items(), which expect that the first file
      extent item key we found before releasing the path exists after we have
      released the path and that the last key we found before releasing the path
      also exists after releasing the path:
      
        $ cat -n fs/btrfs/tree-log.c
        4080          if (need_find_last_extent) {
        4081                  /* btrfs_prev_leaf could return 1 without releasing the path */
        4082                  btrfs_release_path(src_path);
        4083                  ret = btrfs_search_slot(NULL, inode->root, &first_key,
        4084                                  src_path, 0, 0);
        4085                  if (ret < 0)
        4086                          return ret;
        4087                  ASSERT(ret == 0);
        (...)
        4103                  if (i >= btrfs_header_nritems(src_path->nodes[0])) {
        4104                          ret = btrfs_next_leaf(inode->root, src_path);
        4105                          if (ret < 0)
        4106                                  return ret;
        4107                          ASSERT(ret == 0);
        4108                          src = src_path->nodes[0];
        4109                          i = 0;
        4110                          need_find_last_extent = true;
        4111                  }
        (...)
      
      The second assertion implicitly expects that the last key before the path
      release still exists, because the surrounding while loop only stops after
      we have found that key. When this assertion fails it produces a stack like
      this:
      
        [139590.037075] assertion failed: ret == 0, file: fs/btrfs/tree-log.c, line: 4107
        [139590.037406] ------------[ cut here ]------------
        [139590.037707] kernel BUG at fs/btrfs/ctree.h:3546!
        [139590.038034] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC PTI
        [139590.038340] CPU: 1 PID: 31841 Comm: fsstress Tainted: G        W         5.0.0-btrfs-next-46 #1
        (...)
        [139590.039354] RIP: 0010:assfail.constprop.24+0x18/0x1a [btrfs]
        (...)
        [139590.040397] RSP: 0018:ffffa27f48f2b9b0 EFLAGS: 00010282
        [139590.040730] RAX: 0000000000000041 RBX: ffff897c635d92c8 RCX: 0000000000000000
        [139590.041105] RDX: 0000000000000000 RSI: ffff897d36a96868 RDI: ffff897d36a96868
        [139590.041470] RBP: ffff897d1b9a0708 R08: 0000000000000000 R09: 0000000000000000
        [139590.041815] R10: 0000000000000008 R11: 0000000000000000 R12: 0000000000000013
        [139590.042159] R13: 0000000000000227 R14: ffff897cffcbba88 R15: 0000000000000001
        [139590.042501] FS:  00007f2efc8dee80(0000) GS:ffff897d36a80000(0000) knlGS:0000000000000000
        [139590.042847] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        [139590.043199] CR2: 00007f8c064935e0 CR3: 0000000232252002 CR4: 00000000003606e0
        [139590.043547] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
        [139590.043899] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
        [139590.044250] Call Trace:
        [139590.044631]  copy_items+0xa3f/0x1000 [btrfs]
        [139590.045009]  ? generic_bin_search.constprop.32+0x61/0x200 [btrfs]
        [139590.045396]  btrfs_log_inode+0x7b3/0xd70 [btrfs]
        [139590.045773]  btrfs_log_inode_parent+0x2b3/0xce0 [btrfs]
        [139590.046143]  ? do_raw_spin_unlock+0x49/0xc0
        [139590.046510]  btrfs_log_dentry_safe+0x4a/0x70 [btrfs]
        [139590.046872]  btrfs_sync_file+0x3b6/0x440 [btrfs]
        [139590.047243]  btrfs_file_write_iter+0x45b/0x5c0 [btrfs]
        [139590.047592]  __vfs_write+0x129/0x1c0
        [139590.047932]  vfs_write+0xc2/0x1b0
        [139590.048270]  ksys_write+0x55/0xc0
        [139590.048608]  do_syscall_64+0x60/0x1b0
        [139590.048946]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
        [139590.049287] RIP: 0033:0x7f2efc4be190
        (...)
        [139590.050342] RSP: 002b:00007ffe743243a8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
        [139590.050701] RAX: ffffffffffffffda RBX: 0000000000008d58 RCX: 00007f2efc4be190
        [139590.051067] RDX: 0000000000008d58 RSI: 00005567eca0f370 RDI: 0000000000000003
        [139590.051459] RBP: 0000000000000024 R08: 0000000000000003 R09: 0000000000008d60
        [139590.051863] R10: 0000000000000078 R11: 0000000000000246 R12: 0000000000000003
        [139590.052252] R13: 00000000003d3507 R14: 00005567eca0f370 R15: 0000000000000000
        (...)
        [139590.055128] ---[ end trace 193f35d0215cdeeb ]---
      
      So fix this race between a full ranged fsync and writeback of adjacent
      ranges by flushing all delalloc and waiting for all ordered extents to
      complete before logging the inode. This is the simplest way to solve the
      problem because currently the full fsync path does not deal with ranges
      at all (it assumes a full range from 0 to LLONG_MAX) and it always needs
      to look at adjacent ranges for hole detection. For use cases of ranged
      fsyncs this can make a few fsyncs slower, but on the other hand it can
      let some following fsyncs to other ranges do less work, or nothing at
      all. A full fsync is rare anyway and happens only once after
      loading/creating an inode and once after less common operations such as a
      shrinking truncate.
      
      This issue has existed for a long time, and was often triggered by
      generic/127, because it does mmap'ed writes and msync (which triggers a
      ranged fsync). Adding support for the tree checker to detect overlapping
      extents (next patch in the series) and trigger a WARN() when such cases
      are found, and then calling btrfs_check_leaf_full() at the end of
      btrfs_insert_file_extent() made the issue much easier to detect. Running
      btrfs/072 with that change to the tree checker and making fsstress open
      files always with O_SYNC made it much easier to trigger the issue (as
      triggering it with generic/127 is very rare).
      
      CC: stable@vger.kernel.org # 3.16+
      Reviewed-by: Josef Bacik <josef@toxicpanda.com>
      Signed-off-by: Filipe Manana <fdmanana@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      ffd658ad
    • Filipe Manana's avatar
      Btrfs: avoid fallback to transaction commit during fsync of files with holes · fb4bdda0
      Filipe Manana authored
      commit ebb92906 upstream.
      
      When we are doing a full fsync (bit BTRFS_INODE_NEEDS_FULL_SYNC set) of a
      file that has holes and has file extent items spanning two or more leafs,
      we can end up falling back to a full transaction commit due to a logic
      bug that leads to failure to insert a duplicate file extent item that is
      meant to represent a hole between the last file extent item of a leaf and
      the first file extent item in the next leaf. The failure (EEXIST error)
      leads to a transaction commit (as most errors when logging an inode do).
      
      For example, we have the two following leafs:
      
      Leaf N:
      
        -----------------------------------------------
        | ..., ..., ..., (257, FILE_EXTENT_ITEM, 64K) |
        -----------------------------------------------
        The file extent item at the end of leaf N has a length of 4Kb,
        representing the file range from 64K to 68K - 1.
      
      Leaf N + 1:
      
        -----------------------------------------------
        | (257, FILE_EXTENT_ITEM, 72K), ..., ..., ... |
        -----------------------------------------------
        The file extent item at the first slot of leaf N + 1 has a length of
        4Kb too, representing the file range from 72K to 76K - 1.
      
      During the full fsync path, when we are at tree-log.c:copy_items() with
      leaf N as a parameter, after processing the last file extent item, that
      represents the extent at offset 64K, we take a look at the first file
      extent item at the next leaf (leaf N + 1), and notice there's a 4K hole
      between the two extents, and therefore we insert a file extent item
      representing that hole, starting at file offset 68K and ending at offset
      72K - 1. However we don't update the value of *last_extent, which is used
      to represent the end offset (plus 1, non-inclusive end) of the last file
      extent item inserted in the log, so it stays with a value of 68K and not
      with a value of 72K.
      
      Then, when copy_items() is called for leaf N + 1, because the value of
      *last_extent is smaller than the offset of the first extent item in the
      leaf (68K < 72K), we look at the last file extent item in the previous
      leaf (leaf N) and see there's a 4K gap between it and our first file
      extent item (again, 68K < 72K), so we decide to insert a file extent item
      representing the hole, starting at file offset 68K and ending at offset
      72K - 1. This insertion will fail with -EEXIST being returned from
      btrfs_insert_file_extent() because we already inserted a file extent item
      representing a hole for this offset (68K) in the previous call to
      copy_items(), when processing leaf N.
      
      The -EEXIST error gets propagated to the fsync callback, btrfs_sync_file(),
      which falls back to a full transaction commit.
      
      Fix this by adjusting *last_extent after inserting a hole when we had to
      look at the next leaf.
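
      The sequence above can be replayed with a toy model. This is a sketch
      in Python under simplifying assumptions, not the kernel logic: the Log
      class and log_leaves() are hypothetical names, the log is reduced to a
      set of start offsets, and FileExistsError stands in for -EEXIST.

```python
K = 1024


class Log:
    """Toy stand-in for the log tree: remembers the start offset of each item."""

    def __init__(self):
        self.offsets = set()

    def insert(self, offset):
        if offset in self.offsets:
            raise FileExistsError(offset)  # models -EEXIST
        self.offsets.add(offset)


def log_leaves(update_last_extent):
    log = Log()
    # Leaf N: its last extent covers [64K, 68K); leaf N + 1 starts at 72K.
    log.insert(64 * K)
    last_extent = 68 * K
    next_leaf_first = 72 * K
    if last_extent < next_leaf_first:
        log.insert(last_extent)            # hole item for [68K, 72K)
        if update_last_extent:
            last_extent = next_leaf_first  # the adjustment described in the fix
    # Leaf N + 1: a stale last_extent makes the same hole look missing again.
    if last_extent < next_leaf_first:
        log.insert(last_extent)
    log.insert(next_leaf_first)
    return sorted(log.offsets)
```

      With update_last_extent=False the second hole insertion raises
      FileExistsError, and with update_last_extent=True the three items at
      64K, 68K and 72K are logged once each.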
      
      Fixes: 4ee3fad3 ("Btrfs: fix fsync after hole punching when using no-holes feature")
      Cc: stable@vger.kernel.org # 4.14+
      Reviewed-by: Josef Bacik <josef@toxicpanda.com>
      Signed-off-by: Filipe Manana <fdmanana@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      fb4bdda0
    • Filipe Manana's avatar
      Btrfs: do not abort transaction at btrfs_update_root() after failure to COW path · be69efb3
      Filipe Manana authored
      commit 72bd2323 upstream.
      
      Currently when we fail to COW a path at btrfs_update_root() we end up
      always aborting the transaction. However all the current callers of
      btrfs_update_root() are able to deal with errors returned from it: many
      end up aborting the transaction themselves (directly or not, such as the
      transaction commit path), others BUG_ON() or just gracefully cancel
      whatever they were doing.
      
      When syncing the fsync log, we call btrfs_update_root() through
      tree-log.c:update_log_root(), and if it returns an -ENOSPC error, the log
      sync code does not abort the transaction, instead it gracefully handles
      the error and returns -EAGAIN to the fsync handler, so that it falls back
      to a transaction commit. Any error other than -ENOSPC makes the log sync
      code abort the transaction.
      
      So remove the transaction abort from btrfs_update_root() when we fail to
      COW a path to update the root item, so that if an -ENOSPC failure happens
      we avoid aborting the current transaction and have a chance of the fsync
      succeeding after falling back to a transaction commit.
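
      The error handling in the log sync path described above amounts to a
      three-way decision. A minimal sketch in Python, assuming only what this
      message states: log_sync_outcome() and its string results are
      hypothetical, purely for illustration, and do not mirror the kernel's
      control flow.

```python
import errno


def log_sync_outcome(update_root_error):
    """Map the result of updating the log root to what the log sync code does."""
    if update_root_error == 0:
        return "ok"
    if update_root_error == -errno.ENOSPC:
        # Handled gracefully: fsync falls back to a transaction commit.
        return "return -EAGAIN"
    # Any error other than -ENOSPC.
    return "abort transaction"
```

      The point of the change is that the -ENOSPC branch can only be reached
      if btrfs_update_root() itself has not already aborted the transaction.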
      
      Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=203413
      Fixes: 79787eaa ("btrfs: replace many BUG_ONs with proper error handling")
      Cc: stable@vger.kernel.org # 4.4+
      Signed-off-by: Filipe Manana <fdmanana@suse.com>
      Reviewed-by: Anand Jain <anand.jain@oracle.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      be69efb3
    • Johnny Chang's avatar
      btrfs: Check the compression level before getting a workspace · 69bb5079
      Johnny Chang authored
      commit 2b90883c upstream.
      
      When a file's compression property is set to zlib or zstd but the
      compression mount option is not set, btrfs will try to compress the
      file with the default compression level. But btrfs_compress_pages()
      calls get_workspace() with level = 0, which returns a workspace with
      the wrong compression level. For zlib, the compression level in the
      workspace will be 0 (that means "store only"). And for zstd, the
      compression level in the workspace will be 1, not the default level 3.
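
      The effect of zlib level 0 can be seen outside the kernel as well. The
      sketch below uses Python's userspace zlib module, not the btrfs code
      path, so it only illustrates what "store only" means for highly
      compressible input; the 1 MiB buffer size is an arbitrary choice.

```python
import zlib

data = bytes(1024 * 1024)  # 1 MiB of zeros, compression friendly

stored = zlib.compress(data, 0)     # level 0: "store only"
deflated = zlib.compress(data, -1)  # -1 asks zlib for its default level

print(len(data), len(stored), len(deflated))
```

      At level 0 the output is slightly larger than the input, which matches
      the 1.00 ratio reported in the "before" case of this commit.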
      
      How to reproduce:
        mkfs -t btrfs /dev/sdb
        mount /dev/sdb /mnt/
        mkdir /mnt/zlib
        btrfs property set /mnt/zlib/ compression zlib
        dd if=/dev/zero of=/mnt/zlib/compression-friendly-file-10M bs=1M count=10
        sync
        btrfs-debugfs -f /mnt/zlib/compression-friendly-file-10M
      
      btrfs-debugfs output:
      * before:
        ...
        (258 9961472): ram 524288 disk 1106247680 disk_size 524288
        file: ... extents 20 disk size 10485760 logical size 10485760 ratio 1.00
      
      * after:
       ...
       (258 10354688): ram 131072 disk 14217216 disk_size 4096
       file: ... extents 80 disk size 327680 logical size 10485760 ratio 32.00
      
      The steps for zstd are similar, but you need to add a debugging message
      to zstd_get_workspace() that shows the level of the returned workspace.
      
      This commit checks the compression level with set_level() before
      getting a workspace.
      
      CC: stable@vger.kernel.org # 5.1+
      Signed-off-by: Johnny Chang <johnnyc@synology.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      69bb5079
    • Josef Bacik's avatar
      btrfs: don't double unlock on error in btrfs_punch_hole · 38bf3e22
      Josef Bacik authored
      commit 8fca9550 upstream.
      
      If we have an error writing out a delalloc range in
      btrfs_punch_hole_lock_range we'll unlock the inode and then goto
      out_only_mutex, where we will again unlock the inode.  This is bad,
      don't do this.
      
      Fixes: f27451f2 ("Btrfs: add support for fallocate's zero range operation")
      CC: stable@vger.kernel.org # 4.19+
      Reviewed-by: Filipe Manana <fdmanana@suse.com>
      Signed-off-by: Josef Bacik <josef@toxicpanda.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      38bf3e22
    • Andreas Gruenbacher's avatar
      gfs2: Fix sign extension bug in gfs2_update_stats · 873aac4c
      Andreas Gruenbacher authored
      commit 5a5ec83d upstream.
      
      Commit 4d207133 changed the types of the statistic values in struct
      gfs2_lkstats from s64 to u64.  Because of that, what should be a signed
      value in gfs2_update_stats turned into an unsigned value.  When shifted
      right, we end up with a large positive value instead of a small negative
      value, which results in an incorrect variance estimate.
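
      The difference between the two shifts can be modelled with 64-bit
      masks. This is a sketch in Python, not the gfs2 code: the helper names
      and the shift count of 3 are chosen for illustration only, and the
      signed variant models the arithmetic shift that compilers normally
      emit for an s64 operand.

```python
MASK64 = (1 << 64) - 1


def shift_right_u64(value, bits):
    """Logical shift: the result when the operand is treated as u64."""
    return (value & MASK64) >> bits


def shift_right_s64(value, bits):
    """Arithmetic shift: the result when the operand is treated as s64."""
    value &= MASK64
    if value >> 63:       # reinterpret the bit pattern as a signed 64-bit value
        value -= 1 << 64
    return value >> bits


delta = -8
print(shift_right_s64(delta, 3))  # -1
print(shift_right_u64(delta, 3))  # 2305843009213693951
```

      The unsigned result is 2^61 - 1, the "large positive value" that skews
      the variance estimate.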
      
      Fixes: 4d207133 ("gfs2: Make statistics unsigned, suitable for use with do_div()")
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
      Cc: stable@vger.kernel.org # v4.4+
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      873aac4c
    • Christoph Hellwig's avatar
      arm64/iommu: handle non-remapped addresses in ->mmap and ->get_sgtable · b232deed
      Christoph Hellwig authored
      commit a98d9ae9 upstream.
      
      DMA allocations that can't sleep may return non-remapped addresses, but
      we do not properly handle them in the mmap and get_sgtable methods.
      Resolve non-vmalloc addresses using virt_to_page to handle this corner
      case.
      
      Cc: <stable@vger.kernel.org>
      Acked-by: Catalin Marinas <catalin.marinas@arm.com>
      Reviewed-by: Robin Murphy <robin.murphy@arm.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Will Deacon <will.deacon@arm.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      b232deed
    • Will Deacon's avatar
      arm64: Kconfig: Make ARM64_PSEUDO_NMI depend on BROKEN for now · 4101ec81
      Will Deacon authored
      commit 96a13f57 upstream.
      
      Although we merged support for pseudo-nmi using interrupt priority
      masking in 5.1, we've since uncovered a number of non-trivial issues
      with the implementation. Although there are patches pending to address
      these problems, we're facing issues that prevent us from merging them at
      this current time:
      
        https://lkml.kernel.org/r/1556553607-46531-1-git-send-email-julien.thierry@arm.com
      
      For now, simply mark this optional feature as BROKEN in the hope that we
      can fix things properly in the near future.
      
      Cc: <stable@vger.kernel.org> # 5.1
      Cc: Julien Thierry <julien.thierry@arm.com>
      Acked-by: Marc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: Will Deacon <will.deacon@arm.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      4101ec81
    • Ard Biesheuvel's avatar
      arm64/kernel: kaslr: reduce module randomization range to 2 GB · 2c210489
      Ard Biesheuvel authored
      commit b2eed9b5 upstream.
      
      The following commit
      
        7290d580 ("module: use relative references for __ksymtab entries")
      
      updated the ksymtab handling of some KASLR capable architectures
      so that ksymtab entries are emitted as pairs of 32-bit relative
      references. This reduces the size of the entries, but more
      importantly, it gets rid of statically assigned absolute
      addresses, which require fixing up at boot time if the kernel
      is self relocating (which takes a 24 byte RELA entry for each
      member of the ksymtab struct).
      
      Since ksymtab entries are always part of the same module as the
      symbol they export, it was assumed at the time that a 32-bit
      relative reference is always sufficient to capture the offset
      between a ksymtab entry and its target symbol.
      
      Unfortunately, this is not always true: in the case of per-CPU
      variables, a per-CPU variable's base address (which usually differs
      from the actual address of any of its per-CPU copies) is allocated
      in the vicinity of the .data..percpu section in the core kernel
      (i.e., in the per-CPU reserved region which follows the section
      containing the core kernel's statically allocated per-CPU variables).
      
      Since we randomize the module space over a 4 GB window covering
      the core kernel (based on the -/+ 4 GB range of an ADRP/ADD pair),
      we may end up putting the core kernel out of the -/+ 2 GB range of
      32-bit relative references of module ksymtab entries that refer to
      per-CPU variables.
      
      So reduce the module randomization range a bit further. We lose
      1 bit of randomization this way, but this is something we can
      tolerate.
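
      The range argument is plain signed 32-bit arithmetic. A small sketch in
      Python, assuming nothing beyond the -/+ 2 GB limit stated above:
      fits_rel32() is a hypothetical helper and the addresses are made-up
      values, not real kernel addresses.

```python
GIB = 1 << 30


def fits_rel32(entry_addr, target_addr):
    """True if target - entry is representable as a signed 32-bit offset."""
    offset = target_addr - entry_addr
    return -(1 << 31) <= offset < (1 << 31)


core_kernel = 16 * GIB  # made-up address of a per-CPU base in the core kernel

# A ksymtab entry in a module placed 3 GiB away cannot reach it.
print(fits_rel32(core_kernel + 3 * GIB, core_kernel))  # False
# A module placed 1 GiB away is within reach.
print(fits_rel32(core_kernel + 1 * GIB, core_kernel))  # True
```

      Distances of up to 4 GB are possible with the old window, which is why
      the window has to shrink to keep every such reference in range.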
      
      Cc: <stable@vger.kernel.org> # v4.19+
      Signed-off-by: Ard Biesheuvel <ard.biesheuvel@arm.com>
      Signed-off-by: Will Deacon <will.deacon@arm.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      2c210489
    • Dan Williams's avatar
      libnvdimm/pmem: Bypass CONFIG_HARDENED_USERCOPY overhead · e9e27bfc
      Dan Williams authored
      commit 52f476a3 upstream.
      
      Jeff discovered that performance improves from ~375K iops to ~519K iops
      on a simple psync-write fio workload when moving the location of 'struct
      page' from the default PMEM location to DRAM. This result is surprising
      because the expectation is that 'struct page' for dax is only needed for
      third party references to dax mappings. For example, a dax-mapped buffer
      passed to another system call for direct-I/O requires 'struct page' for
      sending the request down the driver stack and pinning the page. There is
      no usage of 'struct page' for first party access to a file via
      read(2)/write(2) and friends.
      
      However, this "no page needed" expectation is violated by
      CONFIG_HARDENED_USERCOPY and the check_copy_size() performed in
      copy_from_iter_full_nocache() and copy_to_iter_mcsafe(). The
      check_heap_object() helper routine assumes the buffer is backed by a
      slab allocator (DRAM) page and applies some checks.  Those checks are
      invalid, because dax pages do not originate from the slab, and redundant,
      because dax_iomap_actor() has already validated that the I/O is within
      bounds. Specifically, that routine validates that the logical file offset
      is within bounds of the file, then it does a sector-to-pfn translation
      which validates that the physical mapping is within bounds of the block
      device.
      
      Bypass additional hardened usercopy overhead and call the 'no check'
      versions of the copy_{to,from}_iter operations directly.
      
      Fixes: 0aed55af ("x86, uaccess: introduce copy_from_iter_flushcache...")
      Cc: <stable@vger.kernel.org>
      Cc: Jeff Moyer <jmoyer@redhat.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Matthew Wilcox <willy@infradead.org>
      Reported-and-tested-by: Jeff Smits <jeff.smits@intel.com>
      Acked-by: Kees Cook <keescook@chromium.org>
      Acked-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      e9e27bfc
    • Wanpeng Li's avatar
      KVM: nVMX: Fix using __this_cpu_read() in preemptible context · e3feb4af
      Wanpeng Li authored
      commit 541e886f upstream.
      
       BUG: using __this_cpu_read() in preemptible [00000000] code: qemu-system-x86/4590
        caller is nested_vmx_enter_non_root_mode+0xebd/0x1790 [kvm_intel]
        CPU: 4 PID: 4590 Comm: qemu-system-x86 Tainted: G           OE     5.1.0-rc4+ #1
        Call Trace:
         dump_stack+0x67/0x95
         __this_cpu_preempt_check+0xd2/0xe0
         nested_vmx_enter_non_root_mode+0xebd/0x1790 [kvm_intel]
         nested_vmx_run+0xda/0x2b0 [kvm_intel]
         handle_vmlaunch+0x13/0x20 [kvm_intel]
         vmx_handle_exit+0xbd/0x660 [kvm_intel]
         kvm_arch_vcpu_ioctl_run+0xa2c/0x1e50 [kvm]
         kvm_vcpu_ioctl+0x3ad/0x6d0 [kvm]
         do_vfs_ioctl+0xa5/0x6e0
         ksys_ioctl+0x6d/0x80
         __x64_sys_ioctl+0x1a/0x20
         do_syscall_64+0x6f/0x6c0
         entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      Per-cpu variables should be accessed with preemption disabled, so this
      patch extends the preemption-disabled region to cover __this_cpu_read().
      
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
      Fixes: 52017608 ("KVM: nVMX: add option to perform early consistency checks via H/W")
      Cc: stable@vger.kernel.org
      Reviewed-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      e3feb4af
    • Suthikulpanit, Suravee's avatar
      kvm: svm/avic: fix off-by-one in checking host APIC ID · 4a4c222e
      Suthikulpanit, Suravee authored
      commit c9bcd3e3 upstream.
      
      The current logic does not allow a VCPU to be loaded onto a CPU with
      APIC ID 255. This should be allowed since the host physical APIC ID
      field in the AVIC Physical APIC table entry is an 8-bit value,
      and APIC ID 255 is valid in systems with x2APIC enabled.
      Instead, do not allow VCPU load if the host APIC ID cannot be
      represented by an 8-bit value.
      
      Also, use the more appropriate AVIC_PHYSICAL_ID_ENTRY_HOST_PHYSICAL_ID_MASK
      instead of AVIC_MAX_PHYSICAL_ID_COUNT.
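
      The off-by-one can be illustrated with the two comparisons side by
      side. This is a sketch in Python, not the svm.c code: the constant
      values (255 and 0xFF) and the comparison operators are assumptions
      inferred from the description above, and the function names are
      hypothetical.

```python
OLD_LIMIT = 255  # assumed value of AVIC_MAX_PHYSICAL_ID_COUNT
ID_MASK = 0xFF   # assumed value of AVIC_PHYSICAL_ID_ENTRY_HOST_PHYSICAL_ID_MASK


def rejected_before(apic_id):
    """Old check (assumed form): rejects the limit value itself."""
    return apic_id >= OLD_LIMIT


def rejected_after(apic_id):
    """New check (assumed form): rejects only IDs that do not fit in 8 bits."""
    return apic_id > ID_MASK


for apic_id in (254, 255, 256):
    print(apic_id, rejected_before(apic_id), rejected_after(apic_id))
```

      Under these assumptions only APIC ID 255 changes outcome: it was
      rejected before and is accepted afterwards.
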
      Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      4a4c222e
    • Peter Xu's avatar
      kvm: Check irqchip mode before assign irqfd · baaee956
      Peter Xu authored
      commit 654f1f13 upstream.
      
      When assigning a kvm irqfd we didn't check the irqchip mode, so we
      allowed KVM_IRQFD to succeed with all the irqchip modes.  However it
      does not make much sense to create an irqfd without the kernel chips.
      Let's provide an arch-dependent helper to check whether a specific irqfd
      is allowed by the arch.  At least for x86, it should make sense to check:
      
      - when irqchip mode is NONE, all irqfds should be disallowed, and,
      
      - when irqchip mode is SPLIT, irqfds that are with resamplefd should
        be disallowed.
      
      In either case, previously we'd silently ignore the irq or the irq ack
      event if the irqchip mode was incorrect.  However that can cause
      mysterious guest behaviors and it can be hard to triage.  Let's fail
      KVM_IRQFD even earlier to detect these incorrect configurations.
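
      The two x86 rules listed above can be written as a small predicate.
      This is a sketch in Python, not the KVM helper: IrqchipMode,
      irqfd_allowed() and the KERNEL mode name are hypothetical, and the
      model assumes any case not listed above is allowed.

```python
from enum import Enum


class IrqchipMode(Enum):
    NONE = 0
    SPLIT = 1
    KERNEL = 2  # assumed name for the full in-kernel irqchip mode


def irqfd_allowed(mode, has_resamplefd):
    """Apply the two rules from the commit message."""
    if mode is IrqchipMode.NONE:
        return False  # no irqfds at all without an in-kernel irqchip
    if mode is IrqchipMode.SPLIT and has_resamplefd:
        return False  # resamplefd irqfds are disallowed in split mode
    return True
```

      For example, irqfd_allowed(IrqchipMode.SPLIT, False) is True while
      irqfd_allowed(IrqchipMode.SPLIT, True) is False.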
      
      CC: Paolo Bonzini <pbonzini@redhat.com>
      CC: Radim Krčmář <rkrcmar@redhat.com>
      CC: Alex Williamson <alex.williamson@redhat.com>
      CC: Eduardo Habkost <ehabkost@redhat.com>
      Signed-off-by: Peter Xu <peterx@redhat.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      baaee956
    • Dan Williams's avatar
      dax: Arrange for dax_supported check to span multiple devices · e00303be
      Dan Williams authored
      commit 7bf7eac8 upstream.
      
      Pankaj reports that starting with commit ad428cdb "dax: Check the
      end of the block-device capacity with dax_direct_access()" device-mapper
      no longer allows dax operation. This results from the stricter checks in
      __bdev_dax_supported() that validate that the start and end of a
      block-device map to the same 'pagemap' instance.
      
      Teach the dax-core and device-mapper to validate the 'pagemap' on a
      per-target basis. This is accomplished by refactoring the
      bdev_dax_supported() internals into generic_fsdax_supported() which
      takes a sector range to validate. Consequently generic_fsdax_supported()
      is suitable to be used in a device-mapper ->iterate_devices() callback.
      A new ->dax_supported() operation is added to allow composite devices to
      split and route upper-level bdev_dax_supported() requests.
      
      Fixes: ad428cdb ("dax: Check the end of the block-device...")
      Cc: <stable@vger.kernel.org>
      Cc: Ira Weiny <ira.weiny@intel.com>
      Cc: Dave Jiang <dave.jiang@intel.com>
      Cc: Keith Busch <keith.busch@intel.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Vishal Verma <vishal.l.verma@intel.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Reported-by: Pankaj Gupta <pagupta@redhat.com>
      Reviewed-by: Pankaj Gupta <pagupta@redhat.com>
      Tested-by: Pankaj Gupta <pagupta@redhat.com>
      Tested-by: Vaibhav Jain <vaibhav@linux.ibm.com>
      Reviewed-by: Mike Snitzer <snitzer@redhat.com>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      e00303be