1. 15 Jul, 2020 3 commits
    • md-cluster: fix wild pointer of unlock_all_bitmaps() · 60f80d6f
      Zhao Heming authored
      reproduction steps:
      ```
      node1 # mdadm -C /dev/md0 -b clustered -e 1.2 -n 2 -l mirror /dev/sda /dev/sdb
      node2 # mdadm -A /dev/md0 /dev/sda /dev/sdb
      node1 # mdadm -G /dev/md0 -b none
      mdadm: failed to remove clustered bitmap.
      node1 # mdadm -S --scan
      ^C  <==== mdadm hung & kernel crash
      ```
      
      kernel stack:
      ```
      [  335.230657] general protection fault: 0000 [#1] SMP NOPTI
      [...]
      [  335.230848] Call Trace:
      [  335.230873]  ? unlock_all_bitmaps+0x5/0x70 [md_cluster]
      [  335.230886]  unlock_all_bitmaps+0x3d/0x70 [md_cluster]
      [  335.230899]  leave+0x10f/0x190 [md_cluster]
      [  335.230932]  ? md_super_wait+0x93/0xa0 [md_mod]
      [  335.230947]  ? leave+0x5/0x190 [md_cluster]
      [  335.230973]  md_cluster_stop+0x1a/0x30 [md_mod]
      [  335.230999]  md_bitmap_free+0x142/0x150 [md_mod]
      [  335.231013]  ? _cond_resched+0x15/0x40
      [  335.231025]  ? mutex_lock+0xe/0x30
      [  335.231056]  __md_stop+0x1c/0xa0 [md_mod]
      [  335.231083]  do_md_stop+0x160/0x580 [md_mod]
      [  335.231119]  ? 0xffffffffc05fb078
      [  335.231148]  md_ioctl+0xa04/0x1930 [md_mod]
      [  335.231165]  ? filename_lookup+0xf2/0x190
      [  335.231179]  blkdev_ioctl+0x93c/0xa10
      [  335.231205]  ? _cond_resched+0x15/0x40
      [  335.231214]  ? __check_object_size+0xd4/0x1a0
      [  335.231224]  block_ioctl+0x39/0x40
      [  335.231243]  do_vfs_ioctl+0xa0/0x680
      [  335.231253]  ksys_ioctl+0x70/0x80
      [  335.231261]  __x64_sys_ioctl+0x16/0x20
      [  335.231271]  do_syscall_64+0x65/0x1f0
      [  335.231278]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
      ```
      Signed-off-by: Zhao Heming <heming.zhao@suse.com>
      Signed-off-by: Song Liu <songliubraving@fb.com>
    • md/raid5-cache: clear MD_SB_CHANGE_PENDING before flushing stripes · c9020e64
      Song Liu authored
      In recovery, if we process too much data, raid5-cache may set
      MD_SB_CHANGE_PENDING, which causes spinning in handle_stripe().
      Fix this issue by clearing the bit before flushing data only
      stripes. This issue was initially discussed in [1].
      
      [1] https://www.spinics.net/lists/raid/msg64409.html
      Signed-off-by: Song Liu <songliubraving@fb.com>
    • md: fix deadlock causing by sysfs_notify · e1a86dbb
      Junxiao Bi authored
      The following deadlock was captured. The first process holds 'kernfs_mutex'
      and is blocked on io. That io is staged in 'r1conf.pending_bio_list' of the
      raid1 device. This pending bio list would be flushed by the second process,
      'md127_raid1', but that process is blocked on 'kernfs_mutex'. Replacing
      sysfs_notify() with sysfs_notify_dirent_safe() fixes it. There were other
      sysfs_notify() calls invoked from the io path; all of them were removed.
      
       PID: 40430  TASK: ffff8ee9c8c65c40  CPU: 29  COMMAND: "probe_file"
        #0 [ffffb87c4df37260] __schedule at ffffffff9a8678ec
        #1 [ffffb87c4df372f8] schedule at ffffffff9a867f06
        #2 [ffffb87c4df37310] io_schedule at ffffffff9a0c73e6
        #3 [ffffb87c4df37328] __dta___xfs_iunpin_wait_3443 at ffffffffc03a4057 [xfs]
        #4 [ffffb87c4df373a0] xfs_iunpin_wait at ffffffffc03a6c79 [xfs]
        #5 [ffffb87c4df373b0] __dta_xfs_reclaim_inode_3357 at ffffffffc039a46c [xfs]
        #6 [ffffb87c4df37400] xfs_reclaim_inodes_ag at ffffffffc039a8b6 [xfs]
        #7 [ffffb87c4df37590] xfs_reclaim_inodes_nr at ffffffffc039bb33 [xfs]
        #8 [ffffb87c4df375b0] xfs_fs_free_cached_objects at ffffffffc03af0e9 [xfs]
        #9 [ffffb87c4df375c0] super_cache_scan at ffffffff9a287ec7
       #10 [ffffb87c4df37618] shrink_slab at ffffffff9a1efd93
       #11 [ffffb87c4df37700] shrink_node at ffffffff9a1f5968
       #12 [ffffb87c4df37788] do_try_to_free_pages at ffffffff9a1f5ea2
       #13 [ffffb87c4df377f0] try_to_free_mem_cgroup_pages at ffffffff9a1f6445
       #14 [ffffb87c4df37880] try_charge at ffffffff9a26cc5f
       #15 [ffffb87c4df37920] memcg_kmem_charge_memcg at ffffffff9a270f6a
       #16 [ffffb87c4df37958] new_slab at ffffffff9a251430
       #17 [ffffb87c4df379c0] ___slab_alloc at ffffffff9a251c85
       #18 [ffffb87c4df37a80] __slab_alloc at ffffffff9a25635d
       #19 [ffffb87c4df37ac0] kmem_cache_alloc at ffffffff9a251f89
       #20 [ffffb87c4df37b00] alloc_inode at ffffffff9a2a2b10
       #21 [ffffb87c4df37b20] iget_locked at ffffffff9a2a4854
       #22 [ffffb87c4df37b60] kernfs_get_inode at ffffffff9a311377
       #23 [ffffb87c4df37b80] kernfs_iop_lookup at ffffffff9a311e2b
       #24 [ffffb87c4df37ba8] lookup_slow at ffffffff9a290118
       #25 [ffffb87c4df37c10] walk_component at ffffffff9a291e83
       #26 [ffffb87c4df37c78] path_lookupat at ffffffff9a293619
       #27 [ffffb87c4df37cd8] filename_lookup at ffffffff9a2953af
       #28 [ffffb87c4df37de8] user_path_at_empty at ffffffff9a295566
       #29 [ffffb87c4df37e10] vfs_statx at ffffffff9a289787
       #30 [ffffb87c4df37e70] SYSC_newlstat at ffffffff9a289d5d
       #31 [ffffb87c4df37f18] sys_newlstat at ffffffff9a28a60e
       #32 [ffffb87c4df37f28] do_syscall_64 at ffffffff9a003949
       #33 [ffffb87c4df37f50] entry_SYSCALL_64_after_hwframe at ffffffff9aa001ad
           RIP: 00007f617a5f2905  RSP: 00007f607334f838  RFLAGS: 00000246
           RAX: ffffffffffffffda  RBX: 00007f6064044b20  RCX: 00007f617a5f2905
           RDX: 00007f6064044b20  RSI: 00007f6064044b20  RDI: 00007f6064005890
           RBP: 00007f6064044aa0   R8: 0000000000000030   R9: 000000000000011c
           R10: 0000000000000013  R11: 0000000000000246  R12: 00007f606417e6d0
           R13: 00007f6064044aa0  R14: 00007f6064044b10  R15: 00000000ffffffff
           ORIG_RAX: 0000000000000006  CS: 0033  SS: 002b
      
       PID: 927    TASK: ffff8f15ac5dbd80  CPU: 42  COMMAND: "md127_raid1"
        #0 [ffffb87c4df07b28] __schedule at ffffffff9a8678ec
        #1 [ffffb87c4df07bc0] schedule at ffffffff9a867f06
        #2 [ffffb87c4df07bd8] schedule_preempt_disabled at ffffffff9a86825e
        #3 [ffffb87c4df07be8] __mutex_lock at ffffffff9a869bcc
        #4 [ffffb87c4df07ca0] __mutex_lock_slowpath at ffffffff9a86a013
        #5 [ffffb87c4df07cb0] mutex_lock at ffffffff9a86a04f
        #6 [ffffb87c4df07cc8] kernfs_find_and_get_ns at ffffffff9a311d83
        #7 [ffffb87c4df07cf0] sysfs_notify at ffffffff9a314b3a
        #8 [ffffb87c4df07d18] md_update_sb at ffffffff9a688696
        #9 [ffffb87c4df07d98] md_update_sb at ffffffff9a6886d5
       #10 [ffffb87c4df07da8] md_check_recovery at ffffffff9a68ad9c
       #11 [ffffb87c4df07dd0] raid1d at ffffffffc01f0375 [raid1]
       #12 [ffffb87c4df07ea0] md_thread at ffffffff9a680348
       #13 [ffffb87c4df07f08] kthread at ffffffff9a0b8005
       #14 [ffffb87c4df07f50] ret_from_fork at ffffffff9aa00344
      Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
      Signed-off-by: Song Liu <songliubraving@fb.com>
  2. 14 Jul, 2020 2 commits
    • md: improve io stats accounting · 41d2d848
      Artur Paszkiewicz authored
      Use generic io accounting functions to manage io stats. There was an
      attempt to do this earlier in commit 18c0b223 ("md: use generic io
      stats accounting functions to simplify io stat accounting"), but it did
      not include a call to generic_end_io_acct() and caused issues with
      tracking in-flight IOs, so it was later removed in commit 74672d06
      ("md: fix md io stats accounting broken").
      
      This patch attempts to fix this by using both disk_start_io_acct() and
      disk_end_io_acct(). To make it possible, a struct md_io is allocated for
      every new md bio, which includes the io start_time. A new mempool is
      introduced for this purpose. We override bio->bi_end_io with our own
      callback and call disk_start_io_acct() before passing the bio to
      md_handle_request(). When it completes, we call disk_end_io_acct() and
      the original bi_end_io callback.
      
      This adds correct statistics about in-flight IOs and IO processing time,
      interpreted e.g. in iostat as await, svctm, aqu-sz and %util.
      
      It also fixes a situation where too many IOs were reported if a bio was
      re-submitted to the mddev, because io accounting is now performed only
      on newly arriving bios.
      Acked-by: Guoqing Jiang <guoqing.jiang@cloud.ionos.com>
      Signed-off-by: Artur Paszkiewicz <artur.paszkiewicz@intel.com>
      Signed-off-by: Song Liu <songliubraving@fb.com>
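
      The callback-override approach described in this commit message can be sketched in user-space C. This is only an illustration under stated assumptions: `fake_bio`, `io_stats`, `io_ctx`, `acct_start_io()`, `acct_end_io()` and `fake_clock` are hypothetical stand-ins, not the kernel's `struct bio`, `struct md_io` or the `disk_*_io_acct()` helpers, and `malloc()` stands in for the mempool.

      ```c
      #include <stdint.h>
      #include <stdlib.h>

      /* Hypothetical stand-ins for illustration; not the kernel's types. */
      struct fake_bio {
          void (*end_io)(struct fake_bio *bio);  /* completion callback */
          void *private;                         /* owner's per-bio data */
      };

      struct io_stats {
          unsigned int in_flight;   /* started but not yet completed */
          unsigned int completed;
          uint64_t busy_ticks;      /* summed start-to-completion time */
      };

      /* Per-bio context kept while our callback is installed. */
      struct io_ctx {
          void (*orig_end_io)(struct fake_bio *bio);
          void *orig_private;
          uint64_t start_time;
          struct io_stats *stats;
      };

      uint64_t fake_clock;  /* advanced by hand in place of a real clock */
      int orig_calls;       /* counts calls to the original callback */

      /* Example "original" completion callback owned by the submitter. */
      void orig_end_io(struct fake_bio *bio) { (void)bio; orig_calls++; }

      /* Our completion callback: account, restore the original, chain to it. */
      void acct_end_io(struct fake_bio *bio)
      {
          struct io_ctx *ctx = bio->private;

          ctx->stats->in_flight--;
          ctx->stats->completed++;
          ctx->stats->busy_ticks += fake_clock - ctx->start_time;

          bio->end_io = ctx->orig_end_io;
          bio->private = ctx->orig_private;
          free(ctx);
          if (bio->end_io)
              bio->end_io(bio);
      }

      /* Start accounting for a newly arriving bio. Returns 0 on success. */
      int acct_start_io(struct fake_bio *bio, struct io_stats *stats)
      {
          struct io_ctx *ctx;

          if (bio->end_io == acct_end_io)
              return 0;  /* re-submitted bio: already accounted */

          ctx = malloc(sizeof(*ctx));
          if (!ctx)
              return -1;
          ctx->orig_end_io = bio->end_io;
          ctx->orig_private = bio->private;
          ctx->start_time = fake_clock;
          ctx->stats = stats;

          bio->end_io = acct_end_io;
          bio->private = ctx;
          stats->in_flight++;
          return 0;
      }
      ```

      Checking whether the callback is already installed is one simple way to avoid counting a re-submitted bio twice; whether the actual patch uses the same test is not stated in the message above.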
    • md: raid0/linear: fix dereference before null check on pointer mddev · 9a5a8597
      Colin Ian King authored
      Pointer mddev is dereferenced by a test_bit() call before it is null
      checked, which may cause a null pointer dereference. Fix this by moving
      the null pointer checks so that mddev is sanity checked before it is
      dereferenced.
      
      Addresses-Coverity: ("Dereference before null check")
      Fixes: 62f7b198 ("md raid0/linear: Mark array as 'broken' and fail BIOs if a member is gone")
      Signed-off-by: Colin Ian King <colin.king@canonical.com>
      Reviewed-by: Guilherme G. Piccoli <gpiccoli@canonical.com>
      Signed-off-by: Song Liu <songliubraving@fb.com>
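
      The ordering problem this commit addresses can be illustrated with a small user-space sketch. The names below (`fake_mddev`, `FAKE_MD_BROKEN`, `fake_test_bit()`, `is_broken_checked()`) are hypothetical and only mirror the shape of the issue; they are not the md code itself.

      ```c
      #include <stdbool.h>
      #include <stddef.h>

      /* Hypothetical stand-in for the device structure. */
      struct fake_mddev {
          unsigned long flags;
          void *pers;  /* personality; NULL when the array is not running */
      };

      #define FAKE_MD_BROKEN 0  /* assumed bit number, for illustration only */

      /* Minimal stand-in for a bit test on a flags word. */
      bool fake_test_bit(int nr, const unsigned long *addr)
      {
          return (*addr >> nr) & 1UL;
      }

      /*
       * Correct ordering: validate the pointer (and the field it guards)
       * first, and only then dereference it to read the flag. The buggy
       * ordering would call fake_test_bit(..., &mddev->flags) before the
       * NULL check, so the check would come too late to help.
       */
      bool is_broken_checked(const struct fake_mddev *mddev)
      {
          if (mddev == NULL || mddev->pers == NULL)
              return false;
          return fake_test_bit(FAKE_MD_BROKEN, &mddev->flags);
      }
      ```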
  3. 11 Jul, 2020 1 commit
    • rsxx: switch from 'pci_free_consistent()' to 'dma_free_coherent()' · 2eaac320
      Christophe JAILLET authored
      The wrappers in include/linux/pci-dma-compat.h should go away.
      
      The patch has been generated with the coccinelle script below.
      It has been compile tested.
      
      This also aligns code with what is in use in '/rsxx/dma.c'
      
      @@
      @@
      -    PCI_DMA_BIDIRECTIONAL
      +    DMA_BIDIRECTIONAL
      
      @@
      @@
      -    PCI_DMA_TODEVICE
      +    DMA_TO_DEVICE
      
      @@
      @@
      -    PCI_DMA_FROMDEVICE
      +    DMA_FROM_DEVICE
      
      @@
      @@
      -    PCI_DMA_NONE
      +    DMA_NONE
      
      @@
      expression e1, e2, e3;
      @@
      -    pci_alloc_consistent(e1, e2, e3)
      +    dma_alloc_coherent(&e1->dev, e2, e3, GFP_)
      
      @@
      expression e1, e2, e3;
      @@
      -    pci_zalloc_consistent(e1, e2, e3)
      +    dma_alloc_coherent(&e1->dev, e2, e3, GFP_)
      
      @@
      expression e1, e2, e3, e4;
      @@
      -    pci_free_consistent(e1, e2, e3, e4)
      +    dma_free_coherent(&e1->dev, e2, e3, e4)
      
      @@
      expression e1, e2, e3, e4;
      @@
      -    pci_map_single(e1, e2, e3, e4)
      +    dma_map_single(&e1->dev, e2, e3, e4)
      
      @@
      expression e1, e2, e3, e4;
      @@
      -    pci_unmap_single(e1, e2, e3, e4)
      +    dma_unmap_single(&e1->dev, e2, e3, e4)
      
      @@
      expression e1, e2, e3, e4, e5;
      @@
      -    pci_map_page(e1, e2, e3, e4, e5)
      +    dma_map_page(&e1->dev, e2, e3, e4, e5)
      
      @@
      expression e1, e2, e3, e4;
      @@
      -    pci_unmap_page(e1, e2, e3, e4)
      +    dma_unmap_page(&e1->dev, e2, e3, e4)
      
      @@
      expression e1, e2, e3, e4;
      @@
      -    pci_map_sg(e1, e2, e3, e4)
      +    dma_map_sg(&e1->dev, e2, e3, e4)
      
      @@
      expression e1, e2, e3, e4;
      @@
      -    pci_unmap_sg(e1, e2, e3, e4)
      +    dma_unmap_sg(&e1->dev, e2, e3, e4)
      
      @@
      expression e1, e2, e3, e4;
      @@
      -    pci_dma_sync_single_for_cpu(e1, e2, e3, e4)
      +    dma_sync_single_for_cpu(&e1->dev, e2, e3, e4)
      
      @@
      expression e1, e2, e3, e4;
      @@
      -    pci_dma_sync_single_for_device(e1, e2, e3, e4)
      +    dma_sync_single_for_device(&e1->dev, e2, e3, e4)
      
      @@
      expression e1, e2, e3, e4;
      @@
      -    pci_dma_sync_sg_for_cpu(e1, e2, e3, e4)
      +    dma_sync_sg_for_cpu(&e1->dev, e2, e3, e4)
      
      @@
      expression e1, e2, e3, e4;
      @@
      -    pci_dma_sync_sg_for_device(e1, e2, e3, e4)
      +    dma_sync_sg_for_device(&e1->dev, e2, e3, e4)
      
      @@
      expression e1, e2;
      @@
      -    pci_dma_mapping_error(e1, e2)
      +    dma_mapping_error(&e1->dev, e2)
      
      @@
      expression e1, e2;
      @@
      -    pci_set_dma_mask(e1, e2)
      +    dma_set_mask(&e1->dev, e2)
      
      @@
      expression e1, e2;
      @@
      -    pci_set_consistent_dma_mask(e1, e2)
      +    dma_set_coherent_mask(&e1->dev, e2)
      Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  4. 10 Jul, 2020 1 commit
    • Merge branch 'nvme-5.9' of git://git.infradead.org/nvme into for-5.9/drivers · 80ee071b
      Jens Axboe authored
      Pull NVMe updates from Christoph:
      
      "Below is the current large chunk we have in the nvme tree for 5.9:
      
        - ZNS support (Aravind, Keith, Matias, Niklas)
        - misc cleanups and optimizations
          (Baolin, Chaitanya, David, Dongli, Max, Sagi)"
      
      * 'nvme-5.9' of git://git.infradead.org/nvme: (28 commits)
        nvme: remove ns->disk checks
        nvme-pci: use standard block status symbolic names
        nvme-pci: use the consistent return type of nvme_pci_iod_alloc_size()
        nvme-pci: add a blank line after declarations
        nvme-pci: fix some comments issues
        nvme-pci: remove redundant segment validation
        nvme: document quirked Intel models
        nvme: expose reconnect_delay and ctrl_loss_tmo via sysfs
        nvme: support for zoned namespaces
        nvme: support for multiple Command Sets Supported and Effects log pages
        nvme: implement multiple I/O Command Set support
        null_blk: introduce zone capacity for zoned device
        block: add capacity field to zone descriptors
        nvme: use USEC_PER_SEC instead of magic numbers
        nvmet-tcp: simplify nvmet_process_resp_list
        nvme-tcp: optimize network stack with setting msg flags according to batch size
        nvme-tcp: leverage request plugging
        nvme-tcp: have queue prod/cons send list become a llist
        nvme-fcloop: verify wwnn and wwpn format
        nvmet: use unsigned type for u64
        ...
  5. 08 Jul, 2020 29 commits
  6. 07 Jul, 2020 1 commit
  7. 05 Jul, 2020 3 commits
    • Linux 5.8-rc4 · dcb7fd82
      Linus Torvalds authored
    • x86/ldt: use "pr_info_once()" instead of open-coding it badly · bb5a93aa
      Linus Torvalds authored
      Using a mutex for "print this warning only once" is so overdesigned as
      to be actively offensive to my sensitive stomach.
      
      Just use "pr_info_once()" that already does this, although in a
      (harmlessly) racy manner that can in theory cause the message to be
      printed twice if more than one CPU races on that "is this the first
      time" test.
      
      [ If somebody really cares about that harmless data race (which sounds
        very unlikely indeed), that person can trivially fix printk_once() by
        using a simple atomic access, preferably with an optimistic non-atomic
        test first before even bothering to treat the pointless "make sure it
        is _really_ just once" case.
      
        A mutex is most definitely never the right primitive to use for
        something like this. ]
      
      Yes, this is a small and meaningless detail in a code path that hardly
      matters.  But let's keep some code quality standards here, and not
      accept outrageously bad code.
      
      Link: https://lore.kernel.org/lkml/CAHk-=wgV9toS7GU3KmNpj8hCS9SeF+A0voHS8F275_mgLhL4Lw@mail.gmail.com/
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
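
      The "simple atomic access, preferably with an optimistic non-atomic test first" suggested in the bracketed note might look roughly like this in user-space C11. It is a sketch, not the kernel's printk_once(): `warn_once()` and `warned` are hypothetical names, and a relaxed atomic load stands in for the plain "non-atomic" read, since an unsynchronized plain read would be a data race in standard C.

      ```c
      #include <stdatomic.h>
      #include <stdbool.h>
      #include <stdio.h>

      /* Zero-initialized (false) because it has static storage duration. */
      atomic_bool warned;

      /*
       * Print msg at most once across all threads.
       * Returns true only for the single caller that actually printed.
       */
      bool warn_once(const char *msg)
      {
          /* Optimistic cheap test: the common case after the first print. */
          if (atomic_load_explicit(&warned, memory_order_relaxed))
              return false;

          /* Atomic exchange: exactly one caller observes the old value false. */
          if (atomic_exchange(&warned, true))
              return false;

          fputs(msg, stderr);
          return true;
      }
      ```

      No mutex is involved: the exchange alone decides which caller prints, and the leading load only avoids the more expensive read-modify-write once the flag is already set.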
    • Merge tag 'x86-urgent-2020-07-05' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 72674d48
      Linus Torvalds authored
      Pull x86 fixes from Thomas Gleixner:
       "A series of fixes for x86:
      
         - Reset MXCSR in kernel_fpu_begin() to prevent using a stale user
           space value.
      
         - Prevent writing MSR_TEST_CTRL on CPUs which are not explicitly
           whitelisted for split lock detection. Some CPUs which do not
           support it crash even when the MSR is written to 0 which is the
           default value.
      
         - Fix the XEN PV fallout of the entry code rework
      
         - Fix the 32bit fallout of the entry code rework
      
         - Add more selftests to ensure that these entry problems don't come
           back.
      
         - Disable 16 bit segments on XEN PV. It's not supported because XEN
           PV does not implement ESPFIX64"
      
      * tag 'x86-urgent-2020-07-05' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/ldt: Disable 16-bit segments on Xen PV
        x86/entry/32: Fix #MC and #DB wiring on x86_32
        x86/entry/xen: Route #DB correctly on Xen PV
        x86/entry, selftests: Further improve user entry sanity checks
        x86/entry/compat: Clear RAX high bits on Xen PV SYSENTER
        selftests/x86: Consolidate and fix get/set_eflags() helpers
        selftests/x86/syscall_nt: Clear weird flags after each test
        selftests/x86/syscall_nt: Add more flag combinations
        x86/entry/64/compat: Fix Xen PV SYSENTER frame setup
        x86/entry: Move SYSENTER's regs->sp and regs->flags fixups into C
        x86/entry: Assert that syscalls are on the right stack
        x86/split_lock: Don't write MSR_TEST_CTRL on CPUs that aren't whitelisted
        x86/fpu: Reset MXCSR to default in kernel_fpu_begin()