1. 23 Apr, 2021 3 commits
    • Merge branch 'md-next' of... · b8417f72
      Jens Axboe authored
      Merge branch 'md-next' of https://git.kernel.org/pub/scm/linux/kernel/git/song/md into for-5.13/drivers
      
      Pull MD fixes from Song.
      
      * 'md-next' of https://git.kernel.org/pub/scm/linux/kernel/git/song/md:
        md/raid1: properly indicate failure when ending a failed write request
        md-cluster: fix use-after-free issue when removing rdev
      b8417f72
    • md/raid1: properly indicate failure when ending a failed write request · 2417b986
      Paul Clements authored
      This patch addresses a data corruption bug in raid1 arrays using bitmaps.
      Without this fix, the bitmap bits for the failed I/O end up being cleared.
      
      Since we are in the failure leg of raid1_end_write_request, the request
      either needs to be retried (R1BIO_WriteError) or failed (R1BIO_Degraded).
      
      Fixes: eeba6809 ("md/raid1: end bio when the device faulty")
      Cc: stable@vger.kernel.org # v5.2+
      Signed-off-by: Paul Clements <paul.clements@us.sios.com>
      Signed-off-by: Song Liu <song@kernel.org>
      2417b986
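
      The rule described above, that a failed write must end up either marked
      for retry or marked as failed, can be sketched as a toy model. This is
      not the kernel code: the flag names and toy_end_failed_write() are
      hypothetical, and the choice of "device already faulty" as the deciding
      condition is an assumption drawn from the title of the commit being fixed.

      ```c
      /* Hypothetical stand-ins for R1BIO_WriteError and R1BIO_Degraded. */
      enum {
          TOY_WRITE_ERROR = 1u << 0,  /* request should be retried */
          TOY_DEGRADED    = 1u << 1   /* request has failed for good */
      };

      /*
       * Failure leg of a write completion. Assumption: a device that is
       * not yet marked faulty gets a retry, a faulty one fails the request.
       * Either way the returned state is non-zero, so a caller can never
       * mistake the write for a success and clear its bitmap bits.
       */
      static unsigned int toy_end_failed_write(int device_faulty)
      {
          unsigned int state = 0;

          if (!device_faulty)
              state |= TOY_WRITE_ERROR;
          else
              state |= TOY_DEGRADED;
          return state;
      }
      ```

      The point of the sketch is only that no path through the failure leg
      leaves the state empty.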
    • md-cluster: fix use-after-free issue when removing rdev · f7c7a2f9
      Heming Zhao authored
      md_kick_rdev_from_array() removes the rdev from the list, so the list
      must be walked with rdev_for_each_safe().
      
      How to trigger:
      
      Environment: two nodes on KVM/QEMU x86_64 VMs (2C2G, each with 2 iSCSI LUNs).
      
      ```
      node2=192.168.0.3
      
      for i in {1..20}; do
          echo ==== $i `date` ====;
      
          mdadm -Ss && ssh ${node2} "mdadm -Ss"
          wipefs -a /dev/sda /dev/sdb
      
          mdadm -CR /dev/md0 -b clustered -e 1.2 -n 2 -l 1 /dev/sda \
             /dev/sdb --assume-clean
          ssh ${node2} "mdadm -A /dev/md0 /dev/sda /dev/sdb"
          mdadm --wait /dev/md0
          ssh ${node2} "mdadm --wait /dev/md0"
      
          mdadm --manage /dev/md0 --fail /dev/sda --remove /dev/sda
          sleep 1
      done
      ```
      
      Crash stack:
      
      ```
      stack segment: 0000 [#1] SMP
      ... ...
      RIP: 0010:md_check_recovery+0x1e8/0x570 [md_mod]
      ... ...
      RSP: 0018:ffffb149807a7d68 EFLAGS: 00010207
      RAX: 0000000000000000 RBX: ffff9d494c180800 RCX: ffff9d490fc01e50
      RDX: fffff047c0ed8308 RSI: 0000000000000246 RDI: 0000000000000246
      RBP: 6b6b6b6b6b6b6b6b R08: ffff9d490fc01e40 R09: 0000000000000000
      R10: 0000000000000001 R11: 0000000000000001 R12: 0000000000000000
      R13: ffff9d494c180818 R14: ffff9d493399ef38 R15: ffff9d4933a1d800
      FS:  0000000000000000(0000) GS:ffff9d494f700000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00007fe68cab9010 CR3: 000000004c6be001 CR4: 00000000003706e0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Call Trace:
       raid1d+0x5c/0xd40 [raid1]
       ? finish_task_switch+0x75/0x2a0
       ? lock_timer_base+0x67/0x80
       ? try_to_del_timer_sync+0x4d/0x80
       ? del_timer_sync+0x41/0x50
       ? schedule_timeout+0x254/0x2d0
       ? md_start_sync+0xe0/0xe0 [md_mod]
       ? md_thread+0x127/0x160 [md_mod]
       md_thread+0x127/0x160 [md_mod]
       ? wait_woken+0x80/0x80
       kthread+0x10d/0x130
       ? kthread_park+0xa0/0xa0
       ret_from_fork+0x1f/0x40
      ```
      
      Fixes: dbb64f86 ("md-cluster: Fix adding of new disk with new reload code")
      Fixes: 659b254f ("md-cluster: remove a disk asynchronously from cluster environment")
      Cc: stable@vger.kernel.org
      Reviewed-by: Gang He <ghe@suse.com>
      Signed-off-by: Heming Zhao <heming.zhao@suse.com>
      Signed-off-by: Song Liu <song@kernel.org>
      f7c7a2f9
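
      The reason a "safe" iterator is needed when entries are removed during
      the walk can be shown with a minimal userspace sketch. The toy_rdev list
      and its helpers are hypothetical and only imitate the pattern; they are
      not the md data structures or the kernel list macros.

      ```c
      #include <stddef.h>
      #include <stdlib.h>

      /* Hypothetical stand-in for an rdev on an array's device list. */
      struct toy_rdev {
          int faulty;
          struct toy_rdev *next;
      };

      /* Push a new node on the front of the list and return the new head. */
      static struct toy_rdev *toy_add(struct toy_rdev *head, int faulty)
      {
          struct toy_rdev *r = malloc(sizeof(*r));

          if (!r)
              return head;
          r->faulty = faulty;
          r->next = head;
          return r;
      }

      /*
       * Unlink and free every faulty node. The successor is saved in "tmp"
       * before the current node may be freed; reading cur->next after the
       * free would be the use-after-free the commit describes.
       */
      static int toy_kick_faulty(struct toy_rdev **head)
      {
          struct toy_rdev **link = head;
          struct toy_rdev *cur, *tmp;
          int kicked = 0;

          for (cur = *head; cur; cur = tmp) {
              tmp = cur->next;
              if (cur->faulty) {
                  *link = tmp;
                  free(cur);
                  kicked++;
              } else {
                  link = &cur->next;
              }
          }
          return kicked;
      }

      /* Count the nodes that remain on the list. */
      static int toy_count(const struct toy_rdev *head)
      {
          int n = 0;

          for (; head; head = head->next)
              n++;
          return n;
      }
      ```

      The 0x6b6b... value in RBP in the trace above is consistent with a
      pointer read from freed, poisoned slab memory, which is what a non-safe
      walk would produce.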
  2. 22 Apr, 2021 2 commits
  3. 21 Apr, 2021 8 commits
    • nvme: cleanup nvme_configure_apst · 60df5de9
      Christoph Hellwig authored
      Remove a level of indentation from the main code implementing the table
      search by using a goto for the APST not supported case.  Also move the
      main comment above the function.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Niklas Cassel <niklas.cassel@wdc.com>
      60df5de9
    • nvme: do not try to reconfigure APST when the controller is not live · 53fe2a30
      Christoph Hellwig authored
      Do not call nvme_configure_apst when the controller is not live, given
      that nvme_configure_apst will fail due to the lack of an admin queue when
      the controller is being torn down and nvme_set_latency_tolerance is
      called from dev_pm_qos_hide_latency_tolerance.
      
      Fixes: 510a405d ("nvme: fix memory leak for power latency tolerance")
      Reported-by: Peng Liu <liupeng17@lenovo.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Keith Busch <kbusch@kernel.org>
      53fe2a30
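
      A minimal sketch of the guard described above, assuming the new latency
      value is still recorded and only the reconfiguration step is skipped
      while the controller is not live. All toy_* names are hypothetical; this
      is not the driver code.

      ```c
      /* Hypothetical controller states; only "live" permits reconfiguration. */
      enum toy_ctrl_state { TOY_CTRL_NEW, TOY_CTRL_LIVE, TOY_CTRL_DELETING };

      struct toy_nvme_ctrl {
          enum toy_ctrl_state state;
          unsigned long max_latency_us;
          int apst_configured;    /* counts calls to the APST stand-in */
      };

      /* Stand-in for the step that needs a working admin queue. */
      static void toy_configure_apst(struct toy_nvme_ctrl *ctrl)
      {
          ctrl->apst_configured++;
      }

      /* Record the new tolerance; reconfigure only on a live controller. */
      static void toy_set_latency_tolerance(struct toy_nvme_ctrl *ctrl,
                                            unsigned long latency_us)
      {
          if (ctrl->max_latency_us == latency_us)
              return;
          ctrl->max_latency_us = latency_us;
          if (ctrl->state == TOY_CTRL_LIVE)
              toy_configure_apst(ctrl);
      }
      ```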
    • nvme: add 'kato' sysfs attribute · 74c22990
      Hannes Reinecke authored
      Add a 'kato' controller sysfs attribute to display the current
      keep-alive timeout value (if any). This allows userspace to identify
      persistent discovery controllers, as these will have a non-zero
      KATO value.
      Signed-off-by: Hannes Reinecke <hare@suse.de>
      Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      74c22990
    • nvme: sanitize KATO setting · a70b81bd
      Hannes Reinecke authored
      According to the NVMe base spec the KATO commands should be sent
      at half of the KATO interval, to properly account for round-trip
      times.
      As we now will only ever send one KATO command per connection we
      can easily use the recommended values.
      This also fixes a potential issue where the request timeout for
      the KATO command does not match the value in the connect command,
      which might be causing spurious connection drops from the target.
      Signed-off-by: Hannes Reinecke <hare@suse.de>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      a70b81bd
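
      The "half of the KATO interval" rule is simple arithmetic. The helper
      below is a hypothetical illustration working in milliseconds; the driver
      itself uses its own time units and function names.

      ```c
      /*
       * Hypothetical helper: delay between keep-alive commands, in
       * milliseconds, for a keep-alive timeout given in seconds. Sending at
       * half the interval leaves the other half as slack for round trips.
       */
      static unsigned long toy_keep_alive_period_ms(unsigned int kato_seconds)
      {
          return (unsigned long)kato_seconds * 1000UL / 2;
      }
      ```

      For example, a KATO of 5 seconds gives a send period of 2500 ms, and a
      KATO of 0 (keep-alive disabled) gives 0.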
    • nvmet: avoid queuing keep-alive timer if it is disabled · 8f864c59
      Hou Pu authored
      Issuing the following commands:
      nvme set-feature -f 0xf -v 0 /dev/nvme1n1 # disable keep-alive timer
      nvme admin-passthru -o 0x18 /dev/nvme1n1  # send keep-alive command
      makes the keep-alive timer fire and thus deletes the controller, as
      shown below:
      
      [247459.907635] nvmet: ctrl 1 keep-alive timer (0 seconds) expired!
      [247459.930294] nvmet: ctrl 1 fatal error occurred!
      
      Avoid this by not queuing the delayed keep-alive work if keep-alive is
      disabled when a keep-alive command is received from the admin queue.
      Signed-off-by: Hou Pu <houpu.main@gmail.com>
      Tested-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      8f864c59
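
      The fix amounts to an early return when the timeout is zero. A toy
      model, with a hypothetical structure standing in for the target
      controller and a flag standing in for the delayed work item:

      ```c
      /* Hypothetical controller model; kato == 0 means keep-alive is off. */
      struct toy_nvmet_ctrl {
          unsigned int kato;          /* keep-alive timeout in seconds */
          int timer_queued;           /* 1 once the expiry timer is armed */
          unsigned int timer_delay;   /* seconds until the timer fires */
      };

      /* Called when a keep-alive command arrives on the admin queue. */
      static void toy_keep_alive_received(struct toy_nvmet_ctrl *ctrl)
      {
          /* Disabled: arming a 0-second timer would expire immediately. */
          if (!ctrl->kato)
              return;
          ctrl->timer_queued = 1;
          ctrl->timer_delay = ctrl->kato;
      }
      ```

      Without the early return, the sketch would arm a timer with a delay of
      0, which matches the "(0 seconds) expired!" message in the log above.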
    • brd: expose number of allocated pages in debugfs · f4be591f
      Calvin Owens authored
      While the maximum size of each ramdisk is defined either as a module
      parameter, or compile time default, it's impossible to know how many pages
      have currently been allocated by each ram%d device, since they're
      allocated when used and never freed.
      
      This patch creates a new directory at this location:
      
      /sys/kernel/debug/ramdisk_pages/
      
      which will contain a file named "ram%d" for each instantiated ramdisk on
      the system. The file is read-only, and read() will output the number of
      pages currently held by that ramdisk.
      
      We lose track of how much memory a ramdisk is using, because pages,
      once used, are simply recycled but never freed.
      
      In instances where we exhaust the size of the ramdisk with a file that
      exceeds it, encounter ENOSPC, and delete the file as mitigation, df
      shows a decrease in used and an increase in available blocks. But since
      we have touched all pages, the memory footprint of the ramdisk does not
      reflect the blocks used/available count:
      
      ...
      [root@localhost ~]# mkfs.ext2 /dev/ram15
      mke2fs 1.45.6 (20-Mar-2020)
      Creating filesystem with 4096 1k blocks and 1024 inodes
      [root@localhost ~]# mount /dev/ram15 /mnt/ram15/
      
      [root@localhost ~]# cat /sys/kernel/debug/ramdisk_pages/ram15
      58
      [root@kerneltest008.06.prn3 ~]# df /dev/ram15
      Filesystem     1K-blocks  Used Available Use% Mounted on
      /dev/ram15          3963    31      3728   1% /mnt/ram15
      [root@kerneltest008.06.prn3 ~]# dd if=/dev/urandom of=/mnt/ram15/test2 bs=1M count=5
      dd: error writing '/mnt/ram15/test2': No space left on device
      4+0 records in
      3+0 records out
      4005888 bytes (4.0 MB, 3.8 MiB) copied, 0.0446614 s, 89.7 MB/s
      [root@kerneltest008.06.prn3 ~]# df /mnt/ram15/
      Filesystem     1K-blocks  Used Available Use% Mounted on
      /dev/ram15          3963  3960         0 100% /mnt/ram15
      [root@kerneltest008.06.prn3 ~]# cat /sys/kernel/debug/ramdisk_pages/ram15
      1024
      [root@kerneltest008.06.prn3 ~]# rm /mnt/ram15/test2
      rm: remove regular file '/mnt/ram15/test2'? y
      [root@kerneltest008.06.prn3 /var]# df /dev/ram15
      Filesystem     1K-blocks  Used Available Use% Mounted on
      /dev/ram15          3963    31      3728   1% /mnt/ram15
      
      # Actual memory footprint
      [root@kerneltest008.06.prn3 /var]# cat /sys/kernel/debug/ramdisk_pages/ram15
      1024
      ...
      
      This debugfs counter always reports the accurate number of pages
      permanently allocated to the ramdisk.
      Signed-off-by: Calvin Owens <calvinowens@fb.com>
      [cleaned up the !CONFIG_DEBUG_FS case and API changes for HEAD]
      Signed-off-by: Kyle McMartin <jkkm@fb.com>
      [rebased]
      Signed-off-by: Saravanan D <saravanand@fb.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      f4be591f
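
      Why the counter only ever grows can be seen in a small userspace model
      of allocate-on-first-use. The toy_brd structure, its fixed page count,
      and toy_brd_touch() are hypothetical and do not mirror the driver's data
      structures.

      ```c
      #include <stddef.h>
      #include <stdlib.h>

      #define TOY_BRD_PAGES 8       /* arbitrary capacity for the sketch */
      #define TOY_PAGE_SIZE 4096

      struct toy_brd {
          void *pages[TOY_BRD_PAGES];
          unsigned long nr_pages;   /* the value a debugfs file would show */
      };

      /*
       * Touch page "idx": allocate it on first use and bump the counter.
       * Nothing in this model frees a page, so deleting a file on top of
       * the ramdisk cannot lower nr_pages.
       */
      static int toy_brd_touch(struct toy_brd *brd, unsigned int idx)
      {
          if (idx >= TOY_BRD_PAGES)
              return -1;
          if (!brd->pages[idx]) {
              brd->pages[idx] = calloc(1, TOY_PAGE_SIZE);
              if (!brd->pages[idx])
                  return -1;
              brd->nr_pages++;
          }
          return 0;
      }
      ```

      Touching the same page twice counts it once, which is why the transcript
      above stays at 1024 after the file is removed.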
    • ataflop: fix off by one in ataflop_probe() · b777f4c4
      Dan Carpenter authored
      Smatch complains that the "type > NUM_DISK_MINORS" should be >=
      instead of >.  We also need to subtract one from "type" at the start.
      
      Fixes: bf9c0538 ("ataflop: use a separate gendisk for each media format")
      Reported-by: kernel test robot <lkp@intel.com>
      Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      b777f4c4
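
      The difference between the two comparisons can be checked in isolation.
      In this sketch TOY_NUM_DISK_MINORS is an arbitrary stand-in for the
      array length and both helper functions are hypothetical; each returns 1
      when the index is accepted.

      ```c
      #define TOY_NUM_DISK_MINORS 3   /* valid indices are 0, 1 and 2 */

      /* Buggy form: rejects only type > N, so type == N slips through. */
      static int toy_check_buggy(unsigned int type)
      {
          return !(type > TOY_NUM_DISK_MINORS);
      }

      /* Fixed form: rejects type >= N, so only 0..N-1 are accepted. */
      static int toy_check_fixed(unsigned int type)
      {
          return !(type >= TOY_NUM_DISK_MINORS);
      }
      ```

      With N equal to 3, the buggy check accepts 3, which would index one
      element past the end of a 3-element array.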
    • ataflop: potential out of bounds in do_format() · 1ffec389
      Dan Carpenter authored
      The function uses "type" as an array index:
      
      	q = unit[drive].disk[type]->queue;
      
      Unfortunately the bounds check on "type" isn't done until later in the
      function.  Fix this by moving the bounds check to the start.
      
      Fixes: bf9c0538 ("ataflop: use a separate gendisk for each media format")
      Reported-by: kernel test robot <lkp@intel.com>
      Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      1ffec389
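
      The ordering problem, validating an index only after it has been used,
      can be sketched as follows. The table, its placeholder values, and
      toy_do_format() are hypothetical; only the check-before-use order is
      the point.

      ```c
      #include <errno.h>

      #define TOY_NUM_TYPES 3

      /* Placeholder per-type values; the numbers are arbitrary. */
      static const int toy_sectors[TOY_NUM_TYPES] = { 9, 18, 36 };

      /*
       * Validate "type" before it is used as an index. Doing the lookup
       * first and the check afterwards would already have read out of
       * bounds by the time the check ran.
       */
      static int toy_do_format(unsigned int type, int *sectors)
      {
          if (type >= TOY_NUM_TYPES)
              return -EINVAL;
          *sectors = toy_sectors[type];
          return 0;
      }
      ```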
  4. 20 Apr, 2021 26 commits
  5. 15 Apr, 2021 1 commit