1. 22 Apr, 2021 2 commits
  2. 21 Apr, 2021 8 commits
    • nvme: cleanup nvme_configure_apst · 60df5de9
      Christoph Hellwig authored
      Remove a level of indentation from the main code implementing the table
      search by using a goto for the APST not supported case.  Also move the
      main comment above the function.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Niklas Cassel <niklas.cassel@wdc.com>
    • nvme: do not try to reconfigure APST when the controller is not live · 53fe2a30
      Christoph Hellwig authored
      Do not call nvme_configure_apst when the controller is not live, given
      that nvme_configure_apst will fail due to the lack of an admin queue when
      the controller is being torn down and nvme_set_latency_tolerance is
      called from dev_pm_qos_hide_latency_tolerance.
      
      Fixes: 510a405d ("nvme: fix memory leak for power latency tolerance")
      Reported-by: Peng Liu <liupeng17@lenovo.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Keith Busch <kbusch@kernel.org>
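
The guard described in this commit can be sketched in plain C. This is a minimal userspace illustration, not the kernel code: `struct ctrl`, `enum ctrl_state`, `configure_apst` and `set_latency_tolerance` are hypothetical stand-ins for the kernel's controller structure, its state enum, nvme_configure_apst and nvme_set_latency_tolerance.

```c
/* Hypothetical stand-ins for the kernel's controller state and struct. */
enum ctrl_state { CTRL_NEW, CTRL_LIVE, CTRL_DELETING };

struct ctrl {
	enum ctrl_state state;
	unsigned long long ps_max_latency_us;
	int apst_configure_calls;	/* counts calls, for illustration only */
};

/* Stand-in for nvme_configure_apst(); the real one needs the admin queue. */
static void configure_apst(struct ctrl *c)
{
	c->apst_configure_calls++;
}

/* Record the new latency tolerance, but only reconfigure APST when live. */
static void set_latency_tolerance(struct ctrl *c, unsigned long long latency_us)
{
	if (c->ps_max_latency_us == latency_us)
		return;
	c->ps_max_latency_us = latency_us;
	if (c->state == CTRL_LIVE)
		configure_apst(c);
}
```

In this sketch the new value is always stored, so a controller that becomes live later could pick it up; only the reconfiguration itself is skipped during teardown.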
    • nvme: add 'kato' sysfs attribute · 74c22990
      Hannes Reinecke authored
      Add a 'kato' controller sysfs attribute to display the current
      keep-alive timeout value (if any). This allows userspace to identify
      persistent discovery controllers, as these will have a non-zero
      KATO value.
      Signed-off-by: Hannes Reinecke <hare@suse.de>
      Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
    • nvme: sanitize KATO setting · a70b81bd
      Hannes Reinecke authored
      According to the NVMe base spec the KATO commands should be sent
      at half of the KATO interval, to properly account for round-trip
      times.
      As we now will only ever send one KATO command per connection we
      can easily use the recommended values.
      This also fixes a potential issue where the request timeout for
      the KATO command does not match the value in the connect command,
      which might be causing spurious connection drops from the target.
      Signed-off-by: Hannes Reinecke <hare@suse.de>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
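
The half-interval rule mentioned above comes down to simple arithmetic. The sketch below assumes a tick rate of 250 per second (`SKETCH_HZ`, a made-up constant standing in for the kernel's build-time HZ) and a hypothetical helper name; it only illustrates the calculation, not the driver's actual code.

```c
/* Assumed tick rate for this sketch; the kernel's HZ is a build-time constant. */
#define SKETCH_HZ 250UL

/* Delay, in ticks, before sending the next keep-alive: half the KATO interval. */
static unsigned long keep_alive_delay_ticks(unsigned int kato_seconds)
{
	return (unsigned long)kato_seconds * SKETCH_HZ / 2;
}
```

With a KATO of 10 seconds and the assumed tick rate, the keep-alive would be scheduled 1250 ticks (5 seconds) out, leaving the other half of the interval for the round trip.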
    • nvmet: avoid queuing keep-alive timer if it is disabled · 8f864c59
      Hou Pu authored
      Issuing the following commands:
      nvme set-feature -f 0xf -v 0 /dev/nvme1n1 # disable keep-alive timer
      nvme admin-passthru -o 0x18 /dev/nvme1n1  # send keep-alive command
      makes the keep-alive timer fire and thus deletes the controller, as
      shown below:
      
      [247459.907635] nvmet: ctrl 1 keep-alive timer (0 seconds) expired!
      [247459.930294] nvmet: ctrl 1 fatal error occurred!
      
      Avoid this by not queuing the delayed keep-alive work if the timer is
      disabled when a keep-alive command is received from the admin queue.
      Signed-off-by: Hou Pu <houpu.main@gmail.com>
      Tested-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
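
A minimal sketch of the check this commit describes, using hypothetical names (`struct target_ctrl`, `handle_keep_alive`) rather than the real nvmet structures. It assumes a KATO of 0 means the timer is disabled, as in the reproduction above.

```c
struct target_ctrl {
	unsigned int kato;	/* keep-alive timeout in seconds, 0 = disabled */
	int timer_armed;	/* 1 if a delayed keep-alive check is queued */
};

/* Handle a keep-alive command: rearm the timer only when KATO is enabled. */
static void handle_keep_alive(struct target_ctrl *c)
{
	if (!c->kato)
		return;		/* disabled: a 0-second timer would expire at once */
	c->timer_armed = 1;
}
```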
    • brd: expose number of allocated pages in debugfs · f4be591f
      Calvin Owens authored
      While the maximum size of each ramdisk is defined either as a module
      parameter, or compile time default, it's impossible to know how many pages
      have currently been allocated by each ram%d device, since they're
      allocated when used and never freed.
      
      This patch creates a new directory at this location:
      
      /sys/kernel/debug/ramdisk_pages/
      
      which will contain a file named "ram%d" for each instantiated ramdisk on
      the system. The file is read-only, and read() will output the number of
      pages currently held by that ramdisk.
      
      We lose track of how much memory a ramdisk is using, as pages once used
      are simply recycled but never freed.
      
      In instances where we exhaust the size of the ramdisk with a file that
      exceeds it, encounter ENOSPC and delete the file for mitigation, df would
      show a decrease in used and an increase in available blocks, but since we
      have touched all pages, the memory footprint of the ramdisk does not
      reflect the blocks used/available count.
      
      ...
      [root@localhost ~]# mkfs.ext2 /dev/ram15
      mke2fs 1.45.6 (20-Mar-2020)
      Creating filesystem with 4096 1k blocks and 1024 inodes
      [root@localhost ~]# mount /dev/ram15 /mnt/ram15/
      
      [root@localhost ~]# cat /sys/kernel/debug/ramdisk_pages/ram15
      58
      [root@kerneltest008.06.prn3 ~]# df /dev/ram15
      Filesystem     1K-blocks  Used Available Use% Mounted on
      /dev/ram15          3963    31      3728   1% /mnt/ram15
      [root@kerneltest008.06.prn3 ~]# dd if=/dev/urandom of=/mnt/ram15/test2 bs=1M count=5
      dd: error writing '/mnt/ram15/test2': No space left on device
      4+0 records in
      3+0 records out
      4005888 bytes (4.0 MB, 3.8 MiB) copied, 0.0446614 s, 89.7 MB/s
      [root@kerneltest008.06.prn3 ~]# df /mnt/ram15/
      Filesystem     1K-blocks  Used Available Use% Mounted on
      /dev/ram15          3963  3960         0 100% /mnt/ram15
      [root@kerneltest008.06.prn3 ~]# cat /sys/kernel/debug/ramdisk_pages/ram15
      1024
      [root@kerneltest008.06.prn3 ~]# rm /mnt/ram15/test2
      rm: remove regular file '/mnt/ram15/test2'? y
      [root@kerneltest008.06.prn3 /var]# df /dev/ram15
      Filesystem     1K-blocks  Used Available Use% Mounted on
      /dev/ram15          3963    31      3728   1% /mnt/ram15
      
      # Actual memory footprint
      [root@kerneltest008.06.prn3 /var]# cat /sys/kernel/debug/ramdisk_pages/ram15
      1024
      ...
      
      This debugfs counter will always reveal the accurate number of pages
      permanently allocated to the ramdisk.
      Signed-off-by: Calvin Owens <calvinowens@fb.com>
      [cleaned up the !CONFIG_DEBUG_FS case and API changes for HEAD]
      Signed-off-by: Kyle McMartin <jkkm@fb.com>
      [rebased]
      Signed-off-by: Saravanan D <saravanand@fb.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
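
To turn the counter into a memory figure, a reader could multiply it by the page size. The sketch below assumes 4 KiB pages (`SKETCH_PAGE_SIZE`, which is architecture dependent) and uses hypothetical helper names; under that assumption the 1024 pages in the transcript above amount to 4 MiB, which matches the 4096 1k blocks of the filesystem.

```c
#include <stdio.h>

/* Assumed page size; 4 KiB is common but architecture dependent. */
#define SKETCH_PAGE_SIZE 4096ULL

/* Convert a page count read from the debugfs file into bytes. */
static unsigned long long ramdisk_bytes(unsigned long long pages)
{
	return pages * SKETCH_PAGE_SIZE;
}

/*
 * Read the counter from a path such as /sys/kernel/debug/ramdisk_pages/ram15.
 * Returns 0 on success and stores the page count in *pages, -1 on failure.
 */
static int read_ramdisk_pages(const char *path, unsigned long long *pages)
{
	FILE *f = fopen(path, "r");
	int ok;

	if (!f)
		return -1;
	ok = fscanf(f, "%llu", pages) == 1;
	fclose(f);
	return ok ? 0 : -1;
}
```

Reading the file typically requires root and a mounted debugfs, so the reader function reports failure instead of assuming the path exists.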
    • ataflop: fix off by one in ataflop_probe() · b777f4c4
      Dan Carpenter authored
      Smatch complains that the "type > NUM_DISK_MINORS" should be >=
      instead of >.  We also need to subtract one from "type" at the start.
      
      Fixes: bf9c0538 ("ataflop: use a separate gendisk for each media format")
      Reported-by: kernel test robot <lkp@intel.com>
      Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
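
The off-by-one can be illustrated with a small standalone function. This is a sketch under assumptions, not the driver code: the table size `SKETCH_NUM_DISK_MINORS` of 32 and the helper `type_to_index` are made up for illustration, and treating a type of 0 as already zero-based is also an assumption.

```c
#define SKETCH_NUM_DISK_MINORS 32	/* assumed table size for this sketch */

/*
 * Map a 1-based type to a 0-based table index.
 * Returns -1 when the index would fall outside the table.
 */
static int type_to_index(int type)
{
	if (type)
		type--;		/* convert to a 0-based index first */
	if (type < 0 || type >= SKETCH_NUM_DISK_MINORS)
		return -1;	/* ">" here would let index 32 through */
	return type;
}
```

With a table of 32 entries the last valid index is 31, so a comparison using `>` instead of `>=` would accept one index past the end.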
    • ataflop: potential out of bounds in do_format() · 1ffec389
      Dan Carpenter authored
      The function uses "type" as an array index:
      
      	q = unit[drive].disk[type]->queue;
      
      Unfortunately the bounds check on "type" isn't done until later in the
      function.  Fix this by moving the bounds check to the start.
      
      Fixes: bf9c0538 ("ataflop: use a separate gendisk for each media format")
      Reported-by: kernel test robot <lkp@intel.com>
      Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
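
The general pattern behind this fix, validating an index before its first use, might look like the following. The table `queues`, its size `SKETCH_NUM_TYPES` and the helper `lookup_queue` are hypothetical and stand in for the driver's per-type gendisk array.

```c
#include <stddef.h>

#define SKETCH_NUM_TYPES 4	/* assumed table size */

static int queues[SKETCH_NUM_TYPES] = { 10, 11, 12, 13 };

/*
 * Return a pointer to the entry for "type", or NULL when out of range.
 * The range check runs before the array is indexed.
 */
static int *lookup_queue(int type)
{
	if (type < 0 || type >= SKETCH_NUM_TYPES)
		return NULL;
	return &queues[type];
}
```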
  3. 20 Apr, 2021 26 commits
  4. 15 Apr, 2021 4 commits
    • Merge branch 'md-next' of... · 455abda6
      Jens Axboe authored
      Merge branch 'md-next' of https://git.kernel.org/pub/scm/linux/kernel/git/song/md into for-5.13/drivers
      
      Pull MD updates from Song:
      
      "1. mddev_find_or_alloc() clean up, from Christoph.
       2. Fix NULL pointer deref with external bitmap, from Sudhakar."
      
      * 'md-next' of https://git.kernel.org/pub/scm/linux/kernel/git/song/md:
        md/bitmap: wait for external bitmap writes to complete during tear down
        md: do not return existing mddevs from mddev_find_or_alloc
        md: refactor mddev_find_or_alloc
        md: factor out a mddev_alloc_unit helper from mddev_find
    • Merge tag 'nvme-5.13-2021-04-15' of git://git.infradead.org/nvme into for-5.13/drivers · e63c8eb1
      Jens Axboe authored
      Pull NVMe updates from Christoph:
      
      "nvme updates for Linux 5.13
      
       - refactor the ioctl code
       - fix a segmentation fault during io parsing error in nvmet-tcp
         (Elad Grupi)
       - fix NULL derefence in nvme_ctrl_fast_io_fail_tmo_show/store
         (Gopal Tiwari)
       - properly respect the sgl_threshold flag in nvme-pci (Niklas Cassel)
       - misc cleanups (Niklas Cassel, Amit Engel, Minwoo Im, Colin Ian King)"
      
      * tag 'nvme-5.13-2021-04-15' of git://git.infradead.org/nvme:
        nvme: fix NULL derefence in nvme_ctrl_fast_io_fail_tmo_show/store
        nvme: let namespace probing continue for unsupported features
        nvme: factor out nvme_ns_open and nvme_ns_release helpers
        nvme: move nvme_ns_head_ops to multipath.c
        nvme: factor out a nvme_tryget_ns_head helper
        nvme: move the ioctl code to a separate file
        nvme: don't bother to look up a namespace for controller ioctls
        nvme: simplify block device ioctl handling for the !multipath case
        nvme: simplify the compat ioctl handling
        nvme: factor out a nvme_ns_ioctl helper
        nvme: pass a user pointer to nvme_nvm_ioctl
        nvme: cleanup setting the disk name
        nvme: add a nvme_ns_head_multipath helper
        nvme: remove single trailing whitespace
        nvme-multipath: remove single trailing whitespace
        nvme-pci: remove single trailing whitespace
        nvme-pci: don't simple map sgl when sgls are disabled
        nvmet: fix a spelling mistake "nubmer" -> "number"
        nvmet-fc: simplify nvmet_fc_alloc_hostport
        nvmet-tcp: fix a segmentation fault during io parsing error
    • md/bitmap: wait for external bitmap writes to complete during tear down · 404a8ef5
      Sudhakar Panneerselvam authored
      A NULL pointer dereference was observed in super_written() when it tries
      to access the mddev structure.
      
      [The below stack trace is from an older kernel, but the problem described
      in this patch applies to the mainline kernel.]
      
      [ 1194.474861] task: ffff8fdd20858000 task.stack: ffffb99d40790000
      [ 1194.488000] RIP: 0010:super_written+0x29/0xe1
      [ 1194.499688] RSP: 0018:ffff8ffb7fcc3c78 EFLAGS: 00010046
      [ 1194.512477] RAX: 0000000000000000 RBX: ffff8ffb7bf4a000 RCX: ffff8ffb78991048
      [ 1194.527325] RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffff8ffb56b8a200
      [ 1194.542576] RBP: ffff8ffb7fcc3c90 R08: 000000000000000b R09: 0000000000000000
      [ 1194.558001] R10: ffff8ffb56b8a298 R11: 0000000000000000 R12: ffff8ffb56b8a200
      [ 1194.573070] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
      [ 1194.588117] FS:  0000000000000000(0000) GS:ffff8ffb7fcc0000(0000) knlGS:0000000000000000
      [ 1194.604264] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [ 1194.617375] CR2: 00000000000002b8 CR3: 00000021e040a002 CR4: 00000000007606e0
      [ 1194.632327] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [ 1194.647865] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      [ 1194.663316] PKRU: 55555554
      [ 1194.674090] Call Trace:
      [ 1194.683735]  <IRQ>
      [ 1194.692948]  bio_endio+0xae/0x135
      [ 1194.703580]  blk_update_request+0xad/0x2fa
      [ 1194.714990]  blk_update_bidi_request+0x20/0x72
      [ 1194.726578]  __blk_end_bidi_request+0x2c/0x4d
      [ 1194.738373]  __blk_end_request_all+0x31/0x49
      [ 1194.749344]  blk_flush_complete_seq+0x377/0x383
      [ 1194.761550]  flush_end_io+0x1dd/0x2a7
      [ 1194.772910]  blk_finish_request+0x9f/0x13c
      [ 1194.784544]  scsi_end_request+0x180/0x25c
      [ 1194.796149]  scsi_io_completion+0xc8/0x610
      [ 1194.807503]  scsi_finish_command+0xdc/0x125
      [ 1194.818897]  scsi_softirq_done+0x81/0xde
      [ 1194.830062]  blk_done_softirq+0xa4/0xcc
      [ 1194.841008]  __do_softirq+0xd9/0x29f
      [ 1194.851257]  irq_exit+0xe6/0xeb
      [ 1194.861290]  do_IRQ+0x59/0xe3
      [ 1194.871060]  common_interrupt+0x1c6/0x382
      [ 1194.881988]  </IRQ>
      [ 1194.890646] RIP: 0010:cpuidle_enter_state+0xdd/0x2a5
      [ 1194.902532] RSP: 0018:ffffb99d40793e68 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff43
      [ 1194.917317] RAX: ffff8ffb7fce27c0 RBX: ffff8ffb7fced800 RCX: 000000000000001f
      [ 1194.932056] RDX: 0000000000000000 RSI: 0000000000000004 RDI: 0000000000000000
      [ 1194.946428] RBP: ffffb99d40793ea0 R08: 0000000000000004 R09: 0000000000002ed2
      [ 1194.960508] R10: 0000000000002664 R11: 0000000000000018 R12: 0000000000000003
      [ 1194.974454] R13: 000000000000000b R14: ffffffff925715a0 R15: 0000011610120d5a
      [ 1194.988607]  ? cpuidle_enter_state+0xcc/0x2a5
      [ 1194.999077]  cpuidle_enter+0x17/0x19
      [ 1195.008395]  call_cpuidle+0x23/0x3a
      [ 1195.017718]  do_idle+0x172/0x1d5
      [ 1195.026358]  cpu_startup_entry+0x73/0x75
      [ 1195.035769]  start_secondary+0x1b9/0x20b
      [ 1195.044894]  secondary_startup_64+0xa5/0xa5
      [ 1195.084921] RIP: super_written+0x29/0xe1 RSP: ffff8ffb7fcc3c78
      [ 1195.096354] CR2: 00000000000002b8
      
      The bio in the above stack trace is a bitmap write whose completion is
      invoked after the tear down sequence sets the mddev pointer in rdev to
      NULL.
      
      During tear down, there is an attempt to flush the bitmap writes, but for
      external bitmaps, there is no explicit wait for all the bitmap writes to
      complete. For instance, md_bitmap_flush() is called to flush the bitmap
      writes, but the last call to md_bitmap_daemon_work() in md_bitmap_flush()
      could generate new bitmap writes, and nothing explicitly waits for those
      writes to complete. The call to md_bitmap_update_sb() simply returns for
      external bitmaps, and the follow-up call to md_update_sb() is conditional
      and may not get called for external bitmaps. This results in a kernel
      panic when the completion routine, super_written(), is called and tries
      to reference the mddev in the rdev, which has been set to NULL (in
      unbind_rdev_from_array() by the tear down sequence).
      
      The solution is to call md_super_wait() for external bitmaps after the
      last call to md_bitmap_daemon_work() in md_bitmap_flush() to ensure there
      are no pending bitmap writes before proceeding with the tear down.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: Sudhakar Panneerselvam <sudhakar.panneerselvam@oracle.com>
      Reviewed-by: Zhao Heming <heming.zhao@suse.com>
      Signed-off-by: Song Liu <song@kernel.org>
    • md: do not return existing mddevs from mddev_find_or_alloc · 0d809b38
      Christoph Hellwig authored
      Instead of returning an existing mddev just for it to be discarded
      later, directly return -EEXIST.  Rename the function to mddev_alloc now
      that it doesn't find an existing mddev.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Song Liu <song@kernel.org>
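
The "fail with -EEXIST instead of handing back the existing object" idea can be sketched outside the kernel as follows. The unit table `in_use`, its size `SKETCH_MAX_UNITS` and the function `unit_alloc` are hypothetical; the real mddev_alloc works on mddev structures and takes locks that are omitted here.

```c
#include <errno.h>

#define SKETCH_MAX_UNITS 8	/* assumed table size */

static int in_use[SKETCH_MAX_UNITS];

/* Allocate a unit: 0 on success, -EEXIST if taken, -EINVAL if out of range. */
static int unit_alloc(int unit)
{
	if (unit < 0 || unit >= SKETCH_MAX_UNITS)
		return -EINVAL;
	if (in_use[unit])
		return -EEXIST;	/* the existing entry is not handed back */
	in_use[unit] = 1;
	return 0;
}
```

Returning an error code directly means the caller never holds a reference to an object it was only going to drop again.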