1. 12 Jul, 2019 5 commits
    • block: Limit zone array allocation size · 26202928
      Damien Le Moal authored
      Limit the size of the struct blk_zone array used in
      blk_revalidate_disk_zones() to avoid memory allocation failures that lead
      to disk revalidation failure. Also further reduce the likelihood of
      such failures by using kvcalloc() (which falls back to vmalloc() when a
      physically contiguous allocation fails) instead of allocating
      contiguous pages with alloc_pages().
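      A minimal userspace sketch of the idea: the zone array is sized to a cap
      rather than to the whole disk, and the zones are then walked in several
      report rounds. The cap value (8192), ZONE_ARRAY_MAX_ZONES,
      fake_report_zones() and revalidate_zones() are hypothetical names for
      illustration, and calloc() merely stands in for kvcalloc().

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical cap on the zone array size; 8192 is an assumed value. */
#define ZONE_ARRAY_MAX_ZONES 8192u

/* Trimmed-down stand-in for struct blk_zone. */
struct zone {
	unsigned long long start; /* first sector of the zone */
	unsigned long long len;   /* zone length in sectors */
};

/* Fake report zones call: fills up to *nr entries starting at zone index
 * 'first' of a disk with 'total' equal-size zones, and sets *nr to the
 * number of entries actually filled. */
static void fake_report_zones(unsigned int first, unsigned int total,
			      unsigned long long zone_sectors,
			      struct zone *zones, unsigned int *nr)
{
	unsigned int i;

	for (i = 0; i < *nr && first + i < total; i++) {
		zones[i].start = (unsigned long long)(first + i) * zone_sectors;
		zones[i].len = zone_sectors;
	}
	*nr = i;
}

/* Checks that all zones are contiguous using a capped array.
 * Returns the number of report rounds, or -1 on failure. */
static int revalidate_zones(unsigned int nr_zones,
			    unsigned long long zone_sectors)
{
	unsigned int cap, z = 0;
	unsigned long long next = 0;
	struct zone *zones;
	int rounds = 0;

	assert(zone_sectors > 0);
	if (nr_zones == 0)
		return 0;

	/* Size the array to the cap, not to the whole disk. */
	cap = nr_zones < ZONE_ARRAY_MAX_ZONES ? nr_zones : ZONE_ARRAY_MAX_ZONES;
	zones = calloc(cap, sizeof(*zones));
	if (!zones)
		return -1;

	while (z < nr_zones) {
		unsigned int nrz = nr_zones - z < cap ? nr_zones - z : cap;
		unsigned int i;

		fake_report_zones(z, nr_zones, zone_sectors, zones, &nrz);
		if (nrz == 0)
			break;
		for (i = 0; i < nrz; i++) {
			if (zones[i].start != next) {
				free(zones);
				return -1;
			}
			next += zones[i].len;
		}
		z += nrz;
		rounds++;
	}

	free(zones);
	return rounds;
}
```

      With 20000 zones and a cap of 8192, the walk takes three rounds and the
      array never holds more than 8192 entries.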
      
      Fixes: 515ce606 ("scsi: sd_zbc: Fix sd_zbc_report_zones() buffer allocation")
      Fixes: e76239a3 ("block: add a report_zones method")
      Cc: stable@vger.kernel.org
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
      Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • sd_zbc: Fix report zones buffer allocation · b091ac61
      Damien Le Moal authored
      During disk scan and revalidation done with sd_revalidate(), the zones
      of a zoned disk are checked using the helper function
      blk_revalidate_disk_zones() if a configuration change is detected
      (change in the number of zones or zone size). The function
      blk_revalidate_disk_zones() issues very large report_zones calls, that
      is, it obtains zone information for all zones of the disk with a
      single command. The report zones command buffer needed for such a
      large request is generally smaller than both the disk max_hw_sectors
      limit and KMALLOC_MAX_SIZE (4MB), so its allocation succeeds at boot
      (no memory fragmentation) but often fails at run time (e.g. on a
      hot-plug event). This causes the disk revalidation to fail and the
      disk capacity to be changed to 0.
      
      This problem can be avoided by using vmalloc() instead of kmalloc() for
      the buffer allocation. To limit the amount of memory to be allocated,
      this patch also introduces the arbitrary SD_ZBC_REPORT_MAX_ZONES
      maximum number of zones to report with a single report zones command.
      This limit may be lowered further to satisfy the disk max_hw_sectors
      limit. Finally, to ensure that the vmalloc-ed buffer can always be
      mapped in a request, the buffer size is further limited to at most
      queue_max_segments() pages, allowing successful mapping of the buffer
      even in the worst case scenario where none of the buffer pages are
      contiguous.
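      The three limits above can be worked through as a sketch. It assumes
      4 KiB pages, a cap of 8192 zones standing in for SD_ZBC_REPORT_MAX_ZONES
      (the actual value is not given here), a REPORT ZONES reply made of a
      64-byte header plus one 64-byte descriptor per zone, and rounding up to a
      512-byte sector; report_buf_size() is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

#define SECTOR_SHIFT 9
#define SKETCH_PAGE_SHIFT 12     /* assumes 4 KiB pages */
#define REPORT_MAX_ZONES 8192u   /* assumed stand-in for SD_ZBC_REPORT_MAX_ZONES */
#define ZONE_DESC_BYTES 64u      /* reply header and each descriptor: 64 bytes */

static size_t min_size(size_t a, size_t b)
{
	return a < b ? a : b;
}

/* Computes the report zones buffer size under the three limits. */
static size_t report_buf_size(unsigned int nr_zones,
			      unsigned int max_hw_sectors,
			      unsigned short max_segments)
{
	size_t bufsize;

	assert(max_segments > 0);

	/* 1. Cap the number of zones requested by a single command. */
	if (nr_zones > REPORT_MAX_ZONES)
		nr_zones = REPORT_MAX_ZONES;

	/* 2. Header plus one descriptor per zone, rounded up to 512 bytes. */
	bufsize = ((size_t)(nr_zones + 1) * ZONE_DESC_BYTES + 511) / 512 * 512;

	/* 3. Never exceed what the disk accepts in one command. */
	bufsize = min_size(bufsize, (size_t)max_hw_sectors << SECTOR_SHIFT);

	/* 4. Worst case, every vmalloc-ed page is its own segment, so allow
	 *    at most max_segments pages to guarantee the buffer can be mapped. */
	bufsize = min_size(bufsize, (size_t)max_segments << SKETCH_PAGE_SHIFT);

	return bufsize;
}
```

      For 100000 zones, max_hw_sectors of 2560 and 128 segments, the zone cap
      gives 524800 bytes, the max_hw_sectors limit (1310720 bytes) does not
      apply, and the segment limit trims the buffer to 524288 bytes (128 pages).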
      
      Fixes: 515ce606 ("scsi: sd_zbc: Fix sd_zbc_report_zones() buffer allocation")
      Fixes: e76239a3 ("block: add a report_zones method")
      Cc: stable@vger.kernel.org
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
      Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • block: Kill gfp_t argument of blkdev_report_zones() · bd976e52
      Damien Le Moal authored
      Only GFP_KERNEL and GFP_NOIO are used with blkdev_report_zones(). In
      preparation for using vmalloc() for the large report buffer and zone
      array allocations used by this function, remove its "gfp_t gfp_mask"
      argument and rely on the caller context to use
      memalloc_noio_save/restore() where necessary (block layer zone
      revalidation and dm-zoned I/O error path).
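      The scoped API replaces a per-call gfp_t with a per-task flag. A
      userspace model of that save/restore pattern follows; task_flags,
      TASK_MEMALLOC_NOIO, noio_save(), noio_restore() and alloc_may_do_io()
      are hypothetical stand-ins for current->flags, PF_MEMALLOC_NOIO and the
      kernel helpers, not the kernel code itself.

```c
#include <assert.h>

/* Hypothetical per-task flag word standing in for current->flags. */
#define TASK_MEMALLOC_NOIO 0x1u
static _Thread_local unsigned int task_flags;

/* Models memalloc_noio_save(): set NOIO, return the previous NOIO bit. */
static unsigned int noio_save(void)
{
	unsigned int old = task_flags & TASK_MEMALLOC_NOIO;

	task_flags |= TASK_MEMALLOC_NOIO;
	return old;
}

/* Models memalloc_noio_restore(): put the NOIO bit back as it was. */
static void noio_restore(unsigned int old)
{
	assert((old & ~TASK_MEMALLOC_NOIO) == 0);
	task_flags = (task_flags & ~TASK_MEMALLOC_NOIO) | old;
}

/* An allocator without a gfp argument derives the restriction from the
 * task context: returns 1 if reclaim may issue I/O. */
static int alloc_may_do_io(void)
{
	return !(task_flags & TASK_MEMALLOC_NOIO);
}
```

      Because save returns only the previous bit, nested scopes unwind
      correctly: the inner restore leaves NOIO set, and only the outermost
      restore clears it.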
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
      Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • block: Allow mapping of vmalloc-ed buffers · b4c5875d
      Damien Le Moal authored
      To allow the SCSI subsystem scsi_execute_req() function to issue
      requests using large buffers that are better allocated with vmalloc()
      rather than kmalloc(), modify bio_map_kern() to allow passing a buffer
      allocated with vmalloc().
      
      To do so, detect vmalloc-ed buffers using is_vmalloc_addr(). For
      vmalloc-ed buffers, flush the buffer using flush_kernel_vmap_range(),
      use vmalloc_to_page() instead of virt_to_page() to obtain the pages of
      the buffer, and invalidate the buffer addresses with
      invalidate_kernel_vmap_range() on completion of read BIOs. This last
      point is executed using the function bio_invalidate_vmalloc_pages()
      which is defined only if the architecture defines
      ARCH_HAS_FLUSH_KERNEL_DCACHE_PAGE, that is, if the architecture
      actually needs the invalidation done.
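      A mapping loop walks the buffer one page at a time because the pages of
      a vmalloc-ed buffer are only virtually contiguous, so each page must be
      looked up on its own (vmalloc_to_page() rather than virt_to_page()). The
      sketch below shows only the per-page split, assuming 4 KiB pages; the
      page-aligned address stands in for the page lookup, and split_by_page()
      and struct chunk are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u /* assumed page size */

/* One piece of the buffer that lies within a single page. */
struct chunk {
	uintptr_t page;   /* page-aligned address (stands in for the page lookup) */
	unsigned int off; /* offset of the data within that page */
	unsigned int len; /* bytes of the buffer in that page */
};

/* Splits [addr, addr + len) into per-page chunks. Returns the number of
 * chunks, or 0 if more than max_chunks would be needed. */
static size_t split_by_page(uintptr_t addr, size_t len,
			    struct chunk *out, size_t max_chunks)
{
	size_t n = 0;

	assert(out != NULL);
	while (len > 0) {
		unsigned int off = (unsigned int)(addr & (SKETCH_PAGE_SIZE - 1));
		unsigned int bytes = SKETCH_PAGE_SIZE - off;

		if (bytes > len)
			bytes = (unsigned int)len;
		if (n == max_chunks)
			return 0;
		out[n].page = addr - off;
		out[n].off = off;
		out[n].len = bytes;
		n++;
		addr += bytes;
		len -= bytes;
	}
	return n;
}
```

      An 8192-byte buffer starting 100 bytes into a page spans three pages
      (3996, 4096 and 100 bytes), which is why the worst-case segment count
      matters when such a buffer is mapped into a request.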
      
      Fixes: 515ce606 ("scsi: sd_zbc: Fix sd_zbc_report_zones() buffer allocation")
      Fixes: e76239a3 ("block: add a report_zones method")
      Cc: stable@vger.kernel.org
      Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
      Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
      Reviewed-by: Ming Lei <ming.lei@redhat.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • block/bio-integrity: fix a memory leak bug · e7bf90e5
      Wenwen Wang authored
      In bio_integrity_prep(), a kernel buffer is allocated through kmalloc() to
      hold integrity metadata. Later on, the buffer will be attached to the bio
      structure through bio_integrity_add_page(), which returns the number of
      bytes of integrity metadata attached. In unexpected situations,
      bio_integrity_add_page() may return 0, in which case bio_integrity_prep()
      must return false to indicate the error. However, the allocated kernel
      buffer is not freed on this execution path, leading to a memory leak.
      
      To fix this issue, free the allocated buffer before returning from
      bio_integrity_prep().
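      The fix follows the usual rule that every early return after an
      allocation must release it. A userspace sketch of that error path, with
      a live-allocation counter to make the leak observable; buf_alloc(),
      buf_free(), add_page() and integrity_prep() are hypothetical stand-ins,
      and add_page() is forced to return 0 to simulate the failure.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

static int live_buffers; /* outstanding allocations in this sketch */

static void *buf_alloc(size_t len)
{
	void *p = malloc(len);

	if (p)
		live_buffers++;
	return p;
}

static void buf_free(void *p)
{
	if (p)
		live_buffers--;
	free(p);
}

/* Stand-in for bio_integrity_add_page(): returns the number of bytes
 * attached, or 0 when 'fail' is set. */
static unsigned int add_page(void **slot, void *buf, unsigned int len,
			     bool fail)
{
	if (fail)
		return 0;
	*slot = buf;
	return len;
}

/* Mirrors the fixed error path: free the buffer before returning false. */
static bool integrity_prep(void **slot, unsigned int len, bool fail)
{
	void *buf = buf_alloc(len);

	assert(len > 0);
	if (!buf)
		return false;
	if (add_page(slot, buf, len, fail) == 0) {
		buf_free(buf); /* without this, the buffer would leak */
		return false;
	}
	return true; /* on success the attached buffer is freed later */
}
```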
      Reviewed-by: Ming Lei <ming.lei@redhat.com>
      Acked-by: Martin K. Petersen <martin.petersen@oracle.com>
      Signed-off-by: Wenwen Wang <wenwen@cs.uga.edu>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  2. 11 Jul, 2019 4 commits
    • nvme: fix NULL deref for fabrics options · 7d30c81b
      Minwoo Im authored
      The nvme-5.3 branch of git://git.infradead.org/nvme.git now causes the
      following NULL pointer dereference oops. Check ctrl->opts before
      dereferencing it.
      
      [   16.337581] BUG: kernel NULL pointer dereference, address: 0000000000000056
      [   16.338551] #PF: supervisor read access in kernel mode
      [   16.338551] #PF: error_code(0x0000) - not-present page
      [   16.338551] PGD 0 P4D 0
      [   16.338551] Oops: 0000 [#1] SMP PTI
      [   16.338551] CPU: 2 PID: 1035 Comm: kworker/u16:5 Not tainted 5.2.0-rc6+ #1
      [   16.338551] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.11.2-0-gf9626ccb91-prebuilt.qemu-project.org 04/01/2014
      [   16.338551] Workqueue: nvme-wq nvme_scan_work [nvme_core]
      [   16.338551] RIP: 0010:nvme_validate_ns+0xc9/0x7e0 [nvme_core]
      [   16.338551] Code: c0 49 89 c5 0f 84 00 07 00 00 48 8b 7b 58 e8 be 48 39 c1 48 3d 00 f0 ff ff 49 89 45 18 0f 87 a4 06 00 00 48 8b 93 70 0a 00 00 <80> 7a 56 00 74 0c 48 8b 40 68 83 48 3c 08 49 8b 45 18 48 89 c6 bf
      [   16.338551] RSP: 0018:ffffc900024c7d10 EFLAGS: 00010283
      [   16.338551] RAX: ffff888135a30720 RBX: ffff88813a4fd1f8 RCX: 0000000000000007
      [   16.338551] RDX: 0000000000000000 RSI: ffffffff8256dd38 RDI: ffff888135a30720
      [   16.338551] RBP: 0000000000000001 R08: 0000000000000007 R09: ffff88813aa6a840
      [   16.338551] R10: 0000000000000001 R11: 000000000002d060 R12: ffff88813a4fd1f8
      [   16.338551] R13: ffff88813a77f800 R14: ffff88813aa35180 R15: 0000000000000001
      [   16.338551] FS:  0000000000000000(0000) GS:ffff88813ba80000(0000) knlGS:0000000000000000
      [   16.338551] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [   16.338551] CR2: 0000000000000056 CR3: 000000000240a002 CR4: 0000000000360ee0
      [   16.338551] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [   16.338551] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      [   16.338551] Call Trace:
      [   16.338551]  nvme_scan_work+0x2c0/0x340 [nvme_core]
      [   16.338551]  ? __switch_to_asm+0x40/0x70
      [   16.338551]  ? _raw_spin_unlock_irqrestore+0x18/0x30
      [   16.338551]  ? try_to_wake_up+0x408/0x450
      [   16.338551]  process_one_work+0x20b/0x3e0
      [   16.338551]  worker_thread+0x1f9/0x3d0
      [   16.338551]  ? cancel_delayed_work+0xa0/0xa0
      [   16.338551]  kthread+0x117/0x120
      [   16.338551]  ? kthread_stop+0xf0/0xf0
      [   16.338551]  ret_from_fork+0x3a/0x50
      [   16.338551] Modules linked in: nvme nvme_core
      [   16.338551] CR2: 0000000000000056
      [   16.338551] ---[ end trace b9bf761a93e62d84 ]---
      [   16.338551] RIP: 0010:nvme_validate_ns+0xc9/0x7e0 [nvme_core]
      [   16.338551] Code: c0 49 89 c5 0f 84 00 07 00 00 48 8b 7b 58 e8 be 48 39 c1 48 3d 00 f0 ff ff 49 89 45 18 0f 87 a4 06 00 00 48 8b 93 70 0a 00 00 <80> 7a 56 00 74 0c 48 8b 40 68 83 48 3c 08 49 8b 45 18 48 89 c6 bf
      [   16.338551] RSP: 0018:ffffc900024c7d10 EFLAGS: 00010283
      [   16.338551] RAX: ffff888135a30720 RBX: ffff88813a4fd1f8 RCX: 0000000000000007
      [   16.338551] RDX: 0000000000000000 RSI: ffffffff8256dd38 RDI: ffff888135a30720
      [   16.338551] RBP: 0000000000000001 R08: 0000000000000007 R09: ffff88813aa6a840
      [   16.338551] R10: 0000000000000001 R11: 000000000002d060 R12: ffff88813a4fd1f8
      [   16.338551] R13: ffff88813a77f800 R14: ffff88813aa35180 R15: 0000000000000001
      [   16.338551] FS:  0000000000000000(0000) GS:ffff88813ba80000(0000) knlGS:0000000000000000
      [   16.338551] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [   16.338551] CR2: 0000000000000056 CR3: 000000000240a002 CR4: 0000000000360ee0
      [   16.338551] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [   16.338551] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
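      The faulting address 0x56 is a small offset from NULL, consistent with
      reading a field through a NULL options pointer. A sketch of the guarded
      check, assuming trimmed-down hypothetical structs in which opts is NULL
      for a non-fabrics controller (the trace above is from the PCIe nvme
      driver under QEMU); wants_stable_writes() is a hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; only the field the check needs is modeled. */
struct fabrics_opts {
	bool data_digest;
};

struct ctrl {
	struct fabrics_opts *opts; /* assumed NULL when not a fabrics controller */
};

/* && short-circuits, so opts is dereferenced only when it is non-NULL. */
static bool wants_stable_writes(const struct ctrl *ctrl)
{
	assert(ctrl != NULL);
	return ctrl->opts && ctrl->opts->data_digest;
}
```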
      
      Fixes: 958f2a0f ("nvme-tcp: set the STABLE_WRITES flag when data digests are enabled")
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Keith Busch <kbusch@kernel.org>
      Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
      Signed-off-by: Minwoo Im <minwoo.im.dev@gmail.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • Merge branch 'nvme-5.3' of git://git.infradead.org/nvme into for-linus · b7403066
      Jens Axboe authored
      Pull NVMe fixes from Christoph:
      
      "Lots of fixes all over the place, and two very minor features that
       were in the nvme tree by the end of the merge window, but hadn't made
       it out to Jens yet."
      
      * 'nvme-5.3' of git://git.infradead.org/nvme:
        nvme: fix regression upon hot device removal and insertion
        nvme-fc: fix module unloads while lports still pending
        nvme-tcp: don't use sendpage for SLAB pages
        nvme-tcp: set the STABLE_WRITES flag when data digests are enabled
        nvmet: print a hint while rejecting NSID 0 or 0xffffffff
        nvme-multipath: do not select namespaces which are about to be removed
        nvme-multipath: also check for a disabled path if there is a single sibling
        nvme-multipath: factor out a nvme_path_is_disabled helper
        nvme: set physical block size and optimal I/O size
        nvme: add I/O characteristics fields
        nvmet: export I/O characteristics attributes in Identify
        nvme-trace: add delete completion and submission queue to admin cmds tracer
        nvme-trace: fix spelling mistake "spcecific" -> "specific"
        nvme-pci: limit max_hw_sectors based on the DMA max mapping size
        nvme-pci: check for NULL return from pci_alloc_p2pmem()
        nvme-pci: don't create a read hctx mapping without read queues
        nvme-pci: don't fall back to a 32-bit DMA mask
        nvme-pci: make nvme_dev_pm_ops static
        nvme-fcloop: resolve warnings on RCU usage and sleep warnings
        nvme-fcloop: fix inconsistent lock state warnings
    • nbd: add netlink reconfigure resize support · 4ddeaae8
      Mike Christie authored
      If the device is set up with ioctl, it can be resized after the
      initial setup, but if it is set up with netlink, the resize-related
      ioctls cannot be used and there is no netlink reconfigure size ATTR
      handling code.
      
      This patch adds netlink reconfigure resize support to match the ioctl
      interface.
      Reviewed-by: Josef Bacik <josef@toxicpanda.com>
      Signed-off-by: Mike Christie <mchristi@redhat.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • nbd: fix crash when the blksize is zero · 553768d1
      Xiubo Li authored
      This allows the blksize to be set to zero, in which case the default
      of 1024 is used.
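      A sketch of the defaulting step described above. The 1024 default comes
      from the commit message; the validity rule (a power of two between 512
      bytes and a 4 KiB page) and the names pick_blksize() and blksize_valid()
      are assumptions made for illustration only.

```c
#include <assert.h>
#include <stdbool.h>

#define DEF_BLKSIZE 1024ul       /* default named in the commit message */
#define SKETCH_PAGE_SIZE 4096ul  /* assumed page size */

/* Assumed validity rule: power of two, from 512 bytes up to a page. */
static bool blksize_valid(unsigned long b)
{
	return b >= 512 && b <= SKETCH_PAGE_SIZE && (b & (b - 1)) == 0;
}

/* Returns the block size to use, or 0 if the request must be rejected. */
static unsigned long pick_blksize(unsigned long requested)
{
	if (requested == 0)
		requested = DEF_BLKSIZE; /* zero means "use the default" */
	assert(requested != 0);
	return blksize_valid(requested) ? requested : 0;
}
```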
      Reviewed-by: Josef Bacik <josef@toxicpanda.com>
      Signed-off-by: Xiubo Li <xiubli@redhat.com>
      [fix to use goto out instead of return in genl_connect]
      Signed-off-by: Mike Christie <mchristi@redhat.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  3. 10 Jul, 2019 13 commits
  4. 09 Jul, 2019 18 commits