1. 17 Nov, 2020 2 commits
  2. 06 Oct, 2020 1 commit
    • x86, powerpc: Rename memcpy_mcsafe() to copy_mc_to_{user, kernel}() · ec6347bb
      Dan Williams authored
      
      In reaction to a proposal to introduce a memcpy_mcsafe_fast()
      implementation, Linus points out that memcpy_mcsafe() is poorly named
      relative to communicating the scope of the interface: specifically,
      which addresses are valid to pass as source and destination, and which
      faults / exceptions are handled.
      
      Of particular concern is that even though x86 might be able to handle
      the semantics of copy_mc_to_user() with its common copy_user_generic()
      implementation, other archs likely need / want an explicit path for this
      case:
      
        On Fri, May 1, 2020 at 11:28 AM Linus Torvalds <torvalds@linux-foundation.org> wrote:
        >
        > On Thu, Apr 30, 2020 at 6:21 PM Dan Williams <dan.j.williams@intel.com> wrote:
        > >
        > > However now I see that copy_user_generic() works for the wrong reason.
        > > It works because the exception on the source address due to poison
        > > looks no different than a write fault on the user address to the
        > > caller, it's still just a short copy. So it makes copy_to_user() work
        > > for the wrong reason relative to the name.
        >
        > Right.
        >
        > And it won't work that way on other architectures. On x86, we have a
        > generic function that can take faults on either side, and we use it
        > for both cases (and for the "in_user" case too), but that's an
        > artifact of the architecture oddity.
        >
        > In fact, it's probably wrong even on x86 - because it can hide bugs -
        > but writing those things is painful enough that everybody prefers
        > having just one function.
      
      Replace a single top-level memcpy_mcsafe() with either
      copy_mc_to_user(), or copy_mc_to_kernel().
      
      Introduce an x86 copy_mc_fragile() name as the rename for the
      low-level x86 implementation formerly named memcpy_mcsafe(). It is used
      as the slow / careful backend that is supplanted by a fast
      copy_mc_generic() in a follow-on patch.
      
      One side-effect of this reorganization is that separating copy_mc_64.S
      to its own file means that perf no longer needs to track dependencies
      for its memcpy_64.S benchmarks.
      
       [ bp: Massage a bit. ]
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Reviewed-by: Tony Luck <tony.luck@intel.com>
      Acked-by: Michael Ellerman <mpe@ellerman.id.au>
      Cc: <stable@vger.kernel.org>
      Link: http://lore.kernel.org/r/CAHk-=wjSqtXAqfUJxFtWNwmguFASTgB0dz1dT3V-78Quiezqbg@mail.gmail.com
      Link: https://lkml.kernel.org/r/160195561680.2163339.11574962055305783722.stgit@dwillia2-desk3.amr.corp.intel.com
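The "short copy" behavior discussed in the quoted thread can be illustrated with a small user-space model. This is only a sketch, not the kernel implementation: copy_mc_model() and the poisoned set are hypothetical names, and the return convention (number of bytes left uncopied, 0 on success) is assumed here for illustration.

```python
def copy_mc_model(dst, src, poisoned):
    """Toy model of a machine-check-safe copy (hypothetical helper).

    Copies src into dst one byte at a time and stops at the first source
    offset listed in `poisoned`. Returns the number of bytes NOT copied,
    so 0 means the whole copy succeeded (assumed convention).
    """
    total = len(src)
    for offset in range(total):
        if offset in poisoned:
            # The caller only sees a short copy, not the reason for it.
            return total - offset
        dst[offset] = src[offset]
    return 0


src = bytes(range(8))
dst = bytearray(8)
remaining = copy_mc_model(dst, src, poisoned={5})
print(remaining)  # bytes left uncopied after hitting the poisoned offset
```

In this model a poisoned source byte and a faulting destination would both surface as a non-zero return value, which is the ambiguity the rename is meant to make explicit in the interface names.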
  3. 01 Sep, 2020 1 commit
  4. 16 Jul, 2020 2 commits
  5. 08 Jul, 2020 1 commit
  6. 01 Jul, 2020 1 commit
  7. 19 Jun, 2020 1 commit
  8. 17 Jun, 2020 2 commits
  9. 15 May, 2020 2 commits
  10. 16 Apr, 2020 1 commit
    • dm writecache: fix data corruption when reloading the target · 31b22120
      Mikulas Patocka authored
      The dm-writecache reads metadata in the target constructor. However, when
      we reload the target, there could be another active instance running on
      the same device. This is the sequence of operations when doing a reload:
      
      1. construct new target
      2. suspend old target
      3. resume new target
      4. destroy old target
      
      Metadata written by the old target between steps 1 and 2 would
      not be visible to the new target.
      
      Fix the data corruption by loading the metadata in the resume handler.
      
      Also, validate that block_size is at least as large as the logical
      block size of both devices, and only read 1 block from the metadata
      during target constructor -- no need to read the entirety of the
      metadata now that it is done during resume.
      
      Fixes: 48debafe ("dm: add writecache target")
      Cc: stable@vger.kernel.org # v4.18+
      Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
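The reload race described in this commit can be reproduced in a toy model. The sketch below is an assumption-laden illustration: Device, Target, and reload() are hypothetical names, and the metadata is reduced to a single counter.

```python
class Device:
    """Shared cache device; metadata is reduced to one sequence counter."""
    def __init__(self):
        self.metadata = {"seq": 0}


class Target:
    """Hypothetical target that loads metadata in the constructor or in resume."""
    def __init__(self, dev, load_in_resume):
        self.dev = dev
        self.load_in_resume = load_in_resume
        self.seen = None
        if not load_in_resume:
            # Old behavior: snapshot metadata at construction time.
            self.seen = dict(dev.metadata)

    def resume(self):
        if self.load_in_resume:
            # Fixed behavior: load metadata only once the old target is suspended.
            self.seen = dict(self.dev.metadata)


def reload(load_in_resume):
    """Run the four-step reload; return True if the new target sees fresh metadata."""
    dev = Device()
    old = Target(dev, load_in_resume)
    old.resume()
    new = Target(dev, load_in_resume)   # 1. construct new target
    dev.metadata["seq"] += 1            #    old target is still active and writes
    #                                     2. suspend old target (no more writes)
    new.resume()                        # 3. resume new target
    del old                             # 4. destroy old target
    return new.seen["seq"] == dev.metadata["seq"]


print(reload(load_in_resume=False), reload(load_in_resume=True))
```

With loading in the constructor the new target works from a stale snapshot; with loading in resume it observes the write made between steps 1 and 2.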
  11. 27 Mar, 2020 1 commit
  12. 24 Mar, 2020 4 commits
  13. 03 Mar, 2020 1 commit
    • dm: bump version of core and various targets · 636be424
      Mike Snitzer authored
      Changes made during the 5.6 cycle warrant bumping the version number
      for DM core and the targets modified by this commit.
      
      It should be noted that dm-thin, dm-crypt and dm-raid already had
      their target version bumped during the 5.6 merge window.
      
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
  14. 27 Feb, 2020 2 commits
    • dm writecache: verify watermark during resume · 41c526c5
      Mikulas Patocka authored
      Verify the watermark upon resume - so that if the target is reloaded
      with lower watermark, it will start the cleanup process immediately.
      
      Fixes: 48debafe ("dm: add writecache target")
      Cc: stable@vger.kernel.org # 4.18+
      Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
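A rough sketch of the idea, under stated assumptions: CacheModel, its fields, and the strict "greater than" comparison are hypothetical and only illustrate checking usage against the (possibly changed) watermark at resume time. They are not taken from the driver.

```python
def should_start_writeback(used_blocks, total_blocks, high_watermark_percent):
    """Hypothetical check: is cache usage above the high watermark?"""
    return used_blocks * 100 > total_blocks * high_watermark_percent


class CacheModel:
    """Toy cache that re-evaluates the watermark whenever it is resumed."""
    def __init__(self, total_blocks, used_blocks, high_watermark_percent):
        self.total_blocks = total_blocks
        self.used_blocks = used_blocks
        self.high_watermark_percent = high_watermark_percent
        self.writeback_running = False

    def resume(self):
        # Verify the watermark on resume so a lowered value takes effect at once.
        if should_start_writeback(self.used_blocks, self.total_blocks,
                                  self.high_watermark_percent):
            self.writeback_running = True


before = CacheModel(total_blocks=1000, used_blocks=400, high_watermark_percent=50)
before.resume()

# Same cache contents, reloaded with a lower watermark.
after = CacheModel(total_blocks=1000, used_blocks=400, high_watermark_percent=30)
after.resume()

print(before.writeback_running, after.writeback_running)
```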
    • dm: report suspended device during destroy · adc0daad
      Mikulas Patocka authored
      The function dm_suspended returns true if the target is suspended.
      However, when the target is being suspended during unload, it returns
      false.
      
      An example where this is a problem: the test "!dm_suspended(wc->ti)" in
      writecache_writeback is not sufficient, because dm_suspended returns
      zero while writecache_suspend is in progress.  As is, without an
      enhanced dm_suspended, simply switching from flush_workqueue to
      drain_workqueue still emits warnings:
      workqueue writecache-writeback: drain_workqueue() isn't complete after 10 tries
      workqueue writecache-writeback: drain_workqueue() isn't complete after 100 tries
      workqueue writecache-writeback: drain_workqueue() isn't complete after 200 tries
      workqueue writecache-writeback: drain_workqueue() isn't complete after 300 tries
      workqueue writecache-writeback: drain_workqueue() isn't complete after 400 tries
      
      writecache_suspend calls flush_workqueue(wc->writeback_wq) - this function
      flushes the current work. However, the work may re-queue itself and
      flush_workqueue doesn't wait for re-queued works to finish. Because of
      this, the function writecache_writeback continues execution after the
      device was suspended and then runs concurrently with writecache_dtr,
      causing a crash in writecache_writeback.
      
      We must use drain_workqueue - that waits until the work and all re-queued
      works finish.
      
      As a prereq for switching to drain_workqueue, this commit fixes
      dm_suspended to return true after the presuspend hook and before the
      postsuspend hook - just like during a normal suspend. It allows
      simplifying the dm-integrity and dm-writecache targets so that they
      don't have to maintain suspended flags on their own.
      
      With this change, drain_workqueue() can be used effectively.  This
      change was tested with the lvm2 testsuite and cryptsetup testsuite and
      there are no regressions.
      
      Fixes: 48debafe ("dm: add writecache target")
      Cc: stable@vger.kernel.org # 4.18+
      Reported-by: Corey Marthaler <cmarthal@redhat.com>
      Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
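The difference between flushing and draining a queue whose work re-queues itself can be shown with a single-threaded model. This is a simplified sketch: WorkQueue and make_writeback() are hypothetical stand-ins and do not reproduce the kernel workqueue API.

```python
from collections import deque


class WorkQueue:
    """Toy single-threaded work queue (hypothetical, not the kernel API)."""
    def __init__(self):
        self.pending = deque()

    def queue(self, fn):
        self.pending.append(fn)

    def flush(self):
        # Only waits for the items that were queued when flush() was called.
        for _ in range(len(self.pending)):
            self.pending.popleft()(self)

    def drain(self):
        # Keeps running until nothing is left, including re-queued items.
        while self.pending:
            self.pending.popleft()(self)


def make_writeback(rounds):
    """Build a work item that re-queues itself until `rounds` passes are done."""
    state = {"left": rounds}

    def work(wq):
        state["left"] -= 1
        if state["left"] > 0:
            wq.queue(work)

    return work, state


wq = WorkQueue()
work, state = make_writeback(rounds=3)
wq.queue(work)

wq.flush()
left_after_flush = len(wq.pending)   # a re-queued pass is still pending

wq.drain()
left_after_drain = len(wq.pending)   # nothing left to run

print(left_after_flush, left_after_drain)
```

In the model, draining only terminates because the work eventually stops re-queueing itself. That mirrors why the commit first makes dm_suspended return true during the suspend on unload: the writeback work needs a reliable signal to stop.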
  15. 16 Jan, 2020 1 commit
    • dm writecache: improve performance of large linear writes on SSDs · dcd19507
      Mikulas Patocka authored
      
      When dm-writecache is used with SSD as a cache device, it would submit a
      separate bio for each written block. The I/Os would be merged by the disk
      scheduler, but this merging degrades performance.
      
      Improve dm-writecache performance by submitting larger bios - this is
      possible as long as there is consecutive free space on the cache
      device.
      
      Benchmark (arm64 with 64k page size, using /dev/ram0 as a cache device):
      
      fio --bs=512k --iodepth=32 --size=400M --direct=1 \
          --filename=/dev/mapper/cache --rw=randwrite --numjobs=1 --name=test
      
      block size   old (MiB/s)   new (MiB/s)
      --------------------------------------
      512          181           700
      1k           347           1256
      2k           644           2020
      4k           1183          2759
      8k           1852          3333
      16k          2469          3509
      32k          2974          3670
      64k          3404          3810

      Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
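The idea of submitting one larger bio per run of consecutive cache blocks can be sketched as a simple coalescing pass. The function name and the (start, length) representation are assumptions made for illustration; the driver's actual bio construction is not shown here.

```python
def coalesce(blocks):
    """Group cache block indices into (start, length) runs of consecutive blocks.

    Each run stands in for one larger bio instead of one bio per block.
    """
    runs = []
    for block in blocks:
        if runs and runs[-1][0] + runs[-1][1] == block:
            runs[-1][1] += 1           # extends the current consecutive run
        else:
            runs.append([block, 1])    # gap on the cache device: start a new run
    return [tuple(run) for run in runs]


bios = coalesce([10, 11, 12, 20, 21, 30])
print(bios)  # six blocks collapse into three submissions
```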
  16. 15 Jan, 2020 1 commit
    • dm writecache: fix incorrect flush sequence when doing SSD mode commit · aa950920
      Mikulas Patocka authored
      When committing state, the function writecache_flush does the following:
      1. write metadata (writecache_commit_flushed)
      2. flush disk cache (writecache_commit_flushed)
      3. wait for data writes to complete (writecache_wait_for_ios)
      4. increase superblock seq_count
      5. write the superblock
      6. flush disk cache
      
      It may happen that at step 3, when we wait for some write to finish, the
      disk may report the write as finished, but the write only hit the disk
      cache and it is not yet stored in persistent storage. At step 5 we write
      the superblock - it may happen that the superblock is written before the
      write that we waited for in step 3. If the machine crashes, it may result
      in incorrect data being returned after reboot.
      
      In order to fix the bug, we must swap steps 2 and 3 in the above sequence,
      so that we first wait for writes to complete and then flush the disk
      cache.
      
      Fixes: 48debafe ("dm: add writecache target")
      Cc: stable@vger.kernel.org # 4.18+
      Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
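The ordering problem can be checked with a small model of a disk that has a volatile write cache. The step names and the helper below are hypothetical; the model only tracks which writes are guaranteed to be on persistent storage at the moment the superblock write is issued.

```python
def durable_before_superblock(steps):
    """Return the writes guaranteed persistent when the superblock is written.

    Completed writes sit in the volatile disk cache until a flush; only a
    flush moves them to persistent storage in this model.
    """
    in_cache = set()
    durable = set()
    for step in steps:
        if step == "write_metadata":
            in_cache.add("metadata")
        elif step == "wait_for_data_ios":
            in_cache.add("data")       # completed, but maybe only in the cache
        elif step == "flush_cache":
            durable |= in_cache
            in_cache.clear()
        elif step == "write_superblock":
            break
    return durable


OLD_ORDER = ["write_metadata", "flush_cache", "wait_for_data_ios",
             "bump_seq_count", "write_superblock", "flush_cache"]
NEW_ORDER = ["write_metadata", "wait_for_data_ios", "flush_cache",
             "bump_seq_count", "write_superblock", "flush_cache"]

old_durable = durable_before_superblock(OLD_ORDER)
new_durable = durable_before_superblock(NEW_ORDER)
print(sorted(old_durable), sorted(new_durable))
```

With the old order, the data writes are not guaranteed durable before the superblock goes out, so a crash could leave a new superblock pointing at data that never reached persistent storage. Swapping steps 2 and 3 closes that window in the model.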
  17. 05 Nov, 2019 2 commits
  18. 05 Sep, 2019 1 commit
  19. 26 Aug, 2019 3 commits
    • dm writecache: optimize performance by sorting the blocks for writeback_all · 5229b489
      Huaisheng Ye authored
      
      During writeback, the blocks that have been placed on wbl.list for
      imminent writeback are partially ordered: only the contiguous ones are
      in order.

      When writeback_all has been set, in most cases (and also by default)
      there will be a lot of blocks in pmem that need to be written back at
      the same time. For this case, we can optimize performance by sorting all
      blocks in wbl.list: writecache_writeback doesn't need to take blocks from
      the tail of wc->lru, but can instead start from the first rb_node of the
      rb_tree.

      The benefit is that writecache_writeback incurs no cost for sorting the
      blocks, because all blocks are already in ascending order in the rb_tree.
      A writecache_flush runs when writeback_all begins to work, which
      eliminates duplicate (committed/uncommitted) blocks in the cache.
      
      Testing platform: Thinksystem SR630 with persistent memory.
      The cache comes from pmem, which has 1006MB size. The origin device is HDD, 2GB
      of which for using.
      
      Testing steps:
       1) dmsetup create mycache --table '0 4194304 writecache p /dev/sdb1 /dev/pmem4  4096 0'
       2) fio -filename=/dev/mapper/mycache -direct=1 -iodepth=20 -rw=randwrite \
          -ioengine=libaio -bs=4k -loops=1  -size=2g -group_reporting -name=mytest1
       3) time dmsetup message /dev/mapper/mycache 0 flush
      
      Here are the results:
      With the patch:
       # fio -filename=/dev/mapper/mycache -direct=1 -iodepth=20 -rw=randwrite
       -ioengine=libaio -bs=4k -loops=1  -size=2g -group_reporting -name=mytest1
         iops        : min= 1582, max=199470, avg=5305.94, stdev=21273.44, samples=197
       # time dmsetup message /dev/mapper/mycache 0 flush
      real	0m44.020s
      user	0m0.002s
      sys	0m0.003s
      
      Without the patch:
       # fio -filename=/dev/mapper/mycache -direct=1 -iodepth=20 -rw=randwrite
       -ioengine=libaio -bs=4k -loops=1  -size=2g -group_reporting -name=mytest1
         iops        : min= 1202, max=197650, avg=4968.67, stdev=20480.17, samples=211
       # time dmsetup message /dev/mapper/mycache 0 flush
      real	1m39.221s
      user	0m0.001s
      sys	0m0.003s
      
      I have also checked data integrity with this patch by making an EXT4
      filesystem on mycache, then mounting it and checking the md5 of the files
      on it. The test result is positive: with this patch, writeback_all takes
      less than half the time.
      Signed-off-by: Huaisheng Ye <yehs1@lenovo.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
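Why ascending order helps can be seen by counting how many contiguous runs a writeback pass produces. The sector numbers below are made up for illustration and count_runs() is a hypothetical helper. The second half only recomputes the time saving from the two "real" timings reported in this commit.

```python
def count_runs(order):
    """Count contiguous runs when sectors are written back in the given order."""
    runs = 0
    previous = None
    for sector in order:
        if previous is None or sector != previous + 1:
            runs += 1                  # not adjacent to the last one: new run
        previous = sector
    return runs


lru_order = [7, 2, 3, 9, 8, 1]         # made-up order, as if taken from an LRU tail
tree_order = sorted(lru_order)         # ascending order, as an in-order tree walk gives

lru_runs = count_runs(lru_order)
tree_runs = count_runs(tree_order)

# Flush times reported above: 0m44.020s with the patch, 1m39.221s without.
with_patch = 44.020
without_patch = 60 + 39.221
saving = 1 - with_patch / without_patch

print(lru_runs, tree_runs, round(saving * 100, 1))
```

Fewer, longer runs mean more sequential I/O on the origin device, which is consistent with the flush time dropping by more than half in the reported test.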
    • dm writecache: add unlikely for getting two block with same LBA · 62421b38
      Huaisheng Ye authored
      
      In function writecache_writeback, entries g and f have the same original
      sector only when entry f has been committed but entry g has
      NOT yet.

      The probability of this happening is very low within the (at most)
      256 blocks following entry e.
      Signed-off-by: Huaisheng Ye <yehs1@lenovo.com>
      Acked-by: Mikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
    • dm writecache: remove unused member pointer in writeback_struct · 58912dbc
      Huaisheng Ye authored
      
      The structure member pointer page in writeback_struct has never
      actually been used. Remove it.
      Signed-off-by: Huaisheng Ye <yehs1@lenovo.com>
      Acked-by: Mikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
  20. 26 Apr, 2019 2 commits
  21. 18 Apr, 2019 2 commits
  22. 05 Mar, 2019 1 commit
  23. 18 Dec, 2018 1 commit
  24. 22 Oct, 2018 1 commit
  25. 16 Aug, 2018 1 commit
  26. 30 Jul, 2018 1 commit
  27. 27 Jul, 2018 1 commit