1. 04 Mar, 2021 1 commit
  2. 11 Feb, 2021 8 commits
    • Mikulas Patocka's avatar
      dm: fix deadlock when swapping to encrypted device · a666e5c0
      Mikulas Patocka authored
      The system would deadlock when swapping to a dm-crypt device. The reason
      is that for each incoming write bio, dm-crypt allocates memory that holds
      encrypted data. These excessive allocations exhaust all the memory and the
      result is either deadlock or OOM trigger.
      
      This patch limits the number of in-flight swap bios, so that the memory
      consumed by dm-crypt is limited. The limit is enforced if the target set
      the "limit_swap_bios" variable and if the bio has REQ_SWAP set.
      
      Non-swap bios are not affected becuase taking the semaphore would cause
      performance degradation.
      
      This is similar to request-based drivers - they will also block when the
      number of requests is over the limit.
      Signed-off-by: default avatarMikulas Patocka <mpatocka@redhat.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarMike Snitzer <snitzer@redhat.com>
      a666e5c0
    • Mike Snitzer's avatar
      dm: simplify target code conditional on CONFIG_BLK_DEV_ZONED · e3290b94
      Mike Snitzer authored
      Allow removal of CONFIG_BLK_DEV_ZONED conditionals in target_type
      definition of various targets.
      Suggested-by: default avatarEric Biggers <ebiggers@google.com>
      Signed-off-by: default avatarMike Snitzer <snitzer@redhat.com>
      e3290b94
    • Satya Tangirala's avatar
      dm: set DM_TARGET_PASSES_CRYPTO feature for some targets · 3db564b4
      Satya Tangirala authored
      dm-linear and dm-flakey obviously can pass through inline crypto support.
      Co-developed-by: default avatarEric Biggers <ebiggers@google.com>
      Signed-off-by: default avatarEric Biggers <ebiggers@google.com>
      Signed-off-by: default avatarSatya Tangirala <satyat@google.com>
      Signed-off-by: default avatarMike Snitzer <snitzer@redhat.com>
      3db564b4
    • Satya Tangirala's avatar
      dm: support key eviction from keyslot managers of underlying devices · 9355a9eb
      Satya Tangirala authored
      Now that device mapper supports inline encryption, add the ability to
      evict keys from all underlying devices. When an upper layer requests
      a key eviction, we simply iterate through all underlying devices
      and evict that key from each device.
      Co-developed-by: default avatarEric Biggers <ebiggers@google.com>
      Signed-off-by: default avatarEric Biggers <ebiggers@google.com>
      Signed-off-by: default avatarSatya Tangirala <satyat@google.com>
      Signed-off-by: default avatarMike Snitzer <snitzer@redhat.com>
      9355a9eb
    • Satya Tangirala's avatar
      dm: add support for passing through inline crypto support · aa6ce87a
      Satya Tangirala authored
      Update the device-mapper core to support exposing the inline crypto
      support of the underlying device(s) through the device-mapper device.
      
      This works by creating a "passthrough keyslot manager" for the dm
      device, which declares support for encryption settings which all
      underlying devices support.  When a supported setting is used, the bio
      cloning code handles cloning the crypto context to the bios for all the
      underlying devices.  When an unsupported setting is used, the blk-crypto
      fallback is used as usual.
      
      Crypto support on each underlying device is ignored unless the
      corresponding dm target opts into exposing it.  This is needed because
      for inline crypto to semantically operate on the original bio, the data
      must not be transformed by the dm target.  Thus, targets like dm-linear
      can expose crypto support of the underlying device, but targets like
      dm-crypt can't.  (dm-crypt could use inline crypto itself, though.)
      
      A DM device's table can only be changed if the "new" inline encryption
      capabilities are a (*not* necessarily strict) superset of the "old" inline
      encryption capabilities.  Attempts to make changes to the table that result
      in some inline encryption capability becoming no longer supported will be
      rejected.
      
      For the sake of clarity, key eviction from underlying devices will be
      handled in a future patch.
      Co-developed-by: default avatarEric Biggers <ebiggers@google.com>
      Signed-off-by: default avatarEric Biggers <ebiggers@google.com>
      Signed-off-by: default avatarSatya Tangirala <satyat@google.com>
      Signed-off-by: default avatarMike Snitzer <snitzer@redhat.com>
      aa6ce87a
    • Satya Tangirala's avatar
      block/keyslot-manager: Introduce functions for device mapper support · d3b17a24
      Satya Tangirala authored
      Introduce blk_ksm_update_capabilities() to update the capabilities of
      a keyslot manager (ksm) in-place. The pointer to a ksm in a device's
      request queue may not be easily replaced, because upper layers like
      the filesystem might access it (e.g. for programming keys/checking
      capabilities) at the same time the device wants to replace that
      request queue's ksm (and free the old ksm's memory). This function
      allows the device to update the capabilities of the ksm in its request
      queue directly. Devices can safely update the ksm this way without any
      synchronization with upper layers *only* if the updated (new) ksm
      continues to support all the crypto capabilities that the old ksm did
      (see description below for blk_ksm_is_superset() for why this is so).
      
      Also introduce blk_ksm_is_superset() which checks whether one ksm's
      capabilities are a (not necessarily strict) superset of another ksm's.
      The blk-crypto framework requires that crypto capabilities that were
      advertised when a bio was created continue to be supported by the
      device until that bio is ended - in practice this probably means that
      a device's advertised crypto capabilities can *never* "shrink" (since
      there's no synchronization between bio creation and when a device may
      want to change its advertised capabilities) - so a previously
      advertised crypto capability must always continue to be supported.
      This function can be used to check that a new ksm is a valid
      replacement for an old ksm.
      Signed-off-by: default avatarSatya Tangirala <satyat@google.com>
      Reviewed-by: default avatarEric Biggers <ebiggers@google.com>
      Acked-by: default avatarJens Axboe <axboe@kernel.dk>
      Signed-off-by: default avatarMike Snitzer <snitzer@redhat.com>
      d3b17a24
    • Satya Tangirala's avatar
      block/keyslot-manager: Introduce passthrough keyslot manager · 7bdcc48f
      Satya Tangirala authored
      The device mapper may map over devices that have inline encryption
      capabilities, and to make use of those capabilities, the DM device must
      itself advertise those inline encryption capabilities. One way to do this
      would be to have the DM device set up a keyslot manager with a
      "sufficiently large" number of keyslots, but that would use a lot of
      memory. Also, the DM device itself has no "keyslots", and it doesn't make
      much sense to talk about "programming a key into a DM device's keyslot
      manager", so all that extra memory used to represent those keyslots is just
      wasted. All a DM device really needs to be able to do is advertise the
      crypto capabilities of the underlying devices in a coherent manner and
      expose a way to evict keys from the underlying devices.
      
      There are also devices with inline encryption hardware that do not
      have a limited number of keyslots. One can send a raw encryption key along
      with a bio to these devices (as opposed to typical inline encryption
      hardware that require users to first program a raw encryption key into a
      keyslot, and send the index of that keyslot along with the bio). These
      devices also only need the same things from the keyslot manager that DM
      devices need - a way to advertise crypto capabilities and potentially a way
      to expose a function to evict keys from hardware.
      
      So we introduce a "passthrough" keyslot manager that provides a way to
      represent a keyslot manager that doesn't have just a limited number of
      keyslots, and for which do not require keys to be programmed into keyslots.
      DM devices can set up a passthrough keyslot manager in their request
      queues, and advertise appropriate crypto capabilities based on those of the
      underlying devices. Blk-crypto does not attempt to program keys into any
      keyslots in the passthrough keyslot manager. Instead, if/when the bio is
      resubmitted to the underlying device, blk-crypto will try to program the
      key into the underlying device's keyslot manager.
      Signed-off-by: default avatarSatya Tangirala <satyat@google.com>
      Reviewed-by: default avatarEric Biggers <ebiggers@google.com>
      Acked-by: default avatarJens Axboe <axboe@kernel.dk>
      Signed-off-by: default avatarMike Snitzer <snitzer@redhat.com>
      7bdcc48f
    • Nikos Tsironis's avatar
      dm era: only resize metadata in preresume · cca2c6ae
      Nikos Tsironis authored
      Metadata resize shouldn't happen in the ctr. The ctr loads a temporary
      (inactive) table that will only become active upon resume. That is why
      resize should always be done in terms of resume. Otherwise a load (ctr)
      whose inactive table never becomes active will incorrectly resize the
      metadata.
      
      Also, perform the resize directly in preresume, instead of using the
      worker to do it.
      
      The worker might run other metadata operations, e.g., it could start
      digestion, before resizing the metadata. These operations will end up
      using the old size.
      
      This could lead to errors, like:
      
        device-mapper: era: metadata_digest_transcribe_writeset: dm_array_set_value failed
        device-mapper: era: process_old_eras: digest step failed, stopping digestion
      
      The reason of the above error is that the worker started the digestion
      of the archived writeset using the old, larger size.
      
      As a result, metadata_digest_transcribe_writeset tried to write beyond
      the end of the era array.
      
      Fixes: eec40579 ("dm: add era target")
      Cc: stable@vger.kernel.org # v3.15+
      Signed-off-by: default avatarNikos Tsironis <ntsironis@arrikto.com>
      Signed-off-by: default avatarMike Snitzer <snitzer@redhat.com>
      cca2c6ae
  3. 10 Feb, 2021 6 commits
    • Nikos Tsironis's avatar
      dm era: Use correct value size in equality function of writeset tree · 64f2d15a
      Nikos Tsironis authored
      Fix the writeset tree equality test function to use the right value size
      when comparing two btree values.
      
      Fixes: eec40579 ("dm: add era target")
      Cc: stable@vger.kernel.org # v3.15+
      Signed-off-by: default avatarNikos Tsironis <ntsironis@arrikto.com>
      Reviewed-by: default avatarMing-Hung Tsai <mtsai@redhat.com>
      Signed-off-by: default avatarMike Snitzer <snitzer@redhat.com>
      64f2d15a
    • Nikos Tsironis's avatar
      dm era: Fix bitset memory leaks · 904e6b26
      Nikos Tsironis authored
      Deallocate the memory allocated for the in-core bitsets when destroying
      the target and in error paths.
      
      Fixes: eec40579 ("dm: add era target")
      Cc: stable@vger.kernel.org # v3.15+
      Signed-off-by: default avatarNikos Tsironis <ntsironis@arrikto.com>
      Reviewed-by: default avatarMing-Hung Tsai <mtsai@redhat.com>
      Signed-off-by: default avatarMike Snitzer <snitzer@redhat.com>
      904e6b26
    • Nikos Tsironis's avatar
      dm era: Verify the data block size hasn't changed · c8e846ff
      Nikos Tsironis authored
      dm-era doesn't support changing the data block size of existing devices,
      so check explicitly that the requested block size for a new target
      matches the one stored in the metadata.
      
      Fixes: eec40579 ("dm: add era target")
      Cc: stable@vger.kernel.org # v3.15+
      Signed-off-by: default avatarNikos Tsironis <ntsironis@arrikto.com>
      Reviewed-by: default avatarMing-Hung Tsai <mtsai@redhat.com>
      Signed-off-by: default avatarMike Snitzer <snitzer@redhat.com>
      c8e846ff
    • Nikos Tsironis's avatar
      dm era: Reinitialize bitset cache before digesting a new writeset · 25249333
      Nikos Tsironis authored
      In case of devices with at most 64 blocks, the digestion of consecutive
      eras uses the writeset of the first era as the writeset of all eras to
      digest, leading to lost writes. That is, we lose the information about
      what blocks were written during the affected eras.
      
      The digestion code uses a dm_disk_bitset object to access the archived
      writesets. This structure includes a one word (64-bit) cache to reduce
      the number of array lookups.
      
      This structure is initialized only once, in metadata_digest_start(),
      when we kick off digestion.
      
      But, when we insert a new writeset into the writeset tree, before the
      digestion of the previous writeset is done, or equivalently when there
      are multiple writesets in the writeset tree to digest, then all these
      writesets are digested using the same cache and the cache is not
      re-initialized when moving from one writeset to the next.
      
      For devices with more than 64 blocks, i.e., the size of the cache, the
      cache is indirectly invalidated when we move to a next set of blocks, so
      we avoid the bug.
      
      But for devices with at most 64 blocks we end up using the same cached
      data for digesting all archived writesets, i.e., the cache is loaded
      when digesting the first writeset and it never gets reloaded, until the
      digestion is done.
      
      As a result, the writeset of the first era to digest is used as the
      writeset of all the following archived eras, leading to lost writes.
      
      Fix this by reinitializing the dm_disk_bitset structure, and thus
      invalidating the cache, every time the digestion code starts digesting a
      new writeset.
      
      Fixes: eec40579 ("dm: add era target")
      Cc: stable@vger.kernel.org # v3.15+
      Signed-off-by: default avatarNikos Tsironis <ntsironis@arrikto.com>
      Signed-off-by: default avatarMike Snitzer <snitzer@redhat.com>
      25249333
    • Nikos Tsironis's avatar
      dm era: Update in-core bitset after committing the metadata · 2099b145
      Nikos Tsironis authored
      In case of a system crash, dm-era might fail to mark blocks as written
      in its metadata, although the corresponding writes to these blocks were
      passed down to the origin device and completed successfully.
      
      Consider the following sequence of events:
      
      1. We write to a block that has not been yet written in the current era
      2. era_map() checks the in-core bitmap for the current era and sees
         that the block is not marked as written.
      3. The write is deferred for submission after the metadata have been
         updated and committed.
      4. The worker thread processes the deferred write
         (process_deferred_bios()) and marks the block as written in the
         in-core bitmap, **before** committing the metadata.
      5. The worker thread starts committing the metadata.
      6. We do more writes that map to the same block as the write of step (1)
      7. era_map() checks the in-core bitmap and sees that the block is marked
         as written, **although the metadata have not been committed yet**.
      8. These writes are passed down to the origin device immediately and the
         device reports them as completed.
      9. The system crashes, e.g., power failure, before the commit from step
         (5) finishes.
      
      When the system recovers and we query the dm-era target for the list of
      written blocks it doesn't report the aforementioned block as written,
      although the writes of step (6) completed successfully.
      
      The issue is that era_map() decides whether to defer or not a write
      based on non committed information. The root cause of the bug is that we
      update the in-core bitmap, **before** committing the metadata.
      
      Fix this by updating the in-core bitmap **after** successfully
      committing the metadata.
      
      Fixes: eec40579 ("dm: add era target")
      Cc: stable@vger.kernel.org # v3.15+
      Signed-off-by: default avatarNikos Tsironis <ntsironis@arrikto.com>
      Signed-off-by: default avatarMike Snitzer <snitzer@redhat.com>
      2099b145
    • Nikos Tsironis's avatar
      dm era: Recover committed writeset after crash · de89afc1
      Nikos Tsironis authored
      Following a system crash, dm-era fails to recover the committed writeset
      for the current era, leading to lost writes. That is, we lose the
      information about what blocks were written during the affected era.
      
      dm-era assumes that the writeset of the current era is archived when the
      device is suspended. So, when resuming the device, it just moves on to
      the next era, ignoring the committed writeset.
      
      This assumption holds when the device is properly shut down. But, when
      the system crashes, the code that suspends the target never runs, so the
      writeset for the current era is not archived.
      
      There are three issues that cause the committed writeset to get lost:
      
      1. dm-era doesn't load the committed writeset when opening the metadata
      2. The code that resizes the metadata wipes the information about the
         committed writeset (assuming it was loaded at step 1)
      3. era_preresume() starts a new era, without taking into account that
         the current era might not have been archived, due to a system crash.
      
      To fix this:
      
      1. Load the committed writeset when opening the metadata
      2. Fix the code that resizes the metadata to make sure it doesn't wipe
         the loaded writeset
      3. Fix era_preresume() to check for a loaded writeset and archive it,
         before starting a new era.
      
      Fixes: eec40579 ("dm: add era target")
      Cc: stable@vger.kernel.org # v3.15+
      Signed-off-by: default avatarNikos Tsironis <ntsironis@arrikto.com>
      Signed-off-by: default avatarMike Snitzer <snitzer@redhat.com>
      de89afc1
  4. 09 Feb, 2021 6 commits
  5. 08 Feb, 2021 1 commit
  6. 03 Feb, 2021 10 commits
  7. 02 Feb, 2021 1 commit
  8. 01 Feb, 2021 2 commits
  9. 29 Jan, 2021 2 commits
  10. 28 Jan, 2021 1 commit
  11. 27 Jan, 2021 2 commits
    • Chaitanya Kulkarni's avatar
      nvme-core: check bdev value for NULL · 59c15743
      Chaitanya Kulkarni authored
      The nvme-core sets the bdev to NULL when admin comamnd is issued from
      IOCTL in the following path e.g. nvme list :-
      
      block_ioctl()
       blkdev_ioctl()
        nvme_ioctl()
         nvme_user_cmd()
          nvme_submit_user_cmd()
      
      The commit 309dca30 ("block: store a block_device pointer in struct bio")
      now uses bdev unconditionally in the macro bio_set_dev() and assumes
      that bdev value is not NULL which results in the following crash in
      since thats where bdev is actually accessed :-
      
      void bio_associate_blkg_from_css(struct bio *bio,
      				 struct cgroup_subsys_state *css)
      {
      	if (bio->bi_blkg)
      		blkg_put(bio->bi_blkg);
      
      	if (css && css->parent) {
      		bio->bi_blkg = blkg_tryget_closest(bio, css);
      	} else {
      -------------->	blkg_get(bio->bi_bdev->bd_disk->queue->root_blkg);
      		bio->bi_blkg = bio->bi_bdev->bd_disk->queue->root_blkg;
      	}
      }
      EXPORT_SYMBOL_GPL(bio_associate_blkg_from_css);
      
      [  345.385947] BUG: kernel NULL pointer dereference, address: 0000000000000690
      [  345.387103] #PF: supervisor read access in kernel mode
      [  345.387894] #PF: error_code(0x0000) - not-present page
      [  345.388756] PGD 162a2b067 P4D 162a2b067 PUD 1633eb067 PMD 0
      [  345.389625] Oops: 0000 [#1] SMP NOPTI
      [  345.390206] CPU: 15 PID: 4100 Comm: nvme Tainted: G           OE     5.11.0-rc5blk+ #141
      [  345.391377] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-59-gc9ba52764
      [  345.393074] RIP: 0010:bio_associate_blkg_from_css.cold.47+0x58/0x21f
      
      [  345.396362] RSP: 0018:ffffc90000dbbce8 EFLAGS: 00010246
      [  345.397078] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000027
      [  345.398114] RDX: 0000000000000000 RSI: ffff888813be91f0 RDI: ffff888813be91f8
      [  345.399039] RBP: ffffc90000dbbd30 R08: 0000000000000001 R09: 0000000000000001
      [  345.399950] R10: 0000000064c66670 R11: 00000000ef955201 R12: ffff888812d32800
      [  345.401031] R13: 0000000000000000 R14: ffff888113e51540 R15: ffff888113e51540
      [  345.401976] FS:  00007f3747f1d780(0000) GS:ffff888813a00000(0000) knlGS:0000000000000000
      [  345.402997] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  345.403737] CR2: 0000000000000690 CR3: 000000081a4bc000 CR4: 00000000003506e0
      [  345.404685] Call Trace:
      [  345.405031]  bio_associate_blkg+0x71/0x1c0
      [  345.405649]  nvme_submit_user_cmd+0x1aa/0x38e [nvme_core]
      [  345.406348]  nvme_user_cmd.isra.73.cold.98+0x54/0x92 [nvme_core]
      [  345.407117]  nvme_ioctl+0x226/0x260 [nvme_core]
      [  345.407707]  blkdev_ioctl+0x1c8/0x2b0
      [  345.408183]  block_ioctl+0x3f/0x50
      [  345.408627]  __x64_sys_ioctl+0x84/0xc0
      [  345.409117]  do_syscall_64+0x33/0x40
      [  345.409592]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
      [  345.410233] RIP: 0033:0x7f3747632107
      
      [  345.413125] RSP: 002b:00007ffe461b6648 EFLAGS: 00000206 ORIG_RAX: 0000000000000010
      [  345.414086] RAX: ffffffffffffffda RBX: 00000000007b7fd0 RCX: 00007f3747632107
      [  345.414998] RDX: 00007ffe461b6650 RSI: 00000000c0484e41 RDI: 0000000000000004
      [  345.415966] RBP: 0000000000000004 R08: 00000000007b7fe8 R09: 00000000007b9080
      [  345.416883] R10: 00007ffe461b62c0 R11: 0000000000000206 R12: 00000000007b7fd0
      [  345.417808] R13: 0000000000000000 R14: 0000000000000003 R15: 0000000000000000
      
      Add a NULL check before we set the bdev for bio.
      
      This issue is found on block/for-next tree.
      
      Fixes: 309dca30 ("block: store a block_device pointer in struct bio")
      Signed-off-by: default avatarChaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      59c15743
    • Jens Axboe's avatar
      mm: only make map_swap_entry available for CONFIG_HIBERNATION · 3e3126cf
      Jens Axboe authored
      Current tree spews this on compile:
      
      mm/swapfile.c:2290:17: warning: ‘map_swap_entry’ defined but not used [-Wunused-function]
       2290 | static sector_t map_swap_entry(swp_entry_t entry, struct block_device **bdev)
             |                 ^~~~~~~~~~~~~~
      
      if !CONFIG_HIBERNATION, as we don't use the function unless we have that
      config option set.
      
      Fixes: 48d15436 ("mm: remove get_swap_bio")
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      3e3126cf