1. 07 Apr, 2022 2 commits
    • Merge tag 'random-5.18-rc2-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/crng/random · 3638bd90
      Linus Torvalds authored
      Pull random number generator fixes from Jason Donenfeld:
      
       - Another fixup to the fast_init/crng_init split, this time in how much
         entropy is being credited, from Jan Varho.
      
       - As discussed, we now opportunistically call try_to_generate_entropy()
         in /dev/urandom reads, as a replacement for the reverted commit. I
         opted to not do the more invasive wait_for_random_bytes() change at
         least for now, preferring to do something smaller and more obvious
         for the time being, but maybe that can be revisited as things evolve
         later.
      
       - Userspace can use FUSE or userfaultfd or simply move a process to
         idle priority in order to make a read from the random device never
         complete, which breaks forward secrecy, fixed by overwriting
         sensitive bytes early on in the function.
      
       - Jann Horn noticed that /dev/urandom reads were only checking for
         pending signals if need_resched() was true, a bug going back to the
         genesis commit, now fixed by always checking for signal_pending() and
         calling cond_resched(). This explains various noticeable signal
         delivery delays I've seen in programs over the years that do long
         reads from /dev/urandom.
      
       - In order to be more like other devices (e.g. /dev/zero) and to
         mitigate the impact of fixing the above bug, which has been around
         forever (users have never really needed to check the return value of
         read() for medium-sized reads and so perhaps many didn't), we now
         move signal checking to the bottom part of the loop, and do so every
         PAGE_SIZE-bytes.
      
      * tag 'random-5.18-rc2-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/crng/random:
        random: check for signals every PAGE_SIZE chunk of /dev/[u]random
        random: check for signal_pending() outside of need_resched() check
        random: do not allow user to keep crng key around on stack
        random: opportunistically initialize on /dev/urandom reads
        random: do not split fast init input in add_hwgenerator_randomness()
    • Merge tag 'ata-5.18-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata · 640b5037
      Linus Torvalds authored
      Pull ata fixes from Damien Le Moal:
      
       - Fix a compilation warning due to an uninitialized variable in
         ata_sff_lost_interrupt(), from me.
      
       - Fix invalid internal command tag handling in the sata_dwc_460ex
         driver, from Christian.
      
       - Disable READ LOG DMA EXT with Samsung 840 EVO SSDs as this command
         causes the drives to hang, from Christian.
      
       - Change the config option CONFIG_SATA_LPM_POLICY back to its original
         name CONFIG_SATA_LPM_MOBILE_POLICY to avoid potential problems with
         users losing their configuration (as discussed during the merge
         window), from Mario.
      
      * tag 'ata-5.18-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata:
        ata: ahci: Rename CONFIG_SATA_LPM_POLICY configuration item back
        ata: libata-core: Disable READ LOG DMA EXT for Samsung 840 EVOs
        ata: sata_dwc_460ex: Fix crash due to OOB write
        ata: libata-sff: Fix compilation warning in ata_sff_lost_interrupt()
  2. 06 Apr, 2022 4 commits
    • random: check for signals every PAGE_SIZE chunk of /dev/[u]random · e3c1c4fd
      Jason A. Donenfeld authored
      In 1448769c ("random: check for signal_pending() outside of
      need_resched() check"), Jann pointed out that we previously were only
      checking the TIF_NOTIFY_SIGNAL and TIF_SIGPENDING flags if the process
      had TIF_NEED_RESCHED set, which meant in practice, super long reads to
      /dev/[u]random would delay signal handling by a long time. I tried this
      using the below program, and indeed I wasn't able to interrupt a
      /dev/urandom read until after several megabytes had been read. The bug
      he fixed has always been there, and so code that reads from /dev/urandom
      without checking the return value of read() has mostly worked for a long
      time, for most sizes, not just for <= 256.
      
      Maybe it makes sense to keep that code working. The reason that limit
      was so small previously, ignoring the fact that it didn't work anyway,
      was likely that /dev/random used to block, potentially for a long time,
      while entropy was gathered. But now, it's just a
      chacha20 call, which is extremely fast and is just operating on pure
      data, without having to wait for some external event. In that sense,
      /dev/[u]random is a lot more like /dev/zero.
      
      Taking a page out of /dev/zero's read_zero() function, it always returns
      at least one chunk, and then checks for signals after each chunk. Chunk
      sizes there are of length PAGE_SIZE. Let's just copy the same thing for
      /dev/[u]random, and check for signals and cond_resched() for every
      PAGE_SIZE amount of data. This makes the behavior more consistent with
      expectations, and should mitigate the impact of Jann's fix for the
      age-old signal check bug.
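      
      The loop shape described above can be sketched in plain userspace C.
      This is only an illustration under stated assumptions, not the kernel
      code: CHUNK_SIZE stands in for PAGE_SIZE, fake_signal_pending stands in
      for signal_pending(), and chunked_fill() is a hypothetical name. It
      always produces at least one chunk and only looks at the flag after a
      full chunk has been written.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define CHUNK_SIZE 4096 /* stand-in for PAGE_SIZE, which is arch-dependent */

/* Hypothetical stand-in for signal_pending(): the caller flips this flag. */
static volatile bool fake_signal_pending;

/*
 * Fill dst with up to len bytes of placeholder data, one chunk at a time.
 * At least one chunk is always produced; the flag is checked only at the
 * bottom of the loop, and only if more data is still wanted.
 * Returns the number of bytes written, which may be less than len.
 */
static size_t chunked_fill(unsigned char *dst, size_t len)
{
	size_t done = 0;

	assert(dst != NULL || len == 0);
	while (len) {
		size_t n = len < CHUNK_SIZE ? len : CHUNK_SIZE;

		memset(dst + done, 0xAA, n); /* placeholder for generated bytes */
		done += n;
		len -= n;

		if (len && fake_signal_pending)
			break; /* short return: the caller must check the count */
	}
	return done;
}
```

      With the flag set, a request for three chunks returns after exactly one
      chunk, while a request smaller than one chunk is always served in full.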
      
      ---- test program ----
      
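        #include <unistd.h>
        #include <signal.h>
        #include <stdio.h>
        #include <sys/random.h>
      
        /* Deliberately huge (about 4 GiB) so the read is long enough to interrupt. */
        static unsigned char x[~0U];
      
        /* No-op handler: its only job is to make SIGUSR1 interrupt the syscall. */
        static void handle(int sig) { (void)sig; }
      
        int main(int argc, char *argv[])
        {
          pid_t pid = getpid(), child;
          signal(SIGUSR1, handle);
          if (!(child = fork())) {
            /* Child: send signals to the parent forever. */
            for (;;)
              kill(pid, SIGUSR1);
          }
          pause(); /* wait for the first signal before starting the read */
          printf("interrupted after reading %zd bytes\n", getrandom(x, sizeof(x), 0));
          kill(child, SIGTERM);
          return 0;
        }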
      
      Cc: Jann Horn <jannh@google.com>
      Cc: Theodore Ts'o <tytso@mit.edu>
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
    • random: check for signal_pending() outside of need_resched() check · 1448769c
      Jann Horn authored
      signal_pending() checks TIF_NOTIFY_SIGNAL and TIF_SIGPENDING, which
      signal that the task should bail out of the syscall when possible. This
      is a separate concept from need_resched(), which checks
      TIF_NEED_RESCHED, signaling that the task should preempt.
      
      In particular, with the current code, the signal_pending() bailout
      probably won't work reliably.
      
      Change this to look like other functions that read lots of data, such as
      read_zero().
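      
      From the caller's side, a bailout that works means a large read can
      come back short or fail with EINTR. A minimal userspace sketch of a
      caller that copes with both, assuming Linux and glibc's getrandom()
      wrapper; fill_random() is a hypothetical helper name, not part of this
      patch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/random.h>
#include <sys/types.h>

/*
 * Fill buf with len random bytes, retrying after short reads and EINTR.
 * Returns 0 on success, -1 on any other error (errno set by getrandom()).
 */
static int fill_random(void *buf, size_t len)
{
	unsigned char *p = buf;

	assert(buf != NULL || len == 0);
	while (len) {
		ssize_t n = getrandom(p, len, 0);

		if (n < 0) {
			if (errno == EINTR)
				continue; /* interrupted before any bytes: retry */
			return -1;
		}
		p += n;           /* short read: advance and ask for the rest */
		len -= (size_t)n;
	}
	return 0;
}
```

      Whether to retry on EINTR or hand the interruption back to the caller
      is a policy choice; this sketch always retries.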
      
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Signed-off-by: Jann Horn <jannh@google.com>
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
    • random: do not allow user to keep crng key around on stack · aba120cc
      Jason A. Donenfeld authored
      The fast key erasure RNG design relies on each key being used once
      and then discarded. We do this, making judicious use of
      memzero_explicit().  However, reads to /dev/urandom and calls to
      getrandom() involve a copy_to_user(), and userspace can use FUSE or
      userfaultfd, or make a massive call, dynamically remap memory addresses
      as it goes, and set the process priority to idle, in order to keep a
      kernel stack alive indefinitely. By probing
      /proc/sys/kernel/random/entropy_avail to learn when the crng key is
      refreshed, a malicious userspace could mount this attack every 5 minutes
      thereafter, breaking the crng's forward secrecy.
      
      In order to fix this, we just overwrite the stack's key with the first
      32 bytes of the "free" fast key erasure output. If we're returning <= 32
      bytes to the user, then we can still return those bytes directly, so
      that short reads don't become slower. And for long reads, the added cost
      is hopefully lost in the amortization, so it doesn't change much; that
      amortization helps to varying degrees for medium reads.
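      
      The flow described above can be sketched as follows. This is a toy
      model under loud assumptions: toy_block() is a non-cryptographic
      placeholder for the ChaCha20 block function, memcpy() stands in for
      copy_to_user(), and all names are hypothetical. The point is only the
      ordering: the on-stack key is replaced before any output is copied out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define KEY_SIZE   32
#define BLOCK_SIZE 64

/* Toy generator state: a key plus a block counter. NOT cryptographic. */
struct toy_state {
	uint8_t  key[KEY_SIZE];
	uint64_t counter;
};

/* Placeholder for a real block function: mixes key and counter. */
static void toy_block(struct toy_state *st, uint8_t out[BLOCK_SIZE])
{
	uint64_t x = st->counter++ ^ 0x9e3779b97f4a7c15ULL;

	for (size_t i = 0; i < BLOCK_SIZE; i++) {
		x ^= st->key[i % KEY_SIZE];
		x *= 0xbf58476d1ce4e5b9ULL;
		x ^= x >> 29;
		out[i] = (uint8_t)x;
	}
}

/*
 * Copy len generated bytes to dst. The first 32 bytes of the first block
 * replace the key before anything is copied out, so a caller that stalls
 * the copy never keeps the original key alive in this stack frame.
 */
static void toy_read(const uint8_t seed_key[KEY_SIZE], uint8_t *dst, size_t len)
{
	struct toy_state st = { .counter = 0 };
	uint8_t block[BLOCK_SIZE];

	assert(dst != NULL || len == 0);
	memcpy(st.key, seed_key, KEY_SIZE);
	toy_block(&st, block);
	memcpy(st.key, block, KEY_SIZE); /* overwrite the key early */

	if (len <= KEY_SIZE) {
		/* short read: hand out the fresh key bytes directly */
		memcpy(dst, st.key, len);
	} else {
		while (len) {
			size_t n = len < BLOCK_SIZE ? len : BLOCK_SIZE;

			toy_block(&st, block);
			memcpy(dst, block, n); /* stands in for copy_to_user() */
			dst += n;
			len -= n;
		}
	}
	/* wipe; the kernel uses memzero_explicit() so it cannot be elided */
	memset(&st, 0, sizeof(st));
	memset(block, 0, sizeof(block));
}
```

      In the short-read branch the state is never used again, which is why
      returning the replacement key bytes themselves costs nothing extra.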
      
      We don't need to do this for get_random_bytes() and the various
      kernel-space callers, and later, if we ever switch to always batching,
      this won't be necessary either, so there's no need to change the API of
      these functions.
      
      Cc: Theodore Ts'o <tytso@mit.edu>
      Reviewed-by: Jann Horn <jannh@google.com>
      Fixes: c92e040d ("random: add backtracking protection to the CRNG")
      Fixes: 186873c5 ("random: use simpler fast key erasure flow on per-cpu keys")
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
    • ata: ahci: Rename CONFIG_SATA_LPM_POLICY configuration item back · 55b01415
      Mario Limonciello authored
      CONFIG_SATA_LPM_MOBILE_POLICY was renamed to CONFIG_SATA_LPM_POLICY in
      commit 4dd4d3de ("ata: ahci: Rename CONFIG_SATA_LPM_MOBILE_POLICY
      configuration item").
      
      This can potentially cause problems as users would invisibly lose
      configuration policy defaults when they built the new kernel. To
      avoid such problems, switch back to the old name (even if it's wrong).
      
      Suggested-by: Christoph Hellwig <hch@infradead.org>
      Suggested-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
      Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
      Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
  3. 05 Apr, 2022 5 commits
    • Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost · 3e732ebf
      Linus Torvalds authored
      Pull virtio fixes from Michael Tsirkin:
       "Fixes and cleanups:
      
         - A couple of mlx5 fixes related to cvq
      
         - A couple of reverts dropping useless code (code that used it got
           reverted earlier)"
      
      * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
        vdpa: mlx5: synchronize driver status with CVQ
        vdpa: mlx5: prevent cvq work from hogging CPU
        Revert "virtio_config: introduce a new .enable_cbs method"
        Revert "virtio: use virtio_device_ready() in virtio_device_restore()"
    • x86/speculation: Restore speculation related MSRs during S3 resume · e2a1256b
      Pawan Gupta authored
      After resuming from suspend-to-RAM, the MSRs that control the CPU's
      speculative execution behavior are not being restored on the boot CPU.
      
      These MSRs are used to mitigate speculative execution vulnerabilities.
      Not restoring them correctly may leave the CPU vulnerable.  The secondary
      CPUs' MSRs are correctly being restored at S3 resume by
      identify_secondary_cpu().
      
      During S3 resume, restore these MSRs for the boot CPU when restoring its
      processor state.
      
      Fixes: 77243971 ("x86/bugs/intel: Set proper CPU features and setup RDS")
      Reported-by: Neelima Krishnan <neelima.krishnan@intel.com>
      Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
      Tested-by: Neelima Krishnan <neelima.krishnan@intel.com>
      Acked-by: Borislav Petkov <bp@suse.de>
      Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • x86/pm: Save the MSR validity status at context setup · 73924ec4
      Pawan Gupta authored
      The mechanism to save/restore MSRs during S3 suspend/resume checks for
      the MSR validity during suspend, and only restores the MSR if it's a
      valid MSR.  This is not optimal, as an invalid MSR will unnecessarily
      throw an exception for every suspend cycle.  The more invalid MSRs there
      are, the higher the impact will be.
      
      Check and save the MSR validity at setup.  This ensures that only valid
      MSRs that are guaranteed to not throw an exception will be attempted
      during suspend.
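      
      The idea of checking validity once at setup can be sketched in plain C.
      Everything here is a hypothetical model, not the kernel's data
      structures: read_msr_sketch() pretends that odd MSR numbers do not
      exist and counts a simulated exception each time one is touched.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified record for one MSR saved across suspend. */
struct msr_slot_sketch {
	uint32_t msr_no;
	uint64_t value;
	bool     valid;          /* decided once, at setup */
};

static unsigned int fault_count; /* number of simulated exceptions so far */

/* Stand-in for a fault-tolerant MSR read; odd MSR numbers "do not exist". */
static bool read_msr_sketch(uint32_t msr_no, uint64_t *value)
{
	if (msr_no & 1) {
		fault_count++;   /* models the exception an invalid MSR raises */
		return false;
	}
	*value = (uint64_t)msr_no * 2;
	return true;
}

/* Setup: probe each MSR once and remember whether it is usable. */
static void msr_setup_sketch(struct msr_slot_sketch *msrs, size_t n)
{
	assert(msrs != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		msrs[i].valid = read_msr_sketch(msrs[i].msr_no, &msrs[i].value);
}

/* Suspend: read only MSRs already known to be valid; returns how many. */
static size_t msr_save_sketch(struct msr_slot_sketch *msrs, size_t n)
{
	size_t saved = 0;

	for (size_t i = 0; i < n; i++) {
		if (msrs[i].valid &&
		    read_msr_sketch(msrs[i].msr_no, &msrs[i].value))
			saved++;
	}
	return saved;
}
```

      In this model the simulated exception is paid once during setup, and
      repeated calls to msr_save_sketch() never touch the invalid entry again.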
      
      Fixes: 7a9c2dd0 ("x86/pm: Introduce quirk framework to save/restore extra MSR registers around suspend/resume")
      Suggested-by: Dave Hansen <dave.hansen@linux.intel.com>
      Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
      Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com>
      Acked-by: Borislav Petkov <bp@suse.de>
      Cc: stable@vger.kernel.org
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Merge tag 'for-5.18-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux · ce4c854e
      Linus Torvalds authored
      Pull btrfs fixes from David Sterba:
      
       - prevent deleting subvolume with active swapfile
      
       - fix qgroup reserve limit calculation overflow
      
       - remove device count in superblock and its item in one transaction so
         they can't get out of sync
      
       - skip defragmenting an isolated sector, this could cause some extra IO
      
       - unify handling of mtime/permissions in hole punch with fallocate
      
       - zoned mode fixes:
           - remove assert checking for only single mode, we have the
             DUP mode implemented
           - fix potential lockdep warning while traversing devices
             when checking for zone activation
      
      * tag 'for-5.18-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
        btrfs: prevent subvol with swapfile from being deleted
        btrfs: do not warn for free space inode in cow_file_range
        btrfs: avoid defragging extents whose next extents are not targets
        btrfs: fix fallocate to use file_modified to update permissions consistently
        btrfs: remove device item and update super block in the same transaction
        btrfs: fix qgroup reserve overflow the qgroup limit
        btrfs: zoned: remove left over ASSERT checking for single profile
        btrfs: zoned: traverse devices under chunk_mutex in btrfs_can_activate_zone
    • random: opportunistically initialize on /dev/urandom reads · 48bff105
      Jason A. Donenfeld authored
      In 6f98a4bf ("random: block in /dev/urandom"), we tried to make a
      successful try_to_generate_entropy() call *required* if the RNG was not
      already initialized. Unfortunately, weird architectures and old
      userspaces combined in TCG test harnesses, making that change still not
      realistic, so it was reverted in 0313bc27 ("Revert "random: block in
      /dev/urandom"").
      
      However, rather than making a successful try_to_generate_entropy() call
      *required*, we can instead make it *best-effort*.
      
      If try_to_generate_entropy() fails, it fails, and nothing changes from
      the current behavior. If it succeeds, then /dev/urandom becomes safe to
      use for free. This way, we don't risk the regression potential that led
      to us reverting the required-try_to_generate_entropy() call before.
      
      Practically speaking, this means that at least on x86, /dev/urandom
      becomes safe. Probably other architectures with working cycle counters
      will also become safe. And architectures with slow or broken cycle
      counters at least won't be affected at all by this change.
      
      So it may not be the glorious "all things are unified!" change we were
      hoping for initially, but practically speaking, it makes a positive
      impact.
      
      Cc: Theodore Ts'o <tytso@mit.edu>
      Cc: Dominik Brodowski <linux@dominikbrodowski.net>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
  4. 04 Apr, 2022 4 commits
    • random: do not split fast init input in add_hwgenerator_randomness() · 527a9867
      Jan Varho authored
      add_hwgenerator_randomness() tries to only use the required amount of input
      for fast init, but credits all the entropy, rather than a fraction of
      it. Since it's hard to determine how much entropy is left over out of a
      non-uniformly random sample, either give it all to fast init or credit
      it, but don't attempt to do both. In the process, we can clean up the
      injection code to no longer need to return a value.
      
      Signed-off-by: Jan Varho <jan.varho@gmail.com>
      [Jason: expanded commit message]
      Fixes: 73c7733f ("random: do not throw away excess input to crng_fast_load")
      Cc: stable@vger.kernel.org # 5.17+, requires af704c85
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
    • ata: libata-core: Disable READ LOG DMA EXT for Samsung 840 EVOs · 53997522
      Christian Lamparter authored
      Samsung's 840 EVO with the latest firmware (EXT0DB6Q) locks up with
      the message "READ LOG DMA EXT failed, trying PIO" during boot.
      
      Initially this was discovered because it caused a crash
      with the sata_dwc_460ex controller on a WD MyBook Live DUO.
      
      The reporter, "Tice Rex", had the unique opportunity of owning
      two Samsung 840 EVO SSDs: one with the older firmware "EXT0BB0Q",
      which booted fine and didn't expose "READ LOG DMA EXT", and one with
      the newer/latest firmware "EXT0DB6Q", which caused the headaches.
      
      BugLink: https://github.com/openwrt/openwrt/issues/9505
      Signed-off-by: Christian Lamparter <chunkeey@gmail.com>
      Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
    • ata: sata_dwc_460ex: Fix crash due to OOB write · 7aa8104a
      Christian Lamparter authored
      The driver uses libata's "tag" values as indices into various arrays.
      Since the patch named in the Fixes tag below bumped ATA_TAG_INTERNAL
      to 32, the value of SATA_DWC_QCMD_MAX needs to account for that.
      
      Otherwise, ATA_TAG_INTERNAL usage causes crashes like the one
      below, as reported by Tice Rex on the OpenWrt Forum and
      reproduced (with symbols) here:
      
      | BUG: Kernel NULL pointer dereference at 0x00000000
      | Faulting instruction address: 0xc03ed4b8
      | Oops: Kernel access of bad area, sig: 11 [#1]
      | BE PAGE_SIZE=4K PowerPC 44x Platform
      | CPU: 0 PID: 362 Comm: scsi_eh_1 Not tainted 5.4.163 #0
      | NIP:  c03ed4b8 LR: c03d27e8 CTR: c03ed36c
      | REGS: cfa59950 TRAP: 0300   Not tainted  (5.4.163)
      | MSR:  00021000 <CE,ME>  CR: 42000222  XER: 00000000
      | DEAR: 00000000 ESR: 00000000
      | GPR00: c03d27e8 cfa59a08 cfa55fe0 00000000 0fa46bc0 [...]
      | [..]
      | NIP [c03ed4b8] sata_dwc_qc_issue+0x14c/0x254
      | LR [c03d27e8] ata_qc_issue+0x1c8/0x2dc
      | Call Trace:
      | [cfa59a08] [c003f4e0] __cancel_work_timer+0x124/0x194 (unreliable)
      | [cfa59a78] [c03d27e8] ata_qc_issue+0x1c8/0x2dc
      | [cfa59a98] [c03d2b3c] ata_exec_internal_sg+0x240/0x524
      | [cfa59b08] [c03d2e98] ata_exec_internal+0x78/0xe0
      | [cfa59b58] [c03d30fc] ata_read_log_page.part.38+0x1dc/0x204
      | [cfa59bc8] [c03d324c] ata_identify_page_supported+0x68/0x130
      | [...]
      
      This is because sata_dwc_dma_xfer_complete() NULLs
      dma_pending's next neighbour "chan" (a struct dma_chan pointer) in
      this '32' case right here (line ~735):
      > hsdevp->dma_pending[tag] = SATA_DWC_DMA_PENDING_NONE;
      
      Then, the next time a DMA gets issued, dma_dwc_xfer_setup() passes
      the NULL'd hsdevp->chan to dmaengine_slave_config(), which then
      causes the crash.
      
      With this patch, SATA_DWC_QCMD_MAX is now set to ATA_MAX_QUEUE + 1.
      This avoids the OOB write. But please note, there was a worthwhile
      discussion on what ATA_TAG_INTERNAL and ATA_MAX_QUEUE are, and why
      there should not be a "fake" 33 command-long queue size.
      
      Ideally, the dw driver should account for the ATA_TAG_INTERNAL.
      In Damien Le Moal's words: "... having looked at the driver, it
      is a bigger change than just faking a 33rd "tag" that is in fact
      not a command tag at all."
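      
      The sizing rule can be sketched in standalone C. The constants below
      are local stand-ins mirroring the values discussed in this message,
      the struct is a hypothetical, trimmed-down model of the driver state,
      and the run-time bounds check in clear_pending() is an extra guard
      added for the sketch, not something this patch does.

```c
#include <assert.h>
#include <stddef.h>

/* Local stand-ins mirroring the values discussed in the commit message. */
enum {
	ATA_MAX_QUEUE     = 32,
	ATA_TAG_INTERNAL  = ATA_MAX_QUEUE,     /* the extra, internal tag */
	SATA_DWC_QCMD_MAX = ATA_MAX_QUEUE + 1, /* one slot per possible tag */
};

/* Hypothetical device state: a per-tag array followed by a pointer. */
struct dwc_dev_sketch {
	int   dma_pending[SATA_DWC_QCMD_MAX];
	void *chan; /* the neighbour that an out-of-bounds write would hit */
};

/* Every tag, including the internal one, must index inside the array. */
_Static_assert(ATA_TAG_INTERNAL < SATA_DWC_QCMD_MAX,
	       "internal tag must fit in the per-tag arrays");

/* Clear one tag's pending state; returns 0 on success, -1 for a bad tag. */
static int clear_pending(struct dwc_dev_sketch *dev, unsigned int tag)
{
	assert(dev != NULL);
	if (tag >= (unsigned int)SATA_DWC_QCMD_MAX)
		return -1;
	dev->dma_pending[tag] = 0;
	return 0;
}
```

      With the array sized ATA_MAX_QUEUE + 1, clearing the slot for tag 32
      stays inside dma_pending and leaves the neighbouring pointer untouched.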
      
      Fixes: 28361c40 ("libata: add extra internal command")
      Cc: stable@kernel.org # 4.18+
      BugLink: https://github.com/openwrt/openwrt/issues/9505
      Signed-off-by: Christian Lamparter <chunkeey@gmail.com>
      Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
    • ata: libata-sff: Fix compilation warning in ata_sff_lost_interrupt() · 76ed2f61
      Damien Le Moal authored
      When returning false, ata_sff_altstatus() does not return any status
      value, resulting in a compilation warning in ata_sff_lost_interrupt()
      ("uninitialized symbol 'status'"). Fix this by initializing the local
      variable "status" to 0.
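      
      The pattern behind the warning can be sketched with a hypothetical
      helper that, like the function described above, writes its out
      parameter only when it returns true. The names and the example status
      value are illustrative and not taken from libata.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for a helper that reports a status byte through an
 * out parameter, but only on the path where it returns true.
 */
static bool read_altstatus_sketch(bool have_register, uint8_t *status)
{
	assert(status != NULL);
	if (!have_register)
		return false; /* *status is left untouched on this path */
	*status = 0x50;       /* arbitrary example value */
	return true;
}

/* Caller: initialise status so it is defined even when the helper fails. */
static uint8_t lost_interrupt_status_sketch(bool have_register)
{
	uint8_t status = 0;

	(void)read_altstatus_sketch(have_register, &status);
	return status;
}
```

      Without the "= 0", the failing path would return an indeterminate
      value, which is what the static checker complains about.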
      
      Fixes: 03c0e84f ("ata: libata-sff: refactor ata_sff_altstatus()")
      Cc: stable@vger.kernel.org
      Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
  5. 03 Apr, 2022 8 commits
  6. 02 Apr, 2022 17 commits