1. 05 Sep, 2023 20 commits
    • gfs2: Stop using gfs2_make_fs_ro for withdraw · f66af88e
      Andreas Gruenbacher authored
      [   81.372851][ T5532] CPU: 1 PID: 5532 Comm: syz-executor.0 Not tainted 6.2.0-rc1-syzkaller-dirty #0
      [   81.382080][ T5532] Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/12/2023
      [   81.392343][ T5532] Call Trace:
      [   81.395654][ T5532]  <TASK>
      [   81.398603][ T5532]  dump_stack_lvl+0x1b1/0x290
      [   81.418421][ T5532]  gfs2_assert_warn_i+0x19a/0x2e0
      [   81.423480][ T5532]  gfs2_quota_cleanup+0x4c6/0x6b0
      [   81.428611][ T5532]  gfs2_make_fs_ro+0x517/0x610
      [   81.457802][ T5532]  gfs2_withdraw+0x609/0x1540
      [   81.481452][ T5532]  gfs2_inode_refresh+0xb2d/0xf60
      [   81.506658][ T5532]  gfs2_instantiate+0x15e/0x220
      [   81.511504][ T5532]  gfs2_glock_wait+0x1d9/0x2a0
      [   81.516352][ T5532]  do_sync+0x485/0xc80
      [   81.554943][ T5532]  gfs2_quota_sync+0x3da/0x8b0
      [   81.559738][ T5532]  gfs2_sync_fs+0x49/0xb0
      [   81.564063][ T5532]  sync_filesystem+0xe8/0x220
      [   81.568740][ T5532]  generic_shutdown_super+0x6b/0x310
      [   81.574112][ T5532]  kill_block_super+0x79/0xd0
      [   81.578779][ T5532]  deactivate_locked_super+0xa7/0xf0
      [   81.584064][ T5532]  cleanup_mnt+0x494/0x520
      [   81.593753][ T5532]  task_work_run+0x243/0x300
      [   81.608837][ T5532]  exit_to_user_mode_loop+0x124/0x150
      [   81.614232][ T5532]  exit_to_user_mode_prepare+0xb2/0x140
      [   81.619820][ T5532]  syscall_exit_to_user_mode+0x26/0x60
      [   81.625287][ T5532]  do_syscall_64+0x49/0xb0
      [   81.629710][ T5532]  entry_SYSCALL_64_after_hwframe+0x63/0xcd
      
      In this backtrace, gfs2_quota_sync() takes quota data references and
      then calls do_sync().  Function do_sync() encounters filesystem
      corruption and withdraws the filesystem, which (among other things) calls
      gfs2_quota_cleanup().  Function gfs2_quota_cleanup() wrongly assumes
      that nobody is holding any quota data references anymore, and destroys
      all quota data objects.  When gfs2_quota_sync() then resumes and
      dereferences the quota data objects it is holding, those objects are no
      longer there.
      
      Function gfs2_quota_cleanup() deals with resource deallocation and can
      easily be delayed until gfs2_put_super() in the case of a filesystem
      withdraw.  In fact, most of the other work gfs2_make_fs_ro() does is
      unnecessary during a withdraw as well, so change signal_our_withdraw()
      to skip gfs2_make_fs_ro() and perform the necessary steps directly
      instead.
      
      Thanks to Edward Adam Davis <eadavis@sina.com> for the initial patches.
      
      Link: https://lore.kernel.org/all/0000000000002b5e2405f14e860f@google.com
      Reported-by: syzbot+3f6a670108ce43356017@syzkaller.appspotmail.com
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
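The lifetime problem described in this commit message can be illustrated with a small user-space model. This is only a sketch, not gfs2 code: `struct qd`, `qd_get()`, `qd_put()` and `cleanup()` are hypothetical names, and the model assumes a plain integer reference count with no locking. It shows a cleanup pass that leaves still-referenced objects alone so that a later pass can tear them down once the holder has dropped its reference.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a quota data object; not the gfs2 structure. */
struct qd {
	int refcount;   /* number of active holders */
	bool destroyed; /* set once the object has been torn down */
};

static void qd_get(struct qd *qd)
{
	assert(!qd->destroyed); /* taking a reference on a dead object is a bug */
	qd->refcount++;
}

static void qd_put(struct qd *qd)
{
	assert(qd->refcount > 0);
	qd->refcount--;
}

/*
 * Tear down every object that has no holders.  Returns the number of
 * objects that had to be left alone because they are still referenced.
 */
static size_t cleanup(struct qd *objs, size_t n)
{
	size_t busy = 0;

	for (size_t i = 0; i < n; i++) {
		if (objs[i].destroyed)
			continue;
		if (objs[i].refcount > 0)
			busy++;                   /* defer: a holder may still use it */
		else
			objs[i].destroyed = true; /* safe: nobody can reach it */
	}
	return busy;
}
```

In this model, running `cleanup()` while a holder is active reports one busy object, and a second call after `qd_put()` finishes the job.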
    • gfs2: Free quota data objects synchronously · a475c5dd
      Andreas Gruenbacher authored
      In gfs2_quota_cleanup(), wait for the quota data objects to be freed
      before returning.  Otherwise, there is no guarantee that the quota data
      objects will be gone when their kmem cache is destroyed.
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
    • gfs2: Fix initial quota data refcount · bb73ae8f
      Andreas Gruenbacher authored
      Fix the refcount of quota data objects created directly by
      gfs2_quota_init(): those are placed into the in-memory quota "database"
      for eventual syncing to the main quota file, but they are not actively
      held and should thus have an initial refcount of 0.
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
    • gfs2: No more quota complaints after withdraw · fae2e73a
      Andreas Gruenbacher authored
      Once a filesystem is withdrawn, don't complain about quota changes
      that can't be synced to the main quota file anymore.
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
    • gfs2: Factor out duplicate quota data disposal code · faada74a
      Andreas Gruenbacher authored
      Rename gfs2_qd_dispose() to gfs2_qd_dispose_list().  Move some code
      duplicated in gfs2_qd_dispose_list() and gfs2_quota_cleanup() into a
      new gfs2_qd_dispose() function.
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
    • gfs2: Use gfs2_qd_dispose in gfs2_quota_cleanup · 961fe342
      Andreas Gruenbacher authored
      Change gfs2_quota_cleanup() to move the quota data objects to dispose of
      on a dispose list and call gfs2_qd_dispose() on that list, like
      gfs2_qd_shrink_scan() does, instead of disposing of the quota data
      objects directly.
      
      This may look a bit pointless by itself, but it will make more sense in
      combination with a fix that follows.
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
    • gfs2: Fix wrong quota shrinker return value · 6b0e9a5f
      Andreas Gruenbacher authored
      Function gfs2_qd_isolate must only return LRU_REMOVED when removing the
      item from the lru list; otherwise, the number of items on the list will
      go wrong.
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
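Why the return value matters can be seen in a toy model of an LRU walk. The sketch below is not the kernel's list_lru implementation: `struct toy_lru`, `toy_isolate()`, `toy_walk()` and the two-value `enum toy_status` are hypothetical, and the model assumes the walker adjusts its item counter purely from what the callback reports. Under that assumption the counter stays correct only if "removed" is reported exactly when the item was taken off the list.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical two-value status; the kernel's enum has more cases. */
enum toy_status { TOY_REMOVED, TOY_SKIP };

struct toy_item {
	bool on_list; /* still linked into the LRU */
	bool busy;    /* in use, must not be reclaimed */
};

struct toy_lru {
	struct toy_item *items;
	size_t n;        /* capacity of items[] */
	size_t nr_items; /* cached count of items on the list */
};

/* Report TOY_REMOVED only when the item really left the list. */
static enum toy_status toy_isolate(struct toy_item *it)
{
	if (it->busy)
		return TOY_SKIP;
	it->on_list = false;
	return TOY_REMOVED;
}

/* The walker trusts the callback's status to maintain nr_items. */
static size_t toy_walk(struct toy_lru *lru)
{
	size_t removed = 0;

	for (size_t i = 0; i < lru->n; i++) {
		if (!lru->items[i].on_list)
			continue;
		if (toy_isolate(&lru->items[i]) == TOY_REMOVED) {
			lru->nr_items--;
			removed++;
		}
	}
	return removed;
}

/* Recount from scratch, to compare against the cached counter. */
static size_t toy_count(const struct toy_lru *lru)
{
	size_t count = 0;

	for (size_t i = 0; i < lru->n; i++)
		if (lru->items[i].on_list)
			count++;
	return count;
}
```

If `toy_isolate()` returned `TOY_REMOVED` for a busy item without unlinking it, `nr_items` would drift below what `toy_count()` reports.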
    • gfs2: Rename SDF_DEACTIVATING to SDF_KILL · e7beb8b6
      Andreas Gruenbacher authored
      Rename the SDF_DEACTIVATING flag to SDF_KILL to make it more obvious
      that this relates to the kill_sb filesystem operation.
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
    • gfs2: Rename sd_{ glock => kill }_wait · 3c69c437
      Andreas Gruenbacher authored
      Rename sd_glock_wait to sd_kill_wait: we'll use it for other things
      related to "killing" a filesystem on unmount soon (kill_sb).
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
    • gfs2: Use qd_sbd more consequently · 481f6e7d
      Bob Peterson authored
      Before this patch many of the functions in quota.c got their superblock
      pointer, sdp, from the quota_data's glock pointer. That's silly because
      the qd already has its own pointer to the superblock (qd_sbd).
      
      This patch changes references to use that instead, eliminating a level
      of indirection.
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
    • gfs2: journal flush threshold fixes and cleanup · db77789b
      Andreas Gruenbacher authored
      Commit f07b3520 ("GFS2: Made logd daemon take into account log
      demand") changed gfs2_ail_flush_reqd() and gfs2_jrnl_flush_reqd() to
      take sd_log_blks_needed into account, but the checks in
      gfs2_log_commit() were not updated correspondingly.
      
      Once that is fixed, gfs2_jrnl_flush_reqd() and gfs2_ail_flush_reqd() can
      be used in gfs2_log_commit().  Make those two helpers available to
      gfs2_log_commit() by defining them above gfs2_log_commit().
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
    • gfs2: Fix logd wakeup on I/O error · b6b8f72a
      Andreas Gruenbacher authored
      When quotad detects an I/O error, it sets sd_log_error and then it wakes
      up logd to withdraw the filesystem.  However, logd doesn't wake up when
      sd_log_error is set.  Fix that.
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
    • gfs2: low-memory forced flush fixes · b74cd55a
      Andreas Gruenbacher authored
      First, function gfs2_ail_flush_reqd checks the SDF_FORCE_AIL_FLUSH flag
      to determine if an AIL flush should be forced in low-memory situations.
      However, it also immediately clears the flag, and when called repeatedly
      as in function gfs2_logd, the flag will be lost.  Fix that by pulling
      the SDF_FORCE_AIL_FLUSH flag check out of gfs2_ail_flush_reqd.
      
      Second, function gfs2_writepages sets the SDF_FORCE_AIL_FLUSH flag
      whether or not enough pages were written.  If enough pages could be
      written, flushing the AIL is unnecessary, though.
      
      Third, gfs2_writepages doesn't wake up logd after setting the
      SDF_FORCE_AIL_FLUSH flag, so it can take a long time for logd to react.
      It would be preferable to wake up logd, but that hurts the performance
      of some workloads and we don't quite understand why yet, so don't wake
      up logd for now.
      
      Fixes: b066a4ee ("gfs2: forcibly flush ail to relieve memory pressure")
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
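The first problem in this commit message, a predicate that consumes the flag it tests, can be shown in isolation. The following is a sketch under stated assumptions, not the gfs2 code: `struct toy_sb`, its fields, and all three functions are hypothetical, and a plain `bool` stands in for the SDF_FORCE_AIL_FLUSH bit. It contrasts a check that clears the flag as a side effect with a side-effect-free check whose flag is cleared only when the flush is actually carried out.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the per-filesystem state. */
struct toy_sb {
	bool force_flush;       /* models a "force AIL flush" request */
	unsigned ail_blocks;    /* blocks currently on the AIL */
	unsigned ail_threshold; /* flush when ail_blocks reaches this */
};

/* Problematic shape: asking the question also clears the request. */
static bool flush_reqd_consuming(struct toy_sb *sb)
{
	if (sb->force_flush) {
		sb->force_flush = false;
		return true;
	}
	return sb->ail_blocks >= sb->ail_threshold;
}

/* Safer shape: a pure predicate that can be called repeatedly. */
static bool flush_reqd(const struct toy_sb *sb)
{
	return sb->force_flush || sb->ail_blocks >= sb->ail_threshold;
}

/* The request is acknowledged only when the flush really happens. */
static void do_flush(struct toy_sb *sb)
{
	sb->force_flush = false;
	sb->ail_blocks = 0;
}
```

A caller that evaluates the condition more than once per loop iteration, for example once as a wait condition and once before acting, sees a consistent answer only with the second shape.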
    • gfs2: Switch to wait_event in gfs2_logd · 6df373b0
      Andreas Gruenbacher authored
      In gfs2_logd(), switch from an open-coded wait loop to
      wait_event_interruptible_timeout().
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
    • gfs2: conversion deadlock do_promote bypass · 66fa9912
      Bob Peterson authored
      Consider the following case:
      1. A glock is held in shared mode.
      2. A process requests the glock in exclusive mode (rename).
      3. Before the lock is granted, more processes (read / ls) request the
         glock in shared mode again.
      4. gfs2 sends a request to dlm for the lock in exclusive mode because
         that holder is at the head of the queue.
      5. Somehow the dlm request gets canceled, so dlm sends us back a
         response with state == LM_ST_SHARED and LM_OUT_CANCELED.  So at that
         point, the glock is still held in shared mode.
      6. finish_xmote gets called to process the response from dlm. It detects
         that the glock is not in the requested mode and no demote is in
         progress, so it moves the canceled holder to the tail of the queue
         and finds the new holder at the head of the queue.  That holder is
         requesting the glock in shared mode.
      7. finish_xmote calls do_xmote to transition the glock into shared mode,
         but the glock is already in shared mode and so do_xmote complains
         about that with:
      	GLOCK_BUG_ON(gl, gl->gl_state == gl->gl_target);
      
      Instead, in finish_xmote, after moving the canceled holder to the tail
      of the queue, check if any new holders can be granted.  Only call
      do_xmote to repeat the dlm request if the holder at the head of the
      queue is requesting the glock in a mode that is incompatible with the
      mode the glock is currently held in.
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
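The decision described in the last paragraph of this commit message can be sketched as a toy model. This is not the glock state machine: `enum toy_mode` has only two modes (gfs2 has more), the holder queue is a plain array of requested modes, and `toy_after_cancel()` is a hypothetical name. The sketch assumes that a request needs to go to the lock manager only when the head holder wants a mode other than the one already held.

```c
#include <stddef.h>

enum toy_mode { TOY_SHARED, TOY_EXCLUSIVE };
enum toy_action { TOY_GRANT, TOY_REQUEST };

/* Move the head of the queue (the canceled holder) to the tail. */
static void toy_rotate_to_tail(enum toy_mode *queue, size_t n)
{
	enum toy_mode head;

	if (n < 2)
		return;
	head = queue[0];
	for (size_t i = 1; i < n; i++)
		queue[i - 1] = queue[i];
	queue[n - 1] = head;
}

/*
 * After a canceled request: requeue the canceled holder, then decide what
 * to do for the new head.  The queue must hold at least one entry.
 */
static enum toy_action toy_after_cancel(enum toy_mode held,
					enum toy_mode *queue, size_t n)
{
	toy_rotate_to_tail(queue, n);

	/* Lock is already in the requested mode: nothing to ask for. */
	if (queue[0] == held)
		return TOY_GRANT;

	/* Otherwise a new request to the lock manager is needed. */
	return TOY_REQUEST;
}
```

With the queue from the scenario (exclusive at the head, shared requests behind it) and the lock held shared, the model grants the shared holder instead of issuing a request for a mode that is already held.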
    • gfs2: Remove LM_FLAG_PRIORITY flag · 0b93bac2
      Andreas Gruenbacher authored
      The last user of this flag was removed in commit b77b4a48 ("gfs2:
      Rework freeze / thaw logic").
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
    • gfs2: do_promote cleanup · de3e7f97
      Andreas Gruenbacher authored
      Change function do_promote to return true on success, and false
      otherwise.
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
    • gfs: Don't use GFP_NOFS in gfs2_unstuff_dinode · dc0b9435
      Andreas Gruenbacher authored
      Revert the rest of commit 220cca2a ("GFS2: Change truncate page
      allocation to be GFP_NOFS"):
      
      In gfs2_unstuff_dinode(), there is no need to carry out the page cache
      allocation under GFP_NOFS because inodes on the "regular" filesystem are
      never un-inlined under memory pressure, so switch back from
      find_or_create_page() to grab_cache_page() here as well.
      
      Inodes on the "metadata" filesystem can theoretically be un-inlined
      under memory pressure, but any page cache allocations in that context
      would happen in GFP_NOFS context because those inodes have
      inode->i_mapping->gfp_mask set to GFP_NOFS (see the previous patch).
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
    • gfs2: Use mapping->gfp_mask for metadata inodes · 111c7d27
      Andreas Gruenbacher authored
      Set mapping->gfp_mask to GFP_NOFS for all metadata inodes so that
      allocating pages in the address space of those inodes won't call back
      into the filesystem.  This allows switching back from
      find_or_create_page() to grab_cache_page() in two places.
      
      Partially reverts commit 220cca2a ("GFS2: Change truncate page
      allocation to be GFP_NOFS").
      
      Thanks to Dan Carpenter <dan.carpenter@linaro.org> for pointing out a
      Smatch static checker warning.
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
    • gfs2: increase usage of folio_next_index() helper · 5f02d168
      Minjie Du authored
      Simplify code pattern of 'folio->index + folio_nr_pages(folio)' by using
      the existing helper folio_next_index().
      Signed-off-by: Minjie Du <duminjie@vivo.com>
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
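The pattern being replaced is simple enough to model outside the kernel. The sketch below uses a hypothetical `struct toy_folio` with an explicit page count rather than the kernel's `struct folio`, and `toy_folio_next_index()` only mirrors the expression quoted in the commit message.

```c
/* Hypothetical stand-in: a run of nr_pages pages starting at index. */
struct toy_folio {
	unsigned long index;    /* page-cache index of the first page */
	unsigned long nr_pages; /* number of pages in the folio */
};

/* Index of the first page after this folio. */
static unsigned long toy_folio_next_index(const struct toy_folio *folio)
{
	return folio->index + folio->nr_pages;
}
```

Naming the expression makes loops that step from one folio to the next easier to read and harder to get subtly wrong.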
  2. 08 Aug, 2023 3 commits
  3. 07 Aug, 2023 12 commits
    • Merge tag 'x86_bugs_srso' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 138bcddb
      Linus Torvalds authored
      Pull x86/srso fixes from Borislav Petkov:
       "Add a mitigation for the speculative RAS (Return Address Stack)
        overflow vulnerability on AMD processors.
      
        In short, this is yet another issue where userspace poisons a
        microarchitectural structure which can then be used to leak privileged
        information through a side channel"
      
      * tag 'x86_bugs_srso' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/srso: Tie SBPB bit setting to microcode patch detection
        x86/srso: Add a forgotten NOENDBR annotation
        x86/srso: Fix return thunks in generated code
        x86/srso: Add IBPB on VMEXIT
        x86/srso: Add IBPB
        x86/srso: Add SRSO_NO support
        x86/srso: Add IBPB_BRTYPE support
        x86/srso: Add a Speculative RAS Overflow mitigation
        x86/bugs: Increase the x86 bugs vector size to two u32s
    • Merge tag 'wq-for-6.5-rc5-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq · 14f9643d
      Linus Torvalds authored
      Pull workqueue fixes from Tejun Heo:
      
       - The recently added cpu_intensive auto detection and warning mechanism
         was spuriously triggered on slow CPUs.
      
         While not causing serious issues, it's still a nuisance and can cause
         unintended concurrency management behaviors.
      
         Relax the threshold on machines with lower BogoMIPS. While BogoMIPS
         is not an accurate measure of performance by most measures, we don't
         have to be accurate and it has rough but strong enough correlation.
      
       - A correction in Kconfig help text
      
      * tag 'wq-for-6.5-rc5-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
        workqueue: Scale up wq_cpu_intensive_thresh_us if BogoMIPS is below 4000
        workqueue: Fix cpu_intensive_thresh_us name in help text
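The threshold relaxation described in the first bullet of this merge could look roughly like the following. This is a sketch under assumptions, not the workqueue code: the function name is hypothetical, and the inverse-proportional scaling and the one-second cap are assumed for illustration. Only the 4000 BogoMIPS cutoff comes from the commit titles above.

```c
/*
 * Scale a CPU-intensive detection threshold (in microseconds) up on slow
 * machines.  Machines at or above the reference speed keep the base value.
 */
static unsigned long toy_scale_thresh_us(unsigned long base_us,
					 unsigned long bogomips)
{
	const unsigned long ref_bogomips = 4000; /* cutoff named in the commit */
	const unsigned long cap_us = 1000000;    /* assumed upper bound: 1 s */
	unsigned long scaled;

	if (bogomips == 0)
		bogomips = 1; /* avoid dividing by zero */
	if (bogomips >= ref_bogomips)
		return base_us;

	/* Slower CPU: allow proportionally more time before warning. */
	scaled = base_us * ref_bogomips / bogomips;
	return scaled < cap_us ? scaled : cap_us;
}
```

With a base of 10000 microseconds, a machine at half the reference speed gets twice the threshold, while a very slow machine is clamped to the cap.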
    • Merge tag 'tpmdd-v6.5-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd · 8043e222
      Linus Torvalds authored
      Pull tpm fixes from Jarkko Sakkinen:
       "A few more bug fixes"
      
      * tag 'tpmdd-v6.5-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd:
        tpm/tpm_tis: Disable interrupts for Lenovo P620 devices
        tpm: Disable RNG for all AMD fTPMs
        sysctl: set variable key_sysctls storage-class-specifier to static
        tpm/tpm_tis: Disable interrupts for TUXEDO InfinityBook S 15/17 Gen7
    • tpm/tpm_tis: Disable interrupts for Lenovo P620 devices · e117e7ad
      Jonathan McDowell authored
      The Lenovo ThinkStation P620 suffers from an irq storm issue like various
      other Lenovo machines, so add an entry for it to tpm_tis_dmi_table and
      force polling.
      
      It is worth noting that 481c2d14 (tpm,tpm_tis: Disable interrupts after
      1000 unhandled IRQs) does not seem to fix the problem on this machine, but
      setting 'tpm_tis.interrupts=0' on the kernel command line does.
      
      [jarkko@kernel.org: truncated the commit ID in the description to 12
      characters]
      Cc: stable@vger.kernel.org # v6.4+
      Fixes: e644b2f4 ("tpm, tpm_tis: Enable interrupt test")
      Signed-off-by: Jonathan McDowell <noodles@meta.com>
      Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
      Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
    • tpm: Disable RNG for all AMD fTPMs · 554b841d
      Mario Limonciello authored
      The TPM RNG functionality is not necessary for entropy when the CPU
      already supports the RDRAND instruction. The TPM RNG functionality
      was previously disabled on a subset of AMD fTPM series, but reports
      continue to show stutter on some systems that has been root-caused to
      the TPM RNG functionality.

      Disable TPM RNG use for all AMD fTPMs, whether or not their firmware
      version claims to have the fix. To accomplish this, move the detection
      into part of the TPM CRB registration and add a flag indicating that
      the TPM should opt-out of registration to hwrng.
      
      Cc: stable@vger.kernel.org # 6.1.y+
      Fixes: b006c439 ("hwrng: core - start hwrng kthread also for untrusted sources")
      Fixes: f1324bbc ("tpm: disable hwrng for fTPM on some AMD designs")
      Reported-by: daniil.stas@posteo.net
      Closes: https://bugzilla.kernel.org/show_bug.cgi?id=217719
      Reported-by: bitlord0xff@gmail.com
      Closes: https://bugzilla.kernel.org/show_bug.cgi?id=217212
      Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
      Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
      Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
    • sysctl: set variable key_sysctls storage-class-specifier to static · 0de030b3
      Tom Rix authored
      smatch reports
      security/keys/sysctl.c:12:18: warning: symbol
        'key_sysctls' was not declared. Should it be static?
      
      This variable is only used in its defining file, so it should be static.
      Signed-off-by: Tom Rix <trix@redhat.com>
      Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
    • tpm/tpm_tis: Disable interrupts for TUXEDO InfinityBook S 15/17 Gen7 · 0b15afc9
      Takashi Iwai authored
      TUXEDO InfinityBook S 15/17 Gen7 suffers from an IRQ problem on
      tpm_tis like a few other laptops.  Add an entry for the workaround.
      
      Cc: stable@vger.kernel.org
      Fixes: e644b2f4 ("tpm, tpm_tis: Enable interrupt test")
      Link: https://bugzilla.suse.com/show_bug.cgi?id=1213645
      Signed-off-by: Takashi Iwai <tiwai@suse.de>
      Acked-by: Jarkko Sakkinen <jarkko@kernel.org>
      Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
    • Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm · a027b2ec
      Linus Torvalds authored
      Pull kvm fixes from Paolo Bonzini:
       "x86:
      
         - Fix SEV race condition
      
        ARM:
      
         - Fixes for the configuration of SVE/SME traps when hVHE mode is in
           use
      
         - Allow use of pKVM on systems with FF-A implementations that are
           v1.0 compatible
      
         - Request/release percpu IRQs (arch timer, vGIC maintenance)
           correctly when pKVM is in use
      
         - Fix function prototype after __kvm_host_psci_cpu_entry() rename
      
         - Skip to the next instruction when emulating writes to TCR_EL1 on
           AmpereOne systems
      
        Selftests:
      
         - Fix missing include"
      
      * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
        selftests/rseq: Fix build with undefined __weak
        KVM: SEV: remove ghcb variable declarations
        KVM: SEV: only access GHCB fields once
        KVM: SEV: snapshot the GHCB before accessing it
        KVM: arm64: Skip instruction after emulating write to TCR_EL1
        KVM: arm64: fix __kvm_host_psci_cpu_entry() prototype
        KVM: arm64: Fix resetting SME trap values on reset for (h)VHE
        KVM: arm64: Fix resetting SVE trap values on reset for hVHE
        KVM: arm64: Use the appropriate feature trap register when activating traps
        KVM: arm64: Helper to write to appropriate feature trap register based on mode
        KVM: arm64: Disable SME traps for (h)VHE at setup
        KVM: arm64: Use the appropriate feature trap register for SVE at EL2 setup
        KVM: arm64: Factor out code for checking (h)VHE mode into a macro
        KVM: arm64: Rephrase percpu enable/disable tracking in terms of hyp
        KVM: arm64: Fix hardware enable/disable flows for pKVM
        KVM: arm64: Allow pKVM on v1.0 compatible FF-A implementations
    • Merge tag 'mmc-v6.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc · 016ce297
      Linus Torvalds authored
      Pull MMC fixes from Ulf Hansson:
      
       - moxart: Fix big-endian conversion for SCR structure
      
       - sdhci-f-sdh30: Replace with sdhci_pltfm to fix PM support
      
      * tag 'mmc-v6.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc:
        mmc: sdhci-f-sdh30: Replace with sdhci_pltfm
        mmc: moxart: read scr register without changing byte order
    • gfs2: Don't use filemap_splice_read · 0be84321
      Bob Peterson authored
      Starting with patch 2cb1e089, gfs2 started using the new function
      filemap_splice_read rather than the old (and subsequently deleted)
      function generic_file_splice_read.
      
      filemap_splice_read works by taking references to a number of folios in
      the page cache and splicing those folios into a pipe.  The folios are
      then read from the pipe and the folio references are dropped.  This can
      take an arbitrary amount of time.  We cannot allow that in gfs2 because
      those folio references will pin the inode glock to the node and prevent
      it from being demoted, which can lead to cluster-wide deadlocks.
      
      Instead, use copy_splice_read.
      
      (In addition, the old generic_file_splice_read called into ->read_iter,
      which called gfs2_file_read_iter, which took the inode glock during the
      operation.  The new filemap_splice_read interface does not take the
      inode glock anymore.  This is fixable, but it still wouldn't prevent
      cluster-wide deadlocks.)
      
      Fixes: 2cb1e089 ("splice: Use filemap_splice_read() instead of generic_file_splice_read()")
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
    • gfs2: Fix freeze consistency check in gfs2_trans_add_meta · 2cbd8064
      Andreas Gruenbacher authored
      Function gfs2_trans_add_meta() checks for the SDF_FROZEN flag to make
      sure that no buffers are added to a transaction while the filesystem is
      frozen.  With the recent freeze/thaw rework, the SDF_FROZEN flag is
      cleared after thaw_super() is called, which is sufficient for
      serializing freeze/thaw.
      
      However, other filesystem operations started after thaw_super() may now
      be calling gfs2_trans_add_meta() before the SDF_FROZEN flag is cleared,
      which will trigger the SDF_FROZEN check in gfs2_trans_add_meta().  Fix
      that by checking the s_writers.frozen state instead.
      
      In addition, make sure not to call gfs2_assert_withdraw() with the
      sd_log_lock spin lock held.  Check for a withdrawn filesystem before
      checking for a frozen filesystem, and don't pin/add buffers to the
      current transaction in case of a failure in either case.
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
    • x86/srso: Tie SBPB bit setting to microcode patch detection · 5a15d834
      Borislav Petkov (AMD) authored
      The SBPB bit in MSR_IA32_PRED_CMD is supported only after a microcode
      patch has been applied so set X86_FEATURE_SBPB only then. Otherwise,
      guests would attempt to set that bit and #GP on the MSR write.
      
      While at it, make SMT detection more robust as some guests - depending
      on how and what CPUID leafs they report - lead to cpu_smt_control
      getting set to CPU_SMT_NOT_SUPPORTED but SRSO_NO should be set for any
      guest incarnation where one simply cannot do SMT, for whatever reason.
      
      Fixes: fb3bd914 ("x86/srso: Add a Speculative RAS Overflow mitigation")
      Reported-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Reported-by: Salvatore Bonaccorso <carnil@debian.org>
      Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
  4. 06 Aug, 2023 5 commits
    • Linux 6.5-rc5 · 52a93d39
      Linus Torvalds authored
    • Merge tag 'v6.5-rc5.vfs.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs · 0108963f
      Linus Torvalds authored
      Pull vfs fixes from Christian Brauner:
      
       - Fix a wrong check for O_TMPFILE during RESOLVE_CACHED lookup
      
       - Clean up directory iterators and clarify file_needs_f_pos_lock()
      
      * tag 'v6.5-rc5.vfs.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
        fs: rely on ->iterate_shared to determine f_pos locking
        vfs: get rid of old '->iterate' directory operation
        proc: fix missing conversion to 'iterate_shared'
        open: make RESOLVE_CACHED correctly test for O_TMPFILE
    • fs: rely on ->iterate_shared to determine f_pos locking · 7d84d1b9
      Christian Brauner authored
      Now that we removed ->iterate we don't need to check for either
      ->iterate or ->iterate_shared in file_needs_f_pos_lock(). Simply check
      for ->iterate_shared instead. This will tell us whether we need to
      unconditionally take the lock. Not only does this allow us to avoid
      checking f_inode's mode, it also clearly shows that we're locking
      because of readdir.
      Signed-off-by: Christian Brauner <brauner@kernel.org>
    • vfs: get rid of old '->iterate' directory operation · 3e327154
      Linus Torvalds authored
      All users now just use '->iterate_shared()', which only takes the
      directory inode lock for reading.
      
      Filesystems that never got converted to shared mode now instead use a
      wrapper that drops the lock, re-takes it in write mode, calls the old
      function, and then downgrades the lock back to read mode.
      
      This way the VFS layer and other callers no longer need to care about
      filesystems that never got converted to the modern era.
      
      The filesystems that use the new wrapper are ceph, coda, exfat, jfs,
      ntfs, ocfs2, overlayfs, and vboxsf.
      
      Honestly, several of them look like they really could just iterate their
      directories in shared mode and skip the wrapper entirely, but the point
      of this change is to not change semantics or fix filesystems that
      haven't been fixed in the last 7+ years, but to finally get rid of the
      dual iterators.
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Christian Brauner <brauner@kernel.org>
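The lock sequence the wrapper follows (drop the read lock, take the write lock, call the old function, downgrade back to read) can be traced with a single-threaded toy model. This is not the VFS wrapper: `struct toy_dir`, `toy_legacy_iterate()` and `toy_wrap_legacy_iterate()` are hypothetical, and the "lock" is just a state variable, so the sketch shows the ordering of steps and nothing about real concurrency.

```c
#include <assert.h>

enum toy_lock { TOY_UNLOCKED, TOY_READ, TOY_WRITE };

struct toy_dir {
	enum toy_lock lock;           /* current state of the inode lock */
	enum toy_lock seen_by_legacy; /* state observed inside the old op */
};

/* Stand-in for an old-style iterator that expects exclusive locking. */
static int toy_legacy_iterate(struct toy_dir *dir)
{
	dir->seen_by_legacy = dir->lock;
	return 0;
}

/* Called with the lock held for reading; returns with it held for reading. */
static int toy_wrap_legacy_iterate(struct toy_dir *dir,
				   int (*old_iterate)(struct toy_dir *))
{
	int ret;

	assert(dir->lock == TOY_READ);

	dir->lock = TOY_UNLOCKED; /* drop the shared lock */
	dir->lock = TOY_WRITE;    /* re-take it exclusively */
	ret = old_iterate(dir);   /* old code runs as it always did */
	dir->lock = TOY_READ;     /* downgrade for the caller */

	return ret;
}
```

In a real lock there is a window between dropping and re-taking in which the directory can change, which is one reason the wrapper is a compatibility shim rather than a substitute for converting a filesystem to shared-mode iteration.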
    • proc: fix missing conversion to 'iterate_shared' · 0a2c2baa
      Linus Torvalds authored
      I'm looking at the directory handling due to the discussion about f_pos
      locking (see commit 79796425: "file: reinstate f_pos locking
      optimization for regular files"), and wanting to clean that up.
      
      And one source of ugliness is how we were supposed to move filesystems
      over to the '->iterate_shared()' function that only takes the inode lock
      for reading many many years ago, but several filesystems still use the
      bad old '->iterate()' that takes the inode lock for exclusive access.
      
      See commit 61922694 ("introduce a parallel variant of ->iterate()")
      that also added some documentation stating
      
            Old method is only used if the new one is absent; eventually it will
            be removed.  Switch while you still can; the old one won't stay.
      
      and that was back in April 2016.  Here we are, many years later, and the
      old version is still clearly sadly alive and well.
      
      Now, some of those old style iterators are probably just because the
      filesystem may end up having per-inode mutable data that it uses for
      iterating a directory, but at least one case is just a mistake.
      
      Al switched over most filesystems to use '->iterate_shared()' back when
      it was introduced.  In particular, the /proc filesystem was converted as
      one of the first ones in commit f50752ea ("switch all procfs
      directories ->iterate_shared()").
      
      But later, one new user of '->iterate()' was re-introduced by
      commit 6d9c939d ("procfs: add smack subdir to attrs").
      
      And that's clearly not what we wanted, since that new case just uses the
      same 'proc_pident_readdir()' and 'proc_pident_lookup()' helper functions
      that other /proc pident directories use, and they are most definitely
      safe to use with the inode lock held shared.
      
      So just fix it.
      
      This still leaves a fair number of oddball filesystems using the
      old-style directory iterator (ceph, coda, exfat, jfs, ntfs, ocfs2,
      overlayfs, and vboxsf), but at least we don't have any remaining in the
      core filesystems.
      
      I'm going to add a wrapper function that just drops the read-lock and
      takes it as a write lock, so that we can clean up the core vfs layer and
      make all the ugly 'this filesystem needs exclusive inode locking' be
      just filesystem-internal warts.
      
      I just didn't want to make that conversion when we still had a core user
      left.
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Christian Brauner <brauner@kernel.org>