1. 16 May, 2023 5 commits
    • ext2: Move direct-io to use iomap · fb5de435
      Ritesh Harjani (IBM) authored
      This patch converts the ext2 direct-io path to the iomap interface.
      - This also takes care of the DIO_SKIP_HOLES part, in which we return
        -ENOTBLK from ext2_iomap_begin() if the write is done on a hole.
      - This falls back to buffered-io in case of DIO_SKIP_HOLES, in case of
        a partial write, or if any error is detected in ext2_iomap_end().
        We try to return -ENOTBLK in such cases.
      - For any unaligned or extending DIO writes, we pass
        IOMAP_DIO_FORCE_WAIT flag to ensure synchronous writes.
      - For extending writes we set IOMAP_F_DIRTY in ext2_iomap_begin because
        otherwise with dsync writes on devices that support FUA, generic_write_sync
        won't be called and we might miss inode metadata updates.
      - Since ext2 already uses the _nolock variant of sync write, there is
        no inode lock problem with iomap in this patch.
      - ext2_iomap_ops are now shared by the DIO, DAX & fiemap paths.
      Tested-by: Disha Goel <disgoel@linux.ibm.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
      Signed-off-by: Jan Kara <jack@suse.cz>
      Message-Id: <610b672a52f2a7ff6dc550fd14d0f995806232a5.1682069716.git.ritesh.list@gmail.com>
    • ext2: Use generic_buffers_fsync() implementation · d0530704
      Ritesh Harjani (IBM) authored
      The next patch converts ext2 to use the iomap interface for DIO.
      The iomap layer can call generic_write_sync() -> ext2_fsync() from
      iomap_dio_complete while still holding the inode_lock().
      
      Now writeback from other paths doesn't need inode_lock().
      It seems there is also no need for an inode_lock() in
      sync_mapping_buffers(). It uses its own mapping->private_lock
      for its buffer list handling.
      Hence this patch is in preparation to move ext2 to iomap.
      This uses generic_buffers_fsync() which does not take any inode_lock()
      in ext2_fsync().
      Tested-by: Disha Goel <disgoel@linux.ibm.com>
      Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
      Signed-off-by: Jan Kara <jack@suse.cz>
      Message-Id: <76d206a464574ff91db25bc9e43479b51ca7e307.1682069716.git.ritesh.list@gmail.com>
    • ext4: Use generic_buffers_fsync_noflush() implementation · 5b5b4ff8
      Ritesh Harjani (IBM) authored
      When ext4 was converted to iomap for dio, it copied the
      __generic_file_fsync implementation to avoid taking inode_lock and so
      avoid any deadlock (since iomap takes an inode_lock while calling
      generic_write_sync()).
      
      The previous patch already added generic_buffers_fsync*() which does not
      take any inode_lock(). Hence kill the redundant code and use
      generic_buffers_fsync_noflush() function instead.
      Tested-by: Disha Goel <disgoel@linux.ibm.com>
      Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
      Signed-off-by: Jan Kara <jack@suse.cz>
      Message-Id: <b43d4bb4403061ed86510c9587673e30a461ba14.1682069716.git.ritesh.list@gmail.com>
    • fs/buffer.c: Add generic_buffers_fsync*() implementation · 31b2ebc0
      Ritesh Harjani (IBM) authored
      Some of the higher layers, like iomap, take inode_lock() when calling
      generic_write_sync().
      Also writeback already happens from other paths without inode lock,
      so it's difficult to say that we really need sync_mapping_buffers() to
      take any inode locking here. Having said that, let's add
      generic_buffers_fsync/_noflush() implementation in buffer.c with no
      inode_lock/unlock() for now so that filesystems like ext2 and
      ext4's nojournal mode can use it.
      
      When ext4 was converted to iomap for direct-io, it already copied its
      own variant of __generic_file_fsync() without the lock.
      
      This patch adds generic_buffers_fsync()
      & generic_buffers_fsync_noflush() implementations for use in filesystems
      like ext2 & ext4 respectively.
      
      Later we can review other filesystems as well to see if we can make
      generic_buffers_fsync/_noflush(), which does not take any inode_lock(),
      the default path.
      Tested-by: Disha Goel <disgoel@linux.ibm.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
      Signed-off-by: Jan Kara <jack@suse.cz>
      Message-Id: <d573408ac8408627d23a3d2d166e748c172c4c9e.1682069716.git.ritesh.list@gmail.com>
    • ext2/dax: Fix ext2_setsize when len is page aligned · fcced95b
      Ritesh Harjani (IBM) authored
      The PAGE_ALIGN(x) macro gives the next highest value which is a
      multiple of the page size. But if x is already page aligned then it
      simply returns x. So, if the x passed is 0 in the dax_zero_range()
      function, the length gets passed as 0 to ->iomap_begin().
      
      In ext2 it then calls ext2_get_blocks() with max_blocks as 0 and hits
      the BUG_ON here in ext2_get_blocks():
      	BUG_ON(maxblocks == 0);
      
      Instead we should be calling dax_truncate_page() here which takes
      care of it. i.e. it only calls dax_zero_range if the offset is not
      page/block aligned.
      
      This can be easily triggered with the following on an fsdax-mounted
      pmem device:
      
      dd if=/dev/zero of=file count=1 bs=512
      truncate -s 0 file
      
      [79.525838] EXT2-fs (pmem0): DAX enabled. Warning: EXPERIMENTAL, use at your own risk
      [79.529376] ext2 filesystem being mounted at /mnt1/test supports timestamps until 2038 (0x7fffffff)
      [93.793207] ------------[ cut here ]------------
      [93.795102] kernel BUG at fs/ext2/inode.c:637!
      [93.796904] invalid opcode: 0000 [#1] PREEMPT SMP PTI
      [93.798659] CPU: 0 PID: 1192 Comm: truncate Not tainted 6.3.0-rc2-xfstests-00056-g131086faa369 #139
      [93.806459] RIP: 0010:ext2_get_blocks.constprop.0+0x524/0x610
      <...>
      [93.835298] Call Trace:
      [93.836253]  <TASK>
      [93.837103]  ? lock_acquire+0xf8/0x110
      [93.838479]  ? d_lookup+0x69/0xd0
      [93.839779]  ext2_iomap_begin+0xa7/0x1c0
      [93.841154]  iomap_iter+0xc7/0x150
      [93.842425]  dax_zero_range+0x6e/0xa0
      [93.843813]  ext2_setsize+0x176/0x1b0
      [93.845164]  ext2_setattr+0x151/0x200
      [93.846467]  notify_change+0x341/0x4e0
      [93.847805]  ? lock_acquire+0xf8/0x110
      [93.849143]  ? do_truncate+0x74/0xe0
      [93.850452]  ? do_truncate+0x84/0xe0
      [93.851739]  do_truncate+0x84/0xe0
      [93.852974]  do_sys_ftruncate+0x2b4/0x2f0
      [93.854404]  do_syscall_64+0x3f/0x90
      [93.855789]  entry_SYSCALL_64_after_hwframe+0x72/0xdc
      
      CC: stable@vger.kernel.org
      Fixes: 2aa3048e ("iomap: switch iomap_zero_range to use iomap_iter")
      Reviewed-by: Darrick J. Wong <djwong@kernel.org>
      Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
      Signed-off-by: Jan Kara <jack@suse.cz>
      Message-Id: <046a58317f29d9603d1068b2bbae47c2332c17ae.1682069716.git.ritesh.list@gmail.com>
  2. 14 May, 2023 13 commits
  3. 13 May, 2023 17 commits
  4. 12 May, 2023 5 commits
    • x86/retbleed: Fix return thunk alignment · 9a48d604
      Borislav Petkov (AMD) authored
      SYM_FUNC_START_LOCAL_NOALIGN() adds an endbr leading to this layout
      (leaving only the last 2 bytes of the address):
      
        3bff <zen_untrain_ret>:
        3bff:       f3 0f 1e fa             endbr64
        3c03:       f6                      test   $0xcc,%bl
      
        3c04 <__x86_return_thunk>:
        3c04:       c3                      ret
        3c05:       cc                      int3
        3c06:       0f ae e8                lfence
      
      However, "the RET at __x86_return_thunk must be on a 64 byte boundary,
      for alignment within the BTB."
      
      Use SYM_START instead.
      Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
      Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: <stable@kernel.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Merge tag 'for-6.4-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux · 76c7f887
      Linus Torvalds authored
      Pull more btrfs fixes from David Sterba:
      
       - fix incorrect number of bitmap entries for space cache if loading is
         interrupted by some error
      
       - fix backref walking, this breaks a mode of LOGICAL_INO_V2 ioctl that
         is used in deduplication tools
      
       - zoned mode fixes:
            - properly finish zone reserved for relocation
            - correctly calculate super block zone end on ZNS
            - properly initialize new extent buffer for redirty
      
       - make mount option clear_cache work with block-group-tree, to rebuild
         free-space-tree instead of temporarily disabling it that would lead
         to a forced read-only mount
      
       - fix alignment check for offset when printing extent item
      
      * tag 'for-6.4-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
        btrfs: make clear_cache mount option to rebuild FST without disabling it
        btrfs: zero the buffer before marking it dirty in btrfs_redirty_list_add
        btrfs: zoned: fix full zone super block reading on ZNS
        btrfs: zoned: zone finish data relocation BG with last IO
        btrfs: fix backref walking not returning all inode refs
        btrfs: fix space cache inconsistency after error loading it from disk
        btrfs: print-tree: parent bytenr must be aligned to sector size
    • Merge tag '6.4-rc1-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6 · fd88f147
      Linus Torvalds authored
      Pull cifs client fixes from Steve French:
      
       - fix for copy_file_range bug for very large files that are multiples
         of rsize
      
       - do not ignore "isolated transport" flag if set on share
      
       - set rasize default better
      
       - three fixes related to shutdown and freezing (fixes 4 xfstests, and
         closes deferred handles faster in some places that were missed)
      
      * tag '6.4-rc1-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
        cifs: release leases for deferred close handles when freezing
        smb3: fix problem remounting a share after shutdown
        SMB3: force unmount was failing to close deferred close files
        smb3: improve parallel reads of large files
        do not reuse connection if share marked as isolated
        cifs: fix pcchunk length type in smb2_copychunk_range
    • Merge tag 'vfs/v6.4-rc1/pipe' of gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs · df8c2d13
      Linus Torvalds authored
      Pull vfs fix from Christian Brauner:
       "During the pipe nonblock rework the check for both O_NONBLOCK and
        IOCB_NOWAIT was dropped. Both checks need to be performed to ensure
        that files without O_NONBLOCK but IOCB_NOWAIT don't block when writing
        to or reading from a pipe.
      
        This just contains the fix adding the check for IOCB_NOWAIT back in"
      
      * tag 'vfs/v6.4-rc1/pipe' of gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs:
        pipe: check for IOCB_NOWAIT alongside O_NONBLOCK
    • Merge tag 'io_uring-6.4-2023-05-12' of git://git.kernel.dk/linux · 584dc5db
      Linus Torvalds authored
      Pull io_uring fix from Jens Axboe:
       "Just a single fix making io_uring_sqe_cmd() available regardless of
        CONFIG_IO_URING, fixing a regression introduced during the merge
        window if nvme was selected but io_uring was not"
      
      * tag 'io_uring-6.4-2023-05-12' of git://git.kernel.dk/linux:
        io_uring: make io_uring_sqe_cmd() unconditionally available