1. 23 Aug, 2023 5 commits
  2. 11 Aug, 2023 3 commits
    • erofs: boost negative xattr lookup with bloom filter · fd73a439
      Jingbo Xu authored
      Optimise the negative xattr lookup with a bloom filter.
      
      The bit values in the bloom filter map have reversed semantics for
      compatibility.  That is, a bit value of 0 indicates possible existence,
      while a bit value of 1 indicates the absence of the corresponding xattr.
      
      The initial version is _only_ enabled when xattr_filter_reserved is
      zero.  The filter map internals may change in the future, in which case
      the reserved flag will be set non-zero and we won't need to bother with
      the compatible bits again at that time.  For now, disable the optimization if
      this reserved flag is non-zero.
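
The reversed bit semantics and the reserved-flag gate described above can be sketched as follows. This is a minimal user-space illustration, not the kernel code: the function name, the parameter names, and the idea of passing in a precomputed bit index are all assumptions made for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define XATTR_FILTER_BITS 32u /* width of the per-inode filter map */

/*
 * Hypothetical helper: returns true when the filter proves the xattr is
 * absent, so the on-disk xattr walk can be skipped.  A set bit means
 * "absent"; a clear bit only means "possibly present", so the full
 * lookup still has to run in that case.
 */
static bool xattr_filter_rules_out(uint32_t name_filter,
                                   uint8_t filter_reserved,
                                   uint32_t hashbit)
{
    assert(hashbit < XATTR_FILTER_BITS);

    /* Non-zero reserved flag: the map layout is unknown, do not trust it. */
    if (filter_reserved != 0)
        return false;

    return ((name_filter >> hashbit) & 1u) != 0;
}
```

With this convention an all-zero filter field never rules anything out, so an image that leaves the field zeroed simply falls back to the ordinary lookup.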
      Signed-off-by: Jingbo Xu <jefflexu@linux.alibaba.com>
      Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com>
      Reviewed-by: Chao Yu <chao@kernel.org>
      Link: https://lore.kernel.org/r/20230722094538.11754-3-jefflexu@linux.alibaba.com
      Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
    • erofs: update on-disk format for xattr name filter · 3f339920
      Jingbo Xu authored
      The xattr name bloom filter feature is going to be introduced to speed
      up the negative xattr lookup, e.g. system.posix_acl_[access|default]
      lookup when running "ls -lR" workload.
      
      There are some commonly used extended attributes (n) and the total
      number of these is approximately 30.
      
      	trusted.overlay.opaque
      	trusted.overlay.redirect
      	trusted.overlay.origin
      	trusted.overlay.impure
      	trusted.overlay.nlink
      	trusted.overlay.upper
      	trusted.overlay.metacopy
      	trusted.overlay.protattr
      	user.overlay.opaque
      	user.overlay.redirect
      	user.overlay.origin
      	user.overlay.impure
      	user.overlay.nlink
      	user.overlay.upper
      	user.overlay.metacopy
      	user.overlay.protattr
      	security.evm
      	security.ima
      	security.selinux
      	security.SMACK64
      	security.SMACK64IPIN
      	security.SMACK64IPOUT
      	security.SMACK64EXEC
      	security.SMACK64TRANSMUTE
      	security.SMACK64MMAP
      	security.apparmor
      	security.capability
      	system.posix_acl_access
      	system.posix_acl_default
      	user.mime_type
      
      Given the number of bits of the bloom filter (m) is 32, the optimal
      value for the number of the hash functions (k) is 1 (ln2 * m/n = 0.74).
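
For reference, the arithmetic behind that choice can be written out using the standard bloom filter estimate for the optimal number of hash functions. The rounding step to the nearest positive integer is spelled out here as an assumption about how 0.74 becomes 1.

```latex
k_{\mathrm{opt}} = \frac{m}{n}\,\ln 2
                 = \frac{32}{30} \times 0.693
                 \approx 0.74
\quad\Longrightarrow\quad k = 1
```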
      
      The single hash function is implemented as:
      
      	xxh32(name, strlen(name), EROFS_XATTR_FILTER_SEED + index)
      
      where `index` represents the index of corresponding predefined short name
      prefix, while `name` represents the name string after stripping the above
      predefined name prefix.
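
A self-contained sketch of how a name could be mapped to a filter bit is shown below. It is an illustration under stated assumptions: `demo_xxh32()` is a plain user-space implementation of the public XXH32 algorithm rather than the kernel's `xxh32()`, `xattr_filter_bit()` is a hypothetical helper, and reducing the hash to a bit index by masking with 31 is an assumption, since the text only says the names are mapped into a 32-bit filter. The seed value is the one quoted in this commit message.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define P1 2654435761u
#define P2 2246822519u
#define P3 3266489917u
#define P4 668265263u
#define P5 374761393u

#define DEMO_XATTR_FILTER_SEED 0x25BBE08Fu /* value quoted in the commit message */

static uint32_t rotl32(uint32_t x, int r) { return (x << r) | (x >> (32 - r)); }

/* Read 4 bytes as a little-endian 32-bit value. */
static uint32_t read32le(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

static uint32_t xxh32_round(uint32_t acc, uint32_t input)
{
    acc += input * P2;
    return rotl32(acc, 13) * P1;
}

/* Plain XXH32 over a byte buffer. */
static uint32_t demo_xxh32(const void *input, size_t len, uint32_t seed)
{
    const uint8_t *p = input, *end = p + len;
    uint32_t h;

    if (len >= 16) {
        const uint8_t *limit = end - 16;
        uint32_t v1 = seed + P1 + P2, v2 = seed + P2, v3 = seed, v4 = seed - P1;

        do { /* consume 16-byte stripes with four accumulators */
            v1 = xxh32_round(v1, read32le(p)); p += 4;
            v2 = xxh32_round(v2, read32le(p)); p += 4;
            v3 = xxh32_round(v3, read32le(p)); p += 4;
            v4 = xxh32_round(v4, read32le(p)); p += 4;
        } while (p <= limit);
        h = rotl32(v1, 1) + rotl32(v2, 7) + rotl32(v3, 12) + rotl32(v4, 18);
    } else {
        h = seed + P5;
    }
    h += (uint32_t)len;

    while (p + 4 <= end) { /* remaining 4-byte words */
        h += read32le(p) * P3;
        h = rotl32(h, 17) * P4;
        p += 4;
    }
    while (p < end) { /* remaining single bytes */
        h += (uint32_t)*p * P5;
        h = rotl32(h, 11) * P1;
        p++;
    }

    h ^= h >> 15; h *= P2; /* final avalanche */
    h ^= h >> 13; h *= P3;
    h ^= h >> 16;
    return h;
}

/*
 * Hypothetical helper: `name` is the xattr name with its predefined
 * prefix stripped, `index` is the index of that prefix.  Masking to
 * 5 bits is an assumption about how the hash selects one of 32 bits.
 */
static uint32_t xattr_filter_bit(const char *name, uint32_t index)
{
    assert(name != NULL);
    return demo_xxh32(name, strlen(name), DEMO_XATTR_FILTER_SEED + index) & 31u;
}
```

Folding the prefix index into the seed, instead of hashing the full name, lets names that differ only in their prefix land on different bits without hashing the prefix string itself.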
      
      The constant magic number EROFS_XATTR_FILTER_SEED, i.e. 0x25BBE08F, is
      used to give a better spread when mapping these 30 extended attributes
      into 32-bit bloom filter as:
      
      	bit  0: security.ima
      	bit  1:
      	bit  2: trusted.overlay.nlink
      	bit  3:
      	bit  4: user.overlay.nlink
      	bit  5: trusted.overlay.upper
      	bit  6: user.overlay.origin
      	bit  7: trusted.overlay.protattr
      	bit  8: security.apparmor
      	bit  9: user.overlay.protattr
      	bit 10: user.overlay.opaque
      	bit 11: security.selinux
      	bit 12: security.SMACK64TRANSMUTE
      	bit 13: security.SMACK64
      	bit 14: security.SMACK64MMAP
      	bit 15: user.overlay.impure
      	bit 16: security.SMACK64IPIN
      	bit 17: trusted.overlay.redirect
      	bit 18: trusted.overlay.origin
      	bit 19: security.SMACK64IPOUT
      	bit 20: trusted.overlay.opaque
      	bit 21: system.posix_acl_default
      	bit 22:
      	bit 23: user.mime_type
      	bit 24: trusted.overlay.impure
      	bit 25: security.SMACK64EXEC
      	bit 26: user.overlay.redirect
      	bit 27: user.overlay.upper
      	bit 28: security.evm
      	bit 29: security.capability
      	bit 30: system.posix_acl_access
      	bit 31: trusted.overlay.metacopy, user.overlay.metacopy
      
      h_name_filter is introduced to the on-disk per-inode xattr header to
      place the corresponding xattr name filter, where bit value 1 indicates
      non-existence for compatibility.
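
On the image-building side, the convention above implies starting from an all-ones map and clearing the bit of every xattr the inode actually carries. The sketch below illustrates that idea only; `build_name_filter()` and its signature are hypothetical, and the bit numbers in the usage note are taken from the mapping table earlier in this message.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: build the on-disk filter value from the hash bits
 * of the xattrs present on one inode.  Bit value 1 means "absent", so we
 * begin with every bit set and clear one bit per present xattr.
 */
static uint32_t build_name_filter(const uint32_t *hashbits, size_t count)
{
    uint32_t filter = UINT32_MAX;

    for (size_t i = 0; i < count; i++) {
        assert(hashbits[i] < 32);
        filter &= ~(UINT32_C(1) << hashbits[i]);
    }
    return filter;
}
```

For an inode carrying only system.posix_acl_access (bit 30) and security.selinux (bit 11), this yields 0xBFFFF7FF: every other bit stays set, so lookups for any other listed name can be rejected without walking the xattrs.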
      
      This feature is indicated by EROFS_FEATURE_COMPAT_XATTR_FILTER
      compatible feature bit.
      
      Reserve one byte in the on-disk superblock, as the on-disk format for the
      xattr name filter may change in the future.  With this flag we won't need
      to bother with these compatible bits again at that time.
      Suggested-by: Alexander Larsson <alexl@redhat.com>
      Signed-off-by: Jingbo Xu <jefflexu@linux.alibaba.com>
      Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com>
      Reviewed-by: Chao Yu <chao@kernel.org>
      Link: https://lore.kernel.org/r/20230722094538.11754-2-jefflexu@linux.alibaba.com
      Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
    • erofs: DEFLATE compression support · ffa09b3b
      Gao Xiang authored
      Add DEFLATE compression as the 3rd supported algorithm.
      
      DEFLATE has been a popular general-purpose compression algorithm for
      quite a long time (many widely used formats such as gzip, zlib, zip and
      png are all based on it).  As the Apple documentation puts it: "If you
      require interoperability with non-Apple devices, use COMPRESSION_ZLIB."
      [1]
      
      Due to its popularity, there are several hardware DEFLATE accelerators
      on the market, such as (s390) DFLTCC, (Intel) IAA/QAT, (HiSilicon) ZIP
      accelerator, etc.  In addition, there are also several high-performance
      IP cores and even open-source FPGA approaches available for DEFLATE.
      Therefore, it's useful to support DEFLATE compression in order to find
      a way to utilize these accelerators for asynchronous I/Os and get
      benefits from these later.
      
      Besides, it offers a good trade-off between compression ratio and
      performance compared to LZ4 and LZMA.  The DEFLATE core format is
      simple and easy to understand, so the code size of its decompressor
      is small even for bootloader use cases.  The runtime memory
      consumption is quite limited too (e.g. 32K + ~7K for each zlib
      stream).  As usual, EROFS outperforms similar approaches too.
      
      Alternatively, DEFLATE could still be used for some specific files
      since EROFS supports multiple compression algorithms in one image.
      
      [1] https://developer.apple.com/documentation/compression/compression_algorithm
      
      Reviewed-by: Chao Yu <chao@kernel.org>
      Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
      Link: https://lore.kernel.org/r/20230810154859.118330-1-hsiangkao@linux.alibaba.com
  3. 06 Aug, 2023 8 commits
    • Linux 6.5-rc5 · 52a93d39
      Linus Torvalds authored
    • Merge tag 'v6.5-rc5.vfs.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs · 0108963f
      Linus Torvalds authored
      Pull vfs fixes from Christian Brauner:
      
       - Fix a wrong check for O_TMPFILE during RESOLVE_CACHED lookup
      
       - Clean up directory iterators and clarify file_needs_f_pos_lock()
      
      * tag 'v6.5-rc5.vfs.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
        fs: rely on ->iterate_shared to determine f_pos locking
        vfs: get rid of old '->iterate' directory operation
        proc: fix missing conversion to 'iterate_shared'
        open: make RESOLVE_CACHED correctly test for O_TMPFILE
    • fs: rely on ->iterate_shared to determine f_pos locking · 7d84d1b9
      Christian Brauner authored
      Now that we removed ->iterate we don't need to check for either
      ->iterate or ->iterate_shared in file_needs_f_pos_lock(). Simply check
      for ->iterate_shared instead. This will tell us whether we need to
      unconditionally take the lock. Not only does this allow us to avoid
      checking f_inode's mode, it also clearly shows that we're
      locking because of readdir.
      Signed-off-by: Christian Brauner <brauner@kernel.org>
    • vfs: get rid of old '->iterate' directory operation · 3e327154
      Linus Torvalds authored
      All users now just use '->iterate_shared()', which only takes the
      directory inode lock for reading.
      
      Filesystems that never got converted to shared mode now instead use a
      wrapper that drops the lock, re-takes it in write mode, calls the old
      function, and then downgrades the lock back to read mode.
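
The lock sequence described above can be illustrated with a toy, single-threaded model. Everything here is an assumption made for illustration: the enum stands in for the directory inode's rwsem, and `wrap_legacy_iterator()`, `legacy_iterate()` and `struct toy_dir` are hypothetical names, not the kernel's wrapper.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for the directory inode lock; no real concurrency here. */
enum lock_state { UNLOCKED, HELD_SHARED, HELD_EXCLUSIVE };

struct toy_dir {
    enum lock_state lock;
    int entries_emitted;
};

typedef int (*legacy_iterate_fn)(struct toy_dir *dir);

/* A hypothetical old-style iterator that insists on exclusive access. */
static int legacy_iterate(struct toy_dir *dir)
{
    assert(dir->lock == HELD_EXCLUSIVE);
    dir->entries_emitted++;
    return 0;
}

/* Entered with the lock held shared; returns with it held shared again. */
static int wrap_legacy_iterator(struct toy_dir *dir, legacy_iterate_fn iter)
{
    int ret;

    assert(dir->lock == HELD_SHARED);
    dir->lock = UNLOCKED;       /* drop the read lock */
    dir->lock = HELD_EXCLUSIVE; /* re-take it in write mode */
    ret = iter(dir);            /* call the old function */
    dir->lock = HELD_SHARED;    /* downgrade back to read mode */
    return ret;
}
```

Because the lock is dropped before being re-taken, there is a window in which the directory can change, so a real wrapper would need to cope with that; the toy model does not attempt to.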
      
      This way the VFS layer and other callers no longer need to care about
      filesystems that never got converted to the modern era.
      
      The filesystems that use the new wrapper are ceph, coda, exfat, jfs,
      ntfs, ocfs2, overlayfs, and vboxsf.
      
      Honestly, several of them look like they really could just iterate their
      directories in shared mode and skip the wrapper entirely, but the point
      of this change is to not change semantics or fix filesystems that
      haven't been fixed in the last 7+ years, but to finally get rid of the
      dual iterators.
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Christian Brauner <brauner@kernel.org>
    • proc: fix missing conversion to 'iterate_shared' · 0a2c2baa
      Linus Torvalds authored
      I'm looking at the directory handling due to the discussion about f_pos
      locking (see commit 79796425: "file: reinstate f_pos locking
      optimization for regular files"), and wanting to clean that up.
      
      And one source of ugliness is how we were supposed to move filesystems
      over to the '->iterate_shared()' function that only takes the inode lock
      for reading many many years ago, but several filesystems still use the
      bad old '->iterate()' that takes the inode lock for exclusive access.
      
      See commit 61922694 ("introduce a parallel variant of ->iterate()")
      that also added some documentation stating
      
            Old method is only used if the new one is absent; eventually it will
            be removed.  Switch while you still can; the old one won't stay.
      
      and that was back in April 2016.  Here we are, many years later, and the
      old version is still clearly sadly alive and well.
      
      Now, some of those old style iterators are probably just because the
      filesystem may end up having per-inode mutable data that it uses for
      iterating a directory, but at least one case is just a mistake.
      
      Al switched over most filesystems to use '->iterate_shared()' back when
      it was introduced.  In particular, the /proc filesystem was converted as
      one of the first ones in commit f50752ea ("switch all procfs
      directories ->iterate_shared()").
      
      But then later one new user of '->iterate()' was then re-introduced by
      commit 6d9c939d ("procfs: add smack subdir to attrs").
      
      And that's clearly not what we wanted, since that new case just uses the
      same 'proc_pident_readdir()' and 'proc_pident_lookup()' helper functions
      that other /proc pident directories use, and they are most definitely
      safe to use with the inode lock held shared.
      
      So just fix it.
      
      This still leaves a fair number of oddball filesystems using the
      old-style directory iterator (ceph, coda, exfat, jfs, ntfs, ocfs2,
      overlayfs, and vboxsf), but at least we don't have any remaining in the
      core filesystems.
      
      I'm going to add a wrapper function that just drops the read-lock and
      takes it as a write lock, so that we can clean up the core vfs layer and
      make all the ugly 'this filesystem needs exclusive inode locking' be
      just filesystem-internal warts.
      
      I just didn't want to make that conversion when we still had a core user
      left.
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Christian Brauner <brauner@kernel.org>
    • open: make RESOLVE_CACHED correctly test for O_TMPFILE · a0fc452a
      Aleksa Sarai authored
      O_TMPFILE is actually __O_TMPFILE|O_DIRECTORY. This means that the old
      fast-path check for RESOLVE_CACHED would reject all users passing
      O_DIRECTORY with -EAGAIN, when in fact the intended test was to check
      for __O_TMPFILE.
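
The difference between the two tests can be shown with a small bitmask sketch. The `DEMO_*` constants are illustrative values chosen to mirror the generic Linux flag layout (other architectures use different numbers), and both predicate functions are hypothetical names rather than the kernel's code.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative flag values; real values are architecture-dependent. */
#define DEMO_O_DIRECTORY  0200000
#define DEMO_TMPFILE_BIT  020000000 /* plays the role of __O_TMPFILE */
#define DEMO_O_TMPFILE    (DEMO_TMPFILE_BIT | DEMO_O_DIRECTORY)

static_assert((DEMO_O_TMPFILE & DEMO_O_DIRECTORY) != 0,
              "the composite flag includes the O_DIRECTORY bit");

/* Old-style test: any overlap with the composite mask triggers it. */
static bool old_check_rejects(int flags)
{
    return (flags & DEMO_O_TMPFILE) != 0;
}

/* Intended test: only the dedicated tmpfile bit triggers it. */
static bool new_check_rejects(int flags)
{
    return (flags & DEMO_TMPFILE_BIT) != 0;
}
```

A plain `DEMO_O_DIRECTORY` open trips the old test because it shares a bit with the composite mask, while the second test only fires when the dedicated tmpfile bit is set.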
      
      Cc: stable@vger.kernel.org # v5.12+
      Fixes: 99668f61 ("fs: expose LOOKUP_CACHED through openat2() RESOLVE_CACHED")
      Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
      Message-Id: <20230806-resolve_cached-o_tmpfile-v1-1-7ba16308465e@cyphar.com>
      Signed-off-by: Christian Brauner <brauner@kernel.org>
    • Merge tag 'rust-fixes-6.5-rc5' of https://github.com/Rust-for-Linux/linux · f0ab9f34
      Linus Torvalds authored
      Pull rust fixes from Miguel Ojeda:
      
       - Allocator: prevent mis-aligned allocation
      
       - Types: delete 'ForeignOwnable::borrow_mut'. A sound replacement is
         planned for the merge window
      
       - Build: fix bindgen error with UBSAN_BOUNDS_STRICT
      
      * tag 'rust-fixes-6.5-rc5' of https://github.com/Rust-for-Linux/linux:
        rust: fix bindgen build error with UBSAN_BOUNDS_STRICT
        rust: delete `ForeignOwnable::borrow_mut`
        rust: allocator: Prevent mis-aligned allocation
    • Merge tag 'ata-6.5-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata · fb0d9199
      Linus Torvalds authored
      Pull ata fix from Damien Le Moal:
      
       - Prevent the scsi disk driver from issuing a START STOP UNIT command
         for ATA devices during system resume as this causes various issues
         reported by multiple users.
      
      * tag 'ata-6.5-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata:
        ata,scsi: do not issue START STOP UNIT on resume
  4. 05 Aug, 2023 5 commits
  5. 04 Aug, 2023 13 commits
  6. 03 Aug, 2023 6 commits
    • Merge tag 'drm-intel-fixes-2023-08-03' of git://anongit.freedesktop.org/drm/drm-intel into drm-fixes · 1958b0f9
      Dave Airlie authored
      
      - Fix bug in getting msg length in AUX CH registers handler [gvt] (Yan Zhao)
      - Gen12 AUX invalidation fixes [gt] (Andi Shyti, Jonathan Cavitt)
      - Fix premature release of request's reusable memory (Janusz Krzysztofik)
      
      - Merge tag 'gvt-fixes-2023-08-02' of https://github.com/intel/gvt-linux into drm-intel-fixes (Tvrtko Ursulin)
      Signed-off-by: Dave Airlie <airlied@redhat.com>
      From: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/ZMtkxWGuUKpaRMmo@tursulin-desk
    • Merge tag 'drm-misc-fixes-2023-08-03' of ssh://git.freedesktop.org/git/drm/drm-misc into drm-fixes · 062ff85b
      Dave Airlie authored
      A NULL pointer dereference fix for TTM, a timings fix for imx/ipuv3 and
      the addition of a MODULE_DEVICE_TABLE for the samsung-s6d7aa0 panel.
      Signed-off-by: Dave Airlie <airlied@redhat.com>
      
      From: Maxime Ripard <mripard@redhat.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/ztfogof2dhtlvjwe73mvd2jp5kbldhkkav7k5culuseqblwpti@qfobohwx3c3j
    • Merge tag 'perf-tools-fixes-for-v6.5-2-2023-08-03' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools · c1a515d3
      Linus Torvalds authored
      
      Pull perf tools fixes from Arnaldo Carvalho de Melo:
      
       - Fix segfault in the powerpc specific arch_skip_callchain_idx
         function. The patch doing the reference count init/exit that went
         into 6.5 missed this function.
      
       - Fix regression reading the arm64 PMU cpu slots in sysfs, a patch
         removing some code duplication ended up duplicating the /sysfs prefix
         for these files.
      
       - Fix grouping of events related to topdown, addressing a regression on
         the CSV output produced by 'perf stat' noticed on the downstream tool
         toplev.
      
       - Fix the uprobe_from_different_cu 'perf test' entry, it is failing
         when gcc isn't available, so we need to check that and skip the test
         if it is not installed.
      
      * tag 'perf-tools-fixes-for-v6.5-2-2023-08-03' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools:
        perf test parse-events: Test complex name has required event format
        perf pmus: Create placholder regardless of scanning core_only
        perf test uprobe_from_different_cu: Skip if there is no gcc
        perf parse-events: Only move force grouped evsels when sorting
        perf parse-events: When fixing group leaders always set the leader
        perf parse-events: Extra care around force grouped events
        perf callchain powerpc: Fix addr location init during arch_skip_callchain_idx function
        perf pmu arm64: Fix reading the PMU cpu slots in sysfs
    • Merge tag 'cxl-fixes-6.5-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl · 638c1913
      Linus Torvalds authored
      Pull cxl fixes from Vishal Verma:
      
       - Fixup the Sanitize device ABI that was merged for v6.5 to hide some
         sysfs files when the necessary support is missing. Update the ABI
         documentation around this as well.
      
      * tag 'cxl-fixes-6.5-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl:
        cxl/memdev: Only show sanitize sysfs files when supported
        cxl/memdev: Document security state in kern-doc
        cxl/memdev: Improve sanitize ABI descriptions
    • Merge tag 'net-6.5-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · 999f6631
      Linus Torvalds authored
      Pull networking fixes from Jakub Kicinski:
       "Including fixes from bpf and wireless.
      
        Nothing scary here. Feels like the first wave of regressions from v6.5
        is addressed - one outstanding fix still to come in TLS for the
        sendpage rework.
      
        Current release - regressions:
      
         - udp: fix __ip_append_data()'s handling of MSG_SPLICE_PAGES
      
         - dsa: fix older DSA drivers using phylink
      
        Previous releases - regressions:
      
         - gro: fix misuse of CB in udp socket lookup
      
         - mlx5: unregister devlink params in case interface is down
      
         - Revert "wifi: ath11k: Enable threaded NAPI"
      
        Previous releases - always broken:
      
         - sched: cls_u32: fix match key mis-addressing
      
         - sched: bind logic fixes for cls_fw, cls_u32 and cls_route
      
         - add bound checks to a number of places which hand-parse netlink
      
         - bpf: disable preemption in perf_event_output helpers code
      
         - qed: fix scheduling in a tasklet while getting stats
      
         - avoid using APIs which are not hardirq-safe in couple of drivers,
           when we may be in a hard IRQ (netconsole)
      
         - wifi: cfg80211: fix return value in scan logic, avoid page
           allocator warning
      
         - wifi: mt76: mt7615: do not advertise 5 GHz on first PHY of MT7615D
           (DBDC)
      
        Misc:
      
         - drop handful of inactive maintainers, put some new in place"
      
      * tag 'net-6.5-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (98 commits)
        MAINTAINERS: update TUN/TAP maintainers
        test/vsock: remove vsock_perf executable on `make clean`
        tcp_metrics: fix data-race in tcpm_suck_dst() vs fastopen
        tcp_metrics: annotate data-races around tm->tcpm_net
        tcp_metrics: annotate data-races around tm->tcpm_vals[]
        tcp_metrics: annotate data-races around tm->tcpm_lock
        tcp_metrics: annotate data-races around tm->tcpm_stamp
        tcp_metrics: fix addr_same() helper
        prestera: fix fallback to previous version on same major version
        udp: Fix __ip_append_data()'s handling of MSG_SPLICE_PAGES
        net/mlx5e: Set proper IPsec source port in L4 selector
        net/mlx5: fs_core: Skip the FTs in the same FS_TYPE_PRIO_CHAINS fs_prio
        net/mlx5: fs_core: Make find_closest_ft more generic
        wifi: brcmfmac: Fix field-spanning write in brcmf_scan_params_v2_to_v1()
        vxlan: Fix nexthop hash size
        ip6mr: Fix skb_under_panic in ip6mr_cache_report()
        s390/qeth: Don't call dev_close/dev_open (DOWN/UP)
        net: tap_open(): set sk_uid from current_fsuid()
        net: tun_chr_open(): set sk_uid from current_fsuid()
        net: dcb: choose correct policy to parse DCB_ATTR_BCN
        ...
    • MAINTAINERS: update TUN/TAP maintainers · 0765c5f2
      Jakub Kicinski authored
      Willem and Jason have agreed to take over the maintainer
      duties for TUN/TAP, thank you!
      
      There's an existing entry for TUN/TAP which only covers
      the user mode Linux implementation.
      Since we haven't heard from Maxim on the list for almost
      a decade, extend that entry and take it over, rather than
      adding a new one.
      Acked-by: Willem de Bruijn <willemb@google.com>
      Acked-by: Jason Wang <jasowang@redhat.com>
      Link: https://lore.kernel.org/r/20230802182843.4193099-1-kuba@kernel.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>