1. 15 Feb, 2024 2 commits
  2. 14 Feb, 2024 2 commits
    • Merge tag 'for-6.8-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux · 1f3a3e2a
      Linus Torvalds authored
      Pull btrfs fixes from David Sterba:
       "A few regular fixes and one fix for space reservation regression since
        6.7 that users have been reporting:
      
         - fix over-reservation of metadata chunks due to not keeping proper
           balance between global block reserve and delayed refs reserve; in
           practice this leaves behind empty metadata block groups, the
           workaround is to reclaim them by using the '-musage=1' balance
           filter
      
         - other space reservation fixes:
            - do not delete unused block group if it may be used soon
            - do not reserve space for checksums for NOCOW files
      
         - fix extent map assertion failure when writing out free space inode
      
         - reject encoded write if inode has nodatasum flag set
      
         - fix chunk map leak when loading block group zone info"
      
      * tag 'for-6.8-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
        btrfs: don't refill whole delayed refs block reserve when starting transaction
        btrfs: zoned: fix chunk map leak when loading block group zone info
        btrfs: reject encoded write if inode has nodatasum flag set
        btrfs: don't reserve space for checksums when writing to nocow files
        btrfs: add new unused block groups to the list of unused block groups
        btrfs: do not delete unused block group if it may be used soon
        btrfs: add and use helper to check if block group is used
        btrfs: don't drop extent_map for free space inode on write error
    • Merge tag 'linux_kselftest-kunit-fixes-6.8-rc5' of... · 91f842ff
      Linus Torvalds authored
      Merge tag 'linux_kselftest-kunit-fixes-6.8-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
      
      Pull KUnit fix from Shuah Khan:
       "One important fix to unregister kunit_bus when KUnit module is
        unloaded.
      
        Not doing so causes an error when KUnit module tries to re-register
        the bus when it gets reloaded"
      
      * tag 'linux_kselftest-kunit-fixes-6.8-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
        kunit: device: Unregister the kunit_bus on shutdown
  3. 13 Feb, 2024 5 commits
    • btrfs: don't refill whole delayed refs block reserve when starting transaction · 2f6397e4
      Filipe Manana authored
      Since commit 28270e25 ("btrfs: always reserve space for delayed refs
      when starting transaction") we started not only to reserve metadata space
      for the delayed refs a caller of btrfs_start_transaction() might generate
      but also to try to fully refill the delayed refs block reserve, because
      there are several cases where we generate delayed refs and haven't reserved
      space for them, relying on the global block reserve. Relying too much on
      the global block reserve is not always safe, and can result in hitting
      -ENOSPC during transaction commits or worse, in rare cases, being unable
      to mount a filesystem that needs to do orphan cleanup or anything that
      requires modifying the filesystem during mount, and has no more
      unallocated space and the metadata space is nearly full. This was
      explained in detail in that commit's change log.
      
      However the gap between the reserved amount and the size of the delayed
      refs block reserve can be huge, so attempting to reserve space for such
      a gap can result in allocating many metadata block groups that end up
      not being used. After a recent patch, with the subject:
      
        "btrfs: add new unused block groups to the list of unused block groups"
      
      we started to add new block groups that are unused to the list of unused
      block groups, to avoid having them around for a very long time in case
      they are never used, because a block group is only added to the list of
      unused block groups when we deallocate the last extent or when mounting
      the filesystem and the block group has 0 bytes used. This is not a problem
      introduced by the commit mentioned earlier, it always existed as our
      metadata space reservations are, most of the time, pessimistic and end up
      not using all the space they reserved, so we can occasionally end up with
      one or two unused metadata block groups for a long period. However after
      that commit mentioned earlier, we are just more pessimistic in the
      metadata space reservations when starting a transaction and therefore the
      issue is more likely to happen.
      
      This however is not always enough because we might create unused metadata
      block groups when reserving metadata space at a high rate if there's
      always a gap in the delayed refs block reserve and the cleaner kthread
      isn't triggered often enough or is busy with other work (running delayed
      iputs, cleaning deleted roots, etc), not to mention the block group's
      allocated space is only usable for a new block group after the transaction
      used to remove it is committed.
      
      A user reported that he's getting a lot of allocated metadata block groups
      but the usage percentage of metadata space was very low compared to the
      total allocated space, especially after running a series of block group
      relocations.
      
      So for now stop trying to refill the gap in the delayed refs block reserve
      and reserve space only for the delayed refs we are expected to generate
      when starting a transaction.
      
      CC: stable@vger.kernel.org # 6.7+
      Reported-by: Ivan Shapovalov <intelfx@intelfx.name>
      Link: https://lore.kernel.org/linux-btrfs/9cdbf0ca9cdda1b4c84e15e548af7d7f9f926382.camel@intelfx.name/
      Link: https://lore.kernel.org/linux-btrfs/CAL3q7H6802ayLHUJFztzZAVzBLJAGdFx=6FHNNy87+obZXXZpQ@mail.gmail.com/
      Tested-by: Ivan Shapovalov <intelfx@intelfx.name>
      Reported-by: Heddxh <g311571057@gmail.com>
      Link: https://lore.kernel.org/linux-btrfs/CAE93xANEby6RezOD=zcofENYZOT-wpYygJyauyUAZkLv6XVFOA@mail.gmail.com/
      Reviewed-by: Josef Bacik <josef@toxicpanda.com>
      Signed-off-by: Filipe Manana <fdmanana@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: zoned: fix chunk map leak when loading block group zone info · 88e81a67
      Filipe Manana authored
      At btrfs_load_block_group_zone_info() we never drop a reference on the
      chunk map we have looked up, therefore leaking a reference on it. So
      add the missing btrfs_free_chunk_map() at the end of the function.
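
The leak pattern described above can be sketched with a toy reference counter. Everything here (`struct chunk_map`, `map_lookup()`, `map_put()`, `load_zone_info()`) is a hypothetical stand-in, not the btrfs code; it only illustrates that a lookup which takes a reference needs a matching drop on every exit path.

```c
/* Hypothetical stand-in for a reference-counted chunk map. */
struct chunk_map {
	int refs;
};

/* A lookup hands the caller its own reference, which the caller must drop. */
static struct chunk_map *map_lookup(struct chunk_map *map)
{
	map->refs++;
	return map;
}

/* Drop one reference; real code would free the object when it hits zero. */
static void map_put(struct chunk_map *map)
{
	map->refs--;
}

/* Every exit path goes through the final put, so no reference is leaked. */
int load_zone_info(struct chunk_map *tree_entry, int fail)
{
	struct chunk_map *map = map_lookup(tree_entry);
	int ret = 0;

	if (fail) {
		ret = -1;	/* the error path still reaches the put below */
		goto out;
	}
	/* ... the map would be used here ... */
out:
	map_put(map);
	return ret;
}
```

With the put in place the count returns to its starting value after each call, on success and on failure alike.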
      
      Fixes: 7dc66abb ("btrfs: use a dedicated data structure for chunk maps")
      Reported-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
      Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
      Tested-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
      Reviewed-by: Anand Jain <anand.jain@oracle.com>
      Signed-off-by: Filipe Manana <fdmanana@suse.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: reject encoded write if inode has nodatasum flag set · 1bd96c92
      Filipe Manana authored
      Currently we allow an encoded write against inodes that have the NODATASUM
      flag set, either because they are NOCOW files or they were created while
      the filesystem was mounted with "-o nodatasum". This results in having
      compressed extents without corresponding checksums, which is a filesystem
      inconsistency reported by 'btrfs check'.
      
      For example, running btrfs/281 with MOUNT_OPTIONS="-o nodatacow" triggers
      this and 'btrfs check' errors out with:
      
         [1/7] checking root items
         [2/7] checking extents
         [3/7] checking free space tree
         [4/7] checking fs roots
         root 256 inode 257 errors 1040, bad file extent, some csum missing
         root 256 inode 258 errors 1040, bad file extent, some csum missing
         ERROR: errors found in fs roots
         (...)
      
      So reject encoded writes if the target inode has NODATASUM set.
      
      CC: stable@vger.kernel.org # 6.1+
      Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
      Signed-off-by: Filipe Manana <fdmanana@suse.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: don't reserve space for checksums when writing to nocow files · feefe1f4
      Filipe Manana authored
      Currently when doing a write to a file we always reserve metadata space
      for inserting data checksums. However we don't need to do it if we have
      a nodatacow file (-o nodatacow mount option or chattr +C) or if checksums
      are disabled (-o nodatasum mount option), as in that case we are only
      adding unnecessary pressure to metadata reservations.
      
      For example on x86_64, with the default node size of 16K, a 4K buffered
      write into a nodatacow file is reserving 655360 bytes of metadata space,
      as it's accounting for checksums. After this change, which stops reserving
      space for checksums if we have a nodatacow file or checksums are disabled,
      we only need to reserve 393216 bytes of metadata.
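
One way to arrive at the two figures quoted above is the arithmetic below. It is a reconstruction that happens to match the numbers, not the kernel code. The function names are hypothetical, and the model rests on three assumptions: a maximum tree height of 8 levels, a worst case of one node per level for an item update, and twice that for an item insertion.

```c
#include <stdint.h>

#define MAX_TREE_LEVEL 8	/* assumption: maximum b-tree height */

/* Worst case for updating one existing item: one node per tree level. */
uint64_t update_reserve(uint64_t node_size, unsigned int items)
{
	return node_size * MAX_TREE_LEVEL * items;
}

/* Worst case for inserting one item: doubled to allow for node splits. */
uint64_t insert_reserve(uint64_t node_size, unsigned int items)
{
	return 2 * update_reserve(node_size, items);
}

/*
 * Reservation for a small buffered write: one extent item insertion,
 * optionally one checksum insertion, plus one inode item update.
 */
uint64_t write_reserve(uint64_t node_size, int reserve_csums)
{
	unsigned int inserts = 1 + (reserve_csums ? 1 : 0);

	return insert_reserve(node_size, inserts) + update_reserve(node_size, 1);
}
```

Under these assumptions, `write_reserve(16384, 1)` gives 655360 and `write_reserve(16384, 0)` gives 393216, so dropping the checksum insertion accounts for the 262144-byte difference.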
      
      CC: stable@vger.kernel.org # 6.1+
      Signed-off-by: Filipe Manana <fdmanana@suse.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
    • Merge tag 'trace-tools-v6.8-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace · 7e90b5c2
      Linus Torvalds authored
      Pull tracing tooling fixes from Steven Rostedt:
       "RTLA:
      
         - rtla tools are exiting with a positive value when usage() is
           called. Make them return 0 if the usage was called via -h/--help
      
         - the -P priority sets the sched priority for rtla workload. When the
           SCHED_OTHER scheduler is selected, it sets the rt_priority instead
           of the nice parameter. Setting the nice value is the correct thing,
           so fix it
      
         - rtla is failing to compile with clang due to unsupported options
           from gcc. Adjusting the compiler/linker options makes clang work
           properly
      
         - Remove the unused sched_getattr() function in utils.c
      
         - Fixes for variable initialization and size, reported by clang
      
        Verification:
      
         - rv is failing to compile with clang due to unsupported options from
           gcc. Adjusting the compiler/linker options makes clang work
           properly
      
         - Fix an uninitialized variable in in_kernel.c reported by clang"
      
      * tag 'trace-tools-v6.8-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
        tools/rtla: Exit with EXIT_SUCCESS when help is invoked
        tools/rtla: Replace setting prio with nice for SCHED_OTHER
        tools/rv: Fix curr_reactor uninitialized variable
        tools/rv: Fix Makefile compiler options for clang
        tools/rtla: Remove unused sched_getattr() function
        tools/rtla: Fix clang warning about mount_point var size
        tools/rtla: Fix uninitialized bucket/data->bucket_size warning
        tools/rtla: Fix Makefile compiler options for clang
  4. 12 Feb, 2024 14 commits
    • Merge tag 'docs-6.8-fixes2' of git://git.lwn.net/linux · c664e16b
      Linus Torvalds authored
      Pull documentation fix from Jonathan Corbet:
       "A single fix to the kernel_feat extension for a bug that will crash
        the docs build in some situations"
      
      * tag 'docs-6.8-fixes2' of git://git.lwn.net/linux:
        docs: kernel_feat.py: fix build error for missing files
    • mm/memory: Use exception ip to search exception tables · 8fa50708
      Jiaxun Yang authored
      On architectures with delay slots, instruction_pointer() may differ
      from where the exception was triggered.

      Use the exception_ip hook we just introduced to search the exception
      tables and avoid the problem.
      
      Fixes: 4bce37a6 ("mips/mm: Convert to using lock_mm_and_find_vma()")
      Reported-by: Xi Ruoyao <xry111@xry111.site>
      Link: https://lore.kernel.org/r/75e9fd7b08562ad9b456a5bdaacb7cc220311cc9.camel@xry111.site/
      Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
      Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
    • MIPS: Clear Cause.BD in instruction_pointer_set · 9d6e21dd
      Jiaxun Yang authored
      Clear Cause.BD after we use instruction_pointer_set() to override
      EPC.

      This prevents exception_epc() from checking the instruction at the
      new return address. Once EPC has been overridden, the exception is
      no longer considered to be "in delay slot" anyway.

      Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
      Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
    • ptrace: Introduce exception_ip arch hook · 11ba1728
      Jiaxun Yang authored
      On architectures with delay slots, the architecture-level instruction
      pointer (or program counter) in pt_regs may differ from where the
      exception was triggered.

      Introduce the exception_ip hook to invoke architecture code and
      determine the actual instruction pointer of the exception.
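
The difference between the two addresses can be illustrated with a simplified model. The struct and function names below are hypothetical and are not the kernel's `struct pt_regs` or its helpers. The sketch assumes classic MIPS behavior with fixed 4-byte instructions: when the faulting instruction sits in a branch delay slot, the saved address points at the branch, and a flag (Cause.BD on MIPS) records that fact. Compressed ISA modes are ignored.

```c
#include <stdbool.h>

/* Hypothetical, simplified register snapshot. */
struct regs_model {
	unsigned long epc;	/* address saved by the hardware on exception */
	bool in_delay_slot;	/* models the MIPS Cause.BD bit */
};

/* Generic view: simply the saved program counter. */
unsigned long model_instruction_pointer(const struct regs_model *regs)
{
	return regs->epc;
}

/*
 * Arch-specific view: in a delay slot the saved address is the branch,
 * and the instruction that actually faulted is the one right after it.
 */
unsigned long model_exception_ip(const struct regs_model *regs)
{
	if (regs->in_delay_slot)
		return regs->epc + 4;
	return regs->epc;
}
```

A lookup keyed on the generic value would miss a table entry recorded for the faulting instruction whenever the flag is set, which is the mismatch the hook is meant to remove.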
      
      Link: https://lore.kernel.org/lkml/00d1b813-c55f-4365-8d81-d70258e10b16@app.fastmail.com/
      Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
      Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
    • MIPS: Add 'memory' clobber to csum_ipv6_magic() inline assembler · d55347bf
      Guenter Roeck authored
      After 'lib: checksum: Use aligned accesses for ip_fast_csum and
      csum_ipv6_magic tests' was applied, the test_csum_ipv6_magic unit test
      started failing for all mips platforms, both little and big endian.
      Oddly enough, adding debug code into test_csum_ipv6_magic() made the
      problem disappear.
      
      The gcc manual says:
      
      "The "memory" clobber tells the compiler that the assembly code performs
       memory reads or writes to items other than those listed in the input
       and output operands (for example, accessing the memory pointed to by one
       of the input parameters)
      "
      
      This is definitely the case for csum_ipv6_magic(). Indeed, adding the
      'memory' clobber fixes the problem.
      
      Cc: Charlie Jenkins <charlie@rivosinc.com>
      Cc: Palmer Dabbelt <palmer@rivosinc.com>
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Signed-off-by: Guenter Roeck <linux@roeck-us.net>
      Reviewed-by: Charlie Jenkins <charlie@rivosinc.com>
      Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
    • Merge tag 'vfs-6.8-rc5.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs · 716f4aaa
      Linus Torvalds authored
      Pull vfs fixes from Christian Brauner:
      
       - Fix performance regression introduced by moving the security
         permission hook out of do_clone_file_range() and into its caller
         vfs_clone_file_range().
      
         This causes the security hook to be called in situations where it
         wasn't called before as the fast permission checks were left in
         do_clone_file_range().
      
         Fix this by merging the two implementations back together and
         restoring the old ordering: fast permission checks first, expensive
         ones later.
      
       - Tweak mount_setattr() permission checking so that mount properties on
         the real rootfs can be changed.
      
         When we added mount_setattr() we added additional checks compared to
         legacy mount(2). If the mount had a parent then verify that the
         caller and the mount namespace the mount is attached to match and if
         not make sure that it's an anonymous mount.
      
         But the real rootfs falls into neither category. It is not an
         anonymous mount, because it is obviously attached to the initial
         mount namespace, but it also obviously doesn't have a parent mount.
         So legacy mount(2) allows changing mount properties on the real
         rootfs but mount_setattr(2) blocks this. This causes regressions
         (see the commit for details).
      
         Fix this by relaxing the check. If the mount has a parent or if it
         isn't a detached mount, verify that the mount namespaces of the
         caller and the mount are the same. Technically, we could probably
         write this even simpler and check that the mount namespaces match if
         it isn't a detached mount. But the slightly longer check makes it
         clearer what conditions one needs to think about.
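
The two rules described above can be compared as plain predicates. This is a model of the prose, not the kernel implementation; `struct mount_model` and both function names are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical model of the facts the permission check looks at. */
struct mount_model {
	bool has_parent;
	bool detached;		/* anonymous mount, not attached to a namespace */
	bool same_ns_as_caller;
};

/*
 * Stricter rule as described: with a parent, the namespaces must match;
 * without one, the mount must be anonymous.
 */
bool old_check_allows(const struct mount_model *m)
{
	return m->has_parent ? m->same_ns_as_caller : m->detached;
}

/*
 * Relaxed rule as described: if the mount has a parent or is not
 * detached, the namespaces must match; a detached mount is allowed.
 */
bool new_check_allows(const struct mount_model *m)
{
	if (m->has_parent || !m->detached)
		return m->same_ns_as_caller;
	return true;
}
```

The real rootfs corresponds to `{ has_parent = false, detached = false, same_ns_as_caller = true }`: the stricter rule rejects it and the relaxed rule accepts it, while a mount attached to a foreign namespace is rejected by both.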
      
      * tag 'vfs-6.8-rc5.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
        fs: relax mount_setattr() permission checks
        remap_range: merge do_clone_file_range() into vfs_clone_file_range()
    • tools/rtla: Exit with EXIT_SUCCESS when help is invoked · b5f31936
      John Kacur authored
      Fix rtla so that the following commands exit with 0 when help is invoked:
      
      rtla osnoise top -h
      rtla osnoise hist -h
      rtla timerlat top -h
      rtla timerlat hist -h
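
The intended behavior can be sketched as a small helper that decides the exit status after the usage text has been printed. `usage_exit_status()` is a hypothetical function, not part of rtla; it assumes the usage text is shown either because help was requested or because the arguments were invalid.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: pick the exit status to use after printing usage.
 * An explicit help request is a success; anything else that led to the
 * usage text being printed is treated as an error.
 */
int usage_exit_status(int argc, char **argv)
{
	for (int i = 1; i < argc; i++) {
		if (!strcmp(argv[i], "-h") || !strcmp(argv[i], "--help"))
			return EXIT_SUCCESS;
	}
	return EXIT_FAILURE;
}
```

Scripts that probe a tool with `-h` can then rely on a zero status, while a mistyped option still reports failure.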
      
      Link: https://lore.kernel.org/linux-trace-devel/20240203001607.69703-1-jkacur@redhat.com
      
      Cc: stable@vger.kernel.org
      Fixes: 1eeb6328 ("rtla/timerlat: Add timerlat hist mode")
      Signed-off-by: John Kacur <jkacur@redhat.com>
      Signed-off-by: Daniel Bristot de Oliveira <bristot@kernel.org>
    • tools/rtla: Replace setting prio with nice for SCHED_OTHER · 14f08c97
      limingming3 authored
      Since the sched_priority for SCHED_OTHER is always 0, it makes no
      sense to set it.
      Setting nice for SCHED_OTHER is more meaningful.
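
A minimal sketch of the idea, using a hypothetical `struct workload_attr` and `make_workload_attr()` rather than rtla's own structures: the user-supplied value goes into the nice field for SCHED_OTHER and into the real-time priority field for the other policies.

```c
#include <sched.h>

/* Hypothetical subset of the scheduling attributes a workload needs. */
struct workload_attr {
	int policy;
	int priority;	/* real-time priority, meaningful for SCHED_FIFO/RR */
	int nice;	/* nice value, meaningful for SCHED_OTHER */
};

/* Route the user-supplied value to the field that the policy honors. */
struct workload_attr make_workload_attr(int policy, int value)
{
	struct workload_attr attr = { .policy = policy };

	if (policy == SCHED_OTHER)
		attr.nice = value;	/* priority stays 0 for SCHED_OTHER */
	else
		attr.priority = value;
	return attr;
}
```

For SCHED_OTHER the priority field is left at 0, as the commit message states it must be, and only the nice value carries the user's request.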
      
      Link: https://lkml.kernel.org/r/20240207065142.1753909-1-limingming3@lixiang.com
      
      Cc: stable@vger.kernel.org
      Fixes: b1696371 ("rtla: Helper functions for rtla")
      Signed-off-by: limingming3 <limingming3@lixiang.com>
      Signed-off-by: Daniel Bristot de Oliveira <bristot@kernel.org>
    • tools/rv: Fix curr_reactor uninitialized variable · 61ec586b
      Daniel Bristot de Oliveira authored
      clang is reporting:
      
      $ make HOSTCC=clang CC=clang LLVM_IAS=1
      
      clang -O -g -DVERSION=\"6.8.0-rc3\" -flto=auto -fexceptions
      	-fstack-protector-strong -fasynchronous-unwind-tables
      	-fstack-clash-protection  -Wall -Werror=format-security
      	-Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS
      	$(pkg-config --cflags libtracefs)  -I include
      	-c -o src/in_kernel.o src/in_kernel.c
      [...]
      
      src/in_kernel.c:227:6: warning: variable 'curr_reactor' is used uninitialized whenever 'if' condition is true [-Wsometimes-uninitialized]
        227 |         if (!end)
            |             ^~~~
      src/in_kernel.c:242:9: note: uninitialized use occurs here
        242 |         return curr_reactor;
            |                ^~~~~~~~~~~~
      src/in_kernel.c:227:2: note: remove the 'if' if its condition is always false
        227 |         if (!end)
            |         ^~~~~~~~~
        228 |                 goto out_free;
            |                 ~~~~~~~~~~~~~
      src/in_kernel.c:221:6: warning: variable 'curr_reactor' is used uninitialized whenever 'if' condition is true [-Wsometimes-uninitialized]
        221 |         if (!start)
            |             ^~~~~~
      src/in_kernel.c:242:9: note: uninitialized use occurs here
        242 |         return curr_reactor;
            |                ^~~~~~~~~~~~
      src/in_kernel.c:221:2: note: remove the 'if' if its condition is always false
        221 |         if (!start)
            |         ^~~~~~~~~~~
        222 |                 goto out_free;
            |                 ~~~~~~~~~~~~~
      src/in_kernel.c:215:20: note: initialize the variable 'curr_reactor' to silence this warning
        215 |         char *curr_reactor;
            |                           ^
            |                            = NULL
      2 warnings generated.
      
      Which is correct. Setting curr_reactor to NULL avoids the problem.
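
The shape of the problem and of the fix can be shown with a small stand-alone function. `extract_bracketed()` is a hypothetical example, not the code in in_kernel.c: it returns a copy of the text between the first `[` and the following `]`, and every early exit jumps to a label that returns the result variable.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical example: return a malloc'ed copy of the text between
 * the first '[' and the following ']', or NULL if there is none.
 */
char *extract_bracketed(const char *text)
{
	char *result = NULL;	/* initialized, so early exits return NULL */
	const char *start, *end;
	size_t len;

	start = strchr(text, '[');
	if (!start)
		goto out;
	start++;

	end = strchr(start, ']');
	if (!end)
		goto out;

	len = (size_t)(end - start);
	result = malloc(len + 1);
	if (!result)
		goto out;
	memcpy(result, start, len);
	result[len] = '\0';
out:
	return result;
}
```

Without the `= NULL` initializer, the two early `goto out` paths would return an indeterminate pointer, which is the situation `-Wsometimes-uninitialized` reports.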
      
      Link: https://lkml.kernel.org/r/3a35551149e5ee0cb0950035afcb8082c3b5d05b.1707217097.git.bristot@kernel.org
      
      Cc: stable@vger.kernel.org
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Nathan Chancellor <nathan@kernel.org>
      Cc: Nick Desaulniers <ndesaulniers@google.com>
      Cc: Bill Wendling <morbo@google.com>
      Cc: Justin Stitt <justinstitt@google.com>
      Cc: Donald Zickus <dzickus@redhat.com>
      Fixes: 6d60f896 ("tools/rv: Add in-kernel monitor interface")
      Signed-off-by: Daniel Bristot de Oliveira <bristot@kernel.org>
    • tools/rv: Fix Makefile compiler options for clang · f9b2c871
      Daniel Bristot de Oliveira authored
      The following errors are showing up when compiling rv with clang:
      
       $ make HOSTCC=clang CC=clang LLVM_IAS=1
       [...]
        clang -O -g -DVERSION=\"6.8.0-rc1\" -flto=auto -ffat-lto-objects
        -fexceptions -fstack-protector-strong -fasynchronous-unwind-tables
        -fstack-clash-protection  -Wall -Werror=format-security
        -Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS
        -Wno-maybe-uninitialized $(pkg-config --cflags libtracefs)
        -I include   -c -o src/utils.o src/utils.c
        clang: warning: optimization flag '-ffat-lto-objects' is not supported [-Wignored-optimization-argument]
        warning: unknown warning option '-Wno-maybe-uninitialized'; did you mean '-Wno-uninitialized'? [-Wunknown-warning-option]
        1 warning generated.
      
        clang -o rv -ggdb  src/in_kernel.o src/rv.o src/trace.o src/utils.o $(pkg-config --libs libtracefs)
        src/in_kernel.o: file not recognized: file format not recognized
        clang: error: linker command failed with exit code 1 (use -v to see invocation)
        make: *** [Makefile:110: rv] Error 1
      
      Solve these issues by:
        - removing -ffat-lto-objects and -Wno-maybe-uninitialized if using clang
        - informing the linker about -flto=auto
      
      Link: https://lkml.kernel.org/r/ed94a8ddc2ca8c8ef663cfb7ae9dd196c4a66b33.1707217097.git.bristot@kernel.org
      
      Cc: stable@vger.kernel.org
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Nathan Chancellor <nathan@kernel.org>
      Cc: Nick Desaulniers <ndesaulniers@google.com>
      Cc: Bill Wendling <morbo@google.com>
      Cc: Justin Stitt <justinstitt@google.com>
      Fixes: 4bc4b131 ("rv: Add rv tool")
      Suggested-by: Donald Zickus <dzickus@redhat.com>
      Signed-off-by: Daniel Bristot de Oliveira <bristot@kernel.org>
    • tools/rtla: Remove unused sched_getattr() function · 084ce16d
      Daniel Bristot de Oliveira authored
      Clang is reporting:
      
      $ make HOSTCC=clang CC=clang LLVM_IAS=1
      [...]
      clang -O -g -DVERSION=\"6.8.0-rc3\" -flto=auto -fexceptions -fstack-protector-strong -fasynchronous-unwind-tables -fstack-clash-protection  -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS $(pkg-config --cflags libtracefs)    -c -o src/utils.o src/utils.c
      src/utils.c:241:19: warning: unused function 'sched_getattr' [-Wunused-function]
        241 | static inline int sched_getattr(pid_t pid, struct sched_attr *attr,
            |                   ^~~~~~~~~~~~~
      1 warning generated.
      
      Which is correct, so remove the unused function.
      
      Link: https://lkml.kernel.org/r/eaed7ba122c4ae88ce71277c824ef41cbf789385.1707217097.git.bristot@kernel.org
      
      Cc: stable@vger.kernel.org
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Nathan Chancellor <nathan@kernel.org>
      Cc: Nick Desaulniers <ndesaulniers@google.com>
      Cc: Bill Wendling <morbo@google.com>
      Cc: Justin Stitt <justinstitt@google.com>
      Cc: Donald Zickus <dzickus@redhat.com>
      Fixes: b1696371 ("rtla: Helper functions for rtla")
      Signed-off-by: Daniel Bristot de Oliveira <bristot@kernel.org>
    • tools/rtla: Fix clang warning about mount_point var size · 30369084
      Daniel Bristot de Oliveira authored
      clang is reporting this warning:
      
      $ make HOSTCC=clang CC=clang LLVM_IAS=1
      [...]
      clang -O -g -DVERSION=\"6.8.0-rc3\" -flto=auto -fexceptions
      	-fstack-protector-strong -fasynchronous-unwind-tables
      	-fstack-clash-protection  -Wall -Werror=format-security
      	-Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS
      	$(pkg-config --cflags libtracefs)    -c -o src/utils.o src/utils.c
      
      src/utils.c:548:66: warning: 'fscanf' may overflow; destination buffer in argument 3 has size 1024, but the corresponding specifier may require size 1025 [-Wfortify-source]
        548 |         while (fscanf(fp, "%*s %" STR(MAX_PATH) "s %99s %*s %*d %*d\n", mount_point, type) == 2) {
            |                                                                         ^
      
      Increase mount_point variable size to MAX_PATH+1 to avoid the overflow.
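
The off-by-one comes from how the `%s` conversion treats its field width: a width of N allows N characters to be stored and a terminating NUL is written after them, so the destination needs N + 1 bytes. A minimal sketch, using `sscanf()` on a single line instead of `fscanf()` on a file; `parse_mount_line()` is a hypothetical helper and the `STR()` macro is re-created here for the example.

```c
#include <stdio.h>
#include <string.h>

#define MAX_PATH 1024
#define STR_(x) #x
#define STR(x) STR_(x)	/* expands MAX_PATH first, then stringifies: "1024" */

/*
 * Hypothetical parser for one mounts-style line. "%1024s" may store up
 * to 1024 characters plus a terminating NUL, so the buffer is sized
 * MAX_PATH + 1. Likewise "%99s" pairs with a 100-byte buffer.
 */
int parse_mount_line(const char *line, char mount_point[MAX_PATH + 1],
		     char type[100])
{
	return sscanf(line, "%*s %" STR(MAX_PATH) "s %99s %*s %*d %*d",
		      mount_point, type) == 2;
}
```

The `type` buffer already follows the rule (width 99, size 100); the fix brings `mount_point` in line with it.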
      
      Link: https://lkml.kernel.org/r/1b46712e93a2f4153909514a36016959dcc4021c.1707217097.git.bristot@kernel.org
      
      Cc: stable@vger.kernel.org
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Nathan Chancellor <nathan@kernel.org>
      Cc: Nick Desaulniers <ndesaulniers@google.com>
      Cc: Bill Wendling <morbo@google.com>
      Cc: Justin Stitt <justinstitt@google.com>
      Cc: Donald Zickus <dzickus@redhat.com>
      Fixes: a957cbc0 ("rtla: Add -C cgroup support")
      Signed-off-by: Daniel Bristot de Oliveira <bristot@kernel.org>
    • tools/rtla: Fix uninitialized bucket/data->bucket_size warning · 64dc40f7
      Daniel Bristot de Oliveira authored
      When compiling rtla with clang, I am getting the following warnings:
      
      $ make HOSTCC=clang CC=clang LLVM_IAS=1
      
      [..]
      clang -O -g -DVERSION=\"6.8.0-rc3\" -flto=auto -fexceptions
      	-fstack-protector-strong -fasynchronous-unwind-tables
      	-fstack-clash-protection  -Wall -Werror=format-security
      	-Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS
      	$(pkg-config --cflags libtracefs)
      	-c -o src/osnoise_hist.o src/osnoise_hist.c
      src/osnoise_hist.c:138:6: warning: variable 'bucket' is used uninitialized whenever 'if' condition is false [-Wsometimes-uninitialized]
        138 |         if (data->bucket_size)
            |             ^~~~~~~~~~~~~~~~~
      src/osnoise_hist.c:149:6: note: uninitialized use occurs here
        149 |         if (bucket < entries)
            |             ^~~~~~
      src/osnoise_hist.c:138:2: note: remove the 'if' if its condition is always true
        138 |         if (data->bucket_size)
            |         ^~~~~~~~~~~~~~~~~~~~~~
        139 |                 bucket = duration / data->bucket_size;
      src/osnoise_hist.c:132:12: note: initialize the variable 'bucket' to silence this warning
        132 |         int bucket;
            |                   ^
            |                    = 0
      1 warning generated.
      
      [...]
      
      clang -O -g -DVERSION=\"6.8.0-rc3\" -flto=auto -fexceptions
      	-fstack-protector-strong -fasynchronous-unwind-tables
      	-fstack-clash-protection  -Wall -Werror=format-security
      	-Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS
      	$(pkg-config --cflags libtracefs)
      	-c -o src/timerlat_hist.o src/timerlat_hist.c
      src/timerlat_hist.c:181:6: warning: variable 'bucket' is used uninitialized whenever 'if' condition is false [-Wsometimes-uninitialized]
        181 |         if (data->bucket_size)
            |             ^~~~~~~~~~~~~~~~~
      src/timerlat_hist.c:204:6: note: uninitialized use occurs here
        204 |         if (bucket < entries)
            |             ^~~~~~
      src/timerlat_hist.c:181:2: note: remove the 'if' if its condition is always true
        181 |         if (data->bucket_size)
            |         ^~~~~~~~~~~~~~~~~~~~~~
        182 |                 bucket = latency / data->bucket_size;
      src/timerlat_hist.c:175:12: note: initialize the variable 'bucket' to silence this warning
        175 |         int bucket;
            |                   ^
            |                    = 0
      1 warning generated.
      
      This is a legit warning, but data->bucket_size is always > 0 (see
      timerlat_hist_parse_args()), so the if is not necessary.
      
      Remove the unneeded if (data->bucket_size) to avoid the warning.
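
The reasoning is that a value validated once at setup time does not need to be re-checked on every update. A sketch under that assumption, with a hypothetical `struct hist_model` and helpers rather than rtla's data structures:

```c
/* Hypothetical histogram state. */
struct hist_model {
	int bucket_size;
	int entries;
	unsigned long *counts;	/* array of 'entries' counters */
	unsigned long overflow;	/* samples beyond the last bucket */
};

/* Validate once, at setup time, so later code can rely on bucket_size > 0. */
int hist_init(struct hist_model *h, int bucket_size, int entries,
	      unsigned long *counts)
{
	if (bucket_size <= 0 || entries <= 0 || !counts)
		return -1;
	h->bucket_size = bucket_size;
	h->entries = entries;
	h->counts = counts;
	h->overflow = 0;
	return 0;
}

/* bucket is assigned unconditionally, so it can never be read uninitialized. */
void hist_record(struct hist_model *h, unsigned long long duration)
{
	unsigned long long bucket = duration / (unsigned long long)h->bucket_size;

	if (bucket < (unsigned long long)h->entries)
		h->counts[bucket]++;
	else
		h->overflow++;
}
```

Because the division always runs, there is no path on which `bucket` is read before being written, and the guarantee that the divisor is non-zero lives in one place.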
      
      Link: https://lkml.kernel.org/r/6e1b1665cd99042ae705b3e0fc410858c4c42346.1707217097.git.bristot@kernel.org
      
      Cc: stable@vger.kernel.org
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Nathan Chancellor <nathan@kernel.org>
      Cc: Nick Desaulniers <ndesaulniers@google.com>
      Cc: Bill Wendling <morbo@google.com>
      Cc: Justin Stitt <justinstitt@google.com>
      Cc: Donald Zickus <dzickus@redhat.com>
      Fixes: 1eeb6328 ("rtla/timerlat: Add timerlat hist mode")
      Fixes: 829a6c0b ("rtla/osnoise: Add the hist mode")
      Signed-off-by: Daniel Bristot de Oliveira <bristot@kernel.org>
    • tools/rtla: Fix Makefile compiler options for clang · bc4cbc9d
      Daniel Bristot de Oliveira authored
      The following errors are showing up when compiling rtla with clang:
      
       $ make HOSTCC=clang CC=clang LLVM_IAS=1
       [...]
      
        clang -O -g -DVERSION=\"6.8.0-rc1\" -flto=auto -ffat-lto-objects
      	-fexceptions -fstack-protector-strong
      	-fasynchronous-unwind-tables -fstack-clash-protection  -Wall
      	-Werror=format-security -Wp,-D_FORTIFY_SOURCE=2
      	-Wp,-D_GLIBCXX_ASSERTIONS -Wno-maybe-uninitialized
      	$(pkg-config --cflags libtracefs)    -c -o src/utils.o src/utils.c
      
        clang: warning: optimization flag '-ffat-lto-objects' is not supported [-Wignored-optimization-argument]
        warning: unknown warning option '-Wno-maybe-uninitialized'; did you mean '-Wno-uninitialized'? [-Wunknown-warning-option]
        1 warning generated.
      
        clang -o rtla -ggdb  src/osnoise.o src/osnoise_hist.o src/osnoise_top.o
        src/rtla.o src/timerlat_aa.o src/timerlat.o src/timerlat_hist.o
        src/timerlat_top.o src/timerlat_u.o src/trace.o src/utils.o $(pkg-config --libs libtracefs)
      
        src/osnoise.o: file not recognized: file format not recognized
        clang: error: linker command failed with exit code 1 (use -v to see invocation)
        make: *** [Makefile:110: rtla] Error 1
      
      Solve these issues by:
        - removing -ffat-lto-objects and -Wno-maybe-uninitialized if using clang
        - informing the linker about -flto=auto
      
      Link: https://lore.kernel.org/linux-trace-kernel/567ac1b94effc228ce9a0225b9df7232a9b35b55.1707217097.git.bristot@kernel.org
      
      Cc: stable@vger.kernel.org
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Nathan Chancellor <nathan@kernel.org>
      Cc: Nick Desaulniers <ndesaulniers@google.com>
      Cc: Bill Wendling <morbo@google.com>
      Cc: Justin Stitt <justinstitt@google.com>
      Fixes: 1a7b22ab ("tools/rtla: Build with EXTRA_{C,LD}FLAGS")
      Suggested-by: Donald Zickus <dzickus@redhat.com>
      Signed-off-by: Daniel Bristot de Oliveira <bristot@kernel.org>
      bc4cbc9d
  5. 11 Feb, 2024 3 commits
  6. 10 Feb, 2024 8 commits
    • Linus Torvalds's avatar
      Merge tag 'mm-hotfixes-stable-2024-02-10-11-16' of... · 7521f258
      Linus Torvalds authored
      Merge tag 'mm-hotfixes-stable-2024-02-10-11-16' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
      
      Pull misc fixes from Andrew Morton:
       "21 hotfixes. 12 are cc:stable and the remainder pertain to post-6.7
        issues or aren't considered to be needed in earlier kernel versions"
      
      * tag 'mm-hotfixes-stable-2024-02-10-11-16' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (21 commits)
        nilfs2: fix potential bug in end_buffer_async_write
        mm/damon/sysfs-schemes: fix wrong DAMOS tried regions update timeout setup
        nilfs2: fix hang in nilfs_lookup_dirty_data_buffers()
        MAINTAINERS: Leo Yan has moved
        mm/zswap: don't return LRU_SKIP if we have dropped lru lock
        fs,hugetlb: fix NULL pointer dereference in hugetlbs_fill_super
        mailmap: switch email address for John Moon
        mm: zswap: fix objcg use-after-free in entry destruction
        mm/madvise: don't forget to leave lazy MMU mode in madvise_cold_or_pageout_pte_range()
        arch/arm/mm: fix major fault accounting when retrying under per-VMA lock
        selftests: core: include linux/close_range.h for CLOSE_RANGE_* macros
        mm/memory-failure: fix crash in split_huge_page_to_list from soft_offline_page
        mm: memcg: optimize parent iteration in memcg_rstat_updated()
        nilfs2: fix data corruption in dsync block recovery for small block sizes
        mm/userfaultfd: UFFDIO_MOVE implementation should use ptep_get()
        exit: wait_task_zombie: kill the no longer necessary spin_lock_irq(siglock)
        fs/proc: do_task_stat: use sig->stats_lock to gather the threads/children stats
        fs/proc: do_task_stat: move thread_group_cputime_adjusted() outside of lock_task_sighand()
        getrusage: use sig->stats_lock rather than lock_task_sighand()
        getrusage: move thread_group_cputime_adjusted() outside of lock_task_sighand()
        ...
      7521f258
    • Linus Torvalds's avatar
      Merge tag 'block-6.8-2024-02-10' of git://git.kernel.dk/linux · a5b6244c
      Linus Torvalds authored
      Pull block fixes from Jens Axboe:
      
       - NVMe pull request via Keith:
           - Update a potentially stale firmware attribute (Maurizio)
           - Fixes for the recent verbose error logging (Keith, Chaitanya)
           - Protection information payload size fix for passthrough (Francis)
      
       - Fix for a queue freezing issue in virtblk (Yi)
      
       - blk-iocost underflow fix (Tejun)
      
       - blk-wbt task detection fix (Jan)
      
      * tag 'block-6.8-2024-02-10' of git://git.kernel.dk/linux:
        virtio-blk: Ensure no requests in virtqueues before deleting vqs.
        blk-iocost: Fix an UBSAN shift-out-of-bounds warning
        nvme: use ns->head->pi_size instead of t10_pi_tuple structure size
        nvme-core: fix comment to reflect right functions
        nvme: move passthrough logging attribute to head
        blk-wbt: Fix detection of dirty-throttled tasks
        nvme-host: fix the updating of the firmware version
      a5b6244c
    • Linus Torvalds's avatar
      Merge tag 'firewire-fixes-6.8-rc4' of... · a38ff5bb
      Linus Torvalds authored
      Merge tag 'firewire-fixes-6.8-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394
      
      Pull firewire fix from Takashi Sakamoto:
       "A change to accelerate the device detection step in some cases.
      
        In the self-identification step after bus-reset, all nodes on the same
        bus broadcast a selfID packet that includes the gap count value. The
        value is related to the cable hops between nodes, and is used to
        calculate the subaction gap and the arbitration reset gap.
      
        When nodes have different gap count values, the asynchronous
        communication between them is unreliable, since an asynchronous
        transaction could be interrupted by another asynchronous transaction
        before completion. The gap count inconsistency can be resolved in
        several ways, e.g. by transferring a PHY configuration packet and
        generating a bus-reset.
      
        The current implementation of the firewire stack correctly detects the
        gap count inconsistency, but the recovery action tends to be delayed
        until after the configuration ROM of the root node has been read. This
        results in a long time to probe devices in some combinations of
        hardware.
      
        Here the stack is changed to schedule the action as soon as possible"
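
The inconsistency check described above can be sketched as a comparison of the gap count reported by each node. This is an illustration only: `gap_counts_consistent()` is a hypothetical helper, and it assumes the gap count values have already been extracted from the selfID packets into an array.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: returns true when every node on the bus reported
 * the same gap count. A false result is the condition under which the
 * recovery action (for example a bus reset) should be scheduled promptly.
 */
static bool gap_counts_consistent(const uint8_t *gap_count, size_t nodes)
{
	for (size_t i = 1; i < nodes; i++) {
		if (gap_count[i] != gap_count[0])
			return false;
	}
	return true;
}
```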
      
      * tag 'firewire-fixes-6.8-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394:
        firewire: core: send bus reset promptly on gap count error
      a38ff5bb
    • Linus Torvalds's avatar
      Merge tag '6.8-rc3-ksmbd-server-fixes' of git://git.samba.org/ksmbd · 5a7ec870
      Linus Torvalds authored
      Pull smb server fixes from Steve French:
       "Two ksmbd server fixes:
      
         - memory leak fix
      
         - a minor kernel-doc fix"
      
      * tag '6.8-rc3-ksmbd-server-fixes' of git://git.samba.org/ksmbd:
        ksmbd: free aux buffer if ksmbd_iov_pin_rsp_read fails
        ksmbd: Add kernel-doc for ksmbd_extract_sharename() function
      5a7ec870
    • Linus Torvalds's avatar
      Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi · 4a7bbe75
      Linus Torvalds authored
      Pull SCSI fixes from James Bottomley:
       "Three small driver fixes and one core fix.
      
        The core fix being a fixup to the one in the last pull request which
        didn't entirely move checking of scsi_host_busy() out from under the
        host lock"
      
      * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
        scsi: ufs: core: Remove the ufshcd_release() in ufshcd_err_handling_prepare()
        scsi: ufs: core: Fix shift issue in ufshcd_clear_cmd()
        scsi: lpfc: Use unsigned type for num_sge
        scsi: core: Move scsi_host_busy() out of host lock if it is for per-command
      4a7bbe75
    • Linus Torvalds's avatar
      Merge tag '6.8-rc3-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6 · ca00c700
      Linus Torvalds authored
      Pull smb client fixes from Steve French:
      
       - reconnect fix
      
       - multichannel channel selection fix
      
       - minor mount warning fix
      
       - reparse point fix
      
       - null pointer check improvement
      
      * tag '6.8-rc3-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
        smb3: clarify mount warning
        cifs: handle cases where multiple sessions share connection
        cifs: change tcon status when need_reconnect is set on it
        smb: client: set correct d_type for reparse points under DFS mounts
        smb3: add missing null server pointer check
      ca00c700
    • Linus Torvalds's avatar
      Merge tag 'ceph-for-6.8-rc4' of https://github.com/ceph/ceph-client · e1e3f530
      Linus Torvalds authored
      Pull ceph fixes from Ilya Dryomov:
       "Some fscrypt-related fixups (sparse reads are used only for encrypted
        files) and two cap handling fixes from Xiubo and Rishabh"
      
      * tag 'ceph-for-6.8-rc4' of https://github.com/ceph/ceph-client:
        ceph: always check dir caps asynchronously
        ceph: prevent use-after-free in encode_cap_msg()
        ceph: always set initial i_blkbits to CEPH_FSCRYPT_BLOCK_SHIFT
        libceph: just wait for more data to be available on the socket
        libceph: rename read_sparse_msg_*() to read_partial_sparse_msg_*()
        libceph: fail sparse-read if the data length doesn't match
      e1e3f530
    • Linus Torvalds's avatar
      Merge tag 'ntfs3_for_6.8' of https://github.com/Paragon-Software-Group/linux-ntfs3 · a2343df3
      Linus Torvalds authored
      Pull ntfs3 fixes from Konstantin Komarov:
       "Fixed:
         - size update for compressed file
         - some logic errors, overflows
         - memory leak
         - some code was refactored
      
        Added:
         - implement super_operations::shutdown
      
        Improved:
         - alternative boot processing
         - reduced stack usage"
      
      * tag 'ntfs3_for_6.8' of https://github.com/Paragon-Software-Group/linux-ntfs3: (28 commits)
        fs/ntfs3: Slightly simplify ntfs_inode_printk()
        fs/ntfs3: Add ioctl operation for directories (FITRIM)
        fs/ntfs3: Fix oob in ntfs_listxattr
        fs/ntfs3: Fix an NULL dereference bug
        fs/ntfs3: Update inode->i_size after success write into compressed file
        fs/ntfs3: Fixed overflow check in mi_enum_attr()
        fs/ntfs3: Correct function is_rst_area_valid
        fs/ntfs3: Use i_size_read and i_size_write
        fs/ntfs3: Prevent generic message "attempt to access beyond end of device"
        fs/ntfs3: use non-movable memory for ntfs3 MFT buffer cache
        fs/ntfs3: Use kvfree to free memory allocated by kvmalloc
        fs/ntfs3: Disable ATTR_LIST_ENTRY size check
        fs/ntfs3: Fix c/mtime typo
        fs/ntfs3: Add NULL ptr dereference checking at the end of attr_allocate_frame()
        fs/ntfs3: Add and fix comments
        fs/ntfs3: ntfs3_forced_shutdown use int instead of bool
        fs/ntfs3: Implement super_operations::shutdown
        fs/ntfs3: Drop suid and sgid bits as a part of fpunch
        fs/ntfs3: Add file_modified
        fs/ntfs3: Correct use bh_read
        ...
      a2343df3
  7. 09 Feb, 2024 6 commits
    • Linus Torvalds's avatar
      work around gcc bugs with 'asm goto' with outputs · 4356e9f8
      Linus Torvalds authored
      We've had issues with gcc and 'asm goto' before, and we created a
      'asm_volatile_goto()' macro for that in the past: see commits
      3f0116c3 ("compiler/gcc4: Add quirk for 'asm goto' miscompilation
      bug") and a9f18034 ("compiler/gcc4: Make quirk for
      asm_volatile_goto() unconditional").
      
      Then, much later, we ended up removing the workaround in commit
      43c249ea ("compiler-gcc.h: remove ancient workaround for gcc PR
      58670") because we no longer supported building the kernel with the
      affected gcc versions, but we left the macro uses around.
      
      Now, Sean Christopherson reports a new version of a very similar
      problem, which is fixed by re-applying that ancient workaround.  But the
      problem in question is limited to only the 'asm goto with outputs'
      cases, so instead of re-introducing the old workaround as-is, let's
      rename and limit the workaround to just that much less common case.
      
      It looks like there are at least two separate issues that all hit in
      this area:
      
       (a) some versions of gcc don't mark the asm goto as 'volatile' when it
           has outputs:
      
              https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98619
              https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110420
      
           which is easy to work around by just adding the 'volatile' by hand.
      
       (b) Internal compiler errors:
      
              https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110422
      
           which are worked around by adding the extra empty 'asm' as a
           barrier, as in the original workaround.
      
      but the problem Sean sees may be a third thing since it involves bad
      code generation (not an ICE) even with the manually added 'volatile'.
      
      But the same old workaround works for this case, even if this feels a
      bit like voodoo programming and may only be hiding the issue.
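
A rough sketch of the workaround described above: wrap 'asm goto' statements that have outputs in a macro that adds 'volatile' by hand and follows the statement with an empty 'asm' as a barrier. The macro name `asm_goto_output` and the `pass_through()` user are assumptions made for illustration, and the snippet needs a compiler that supports outputs in 'asm goto' (GCC 11 or later, or a recent clang):

```c
/*
 * Force 'volatile' on the asm goto and add an empty asm statement after
 * it. Spelled with __asm__/__volatile__ so that it also builds in strict
 * ISO C modes; the kernel itself would use the plain keywords.
 */
#define asm_goto_output(...) \
	do { __asm__ __volatile__ goto(__VA_ARGS__); __asm__ (""); } while (0)

/*
 * Hypothetical user. The empty template never jumps to 'fail'; it only
 * demonstrates an asm goto with one in/out operand and one label.
 */
static int pass_through(int v)
{
	asm_goto_output("" : "+r"(v) : : : fail);
	return v;
fail:
	return -1;
}
```

Plain 'asm goto' statements without outputs would not need the wrapper under this scheme, which keeps the workaround limited to the less common case.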
      Reported-and-tested-by: Sean Christopherson <seanjc@google.com>
      Link: https://lore.kernel.org/all/20240208220604.140859-1-seanjc@google.com/
      Cc: Nick Desaulniers <ndesaulniers@google.com>
      Cc: Uros Bizjak <ubizjak@gmail.com>
      Cc: Jakub Jelinek <jakub@redhat.com>
      Cc: Andrew Pinski <quic_apinski@quicinc.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      4356e9f8
    • Steve French's avatar
      smb3: clarify mount warning · a5cc98eb
      Steve French authored
      When a user tries to use the "sec=krb5p" mount parameter to encrypt
      data on connection to a server (when authenticating with Kerberos), we
      indicate that it is not supported, but do not note the equivalent
      recommended mount parameter ("sec=krb5,seal") which turns on encryption
      for that mount (and uses Kerberos for auth).  Update the warning message.
      Reviewed-by: Shyam Prasad N <sprasad@microsoft.com>
      Signed-off-by: Steve French <stfrench@microsoft.com>
      a5cc98eb
    • Shyam Prasad N's avatar
      cifs: handle cases where multiple sessions share connection · a39c757b
      Shyam Prasad N authored
      Based on our implementation of multichannel, it is entirely
      possible that a server struct may not be found in any channel
      of an SMB session.
      
      In such cases, we should be prepared to move on and search for
      the server struct in the next session.
      Signed-off-by: Shyam Prasad N <sprasad@microsoft.com>
      Signed-off-by: Steve French <stfrench@microsoft.com>
      a39c757b
    • Shyam Prasad N's avatar
      cifs: change tcon status when need_reconnect is set on it · c6e02eef
      Shyam Prasad N authored
      When a tcon is marked for need_reconnect, the intention
      is to have it reconnected.
      
      This change adjusts tcon->status in cifs_tree_connect
      when need_reconnect is set. Also, this change has a minor
      correction in resetting need_reconnect on success. It makes
      sure that it is done with tc_lock held.
      Signed-off-by: Shyam Prasad N <sprasad@microsoft.com>
      Signed-off-by: Steve French <stfrench@microsoft.com>
      c6e02eef
    • Filipe Manana's avatar
      btrfs: add new unused block groups to the list of unused block groups · 12c5128f
      Filipe Manana authored
      Space reservations for metadata are, most of the time, pessimistic as we
      reserve space for worst possible cases - where tree heights are at the
      maximum possible height (8), we need to COW every extent buffer in a tree
      path, need to split extent buffers, etc.
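
To make the pessimism concrete, here is a back-of-the-envelope sketch of a worst-case reservation. The factor of 2 (one COW plus one split per level) and the function name are assumptions for illustration, not a statement of the exact formula btrfs uses:

```c
#include <stdint.h>

#define MAX_TREE_HEIGHT 8	/* maximum possible tree height, as stated above */

/*
 * Assumed worst case per item: every extent buffer on the path is COWed
 * and every level is also split, hence the factor of 2.
 */
static uint64_t worst_case_metadata_bytes(uint64_t nodesize, unsigned int items)
{
	return nodesize * MAX_TREE_HEIGHT * 2 * items;
}
```

Under these assumptions, a single item on a filesystem with 16 KiB nodes reserves 16384 * 8 * 2 = 262144 bytes (256 KiB), far more than a typical insertion ends up using.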
      
      For data, we generally reserve the exact amount of space we are going to
      allocate. The exception here is when using compression, in which case we
      reserve space matching the uncompressed size, as the compression only
      happens at writeback time and in the worst possible case we need that
      amount of space in case the data is not compressible.
      
      This means that when there is no available space in the corresponding
      space_info object, we may need to allocate a new block group, and then
      that block group might not be used after all. In this case the block
      group is never added to the list of unused block groups and ends up
      never being deleted - except if we unmount and mount the fs again, as
      when reading block groups from disk we add unused ones to the list of
      unused block groups (fs_info->unused_bgs). Otherwise a block group is
      only added to the list of unused block groups when we deallocate the
      last extent from it, so if no extent is ever allocated, the block group
      is kept around forever.
      
      This also means that if we have a bunch of tasks reserving space in
      parallel we can end up allocating many block groups that end up never
      being used or kept around for too long without being used, which has
      the potential to result in ENOSPC failures, for example if we allocate
      too many metadata block groups and then end up in a state without
      enough unallocated space to allocate a new data block group.
      
      This is more likely to happen with metadata reservations as of kernel
      6.7, namely since commit 28270e25 ("btrfs: always reserve space for
      delayed refs when starting transaction"), because we started to always
      reserve space for delayed references when starting a transaction handle
      for a non-zero number of items, and also to try to reserve space to fill
      the gap between the delayed block reserve's reserved space and its size.
      
      So to avoid this, when finishing the creation of a new block group, add the
      block group to the list of unused block groups if it's still unused at
      that time. This way the next time the cleaner kthread runs, it will delete
      the block group if it's still unused and not needed to satisfy existing
      space reservations.
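
The idea can be sketched with a toy model. All types and names here (`struct block_group`, `struct fs`, `bg_is_used()`, `finish_bg_creation()`) are hypothetical simplifications and do not mirror the btrfs data structures:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct block_group {
	uint64_t used;			/* bytes allocated to extents */
	uint64_t reserved;		/* bytes reserved for pending allocations */
	bool on_unused_list;
	struct block_group *next_unused;
};

struct fs {
	struct block_group *unused_bgs;	/* singly linked list of unused groups */
};

/* Hypothetical helper: a group counts as used if any space is taken. */
static bool bg_is_used(const struct block_group *bg)
{
	return bg->used > 0 || bg->reserved > 0;
}

static void mark_bg_unused(struct fs *fs, struct block_group *bg)
{
	if (bg->on_unused_list)
		return;			/* never queue the same group twice */
	bg->next_unused = fs->unused_bgs;
	fs->unused_bgs = bg;
	bg->on_unused_list = true;
}

/*
 * At the end of block group creation, queue the group for the cleaner if
 * nothing was allocated from it in the meantime.
 */
static void finish_bg_creation(struct fs *fs, struct block_group *bg)
{
	if (!bg_is_used(bg))
		mark_bg_unused(fs, bg);
}
```

Queuing the group does not delete it; it only makes it a candidate the next time the cleaner runs.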
      Reported-by: Ivan Shapovalov <intelfx@intelfx.name>
      Link: https://lore.kernel.org/linux-btrfs/9cdbf0ca9cdda1b4c84e15e548af7d7f9f926382.camel@intelfx.name/
      CC: stable@vger.kernel.org # 6.7+
      Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
      Reviewed-by: Josef Bacik <josef@toxicpanda.com>
      Reviewed-by: Boris Burkov <boris@bur.io>
      Signed-off-by: Filipe Manana <fdmanana@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      12c5128f
    • Filipe Manana's avatar
      btrfs: do not delete unused block group if it may be used soon · f4a9f219
      Filipe Manana authored
      Before deleting a block group that is in the list of unused block groups
      (fs_info->unused_bgs), we check if the block group became used before
      deleting it, as extents from it may have been allocated after it was added
      to the list.
      
      However even if the block group was not yet used, there may be tasks that
      have only reserved space and have not yet allocated extents, and they
      might be relying on the availability of the unused block group in order
      to allocate extents. The reservation works first by increasing the
      "bytes_may_use" field of the corresponding space_info object (which may
      first require flushing delayed items, allocating a new block group, etc),
      and only later a task does the actual allocation of extents.
      
      For metadata we usually don't end up using all reserved space, as we are
      pessimistic and typically account for the worst cases (need to COW every
      single node in a path of a tree at maximum possible height, etc). For
      data we usually reserve the exact amount of space we're going to allocate
      later, except when using compression where we always reserve space based
      on the uncompressed size, as compression is only triggered when writeback
      starts so we don't know in advance how much space we'll actually need, or
      if the data is compressible.
      
      So don't delete an unused block group if the total size of its space_info
      object minus the block group's size is less than the sum of used space and
      space that may be used (space_info->bytes_may_use), as that means we have
      tasks that reserved space and may need to allocate extents from the block
      group. In this case, besides skipping the deletion, re-add the block group
      to the list of unused block groups so that it may be reconsidered later,
      in case the tasks that reserved space end up not needing to allocate
      extents from it.
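
The check described above can be written as a small predicate. This is a simplified sketch: the `struct space_info` fields and the `can_delete_unused_bg()` name are hypothetical, the used space is reduced to two counters, and the block group is assumed to be no larger than the space_info total.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, reduced view of a space_info object. */
struct space_info {
	uint64_t total_bytes;	/* sum of the sizes of all its block groups */
	uint64_t bytes_used;	/* space already allocated to extents */
	uint64_t bytes_may_use;	/* space reserved but not yet allocated */
};

/*
 * An unused block group may be deleted only if the space left after
 * removing it still covers both the used space and the outstanding
 * reservations. Otherwise the caller should skip the deletion and re-add
 * the group to the unused list so it is reconsidered later.
 */
static bool can_delete_unused_bg(const struct space_info *si, uint64_t bg_length)
{
	uint64_t needed = si->bytes_used + si->bytes_may_use;

	return si->total_bytes - bg_length >= needed;
}
```

For example, with a total of 300 units, 100 used and 50 reserved, deleting a 100-unit block group leaves 200 units, which covers the 150 needed. If the reservations grow to 150, the remaining 200 units no longer cover the 250 needed, so the group has to stay.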
      
      Allowing the deletion of the block group while we have reserved space can
      result in tasks failing to allocate metadata extents (-ENOSPC) while under
      a transaction handle, resulting in a transaction abort, or failure during
      writeback for the case of data extents.
      
      CC: stable@vger.kernel.org # 6.0+
      Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
      Reviewed-by: Josef Bacik <josef@toxicpanda.com>
      Reviewed-by: Boris Burkov <boris@bur.io>
      Signed-off-by: Filipe Manana <fdmanana@suse.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      f4a9f219