1. 02 Nov, 2023 11 commits
    • Kent Overstreet's avatar
      bcachefs: Fix an integer overflow · df94cb2e
      Kent Overstreet authored
      Fixes:
      
      bcachefs (e7fdc10e-54a3-49d9-bd0c-390370889d84): disk usage increased 4294967296 more than 2823707312 sectors reserved)
      transaction updates for __bchfs_fallocate journal seq 467859
        update: btree=extents cached=0 bch2_trans_update+0x4e8/0x540
          old u64s 5 type deleted 536925940:3559337304:4294967283 len 0 ver 0
          new u64s 6 type reservation 536925940:3559337304:4294967283 len 3559337304 ver 0: generation 0 replicas 2
        update: btree=inodes cached=1 bch2_extent_update_i_size_sectors+0x305/0x3b0
          old u64s 19 type inode_v3 0:536925940:4294967283 len 0 ver 0: mode 100600 flags 15300000 journal_seq 467859 bi_size 0 bi_sectors 0 bi_version 0 bi_atime 40905301656446 bi_ctime 40905301656446 bi_mtime 40905301656446 bi_otime 40905301656446 bi_uid 0 bi_gid 0 bi_nlink 0 bi_generation 0 bi_dev 0 bi_data_checksum 0 bi_compression 0 bi_project 0 bi_background_compression 0 bi_data_replicas 0 bi_promote_target 0 bi_foreground_target 0 bi_background_target 0 bi_erasure_code 0 bi_fields_set 0 bi_dir 1879048193 bi_dir_offset 3384856038735393365 bi_subvol 0 bi_parent_subvol 0 bi_nocow 0
          new u64s 19 type inode_v3 0:536925940:4294967283 len 0 ver 0: mode 100600 flags 15300000 journal_seq 467859 bi_size 0 bi_sectors 3559337304 bi_version 0 bi_atime 40905301656446 bi_ctime 40905301656446 bi_mtime 40905301656446 bi_otime 40905301656446 bi_uid 0 bi_gid 0 bi_nlink 0 bi_generation 0 bi_dev 0 bi_data_checksum 0 bi_compression 0 bi_project 0 bi_background_compression 0 bi_data_replicas 0 bi_promote_target 0 bi_foreground_target 0 bi_background_target 0 bi_erasure_code 0 bi_fields_set 0 bi_dir 1879048193 bi_dir_offset 3384856038735393365 bi_subvol 0 bi_parent_subvol 0 bi_nocow 0
      
      Kernel panic - not syncing: bcachefs (e7fdc10e-54a3-49d9-bd0c-390370889d84): panic after error
      CPU: 4 PID: 5154 Comm: rsync Not tainted 6.5.9-gateway-gca1614174cc0-dirty #1
      Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./X570 Phantom Gaming 4, BIOS P4.20 08/02/2021
      Call Trace:
       <TASK>
       dump_stack_lvl+0x5a/0x90
       panic+0x105/0x300
       ? console_unlock+0xf1/0x130
       ? bch2_printbuf_exit+0x16/0x30
       ? srso_return_thunk+0x5/0x10
       bch2_inconsistent_error+0x6f/0x80
       bch2_trans_fs_usage_apply+0x279/0x3d0
       __bch2_trans_commit+0x112a/0x1df0
       ? bch2_extent_update+0x13a/0x1d0
       bch2_extent_update+0x13a/0x1d0
       bch2_extent_fallocate+0x58e/0x740
       bch2_fallocate_dispatch+0xb7c/0x1030
       ? do_filp_open+0xa0/0x140
       vfs_fallocate+0x18e/0x1d0
       __x64_sys_fallocate+0x46/0x70
       do_syscall_64+0x48/0xa0
       ? exit_to_user_mode_prepare+0x4d/0xa0
       entry_SYSCALL_64_after_hwframe+0x6e/0xd8
      RIP: 0033:0x7fc85d91bbb3
      Code: 64 89 02 b8 ff ff ff ff eb bd 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 80 3d 31 da 0d 00 00 49 89 ca 74 14 b8 1d 01 00 00 0f 05 <48> 3d 00 f0 ff ff 77 5d c3 0f 1f 40 00 48 83 ec 28 48 89 54 24 10
      Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
      df94cb2e
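
The figures in the log above are consistent with a 32-bit wraparound: a reservation of 3559337304 sectors at 2 replicas needs 7118674608 sectors, which no longer fits in 32 bits. Truncated to 32 bits it becomes 2823707312 (the "sectors reserved" figure), and the difference is exactly 4294967296 (2^32). The sketch below only reproduces that arithmetic. The helper names are hypothetical, and it does not claim to show where in bcachefs the truncation actually happened.

```c
#include <stdint.h>

/* Hypothetical helper: sector count computed in 32 bits.
 * Unsigned arithmetic wraps modulo 2^32, so large products are truncated. */
static uint32_t sectors_needed_u32(uint32_t len, uint32_t replicas)
{
	return len * replicas;
}

/* Hypothetical helper: widen to 64 bits before multiplying,
 * so the product is exact for any 32-bit inputs. */
static uint64_t sectors_needed_u64(uint32_t len, uint32_t replicas)
{
	return (uint64_t)len * replicas;
}
```

With `len = 3559337304` and `replicas = 2`, the 32-bit version yields 2823707312 while the 64-bit version yields 7118674608, matching the two sides of the accounting error reported in the log.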
    • Kent Overstreet's avatar
      bcachefs: Don't downgrade locks on transaction restart · be9e782d
      Kent Overstreet authored
      We should only be downgrading locks on success - otherwise, our
      transaction restarts won't be getting the correct locks and we'll
      livelock.
      Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
      be9e782d
    • Kent Overstreet's avatar
      bcachefs: Fix error path in bch2_replicas_gc_end() · 85103d15
      Kent Overstreet authored
      We were dropping a lock we hadn't taken when entering with an error.
      Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
      85103d15
    • Kent Overstreet's avatar
      bcachefs: Enumerate fsck errors · b65db750
      Kent Overstreet authored
      This patch adds a superblock error counter for every distinct fsck
      error; this means that when analyzing filesystems out in the wild we'll
      be able to see what sorts of inconsistencies are being found and repaired,
      and hence what bugs to look for.
      
      Errors validating bkeys are not yet considered distinct fsck errors, but
      this patch adds a new helper, bkey_fsck_err(), in order to add distinct
      error types for them as well.
      Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
      b65db750
    • Kent Overstreet's avatar
      bcachefs: bch_sb_field_errors · f5d26fa3
      Kent Overstreet authored
      Add a new superblock section to keep counts of errors seen since
      filesystem creation: we'll be adding counters for every distinct fsck
      error.
      
      The new superblock section has entries of the form [ id, count,
      time_of_last_error ]; this is intended to let us see what errors are
      occurring - and getting fixed - via show-super output.
      Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
      f5d26fa3
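
As a rough illustration of the [ id, count, time_of_last_error ] entries described above, here is a minimal in-memory sketch. The struct, field widths, and function name are assumptions made for the example; this is not the bcachefs on-disk layout or its API.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical entry mirroring [ id, count, time_of_last_error ]. */
struct err_entry {
	uint16_t id;
	uint64_t count;
	uint64_t last_error_time;
};

/* Bump the counter for `id`, appending a new entry if it has not been
 * seen before. Returns 0 on success, -1 if the table is full. */
static int err_count_bump(struct err_entry *tab, size_t *nr, size_t cap,
			  uint16_t id, uint64_t now)
{
	for (size_t i = 0; i < *nr; i++) {
		if (tab[i].id == id) {
			tab[i].count++;
			tab[i].last_error_time = now;
			return 0;
		}
	}

	if (*nr == cap)
		return -1;

	tab[*nr] = (struct err_entry) {
		.id = id, .count = 1, .last_error_time = now,
	};
	(*nr)++;
	return 0;
}
```

Keeping one entry per error id means a dump of the table shows both how often each error occurred and when it was last seen, which is the kind of view the commit message says show-super is meant to give.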
    • Kent Overstreet's avatar
      bcachefs: Add IO error counts to bch_member · 94119eeb
      Kent Overstreet authored
      We now track IO errors per device since filesystem creation.
      
      IO error counts can be viewed in sysfs, or with the 'bcachefs
      show-super' command.
      Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
      94119eeb
    • Kent Overstreet's avatar
      5394fe94
    • Kent Overstreet's avatar
      bcachefs: Fix a kasan splat in bch2_dev_add() · e8484348
      Kent Overstreet authored
      This fixes a use after free - mi is dangling after the resize call.
      
      Additionally, resizing the device's member info section was useless - we
      were attempting to preallocate the space required before adding it to
      the filesystem superblock, but there's other sections that we should
      have been preallocating as well for that to work.
      Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
      e8484348
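
The general bug class named above (a pointer left dangling after a resize) can be sketched in plain C. The types and function names here are hypothetical stand-ins, not the bch2_dev_add() code: the point is only that a resize may move the storage, so a pointer taken beforehand must be re-derived afterwards.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical resizable section holding an array of entries. */
struct section {
	size_t nr;
	int *entries;
};

/* Grow or shrink the section. realloc() may move the storage, which
 * invalidates any pointer previously taken into s->entries.
 * Returns 0 on success, -1 on allocation failure. */
static int section_resize(struct section *s, size_t new_nr)
{
	int *p = realloc(s->entries, new_nr * sizeof(*p));

	if (!p)
		return -1;
	if (new_nr > s->nr)
		memset(p + s->nr, 0, (new_nr - s->nr) * sizeof(*p));
	s->entries = p;
	s->nr = new_nr;
	return 0;
}

/* Safe pattern: remember an index, not a pointer, and look the entry
 * up again after the resize. */
static int section_set_after_resize(struct section *s, size_t idx,
				    size_t new_nr, int val)
{
	if (section_resize(s, new_nr))
		return -1;
	if (idx >= s->nr)
		return -1;
	s->entries[idx] = val;	/* re-derived from the current base pointer */
	return 0;
}
```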
    • Kent Overstreet's avatar
      bcachefs: Fix kasan splat in members_v1_get() · 5c1ab40e
      Kent Overstreet authored
      This fixes an incorrect memcpy() in the recent members_v2 code - a
      members_v1 member is BCH_MEMBER_V1_BYTES, not sizeof(struct bch_member).
      Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
      5c1ab40e
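
The mistake described above, copying sizeof() of the newer struct out of an array of smaller old-format records, can be illustrated as follows. The struct layout and the 8-byte record size are invented for the example and are not the real bch_member definitions.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical size of an old-format (v1) record. */
#define MEMBER_V1_BYTES 8

/* Hypothetical current layout: the v1 fields plus one newer field. */
struct member {
	uint32_t a;
	uint32_t b;
	uint64_t added_in_v2;
};

/* Read record `idx` from an array of v1 records. Only MEMBER_V1_BYTES
 * are copied; copying sizeof(struct member) would read past the end of
 * the record (and past the buffer for the last record). Fields that do
 * not exist in v1 are left zeroed. */
static struct member member_v1_get(const uint8_t *v1_records, unsigned idx)
{
	struct member m;

	memset(&m, 0, sizeof(m));
	memcpy(&m, v1_records + (size_t)idx * MEMBER_V1_BYTES, MEMBER_V1_BYTES);
	return m;
}
```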
    • Kent Overstreet's avatar
      bcachefs: rebalance_work · fb3f57bb
      Kent Overstreet authored
      This adds a new btree, rebalance_work, to eliminate scanning required
      for finding extents that need work done on them in the background - i.e.
      for the background_target and background_compression options.
      
      rebalance_work is a bitset btree, where a KEY_TYPE_set corresponds to an
      extent in the extents or reflink btree at the same pos.
      
      A new extent field is added, bch_extent_rebalance, which indicates that
      this extent has work that needs to be done in the background - and which
      options to use. This allows per-inode options to be propagated to
      indirect extents - at least in some circumstances. In this patch,
      changing IO options on a file will not propagate the new options to
      indirect extents pointed to by that file.
      
      Updating (setting/clearing) the rebalance_work btree is done by the
      extent trigger, which looks at the bch_extent_rebalance field.
      
      Scanning is still required after changing IO path options - either just
      for a given inode, or for the whole filesystem. We indicate that
      scanning is required by adding a KEY_TYPE_cookie key to the
      rebalance_work btree: the cookie counter is so that we can detect that
      scanning is still required when an option has been flipped mid-way
      through an existing scan.
      
      Future possible work:
       - Propagate options to indirect extents when being changed
       - Add other IO path options - nr_replicas, ec, to rebalance_work so
         they can be applied in the background when they change
       - Add a counter, for bcachefs fs usage output, showing the pending
         amount of rebalance work: we'll probably want to do this after the
         disk space accounting rewrite (moving it to a new btree)
      Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
      fb3f57bb
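
The cookie-counter idea described above (detecting that an option was flipped part-way through a scan) can be sketched with a simple in-memory counter. This is only an analogy under stated assumptions: in the commit the cookie lives as a KEY_TYPE_cookie key in the rebalance_work btree, and every name below is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical scan state: the cookie is bumped on every option change. */
struct scan_state {
	uint64_t cookie;
};

/* Record that an IO path option changed, so a (re)scan is required. */
static void option_changed(struct scan_state *s)
{
	s->cookie++;
}

/* Start a scan: remember the cookie value seen at the beginning. */
static uint64_t scan_begin(const struct scan_state *s)
{
	return s->cookie;
}

/* Finish a scan. Returns true if no option changed while scanning;
 * false means the cookie moved mid-scan and another pass is needed. */
static bool scan_end(const struct scan_state *s, uint64_t cookie_at_start)
{
	return s->cookie == cookie_at_start;
}
```

Comparing the value seen at the start of the scan with the value at the end is what lets a scanner notice a change that raced with it, rather than wrongly concluding that all pending work was covered.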
  2. 31 Oct, 2023 27 commits
  3. 30 Oct, 2023 2 commits
    • Linus Torvalds's avatar
      Merge tag 'objtool-core-2023-10-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · cd063c8b
      Linus Torvalds authored
      Pull objtool updates from Ingo Molnar:
       "Misc fixes and cleanups:
      
         - Fix potential MAX_NAME_LEN limit related build failures
      
         - Fix scripts/faddr2line symbol filtering bug
      
         - Fix scripts/faddr2line on LLVM=1
      
         - Fix scripts/faddr2line to accept readelf output with mapping
           symbols
      
         - Minor cleanups"
      
      * tag 'objtool-core-2023-10-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        scripts/faddr2line: Skip over mapping symbols in output from readelf
        scripts/faddr2line: Use LLVM addr2line and readelf if LLVM=1
        scripts/faddr2line: Don't filter out non-function symbols from readelf
        objtool: Remove max symbol name length limitation
        objtool: Propagate early errors
        objtool: Use 'the fallthrough' pseudo-keyword
        x86/speculation, objtool: Use absolute relocations for annotations
        x86/unwind/orc: Remove redundant initialization of 'mid' pointer in __orc_find()
      cd063c8b
    • Linus Torvalds's avatar
      Merge tag 'sched-core-2023-10-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 63ce50ff
      Linus Torvalds authored
      Pull scheduler updates from Ingo Molnar:
       "Fair scheduler (SCHED_OTHER) improvements:
         - Remove the old and now unused SIS_PROP code & option
         - Scan cluster before LLC in the wake-up path
         - Use candidate prev/recent_used CPU if scanning failed for cluster
           wakeup
      
        NUMA scheduling improvements:
         - Improve the VMA access-PID code to better skip/scan VMAs
         - Extend tracing to cover VMA-skipping decisions
         - Improve/fix the recently introduced sched_numa_find_nth_cpu() code
         - Generalize numa_map_to_online_node()
      
        Energy scheduling improvements:
         - Remove the EM_MAX_COMPLEXITY limit
         - Add tracepoints to track energy computation
         - Make the behavior of the 'sched_energy_aware' sysctl more
           consistent
         - Consolidate and clean up access to a CPU's max compute capacity
         - Fix uclamp code corner cases
      
        RT scheduling improvements:
         - Drive dl_rq->overloaded with dl_rq->pushable_dl_tasks updates
         - Drive the ->rto_mask with rt_rq->pushable_tasks updates
      
        Scheduler scalability improvements:
         - Rate-limit updates to tg->load_avg
         - On x86 disable IBRS when CPU is offline to improve single-threaded
           performance
         - Micro-optimize in_task() and in_interrupt()
         - Micro-optimize the PSI code
         - Avoid updating PSI triggers and ->rtpoll_total when there are no
           state changes
      
        Core scheduler infrastructure improvements:
         - Use saved_state to reduce some spurious freezer wakeups
         - Bring in a handful of fast-headers improvements to scheduler
           headers
         - Make the scheduler UAPI headers more widely usable by user-space
         - Simplify the control flow of scheduler syscalls by using lock
           guards
         - Fix sched_setaffinity() vs. CPU hotplug race
      
        Scheduler debuggability improvements:
         - Disallow writing invalid values to sched_rt_period_us
         - Fix a race in the rq-clock debugging code triggering warnings
         - Fix a warning in the bandwidth distribution code
         - Micro-optimize in_atomic_preempt_off() checks
         - Enforce that the tasklist_lock is held in for_each_thread()
         - Print the TGID in sched_show_task()
         - Remove the /proc/sys/kernel/sched_child_runs_first sysctl
      
        ... and misc cleanups & fixes"
      
      * tag 'sched-core-2023-10-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (82 commits)
        sched/fair: Remove SIS_PROP
        sched/fair: Use candidate prev/recent_used CPU if scanning failed for cluster wakeup
        sched/fair: Scan cluster before scanning LLC in wake-up path
        sched: Add cpus_share_resources API
        sched/core: Fix RQCF_ACT_SKIP leak
        sched/fair: Remove unused 'curr' argument from pick_next_entity()
        sched/nohz: Update comments about NEWILB_KICK
        sched/fair: Remove duplicate #include
        sched/psi: Update poll => rtpoll in relevant comments
        sched: Make PELT acronym definition searchable
        sched: Fix stop_one_cpu_nowait() vs hotplug
        sched/psi: Bail out early from irq time accounting
        sched/topology: Rename 'DIE' domain to 'PKG'
        sched/psi: Delete the 'update_total' function parameter from update_triggers()
        sched/psi: Avoid updating PSI triggers and ->rtpoll_total when there are no state changes
        sched/headers: Remove comment referring to rq::cpu_load, since this has been removed
        sched/numa: Complete scanning of inactive VMAs when there is no alternative
        sched/numa: Complete scanning of partial VMAs regardless of PID activity
        sched/numa: Move up the access pid reset logic
        sched/numa: Trace decisions related to skipping VMAs
        ...
      63ce50ff