1. 31 Aug, 2024 1 commit
    • bcachefs: Revert lockless buffered IO path · e3e69409
      Kent Overstreet authored
      We had a report of data corruption on nixos when building installer
      images.
      
      https://github.com/NixOS/nixpkgs/pull/321055#issuecomment-2184131334
      
      It seems that writes are being dropped, but only when issued by QEMU,
      and possibly only in snapshot mode. It's undetermined whether it's
      write calls or dirty folios that are being dropped.
      
      Further testing, via minimizing the original patch to just the change
      that skips the inode lock on non appends/truncates, reveals that it
      really is just not taking the inode lock that causes the corruption: it
      has nothing to do with the other logic changes for preserving write
      atomicity in corner cases.
      
      It's also kernel config dependent: it doesn't reproduce with the minimal
      kernel config that ktest uses, but it does reproduce with nixos's distro
      config. Bisecting the kernel config initially pointed the finger at page
      migration or compaction, but it appears that was erroneous; we haven't
      yet determined what kernel config option actually triggers it.
      
      Sadly it appears this will have to be reverted since we're getting too
      close to release and my plate is full, but we'd _really_ like to fully
      debug it.
      
      My suspicion is that this patch is exposing a preexisting bug - the
      inode lock actually covers very little in IO paths, and we have a
      different lock (the pagecache add lock) that guards against races with
      truncate here.
      
      Fixes: 7e64c86c ("bcachefs: Buffered write path now can avoid the inode lock")
      Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
  2. 27 Aug, 2024 2 commits
    • bcachefs: Fix bch2_extents_match() false positive · d2693569
      Kent Overstreet authored
      This was caught as a very rare nonce inconsistency, on systems with
      encryption and replication (and tiering, or some form of rebalance
      operation running):
      
      [Wed Jul 17 13:30:03 2024] about to insert invalid key in data update path
      [Wed Jul 17 13:30:03 2024] old: u64s 10 type extent 671283510:6392:U32_MAX len 16 ver 106595503: durability: 2 crc: c_size 8 size 16 offset 0 nonce 0 csum chacha20_poly1305_80 compress zstd ptr: 3:355968:104 gen 7 ptr: 4:513244:48 gen 6 rebalance: target hdd compression zstd
      [Wed Jul 17 13:30:03 2024] k:   u64s 10 type extent 671283510:6400:U32_MAX len 16 ver 106595508: durability: 2 crc: c_size 8 size 16 offset 0 nonce 0 csum chacha20_poly1305_80 compress zstd ptr: 3:355968:112 gen 7 ptr: 4:513244:56 gen 6 rebalance: target hdd compression zstd
      [Wed Jul 17 13:30:03 2024] new: u64s 14 type extent 671283510:6392:U32_MAX len 8 ver 106595508: durability: 2 crc: c_size 8 size 16 offset 0 nonce 0 csum chacha20_poly1305_80 compress zstd ptr: 3:355968:112 gen 7 cached ptr: 4:513244:56 gen 6 cached rebalance: target hdd compression zstd crc: c_size 8 size 16 offset 8 nonce 0 csum chacha20_poly1305_80 compress zstd ptr: 1:10860085:32 gen 0 ptr: 0:17285918:408 gen 0
      [Wed Jul 17 13:30:03 2024] bcachefs (cca5bc65-fe77-409d-a9fa-465a6e7f4eae): fatal error - emergency read only
      
      bch2_extents_match() was reporting true for extents that did not
      actually point to the same data.
      
      bch2_extents_match() iterates over pairs of pointers, looking for
      pointers that point to the same location on disk (with matching
      generation numbers). However one or both extents may have been trimmed
      (or merged) and they might not have the same disk offset: it corrects
      for this by subtracting the key offset and the checksum entry offset.
      
      However, this failed when an extent was immediately partially
      overwritten, and the new overwrite was allocated the next adjacent disk
      space.
      
      Normally, with compression off, this would never cause a bug, since the
      new extent would have to be immediately after the old extent for the
      pointer offsets to match, and the rebalance index update path is not
      looking for an extent outside the range of the extent it moved.
      
      However with compression enabled, extents take up less space on disk
      than they do in the btree index space - and spuriously matching after
      partial overwrite is possible.
      
      To fix this, add a secondary check, that strictly checks that the
      regions pointed to on disk overlap.
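The two checks described above can be sketched roughly as follows. This is a simplified userspace model, not the bcachefs code: `struct ptr_view` and every function name here are hypothetical, and the real extent and checksum structures carry more state.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, flattened view of one extent pointer (not a bcachefs struct). */
struct ptr_view {
	uint32_t dev;             /* device index */
	uint8_t  gen;             /* bucket generation */
	uint64_t disk_offset;     /* first sector of the data on the device */
	uint64_t compressed_size; /* sectors occupied on disk */
	uint64_t crc_offset;      /* offset into the checksummed region */
	uint64_t key_start;       /* start of the extent in index space */
};

/*
 * First check: same device and generation, and the same disk location once
 * trimming is corrected for via the key start and the checksum entry offset.
 */
static bool ptrs_same_adjusted_location(const struct ptr_view *a,
					const struct ptr_view *b)
{
	return a->dev == b->dev && a->gen == b->gen &&
	       (int64_t)(a->disk_offset + a->crc_offset - a->key_start) ==
	       (int64_t)(b->disk_offset + b->crc_offset - b->key_start);
}

/* Secondary check: the half-open sector ranges on disk must intersect. */
static bool disk_regions_overlap(const struct ptr_view *a,
				 const struct ptr_view *b)
{
	return a->disk_offset < b->disk_offset + b->compressed_size &&
	       b->disk_offset < a->disk_offset + a->compressed_size;
}

/* Both conditions must hold for two pointers to count as matching. */
static bool ptrs_match(const struct ptr_view *a, const struct ptr_view *b)
{
	return ptrs_same_adjusted_location(a, b) && disk_regions_overlap(a, b);
}
```

With values loosely modelled on the log above (assuming a key's position marks the end of its extent, so the starts are 6376 and 6384), the pointers at disk offsets 104 and 112 pass the first check but cover the disjoint sector ranges [104, 112) and [112, 120), so the secondary check rejects them.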
      
      https://github.com/koverstreet/bcachefs/issues/717
      
      Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
    • bcachefs: Fix failure to return error in data_update_index_update() · 66927b89
      Kent Overstreet authored
      This fixes an assertion pop in io_write.c - if we don't return an error
      we're supposed to have completed all the btree updates.
      Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
  3. 24 Aug, 2024 2 commits
  4. 22 Aug, 2024 22 commits
  5. 20 Aug, 2024 1 commit
  6. 19 Aug, 2024 3 commits
    • bcachefs: Fix incorrect gfp flags · 47cdc7b1
      Kent Overstreet authored
      fixes:
      00488 WARNING: CPU: 9 PID: 194 at mm/page_alloc.c:4410 __alloc_pages_noprof+0x1818/0x1888
      00488 Modules linked in:
      00488 CPU: 9 UID: 0 PID: 194 Comm: kworker/u66:1 Not tainted 6.11.0-rc1-ktest-g18fa10d6495f #2931
      00488 Hardware name: linux,dummy-virt (DT)
      00488 Workqueue: writeback wb_workfn (flush-bcachefs-2)
      00488 pstate: 20001005 (nzCv daif -PAN -UAO -TCO -DIT +SSBS BTYPE=--)
      00488 pc : __alloc_pages_noprof+0x1818/0x1888
      00488 lr : __alloc_pages_noprof+0x5f4/0x1888
      00488 sp : ffffff80ccd8ed00
      00488 x29: ffffff80ccd8ed00 x28: 0000000000000000 x27: dfffffc000000000
      00488 x26: 0000000000000010 x25: 0000000000000002 x24: 0000000000000000
      00488 x23: 0000000000000000 x22: 1ffffff0199b1dbe x21: ffffff80cc680900
      00488 x20: 0000000000000000 x19: ffffff80ccd8eed0 x18: 0000000000000000
      00488 x17: ffffff80cc58a010 x16: dfffffc000000000 x15: 1ffffff00474e518
      00488 x14: 1ffffff00474e518 x13: 1ffffff00474e518 x12: ffffffb8104701b9
      00488 x11: 1ffffff8104701b8 x10: ffffffb8104701b8 x9 : ffffffc08043cde8
      00488 x8 : 00000047efb8fe48 x7 : ffffff80ccd8ee20 x6 : 0000000000048000
      00488 x5 : 1ffffff810470138 x4 : 0000000000000050 x3 : 1ffffff0199b1d94
      00488 x2 : ffffffb0199b1d94 x1 : 0000000000000001 x0 : ffffffc082387448
      00488 Call trace:
      00488  __alloc_pages_noprof+0x1818/0x1888
      00488  new_slab+0x284/0x2f0
      00488  ___slab_alloc+0x208/0x8e0
      00488  __kmalloc_noprof+0x328/0x340
      00488  __bch2_writepage+0x106c/0x1830
      00488  write_cache_pages+0xa0/0xe8
      
      due to __GFP_NOFAIL without allowing reclaim
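The constraint behind this warning can be modelled in a few lines. This is only an illustrative userspace sketch: the `MODEL_*` bit values and `model_gfp_flags_valid()` are hypothetical stand-ins, not the kernel's definitions, and the actual flags changed by this commit are not shown here.

```c
#include <stdbool.h>

/* Model-only bit values; these are NOT the kernel's gfp flag definitions. */
#define MODEL_GFP_DIRECT_RECLAIM 0x1u /* caller may block and reclaim */
#define MODEL_GFP_KSWAPD_RECLAIM 0x2u /* caller may only wake kswapd */
#define MODEL_GFP_NOFAIL         0x4u /* allocation must not fail */

/*
 * A "must not fail" request only makes sense if the allocator is allowed
 * to reclaim memory directly on the caller's behalf; otherwise it has no
 * way to make progress under memory pressure.
 */
static bool model_gfp_flags_valid(unsigned int flags)
{
	if (flags & MODEL_GFP_NOFAIL)
		return (flags & MODEL_GFP_DIRECT_RECLAIM) != 0;
	return true;
}
```

In this model, combining the no-fail bit with only the kswapd bit is rejected, which corresponds to the combination the warning above complains about.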
      Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
    • bcachefs: fix field-spanning write warning · d9f49c31
      Kent Overstreet authored
      attempts to retrofit memory safety onto C are increasingly annoying
      
      ------------[ cut here ]------------
      memcpy: detected field-spanning write (size 4) of single field "&k.replicas" at fs/bcachefs/replicas.c:454 (size 3)
      WARNING: CPU: 5 PID: 6525 at fs/bcachefs/replicas.c:454 bch2_replicas_gc2+0x2cb/0x400 [bcachefs]
      bch2_replicas_gc2+0x2cb/0x400:
      bch2_replicas_gc2 at /home/ojab/src/bcachefs/fs/bcachefs/replicas.c:454 (discriminator 3)
      Modules linked in: dm_mod tun nf_conntrack_netlink nfnetlink xt_addrtype br_netfilter overlay msr sctp bcachefs lz4hc_compress lz4_compress libcrc32c xor raid6_pq lz4_decompress pps_ldisc pps_core wireguard libchacha20poly1305 chacha_x86_64 poly1305_x86_64 ip6_udp_tunnel udp_tunnel curve25519_x86_64 libcurve25519_generic libchacha sit tunnel4 ip_tunnel af_packet bridge stp llc ip6table_nat ip6table_filter ip6_tables xt_MASQUERADE xt_conntrack iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_filter ip_tables x_tables tcp_bbr sch_fq_codel efivarfs nls_iso8859_1 nls_cp437 vfat fat cdc_mbim cdc_wdm cdc_ncm cdc_ether usbnet r8152 input_leds joydev mii amdgpu mousedev hid_generic usbhid hid ath10k_pci amd_atl edac_mce_amd ath10k_core kvm_amd ath kvm mac80211 bfq crc32_pclmul crc32c_intel polyval_clmulni polyval_generic sha512_ssse3 sha256_ssse3 sha1_ssse3 snd_hda_codec_generic snd_hda_codec_hdmi snd_hda_intel snd_intel_dspcfg i2c_algo_bit drm_exec snd_hda_codec r8169 drm_suballoc_helper
      aesni_intel gf128mul crypto_simd amdxcp realtek mfd_core tpm_crb drm_buddy snd_hwdep mdio_devres libarc4 cryptd tpm_tis wmi_bmof cfg80211 evdev libphy snd_hda_core tpm_tis_core gpu_sched rapl xhci_pci xhci_hcd snd_pcm drm_display_helper snd_timer tpm sp5100_tco rfkill efi_pstore mpt3sas drm_ttm_helper ahci usbcore libaescfb ccp snd ttm 8250 libahci watchdog soundcore raid_class sha1_generic acpi_cpufreq k10temp 8250_base usb_common scsi_transport_sas i2c_piix4 hwmon video serial_mctrl_gpio serial_base ecdh_generic wmi rtc_cmos backlight ecc gpio_amdpt rng_core gpio_generic button
      CPU: 5 UID: 0 PID: 6525 Comm: bcachefs Tainted: G        W          6.11.0-rc1-ojab-00058-g224bc118aec9 #6 6d5debde398d2a84851f42ab300dae32c2992027
      Tainted: [W]=WARN
      RIP: 0010:bch2_replicas_gc2+0x2cb/0x400 [bcachefs]
      Code: c7 c2 60 91 d1 c1 48 89 c6 48 c7 c7 98 91 d1 c1 4c 89 14 24 44 89 5c 24 08 48 89 44 24 20 c6 05 fa 68 04 00 01 e8 05 a3 40 e4 <0f> 0b 4c 8b 14 24 44 8b 5c 24 08 48 8b 44 24 20 e9 55 fe ff ff 8b
      RSP: 0018:ffffb434c9263d60 EFLAGS: 00010246
      RAX: 0000000000000000 RBX: ffff9a8efa79cc00 RCX: 0000000000000000
      RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
      RBP: ffffb434c9263de0 R08: 0000000000000000 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000005
      R13: ffff9a8efa73c300 R14: ffff9a8d9e880000 R15: ffff9a8d9e8806f8
      FS:  0000000000000000(0000) GS:ffff9a9410c80000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 0000565423373090 CR3: 0000000164e30000 CR4: 00000000003506f0
      Call Trace:
      <TASK>
      ? __warn+0x97/0x150
      ? bch2_replicas_gc2+0x2cb/0x400 [bcachefs 9803eca5e131ef28f26250ede34072d5b50d98b3]
      bch2_replicas_gc2+0x2cb/0x400:
      bch2_replicas_gc2 at /home/ojab/src/bcachefs/fs/bcachefs/replicas.c:454 (discriminator 3)
      ? report_bug+0x196/0x1c0
      ? handle_bug+0x3c/0x70
      ? exc_invalid_op+0x17/0x80
      ? __wake_up_klogd.part.0+0x4c/0x80
      ? asm_exc_invalid_op+0x16/0x20
      ? bch2_replicas_gc2+0x2cb/0x400 [bcachefs 9803eca5e131ef28f26250ede34072d5b50d98b3]
      bch2_replicas_gc2+0x2cb/0x400:
      bch2_replicas_gc2 at /home/ojab/src/bcachefs/fs/bcachefs/replicas.c:454 (discriminator 3)
      ? bch2_dev_usage_read+0xa0/0xa0 [bcachefs 9803eca5e131ef28f26250ede34072d5b50d98b3]
      bch2_dev_usage_read+0xa0/0xa0:
      discard_in_flight_remove at /home/ojab/src/bcachefs/fs/bcachefs/alloc_background.c:1712
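For readers unfamiliar with this class of warning, the pattern can be sketched in userspace as follows. The struct layout, `MAX_DEVS`, and the function names are hypothetical, and this shows one possible way to avoid the warning, not necessarily the change made in this commit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_DEVS 4 /* hypothetical padding size */

/* Three-byte fixed header of a variable-length entry. */
struct entry_hdr {
	uint8_t data_type;
	uint8_t nr_devs;
	uint8_t nr_required;
};

/* Header plus padding that holds the variable-length device list. */
struct entry_padded {
	struct entry_hdr hdr;
	uint8_t devs[MAX_DEVS];
};

/* Bytes occupied by an entry with nr_devs devices. */
static size_t entry_bytes(uint8_t nr_devs)
{
	return sizeof(struct entry_hdr) + nr_devs;
}

/*
 * Writing n > sizeof(dst->hdr) bytes through &dst->hdr is what a fortified
 * memcpy reports as a field-spanning write. Naming the enclosing object as
 * the destination keeps the copy within the bounds the checker can see.
 */
static void entry_copy(struct entry_padded *dst, const uint8_t *src, size_t n)
{
	assert(n >= sizeof(dst->hdr) && n <= sizeof(*dst));
	memcpy(dst, src, n);
}
```

An entry with one device is 4 bytes while the header field alone is 3, which matches the sizes reported in the warning above.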
      Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
    • bcachefs: Reallocate table when we're increasing size · d6d539c9
      Kent Overstreet authored
      Fixes: c2f6e16a ("bcachefs: Increase size of cuckoo hash table on too many rehashes")
      Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
  7. 17 Aug, 2024 1 commit
  8. 16 Aug, 2024 4 commits
  9. 14 Aug, 2024 4 commits
    • bcachefs: bcachefs_metadata_version_disk_accounting_inum · 58474f76
      Kent Overstreet authored
      This adds another disk accounting counter to track usage per inode
      number (any snapshot ID).
      
      This will be used for a couple things:
      
      - It'll give us a way to tell the user how much space a given file is
        consuming in all snapshots; i.e. how much extra space it's consuming
        due to snapshot versioning.
      
      - It counts number of extents and total size of extents (both in btree
        keyspace sectors and actual disk usage), meaning it gives us average
        extent size: that is, it'll let us cheaply find fragmented files that
        should be defragmented.
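As a rough illustration of the second use, average extent size falls out of the three counters directly. The struct, field names, and threshold below are hypothetical and do not reflect the on-disk accounting format.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-inode counters, summed over all snapshots. */
struct inum_accounting {
	uint64_t nr_extents;    /* number of extents */
	uint64_t index_sectors; /* total size in btree keyspace sectors */
	uint64_t disk_sectors;  /* total size actually used on disk */
};

/* Average extent size in keyspace sectors; 0 for a file with no extents. */
static uint64_t avg_extent_sectors(const struct inum_accounting *a)
{
	return a->nr_extents ? a->index_sectors / a->nr_extents : 0;
}

/*
 * Flag a file as a defragmentation candidate when its average extent is
 * smaller than a caller-chosen threshold (an assumed policy knob).
 */
static bool looks_fragmented(const struct inum_accounting *a,
			     uint64_t min_avg_sectors)
{
	return a->nr_extents != 0 && avg_extent_sectors(a) < min_avg_sectors;
}
```

Because this needs only the counters, a scan for candidates would not have to walk each file's extents, which is the sense in which the check is cheap.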
      Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
    • bcachefs: Kill __bch2_accounting_mem_mod() · 5132b99b
      Kent Overstreet authored
      The next patch will be adding a disk accounting counter type which is
      not kept in the in-memory eytzinger tree.
      
      As prep, fold __bch2_accounting_mem_mod() into
      bch2_accounting_mem_mod_locked() so that we can check for that counter
      type and bail out without calling bpos_to_disk_accounting_pos() twice.
      Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
    • bcachefs: Make bkey_fsck_err() a wrapper around fsck_err() · d97de0d0
      Kent Overstreet authored
      bkey_fsck_err() was added as an interface that looks like fsck_err(),
      but previously all it did was ensure that the appropriate error counter
      was incremented in the superblock.
      
      This is a cleanup and bugfix patch that converts it to a wrapper around
      fsck_err(). This is needed to fix an issue with the upgrade path to
      disk_accounting_v3, where the "silent fix" error list now includes
      bkey_fsck errors; fsck_err() handles this in a unified way, and since we
      need to change printing of bkey fsck errors from the caller to the inner
      bkey_fsck_err() calls, this ends up being a pretty big change.
      
      Also, rename .invalid() methods to .validate(), for clarity, while we're
      changing the function signature anyways (to drop the printbuf argument).
      Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
    • Kent Overstreet's avatar