1. 23 Jan, 2020 12 commits
    • bcache: avoid unnecessary btree nodes flushing in btree_flush_write() · 2aa8c529
      Coly Li authored
      The commit 91be66e1 ("bcache: performance improvement for
      btree_flush_write()") was an effort to flush the btree node with the
      oldest journal entry faster, using the following methods:
      - Only iterate dirty btree nodes in c->btree_cache, avoiding a scan of
        many clean btree nodes.
      - Treat c->btree_cache as an LRU-like list and aggressively flush all
        dirty nodes from the tail of c->btree_cache until the btree node with
        the oldest journal entry is flushed. This reduces the time
        c->bucket_lock is held.
      
      Guoju Fang and Shuang Li reported that they observed unexpected extra
      write I/Os on the cache device after applying the above patch. Guoju Fang
      provided more detailed diagnostic information: the aggressive btree node
      flushing may cause 10x more btree nodes to be flushed in his workload.
      He pointed out that when system memory is large enough to hold all
      btree nodes, c->btree_cache is no longer an LRU-like list, so the btree
      node with the oldest journal entry is very probably not close to the
      tail of the c->btree_cache list. In that situation many more dirty btree
      nodes are aggressively flushed before the target node is flushed. When a
      slow SATA SSD is used as the cache device, this over-aggressive flushing
      causes a performance regression.
      
      After spending a lot of time debugging and diagnosing, I found that the
      real condition is more complicated, and aggressively flushing dirty btree
      nodes from the tail of the c->btree_cache list is not a good solution.
      - When all btree nodes are cached in memory, c->btree_cache is not
        an LRU-like list, so the btree nodes with the oldest journal entry
        won't be close to the tail of the list.
      - There can be hundreds of dirty btree nodes referencing the oldest
        journal entry, and the oldest journal entry cannot be reclaimed until
        all of them are flushed.
      When these two conditions combine, simply flushing from the tail of
      the c->btree_cache list is really NOT a good idea.
      
      Fortunately there is still a chance to make btree_flush_write() work
      better. Here is how this patch avoids unnecessary btree node flushing:
      - Only acquire c->journal.lock when getting the oldest journal entry of
        fifo c->journal.pin. In the remaining locations the journal entries
        are checked locklessly, so their values can be changed on other cores
        in parallel.
      - In the list_for_each_entry_safe_reverse() loop, check the latest front
        pointer of fifo c->journal.pin. If it differs from the original
        pointer obtained while holding c->journal.lock, the oldest journal
        entry has been reclaimed on other cores. At that moment all selected
        dirty nodes recorded in the array btree_nodes[] have been flushed and
        are clean on other CPU cores, so it is unnecessary to iterate
        c->btree_cache any longer. Just quit the
        list_for_each_entry_safe_reverse() loop; the following for-loop will
        skip all the selected clean nodes.
      - Find a proper time to quit the list_for_each_entry_safe_reverse()
        loop. Check the refcount value of the original fifo front pointer: if
        it is larger than the number of nodes selected into btree_nodes[],
        more matching btree nodes remain to be scanned. Otherwise there are no
        more matching btree nodes in the rest of the c->btree_cache list, and
        the loop can quit. If the original oldest journal entry is reclaimed
        and the fifo front pointer is updated, the refcount of the original
        fifo front pointer will be 0, so the loop quits too.
      - Do not hold c->bucket_lock for too long. c->bucket_lock is also
        required for space allocation for cached data, and holding it too long
        blocks regular I/O requests. When iterating the list c->btree_cache,
        even if there are many matching btree nodes, at most BTREE_FLUSH_NR
        nodes are selected and flushed in the following for-loop, so that
        c->bucket_lock is not held for too long.
      With this patch, only btree nodes referencing the oldest journal entry
      are flushed to the cache device, with no more aggressive flushing of
      unnecessary btree nodes. And to avoid blocking regular I/O requests,
      each call to btree_flush_write() selects at most BTREE_FLUSH_NR btree
      nodes to flush, even if there are more matching btree nodes in the list
      c->btree_cache.
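      The selection rules above can be modeled in a few lines of userspace C.
      This is a simplified sketch under stated assumptions, not bcache's code:
      the array stands in for c->btree_cache (walked from its tail),
      `struct journal_entry`, `struct btree_node` and `select_nodes()` are
      hypothetical, the value 8 for BTREE_FLUSH_NR is assumed, and the
      per-node b->write_lock and c->bucket_lock are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed batch limit for this sketch; not taken from bcache. */
#define BTREE_FLUSH_NR 8

/* Hypothetical stand-ins for a journal pin entry and a cached btree node. */
struct journal_entry {
	int refcount;                /* dirty nodes still pinning this entry */
};

struct btree_node {
	bool dirty;
	struct journal_entry *pin;   /* journal entry this node's writes reference */
};

/*
 * Scan nodes[] from the tail toward the head (the analogue of walking
 * c->btree_cache in reverse) and collect at most BTREE_FLUSH_NR dirty nodes
 * that pin `oldest`, the front entry snapshotted while the lock was held.
 * `*front` is re-read on every iteration without any lock.
 * Returns how many nodes were stored in out[].
 */
static size_t select_nodes(struct btree_node *nodes, size_t n,
			   struct journal_entry *const *front,
			   struct journal_entry *oldest,
			   struct btree_node **out)
{
	size_t nr = 0;

	assert(oldest != NULL && out != NULL);
	for (size_t i = n; i-- > 0; ) {
		/* Oldest entry reclaimed elsewhere: nothing left worth scanning. */
		if (*front != oldest)
			break;
		/* Every node pinning the oldest entry has been selected already. */
		if (oldest->refcount <= (int)nr)
			break;
		if (!nodes[i].dirty || nodes[i].pin != oldest)
			continue;
		out[nr++] = &nodes[i];
		/* Cap the batch so the scan, and any lock held around it, stays short. */
		if (nr == BTREE_FLUSH_NR)
			break;
	}
	return nr;
}
```

      Checking the refcount before inspecting each node lets the scan stop as
      soon as the last node pinning the oldest entry has been found, rather
      than walking the rest of the list.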
      
      Finally, one more thing to explain: why is it safe to read the front
      pointer of c->journal.pin without holding c->journal.lock inside the
      list_for_each_entry_safe_reverse() loop?
      
      Here is my answer: when reading the front pointer of fifo c->journal.pin,
      we don't need to know its exact value; we only want to check whether it
      differs from the original front pointer (which is accurate because it
      was read while c->journal.lock was held). For that purpose, reading it
      without holding c->journal.lock works as expected. Even if the front
      pointer has been changed on another CPU core but the change is not yet
      visible on the local core, and the btree node currently being iterated
      references the same journal entry as the originally fetched fifo front
      pointer, it is still safe: after acquiring mutex b->write_lock (which
      implies a memory barrier) this btree node will be found clean and
      skipped, and the loop will quit later when iterating the next node of
      list c->btree_cache.
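      The argument rests on comparing an exact snapshot against later,
      possibly stale reads. A minimal userspace analogue, assuming C11 atomics
      and a pthread mutex in place of the kernel's primitives
      (`struct pin_fifo`, `snapshot_front()` and `front_moved()` are
      hypothetical names, and the front is modeled as an index that only
      advances), looks like this:

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the journal pin FIFO: only the front matters here. */
struct pin_fifo {
	pthread_mutex_t lock;     /* plays the role of c->journal.lock */
	_Atomic unsigned front;   /* index of the oldest entry; assumed to only advance */
};

/* Exact snapshot of the front, taken while holding the lock. */
static unsigned snapshot_front(struct pin_fifo *f)
{
	unsigned front;

	assert(f != NULL);
	pthread_mutex_lock(&f->lock);
	front = atomic_load_explicit(&f->front, memory_order_relaxed);
	pthread_mutex_unlock(&f->lock);
	return front;
}

/*
 * Lockless check: the value read may be stale, which is acceptable because
 * the caller only asks whether the front has moved away from the snapshot.
 * With a front that only advances, a mismatch means it really moved; a
 * stale value can only delay noticing that.
 */
static bool front_moved(struct pin_fifo *f, unsigned snap)
{
	return atomic_load_explicit(&f->front, memory_order_relaxed) != snap;
}
```

      C11 read-read coherence guarantees that a later load of the same atomic
      object on the same thread never returns a value older than one that
      thread already read, which is what makes the "changed or not" question
      safe to answer without the lock.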
      
      Fixes: 91be66e1 ("bcache: performance improvement for btree_flush_write()")
      Reported-by: Guoju Fang <fangguoju@gmail.com>
      Reported-by: Shuang Li <psymon@bonuscloud.io>
      Signed-off-by: Coly Li <colyli@suse.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • bcache: add code comments for state->pool in __btree_sort() · 7a0bc2a8
      Coly Li authored
      Add code comments explaining why the pages allocated from mempool
      state->pool can be swapped in __btree_sort(): state->pool is a page
      pool, which actually allocates pages with alloc_pages().
      Signed-off-by: Coly Li <colyli@suse.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • lib: crc64: include <linux/crc64.h> for 'crc64_be' · 0e0c1231
      Ben Dooks (Codethink) authored
      crc64_be() is declared in <linux/crc64.h>, so include that header
      where the symbol is defined to avoid the following sparse
      warning:
      
      lib/crc64.c:43:12: warning: symbol 'crc64_be' was not declared. Should it be static?
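      The fix works because the defining file now sees the same prototype
      that callers see. A self-contained sketch, with the prototype written
      inline where the header include would normally go, and a bitwise
      CRC-64 assumed to match the ECMA-182 polynomial that the kernel's
      table-driven crc64_be() uses:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Normally this prototype comes from a header (<linux/crc64.h> in the
 * kernel). Because the defining file sees it too, sparse no longer asks
 * "Should it be static?".
 */
uint64_t crc64_be(uint64_t crc, const void *p, size_t len);

/* ECMA-182 polynomial, processed MSB-first (no reflection, no final XOR). */
#define CRC64_ECMA182_POLY 0x42F0E1EBA9EA3693ULL

/* Bitwise reference version; the kernel uses a precomputed 256-entry table. */
uint64_t crc64_be(uint64_t crc, const void *p, size_t len)
{
	const unsigned char *buf = p;

	assert(buf != NULL || len == 0);
	for (size_t i = 0; i < len; i++) {
		crc ^= (uint64_t)buf[i] << 56;
		for (int bit = 0; bit < 8; bit++)
			crc = (crc & (1ULL << 63)) ? (crc << 1) ^ CRC64_ECMA182_POLY
						   : crc << 1;
	}
	return crc;
}
```

      Passing the previous result as `crc` lets a buffer be checksummed in
      pieces.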
      Signed-off-by: Ben Dooks (Codethink) <ben.dooks@codethink.co.uk>
      Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
      Signed-off-by: Coly Li <colyli@suse.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • bcache: use read_cache_page_gfp to read the superblock · 6321bef0
      Christoph Hellwig authored
      Avoid a pointless dependency on buffer heads in bcache by simply
      open-coding the read of a single page. Also add a SB_OFFSET define for
      the byte offset of the superblock instead of using magic numbers.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Coly Li <colyli@suse.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • bcache: store a pointer to the on-disk sb in the cache and cached_dev structures · 475389ae
      Christoph Hellwig authored
      This allows properly building the superblock bio, including the offset
      in the page, using the normal bio helpers. This fixes writing the
      superblock for page sizes larger than 4k, where the sb write bio needs
      an offset in the bio_vec.
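      The need for an in-page offset follows from simple arithmetic. Assuming
      the superblock starts at sector 8 of 512 bytes (byte 4096), it is the
      start of page 1 on 4 KiB pages but lies 4096 bytes into page 0 on 64 KiB
      pages. The macros and helpers below are illustrative, not copied from
      bcache:

```c
#include <assert.h>
#include <stddef.h>

/* Assumed layout for this sketch: superblock at sector 8, 512-byte sectors. */
#define SECTOR_SHIFT 9
#define SB_SECTOR    8
#define SB_OFFSET    ((size_t)SB_SECTOR << SECTOR_SHIFT)   /* 4096 bytes */

/* Page-cache index of the page containing byte offset pos. */
static size_t page_index_of(size_t pos, unsigned page_shift)
{
	return pos >> page_shift;
}

/* Byte offset of pos inside its page: what the bio_vec offset must carry. */
static size_t offset_in_page_of(size_t pos, unsigned page_shift)
{
	assert(page_shift < sizeof(size_t) * 8);
	return pos & (((size_t)1 << page_shift) - 1);
}
```

      With 4 KiB pages (`page_shift` 12) the offset is 0, which is why a
      bio built without an offset happened to work there; with 64 KiB pages
      (`page_shift` 16) it is 4096.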
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Coly Li <colyli@suse.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • bcache: return a pointer to the on-disk sb from read_super · cfa0c56d
      Christoph Hellwig authored
      Returning the properly typed actual data structure instead of the
      containing struct page will save the callers some work going
      forward.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Coly Li <colyli@suse.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • bcache: transfer the sb_page reference to register_{bdev,cache} · fc8f19cc
      Christoph Hellwig authored
      Avoid an extra reference count roundtrip by transferring the sb_page
      ownership to the lower level register helpers.
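      Transferring a reference means the callee keeps the count it was handed,
      instead of taking its own and having the caller drop the original. A
      minimal sketch with a hypothetical counter standing in for the struct
      page refcount (`struct page_ref`, `register_dev()` and `free_dev()` are
      made-up names):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical refcounted object standing in for the superblock page. */
struct page_ref {
	int refcount;
};

static void get_ref(struct page_ref *p)
{
	p->refcount++;
}

static void put_ref(struct page_ref *p)
{
	assert(p->refcount > 0);
	p->refcount--;
}

/* Hypothetical device that owns the superblock page once registered. */
struct dev {
	struct page_ref *sb_page;
};

/* Takes over the caller's reference: no get here, and no put in the caller. */
static void register_dev(struct dev *d, struct page_ref *sb_page)
{
	d->sb_page = sb_page;
}

/* The new owner drops the reference it was handed; skipping this leaks it. */
static void free_dev(struct dev *d)
{
	put_ref(d->sb_page);
	d->sb_page = NULL;
}
```

      Whoever ends up owning the reference must drop it on teardown;
      forgetting that put is the kind of leak where a page is lost each time
      a device is stopped.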
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Coly Li <colyli@suse.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • bcache: fix use-after-free in register_bcache() · ae3cd299
      Coly Li authored
      The patch "bcache: rework error unwinding in register_bcache" introduces
      a use-after-free regression in register_bcache(). Here is the current code:
      	2510 out_free_path:
      	2511         kfree(path);
      	2512 out_module_put:
      	2513         module_put(THIS_MODULE);
      	2514 out:
      	2515         pr_info("error %s: %s", path, err);
      	2516         return ret;
      If an error occurs and the above code path is executed, path is freed at
      line 2511 but still referenced at line 2515, and KASAN reports a
      use-after-free.
      
      This patch changes line 2515 as follows to fix the problem:
      	2515         pr_info("error %s: %s", path?path:"", err);
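      Note that `path?path:""` only guards against NULL: after kfree(path)
      the variable still holds its old, non-NULL address, so the check helps
      only if path is also reset to NULL once it is freed. A userspace sketch
      of that pattern, with malloc()/free() in place of the kernel allocators
      and hypothetical helper names and messages:

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Formats the error line into buf and returns it (hypothetical helper). */
static const char *log_error(char *buf, size_t len, const char *path,
			     const char *err)
{
	assert(buf != NULL && len > 0);
	/* The ternary only catches NULL; it cannot tell that a pointer was freed. */
	snprintf(buf, len, "error %s: %s", path ? path : "", err);
	return buf;
}

static const char *fail_after_path(char *buf, size_t len)
{
	const char *err = "cannot allocate memory";   /* illustrative message */
	char *path = malloc(sizeof("/dev/sdX"));       /* illustrative path */

	if (!path)
		return log_error(buf, len, NULL, err);
	strcpy(path, "/dev/sdX");

	/* ... pretend a later registration step failed here ... */
	free(path);
	path = NULL;   /* without this, log_error() would read freed memory */
	return log_error(buf, len, path, err);
}
```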
      Signed-off-by: Coly Li <colyli@suse.de>
      Cc: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • bcache: properly initialize 'path' and 'err' in register_bcache() · 29cda393
      Coly Li authored
      Patch "bcache: rework error unwinding in register_bcache" from
      Christoph Hellwig leaves the local variables 'path' and 'err' in an
      undefined initial state. If the code in register_bcache() jumps
      to label 'out:' or 'out_module_put:' via goto, these two variables
      might be referenced with undefined values by the following lines:
      
      	out_module_put:
      	        module_put(THIS_MODULE);
      	out:
      	        pr_info("error %s: %s", path, err);
      	        return ret;
      
      Therefore this patch initializes these two local variables properly
      in register_bcache() to avoid such issue.
      Signed-off-by: Coly Li <colyli@suse.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • bcache: rework error unwinding in register_bcache · 50246693
      Christoph Hellwig authored
      Split the successful and error return paths, and use one goto label for each
      resource to unwind.  This also fixes some small errors like leaking the
      module reference count in the reboot case (which seems entirely harmless)
      or printing the wrong warning messages for early failures.
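      A userspace sketch of the split described here: the success path
      returns early after handing resources to the caller, while each error
      label frees exactly one resource and falls through to the next. The
      allocation counter, `struct reg` and the `fail_at` switch are
      hypothetical; initializing `path` and `sb` up front keeps their values
      defined at every label.

```c
#include <assert.h>
#include <stdlib.h>

static int live_allocs;   /* outstanding allocations, so leaks are observable */

static void *tracked_alloc(size_t n)
{
	void *p = malloc(n);

	if (p)
		live_allocs++;
	return p;
}

static void tracked_free(void *p)
{
	if (p) {
		live_allocs--;
		free(p);
	}
}

struct reg {
	char *path;
	char *sb;
};

/* fail_at: 0 = succeed, 1 = fail after path, 2 = fail after sb (for testing). */
static int do_register(int fail_at, struct reg *res)
{
	char *path = NULL, *sb = NULL;   /* defined values whichever label is reached */
	int ret = -1;

	assert(res != NULL);
	path = tracked_alloc(16);
	if (!path)
		goto out;
	if (fail_at == 1)
		goto out_free_path;
	sb = tracked_alloc(32);
	if (!sb)
		goto out_free_path;
	if (fail_at == 2)
		goto out_free_sb;

	/* Success path: ownership moves to the caller, nothing is unwound. */
	res->path = path;
	res->sb = sb;
	return 0;

	/* Error path: one label per resource, released in reverse order. */
out_free_sb:
	tracked_free(sb);
out_free_path:
	tracked_free(path);
out:
	return ret;
}
```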
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Coly Li <colyli@suse.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • bcache: use a separate data structure for the on-disk super block · a702a692
      Christoph Hellwig authored
      Split out an on-disk version struct cache_sb with the proper endianness
      annotations.  This fixes a fair chunk of sparse warnings, but there are
      some left due to the way the checksum is defined.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Coly Li <colyli@suse.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • bcache: cached_dev_free needs to put the sb page · e8547d42
      Liang Chen authored
      Same as cache device, the buffer page needs to be put while
      freeing cached_dev.  Otherwise a page would be leaked every
      time a cached_dev is stopped.
      Signed-off-by: Liang Chen <liangchen.linux@gmail.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Coly Li <colyli@suse.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  2. 14 Jan, 2020 1 commit
    • Merge branch 'md-next' of... · 7454049e
      Jens Axboe authored
      Merge branch 'md-next' of git://git.kernel.org/pub/scm/linux/kernel/git/song/md into for-5.6/drivers
      
      Pull MD changes from Song.
      
      * 'md-next' of git://git.kernel.org/pub/scm/linux/kernel/git/song/md:
        md/raid1: introduce wait_for_serialization
        md/raid1: use bucket based mechanism for IO serialization
        md: introduce a new struct for IO serialization
        md: don't destroy serial_info_pool if serialize_policy is true
        raid1: serialize the overlap write
        md: reorgnize mddev_create/destroy_serial_pool
        md: add serialize_policy sysfs node for raid1
        md: prepare for enable raid1 io serialization
        md: fix a typo s/creat/create
        md: rename wb stuffs
        raid5: remove worker_cnt_per_group argument from alloc_thread_groups
        md/raid6: fix algorithm choice under larger PAGE_SIZE
        raid6/test: fix a compilation warning
        raid6/test: fix a compilation error
        md-bitmap: small cleanups
  3. 13 Jan, 2020 15 commits
  4. 09 Jan, 2020 1 commit
  5. 07 Jan, 2020 2 commits
  6. 19 Dec, 2019 3 commits
  7. 18 Dec, 2019 1 commit
    • Merge tag 'sound-5.5-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound · 80a0c2e5
      Linus Torvalds authored
      Pull sound fixes from Takashi Iwai:
       "A slightly large amount at this time, but all good and small fixes:
      
         - A PCM core fix that initializes the buffer properly to avoid
           information leaks; it is a long-standing minor problem, but better
           fixed now
      
         - A few ASoC core fixes for the init / cleanup ordering issues that
           surfaced after the recent refactoring
      
         - Lots of SOF and topology-related fixes went in, as usual for such
           hot topics
      
         - Several ASoC codec and platform-specific small fixes: wm89xx,
           realtek, and max98090, AMD, Intel-SST
      
         - A fix for the previous incomplete regression of HD-audio, now
           hitting Nvidia HDMI
      
         - A few HD-audio CA0132 codec fixes"
      
      * tag 'sound-5.5-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (27 commits)
        ALSA: hda - Downgrade error message for single-cmd fallback
        ASoC: wm8962: fix lambda value
        ALSA: hda: Fix regression by strip mask fix
        ALSA: hda/ca0132 - Fix work handling in delayed HP detection
        ALSA: hda/ca0132 - Avoid endless loop
        ALSA: hda/ca0132 - Keep power on during processing DSP response
        ALSA: pcm: Avoid possible info leaks from PCM stream buffers
        ASoC: Intel: common: work-around incorrect ACPI HID for CML boards
        ASoC: SOF: Intel: split cht and byt debug window sizes
        ASoC: SOF: loader: fix snd_sof_fw_parse_ext_data
        ASoC: SOF: loader: snd_sof_fw_parse_ext_data log warning on unknown header
        ASoC: simple-card: Don't create separate link when platform is present
        ASoC: topology: Check return value for soc_tplg_pcm_create()
        ASoC: topology: Check return value for snd_soc_add_dai_link()
        ASoC: core: only flush inited work during free
        ASoC: Intel: bytcr_rt5640: Update quirk for Teclast X89
        ASoC: core: Init pcm runtime work early to avoid warnings
        ASoC: Intel: sst: Add missing include <linux/io.h>
        ASoC: max98090: fix possible race conditions
        ASoC: max98090: exit workaround earlier if PLL is locked
        ...
  8. 17 Dec, 2019 5 commits
    • Merge tag 'for-5.5-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux · 2187f215
      Linus Torvalds authored
      Pull btrfs fixes from David Sterba:
       "A mix of regression fixes and regular fixes for stable trees:
      
         - fix swapped error messages for qgroup enable/rescan
      
         - fixes for NO_HOLES feature with clone range
      
         - fix deadlock between iget/srcu lock/synchronize srcu while freeing
           an inode
      
         - fix double lock on subvolume cross-rename
      
         - tree log fixes
            * fix missing data checksums after replaying a log tree
            * also teach tree-checker about this problem
            * skip log replay on orphaned roots
      
         - fix maximum devices constraints for RAID1C -3 and -4
      
         - send: don't print warning on read-only mount regarding orphan
           cleanup
      
         - error handling fixes"
      
      * tag 'for-5.5-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
        btrfs: send: remove WARN_ON for readonly mount
        btrfs: do not leak reloc root if we fail to read the fs root
        btrfs: skip log replay on orphaned roots
        btrfs: handle ENOENT in btrfs_uuid_tree_iterate
        btrfs: abort transaction after failed inode updates in create_subvol
        Btrfs: fix hole extent items with a zero size after range cloning
        Btrfs: fix removal logic of the tree mod log that leads to use-after-free issues
        Btrfs: make tree checker detect checksum items with overlapping ranges
        Btrfs: fix missing data checksums after replaying a log tree
        btrfs: return error pointer from alloc_test_extent_buffer
        btrfs: fix devs_max constraints for raid1c3 and raid1c4
        btrfs: tree-checker: Fix error format string for size_t
        btrfs: don't double lock the subvol_sem for rename exchange
        btrfs: handle error in btrfs_cache_block_group
        btrfs: do not call synchronize_srcu() in inode_tree_del
        Btrfs: fix cloning range with a hole when using the NO_HOLES feature
        btrfs: Fix error messages in qgroup_rescan_init
    • early init: fix error handling when opening /dev/console · 2d3145f8
      Linus Torvalds authored
      The comment says "this should never fail", but it definitely can fail
      when you have odd initial boot filesystems, or kernel configurations.
      
      So get the error handling right: filp_open() returns an error pointer.
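      An error pointer encodes a negative errno in the top MAX_ERRNO
      addresses, so a NULL check does not catch it and IS_ERR() must be used
      instead. A userspace re-creation of the kernel's include/linux/err.h
      convention, where `open_console()` is a hypothetical stand-in for
      filp_open():

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Userspace re-creation of the err.h convention (illustrative). */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error)
{
	return (void *)(intptr_t)error;
}

static inline long PTR_ERR(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

/* The top MAX_ERRNO addresses are never valid objects, so they encode -errno. */
static inline bool IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical opener: like filp_open(), it never returns NULL on failure. */
static int console_obj;

static void *open_console(bool fail)
{
	return fail ? ERR_PTR(-ENOENT) : &console_obj;
}
```

      A caller written as `if (!file)` would treat the failing case as a
      valid handle; `if (IS_ERR(file))` followed by `PTR_ERR(file)` recovers
      the errno.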
      Reported-by: Jesse Barnes <jsbarnes@google.com>
      Reported-by: youling 257 <youling257@gmail.com>
      Fixes: 8243186f ("fs: remove ksys_dup()")
      Cc: Dominik Brodowski <linux@dominikbrodowski.net>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Merge tag 'regulator-fix-v5.5-rc2' of... · 58d90a04
      Linus Torvalds authored
      Merge tag 'regulator-fix-v5.5-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator
      
      Pull regulator fixes from Mark Brown:
       "A small set of fixes for mostly minor issues here, the only real code
        ones are Wen Yang's fixes for error handling in the core and Christian
        Marussi's list_voltage() change which is a fix for disruptively bad
        performance for regulators with continuous voltage control (which are
        rare)"
      
      * tag 'regulator-fix-v5.5-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator:
        regulator: rn5t618: fix module aliases
        regulator: max77650: add of_match table
        regulator: core: avoid unneeded .list_voltage calls
        regulator: s5m8767: Fix a warning message
        regulator: core: fix regulator_register() error paths to properly release rdev
        regulator: fix use after free issue
    • Merge tag 'spi-fix-v5.5-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi · a922f1a9
      Linus Torvalds authored
      Pull spi fixes from Mark Brown:
       "A relatively large set of fixes here, the biggest part of it is for
        fallout from the GPIO descriptor rework that affected several of the
        devices with usable native chip select support. There's also some new
        PCI IDs for Intel Jasper Lake devices.
      
        The conversion to platform_get_irq() in the fsl driver is an
        incremental fix for build errors introduced on SPARC by the earlier
        fix for error handling in probe in that driver"
      
      * tag 'spi-fix-v5.5-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
        spi: fsl: use platform_get_irq() instead of of_irq_to_resource()
        spi: nxp-fspi: Ensure width is respected in spi-mem operations
        spi: spi-ti-qspi: Fix a bug when accessing non default CS
        spi: fsl: don't map irq during probe
        spi: spi-cavium-thunderx: Add missing pci_release_regions()
        spi: sprd: Fix the incorrect SPI register
        gpiolib: of: Make of_gpio_spi_cs_get_count static
        spi: fsl: Handle the single hardwired chipselect case
        gpio: Handle counting of Freescale chipselects
        spi: fsl: Fix GPIO descriptor support
        spi: dw: Correct handling of native chipselect
        spi: cadence: Correct handling of native chipselect
        spi: pxa2xx: Add support for Intel Jasper Lake
    • Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 9065e063
      Linus Torvalds authored
      Pull x86 fix from Ingo Molnar:
       "Fix kexec booting with certain EFI memory map layouts"
      
      * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/efi: Update e820 with reserved EFI boot services data to fix kexec breakage