1. 20 Nov, 2023 19 commits
    • Jan Höppner's avatar
      s390/dasd: protect device queue against concurrent access · db46cd1e
      Jan Höppner authored
      In dasd_profile_start() the number of requests on the device queue is
      counted. The access to the device queue is unprotected against
      concurrent access. With a lot of parallel I/O, especially with alias
      devices enabled, the device queue can change while dasd_profile_start()
      is accessing the queue. In the worst case this leads to a kernel panic
      due to incorrect pointer accesses.
      
      Fix this by taking the device lock before accessing the queue and
      counting the requests. Additionally the check for a valid profile data
      pointer can be done earlier to avoid unnecessary locking in a hot path.
      
      Cc: <stable@vger.kernel.org>
      Fixes: 4fa52aa7 ("[S390] dasd: add enhanced DASD statistics interface")
      Reviewed-by: Stefan Haberland <sth@linux.ibm.com>
      Signed-off-by: Jan Höppner <hoeppner@linux.ibm.com>
      Signed-off-by: Stefan Haberland <sth@linux.ibm.com>
      Link: https://lore.kernel.org/r/20231025132437.1223363-3-sth@linux.ibm.com
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • Muhammad Muzammil's avatar
    • Chengming Zhou's avatar
      block/null_blk: Fix double blk_mq_start_request() warning · 53f2bca2
      Chengming Zhou authored
      When CONFIG_BLK_DEV_NULL_BLK_FAULT_INJECTION is enabled, null_queue_rq()
      would return BLK_STS_RESOURCE or BLK_STS_DEV_RESOURCE for the request,
      which has been marked as MQ_RQ_IN_FLIGHT by blk_mq_start_request().
      
      Then null_queue_rqs() puts these requests on the rqlist and returns to
      the block layer core, which tries to queue them individually again,
      triggering the warning in blk_mq_start_request().
      
      Fix it by splitting the null_queue_rq() into two parts: the first is the
      preparation of request, the second is the handling of request. We put
      the blk_mq_start_request() after the preparation part, which may fail
      and return back to the block layer core.
      
      The throttling also belongs to the preparation part, so move it before
      blk_mq_start_request(). Also change the return type of null_handle_cmd()
      to void, since it always returns BLK_STS_OK now.
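
      The split described above can be pictured with a small user-space model.
      This is only a sketch, not the null_blk code: struct request,
      start_request(), handle_request(), queue_request() and the status values
      are hypothetical stand-ins, and the prep_fails flag stands in for fault
      injection or throttling.

      ```c
      #include <stdbool.h>

      /* Hypothetical status codes and request states for illustration only. */
      enum sts { STS_OK, STS_RESOURCE };
      enum rq_state { RQ_IDLE, RQ_IN_FLIGHT };

      struct request {
          enum rq_state state;
          int starts;   /* how many times the request was marked in flight */
          int handled;  /* how many times the request was actually processed */
      };

      /* Marks the request in flight; must run at most once per request. */
      static void start_request(struct request *rq)
      {
          rq->state = RQ_IN_FLIGHT;
          rq->starts++;
      }

      /* Handling part: cannot fail in this model, so it returns nothing. */
      static void handle_request(struct request *rq)
      {
          rq->handled++;
      }

      /*
       * Preparation runs first and may fail. Because the request has not been
       * started yet, the caller can safely requeue it and call this again.
       */
      static enum sts queue_request(struct request *rq, bool prep_fails)
      {
          if (prep_fails)
              return STS_RESOURCE;

          start_request(rq);
          handle_request(rq);
          return STS_OK;
      }
      ```

      In this model a request that fails preparation stays RQ_IDLE, so a later
      retry starts it exactly once.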
      Reported-by: <syzbot+fcc47ba2476570cbbeb0@syzkaller.appspotmail.com>
      Closes: https://lore.kernel.org/all/0000000000000e6aac06098aee0c@google.com/
      Fixes: d78bfa13 ("block/null_blk: add queue_rqs() support")
      Suggested-by: Bart Van Assche <bvanassche@acm.org>
      Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
      Link: https://lore.kernel.org/r/20231120032521.1012037-1-chengming.zhou@linux.dev
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • Damien Le Moal's avatar
      block: Remove blk_set_runtime_active() · c96b8175
      Damien Le Moal authored
      The function blk_set_runtime_active() is called only from
      blk_post_runtime_resume(), so there is no need for that function to be
      exported. Open-code this function directly in blk_post_runtime_resume()
      and remove it.
      Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
      Reviewed-by: Bart Van Assche <bvanassche@acm.org>
      Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
      Link: https://lore.kernel.org/r/20231120070611.33951-1-dlemoal@kernel.org
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • Li Nan's avatar
      nbd: fix null-ptr-dereference while accessing 'nbd->config' · c2da049f
      Li Nan authored
      Memory reordering may occur in nbd_genl_connect(), causing config_refs
      to be set to 1 while nbd->config is still empty. Opening nbd at this
      time will cause null-ptr-dereference.
      
         T1                      T2
         nbd_open
          nbd_get_config_unlocked
                       	   nbd_genl_connect
                       	    nbd_alloc_and_init_config
                       	     //memory reordered
                        	     refcount_set(&nbd->config_refs, 1)  // 2
           nbd->config
            ->null point
      			     nbd->config = config  // 1
      
      Fix it by adding smp barrier to guarantee the execution sequence.
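
      The ordering requirement can be sketched in user space with C11 atomics.
      This is an analogue under stated assumptions, not the nbd code and not
      the kernel's barrier primitives: struct device, struct config,
      publish_config() and get_config() are hypothetical names, and
      release/acquire ordering stands in for the barriers the patch adds.

      ```c
      #include <stdatomic.h>
      #include <stddef.h>

      /* Hypothetical stand-ins for the nbd structures. */
      struct config {
          int num_connections;
      };

      struct device {
          struct config *config;   /* plain pointer published by the writer */
          atomic_int config_refs;  /* 0 means "no config available yet" */
      };

      /*
       * Writer: store the pointer first, then publish the reference count.
       * The release store keeps the pointer store from being reordered after it.
       */
      static void publish_config(struct device *dev, struct config *cfg)
      {
          dev->config = cfg;
          atomic_store_explicit(&dev->config_refs, 1, memory_order_release);
      }

      /*
       * Reader: take a reference only if the count is non-zero. The acquire
       * ordering pairs with the release store above, so a reader that sees a
       * non-zero count also sees the pointer that was stored before it.
       */
      static struct config *get_config(struct device *dev)
      {
          int refs = atomic_load_explicit(&dev->config_refs, memory_order_acquire);

          while (refs != 0) {
              if (atomic_compare_exchange_weak_explicit(&dev->config_refs,
                                                        &refs, refs + 1,
                                                        memory_order_acquire,
                                                        memory_order_relaxed))
                  return dev->config;
          }
          return NULL;
      }
      ```

      Without the release/acquire pairing, a reader could observe the non-zero
      count and still read a stale NULL pointer, which is the situation shown
      in the T1/T2 diagram.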
      Signed-off-by: Li Nan <linan122@huawei.com>
      Reviewed-by: Josef Bacik <josef@toxicpanda.com>
      Link: https://lore.kernel.org/r/20231116162316.1740402-4-linan666@huaweicloud.com
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • Li Nan's avatar
      nbd: factor out a helper to get nbd_config without holding 'config_lock' · 3123ac77
      Li Nan authored
      There are no functional changes, just to make code cleaner and prepare
      to fix null-ptr-dereference while accessing 'nbd->config'.
      Signed-off-by: Li Nan <linan122@huawei.com>
      Reviewed-by: Josef Bacik <josef@toxicpanda.com>
      Link: https://lore.kernel.org/r/20231116162316.1740402-3-linan666@huaweicloud.com
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • Li Nan's avatar
      nbd: fold nbd config initialization into nbd_alloc_config() · 1b598605
      Li Nan authored
      There are no functional changes, make the code cleaner and prepare to
      fix null-ptr-dereference while accessing 'nbd->config'.
      Signed-off-by: Li Nan <linan122@huawei.com>
      Reviewed-by: Josef Bacik <josef@toxicpanda.com>
      Link: https://lore.kernel.org/r/20231116162316.1740402-2-linan666@huaweicloud.com
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • Jens Axboe's avatar
      Merge tag 'md-fixes-20231120' of... · 8a554c62
      Jens Axboe authored
      Merge tag 'md-fixes-20231120' of https://git.kernel.org/pub/scm/linux/kernel/git/song/md into block-6.7
      
      Pull MD fix from Song.
      
      * tag 'md-fixes-20231120' of https://git.kernel.org/pub/scm/linux/kernel/git/song/md:
        md: fix bi_status reporting in md_end_clone_io
    • Coly Li's avatar
      bcache: avoid NULL checking to c->root in run_cache_set() · 3eba5e0b
      Coly Li authored
      In run_cache_set(), after c->root is returned from bch_btree_node_get(), it
      is checked by IS_ERR_OR_NULL(). The NULL check is unnecessary
      because bch_btree_node_get() never returns a NULL pointer to its caller.
      
      This patch replaces IS_ERR_OR_NULL() by IS_ERR() for the above reason.
      Signed-off-by: Coly Li <colyli@suse.de>
      Link: https://lore.kernel.org/r/20231120052503.6122-11-colyli@suse.de
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • Coly Li's avatar
      bcache: add code comments for bch_btree_node_get() and __bch_btree_node_alloc() · 31f5b956
      Coly Li authored
      This patch adds code comments to bch_btree_node_get() and
      __bch_btree_node_alloc() that NULL pointer will not be returned and it
      is unnecessary to check NULL pointer by the callers of these routines.
      Signed-off-by: Coly Li <colyli@suse.de>
      Link: https://lore.kernel.org/r/20231120052503.6122-10-colyli@suse.de
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • Coly Li's avatar
      bcache: replace a mistaken IS_ERR() by IS_ERR_OR_NULL() in btree_gc_coalesce() · f72f4312
      Coly Li authored
      Commit 028ddcac ("bcache: Remove unnecessary NULL point check in
      node allocations") made the following change inside btree_gc_coalesce(),
      
      31 @@ -1340,7 +1340,7 @@ static int btree_gc_coalesce(
      32         memset(new_nodes, 0, sizeof(new_nodes));
      33         closure_init_stack(&cl);
      34
      35 -       while (nodes < GC_MERGE_NODES && !IS_ERR_OR_NULL(r[nodes].b))
      36 +       while (nodes < GC_MERGE_NODES && !IS_ERR(r[nodes].b))
      37                 keys += r[nodes++].keys;
      38
      39         blocks = btree_default_blocks(b->c) * 2 / 3;
      
      At line 35 the original r[nodes].b is not always allocated by
      __bch_btree_node_alloc(), and may have been initialized to a NULL pointer
      by the caller of btree_gc_coalesce(). Therefore the change at line 36 is
      not correct.

      This patch replaces the mistaken IS_ERR() by IS_ERR_OR_NULL() to avoid
      potential issues.
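
      The difference between the two checks can be shown with a user-space
      imitation of the error-pointer helpers. This is a sketch, not the kernel
      headers: err_ptr(), is_err() and is_err_or_null() are lower-case
      stand-ins written for this example, and count_mergeable() is a
      hypothetical loop shaped like the one in the diff above.

      ```c
      #include <stdbool.h>
      #include <stddef.h>
      #include <stdint.h>

      /* Error codes are encoded in the top 4095 values of the address space. */
      #define MAX_ERRNO 4095

      /* Encode a negative error code as a pointer value. */
      static inline void *err_ptr(long error)
      {
          return (void *)(intptr_t)error;
      }

      /* True only for encoded error values; NULL is NOT an error here. */
      static inline bool is_err(const void *ptr)
      {
          return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
      }

      /* True for encoded error values and for NULL. */
      static inline bool is_err_or_null(const void *ptr)
      {
          return ptr == NULL || is_err(ptr);
      }

      /*
       * Count leading usable entries. Slots may be NULL because the caller
       * never filled them, so the loop has to stop on NULL as well as errors.
       */
      static int count_mergeable(void *const nodes[], int max)
      {
          int n = 0;

          while (n < max && !is_err_or_null(nodes[n]))
              n++;
          return n;
      }
      ```

      With is_err() alone, a NULL slot would pass the check and the loop body
      would go on to dereference it.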
      
      Fixes: 028ddcac ("bcache: Remove unnecessary NULL point check in node allocations")
      Cc: <stable@vger.kernel.org> # 6.5+
      Cc: Zheng Wang <zyytlz.wz@163.com>
      Signed-off-by: Coly Li <colyli@suse.de>
      Link: https://lore.kernel.org/r/20231120052503.6122-9-colyli@suse.de
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • Mingzhe Zou's avatar
      bcache: fixup multi-threaded bch_sectors_dirty_init() wake-up race · 2faac25d
      Mingzhe Zou authored
      We get a kernel crash about "unable to handle kernel paging request":
      
      ```dmesg
      [368033.032005] BUG: unable to handle kernel paging request at ffffffffad9ae4b5
      [368033.032007] PGD fc3a0d067 P4D fc3a0d067 PUD fc3a0e063 PMD 8000000fc38000e1
      [368033.032012] Oops: 0003 [#1] SMP PTI
      [368033.032015] CPU: 23 PID: 55090 Comm: bch_dirtcnt[0] Kdump: loaded Tainted: G           OE    --------- -  - 4.18.0-147.5.1.es8_24.x86_64 #1
      [368033.032017] Hardware name: Tsinghua Tongfang THTF Chaoqiang Server/072T6D, BIOS 2.4.3 01/17/2017
      [368033.032027] RIP: 0010:native_queued_spin_lock_slowpath+0x183/0x1d0
      [368033.032029] Code: 8b 02 48 85 c0 74 f6 48 89 c1 eb d0 c1 e9 12 83 e0
      03 83 e9 01 48 c1 e0 05 48 63 c9 48 05 c0 3d 02 00 48 03 04 cd 60 68 93
      ad <48> 89 10 8b 42 08 85 c0 75 09 f3 90 8b 42 08 85 c0 74 f7 48 8b 02
      [368033.032031] RSP: 0018:ffffbb48852abe00 EFLAGS: 00010082
      [368033.032032] RAX: ffffffffad9ae4b5 RBX: 0000000000000246 RCX: 0000000000003bf3
      [368033.032033] RDX: ffff97b0ff8e3dc0 RSI: 0000000000600000 RDI: ffffbb4884743c68
      [368033.032034] RBP: 0000000000000001 R08: 0000000000000000 R09: 000007ffffffffff
      [368033.032035] R10: ffffbb486bb01000 R11: 0000000000000001 R12: ffffffffc068da70
      [368033.032036] R13: 0000000000000003 R14: 0000000000000000 R15: 0000000000000000
      [368033.032038] FS:  0000000000000000(0000) GS:ffff97b0ff8c0000(0000) knlGS:0000000000000000
      [368033.032039] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [368033.032040] CR2: ffffffffad9ae4b5 CR3: 0000000fc3a0a002 CR4: 00000000003626e0
      [368033.032042] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [368033.032043] bcache: bch_cached_dev_attach() Caching rbd479 as bcache462 on set 8cff3c36-4a76-4242-afaa-7630206bc70b
      [368033.032045] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      [368033.032046] Call Trace:
      [368033.032054]  _raw_spin_lock_irqsave+0x32/0x40
      [368033.032061]  __wake_up_common_lock+0x63/0xc0
      [368033.032073]  ? bch_ptr_invalid+0x10/0x10 [bcache]
      [368033.033502]  bch_dirty_init_thread+0x14c/0x160 [bcache]
      [368033.033511]  ? read_dirty_submit+0x60/0x60 [bcache]
      [368033.033516]  kthread+0x112/0x130
      [368033.033520]  ? kthread_flush_work_fn+0x10/0x10
      [368033.034505]  ret_from_fork+0x35/0x40
      ```
      
      The crash occurred in the call to wake_up(&state->wait), so we wanted
      to look at the values in state. However, bch_sectors_dirty_init()
      was not found in the stack of any task. Since state is allocated on
      the stack, we suspect that bch_sectors_dirty_init() had already exited,
      so bch_dirty_init_thread() hit the "unable to handle kernel paging
      request" fault when it touched state.
      
      To verify this idea, we added some debug output around
      wake_up(&state->wait). We found that "wake up" is printed twice, although
      we expect only the last thread to issue a single wake-up.
      
      ```dmesg
      [  994.641004] alcache: bch_dirty_init_thread() wake up
      [  994.641018] alcache: bch_dirty_init_thread() wake up
      [  994.641523] alcache: bch_sectors_dirty_init() init exit
      ```
      
      There is a race. If bch_sectors_dirty_init() exits after the first wake
      up, the second wake up will trigger this bug("unable to handle kernel
      paging request").
      
      Proceed as follows:
      
      bch_sectors_dirty_init
          kthread_run ==============> bch_dirty_init_thread(bch_dirtcnt[0])
                  ...                         ...
          atomic_inc(&state.started)          ...
                  ...                         ...
          atomic_read(&state.enough)          ...
                  ...                 atomic_set(&state->enough, 1)
          kthread_run ======================================================> bch_dirty_init_thread(bch_dirtcnt[1])
                  ...                 atomic_dec_and_test(&state->started)            ...
          atomic_inc(&state.started)          ...                                     ...
                  ...                 wake_up(&state->wait)                           ...
          atomic_read(&state.enough)                                          atomic_dec_and_test(&state->started)
                  ...                                                                 ...
          wait_event(state.wait, atomic_read(&state.started) == 0)                    ...
          return                                                                      ...
                                                                              wake_up(&state->wait)
      
      We believe it is very common to wake up twice when there is no dirty data,
      but a crash is an extremely low-probability event. It is hard for us to
      reproduce this issue. We attached and detached continuously for a week,
      with a total of more than one million attaches and only one crash.
      
      Putting atomic_inc(&state.started) before kthread_run() can avoid waking
      up twice.
      
      Fixes: b144e45f ("bcache: make bch_sectors_dirty_init() to be multithreaded")
      Signed-off-by: Mingzhe Zou <mingzhe.zou@easystack.cn>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Coly Li <colyli@suse.de>
      Link: https://lore.kernel.org/r/20231120052503.6122-8-colyli@suse.de
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • Mingzhe Zou's avatar
      bcache: fixup lock c->root error · e34820f9
      Mingzhe Zou authored
      We hit an I/O hang because a thread was waiting for the lock on c->root
      to be released.
      
      crash> cache_set.root -l cache_set.list ffffa03fde4c0050
        root = 0xffff802ef454c800
      crash> btree -o 0xffff802ef454c800 | grep rw_semaphore
        [ffff802ef454c858] struct rw_semaphore lock;
      crash> struct rw_semaphore ffff802ef454c858
      struct rw_semaphore {
        count = {
          counter = -4294967297
        },
        wait_list = {
          next = 0xffff00006786fc28,
          prev = 0xffff00005d0efac8
        },
        wait_lock = {
          raw_lock = {
            {
              val = {
                counter = 0
              },
              {
                locked = 0 '\000',
                pending = 0 '\000'
              },
              {
                locked_pending = 0,
                tail = 0
              }
            }
          }
        },
        osq = {
          tail = {
            counter = 0
          }
        },
        owner = 0xffffa03fdc586603
      }
      
      The "counter = -4294967297" means that lock count is -1 and a write lock
      is being attempted. Then, we found that there is a btree with a counter
      of 1 in btree_cache_freeable.
      
      crash> cache_set -l cache_set.list ffffa03fde4c0050 -o|grep btree_cache
        [ffffa03fde4c1140] struct list_head btree_cache;
        [ffffa03fde4c1150] struct list_head btree_cache_freeable;
        [ffffa03fde4c1160] struct list_head btree_cache_freed;
        [ffffa03fde4c1170] unsigned int btree_cache_used;
        [ffffa03fde4c1178] wait_queue_head_t btree_cache_wait;
        [ffffa03fde4c1190] struct task_struct *btree_cache_alloc_lock;
      crash> list -H ffffa03fde4c1140|wc -l
      973
      crash> list -H ffffa03fde4c1150|wc -l
      1123
      crash> cache_set.btree_cache_used -l cache_set.list ffffa03fde4c0050
        btree_cache_used = 2097
      crash> list -s btree -l btree.list -H ffffa03fde4c1140|grep -E -A2 "^  lock = {" > btree_cache.txt
      crash> list -s btree -l btree.list -H ffffa03fde4c1150|grep -E -A2 "^  lock = {" > btree_cache_freeable.txt
      [root@node-3 127.0.0.1-2023-08-04-16:40:28]# pwd
      /var/crash/127.0.0.1-2023-08-04-16:40:28
      [root@node-3 127.0.0.1-2023-08-04-16:40:28]# cat btree_cache.txt|grep counter|grep -v "counter = 0"
      [root@node-3 127.0.0.1-2023-08-04-16:40:28]# cat btree_cache_freeable.txt|grep counter|grep -v "counter = 0"
            counter = 1
      
      We found that this is a bug in bch_sectors_dirty_init() when locking c->root:
          (1). Thread X has locked c->root(A) write.
          (2). Thread Y failed to lock c->root(A), waiting for the lock(c->root A).
          (3). Thread X bch_btree_set_root() changes c->root from A to B.
          (4). Thread X releases the lock(c->root A).
          (5). Thread Y successfully locks c->root(A).
          (6). Thread Y releases the lock(c->root B).
      
              down_write locked ---(1)----------------------┐
                      |                                     |
                      |   down_read waiting ---(2)----┐     |
                      |           |               ┌-------------┐ ┌-------------┐
              bch_btree_set_root ===(3)========>> | c->root   A | | c->root   B |
                      |           |               └-------------┘ └-------------┘
                  up_write ---(4)---------------------┘     |            |
                                  |                         |            |
                          down_read locked ---(5)-----------┘            |
                                  |                                      |
                              up_read ---(6)-----------------------------┘
      
      Since c->root may change, the correct way to lock c->root is the one
      used in bch_root_usage(): compare c->root again after taking the lock.
      
      static unsigned int bch_root_usage(struct cache_set *c)
      {
              unsigned int bytes = 0;
              struct bkey *k;
              struct btree *b;
              struct btree_iter iter;
      
              goto lock_root;
      
              do {
                      rw_unlock(false, b);
      lock_root:
                      b = c->root;
                      rw_lock(false, b, b->level);
              } while (b != c->root);
      
              for_each_key_filter(&b->keys, k, &iter, bch_ptr_bad)
                      bytes += bkey_bytes(k);
      
              rw_unlock(false, b);
      
              return (bytes * 100) / btree_bytes(c);
      }
      
      Fixes: b144e45f ("bcache: make bch_sectors_dirty_init() to be multithreaded")
      Signed-off-by: Mingzhe Zou <mingzhe.zou@easystack.cn>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Coly Li <colyli@suse.de>
      Link: https://lore.kernel.org/r/20231120052503.6122-7-colyli@suse.de
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • Mingzhe Zou's avatar
      bcache: fixup init dirty data errors · 7cc47e64
      Mingzhe Zou authored
      We found that after a long run, the dirty_data of the bcache device
      contains errors. The error cannot be cleared without re-registering.

      We also found that the error accumulates when reattaching after a detach.

      In bch_sectors_dirty_init(), all keys with inode <= d->id are counted
      again. This is wrong; we only need to count the keys of the current
      device.
      
      Fixes: b144e45f ("bcache: make bch_sectors_dirty_init() to be multithreaded")
      Signed-off-by: Mingzhe Zou <mingzhe.zou@easystack.cn>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Coly Li <colyli@suse.de>
      Link: https://lore.kernel.org/r/20231120052503.6122-6-colyli@suse.de
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • Rand Deeb's avatar
      bcache: prevent potential division by zero error · 2c7f497a
      Rand Deeb authored
      In SHOW(), the variable 'n' is of type 'size_t.' While there is a
      conditional check to verify that 'n' is not equal to zero before
      executing the 'do_div' macro, concerns arise regarding potential
      division by zero error in 64-bit environments.
      
      The concern arises when 'n' is 64 bits in size, greater than zero, and
      the lower 32 bits of it are zeros. In such cases, the conditional check
      passes because 'n' is non-zero, but the 'do_div' macro casts 'n' to
      'uint32_t,' effectively truncating it to its lower 32 bits.
      Consequently, the 'n' value becomes zero.
      
      To fix this potential division by zero error and ensure precise
      division handling, this commit replaces the 'do_div' macro with
      div64_u64(). div64_u64() is designed to work with 64-bit operands,
      guaranteeing that division is performed correctly.
      
      This change enhances the robustness of the code, ensuring that division
      operations yield accurate results in all scenarios, eliminating the
      possibility of division by zero, and improving compatibility across
      different 64-bit environments.
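
      The truncation described above is easy to reproduce in user space. The
      following is a sketch under stated assumptions: truncated_divisor() only
      imitates a divisor parameter that is 32 bits wide, and safe_div64() is a
      hypothetical full-width helper written for this example; neither is the
      kernel's do_div() or div64_u64().

      ```c
      #include <stdint.h>

      /*
       * Imitates passing a 64-bit value where only 32 bits of divisor are
       * used: the upper half is silently dropped.
       */
      static uint32_t truncated_divisor(uint64_t n)
      {
          return (uint32_t)n;
      }

      /*
       * Full 64-by-64 division. The zero check is applied to the same 64-bit
       * value that is used as the divisor, so it cannot be bypassed.
       */
      static uint64_t safe_div64(uint64_t dividend, uint64_t divisor)
      {
          return divisor ? dividend / divisor : 0;
      }
      ```

      For n = 1 << 32 the check n != 0 passes, yet truncated_divisor(n) is 0,
      which is exactly the case the commit message is concerned about.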
      
      Found by Linux Verification Center (linuxtesting.org) with SVACE.
      Signed-off-by: Rand Deeb <rand.sec96@gmail.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Coly Li <colyli@suse.de>
      Link: https://lore.kernel.org/r/20231120052503.6122-5-colyli@suse.de
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • Colin Ian King's avatar
      bcache: remove redundant assignment to variable cur_idx · be93825f
      Colin Ian King authored
      Variable cur_idx is being initialized with a value that is never read,
      it is being re-assigned later in a while-loop. Remove the redundant
      assignment. Cleans up clang scan build warning:
      
      drivers/md/bcache/writeback.c:916:2: warning: Value stored to 'cur_idx'
      is never read [deadcode.DeadStores]
      Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
      Reviewed-by: Coly Li <colyli@suse.de>
      Signed-off-by: Coly Li <colyli@suse.de>
      Link: https://lore.kernel.org/r/20231120052503.6122-4-colyli@suse.de
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • Coly Li's avatar
      bcache: check return value from btree_node_alloc_replacement() · 777967e7
      Coly Li authored
      In btree_gc_rewrite_node(), pointer 'n' is not checked after it returns
      from btree_node_alloc_replacement(). It is possible that 'n' is a
      non-NULL ERR_PTR(), and dereferencing such an error value is not
      permitted in the following code. Therefore a return value check is
      necessary after 'n' is returned from btree_node_alloc_replacement().
      Signed-off-by: Coly Li <colyli@suse.de>
      Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
      Cc: <stable@vger.kernel.org>
      Link: https://lore.kernel.org/r/20231120052503.6122-3-colyli@suse.de
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • Coly Li's avatar
      bcache: avoid oversize memory allocation by small stripe_size · baf8fb7e
      Coly Li authored
      The arrays bcache->stripe_sectors_dirty and bcache->full_dirty_stripes are
      used for dirty data writeback; their sizes are determined by the backing
      device capacity and the stripe size. A larger backing device capacity or a
      smaller stripe size makes these two arrays occupy more dynamic memory.
      
      Currently bcache->stripe_size is directly inherited from
      queue->limits.io_opt of the underlying storage device. For normal hard
      drives, limits.io_opt is 0, and bcache sets the corresponding
      stripe_size to 1TB (1<<31 sectors), which has worked fine for 10+ years.
      But for devices that do declare a value for queue->limits.io_opt, a small
      stripe_size (compared to 1TB) becomes an issue: it leads to oversized
      memory allocations for bcache->stripe_sectors_dirty and
      bcache->full_dirty_stripes, as the capacity of hard drives has grown much
      larger in the recent decade.
      
      For example, for a raid5 array assembled from three 20TB hard drives, the
      raid device capacity is 40TB with a typical 512KB limits.io_opt. After the
      calculation in the bcache code, these two arrays will occupy 400MB of
      dynamic memory. Even worse, Andrea Tomassetti reports that a 4KB
      limits.io_opt is declared on a new 2TB hard drive, so these two arrays
      request 2GB and 512MB of dynamic memory from kzalloc(). The result is that
      the bcache device always fails to initialize on his system.
      
      To avoid the oversized memory allocation, bcache->stripe_size should not be
      directly inherited from queue->limits.io_opt of the underlying device.
      This patch defines BCH_MIN_STRIPE_SZ (4MB) as the minimal bcache stripe size
      and sets the bcache device's stripe size based on the declared
      limits.io_opt value from the underlying storage device:
      - If the declared limits.io_opt > BCH_MIN_STRIPE_SZ, the bcache device will
        set its stripe size directly to this limits.io_opt value.
      - If the declared limits.io_opt < BCH_MIN_STRIPE_SZ, the bcache device will
        set its stripe size to a multiple of limits.io_opt that is equal to or
        larger than BCH_MIN_STRIPE_SZ.
      
      Then the minimal stripe size of a bcache device will always be >= 4MB.
      For a 40TB raid5 device with 512KB limits.io_opt, memory occupied by
      bcache->stripe_sectors_dirty and bcache->full_dirty_stripes will be 50MB
      in total. For a 2TB hard drive with 4KB limits.io_opt, memory occupied
      by these two arrays will be 2.5MB in total.
      
      Such an amount of memory allocated for bcache->stripe_sectors_dirty and
      bcache->full_dirty_stripes is reasonable for most storage devices.
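
      The rounding rule from the two list items can be sketched as follows.
      This is an illustration, not the bcache code: MIN_STRIPE_BYTES,
      pick_stripe_bytes() and stripe_count() are hypothetical names, sizes are
      in bytes rather than sectors, TB is taken as 2^40 bytes, and only the
      number of stripes is computed (per-stripe memory cost is left out).

      ```c
      #include <stdint.h>

      /* 4 MiB, the minimum stripe size named in the text above. */
      #define MIN_STRIPE_BYTES (UINT64_C(4) << 20)

      /*
       * Choose a stripe size from the device's declared optimal I/O size.
       * A value of 0 means "no hint" and is returned unchanged so the caller
       * can apply its own default.
       */
      static uint64_t pick_stripe_bytes(uint64_t io_opt)
      {
          if (io_opt == 0 || io_opt >= MIN_STRIPE_BYTES)
              return io_opt;

          /* Smallest multiple of io_opt that is >= MIN_STRIPE_BYTES. */
          return ((MIN_STRIPE_BYTES + io_opt - 1) / io_opt) * io_opt;
      }

      /* Number of stripes needed to cover the device, rounded up. */
      static uint64_t stripe_count(uint64_t capacity, uint64_t stripe)
      {
          return (capacity + stripe - 1) / stripe;
      }
      ```

      Under these assumptions a 2TB device with a 4KB io_opt drops from 2^29
      stripes to 2^19 stripes once the 4MB minimum is applied, which is where
      the reduction in array size comes from.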
      Reported-by: Andrea Tomassetti <andrea.tomassetti-opensource@devo.com>
      Signed-off-by: Coly Li <colyli@suse.de>
      Reviewed-by: Eric Wheeler <bcache@lists.ewheeler.net>
      Link: https://lore.kernel.org/r/20231120052503.6122-2-colyli@suse.de
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • Song Liu's avatar
      md: fix bi_status reporting in md_end_clone_io · 45b47895
      Song Liu authored
      md_end_clone_io() may overwrite error status in orig_bio->bi_status with
      BLK_STS_OK. This could happen when orig_bio has BIO_CHAIN (split by
      md_submit_bio => bio_split_to_limits, for example). As a result, upper
      layer may miss error reported from md (or the device) and consider the
      failed IO was successful.
      
      Fix this by updating orig_bio->bi_status only when the current bio reports
      an error and orig_bio->bi_status is BLK_STS_OK. This is the same behavior
      as __bio_chain_endio().
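
      The update rule can be written down in a few lines. This is a minimal
      sketch, not the md code: status_t, STS_OK, STS_IOERR, STS_TIMEOUT and
      propagate_status() are hypothetical names standing in for the bio status
      field and its values.

      ```c
      /* Hypothetical status type; 0 means success, anything else an error. */
      typedef unsigned char status_t;

      #define STS_OK      0
      #define STS_IOERR   10
      #define STS_TIMEOUT 11

      /*
       * Propagate a child's completion status to its parent.
       * A success never overwrites anything, and the first recorded error is
       * kept even if later children fail differently or succeed.
       */
      static void propagate_status(status_t *parent, status_t child)
      {
          if (child != STS_OK && *parent == STS_OK)
              *parent = child;
      }
      ```

      The buggy behaviour corresponds to an unconditional *parent = child,
      which lets a later successful clone erase an earlier error.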
      
      Fixes: 10764815 ("md: add io accounting for raid0 and raid5")
      Cc: stable@vger.kernel.org # v5.14+
      Reported-by: Bhanu Victor DiCara <00bvd0+linux@gmail.com>
      Closes: https://lore.kernel.org/regressions/5727380.DvuYhMxLoT@bvd0/
      Signed-off-by: Song Liu <song@kernel.org>
      Tested-by: Xiao Ni <xni@redhat.com>
      Reviewed-by: Yu Kuai <yukuai3@huawei.com>
      Acked-by: Guoqing Jiang <guoqing.jiang@linux.dev>
  2. 17 Nov, 2023 3 commits
  3. 13 Nov, 2023 2 commits
  4. 12 Nov, 2023 5 commits
  5. 11 Nov, 2023 1 commit
    • Linus Torvalds's avatar
      Merge tag 'probes-fixes-v6.7-rc1' of... · 3ca112b7
      Linus Torvalds authored
      Merge tag 'probes-fixes-v6.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
      
      Pull probes fixes from Masami Hiramatsu:
      
       - Documentation update: Add a note that argument and return value
         fetching is best effort because it depends on the type.
      
       - objpool: Fix to make internal global variables static in
         test_objpool.c.
      
       - kprobes: Unify kprobes_exceptions_nofify() prototypes. There are the
         same prototypes in asm/kprobes.h for some architectures, but some of
         them are missing the prototype and it causes a warning. So move the
         prototype into linux/kprobes.h.
      
       - tracing: Fix to check the tracepoint event and return event at
         parsing stage. The tracepoint event doesn't support %return but if
         $retval exists, it will be converted to %return silently. This finds
         that case and rejects it.
      
       - tracing: Fix the order of the descriptions about the parameters of
         __kprobe_event_gen_cmd_start() to be consistent with the argument
         list of the function.
      
      * tag 'probes-fixes-v6.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
        tracing/kprobes: Fix the order of argument descriptions
        tracing: fprobe-event: Fix to check tracepoint event and return
        kprobes: unify kprobes_exceptions_nofify() prototypes
        lib: test_objpool: make global variables static
        Documentation: tracing: Add a note about argument and retval access
  6. 10 Nov, 2023 10 commits
    • Linus Torvalds's avatar
      Merge tag 'fbdev-for-6.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/linux-fbdev · 18553507
      Linus Torvalds authored
      Pull fbdev fixes and cleanups from Helge Deller:
      
       - fix double free and resource leaks in imsttfb
      
       - lots of remove callback cleanups and section mismatch fixes in
         omapfb, amifb and atmel_lcdfb
      
       - error code fix and memparse simplification in omapfb
      
      * tag 'fbdev-for-6.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/linux-fbdev: (31 commits)
        fbdev: fsl-diu-fb: mark wr_reg_wa() static
        fbdev: amifb: Convert to platform remove callback returning void
        fbdev: amifb: Mark driver struct with __refdata to prevent section mismatch warning
        fbdev: hyperv_fb: fix uninitialized local variable use
        fbdev: omapfb/tpd12s015: Convert to platform remove callback returning void
        fbdev: omapfb/tfp410: Convert to platform remove callback returning void
        fbdev: omapfb/sharp-ls037v7dw01: Convert to platform remove callback returning void
        fbdev: omapfb/opa362: Convert to platform remove callback returning void
        fbdev: omapfb/hdmi: Convert to platform remove callback returning void
        fbdev: omapfb/dvi: Convert to platform remove callback returning void
        fbdev: omapfb/dsi-cm: Convert to platform remove callback returning void
        fbdev: omapfb/dpi: Convert to platform remove callback returning void
        fbdev: omapfb/analog-tv: Convert to platform remove callback returning void
        fbdev: atmel_lcdfb: Convert to platform remove callback returning void
        fbdev: omapfb/tpd12s015: Don't put .remove() in .exit.text and drop suppress_bind_attrs
        fbdev: omapfb/tfp410: Don't put .remove() in .exit.text and drop suppress_bind_attrs
        fbdev: omapfb/sharp-ls037v7dw01: Don't put .remove() in .exit.text and drop suppress_bind_attrs
        fbdev: omapfb/opa362: Don't put .remove() in .exit.text and drop suppress_bind_attrs
        fbdev: omapfb/hdmi: Don't put .remove() in .exit.text and drop suppress_bind_attrs
        fbdev: omapfb/dvi: Don't put .remove() in .exit.text and drop suppress_bind_attrs
        ...
      18553507
    • tracing/kprobes: Fix the order of argument descriptions · f032c53b
      Yujie Liu authored
      The order of descriptions should be consistent with the argument list of
      the function, so "kretprobe" should be the second one.
      
      int __kprobe_event_gen_cmd_start(struct dynevent_cmd *cmd, bool kretprobe,
                                       const char *name, const char *loc, ...)
      
      Link: https://lore.kernel.org/all/20231031041305.3363712-1-yujie.liu@intel.com/
      
      Fixes: 2a588dd1 ("tracing: Add kprobe event command generation functions")
      Suggested-by: Mukesh Ojha <quic_mojha@quicinc.com>
      Signed-off-by: Yujie Liu <yujie.liu@intel.com>
      Reviewed-by: Mukesh Ojha <quic_mojha@quicinc.com>
      Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
      f032c53b
    • Merge tag 'drm-next-2023-11-10' of git://anongit.freedesktop.org/drm/drm · c0d12d76
      Linus Torvalds authored
      Pull drm fixes from Daniel Vetter:
       "Dave's VPN to the big machine died, so it's on me to do fixes pr this
        and next week while everyone else is at plumbers.
      
         - big pile of amd fixes, but mostly for hw support newly added in 6.7
      
         - i915 fixes, mostly minor things
      
         - qxl memory leak fix
      
         - vc4 uaf fix in mock helpers
      
         - syncobj fix for DRM_SYNCOBJ_WAIT_FLAGS_WAIT_AVAILABLE"
      
      * tag 'drm-next-2023-11-10' of git://anongit.freedesktop.org/drm/drm: (78 commits)
        drm/amdgpu: fix error handling in amdgpu_vm_init
        drm/amdgpu: Fix possible null pointer dereference
        drm/amdgpu: move UVD and VCE sched entity init after sched init
        drm/amdgpu: move kfd_resume before the ip late init
        drm/amd: Explicitly check for GFXOFF to be enabled for s0ix
        drm/amdgpu: Change WREG32_RLC to WREG32_SOC15_RLC where inst != 0 (v2)
        drm/amdgpu: Use correct KIQ MEC engine for gfx9.4.3 (v5)
        drm/amdgpu: add smu v13.0.6 pcs xgmi ras error query support
        drm/amdgpu: fix software pci_unplug on some chips
        drm/amd/display: remove duplicated argument
        drm/amdgpu: correct mca debugfs dump reg list
        drm/amdgpu: correct acclerator check architecutre dump
        drm/amdgpu: add pcs xgmi v6.4.0 ras support
        drm/amdgpu: Change extended-scope MTYPE on GC 9.4.3
        drm/amdgpu: disable smu v13.0.6 mca debug mode by default
        drm/amdgpu: Support multiple error query modes
        drm/amdgpu: refine smu v13.0.6 mca dump driver
        drm/amdgpu: Do not program PF-only regs in hdp_v4_0.c under SRIOV (v2)
        drm/amdgpu: Skip PCTL0_MMHUB_DEEPSLEEP_IB write in jpegv4.0.3 under SRIOV
        drm: amd: Resolve Sphinx unexpected indentation warning
        ...
      c0d12d76
    • Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux · ac347a06
      Linus Torvalds authored
      Pull arm64 fixes from Catalin Marinas:
       "Mostly PMU fixes and a reworking of the pseudo-NMI disabling on broken
        MediaTek firmware:
      
         - Move the MediaTek GIC quirk handling from irqchip to core. Before
           the merging window commit 44bd78dd ("irqchip/gic-v3: Disable
           pseudo NMIs on MediaTek devices w/ firmware issues") temporarily
           addressed this issue. Fixed now at a deeper level in the arch code
      
         - Reject events meant for other PMUs in the CoreSight PMU driver,
           otherwise some of the core PMU events would disappear
      
         - Fix the Armv8 PMUv3 driver to not truncate 64-bit registers,
           causing some events to be invisible
      
         - Remove duplicate declaration of __arm64_sys##name following the
           patch to avoid prototype warning for syscalls
      
         - Typos in the elf_hwcap documentation"
      
      * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
        arm64/syscall: Remove duplicate declaration
        Revert "arm64: smp: avoid NMI IPIs with broken MediaTek FW"
        arm64: Move MediaTek GIC quirk handling from irqchip to core
        arm64/arm: arm_pmuv3: perf: Don't truncate 64-bit registers
        perf: arm_cspmu: Reject events meant for other PMUs
        Documentation/arm64: Fix typos in elf_hwcaps
      ac347a06
    • Merge tag 'sound-fix-6.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound · e1d809b3
      Linus Torvalds authored
      Pull sound fixes from Takashi Iwai:
       "A collection of fixes for rc1.
      
        The majority of changes are various ASoC driver-specific small fixes
        and usual HD-audio quirks, while there are a couple of core changes: a
        fix in ALSA core procfs code to avoid deadlocks at disconnection and
        an ASoC core fix for DAPM clock widgets"
      
      * tag 'sound-fix-6.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
        OSS: dmasound/paula: Convert to platform remove callback returning void
        ALSA: hda: ASUS UM5302LA: Added quirks for cs35L41/10431A83 on i2c bus
        ALSA: info: Fix potential deadlock at disconnection
        ASoC: nau8540: Add self recovery to improve capture quility
        ALSA: hda/realtek: Add support dual speaker for Dell
        ALSA: hda: Add ASRock X670E Taichi to denylist
        ALSA: hda/realtek: Add quirk for ASUS UX7602ZM
        ASoC: SOF: sof-client: trivial: fix comment typo
        ASoC: dapm: fix clock get name
        ASoC: hdmi-codec: register hpd callback on component probe
        ASoC: mediatek: mt8186_mt6366_rt1019_rt5682s: trivial: fix error messages
        ASoC: da7219: Improve system suspend and resume handling
        ASoC: codecs: Modify macro value error
        ASoC: codecs: Modify the wrong judgment of re value
        ASoC: codecs: Modify the maximum value of calib
        ASoC: amd: acp: fix for i2s mode register field update
        ASoC: codecs: aw88399: Fix -Wuninitialized in aw_dev_set_vcalb()
        ASoC: rt712-sdca: fix speaker route missing issue
        ASoC: rockchip: Fix unused rockchip_i2s_tdm_match warning for !CONFIG_OF
        ASoC: ti: omap-mcbsp: Fix runtime PM underflow warnings
      e1d809b3
    • Merge tag 'amd-drm-next-6.7-2023-11-10' of https://gitlab.freedesktop.org/agd5f/linux into drm-next · 03df0fc0
      Daniel Vetter authored
      amd-drm-next-6.7-2023-11-10:
      
      amdgpu:
      - SR-IOV fixes
      - DMCUB fixes
      - DCN3.5 fixes
      - DP2 fixes
      - SubVP fixes
      - SMU14 fixes
      - SDMA4.x fixes
      - Suspend/resume fixes
      - AGP regression fix
      - UAF fixes for some error cases
      - SMU 13.0.6 fixes
      - Documentation fixes
      - RAS fixes
      - Hotplug fixes
      - Scheduling entity ordering fix
      - GPUVM fixes
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
      From: Alex Deucher <alexander.deucher@amd.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20231110190703.4741-1-alexander.deucher@amd.com
      03df0fc0
    • Merge tag 'spi-fix-v6.7-merge-window' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi · ae4f52a7
      Linus Torvalds authored
      Pull spi fixes from Mark Brown:
       "A couple of fixes that came in during the merge window: one Kconfig
        dependency fix and another fix for a long standing issue where a sync
        transfer races with system suspend"
      
      * tag 'spi-fix-v6.7-merge-window' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
        spi: Fix null dereference on suspend
        spi: spi-zynq-qspi: add spi-mem to driver kconfig dependencies
      ae4f52a7
    • Merge tag 'mmc-v6.7-2' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc · b456259e
      Linus Torvalds authored
      Pull MMC fixes from Ulf Hansson:
       "MMC core:
         - Fix broken cache-flush support for Micron eMMCs
         - Revert 'mmc: core: Capture correct oemid-bits for eMMC cards'
      
        MMC host:
         - sdhci_am654: Fix TAP value parsing for legacy speed mode
         - sdhci-pci-gli: Fix support for ASPM mode for GL9755/GL9750
         - vub300: Fix an error path in probe"
      
      * tag 'mmc-v6.7-2' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc:
        mmc: sdhci-pci-gli: GL9750: Mask the replay timer timeout of AER
        mmc: sdhci-pci-gli: GL9755: Mask the replay timer timeout of AER
        Revert "mmc: core: Capture correct oemid-bits for eMMC cards"
        mmc: vub300: fix an error code
        mmc: Add quirk MMC_QUIRK_BROKEN_CACHE_FLUSH for Micron eMMC Q2J54A
        mmc: sdhci_am654: fix start loop index for TAP value parsing
      b456259e
    • Merge tag 'pwm/for-6.7-rc1-fixes' of... · b077b7ee
      Linus Torvalds authored
      Merge tag 'pwm/for-6.7-rc1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/thierry.reding/linux-pwm
      
      Pull pwm fixes from Thierry Reding:
       "This contains two very small fixes that I failed to include in the
        main pull request"
      
      * tag 'pwm/for-6.7-rc1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/thierry.reding/linux-pwm:
        pwm: Fix double shift bug
        pwm: samsung: Fix a bit test in pwm_samsung_resume()
      b077b7ee
    • Merge tag 'io_uring-6.7-2023-11-10' of git://git.kernel.dk/linux · b712075e
      Linus Torvalds authored
      Pull io_uring fixes from Jens Axboe:
       "Mostly just a few fixes and cleanups caused by the read multishot
        support.
      
        Outside of that, a stable fix for how a connect retry is done"
      
      * tag 'io_uring-6.7-2023-11-10' of git://git.kernel.dk/linux:
        io_uring: do not clamp read length for multishot read
        io_uring: do not allow multishot read to set addr or len
        io_uring: indicate if io_kbuf_recycle did recycle anything
        io_uring/rw: add separate prep handler for fixed read/write
        io_uring/rw: add separate prep handler for readv/writev
        io_uring/net: ensure socket is marked connected on connect retry
        io_uring/rw: don't attempt to allocate async data if opcode doesn't need it
      b712075e