1. 14 Feb, 2020 4 commits
  2. 13 Feb, 2020 3 commits
    • Coly Li's avatar
      bcache: remove macro nr_to_fifo_front() · 4ec31cb6
      Coly Li authored
      Macro nr_to_fifo_front() is only used once in btree_flush_write(),
      it is unncessary indeed. This patch removes this macro and does
      calculation directly in place.
      Signed-off-by: default avatarColy Li <colyli@suse.de>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      4ec31cb6
    • Coly Li's avatar
      bcache: Revert "bcache: shrink btree node cache after bch_btree_check()" · 309cc719
      Coly Li authored
      This reverts commit 1df3877f.
      
      In my testing, sometimes even all the cached btree nodes are freed,
      creating gc and allocator kernel threads may still fail. Finally it
      turns out that kthread_run() may fail if there is pending signal for
      current task. And the pending signal is sent from OOM killer which
      is triggered by memory consuption in bch_btree_check().
      
      Therefore explicitly shrinking bcache btree node here does not help,
      and after the shrinker callback is improved, as well as pending signals
      are ignored before creating kernel threads, now such operation is
      unncessary anymore.
      
      This patch reverts the commit 1df3877f ("bcache: shrink btree node
      cache after bch_btree_check()") because we have better improvement now.
      Signed-off-by: default avatarColy Li <colyli@suse.de>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      309cc719
    • Coly Li's avatar
      bcache: ignore pending signals when creating gc and allocator thread · 0b96da63
      Coly Li authored
      When run a cache set, all the bcache btree node of this cache set will
      be checked by bch_btree_check(). If the bcache btree is very large,
      iterating all the btree nodes will occupy too much system memory and
      the bcache registering process might be selected and killed by system
      OOM killer. kthread_run() will fail if current process has pending
      signal, therefore the kthread creating in run_cache_set() for gc and
      allocator kernel threads are very probably failed for a very large
      bcache btree.
      
      Indeed such OOM is safe and the registering process will exit after
      the registration done. Therefore this patch flushes pending signals
      during the cache set start up, specificly in bch_cache_allocator_start()
      and bch_gc_thread_start(), to make sure run_cache_set() won't fail for
      large cahced data set.
      Signed-off-by: default avatarColy Li <colyli@suse.de>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      0b96da63
  3. 04 Feb, 2020 5 commits
    • Jens Axboe's avatar
      Merge branch 'nvme-5.6' of git://git.infradead.org/nvme into block-5.6 · b74e58cd
      Jens Axboe authored
      Pull NVMe fixes from Keith.
      
      * 'nvme-5.6' of git://git.infradead.org/nvme:
        nvmet: update AEN list and array at one place
        nvmet: Fix controller use after free
        nvmet: Fix error print message at nvmet_install_queue function
        nvme-pci: remove nvmeq->tags
        nvmet: fix dsm failure when payload does not match sgl descriptor
        nvmet: Pass lockdep expression to RCU lists
      b74e58cd
    • Daniel Wagner's avatar
      nvmet: update AEN list and array at one place · 0f5be6a4
      Daniel Wagner authored
      All async events are enqueued via nvmet_add_async_event() which
      updates the ctrl->async_event_cmds[] array and additionally an struct
      nvmet_async_event is added to the ctrl->async_events list.
      
      Under normal operations the nvmet_async_event_work() updates again
      the ctrl->async_event_cmds and removes the corresponding struct
      nvmet_async_event from the list again. Though nvmet_sq_destroy() could
      be called which calls nvmet_async_events_free() which only updates the
      ctrl->async_event_cmds[] array.
      
      Add new functions nvmet_async_events_process() and
      nvmet_async_events_free() to process async events, update an array and
      the list.
      
      When we destroy submission queue after clearing the aen present on
      the ctrl->async list we also loop over ctrl->async_event_cmds[] for
      any requests posted by the host for which we don't have the AEN in
      the ctrl->async_events list by calling nvmet_async_event_process()
      and nvmet_async_events_free().
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarDaniel Wagner <dwagner@suse.de>
      [chaitanya.kulkarni@wdc.com
       * Loop over and clear out outstanding requests
       * Update changelog
      ]
      Signed-off-by: default avatarChaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
      Signed-off-by: default avatarKeith Busch <kbusch@kernel.org>
      0f5be6a4
    • Israel Rukshin's avatar
      nvmet: Fix controller use after free · 1a3f540d
      Israel Rukshin authored
      After nvmet_install_queue() sets sq->ctrl calling to nvmet_sq_destroy()
      reduces the controller refcount. In case nvmet_install_queue() fails,
      calling to nvmet_ctrl_put() is done twice (at nvmet_sq_destroy and
      nvmet_execute_io_connect/nvmet_execute_admin_connect) instead of once for
      the queue which leads to use after free of the controller. Fix this by set
      NULL at sq->ctrl in case of a failure at nvmet_install_queue().
      
      The bug leads to the following Call Trace:
      
      [65857.994862] refcount_t: underflow; use-after-free.
      [65858.108304] Workqueue: events nvmet_rdma_release_queue_work [nvmet_rdma]
      [65858.115557] RIP: 0010:refcount_warn_saturate+0xe5/0xf0
      [65858.208141] Call Trace:
      [65858.211203]  nvmet_sq_destroy+0xe1/0xf0 [nvmet]
      [65858.216383]  nvmet_rdma_release_queue_work+0x37/0xf0 [nvmet_rdma]
      [65858.223117]  process_one_work+0x167/0x370
      [65858.227776]  worker_thread+0x49/0x3e0
      [65858.232089]  kthread+0xf5/0x130
      [65858.235895]  ? max_active_store+0x80/0x80
      [65858.240504]  ? kthread_bind+0x10/0x10
      [65858.244832]  ret_from_fork+0x1f/0x30
      [65858.249074] ---[ end trace f82d59250b54beb7 ]---
      
      Fixes: bb1cc747 ("nvmet: implement valid sqhd values in completions")
      Fixes: 1672ddb8 ("nvmet: Add install_queue callout")
      Signed-off-by: default avatarIsrael Rukshin <israelr@mellanox.com>
      Reviewed-by: default avatarMax Gurtovoy <maxg@mellanox.com>
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarKeith Busch <kbusch@kernel.org>
      1a3f540d
    • Israel Rukshin's avatar
      nvmet: Fix error print message at nvmet_install_queue function · 0b87a2b7
      Israel Rukshin authored
      Place the arguments in the correct order.
      
      Fixes: 1672ddb8 ("nvmet: Add install_queue callout")
      Signed-off-by: default avatarIsrael Rukshin <israelr@mellanox.com>
      Reviewed-by: default avatarMax Gurtovoy <maxg@mellanox.com>
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarKeith Busch <kbusch@kernel.org>
      0b87a2b7
    • Zhiqiang Liu's avatar
      brd: check and limit max_part par · c8ab4225
      Zhiqiang Liu authored
      In brd_init func, rd_nr num of brd_device are firstly allocated
      and add in brd_devices, then brd_devices are traversed to add each
      brd_device by calling add_disk func. When allocating brd_device,
      the disk->first_minor is set to i * max_part, if rd_nr * max_part
      is larger than MINORMASK, two different brd_device may have the same
      devt, then only one of them can be successfully added.
      when rmmod brd.ko, it will cause oops when calling brd_exit.
      
      Follow those steps:
        # modprobe brd rd_nr=3 rd_size=102400 max_part=1048576
        # rmmod brd
      then, the oops will appear.
      
      Oops log:
      [  726.613722] Call trace:
      [  726.614175]  kernfs_find_ns+0x24/0x130
      [  726.614852]  kernfs_find_and_get_ns+0x44/0x68
      [  726.615749]  sysfs_remove_group+0x38/0xb0
      [  726.616520]  blk_trace_remove_sysfs+0x1c/0x28
      [  726.617320]  blk_unregister_queue+0x98/0x100
      [  726.618105]  del_gendisk+0x144/0x2b8
      [  726.618759]  brd_exit+0x68/0x560 [brd]
      [  726.619501]  __arm64_sys_delete_module+0x19c/0x2a0
      [  726.620384]  el0_svc_common+0x78/0x130
      [  726.621057]  el0_svc_handler+0x38/0x78
      [  726.621738]  el0_svc+0x8/0xc
      [  726.622259] Code: aa0203f6 aa0103f7 aa1e03e0 d503201f (7940e260)
      
      Here, we add brd_check_and_reset_par func to check and limit max_part par.
      
      --
      V5->V6:
       - remove useless code
      
      V4->V5:(suggested by Ming Lei)
       - make sure max_part is not larger than DISK_MAX_PARTS
      
      V3->V4:(suggested by Ming Lei)
       - remove useless change
       - add one limit of max_part
      
      V2->V3: (suggested by Ming Lei)
       - clear .minors when running out of consecutive minor space in brd_alloc
       - remove limit of rd_nr
      
      V1->V2:
       - add more checks in brd_check_par_valid as suggested by Ming Lei.
      Signed-off-by: default avatarZhiqiang Liu <liuzhiqiang26@huawei.com>
      Reviewed-by: default avatarBob Liu <bob.liu@oracle.com>
      Reviewed-by: default avatarMing Lei <ming.lei@redhat.com>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      c8ab4225
  4. 03 Feb, 2020 10 commits
  5. 01 Feb, 2020 5 commits
    • Coly Li's avatar
      bcache: check return value of prio_read() · 49d08d59
      Coly Li authored
      Now if prio_read() failed during starting a cache set, we can print
      out error message in run_cache_set() and handle the failure properly.
      Signed-off-by: default avatarColy Li <colyli@suse.de>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      49d08d59
    • Coly Li's avatar
      bcache: fix incorrect data type usage in btree_flush_write() · d1c3cc34
      Coly Li authored
      Dan Carpenter points out that from commit 2aa8c529 ("bcache: avoid
      unnecessary btree nodes flushing in btree_flush_write()"), there is a
      incorrect data type usage which leads to the following static checker
      warning:
      	drivers/md/bcache/journal.c:444 btree_flush_write()
      	warn: 'ref_nr' unsigned <= 0
      
      drivers/md/bcache/journal.c
         422  static void btree_flush_write(struct cache_set *c)
         423  {
         424          struct btree *b, *t, *btree_nodes[BTREE_FLUSH_NR];
         425          unsigned int i, nr, ref_nr;
                                          ^^^^^^
      
         426          atomic_t *fifo_front_p, *now_fifo_front_p;
         427          size_t mask;
         428
         429          if (c->journal.btree_flushing)
         430                  return;
         431
         432          spin_lock(&c->journal.flush_write_lock);
         433          if (c->journal.btree_flushing) {
         434                  spin_unlock(&c->journal.flush_write_lock);
         435                  return;
         436          }
         437          c->journal.btree_flushing = true;
         438          spin_unlock(&c->journal.flush_write_lock);
         439
         440          /* get the oldest journal entry and check its refcount */
         441          spin_lock(&c->journal.lock);
         442          fifo_front_p = &fifo_front(&c->journal.pin);
         443          ref_nr = atomic_read(fifo_front_p);
         444          if (ref_nr <= 0) {
                          ^^^^^^^^^^^
      Unsigned can't be less than zero.
      
         445                  /*
         446                   * do nothing if no btree node references
         447                   * the oldest journal entry
         448                   */
         449                  spin_unlock(&c->journal.lock);
         450                  goto out;
         451          }
         452          spin_unlock(&c->journal.lock);
      
      As the warning information indicates, local varaible ref_nr in unsigned
      int type is wrong, which does not matche atomic_read() and the "<= 0"
      checking.
      
      This patch fixes the above error by defining local variable ref_nr as
      int type.
      
      Fixes: 2aa8c529 ("bcache: avoid unnecessary btree nodes flushing in btree_flush_write()")
      Reported-by: default avatarDan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: default avatarColy Li <colyli@suse.de>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      d1c3cc34
    • Coly Li's avatar
      bcache: add readahead cache policy options via sysfs interface · 038ba8cc
      Coly Li authored
      In year 2007 high performance SSD was still expensive, in order to
      save more space for real workload or meta data, the readahead I/Os
      for non-meta data was bypassed and not cached on SSD.
      
      In now days, SSD price drops a lot and people can find larger size
      SSD with more comfortable price. It is unncessary to alway bypass
      normal readahead I/Os to save SSD space for now.
      
      This patch adds options for readahead data cache policies via sysfs
      file /sys/block/bcache<N>/readahead_cache_policy, the options are,
      - "all": cache all readahead data I/Os.
      - "meta-only": only cache meta data, and bypass other regular I/Os.
      
      If users want to make bcache continue to only cache readahead request
      for metadata and bypass regular data readahead, please set "meta-only"
      to this sysfs file. By default, bcache will back to cache all read-
      ahead requests now.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarColy Li <colyli@suse.de>
      Acked-by: default avatarEric Wheeler <bcache@linux.ewheeler.net>
      Cc: Michael Lyle <mlyle@lyle.org>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      038ba8cc
    • Coly Li's avatar
      bcache: explicity type cast in bset_bkey_last() · 7c02b005
      Coly Li authored
      In bset.h, macro bset_bkey_last() is defined as,
          bkey_idx((struct bkey *) (i)->d, (i)->keys)
      
      Parameter i can be variable type of data structure, the macro always
      works once the type of struct i has member 'd' and 'keys'.
      
      bset_bkey_last() is also used in macro csum_set() to calculate the
      checksum of a on-disk data structure. When csum_set() is used to
      calculate checksum of on-disk bcache super block, the parameter 'i'
      data type is struct cache_sb_disk. Inside struct cache_sb_disk (also in
      struct cache_sb) the member keys is __u16 type. But bkey_idx() expects
      unsigned int (a 32bit width), so there is problem when sending
      parameters via stack to call bkey_idx().
      
      Sparse tool from Intel 0day kbuild system reports this incompatible
      problem. bkey_idx() is part of user space API, so the simplest fix is
      to cast the (i)->keys to unsigned int type in macro bset_bkey_last().
      Reported-by: default avatarkbuild test robot <lkp@intel.com>
      Signed-off-by: default avatarColy Li <colyli@suse.de>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      7c02b005
    • Coly Li's avatar
      bcache: fix memory corruption in bch_cache_accounting_clear() · 5bebf748
      Coly Li authored
      Commit 83ff9318 ("bcache: not use hard coded memset size in
      bch_cache_accounting_clear()") tries to make the code more easy to
      understand by removing the hard coded number with following change,
      	void bch_cache_accounting_clear(...)
      	{
      		memset(&acc->total.cache_hits,
      			0,
      	-		sizeof(unsigned long) * 7);
      	+		sizeof(struct cache_stats));
      	}
      
      Unfortunately the change was wrong (it also tells us the original code
      was not easy to correctly understand). The hard coded number 7 is used
      because in struct cache_stats,
       15 struct cache_stats {
       16         struct kobject          kobj;
       17
       18         unsigned long cache_hits;
       19         unsigned long cache_misses;
       20         unsigned long cache_bypass_hits;
       21         unsigned long cache_bypass_misses;
       22         unsigned long cache_readaheads;
       23         unsigned long cache_miss_collisions;
       24         unsigned long sectors_bypassed;
       25
       26         unsigned int            rescale;
       27 };
      only members in LINE 18-24 want to be set to 0. It is wrong to use
      'sizeof(struct cache_stats)' to replace 'sizeof(unsigned long) * 7), the
      memory objects behind acc->total is staled by this change.
      
      Сорокин Артем Сергеевич reports that by the following steps, kernel
      panic will be triggered,
      1. Create new set: make-bcache -B /dev/nvme1n1 -C /dev/sda --wipe-bcache
      2. Run in /sys/fs/bcache/<uuid>:
         echo 1 > clear_stats && cat stats_five_minute/cache_bypass_hits
      
      I can reproduce the panic and get following dmesg with KASAN enabled,
      [22613.172742] ==================================================================
      [22613.172862] BUG: KASAN: null-ptr-deref in sysfs_kf_seq_show+0x117/0x230
      [22613.172864] Read of size 8 at addr 0000000000000000 by task cat/6753
      
      [22613.172870] CPU: 1 PID: 6753 Comm: cat Not tainted 5.5.0-rc7-lp151.28.16-default+ #11
      [22613.172872] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/29/2019
      [22613.172873] Call Trace:
      [22613.172964]  dump_stack+0x8b/0xbb
      [22613.172968]  ? sysfs_kf_seq_show+0x117/0x230
      [22613.172970]  ? sysfs_kf_seq_show+0x117/0x230
      [22613.173031]  __kasan_report+0x176/0x192
      [22613.173064]  ? pr_cont_kernfs_name+0x40/0x60
      [22613.173067]  ? sysfs_kf_seq_show+0x117/0x230
      [22613.173070]  kasan_report+0xe/0x20
      [22613.173072]  sysfs_kf_seq_show+0x117/0x230
      [22613.173105]  seq_read+0x199/0x6d0
      [22613.173110]  vfs_read+0xa5/0x1a0
      [22613.173113]  ksys_read+0x110/0x160
      [22613.173115]  ? kernel_write+0xb0/0xb0
      [22613.173177]  do_syscall_64+0x77/0x290
      [22613.173238]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
      [22613.173241] RIP: 0033:0x7fc2c886ac61
      [22613.173244] Code: fe ff ff 48 8d 3d c7 a0 09 00 48 83 ec 08 e8 46 03 02 00 66 0f 1f 44 00 00 8b 05 ca fb 2c 00 48 63 ff 85 c0 75 13 31 c0 0f 05 <48> 3d 00 f0 ff ff 77 57 f3 c3 0f 1f 44 00 00 55 53 48 89 d5 48 89
      [22613.173245] RSP: 002b:00007ffebe776d68 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
      [22613.173248] RAX: ffffffffffffffda RBX: 0000000000020000 RCX: 00007fc2c886ac61
      [22613.173249] RDX: 0000000000020000 RSI: 00007fc2c8cca000 RDI: 0000000000000003
      [22613.173250] RBP: 0000000000020000 R08: ffffffffffffffff R09: 0000000000000000
      [22613.173251] R10: 000000000000038c R11: 0000000000000246 R12: 00007fc2c8cca000
      [22613.173253] R13: 0000000000000003 R14: 00007fc2c8cca00f R15: 0000000000020000
      [22613.173255] ==================================================================
      [22613.173256] Disabling lock debugging due to kernel taint
      [22613.173350] BUG: kernel NULL pointer dereference, address: 0000000000000000
      [22613.178380] #PF: supervisor read access in kernel mode
      [22613.180959] #PF: error_code(0x0000) - not-present page
      [22613.183444] PGD 0 P4D 0
      [22613.184867] Oops: 0000 [#1] SMP KASAN PTI
      [22613.186797] CPU: 1 PID: 6753 Comm: cat Tainted: G    B             5.5.0-rc7-lp151.28.16-default+ #11
      [22613.191253] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/29/2019
      [22613.196706] RIP: 0010:sysfs_kf_seq_show+0x117/0x230
      [22613.199097] Code: ff 48 8b 0b 48 8b 44 24 08 48 01 e9 eb a6 31 f6 48 89 cf ba 00 10 00 00 48 89 4c 24 10 e8 b1 e6 e9 ff 4c 89 ff e8 19 07 ea ff <49> 8b 07 48 85 c0 48 89 44 24 08 0f 84 91 00 00 00 49 8b 6d 00 48
      [22613.208016] RSP: 0018:ffff8881d4f8fd78 EFLAGS: 00010246
      [22613.210448] RAX: 0000000000000000 RBX: ffff8881eb99b180 RCX: ffffffff810d9ef6
      [22613.213691] RDX: 0000000000000001 RSI: 0000000000000246 RDI: 0000000000000246
      [22613.216893] RBP: 0000000000001000 R08: fffffbfff072ddcd R09: fffffbfff072ddcd
      [22613.220075] R10: 0000000000000001 R11: fffffbfff072ddcc R12: ffff8881de5c0200
      [22613.223256] R13: ffff8881ed175500 R14: ffff8881eb99b198 R15: 0000000000000000
      [22613.226290] FS:  00007fc2c8d3d500(0000) GS:ffff8881f2a80000(0000) knlGS:0000000000000000
      [22613.229637] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [22613.231993] CR2: 0000000000000000 CR3: 00000001ec89a004 CR4: 00000000003606e0
      [22613.234909] Call Trace:
      [22613.235931]  seq_read+0x199/0x6d0
      [22613.237259]  vfs_read+0xa5/0x1a0
      [22613.239229]  ksys_read+0x110/0x160
      [22613.240590]  ? kernel_write+0xb0/0xb0
      [22613.242040]  do_syscall_64+0x77/0x290
      [22613.243625]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
      [22613.245450] RIP: 0033:0x7fc2c886ac61
      [22613.246706] Code: fe ff ff 48 8d 3d c7 a0 09 00 48 83 ec 08 e8 46 03 02 00 66 0f 1f 44 00 00 8b 05 ca fb 2c 00 48 63 ff 85 c0 75 13 31 c0 0f 05 <48> 3d 00 f0 ff ff 77 57 f3 c3 0f 1f 44 00 00 55 53 48 89 d5 48 89
      [22613.253296] RSP: 002b:00007ffebe776d68 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
      [22613.255835] RAX: ffffffffffffffda RBX: 0000000000020000 RCX: 00007fc2c886ac61
      [22613.258472] RDX: 0000000000020000 RSI: 00007fc2c8cca000 RDI: 0000000000000003
      [22613.260807] RBP: 0000000000020000 R08: ffffffffffffffff R09: 0000000000000000
      [22613.263188] R10: 000000000000038c R11: 0000000000000246 R12: 00007fc2c8cca000
      [22613.265598] R13: 0000000000000003 R14: 00007fc2c8cca00f R15: 0000000000020000
      [22613.268729] Modules linked in: scsi_transport_iscsi af_packet iscsi_ibft iscsi_boot_sysfs vmw_vsock_vmci_transport vsock fuse bnep kvm_intel kvm irqbypass crc32_pclmul crc32c_intel ghash_clmulni_intel snd_ens1371 snd_ac97_codec ac97_bus bcache snd_pcm btusb btrtl btbcm btintel crc64 aesni_intel glue_helper crypto_simd vmw_balloon cryptd bluetooth snd_timer snd_rawmidi snd joydev pcspkr e1000 rfkill vmw_vmci soundcore ecdh_generic ecc gameport i2c_piix4 mptctl ac button hid_generic usbhid sr_mod cdrom ata_generic ehci_pci vmwgfx uhci_hcd drm_kms_helper syscopyarea serio_raw sysfillrect sysimgblt fb_sys_fops ttm ehci_hcd mptspi scsi_transport_spi mptscsih ata_piix mptbase ahci usbcore libahci drm sg dm_multipath dm_mod scsi_dh_rdac scsi_dh_emc scsi_dh_alua
      [22613.292429] CR2: 0000000000000000
      [22613.293563] ---[ end trace a074b26a8508f378 ]---
      [22613.295138] RIP: 0010:sysfs_kf_seq_show+0x117/0x230
      [22613.296769] Code: ff 48 8b 0b 48 8b 44 24 08 48 01 e9 eb a6 31 f6 48 89 cf ba 00 10 00 00 48 89 4c 24 10 e8 b1 e6 e9 ff 4c 89 ff e8 19 07 ea ff <49> 8b 07 48 85 c0 48 89 44 24 08 0f 84 91 00 00 00 49 8b 6d 00 48
      [22613.303553] RSP: 0018:ffff8881d4f8fd78 EFLAGS: 00010246
      [22613.305280] RAX: 0000000000000000 RBX: ffff8881eb99b180 RCX: ffffffff810d9ef6
      [22613.307924] RDX: 0000000000000001 RSI: 0000000000000246 RDI: 0000000000000246
      [22613.310272] RBP: 0000000000001000 R08: fffffbfff072ddcd R09: fffffbfff072ddcd
      [22613.312685] R10: 0000000000000001 R11: fffffbfff072ddcc R12: ffff8881de5c0200
      [22613.315076] R13: ffff8881ed175500 R14: ffff8881eb99b198 R15: 0000000000000000
      [22613.318116] FS:  00007fc2c8d3d500(0000) GS:ffff8881f2a80000(0000) knlGS:0000000000000000
      [22613.320743] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [22613.322628] CR2: 0000000000000000 CR3: 00000001ec89a004 CR4: 00000000003606e0
      
      Here this patch fixes the following problem by explicity set all the 7
      members to 0 in bch_cache_accounting_clear().
      Reported-by: default avatarСорокин Артем Сергеевич <a.sorokin@bank-hlynov.ru>
      Signed-off-by: default avatarColy Li <colyli@suse.de>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      5bebf748
  6. 30 Jan, 2020 4 commits
  7. 29 Jan, 2020 9 commits
    • Linus Torvalds's avatar
      Merge tag 'docs-5.6' of git://git.lwn.net/linux · 05ef8b97
      Linus Torvalds authored
      Pull documentation updates from Jonathan Corbet:
       "It has been a relatively quiet cycle for documentation, but there's
        still a couple of things of note:
      
         - Conversion of the NFS documentation to RST
      
         - A new document on how to help with documentation (and a maintainer
           profile entry too)
      
        Plus the usual collection of typo fixes, etc"
      
      * tag 'docs-5.6' of git://git.lwn.net/linux: (40 commits)
        docs: filesystems: add overlayfs to index.rst
        docs: usb: remove some broken references
        scripts/find-unused-docs: Fix massive false positives
        docs: nvdimm: use ReST notation for subsection
        zram: correct documentation about sysfs node of huge page writeback
        Documentation: zram: various fixes in zram.rst
        Add a maintainer entry profile for documentation
        Add a document on how to contribute to the documentation
        docs: Keep up with the location of NoUri
        Documentation: Call out example SYM_FUNC_* usage as x86-specific
        Documentation: nfs: fault_injection: convert to ReST
        Documentation: nfs: pnfs-scsi-server: convert to ReST
        Documentation: nfs: convert pnfs-block-server to ReST
        Documentation: nfs: idmapper: convert to ReST
        Documentation: convert nfsd-admin-interfaces to ReST
        Documentation: nfs-rdma: convert to ReST
        Documentation: nfsroot.rst: COSMETIC: refill a paragraph
        Documentation: nfsroot.txt: convert to ReST
        Documentation: convert nfs.txt to ReST
        Documentation: filesystems: convert vfat.txt to RST
        ...
      05ef8b97
    • Linus Torvalds's avatar
      Merge tag 'linux-kselftest-5.6-rc1-kunit' of... · 08a3ef8f
      Linus Torvalds authored
      Merge tag 'linux-kselftest-5.6-rc1-kunit' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
      
      Pull Kselftest kunit updates from Shuah Khan:
       "This kunit update consists of:
      
         - Support for building kunit as a module from Alan Maguire
      
         - AppArmor KUnit tests for policy unpack from Mike Salvatore"
      
      * tag 'linux-kselftest-5.6-rc1-kunit' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
        kunit: building kunit as a module breaks allmodconfig
        kunit: update documentation to describe module-based build
        kunit: allow kunit to be loaded as a module
        kunit: remove timeout dependence on sysctl_hung_task_timeout_seconds
        kunit: allow kunit tests to be loaded as a module
        kunit: hide unexported try-catch interface in try-catch-impl.h
        kunit: move string-stream.h to lib/kunit
        apparmor: add AppArmor KUnit tests for policy unpack
      08a3ef8f
    • Linus Torvalds's avatar
      Merge tag 'linux-kselftest-5.6-rc1' of... · ce7ae9d9
      Linus Torvalds authored
      Merge tag 'linux-kselftest-5.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
      
      Pull Kselftest update from Shuah Khan:
       "This Kselftest update consists of several fixes to framework and
        individual tests.
      
        In addition, it enables LKDTM tests adding lkdtm target to kselftest
        Makefile"
      
      * tag 'linux-kselftest-5.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
        selftests/ftrace: fix glob selftest
        selftests: settings: tests can be in subsubdirs
        kselftest: Minimise dependency of get_size on C library interfaces
        selftests/livepatch: Remove unused local variable in set_ftrace_enabled()
        selftests/livepatch: Replace set_dynamic_debug() with setup_config() in README
        selftests/lkdtm: Add tests for LKDTM targets
        selftests: Uninitialized variable in test_cgcore_proc_migration()
        selftests: fix build behaviour on targets' failures
      ce7ae9d9
    • Linus Torvalds's avatar
      Merge tag 'y2038-drivers-for-v5.6-signed' of... · 22b17db4
      Linus Torvalds authored
      Merge tag 'y2038-drivers-for-v5.6-signed' of git://git.kernel.org:/pub/scm/linux/kernel/git/arnd/playground
      
      Pull y2038 updates from Arnd Bergmann:
       "Core, driver and file system changes
      
        These are updates to device drivers and file systems that for some
        reason or another were not included in the kernel in the previous
        y2038 series.
      
        I've gone through all users of time_t again to make sure the kernel is
        in a long-term maintainable state, replacing all remaining references
        to time_t with safe alternatives.
      
        Some related parts of the series were picked up into the nfsd, xfs,
        alsa and v4l2 trees. A final set of patches in linux-mm removes the
        now unused time_t/timeval/timespec types and helper functions after
        all five branches are merged for linux-5.6, ensuring that no new users
        get merged.
      
        As a result, linux-5.6, or my backport of the patches to 5.4 [1],
        should be the first release that can serve as a base for a 32-bit
        system designed to run beyond year 2038, with a few remaining caveats:
      
         - All user space must be compiled with a 64-bit time_t, which will be
           supported in the coming musl-1.2 and glibc-2.32 releases, along
           with installed kernel headers from linux-5.6 or higher.
      
         - Applications that use the system call interfaces directly need to
           be ported to use the time64 syscalls added in linux-5.1 in place of
           the existing system calls. This impacts most users of futex() and
           seccomp() as well as programming languages that have their own
           runtime environment not based on libc.
      
         - Applications that use a private copy of kernel uapi header files or
           their contents may need to update to the linux-5.6 version, in
           particular for sound/asound.h, xfs/xfs_fs.h, linux/input.h,
           linux/elfcore.h, linux/sockios.h, linux/timex.h and
           linux/can/bcm.h.
      
         - A few remaining interfaces cannot be changed to pass a 64-bit
           time_t in a compatible way, so they must be configured to use
           CLOCK_MONOTONIC times or (with a y2106 problem) unsigned 32-bit
           timestamps. Most importantly this impacts all users of 'struct
           input_event'.
      
         - All y2038 problems that are present on 64-bit machines also apply
           to 32-bit machines. In particular this affects file systems with
           on-disk timestamps using signed 32-bit seconds: ext4 with
           ext3-style small inodes, ext2, xfs (to be fixed soon) and ufs"
      
      [1] https://git.kernel.org/pub/scm/linux/kernel/git/arnd/playground.git/log/?h=y2038-endgame
      
      * tag 'y2038-drivers-for-v5.6-signed' of git://git.kernel.org:/pub/scm/linux/kernel/git/arnd/playground: (21 commits)
        Revert "drm/etnaviv: reject timeouts with tv_nsec >= NSEC_PER_SEC"
        y2038: sh: remove timeval/timespec usage from headers
        y2038: sparc: remove use of struct timex
        y2038: rename itimerval to __kernel_old_itimerval
        y2038: remove obsolete jiffies conversion functions
        nfs: fscache: use timespec64 in inode auxdata
        nfs: fix timstamp debug prints
        nfs: use time64_t internally
        sunrpc: convert to time64_t for expiry
        drm/etnaviv: avoid deprecated timespec
        drm/etnaviv: reject timeouts with tv_nsec >= NSEC_PER_SEC
        drm/msm: avoid using 'timespec'
        hfs/hfsplus: use 64-bit inode timestamps
        hostfs: pass 64-bit timestamps to/from user space
        packet: clarify timestamp overflow
        tsacct: add 64-bit btime field
        acct: stop using get_seconds()
        um: ubd: use 64-bit time_t where possible
        xtensa: ISS: avoid struct timeval
        dlm: use SO_SNDTIMEO_NEW instead of SO_SNDTIMEO_OLD
        ...
      22b17db4
    • Linus Torvalds's avatar
      Merge tag 'printk-for-5.6' of git://git.kernel.org/pub/scm/linux/kernel/git/pmladek/printk · a4fe2b4d
      Linus Torvalds authored
      Pull printk update from Petr Mladek:
       "Prevent replaying log on all consoles"
      
      * tag 'printk-for-5.6' of git://git.kernel.org/pub/scm/linux/kernel/git/pmladek/printk:
        printk: fix exclusive_console replaying
      a4fe2b4d
    • Linus Torvalds's avatar
      Merge tag 'erofs-for-5.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs · 3893c202
      Linus Torvalds authored
      Pull erofs updates from Gao Xiang:
       "A regression fix, several cleanups and (maybe) plus an upcoming new
        mount api convert patch as a part of vfs update are considered
        available for this cycle.
      
        All commits have been in linux-next and tested with no smoke out.
      
        Summary:
      
         - fix an out-of-bound read access introduced in v5.3, which could
           rarely cause data corruption
      
         - various cleanup patches"
      
      * tag 'erofs-for-5.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs:
        erofs: clean up z_erofs_submit_queue()
        erofs: fold in postsubmit_is_all_bypassed()
        erofs: fix out-of-bound read for shifted uncompressed block
        erofs: remove void tagging/untagging of workgroup pointers
        erofs: remove unused tag argument while registering a workgroup
        erofs: remove unused tag argument while finding a workgroup
        erofs: correct indentation of an assigned structure inside a function
      3893c202
    • Linus Torvalds's avatar
      Merge branch 'work.adfs' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs · 53070406
      Linus Torvalds authored
      Pull adfs updates from Al Viro:
       "adfs stuff for this cycle"
      
      * 'work.adfs' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (42 commits)
        fs/adfs: bigdir: Fix an error code in adfs_fplus_read()
        Documentation: update adfs filesystem documentation
        fs/adfs: mostly divorse inode number from indirect disc address
        fs/adfs: super: add support for E and E+ floppy image formats
        fs/adfs: super: extract filesystem block probe
        fs/adfs: dir: remove debug in adfs_dir_update()
        fs/adfs: super: fix inode dropping
        fs/adfs: bigdir: implement directory update support
        fs/adfs: bigdir: calculate and validate directory checkbyte
        fs/adfs: bigdir: directory validation strengthening
        fs/adfs: bigdir: extract directory validation
        fs/adfs: bigdir: factor out directory entry offset calculation
        fs/adfs: newdir: split out directory commit from update
        fs/adfs: newdir: clean up adfs_f_update()
        fs/adfs: newdir: merge adfs_dir_read() into adfs_f_read()
        fs/adfs: newdir: improve directory validation
        fs/adfs: newdir: factor out directory format validation
        fs/adfs: dir: use pointers to access directory head/tails
        fs/adfs: dir: add more efficient iterate() per-format method
        fs/adfs: dir: switch to iterate_shared method
        ...
      53070406
    • Linus Torvalds's avatar
      Merge branch 'work.openat2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs · 6aee4bad
      Linus Torvalds authored
      Pull openat2 support from Al Viro:
       "This is the openat2() series from Aleksa Sarai.
      
        I'm afraid that the rest of namei stuff will have to wait - it got
        zero review the last time I'd posted #work.namei, and there had been a
        leak in the posted series I'd caught only last weekend. I was going to
        repost it on Monday, but the window opened and the odds of getting any
        review during that... Oh, well.
      
        Anyway, openat2 part should be ready; that _did_ get sane amount of
        review and public testing, so here it comes"
      
      From Aleksa's description of the series:
       "For a very long time, extending openat(2) with new features has been
        incredibly frustrating. This stems from the fact that openat(2) is
        possibly the most famous counter-example to the mantra "don't silently
        accept garbage from userspace" -- it doesn't check whether unknown
        flags are present[1].
      
        This means that (generally) the addition of new flags to openat(2) has
        been fraught with backwards-compatibility issues (O_TMPFILE has to be
        defined as __O_TMPFILE|O_DIRECTORY|[O_RDWR or O_WRONLY] to ensure old
        kernels gave errors, since it's insecure to silently ignore the
        flag[2]). All new security-related flags therefore have a tough road
        to being added to openat(2).
      
        Furthermore, the need for some sort of control over VFS's path
        resolution (to avoid malicious paths resulting in inadvertent
        breakouts) has been a very long-standing desire of many userspace
        applications.
      
        This patchset is a revival of Al Viro's old AT_NO_JUMPS[3] patchset
        (which was a variant of David Drysdale's O_BENEATH patchset[4] which
        was a spin-off of the Capsicum project[5]) with a few additions and
        changes made based on the previous discussion within [6] as well as
        others I felt were useful.
      
        In line with the conclusions of the original discussion of
        AT_NO_JUMPS, the flag has been split up into separate flags. However,
        instead of being an openat(2) flag it is provided through a new
        syscall openat2(2) which provides several other improvements to the
        openat(2) interface (see the patch description for more details). The
        following new LOOKUP_* flags are added:
      
        LOOKUP_NO_XDEV:
      
           Blocks all mountpoint crossings (upwards, downwards, or through
           absolute links). Absolute pathnames alone in openat(2) do not
           trigger this. Magic-link traversal which implies a vfsmount jump is
           also blocked (though magic-link jumps on the same vfsmount are
           permitted).
      
        LOOKUP_NO_MAGICLINKS:
      
           Blocks resolution through /proc/$pid/fd-style links. This is done
           by blocking the usage of nd_jump_link() during resolution in a
           filesystem. The term "magic-links" is used to match with the only
           reference to these links in Documentation/, but I'm happy to change
           the name.
      
           It should be noted that this is different to the scope of
           ~LOOKUP_FOLLOW in that it applies to all path components. However,
           you can do openat2(NO_FOLLOW|NO_MAGICLINKS) on a magic-link and it
           will *not* fail (assuming that no parent component was a
           magic-link), and you will have an fd for the magic-link.
      
           In order to correctly detect magic-links, the introduction of a new
           LOOKUP_MAGICLINK_JUMPED state flag was required.
      
        LOOKUP_BENEATH:
      
           Disallows escapes to outside the starting dirfd's
           tree, using techniques such as ".." or absolute links. Absolute
           paths in openat(2) are also disallowed.
      
           Conceptually this flag is to ensure you "stay below" a certain
           point in the filesystem tree -- but this requires some additional
           to protect against various races that would allow escape using
           "..".
      
           Currently LOOKUP_BENEATH implies LOOKUP_NO_MAGICLINKS, because it
           can trivially beam you around the filesystem (breaking the
           protection). In future, there might be similar safety checks done
           as in LOOKUP_IN_ROOT, but that requires more discussion.
      
        In addition, two new flags are added that expand on the above ideas:
      
        LOOKUP_NO_SYMLINKS:
      
           Does what it says on the tin. No symlink resolution is allowed at
           all, including magic-links. Just as with LOOKUP_NO_MAGICLINKS this
           can still be used with NOFOLLOW to open an fd for the symlink as
           long as no parent path had a symlink component.
      
        LOOKUP_IN_ROOT:
      
           This is an extension of LOOKUP_BENEATH that, rather than blocking
           attempts to move past the root, forces all such movements to be
           scoped to the starting point. This provides chroot(2)-like
           protection but without the cost of a chroot(2) for each filesystem
           operation, as well as being safe against race attacks that
           chroot(2) is not.
      
           If a race is detected (as with LOOKUP_BENEATH) then an error is
           generated, and similar to LOOKUP_BENEATH it is not permitted to
           cross magic-links with LOOKUP_IN_ROOT.
      
           The primary need for this is from container runtimes, which
           currently need to do symlink scoping in userspace[7] when opening
           paths in a potentially malicious container.
      
           There is a long list of CVEs that could have bene mitigated by
           having RESOLVE_THIS_ROOT (such as CVE-2017-1002101,
           CVE-2017-1002102, CVE-2018-15664, and CVE-2019-5736, just to name a
           few).
      
        In order to make all of the above more usable, I'm working on
        libpathrs[8] which is a C-friendly library for safe path resolution.
        It features a userspace-emulated backend if the kernel doesn't support
        openat2(2). Hopefully we can get userspace to switch to using it, and
        thus get openat2(2) support for free once it's ready.
      
        Future work would include implementing things like
        RESOLVE_NO_AUTOMOUNT and possibly a RESOLVE_NO_REMOTE (to allow
        programs to be sure they don't hit DoSes though stale NFS handles)"
      
      * 'work.openat2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
        Documentation: path-lookup: include new LOOKUP flags
        selftests: add openat2(2) selftests
        open: introduce openat2(2) syscall
        namei: LOOKUP_{IN_ROOT,BENEATH}: permit limited ".." resolution
        namei: LOOKUP_IN_ROOT: chroot-like scoped resolution
        namei: LOOKUP_BENEATH: O_BENEATH-like scoped resolution
        namei: LOOKUP_NO_XDEV: block mountpoint crossing
        namei: LOOKUP_NO_MAGICLINKS: block magic-link resolution
        namei: LOOKUP_NO_SYMLINKS: block symlink resolution
        namei: allow set_root() to produce errors
        namei: allow nd_jump_link() to produce errors
        nsfs: clean-up ns_get_path() signature to return int
        namei: only return -ECHILD from follow_dotdot_rcu()
      6aee4bad
    • Linus Torvalds's avatar
      Merge branch 'urgent-for-mingo' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu · 15d66324
      Linus Torvalds authored
      Pull RCU warning removal from Paul McKenney:
       "A single commit that fixes an embarrassing bug discussed here:
      
            https://lore.kernel.org/lkml/20200125131425.GB16136@zn.tnic/
      
        which apparently also affects smaller systems"
      
      [ This was sent to Ingo, but since I see the issue on the laptop I use for
        testing during the merge window, I'm doing the pull directly     - Linus ]
      
      * 'urgent-for-mingo' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu:
        rcu: Forgive slow expedited grace periods at boot time
      15d66324