1. 14 Feb, 2020 1 commit
    • io_uring: prune request from overflow list on flush · 2ca10259
      Jens Axboe authored
      Carter reported an issue where he could produce a stall on ring exit,
      when we're cleaning up requests that match the given file table. For
      this particular test case, a combination of a few things caused the
      issue:
      
      - The cq ring was overflown
      - The request being canceled was in the overflow list
      
      The combination of the above means that the cq overflow list holds a
      reference to the request. The request is canceled correctly, but since
      the overflow list holds a reference to it, the final put won't happen.
      Since the final put doesn't happen, the request remains in the inflight.
      Hence we never finish the cancelation flush.
      
      Fix this by removing requests from the overflow list if we're canceling
      them.
      
      Cc: stable@vger.kernel.org # 5.5
      Reported-by: Carter Li 李通洲 <carter.li@eoitek.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      2ca10259
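The stall described in this commit can be illustrated with a small user-space toy model. This is only a sketch: every name in it (`toy_req`, `toy_ring`, `toy_overflow_add`, `toy_overflow_prune`, `toy_cancel`) is hypothetical and is not the kernel's code. It assumes a request leaves the inflight list only on its final put, and that the overflow list takes one reference of its own.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: these names are hypothetical, not the kernel's. */
struct toy_req {
	int refs;              /* reference count */
	bool inflight;         /* still tracked as inflight */
	struct toy_req *next;  /* link in the overflow list */
};

struct toy_ring {
	struct toy_req *overflow; /* singly linked CQ overflow list */
};

/* Drop one reference; only the final put clears the inflight state. */
static void toy_put(struct toy_req *req)
{
	assert(req->refs > 0);
	if (--req->refs == 0)
		req->inflight = false;
}

/* Queue a completion that did not fit in the CQ ring; the list pins the request. */
static void toy_overflow_add(struct toy_ring *ring, struct toy_req *req)
{
	req->refs++;
	req->next = ring->overflow;
	ring->overflow = req;
}

/* Unlink req from the overflow list, if present, and drop the list's reference. */
static bool toy_overflow_prune(struct toy_ring *ring, struct toy_req *req)
{
	struct toy_req **pp;

	for (pp = &ring->overflow; *pp; pp = &(*pp)->next) {
		if (*pp == req) {
			*pp = req->next;
			req->next = NULL;
			toy_put(req);
			return true;
		}
	}
	return false;
}

/* Cancel a request: optionally prune it first, then drop the caller's reference. */
static void toy_cancel(struct toy_ring *ring, struct toy_req *req, bool prune)
{
	if (prune)
		toy_overflow_prune(ring, req);
	toy_put(req);
}
```

In this model, canceling without pruning leaves `refs` at 1 and `inflight` set, which corresponds to the flush that never finishes. Canceling with pruning reaches the final put.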
  2. 13 Feb, 2020 1 commit
    • io-wq: don't call kXalloc_node() with non-online node · 7563439a
      Jens Axboe authored
      Glauber reports a crash on init on a box he has:
      
       RIP: 0010:__alloc_pages_nodemask+0x132/0x340
       Code: 18 01 75 04 41 80 ce 80 89 e8 48 8b 54 24 08 8b 74 24 1c c1 e8 0c 48 8b 3c 24 83 e0 01 88 44 24 20 48 85 d2 0f 85 74 01 00 00 <3b> 77 08 0f 82 6b 01 00 00 48 89 7c 24 10 89 ea 48 8b 07 b9 00 02
       RSP: 0018:ffffb8be4d0b7c28 EFLAGS: 00010246
       RAX: 0000000000000000 RBX: 0000000000000000 RCX: 000000000000e8e8
       RDX: 0000000000000000 RSI: 0000000000000002 RDI: 0000000000002080
       RBP: 0000000000012cc0 R08: 0000000000000000 R09: 0000000000000002
       R10: 0000000000000dc0 R11: ffff995c60400100 R12: 0000000000000000
       R13: 0000000000012cc0 R14: 0000000000000001 R15: ffff995c60db00f0
       FS:  00007f4d115ca900(0000) GS:ffff995c60d80000(0000) knlGS:0000000000000000
       CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
       CR2: 0000000000002088 CR3: 00000017cca66002 CR4: 00000000007606e0
       DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
       DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
       PKRU: 55555554
       Call Trace:
        alloc_slab_page+0x46/0x320
        new_slab+0x9d/0x4e0
        ___slab_alloc+0x507/0x6a0
        ? io_wq_create+0xb4/0x2a0
        __slab_alloc+0x1c/0x30
        kmem_cache_alloc_node_trace+0xa6/0x260
        io_wq_create+0xb4/0x2a0
        io_uring_setup+0x97f/0xaa0
        ? io_remove_personalities+0x30/0x30
        ? io_poll_trigger_evfd+0x30/0x30
        do_syscall_64+0x5b/0x1c0
        entry_SYSCALL_64_after_hwframe+0x44/0xa9
       RIP: 0033:0x7f4d116cb1ed
      
      which is due to the 'wqe' and 'worker' allocation being node affine.
      But it isn't valid to call the node affine allocation if the node isn't
      online.
      
      Setup structures for even offline nodes, as usual, but skip them in
      terms of thread setup to not waste resources. If the node isn't online,
      just alloc memory with NUMA_NO_NODE.
      Reported-by: Glauber Costa <glauber@scylladb.com>
      Tested-by: Glauber Costa <glauber@scylladb.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      7563439a
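The node selection rule from this commit message can be sketched in user-space C. This is an illustration under stated assumptions, not the kernel implementation: `toy_alloc_node` and `toy_nodes_with_threads` are hypothetical helpers, the `online` array stands in for the kernel's node-online check, and `TOY_NUMA_NO_NODE` mirrors the kernel's `NUMA_NO_NODE` value of -1.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_NUMA_NO_NODE (-1) /* same value as the kernel's NUMA_NO_NODE */

/*
 * Hypothetical helper: choose the node id to hand to a node-affine
 * allocator. Offline (or out-of-range) nodes fall back to "no node".
 */
static int toy_alloc_node(int node, const bool *online, int nr_nodes)
{
	if (node < 0 || node >= nr_nodes || !online[node])
		return TOY_NUMA_NO_NODE;
	return node;
}

/*
 * Hypothetical helper: structures exist for every node, but only
 * online nodes get worker threads. Returns how many nodes that is.
 */
static int toy_nodes_with_threads(const bool *online, int nr_nodes)
{
	int count = 0;

	for (int i = 0; i < nr_nodes; i++)
		if (online[i])
			count++;
	return count;
}
```

With nodes 0 and 2 online and node 1 offline, the allocation for node 1 would use `TOY_NUMA_NO_NODE`, and only two nodes would get threads.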
  3. 09 Feb, 2020 4 commits
  4. 08 Feb, 2020 12 commits
  5. 06 Feb, 2020 11 commits
    • io_uring: fix deferred req iovec leak · 1e95081c
      Pavel Begunkov authored
      After defer, a request will be prepared, that includes allocating iovec
      if needed, and then submitted through io_wq_submit_work() but not custom
      handler (e.g. io_rw_async()/io_sendrecv_async()). However, it'll leak
      iovec, as it's in io-wq and the code goes as follows:
      
      io_read() {
      	if (!io_wq_current_is_worker())
      		kfree(iovec);
      }
      
      Put all deallocation logic in io_{read,write,send,recv}(), which will
      leave the memory, if going async with -EAGAIN.
      
      It also fixes a leak after failed io_alloc_async_ctx() in
      io_{recv,send}_msg().
      
      Cc: stable@vger.kernel.org # 5.5
      Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      1e95081c
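The leak pattern and the ownership rule described in this commit message can be modeled in a few lines of user-space C. This is a rough sketch, not the kernel code: `toy_io`, `toy_read_old` and `toy_read_new` are hypothetical names, `malloc`/`free` stand in for the kernel allocator, and the `in_worker` flag stands in for `io_wq_current_is_worker()`.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Toy request state: owns an optional heap-allocated iovec. */
struct toy_io {
	void *iovec;
};

/*
 * Old pattern from the commit message: the buffer is freed only when
 * the caller is not an io-wq worker, so a worker-side call leaks it.
 */
static int toy_read_old(struct toy_io *io, bool in_worker, int ret)
{
	if (!in_worker) {
		free(io->iovec);
		io->iovec = NULL;
	}
	return ret;
}

/*
 * Pattern described by the fix: the handler always frees the buffer,
 * except when it goes async with -EAGAIN and the memory must survive.
 */
static int toy_read_new(struct toy_io *io, int ret)
{
	if (ret != -EAGAIN) {
		free(io->iovec);
		io->iovec = NULL;
	}
	return ret;
}
```

In this model, `toy_read_old()` called from a worker returns with `iovec` still allocated and nobody left to free it, while `toy_read_new()` keeps the buffer only on the `-EAGAIN` path.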
    • io_uring: fix 1-bit bitfields to be unsigned · e1d85334
      Randy Dunlap authored
      Make bitfields of size 1 bit be unsigned (since there is no room
      for the sign bit).
      This clears up the sparse warnings:
      
        CHECK   ../fs/io_uring.c
      ../fs/io_uring.c:207:50: error: dubious one-bit signed bitfield
      ../fs/io_uring.c:208:55: error: dubious one-bit signed bitfield
      ../fs/io_uring.c:209:63: error: dubious one-bit signed bitfield
      ../fs/io_uring.c:210:54: error: dubious one-bit signed bitfield
      ../fs/io_uring.c:211:57: error: dubious one-bit signed bitfield
      
      Found by sight and then verified with sparse.
      
      Fixes: 69b3e546 ("io_uring: change io_ring_ctx bool fields into bit fields")
      Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: io-uring@vger.kernel.org
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      e1d85334
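Why sparse calls a one-bit signed bitfield "dubious" can be seen in a short standalone C example. It is a sketch with hypothetical names (`toy_flags`, `toy_store_signed`, `toy_store_unsigned`). Storing 1 into the signed field is an out-of-range conversion with an implementation-defined result; the comments assume the usual two's complement behavior of gcc and clang.

```c
#include <assert.h>

/* Hypothetical struct contrasting the two one-bit field types. */
struct toy_flags {
	signed int s : 1;   /* the single bit is the sign bit: values 0 and -1 */
	unsigned int u : 1; /* values 0 and 1 */
};

/* Store v into the signed one-bit field and read it back. */
static int toy_store_signed(int v)
{
	struct toy_flags f = { 0, 0 };

	f.s = v; /* 1 does not fit; gcc and clang read it back as -1 */
	return f.s;
}

/* Store v into the unsigned one-bit field and read it back. */
static int toy_store_unsigned(unsigned int v)
{
	struct toy_flags f = { 0, 0 };

	f.u = v; /* keeps the low bit, so 1 reads back as 1 */
	return (int)f.u;
}
```

A check such as `flag == 1` therefore cannot succeed for the signed field, while it behaves as expected for the unsigned one.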
    • io_uring: get rid of delayed mm check · 1cb1edb2
      Pavel Begunkov authored
      Fail fast if can't grab mm, so past that requests always have an mm
      when required. This allows us to remove req->user altogether.
      Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      1cb1edb2
    • Merge tag 'ceph-for-5.6-rc1' of https://github.com/ceph/ceph-client · 4c46bef2
      Linus Torvalds authored
      Pull ceph fixes from Ilya Dryomov:
      
       - a set of patches that fixes various corner cases in mount and umount
         code (Xiubo Li). This has to do with choosing an MDS, distinguishing
         between laggy and down MDSes and parsing the server path.
      
       - inode initialization fixes (Jeff Layton). The one included here
         mostly concerns things like open_by_handle() and there is another one
         that will come through Al.
      
       - copy_file_range() now uses the new copy-from2 op (Luis Henriques).
         The existing copy-from op turned out to be infeasible for generic
         filesystem use; we disable the copy offload if OSDs don't support
         copy-from2.
      
       - a patch to link "rbd" and "block" devices together in sysfs (Hannes
         Reinecke)
      
      ... and a smattering of cleanups from Xiubo, Jeff and Chengguang.
      
      * tag 'ceph-for-5.6-rc1' of https://github.com/ceph/ceph-client: (25 commits)
        rbd: set the 'device' link in sysfs
        ceph: move net/ceph/ceph_fs.c to fs/ceph/util.c
        ceph: print name of xattr in __ceph_{get,set}xattr() douts
        ceph: print r_direct_hash in hex in __choose_mds() dout
        ceph: use copy-from2 op in copy_file_range
        ceph: close holes in structs ceph_mds_session and ceph_mds_request
        rbd: work around -Wuninitialized warning
        ceph: allocate the correct amount of extra bytes for the session features
        ceph: rename get_session and switch to use ceph_get_mds_session
        ceph: remove the extra slashes in the server path
        ceph: add possible_max_rank and make the code more readable
        ceph: print dentry offset in hex and fix xattr_version type
        ceph: only touch the caps which have the subset mask requested
        ceph: don't clear I_NEW until inode metadata is fully populated
        ceph: retry the same mds later after the new session is opened
        ceph: check availability of mds cluster on mount after wait timeout
        ceph: keep the session state until it is released
        ceph: add __send_request helper
        ceph: ensure we have a new cap before continuing in fill_inode
        ceph: drop unused ttl_from parameter from fill_inode
        ...
      4c46bef2
    • Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/gerg/m68knommu · 5b211154
      Linus Torvalds authored
      Pull m68knommu updates from Greg Ungerer:
       "A couple of changes:
      
         - remove old CONFIG options from the m68knommu defconfig files
      
         - fix a warning in the m68k non-MMU get_user() macro"
      
      * 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/gerg/m68knommu:
        m68knommu: fix memcpy() out of bounds warning in get_user()
        m68k: configs: Cleanup old Kconfig IO scheduler options
      5b211154
    • Merge tag 'Smack-for-5.6' of git://github.com/cschaufler/smack-next · 85e55296
      Linus Torvalds authored
      Pull smack fix from Casey Schaufler:
       "One fix for an obscure error found using an old version of ping(1)
        that did not use IPv6 sockets in the documented way"
      
      * tag 'Smack-for-5.6' of git://github.com/cschaufler/smack-next:
        broken ping to ipv6 linklocal addresses on debian buster
      85e55296
    • Merge tag 'xfs-5.6-merge-8' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux · 99be3f60
      Linus Torvalds authored
      Pull moar xfs updates from Darrick Wong:
       "This contains the buffer error code refactoring I mentioned last week,
        now that it has had extra time to complete the full xfs fuzz testing
        suite to make sure there aren't any obvious new bugs"
      
      * tag 'xfs-5.6-merge-8' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
        xfs: fix xfs_buf_ioerror_alert location reporting
        xfs: remove unnecessary null pointer checks from _read_agf callers
        xfs: make xfs_*read_agf return EAGAIN to ALLOC_FLAG_TRYLOCK callers
        xfs: remove the xfs_btree_get_buf[ls] functions
        xfs: make xfs_trans_get_buf return an error code
        xfs: make xfs_trans_get_buf_map return an error code
        xfs: make xfs_buf_read return an error code
        xfs: make xfs_buf_get_uncached return an error code
        xfs: make xfs_buf_get return an error code
        xfs: make xfs_buf_read_map return an error code
        xfs: make xfs_buf_get_map return an error code
        xfs: make xfs_buf_alloc return an error code
      99be3f60
    • Merge tag 'trace-v5.6-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace · e310396b
      Linus Torvalds authored
      Pull tracing updates from Steven Rostedt:
      
       - Added new "bootconfig".
      
         This looks for a file appended to initrd to add boot config options,
         and has been discussed thoroughly at Linux Plumbers.
      
         Very useful for adding kprobes at bootup.
      
         Only enabled if "bootconfig" is on the real kernel command line.
      
       - Created dynamic event creation.
      
         Merges common code between creating synthetic events and kprobe
         events.
      
       - Rename perf "ring_buffer" structure to "perf_buffer"
      
       - Rename ftrace "ring_buffer" structure to "trace_buffer"
      
         Had to rename existing "trace_buffer" to "array_buffer"
      
       - Allow trace_printk() to work within (some) tracing code.
      
       - Sort of tracing configs to be a little better organized
      
       - Fixed bug where ftrace_graph hash was not being protected properly
      
       - Various other small fixes and clean ups
      
      * tag 'trace-v5.6-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (88 commits)
        bootconfig: Show the number of nodes on boot message
        tools/bootconfig: Show the number of bootconfig nodes
        bootconfig: Add more parse error messages
        bootconfig: Use bootconfig instead of boot config
        ftrace: Protect ftrace_graph_hash with ftrace_sync
        ftrace: Add comment to why rcu_dereference_sched() is open coded
        tracing: Annotate ftrace_graph_notrace_hash pointer with __rcu
        tracing: Annotate ftrace_graph_hash pointer with __rcu
        bootconfig: Only load bootconfig if "bootconfig" is on the kernel cmdline
        tracing: Use seq_buf for building dynevent_cmd string
        tracing: Remove useless code in dynevent_arg_pair_add()
        tracing: Remove check_arg() callbacks from dynevent args
        tracing: Consolidate some synth_event_trace code
        tracing: Fix now invalid var_ref_vals assumption in trace action
        tracing: Change trace_boot to use synth_event interface
        tracing: Move tracing selftests to bottom of menu
        tracing: Move mmio tracer config up with the other tracers
        tracing: Move tracing test module configs together
        tracing: Move all function tracing configs together
        tracing: Documentation for in-kernel synthetic event API
        ...
      e310396b
    • Merge tag 'io_uring-5.6-2020-02-05' of git://git.kernel.dk/linux-block · c1ef57a3
      Linus Torvalds authored
      Pull io_uring updates from Jens Axboe:
       "Some later fixes for io_uring:
      
         - Small cleanup series from Pavel
      
         - Belt and suspenders build time check of sqe size and layout
           (Stefan)
      
         - Addition of ->show_fdinfo() on request of Jann Horn, to aid in
           understanding mapped personalities
      
         - eventfd recursion/deadlock fix, for both io_uring and aio
      
         - Fixup for send/recv handling
      
         - Fixup for double deferral of read/write request
      
         - Fix for potential double completion event for close request
      
         - Adjust fadvise advice async/inline behavior
      
         - Fix for shutdown hang with SQPOLL thread
      
         - Fix for potential use-after-free of fixed file table"
      
      * tag 'io_uring-5.6-2020-02-05' of git://git.kernel.dk/linux-block:
        io_uring: cleanup fixed file data table references
        io_uring: spin for sq thread to idle on shutdown
        aio: prevent potential eventfd recursion on poll
        io_uring: put the flag changing code in the same spot
        io_uring: iterate req cache backwards
        io_uring: punt even fadvise() WILLNEED to async context
        io_uring: fix sporadic double CQE entry for close
        io_uring: remove extra ->file check
        io_uring: don't map read/write iovec potentially twice
        io_uring: use the proper helpers for io_send/recv
        io_uring: prevent potential eventfd recursion on poll
        eventfd: track eventfd_signal() recursion depth
        io_uring: add BUILD_BUG_ON() to assert the layout of struct io_uring_sqe
        io_uring: add ->show_fdinfo() for the io_uring file descriptor
      c1ef57a3
    • Merge tag 'block-5.6-2020-02-05' of git://git.kernel.dk/linux-block · ed535f2c
      Linus Torvalds authored
      Pull more block updates from Jens Axboe:
       "Some later arrivals, but all fixes at this point:
      
         - bcache fix series (Coly)
      
         - Series of BFQ fixes (Paolo)
      
         - NVMe pull request from Keith with a few minor NVMe fixes
      
         - Various little tweaks"
      
      * tag 'block-5.6-2020-02-05' of git://git.kernel.dk/linux-block: (23 commits)
        nvmet: update AEN list and array at one place
        nvmet: Fix controller use after free
        nvmet: Fix error print message at nvmet_install_queue function
        brd: check and limit max_part par
        nvme-pci: remove nvmeq->tags
        nvmet: fix dsm failure when payload does not match sgl descriptor
        nvmet: Pass lockdep expression to RCU lists
        block, bfq: clarify the goal of bfq_split_bfqq()
        block, bfq: get a ref to a group when adding it to a service tree
        block, bfq: remove ifdefs from around gets/puts of bfq groups
        block, bfq: extend incomplete name of field on_st
        block, bfq: get extra ref to prevent a queue from being freed during a group move
        block, bfq: do not insert oom queue into position tree
        block, bfq: do not plug I/O for bfq_queues with no proc refs
        bcache: check return value of prio_read()
        bcache: fix incorrect data type usage in btree_flush_write()
        bcache: add readahead cache policy options via sysfs interface
        bcache: explicity type cast in bset_bkey_last()
        bcache: fix memory corruption in bch_cache_accounting_clear()
        xen/blkfront: limit allocated memory size to actual use case
        ...
      ed535f2c
    • Merge tag 'libata-5.6-2020-02-05' of git://git.kernel.dk/linux-block · 03840663
      Linus Torvalds authored
      Pull libata updates from Jens Axboe:
      
       - Add a Sandisk CF card to supported pata_pcmcia list (Christian)
      
       - Move pata_arasan_cf away from legacy API (Peter)
      
       - Ensure ahci DMA/ints are shut down on shutdown (Prabhakar)
      
      * tag 'libata-5.6-2020-02-05' of git://git.kernel.dk/linux-block:
        ata: pata_arasan_cf: Use dma_request_chan() instead dma_request_slave_channel()
        ata: ahci: Add shutdown to freeze hardware resources of ahci
        pata_pcmia: add SanDisk High (>8G) CF card to supported list
      03840663
  6. 05 Feb, 2020 11 commits