1. 10 Dec, 2019 3 commits
    • io-wq: briefly spin for new work after finishing work · e995d512
      Jens Axboe authored
      To avoid going to sleep only to get woken shortly thereafter, spin
      briefly for new work upon completion of work.
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • io-wq: remove worker->wait waitqueue · 506d95ff
      Jens Axboe authored
      We only have one case of using the waitqueue to wake the worker, the
      rest are using wake_up_process(). Since we can save some cycles by not
      fiddling with the waitqueue in io_wqe_worker(), switch the work activation
      to task wakeup and get rid of the now unused wait_queue_head_t in
      struct io_worker.
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • io_uring: allow unbreakable links · 4e88d6e7
      Jens Axboe authored
      Some commands will invariably end in a failure in the sense that the
      completion result will be less than zero. One such example is timeouts
      that don't have a completion count set, they will always complete with
      -ETIME unless cancelled.
      
      For linked commands, we sever links and fail the rest of the chain if
      the result is less than zero. Since we have commands where we know that
      will happen, add IOSQE_IO_HARDLINK as a stronger link that doesn't sever
      regardless of the completion result. Note that the link will still sever
      if we fail submitting the parent request, hard links are only resilient
      in the presence of completion results for requests that did submit
      correctly.
      
      Cc: stable@vger.kernel.org # v5.4
      Reviewed-by: Pavel Begunkov <asml.silence@gmail.com>
      Reported-by: 李通洲 <carter.li@eoitek.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
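The difference between a regular link and IOSQE_IO_HARDLINK described in the commit above can be illustrated with a small user-space simulation. This is only a sketch under stated assumptions: `struct sim_req` and `run_chain()` are hypothetical names, not kernel or liburing APIs, and submission failures (which still sever hard links) are not modeled.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical request model for this sketch; not a kernel structure. */
struct sim_req {
    int result;      /* completion result the request will produce */
    bool hard;       /* true: hard link to the next request, false: regular link */
    bool ran;        /* set when the request was executed */
    bool cancelled;  /* set when the request was failed because the chain severed */
};

/*
 * Execute the requests in order as one linked chain and return how many ran.
 * A negative result severs a regular link, so the rest of the chain is
 * cancelled. A hard link keeps the chain going regardless of the result.
 */
static size_t run_chain(struct sim_req *reqs, size_t n)
{
    size_t ran = 0;
    bool severed = false;

    for (size_t i = 0; i < n; i++) {
        if (severed) {
            reqs[i].cancelled = true;
            continue;
        }
        reqs[i].ran = true;
        ran++;
        if (reqs[i].result < 0 && !reqs[i].hard)
            severed = true;
    }
    return ran;
}
```

With a timeout that completes with -ETIME as the first request, a regular link cancels the second request, while a hard link lets it run.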
  2. 05 Dec, 2019 5 commits
    • io_uring: fix a typo in a comment · 0b4295b5
      LimingWu authored
      thatn -> than.
      Signed-off-by: Liming Wu <19092205@suning.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • io_uring: hook all linked requests via link_list · 4493233e
      Pavel Begunkov authored
      Links are created by chaining requests through req->list, with the
      exception that the head uses req->link_list (e.g. link_list->list->list).
      Because of that, io_req_link_next() needs complex splicing to advance.
      
      Link them all through link_list. Also, it seems to be simpler and more
      consistent IMHO.
      Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • io_uring: fix error handling in io_queue_link_head · 2e6e1fde
      Pavel Begunkov authored
      In case of an error, io_submit_sqe() drops a request and continues
      without it, even if the request was part of a link. Not only does it
      fail to cancel links, it may also execute the wrong sequence of actions.
      
      Stop consuming sqes, and let the user handle errors.
      Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • io_uring: use hash table for poll command lookups · 78076bb6
      Jens Axboe authored
      We recently changed this from a single list to an rbtree, but for some
      real life workloads, the rbtree slows down the submission/insertion
      case enough so that it's the top cycle consumer on the io_uring side.
      In testing, using a hash table is a more well rounded compromise. It
      is fast for insertion, and as long as it's sized appropriately, it
      works well for the cancellation case as well. Running TAO with a lot
      of network sockets, this removes io_poll_req_insert() from spending
      2% of the CPU cycles.
      Reported-by: Dan Melnic <dmm@fb.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
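The trade-off named in the commit above (cheap insertion, acceptable lookup for cancellation when the table is sized well) can be sketched with a small chained hash table. This is an illustration, not the kernel code: `struct poll_entry`, `struct poll_table`, the table size, and the multiplicative hash constant are all assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define POLL_HASH_BITS 6                      /* assumed size: 64 buckets */
#define POLL_HASH_SIZE (1u << POLL_HASH_BITS)

/* Hypothetical poll request, keyed by the user_data of its submission. */
struct poll_entry {
    uint64_t user_data;
    struct poll_entry *next;
};

struct poll_table {
    struct poll_entry *buckets[POLL_HASH_SIZE];
};

/* Multiplicative hash; the constant is an arbitrary choice for this sketch. */
static unsigned int poll_hash(uint64_t key)
{
    return (unsigned int)((key * 0x9E3779B97F4A7C15ull) >> (64 - POLL_HASH_BITS));
}

/* Insertion is O(1): push the entry onto the head of its bucket. */
static void poll_insert(struct poll_table *t, struct poll_entry *e)
{
    unsigned int b = poll_hash(e->user_data);

    e->next = t->buckets[b];
    t->buckets[b] = e;
}

/* Cancellation walks only one bucket, unlinks the match, and returns it. */
static struct poll_entry *poll_remove(struct poll_table *t, uint64_t user_data)
{
    struct poll_entry **pp = &t->buckets[poll_hash(user_data)];

    while (*pp) {
        struct poll_entry *e = *pp;

        if (e->user_data == user_data) {
            *pp = e->next;
            e->next = NULL;
            return e;
        }
        pp = &e->next;
    }
    return NULL;
}
```

Compared with a tree, insertion does no rebalancing or comparisons; the cost moves to lookup, which stays short as long as buckets do not grow long.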
    • io-wq: clear node->next on list deletion · 08bdcc35
      Jens Axboe authored
      If someone removes a node from a list, and then later adds it back to
      a list, we can have invalid data in ->next. This can cause all sorts
      of issues. One such use case is the IORING_OP_POLL_ADD command, which
      will do just that if we race and get woken twice without any pending
      events. This is a pretty rare case, but can happen under extreme loads.
      Dan reports that he saw the following crash:
      
      BUG: kernel NULL pointer dereference, address: 0000000000000000
      PGD d283ce067 P4D d283ce067 PUD e5ca04067 PMD 0
      Oops: 0002 [#1] SMP
      CPU: 17 PID: 10726 Comm: tao:fast-fiber Kdump: loaded Not tainted 5.2.9-02851-gac7bc042d2d1 #116
      Hardware name: Quanta Twin Lakes MP/Twin Lakes Passive MP, BIOS F09_3A17 05/03/2019
      RIP: 0010:io_wqe_enqueue+0x3e/0xd0
      Code: 34 24 74 55 8b 47 58 48 8d 6f 50 85 c0 74 50 48 89 df e8 35 7c 75 00 48 83 7b 08 00 48 8b 14 24 0f 84 84 00 00 00 48 8b 4b 10 <48> 89 11 48 89 53 10 83 63 20 fe 48 89 c6 48 89 df e8 0c 7a 75 00
      RSP: 0000:ffffc90006858a08 EFLAGS: 00010082
      RAX: 0000000000000002 RBX: ffff889037492fc0 RCX: 0000000000000000
      RDX: ffff888e40cc11a8 RSI: ffff888e40cc11a8 RDI: ffff889037492fc0
      RBP: ffff889037493010 R08: 00000000000000c3 R09: ffffc90006858ab8
      R10: 0000000000000000 R11: 0000000000000000 R12: ffff888e40cc11a8
      R13: 0000000000000000 R14: 00000000000000c3 R15: ffff888e40cc1100
      FS:  00007fcddc9db700(0000) GS:ffff88903fa40000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 0000000000000000 CR3: 0000000e479f5003 CR4: 00000000007606e0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      PKRU: 55555554
      Call Trace:
       <IRQ>
       io_poll_wake+0x12f/0x2a0
       __wake_up_common+0x86/0x120
       __wake_up_common_lock+0x7a/0xc0
       sock_def_readable+0x3c/0x70
       tcp_rcv_established+0x557/0x630
       tcp_v6_do_rcv+0x118/0x3c0
       tcp_v6_rcv+0x97e/0x9d0
       ip6_protocol_deliver_rcu+0xe3/0x440
       ip6_input+0x3d/0xc0
       ? ip6_protocol_deliver_rcu+0x440/0x440
       ipv6_rcv+0x56/0xd0
       ? ip6_rcv_finish_core.isra.18+0x80/0x80
       __netif_receive_skb_one_core+0x50/0x70
       netif_receive_skb_internal+0x2f/0xa0
       napi_gro_receive+0x125/0x150
       mlx5e_handle_rx_cqe+0x1d9/0x5a0
       ? mlx5e_poll_tx_cq+0x305/0x560
       mlx5e_poll_rx_cq+0x49f/0x9c5
       mlx5e_napi_poll+0xee/0x640
       ? smp_reschedule_interrupt+0x16/0xd0
       ? reschedule_interrupt+0xf/0x20
       net_rx_action+0x286/0x3d0
       __do_softirq+0xca/0x297
       irq_exit+0x96/0xa0
       do_IRQ+0x54/0xe0
       common_interrupt+0xf/0xf
       </IRQ>
      RIP: 0033:0x7fdc627a2e3a
      Code: 31 c0 85 d2 0f 88 f6 00 00 00 55 48 89 e5 41 57 41 56 4c 63 f2 41 55 41 54 53 48 83 ec 18 48 85 ff 0f 84 c7 00 00 00 48 8b 07 <41> 89 d4 49 89 f5 48 89 fb 48 85 c0 0f 84 64 01 00 00 48 83 78 10
      
      when running a networked workload with about 5000 sockets being polled
      for. Fix this by clearing node->next when the node is being removed from
      the list.
      
      Fixes: 6206f0e1 ("io-wq: shrink io_wq_work a bit")
      Reported-by: Dan Melnic <dmm@fb.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
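The failure mode described in the commit above can be reproduced in miniature with a singly linked list that tracks its tail. The sketch below is a user-space illustration with hypothetical names (`struct work_node`, `struct work_list`, `list_add_tail()`, `list_del()`); it assumes, as the commit message implies, that appending does not reset the node's own `next` pointer, so removal is responsible for clearing it.

```c
#include <assert.h>
#include <stddef.h>

struct work_node {
    struct work_node *next;
};

struct work_list {
    struct work_node *first;
    struct work_node *last;
};

/* Append a node. Note that node->next is not touched here. */
static void list_add_tail(struct work_list *list, struct work_node *node)
{
    if (!list->first) {
        list->first = node;
        list->last = node;
    } else {
        list->last->next = node;
        list->last = node;
    }
}

/* Remove a node; prev is its predecessor, or NULL if it is the first node. */
static void list_del(struct work_list *list, struct work_node *node,
                     struct work_node *prev)
{
    if (node == list->first)
        list->first = node->next;
    if (node == list->last)
        list->last = prev;
    if (prev)
        prev->next = node->next;
    /* Clear the stale pointer so the node can safely be added again. */
    node->next = NULL;
}

/* Count nodes, giving up after limit steps in case the list is corrupted. */
static size_t list_len(const struct work_list *list, size_t limit)
{
    size_t n = 0;

    for (const struct work_node *p = list->first; p && n < limit; p = p->next)
        n++;
    return n;
}
```

Without the final assignment in `list_del()`, removing the first of two nodes and appending it again would leave it pointing back at the other node, turning the list into a cycle.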
  3. 04 Dec, 2019 5 commits
  4. 03 Dec, 2019 20 commits
  5. 02 Dec, 2019 7 commits
    • io_uring: use current task creds instead of allocating a new one · 0b8c0ec7
      Jens Axboe authored
      syzbot reports:
      
      kasan: CONFIG_KASAN_INLINE enabled
      kasan: GPF could be caused by NULL-ptr deref or user memory access
      general protection fault: 0000 [#1] PREEMPT SMP KASAN
      CPU: 0 PID: 9217 Comm: io_uring-sq Not tainted 5.4.0-syzkaller #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
      Google 01/01/2011
      RIP: 0010:creds_are_invalid kernel/cred.c:792 [inline]
      RIP: 0010:__validate_creds include/linux/cred.h:187 [inline]
      RIP: 0010:override_creds+0x9f/0x170 kernel/cred.c:550
      Code: ac 25 00 81 fb 64 65 73 43 0f 85 a3 37 00 00 e8 17 ab 25 00 49 8d 7c
      24 10 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 <0f> b6 04 02 84
      c0 74 08 3c 03 0f 8e 96 00 00 00 41 8b 5c 24 10 bf
      RSP: 0018:ffff88809c45fda0 EFLAGS: 00010202
      RAX: dffffc0000000000 RBX: 0000000043736564 RCX: ffffffff814f3318
      RDX: 0000000000000002 RSI: ffffffff814f3329 RDI: 0000000000000010
      RBP: ffff88809c45fdb8 R08: ffff8880a3aac240 R09: ffffed1014755849
      R10: ffffed1014755848 R11: ffff8880a3aac247 R12: 0000000000000000
      R13: ffff888098ab1600 R14: 0000000000000000 R15: 0000000000000000
      FS:  0000000000000000(0000) GS:ffff8880ae800000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00007ffd51c40664 CR3: 0000000092641000 CR4: 00000000001406f0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Call Trace:
        io_sq_thread+0x1c7/0xa20 fs/io_uring.c:3274
        kthread+0x361/0x430 kernel/kthread.c:255
        ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:352
      Modules linked in:
      ---[ end trace f2e1a4307fbe2245 ]---
      RIP: 0010:creds_are_invalid kernel/cred.c:792 [inline]
      RIP: 0010:__validate_creds include/linux/cred.h:187 [inline]
      RIP: 0010:override_creds+0x9f/0x170 kernel/cred.c:550
      Code: ac 25 00 81 fb 64 65 73 43 0f 85 a3 37 00 00 e8 17 ab 25 00 49 8d 7c
      24 10 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 <0f> b6 04 02 84
      c0 74 08 3c 03 0f 8e 96 00 00 00 41 8b 5c 24 10 bf
      RSP: 0018:ffff88809c45fda0 EFLAGS: 00010202
      RAX: dffffc0000000000 RBX: 0000000043736564 RCX: ffffffff814f3318
      RDX: 0000000000000002 RSI: ffffffff814f3329 RDI: 0000000000000010
      RBP: ffff88809c45fdb8 R08: ffff8880a3aac240 R09: ffffed1014755849
      R10: ffffed1014755848 R11: ffff8880a3aac247 R12: 0000000000000000
      R13: ffff888098ab1600 R14: 0000000000000000 R15: 0000000000000000
      FS:  0000000000000000(0000) GS:ffff8880ae800000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00007ffd51c40664 CR3: 0000000092641000 CR4: 00000000001406f0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      
      which is caused by slab fault injection triggering a failure in
      prepare_creds(). We don't actually need to create a copy of the creds
      as we're not modifying it, we just need a reference on the current task
      creds. This avoids the failure case as well, and propagates the const
      throughout the stack.
      
      Fixes: 181e448d ("io_uring: async workers should inherit the user creds")
      Reported-by: syzbot+5320383e16029ba057ff@syzkaller.appspotmail.com
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
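The reasoning in the commit above (copying can fail under allocation pressure, while taking a reference cannot) can be sketched with a simple reference-counted object. Everything here is a hypothetical user-space model: `struct sim_cred`, `sim_copy_creds()`, `sim_get_creds()`, and `sim_put_creds()` are illustrative names, and the `inject_failure` flag stands in for slab fault injection.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical credential object with a plain (non-atomic) reference count. */
struct sim_cred {
    int usage;
    unsigned int uid;
};

/*
 * Copying allocates memory, so it has a failure path the caller must handle.
 * inject_failure simulates an allocation failure for this sketch.
 */
static struct sim_cred *sim_copy_creds(const struct sim_cred *old,
                                       bool inject_failure)
{
    struct sim_cred *copy;

    if (inject_failure)
        return NULL;
    copy = malloc(sizeof(*copy));
    if (!copy)
        return NULL;
    *copy = *old;
    copy->usage = 1;
    return copy;
}

/*
 * Taking a reference only bumps the count, so it cannot fail. The result is
 * const because the holder only reads the credentials.
 */
static const struct sim_cred *sim_get_creds(struct sim_cred *cred)
{
    cred->usage++;
    return cred;
}

/* Drop a reference taken with sim_get_creds(). */
static void sim_put_creds(struct sim_cred *cred)
{
    cred->usage--;
}
```

Since the reader never modifies the object, sharing it by reference is sufficient and removes the NULL return that the copy path would otherwise have to check.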
    • Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input · 72c0870e
      Linus Torvalds authored
      Pull input updates from Dmitry Torokhov:
      
       - updates to Ilitech driver to support ILI2117
      
       - face lift of st1232 driver to support MT-B protocol
      
       - a new driver for i.MX system controller keys
      
       - mpr121 driver now supports polling mode
      
       - various input drivers have been switched away from input_polled_dev
         to use polled mode of regular input devices
      
       - other assorted cleanups and fixes
      
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input: (70 commits)
        Input: synaptics-rmi4 - fix various V4L2 compliance problems in F54
        Input: synaptics - switch another X1 Carbon 6 to RMI/SMbus
        Input: fix Kconfig indentation
        Input: imx_sc_key - correct SCU message structure to avoid stack corruption
        Input: ili210x - optionally show calibrate sysfs attribute
        Input: ili210x - add resolution to chip operations structure
        Input: ili210x - do not retrieve/print chip firmware version
        Input: mms114 - use device_get_match_data
        Input: ili210x - remove unneeded suspend and resume handlers
        Input: ili210x - do not unconditionally mark touchscreen as wakeup source
        Input: ili210x - define and use chip operations structure
        Input: ili210x - do not set parent device explicitly
        Input: ili210x - handle errors from input_mt_init_slots()
        Input: ili210x - switch to using threaded IRQ
        Input: ili210x - add ILI2117 support
        dt-bindings: input: touchscreen: ad7879: generic node names in example
        Input: ar1021 - fix typo in preprocessor macro name
        Input: synaptics-rmi4 - simplify data read in rmi_f54_work
        Input: kxtj9 - switch to using polled mode of input devices
        Input: kxtj9 - switch to using managed resources
        ...
    • Merge tag 'libnvdimm-for-5.5' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm · d10032dd
      Linus Torvalds authored
      Pull libnvdimm updates from Dan Williams:
       "The highlight this cycle is continuing integration fixes for PowerPC
        and some resulting optimizations.
      
        Summary:
      
         - Updates to better support vmalloc space restrictions on PowerPC
           platforms.
      
         - Cleanups to move common sysfs attributes to core 'struct
           device_type' objects.
      
         - Export the 'target_node' attribute (the effective numa node if pmem
           is marked online) for regions and namespaces.
      
         - Miscellaneous fixups and optimizations"
      
      * tag 'libnvdimm-for-5.5' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: (21 commits)
        MAINTAINERS: Remove Keith from NVDIMM maintainers
        libnvdimm: Export the target_node attribute for regions and namespaces
        dax: Add numa_node to the default device-dax attributes
        libnvdimm: Simplify root read-only definition for the 'resource' attribute
        dax: Simplify root read-only definition for the 'resource' attribute
        dax: Create a dax device_type
        libnvdimm: Move nvdimm_bus_attribute_group to device_type
        libnvdimm: Move nvdimm_attribute_group to device_type
        libnvdimm: Move nd_mapping_attribute_group to device_type
        libnvdimm: Move nd_region_attribute_group to device_type
        libnvdimm: Move nd_numa_attribute_group to device_type
        libnvdimm: Move nd_device_attribute_group to device_type
        libnvdimm: Move region attribute group definition
        libnvdimm: Move attribute groups to device type
        libnvdimm: Remove prototypes for nonexistent functions
        libnvdimm/btt: fix variable 'rc' set but not used
        libnvdimm/pmem: Delete include of nd-core.h
        libnvdimm/namespace: Differentiate between probe mapping and runtime mapping
        libnvdimm/pfn_dev: Don't clear device memmap area during generic namespace probe
        libnvdimm: Trivial comment fix
        ...
    • Merge tag 'mailbox-v5.5' of git://git.linaro.org/landing-teams/working/fujitsu/integration · 43fd4bd7
      Linus Torvalds authored
      Pull mailbox updates from Jassi Brar:
      
       - omap : misc - catch error returned from pm_runtime_put_sync
      
       - hisi : misc - drop .owner from platform_driver
      
       - stm : change how wakeup is handled
      
       - imx : fix - bailout on error and nuke correct irq
      
       - imx : add support for imx7ulp platform
      
      * tag 'mailbox-v5.5' of git://git.linaro.org/landing-teams/working/fujitsu/integration:
        mailbox: imx: add support for imx v1 mu
        dt-bindings: mailbox: imx-mu: add imx7ulp MU support
        mailbox: imx: Clear the right interrupts at shutdown
        mailbox: imx: Fix Tx doorbell shutdown path
        mailbox: stm32-ipcc: Update wakeup management
        mailbox: no need to set .owner platform_driver_register
        mailbox/omap: Handle if CONFIG_PM is disabled
    • Merge tag 'hwlock-v5.5' of git://git.kernel.org/pub/scm/linux/kernel/git/andersson/remoteproc · 454d9c4a
      Linus Torvalds authored
      Pull hwspinlock updates from Bjorn Andersson:
       "This contains a number of cleanups to the core and several drivers, in
        particular removing the requirement for drivers to implement
        pm_runtime.
      
        It also updates the location of the git tree in MAINTAINERS"
      
      * tag 'hwlock-v5.5' of git://git.kernel.org/pub/scm/linux/kernel/git/andersson/remoteproc:
        hwspinlock: u8500_hsem: Remove redundant PM runtime implementation
        hwspinlock: sprd: Remove redundant PM runtime implementation
        hwspinlock: Let the PM runtime can be optional
        hwspinlock: Remove BUG_ON() from the hwspinlock core
        hwspinlock: sprd: Use devm_hwspin_lock_register() to register hwlock controller
        hwspinlock: sprd: Use devm_add_action_or_reset() for calls to clk_disable_unprepare()
        hwspinlock: sprd: Check the return value of clk_prepare_enable()
        hwspinlock: sprd: Change to use devm_platform_ioremap_resource()
        hwspinlock: u8500_hsem: Use devm_hwspin_lock_register() to register hwlock controller
        hwspinlock: u8500_hsem: Use devm_kzalloc() to allocate memory
        hwspinlock: u8500_hsem: Change to use devm_platform_ioremap_resource()
        MAINTAINERS: hwspinlock: update git tree location
    • Merge tag 'rpmsg-v5.5' of git://git.kernel.org/pub/scm/linux/kernel/git/andersson/remoteproc · 687fcad8
      Linus Torvalds authored
      Pull rpmsg updates from Bjorn Andersson:
       "This contains a number of bug fixes to the GLINK transport driver, an
        off-by-one in the GLINK smem driver and a memory leak fix in the rpmsg
        char driver"
      
      * tag 'rpmsg-v5.5' of git://git.kernel.org/pub/scm/linux/kernel/git/andersson/remoteproc:
        rpmsg: Fix Kconfig indentation
        rpmsg: char: Simplify 'rpmsg_eptdev_release()'
        rpmsg: glink: Free pending deferred work on remove
        rpmsg: glink: Don't send pending rx_done during remove
        rpmsg: glink: Fix rpmsg_register_device err handling
        rpmsg: glink: Put an extra reference during cleanup
        rpmsg: glink: Fix use after free in open_ack TIMEOUT case
        rpmsg: glink: Fix reuse intents memory leak issue
        rpmsg: glink: Set tail pointer to 0 at end of FIFO
        rpmsg: char: release allocated memory
    • Merge tag 'rproc-v5.5' of git://git.kernel.org/pub/scm/linux/kernel/git/andersson/remoteproc · 5e3b06d3
      Linus Torvalds authored
      Pull remoteproc updates from Bjorn Andersson:
       "This adds support for booting the modem processor on Qualcomm MSM8998
        and carries some cleanups and bug fixes to the framework and the
        stm32 driver"
      
      * tag 'rproc-v5.5' of git://git.kernel.org/pub/scm/linux/kernel/git/andersson/remoteproc:
        Revert "dt-bindings: remoteproc: stm32: add wakeup-source"
        remoteproc: stm32: fix probe error case
        remoteproc: stm32: wakeup the system by wdg irq
        dt-bindings: remoteproc: stm32: add wakeup-source
        remoteproc: Fix wrong rvring index computation
        remoteproc: stm32: use workqueue to treat mailbox callback
        remoteproc: fix argument 2 of rproc_mem_entry_init
        remoteproc: qcom_q6v5_mss: Add support for MSM8998
        dt-bindings: remoteproc: qcom: Add Q6v5 Modem PIL binding for MSM8998
        remoteproc: debug: Remove unneeded NULL check
        remoteproc: remove useless typedef