1. 22 May, 2024 14 commits
    • virtio_balloon: Treat stats requests as wakeup events · c578123e
      David Stevens authored
      Treat stats requests as wakeup events to ensure that the driver responds
      to device requests in a timely manner.
      Signed-off-by: David Stevens <stevensd@chromium.org>
      Acked-by: David Hildenbrand <david@redhat.com>
      Message-Id: <20240321012445.1593685-3-stevensd@google.com>
      Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
    • virtio_balloon: Give the balloon its own wakeup source · 810d831b
      David Stevens authored
      Wakeup sources don't support nesting multiple events, so sharing a
      single object between multiple drivers can result in one driver
      overriding the wakeup event processing period specified by another
      driver. Have the virtio balloon driver use the wakeup source of the
      device it is bound to rather than the wakeup source of the parent
      device, to avoid conflicts with the transport layer.
      
      Note that although the virtio balloon's virtio_device itself isn't what
      actually wakes up the device, it is responsible for processing wakeup
      events. In the same way that EPOLLWAKEUP uses a dedicated wakeup_source
      to prevent suspend when userspace is processing wakeup events, a
      dedicated wakeup_source is necessary when processing wakeup events in a
      higher layer in the kernel.
      
      Fixes: b12fbc3f ("virtio_balloon: stay awake while adjusting balloon")
      Signed-off-by: David Stevens <stevensd@chromium.org>
      Acked-by: David Hildenbrand <david@redhat.com>
      Message-Id: <20240321012445.1593685-2-stevensd@google.com>
      Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
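
The nesting problem described in the commit above can be illustrated with a toy userspace model. This is a sketch only, not kernel code: `WakeupSource` and `can_suspend` are hypothetical names, and the model assumes a wakeup source is a single active/inactive flag with no reference count.

```python
class WakeupSource:
    """Toy wakeup source: one active flag, no nesting counter (assumption)."""

    def __init__(self, name):
        self.name = name
        self.active = False

    def stay_awake(self):
        # Activating an already-active source does not nest.
        self.active = True

    def relax(self):
        # A single relax ends the event, no matter how many users started one.
        self.active = False


def can_suspend(sources):
    """Suspend may proceed only when no wakeup source is active."""
    return not any(s.active for s in sources)


# Shared source: the balloon's relax() also ends the transport's event.
shared = WakeupSource("parent")
shared.stay_awake()   # transport layer starts processing
shared.stay_awake()   # balloon starts processing
shared.relax()        # balloon finishes first
shared_result = can_suspend([shared])

# Dedicated sources: each layer controls only its own event.
transport = WakeupSource("parent")
balloon = WakeupSource("virtio-balloon")
transport.stay_awake()
balloon.stay_awake()
balloon.relax()
dedicated_result = can_suspend([transport, balloon])

print(shared_result, dedicated_result)
```

In the shared case the model allows suspend even though the transport layer has not finished, which is the kind of conflict the commit avoids by giving the balloon its own source.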
    • virtio-mem: support suspend+resume · e4544c55
      David Hildenbrand authored
      With virtio-mem, primarily hibernation is problematic: as the machine shuts
      down, the virtio-mem device loses its state. Powering the machine back up
      is like losing a bunch of DIMMs. While there would be ways to add limited
      support, suspend+resume is more commonly used for VMs and "easier" to
      support cleanly.
      
      s2idle can be supported without any device dependencies. Similarly, one
      would expect suspend-to-ram (i.e., S3) to work out of the box. However,
      QEMU currently unplugs all device memory when resuming the VM, using a
      cold reset on the "wakeup" path. In order to support S3, we need a feature
      flag for the device to tell us if memory remains plugged when waking up. In
      the future, QEMU will implement this feature.
      
      So let's always support s2idle and support S3 with plugged memory only if
      the device indicates support. Block hibernation early using the PM
      notifier.
      
      Trying to hibernate now fails early:
      	# echo disk > /sys/power/state
      	[   26.455369] PM: hibernation: hibernation entry
      	[   26.458271] virtio_mem virtio0: hibernation is not supported.
      	[   26.462498] PM: hibernation: hibernation exit
      	-bash: echo: write error: Operation not permitted
      
      s2idle works even without the new feature bit:
      	# echo s2idle > /sys/power/mem_sleep
      	# echo mem > /sys/power/state
      	[   52.083725] PM: suspend entry (s2idle)
      	[   52.095950] Filesystems sync: 0.010 seconds
      	[   52.101493] Freezing user space processes
      	[   52.104213] Freezing user space processes completed (elapsed 0.001 seconds)
      	[   52.106520] OOM killer disabled.
      	[   52.107655] Freezing remaining freezable tasks
      	[   52.110880] Freezing remaining freezable tasks completed (elapsed 0.001 seconds)
      	[   52.113296] printk: Suspending console(s) (use no_console_suspend to debug)
      
      S3 does not work without the feature bit when memory is plugged:
      	# echo deep > /sys/power/mem_sleep
      	# echo mem > /sys/power/state
      	[   32.788281] PM: suspend entry (deep)
      	[   32.816630] Filesystems sync: 0.027 seconds
      	[   32.820029] Freezing user space processes
      	[   32.823870] Freezing user space processes completed (elapsed 0.001 seconds)
      	[   32.827756] OOM killer disabled.
      	[   32.829608] Freezing remaining freezable tasks
      	[   32.833842] Freezing remaining freezable tasks completed (elapsed 0.001 seconds)
      	[   32.837953] printk: Suspending console(s) (use no_console_suspend to debug)
      	[   32.916172] virtio_mem virtio0: suspend+resume with plugged memory is not supported
      	[   32.916181] virtio-pci 0000:00:02.0: PM: pci_pm_suspend(): virtio_pci_freeze+0x0/0x50 returns -1
      	[   32.916197] virtio-pci 0000:00:02.0: PM: dpm_run_callback(): pci_pm_suspend+0x0/0x170 returns -1
      	[   32.916210] virtio-pci 0000:00:02.0: PM: failed to suspend async: error -1
      
      But S3 works with the new feature bit when memory is plugged (patched
      QEMU):
      	# echo deep > /sys/power/mem_sleep
      	# echo mem > /sys/power/state
      	[   33.983694] PM: suspend entry (deep)
      	[   34.009828] Filesystems sync: 0.024 seconds
      	[   34.013589] Freezing user space processes
      	[   34.016722] Freezing user space processes completed (elapsed 0.001 seconds)
      	[   34.019092] OOM killer disabled.
      	[   34.020291] Freezing remaining freezable tasks
      	[   34.023549] Freezing remaining freezable tasks completed (elapsed 0.001 seconds)
      	[   34.026090] printk: Suspending console(s) (use no_console_suspend to debug)
      
      Cc: "Michael S. Tsirkin" <mst@redhat.com>
      Cc: Jason Wang <jasowang@redhat.com>
      Cc: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
      Signed-off-by: David Hildenbrand <david@redhat.com>
      Message-Id: <20240318120645.105664-1-david@redhat.com>
      Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
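
The policy this commit describes can be summarised as a small decision function. The following is a hedged userspace sketch, not the driver's code: `SleepState`, `transition_allowed`, and both boolean parameters are hypothetical names, and allowing S3 when no memory is plugged is inferred from the logs above rather than stated outright.

```python
from enum import Enum


class SleepState(Enum):
    S2IDLE = "s2idle"      # echo s2idle > /sys/power/mem_sleep
    S3 = "deep"            # echo deep > /sys/power/mem_sleep
    HIBERNATION = "disk"   # echo disk > /sys/power/state


def transition_allowed(state, memory_plugged, device_keeps_memory):
    """Return True if the modelled driver would let the transition proceed.

    device_keeps_memory stands in for the new feature flag that tells the
    driver memory remains plugged across wakeup (hypothetical parameter).
    """
    if state is SleepState.HIBERNATION:
        return False                 # always blocked early
    if state is SleepState.S2IDLE:
        return True                  # no device dependencies
    # S3: only a problem when memory is plugged and the device would drop it.
    return (not memory_plugged) or device_keeps_memory


for state in SleepState:
    print(state.value, transition_allowed(state, True, False))
```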
    • kernel: Remove signal hacks for vhost_tasks · 240a1853
      Mike Christie authored
      This removes the signal/coredump hacks added for vhost_tasks in:
      
      Commit f9010dbd ("fork, vhost: Use CLONE_THREAD to fix freezer/ps regression")
      
      When that patch was added vhost_tasks did not handle SIGKILL and would
      try to ignore/clear the signal and continue on until the device's close
      function was called. In the previous patches vhost_tasks and the vhost
      drivers were converted to support SIGKILL by cleaning themselves up and
      exiting. The hacks are no longer needed so this removes them.
      Signed-off-by: Mike Christie <michael.christie@oracle.com>
      Message-Id: <20240316004707.45557-10-michael.christie@oracle.com>
      Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
    • vhost_task: Handle SIGKILL by flushing work and exiting · db5247d9
      Mike Christie authored
      Instead of lingering until the device is closed, this has us handle
      SIGKILL by:
      
      1. marking the worker as killed so we no longer try to use it with
         new virtqueues and new flush operations.
      2. clearing the virtqueue-to-worker mapping so no new works are queued.
      3. running all the existing works.
      Suggested-by: Edward Adam Davis <eadavis@qq.com>
      Reported-and-tested-by: syzbot+98edc2df894917b3431f@syzkaller.appspotmail.com
      Message-Id: <tencent_546DA49414E876EEBECF2C78D26D242EE50A@qq.com>
      Signed-off-by: Mike Christie <michael.christie@oracle.com>
      Message-Id: <20240316004707.45557-9-michael.christie@oracle.com>
      Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
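
The three SIGKILL steps listed in this commit can be sketched as a toy userspace model. Nothing here is the vhost implementation: `Worker`, `Virtqueue`, `queue_work`, and `handle_sigkill` are hypothetical names, works are plain callables, and locking is omitted entirely.

```python
from collections import deque


class Worker:
    def __init__(self):
        self.killed = False
        self.work_list = deque()


class Virtqueue:
    def __init__(self, worker):
        self.worker = worker


def queue_work(vq, work):
    """Queue work on the virtqueue's worker; return False if there is none."""
    worker = vq.worker
    if worker is None or worker.killed:
        return False
    worker.work_list.append(work)
    return True


def handle_sigkill(worker, virtqueues):
    # Step 1: mark the worker as killed so it is not reused.
    worker.killed = True
    # Step 2: detach it from every virtqueue so no new work is queued.
    for vq in virtqueues:
        if vq.worker is worker:
            vq.worker = None
    # Step 3: run whatever was already queued, then the worker can exit.
    while worker.work_list:
        worker.work_list.popleft()()


results = []
worker = Worker()
vq = Virtqueue(worker)
queue_work(vq, lambda: results.append("flush"))
handle_sigkill(worker, [vq])
late = queue_work(vq, lambda: results.append("late"))
print(results, late)
```

Once the worker is gone, `queue_work` reports failure, which is the case the vhost-scsi commits in this series handle by freeing the event, command, or TMF resources instead of sending a response.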
    • vhost: Release worker mutex during flushes · ba704ff4
      Mike Christie authored
      In the next patches where the worker can be killed while in use, we
      need to be able to take the worker mutex and kill queued works for
      new IO and flushes, and set some new flags to prevent new
      __vhost_vq_attach_worker calls from swapping in/out killed workers.
      
      If we are holding the worker mutex during a flush and the flush's work
      is still in the queue, the worker code that will handle the SIGKILL
      cleanup won't be able to take the mutex and perform its cleanup. So
      this patch has us drop the worker mutex while waiting for the flush
      to complete.
      Signed-off-by: Mike Christie <michael.christie@oracle.com>
      Message-Id: <20240316004707.45557-8-michael.christie@oracle.com>
      Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
    • vhost: Use virtqueue mutex for swapping worker · 34cf9ba5
      Mike Christie authored
      __vhost_vq_attach_worker uses the vhost_dev mutex to serialize the
      swapping of a virtqueue's worker. This was done for simplicity because
      we are already holding that mutex.
      
      In the next patches where the worker can be killed while in use, we need
      finer grained locking because some drivers will hold the vhost_dev mutex
      while flushing. However in the SIGKILL handler in the next patches, we
      will need to be able to swap workers (set current one to NULL), kill
      queued works and stop new flushes while flushes are in progress.
      
      To prepare us, this has us use the virtqueue mutex for swapping workers
      instead of the vhost_dev one.
      Signed-off-by: Mike Christie <michael.christie@oracle.com>
      Message-Id: <20240316004707.45557-7-michael.christie@oracle.com>
      Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
    • vhost_scsi: Handle vhost_vq_work_queue failures for TMFs · 0352c961
      Mike Christie authored
      vhost_vq_work_queue will never fail when queueing the TMF's response
      handling because a guest can only send us TMFs when the device is fully
      setup so there is always a worker at that time. In the next patches we
      will modify the worker code so it handles SIGKILL by exiting before
      outstanding commands/TMFs have sent their responses. In that case
      vhost_vq_work_queue can fail when we try to send a response.
      
      This has us just free the TMF's resources since at this time the guest
      won't be able to get a response even if we could send it.
      Signed-off-by: Mike Christie <michael.christie@oracle.com>
      Message-Id: <20240316004707.45557-6-michael.christie@oracle.com>
      Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
    • vhost: Remove vhost_vq_flush · d9e59eec
      Mike Christie authored
      vhost_vq_flush is no longer used so remove it.
      Signed-off-by: Mike Christie <michael.christie@oracle.com>
      Message-Id: <20240316004707.45557-5-michael.christie@oracle.com>
      Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
    • vhost-scsi: Use system wq to flush dev for TMFs · 59b701b9
      Mike Christie authored
      We flush all the workers that are not also used by the ctl vq to make
      sure that responses queued by LIO before the TMF response are sent
      before the TMF response. This requires a special vhost_vq_flush
      function which, in the next patches where we handle SIGKILL killing
      workers while in use, will require extra locking/complexity. To avoid
      that, this patch has us flush the entire device from the system work
      queue, then queue up sending the response from there.
      
      This is a little less optimal since we now flush all workers but this
      will be ok since commands have already timed out and perf is not a
      concern.
      Signed-off-by: Mike Christie <michael.christie@oracle.com>
      Message-Id: <20240316004707.45557-4-michael.christie@oracle.com>
      Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
    • vhost-scsi: Handle vhost_vq_work_queue failures for cmds · 1eceddee
      Mike Christie authored
      In the next patches we will support the vhost_task being killed while in
      use. The problem for vhost-scsi is that we can't free some structs until
      we get responses for commands we have submitted to the target layer and
      we currently process the responses from the vhost_task.
      
      This has us just drop the responses and free the command's resources. When
      all commands have completed then operations like flush will be woken up
      and we can complete device release and endpoint cleanup.
      Signed-off-by: Mike Christie <michael.christie@oracle.com>
      Message-Id: <20240316004707.45557-3-michael.christie@oracle.com>
      Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
    • vhost-scsi: Handle vhost_vq_work_queue failures for events · b1b2ce58
      Mike Christie authored
      Currently, we can try to queue an event's work before the vhost_task is
      created. When this happens we just drop it in vhost_scsi_do_plug before
      even calling vhost_vq_work_queue. During a device shutdown we do the
      same thing after vhost_scsi_clear_endpoint has cleared the backends.
      
      In the next patches we will be able to kill the vhost_task before we
      have cleared the endpoint. In that case, vhost_vq_work_queue can fail
      and we will leak the event's memory. This has us handle the failure by
      just freeing the event. This is safe to do, because
      vhost_vq_work_queue will only return failure for us when the vhost_task
      is killed and so userspace will not be able to handle events if we
      sent them.
      Signed-off-by: Mike Christie <michael.christie@oracle.com>
      Message-Id: <20240316004707.45557-2-michael.christie@oracle.com>
      Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
    • vdpa: Convert sprintf/snprintf to sysfs_emit · 7b1b5c7f
      Li Zhijian authored
      Per filesystems/sysfs.rst, show() should only use sysfs_emit()
      or sysfs_emit_at() when formatting the value to be returned to user space.
      
      coccinelle complains that there are still a couple of functions that use
      snprintf(). Convert them to sysfs_emit().
      
      sprintf() calls, where present, are converted as well.
      
      Generally, this patch is generated by
      make coccicheck M=<path/to/file> MODE=patch \
      COCCI=scripts/coccinelle/api/device_attr_show.cocci
      
      No functional change intended
      
      CC: "Michael S. Tsirkin" <mst@redhat.com>
      CC: Jason Wang <jasowang@redhat.com>
      CC: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
      CC: virtualization@lists.linux.dev
      Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
      Message-Id: <20240314095853.1326111-1-lizhijian@fujitsu.com>
      Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
    • vp_vdpa: Fix return value check vp_vdpa_request_irq · f181a373
      Yuxue Liu authored
      In the vp_vdpa_set_status function, when setting the device status to
      VIRTIO_CONFIG_S_DRIVER_OK, the vp_vdpa_request_irq function may fail.
      In such cases, the device status should not be set to DRIVER_OK. Add
      a warning to alert the user to the failure.
      Signed-off-by: Yuxue Liu <yuxue.liu@jaguarmicro.com>
      Message-Id: <20240325105448.235-1-gavin.liu@jaguarmicro.com>
      Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
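
The check described in this commit, refusing to report DRIVER_OK when the IRQ request fails, can be modelled in a few lines. This is a hedged userspace sketch rather than the vp_vdpa code: `set_status`, its parameters, and the `warn` callback are hypothetical, and requesting the IRQ only on the transition into DRIVER_OK is an assumption of the model.

```python
VIRTIO_CONFIG_S_DRIVER_OK = 4  # status bit value from the virtio specification


def set_status(current, requested, request_irq, warn=print):
    """Return the status that ends up being written to the device (toy model)."""
    entering_driver_ok = (
        requested & VIRTIO_CONFIG_S_DRIVER_OK
        and not current & VIRTIO_CONFIG_S_DRIVER_OK
    )
    if entering_driver_ok:
        err = request_irq()
        if err:
            # Report the failure and leave the previous status in place.
            warn(f"request_irq failed with {err}; DRIVER_OK not set")
            return current
    return requested


ok_status = set_status(3, 7, lambda: 0)
failed_status = set_status(3, 7, lambda: -16)
print(ok_status, failed_status)
```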
  2. 01 May, 2024 2 commits
  3. 30 Apr, 2024 5 commits
  4. 29 Apr, 2024 11 commits
  5. 28 Apr, 2024 8 commits