vdpa/mlx5: Parallelize device suspend
Currently device suspend works on vqs serially. Building up on previous changes that converted vq operations to the async api, this patch parallelizes the device suspend: 1) Suspend all active vqs parallel. 2) Query suspended vqs in parallel. For 1 vDPA device x 32 VQs (16 VQPs) attached to a large VM (256 GB RAM, 32 CPUs x 2 threads per core), the device suspend time is reduced from ~37 ms to ~13 ms. A later patch will remove the link unregister operation which will make it even faster. Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Acked-by: Eugenio Pérez <eperezma@redhat.com> Message-Id: <20240816090159.1967650-7-dtatulea@nvidia.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Tested-by: Lei Yang <leiyang@redhat.com>
Showing
Please register or sign in to comment