1. 07 Oct, 2024 2 commits
    • vhost/scsi: null-ptr-dereference in vhost_scsi_get_req() · 221af82f
      Haoran Zhang authored
      Since commit 3f8ca2e1 ("vhost/scsi: Extract common handling code
      from control queue handler") a null pointer dereference bug can be
      triggered when a guest sends a SCSI AN request.
      
      In vhost_scsi_ctl_handle_vq(), `vc.target` is assigned with
      `&v_req.tmf.lun[1]` within a switch-case block and is then passed to
      vhost_scsi_get_req() which extracts `vc->req` and `tpg`. However, for
      a `VIRTIO_SCSI_T_AN_*` request, tpg is not required, so `vc.target` is
      set to NULL in this branch. Later, in vhost_scsi_get_req(),
      `vc->target` is dereferenced without being checked, leading to a null
      pointer dereference bug. This bug can be triggered from the guest.
      
      When this bug occurs, the vhost_worker process is killed while holding
      `vq->mutex` and the corresponding tpg will remain occupied
      indefinitely.
      
      Below is the KASAN report:
      Oops: general protection fault, probably for non-canonical address
      0xdffffc0000000000: 0000 [#1] PREEMPT SMP KASAN NOPTI
      KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007]
      CPU: 1 PID: 840 Comm: poc Not tainted 6.10.0+ #1
      Hardware name: QEMU Ubuntu 24.04 PC (i440FX + PIIX, 1996), BIOS
      1.16.3-debian-1.16.3-2 04/01/2014
      RIP: 0010:vhost_scsi_get_req+0x165/0x3a0
      Code: 00 fc ff df 48 89 fa 48 c1 ea 03 80 3c 02 00 0f 85 2b 02 00 00
      48 b8 00 00 00 00 00 fc ff df 4d 8b 65 30 4c 89 e2 48 c1 ea 03 <0f> b6
      04 02 4c 89 e2 83 e2 07 38 d0 7f 08 84 c0 0f 85 be 01 00 00
      RSP: 0018:ffff888017affb50 EFLAGS: 00010246
      RAX: dffffc0000000000 RBX: ffff88801b000000 RCX: 0000000000000000
      RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff888017affcb8
      RBP: ffff888017affb80 R08: 0000000000000000 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
      R13: ffff888017affc88 R14: ffff888017affd1c R15: ffff888017993000
      FS:  000055556e076500(0000) GS:ffff88806b100000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00000000200027c0 CR3: 0000000010ed0004 CR4: 0000000000370ef0
      Call Trace:
       <TASK>
       ? show_regs+0x86/0xa0
       ? die_addr+0x4b/0xd0
       ? exc_general_protection+0x163/0x260
       ? asm_exc_general_protection+0x27/0x30
       ? vhost_scsi_get_req+0x165/0x3a0
       vhost_scsi_ctl_handle_vq+0x2a4/0xca0
       ? __pfx_vhost_scsi_ctl_handle_vq+0x10/0x10
       ? __switch_to+0x721/0xeb0
       ? __schedule+0xda5/0x5710
       ? __kasan_check_write+0x14/0x30
       ? _raw_spin_lock+0x82/0xf0
       vhost_scsi_ctl_handle_kick+0x52/0x90
       vhost_run_work_list+0x134/0x1b0
       vhost_task_fn+0x121/0x350
      ...
       </TASK>
      ---[ end trace 0000000000000000 ]---
      
      Let's add a check in vhost_scsi_get_req.
      
      Fixes: 3f8ca2e1 ("vhost/scsi: Extract common handling code from control queue handler")
      Signed-off-by: Haoran Zhang <wh1sper@zju.edu.cn>
      [whitespace fixes]
      Signed-off-by: Mike Christie <michael.christie@oracle.com>
      Message-Id: <b26d7ddd-b098-4361-88f8-17ca7f90adf7@oracle.com>
      Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
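
The check described in the vhost/scsi commit above can be sketched in user-space C. This is a minimal model, not the kernel patch: `ctx_sketch`, `get_req_sketch`, and the integer `tpg_table` are hypothetical stand-ins for the real context structure, vhost_scsi_get_req(), and the tpg lookup. It only illustrates the idea that a NULL target means "no tpg needed" and must not be dereferenced.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for the request context. */
struct ctx_sketch {
    const unsigned char *target; /* NULL for requests that need no tpg */
    const void *req;             /* unused here, kept for shape only */
};

/*
 * Look up the tpg for a request, if the request names a target.
 * Returns 0 on success and -EINVAL for an out-of-range target index.
 * *tpg_out is written only when a target was supplied.
 */
int get_req_sketch(const struct ctx_sketch *vc,
                   const int *tpg_table, size_t ntpg,
                   const int **tpg_out)
{
    /* AN-style request: no target, so skip the lookup entirely. */
    if (!vc->target)
        return 0;

    if (*vc->target >= ntpg)
        return -EINVAL;

    if (tpg_out)
        *tpg_out = &tpg_table[*vc->target];
    return 0;
}
```

A caller that passes a context with `target == NULL` gets 0 back and its `tpg_out` pointer is left untouched, which mirrors the statement that tpg is not required for `VIRTIO_SCSI_T_AN_*` requests.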
    • vsock/virtio: use GFP_ATOMIC under RCU read lock · a194c985
      Michael S. Tsirkin authored
      virtio_transport_send_pkt is now called on the transport fast path,
      under the RCU read lock. In that case, we have a bug: virtio_add_sgs
      is called with GFP_KERNEL, and might sleep.
      
      Pass the gfp flags as an argument, and use GFP_ATOMIC on
      the fast path.
      
      Link: https://lore.kernel.org/all/hfcr2aget2zojmqpr4uhlzvnep4vgskblx5b6xf2ddosbsrke7@nt34bxgp7j2x
      Fixes: efcd71af ("vsock/virtio: avoid queuing packets when intermediate queue is empty")
      Reported-by: Christian Brauner <brauner@kernel.org>
      Cc: Stefano Garzarella <sgarzare@redhat.com>
      Cc: Luigi Leonardi <luigi.leonardi@outlook.com>
      Message-ID: <3fbfb6e871f625f89eb578c7228e127437b1975a.1727876449.git.mst@redhat.com>
      Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
      Reviewed-by: Pankaj Gupta <pankaj.gupta@amd.com>
      Reviewed-by: Christian Brauner <brauner@kernel.org>
      Reviewed-by: Luigi Leonardi <luigi.leonardi@outlook.com>
      Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
  2. 25 Sep, 2024 27 commits
  3. 10 Sep, 2024 10 commits
    • vdpa/mlx5: Add the support of set mac address · 6d17035a
      Cindy Lu authored
      Add the function to support setting the MAC address.
      For vdpa/mlx5, the function will use mlx5_mpfs_add_mac
      to set the MAC address.
      
      Tested on a ConnectX-6 Dx device.
      Signed-off-by: Cindy Lu <lulu@redhat.com>
      Message-Id: <20240731031653.1047692-4-lulu@redhat.com>
      Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
      Acked-by: Jason Wang <jasowang@redhat.com>
    • vdpa_sim_net: Add the support of set mac address · 218bb7ec
      Cindy Lu authored
      Add the function to support setting the MAC address.
      For vdpa_sim_net, the driver will write the MAC address
      to the config space, and other devices can implement
      their own functions to support this.
      Signed-off-by: Cindy Lu <lulu@redhat.com>
      Message-Id: <20240731031653.1047692-3-lulu@redhat.com>
      Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
      Acked-by: Jason Wang <jasowang@redhat.com>
    • vdpa: support set mac address from vdpa tool · 2f87e9cf
      Cindy Lu authored
      Add a new UAPI to support setting the MAC address from the vdpa tool.
      The function vdpa_nl_cmd_dev_attr_set_doit() gets the
      new MAC address from the vdpa tool and then sets it on the device.
      
      The usage is: vdpa dev set name vdpa_name mac **:**:**:**:**:**
      
      Here is an example:
      root@L1# vdpa -jp dev config show vdpa0
      {
          "config": {
              "vdpa0": {
                  "mac": "82:4d:e9:5d:d7:e6",
                  "link ": "up",
                  "link_announce ": false,
                  "mtu": 1500
              }
          }
      }
      
      root@L1# vdpa dev set name vdpa0 mac 00:11:22:33:44:55
      
      root@L1# vdpa -jp dev config show vdpa0
      {
          "config": {
              "vdpa0": {
                  "mac": "00:11:22:33:44:55",
                  "link ": "up",
                  "link_announce ": false,
                  "mtu": 1500
              }
          }
      }
      Signed-off-by: Cindy Lu <lulu@redhat.com>
      Message-Id: <20240731031653.1047692-2-lulu@redhat.com>
      Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
      Acked-by: Jason Wang <jasowang@redhat.com>
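
The `mac **:**:**:**:**:**` argument in the usage line above is a colon-separated string of six hex bytes. As a rough illustration of turning such a string into the six raw bytes a device would store, here is a small user-space sketch. `parse_mac_sketch` is a hypothetical helper, not the parser used by the vdpa tool or the kernel, and it is deliberately lenient (it accepts one-digit groups).

```c
#include <stdint.h>
#include <stdio.h>

/*
 * Parse "xx:xx:xx:xx:xx:xx" into six bytes.
 * Returns 0 on success and -1 on malformed input; mac[] is written
 * only on success.
 */
int parse_mac_sketch(const char *s, uint8_t mac[6])
{
    unsigned int b[6];
    char extra;
    int i;

    /*
     * Exactly six hex groups must match. The trailing %c catches
     * leftover characters: if it matches, sscanf returns 7 and the
     * input is rejected.
     */
    if (sscanf(s, "%2x:%2x:%2x:%2x:%2x:%2x%c",
               &b[0], &b[1], &b[2], &b[3], &b[4], &b[5], &extra) != 6)
        return -1;

    for (i = 0; i < 6; i++)
        mac[i] = (uint8_t)b[i];
    return 0;
}
```

Calling `parse_mac_sketch("00:11:22:33:44:55", mac)` fills `mac` with the bytes 0x00, 0x11, 0x22, 0x33, 0x44, 0x55, matching the address set in the example session.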
    • tools/virtio:Fix the wrong format specifier · a8927f69
      Zhu Jun authored
      The unsigned int should use "%u" instead of "%d".
      Signed-off-by: Zhu Jun <zhujun2@cmss.chinamobile.com>
      Message-Id: <20240724074108.9530-1-zhujun2@cmss.chinamobile.com>
      Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
      Reviewed-by: Eugenio Pérez <eperezma@redhat.com>
      Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
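
The specifier matters because `%d` expects a signed `int`. Passing an `unsigned int` whose value does not fit in `int` is undefined behaviour in C, and on common platforms it prints as a negative number. A minimal sketch of the matching form follows; `format_count_sketch` is a hypothetical helper, not code from tools/virtio.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Format an unsigned counter with the matching "%u" specifier.
 * Returns the number of characters snprintf would have written.
 */
int format_count_sketch(char *buf, size_t len, unsigned int count)
{
    return snprintf(buf, len, "%u", count);
}
```

With a 32-bit `unsigned int`, a value such as 3000000000 is above `INT_MAX` and is still printed in full by `%u`.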
    • virtio_balloon: introduce memory scan/reclaim info · 74c025c5
      zhenwei pi authored
      Expose memory scan/reclaim information to the host side via virtio
      balloon device.
      
      Now we have a metric to analyze the memory performance:
      
      y: counter increases
      n: counter does not change
      h: the rate of counter change is high
      l: the rate of counter change is low
      
      OOM: VIRTIO_BALLOON_S_OOM_KILL
      STALL: VIRTIO_BALLOON_S_ALLOC_STALL
      ASCAN: VIRTIO_BALLOON_S_SCAN_ASYNC
      DSCAN: VIRTIO_BALLOON_S_SCAN_DIRECT
      ARCLM: VIRTIO_BALLOON_S_RECLAIM_ASYNC
      DRCLM: VIRTIO_BALLOON_S_RECLAIM_DIRECT
      
      - OOM[y], STALL[*], ASCAN[*], DSCAN[*], ARCLM[*], DRCLM[*]:
        the guest runs under really critical memory pressure
      
      - OOM[n], STALL[h], ASCAN[*], DSCAN[l], ARCLM[*], DRCLM[l]:
        the memory allocation stalls due to cgroup, not the global memory
        pressure.
      
      - OOM[n], STALL[h], ASCAN[*], DSCAN[h], ARCLM[*], DRCLM[h]:
        the memory allocation stalls due to global memory pressure. The
        performance gets hurt a lot. A high ratio between DRCLM/DSCAN shows
        quite effective memory reclaiming.
      
      - OOM[n], STALL[h], ASCAN[*], DSCAN[h], ARCLM[*], DRCLM[l]:
        the memory allocation stalls due to global memory pressure.
        The ratio between DRCLM/DSCAN gets low and the guest OS is thrashing
        heavily; serious cases lead to poor performance and difficult
        troubleshooting. For example, sshd may block on memory allocation when
        accepting new connections, so a user can't log in to a VM over ssh.
      
      - OOM[n], STALL[n], ASCAN[h], DSCAN[n], ARCLM[l], DRCLM[n]:
        the low ratio between ARCLM/ASCAN shows that the guest tries to
        reclaim more memory, but it can't. Once more memory is required in
        future, it will struggle to reclaim memory.
      Acked-by: David Hildenbrand <david@redhat.com>
      Signed-off-by: zhenwei pi <pizhenwei@bytedance.com>
      Message-Id: <20240423034109.1552866-5-pizhenwei@bytedance.com>
      Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
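
The five patterns listed in the commit above read like a small decision table, and a host-side monitor could encode them roughly as follows. This is only a sketch: the enum names, `balloon_rates`, and `classify_pressure_sketch` are hypothetical, and how a monitor decides that a counter's rate is "high" or "low" is left out because the commit does not define thresholds.

```c
/* Rate of change of a counter: n (no change), l (low), h (high). */
enum rate { RATE_NONE, RATE_LOW, RATE_HIGH };

struct balloon_rates {
    int oom_increased;       /* OOM[y] when non-zero */
    enum rate stall, ascan, dscan, arclm, drclm;
};

enum pressure {
    PRESSURE_UNKNOWN,           /* none of the listed patterns */
    PRESSURE_CRITICAL,          /* OOM[y] */
    PRESSURE_CGROUP_STALL,      /* STALL[h], DSCAN[l], DRCLM[l] */
    PRESSURE_GLOBAL_EFFECTIVE,  /* STALL[h], DSCAN[h], DRCLM[h] */
    PRESSURE_GLOBAL_THRASHING,  /* STALL[h], DSCAN[h], DRCLM[l] */
    PRESSURE_ASYNC_INEFFECTIVE  /* STALL[n], ASCAN[h], DSCAN[n], ARCLM[l], DRCLM[n] */
};

/* Map one sample of counter rates onto the patterns from the list. */
enum pressure classify_pressure_sketch(const struct balloon_rates *r)
{
    /* OOM kills dominate: every other counter is a wildcard. */
    if (r->oom_increased)
        return PRESSURE_CRITICAL;

    if (r->stall == RATE_HIGH) {
        if (r->dscan == RATE_LOW && r->drclm == RATE_LOW)
            return PRESSURE_CGROUP_STALL;
        if (r->dscan == RATE_HIGH && r->drclm == RATE_HIGH)
            return PRESSURE_GLOBAL_EFFECTIVE;
        if (r->dscan == RATE_HIGH && r->drclm == RATE_LOW)
            return PRESSURE_GLOBAL_THRASHING;
    }

    if (r->stall == RATE_NONE && r->ascan == RATE_HIGH &&
        r->dscan == RATE_NONE && r->arclm == RATE_LOW &&
        r->drclm == RATE_NONE)
        return PRESSURE_ASYNC_INEFFECTIVE;

    return PRESSURE_UNKNOWN;
}
```

Counters marked `[*]` in the list are simply not consulted in the corresponding branch, and combinations the list does not mention fall through to `PRESSURE_UNKNOWN`.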
    • virtio_balloon: introduce memory allocation stall counter · c5b70a26
      zhenwei pi authored
      The memory allocation stall counter represents the performance/latency of
      memory allocation. Expose this counter to the host side through the virtio
      balloon device in an out-of-band way.
      Acked-by: David Hildenbrand <david@redhat.com>
      Signed-off-by: zhenwei pi <pizhenwei@bytedance.com>
      Message-Id: <20240423034109.1552866-4-pizhenwei@bytedance.com>
      Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
    • virtio_balloon: introduce oom-kill invocations · 6cf1c97d
      zhenwei pi authored
      When the guest OS runs under critical memory pressure, the guest
      starts to kill processes. A guest monitor agent may scan 'oom_kill'
      from /proc/vmstat, and report the OOM KILL event. However, the agent
      may be killed and we will lose this critical event (and the later
      events).
      
      For now we can also grep for magic words in guest kernel log from host
      side. Rather than this unstable way, virtio balloon reports OOM-KILL
      invocations instead.
      Acked-by: David Hildenbrand <david@redhat.com>
      Signed-off-by: zhenwei pi <pizhenwei@bytedance.com>
      Message-Id: <20240423034109.1552866-3-pizhenwei@bytedance.com>
      Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
    • virtio_pmem: Check device status before requesting flush · e25fbcd9
      Philip Chen authored
      If a pmem device is in a bad status, the driver side could wait for
      host ack forever in virtio_pmem_flush(), causing the system to hang.
      
      So add a status check in the beginning of virtio_pmem_flush() to return
      early if the device is not activated.
      Signed-off-by: Philip Chen <philipchen@chromium.org>
      Message-Id: <20240826215313.2673566-1-philipchen@chromium.org>
      Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
      Acked-by: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
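
The early-return pattern described in the virtio_pmem commit above can be modelled in a few lines of user-space C. Everything here is an assumption made for illustration: `pmem_dev_sketch`, `pmem_flush_sketch`, and the choice of a "needs reset" status bit (0x40, the value the virtio specification assigns to DEVICE_NEEDS_RESET) are not taken from the patch, which may test the device status differently.

```c
#include <errno.h>

/* Assumed "device is in a bad state" bit; see the note above. */
#define STATUS_NEEDS_RESET_SKETCH 0x40u

/* Hypothetical stand-in for the pmem device state. */
struct pmem_dev_sketch {
    unsigned int status;   /* device status bits */
    int flushes_sent;      /* flush requests actually queued */
};

/*
 * Refuse to queue a flush when the device reports a bad status, so
 * the caller never waits on an acknowledgement that will not come.
 * Returns 0 when the flush was queued and -EIO otherwise.
 */
int pmem_flush_sketch(struct pmem_dev_sketch *dev)
{
    if (dev->status & STATUS_NEEDS_RESET_SKETCH)
        return -EIO;

    dev->flushes_sent++;   /* stands in for queueing the request */
    return 0;
}
```

The point of checking first is that the failure becomes an immediate error code rather than an unbounded wait inside the flush path.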
    • vhost_vdpa: assign irq bypass producer token correctly · 02e9e936
      Jason Wang authored
      We used to call irq_bypass_unregister_producer() in
      vhost_vdpa_setup_vq_irq() which is problematic as we don't know if the
      token pointer is still valid or not.
      
      Actually, we use the eventfd_ctx as the token so the life cycle of the
      token should be bound to the VHOST_SET_VRING_CALL instead of
      vhost_vdpa_setup_vq_irq() which could be called by set_status().
      
      Fix this by setting up the irq bypass producer's token when handling
      VHOST_SET_VRING_CALL and un-registering the producer before calling
      vhost_vring_ioctl() to prevent a possible use after free as eventfd
      could have been released in vhost_vring_ioctl(). And such registering
      and unregistering will only be done if DRIVER_OK is set.
      Reported-by: Dragos Tatulea <dtatulea@nvidia.com>
      Tested-by: Dragos Tatulea <dtatulea@nvidia.com>
      Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
      Fixes: 2cf1ba9a ("vhost_vdpa: implement IRQ offloading in vhost_vdpa")
      Signed-off-by: Jason Wang <jasowang@redhat.com>
      Message-Id: <20240816031900.18013-1-jasowang@redhat.com>
      Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
    • vdpa/mlx5: Fix invalid mr resource destroy · dc125029
      Dragos Tatulea authored
      Certain error paths from mlx5_vdpa_dev_add() can end up releasing mr
      resources which never got initialized in the first place.
      
      This patch adds the missing check in mlx5_vdpa_destroy_mr_resources()
      to block releasing non-initialized mr resources.
      
      Reference trace:
      
        mlx5_core 0000:08:00.2: mlx5_vdpa_dev_add:3274:(pid 2700) warning: No mac address provisioned?
        BUG: kernel NULL pointer dereference, address: 0000000000000000
        #PF: supervisor read access in kernel mode
        #PF: error_code(0x0000) - not-present page
        PGD 140216067 P4D 0
        Oops: 0000 [#1] PREEMPT SMP NOPTI
        CPU: 8 PID: 2700 Comm: vdpa Kdump: loaded Not tainted 5.14.0-496.el9.x86_64 #1
        Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
        RIP: 0010:vhost_iotlb_del_range+0xf/0xe0 [vhost_iotlb]
        Code: [...]
        RSP: 0018:ff1c823ac23077f0 EFLAGS: 00010246
        RAX: ffffffffc1a21a60 RBX: ffffffff899567a0 RCX: 0000000000000000
        RDX: ffffffffffffffff RSI: 0000000000000000 RDI: 0000000000000000
        RBP: ff1bda1f7c21e800 R08: 0000000000000000 R09: ff1c823ac2307670
        R10: ff1c823ac2307668 R11: ffffffff8a9e7b68 R12: 0000000000000000
        R13: 0000000000000000 R14: ff1bda1f43e341a0 R15: 00000000ffffffea
        FS:  00007f56eba7c740(0000) GS:ff1bda269f800000(0000) knlGS:0000000000000000
        CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        CR2: 0000000000000000 CR3: 0000000104d90001 CR4: 0000000000771ef0
        DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
        DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
        PKRU: 55555554
        Call Trace:
      
         ? show_trace_log_lvl+0x1c4/0x2df
         ? show_trace_log_lvl+0x1c4/0x2df
         ? mlx5_vdpa_free+0x3d/0x150 [mlx5_vdpa]
         ? __die_body.cold+0x8/0xd
         ? page_fault_oops+0x134/0x170
         ? __irq_work_queue_local+0x2b/0xc0
         ? irq_work_queue+0x2c/0x50
         ? exc_page_fault+0x62/0x150
         ? asm_exc_page_fault+0x22/0x30
         ? __pfx_mlx5_vdpa_free+0x10/0x10 [mlx5_vdpa]
         ? vhost_iotlb_del_range+0xf/0xe0 [vhost_iotlb]
         mlx5_vdpa_free+0x3d/0x150 [mlx5_vdpa]
         vdpa_release_dev+0x1e/0x50 [vdpa]
         device_release+0x31/0x90
         kobject_cleanup+0x37/0x130
         mlx5_vdpa_dev_add+0x2d2/0x7a0 [mlx5_vdpa]
         vdpa_nl_cmd_dev_add_set_doit+0x277/0x4c0 [vdpa]
         genl_family_rcv_msg_doit+0xd9/0x130
         genl_family_rcv_msg+0x14d/0x220
         ? __pfx_vdpa_nl_cmd_dev_add_set_doit+0x10/0x10 [vdpa]
         ? _copy_to_user+0x1a/0x30
         ? move_addr_to_user+0x4b/0xe0
         genl_rcv_msg+0x47/0xa0
         ? __import_iovec+0x46/0x150
         ? __pfx_genl_rcv_msg+0x10/0x10
         netlink_rcv_skb+0x54/0x100
         genl_rcv+0x24/0x40
         netlink_unicast+0x245/0x370
         netlink_sendmsg+0x206/0x440
         __sys_sendto+0x1dc/0x1f0
         ? do_read_fault+0x10c/0x1d0
         ? do_pte_missing+0x10d/0x190
         __x64_sys_sendto+0x20/0x30
         do_syscall_64+0x5c/0xf0
         ? __count_memcg_events+0x4f/0xb0
         ? mm_account_fault+0x6c/0x100
         ? handle_mm_fault+0x116/0x270
         ? do_user_addr_fault+0x1d6/0x6a0
         ? do_syscall_64+0x6b/0xf0
         ? clear_bhb_loop+0x25/0x80
         ? clear_bhb_loop+0x25/0x80
         ? clear_bhb_loop+0x25/0x80
         ? clear_bhb_loop+0x25/0x80
         ? clear_bhb_loop+0x25/0x80
         entry_SYSCALL_64_after_hwframe+0x78/0x80
      
      Fixes: 512c0cdd ("vdpa/mlx5: Decouple cvq iotlb handling from hw mapping code")
      Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
      Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com>
      Message-Id: <20240827160808.2448017-2-dtatulea@nvidia.com>
      Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
      Reviewed-by: Si-Wei Liu <si-wei.liu@oracle.com>
      Acked-by: Jason Wang <jasowang@redhat.com>
      Reviewed-by: Shannon Nelson <shannon.nelson@amd.com>
  4. 01 Sep, 2024 1 commit