    wifi: ath12k: fix kernel crash during resume · 303c0178
    Baochen Qiang authored
    Currently, QMI target memory is not handled properly during resume, which
    results in a kernel crash when DMA remap is not supported:
    
    BUG: Bad page state in process kworker/u16:54  pfn:36e80
    page: refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x36e80
    page dumped because: nonzero _refcount
    Call Trace:
     bad_page
     free_page_is_bad_report
     __free_pages_ok
     __free_pages
     dma_direct_free
     dma_free_attrs
     ath12k_qmi_free_target_mem_chunk
     ath12k_qmi_msg_mem_request_cb
    
    The reason is:
    Once the ath12k module is loaded, firmware sends a memory request to the
    host. When DMA remap is not supported, ath12k refuses the first request
    because allocation fails at the large segment size:
    
    ath12k_pci 0000:04:00.0: qmi firmware request memory request
    ath12k_pci 0000:04:00.0: qmi mem seg type 1 size 7077888
    ath12k_pci 0000:04:00.0: qmi mem seg type 4 size 8454144
    ath12k_pci 0000:04:00.0: qmi dma allocation failed (7077888 B type 1), will try later with small size
    ath12k_pci 0000:04:00.0: qmi delays mem_request 2
    ath12k_pci 0000:04:00.0: qmi firmware request memory request
    
    Later, firmware comes back with more but smaller segments, and allocation
    succeeds:
    
    ath12k_pci 0000:04:00.0: qmi mem seg type 1 size 524288
    ath12k_pci 0000:04:00.0: qmi mem seg type 1 size 524288
    ath12k_pci 0000:04:00.0: qmi mem seg type 1 size 524288
    ath12k_pci 0000:04:00.0: qmi mem seg type 1 size 524288
    ath12k_pci 0000:04:00.0: qmi mem seg type 1 size 524288
    ath12k_pci 0000:04:00.0: qmi mem seg type 1 size 524288
    ath12k_pci 0000:04:00.0: qmi mem seg type 1 size 524288
    ath12k_pci 0000:04:00.0: qmi mem seg type 1 size 262144
    ath12k_pci 0000:04:00.0: qmi mem seg type 1 size 524288
    ath12k_pci 0000:04:00.0: qmi mem seg type 1 size 524288
    ath12k_pci 0000:04:00.0: qmi mem seg type 1 size 524288
    ath12k_pci 0000:04:00.0: qmi mem seg type 1 size 524288
    ath12k_pci 0000:04:00.0: qmi mem seg type 1 size 524288
    ath12k_pci 0000:04:00.0: qmi mem seg type 4 size 524288
    ath12k_pci 0000:04:00.0: qmi mem seg type 4 size 524288
    ath12k_pci 0000:04:00.0: qmi mem seg type 4 size 524288
    ath12k_pci 0000:04:00.0: qmi mem seg type 4 size 524288
    ath12k_pci 0000:04:00.0: qmi mem seg type 4 size 524288
    ath12k_pci 0000:04:00.0: qmi mem seg type 4 size 524288
    ath12k_pci 0000:04:00.0: qmi mem seg type 4 size 524288
    ath12k_pci 0000:04:00.0: qmi mem seg type 4 size 524288
    ath12k_pci 0000:04:00.0: qmi mem seg type 4 size 524288
    ath12k_pci 0000:04:00.0: qmi mem seg type 4 size 524288
    ath12k_pci 0000:04:00.0: qmi mem seg type 4 size 524288
    ath12k_pci 0000:04:00.0: qmi mem seg type 4 size 524288
    ath12k_pci 0000:04:00.0: qmi mem seg type 4 size 524288
    ath12k_pci 0000:04:00.0: qmi mem seg type 4 size 524288
    ath12k_pci 0000:04:00.0: qmi mem seg type 4 size 524288
    ath12k_pci 0000:04:00.0: qmi mem seg type 4 size 524288
    ath12k_pci 0000:04:00.0: qmi mem seg type 4 size 65536
    ath12k_pci 0000:04:00.0: qmi mem seg type 1 size 524288
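    
    As a quick arithmetic check on the log above, the small segments add up to
    exactly the two sizes that were first requested. The helper below is a
    hypothetical user-space sketch, not driver code; its name and parameters
    are assumptions made for illustration only.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper (not part of ath12k): total bytes covered by
 * `count` segments of `seg_size` bytes plus one shorter tail segment.
 */
static size_t total_bytes(size_t seg_size, size_t count, size_t tail)
{
	return seg_size * count + tail;
}
```

    In the log, type 1 has 13 segments of 524288 bytes plus one of 262144, and
    type 4 has 16 segments of 524288 bytes plus one of 65536, which gives
    7077888 and 8454144 respectively.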
    
    Now ath12k is working. If suspend is triggered, firmware is reloaded
    during resume. Just as before, firmware first requests two large segments.
    In ath12k_qmi_msg_mem_request_cb() the segment count and sizes are
    assigned:
    
    	ab->qmi.mem_seg_count == 2
    	ab->qmi.target_mem[0].size == 7077888
    	ab->qmi.target_mem[1].size == 8454144
    
    Then allocation fails as before, and ath12k_qmi_free_target_mem_chunk()
    is called to free all allocated segments. Note that the first segment is
    skipped because its v.addr was cleared by the failed allocation:
    
    	chunk->v.addr = dma_alloc_coherent()
    
    Also note that this leaks the first segment: its old buffer is never freed
    because the only pointer to it has been overwritten.
    
    While freeing the second segment, a size of 8454144 is passed to
    dma_free_coherent(). However, this segment was allocated when firmware was
    first loaded, before suspend, so its real size is 524288, much smaller than
    8454144. As a result, the kernel detects that memory still in use is being
    freed, and crashes.
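    
    The state described above can be modelled in user space. This is only a
    sketch under stated assumptions: struct buggy_seg, its real_size field and
    count_bad_frees() are hypothetical names that do not exist in the driver,
    and real_size is tracked purely so the model can detect the mismatch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of one segment as seen by the free path. */
struct buggy_seg {
	void *vaddr;       /* NULL means "nothing to free" */
	size_t size;       /* overwritten by every firmware request */
	size_t real_size;  /* model-only: size the buffer was allocated with */
};

/*
 * Count the segments that a free loop would release with a size that
 * differs from the size they were really allocated with.
 */
static int count_bad_frees(const struct buggy_seg *segs, int count)
{
	int bad = 0;

	for (int i = 0; i < count; i++) {
		if (!segs[i].vaddr)
			continue;	/* skipped, like a cleared address */
		if (segs[i].size != segs[i].real_size)
			bad++;
	}
	return bad;
}
```

    With the first segment's address cleared and the second one still pointing
    at a 524288-byte buffer while its size reads 8454144, the model reports one
    bad free, which corresponds to the crashing call.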
    
    So one possible fix would be to free those segments during suspend. This
    works because with them freed, ath12k_qmi_free_target_mem_chunk() does
    nothing: all segment addresses are NULL so dma_free_coherent() is not called.
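    
    A minimal user-space sketch of that alternative, which is not the approach
    this patch takes: struct seg, release_on_suspend() and free_all() are
    hypothetical names, and malloc()/free() stand in for the DMA allocator.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical model of one segment. */
struct seg {
	void *vaddr;
	size_t size;
};

/* Release every segment at suspend, while the sizes are still correct. */
static void release_on_suspend(struct seg *segs, int count)
{
	for (int i = 0; i < count; i++) {
		free(segs[i].vaddr);
		segs[i].vaddr = NULL;
		segs[i].size = 0;
	}
}

/* Later free path: returns how many buffers it actually released. */
static int free_all(struct seg *segs, int count)
{
	int freed = 0;

	for (int i = 0; i < count; i++) {
		if (!segs[i].vaddr)
			continue;	/* nothing allocated, nothing to free */
		free(segs[i].vaddr);
		segs[i].vaddr = NULL;
		freed++;
	}
	return freed;
}
```

    After release_on_suspend() has run, free_all() finds only NULL addresses
    and releases nothing, so a stale size can never reach the allocator.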
    
    But note that ath11k has similar logic yet never hits this issue. Reviewing
    the code there shows that it is protected by its QMI memory reuse logic, so
    the decision is to port that logic to ath12k. As in ath11k, the crash is
    avoided by adding prev_size to the target_mem_chunk structure and caching
    the real segment size in it. prev_size, instead of the current size, is
    then passed to dma_free_coherent(), so no unexpected memory is freed.
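    
    The idea can be sketched in user space as follows. This is an illustrative
    model under stated assumptions, not the code added by this patch:
    struct seg_model, model_alloc(), model_free() and model_prepare_seg() are
    hypothetical names, malloc()/free() stand in for dma_alloc_coherent() and
    dma_free_coherent(), and the max_ok parameter simulates large allocations
    failing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical user-space model of one target memory segment. */
struct seg_model {
	void *vaddr;       /* NULL when nothing is allocated */
	size_t size;       /* size from the latest firmware request */
	size_t prev_size;  /* size the current buffer was allocated with */
};

/* Size handed to the most recent model_free(), kept for inspection. */
static size_t last_freed_size;

/* Stand-in allocator: fails for requests larger than max_ok bytes. */
static void *model_alloc(size_t size, size_t max_ok)
{
	return size <= max_ok ? malloc(size) : NULL;
}

/* Stand-in free: records the size it was given. */
static void model_free(void *vaddr, size_t size)
{
	last_freed_size = size;
	free(vaddr);
}

/* Allocate or reuse one segment. Returns 0 on success, -1 on failure. */
static int model_prepare_seg(struct seg_model *seg, size_t max_ok)
{
	if (seg->vaddr) {
		if (seg->prev_size == seg->size)
			return 0;	/* same size as before: reuse the buffer */

		/* Cannot reuse: free with the cached real size. */
		model_free(seg->vaddr, seg->prev_size);
		seg->vaddr = NULL;
	}

	seg->vaddr = model_alloc(seg->size, max_ok);
	if (!seg->vaddr)
		return -1;

	seg->prev_size = seg->size;
	return 0;
}
```

    In this model, a buffer allocated at 524288 bytes and later asked to become
    8454144 bytes is released with 524288, the size it really has, before the
    larger allocation is attempted.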
    
    Also reuse m3 buffer.
    
    Tested-on: WCN7850 hw2.0 PCI WLAN.HMT.1.0-03427-QCAHMTSWPL_V1.0_V2.0_SILICONZ-1.15378.4
    Tested-on: WCN7850 hw2.0 PCI WLAN.HMT.1.0.c5-00481-QCAHMTSWPL_V1.0_V2.0_SILICONZ-3
    Signed-off-by: Baochen Qiang <quic_bqiang@quicinc.com>
    Signed-off-by: Kalle Valo <quic_kvalo@quicinc.com>
    Link: https://msgid.link/20240419034034.2842-1-quic_bqiang@quicinc.com