• Jean-Philippe Brucker's avatar
    skbuff: fix coalescing for page_pool fragment recycling · 1effe8ca
    Jean-Philippe Brucker authored
    Fix a use-after-free when using page_pool with page fragments. We
    encountered this problem during normal RX in the hns3 driver:
    
    (1) Initially we have three descriptors in the RX queue. The first one
        allocates PAGE1 through page_pool, and the other two allocate one
        half of PAGE2 each. Page references look like this:
    
                    RX_BD1 _______ PAGE1
                    RX_BD2 _______ PAGE2
                    RX_BD3 _________/
    
    (2) Handle RX on the first descriptor. Allocate SKB1, eventually added
        to the receive queue by tcp_queue_rcv().
    
    (3) Handle RX on the second descriptor. Allocate SKB2 and pass it to
        netif_receive_skb():
    
        netif_receive_skb(SKB2)
          ip_rcv(SKB2)
            SKB3 = skb_clone(SKB2)
    
        SKB2 and SKB3 share a reference to PAGE2 through
        skb_shinfo()->dataref. The other ref to PAGE2 is still held by
        RX_BD3:
    
                          SKB2 ---+- PAGE2
                          SKB3 __/   /
                    RX_BD3 _________/
    
     (3b) Now while handling TCP, coalesce SKB3 with SKB1:
    
          tcp_v4_rcv(SKB3)
            tcp_try_coalesce(to=SKB1, from=SKB3)    // succeeds
            kfree_skb_partial(SKB3)
              skb_release_data(SKB3)                // drops one dataref
    
                          SKB1 _____ PAGE1
                               \____
                          SKB2 _____ PAGE2
                                     /
                    RX_BD3 _________/
    
        In skb_try_coalesce(), __skb_frag_ref() takes a page reference to
        PAGE2, where it should instead have increased the page_pool frag
        reference, pp_frag_count. Without coalescing, when releasing both
        SKB2 and SKB3, a single reference to PAGE2 would be dropped. Now
        when releasing SKB1 and SKB2, two references to PAGE2 will be
        dropped, resulting in underflow.
    
     (3c) Drop SKB2:
    
          af_packet_rcv(SKB2)
            consume_skb(SKB2)
              skb_release_data(SKB2)                // drops second dataref
                page_pool_return_skb_page(PAGE2)    // drops one pp_frag_count
    
                          SKB1 _____ PAGE1
                               \____
                                     PAGE2
                                     /
                    RX_BD3 _________/
    
    (4) Userspace calls recvmsg()
        Copies SKB1 and releases it. Since SKB3 was coalesced with SKB1, we
        release the SKB3 page as well:
    
        tcp_eat_recv_skb(SKB1)
          skb_release_data(SKB1)
            page_pool_return_skb_page(PAGE1)
            page_pool_return_skb_page(PAGE2)        // drops second pp_frag_count
    
    (5) PAGE2 is freed, but the third RX descriptor was still using it!
        In our case this causes IOMMU faults, but it would silently corrupt
        memory if the IOMMU was disabled.
    
    Change the logic that checks whether pp_recycle SKBs can be coalesced.
    We still reject differing pp_recycle between 'from' and 'to' SKBs, but
    in order to avoid the situation described above, we also reject
    coalescing when both 'from' and 'to' are pp_recycled and 'from' is
    cloned.
    
    The new logic allows coalescing a cloned pp_recycle SKB into a page
    refcounted one, because in this case the release (4) will drop the right
    reference, the one taken by skb_try_coalesce().
    
    Fixes: 53e0961d ("page_pool: add frag page recycling support in page pool")
    Suggested-by: default avatarAlexander Duyck <alexanderduyck@fb.com>
    Signed-off-by: default avatarJean-Philippe Brucker <jean-philippe@linaro.org>
    Reviewed-by: default avatarYunsheng Lin <linyunsheng@huawei.com>
    Reviewed-by: default avatarAlexander Duyck <alexanderduyck@fb.com>
    Acked-by: default avatarIlias Apalodimas <ilias.apalodimas@linaro.org>
    Acked-by: default avatarJesper Dangaard Brouer <brouer@redhat.com>
    Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
    1effe8ca
skbuff.c 161 KB