    drm/mm: optimize rb_hole_addr rbtree search · 0cdea445
    Nirmoy Das authored
    Userspace can severely fragment the rb_hole_addr rbtree by manipulating
    alignment while allocating buffers. A fragmented rb_hole_addr rbtree
    results in long delays when allocating buffer objects for a
    userspace application. Finding a suitable hole takes a long time
    because, if the first attempt fails, we look at neighbouring nodes
    using rb_prev()/rb_next(). Traversing the rbtree with
    rb_prev()/rb_next() can take a very long time if the tree is
    fragmented.
    
    This patch improves searches in a fragmented rb_hole_addr rbtree by
    converting it to an augmented rbtree with an extra field in
    drm_mm_node, subtree_max_hole. Each drm_mm_node now stores the maximum
    hole size of its subtree in drm_mm_node->subtree_max_hole. Using
    drm_mm_node->subtree_max_hole, a whole subtree can be skipped when it
    cannot serve a request, which reduces the number of
    rb_prev()/rb_next() calls.
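    
    The pruning idea above can be sketched in isolation. The following is a
    minimal illustration, not the code from this patch: it uses a plain,
    unbalanced binary search tree instead of the kernel rbtree, and
    struct hole_node, insert_hole() and find_lowest_fit() are hypothetical
    names. Only the subtree_max_hole field name is taken from the text.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for a hole tracked by the allocator. */
struct hole_node {
	uint64_t start;            /* key: start address of the hole */
	uint64_t size;             /* size of this hole */
	uint64_t subtree_max_hole; /* largest hole size in this subtree */
	struct hole_node *left, *right;
};

/* Largest hole in a possibly empty subtree. */
static uint64_t max_hole(const struct hole_node *n)
{
	return n ? n->subtree_max_hole : 0;
}

/*
 * Insert into a plain (unbalanced) BST ordered by start address and
 * update subtree_max_hole on every node along the insertion path.
 */
static struct hole_node *insert_hole(struct hole_node *root,
				     uint64_t start, uint64_t size)
{
	assert(size > 0);
	if (!root) {
		struct hole_node *n = calloc(1, sizeof(*n));

		if (!n)
			abort();
		n->start = start;
		n->size = size;
		n->subtree_max_hole = size;
		return n;
	}
	if (start < root->start)
		root->left = insert_hole(root->left, start, size);
	else
		root->right = insert_hole(root->right, start, size);
	if (size > root->subtree_max_hole)
		root->subtree_max_hole = size;
	return root;
}

/*
 * Return the lowest-address hole with size >= want, or NULL.
 * A subtree whose subtree_max_hole is too small is never entered.
 */
static const struct hole_node *find_lowest_fit(const struct hole_node *n,
					       uint64_t want)
{
	while (n && n->subtree_max_hole >= want) {
		if (max_hole(n->left) >= want)
			n = n->left;   /* a fitting hole exists at a lower address */
		else if (n->size >= want)
			return n;      /* this hole is the lowest one that fits */
		else
			n = n->right;  /* the fit must be in the right subtree */
	}
	return NULL;
}
```

    In this sketch, a request that no hole can satisfy is rejected at the
    root with a single comparison, and a request that only one large hole
    can satisfy follows one path down the tree instead of visiting every
    small hole in address order.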
    
    With this patch applied, 1 million bo allocs on amdgpu took ~8 sec.
    Without it, 50k bo allocs took 28 sec.
    
    partial test code:
    int test_fragmentation(void)
    {
            int i = 0;
            uint32_t minor_version;
            uint32_t major_version;
    
            struct amdgpu_bo_alloc_request request = {};
            amdgpu_bo_handle vram_handle[MAX_ALLOC] = {};
            amdgpu_device_handle device_handle;
    
            /* 4 KiB buffers aligned to 8 KiB, so each one can leave a hole */
            request.alloc_size = 4096;
            request.phys_alignment = 8192;
            request.preferred_heap = AMDGPU_GEM_DOMAIN_VRAM;
    
            int fd = open("/dev/dri/card0", O_RDWR | O_CLOEXEC);
            amdgpu_device_initialize(fd, &major_version, &minor_version,
                                     &device_handle);
    
            /* allocate MAX_ALLOC buffers to fragment the hole tree */
            for (i = 0; i < MAX_ALLOC; i++)
                    amdgpu_bo_alloc(device_handle, &request, &vram_handle[i]);
    
            for (i = 0; i < MAX_ALLOC; i++)
                    amdgpu_bo_free(vram_handle[i]);
    
            return 0;
    }
    
    v2:
    Use RB_DECLARE_CALLBACKS_MAX to maintain subtree_max_hole
    v3:
    insert_hole_addr() should be a static function
    fix return value of next_hole_high_addr()/next_hole_low_addr()
    Reported-by: kbuild test robot <lkp@intel.com>
    v4:
    Fix commit message.
    Signed-off-by: Nirmoy Das <nirmoy.das@amd.com>
    Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
    Acked-by: Christian König <christian.koenig@amd.com>
    Link: https://patchwork.freedesktop.org/patch/364341/
    Signed-off-by: Christian König <christian.koenig@amd.com>
drm_mm.h 17.5 KB