    mm: kfence: improve the performance of __kfence_alloc() and __kfence_free() · 1ba3cbf3
    Peng Zhang authored
    __kfence_alloc() and __kfence_free() set and check the canary bytes.
    When the object size is close to 0, nearly 4k memory accesses are
    required, because the canary is set and checked byte by byte.
    
    The canary is currently defined as:
    KFENCE_CANARY_PATTERN(addr) ((u8)0xaa ^ (u8)((unsigned long)(addr) & 0x7))
    
    Observe that the canary depends only on the lower three bits of the
    address, so the canary pattern repeats every 8 bytes.  We can therefore
    access the canary 8 bytes at a time instead of byte by byte, reducing
    nearly 4k memory accesses to roughly 4k/8.
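
    The idea can be illustrated with a minimal user-space sketch.  This is
    not the code from the patch: the helper names (canary_pattern_u64,
    set_canary, check_canary, canary_demo) are hypothetical, and only the
    byte-wise pattern is taken from the definition quoted above.  Unaligned
    head and tail bytes are handled one at a time; the aligned middle is
    handled one 8-byte word at a time.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Byte-wise pattern, mirroring the definition quoted in the commit message. */
#define CANARY_PATTERN_U8(addr) \
	((uint8_t)0xaa ^ (uint8_t)((uintptr_t)(addr) & 0x7))

/*
 * 8-byte pattern for an 8-byte-aligned address: the byte at offset i is
 * 0xaa ^ i.  Built in memory order, so it does not depend on endianness.
 */
static uint64_t canary_pattern_u64(void)
{
	uint8_t bytes[8];
	uint64_t word;

	for (int i = 0; i < 8; i++)
		bytes[i] = (uint8_t)(0xaa ^ i);
	memcpy(&word, bytes, sizeof(word));
	return word;
}

/* Fill [start, end) with the canary, using 8-byte stores where aligned. */
static void set_canary(uint8_t *start, uint8_t *end)
{
	const uint64_t word = canary_pattern_u64();
	uint8_t *p = start;

	for (; p < end && ((uintptr_t)p & 0x7); p++)	/* unaligned head */
		*p = CANARY_PATTERN_U8(p);
	for (; (size_t)(end - p) >= 8; p += 8)		/* aligned words */
		memcpy(p, &word, sizeof(word));
	for (; p < end; p++)				/* remaining tail */
		*p = CANARY_PATTERN_U8(p);
}

/* Return 1 if every canary byte in [start, end) is intact, 0 otherwise. */
static int check_canary(const uint8_t *start, const uint8_t *end)
{
	const uint64_t word = canary_pattern_u64();
	const uint8_t *p = start;
	uint64_t got;

	for (; p < end && ((uintptr_t)p & 0x7); p++)
		if (*p != CANARY_PATTERN_U8(p))
			return 0;
	for (; (size_t)(end - p) >= 8; p += 8) {
		memcpy(&got, p, sizeof(got));
		if (got != word)
			return 0;
	}
	for (; p < end; p++)
		if (*p != CANARY_PATTERN_U8(p))
			return 0;
	return 1;
}

/* Small self-check: returns 0 when the word-wise and byte-wise views agree. */
static int canary_demo(void)
{
	uint64_t storage[8];			/* 64 bytes, 8-byte aligned */
	uint8_t *buf = (uint8_t *)storage;
	uint8_t *start = buf + 3, *end = buf + 61;

	set_canary(start, end);
	for (uint8_t *p = start; p < end; p++)
		assert(*p == CANARY_PATTERN_U8(p));
	if (!check_canary(start, end))
		return -1;
	buf[20] ^= 0xff;			/* simulate an out-of-bounds write */
	return check_canary(start, end) ? -1 : 0;
}
```

    In this sketch the word-wise loop touches one eighth as many locations
    as the byte-wise loop for the aligned part of the range, which is where
    the 4k to 4k/8 reduction comes from.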
    
    Use the bcc tool funclatency to measure the latency of __kfence_alloc()
    and __kfence_free().  The numbers (with the latency distribution
    omitted) are posted below.  Although object size affects the
    measurement, we ignore that for now and assume the average object size
    is roughly equal across runs.
    
    Before patching:
    __kfence_alloc:
    avg = 5055 nsecs, total: 5515252 nsecs, count: 1091
    __kfence_free:
    avg = 5319 nsecs, total: 9735130 nsecs, count: 1830
    
    After patching:
    __kfence_alloc:
    avg = 3597 nsecs, total: 6428491 nsecs, count: 1787
    __kfence_free:
    avg = 3046 nsecs, total: 3415390 nsecs, count: 1121
    
    The numbers indicate a ~30% - ~40% performance improvement: the average
    latency drops by about 29% for __kfence_alloc() and about 43% for
    __kfence_free().
    
    Link: https://lkml.kernel.org/r/20230403122738.6006-1-zhangpeng.00@bytedance.com
    Signed-off-by: Peng Zhang <zhangpeng.00@bytedance.com>
    Reviewed-by: Marco Elver <elver@google.com>
    Cc: Alexander Potapenko <glider@google.com>
    Cc: Dmitry Vyukov <dvyukov@google.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>