    LoongArch: Use alternative to optimize libraries · a275a82d
    Huacai Chen authored
    Use the alternative mechanism to optimize common library routines
    according to whether the CPU has the UAL (hardware unaligned access
    support) feature, including memset(), memcpy(), memmove(), copy_user()
    and clear_user().
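
As a rough user-space analogy of the idea above, the sketch below picks one of two memset() variants once, based on a feature flag. It is not the kernel code: the kernel's alternative mechanism patches instructions at boot instead of calling through a pointer, and every name here (memset_bytewise, memset_wordwise, select_memset_impl, memset_impl) is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Conservative variant: one byte per store, safe on any alignment. */
static void *memset_bytewise(void *dst, int c, size_t n)
{
    unsigned char *p = dst;

    while (n--)
        *p++ = (unsigned char)c;
    return dst;
}

/* Wide variant: 8 bytes per store without aligning dst first.
 * Only worthwhile when the hardware handles unaligned accesses. */
static void *memset_wordwise(void *dst, int c, size_t n)
{
    unsigned char *p = dst;
    /* Replicate the fill byte into all 8 bytes of the pattern. */
    uint64_t pattern = 0x0101010101010101ULL * (unsigned char)c;

    while (n >= sizeof(pattern)) {
        /* memcpy keeps the possibly unaligned store well-defined in C. */
        memcpy(p, &pattern, sizeof(pattern));
        p += sizeof(pattern);
        n -= sizeof(pattern);
    }
    while (n--)
        *p++ = (unsigned char)c;
    return dst;
}

typedef void *(*memset_fn)(void *, int, size_t);

/* Default to the conservative variant until the feature is known. */
static memset_fn memset_impl = memset_bytewise;

/* Hypothetical one-time selection; has_ual stands in for the CPU feature probe. */
static void select_memset_impl(int has_ual)
{
    memset_impl = has_ual ? memset_wordwise : memset_bytewise;
}
```

Calling select_memset_impl(1) once at startup and then going through memset_impl(buf, 0, len) mimics the effect. The boot-time patching used by the kernel avoids the indirect call that this sketch pays on every invocation.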
    
    We have tested UnixBench on a Loongson-3A5000 quad-core machine (1.6GHz):
    
    1, One copy, before patch:
    
    System Benchmarks Index Values               BASELINE       RESULT    INDEX
    Dhrystone 2 using register variables         116700.0    9566582.0    819.8
    Double-Precision Whetstone                       55.0       2805.3    510.1
    Execl Throughput                                 43.0       2120.0    493.0
    File Copy 1024 bufsize 2000 maxblocks          3960.0     209833.0    529.9
    File Copy 256 bufsize 500 maxblocks            1655.0      89400.0    540.2
    File Copy 4096 bufsize 8000 maxblocks          5800.0     320036.0    551.8
    Pipe Throughput                               12440.0     340624.0    273.8
    Pipe-based Context Switching                   4000.0     109939.1    274.8
    Process Creation                                126.0       4728.7    375.3
    Shell Scripts (1 concurrent)                     42.4       2223.1    524.3
    Shell Scripts (8 concurrent)                      6.0        883.1   1471.9
    System Call Overhead                          15000.0     518639.1    345.8
                                                                       ========
    System Benchmarks Index Score                                         500.2
    
    2, One copy, after patch:
    
    System Benchmarks Index Values               BASELINE       RESULT    INDEX
    Dhrystone 2 using register variables         116700.0    9567674.7    819.9
    Double-Precision Whetstone                       55.0       2805.5    510.1
    Execl Throughput                                 43.0       2392.7    556.4
    File Copy 1024 bufsize 2000 maxblocks          3960.0     417804.0   1055.1
    File Copy 256 bufsize 500 maxblocks            1655.0     112909.5    682.2
    File Copy 4096 bufsize 8000 maxblocks          5800.0    1255207.4   2164.2
    Pipe Throughput                               12440.0     555712.0    446.7
    Pipe-based Context Switching                   4000.0      99964.5    249.9
    Process Creation                                126.0       5192.5    412.1
    Shell Scripts (1 concurrent)                     42.4       2302.4    543.0
    Shell Scripts (8 concurrent)                      6.0        919.6   1532.6
    System Call Overhead                          15000.0     511159.3    340.8
                                                                       ========
    System Benchmarks Index Score                                         640.1
    
    3, Four copies, before patch:
    
    System Benchmarks Index Values               BASELINE       RESULT    INDEX
    Dhrystone 2 using register variables         116700.0   38268610.5   3279.2
    Double-Precision Whetstone                       55.0      11222.2   2040.4
    Execl Throughput                                 43.0       7892.0   1835.3
    File Copy 1024 bufsize 2000 maxblocks          3960.0     235149.6    593.8
    File Copy 256 bufsize 500 maxblocks            1655.0      74959.6    452.9
    File Copy 4096 bufsize 8000 maxblocks          5800.0     545048.5    939.7
    Pipe Throughput                               12440.0    1337359.0   1075.0
    Pipe-based Context Switching                   4000.0     473663.9   1184.2
    Process Creation                                126.0      17491.2   1388.2
    Shell Scripts (1 concurrent)                     42.4       6865.7   1619.3
    Shell Scripts (8 concurrent)                      6.0       1015.9   1693.1
    System Call Overhead                          15000.0    1899535.2   1266.4
                                                                       ========
    System Benchmarks Index Score                                        1278.3
    
    4, Four copies, after patch:
    
    System Benchmarks Index Values               BASELINE       RESULT    INDEX
    Dhrystone 2 using register variables         116700.0   38272815.5   3279.6
    Double-Precision Whetstone                       55.0      11222.8   2040.5
    Execl Throughput                                 43.0       8839.2   2055.6
    File Copy 1024 bufsize 2000 maxblocks          3960.0     313912.9    792.7
    File Copy 256 bufsize 500 maxblocks            1655.0      80976.1    489.3
    File Copy 4096 bufsize 8000 maxblocks          5800.0    1176594.3   2028.6
    Pipe Throughput                               12440.0    2100941.9   1688.9
    Pipe-based Context Switching                   4000.0     476696.4   1191.7
    Process Creation                                126.0      18394.7   1459.9
    Shell Scripts (1 concurrent)                     42.4       7172.2   1691.6
    Shell Scripts (8 concurrent)                      6.0       1058.3   1763.9
    System Call Overhead                          15000.0    1874714.7   1249.8
                                                                       ========
    System Benchmarks Index Score                                        1488.8
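
To make the four tables easier to compare, here is a small sketch of the arithmetic involved. The INDEX column is consistent with RESULT / BASELINE * 10; the helper names ub_index and pct_gain are made up for illustration and are not part of UnixBench.

```c
#include <assert.h>

/* Index of one test: result relative to the baseline, scaled by 10. */
static double ub_index(double baseline, double result)
{
    assert(baseline > 0.0);
    return result / baseline * 10.0;
}

/* Relative change of a score or index, in percent. */
static double pct_gain(double before, double after)
{
    assert(before > 0.0);
    return (after - before) / before * 100.0;
}
```

For example, ub_index(5800.0, 320036.0) reproduces the 551.8 reported for "File Copy 4096" in the first table. Applied to the overall scores, pct_gain(500.2, 640.1) is about 28% for one copy and pct_gain(1278.3, 1488.8) is about 16% for four copies, with the file copy and pipe throughput tests contributing most of the gain.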
    
    Signed-off-by: Jun Yi <yijun@loongson.cn>
    Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
memset.S 1.44 KB