    bigfile/zodb: Add ZBlk format option 'auto' (heuristic) · 96c68406
    Levin Zimmermann authored
    There are two formats for storing ZBigFile data: ZBlk0 and ZBlk1.
    They make different trade-offs between access time and disk-space
    growth: ZBlk1 is more economical with disk space, while ZBlk0 offers
    faster access. Wendelin.core users may not know, or care, which format
    best fits their data. In that case it is easier to let the program
    select the ZBlk format automatically. This patch makes that possible by
    adding the 'auto' (heuristic) option to the 'ZBlk' argument of
    ZBigFile. 'auto' is not a new ZBlk format in itself; it selects the
    best-suited existing format according to the characteristics of the
    changes the user applies to the ZBigFile.
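
    To make the trade-off above concrete, here is a rough, hypothetical cost
    model. It assumes ZBlk0 stores a whole blk as one object, so every
    change rewrites the full blk, while ZBlk1 splits a blk into fixed-size
    chunks and rewrites only the touched ones, at the price of loading many
    objects per read. BLKSIZE, CHUNKSIZE and both helper functions are
    illustrative assumptions, not the values or API of file_zodb.py, and
    compression and BTree overhead are ignored.

```python
# Hypothetical cost model for illustration only: the sizes below are
# assumptions, not the real ZBlk0/ZBlk1 parameters from file_zodb.py.
BLKSIZE = 2 * 1024 * 1024   # assumed blk size (2 MiB)
CHUNKSIZE = 4 * 1024        # assumed ZBlk1 chunk size (4 KiB)


def bytes_written(fmt: str, changed_bytes: int) -> int:
    """Approximate bytes one commit adds to the database when it changes
    `changed_bytes` contiguous bytes of a single blk, assuming the change
    starts on a chunk boundary."""
    if fmt == "ZBlk0":
        # ZBlk0 keeps the blk as one object: any change rewrites all of it.
        return BLKSIZE
    if fmt == "ZBlk1":
        # ZBlk1 keeps the blk as chunks: only the touched chunks are rewritten.
        nchunks = -(-changed_bytes // CHUNKSIZE)  # ceiling division
        return nchunks * CHUNKSIZE
    raise ValueError(f"unknown format: {fmt}")


def objects_loaded(fmt: str) -> int:
    """Approximate number of objects to load to read one whole blk."""
    if fmt == "ZBlk0":
        return 1
    if fmt == "ZBlk1":
        return BLKSIZE // CHUNKSIZE
    raise ValueError(f"unknown format: {fmt}")
```

    Under these assumptions a 100-byte append costs about 2 MiB with ZBlk0
    but 4 KiB with ZBlk1, while reading the blk back takes 1 object load
    with ZBlk0 and 512 with ZBlk1.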
    
    In its current implementation, the heuristic targets the use case of
    large arrays with many small append-only changes. For this workload
    'auto' takes less space than ZBlk0 but reads faster than ZBlk1. It
    achieves this by using ZBlk1 while a blk is still being filled. Once a
    blk is full, it switches to ZBlk0, as recommended by @kirr in
    nexedi/wendelin.core!20 (comment 196084).
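
    The heuristic above can be sketched as a per-blk decision. This is a
    minimal illustration, not the code in file_zodb.py: the function name
    choose_zblk_fmt is hypothetical, and it assumes "blk is full" can be
    detected by the blk's last byte being non-zero, which only holds for
    append-only data whose not-yet-written tail is still zero.

```python
# Hypothetical sketch of the 'auto' heuristic. The function name and the
# "blk is full" test are assumptions; the real logic in file_zodb.py may
# detect a full blk differently.


def choose_zblk_fmt(blkdata: bytes, blksize: int) -> str:
    """Pick the ZBlk format for storing one blk under the 'auto' option."""
    if blksize <= 0 or len(blkdata) != blksize:
        raise ValueError("blkdata must be exactly one non-empty blk")
    # Assumption: with append-only changes the unwritten tail of a blk is
    # still zero, so a non-zero last byte means the blk is filled up.
    # (A full blk that legitimately ends in zero bytes would be misjudged.)
    filled = blkdata[-1] != 0
    # Still growing: ZBlk1 saves only changed chunks, keeping disk use low.
    # Filled up: appends no longer touch this blk, so ZBlk0 gives fast reads.
    return "ZBlk0" if filled else "ZBlk1"
```

    For example, a blk that is half written (b"\x01" * 4 + b"\x00" * 4 with
    blksize=8) stays in ZBlk1, while b"\x01" * 8 switches to ZBlk0.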
    
    This patch also adds a test (bigfile/tests/bench_zblkfmt) that
    benchmarks each ZBlk format under several workload combinations. It
    checks how the 'auto' format performs compared to 'ZBlk0' and
    'ZBlk1':
    
        BenchmarkAppendSize/zblk=ZBlk0/change_count=500/change_percentage_set=[0.014]   1       538.1 MB
        BenchmarkAppendRandRead/zblk=ZBlk0/change_count=500/change_percentage_set=[0.014]       6 2.085 ms/blk
        BenchmarkAppendSize/zblk=ZBlk1/change_count=500/change_percentage_set=[0.014]   1       16.8 MB
        BenchmarkAppendRandRead/zblk=ZBlk1/change_count=500/change_percentage_set=[0.014]       6 14.564 ms/blk
        BenchmarkAppendSize/zblk=auto/change_count=500/change_percentage_set=[0.014]    1       29.4 MB
        BenchmarkAppendRandRead/zblk=auto/change_count=500/change_percentage_set=[0.014]        6 2.119 ms/blk
        BenchmarkRandWriteSize/zblk=ZBlk0/arrsize=1000000/change_count=500/change_percentage_set=[0.2]  1       1021.1 MB
        BenchmarkRandWriteRandRead/zblk=ZBlk0/arrsize=1000000/change_count=500/change_percentage_set=[0.2]      3 2.324 ms/blk
        BenchmarkRandWriteSize/zblk=ZBlk1/arrsize=1000000/change_count=500/change_percentage_set=[0.2]  1       216.2 MB
        BenchmarkRandWriteRandRead/zblk=ZBlk1/arrsize=1000000/change_count=500/change_percentage_set=[0.2]      3 15.317 ms/blk
        BenchmarkRandWriteSize/zblk=auto/arrsize=1000000/change_count=500/change_percentage_set=[0.2]   1       219.8 MB
        BenchmarkRandWriteRandRead/zblk=auto/arrsize=1000000/change_count=500/change_percentage_set=[0.2]       3 14.027 ms/blk
        BenchmarkRandWriteSize/zblk=ZBlk0/arrsize=1000000/change_count=500/change_percentage_set=[1]    1       1048.6 MB
        BenchmarkRandWriteRandRead/zblk=ZBlk0/arrsize=1000000/change_count=500/change_percentage_set=[1]        3 2.126 ms/blk
        BenchmarkRandWriteSize/zblk=ZBlk1/arrsize=1000000/change_count=500/change_percentage_set=[1]    1       1070.4 MB
        BenchmarkRandWriteRandRead/zblk=ZBlk1/arrsize=1000000/change_count=500/change_percentage_set=[1]        3 14.284 ms/blk
        BenchmarkRandWriteSize/zblk=auto/arrsize=1000000/change_count=500/change_percentage_set=[1]     1       1070.3 MB
        BenchmarkRandWriteRandRead/zblk=auto/arrsize=1000000/change_count=500/change_percentage_set=[1] 3 14.072 ms/blk
        BenchmarkRandWriteSize/zblk=ZBlk0/arrsize=1000000/change_count=500/change_percentage_set=[0.2,1]        1       1046.4 MB
        BenchmarkRandWriteRandRead/zblk=ZBlk0/arrsize=1000000/change_count=500/change_percentage_set=[0.2,1]    3 2.137 ms/blk
        BenchmarkRandWriteSize/zblk=ZBlk1/arrsize=1000000/change_count=500/change_percentage_set=[0.2,1]        1       638.2 MB
        BenchmarkRandWriteRandRead/zblk=ZBlk1/arrsize=1000000/change_count=500/change_percentage_set=[0.2,1]    3 14.083 ms/blk
        BenchmarkRandWriteSize/zblk=auto/arrsize=1000000/change_count=500/change_percentage_set=[0.2,1] 1       639.5 MB
        BenchmarkRandWriteRandRead/zblk=auto/arrsize=1000000/change_count=500/change_percentage_set=[0.2,1]     3 13.937 ms/blk
    
    Post-processed with benchstat over 3 such runs, the results are:
    
                                                                                                │     x.log     │
                                                                                                │       B       │
        AppendSize/zblk=ZBlk0/change_count=500/change_percentage_set=[0.014]                       513.2Mi ± 0%
        AppendSize/zblk=ZBlk1/change_count=500/change_percentage_set=[0.014]                       16.02Mi ± 0%
        AppendSize/zblk=auto/change_count=500/change_percentage_set=[0.014]                        28.04Mi ± 0%
        RandWriteSize/zblk=ZBlk0/arrsize=1000000/change_count=500/change_percentage_set=[0.2]      973.8Mi ± 0%
        RandWriteSize/zblk=ZBlk1/arrsize=1000000/change_count=500/change_percentage_set=[0.2]      206.2Mi ± 0%
        RandWriteSize/zblk=auto/arrsize=1000000/change_count=500/change_percentage_set=[0.2]       209.6Mi ± 0%
        RandWriteSize/zblk=ZBlk0/arrsize=1000000/change_count=500/change_percentage_set=[1]       1000.0Mi ± 0%
        RandWriteSize/zblk=ZBlk1/arrsize=1000000/change_count=500/change_percentage_set=[1]       1020.8Mi ± 0%
        RandWriteSize/zblk=auto/arrsize=1000000/change_count=500/change_percentage_set=[1]        1020.7Mi ± 0%
        RandWriteSize/zblk=ZBlk0/arrsize=1000000/change_count=500/change_percentage_set=[0.2,1]    997.9Mi ± 0%
        RandWriteSize/zblk=ZBlk1/arrsize=1000000/change_count=500/change_percentage_set=[0.2,1]    608.6Mi ± 0%
        RandWriteSize/zblk=auto/arrsize=1000000/change_count=500/change_percentage_set=[0.2,1]     609.9Mi ± 0%
        geomean                                                                                    353.0Mi
    
                                                                                                    │    x.log    │
                                                                                                    │   ms/blk    │
        AppendRandRead/zblk=ZBlk0/change_count=500/change_percentage_set=[0.014]                      2.094 ± 12%
        AppendRandRead/zblk=ZBlk1/change_count=500/change_percentage_set=[0.014]                      14.47 ±  1%
        AppendRandRead/zblk=auto/change_count=500/change_percentage_set=[0.014]                       2.168 ±  2%
        RandWriteRandRead/zblk=ZBlk0/arrsize=1000000/change_count=500/change_percentage_set=[0.2]     2.324 ±  1%
        RandWriteRandRead/zblk=ZBlk1/arrsize=1000000/change_count=500/change_percentage_set=[0.2]     13.73 ± 12%
        RandWriteRandRead/zblk=auto/arrsize=1000000/change_count=500/change_percentage_set=[0.2]      13.60 ±  3%
        RandWriteRandRead/zblk=ZBlk0/arrsize=1000000/change_count=500/change_percentage_set=[1]       2.125 ±  2%
        RandWriteRandRead/zblk=ZBlk1/arrsize=1000000/change_count=500/change_percentage_set=[1]       14.18 ±  3%
        RandWriteRandRead/zblk=auto/arrsize=1000000/change_count=500/change_percentage_set=[1]        14.17 ±  1%
        RandWriteRandRead/zblk=ZBlk0/arrsize=1000000/change_count=500/change_percentage_set=[0.2,1]   2.118 ±  1%
        RandWriteRandRead/zblk=ZBlk1/arrsize=1000000/change_count=500/change_percentage_set=[0.2,1]   13.85 ±  2%
        RandWriteRandRead/zblk=auto/arrsize=1000000/change_count=500/change_percentage_set=[0.2,1]    13.80 ±  1%
        geomean                                                                                       6.423
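
    The append-only rows of the benchstat summary can be read as ratios
    between formats. The figures below are copied from the tables above;
    the helper is plain arithmetic on them.

```python
# Figures copied from the benchstat summary (append-only workload,
# change_percentage_set=[0.014]); sizes in MiB, read times in ms/blk.
size = {"ZBlk0": 513.2, "ZBlk1": 16.02, "auto": 28.04}
read = {"ZBlk0": 2.094, "ZBlk1": 14.47, "auto": 2.168}


def relative(metric: dict, fmt: str, baseline: str) -> float:
    """Ratio of `fmt` to `baseline` for one metric (below 1.0 = smaller/faster)."""
    return metric[fmt] / metric[baseline]


for base in ("ZBlk0", "ZBlk1"):
    print(f"auto vs {base}: size x{relative(size, 'auto', base):.3f}, "
          f"read time x{relative(read, 'auto', base):.3f}")
```

    It prints "auto vs ZBlk0: size x0.055, read time x1.035" and
    "auto vs ZBlk1: size x1.750, read time x0.150". For append-only changes
    'auto' thus needs about 5.5% of ZBlk0's space while its read time differs
    from ZBlk0 by less than the ±12% spread of the ZBlk0 measurement, and it
    reads about 6.7 times faster than ZBlk1 for about 1.75 times its space.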
    
    See nexedi/wendelin.core!20 and da765ef7...0c6f0850 for the
    preliminary history of this patch.
    Co-authored-by: Kirill Smelkov <kirr@nexedi.com>