Levin Zimmermann authored
There are two formats to save data with a ZBigFile: ZBlk0 and ZBlk1. They trade access time against disk-space growth: ZBlk1 is better with regard to disk space, while ZBlk0 has better access time. Wendelin.core users may not always know, or care, which format fits their data better. In this case it may be easier for users to let the program select the ZBlk format automatically. With this patch and the new 'auto' (heuristic) option of the 'ZBlk' argument of ZBigFile, this is now possible.

The 'auto' option is not a new ZBlk format in itself; it tries to select the best ZBlk format automatically, according to the characteristics of the changes that the user applies to the ZBigFile.

In its current implementation, the heuristic targets the use case of large arrays with many small append-only changes. In this case 'auto' takes less space than ZBlk0, but is faster to read than ZBlk1. It does so by initially using ZBlk1 until a blk is filled up. Once a blk is full, it switches to ZBlk0, as recommended by @kirr in nexedi/wendelin.core!20 (comment 196084).

With this patch comes a test (bigfile/tests/bench_zblkfmt) that creates benchmarks for different combinations of change patterns and ZBlk formats.
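As a rough illustration of the append-only case described above, the following Python sketch picks a format per block. It is not the code from this patch: the names select_zblk_fmt and blk_is_full are hypothetical, and treating a block as "full" once it has no trailing zero bytes is an assumption made only for this example. How the real heuristic handles other change patterns is not covered here.

```python
# Hypothetical sketch of the append-only heuristic; not wendelin.core code.

ZBLK0 = "ZBlk0"  # faster to read, but grows disk space faster on small changes
ZBLK1 = "ZBlk1"  # smaller on disk for small changes, but slower to read


def blk_is_full(blkdata: bytes, blksize: int) -> bool:
    """Assumption: a block counts as full when it has no trailing zero bytes."""
    return len(blkdata.rstrip(b"\0")) >= blksize


def select_zblk_fmt(blkdata: bytes, blksize: int) -> str:
    """Use ZBlk1 while the block is still being filled, ZBlk0 once it is full."""
    return ZBLK0 if blk_is_full(blkdata, blksize) else ZBLK1


# Simulate small appends into one 8-byte block, 2 bytes at a time,
# and record which format would be selected after each change.
blksize = 8
blk = bytearray(blksize)
history = []
for end in range(2, blksize + 1, 2):
    blk[end - 2:end] = b"\x01\x01"
    history.append(select_zblk_fmt(bytes(blk), blksize))
```

In this simulation the block stays on ZBlk1 for the first three appends and is switched to ZBlk0 by the fourth one, which fills it.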
The test aims to check how the 'auto' heuristic performs in contrast to 'ZBlk0' and 'ZBlk1':

    BenchmarkAppendSize/zblk=ZBlk0/change_count=500/change_percentage_set=[0.014]                         1  538.1 MB
    BenchmarkAppendRandRead/zblk=ZBlk0/change_count=500/change_percentage_set=[0.014]                     6  2.085 ms/blk
    BenchmarkAppendSize/zblk=ZBlk1/change_count=500/change_percentage_set=[0.014]                         1  16.8 MB
    BenchmarkAppendRandRead/zblk=ZBlk1/change_count=500/change_percentage_set=[0.014]                     6  14.564 ms/blk
    BenchmarkAppendSize/zblk=auto/change_count=500/change_percentage_set=[0.014]                          1  29.4 MB
    BenchmarkAppendRandRead/zblk=auto/change_count=500/change_percentage_set=[0.014]                      6  2.119 ms/blk
    BenchmarkRandWriteSize/zblk=ZBlk0/arrsize=1000000/change_count=500/change_percentage_set=[0.2]        1  1021.1 MB
    BenchmarkRandWriteRandRead/zblk=ZBlk0/arrsize=1000000/change_count=500/change_percentage_set=[0.2]    3  2.324 ms/blk
    BenchmarkRandWriteSize/zblk=ZBlk1/arrsize=1000000/change_count=500/change_percentage_set=[0.2]        1  216.2 MB
    BenchmarkRandWriteRandRead/zblk=ZBlk1/arrsize=1000000/change_count=500/change_percentage_set=[0.2]    3  15.317 ms/blk
    BenchmarkRandWriteSize/zblk=auto/arrsize=1000000/change_count=500/change_percentage_set=[0.2]         1  219.8 MB
    BenchmarkRandWriteRandRead/zblk=auto/arrsize=1000000/change_count=500/change_percentage_set=[0.2]     3  14.027 ms/blk
    BenchmarkRandWriteSize/zblk=ZBlk0/arrsize=1000000/change_count=500/change_percentage_set=[1]          1  1048.6 MB
    BenchmarkRandWriteRandRead/zblk=ZBlk0/arrsize=1000000/change_count=500/change_percentage_set=[1]      3  2.126 ms/blk
    BenchmarkRandWriteSize/zblk=ZBlk1/arrsize=1000000/change_count=500/change_percentage_set=[1]          1  1070.4 MB
    BenchmarkRandWriteRandRead/zblk=ZBlk1/arrsize=1000000/change_count=500/change_percentage_set=[1]      3  14.284 ms/blk
    BenchmarkRandWriteSize/zblk=auto/arrsize=1000000/change_count=500/change_percentage_set=[1]           1  1070.3 MB
    BenchmarkRandWriteRandRead/zblk=auto/arrsize=1000000/change_count=500/change_percentage_set=[1]       3  14.072 ms/blk
    BenchmarkRandWriteSize/zblk=ZBlk0/arrsize=1000000/change_count=500/change_percentage_set=[0.2,1]      1  1046.4 MB
    BenchmarkRandWriteRandRead/zblk=ZBlk0/arrsize=1000000/change_count=500/change_percentage_set=[0.2,1]  3  2.137 ms/blk
    BenchmarkRandWriteSize/zblk=ZBlk1/arrsize=1000000/change_count=500/change_percentage_set=[0.2,1]      1  638.2 MB
    BenchmarkRandWriteRandRead/zblk=ZBlk1/arrsize=1000000/change_count=500/change_percentage_set=[0.2,1]  3  14.083 ms/blk
    BenchmarkRandWriteSize/zblk=auto/arrsize=1000000/change_count=500/change_percentage_set=[0.2,1]       1  639.5 MB
    BenchmarkRandWriteRandRead/zblk=auto/arrsize=1000000/change_count=500/change_percentage_set=[0.2,1]   3  13.937 ms/blk

and post-processed with benchstat from 3 such runs:

                                                                                                 │    x.log     │
                                                                                                 │      B       │
    AppendSize/zblk=ZBlk0/change_count=500/change_percentage_set=[0.014]                         513.2Mi ± 0%
    AppendSize/zblk=ZBlk1/change_count=500/change_percentage_set=[0.014]                         16.02Mi ± 0%
    AppendSize/zblk=auto/change_count=500/change_percentage_set=[0.014]                          28.04Mi ± 0%
    RandWriteSize/zblk=ZBlk0/arrsize=1000000/change_count=500/change_percentage_set=[0.2]        973.8Mi ± 0%
    RandWriteSize/zblk=ZBlk1/arrsize=1000000/change_count=500/change_percentage_set=[0.2]        206.2Mi ± 0%
    RandWriteSize/zblk=auto/arrsize=1000000/change_count=500/change_percentage_set=[0.2]         209.6Mi ± 0%
    RandWriteSize/zblk=ZBlk0/arrsize=1000000/change_count=500/change_percentage_set=[1]          1000.0Mi ± 0%
    RandWriteSize/zblk=ZBlk1/arrsize=1000000/change_count=500/change_percentage_set=[1]          1020.8Mi ± 0%
    RandWriteSize/zblk=auto/arrsize=1000000/change_count=500/change_percentage_set=[1]           1020.7Mi ± 0%
    RandWriteSize/zblk=ZBlk0/arrsize=1000000/change_count=500/change_percentage_set=[0.2,1]      997.9Mi ± 0%
    RandWriteSize/zblk=ZBlk1/arrsize=1000000/change_count=500/change_percentage_set=[0.2,1]      608.6Mi ± 0%
    RandWriteSize/zblk=auto/arrsize=1000000/change_count=500/change_percentage_set=[0.2,1]       609.9Mi ± 0%
    geomean                                                                                      353.0Mi

                                                                                                 │    x.log     │
                                                                                                 │    ms/blk    │
    AppendRandRead/zblk=ZBlk0/change_count=500/change_percentage_set=[0.014]                     2.094 ± 12%
    AppendRandRead/zblk=ZBlk1/change_count=500/change_percentage_set=[0.014]                     14.47 ± 1%
    AppendRandRead/zblk=auto/change_count=500/change_percentage_set=[0.014]                      2.168 ± 2%
    RandWriteRandRead/zblk=ZBlk0/arrsize=1000000/change_count=500/change_percentage_set=[0.2]    2.324 ± 1%
    RandWriteRandRead/zblk=ZBlk1/arrsize=1000000/change_count=500/change_percentage_set=[0.2]    13.73 ± 12%
    RandWriteRandRead/zblk=auto/arrsize=1000000/change_count=500/change_percentage_set=[0.2]     13.60 ± 3%
    RandWriteRandRead/zblk=ZBlk0/arrsize=1000000/change_count=500/change_percentage_set=[1]      2.125 ± 2%
    RandWriteRandRead/zblk=ZBlk1/arrsize=1000000/change_count=500/change_percentage_set=[1]      14.18 ± 3%
    RandWriteRandRead/zblk=auto/arrsize=1000000/change_count=500/change_percentage_set=[1]       14.17 ± 1%
    RandWriteRandRead/zblk=ZBlk0/arrsize=1000000/change_count=500/change_percentage_set=[0.2,1]  2.118 ± 1%
    RandWriteRandRead/zblk=ZBlk1/arrsize=1000000/change_count=500/change_percentage_set=[0.2,1]  13.85 ± 2%
    RandWriteRandRead/zblk=auto/arrsize=1000000/change_count=500/change_percentage_set=[0.2,1]   13.80 ± 1%
    geomean                                                                                      6.423

See nexedi/wendelin.core!20 and da765ef7...0c6f0850 for the preliminary history of this patch.

Co-authored-by: Kirill Smelkov <kirr@nexedi.com>
96c68406