    fixup! fixup! ZBigFile: Add ZBlk format option 'h' (heuristic) (4) · 3f631932
    Kirill Smelkov authored
    Take suggestions from Levin into account (nexedi/wendelin.core!20 (comment 198330)):
    
        1. appending can be False, even though we are appending (misleading name).
        2. A big append uses ZBlk0 due to an if clause 25 lines later (logic is a bit far).
        3. in the previous version it could happen that if a block was filled up
           with small appends (ZBlk1), it wasn't transformed to ZBlk0 in case
           the next block would be filled up with only one big append.
        4. Regarding the actual algorithm, I wonder, why do we only use ZBlk0
           for big appends in case it's the first append of a new ZBlk? Couldn't
           we generally say it's ok to use ZBlk0 in case of big appends?
    
    All these notes are valid. The problem comes from the misleading semantics
    attached to the name 'appending'. The name suggests only appending,
    but sometimes we also want it to mean 'small', and we were not
    doing that consistently.
    
    -> Fix the problem by splitting 'appending' and 'small' into separate
       flags so that there is no room for confusion.
    
    -> Rework the flow of the code so that all cases related to appending
       are under one branch.
    
    -> Also optimize ndelta computation: when done in plain python, just
       this part was taking a lot of time, as timing for the initial writeup
       showed:
    
         writeup with ZBlk0: ~20-25s
         writeup with ZBlk1: ~20-30s
         writeup with auto:  was ~ 120s
    
       Now, after switching to numpy for ndelta computation, the whole runtime
       with 'auto' is ~ 35s. That runtime, if I observe the benchmark
       execution correctly, is dominated by database writeup.
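
The three changes above can be illustrated with a minimal sketch. It is not the code from file_zodb.py: the function names, the definition of ndelta as "number of differing bytes", the `small_ratio` threshold and the `other_fmt` fallback for non-append changes are all assumptions made for illustration. It only shows separate 'appending' and 'small' flags, a single branch for the append cases, and a numpy-based count next to the plain-python one.

```python
import numpy as np


def ndelta_python(old: bytes, new: bytes) -> int:
    """Count differing bytes with a plain-python loop (slow on large blocks)."""
    n = min(len(old), len(new))
    changed = sum(1 for i in range(n) if old[i] != new[i])
    # Bytes present in only one of the two buffers also count as changed.
    return changed + abs(len(old) - len(new))


def ndelta_numpy(old: bytes, new: bytes) -> int:
    """Same count, with the per-byte comparison vectorized by numpy."""
    n = min(len(old), len(new))
    if n == 0:
        return abs(len(old) - len(new))
    # View the common prefix of both buffers as uint8 arrays (no copy).
    a = np.frombuffer(old, dtype=np.uint8, count=n)
    b = np.frombuffer(new, dtype=np.uint8, count=n)
    changed = int(np.count_nonzero(a != b))
    return changed + abs(len(old) - len(new))


def choose_fmt(old: bytes, new: bytes, blksize: int,
               small_ratio: float = 0.5, other_fmt: str = "ZBlk0") -> str:
    """Pick a ZBlk format name; threshold and fallback are hypothetical."""
    # Two independent flags instead of one overloaded 'appending'.
    appending = len(new) > len(old) and new.startswith(old)
    small = ndelta_numpy(old, new) < small_ratio * blksize
    filled = len(new) >= blksize

    if appending:
        # All append-related cases are handled under this one branch.
        if small and not filled:
            return "ZBlk1"   # small append into a block that still has room
        return "ZBlk0"       # big append, or the block has just been filled up
    return other_fmt         # non-append changes: policy not covered here


print(choose_fmt(b"abc", b"abcde", blksize=16))
print(choose_fmt(b"a" * 14, b"a" * 16, blksize=16))
```

Under these assumptions the first call is a small append into a block with free space, and the second is a small append that fills the block up, which is the case from note 3.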
file_zodb.py 35.1 KB