    ext4: add support for lazy inode table initialization · bfff6873
    Lukas Czerner authored
    When the lazy_itable_init extended option is passed to mke2fs, it
    considerably speeds up filesystem creation because inode tables are
    not zeroed out.  The fact that parts of the inode table are
    uninitialized is not a problem so long as the block group descriptors,
    which contain information regarding how much of the inode table has
    been initialized, have not been corrupted.  However, if the block group
    checksums are not valid, e2fsck must scan the entire inode table, and
    the old, uninitialized data could potentially cause e2fsck to
    report false problems.
    
    Hence, it is important for the inode tables to be initialized as soon
    as possible.  This commit adds this feature so that mke2fs can safely
    use the lazy inode table initialization feature to speed up formatting
    file systems.
    
    This is done via a new kernel thread called ext4lazyinit, which is
    created on demand and destroyed when it is no longer needed.  There
    is only one thread for all ext4 filesystems in the system.  When the
    first filesystem with the init_itable mount option is mounted, the
    ext4lazyinit thread is created; the filesystem can then register its
    request in the request list.
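
    The lifecycle described above can be modelled in a few lines of
    user-space C.  This is only a sketch under stated assumptions: the
    names li_info, li_request, li_register and li_remove are hypothetical,
    and a boolean flag stands in for actually creating and stopping a
    kernel thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* One pending zeroing request per mounted filesystem (hypothetical type). */
struct li_request {
    struct li_request *next;
    int fs_id;
};

/* Global state shared by all filesystems (hypothetical type). */
struct li_info {
    struct li_request *head;   /* request list */
    bool thread_running;       /* stands in for the single worker thread */
    int starts;                /* how many times the thread was created */
};

/* Register a request; create the worker on demand if it is not running. */
static void li_register(struct li_info *li, struct li_request *req)
{
    assert(req != NULL);
    if (!li->thread_running) {
        li->thread_running = true;
        li->starts++;
    }
    req->next = li->head;
    li->head = req;
}

/* Drop a finished request; the worker exits once the list is empty. */
static void li_remove(struct li_info *li, struct li_request *req)
{
    struct li_request **pp = &li->head;

    assert(req != NULL);
    while (*pp != NULL && *pp != req)
        pp = &(*pp)->next;
    if (*pp != NULL)
        *pp = req->next;
    if (li->head == NULL)
        li->thread_running = false;
}
```

    Registering two requests starts the worker once; it stops only when
    the last request is removed, and a later registration starts it again.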
    
    This thread then walks through the list of requests, picking up
    scheduled requests and invoking ext4_init_inode_table().  The next
    schedule time for a request is computed by multiplying the time it
    took to zero out the last inode table by the wait multiplier, which
    can be set with the init_itable=n mount option (default is 10).  We
    do this so that we do not take up the whole I/O bandwidth.  When the
    thread is no longer necessary (the request list is empty), it frees
    the appropriate structures and exits (and can be created again later
    by another filesystem).
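
    As an illustration of the throttling rule, the following sketch
    computes the next schedule time from the measured zeroing time and
    the wait multiplier.  The function name, the millisecond units and
    the macro are assumptions made for the example; only the
    multiplication and the default of 10 come from the description above.

```c
#include <assert.h>
#include <stdint.h>

/* Default wait multiplier, as stated in the text (macro name is hypothetical). */
#define LAZYINIT_DEFAULT_MULT 10u

/*
 * now_ms:     time at which zeroing of the last inode table finished
 * elapsed_ms: how long that zeroing took
 * mult:       wait multiplier (the n in init_itable=n)
 *
 * Returns the earliest time at which the next table should be zeroed.
 */
static uint64_t lazyinit_next_sched(uint64_t now_ms, uint64_t elapsed_ms,
                                    unsigned int mult)
{
    /* Guard against 64-bit overflow in this toy model. */
    assert(mult == 0 || elapsed_ms <= (UINT64_MAX - now_ms) / mult);
    return now_ms + elapsed_ms * mult;
}
```

    Under this model, if each table takes t to zero and the thread then
    waits 10 * t, it is busy for roughly 1/11 of the time, which is why
    the multiplier keeps it from saturating the disk.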
    
    We do not disturb regular inode allocations in any way; they simply do
    not care whether the inode table is zeroed or not.  But when zeroing, we
    have to skip used inodes, obviously.  We should also prevent new inode
    allocations from the group while zeroing is under way.  For that we
    take alloc_sem for writing in ext4_init_inode_table() and for reading
    in ext4_claim_inode(), so when we are unlucky and the allocator hits
    the group which is currently being zeroed, it just has to wait.
    
    This can be suppressed using the mount option no_init_itable.
    Signed-off-by: Lukas Czerner <lczerner@redhat.com>
    Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
ext4.h 71.7 KB