    gfs2: Rework freeze / thaw logic · b77b4a48
    Authored by Andreas Gruenbacher
    So far, at mount time, gfs2 would take the freeze glock in shared mode
    and then immediately drop it again, turning it into a cached glock that
    can be reclaimed at any time.  To freeze the filesystem cluster-wide,
    the node initiating the freeze would take the freeze glock in exclusive
    mode, which would cause the freeze glock's freeze_go_sync() callback to
    run on each node.  There, gfs2 would freeze the filesystem and schedule
    gfs2_freeze_func() to run.  gfs2_freeze_func() would re-acquire the
    freeze glock in shared mode, thaw the filesystem, and drop the freeze
    glock again.  The initiating node would keep the freeze glock held in
    exclusive mode.  To thaw the filesystem, the initiating node would drop
    the freeze glock again, which would allow gfs2_freeze_func() to resume
    on all nodes, leaving the filesystem in the thawed state.
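
    For illustration, here is a minimal sketch of that old scheme.  The
    glock_acquire() and glock_release() helpers, the freeze_glock object,
    and freeze_work are hypothetical stand-ins for the real glock API, and
    freeze_super()/thaw_super() are shown in simplified form:

    /* Initiating node: freeze the cluster. */
    static void old_scheme_freeze(struct super_block *sb)
    {
            /*
             * Taking the freeze glock EX forces freeze_go_sync() to run
             * on every node; the glock stays held EX until the thaw.
             */
            glock_acquire(freeze_glock, LM_ST_EXCLUSIVE);
    }

    /* Every node: the freeze glock's go_sync callback (must not fail). */
    static void old_freeze_go_sync(struct super_block *sb)
    {
            freeze_super(sb);               /* freeze locally */
            schedule_work(&freeze_work);    /* runs gfs2_freeze_func() */
    }

    /* Every node: gfs2_freeze_func(), scheduled from freeze_go_sync(). */
    static void old_gfs2_freeze_func(struct super_block *sb)
    {
            /* Blocks while the initiating node holds the glock EX. */
            glock_acquire(freeze_glock, LM_ST_SHARED);
            thaw_super(sb);                 /* thaw once the initiator drops EX */
            glock_release(freeze_glock);
    }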
    
    It turns out that in freeze_go_sync(), we cannot reliably and safely
    freeze the filesystem.  This is primarily because the final unmount of a
    filesystem takes a write lock on the s_umount rw semaphore before
    calling into gfs2_put_super(), and freeze_go_sync() needs to call
    freeze_super(), which also takes a write lock on the same semaphore,
    causing a deadlock.  We could work around this by trying to take an
    active reference on the super block first, which would prevent unmount
    from running at the same time.  But that can fail, and freeze_go_sync()
    isn't actually allowed to fail.
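
    Sketched roughly (simplified, and assuming the unmount ends up waiting
    on the pending glock work, which is what turns this into a deadlock
    rather than a plain block):

    /*
     *   final unmount                      glock state engine
     *   -------------                      ------------------
     *   down_write(&sb->s_umount)
     *   gfs2_put_super()
     *     ... waits for glock work ...     freeze_go_sync()
     *                                        freeze_super(sb)
     *                                          down_write(&sb->s_umount)
     *                                          -> blocks behind unmount
     */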
    
    To get around this, this patch changes the freeze glock locking scheme
    as follows:
    
    At mount time, each node takes the freeze glock in shared mode.  To
    freeze a filesystem, the initiating node first freezes the filesystem
    locally and then drops and re-acquires the freeze glock in exclusive
    mode.  All other nodes notice that there is contention on the freeze
    glock in their go_callback callbacks, and they schedule
    gfs2_freeze_func() to run.  There, they freeze the filesystem locally
    and drop and re-acquire the freeze glock before re-thawing the
    filesystem.  This happens outside of the glock state engine, so failing
    there is allowed.
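
    A minimal sketch of the new scheme, again with glock_acquire(),
    glock_release(), and the freeze_glock object as hypothetical stand-ins
    for the real glock API:

    /* At mount time, every node takes and keeps the freeze glock SH. */

    /* Initiating node: freeze the cluster. */
    static int new_scheme_freeze(struct super_block *sb)
    {
            int error;

            error = freeze_super(sb);       /* freeze locally first */
            if (error)
                    return error;           /* failing here is fine */
            glock_release(freeze_glock);    /* drop SH ... */
            glock_acquire(freeze_glock, LM_ST_EXCLUSIVE);   /* ... retake EX */
            return 0;
    }

    /*
     * Every other node: gfs2_freeze_func(), scheduled from the glock's
     * go_callback once it notices contention on the freeze glock.
     */
    static void new_gfs2_freeze_func(struct super_block *sb)
    {
            if (freeze_super(sb))           /* freeze locally */
                    return;                 /* allowed to fail here */
            glock_release(freeze_glock);    /* let the EX request through */
            /* Blocks while the initiating node still holds the glock EX. */
            glock_acquire(freeze_glock, LM_ST_SHARED);
            thaw_super(sb);                 /* re-thaw locally */
    }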
    
    From a cluster point of view, taking and immediately dropping a glock is
    indistinguishable from taking the glock and only dropping it upon
    contention, so this new scheme is compatible with the old one.
    
    Thanks to Li Dong <lidong@vivo.com> for reporting a locking bug in
    gfs2_freeze_func() in a previous version of this commit.
    Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>