Commits · 803074ad77b91e270c1ce90793a924cdb4547162 · Kirill Smelkov / linux

23 Feb, 2021 1 commit

Merge branches 'rgrp-glock-sharing' and 'gfs2-revoke' from... · 803074ad

Andreas Gruenbacher authored Feb 22, 2021

Merge branches 'rgrp-glock-sharing' and 'gfs2-revoke' from https://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2.git

Merge the resource group glock sharing feature and the revoke accounting rework.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>

803074ad

22 Feb, 2021 3 commits

gfs2: Per-revoke accounting in transactions · 2129b428

Andreas Gruenbacher authored Dec 17, 2020

In the log, revokes are stored as a revoke descriptor (struct
gfs2_log_descriptor), followed by zero or more additional revoke blocks
(struct gfs2_meta_header).  On filesystems with a blocksize of 4k, the
revoke descriptor contains up to 503 revokes, and the metadata blocks
contain up to 509 revokes each.  We've so far been reserving space for
revokes in transactions in block granularity, so a lot more space than
necessary was being allocated and then released again.

This patch switches to assigning revokes to transactions individually
instead.  Initially, space for the revoke descriptor is reserved and
handed out to transactions.  When more revokes than that are reserved,
additional revoke blocks are added.  When the log is flushed, the space
for the additional revoke blocks is released, but we keep the space for
the revoke descriptor block allocated.

Transactions may still reserve more revokes than they will actually need
in the end, but now we won't overshoot the target as much, and by only
returning the space for excess revokes at log flush time, we further
reduce the amount of contention between processes.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>

2129b428

gfs2: Rework the log space allocation logic · fe3e3976

Andreas Gruenbacher authored Dec 10, 2020

The current log space allocation logic is hard to understand or extend.
The principle it that when the log is flushed, we may or may not have a
transaction active that has space allocated in the log.  To deal with
that, we set aside a magical number of blocks to be used in case we
don't have an active transaction.  It isn't clear that the pool will
always be big enough.  In addition, we can't return unused log space at
the end of a transaction, so the number of blocks allocated must exactly
match the number of blocks used.

Simplify this as follows:
 * When transactions are allocated or merged, always reserve enough
   blocks to flush the transaction (err on the safe side).
 * In gfs2_log_flush, return any allocated blocks that haven't been used.
 * Maintain a pool of spare blocks big enough to do one log flush, as
   before.
 * In gfs2_log_flush, when we have no active transaction, allocate a
   suitable number of blocks.  For that, use the spare pool when
   called from logd, and leave the pool alone otherwise.  This means
   that when the log is almost full, logd will still be able to do one
   more log flush, which will result in more log space becoming
   available.

This will make the log space allocator code easier to work with in
the future.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>

fe3e3976

gfs2: Minor calc_reserved cleanup · 71b219f4
Andreas Gruenbacher authored Jan 29, 2021
```
No functional change.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
```
71b219f4

17 Feb, 2021 9 commits

gfs2: Use resource group glock sharing · 4fc7ec31

Bob Peterson authored Apr 24, 2018

This patch takes advantage of the new glock holder sharing feature for
resource groups.  We have already introduced local resource group
locking in a previous patch, so competing accesses of local processes
are already under control.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>

4fc7ec31

gfs2: Allow node-wide exclusive glock sharing · 06e908cd

Bob Peterson authored Apr 18, 2018

Introduce a new LM_FLAG_NODE_SCOPE glock holder flag: when taking a
glock in LM_ST_EXCLUSIVE (EX) mode and with the LM_FLAG_NODE_SCOPE flag
set, the exclusive lock is shared among all local processes who are
holding the glock in EX mode and have the LM_FLAG_NODE_SCOPE flag set.
From the point of view of other nodes, the lock is still held
exclusively.

A future patch will start using this flag to improve performance with
rgrp sharing.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>

06e908cd

gfs2: Add local resource group locking · 9e514605

Andreas Gruenbacher authored Feb 08, 2021

Prepare for treating resource group glocks as exclusive among nodes but
shared among all tasks running on a node: introduce another layer of
node-specific locking that the local tasks can use to coordinate their
accesses.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>

9e514605

gfs2: Add per-reservation reserved block accounting · 725d0e9d

Andreas Gruenbacher authored Oct 02, 2018

Add a rs_reserved field to struct gfs2_blkreserv to keep track of the number of
blocks reserved by this particular reservation, and a rd_reserved field to
struct gfs2_rgrpd to keep track of the total number of reserved blocks in the
resource group.  Those blocks are exclusively reserved, as opposed to the
rs_requested / rd_requested blocks which are tracked in the reservation tree
(rd_rstree) and which can be stolen if necessary.

When making a reservation with gfs2_inplace_reserve, rs_reserved is set to
somewhere between ap->min_target and ap->target depending on the number of free
blocks in the resource group.  When allocating blocks with gfs2_alloc_blocks,
rs_reserved is decremented accordingly.  Eventually, any reserved but not
consumed blocks are returned to the resource group by gfs2_inplace_release.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>

725d0e9d

gfs2: Rename rs_{free -> requested} and rd_{reserved -> requested} · 07974d2a

Andreas Gruenbacher authored Oct 22, 2020

We keep track of what we've so far been referring to as reservations in
rd_rstree: the nodes in that tree indicate where in a resource group we'd
like to allocate the next couple of blocks for a particular inode. Local
processes take those as hints, but they may still "steal" blocks from those
extents, so when actually allocating a block, we must double check in the
bitmap whether that block is actually still free. Likewise, other cluster
nodes may "steal" such blocks as well.

One of the following patches introduces resource group glock sharing, i.e.,
sharing of an exclusively locked resource group glock among local processes to
speed up allocations. To make that work, we'll need to keep track of how many
blocks we've actually reserved for each inode, so we end up with two different
kinds of reservations.

Distinguish these two kinds by referring to blocks which are reserved but may
still be "stolen" as "requested". This rename also makes it more obvious that
rs_requested and rd_requested are strongly related.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>

07974d2a

gfs2: Check for active reservation in gfs2_release · 0ec9b9ea

Andreas Gruenbacher authored Oct 21, 2020

In gfs2_release, check if the inode has an active reservation to avoid
unnecessary lock taking.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>

0ec9b9ea

gfs2: Don't search for unreserved space twice · b2598965

Andreas Gruenbacher authored Oct 12, 2020

If gfs2_inplace_reserve has chosen a resource group but it couldn't make a
reservation there, there are too many other reservations in that resource
group.  In that case, don't even try to respect existing reservations in
gfs2_alloc_blocks.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>

b2598965

gfs2: Only pass reservation down to gfs2_rbm_find · 3d39fcd1

Andreas Gruenbacher authored Oct 09, 2020

Only pass the current reservation down to gfs2_rbm_find rather than the entire
inode; we don't need any of the other information.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>

3d39fcd1

gfs2: Also reflect single-block allocations in rgd->rd_extfail_pt · f38e998f

Andreas Gruenbacher authored Oct 05, 2020

Pass a non-NULL minext to gfs2_rbm_find even for single-block allocations. In
gfs2_rbm_find, also set rgd->rd_extfail_pt when a single-block allocation
fails in a resource group: there is no reason for treating that case
differently. In gfs2_reservation_check_and_update, only check how many free
blocks we have if more than one block is requested; we already know there's at
least one free block.

In addition, when allocating N blocks fails in gfs2_rbm_find, we need to set
rd_extfail_pt to N - 1 rather than N: rd_extfail_pt defines the biggest
allocation that might still succeed.

Finally, reset rd_extfail_pt when updating the resource group statistics in
update_rgrp_lvb, as we already do in gfs2_rgrp_bh_get.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>

f38e998f

10 Feb, 2021 1 commit

gfs2: Recursive gfs2_quota_hold in gfs2_iomap_end · 7009fa9c

Andreas Gruenbacher authored Feb 09, 2021

When starting an iomap write, gfs2_quota_lock_check -> gfs2_quota_lock
-> gfs2_quota_hold is called from gfs2_iomap_begin.  At the end of the
write, before unlocking the quotas, punch_hole -> gfs2_quota_hold can be
called again in gfs2_iomap_end, which is incorrect and leads to a failed
assertion.  Instead, move the call to gfs2_quota_unlock before the call
to punch_hole to fix that.

Fixes: 64bc06bb ("gfs2: iomap buffered write support")
Cc: stable@vger.kernel.org # v4.19+
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>

7009fa9c

08 Feb, 2021 2 commits

gfs2: Add trusted xattr support · 866eef48

Andreas Gruenbacher authored Feb 05, 2021

Add support for an additional filesystem version (sb_fs_format = 1802).
When a filesystem with the new version is mounted, the filesystem
supports "trusted.*" xattrs.

In addition, version 1802 filesystems implement a form of forward
compatibility for xattrs: when xattrs with an unknown prefix (ea_type)
are found on a version 1802 filesystem, those attributes are not shown
by listxattr, and they are not accessible by getxattr, setxattr, or
removexattr.

This mechanism might turn out to be what we need in the future, but if
not, we can always bump the filesystem version and break compatibility
instead.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Andrew Price <anprice@redhat.com>

866eef48

gfs2: Enable rgrplvb for sb_fs_format 1802 · 47b7ec1d

Andrew Price authored Feb 05, 2021

Turn on rgrplvb by default for sb_fs_format > 1801.

Mount options still have to override this so a new args field to
differentiate between 'off' and 'not specified' is added, and the new
default is applied only when it's not specified.
Signed-off-by: Andrew Price <anprice@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>

47b7ec1d

05 Feb, 2021 2 commits

gfs2: Don't skip dlm unlock if glock has an lvb · 78178ca8

Bob Peterson authored Feb 05, 2021

Patch fb6791d1 was designed to allow gfs2 to unmount quicker by
skipping the step where it tells dlm to unlock glocks in EX with lvbs.
This was done because when gfs2 unmounts a file system, it destroys the
dlm lockspace shortly after it destroys the glocks so it doesn't need to
unlock them all: the unlock is implied when the lockspace is destroyed
by dlm.

However, that patch introduced a use-after-free in dlm: as part of its
normal dlm_recoverd process, it can call ls_recovery to recover dead
locks. In so doing, it can call recover_rsbs which calls recover_lvb for
any mastered rsbs. Func recover_lvb runs through the list of lkbs queued
to the given rsb (if the glock is cached but unlocked, it will still be
queued to the lkb, but in NL--Unlocked--mode) and if it has an lvb,
copies it to the rsb, thus trying to preserve the lkb. However, when
gfs2 skips the dlm unlock step, it frees the glock and its lvb, which
means dlm's function recover_lvb references the now freed lvb pointer,
copying the freed lvb memory to the rsb.

This patch changes the check in gdlm_put_lock so that it calls
dlm_unlock for all glocks that contain an lvb pointer.

Fixes: fb6791d1 ("GFS2: skip dlm_unlock calls in unmount")
Cc: stable@vger.kernel.org # v3.8+
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>

78178ca8

gfs2: Lock imbalance on error path in gfs2_recover_one · 834ec3e1

Andreas Gruenbacher authored Feb 05, 2021

In gfs2_recover_one, fix a sd_log_flush_lock imbalance when a recovery
pass fails.

Fixes: c9ebc4b7 ("gfs2: allow journal replay to hold sd_log_flush_lock")
Cc: stable@vger.kernel.org # v5.7+
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>

834ec3e1

03 Feb, 2021 9 commits

gfs2: Move function gfs2_ail_empty_tr · 76fce654

Andreas Gruenbacher authored Dec 19, 2020

Move this function further up in log.c so that we can use it in the next patch.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>

76fce654

gfs2: Get rid of current_tail() · 5cb738b5

Andreas Gruenbacher authored Dec 19, 2020

Keep the current value of the updated log tail in the super block as
sb_log_flush_tail instead of computing it on the fly. This avoids
unnecessary sd_ail_lock taking and cleans up the code.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>

5cb738b5

gfs2: Use a tighter bound in gfs2_trans_begin · 297de318

Andreas Gruenbacher authored Dec 06, 2020

Use a tighter bound for the number of blocks required by transactions in
gfs2_trans_begin: in the worst case, we'll have mixed data and metadata,
so we'll need a log desciptor for each type.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>

297de318

gfs2: Clean up gfs2_log_reserve · 5ae8fff8

Andreas Gruenbacher authored Dec 13, 2020

Wake up log waiters in gfs2_log_release when log space has actually become
available. This is a much better place for the wakeup than gfs2_logd.

Check if enough log space is immeditely available before anything else. If
there isn't, use io_wait_event to wait instead of open-coding it.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>

5ae8fff8

gfs2: Don't wait for journal flush in clean_journal · 4a3d049d

Andreas Gruenbacher authored Dec 11, 2020

Commit 588bff95 added gfs2_write_log_header() and started using it in
clean_journal(), with an additional call to log_flush_wait() at the end of
gfs2_write_log_header() which is unnecessary for clean_journal(). Move
that call out of gfs2_write_log_header() to restore the previous behavior.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>

4a3d049d

gfs2: Move lock flush locking to gfs2_trans_{begin,end} · c1eba1b0

Andreas Gruenbacher authored Dec 12, 2020

Move the read locking of sd_log_flush_lock from gfs2_log_reserve to
gfs2_trans_begin, and its unlocking from gfs2_log_release to
gfs2_trans_end.  Use gfs2_log_release in two places in which it was open
coded before.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>

c1eba1b0

gfs2: Get rid of sd_reserving_log · f3708fb5

Andreas Gruenbacher authored Dec 13, 2020

This counter and the associated wait queue are only used so that
gfs2_make_fs_ro can efficiently wait for all pending log space
allocations to fail after setting the filesystem to read-only. This
comes at the cost of waking up that wait queue very frequently.

Instead, when gfs2_log_reserve fails because the filesystem has become
read-only, Wake up sd_log_waitq. In gfs2_make_fs_ro, set the file
system read-only and then wait until all the log space has been
released. Give up and report the problem after a while. With that,
sd_reserving_log and sd_reserving_log_wait can be removed.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>

f3708fb5

gfs2: Clean up on-stack transactions · c968f578

Andreas Gruenbacher authored Jan 29, 2021

Replace the TR_ALLOCED flag by its inverse, TR_ONSTACK: that way, the flag only
needs to be set in the exceptional case of on-stack transactions.  Split off
__gfs2_trans_begin from gfs2_trans_begin and use it to replace the open-coded
version in gfs2_ail_empty_gl.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>

c968f578

gfs2: Use sb_start_intwrite in gfs2_ail_empty_gl · 15e20a30

Andreas Gruenbacher authored Feb 03, 2021

Commit 2e60d768 ("GFS2: update freeze code to use freeze/thaw_super
on all nodes") optimized away the sb_start_intwrite ... sb_end_intwrite
protection for the on-stack transactions in gfs2_ail_empty_gl with no
explanation. I can't think of a valid reason for doing that, so revert
that change. This simplifies the next commit.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>

15e20a30

25 Jan, 2021 4 commits

gfs2: keep bios separate for each journal · 82218943

Bob Peterson authored Jan 21, 2021

The recovery func can recover multiple journals, but they were all using
the same bio. This resulted in use-after-free related to sdp->sd_log_bio.
This patch moves the variable to the journal descriptor, jd, so that
every recovery can operate on its own bio. And hopefully we never run out.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>

82218943

gfs2: fix glock confusion in function signal_our_withdraw · f5f02fde

Bob Peterson authored Jan 18, 2021

If go_free is defined, function signal_our_withdraw is supposed to
synchronize on the GLF_FREEING flag of the inode glock, but it
accidentally does that on the live glock. Fix that and disambiguate
the glock variables.

Fixes: 601ef0d5 ("gfs2: Force withdraw to replay journals and wait for it to finish")
Cc: stable@vger.kernel.org # v5.7+
Signed-off-by: Bob Peterson <rpeterso@redhat.com>

f5f02fde

Revert "GFS2: Re-add a call to log_flush_wait when flushing the journal" · 4a011849

Bob Peterson authored Jan 20, 2021

This reverts commit 428fd95d.
Patch 428fd95d85b2 added a call to log_flush_wait to function
gfs2_log_flush. Then gfs2_log_flush calls log_write_header which submits
a write request with the REQ_PREFLUSH flag which also forces it to wait.
This patch removes the unnecessary call to log_flush_wait.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>

4a011849

gfs2: Fix invalid block size message · bff2e532

Andrew Price authored Jan 12, 2021

Signed-off-by: Andrew Price <anprice@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>

bff2e532

22 Jan, 2021 1 commit

gfs2: amend SLAB_RECLAIM_ACCOUNT on gfs2 related slab cache · 00e8e9bc

Zhaoyang Huang authored Jan 05, 2021

As gfs2_quotad_cachep and gfs2_glock_cachep have registered
shrinkers, amending SLAB_RECLAIM_ACCOUNT when creating them,
which improves slab accounting.
Signed-off-by: Zhaoyang Huang <zhaoyang.huang@unisoc.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>

00e8e9bc

19 Jan, 2021 7 commits

gfs2: Clean up ail2_empty · 6e80674a

Andreas Gruenbacher authored Dec 11, 2020

Clean up the logic in ail2_empty (no functional change).
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>

6e80674a

gfs2: Rename gfs2_{write => flush}_revokes · e7501bf8

Andreas Gruenbacher authored Dec 19, 2020

Function gfs2_write_revokes doesn't actually write any revokes; instead, it
adds revokes to the system transaction during a flush.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>

e7501bf8

gfs2: Minor debugging improvement · 625a8edd

Andreas Gruenbacher authored Dec 06, 2020

Split the assert in gfs2_trans_end into two parts.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>

625a8edd

gfs2: Some documentation updates · 6188e877

Andreas Gruenbacher authored Dec 06, 2020

The calc_reserved description claims that buf_limit is 502 (on 4k
filesystems), but it is actually 503.  Fix / clarify the entire
description.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>

6188e877

gfs2: Minor gfs2_write_revokes cleanups · 5a4e9c60

Andreas Gruenbacher authored Dec 06, 2020

Clean up the computations in gfs2_write_revokes (no change in functionality).
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>

5a4e9c60

gfs2: Simplify the buf_limit and databuf_limit definitions · 458094c2

Andreas Gruenbacher authored Dec 05, 2020

The BUF_OFFSET and DATABUF_OFFSET definitions are only used in buf_limit
and databuf_limit, respectively, and the rounding done in those
definitions is immediately wiped out by dividing by the element size.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>

458094c2

gfs2: Un-obfuscate function jdesc_find_i · 736b2f77

Andreas Gruenbacher authored Dec 07, 2020

Clean up this function to show that it is trivial.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>

736b2f77

18 Jan, 2021 1 commit

gfs2: Set GBF_FULL flags when reading resource group · 560b8eba

Andreas Gruenbacher authored Oct 06, 2020

When reading a resource group from disk or when receiving the resource group
statistics from a Lock Value Block (LVB), set/clear the GBF_FULL flags of all
bitmaps in that resource group according to whether or not the resource group
is full.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>

560b8eba