Commits · 0eb6bbe4d9cf02f639d661edf7c02defc3453a69 · nexedi / linux

02 Apr, 2018 40 commits

ceph: fix root quota realm check · 0eb6bbe4

Yan, Zheng authored Jan 12, 2018

Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

0eb6bbe4

ceph: don't check quota for snap inode · 25963669

Yan, Zheng authored Jan 12, 2018

snap inode's i_snap_realm is not pointing to ceph_snap_realm.
Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

25963669

ceph: quota: update MDS when max_bytes is approaching · 1ab302a0

Luis Henriques authored Jan 05, 2018

When we're reaching the ceph.quota.max_bytes limit, i.e., when writing
more than 1/16th of the space left in a quota realm, update the MDS with
the new file size.

This mirrors the fuse-client approach with commit 122c50315ed1 ("client:
Inform mds file size when approaching quota limit"), in the ceph git tree.
Signed-off-by: Luis Henriques <lhenriques@suse.com>
Reviewed-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

1ab302a0
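
The "1/16th of the space left" rule described above can be sketched in plain C. This is a minimal userspace illustration, not the kernel code: the function name and parameters are hypothetical, and a `max_bytes` of 0 is assumed to mean that no quota is set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: returns true when a pending write of `delta` bytes
 * is large enough, relative to the space left in the quota realm, that the
 * client should report the new file size to the MDS.
 */
static bool quota_max_bytes_approaching(uint64_t max_bytes,
                                        uint64_t used_bytes,
                                        uint64_t delta)
{
    uint64_t left;

    if (max_bytes == 0)            /* assumption: 0 means "no quota" */
        return false;
    if (used_bytes >= max_bytes)   /* already at or over the limit */
        return true;

    left = max_bytes - used_bytes;
    assert(left > 0);              /* guaranteed by the check above */

    /* Writing more than 1/16th of the remaining space triggers an update. */
    return delta > (left >> 4);
}
```

With 1600 bytes left in the realm, a write of 101 bytes would trigger an update under this sketch, while a write of 100 bytes would not.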

ceph: quota: support for ceph.quota.max_bytes · 2b83845f

Luis Henriques authored Jan 05, 2018

Signed-off-by: Luis Henriques <lhenriques@suse.com>
Reviewed-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

2b83845f

ceph: quota: don't allow cross-quota renames · cafe21a4

Luis Henriques authored Jan 05, 2018

This patch changes ceph_rename so that -EXDEV is returned if an attempt is
made to mv a file between two different dir trees that have different
quotas set up.
Signed-off-by: Luis Henriques <lhenriques@suse.com>
Reviewed-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

cafe21a4
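
As a rough illustration of the rename rule above, the following sketch treats the nearest ancestor that carries a quota as the "quota realm" of a directory and refuses a rename when source and destination resolve to different realms. The `dir_node` type and both function names are hypothetical and only model the idea in userspace.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical directory node for illustration only. */
struct dir_node {
    const struct dir_node *parent;  /* NULL at the filesystem root */
    unsigned long long max_bytes;   /* 0 = no byte quota set here */
    unsigned long long max_files;   /* 0 = no file quota set here */
};

/* Walk up to the nearest directory that has any quota set. */
static const struct dir_node *quota_realm(const struct dir_node *dir)
{
    assert(dir != NULL);
    while (dir && dir->max_bytes == 0 && dir->max_files == 0)
        dir = dir->parent;
    return dir;  /* NULL when no ancestor carries a quota */
}

/* Return 0 if the rename stays within one quota realm, -EXDEV otherwise. */
static int check_cross_quota_rename(const struct dir_node *old_dir,
                                    const struct dir_node *new_dir)
{
    return quota_realm(old_dir) == quota_realm(new_dir) ? 0 : -EXDEV;
}
```

Returning -EXDEV lets tools such as mv fall back to copy-and-delete, which is then subject to the destination's quota checks.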

ceph: quota: support for ceph.quota.max_files · b7a29217

Luis Henriques authored Jan 05, 2018

This patch adds support for the max_files quota.  It hooks into all the
ceph functions that add new filesystem objects that need to be checked
against the quota limits.  When these limits are hit, -EDQUOT is returned.

Note that we're not checking quotas on ceph_link().  ceph_link doesn't
really create a new inode,  and since the MDS doesn't update the directory
statistics when a new (hard) link is created (only with symlinks), they
are not accounted as a new file.
Signed-off-by: Luis Henriques <lhenriques@suse.com>
Reviewed-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

b7a29217
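
The max_files check described above can be modelled as a walk from the target directory up to the root, failing with -EDQUOT as soon as any ancestor has reached its limit. This is a toy sketch with hypothetical types: here the counters are updated locally, whereas in CephFS the recursive statistics are maintained by the MDS.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical directory with a recursive file counter. */
struct quota_dir {
    struct quota_dir *parent;      /* NULL at the root */
    unsigned long long max_files;  /* 0 = unlimited */
    unsigned long long nr_files;   /* objects beneath this directory */
};

/* Check every ancestor's limit, then account for one new object. */
static int quota_charge_new_file(struct quota_dir *dir)
{
    struct quota_dir *d;

    assert(dir != NULL);

    for (d = dir; d; d = d->parent)
        if (d->max_files && d->nr_files >= d->max_files)
            return -EDQUOT;

    /* Toy accounting only; a real client would not do this itself. */
    for (d = dir; d; d = d->parent)
        d->nr_files++;
    return 0;
}
```

Checking all ancestors before touching any counter keeps the counters unchanged when the operation is rejected.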

ceph: quota: add initial infrastructure to support cephfs quotas · fb18a575

Luis Henriques authored Jan 05, 2018

This patch adds the infrastructure required to support cephfs quotas as it
is currently implemented in the ceph fuse client.  Cephfs quotas can be
set on any directory, and can restrict the number of bytes or the number
of files stored beneath that point in the directory hierarchy.

Quotas are set using the extended attributes 'ceph.quota.max_files' and
'ceph.quota.max_bytes', and can be removed by setting these attributes to
'0'.

Link: http://tracker.ceph.com/issues/22372
Signed-off-by: Luis Henriques <lhenriques@suse.com>
Reviewed-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

fb18a575
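
For illustration, a quota could be set from a C program with the Linux setxattr(2) call, using the attribute names given above. This is a sketch under assumptions: the helper name is hypothetical, the limit is assumed to be passed as a decimal string, and any path used with it must be a directory on a mounted CephFS.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/xattr.h>

/*
 * Hypothetical helper: set `attr` (e.g. "ceph.quota.max_bytes") on `dir`
 * to `limit`. Per the commit message, a limit of 0 removes the quota.
 * Returns 0 on success, -1 on failure with errno set by setxattr().
 */
static int set_ceph_quota(const char *dir, const char *attr,
                          unsigned long long limit)
{
    char value[32];
    int len;

    assert(dir != NULL && attr != NULL);

    /* Assumption: the value is written as a decimal string. */
    len = snprintf(value, sizeof(value), "%llu", limit);
    return setxattr(dir, attr, value, (size_t)len, 0);
}
```

A call such as `set_ceph_quota("/mnt/cephfs/projects", "ceph.quota.max_files", 10000)` would then limit that (hypothetical) directory tree to 10000 files.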

rbd: remove VLA usage · 08a79102

Kyle Spiers authored Mar 17, 2018

As part of the effort to remove VLAs from the kernel[1], this moves
the literal values into the stack array calculation instead of using a
variable for the sizing. The resulting size can be found from
sizeof(buf).

[1] https://lkml.org/lkml/2018/3/7/621
Signed-off-by: Kyle Spiers <kyle@spiers.me>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

08a79102
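
The VLA removal pattern described above might look like the following sketch. The constant name and the individual sizes are hypothetical stand-ins; the point is only that the array bound becomes a constant expression and later code uses `sizeof(buf)` instead of a separate size variable.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define ENCODING_START_BLK_LEN 6  /* hypothetical header length */

/*
 * Before (VLA, size only known at run time as far as the compiler cares):
 *
 *     int buf_size = 4 + 8 + 8 + ENCODING_START_BLK_LEN;
 *     char buf[buf_size];
 */
static size_t encode_demo(void)
{
    /* After: the bound is a constant expression, so this is a plain array. */
    char buf[4 + 8 + 8 + ENCODING_START_BLK_LEN];

    /* The stack footprint can now be checked at compile time. */
    static_assert(sizeof(buf) <= 64, "stack buffer stays small");

    memset(buf, 0, sizeof(buf));  /* size comes from sizeof(buf) */
    return sizeof(buf);
}
```

Because the size is fixed at compile time, the compiler can also warn about oversized stack frames, which is not possible with a true VLA.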

rbd: fix spelling mistake: "reregisteration" -> "reregistration" · f6870cc9

Colin Ian King authored Mar 19, 2018

Trivial fix to spelling mistake in rbd_warn message text.
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

f6870cc9

ceph: rename function drop_leases() to a more descriptive name · 7aac453a

Yan, Zheng authored Mar 13, 2018

Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

7aac453a

ceph: fix invalid point dereference for error case in mdsc destroy · 50c55aec

Chengguang Xu authored Mar 14, 2018

1. Set fsc->mdsc only after all necessary memory has been successfully
allocated in mdsc init.
2. If fsc->mdsc is NULL, skip the destroy operation in mdsc destroy.
Signed-off-by: Chengguang Xu <cgxu519@gmx.com>
Reviewed-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

50c55aec
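
The two points above describe a common init/destroy pattern, sketched here in userspace C. The struct layouts, the `fail_second_alloc` test hook, and the use of calloc/free are all hypothetical; only the ordering (publish the pointer last, tolerate NULL in destroy) reflects the commit message.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical, heavily simplified stand-ins for the kernel structures. */
struct mds_client { int *sessions; int *caps; };
struct fs_client  { struct mds_client *mdsc; };

static int mdsc_init(struct fs_client *fsc, int fail_second_alloc)
{
    struct mds_client *mdsc;

    assert(fsc != NULL);

    mdsc = calloc(1, sizeof(*mdsc));
    if (!mdsc)
        return -ENOMEM;

    mdsc->sessions = calloc(16, sizeof(int));
    /* fail_second_alloc simulates an allocation failure for the demo. */
    mdsc->caps = fail_second_alloc ? NULL : calloc(16, sizeof(int));
    if (!mdsc->sessions || !mdsc->caps) {
        free(mdsc->sessions);
        free(mdsc->caps);
        free(mdsc);
        return -ENOMEM;          /* fsc->mdsc was never set */
    }

    fsc->mdsc = mdsc;            /* publish only after full success */
    return 0;
}

static void mdsc_destroy(struct fs_client *fsc)
{
    struct mds_client *mdsc = fsc->mdsc;

    if (!mdsc)                   /* init failed or never ran: nothing to do */
        return;

    free(mdsc->sessions);
    free(mdsc->caps);
    free(mdsc);
    fsc->mdsc = NULL;
}
```

With this ordering, an error path that calls the destroy function after a failed init never dereferences a half-initialized structure.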

ceph: return proper bool type to caller instead of pointer · 98cfda81

Chengguang Xu authored Mar 13, 2018

Change to return true/false only for bool type return code.
Signed-off-by: Chengguang Xu <cgxu519@gmx.com>
Reviewed-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

98cfda81

ceph: optimize memory usage · bb48bd4d

Chengguang Xu authored Mar 13, 2018

In the current code, regular files and directories use the same struct
ceph_file_info to store fs-specific data, so the struct has to
include some fields that are only used for directories
(e.g., readdir-related info). With plenty of regular files,
this wastes memory.

This patch introduces a dedicated ceph_dir_file_info cache for
readdir-related things, so that regular files no longer include those
unused fields.
Signed-off-by: Chengguang Xu <cgxu519@gmx.com>
Reviewed-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

bb48bd4d
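
One way to picture the split is a small common struct embedded as the first member of a larger directory-only struct. The field names below are hypothetical placeholders, not the actual kernel layout; the sketch only shows why regular files end up with a smaller per-open allocation.

```c
#include <assert.h>
#include <stddef.h>

/* Fields every open file needs (hypothetical selection). */
struct file_info {
    short fmode;
    int   flags;
};

/* Directory opens additionally carry readdir state (hypothetical fields). */
struct dir_file_info {
    struct file_info file_info;   /* common part, kept first */
    long long        next_offset;
    char            *last_name;
    long             readdir_cache_idx;
};

/* The common part sits at offset 0, so conversion is a simple cast. */
static_assert(offsetof(struct dir_file_info, file_info) == 0,
              "common part must be the first member");

static struct dir_file_info *to_dir_info(struct file_info *fi)
{
    return (struct dir_file_info *)((char *)fi -
            offsetof(struct dir_file_info, file_info));
}
```

Code that only needs the common fields can keep working with `struct file_info *`, while readdir paths convert to the larger struct.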

ceph: optimize mds session register · 47474d0b

Chengguang Xu authored Mar 13, 2018

Do the memory allocation first, to avoid unnecessary
initialization of a newly allocated session in the error case.
Signed-off-by: Chengguang Xu <cgxu519@gmx.com>
Reviewed-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

47474d0b

libceph, ceph: add __init attribution to init funcitons · 57a35dfb

Chengguang Xu authored Mar 10, 2018

Add the __init attribute to functions that are called only once
during initialization/registration, and delete unnecessary
symbol exports.
Signed-off-by: Chengguang Xu <cgxu519@gmx.com>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

57a35dfb

ceph: filter out used flags when printing unused open flags · 51b10f3f

Chengguang Xu authored Mar 09, 2018

Filter out used access mode flags when printing unused open flags.
Signed-off-by: Chengguang Xu <cgxu519@gmx.com>
Reviewed-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

51b10f3f

ceph: don't wait on writeback when there is no more dirty pages · 1582af2e

Yan, Zheng authored Mar 06, 2018

In sync mode, writepages() needs to write all dirty pages. But
it can only write dirty pages associated with the oldest snapc.
To write dirty pages associated with next snapc, it needs to wait
until current writes complete.

If there are no more dirty pages, writepages() should not wait on
writeback. Otherwise, dirty page writeback becomes very slow.
Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

1582af2e

ceph: invalidate pages that beyond EOF in ceph_writepages_start() · af9cc401

Yan, Zheng authored Mar 04, 2018

Dirty pages can be associated with different capsnaps, and different
capsnaps may have different EOF values. So invalidating dirty pages
according to the largest EOF value is wrong: dirty pages beyond EOF, but
associated with other capsnaps, do not get invalidated.
Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

af9cc401

ceph: mark the cap cache as unreclaimable · bc4b5ad3

Chengguang Xu authored Feb 27, 2018

Releasing caps is affected by many factors (e.g.,
avail_count/reserve_count/min_count), and min_count can be set to a high
value via a client mount option. Hence it is better to mark the cap cache
as unreclaimable, to avoid non-trivial discrepancies between the memory
shown as reclaimable and what is actually reclaimed.
Signed-off-by: Chengguang Xu <cgxu519@icloud.com>
Reviewed-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

bc4b5ad3

ceph: change variable name to follow common rule · 73737682

Chengguang Xu authored Feb 28, 2018

Variable name ci is mostly used for ceph_inode_info.
Variable name fi is mostly used for ceph_file_info.
Variable name cf is mostly used for ceph_cap_flush.

Change variable names to follow the above common rules
to avoid confusion.
Signed-off-by: Chengguang Xu <cgxu519@icloud.com>
Reviewed-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

73737682

ceph: optimizing cap reservation · 79cd674a

Chengguang Xu authored Feb 24, 2018

When caps_avail_count is low, most newly
trimmed caps will probably go into ->caps_list and
caps_avail_count will increase. Hence, after trimming,
caps_avail_count should be rechecked to effectively reuse
the newly trimmed caps. Also, when releasing unnecessary
caps, follow the same rule as ceph_put_cap.
Signed-off-by: Chengguang Xu <cgxu519@icloud.com>
Reviewed-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

79cd674a

ceph: release unreserved caps if having enough available caps · b517c1d8

Chengguang Xu authored Feb 25, 2018

When unreserving caps, check whether there are too many available caps
in the ->caps_list; if so, release the unreserved caps.
Signed-off-by: Chengguang Xu <cgxu519@icloud.com>
Reviewed-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

b517c1d8

ceph: optimizing cap allocation · e327ce06

Chengguang Xu authored Feb 24, 2018

When caps_min_count is set high or there are many
unreserved caps, unused caps may stay in the ->caps_list
indefinitely, even when a new cap cannot be obtained from
kmem_cache_alloc, because caps_avail_count has no maximum
limit. Hence reuse caps in ->caps_list if available; this may
be better than setting a maximum for caps_avail_count and
releasing unused caps when the limit is reached.
Signed-off-by: Chengguang Xu <cgxu519@icloud.com>
Reviewed-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

e327ce06

ceph: adding protection for showing cap reservation info · b884014a

Chengguang Xu authored Feb 23, 2018

Add spinlock protection while reading the cap reservation
related fields, so that the numbers are consistent with the
following BUG_ON condition in the code.

BUG_ON(mdsc->caps_total_count != mdsc->caps_use_count +
				 mdsc->caps_reserve_count +
				 mdsc->caps_avail_count);
Signed-off-by: Chengguang Xu <cgxu519@icloud.com>
Reviewed-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

b884014a
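
The idea of reading all counters under one lock, so that the displayed numbers satisfy the invariant quoted above, can be sketched in userspace with a pthread mutex standing in for the kernel spinlock. The struct and function names are hypothetical.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical counters protected by a single lock. */
struct cap_counters {
    pthread_mutex_t lock;
    int total, use, reserve, avail;
};

struct cap_snapshot { int total, use, reserve, avail; };

/* Writer: move n caps from "available" to "reserved" atomically. */
static void caps_reserve(struct cap_counters *c, int n)
{
    pthread_mutex_lock(&c->lock);
    c->avail   -= n;
    c->reserve += n;
    pthread_mutex_unlock(&c->lock);
}

/* Reader: copy all fields under the lock so they belong together. */
static void caps_get_snapshot(struct cap_counters *c, struct cap_snapshot *out)
{
    pthread_mutex_lock(&c->lock);
    out->total   = c->total;
    out->use     = c->use;
    out->reserve = c->reserve;
    out->avail   = c->avail;
    pthread_mutex_unlock(&c->lock);

    /* Same relation as the BUG_ON condition in the commit message. */
    assert(out->total == out->use + out->reserve + out->avail);
}
```

Without the lock in the reader, a concurrent `caps_reserve()` could be observed halfway, and the printed numbers would not add up.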

libceph: adding missing message types to ceph_msg_type_name() · f2f87877

Chengguang Xu authored Feb 22, 2018

Some message types are missing in ceph_msg_type_name(),
so add them to make the output easier to understand.
Signed-off-by: Chengguang Xu <cgxu519@icloud.com>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

f2f87877

rbd: get the latest osdmap when using an existing client · dd435855

Ilya Dryomov authored Feb 22, 2018

Currently we request the latest osdmap only if ceph_pg_poolid_by_name()
fails with -ENOENT.  This is effective with newly created pools, but we
also want to avoid attempting to map from pools that were recently
deleted and report "pool does not exist" instead.  (Such an attempt
eventually fails in the OSD client after map check code kicks in, but
the error message is confusing.)

Request the latest osdmap unconditionally after bumping a ref on an
existing client in rbd_client_find().
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

dd435855

rbd: move rbd_get_client() below rbd_put_client() · 5feb0d8d

Ilya Dryomov authored Feb 22, 2018

... to avoid a forward declaration in the next commit.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

5feb0d8d

rbd: remove redundant declaration of rbd_spec_put() · 0a4a1e68

Ilya Dryomov authored Feb 12, 2018

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

0a4a1e68

ceph: use seq_show_option for string type options · 4d8969af

Chengguang Xu authored Feb 15, 2018

Using seq_show_option to replace seq_printf for string type options.
Signed-off-by: Chengguang Xu <cgxu519@icloud.com>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

4d8969af

libceph: fix misjudgement of maximum monitor number · 7377324e

Chengguang Xu authored Feb 11, 2018

num_mon should allow up to CEPH_MAX_MON in ceph_monmap_decode().
Signed-off-by: Chengguang Xu <cgxu519@icloud.com>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

7377324e
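
The kind of boundary check this commit refers to can be illustrated as follows. The limit value and function name are hypothetical, and the stricter comparison mentioned in the comment is an assumption about what the check looked like before the fix.

```c
#include <assert.h>
#include <errno.h>

#define MAX_MON 8  /* hypothetical limit for illustration */

static_assert(MAX_MON > 0, "limit must be positive");

/* Accept monitor counts up to and including MAX_MON. */
static int check_num_mon(unsigned int num_mon)
{
    /*
     * Assumption: a ">=" comparison here would wrongly reject a monmap
     * with exactly MAX_MON monitors, which is the case being fixed.
     */
    if (num_mon > MAX_MON)
        return -EINVAL;
    return 0;
}
```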

libceph, ceph: change permission for readonly debugfs entries · 11e1478d

Chengguang Xu authored Feb 10, 2018

Remove write permission from debugfs entries that only provide
read-only functionality.
Signed-off-by: Chengguang Xu <cgxu519@icloud.com>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

11e1478d

ceph: keep consistent semantic in fscache related option combination · 7ae7a828

Chengguang Xu authored Feb 07, 2018

When multiple fscache-related options are specified, the result does not
always follow the option order. This fix makes the outcome strictly follow
the order in which the options are given.
Signed-off-by: Chengguang Xu <cgxu519@icloud.com>
Reviewed-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

7ae7a828

ceph: add newline to end of debug message format · 4c069a58

Chengguang Xu authored Jan 30, 2018

Some dout formats do not end with a newline. Fix this in the files
under the fs/ceph and net/ceph directories, and change printk to dout
for printing debug info in super.c.
Signed-off-by: Chengguang Xu <cgxu519@icloud.com>
Reviewed-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

4c069a58

rbd: allow "fancy" striping · b1331852

Ilya Dryomov authored Feb 07, 2018

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Acked-by: Jason Dillaman <dillaman@redhat.com>

b1331852

rbd: introduce OWN_BVECS data type · afb97888

Ilya Dryomov authored Feb 06, 2018

If the layout is "fancy", we need to be able to rearrange the provided
bio_vecs in stripe unit chunks to make it possible for the messenger to
read/write directly from/to the provided data buffer, without employing
a temporary data buffer for assembling the result.

Higher level bio_vec arrays are generally immutable, so this requires
copying into a private array. Only the bio_vecs themselves are shuffled
around, not the actual data. OWN_BVECS doesn't own any pages.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

afb97888

rbd: remove rbd_parent_request_{create,destroy}() · e93aca0a

Ilya Dryomov authored Feb 06, 2018

rbd_parent_request_create() takes a ref on obj_req for child_img_req.
There is no point in doing that because child_img_req is created on
behalf of obj_req -- obj_req is the initiator and can't be completed
before child_img_req.

Open-code the rest of rbd_parent_request_create() and remove it along
with rbd_parent_request_destroy().
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

e93aca0a

rbd: get rid of img_req->{offset,length} · dfd9875f

Ilya Dryomov authored Feb 06, 2018

These are set, but no longer used.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

dfd9875f

rbd: remove rbd_img_request_fill() and helpers · 0420c5dd

Ilya Dryomov authored Feb 06, 2018

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

0420c5dd

rbd: switch to common striping framework · 5a237819

Ilya Dryomov authored Feb 06, 2018

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

5a237819

rbd: create+truncate for whole-object layered discards · 2bb1e56e

Ilya Dryomov authored Feb 06, 2018

A whole-object layered discard is implemented as a truncate rather
than a delete: a dummy object is needed to prevent the CoW machinery
from kicking in.  However, a truncate on a non-existent object is
a no-op.  If the object doesn't exist in HEAD, a discard request is
effectively ignored, which violates our "discard zeroes data" promise
and breaks REQ_OP_WRITE_ZEROES implementation.

A non-exclusive create on an existing object is also a no-op, so the
fix is to do a compound create+truncate instead.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

2bb1e56e
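
The reasoning above can be checked against a toy model in which a truncate on a missing object and a non-exclusive create on an existing object are both no-ops. The `toy_object` type and the operation names are hypothetical and unrelated to the real OSD interface.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical in-memory stand-in for an object in HEAD. */
struct toy_object {
    bool   exists;
    size_t size;
};

/* Truncate: no-op when the object does not exist. */
static void op_truncate(struct toy_object *o, size_t size)
{
    if (!o->exists)
        return;
    o->size = size;
}

/* Non-exclusive create: no-op when the object already exists. */
static void op_create(struct toy_object *o)
{
    if (o->exists)
        return;
    o->exists = true;
    o->size = 0;
}

/* Compound create+truncate: leaves an empty object in both cases. */
static void discard_whole_object(struct toy_object *o)
{
    assert(o != NULL);
    op_create(o);
    op_truncate(o, 0);
}
```

In this model a truncate alone leaves a missing object missing, whereas the compound operation always ends with an existing, zero-length object.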