Commits · 6fd4e634835208ddb331234bfa51d75396a5c42c · Kirill Smelkov / linux

16 Sep, 2019 38 commits

ceph: allow object copies across different filesystems in the same cluster · 6fd4e634

Luis Henriques authored Sep 09, 2019

OSDs are able to perform object copies across different pools.  Thus,
there's no need to prevent copy_file_range from doing remote copies if the
source and destination superblocks are different.  Only return -EXDEV if
they have different fsid (the cluster ID).
Signed-off-by: Luis Henriques <lhenriques@suse.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
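
The check described above can be sketched in userspace C. This is only an illustration: `sketch_fsid`, `sketch_mount` and `sketch_copy_allowed` are hypothetical stand-ins, not the kernel's types or functions. It assumes the cluster ID is a 16-byte value compared bytewise.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/* Hypothetical stand-ins; the kernel's real structures differ. */
struct sketch_fsid {
	unsigned char bytes[16];
};

struct sketch_mount {
	struct sketch_fsid fsid;	/* cluster ID */
	int sb_id;			/* stands in for the superblock identity */
};

/* Returns 0 if a remote object copy may proceed, -EXDEV otherwise. */
static int sketch_copy_allowed(const struct sketch_mount *src,
			       const struct sketch_mount *dst)
{
	assert(src && dst);

	/* Different superblocks are fine; only the cluster ID matters. */
	if (memcmp(&src->fsid, &dst->fsid, sizeof(src->fsid)) != 0)
		return -EXDEV;
	return 0;
}
```

Two mounts with different `sb_id` values but the same `fsid` pass the check; mounts from different clusters get -EXDEV.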

ceph: include ceph_debug.h in cache.c · 48f930ea

Ilya Dryomov authored Sep 05, 2019

Any file that uses dout() should include ceph_debug.h at the top.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

ceph: move static keyword to the front of declarations · 536cc331

Krzysztof Wilczynski authored Aug 31, 2019

Move the static keyword to the front of declarations of
snap_handle_length, handle_length and connected_handle_length,
and resolve the following compiler warnings that can be seen
when building with warnings enabled (W=1):

fs/ceph/export.c:38:2: warning:
  ‘static’ is not at beginning of declaration [-Wold-style-declaration]

fs/ceph/export.c:88:2: warning:
  ‘static’ is not at beginning of declaration [-Wold-style-declaration]

fs/ceph/export.c:90:2: warning:
  ‘static’ is not at beginning of declaration [-Wold-style-declaration]
Signed-off-by: Krzysztof Wilczynski <kw@linux.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
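
For illustration, the declaration order that triggers the warning and the preferred order look roughly like this. The names and the value 4 are made up for the sketch and are not the actual definitions in fs/ceph/export.c.

```c
#include <assert.h>

/*
 * Old style, which gcc reports under -Wold-style-declaration:
 *
 *	const static int sketch_handle_length = 4;
 */

/* Preferred: the storage class specifier comes first. */
static const int sketch_handle_length = 4;

static int sketch_get_handle_length(void)
{
	assert(sketch_handle_length > 0);
	return sketch_handle_length;
}
```

Both orders declare the same object; only the position of `static` changes, so the fix has no effect on behavior.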

rbd: pull rbd_img_request_create() dout out into the callers · 21ed05a8

Ilya Dryomov authored Aug 30, 2019

Make it more informative: log op_type, offset and length for block
layer requests and initiating obj_req for child requests.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

ceph: reconnect connection if session hang in opening state · 71a228bc

Erqi Chen authored Aug 28, 2019

If a client MDS session is evicted while in the
CEPH_MDS_SESSION_OPENING state, the MDS won't send a session message
to the client, and delayed_work skips sessions in that state, so the
session hangs forever.

Allow ceph_con_keepalive to reconnect a session in OPENING to avoid
the hang. Also, ensure that we skip sessions in RESTARTING and
REJECTED states since those states can't be resurrected by issuing
a keepalive.

Link: https://tracker.ceph.com/issues/41551
Signed-off-by: Erqi Chen <chenerqi@gmail.com>
Reviewed-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
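
The state filtering described above might look like the following sketch. The enum and function are hypothetical and cover only the states named in the message plus OPEN; the kernel's session state machine has more states and the real code is structured differently.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical subset of session states, for illustration only. */
enum sketch_session_state {
	SKETCH_SESSION_OPENING,
	SKETCH_SESSION_OPEN,
	SKETCH_SESSION_RESTARTING,
	SKETCH_SESSION_REJECTED,
	SKETCH_SESSION_STATE_COUNT,
};

/* Decide whether the periodic work should issue a keepalive. */
static bool sketch_should_keepalive(enum sketch_session_state state)
{
	assert(state < SKETCH_SESSION_STATE_COUNT);

	switch (state) {
	case SKETCH_SESSION_RESTARTING:
	case SKETCH_SESSION_REJECTED:
		/* A keepalive cannot resurrect these sessions. */
		return false;
	default:
		/* OPENING is no longer skipped, so a hung open can reconnect. */
		return true;
	}
}
```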

libceph: drop unused con parameter of calc_target() · 8edf84ba

Ilya Dryomov authored Aug 21, 2019

This bit was omitted from a5613724 ("libceph: fix PG split vs OSD
(re)connect race") to avoid backport conflicts.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

ceph: use release_pages() directly · 96ac9158

John Hubbard authored Aug 08, 2019

release_pages() has been available to modules since Oct, 2010,
when commit 0be8557b ("fuse: use release_pages()") added
EXPORT_SYMBOL(release_pages). However, this ceph code was still
using a workaround.

Remove the workaround, and call release_pages() directly.
Signed-off-by: John Hubbard <jhubbard@nvidia.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

rbd: fix response length parameter for encoded strings · 5435d206

Dongsheng Yang authored Aug 09, 2019

rbd_dev_image_id() allocates space for length but passes a smaller
value to rbd_obj_method_sync().  rbd_dev_v2_object_prefix() doesn't
allocate space for length.  Fix both to be consistent.
Signed-off-by: Dongsheng Yang <dongsheng.yang@easystack.cn>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
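
The sizing rule can be illustrated with a small sketch, assuming an encoded string is a 4-byte little-endian length followed by the bytes. All names here are hypothetical; this is not the rbd code. The point is that the buffer size and the length handed to the decoder must both include the prefix.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_LEN_PREFIX sizeof(uint32_t)

/* Space needed for an encoded string of at most max_str_len bytes. */
static size_t sketch_encoded_size(size_t max_str_len)
{
	return SKETCH_LEN_PREFIX + max_str_len;
}

/* Returns the string length, or -1 if the response is truncated. */
static long sketch_decode_string(const unsigned char *buf, size_t buf_len,
				 char *out, size_t out_size)
{
	uint32_t len;

	assert(buf && out);
	if (buf_len < SKETCH_LEN_PREFIX)
		return -1;

	/* Little-endian length prefix. */
	len = (uint32_t)buf[0] | (uint32_t)buf[1] << 8 |
	      (uint32_t)buf[2] << 16 | (uint32_t)buf[3] << 24;
	if (len > buf_len - SKETCH_LEN_PREFIX || len >= out_size)
		return -1;

	memcpy(out, buf + SKETCH_LEN_PREFIX, len);
	out[len] = '\0';
	return (long)len;
}
```

Passing only the string length as `buf_len`, without the prefix, makes a full-length reply look truncated, which is the kind of inconsistency the commit message describes.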

ceph: allow arbitrary security.* xattrs · b8fe918b

Jeff Layton authored Aug 06, 2019

Most filesystems don't limit what security.* xattrs can be set or
fetched. I see no reason that we need to limit that on cephfs either.

Drop the special xattr handler for "security." xattrs, and allow the
"other" xattr handler to handle security xattrs as well.

In addition to fixing xfstest generic/093, this allows us to support
per-file capabilities (à la setcap(8)).

Link: https://tracker.ceph.com/issues/41135
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

ceph: only set CEPH_I_SEC_INITED if we got a MAC label · 026105eb

Jeff Layton authored Aug 06, 2019

__ceph_getxattr will set the CEPH_I_SEC_INITED flag whenever it gets
any xattr that starts with "security.". We only want to set that flag
when fetching the MAC label for the currently-active LSM, however.
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

ceph: turn ceph_security_invalidate_secctx into static inline · 668959a5

Jeff Layton authored Aug 06, 2019

No need to do an extra jump here. Also add some comments on the endifs.
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

ceph: add buffered/direct exclusionary locking for reads and writes · 321fe13c

Jeff Layton authored Aug 02, 2019

xfstest generic/451 intermittently fails. The test does O_DIRECT writes
to a file, and then reads back the result using buffered I/O, while
running a separate set of tasks that are also doing buffered reads.

The client will invalidate the cache prior to a direct write, but it's
easy for one of the other readers' replies to race in and reinstantiate
the invalidated range with stale data.

To fix this, we must serialize direct I/O writes and buffered reads.
We could just sprinkle in some shared locks on the i_rwsem for reads,
and increase the exclusive footprint on the write side, but that would
cause O_DIRECT writes to end up serialized vs. other direct requests.

Instead, borrow the scheme used by nfs.ko. Buffered writes take the
i_rwsem exclusively, but buffered reads take a shared lock, allowing
them to run in parallel.

O_DIRECT requests also take a shared lock, but we need for them to not
run in parallel with buffered reads. A flag on the ceph_inode_info is
used to indicate whether it's in direct or buffered I/O mode. When a
conflicting request is submitted, it will block until the inode can be
flipped to the necessary mode.

Link: https://tracker.ceph.com/issues/40985
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

libceph: handle OSD op ceph_pagelist_append() errors · 4766815b

David Disseldorp authored Jul 03, 2019

osd_req_op_cls_init() and osd_req_op_xattr_init() currently propagate
ceph_pagelist_alloc() ENOMEM errors but ignore ceph_pagelist_append()
memory allocation failures. Add these checks and cleanup on error.
Signed-off-by: David Disseldorp <ddiss@suse.de>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
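
The pattern of checking every append and cleaning up on failure can be sketched as follows. `sketch_list` is a hypothetical fixed-capacity buffer standing in for a pagelist, with a full buffer modelling an allocation failure; none of these names are libceph APIs.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a pagelist. */
struct sketch_list {
	char *data;
	size_t len;
	size_t cap;
};

static struct sketch_list *sketch_list_alloc(size_t cap)
{
	struct sketch_list *pl = malloc(sizeof(*pl));

	if (!pl)
		return NULL;
	pl->data = malloc(cap ? cap : 1);
	if (!pl->data) {
		free(pl);
		return NULL;
	}
	pl->len = 0;
	pl->cap = cap;
	return pl;
}

static int sketch_list_append(struct sketch_list *pl, const void *buf,
			      size_t len)
{
	if (len > pl->cap - pl->len)
		return -ENOMEM;	/* models a failed allocation */
	memcpy(pl->data + pl->len, buf, len);
	pl->len += len;
	return 0;
}

static void sketch_list_release(struct sketch_list *pl)
{
	free(pl->data);
	free(pl);
}

/* Check every append; on failure release the list and return the error. */
static int sketch_op_init(const char *cls, const char *method, size_t cap,
			  struct sketch_list **out)
{
	struct sketch_list *pl;
	int ret;

	assert(cls && method && out);
	pl = sketch_list_alloc(cap);
	if (!pl)
		return -ENOMEM;

	ret = sketch_list_append(pl, cls, strlen(cls));
	if (ret)
		goto err;
	ret = sketch_list_append(pl, method, strlen(method));
	if (ret)
		goto err;

	*out = pl;
	return 0;

err:
	sketch_list_release(pl);
	return ret;
}
```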

ceph: don't return a value from void function · 3e8730fa

John Hubbard authored Aug 01, 2019

This fixes a build warning to that effect.

Fixes: 1a829ff2 ("ceph: no need to check return value of debugfs_create functions")
Signed-off-by: John Hubbard <jhubbard@nvidia.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

ceph: don't freeze during write page faults · 249c1df5

Jeff Layton authored Aug 01, 2019

Prevent freezing operations during write page faults. This is good
practice for most filesystems, but especially for ceph since we're
monkeying with the signal table here.
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

ceph: update the mtime when truncating up · c62498d7

Jeff Layton authored Jul 25, 2019

If we have Fx caps, and we're truncating the size to be larger, then
we'll cache the size attribute change, but the mtime won't be updated.

Move the size handling before the mtime, and add ATTR_MTIME to ia_valid
in that case to make sure the mtime also gets updated.

This fixes xfstest generic/313.
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
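
As a minimal sketch of the idea, with made-up flag values and a hypothetical `sketch_iattr` structure rather than the kernel's `struct iattr`: when a size change grows the file, the mtime flag is added to the valid mask so the timestamp is updated along with the size.

```c
#include <assert.h>

/* Hypothetical flag values; the kernel's ATTR_* constants differ. */
#define SKETCH_ATTR_SIZE	(1u << 0)
#define SKETCH_ATTR_MTIME	(1u << 1)

struct sketch_iattr {
	unsigned int ia_valid;
	unsigned long long ia_size;
};

/* Handle the size first; a truncate up also requests an mtime update. */
static unsigned int sketch_fixup_valid(struct sketch_iattr *attr,
				       unsigned long long cur_size)
{
	assert(attr);
	if ((attr->ia_valid & SKETCH_ATTR_SIZE) && attr->ia_size > cur_size)
		attr->ia_valid |= SKETCH_ATTR_MTIME;
	return attr->ia_valid;
}
```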

ceph: fix indentation in __get_snap_name() · 0ed26f36

Ilya Dryomov authored Jul 29, 2019

Reported-by: kbuild test robot <lkp@intel.com>
Reported-by: Julia Lawall <julia.lawall@lip6.fr>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

ceph: remove incorrect comment above __send_cap · 98cd281a

Jeff Layton authored Jul 05, 2019

It doesn't do anything to invalidate the cache when dropping RD caps.
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

ceph: remove CEPH_I_NOFLUSH · daca8bda

Jeff Layton authored Jul 05, 2019

Nothing sets this flag.
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

ceph: remove unneeded test in try_flush_caps · 27b0a392

Jeff Layton authored Jul 05, 2019

cap->session is always non-NULL, so we can just do a single test for
equality w/o testing explicitly for a NULL pointer.
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

ceph: have __mark_caps_flushing return flush_tid · 9f3345d8

Jeff Layton authored Jul 08, 2019

Currently, this function returns ci->i_dirty_caps, but the callers have
to check that that isn't 0 before calling this function. Have the
callers grab that value directly out of the inode, and have
__mark_caps_flushing return the flush_tid instead.
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

ceph: fix comments over ceph_add_cap · 354c63a0

Jeff Layton authored Jul 19, 2019

We actually need the ci->i_ceph_lock here. The necessity of the s_mutex
is less clear. Also add a lockdep assertion for the i_ceph_lock.
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

ceph: eliminate session->s_trim_caps · 533a2818

Jeff Layton authored Jul 19, 2019

It's only used to keep count of caps being trimmed, but that requires
that we hold the session->s_mutex to prevent multiple trimming
operations from running concurrently.

We can achieve the same effect using an integer on the stack, which
allows us to (eventually) not need the s_mutex.
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
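
The stack-counter approach might look like this in a simplified form. The cap structure, iterator and callback are hypothetical, not the kernel's. The counter is local to one trimming call and reaches the callback through the opaque argument, so no shared per-session field is needed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a capability entry. */
struct sketch_cap {
	int in_use;
	int trimmed;
};

typedef int (*sketch_cap_cb)(struct sketch_cap *cap, void *arg);

/* Call cb on each cap until it returns a negative value. */
static void sketch_iterate_caps(struct sketch_cap *caps, size_t n,
				sketch_cap_cb cb, void *arg)
{
	for (size_t i = 0; i < n; i++)
		if (cb(&caps[i], arg) < 0)
			break;
}

static int sketch_trim_one(struct sketch_cap *cap, void *arg)
{
	int *remaining = arg;

	if (*remaining <= 0)
		return -1;	/* trimmed enough, stop iterating */
	if (!cap->in_use) {
		cap->trimmed = 1;
		(*remaining)--;
	}
	return 0;
}

/* Returns the number of caps actually trimmed. */
static int sketch_trim_caps(struct sketch_cap *caps, size_t n, int max_trim)
{
	int remaining = max_trim;	/* lives on this call's stack */

	assert(caps && max_trim >= 0);
	sketch_iterate_caps(caps, n, sketch_trim_one, &remaining);
	return max_trim - remaining;
}
```

Because each call has its own counter, two trimming operations no longer share state through the session.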

ceph: fetch cap_gen under spinlock in ceph_add_cap · 606d1023

Jeff Layton authored Jul 22, 2019

It's protected by the s_gen_ttl_lock, so we should fetch under it
and ensure that we're using the same generation in both places.
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

ceph: remove ceph_get_cap_mds and __ceph_get_cap_mds · 5de16b30

Jeff Layton authored Jul 23, 2019

Nothing calls these routines.
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

ceph: don't SetPageError on writepage errors · b72b13eb

Jeff Layton authored Jul 02, 2019

We already mark the mapping in that case, and doing this can cause
false positives to occur at fsync time, as well as spurious read
errors.
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
