Commit a5613724 authored by Ilya Dryomov

libceph: fix PG split vs OSD (re)connect race

We can't rely on ->peer_features in calc_target() because it may be
called both when the OSD session is established and open and when it's
not.  ->peer_features is not valid unless the OSD session is open.  If
this happens on a PG split (pg_num increase), that could mean we don't
resend a request that should have been resent, hanging the client
indefinitely.

In userspace this was fixed by looking at require_osd_release and
get_xinfo[osd].features fields of the osdmap.  However these fields
belong to the OSD section of the osdmap, which the kernel doesn't
decode (only the client section is decoded).

Instead, let's drop this feature check.  It effectively checks for
luminous, so only pre-luminous OSDs would be affected in that on a PG
split the kernel might resend a request that should not have been
resent.  Duplicates can occur in other scenarios, so both sides should
already be prepared for them: see dup/replay logic on the OSD side and
retry_attempt check on the client side.
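
As a rough illustration of the change in the resend decision, the sketch below models the old and new conditions as standalone predicates. It is not the kernel code: the sketch_* names, the feature bit value, and the reduced connection struct are hypothetical stand-ins, and an unopened session is simply modelled as having peer_features == 0.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the RESEND_ON_SPLIT feature bit; the real
 * bit value is not taken from the kernel headers. */
#define SKETCH_FEATURE_RESEND_ON_SPLIT (1ULL << 0)

/* Reduced model of an OSD connection.  peer_features is only meaningful
 * once the session is open; before that it is modelled here as 0. */
struct sketch_con {
	uint64_t peer_features;
};

/* The four inputs that feed the resend decision in calc_target(). */
struct sketch_flags {
	bool unpaused;
	bool legacy_change;
	bool force_resend;
	bool split;
};

/* Old condition: a split only triggers a resend if the connection
 * exists and advertises RESEND_ON_SPLIT. */
static bool need_resend_old(const struct sketch_flags *f,
			    const struct sketch_con *con)
{
	return f->unpaused || f->legacy_change || f->force_resend ||
	       (f->split && con &&
		(con->peer_features & SKETCH_FEATURE_RESEND_ON_SPLIT));
}

/* New condition: a split always triggers a resend, regardless of the
 * session state. */
static bool need_resend_new(const struct sketch_flags *f)
{
	return f->unpaused || f->legacy_change || f->force_resend || f->split;
}
```

With only split set and a connection whose features are not yet valid, need_resend_old() returns false (the missed resend described above), while need_resend_new() returns true.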

Cc: stable@vger.kernel.org
Fixes: 7de030d6 ("libceph: resend on PG splits if OSD has RESEND_ON_SPLIT")
Link: https://tracker.ceph.com/issues/41162
Reported-by: Jerry Lee <leisurelysw24@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Tested-by: Jerry Lee <leisurelysw24@gmail.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
parent 28a28261
...
@@ -1496,7 +1496,7 @@ static enum calc_target_result calc_target(struct ceph_osd_client *osdc,
 	struct ceph_osds up, acting;
 	bool force_resend = false;
 	bool unpaused = false;
-	bool legacy_change;
+	bool legacy_change = false;
 	bool split = false;
 	bool sort_bitwise = ceph_osdmap_flag(osdc, CEPH_OSDMAP_SORTBITWISE);
 	bool recovery_deletes = ceph_osdmap_flag(osdc,
...
@@ -1584,15 +1584,14 @@ static enum calc_target_result calc_target(struct ceph_osd_client *osdc,
 		t->osd = acting.primary;
 	}
-	if (unpaused || legacy_change || force_resend ||
-	    (split && con && CEPH_HAVE_FEATURE(con->peer_features,
-					       RESEND_ON_SPLIT)))
+	if (unpaused || legacy_change || force_resend || split)
 		ct_res = CALC_TARGET_NEED_RESEND;
 	else
 		ct_res = CALC_TARGET_NO_ACTION;
 out:
-	dout("%s t %p -> ct_res %d osd %d\n", __func__, t, ct_res, t->osd);
+	dout("%s t %p -> %d%d%d%d ct_res %d osd%d\n", __func__, t, unpaused,
+	     legacy_change, force_resend, split, ct_res, t->osd);
 	return ct_res;
 }