• Ilya Dryomov's avatar
    libceph: apply new_state before new_up_client on incrementals · 8777c9f6
    Ilya Dryomov authored
    [ Upstream commit 930c5328 ]
    
    Currently, osd_weight and osd_state fields are updated in the encoding
    order.  This is wrong, because an incremental map may look like e.g.
    
        new_up_client: { osd=6, addr=... } # set osd_state and addr
        new_state: { osd=6, xorstate=EXISTS } # clear osd_state
    
    Suppose osd6's current osd_state is EXISTS (i.e. osd6 is down).  After
    applying new_up_client, osd_state is changed to EXISTS | UP.  Carrying
    on with the new_state update, we flip EXISTS and leave osd6 in a weird
    "!EXISTS but UP" state.  A non-existent OSD is considered down by the
    mapping code
    
    2087    for (i = 0; i < pg->pg_temp.len; i++) {
    2088            if (ceph_osd_is_down(osdmap, pg->pg_temp.osds[i])) {
    2089                    if (ceph_can_shift_osds(pi))
    2090                            continue;
    2091
    2092                    temp->osds[temp->size++] = CRUSH_ITEM_NONE;
    
    and so requests get directed to the second OSD in the set instead of
    the first, resulting in OSD-side errors like:
    
    [WRN] : client.4239 192.168.122.21:0/2444980242 misdirected client.4239.1:2827 pg 2.5df899f2 to osd.4 not [1,4,6] in e680/680
    
    and hung rbds on the client:
    
    [  493.566367] rbd: rbd0: write 400000 at 11cc00000 (0)
    [  493.566805] rbd: rbd0:   result -6 xferred 400000
    [  493.567011] blk_update_request: I/O error, dev rbd0, sector 9330688
    
    The fix is to decouple application from the decoding and:
    - apply new_weight first
    - apply new_state before new_up_client
    - twiddle osd_state flags if marking in
    - clear out some of the state if osd is destroyed
    
    Fixes: http://tracker.ceph.com/issues/14901
    
    Cc: stable@vger.kernel.org # 3.15+: 6dd74e44: libceph: set 'exists' flag for newly up osd
    Cc: stable@vger.kernel.org # 3.15+
    Signed-off-by: default avatarIlya Dryomov <idryomov@gmail.com>
    Reviewed-by: default avatarJosh Durgin <jdurgin@redhat.com>
    Signed-off-by: default avatarSasha Levin <alexander.levin@verizon.com>
    8777c9f6
osdmap.c 40 KB