• David Howells's avatar
    afs: Actively poll fileservers to maintain NAT or firewall openings · f6cbb368
    David Howells authored
    When an AFS client accesses a file, it receives a limited-duration callback
    promise that the server will notify it if another client changes a file.
    This callback duration can be a few hours in length.
    
    If a client mounts a volume and then an application prevents it from being
    unmounted, say by chdir'ing into it, but then does nothing for some time,
    the rxrpc_peer record will expire and rxrpc-level keepalive will cease.
    
    If there is NAT or a firewall between the client and the server, the route
    back for the server may close after a comparatively short duration, meaning
    that attempts by the server to notify the client may then bounce.
    
    The client, however, may (so far as it knows) still have a valid unexpired
    promise and will then rely on its cached data and will not see changes made
    on the server by a third party until it incidentally rechecks the status or
    the promise needs renewal.
    
    To deal with this, the client needs to regularly probe the server.  This
    has two effects: firstly, it keeps a route open back for the server, and
    secondly, it causes the server to disgorge any notifications that got
    queued up because they couldn't be sent.
    
    Fix this by adding a mechanism to emit regular probes.
    
    Two levels of probing are made available: Under normal circumstances the
    'slow' queue will be used for a fileserver - this just probes the preferred
    address once every 5 mins or so; however, if server fails to respond to any
    probes, the server will shift to the 'fast' queue from which all its
    interfaces will be probed every 30s.  When it finally responds, the record
    will switch back to the slow queue.
    
    Further notes:
    
     (1) Probing is now no longer driven from the fileserver rotation
         algorithm.
    
     (2) Probes are dispatched to all interfaces on a fileserver when that an
         afs_server object is set up to record it.
    
     (3) The afs_server object is removed from the probe queues when we start
         to probe it.  afs_is_probing_server() returns true if it's not listed
         - ie. it's undergoing probing.
    
     (4) The afs_server object is added back on to the probe queue when the
         final outstanding probe completes, but the probed_at time is set when
         we're about to launch a probe so that it's not dependent on the probe
         duration.
    
     (5) The timer and the work item added for this must be handed a count on
         net->servers_outstanding, which they hand on or release.  This makes
         sure that network namespace cleanup waits for them.
    
    Fixes: d2ddc776 ("afs: Overhaul volume and server record caching and fileserver rotation")
    Reported-by: default avatarDave Botsch <botsch@cnf.cornell.edu>
    Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
    f6cbb368
fs_probe.c 10.1 KB