1. 19 Jul, 2002 12 commits
    • [PATCH] MD - Set desc_nr more sanely. · 999a2029
      Neil Brown authored
      Set desc_nr more sanely.
      
      Currently rdev->desc_nr is set in sync_sbs which is typically
      called just before writing out the superblocks, which is an
      odd place to set it.
      It is also called when a new disk is added (which is sane) and
      when an old disk is imported ... which is questionable.
      
      With this patch it is set when a new disk is added, and when
      the superblocks are being analysed, which makes lots of sense.
      
      MULTIPATH is particularly an issue here.  The old code tried
      to figure the desc_nr for an rdev by matching device numbers in
      the superblock.  This doesn't make a lot of sense as
      device numbers can change.  Now MULTIPATH components
      get sequential desc_nrs.
    • [PATCH] MD - Move md_update_sb calls · 6f42312c
      Neil Brown authored
      Move md_update_sb calls
      
      When a change which requires a superblock update happens
      at interrupt time, we currently set a flag (sb_dirty) and
      wake up the per-array thread (raid1/raid5d/multipathd) to
      do the actual update.
      
      This patch centralises this.  The sb_update is now done
      by the mdrecoveryd thread.  As this is always woken up after
      the error handler is called, we don't need the call to wake up
      the local thread any more.
      
      With this, we don't need "md_update_sb" to lock the array
      any more and only use __md_update_sb, which is local to md.c.
      So we rename __md_update_sb back to md_update_sb and stop
      exporting it.
    • [PATCH] MD - Pass the correct bdev to md_error · a15b60a2
      Neil Brown authored
      Pass the correct bdev to md_error
      
      After a call to generic_make_request, bio->bi_bdev can have changed
      (e.g. by a re-mapper like raid0).  So we cannot trust it for reporting
      the source of an error.  This patch takes care to find the correct
      bdev.
    • [PATCH] MD - Rdev list cleanups. · 2a9400e9
      Neil Brown authored
      Rdev list cleanups.
      
      An "rdev" can be on three different lists.
       - the list of all rdevs
       - the list of pending rdevs
       - the list of rdevs for a given mddev
      
      The first list is now only used to list "unused" devices in
      /proc/mdstat, and only pending rdevs can be unused, so this list
      isn't necessary.
      An rdev cannot be both pending and in an mddev, so we know an rdev will
      only be on one list at a time.
      
      This patch discards the all_raid_disks list, and changes the
      pending list to use "same_set" in the rdev.  It also changes
      /proc/mdstat to iterate through pending devices, rather than through
      all devices.
      
      So now an rdev is only on one list, either the pending list
      or the list of rdevs for a given mddev.  This means that
      ITERATE_RDEV_GENERIC doesn't need to be told which field
      to walk down: there is only one.
    • [PATCH] MD - Get rid of find_rdev_all · 70e96bef
      Neil Brown authored
      Get rid of find_rdev_all
      
      find_rdev_all is now only used to check if a device is already
      used in an md array.
      
      We change lock_rdev so that it claims the bdev for
      the specific rdev rather than for rdevs in general.
      Now lock_rdev will check if the bdev is inuse by another array
      or not, so the find_rdev_all check isn't needed and is removed,
      along with find_rdev_all itself.
      
      We also make sure that the error code from lock_rdev is
      propagated up properly.
    • [PATCH] MD - Use symbolic names for multipath (-4) and linear (-1) · a0f86742
      Neil Brown authored
      Use symbolic names for multipath (-4) and linear (-1)
      
      Also, a variable called "level" was being used to store a
      "level" and a "personality" number.  This is potentially
      confusing, so it is now two variables.
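
As a rough illustration of this commit's idea, the sketch below names the two negative levels and keeps the level and the personality number in separate types. All identifiers here (`md_level`, `md_personality`, `level_to_personality`) are assumptions made for the example, not the names used in the patch.

```c
/* Hypothetical symbolic names for the two negative levels in the commit title. */
enum md_level {
    LEVEL_MULTIPATH = -4,
    LEVEL_LINEAR    = -1
};

/* Hypothetical personality numbers, kept apart from the level itself. */
enum md_personality {
    PERS_UNKNOWN = 0,
    PERS_LINEAR,
    PERS_MULTIPATH
};

/* Map a level to a personality; anything unrecognised maps to PERS_UNKNOWN. */
static enum md_personality level_to_personality(int level)
{
    switch (level) {
    case LEVEL_LINEAR:
        return PERS_LINEAR;
    case LEVEL_MULTIPATH:
        return PERS_MULTIPATH;
    default:
        return PERS_UNKNOWN;
    }
}
```

Using two distinct types means a caller cannot silently pass a personality number where a level is expected without the compiler having a chance to warn.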
    • [PATCH] MD - Don't "analyze_sb" when creating new array. · 376163df
      Neil Brown authored
      Don't "analyze_sb" when creating new array.
      
      When creating a new array (and we have an mddev->sb),
      don't bother to analyze the superblocks.  There is no point.
      Also, this means we always allocate the array sb in
      analyze_sbs, rather than conditionally.
    • [PATCH] MD - Embed bio in mp_bh rather than separate allocation. · e3de153e
      Neil Brown authored
      Embed bio in mp_bh rather than separate allocation.
      
      multipath currently allocates an mp_bh and a bio for each
      request.  With this patch, the bio is made to be part of the
      mp_bh so there is only one allocation, and it comes from a private
      pool (the bio was allocated from a shared pool).
      
      Also remove "remaining" and "cmd" from mp_bh which aren't used.
      And remove spare (unused) from multipath_private_data.
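
The embedding described above can be sketched in user-space C. This is only an illustration under stated assumptions: `struct bio` is a stand-in, the `path` field and both helper functions are hypothetical, and `calloc` takes the place of the private pool the commit mentions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Stand-in for the kernel's struct bio; present only so the sketch compiles. */
struct bio {
    void *bi_private;
};

/* Hypothetical per-request structure with the bio embedded, not pointed to. */
struct mp_bh {
    int        path;  /* illustrative field: which path the request used */
    struct bio bio;   /* embedded, so it is allocated and freed with mp_bh */
};

/* One allocation covers both structures (calloc stands in for a private pool). */
static struct mp_bh *mp_bh_alloc(int path)
{
    struct mp_bh *mp = calloc(1, sizeof(*mp));

    if (mp)
        mp->path = path;
    return mp;
}

/* Recover the enclosing mp_bh from a pointer to its embedded bio. */
static struct mp_bh *mp_bh_from_bio(struct bio *bio)
{
    assert(bio != NULL);
    return (struct mp_bh *)((char *)bio - offsetof(struct mp_bh, bio));
}
```

A completion handler that is handed only the bio can call `mp_bh_from_bio()` to reach the request state, so no back-pointer and no second allocation are needed.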
    • [PATCH] MD - 27 - Remove state field from multipath mp_bh structure. · 8e2a19e7
      Neil Brown authored
      Remove state field from multipath mp_bh structure.
      
      The MPBH_Uptodate flag is set but never used.
      The MPBH_SyncPhase flag was never used.
      Both are legacies of copying raid1.c.
      
      MPBH_PreAlloc is no longer needed due to the use of
      mempools, so the state field can go...
    • [PATCH] MD - Get multipath to use mempool · e18a7e5c
      Neil Brown authored
      Get multipath to use mempool
      
      ... rather than maintaining its own memory pool
    • [PATCH] MD - Remove dead consistency checking code from multipath. · 663c6269
      Neil Brown authored
      Remove dead consistency checking code from multipath.
      
      This "consistancy_check" is carried over from raid1, on which multipath
      was based. It was not used in raid1 and has since been removed there.  Now
      it gets removed from multipath too.
    • [PATCH] MD - Remove bdput calls from raid personalities. · 82b0fad1
      Neil Brown authored
      Remove bdput calls from raid personalities.
      
      Some of the md personalities currently hold a counted reference
      on a bdev.  This is not necessary as the main md module will always
      hold a counted reference in the rdev.
      This patch removes the code to take and drop these unnecessary
      references.
  2. 18 Jul, 2002 2 commits
    • [PATCH] Fix typo in net/sunrpc/xprt.c · 389a5884
      Trond Myklebust authored
      The appended patch fixes a typo in net/sunrpc/xprt.c: We want to
      ensure that we play safe, and only increment the UDP congestion window
      when we have successfully transmitted a full frame of data.
      
      In addition, we should perhaps still 'slow start' the UDP congestion
      code rather than assuming that we can immediately fire off 8
      requests. IOW revert the value of RPC_INITCWND.
    • [PATCH] Fix NFS locking bug · df458c00
      Trond Myklebust authored
      Here's one bugfix which might help to explain the GRANTED failure. The
      bug has been there all along (so I'll probably want to send this to
      Marcelo too).
      
      The code in question is supposed to ensure that we don't wait on a
      reply if the RPC call doesn't expect one. However, if the socket
      transmission failed for some reason, we do actually want to loop and
      try again...
      
      This bug will hit the RPC call in nlmsvc_grant_blocked().
  3. 16 Jul, 2002 9 commits
    • Kernel version 2.5.26 · 0d84f0ac
      Linus Torvalds authored
    • [PATCH] RPC over UDP congestion control updates [8/8] · fefe89f4
      Trond Myklebust authored
      When determining who gets access to the socket, give priority to
      requests that are being resent. Despite the fact that congestion
      control now applies to resends, we still want to ensure that resends
      get ACKed as soon as possible (and before we start sending off new
      requests).
    • [PATCH] RPC over UDP congestion control updates [7/8] · 0b51abc8
      Trond Myklebust authored
        - Divorce the allocation of free request slots and the congestion
          control. Make the congestion control apply only to when we
          actually send data over the wire. This means that we *do* apply
          congestion control to resent requests: if a timeout has occurred,
          and there are too many requests on the wire, delay resending until
          the congestion algorithm allows it.
      
        - Improve spinlocking by putting the congestion avoidance algorithm
          under xprt->sock_lock. This lock has to be taken *anyway* in
          (almost) all cases where we are updating the congestion control
          data.
    • [PATCH] RPC over UDP congestion control updates [6/8] · 4edf0555
      Trond Myklebust authored
      Eliminate the arbitrary timeouts in xprt_adjust_cwnd(). Strictly
      enforce the congestion avoidance algorithm as detailed in Van
      Jacobson's 1988 paper "Congestion Avoidance and Control"
      (http://www-nrg.ee.lbl.gov/nrg-papers.html).
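
For readers unfamiliar with the algorithm, here is a minimal sketch of the additive-increase, multiplicative-decrease rule from that paper, using a fixed-point window. It is not the body of xprt_adjust_cwnd(); the constants `CWND_SCALE` and `CWND_MAX` and both function names are assumptions chosen for the example.

```c
#include <assert.h>

#define CWND_SCALE 256u                /* fixed point: one request == 256 (assumed) */
#define CWND_MAX   (16u * CWND_SCALE)  /* illustrative ceiling, not from the patch */

/*
 * Additive increase: each successful reply grows the window by one
 * request divided by the current window size (in requests), so a full
 * window of replies adds about one request. The cwnd / 2 term rounds
 * the division to the nearest integer.
 */
static unsigned int cwnd_on_reply(unsigned int cwnd)
{
    assert(cwnd >= CWND_SCALE);
    cwnd += (CWND_SCALE * CWND_SCALE + cwnd / 2) / cwnd;
    return cwnd > CWND_MAX ? CWND_MAX : cwnd;
}

/* Multiplicative decrease: halve on timeout, but never go below one request. */
static unsigned int cwnd_on_timeout(unsigned int cwnd)
{
    cwnd /= 2;
    return cwnd < CWND_SCALE ? CWND_SCALE : cwnd;
}
```

The fixed-point scale lets the window grow by fractions of a request without floating point, which is the usual reason for this representation in kernel code.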
    • [PATCH] RPC over UDP congestion control updates [5/8] · 514349dc
      Trond Myklebust authored
      Clean up the Van Jacobson network congestion control code.
    • [PATCH] RPC over UDP congestion control updates [4/8] · c6b43f23
      Trond Myklebust authored
      Cleanups for the socket locking mechanism.
      
      Improve RPC request ordering by ensuring that RPC tasks that are
      already queued on xprt->sending get sent before tasks that happen to
      get scheduled just when there is a free slot.
      
      In case the socket send buffer is full, queue the tasks on
      xprt->pending rather than xprt->sending in order to eliminate the risk
      of accidental wakeups from xprt_release_write() and xprt_write_space().
    • [PATCH] RPC over UDP congestion control updates [3/8] · 9ba7d221
      Trond Myklebust authored
      Improve the response to timeouts. As requests time out, we delay
      timing out the remaining requests (in fact we follow exponential
      backoff). This is done because we assume either that the round trip
      time has been underestimated, or that the network/server is congested,
      and we need to back off the resending of new requests.
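
Exponential backoff of the kind mentioned above might look like the following sketch. The function name, its parameters, and the choice of a hard ceiling are assumptions for illustration; the commit does not say how the backoff is bounded.

```c
#include <assert.h>

/*
 * Return the timeout to use after `ntimeouts` consecutive timeouts:
 * the base value doubled once per timeout, capped at `max`.
 * The comparison against max / 2 avoids overflow when doubling.
 */
static unsigned long backoff_timeout(unsigned long base,
                                     unsigned int ntimeouts,
                                     unsigned long max)
{
    unsigned long t = base;

    assert(base > 0 && base <= max);
    while (ntimeouts > 0 && t < max) {
        t = (t > max / 2) ? max : t * 2;
        ntimeouts--;
    }
    return t;
}
```

With a base of 100 and a ceiling of 1000, successive timeouts would yield 200, 400, 800, and then stay at 1000.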
    • [PATCH] RPC over UDP congestion control updates [2/8] · fa7b279e
      Trond Myklebust authored
      Implement a count of the number of timeouts that have occurred since
      we last recorded a successful reply from the server.
      
      For the moment this information is merely used in order to improve the
      estimate of whether or not the server is down. It will be used in
      patch 3/8 in order to improve the timeout backoff algorithm.
    • [PATCH] RPC over UDP congestion control updates [1/8] · 77d79030
      Trond Myklebust authored
      Implement the basic round trip timing algorithm in order to adapt the
      timeout values for the most common NFS operations to the server's
      rate of response.
      The algorithm is described in Van Jacobson's 1988 paper, available
      from http://www-nrg.ee.lbl.gov/nrg-papers.html, and is the same as is
      used for most TCP stacks.
      
      Following the *BSD code, we implement separate rtt timers for GETATTR,
      LOOKUP, READ/READDIR/READLINK, and WRITE. In addition to this, there
      is one extra timer for the COMMIT operation.
      All the remaining RPC calls use the current system in which a fixed
      timeout value gets set by the 'timeo' mount option.
      
      In case of a timeout, the current exponential backoff algorithm is
      implemented. Subsequent patches will improve this...
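
A minimal sketch of a round-trip estimator in the style of Jacobson's algorithm follows. It assumes the commonly used gains of 1/8 for the smoothed RTT and 1/4 for the mean deviation, and a timeout of srtt + 4 * rttvar; the patch may use different constants or integer arithmetic. The struct and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-operation-class estimator state. */
struct rtt_estimator {
    double srtt;    /* smoothed round-trip time */
    double rttvar;  /* smoothed mean deviation of the round-trip time */
};

/* Fold one measured round-trip time into the running estimates. */
static void rtt_update(struct rtt_estimator *e, double sample)
{
    double err, dev;

    assert(e != NULL && sample >= 0.0);
    err = sample - e->srtt;
    dev = err < 0.0 ? -err : err;
    e->srtt   += err / 8.0;               /* assumed gain of 1/8 */
    e->rttvar += (dev - e->rttvar) / 4.0; /* assumed gain of 1/4 */
}

/* Retransmit timeout: the mean plus a safety margin of four deviations. */
static double rtt_timeout(const struct rtt_estimator *e)
{
    return e->srtt + 4.0 * e->rttvar;
}
```

Keeping one such estimator per class of operation, as the commit describes for GETATTR, LOOKUP, the read family, and WRITE, stops slow operations from inflating the timeout of fast ones.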
  4. 15 Jul, 2002 17 commits