1. 03 Dec, 2009 21 commits
    • nfs: clean up sillyrenaming in nfs_rename() · 24e93025
      Miklos Szeredi authored
      The d_instantiate(new_dentry, NULL) is superfluous: the dentry is
      already negative.  Rehashing this dummy dentry isn't needed either,
      since d_move() works fine on an unhashed target.
      
      The re-checking for busy after a failed nfs_sillyrename() is bogus
      too: new_dentry->d_count < 2 would be a bug here.
      Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
      Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
    • nfs: dont unhash target if renaming a directory · 27226104
      Miklos Szeredi authored
      Move unhashing the target to after the check for existence and being a
      non-directory.
      
      If renaming a directory then the VFS already unhashes the target if it
      is not busy.  If it's busy then acquiring more references during the
      rename makes no difference.
      Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
      Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
    • nfs: fix comments in nfs_rename() · 28f79a1a
      Miklos Szeredi authored
      Comments are wrong or out of date.  In particular, d_drop() doesn't
      free the inode; it just unhashes the dentry.  And if the target is a
      directory then it is not checked for being busy.
      Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
      Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
    • nfs: remove unnecessary check from nfs_rename() · e48de5ec
      Miklos Szeredi authored
      VFS already checks if both source and target are directories.
      Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
      Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
    • Trond Myklebust's avatar
    • SUNRPC: soft connect semantics for UDP · 3a28becc
      Chuck Lever authored
      Introduce soft connect behavior for UDP transports.  In this case, a
      major timeout returns ETIMEDOUT instead of EIO.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
    • SUNRPC: Use soft connect semantics when performing RPC ping · caabea8a
      Chuck Lever authored
      Currently, if a remote RPC service is unreachable, an RPC ping will
      hang until the underlying transport connect attempt times out.  A more
      desirable behavior might be to have the ping fail immediately so upper
      layers can recover appropriately.
      
      In the case of an NFS mount, for instance, this would mean the
      mount(2) system call could fail immediately if the server isn't
      listening, rather than hanging uninterruptibly for more than 3
      minutes.
      
      Change rpc_ping() so that it fails immediately for connection-oriented
      transports.  rpc_create() will then fail immediately for such
      transports if an RPC ping was requested.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
    • SUNRPC: Use soft connects for autobinding over TCP · 012da158
      Chuck Lever authored
      Autobinding is handled by the rpciod process, not in user processes
      that are generating regular RPC requests.  Thus autobinding is usually
      not affected by signals targeting user processes, such as KILL or
      timer expiration events.
      
      In addition, an RPC request generated by a user process that has
      RPC_TASK_SOFTCONN set and needs to perform an autobind will hang if
      the remote rpcbind service is not available.
      
      For rpcbind queries on connection-oriented transports, let's use the
      new soft connect semantic to return control to the user's process
      quickly, if the kernel's rpcbind client can't connect to the remote
      rpcbind service.
      
      Logic is introduced in call_bind_status() to handle connection errors
      that occurred during an asynchronous rpcbind query.  The logic
      abandons the rpcbind query if the RPC request has SOFTCONN set, and
      retries after a few seconds in the normal case.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
    • SUNRPC: Use TCP for local rpcbind upcalls · 2a76b3bf
      Chuck Lever authored
      Use TCP with the soft connect semantic for local rpcbind upcalls so
      the kernel can detect immediately if the local rpcbind daemon is not
      running.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
    • SUNRPC: Use a cached RPC client and transport for rpcbind upcalls · c526611d
      Chuck Lever authored
      The kernel's rpcbind client creates and deletes an rpc_clnt and its
      underlying transport socket for every upcall to the local rpcbind
      daemon.
      
      When starting a typical NFS server on IPv4 and IPv6, the NFS service
      itself does three upcalls (one per version) times two upcalls (one
      per transport) times two upcalls (one per address family), making 12,
      plus another one for the initial call to unregister previous NFS
      services.  Starting the NLM service adds an additional 13 upcalls,
      for similar reasons.
      
      (Currently the NFS service doesn't start IPv6 listeners, but it will
      soon enough.)
      
      Instead, let's create an rpc_clnt for rpcbind upcalls during the
      first local rpcbind query, and cache it.  This saves the overhead of
      creating and destroying an rpc_clnt and a socket for every upcall.
      
      The new logic also prevents the kernel from attempting an RPCB_SET or
      RPCB_UNSET if it knows from the start that the local portmapper does
      not support rpcbind protocol version 4.  This will cut down on the
      number of rpcbind upcalls in legacy environments.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
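
The upcall arithmetic and the caching idea described in this commit can be sketched in plain C. This is only an illustrative userspace model, not the kernel's rpcbind client: `struct client`, `get_local_client()`, `client_creations()` and `registration_upcalls()` are hypothetical names, and the locking a real kernel implementation would need is deliberately left out.

```c
#include <stddef.h>

/* Hypothetical stand-in for an RPC client handle (not the kernel's rpc_clnt). */
struct client {
    int id;
};

static struct client cached_client;      /* storage for the single cached handle */
static struct client *cached_ptr = NULL; /* NULL until the first query creates it */
static int creations = 0;                /* how many times a client was created */

/* Create the client on first use, then hand back the cached one. */
struct client *get_local_client(void)
{
    if (cached_ptr == NULL) {
        creations++;
        cached_client.id = creations;
        cached_ptr = &cached_client;
    }
    return cached_ptr;
}

/* Number of client creations so far; stays at 1 once the cache is warm. */
int client_creations(void)
{
    return creations;
}

/* Upcalls needed to register one service: one per combination of
 * version, transport and address family, plus one initial unregister. */
int registration_upcalls(int versions, int transports, int families)
{
    return versions * transports * families + 1;
}
```

With three versions, two transports and two address families, `registration_upcalls(3, 2, 2)` gives the 13 upcalls counted in the message, while repeated calls to `get_local_client()` create the handle only once.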
    • SUNRPC: Simplify synopsis of rpcb_local_clnt() · 5a462115
      Chuck Lever authored
      Clean up: At one point, rpcb_local_clnt() handled IPv6 loopback
      addresses too, but it doesn't any more; only IPv4 loopback is used
      now.  Get rid of the @addr and @addrlen arguments to
      rpcb_local_clnt().
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
    • SUNRPC: Allow RPCs to fail quickly if the server is unreachable · 09a21c41
      Chuck Lever authored
      The kernel sometimes makes RPC calls to services that aren't running.
      Because the kernel's RPC client always assumes the hard retry semantic
      when reconnecting a connection-oriented RPC transport, the underlying
      reconnect logic takes a long while to time out, even though the remote
      may have responded immediately with ECONNREFUSED.
      
      In certain cases, like upcalls to our local rpcbind daemon, or for NFS
      mount requests, we'd like the kernel to fail immediately if the remote
      service isn't reachable.  This allows another transport to be tried
      immediately, or the pending request can be abandoned quickly.
      
      Introduce a per-request flag which controls how call_transmit_status()
      behaves when request transmission fails because the server cannot be
      reached.
      
      We don't want soft connection semantics to apply to other errors.  The
      default case of the switch statement in call_transmit_status() no
      longer falls through; the fall through code is copied to the default
      case, and a "break;" is added.
      
      The transport's connection re-establishment timeout is also ignored for
      such requests.  We want the request to fail immediately, so the
      reconnect delay is skipped.  Additionally, we don't want a connect
      failure here to further increase the reconnect timeout value, since
      this request will not be retried.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
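
A minimal sketch of the per-request flag idea, assuming a simplified userspace model rather than the actual call_transmit_status() code. `TASK_SOFTCONN`, `struct task` and `should_abandon()` are hypothetical names, and the set of errno values treated as "server cannot be reached" is an assumption made for illustration.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the per-request soft connect flag. */
#define TASK_SOFTCONN 0x1u

/* Simplified request: flag bits plus the negative errno from transmission. */
struct task {
    unsigned int flags;
    int status;
};

/* Returns true if the request should be abandoned instead of retried.
 * Only "cannot reach the server" errors honor the soft connect flag;
 * every other status keeps its usual handling. */
bool should_abandon(const struct task *t)
{
    switch (t->status) {
    case -ECONNREFUSED:
    case -EHOSTDOWN:
    case -EHOSTUNREACH:
    case -ENETUNREACH:
        /* Soft connect requests fail right away; hard ones keep retrying. */
        return (t->flags & TASK_SOFTCONN) != 0;
    default:
        /* Soft connect semantics do not apply to other errors. */
        return false;
    }
}
```

In this model a request with the flag set and a status of `-ECONNREFUSED` is abandoned, while the same status without the flag, or a different error with the flag, is not.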
    • SUNRPC: Check explicitly for tk_status == 0 in call_transmit_status() · 206a134b
      Chuck Lever authored
      The success case, where task->tk_status == 0, is by far the most
      frequent case in call_transmit_status().
      
      The default: arm of the switch statement in call_transmit_status()
      handles the 0 case.  default: was moved close to the top of the switch
      statement in call_transmit_status() under the theory that the compiler
      places object code for the earliest arms of a switch statement first,
      making the CPU do less work.
      
      The default: arm of a switch statement, however, is executed only
      after all the other cases have been checked.  Even if the compiler
      rearranges the object code, the default: arm is the "last resort",
      meaning all of the other cases have been explicitly exhausted.  That
      makes the current arrangement about as inefficient as it gets for the
      common case.
      
      To fix this, add an explicit check for zero before the switch
      statement.  That forces the compiler to do the zero check first, no
      matter what optimizations it might try to do to the switch statement.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
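
The pattern described here, testing the common success value before entering the switch, might look roughly like the following. This is a standalone illustration, not the kernel function: `enum xmit_outcome` and `classify_status()` are hypothetical, and the error cases shown are chosen only to give the switch some arms.

```c
#include <errno.h>

/* Hypothetical outcomes; these names are illustrative, not kernel symbols. */
enum xmit_outcome { XMIT_DONE, XMIT_RETRY, XMIT_FAIL };

/* Simplified status handler with the success path hoisted out of the switch. */
enum xmit_outcome classify_status(int status)
{
    /* Common case first: checked before any switch arm is considered. */
    if (status == 0)
        return XMIT_DONE;

    switch (status) {
    case -EAGAIN:
        return XMIT_RETRY;
    case -ECONNREFUSED:
    case -EHOSTUNREACH:
        return XMIT_FAIL;
    default:
        /* Remaining errors; zero can no longer reach this arm. */
        return XMIT_FAIL;
    }
}
```

Because the zero check is an ordinary `if` ahead of the switch, it runs first regardless of how the compiler lays out the switch arms.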
    • NFS: Revert default r/wsize behavior · dd47f96c
      Chuck Lever authored
      When the "rsize=" or "wsize=" mount options are not specified,
      text-based mounts have slightly different behavior than legacy binary
      mounts.  Text-based mounts use the smaller of the server's maximum
      and the client's maximum, but binary mounts use the smaller of the
      server's _preferred_ size and the client's maximum.
      
      This difference is actually pretty subtle.  Most servers advertise
      the same value as their maximum and their preferred transfer size, so
      the end result is the same in most cases.
      
      The reason for this difference is that for text-based mounts, if
      r/wsize are not specified, they are set to the largest value supported
      by the client.  For legacy mounts, the values are set to zero if these
      options are not specified.
      
      nfs_server_set_fsinfo() can negotiate the transfer size defaults
      correctly in any case.  There's no need to specify any particular
      value as default in the text-based option parsing logic.
      
      Note that nfs4 doesn't use nfs_server_set_fsinfo(), but the mount.nfs4
      command does set rsize and wsize to 0 if the user didn't specify these
      options.  So, make the same change for text-based NFSv4 mounts.
      
      Thanks to James Pearson <james-p@moving-picture.com> for reporting and
      diagnosing the problem.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
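
The difference between the two defaults can be modeled with a small clamp function. This is a sketch under stated assumptions, not the logic of nfs_server_set_fsinfo(): `negotiate_xfer_size()` and its parameters are hypothetical, and a requested size of 0 is taken to mean "option not specified", as the message describes for legacy mounts.

```c
#include <stdint.h>

/* Pick a transfer size.  requested == 0 means the mount option was not
 * given, so start from the server's preferred size; otherwise start from
 * the requested value.  Either way, clamp to both maxima. */
uint32_t negotiate_xfer_size(uint32_t requested, uint32_t server_pref,
                             uint32_t server_max, uint32_t client_max)
{
    uint32_t size = requested != 0 ? requested : server_pref;

    if (size > server_max)
        size = server_max;
    if (size > client_max)
        size = client_max;
    return size;
}
```

Passing 0 reproduces the binary-mount behavior (the smaller of the server's preferred size and the client's maximum), while passing the client's maximum as the request reproduces the old text-based behavior (the smaller of the two maxima).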
    • NFS: Display compressed (shorthand) IPv6 in /proc/mounts · d250e190
      Chuck Lever authored
      Recent changes to snprintf() introduced the %pI6c formatter, which can
      display an IPv6 address with standard shorthanding.  Use this new
      formatter when displaying IPv6 server addresses in /proc/mounts.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
    • SUNRPC: Display compressed (shorthand) IPv6 presentation addresses · dd1fd90f
      Chuck Lever authored
      Recent changes to snprintf() introduced the %pI6c formatter, which can
      display an IPv6 address with standard shorthanding.  Using a
      shorthanded address can save us a few bytes of memory for each stored
      presentation address, or a few bytes on the wire when sending these in
      a universal address.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
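
The %pI6c formatter exists only inside the kernel, but the effect of shorthanding can be illustrated in userspace with the standard inet_pton() and inet_ntop() calls, which also emit the compressed form. The helper name `compress_ipv6()` is hypothetical; this is an analogy, not the kernel code path.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>

/* Parse an IPv6 presentation address and write its shorthand form to out.
 * Returns 0 on success, -1 if the input is invalid or out is too small. */
int compress_ipv6(const char *text, char *out, size_t outlen)
{
    struct in6_addr addr;

    if (inet_pton(AF_INET6, text, &addr) != 1)
        return -1;
    if (inet_ntop(AF_INET6, &addr, out, (socklen_t)outlen) == NULL)
        return -1;
    return 0;
}
```

For example, the fully written address `2001:0db8:0000:0000:0000:0000:0000:0001` comes back as `2001:db8::1`, which shows where the saved bytes come from.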
    • NFS: reorder nfs4_sequence_regs to remove 8 bytes of padding on 64 bits · a01878aa
      Richard Kennedy authored
      Reorder nfs4_sequence_args to remove 8 bytes of padding on 64 bit
      builds.
      
      The size of this structure drops to 24 bytes from 32 and reduces the
      text size of nfs.ko.
      On my x86_64 build, size reports:
      
                    text    data   bss     dec    hex  filename
      2.6.32-rc5  200996    8512   432  209940  33414  nfs.ko
      +patch      200884    8512   432  209828  333a4  nfs.ko
      Signed-off-by: Richard Kennedy <richard@rsk.demon.co.uk>
      Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
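
How reordering members removes padding can be shown with two toy structures. The field names and types below are assumptions chosen to reproduce a 32-to-24-byte change on a typical LP64 build; they are not copied from the kernel structure.

```c
#include <stddef.h>

/* Hypothetical layout before reordering (sizes assume LP64). */
struct seq_before {
    void *session;         /* 8 bytes */
    unsigned char slotid;  /* 1 byte, then 7 bytes of padding */
    unsigned long renewal; /* 8 bytes, must be 8-byte aligned */
    int status;            /* 4 bytes, then 4 bytes of tail padding */
};

/* Same members, largest alignment first, smallest last. */
struct seq_after {
    void *session;         /* 8 bytes */
    unsigned long renewal; /* 8 bytes */
    int status;            /* 4 bytes */
    unsigned char slotid;  /* 1 byte, then 3 bytes of tail padding */
};

/* Bytes saved by the reordering on the current platform. */
size_t bytes_saved(void)
{
    return sizeof(struct seq_before) - sizeof(struct seq_after);
}
```

On x86_64 this layout should shrink by 8 bytes; on other ABIs the saving may differ, which is why the sketch computes it with `sizeof` rather than hard-coding it.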
    • NFS: convert proto= option to use netids rather than a protoname · ee671b01
      Jeff Layton authored
      Solaris uses netids as values for the proto= option, so that when
      someone specifies "tcp6" they get traffic over TCP + IPv6. Until
      recently, this has never really been an issue for Linux since it didn't
      support NFS over IPv6. The netid and the protocol name were generally
      always the same (modulo any strange configuration in /etc/netconfig).
      
      The Solaris manpage documents its proto= option as:
      
          proto= _netid_ | rdma
      
      This patch is intended to bring Linux closer to how the Solaris proto=
      option works, by declaring a static netid mapping in the kernel and
      converting the proto= and mountproto= options to follow it and display
      the proper values in /proc/mounts.
      
      Much of this functionality will need to be provided by a userspace
      mount.nfs patch. Chuck Lever has a patch to change mount.nfs in
      the same way. In principle, we could do *all* of this in userspace but
      that would mean that the options in /proc/mounts may not match the
      options used by userspace.
      
      The alternative to the static mapping here is to add a mechanism to
      upcall to userspace for netids. I'm not opposed to that option, but
      it'll probably mean more overhead (and quite a bit more code). Rather
      than shoot for that at first, I figured it was probably better to
      start simply.
      
      Comments welcome.
      Signed-off-by: Jeff Layton <jlayton@redhat.com>
      Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
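
A static netid mapping of the kind described could be sketched as a small lookup table. This is an illustration only: `struct netid_entry`, `netid_table` and `netid_lookup()` are hypothetical names, and the four entries are the conventional netids, assumed here rather than taken from the patch.

```c
#include <netinet/in.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>

/* One netid and the address family / transport protocol it implies. */
struct netid_entry {
    const char *netid;
    int family;
    int protocol;
};

/* Assumed table of conventional netids; the real patch may differ. */
static const struct netid_entry netid_table[] = {
    { "udp",  AF_INET,  IPPROTO_UDP },
    { "udp6", AF_INET6, IPPROTO_UDP },
    { "tcp",  AF_INET,  IPPROTO_TCP },
    { "tcp6", AF_INET6, IPPROTO_TCP },
};

/* Return the entry for a proto= value, or NULL if it is not a known netid. */
const struct netid_entry *netid_lookup(const char *name)
{
    size_t i;

    for (i = 0; i < sizeof(netid_table) / sizeof(netid_table[0]); i++) {
        if (strcmp(netid_table[i].netid, name) == 0)
            return &netid_table[i];
    }
    return NULL;
}
```

Looking up "tcp6" in this table yields TCP over IPv6, matching the Solaris behavior the message refers to; an upcall-based design would replace the table with a query to userspace.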
    • J. Bruce Fields's avatar
    • NFS: BKL removal from the mount code... · 96f287b0
      Trond Myklebust authored
      None of the code in nfs_umount_begin() or nfs_remount() has any BKL
      dependency.
      Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
    • Linux 2.6.32 · 22763c5c
      Linus Torvalds authored
  2. 02 Dec, 2009 19 commits