- 01 Jun, 2009 2 commits
-
-
Yu Zhiguo authored
J. Bruce Fields wrote:
...
> (This is extremely confusing code to track down: note that
> proc->pc_decode is set to nfs4svc_decode_compoundargs() by the PROC()
> macro at the end of fs/nfsd/nfs4proc.c. Which means, for example, that
> grepping for nfs4svc_decode_compoundargs() gets you nowhere. Patches to
> kill off that macro would be welcomed....)

The 'PROC' macro is complicated and obscure; it is better to kill it off to make the code clearer.

Signed-off-by: Yu Zhiguo <yuzg@cn.fujitsu.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
-
Yu Zhiguo authored
The server should return NFS4ERR_ATTRNOTSUPP if a specified attribute is not supported in the current environment. The operations CREATE, NVERIFY, OPEN, SETATTR and VERIFY should do this check.

This bug was found while running the newpynfs tests. The following tests failed:
CR12 NVF7a NVF7b NVF7c NVF7d NVF7f NVF7r NVF7s OPEN15 VF7a VF7b VF7c VF7d VF7f VF7r VF7s

Add function do_check_fattr() to do the exact check:
1. Check whether the specified attribute is supported by the NFSv4 server.
2. Check whether FATTR4_WORD0_ACL and FATTR4_WORD0_FS_LOCATIONS are supported in the current environment.
3. Check whether the specified attribute is writable.

Steps 1 and 3 were previously done in nfsd4_decode_fattr() and have now been moved into this function.

Signed-off-by: Yu Zhiguo <yuzg@cn.fujitsu.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
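The three checks listed for do_check_fattr() could look roughly like the following. This is a minimal user-space sketch, not the kernel function: `check_fattr_word0`, `struct attr_env`, and the `CHECK_*` result codes are hypothetical names, only the first bitmap word is handled, and returning an "invalid" result for a read-only attribute is an assumption that the commit message does not state.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative result codes, not the kernel's nfserr_* values. */
enum check_result { CHECK_OK, CHECK_ATTRNOTSUPP, CHECK_INVAL };

/* Bit positions follow NFSv4 attribute numbers 12 (acl) and 24 (fs_locations). */
#define WORD0_ACL          (1u << 12)
#define WORD0_FS_LOCATIONS (1u << 24)

/* Hypothetical description of what the server supports right now. */
struct attr_env {
    uint32_t supported;          /* attributes the server implements */
    uint32_t writable;           /* subset a client may set */
    bool acl_enabled;            /* ACL support available in this environment */
    bool fs_locations_enabled;   /* fs_locations available in this environment */
};

static enum check_result check_fattr_word0(uint32_t requested, bool setting,
                                           const struct attr_env *env)
{
    /* Step 1: every requested bit must be implemented by the server. */
    if (requested & ~env->supported)
        return CHECK_ATTRNOTSUPP;

    /* Step 2: some attributes also depend on the current environment. */
    if ((requested & WORD0_ACL) && !env->acl_enabled)
        return CHECK_ATTRNOTSUPP;
    if ((requested & WORD0_FS_LOCATIONS) && !env->fs_locations_enabled)
        return CHECK_ATTRNOTSUPP;

    /* Step 3: when setting attributes, every bit must be writable. */
    if (setting && (requested & ~env->writable))
        return CHECK_INVAL;

    return CHECK_OK;
}
```

Keeping all three checks in one helper means each of the listed operations can call it once instead of repeating partial checks in the decoder.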
-
- 27 May, 2009 3 commits
-
-
Greg Banks authored
The file nfsfh.c contains two static variables nfsd_nr_verified and nfsd_nr_put. These are counters which are incremented as a side effect of the fh_verify(), fh_compose() and fh_put() operations, i.e. at least twice per NFS call for any non-trivial workload. Needless to say this makes the cacheline that contains them (and any other innocent victims) a very hot contention point indeed under high call-rate workloads on a multiprocessor NFS server.

It also turns out that these counters are not used anywhere. They're not reported to userspace, they're not used in logic, they're not even exported from the object file (let alone the module). All they do is waste CPU time. So this patch removes them.

Tests on a 16 CPU Altix A4700 with 2 10gige Myricom cards, configured separately (no bonding). Workload is 640 client threads doing directory traversals with random small reads, from server RAM.

Before
======

Kernel profile:

  %   cumulative     self               self    total
 time   samples    samples    calls   1/call   1/call  name
 6.05   2716.00    2716.00    30406     0.09     1.02  svc_process
 4.44   4706.00    1990.00     1975     1.01     1.01  spin_unlock_irqrestore
 3.72   6376.00    1670.00     1666     1.00     1.00  svc_export_put
 3.41   7907.00    1531.00     1786     0.86     1.02  nfsd_ofcache_lookup
 3.25   9363.00    1456.00    10965     0.13     1.01  nfsd_dispatch
 3.10  10752.00    1389.00     1376     1.01     1.01  nfsd_cache_lookup
 2.57  11907.00    1155.00     4517     0.26     1.03  svc_tcp_recvfrom
 ...
 2.21  15352.00    1003.00     1081     0.93     1.00  nfsd_choose_ofc  <----
 ^^^^

Here the function nfsd_choose_ofc() reads a global variable which by accident happened to be located in the same cacheline as nfsd_nr_verified.

Call rate:

nullarbor:~ # pmdumptext nfs3.server.calls
...
Thu Dec 13 00:15:27 184780.663
Thu Dec 13 00:15:28 184885.881
Thu Dec 13 00:15:29 184449.215
Thu Dec 13 00:15:30 184971.058
Thu Dec 13 00:15:31 185036.052
Thu Dec 13 00:15:32 185250.475
Thu Dec 13 00:15:33 184481.319
Thu Dec 13 00:15:34 185225.737
Thu Dec 13 00:15:35 185408.018
Thu Dec 13 00:15:36 185335.764

After
=====

kernel profile:

  %   cumulative     self               self    total
 time   samples    samples    calls   1/call   1/call  name
 6.33   2813.00    2813.00    29979     0.09     1.01  svc_process
 4.66   4883.00    2070.00     2065     1.00     1.00  spin_unlock_irqrestore
 4.06   6687.00    1804.00     2182     0.83     1.00  nfsd_ofcache_lookup
 3.20   8110.00    1423.00    10932     0.13     1.00  nfsd_dispatch
 3.03   9456.00    1346.00     1343     1.00     1.00  nfsd_cache_lookup
 2.62  10622.00    1166.00     4645     0.25     1.01  svc_tcp_recvfrom
 [...]
 0.10  42586.00      44.00       74     0.59     1.00  nfsd_choose_ofc  <--- HA!!
 ^^^^

Call rate:

nullarbor:~ # pmdumptext nfs3.server.calls
...
Thu Dec 13 01:45:28 194677.118
Thu Dec 13 01:45:29 193932.692
Thu Dec 13 01:45:30 194294.364
Thu Dec 13 01:45:31 194971.276
Thu Dec 13 01:45:32 194111.207
Thu Dec 13 01:45:33 194999.635
Thu Dec 13 01:45:34 195312.594
Thu Dec 13 01:45:35 195707.293
Thu Dec 13 01:45:36 194610.353
Thu Dec 13 01:45:37 195913.662
Thu Dec 13 01:45:38 194808.675

i.e. about a 5.3% improvement in call rate.

Signed-off-by: Greg Banks <gnb@melbourne.sgi.com>
Reviewed-by: David Chinner <dgc@sgi.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
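The effect described above, where an unrelated read-mostly variable suffers because it shares a cacheline with frequently written counters, can be illustrated with a layout sketch. The patch itself simply removes the counters; the code below only shows the sharing problem and one generic way to avoid it when a counter must stay. The struct and field names are hypothetical and the 64-byte line size is an assumption (the real size depends on the machine).

```c
#include <stddef.h>

/* Assumed cacheline size for illustration; real hardware may differ. */
#define CACHELINE 64

/* Problem layout: the hot counter and the read-mostly value are adjacent,
 * so every counter increment invalidates the line that readers need. */
struct shared_bad {
    unsigned long nr_verified;   /* written on every call */
    unsigned long ofc_policy;    /* only read, but shares the line */
};

/* One generic mitigation: force the read-mostly value onto its own line. */
struct shared_padded {
    unsigned long nr_verified;
    _Alignas(CACHELINE) unsigned long ofc_policy;
};
```

Removing an unused counter, as this commit does, is cheaper than padding because it eliminates the writes entirely.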
-
Greg Banks authored
Fix a regression in the reply cache introduced when the code was converted to use proper Linux lists. When a new entry needs to be inserted, the case where all the entries are currently being used by threads is not correctly detected. This can result in memory corruption and a crash. In the current code this is an extremely unlikely corner case; it would require the machine to have 1024 nfsd threads and all of them to be busy at the same time. However, upcoming reply cache changes make this more likely; a crash due to this problem was actually observed in the field. Signed-off-by: Greg Banks <gnb@sgi.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
-
Greg Banks authored
Make REQHASH() an inline function. Rename hash_list to cache_hash. Fix an obsolete comment. Signed-off-by: Greg Banks <gnb@sgi.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
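Converting a hashing macro such as REQHASH() into an inline function might look like the sketch below. The table size, the exact hash expression, and the name `request_hash` are assumptions for illustration, not taken from this commit.

```c
#include <stdint.h>

/* Assumed table size for illustration; must be a power of two. */
#define HASHSIZE 64

/* Macro form: the argument is expanded twice, so an argument with
 * side effects would be evaluated twice, and there is no type checking. */
#define REQHASH(xid) ((((xid) >> 24) ^ (xid)) & (HASHSIZE - 1))

/* Inline-function form: the argument is evaluated once and has a type. */
static inline uint32_t request_hash(uint32_t xid)
{
    uint32_t h = xid;

    h ^= (xid >> 24);
    return h & (HASHSIZE - 1);
}
```

Both forms compute the same bucket index; the function is easier to read and to find with grep.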
-
- 13 May, 2009 1 commit
-
-
Randy Dunlap authored
lockd/svclock.c is missing a header file <linux/fs.h>. <linux/fs.h> is missing a definition of locks_release_private() for the config case of FILE_LOCKING=n, causing a build error: fs/lockd/svclock.c:330: error: implicit declaration of function 'locks_release_private' lockd without FILE_LOCKING doesn't make sense, so make LOCKD and LOCKD_V4 depend on FILE_LOCKING, and make NFS depend on FILE_LOCKING. Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
-
- 06 May, 2009 1 commit
-
-
Wang Chen authored
Save some loop time. Signed-off-by: Wang Chen <wangchen@cn.fujitsu.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
-
- 03 May, 2009 2 commits
-
-
Randy Dunlap authored
Eliminate 56 sparse warnings like this one: fs/nfsd/nfs4xdr.c:1331:15: warning: obsolete array initializer, use C99 syntax Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Cc: Neil Brown <neilb@suse.de> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
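The warning refers to the old GNU form of an indexed array initializer, which omits the `=` sign. A minimal sketch of the difference follows; the operation numbers, the `decoder_fn` type, and the decoder functions are hypothetical stand-ins rather than the contents of fs/nfsd/nfs4xdr.c.

```c
/* Hypothetical operation numbers for illustration. */
enum { OP_ACCESS = 3, OP_CLOSE = 4, OP_MAX = 5 };

typedef int (*decoder_fn)(void);

static int decode_access(void) { return OP_ACCESS; }
static int decode_close(void)  { return OP_CLOSE; }

/*
 * Obsolete GNU syntax that sparse warns about:
 *     [OP_ACCESS] decode_access,
 * C99 designated-initializer syntax adds the '=':
 */
static const decoder_fn decoders[OP_MAX] = {
    [OP_ACCESS] = decode_access,
    [OP_CLOSE]  = decode_close,
};
```

Slots that are not named in the initializer are zero-initialized, so unlisted operations end up with a null function pointer.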
-
J. Bruce Fields authored
As with the probe, this removes the need for another kthread. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
-
- 02 May, 2009 1 commit
-
-
J. Bruce Fields authored
Move this out of a local variable into the nfs4_delegation object in preparation for making this an async rpc call (at which point we'll need any state like this in a common object that's preserved across function calls). Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
-
- 01 May, 2009 3 commits
-
-
J. Bruce Fields authored
There's no point in keeping this field around--it's always zero. (Background: the protocol allows you to tell the client that the file is about to be truncated, as an optimization to save the client from writing back dirty pages that will just be discarded. We don't implement this hint. If we do some day, adding this field back in will be the least of the work involved.) Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
-
J. Bruce Fields authored
The nfs4_cb_recall struct is used only in nfs4_delegation, so its pointer to the containing delegation is unnecessary--we could just use container_of(). But there's no real reason to have this a separate struct at all--just move these fields to nfs4_delegation. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
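The container_of() alternative mentioned above recovers the enclosing object from a pointer to one of its members, which is why a stored back pointer is redundant. A user-space sketch, with a simplified `container_of` definition and hypothetical struct layouts (the real nfs4_delegation has different fields):

```c
#include <stddef.h>

/* Simplified user-space version of the kernel macro. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical, reduced stand-ins for the kernel structures. */
struct cb_recall {
    int ident;
};

struct delegation {
    int dl_type;
    struct cb_recall dl_recall;   /* embedded, so no back pointer is needed */
};

/* Recover the delegation that embeds a given recall structure. */
static struct delegation *recall_to_delegation(struct cb_recall *cbr)
{
    return container_of(cbr, struct delegation, dl_recall);
}
```

Moving the fields directly into the containing struct, as the commit does, removes even the need for this lookup.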
-
J. Bruce Fields authored
I want to use the name for a struct that actually does represent a single callback. (Actually, I've never been sure it helps to have a separate struct for the callback information. Some day maybe those fields could just be dumped into struct nfs4_client. I don't know.) Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
-
- 29 Apr, 2009 10 commits
-
-
J. Bruce Fields authored
We don't really need a synchronous rpc, and moving to an asynchronous rpc allows us to do without this extra kthread. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
-
J. Bruce Fields authored
Look up the callback cred once and then use it for all subsequent callbacks. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
-
J. Bruce Fields authored
The code is a little simpler, and it should be easier to avoid races, if we just do all rpc client creation/destruction from nfsd or laundromat threads and do only the rpc calls themselves asynchronously. The rpc creation doesn't involve any significant waiting (it doesn't call the client, for example), so there's no reason not to do this. Also don't bother destroying the client on failure of the rpc null probe. We may want to retry the probe later anyway. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
-
J. Bruce Fields authored
This is just a minor code simplification. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
-
J. Bruce Fields authored
We tried to do something overly complicated with the callback rpc timeouts here. And they're wrong--the result is that by the time a single callback times out, it's already too late to tell the client (using the cb_path_down return to RENEW) that the callback is down. Use a much shorter, simpler timeout. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
-
J. Bruce Fields authored
This setclientid_confirm case should allow the client to change callbacks, but it currently has a dummy implementation that just turns off callbacks completely. That dummy implementation isn't completely correct either, though: - There's no need to remove any client recovery directory in this case. - New clientid confirm verifiers should be generated (and returned) in setclientid; there's no need to generate a new one here. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
-
J. Bruce Fields authored
Stephen Rothwell said: "Today's linux-next build (powerpc ppc64_defconfig) produced this new warning: fs/nfsd/nfs4state.c: In function 'EXPIRED_STATEID': fs/nfsd/nfs4state.c:2757: warning: comparison of distinct pointer types lacks a cast Caused by commit 78155ed7 ("nfsd4: distinguish expired from stale stateids")." Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
-
J. Bruce Fields authored
ext4 supports a real NFSv4 change attribute, which is bumped whenever the ctime would be updated, including times when two updates arrive within a jiffy of each other. (Note that although ext4 has space for nanosecond-precision ctime, the real resolution is lower: it actually uses jiffies as the time-source.) This ensures clients will invalidate their caches when they need to. There is some fear that keeping the i_version up-to-date could have performance drawbacks, so for now it's turned on only by a mount option. We hope to do something better eventually. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Cc: Theodore Tso <tytso@mit.edu>
-
J. Bruce Fields authored
We don't need comments to tell us these macros are ugly. And we're long past trying to share any of this code with the BSDs. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
-
J. Bruce Fields authored
This macro doesn't serve any useful purpose. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
-
- 28 Apr, 2009 17 commits
-
-
Chuck Lever authored
Clean up: For consistency, handle output buffer size checking in other nfsctl functions the same way it's done for write_versions(). Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
-
Chuck Lever authored
While it's not likely today that there are enough NFS versions to overflow the output buffer in write_versions(), we should be more careful about detecting the end of the buffer. The number of NFS versions will only increase as NFSv4 minor versions are added. Note that this API doesn't behave the same as portlist. Here we attempt to display as many versions as will fit in the buffer, and do not provide any indication that an overflow would have occurred. I don't have any good rationale for that. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
-
Chuck Lever authored
While it's not likely a pathname will be longer than SIMPLE_TRANSACTION_SIZE, we should be more careful about just plopping it into the output buffer without bounds checking. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
-
Chuck Lever authored
Clean up svc_one_sock_name() by setting up automatic variables for frequently used expressions. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
-
Chuck Lever authored
Add an arm to the switch statement in svc_one_sock_name() so it can construct the name of PF_INET6 sockets properly. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Cc: Aime Le Rouzic <aime.le-rouzic@bull.net> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
-
Chuck Lever authored
Use snprintf() in one_sock_name() to prevent overflowing the output buffer. If the name doesn't fit in the buffer, the buffer is filled in with an empty string, and -ENAMETOOLONG is returned. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
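The behaviour described here, bounded formatting that yields an empty string and -ENAMETOOLONG when the name does not fit, can be sketched in user space as follows. The function name `format_sock_name`, its parameters, and the output format are hypothetical; the kernel's one_sock_name() takes a socket and uses its own format.

```c
#include <errno.h>
#include <stddef.h>
#include <stdio.h>

/* Format a socket name into buf without writing past 'remaining' bytes.
 * Returns the number of characters written, or -ENAMETOOLONG (with buf
 * set to an empty string) if the name would not fit. */
static int format_sock_name(char *buf, size_t remaining,
                            const char *proto, const char *addr,
                            unsigned int port)
{
    int len = snprintf(buf, remaining, "%s %s %u\n", proto, addr, port);

    /* snprintf() returns the length it wanted to write, so a value
     * >= remaining means the output was truncated. */
    if (len < 0 || (size_t)len >= remaining) {
        if (remaining > 0)
            buf[0] = '\0';
        return -ENAMETOOLONG;
    }
    return len;
}
```

Returning an empty string on failure keeps the caller from printing a half-written name.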
-
Chuck Lever authored
Adjust the synopsis of svc_sock_names() to pass in the size of the output buffer. Add a documenting comment. This is a cosmetic change for now. A subsequent patch will make sure the buffer length is passed to one_sock_name(), where the length will actually be useful. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
-
Chuck Lever authored
Adjust the synopsis of svc_addsock() to pass in the size of the output buffer. Add a documenting comment. This is a cosmetic change for now. A subsequent patch will make sure the buffer length is passed to one_sock_name(), where the length will actually be useful. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
-
Chuck Lever authored
The svc_xprt_names() function can overflow its buffer if it's so near the end of the passed in buffer that the "name too long" string still doesn't fit. Of course, it could never tell if it was near the end of the passed in buffer, since its only caller passes in zero as the buffer length. Let's make this API a little safer. Change svc_xprt_names() so it *always* checks for a buffer overflow, and change its only caller to pass in the correct buffer length. If svc_xprt_names() does overflow its buffer, it now fails with an ENAMETOOLONG errno, instead of trying to write a message at the end of the buffer. I don't like this much, but I can't figure out a clean way that's always safe to return some of the names, *and* an indication that the buffer was not long enough. The displayed error when doing a 'cat /proc/fs/nfsd/portlist' is "File name too long". Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
-
Chuck Lever authored
Clean up. A couple of years ago, a series of commits, finishing with commit 5680c446, swapped the order of the lockd_up() and svc_addsock() calls in __write_ports(). At that time lockd_up() needed to know the transport protocol of the passed-in socket to start a listener on the same transport protocol. These days, lockd_up() doesn't take a protocol argument; it always starts both a UDP and TCP listener. It's now more straightforward to try the lockd_up() first, then do a lockd_down() if the svc_addsock() fails. Careful review of this code shows that the svc_sock_names() call is used only to close the just-opened socket in case lockd_up() fails. So it is no longer needed if lockd_up() is done first. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
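The reordering described above (start lockd first, then add the socket, and undo the lockd reference if adding the socket fails) follows a common acquire-and-unwind pattern. A user-space sketch with stub functions; `fake_lockd_up`, `fake_lockd_down`, `fake_addsock`, and `add_fd` are hypothetical stand-ins, not the kernel functions:

```c
#include <errno.h>

/* Stub state standing in for lockd's user count. */
static int lockd_users;

static int fake_lockd_up(void)    { lockd_users++; return 0; }
static void fake_lockd_down(void) { lockd_users--; }

/* Stub: pretend that negative descriptors cannot be added. */
static int fake_addsock(int fd)   { return fd >= 0 ? 0 : -EBADF; }

/* Take the lockd reference first, then add the socket. If the second
 * step fails, only the lockd reference needs to be dropped; there is
 * no half-opened socket to look up and close. */
static int add_fd(int fd)
{
    int err = fake_lockd_up();

    if (err)
        return err;

    err = fake_addsock(fd);
    if (err < 0) {
        fake_lockd_down();
        return err;
    }
    return 0;
}
```

With this ordering the error path has a single undo step, which is what makes the extra socket lookup unnecessary.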
-
Chuck Lever authored
Clean up: Refactor transport name listing out of __write_ports() to make it easier to understand and maintain. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
-
Chuck Lever authored
User space must call listen(3) on SOCK_STREAM sockets passed into /proc/fs/nfsd/portlist, otherwise that listener is ignored. Document this. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
-
Chuck Lever authored
Clean up: Refactor the socket creation logic out of __write_ports() to make it easier to understand and maintain. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
-
Chuck Lever authored
Clean up: Refactor the socket closing logic out of __write_ports() to make it easier to understand and maintain. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
-
Chuck Lever authored
Clean up: Refactor transport addition out of __write_ports() to make it easier to understand and maintain. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
-
Chuck Lever authored
Clean up: Refactor transport removal out of __write_ports() to make it easier to understand and maintain. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
-
Chuck Lever authored
The svc_addr_len() helper function returns -EAFNOSUPPORT if it doesn't recognize the address family of the passed-in socket address. However, the return type of this function is size_t, which means -EAFNOSUPPORT is turned into a very large positive value in this case. The check in svc_udp_recvfrom() to see if the return value is less than zero therefore won't work at all. Additionally, handle_connect_req() passes this value directly to memset(). This could cause memset() to clobber a large chunk of memory if svc_addr_len() has returned an error. Currently the address family of these addresses, however, is known to be supported long before handle_connect_req() is called, so this isn't a real risk. Change the error return value of svc_addr_len() to zero, which fits in the range of size_t, and is safer to pass to memset() directly. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
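The pitfall described above is easy to reproduce in user space: a negative errno returned through a size_t becomes a huge positive number, so a "less than zero" test can never fire. The sketch below uses hypothetical names (`addr_len_buggy`, `addr_len_fixed`) and is not the kernel helper itself.

```c
#include <errno.h>
#include <stddef.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* Buggy shape: the negative errno wraps around to a very large size_t. */
static size_t addr_len_buggy(const struct sockaddr *sa)
{
    switch (sa->sa_family) {
    case AF_INET:
        return sizeof(struct sockaddr_in);
    case AF_INET6:
        return sizeof(struct sockaddr_in6);
    }
    return (size_t)-EAFNOSUPPORT;
}

/* Fixed shape: zero is representable in size_t and harmless as a
 * length argument to memset(). */
static size_t addr_len_fixed(const struct sockaddr *sa)
{
    switch (sa->sa_family) {
    case AF_INET:
        return sizeof(struct sockaddr_in);
    case AF_INET6:
        return sizeof(struct sockaddr_in6);
    }
    return 0;
}
```

Callers of the fixed version test for a zero length instead of a negative one.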
-