- 04 Jun, 2018 1 commit
-
-
Trond Myklebust authored
When we hold a delegation, we should not need to request attributes such as the file size or the change attribute. For some servers, avoiding asking for these unneeded attributes can improve the overall system performance. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
-
- 31 May, 2018 38 commits
-
-
Trond Myklebust authored
If the server recalls the layout that was just handed out, we risk hitting a race as described in RFC5661 Section 2.10.6.3 unless we ensure that we release the sequence slot after processing the LAYOUTGET operation that was sent as part of the OPEN compound. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
-
Trond Myklebust authored
If the layoutget on open call failed, we can't really commit the inode, so don't bother calling it. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
-
Trond Myklebust authored
If we're only opening the file for reading, and the file is empty and/or we already have cached data, then heuristically optimise away the LAYOUTGET. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
-
Trond Myklebust authored
Ensure that we only switch off the LAYOUTGET operation in the OPEN compound when the server is truly broken, and/or it is complaining that the compound is too large. Currently, we end up turning off the functionality permanently, even for transient errors such as EACCES or ENOSPC. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
-
Trond Myklebust authored
We need to ensure that pnfs_parse_lgopen() doesn't try to parse a struct nfs4_layoutget_res that was not filled by a successful call to decode_layoutget(). This can happen if we performed a cached open, or if either the OP_ACCESS or OP_GETATTR operations preceding the OP_LAYOUTGET in the compound returned an error. By initialising the 'status' field to NFS4ERR_DELAY, we ensure that pnfs_parse_lgopen() won't try to interpret the structure. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
-
Fred Isaman authored
The flag was not always being cleared after LAYOUTGET on OPEN. Signed-off-by: Fred Isaman <fred.isaman@gmail.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
-
Fred Isaman authored
Since the LAYOUTGET on OPEN can be sent without prior inode information, existing methods to prevent LAYOUTGET from being sent while processing CB_LAYOUTRECALL don't work. Track if a recall occurred while LAYOUTGET was being sent, and if so ignore the results. Signed-off-by: Fred Isaman <fred.isaman@gmail.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
-
Fred Isaman authored
Signed-off-by: Fred Isaman <fred.isaman@gmail.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
-
Fred Isaman authored
Signed-off-by: Fred Isaman <fred.isaman@gmail.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
-
Trond Myklebust authored
Move the actual freeing of the struct nfs4_layoutget into fs/nfs/pnfs.c where it can be reused by the layoutget on open code. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
-
Fred Isaman authored
This triggers when have no pre-existing inode to attach to. The preexisting case is saved for later. Signed-off-by: Fred Isaman <fred.isaman@gmail.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
-
Fred Isaman authored
Don't send in a layout, instead use the (possibly NULL) inode. This is needed for LAYOUTGET attached to an OPEN where the inode is not yet set. Signed-off-by: Fred Isaman <fred.isaman@gmail.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
-
Fred Isaman authored
It will be needed now by the pnfs code. Signed-off-by: Fred Isaman <fred.isaman@gmail.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
-
Fred Isaman authored
Signed-off-by: Fred Isaman <fred.isaman@gmail.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
-
Fred Isaman authored
They work better in the new alloc_init function. Signed-off-by: Fred Isaman <fred.isaman@gmail.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
-
Fred Isaman authored
Pull out the alloc/init part for eventual reuse by OPEN. Signed-off-by: Fred Isaman <fred.isaman@gmail.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
-
Fred Isaman authored
Driver can set flag to allow LAYOUTGET to be sent with OPEN. Signed-off-by: Fred Isaman <fred.isaman@gmail.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
-
Fred Isaman authored
Preparing to add conditional LAYOUTGET to OPEN rpc, the LAYOUTGET will need the ctx info. Signed-off-by: Fred Isaman <fred.isaman@gmail.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
-
Fred Isaman authored
This will be needed to seperate return value of OPEN and LAYOUTGET when they are combined into a single RPC. Signed-off-by: Fred Isaman <fred.isaman@gmail.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
-
Fred Isaman authored
nfs_init_sequence() will clear this for us. Signed-off-by: Fred Isaman <fred.isaman@gmail.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
-
Benjamin Coddington authored
If the wait for a LOCK operation is interrupted, and then the file is closed, the locks cleanup code will assume that no new locks will be added to the inode after it has completed. We already have a mechanism to detect if there was signal, so let's use that to avoid recreating the local lock once the RPC completes. Also skip re-sending the LOCK operation for the various error cases if we were signaled. Signed-off-by: Benjamin Coddington <bcodding@redhat.com> [Trond: Fix inverted test of locks_lock_inode_wait()] Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
-
Trond Myklebust authored
If we get an ESTALE error in response to an RPC call operating on the file on the MDS, we should immediately cancel the layout for that file. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
-
Dave Wysochanski authored
In nfs_idmap_read_and_verify_message there is an incorrect sprintf '%d' that converts the __u32 'im_id' from struct idmap_msg to 'id_str', which is a stack char array variable of length NFS_UINT_MAXLEN == 11. If a uid or gid value is > 2147483647 = 0x7fffffff, the conversion overflows into a negative value, for example: crash> p (unsigned) (0x80000000) $1 = 2147483648 crash> p (signed) (0x80000000) $2 = -2147483648 The '-' sign is written to the buffer and this causes a 1 byte overflow when the NULL byte is written, which corrupts kernel stack memory. If CONFIG_CC_STACKPROTECTOR_STRONG is set we see a stack-protector panic: [11558053.616565] Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: ffffffffa05b8a8c [11558053.639063] CPU: 6 PID: 9423 Comm: rpc.idmapd Tainted: G W ------------ T 3.10.0-514.el7.x86_64 #1 [11558053.641990] Hardware name: Red Hat OpenStack Compute, BIOS 1.10.2-3.el7_4.1 04/01/2014 [11558053.644462] ffffffff818c7bc0 00000000b1f3aec1 ffff880de0f9bd48 ffffffff81685eac [11558053.646430] ffff880de0f9bdc8 ffffffff8167f2b3 ffffffff00000010 ffff880de0f9bdd8 [11558053.648313] ffff880de0f9bd78 00000000b1f3aec1 ffffffff811dcb03 ffffffffa05b8a8c [11558053.650107] Call Trace: [11558053.651347] [<ffffffff81685eac>] dump_stack+0x19/0x1b [11558053.653013] [<ffffffff8167f2b3>] panic+0xe3/0x1f2 [11558053.666240] [<ffffffff811dcb03>] ? kfree+0x103/0x140 [11558053.682589] [<ffffffffa05b8a8c>] ? idmap_pipe_downcall+0x1cc/0x1e0 [nfsv4] [11558053.689710] [<ffffffff810855db>] __stack_chk_fail+0x1b/0x30 [11558053.691619] [<ffffffffa05b8a8c>] idmap_pipe_downcall+0x1cc/0x1e0 [nfsv4] [11558053.693867] [<ffffffffa00209d6>] rpc_pipe_write+0x56/0x70 [sunrpc] [11558053.695763] [<ffffffff811fe12d>] vfs_write+0xbd/0x1e0 [11558053.702236] [<ffffffff810acccc>] ? task_work_run+0xac/0xe0 [11558053.704215] [<ffffffff811fec4f>] SyS_write+0x7f/0xe0 [11558053.709674] [<ffffffff816964c9>] system_call_fastpath+0x16/0x1b Fix this by calling the internally defined nfs_map_numeric_to_string() function which properly uses '%u' to convert this __u32. For consistency, also replace the one other place where snprintf is called. Signed-off-by: Dave Wysochanski <dwysocha@redhat.com> Reported-by: Stephen Johnston <sjohnsto@redhat.com> Fixes: cf4ab538 ("NFSv4: Fix the string length returned by the idmapper") Cc: stable@vger.kernel.org # v3.4+ Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
-
Trond Myklebust authored
We do not want to ignore ctime updates that originate from functions such as link(). Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
-
Trond Myklebust authored
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
-
Trond Myklebust authored
We may need to revalidate the change attribute, ctime and the nlinks count. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
-
Trond Myklebust authored
Ensure that a delegation doesn't cause us to skip initialising the inode if it was incomplete when we exited nfs_fhget() Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
-
Trond Myklebust authored
Ensure that we register the fact that the inode ctime has changed. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
-
Trond Myklebust authored
Ensure that we pass down the inode of the file being deleted so that we can return any delegation being held. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
-
Trond Myklebust authored
Even then it isn't really necessary. The reason why we may not want to pass in a stateid in other cases is that we cannot use the delegation credential. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
-
Anna Schumaker authored
Having these exist as two functions doesn't seem to add anything useful, and I think merging them together makes this easier to follow. Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
-
Anna Schumaker authored
We currently have a separate function just to set this, but I think it makes more sense to set it at the same time as the other values in nfs4_init_sequence() Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
-
Anna Schumaker authored
Rather than doing this in the generic NFS client code. Let's put this with the other v4 stuff so it's all in one place. Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
-
Anna Schumaker authored
This doesn't really need to be in the generic NFS client code, and I think it makes more sense to keep the v4 code in one place. Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
-
NeilBrown authored
There are three places that walk all delegation for an nfs_client and restart whenever they find something interesting - potentially resulting in a quadratic search: If there are 10,000 uninteresting delegations followed by 10,000 interesting one, then the code skips over 100,000,000 delegations, which can take a noticeable amount of time. Of these nfs_delegation_reap_unclaimed() and nfs_reap_expired_delegations() are only called during unusual events: a server reboots or reports expired delegations, probably due to a network partition. Optimizing these is not particularly important. The third, nfs_client_return_marked_delegations(), is called periodically via nfs_expire_unreferenced_delegations(). It could cause periodic problems on a busy server. New delegations are added to the end of the list, so if there are 10,000 open files with delegations, and 10,000 more recently opened files that received delegations but are now closed, then nfs_client_return_marked_delegations() can take seconds to skip over the 10,000 open files 10,000 times. That is a waste of time. The avoid this waste a place-holder (an inode) is kept when locks are dropped, so that the place can usually be found again after taking rcu_readlock(). This place holder ensure that we find the right starting point in the list of nfs_servers, and makes is probable that we find the right starting point in the list of delegations. We might need to occasionally restart at the head of that list. It might be possible that the place_holder inode could lose its delegation separately, and then get a new one using the same (freed and then reallocated) 'struct nfs_delegation'. Were this to happen, the new delegation would be at the end of the list and we would miss returning some other delegations. This would have the effect of unnecessarily delaying the return of some unused delegations until the next time this function is called - typically 90 seconds later. As this is not a correctness issue and is vanishingly unlikely to happen, it does not seem worth addressing. Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
-
NeilBrown authored
list_for_each_entry_from_rcu() is an RCU version of list_for_each_entry_from(). It walks a linked list under rcu protection, from a given start point. It is similar to list_for_each_entry_continue_rcu() but starts *at* the given position rather than *after* it. Naturally, the start point must be known to be in the list. Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
-
NeilBrown authored
In three places we walk the list of delegations for an nfs_client until an interesting one is found, then we act of that delegation and restart the walk. New delegations are added to the end of a list and the interesting delegations are usually old, so in many case we won't repeat a long walk over and over again, but it is possible - particularly if the first server in the list has a large number of uninteresting delegations. In each cache the work done on interesting delegations will often complete without sleeping, so this could loop many times without giving up the CPU. So add a cond_resched() at an appropriate point to avoid hogging the CPU for too long. Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
-
NeilBrown authored
There are 3 places where we walk the list of delegations for an nfs_client. In each case there are two nested loops, one for nfs_servers and one for nfs_delegations. When we find an interesting delegation we try to get an active reference to the server. If that fails, it is pointless to continue to look at the other delegation for the server as we will never be able to get an active reference. So instead of continuing in the inner loop, break out and continue in the outer loop. Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
-
- 28 May, 2018 1 commit
-
-
Trond Myklebust authored
We can optimise away any lookup for a rename target, unless we're being asked to revalidate a dentry that might be in use. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
-