- 21 Feb, 2019 1 commit
-
-
Trond Myklebust authored
A 'false retry' in NFSv4.1 occurs when the client attempts to transmit a new RPC call using a slot+sequence number combination that references an already cached one. Currently, the Linux NFS client will do this if a user process interrupts an RPC call that is in progress. The problem with doing so is that we defeat the main mechanism used by the server to differentiate between a new call and a replayed one. Even if the server is able to perfectly cache the arguments of the old call, it cannot know if the client intended to replay or send a new call. The obvious fix is to bump the sequence number pre-emptively if an RPC call is interrupted, but in order to deal with the corner cases where the interrupted call is not actually received and processed by the server, we need to interpret the error NFS4ERR_SEQ_MISORDERED as a sign that we need to either wait or locate a correct sequence number that lies between the value we sent, and the last value that was acked by a SEQUENCE call on that slot. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Tested-by: Jason Tibbitts <tibbs@math.uh.edu>
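The recovery strategy described in this message can be sketched in userspace C. This is a minimal illustration only, not the kernel's implementation: the struct `slot_sketch` and all three helper names are hypothetical, and the sketch assumes that `seq_nr` holds the number to be sent on the next call.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified view of one session slot (not the kernel struct). */
struct slot_sketch {
    uint32_t seq_nr;            /* sequence number to use for the next call */
    uint32_t seq_nr_last_acked; /* last value acknowledged by a SEQUENCE reply */
};

/* An in-flight call was interrupted: assume the server may have seen it,
 * so pre-emptively move on to the next sequence number. */
static void slot_call_interrupted(struct slot_sketch *s)
{
    assert(s != NULL);
    s->seq_nr++;
}

/* A SEQUENCE reply acknowledged the call that was sent with number `sent`. */
static void slot_call_acked(struct slot_sketch *s, uint32_t sent)
{
    assert(s != NULL);
    s->seq_nr_last_acked = sent;
    s->seq_nr = sent + 1;
}

/* The server answered NFS4ERR_SEQ_MISORDERED. If there is still room between
 * the value we sent and the last acked value, step down by one and return
 * true (retry now). Otherwise return false (wait before retrying). */
static bool slot_seq_misordered(struct slot_sketch *s)
{
    assert(s != NULL);
    if ((int32_t)(s->seq_nr - s->seq_nr_last_acked) > 1) {
        s->seq_nr--;
        return true;
    }
    return false;
}
```

In this sketch, two interrupted calls that never reached the server leave `seq_nr` two steps too high; each NFS4ERR_SEQ_MISORDERED reply then walks it back by one until it sits just above the last acked value.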
-
- 20 Feb, 2019 36 commits
-
-
Trond Myklebust authored
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
-
Trond Myklebust authored
Now that we send the pages using a struct msghdr, instead of using sendpage(), we no longer need to 'prime the socket' with an address for unconnected UDP messages. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
-
Trond Myklebust authored
Simplify the page send code using iov_iter and bvecs. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
-
Trond Myklebust authored
Prepare the socket transmission code to use iov_iter. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
-
Trond Myklebust authored
If the client stream receive code receives an ESHUTDOWN error either because the server closed the connection, or because it sent a callback which cannot be processed, then we should shut down the connection. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
-
Trond Myklebust authored
If the message read completes, but the socket returned an error condition, we should ensure that the error is propagated. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
-
Trond Myklebust authored
A zero length fragment is really a bug, but let's ensure we don't go nuts when one turns up. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
-
Trond Myklebust authored
To ensure that the receive worker has exclusive access to the stream record info, we must not reset the contents other than when holding the transport->recv_mutex. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
-
ZhangXiaoxu authored
After setxattr, NFSv3 caches the ACL that was set by the user. However, the backend file system being shared (e.g. ext4) checks the ACL and, if it can be merged into the mode, does not add an ACL to the file. The ACL cached by NFSv3 is therefore redundant. Don't call 'set_cached_acl' on setxattr. Signed-off-by: ZhangXiaoxu <zhangxiaoxu5@huawei.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
-
Kazuo Ito authored
As the block and SCSI layouts can only read/write fixed-length blocks, we must perform read-modify-write when data to be written is not aligned to a block boundary or smaller than the block size. (612aa983 pnfs: add flag to force read-modify-write in ->write_begin)

The current code tries to see if we have to do read-modify-write on block-oriented pNFS layouts by just checking !PageUptodate(page), but the same condition also applies for overwriting of any uncached portions of existing files, making such operations excessively slow even if they are block-aligned.

The change does not affect the optimization for modify-write-read cases (38c73044 NFS: read-modify-write page updating), because partial update of !PageUptodate() pages can only happen in layouts that can do arbitrary length read/write and never in block-based ones.

Testing results:

We ran fio on one of the pNFS clients running 4.20 kernel (vanilla and patched) in this configuration to read/write/overwrite files on the storage array, exported as pnfs share by the server.

    pNFS clients ---1G Ethernet--- pNFS server
    (HP DL360 G8)                  (HP DL360 G8)
          |                              |
          |                              |
          +------8G Fiber Channel--------+
                         |
                   Storage Array
                    (HP P6350)

Throughput of overwrite (both buffered and O_SYNC) is noticeably improved.

    Ops.     |block size|   Throughput   |
             |  (KiB)   |    (MiB/s)     |
             |          |  4.20 | patched|
    ---------+----------+----------------+
    buffered |         4|  21.3 |  232   |
    overwrite|        32|  22.2 |  256   |
             |       512|  22.4 |  260   |
    ---------+----------+----------------+
    O_SYNC   |         4|   3.84|    4.77|
    overwrite|        32|  12.2 |   32.0 |
             |       512|  18.5 |  152   |
    ---------+----------+----------------+

Read and write (buffered and O_SYNC) by the same client remain unchanged by the patch either negatively or positively, as they should do.

    Ops.     |block size|   Throughput   |
             |  (KiB)   |    (MiB/s)     |
             |          |  4.20 | patched|
    ---------+----------+----------------+
    read     |         4| 548   |  550   |
             |        32| 547   |  551   |
             |       512| 548   |  551   |
    ---------+----------+----------------+
    buffered |         4| 237   |  244   |
    write    |        32| 261   |  268   |
             |       512| 265   |  272   |
    ---------+----------+----------------+
    O_SYNC   |         4|   0.46|    0.46|
    write    |        32|   3.60|    3.57|
             |       512| 105   |  106   |
    ---------+----------+----------------+

Signed-off-by: Kazuo Ito <ito_kazuo_g3@lab.ntt.co.jp>
Tested-by: Hiroyuki Watanabe <watanabe.hiroyuki@lab.ntt.co.jp>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
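The alignment condition stated at the top of this message (read-modify-write is only needed when a write is not block-aligned or is shorter than a block) might be expressed as follows. This is a hedged userspace sketch, not the patch itself: `write_needs_rmw` is a hypothetical helper, and it assumes offsets and lengths are given in bytes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: does a write of `len` bytes at byte `offset` need a
 * read-modify-write cycle on a layout that can only transfer whole blocks
 * of `block_size` bytes? */
static bool write_needs_rmw(uint64_t offset, uint64_t len, uint32_t block_size)
{
    assert(block_size != 0);
    /* A write that starts on a block boundary and covers a whole number of
     * blocks replaces every byte it touches, so nothing has to be read. */
    return (offset % block_size) != 0 || (len % block_size) != 0;
}
```

Under this test, an aligned 4 KiB overwrite on a 4 KiB block size skips the read, which is the case the overwrite throughput numbers above exercise.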
-
Kazuo Ito authored
nfs_want_read_modify_write() didn't check for !PagePrivate when pNFS block or SCSI layout was in use, therefore we could lose data forever if the page being written was filled by a read before completion. Signed-off-by: Kazuo Ito <ito_kazuo_g3@lab.ntt.co.jp> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
-
zhangliguang authored
This fixes the typo in the comments of nfs_readdir_alloc_pages(), because nfs_readdir_large_page and nfs_readdir_free_pagearray have been renamed. Signed-off-by: Liguang Zhang <zhangliguang@linux.alibaba.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
-
zhangliguang authored
This removes a redundant semicolon at the end of a statement. Fixes: c7944ebb ("NFSv4: Fix lookup revalidate of regular files") Signed-off-by: Liguang Zhang <zhangliguang@linux.alibaba.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
-
luanshi authored
When listing very large directories via NFS, clients may take a long time to complete. Three factors are involved:

First, ls and practically every other method of listing a directory, including Python's os.listdir and find, rely on libc readdir(). However, readdir() only reads 32K of directory entries at a time, so a directory containing a large number of files takes a very long time to read in full.

Second, because libc readdir() reads 32K of directory entries at a time, the 32K buffer is split into 8 pages in kernel space. One NFS readdirplus RPC is issued per page, which results in many readdirplus RPC calls.

Lastly, each NFS readdirplus RPC asks for 32K of data (filled by nfs_dentry) to fill one page (filled by dentry); we found that nearly one third of the data was wasted.

To solve these problems, a page cache mechanism is introduced. One NFS readdirplus RPC now asks for a larger amount of data (more than 32K), which can fill more than one page, and the cached pages can be used by the next readdir call. This reduces the number of readdirplus RPC calls and improves readdirplus performance.

TESTING: listing a very large directory (containing 300 thousand files) via NFS

time ls -l /nfs_mount | wc -l

without the patch:
300001
real 1m53.524s
user 0m2.314s
sys 0m2.599s

with the patch:
300001
real 0m23.487s
user 0m2.305s
sys 0m2.558s

Improved performance: 79.6%
readdirplus rpc calls decrease: 85%

Signed-off-by: Liguang Zhang <zhangliguang@linux.alibaba.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
-
Eric W. Biederman authored
In the rare and unsupported case of a hostname list, nfs_parse_devname will modify dev_name. There is no need to modify dev_name, as all that is being computed is the length of the hostname, so the computed length can just be shortened. Fixes: dc045898 ("NFS: Use common device name parsing logic for NFSv4 and NFSv2/v3") Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
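The idea of shortening a computed length instead of writing into the string can be illustrated as below. This is a sketch under stated assumptions, not the kernel function: `first_hostname_len` is a hypothetical helper, it assumes the simple "host[,host...]:/path" form, and it deliberately ignores bracketed IPv6 addresses.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: return the length of the first hostname in a device
 * name such as "host1,host2:/export" without modifying the string. */
static size_t first_hostname_len(const char *dev_name)
{
    const char *end;
    const char *comma;

    assert(dev_name != NULL);

    /* The hostname part ends at the first ':' (or at the end of the string). */
    end = strchr(dev_name, ':');
    if (end == NULL)
        end = dev_name + strlen(dev_name);

    /* A hostname list is not supported: keep only the first entry by
     * shortening the length rather than overwriting the ',' with a NUL. */
    comma = memchr(dev_name, ',', (size_t)(end - dev_name));
    if (comma != NULL)
        end = comma;

    return (size_t)(end - dev_name);
}
```

Because the input is taken as `const char *`, the compiler enforces that the caller's buffer is left untouched.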
-
NeilBrown authored
As reported by Dan Carpenter, this test for acred->cred being set is inconsistent with the dereference of the pointer a few lines earlier. An 'auth_cred' *always* has ->cred set - every place that creates one initializes this field, often as the first thing done. So remove this test. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
-
Julia Lawall authored
Drop LIST_HEAD where the variable it declares has never been used.

The semantic patch that fixes this problem is as follows: (http://coccinelle.lip6.fr/)

// <smpl>
@@
identifier x;
@@
- LIST_HEAD(x);
  ... when != x
// </smpl>

Fixes: 0e20162e ("NFSv4.1 Use MDS auth flavor for data server connection")
Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
-
Trond Myklebust authored
When we resend a request, ensure that the 'rq_bytes_sent' is reset to zero. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
-
Trond Myklebust authored
Because we clear XPRT_SOCK_DATA_READY before reading, we can end up with a situation where new data arrives, causing xs_data_ready() to queue up a second receive worker job for the same socket, which then immediately gets stuck waiting on the transport receive mutex. The fix is to only clear XPRT_SOCK_DATA_READY once we're done reading, and then to use poll() to check if we might need to queue up a new job in order to deal with any new data. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
-
Trond Myklebust authored
Set memalloc_nofs_save() on all the rpciod/xprtiod jobs so that we ensure memory allocations for asynchronous rpc calls don't ever end up recursing back to the NFS layer for memory reclaim. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
-
Trond Myklebust authored
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
-
Trond Myklebust authored
Fix up some compiler warnings about function parameters, etc not being correctly described or formatted. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
-
Trond Myklebust authored
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
-
Trond Myklebust authored
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
-
Trond Myklebust authored
All the allocations that we can hit in the NFS layer and sunrpc layers themselves are already marked as GFP_NOFS, but we need to ensure that any calls to generic kernel functionality do the right thing as well. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
-
Trond Myklebust authored
Allow the caller to pass error information when cleaning up a failed I/O request so that we can conditionally take action to cancel the request altogether if the error turned out to be fatal. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
-
Trond Myklebust authored
In several places we're just moving the struct nfs_page from one list to another by first removing from the existing list, then adding to the new one. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
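The remove-then-add pattern mentioned here collapses naturally into a single move helper. The following is a self-contained userspace sketch of that idea using a minimal circular doubly linked list; the `node` type and all function names are hypothetical and only mimic the general shape of kernel-style lists.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal circular doubly linked list node (hypothetical, userspace only). */
struct node {
    struct node *prev;
    struct node *next;
};

/* An empty list is a head that points at itself. */
static void list_init(struct node *head)
{
    head->prev = head;
    head->next = head;
}

static int list_is_empty(const struct node *head)
{
    return head->next == head;
}

/* Unlink `n` from whatever list it is currently on. */
static void list_del_entry(struct node *n)
{
    n->prev->next = n->next;
    n->next->prev = n->prev;
}

/* Link `n` in just before `head`, i.e. at the tail of the list. */
static void list_add_tail_entry(struct node *n, struct node *head)
{
    n->prev = head->prev;
    n->next = head;
    head->prev->next = n;
    head->prev = n;
}

/* One call replaces the two-step "remove, then add" sequence. */
static void list_move_tail_entry(struct node *n, struct node *head)
{
    assert(n != NULL && head != NULL);
    list_del_entry(n);
    list_add_tail_entry(n, head);
}
```

A single helper keeps the two steps together, so a call site cannot unlink an entry and then forget to re-add it.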
-
Trond Myklebust authored
If the I/O completion failed with a fatal error, then we should just exit nfs_pageio_complete_mirror() rather than try to recoalesce. Fixes: a7d42ddb ("nfs: add mirroring support to pgio layer") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Cc: stable@vger.kernel.org # v4.0+
-
Trond Myklebust authored
Whether we need to exit early, or just reprocess the list, we must not lose track of the request which failed to get recoalesced. Fixes: 03d5eb65 ("NFS: Fix a memory leak in nfs_do_recoalesce") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Cc: stable@vger.kernel.org # v4.0+
-
Trond Myklebust authored
When we fail to add the request to the I/O queue, we currently leave it to the caller to free the failed request. However since some of the requests that fail are actually created by nfs_pageio_add_request() itself, and are not passed back to the caller, this leads to a leakage issue, which can again cause page locks to leak. This commit addresses the leakage by freeing the created requests on error, using desc->pg_completion_ops->error_cleanup() Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Fixes: a7d42ddb ("nfs: add mirroring support to pgio layer") Cc: stable@vger.kernel.org # v4.0: c18b96a1: nfs: clean up rest of reqs Cc: stable@vger.kernel.org # v4.0: d600ad1f: NFS41: pop some layoutget Cc: stable@vger.kernel.org # v4.0+
-
git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Linus Torvalds authored
Pull sound fixes from Takashi Iwai: "Here are a few last-minute fixes for 5.0. The most significant one is the OF-node refcount fix for ASoC simple-card, which could be triggered on many boards. Another fix for ASoC core is for the error handling in topology, while others are device-specific fixes for Samsung and HD-audio"

* tag 'sound-5.0' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
  ASoC: simple-card: fixup refcount_t underflow
  ASoC: topology: free created components in tplg load error
  ALSA: hda/realtek: Disable PC beep in passthrough on alc285
  ALSA: hda/realtek - Headset microphone and internal speaker support for System76 oryp5
  ASoC: samsung: i2s: Fix prescaler setting for the secondary DAI
-
git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl
Linus Torvalds authored
Pull pin control fixes from Linus Walleij: "Some final pin control fixes (I hope) to round off the v5.0 pin control development cycle. Only driver fixes, one for stable:

 - Meson8B fixup for the sdc pins
 - Fix SDC tile position for Qualcomm QCS404"

* tag 'pinctrl-v5.0-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl:
  pinctrl: meson: meson8b: fix the sdxc_a data 1..3 pins
  pinctrl: qcom: qcs404: Correct SDC tile
-
git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio
Linus Torvalds authored
Pull GPIO fixes from Linus Walleij: "Two GPIO fixes for the v5.0 series:

 - Per-instance irqchip on the MT7621
 - Avoid direction setting using pin control on MMP2"

* tag 'gpio-v5.0-4' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio:
  gpio: pxa: avoid attempting to set pin direction via pinctrl on MMP2
  gpio: MT7621: use a per instance irq_chip structure
-
git://git.infradead.org/linux-mtd
Linus Torvalds authored
Pull MTD fixes from Boris Brezillon:

 - Don't add a digit to MTD-backed nvmem device names
 - Make sure powernv flash names are unique

* tag 'mtd/fixes-for-5.0-rc8' of git://git.infradead.org/linux-mtd:
  mtd: powernv_flash: Fix device registration error
  mtd: Use mtd->name when registering nvmem device
-
Linus Torvalds authored
Merge branch 'fixes-v5.1-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security

Pull keys fixes from James Morris:

 - Handle quotas better, allowing full quota to be reached.
 - Fix the creation of shortcuts in the assoc_array internal representation when the index key needs to be an exact multiple of the machine word size.
 - Fix a dependency loop between the request_key construction record and the request_key authentication key. The construction record isn't really necessary and can be dispensed with.
 - Set the timestamp on a new key rather than leaving it as 0. This would ordinarily be fine - provided the system clock is never set to a time before 1970

* 'fixes-v5.1-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security:
  keys: Timestamp new keys
  keys: Fix dependency loop between construction record and auth key
  assoc_array: Fix shortcut creation
  KEYS: allow reaching the keys quotas exactly
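The quota item ("allowing full quota to be reached") reads like a boundary-comparison fix. As a hedged illustration only, and assuming the problem was an off-by-one in the limit check, the two hypothetical helpers below contrast a check that stops one short of the limit with one that lets the limit be reached exactly. Neither is taken from the kernel source.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Hypothetical check that stops one short: adding a key is refused as soon
 * as the new count would EQUAL the maximum, so at most max_keys - 1 keys
 * can ever exist. */
static bool quota_allows_one_short(unsigned int in_use, unsigned int max_keys)
{
    assert(in_use < UINT_MAX);
    return !(in_use + 1 >= max_keys);
}

/* Hypothetical check that lets the quota be reached exactly: adding a key
 * is refused only when the new count would EXCEED the maximum. */
static bool quota_allows_exact(unsigned int in_use, unsigned int max_keys)
{
    assert(in_use < UINT_MAX);
    return !(in_use + 1 > max_keys);
}
```

With a limit of 10 and 9 keys in use, the first variant refuses the tenth key while the second accepts it; both refuse an eleventh.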
-
git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Linus Torvalds authored
Pull networking fixes from David Miller:

 1) Fix suspend and resume in mt76x0u USB driver, from Stanislaw Gruszka.
 2) Missing memory barriers in xsk, from Magnus Karlsson.
 3) rhashtable fixes in mac80211 from Herbert Xu.
 4) 32-bit MIPS eBPF JIT fixes from Paul Burton.
 5) Fix for_each_netdev_feature() on big endian, from Hauke Mehrtens.
 6) GSO validation fixes from Willem de Bruijn.
 7) Endianness fix for dwmac4 timestamp handling, from Alexandre Torgue.
 8) More strict checks in tcp_v4_err(), from Eric Dumazet.
 9) af_alg_release should NULL out the sk after the sock_put(), from Mao Wenan.
 10) Missing unlock in mac80211 mesh error path, from Wei Yongjun.
 11) Missing device put in hns driver, from Salil Mehta.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (44 commits)
  sky2: Increase D3 delay again
  vhost: correctly check the return value of translate_desc() in log_used()
  net: netcp: Fix ethss driver probe issue
  net: hns: Fixes the missing put_device in positive leg for roce reset
  net: stmmac: Fix a race in EEE enable callback
  qed: Fix iWARP syn packet mac address validation.
  qed: Fix iWARP buffer size provided for syn packet processing.
  r8152: Add support for MAC address pass through on RTL8153-BD
  mac80211: mesh: fix missing unlock on error in table_path_del()
  net/mlx4_en: fix spelling mistake: "quiting" -> "quitting"
  net: crypto set sk to NULL when af_alg_release.
  net: Do not allocate page fragments that are not skb aligned
  mm: Use fixed constant in page_frag_alloc instead of size + 1
  tcp: tcp_v4_err() should be more careful
  tcp: clear icsk_backoff in tcp_write_queue_purge()
  net: mv643xx_eth: disable clk on error path in mv643xx_eth_shared_probe()
  qmi_wwan: apply SET_DTR quirk to Sierra WP7607
  net: stmmac: handle endianness in dwmac4_get_timestamp
  doc: Mention MSG_ZEROCOPY implementation for UDP
  mlxsw: __mlxsw_sp_port_headroom_set(): Fix a use of local variable
  ...
-
- 19 Feb, 2019 3 commits
-
-
Kai-Heng Feng authored
Another platform requires an even longer delay to make the device work correctly after S3. So increase the delay to 300ms. BugLink: https://bugs.launchpad.net/bugs/1798921 Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jason Wang authored
On failure, translate_desc() returns a negative value; otherwise it returns the number of iovs. So we should fail when the return value is negative instead of blindly checking against zero. Detected by CoverityScan, CID# 1442593: Control flow issues (DEADCODE) Fixes: cc5e7107 ("vhost: log dirty page correctly") Acked-by: Michael S. Tsirkin <mst@redhat.com> Reported-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
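The difference between the two checks can be shown with a small userspace sketch. Everything here is hypothetical: `translate_stub` merely stands in for a helper that follows the "negative on failure, count on success" convention described above, and the two `check_*` functions model the old and new tests on its return value.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for a helper that returns the number of iovs on
 * success (>= 0) or a negative errno value on failure. */
static int translate_stub(int n_iovs, int fail)
{
    assert(n_iovs >= 0);
    return fail ? -EFAULT : n_iovs;
}

/* Blind check against zero: any non-zero value is passed on as an error,
 * so a successful translation that produced iovs is misreported. */
static int check_blind(int ret)
{
    if (ret)
        return ret;
    return 0;
}

/* Check for a negative value only: a positive count is a success. */
static int check_negative(int ret)
{
    if (ret < 0)
        return ret;
    return 0;
}
```

With three iovs translated successfully, `check_blind` hands back 3 as if it were an error code, whereas `check_negative` reports success and still passes through a genuine negative errno.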
-
Takashi Iwai authored
Merge tag 'asoc-fix-v5.0-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound into for-linus ASoC: Fixes for v5.0 A few small fixes, a driver fix for Samsung, a fix for refcounting of of_nodes in the simple-card driver that triggered on a lot of systems and a fix for topology error handling.
-