Commits · 71423274498169911bf9eedf02d5e7ac0a083801 · nexedi / linux

04 Aug, 2016 6 commits

nfsd: clean up bad-type check in nfsd_create_locked · 71423274

J. Bruce Fields authored Jul 22, 2016

Minor cleanup, no change in behavior.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

71423274

nfsd: remove unnecessary positive-dentry check · d03d9fe4

J. Bruce Fields authored Jul 21, 2016

vfs_{create,mkdir,mknod} each begin with a call to may_create(), which
returns EEXIST if the object already exists.

This check is therefore unnecessary.

(In the NFSv2 case, nfsd_proc_create also has such a check.  Contrary to
RFC 1094, our code seems to believe that a CREATE of an existing file
should succeed.  I'm leaving that behavior alone.)
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

d03d9fe4

nfsd: reorganize nfsd_create · b44061d0

J. Bruce Fields authored Jul 20, 2016

There's some odd logic in nfsd_create() that allows it to be called with
the parent directory either locked or unlocked.  The only already-locked
caller is NFSv2's nfsd_proc_create().  It's less confusing to split out
the unlocked case into a separate function which the NFSv2 code can call
directly.

Also fix some comments while we're here.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

b44061d0

nfsd: check d_can_lookup in fh_verify of directories · e75b23f9

J. Bruce Fields authored Jul 19, 2016

Create and other nfsd ops generally assume we can call lookup_one_len on
inodes with S_IFDIR set.  Al says that this assumption isn't true in
general, though it should be for the filesystem objects nfsd sees.

Add a check just to make sure our assumption isn't violated.

Remove a couple checks for i_op->lookup in create code.

Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

e75b23f9

nfsd: remove redundant zero-length check from create · 12391d07

J. Bruce Fields authored Jul 19, 2016

lookup_one_len already has this check.

The only effect of this patch is to return access instead of perm in the
0-length-filename case.  I actually prefer nfserr_perm (or _inval?), but
I doubt anyone cares.

The isdotent check seems redundant too, but I worry that some client
might actually care about that strange nfserr_exist error.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

12391d07

nfsd: Make creates return EEXIST instead of EACCES · 7eed34f1

Oleg Drokin authored Jul 14, 2016

When doing a create (mkdir/mknod) on a name, it's worth
checking whether the name already exists before returning EACCES in case
the directory is not writeable by the user.
This makes return values on the client more consistent
regardless of whether the entry is cached in the local
cache or not.
Another positive side effect is that certain programs expect only
EEXIST in that case, even though POSIX allows any valid
error to be returned.
Signed-off-by: Oleg Drokin <green@linuxhacker.ru>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

7eed34f1
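
The change in check ordering described in 7eed34f1 can be illustrated with a small userspace sketch. This is not the nfsd code: `struct dir_state` and both functions are hypothetical names made up for the example, and the real checks are spread across the VFS and nfsd permission paths.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for what the server knows about the parent directory. */
struct dir_state {
	bool writable;     /* caller may create entries in the directory */
	bool name_exists;  /* an entry with the requested name is already present */
};

/* Permission first: an existing name in a read-only directory yields -EACCES. */
int create_perm_first(const struct dir_state *d)
{
	assert(d != NULL);
	if (!d->writable)
		return -EACCES;
	if (d->name_exists)
		return -EEXIST;
	return 0;
}

/* Existence first: the same request now yields -EEXIST. */
int create_exist_first(const struct dir_state *d)
{
	assert(d != NULL);
	if (d->name_exists)
		return -EEXIST;
	if (!d->writable)
		return -EACCES;
	return 0;
}
```

The two functions differ only for an existing name in a directory the caller cannot write to, which is the case the commit message is concerned with.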

01 Aug, 2016 2 commits

SUNRPC: Detect immediate closure of accepted sockets · c7995f8a

Trond Myklebust authored Jul 26, 2016

This modification is useful for debugging issues that happen while
the socket is being initialised.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

c7995f8a

SUNRPC: accept() may return sockets that are still in SYN_RECV · b2f21f7d

Trond Myklebust authored Jul 26, 2016

We're seeing traces of the following form:

 [10952.396347] svc: transport ffff88042ba4a000 dequeued, inuse=2
 [10952.396351] svc: tcp_accept ffff88042ba4a000 sock ffff88042a6e4c80
 [10952.396362] nfsd: connect from 10.2.6.1, port=187
 [10952.396364] svc: svc_setup_socket ffff8800b99bcf00
 [10952.396368] setting up TCP socket for reading
 [10952.396370] svc: svc_setup_socket created ffff8803eb10a000 (inet ffff88042b75b800)
 [10952.396373] svc: transport ffff8803eb10a000 put into queue
 [10952.396375] svc: transport ffff88042ba4a000 put into queue
 [10952.396377] svc: server ffff8800bb0ec000 waiting for data (to = 3600000)
 [10952.396380] svc: transport ffff8803eb10a000 dequeued, inuse=2
 [10952.396381] svc_recv: found XPT_CLOSE
 [10952.396397] svc: svc_delete_xprt(ffff8803eb10a000)
 [10952.396398] svc: svc_tcp_sock_detach(ffff8803eb10a000)
 [10952.396399] svc: svc_sock_detach(ffff8803eb10a000)
 [10952.396412] svc: svc_sock_free(ffff8803eb10a000)

i.e. an immediate close of the socket after initialisation.

The culprit appears to be the test at the end of svc_tcp_init, which
checks if the newly created socket is in the TCP_ESTABLISHED state,
and immediately closes it if not. The evidence appears to suggest that
the socket might still be in the SYN_RECV state at this time.

The fix is to check for both states, and then to add a check in
svc_tcp_state_change() to ensure we don't close the socket when
it transitions into TCP_ESTABLISHED.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

b2f21f7d
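
The two state checks described in b2f21f7d can be sketched as plain predicates. This is a simplified userspace model, not the sunrpc code: the `enum sock_state` values and all three function names are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified, hypothetical subset of TCP socket states. */
enum sock_state {
	ST_SYN_RECV,
	ST_ESTABLISHED,
	ST_CLOSE_WAIT,
	ST_CLOSE,
};

/* Old test at the end of setup: anything but ESTABLISHED is closed. */
bool should_close_after_init_old(enum sock_state s)
{
	return s != ST_ESTABLISHED;
}

/* New test: a socket still finishing its handshake is left alone. */
bool should_close_after_init_new(enum sock_state s)
{
	return s != ST_ESTABLISHED && s != ST_SYN_RECV;
}

/* State-change callback: moving into ESTABLISHED must not trigger a close. */
bool should_close_on_state_change(enum sock_state new_state)
{
	return new_state != ST_ESTABLISHED;
}
```

With the old predicate, a socket that accept() hands back while it is still in SYN_RECV is closed straight away, which matches the trace above.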

15 Jul, 2016 4 commits

nfsd: allow nfsd to advertise multiple layout types · 8a4c3926

Jeff Layton authored Jul 10, 2016

If the underlying filesystem supports multiple layout types, then there
is little reason not to advertise that fact to clients and let them
choose what type to use.

Turn the ex_layout_type field into a bitfield. For each supported
layout type, we set a bit in that field. When the client requests a
layout, ensure that the bit for that layout type is set. When the
client requests attributes, send back a list of supported types.
Signed-off-by: Jeff Layton <jlayton@poochiereds.net>
Reviewed-by: Weston Andros Adamson <dros@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

8a4c3926
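
The bitfield scheme described in 8a4c3926 (one bit per supported layout type, a membership test on request, a list on attribute query) could look roughly like the sketch below. The helper names and the `LAYOUT_*` constants are hypothetical, and their numeric values are assumptions chosen for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative layout type numbers; treat the exact values as assumptions. */
enum layout_type {
	LAYOUT_FILES = 1,
	LAYOUT_BLOCK = 3,
	LAYOUT_FLEX_FILES = 4,
};

#define LAYOUT_TYPE_BITS 32u  /* width of the mask used in this sketch */

/* Mark one layout type as supported by setting its bit. */
void layout_types_add(uint32_t *mask, unsigned int type)
{
	assert(type < LAYOUT_TYPE_BITS);
	*mask |= UINT32_C(1) << type;
}

/* Check a client's requested type against the mask. */
bool layout_type_supported(uint32_t mask, unsigned int type)
{
	return type < LAYOUT_TYPE_BITS && (mask & (UINT32_C(1) << type)) != 0;
}

/* Expand the mask into a list, as an attribute reply would need.
 * Returns the number of entries written to out (at most max). */
size_t layout_types_list(uint32_t mask, unsigned int *out, size_t max)
{
	size_t n = 0;

	for (unsigned int type = 0; type < LAYOUT_TYPE_BITS && n < max; type++)
		if (mask & (UINT32_C(1) << type))
			out[n++] = type;
	return n;
}
```

Compared with a single integer field, a mask lets one export carry several types at once while keeping the per-request check a single bit test.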

nfsd: Close race between nfsd4_release_lockowner and nfsd4_lock · 88584818

Chuck Lever authored Jul 13, 2016

nfsd4_release_lockowner finds a lock owner that has no lock state,
and drops cl_lock. Then release_lockowner picks up cl_lock and
unhashes the lock owner.

During the window where cl_lock is dropped, I don't see anything
preventing a concurrent nfsd4_lock from finding that same lock owner
and adding lock state to it.

Move release_lockowner() into nfsd4_release_lockowner and hang onto
the cl_lock until after the lock owner's state cannot be found
again.

Found by inspection; we don't currently have a reproducer.

Fixes: 2c41beb0 ("nfsd: reduce cl_lock thrashing in ... ")
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

88584818
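
The idea behind 88584818, checking for lock state and unhashing the owner without dropping the lock in between, can be modelled in userspace with a pthread mutex. Only the name `cl_lock` comes from the commit message; `struct lockowner`, its fields, and both functions are hypothetical simplifications.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical, heavily simplified lock owner. */
struct lockowner {
	int  nr_locks;  /* lock state currently attached to this owner */
	bool hashed;    /* still findable by lookups */
};

static pthread_mutex_t cl_lock = PTHREAD_MUTEX_INITIALIZER;

/* Check for lock state and unhash in one critical section, so no
 * lookup can slip in between the two steps. */
int release_lockowner(struct lockowner *lo)
{
	int ret;

	pthread_mutex_lock(&cl_lock);
	if (!lo->hashed) {
		ret = -ENOENT;
	} else if (lo->nr_locks > 0) {
		ret = -EBUSY;
	} else {
		lo->hashed = false;
		ret = 0;
	}
	pthread_mutex_unlock(&cl_lock);
	return ret;
}

/* A locker only attaches state to an owner it can still find. */
int add_lock_state(struct lockowner *lo)
{
	int ret = -ENOENT;

	pthread_mutex_lock(&cl_lock);
	if (lo->hashed) {
		lo->nr_locks++;
		ret = 0;
	}
	pthread_mutex_unlock(&cl_lock);
	return ret;
}
```

Because both functions take the same mutex and the release path never drops it between the check and the unhash, a concurrent locker sees the owner either fully present or already gone.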

nfsd/blocklayout: Make sure calculate signature/designator length aligned · dd51db18

Kinglong Mee authored Jul 14, 2016

These values are all multiples of 4 already, so there's no change in
behavior from this patch.  But perhaps this will prevent mistakes in the
future.
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

dd51db18
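
For readers unfamiliar with the alignment mentioned in dd51db18: XDR encodes data in 4-byte units, so lengths are rounded up to a multiple of 4. A minimal sketch of that rounding follows; the helper names `xdr_align4` and `xdr_pad4` are made up for this example and are not taken from the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Round len up to the next multiple of 4, the XDR unit size. */
size_t xdr_align4(size_t len)
{
	return (len + 3) & ~(size_t)3;
}

/* Bytes of zero padding that follow a len-byte opaque field. */
size_t xdr_pad4(size_t len)
{
	return xdr_align4(len) - len;
}
```

Applying the rounding to a value that is already a multiple of 4 leaves it unchanged, which is why the commit can claim no change in behavior.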

xfs: abstract block export operations from nfsd layouts · 15d66ac2

Benjamin Coddington authored Jul 08, 2016

Instead of creeping pnfs layout configuration into filesystems, move the
definition of block-based export operations under a more abstract
configuration.
Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

15d66ac2

13 Jul, 2016 17 commits

SUNRPC: Remove unused callback xpo_adjust_wspace() · f4a4906e

Trond Myklebust authored Jun 24, 2016

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

f4a4906e

SUNRPC: Change TCP socket space reservation · 637600f3

Trond Myklebust authored Jun 24, 2016

The current server rpc tcp code attempts to predict how much writeable
socket space will be available to a given RPC call before accepting it
for processing.  On a 40GigE network, we've found this throttles
individual clients long before the network or disk is saturated.  The
server may handle more clients easily, but the bandwidth of individual
clients is still artificially limited.

Instead of trying (and failing) to predict how much writeable socket space
will be available to the RPC call, just fall back to the simple model of
deferring processing until the socket is uncongested.

This may increase the risk of fast clients starving slower clients; in
such cases, the previous patch allows setting a hard per-connection
limit.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

637600f3

SUNRPC: Add a server side per-connection limit · ff3ac5c3

Trond Myklebust authored Jun 24, 2016

Allow the user to limit the number of requests serviced through a single
connection, to help prevent faster clients from starving slower clients.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

ff3ac5c3
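
A per-connection cap like the one added in ff3ac5c3 amounts to counting requests in flight on each connection and refusing new ones at the cap. The sketch below is a userspace model under stated assumptions: `struct conn` and both functions are hypothetical, and treating a limit of 0 as "unlimited" is an assumption of the sketch rather than something the commit message states.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-connection bookkeeping. */
struct conn {
	unsigned int in_flight;  /* requests accepted but not yet completed */
	unsigned int limit;      /* administrator-set cap; 0 means unlimited here */
};

/* Accept a new request only while the connection is under its cap. */
bool conn_try_start_request(struct conn *c)
{
	if (c->limit != 0 && c->in_flight >= c->limit)
		return false;
	c->in_flight++;
	return true;
}

/* Completing a request frees a slot for the same connection. */
void conn_finish_request(struct conn *c)
{
	assert(c->in_flight > 0);
	c->in_flight--;
}
```

A request refused here would be left queued rather than dropped, so a fast client is slowed down while other connections get a turn.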

SUNRPC: Micro optimisation for svc_data_ready · 4720b070

Trond Myklebust authored Jun 24, 2016

Don't call svc_xprt_enqueue() if the XPT_DATA flag is already set.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

4720b070
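
The optimisation in 4720b070 follows a common pattern: set a flag atomically and do the expensive follow-up work only if the flag was previously clear. A userspace sketch using C11 atomics is shown below; `FLAG_DATA`, `struct transport`, and the function names are stand-ins invented for the example, and the counter merely represents the enqueue call.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define FLAG_DATA (1u << 0)  /* hypothetical "data pending" bit */

struct transport {
	atomic_uint  flags;
	unsigned int enqueue_count;  /* stands in for real queueing work */
};

/* Atomically set the bit and report whether it was already set. */
bool test_and_set_flag(struct transport *t, unsigned int flag)
{
	return (atomic_fetch_or(&t->flags, flag) & flag) != 0;
}

/* Data-ready callback: only the caller that flips the bit enqueues. */
void data_ready(struct transport *t)
{
	if (!test_and_set_flag(t, FLAG_DATA))
		t->enqueue_count++;
}
```

Repeated callbacks that arrive before the consumer clears the flag therefore cost one atomic operation each instead of a queue operation.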

SUNRPC: Call the default socket callbacks instead of open coding · fa9251af

Trond Myklebust authored Jun 24, 2016

Rather than code up our own versions of the socket callbacks, just
call the defaults.
This also allows us to merge svc_udp_data_ready() and svc_tcp_data_ready().
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

fa9251af

SUNRPC: lock the socket while detaching it · 069c225b

Trond Myklebust authored Jun 24, 2016

Prevent callbacks from triggering while we're detaching the socket.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

069c225b

SUNRPC: Add tracepoints for dropped and deferred requests · 104f6351

Trond Myklebust authored Jun 24, 2016

Dropping and/or deferring requests has an impact on performance. Let's
make sure we can trace those events.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

104f6351

SUNRPC: Add a tracepoint for server socket out-of-space conditions · 82ea2d76

Trond Myklebust authored Jun 24, 2016

Add a tracepoint to track when the processing of incoming RPC data gets
deferred due to out-of-space issues on the outgoing transport.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

82ea2d76

nfsd: Fix some indent inconsistancy · d28c442f

Christophe JAILLET authored Jul 02, 2016

Silence a few smatch warnings about indentation.
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

d28c442f

nfsd: Correct a comment for NFSD_MAY_ defines location · 93f580a9

Oleg Drokin authored Jul 07, 2016

Those are now defined in fs/nfsd/vfs.h
Signed-off-by: Oleg Drokin <green@linuxhacker.ru>
Reviewed-by: Jeff Layton <jlayton@poochiereds.net>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

93f580a9

nfsd: Add a super simple flex file server · 9b9960a0

Tom Haynes authored Jun 14, 2016

Have a simple flex file server where the mds (NFSv4.1 or NFSv4.2)
is also the ds (NFSv3). I.e., the metadata and the data file are
the exact same file.

This will allow testing of the flex file client.

Simply add the "pnfs" export option to your export
in /etc/exports and mount from a client that supports
flex files.
Signed-off-by: Tom Haynes <loghyr@primarydata.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

9b9960a0

nfsd: flex file device id encoding will need the server address · d7c920d1

Tom Haynes authored Jun 14, 2016

Signed-off-by: Tom Haynes <loghyr@primarydata.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jeff Layton <jlayton@poochiereds.net>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

d7c920d1

sunrpc: add gss minor status to svcauth_gss_proxy_init · 04d70eda

Scott Mayhew authored Jun 15, 2016

GSS-Proxy doesn't produce very much debug logging at all.  Printing out
the gss minor status will aid in troubleshooting if the
GSS_Accept_sec_context upcall fails.
Signed-off-by: Scott Mayhew <smayhew@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

04d70eda

nfsd: implement machine credential support for some operations · ed941643

Andrew Elble authored Jun 15, 2016

This addresses the conundrum referenced in RFC5661 18.35.3,
and will allow clients to return state to the server using the
machine credentials.

The biggest part of the problem is that we need to allow the client
to send a compound op with integrity/privacy on mounts that don't
have it enabled.

Add server support for properly decoding and using spo_must_enforce
and spo_must_allow bits. Add support for machine credentials to be
used for CLOSE, OPEN_DOWNGRADE, LOCKU, DELEGRETURN,
and TEST/FREE STATEID.
Implement a check so as to not throw WRONGSEC errors when these
operations are used if integrity/privacy isn't turned on.

Without this, Linux clients with credentials that expired while holding
delegations were getting stuck in an endless loop.
Signed-off-by: Andrew Elble <aweits@rit.edu>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

ed941643

nfsd: allow mach_creds_match to be used more broadly · dedeb13f

Andrew Elble authored Jun 15, 2016

Rename mach_creds_match() to nfsd4_mach_creds_match() and make it non-static.
Signed-off-by: Andrew Elble <aweits@rit.edu>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

dedeb13f

nfs/nfsd: Move useful bitfield ops to a commonly accessible place · 1adf0c5a

Andrew Elble authored Jun 15, 2016

So these may be used in nfsd as well
Signed-off-by: Andrew Elble <aweits@rit.edu>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

1adf0c5a

sunrpc: remove 'inuse' flag from struct cache_detail. · d8d29138

NeilBrown authored Jun 02, 2016

This field is not currently in use.
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

d8d29138

01 Jul, 2016 2 commits

SUNRPC: Don't allocate a full sockaddr_storage for tracing · db1bb44c

Trond Myklebust authored Jun 24, 2016

We're always tracing IPv4 or IPv6 addresses, so we can save a lot
of space on the ringbuffer by allocating the correct sockaddr size.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Cc: stable@vger.kernel.org
Fixes: 83a712e0 "sunrpc: add some tracepoints around ..."
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

db1bb44c
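
The space saving in db1bb44c comes from sizing the recorded address by its family instead of always reserving a full `struct sockaddr_storage`. A minimal userspace sketch of that choice follows; `traced_addr_len` is a hypothetical helper, not a function from the patch.

```c
#include <assert.h>
#include <netinet/in.h>
#include <stddef.h>
#include <sys/socket.h>

/* Bytes needed to record an address of the given family.
 * Returns 0 for families this sketch does not handle. */
size_t traced_addr_len(sa_family_t family)
{
	switch (family) {
	case AF_INET:
		return sizeof(struct sockaddr_in);
	case AF_INET6:
		return sizeof(struct sockaddr_in6);
	default:
		return 0;
	}
}
```

Both IPv4 and IPv6 socket addresses are smaller than `struct sockaddr_storage`, which is sized to hold any address family.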

locks: use file_inode() · 6343a212

Miklos Szeredi authored Jul 01, 2016

(Another one for the f_path debacle.)

ltp fcntl33 testcase caused an Oops in selinux_file_send_sigiotask.

The reason is that generic_add_lease() used filp->f_path.dentry->d_inode
while all the others use file_inode().  This makes a difference for files
opened on overlayfs, since the former points to the overlay inode and the
latter to the underlying inode.

So generic_add_lease() added the lease to the overlay inode and
generic_delete_lease() removed it from the underlying inode.  When the file
was released the lease remained on the overlay inode's lock list, resulting
in use after free.
Reported-by: Eryu Guan <eguan@redhat.com>
Fixes: 4bacc9c9 ("overlayfs: Make f_path always point to the overlay and f_inode to the underlay")
Cc: <stable@vger.kernel.org>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

6343a212
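
The mismatch fixed in 6343a212 can be reproduced in miniature: if the add path and the delete path resolve a file to different inodes, the lease is never removed from the inode it was added to. Everything in the sketch below is a hypothetical userspace model; only the name `file_inode()` mirrors the kernel helper mentioned in the commit message.

```c
#include <assert.h>

struct inode {
	int nr_leases;
};

/* Hypothetical file on an overlay: path and data resolve to different inodes. */
struct file {
	struct inode *path_inode;  /* reached through the path (overlay inode) */
	struct inode *f_inode;     /* returned by file_inode() (underlying inode) */
};

struct inode *file_inode(const struct file *f)
{
	return f->f_inode;
}

/* Buggy add: uses the path's inode while the delete side uses file_inode(). */
void add_lease_via_path(struct file *f)
{
	f->path_inode->nr_leases++;
}

/* Fixed add: same accessor as the delete side. */
void add_lease(struct file *f)
{
	file_inode(f)->nr_leases++;
}

/* Delete removes a lease only from the inode that file_inode() names. */
void delete_lease(struct file *f)
{
	struct inode *inode = file_inode(f);

	if (inode->nr_leases > 0)
		inode->nr_leases--;
}
```

With the buggy add, the overlay inode keeps a lease after the delete; in the kernel that stale entry is what later gets touched after the file has been freed.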

30 Jun, 2016 1 commit

lockd: unregister notifier blocks if the service fails to come up completely · cb7d224f

Scott Mayhew authored Jun 30, 2016

If the lockd service fails to start up then we need to be sure that the
notifier blocks are not registered, otherwise a subsequent start of the
service could cause the same notifier to be registered twice, leading to
soft lockups.
Signed-off-by: Scott Mayhew <smayhew@redhat.com>
Cc: stable@vger.kernel.org
Fixes: 0751ddf7 "lockd: Register callbacks on the inetaddr_chain..."
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

cb7d224f

27 Jun, 2016 1 commit

Linux 4.7-rc5 · 4c2e07c6

Linus Torvalds authored Jun 26, 2016

4c2e07c6

26 Jun, 2016 1 commit

Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi · 2ac9b973

Linus Torvalds authored Jun 26, 2016

Pull SCSI fixes from James Bottomley:
 "Two straightforward fixes.

  One is a concurrency issue only affecting SAS connected SATA drives,
  but which could hang the storage subsystem if it triggers (because the
  outstanding command count on error never goes back to zero) and the
  other is a NO_TAG fallout from the switch to hostwide tags which
  causes the system to crash on module insertion (we've checked
  carefully and only the 53c700 family of drivers is vulnerable to this
  issue)"

* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
  53c700: fix BUG on untagged commands
  scsi: fix race between simultaneous decrements of ->host_failed

2ac9b973

25 Jun, 2016 6 commits

Merge branch 'for-linus-4.7-part2' of... · da2f6aba

Linus Torvalds authored Jun 25, 2016

Merge branch 'for-linus-4.7-part2' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs

Pull btrfs fixes part 2 from Chris Mason:
 "This has one patch from Omar to bring iterate_shared back to btrfs.

  We have a tree of work we queue up for directory items and it doesn't
  lend itself well to shared access.  While we're cleaning it up, Omar
  has changed things to use an exclusive lock when there are delayed
  items"

* 'for-linus-4.7-part2' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
  Btrfs: fix ->iterate_shared() by upgrading i_rwsem for delayed nodes

da2f6aba

Merge branch 'for-linus-4.7' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs · b971712a

Linus Torvalds authored Jun 25, 2016

Pull btrfs fixes from Chris Mason:
 "I have a two part pull this time because one of the patches Dave
  Sterba collected needed to be against v4.7-rc2 or higher (we used
  rc4).  I try to make my for-linus-xx branch testable on top of the
  last major so we can hand fixes to people on the list more easily, so
  I've split this pull in two.

  This first part has some fixes and two performance improvements that
  we've been testing for some time.

  Josef's two performance fixes are most notable.  The transid tracking
  patch makes a big improvement on pretty much every workload"

* 'for-linus-4.7' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
  Btrfs: Force stripesize to the value of sectorsize
  btrfs: fix disk_i_size update bug when fallocate() fails
  Btrfs: fix error handling in map_private_extent_buffer
  Btrfs: fix error return code in btrfs_init_test_fs()
  Btrfs: don't do nocow check unless we have to
  btrfs: fix deadlock in delayed_ref_async_start
  Btrfs: track transid for delayed ref flushing

b971712a

Merge tag 'sound-4.7-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound · ca83a55c

Linus Torvalds authored Jun 25, 2016

Pull sound fixes from Takashi Iwai:
 "Again pretty calm weeks: we've had only a few trivial / stable
  HD-audio fixes in addition to a possible race fix for snd-dummy driver
  spotted by syzkaller"

* tag 'sound-4.7-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
  ALSA: dummy: Fix a use-after-free at closing
  ALSA: hda / realtek - add two more Thinkpad IDs (5050,5053) for tpt460 fixup
  ALSA: hda - Fix the headset mic jack detection on Dell machine
  ALSA: hda/tegra: iomem fixups for sparse warnings
  ALSA: hdac_regmap - fix the register access for runtime PM

ca83a55c

Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 9a949a98

Linus Torvalds authored Jun 25, 2016

Pull x86 kprobe fix from Thomas Gleixner:
 "A single fix clearing the TF bit when a fault is single stepped"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  kprobes/x86: Clear TF bit in fault on single-stepping

9a949a98

Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 57801c1b

Linus Torvalds authored Jun 25, 2016

Pull scheduler fixes from Thomas Gleixner:
 "A couple of scheduler fixes:

   - force watchdog reset while processing sysrq-w

   - fix a deadlock when enabling trace events in the scheduler

   - fixes to the throttled next buddy logic

   - fixes for the average accounting (missing serialization and
     underflow handling)

   - allow kernel threads for fallback to online but not active cpus"

* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched/core: Allow kthreads to fall back to online && !active cpus
  sched/fair: Do not announce throttled next buddy in dequeue_task_fair()
  sched/fair: Initialize throttle_count for new task-groups lazily
  sched/fair: Fix cfs_rq avg tracking underflow
  kernel/sysrq, watchdog, sched/core: Reset watchdog on all CPUs while processing sysrq-w
  sched/debug: Fix deadlock when enabling sched events
  sched/fair: Fix post_init_entity_util_avg() serialization

57801c1b

Btrfs: fix ->iterate_shared() by upgrading i_rwsem for delayed nodes · 02dbfc99

Omar Sandoval authored May 20, 2016

Commit fe742fd4 ("Revert "btrfs: switch to ->iterate_shared()"")
backed out the conversion to ->iterate_shared() for Btrfs because the
delayed inode handling in btrfs_real_readdir() is racy. However, we can
still do readdir in parallel if there are no delayed nodes.

This is a temporary fix which upgrades the shared inode lock to an
exclusive lock only when we have delayed items until we come up with a
more complete solution. While we're here, rename the
btrfs_{get,put}_delayed_items functions to make it very clear that
they're just for readdir.

Tested with xfstests and by doing a parallel kernel build:

	while make tinyconfig && make -j4 && git clean -dqfx; do
		:
	done

along with a bunch of parallel finds in another shell:

	while true; do
		for ((i=0; i<4; i++)); do
			find . >/dev/null &
		done
		wait
	done
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Chris Mason <clm@fb.com>

02dbfc99