Commits · 7f25bba819a38ab7310024a9350655f374707e20 · Kirill Smelkov / linux

02 Apr, 2014 32 commits

cifs_iovec_read: keep iov_iter between the calls of cifs_readdata_to_iov() · 7f25bba8

Al Viro authored Feb 04, 2014

... we are doing them on adjacent parts of file, so what happens is that
each subsequent call works to rebuild the iov_iter to exact state it
had been abandoned in by previous one.  Just keep it through the entire
cifs_iovec_read().  And use copy_page_to_iter() instead of doing
kmap/copy_to_user/kunmap manually...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

switch vmsplice_to_user() to copy_page_to_iter() · 6130f531

Al Viro authored Feb 03, 2014

I've switched the sanity checks on iovec to rw_copy_check_uvector();
we might need to do a local analog, if any behaviour differences are
not actually bugfixes here...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

switch pipe_read() to copy_page_to_iter() · 637b58c2
Al Viro authored Feb 03, 2014
```
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
```

cifs_iovec_read(): resubmit shouldn't restart the loop · 74027f4a

Al Viro authored Feb 04, 2014

... by that point the request we'd just resent is in the
head of the list anyway.  Just return to the beginning of
the loop body...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

introduce copy_page_to_iter, kill loop over iovec in generic_file_aio_read() · 6e58e79d

Al Viro authored Feb 03, 2014

generic_file_aio_read() was looping over the target iovec, with loop over
(source) pages nested inside that.  Just set an iov_iter up and pass *that*
to do_generic_file_aio_read().  With copy_page_to_iter() doing all work
of mapping and copying a page to iovec and advancing iov_iter.

Switch shmem_file_aio_read() to the same and kill file_read_actor(), while
we are at it.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

iov_iter: Move iov_iter to uio.h · 92236878
Kent Overstreet authored Nov 27, 2013
```
Signed-off-by: Kent Overstreet <kmo@daterainc.com>
```
do_shmem_file_read(): call file_read_actor() directly · 8142c184
Al Viro authored Feb 02, 2014
```
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
```
callers of iov_copy_from_user_atomic() don't need pagefault_disable() · 9e8c2af9
Al Viro authored Feb 02, 2014
```
... it does that itself (via kmap_atomic())
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
```
switch ->is_partially_uptodate() to saner arguments · c186afb4
Al Viro authored Feb 02, 2014
```
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
```

pipe: kill ->map() and ->unmap() · fbb32750

Al Viro authored Feb 02, 2014

all pipe_buffer_operations have the same instances of those...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

fuse/dev: use atomic maps · 58bda1da
Al Viro authored Feb 02, 2014
```
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
```

VFS: Make delayed_free() call free_vfsmnt() · 8ffcb32e

David Howells authored Jan 24, 2014

Make delayed_free() call free_vfsmnt() so that we don't have two functions
doing the same job.  This requires the calls to mnt_free_id() in free_vfsmnt()
to be moved into the callers of that function.
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

mn10300: kmap_atomic() returns void *, not unsigned long... · 3ef120a4
Al Viro authored Feb 02, 2014
```
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
```
cifs: ->rename() without ->lookup() makes no sense · 81c5a684
Al Viro authored Feb 01, 2014
```
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
```
get rid of pointless checks for NULL ->i_op · 627bf81a
Al Viro authored Feb 01, 2014
```
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
```
ntfs: don't put NULL into ->i_op/->i_fop · 05faf316
Al Viro authored Feb 01, 2014
```
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
```
new helper: readlink_copy() · 5d826c84
Al Viro authored Mar 14, 2014
```
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
```
lustre: generic_readlink() is just fine there, TYVM... · 4efcc9ff
Al Viro authored Mar 14, 2014
```
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
```

get rid of files_defer_init() · 7f4b36f9

Al Viro authored Mar 14, 2014

the only thing it's doing these days is calculation of
upper limit for fs.nr_open sysctl and that can be done
statically
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

namei.c: move EXPORT_SYMBOL to corresponding definitions · 4d359507
Al Viro authored Mar 14, 2014
```
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
```
get_write_access() is inlined, exporting it is pointless · 0018d8bf
Al Viro authored Mar 14, 2014
```
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
```
tidy do_dentry_open() up a bit · 3f4d5a00
Al Viro authored Mar 14, 2014
```
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
```

mark struct file that had write access grabbed by open() · 83f936c7

Al Viro authored Mar 14, 2014

new flag in ->f_mode - FMODE_WRITER. Set by do_dentry_open() when it has
grabbed write access, checked by __fput() to decide whether it wants to
drop the sucker. Allows us to stop bothering with mnt_clone_write()
in alloc_file(), along with fewer special_file() checks.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

fold __get_file_write_access() into its only caller · 0ccb2863
Al Viro authored Mar 14, 2014
```
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
```

get rid of DEBUG_WRITECOUNT · 4597e695

Al Viro authored Mar 14, 2014

it only makes control flow in __fput() and friends more convoluted.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

don't bother with {get,put}_write_access() on non-regular files · dd20908a

Al Viro authored Mar 14, 2014

it's pointless and actually leads to wrong behaviour in at least one
moderately convoluted case (pipe(), close one end, try to get to
another via /proc/*/fd and run into ETXTBUSY).

Cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

ncpfs: switch to sockfd_lookup()/sockfd_put() · 44ba8406
Al Viro authored Mar 06, 2014
```
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
```
switch nbd to sockfd_lookup/sockfd_put · e2511578
Al Viro authored Mar 05, 2014
```
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
```
vhost: don't open-code sockfd_put() · 09aaacf0
Al Viro authored Mar 05, 2014
```
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
```
usbip: don't open-code sockfd_lookup/sockfd_put · 964ea96e
Al Viro authored Mar 05, 2014
```
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
```
reduce m_start() cost... · c7999c36
Al Viro authored Feb 27, 2014
```
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
```

smarter propagate_mnt() · f2ebb3a9

Al Viro authored Feb 27, 2014

The current mainline has copies propagated to *all* nodes, then
tears down the copies we made for nodes that do not contain
counterparts of the desired mountpoint.  That sets the right
propagation graph for the copies (at teardown time we move
the slaves of removed node to a surviving peer or directly
to master), but we end up paying a fairly steep price in
useless allocations.  It's fairly easy to create a situation
where N calls of mount(2) create exactly N bindings, with
O(N^2) vfsmounts allocated and freed in process.

Fortunately, it is possible to avoid those allocations/freeings.
The trick is to create copies in the right order and find which
one would've eventually become a master with the current algorithm.
It turns out to be possible in O(nodes getting propagation) time
and with no extra allocations at all.

One part is that we need to make sure that eventual master will be
created before its slaves, so we need to walk the propagation
tree in a different order - by peer groups.  And iterate through
the peers before dealing with the next group.

Another thing is finding the (earlier) copy that will be a master
of one we are about to create; to do that we are (temporarily) marking
the masters of mountpoints we are attaching the copies to.

Either we are in a peer of the last mountpoint we'd dealt with,
or we have the following situation: we are attaching to mountpoint M,
the last copy S_0 had been attached to M_0 and there are sequences
S_0...S_n, M_0...M_n such that S_{i+1} is a master of S_{i},
S_{i} mounted on M_{i} and we need to create a slave of the first S_{k}
such that M is getting propagation from M_{k}.  It means that the master
of M_{k} will be among the sequence of masters of M.  On the
other hand, the nearest marked node in that sequence will either
be the master of M_{k} or the master of M_{k-1} (the latter -
in the case if M_{k-1} is a slave of something M gets propagation
from, but in a wrong peer group).

So we go through the sequence of masters of M until we find
a marked one (P).  Let N be the one before it.  Then we go through
the sequence of masters of S_0 until we find one (say, S) mounted
on a node D that has P as master and check if D is a peer of N.
If it is, S will be the master of new copy, if not - the master of S
will be.

That's it for the hard part; the rest is fairly simple.  Iterator
is in next_group(), handling of one prospective mountpoint is
propagate_one().

It seems to survive all tests and gives a noticeably better performance
than the current mainline for setups that are seriously using shared
subtrees.

Cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

30 Mar, 2014 4 commits

switch mnt_hash to hlist · 38129a13

Al Viro authored Mar 20, 2014

fixes RCU bug - walking through hlist is safe in face of element moves,
since it's self-terminating.  Cyclic lists are not - if we end up jumping
to another hash chain, we'll loop infinitely without ever hitting the
original list head.

[fix for dumb braino folded]

Spotted by: Max Kellermann <mk@cm4all.com>
Cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

don't bother with propagate_mnt() unless the target is shared · 0b1b901b

Al Viro authored Mar 21, 2014

If the dest_mnt is not shared, propagate_mnt() does nothing -
there's no mounts to propagate to and thus no copies to create.
Might as well not bother calling it in that case.

Cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

keep shadowed vfsmounts together · 1d6a32ac

Al Viro authored Mar 20, 2014

preparation for switching mnt_hash to hlist

Cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

resizable namespace.c hashes · 0818bf27

Al Viro authored Feb 28, 2014

* switch allocation to alloc_large_system_hash()
* make sizes overridable by boot parameters (mhash_entries=, mphash_entries=)
* switch mountpoint_hashtable from list_head to hlist_head

Cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

29 Mar, 2014 3 commits

Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 981e893e

Linus Torvalds authored Mar 29, 2014

Pull timer fix from Ingo Molnar:
 "A late breaking fix from John.  (The bug fixed has a hard lockup
  potential, but that was not observed, warnings were)"

* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  time: Revert to calling clock_was_set_delayed() while in irq context

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client · 0f2776e6

Linus Torvalds authored Mar 29, 2014

Pull Ceph fix from Sage Weil:
 "This drops a bad assert that a few users have been hitting but we've
  only recently been able to track down"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
  rbd: drop an unsafe assertion

rbd: drop an unsafe assertion · 638c323c

Alex Elder authored Mar 25, 2014

Olivier Bonvalet reported having repeated crashes due to a failed
assertion he was hitting in rbd_img_obj_callback():

    Assertion failure in rbd_img_obj_callback() at line 2165:
	rbd_assert(which >= img_request->next_completion);

With a lot of help from Olivier with reproducing the problem
we were able to determine the object and image requests had
already been completed (and often freed) at the point the
assertion failed.

There was a great deal of discussion on the ceph-devel mailing list
about this.  The problem only arose when there were two (or more)
object requests in an image request, and the problem was always
seen when the second request was being completed.

The problem is due to a race in the window between setting the
"done" flag on an object request and checking the image request's
next completion value.  When the first object request completes, it
checks to see if its successor request is marked "done", and if
so, that request is also completed.  In the process, the image
request's next_completion value is updated to reflect that both
the first and second requests are completed.  By the time the
second request is able to check the next_completion value, it
has been set to a value *greater* than its own "which" value,
which caused an assertion to fail.

Fix this problem by skipping over any completion processing
unless the completing object request is the next one expected.
Test only for inequality (not >=), and eliminate the bad
assertion.
Tested-by: Olivier Bonvalet <ob@daevel.fr>
Signed-off-by: Alex Elder <elder@linaro.org>
Reviewed-by: Sage Weil <sage@inktank.com>
Reviewed-by: Ilya Dryomov <ilya.dryomov@inktank.com>

28 Mar, 2014 1 commit

Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · 49d8137a

Linus Torvalds authored Mar 28, 2014

Pull networking fixes from David Miller:

 1) We've discovered a common error in several networking drivers, they
    put VLAN offload features into ->vlan_features, which would suggest
    that they support offloading 2 or more levels of VLAN encapsulation.
    Not only do these devices not do that, but we don't have the
    infrastructure yet to handle that at all.

    Fixes from Vlad Yasevich.

 2) Fix tcpdump crash with bridging and vlans, also from Vlad.

 3) Some MAINTAINERS updates for random32 and bonding.

 4) Fix late reseeds of prandom generator, from Sasha Levin.

 5) Bridge doesn't handle stacked vlans properly, fix from Toshiaki
    Makita.

 6) Fix deadlock in openvswitch, from Flavio Leitner.

 7) get_timewait4_sock() doesn't report delay times correctly, fix from
    Eric Dumazet.

 8) Duplicate address detection and addrconf verification need to run in
    contexts where RTNL can be obtained.  Move them to run from a
    workqueue.  From Hannes Frederic Sowa.

 9) Fix route refcount leaking in ip tunnels, from Pravin B Shelar.

10) Don't return -EINTR from non-blocking recvmsg() on AF_UNIX sockets,
    from Eric Dumazet.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (28 commits)
  vlan: Warn the user if lowerdev has bad vlan features.
  veth: Turn off vlan rx acceleration in vlan_features
  ifb: Remove vlan acceleration from vlan_features
  qlge: Do not propaged vlan tag offloads to vlans
  bridge: Fix crash with vlan filtering and tcpdump
  net: Account for all vlan headers in skb_mac_gso_segment
  MAINTAINERS: bonding: change email address
  MAINTAINERS: bonding: change email address
  ipv6: move DAD and addrconf_verify processing to workqueue
  tcp: fix get_timewait4_sock() delay computation on 64bit
  openvswitch: fix a possible deadlock and lockdep warning
  bridge: Fix handling stacked vlan tags
  bridge: Fix inabillity to retrieve vlan tags when tx offload is disabled
  vhost: validate vhost_get_vq_desc return value
  vhost: fix total length when packets are too short
  random32: avoid attempt to late reseed if in the middle of seeding
  random32: assign to network folks in MAINTAINERS
  net/mlx4_core: pass pci_device_id.driver_data to __mlx4_init_one during reset
  core, nfqueue, openvswitch: Orphan frags in skb_zerocopy and handle errors
  vlan: Set hard_header_len according to available acceleration
  ...
