1. 16 Aug, 2016 14 commits
    • Dave Weinstein's avatar
      arm: oabi compat: add missing access checks · 65413c15
      Dave Weinstein authored
      commit 7de24996 upstream.
      
      Add access checks to sys_oabi_epoll_wait() and sys_oabi_semtimedop().
      This fixes CVE-2016-3857, a local privilege escalation under
      CONFIG_OABI_COMPAT.
      Reported-by: default avatarChiachih Wu <wuchiachih@gmail.com>
      Reviewed-by: default avatarKees Cook <keescook@chromium.org>
      Reviewed-by: default avatarNicolas Pitre <nico@linaro.org>
      Signed-off-by: default avatarDave Weinstein <olorin@google.com>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      65413c15
    • Soheil Hassas Yeganeh's avatar
      tcp: consider recv buf for the initial window scale · 89c2f98f
      Soheil Hassas Yeganeh authored
      [ Upstream commit f626300a ]
      
      tcp_select_initial_window() intends to advertise a window
      scaling for the maximum possible window size. To do so,
      it considers the maximum of net.ipv4.tcp_rmem[2] and
      net.core.rmem_max as the only possible upper-bounds.
      However, users with CAP_NET_ADMIN can use SO_RCVBUFFORCE
      to set the socket's receive buffer size to values
      larger than net.ipv4.tcp_rmem[2] and net.core.rmem_max.
      Thus, SO_RCVBUFFORCE is effectively ignored by
      tcp_select_initial_window().
      
      To fix this, consider the maximum of net.ipv4.tcp_rmem[2],
      net.core.rmem_max and socket's initial buffer space.
      
      Fixes: b0573dea ("[NET]: Introduce SO_{SND,RCV}BUFFORCE socket options")
      Signed-off-by: default avatarSoheil Hassas Yeganeh <soheil@google.com>
      Suggested-by: default avatarNeal Cardwell <ncardwell@google.com>
      Acked-by: default avatarNeal Cardwell <ncardwell@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      89c2f98f
    • Beniamino Galvani's avatar
      macsec: ensure rx_sa is set when validation is disabled · bb51025f
      Beniamino Galvani authored
      [ Upstream commit e3a3b626 ]
      
      macsec_decrypt() is not called when validation is disabled and so
      macsec_skb_cb(skb)->rx_sa is not set; but it is used later in
      macsec_post_decrypt(), ensure that it's always initialized.
      
      Fixes: c09440f7 ("macsec: introduce IEEE 802.1AE driver")
      Signed-off-by: default avatarBeniamino Galvani <bgalvani@redhat.com>
      Acked-by: default avatarSabrina Dubroca <sd@queasysnail.net>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      bb51025f
    • Manish Chopra's avatar
      qed: Fix setting/clearing bit in completion bitmap · 6dc47138
      Manish Chopra authored
      [ Upstream commit 59d3f1ce ]
      
      Slowpath completion handling is incorrectly changing
      SPQ_RING_SIZE bits instead of a single one.
      
      Fixes: 76a9a364 ("qed: fix handling of concurrent ramrods")
      Signed-off-by: default avatarManish Chopra <manish.chopra@qlogic.com>
      Signed-off-by: default avatarYuval Mintz <Yuval.Mintz@qlogic.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      6dc47138
    • Vegard Nossum's avatar
      net/sctp: terminate rhashtable walk correctly · 77719251
      Vegard Nossum authored
      [ Upstream commit 5fc382d8 ]
      
      I was seeing a lot of these:
      
          BUG: sleeping function called from invalid context at mm/slab.h:388
          in_atomic(): 0, irqs_disabled(): 0, pid: 14971, name: trinity-c2
          Preemption disabled at:[<ffffffff819bcd46>] rhashtable_walk_start+0x46/0x150
      
           [<ffffffff81149abb>] preempt_count_add+0x1fb/0x280
           [<ffffffff83295722>] _raw_spin_lock+0x12/0x40
           [<ffffffff811aac87>] console_unlock+0x2f7/0x930
           [<ffffffff811ab5bb>] vprintk_emit+0x2fb/0x520
           [<ffffffff811aba6a>] vprintk_default+0x1a/0x20
           [<ffffffff812c171a>] printk+0x94/0xb0
           [<ffffffff811d6ed0>] print_stack_trace+0xe0/0x170
           [<ffffffff8115835e>] ___might_sleep+0x3be/0x460
           [<ffffffff81158490>] __might_sleep+0x90/0x1a0
           [<ffffffff8139b823>] kmem_cache_alloc+0x153/0x1e0
           [<ffffffff819bca1e>] rhashtable_walk_init+0xfe/0x2d0
           [<ffffffff82ec64de>] sctp_transport_walk_start+0x1e/0x60
           [<ffffffff82edd8ad>] sctp_transport_seq_start+0x4d/0x150
           [<ffffffff8143a82b>] seq_read+0x27b/0x1180
           [<ffffffff814f97fc>] proc_reg_read+0xbc/0x180
           [<ffffffff813d471b>] __vfs_read+0xdb/0x610
           [<ffffffff813d4d3a>] vfs_read+0xea/0x2d0
           [<ffffffff813d615b>] SyS_pread64+0x11b/0x150
           [<ffffffff8100334c>] do_syscall_64+0x19c/0x410
           [<ffffffff832960a5>] return_from_SYSCALL_64+0x0/0x6a
           [<ffffffffffffffff>] 0xffffffffffffffff
      
      Apparently we always need to call rhashtable_walk_stop(), even when
      rhashtable_walk_start() fails:
      
       * rhashtable_walk_start - Start a hash table walk
       * @iter:       Hash table iterator
       *
       * Start a hash table walk.  Note that we take the RCU lock in all
       * cases including when we return an error.  So you must always call
       * rhashtable_walk_stop to clean up.
      
      otherwise we never call rcu_read_unlock() and we get the splat above.
      
      Fixes: 53fa1036 ("sctp: fix some rhashtable functions using in sctp proc/diag")
      See-also: 53fa1036 ("sctp: fix some rhashtable functions using in sctp proc/diag")
      See-also: f2dba9c6 ("rhashtable: Introduce rhashtable_walk_*")
      Cc: Xin Long <lucien.xin@gmail.com>
      Cc: Herbert Xu <herbert@gondor.apana.org.au>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarVegard Nossum <vegard.nossum@oracle.com>
      Acked-by: default avatarMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      77719251
    • Vegard Nossum's avatar
      net/irda: fix NULL pointer dereference on memory allocation failure · e8f7ce7b
      Vegard Nossum authored
      [ Upstream commit d3e6952c ]
      
      I ran into this:
      
          kasan: CONFIG_KASAN_INLINE enabled
          kasan: GPF could be caused by NULL-ptr deref or user memory access
          general protection fault: 0000 [#1] PREEMPT SMP KASAN
          CPU: 2 PID: 2012 Comm: trinity-c3 Not tainted 4.7.0-rc7+ #19
          Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
          task: ffff8800b745f2c0 ti: ffff880111740000 task.ti: ffff880111740000
          RIP: 0010:[<ffffffff82bbf066>]  [<ffffffff82bbf066>] irttp_connect_request+0x36/0x710
          RSP: 0018:ffff880111747bb8  EFLAGS: 00010286
          RAX: dffffc0000000000 RBX: 0000000000000000 RCX: 0000000069dd8358
          RDX: 0000000000000009 RSI: 0000000000000027 RDI: 0000000000000048
          RBP: ffff880111747c00 R08: 0000000000000000 R09: 0000000000000000
          R10: 0000000069dd8358 R11: 1ffffffff0759723 R12: 0000000000000000
          R13: ffff88011a7e4780 R14: 0000000000000027 R15: 0000000000000000
          FS:  00007fc738404700(0000) GS:ffff88011af00000(0000) knlGS:0000000000000000
          CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
          CR2: 00007fc737fdfb10 CR3: 0000000118087000 CR4: 00000000000006e0
          Stack:
           0000000000000200 ffff880111747bd8 ffffffff810ee611 ffff880119f1f220
           ffff880119f1f4f8 ffff880119f1f4f0 ffff88011a7e4780 ffff880119f1f232
           ffff880119f1f220 ffff880111747d58 ffffffff82bca542 0000000000000000
          Call Trace:
           [<ffffffff82bca542>] irda_connect+0x562/0x1190
           [<ffffffff825ae582>] SYSC_connect+0x202/0x2a0
           [<ffffffff825b4489>] SyS_connect+0x9/0x10
           [<ffffffff8100334c>] do_syscall_64+0x19c/0x410
           [<ffffffff83295ca5>] entry_SYSCALL64_slow_path+0x25/0x25
          Code: 41 89 ca 48 89 e5 41 57 41 56 41 55 41 54 41 89 d7 53 48 89 fb 48 83 c7 48 48 89 fa 41 89 f6 48 c1 ea 03 48 83 ec 20 4c 8b 65 10 <0f> b6 04 02 84 c0 74 08 84 c0 0f 8e 4c 04 00 00 80 7b 48 00 74
          RIP  [<ffffffff82bbf066>] irttp_connect_request+0x36/0x710
           RSP <ffff880111747bb8>
          ---[ end trace 4cda2588bc055b30 ]---
      
      The problem is that irda_open_tsap() can fail and leave self->tsap = NULL,
      and then irttp_connect_request() almost immediately dereferences it.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarVegard Nossum <vegard.nossum@oracle.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      e8f7ce7b
    • Marcelo Ricardo Leitner's avatar
      sctp: fix BH handling on socket backlog · 2f9abe46
      Marcelo Ricardo Leitner authored
      [ Upstream commit eefc1b1d ]
      
      Now that the backlog processing is called with BH enabled, we have to
      disable BH before taking the socket lock via bh_lock_sock() otherwise
      it may dead lock:
      
      sctp_backlog_rcv()
                      bh_lock_sock(sk);
      
                      if (sock_owned_by_user(sk)) {
                              if (sk_add_backlog(sk, skb, sk->sk_rcvbuf))
                                      sctp_chunk_free(chunk);
                              else
                                      backloged = 1;
                      } else
                              sctp_inq_push(inqueue, chunk);
      
                      bh_unlock_sock(sk);
      
      while sctp_inq_push() was disabling/enabling BH, but enabling BH
      triggers pending softirq, which then may try to re-lock the socket in
      sctp_rcv().
      
      [  219.187215]  <IRQ>
      [  219.187217]  [<ffffffff817ca3e0>] _raw_spin_lock+0x20/0x30
      [  219.187223]  [<ffffffffa041888c>] sctp_rcv+0x48c/0xba0 [sctp]
      [  219.187225]  [<ffffffff816e7db2>] ? nf_iterate+0x62/0x80
      [  219.187226]  [<ffffffff816f1b14>] ip_local_deliver_finish+0x94/0x1e0
      [  219.187228]  [<ffffffff816f1e1f>] ip_local_deliver+0x6f/0xf0
      [  219.187229]  [<ffffffff816f1a80>] ? ip_rcv_finish+0x3b0/0x3b0
      [  219.187230]  [<ffffffff816f17a8>] ip_rcv_finish+0xd8/0x3b0
      [  219.187232]  [<ffffffff816f2122>] ip_rcv+0x282/0x3a0
      [  219.187233]  [<ffffffff810d8bb6>] ? update_curr+0x66/0x180
      [  219.187235]  [<ffffffff816abac4>] __netif_receive_skb_core+0x524/0xa90
      [  219.187236]  [<ffffffff810d8e00>] ? update_cfs_shares+0x30/0xf0
      [  219.187237]  [<ffffffff810d557c>] ? __enqueue_entity+0x6c/0x70
      [  219.187239]  [<ffffffff810dc454>] ? enqueue_entity+0x204/0xdf0
      [  219.187240]  [<ffffffff816ac048>] __netif_receive_skb+0x18/0x60
      [  219.187242]  [<ffffffff816ad1ce>] process_backlog+0x9e/0x140
      [  219.187243]  [<ffffffff816ac8ec>] net_rx_action+0x22c/0x370
      [  219.187245]  [<ffffffff817cd352>] __do_softirq+0x112/0x2e7
      [  219.187247]  [<ffffffff817cc3bc>] do_softirq_own_stack+0x1c/0x30
      [  219.187247]  <EOI>
      [  219.187248]  [<ffffffff810aa1c8>] do_softirq.part.14+0x38/0x40
      [  219.187249]  [<ffffffff810aa24d>] __local_bh_enable_ip+0x7d/0x80
      [  219.187254]  [<ffffffffa0408428>] sctp_inq_push+0x68/0x80 [sctp]
      [  219.187258]  [<ffffffffa04190f1>] sctp_backlog_rcv+0x151/0x1c0 [sctp]
      [  219.187260]  [<ffffffff81692b07>] __release_sock+0x87/0xf0
      [  219.187261]  [<ffffffff81692ba0>] release_sock+0x30/0xa0
      [  219.187265]  [<ffffffffa040e46d>] sctp_accept+0x17d/0x210 [sctp]
      [  219.187266]  [<ffffffff810e7510>] ? prepare_to_wait_event+0xf0/0xf0
      [  219.187268]  [<ffffffff8172d52c>] inet_accept+0x3c/0x130
      [  219.187269]  [<ffffffff8168d7a3>] SYSC_accept4+0x103/0x210
      [  219.187271]  [<ffffffff817ca2ba>] ? _raw_spin_unlock_bh+0x1a/0x20
      [  219.187272]  [<ffffffff81692bfc>] ? release_sock+0x8c/0xa0
      [  219.187276]  [<ffffffffa0413e22>] ? sctp_inet_listen+0x62/0x1b0 [sctp]
      [  219.187277]  [<ffffffff8168f2d0>] SyS_accept+0x10/0x20
      
      Fixes: 860fbbc3 ("sctp: prepare for socket backlog behavior change")
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: default avatarMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      2f9abe46
    • Mike Manning's avatar
      net: ipv6: Always leave anycast and multicast groups on link down · 201e07fe
      Mike Manning authored
      [ Upstream commit ea06f717 ]
      
      Default kernel behavior is to delete IPv6 addresses on link
      down, which entails deletion of the multicast and the
      subnet-router anycast addresses. These deletions do not
      happen with sysctl setting to keep global IPv6 addresses on
      link down, so every link down/up causes an increment of the
      anycast and multicast refcounts. These bogus refcounts may
      stop these addrs from being removed on subsequent calls to
      delete them. The solution is to leave the groups for the
      multicast and subnet anycast on link down for the callflow
      when global IPv6 addresses are kept.
      
      Fixes: f1705ec1 ("net: ipv6: Make address flushing on ifdown optional")
      Signed-off-by: default avatarMike Manning <mmanning@brocade.com>
      Acked-by: default avatarDavid Ahern <dsa@cumulusnetworks.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      201e07fe
    • Ido Schimmel's avatar
      bridge: Fix incorrect re-injection of LLDP packets · 5986b7b7
      Ido Schimmel authored
      [ Upstream commit baedbe55 ]
      
      Commit 8626c56c ("bridge: fix potential use-after-free when hook
      returns QUEUE or STOLEN verdict") caused LLDP packets arriving through a
      bridge port to be re-injected to the Rx path with skb->dev set to the
      bridge device, but this breaks the lldpad daemon.
      
      The lldpad daemon opens a packet socket with protocol set to ETH_P_LLDP
      for any valid device on the system, which doesn't not include soft
      devices such as bridge and VLAN.
      
      Since packet sockets (ptype_base) are processed in the Rx path after the
      Rx handler, LLDP packets with skb->dev set to the bridge device never
      reach the lldpad daemon.
      
      Fix this by making the bridge's Rx handler re-inject LLDP packets with
      RX_HANDLER_PASS, which effectively restores the behaviour prior to the
      mentioned commit.
      
      This means netfilter will never receive LLDP packets coming through a
      bridge port, as I don't see a way in which we can have okfn() consume
      the packet without breaking existing behaviour. I've already carried out
      a similar fix for STP packets in commit 56fae404 ("bridge: Fix
      incorrect re-injection of STP packets").
      
      Fixes: 8626c56c ("bridge: fix potential use-after-free when hook returns QUEUE or STOLEN verdict")
      Signed-off-by: default avatarIdo Schimmel <idosch@mellanox.com>
      Reviewed-by: default avatarJiri Pirko <jiri@mellanox.com>
      Cc: Florian Westphal <fw@strlen.de>
      Cc: John Fastabend <john.fastabend@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      5986b7b7
    • Mark Bloch's avatar
      net/bonding: Enforce active-backup policy for IPoIB bonds · 746ac22c
      Mark Bloch authored
      [ Upstream commit 1533e773 ]
      
      When using an IPoIB bond currently only active-backup mode is a valid
      use case and this commit strengthens it.
      
      Since commit 2ab82852 ("net/bonding: Enable bonding to enslave
      netdevices not supporting set_mac_address()") was introduced till
      4.7-rc1, IPoIB didn't support the set_mac_address ndo, and hence the
      fail over mac policy always applied to IPoIB bonds.
      
      With the introduction of commit 492a7e67 ("IB/IPoIB: Allow setting
      the device address"), that doesn't hold and practically IPoIB bonds are
      broken as of that. To fix it, lets go to fail over mac if the device
      doesn't support the ndo OR this is IPoIB device.
      
      As a by-product, this commit also prevents a stack corruption which
      occurred when trying to copy 20 bytes (IPoIB) device address
      to a sockaddr struct that has only 16 bytes of storage.
      Signed-off-by: default avatarMark Bloch <markb@mellanox.com>
      Signed-off-by: default avatarOr Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: default avatarSaeed Mahameed <saeedm@mellanox.com>
      Acked-by: default avatarAndy Gospodarek <gospo@cumulusnetworks.com>
      Signed-off-by: default avatarJay Vosburgh <jay.vosburgh@canonical.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      746ac22c
    • Daniel Borkmann's avatar
      udp: use sk_filter_trim_cap for udp{,6}_queue_rcv_skb · ec3bdcb8
      Daniel Borkmann authored
      [ Upstream commit ba66bbe5 ]
      
      After a6127697 ("udp: prevent bugcheck if filter truncates packet
      too much"), there followed various other fixes for similar cases such
      as f4979fce ("rose: limit sk_filter trim to payload").
      
      Latter introduced a new helper sk_filter_trim_cap(), where we can pass
      the trim limit directly to the socket filter handling. Make use of it
      here as well with sizeof(struct udphdr) as lower cap limit and drop the
      extra skb->len test in UDP's input path.
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Cc: Willem de Bruijn <willemb@google.com>
      Acked-by: default avatarWillem de Bruijn <willemb@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      ec3bdcb8
    • Miklos Szeredi's avatar
      vfs: fix deadlock in file_remove_privs() on overlayfs · 99ffed74
      Miklos Szeredi authored
      commit c1892c37 upstream.
      
      file_remove_privs() is called with inode lock on file_inode(), which
      proceeds to calling notify_change() on file->f_path.dentry.  Which triggers
      the WARN_ON_ONCE(!inode_is_locked(inode)) in addition to deadlocking later
      when ovl_setattr tries to lock the underlying inode again.
      
      Fix this mess by not mixing the layers, but doing everything on underlying
      dentry/inode.
      Signed-off-by: default avatarMiklos Szeredi <mszeredi@redhat.com>
      Fixes: 07a2daab ("ovl: Copy up underlying inode's ->i_mode to overlay inode")
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      99ffed74
    • Scott Bauer's avatar
      vfs: ioctl: prevent double-fetch in dedupe ioctl · 37fe5281
      Scott Bauer authored
      commit 10eec60c upstream.
      
      This prevents a double-fetch from user space that can lead to to an
      undersized allocation and heap overflow.
      
      Fixes: 54dbc151 ("vfs: hoist the btrfs deduplication ioctl to the vfs")
      Signed-off-by: default avatarScott Bauer <sbauer@plzdonthack.me>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      37fe5281
    • Vegard Nossum's avatar
      ext4: verify extent header depth · 3f66cf7a
      Vegard Nossum authored
      commit 7bc94916 upstream.
      
      Although the extent tree depth of 5 should enough be for the worst
      case of 2*32 extents of length 1, the extent tree code does not
      currently to merge nodes which are less than half-full with a sibling
      node, or to shrink the tree depth if possible.  So it's possible, at
      least in theory, for the tree depth to be greater than 5.  However,
      even in the worst case, a tree depth of 32 is highly unlikely, and if
      the file system is maliciously corrupted, an insanely large eh_depth
      can cause memory allocation failures that will trigger kernel warnings
      (here, eh_depth = 65280):
      
          JBD2: ext4.exe wants too many credits credits:195849 rsv_credits:0 max:256
          ------------[ cut here ]------------
          WARNING: CPU: 0 PID: 50 at fs/jbd2/transaction.c:293 start_this_handle+0x569/0x580
          CPU: 0 PID: 50 Comm: ext4.exe Not tainted 4.7.0-rc5+ #508
          Stack:
           604a8947 625badd8 0002fd09 00000000
           60078643 00000000 62623910 601bf9bc
           62623970 6002fc84 626239b0 900000125
          Call Trace:
           [<6001c2dc>] show_stack+0xdc/0x1a0
           [<601bf9bc>] dump_stack+0x2a/0x2e
           [<6002fc84>] __warn+0x114/0x140
           [<6002fdff>] warn_slowpath_null+0x1f/0x30
           [<60165829>] start_this_handle+0x569/0x580
           [<60165d4e>] jbd2__journal_start+0x11e/0x220
           [<60146690>] __ext4_journal_start_sb+0x60/0xa0
           [<60120a81>] ext4_truncate+0x131/0x3a0
           [<60123677>] ext4_setattr+0x757/0x840
           [<600d5d0f>] notify_change+0x16f/0x2a0
           [<600b2b16>] do_truncate+0x76/0xc0
           [<600c3e56>] path_openat+0x806/0x1300
           [<600c55c9>] do_filp_open+0x89/0xf0
           [<600b4074>] do_sys_open+0x134/0x1e0
           [<600b4140>] SyS_open+0x20/0x30
           [<6001ea68>] handle_syscall+0x88/0x90
           [<600295fd>] userspace+0x3fd/0x500
           [<6001ac55>] fork_handler+0x85/0x90
      
          ---[ end trace 08b0b88b6387a244 ]---
      
      [ Commit message modified and the extent tree depath check changed
      from 5 to 32 -- tytso ]
      
      Cc: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: default avatarVegard Nossum <vegard.nossum@oracle.com>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      3f66cf7a
  2. 24 Jul, 2016 2 commits
  3. 23 Jul, 2016 22 commits
    • Linus Torvalds's avatar
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · 107df032
      Linus Torvalds authored
      Pull networking fixes from David Miller:
      
       1) Fix memory leak in nftables, from Liping Zhang.
      
       2) Need to check result of vlan_insert_tag() in batman-adv otherwise we
          risk NULL skb derefs, from Sven Eckelmann.
      
       3) Check for dev_alloc_skb() failures in cfg80211, from Gregory
          Greenman.
      
       4) Handle properly when we have ppp_unregister_channel() happening in
          parallel with ppp_connect_channel(), from WANG Cong.
      
       5) Fix DCCP deadlock, from Eric Dumazet.
      
       6) Bail out properly in UDP if sk_filter() truncates the packet to be
          smaller than even the space that the protocol headers need.  From
          Michal Kubecek.
      
       7) Similarly for rose, dccp, and sctp, from Willem de Bruijn.
      
       8) Make TCP challenge ACKs less predictable, from Eric Dumazet.
      
       9) Fix infinite loop in bgmac_dma_tx_add() from Florian Fainelli.
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (65 commits)
        packet: propagate sock_cmsg_send() error
        net/mlx5e: Fix del vxlan port command buffer memset
        packet: fix second argument of sock_tx_timestamp()
        net: switchdev: change ageing_time type to clock_t
        Update maintainer for EHEA driver.
        net/mlx4_en: Add resilience in low memory systems
        net/mlx4_en: Move filters cleanup to a proper location
        sctp: load transport header after sk_filter
        net/sched/sch_htb: clamp xstats tokens to fit into 32-bit int
        net: cavium: liquidio: Avoid dma_unmap_single on uninitialized ndata
        net: nb8800: Fix SKB leak in nb8800_receive()
        et131x: Fix logical vs bitwise check in et131x_tx_timeout()
        vlan: use a valid default mtu value for vlan over macsec
        net: bgmac: Fix infinite loop in bgmac_dma_tx_add()
        mlxsw: spectrum: Prevent invalid ingress buffer mapping
        mlxsw: spectrum: Prevent overwrite of DCB capability fields
        mlxsw: spectrum: Don't emit errors when PFC is disabled
        mlxsw: spectrum: Indicate support for autonegotiation
        mlxsw: spectrum: Force link training according to admin state
        r8152: add MODULE_VERSION
        ...
      107df032
    • Linus Torvalds's avatar
      Merge branch 'overlayfs-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs · 88083e98
      Linus Torvalds authored
      Pull overlayfs fixes from Miklos Szeredi:
       "This contains a fix for a potential crash/corruption issue and another
        where the suid/sgid bits weren't cleared on write"
      
      * 'overlayfs-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs:
        ovl: verify upper dentry in ovl_remove_and_whiteout()
        ovl: Copy up underlying inode's ->i_mode to overlay inode
        ovl: handle ATTR_KILL*
      88083e98
    • Linus Torvalds's avatar
      Merge branch 'akpm' (patches from Andrew) · b1386ced
      Linus Torvalds authored
      Merge misc fixes from Andrew Morton:
       "Five fixes"
      
      * emailed patches from Andrew Morton <akpm@linux-foundation.org>:
        pps: do not crash when failed to register
        tools/vm/slabinfo: fix an unintentional printf
        testing/radix-tree: fix a macro expansion bug
        radix-tree: fix radix_tree_iter_retry() for tagged iterators.
        mm: memcontrol: fix cgroup creation failure after many small jobs
      b1386ced
    • Linus Torvalds's avatar
      Merge tag 'drm-fixes-for-v4.7-rc8-intel-kbl' of git://people.freedesktop.org/~airlied/linux · d15ae814
      Linus Torvalds authored
      Pull intel kabylake drm fixes from Dave Airlie:
       "As mentioned Intel has gathered all the Kabylake fixes from -next,
        which we've enabled in 4.7 for the first time, these are pretty much
        limited in scope to only affects kabylake, which is hw that isn't
        shipping yet.  So I'm mostly okay with it going in now.
      
        If we don't land this, it might be a good idea to disable kabylake
        support in 4.7 before we ship"
      
      * tag 'drm-fixes-for-v4.7-rc8-intel-kbl' of git://people.freedesktop.org/~airlied/linux: (28 commits)
        drm/i915/kbl: Introduce the first official DMC for Kabylake.
        drm/i915: Introduce Kabypoint PCH for Kabylake H/DT.
        drm/i915/gen9: implement WaConextSwitchWithConcurrentTLBInvalidate
        drm/i915/gen9: Add WaFbcHighMemBwCorruptionAvoidance
        drm/i195/fbc: Add WaFbcNukeOnHostModify
        drm/i915/gen9: Add WaFbcWakeMemOn
        drm/i915/gen9: Add WaFbcTurnOffFbcWatermark
        drm/i915/kbl: Add WaClearSlmSpaceAtContextSwitch
        drm/i915/gen9: Add WaEnableChickenDCPR
        drm/i915/kbl: Add WaDisableSbeCacheDispatchPortSharing
        drm/i915/kbl: Add WaDisableGafsUnitClkGating
        drm/i915/kbl: Add WaForGAMHang
        drm/i915: Add WaInsertDummyPushConstP for bxt and kbl
        drm/i915/kbl: Add WaDisableDynamicCreditSharing
        drm/i915/kbl: Add WaDisableGamClockGating
        drm/i915/gen9: Enable must set chicken bits in config0 reg
        drm/i915/kbl: Add WaDisableLSQCROPERFforOCL
        drm/i915/kbl: Add WaDisableSDEUnitClockGating
        drm/i915/kbl: Add WaDisableFenceDestinationToSLM for A0
        drm/i915/kbl: Add WaEnableGapsTsvCreditFix
        ...
      d15ae814
    • Linus Torvalds's avatar
      Merge tag 'drm-fixes-for-v4.7-rc8-intel' of git://people.freedesktop.org/~airlied/linux · 3f2625d7
      Linus Torvalds authored
      Pull drm fixes from Dave Airlie:
       "Two i915 regression fixes.
      
        Intel have submitted some Kabylake fixes I'll send separately, since
        this is the first kernel with kabylake support and they don't go much
        outside that area I think they should be fine"
      
      * tag 'drm-fixes-for-v4.7-rc8-intel' of git://people.freedesktop.org/~airlied/linux:
        drm/i915: add missing condition for committing planes on crtc
        drm/i915: Treat eDP as always connected, again
      3f2625d7
    • Linus Torvalds's avatar
      Merge tag 'm68k-for-v4.8-tag1' of git://git.kernel.org/pub/scm/linux/kernel/git/geert/linux-m68k · 23218843
      Linus Torvalds authored
      Pull m68k upddates from Geert Uytterhoeven:
       - assorted spelling fixes
       - defconfig updates
      
      * tag 'm68k-for-v4.8-tag1' of git://git.kernel.org/pub/scm/linux/kernel/git/geert/linux-m68k:
        m68k/defconfig: Update defconfigs for v4.7-rc2
        m68k: Assorted spelling fixes
      23218843
    • Linus Torvalds's avatar
      Merge tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc · 7825e0c4
      Linus Torvalds authored
      Pull ARM SoC fixes from Olof Johansson:
       "A handful of fixes before final release:
      
        Marvell Armada:
         - One to fix a typo in the devicetree specifying memory ranges for
           the crypto engine
         - Two to deal with marking PCI and device-memory as strongly ordered
           to avoid hardware deadlocks, in particular when enabling above
           crypto driver.
         - Compile fix for PM
      
        Allwinner:
         - DT clock fixes to deal with u-boot-enabled framebuffer (simplefb).
         - Make R8 (C.H.I.P. SoC) inherit system compatibility from A13 to
           make clocks register proper.
      
        Tegra:
         - Fix SD card voltage setting on the Tegra3 Beaver dev board
      
        Misc:
         - Two maintainers updates for STM32 and STi platforms"
      
      * tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
        ARM: tegra: beaver: Allow SD card voltage to be changed
        MAINTAINERS: update STi maintainer list
        MAINTAINERS: update STM32 maintainers list
        ARM: mvebu: compile pm code conditionally
        ARM: dts: sun7i: Fix pll3x2 and pll7x2 not having a parent clock
        ARM: dts: sunxi: Add pll3 to simplefb nodes clocks lists
        ARM: dts: armada-38x: fix MBUS_ID for crypto SRAM on Armada 385 Linksys
        ARM: mvebu: map PCI I/O regions strongly ordered
        ARM: mvebu: fix HW I/O coherency related deadlocks
        ARM: sunxi/dt: make the CHIP inherit from allwinner,sun5i-a13
      7825e0c4
    • Linus Torvalds's avatar
      Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 · 48d4ca56
      Linus Torvalds authored
      Pull crypto fixes from Herbert Xu:
       "This fixes a sporadic build failure in the qat driver as well as a
        memory corruption bug in rsa-pkcs1pad"
      
      * 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
        crypto: rsa-pkcs1pad - fix rsa-pkcs1pad request struct
        crypto: qat - make qat_asym_algs.o depend on asn1 headers
      48d4ca56
    • Linus Torvalds's avatar
      Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security · 897473fc
      Linus Torvalds authored
      Pull key handling fixes from James Morris:
       "Quoting David Howells:
      
        Here are three miscellaneous fixes:
      
        (1) Fix a panic in some debugging code in PKCS#7.  This can only
            happen by explicitly inserting a #define DEBUG into the code.
      
        (2) Fix the calculation of the digest length in the PE file parser.
            This causes a failure where there should be a success.
      
        (3) Fix the case where an X.509 cert can be added as an asymmetric key
            to a trusted keyring with no trust restriction if no AKID is
            supplied.
      
        Bugs (1) and (2) aren't particularly problematic, but (3) allows a
        security check to be bypassed.  Happily, this is a recent regression
        and never made it into a released kernel"
      
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security:
        KEYS: Fix for erroneous trust of incorrectly signed X.509 certs
        pefile: Fix the failure of calculation for digest
        PKCS#7: Fix panic when referring to the empty AKID when DEBUG defined
      897473fc
    • Linus Torvalds's avatar
      Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input · 3aa536d9
      Linus Torvalds authored
      Pull input fixes from Dmitry Torokhov:
       "A few more fixes for the input subsystem:
      
         - restore naming for tsc2005 touchscreens as some userspace match on it
         - fix out of bound access in legacy keyboard driver
         - fixup in RMI4 driver
      
        Everything is tagged for stable as well"
      
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
        Input: tsc200x - report proper input_dev name
        tty/vt/keyboard: fix OOB access in do_compute_shiftstate()
        Input: synaptics-rmi4 - fix maximum size check for F12 control register 8
      3aa536d9
    • Linus Torvalds's avatar
      Merge branch 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm · f1894d83
      Linus Torvalds authored
      Pull libnvdimm fix from Dan Williams:
       "This contains a regression fix for a problem that was introduced in
        v4.7-rc6.
      
        In 4.7-rc1 we introduced auto-probing for the ACPI DSM (device-
        specific-method) format that the platform firmware implements for
        nvdimm devices.  We initially fixed a regression in probing the QEMU
        DSM implementation by making acpi_check_dsm() tolerant of the way QEMU
        reports the "0 DSMs supported" condition.
      
        However, that broke HPE platforms since that tolerance caused the
        driver to mistakenly match the 1-zero-byte response those platforms
        give to "unknown" commands.  Instead, we simply make the driver
        tolerant of not finding any supported DSMs.  This has been tested to
        work with both QEMU and HPE platforms.
      
        This commit has appeared in a -next release with no reported issues"
      
      * 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
        nfit: make DIMM DSMs optional
      f1894d83
    • Linus Torvalds's avatar
      Merge tag 'gpio-v4.7-6' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio · ee62f09b
      Linus Torvalds authored
      Pull GPIO fix from Linus Walleij:
       "Compile problem fix for Tegra,
      
        Sorry to send this in the last minute but Ingo says this build failure
        is very prominent so I'm not going to wait for v4.7 before sending it.
      
        It is a case of COMPILE_TEST causing more problems than it solves and
        I'm already swearing about me shooting myself in the foot with that
        gun :("
      
      * tag 'gpio-v4.7-6' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio:
        gpio: tegra: don't auto-enable for COMPILE_TEST
      ee62f09b
    • Linus Torvalds's avatar
      Merge tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux · 62cd69d5
      Linus Torvalds authored
      Pull clk fixes from Michael Turquette:
       "Fix a bug in the at91 clk driver, two compile time warnings in sunxi
        clk drivers, and one bug in a sunxi clk driver introduced in the 4.7
        merge window"
      
      * tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux:
        clk: at91: fix clk_programmable_set_parent()
        clk: sunxi: remove unused variable
        clk: sunxi: display: Add per-clock flags
        clk: sunxi: tcon-ch1: Do not return a negative error in get_parent
      62cd69d5
    • Linus Torvalds's avatar
      Merge branch 'for-4.7-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata · a933f80d
      Linus Torvalds authored
      Pull libata fix from Tejun Heo:
       "Another fallout from max_sectors bump a couple years ago.  The lite-on
        optical drive times out on large requests"
      
      * 'for-4.7-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata:
        libata: LITE-ON CX1-JB256-HP needs lower max_sectors
      a933f80d
    • Linus Torvalds's avatar
      Merge tag 'mmc-v4.7-rc7' of git://git.linaro.org/people/ulf.hansson/mmc · ea4b3cfa
      Linus Torvalds authored
      Pull MMC fixes from Ulf Hansson:
       "Here are a few late mmc fixes intended for v4.7 final.
      
        MMC core:
         - Fix eMMC packed command header endianness
         - Fix free of uninitialized buffer for mmc ioctl
      
        MMC host:
         - pxamci: Fix potential oops in ->probe()"
      
      * tag 'mmc-v4.7-rc7' of git://git.linaro.org/people/ulf.hansson/mmc:
        mmc: pxamci: fix potential oops
        mmc: block: fix packed command header endianness
        mmc: block: fix free of uninitialized 'idata->buf'
      ea4b3cfa
    • Linus Torvalds's avatar
      Merge tag 'sound-4.7-fix2' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound · b6cbecae
      Linus Torvalds authored
      Pull sound fixes from Takashi Iwai:
       "No surprise, just a few small fixes: a couple of changes are seen in
        the core part, and both of them are rather for unusual error paths.
      
        The rest are the regular HD-audio fixes and one USB-audio regression
        fix"
      
      * tag 'sound-4.7-fix2' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
        ALSA: usb-audio: Fix quirks code is not called
        ALSA: hda: add AMD Stoney PCI ID with proper driver caps
        ALSA: hda - fix use-after-free after module unload
        ALSA: pcm: Free chmap at PCM free callback, too
        ALSA: ctl: Stop notification after disconnection
        ALSA: hda/realtek - add new pin definition in alc225 pin quirk table
      b6cbecae
    • Linus Torvalds's avatar
      Merge branch 'for-linus' of git://git.kernel.dk/linux-block · ff8d6fac
      Linus Torvalds authored
      Pull NVMe fix from Jens Axboe:
       "Late addition here, it's basically a revert of a patch that was added
        in this merge window, but has proven to cause problems.
      
        This is swapping out the RCU based namespace protection with a good
        old mutex instead"
      
      * 'for-linus' of git://git.kernel.dk/linux-block:
        nvme: Remove RCU namespace protection
      ff8d6fac
    • Jiri Slaby's avatar
      pps: do not crash when failed to register · 368301f2
      Jiri Slaby authored
      With this command sequence:
      
        modprobe plip
        modprobe pps_parport
        rmmod pps_parport
      
      the partport_pps modules causes this crash:
      
        BUG: unable to handle kernel NULL pointer dereference at (null)
        IP: parport_detach+0x1d/0x60 [pps_parport]
        Oops: 0000 [#1] SMP
        ...
        Call Trace:
          parport_unregister_driver+0x65/0xc0 [parport]
          SyS_delete_module+0x187/0x210
      
      The sequence that builds up to this is:
      
       1) plip is loaded and takes the parport device for exclusive use:
      
          plip0: Parallel port at 0x378, using IRQ 7.
      
       2) pps_parport then fails to grab the device:
      
          pps_parport: parallel port PPS client
          parport0: cannot grant exclusive access for device pps_parport
          pps_parport: couldn't register with parport0
      
       3) rmmod of pps_parport is then killed because it tries to access
          pardev->name, but pardev (taken from port->cad) is NULL.
      
      So add a check for NULL in the test there too.
      
      Link: http://lkml.kernel.org/r/20160714115245.12651-1-jslaby@suse.czSigned-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      Acked-by: default avatarRodolfo Giometti <giometti@enneenne.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      368301f2
    • Dan Carpenter's avatar
      tools/vm/slabinfo: fix an unintentional printf · 2d6a4d64
      Dan Carpenter authored
      The curly braces are missing here so we print stuff unintentionally.
      
      Fixes: 9da4714a ('slub: slabinfo update for cmpxchg handling')
      Link: http://lkml.kernel.org/r/20160715211243.GE19522@mwandaSigned-off-by: default avatarDan Carpenter <dan.carpenter@oracle.com>
      Acked-by: default avatarChristoph Lameter <cl@linux.com>
      Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Cc: Colin Ian King <colin.king@canonical.com>
      Cc: Laura Abbott <labbott@fedoraproject.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      2d6a4d64
    • Dan Carpenter's avatar
      testing/radix-tree: fix a macro expansion bug · b301aac5
      Dan Carpenter authored
      There are no parentheses around this macro and it causes a problem when
      we do:
      
      	index = rand() % THRASH_SIZE;
      
      Link: http://lkml.kernel.org/r/20160715210953.GC19522@mwandaSigned-off-by: default avatarDan Carpenter <dan.carpenter@oracle.com>
      Acked-by: default avatarRoss Zwisler <ross.zwisler@linux.intel.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      b301aac5
    • Andrey Ryabinin's avatar
      radix-tree: fix radix_tree_iter_retry() for tagged iterators. · 3cb9185c
      Andrey Ryabinin authored
      radix_tree_iter_retry() resets slot to NULL, but it doesn't reset tags.
      Then NULL slot and non-zero iter.tags passed to radix_tree_next_slot()
      leading to crash:
      
        RIP: radix_tree_next_slot include/linux/radix-tree.h:473
          find_get_pages_tag+0x334/0x930 mm/filemap.c:1452
        ....
        Call Trace:
          pagevec_lookup_tag+0x3a/0x80 mm/swap.c:960
          mpage_prepare_extent_to_map+0x321/0xa90 fs/ext4/inode.c:2516
          ext4_writepages+0x10be/0x2b20 fs/ext4/inode.c:2736
          do_writepages+0x97/0x100 mm/page-writeback.c:2364
          __filemap_fdatawrite_range+0x248/0x2e0 mm/filemap.c:300
          filemap_write_and_wait_range+0x121/0x1b0 mm/filemap.c:490
          ext4_sync_file+0x34d/0xdb0 fs/ext4/fsync.c:115
          vfs_fsync_range+0x10a/0x250 fs/sync.c:195
          vfs_fsync fs/sync.c:209
          do_fsync+0x42/0x70 fs/sync.c:219
          SYSC_fdatasync fs/sync.c:232
          SyS_fdatasync+0x19/0x20 fs/sync.c:230
          entry_SYSCALL_64_fastpath+0x23/0xc1 arch/x86/entry/entry_64.S:207
      
      We must reset iterator's tags to bail out from radix_tree_next_slot()
      and go to the slow-path in radix_tree_next_chunk().
      
      Fixes: 46437f9a ("radix-tree: fix race in gang lookup")
      Link: http://lkml.kernel.org/r/1468495196-10604-1-git-send-email-aryabinin@virtuozzo.comSigned-off-by: default avatarAndrey Ryabinin <aryabinin@virtuozzo.com>
      Reported-by: default avatarDmitry Vyukov <dvyukov@google.com>
      Acked-by: default avatarKonstantin Khlebnikov <koct9i@gmail.com>
      Cc: Matthew Wilcox <willy@linux.intel.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      3cb9185c
    • Johannes Weiner's avatar
      mm: memcontrol: fix cgroup creation failure after many small jobs · 73f576c0
      Johannes Weiner authored
      The memory controller has quite a bit of state that usually outlives the
      cgroup and pins its CSS until said state disappears.  At the same time
      it imposes a 16-bit limit on the CSS ID space to economically store IDs
      in the wild.  Consequently, when we use cgroups to contain frequent but
      small and short-lived jobs that leave behind some page cache, we quickly
      run into the 64k limitations of outstanding CSSs.  Creating a new cgroup
      fails with -ENOSPC while there are only a few, or even no user-visible
      cgroups in existence.
      
      Although pinning CSSs past cgroup removal is common, there are only two
      instances that actually need an ID after a cgroup is deleted: cache
      shadow entries and swapout records.
      
      Cache shadow entries reference the ID weakly and can deal with the CSS
      having disappeared when it's looked up later.  They pose no hurdle.
      
      Swap-out records do need to pin the css to hierarchically attribute
      swapins after the cgroup has been deleted; though the only pages that
      remain swapped out after offlining are tmpfs/shmem pages.  And those
      references are under the user's control, so they are manageable.
      
      This patch introduces a private 16-bit memcg ID and switches swap and
      cache shadow entries over to using that.  This ID can then be recycled
      after offlining when the CSS remains pinned only by objects that don't
      specifically need it.
      
      This script demonstrates the problem by faulting one cache page in a new
      cgroup and deleting it again:
      
        set -e
        mkdir -p pages
        for x in `seq 128000`; do
          [ $((x % 1000)) -eq 0 ] && echo $x
          mkdir /cgroup/foo
          echo $$ >/cgroup/foo/cgroup.procs
          echo trex >pages/$x
          echo $$ >/cgroup/cgroup.procs
          rmdir /cgroup/foo
        done
      
      When run on an unpatched kernel, we eventually run out of possible IDs
      even though there are no visible cgroups:
      
        [root@ham ~]# ./cssidstress.sh
        [...]
        65000
        mkdir: cannot create directory '/cgroup/foo': No space left on device
      
      After this patch, the IDs get released upon cgroup destruction and the
      cache and css objects get released once memory reclaim kicks in.
      
      [hannes@cmpxchg.org: init the IDR]
        Link: http://lkml.kernel.org/r/20160621154601.GA22431@cmpxchg.org
      Fixes: b2052564 ("mm: memcontrol: continue cache reclaim from offlined groups")
      Link: http://lkml.kernel.org/r/20160617162516.GD19084@cmpxchg.orgSigned-off-by: default avatarJohannes Weiner <hannes@cmpxchg.org>
      Reported-by: default avatarJohn Garcia <john.garcia@mesosphere.io>
      Reviewed-by: default avatarVladimir Davydov <vdavydov@virtuozzo.com>
      Acked-by: default avatarTejun Heo <tj@kernel.org>
      Cc: Nikolay Borisov <kernel@kyup.com>
      Cc: <stable@vger.kernel.org>	[3.19+]
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      73f576c0
  4. 22 Jul, 2016 2 commits
    • Arnd Bergmann's avatar
      gpio: tegra: don't auto-enable for COMPILE_TEST · 0bfb85c6
      Arnd Bergmann authored
      I stumbled over a build error with COMPILE_TEST and CONFIG_OF
      disabled:
      
      drivers/gpio/gpio-tegra.c: In function 'tegra_gpio_probe':
      drivers/gpio/gpio-tegra.c:603:9: error: 'struct gpio_chip' has no member named 'of_node'
      
      The problem is that the newly added GPIO_TEGRA Kconfig symbol
      does not have a dependency on CONFIG_OF. However, there is another
      problem here as the driver gets enabled unconditionally whenever
      COMPILE_TEST is set.
      
      This fixes both problems, by making the symbol user-visible
      when COMPILE_TEST is set and default-enabled for ARCH_TEGRA=y.
      
      As a side-effect, it is now possible to compile-test a Tegra
      kernel with GPIO support disabled, which is harmless.
      Signed-off-by: default avatarArnd Bergmann <arnd@arndb.de>
      Fixes: 4dd4dd1d ("gpio: tegra: Allow compile test")
      Signed-off-by: default avatarLinus Walleij <linus.walleij@linaro.org>
      0bfb85c6
    • Ilya Dryomov's avatar
      libceph: apply new_state before new_up_client on incrementals · 930c5328
      Ilya Dryomov authored
      Currently, osd_weight and osd_state fields are updated in the encoding
      order.  This is wrong, because an incremental map may look like e.g.
      
          new_up_client: { osd=6, addr=... } # set osd_state and addr
          new_state: { osd=6, xorstate=EXISTS } # clear osd_state
      
      Suppose osd6's current osd_state is EXISTS (i.e. osd6 is down).  After
      applying new_up_client, osd_state is changed to EXISTS | UP.  Carrying
      on with the new_state update, we flip EXISTS and leave osd6 in a weird
      "!EXISTS but UP" state.  A non-existent OSD is considered down by the
      mapping code
      
      2087    for (i = 0; i < pg->pg_temp.len; i++) {
      2088            if (ceph_osd_is_down(osdmap, pg->pg_temp.osds[i])) {
      2089                    if (ceph_can_shift_osds(pi))
      2090                            continue;
      2091
      2092                    temp->osds[temp->size++] = CRUSH_ITEM_NONE;
      
      and so requests get directed to the second OSD in the set instead of
      the first, resulting in OSD-side errors like:
      
      [WRN] : client.4239 192.168.122.21:0/2444980242 misdirected client.4239.1:2827 pg 2.5df899f2 to osd.4 not [1,4,6] in e680/680
      
      and hung rbds on the client:
      
      [  493.566367] rbd: rbd0: write 400000 at 11cc00000 (0)
      [  493.566805] rbd: rbd0:   result -6 xferred 400000
      [  493.567011] blk_update_request: I/O error, dev rbd0, sector 9330688
      
      The fix is to decouple application from the decoding and:
      - apply new_weight first
      - apply new_state before new_up_client
      - twiddle osd_state flags if marking in
      - clear out some of the state if osd is destroyed
      
      Fixes: http://tracker.ceph.com/issues/14901
      
      Cc: stable@vger.kernel.org # 3.15+: 6dd74e44: libceph: set 'exists' flag for newly up osd
      Cc: stable@vger.kernel.org # 3.15+
      Signed-off-by: default avatarIlya Dryomov <idryomov@gmail.com>
      Reviewed-by: default avatarJosh Durgin <jdurgin@redhat.com>
      930c5328