1. 15 Oct, 2014 17 commits
    • Michal Kubeček's avatar
      net: fix checksum features handling in netif_skb_features() · af3e4e5d
      Michal Kubeček authored
      [ Upstream commit db115037 ]
      
      This is follow-up to
      
        da08143b ("vlan: more careful checksum features handling")
      
      which introduced more careful feature intersection in vlan code,
      taking into account that HW_CSUM should be considered superset
      of IP_CSUM/IPV6_CSUM. The same is needed in netif_skb_features()
      in order to avoid offloading mismatch warning when vlan is
      created on top of a bond consisting of slaves supporting IP/IPv6
      checksumming but not vlan Tx offloading.
      Signed-off-by: default avatarMichal Kubecek <mkubecek@suse.cz>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      af3e4e5d
    • Gerhard Stenzel's avatar
      vxlan: fix incorrect initializer in union vxlan_addr · cd80ab0c
      Gerhard Stenzel authored
      [ Upstream commit a45e92a5 ]
      
      The first initializer in the following
      
              union vxlan_addr ipa = {
                  .sin.sin_addr.s_addr = tip,
                  .sa.sa_family = AF_INET,
              };
      
      is optimised away by the compiler, due to the second initializer,
      therefore initialising .sin.sin_addr.s_addr always to 0.
      This results in netlink messages indicating a L3 miss never contain the
      missed IP address. This was observed with GCC 4.8 and 4.9. I do not know about previous versions.
      The problem affects user space programs relying on an IP address being
      sent as part of a netlink message indicating a L3 miss.
      
      Changing
                  .sa.sa_family = AF_INET,
      to
                  .sin.sin_family = AF_INET,
      fixes the problem.
      Signed-off-by: default avatarGerhard Stenzel <gerhard.stenzel@de.ibm.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      cd80ab0c
    • Jiri Benc's avatar
      openvswitch: fix panic with multiple vlan headers · 18460f5f
      Jiri Benc authored
      [ Upstream commit 2ba5af42 ]
      
      When there are multiple vlan headers present in a received frame, the first
      one is put into vlan_tci and protocol is set to ETH_P_8021Q. Anything in the
      skb beyond the VLAN TPID may be still non-linear, including the inner TCI
      and ethertype. While ovs_flow_extract takes care of IP and IPv6 headers, it
      does nothing with ETH_P_8021Q. Later, if OVS_ACTION_ATTR_POP_VLAN is
      executed, __pop_vlan_tci pulls the next vlan header into vlan_tci.
      
      This leads to two things:
      
      1. Part of the resulting ethernet header is in the non-linear part of the
         skb. When eth_type_trans is called later as the result of
         OVS_ACTION_ATTR_OUTPUT, kernel BUGs in __skb_pull. Also, __pop_vlan_tci
         is in fact accessing random data when it reads past the TPID.
      
      2. network_header points into the ethernet header instead of behind it.
         mac_len is set to a wrong value (10), too.
      Reported-by: default avatarYulong Pei <ypei@redhat.com>
      Signed-off-by: default avatarJiri Benc <jbenc@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      18460f5f
    • Benjamin Block's avatar
      net: ipv6: fib: don't sleep inside atomic lock · 3f719b11
      Benjamin Block authored
      [ Upstream commit 793c3b40 ]
      
      The function fib6_commit_metrics() allocates a piece of memory in mode
      GFP_KERNEL while holding an atomic lock from higher up in the stack, in
      the function __ip6_ins_rt(). This produces the following BUG:
      
      > BUG: sleeping function called from invalid context at mm/slub.c:1250
      > in_atomic(): 1, irqs_disabled(): 0, pid: 2909, name: dhcpcd
      > 2 locks held by dhcpcd/2909:
      >  #0:  (rtnl_mutex){+.+.+.}, at: [<ffffffff81978e67>] rtnl_lock+0x17/0x20
      >  #1:  (&tb->tb6_lock){++--+.}, at: [<ffffffff81a6951a>] ip6_route_add+0x65a/0x800
      > CPU: 1 PID: 2909 Comm: dhcpcd Not tainted 3.17.0-rc1 #1
      > Hardware name: ASUS All Series/Q87T, BIOS 0216 10/16/2013
      >  0000000000000008 ffff8800c8f13858 ffffffff81af135a 0000000000000000
      >  ffff880212202430 ffff8800c8f13878 ffffffff810f8d3a ffff880212202c98
      >  0000000000000010 ffff8800c8f138c8 ffffffff8121ad0e 0000000000000001
      > Call Trace:
      >  [<ffffffff81af135a>] dump_stack+0x4e/0x68
      >  [<ffffffff810f8d3a>] __might_sleep+0x10a/0x120
      >  [<ffffffff8121ad0e>] kmem_cache_alloc_trace+0x4e/0x190
      >  [<ffffffff81a6bcd6>] ? fib6_commit_metrics+0x66/0x110
      >  [<ffffffff81a6bcd6>] fib6_commit_metrics+0x66/0x110
      >  [<ffffffff81a6cbf3>] fib6_add+0x883/0xa80
      >  [<ffffffff81a6951a>] ? ip6_route_add+0x65a/0x800
      >  [<ffffffff81a69535>] ip6_route_add+0x675/0x800
      >  [<ffffffff81a68f2a>] ? ip6_route_add+0x6a/0x800
      >  [<ffffffff81a6990c>] inet6_rtm_newroute+0x5c/0x80
      >  [<ffffffff8197cf01>] rtnetlink_rcv_msg+0x211/0x260
      >  [<ffffffff81978e67>] ? rtnl_lock+0x17/0x20
      >  [<ffffffff81119708>] ? lock_release_holdtime+0x28/0x180
      >  [<ffffffff81978e67>] ? rtnl_lock+0x17/0x20
      >  [<ffffffff8197ccf0>] ? __rtnl_unlock+0x20/0x20
      >  [<ffffffff819a989e>] netlink_rcv_skb+0x6e/0xd0
      >  [<ffffffff81978ee5>] rtnetlink_rcv+0x25/0x40
      >  [<ffffffff819a8e59>] netlink_unicast+0xd9/0x180
      >  [<ffffffff819a9600>] netlink_sendmsg+0x700/0x770
      >  [<ffffffff81103735>] ? local_clock+0x25/0x30
      >  [<ffffffff8194e83c>] sock_sendmsg+0x6c/0x90
      >  [<ffffffff811f98e3>] ? might_fault+0xa3/0xb0
      >  [<ffffffff8195ca6d>] ? verify_iovec+0x7d/0xf0
      >  [<ffffffff8194ec3e>] ___sys_sendmsg+0x37e/0x3b0
      >  [<ffffffff8111ef15>] ? trace_hardirqs_on_caller+0x185/0x220
      >  [<ffffffff81af979e>] ? mutex_unlock+0xe/0x10
      >  [<ffffffff819a55ec>] ? netlink_insert+0xbc/0xe0
      >  [<ffffffff819a65e5>] ? netlink_autobind.isra.30+0x125/0x150
      >  [<ffffffff819a6520>] ? netlink_autobind.isra.30+0x60/0x150
      >  [<ffffffff819a84f9>] ? netlink_bind+0x159/0x230
      >  [<ffffffff811f989a>] ? might_fault+0x5a/0xb0
      >  [<ffffffff8194f25e>] ? SYSC_bind+0x7e/0xd0
      >  [<ffffffff8194f8cd>] __sys_sendmsg+0x4d/0x80
      >  [<ffffffff8194f912>] SyS_sendmsg+0x12/0x20
      >  [<ffffffff81afc692>] system_call_fastpath+0x16/0x1b
      
      Fixing this by replacing the mode GFP_KERNEL with GFP_ATOMIC.
      Signed-off-by: default avatarBenjamin Block <bebl@mageta.org>
      Acked-by: default avatarDavid Rientjes <rientjes@google.com>
      Acked-by: default avatarHannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      3f719b11
    • Yuval Mintz's avatar
      bnx2x: Revert UNDI flushing mechanism · 39f5b1eb
      Yuval Mintz authored
      [ Upstream commit 7c3afd85 ]
      
      Commit 91ebb929 ("bnx2x: Add support for Multi-Function UNDI") [which was
      later supposedly fixed by de682941 ("bnx2x: Fix UNDI driver unload")]
      introduced a bug in which in some [yet-to-be-determined] scenarios the
      alternative flushing mechanism which was to guarantee the Rx buffers are
      empty before resetting them during device probe will fail.
      If this happens, when device will be loaded once more a fatal attention will
      occur; Since this most likely happens in boot from SAN scenarios, the machine
      will fail to load.
      
      Notice this may occur not only in the 'Multi-Function' scenario but in the
      regular scenario as well, i.e., this introduced a regression in the driver's
      ability to perform boot from SAN.
      
      The patch reverts the mechanism and applies the old scheme to multi-function
      devices as well as to single-function devices.
      Signed-off-by: default avatarYuval Mintz <Yuval.Mintz@qlogic.com>
      Signed-off-by: default avatarAriel Elior <Ariel.Elior@qlogic.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      39f5b1eb
    • Eric Dumazet's avatar
      packet: handle too big packets for PACKET_V3 · 29c75569
      Eric Dumazet authored
      [ Upstream commit dc808110 ]
      
      af_packet can currently overwrite kernel memory by out of bound
      accesses, because it assumed a [new] block can always hold one frame.
      
      This is not generally the case, even if most existing tools do it right.
      
      This patch clamps too long frames as API permits, and issue a one time
      error on syslog.
      
      [  394.357639] tpacket_rcv: packet too big, clamped from 5042 to 3966. macoff=82
      
      In this example, packet header tp_snaplen was set to 3966,
      and tp_len was set to 5042 (skb->len)
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Fixes: f6fb8f10 ("af-packet: TPACKET_V3 flexible buffer implementation.")
      Acked-by: default avatarDaniel Borkmann <dborkman@redhat.com>
      Acked-by: default avatarNeil Horman <nhorman@tuxdriver.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      29c75569
    • Erik Hugne's avatar
      tipc: fix message importance range check · 8ef544dc
      Erik Hugne authored
      [ Upstream commit ac32c7f7 ]
      
      Commit 3b4f302d ("tipc: eliminate
      redundant locking") introduced a bug by removing the sanity check
      for message importance, allowing programs to assign any value to
      the msg_user field. This will mess up the packet reception logic
      and may cause random link resets.
      Signed-off-by: default avatarErik Hugne <erik.hugne@ericsson.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      8ef544dc
    • Gwenhael Goavec-Merou's avatar
      net: phy: smsc: move smsc_phy_config_init reset part in a soft_reset function · 4775bfac
      Gwenhael Goavec-Merou authored
      [ Upstream commit 21009686 ]
      
      On the one hand, phy_device.c provides a generic reset function if the phy
      driver does not provide a soft_reset pointer. This generic reset does not take
      into account the state of the phy, with a potential failure if the phy is in
      powerdown mode. On the other hand, smsc driver provides a function with both
      correct reset behaviour and configuration.
      
      This patch moves the reset part into a new smsc_phy_reset function and provides
      the soft_reset pointer to have a correct reset behaviour by default.
      Signed-off-by: default avatarGwenhael Goavec-Merou <gwenhael.goavec-merou@armadeus.com>
      Reviewed-by: default avatarFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      4775bfac
    • Neal Cardwell's avatar
      tcp: fix ssthresh and undo for consecutive short FRTO episodes · 638228c5
      Neal Cardwell authored
      [ Upstream commit 0c9ab092 ]
      
      Fix TCP FRTO logic so that it always notices when snd_una advances,
      indicating that any RTO after that point will be a new and distinct
      loss episode.
      
      Previously there was a very specific sequence that could cause FRTO to
      fail to notice a new loss episode had started:
      
      (1) RTO timer fires, enter FRTO and retransmit packet 1 in write queue
      (2) receiver ACKs packet 1
      (3) FRTO sends 2 more packets
      (4) RTO timer fires again (should start a new loss episode)
      
      The problem was in step (3) above, where tcp_process_loss() returned
      early (in the spot marked "Step 2.b"), so that it never got to the
      logic to clear icsk_retransmits. Thus icsk_retransmits stayed
      non-zero. Thus in step (4) tcp_enter_loss() would see the non-zero
      icsk_retransmits, decide that this RTO is not a new episode, and
      decide not to cut ssthresh and remember the current cwnd and ssthresh
      for undo.
      
      There were two main consequences to the bug that we have
      observed. First, ssthresh was not decreased in step (4). Second, when
      there was a series of such FRTO (1-4) sequences that happened to be
      followed by an FRTO undo, we would restore the cwnd and ssthresh from
      before the entire series started (instead of the cwnd and ssthresh
      from before the most recent RTO). This could result in cwnd and
      ssthresh being restored to values much bigger than the proper values.
      Signed-off-by: default avatarNeal Cardwell <ncardwell@google.com>
      Signed-off-by: default avatarYuchung Cheng <ycheng@google.com>
      Fixes: e33099f9 ("tcp: implement RFC5682 F-RTO")
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      638228c5
    • Neal Cardwell's avatar
      tcp: fix tcp_release_cb() to dispatch via address family for mtu_reduced() · 72dcbec4
      Neal Cardwell authored
      [ Upstream commit 4fab9071 ]
      
      Make sure we use the correct address-family-specific function for
      handling MTU reductions from within tcp_release_cb().
      
      Previously AF_INET6 sockets were incorrectly always using the IPv6
      code path when sometimes they were handling IPv4 traffic and thus had
      an IPv4 dst.
      Signed-off-by: default avatarNeal Cardwell <ncardwell@google.com>
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Diagnosed-by: default avatarWillem de Bruijn <willemb@google.com>
      Fixes: 563d34d0 ("tcp: dont drop MTU reduction indications")
      Reviewed-by: default avatarHannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      72dcbec4
    • Shmulik Ladkani's avatar
      sit: Fix ipip6_tunnel_lookup device matching criteria · e72790ed
      Shmulik Ladkani authored
      [ Upstream commit bc8fc7b8 ]
      
      As of 4fddbf5d ("sit: strictly restrict incoming traffic to tunnel link device"),
      when looking up a tunnel, tunnel's underlying interface (t->parms.link)
      is verified to match incoming traffic's ingress device.
      
      However the comparison was incorrectly based on skb->dev->iflink.
      
      Instead, dev->ifindex should be used, which correctly represents the
      interface from which the IP stack hands the ipip6 packets.
      
      This allows setting up sit tunnels bound to vlan interfaces (otherwise
      incoming ipip6 traffic on the vlan interface was dropped due to
      ipip6_tunnel_lookup match failure).
      Signed-off-by: default avatarShmulik Ladkani <shmulik.ladkani@gmail.com>
      Acked-by: default avatarNicolas Dichtel <nicolas.dichtel@6wind.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      e72790ed
    • Andrey Vagin's avatar
      tcp: don't use timestamp from repaired skb-s to calculate RTT (v2) · 48518112
      Andrey Vagin authored
      [ Upstream commit 9d186cac ]
      
      We don't know right timestamp for repaired skb-s. Wrong RTT estimations
      isn't good, because some congestion modules heavily depends on it.
      
      This patch adds the TCPCB_REPAIRED flag, which is included in
      TCPCB_RETRANS.
      
      Thanks to Eric for the advice how to fix this issue.
      
      This patch fixes the warning:
      [  879.562947] WARNING: CPU: 0 PID: 2825 at net/ipv4/tcp_input.c:3078 tcp_ack+0x11f5/0x1380()
      [  879.567253] CPU: 0 PID: 2825 Comm: socket-tcpbuf-l Not tainted 3.16.0-next-20140811 #1
      [  879.567829] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
      [  879.568177]  0000000000000000 00000000c532680c ffff880039643d00 ffffffff817aa2d2
      [  879.568776]  0000000000000000 ffff880039643d38 ffffffff8109afbd ffff880039d6ba80
      [  879.569386]  ffff88003a449800 000000002983d6bd 0000000000000000 000000002983d6bc
      [  879.569982] Call Trace:
      [  879.570264]  [<ffffffff817aa2d2>] dump_stack+0x4d/0x66
      [  879.570599]  [<ffffffff8109afbd>] warn_slowpath_common+0x7d/0xa0
      [  879.570935]  [<ffffffff8109b0ea>] warn_slowpath_null+0x1a/0x20
      [  879.571292]  [<ffffffff816d0a05>] tcp_ack+0x11f5/0x1380
      [  879.571614]  [<ffffffff816d10bd>] tcp_rcv_established+0x1ed/0x710
      [  879.571958]  [<ffffffff816dc9da>] tcp_v4_do_rcv+0x10a/0x370
      [  879.572315]  [<ffffffff81657459>] release_sock+0x89/0x1d0
      [  879.572642]  [<ffffffff816c81a0>] do_tcp_setsockopt.isra.36+0x120/0x860
      [  879.573000]  [<ffffffff8110a52e>] ? rcu_read_lock_held+0x6e/0x80
      [  879.573352]  [<ffffffff816c8912>] tcp_setsockopt+0x32/0x40
      [  879.573678]  [<ffffffff81654ac4>] sock_common_setsockopt+0x14/0x20
      [  879.574031]  [<ffffffff816537b0>] SyS_setsockopt+0x80/0xf0
      [  879.574393]  [<ffffffff817b40a9>] system_call_fastpath+0x16/0x1b
      [  879.574730] ---[ end trace a17cbc38eb8c5c00 ]---
      
      v2: moving setting of skb->when for repaired skb-s in tcp_write_xmit,
          where it's set for other skb-s.
      
      Fixes: 431a9124 ("tcp: timestamp SYN+DATA messages")
      Fixes: 740b0f18 ("tcp: switch rtt estimations to usec resolution")
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Pavel Emelyanov <xemul@parallels.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Signed-off-by: default avatarAndrey Vagin <avagin@openvz.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      48518112
    • David S. Miller's avatar
      Revert "macvlan: simplify the structure port" · de25adff
      David S. Miller authored
      [ Upstream commit 5e3c516b ]
      
      This reverts commit a188a54d.
      
      It causes crashes
      
      ====================
      [   80.643286] BUG: unable to handle kernel NULL pointer dereference at 0000000000000878
      [   80.670103] IP: [<ffffffff810832e4>] try_to_grab_pending+0x64/0x1f0
      [   80.691289] PGD 22c102067 PUD 235bf0067 PMD 0
      [   80.706611] Oops: 0002 [#1] SMP
      [   80.717836] Modules linked in: macvlan nfsd lockd nfs_acl exportfs auth_rpcgss sunrpc oid_registry ioatdma ixgbe(-) mdio igb dca
      [   80.757935] CPU: 37 PID: 6724 Comm: rmmod Not tainted 3.16.0-net-next-08-12-2014-FCoE+ #1
      [   80.785688] Hardware name: Intel Corporation S2600CO/S2600CO, BIOS SE5C600.86B.02.03.0003.041920141333 04/19/2014
      [   80.820310] task: ffff880235a9eae0 ti: ffff88022e844000 task.ti: ffff88022e844000
      [   80.845770] RIP: 0010:[<ffffffff810832e4>]  [<ffffffff810832e4>] try_to_grab_pending+0x64/0x1f0
      [   80.875326] RSP: 0018:ffff88022e847b28  EFLAGS: 00010046
      [   80.893251] RAX: 0000000000037a6a RBX: 0000000000000878 RCX: 0000000000000000
      [   80.917187] RDX: ffff880235a9eae0 RSI: 0000000000000001 RDI: ffffffff810832db
      [   80.941125] RBP: ffff88022e847b58 R08: 0000000000000000 R09: 0000000000000000
      [   80.965056] R10: 0000000000000001 R11: 0000000000000001 R12: ffff88022e847b70
      [   80.988994] R13: 0000000000000000 R14: ffff88022e847be8 R15: ffffffff81ebe440
      [   81.012929] FS:  00007fab90b07700(0000) GS:ffff88043f7a0000(0000) knlGS:0000000000000000
      [   81.040400] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [   81.059757] CR2: 0000000000000878 CR3: 0000000235a42000 CR4: 00000000001407e0
      [   81.083689] Stack:
      [   81.090739]  ffff880235a9eae0 0000000000000878 ffff88022e847b70 0000000000000000
      [   81.116253]  ffff88022e847be8 ffffffff81ebe440 ffff88022e847b98 ffffffff810847f1
      [   81.141766]  ffff88022e847b78 0000000000000286 ffff880234200000 0000000000000000
      [   81.167282] Call Trace:
      [   81.175768]  [<ffffffff810847f1>] __cancel_work_timer+0x31/0x170
      [   81.195985]  [<ffffffff8108494b>] cancel_work_sync+0xb/0x10
      [   81.214769]  [<ffffffffa015ae68>] macvlan_port_destroy+0x28/0x60 [macvlan]
      [   81.237844]  [<ffffffffa015b930>] macvlan_uninit+0x40/0x50 [macvlan]
      [   81.259209]  [<ffffffff816bf6e2>] rollback_registered_many+0x1a2/0x2c0
      [   81.281140]  [<ffffffff816bf81a>] unregister_netdevice_many+0x1a/0xb0
      [   81.302786]  [<ffffffffa015a4ff>] macvlan_device_event+0x1ef/0x240 [macvlan]
      [   81.326439]  [<ffffffff8108a13d>] notifier_call_chain+0x4d/0x70
      [   81.346366]  [<ffffffff8108a201>] raw_notifier_call_chain+0x11/0x20
      [   81.367439]  [<ffffffff816bf25b>] call_netdevice_notifiers_info+0x3b/0x70
      [   81.390228]  [<ffffffff816bf2a1>] call_netdevice_notifiers+0x11/0x20
      [   81.411587]  [<ffffffff816bf6bd>] rollback_registered_many+0x17d/0x2c0
      [   81.433518]  [<ffffffff816bf925>] unregister_netdevice_queue+0x75/0x110
      [   81.455735]  [<ffffffff816bfb2b>] unregister_netdev+0x1b/0x30
      [   81.475094]  [<ffffffffa0039b50>] ixgbe_remove+0x170/0x1d0 [ixgbe]
      [   81.495886]  [<ffffffff813512a2>] pci_device_remove+0x32/0x60
      [   81.515246]  [<ffffffff814c75c4>] __device_release_driver+0x64/0xd0
      [   81.536321]  [<ffffffff814c76f8>] driver_detach+0xc8/0xd0
      [   81.554530]  [<ffffffff814c656e>] bus_remove_driver+0x4e/0xa0
      [   81.573888]  [<ffffffff814c828b>] driver_unregister+0x2b/0x60
      [   81.593246]  [<ffffffff8135143e>] pci_unregister_driver+0x1e/0xa0
      [   81.613749]  [<ffffffffa005db18>] ixgbe_exit_module+0x1c/0x2e [ixgbe]
      [   81.635401]  [<ffffffff810e738b>] SyS_delete_module+0x15b/0x1e0
      [   81.655334]  [<ffffffff8187a395>] ? sysret_check+0x22/0x5d
      [   81.673833]  [<ffffffff810abd2d>] ? trace_hardirqs_on_caller+0x11d/0x1e0
      [   81.696339]  [<ffffffff8132bfde>] ? trace_hardirqs_on_thunk+0x3a/0x3f
      [   81.717985]  [<ffffffff8187a369>] system_call_fastpath+0x16/0x1b
      [   81.738199] Code: 00 48 83 3d 6e bb da 00 00 48 89 c2 0f 84 67 01 00 00 fa 66 0f 1f 44 00 00 49 89 14 24 e8 b5 4b 02 00 45 84 ed 0f 85 ac 00 00 00 <f0> 0f ba 2b 00 72 1d 31 c0 48 8b 5d d8 4c 8b 65 e0 4c 8b 6d e8
      [   81.807026] RIP  [<ffffffff810832e4>] try_to_grab_pending+0x64/0x1f0
      [   81.828468]  RSP <ffff88022e847b28>
      [   81.840384] CR2: 0000000000000878
      [   81.851731] ---[ end trace 9f6c7232e3464e11 ]---
      ====================
      
      This bug could be triggered by these steps:
      
      modprobe ixgbe ; modprobe macvlan
      ip link add link p96p1 address 00:1B:21:6E:06:00 macvlan0 type macvlan
      ip link add link p96p1 address 00:1B:21:6E:06:01 macvlan1 type macvlan
      ip link add link p96p1 address 00:1B:21:6E:06:02 macvlan2 type macvlan
      ip link add link p96p1 address 00:1B:21:6E:06:03 macvlan3 type macvlan
      rmmod ixgbe
      Reported-by: default avatar"Keller, Jacob E" <jacob.e.keller@intel.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      de25adff
    • Stanislaw Gruszka's avatar
      myri10ge: check for DMA mapping errors · a378b942
      Stanislaw Gruszka authored
      [ Upstream commit 10545937 ]
      
      On IOMMU systems DMA mapping can fail, we need to check for
      that possibility.
      Signed-off-by: default avatarStanislaw Gruszka <sgruszka@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      a378b942
    • Vlad Yasevich's avatar
      net: Always untag vlan-tagged traffic on input. · 84beb1a9
      Vlad Yasevich authored
      [ Upstream commit 0d5501c1 ]
      
      Currently the functionality to untag traffic on input resides
      as part of the vlan module and is build only when VLAN support
      is enabled in the kernel.  When VLAN is disabled, the function
      vlan_untag() turns into a stub and doesn't really untag the
      packets.  This seems to create an interesting interaction
      between VMs supporting checksum offloading and some network drivers.
      
      There are some drivers that do not allow the user to change
      tx-vlan-offload feature of the driver.  These drivers also seem
      to assume that any VLAN-tagged traffic they transmit will
      have the vlan information in the vlan_tci and not in the vlan
      header already in the skb.  When transmitting skbs that already
      have tagged data with partial checksum set, the checksum doesn't
      appear to be updated correctly by the card thus resulting in a
      failure to establish TCP connections.
      
      The following is a packet trace taken on the receiver where a
      sender is a VM with a VLAN configued.  The host VM is running on
      doest not have VLAN support and the outging interface on the
      host is tg3:
      10:12:43.503055 52:54:00:ae:42:3f > 28:d2:44:7d:c2:de, ethertype 802.1Q
      (0x8100), length 78: vlan 100, p 0, ethertype IPv4, (tos 0x0, ttl 64, id 27243,
      offset 0, flags [DF], proto TCP (6), length 60)
          10.0.100.1.58545 > 10.0.100.10.ircu-2: Flags [S], cksum 0xdc39 (incorrect
      -> 0x48d9), seq 1069378582, win 29200, options [mss 1460,sackOK,TS val
      4294837885 ecr 0,nop,wscale 7], length 0
      10:12:44.505556 52:54:00:ae:42:3f > 28:d2:44:7d:c2:de, ethertype 802.1Q
      (0x8100), length 78: vlan 100, p 0, ethertype IPv4, (tos 0x0, ttl 64, id 27244,
      offset 0, flags [DF], proto TCP (6), length 60)
          10.0.100.1.58545 > 10.0.100.10.ircu-2: Flags [S], cksum 0xdc39 (incorrect
      -> 0x44ee), seq 1069378582, win 29200, options [mss 1460,sackOK,TS val
      4294838888 ecr 0,nop,wscale 7], length 0
      
      This connection finally times out.
      
      I've only access to the TG3 hardware in this configuration thus have
      only tested this with TG3 driver.  There are a lot of other drivers
      that do not permit user changes to vlan acceleration features, and
      I don't know if they all suffere from a similar issue.
      
      The patch attempt to fix this another way.  It moves the vlan header
      stipping code out of the vlan module and always builds it into the
      kernel network core.  This way, even if vlan is not supported on
      a virtualizatoin host, the virtual machines running on top of such
      host will still work with VLANs enabled.
      
      CC: Patrick McHardy <kaber@trash.net>
      CC: Nithin Nayak Sujir <nsujir@broadcom.com>
      CC: Michael Chan <mchan@broadcom.com>
      CC: Jiri Pirko <jiri@resnulli.us>
      Signed-off-by: default avatarVladislav Yasevich <vyasevic@redhat.com>
      Acked-by: default avatarJiri Pirko <jiri@resnulli.us>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      84beb1a9
    • Jiri Benc's avatar
      rtnetlink: fix VF info size · 53f8c7d2
      Jiri Benc authored
      [ Upstream commit 945a3676 ]
      
      Commit 1d8faf48 ("net/core: Add VF link state control") added new
      attribute to IFLA_VF_INFO group in rtnl_fill_ifinfo but did not adjust size
      of the allocated memory in if_nlmsg_size/rtnl_vfinfo_size. As the result, we
      may trigger warnings in rtnl_getlink and similar functions when many VF
      links are enabled, as the information does not fit into the allocated skb.
      
      Fixes: 1d8faf48 ("net/core: Add VF link state control")
      Reported-by: default avatarYulong Pei <ypei@redhat.com>
      Signed-off-by: default avatarJiri Benc <jbenc@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      53f8c7d2
    • Daniel Borkmann's avatar
      netlink: reset network header before passing to taps · 47a0ff6c
      Daniel Borkmann authored
      [ Upstream commit 4e48ed88 ]
      
      netlink doesn't set any network header offset thus when the skb is
      being passed to tap devices via dev_queue_xmit_nit(), it emits klog
      false positives due to it being unset like:
      
        ...
        [  124.990397] protocol 0000 is buggy, dev nlmon0
        [  124.990411] protocol 0000 is buggy, dev nlmon0
        ...
      
      So just reset the network header before passing to the device; for
      packet sockets that just means nothing will change - mac and net
      offset hold the same value just as before.
      Reported-by: default avatarMarcel Holtmann <marcel@holtmann.org>
      Signed-off-by: default avatarDaniel Borkmann <dborkman@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      47a0ff6c
  2. 09 Oct, 2014 23 commits
    • Greg Kroah-Hartman's avatar
      Linux 3.16.5 · 13c24cc8
      Greg Kroah-Hartman authored
      13c24cc8
    • Andrew Hunter's avatar
      jiffies: Fix timeval conversion to jiffies · 79d627d4
      Andrew Hunter authored
      commit d78c9300 upstream.
      
      timeval_to_jiffies tried to round a timeval up to an integral number
      of jiffies, but the logic for doing so was incorrect: intervals
      corresponding to exactly N jiffies would become N+1. This manifested
      itself particularly repeatedly stopping/starting an itimer:
      
      setitimer(ITIMER_PROF, &val, NULL);
      setitimer(ITIMER_PROF, NULL, &val);
      
      would add a full tick to val, _even if it was exactly representable in
      terms of jiffies_ (say, the result of a previous rounding.)  Doing
      this repeatedly would cause unbounded growth in val.  So fix the math.
      
      Here's what was wrong with the conversion: we essentially computed
      (eliding seconds)
      
      jiffies = usec  * (NSEC_PER_USEC/TICK_NSEC)
      
      by using scaling arithmetic, which took the best approximation of
      NSEC_PER_USEC/TICK_NSEC with denominator of 2^USEC_JIFFIE_SC =
      x/(2^USEC_JIFFIE_SC), and computed:
      
      jiffies = (usec * x) >> USEC_JIFFIE_SC
      
      and rounded this calculation up in the intermediate form (since we
      can't necessarily exactly represent TICK_NSEC in usec.) But the
      scaling arithmetic is a (very slight) *over*approximation of the true
      value; that is, instead of dividing by (1 usec/ 1 jiffie), we
      effectively divided by (1 usec/1 jiffie)-epsilon (rounding
      down). This would normally be fine, but we want to round timeouts up,
      and we did so by adding 2^USEC_JIFFIE_SC - 1 before the shift; this
      would be fine if our division was exact, but dividing this by the
      slightly smaller factor was equivalent to adding just _over_ 1 to the
      final result (instead of just _under_ 1, as desired.)
      
      In particular, with HZ=1000, we consistently computed that 10000 usec
      was 11 jiffies; the same was true for any exact multiple of
      TICK_NSEC.
      
      We could possibly still round in the intermediate form, adding
      something less than 2^USEC_JIFFIE_SC - 1, but easier still is to
      convert usec->nsec, round in nanoseconds, and then convert using
      time*spec*_to_jiffies.  This adds one constant multiplication, and is
      not observably slower in microbenchmarks on recent x86 hardware.
      
      Tested: the following program:
      
      int main() {
        struct itimerval zero = {{0, 0}, {0, 0}};
        /* Initially set to 10 ms. */
        struct itimerval initial = zero;
        initial.it_interval.tv_usec = 10000;
        setitimer(ITIMER_PROF, &initial, NULL);
        /* Save and restore several times. */
        for (size_t i = 0; i < 10; ++i) {
          struct itimerval prev;
          setitimer(ITIMER_PROF, &zero, &prev);
          /* on old kernels, this goes up by TICK_USEC every iteration */
          printf("previous value: %ld %ld %ld %ld\n",
                 prev.it_interval.tv_sec, prev.it_interval.tv_usec,
                 prev.it_value.tv_sec, prev.it_value.tv_usec);
          setitimer(ITIMER_PROF, &prev, NULL);
        }
          return 0;
      }
      
      
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Paul Turner <pjt@google.com>
      Cc: Richard Cochran <richardcochran@gmail.com>
      Cc: Prarit Bhargava <prarit@redhat.com>
      Reviewed-by: default avatarPaul Turner <pjt@google.com>
      Reported-by: default avatarAaron Jacobs <jacobsa@google.com>
      Signed-off-by: default avatarAndrew Hunter <ahh@google.com>
      [jstultz: Tweaked to apply to 3.17-rc]
      Signed-off-by: default avatarJohn Stultz <john.stultz@linaro.org>
      [bwh: Backported to 3.16: adjust filename]
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      79d627d4
    • Hans Verkuil's avatar
      media: vb2: fix VBI/poll regression · 1c1e2cc7
      Hans Verkuil authored
      commit 58d75f4b upstream.
      
      The recent conversion of saa7134 to vb2 unconvered a poll() bug that
      broke the teletext applications alevt and mtt. These applications
      expect that calling poll() without having called VIDIOC_STREAMON will
      cause poll() to return POLLERR. That did not happen in vb2.
      
      This patch fixes that behavior. It also fixes what should happen when
      poll() is called when STREAMON is called but no buffers have been
      queued. In that case poll() will also return POLLERR, but only for
      capture queues since output queues will always return POLLOUT
      anyway in that situation.
      
      This brings the vb2 behavior in line with the old videobuf behavior.
      Signed-off-by: default avatarHans Verkuil <hans.verkuil@cisco.com>
      Acked-by: default avatarLaurent Pinchart <laurent.pinchart@ideasonboard.com>
      Signed-off-by: default avatarMauro Carvalho Chehab <mchehab@osg.samsung.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      1c1e2cc7
    • Mel Gorman's avatar
      mm: numa: Do not mark PTEs pte_numa when splitting huge pages · 106aad13
      Mel Gorman authored
      commit abc40bd2 upstream.
      
      This patch reverts 1ba6e0b5 ("mm: numa: split_huge_page: transfer the
      NUMA type from the pmd to the pte"). If a huge page is being split due
      a protection change and the tail will be in a PROT_NONE vma then NUMA
      hinting PTEs are temporarily created in the protected VMA.
      
       VM_RW|VM_PROTNONE
      |-----------------|
            ^
            split here
      
      In the specific case above, it should get fixed up by change_pte_range()
      but there is a window of opportunity for weirdness to happen. Similarly,
      if a huge page is shrunk and split during a protection update but before
      pmd_numa is cleared then a pte_numa can be left behind.
      
      Instead of adding complexity trying to deal with the case, this patch
      will not mark PTEs NUMA when splitting a huge page. NUMA hinting faults
      will not be triggered which is marginal in comparison to the complexity
      in dealing with the corner cases during THP split.
      Signed-off-by: default avatarMel Gorman <mgorman@suse.de>
      Acked-by: default avatarRik van Riel <riel@redhat.com>
      Acked-by: default avatarKirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      106aad13
    • Waiman Long's avatar
      mm, thp: move invariant bug check out of loop in __split_huge_page_map · c16f6baf
      Waiman Long authored
      commit f8303c25 upstream.
      
      In __split_huge_page_map(), the check for page_mapcount(page) is
      invariant within the for loop.  Because of the fact that the macro is
      implemented using atomic_read(), the redundant check cannot be optimized
      away by the compiler leading to unnecessary read to the page structure.
      
      This patch moves the invariant bug check out of the loop so that it will
      be done only once.  On a 3.16-rc1 based kernel, the execution time of a
      microbenchmark that broke up 1000 transparent huge pages using munmap()
      had an execution time of 38,245us and 38,548us with and without the
      patch respectively.  The performance gain is about 1%.
      Signed-off-by: default avatarWaiman Long <Waiman.Long@hp.com>
      Acked-by: default avatarKirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Scott J Norton <scott.norton@hp.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      c16f6baf
    • Bruno Prémont's avatar
      vgaarb: Don't default exclusively to first video device with mem+io · ce027dac
      Bruno Prémont authored
      commit 86fd887b upstream.
      
      Commit 20cde694 ("x86, ia64: Move EFI_FB vga_default_device()
      initialization to pci_vga_fixup()") moved boot video device detection from
      efifb to x86 and ia64 pci/fixup.c.
      
      For dual-GPU Apple computers above change represents a regression as code
      in efifb did forcefully override vga_default_device while the merge did not
      (vgaarb happens prior to PCI fixup).
      
      To improve on initial device selection by vgaarb (it cannot know if PCI
      device not behind bridges see/decode legacy VGA I/O or not), move the
      screen_info based check from pci_video_fixup() to vgaarb's init function and
      use it to refine/override decision taken while adding the individual PCI
      VGA devices.  This way PCI fixup has no reason to adjust vga_default_device
      anymore but can depend on its value for flagging shadowed VBIOS.
      
      This has the nice benefit of removing duplicated code but does introduce a
      #if defined() block in vgaarb.  Not all architectures have screen_info and
      would cause compile to fail without it.
      
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=84461Reported-and-Tested-By: default avatarAndreas Noever <andreas.noever@gmail.com>
      Signed-off-by: default avatarBruno Prémont <bonbons@linux-vserver.org>
      Signed-off-by: default avatarBjorn Helgaas <bhelgaas@google.com>
      CC: Matthew Garrett <matthew.garrett@nebula.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      ce027dac
    • Bruno Prémont's avatar
      x86, ia64: Move EFI_FB vga_default_device() initialization to pci_vga_fixup() · 7babfd7f
      Bruno Prémont authored
      commit 20cde694 upstream.
      
      Commit b4aa0163 ("efifb: Implement vga_default_device() (v2)") added
      efifb vga_default_device() so EFI systems that do not load shadow VBIOS or
      setup VGA get proper value for boot_vga PCI sysfs attribute on the
      corresponding PCI device.
      
      Xorg doesn't detect devices when boot_vga=0, e.g., on some EFI systems such
      as MacBookAir2,1.  Xorg detects the GPU and finds the DRI device but then
      bails out with "no devices detected".
      
      Note: When vga_default_device() is set boot_vga PCI sysfs attribute
      reflects its state.  When unset this attribute is 1 whenever
      IORESOURCE_ROM_SHADOW flag is set.
      
      With introduction of sysfb/simplefb/simpledrm efifb is getting obsolete
      while having native drivers for the GPU also makes selecting sysfb/efifb
      optional.
      
      Remove the efifb implementation of vga_default_device() and initialize
      vgaarb's vga_default_device() with the PCI GPU that matches boot
      screen_info in pci_fixup_video().
      
      [bhelgaas: remove unused "dev" in efifb_setup()]
      Fixes: b4aa0163 ("efifb: Implement vga_default_device() (v2)")
      Tested-by: default avatarAnibal Francisco Martinez Cortina <linuxkid.zeuz@gmail.com>
      Signed-off-by: default avatarBruno Prémont <bonbons@linux-vserver.org>
      Signed-off-by: default avatarBjorn Helgaas <bhelgaas@google.com>
      Acked-by: default avatarMatthew Garrett <matthew.garrett@nebula.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      7babfd7f
    • Hans de Goede's avatar
    • Hans de Goede's avatar
      uas: Disable uas on ASM1051 devices · d7d36249
      Hans de Goede authored
      commit a9c54caa upstream.
      
      There are a large numbers of issues with ASM1051 devices in uas mode:
      
      1) They do not support REPORT SUPPORTED OPERATION CODES
      
      2) They use out of spec 8 byte status iu-s when they have no sense data,
         switching to normal 16 byte status iu-s when they do have sense data.
      
      3) They hang / crash when combined with some disks, e.g. a Crucial M500 ssd.
      
      4) They hang / crash when stressed (through e.g. sg_reset --bus) with disks
         with which then normally do work (once 1 & 2 are worked around).
      
      Where as in BOT mode they appear to work fine, so the best way forward with
      these devices is to just blacklist them for uas usage.
      
      Unfortunately this is easier said then done. as older versions of the ASM1053
      (which works fine) use the same usb-id as the ASM1051.
      
      When connected over USB-3 the 2 can be told apart by the number of streams
      they support. So this patch adds some less then pretty code to disable uas for
      the ASM1051. When connected over USB-2, simply disable uas alltogether for
      devices with the shared usb-id.
      
      Cc: stable@vger.kernel.org # 3.16
      Signed-off-by: default avatarHans de Goede <hdegoede@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      d7d36249
    • Hans de Goede's avatar
      uas: Log a warning when we cannot use uas because the hcd lacks streams · d751f881
      Hans de Goede authored
      commit 43508be5 upstream.
      
      So that an user who wants to use uas can see why he is not getting uas.
      
      Also move the check down so that we don't warn if there are other reasons
      why uas cannot work.
      Signed-off-by: default avatarHans de Goede <hdegoede@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      d751f881
    • Hans de Goede's avatar
      uas: Only complain about missing sg if all other checks succeed · d9d4dc60
      Hans de Goede authored
      commit cc4deafc upstream.
      
      Don't complain about controllers without sg support if there are other
      reasons why uas cannot be used anyways.
      Signed-off-by: default avatarHans de Goede <hdegoede@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      d9d4dc60
    • Steven Rostedt (Red Hat)'s avatar
      ring-buffer: Fix infinite spin in reading buffer · 70e9e520
      Steven Rostedt (Red Hat) authored
      commit 24607f11 upstream.
      
      Commit 651e22f2 "ring-buffer: Always reset iterator to reader page"
      fixed one bug but in the process caused another one. The reset is to
      update the header page, but that fix also changed the way the cached
      reads were updated. The cache reads are used to test if an iterator
      needs to be updated or not.
      
      A ring buffer iterator, when created, disables writes to the ring buffer
      but does not stop other readers or consuming reads from happening.
      Although all readers are synchronized via a lock, they are only
      synchronized when in the ring buffer functions. Those functions may
      be called by any number of readers. The iterator continues down when
      its not interrupted by a consuming reader. If a consuming read
      occurs, the iterator starts from the beginning of the buffer.
      
      The way the iterator sees that a consuming read has happened since
      its last read is by checking the reader "cache". The cache holds the
      last counts of the read and the reader page itself.
      
      Commit 651e22f2 changed what was saved by the cache_read when
      the rb_iter_reset() occurred, making the iterator never match the cache.
      Then if the iterator calls rb_iter_reset(), it will go into an
      infinite loop by checking if the cache doesn't match, doing the reset
      and retrying, just to see that the cache still doesn't match! Which
      should never happen as the reset is suppose to set the cache to the
      current value and there's locks that keep a consuming reader from
      having access to the data.
      
      Fixes: 651e22f2 "ring-buffer: Always reset iterator to reader page"
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      70e9e520
    • Josh Triplett's avatar
      init/Kconfig: Fix HAVE_FUTEX_CMPXCHG to not break up the EXPERT menu · f3b920a7
      Josh Triplett authored
      commit 62b4d204 upstream.
      
      commit 03b8c7b6 ("futex: Allow
      architectures to skip futex_atomic_cmpxchg_inatomic() test") added the
      HAVE_FUTEX_CMPXCHG symbol right below FUTEX.  This placed it right in
      the middle of the options for the EXPERT menu.  However,
      HAVE_FUTEX_CMPXCHG does not depend on EXPERT or FUTEX, so Kconfig stops
      placing items in the EXPERT menu, and displays the remaining several
      EXPERT items (starting with EPOLL) directly in the General Setup menu.
      
      Since both users of HAVE_FUTEX_CMPXCHG only select it "if FUTEX", make
      HAVE_FUTEX_CMPXCHG itself depend on FUTEX.  With this change, the
      subsequent items display as part of the EXPERT menu again; the EMBEDDED
      menu now appears as the next top-level item in the General Setup menu,
      which makes General Setup much shorter and more usable.
      Signed-off-by: default avatarJosh Triplett <josh@joshtriplett.org>
      Acked-by: default avatarRandy Dunlap <rdunlap@infradead.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f3b920a7
    • Steve French's avatar
      Fix problem recognizing symlinks · f173e28f
      Steve French authored
      commit 19e81573 upstream.
      
      Changeset eb85d94b introduced a problem where if a cifs open
      fails during query info of a file we
      will still try to close the file (happens with certain types
      of reparse points) even though the file handle is not valid.
      
      In addition for SMB2/SMB3 we were not mapping the return code returned
      by Windows when trying to open a file (like a Windows NFS symlink)
      which is a reparse point.
      Signed-off-by: default avatarSteve French <smfrench@gmail.com>
      Reviewed-by: default avatarPavel Shilovsky <pshilovsky@samba.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f173e28f
    • Chris Wilson's avatar
      drm/i915: Flush the PTEs after updating them before suspend · 332cec01
      Chris Wilson authored
      commit 91e56499 upstream.
      
      As we use WC updates of the PTE, we are responsible for notifying the
      hardware when to flush its TLBs. Do so after we zap all the PTEs before
      suspend (and the BIOS tries to read our GTT).
      
      Fixes a regression from
      
      commit 828c7908
      Author: Ben Widawsky <benjamin.widawsky@intel.com>
      Date:   Wed Oct 16 09:21:30 2013 -0700
      
          drm/i915: Disable GGTT PTEs on GEN6+ suspend
      
      that survived and continue to cause harm even after
      
      commit e568af1c
      Author: Daniel Vetter <daniel.vetter@ffwll.ch>
      Date:   Wed Mar 26 20:08:20 2014 +0100
      
          drm/i915: Undo gtt scratch pte unmapping again
      
      v2: Trivial rebase.
      v3: Fixes requires pointer dances.
      
      Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=82340
      Tested-by: ming.yao@intel.com
      Signed-off-by: default avatarChris Wilson <chris@chris-wilson.co.uk>
      Cc: Takashi Iwai <tiwai@suse.de>
      Cc: Paulo Zanoni <paulo.r.zanoni@intel.com>
      Cc: Todd Previte <tprevite@gmail.com>
      Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
      Reviewed-by: default avatarDaniel Vetter <daniel.vetter@ffwll.ch>
      Signed-off-by: default avatarJani Nikula <jani.nikula@intel.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      332cec01
    • NeilBrown's avatar
      md/raid5: disable 'DISCARD' by default due to safety concerns. · ede6b1e9
      NeilBrown authored
      commit 8e0e99ba upstream.
      
      It has come to my attention (thanks Martin) that 'discard_zeroes_data'
      is only a hint.  Some devices in some cases don't do what it
      says on the label.
      
      The use of DISCARD in RAID5 depends on reads from discarded regions
      being predictably zero.  If a write to a previously discarded region
      performs a read-modify-write cycle it assumes that the parity block
      was consistent with the data blocks.  If all were zero, this would
      be the case.  If some are and some aren't this would not be the case.
      This could lead to data corruption after a device failure when
      data needs to be reconstructed from the parity.
      
      As we cannot trust 'discard_zeroes_data', ignore it by default
      and so disallow DISCARD on all raid4/5/6 arrays.
      
      As many devices are trustworthy, and as there are benefits to using
      DISCARD, add a module parameter to over-ride this caution and cause
      DISCARD to work if discard_zeroes_data is set.
      
      If a site want to enable DISCARD on some arrays but not on others they
      should select DISCARD support at the filesystem level, and set the
      raid456 module parameter.
          raid456.devices_handle_discard_safely=Y
      
      As this is a data-safety issue, I believe this patch is suitable for
      -stable.
      DISCARD support for RAID456 was added in 3.7
      
      Cc: Shaohua Li <shli@kernel.org>
      Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
      Cc: Mike Snitzer <snitzer@redhat.com>
      Cc: Heinz Mauelshagen <heinzm@redhat.com>
      Acked-by: default avatarMartin K. Petersen <martin.petersen@oracle.com>
      Acked-by: default avatarMike Snitzer <snitzer@redhat.com>
      Fixes: 620125f2Signed-off-by: default avatarNeilBrown <neilb@suse.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      ede6b1e9
    • Rafael J. Wysocki's avatar
      cpufreq: pcc-cpufreq: Fix wait_event() under spinlock · 9346dc9c
      Rafael J. Wysocki authored
      commit e65b5ddb upstream.
      
      Fix the following bug introduced by commit 8fec051e (cpufreq:
      Convert existing drivers to use cpufreq_freq_transition_{begin|end})
      that forgot to move the spin_lock() in pcc_cpufreq_target() past
      cpufreq_freq_transition_begin() which calls wait_event():
      
      BUG: sleeping function called from invalid context at drivers/cpufreq/cpufreq.c:370
      in_atomic(): 1, irqs_disabled(): 0, pid: 2636, name: modprobe
      Preemption disabled at:[<ffffffffa04d74d7>] pcc_cpufreq_target+0x27/0x200 [pcc_cpufreq]
      [   51.025044]
      CPU: 57 PID: 2636 Comm: modprobe Tainted: G            E  3.17.0-default #7
      Hardware name: Hewlett-Packard ProLiant DL980 G7, BIOS P66 07/07/2010
       00000000ffffffff ffff88026c46b828 ffffffff81589dbd 0000000000000000
       ffff880037978090 ffff88026c46b848 ffffffff8108e1df ffff880037978090
       0000000000000000 ffff88026c46b878 ffffffff8108e298 ffff88026d73ec00
      Call Trace:
       [<ffffffff81589dbd>] dump_stack+0x4d/0x90
       [<ffffffff8108e1df>] ___might_sleep+0x10f/0x180
       [<ffffffff8108e298>] __might_sleep+0x48/0xd0
       [<ffffffff8145b905>] cpufreq_freq_transition_begin+0x75/0x140 drivers/cpufreq/cpufreq.c:370 wait_event(policy->transition_wait, !policy->transition_ongoing);
       [<ffffffff8108fc99>] ? preempt_count_add+0xb9/0xc0
       [<ffffffffa04d7513>] pcc_cpufreq_target+0x63/0x200 [pcc_cpufreq] drivers/cpufreq/pcc-cpufreq.c:207 spin_lock(&pcc_lock);
       [<ffffffff810e0d0f>] ? update_ts_time_stats+0x7f/0xb0
       [<ffffffff8145be55>] __cpufreq_driver_target+0x85/0x170
       [<ffffffff8145e4c8>] od_check_cpu+0xa8/0xb0
       [<ffffffff8145ef10>] dbs_check_cpu+0x180/0x1d0
       [<ffffffff8145f310>] cpufreq_governor_dbs+0x3b0/0x720
       [<ffffffff8145ebe3>] od_cpufreq_governor_dbs+0x33/0xe0
       [<ffffffff814593d9>] __cpufreq_governor+0xa9/0x210
       [<ffffffff81459fb2>] cpufreq_set_policy+0x1e2/0x2e0
       [<ffffffff8145a6cc>] cpufreq_init_policy+0x8c/0x110
       [<ffffffff8145c9a0>] ? cpufreq_update_policy+0x1b0/0x1b0
       [<ffffffff8108fb99>] ? preempt_count_sub+0xb9/0x100
       [<ffffffff8145c6c6>] __cpufreq_add_dev+0x596/0x6b0
       [<ffffffffa016c608>] ? pcc_cpufreq_probe+0x4b4/0x4b4 [pcc_cpufreq]
       [<ffffffff8145c7ee>] cpufreq_add_dev+0xe/0x10
       [<ffffffff81408e81>] subsys_interface_register+0xc1/0xf0
       [<ffffffff8108fb99>] ? preempt_count_sub+0xb9/0x100
       [<ffffffff8145b3d7>] cpufreq_register_driver+0x117/0x2a0
       [<ffffffffa016c65d>] pcc_cpufreq_init+0x55/0x9f8 [pcc_cpufreq]
       [<ffffffffa016c608>] ? pcc_cpufreq_probe+0x4b4/0x4b4 [pcc_cpufreq]
       [<ffffffff81000298>] do_one_initcall+0xc8/0x1f0
       [<ffffffff811a731d>] ? __vunmap+0x9d/0x100
       [<ffffffff810eb9a0>] do_init_module+0x30/0x1b0
       [<ffffffff810edfa6>] load_module+0x686/0x710
       [<ffffffff810ebb20>] ? do_init_module+0x1b0/0x1b0
       [<ffffffff810ee1db>] SyS_init_module+0x9b/0xc0
       [<ffffffff8158f7a9>] system_call_fastpath+0x16/0x1b
      
      Fixes: 8fec051e (cpufreq: Convert existing drivers to use cpufreq_freq_transition_{begin|end})
      Reported-and-tested-by: default avatarMike Galbraith <umgwanakikbuti@gmail.com>
      Signed-off-by: default avatarRafael J. Wysocki <rafael.j.wysocki@intel.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      9346dc9c
    • Arnd Bergmann's avatar
      cpufreq: integrator: fix integrator_cpufreq_remove return type · 2c36e464
      Arnd Bergmann authored
      commit d62dbf77 upstream.
      
      When building this driver as a module, we get a helpful warning
      about the return type:
      
      drivers/cpufreq/integrator-cpufreq.c:232:2: warning: initialization from incompatible pointer type
        .remove = __exit_p(integrator_cpufreq_remove),
      
      If the remove callback returns void, the caller gets an undefined
      value as it expects an integer to be returned. This fixes the
      problem by passing down the value from cpufreq_unregister_driver.
      Signed-off-by: default avatarArnd Bergmann <arnd@arndb.de>
      Signed-off-by: default avatarRafael J. Wysocki <rafael.j.wysocki@intel.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      2c36e464
    • Aaron Lu's avatar
      ACPI / i915: Update the condition to ignore firmware backlight change request · 8cc97651
      Aaron Lu authored
      commit 77076c7a upstream.
      
      Some of the Thinkpads' firmware will issue a backlight change request
      through i915 operation region unconditionally on AC plug/unplug, the
      backlight level used is arbitrary and thus should be ignored. This is
      handled by commit 0b9f7d93 (ACPI / i915: ignore firmware requests
      for backlight change). Then there is a Dell laptop whose vendor backlight
      interface also makes use of operation region to change backlight level
      and with the above commit, that interface no long works. The condition
      used to ignore the backlight change request from firmware is thus
      changed to: if the vendor backlight interface is not in use and the ACPI
      backlight interface is broken, we ignore the requests; oterwise, we keep
      processing them.
      
      Fixes: 0b9f7d93 (ACPI / i915: ignore firmware requests for backlight change)
      Link: https://lkml.org/lkml/2014/9/23/854Reported-and-tested-by: default avatarPali Rohár <pali.rohar@gmail.com>
      Signed-off-by: default avatarAaron Lu <aaron.lu@intel.com>
      Acked-by: default avatarDaniel Vetter <daniel@ffwll.ch>
      Signed-off-by: default avatarRafael J. Wysocki <rafael.j.wysocki@intel.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      8cc97651
    • Alexandru M Stan's avatar
      i2c: rk3x: fix 0 length write transfers · 57ca8495
      Alexandru M Stan authored
      commit cf27020d upstream.
      
      i2cdetect -q was broken (everything was a false positive, and no transfers were
      actually being sent over i2c). The way it works is by sending a 0 length write
      request and checking for NACK. This patch fixes the 0 length writes and actually
      sends them.
      Reported-by: default avatarDoug Anderson <dianders@chromium.org>
      Signed-off-by: default avatarAlexandru M Stan <amstan@chromium.org>
      Tested-by: default avatarDoug Anderson <dianders@chromium.org>
      Tested-by: default avatarMax Schwarz <max.schwarz@online.de>
      Signed-off-by: default avatarWolfram Sang <wsa@the-dreams.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      57ca8495
    • Andy Gross's avatar
      i2c: qup: Fix order of runtime pm initialization · 7f28469c
      Andy Gross authored
      commit 86b59bbf upstream.
      
      The runtime pm calls need to be done before populating the children via the
      i2c_add_adapter call.  If this is not done, a child can run into issues trying
      to do i2c read/writes due to the pm_runtime_sync failing.
      Signed-off-by: default avatarAndy Gross <agross@codeaurora.org>
      Reviewed-by: default avatarFelipe Balbi <balbi@ti.com>
      Acked-by: default avatarBjorn Andersson <bjorn.andersson@sonymobile.com>
      Signed-off-by: default avatarWolfram Sang <wsa@the-dreams.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      7f28469c
    • Mel Gorman's avatar
      mm: migrate: Close race between migration completion and mprotect · e9203e7b
      Mel Gorman authored
      commit d3cb8bf6 upstream.
      
      A migration entry is marked as write if pte_write was true at the time the
      entry was created. The VMA protections are not double checked when migration
      entries are being removed as mprotect marks write-migration-entries as
      read. It means that potentially we take a spurious fault to mark PTEs write
      again but it's straight-forward. However, there is a race between write
      migrations being marked read and migrations finishing. This potentially
      allows a PTE to be write that should have been read. Close this race by
      double checking the VMA permissions using maybe_mkwrite when migration
      completes.
      
      [torvalds@linux-foundation.org: use maybe_mkwrite]
      Signed-off-by: default avatarMel Gorman <mgorman@suse.de>
      Acked-by: default avatarRik van Riel <riel@redhat.com>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      e9203e7b
    • Johannes Weiner's avatar
      mm: memcontrol: do not iterate uninitialized memcgs · a1130ef0
      Johannes Weiner authored
      commit 2f7dd7a4 upstream.
      
      The cgroup iterators yield css objects that have not yet gone through
      css_online(), but they are not complete memcgs at this point and so the
      memcg iterators should not return them.  Commit d8ad3055 ("mm/memcg:
      iteration skip memcgs not yet fully initialized") set out to implement
      exactly this, but it uses CSS_ONLINE, a cgroup-internal flag that does
      not meet the ordering requirements for memcg, and so the iterator may
      skip over initialized groups, or return partially initialized memcgs.
      
      The cgroup core can not reasonably provide a clear answer on whether the
      object around the css has been fully initialized, as that depends on
      controller-specific locking and lifetime rules.  Thus, introduce a
      memcg-specific flag that is set after the memcg has been initialized in
      css_online(), and read before mem_cgroup_iter() callers access the memcg
      members.
      Signed-off-by: default avatarJohannes Weiner <hannes@cmpxchg.org>
      Cc: Tejun Heo <tj@kernel.org>
      Acked-by: default avatarMichal Hocko <mhocko@suse.cz>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      a1130ef0