1. 15 Nov, 2016 30 commits
    • net sched filters: fix notification of filter delete with proper handle · 5a37dce1
      Jamal Hadi Salim authored
      [ Upstream commit 9ee78374 ]
      
      Daniel says:
      
      While trying out [1][2], I noticed that tc monitor doesn't show the
      correct handle on delete:
      
      $ tc monitor
      qdisc clsact ffff: dev eno1 parent ffff:fff1
      filter dev eno1 ingress protocol all pref 49152 bpf handle 0x2a [...]
      deleted filter dev eno1 ingress protocol all pref 49152 bpf handle 0xf3be0c80
      
      Some context to explain the above:
      The user identity of any tc filter is represented by a 32-bit
      identifier encoded in tcm->tcm_handle. Example 0x2a in the bpf filter
      above. A user wishing to delete, get or even modify a specific filter
      uses this handle to reference it.
      Every classifier is free to provide its own semantics for the 32 bit handle.
      Example: classifiers like u32 use schemes like 800:1:801 to describe
      the semantics of their filters represented as hash table, bucket and
      node ids etc.
      Classifiers also have internal per-filter representation which is different
      from this externally visible identity. Most classifiers set this
      internal representation to be a pointer address (which allows fast retrieval
      of said filters in their implementations). This internal representation
      is referenced with the "fh" variable in the kernel control code.
      
      When a user successfully deletes a specific filter, by specifying the correct
      tcm->tcm_handle, an event is generated to user space which indicates
      which specific filter was deleted.
      
      Before this patch, the "fh" value was sent to user space as the identity.
      For example, the handle shown in the sample bpf filter delete event above
      is 0xf3be0c80. This is in fact a 32-bit truncation of 0xffff8807f3be0c80,
      which happens to be the 64-bit memory address of the internal filter
      representation (the address of the corresponding filter's struct cls_bpf_prog).
      
      After this patch the appropriate user identifiable handle as encoded
      in the originating request tcm->tcm_handle is generated in the event.
      One of the cardinal rules of netlink is to be able to take an
      event (such as a delete in this case) and reflect it back to the
      kernel and successfully delete the filter. This patch achieves that.
      
      Note, this issue has existed since the original TC action
      infrastructure code patch back in 2004 as found in:
      https://git.kernel.org/cgit/linux/kernel/git/history/history.git/commit/
      
      [1] http://patchwork.ozlabs.org/patch/682828/
      [2] http://patchwork.ozlabs.org/patch/682829/
      
      Fixes: 4e54c481 ("[NET]: Add tc extensions infrastructure.")
      Reported-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • net: ipv6: Do not consider link state for nexthop validation · d46b1968
      David Ahern authored
      [ Upstream commit d5d32e4b ]
      
      Similar to IPv4, do not consider link state when validating next hops.
      
      Currently, if the link is down default routes can fail to insert:
       $ ip -6 ro add vrf blue default via 2100:2::64 dev eth2
       RTNETLINK answers: No route to host
      
      With this patch the command succeeds.
      
      Fixes: 8c14586f ("net: ipv6: Use passed in table for nexthop lookups")
      Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • macsec: Fix header length if SCI is added if explicitly disabled · eb77db88
      Tobias Brunner authored
      [ Upstream commit e0f841f5 ]
      
      Even if sending SCIs is explicitly disabled, the code that creates the
      Security Tag might still decide to add it (e.g. if multiple RX SCs are
      defined on the MACsec interface).
      But because the header length so far depended only on the configuration
      option, the SCI overwrote the original frame's contents (the EtherType and,
      e.g., the beginning of the IP header) and, if encrypted, did not visibly
      end up in the packet, while the SC flag in the TCI field of the Security
      Tag was still set, resulting in invalid MACsec frames.
      
      Fixes: c09440f7 ("macsec: introduce IEEE 802.1AE driver")
      Signed-off-by: Tobias Brunner <tobias@strongswan.org>
      Acked-by: Sabrina Dubroca <sd@queasysnail.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • netvsc: fix incorrect receive checksum offloading · 027ab3b8
      Stephen Hemminger authored
      [ Upstream commit e52fed71 ]
      
      The Hyper-V netvsc driver was looking at the incorrect status bits
      in the checksum info. It was setting the receive checksum unnecessary
      flag based on the IP header checksum being correct. The checksum
      flag in the skb is about TCP and UDP checksum status. Because of this
      bug, any packet received with bad TCP checksum would be passed
      up the stack and to the application causing data corruption.
      The problem is reproducible via netcat and netem.
      
      This had a side effect of not doing receive checksum offload
      on IPv6. The driver was also always doing checksum offload
      independent of the checksum setting done via ethtool.
      Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • udp: fix IP_CHECKSUM handling · b75edf27
      Eric Dumazet authored
      [ Upstream commit 10df8e61 ]
      
      The first bug was added in commit ad6f939a ("ip: Add offset parameter to
      ip_cmsg_recv"): Tom missed that IPv4 UDP messages could be received on
      an AF_INET6 socket. ip_cmsg_recv(msg, skb) should have been replaced by
      ip_cmsg_recv_offset(msg, skb, sizeof(struct udphdr));
      
      Then commit e6afc8ac ("udp: remove headers from UDP packets before
      queueing") forgot to adjust the offsets now that UDP headers are pulled
      before skbs are put in the receive queue.
      
      Fixes: ad6f939a ("ip: Add offset parameter to ip_cmsg_recv")
      Fixes: e6afc8ac ("udp: remove headers from UDP packets before queueing")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Sam Kumar <samanthakumar@google.com>
      Cc: Willem de Bruijn <willemb@google.com>
      Tested-by: Willem de Bruijn <willemb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • sctp: fix the panic caused by route update · 5ee35602
      Xin Long authored
      [ Upstream commit ecc515d7 ]
      
      Commit 7303a147 ("sctp: identify chunks that need to be fragmented
      at IP level") made the chunk be fragmented at IP level in the next round
      if its size exceeds the PMTU.
      
      But there is still another case: the PMTU can be updated if the transport's
      dst expires and the transport's pmtu_pending is set in sctp_packet_transmit.
      If the new PMTU is less than the chunk size, the same issue as in that
      commit can be triggered.
      
      So we should drop this packet and let it be retransmitted in another round,
      where it would be fragmented at IP level.
      
      This patch fixes it by checking the chunk size after the PMTU may have been
      updated and dropping this packet if its size exceeds the PMTU.
      
      Fixes: 90017acc ("sctp: Add GSO support")
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Acked-by: Neil Horman <nhorman@txudriver.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • net: sctp, forbid negative length · d90cbfaf
      Jiri Slaby authored
      [ Upstream commit a4b8e71b ]
      
      Most of getsockopt handlers in net/sctp/socket.c check len against
      sizeof some structure like:
              if (len < sizeof(int))
                      return -EINVAL;
      
      At first glance, the check seems to be correct. But since len is int
      and sizeof returns size_t, the int gets converted to the unsigned size_t
      too. So the test returns false for negative lengths. Yes,
      (-1 < sizeof(long)) is false.
      
      Fix this in sctp by explicitly checking len < 0 before any getsockopt
      handler is called.
      
      Note that sctp_getsockopt_events already handled the negative case.
      Since we added the < 0 check elsewhere, this one can be removed.
      
      If not checked, this is the result:
      UBSAN: Undefined behaviour in ../mm/page_alloc.c:2722:19
      shift exponent 52 is too large for 32-bit type 'int'
      CPU: 1 PID: 24535 Comm: syz-executor Not tainted 4.8.1-0-syzkaller #1
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.9.1-0-gb3ef39f-prebuilt.qemu-project.org 04/01/2014
       0000000000000000 ffff88006d99f2a8 ffffffffb2f7bdea 0000000041b58ab3
       ffffffffb4363c14 ffffffffb2f7bcde ffff88006d99f2d0 ffff88006d99f270
       0000000000000000 0000000000000000 0000000000000034 ffffffffb5096422
      Call Trace:
       [<ffffffffb3051498>] ? __ubsan_handle_shift_out_of_bounds+0x29c/0x300
      ...
       [<ffffffffb273f0e4>] ? kmalloc_order+0x24/0x90
       [<ffffffffb27416a4>] ? kmalloc_order_trace+0x24/0x220
       [<ffffffffb2819a30>] ? __kmalloc+0x330/0x540
       [<ffffffffc18c25f4>] ? sctp_getsockopt_local_addrs+0x174/0xca0 [sctp]
       [<ffffffffc18d2bcd>] ? sctp_getsockopt+0x10d/0x1b0 [sctp]
       [<ffffffffb37c1219>] ? sock_common_getsockopt+0xb9/0x150
       [<ffffffffb37be2f5>] ? SyS_getsockopt+0x1a5/0x270
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
      Cc: Vlad Yasevich <vyasevich@gmail.com>
      Cc: Neil Horman <nhorman@tuxdriver.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: linux-sctp@vger.kernel.org
      Cc: netdev@vger.kernel.org
      Acked-by: Neil Horman <nhorman@tuxdriver.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • net: fec: Call swap_buffer() prior to IP header alignment · 64774617
      Fabio Estevam authored
      [ Upstream commit 235bde1e ]
      
      Commit 3ac72b7b ("net: fec: align IP header in hardware") breaks
      networking on mx28.
      
      There is an erratum on mx28 (ENGR121613 - ENET big endian mode
      not compatible with ARM little endian) that requires an additional
      byte-swap operation to work around this problem.
      
      So call swap_buffer() prior to performing the IP header alignment
      to restore network functionality on mx28.
      
      Fixes: 3ac72b7b ("net: fec: align IP header in hardware")
      Reported-and-tested-by: Henri Roosen <henri.roosen@ginzinger.com>
      Signed-off-by: Fabio Estevam <fabio.estevam@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • ipv4: use the right lock for ping_group_range · c6c82c2b
      WANG Cong authored
      [ Upstream commit 396a30cc ]
      
      This reverts commit a681574c
      ("ipv4: disable BH in set_ping_group_range()") because we never
      read ping_group_range in BH context (unlike local_port_range).
      
      Then, since we already have a lock for ping_group_range, the places
      using ip_local_ports.lock for ping_group_range are clearly typos.
      
      We might consider sharing the same lock for both ping_group_range
      and local_port_range to save space, but that should be for
      net-next.
      
      Fixes: a681574c ("ipv4: disable BH in set_ping_group_range()")
      Fixes: ba6b918a ("ping: move ping_group_range out of CONFIG_SYSCTL")
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Eric Salo <salo@google.com>
      Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • ipv4: disable BH in set_ping_group_range() · 8418193f
      Eric Dumazet authored
      [ Upstream commit a681574c ]
      
      In commit 4ee3bd4a ("ipv4: disable BH when changing ip local port
      range") Cong added BH protection in set_local_port_range() but missed
      that the same fix was needed in set_ping_group_range().
      
      Fixes: b8f1a556 ("udp: Add function to make source port for UDP tunnels")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: Eric Salo <salo@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • net: add recursion limit to GRO · 23c110c4
      Sabrina Dubroca authored
      [ Upstream commit fcd91dd4 ]
      
      Currently, GRO can do unlimited recursion through the gro_receive
      handlers.  This was fixed for tunneling protocols by limiting tunnel GRO
      to one level with encap_mark, but both VLAN and TEB still have this
      problem.  Thus, the kernel is vulnerable to a stack overflow, if we
      receive a packet composed entirely of VLAN headers.
      
      This patch adds a recursion counter to the GRO layer to prevent stack
      overflow.  When a gro_receive function hits the recursion limit, GRO is
      aborted for this skb and it is processed normally.  This recursion
      counter is put in the GRO CB, but could be turned into a percpu counter
      if we run out of space in the CB.
      
      Thanks to Vladimír Beneš <vbenes@redhat.com> for the initial bug report.
      
      Fixes: CVE-2016-7039
      Fixes: 9b174d88 ("net: Add Transparent Ethernet Bridging GRO support.")
      Fixes: 66e5133f ("vlan: Add GRO support for non hardware accelerated vlan")
      Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
      Reviewed-by: Jiri Benc <jbenc@redhat.com>
      Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Acked-by: Tom Herbert <tom@herbertland.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • net: core: Correctly iterate over lower adjacency list · d3bbd04b
      Ido Schimmel authored
      [ Upstream commit e4961b07 ]
      
      Tamir reported the following trace when processing ARP requests received
      via a vlan device on top of a VLAN-aware bridge:
      
       NMI watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [swapper/1:0]
      [...]
       CPU: 1 PID: 0 Comm: swapper/1 Tainted: G        W       4.8.0-rc7 #1
       Hardware name: Mellanox Technologies Ltd. "MSN2100-CB2F"/"SA001017", BIOS 5.6.5 06/07/2016
       task: ffff88017edfea40 task.stack: ffff88017ee10000
       RIP: 0010:[<ffffffff815dcc73>]  [<ffffffff815dcc73>] netdev_all_lower_get_next_rcu+0x33/0x60
      [...]
       Call Trace:
        <IRQ>
        [<ffffffffa015de0a>] mlxsw_sp_port_lower_dev_hold+0x5a/0xa0 [mlxsw_spectrum]
        [<ffffffffa016f1b0>] mlxsw_sp_router_netevent_event+0x80/0x150 [mlxsw_spectrum]
        [<ffffffff810ad07a>] notifier_call_chain+0x4a/0x70
        [<ffffffff810ad13a>] atomic_notifier_call_chain+0x1a/0x20
        [<ffffffff815ee77b>] call_netevent_notifiers+0x1b/0x20
        [<ffffffff815f2eb6>] neigh_update+0x306/0x740
        [<ffffffff815f38ce>] neigh_event_ns+0x4e/0xb0
        [<ffffffff8165ea3f>] arp_process+0x66f/0x700
        [<ffffffff8170214c>] ? common_interrupt+0x8c/0x8c
        [<ffffffff8165ec29>] arp_rcv+0x139/0x1d0
        [<ffffffff816e505a>] ? vlan_do_receive+0xda/0x320
        [<ffffffff815e3794>] __netif_receive_skb_core+0x524/0xab0
        [<ffffffff815e6830>] ? dev_queue_xmit+0x10/0x20
        [<ffffffffa06d612d>] ? br_forward_finish+0x3d/0xc0 [bridge]
        [<ffffffffa06e5796>] ? br_handle_vlan+0xf6/0x1b0 [bridge]
        [<ffffffff815e3d38>] __netif_receive_skb+0x18/0x60
        [<ffffffff815e3dc0>] netif_receive_skb_internal+0x40/0xb0
        [<ffffffff815e3e4c>] netif_receive_skb+0x1c/0x70
        [<ffffffffa06d7856>] br_pass_frame_up+0xc6/0x160 [bridge]
        [<ffffffffa06d63d7>] ? deliver_clone+0x37/0x50 [bridge]
        [<ffffffffa06d656c>] ? br_flood+0xcc/0x160 [bridge]
        [<ffffffffa06d7b14>] br_handle_frame_finish+0x224/0x4f0 [bridge]
        [<ffffffffa06d7f94>] br_handle_frame+0x174/0x300 [bridge]
        [<ffffffff815e3599>] __netif_receive_skb_core+0x329/0xab0
        [<ffffffff81374815>] ? find_next_bit+0x15/0x20
        [<ffffffff8135e802>] ? cpumask_next_and+0x32/0x50
        [<ffffffff810c9968>] ? load_balance+0x178/0x9b0
        [<ffffffff815e3d38>] __netif_receive_skb+0x18/0x60
        [<ffffffff815e3dc0>] netif_receive_skb_internal+0x40/0xb0
        [<ffffffff815e3e4c>] netif_receive_skb+0x1c/0x70
        [<ffffffffa01544a1>] mlxsw_sp_rx_listener_func+0x61/0xb0 [mlxsw_spectrum]
        [<ffffffffa005c9f7>] mlxsw_core_skb_receive+0x187/0x200 [mlxsw_core]
        [<ffffffffa007332a>] mlxsw_pci_cq_tasklet+0x63a/0x9b0 [mlxsw_pci]
        [<ffffffff81091986>] tasklet_action+0xf6/0x110
        [<ffffffff81704556>] __do_softirq+0xf6/0x280
        [<ffffffff8109213f>] irq_exit+0xdf/0xf0
        [<ffffffff817042b4>] do_IRQ+0x54/0xd0
        [<ffffffff8170214c>] common_interrupt+0x8c/0x8c
      
      The problem is that netdev_all_lower_get_next_rcu() never advances the
      iterator, thereby causing the loop over the lower adjacency list to run
      forever.
      
      Fix this by advancing the iterator, thereby avoiding the infinite loop.
      
      Fixes: 7ce856aa ("mlxsw: spectrum: Add couple of lower device helper functions")
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Reported-by: Tamir Winetroub <tamirw@mellanox.com>
      Reviewed-by: Jiri Pirko <jiri@mellanox.com>
      Acked-by: David Ahern <dsa@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • rtnetlink: Add rtnexthop offload flag to compare mask · fc5722f8
      Jiri Pirko authored
      [ Upstream commit 85dda4e5 ]
      
      The offload flag is a status flag and should not be used by
      FIB semantics for comparison.
      
      Fixes: 37ed9493 ("rtnetlink: add RTNH_F_EXTERNAL flag for fib offload")
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Reviewed-by: Andy Gospodarek <andy@greyhouse.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • switchdev: Execute bridge ndos only for bridge ports · 4ac3ca8c
      Ido Schimmel authored
      [ Upstream commit 97c24290 ]
      
      We recently got the following warning after setting up a vlan device on
      top of an offloaded bridge and executing 'bridge link':
      
      WARNING: CPU: 0 PID: 18566 at drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c:81 mlxsw_sp_port_orig_get.part.9+0x55/0x70 [mlxsw_spectrum]
      [...]
       CPU: 0 PID: 18566 Comm: bridge Not tainted 4.8.0-rc7 #1
       Hardware name: Mellanox Technologies Ltd. Mellanox switch/Mellanox switch, BIOS 4.6.5 05/21/2015
        0000000000000286 00000000e64ab94f ffff880406e6f8f0 ffffffff8135eaa3
        0000000000000000 0000000000000000 ffff880406e6f930 ffffffff8108c43b
        0000005106e6f988 ffff8803df398840 ffff880403c60108 ffff880406e6f990
       Call Trace:
        [<ffffffff8135eaa3>] dump_stack+0x63/0x90
        [<ffffffff8108c43b>] __warn+0xcb/0xf0
        [<ffffffff8108c56d>] warn_slowpath_null+0x1d/0x20
        [<ffffffffa01420d5>] mlxsw_sp_port_orig_get.part.9+0x55/0x70 [mlxsw_spectrum]
        [<ffffffffa0142195>] mlxsw_sp_port_attr_get+0xa5/0xb0 [mlxsw_spectrum]
        [<ffffffff816f151f>] switchdev_port_attr_get+0x4f/0x140
        [<ffffffff816f15d0>] switchdev_port_attr_get+0x100/0x140
        [<ffffffff816f15d0>] switchdev_port_attr_get+0x100/0x140
        [<ffffffff816f1d6b>] switchdev_port_bridge_getlink+0x5b/0xc0
        [<ffffffff816f2680>] ? switchdev_port_fdb_dump+0x90/0x90
        [<ffffffff815f5427>] rtnl_bridge_getlink+0xe7/0x190
        [<ffffffff8161a1b2>] netlink_dump+0x122/0x290
        [<ffffffff8161b0df>] __netlink_dump_start+0x15f/0x190
        [<ffffffff815f5340>] ? rtnl_bridge_dellink+0x230/0x230
        [<ffffffff815fab46>] rtnetlink_rcv_msg+0x1a6/0x220
        [<ffffffff81208118>] ? __kmalloc_node_track_caller+0x208/0x2c0
        [<ffffffff815f5340>] ? rtnl_bridge_dellink+0x230/0x230
        [<ffffffff815fa9a0>] ? rtnl_newlink+0x890/0x890
        [<ffffffff8161cf54>] netlink_rcv_skb+0xa4/0xc0
        [<ffffffff815f56f8>] rtnetlink_rcv+0x28/0x30
        [<ffffffff8161c92c>] netlink_unicast+0x18c/0x240
        [<ffffffff8161ccdb>] netlink_sendmsg+0x2fb/0x3a0
        [<ffffffff815c5a48>] sock_sendmsg+0x38/0x50
        [<ffffffff815c6031>] SYSC_sendto+0x101/0x190
        [<ffffffff815c7111>] ? __sys_recvmsg+0x51/0x90
        [<ffffffff815c6b6e>] SyS_sendto+0xe/0x10
        [<ffffffff817017f2>] entry_SYSCALL_64_fastpath+0x1a/0xa4
      
      The problem is that the 8021q module propagates the call to
      ndo_bridge_getlink() via switchdev ops, but the switch driver doesn't
      recognize the netdev, as it's not offloaded.
      
      While we can ignore calls being made to non-bridge ports inside the
      driver, a better fix would be to push this check up to the switchdev
      layer.
      
      Note that these ndos can be called for non-bridged netdev, but this only
      happens in certain PF drivers which don't call the corresponding
      switchdev functions anyway.
      
      Fixes: 99f44bb3 ("mlxsw: spectrum: Enable L3 interfaces on top of bridge devices")
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Reported-by: Tamir Winetroub <tamirw@mellanox.com>
      Tested-by: Tamir Winetroub <tamirw@mellanox.com>
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • bridge: multicast: restore perm router ports on multicast enable · 63d82a2c
      Nikolay Aleksandrov authored
      [ Upstream commit 7cb3f921 ]
      
      Satish reported a problem with the perm multicast router ports not getting
      re-enabled after a certain series of events. In particular, if multicast
      snooping has been disabled and the port goes to the disabled state, it is
      deleted from the router port list. If it then moves into a non-disabled
      state, it is not re-added because mcast snooping is still disabled, and
      enabling snooping later does nothing.
      
      Here are the steps to reproduce. Set up br0 with snooping enabled and eth1
      added as a perm router (multicast_router = 2):
      1. $ echo 0 > /sys/class/net/br0/bridge/multicast_snooping
      2. $ ip l set eth1 down
      ^ This step deletes the interface from the router list
      3. $ ip l set eth1 up
      ^ This step does not add it again because mcast snooping is disabled
      4. $ echo 1 > /sys/class/net/br0/bridge/multicast_snooping
      5. $ bridge -d -s mdb show
      <empty>
      
      At this point we have mcast enabled and eth1 as a perm router (value = 2),
      but it is not in the router list, which is incorrect.
      
      After this change:
      1. $ echo 0 > /sys/class/net/br0/bridge/multicast_snooping
      2. $ ip l set eth1 down
      ^ This step deletes the interface from the router list
      3. $ ip l set eth1 up
      ^ This step does not add it again because mcast snooping is disabled
      4. $ echo 1 > /sys/class/net/br0/bridge/multicast_snooping
      5. $ bridge -d -s mdb show
      router ports on br0: eth1
      
      Note: we can directly do br_multicast_enable_port for all because the
      querier timer already has checks for the port state and will simply
      expire if it's in blocking/disabled. See the comment added by
      commit 9aa66382 ("bridge: multicast: add a comment to
      br_port_state_selection about blocking state")
      
      Fixes: 561f1103 ("bridge: Add multicast_snooping sysfs toggle")
      Reported-by: Satish Ashok <sashok@cumulusnetworks.com>
      Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • net: pktgen: remove rcu locking in pktgen_change_name() · e9a5921c
      Eric Dumazet authored
      [ Upstream commit 9a0b1e8b ]
      
      After Jesper's commit back in linux-3.18, we trigger a lockdep
      splat in proc_create_data() while allocating memory from
      pktgen_change_name().
      
      This patch converts t->if_lock to a mutex, since it is now only
      used from the control path, and adds proper locking to pktgen_change_name():
      
      1) pktgen_thread_lock to protect the outer loop (iterating threads)
      2) t->if_lock to protect the inner loop (iterating devices)
      
      Note that before Jesper's patch, pktgen_change_name() was lacking proper
      protection, but lockdep was not able to detect the problem.
      
      Fixes: 8788370a ("pktgen: RCU-ify "if_list" to remove lock in next_to_run()")
      Reported-by: John Sperbeck <jsperbeck@google.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • net/mlx4_en: fixup xdp tx irq to match rx · 2eeb5735
      Brenden Blanco authored
      [ Upstream commit 958b3d39 ]
      
      In cases where the number of tx rings is not a multiple of the number of
      rx rings, the tx completion event will be handled on a different core
      from the transmit and population of the ring. Races on the ring will
      lead to a double-free of the page, and possibly other corruption.
      
      The rings are initialized by default with a valid multiple of rings,
      based on the number of cpus, therefore an invalid configuration requires
      ethtool to change the ring layout. For instance 'ethtool -L eth0 rx 9 tx
      8' will cause packets received on rx0, and XDP_TX'd to tx48, to be
      completed on cpu3 (48 % 9 == 3).
      
      Resolve this discrepancy by shifting the irq for the xdp tx queues to
      start again from 0, modulo rx_ring_num.
      
      Fixes: 9ecc2d86 ("net/mlx4_en: add xdp forwarding and data write support")
      Reported-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: Brenden Blanco <bblanco@plumgrid.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • IB/ipoib: move back IB LL address into the hard header · 27bb6e31
      Paolo Abeni authored
      [ Upstream commit fc791b63 ]
      
      After the commit 9207f9d4 ("net: preserve IP control block
      during GSO segmentation"), the GSO CB and the IPoIB CB conflict.
      That destroys the IPoIB address information cached there,
      causing a severe performance regression, as better described here:
      
      http://marc.info/?l=linux-kernel&m=146787279825501&w=2
      
      This change moves the data cached by the IPoIB driver from the
      skb control block into the IPoIB hard header, as was done before
      the commit 936d7de3 ("IPoIB: Stop lying about hard_header_len
      and use skb->cb to stash LL addresses").
      In order to avoid GRO issues, on packet reception the IPoIB driver
      stashes a dummy pseudo header into the skb, so that the received
      packets actually have a hard header matching the declared length.
      To avoid changing the connected mode maximum MTU, the allocated
      head buffer size is increased by the pseudo header length.
      
      After this commit, IPoIB performance is back to its pre-regression
      value.
      
      v2 -> v3: rebased
      v1 -> v2: avoid changing the max mtu, increasing the head buf size
      
      Fixes: 9207f9d4 ("net: preserve IP control block during GSO segmentation")
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • ipv6: correctly add local routes when lo goes up · f280126e
      Nicolas Dichtel authored
      [ Upstream commit a220445f ]
      
      The goal of the patch is to fix this scenario:
       ip link add dummy1 type dummy
       ip link set dummy1 up
       ip link set lo down ; ip link set lo up
      
      After that sequence, the local route to the link layer address of dummy1 is
      not there anymore.
      
      When the loopback is set down, all local routes are deleted by
      addrconf_ifdown()/rt6_ifdown(). At this time, the rt6_info entry still
      exists, because the corresponding idev has a reference on it. After the rcu
      grace period, dst_rcu_free() is called, and thus ___dst_free(), which will
      set obsolete to DST_OBSOLETE_DEAD.
      
      In this case, init_loopback() is called before dst_rcu_free(), thus
      obsolete is still set to something <= 0. So, the function doesn't add the
      route again. To avoid that race, let's check the rt6 refcnt instead.
      
      Fixes: 25fb6ca4 ("net IPv6 : Fix broken IPv6 routing table after loopback down-up")
      Fixes: a881ae1f ("ipv6: don't call addrconf_dst_alloc again when enable lo")
      Fixes: 33d99113 ("ipv6: reallocate addrconf router for ipv6 address when lo device up")
      Reported-by: Francesco Santoro <francesco.santoro@6wind.com>
      Reported-by: Samuel Gauthier <samuel.gauthier@6wind.com>
      CC: Balakumaran Kannan <Balakumaran.Kannan@ap.sony.com>
      CC: Maruthi Thotad <Maruthi.Thotad@ap.sony.com>
      CC: Sabrina Dubroca <sd@queasysnail.net>
      CC: Hannes Frederic Sowa <hannes@stressinduktion.org>
      CC: Weilong Chen <chenweilong@huawei.com>
      CC: Gao feng <gaofeng@cn.fujitsu.com>
      Signed-off-by: default avatarNicolas Dichtel <nicolas.dichtel@6wind.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f280126e
    • Vadim Fedorenko's avatar
      ip6_tunnel: fix ip6_tnl_lookup · 0f3e7762
      Vadim Fedorenko authored
      [ Upstream commit 68d00f33 ]
      
      Commit ea3dc960 ("ip6_tunnel: Add support for wildcard tunnel
      endpoints.") introduced support for wildcards in tunnel endpoints,
      but in some rare circumstances ip6_tnl_lookup selects the wrong tunnel
      interface, relying only on the source or destination address of the
      packet and not checking for a wildcard in the tunnel endpoints. Later,
      in ip6_tnl_rcv, these packets can be discarded because of a difference
      in ipproto, even if the fallback device has the proper ipproto
      configuration.
      
      This patch adds checks of the wildcard endpoint in the tunnel, avoiding
      such behavior.
      
      Fixes: ea3dc960 ("ip6_tunnel: Add support for wildcard tunnel endpoints.")
      Signed-off-by: Vadim Fedorenko <junk@yandex-team.ru>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      0f3e7762
    • Andrew Lunn's avatar
      net: phy: Trigger state machine on state change and not polling. · a148a818
      Andrew Lunn authored
      [ Upstream commit 3c293f4e ]
      
      The phy_start() is used to indicate the PHY is now ready to do its
      work. The state is changed, normally to PHY_UP which means that both
      the MAC and the PHY are ready.
      
      If the phy driver is using polling, when the next poll happens, the
      state machine notices the PHY is now in PHY_UP, and kicks off
      auto-negotiation, if needed.
      
      If however, the PHY is using interrupts, there is no polling. The phy
      is stuck in PHY_UP until the next interrupt comes along. And there is
      no reason for the PHY to interrupt.
      
      Have phy_start() schedule the state machine to run, which both speeds
      up the polling use case, and makes the interrupt use case actually
      work.
      
      This problem exists whenever there is a state change which will not
      cause an interrupt. Trigger the state machine in these cases,
      e.g. phy_error().
      Signed-off-by: Andrew Lunn <andrew@lunn.ch>
      Cc: Kyle Roeschley <kyle.roeschley@ni.com>
      Tested-by: Kyle Roeschley <kyle.roeschley@ni.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      a148a818
    • Eric Dumazet's avatar
      ipv6: tcp: restore IP6CB for pktoptions skbs · 2a909989
      Eric Dumazet authored
      [ Upstream commit 8ce48623 ]
      
      Baozeng Ding reported the following KASAN splat:
      
      BUG: KASAN: use-after-free in ip6_datagram_recv_specific_ctl+0x13f1/0x15c0 at addr ffff880029c84ec8
      Read of size 1 by task poc/25548
      Call Trace:
       [<ffffffff82cf43c9>] dump_stack+0x12e/0x185 /lib/dump_stack.c:15
       [<     inline     >] print_address_description /mm/kasan/report.c:204
       [<ffffffff817ced3b>] kasan_report_error+0x48b/0x4b0 /mm/kasan/report.c:283
       [<     inline     >] kasan_report /mm/kasan/report.c:303
       [<ffffffff817ced9e>] __asan_report_load1_noabort+0x3e/0x40 /mm/kasan/report.c:321
       [<ffffffff85c71da1>] ip6_datagram_recv_specific_ctl+0x13f1/0x15c0 /net/ipv6/datagram.c:687
       [<ffffffff85c734c3>] ip6_datagram_recv_ctl+0x33/0x40
       [<ffffffff85c0b07c>] do_ipv6_getsockopt.isra.4+0xaec/0x2150
       [<ffffffff85c0c7f6>] ipv6_getsockopt+0x116/0x230
       [<ffffffff859b5a12>] tcp_getsockopt+0x82/0xd0 /net/ipv4/tcp.c:3035
       [<ffffffff855fb385>] sock_common_getsockopt+0x95/0xd0 /net/core/sock.c:2647
       [<     inline     >] SYSC_getsockopt /net/socket.c:1776
       [<ffffffff855f8ba2>] SyS_getsockopt+0x142/0x230 /net/socket.c:1758
       [<ffffffff8685cdc5>] entry_SYSCALL_64_fastpath+0x23/0xc6
      Memory state around the buggy address:
       ffff880029c84d80: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
       ffff880029c84e00: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
      > ffff880029c84e80: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
                                                    ^
       ffff880029c84f00: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
       ffff880029c84f80: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
      
      He also provided a syzkaller reproducer.
      
      The issue is that ip6_datagram_recv_specific_ctl() expects to find IP6CB
      data that was moved to a different place in tcp_v6_rcv().
      
      This patch moves tcp_v6_restore_cb() up and calls it from
      tcp_v6_do_rcv() when np->pktoptions is set.
      
      Fixes: 971f10ec ("tcp: better TCP_SKB_CB layout to reduce cache line misses")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: Baozeng Ding <sploving1@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      2a909989
    • WANG Cong's avatar
      net_sched: reorder pernet ops and act ops registrations · 50b43ad1
      WANG Cong authored
      [ Upstream commit ab102b80 ]
      
      Krister reported a kernel NULL pointer dereference after
      tcf_action_init_1() invokes a_o->init(). It is a race condition
      in which one thread calls tcf_register_action() and initializes
      the netns data only after putting the act ops in the global list,
      while another thread searches the list and then calls
      a_o->init(net, ...).
      
      Fix this by moving the pernet ops registration before making
      the action ops visible. This is fine because: a) we don't
      rely on act_base in pernet ops->init(), b) in the worst case we
      have a fully initialized netns but ops is still not ready so
      new actions still can't be created.
      Reported-by: Krister Johansen <kjlx@templeofstupid.com>
      Tested-by: Krister Johansen <kjlx@templeofstupid.com>
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
      Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      50b43ad1
    • Vlad Tsyrklevich's avatar
      drivers/ptp: Fix kernel memory disclosure · dac04913
      Vlad Tsyrklevich authored
      [ Upstream commit 02a9079c ]
      
      The reserved field precise_offset->rsv is not cleared before being
      copied to user space, leaking kernel stack memory. Clear the struct
      before it's copied.
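
The pattern described above can be sketched in user space as follows. This is only an illustration, not the driver code: the struct `offset_reply` and the helper `fill_reply()` are hypothetical names, and the point is simply that zeroing the whole struct first leaves no stale bytes in the reserved field.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical reply struct with a reserved tail, loosely modelled on
 * the "rsv" field mentioned in the commit message. */
struct offset_reply {
    int64_t  device_ns;
    int64_t  system_ns;
    uint32_t rsv[4];    /* reserved: must not carry stale data */
};

/* Clear the whole struct before filling in the meaningful fields, so
 * that reserved members are zero when the struct is later copied out. */
void fill_reply(struct offset_reply *r, int64_t device_ns, int64_t system_ns)
{
    assert(r != NULL);
    memset(r, 0, sizeof(*r));
    r->device_ns = device_ns;
    r->system_ns = system_ns;
}
```

Calling `fill_reply()` on a struct that previously held arbitrary bytes leaves every element of `rsv` at zero, which is the property the fix relies on.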
      Signed-off-by: Vlad Tsyrklevich <vlad@tsyrklevich.net>
      Acked-by: Richard Cochran <richardcochran@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      dac04913
    • Eric Dumazet's avatar
      netlink: do not enter direct reclaim from netlink_dump() · 3f841d15
      Eric Dumazet authored
      [ Upstream commit d35c99ff ]
      
      Since linux-3.15, netlink_dump() can use up to 16384 bytes skb
      allocations.
      
      Due to struct skb_shared_info ~320 bytes overhead, we end up using
      order-3 (on x86) page allocations, that might trigger direct reclaim and
      add stress.
      
      The intent was really to attempt a large allocation but immediately
      fall back to a smaller one (order-1 on x86) in case of memory stress.
      
      On recent kernels (linux-4.4), we can remove __GFP_DIRECT_RECLAIM to
      meet the goal. Old kernels would need to remove __GFP_WAIT.
      
      While we are at it, since we do an order-3 allocation, allow to use
      all the allocated bytes instead of 16384 to reduce syscalls during
      large dumps.
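
The size arithmetic above can be checked with a small sketch. It assumes 4096-byte pages (as on x86) and takes the "~320 bytes" overhead from the text as an approximate constant; `order_for()` and `usable_bytes()` are hypothetical helpers, not kernel functions.

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_SIZE_BYTES 4096u  /* assumption: x86 page size */
#define SHINFO_OVERHEAD 320u   /* approximate, per the commit message */

/* Smallest order such that (PAGE_SIZE_BYTES << order) holds `bytes`. */
unsigned int order_for(size_t bytes)
{
    size_t pages;
    unsigned int order = 0;

    assert(bytes > 0);
    pages = (bytes + PAGE_SIZE_BYTES - 1) / PAGE_SIZE_BYTES;
    while (((size_t)1 << order) < pages)
        order++;
    return order;
}

/* Payload bytes left in an allocation of the given order once the
 * shared-info overhead is subtracted. */
size_t usable_bytes(unsigned int order)
{
    return ((size_t)PAGE_SIZE_BYTES << order) - SHINFO_OVERHEAD;
}
```

With these assumptions, `order_for(16384 + SHINFO_OVERHEAD)` is 3 even though 16384 bytes alone would fit in order 2, and `usable_bytes(3)` is 32448, which is the extra room the commit proposes to use.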
      
      iproute2 already uses 32KB recvmsg() buffer sizes.
      
      Alexei provided an initial patch downsizing to SKB_WITH_OVERHEAD(16384)
      
      Fixes: 9063e21f ("netlink: autosize skb lengthes")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: Alexei Starovoitov <ast@kernel.org>
      Cc: Greg Thelen <gthelen@google.com>
      Reviewed-by: Greg Rose <grose@lightfleet.com>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      3f841d15
    • Anoob Soman's avatar
      packet: call fanout_release, while UNREGISTERING a netdev · 5086cadf
      Anoob Soman authored
      [ Upstream commit 66644982 ]
      
      If a socket has the FANOUT sockopt set, a new prot_hook is registered
      as part of fanout_add(). When processing a NETDEV_UNREGISTER event in
      af_packet, __fanout_unlink is called for all sockets, but the prot_hook
      that was registered as part of fanout_add is not removed. Call
      fanout_release on NETDEV_UNREGISTER, which removes the prot_hook and
      removes the fanout from the fanout_list.
      
      This fixes BUG_ON(!list_empty(&dev->ptype_specific)) in netdev_run_todo()
      Signed-off-by: Anoob Soman <anoob.soman@citrix.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      5086cadf
    • Andrew Collins's avatar
      net: Add netdev all_adj_list refcnt propagation to fix panic · 6fff1319
      Andrew Collins authored
      [ Upstream commit 93409033 ]
      
      This is a respin of a patch to fix a relatively easily reproducible kernel
      panic related to the all_adj_list handling for netdevs in recent kernels.
      
      The following sequence of commands will reproduce the issue:
      
      ip link add link eth0 name eth0.100 type vlan id 100
      ip link add link eth0 name eth0.200 type vlan id 200
      ip link add name testbr type bridge
      ip link set eth0.100 master testbr
      ip link set eth0.200 master testbr
      ip link add link testbr mac0 type macvlan
      ip link delete dev testbr
      
      This creates an upper/lower tree of (excuse the poor ASCII art):
      
                  /---eth0.100-eth0
      mac0-testbr-
                  \---eth0.200-eth0
      
      When testbr is deleted, the all_adj_lists are walked, and eth0 is deleted twice from
      the mac0 list. Unfortunately, during setup in __netdev_upper_dev_link, only one
      reference to eth0 is added, so this results in a panic.
      
      This change adds reference count propagation so things are handled properly.
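
A toy model of why the count has to follow the number of paths is sketched below. It is not the kernel implementation: `struct adj_entry`, `adj_link()` and `adj_unlink()` are hypothetical names, and the model only tracks how many paths lead to one neighbour.

```c
#include <assert.h>

/* Hypothetical stand-in for one entry in a device's all_adj_list. */
struct adj_entry {
    int ref_nr; /* number of paths through which the neighbour is reachable */
};

/* Record ref_nr additional paths to the neighbour. */
void adj_link(struct adj_entry *e, int ref_nr)
{
    assert(ref_nr > 0);
    e->ref_nr += ref_nr;
}

/* Drop ref_nr paths; returns 1 when no path is left and the entry
 * could be freed, 0 otherwise. */
int adj_unlink(struct adj_entry *e, int ref_nr)
{
    assert(ref_nr > 0 && e->ref_nr >= ref_nr);
    e->ref_nr -= ref_nr;
    return e->ref_nr == 0;
}
```

In the tree above, eth0 is reachable from mac0 through both eth0.100 and eth0.200, so two links are recorded and the two removals during teardown stay balanced. With only one recorded link, the second removal would trip the assertion, which mirrors the panic described.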
      
      Matthias Schiffer reported a similar crash in batman-adv:
      
      https://github.com/freifunk-gluon/gluon/issues/680
      https://www.open-mesh.org/issues/247
      
      which this patch also seems to resolve.
      Signed-off-by: Andrew Collins <acollins@cradlepoint.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      6fff1319
    • Shmulik Ladkani's avatar
      net/sched: act_vlan: Push skb->data to mac_header prior calling skb_vlan_*() functions · 9caee42c
      Shmulik Ladkani authored
      [ Upstream commit f39acc84 ]
      
      Generic skb_vlan_push/skb_vlan_pop functions don't properly handle the
      case where the input skb data pointer does not point at the mac header:
      
      - They're doing push/pop, but fail to properly unwind data back to its
        original location.
        For example, in the skb_vlan_push case, any subsequent
        'skb_push(skb, skb->mac_len)' calls make the skb->data point 4 bytes
        BEFORE start of frame, leading to bogus frames that may be transmitted.
      
      - They update rcsum per the added/removed 4 bytes tag.
        Alas if data is originally after the vlan/eth headers, then these
        bytes were already pulled out of the csum.
      
      OTOH calling skb_vlan_push/skb_vlan_pop with skb->data at mac_header
      presents no issues.
      
      act_vlan is the only caller to skb_vlan_*() that has skb->data pointing
      at network header (upon ingress).
      Other callers (ovs, bpf) already adjust skb->data at mac_header.
      
      This patch fixes act_vlan to point to the mac_header prior to calling
      skb_vlan_*() functions, as other callers do.
      Signed-off-by: Shmulik Ladkani <shmulik.ladkani@gmail.com>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Pravin Shelar <pshelar@ovn.org>
      Cc: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      9caee42c
    • Paolo Abeni's avatar
      net: pktgen: fix pkt_size · c002dfd8
      Paolo Abeni authored
      [ Upstream commit 63d75463 ]
      
      The commit 879c7220 ("net: pktgen: Observe needed_headroom
      of the device") increased the 'pkt_overhead' field value by
      LL_RESERVED_SPACE.
      As a side effect the generated packet size, computed as:
      
      	/* Eth + IPh + UDPh + mpls */
      	datalen = pkt_dev->cur_pkt_size - 14 - 20 - 8 -
      		  pkt_dev->pkt_overhead;
      
      is decreased by the same value.
      This slightly changed the behavior of existing pktgen users,
      and made the procfs interface somewhat inconsistent.
      Fix it by restoring the previous pkt_overhead value and using
      LL_RESERVED_SPACE as extralen in skb allocation.
      Also, change pktgen_alloc_skb() to only partially reserve
      the headroom to allow the caller to prefetch from ll header
      start.
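
The side effect on the payload size follows directly from the formula quoted above. The sketch below restates it as a standalone helper; `pktgen_datalen()` is a hypothetical name, and the value 16 used in the usage note is only an assumed example of a link-layer reserve, not a value taken from the commit.

```c
#include <assert.h>

#define ETH_HDR_LEN  14
#define IPV4_HDR_LEN 20
#define UDP_HDR_LEN   8

/* Payload length left after the Ethernet, IPv4 and UDP headers and any
 * extra per-packet overhead, following the formula in the message. */
int pktgen_datalen(int cur_pkt_size, int pkt_overhead)
{
    assert(cur_pkt_size >= 0 && pkt_overhead >= 0);
    return cur_pkt_size - ETH_HDR_LEN - IPV4_HDR_LEN - UDP_HDR_LEN
           - pkt_overhead;
}
```

For a configured size of 1000 bytes, `pktgen_datalen(1000, 0)` gives 958, while folding an assumed 16-byte reserve into the overhead gives 942. Any value added to pkt_overhead shrinks the generated payload by the same amount, which is the inconsistency the fix removes.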
      
      v1 -> v2:
       - fixed some typos in the comments
      
      Fixes: 879c7220 ("net: pktgen: Observe needed_headroom of the device")
      Suggested-by: Ben Greear <greearb@candelatech.com>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      c002dfd8
    • Gavin Schenk's avatar
      net: fec: set mac address unconditionally · ff1b27c3
      Gavin Schenk authored
      [ Upstream commit b82d44d7 ]
      
      If the mac address origin is not dt, you can only safely assign a mac
      address after "link up" of the device. If the link is off, the clocks
      are disabled, and because of issues writing registers when the clocks
      are off, the new mac address cannot be written in
      .ndo_set_mac_address() on some SoCs.
      This fix sets the mac address unconditionally in fec_restart(...) and
      ensures consistency between fec registers and the network layer.
      Signed-off-by: Gavin Schenk <g.schenk@eckelmann.de>
      Acked-by: Fugang Duan <fugang.duan@nxp.com>
      Acked-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
      Fixes: 9638d19e ("net: fec: add netif status check before set mac address")
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      ff1b27c3
  2. 10 Nov, 2016 10 commits
    • Greg Kroah-Hartman's avatar
      Linux 4.8.7 · 567aeca9
      Greg Kroah-Hartman authored
      567aeca9
    • Oliver Neukum's avatar
      HID: usbhid: add ATEN CS962 to list of quirky devices · 1bf121d3
      Oliver Neukum authored
      commit cf0ea4da upstream.
      
      Like many similar devices it needs a quirk to work.
      Issuing the request gets the device into an irrecoverable state.
      Signed-off-by: Oliver Neukum <oneukum@suse.com>
      Signed-off-by: Jiri Kosina <jkosina@suse.cz>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      1bf121d3
    • Rafael J. Wysocki's avatar
      cpufreq: intel_pstate: Set P-state upfront in performance mode · 05a833d4
      Rafael J. Wysocki authored
      commit a6c6ead1 upstream.
      
      After commit a4675fbc (cpufreq: intel_pstate: Replace timers with
      utilization update callbacks) the cpufreq governor callbacks may not
      be invoked on NOHZ_FULL CPUs and, in particular, switching to the
      "performance" policy via sysfs may not have any effect on them.  That
      is a problem, because it usually is desirable to squeeze the last
      bit of performance out of those CPUs, so work around it by setting
      the maximum P-state (within the limits) in intel_pstate_set_policy()
      upfront when the policy is CPUFREQ_POLICY_PERFORMANCE.
      
      Fixes: a4675fbc (cpufreq: intel_pstate: Replace timers with utilization update callbacks)
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      05a833d4
    • Boris Brezillon's avatar
      ubi: fastmap: Fix add_vol() return value test in ubi_attach_fastmap() · c8977151
      Boris Brezillon authored
      commit 40b6e61a upstream.
      
      Commit e96a8a3b ("UBI: Fastmap: Do not add vol if it already
      exists") introduced a bug by changing the possible error codes returned
      by add_vol():
      - this function no longer returns NULL in case of allocation failure
        but returns ERR_PTR(-ENOMEM)
      - when a duplicate entry in the volume RB tree is found it returns
        ERR_PTR(-EEXIST) instead of ERR_PTR(-EINVAL)
      
      Fix the tests done on add_vol() return val to match this new behavior.
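
Why a NULL test stops working once a function returns error pointers can be shown with a user-space imitation of the kernel's error-pointer convention. The helpers below (`err_ptr()`, `ptr_err()`, `is_err()`) are simplified, hypothetical re-implementations written for this sketch, not the kernel macros themselves.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Encode a negative errno value in a pointer. */
void *err_ptr(long err)
{
    assert(err < 0 && err >= -MAX_ERRNO);
    return (void *)(intptr_t)err;
}

/* Recover the errno value from an error pointer. */
long ptr_err(const void *p)
{
    return (long)(intptr_t)p;
}

/* True if p lies in the top MAX_ERRNO addresses, i.e. encodes an error. */
int is_err(const void *p)
{
    return (uintptr_t)p >= (uintptr_t)-MAX_ERRNO;
}
```

Here `err_ptr(-ENOMEM)` is not NULL, so a caller that only checks for NULL treats the failure as success. A caller has to use `is_err()` and then compare `ptr_err()` against the specific code it cares about, such as -EEXIST.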
      
      Fixes: e96a8a3b ("UBI: Fastmap: Do not add vol if it already exists")
      Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>
      Acked-by: Sheng Yong <shengyong1@huawei.com>
      Signed-off-by: Richard Weinberger <richard@nod.at>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      c8977151
    • Goldwyn Rodrigues's avatar
      btrfs: qgroup: Prevent qgroup->reserved from going subzero · 591bf136
      Goldwyn Rodrigues authored
      commit 0b34c261 upstream.
      
      While free'ing qgroup->reserved resources, we must check that
      the page has not been invalidated by a truncate operation
      by checking if the page is still dirty before reducing the
      qgroup resources. Resources in such a case are free'd when
      the entire extent is released by delayed_ref.
      
      This fixes a double accounting while releasing resources
      in case of truncating a file, reproduced by the following testcase.
      
      SCRATCH_DEV=/dev/vdb
      SCRATCH_MNT=/mnt
      mkfs.btrfs -f $SCRATCH_DEV
      mount -t btrfs $SCRATCH_DEV $SCRATCH_MNT
      cd $SCRATCH_MNT
      btrfs quota enable $SCRATCH_MNT
      btrfs subvolume create a
      btrfs qgroup limit 500m a $SCRATCH_MNT
      sync
      for c in {1..15}; do
      dd if=/dev/zero  bs=1M count=40 of=$SCRATCH_MNT/a/file;
      done
      
      sleep 10
      sync
      sleep 5
      
      touch $SCRATCH_MNT/a/newfile
      
      echo "Removing file"
      rm $SCRATCH_MNT/a/file
      
      Fixes: b9d0b389 ("btrfs: Add handler for invalidate page")
      Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
      Reviewed-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      591bf136
    • Owen Hofmann's avatar
      kvm: x86: Check memopp before dereference (CVE-2016-8630) · 0c879624
      Owen Hofmann authored
      commit d9092f52 upstream.
      
      Commit 41061cdb ("KVM: emulate: do not initialize memopp") removed a
      check for non-NULL under incorrect assumptions. An undefined instruction
      with a ModR/M byte with Mod=0 and R/M=5 (e.g. 0xc7 0x15) will attempt
      to dereference a null pointer here.
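
For readers unfamiliar with the encoding, the ModR/M byte from the example can be split into its three fields as sketched below. `struct modrm` and `decode_modrm()` are hypothetical names for this illustration and are unrelated to the KVM emulator's own decoder.

```c
#include <stdint.h>

/* The three fields of an x86 ModR/M byte. */
struct modrm {
    unsigned int mod; /* bits 7..6 */
    unsigned int reg; /* bits 5..3 */
    unsigned int rm;  /* bits 2..0 */
};

/* Split a ModR/M byte into its mod, reg and r/m fields. */
struct modrm decode_modrm(uint8_t byte)
{
    struct modrm m;

    m.mod = (byte >> 6) & 0x3;
    m.reg = (byte >> 3) & 0x7;
    m.rm  = byte & 0x7;
    return m;
}
```

Decoding 0x15 (binary 00 010 101) yields mod 0, reg 2 and r/m 5, which is the Mod=0, R/M=5 combination named in the message.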
      
      Fixes: 41061cdb
      Message-Id: <1477592752-126650-2-git-send-email-osh@google.com>
      Signed-off-by: Owen Hofmann <osh@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      0c879624
    • Russell King's avatar
      ARM: fix oops when using older ARMv4T CPUs · 725a92be
      Russell King authored
      commit 04946fb6 upstream.
      
      Alexander Shiyan reports that CLPS711x fails at boot time in the data
      exception handler due to a NULL pointer dereference.  This is caused by
      the late-v4t abort handler overwriting R9 (which becomes zero).  Fix
      this by making the abort handler save and restore R9.
      
      Unable to handle kernel NULL pointer dereference at virtual address 00000008
      pgd = c3b58000
      [00000008] *pgd=800000000, *pte=00000000, *ppte=feff4140
      Internal error: Oops: 63c11817 [#1] PREEMPT ARM
      CPU: 0 PID: 448 Comm: ash Not tainted 4.8.1+ #1
      Hardware name: Cirrus Logic CLPS711X (Device Tree Support)
      task: c39e03a0 ti: c3b4e000 task.ti: c3b4e000
      PC is at __dabt_svc+0x4c/0x60
      LR is at do_page_fault+0x144/0x2ac
      pc : [<c000d3ac>]    lr : [<c000fcec>]    psr: 60000093
      sp : c3b4fe6c  ip : 00000001  fp : b6f1bf88
      r10: c387a5a0  r9 : 00000000  r8 : e4e0e001
      r7 : bee3ef83  r6 : 00100000  r5 : 80000013  r4 : c022fcf8
      r3 : 00000000  r2 : 00000008  r1 : bf000000  r0 : 00000000
      Flags: nZCv  IRQs off  FIQs on  Mode SVC_32  ISA ARM  Segment user
      Control: 0000217f  Table: c3b58055  DAC: 00000055
      Process ash (pid: 448, stack limit = 0xc3b4e190)
      Stack: (0xc3b4fe6c to 0xc3b50000)
      fe60:                            bee3ef83 c05168d1 ffffffff 00000000 c3adfe80
      fe80: c3a03300 00000000 c3b4fed0 c3a03400 bee3ef83 c387a5a0 b6f1bf88 00000001
      fea0: c3b4febc 00000076 c022fcf8 80000013 ffffffff 0000003f bf000000 bee3ef83
      fec0: 00000004 00000000 c3adfe80 c00e432c 00000812 00000005 00000001 00000006
      fee0: b6f1b000 00000000 00010000 0003c944 0004d000 0004d439 00010000 b6f1b000
      ff00: 00000005 00000000 00015ecc c3b4fed0 0000000a 00000000 00000000 c00a1dc0
      ff20: befff000 c3a03300 c3b4e000 c0507cd8 c0508024 fffffff8 c3a03300 00000000
      ff40: c0516a58 c00a35bc c39e03a0 000001c0 bea84ce8 0004e008 c3b3a000 c00a3ac0
      ff60: c3b40374 c3b3a000 bea84d11 00000000 c0500188 bea84d11 bea84ce8 00000001
      ff80: 0000000b c000a304 c3b4e000 00000000 bea84ce4 c00a3cd0 00000000 bea84d11
      ffa0: bea84ce8 c000a160 bea84d11 bea84ce8 bea84d11 bea84ce8 0004e008 0004d450
      ffc0: bea84d11 bea84ce8 00000001 0000000b b6f45ee4 00000000 b6f5ff70 bea84ce4
      ffe0: b6f2f130 bea84cb0 b6f2f194 b6ef29f4 a0000010 bea84d11 02c7cffa 02c7cffd
      [<c000d3ac>] (__dabt_svc) from [<c022fcf8>] (__copy_to_user_std+0xf8/0x330)
      [<c022fcf8>] (__copy_to_user_std) from [<c00e432c>] (load_elf_binary+0x920/0x107c)
      [<c00e432c>] (load_elf_binary) from [<c00a35bc>] (search_binary_handler+0x80/0x16c)
      [<c00a35bc>] (search_binary_handler) from [<c00a3ac0>] (do_execveat_common+0x418/0x600)
      [<c00a3ac0>] (do_execveat_common) from [<c00a3cd0>] (do_execve+0x28/0x30)
      [<c00a3cd0>] (do_execve) from [<c000a160>] (ret_fast_syscall+0x0/0x30)
      Code: e1a0200d eb00136b e321f093 e59d104c (e5891008)
      ---[ end trace 4b4f8086ebef98c5 ]---
      
      Fixes: e6978e4b ("ARM: save and reset the address limit when entering an exception")
      Reported-by: Alexander Shiyan <shc_work@mail.ru>
      Tested-by: Alexander Shiyan <shc_work@mail.ru>
      Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      725a92be
    • Jiri Slaby's avatar
      tty: vt, fix bogus division in csi_J · e339609b
      Jiri Slaby authored
      commit 42acfc66 upstream.
      
      In csi_J(3), the third parameter of scr_memsetw (vc_screenbuf_size) is
      divided by 2 inappropriately. But scr_memsetw expects size, not
      count, because it divides the size by 2 on its own before doing actual
      memset-by-words.
      
      So remove the bogus division.
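
The effect of dividing twice can be reproduced with a small stand-in for a word-wise memset that takes a size in bytes, as the message describes. `memsetw_bytes()` is a hypothetical helper written for this sketch, not the kernel's scr_memsetw.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fill a buffer of 16-bit cells. The third argument is a size in
 * bytes; the helper converts it to a word count itself. */
void memsetw_bytes(uint16_t *dst, uint16_t value, size_t size_bytes)
{
    size_t words = size_bytes / 2;

    assert(dst != NULL);
    while (words--)
        *dst++ = value;
}
```

If the caller halves the size before passing it in, only the first half of the buffer is written: for an 8-cell buffer, passing `sizeof(buf) / 2` fills cells 0 to 3 and leaves cells 4 to 7 untouched, while passing `sizeof(buf)` fills all eight.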
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
      Cc: Petr Písař <ppisar@redhat.com>
      Fixes: f8df13e0 (tty: Clean console safely)
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      e339609b
    • Laurent Pinchart's avatar
      v4l: vsp1: Prevent pipelines from running when not streaming · 4a22930a
      Laurent Pinchart authored
      commit e4e70a14 upstream.
      
      Pipelines can only be run if all their video nodes are streaming. Commit
      b4dfb9b3 ("[media] v4l: vsp1: Stop the pipeline upon the first
      STREAMOFF") fixed the pipeline stop sequence, but introduced a race
      condition that makes it possible to run a pipeline after stopping the
      stream on a video node by queuing a buffer on the other side of the
      pipeline.
      
      Fix this by clearing the buffers ready flag when stopping the stream,
      which will prevent the QBUF handler from running the pipeline.
      
      Fixes: b4dfb9b3 ("[media] v4l: vsp1: Stop the pipeline upon the first STREAMOFF")
      Reported-by: Kieran Bingham <kieran@bingham.xyz>
      Tested-by: Kieran Bingham <kieran@bingham.xyz>
      Signed-off-by: Laurent Pinchart <laurent.pinchart+renesas@ideasonboard.com>
      Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      4a22930a
    • Tony Lindgren's avatar
      usb: musb: Fix hardirq-safe hardirq-unsafe lock order error · 59f9693a
      Tony Lindgren authored
      commit d8e5f0ec upstream.
      
      If we configure musb with 2430 glue as a peripheral, and then rmmod
      omap2430 module, we'll get the following error:
      
      [ INFO: HARDIRQ-safe -> HARDIRQ-unsafe lock order detected ]
      ...
      rmmod/413 [HC0[0]:SC0[0]:HE0:SE1] is trying to acquire:
       (&phy->mutex){+.+.+.}, at: [<c04b9fd0>] phy_power_off+0x1c/0xb8
      [  204.678710]
                     and this task is already holding:
       (&(&musb->lock)->rlock){-.-...}, at: [<bf3a482c>]
       musb_gadget_stop+0x24/0xec [musb_hdrc]
      which would create a new lock dependency:
       (&(&musb->lock)->rlock){-.-...} -> (&phy->mutex){+.+.+.}
      ...
      
      This is because some glue layers expect musb_platform_enable/disable
      to be called with spinlock held, and 2430 glue layer has USB PHY on
      the I2C bus using a mutex.
      
      We could fix the glue layers to take the spinlock, but we still have
      a problem of musb_platform_enable/disable being called in an unbalanced
      manner. So that would still lead to USB PHY enable/disable related
      problems for omap2430 glue layer.
      
      While it makes sense to only enable USB PHY when needed from PM point
      of view, in this case we just can't do it yet without breaking things.
      So let's just revert phy_enable/disable related changes instead and
      reconsider this after we have fixed musb_platform_enable/disable to
      be balanced.
      
      Fixes: a83e17d0 ("usb: musb: Improve PM runtime and phy handling for 2430 glue layer")
      Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
      Signed-off-by: Tony Lindgren <tony@atomide.com>
      Signed-off-by: Bin Liu <b-liu@ti.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      59f9693a