1. 09 Dec, 2015 30 commits
    • Marek Szyprowski's avatar
      ARM: 8426/1: dma-mapping: add missing range check in dma_mmap() · 85b36754
      Marek Szyprowski authored
      commit 371f0f08 upstream.
      
      dma_mmap() function in IOMMU-based dma-mapping implementation lacked
      a check for valid range of mmap parameters (offset and buffer size), what
      might have caused access beyond the allocated buffer. This patch fixes
      this issue.
      Signed-off-by: default avatarMarek Szyprowski <m.szyprowski@samsung.com>
      Signed-off-by: default avatarRussell King <rmk+kernel@arm.linux.org.uk>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      85b36754
    • Sasha Levin's avatar
      RDS: verify the underlying transport exists before creating a connection · 1935cd0b
      Sasha Levin authored
      [ Upstream commit 74e98eb0 ]
      
      There was no verification that an underlying transport exists when creating
      a connection, this would cause dereferencing a NULL ptr.
      
      It might happen on sockets that weren't properly bound before attempting to
      send a message, which will cause a NULL ptr deref:
      
      [135546.047719] kasan: GPF could be caused by NULL-ptr deref or user memory accessgeneral protection fault: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC KASAN
      [135546.051270] Modules linked in:
      [135546.051781] CPU: 4 PID: 15650 Comm: trinity-c4 Not tainted 4.2.0-next-20150902-sasha-00041-gbaa1222-dirty #2527
      [135546.053217] task: ffff8800835bc000 ti: ffff8800bc708000 task.ti: ffff8800bc708000
      [135546.054291] RIP: __rds_conn_create (net/rds/connection.c:194)
      [135546.055666] RSP: 0018:ffff8800bc70fab0  EFLAGS: 00010202
      [135546.056457] RAX: dffffc0000000000 RBX: 0000000000000f2c RCX: ffff8800835bc000
      [135546.057494] RDX: 0000000000000007 RSI: ffff8800835bccd8 RDI: 0000000000000038
      [135546.058530] RBP: ffff8800bc70fb18 R08: 0000000000000001 R09: 0000000000000000
      [135546.059556] R10: ffffed014d7a3a23 R11: ffffed014d7a3a21 R12: 0000000000000000
      [135546.060614] R13: 0000000000000001 R14: ffff8801ec3d0000 R15: 0000000000000000
      [135546.061668] FS:  00007faad4ffb700(0000) GS:ffff880252000000(0000) knlGS:0000000000000000
      [135546.062836] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
      [135546.063682] CR2: 000000000000846a CR3: 000000009d137000 CR4: 00000000000006a0
      [135546.064723] Stack:
      [135546.065048]  ffffffffafe2055c ffffffffafe23fc1 ffffed00493097bf ffff8801ec3d0008
      [135546.066247]  0000000000000000 00000000000000d0 0000000000000000 ac194a24c0586342
      [135546.067438]  1ffff100178e1f78 ffff880320581b00 ffff8800bc70fdd0 ffff880320581b00
      [135546.068629] Call Trace:
      [135546.069028] ? __rds_conn_create (include/linux/rcupdate.h:856 net/rds/connection.c:134)
      [135546.069989] ? rds_message_copy_from_user (net/rds/message.c:298)
      [135546.071021] rds_conn_create_outgoing (net/rds/connection.c:278)
      [135546.071981] rds_sendmsg (net/rds/send.c:1058)
      [135546.072858] ? perf_trace_lock (include/trace/events/lock.h:38)
      [135546.073744] ? lockdep_init (kernel/locking/lockdep.c:3298)
      [135546.074577] ? rds_send_drop_to (net/rds/send.c:976)
      [135546.075508] ? __might_fault (./arch/x86/include/asm/current.h:14 mm/memory.c:3795)
      [135546.076349] ? __might_fault (mm/memory.c:3795)
      [135546.077179] ? rds_send_drop_to (net/rds/send.c:976)
      [135546.078114] sock_sendmsg (net/socket.c:611 net/socket.c:620)
      [135546.078856] SYSC_sendto (net/socket.c:1657)
      [135546.079596] ? SYSC_connect (net/socket.c:1628)
      [135546.080510] ? trace_dump_stack (kernel/trace/trace.c:1926)
      [135546.081397] ? ring_buffer_unlock_commit (kernel/trace/ring_buffer.c:2479 kernel/trace/ring_buffer.c:2558 kernel/trace/ring_buffer.c:2674)
      [135546.082390] ? trace_buffer_unlock_commit (kernel/trace/trace.c:1749)
      [135546.083410] ? trace_event_raw_event_sys_enter (include/trace/events/syscalls.h:16)
      [135546.084481] ? do_audit_syscall_entry (include/trace/events/syscalls.h:16)
      [135546.085438] ? trace_buffer_unlock_commit (kernel/trace/trace.c:1749)
      [135546.085515] rds_ib_laddr_check(): addr 36.74.25.172 ret -99 node type -1
      Acked-by: default avatarSantosh Shilimkar <santosh.shilimkar@oracle.com>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      1935cd0b
    • Eric Dumazet's avatar
      net: fix a race in dst_release() · fa3ef263
      Eric Dumazet authored
      [ Upstream commit d69bbf88 ]
      
      Only cpu seeing dst refcount going to 0 can safely
      dereference dst->flags.
      
      Otherwise an other cpu might already have freed the dst.
      
      Fixes: 27b75c95 ("net: avoid RCU for NOCACHE dst")
      Reported-by: default avatarGreg Thelen <gthelen@google.com>
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      fa3ef263
    • Jay Vosburgh's avatar
      bonding: fix panic on non-ARPHRD_ETHER enslave failure · d780d212
      Jay Vosburgh authored
      [ Upstream commit 40baec22 ]
      
      Since commit 7d5cd2ce529b, when bond_enslave fails on devices that
      are not ARPHRD_ETHER, if needed, it resets the bonding device back to
      ARPHRD_ETHER by calling ether_setup.
      
      	Unfortunately, ether_setup clobbers dev->flags, clearing IFF_UP
      if the bond device is up, leaving it in a quasi-down state without
      having actually gone through dev_close.  For bonding, if any periodic
      work queue items are active (miimon, arp_interval, etc), those will
      remain running, as they are stopped by bond_close.  At this point, if
      the bonding module is unloaded or the bond is deleted, the system will
      panic when the work function is called.
      
      	This panic is resolved by calling dev_close on the bond itself
      prior to calling ether_setup.
      
      Cc: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: default avatarJay Vosburgh <jay.vosburgh@canonical.com>
      Fixes: 7d5cd2ce ("bonding: correctly handle bonding type change on enslave failure")
      Acked-by: default avatarNikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      d780d212
    • Francesco Ruggeri's avatar
      packet: race condition in packet_bind · f8be56fc
      Francesco Ruggeri authored
      [ Upstream commit 30f7ea1c ]
      
      There is a race conditions between packet_notifier and packet_bind{_spkt}.
      
      It happens if packet_notifier(NETDEV_UNREGISTER) executes between the
      time packet_bind{_spkt} takes a reference on the new netdevice and the
      time packet_do_bind sets po->ifindex.
      In this case the notification can be missed.
      If this happens during a dev_change_net_namespace this can result in the
      netdevice to be moved to the new namespace while the packet_sock in the
      old namespace still holds a reference on it. When the netdevice is later
      deleted in the new namespace the deletion hangs since the packet_sock
      is not found in the new namespace' &net->packet.sklist.
      It can be reproduced with the script below.
      
      This patch makes packet_do_bind check again for the presence of the
      netdevice in the packet_sock's namespace after the synchronize_net
      in unregister_prot_hook.
      More in general it also uses the rcu lock for the duration of the bind
      to stop dev_change_net_namespace/rollback_registered_many from
      going past the synchronize_net following unlist_netdevice, so that
      no NETDEV_UNREGISTER notifications can happen on the new netdevice
      while the bind is executing. In order to do this some code from
      packet_bind{_spkt} is consolidated into packet_do_dev.
      
      import socket, os, time, sys
      proto=7
      realDev='em1'
      vlanId=400
      if len(sys.argv) > 1:
         vlanId=int(sys.argv[1])
      dev='vlan%d' % vlanId
      
      os.system('taskset -p 0x10 %d' % os.getpid())
      
      s = socket.socket(socket.PF_PACKET, socket.SOCK_RAW, proto)
      os.system('ip link add link %s name %s type vlan id %d' %
                (realDev, dev, vlanId))
      os.system('ip netns add dummy')
      
      pid=os.fork()
      
      if pid == 0:
         # dev should be moved while packet_do_bind is in synchronize net
         os.system('taskset -p 0x20000 %d' % os.getpid())
         os.system('ip link set %s netns dummy' % dev)
         os.system('ip netns exec dummy ip link del %s' % dev)
         s.close()
         sys.exit(0)
      
      time.sleep(.004)
      try:
         s.bind(('%s' % dev, proto+1))
      except:
         print 'Could not bind socket'
         s.close()
         os.system('ip netns del dummy')
         sys.exit(0)
      
      os.waitpid(pid, 0)
      s.close()
      os.system('ip netns del dummy')
      sys.exit(0)
      Signed-off-by: default avatarFrancesco Ruggeri <fruggeri@arista.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f8be56fc
    • WANG Cong's avatar
      ipv4: disable BH when changing ip local port range · 1b27b7f6
      WANG Cong authored
      [ Upstream commit 4ee3bd4a ]
      
      This fixes the following lockdep warning:
      
       [ INFO: inconsistent lock state ]
       4.3.0-rc7+ #1197 Not tainted
       ---------------------------------
       inconsistent {IN-SOFTIRQ-R} -> {SOFTIRQ-ON-W} usage.
       sysctl/1019 [HC0[0]:SC0[0]:HE1:SE1] takes:
        (&(&net->ipv4.ip_local_ports.lock)->seqcount){+.+-..}, at: [<ffffffff81921de7>] ipv4_local_port_range+0xb4/0x12a
       {IN-SOFTIRQ-R} state was registered at:
         [<ffffffff810bd682>] __lock_acquire+0x2f6/0xdf0
         [<ffffffff810be6d5>] lock_acquire+0x11c/0x1a4
         [<ffffffff818e599c>] inet_get_local_port_range+0x4e/0xae
         [<ffffffff8166e8e3>] udp_flow_src_port.constprop.40+0x23/0x116
         [<ffffffff81671cb9>] vxlan_xmit_one+0x219/0xa6a
         [<ffffffff81672f75>] vxlan_xmit+0xa6b/0xaa5
         [<ffffffff817f2deb>] dev_hard_start_xmit+0x2ae/0x465
         [<ffffffff817f35ed>] __dev_queue_xmit+0x531/0x633
         [<ffffffff817f3702>] dev_queue_xmit_sk+0x13/0x15
         [<ffffffff818004a5>] neigh_resolve_output+0x12f/0x14d
         [<ffffffff81959cfa>] ip6_finish_output2+0x344/0x39f
         [<ffffffff8195bf58>] ip6_finish_output+0x88/0x8e
         [<ffffffff8195bfef>] ip6_output+0x91/0xe5
         [<ffffffff819792ae>] dst_output_sk+0x47/0x4c
         [<ffffffff81979392>] NF_HOOK_THRESH.constprop.30+0x38/0x82
         [<ffffffff8197981e>] mld_sendpack+0x189/0x266
         [<ffffffff8197b28b>] mld_ifc_timer_expire+0x1ef/0x223
         [<ffffffff810de581>] call_timer_fn+0xfb/0x28c
         [<ffffffff810ded1e>] run_timer_softirq+0x1c7/0x1f1
      
      Fixes: b8f1a556 ("udp: Add function to make source port for UDP tunnels")
      Cc: Tom Herbert <tom@herbertland.com>
      Signed-off-by: default avatarCong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      1b27b7f6
    • Sabrina Dubroca's avatar
      ipv6: clean up dev_snmp6 proc entry when we fail to initialize inet6_dev · f6f9261a
      Sabrina Dubroca authored
      [ Upstream commit 2a189f9e ]
      
      In ipv6_add_dev, when addrconf_sysctl_register fails, we do not clean up
      the dev_snmp6 entry that we have already registered for this device.
      Call snmp6_unregister_dev in this case.
      
      Fixes: a317a2f1 ("ipv6: fail early when creating netdev named all or default")
      Reported-by: default avatarDmitry Vyukov <dvyukov@google.com>
      Signed-off-by: default avatarSabrina Dubroca <sd@queasysnail.net>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f6f9261a
    • Eric Dumazet's avatar
      net: avoid NULL deref in inet_ctl_sock_destroy() · 1ce33c2b
      Eric Dumazet authored
      [ Upstream commit 8fa677d2 ]
      
      Under low memory conditions, tcp_sk_init() and icmp_sk_init()
      can both iterate on all possible cpus and call inet_ctl_sock_destroy(),
      with eventual NULL pointer.
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Reported-by: default avatarDmitry Vyukov <dvyukov@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      1ce33c2b
    • Martin Habets's avatar
      sfc: push partner queue for skb->xmit_more · b3c940d8
      Martin Habets authored
      [ Upstream commit b2663a4f ]
      
      When the IP stack passes SKBs the sfc driver puts them in 2 different TX
      queues (called partners), one for checksummed and one for not checksummed.
      If the SKB has xmit_more set the driver will delay pushing the work to the
      NIC.
      
      When later it does decide to push the buffers this patch ensures it also
      pushes the partner queue, if that also has any delayed work. Before this
      fix the work in the partner queue would be left for a long time and cause
      a netdev watchdog.
      
      Fixes: 70b33fb0 ("sfc: add support for skb->xmit_more")
      Reported-by: default avatarJianlin Shi <jishi@redhat.com>
      Signed-off-by: default avatarMartin Habets <mhabets@solarflare.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      b3c940d8
    • Eric Dumazet's avatar
      sit: fix sit0 percpu double allocations · 0d0f84af
      Eric Dumazet authored
      [ Upstream commit 4ece9009 ]
      
      sit0 device allocates its percpu storage twice :
      - One time in ipip6_tunnel_init()
      - One time in ipip6_fb_tunnel_init()
      
      Thus we leak 48 bytes per possible cpu per network namespace dismantle.
      
      ipip6_fb_tunnel_init() can be much simpler and does not
      return an error, and should be called after register_netdev()
      
      Note that ipip6_tunnel_clone_6rd() also needs to be called
      after register_netdev() (calling ipip6_tunnel_init())
      
      Fixes: ebe084aa ("sit: Use ipip6_tunnel_init as the ndo_init function.")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Reported-by: default avatarDmitry Vyukov <dvyukov@google.com>
      Cc: Steffen Klassert <steffen.klassert@secunet.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      0d0f84af
    • Ani Sinha's avatar
      ipmr: fix possible race resulting from improper usage of IP_INC_STATS_BH() in preemptible context. · 176dae31
      Ani Sinha authored
      [ Upstream commit 44f49dd8 ]
      
      Fixes the following kernel BUG :
      
      BUG: using __this_cpu_add() in preemptible [00000000] code: bash/2758
      caller is __this_cpu_preempt_check+0x13/0x15
      CPU: 0 PID: 2758 Comm: bash Tainted: P           O   3.18.19 #2
       ffffffff8170eaca ffff880110d1b788 ffffffff81482b2a 0000000000000000
       0000000000000000 ffff880110d1b7b8 ffffffff812010ae ffff880007cab800
       ffff88001a060800 ffff88013a899108 ffff880108b84240 ffff880110d1b7c8
      Call Trace:
      [<ffffffff81482b2a>] dump_stack+0x52/0x80
      [<ffffffff812010ae>] check_preemption_disabled+0xce/0xe1
      [<ffffffff812010d4>] __this_cpu_preempt_check+0x13/0x15
      [<ffffffff81419d60>] ipmr_queue_xmit+0x647/0x70c
      [<ffffffff8141a154>] ip_mr_forward+0x32f/0x34e
      [<ffffffff8141af76>] ip_mroute_setsockopt+0xe03/0x108c
      [<ffffffff810553fc>] ? get_parent_ip+0x11/0x42
      [<ffffffff810e6974>] ? pollwake+0x4d/0x51
      [<ffffffff81058ac0>] ? default_wake_function+0x0/0xf
      [<ffffffff810553fc>] ? get_parent_ip+0x11/0x42
      [<ffffffff810613d9>] ? __wake_up_common+0x45/0x77
      [<ffffffff81486ea9>] ? _raw_spin_unlock_irqrestore+0x1d/0x32
      [<ffffffff810618bc>] ? __wake_up_sync_key+0x4a/0x53
      [<ffffffff8139a519>] ? sock_def_readable+0x71/0x75
      [<ffffffff813dd226>] do_ip_setsockopt+0x9d/0xb55
      [<ffffffff81429818>] ? unix_seqpacket_sendmsg+0x3f/0x41
      [<ffffffff813963fe>] ? sock_sendmsg+0x6d/0x86
      [<ffffffff813959d4>] ? sockfd_lookup_light+0x12/0x5d
      [<ffffffff8139650a>] ? SyS_sendto+0xf3/0x11b
      [<ffffffff810d5738>] ? new_sync_read+0x82/0xaa
      [<ffffffff813ddd19>] compat_ip_setsockopt+0x3b/0x99
      [<ffffffff813fb24a>] compat_raw_setsockopt+0x11/0x32
      [<ffffffff81399052>] compat_sock_common_setsockopt+0x18/0x1f
      [<ffffffff813c4d05>] compat_SyS_setsockopt+0x1a9/0x1cf
      [<ffffffff813c4149>] compat_SyS_socketcall+0x180/0x1e3
      [<ffffffff81488ea1>] cstar_dispatch+0x7/0x1e
      Signed-off-by: default avatarAni Sinha <ani@arista.com>
      Acked-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      176dae31
    • Phil Reid's avatar
      stmmac: Correctly report PTP capabilities. · 912bd0da
      Phil Reid authored
      [ Upstream commit e6dbe1eb ]
      
      priv->hwts_*_en indicate if timestamping is enabled/disabled at run
      time. But  priv->dma_cap.time_stamp  and priv->dma_cap.atime_stamp
      indicates HW is support for PTPv1/PTPv2.
      Signed-off-by: default avatarPhil Reid <preid@electromag.com.au>
      Acked-by: default avatarRichard Cochran <richardcochran@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      912bd0da
    • Julian Anastasov's avatar
      ipv4: update RTNH_F_LINKDOWN flag on UP event · d5f4b227
      Julian Anastasov authored
      [ Upstream commit c9b3292e ]
      
      When nexthop is part of multipath route we should clear the
      LINKDOWN flag when link goes UP or when first address is added.
      This is needed because we always set LINKDOWN flag when DEAD flag
      was set but now on UP the nexthop is not dead anymore. Examples when
      LINKDOWN bit can be forgotten when no NETDEV_CHANGE is delivered:
      
      - link goes down (LINKDOWN is set), then link goes UP and device
      shows carrier OK but LINKDOWN remains set
      
      - last address is deleted (LINKDOWN is set), then address is
      added and device shows carrier OK but LINKDOWN remains set
      
      Steps to reproduce:
      modprobe dummy
      ifconfig dummy0 192.168.168.1 up
      
      here add a multipath route where one nexthop is for dummy0:
      
      ip route add 1.2.3.4 nexthop dummy0 nexthop SOME_OTHER_DEVICE
      ifconfig dummy0 down
      ifconfig dummy0 up
      
      now ip route shows nexthop that is not dead. Now set the sysctl var:
      
      echo 1 > /proc/sys/net/ipv4/conf/dummy0/ignore_routes_with_linkdown
      
      now ip route will show a dead nexthop because the forgotten
      RTNH_F_LINKDOWN is propagated as RTNH_F_DEAD.
      
      Fixes: 8a3d0316 ("net: track link-status of ipv4 nexthops")
      Signed-off-by: default avatarJulian Anastasov <ja@ssi.bg>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      d5f4b227
    • Julian Anastasov's avatar
      ipv4: fix to not remove local route on link down · ee256463
      Julian Anastasov authored
      [ Upstream commit 4f823def ]
      
      When fib_netdev_event calls fib_disable_ip on NETDEV_DOWN event
      we should not delete the local routes if the local address
      is still present. The confusion comes from the fact that both
      fib_netdev_event and fib_inetaddr_event use the NETDEV_DOWN
      constant. Fix it by returning back the variable 'force'.
      
      Steps to reproduce:
      modprobe dummy
      ifconfig dummy0 192.168.168.1 up
      ifconfig dummy0 down
      ip route list table local | grep dummy | grep host
      local 192.168.168.1 dev dummy0  proto kernel  scope host  src 192.168.168.1
      
      Fixes: 8a3d0316 ("net: track link-status of ipv4 nexthops")
      Signed-off-by: default avatarJulian Anastasov <ja@ssi.bg>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      ee256463
    • Jon Paul Maloy's avatar
      tipc: linearize arriving NAME_DISTR and LINK_PROTO buffers · 2d52c83a
      Jon Paul Maloy authored
      [ Upstream commit 5cbb28a4 ]
      
      Testing of the new UDP bearer has revealed that reception of
      NAME_DISTRIBUTOR, LINK_PROTOCOL/RESET and LINK_PROTOCOL/ACTIVATE
      message buffers is not prepared for the case that those may be
      non-linear.
      
      We now linearize all such buffers before they are delivered up to the
      generic reception layer.
      
      In order for the commit to apply cleanly to 'net' and 'stable', we do
      the change in the function tipc_udp_recv() for now. Later, we will post
      a commit to 'net-next' moving the linearization to generic code, in
      tipc_named_rcv() and tipc_link_proto_rcv().
      
      Fixes: commit d0f91938 ("tipc: add ip/udp media type")
      Signed-off-by: default avatarJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      2d52c83a
    • Carol L Soto's avatar
      net/mlx4: Copy/set only sizeof struct mlx4_eqe bytes · 769053d2
      Carol L Soto authored
      [ Upstream commit c02b0501 ]
      
      When doing memcpy/memset of EQEs, we should use sizeof struct
      mlx4_eqe as the base size and not caps.eqe_size which could be bigger.
      
      If caps.eqe_size is bigger than the struct mlx4_eqe then we corrupt
      data in the master context.
      
      When using a 64 byte stride, the memcpy copied over 63 bytes to the
      slave_eq structure.  This resulted in copying over the entire eqe of
      interest, including its ownership bit -- and also 31 bytes of garbage
      into the next WQE in the slave EQ -- which did NOT include the ownership
      bit (and therefore had no impact).
      
      However, once the stride is increased to 128, we are overwriting the
      ownership bits of *three* eqes in the slave_eq struct.  This results
      in an incorrect ownership bit for those eqes, which causes the eq to
      seem to be full. The issue therefore surfaced only once 128-byte EQEs
      started being used in SRIOV and (overarchitectures that have 128/256
      byte cache-lines such as PPC) - e.g after commit 77507aa2
      "net/mlx4_core: Enable CQE/EQE stride support".
      
      Fixes: 08ff3235 ('mlx4: 64-byte CQE/EQE support')
      Signed-off-by: default avatarCarol L Soto <clsoto@linux.vnet.ibm.com>
      Signed-off-by: default avatarJack Morgenstein <jackm@dev.mellanox.co.il>
      Signed-off-by: default avatarOr Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      769053d2
    • Sowmini Varadhan's avatar
      RDS-TCP: Recover correctly from pskb_pull()/pksb_trim() failure in rds_tcp_data_recv · aee5e801
      Sowmini Varadhan authored
      [ Upstream commit 8ce675ff ]
      
      Either of pskb_pull() or pskb_trim() may fail under low memory conditions.
      If rds_tcp_data_recv() ignores such failures, the application will
      receive corrupted data because the skb has not been correctly
      carved to the RDS datagram size.
      
      Avoid this by handling pskb_pull/pskb_trim failure in the same
      manner as the skb_clone failure: bail out of rds_tcp_data_recv(), and
      retry via the deferred call to rds_send_worker() that gets set up on
      ENOMEM from rds_tcp_read_sock()
      Signed-off-by: default avatarSowmini Varadhan <sowmini.varadhan@oracle.com>
      Acked-by: default avatarSantosh Shilimkar <santosh.shilimkar@oracle.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      aee5e801
    • Alexander Duyck's avatar
      fib_trie: leaf_walk_rcu should not compute key if key is less than pn->key · 438da3c1
      Alexander Duyck authored
      [ Upstream commit c2229fe1 ]
      
      We were computing the child index in cases where the key value we were
      looking for was actually less than the base key of the tnode.  As a result
      we were getting incorrect index values that would cause us to skip over
      some children.
      
      To fix this I have added a test that will force us to use child index 0 if
      the key we are looking for is less than the key of the current tnode.
      
      Fixes: 8be33e95 ("fib_trie: Fib walk rcu should take a tnode and key instead of a trie and a leaf")
      Reported-by: default avatarBrian Rak <brak@gameservers.com>
      Signed-off-by: default avatarAlexander Duyck <aduyck@mirantis.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      438da3c1
    • Maciej S. Szmigiero's avatar
      net: fec: normalize return value of pm_runtime_get_sync() in MDIO write · e8b3a1a9
      Maciej S. Szmigiero authored
      [ Upstream commit 42ea4457 ]
      
      If fec MDIO write method succeeds its return value comes from
      call to pm_runtime_get_sync().
      But pm_runtime_get_sync() can also return 1.
      
      In case of Micrel KSZ9031 PHY this value will then
      be returned along the call chain of phy_write() ->
      ksz9031_extended_write() -> ksz9031_center_flp_timing() ->
      ksz9031_config_init() -> phy_init_hw() -> phy_attach_direct() ->
      phy_connect_direct().
      
      Then phy_connect() will cast it into a pointer using ERR_PTR(),
      which then fec_enet_mii_probe() will try to dereference
      resulting in an oops.
      
      Fix it by normalizing return value of pm_runtime_get_sync()
      to be zero if positive in MDIO write method.
      
      Fixes: 8fff755e ("net: fec: Ensure clocks are enabled while using mdio bus")
      Signed-off-by: default avatarMaciej Szmigiero <mail@maciej.szmigiero.name>
      Acked-by: default avatarAndrew Lunn <andrew@lunn.ch>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      e8b3a1a9
    • Eric Dumazet's avatar
      ipv6: gre: support SIT encapsulation · 0a7f9f1c
      Eric Dumazet authored
      [ Upstream commit 7e3b6e74 ]
      
      gre_gso_segment() chokes if SIT frames were aggregated by GRO engine.
      
      Fixes: 61c1db7f ("ipv6: sit: add GSO/TSO support")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      0a7f9f1c
    • Fabio Estevam's avatar
      net: fec: Remove unneeded use of IS_ERR_VALUE() macro · 914eba92
      Fabio Estevam authored
      [ Upstream commit b0c6ce24 ]
      
      There is no need to use the IS_ERR_VALUE() macro for checking
      the return value from pm_runtime_* functions.
      
      Just do a simple negative test instead.
      
      The semantic patch that makes this change is available
      in scripts/coccinelle/api/pm_runtime.cocci.
      Signed-off-by: default avatarFabio Estevam <fabio.estevam@freescale.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      914eba92
    • Lendacky, Thomas's avatar
      amd-xgbe: Fix race between access of desc and desc index · 60773152
      Lendacky, Thomas authored
      [ Upstream commit 20986ed8 ]
      
      During Tx cleanup it's still possible for the descriptor data to be
      read ahead of the descriptor index. A memory barrier is required between
      the read of the descriptor index and the start of the Tx cleanup loop.
      This allows a change to a lighter-weight barrier in the Tx transmit
      routine just before updating the current descriptor index.
      
      Since the memory barrier does result in extra overhead on arm64, keep
      the previous change to not chase the current descriptor value. This
      prevents the execution of the barrier for each loop performed.
      Suggested-by: default avatarAlexander Duyck <alexander.duyck@gmail.com>
      Signed-off-by: default avatarTom Lendacky <thomas.lendacky@amd.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      60773152
    • Lendacky, Thomas's avatar
      amd-xgbe: Use wmb before updating current descriptor count · 3bc6cb1e
      Lendacky, Thomas authored
      [ Upstream commit 20a41fba ]
      
      The code currently uses the lightweight dma_wmb barrier before updating
      the current descriptor count. Under heavy load, the Tx cleanup routine
      was seeing the updated current descriptor count before the updated
      descriptor information. As a result, the Tx descriptor was being cleaned
      up before it was used because it was not "owned" by the hardware yet,
      resulting in a Tx queue hang.
      
      Using the wmb barrier insures that the descriptor is updated before the
      descriptor counter preventing the Tx queue hang. For extra insurance,
      the Tx cleanup routine is changed to grab the current decriptor count on
      entry and uses that initial value in the processing loop rather than
      trying to chase the current value.
      Signed-off-by: default avatarTom Lendacky <thomas.lendacky@amd.com>
      Tested-by: default avatarChristoffer Dall <christoffer.dall@linaro.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      3bc6cb1e
    • Guillaume Nault's avatar
      ppp: fix pppoe_dev deletion condition in pppoe_release() · 8edcc554
      Guillaume Nault authored
      [ Upstream commit 1acea4f6 ]
      
      We can't rely on PPPOX_ZOMBIE to decide whether to clear po->pppoe_dev.
      PPPOX_ZOMBIE can be set by pppoe_disc_rcv() even when po->pppoe_dev is
      NULL. So we have no guarantee that (sk->sk_state & PPPOX_ZOMBIE) implies
      (po->pppoe_dev != NULL).
      Since we're releasing a PPPoE socket, we want to release the pppoe_dev
      if it exists and reset sk_state to PPPOX_DEAD, no matter the previous
      value of sk_state. So we can just check for po->pppoe_dev and avoid any
      assumption on sk->sk_state.
      
      Fixes: 2b018d57 ("pppoe: drop PPPOX_ZOMBIEs in pppoe_release")
      Signed-off-by: default avatarGuillaume Nault <g.nault@alphalink.fr>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      8edcc554
    • Jason Wang's avatar
      macvtap: unbreak receiving of gro skb with frag list · bd71bc5e
      Jason Wang authored
      [ Upstream commit f23d538b ]
      
      We don't have fraglist support in TAP_FEATURES. This will lead
      software segmentation of gro skb with frag list. Fixes by having
      frag list support in TAP_FEATURES.
      
      With this patch single session of netperf receiving were restored from
      about 5Gb/s to about 12Gb/s on mlx4.
      
      Fixes a567dd62 ("macvtap: simplify usage of tap_features")
      Cc: Vlad Yasevich <vyasevic@redhat.com>
      Cc: Michael S. Tsirkin <mst@redhat.com>
      Signed-off-by: default avatarJason Wang <jasowang@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      bd71bc5e
    • Bjørn Mork's avatar
      qmi_wwan: add Sierra Wireless MC74xx/EM74xx · 7eaa9188
      Bjørn Mork authored
      [ Upstream commit 0db65fcf ]
      
      New device IDs shamelessly lifted from the vendor driver.
      Signed-off-by: default avatarBjørn Mork <bjorn@mork.no>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      7eaa9188
    • David Herrmann's avatar
      netlink: fix locking around NETLINK_LIST_MEMBERSHIPS · aa23ddc3
      David Herrmann authored
      [ Upstream commit 47191d65 ]
      
      Currently, NETLINK_LIST_MEMBERSHIPS grabs the netlink table while copying
      the membership state to user-space. However, grabing the netlink table is
      effectively a write_lock_irq(), and as such we should not be triggering
      page-faults in the critical section.
      
      This can be easily reproduced by the following snippet:
          int s = socket(AF_NETLINK, SOCK_RAW, NETLINK_ROUTE);
          void *p = mmap(0, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANON, -1, 0);
          int r = getsockopt(s, 0x10e, 9, p, (void*)((char*)p + 4092));
      
      This should work just fine, but currently triggers EFAULT and a possible
      WARN_ON below handle_mm_fault().
      
      Fix this by reducing locking of NETLINK_LIST_MEMBERSHIPS to a read-side
      lock. The write-lock was overkill in the first place, and the read-lock
      allows page-faults just fine.
      Reported-by: default avatarDmitry Vyukov <dvyukov@google.com>
      Signed-off-by: default avatarDavid Herrmann <dh.herrmann@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      aa23ddc3
    • Renato Westphal's avatar
      tcp: remove improper preemption check in tcp_xmit_probe_skb() · 9a14758e
      Renato Westphal authored
      [ Upstream commit e2e8009f ]
      
      Commit e520af48 introduced the following bug when setting the
      TCP_REPAIR sockoption:
      
      [ 2860.657036] BUG: using __this_cpu_add() in preemptible [00000000] code: daemon/12164
      [ 2860.657045] caller is __this_cpu_preempt_check+0x13/0x20
      [ 2860.657049] CPU: 1 PID: 12164 Comm: daemon Not tainted 4.2.3 #1
      [ 2860.657051] Hardware name: Dell Inc. PowerEdge R210 II/0JP7TR, BIOS 2.0.5 03/13/2012
      [ 2860.657054]  ffffffff81c7f071 ffff880231e9fdf8 ffffffff8185d765 0000000000000002
      [ 2860.657058]  0000000000000001 ffff880231e9fe28 ffffffff8146ed91 ffff880231e9fe18
      [ 2860.657062]  ffffffff81cd1a5d ffff88023534f200 ffff8800b9811000 ffff880231e9fe38
      [ 2860.657065] Call Trace:
      [ 2860.657072]  [<ffffffff8185d765>] dump_stack+0x4f/0x7b
      [ 2860.657075]  [<ffffffff8146ed91>] check_preemption_disabled+0xe1/0xf0
      [ 2860.657078]  [<ffffffff8146edd3>] __this_cpu_preempt_check+0x13/0x20
      [ 2860.657082]  [<ffffffff817e0bc7>] tcp_xmit_probe_skb+0xc7/0x100
      [ 2860.657085]  [<ffffffff817e1e2d>] tcp_send_window_probe+0x2d/0x30
      [ 2860.657089]  [<ffffffff817d1d8c>] do_tcp_setsockopt.isra.29+0x74c/0x830
      [ 2860.657093]  [<ffffffff817d1e9c>] tcp_setsockopt+0x2c/0x30
      [ 2860.657097]  [<ffffffff81767b74>] sock_common_setsockopt+0x14/0x20
      [ 2860.657100]  [<ffffffff817669e1>] SyS_setsockopt+0x71/0xc0
      [ 2860.657104]  [<ffffffff81865172>] entry_SYSCALL_64_fastpath+0x16/0x75
      
      Since tcp_xmit_probe_skb() can be called from process context, use
      NET_INC_STATS() instead of NET_INC_STATS_BH().
      
      Fixes: e520af48 ("tcp: add TCPWinProbe and TCPKeepAlive SNMP counters")
      Signed-off-by: default avatarRenato Westphal <renatow@taghos.com.br>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      9a14758e
    • Jon Paul Maloy's avatar
      tipc: allow non-linear first fragment buffer · c19282fd
      Jon Paul Maloy authored
      [ Upstream commit 45c8b7b1 ]
      
      The current code for message reassembly is erroneously assuming that
      the the first arriving fragment buffer always is linear, and then goes
      ahead resetting the fragment list of that buffer in anticipation of
      more arriving fragments.
      
      However, if the buffer already happens to be non-linear, we will
      inadvertently drop the already attached fragment list, and later
      on trig a BUG() in __pskb_pull_tail().
      
      We see this happen when running fragmented TIPC multicast across UDP,
      something made possible since
      commit d0f91938 ("tipc: add ip/udp media type")
      
      We fix this by not resetting the fragment list when the buffer is non-
      linear, and by initiatlizing our private fragment list tail pointer to
      the tail of the existing fragment list.
      
      Fixes: commit d0f91938 ("tipc: add ip/udp media type")
      Signed-off-by: default avatarJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      c19282fd
    • Dan Carpenter's avatar
      irda: precedence bug in irlmp_seq_hb_idx() · 87056ada
      Dan Carpenter authored
      [ Upstream commit 50010c20 ]
      
      This is decrementing the pointer, instead of the value stored in the
      pointer.  KASan detects it as an out of bounds reference.
      Reported-by: default avatar"Berry Cheng 程君(成淼)" <chengmiao.cj@alibaba-inc.com>
      Signed-off-by: default avatarDan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      87056ada
  2. 09 Nov, 2015 10 commits