1. 04 Sep, 2018 1 commit
  2. 02 Sep, 2018 13 commits
    • Vinson Lee's avatar
      uapi: Fix linux/rds.h userspace compilation errors. · 59a03fea
      Vinson Lee authored
      Include linux/in6.h for struct in6_addr.
      
      /usr/include/linux/rds.h:156:18: error: field ‘laddr’ has incomplete type
        struct in6_addr laddr;
                        ^~~~~
      /usr/include/linux/rds.h:157:18: error: field ‘faddr’ has incomplete type
        struct in6_addr faddr;
                        ^~~~~
      /usr/include/linux/rds.h:178:18: error: field ‘laddr’ has incomplete type
        struct in6_addr laddr;
                        ^~~~~
      /usr/include/linux/rds.h:179:18: error: field ‘faddr’ has incomplete type
        struct in6_addr faddr;
                        ^~~~~
      /usr/include/linux/rds.h:198:18: error: field ‘bound_addr’ has incomplete type
        struct in6_addr bound_addr;
                        ^~~~~~~~~~
      /usr/include/linux/rds.h:199:18: error: field ‘connected_addr’ has incomplete type
        struct in6_addr connected_addr;
                        ^~~~~~~~~~~~~~
      /usr/include/linux/rds.h:219:18: error: field ‘local_addr’ has incomplete type
        struct in6_addr local_addr;
                        ^~~~~~~~~~
      /usr/include/linux/rds.h:221:18: error: field ‘peer_addr’ has incomplete type
        struct in6_addr peer_addr;
                        ^~~~~~~~~
      /usr/include/linux/rds.h:245:18: error: field ‘src_addr’ has incomplete type
        struct in6_addr src_addr;
                        ^~~~~~~~
      /usr/include/linux/rds.h:246:18: error: field ‘dst_addr’ has incomplete type
        struct in6_addr dst_addr;
                        ^~~~~~~~
      
      Fixes: b7ff8b10 ("rds: Extend RDS API for IPv6 support")
      Signed-off-by: default avatarVinson Lee <vlee@freedesktop.org>
      Acked-by: default avatarSantosh Shilimkar <santosh.shilimkar@oracle.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      59a03fea
    • Jia-Ju Bai's avatar
      net: cadence: Fix a sleep-in-atomic-context bug in macb_halt_tx() · 16fe10cf
      Jia-Ju Bai authored
      The kernel module may sleep with holding a spinlock.
      
      The function call paths (from bottom to top) in Linux-4.16 are:
      
      [FUNC] usleep_range
      drivers/net/ethernet/cadence/macb_main.c, 648:
      	usleep_range in macb_halt_tx
      drivers/net/ethernet/cadence/macb_main.c, 730:
      	macb_halt_tx in macb_tx_error_task
      drivers/net/ethernet/cadence/macb_main.c, 721:
      	_raw_spin_lock_irqsave in macb_tx_error_task
      
      To fix this bug, usleep_range() is replaced with udelay().
      
      This bug is found by my static analysis tool DSAC.
      Signed-off-by: default avatarJia-Ju Bai <baijiaju1990@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      16fe10cf
    • David S. Miller's avatar
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf · a80afe89
      David S. Miller authored
      Daniel Borkmann says:
      
      ====================
      pull-request: bpf 2018-09-02
      
      The following pull-request contains BPF updates for your *net* tree.
      
      The main changes are:
      
      1) Fix one remaining buggy offset override in sockmap's bpf_msg_pull_data()
         when linearizing multiple scatterlist elements, from Tushar.
      
      2) Fix BPF sockmap's misuse of ULP when a collision with another ULP is
         found on map update where it would release existing ULP. syzbot found and
         triggered this couple of times now, fix from John.
      
      3) Add missing xskmap type to bpftool so it will properly show the type
         on map dump, from Prashant.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      a80afe89
    • David Ahern's avatar
      net/ipv6: Only update MTU metric if it set · 15a81b41
      David Ahern authored
      Jan reported a regression after an update to 4.18.5. In this case ipv6
      default route is setup by systemd-networkd based on data from an RA. The
      RA contains an MTU of 1492 which is used when the route is first inserted
      but then systemd-networkd pushes down updates to the default route
      without the mtu set.
      
      Prior to the change to fib6_info, metrics such as MTU were held in the
      dst_entry and rt6i_pmtu in rt6_info contained an update to the mtu if
      any. ip6_mtu would look at rt6i_pmtu first and use it if set. If not,
      the value from the metrics is used if it is set and finally falling
      back to the idev value.
      
      After the fib6_info change metrics are contained in the fib6_info struct
      and there is no equivalent to rt6i_pmtu. To maintain consistency with
      the old behavior the new code should only reset the MTU in the metrics
      if the route update has it set.
      
      Fixes: d4ead6b3 ("net/ipv6: move metrics from dst to rt6_info")
      Reported-by: default avatarJan Janssen <medhefgo@web.de>
      Signed-off-by: default avatarDavid Ahern <dsahern@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      15a81b41
    • Tony Lindgren's avatar
      net: ethernet: cpsw-phy-sel: prefer phandle for phy sel · 18eb8aea
      Tony Lindgren authored
      The cpsw-phy-sel device is not a child of the cpsw interconnect target
      module. It lives in the system control module.
      
      Let's fix this issue by trying to use cpsw-phy-sel phandle first if it
      exists and if not fall back to current usage of trying to find the
      cpsw-phy-sel child. That way the phy sel driver can be a child of the
      system control module where it belongs in the device tree.
      
      Without this fix, we cannot have a proper interconnect target module
      hierarchy in device tree for things like genpd.
      
      Note that deferred probe is mostly not supported by cpsw and this patch
      does not attempt to fix that. In case deferred probe support is needed,
      this could be added to cpsw_slave_open() and phy_connect() so they start
      handling and returning errors.
      
      For documenting it, looks like the cpsw-phy-sel is used for all cpsw device
      tree nodes. It's missing the related binding documentation, so let's also
      update the binding documentation accordingly.
      
      Cc: devicetree@vger.kernel.org
      Cc: Andrew Lunn <andrew@lunn.ch>
      Cc: Grygorii Strashko <grygorii.strashko@ti.com>
      Cc: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Murali Karicheri <m-karicheri2@ti.com>
      Cc: Rob Herring <robh+dt@kernel.org>
      Signed-off-by: default avatarTony Lindgren <tony@atomide.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      18eb8aea
    • Tony Lindgren's avatar
      dt-bindings: net: cpsw: Document cpsw-phy-sel usage but prefer phandle · 10d7fac4
      Tony Lindgren authored
      The current cpsw usage for cpsw-phy-sel is undocumented but is used for
      all the boards using cpsw. And cpsw-phy-sel is not really a child of
      the cpsw device, it lives in the system control module instead.
      
      Let's document the existing usage, and improve it a bit where we prefer
      to use a phandle instead of a child device for it. That way we can
      properly describe the hardware in dts files for things like genpd.
      
      Cc: devicetree@vger.kernel.org
      Cc: Andrew Lunn <andrew@lunn.ch>
      Cc: Grygorii Strashko <grygorii.strashko@ti.com>
      Cc: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Murali Karicheri <m-karicheri2@ti.com>
      Cc: Rob Herring <robh+dt@kernel.org>
      Signed-off-by: default avatarTony Lindgren <tony@atomide.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      10d7fac4
    • David S. Miller's avatar
      Merge branch 'igmp-fix-two-incorrect-unsolicit-report-count-issues' · c60e06c3
      David S. Miller authored
      Hangbin Liu says:
      
      ====================
      igmp: fix two incorrect unsolicit report count issues
      
      Just like the subject, fix two minor igmp unsolicit report count issues.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      c60e06c3
    • Hangbin Liu's avatar
      igmp: fix incorrect unsolicit report count after link down and up · ff06525f
      Hangbin Liu authored
      After link down and up, i.e. when call ip_mc_up(), we doesn't init
      im->unsolicit_count. So after igmp_timer_expire(), we will not start
      timer again and only send one unsolicit report at last.
      
      Fix it by initializing im->unsolicit_count in igmp_group_added(), so
      we can respect igmp robustness value.
      
      Fixes: 24803f38 ("igmp: do not remove igmp souce list info when set link down")
      Signed-off-by: default avatarHangbin Liu <liuhangbin@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      ff06525f
    • Hangbin Liu's avatar
      igmp: fix incorrect unsolicit report count when join group · 4fb7253e
      Hangbin Liu authored
      We should not start timer if im->unsolicit_count equal to 0 after decrease.
      Or we will send one more unsolicit report message. i.e. 3 instead of 2 by
      default.
      
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Signed-off-by: default avatarHangbin Liu <liuhangbin@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      4fb7253e
    • John Fastabend's avatar
      bpf: avoid misuse of psock when TCP_ULP_BPF collides with another ULP · 597222f7
      John Fastabend authored
      Currently we check sk_user_data is non NULL to determine if the sk
      exists in a map. However, this is not sufficient to ensure the psock
      or the ULP ops are not in use by another user, such as kcm or TLS. To
      avoid this when adding a sock to a map also verify it is of the
      correct ULP type. Additionally, when releasing a psock verify that
      it is the TCP_ULP_BPF type before releasing the ULP. The error case
      where we abort an update due to ULP collision can cause this error
      path.
      
      For example,
      
        __sock_map_ctx_update_elem()
           [...]
           err = tcp_set_ulp_id(sock, TCP_ULP_BPF) <- collides with TLS
           if (err)                                <- so err out here
              goto out_free
           [...]
        out_free:
           smap_release_sock() <- calling tcp_cleanup_ulp releases the
                                  TLS ULP incorrectly.
      
      Fixes: 2f857d04 ("bpf: sockmap, remove STRPARSER map_flags and add multi-map support")
      Signed-off-by: default avatarJohn Fastabend <john.fastabend@gmail.com>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      597222f7
    • Prashant Bhole's avatar
      tools/bpf: bpftool, add xskmap in map types · 97911e0c
      Prashant Bhole authored
      When listed all maps, bpftool currently shows (null) for xskmap.
      Added xskmap type in map_type_name[] to show correct type.
      Signed-off-by: default avatarPrashant Bhole <bhole_prashant_q7@lab.ntt.co.jp>
      Acked-by: default avatarJakub Kicinski <jakub.kicinski@netronome.com>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      97911e0c
    • Tushar Dave's avatar
      bpf: Fix bpf_msg_pull_data() · 9db39f4d
      Tushar Dave authored
      Helper bpf_msg_pull_data() mistakenly reuses variable 'offset' while
      linearizing multiple scatterlist elements. Variable 'offset' is used
      to find first starting scatterlist element
          i.e. msg->data = sg_virt(&sg[first_sg]) + start - offset"
      
      Use different variable name while linearizing multiple scatterlist
      elements so that value contained in variable 'offset' won't get
      overwritten.
      
      Fixes: 015632bb ("bpf: sk_msg program helper bpf_sk_msg_pull_data")
      Signed-off-by: default avatarTushar Dave <tushar.n.dave@oracle.com>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      9db39f4d
    • Alexey Kodanev's avatar
      ipv6: don't get lwtstate twice in ip6_rt_copy_init() · 93bbadd6
      Alexey Kodanev authored
      Commit 80f1a0f4 ("net/ipv6: Put lwtstate when destroying fib6_info")
      partially fixed the kmemleak [1], lwtstate can be copied from fib6_info,
      with ip6_rt_copy_init(), and it should be done only once there.
      
      rt->dst.lwtstate is set by ip6_rt_init_dst(), at the start of the function
      ip6_rt_copy_init(), so there is no need to get it again at the end.
      
      With this patch, lwtstate also isn't copied from RTF_REJECT routes.
      
      [1]:
      unreferenced object 0xffff880b6aaa14e0 (size 64):
        comm "ip", pid 10577, jiffies 4295149341 (age 1273.903s)
        hex dump (first 32 bytes):
          01 00 04 00 04 00 00 00 10 00 00 00 00 00 00 00  ................
          00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
        backtrace:
          [<0000000018664623>] lwtunnel_build_state+0x1bc/0x420
          [<00000000b73aa29a>] ip6_route_info_create+0x9f7/0x1fd0
          [<00000000ee2c5d1f>] ip6_route_add+0x14/0x70
          [<000000008537b55c>] inet6_rtm_newroute+0xd9/0xe0
          [<000000002acc50f5>] rtnetlink_rcv_msg+0x66f/0x8e0
          [<000000008d9cd381>] netlink_rcv_skb+0x268/0x3b0
          [<000000004c893c76>] netlink_unicast+0x417/0x5a0
          [<00000000f2ab1afb>] netlink_sendmsg+0x70b/0xc30
          [<00000000890ff0aa>] sock_sendmsg+0xb1/0xf0
          [<00000000a2e7b66f>] ___sys_sendmsg+0x659/0x950
          [<000000001e7426c8>] __sys_sendmsg+0xde/0x170
          [<00000000fe411443>] do_syscall_64+0x9f/0x4a0
          [<000000001be7b28b>] entry_SYSCALL_64_after_hwframe+0x49/0xbe
          [<000000006d21f353>] 0xffffffffffffffff
      
      Fixes: 6edb3c96 ("net/ipv6: Defer initialization of dst to data path")
      Signed-off-by: default avatarAlexey Kodanev <alexey.kodanev@oracle.com>
      Reviewed-by: default avatarDavid Ahern <dsahern@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      93bbadd6
  3. 01 Sep, 2018 8 commits
    • Thomas Falcon's avatar
      ibmvnic: Include missing return code checks in reset function · f611a5b4
      Thomas Falcon authored
      Check the return codes of these functions and halt reset
      in case of failure. The driver will remain in a dormant state
      until the next reset event, when device initialization will be
      re-attempted.
      Signed-off-by: default avatarThomas Falcon <tlfalcon@linux.vnet.ibm.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      f611a5b4
    • Sabrina Dubroca's avatar
      selftests: pmtu: detect correct binary to ping ipv6 addresses · c81c7012
      Sabrina Dubroca authored
      Some systems don't have the ping6 binary anymore, and use ping for
      everything. Detect the absence of ping6 and try to use ping instead.
      
      Fixes: d1f1b9cb ("selftests: net: Introduce first PMTU test")
      Signed-off-by: default avatarSabrina Dubroca <sd@queasysnail.net>
      Acked-by: default avatarStefano Brivio <sbrivio@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      c81c7012
    • Sabrina Dubroca's avatar
      selftests: pmtu: maximum MTU for vti4 is 2^16-1-20 · 902b5417
      Sabrina Dubroca authored
      Since commit 82612de1 ("ip_tunnel: restore binding to ifaces with a
      large mtu"), the maximum MTU for vti4 is based on IP_MAX_MTU instead of
      the mysterious constant 0xFFF8.  This makes this selftest fail.
      
      Fixes: 82612de1 ("ip_tunnel: restore binding to ifaces with a large mtu")
      Signed-off-by: default avatarSabrina Dubroca <sd@queasysnail.net>
      Acked-by: default avatarStefano Brivio <sbrivio@redhat.com>
      Acked-by: default avatarNicolas Dichtel <nicolas.dichtel@6wind.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      902b5417
    • Florian Westphal's avatar
      tcp: do not restart timewait timer on rst reception · 63cc357f
      Florian Westphal authored
      RFC 1337 says:
       ''Ignore RST segments in TIME-WAIT state.
         If the 2 minute MSL is enforced, this fix avoids all three hazards.''
      
      So with net.ipv4.tcp_rfc1337=1, expected behaviour is to have TIME-WAIT sk
      expire rather than removing it instantly when a reset is received.
      
      However, Linux will also re-start the TIME-WAIT timer.
      
      This causes connect to fail when tying to re-use ports or very long
      delays (until syn retry interval exceeds MSL).
      
      packetdrill test case:
      // Demonstrate bogus rearming of TIME-WAIT timer in rfc1337 mode.
      `sysctl net.ipv4.tcp_rfc1337=1`
      
      0.000 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
      0.000 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
      0.000 bind(3, ..., ...) = 0
      0.000 listen(3, 1) = 0
      
      0.100 < S 0:0(0) win 29200 <mss 1460,nop,nop,sackOK,nop,wscale 7>
      0.100 > S. 0:0(0) ack 1 <mss 1460,nop,nop,sackOK,nop,wscale 7>
      0.200 < . 1:1(0) ack 1 win 257
      0.200 accept(3, ..., ...) = 4
      
      // Receive first segment
      0.310 < P. 1:1001(1000) ack 1 win 46
      
      // Send one ACK
      0.310 > . 1:1(0) ack 1001
      
      // read 1000 byte
      0.310 read(4, ..., 1000) = 1000
      
      // Application writes 100 bytes
      0.350 write(4, ..., 100) = 100
      0.350 > P. 1:101(100) ack 1001
      
      // ACK
      0.500 < . 1001:1001(0) ack 101 win 257
      
      // close the connection
      0.600 close(4) = 0
      0.600 > F. 101:101(0) ack 1001 win 244
      
      // Our side is in FIN_WAIT_1 & waits for ack to fin
      0.7 < . 1001:1001(0) ack 102 win 244
      
      // Our side is in FIN_WAIT_2 with no outstanding data.
      0.8 < F. 1001:1001(0) ack 102 win 244
      0.8 > . 102:102(0) ack 1002 win 244
      
      // Our side is now in TIME_WAIT state, send ack for fin.
      0.9 < F. 1002:1002(0) ack 102 win 244
      0.9 > . 102:102(0) ack 1002 win 244
      
      // Peer reopens with in-window SYN:
      1.000 < S 1000:1000(0) win 9200 <mss 1460,nop,nop,sackOK,nop,wscale 7>
      
      // Therefore, reply with ACK.
      1.000 > . 102:102(0) ack 1002 win 244
      
      // Peer sends RST for this ACK.  Normally this RST results
      // in tw socket removal, but rfc1337=1 setting prevents this.
      1.100 < R 1002:1002(0) win 244
      
      // second syn. Due to rfc1337=1 expect another pure ACK.
      31.0 < S 1000:1000(0) win 9200 <mss 1460,nop,nop,sackOK,nop,wscale 7>
      31.0 > . 102:102(0) ack 1002 win 244
      
      // .. and another RST from peer.
      31.1 < R 1002:1002(0) win 244
      31.2 `echo no timer restart;ss -m -e -a -i -n -t -o state TIME-WAIT`
      
      // third syn after one minute.  Time-Wait socket should have expired by now.
      63.0 < S 1000:1000(0) win 9200 <mss 1460,nop,nop,sackOK,nop,wscale 7>
      
      // so we expect a syn-ack & 3whs to proceed from here on.
      63.0 > S. 0:0(0) ack 1 <mss 1460,nop,nop,sackOK,nop,wscale 7>
      
      Without this patch, 'ss' shows restarts of tw timer and last packet is
      thus just another pure ack, more than one minute later.
      
      This restores the original code from commit 283fd6cf0be690a83
      ("Merge in ANK networking jumbo patch") in netdev-vger-cvs.git .
      
      For some reason the else branch was removed/lost in 1f28b683339f7
      ("Merge in TCP/UDP optimizations and [..]") and timer restart became
      unconditional.
      Reported-by: default avatarMichal Tesar <mtesar@redhat.com>
      Signed-off-by: default avatarFlorian Westphal <fw@strlen.de>
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      63cc357f
    • Pavel Machek's avatar
      net/rds: RDS is not Radio Data System · b0e0b0ab
      Pavel Machek authored
      Getting prompt "The RDS Protocol" (RDS) is not too helpful, and it is
      easily confused with Radio Data System (which we may want to support
      in kernel, too).
      Signed-off-by: default avatarPavel Machek <pavel@ucw.cz>
      Acked-by: default avatarSowmini Varadhan <sowmini.varadhan@oracle.com>
      Acked-by: default avatarSantosh Shilimkar <santosh.shilimkar@oracle.com>
      Acked-by: default avatarSowmini Varadhan <sowmini.varadhan@oracle.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      b0e0b0ab
    • Dexuan Cui's avatar
      hv_netvsc: Fix a deadlock by getting rtnl lock earlier in netvsc_probe() · e04e7a7b
      Dexuan Cui authored
      This patch fixes the race between netvsc_probe() and
      rndis_set_subchannel(), which can cause a deadlock.
      
      These are the related 3 paths which show the deadlock:
      
      path #1:
          Workqueue: hv_vmbus_con vmbus_onmessage_work [hv_vmbus]
          Call Trace:
           schedule
           schedule_preempt_disabled
           __mutex_lock
           __device_attach
           bus_probe_device
           device_add
           vmbus_device_register
           vmbus_onoffer
           vmbus_onmessage_work
           process_one_work
           worker_thread
           kthread
           ret_from_fork
      
      path #2:
          schedule
           schedule_preempt_disabled
           __mutex_lock
           netvsc_probe
           vmbus_probe
           really_probe
           __driver_attach
           bus_for_each_dev
           driver_attach_async
           async_run_entry_fn
           process_one_work
           worker_thread
           kthread
           ret_from_fork
      
      path #3:
          Workqueue: events netvsc_subchan_work [hv_netvsc]
          Call Trace:
           schedule
           rndis_set_subchannel
           netvsc_subchan_work
           process_one_work
           worker_thread
           kthread
           ret_from_fork
      
      Before path #1 finishes, path #2 can start to run, because just before
      the "bus_probe_device(dev);" in device_add() in path #1, there is a line
      "object_uevent(&dev->kobj, KOBJ_ADD);", so systemd-udevd can
      immediately try to load hv_netvsc and hence path #2 can start to run.
      
      Next, path #2 offloads the subchannal's initialization to a workqueue,
      i.e. path #3, so we can end up in a deadlock situation like this:
      
      Path #2 gets the device lock, and is trying to get the rtnl lock;
      Path #3 gets the rtnl lock and is waiting for all the subchannel messages
      to be processed;
      Path #1 is trying to get the device lock, but since #2 is not releasing
      the device lock, path #1 has to sleep; since the VMBus messages are
      processed one by one, this means the sub-channel messages can't be
      procedded, so #3 has to sleep with the rtnl lock held, and finally #2
      has to sleep... Now all the 3 paths are sleeping and we hit the deadlock.
      
      With the patch, we can make sure #2 gets both the device lock and the
      rtnl lock together, gets its job done, and releases the locks, so #1
      and #3 will not be blocked for ever.
      
      Fixes: 8195b139 ("hv_netvsc: fix deadlock on hotplug")
      Signed-off-by: default avatarDexuan Cui <decui@microsoft.com>
      Cc: Stephen Hemminger <sthemmin@microsoft.com>
      Cc: K. Y. Srinivasan <kys@microsoft.com>
      Cc: Haiyang Zhang <haiyangz@microsoft.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      e04e7a7b
    • Jakub Kicinski's avatar
      nfp: wait for posted reconfigs when disabling the device · 9ad716b9
      Jakub Kicinski authored
      To avoid leaking a running timer we need to wait for the
      posted reconfigs after netdev is unregistered.  In common
      case the process of deinitializing the device will perform
      synchronous reconfigs which wait for posted requests, but
      especially with VXLAN ports being actively added and removed
      there can be a race condition leaving a timer running after
      adapter structure is freed leading to a crash.
      
      Add an explicit flush after deregistering and for a good
      measure a warning to check if timer is running just before
      structures are freed.
      
      Fixes: 3d780b92 ("nfp: add async reconfiguration mechanism")
      Signed-off-by: default avatarJakub Kicinski <jakub.kicinski@netronome.com>
      Reviewed-by: default avatarDirk van der Merwe <dirk.vandermerwe@netronome.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      9ad716b9
    • Eric Dumazet's avatar
      Revert "packet: switch kvzalloc to allocate memory" · 3a7ad063
      Eric Dumazet authored
      This reverts commit 71e41286.
      
      mmap()/munmap() can not be backed by kmalloced pages :
      
      We fault in :
      
          VM_BUG_ON_PAGE(PageSlab(page), page);
      
          unmap_single_vma+0x8a/0x110
          unmap_vmas+0x4b/0x90
          unmap_region+0xc9/0x140
          do_munmap+0x274/0x360
          vm_munmap+0x81/0xc0
          SyS_munmap+0x2b/0x40
          do_syscall_64+0x13e/0x1c0
          entry_SYSCALL_64_after_hwframe+0x42/0xb7
      
      Fixes: 71e41286 ("packet: switch kvzalloc to allocate memory")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Reported-by: default avatarJohn Sperbeck <jsperbeck@google.com>
      Bisected-by: default avatarJohn Sperbeck <jsperbeck@google.com>
      Cc: Zhang Yu <zhangyu31@baidu.com>
      Cc: Li RongQing <lirongqing@baidu.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      3a7ad063
  4. 30 Aug, 2018 18 commits
    • David S. Miller's avatar
      Merge branch 'net_sched-reject-unknown-tcfa_action-values' · dc641794
      David S. Miller authored
      Paolo Abeni says:
      
      ====================
      net_sched: reject unknown tcfa_action values
      
      As agreed some time ago, this changeset reject unknown tcfa_action values,
      instead of changing such values under the hood.
      
      A tdc test is included to verify the new behavior.
      
      v1 -> v2:
       - helper is now static and renamed according to act_* convention
       - updated extack message, according to the new behavior
      ====================
      Reviewed-by: default avatarXin Long <lucien.xin@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      dc641794
    • Paolo Abeni's avatar
      tc-testing: add test-cases for numeric and invalid control action · 25a8238f
      Paolo Abeni authored
      Only the police action allows us to specify an arbitrary numeric value
      for the control action. This change introduces an explicit test case
      for the above feature and then leverage it for testing the kernel behavior
      for invalid control actions (reject).
      Signed-off-by: default avatarPaolo Abeni <pabeni@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      25a8238f
    • Paolo Abeni's avatar
      net_sched: reject unknown tcfa_action values · 97763dc0
      Paolo Abeni authored
      After the commit 802bfb19 ("net/sched: user-space can't set
      unknown tcfa_action values"), unknown tcfa_action values are
      converted to TC_ACT_UNSPEC, but the common agreement is instead
      rejecting such configurations.
      
      This change also introduces a helper to simplify the destruction
      of a single action, avoiding code duplication.
      
      v1 -> v2:
       - helper is now static and renamed according to act_* convention
       - updated extack message, according to the new behavior
      
      Fixes: 802bfb19 ("net/sched: user-space can't set unknown tcfa_action values")
      Signed-off-by: default avatarPaolo Abeni <pabeni@redhat.com>
      Acked-by: default avatarJiri Pirko <jiri@mellanox.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      97763dc0
    • Baruch Siach's avatar
      net: mvpp2: initialize port of_node pointer · c4053ef3
      Baruch Siach authored
      Without a valid of_node in struct device we can't find the mvpp2 port
      device by its DT node. Specifically, this breaks
      of_find_net_device_by_node().
      
      For example, the Armada 8040 based Clearfog GT-8K uses Marvell 88E6141
      switch connected to the &cp1_eth2 port:
      
      &cp1_mdio {
      	...
      
      	switch0: switch0@4 {
      		compatible = "marvell,mv88e6085";
      		...
      
      		ports {
      			...
      
      			port@5 {
      				reg = <5>;
      				label = "cpu";
      				ethernet = <&cp1_eth2>;
      			};
      		};
      	};
      };
      
      Without this patch, dsa_register_switch() returns -EPROBE_DEFER because
      of_find_net_device_by_node() can't find the device_node of the &cp1_eth2
      device.
      Reviewed-by: default avatarAndrew Lunn <andrew@lunn.ch>
      Signed-off-by: default avatarBaruch Siach <baruch@tkos.co.il>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      c4053ef3
    • Doug Berger's avatar
      net: bcmgenet: use MAC link status for fixed phy · c3c397c1
      Doug Berger authored
      When using the fixed PHY with GENET (e.g. MOCA) the PHY link
      status can be determined from the internal link status captured
      by the MAC. This allows the PHY state machine to use the correct
      link state with the fixed PHY even if MAC link event interrupts
      are missed when the net device is opened.
      
      Fixes: 8d88c6eb ("net: bcmgenet: enable MoCA link state change detection")
      Signed-off-by: default avatarDoug Berger <opendmb@gmail.com>
      Reviewed-by: default avatarFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      c3c397c1
    • Dinh Nguyen's avatar
      net: stmmac: build the dwmac-socfpga platform driver for Stratix10 · c305660b
      Dinh Nguyen authored
      The Stratix10 SoC is an AARCH64 based platform that shares the same ethernet
      controller that is on other SoCFPGA platforms. Build the platform driver.
      Signed-off-by: default avatarDinh Nguyen <dinguyen@kernel.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      c305660b
    • David S. Miller's avatar
      Merge branch 'ipv6-fix-error-path-of-inet6_init' · e0b7e7dc
      David S. Miller authored
      Sabrina Dubroca says:
      
      ====================
      ipv6: fix error path of inet6_init()
      
      The error path of inet6_init() can trigger multiple kernel panics,
      mostly due to wrong ordering of cleanups. This series fixes those
      issues.
      ====================
      Reviewed-by: default avatarXin Long <lucien.xin@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      e0b7e7dc
    • Sabrina Dubroca's avatar
      net: rtnl: return early from rtnl_unregister_all when protocol isn't registered · f707ef61
      Sabrina Dubroca authored
      rtnl_unregister_all(PF_INET6) gets called from inet6_init in cases when
      no handler has been registered for PF_INET6 yet, for example if
      ip6_mr_init() fails. Abort and avoid a NULL pointer deref in that case.
      
      Example of panic (triggered by faking a failure of
       register_pernet_subsys):
      
          general protection fault: 0000 [#1] PREEMPT SMP KASAN PTI
          [...]
          RIP: 0010:rtnl_unregister_all+0x17e/0x2a0
          [...]
          Call Trace:
           ? rtnetlink_net_init+0x250/0x250
           ? sock_unregister+0x103/0x160
           ? kernel_getsockopt+0x200/0x200
           inet6_init+0x197/0x20d
      
      Fixes: e2fddf5e ("[IPV6]: Make af_inet6 to check ip6_route_init return value.")
      Signed-off-by: default avatarSabrina Dubroca <sd@queasysnail.net>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      f707ef61
    • Sabrina Dubroca's avatar
      ipv6: fix cleanup ordering for pingv6 registration · a03dc36b
      Sabrina Dubroca authored
      Commit 6d0bfe22 ("net: ipv6: Add IPv6 support to the ping socket.")
      contains an error in the cleanup path of inet6_init(): when
      proto_register(&pingv6_prot, 1) fails, we try to unregister
      &pingv6_prot. When rawv6_init() fails, we skip unregistering
      &pingv6_prot.
      
      Example of panic (triggered by faking a failure of
       proto_register(&pingv6_prot, 1)):
      
          general protection fault: 0000 [#1] PREEMPT SMP KASAN PTI
          [...]
          RIP: 0010:__list_del_entry_valid+0x79/0x160
          [...]
          Call Trace:
           proto_unregister+0xbb/0x550
           ? trace_preempt_on+0x6f0/0x6f0
           ? sock_no_shutdown+0x10/0x10
           inet6_init+0x153/0x1b8
      
      Fixes: 6d0bfe22 ("net: ipv6: Add IPv6 support to the ping socket.")
      Signed-off-by: default avatarSabrina Dubroca <sd@queasysnail.net>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      a03dc36b
    • Sabrina Dubroca's avatar
      ipv6: fix cleanup ordering for ip6_mr failure · afe49de4
      Sabrina Dubroca authored
      Commit 15e66807 ("ipv6: reorder icmpv6_init() and ip6_mr_init()")
      moved the cleanup label for ipmr_fail, but should have changed the
      contents of the cleanup labels as well. Now we can end up cleaning up
      icmpv6 even though it hasn't been initialized (jump to icmp_fail or
      ipmr_fail).
      
      Simply undo things in the reverse order of their initialization.
      
      Example of panic (triggered by faking a failure of icmpv6_init):
      
          kasan: GPF could be caused by NULL-ptr deref or user memory access
          general protection fault: 0000 [#1] PREEMPT SMP KASAN PTI
          [...]
          RIP: 0010:__list_del_entry_valid+0x79/0x160
          [...]
          Call Trace:
           ? lock_release+0x8a0/0x8a0
           unregister_pernet_operations+0xd4/0x560
           ? ops_free_list+0x480/0x480
           ? down_write+0x91/0x130
           ? unregister_pernet_subsys+0x15/0x30
           ? down_read+0x1b0/0x1b0
           ? up_read+0x110/0x110
           ? kmem_cache_create_usercopy+0x1b4/0x240
           unregister_pernet_subsys+0x1d/0x30
           icmpv6_cleanup+0x1d/0x30
           inet6_init+0x1b5/0x23f
      
      Fixes: 15e66807 ("ipv6: reorder icmpv6_init() and ip6_mr_init()")
      Signed-off-by: default avatarSabrina Dubroca <sd@queasysnail.net>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      afe49de4
    • Davide Caratti's avatar
      net/sched: act_pedit: fix dump of extended layered op · 85eb9af1
      Davide Caratti authored
      in the (rare) case of failure in nla_nest_start(), missing NULL checks in
      tcf_pedit_key_ex_dump() can make the following command
      
       # tc action add action pedit ex munge ip ttl set 64
      
      dereference a NULL pointer:
      
       BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
       PGD 800000007d1cd067 P4D 800000007d1cd067 PUD 7acd3067 PMD 0
       Oops: 0002 [#1] SMP PTI
       CPU: 0 PID: 3336 Comm: tc Tainted: G            E     4.18.0.pedit+ #425
       Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
       RIP: 0010:tcf_pedit_dump+0x19d/0x358 [act_pedit]
       Code: be 02 00 00 00 48 89 df 66 89 44 24 20 e8 9b b1 fd e0 85 c0 75 46 8b 83 c8 00 00 00 49 83 c5 08 48 03 83 d0 00 00 00 4d 39 f5 <66> 89 04 25 00 00 00 00 0f 84 81 01 00 00 41 8b 45 00 48 8d 4c 24
       RSP: 0018:ffffb5d4004478a8 EFLAGS: 00010246
       RAX: ffff8880fcda2070 RBX: ffff8880fadd2900 RCX: 0000000000000000
       RDX: 0000000000000002 RSI: ffffb5d4004478ca RDI: ffff8880fcda206e
       RBP: ffff8880fb9cb900 R08: 0000000000000008 R09: ffff8880fcda206e
       R10: ffff8880fadd2900 R11: 0000000000000000 R12: ffff8880fd26cf40
       R13: ffff8880fc957430 R14: ffff8880fc957430 R15: ffff8880fb9cb988
       FS:  00007f75a537a740(0000) GS:ffff8880fda00000(0000) knlGS:0000000000000000
       CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
       CR2: 0000000000000000 CR3: 000000007a2fa005 CR4: 00000000001606f0
       Call Trace:
        ? __nla_reserve+0x38/0x50
        tcf_action_dump_1+0xd2/0x130
        tcf_action_dump+0x6a/0xf0
        tca_get_fill.constprop.31+0xa3/0x120
        tcf_action_add+0xd1/0x170
        tc_ctl_action+0x137/0x150
        rtnetlink_rcv_msg+0x263/0x2d0
        ? _cond_resched+0x15/0x40
        ? rtnl_calcit.isra.30+0x110/0x110
        netlink_rcv_skb+0x4d/0x130
        netlink_unicast+0x1a3/0x250
        netlink_sendmsg+0x2ae/0x3a0
        sock_sendmsg+0x36/0x40
        ___sys_sendmsg+0x26f/0x2d0
        ? do_wp_page+0x8e/0x5f0
        ? handle_pte_fault+0x6c3/0xf50
        ? __handle_mm_fault+0x38e/0x520
        ? __sys_sendmsg+0x5e/0xa0
        __sys_sendmsg+0x5e/0xa0
        do_syscall_64+0x5b/0x180
        entry_SYSCALL_64_after_hwframe+0x44/0xa9
       RIP: 0033:0x7f75a4583ba0
       Code: c3 48 8b 05 f2 62 2c 00 f7 db 64 89 18 48 83 cb ff eb dd 0f 1f 80 00 00 00 00 83 3d fd c3 2c 00 00 75 10 b8 2e 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 31 c3 48 83 ec 08 e8 ae cc 00 00 48 89 04 24
       RSP: 002b:00007fff60ee7418 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
       RAX: ffffffffffffffda RBX: 00007fff60ee7540 RCX: 00007f75a4583ba0
       RDX: 0000000000000000 RSI: 00007fff60ee7490 RDI: 0000000000000003
       RBP: 000000005b842d3e R08: 0000000000000002 R09: 0000000000000000
       R10: 00007fff60ee6ea0 R11: 0000000000000246 R12: 0000000000000000
       R13: 00007fff60ee7554 R14: 0000000000000001 R15: 000000000066c100
       Modules linked in: act_pedit(E) ip6table_filter ip6_tables iptable_filter binfmt_misc crct10dif_pclmul ext4 crc32_pclmul mbcache ghash_clmulni_intel jbd2 pcbc snd_hda_codec_generic snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep snd_seq snd_seq_device snd_pcm aesni_intel crypto_simd snd_timer cryptd glue_helper snd joydev pcspkr soundcore virtio_balloon i2c_piix4 nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs libcrc32c ata_generic pata_acpi virtio_net net_failover virtio_blk virtio_console failover qxl crc32c_intel drm_kms_helper syscopyarea serio_raw sysfillrect sysimgblt fb_sys_fops ttm drm ata_piix virtio_pci libata virtio_ring i2c_core virtio floppy dm_mirror dm_region_hash dm_log dm_mod [last unloaded: act_pedit]
       CR2: 0000000000000000
      
      Like it's done for other TC actions, give up dumping pedit rules and return
      an error if nla_nest_start() returns NULL.
      
      Fixes: 71d0ed70 ("net/act_pedit: Support using offset relative to the conventional network headers")
      Signed-off-by: default avatarDavide Caratti <dcaratti@redhat.com>
      Acked-by: default avatarCong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      85eb9af1
    • Chris Brandt's avatar
      sh_eth: Add R7S9210 support · 6e0bb04d
      Chris Brandt authored
      Add support for the R7S9210 which is part of the RZ/A2 series.
      Signed-off-by: default avatarChris Brandt <chris.brandt@renesas.com>
      Acked-by: default avatarRob Herring <robh@kernel.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      6e0bb04d
    • David S. Miller's avatar
      Merge branch 'hns-fixes' · def70b61
      David S. Miller authored
      Peng Li says:
      
      ====================
      net: hns: fix some bugs about speed and duplex change
      
      If there are packets in hardware when changing the spped
      or duplex, it may cause hardware hang up.
      
      This patchset adds the code for waiting chip to clean the all
      pkts(TX & RX) in chip when the driver uses the function named
      "adjust link".
      
      This patchset cleans the pkts as follows:
      1) close rx of chip, close tx of protocol stack.
      2) wait rcb, ppe, mac to clean.
      3) adjust link
      4) open rx of chip, open tx of protocol stack.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      def70b61
    • Peng Li's avatar
      net: hns: add netif_carrier_off before change speed and duplex · 455c4401
      Peng Li authored
      If there are packets in hardware when changing the speed
      or duplex, it may cause hardware hang up.
      
      This patch adds netif_carrier_off before change speed and
      duplex in ethtool_ops.set_link_ksettings, and adds
      netif_carrier_on after complete the change.
      Signed-off-by: default avatarPeng Li <lipeng321@huawei.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      455c4401
    • Peng Li's avatar
      net: hns: add the code for cleaning pkt in chip · 31fabbee
      Peng Li authored
      If there are packets in hardware when changing the speed
      or duplex, it may cause hardware hang up.
      
      This patch adds the code for waiting chip to clean the all
      pkts(TX & RX) in chip when the driver uses the function named
      "adjust link".
      
      This patch cleans the pkts as follows:
      1) close rx of chip, close tx of protocol stack.
      2) wait rcb, ppe, mac to clean.
      3) adjust link
      4) open rx of chip, open tx of protocol stack.
      Signed-off-by: default avatarPeng Li <lipeng321@huawei.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      31fabbee
    • Azat Khuzhin's avatar
      r8169: set RxConfig after tx/rx is enabled for RTL8169sb/8110sb devices · 05212ba8
      Azat Khuzhin authored
      I have two Ethernet adapters:
        r8169 0000:03:01.0 eth0: RTL8169sb/8110sb, 00:14:d1:14:2d:49, XID 10000000, IRQ 18
        r8169 0000:01:00.0 eth0: RTL8168e/8111e, 64:66:b3:11:14:5d, XID 2c200000, IRQ 30
      And after upgrading from linux 4.15 [1] to linux 4.18+ [2] RTL8169sb failed to
      receive any packets. tcpdump shows a lot of checksum mismatch.
      
        [1]: a0f79386
        [2]: 05193597 (4.19 merge window opened)
      
      I started bisecting and the found that [3] breaks it. According to [4]:
        "For 8110S, 8110SB, and 8110SC series, the initial value of RxConfig
        needs to be set after the tx/rx is enabled."
      So I moved rtl_init_rxcfg() after enabling tx/rs and now my adapter works
      (RTL8168e works too).
      
        [3]: 3559d81e
        [4]: e542a226 ("r8169: adjust the RxConfig
      settings.")
      
      Also drop "rx" from rtl_set_rx_tx_config_registers(), since it does nothing
      with it already.
      
      Fixes: 3559d81e ("r8169: simplify
      rtl_hw_start_8169")
      
      Cc: Heiner Kallweit <hkallweit1@gmail.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: netdev@vger.kernel.org
      Cc: Realtek linux nic maintainers <nic_swsd@realtek.com>
      Signed-off-by: default avatarAzat Khuzhin <a3at.mail@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      05212ba8
    • Cong Wang's avatar
      tipc: switch to rhashtable iterator · 9a07efa9
      Cong Wang authored
      syzbot reported a use-after-free in tipc_group_fill_sock_diag(),
      where tipc_group_fill_sock_diag() still reads tsk->group meanwhile
      tipc_group_delete() just deletes it in tipc_release().
      
      tipc_nl_sk_walk() aims to lock this sock when walking each sock
      in the hash table to close race conditions with sock changes like
      this one, by acquiring tsk->sk.sk_lock.slock spinlock, unfortunately
      this doesn't work at all. All non-BH call path should take
      lock_sock() instead to make it work.
      
      tipc_nl_sk_walk() brutally iterates with raw rht_for_each_entry_rcu()
      where RCU read lock is required, this is the reason why lock_sock()
      can't be taken on this path. This could be resolved by switching to
      rhashtable iterator API's, where taking a sleepable lock is possible.
      Also, the iterator API's are friendly for restartable calls like
      diag dump, the last position is remembered behind the scence,
      all we need to do here is saving the iterator into cb->args[].
      
      I tested this with parallel tipc diag dump and thousands of tipc
      socket creation and release, no crash or memory leak.
      
      Reported-by: syzbot+b9c8f3ab2994b7cd1625@syzkaller.appspotmail.com
      Cc: Jon Maloy <jon.maloy@ericsson.com>
      Cc: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: default avatarCong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      9a07efa9
    • Jerome Brunet's avatar
      Revert "net: stmmac: Do not keep rearming the coalesce timer in stmmac_xmit" · e5133f2f
      Jerome Brunet authored
      This reverts commit 4ae0169f.
      
      This change in the handling of the coalesce timer is causing regression on
      (at least) amlogic platforms.
      
      Network will break down very quickly (a few seconds) after starting
      a download. This can easily be reproduced using iperf3 for example.
      
      The problem has been reported on the S805, S905, S912 and A113 SoCs
      (Realtek and Micrel PHYs) and it is likely impacting all Amlogics
      platforms using Gbit ethernet
      
      No problem was seen with the platform using 10/100 only PHYs (GXL internal)
      
      Reverting change brings things back to normal and allows to use network
      again until we better understand the problem with the coalesce timer.
      
      Cc: Jose Abreu <joabreu@synopsys.com>
      Cc: Joao Pinto <jpinto@synopsys.com>
      Cc: Vitor Soares <soares@synopsys.com>
      Cc: Giuseppe Cavallaro <peppe.cavallaro@st.com>
      Cc: Alexandre Torgue <alexandre.torgue@st.com>
      Cc: Corentin Labbe <clabbe@baylibre.com>
      Signed-off-by: default avatarJerome Brunet <jbrunet@baylibre.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      e5133f2f