1. 16 Mar, 2018 20 commits
    • Merge branch 'rtnl_lock_killable' · ce627a1b
      David S. Miller authored
      Kirill Tkhai says:
      
      ====================
      Introduce rtnl_lock_killable()
      
      rtnl_lock() is a widely used mutex in the kernel. Some kernel code
      allocates memory while holding it. Under memory pressure this
      may invoke the OOM killer, but a killed task cannot exit while
      it is waiting for the mutex. This can lead to deadlock and
      panic.
      
      This patchset adds a new primitive that responds to SIGKILL, so
      it can be used in places where we do not want to sleep forever.
      It also converts the first caller to use it.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: Use rtnl_lock_killable() in register_netdev() · b0f3debc
      Kirill Tkhai authored
      This patch uses rtnl_lock_killable() in one of the hot paths
      that take rtnl_lock().
      Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: Add rtnl_lock_killable() · 79ffdfc6
      Kirill Tkhai authored
      rtnl_lock() is a widely used mutex in the kernel. Some kernel code
      allocates memory while holding it. Under memory pressure this
      may invoke the OOM killer, but a killed task cannot exit while
      it is waiting for the mutex. This can lead to deadlock and
      panic.
      
      This patch adds a new primitive that responds to SIGKILL, so it
      can be used in places where we do not want to sleep forever.
      Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
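The pattern these two commits describe (a lock acquisition that gives up when the waiter has been killed, and a caller that backs out on failure) can be sketched in userspace C. This is a minimal model, not the kernel code: `model_mutex`, `model_task`, `model_lock_killable()` and `model_register_netdev()` are hypothetical names, and returning `-EAGAIN` where a real waiter would keep sleeping is a simplification that keeps the sketch single-threaded. In the kernel, `mutex_lock_killable()` is the existing mutex primitive with this contract (0 on success, `-EINTR` when interrupted by a fatal signal).

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical userspace stand-ins for the kernel objects. */
struct model_mutex {
	bool locked;
};

struct model_task {
	bool fatal_signal_pending;	/* models a pending SIGKILL */
};

/*
 * Killable acquire: succeed if the lock is free, otherwise give up
 * with -EINTR when the waiter has been killed.  -EAGAIN stands in for
 * "keep sleeping" so that the model never blocks.
 */
int model_lock_killable(struct model_mutex *m, const struct model_task *t)
{
	if (!m->locked) {
		m->locked = true;
		return 0;
	}
	if (t->fatal_signal_pending)
		return -EINTR;
	return -EAGAIN;
}

void model_unlock(struct model_mutex *m)
{
	assert(m->locked);	/* unlocking a free lock would be a bug */
	m->locked = false;
}

/*
 * Caller in the style of a registration path: return the error
 * without doing any work if the lock could not be taken.
 */
int model_register_netdev(struct model_mutex *rtnl,
			  const struct model_task *t, int *nr_registered)
{
	int err = model_lock_killable(rtnl, t);

	if (err)
		return err;
	(*nr_registered)++;	/* the work done under the lock */
	model_unlock(rtnl);
	return 0;
}
```

The point of the design is that the caller must be prepared for the acquire to fail, which is why only callers with a sane error path are good candidates for conversion.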
    • doc: Change the udp/sctp rmem/wmem default value. · 320bd6de
      Tonghao Zhang authored
      The SK_MEM_QUANTUM was changed from PAGE_SIZE to 4096.
      Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • udp: Move the udp sysctl to namespace. · 1e802951
      Tonghao Zhang authored
      This patch moves udp_rmem_min and udp_wmem_min into the network
      namespace and initializes udp_l3mdev_accept explicitly.
      
      udp_rmem_min and udp_wmem_min affect the UDP rx/tx queues; with
      this patch, each namespace can set them differently.
      Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
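What "moving a sysctl to the namespace" means can be illustrated with a small userspace model: the value stops being one global and becomes a field of per-namespace state that is initialized whenever a namespace is created. The names below (`model_net`, `model_udp_sysctl_init()`, `model_set_udp_rmem_min()`) are hypothetical, and using 4096 as the default is an assumption based on the SK_MEM_QUANTUM value mentioned in the documentation commit before this one.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_SK_MEM_QUANTUM 4096	/* assumed default, see lead-in */

/* Hypothetical per-namespace state; one instance per network namespace. */
struct model_net {
	int udp_rmem_min;
	int udp_wmem_min;
};

/* Per-namespace init: every new namespace starts from the same defaults. */
void model_udp_sysctl_init(struct model_net *net)
{
	assert(net != NULL);
	net->udp_rmem_min = MODEL_SK_MEM_QUANTUM;
	net->udp_wmem_min = MODEL_SK_MEM_QUANTUM;
}

/* Writing the sysctl in one namespace touches only that namespace. */
void model_set_udp_rmem_min(struct model_net *net, int value)
{
	assert(net != NULL);
	net->udp_rmem_min = value;
}
```

With two `model_net` instances, changing `udp_rmem_min` in one leaves the other at its default, which is the behaviour the commit message describes.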
    • Merge branch 'net-ipv6-Address-checks-need-to-consider-the-L3-domain' · 859844e5
      David S. Miller authored
      David Ahern says:
      
      ====================
      net/ipv6: Address checks need to consider the L3 domain
      
      IPv6 prohibits a local address from being used as a gateway for a route.
      However, it is ok for the gateway to be a local address in a different L3
      domain (e.g., VRF). This allows, for example, veth pairs to connect VRFs.
      
      ip6_route_info_create calls ipv6_chk_addr_and_flags for gateway addresses
      to determine if the address is a local one, but ipv6_chk_addr_and_flags
      does not currently consider L3 domains. As a result routes can not be
      added in one VRF with a nexthop that points to a local address in a
      second VRF.
      
      Resolve by comparing the l3mdev for the passed in device and requiring an
      l3mdev match with the device containing an address. The intent of checking
      for an address on the specified device versus any device in the domain is
      maintained by a new argument to skip the check between the passed in device
      and the device with the address.
      
      Patch 1 moves the gateway validation from ip6_route_info_create into a
      helper; the function is long enough and refactoring drops the indent
      level.
      
      Patch 2 adds a skip_dev_check argument to ipv6_chk_addr_and_flags to
      allow a device to always be passed yet skip the device check when
      looking at addresses and fixes up a few ipv6_chk_addr callers that
      pass a NULL device.
      
      Patch 3 adds l3mdev checks to ipv6_chk_addr_and_flags.
      
      Patches 4 and 5 do some refactoring to the fib_tests script and then
      patch 6 adds nexthop validation tests.
      
      v4
      - separated l3mdev check into a separate patch (patch 3 of this set)
        as suggested by Kirill
      - consolidated dev and ipv6_chk_addr_and_flags call into 1 if (Kirill)
      - added a temp variable for gw type (Kirill)
      
      v3
      - set skip_dev_check in ipv6_chk_addr based on dev == NULL (per
        comment from Ido)
      
      v2
      - handle 2 variations of route spec with sane error path
      - add test cases
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • selftests: fib_tests: Add IPv6 nexthop spec tests · 654d3a78
      David Ahern authored
      Add a series of tests for valid and invalid nexthop specs for IPv6.
      
      $ TEST=fib_nexthop_test ./fib_tests.sh
      ...
      IPv6 nexthop tests
          TEST: Directly connected nexthop, unicast address              [ OK ]
          TEST: Directly connected nexthop, unicast address with device  [ OK ]
          TEST: Gateway is linklocal address                             [ OK ]
          TEST: Gateway is linklocal address, no device                  [ OK ]
          TEST: Gateway can not be local unicast address                 [ OK ]
          TEST: Gateway can not be local unicast address, with device    [ OK ]
          TEST: Gateway can not be a local linklocal address             [ OK ]
          TEST: Gateway can be local address in a VRF                    [ OK ]
          TEST: Gateway can be local address in a VRF, with device       [ OK ]
          TEST: Gateway can be local linklocal address in a VRF          [ OK ]
          TEST: Redirect to VRF lookup                                   [ OK ]
          TEST: VRF route, gateway can be local address in default VRF   [ OK ]
          TEST: VRF route, gateway can not be a local address            [ OK ]
          TEST: VRF route, gateway can not be a local addr with device   [ OK ]
      Signed-off-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • selftests: fib_tests: Allow user to run a specific test · a511858c
      David Ahern authored
      Allow a user to run just a specific fib test by setting the TEST
      environment variable.
      Signed-off-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • selftests: fib_tests: Use an alias for ip command · 171a4871
      David Ahern authored
      Replace 'ip -netns testns' with the alias IP. Shortens the line lengths
      and makes running the commands manually a bit easier.
      Signed-off-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/ipv6: Add l3mdev check to ipv6_chk_addr_and_flags · 1893ff20
      David Ahern authored
      Look up the L3 master device for the passed in device. Only consider
      addresses on netdevs with the same master device. If the device is
      not enslaved or is NULL, then the l3mdev is NULL, which means only
      devices that are not enslaved (i.e., in the default domain) are considered.
      Signed-off-by: David Ahern <dsahern@gmail.com>
      Reviewed-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
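The domain-aware address check described in this commit can be modelled in a few lines of userspace C. This is a sketch under stated assumptions, not the kernel function: `model_dev`, `model_addr` and `model_chk_addr()` are hypothetical names, addresses are compared as strings, and the case where the passed device is itself an L3 master is ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical device: l3_master is NULL for the default domain. */
struct model_dev {
	const char *name;
	const struct model_dev *l3_master;
};

/* Hypothetical address table entry: an address configured on a device. */
struct model_addr {
	const char *addr;
	const struct model_dev *dev;
};

/*
 * Return true if addr is a local address in the L3 domain of dev.
 * With skip_dev_check set (or dev == NULL) any device in that domain
 * matches; otherwise the address must be on dev itself.
 */
bool model_chk_addr(const struct model_addr *tbl, size_t n, const char *addr,
		    const struct model_dev *dev, bool skip_dev_check)
{
	const struct model_dev *l3mdev = dev ? dev->l3_master : NULL;

	if (!dev)
		skip_dev_check = true;	/* no device: any device in the domain */

	for (size_t i = 0; i < n; i++) {
		const struct model_dev *ndev = tbl[i].dev;

		assert(ndev != NULL);
		if (strcmp(tbl[i].addr, addr) != 0)
			continue;
		if (ndev->l3_master != l3mdev)
			continue;	/* address lives in another L3 domain */
		if (skip_dev_check || ndev == dev)
			return true;
	}
	return false;
}
```

In this model an address on a device enslaved to one VRF is not reported as local when the check is made from a device in a second VRF, so it would be acceptable as a gateway there.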
    • net/ipv6: Change address check to always take a device argument · 232378e8
      David Ahern authored
      ipv6_chk_addr_and_flags determines if an address is a local address and
      optionally if it is an address on a specific device. For example, it is
      called by ip6_route_info_create to determine if a given gateway address
      is a local address. The address check currently does not consider L3
      domains and as a result does not allow a route to be added in one VRF
      if the nexthop points to an address in a second VRF. e.g.,
      
          $ ip route add 2001:db8:1::/64 vrf r2 via 2001:db8:102::23
          Error: Invalid gateway address.
      
      where 2001:db8:102::23 is an address on an interface in vrf r1.
      
      ipv6_chk_addr_and_flags needs to allow callers to always pass in a device
      with a separate argument to not limit the address to the specific device.
      The device is used to determine the L3 domain of interest.
      
      To that end add an argument to skip the device check and update callers
      to always pass a device where possible and use the new argument to mean
      any address in the domain.
      
      Update a handful of users of ipv6_chk_addr with a NULL dev argument. This
      patch handles the change to these callers without adding the domain check.
      
      ip6_validate_gw needs to handle 2 cases - one where the device is given
      as part of the nexthop spec and the other where the device is resolved.
      There is at least 1 VRF case where deferring the check to only after
      the route lookup has resolved the device fails with an unintuitive error
      "RTNETLINK answers: No route to host" as opposed to the preferred
      "Error: Gateway can not be a local address." The 'no route to host'
      error is because of the fallback to a full lookup. The check is done
      twice to avoid this error.
      Signed-off-by: David Ahern <dsahern@gmail.com>
      Reviewed-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/ipv6: Refactor gateway validation on route add · 9fbb704c
      David Ahern authored
      Move gateway validation code from ip6_route_info_create into
      ip6_validate_gw. Code move plus adjustments to handle the potential
      reset of dev and idev and to make checkpatch happy.
      Signed-off-by: David Ahern <dsahern@gmail.com>
      Reviewed-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'macb-Introduce-phy-handle-DT-functionality' · 1ad2ff02
      David S. Miller authored
      Brad Mouring says:
      
      ====================
      net: macb: Introduce phy-handle DT functionality
      
      Consider the situation where a macb netdev is connected through
      a phydev that sits on an MII bus other than the one provided to
      this particular netdev. This patchset supports that situation
      through the existing optional phy-handle binding.
      
      This optional binding (as described in the ethernet DT bindings doc)
      directs the netdev to the phydev to use. That is precisely the
      situation described here, so it makes sense to add the functionality
      to this driver (where the physical layout discussed was
      encountered).
      
      The devicetree snippet would look something like this:
      
      ...
         ethernet@feedf00d {
                 ...
                 phy-handle = <&phy0>; // the first netdev is physically wired to phy0
                 ...
                 phy0: phy@0 {
                         ...
                         reg = <0x0>; // MDIO address 0
                         ...
                 };
                 phy1: phy@1 {
                         ...
                         reg = <0x1>; // MDIO address 1
                         ...
                 };
                 ...
         };
      
         ethernet@deadbeef {
                 ...
                 phy-handle = <&phy1>; // tells the driver to use phy1 on the
                                       // first mac's mdio bus (it's wired thusly)
                 ...
         };
      ...
      
      The work done to add the phy_node in the first place (dacdbb4d:
      "net: macb: add fixed-link node support") will consume the
      device_node (if found).
      
      v2: Reorganization of mii probe/init functions, suggested by Andrew Lunn
      v3: Moved some of the bus init code back into init (erroneously moved to probe)
          some style issues, and an uninitialized variable warning addressed.
      v4: Add Reviewed-by: tags
          Skip fallback code if phy-handle phandle is found
      v5: Cleanup formatting issues
          Fix compile failure introduced in 1/4 "net: macb: Reorganize macb_mii
              bringup"
          Fix typo in "Documentation: macb: Document phy-handle binding"
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Documentation: macb: Document phy-handle binding · f3b249e6
      Brad Mouring authored
      Document the existence of the optional binding, directing to the
      general ethernet document that describes this binding.
      Signed-off-by: Brad Mouring <brad.mouring@ni.com>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: macb: Add phy-handle DT support · 2105a5d3
      Brad Mouring authored
      This optional binding (as described in the ethernet DT bindings doc)
      directs the netdev to the phydev to use. This is useful for a PHY
      chip that contains more than one PHY when two netdevs use the same
      chip (i.e. the second MAC's PHY lives on the first MAC's MDIO bus).
      
      The devicetree snippet would look something like this:
      
      ethernet@feedf00d {
      	...
      	phy-handle = <&phy0>; // the first netdev is physically wired to phy0
      	...
      	phy0: phy@0 {
      		...
      		reg = <0x0>; // MDIO address 0
      		...
      	};
      	phy1: phy@1 {
      		...
      		reg = <0x1>; // MDIO address 1
      		...
      	};
      	...
      };
      
      ethernet@deadbeef {
      	...
      	phy-handle = <&phy1>; // tells the driver to use phy1 on the
      	                      // first mac's mdio bus (it's wired thusly)
      	...
      };
      
      The work done to add the phy_node in the first place (dacdbb4d:
      "net: macb: add fixed-link node support") will consume the
      device_node (if found).
      Signed-off-by: Brad Mouring <brad.mouring@ni.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: macb: Remove redundant poll irq assignment · cb732e9a
      Brad Mouring authored
      In phy_device's general probe, this device will already be set for
      phy register polling, rendering this code redundant.
      Signed-off-by: Brad Mouring <brad.mouring@ni.com>
      Suggested-by: Andrew Lunn <andrew@lunn.ch>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: macb: Reorganize macb_mii bringup · 739de9a1
      Brad Mouring authored
      The macb mii setup (mii_probe() and mii_init()) previously was
      somewhat interspersed, likely a result of organic growth and hacking.
      
      This change moves mii bus registration into mii_init and probing the
      bus for devices into mii_probe.
      Signed-off-by: Brad Mouring <brad.mouring@ni.com>
      Suggested-by: Andrew Lunn <andrew@lunn.ch>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • doc: remove out of date links and info from packet mmap · 2b221d20
      Stephen Hemminger authored
      The packet_mmap documentation had links to web sites that no longer
      exist; replace them with another site that has a similar example.
      
      Support for packet mmap has been in mainline versions of libpcap
      for several years.
      Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • enic: drop IP proto check for vxlan tunnel delete · ce3db6aa
      Govindarajulu Varadarajan authored
      Commit d1179094 ("enic: Add vxlan offload support for IPv6 pkts")
      added vxlan offload support for IPv6 packets, but the required change
      in enic_udp_tunnel_del was not made. As a result, once a user adds an
      IPv6 tunnel, the hardware offload for it cannot be deleted.
      
      This patch removes the IP protocol check from the tunnel delete path.
      The driver need not check the IP protocol, since the same UDP port
      cannot be used to create two tunnels.
      
      Fixes: d1179094 ("enic: Add vxlan offload support for IPv6 pkts")
      Signed-off-by: Govindarajulu Varadarajan <gvaradar@cisco.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
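The bug and its fix can be illustrated with a small userspace model. It assumes the old delete path rejected every request that was not IPv4, which the commit message implies but does not show; `model_vxlan`, `model_family`, `model_tunnel_del_old()` and `model_tunnel_del_new()` are hypothetical names, not the driver's.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum model_family { MODEL_IPV4, MODEL_IPV6 };

/* Hypothetical driver state: at most one offloaded VXLAN UDP port. */
struct model_vxlan {
	bool offloaded;
	uint16_t udp_port;
};

/* Delete path with the extra IP protocol check (the behaviour being fixed). */
bool model_tunnel_del_old(struct model_vxlan *v, uint16_t port,
			  enum model_family family)
{
	assert(v != NULL);
	if (!v->offloaded || v->udp_port != port || family != MODEL_IPV4)
		return false;	/* an IPv6 tunnel never gets past this check */
	v->offloaded = false;
	return true;
}

/* Delete path keyed on the UDP port alone. */
bool model_tunnel_del_new(struct model_vxlan *v, uint16_t port)
{
	assert(v != NULL);
	if (!v->offloaded || v->udp_port != port)
		return false;
	v->offloaded = false;
	return true;
}
```

Because one UDP port identifies at most one tunnel, matching on the port alone is enough to find the offload to remove.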
    • rxrpc: remove redundant initialization of variable 'len' · 650b4eca
      Colin Ian King authored
      The variable 'len' is being initialized with a value that is never
      read and it is re-assigned later, hence the initialization is redundant
      and can be removed.
      
      Cleans up clang warning:
      net/rxrpc/recvmsg.c:275:15: warning: Value stored to 'len' during its
      initialization is never read
      Signed-off-by: Colin Ian King <colin.king@canonical.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  2. 15 Mar, 2018 3 commits
    • sctp: Fix double free in sctp_sendmsg_to_asoc · 0aee4c25
      Neil Horman authored
      syzbot/kasan detected a double free in sctp_sendmsg_to_asoc:
      BUG: KASAN: use-after-free in sctp_association_free+0x7b7/0x930
      net/sctp/associola.c:332
      Read of size 8 at addr ffff8801d8006ae0 by task syzkaller914861/4202
      
      CPU: 1 PID: 4202 Comm: syzkaller914861 Not tainted 4.16.0-rc4+ #258
      Hardware name: Google Google Compute Engine/Google Compute Engine
      01/01/2011
      Call Trace:
       __dump_stack lib/dump_stack.c:17 [inline]
       dump_stack+0x194/0x24d lib/dump_stack.c:53
       print_address_description+0x73/0x250 mm/kasan/report.c:256
       kasan_report_error mm/kasan/report.c:354 [inline]
       kasan_report+0x23c/0x360 mm/kasan/report.c:412
       __asan_report_load8_noabort+0x14/0x20 mm/kasan/report.c:433
       sctp_association_free+0x7b7/0x930 net/sctp/associola.c:332
       sctp_sendmsg+0xc67/0x1a80 net/sctp/socket.c:2075
       inet_sendmsg+0x11f/0x5e0 net/ipv4/af_inet.c:763
       sock_sendmsg_nosec net/socket.c:629 [inline]
       sock_sendmsg+0xca/0x110 net/socket.c:639
       SYSC_sendto+0x361/0x5c0 net/socket.c:1748
       SyS_sendto+0x40/0x50 net/socket.c:1716
       do_syscall_64+0x281/0x940 arch/x86/entry/common.c:287
       entry_SYSCALL_64_after_hwframe+0x42/0xb7
      
      This was introduced by commit:
      f84af331 sctp: factor out sctp_sendmsg_to_asoc from sctp_sendmsg
      
      The newly refactored function moved the wait_for_sndbuf call to a
      point after the association was connected. That allows peeloff events
      to occur, which in turn cause wait_for_sndbuf to return -EPIPE, an
      error that was not caught by the logic that determines whether an
      association should be freed.
      
      Fix it the easy way by restoring the original ordering of
      sctp_primitive_ASSOCIATE and sctp_wait_for_sndbuf, to ensure that
      -EPIPE will not happen.
      
      Tested by myself using the syzbot reproducers, with positive results.
      Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
      CC: davem@davemloft.net
      CC: Xin Long <lucien.xin@gmail.com>
      Reported-by: syzbot+a4e4112c3aff00c8cfd8@syzkaller.appspotmail.com
      Reviewed-by: Xin Long <lucien.xin@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: drivers/net: Remove unnecessary skb_copy_expand OOM messages · 0c3d5a96
      Joe Perches authored
      skb_copy_expand without __GFP_NOWARN already does a dump_stack
      on OOM so these messages are redundant.
      Signed-off-by: Joe Perches <joe@perches.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue · 80d9f3a0
      David S. Miller authored
      Jeff Kirsher says:
      
      ====================
      40GbE Intel Wired LAN Driver Updates 2018-03-14
      
      This series contains updates to i40e and i40evf only.
      
      Corentin Labbe cleans up the left over FCoE files in the i40e driver.
      
      Gustavo A R Silva fixes a cut and paste error.
      
      Paweł fixes a race condition when the VF driver is loaded on a host and
      virsh is trying to attach it to the virtual machine and set a MAC
      address.  Resolve the issue by adding polling in i40e_ndo_set_vf_mac()
      when the VF is in reset mode.
      
      Jake cleans up i40e_vlan_rx_register(): since it is only used in a
      single location, its contents are inlined.  Created a helper function
      to properly update the per-filter statistics when a filter is deleted.
      Factored out the re-enabling of ATR and SB rules.  Fixed an issue where,
      when re-enabling ATR after the last TCPv4 filter is removed while ntuple
      is still active, the TCPv4 filter input set was not restored.
      
      Filip modifies the permission check function to ensure that it knows how
      many filters are being requested, which allows the check to ensure that
      the total number of filters in a single request does not cause us to go
      over the limit.
      
      Mariusz fixed an issue where the wrong calculation of partition id was
      being done on OCP PHY mezzanine cards, which in turn caused wake on LAN
      to be disabled on certain ports.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
  3. 14 Mar, 2018 17 commits