1. 05 Jun, 2018 18 commits
    • netfilter: provide udp*_lib_lookup for nf_tproxy · 6e86000c
      Arnd Bergmann authored
      It is now possible to enable the libified nf_tproxy modules without
      also enabling NETFILTER_XT_TARGET_TPROXY, which throws off the
      ifdef logic in the udp core code:
      
      net/ipv6/netfilter/nf_tproxy_ipv6.o: In function `nf_tproxy_get_sock_v6':
      nf_tproxy_ipv6.c:(.text+0x1a8): undefined reference to `udp6_lib_lookup'
      net/ipv4/netfilter/nf_tproxy_ipv4.o: In function `nf_tproxy_get_sock_v4':
      nf_tproxy_ipv4.c:(.text+0x3d0): undefined reference to `udp4_lib_lookup'
      
      We can actually simplify the conditions now to provide the two functions
      exactly when they are needed.
      
      Fixes: 45ca4e0c ("netfilter: Libify xt_TPROXY")
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Acked-by: Paolo Abeni <pabeni@redhat.com>
      Acked-by: Máté Eckl <ecklm94@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • qed*: Utilize FW 8.37.2.0 · d52c89f1
      Michal Kalderon authored
      This FW contains several fixes and features.
      
      RDMA
      - Several modifications and fixes for Memory Windows
      - drop vlan and tcp timestamp from mss calculation in driver for
        this FW
      - Fix SQ completion flow when local ack timeout is infinite
      - Modifications in t10dif support
      
      ETH
      - Fix aRFS for tunneled traffic without inner IP.
      - Fix chip configuration which may fail under heavy traffic conditions.
      - Support receiving any-VNI in VXLAN and GENEVE RX classification.
      
      iSCSI / FcoE
      - Fix iSCSI recovery flow
      - Drop vlan and tcp timestamp from mss calc for fw 8.37.2.0
      
      Misc
      - Several registers (split registers) won't read correctly with
        ethtool -d
      Signed-off-by: Ariel Elior <Ariel.Elior@cavium.com>
      Signed-off-by: Manish Rangankar <manish.rangankar@cavium.com>
      Signed-off-by: Michal Kalderon <Michal.Kalderon@cavium.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net-tcp: remove useless tw_timeout field · 95358a95
      Maciej Żenczykowski authored
      Tested: 'git grep tw_timeout' comes up empty and it builds :-)
      Signed-off-by: Maciej Żenczykowski <maze@google.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: sched: cls: Fix offloading when ingress dev is vxlan · d96a43c6
      Paul Blakey authored
      When using a vxlan device as the ingress dev, we count it as a
      "no offload dev", so when such a rule comes and err stop is true,
      we fail early and don't try the egdev route which can offload it
      through the egress device.
      
      Fix that by not calling the block offload if one of the devices
      attached to it is not offload capable, and in that case make sure
      the egress device is offload capable instead.
      
      Fixes: caa72601 ("net: sched: keep track of offloaded filters [..]")
      Reviewed-by: Roi Dayan <roid@mellanox.com>
      Acked-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: Paul Blakey <paulb@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • sctp: not allow transport timeout value less than HZ/5 for hb_timer · 1d88ba1e
      Xin Long authored
      syzbot reported an rcu_sched self-detected stall on a CPU, caused by
      too small a value being set for rto_min with the SCTP_RTOINFO sockopt.
      With this value, hb_timer gets stuck: its timer handler starts the
      timer again with the same value, then the handler runs again.
      
      This problem has been there since the very beginning. Thanks to Eric
      for the reproducer shared from a syzbot mail.
      
      This patch fixes it by not allowing sctp_transport_timeout to return a
      smaller value than HZ/5 for hb_timer, which is based on TCP's min rto.
      
      Note that it doesn't fix this issue by limiting rto_min, as some users
      are still using small rto and no proper value was found for it yet.
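
The lower bound described above can be sketched in plain user-space C. This is only an illustration: the function name and the `hz` parameter are hypothetical, and the real change lives in sctp_transport_timeout() and uses the kernel's HZ constant.

```c
#include <assert.h>

/*
 * Hypothetical helper: clamp a heartbeat timeout (in ticks) so it never
 * drops below hz / 5, i.e. 200 ms. The tick rate is passed in because HZ
 * is a kernel build-time constant that this sketch cannot see.
 */
unsigned long hb_timeout_clamped_sketch(unsigned long timeout, unsigned long hz)
{
    unsigned long min_ticks;

    assert(hz >= 5);        /* otherwise hz / 5 would be 0 and clamp nothing */
    min_ticks = hz / 5;

    return timeout < min_ticks ? min_ticks : timeout;
}
```

With a floor like this, a timer handler that re-arms itself with the returned value cannot fire again immediately, whatever rto_min the user configured.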
      
      Reported-by: syzbot+3dcd59a1f907245f891f@syzkaller.appspotmail.com
      Suggested-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Acked-by: Neil Horman <nhorman@tuxdriver.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bpfilter: switch to CC from HOSTCC · 819dd92b
      Alexei Starovoitov authored
      Check that CC can build executables and use that compiler instead of HOSTCC.
      Suggested-by: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/mlx5e: fix error return code in mlx5e_alloc_rq() · 47a6ca3f
      Wei Yongjun authored
      Fix to return error code -ENOMEM from the kvzalloc_node() error handling
      case instead of 0, as done elsewhere in this function.
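
The class of bug fixed here can be shown with a small user-space sketch. All names are hypothetical, and calloc() stands in for kvzalloc_node(); the point is only that the error variable must be set before jumping to the cleanup label.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

struct rx_queue_sketch {
    void *ring;
    void *frags;
};

/* fail_frags lets a caller simulate the second allocation failing. */
int rx_queue_alloc_sketch(struct rx_queue_sketch *rq, int fail_frags)
{
    int err = 0;    /* stands in for the result of an earlier, successful step */

    assert(rq != NULL);
    rq->frags = NULL;

    rq->ring = calloc(64, sizeof(long));
    if (!rq->ring)
        return -ENOMEM;

    rq->frags = fail_frags ? NULL : calloc(64, sizeof(long));
    if (!rq->frags) {
        err = -ENOMEM;  /* without this line the function would return 0 */
        goto err_free_ring;
    }

    return 0;

err_free_ring:
    free(rq->ring);
    rq->ring = NULL;
    return err;
}
```

A caller that sees 0 would go on to use a half-initialized queue, which is why the missing assignment matters even though the cleanup itself was correct.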
      
      Fixes: 069d1146 ("net/mlx5e: RX, Enhance legacy Receive Queue memory scheme")
      Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
      Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/mlx5e: Make function mlx5e_change_rep_mtu() static · 6f6027a5
      Wei Yongjun authored
      Fixes the following sparse warning:
      
      drivers/net/ethernet/mellanox/mlx5/core/en_rep.c:903:5: warning:
       symbol 'mlx5e_change_rep_mtu' was not declared. Should it be static?
      Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
      Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: qualcomm: rmnet: Fix use after free while sending command ack · 3602207c
      Subash Abhinov Kasiviswanathan authored
      When sending an ack to a command packet, the skb is still referenced
      after it is sent to the real device. Since the real device could
      free the skb, the device pointer would be invalid.
      Also, remove an unnecessary variable.
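
A minimal user-space sketch of the pattern, assuming hypothetical types and function names (none of them come from the rmnet driver): anything needed from the packet is read into a local variable before the packet is handed to a consumer that may free it.

```c
#include <assert.h>
#include <stdlib.h>

struct dev_sketch {
    unsigned long tx_acks;
};

struct pkt_sketch {
    struct dev_sketch *dev;
};

/* Stand-in for handing the packet to the real device, which may free it. */
void xmit_and_consume_sketch(struct pkt_sketch *pkt)
{
    free(pkt);
}

void send_ack_sketch(struct pkt_sketch *pkt)
{
    struct dev_sketch *dev;

    assert(pkt != NULL && pkt->dev != NULL);

    dev = pkt->dev;                 /* read what we need before the hand-off */
    xmit_and_consume_sketch(pkt);   /* pkt must not be touched after this */
    dev->tx_acks++;                 /* safe: uses the cached pointer only */
}
```

Writing `pkt->dev->tx_acks++` after the hand-off instead would dereference freed memory, which is the shape of the bug the commit describes.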
      
      Fixes: ceed73a2 ("drivers: net: ethernet: qualcomm: rmnet: Initial implementation")
      Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: ipv6: Generate random IID for addresses on RAWIP devices · 9deb441c
      Subash Abhinov Kasiviswanathan authored
      RAWIP devices such as rmnet do not have a hardware address and
      instead require the kernel to generate a random IID for the
      IPv6 addresses.
      Signed-off-by: Sean Tranchetti <stranche@codeaurora.org>
      Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tcp: refactor tcp_ecn_check_ce to remove sk type cast · f4c9f85f
      Yousuk Seung authored
      Refactor tcp_ecn_check_ce and __tcp_ecn_check_ce to accept struct sock*
      instead of tcp_sock* to clean up type casts. This is a pure refactor
      patch.
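
The shape of such a refactor can be sketched in user-space C. The struct layouts, flag values, and function names below are hypothetical stand-ins, not the kernel's definitions: the function takes the generic socket pointer and performs the conversion once, internally, so callers no longer cast.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct sock_sketch {
    int state;
};

struct tcp_sock_sketch {
    struct sock_sketch sk;      /* must stay the first member for the cast */
    unsigned int ecn_flags;
};

#define ECN_OK_SKETCH       0x1u
#define ECN_SEEN_CE_SKETCH  0x2u

/* Single place where the generic pointer is converted. */
struct tcp_sock_sketch *tcp_sk_sketch(struct sock_sketch *sk)
{
    return (struct tcp_sock_sketch *)sk;
}

/* Takes the generic socket, so callers pass what they already hold. */
void ecn_check_ce_sketch(struct sock_sketch *sk, bool ce_marked)
{
    struct tcp_sock_sketch *tp;

    assert(sk != NULL);
    tp = tcp_sk_sketch(sk);

    if ((tp->ecn_flags & ECN_OK_SKETCH) && ce_marked)
        tp->ecn_flags |= ECN_SEEN_CE_SKETCH;
}
```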
      Signed-off-by: Yousuk Seung <ysseung@google.com>
      Signed-off-by: Neal Cardwell <ncardwell@google.com>
      Signed-off-by: Yuchung Cheng <ycheng@google.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/ipv6: prevent use after free in ip6_route_mpath_notify · f7225172
      David Ahern authored
      syzbot reported a use-after-free:
      
      BUG: KASAN: use-after-free in ip6_route_mpath_notify+0xe9/0x100 net/ipv6/route.c:4180
      Read of size 4 at addr ffff8801bf789cf0 by task syz-executor756/4555
      
      CPU: 1 PID: 4555 Comm: syz-executor756 Not tainted 4.17.0-rc7+ #78
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       __dump_stack lib/dump_stack.c:77 [inline]
       dump_stack+0x1b9/0x294 lib/dump_stack.c:113
       print_address_description+0x6c/0x20b mm/kasan/report.c:256
       kasan_report_error mm/kasan/report.c:354 [inline]
       kasan_report.cold.7+0x242/0x2fe mm/kasan/report.c:412
       __asan_report_load4_noabort+0x14/0x20 mm/kasan/report.c:432
       ip6_route_mpath_notify+0xe9/0x100 net/ipv6/route.c:4180
       ip6_route_multipath_add+0x615/0x1910 net/ipv6/route.c:4303
       inet6_rtm_newroute+0xe3/0x160 net/ipv6/route.c:4391
       ...
      
      Allocated by task 4555:
       save_stack+0x43/0xd0 mm/kasan/kasan.c:448
       set_track mm/kasan/kasan.c:460 [inline]
       kasan_kmalloc+0xc4/0xe0 mm/kasan/kasan.c:553
       kasan_slab_alloc+0x12/0x20 mm/kasan/kasan.c:490
       kmem_cache_alloc+0x12e/0x760 mm/slab.c:3554
       dst_alloc+0xbb/0x1d0 net/core/dst.c:104
       __ip6_dst_alloc+0x35/0xa0 net/ipv6/route.c:361
       ip6_dst_alloc+0x29/0xb0 net/ipv6/route.c:376
       ip6_route_info_create+0x4d4/0x3a30 net/ipv6/route.c:2834
       ip6_route_multipath_add+0xc7e/0x1910 net/ipv6/route.c:4240
       inet6_rtm_newroute+0xe3/0x160 net/ipv6/route.c:4391
       ...
      
      Freed by task 4555:
       save_stack+0x43/0xd0 mm/kasan/kasan.c:448
       set_track mm/kasan/kasan.c:460 [inline]
       __kasan_slab_free+0x11a/0x170 mm/kasan/kasan.c:521
       kasan_slab_free+0xe/0x10 mm/kasan/kasan.c:528
       __cache_free mm/slab.c:3498 [inline]
       kmem_cache_free+0x86/0x2d0 mm/slab.c:3756
       dst_destroy+0x267/0x3c0 net/core/dst.c:140
       dst_release_immediate+0x71/0x9e net/core/dst.c:205
       fib6_add+0xa40/0x1650 net/ipv6/ip6_fib.c:1305
       __ip6_ins_rt+0x6c/0x90 net/ipv6/route.c:1011
       ip6_route_multipath_add+0x513/0x1910 net/ipv6/route.c:4267
       inet6_rtm_newroute+0xe3/0x160 net/ipv6/route.c:4391
       ...
      
      The problem is that rt_last can point to a deleted route if the insert
      fails.
      
      One reproducer is to insert a route and then add a multipath route that
      has a duplicate nexthop, e.g.:
          $ ip -6 ro add vrf red 2001:db8:101::/64 nexthop via 2001:db8:1::2
          $ ip -6 ro append vrf red 2001:db8:101::/64 nexthop via 2001:db8:1::4 nexthop via 2001:db8:1::2
      
      Fix by not setting rt_last until it is verified that the insert succeeded.
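
The ordering described above can be sketched as follows. The types and helpers are hypothetical, and the insert is reduced to a flag check; in the real code a failed insert frees the route, which is why a pointer recorded beforehand would dangle.

```c
#include <assert.h>
#include <stddef.h>

struct route_sketch {
    int id;
    int duplicate;  /* nonzero makes the simulated insert fail */
};

/* Stand-in for the insert; the real one frees the route on failure. */
int insert_route_sketch(const struct route_sketch *rt)
{
    return rt->duplicate ? -1 : 0;
}

/* Returns the last route that was actually inserted, or NULL if none. */
const struct route_sketch *
add_multipath_sketch(const struct route_sketch *rts, size_t n)
{
    const struct route_sketch *rt_last = NULL;
    size_t i;

    assert(rts != NULL || n == 0);

    for (i = 0; i < n; i++) {
        if (insert_route_sketch(&rts[i]) != 0)
            break;          /* never record a route whose insert failed */
        rt_last = &rts[i];  /* updated only after a successful insert */
    }

    return rt_last;
}
```

The notification step can then use rt_last knowing it refers to a route that is still in the table.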
      
      Fixes: 3b1137fe ("net: ipv6: Change notifications for multipath add to RTA_MULTIPATH")
      Cc: Eric Dumazet <edumazet@google.com>
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Signed-off-by: David Ahern <dsahern@gmail.com>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: phy: broadcom: Enable 125 MHz clock on LED4 pin for BCM54612E by default. · 69e2eccc
      Kun Yi authored
      The BCM54612E has 4 multi-functional LED pins that can be configured
      through register settings; the LED4 pin can be configured to a 125MHz
      reference clock output by setting the spare register. Since the dedicated
      CLK125 reference clock pin is not brought out on the 48-Pin MLP, the LED4
      pin is the only pin to provide such function in this package, and therefore
      it is beneficial to just enable the reference clock by default.
      Signed-off-by: Kun Yi <kunyi@google.com>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • l2tp: fix refcount leakage on PPPoL2TP sockets · 3d609342
      Guillaume Nault authored
      Commit d02ba2a6 ("l2tp: fix race in pppol2tp_release with session
      object destroy") tried to fix a race condition where a PPPoL2TP socket
      would disappear while the L2TP session was still using it. However, it
      missed the root issue, which is that an L2TP session may agree to be
      reconnected if its associated socket has entered the release process.
      
      The tentative fix makes the session hold the socket it is connected to.
      That saves the kernel from crashing, but introduces refcount leakage,
      preventing the socket from completing the release process. Once stalled,
      everything the socket depends on can't be released anymore, including
      the L2TP session and the l2tp_ppp module.
      
      The root issue is that, when releasing a connected PPPoL2TP socket, the
      session's ->sk pointer (RCU-protected) is reset to NULL and we have to
      wait for a grace period before destroying the socket. The socket drops
      the session in its ->sk_destruct callback function, so the session
      will exist until the last reference on the socket is dropped.
      Therefore, there is a time frame where pppol2tp_connect() may accept
      reconnecting a session, as it only checks ->sk to figure out if the
      session is connected. This time frame is shortened by the fact that
      pppol2tp_release() calls l2tp_session_delete(), making the session
      unreachable before resetting ->sk. However, pppol2tp_connect() may
      grab the session before it gets unhashed by l2tp_session_delete(), but
      it may test ->sk after the latter got reset. The race is not so hard to
      trigger and syzbot found a pretty reliable reproducer:
      https://syzkaller.appspot.com/bug?id=418578d2a4389074524e04d641eacb091961b2cf
      
      Before d02ba2a6, another race could let pppol2tp_release()
      overwrite the ->__sk pointer of an L2TP session, thus tricking
      pppol2tp_put_sk() into calling sock_put() on a socket that is different
      than the one for which pppol2tp_release() was originally called. To get
      there, we had to trigger the race described above, therefore having one
      PPPoL2TP socket being released, while the session it is connected to is
      reconnecting to a different PPPoL2TP socket. When releasing this new
      socket fast enough, pppol2tp_release() overwrites the session's
      ->__sk pointer with the address of the new socket, before the first
      pppol2tp_put_sk() call gets scheduled. Then the pppol2tp_put_sk() call
      invoked by the original socket will sock_put() the new socket,
      potentially dropping its last reference. When the second
      pppol2tp_put_sk() finally runs, its socket has already been freed.
      
      With d02ba2a6, the session takes a reference on both sockets.
      Furthermore, the session's ->sk pointer is reset in the
      pppol2tp_session_close() callback function rather than in
      pppol2tp_release(). Therefore, ->__sk can't be overwritten and
      pppol2tp_put_sk() is called only once (l2tp_session_delete() will only
      run pppol2tp_session_close() once, to protect the session against
      concurrent deletion requests). Now pppol2tp_put_sk() will properly
      sock_put() the original socket, but the new socket will remain, as
      l2tp_session_delete() prevented the release process from completing.
      Here, we don't depend on the ->__sk race to trigger the bug. Getting
      into the pppol2tp_connect() race is enough to leak the reference, no
      matter when the new socket is released.
      
      So it all boils down to pppol2tp_connect() failing to realise that the
      session has already been connected. This patch drops the unneeded extra
      reference counting (mostly reverting d02ba2a6) and checks that
      neither ->sk nor ->__sk is set before allowing a session to be
      connected.
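
The connect-time check can be sketched in user-space C. The struct, the function name, and the -EBUSY return value are assumptions made for illustration; `sk_in_release` plays the role of the session's ->__sk pointer, renamed because identifiers starting with two underscores are reserved outside the kernel.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct ppp_session_sketch {
    void *sk;               /* socket currently attached to the session */
    void *sk_in_release;    /* socket whose release has not completed yet */
};

/* Refuse to connect while either pointer still refers to a socket. */
int session_connect_sketch(struct ppp_session_sketch *s, void *sock)
{
    assert(s != NULL && sock != NULL);

    if (s->sk != NULL || s->sk_in_release != NULL)
        return -EBUSY;      /* error code chosen for this sketch only */

    s->sk = sock;
    return 0;
}
```

Testing only `sk` would let a second socket attach during the window where `sk` has been reset but the first socket is still being released, which is the race the message walks through.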
      
      Fixes: d02ba2a6 ("l2tp: fix race in pppol2tp_release with session object destroy")
      Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'net-phy-improve-PM-handling-of-PHY-MDIO' · 7a723099
      David S. Miller authored
      Heiner Kallweit says:
      
      ====================
      net: phy: improve PM handling of PHY/MDIO
      
      The current implementation of the MDIO bus PM ops doesn't actually
      implement bus-specific PM ops but just calls PM ops defined on a device
      level, which doesn't seem to be fully in line with the core PM model.
      
      When looking e.g. at __device_suspend() the PM core looks for PM ops
      of a device in a specific order:
      1. device PM domain
      2. device type
      3. device class
      4. device bus
      
      I think there is good reason that there are no PM ops on device level.
      The situation can be improved by modeling PHYs as a device type of
      an MDIO device. If PM ops are needed for some other type of MDIO
      device, it could be modeled as struct device_type as well.
      ====================
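
The lookup order listed in the cover letter can be sketched as a simple first-match search. The structs and names are hypothetical simplifications, not the kernel's struct device; the sketch covers only the four levels named above.

```c
#include <assert.h>
#include <stddef.h>

struct pm_ops_sketch {
    const char *name;
};

struct device_sketch {
    const struct pm_ops_sketch *pm_domain_ops;
    const struct pm_ops_sketch *type_ops;
    const struct pm_ops_sketch *class_ops;
    const struct pm_ops_sketch *bus_ops;
};

/* Return the first set of PM ops found, in the order given in the text. */
const struct pm_ops_sketch *pick_pm_ops_sketch(const struct device_sketch *dev)
{
    assert(dev != NULL);

    if (dev->pm_domain_ops)
        return dev->pm_domain_ops;
    if (dev->type_ops)
        return dev->type_ops;
    if (dev->class_ops)
        return dev->class_ops;
    return dev->bus_ops;    /* may be NULL when no level provides ops */
}
```

Under this order, PM ops placed on the device type are found before any bus-level ops, so moving the PHY callbacks to a device type leaves the bus-level ops without a user.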
      Tested-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: phy: remove PM ops from MDIO bus · 9107c05e
      Heiner Kallweit authored
      The current implementation of the MDIO bus PM ops doesn't actually
      implement bus-specific PM ops but just calls PM ops defined on a device
      level, which doesn't seem to be fully in line with the core PM model.
      
      When looking e.g. at __device_suspend() the PM core looks for PM ops
      of a device in a specific order:
      1. device PM domain
      2. device type
      3. device class
      4. device bus
      
      I think there is good reason that there are no PM ops on device level.
      
      Now that a device type representation of PHYs as a special type of MDIO
      device was added (PHYs being the only user of the MDIO bus PM ops), the
      MDIO bus PM ops can be removed, including member pm of struct mdio_device.
      
      If for some other type of MDIO device PM ops are needed, it should be
      modeled as struct device_type as well.
      Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: phy: add struct device_type representation of a PHY · 7f4828ff
      Heiner Kallweit authored
      A PHY is a type of MDIO device, so let's model it as struct device_type
      and place PM ops, attribute groups and release callback on device type
      level. For this the attribute definitions have to be moved.
      This change allows us to get rid of the PM ops on a bus level in a second
      step.
      Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • David S. Miller's avatar
  2. 04 Jun, 2018 22 commits