1. 23 Nov, 2018 12 commits
    • tipc: fix lockdep warning when reinitilaizing sockets · ce209966
      Jon Maloy authored
      [ Upstream commit adba75be ]
      
      We get the following warning:
      
      [   47.926140] 32-bit node address hash set to 2010a0a
      [   47.927202]
      [   47.927433] ================================
      [   47.928050] WARNING: inconsistent lock state
      [   47.928661] 4.19.0+ #37 Tainted: G            E
      [   47.929346] --------------------------------
      [   47.929954] inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
      [   47.930116] swapper/3/0 [HC0[0]:SC1[3]:HE1:SE0] takes:
      [   47.930116] 00000000af8bc31e (&(&ht->lock)->rlock){+.?.}, at: rhashtable_walk_enter+0x36/0xb0
      [   47.930116] {SOFTIRQ-ON-W} state was registered at:
      [   47.930116]   _raw_spin_lock+0x29/0x60
      [   47.930116]   rht_deferred_worker+0x556/0x810
      [   47.930116]   process_one_work+0x1f5/0x540
      [   47.930116]   worker_thread+0x64/0x3e0
      [   47.930116]   kthread+0x112/0x150
      [   47.930116]   ret_from_fork+0x3a/0x50
      [   47.930116] irq event stamp: 14044
      [   47.930116] hardirqs last  enabled at (14044): [<ffffffff9a07fbba>] __local_bh_enable_ip+0x7a/0xf0
      [   47.938117] hardirqs last disabled at (14043): [<ffffffff9a07fb81>] __local_bh_enable_ip+0x41/0xf0
      [   47.938117] softirqs last  enabled at (14028): [<ffffffff9a0803ee>] irq_enter+0x5e/0x60
      [   47.938117] softirqs last disabled at (14029): [<ffffffff9a0804a5>] irq_exit+0xb5/0xc0
      [   47.938117]
      [   47.938117] other info that might help us debug this:
      [   47.938117]  Possible unsafe locking scenario:
      [   47.938117]
      [   47.938117]        CPU0
      [   47.938117]        ----
      [   47.938117]   lock(&(&ht->lock)->rlock);
      [   47.938117]   <Interrupt>
      [   47.938117]     lock(&(&ht->lock)->rlock);
      [   47.938117]
      [   47.938117]  *** DEADLOCK ***
      [   47.938117]
      [   47.938117] 2 locks held by swapper/3/0:
      [   47.938117]  #0: 0000000062c64f90 ((&d->timer)){+.-.}, at: call_timer_fn+0x5/0x280
      [   47.938117]  #1: 00000000ee39619c (&(&d->lock)->rlock){+.-.}, at: tipc_disc_timeout+0xc8/0x540 [tipc]
      [   47.938117]
      [   47.938117] stack backtrace:
      [   47.938117] CPU: 3 PID: 0 Comm: swapper/3 Tainted: G            E     4.19.0+ #37
      [   47.938117] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
      [   47.938117] Call Trace:
      [   47.938117]  <IRQ>
      [   47.938117]  dump_stack+0x5e/0x8b
      [   47.938117]  print_usage_bug+0x1ed/0x1ff
      [   47.938117]  mark_lock+0x5b5/0x630
      [   47.938117]  __lock_acquire+0x4c0/0x18f0
      [   47.938117]  ? lock_acquire+0xa6/0x180
      [   47.938117]  lock_acquire+0xa6/0x180
      [   47.938117]  ? rhashtable_walk_enter+0x36/0xb0
      [   47.938117]  _raw_spin_lock+0x29/0x60
      [   47.938117]  ? rhashtable_walk_enter+0x36/0xb0
      [   47.938117]  rhashtable_walk_enter+0x36/0xb0
      [   47.938117]  tipc_sk_reinit+0xb0/0x410 [tipc]
      [   47.938117]  ? mark_held_locks+0x6f/0x90
      [   47.938117]  ? __local_bh_enable_ip+0x7a/0xf0
      [   47.938117]  ? lockdep_hardirqs_on+0x20/0x1a0
      [   47.938117]  tipc_net_finalize+0xbf/0x180 [tipc]
      [   47.938117]  tipc_disc_timeout+0x509/0x540 [tipc]
      [   47.938117]  ? call_timer_fn+0x5/0x280
      [   47.938117]  ? tipc_disc_msg_xmit.isra.19+0xa0/0xa0 [tipc]
      [   47.938117]  ? tipc_disc_msg_xmit.isra.19+0xa0/0xa0 [tipc]
      [   47.938117]  call_timer_fn+0xa1/0x280
      [   47.938117]  ? tipc_disc_msg_xmit.isra.19+0xa0/0xa0 [tipc]
      [   47.938117]  run_timer_softirq+0x1f2/0x4d0
      [   47.938117]  __do_softirq+0xfc/0x413
      [   47.938117]  irq_exit+0xb5/0xc0
      [   47.938117]  smp_apic_timer_interrupt+0xac/0x210
      [   47.938117]  apic_timer_interrupt+0xf/0x20
      [   47.938117]  </IRQ>
      [   47.938117] RIP: 0010:default_idle+0x1c/0x140
      [   47.938117] Code: 90 90 90 90 90 90 90 90 90 90 90 90 90 90 0f 1f 44 00 00 41 54 55 53 65 8b 2d d8 2b 74 65 0f 1f 44 00 00 e8 c6 2c 8b ff fb f4 <65> 8b 2d c5 2b 74 65 0f 1f 44 00 00 5b 5d 41 5c c3 65 8b 05 b4 2b
      [   47.938117] RSP: 0018:ffffaf6ac0207ec8 EFLAGS: 00000206 ORIG_RAX: ffffffffffffff13
      [   47.938117] RAX: ffff8f5b3735e200 RBX: 0000000000000003 RCX: 0000000000000001
      [   47.938117] RDX: 0000000000000001 RSI: 0000000000000001 RDI: ffff8f5b3735e200
      [   47.938117] RBP: 0000000000000003 R08: 0000000000000001 R09: 0000000000000000
      [   47.938117] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
      [   47.938117] R13: 0000000000000000 R14: ffff8f5b3735e200 R15: ffff8f5b3735e200
      [   47.938117]  ? default_idle+0x1a/0x140
      [   47.938117]  do_idle+0x1bc/0x280
      [   47.938117]  cpu_startup_entry+0x19/0x20
      [   47.938117]  start_secondary+0x187/0x1c0
      [   47.938117]  secondary_startup_64+0xa4/0xb0
      
      The reason seems to be that tipc_net_finalize()->tipc_sk_reinit() is
      calling the function rhashtable_walk_enter() within a timer interrupt.
      We fix this by executing tipc_net_finalize() in work queue context.
      Acked-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
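
The tipc fix above moves work out of timer context and into a work queue. The following is a minimal user-space sketch of that deferral pattern in Python, not the kernel change itself. The names `timer_callback`, `net_finalize`, and `worker` are hypothetical stand-ins, and a Python thread plays the role of the kernel worker.

```python
import queue
import threading

work_queue = queue.Queue()
completed = []

def net_finalize(addr):
    # Stand-in for the heavy step; records which thread actually ran it.
    completed.append((addr, threading.current_thread().name))

def timer_callback(addr):
    # "Timer context": do not call net_finalize() here, only queue it.
    work_queue.put((net_finalize, addr))

def worker():
    # "Process context": drain queued work until the None sentinel arrives.
    while True:
        item = work_queue.get()
        if item is None:
            break
        func, arg = item
        func(arg)

thread = threading.Thread(target=worker, name="worker")
thread.start()
timer_callback(0x2010A0A)   # node address taken from the log above
work_queue.put(None)        # ask the worker to exit
thread.join()
print(completed)
```

The point of the pattern is that the callback only schedules the work; the walk over the socket table then runs in a context where taking the table lock is consistent with its other users.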
    • tipc: don't assume linear buffer when reading ancillary data · aaf13772
      Jon Maloy authored
      [ Upstream commit 1c1274a5 ]
      
      The code for reading ancillary data from a received buffer is assuming
      the buffer is linear. To make this assumption true we have to linearize
      the buffer before message data is read.
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
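
As an illustration of why linearizing matters, here is a small Python model in which a received buffer is a list of byte fragments. This is an assumption made for the sketch, not the kernel's skb layout, and `linearize` and `read_u32` are hypothetical helpers. A field that straddles two fragments can only be read at a fixed offset once the fragments are joined.

```python
import struct

def linearize(fragments):
    # Join all fragments into one contiguous buffer.
    return b"".join(fragments)

def read_u32(buf, offset):
    # Read a big-endian 32-bit field at a fixed offset.
    return struct.unpack_from("!I", buf, offset)[0]

# The head fragment holds only 6 bytes, so the 4-byte field at offset 4
# spans both fragments.
fragments = [b"\x00\x00\x00\x01\xde\xad", b"\xbe\xef\x00\x00"]

try:
    read_u32(fragments[0], 4)        # treating the head as the whole buffer
    head_only_ok = True
except struct.error:
    head_only_ok = False             # not enough bytes in the head fragment

value = read_u32(linearize(fragments), 4)
print(head_only_ok, hex(value))
```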
    • tg3: Add PHY reset for 5717/5719/5720 in change ring and flow control paths · 710c65c8
      Siva Reddy Kallam authored
      [ Upstream commit 59663e42 ]
      
      This patch avoids a PHY lockup on 5717/5719/5720 in the change ring
      and flow control paths. It resolves the RX hang seen when ring or flow
      control parameters are changed continuously under heavy traffic from
      the peer.
      Signed-off-by: Siva Reddy Kallam <siva.kallam@broadcom.com>
      Acked-by: Michael Chan <michael.chan@broadcom.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • tcp: Fix SOF_TIMESTAMPING_RX_HARDWARE to use the latest timestamp during TCP coalescing · 7e678227
      Stephen Mallon authored
      [ Upstream commit cadf9df2 ]
      
      During tcp coalescing ensure that the skb hardware timestamp refers to the
      highest sequence number data.
      Previously only the software timestamp was updated during coalescing.
      Signed-off-by: Stephen Mallon <stephen.mallon@sydney.edu.au>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
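
The coalescing rule described above can be sketched with a toy model. The `Segment` class and `coalesce` function below are hypothetical and only mirror the stated behaviour: after merging, both the software and the hardware timestamp should come from the segment carrying the highest sequence number.

```python
from dataclasses import dataclass

@dataclass
class Segment:
    end_seq: int       # sequence number just past the last byte
    data: bytes
    sw_tstamp: float   # software receive timestamp
    hw_tstamp: float   # hardware receive timestamp

def coalesce(to, frm):
    # Append frm to to, and carry over BOTH timestamps from the later data.
    to.data += frm.data
    to.end_seq = frm.end_seq
    to.sw_tstamp = frm.sw_tstamp
    to.hw_tstamp = frm.hw_tstamp   # the step the fix adds in this model
    return to

first = Segment(end_seq=100, data=b"a" * 100, sw_tstamp=1.0, hw_tstamp=0.9)
second = Segment(end_seq=200, data=b"b" * 100, sw_tstamp=2.0, hw_tstamp=1.9)
merged = coalesce(first, second)
print(merged.end_seq, merged.sw_tstamp, merged.hw_tstamp)
```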
    • sctp: not allow to set asoc prsctp_enable by sockopt · 7e86081c
      Xin Long authored
      [ Upstream commit cc3ccf26 ]
      
      As rfc7496#section4.5 says about SCTP_PR_SUPPORTED:
      
         This socket option allows the enabling or disabling of the
         negotiation of PR-SCTP support for future associations.  For existing
         associations, it allows one to query whether or not PR-SCTP support
         was negotiated on a particular association.
      
      This means that only the sctp sock's prsctp_enable can be set, not that
      of an existing association.
      
      Note that for the limitation of SCTP_{CURRENT|ALL}_ASSOC, we will
      add it when introducing SCTP_{FUTURE|CURRENT|ALL}_ASSOC for linux
      sctp in another patchset.
      
      v1->v2:
        - drop the params.assoc_id check as Neil suggested.
      
      Fixes: 28aa4c26 ("sctp: add SCTP_PR_SUPPORTED on sctp sockopt")
      Reported-by: Ying Xu <yinxu@redhat.com>
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • net-gro: reset skb->pkt_type in napi_reuse_skb() · a21a82a9
      Eric Dumazet authored
      [ Upstream commit 33d9a2c7 ]
      
      eth_type_trans() assumes initial value for skb->pkt_type
      is PACKET_HOST.
      
      This is indeed the value right after a fresh skb allocation.
      
      However, it is possible that GRO merged a packet with a different
      value (like PACKET_OTHERHOST in case macvlan is used), so
      we need to make sure napi->skb will have pkt_type set back to
      PACKET_HOST.
      
      Otherwise, valid packets might be dropped by the stack because
      their pkt_type is not PACKET_HOST.
      
      napi_reuse_skb() was added in commit 96e93eab ("gro: Add
      internal interfaces for VLAN"), but this bug always has
      been there.
      
      Fixes: 96e93eab ("gro: Add internal interfaces for VLAN")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
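
A simplified Python model of the stale pkt_type problem follows. The `Skb` class, `classify`, and `reuse_skb` are hypothetical stand-ins rather than kernel code, and packet types are plain strings. The model assumes, as the message above says of eth_type_trans(), that classification only overrides the type for frames not addressed to the device and otherwise trusts the initial value.

```python
class Skb:
    """Toy packet buffer; a freshly allocated one starts out as 'host'."""
    def __init__(self):
        self.pkt_type = "host"

def classify(skb, dst_mac, dev_mac):
    # Only overrides pkt_type for frames not addressed to this device,
    # so it relies on the initial value being 'host'.
    if dst_mac != dev_mac:
        skb.pkt_type = "otherhost"

def reuse_skb(skb, reset_pkt_type):
    # Recycle the buffer; the fix corresponds to reset_pkt_type=True.
    if reset_pkt_type:
        skb.pkt_type = "host"
    return skb

DEV = "aa:aa:aa:aa:aa:aa"
OTHER = "bb:bb:bb:bb:bb:bb"

def run(reset_pkt_type):
    skb = Skb()
    classify(skb, OTHER, DEV)            # first use: frame for another host
    skb = reuse_skb(skb, reset_pkt_type)
    classify(skb, DEV, DEV)              # second use: frame addressed to us
    return skb.pkt_type

print(run(False))   # stale type survives the reuse
print(run(True))    # type is reset before the buffer is reused
```

Without the reset, the second frame keeps the stale type and would be treated as not destined for this host.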
    • net: bcmgenet: protect stop from timeout · 852c280d
      Doug Berger authored
      A timing hazard exists when the network interface is stopped that
      allows a watchdog timeout to be processed by a separate core in
      parallel. This creates the potential for the timeout handler to
      wake the queues while the driver is shutting down, or access
      registers after their clocks have been removed.
      
      The more common case is that the watchdog timeout will produce a
      warning message which doesn't lead to a crash. The chances of this
      are greatly increased by the fact that bcmgenet_netif_stop stops
      the transmit queues which can easily precipitate a watchdog
      timeout because of stale trans_start data in the queues.
      
      This commit corrects the behavior by ensuring that the watchdog
      timeout is disabled before entering bcmgenet_netif_stop. There
      are currently only two users of the bcmgenet_netif_stop function:
      close and suspend.
      
      The close case already handles the issue by exiting the RUNNING
      state before invoking the driver close service.
      
      The suspend case now performs the netif_device_detach to exit the
      PRESENT state before the call to bcmgenet_netif_stop rather than
      after it.
      
      These behaviors prevent any future scheduling of the driver timeout
      service during the window. The netif_tx_stop_all_queues function
      in bcmgenet_netif_stop is replaced with netif_tx_disable to ensure
      synchronization with any transmit or timeout threads that may
      already be executing on other cores.
      
      For symmetry, the netif_device_attach call upon resume is moved to
      after the call to bcmgenet_netif_start. Since it wakes the transmit
      queues it is not necessary to invoke netif_tx_start_all_queues from
      bcmgenet_netif_start so it is moved into the driver open service.
      
      [ Upstream commit 09e805d2 ]
      
      Fixes: 1c1008c7 ("net: bcmgenet: add main driver file")
      Signed-off-by: Doug Berger <opendmb@gmail.com>
      Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • ipv6: Fix PMTU updates for UDP/raw sockets in presence of VRF · 5bb115fb
      David Ahern authored
      [ Upstream commit 7ddacfa5 ]
      
      Preethi reported that PMTU discovery for UDP/raw applications is not
      working in the presence of VRF when the socket is not bound to a device.
      The problem is that ip6_sk_update_pmtu does not consider the L3 domain
      of the skb device if the socket is not bound. Update the function to
      set oif to the L3 master device if relevant.
      
      Fixes: ca254490 ("net: Add VRF support to IPv6 stack")
      Reported-by: Preethi Ramachandra <preethir@juniper.net>
      Signed-off-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
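
The oif selection described above can be sketched as follows. This is a hedged model, not the kernel function: `pmtu_oif` is a hypothetical name, interface indexes are plain integers, and the `l3_master` dictionary (mapping an interface index to the index of its L3 master device) is an assumption made for illustration.

```python
def pmtu_oif(bound_ifindex, skb_ifindex, l3_master):
    # Prefer the device the socket is bound to, if any.
    oif = bound_ifindex
    # Unbound socket: fall back to the L3 master (VRF) of the device the
    # packet arrived on; 0 means "no master / not relevant".
    if not oif and skb_ifindex:
        oif = l3_master.get(skb_ifindex, 0)
    return oif

# eth1 (index 5) is enslaved to a VRF device with index 9.
masters = {5: 9}
print(pmtu_oif(0, 5, masters))   # unbound socket, VRF slave: use the master
print(pmtu_oif(7, 5, masters))   # bound socket: the binding wins
print(pmtu_oif(0, 6, masters))   # unbound socket, no VRF: oif stays 0
```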
    • ipv6: fix a dst leak when removing its exception · b536dd80
      Xin Long authored
      [ Upstream commit 761f6026 ]
      
      There is no need to hold dst before calling rt6_remove_exception_rt().
      The call to dst_hold_safe() in ip6_link_failure() was for ip6_del_rt(),
      which was removed in commit 93531c67 ("net/ipv6: separate
      handling of FIB entries from dst based routes"). Keeping the hold
      without that call causes a dst leak.
      
      This patch is to simply remove the dst_hold_safe() call before calling
      rt6_remove_exception_rt() and also do the same in ip6_del_cached_rt().
      It's safe, because the removal of the exception that holds its dst's
      refcnt is protected by rt6_exception_lock.
      
      Fixes: 93531c67 ("net/ipv6: separate handling of FIB entries from dst based routes")
      Fixes: 23fb93a4 ("net/ipv6: Cleanup exception and cache route handling")
      Reported-by: Li Shuang <shuali@redhat.com>
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Reviewed-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
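
The leak can be pictured with a toy reference-count model. Everything below is a hypothetical sketch (the `Dst` and `ExceptionTable` classes and both `link_failure_*` functions are invented for illustration), under the assumption that the exception table owns one reference and drops it on removal.

```python
class Dst:
    def __init__(self):
        self.refcnt = 0
    def hold(self):
        self.refcnt += 1
    def release(self):
        self.refcnt -= 1

class ExceptionTable:
    """Owns one reference per stored entry and drops it on removal."""
    def __init__(self):
        self.entries = []
    def add(self, dst):
        dst.hold()
        self.entries.append(dst)
    def remove(self, dst):
        self.entries.remove(dst)
        dst.release()

def link_failure_with_hold(table, dst):
    dst.hold()            # extra reference that nothing releases any more
    table.remove(dst)

def link_failure_fixed(table, dst):
    table.remove(dst)     # removal alone drops the table's reference

def final_refcnt(handler):
    table, dst = ExceptionTable(), Dst()
    table.add(dst)
    handler(table, dst)
    return dst.refcnt

print(final_refcnt(link_failure_with_hold))   # non-zero: the entry leaks
print(final_refcnt(link_failure_fixed))       # zero: the entry can be freed
```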
    • ip_tunnel: don't force DF when MTU is locked · 60258098
      Sabrina Dubroca authored
      [ Upstream commit 16f7eb2b ]
      
      The various types of tunnels running over IPv4 can ask to set the DF
      bit to do PMTU discovery. However, PMTU discovery is subject to the
      threshold set by the net.ipv4.route.min_pmtu sysctl, and is also
      disabled on routes with "mtu lock". In those cases, we shouldn't set
      the DF bit.
      
      This patch makes setting the DF bit conditional on the route's MTU
      locking state.
      
      This issue seems to be older than git history.
      Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
      Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
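
The conditional described above reduces to a small decision. The sketch below is only a model: `outer_frag_off` and its two boolean parameters are hypothetical, and the only fixed fact used is that the Don't Fragment flag is bit 0x4000 of the IPv4 flags and fragment offset field.

```python
IP_DF = 0x4000  # Don't Fragment flag in the IPv4 flags/fragment-offset field

def outer_frag_off(tunnel_wants_df, route_mtu_locked):
    # Set DF on the outer header only if the tunnel asks for PMTU discovery
    # and the route's MTU is not locked.
    if tunnel_wants_df and not route_mtu_locked:
        return IP_DF
    return 0

for wants_df in (False, True):
    for locked in (False, True):
        print(wants_df, locked, hex(outer_frag_off(wants_df, locked)))
```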
    • ibmvnic: fix accelerated VLAN handling · a6870825
      Michał Mirosław authored
      [ Upstream commit e84b4794 ]
      
      Don't request tag insertion when it isn't present in outgoing skb.
      Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • flow_dissector: do not dissect l4 ports for fragments · ad6dfbd1
      배석진 authored
      [ Upstream commit 62230715 ]
      
      Only the first fragment has the sport/dport information,
      not the following ones.
      
      If we want a consistent hash for all fragments, we need to
      ignore ports even for the first fragment.
      
      This bug is visible for IPv6 traffic, if incoming fragments
      do not have a flow label, since skb_get_hash() will give
      different results for the first fragment and the following ones.
      
      It is also visible if any routing rule wants dissection
      and sport or dport.
      
      See commit 5e5d6fed ("ipv6: route: dissect flow
      in input path if fib rules need it") for details.
      
      [edumazet] rewrote the changelog completely.
      
      Fixes: 06635a35 ("flow_dissect: use programable dissector in skb_flow_dissect and friends")
      Signed-off-by: 배석진 <soukjin.bae@samsung.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
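
To see why ports must be ignored even for the first fragment, consider this small Python model of a flow hash. `flow_key` and `flow_hash` are hypothetical helpers and CRC32 is an arbitrary stand-in for the kernel's hash. Later fragments carry no L4 header, so their ports are modelled as 0.

```python
import zlib

def flow_key(src, dst, proto, sport, dport, ignore_ports):
    # Build the tuple the hash is computed over.
    if ignore_ports:
        sport = dport = 0
    return (src, dst, proto, sport, dport)

def flow_hash(key):
    return zlib.crc32(repr(key).encode())

SRC, DST, UDP = "2001:db8::1", "2001:db8::2", 17

# Dissecting ports on the first fragment gives it a key that the later
# fragments (which have no ports) can never match.
first_with_ports = flow_key(SRC, DST, UDP, 5000, 53, ignore_ports=False)
later = flow_key(SRC, DST, UDP, 0, 0, ignore_ports=True)

# Ignoring ports for every fragment makes the keys, and so the hashes, equal.
first_fixed = flow_key(SRC, DST, UDP, 5000, 53, ignore_ports=True)

print(first_with_ports == later)
print(flow_hash(first_fixed) == flow_hash(later))
```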
  2. 21 Nov, 2018 28 commits