1. 23 Jul, 2018 5 commits
  2. 22 Jul, 2018 12 commits
    • Randy Dunlap's avatar
      net: prevent ISA drivers from building on PPC32 · c9ce1fa1
      Randy Dunlap authored
      Prevent drivers from building on PPC32 if they use isa_bus_to_virt(),
      isa_virt_to_bus(), or isa_page_to_bus(), which are not available and
      thus cause build errors.
      
      ../drivers/net/ethernet/3com/3c515.c: In function 'corkscrew_open':
      ../drivers/net/ethernet/3com/3c515.c:824:9: error: implicit declaration of function 'isa_virt_to_bus'; did you mean 'virt_to_bus'? [-Werror=implicit-function-declaration]
      
      ../drivers/net/ethernet/amd/lance.c: In function 'lance_rx':
      ../drivers/net/ethernet/amd/lance.c:1203:23: error: implicit declaration of function 'isa_bus_to_virt'; did you mean 'bus_to_virt'? [-Werror=implicit-function-declaration]
      
      ../drivers/net/ethernet/amd/ni65.c: In function 'ni65_init_lance':
      ../drivers/net/ethernet/amd/ni65.c:585:20: error: implicit declaration of function 'isa_virt_to_bus'; did you mean 'virt_to_bus'? [-Werror=implicit-function-declaration]
      
      ../drivers/net/ethernet/cirrus/cs89x0.c: In function 'net_open':
      ../drivers/net/ethernet/cirrus/cs89x0.c:897:20: error: implicit declaration of function 'isa_virt_to_bus'; did you mean 'virt_to_bus'? [-Werror=implicit-function-declaration]
      Signed-off-by: default avatarRandy Dunlap <rdunlap@infradead.org>
      Suggested-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      c9ce1fa1
    • John Hurley's avatar
      nfp: flower: ensure dead neighbour entries are not offloaded · b809ec86
      John Hurley authored
      Previously only the neighbour state was checked to decide if an offloaded
      entry should be removed. However, there can be situations when the entry
      is dead but still marked as valid. This can lead to dead entries not
      being removed from fw tables or even incorrect data being added.
      
      Check the entry dead bit before deciding if it should be added to or
      removed from fw neighbour tables.
      
      Fixes: 8e6a9046 ("nfp: flower vxlan neighbour offload")
      Signed-off-by: default avatarJohn Hurley <john.hurley@netronome.com>
      Reviewed-by: default avatarJakub Kicinski <jakub.kicinski@netronome.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      b809ec86
    • David S. Miller's avatar
      Merge branch 'vxlan-fix-default-fdb-entry-user-space-notify-ordering-race' · 0fb8d5a0
      David S. Miller authored
      Roopa Prabhu says:
      
      ====================
      vxlan: fix default fdb entry user-space notify ordering/race
      
      Problem:
      In vxlan_newlink, a default fdb entry is added before register_netdev.
      The default fdb creation function notifies user-space of the
      fdb entry on the vxlan device which user-space does not know about yet.
      (RTM_NEWNEIGH goes before RTM_NEWLINK for the same ifindex).
      
      This series fixes the user-space netlink notification ordering issue
      with the following changes:
      - decouple fdb notify from fdb create.
      - Move fdb notify after register_netdev.
      - modify rtnl_configure_link to allow configuring a link early.
      - Call rtnl_configure_link in vxlan newlink handler to notify
      userspace about the newlink before fdb notify and
      hence avoiding the user-space race.
      ====================
      
      Fixes: afbd8bae ("vxlan: add implicit fdb entry for default destination")
      Signed-off-by: default avatarRoopa Prabhu <roopa@cumulusnetworks.com>
      0fb8d5a0
    • Roopa Prabhu's avatar
      vxlan: fix default fdb entry netlink notify ordering during netdev create · e99465b9
      Roopa Prabhu authored
      Problem:
      In vxlan_newlink, a default fdb entry is added before register_netdev.
      The default fdb creation function also notifies user-space of the
      fdb entry on the vxlan device which user-space does not know about yet.
      (RTM_NEWNEIGH goes before RTM_NEWLINK for the same ifindex).
      
      This patch fixes the user-space netlink notification ordering issue
      with the following changes:
      - decouple fdb notify from fdb create.
      - Move fdb notify after register_netdev.
      - Call rtnl_configure_link in vxlan newlink handler to notify
      userspace about the newlink before fdb notify and
      hence avoiding the user-space race.
      
      Fixes: afbd8bae ("vxlan: add implicit fdb entry for default destination")
      Signed-off-by: default avatarRoopa Prabhu <roopa@cumulusnetworks.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      e99465b9
    • Roopa Prabhu's avatar
      vxlan: make netlink notify in vxlan_fdb_destroy optional · f6e05385
      Roopa Prabhu authored
      Add a new option do_notify to vxlan_fdb_destroy to make
      sending netlink notify optional. Used by a later patch.
      Signed-off-by: default avatarRoopa Prabhu <roopa@cumulusnetworks.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      f6e05385
    • Roopa Prabhu's avatar
      vxlan: add new fdb alloc and create helpers · 7431016b
      Roopa Prabhu authored
      - Add new vxlan_fdb_alloc helper
      - rename existing vxlan_fdb_create into vxlan_fdb_update:
              because it really creates or updates an existing
              fdb entry
      - move new fdb creation into a separate vxlan_fdb_create
      
      Main motivation for this change is to introduce the ability
      to decouple vxlan fdb creation and notify, used in a later patch.
      Signed-off-by: default avatarRoopa Prabhu <roopa@cumulusnetworks.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      7431016b
    • Roopa Prabhu's avatar
      rtnetlink: add rtnl_link_state check in rtnl_configure_link · 5025f7f7
      Roopa Prabhu authored
      rtnl_configure_link sets dev->rtnl_link_state to
      RTNL_LINK_INITIALIZED and unconditionally calls
      __dev_notify_flags to notify user-space of dev flags.
      
      current call sequence for rtnl_configure_link
      rtnetlink_newlink
          rtnl_link_ops->newlink
          rtnl_configure_link (unconditionally notifies userspace of
                               default and new dev flags)
      
      If a newlink handler wants to call rtnl_configure_link
      early, we will end up with duplicate notifications to
      user-space.
      
      This patch fixes rtnl_configure_link to check rtnl_link_state
      and call __dev_notify_flags with gchanges = 0 if already
      RTNL_LINK_INITIALIZED.
      
      Later in the series, this patch will help the following sequence
      where a driver implementing newlink can call rtnl_configure_link
      to initialize the link early.
      
      makes the following call sequence work:
      rtnetlink_newlink
          rtnl_link_ops->newlink (vxlan) -> rtnl_configure_link (initializes
                                                      link and notifies
                                                      user-space of default
                                                      dev flags)
          rtnl_configure_link (updates dev flags if requested by user ifm
                               and notifies user-space of new dev flags)
      Signed-off-by: default avatarRoopa Prabhu <roopa@cumulusnetworks.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      5025f7f7
    • Florian Westphal's avatar
      atl1c: reserve min skb headroom · 6e568307
      Florian Westphal authored
      Got crash report with following backtrace:
      BUG: unable to handle kernel paging request at ffff8801869daffe
      RIP: 0010:[<ffffffff816429c4>]  [<ffffffff816429c4>] ip6_finish_output2+0x394/0x4c0
      RSP: 0018:ffff880186c83a98  EFLAGS: 00010283
      RAX: ffff8801869db00e ...
        [<ffffffff81644cdc>] ip6_finish_output+0x8c/0xf0
        [<ffffffff81644d97>] ip6_output+0x57/0x100
        [<ffffffff81643dc9>] ip6_forward+0x4b9/0x840
        [<ffffffff81645566>] ip6_rcv_finish+0x66/0xc0
        [<ffffffff81645db9>] ipv6_rcv+0x319/0x530
        [<ffffffff815892ac>] netif_receive_skb+0x1c/0x70
        [<ffffffffc0060bec>] atl1c_clean+0x1ec/0x310 [atl1c]
        ...
      
      The bad access is in neigh_hh_output(), at skb->data - 16 (HH_DATA_MOD).
      atl1c driver provided skb with no headroom, so 14 bytes (ethernet
      header) got pulled, but then 16 are copied.
      
      Reserve NET_SKB_PAD bytes headroom, like netdev_alloc_skb().
      
      Compile tested only; I lack hardware.
      
      Fixes: 7b701764 ("atl1c: Fix misuse of netdev_alloc_skb in refilling rx ring")
      Signed-off-by: default avatarFlorian Westphal <fw@strlen.de>
      Reviewed-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      6e568307
    • Hangbin Liu's avatar
      multicast: do not restore deleted record source filter mode to new one · 08d3ffcc
      Hangbin Liu authored
      There are two scenarios that we will restore deleted records. The first is
      when device down and up(or unmap/remap). In this scenario the new filter
      mode is same with previous one. Because we get it from in_dev->mc_list and
      we do not touch it during device down and up.
      
      The other scenario is when a new socket join a group which was just delete
      and not finish sending status reports. In this scenario, we should use the
      current filter mode instead of restore old one. Here are 4 cases in total.
      
      old_socket        new_socket       before_fix       after_fix
        IN(A)             IN(A)           ALLOW(A)         ALLOW(A)
        IN(A)             EX( )           TO_IN( )         TO_EX( )
        EX( )             IN(A)           TO_EX( )         ALLOW(A)
        EX( )             EX( )           TO_EX( )         TO_EX( )
      
      Fixes: 24803f38 (igmp: do not remove igmp souce list info when set link down)
      Fixes: 1666d49e (mld: do not remove mld souce list info when set link down)
      Signed-off-by: default avatarHangbin Liu <liuhangbin@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      08d3ffcc
    • Uwe Kleine-König's avatar
      net: dsa: mv88e6xxx: fix races between lock and irq freeing · 3d82475a
      Uwe Kleine-König authored
      free_irq() waits until all handlers for this IRQ have completed. As the
      relevant handler (mv88e6xxx_g1_irq_thread_fn()) takes the chip's reg_lock
      it might never return if the thread calling free_irq() holds this lock.
      
      For the same reason kthread_cancel_delayed_work_sync() in the polling case
      must not hold this lock.
      
      Also first free the irq (or stop the worker respectively) such that
      mv88e6xxx_g1_irq_thread_work() isn't called any more before the irq
      mappings are dropped in mv88e6xxx_g1_irq_free_common() to prevent the
      worker thread to call handle_nested_irq(0) which results in a NULL-pointer
      exception.
      Signed-off-by: default avatarUwe Kleine-König <u.kleine-koenig@pengutronix.de>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      3d82475a
    • Eric Dumazet's avatar
      net: skb_segment() should not return NULL · ff907a11
      Eric Dumazet authored
      syzbot caught a NULL deref [1], caused by skb_segment()
      
      skb_segment() has many "goto err;" that assume the @err variable
      contains -ENOMEM.
      
      A successful call to __skb_linearize() should not clear @err,
      otherwise a subsequent memory allocation error could return NULL.
      
      While we are at it, we might use -EINVAL instead of -ENOMEM when
      MAX_SKB_FRAGS limit is reached.
      
      [1]
      kasan: CONFIG_KASAN_INLINE enabled
      kasan: GPF could be caused by NULL-ptr deref or user memory access
      general protection fault: 0000 [#1] SMP KASAN
      CPU: 0 PID: 13285 Comm: syz-executor3 Not tainted 4.18.0-rc4+ #146
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      RIP: 0010:tcp_gso_segment+0x3dc/0x1780 net/ipv4/tcp_offload.c:106
      Code: f0 ff ff 0f 87 1c fd ff ff e8 00 88 0b fb 48 8b 75 d0 48 b9 00 00 00 00 00 fc ff df 48 8d be 90 00 00 00 48 89 f8 48 c1 e8 03 <0f> b6 14 08 48 8d 86 94 00 00 00 48 89 c6 83 e0 07 48 c1 ee 03 0f
      RSP: 0018:ffff88019b7fd060 EFLAGS: 00010206
      RAX: 0000000000000012 RBX: 0000000000000020 RCX: dffffc0000000000
      RDX: 0000000000040000 RSI: 0000000000000000 RDI: 0000000000000090
      RBP: ffff88019b7fd0f0 R08: ffff88019510e0c0 R09: ffffed003b5c46d6
      R10: ffffed003b5c46d6 R11: ffff8801dae236b3 R12: 0000000000000001
      R13: ffff8801d6c581f4 R14: 0000000000000000 R15: ffff8801d6c58128
      FS:  00007fcae64d6700(0000) GS:ffff8801dae00000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00000000004e8664 CR3: 00000001b669b000 CR4: 00000000001406f0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Call Trace:
       tcp4_gso_segment+0x1c3/0x440 net/ipv4/tcp_offload.c:54
       inet_gso_segment+0x64e/0x12d0 net/ipv4/af_inet.c:1342
       inet_gso_segment+0x64e/0x12d0 net/ipv4/af_inet.c:1342
       skb_mac_gso_segment+0x3b5/0x740 net/core/dev.c:2792
       __skb_gso_segment+0x3c3/0x880 net/core/dev.c:2865
       skb_gso_segment include/linux/netdevice.h:4099 [inline]
       validate_xmit_skb+0x640/0xf30 net/core/dev.c:3104
       __dev_queue_xmit+0xc14/0x3910 net/core/dev.c:3561
       dev_queue_xmit+0x17/0x20 net/core/dev.c:3602
       neigh_hh_output include/net/neighbour.h:473 [inline]
       neigh_output include/net/neighbour.h:481 [inline]
       ip_finish_output2+0x1063/0x1860 net/ipv4/ip_output.c:229
       ip_finish_output+0x841/0xfa0 net/ipv4/ip_output.c:317
       NF_HOOK_COND include/linux/netfilter.h:276 [inline]
       ip_output+0x223/0x880 net/ipv4/ip_output.c:405
       dst_output include/net/dst.h:444 [inline]
       ip_local_out+0xc5/0x1b0 net/ipv4/ip_output.c:124
       iptunnel_xmit+0x567/0x850 net/ipv4/ip_tunnel_core.c:91
       ip_tunnel_xmit+0x1598/0x3af1 net/ipv4/ip_tunnel.c:778
       ipip_tunnel_xmit+0x264/0x2c0 net/ipv4/ipip.c:308
       __netdev_start_xmit include/linux/netdevice.h:4148 [inline]
       netdev_start_xmit include/linux/netdevice.h:4157 [inline]
       xmit_one net/core/dev.c:3034 [inline]
       dev_hard_start_xmit+0x26c/0xc30 net/core/dev.c:3050
       __dev_queue_xmit+0x29ef/0x3910 net/core/dev.c:3569
       dev_queue_xmit+0x17/0x20 net/core/dev.c:3602
       neigh_direct_output+0x15/0x20 net/core/neighbour.c:1403
       neigh_output include/net/neighbour.h:483 [inline]
       ip_finish_output2+0xa67/0x1860 net/ipv4/ip_output.c:229
       ip_finish_output+0x841/0xfa0 net/ipv4/ip_output.c:317
       NF_HOOK_COND include/linux/netfilter.h:276 [inline]
       ip_output+0x223/0x880 net/ipv4/ip_output.c:405
       dst_output include/net/dst.h:444 [inline]
       ip_local_out+0xc5/0x1b0 net/ipv4/ip_output.c:124
       ip_queue_xmit+0x9df/0x1f80 net/ipv4/ip_output.c:504
       tcp_transmit_skb+0x1bf9/0x3f10 net/ipv4/tcp_output.c:1168
       tcp_write_xmit+0x1641/0x5c20 net/ipv4/tcp_output.c:2363
       __tcp_push_pending_frames+0xb2/0x290 net/ipv4/tcp_output.c:2536
       tcp_push+0x638/0x8c0 net/ipv4/tcp.c:735
       tcp_sendmsg_locked+0x2ec5/0x3f00 net/ipv4/tcp.c:1410
       tcp_sendmsg+0x2f/0x50 net/ipv4/tcp.c:1447
       inet_sendmsg+0x1a1/0x690 net/ipv4/af_inet.c:798
       sock_sendmsg_nosec net/socket.c:641 [inline]
       sock_sendmsg+0xd5/0x120 net/socket.c:651
       __sys_sendto+0x3d7/0x670 net/socket.c:1797
       __do_sys_sendto net/socket.c:1809 [inline]
       __se_sys_sendto net/socket.c:1805 [inline]
       __x64_sys_sendto+0xe1/0x1a0 net/socket.c:1805
       do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      RIP: 0033:0x455ab9
      Code: 1d ba fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 eb b9 fb ff c3 66 2e 0f 1f 84 00 00 00 00
      RSP: 002b:00007fcae64d5c68 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
      RAX: ffffffffffffffda RBX: 00007fcae64d66d4 RCX: 0000000000455ab9
      RDX: 0000000000000001 RSI: 0000000020000200 RDI: 0000000000000013
      RBP: 000000000072bea0 R08: 0000000000000000 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000014
      R13: 00000000004c1145 R14: 00000000004d1818 R15: 0000000000000006
      Modules linked in:
      Dumping ftrace buffer:
         (ftrace buffer empty)
      
      Fixes: ddff00d4 ("net: Move skb_has_shared_frag check out of GRE code and into segmentation")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Cc: Alexander Duyck <alexander.h.duyck@intel.com>
      Reported-by: default avatarsyzbot <syzkaller@googlegroups.com>
      Acked-by: default avatarAlexander Duyck <alexander.h.duyck@intel.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      ff907a11
    • David Ahern's avatar
      net/ipv6: Fix linklocal to global address with VRF · 24b711ed
      David Ahern authored
      Example setup:
          host: ip -6 addr add dev eth1 2001:db8:104::4
                 where eth1 is enslaved to a VRF
      
          switch: ip -6 ro add 2001:db8:104::4/128 dev br1
                  where br1 only has an LLA
      
                 ping6 2001:db8:104::4
                 ssh   2001:db8:104::4
      
      (NOTE: UDP works fine if the PKTINFO has the address set to the global
      address and ifindex is set to the index of eth1 with a destination an
      LLA).
      
      For ICMP, icmp6_iif needs to be updated to check if skb->dev is an
      L3 master. If it is then return the ifindex from rt6i_idev similar
      to what is done for loopback.
      
      For TCP, restore the original tcp_v6_iif definition which is needed in
      most places and add a new tcp_v6_iif_l3_slave that considers the
      l3_slave variability. This latter check is only needed for socket
      lookups.
      
      Fixes: 9ff74384 ("net: vrf: Handle ipv6 multicast and link-local addresses")
      Signed-off-by: default avatarDavid Ahern <dsahern@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      24b711ed
  3. 21 Jul, 2018 10 commits
  4. 20 Jul, 2018 10 commits
    • Doron Roberts-Kedes's avatar
      tls: check RCV_SHUTDOWN in tls_wait_data · fcf4793e
      Doron Roberts-Kedes authored
      The current code does not check sk->sk_shutdown & RCV_SHUTDOWN.
      tls_sw_recvmsg may return a positive value in the case where bytes have
      already been copied when the socket is shutdown. sk->sk_err has been
      cleared, causing the tls_wait_data to hang forever on a subsequent
      invocation. Checking sk->sk_shutdown & RCV_SHUTDOWN, as in tcp_recvmsg,
      fixes this problem.
      
      Fixes: c46234eb ("tls: RX path for ktls")
      Acked-by: default avatarDave Watson <davejwatson@fb.com>
      Signed-off-by: default avatarDoron Roberts-Kedes <doronrk@fb.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      fcf4793e
    • David S. Miller's avatar
      Merge branch 'tcp-fix-DCTCP-ECE-Ack-series' · f7a6eb1e
      David S. Miller authored
      Yuchung Cheng says:
      
      ====================
      fix DCTCP ECE Ack series
      
      This patch set address that the existing DCTCP implementation does not
      fully implement the ACK policy specified in the RFC. This improves
      the responsiveness of CE status change particularly on flows with
      small inflight.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      f7a6eb1e
    • Yuchung Cheng's avatar
      tcp: do not delay ACK in DCTCP upon CE status change · a0496ef2
      Yuchung Cheng authored
      Per DCTCP RFC8257 (Section 3.2) the ACK reflecting the CE status change
      has to be sent immediately so the sender can respond quickly:
      
      """ When receiving packets, the CE codepoint MUST be processed as follows:
      
         1.  If the CE codepoint is set and DCTCP.CE is false, set DCTCP.CE to
             true and send an immediate ACK.
      
         2.  If the CE codepoint is not set and DCTCP.CE is true, set DCTCP.CE
             to false and send an immediate ACK.
      """
      
      Previously DCTCP implementation may continue to delay the ACK. This
      patch fixes that to implement the RFC by forcing an immediate ACK.
      
      Tested with this packetdrill script provided by Larry Brakmo
      
      0.000 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
      0.000 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
      0.000 setsockopt(3, SOL_TCP, TCP_CONGESTION, "dctcp", 5) = 0
      0.000 bind(3, ..., ...) = 0
      0.000 listen(3, 1) = 0
      
      0.100 < [ect0] SEW 0:0(0) win 32792 <mss 1000,sackOK,nop,nop,nop,wscale 7>
      0.100 > SE. 0:0(0) ack 1 <mss 1460,nop,nop,sackOK,nop,wscale 8>
      0.110 < [ect0] . 1:1(0) ack 1 win 257
      0.200 accept(3, ..., ...) = 4
         +0 setsockopt(4, SOL_SOCKET, SO_DEBUG, [1], 4) = 0
      
      0.200 < [ect0] . 1:1001(1000) ack 1 win 257
      0.200 > [ect01] . 1:1(0) ack 1001
      
      0.200 write(4, ..., 1) = 1
      0.200 > [ect01] P. 1:2(1) ack 1001
      
      0.200 < [ect0] . 1001:2001(1000) ack 2 win 257
      +0.005 < [ce] . 2001:3001(1000) ack 2 win 257
      
      +0.000 > [ect01] . 2:2(0) ack 2001
      // Previously the ACK below would be delayed by 40ms
      +0.000 > [ect01] E. 2:2(0) ack 3001
      
      +0.500 < F. 9501:9501(0) ack 4 win 257
      Signed-off-by: default avatarYuchung Cheng <ycheng@google.com>
      Acked-by: default avatarNeal Cardwell <ncardwell@google.com>
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      a0496ef2
    • Yuchung Cheng's avatar
      tcp: do not cancel delay-AcK on DCTCP special ACK · 27cde44a
      Yuchung Cheng authored
      Currently when a DCTCP receiver delays an ACK and receive a
      data packet with a different CE mark from the previous one's, it
      sends two immediate ACKs acking previous and latest sequences
      respectly (for ECN accounting).
      
      Previously sending the first ACK may mark off the delayed ACK timer
      (tcp_event_ack_sent). This may subsequently prevent sending the
      second ACK to acknowledge the latest sequence (tcp_ack_snd_check).
      The culprit is that tcp_send_ack() assumes it always acknowleges
      the latest sequence, which is not true for the first special ACK.
      
      The fix is to not make the assumption in tcp_send_ack and check the
      actual ack sequence before cancelling the delayed ACK. Further it's
      safer to pass the ack sequence number as a local variable into
      tcp_send_ack routine, instead of intercepting tp->rcv_nxt to avoid
      future bugs like this.
      Reported-by: default avatarNeal Cardwell <ncardwell@google.com>
      Signed-off-by: default avatarYuchung Cheng <ycheng@google.com>
      Acked-by: default avatarNeal Cardwell <ncardwell@google.com>
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      27cde44a
    • Yuchung Cheng's avatar
      tcp: helpers to send special DCTCP ack · 2987babb
      Yuchung Cheng authored
      Refactor and create helpers to send the special ACK in DCTCP.
      Signed-off-by: default avatarYuchung Cheng <ycheng@google.com>
      Acked-by: default avatarNeal Cardwell <ncardwell@google.com>
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      2987babb
    • Martin KaFai Lau's avatar
      bpf: Use option "help" in the llvm-objcopy test · 7c3e8b64
      Martin KaFai Lau authored
      I noticed the "--version" option of the llvm-objcopy command has recently
      disappeared from the master llvm branch.  It is currently used as a BTF
      support test in tools/testing/selftests/bpf/Makefile.
      
      This patch replaces it with "--help" which should be
      less error prone in the future.
      
      Fixes: c0fa1b6c ("bpf: btf: Add BTF tests")
      Signed-off-by: default avatarMartin KaFai Lau <kafai@fb.com>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      7c3e8b64
    • Martin KaFai Lau's avatar
      bpf: btf: Clean up BTF_INT_BITS() in uapi btf.h · 36fc3c8c
      Martin KaFai Lau authored
      This patch shrinks the BTF_INT_BITS() mask.  The current
      btf_int_check_meta() ensures the nr_bits of an integer
      cannot exceed 64.  Hence, it is mostly an uapi cleanup.
      
      The actual btf usage (i.e. seq_show()) is also modified
      to use u8 instead of u16.  The verification (e.g. btf_int_check_meta())
      path stays as is to deal with invalid BTF situation.
      
      Fixes: 69b693f0 ("bpf: btf: Introduce BPF Type Format (BTF)")
      Signed-off-by: default avatarMartin KaFai Lau <kafai@fb.com>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      36fc3c8c
    • Taeung Song's avatar
      tools/bpftool: Fix segfault case regarding 'pin' arguments · 759b94a0
      Taeung Song authored
      Arguments of 'pin' subcommand should be checked
      at the very beginning of do_pin_any().
      Otherwise segfault errors can occur when using
      'map pin' or 'prog pin' commands, so fix it.
      
        # bpftool prog pin id
        Segmentation fault
      
      Fixes: 71bb428f ("tools: bpf: add bpftool")
      Reviewed-by: default avatarJakub Kicinski <jakub.kicinski@netronome.com>
      Reported-by: default avatarTaehee Yoo <ap420073@gmail.com>
      Signed-off-by: default avatarTaeung Song <treeze.taeung@gmail.com>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      759b94a0
    • Zhao Chen's avatar
      net-next/hinic: fix a problem in hinic_xmit_frame() · f7482683
      Zhao Chen authored
      The calculation of "wqe_size" is not correct when the tx queue is busy in
      hinic_xmit_frame().
      
      When there are no free WQEs, the tx flow will unmap the skb buffer, then
      ring the doobell for the pending packets. But the "wqe_size" which used
      to calculate the doorbell address is not correct. The wqe size should be
      cleared to 0, otherwise, it will cause a doorbell error.
      
      This patch fixes the problem.
      Reported-by: default avatarZhou Wang <wangzhou1@hisilicon.com>
      Signed-off-by: default avatarZhao Chen <zhaochen6@huawei.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      f7482683
    • Tariq Toukan's avatar
      net/page_pool: Fix inconsistent lock state warning · 4905bd9a
      Tariq Toukan authored
      Fix the warning below by calling the ptr_ring_consume_bh,
      which uses spin_[un]lock_bh.
      
      [  179.064300] ================================
      [  179.069073] WARNING: inconsistent lock state
      [  179.073846] 4.18.0-rc2+ #18 Not tainted
      [  179.078133] --------------------------------
      [  179.082907] inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
      [  179.089637] swapper/21/0 [HC0[0]:SC1[1]:HE1:SE0] takes:
      [  179.095478] 00000000963d1995 (&(&r->consumer_lock)->rlock){+.?.}, at:
      __page_pool_empty_ring+0x61/0x100
      [  179.105988] {SOFTIRQ-ON-W} state was registered at:
      [  179.111443]   _raw_spin_lock+0x35/0x50
      [  179.115634]   __page_pool_empty_ring+0x61/0x100
      [  179.120699]   page_pool_destroy+0x32/0x50
      [  179.125204]   mlx5e_free_rq+0x38/0xc0 [mlx5_core]
      [  179.130471]   mlx5e_close_channel+0x20/0x120 [mlx5_core]
      [  179.136418]   mlx5e_close_channels+0x26/0x40 [mlx5_core]
      [  179.142364]   mlx5e_close_locked+0x44/0x50 [mlx5_core]
      [  179.148509]   mlx5e_close+0x42/0x60 [mlx5_core]
      [  179.153936]   __dev_close_many+0xb1/0x120
      [  179.158749]   dev_close_many+0xa2/0x170
      [  179.163364]   rollback_registered_many+0x148/0x460
      [  179.169047]   rollback_registered+0x56/0x90
      [  179.174043]   unregister_netdevice_queue+0x7e/0x100
      [  179.179816]   unregister_netdev+0x18/0x20
      [  179.184623]   mlx5e_remove+0x2a/0x50 [mlx5_core]
      [  179.190107]   mlx5_remove_device+0xe5/0x110 [mlx5_core]
      [  179.196274]   mlx5_unregister_interface+0x39/0x90 [mlx5_core]
      [  179.203028]   cleanup+0x5/0xbfc [mlx5_core]
      [  179.208031]   __x64_sys_delete_module+0x16b/0x240
      [  179.213640]   do_syscall_64+0x5a/0x210
      [  179.218151]   entry_SYSCALL_64_after_hwframe+0x49/0xbe
      [  179.224218] irq event stamp: 334398
      [  179.228438] hardirqs last  enabled at (334398): [<ffffffffa511d8b7>]
      rcu_process_callbacks+0x1c7/0x790
      [  179.239178] hardirqs last disabled at (334397): [<ffffffffa511d872>]
      rcu_process_callbacks+0x182/0x790
      [  179.249931] softirqs last  enabled at (334386): [<ffffffffa509732e>] irq_enter+0x5e/0x70
      [  179.259306] softirqs last disabled at (334387): [<ffffffffa509741c>] irq_exit+0xdc/0xf0
      [  179.268584]
      [  179.268584] other info that might help us debug this:
      [  179.276572]  Possible unsafe locking scenario:
      [  179.276572]
      [  179.283877]        CPU0
      [  179.286954]        ----
      [  179.290033]   lock(&(&r->consumer_lock)->rlock);
      [  179.295546]   <Interrupt>
      [  179.298830]     lock(&(&r->consumer_lock)->rlock);
      [  179.304550]
      [  179.304550]  *** DEADLOCK ***
      
      Fixes: ff7d6b27 ("page_pool: refurbish version of page_pool code")
      Signed-off-by: default avatarTariq Toukan <tariqt@mellanox.com>
      Cc: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      4905bd9a
  5. 19 Jul, 2018 3 commits