1. 15 Dec, 2019 10 commits
    • Vishal Kulkarni's avatar
      cxgb4: Fix kernel panic while accessing sge_info · 479a0d13
      Vishal Kulkarni authored
      The sge_info debugfs collects offload queue info even when offload
      capability is disabled and leads to panic.
      
      [  144.139871] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  144.139874] CR2: 0000000000000000 CR3: 000000082d456005 CR4: 00000000001606e0
      [  144.139876] Call Trace:
      [  144.139887]  sge_queue_start+0x12/0x30 [cxgb4]
      [  144.139897]  seq_read+0x1d4/0x3d0
      [  144.139906]  full_proxy_read+0x50/0x70
      [  144.139913]  vfs_read+0x89/0x140
      [  144.139916]  ksys_read+0x55/0xd0
      [  144.139924]  do_syscall_64+0x5b/0x1d0
      [  144.139933]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
      [  144.139936] RIP: 0033:0x7f4b01493990
      
      Fix this crash by skipping the offload queue access in sge_qinfo when
      offload capability is disabled
      Signed-off-by: default avatarHerat Ramani <herat@chelsio.com>
      Signed-off-by: default avatarVishal Kulkarni <vishal@chelsio.com>
      Signed-off-by: default avatarJakub Kicinski <jakub.kicinski@netronome.com>
      479a0d13
    • Ursula Braun's avatar
      net/smc: add fallback check to connect() · 86434744
      Ursula Braun authored
      FASTOPEN setsockopt() or sendmsg() may switch the SMC socket to fallback
      mode. Once fallback mode is active, the native TCP socket functions are
      called. Nevertheless there is a small race window, when FASTOPEN
      setsockopt/sendmsg runs in parallel to a connect(), and switch the
      socket into fallback mode before connect() takes the sock lock.
      Make sure the SMC-specific connect setup is omitted in this case.
      
      This way a syzbot-reported refcount problem is fixed, triggered by
      different threads running non-blocking connect() and FASTOPEN_KEY
      setsockopt.
      
      Reported-by: syzbot+96d3f9ff6a86d37e44c8@syzkaller.appspotmail.com
      Fixes: 6d6dd528 ("net/smc: fix refcount non-blocking connect() -part 2")
      Signed-off-by: default avatarUrsula Braun <ubraun@linux.ibm.com>
      Signed-off-by: default avatarKarsten Graul <kgraul@linux.ibm.com>
      Signed-off-by: default avatarJakub Kicinski <jakub.kicinski@netronome.com>
      86434744
    • Russell King's avatar
      net: phylink: fix interface passed to mac_link_up · 9b2079c0
      Russell King authored
      A mismerge between the following two commits:
      
      c6787263 ("net: phylink: ensure consistent phy interface mode")
      27755ff8 ("net: phylink: Add phylink_mac_link_{up, down} wrapper functions")
      
      resulted in the wrong interface being passed to the mac_link_up()
      function. Fix this up.
      
      Fixes: b4b12b0d ("Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net")
      Signed-off-by: default avatarRussell King <rmk+kernel@armlinux.org.uk>
      Signed-off-by: default avatarJakub Kicinski <jakub.kicinski@netronome.com>
      9b2079c0
    • Thadeu Lima de Souza Cascardo's avatar
      selftests: net: tls: remove recv_rcvbuf test · 6dd504b0
      Thadeu Lima de Souza Cascardo authored
      This test only works when [1] is applied, which was rejected.
      
      Basically, the errors are reported and cleared. In this particular case of
      tls sockets, following reads will block.
      
      The test case was originally submitted with the rejected patch, but, then,
      was included as part of a different patchset, possibly by mistake.
      
      [1] https://lore.kernel.org/netdev/20191007035323.4360-2-jakub.kicinski@netronome.com/#t
      
      Thanks Paolo Pisati for pointing out the original patchset where this
      appeared.
      
      Fixes: 65190f77 (selftests/tls: add a test for fragmented messages)
      Reported-by: default avatarPaolo Pisati <paolo.pisati@canonical.com>
      Signed-off-by: default avatarThadeu Lima de Souza Cascardo <cascardo@canonical.com>
      Signed-off-by: default avatarJakub Kicinski <jakub.kicinski@netronome.com>
      6dd504b0
    • Jakub Kicinski's avatar
      Merge branch 'gtp-fix-several-bugs-in-gtp-module' · 8ed001c9
      Jakub Kicinski authored
      Taehee Yoo says:
      
      ====================
      gtp: fix several bugs in gtp module
      
      This patchset fixes several bugs in the GTP module.
      
      1. Do not allow adding duplicate TID and ms_addr pdp context.
      In the current code, duplicate TID and ms_addr pdp context could be added.
      So, RX and TX path could find correct pdp context.
      
      2. Fix wrong condition in ->dumpit() callback.
      ->dumpit() callback is re-called if dump packet size is too big.
      So, before return, it saves last position and then restart from
      last dump position.
      TID value is used to find last dump position.
      GTP module allows adding zero TID value. But ->dumpit() callback ignores
      zero TID value.
      So, dump would not work correctly if dump packet size too big.
      
      3. Fix use-after-free in ipv4_pdp_find().
      RX and TX patch always uses gtp->tid_hash and gtp->addr_hash.
      but while packet processing, these hash pointer would be freed.
      So, use-after-free would occur.
      
      4. Fix panic because of zero size hashtable
      GTP hashtable size could be set by user-space.
      If hashsize is set to 0, hashtable will not work and panic will occur.
      ====================
      Signed-off-by: default avatarJakub Kicinski <jakub.kicinski@netronome.com>
      8ed001c9
    • Taehee Yoo's avatar
      gtp: avoid zero size hashtable · 6a902c0f
      Taehee Yoo authored
      GTP default hashtable size is 1024 and userspace could set specific
      hashtable size with IFLA_GTP_PDP_HASHSIZE. If hashtable size is set to 0
      from userspace,  hashtable will not work and panic will occur.
      
      Fixes: 459aa660 ("gtp: add initial driver for datapath of GPRS Tunneling Protocol (GTP-U)")
      Signed-off-by: default avatarTaehee Yoo <ap420073@gmail.com>
      Signed-off-by: default avatarJakub Kicinski <jakub.kicinski@netronome.com>
      6a902c0f
    • Taehee Yoo's avatar
      gtp: fix an use-after-free in ipv4_pdp_find() · 94dc550a
      Taehee Yoo authored
      ipv4_pdp_find() is called in TX packet path of GTP.
      ipv4_pdp_find() internally uses gtp->tid_hash to lookup pdp context.
      In the current code, gtp->tid_hash and gtp->addr_hash are freed by
      ->dellink(), which is gtp_dellink().
      But gtp_dellink() would be called while packets are processing.
      So, gtp_dellink() should not free gtp->tid_hash and gtp->addr_hash.
      Instead, dev->priv_destructor() would be used because this callback
      is called after all packet processing safely.
      
      Test commands:
          ip link add veth1 type veth peer name veth2
          ip a a 172.0.0.1/24 dev veth1
          ip link set veth1 up
          ip a a 172.99.0.1/32 dev lo
      
          gtp-link add gtp1 &
      
          gtp-tunnel add gtp1 v1 200 100 172.99.0.2 172.0.0.2
          ip r a  172.99.0.2/32 dev gtp1
          ip link set gtp1 mtu 1500
      
          ip netns add ns2
          ip link set veth2 netns ns2
          ip netns exec ns2 ip a a 172.0.0.2/24 dev veth2
          ip netns exec ns2 ip link set veth2 up
          ip netns exec ns2 ip a a 172.99.0.2/32 dev lo
          ip netns exec ns2 ip link set lo up
      
          ip netns exec ns2 gtp-link add gtp2 &
          ip netns exec ns2 gtp-tunnel add gtp2 v1 100 200 172.99.0.1 172.0.0.1
          ip netns exec ns2 ip r a 172.99.0.1/32 dev gtp2
          ip netns exec ns2 ip link set gtp2 mtu 1500
      
          hping3 172.99.0.2 -2 --flood &
          ip link del gtp1
      
      Splat looks like:
      [   72.568081][ T1195] BUG: KASAN: use-after-free in ipv4_pdp_find.isra.12+0x130/0x170 [gtp]
      [   72.568916][ T1195] Read of size 8 at addr ffff8880b9a35d28 by task hping3/1195
      [   72.569631][ T1195]
      [   72.569861][ T1195] CPU: 2 PID: 1195 Comm: hping3 Not tainted 5.5.0-rc1 #199
      [   72.570547][ T1195] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
      [   72.571438][ T1195] Call Trace:
      [   72.571764][ T1195]  dump_stack+0x96/0xdb
      [   72.572171][ T1195]  ? ipv4_pdp_find.isra.12+0x130/0x170 [gtp]
      [   72.572761][ T1195]  print_address_description.constprop.5+0x1be/0x360
      [   72.573400][ T1195]  ? ipv4_pdp_find.isra.12+0x130/0x170 [gtp]
      [   72.573971][ T1195]  ? ipv4_pdp_find.isra.12+0x130/0x170 [gtp]
      [   72.574544][ T1195]  __kasan_report+0x12a/0x16f
      [   72.575014][ T1195]  ? ipv4_pdp_find.isra.12+0x130/0x170 [gtp]
      [   72.575593][ T1195]  kasan_report+0xe/0x20
      [   72.576004][ T1195]  ipv4_pdp_find.isra.12+0x130/0x170 [gtp]
      [   72.576577][ T1195]  gtp_build_skb_ip4+0x199/0x1420 [gtp]
      [ ... ]
      [   72.647671][ T1195] BUG: unable to handle page fault for address: ffff8880b9a35d28
      [   72.648512][ T1195] #PF: supervisor read access in kernel mode
      [   72.649158][ T1195] #PF: error_code(0x0000) - not-present page
      [   72.649849][ T1195] PGD a6c01067 P4D a6c01067 PUD 11fb07067 PMD 11f939067 PTE 800fffff465ca060
      [   72.652958][ T1195] Oops: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN PTI
      [   72.653834][ T1195] CPU: 2 PID: 1195 Comm: hping3 Tainted: G    B             5.5.0-rc1 #199
      [   72.668062][ T1195] RIP: 0010:ipv4_pdp_find.isra.12+0x86/0x170 [gtp]
      [ ... ]
      [   72.679168][ T1195] Call Trace:
      [   72.679603][ T1195]  gtp_build_skb_ip4+0x199/0x1420 [gtp]
      [   72.681915][ T1195]  ? ipv4_pdp_find.isra.12+0x170/0x170 [gtp]
      [   72.682513][ T1195]  ? lock_acquire+0x164/0x3b0
      [   72.682966][ T1195]  ? gtp_dev_xmit+0x35e/0x890 [gtp]
      [   72.683481][ T1195]  gtp_dev_xmit+0x3c2/0x890 [gtp]
      [ ... ]
      
      Fixes: 459aa660 ("gtp: add initial driver for datapath of GPRS Tunneling Protocol (GTP-U)")
      Signed-off-by: default avatarTaehee Yoo <ap420073@gmail.com>
      Signed-off-by: default avatarJakub Kicinski <jakub.kicinski@netronome.com>
      94dc550a
    • Taehee Yoo's avatar
      gtp: fix wrong condition in gtp_genl_dump_pdp() · 94a6d9fb
      Taehee Yoo authored
      gtp_genl_dump_pdp() is ->dumpit() callback of GTP module and it is used
      to dump pdp contexts. it would be re-executed because of dump packet size.
      
      If dump packet size is too big, it saves current dump pointer
      (gtp interface pointer, bucket, TID value) then it restarts dump from
      last pointer.
      Current GTP code allows adding zero TID pdp context but dump code
      ignores zero TID value. So, last dump pointer will not be found.
      
      In addition, this patch adds missing rcu_read_lock() in
      gtp_genl_dump_pdp().
      
      Fixes: 459aa660 ("gtp: add initial driver for datapath of GPRS Tunneling Protocol (GTP-U)")
      Signed-off-by: default avatarTaehee Yoo <ap420073@gmail.com>
      Signed-off-by: default avatarJakub Kicinski <jakub.kicinski@netronome.com>
      94a6d9fb
    • Taehee Yoo's avatar
      gtp: do not allow adding duplicate tid and ms_addr pdp context · 6b01b1d9
      Taehee Yoo authored
      GTP RX packet path lookups pdp context with TID. If duplicate TID pdp
      contexts are existing in the list, it couldn't select correct pdp context.
      So, TID value  should be unique.
      GTP TX packet path lookups pdp context with ms_addr. If duplicate ms_addr pdp
      contexts are existing in the list, it couldn't select correct pdp context.
      So, ms_addr value should be unique.
      
      Fixes: 459aa660 ("gtp: add initial driver for datapath of GPRS Tunneling Protocol (GTP-U)")
      Signed-off-by: default avatarTaehee Yoo <ap420073@gmail.com>
      Signed-off-by: default avatarJakub Kicinski <jakub.kicinski@netronome.com>
      6b01b1d9
    • Mahesh Bandewar's avatar
      bonding: fix active-backup transition after link failure · 5d485ed8
      Mahesh Bandewar authored
      After the recent fix in commit 1899bb32 ("bonding: fix state
      transition issue in link monitoring"), the active-backup mode with
      miimon initially come-up fine but after a link-failure, both members
      transition into backup state.
      
      Following steps to reproduce the scenario (eth1 and eth2 are the
      slaves of the bond):
      
          ip link set eth1 up
          ip link set eth2 down
          sleep 1
          ip link set eth2 up
          ip link set eth1 down
          cat /sys/class/net/eth1/bonding_slave/state
          cat /sys/class/net/eth2/bonding_slave/state
      
      Fixes: 1899bb32 ("bonding: fix state transition issue in link monitoring")
      CC: Jay Vosburgh <jay.vosburgh@canonical.com>
      Signed-off-by: default avatarMahesh Bandewar <maheshb@google.com>
      Acked-by: default avatarJay Vosburgh <jay.vosburgh@canonical.com>
      Signed-off-by: default avatarJakub Kicinski <jakub.kicinski@netronome.com>
      5d485ed8
  2. 14 Dec, 2019 15 commits
    • Jakub Kicinski's avatar
      Merge branch 'bnx2x-bug-fixes' · 7ae1629d
      Jakub Kicinski authored
      Manish Chopra says:
      
      ====================
      bnx2x: bug fixes
      
      This series has two driver changes, one to fix some unexpected
      hardware behaviour casued during the parity error recovery in
      presence of SR-IOV VFs and another one related for fixing resource
      management in the driver among the PFs configured on an engine.
      
      Please consider applying it to "net".
      
      V1->V2:
      =======
      Fix the compilation errors reported by kbuild test robot
      on the patch #1 with CONFIG_BNX2X_SRIOV=n
      ====================
      Signed-off-by: default avatarJakub Kicinski <jakub.kicinski@netronome.com>
      7ae1629d
    • Manish Chopra's avatar
      bnx2x: Fix logic to get total no. of PFs per engine · ee699f89
      Manish Chopra authored
      Driver doesn't calculate total number of PFs configured on a
      given engine correctly which messed up resources in the PFs
      loaded on that engine, leading driver to exceed configuration
      of resources (like vlan filters etc.) beyond the limit per
      engine, which ended up with asserts from the firmware.
      Signed-off-by: default avatarManish Chopra <manishc@marvell.com>
      Signed-off-by: default avatarAriel Elior <aelior@marvell.com>
      Signed-off-by: default avatarJakub Kicinski <jakub.kicinski@netronome.com>
      ee699f89
    • Manish Chopra's avatar
      bnx2x: Do not handle requests from VFs after parity · 7113f796
      Manish Chopra authored
      Parity error from the hardware will cause PF to lose the state
      of their VFs due to PF's internal reload and hardware reset following
      the parity error. Restrict any configuration request from the VFs after
      the parity as it could cause unexpected hardware behavior, only way
      for VFs to recover would be to trigger FLR on VFs and reload them.
      Signed-off-by: default avatarManish Chopra <manishc@marvell.com>
      Signed-off-by: default avatarAriel Elior <aelior@marvell.com>
      Signed-off-by: default avatarJakub Kicinski <jakub.kicinski@netronome.com>
      7113f796
    • Arnd Bergmann's avatar
      net: ethernet: ti: build cpsw-common for switchdev · ed56dd8f
      Arnd Bergmann authored
      Without the common part of the driver, the new file fails to link:
      
      drivers/net/ethernet/ti/cpsw_new.o: In function `cpsw_probe':
      cpsw_new.c:(.text+0x312c): undefined reference to `ti_cm_get_macid'
      
      Use the same Makefile hack as before, and build cpsw-common.o for
      any driver that needs it.
      
      Fixes: ed3525ed ("net: ethernet: ti: introduce cpsw switchdev based driver part 1 - dual-emac")
      Signed-off-by: default avatarArnd Bergmann <arnd@arndb.de>
      Reviewed-by: default avatarGrygorii Strashko <grygorii.strashko@ti.com>
      Signed-off-by: default avatarJakub Kicinski <jakub.kicinski@netronome.com>
      ed56dd8f
    • Arnd Bergmann's avatar
      net: ethernet: ti: select PAGE_POOL for switchdev driver · 99e9fe22
      Arnd Bergmann authored
      The new driver misses a dependency:
      
      drivers/net/ethernet/ti/cpsw_new.o: In function `cpsw_rx_handler':
      cpsw_new.c:(.text+0x259c): undefined reference to `__page_pool_put_page'
      cpsw_new.c:(.text+0x25d0): undefined reference to `page_pool_alloc_pages'
      drivers/net/ethernet/ti/cpsw_priv.o: In function `cpsw_fill_rx_channels':
      cpsw_priv.c:(.text+0x22d8): undefined reference to `page_pool_alloc_pages'
      cpsw_priv.c:(.text+0x2420): undefined reference to `__page_pool_put_page'
      drivers/net/ethernet/ti/cpsw_priv.o: In function `cpsw_create_xdp_rxqs':
      cpsw_priv.c:(.text+0x2624): undefined reference to `page_pool_create'
      drivers/net/ethernet/ti/cpsw_priv.o: In function `cpsw_run_xdp':
      cpsw_priv.c:(.text+0x2dc8): undefined reference to `__page_pool_put_page'
      
      Other drivers use 'select' for PAGE_POOL, so do the same here.
      
      Fixes: ed3525ed ("net: ethernet: ti: introduce cpsw switchdev based driver part 1 - dual-emac")
      Signed-off-by: default avatarArnd Bergmann <arnd@arndb.de>
      Acked-by: default avatarIlias Apalodimas <ilias.apalodimas@linaro.org>
      Acked-by: default avatarJesper Dangaard Brouer <brouer@redhat.com>
      Reviewed-by: default avatarGrygorii Strashko <grygorii.strashko@ti.com>
      Signed-off-by: default avatarJakub Kicinski <jakub.kicinski@netronome.com>
      99e9fe22
    • Haiyang Zhang's avatar
      hv_netvsc: Fix tx_table init in rndis_set_subchannel() · c39ea5cb
      Haiyang Zhang authored
      Host can provide send indirection table messages anytime after RSS is
      enabled by calling rndis_filter_set_rss_param(). So the host provided
      table values may be overwritten by the initialization in
      rndis_set_subchannel().
      
      To prevent this problem, move the tx_table initialization before calling
      rndis_filter_set_rss_param().
      
      Fixes: a6fb6aa3 ("hv_netvsc: Set tx_table to equal weight after subchannels open")
      Signed-off-by: default avatarHaiyang Zhang <haiyangz@microsoft.com>
      Signed-off-by: default avatarJakub Kicinski <jakub.kicinski@netronome.com>
      c39ea5cb
    • Russell King's avatar
      net: marvell: mvpp2: phylink requires the link interrupt · f3f2364e
      Russell King authored
      phylink requires the MAC to report when its link status changes when
      operating in inband modes.  Failure to report link status changes
      means that phylink has no idea when the link events happen, which
      results in either the network interface's carrier remaining up or
      remaining permanently down.
      
      For example, with a fiber module, if the interface is brought up and
      link is initially established, taking the link down at the far end
      will cut the optical power.  The SFP module's LOS asserts, we
      deactivate the link, and the network interface reports no carrier.
      
      When the far end is brought back up, the SFP module's LOS deasserts,
      but the MAC may be slower to establish link.  If this happens (which
      in my tests is a certainty) then phylink never hears that the MAC
      has established link with the far end, and the network interface is
      stuck reporting no carrier.  This means the interface is
      non-functional.
      
      Avoiding the link interrupt when we have phylink is basically not
      an option, so remove the !port->phylink from the test.
      
      Fixes: 4bb04326 ("net: mvpp2: phylink support")
      Tested-by: default avatarSven Auhagen <sven.auhagen@voleatech.de>
      Tested-by: default avatarAntoine Tenart <antoine.tenart@bootlin.com>
      Signed-off-by: default avatarRussell King <rmk+kernel@armlinux.org.uk>
      Signed-off-by: default avatarJakub Kicinski <jakub.kicinski@netronome.com>
      f3f2364e
    • Jakub Kicinski's avatar
      Merge branch 'tcp-take-care-of-empty-skbs-in-write-queue' · cd1263b6
      Jakub Kicinski authored
      Eric Dumazet says:
      ====================
      tcp: take care of empty skbs in write queue
      
      We understood recently that TCP sockets could have an empty
      skb at the tail of the write queue, leading to various problems.
      
      This patch series :
      
      1) Make sure we do not send an empty packet since this
         was unintended and causing crashes in old kernels.
      
      2) Change tcp_write_queue_empty() to not be fooled by
         the presence of an empty skb.
      
      3) Fix a bug that could trigger suboptimal epoll()
         application behavior under memory pressure.
      ====================
      Signed-off-by: default avatarJakub Kicinski <jakub.kicinski@netronome.com>
      cd1263b6
    • Eric Dumazet's avatar
      tcp: refine rule to allow EPOLLOUT generation under mem pressure · 216808c6
      Eric Dumazet authored
      At the time commit ce5ec440 ("tcp: ensure epoll edge trigger
      wakeup when write queue is empty") was added to the kernel,
      we still had a single write queue, combining rtx and write queues.
      
      Once we moved the rtx queue into a separate rb-tree, testing
      if sk_write_queue is empty has been suboptimal.
      
      Indeed, if we have packets in the rtx queue, we probably want
      to delay the EPOLLOUT generation at the time incoming packets
      will free them, making room, but more importantly avoiding
      flooding application with EPOLLOUT events.
      
      Solution is to use tcp_rtx_and_write_queues_empty() helper.
      
      Fixes: 75c119af ("tcp: implement rb-tree based retransmit queue")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Cc: Jason Baron <jbaron@akamai.com>
      Cc: Neal Cardwell <ncardwell@google.com>
      Acked-by: default avatarSoheil Hassas Yeganeh <soheil@google.com>
      Signed-off-by: default avatarJakub Kicinski <jakub.kicinski@netronome.com>
      216808c6
    • Eric Dumazet's avatar
      tcp: refine tcp_write_queue_empty() implementation · ee2aabd3
      Eric Dumazet authored
      Due to how tcp_sendmsg() is implemented, we can have an empty
      skb at the tail of the write queue.
      
      Most [1] tcp_write_queue_empty() callers want to know if there is
      anything to send (payload and/or FIN)
      
      Instead of checking if the sk_write_queue is empty, we need
      to test if tp->write_seq == tp->snd_nxt
      
      [1] tcp_send_fin() was the only caller that expected to
       see if an skb was in the write queue, I have changed the code
       to reuse the tcp_write_queue_tail() result.
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Cc: Neal Cardwell <ncardwell@google.com>
      Acked-by: default avatarSoheil Hassas Yeganeh <soheil@google.com>
      Signed-off-by: default avatarJakub Kicinski <jakub.kicinski@netronome.com>
      ee2aabd3
    • Eric Dumazet's avatar
      tcp: do not send empty skb from tcp_write_xmit() · 1f85e626
      Eric Dumazet authored
      Backport of commit fdfc5c85 ("tcp: remove empty skb from
      write queue in error cases") in linux-4.14 stable triggered
      various bugs. One of them has been fixed in commit ba2ddb43f270
      ("tcp: Don't dequeue SYN/FIN-segments from write-queue"), but
      we still have crashes in some occasions.
      
      Root-cause is that when tcp_sendmsg() has allocated a fresh
      skb and could not append a fragment before being blocked
      in sk_stream_wait_memory(), tcp_write_xmit() might be called
      and decide to send this fresh and empty skb.
      
      Sending an empty packet is not only silly, it might have caused
      many issues we had in the past with tp->packets_out being
      out of sync.
      
      Fixes: c65f7f00 ("[TCP]: Simplify SKB data portion allocation with NETIF_F_SG.")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Cc: Christoph Paasch <cpaasch@apple.com>
      Acked-by: default avatarNeal Cardwell <ncardwell@google.com>
      Cc: Jason Baron <jbaron@akamai.com>
      Acked-by: default avatarSoheil Hassas Yeganeh <soheil@google.com>
      Signed-off-by: default avatarJakub Kicinski <jakub.kicinski@netronome.com>
      1f85e626
    • Eric Dumazet's avatar
      6pack,mkiss: fix possible deadlock · 5c9934b6
      Eric Dumazet authored
      We got another syzbot report [1] that tells us we must use
      write_lock_irq()/write_unlock_irq() to avoid possible deadlock.
      
      [1]
      
      WARNING: inconsistent lock state
      5.5.0-rc1-syzkaller #0 Not tainted
      --------------------------------
      inconsistent {HARDIRQ-ON-W} -> {IN-HARDIRQ-R} usage.
      syz-executor826/9605 [HC1[1]:SC0[0]:HE0:SE1] takes:
      ffffffff8a128718 (disc_data_lock){+-..}, at: sp_get.isra.0+0x1d/0xf0 drivers/net/ppp/ppp_synctty.c:138
      {HARDIRQ-ON-W} state was registered at:
        lock_acquire+0x190/0x410 kernel/locking/lockdep.c:4485
        __raw_write_lock_bh include/linux/rwlock_api_smp.h:203 [inline]
        _raw_write_lock_bh+0x33/0x50 kernel/locking/spinlock.c:319
        sixpack_close+0x1d/0x250 drivers/net/hamradio/6pack.c:657
        tty_ldisc_close.isra.0+0x119/0x1a0 drivers/tty/tty_ldisc.c:489
        tty_set_ldisc+0x230/0x6b0 drivers/tty/tty_ldisc.c:585
        tiocsetd drivers/tty/tty_io.c:2337 [inline]
        tty_ioctl+0xe8d/0x14f0 drivers/tty/tty_io.c:2597
        vfs_ioctl fs/ioctl.c:47 [inline]
        file_ioctl fs/ioctl.c:545 [inline]
        do_vfs_ioctl+0x977/0x14e0 fs/ioctl.c:732
        ksys_ioctl+0xab/0xd0 fs/ioctl.c:749
        __do_sys_ioctl fs/ioctl.c:756 [inline]
        __se_sys_ioctl fs/ioctl.c:754 [inline]
        __x64_sys_ioctl+0x73/0xb0 fs/ioctl.c:754
        do_syscall_64+0xfa/0x790 arch/x86/entry/common.c:294
        entry_SYSCALL_64_after_hwframe+0x49/0xbe
      irq event stamp: 3946
      hardirqs last  enabled at (3945): [<ffffffff87c86e43>] __raw_spin_unlock_irq include/linux/spinlock_api_smp.h:168 [inline]
      hardirqs last  enabled at (3945): [<ffffffff87c86e43>] _raw_spin_unlock_irq+0x23/0x80 kernel/locking/spinlock.c:199
      hardirqs last disabled at (3946): [<ffffffff8100675f>] trace_hardirqs_off_thunk+0x1a/0x1c arch/x86/entry/thunk_64.S:42
      softirqs last  enabled at (2658): [<ffffffff86a8b4df>] spin_unlock_bh include/linux/spinlock.h:383 [inline]
      softirqs last  enabled at (2658): [<ffffffff86a8b4df>] clusterip_netdev_event+0x46f/0x670 net/ipv4/netfilter/ipt_CLUSTERIP.c:222
      softirqs last disabled at (2656): [<ffffffff86a8b22b>] spin_lock_bh include/linux/spinlock.h:343 [inline]
      softirqs last disabled at (2656): [<ffffffff86a8b22b>] clusterip_netdev_event+0x1bb/0x670 net/ipv4/netfilter/ipt_CLUSTERIP.c:196
      
      other info that might help us debug this:
       Possible unsafe locking scenario:
      
             CPU0
             ----
        lock(disc_data_lock);
        <Interrupt>
          lock(disc_data_lock);
      
       *** DEADLOCK ***
      
      5 locks held by syz-executor826/9605:
       #0: ffff8880a905e198 (&tty->legacy_mutex){+.+.}, at: tty_lock+0xc7/0x130 drivers/tty/tty_mutex.c:19
       #1: ffffffff899a56c0 (rcu_read_lock){....}, at: mutex_spin_on_owner+0x0/0x330 kernel/locking/mutex.c:413
       #2: ffff8880a496a2b0 (&(&i->lock)->rlock){-.-.}, at: spin_lock include/linux/spinlock.h:338 [inline]
       #2: ffff8880a496a2b0 (&(&i->lock)->rlock){-.-.}, at: serial8250_interrupt+0x2d/0x1a0 drivers/tty/serial/8250/8250_core.c:116
       #3: ffffffff8c104048 (&port_lock_key){-.-.}, at: serial8250_handle_irq.part.0+0x24/0x330 drivers/tty/serial/8250/8250_port.c:1823
       #4: ffff8880a905e090 (&tty->ldisc_sem){++++}, at: tty_ldisc_ref+0x22/0x90 drivers/tty/tty_ldisc.c:288
      
      stack backtrace:
      CPU: 1 PID: 9605 Comm: syz-executor826 Not tainted 5.5.0-rc1-syzkaller #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       <IRQ>
       __dump_stack lib/dump_stack.c:77 [inline]
       dump_stack+0x197/0x210 lib/dump_stack.c:118
       print_usage_bug.cold+0x327/0x378 kernel/locking/lockdep.c:3101
       valid_state kernel/locking/lockdep.c:3112 [inline]
       mark_lock_irq kernel/locking/lockdep.c:3309 [inline]
       mark_lock+0xbb4/0x1220 kernel/locking/lockdep.c:3666
       mark_usage kernel/locking/lockdep.c:3554 [inline]
       __lock_acquire+0x1e55/0x4a00 kernel/locking/lockdep.c:3909
       lock_acquire+0x190/0x410 kernel/locking/lockdep.c:4485
       __raw_read_lock include/linux/rwlock_api_smp.h:149 [inline]
       _raw_read_lock+0x32/0x50 kernel/locking/spinlock.c:223
       sp_get.isra.0+0x1d/0xf0 drivers/net/ppp/ppp_synctty.c:138
       sixpack_write_wakeup+0x25/0x340 drivers/net/hamradio/6pack.c:402
       tty_wakeup+0xe9/0x120 drivers/tty/tty_io.c:536
       tty_port_default_wakeup+0x2b/0x40 drivers/tty/tty_port.c:50
       tty_port_tty_wakeup+0x57/0x70 drivers/tty/tty_port.c:387
       uart_write_wakeup+0x46/0x70 drivers/tty/serial/serial_core.c:104
       serial8250_tx_chars+0x495/0xaf0 drivers/tty/serial/8250/8250_port.c:1761
       serial8250_handle_irq.part.0+0x2a2/0x330 drivers/tty/serial/8250/8250_port.c:1834
       serial8250_handle_irq drivers/tty/serial/8250/8250_port.c:1820 [inline]
       serial8250_default_handle_irq+0xc0/0x150 drivers/tty/serial/8250/8250_port.c:1850
       serial8250_interrupt+0xf1/0x1a0 drivers/tty/serial/8250/8250_core.c:126
       __handle_irq_event_percpu+0x15d/0x970 kernel/irq/handle.c:149
       handle_irq_event_percpu+0x74/0x160 kernel/irq/handle.c:189
       handle_irq_event+0xa7/0x134 kernel/irq/handle.c:206
       handle_edge_irq+0x25e/0x8d0 kernel/irq/chip.c:830
       generic_handle_irq_desc include/linux/irqdesc.h:156 [inline]
       do_IRQ+0xde/0x280 arch/x86/kernel/irq.c:250
       common_interrupt+0xf/0xf arch/x86/entry/entry_64.S:607
       </IRQ>
      RIP: 0010:cpu_relax arch/x86/include/asm/processor.h:685 [inline]
      RIP: 0010:mutex_spin_on_owner+0x247/0x330 kernel/locking/mutex.c:579
      Code: c3 be 08 00 00 00 4c 89 e7 e8 e5 06 59 00 4c 89 e0 48 c1 e8 03 42 80 3c 38 00 0f 85 e1 00 00 00 49 8b 04 24 a8 01 75 96 f3 90 <e9> 2f fe ff ff 0f 0b e8 0d 19 09 00 84 c0 0f 85 ff fd ff ff 48 c7
      RSP: 0018:ffffc90001eafa20 EFLAGS: 00000246 ORIG_RAX: ffffffffffffffd7
      RAX: 0000000000000000 RBX: ffff88809fd9e0c0 RCX: 1ffffffff13266dd
      RDX: 0000000000000000 RSI: 0000000000000008 RDI: 0000000000000000
      RBP: ffffc90001eafa60 R08: 1ffff11013d22898 R09: ffffed1013d22899
      R10: ffffed1013d22898 R11: ffff88809e9144c7 R12: ffff8880a905e138
      R13: ffff88809e9144c0 R14: 0000000000000000 R15: dffffc0000000000
       mutex_optimistic_spin kernel/locking/mutex.c:673 [inline]
       __mutex_lock_common kernel/locking/mutex.c:962 [inline]
       __mutex_lock+0x32b/0x13c0 kernel/locking/mutex.c:1106
       mutex_lock_nested+0x16/0x20 kernel/locking/mutex.c:1121
       tty_lock+0xc7/0x130 drivers/tty/tty_mutex.c:19
       tty_release+0xb5/0xe90 drivers/tty/tty_io.c:1665
       __fput+0x2ff/0x890 fs/file_table.c:280
       ____fput+0x16/0x20 fs/file_table.c:313
       task_work_run+0x145/0x1c0 kernel/task_work.c:113
       exit_task_work include/linux/task_work.h:22 [inline]
       do_exit+0x8e7/0x2ef0 kernel/exit.c:797
       do_group_exit+0x135/0x360 kernel/exit.c:895
       __do_sys_exit_group kernel/exit.c:906 [inline]
       __se_sys_exit_group kernel/exit.c:904 [inline]
       __x64_sys_exit_group+0x44/0x50 kernel/exit.c:904
       do_syscall_64+0xfa/0x790 arch/x86/entry/common.c:294
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      RIP: 0033:0x43fef8
      Code: Bad RIP value.
      RSP: 002b:00007ffdb07d2338 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7
      RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 000000000043fef8
      RDX: 0000000000000000 RSI: 000000000000003c RDI: 0000000000000000
      RBP: 00000000004bf730 R08: 00000000000000e7 R09: ffffffffffffffd0
      R10: 00000000004002c8 R11: 0000000000000246 R12: 0000000000000001
      R13: 00000000006d1180 R14: 0000000000000000 R15: 0000000000000000
      
      Fixes: 6e4e2f81 ("6pack,mkiss: fix lock inconsistency")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Reported-by: default avatarsyzbot <syzkaller@googlegroups.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: default avatarJakub Kicinski <jakub.kicinski@netronome.com>
      5c9934b6
    • Eric Dumazet's avatar
      tcp/dccp: fix possible race __inet_lookup_established() · 8dbd76e7
      Eric Dumazet authored
      Michal Kubecek and Firo Yang did a very nice analysis of crashes
      happening in __inet_lookup_established().
      
      Since a TCP socket can go from TCP_ESTABLISH to TCP_LISTEN
      (via a close()/socket()/listen() cycle) without a RCU grace period,
      I should not have changed listeners linkage in their hash table.
      
      They must use the nulls protocol (Documentation/RCU/rculist_nulls.txt),
      so that a lookup can detect a socket in a hash list was moved in
      another one.
      
      Since we added code in commit d296ba60 ("soreuseport: Resolve
      merge conflict for v4/v6 ordering fix"), we have to add
      hlist_nulls_add_tail_rcu() helper.
      
      Fixes: 3b24d854 ("tcp/dccp: do not touch listener sk_refcnt under synflood")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Reported-by: default avatarMichal Kubecek <mkubecek@suse.cz>
      Reported-by: default avatarFiro Yang <firo.yang@suse.com>
      Reviewed-by: default avatarMichal Kubecek <mkubecek@suse.cz>
      Link: https://lore.kernel.org/netdev/20191120083919.GH27852@unicorn.suse.cz/Signed-off-by: default avatarJakub Kicinski <jakub.kicinski@netronome.com>
      8dbd76e7
    • Thomas Falcon's avatar
      net/ibmvnic: Fix typo in retry check · 8f9cc1ee
      Thomas Falcon authored
      This conditional is missing a bang, with the intent
      being to break when the retry count reaches zero.
      
      Fixes: 476d96ca ("ibmvnic: Bound waits for device queries")
      Suggested-by: default avatarJuliet Kim <julietk@linux.vnet.ibm.com>
      Signed-off-by: default avatarThomas Falcon <tlfalcon@linux.ibm.com>
      Signed-off-by: default avatarJakub Kicinski <jakub.kicinski@netronome.com>
      8f9cc1ee
    • Hangbin Liu's avatar
      ipv6/addrconf: only check invalid header values when NETLINK_F_STRICT_CHK is set · 2beb6d29
      Hangbin Liu authored
      In commit 4b1373de ("net: ipv6: addr: perform strict checks also for
      doit handlers") we add strict check for inet6_rtm_getaddr(). But we did
      the invalid header values check before checking if NETLINK_F_STRICT_CHK
      is set. This may break backwards compatibility if user already set the
      ifm->ifa_prefixlen, ifm->ifa_flags, ifm->ifa_scope in their netlink code.
      
      I didn't move the nlmsg_len check because I thought it's a valid check.
      Reported-by: default avatarJianlin Shi <jishi@redhat.com>
      Fixes: 4b1373de ("net: ipv6: addr: perform strict checks also for doit handlers")
      Signed-off-by: default avatarHangbin Liu <liuhangbin@gmail.com>
      Reviewed-by: default avatarDavid Ahern <dsahern@gmail.com>
      Signed-off-by: default avatarJakub Kicinski <jakub.kicinski@netronome.com>
      2beb6d29
  3. 13 Dec, 2019 4 commits
  4. 12 Dec, 2019 3 commits
  5. 11 Dec, 2019 8 commits