1. 30 Oct, 2019 3 commits
  2. 29 Oct, 2019 8 commits
    • net: hisilicon: Fix ping latency when deal with high throughput · e56bd641
      Jiangfeng Xiao authored
      The latency is caused by an error in over-budget processing.
      When dealing with high throughput, the used buffers
      that exceed the budget are not cleaned up. In addition,
      it takes many cycles to clean up the used buffers
      before the buffer holding the valid data can take effect.
      Signed-off-by: Jiangfeng Xiao <xiaojiangfeng@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/mlx4_core: Dynamically set guaranteed amount of counters per VF · e19868ef
      Eran Ben Elisha authored
      Prior to this patch, the number of counters guaranteed per VF in the
      resource tracker was MLX4_VF_COUNTERS_PER_PORT * MLX4_MAX_PORTS. It was
      set regardless of whether the VF was single or dual port.
      This caused several VFs to have no guaranteed counters although the
      system could satisfy their request.
      
      The fix is to dynamically guarantee counters, based on each VF
      specification.
      
      Fixes: 9de92c60 ("net/mlx4_core: Adjust counter grant policy in the resource tracker")
      Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
      Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
      Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
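The policy change described in the mlx4_core commit can be pictured with a small C sketch. The constant values and function names here are hypothetical and chosen only for illustration; the real driver uses its own constants and derives the port count from each VF's configuration.

```c
/* Illustrative values only; the driver defines its own constants. */
#define MODEL_VF_COUNTERS_PER_PORT 1
#define MODEL_MAX_PORTS 2

/* Old policy: every VF is charged for all ports, single-port or not. */
int guaranteed_counters_static(void)
{
	return MODEL_VF_COUNTERS_PER_PORT * MODEL_MAX_PORTS;
}

/* New policy: guarantee counters only for the ports the VF actually has. */
int guaranteed_counters_dynamic(int vf_ports)
{
	return MODEL_VF_COUNTERS_PER_PORT * vf_ports;
}
```

With these illustrative numbers, a single-port VF reserves half as many counters as before, which leaves room for other VFs.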
    • Merge branch 'VLAN-fixes-for-Ocelot-switch' · c1b5ddc1
      David S. Miller authored
      Vladimir Oltean says:
      
      ====================
      VLAN fixes for Ocelot switch
      
      This series addresses 2 issues with vlan_filtering=1:
      - Untagged traffic gets dropped unless commands are run in a very
        specific order.
      - Untagged traffic starts being transmitted as tagged after adding
        another untagged VID on the port.
      
      Tested on NXP LS1028A-RDB board.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: mscc: ocelot: refuse to overwrite the port's native vlan · b9cd75e6
      Vladimir Oltean authored
      The switch driver keeps a "vid" variable per port, which signifies _the_
      VLAN ID that is stripped on that port's egress (aka the native VLAN on a
      trunk port).
      
      That is the way the hardware is designed (mostly). The port->vid is
      programmed into REW:PORT:PORT_VLAN_CFG:PORT_VID and the rewriter is told
      to send all traffic as tagged except the one having port->vid.
      
      There exists a possibility of finer-grained egress untagging decisions:
      using the VCAP IS1 engine, one rule can be added to match every
      VLAN-tagged frame whose VLAN should be untagged, and set POP_CNT=1 as
      action. However, the IS1 can hold at most 512 entries, and the VLANs are
      in the order of 6 * 4096.
      
      So the code is fine for now. But this sequence of commands:
      
      $ bridge vlan add dev swp0 vid 1 pvid untagged
      $ bridge vlan add dev swp0 vid 2 untagged
      
      makes untagged and pvid-tagged traffic be sent out of swp0 as tagged
      with VID 1, despite the user's request.
      
      Prevent that from happening. The user should temporarily remove the
      existing untagged VLAN (1 in this case), add it back as tagged, and then
      add the new untagged VLAN (2 in this case).
      
      Cc: Antoine Tenart <antoine.tenart@bootlin.com>
      Cc: Alexandre Belloni <alexandre.belloni@bootlin.com>
      Fixes: 7142529f ("net: mscc: ocelot: add VLAN filtering")
      Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Acked-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
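The "refuse to overwrite" rule from the native-VLAN commit can be sketched as a user-space model. The struct, the function name and the choice of -EBUSY as the error code are assumptions made for illustration, not the driver's actual code.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical per-port state; vid == 0 means "no native VLAN set". */
struct port_model {
	unsigned short pvid;
	unsigned short vid;	/* the single egress-untagged VLAN */
};

/* Returns 0 on success, -EBUSY if another untagged VID is already set. */
int port_vlan_add(struct port_model *port, unsigned short vid,
		  bool pvid, bool untagged)
{
	if (untagged && port->vid && port->vid != vid)
		return -EBUSY;	/* refuse to overwrite the native VLAN */
	if (pvid)
		port->pvid = vid;
	if (untagged)
		port->vid = vid;
	return 0;
}
```

In this model, the second command of the sequence in the commit message (adding VID 2 as untagged while VID 1 is the native VLAN) fails instead of silently changing what is sent untagged.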
    • net: mscc: ocelot: fix vlan_filtering when enslaving to bridge before link is up · 1c44ce56
      Vladimir Oltean authored
      Background information: the driver operates the hardware in a mode where
      a single VLAN can be transmitted as untagged on a particular egress
      port. That is the "native VLAN on trunk port" use case. Its value is
      held in port->vid.
      
      Consider the following command sequence (no network manager, all
      interfaces are down, debugging prints added by me):
      
      $ ip link add dev br0 type bridge vlan_filtering 1
      $ ip link set dev swp0 master br0
      
      Kernel code path during last command:
      
      br_add_slave -> ocelot_netdevice_port_event (NETDEV_CHANGEUPPER):
      [   21.401901] ocelot_vlan_port_apply: port 0 vlan aware 0 pvid 0 vid 0
      
      br_add_slave -> nbp_vlan_init -> switchdev_port_attr_set -> ocelot_port_attr_set (SWITCHDEV_ATTR_ID_BRIDGE_VLAN_FILTERING):
      [   21.413335] ocelot_vlan_port_apply: port 0 vlan aware 1 pvid 0 vid 0
      
      br_add_slave -> nbp_vlan_init -> nbp_vlan_add -> br_switchdev_port_vlan_add -> switchdev_port_obj_add -> ocelot_port_obj_add -> ocelot_vlan_vid_add
      [   21.667421] ocelot_vlan_port_apply: port 0 vlan aware 1 pvid 1 vid 1
      
      So far so good. The bridge has replaced the driver's default pvid used
      in standalone mode (0) with its own default_pvid (1). The port's vid
      (native VLAN) has also changed from 0 to 1.
      
      $ ip link set dev swp0 up
      
      [   31.722956] 8021q: adding VLAN 0 to HW filter on device swp0
      do_setlink -> dev_change_flags -> vlan_vid_add -> ocelot_vlan_rx_add_vid -> ocelot_vlan_vid_add:
      [   31.728700] ocelot_vlan_port_apply: port 0 vlan aware 1 pvid 1 vid 0
      
      The 8021q module uses the .ndo_vlan_rx_add_vid API on .ndo_open to make
      ports be able to transmit and receive 802.1p-tagged traffic by default.
      This API is supposed to offload a VLAN sub-interface, which for a switch
      port means to add a VLAN that is not a pvid, and tagged on egress.
      
      But the driver implementation of .ndo_vlan_rx_add_vid is wrong: it adds
      back vid 0 as "egress untagged". Now back to the initial paragraph:
      there is a single untagged VID that the driver keeps track of, and that
      has just changed from 1 (the pvid) to 0. So this breaks the bridge
      core's expectation, because it has changed vid 1 from untagged to
      tagged, when what the user sees is:
      
      $ bridge vlan
      port    vlan ids
      swp0     1 PVID Egress Untagged
      
      br0      1 PVID Egress Untagged
      
      But curiously, instead of manifesting itself as "untagged and
      pvid-tagged traffic gets sent as tagged on egress", the bug:
      
      - is hidden when vlan_filtering=0
      - manifests as dropped traffic when vlan_filtering=1, due to this setting:
      
      	if (port->vlan_aware && !port->vid)
      		/* If port is vlan-aware and tagged, drop untagged and priority
      		 * tagged frames.
      		 */
      		val |= ANA_PORT_DROP_CFG_DROP_UNTAGGED_ENA |
      		       ANA_PORT_DROP_CFG_DROP_PRIO_S_TAGGED_ENA |
      		       ANA_PORT_DROP_CFG_DROP_PRIO_C_TAGGED_ENA;
      
      which would have made sense if it weren't for this bug. The setting's
      intention was "this is a trunk port with no native VLAN, so don't accept
      untagged traffic". So the driver was never expecting to set VLAN 0 as
      the value of the native VLAN; 0 was just the encoding for "invalid".
      
      So the fix is to not send 802.1p traffic as untagged, because that would
      change the port's native vlan to 0, unbeknownst to the bridge, and
      trigger unexpected code paths in the driver.
      
      Cc: Antoine Tenart <antoine.tenart@bootlin.com>
      Cc: Alexandre Belloni <alexandre.belloni@bootlin.com>
      Fixes: 7142529f ("net: mscc: ocelot: add VLAN filtering")
      Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Acked-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
      Reviewed-by: Horatiu Vultur <horatiu.vultur@microchip.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
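The interaction described in the vlan_filtering commit can be reproduced with a small user-space model. The struct and function names are hypothetical; only the drop condition (vlan_aware && !vid) is taken from the snippet quoted in the commit message.

```c
#include <stdbool.h>

struct port_state {
	bool vlan_aware;
	unsigned short pvid;
	unsigned short vid;	/* egress-untagged (native) VLAN, 0 = none */
};

/* Simplified VLAN add: only tracks the pvid and the native VLAN. */
void vlan_vid_add(struct port_state *p, unsigned short vid,
		  bool pvid, bool untagged)
{
	if (pvid)
		p->pvid = vid;
	if (untagged)
		p->vid = vid;
}

/* Mirrors the condition quoted in the commit message. */
bool drops_untagged(const struct port_state *p)
{
	return p->vlan_aware && !p->vid;
}
```

Adding VID 0 as untagged resets the native VLAN to 0 and trips the drop condition. Adding it as tagged, which is the approach the fix describes, leaves the native VLAN at 1.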
    • wimax: i2400: Fix memory leak in i2400m_op_rfkill_sw_toggle · 6f3ef5c2
      Navid Emamdoost authored
      In the implementation of i2400m_op_rfkill_sw_toggle() the allocated
      buffer for cmd should be released before returning. The
      documentation for i2400m_msg_to_dev() says that when it returns, the
      buffer can be reused, meaning cmd should be released in either case.
      Move kfree(cmd) before the return so that all execution paths reach it.
      
      Fixes: 2507e6ab ("wimax: i2400: fix memory leak")
      Signed-off-by: Navid Emamdoost <navid.emamdoost@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
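The shape of the wimax leak fix, a single release point that both the success and the error path reach, can be sketched in user-space C. Here malloc()/free() stand in for the kernel allocator, and send_cmd() is a hypothetical stand-in for i2400m_msg_to_dev().

```c
#include <stdlib.h>
#include <string.h>

int live_buffers;	/* buffers allocated but not yet freed */

/* Hypothetical stand-in for the send call; fails when fail is nonzero. */
int send_cmd(const void *cmd, int fail)
{
	(void)cmd;
	return fail ? -1 : 0;
}

int rfkill_toggle(int fail)
{
	int result;
	char *cmd = malloc(16);

	if (!cmd)
		return -1;
	live_buffers++;
	memset(cmd, 0, 16);

	result = send_cmd(cmd, fail);

	/* Single release point, reached on success and on error. */
	free(cmd);
	live_buffers--;
	return result;
}
```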
    • net: hisilicon: Fix "Trying to free already-free IRQ" · 63a41746
      Jiangfeng Xiao authored
      When running rmmod hip04_eth.ko, we can get the following warning:
      
      Task track: rmmod(1623)>bash(1591)>login(1581)>init(1)
      ------------[ cut here ]------------
      WARNING: CPU: 0 PID: 1623 at kernel/irq/manage.c:1557 __free_irq+0xa4/0x2ac()
      Trying to free already-free IRQ 200
      Modules linked in: ping(O) pramdisk(O) cpuinfo(O) rtos_snapshot(O) interrupt_ctrl(O) mtdblock mtd_blkdevrtfs nfs_acl nfs lockd grace sunrpc xt_tcpudp ipt_REJECT iptable_filter ip_tables x_tables nf_reject_ipv
      CPU: 0 PID: 1623 Comm: rmmod Tainted: G           O    4.4.193 #1
      Hardware name: Hisilicon A15
      [<c020b408>] (rtos_unwind_backtrace) from [<c0206624>] (show_stack+0x10/0x14)
      [<c0206624>] (show_stack) from [<c03f2be4>] (dump_stack+0xa0/0xd8)
      [<c03f2be4>] (dump_stack) from [<c021a780>] (warn_slowpath_common+0x84/0xb0)
      [<c021a780>] (warn_slowpath_common) from [<c021a7e8>] (warn_slowpath_fmt+0x3c/0x68)
      [<c021a7e8>] (warn_slowpath_fmt) from [<c026876c>] (__free_irq+0xa4/0x2ac)
      [<c026876c>] (__free_irq) from [<c0268a14>] (free_irq+0x60/0x7c)
      [<c0268a14>] (free_irq) from [<c0469e80>] (release_nodes+0x1c4/0x1ec)
      [<c0469e80>] (release_nodes) from [<c0466924>] (__device_release_driver+0xa8/0x104)
      [<c0466924>] (__device_release_driver) from [<c0466a80>] (driver_detach+0xd0/0xf8)
      [<c0466a80>] (driver_detach) from [<c0465e18>] (bus_remove_driver+0x64/0x8c)
      [<c0465e18>] (bus_remove_driver) from [<c02935b0>] (SyS_delete_module+0x198/0x1e0)
      [<c02935b0>] (SyS_delete_module) from [<c0202ed0>] (__sys_trace_return+0x0/0x10)
      ---[ end trace bb25d6123d849b44 ]---
      
      Currently "rmmod hip04_eth.ko" calls free_irq more than once,
      as devres_release_all and hip04_remove both call free_irq.
      This results in a 'Trying to free already-free IRQ' warning.
      To solve the problem, free_irq has been moved out of hip04_remove.
      Signed-off-by: Jiangfeng Xiao <xiaojiangfeng@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
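The double free in the hip04_eth commit can be modelled in a few lines of user-space C. All names below are hypothetical; the model only captures the ordering, in which the remove callback runs first and the managed-resource release runs afterwards.

```c
#include <stdbool.h>

struct irq_model {
	bool requested;
	int double_free_warnings;
};

/* Freeing an IRQ that is already free is counted as a warning. */
void model_free_irq(struct irq_model *irq)
{
	if (!irq->requested) {
		irq->double_free_warnings++;
		return;
	}
	irq->requested = false;
}

/* remove_frees_irq selects the old (true) or the fixed (false) remove path. */
int unload_driver(bool remove_frees_irq)
{
	struct irq_model irq = { .requested = true, .double_free_warnings = 0 };

	if (remove_frees_irq)
		model_free_irq(&irq);	/* explicit free in the remove callback */
	model_free_irq(&irq);		/* managed release runs afterwards */
	return irq.double_free_warnings;
}
```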
    • fjes: Handle workqueue allocation failure · 85ac30fa
      Will Deacon authored
      In the highly unlikely event that we fail to allocate either of the
      "/txrx" or "/control" workqueues, we should bail cleanly rather than
      blindly march on with NULL queue pointer(s) installed in the
      'fjes_adapter' instance.
      
      Cc: "David S. Miller" <davem@davemloft.net>
      Reported-by: Nicolas Waisman <nico@semmle.com>
      Link: https://lore.kernel.org/lkml/CADJ_3a8WFrs5NouXNqS5WYe7rebFP+_A5CheeqAyD_p7DFJJcg@mail.gmail.com/
      Signed-off-by: Will Deacon <will@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
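"Bailing cleanly" on a failed workqueue allocation usually means checking each allocation and unwinding the earlier ones. The following user-space sketch shows that pattern. The struct, the alloc_wq() helper and its failure switch are hypothetical, and malloc() stands in for the workqueue allocator.

```c
#include <errno.h>
#include <stdlib.h>

struct adapter_model {
	void *txrx_wq;
	void *control_wq;
};

/* Hypothetical allocator; returns NULL when fail is nonzero. */
void *alloc_wq(int fail)
{
	return fail ? NULL : malloc(1);
}

int adapter_init(struct adapter_model *a, int fail_txrx, int fail_control)
{
	a->control_wq = NULL;
	a->txrx_wq = alloc_wq(fail_txrx);
	if (!a->txrx_wq)
		return -ENOMEM;

	a->control_wq = alloc_wq(fail_control);
	if (!a->control_wq) {
		free(a->txrx_wq);	/* unwind the first allocation */
		a->txrx_wq = NULL;
		return -ENOMEM;
	}
	return 0;
}
```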
  3. 28 Oct, 2019 13 commits
    • Merge tag 'batadv-net-for-davem-20191025' of git://git.open-mesh.org/linux-merge · 55793d2a
      David S. Miller authored
      Simon Wunderlich says:
      
      ====================
      Here are two batman-adv bugfixes:
      
       * Fix free/alloc race for OGM and OGMv2, by Sven Eckelmann (2 patches)
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: usb: lan78xx: Disable interrupts before calling generic_handle_irq() · 0a29ac5b
      Daniel Wagner authored
      lan78xx_status() will run with interrupts enabled due to the change in
      ed194d13 ("usb: core: remove local_irq_save() around ->complete()
      handler"). generic_handle_irq() expects to be run with IRQs disabled.
      
      [    4.886203] 000: irq 79 handler irq_default_primary_handler+0x0/0x8 enabled interrupts
      [    4.886243] 000: WARNING: CPU: 0 PID: 0 at kernel/irq/handle.c:152 __handle_irq_event_percpu+0x154/0x168
      [    4.896294] 000: Modules linked in:
      [    4.896301] 000: CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.3.6 #39
      [    4.896310] 000: Hardware name: Raspberry Pi 3 Model B+ (DT)
      [    4.896315] 000: pstate: 60000005 (nZCv daif -PAN -UAO)
      [    4.896321] 000: pc : __handle_irq_event_percpu+0x154/0x168
      [    4.896331] 000: lr : __handle_irq_event_percpu+0x154/0x168
      [    4.896339] 000: sp : ffff000010003cc0
      [    4.896346] 000: x29: ffff000010003cc0 x28: 0000000000000060
      [    4.896355] 000: x27: ffff000011021980 x26: ffff00001189c72b
      [    4.896364] 000: x25: ffff000011702bc0 x24: ffff800036d6e400
      [    4.896373] 000: x23: 000000000000004f x22: ffff000010003d64
      [    4.896381] 000: x21: 0000000000000000 x20: 0000000000000002
      [    4.896390] 000: x19: ffff8000371c8480 x18: 0000000000000060
      [    4.896398] 000: x17: 0000000000000000 x16: 00000000000000eb
      [    4.896406] 000: x15: ffff000011712d18 x14: 7265746e69206465
      [    4.896414] 000: x13: ffff000010003ba0 x12: ffff000011712df0
      [    4.896422] 000: x11: 0000000000000001 x10: ffff000011712e08
      [    4.896430] 000: x9 : 0000000000000001 x8 : 000000000003c920
      [    4.896437] 000: x7 : ffff0000118cc410 x6 : ffff0000118c7f00
      [    4.896445] 000: x5 : 000000000003c920 x4 : 0000000000004510
      [    4.896453] 000: x3 : ffff000011712dc8 x2 : 0000000000000000
      [    4.896461] 000: x1 : 73a3f67df94c1500 x0 : 0000000000000000
      [    4.896466] 000: Call trace:
      [    4.896471] 000:  __handle_irq_event_percpu+0x154/0x168
      [    4.896481] 000:  handle_irq_event_percpu+0x50/0xb0
      [    4.896489] 000:  handle_irq_event+0x40/0x98
      [    4.896497] 000:  handle_simple_irq+0xa4/0xf0
      [    4.896505] 000:  generic_handle_irq+0x24/0x38
      [    4.896513] 000:  intr_complete+0xb0/0xe0
      [    4.896525] 000:  __usb_hcd_giveback_urb+0x58/0xd8
      [    4.896533] 000:  usb_giveback_urb_bh+0xd0/0x170
      [    4.896539] 000:  tasklet_action_common.isra.0+0x9c/0x128
      [    4.896549] 000:  tasklet_hi_action+0x24/0x30
      [    4.896556] 000:  __do_softirq+0x120/0x23c
      [    4.896564] 000:  irq_exit+0xb8/0xd8
      [    4.896571] 000:  __handle_domain_irq+0x64/0xb8
      [    4.896579] 000:  bcm2836_arm_irqchip_handle_irq+0x60/0xc0
      [    4.896586] 000:  el1_irq+0xb8/0x140
      [    4.896592] 000:  arch_cpu_idle+0x10/0x18
      [    4.896601] 000:  do_idle+0x200/0x280
      [    4.896608] 000:  cpu_startup_entry+0x20/0x28
      [    4.896615] 000:  rest_init+0xb4/0xc0
      [    4.896623] 000:  arch_call_rest_init+0xc/0x14
      [    4.896632] 000:  start_kernel+0x454/0x480
      
      Fixes: ed194d13 ("usb: core: remove local_irq_save() around ->complete() handler")
      Cc: Woojung Huh <woojung.huh@microchip.com>
      Cc: Marc Zyngier <maz@kernel.org>
      Cc: Andrew Lunn <andrew@lunn.ch>
      Cc: Stefan Wahren <wahrenst@gmx.net>
      Cc: Jisheng Zhang <Jisheng.Zhang@synaptics.com>
      Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: David Miller <davem@davemloft.net>
      Signed-off-by: Daniel Wagner <dwagner@suse.de>
      Tested-by: Stefan Wahren <wahrenst@gmx.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
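The expectation that generic_handle_irq() runs with interrupts disabled can be illustrated with a user-space model. Nothing here is kernel API: the flag and the model_* functions are hypothetical stand-ins for the CPU interrupt state, the disable/enable calls and the handler.

```c
#include <stdbool.h>

bool irqs_enabled = true;	/* models the CPU interrupt flag */
int warnings;			/* counts "enabled interrupts" warnings */

void model_irq_disable(void) { irqs_enabled = false; }
void model_irq_enable(void) { irqs_enabled = true; }

/* Stand-in for the handler entry point: expects interrupts to be off. */
void model_handle_irq(void)
{
	if (irqs_enabled)
		warnings++;
}

/* Fixed status path: disable interrupts around the handler call. */
void status_fixed(void)
{
	model_irq_disable();
	model_handle_irq();
	model_irq_enable();
}
```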
    • net: dsa: sja1105: improve NET_DSA_SJA1105_TAS dependency · 5d294fc4
      Arnd Bergmann authored
      An earlier bugfix introduced a dependency on CONFIG_NET_SCH_TAPRIO,
      but this missed the case of NET_SCH_TAPRIO=m and NET_DSA_SJA1105=y,
      which still causes a link error:
      
      drivers/net/dsa/sja1105/sja1105_tas.o: In function `sja1105_setup_tc_taprio':
      sja1105_tas.c:(.text+0x5c): undefined reference to `taprio_offload_free'
      sja1105_tas.c:(.text+0x3b4): undefined reference to `taprio_offload_get'
      drivers/net/dsa/sja1105/sja1105_tas.o: In function `sja1105_tas_teardown':
      sja1105_tas.c:(.text+0x6ec): undefined reference to `taprio_offload_free'
      
      Change the dependency to only allow selecting the TAS code when it
      can link against the taprio code.
      
      Fixes: a8d570de ("net: dsa: sja1105: Add dependency for NET_DSA_SJA1105_TAS")
      Fixes: 317ab5b8 ("net: dsa: sja1105: Configure the Time-Aware Scheduler via tc-taprio offload")
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
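The link constraint behind the sja1105 dependency fix can be written out as a truth function. This is a model of the reasoning in C, not the Kconfig change itself; the enum and function name are hypothetical.

```c
#include <stdbool.h>

enum tristate { TRI_N, TRI_M, TRI_Y };

/*
 * The TAS code is linked into the sja1105 driver, so it can resolve the
 * taprio symbols only if taprio is built in, or if the driver is a module
 * (a module can reference symbols from built-in code or another module).
 */
bool tas_can_link(enum tristate sja1105, enum tristate taprio)
{
	if (sja1105 == TRI_N || taprio == TRI_N)
		return false;
	return taprio == TRI_Y || sja1105 == TRI_M;
}
```

The failing combination from the commit message, NET_SCH_TAPRIO=m with NET_DSA_SJA1105=y, is the only enabled pair for which the function returns false.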
    • net: ethernet: ftgmac100: Fix DMA coherency issue with SW checksum · 88824e3b
      Benjamin Herrenschmidt authored
      We are calling the checksum helper after the dma_map_single()
      call to map the packet. This is incorrect as the checksumming
      code will touch the packet from the CPU. This means the cache
      won't be properly flushed (or the bounce buffering will leave
      us with the unmodified packet to DMA).
      
      This moves the calculation of the checksum & vlan tags to
      before the DMA mapping.
      
      This also has the side effect of fixing another bug: If the
      checksum helper fails, we goto "drop" to drop the packet, which
      will not unmap the DMA mapping.
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Fixes: 05690d63 ("ftgmac100: Upgrade to NETIF_F_HW_CSUM")
      Reviewed-by: Vijay Khemka <vijaykhemka@fb.com>
      Tested-by: Vijay Khemka <vijaykhemka@fb.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
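Why the ordering matters in the ftgmac100 fix can be shown with a toy model of bounce buffering, where mapping takes a snapshot of the CPU buffer. Everything in this sketch is hypothetical, including the one-byte "checksum"; it only demonstrates that CPU writes made after the mapping do not reach the copy the device sees.

```c
#include <string.h>

#define PKT_LEN 4

struct dma_model {
	unsigned char cpu_buf[PKT_LEN];
	unsigned char dev_buf[PKT_LEN];	/* what the device would transmit */
};

/* Models a mapping that snapshots CPU data, as a bounce buffer would. */
void model_dma_map(struct dma_model *d)
{
	memcpy(d->dev_buf, d->cpu_buf, PKT_LEN);
}

/* Stand-in for a software checksum stored in the last byte. */
void model_sw_csum(struct dma_model *d)
{
	unsigned char sum = 0;

	for (int i = 0; i < PKT_LEN - 1; i++)
		sum += d->cpu_buf[i];
	d->cpu_buf[PKT_LEN - 1] = sum;
}
```

Calling model_sw_csum() before model_dma_map() puts the checksum into the device copy; the reverse order leaves the device with the unmodified packet.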
    • net: fix sk_page_frag() recursion from memory reclaim · 20eb4f29
      Tejun Heo authored
      sk_page_frag() optimizes skb_frag allocations by using per-task
      skb_frag cache when it knows it's the only user.  The condition is
      determined by seeing whether the socket allocation mask allows
      blocking - if the allocation may block, it obviously owns the task's
      context and ergo exclusively owns current->task_frag.
      
      Unfortunately, this misses recursion through memory reclaim path.
      Please take a look at the following backtrace.
      
       [2] RIP: 0010:tcp_sendmsg_locked+0xccf/0xe10
           ...
           tcp_sendmsg+0x27/0x40
           sock_sendmsg+0x30/0x40
           sock_xmit.isra.24+0xa1/0x170 [nbd]
           nbd_send_cmd+0x1d2/0x690 [nbd]
           nbd_queue_rq+0x1b5/0x3b0 [nbd]
           __blk_mq_try_issue_directly+0x108/0x1b0
           blk_mq_request_issue_directly+0xbd/0xe0
           blk_mq_try_issue_list_directly+0x41/0xb0
           blk_mq_sched_insert_requests+0xa2/0xe0
           blk_mq_flush_plug_list+0x205/0x2a0
           blk_flush_plug_list+0xc3/0xf0
       [1] blk_finish_plug+0x21/0x2e
           _xfs_buf_ioapply+0x313/0x460
           __xfs_buf_submit+0x67/0x220
           xfs_buf_read_map+0x113/0x1a0
           xfs_trans_read_buf_map+0xbf/0x330
           xfs_btree_read_buf_block.constprop.42+0x95/0xd0
           xfs_btree_lookup_get_block+0x95/0x170
           xfs_btree_lookup+0xcc/0x470
           xfs_bmap_del_extent_real+0x254/0x9a0
           __xfs_bunmapi+0x45c/0xab0
           xfs_bunmapi+0x15/0x30
           xfs_itruncate_extents_flags+0xca/0x250
           xfs_free_eofblocks+0x181/0x1e0
           xfs_fs_destroy_inode+0xa8/0x1b0
           destroy_inode+0x38/0x70
           dispose_list+0x35/0x50
           prune_icache_sb+0x52/0x70
           super_cache_scan+0x120/0x1a0
           do_shrink_slab+0x120/0x290
           shrink_slab+0x216/0x2b0
           shrink_node+0x1b6/0x4a0
           do_try_to_free_pages+0xc6/0x370
           try_to_free_mem_cgroup_pages+0xe3/0x1e0
           try_charge+0x29e/0x790
           mem_cgroup_charge_skmem+0x6a/0x100
           __sk_mem_raise_allocated+0x18e/0x390
           __sk_mem_schedule+0x2a/0x40
       [0] tcp_sendmsg_locked+0x8eb/0xe10
           tcp_sendmsg+0x27/0x40
           sock_sendmsg+0x30/0x40
           ___sys_sendmsg+0x26d/0x2b0
           __sys_sendmsg+0x57/0xa0
           do_syscall_64+0x42/0x100
           entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      In [0], tcp_sendmsg_locked() was using current->task_frag when it
      called sk_wmem_schedule().  It already calculated how many bytes can
      be fit into current->task_frag.  Due to memory pressure,
      sk_wmem_schedule() called into memory reclaim path which called into
      xfs and then IO issue path.  Because the filesystem in question is
      backed by nbd, the control goes back into the tcp layer - back into
      tcp_sendmsg_locked().
      
      nbd sets sk_allocation to (GFP_NOIO | __GFP_MEMALLOC) which makes
      sense - it's in the process of freeing memory and wants to be able to,
      e.g., drop clean pages to make forward progress.  However, this
      confused sk_page_frag() called from [2].  Because it only tests
      whether the allocation allows blocking which it does, it now thinks
      current->task_frag can be used again although it already was being
      used in [0].

      After [2] used current->task_frag, the offset would be increased by
      the used amount.  When the control returns to [0],
      current->task_frag's offset is increased and the previously calculated
      number of bytes now may overrun the end of allocated memory leading to
      silent memory corruptions.
      
      Fix it by adding gfpflags_normal_context() which tests sleepable &&
      !reclaim and use it to determine whether to use current->task_frag.
      
      v2: Eric didn't like gfp flags being tested twice.  Introduce a new
          helper gfpflags_normal_context() and combine the two tests.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Josef Bacik <josef@toxicpanda.com>
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: David S. Miller <davem@davemloft.net>
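The "sleepable && !reclaim" test in the sk_page_frag() fix can be sketched as a single mask comparison. The flag names and bit values below are made up for illustration (the kernel defines its own), and the sketch assumes that "may block" corresponds to a direct-reclaim bit and "already in reclaim" to the MEMALLOC bit that nbd is described as setting.

```c
#include <stdbool.h>

/* Illustrative bit values only. */
#define MODEL_GFP_DIRECT_RECLAIM 0x1u	/* allocation may block */
#define MODEL_GFP_KSWAPD_RECLAIM 0x2u
#define MODEL_GFP_MEMALLOC       0x4u	/* caller is freeing memory */
#define MODEL_GFP_IO             0x8u

/* True only when the allocation may block and is not part of reclaim. */
bool model_normal_context(unsigned int gfp)
{
	return (gfp & (MODEL_GFP_DIRECT_RECLAIM | MODEL_GFP_MEMALLOC)) ==
	       MODEL_GFP_DIRECT_RECLAIM;
}
```

Combining both conditions into one comparison tests the flags once, which is in the spirit of the v2 note in the commit message.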
    • udp: fix data-race in udp_set_dev_scratch() · a793183c
      Eric Dumazet authored
      KCSAN reported a data-race in udp_set_dev_scratch() [1]
      
      The issue here is that we must not write over skb fields
      if skb is shared. A similar issue has been fixed in commit
      89c22d8c ("net: Fix skb csum races when peeking")
      
      While we are at it, use a helper only dealing with
      udp_skb_scratch(skb)->csum_unnecessary, as this allows
      udp_set_dev_scratch() to be called once and thus inlined.
      
      [1]
      BUG: KCSAN: data-race in udp_set_dev_scratch / udpv6_recvmsg
      
      write to 0xffff888120278317 of 1 bytes by task 10411 on cpu 1:
       udp_set_dev_scratch+0xea/0x200 net/ipv4/udp.c:1308
       __first_packet_length+0x147/0x420 net/ipv4/udp.c:1556
       first_packet_length+0x68/0x2a0 net/ipv4/udp.c:1579
       udp_poll+0xea/0x110 net/ipv4/udp.c:2720
       sock_poll+0xed/0x250 net/socket.c:1256
       vfs_poll include/linux/poll.h:90 [inline]
       do_select+0x7d0/0x1020 fs/select.c:534
       core_sys_select+0x381/0x550 fs/select.c:677
       do_pselect.constprop.0+0x11d/0x160 fs/select.c:759
       __do_sys_pselect6 fs/select.c:784 [inline]
       __se_sys_pselect6 fs/select.c:769 [inline]
       __x64_sys_pselect6+0x12e/0x170 fs/select.c:769
       do_syscall_64+0xcc/0x370 arch/x86/entry/common.c:290
       entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      read to 0xffff888120278317 of 1 bytes by task 10413 on cpu 0:
       udp_skb_csum_unnecessary include/net/udp.h:358 [inline]
       udpv6_recvmsg+0x43e/0xe90 net/ipv6/udp.c:310
       inet6_recvmsg+0xbb/0x240 net/ipv6/af_inet6.c:592
       sock_recvmsg_nosec+0x5c/0x70 net/socket.c:871
       ___sys_recvmsg+0x1a0/0x3e0 net/socket.c:2480
       do_recvmmsg+0x19a/0x5c0 net/socket.c:2601
       __sys_recvmmsg+0x1ef/0x200 net/socket.c:2680
       __do_sys_recvmmsg net/socket.c:2703 [inline]
       __se_sys_recvmmsg net/socket.c:2696 [inline]
       __x64_sys_recvmmsg+0x89/0xb0 net/socket.c:2696
       do_syscall_64+0xcc/0x370 arch/x86/entry/common.c:290
       entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      Reported by Kernel Concurrency Sanitizer on:
      CPU: 0 PID: 10413 Comm: syz-executor.0 Not tainted 5.4.0-rc3+ #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      
      Fixes: 2276f58a ("udp: use a separate rx queue for packet reception")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Cc: Paolo Abeni <pabeni@redhat.com>
      Reviewed-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dpaa2: Use the correct style for SPDX License Identifier · 7de4344f
      Nishad Kamdar authored
      This patch corrects the SPDX License Identifier style in
      header files related to the DPAA2 Ethernet driver supporting
      Freescale SoCs with DPAA2. For C header files,
      Documentation/process/license-rules.rst mandates C-like comments
      (as opposed to C source files, where C++ style should be used).
      
      Changes made by using a script provided by Joe Perches here:
      https://lkml.org/lkml/2019/2/7/46.
      Suggested-by: Joe Perches <joe@perches.com>
      Signed-off-by: Nishad Kamdar <nishadkamdar@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'net-avoid-KCSAN-splats' · 20243058
      David S. Miller authored
      Eric Dumazet says:
      
      ====================
      net: avoid KCSAN splats
      
      We often use skb_queue_empty() without holding a lock,
      meaning that other cpus (or interrupts) can change the queue
      under us. This is fine, but we need to properly annotate
      the lockless intent to make sure the compiler won't
      over-optimize things.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: add READ_ONCE() annotation in __skb_wait_for_more_packets() · 7c422d0c
      Eric Dumazet authored
      __skb_wait_for_more_packets() can be called while other cpus
      can feed packets to the socket receive queue.
      
      KCSAN reported:
      
      BUG: KCSAN: data-race in __skb_wait_for_more_packets / __udp_enqueue_schedule_skb
      
      write to 0xffff888102e40b58 of 8 bytes by interrupt on cpu 0:
       __skb_insert include/linux/skbuff.h:1852 [inline]
       __skb_queue_before include/linux/skbuff.h:1958 [inline]
       __skb_queue_tail include/linux/skbuff.h:1991 [inline]
       __udp_enqueue_schedule_skb+0x2d7/0x410 net/ipv4/udp.c:1470
       __udp_queue_rcv_skb net/ipv4/udp.c:1940 [inline]
       udp_queue_rcv_one_skb+0x7bd/0xc70 net/ipv4/udp.c:2057
       udp_queue_rcv_skb+0xb5/0x400 net/ipv4/udp.c:2074
       udp_unicast_rcv_skb.isra.0+0x7e/0x1c0 net/ipv4/udp.c:2233
       __udp4_lib_rcv+0xa44/0x17c0 net/ipv4/udp.c:2300
       udp_rcv+0x2b/0x40 net/ipv4/udp.c:2470
       ip_protocol_deliver_rcu+0x4d/0x420 net/ipv4/ip_input.c:204
       ip_local_deliver_finish+0x110/0x140 net/ipv4/ip_input.c:231
       NF_HOOK include/linux/netfilter.h:305 [inline]
       NF_HOOK include/linux/netfilter.h:299 [inline]
       ip_local_deliver+0x133/0x210 net/ipv4/ip_input.c:252
       dst_input include/net/dst.h:442 [inline]
       ip_rcv_finish+0x121/0x160 net/ipv4/ip_input.c:413
       NF_HOOK include/linux/netfilter.h:305 [inline]
       NF_HOOK include/linux/netfilter.h:299 [inline]
       ip_rcv+0x18f/0x1a0 net/ipv4/ip_input.c:523
       __netif_receive_skb_one_core+0xa7/0xe0 net/core/dev.c:5010
       __netif_receive_skb+0x37/0xf0 net/core/dev.c:5124
       process_backlog+0x1d3/0x420 net/core/dev.c:5955
      
      read to 0xffff888102e40b58 of 8 bytes by task 13035 on cpu 1:
       __skb_wait_for_more_packets+0xfa/0x320 net/core/datagram.c:100
       __skb_recv_udp+0x374/0x500 net/ipv4/udp.c:1683
       udp_recvmsg+0xe1/0xb10 net/ipv4/udp.c:1712
       inet_recvmsg+0xbb/0x250 net/ipv4/af_inet.c:838
       sock_recvmsg_nosec+0x5c/0x70 net/socket.c:871
       ___sys_recvmsg+0x1a0/0x3e0 net/socket.c:2480
       do_recvmmsg+0x19a/0x5c0 net/socket.c:2601
       __sys_recvmmsg+0x1ef/0x200 net/socket.c:2680
       __do_sys_recvmmsg net/socket.c:2703 [inline]
       __se_sys_recvmmsg net/socket.c:2696 [inline]
       __x64_sys_recvmmsg+0x89/0xb0 net/socket.c:2696
       do_syscall_64+0xcc/0x370 arch/x86/entry/common.c:290
       entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      Reported by Kernel Concurrency Sanitizer on:
      CPU: 1 PID: 13035 Comm: syz-executor.3 Not tainted 5.4.0-rc3+ #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: use skb_queue_empty_lockless() in busy poll contexts · 3f926af3
      Eric Dumazet authored
      Busy polling usually runs without locks.
      Let's use skb_queue_empty_lockless() instead of skb_queue_empty().

      Also use READ_ONCE() in __skb_try_recv_datagram() to address
      a similar potential problem.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: use skb_queue_empty_lockless() in poll() handlers · 3ef7cf57
      Eric Dumazet authored
      Many poll() handlers are lockless. Using skb_queue_empty_lockless()
      instead of skb_queue_empty() is more appropriate.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
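A lockless emptiness check of the kind these commits introduce can be sketched in user-space C. The list type, the function names and the simplified MODEL_READ_ONCE() macro are hypothetical stand-ins, not the kernel's definitions. The macro relies on the __typeof__ extension available in GCC and Clang.

```c
#include <stdbool.h>

/* Simplified stand-in for READ_ONCE(): force one volatile load of x. */
#define MODEL_READ_ONCE(x) (*(const volatile __typeof__(x) *)&(x))

struct node {
	struct node *next;
	struct node *prev;
};

/* An empty circular list points back at its own head. */
void queue_init(struct node *head)
{
	head->next = head;
	head->prev = head;
}

void queue_add_tail(struct node *head, struct node *n)
{
	n->next = head;
	n->prev = head->prev;
	head->prev->next = n;
	head->prev = n;
}

/* Lockless check: a single annotated load of head->next. */
bool queue_empty_lockless(const struct node *head)
{
	return MODEL_READ_ONCE(head->next) == head;
}
```

The annotated load documents that the read is intentionally unlocked, so the answer may be stale by the time the caller acts on it; callers such as poll handlers are expected to tolerate that.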
    • udp: use skb_queue_empty_lockless() · 137a0dbe
      Eric Dumazet authored
      syzbot reported a data-race [1].
      
      We should use skb_queue_empty_lockless() to document that we are
      not ensuring a mutual exclusion and silence KCSAN.
      
      [1]
      BUG: KCSAN: data-race in __skb_recv_udp / __udp_enqueue_schedule_skb
      
      write to 0xffff888122474b50 of 8 bytes by interrupt on cpu 0:
       __skb_insert include/linux/skbuff.h:1852 [inline]
       __skb_queue_before include/linux/skbuff.h:1958 [inline]
       __skb_queue_tail include/linux/skbuff.h:1991 [inline]
       __udp_enqueue_schedule_skb+0x2c1/0x410 net/ipv4/udp.c:1470
       __udp_queue_rcv_skb net/ipv4/udp.c:1940 [inline]
       udp_queue_rcv_one_skb+0x7bd/0xc70 net/ipv4/udp.c:2057
       udp_queue_rcv_skb+0xb5/0x400 net/ipv4/udp.c:2074
       udp_unicast_rcv_skb.isra.0+0x7e/0x1c0 net/ipv4/udp.c:2233
       __udp4_lib_rcv+0xa44/0x17c0 net/ipv4/udp.c:2300
       udp_rcv+0x2b/0x40 net/ipv4/udp.c:2470
       ip_protocol_deliver_rcu+0x4d/0x420 net/ipv4/ip_input.c:204
       ip_local_deliver_finish+0x110/0x140 net/ipv4/ip_input.c:231
       NF_HOOK include/linux/netfilter.h:305 [inline]
       NF_HOOK include/linux/netfilter.h:299 [inline]
       ip_local_deliver+0x133/0x210 net/ipv4/ip_input.c:252
       dst_input include/net/dst.h:442 [inline]
       ip_rcv_finish+0x121/0x160 net/ipv4/ip_input.c:413
       NF_HOOK include/linux/netfilter.h:305 [inline]
       NF_HOOK include/linux/netfilter.h:299 [inline]
       ip_rcv+0x18f/0x1a0 net/ipv4/ip_input.c:523
       __netif_receive_skb_one_core+0xa7/0xe0 net/core/dev.c:5010
       __netif_receive_skb+0x37/0xf0 net/core/dev.c:5124
       process_backlog+0x1d3/0x420 net/core/dev.c:5955
      
      read to 0xffff888122474b50 of 8 bytes by task 8921 on cpu 1:
       skb_queue_empty include/linux/skbuff.h:1494 [inline]
       __skb_recv_udp+0x18d/0x500 net/ipv4/udp.c:1653
       udp_recvmsg+0xe1/0xb10 net/ipv4/udp.c:1712
       inet_recvmsg+0xbb/0x250 net/ipv4/af_inet.c:838
       sock_recvmsg_nosec+0x5c/0x70 net/socket.c:871
       ___sys_recvmsg+0x1a0/0x3e0 net/socket.c:2480
       do_recvmmsg+0x19a/0x5c0 net/socket.c:2601
       __sys_recvmmsg+0x1ef/0x200 net/socket.c:2680
       __do_sys_recvmmsg net/socket.c:2703 [inline]
       __se_sys_recvmmsg net/socket.c:2696 [inline]
       __x64_sys_recvmmsg+0x89/0xb0 net/socket.c:2696
       do_syscall_64+0xcc/0x370 arch/x86/entry/common.c:290
       entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      Reported by Kernel Concurrency Sanitizer on:
      CPU: 1 PID: 8921 Comm: syz-executor.4 Not tainted 5.4.0-rc3+ #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      137a0dbe
    • Eric Dumazet's avatar
      net: add skb_queue_empty_lockless() · d7d16a89
      Eric Dumazet authored
      Some paths call skb_queue_empty() without holding
      the queue lock. We must use a barrier in order
      to not let the compiler do strange things, and avoid
      KCSAN splats.
      
      Adding a barrier in skb_queue_empty() might be overkill;
      I prefer adding a new helper to clearly identify
      points where the callers might be lockless. This might
      help us find real bugs.
      
      The corresponding WRITE_ONCE() should add zero cost
      for current compilers.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      d7d16a89
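The idea behind the new helper can be sketched in userspace C. This is a minimal illustration, not the kernel's code: the `demo_*` names and the list layout are hypothetical, and the `READ_ONCE()`/`WRITE_ONCE()` macros below are simplified stand-ins built on a volatile access (they rely on the GCC/Clang `__typeof__` extension).

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel macros (assumption: a volatile
 * access is enough to make the compiler emit exactly one load/store). */
#define READ_ONCE(x)       (*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, val) (*(volatile __typeof__(x) *)&(x) = (val))

/* Hypothetical circular list head; an empty queue points at itself. */
struct demo_node {
    struct demo_node *next;
    struct demo_node *prev;
};

static void demo_queue_init(struct demo_node *head)
{
    head->next = head;
    head->prev = head;
}

/* Variant for callers that hold the queue lock: a plain load is fine. */
static bool demo_queue_empty(const struct demo_node *head)
{
    return head->next == head;
}

/* Variant for lockless callers: one marked load. The answer may be
 * stale by the time the caller acts on it, which the caller must accept. */
static bool demo_queue_empty_lockless(const struct demo_node *head)
{
    return READ_ONCE(head->next) == head;
}

/* Writer side (lock assumed held): stores that lockless readers can
 * observe are paired with WRITE_ONCE(). */
static void demo_queue_tail(struct demo_node *head, struct demo_node *n)
{
    struct demo_node *last = head->prev;

    n->next = head;
    n->prev = last;
    WRITE_ONCE(last->next, n);
    WRITE_ONCE(head->prev, n);
}
```

Keeping two helpers, rather than marking the load in the locked one, makes every lockless call site visible by name, which is the point the message makes about finding real bugs.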
  4. 27 Oct, 2019 2 commits
    • David S. Miller's avatar
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf · fc11078d
      David S. Miller authored
      Pablo Neira Ayuso says:
      
      ====================
      Netfilter/IPVS fixes for net
      
      The following patchset contains Netfilter/IPVS fixes for net:
      
      1) Fix crash on flowtable due to race between garbage collection
         and insertion.
      
      2) Restore callback unbinding in netfilter offloads.
      
      3) Fix races on IPVS module removal, from Davide Caratti.
      
      4) Make old_secure_tcp per-netns to fix syzbot report,
         from Eric Dumazet.
      
      5) Validate matching length in netfilter offloads, from wenxu.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      fc11078d
    • David S. Miller's avatar
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf · 1a51a474
      David S. Miller authored
      Daniel Borkmann says:
      
      ====================
      pull-request: bpf 2019-10-27
      
      The following pull-request contains BPF updates for your *net* tree.
      
      We've added 7 non-merge commits during the last 11 day(s) which contain
      a total of 7 files changed, 66 insertions(+), 16 deletions(-).
      
      The main changes are:
      
      1) Fix two use-after-free bugs in relation to RCU in jited symbol exposure to
         kallsyms, from Daniel Borkmann.
      
      2) Fix NULL pointer dereference in AF_XDP rx-only sockets, from Magnus Karlsson.
      
      3) Fix hang in netdev unregister for hash based devmap as well as another overflow
         bug on 32 bit archs in memlock cost calculation, from Toke Høiland-Jørgensen.
      
      4) Fix wrong memory access in LWT BPF programs on reroute due to invalid dst.
         Also fix BPF selftests to use more compatible nc options, from Jiri Benc.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      1a51a474
  5. 26 Oct, 2019 11 commits
  6. 25 Oct, 2019 3 commits
    • Ben Dooks (Codethink)'s avatar
      net: hwbm: if CONFIG_NET_HWBM unset, make stub functions static · 91e2e576
      Ben Dooks (Codethink) authored
      If CONFIG_NET_HWBM is not set, then these stub functions in
      <net/hwbm.h> should be declared static to avoid trying to
      export them from any driver that includes this.
      
      Fixes the following sparse warnings:
      
      ./include/net/hwbm.h:24:6: warning: symbol 'hwbm_buf_free' was not declared. Should it be static?
      ./include/net/hwbm.h:25:5: warning: symbol 'hwbm_pool_refill' was not declared. Should it be static?
      ./include/net/hwbm.h:26:5: warning: symbol 'hwbm_pool_add' was not declared. Should it be static?
      Signed-off-by: Ben Dooks (Codethink) <ben.dooks@codethink.co.uk>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      91e2e576
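The header pattern this kind of fix relies on can be sketched as follows. The names `CONFIG_DEMO_FEATURE`, `demo_pool_refill()` and `demo_buf_free()` are hypothetical and only illustrate the shape; they are not the contents of <net/hwbm.h>.

```c
#include <stddef.h>

#ifdef CONFIG_DEMO_FEATURE
/* Feature built in: only declare; the definitions live in a .c file. */
int demo_pool_refill(unsigned int count);
void demo_buf_free(void *buf);
#else
/* Feature compiled out: each includer gets its own private no-op copy.
 * Without "static", every file including this header would define an
 * external symbol with no prior declaration, which is what sparse
 * warns about. */
static inline int demo_pool_refill(unsigned int count)
{
    (void)count;
    return 0; /* nothing to refill, report success */
}

static inline void demo_buf_free(void *buf)
{
    (void)buf;
}
#endif
```

With the stubs marked static, callers compile unchanged whether or not the option is set, and no stub symbol leaks out of the including object file.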
    • Ben Dooks (Codethink)'s avatar
      net: mvneta: make stub functions static inline · 3f6b2c44
      Ben Dooks (Codethink) authored
      If CONFIG_MVNETA_BM is not set, then make the stub functions
      static inline to avoid trying to export them, and remove the
      following sparse warnings:
      
      drivers/net/ethernet/marvell/mvneta_bm.h:163:6: warning: symbol 'mvneta_bm_pool_destroy' was not declared. Should it be static?
      drivers/net/ethernet/marvell/mvneta_bm.h:165:6: warning: symbol 'mvneta_bm_bufs_free' was not declared. Should it be static?
      drivers/net/ethernet/marvell/mvneta_bm.h:167:5: warning: symbol 'mvneta_bm_construct' was not declared. Should it be static?
      drivers/net/ethernet/marvell/mvneta_bm.h:168:5: warning: symbol 'mvneta_bm_pool_refill' was not declared. Should it be static?
      drivers/net/ethernet/marvell/mvneta_bm.h:170:23: warning: symbol 'mvneta_bm_pool_use' was not declared. Should it be static?
      drivers/net/ethernet/marvell/mvneta_bm.h:181:18: warning: symbol 'mvneta_bm_get' was not declared. Should it be static?
      drivers/net/ethernet/marvell/mvneta_bm.h:182:6: warning: symbol 'mvneta_bm_put' was not declared. Should it be static?
      Signed-off-by: Ben Dooks (Codethink) <ben.dooks@codethink.co.uk>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      3f6b2c44
    • Vincent Prince's avatar
      net: sch_generic: Use pfifo_fast as fallback scheduler for CAN hardware · fa784f2a
      Vincent Prince authored
      There is networking hardware that isn't based on Ethernet for layers 1 and 2,
      for example CAN.
      
      CAN is a multi-master serial bus standard for connecting Electronic Control
      Units (ECUs), also known as nodes. A frame on the CAN bus carries up to 8 bytes
      of payload. Frame corruption is detected by a CRC. Frame loss due to
      corruption is possible, but quite unusual.
      
      While fq_codel works great for TCP/IP, it doesn't for CAN. There are a lot of
      legacy protocols on top of CAN, which are not built with flow control or high
      CAN frame drop rates in mind.
      
      When using fq_codel, as soon as the queue reaches a certain delay-based length,
      skbs from the head of the queue are silently dropped. Silently meaning that the
      user space using a send() or similar syscall doesn't get an error. However,
      TCP's flow control algorithm will detect dropped packets and adjust the
      bandwidth accordingly.
      
      When using fq_codel and sending raw frames over CAN, which is the common use
      case, the user space thinks the packet has been sent without problems, because
      send() returned without an error. pfifo_fast will also drop skbs if the queue
      length exceeds the maximum. But with this scheduler the skbs at the tail are
      dropped and an error (-ENOBUFS) is propagated to user space, so that user
      space can slow down packet generation.
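The tail-drop behaviour described for pfifo_fast can be modelled with a small bounded FIFO. This is only a toy model under stated assumptions: `struct demo_fifo`, `demo_fifo_enqueue()` and the limit of 4 are hypothetical and do not mirror the real qdisc implementation.

```c
#include <errno.h>
#include <stddef.h>

#define DEMO_QUEUE_LIMIT 4 /* hypothetical maximum queue length */

struct demo_fifo {
    int frames[DEMO_QUEUE_LIMIT];
    size_t len;
};

/* Tail drop: when the queue is full, the *new* frame is refused and the
 * caller is told about it, instead of an older frame vanishing silently. */
static int demo_fifo_enqueue(struct demo_fifo *q, int frame)
{
    if (q->len >= DEMO_QUEUE_LIMIT)
        return -ENOBUFS;

    q->frames[q->len++] = frame;
    return 0;
}
```

A sender that sees `-ENOBUFS` knows exactly which frame was not queued and can back off or retry it, which is the feedback a raw CAN application otherwise lacks.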
      
      On distributions where fq_codel is made default via CONFIG_DEFAULT_NET_SCH
      during compile time, or set default during runtime with sysctl
      net.core.default_qdisc (see [1]), we get a bad user experience. In my test case
      with pfifo_fast, I can transfer thousands of millions of CAN frames without a
      frame drop. On the other hand, with fq_codel there is more than one lost CAN
      frame per thousand frames.
      
      As pointed out, fq_codel is not suited for CAN hardware, so this patch changes
      attach_one_default_qdisc() to use pfifo_fast for "ARPHRD_CAN" network devices.
      
      During transition of a netdev from down to up state the default queuing
      discipline is attached by attach_default_qdiscs() with the help of
      attach_one_default_qdisc(). This patch modifies attach_one_default_qdisc() to
      attach the pfifo_fast (pfifo_fast_ops) if the network device type is
      "ARPHRD_CAN".
      
      [1] https://github.com/systemd/systemd/issues/9194
      Signed-off-by: Vincent Prince <vincent.prince.fr@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      fa784f2a
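The selection rule described in this commit can be sketched as a small decision function. This is a simplified model, not the kernel code: `struct demo_dev`, `demo_pick_default_qdisc()` and the `no_queue` flag are hypothetical, and qdiscs are represented by their names rather than by ops structures. The two device type values match those in <linux/if_arp.h>.

```c
#include <string.h>

#define DEMO_ARPHRD_ETHER 1   /* same value as ARPHRD_ETHER */
#define DEMO_ARPHRD_CAN   280 /* same value as ARPHRD_CAN */

/* Hypothetical, minimal view of a network device. */
struct demo_dev {
    unsigned short type; /* hardware type, e.g. DEMO_ARPHRD_CAN */
    int no_queue;        /* non-zero for devices that need no qdisc */
};

/* Pick the qdisc to attach when the device goes from down to up.
 * configured_default stands for whatever CONFIG_DEFAULT_NET_SCH or
 * net.core.default_qdisc selected (for example "fq_codel"). */
static const char *demo_pick_default_qdisc(const struct demo_dev *dev,
                                           const char *configured_default)
{
    if (dev->no_queue)
        return "noqueue";
    if (dev->type == DEMO_ARPHRD_CAN)
        return "pfifo_fast"; /* CAN always falls back to pfifo_fast */
    return configured_default;
}
```

Under this model only the default attached at bring-up changes for CAN devices; other device types keep following the configured default.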