1. 03 Jul, 2019 1 commit
    • loopback: fix lockdep splat · d62962b3
      Mahesh Bandewar authored
      dev_init_scheduler() and dev_activate() expect the caller to
      hold RTNL. Since we don't want the blackhole device to be initialized
      per netns, we initialize it once at init time (from an initcall).
      
      [    3.855027] Call Trace:
      [    3.855034]  dump_stack+0x67/0x95
      [    3.855037]  lockdep_rcu_suspicious+0xd5/0x110
      [    3.855044]  dev_init_scheduler+0xe3/0x120
      [    3.855048]  ? net_olddevs_init+0x60/0x60
      [    3.855050]  blackhole_netdev_init+0x45/0x6e
      [    3.855052]  do_one_initcall+0x6c/0x2fa
      [    3.855058]  ? rcu_read_lock_sched_held+0x8c/0xa0
      [    3.855066]  kernel_init_freeable+0x1e5/0x288
      [    3.855071]  ? rest_init+0x260/0x260
      [    3.855074]  kernel_init+0xf/0x180
      [    3.855076]  ? rest_init+0x260/0x260
      [    3.855078]  ret_from_fork+0x24/0x30
      
      Fixes: 4de83b88 ("loopback: create blackhole net device similar to loopack.")
      Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Eric Dumazet <edumazet@google.com>
      Signed-off-by: Mahesh Bandewar <maheshb@google.com>
      Tested-by: Geert Uytterhoeven <geert+renesas@glider.be>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  2. 02 Jul, 2019 39 commits
    • mlxsw: spectrum_ptp: Fix validation in mlxsw_sp1_ptp_packet_finish() · dbcdb61a
      Petr Machata authored
      Before mlxsw_sp1_ptp_packet_finish() sends the packet back, it validates
      whether the corresponding port is still valid. However, the condition is
      incorrect: when mlxsw_sp_port == NULL, the code dereferences the port to
      compare it to skb->dev.
      
      The condition needs to check whether the port is present and skb->dev still
      refers to that port (or else is NULL). If that does not hold, bail out.
      Add a pair of parentheses to fix the condition.
      
      Fixes: d92e4e6e ("mlxsw: spectrum: PTP: Support timestamping on Spectrum-1")
      Reported-by: Colin Ian King <colin.king@canonical.com>
      Signed-off-by: Petr Machata <petrm@mellanox.com>
      Signed-off-by: Colin Ian King <colin.king@canonical.com>
      Reviewed-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
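      The corrected validation can be modeled in user space. This is a minimal sketch, not mlxsw code: `struct port`, `struct dev`, `ptp_port_still_valid()` and `ptp_packet_finish()` are hypothetical stand-ins. Dropping the outer parentheses gives a condition of the form `!port && (!skb_dev || skb_dev == port->dev)`, which reaches `port->dev` exactly when `port` is NULL and `skb_dev` is not; negating the whole conjunction instead lets `&&` short-circuit before any dereference.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for a net device and a switch port. */
struct dev { int id; };
struct port { struct dev *dev; };

/*
 * True when the packet may be sent back: the port exists and skb_dev is
 * either NULL or still the port's device. port->dev is only evaluated
 * after the NULL check, thanks to && short-circuiting.
 */
static bool ptp_port_still_valid(const struct port *port,
                                 const struct dev *skb_dev)
{
    return port && (!skb_dev || skb_dev == port->dev);
}

/* Caller pattern mirroring the fix: bail out when the check fails. */
static int ptp_packet_finish(const struct port *port,
                             const struct dev *skb_dev)
{
    if (!ptp_port_still_valid(port, skb_dev))
        return -1; /* drop the packet */
    return 0;      /* send it back */
}
```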
    • r8169: add random MAC address fallback · c782e204
      Heiner Kallweit authored
      It was reported that the GPD MicroPC is broken in a way that no valid
      MAC address can be read from the network chip. The vendor driver deals
      with this by assigning a random MAC address as fallback. So let's do
      the same.
      Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
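      The fallback logic can be sketched in user space. This is an illustration, not r8169 code: `mac_is_valid()` follows the same rules as the kernel's is_valid_ether_addr() (not multicast, not all-zero), `random_mac()` follows the bit handling of eth_random_addr() (clear the multicast bit, set the locally administered bit), and rand() stands in for a proper random source. In-kernel, helpers such as is_valid_ether_addr() and eth_hw_addr_random() exist for this purpose.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define ETH_ALEN 6

/* Same rules as the kernel's is_valid_ether_addr(): not multicast, not zero. */
static bool mac_is_valid(const uint8_t mac[ETH_ALEN])
{
    static const uint8_t zero[ETH_ALEN];

    if (mac[0] & 0x01)            /* multicast/broadcast bit */
        return false;
    return memcmp(mac, zero, ETH_ALEN) != 0;
}

/* Random unicast, locally administered address (like eth_random_addr()). */
static void random_mac(uint8_t mac[ETH_ALEN])
{
    for (int i = 0; i < ETH_ALEN; i++)
        mac[i] = (uint8_t)(rand() & 0xff);
    mac[0] &= 0xfe;  /* clear multicast bit */
    mac[0] |= 0x02;  /* set locally administered bit */
}

/*
 * Use the address read from the chip if it is valid; otherwise fall back
 * to a random one. Returns true when the fallback was taken.
 */
static bool pick_mac(const uint8_t from_chip[ETH_ALEN], uint8_t out[ETH_ALEN])
{
    if (mac_is_valid(from_chip)) {
        memcpy(out, from_chip, ETH_ALEN);
        return false;
    }
    random_mac(out);
    return true;
}
```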
    • Revert "r8169: improve handling VLAN tag" · 7424edbb
      Heiner Kallweit authored
      This reverts commit 759d0957.
      
      The patch was based on a misunderstanding. As Al Viro pointed out [0],
      it's simply wrong on big-endian systems. So let's revert it.
      
      [0] https://marc.info/?t=156200975600004&r=1&w=2

      Reported-by: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: stmmac: make "snps,reset-delays-us" optional again · cc5e92c2
      Martin Blumenstingl authored
      Commit 760f1dc2 ("net: stmmac: add sanity check to
      device_property_read_u32_array call") introduced error checking of the
      device_property_read_u32_array() call in stmmac_mdio_reset().
      This results in the following error when the "snps,reset-delays-us"
      property is not defined in devicetree:
        invalid property snps,reset-delays-us
      
      This sanity check made sense until commit 84ce4d0f ("net: stmmac:
      initialize the reset delay array") ensured that there are fallback
      values for the reset delay if the "snps,reset-delays-us" property is
      absent. The check, however, comes at the cost of making that property
      mandatory.
      
      Drop the sanity check for device_property_read_u32_array() and thus make
      the "snps,reset-delays-us" property optional again (avoiding the error
      message while loading the stmmac driver with a .dtb where the property
      is absent).
      
      Fixes: 760f1dc2 ("net: stmmac: add sanity check to device_property_read_u32_array call")
      Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
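      The "optional property with fallback" pattern can be sketched as follows. This is a user-space illustration under stated assumptions: `read_u32_array()` is a hypothetical stand-in for device_property_read_u32_array() that returns -EINVAL and leaves the output untouched when the property is absent, and the all-zero fallback values are assumed for illustration, not taken from the stmmac driver.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical stand-in for device_property_read_u32_array(): copies the
 * property value if present, otherwise returns -EINVAL and leaves the
 * output untouched.
 */
static int read_u32_array(const uint32_t *prop, size_t prop_len,
                          uint32_t *out, size_t n)
{
    if (!prop || prop_len < n)
        return -EINVAL;
    memcpy(out, prop, n * sizeof(*out));
    return 0;
}

/*
 * Optional-property pattern: start from fallback values and let the
 * property override them. A read failure is not treated as an error.
 */
static void get_reset_delays(const uint32_t *prop, size_t prop_len,
                             uint32_t delays[3])
{
    const uint32_t fallback[3] = { 0, 0, 0 }; /* assumed defaults */

    memcpy(delays, fallback, sizeof(fallback));
    (void)read_u32_array(prop, prop_len, delays, 3); /* absent: keep fallback */
}
```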
    • bonding/main: fix NULL dereference in bond_select_active_slave() · b8bd72d3
      Eric Dumazet authored
      A bonding master can be up while best_slave is NULL.
      
      [12105.636318] BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
      [12105.638204] mlx4_en: eth1: Linkstate event 1 -> 1
      [12105.648984] IP: bond_select_active_slave+0x125/0x250
      [12105.653977] PGD 0 P4D 0
      [12105.656572] Oops: 0000 [#1] SMP PTI
      [12105.660487] gsmi: Log Shutdown Reason 0x03
      [12105.664620] Modules linked in: kvm_intel loop act_mirred uhaul vfat fat stg_standard_ftl stg_megablocks stg_idt stg_hdi stg elephant_dev_num stg_idt_eeprom w1_therm wire i2c_mux_pca954x i2c_mux mlx4_i2c i2c_usb cdc_acm ehci_pci ehci_hcd i2c_iimc mlx4_en mlx4_ib ib_uverbs ib_core mlx4_core [last unloaded: kvm_intel]
      [12105.685686] mlx4_core 0000:03:00.0: dispatching link up event for port 2
      [12105.685700] mlx4_en: eth2: Linkstate event 2 -> 1
      [12105.685700] mlx4_en: eth2: Link Up (linkstate)
      [12105.724452] Workqueue: bond0 bond_mii_monitor
      [12105.728854] RIP: 0010:bond_select_active_slave+0x125/0x250
      [12105.734355] RSP: 0018:ffffaf146a81fd88 EFLAGS: 00010246
      [12105.739637] RAX: 0000000000000003 RBX: ffff8c62b03c6900 RCX: 0000000000000000
      [12105.746838] RDX: 0000000000000000 RSI: ffffaf146a81fd08 RDI: ffff8c62b03c6000
      [12105.754054] RBP: ffffaf146a81fdb8 R08: 0000000000000001 R09: ffff8c517d387600
      [12105.761299] R10: 00000000001075d9 R11: ffffffffaceba92f R12: 0000000000000000
      [12105.768553] R13: ffff8c8240ae4800 R14: 0000000000000000 R15: 0000000000000000
      [12105.775748] FS:  0000000000000000(0000) GS:ffff8c62bfa40000(0000) knlGS:0000000000000000
      [12105.783892] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [12105.789716] CR2: 0000000000000000 CR3: 0000000d0520e001 CR4: 00000000001626f0
      [12105.796976] Call Trace:
      [12105.799446]  [<ffffffffac31d387>] bond_mii_monitor+0x497/0x6f0
      [12105.805317]  [<ffffffffabd42643>] process_one_work+0x143/0x370
      [12105.811225]  [<ffffffffabd42c7a>] worker_thread+0x4a/0x360
      [12105.816761]  [<ffffffffabd48bc5>] kthread+0x105/0x140
      [12105.821865]  [<ffffffffabd42c30>] ? rescuer_thread+0x380/0x380
      [12105.827757]  [<ffffffffabd48ac0>] ? kthread_associate_blkcg+0xc0/0xc0
      [12105.834266]  [<ffffffffac600241>] ret_from_fork+0x51/0x60
      
      Fixes: e2a7420d ("bonding/main: convert to using slave printk macros")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: John Sperbeck <jsperbeck@google.com>
      Cc: Jarod Wilson <jarod@redhat.com>
      Cc: Jay Vosburgh <j.vosburgh@gmail.com>
      Cc: Veaceslav Falico <vfalico@gmail.com>
      Cc: Andy Gospodarek <andy@greyhouse.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tipc: remove ub->ubsock checks · d2c3a4ba
      Xin Long authored
      Both tipc_udp_enable() and tipc_udp_disable() are called under rtnl_lock,
      so ub->ubsock can never be NULL in tipc_udp_disable() and cleanup_bearer();
      remove those checks.

      Also remove the one in tipc_udp_enable() by adding a "free" label.
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
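      The "free" label refers to the usual goto-based error unwinding idiom. A minimal user-space sketch, not the TIPC code: `struct bearer`, `bearer_enable()`, `bearer_disable()` and the `fail_at` failure-injection knob are all hypothetical. Each error path jumps to a label that releases exactly what was acquired so far, so the cleanup code knows which pointers are set and needs no NULL checks.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical resources standing in for a UDP bearer and its socket. */
struct bearer {
    int *sock;
    int *cache;
};

/*
 * Set up both resources; fail_at (0 = no failure, 1 or 2 = fail at that
 * step) injects an error for illustration. Labels live in their own
 * namespace in C, so a label named "free" does not clash with free().
 */
static int bearer_enable(struct bearer *b, int fail_at)
{
    int err;

    b->sock = fail_at == 1 ? NULL : malloc(sizeof(*b->sock));
    if (!b->sock)
        return -ENOMEM;            /* nothing to undo yet */

    b->cache = fail_at == 2 ? NULL : malloc(sizeof(*b->cache));
    if (!b->cache) {
        err = -ENOMEM;
        goto free;                 /* undo step 1 only */
    }
    return 0;

free:
    free(b->sock);                 /* known to be set on this path */
    b->sock = NULL;
    return err;
}

/* Only called after a successful enable, so both pointers are set. */
static void bearer_disable(struct bearer *b)
{
    free(b->cache);
    free(b->sock);
}
```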
    • ipv4: Fix off-by-one in route dump counter without netlink strict checking · 885b8b4d
      Stefano Brivio authored
      In commit ee28906f ("ipv4: Dump route exceptions if requested") I
      added a counter of per-node dumped routes (including actual routes and
      exceptions), analogous to the existing counter for dumped nodes. Dumping
      exceptions means we need to also keep track of how many routes are dumped
      for each node: this would be just one route per node, without exceptions.
      
      When netlink strict checking is not enabled, we dump both routes and
      exceptions at the same time: the RTM_F_CLONED flag is not used as a
      filter. In this case, the per-node counter 'i_fa' is incremented by one
      to track the single dumped route, then also incremented by one for each
      exception dumped, and then stored as netlink callback argument as skip
      counter, 's_fa', to be used when a partial dump operation restarts.
      
      The per-node counter needs to be increased by one also when we skip a
      route (exception) due to a previous non-zero skip counter, because it
      needs to match the existing skip counter, if we are dumping both routes
      and exceptions. I missed this, and only incremented the counter, for
      regular routes, if the previous skip counter was zero. This means that,
      in case of a mixed dump, partial dump operations after the first one
      will start with a mismatching skip counter value, one less than expected.
      
      This means in turn that the first exception for a given node is skipped
      every time a partial dump operation restarts, if netlink strict checking
      is not enabled (iproute2 < 5.0).
      
      It turns out I didn't repeat the test in its final version, commit
      de755a85 ("selftests: pmtu: Introduce list_flush_ipv4_exception test
      case"), which also counts the number of route exceptions returned, with
      iproute2 versions < 5.0 -- I was instead using the equivalent of the IPv6
      test as it was before commit b964641e ("selftests: pmtu: Make
      list_flush_ipv6_exception test more demanding").
      
      Always increment the per-node counter by one if we previously dumped
      a regular route, so that it matches the current skip counter.
      
      Fixes: ee28906f ("ipv4: Dump route exceptions if requested")
      Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
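      The invariant behind the fix (the per-node counter advances for every entry, whether it is emitted or skipped because an earlier partial dump already sent it) can be shown with a simplified resumable-dump model. This is a sketch, not fib_trie code: `struct node`, `struct dump_state` and `dump()` are hypothetical, each node holds one regular route (index 0) followed by `n_exc` exceptions, and entries are encoded as `node * 100 + index`. If `i_fa` failed to advance for even one skipped entry, a resumed dump would be misaligned with `s_fa` and lose entries, which is the class of bug the commit describes.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical node: one regular route followed by n_exc exceptions. */
struct node { int n_exc; };

/* Resumable dump state, like the netlink callback's skip counters. */
struct dump_state { int s_node; int s_fa; };

/*
 * Emit at most cap entries per call. Returns the number emitted; 0 means
 * the dump is complete. i_fa counts every entry of the current node,
 * including the ones skipped because they were sent in an earlier call.
 */
static int dump(const struct node *nodes, int n_nodes,
                struct dump_state *st, int *out, int cap)
{
    int count = 0;

    for (int n = st->s_node; n < n_nodes; n++) {
        int i_fa = 0;

        for (int idx = 0; idx <= nodes[n].n_exc; idx++, i_fa++) {
            if (i_fa < st->s_fa)
                continue;              /* already dumped: skip, but count it */
            if (count == cap) {
                st->s_node = n;        /* resume here next time */
                st->s_fa = i_fa;
                return count;
            }
            out[count++] = n * 100 + idx;
        }
        st->s_fa = 0;                  /* next node starts from scratch */
    }
    st->s_node = n_nodes;
    return count;
}
```

      Driving `dump()` repeatedly with a two-entry buffer returns every route and exception exactly once, in order.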
    • net: ethernet: mediatek: Allow non TRGMII mode with MT7621 DDR2 devices · cce581a0
      René van Dorst authored
      There is no reason to error out on an MT7621 device with DDR2 memory
      when a non-TRGMII mode is selected.
      Only the MT7621 DDR2 clock setup for TRGMII mode is unsupported,
      and non-TRGMII modes don't need any special clock setup.
      Signed-off-by: René van Dorst <opensource@vdorst.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
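      The resulting policy amounts to rejecting a single combination. A minimal sketch with a hypothetical enum and function name; -EOPNOTSUPP is chosen for illustration, since the commit message does not state which error the driver returns.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical interface modes; only TRGMII needs the special clock setup. */
enum phy_mode { MODE_RGMII, MODE_TRGMII };

/*
 * Reject only the unsupported combination (DDR2 memory + TRGMII).
 * Other modes need no special clock setup, so DDR2 is fine there.
 */
static int mt7621_check_mode(bool is_ddr2, enum phy_mode mode)
{
    if (mode == MODE_TRGMII && is_ddr2)
        return -EOPNOTSUPP;
    return 0;
}
```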
    • rxrpc: Fix uninitialized error code in rxrpc_send_data_packet() · 3427beb6
      David Howells authored
      With gcc 4.1:
      
          net/rxrpc/output.c: In function ‘rxrpc_send_data_packet’:
          net/rxrpc/output.c:338: warning: ‘ret’ may be used uninitialized in this function
      
      Indeed, if the first jump to the send_fragmentable label is made, and
      the address family is not handled in the switch() statement, ret will be
      used uninitialized.
      
      Fix this by BUG()'ing as is done in other places in rxrpc where internal
      support for future address families will need adding.  It should not be
      possible to reach this normally as the address families are checked
      up-front.
      
      Fixes: 5a924b89 ("rxrpc: Don't store the rxrpc header in the Tx queue sk_buffs")
      Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
      Signed-off-by: David Howells <dhowells@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
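      The shape of the fix can be sketched in user space, with `BUG()` modeled as a report followed by abort() and hypothetical `send_v4()`/`send_v6()` helpers; this is not the rxrpc code. Because the default branch never returns, every path that reaches `return ret` has assigned `ret`, so the compiler no longer sees an uninitialized use.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/socket.h>   /* AF_INET, AF_INET6 */

/* User-space stand-in for the kernel's BUG(): report and abort. */
#define BUG() \
    do { fprintf(stderr, "BUG at %s:%d\n", __FILE__, __LINE__); abort(); } while (0)

/* Hypothetical per-family senders; return 0 on success. */
static int send_v4(void) { return 0; }
static int send_v6(void) { return 0; }

static int send_fragmentable(int family)
{
    int ret;

    switch (family) {
    case AF_INET:
        ret = send_v4();
        break;
    case AF_INET6:
        ret = send_v6();
        break;
    default:
        /* Families are validated up front; reaching here is a bug. */
        BUG();
    }
    return ret;   /* every path that gets here has assigned ret */
}
```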
    • nfc: st-nci: remove redundant assignment to variable r · 23ec8eaf
      Colin Ian King authored
      The variable r is being initialized with a value that is never
      read and it is being updated later with a new value. The
      initialization is redundant and can be removed.
      
      Addresses-Coverity: ("Unused value")
      Signed-off-by: Colin Ian King <colin.king@canonical.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • hinic: remove standard netdev stats · 83b6a85b
      Xue Chaojing authored
      This patch removes the standard netdev stats from the ethtool -S output.
      Suggested-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Signed-off-by: Xue Chaojing <xuechaojing@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: stmmac: Re-word Kconfig entry · b432bdb6
      Jose Abreu authored
      We support many speeds and it doesn't make much sense to list them all
      in the Kconfig. Let's just call it Multi-Gigabit.
      Suggested-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Jose Abreu <joabreu@synopsys.com>
      Cc: Joao Pinto <jpinto@synopsys.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Giuseppe Cavallaro <peppe.cavallaro@st.com>
      Cc: Alexandre Torgue <alexandre.torgue@st.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'Add-gve-driver' · 337d1ccb
      David S. Miller authored
      Catherine Sullivan says:
      
      ====================
      Add gve driver
      
      This patch series adds the gve driver which will support the
      Compute Engine Virtual NIC that will be available in the future.
      
      v2:
      - Patch 1:
        - Remove gve_size_assert.h and use static_assert instead.
        - Loop forever instead of bugging if the device won't reset
        - Use module_pci_driver
      - Patch 2:
        - Use be16_to_cpu in the RX Seq No define
        - Remove unneeded ndo_change_mtu
      - Patch 3:
        - No Changes
      - Patch 4:
        - Instead of checking netif_carrier_ok in ethtool stats, just make sure
      
      v3:
      - Patch 1:
        - Remove X86 dep
      - Patch 2:
        - No changes
      - Patch 3:
        - No changes
      - Patch 4:
        - Remove unneeded memsets in ethtool stats
      
      v4:
      - Patch 1:
        - Use io[read|write]32be instead of [read|write]l(cpu_to_be32())
        - Explicitly add padding to gve_adminq_set_driver_parameter
        - Use static where appropriate
      - Patch 2:
        - Use u64_stats_sync
        - Explicitly add padding to gve_adminq_create_rx_queue
        - Fix some endianness typing issues found by kbuild
        - Use static where appropriate
        - Remove unused variables
      - Patch 3:
        - Use io[read|write]32be instead of [read|write]l(cpu_to_be32())
      - Patch 4:
        - Use u64_stats_sync
        - Use static where appropriate
      Warnings reported by:
      Reported-by: kbuild test robot <lkp@intel.com>
      Reported-by: Julia Lawall <julia.lawall@lip6.fr>
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • gve: Add ethtool support · e5b845dc
      Catherine Sullivan authored
      Add support for the following ethtool commands:
      
      ethtool -s|--change devname [msglvl N] [msglevel type on|off]
      ethtool -S|--statistics devname
      ethtool -i|--driver devname
      ethtool -l|--show-channels devname
      ethtool -L|--set-channels devname
      ethtool -g|--show-ring devname
      ethtool --reset devname
      Signed-off-by: Catherine Sullivan <csully@google.com>
      Signed-off-by: Sagi Shahar <sagis@google.com>
      Signed-off-by: Jon Olson <jonolson@google.com>
      Acked-by: Willem de Bruijn <willemb@google.com>
      Reviewed-by: Luigi Rizzo <lrizzo@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • gve: Add workqueue and reset support · 9e5f7d26
      Catherine Sullivan authored
      Add support for the workqueue to handle management interrupts and
      support for resets.
      Signed-off-by: Catherine Sullivan <csully@google.com>
      Signed-off-by: Sagi Shahar <sagis@google.com>
      Signed-off-by: Jon Olson <jonolson@google.com>
      Acked-by: Willem de Bruijn <willemb@google.com>
      Reviewed-by: Luigi Rizzo <lrizzo@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • gve: Add transmit and receive support · f5cedc84
      Catherine Sullivan authored
      Add support for passing traffic.
      Signed-off-by: Catherine Sullivan <csully@google.com>
      Signed-off-by: Sagi Shahar <sagis@google.com>
      Signed-off-by: Jon Olson <jonolson@google.com>
      Acked-by: Willem de Bruijn <willemb@google.com>
      Reviewed-by: Luigi Rizzo <lrizzo@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • gve: Add basic driver framework for Compute Engine Virtual NIC · 893ce44d
      Catherine Sullivan authored
      Add a driver framework for the Compute Engine Virtual NIC that will be
      available in the future.
      
      At this point the only functionality is loading the driver.
      Signed-off-by: Catherine Sullivan <csully@google.com>
      Signed-off-by: Sagi Shahar <sagis@google.com>
      Signed-off-by: Jon Olson <jonolson@google.com>
      Acked-by: Willem de Bruijn <willemb@google.com>
      Reviewed-by: Luigi Rizzo <lrizzo@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'blackhole-device-to-invalidate-dst' · 2a8d8e0f
      David S. Miller authored
      Mahesh Bandewar says:
      
      ====================
      blackhole device to invalidate dst
      
      When we invalidate a dst or mark it "dead", we assign 'lo' to
      dst->dev. First of all, this assignment is racy; moreover,
      it has MTU implications.
      
      The standard device MTU is 1500 while the loopback MTU is 64k. TCP
      code dereferencing the dst doesn't check whether the dst is valid.
      When TCP dereferences a dead dst while negotiating a new connection,
      it may use the dst device, which is 'lo', instead of the correct
      device. Consider the following scenario:
      
      A SYN arrives on an interface, and the TCP layer, while processing
      the SYNACK, finds a dst and associates it with the SYNACK skb. If,
      before the skb gets passed to L3 for processing, that dst becomes
      "dead" (because the virtual device disappeared and then reappeared),
      'lo' gets assigned to that dst (lo MTU = 64k). Let's assume the SYN
      has ADV_MSS set to 9k while the output device through which this
      SYNACK is going to go out has the standard MTU of 1500.
      The MTU check during the route check passes since MIN(9K, 64K)
      is 9k, and TCP successfully negotiates a 9k MSS. A subsequent,
      larger data packet gets passed to the device and won't be
      marked as GSO, since the assumed MTU of the device is 9k.
      
      This can crash the NIC, and we have seen fixes go into drivers
      to handle this scenario: 8914a595 ('bnx2x: disable GSO where
      gso_size is too big for hardware') and 2b16f048 ('net: create
      skb_gso_validate_mac_len()'). With those fixes, TCP eventually
      recovers, but not before a few segments are dropped.
      
      Well, I'm not a TCP expert, and though we have experienced
      these corner cases in our environment, I could not reproduce
      this case reliably in my test setup to try this fix myself.
      However, Michael Chan <michael.chan@broadcom.com> had a setup
      where these fixes helped him mitigate the issue and avoid the
      crash.
      
      The idea here is to not alter the data path with additional
      locks or wmb()/rmb() barriers to avoid racy assignments, but
      to create a new device with a very low MTU whose
      .ndo_start_xmit is essentially kfree_skb(). Use this
      device instead of 'lo' when marking the dst dead.
      
      First patch implements the blackhole device and second
      patch uses it in IPv4 and IPv6 stack while the third patch
      is the self test that ensures the sanity of this device.
      
      v1->v2
        fixed the self-test patch to handle the conflict
      
      v2 -> v3
        fixed Kconfig text/string.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
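      The arithmetic of the scenario can be made concrete with a deliberately simplified model that ignores IP/TCP header overhead and simply takes MIN(advertised MSS, dst device MTU), as the text does. The function names are hypothetical; the 1500, 64K and 9k values come from the text, and 68 is the value of the kernel's ETH_MIN_MTU, which the blackhole device uses.

```c
#include <assert.h>
#include <stdbool.h>

#define ETH_MIN_MTU 68          /* same value as the kernel's ETH_MIN_MTU */
#define LO_MTU      (64 * 1024) /* loopback MTU from the scenario */
#define ETH_MTU     1500        /* the real egress device */

static unsigned int min_u(unsigned int a, unsigned int b)
{
    return a < b ? a : b;
}

/* Simplified check from the scenario: MIN(advertised MSS, dst dev MTU). */
static unsigned int negotiated_mss(unsigned int adv_mss, unsigned int dst_dev_mtu)
{
    return min_u(adv_mss, dst_dev_mtu);
}

/* Does a packet of len bytes pass the MTU check against the dst device? */
static bool passes_mtu_check(unsigned int len, unsigned int dst_dev_mtu)
{
    return len <= dst_dev_mtu;
}
```

      With the dead dst pointing at 'lo', a 9k MSS and 9k packets pass the check, whereas a dst pointing at the blackhole device (MTU 68) fails it, just as the real 1500-byte device would.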
    • blackhole_dev: add a selftest · 509e56b3
      Mahesh Bandewar authored
      Since this is not really a device with all capabilities, this test
      ensures that it has *enough* to make it through the data path
      without causing unwanted side effects (read: crashes!).
      Signed-off-by: Mahesh Bandewar <maheshb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • blackhole_netdev: use blackhole_netdev to invalidate dst entries · 8d7017fd
      Mahesh Bandewar authored
      Use blackhole_netdev, which has a lower MTU, instead of the 'lo' device
      when marking a dst "dead".
      Signed-off-by: Mahesh Bandewar <maheshb@google.com>
      Tested-by: Michael Chan <michael.chan@broadcom.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • loopback: create blackhole net device similar to loopack. · 4de83b88
      Mahesh Bandewar authored
      Create a blackhole net device that can be used for "dead"
      dst entries instead of the loopback device. This blackhole device
      differs from loopback in a few aspects: (a) it's not per-ns, (b) its
      MTU is ETH_MIN_MTU, (c) its xmit function is essentially kfree_skb(),
      and (d) since it's not registered, it won't have an ifindex.

      The lower MTU effectively makes the device fail the MTU check during
      the route check when the dst associated with the skb is dead.
      Signed-off-by: Mahesh Bandewar <maheshb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: ethernet: broadcom: bcm63xx_enet: Remove unneeded memset · 8909783c
      Hariprasad Kelam authored
      Remove an unneeded memset, as alloc_etherdev() uses kvzalloc(), which
      uses the __GFP_ZERO flag.
      Signed-off-by: Hariprasad Kelam <hariprasad.kelam@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'net-netsec-Add-XDP-Support' · fec3b9ec
      David S. Miller authored
      Ilias Apalodimas says:
      
      ====================
      net: netsec: Add XDP Support
      
      This is a respin of https://www.spinics.net/lists/netdev/msg526066.html
      Since the page_pool API fixes are merged into net-next, we can now safely
      use its DMA mapping capabilities.
      
      First patch changes the buffer allocation from napi/netdev_alloc_frag()
      to page_pool API. Although this will lead to slightly reduced performance
      (on raw packet drops only) we can use the API for XDP buffer recycling.
      Another side effect is a slight increase in memory usage, due to using a
      single page per packet.
      
      The second patch adds XDP support on the driver.
      There's a bunch of interesting options that come up due to the single
      Tx queue.
      Locking is needed (to avoid messing up the Tx queues, since ndo_xdp_xmit
      and the normal stack can co-exist). We also need to track the
      'buffer type' for Tx and properly free or recycle the packet depending
      on its nature.
      
      Changes since RFC:
      - Bug fixes from Jesper and Maciej
      - Added page pool API to retrieve the DMA direction
      
      Changes since v1:
      - Use page_pool_free correctly if xdp_rxq_info_reg() failed
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: netsec: add XDP support · ba2b2321
      Ilias Apalodimas authored
      The interface only supports 1 Tx queue, so locking is introduced on
      the Tx queue if XDP is enabled to make sure .ndo_start_xmit and
      .ndo_xdp_xmit won't corrupt the Tx ring.
      
      - Performance (SMMU off)
      
      Benchmark   XDP_SKB     XDP_DRV
      xdp1        291kpps     344kpps
      rxdrop      282kpps     342kpps
      
      - Performance (SMMU on)

      Benchmark   XDP_SKB     XDP_DRV
      xdp1        167kpps     324kpps
      rxdrop      164kpps     323kpps
      Signed-off-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
      Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: page_pool: add helper function for retrieving dma direction · bb005f2a
      Ilias Apalodimas authored
      Since the DMA direction is stored in the page pool params, offer an API
      helper for drivers that choose not to keep track of it locally.
      Signed-off-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
      Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: netsec: Use page_pool API · 5c67bf0e
      Ilias Apalodimas authored
      Use page_pool and its DMA mapping capabilities for Rx buffers instead
      of netdev/napi_alloc_frag().

      Although this will result in a slight performance penalty on small-sized
      packets (~10%), the use of the API will make it easy to add XDP support.
      The penalty won't be visible in network testing, i.e. iperf/netperf etc.;
      it only happens during raw packet drops.
      Furthermore, we intend to add recycling capabilities to the API
      in the future. Once recycling is added, the performance penalty will
      go away.
      The only 'real' penalty is the slightly increased memory usage, since we
      now allocate a page per packet instead of the number of bytes we need +
      skb metadata (the difference is roughly 2KB per packet).
      With a minimum of 4GB of RAM on the only SoC that has this NIC, the
      extra memory usage is negligible (a bit more on 64K pages).
      Signed-off-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
      Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • a8488b70
      Roman Mashak authored
    • Merge branch 'mirred-batch-fixes' · c8881faf
      David S. Miller authored
      Roman Mashak says:
      
      ====================
      Fix batched event generation for mirred action
      
      When adding or deleting a batch of entries, the kernel sends up to
      TCA_ACT_MAX_PRIO entries in an event to user space. However, it does not
      consider that the action sizes may vary and require different skb sizes.
      
      For example:
      
      % cat tc-batch.sh
      TC="sudo /mnt/iproute2.git/tc/tc"
      
      $TC actions flush action mirred
      for i in `seq 1 $1`;
      do
         cmd="action mirred egress redirect dev lo index $i "
         args=$args$cmd
      done
      $TC actions add $args
      %
      % ./tc-batch.sh 32
      Error: Failed to fill netlink attributes while adding TC action.
      We have an error talking to the kernel
      %
      
      Patch 1 adds a callback to the mirred action's tc_action_ops that
      calculates the action size, and passes that size to
      tcf_add_notify()/tcf_del_notify().

      Patch 2 updates the TDC test suite with relevant test cases.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
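      Sizing the event from the actions actually in the batch can be sketched as follows. This is a simplified model, not the act_api code: `struct action`, `batch_msg_size()`, the `payload` field and both size constants are hypothetical. An optional per-action `get_fill_size` callback reports how many bytes that action needs; actions without one fall back to a fixed estimate.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical action descriptor with an optional size callback. */
struct action {
    size_t (*get_fill_size)(const struct action *a);
    size_t payload;           /* bytes this action needs when dumped */
};

#define MSG_HDR_SIZE        64  /* assumed fixed header overhead */
#define DEFAULT_ACTION_SIZE 128 /* assumed estimate without a callback */

/* Hypothetical mirred-style callback: report the real per-action size. */
static size_t mirred_fill_size(const struct action *a)
{
    return a->payload;
}

/*
 * Size the event buffer from the actions in the batch instead of
 * assuming every action fits in a fixed-size slot.
 */
static size_t batch_msg_size(const struct action *acts, size_t n)
{
    size_t sz = MSG_HDR_SIZE;

    for (size_t i = 0; i < n; i++)
        sz += acts[i].get_fill_size ? acts[i].get_fill_size(&acts[i])
                                    : DEFAULT_ACTION_SIZE;
    return sz;
}
```

      For the 32-action batch from the script above, the buffer then grows with the actual per-action sizes rather than a one-size-fits-all guess.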
    • net sched: update mirred action for batched events operations · b84b2d4e
      Roman Mashak authored
      Add get_fill_size() routine used to calculate the action size
      when building a batch of events.
      Signed-off-by: Roman Mashak <mrv@mojatatu.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • netlink: use 48 byte ctx instead of 6 signed longs for callback · 362b87f5
      Jason A. Donenfeld authored
      People are inclined to stuff random things into cb->args[n] because it
      looks like an array of integers. Sometimes people even put u64s in there
      with comments noting that a certain member takes up two slots. The
      horror! Really, this should mirror the usage of skb->cb, which is just
      48 opaque bytes suitable for casting a struct. Then people can create
      their usual casting macros for accessing strongly typed members of a
      struct.
      
      As a plus, this also gives us the same amount of space on 32-bit and 64-bit.
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
      Reviewed-by: Johannes Berg <johannes@sipsolutions.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
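      The casting pattern described above can be sketched with a simplified model of the callback (C11 for the anonymous union and static_assert). `struct callback`, `struct my_dump_state` and `dump_state()` are hypothetical names, not the netlink definitions. Six longs are 24 bytes on a 32-bit (ILP32) build and 48 on a 64-bit (LP64) one, while the 48-byte `ctx` array gives the same space on both; a build-time assertion guards against the state outgrowing it.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Simplified model of a callback with 48 opaque bytes of scratch space. */
struct callback {
    union {
        uint8_t ctx[48];
        long args[6];   /* old interface: 24 bytes on ILP32, 48 on LP64 */
    };
};

/* A hypothetical dumper's private state, strongly typed. */
struct my_dump_state {
    uint64_t last_seq;
    uint32_t bucket;
    uint32_t index;
};

/* Fail the build if the state ever outgrows the opaque area. */
static_assert(sizeof(struct my_dump_state) <= sizeof(((struct callback *)0)->ctx),
              "dump state too large for cb->ctx");

/* The usual casting helper, analogous to skb->cb accessors. */
static struct my_dump_state *dump_state(struct callback *cb)
{
    return (struct my_dump_state *)cb->ctx;
}
```

      Sharing the storage with `long args[6]` keeps it long-aligned. The kernel is built with -fno-strict-aliasing, which is what makes this style of cast unproblematic there.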
    • tipc: embed jiffies in macro TIPC_BC_RETR_LIM · 53962bce
      Jon Maloy authored
      The macro TIPC_BC_RETR_LIM is always used in combination with 'jiffies',
      so we can just as well perform the addition in the macro itself. This
      way, we get a few shorter code lines and one less line break.
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Acked-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/ipv6: Fix misuse of proc_dointvec "flowlabel_reflect" · 00dc3307
      Eiichi Tsukata authored
      /proc/sys/net/ipv6/flowlabel_reflect expects the written value to be in
      the range 0 to 3. Use proc_dointvec_minmax instead of proc_dointvec.
      
      Fixes: 323a53c4 ("ipv6: tcp: enable flowlabel reflection in some RST packets")
      Signed-off-by: Eiichi Tsukata <devel@etsukata.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      00dc3307
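      A userspace sketch contrasting the two handlers' behaviour on writes. In
      the kernel, proc_dointvec_minmax takes its bounds from the ctl_table
      entry's extra1/extra2 pointers; here they are plain parameters, and
      `parse_int()`, `write_dointvec()` and `write_dointvec_minmax()` are
      hypothetical stand-ins, not the sysctl implementation.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Parse a decimal int, allowing one trailing newline as written by echo. */
static int parse_int(const char *buf, int *out)
{
	char *end;
	long v;

	errno = 0;
	v = strtol(buf, &end, 10);
	if (errno || end == buf || v < INT_MIN || v > INT_MAX)
		return -EINVAL;
	if (*end == '\n')
		end++;
	if (*end != '\0')
		return -EINVAL;
	*out = (int)v;
	return 0;
}

/* Like proc_dointvec: any int is stored, including out-of-range values. */
static int write_dointvec(int *val, const char *buf)
{
	return parse_int(buf, val);
}

/* Like proc_dointvec_minmax: values outside [min, max] are rejected
 * with -EINVAL and the stored value is left unchanged. */
static int write_dointvec_minmax(int *val, const char *buf, int min, int max)
{
	int v;
	int err = parse_int(buf, &v);

	if (err)
		return err;
	if (v < min || v > max)
		return -EINVAL;
	*val = v;
	return 0;
}
```

      With bounds 0 and 3, writing "7" fails and the previous setting is
      kept, whereas the unchecked variant would store it.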
    • Yunsheng Lin's avatar
      net: link_watch: prevent starvation when processing linkwatch wq · 27ba4059
      Yunsheng Lin authored
      When a user has configured a large number of virtual netdevs, such
      as 4K VLANs, a carrier on/off operation on the real netdev
      also causes the link state of its virtual netdevs to be processed
      in linkwatch. Currently, the processing is done in a work queue,
      which may cause RTNL lock starvation and starvation of workers
      in other work queues, such as the irqfd_inject wq.
      
      This patch makes the link watch worker release the CPU after it has
      processed a fixed number of netdev link watch events, and schedules
      the work queue again if link watch events remain.
      Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      27ba4059
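      A userspace sketch of the bounded-batch idea: each worker run handles at
      most a fixed budget of events and reports whether it must be rescheduled.
      `LW_BUDGET`, `struct lw_queue`, `lw_run_once()` and `lw_drain()` are
      hypothetical; the budget actually used by the patch is not stated here.

```c
#include <stdbool.h>

#define LW_BUDGET 100	/* hypothetical per-run limit */

struct lw_queue {
	int pending;	/* queued link state events */
	int processed;	/* events handled so far */
	int runs;	/* how many times the worker ran */
};

/* One worker run: handle at most LW_BUDGET events, then report whether
 * events remain and the work must be scheduled again. */
static bool lw_run_once(struct lw_queue *q)
{
	int budget = LW_BUDGET;

	q->runs++;
	while (q->pending > 0 && budget-- > 0) {
		q->pending--;
		q->processed++;	/* the real worker would handle one netdev */
	}
	return q->pending > 0;
}

/* Stand-in for the workqueue: reschedule while events remain. Between
 * runs the real worker gives up the CPU so other work can proceed. */
static void lw_drain(struct lw_queue *q)
{
	while (lw_run_once(q))
		;
}
```

      For 4096 events and a budget of 100, the work runs 41 times instead of
      once, with a scheduling point after each batch.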
    • David S. Miller's avatar
      Merge branch 'mlxsw-PTP-timestamping-support' · 0d0bcacc
      David S. Miller authored
      Ido Schimmel says:
      
      ====================
      mlxsw: PTP timestamping support
      
      This is the second patchset adding PTP support in mlxsw. The next patchset
      will add PTP shapers, which are required to maintain accuracy at rates
      lower than 40Gb/s, while subsequent patchsets will add tracepoints and
      selftests.
      
      Petr says:
      
      This patch set introduces support for retrieving and processing hardware
      timestamps for PTP packets.
      
      On Spectrum-1, PTP timestamping works as follows: each front panel port has
      two associated queues. When a packet is timestamped, the timestamp is put
      into one of them: timestamps of transmitted packets into one queue and
      those of received packets into the other.
      signaled through the events PTP_ING_FIFO and PTP_EGR_FIFO.
      
      Packets themselves arrive through two traps: PTP0 and PTP1. It is possible
      to configure which PTP messages should be trapped under which PTP trap. On
      Spectrum systems, mlxsw will use PTP0 for event messages (which need
      timestamping), and PTP1 for general messages (which do not).
      
      There are therefore four relevant traps: reception of a PTP event message,
      reception of a PTP general message, and arrival of a timestamp for a
      transmitted or for a received PTP packet. The natural place for the new
      logic is a custom listener for these traps.
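
      A sketch of the event/general split that decides which trap a PTP message
      goes to. The messageType numbering (low nibble of the first header octet;
      types 0x0-0x3 are event messages) comes from IEEE 1588-2008; the enum and
      function names are hypothetical and are not mlxsw's.

```c
#include <stdint.h>

enum ptp_trap { PTP_TRAP_PTP0, PTP_TRAP_PTP1 };	/* hypothetical names */

/* PTPv2 messageType values (IEEE 1588-2008). */
enum ptp_msg_type {
	PTP_MSG_SYNC = 0x0,
	PTP_MSG_DELAY_REQ = 0x1,
	PTP_MSG_PDELAY_REQ = 0x2,
	PTP_MSG_PDELAY_RESP = 0x3,
	PTP_MSG_FOLLOW_UP = 0x8,
	PTP_MSG_DELAY_RESP = 0x9,
	PTP_MSG_PDELAY_RESP_FOLLOW_UP = 0xA,
	PTP_MSG_ANNOUNCE = 0xB,
	PTP_MSG_SIGNALING = 0xC,
	PTP_MSG_MANAGEMENT = 0xD,
};

/* messageType is the low nibble of the first octet of the PTP header;
 * the high nibble is transportSpecific. */
static enum ptp_msg_type ptp_msg_type(const uint8_t *ptp_hdr)
{
	return (enum ptp_msg_type)(ptp_hdr[0] & 0x0f);
}

/* Event messages need timestamping and go to PTP0; general messages
 * (and, in this sketch, reserved values) go to PTP1. */
static enum ptp_trap ptp_trap_for(enum ptp_msg_type type)
{
	return type <= PTP_MSG_PDELAY_RESP ? PTP_TRAP_PTP0 : PTP_TRAP_PTP1;
}
```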
      
      Besides handling ingress traffic (be it packets or timestamps), the driver
      also needs to handle timestamping of transmitted packets. One option would
      be to invoke the relevant logic from mlxsw_core_ptp_transmitted(). However,
      on Spectrum-2, the timestamps are actually delivered through the completion
      queue, and for that reason this patchset opts to invoke the logic from the
      PCI code, via core and the driver, to a chip-specific operation. That way
      the invocation will be done in a place where a Spectrum-2 implementation
      will have an opportunity to extract the timestamp.
      
      As indicated above, the PTP FIFO signaling happens independently of
      packet delivery. A packet corresponding to any given timestamp could be
      delivered earlier or later than the timestamp itself. Additionally, the
      queues are only four elements deep, so it is possible that the
      timestamp for a delivered packet never arrives at all. Similarly, a PTP
      packet might be dropped due to CPU traffic pressure and never be delivered,
      even if the corresponding timestamp was.
      
      The driver thus needs to hold a cache of as-yet-unmatched SKBs and
      timestamps. The first piece to arrive (be it timestamp or SKB) is put into
      this cache. When the other piece arrives, the timestamp is attached to the
      SKB, which is then passed on. A delayed work item runs at regular intervals
      to prune old unmatched entries.
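
      A minimal sketch of the "first piece waits, second piece completes the
      pair" matching. The key fields are an assumption (the text does not say
      what identifies a pair), a fixed-size array stands in for the driver's
      hash table, and all names are hypothetical; pruning is left out here.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define CACHE_SLOTS 16	/* hypothetical capacity */

/* Hypothetical key identifying a packet/timestamp pair. */
struct ptp_key {
	uint16_t local_port;
	uint16_t seq_id;
	uint8_t msg_type;
	bool ingress;
};

struct ptp_entry {
	bool used;
	struct ptp_key key;
	const void *pkt;	/* set if the packet arrived first */
	uint64_t ts;		/* valid if has_ts */
	bool has_ts;
};

struct ptp_cache {
	struct ptp_entry e[CACHE_SLOTS];
};

static bool ptp_key_eq(const struct ptp_key *a, const struct ptp_key *b)
{
	return a->local_port == b->local_port && a->seq_id == b->seq_id &&
	       a->msg_type == b->msg_type && a->ingress == b->ingress;
}

/* Find the cached entry for @k, or NULL. */
static struct ptp_entry *ptp_lookup(struct ptp_cache *c, const struct ptp_key *k)
{
	for (int i = 0; i < CACHE_SLOTS; i++)
		if (c->e[i].used && ptp_key_eq(&c->e[i].key, k))
			return &c->e[i];
	return NULL;
}

/* Park a new entry for @k; returns NULL if the cache is full. */
static struct ptp_entry *ptp_park(struct ptp_cache *c, const struct ptp_key *k)
{
	for (int i = 0; i < CACHE_SLOTS; i++) {
		if (!c->e[i].used) {
			memset(&c->e[i], 0, sizeof(c->e[i]));
			c->e[i].used = true;
			c->e[i].key = *k;
			return &c->e[i];
		}
	}
	return NULL;
}

/* A packet arrived: if its timestamp is cached, consume the entry and
 * return true with *ts filled in; otherwise park the packet. */
static bool ptp_packet_arrived(struct ptp_cache *c, const struct ptp_key *k,
			       const void *pkt, uint64_t *ts)
{
	struct ptp_entry *e = ptp_lookup(c, k);

	if (e && e->has_ts) {
		*ts = e->ts;
		e->used = false;
		return true;
	}
	if (!e)
		e = ptp_park(c, k);
	if (e)
		e->pkt = pkt;
	return false;
}

/* A timestamp arrived: if its packet is cached, consume the entry and
 * return the packet (the caller attaches @ts); otherwise park @ts. */
static const void *ptp_timestamp_arrived(struct ptp_cache *c,
					 const struct ptp_key *k, uint64_t ts)
{
	struct ptp_entry *e = ptp_lookup(c, k);

	if (e && e->pkt) {
		const void *pkt = e->pkt;

		e->used = false;
		return pkt;
	}
	if (!e)
		e = ptp_park(c, k);
	if (e) {
		e->ts = ts;
		e->has_ts = true;
	}
	return NULL;
}
```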
      
      As mentioned above, the mechanism for timestamp delivery changes on
      Spectrum-2, where timestamps are part of completion queue elements, and all
      packets are timestamped. All this bookkeeping is therefore unnecessary on
      Spectrum-2. For this reason, this patchset spends some time introducing
      Spectrum-1-specific artifacts, such as the possibility to register a given
      trap only on Spectrum-1.
      
      Patches #1-#4 describe new registers.
      
      Patches #5 and #6 introduce the possibility to register certain traps
      only on some systems. The list of Spectrum-1 specific traps is left empty
      at this point.
      
      Patch #7 hooks into packet receive path by registering PTP traps
      and appropriate handlers (that however do nothing of substance yet).
      
      Patch #8 adds a helper to allow storing custom data in SKB->cb.
      
      Patch #9 adds a call into the PCI completion queue handler that invokes,
      via core and spectrum code, a PTP transmit handler. (Which also does not do
      anything interesting yet.)
      
      Patch #10 introduces code to invoke PTP initialization and adds data types
      for the cache of unmatched entries.
      
      Patches #11 and #12 implement the timestamping itself. In #11, the PHC
      spin_locks are converted to _bh variants because, unlike the normal PHC
      path, which runs in process context, timestamp processing runs in softirq
      context. Then #12 introduces the code for saving and retrieving unmatched
      entries, invokes the PTP classifier to identify packets of interest,
      registers timestamp FIFO events, and handles decoding and attaching
      timestamps to packets.
      
      Patch #13 introduces a garbage collector for left-behind entries that have
      not been matched for about a second.
      
      In patch #14, PTP message types are configured to arrive as PTP0
      (events) or PTP1 (everything else) as appropriate. At this point, the PTP
      packets start arriving through the traps, but because PTP is disabled and
      there is no way to enable it yet, they are always just passed to the usual
      receive path right away.
      
      Finally, patches #15 and #16 add the plumbing that makes it possible to
      enable this code through the SIOCSHWTSTAMP ioctl, and advertise the
      hardware timestamping capabilities through ethtool.
      
      v2:
      - Patch #12:
          - In mlxsw_sp1_ptp_fifo_event_func(), post-increment when iterating over PTP
            FIFO records.
      - Patch #14:
          - Change namespace of message type enumerators from MLXSW_ to MLXSW_SP_.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      0d0bcacc
    • Petr Machata's avatar
      mlxsw: spectrum: PTP: Support ethtool get_ts_info · 87ee07f8
      Petr Machata authored
      The get_ts_info callback is used for obtaining information about
      timestamping capabilities of a network device. On Spectrum-1, implement
      it to advertise the PHC and the capability to do HW timestamping, and
      the supported RX and TX filters.
      Signed-off-by: Petr Machata <petrm@mellanox.com>
      Acked-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      87ee07f8
    • Petr Machata's avatar
      mlxsw: spectrum: PTP: Support SIOCGHWTSTAMP, SIOCSHWTSTAMP ioctls · 87486427
      Petr Machata authored
      The SIOCSHWTSTAMP ioctl configures HW timestamping on a given port.
      Dispatch the ioctls to per-chip handlers (which are added to ptp_ops). Find
      which PTP messages need to be timestamped and configure MTPPPC
      accordingly.
      
      The SIOCGHWTSTAMP ioctl is a getter for the current configuration.
      Signed-off-by: Petr Machata <petrm@mellanox.com>
      Acked-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      87486427
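      A sketch of the "find which PTP messages need to be timestamped" step:
      translating the rx_filter and tx_type from a struct hwtstamp_config into
      a set of PTP message types. The HWTSTAMP_* constants are the real ones
      from <linux/net_tstamp.h>; the bitmask encoding (bit n = messageType n),
      the helper names, and which filters are accepted are assumptions, not
      the mlxsw or MTPPPC layout.

```c
#include <errno.h>
#include <stdint.h>
#include <linux/net_tstamp.h>

/* Hypothetical encoding: bit n set means "timestamp messageType n". */
#define PTP_BIT(type)    (1u << (type))
#define PTP_EVENT_TYPES  (PTP_BIT(0) | PTP_BIT(1) | PTP_BIT(2) | PTP_BIT(3))

/* Map the requested rx_filter to the set of ingress message types. */
static int rx_filter_to_msg_types(int rx_filter, uint16_t *types)
{
	switch (rx_filter) {
	case HWTSTAMP_FILTER_NONE:
		*types = 0;
		return 0;
	case HWTSTAMP_FILTER_PTP_V2_L4_SYNC:
	case HWTSTAMP_FILTER_PTP_V2_L2_SYNC:
	case HWTSTAMP_FILTER_PTP_V2_SYNC:
		*types = PTP_BIT(0);	/* Sync */
		return 0;
	case HWTSTAMP_FILTER_PTP_V2_L4_DELAY_REQ:
	case HWTSTAMP_FILTER_PTP_V2_L2_DELAY_REQ:
	case HWTSTAMP_FILTER_PTP_V2_DELAY_REQ:
		*types = PTP_BIT(1);	/* Delay_Req */
		return 0;
	case HWTSTAMP_FILTER_PTP_V2_L4_EVENT:
	case HWTSTAMP_FILTER_PTP_V2_L2_EVENT:
	case HWTSTAMP_FILTER_PTP_V2_EVENT:
		*types = PTP_EVENT_TYPES;	/* all event messages */
		return 0;
	default:
		/* e.g. PTPv1 filters or HWTSTAMP_FILTER_ALL: unsupported here. */
		return -ERANGE;
	}
}

/* Map the requested tx_type to the set of egress message types. */
static int tx_type_to_msg_types(int tx_type, uint16_t *types)
{
	switch (tx_type) {
	case HWTSTAMP_TX_OFF:
		*types = 0;
		return 0;
	case HWTSTAMP_TX_ON:
		*types = PTP_EVENT_TYPES;
		return 0;
	default:
		return -ERANGE;
	}
}
```

      A SIOCGHWTSTAMP handler would then simply report back the configuration
      that the last successful SIOCSHWTSTAMP stored.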
    • Petr Machata's avatar
      mlxsw: spectrum: PTP: Configure PTP traps and FIFO events · a773c76c
      Petr Machata authored
      Configure MTPTPT to set which message types should arrive under which
      PTP trap, and MOGCR to clear the timestamp queue after its contents are
      reported through PTP_ING_FIFO or PTP_EGR_FIFO.
      
      With this configuration, PTP packets start arriving through the PTP
      traps. However, since timestamping is disabled by default and there is
      currently no way to enable it, they will not be timestamped.
      Signed-off-by: Petr Machata <petrm@mellanox.com>
      Acked-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      a773c76c
    • Petr Machata's avatar
      mlxsw: spectrum: PTP: Garbage-collect unmatched entries · 5d23e415
      Petr Machata authored
      On Spectrum-1, timestamped PTP packets and the corresponding timestamps
      need to be kept in caches until both are available, at which point they are
      matched up and packets forwarded as appropriate. However, not all packets
      will ever see their timestamp, and not all timestamps will ever see their
      packet. It is therefore necessary to dispose of such abandoned entries.
      
      To that end, introduce a garbage collector to collect entries that have
      not had their counterpart turn up within about a second. The GC
      maintains a monotonically increasing GC cycle counter. Every entry that
      is put into the hash table is annotated with the GC cycle at which it
      should be collected. When the GC runs, it walks the hash table and
      collects the objects according to their GC cycle annotation.
      Signed-off-by: Petr Machata <petrm@mellanox.com>
      Acked-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      5d23e415
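      A sketch of the GC-cycle scheme described above: entries record the cycle
      at which they become collectable, and each GC run sweeps the table and
      then advances the counter. The table size, the two-cycle lifetime (how
      "about a second" might map to cycles for a given GC interval) and all
      names are hypothetical; an array stands in for the hash table.

```c
#include <stdbool.h>
#include <stddef.h>

#define GC_SLOTS 8		/* hypothetical table size */
#define GC_LIFETIME_CYCLES 2	/* hypothetical: ~1 s / GC interval */

struct gc_entry {
	bool used;
	unsigned int gc_cycle;	/* cycle at which the entry is collectable */
};

struct gc_table {
	struct gc_entry e[GC_SLOTS];
	unsigned int gc_cycle;	/* monotonically increasing, one per GC run */
};

/* Insert: annotate the entry with the cycle at which to collect it. */
static struct gc_entry *gc_insert(struct gc_table *t)
{
	for (size_t i = 0; i < GC_SLOTS; i++) {
		if (!t->e[i].used) {
			t->e[i].used = true;
			t->e[i].gc_cycle = t->gc_cycle + GC_LIFETIME_CYCLES;
			return &t->e[i];
		}
	}
	return NULL;
}

/* One GC run: collect every entry that is due, then advance the cycle.
 * Returns how many entries were collected. */
static unsigned int gc_run(struct gc_table *t)
{
	unsigned int collected = 0;

	for (size_t i = 0; i < GC_SLOTS; i++) {
		/* Signed difference keeps the test valid across wraparound. */
		if (t->e[i].used && (int)(t->e[i].gc_cycle - t->gc_cycle) <= 0) {
			t->e[i].used = false;
			collected++;
		}
	}
	t->gc_cycle++;
	return collected;
}
```

      Matched entries are simply removed when their counterpart arrives, so
      the GC only ever sees the abandoned ones.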