1. 07 Dec, 2016 2 commits
  2. 06 Dec, 2016 38 commits
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc · bc3913a5
      Linus Torvalds authored
      Pull sparc fix from David Miller:
       "A use-before-NULL-check from Dan Carpenter"
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc:
        dbri: move dereference after check for NULL
    • dbri: move dereference after check for NULL · 163117e8
      Dan Carpenter authored
      We accidentally introduced a dereference before the NULL check in
      xmit_descs() as part of silencing a GCC warning.
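
      The bug class can be sketched in user-space C. This is not the dbri
      code; the struct and function names below are hypothetical and only
      contrast "dereference, then check" with "check, then dereference".

```c
#include <stddef.h>

/* Hypothetical stand-in for a per-stream descriptor; not the real dbri type. */
struct stream_info {
	int first_desc;
};

/* Buggy shape: the pointer is read before it is checked.
 * Shown for contrast only; never call it with NULL. */
int first_desc_buggy(const struct stream_info *info)
{
	int first = info->first_desc;	/* dereference happens here */

	if (info == NULL)		/* too late: already dereferenced */
		return -1;
	return first;
}

/* Fixed shape: check first, dereference afterwards. */
int first_desc_fixed(const struct stream_info *info)
{
	if (info == NULL)
		return -1;
	return info->first_desc;
}
```

      A compiler may also treat the late check in the buggy shape as dead
      code, since the earlier dereference implies the pointer is non-NULL.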
      
      Fixes: 16f46050 ("dbri: Fix compiler warning")
      Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · da1b466f
      Linus Torvalds authored
      Pull networking fixes from David Miller:
      
       1) When dcbnl_cee_fill() fails to push a new netlink attribute,
          it returns 0 instead of an error code. From Pan Bian.
      
       2) Two suffix handling fixes to FIB trie code, from Alexander Duyck.
      
       3) bnxt_hwrm_stat_ctx_alloc() goes through all the trouble of setting
          and maintaining a return code 'rc' but fails to actually return it.
          Also from Pan Bian.
      
       4) ping socket ICMP handler needs to validate ICMP header length, from
          Kees Cook.
      
       5) caif_sktinit_module() has this interesting logic:
      
              int err = sock_register(...);
              if (!err)
                      return err;
              return 0;
      
          Just return sock_register()'s return value directly which is the
          only possible correct thing to do.
      
       6) Two bnx2x driver fixes from Yuval Mintz, return a reasonable
          estimate from get_ringparam() ethtool op when interface is down and
          avoid trying to use UDP port based tunneling on 577xx chips.
      
       7) Fix ep93xx_eth crash on module unload from Florian Fainelli.
      
       8) Missing uapi exports, from Stephen Hemminger.
      
       9) Don't schedule work from sk_destruct(), because the socket will be
          freed upon return from that function. From Herbert Xu.
      
      10) Buggy drivers, of which we know there is at least one, can send a
          huge packet into the TCP stack but forget to set the gso_size in the
          SKB, which causes all kinds of problems.
      
          Correct this when it happens, and emit a one-time warning with the
          device name included so that it can be diagnosed more easily.
      
          From Marcelo Ricardo Leitner.
      
      11) virtio-net doing DMA off the stack causes hiccups with VMAP_STACK,
          fix from Andy Lutomirski.
      
      12) Fix fec driver compilation with CONFIG_M5272, from Nikita
          Yushchenko.
      
      13) mlx5 fixes from Kamal Heib, Saeed Mahameed, and Mohamad Haj Yahia.
          (erroneously flushing queues on error, module parameter validation,
          etc)
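
      Item 5 in the list above can be sketched in user-space C. The
      fake_sock_register() helper is a hypothetical stand-in for
      sock_register() (0 on success, negative errno on failure); the two
      init functions contrast the quoted logic with returning the result
      directly.

```c
#include <errno.h>

/* Hypothetical stand-in for sock_register(): 0 on success, -errno on failure. */
static int fake_sock_register(int should_fail)
{
	return should_fail ? -EEXIST : 0;
}

/* The quoted logic: err is only returned when it is zero,
 * so the caller sees 0 even when registration failed. */
int init_buggy(int should_fail)
{
	int err = fake_sock_register(should_fail);

	if (!err)
		return err;
	return 0;
}

/* Pass the registration result straight through. */
int init_fixed(int should_fail)
{
	return fake_sock_register(should_fail);
}
```

      The same "compute an error code but never return it" shape is what
      items 1 and 3 describe for other functions.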
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (34 commits)
        net/mlx5e: Change the SQ/RQ operational state to positive logic
        net/mlx5e: Don't flush SQ on error
        net/mlx5e: Don't notify HW when filling the edge of ICO SQ
        net/mlx5: Fix query ISSI flow
        net/mlx5: Remove duplicate pci dev name print
        net/mlx5: Verify module parameters
        net: fec: fix compile with CONFIG_M5272
        be2net: Add DEVSEC privilege to SET_HSW_CONFIG command.
        virtio-net: Fix DMA-from-the-stack in virtnet_set_mac_address()
        tcp: warn on bogus MSS and try to amend it
        uapi glibc compat: fix outer guard of net device flags enum
        net: stmmac: clear reset value of snps, wr_osr_lmt/snps, rd_osr_lmt before writing
        netlink: Do not schedule work from sk_destruct
        uapi: export nf_log.h
        uapi: export tc_skbmod.h
        net: ep93xx_eth: Do not crash unloading module
        bnx2x: Prevent tunnel config for 577xx
        bnx2x: Correct ringparam estimate when DOWN
        isdn: hisax: set error code on failure
        net: bnx2x: fix improper return value
        ...
    • shmem: fix shm fallocate() list corruption · 10d20bd2
      Linus Torvalds authored
      The shmem hole punching with fallocate(FALLOC_FL_PUNCH_HOLE) does not
      want to race with generating new pages by faulting them in.
      
      However, the wait-queue used to delay the page faulting has a serious
      problem: the wait queue head (in shmem_fallocate()) is allocated on the
      stack, and the code expects that "wake_up_all()" will make sure that all
      the queue entries are gone before the stack frame is de-allocated.
      
      And that is not at all necessarily the case.
      
      Yes, a normal wake-up sequence will remove the wait-queue entry that
      caused the wakeup (see "autoremove_wake_function()"), but the key
      wording there is "that caused the wakeup".  When there are multiple
      possible wakeup sources, the wait queue entry may well stay around.
      
      And _particularly_ in a page fault path, we may be faulting in new pages
      from user space while we also have other things going on, and there may
      well be other pending wakeups.
      
      So despite the "wake_up_all()", it's not at all guaranteed that all list
      entries are removed from the wait queue head on the stack.
      
      Fix this by introducing a new wakeup function that removes the list
      entry unconditionally, even if the target process had already woken up
      for other reasons.  Use that "synchronous" function to set up the
      waiters in shmem_fault().
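
      A toy user-space model of the difference, under the assumption that
      a wait entry can be reduced to two flags. The names are hypothetical
      and none of this is the kernel wait-queue API; it only shows why an
      entry that was already awake can stay linked after a wake-all pass,
      and how unconditional removal avoids that.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy wait entry: linked into a queue whose head lives in another stack frame. */
struct waiter {
	bool awake;	/* already woken by some other source? */
	bool queued;	/* still linked into the wait queue? */
};

/* Models a default wake-up: succeeds only if the task was not already awake. */
static bool try_wake(struct waiter *w)
{
	if (w->awake)
		return false;
	w->awake = true;
	return true;
}

/* Autoremove style: unlink only when this call caused the wakeup. */
void wake_autoremove(struct waiter *w)
{
	if (try_wake(w))
		w->queued = false;
}

/* "Synchronous" style: unlink unconditionally. */
void wake_synchronous(struct waiter *w)
{
	try_wake(w);
	w->queued = false;
}

/* Stand-in for a wake-all pass: returns how many entries are still queued. */
size_t wake_all(struct waiter *ws, size_t n, void (*wake)(struct waiter *))
{
	size_t left = 0;

	for (size_t i = 0; i < n; i++) {
		wake(&ws[i]);
		if (ws[i].queued)
			left++;
	}
	return left;
}
```

      In this model, any entry left queued after the pass would still point
      at the on-stack queue head once the owning frame is gone.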
      
      This problem has never been seen in the wild afaik, but Dave Jones has
      reported it on and off while running trinity.  We thought we fixed the
      stack corruption with the blk-mq rq_list locking fix (commit
      7fe31130: "blk-mq: update hardware and software queues for sleeping
      alloc"), but it turns out there was _another_ stack corruptor hiding
      in the trinity runs.
      
      Vegard Nossum (also running trinity) was able to trigger this one fairly
      consistently, and made us look once again at the shmem code due to the
      faults often being in that area.
      
      Reported-and-tested-by: Vegard Nossum <vegard.nossum@oracle.com>.
      Reported-by: Dave Jones <davej@codemonkey.org.uk>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Merge branch 'mlx5-fixes' · 32f16e14
      David S. Miller authored
      Saeed Mahameed says:
      
      ====================
      Mellanox 100G mlx5 fixes 2016-12-04
      
      Some bug fixes for mlx5 core and mlx5e driver.
      
      v1->v2:
       - replace "uint" with "unsigned int"
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/mlx5e: Change the SQ/RQ operational state to positive logic · c0f1147d
      Mohamad Haj Yahia authored
      When using the negative logic (i.e. FLUSH state), after the RQ/SQ reopen
      we will have a time interval in which the RQ/SQ is not really ready and
      the state indicates that it's not in FLUSH state, because the initial
      SQ/RQ struct memory starts as zeros.
      Now we changed the state to indicate if the SQ/RQ is opened and we will
      set the READY state after finishing preparing all the SQ/RQ resources.
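
      A minimal sketch of why the polarity matters for zero-initialized
      state. The bit names and struct are hypothetical, not the mlx5e
      definitions: with a negative flag, all-zero memory reads as "usable";
      with a positive flag, it reads as "not yet opened".

```c
#include <stdbool.h>
#include <string.h>

enum { NEG_STATE_FLUSH = 1 };	/* hypothetical negative-logic bit */
enum { POS_STATE_OPENED = 1 };	/* hypothetical positive-logic bit */

struct queue {
	unsigned long state;
};

/* Negative logic: "usable" means the FLUSH bit is clear. */
bool usable_negative(const struct queue *q)
{
	return !(q->state & NEG_STATE_FLUSH);
}

/* Positive logic: "usable" means the OPENED bit was explicitly set. */
bool usable_positive(const struct queue *q)
{
	return (q->state & POS_STATE_OPENED) != 0;
}

/* Models the start of a reopen: the struct begins as zeros. */
void queue_reopen_begin(struct queue *q)
{
	memset(q, 0, sizeof(*q));
}

/* Called only once all queue resources are prepared. */
void queue_mark_opened(struct queue *q)
{
	q->state |= POS_STATE_OPENED;
}
```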
      
      Fixes: 6e8dd6d6 ("net/mlx5e: Don't wait for SQ completions on close")
      Fixes: f2fde18c ("net/mlx5e: Don't wait for RQ completions on close")
      Signed-off-by: Mohamad Haj Yahia <mohamad@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/mlx5e: Don't flush SQ on error · 3c8591d5
      Saeed Mahameed authored
      We are doing SQ descriptors cleanup in driver.
      
      Fixes: 6e8dd6d6 ("net/mlx5e: Don't wait for SQ completions on close")
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/mlx5e: Don't notify HW when filling the edge of ICO SQ · b8335d91
      Saeed Mahameed authored
      We are going to do this a couple of steps ahead anyway.
      
      Fixes: d3c9bc27 ("net/mlx5e: Added ICO SQs")
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/mlx5: Fix query ISSI flow · f9c14e46
      Kamal Heib authored
      In old FWs query ISSI command is not supported and for some of those FWs
      it might fail with status other than "MLX5_CMD_STAT_BAD_OP_ERR".
      
      In such case instead of failing the driver load, we will treat any FW
      status other than 0 for Query ISSI FW command as ISSI not supported and
      assume ISSI=0 (most basic driver/FW interface).
      
      In case of driver syndrome (query ISSI failure by driver) we will fail
      driver load.
      
      Fixes: f62b8bb8 ('net/mlx5: Extend mlx5_core to support ConnectX-4
      Ethernet functionality')
      Signed-off-by: Kamal Heib <kamalh@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/mlx5: Remove duplicate pci dev name print · 9e5b2fc1
      Kamal Heib authored
      Remove duplicate pci dev name printing from mlx5_core_warn/dbg.
      
      Fixes: 5a788398 ('net/mlx5_core: Improve mlx5 messages')
      Signed-off-by: Kamal Heib <kamalh@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/mlx5: Verify module parameters · f663ad98
      Kamal Heib authored
      Verify the mlx5_core module parameters by making sure that they are in
      the expected range and, if they aren't, restoring them to their default
      values.
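
      The range check described above might look like the following
      sketch. The struct, its fields, and the example limits are
      assumptions for illustration; the actual mlx5_core parameters and
      their ranges are not given in this message.

```c
/* Hypothetical description of one module parameter's valid range. */
struct param_range {
	unsigned int min;
	unsigned int max;
	unsigned int def;	/* default restored when out of range */
};

/* Return the value unchanged when it is in range, else the default. */
unsigned int verify_param(unsigned int value, const struct param_range *r)
{
	if (value < r->min || value > r->max)
		return r->def;
	return value;
}
```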
      
      Fixes: 9603b61d ('mlx5: Move pci device handling from mlx5_ib to mlx5_core')
      Signed-off-by: Kamal Heib <kamalh@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: hns: Fix to conditionally convey RX checksum flag to stack · 862b3d20
      Salil authored
      This patch introduces the RX checksum function to check the
      status of the hardware calculated checksum and its error and
      appropriately convey status to the upper stack in skb->ip_summed
      field.
      
      In hardware, we only support checksum for the following
      protocols:
      1) IPv4,
      2) TCP(over IPv4 or IPv6),
      3) UDP(over IPv4 or IPv6),
      4) SCTP(over IPv4 or IPv6)
      but we support many L3(IPv4, IPv6, MPLS, PPPoE etc) and
      L4(TCP, UDP, GRE, SCTP, IGMP, ICMP etc.) protocols.
      
      Hardware limitation:
      Our present hardware RX Descriptor lacks L3/L4 checksum
      "Status & Error" bit (which usually can be used to indicate whether
      checksum was calculated by the hardware and if there was any error
      encountered during checksum calculation).
      
      Software workaround:
      We do get info within the RX descriptor about the kind of
      L3/L4 protocol coming in the packet and the error status. These
      errors might not just be checksum errors but could be related to
      version, length of IPv4, UDP, TCP etc.
      Because there is no way of knowing if it is a L3/L4 error due
      to bad checksum or any other L3/L4 error, we will not (cannot)
      convey hardware checksum status(CHECKSUM_UNNECESSARY) for such
      cases to upper stack and will not maintain the RX L3/L4 checksum
      counters as well.
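
      The decision described above can be sketched as a small pure
      function. The enums and the descriptor struct are hypothetical
      stand-ins, not the hns driver's types; the result names only mirror
      the meaning of the kernel's CHECKSUM_NONE and CHECKSUM_UNNECESSARY.

```c
#include <stdbool.h>

enum rx_csum { RX_CSUM_NONE, RX_CSUM_UNNECESSARY };
enum l3_proto { L3_IPV4, L3_IPV6, L3_OTHER };
enum l4_proto { L4_TCP, L4_UDP, L4_SCTP, L4_OTHER };

/* Hypothetical view of what the RX descriptor reports. */
struct rx_desc_info {
	enum l3_proto l3;
	enum l4_proto l4;
	bool l3_err;	/* any L3 error, not necessarily a checksum error */
	bool l4_err;	/* any L4 error, not necessarily a checksum error */
};

enum rx_csum rx_checksum_status(const struct rx_desc_info *d)
{
	bool l4_ok = d->l4 == L4_TCP || d->l4 == L4_UDP || d->l4 == L4_SCTP;

	/* Only IPv4/IPv6 carrying TCP, UDP or SCTP is checksummed in hardware. */
	if ((d->l3 != L3_IPV4 && d->l3 != L3_IPV6) || !l4_ok)
		return RX_CSUM_NONE;

	/* An error bit cannot be attributed to a bad checksum specifically,
	 * so leave verification to the stack. */
	if (d->l3_err || d->l4_err)
		return RX_CSUM_NONE;

	return RX_CSUM_UNNECESSARY;
}
```
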
      Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: fec: fix compile with CONFIG_M5272 · f85de666
      Nikita Yushchenko authored
      Commit 80cca775 ("net: fec: cache statistics while device is down")
      introduced unconditional statistics-related actions.
      
      However, when the driver is compiled with CONFIG_M5272, statistics-related
      definitions do not exist, which results in build errors.
      
      Fix that by adding explicit handling of !defined(CONFIG_M5272) case.
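
      One common shape for such a guard is sketched below, assuming a
      hypothetical device struct and helper name (not the fec driver's
      own). The statistics path is compiled only when CONFIG_M5272 is not
      defined; otherwise the helper becomes a no-op so callers need no
      conditionals of their own.

```c
#include <stdint.h>

/* Hypothetical device state; build with -DCONFIG_M5272 to take the stub path. */
struct fake_dev {
	uint64_t cached_rx_packets;
	uint64_t hw_rx_packets;
};

#if !defined(CONFIG_M5272)
/* Statistics definitions exist: cache the hardware counter. */
void dev_cache_stats(struct fake_dev *d)
{
	d->cached_rx_packets = d->hw_rx_packets;
}
#else
/* No statistics definitions on this variant: compile to a no-op. */
void dev_cache_stats(struct fake_dev *d)
{
	(void)d;
}
#endif
```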
      
      Fixes: 80cca775 ("net: fec: cache statistics while device is down")
      Signed-off-by: Nikita Yushchenko <nikita.yoush@cogentembedded.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • be2net: Add DEVSEC privilege to SET_HSW_CONFIG command. · d14584d9
      Venkat Duvvuru authored
      OPCODE_COMMON_GET_FN_PRIVILEGES is returning only DEVSEC
      privilege (Unrestricted Administrative Privilege) for Lancer NIC functions.
      So, driver is failing SET_HSW_CONFIG command, as DEVSEC privilege was not
      set in the privilege bitmap. This patch fixes the problem by setting DEVSEC
      privilege in SET_HSW_CONFIG’s privilege bitmap.
      Signed-off-by: Venkat Duvvuru <venkatkumar.duvvuru@broadcom.com>
      Signed-off-by: Suresh Reddy <suresh.reddy@broadcom.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • virtio-net: Fix DMA-from-the-stack in virtnet_set_mac_address() · e37e2ff3
      Andy Lutomirski authored
      With CONFIG_VMAP_STACK=y, virtnet_set_mac_address() can be passed a
      pointer to the stack and it will OOPS.  Copy the address to the heap
      to prevent the crash.
      
      Cc: Michael S. Tsirkin <mst@redhat.com>
      Cc: Jason Wang <jasowang@redhat.com>
      Cc: Laura Abbott <labbott@redhat.com>
      Reported-by: zbyszek@in.waw.pl
      Signed-off-by: Andy Lutomirski <luto@kernel.org>
      Acked-by: Jason Wang <jasowang@redhat.com>
      Acked-by: Michael S. Tsirkin <mst@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: ethernet: ti: cpsw: fix early budget split · 48e0a83e
      Ivan Khoronzhuk authored
      The budget split function requires the PHY speed to be known.
      During ndo_open, PHY speed identification is postponed until the
      link is up. Hence, move the split to the appropriate callback, called
      when the link is up.
      Reported-by: Grygorii Strashko <grygorii.strashko@ti.com>
      Fixes: 8feb0a19 ("net: ethernet: ti: cpsw: split tx budget according between channels")
      Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Revert "dctcp: update cwnd on congestion event" · 343dfaa1
      Florian Westphal authored
      Neal Cardwell says:
       If I am reading the code correctly, then I would have two concerns:
       1) Has that been tested? That seems like an extremely dramatic
          decrease in cwnd. For example, if the cwnd is 80, and there are 40
          ACKs, and half the ACKs are ECE marked, then my back-of-the-envelope
          calculations seem to suggest that after just 11 ACKs the cwnd would be
          down to a minimal value of 2 [..]
       2) That seems to contradict another passage in the draft [..] where it
          says:
             Just as specified in [RFC3168], DCTCP does not react to congestion
             indications more than once for every window of data.
      
      Neal is right.  Fortunately we don't have to complicate this by testing
      vs. current rtt estimate, we can just revert the patch.
      
      Normal stack already handles this for us: receiving ACKs with ECE
      set causes a call to tcp_enter_cwr(), from there on the ssthresh gets
      adjusted and prr will take care of cwnd adjustment.
      
      Fixes: 47805667 ("dctcp: update cwnd on congestion event")
      Cc: Neal Cardwell <ncardwell@google.com>
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Acked-by: Neal Cardwell <ncardwell@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'mv88e6xxx-rework-reset-and-PPU-code' · b1df0f5c
      David S. Miller authored
      Vivien Didelot says:
      
      ====================
      net: dsa: mv88e6xxx: rework reset and PPU code
      
      Old Marvell chips (like 88E6060) don't have a PHY Polling Unit (PPU).
      
      Next chips (like 88E6185) have a PPU, which has exclusive access to the
      PHY registers, thus must be disabled before access.
      
      Newer chips (like 88E6352) have an indirect mechanism to access the PHY
      registers whenever, thus lose control over the PPU (always enabled).
      
      Here's a summary:
      
      Model | PPU? | Has PPU ctrl?  | PPU state readable? | PHY access
      ----- | ---- | -------------- | ------------------- | ----------
       6060 | no   | no             | no                  | direct
       6185 | yes  | yes, PPUEn bit | yes, PPUState 2-bit | direct w/ PPU dis.
       6352 | yes  | no             | yes, PPUState 1-bit | indirect
       6390 | yes  | no             | yes, InitState bit  | indirect
      
      Depending on the PPU control, a switch may have to restart the PPU when
      resetting the switch. Once the switch is reset, we must wait for the PPU
      state to be active polling again before accessing the registers.
      
      For that purpose, add new operations to the chips to enable/disable the
      PPU, and execute software reset. With these new ops in place, rework the
      switch reset code and finally get rid of the MV88E6XXX_FLAG_PPU* flags.
      
      Changes in v3:
        - consider 6097 as 6352 (no PPU ops and use mv88e6352_g1_reset).
      
      Changes in v2:
        - wait in ppu/reset ops so that ppu_polling is not needed anymore.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: mv88e6xxx: add PPU operations · a199d8b6
      Vivien Didelot authored
      Some Marvell chips can enable/disable the PPU on demand. This is needed
      to access the PHY registers when there is no indirection mechanism.
      
      Add two new ppu_enable and ppu_disable ops to describe this and finally
      get rid of the MV88E6XXX_FLAG_PPU* flags.
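
      The idea of replacing feature flags with optional per-chip
      operations can be sketched as follows. All names here are
      hypothetical, not the mv88e6xxx definitions: a chip with a
      controllable PPU fills in both callbacks, a chip without PPU control
      leaves them NULL, and the wrappers do nothing when a callback is
      absent.

```c
#include <stddef.h>

struct chip;

/* Hypothetical per-chip operations table; callbacks a chip lacks stay NULL. */
struct chip_ops {
	int (*ppu_enable)(struct chip *chip);
	int (*ppu_disable)(struct chip *chip);
};

struct chip {
	const struct chip_ops *ops;
	int ppu_enabled;
};

static int demo_ppu_enable(struct chip *chip)
{
	chip->ppu_enabled = 1;
	return 0;
}

static int demo_ppu_disable(struct chip *chip)
{
	chip->ppu_enabled = 0;
	return 0;
}

/* Chip with a controllable PPU. */
const struct chip_ops ops_with_ppu_ctrl = {
	.ppu_enable = demo_ppu_enable,
	.ppu_disable = demo_ppu_disable,
};

/* Chip without PPU control: no callbacks. */
const struct chip_ops ops_without_ppu_ctrl = { NULL, NULL };

/* Disable the PPU only where the chip supports it. */
int chip_ppu_disable(struct chip *chip)
{
	if (chip->ops->ppu_disable)
		return chip->ops->ppu_disable(chip);
	return 0;
}

/* Enable the PPU only where the chip supports it. */
int chip_ppu_enable(struct chip *chip)
{
	if (chip->ops->ppu_enable)
		return chip->ops->ppu_enable(chip);
	return 0;
}
```
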
      Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: mv88e6xxx: add a soft reset operation · 17e708ba
      Vivien Didelot authored
      Marvell chips have different ways to issue a software reset.
      
      Old chips (such as 88E6060) have a reset bit in an ATU control register.
      
      Newer chips moved this bit to a Global control register. Chips with
      controllable PPU should reset the PPU when resetting the switch.
      
      Add a new reset operation to implement these differences and introduce a
      mv88e6xxx_software_reset() helper to wrap it conveniently.
      Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: mv88e6xxx: add helper to hardware reset · 309eca6d
      Vivien Didelot authored
      Add a helper to toggle the GPIO, if any, connected to the reset pin.
      Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: mv88e6xxx: add helper to disable ports · 4ac4b5a6
      Vivien Didelot authored
      Before resetting a switch, the ports should be set to the Disabled state
      and the transmit queues should be drained.
      
      Add a helper to make that explicit.
      Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'Alacritech-SLIC-driver' · 9f9ffdff
      David S. Miller authored
      Lino Sanfilippo says:
      
      ====================
      Gigabit ethernet driver for Alacritechs SLIC devices (v4)
      
      This is the fourth version of the slicoss gigabit ethernet driver (which is a
      rework of the driver from Alacritech which can currently be found under
      drivers/staging/slicoss). The driver is supposed to support Mojave, Oasis and
      Kalahari cards, for both copper and fiber.
      
      If this code is accepted the staging version can be removed.
      
      The driver has been tested on a SEN2104ET adapter (4 Port PCIe copper).
      
      v4:
      - fix wrong driver name in Kconfig file (reported by Rami Rosen)
      - remove unused variable from driver struct (reported by Rami Rosen)
      - return "err" instead of 0 in slic_load_rcvseq_firmware() (reported by Rami Rosen)
      - Fix typos in constants, comments and error message (reported by Markus Böhme)
      - fix various warnings concerning signedness (reported by Markus Böhme)
      - improve line formatting (reported by Markus Böhme)
      - add comment describing the need for SLIC_MAX_TX_COMPLETIONS (suggested by Florian Fainelli)
      - do not zero out complete rx descriptor (suggested by Florian Fainelli)
      - add missing write barrier (reported by Florian Fainelli)
      - remove unneeded assignment of net_device to skb (reported by Florian Fainelli)
      - use napi_complete_done() instead of napi_complete (suggested by Florian Fainelli)
      - use napi_schedule_irqoff() instead of napi_schedule (suggested by Florian Fainelli)
      - do not map error returned by slic_init() to -ENOMEM
      - do proper dma syncs before and after rx descriptor status is set to 0
      - if after dma sync for CPU rx descriptor is not used return it to HW by means of dma sync for device
      
      v3:
      - don't add defines to pci_ids.h but instead put them into the driver's header file
      (requested by Greg Kroah-Hartman)
      
      v2:
      - remove unusual padding in statistic strings (suggested by Andrew Lunn)
      - for mdio register and bit names use defines from mii.h instead of own ones
        (suggested by Andrew Lunn)
      - remove unused defines
      - ensure PCI flush at two more places
      - use mmiowb before lock to prevent mmio writes leaking out of lock
      - fix some typos in comments
      - add copyright and GPL header
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • MAINTAINERS: add entry for slicoss ethernet driver · b9567027
      Lino Sanfilippo authored
      Add myself as maintainer for the slicoss ethernet driver.
      Signed-off-by: Lino Sanfilippo <LinoSanfilippo@gmx.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: ethernet: slicoss: add slicoss gigabit ethernet driver · 60c140df
      Lino Sanfilippo authored
      Add driver for Alacritech gigabit ethernet cards with SLIC (session-layer
      interface control) technology. The driver provides basic support without
      SLIC for the following devices:
      
      - Mojave cards (single port PCI Gigabit) both copper and fiber
      - Oasis cards (single and dual port PCI-x Gigabit) copper and fiber
      - Kalahari cards (dual and quad port PCI-e Gigabit) copper and fiber
      Signed-off-by: Lino Sanfilippo <LinoSanfilippo@gmx.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tcp: warn on bogus MSS and try to amend it · dcb17d22
      Marcelo Ricardo Leitner authored
      There have been some reports lately about TCP connection stalls caused
      by NIC drivers that aren't setting gso_size on aggregated packets on rx
      path. This causes TCP to assume that the MSS is actually the size of the
      aggregated packet, which is invalid.
      
      Although the proper fix is to be done at each driver, it's often hard
      and cumbersome for one to debug, come to such root cause and report/fix
      it.
      
      This patch amends this situation in two ways. First, it adds a warning
      when this situation occurs, so it gives a hint to those trying to
      debug this. It also limits the maximum probed MSS to the advertised MSS,
      as it should never be any higher than that.
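
      The two measures can be sketched in user-space C. The function name,
      the warning text, and the single global "warned" flag are
      assumptions for illustration, not the kernel implementation: the
      probed value is capped at the advertised MSS, and a message naming
      the device is printed only the first time the cap is hit.

```c
#include <stdbool.h>
#include <stdio.h>

static bool warned_once;	/* one-time warning latch for this sketch */

/* Return the MSS estimate to record for a received segment of seg_len bytes. */
unsigned int probe_rcv_mss(unsigned int seg_len, unsigned int advmss,
			   const char *dev_name)
{
	if (seg_len > advmss) {
		if (!warned_once) {
			warned_once = true;
			fprintf(stderr,
				"%s: segment of %u bytes exceeds advertised MSS %u, capping\n",
				dev_name ? dev_name : "unknown", seg_len, advmss);
		}
		return advmss;
	}
	return seg_len;
}
```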
      
      The result is that the connection may not have the best performance ever
      but it shouldn't stall, and the admin will have a hint on what to look
      for.
      
      Tested with virtio by forcing gso_size to 0.
      
      v2: updated msg per David's suggestion
      v3: use skb_iif to find the interface and also log its name, per Eric
          Dumazet's suggestion. As the skb may be backlogged and the interface
          gone by then, we need to check if the number still has a meaning.
      v4: use helper tcp_gro_dev_warn() and avoid pr_warn_once inside __once, per
          David's suggestion
      
      Cc: Jonathan Maxwell <jmaxwell37@gmail.com>
      Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • uapi glibc compat: fix outer guard of net device flags enum · efc45154
      Jonas Gorski authored
      Fix a wrong condition that prevents the higher net device flags
      (IFF_LOWER_UP etc.) from being defined if net/if.h is included before
      linux/if.h.
      
      The comment makes it clear the intention was to allow partial
      definition with either part.
      
      This fixes compilation of userspace programs trying to use
      IFF_LOWER_UP, IFF_DORMANT or IFF_ECHO.
      
      Fixes: 4a91cb61 ("uapi glibc compat: fix compile errors when glibc net/if.h included before linux/if.h")
      Signed-off-by: Jonas Gorski <jonas.gorski@gmail.com>
      Reviewed-by: Mikko Rapeli <mikko.rapeli@iki.fi>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/udp: do not touch skb->peeked unless really needed · a297569f
      Eric Dumazet authored
      In UDP recvmsg() path we currently access 3 cache lines from an skb
      while holding receive queue lock, plus another one if packet is
      dequeued, since we need to change skb->next->prev
      
      1st cache line (contains ->next/prev pointers, offsets 0x00 and 0x08)
      2nd cache line (skb->len & skb->peeked, offsets 0x80 and 0x8e)
      3rd cache line (skb->truesize/users, offsets 0xe0 and 0xe4)
      
      skb->peeked is only needed to make sure 0-length packets are properly
      handled while MSG_PEEK is operated.
      
      I had first the intent to remove skb->peeked but the "MSG_PEEK at
      non-zero offset" support added by Sam Kumar makes this not possible.
      
      This patch avoids one cache line miss during the locked section, when
      skb->len and skb->peeked do not have to be read.
      
      It also avoids the skb_set_peeked() cost for non empty UDP datagrams.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Acked-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: stmmac: clear reset value of snps, wr_osr_lmt/snps, rd_osr_lmt before writing · 6b3374cb
      Niklas Cassel authored
      WR_OSR_LMT and RD_OSR_LMT have a reset value of 1.
      Since the reset value wasn't cleared before writing, the value in the
      register would be incorrect if specifying an uneven value for
      snps,wr_osr_lmt/snps,rd_osr_lmt.
      
      Zero is a valid value for the properties, since the databook specifies:
      maximum outstanding requests = WR_OSR_LMT + 1.
      
      We do not want to change the behavior for existing users when the
      property is missing. Therefore, default to 1 if the property is missing,
      since that is the same as the reset value.
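
      A sketch of "clear the field, then write it" on a plain 32-bit
      value. The field position and width below are assumptions for
      illustration and are not taken from the databook; only the reset
      value of 1 and the "limit + 1" formula come from the text above.

```c
#include <stdint.h>

/* Assumed field layout, for illustration only. */
#define OSR_LMT_SHIFT	16
#define OSR_LMT_MASK	(0xfu << OSR_LMT_SHIFT)
#define OSR_LMT_RESET	1u

/* Register value as it would look after reset in this sketch. */
uint32_t osr_reset_value(void)
{
	return OSR_LMT_RESET << OSR_LMT_SHIFT;
}

/* OR-ing the new limit in without clearing keeps stale reset bits. */
uint32_t osr_write_without_clear(uint32_t reg, uint32_t limit)
{
	return reg | ((limit << OSR_LMT_SHIFT) & OSR_LMT_MASK);
}

/* Clear the field first, then set the requested limit. */
uint32_t osr_write_with_clear(uint32_t reg, uint32_t limit)
{
	reg &= ~OSR_LMT_MASK;
	reg |= (limit << OSR_LMT_SHIFT) & OSR_LMT_MASK;
	return reg;
}

/* Extract the limit field. */
uint32_t osr_field(uint32_t reg)
{
	return (reg & OSR_LMT_MASK) >> OSR_LMT_SHIFT;
}

/* maximum outstanding requests = limit + 1 */
uint32_t osr_max_outstanding(uint32_t reg)
{
	return osr_field(reg) + 1;
}
```
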
      Signed-off-by: Niklas Cassel <niklas.cassel@axis.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'hix5hd2_gmac-txsg-reset-clock-control' · e466af66
      David S. Miller authored
      Dongpo Li says:
      
      ====================
      net: hix5hd2_gmac: add tx sg feature and reset/clock control signals
      
      The "hix5hd2" is SoC name, add the generic ethernet driver compatible string.
      The "hisi-gemac-v1" is the basic version and "hisi-gemac-v2" adds
      the SG/TXCSUM/TSO/UFO features.
      This patch set only adds the SG(scatter-gather) driver for transmitting,
      the drivers of other features will be submitted later.
      
      Add the MAC reset control signals and clock signals.
      We make these signals optional to be backward compatible with
      the hix5hd2 SoC.
      
      Changes in v2:
      - Make the compatible string changes be a separate patch and
      the most specific string comes before the generic string
      as advised by Rob.
      - Make the MAC reset control signals and clock signals optional
      to be backward compatible with the hix5hd2 SoC.
      - Change the compatible string and give the clock a specific name
      in hix5hd2 dts file.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ARM: dts: hix5hd2: add gmac generic compatible and clock names · 0855950b
      Dongpo Li authored
      Add gmac generic compatible and clock names.
      Signed-off-by: Dongpo Li <lidongpo@hisilicon.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: hix5hd2_gmac: add reset control and clock signals · 7087140d
      Dongpo Li authored
      Add three reset control signals, "mac_core_rst", "mac_ifc_rst" and
      "phy_rst".
      The following diagram explains how the reset signals work.
      
                              SoC
      |-----------------------------------------------------
      |                               ------                |
      |                               | cpu |               |
      |                               ------                |
      |                                  |                  |
      |                              ------------ AMBA bus  |
      |                         GMAC     |                  |
      |                            ----------------------   |
      | ------------- mac_core_rst | --------------      |  |
      | |clock and   |-------------->|   mac core  |     |  |
      | |reset       |             | --------------      |  |
      | |generator   |----         |       |             |  |
      | -------------     |        | ----------------    |  |
      |          |        ---------->| mac interface |   |  |
      |          |     mac_ifc_rst | ----------------    |  |
      |          |                 |       |             |  |
      |          |                 | ------------------  |  |
      |          |phy_rst          | | RGMII interface | |  |
      |          |                 | ------------------  |  |
      |          |                 ----------------------   |
      |----------|------------------------------------------|
                 |                          |
                 |                      ----------
                 |--------------------- |PHY chip |
                                        ----------
      
      "mac_core_rst" is the MAC core reset signal. It resets
      the MAC core, including the packet processing unit, descriptor
      processing unit, tx engine, rx engine and control unit.
      "mac_ifc_rst" is the MAC interface reset signal. It resets
      the MAC interface, the unit that connects the MAC core to a
      data interface such as MII/RMII/RGMII. After setting a new
      interface mode, the MAC interface must be reset to reload the
      new mode value.
      Both "mac_core_rst" and "mac_ifc_rst" are optional, to stay
      backward compatible with the hix5hd2 SoC.
      "phy_rst" is the PHY reset signal. It performs a hardware reset
      of the PHY chip. This reset signal is optional if the PHY works
      without a hardware reset.
      
      Add one more clock signal: the existing one is the MAC core clock,
      and the new one is the MAC interface clock.
      The MAC interface clock is optional, to stay backward compatible with
      the hix5hd2 SoC.
      Signed-off-by: Dongpo Li <lidongpo@hisilicon.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      7087140d
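The reset behaviour described in the commit above (optional reset lines, and a MAC interface that only reloads its mode on reset) can be sketched as a small userspace model. This is an illustration only, not the driver's code: `struct mac_ifc`, `struct reset_line`, `reset_pulse()` and `set_interface_mode()` are hypothetical names, and how the older hix5hd2 SoC applies a mode without the reset line is outside the model.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical interface modes; the names are illustrative only. */
enum ifc_mode { MODE_MII, MODE_RMII, MODE_RGMII };

/* Toy model of the MAC interface block: a newly written mode only
 * takes effect once the block has been reset. */
struct mac_ifc {
    enum ifc_mode pending;  /* value written to the mode setting */
    enum ifc_mode active;   /* value the block is actually using */
};

/* A reset line; a NULL pointer to it models an absent optional signal. */
struct reset_line {
    struct mac_ifc *target; /* block reloaded by this reset, may be NULL */
    unsigned int pulses;    /* completed assert/deassert cycles */
};

/* Pulse an optional reset line. Returns false when the line is absent. */
static bool reset_pulse(struct reset_line *rst)
{
    if (rst == NULL)
        return false;       /* optional signal not provided: skip */
    if (rst->target != NULL)
        rst->target->active = rst->target->pending; /* reload on reset */
    rst->pulses++;
    return true;
}

/* Write a new mode, then reset the interface so the mode is reloaded. */
static void set_interface_mode(struct mac_ifc *ifc,
                               struct reset_line *ifc_rst,
                               enum ifc_mode mode)
{
    assert(ifc != NULL);
    ifc->pending = mode;
    reset_pulse(ifc_rst);
}
```

In this model, calling `set_interface_mode()` with a present reset line makes `active` follow `pending`, while passing `NULL` for the line simply skips the pulse.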
    • Dongpo Li's avatar
      net: hix5hd2_gmac: add tx scatter-gather feature · e5222b1c
      Dongpo Li authored
      "hisi-gemac-v2" adds the SG/TXCSUM/TSO/UFO features.
      This patch only adds SG (scatter-gather) support for transmit;
      support for the other features will be submitted later.
      Signed-off-by: Dongpo Li <lidongpo@hisilicon.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      e5222b1c
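As a generic illustration of transmit scatter-gather, as added by the commit above, the sketch below gathers a packet's linear head and its fragments into one descriptor, one entry per buffer, instead of copying them into a single contiguous buffer. The structures, `MAX_FRAGS` and `build_sg_desc()` are assumptions made for this example and do not reflect the actual hix5hd2_gmac descriptor layout.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_FRAGS 4  /* hypothetical limit, chosen for the example */

/* One contiguous piece of packet data. */
struct buf {
    const void *addr;
    size_t len;
};

/* A packet made of a linear head plus optional fragments. */
struct packet {
    struct buf head;
    struct buf frags[MAX_FRAGS];
    size_t nr_frags;
};

/* A scatter-gather descriptor: one entry per buffer of the packet. */
struct sg_desc {
    struct buf entries[1 + MAX_FRAGS];
    size_t nr_entries;
    size_t total_len;
};

/* Fill desc from pkt without copying payload data.
 * Returns false if the packet has more fragments than the
 * descriptor can hold. */
static bool build_sg_desc(const struct packet *pkt, struct sg_desc *desc)
{
    assert(pkt != NULL && desc != NULL);
    if (pkt->nr_frags > MAX_FRAGS)
        return false;

    desc->entries[0] = pkt->head;
    desc->total_len = pkt->head.len;
    for (size_t i = 0; i < pkt->nr_frags; i++) {
        desc->entries[1 + i] = pkt->frags[i];
        desc->total_len += pkt->frags[i].len;
    }
    desc->nr_entries = 1 + pkt->nr_frags;
    return true;
}
```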
    • Dongpo Li's avatar
      net: hix5hd2_gmac: add generic compatible string · d0fb6ba7
      Dongpo Li authored
      "hix5hd2" is the SoC name; add a generic name for the ethernet driver.
      "hisi-gemac-v1" is the basic version, and "hisi-gemac-v2" adds
      the SG/TXCSUM/TSO/UFO features.
      Signed-off-by: Dongpo Li <lidongpo@hisilicon.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      d0fb6ba7
    • Stefan Eichenberger's avatar
      net: dsa: mv88e6xxx: Use EDSA on mv88e6097 · 2bfcfcd3
      Stefan Eichenberger authored
      Use DSA_TAG_PROTO_EDSA as tag_protocol for the mv88e6097. The
      initialisation was missing before.
      
      Fixes: a1f482aa8c33 ("net: dsa: mv88e6xxx: Move the tagging protocol into info")
      Signed-off-by: Stefan Eichenberger <stefan.eichenberger@netmodule.com>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      2bfcfcd3
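The kind of bug fixed by the mv88e6097 commit above can be illustrated with a small table of per-chip settings. In C, a member omitted from a designated initializer is zero-initialized, so an entry that never sets its tag protocol silently gets whatever enumerator has the value 0. The enum, the structure and the chip names below are hypothetical stand-ins, and the assumption that 0 means "no tagging" is made for this example only.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical tag protocols; 0 is assumed to mean "none". */
enum tag_proto { TAG_PROTO_NONE = 0, TAG_PROTO_DSA, TAG_PROTO_EDSA };

struct chip_info {
    const char *name;
    enum tag_proto tag_protocol;
};

/* Before the fix: the second entry forgets .tag_protocol, so the
 * field is implicitly zero, i.e. TAG_PROTO_NONE. */
static const struct chip_info table_before[] = {
    { .name = "chip-a", .tag_protocol = TAG_PROTO_EDSA },
    { .name = "chip-b" },
};

/* After the fix: the field is initialised explicitly. */
static const struct chip_info table_after[] = {
    { .name = "chip-a", .tag_protocol = TAG_PROTO_EDSA },
    { .name = "chip-b", .tag_protocol = TAG_PROTO_EDSA },
};

/* Look up a chip by name; unknown chips report TAG_PROTO_NONE. */
static enum tag_proto lookup_tag_proto(const struct chip_info *table,
                                       size_t count, const char *name)
{
    assert(table != NULL && name != NULL);
    for (size_t i = 0; i < count; i++) {
        if (strcmp(table[i].name, name) == 0)
            return table[i].tag_protocol;
    }
    return TAG_PROTO_NONE;
}
```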
    • Thomas Graf's avatar
      bpf: add additional verifier tests for BPF_PROG_TYPE_LWT_* · 3f731d89
      Thomas Graf authored
      - direct packet read is allowed for LWT_*
      - direct packet write for LWT_IN/LWT_OUT is prohibited
      - direct packet write for LWT_XMIT is allowed
      - access to skb->tc_classid is prohibited for LWT_*
      Signed-off-by: Thomas Graf <tgraf@suug.ch>
      Acked-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      3f731d89
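The four access rules exercised by the verifier tests in the commit above can be written down as a single predicate. This is a plain restatement of the list in the commit message, not the verifier's implementation; the enum names and `lwt_access_allowed()` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the three BPF_PROG_TYPE_LWT_* types. */
enum lwt_prog { LWT_IN, LWT_OUT, LWT_XMIT };

/* Kinds of access covered by the tests. */
enum skb_access { PKT_READ, PKT_WRITE, TC_CLASSID_ACCESS };

/* Returns true when the listed rules allow the access. */
static bool lwt_access_allowed(enum lwt_prog prog, enum skb_access access)
{
    switch (access) {
    case PKT_READ:
        return true;                /* allowed for all LWT_* types */
    case PKT_WRITE:
        return prog == LWT_XMIT;    /* prohibited for LWT_IN/LWT_OUT */
    case TC_CLASSID_ACCESS:
        return false;               /* prohibited for all LWT_* types */
    }
    return false;                   /* unknown access kind: reject */
}
```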
    • Haiyang Zhang's avatar
      tools: hv: Enable network manager for bonding scripts on RHEL · fd7aabb0
      Haiyang Zhang authored
      We found that Network Manager is needed on RHEL for the synthetic
      NIC and VF NIC bonding operations to be handled automatically, so
      enable Network Manager here.
      Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
      Reviewed-by: K. Y. Srinivasan <kys@microsoft.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      fd7aabb0
    • Herbert Xu's avatar
      netlink: Do not schedule work from sk_destruct · ed5d7788
      Herbert Xu authored
      It is wrong to schedule work from sk_destruct using memory embedded
      in the socket, because the socket will be freed immediately
      after sk_destruct returns.
      
      Instead we should do the deferral prior to sk_free.
      
      This patch does just that.
      
      Fixes: 707693c8 ("netlink: Call cb->done from a worker thread")
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      Tested-by: Andrey Konovalov <andreyknvl@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      ed5d7788
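The ordering problem described in the netlink commit above can be sketched with a single-threaded userspace model. A work item embedded in an object is only safe to queue while the object is still guaranteed to exist, so the release path queues the work first and lets the worker perform the final free. The queue, `struct sock_model` and all function names here are assumptions for illustration; this is not the netlink code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Minimal work item and a pending list standing in for a workqueue. */
struct work {
    void (*fn)(struct work *);
    struct work *next;
};

static struct work *pending;

static void queue_work_item(struct work *w)
{
    w->next = pending;
    pending = w;
}

/* Run every queued item. The next pointer is read before the callback
 * runs, because the callback may free the memory holding the item. */
static void run_pending(void)
{
    while (pending != NULL) {
        struct work *w = pending;
        pending = w->next;
        w->fn(w);
    }
}

/* Toy socket: the work item lives inside the object's own memory. */
struct sock_model {
    struct work work;   /* first member, so the cast below is valid */
};

static int cleanup_count;
static int freed_count;

/* Worker: do the slow cleanup, then free the object last. */
static void deferred_free(struct work *w)
{
    struct sock_model *sk = (struct sock_model *)w;

    cleanup_count++;    /* stands in for the deferred callback */
    free(sk);
    freed_count++;
}

/* Release path: defer BEFORE freeing. Queuing the embedded work item
 * from a destructor that is followed by an immediate free would leave
 * the queue pointing at freed memory. */
static void sock_release_deferred(struct sock_model *sk)
{
    assert(sk != NULL);
    sk->work.fn = deferred_free;
    queue_work_item(&sk->work);
}
```

After `sock_release_deferred()` the object is still allocated and queued; it is freed only once `run_pending()` has executed the worker.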