1. 25 Aug, 2017 16 commits
    • i40e: Fix a bug with VMDq RSS queue allocation · 5a433199
      Anjali Singhai Jain authored
      The X722 PF flag setup should happen before the VMDq RSS queue count is
      initialized, so that the VMDq VSI gets the right number of RSS queues
      on X722 devices.
      Signed-off-by: Anjali Singhai Jain <anjali.singhai@intel.com>
      Signed-off-by: Alice Michael <alice.michael@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
    • i40evf: prevent VF close returning before state transitions to DOWN · fe2647ab
      Sudheer Mogilappagari authored
      Currently i40evf_close() can return before state transitions to
      __I40EVF_DOWN because of the latency involved in processing and
      receiving response from PF driver and scheduling of VF watchdog_task.
      Due to this inconsistency an immediate call to i40evf_open() fails
      because state is still DOWN_PENDING.
      
      When a VF interface is up and we try to add it as a slave, the
      bonding driver calls dev_close() and dev_open() in quick succession,
      so dev_open() returns an error. The ifenslave command then needs
      to be run again for dev_open() to succeed.
      
      This fix ensures that the watchdog timer is scheduled immediately after
      admin queue operations are scheduled in i40evf_down(). In addition, a
      wait condition is added at the end of i40evf_close() so that the
      function won't return while the state is still DOWN_PENDING. The
      timeout value was chosen after some profiling and includes some buffer.
      Signed-off-by: Sudheer Mogilappagari <sudheer.mogilappagari@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
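The "wait until DOWN, but give up after a timeout" idea from the i40evf commit above can be sketched in user space. This is only an illustration under stated assumptions: the state names, the function name, and the 1 ms polling step are hypothetical, and kernel code would normally sleep on a wait queue rather than poll.

```c
#include <stdatomic.h>
#include <time.h>

/* Hypothetical states, loosely mirroring the ones named in the commit message. */
enum sketch_state { SKETCH_RUNNING, SKETCH_DOWN_PENDING, SKETCH_DOWN };

/*
 * Poll *state until it becomes SKETCH_DOWN or roughly timeout_ms
 * milliseconds have passed (each poll sleeps for about 1 ms).
 * Returns 0 once the state is DOWN, -1 on timeout.
 */
static int sketch_wait_for_down(atomic_int *state, int timeout_ms)
{
    const struct timespec step = { .tv_sec = 0, .tv_nsec = 1000000 };
    int waited_ms = 0;

    while (atomic_load(state) != SKETCH_DOWN) {
        if (waited_ms >= timeout_ms)
            return -1;          /* still pending: report the timeout */
        nanosleep(&step, NULL); /* give the other side time to finish */
        waited_ms++;
    }
    return 0;
}
```

A close routine built this way would return only after the state has settled or the timeout has expired, so a caller that reopens the device right away is less likely to find it still in the pending state.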
    • i40e/i40evf: adjust packet size to account for double VLANs · 1e3a5fd5
      Mitch Williams authored
      Now that the kernel supports double VLAN tags, we should at least play
      nice. Adjust the max packet size to account for two VLAN tags, not just
      one.
      Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
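The size adjustment in the double-VLAN commit above comes down to simple arithmetic. A minimal sketch, assuming standard Ethernet sizes (14-byte header, 4-byte FCS, 4 bytes per VLAN tag); the constant and function names are illustrative and are not taken from the i40e driver:

```c
#include <stdint.h>

/* Standard Ethernet sizes in bytes (assumed here, not copied from the driver). */
enum {
    SKETCH_ETH_HLEN    = 14, /* destination MAC + source MAC + EtherType */
    SKETCH_ETH_FCS_LEN = 4,  /* frame check sequence */
    SKETCH_VLAN_HLEN   = 4   /* one VLAN tag */
};

/* Largest frame on the wire for a given MTU and number of VLAN tags. */
static uint32_t sketch_max_frame_size(uint32_t mtu, unsigned int vlan_tags)
{
    return mtu + SKETCH_ETH_HLEN + SKETCH_ETH_FCS_LEN
               + vlan_tags * SKETCH_VLAN_HLEN;
}
```

With a 1500-byte MTU, one tag gives 1522 bytes and two tags give 1526 bytes, which is the extra room a double-tagged frame needs.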
    • strparser: initialize all callbacks · 3fd87127
      Eric Biggers authored
      commit bbb03029 ("strparser: Generalize strparser") added more
      function pointers to 'struct strp_callbacks'; however, kcm_attach() was
      not updated to initialize them.  This could cause the ->lock() and/or
      ->unlock() function pointers to be set to garbage values, causing a
      crash in strp_work().
      
      Fix the bug by moving the callback structs into static memory, so
      unspecified members are zeroed.  Also constify them while we're at it.
      
      This bug was found by syzkaller, which encountered the following splat:
      
          IP: 0x55
          PGD 3b1ca067
          P4D 3b1ca067
          PUD 3b12f067
          PMD 0
      
          Oops: 0010 [#1] SMP KASAN
          Dumping ftrace buffer:
             (ftrace buffer empty)
          Modules linked in:
          CPU: 2 PID: 1194 Comm: kworker/u8:1 Not tainted 4.13.0-rc4-next-20170811 #2
          Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
          Workqueue: kstrp strp_work
          task: ffff88006bb0e480 task.stack: ffff88006bb10000
          RIP: 0010:0x55
          RSP: 0018:ffff88006bb17540 EFLAGS: 00010246
          RAX: dffffc0000000000 RBX: ffff88006ce4bd60 RCX: 0000000000000000
          RDX: 1ffff1000d9c97bd RSI: 0000000000000000 RDI: ffff88006ce4bc48
          RBP: ffff88006bb17558 R08: ffffffff81467ab2 R09: 0000000000000000
          R10: ffff88006bb17438 R11: ffff88006bb17940 R12: ffff88006ce4bc48
          R13: ffff88003c683018 R14: ffff88006bb17980 R15: ffff88003c683000
          FS:  0000000000000000(0000) GS:ffff88006de00000(0000) knlGS:0000000000000000
          CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
          CR2: 0000000000000055 CR3: 000000003c145000 CR4: 00000000000006e0
          DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
          DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
          Call Trace:
           process_one_work+0xbf3/0x1bc0 kernel/workqueue.c:2098
           worker_thread+0x223/0x1860 kernel/workqueue.c:2233
           kthread+0x35e/0x430 kernel/kthread.c:231
           ret_from_fork+0x2a/0x40 arch/x86/entry/entry_64.S:431
          Code:  Bad RIP value.
          RIP: 0x55 RSP: ffff88006bb17540
          CR2: 0000000000000055
          ---[ end trace f0e4920047069cee ]---
      
      Here is a C reproducer (requires CONFIG_BPF_SYSCALL=y and
      CONFIG_AF_KCM=y):
      
          #include <linux/bpf.h>
          #include <linux/kcm.h>
          #include <linux/types.h>
          #include <stdint.h>
          #include <sys/ioctl.h>
          #include <sys/socket.h>
          #include <sys/syscall.h>
          #include <unistd.h>
      
          static const struct bpf_insn bpf_insns[3] = {
              { .code = 0xb7 }, /* BPF_MOV64_IMM(0, 0) */
              { .code = 0x95 }, /* BPF_EXIT_INSN() */
          };
      
          static const union bpf_attr bpf_attr = {
              .prog_type = 1,
              .insn_cnt = 2,
              .insns = (uintptr_t)&bpf_insns,
              .license = (uintptr_t)"",
          };
      
          int main(void)
          {
              int bpf_fd = syscall(__NR_bpf, BPF_PROG_LOAD,
                                   &bpf_attr, sizeof(bpf_attr));
              int inet_fd = socket(AF_INET, SOCK_STREAM, 0);
              int kcm_fd = socket(AF_KCM, SOCK_DGRAM, 0);
      
              ioctl(kcm_fd, SIOCKCMATTACH,
                    &(struct kcm_attach) { .fd = inet_fd, .bpf_fd = bpf_fd });
          }
      
      Fixes: bbb03029 ("strparser: Generalize strparser")
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Tom Herbert <tom@quantonium.net>
      Signed-off-by: Eric Biggers <ebiggers@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
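The fix in the strparser commit above relies on a C guarantee: an object with static storage duration has every member that is not explicitly initialized set to zero (null for pointers), whereas an automatic struct whose members are assigned one by one leaves the rest indeterminate. A minimal sketch of the pattern follows; the struct and function names are hypothetical and only loosely modeled on the idea of a callback table.

```c
#include <stddef.h>

/* Hypothetical callback table with one required and two optional hooks. */
struct sketch_callbacks {
    int  (*parse_msg)(const void *buf, size_t len);
    void (*lock)(void *ctx);
    void (*unlock)(void *ctx);
};

static int sketch_parse(const void *buf, size_t len)
{
    (void)buf;
    return (int)len;
}

/*
 * Static storage duration: members not named in the initializer are
 * guaranteed to be null pointers, so callers can test them safely.
 */
static const struct sketch_callbacks sketch_cb = {
    .parse_msg = sketch_parse,
};

/* Call the optional lock hook only if one was provided. */
static int sketch_try_lock(const struct sketch_callbacks *cb, void *ctx)
{
    if (!cb->lock)
        return 0;   /* no hook installed */
    cb->lock(ctx);
    return 1;
}
```

Making the table `const` as well means it cannot be modified after build time, which matches the "constify" remark in the commit message.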
    • hv_netvsc: Fix rndis_filter_close error during netvsc_remove · c6f71c41
      Haiyang Zhang authored
      We now remove the rndis filter before unregister_netdev(), which calls
      device close. The close path then tries to close an rndis filter that
      has already been removed.
      
      This patch fixes this error.
      Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge tag 'mlx5-updates-2017-08-24' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux · 0cf3f4c3
      David S. Miller authored
      Saeed Mahameed says:
      
      ====================
      mlx5-updates-2017-08-24
      
      This series includes updates to mlx5 core driver.
      
      From Gal and Saeed, three cleanup patches.
      From Matan, Low level flow steering improvements and optimizations,
       - Use more efficient data structures for flow steering objects handling.
       - Add tracepoints to flow steering operations.
       - Overall these patches improve flow steering rule insertion rate by a
         factor of seven in large scales (~50K rules or more).
      
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • hinic: uninitialized variable in hinic_api_cmd_init() · 256fbe11
      Dan Carpenter authored
      We never set the error code in this function.
      
      Fixes: eabf0fad ("net-next/hinic: Initialize api cmd resources")
      Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
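The class of bug named in the hinic commit above (an error path that returns a code nobody assigned) can be illustrated with a generic sketch. Everything here is hypothetical, including the `fail_at` test hook; it is not the hinic code, only the general "set the error before jumping to cleanup" pattern.

```c
#include <errno.h>
#include <stdlib.h>

struct sketch_chain { int id; };

/*
 * Allocate `count` chains. On failure, free what was allocated so far
 * and return a negative errno. `fail_at` is a hypothetical test hook
 * that forces the allocation with that index to fail; pass -1 to
 * disable it.
 */
static int sketch_init_chains(struct sketch_chain **chains, int count, int fail_at)
{
    int err = 0;
    int i;

    for (i = 0; i < count; i++) {
        chains[i] = (i == fail_at) ? NULL : malloc(sizeof(*chains[i]));
        if (!chains[i]) {
            err = -ENOMEM;  /* assign the code before jumping to cleanup */
            goto err_free;
        }
        chains[i]->id = i;
    }
    return 0;

err_free:
    while (--i >= 0) {      /* undo the allocations that succeeded */
        free(chains[i]);
        chains[i] = NULL;
    }
    return err;
}
```

If the assignment before the `goto` were missing and `err` had no initial value, the function would return an indeterminate value on failure, which a caller could mistake for success.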
    • net: mv643xx_eth: Be drop monitor friendly · 43cee2d2
      Florian Fainelli authored
      txq_reclaim() does the normal transmit queue reclamation and
      rxq_deinit() does the RX ring cleanup. Neither of these is a packet
      drop, so use dev_consume_skb() for both locations.
      Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tg3: Be drop monitor friendly · 1e9d8e7a
      Florian Fainelli authored
      tg3_tx() does the normal packet TX completion,
      tigon3_dma_hwbug_workaround() and tg3_tso_bug() both need to allocate a
      new SKB that is suitable to work around HW bugs, and finally
      tg3_free_rings() is doing ring cleanup. Use dev_consume_skb_any() for
      these 3 locations to be SKB drop monitor friendly.
      Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
      Acked-by: Michael Chan <michael.chan@broadcom.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'ipv6-Route-ICMPv6-errors-with-the-flow-when-ECMP-in-use' · 45c7ec9d
      David S. Miller authored
      Jakub Sitnicki says:
      
      ====================
      ipv6: Route ICMPv6 errors with the flow when ECMP in use
      
      This patch set is another take at making Path MTU Discovery work when
      server nodes are behind a router employing multipath routing in a
      load-balance or anycast setup (that is, when not every end-node can be
      reached by every path). The problem has been well described in RFC 7690
      [1], but in short - in such setups ICMPv6 PTB errors are not guaranteed
      to be routed back to the server node that sent a reply that exceeds path
      MTU.
      
      The proposed solution is two-fold:
      
       (1) on the server side - reflect the Flow Label [2]. This can be done
           without modifying the application using a new per-netns sysctl knob
           that has been proposed independently of this patchset in the patch
           entitled "ipv6: Add sysctl for per namespace flow label
           reflection" [3].
      
       (2) on the ECMP router - make the ipv6 routing subsystem look into the
           ICMPv6 error packets and compute the flow-hash from its payload,
           i.e. the offending packet that triggered the error. This is the
           same behavior the ipv4 stack already has.
      
      With both parts in place Path MTU Discovery can work past the ECMP
      router when using IPv6.
      
      [1] https://tools.ietf.org/html/rfc7690
      [2] https://tools.ietf.org/html/draft-wang-6man-flow-label-reflection-01
      [3] http://patchwork.ozlabs.org/patch/804870/
      
      v1 -> v2:
       - don't use "extern" in external function declaration in header file
       - style change, put as many arguments as possible on the first line of
         a function call, and align consecutive lines to the first argument
       - expand the cover letter based on the feedback
      
      v2 -> v3:
       - switch to computing flow-hash using flow dissector to align with
         recent changes to multipath routing in ipv4 stack
       - add a sysctl knob for enabling flow label reflection per netns
      
      ---
      
      Testing has covered multipath routing of ICMPv6 PTB errors in forward
      and local output path in a simple use-case of an HTTP server sending a
      reply which is over the path MTU size [4]. I have also checked if the
      flows get evenly spread over multiple paths (i.e. if there are no
      regressions) [5].
      
      [4] https://github.com/jsitnicki/tools/tree/master/net/tests/ecmp/pmtud
      [5] https://github.com/jsitnicki/tools/tree/master/net/tests/ecmp/load-balance
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ipv6: Use multipath hash from flow info if available · b673d6cc
      Jakub Sitnicki authored
      Allow our callers to influence the choice of ECMP link by honoring the
      hash passed together with the flow info. This allows for special
      treatment of ICMP errors which we would like to route over the same path
      as the IPv6 datagram that triggered the error.
      
      Also go through rt6_multipath_hash(), in the usual case when we aren't
      dealing with an ICMP error, so that there is one central place where
      multipath hash is computed.
      Signed-off-by: Jakub Sitnicki <jkbs@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
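The "honor the hash passed with the flow info, otherwise compute one in a single central place" logic from the commit above can be sketched as follows. This is a toy model: the struct, its field names, the use of 0 to mean "no hash supplied", and the hash function itself are all assumptions made for illustration, not the kernel's definitions.

```c
#include <stdint.h>

/* Hypothetical, trimmed-down flow descriptor. */
struct sketch_flow {
    uint32_t src;      /* stand-in for the source address */
    uint32_t dst;      /* stand-in for the destination address */
    uint32_t mp_hash;  /* caller-supplied multipath hash, 0 if not set */
};

/* Toy hash over the flow's own fields (not the kernel's algorithm). */
static uint32_t sketch_flow_hash(const struct sketch_flow *fl)
{
    uint32_t h = fl->src * 2654435761u;

    h ^= fl->dst * 2246822519u;
    return h ? h : 1;  /* keep 0 reserved for "not set" */
}

/* Central place where the multipath hash is chosen. */
static uint32_t sketch_multipath_hash(const struct sketch_flow *fl)
{
    return fl->mp_hash ? fl->mp_hash : sketch_flow_hash(fl);
}

/* Map the hash onto one of `nexthops` equal-cost links (nexthops must be > 0). */
static unsigned int sketch_select_nexthop(const struct sketch_flow *fl,
                                          unsigned int nexthops)
{
    return sketch_multipath_hash(fl) % nexthops;
}
```

A caller that wants one packet to follow another flow's path copies that flow's hash into `mp_hash` before the lookup; every other caller leaves it at 0 and gets the default computation.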
    • ipv6: Fold rt6_info_hash_nhsfn() into its only caller · 956b4531
      Jakub Sitnicki authored
      Commit 644d0e65 ("ipv6 Use get_hash_from_flowi6 for rt6 hash") has
      turned rt6_info_hash_nhsfn() into a one-liner, so it no longer makes
      sense to keep it around. Also remove the accompanying comment that has
      become outdated.
      Signed-off-by: Jakub Sitnicki <jkbs@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ipv6: Compute multipath hash for ICMP errors from offending packet · 23aebdac
      Jakub Sitnicki authored
      When forwarding or sending out an ICMPv6 error, look at the embedded
      packet that triggered the error and compute a flow hash over its
      headers.
      
      This lets us route the ICMP error together with the flow it belongs to
      when multipath (ECMP) routing is in use, which in turn makes Path MTU
      Discovery work in ECMP load-balanced or anycast setups (RFC 7690).
      
      Granted, end-hosts behind the ECMP router (aka servers) need to reflect
      the IPv6 Flow Label for PMTUD to work.
      
      The code is organized to parallel the ipv4 stack:
      
        ip_multipath_l3_keys -> ip6_multipath_l3_keys
        fib_multipath_hash   -> rt6_multipath_hash
      Signed-off-by: Jakub Sitnicki <jkbs@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
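The idea in the commit above, hashing the offending packet embedded in an ICMPv6 error rather than the error's own header, can be sketched with a simplified packet model. The structs, the 32-bit address stand-ins, and the hash are hypothetical. Making the toy hash direction-independent is an assumption of this sketch, chosen so that a request and the reply to it (with the flow label reflected) hash alike.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical, simplified header view; real IPv6 headers are richer. */
struct sketch_hdr {
    uint32_t src;         /* stand-in for the 128-bit source address */
    uint32_t dst;         /* stand-in for the 128-bit destination address */
    uint32_t flow_label;  /* IPv6 flow label */
};

struct sketch_pkt {
    struct sketch_hdr outer;        /* header of the packet being routed */
    int is_icmp_error;              /* nonzero for ICMPv6 error messages */
    const struct sketch_hdr *inner; /* offending packet carried in the error, or NULL */
};

/* Toy direction-independent hash: swapping src and dst gives the same value. */
static uint32_t sketch_hash_hdr(const struct sketch_hdr *h)
{
    uint32_t lo = h->src < h->dst ? h->src : h->dst;
    uint32_t hi = h->src < h->dst ? h->dst : h->src;

    return (lo * 2654435761u) ^ (hi * 2246822519u)
         ^ (h->flow_label * 3266489917u);
}

/* For ICMPv6 errors, hash the embedded offending packet instead of the outer header. */
static uint32_t sketch_pkt_hash(const struct sketch_pkt *p)
{
    if (p->is_icmp_error && p->inner)
        return sketch_hash_hdr(p->inner);
    return sketch_hash_hdr(&p->outer);
}
```

In this model a Packet Too Big error sent by some intermediate router gets the same hash as the client's original request, so an ECMP router using the hash would pick the same path, and therefore the same server, for both.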
    • net: Extend struct flowi6 with multipath hash · 29825717
      Jakub Sitnicki authored
      Allow for functions that fill out the IPv6 flow info to also pass a hash
      computed over the skb contents. The hash value will drive the multipath
      routing decisions.
      
      This is intended for special treatment of ICMPv6 errors, where we would
      like to make a routing decision based on the flow identifying the
      offending IPv6 datagram that triggered the error, rather than the flow
      of the ICMP error itself.
      Signed-off-by: Jakub Sitnicki <jkbs@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • devlink: Fix devlink_dpipe_table_register() stub signature. · 790c6056
      David S. Miller authored
      One too many arguments compared to the non-stub version.
      Reported-by: kbuild test robot <fengguang.wu@intel.com>
      Fixes: ffd3cdcc ("devlink: Add support for dynamic table size")
      Signed-off-by: David S. Miller <davem@davemloft.net>
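The devlink fix above concerns a common header pattern: when a feature is compiled out, an inline stub stands in for the real function, and the two must take exactly the same parameters or callers fail to build in one configuration only. A generic sketch with hypothetical names (this is not the devlink prototype):

```c
#include <stddef.h>

struct sketch_dev;   /* opaque, hypothetical device handle */

#ifdef SKETCH_FEATURE_ENABLED
/* Real implementation lives elsewhere when the feature is built in. */
int sketch_table_register(struct sketch_dev *dev, const char *name,
                          void *priv, int counters_enabled);
#else
/* Stub: same parameter list as the real prototype, does nothing. */
static inline int sketch_table_register(struct sketch_dev *dev,
                                        const char *name, void *priv,
                                        int counters_enabled)
{
    (void)dev;
    (void)name;
    (void)priv;
    (void)counters_enabled;
    return 0;
}
#endif
```

Because a mismatch only shows up when the feature is disabled, it tends to be caught by automated builds that cover many configurations rather than by the developer's own build.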
    • ipv6: Add sysctl for per namespace flow label reflection · 22b6722b
      Jakub Sitnicki authored
      Reflecting IPv6 Flow Label at server nodes is useful in environments
      that employ multipath routing to load balance the requests. As the "IPv6
      Flow Label Reflection" standard draft [1] points out, ICMPv6 PTB error
      messages generated in response to downstream packets from the server
      can be routed by a load balancer back to the original server without
      looking at transport headers, if the server applies the flow label
      reflection. This enables the Path MTU Discovery past the ECMP router in
      load-balance or anycast environments where each server node is reachable
      by only one path.
      
      Introduce a sysctl to enable flow label reflection per net namespace for
      all newly created sockets. Previously, the same could be achieved only
      per socket by setting the IPV6_FL_F_REFLECT flag for the
      IPV6_FLOWLABEL_MGR socket option.
      
      [1] https://tools.ietf.org/html/draft-wang-6man-flow-label-reflection-01
      Signed-off-by: Jakub Sitnicki <jkbs@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  2. 24 Aug, 2017 24 commits