1. 25 Jul, 2016 40 commits
    • sctp: also point GSO head_skb to the sk when it's available · 52253db9
      Marcelo Ricardo Leitner authored
      The head skb for GSO packets won't travel through the inner depths of
      SCTP stack as it doesn't contain any chunks on it. That means skb->sk
      doesn't get set and then when sctp_recvmsg() calls
      sctp_inet6_skb_msgname() on the head_skb it panics, as the latter needs
      to check flags at the socket (sp->v4mapped).

      The fix is to initialize skb->sk for the head skb once we are able to do
      it. That is, when the first chunk is processed.
      Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • sctp: fix BH handling on socket backlog · eefc1b1d
      Marcelo Ricardo Leitner authored
      Now that the backlog processing is called with BH enabled, we have to
      disable BH before taking the socket lock via bh_lock_sock(), otherwise
      it may deadlock:
      
      sctp_backlog_rcv()
                      bh_lock_sock(sk);
      
                      if (sock_owned_by_user(sk)) {
                              if (sk_add_backlog(sk, skb, sk->sk_rcvbuf))
                                      sctp_chunk_free(chunk);
                              else
                                      backloged = 1;
                      } else
                              sctp_inq_push(inqueue, chunk);
      
                      bh_unlock_sock(sk);
      
      Here sctp_inq_push() was disabling and re-enabling BH, but enabling BH
      triggers pending softirqs, which may then try to re-lock the socket in
      sctp_rcv().
      
      [  219.187215]  <IRQ>
      [  219.187217]  [<ffffffff817ca3e0>] _raw_spin_lock+0x20/0x30
      [  219.187223]  [<ffffffffa041888c>] sctp_rcv+0x48c/0xba0 [sctp]
      [  219.187225]  [<ffffffff816e7db2>] ? nf_iterate+0x62/0x80
      [  219.187226]  [<ffffffff816f1b14>] ip_local_deliver_finish+0x94/0x1e0
      [  219.187228]  [<ffffffff816f1e1f>] ip_local_deliver+0x6f/0xf0
      [  219.187229]  [<ffffffff816f1a80>] ? ip_rcv_finish+0x3b0/0x3b0
      [  219.187230]  [<ffffffff816f17a8>] ip_rcv_finish+0xd8/0x3b0
      [  219.187232]  [<ffffffff816f2122>] ip_rcv+0x282/0x3a0
      [  219.187233]  [<ffffffff810d8bb6>] ? update_curr+0x66/0x180
      [  219.187235]  [<ffffffff816abac4>] __netif_receive_skb_core+0x524/0xa90
      [  219.187236]  [<ffffffff810d8e00>] ? update_cfs_shares+0x30/0xf0
      [  219.187237]  [<ffffffff810d557c>] ? __enqueue_entity+0x6c/0x70
      [  219.187239]  [<ffffffff810dc454>] ? enqueue_entity+0x204/0xdf0
      [  219.187240]  [<ffffffff816ac048>] __netif_receive_skb+0x18/0x60
      [  219.187242]  [<ffffffff816ad1ce>] process_backlog+0x9e/0x140
      [  219.187243]  [<ffffffff816ac8ec>] net_rx_action+0x22c/0x370
      [  219.187245]  [<ffffffff817cd352>] __do_softirq+0x112/0x2e7
      [  219.187247]  [<ffffffff817cc3bc>] do_softirq_own_stack+0x1c/0x30
      [  219.187247]  <EOI>
      [  219.187248]  [<ffffffff810aa1c8>] do_softirq.part.14+0x38/0x40
      [  219.187249]  [<ffffffff810aa24d>] __local_bh_enable_ip+0x7d/0x80
      [  219.187254]  [<ffffffffa0408428>] sctp_inq_push+0x68/0x80 [sctp]
      [  219.187258]  [<ffffffffa04190f1>] sctp_backlog_rcv+0x151/0x1c0 [sctp]
      [  219.187260]  [<ffffffff81692b07>] __release_sock+0x87/0xf0
      [  219.187261]  [<ffffffff81692ba0>] release_sock+0x30/0xa0
      [  219.187265]  [<ffffffffa040e46d>] sctp_accept+0x17d/0x210 [sctp]
      [  219.187266]  [<ffffffff810e7510>] ? prepare_to_wait_event+0xf0/0xf0
      [  219.187268]  [<ffffffff8172d52c>] inet_accept+0x3c/0x130
      [  219.187269]  [<ffffffff8168d7a3>] SYSC_accept4+0x103/0x210
      [  219.187271]  [<ffffffff817ca2ba>] ? _raw_spin_unlock_bh+0x1a/0x20
      [  219.187272]  [<ffffffff81692bfc>] ? release_sock+0x8c/0xa0
      [  219.187276]  [<ffffffffa0413e22>] ? sctp_inet_listen+0x62/0x1b0 [sctp]
      [  219.187277]  [<ffffffff8168f2d0>] SyS_accept+0x10/0x20
      
      Fixes: 860fbbc3 ("sctp: prepare for socket backlog behavior change")
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • hv_netvsc: Fix VF register on bonding devices · e2b9f1f7
      Haiyang Zhang authored
      Added a condition to avoid bonding devices with same MAC registering
      as VF.
      Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
      Reviewed-by: K. Y. Srinivasan <kys@microsoft.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • kcm: remove redundant -ve error check and return path · 0a58f474
      Colin Ian King authored
      The check for a -ve error is redundant, remove it and just
      immediately return the return value from the call to
      seq_open_net.
      Signed-off-by: Colin Ian King <colin.king@canonical.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
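The cleanup described in the kcm commit above can be illustrated with a small userspace sketch. It is not the kernel code: `open_stub()` is a hypothetical stand-in for a call such as seq_open_net() that returns 0 or a negative errno, and both wrapper names are invented for the illustration.

```c
#include <errno.h>

/* Hypothetical stand-in for a call that returns 0 or a negative errno. */
static int open_stub(int fail)
{
	return fail ? -ENOMEM : 0;
}

/* Before: the negative check is redundant, both paths return err. */
static int open_before(int fail)
{
	int err = open_stub(fail);

	if (err < 0)
		return err;
	return err;
}

/* After: return the callee's result directly. */
static int open_after(int fail)
{
	return open_stub(fail);
}
```

Both wrappers return the same value for every input, which is why the check can be dropped without changing behaviour.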
    • net: ipv6: Always leave anycast and multicast groups on link down · ea06f717
      Mike Manning authored
      Default kernel behavior is to delete IPv6 addresses on link
      down, which entails deletion of the multicast and the
      subnet-router anycast addresses. These deletions do not
      happen with sysctl setting to keep global IPv6 addresses on
      link down, so every link down/up causes an increment of the
      anycast and multicast refcounts. These bogus refcounts may
      stop these addrs from being removed on subsequent calls to
      delete them. The solution is to leave the groups for the
      multicast and subnet anycast on link down for the callflow
      when global IPv6 addresses are kept.
      
      Fixes: f1705ec1 ("net: ipv6: Make address flushing on ifdown optional")
      Signed-off-by: Mike Manning <mmanning@brocade.com>
      Acked-by: David Ahern <dsa@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
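The refcount imbalance described in the commit above can be modelled with a toy userspace sketch. Everything here is hypothetical (the `group` struct, the `keep_addr` and `fixed` flags, and the function names); it only mirrors the reasoning in the commit message, under the assumption that each link-up joins the group once and that the pre-fix code skipped the leave when addresses were kept.

```c
/* Toy model of a multicast/anycast group membership refcount. */
struct group {
	int refcnt;
};

/* Link up: the group is joined, taking one reference. */
static void link_up(struct group *g)
{
	g->refcnt++;
}

/*
 * Link down: keep_addr models the "keep addresses on link down" sysctl,
 * fixed models the behaviour after the commit (always leave the group).
 */
static void link_down(struct group *g, int keep_addr, int fixed)
{
	if ((!keep_addr || fixed) && g->refcnt > 0)
		g->refcnt--;
}

/* Flap the link `cycles` times and report the leftover references. */
static int flap(int cycles, int keep_addr, int fixed)
{
	struct group g = { 0 };
	int i;

	for (i = 0; i < cycles; i++) {
		link_up(&g);
		link_down(&g, keep_addr, fixed);
	}
	return g.refcnt;
}
```

In this model, flapping the link with addresses kept and no fix leaves one stale reference per cycle, while the fixed path returns to zero.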
    • Merge tag 'wireless-drivers-next-for-davem-2016-07-22' of... · d5b160d3
      David S. Miller authored
      Merge tag 'wireless-drivers-next-for-davem-2016-07-22' of git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers-next
      
      Kalle Valo says:
      
      ====================
      pull-request: wireless-drivers-next 2016-07-22
      
      I'm sick so I have to keep this short, but here's the last pull request
      to net-next. This time there's a trivial conflict with mtd tree:
      
      http://lkml.kernel.org/g/20160720123133.44dab209@canb.auug.org.au
      
      We concluded with Brian (CCed) that it's best that we ask Linus to fix
      this. The patches have been in linux-next for a couple of days. This
      time I haven't done any merge tests so I don't know if there are any
      other conflicts etc.
      
      Please let me know if there are any problems.
      
      wireless-drivers-next patches for 4.8
      
      Major changes:
      
      wl18xx
      
      * add initial mesh support
      
      bcma
      
      * serial flash support on non-MIPS SoCs
      
      ath10k
      
      * enable support for QCA9888
      * disable wake_tx_queue() mac80211 op for older devices to work around
        throughput regression
      
      ath9k
      
      * implement temperature compensation support for AR9003+
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • libcxgb: remove unused including <linux/version.h> · 15657841
      Wei Yongjun authored
      Remove the include of <linux/version.h> where it is not needed.
      Signed-off-by: Wei Yongjun <weiyj.lk@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • sctp: use inet_recvmsg to support sctp RFS well · fd2d180a
      Xin Long authored
      Commit 486bdee0 ("sctp: add support for RPS and RFS")
      saves skb->hash into sk->sk_rxhash so that the inet_* can
      record it to flow table.
      
      But sctp uses sock_common_recvmsg as .recvmsg instead
      of inet_recvmsg, and sock_common_recvmsg doesn't invoke
      sock_rps_record_flow to record the flow. As a result, the
      receiver may have no chance to record the flow if
      it doesn't send a message or poll the socket.
      
      So this patch fixes it by using inet_recvmsg as .recvmsg
      in sctp.
      
      Fixes: 486bdee0 ("sctp: add support for RPS and RFS")
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'macsec-icv-fixes' · 07a01697
      David S. Miller authored
      Davide Caratti says:
      
      ====================
      macsec: fix configurable ICV length
      
      This series provides a fix for macsec configurable ICV length. The
      maximum length of ICV element has been made compliant to IEEE 802.1AE,
      and error reporting in case of cipher suite configuration failure has been
      improved. Finally, a test has been added to netlink verify() callback in
      order to avoid creation of macsec interfaces having user-provided ICV length
      values that are not supported by the cipher suite.
      ====================
      Signed-off-by: Davide Caratti <dcaratti@redhat.com>
      Acked-by: Sabrina Dubroca <sd@queasysnail.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • macsec: validate ICV length on link creation · f04c392d
      Davide Caratti authored
      Test the cipher suite initialization in case ICV length has a value
      different than its default. If this test fails, creation of a new macsec
      link will also fail. This avoids situations where further security
      associations can't be added due to failures of crypto_aead_setauthsize(),
      caused by unsupported user-provided values of the ICV length.
      Signed-off-by: Davide Caratti <dcaratti@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • macsec: fix error codes when a SA is created · 34aedfee
      Davide Caratti authored
      Preserve the return value of AEAD functions that are called when a SA is
      created, to avoid inappropriate display of "RTNETLINK answers: Cannot
      allocate memory" message.
      Signed-off-by: Davide Caratti <dcaratti@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • macsec: limit ICV length to 16 octets · 2ccbe2cb
      Davide Caratti authored
      IEEE 802.1AE-2006 standard recommends that the ICV element in a MACsec
      frame should not exceed 16 octets: add MACSEC_STD_ICV_LEN in uapi
      definitions accordingly, and avoid accepting configurations where the ICV
      length exceeds the standard value. Leave definition of MACSEC_MAX_ICV_LEN
      unchanged for backwards compatibility with userspace programs.
      
      Fixes: dece8d2b ("uapi: add MACsec bits")
      Signed-off-by: Davide Caratti <dcaratti@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
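The bound introduced by the commit above can be sketched as a standalone range check. This is a userspace illustration, not the driver code: the macro names MACSEC_STD_ICV_LEN and MACSEC_MAX_ICV_LEN and the 16-octet limit come from the commit message, while the numeric values of the minimum (8) and the legacy maximum (32), the MACSEC_MIN_ICV_LEN name, and the `icv_len_acceptable()` helper are assumptions made for the example.

```c
#include <stdbool.h>

#define MACSEC_MIN_ICV_LEN 8	/* assumed lower bound */
#define MACSEC_MAX_ICV_LEN 32	/* assumed value; left unchanged for userspace */
#define MACSEC_STD_ICV_LEN 16	/* limit named in the commit message */

/* Hypothetical helper: accept only ICV lengths within the standard limit. */
static bool icv_len_acceptable(unsigned int icv_len)
{
	return icv_len >= MACSEC_MIN_ICV_LEN &&
	       icv_len <= MACSEC_STD_ICV_LEN;
}
```

Under these assumptions a length of 16 is accepted, while 17 or the legacy maximum of 32 is rejected even though MACSEC_MAX_ICV_LEN itself stays defined.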
    • bridge: Fix incorrect re-injection of LLDP packets · baedbe55
      Ido Schimmel authored
      Commit 8626c56c ("bridge: fix potential use-after-free when hook
      returns QUEUE or STOLEN verdict") caused LLDP packets arriving through a
      bridge port to be re-injected to the Rx path with skb->dev set to the
      bridge device, but this breaks the lldpad daemon.
      
      The lldpad daemon opens a packet socket with protocol set to ETH_P_LLDP
      for any valid device on the system, which does not include soft
      devices such as bridge and VLAN.
      
      Since packet sockets (ptype_base) are processed in the Rx path after the
      Rx handler, LLDP packets with skb->dev set to the bridge device never
      reach the lldpad daemon.
      
      Fix this by making the bridge's Rx handler re-inject LLDP packets with
      RX_HANDLER_PASS, which effectively restores the behaviour prior to the
      mentioned commit.
      
      This means netfilter will never receive LLDP packets coming through a
      bridge port, as I don't see a way in which we can have okfn() consume
      the packet without breaking existing behaviour. I've already carried out
      a similar fix for STP packets in commit 56fae404 ("bridge: Fix
      incorrect re-injection of STP packets").
      
      Fixes: 8626c56c ("bridge: fix potential use-after-free when hook returns QUEUE or STOLEN verdict")
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Reviewed-by: Jiri Pirko <jiri@mellanox.com>
      Cc: Florian Westphal <fw@strlen.de>
      Cc: John Fastabend <john.fastabend@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • sctp: support ipv6 nonlocal bind · 9b974202
      Xin Long authored
      This patch makes sctp support ipv6 nonlocal bind by adding
      sp->inet.freebind and net->ipv6.sysctl.ip_nonlocal_bind
      checks in sctp_v6_available, as was done to support
      ipv4 nonlocal bind (commit cdac4e07).
      Reported-by: Shijoe George <spanjikk@redhat.com>
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
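The availability check described in the commit above can be modelled roughly as follows. This is a simplified userspace sketch and not the body of sctp_v6_available: the `bind_ctx` struct and the `addr_is_local` flag are hypothetical, the latter standing in for whatever local-address lookup the kernel performs.

```c
#include <stdbool.h>

/* Hypothetical inputs to the bind decision. */
struct bind_ctx {
	bool freebind;		/* models sp->inet.freebind */
	bool ip_nonlocal_bind;	/* models net->ipv6.sysctl.ip_nonlocal_bind */
	bool addr_is_local;	/* stand-in for the local-address lookup */
};

/* A bind is allowed if either knob is set or the address is local. */
static bool v6_bind_allowed(const struct bind_ctx *c)
{
	return c->freebind || c->ip_nonlocal_bind || c->addr_is_local;
}
```

With both knobs off, only local addresses pass; setting either knob lets a nonlocal address through.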
    • Merge branch '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue · da54bb13
      David S. Miller authored
      Conflicts:
      	drivers/net/ethernet/intel/i40e/i40e_main.c
      
      Jeff Kirsher says:
      
      ====================
      40GbE Intel Wired LAN Driver Updates 2016-07-22
      
      This series contains updates to i40e and i40evf.
      
      Heinrich Schuchardt found a possible null pointer being dereferenced in
      i40e_debug_aq(), fixed the issue by doing the variable assignment after
      we are sure the pointer is not null.
      
      Avinash fixed an issue when link was down, we were not showing the
      correct advertised link modes.
      
      Mitch cleans up a useless initializer since the variable is assigned
      right away.  Refactors the receive filter handling to properly track
      filter adds and deletes so the driver will not lose filters during a
      reset and up/down cycles.  Also added a tracking mechanism so that the
      driver knows when to enter and leave promiscuous mode.
      
      Catherine removes a device id which is not needed (or used).  Moves
      a mutex lock since we need to lock the client list around the
      i40e_client_release() call to prevent the release from interrupting
      the client instances while they are being added.
      
      Joshua adds Hyper-V specific VF device ids.
      
      Amitoj Kaur Chawla cleans up a redundant memset() call before a memcpy().
      
      Stefan Assmann adds the missing link advertise for some x710 NICs.
      
      Tushar Dave fixes an issue found on SPARC, where a PF reset clears MAC
      filters and if a platform-specific MAC address is used, the driver has
      to explicitly write default MAC address to MAC filters otherwise all
      incoming traffic destined to the default MAC address will be dropped
      after reset.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bpf, events: fix offset in skb copy handler · aa7145c1
      Daniel Borkmann authored
      This patch fixes the __output_custom() routine we currently use with
      bpf_skb_copy(). I missed that when len is larger than the size of the
      current handle, we can issue multiple invocations of copy_func, and
      __output_custom() advances destination but also source buffer by the
      written amount of bytes. When we have __output_custom(), this is actually
      wrong since in that case the source buffer points to a non-linear object,
      in our case an skb, which the copy_func helper is supposed to walk.
      Therefore, since this is non-linear we thus need to pass the offset into
      the helper, so that copy_func can use it for extracting the data from
      the source object.
      
      Therefore, adjust the callback signatures properly and pass offset
      into the skb_header_pointer() invoked from bpf_skb_copy() callback. The
      __DEFINE_OUTPUT_COPY_BODY() is adjusted to accommodate for two things:
      i) to pass in whether we should advance source buffer or not; this is
      a compile-time constant condition, ii) to pass in the offset for
      __output_custom(), which we do with help of __VA_ARGS__, so everything
      can stay inlined as is currently. Both changes allow for adapting the
      __output_* fast-path helpers w/o extra overhead.
      
      Fixes: 555c8a86 ("bpf: avoid stack copy and use skb ctx for event output")
      Fixes: 7e3f977e ("perf, events: add non-linear data support for raw records")
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
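The offset problem described in the commit above can be illustrated with a userspace model. None of these names are from the kernel: `frag_obj` is a hypothetical non-linear source split into two fragments, `copy_from_obj()` plays the role of the copy callback, and `output_custom()` models an output loop that fills the destination in chunks of at most `room` bytes. The point of the sketch is that the source pointer is never advanced; the running offset is passed to the callback instead.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical non-linear source: payload split over two fragments. */
struct frag_obj {
	const char *frag[2];
	size_t len[2];
};

/* Copy callback: walks the object itself, so it needs the offset. */
static void copy_from_obj(void *dst, const void *src, size_t off, size_t len)
{
	const struct frag_obj *o = src;
	char *d = dst;
	int i;

	for (i = 0; i < 2 && len > 0; i++) {
		size_t n;

		if (off >= o->len[i]) {
			off -= o->len[i];	/* skip this fragment */
			continue;
		}
		n = o->len[i] - off;
		if (n > len)
			n = len;
		memcpy(d, o->frag[i] + off, n);
		d += n;
		len -= n;
		off = 0;
	}
}

/* Output loop: advance the destination and the offset, not the source. */
static void output_custom(char *dst, const struct frag_obj *src,
			  size_t len, size_t room)
{
	size_t off = 0;

	while (len > 0) {
		size_t n = len < room ? len : room;

		copy_from_obj(dst, src, off, n);
		dst += n;
		off += n;
		len -= n;
	}
}
```

Advancing `src` as if it were a flat buffer would hand the callback a pointer into the middle of the descriptor rather than into the payload, which is the class of bug the commit message describes.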
    • net/ncsi: avoid maybe-uninitialized warning · a1b43edd
      Arnd Bergmann authored
      gcc-4.9 and higher warn about the newly added NCSI code:
      
      net/ncsi/ncsi-manage.c: In function 'ncsi_process_next_channel':
      net/ncsi/ncsi-manage.c:1003:2: error: 'old_state' may be used uninitialized in this function [-Werror=maybe-uninitialized]
      
      The warning is a false positive and therefore harmless, but it would be good to
      avoid it anyway. I have determined that the barrier in the spin_unlock_irqsave()
      is what confuses gcc to the point that it cannot track whether the variable
      was unused or not.
      
      This rearranges the code in a way that makes it obvious to gcc that old_state
      is always initialized at the time of use, functionally this should not
      change anything.
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Acked-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'libcxgb' · 974b9963
      David S. Miller authored
      Varun Prakash says:
      
      ====================
      common library for Chelsio drivers.
      
       This patch series adds common library module(libcxgb.ko)
       for Chelsio drivers to remove duplicate code.
      
       This series moves common iSCSI DDP Page Pod manager
       code from cxgb4.ko to libcxgb.ko, earlier this code
       was used by only cxgbit.ko now it is used by
       three Chelsio iSCSI drivers cxgb3i, cxgb4i, cxgbit.
      
       In future this module will have common connection
       management and hardware specific code that can
       be shared by multiple Chelsio drivers(cxgb4,
       csiostor, iw_cxgb4, cxgb4i, cxgbit).
      
       Please review.
      
       Thanks
      
      -v3
      - removed unused module init and exit functions.
      
      -v2
      - updated CONFIG_CHELSIO_LIB to an invisible option
      - changed libcxgb.ko module license from GPL to Dual BSD/GPL
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • cxgb3i, cxgb4i: fix symbol not declared sparse warning · 4665bdd5
      Varun Prakash authored
      Fix the following sparse warnings:
      warning: symbol 'cxgb3i_ofld_init' was not declared. Should it be static?
      warning: symbol 'cxgb4i_cplhandlers' was not declared. Should it be static?
      warning: symbol 'cxgb4i_ofld_init' was not declared. Should it be static?
      Signed-off-by: Varun Prakash <varun@chelsio.com>
      Reviewed-by: Steve Wise <swise@opengridcomputing.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • libcxgb: export ppm release and tagmask set api · 9d5c44b7
      Varun Prakash authored
      Export cxgbi_ppm_release() to release
      ppod manager and cxgbi_tagmask_set() to
      set tag mask, they are used by cxgb3i, cxgb4i
      and cxgbit.
      Signed-off-by: Varun Prakash <varun@chelsio.com>
      Reviewed-by: Steve Wise <swise@opengridcomputing.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • cxgb3i: add iSCSI DDP support · b75113b1
      Varun Prakash authored
      Add iSCSI DDP support in cxgb3i driver
      using common iSCSI DDP Page Pod Manager.
      Signed-off-by: Varun Prakash <varun@chelsio.com>
      Reviewed-by: Steve Wise <swise@opengridcomputing.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • cxgb4i,libcxgbi: add iSCSI DDP support · 71f7a00b
      Varun Prakash authored
      Add iSCSI DDP support in cxgb4i driver
      using common iSCSI DDP Page Pod Manager.
      Signed-off-by: Varun Prakash <varun@chelsio.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • cxgb3i,cxgb4i,libcxgbi: remove iSCSI DDP support · 5999299f
      Varun Prakash authored
      Remove old ddp code from cxgb3i,cxgb4i,libcxgbi.
      
      The next two commits add DDP support using the
      common iSCSI DDP Page Pod Manager.
      Signed-off-by: Varun Prakash <varun@chelsio.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • libcxgb: add library module for Chelsio drivers · b8b9d81b
      Varun Prakash authored
      Add common library module(libcxgb.ko) for
      Chelsio drivers to remove duplicate code.
      
      Code for iSCSI DDP Page Pod Manager is moved
      from cxgb4.ko to libcxgb.ko. Earlier only cxgbit.ko
      was using this code, now cxgb3i and cxgb4i will
      also use common Page Pod manager code.
      
      In future this module will have common connection
      management and hardware specific code that can be
      shared by multiple Chelsio drivers.
      Signed-off-by: Varun Prakash <varun@chelsio.com>
      Reviewed-by: Steve Wise <swise@opengridcomputing.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: bridge: br_set_ageing_time takes a clock_t · 9e0b27fe
      Vivien Didelot authored
      Change the ageing_time type in br_set_ageing_time() from u32 to what it
      is expected to be, i.e. a clock_t.
      Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: bridge: fix br_stp_enable_bridge comment · dba479f3
      Vivien Didelot authored
      br_stp_enable_bridge() does take the br->lock spinlock. Fix its wrongly
      pasted comment and use the same as br_stp_disable_bridge().
      Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • cxgb4/cxgb4vf: Add link mode mask API to cxgb4 and cxgb4vf · eb97ad99
      Ganesh Goudar authored
      Based on original work by Casey Leedom <leedom@chelsio.com>
      Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/bonding: Enforce active-backup policy for IPoIB bonds · 1533e773
      Mark Bloch authored
      When using an IPoIB bond currently only active-backup mode is a valid
      use case and this commit strengthens it.
      
      Since commit 2ab82852 ("net/bonding: Enable bonding to enslave
      netdevices not supporting set_mac_address()") was introduced till
      4.7-rc1, IPoIB didn't support the set_mac_address ndo, and hence the
      fail over mac policy always applied to IPoIB bonds.
      
      With the introduction of commit 492a7e67 ("IB/IPoIB: Allow setting
      the device address"), that no longer holds and IPoIB bonds have
      effectively been broken since then. To fix it, let's go to fail over mac
      if the device doesn't support the ndo OR this is an IPoIB device.
      
      As a by-product, this commit also prevents a stack corruption which
      occurred when trying to copy 20 bytes (IPoIB) device address
      to a sockaddr struct that has only 16 bytes of storage.
      Signed-off-by: Mark Bloch <markb@mellanox.com>
      Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
      Acked-by: Andy Gospodarek <gospo@cumulusnetworks.com>
      Signed-off-by: Jay Vosburgh <jay.vosburgh@canonical.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
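The stack corruption mentioned at the end of the commit above comes from copying a 20-byte address into a struct sockaddr. The sketch below is a userspace illustration of that size mismatch using a bounds-checked copy. It is a different technique from the one the commit uses (the commit avoids the copy path by enforcing the fail over mac policy), and `copy_hw_addr()` and IPOIB_ADDR_LEN are hypothetical names chosen for the example.

```c
#include <string.h>
#include <sys/socket.h>

#define IPOIB_ADDR_LEN 20	/* IPoIB address length, per the commit text */

/* Returns 0 on success, -1 if the address would overflow sa_data. */
static int copy_hw_addr(struct sockaddr *sa, const unsigned char *addr,
			size_t len)
{
	if (len > sizeof(sa->sa_data))
		return -1;
	memcpy(sa->sa_data, addr, len);
	return 0;
}
```

A 6-byte Ethernet address fits in `sa_data`, while a 20-byte IPoIB address is rejected instead of overrunning the structure.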
    • Merge branch 'mlxsw-port-mirroring' · bc0c419e
      David S. Miller authored
      Jiri Pirko says:
      
      ====================
      mlxsw: implement port mirroring offload
      
      This patchset introduces tc matchall classifier and its offload
      to Spectrum hardware. In combination with mirred action, defined port mirroring
      setup is offloaded by mlxsw/spectrum driver.
      
      The commands used for creating mirror ports:
      
      tc qdisc  add dev eth25 handle ffff: ingress
      tc filter add dev eth25 parent ffff:            \
              matchall skip_sw                        \
              action mirred egress mirror             \
              dev eth27
      
      tc qdisc add dev eth25 handle 1: root prio
      tc filter add dev eth25 parent 1:               \
              matchall skip_sw                        \
              action mirred egress mirror             \
              dev eth27
      
      These patches contain:
       - Resource query implementation.
       - Hardware port mirroring support for spectrum.
       - Definition of the matchall traffic classifier.
       - General support for hw-offloading for that classifier.
       - Specific spectrum implementation for matchall offloading.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mlxsw: spectrum: Add support in matchall mirror TC offloading · 763b4b70
      Yotam Gigi authored
      This patch offloads port mirroring directives to hw using the matchall TC
      with action mirror. It includes both the implementation of the
      ndo_setup_tc function for the spectrum driver and the spectrum hardware
      offload configuration code.
      
      The hardware offload code is basically two new functions which are capable
      of adding and removing a new mirror ports pair. It is done using the MPAT,
      MPAR and SBIB registers:
       - A new Switch-Port Analyzer (SPAN) entry is added using MPAT to the 'to'
         port.
       - The 'to' port is bound to the SPAN entry using MPAR register.
       - In case of egress SPAN, the 'to' port gets a new internal shared
         buffer using SBIB register.
      
      In addition, a new database was added to the mlxsw_sp struct to store all
      the SPAN entries and their bound ports list. The number of supported SPAN
      entries is determined by resource query.
      Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
      Reviewed-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/sched: act_mirred: Add helper inlines to access tcf_mirred info. · 56a20680
      Yotam Gigi authored
      The helper function is_tcf_mirred_mirror helps finding whether an action
      struct is of type mirred and is configured to be of type mirror.
      Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mlxsw: reg: Add the Monitoring Port Analyzer register · 23019054
      Yotam Gigi authored
      The MPAR register is used to bind ports to a SPAN entry (which was
      created using MPAT register) and thus mirror their traffic (ingress /
      egress) to a different port.
      Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
      Reviewed-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mlxsw: reg: Add Monitoring Port Analyzer Table register · 43a46856
      Yotam Gigi authored
      The MPAT register is used to query and configure the Switch Port Analyzer
      (SPAN) table. This register is used to configure a port as a mirror output
      port, while after that a mirrored input port can be bound using MPAR
      register.
      Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
      Reviewed-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mlxsw: reg: Add Shared Buffer Internal Buffer register · 51ae8cc6
      Yotam Gigi authored
      The SBIB register configures per port buffer for internal use. This
      register is used to configure an egress mirror buffer on the egress port
      which does the mirroring.
      Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
      Reviewed-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/sched: Add match-all classifier hw offloading. · b87f7936
      Yotam Gigi authored
      Following the work that has been done on offloading classifiers like u32
      and flower, hw offloading of the match-all classifier is now possible if
      the interface supports tc offloading.
      
      To control the offloading, two tc flags have been introduced: skip_sw and
      skip_hw. Typical usage:
      
      tc filter add dev eth25 parent ffff: 	\
      	matchall skip_sw		\
      	action mirred egress mirror	\
      	dev eth27
      Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      b87f7936
    • net/sched: introduce Match-all classifier · bf3994d2
      Jiri Pirko authored
      The matchall classifier matches every packet and allows the user to apply
      actions on it. This filter is very useful in use cases where every packet
      should be matched; for example, packet mirroring (SPAN) can be set up very
      easily using that filter.
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      bf3994d2
    • mlxsw: pci: Add max span resources to resources query · ded821c8
      Nogah Frankel authored
      Add max span resources to resources query.
      Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
      Reviewed-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      ded821c8
    • mlxsw: pci: Add resources query implementation. · 57d316ba
      Nogah Frankel authored
      Add resources query implementation. If it exists, query the HW for its
      built-in resources instead of having them as constants in the code.
      Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
      Reviewed-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      57d316ba
    • cdc_ether: Improve ZTE MF823/831/910 handling · bfe9b9d2
      Kristian Evensen authored
      The firmware in several ZTE devices (at least the MF823/831/910
      modems/mifis) uses OS fingerprinting to determine which type of device to
      export. In addition, these devices export a REST API which can be used to
      control the type of device. So far, on Linux, the devices have been seen as
      RNDIS or CDC Ether.
      
      When CDC Ether is used, devices of the same type are, as with RNDIS,
      exported with the same, bogus random MAC address. In addition, the devices
      (at least on all firmware revisions I have found) use the bogus MAC when
      sending traffic routed from external networks. And as a final feature, the
      devices sometimes export the link state incorrectly. There are also
      references online to several other ZTE devices displaying this behavior,
      with several different PIDs and MAC addresses.
      
      This patch tries to improve the handling of ZTE devices by doing the
      following:
      
      * Create a new driver_info-struct that is used by ZTE devices that do not
      have an explicit entry in the product table. This struct is the same as the
      default cdc_ether driver info, but a new bind- and an rx_fixup-function
      have been added.
      
      * In the new bind function, we check if we have read a random MAC from the
      device. If we have, then we generate a new random MAC address. This will
      ensure that all devices get a unique MAC.
      
      * The rx_fixup-function replaces the destination MAC address in the skb
      with that of the device. I have not seen a revision of these devices that
      behaves correctly (i.e., sets the right destination MAC), so I chose not to
      do any comparison with for example the known, bogus addresses.
      
      * The MF823/MF831/MF910 sometimes export cdc carrier on twice on connect
      (the correct behavior is off then on). Work around this by manually setting
      carrier to off if an on-notification is received and the NOCARRIER-bit is
      not set.
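
      The three workarounds listed above can be sketched in plain userspace C.
      This is only an illustration, not the driver code: the helper names
      (mac_randomize, fixup_dest_mac, carrier_needs_reset) are hypothetical,
      and the sketch operates on a raw Ethernet frame buffer rather than on an
      skb. The random address is made unicast and locally administered, which
      is the usual convention for generated MACs and an assumption here.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define ETH_ALEN 6   /* bytes in a MAC address */
#define ETH_HLEN 14  /* destination MAC + source MAC + EtherType */

/* Hypothetical helper: fill mac with a random unicast, locally
 * administered address (multicast bit cleared, local bit set). */
static void mac_randomize(uint8_t mac[ETH_ALEN])
{
    for (int i = 0; i < ETH_ALEN; i++)
        mac[i] = (uint8_t)(rand() & 0xff);
    mac[0] &= 0xfe; /* clear the multicast bit */
    mac[0] |= 0x02; /* set the locally administered bit */
}

/* Hypothetical helper: overwrite the destination MAC of a received frame
 * with the device's own address. Returns 0 if the frame is too short to
 * hold an Ethernet header, 1 otherwise. */
static int fixup_dest_mac(uint8_t *frame, size_t len,
                          const uint8_t dev_mac[ETH_ALEN])
{
    if (len < ETH_HLEN)
        return 0;
    memcpy(frame, dev_mac, ETH_ALEN); /* destination MAC comes first */
    return 1;
}

/* Hypothetical helper: an "on" notification that arrives while the
 * no-carrier bit is already clear means the device skipped the "off"
 * step, so the carrier should be forced off before being set on. */
static int carrier_needs_reset(int notified_on, int nocarrier_bit_set)
{
    return notified_on && !nocarrier_bit_set;
}
```

      The fixup is unconditional, mirroring the choice described above not to
      compare against the known bogus addresses.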
      
      This change will affect all devices, but it should take care of similar
      mistakes made by other manufacturers. I tried to think of/look/test for
      problems/regressions that could be introduced by this behavior, but could
      not find any. However, my familiarity with this code path is not that
      great, so there could be something I have overlooked.
      
      I have tested this patch with multiple revisions of all three devices, and
      they behave as expected. In other words, they all got a valid, random MAC,
      the correct operational state and I can receive/send traffic without
      problems. I also tested with some other cdc_ether devices I have and did
      not find any problems/regressions caused by the two general changes.
      
      v3->v4:
      * Forgot to remove unused variables, sorry about that (thanks David
      Miller).
      
      v2->v3:
      * I had forgotten to remove the random MAC generation from usbnet_cdc_bind()
      (thanks Oliver).
      * Rework logic in the ZTE bind-function a bit.
      
      v1->v2:
      * Only generate random MAC for ZTE devices (thanks Oliver Neukum).
      * Set random MAC and do RX fixup for all ZTE devices that do not have a
      product-entry, as the bogus MAC has been seen on devices with several
      different PIDs/MAC addresses. In other words, it seems to be the default
      behavior of ZTE CDC Ether devices (thanks Lars Melin).
      Signed-off-by: Kristian Evensen <kristian.evensen@gmail.com>
      Acked-by: Oliver Neukum <oneukum@suse.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      bfe9b9d2
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next · c42d7121
      David S. Miller authored
      Pablo Neira Ayuso says:
      
      ====================
      Netfilter/IPVS updates for net-next
      
      The following patchset contains Netfilter/IPVS updates for net-next,
      they are:
      
      1) Count pre-established connections as active in "least connection"
         schedulers to avoid overloading backend servers on peak demands,
         from Michal Kubecek via Simon Horman.
      
      2) Address a race condition when resizing the conntrack table by caching
         the bucket size when fully iterating over the hashtable in these
         three possible scenarios: 1) dump via /proc/net/nf_conntrack,
         2) unlinking userspace helper and 3) unlinking custom conntrack timeout.
         From Liping Zhang.
      
      3) Revisit early_drop() path to perform lockless traversal on conntrack
         eviction under stress, use del_timer() as synchronization point to
         avoid two CPUs evicting the same entry, from Florian Westphal.
      
      4) Move NAT hlist_head to nf_conn object, this simplifies the existing
         NAT extension and it doesn't increase size since recent patches to
         align nf_conn, from Florian.
      
      5) Use rhashtable for the by-source NAT hashtable, also from Florian.
      
      6) Don't allow --physdev-is-out from the OUTPUT chain, just as
         --physdev-out is not allowed either, from Hangbin Liu.
      
      7) Automatically enable nf_conntrack counters if the user tries to
         match ct bytes/packets from nftables, from Liping Zhang.
      
      8) Remove possible_net_t fields in nf_tables set objects since we
         simply pass the net pointer to the backend set type implementations.
      
      9) Fix possible off-by-one in h323, from Toby DiPasquale.
      
      10) early_drop() may be called from the ctnetlink path, so we must hold
          the rcu read side lock from there too, this amends Florian's patch #3
          coming in this batch, from Liping Zhang.
      
      11) Use binary search to validate jump offset in x_tables, this
          addresses the O(n!) validation that was introduced recently to
          resolve security issues with unprivileged namespaces, from Florian.
      
      12) Fix reference leak to connlabel in error path of nft_ct, from Zhang.
      
      13) Three updates for nft_log: Fix log prefix leak in error path, bail
          out on loglevel larger than debug in nft_log, and set the new
          NF_LOG_F_COPY_LEN flag when snaplen is specified. Again from Zhang.
      
      14) Allow filtering of rule dumps in nf_tables based on table and chain
          names.
      
      15) Simplify connlabel to always use 128 bits to store labels and
          get rid of unused function in xt_connlabel, from Florian.
      
      16) Replace set_expect_timeout() by mod_timer() from the h323 conntrack
          helper, by Gao Feng.
      
      17) Put back x_tables module reference in nft_compat on error, from
          Liping Zhang.
      
      18) Add a reference count to the x_tables extensions cache in
          nft_compat, so we can remove them when unused and avoid a crash
          if the extension modules are removed with rmmod, again from Zhang.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      c42d7121
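
      Item 11 of the merge summary above mentions validating jump offsets with
      a binary search. A minimal userspace sketch of that idea follows; it is
      not the x_tables code. It assumes the start offsets of all rules have
      been collected into an array sorted in ascending order, and the names
      cmp_offset and jump_target_valid are hypothetical.

```c
#include <stddef.h>
#include <stdlib.h>

/* Comparison callback for bsearch() over unsigned int offsets. */
static int cmp_offset(const void *a, const void *b)
{
    unsigned int x = *(const unsigned int *)a;
    unsigned int y = *(const unsigned int *)b;

    return (x > y) - (x < y);
}

/* Hypothetical check: a jump is valid only if its target is exactly the
 * start offset of some rule. offsets must be sorted in ascending order;
 * each lookup then takes O(log n) comparisons. */
static int jump_target_valid(const unsigned int *offsets, size_t n,
                             unsigned int target)
{
    return bsearch(&target, offsets, n, sizeof(*offsets),
                   cmp_offset) != NULL;
}
```

      A target that falls in the middle of a rule is rejected because it does
      not appear in the array of rule start offsets.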