1. 12 Jun, 2014 11 commits
  2. 11 Jun, 2014 29 commits
    • David S. Miller
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · 902455e0
      David S. Miller authored
      Conflicts:
      	net/core/rtnetlink.c
      	net/core/skbuff.c
      
      Both conflicts were very simple overlapping changes.
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Doug Ledford
      net/core: Add VF link state control policy · c5b46160
      Doug Ledford authored
      Commit 1d8faf48 (net/core: Add VF link state control) added VF link state
      control to the netlink VF nested structure, but failed to add a proper entry
      for the new structure into the VF policy table.  Add the missing entry so
      the table and the actual data copied into the netlink nested struct are in
      sync.
      Signed-off-by: Doug Ledford <dledford@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
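      Why a missing policy entry matters: netlink attribute parsing checks each
      attribute's payload length against a per-type policy table, and a type
      with no entry gets no length check at all. A user-space sketch of that
      rule; the attribute ids, table and validate_attr() are hypothetical
      models, and only the payload layout (32-bit VF index plus 32-bit link
      state) mirrors struct ifla_vf_link_state.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical attribute ids for a VF nest (not the real IFLA_VF_* values). */
enum { VF_ATTR_UNSPEC, VF_ATTR_MAC, VF_ATTR_LINK_STATE, VF_ATTR_MAX = VF_ATTR_LINK_STATE };

/* Payload layout modelled on struct ifla_vf_link_state. */
struct vf_link_state_model {
    uint32_t vf;
    uint32_t link_state;
};

/* Minimum payload length per attribute type; 0 means "no entry, no check". */
struct policy_model {
    size_t min_len[VF_ATTR_MAX + 1];
};

/* Returns 0 if the attribute passes the policy, -1 if it is too short.
 * Out-of-range types are rejected here as a simplification. */
static int validate_attr(const struct policy_model *pol, int type, size_t payload_len)
{
    if (type <= VF_ATTR_UNSPEC || type > VF_ATTR_MAX)
        return -1;
    return payload_len >= pol->min_len[type] ? 0 : -1;
}

/* Before the fix: LINK_STATE has no entry. 36 = 32-bit VF index + 32-byte MAC buffer. */
static const struct policy_model policy_before = {
    .min_len = { [VF_ATTR_MAC] = 36 },
};

/* After the fix: LINK_STATE gets a length requirement too. */
static const struct policy_model policy_after = {
    .min_len = {
        [VF_ATTR_MAC] = 36,
        [VF_ATTR_LINK_STATE] = sizeof(struct vf_link_state_model),
    },
};
```

      With the "before" table a 2-byte LINK_STATE attribute is accepted, so
      code that then reads a full link-state struct from it would read past
      the payload.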
    • Andy Fleming
      39f33367
    • Shruti Kanetkar
      net/fsl: Make xgmac_mdio read error message useful · 55fd3641
      Shruti Kanetkar authored
      Print the device address, the register number and the PHY ID for
      which the MDIO read operation failed.
      Signed-off-by: Shruti Kanetkar <Shruti@Freescale.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Florian Westphal
      net_sched: drr: warn when qdisc is not work conserving · 6e765a00
      Florian Westphal authored
      The DRR scheduler requires that items on the active list are work
      conserving, i.e. do not hold on to skbs for throttling purposes, etc.
      Attaching e.g. tbf renders DRR useless because all other classes on the
      active list are delayed as well.
      
      So, warn users that this configuration won't work as expected; we
      already do this in a couple of other qdiscs, see e.g.
      
      commit b00355db
      ('pkt_sched: sch_hfsc: sch_htb: Add non-work-conserving warning handler')
      
      The 'const' change is needed to avoid compiler warning ("discards 'const'
      qualifier from pointer target type").
      
      tested with:
      drr_hier() {
              parent=$1
              classes=$2
              for i in $(seq 1 $classes); do
                      classid=$parent$(printf %x $i)
                      tc class add dev eth0 parent $parent classid $classid drr
                      tc qdisc add dev eth0 parent $classid tbf rate 64kbit burst 256kbit limit 64kbit
              done
      }
      tc qdisc add dev eth0 root handle 1: drr
      drr_hier 1: 32
      tc filter add dev eth0 protocol all pref 1 parent 1: handle 1 flow hash keys dst perturb 1 divisor 32
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
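      Why one throttled class delays everyone: DRR always serves the head of
      its active list, and if that class's child qdisc has a backlog but
      refuses to hand out a packet (as a shaper like tbf does), dequeue gives
      up instead of moving on. A toy single-threaded model of the dequeue
      loop; it is not sch_drr.c, and all names are hypothetical. The
      peek-returns-nothing branch is roughly where the patch adds its warning.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_PKTS 8
#define MAX_CLASSES 4

/* One DRR class: a FIFO of packet lengths plus its quantum and deficit. */
struct drr_class {
    unsigned pkts[MAX_PKTS];
    size_t head, count;
    unsigned quantum, deficit;
    bool throttled;   /* models a non-work-conserving child (e.g. tbf) */
};

/* The active list, kept as an array in round-robin order. */
struct drr_sched {
    struct drr_class *active[MAX_CLASSES];
    size_t n;
};

/* A throttled child hides its backlog from peek, like a shaper would. */
static const unsigned *child_peek(const struct drr_class *cl)
{
    if (cl->count == 0 || cl->throttled)
        return NULL;
    return &cl->pkts[cl->head];
}

static void drr_enqueue(struct drr_sched *q, struct drr_class *cl, unsigned len)
{
    assert(cl->count < MAX_PKTS);
    if (cl->count == 0) {            /* class becomes active */
        assert(q->n < MAX_CLASSES);
        q->active[q->n++] = cl;
        cl->deficit = cl->quantum;
    }
    cl->pkts[(cl->head + cl->count) % MAX_PKTS] = len;
    cl->count++;
}

/* Remove the first active entry, optionally re-appending it at the tail. */
static void pop_head(struct drr_sched *q, bool requeue)
{
    struct drr_class *first = q->active[0];

    for (size_t i = 1; i < q->n; i++)
        q->active[i - 1] = q->active[i];
    if (requeue)
        q->active[q->n - 1] = first;
    else
        q->n--;
}

/* Returns the length of the dequeued packet, or 0 if nothing was sent. */
static unsigned drr_dequeue(struct drr_sched *q)
{
    while (q->n > 0) {
        struct drr_class *cl = q->active[0];
        const unsigned *len = child_peek(cl);

        if (len == NULL)
            return 0;   /* head class stalls every class behind it */
        if (*len <= cl->deficit) {
            unsigned out = *len;

            cl->deficit -= out;
            cl->head = (cl->head + 1) % MAX_PKTS;
            if (--cl->count == 0)
                pop_head(q, false);
            return out;
        }
        cl->deficit += cl->quantum;   /* not enough credit: next round */
        pop_head(q, true);
    }
    return 0;
}
```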
    • David S. Miller
      Merge branch 'inet_csums' · f3591fd4
      David S. Miller authored
      Tom Herbert says:
      
      ====================
      net: Checksum offload changes - Part IV
      
      I am working on overhauling RX checksum offload. Goals of this effort
      are:
      
      - Specify what exactly it means when a driver returns CHECKSUM_UNNECESSARY
      - Preserve CHECKSUM_COMPLETE through encapsulation layers
      - Don't do skb_checksum more than once per packet
      - Unify GRO and non-GRO csum verification as much as possible
      - Unify the checksum functions (checksum_init)
      - Simplify code
      
      What is in this fourth patch set:
      
      - Preserve CHECKSUM_COMPLETE instead of changing it to
        CHECKSUM_UNNECESSARY. This allows correct reuse in validating multiple
        csums in a packet.
      - When SW needs to compute the packet checksum, save it as
        CHECKSUM_COMPLETE. Also mark that the checksum was computed by SW.
      - Add skb_gro_postpull_rcsum to udp and vxlan to make GRO work with
        CHECKSUM_COMPLETE.
      
      v2: Removed patch setting skb_encapsulation when validating checksum
          in tcp_gro_receive
      
      Please review carefully and test if possible, mucking with basic
      checksum functions is always a little precarious :-)
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Tom Herbert
      net: Add skb_gro_postpull_rcsum to udp and vxlan · 6bae1d4c
      Tom Herbert authored
      skb_gro_postpull_rcsum is needed for GRO to work with CHECKSUM_COMPLETE.
      Signed-off-by: Tom Herbert <therbert@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
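      What a "postpull" checksum adjustment does: with CHECKSUM_COMPLETE the
      device supplies a ones' complement sum over the packet, so whenever a
      layer pulls its header off, that header's contribution has to be
      subtracted for the stored sum to cover exactly the remaining bytes. A
      user-space sketch with folded 16-bit sums; the kernel works on an
      unfolded 32-bit __wsum via csum_partial()/csum_sub(), and the helper
      names below are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Folded 16-bit ones' complement sum of big-endian 16-bit words (RFC 1071). */
static uint16_t ocsum(const uint8_t *p, size_t len)
{
    uint32_t sum = 0;

    while (len > 1) {
        sum += (uint32_t)p[0] << 8 | p[1];
        p += 2;
        len -= 2;
    }
    if (len)                      /* odd trailing byte, padded with zero */
        sum += (uint32_t)p[0] << 8;
    while (sum >> 16)             /* end-around carry */
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)sum;
}

/* Ones' complement subtraction: a - b == a + ~b. */
static uint16_t ocsum_sub(uint16_t a, uint16_t b)
{
    uint32_t s = (uint32_t)a + (uint16_t)~b;

    return (uint16_t)((s & 0xffff) + (s >> 16));
}

/* After pulling hdr_len bytes (an even count) off the front of a packet whose
 * complete sum is csum, return the complete sum of what remains. */
static uint16_t postpull_csum(uint16_t csum, const uint8_t *pulled, size_t hdr_len)
{
    assert(hdr_len % 2 == 0);
    return ocsum_sub(csum, ocsum(pulled, hdr_len));
}
```

      This works because ones' complement addition is associative and
      commutative; the even-length requirement keeps the remaining bytes in
      the same high/low positions within each 16-bit word.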
    • Tom Herbert
      net: Save software checksum complete · 7e3cead5
      Tom Herbert authored
      In skb_checksum_complete, if we need to compute the checksum for the
      packet (via skb_checksum), save the result as CHECKSUM_COMPLETE.
      Subsequent checksum verification can use this.
      
      Also, add a csum_complete_sw flag to distinguish between software- and
      hardware-generated CHECKSUM_COMPLETE; we should always be able to trust
      the software computation.
      Signed-off-by: Tom Herbert <therbert@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Tom Herbert
      net: Preserve CHECKSUM_COMPLETE at validation · 5d0c2b95
      Tom Herbert authored
      Currently when the first checksum in a packet is validated using
      CHECKSUM_COMPLETE, ip_summed is overwritten to be CHECKSUM_UNNECESSARY
      so that any subsequent checksums in the packet are not correctly
      validated.
      
      This patch adds a csum_valid flag in sk_buff and uses it to indicate
      validated checksum instead of setting CHECKSUM_UNNECESSARY. The bit
      is set accordingly in the skb_checksum_validate_* functions. The flag
      is checked in skb_checksum_complete, so that validation is communicated
      between checksum_init and checksum_complete sequence in TCP and UDP.
      Signed-off-by: Tom Herbert <therbert@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
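      The two commits above change what is recorded once a checksum is
      handled: a software-computed sum is kept as CHECKSUM_COMPLETE (flagged
      csum_complete_sw), and a successful validation sets csum_valid instead
      of rewriting ip_summed to CHECKSUM_UNNECESSARY. A toy model that folds
      both behaviours into one function; the struct, enum and function names
      are hypothetical, not the sk_buff fields or skb_checksum_validate_*
      helpers, and the check (sum plus pseudo-header folds to 0xffff) is the
      standard Internet-checksum rule.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel's ip_summed values. */
enum summed_model { SUMMED_NONE, SUMMED_UNNECESSARY, SUMMED_COMPLETE };

struct pkt_model {
    const uint8_t *data;
    size_t len;
    enum summed_model ip_summed;
    uint16_t csum;          /* folded ones' complement sum of data[0..len) */
    bool csum_valid;        /* a checksum has been verified */
    bool csum_complete_sw;  /* csum was computed in software */
};

static uint16_t fold_add(uint16_t a, uint16_t b)
{
    uint32_t s = (uint32_t)a + b;

    return (uint16_t)((s & 0xffff) + (s >> 16));
}

static uint16_t ocsum(const uint8_t *p, size_t len)
{
    uint16_t sum = 0;

    for (size_t i = 0; i + 1 < len; i += 2)
        sum = fold_add(sum, (uint16_t)(p[i] << 8 | p[i + 1]));
    if (len & 1)
        sum = fold_add(sum, (uint16_t)(p[len - 1] << 8));
    return sum;
}

/* Validate against a pseudo-header sum without downgrading CHECKSUM_COMPLETE. */
static bool checksum_validate_model(struct pkt_model *pkt, uint16_t pseudo)
{
    if (pkt->ip_summed == SUMMED_UNNECESSARY)
        return true;
    if (pkt->ip_summed == SUMMED_NONE) {
        /* Compute once in software and keep it, marked as software-made. */
        pkt->csum = ocsum(pkt->data, pkt->len);
        pkt->ip_summed = SUMMED_COMPLETE;
        pkt->csum_complete_sw = true;
    }
    pkt->csum_valid = fold_add(pkt->csum, pseudo) == 0xffff;
    /* ip_summed stays SUMMED_COMPLETE so an inner checksum can reuse csum. */
    return pkt->csum_valid;
}
```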
    • David S. Miller
      Merge branch 'qlcnic-next' · 1054cc15
      David S. Miller authored
      Shahed Shaikh says:
      
      ====================
      This series contains an enhancement in the area of firmware minidump collection
      and an optimization of the ring count validation function.
      
      Please apply this series to net-next.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Shahed Shaikh
      038782d6
    • Shahed Shaikh
      qlcnic: Optimize ring count validations · 18e0d625
      Shahed Shaikh authored
      - Check interrupt mode at the start of qlcnic_set_channels().
      - Do not validate ring counts if they are not going to change.
      Signed-off-by: Shahed Shaikh <shahed.shaikh@qlogic.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Shahed Shaikh
      qlcnic: Pre-allocate DMA buffer used for minidump collection · 4da005cf
      Shahed Shaikh authored
      Pre-allocate the physically contiguous DMA buffer used for
      minidump collection at driver load time, rather than at
      run time, to minimize allocation failures. Driver will allocate
      the buffer at load time if PEX DMA support capability is indicated
      by the adapter.
      Signed-off-by: Shahed Shaikh <shahed.shaikh@qlogic.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Dmitry Popov
      ip_vti: fix sparse warnings for VTI_ISVTI · efd0f11d
      Dmitry Popov authored
      This patch fixes the following sparse warnings:
      
      net/ipv4/ip_tunnel.c:245:53: warning: restricted __be16 degrades to integer
      net/ipv4/ip_vti.c:321:19: warning: incorrect type in assignment (different base types)
      net/ipv4/ip_vti.c:321:19:    expected restricted __be16 [addressable] [assigned] [usertype] i_flags
      net/ipv4/ip_vti.c:321:19:    got int
      net/ipv4/ip_vti.c:447:24: warning: incorrect type in assignment (different base types)
      net/ipv4/ip_vti.c:447:24:    expected restricted __be16 [usertype] i_flags
      net/ipv4/ip_vti.c:447:24:    got int
      
      Since VTI_ISVTI is always used with ip_tunnel_parm->i_flags (which is __be16),
      we can __force cast VTI_ISVTI to __be16 in header file.
      Signed-off-by: Dmitry Popov <ixaphire@qrator.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
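      The fix amounts to defining the flag with a forced cast in the header.
      A self-contained sketch of that pattern: the MODEL_* macros and
      be16_model are hypothetical stand-ins for the kernel's __bitwise,
      __force and __be16 so it compiles outside the kernel, and 0x0001 is an
      illustrative value. Under a normal compiler the annotations vanish;
      only sparse (which defines __CHECKER__) sees them.

```c
#include <assert.h>
#include <stdint.h>

/* Sparse defines __CHECKER__; a normal compiler sees empty annotations,
 * the same arrangement the kernel uses for __bitwise and __force. */
#ifdef __CHECKER__
#define MODEL_BITWISE __attribute__((bitwise))
#define MODEL_FORCE   __attribute__((force))
#else
#define MODEL_BITWISE
#define MODEL_FORCE
#endif

typedef uint16_t MODEL_BITWISE be16_model;   /* stand-in for __be16 */

/* Without MODEL_FORCE, sparse flags assigning the plain int 0x0001 to a
 * be16_model field; the forced cast marks the conversion as intentional. */
#define VTI_ISVTI_MODEL ((MODEL_FORCE be16_model)0x0001)

struct tunnel_parm_model {
    be16_model i_flags;   /* like ip_tunnel_parm->i_flags */
};
```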
    • Dan Carpenter
      drivers: net: davinci_cpdma: double free on error · 2f87208e
      Dan Carpenter authored
      We recently changed the kzalloc() to devm_kzalloc(), so freeing "ctlr"
      here could lead to a double free.
      
      Fixes: e1943128 ('drivers: net: davinci_cpdma: Convert kzalloc() to devm_kzalloc().')
      Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
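      Memory from devm_kzalloc() is owned by the device and released
      automatically when probe fails or the driver detaches, so an explicit
      kfree() of the same pointer on an error path frees it a second time. A
      toy user-space model of that ownership; the names are hypothetical and
      this is not the devres implementation. The error path simply returns
      and lets the device release what it owns.

```c
#include <assert.h>
#include <stdlib.h>

#define MAX_RES 8

/* Toy model of device-managed (devm_*) memory: every allocation is
 * recorded on the device and released when the device is torn down. */
struct device_model {
    void *res[MAX_RES];
    int nres;
};

static void *devm_kzalloc_model(struct device_model *dev, size_t size)
{
    void *p;

    if (dev->nres == MAX_RES)
        return NULL;
    p = calloc(1, size);
    if (p)
        dev->res[dev->nres++] = p;
    return p;
}

/* Probe failure or driver detach: free everything still recorded.
 * Returns how many allocations were released. */
static int devres_release_all_model(struct device_model *dev)
{
    int freed = dev->nres;

    while (dev->nres > 0)
        free(dev->res[--dev->nres]);
    return freed;
}
```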
    • Dan Carpenter
      amd-xgbe: unwind on error in xgbe_mdio_register() · 8fc908c3
      Dan Carpenter authored
      There is a typo here so we return directly instead of unwinding.
      Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
      Acked-by: Tom Lendacky <thomas.lendacky@amd.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
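      The bug class here is a direct return where a jump to the unwind label
      was intended. A sketch of the usual goto-based unwinding pattern in
      kernel C; it is not the xgbe code, and all helpers are hypothetical.
      Each failure jumps to the label that undoes exactly the steps that
      already succeeded, in reverse order.

```c
#include <assert.h>
#include <stdlib.h>

static int undo_count;   /* counts cleanup calls, for the example only */

/* Hypothetical resource helpers; fail simulates an allocation failure. */
static void *get_res(int fail)
{
    return fail ? NULL : malloc(1);
}

static void put_res(void *p)
{
    free(p);
    undo_count++;
}

/* Acquire two resources, then "register"; fail_at selects the failing step. */
static int register_model(int fail_at, void **a_out, void **b_out)
{
    void *a, *b;
    int ret = -1;

    a = get_res(fail_at == 1);
    if (!a)
        return ret;             /* nothing to unwind yet */

    b = get_res(fail_at == 2);
    if (!b)
        goto err_put_a;         /* a plain "return ret" here would leak a */

    if (fail_at == 3)
        goto err_put_b;         /* e.g. the final registration failed */

    *a_out = a;
    *b_out = b;
    return 0;

err_put_b:
    put_res(b);
err_put_a:
    put_res(a);
    return ret;
}
```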
    • Varka Bhadram
      mrf24j40: add device managed APIs · 0aaf43f5
      Varka Bhadram authored
      Use the device managed APIs so that there is no need to worry about
      freeing the resources.
      Signed-off-by: Varka Bhadram <varkab@cdac.in>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • stephen hemminger
      ceph: remove bogus extern · f6479449
      stephen hemminger authored
      Sparse complained about this bogus extern on definition of
      a function.
      Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Alexei Starovoitov
      net: filter: document internal instruction encoding · 783e327b
      Alexei Starovoitov authored
      This patch adds a description of eBPF's instruction encoding in order
      to bring the documentation in line with the implementation.
      Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
      Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
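      A sketch of the instruction format being documented: an 8-byte
      instruction with an 8-bit opcode, two 4-bit register fields, a signed
      16-bit offset and a signed 32-bit immediate, where for arithmetic and
      jump instructions the opcode byte splits into op (4 bits), source
      (1 bit) and class (3 bits). The struct and macro names below are
      hypothetical; the field widths and constant values follow the eBPF
      encoding as later exposed in struct bpf_insn.

```c
#include <assert.h>
#include <stdint.h>

/* Layout sketch of one eBPF instruction (uint8_t bit-fields are a GCC/Clang extension). */
struct ebpf_insn_model {
    uint8_t code;         /* opcode */
    uint8_t dst_reg : 4;  /* destination register */
    uint8_t src_reg : 4;  /* source register */
    int16_t off;          /* signed offset */
    int32_t imm;          /* signed immediate */
};

/* ALU/JMP opcode byte: op:4 | source:1 | class:3 (high to low bits). */
#define INSN_CLASS(code) ((code) & 0x07)
#define INSN_SRC(code)   ((code) & 0x08)   /* 0x00: use imm (K), 0x08: use src_reg (X) */
#define INSN_OP(code)    ((code) & 0xf0)

enum { CLASS_ALU = 0x04, CLASS_JMP = 0x05, CLASS_ALU64 = 0x07 };
enum { OP_ADD = 0x00, OP_MOV = 0xb0 };
enum { SRC_K = 0x00, SRC_X = 0x08 };
```

      For example, "r0 = 1" as a 64-bit move of an immediate encodes as
      OP_MOV | SRC_K | CLASS_ALU64, i.e. 0xb7.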
    • Alexei Starovoitov
      net: filter: mention eBPF terminology as well · e4ad4032
      Alexei Starovoitov authored
      Since the term eBPF is used anyway in mailing list discussions, let's
      also document that in the main BPF documentation file and replace a
      couple of occurrences with eBPF terminology to be more clear.
      Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
      Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Eric Dumazet
      ipv4: fix a race in ip4_datagram_release_cb() · 9709674e
      Eric Dumazet authored
      Alexey gave an AddressSanitizer [1] report that finally gave a good hint
      about the origin of various problems already reported by Dormando in
      the past [2].
      
      The problem comes from the fact that UDP can have a lockless TX path:
      concurrent threads can manipulate sk_dst_cache while another thread is
      holding the socket lock and calls __sk_dst_set() in
      ip4_datagram_release_cb() (this was added in linux-3.8).
      
      It seems that all we need to do is to use sk_dst_check() and
      sk_dst_set() so that all the writers hold same spinlock
      (sk->sk_dst_lock) to prevent corruptions.
      
      The TCP stack does not need this protection, as all sk_dst_cache writers
      hold the socket lock.
      
      [1]
      https://code.google.com/p/address-sanitizer/wiki/AddressSanitizerForKernel
      
      AddressSanitizer: heap-use-after-free in ipv4_dst_check
      Read of size 2 by thread T15453:
       [<ffffffff817daa3a>] ipv4_dst_check+0x1a/0x90 ./net/ipv4/route.c:1116
       [<ffffffff8175b789>] __sk_dst_check+0x89/0xe0 ./net/core/sock.c:531
       [<ffffffff81830a36>] ip4_datagram_release_cb+0x46/0x390 ??:0
       [<ffffffff8175eaea>] release_sock+0x17a/0x230 ./net/core/sock.c:2413
       [<ffffffff81830882>] ip4_datagram_connect+0x462/0x5d0 ??:0
       [<ffffffff81846d06>] inet_dgram_connect+0x76/0xd0 ./net/ipv4/af_inet.c:534
       [<ffffffff817580ac>] SYSC_connect+0x15c/0x1c0 ./net/socket.c:1701
       [<ffffffff817596ce>] SyS_connect+0xe/0x10 ./net/socket.c:1682
       [<ffffffff818b0a29>] system_call_fastpath+0x16/0x1b
      ./arch/x86/kernel/entry_64.S:629
      
      Freed by thread T15455:
       [<ffffffff8178d9b8>] dst_destroy+0xa8/0x160 ./net/core/dst.c:251
       [<ffffffff8178de25>] dst_release+0x45/0x80 ./net/core/dst.c:280
       [<ffffffff818304c1>] ip4_datagram_connect+0xa1/0x5d0 ??:0
       [<ffffffff81846d06>] inet_dgram_connect+0x76/0xd0 ./net/ipv4/af_inet.c:534
       [<ffffffff817580ac>] SYSC_connect+0x15c/0x1c0 ./net/socket.c:1701
       [<ffffffff817596ce>] SyS_connect+0xe/0x10 ./net/socket.c:1682
       [<ffffffff818b0a29>] system_call_fastpath+0x16/0x1b
      ./arch/x86/kernel/entry_64.S:629
      
      Allocated by thread T15453:
       [<ffffffff8178d291>] dst_alloc+0x81/0x2b0 ./net/core/dst.c:171
       [<ffffffff817db3b7>] rt_dst_alloc+0x47/0x50 ./net/ipv4/route.c:1406
       [<     inlined    >] __ip_route_output_key+0x3e8/0xf70
      __mkroute_output ./net/ipv4/route.c:1939
       [<ffffffff817dde08>] __ip_route_output_key+0x3e8/0xf70 ./net/ipv4/route.c:2161
       [<ffffffff817deb34>] ip_route_output_flow+0x14/0x30 ./net/ipv4/route.c:2249
       [<ffffffff81830737>] ip4_datagram_connect+0x317/0x5d0 ??:0
       [<ffffffff81846d06>] inet_dgram_connect+0x76/0xd0 ./net/ipv4/af_inet.c:534
       [<ffffffff817580ac>] SYSC_connect+0x15c/0x1c0 ./net/socket.c:1701
       [<ffffffff817596ce>] SyS_connect+0xe/0x10 ./net/socket.c:1682
       [<ffffffff818b0a29>] system_call_fastpath+0x16/0x1b
      ./arch/x86/kernel/entry_64.S:629
      
      [2]
      <4>[196727.311203] general protection fault: 0000 [#1] SMP
      <4>[196727.311224] Modules linked in: xt_TEE xt_dscp xt_DSCP macvlan bridge coretemp crc32_pclmul ghash_clmulni_intel gpio_ich microcode ipmi_watchdog ipmi_devintf sb_edac edac_core lpc_ich mfd_core tpm_tis tpm tpm_bios ipmi_si ipmi_msghandler isci igb libsas i2c_algo_bit ixgbe ptp pps_core mdio
      <4>[196727.311333] CPU: 17 PID: 0 Comm: swapper/17 Not tainted 3.10.26 #1
      <4>[196727.311344] Hardware name: Supermicro X9DRi-LN4+/X9DR3-LN4+/X9DRi-LN4+/X9DR3-LN4+, BIOS 3.0 07/05/2013
      <4>[196727.311364] task: ffff885e6f069700 ti: ffff885e6f072000 task.ti: ffff885e6f072000
      <4>[196727.311377] RIP: 0010:[<ffffffff815f8c7f>]  [<ffffffff815f8c7f>] ipv4_dst_destroy+0x4f/0x80
      <4>[196727.311399] RSP: 0018:ffff885effd23a70  EFLAGS: 00010282
      <4>[196727.311409] RAX: dead000000200200 RBX: ffff8854c398ecc0 RCX: 0000000000000040
      <4>[196727.311423] RDX: dead000000100100 RSI: dead000000100100 RDI: dead000000200200
      <4>[196727.311437] RBP: ffff885effd23a80 R08: ffffffff815fd9e0 R09: ffff885d5a590800
      <4>[196727.311451] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
      <4>[196727.311464] R13: ffffffff81c8c280 R14: 0000000000000000 R15: ffff880e85ee16ce
      <4>[196727.311510] FS:  0000000000000000(0000) GS:ffff885effd20000(0000) knlGS:0000000000000000
      <4>[196727.311554] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      <4>[196727.311581] CR2: 00007a46751eb000 CR3: 0000005e65688000 CR4: 00000000000407e0
      <4>[196727.311625] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      <4>[196727.311669] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
      <4>[196727.311713] Stack:
      <4>[196727.311733]  ffff8854c398ecc0 ffff8854c398ecc0 ffff885effd23ab0 ffffffff815b7f42
      <4>[196727.311784]  ffff88be6595bc00 ffff8854c398ecc0 0000000000000000 ffff8854c398ecc0
      <4>[196727.311834]  ffff885effd23ad0 ffffffff815b86c6 ffff885d5a590800 ffff8816827821c0
      <4>[196727.311885] Call Trace:
      <4>[196727.311907]  <IRQ>
      <4>[196727.311912]  [<ffffffff815b7f42>] dst_destroy+0x32/0xe0
      <4>[196727.311959]  [<ffffffff815b86c6>] dst_release+0x56/0x80
      <4>[196727.311986]  [<ffffffff81620bd5>] tcp_v4_do_rcv+0x2a5/0x4a0
      <4>[196727.312013]  [<ffffffff81622b5a>] tcp_v4_rcv+0x7da/0x820
      <4>[196727.312041]  [<ffffffff815fd9e0>] ? ip_rcv_finish+0x360/0x360
      <4>[196727.312070]  [<ffffffff815de02d>] ? nf_hook_slow+0x7d/0x150
      <4>[196727.312097]  [<ffffffff815fd9e0>] ? ip_rcv_finish+0x360/0x360
      <4>[196727.312125]  [<ffffffff815fda92>] ip_local_deliver_finish+0xb2/0x230
      <4>[196727.312154]  [<ffffffff815fdd9a>] ip_local_deliver+0x4a/0x90
      <4>[196727.312183]  [<ffffffff815fd799>] ip_rcv_finish+0x119/0x360
      <4>[196727.312212]  [<ffffffff815fe00b>] ip_rcv+0x22b/0x340
      <4>[196727.312242]  [<ffffffffa0339680>] ? macvlan_broadcast+0x160/0x160 [macvlan]
      <4>[196727.312275]  [<ffffffff815b0c62>] __netif_receive_skb_core+0x512/0x640
      <4>[196727.312308]  [<ffffffff811427fb>] ? kmem_cache_alloc+0x13b/0x150
      <4>[196727.312338]  [<ffffffff815b0db1>] __netif_receive_skb+0x21/0x70
      <4>[196727.312368]  [<ffffffff815b0fa1>] netif_receive_skb+0x31/0xa0
      <4>[196727.312397]  [<ffffffff815b1ae8>] napi_gro_receive+0xe8/0x140
      <4>[196727.312433]  [<ffffffffa00274f1>] ixgbe_poll+0x551/0x11f0 [ixgbe]
      <4>[196727.312463]  [<ffffffff815fe00b>] ? ip_rcv+0x22b/0x340
      <4>[196727.312491]  [<ffffffff815b1691>] net_rx_action+0x111/0x210
      <4>[196727.312521]  [<ffffffff815b0db1>] ? __netif_receive_skb+0x21/0x70
      <4>[196727.312552]  [<ffffffff810519d0>] __do_softirq+0xd0/0x270
      <4>[196727.312583]  [<ffffffff816cef3c>] call_softirq+0x1c/0x30
      <4>[196727.312613]  [<ffffffff81004205>] do_softirq+0x55/0x90
      <4>[196727.312640]  [<ffffffff81051c85>] irq_exit+0x55/0x60
      <4>[196727.312668]  [<ffffffff816cf5c3>] do_IRQ+0x63/0xe0
      <4>[196727.312696]  [<ffffffff816c5aaa>] common_interrupt+0x6a/0x6a
      <4>[196727.312722]  <EOI>
      <1>[196727.313071] RIP  [<ffffffff815f8c7f>] ipv4_dst_destroy+0x4f/0x80
      <4>[196727.313100]  RSP <ffff885effd23a70>
      <4>[196727.313377] ---[ end trace 64b3f14fae0f2e29 ]---
      <0>[196727.380908] Kernel panic - not syncing: Fatal exception in interrupt
      Reported-by: Alexey Preobrazhensky <preobr@google.com>
      Reported-by: dormando <dormando@rydia.ne>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Fixes: 8141ed9f ("ipv4: Add a socket release callback for datagram sockets")
      Cc: Steffen Klassert <steffen.klassert@secunet.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
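      The fix makes every writer of sk_dst_cache go through helpers that take
      the same lock (sk->sk_dst_lock), so two writers can never both drop the
      reference held by the cache. A user-space sketch of that discipline,
      using a C11-atomics spinlock and reference counts; all types and
      functions are hypothetical models of sk_dst_set()/sk_dst_check(), the
      obsolete check is simplified, and the single-threaded test only
      exercises the reference bookkeeping, not the race itself.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

static int dst_frees;   /* counts frees, for the example only */

struct dst_model {
    atomic_int refcnt;
    int obsolete;
};

/* Socket model: dst_lock plays the role of sk->sk_dst_lock. */
struct sock_model {
    atomic_int dst_lock;
    struct dst_model *dst_cache;
};

static void sk_lock(struct sock_model *sk)
{
    while (atomic_exchange_explicit(&sk->dst_lock, 1, memory_order_acquire))
        ;   /* spin */
}

static void sk_unlock(struct sock_model *sk)
{
    atomic_store_explicit(&sk->dst_lock, 0, memory_order_release);
}

static void dst_release_model(struct dst_model *dst)
{
    if (dst && atomic_fetch_sub(&dst->refcnt, 1) == 1) {
        free(dst);
        dst_frees++;
    }
}

/* Swap the cached pointer under the lock; drop the old reference after. */
static void sk_dst_set_model(struct sock_model *sk, struct dst_model *dst)
{
    struct dst_model *old;

    sk_lock(sk);
    old = sk->dst_cache;
    sk->dst_cache = dst;
    sk_unlock(sk);
    dst_release_model(old);
}

/* Return a referenced dst, or NULL after evicting an obsolete one. */
static struct dst_model *sk_dst_check_model(struct sock_model *sk)
{
    struct dst_model *dst;

    sk_lock(sk);
    dst = sk->dst_cache;
    if (dst && dst->obsolete) {
        sk->dst_cache = NULL;
        sk_unlock(sk);
        dst_release_model(dst);
        return NULL;
    }
    if (dst)
        atomic_fetch_add(&dst->refcnt, 1);
    sk_unlock(sk);
    return dst;
}

static struct dst_model *dst_alloc_model(void)
{
    struct dst_model *dst = malloc(sizeof(*dst));

    if (dst) {
        atomic_init(&dst->refcnt, 1);   /* the caller's reference */
        dst->obsolete = 0;
    }
    return dst;
}

static void sock_model_init(struct sock_model *sk)
{
    atomic_init(&sk->dst_lock, 0);
    sk->dst_cache = NULL;
}
```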
    • Daniel Borkmann
      net: filter: add test_bpf module under MAINTAINERS' networking section · a101ccd1
      Daniel Borkmann authored
      Add a lib/test_bpf.c entry to the MAINTAINERS file under networking.
      All changes were posted via netdev for review, so make sure
      other people Cc it as well when they call get_maintainer.pl.
      Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
      Cc: Alexei Starovoitov <ast@plumgrid.com>
      Acked-by: Alexei Starovoitov <ast@plumgrid.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Octavian Purdila
      net: add __pskb_copy_fclone and pskb_copy_for_clone · bad93e9d
      Octavian Purdila authored
      There are several instances where a pskb_copy or __pskb_copy is
      immediately followed by an skb_clone.
      
      Add a couple of new functions to allow the copy skb to be allocated
      from the fclone cache and thus speed up subsequent skb_clone calls.
      
      Cc: Alexander Smirnov <alex.bluesman.smirnov@gmail.com>
      Cc: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com>
      Cc: Marek Lindner <mareklindner@neomailbox.ch>
      Cc: Simon Wunderlich <sw@simonwunderlich.de>
      Cc: Antonio Quartulli <antonio@meshcoding.com>
      Cc: Marcel Holtmann <marcel@holtmann.org>
      Cc: Gustavo Padovan <gustavo@padovan.org>
      Cc: Johan Hedberg <johan.hedberg@gmail.com>
      Cc: Arvid Brodin <arvid.brodin@alten.se>
      Cc: Patrick McHardy <kaber@trash.net>
      Cc: Pablo Neira Ayuso <pablo@netfilter.org>
      Cc: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
      Cc: Lauro Ramos Venancio <lauro.venancio@openbossa.org>
      Cc: Aloisio Almeida Jr <aloisio.almeida@openbossa.org>
      Cc: Samuel Ortiz <sameo@linux.intel.com>
      Cc: Jon Maloy <jon.maloy@ericsson.com>
      Cc: Allan Stephens <allan.stephens@windriver.com>
      Cc: Andrew Hendry <andrew.hendry@gmail.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Reviewed-by: Christoph Paasch <christoph.paasch@uclouvain.be>
      Signed-off-by: Octavian Purdila <octavian.purdila@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Jon Cooper
      sfc: PIO:Restrict to 64bit arch and use 64-bit writes. · daf37b55
      Jon Cooper authored
      Fixes: ee45fd92
      ("sfc: Use TX PIO for sufficiently small packets")
      
      The Linux net driver uses memcpy_toio() to copy into
      the PIO buffers.
      Even on a 64-bit machine this causes 32-bit accesses to a
      write-combined memory region.
      Hardware limitations mean that only naturally aligned 64-bit
      accesses are safe in all cases.
      Because the region is write-combined, two 32-bit accesses
      may be coalesced into a 64-bit access that is not 64-bit aligned.
      The solution is to open-code the memory copy routines using pointers
      and to enable PIO only on x86_64 machines.
      
      Not tested on platforms other than x86_64 because this patch
      disables the PIO feature on other platforms.
      Compile-tested on x86 to ensure that works.
      
      The WARN_ON_ONCE() code in the previous version of this patch
      has been moved into the internal sfc debug driver as the
      assertion was unnecessary in the upstream kernel code.
      
      This bug fix applies to v3.13 and v3.14 stable branches.
      Signed-off-by: Shradha Shah <sshah@solarflare.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
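      The open-coded copy described above boils down to moving data in whole,
      naturally aligned 64-bit words, so the write-combining buffer never sees
      32-bit halves. A user-space sketch under stated assumptions: a plain
      memory buffer stands in for the PIO aperture, the function name is
      hypothetical, and zero-padding the final partial word is this sketch's
      choice, not necessarily what the driver does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copy len bytes using only aligned 64-bit stores to dst. dst must be
 * 8-byte aligned; a trailing partial word is zero-padded rather than
 * written with narrower stores. */
static void copy_to_pio_model(volatile uint64_t *dst, const void *src, size_t len)
{
    const unsigned char *s = src;

    assert(((uintptr_t)dst & 7) == 0);
    while (len >= 8) {
        uint64_t w;

        memcpy(&w, s, 8);   /* src may be unaligned; read it safely */
        *dst++ = w;         /* one aligned 64-bit store */
        s += 8;
        len -= 8;
    }
    if (len) {
        uint64_t w = 0;

        memcpy(&w, s, len);
        *dst = w;
    }
}
```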
    • David S. Miller
      Merge branch 'bridge-next' · 1a0b20b2
      David S. Miller authored
      Toshiaki Makita says:
      
      ====================
      bridge: 802.1ad vlan protocol support
      
      Currently bridge vlan filtering doesn't work well with the 802.1ad protocol.
      It works only if the bridge is configured without a pvid, receives only
      802.1ad tagged frames, and does not use STP.
      Otherwise:
      - If pvid is configured, it can put only 802.1Q tags but cannot put 802.1ad
        tags.
      - If 802.1Q and 802.1ad tagged frames arrive in mixture, it applies filtering
        regardless of their protocols.
      - While an 802.1ad bridge should use a different mac address for STP BPDUs and
        should forward customers' BPDU frames, it can't.
      Thus, we can't properly handle frames once 802.1ad is used.
      
      Handling 802.1ad is useful if we want to allow stacked vlans to be used,
      e.g., guest VMs want to use vlan tags and the host also wants to segregate
      each guest's traffic from other guests' by vlan tags.
      
      Here is an image describing how to configure a bridge to filter VMs' traffic.
      
               +-------+p/u   +-----+  +---------+
       +----+  |       |------|vnet0|--|User A VM|
       |eth0|--|802.1ad|      +-----+  +---------+
       +----+  |bridge |p/u   +-----+  +---------+
               |       |------|vnet1|--|User B VM|
               +-------+      +-----+  +---------+
      p/u: pvid/untagged
      
      This patch set enables us to set vlan protocols per bridge.
      This tries to implement a bridge like S-VLAN component in IEEE 802.1Q-2011
      spec.
      
      Note that there is another possible implementation that sets vlan protocols
      per port. Some HW switches seem to take that approach.
      However, I think the per-bridge approach is better, because:
      - I think the typical usage of an 802.1ad bridge is segregating 802.1Q tagged
        traffic (like what is described above), and this doesn't need the ability to
        set protocols per port. Also, if a bridge has many ports and it supports
        per-port setting, we might need many extra configuration steps to
        change the protocols of all ports.
      
      - I assume that the main purpose of setting the protocol per port is to assign S-VID
        according to C-VID, or to realize two logical bridges (one is an 802.1Q
        filtering bridge and the other is an 802.1ad filtering bridge) in one bridge.
        The former usually needs additional features such as vlan id mapping, and
        is likely to make bridge's code complicated. If a user wants, such enhanced
        features can be accomplished by a combination of multiple bridges, so it is
        not absolutely necessary to implement these features in a bridge itself.
        The latter is simply unnecessary because we can easily make two bridges of
        which one is an 802.1Q bridge and the other is an 802.1ad bridge.
      
      Here is an example of the enhanced feature that we can realize by using
      multiple bridges and veth interfaces. This way is documented in
      IEEE 802.1Q-2011 clause 15.4 (C-tagged service interface).
      
       +----+  +-------+p/u         +------+  +----+  +--+
       |eth0|--|802.1ad|----veth----|802.1Q|--|vnet|--|VM|
       +----+  |bridge |----veth----|bridge|  +----+  +--+
               +-------+p/u         +------+
      p/u: pvid/untagged
      
      In this configuration, we can map C-VIDs to any S-VID.
      For example:
       C-VID 10 and 20 to S-VID 100
       C-VID 30 to S-VID 110
      This is achieved through the 802.1Q bridge that forwards C-tagged frames to
      proper ports of the 802.1ad bridge.
      
      Changes:
      v1 -> v2:
      - Make the way to forward bridge group addresses more generic by introducing
        new mask, group_fwd_mask_required.
      
      RFC -> v1:
      - Add S-TAG tx offload.
      - Remove a fix around stacked vlan which has already been fixed.
      - Take into account Bridge Group Addresses.
      - Separate handling of protocol-mismatch from br_vlan_get_tag().
      - Change the way to set vlan_proto from netlink to sysfs because no other
        existing configuration per bridge can be set by netlink.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Toshiaki Makita
      bridge: Support 802.1ad vlan filtering · 204177f3
      Toshiaki Makita authored
      This enables us to change the vlan protocol used for vlan filtering,
      so that a bridge can filter frames on the basis of 802.1ad vlan tags.
      
      This also changes br->group_addr if it has not been set by user.
      This is needed for an 802.1ad bridge.
      (See IEEE 802.1Q-2011 8.13.5.)
      
      Furthermore, this sets br->group_fwd_mask_required so that an 802.1ad
      bridge can forward the Nearest Customer Bridge group addresses except
      for br->group_addr, which should be passed to higher layer.
      
      To change the vlan protocol, write a protocol in sysfs:
      # echo 0x88a8 > /sys/class/net/br0/bridge/vlan_protocol
      Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
      Signed-off-by: David S. Miller <davem@davemloft.net>
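      Conceptually, an 802.1ad bridge filters on the outer S-tag (TPID 0x88a8)
      while an 802.1Q bridge filters on a C-tag (TPID 0x8100). A user-space
      sketch of that classification on a raw frame; the helper is
      hypothetical, and in the kernel the outer tag has usually been moved
      into skb metadata already rather than parsed from bytes like this.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TPID_8021Q  0x8100   /* C-tag */
#define TPID_8021AD 0x88a8   /* S-tag */

/* Return the VLAN ID from the outer tag when its TPID matches the bridge's
 * vlan protocol; otherwise return -1, meaning "not a tag this bridge
 * filters on". */
static int bridge_outer_vid(const uint8_t *frame, size_t len, uint16_t bridge_proto)
{
    uint16_t tpid, tci;

    if (len < 18)                       /* 2 MACs + tag + ethertype */
        return -1;
    tpid = (uint16_t)(frame[12] << 8 | frame[13]);
    if (tpid != bridge_proto)
        return -1;
    tci = (uint16_t)(frame[14] << 8 | frame[15]);
    return tci & 0x0fff;                /* low 12 bits of the TCI are the VID */
}
```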
    • Toshiaki Makita
      bridge: Prepare for forwarding another bridge group addresses · f2808d22
      Toshiaki Makita authored
      If a bridge is an 802.1ad bridge, it must forward another set of bridge group
      addresses (the Nearest Customer Bridge group addresses).
      (For details, see IEEE 802.1Q-2011 8.6.3.)
      
      As the user might not want group_fwd_mask to be modified by enabling 802.1ad,
      introduce a new mask, group_fwd_mask_required, which indicates addresses
      the bridge wants to forward. This will be set by enabling 802.1ad.
      Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
      Signed-off-by: David S. Miller <davem@davemloft.net>
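      The point of a separate group_fwd_mask_required is that enabling
      802.1ad can widen what is forwarded without touching the user's
      group_fwd_mask. A sketch of combining the two masks; the function is
      hypothetical, the real bridge code may apply further restrictions, and
      the bit positions in the test are arbitrary rather than the actual
      Nearest Customer Bridge set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Reserved bridge group addresses are 01:80:C2:00:00:00 .. 0F; bit X of a
 * 16-bit mask says whether 01:80:C2:00:00:0X is forwarded. */
static bool group_addr_forwarded(const uint8_t dest[6], uint16_t user_mask,
                                 uint16_t required_mask)
{
    static const uint8_t prefix[5] = { 0x01, 0x80, 0xc2, 0x00, 0x00 };

    if (memcmp(dest, prefix, sizeof prefix) != 0 || dest[5] > 0x0f)
        return false;
    /* The required mask is ORed in; the user's mask itself is not modified. */
    return ((user_mask | required_mask) >> dest[5]) & 1;
}
```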
    • Toshiaki Makita
      bridge: Prepare for 802.1ad vlan filtering support · 8580e211
      Toshiaki Makita authored
      This enables a bridge to have vlan protocol information and allows vlan
      tag manipulation (retrieve, insert and remove tags) according to the vlan
      protocol.
      Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Toshiaki Makita
      bridge: Add 802.1ad tx vlan acceleration · 1c5abb6c
      Toshiaki Makita authored
      Bridge device doesn't need to embed S-tag into skb->data.
      Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
      Signed-off-by: David S. Miller <davem@davemloft.net>