1. 14 Apr, 2014 2 commits
    • netfilter: nf_tables: fix nft_cmp_fast failure on big endian for size < 4 · b855d416
      Patrick McHardy authored
      nft_cmp_fast is used for equality comparisons of size <= 4. For
      comparisons of less than 4 bytes, a mask is calculated that is applied to
      both the data from userspace (during initialization) and the register
      value (during runtime). Both values are stored using (in effect) memcpy
      to a memory area that is then interpreted as u32 by nft_cmp_fast.
      
      This works fine on little endian since smaller types have the same base
      address. On big endian, however, this is not true, and the smaller types
      are interpreted as a big number with trailing zero bytes.
      
      The mask therefore must not include the lower bytes but the higher bytes
      on big endian. Add a helper function that applies cpu_to_le32 to swap
      the bytes on big endian. Since we're dealing with a mask of just consecutive
      bits, this works out fine.
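      A minimal userspace sketch of the masking idea, assuming a 4-byte register compared through u32 loads. host_to_le32() is a portable stand-in for the kernel's cpu_to_le32(), and all helper names here are hypothetical, not the kernel's. Converting the host-order mask ~0U >> (32 - len_bits) to little-endian order makes it select the first len bytes in memory on either endianness:

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Portable stand-in for cpu_to_le32(): returns a value whose in-memory
 * byte order is little endian, whatever the host's endianness. */
static uint32_t host_to_le32(uint32_t v)
{
    uint8_t b[4] = { (uint8_t)v, (uint8_t)(v >> 8),
                     (uint8_t)(v >> 16), (uint8_t)(v >> 24) };
    uint32_t out;

    memcpy(&out, b, sizeof(out));
    return out;
}

/* Mask covering the first len_bits of the register in memory order.
 * Precondition: 0 < len_bits <= 32. */
static uint32_t cmp_fast_mask(unsigned int len_bits)
{
    return host_to_le32(~0U >> (32 - len_bits));
}

/* Compare the first len bytes of reg and data via u32 loads, the way an
 * nft_cmp_fast-style equality test would (len in 1..4). */
static bool cmp_fast_eq(const void *reg, const void *data, unsigned int len)
{
    uint32_t r, d, mask = cmp_fast_mask(len * 8);

    memcpy(&r, reg, sizeof(r));
    memcpy(&d, data, sizeof(d));
    return (r & mask) == (d & mask);
}
```

      On a little-endian host host_to_le32() is the identity, which is why the unconverted mask happened to work there.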
      Signed-off-by: Patrick McHardy <kaber@trash.net>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: nf_conntrack: initialize net.ct.generation · ee214d54
      Andrey Vagin authored
      [  251.920788] INFO: trying to register non-static key.
      [  251.921386] the code is fine but needs lockdep annotation.
      [  251.921386] turning off the locking correctness validator.
      [  251.921386] CPU: 2 PID: 15715 Comm: socket_listen Not tainted 3.14.0+ #294
      [  251.921386] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
      [  251.921386]  0000000000000000 000000009d18c210 ffff880075f039b8 ffffffff816b7ecd
      [  251.921386]  ffffffff822c3b10 ffff880075f039c8 ffffffff816b36f4 ffff880075f03aa0
      [  251.921386]  ffffffff810c65ff ffffffff810c4a85 00000000fffffe01 ffffffffa0075172
      [  251.921386] Call Trace:
      [  251.921386]  [<ffffffff816b7ecd>] dump_stack+0x45/0x56
      [  251.921386]  [<ffffffff816b36f4>] register_lock_class.part.24+0x38/0x3c
      [  251.921386]  [<ffffffff810c65ff>] __lock_acquire+0x168f/0x1b40
      [  251.921386]  [<ffffffff810c4a85>] ? trace_hardirqs_on_caller+0x105/0x1d0
      [  251.921386]  [<ffffffffa0075172>] ? nf_nat_setup_info+0x252/0x3a0 [nf_nat]
      [  251.921386]  [<ffffffff816c1215>] ? _raw_spin_unlock_bh+0x35/0x40
      [  251.921386]  [<ffffffffa0075172>] ? nf_nat_setup_info+0x252/0x3a0 [nf_nat]
      [  251.921386]  [<ffffffff810c7272>] lock_acquire+0xa2/0x120
      [  251.921386]  [<ffffffffa008ab90>] ? ipv4_confirm+0x90/0xf0 [nf_conntrack_ipv4]
      [  251.921386]  [<ffffffffa0055989>] __nf_conntrack_confirm+0x129/0x410 [nf_conntrack]
      [  251.921386]  [<ffffffffa008ab90>] ? ipv4_confirm+0x90/0xf0 [nf_conntrack_ipv4]
      [  251.921386]  [<ffffffffa008ab90>] ipv4_confirm+0x90/0xf0 [nf_conntrack_ipv4]
      [  251.921386]  [<ffffffff815e7b00>] ? ip_fragment+0x9f0/0x9f0
      [  251.921386]  [<ffffffff815d8c5a>] nf_iterate+0xaa/0xc0
      [  251.921386]  [<ffffffff815e7b00>] ? ip_fragment+0x9f0/0x9f0
      [  251.921386]  [<ffffffff815d8d14>] nf_hook_slow+0xa4/0x190
      [  251.921386]  [<ffffffff815e7b00>] ? ip_fragment+0x9f0/0x9f0
      [  251.921386]  [<ffffffff815e98f2>] ip_output+0x92/0x100
      [  251.921386]  [<ffffffff815e8df9>] ip_local_out+0x29/0x90
      [  251.921386]  [<ffffffff815e9240>] ip_queue_xmit+0x170/0x4c0
      [  251.921386]  [<ffffffff815e90d5>] ? ip_queue_xmit+0x5/0x4c0
      [  251.921386]  [<ffffffff81601208>] tcp_transmit_skb+0x498/0x960
      [  251.921386]  [<ffffffff81602d82>] tcp_connect+0x812/0x960
      [  251.921386]  [<ffffffff810e3dc5>] ? ktime_get_real+0x25/0x70
      [  251.921386]  [<ffffffff8159ea2a>] ? secure_tcp_sequence_number+0x6a/0xc0
      [  251.921386]  [<ffffffff81606f57>] tcp_v4_connect+0x317/0x470
      [  251.921386]  [<ffffffff8161f645>] __inet_stream_connect+0xb5/0x330
      [  251.921386]  [<ffffffff8158dfc3>] ? lock_sock_nested+0x33/0xa0
      [  251.921386]  [<ffffffff810c4b5d>] ? trace_hardirqs_on+0xd/0x10
      [  251.921386]  [<ffffffff81078885>] ? __local_bh_enable_ip+0x75/0xe0
      [  251.921386]  [<ffffffff8161f8f8>] inet_stream_connect+0x38/0x50
      [  251.921386]  [<ffffffff8158b157>] SYSC_connect+0xe7/0x120
      [  251.921386]  [<ffffffff810e3789>] ? current_kernel_time+0x69/0xd0
      [  251.921386]  [<ffffffff810c4a85>] ? trace_hardirqs_on_caller+0x105/0x1d0
      [  251.921386]  [<ffffffff810c4b5d>] ? trace_hardirqs_on+0xd/0x10
      [  251.921386]  [<ffffffff8158c36e>] SyS_connect+0xe/0x10
      [  251.921386]  [<ffffffff816caf69>] system_call_fastpath+0x16/0x1b
      [  312.014104] INFO: rcu_sched detected stalls on CPUs/tasks: {} (detected by 0, t=60003 jiffies, g=42359, c=42358, q=333)
      [  312.015097] INFO: Stall ended before state dump start
      
      Fixes: 93bb0ceb ("netfilter: conntrack: remove central spinlock nf_conntrack_lock")
      Cc: Jesper Dangaard Brouer <brouer@redhat.com>
      Cc: Pablo Neira Ayuso <pablo@netfilter.org>
      Cc: Patrick McHardy <kaber@trash.net>
      Cc: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
      Cc: "David S. Miller" <davem@davemloft.net>
      Signed-off-by: Andrey Vagin <avagin@openvz.org>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
  2. 08 Apr, 2014 1 commit
    • netfilter: nf_conntrack: flush net_gre->keymap_list only from gre helper · 8142b227
      Andrey Vagin authored
      nf_ct_gre_keymap_flush() removes an nf_ct_gre_keymap object from
      net_gre->keymap_list and frees the object, but it does not clear
      the reference to this object in ct_pptp_info->keymap[dir].
      nf_ct_gre_keymap_destroy() may then release the same object again.
      
      So nf_ct_gre_keymap_flush() can be called only when we are sure that
      nf_ct_gre_keymap_destroy() will not be called.
      
      nf_ct_gre_keymap is created by nf_ct_gre_keymap_add(), and the right way
      to destroy it is to call nf_ct_gre_keymap_destroy().
      
      This patch marks nf_ct_gre_keymap_flush() as static, so it can
      break compilation of third-party modules that use
      nf_ct_gre_keymap_flush(). I'm not sure this is the right way to deprecate
      this function.
      
      [  226.540793] general protection fault: 0000 [#1] SMP
      [  226.541750] Modules linked in: nf_nat_pptp nf_nat_proto_gre
      nf_conntrack_pptp nf_conntrack_proto_gre ip_gre ip_tunnel gre
      ppp_deflate bsd_comp ppp_async crc_ccitt ppp_generic slhc xt_nat
      iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat
      nf_conntrack veth tun bridge stp llc ppdev microcode joydev pcspkr
      serio_raw virtio_console virtio_balloon floppy parport_pc parport
      pvpanic i2c_piix4 virtio_net drm_kms_helper ttm ata_generic virtio_pci
      virtio_ring virtio drm i2c_core pata_acpi [last unloaded: ip_tunnel]
      [  226.541776] CPU: 0 PID: 49 Comm: kworker/u4:2 Not tainted 3.14.0-rc8+ #101
      [  226.541776] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
      [  226.541776] Workqueue: netns cleanup_net
      [  226.541776] task: ffff8800371e0000 ti: ffff88003730c000 task.ti: ffff88003730c000
      [  226.541776] RIP: 0010:[<ffffffff81389ba9>]  [<ffffffff81389ba9>] __list_del_entry+0x29/0xd0
      [  226.541776] RSP: 0018:ffff88003730dbd0  EFLAGS: 00010a83
      [  226.541776] RAX: 6b6b6b6b6b6b6b6b RBX: ffff8800374e6c40 RCX: dead000000200200
      [  226.541776] RDX: 6b6b6b6b6b6b6b6b RSI: ffff8800371e07d0 RDI: ffff8800374e6c40
      [  226.541776] RBP: ffff88003730dbd0 R08: 0000000000000000 R09: 0000000000000000
      [  226.541776] R10: 0000000000000001 R11: ffff88003730d92e R12: 0000000000000002
      [  226.541776] R13: ffff88007a4c42d0 R14: ffff88007aef0000 R15: ffff880036cf0018
      [  226.541776] FS:  0000000000000000(0000) GS:ffff88007fc00000(0000) knlGS:0000000000000000
      [  226.541776] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
      [  226.541776] CR2: 00007f07f643f7d0 CR3: 0000000036fd2000 CR4: 00000000000006f0
      [  226.541776] Stack:
      [  226.541776]  ffff88003730dbe8 ffffffff81389c5d ffff8800374ffbe4 ffff88003730dc28
      [  226.541776]  ffffffffa0162a43 ffffffffa01627c5 ffff88007a4c42d0 ffff88007aef0000
      [  226.541776]  ffffffffa01651c0 ffff88007a4c45e0 ffff88007aef0000 ffff88003730dc40
      [  226.541776] Call Trace:
      [  226.541776]  [<ffffffff81389c5d>] list_del+0xd/0x30
      [  226.541776]  [<ffffffffa0162a43>] nf_ct_gre_keymap_destroy+0x283/0x2d0 [nf_conntrack_proto_gre]
      [  226.541776]  [<ffffffffa01627c5>] ? nf_ct_gre_keymap_destroy+0x5/0x2d0 [nf_conntrack_proto_gre]
      [  226.541776]  [<ffffffffa0162ab7>] gre_destroy+0x27/0x70 [nf_conntrack_proto_gre]
      [  226.541776]  [<ffffffffa0117de3>] destroy_conntrack+0x83/0x200 [nf_conntrack]
      [  226.541776]  [<ffffffffa0117d87>] ? destroy_conntrack+0x27/0x200 [nf_conntrack]
      [  226.541776]  [<ffffffffa0117d60>] ? nf_conntrack_hash_check_insert+0x2e0/0x2e0 [nf_conntrack]
      [  226.541776]  [<ffffffff81630142>] nf_conntrack_destroy+0x72/0x180
      [  226.541776]  [<ffffffff816300d5>] ? nf_conntrack_destroy+0x5/0x180
      [  226.541776]  [<ffffffffa011ef80>] ? kill_l3proto+0x20/0x20 [nf_conntrack]
      [  226.541776]  [<ffffffffa011847e>] nf_ct_iterate_cleanup+0x14e/0x170 [nf_conntrack]
      [  226.541776]  [<ffffffffa011f74b>] nf_ct_l4proto_pernet_unregister+0x5b/0x90 [nf_conntrack]
      [  226.541776]  [<ffffffffa0162409>] proto_gre_net_exit+0x19/0x30 [nf_conntrack_proto_gre]
      [  226.541776]  [<ffffffff815edf89>] ops_exit_list.isra.1+0x39/0x60
      [  226.541776]  [<ffffffff815eecc0>] cleanup_net+0x100/0x1d0
      [  226.541776]  [<ffffffff810a608a>] process_one_work+0x1ea/0x4f0
      [  226.541776]  [<ffffffff810a6028>] ? process_one_work+0x188/0x4f0
      [  226.541776]  [<ffffffff810a64ab>] worker_thread+0x11b/0x3a0
      [  226.541776]  [<ffffffff810a6390>] ? process_one_work+0x4f0/0x4f0
      [  226.541776]  [<ffffffff810af42d>] kthread+0xed/0x110
      [  226.541776]  [<ffffffff8173d4dc>] ? _raw_spin_unlock_irq+0x2c/0x40
      [  226.541776]  [<ffffffff810af340>] ? kthread_create_on_node+0x200/0x200
      [  226.541776]  [<ffffffff8174747c>] ret_from_fork+0x7c/0xb0
      [  226.541776]  [<ffffffff810af340>] ? kthread_create_on_node+0x200/0x200
      [  226.541776] Code: 00 00 55 48 8b 17 48 b9 00 01 10 00 00 00 ad de
      48 8b 47 08 48 89 e5 48 39 ca 74 29 48 b9 00 02 20 00 00 00 ad de 48
      39 c8 74 7a <4c> 8b 00 4c 39 c7 75 53 4c 8b 42 08 4c 39 c7 75 2b 48 89
      42 08
      [  226.541776] RIP  [<ffffffff81389ba9>] __list_del_entry+0x29/0xd0
      [  226.541776]  RSP <ffff88003730dbd0>
      [  226.612193] ---[ end trace 985ae23ddfcc357c ]---
      
      Cc: Pablo Neira Ayuso <pablo@netfilter.org>
      Cc: Patrick McHardy <kaber@trash.net>
      Cc: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
      Cc: "David S. Miller" <davem@davemloft.net>
      Signed-off-by: Andrey Vagin <avagin@openvz.org>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
  3. 07 Apr, 2014 8 commits
  4. 06 Apr, 2014 1 commit
    • Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf · d80e773f
      David S. Miller authored
      Pablo Neira Ayuso says:
      
      ====================
      The following patchset contains Netfilter fixes for your net tree, they
      are:
      
      * Use 16-bit offset and length fields instead of 8-bit ones in the conntrack
        extension to avoid an overflow when many conntrack extensions are used,
        from Andrey Vagin.
      
      * Allow the cgroup match to be used from LOCAL_IN; there is no apparent reason
        for not allowing this, from Alexey Perevalov.
      
      * Fix the build of the connlimit match on UP, where recent changes to let it
        scale up resulted in a divide-by-zero compilation error, from
        Florian Westphal.
      
      * Move the lock array out of the structure connlimit_data to avoid false
        sharing spotted by Eric Dumazet and Jesper D. Brouer; this is needed as
        part of the recent connlimit scalability improvements, also from
        Florian Westphal.
      
      * Add missing module aliases in xt_osf to fix loading of rules using
        this match, from Kirill Tkhai.
      
      * Restrict set names in nf_tables to 15 characters instead of silently
        trimming them, from me.
      
      * Fix the wrong format in the nf_tables request_module() call for chain types,
        spotted by Florian Westphal, patch from me.
      
      * Fix a crash in xtables when it fails to copy the counters back to userspace
        after having already replaced the table.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
  5. 05 Apr, 2014 1 commit
    • netfilter: Can't fail and free after table replacement · c58dd2dd
      Thomas Graf authored
      All xtables variants suffer from the defect that the copy_to_user()
      that copies the counters to user memory may fail after the table has
      already been exchanged and thus exposed. Returning an error at this
      point results in freeing the already exposed table, and any
      subsequent packet processing then results in a kernel panic.
      
      We can't copy the counters before exposing the new tables, as we
      want to provide the counter state after the old table has been
      unhooked. Therefore, convert this into a silent error.
      
      Cc: Florian Westphal <fw@strlen.de>
      Signed-off-by: Thomas Graf <tgraf@suug.ch>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
  6. 04 Apr, 2014 4 commits
  7. 03 Apr, 2014 23 commits
    • netfilter: nf_tables: fix wrong format in request_module() · 2fec6bb6
      Pablo Neira Ayuso authored
      The intended format in request_module is %.*s instead of %*.s.
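      A small userspace illustration of the difference, using snprintf() instead of request_module(). The "nft-chain-%u-%.*s" pattern and the helper names are assumptions made for this sketch; the point is only how each conversion treats the int argument when the string is not NUL-terminated:

```c
#include <stdio.h>
#include <string.h>

/* Build a module name from a family number and a chain-type name that is
 * NOT NUL-terminated (its length is passed separately, as with a netlink
 * attribute payload). In %.*s the int is the precision, i.e. the maximum
 * number of characters printed from type. */
static void chain_module_name(char *buf, size_t size, unsigned int family,
                              const char *type, int type_len)
{
    snprintf(buf, size, "nft-chain-%u-%.*s", family, type_len, type);
}

/* Buggy variant: in %*.s the int is the field width, and the lone '.'
 * means precision 0, so no characters of type are printed; the result
 * ends in type_len spaces of padding. */
static void chain_module_name_buggy(char *buf, size_t size, unsigned int family,
                                    const char *type, int type_len)
{
    snprintf(buf, size, "nft-chain-%u-%*.s", family, type_len, type);
}
```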
      Reported-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: nf_tables: set names cannot be larger than 15 bytes · a9bdd836
      Pablo Neira Ayuso authored
      Currently, nf_tables silently trims the set name if it exceeds 15
      bytes; explicitly reject set names that are too long instead.
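      A minimal sketch of rejecting rather than truncating, assuming a hypothetical 16-byte name buffer (15 characters plus the NUL). Returning -ENAMETOOLONG is this sketch's choice; the commit does not say which error nf_tables reports:

```c
#define _POSIX_C_SOURCE 200809L   /* for strnlen() */
#include <errno.h>
#include <string.h>

/* Hypothetical limit: 15 characters plus the terminating NUL. */
#define SET_NAME_MAXLEN 16

/* Copy a set name into a fixed-size buffer, rejecting names that do not
 * fit instead of silently truncating them. */
static int set_name_copy(char dst[SET_NAME_MAXLEN], const char *name)
{
    /* strnlen() stops scanning after SET_NAME_MAXLEN bytes. */
    size_t len = strnlen(name, SET_NAME_MAXLEN);

    if (len >= SET_NAME_MAXLEN)
        return -ENAMETOOLONG;   /* 16 or more characters: reject */
    memcpy(dst, name, len + 1); /* copy including the NUL */
    return 0;
}
```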
      Reported-by: Giuseppe Longo <giuseppelng@gmail.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: nf_conntrack: reserve two bytes for nf_ct_ext->len · 223b02d9
      Andrey Vagin authored
      "len" contains sizeof(nf_ct_ext) plus the size of the extensions. In the worst
      case it can contain all extensions. Below you can find the sizes for all
      types of extensions. They alone sum to 290 bytes, well over 256.
      
      nf_ct_ext_types[0]->len = 24
      nf_ct_ext_types[1]->len = 32
      nf_ct_ext_types[2]->len = 24
      nf_ct_ext_types[3]->len = 32
      nf_ct_ext_types[4]->len = 152
      nf_ct_ext_types[5]->len = 2
      nf_ct_ext_types[6]->len = 16
      nf_ct_ext_types[7]->len = 8
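      The arithmetic behind the overflow, as a small sketch: summing the sizes listed above into an 8-bit field wraps around, while a 16-bit field holds the total. The sketch ignores sizeof(struct nf_ct_ext) and alignment padding, a simplification that can only understate the real total:

```c
#include <stddef.h>
#include <stdint.h>

/* Extension sizes quoted above (nf_ct_ext_types[i]->len). */
static const unsigned int ext_len[] = { 24, 32, 24, 32, 152, 2, 16, 8 };

#define N_EXT (sizeof(ext_len) / sizeof(ext_len[0]))

/* Total kept in a u8, as with the old 8-bit "len": wraps modulo 256. */
static uint8_t total_u8(void)
{
    uint8_t len = 0;

    for (size_t i = 0; i < N_EXT; i++)
        len = (uint8_t)(len + ext_len[i]);
    return len;
}

/* Total kept in a u16, as with the two-byte "len". */
static uint16_t total_u16(void)
{
    uint16_t len = 0;

    for (size_t i = 0; i < N_EXT; i++)
        len = (uint16_t)(len + ext_len[i]);
    return len;
}
```

      Here total_u16() is 290, while total_u8() wraps to 34, so offsets computed from the 8-bit value point far too low.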
      
      I have seen "len" reach 280, and my host crashes without this patch.
      
      The right way to fix this problem is to reduce the size of the ecache
      extension (4), and Florian is going to do this, but those changes will
      be too large to be appropriate for a stable tree.
      
      Fixes: 5b423f6a (netfilter: nf_conntrack: fix racy timer handling with reliable)
      Cc: Pablo Neira Ayuso <pablo@netfilter.org>
      Cc: Patrick McHardy <kaber@trash.net>
      Cc: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
      Cc: "David S. Miller" <davem@davemloft.net>
      Signed-off-by: Andrey Vagin <avagin@openvz.org>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: Add {ipt,ip6t}_osf aliases for xt_osf · b8ddd9ea
      Kirill Tkhai authored
      These aliases do not exist, so the kernel cannot request the appropriate
      match module:
      
      $ iptables -I INPUT -p tcp -m osf --genre Windows --ttl 2 -j DROP
      iptables: No chain/target/match by that name.
      
      setsockopt() requests the ipt_osf module, which does not exist. Add
      the aliases.
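      A sketch of how the requested name is formed, assuming x_tables builds it as "<family prefix>t_<match name>" (for example "ipt_osf" for IPv4). The prefix table and helper below are illustrative assumptions, not copied from the kernel. Since xt_osf is only known under its own name, the fix presumably amounts to declaring the per-family names as module aliases (MODULE_ALIAS()):

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical family identifiers and their assumed name prefixes. */
enum fake_family { FAM_UNSPEC, FAM_IPV4, FAM_IPV6 };

static const char *const match_prefix[] = {
    [FAM_UNSPEC] = "x",
    [FAM_IPV4]   = "ip",
    [FAM_IPV6]   = "ip6",
};

/* Build "<prefix>t_<name>", the module name requested for a match. */
static void match_module_name(char *buf, size_t size, enum fake_family fam,
                              const char *name)
{
    snprintf(buf, size, "%st_%s", match_prefix[fam], name);
}
```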
      Signed-off-by: Kirill Tkhai <ktkhai@parallels.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: x_tables: allow to use cgroup match for LOCAL_IN nf hooks · a00e7634
      Alexey Perevalov authored
      This simple modification allows iptables to use the cgroup match in
      the INPUT chain. This could be useful for counting ingress traffic
      per cgroup with the nfacct netfilter module. Counting egress traffic
      this way already worked.
      
      It's possible to get a classified sk_buff after PREROUTING, because
      the socket lookup is done in early_demux (tcp_v4_early_demux). It
      works for UDP as well.
      
      Trivial usage example, assuming we're in the same shell every step
      and we have enough permissions:
      
      1) Classic net_cls cgroup initialization:
      
        mkdir /sys/fs/cgroup/net_cls
        mount -t cgroup -o net_cls net_cls /sys/fs/cgroup/net_cls
      
      2) Set up cgroup for interesting application:
      
        mkdir /sys/fs/cgroup/net_cls/wget
        echo 1 > /sys/fs/cgroup/net_cls/wget/net_cls.classid
        echo $BASHPID > /sys/fs/cgroup/net_cls/wget/cgroup.procs
      
      3) Create kernel counters:
      
        nfacct add wget-cgroup-in
        iptables -A INPUT -m cgroup ! --cgroup 1 -m nfacct --nfacct-name wget-cgroup-in
      
        nfacct add wget-cgroup-out
        iptables -A OUTPUT -m cgroup ! --cgroup 1 -m nfacct --nfacct-name wget-cgroup-out
      
      4) Network usage:
      
        wget https://www.kernel.org/pub/linux/kernel/v3.x/testing/linux-3.14-rc6.tar.xz
      
      5) Check results:
      
        nfacct list
      
      The cgroup approach is used for the DataUsage (counting & blocking
      traffic) feature in Samsung's modification of the Tizen OS.
      Signed-off-by: Alexey Perevalov <a.perevalov@samsung.com>
      Acked-by: Daniel Borkmann <dborkman@redhat.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: connlimit: move lock array out of struct connlimit_data · e00b437b
      Florian Westphal authored
      Eric points out that the locks can be global.
      Moreover, both Jesper and Eric note that using only 32 locks increases
      false sharing, as only two cache lines are used.
      
      This increases the number of locks to 256 (16 cache lines, assuming a
      64-byte cache line and 4 bytes per spinlock).
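      The cache-line arithmetic and the keyed-lock lookup, as a hedged sketch: the 4-byte lock and 64-byte line sizes come from the commit message, while the helper names and the hash-modulo slot selection are illustrative assumptions:

```c
/* Assumptions from the commit message. */
#define LOCK_BYTES      4u
#define CACHELINE_BYTES 64u

/* Number of lock slots after the change (previously 32). */
#define LOCK_SLOTS 256u

/* Cache lines spanned by a cache-aligned array of n locks. */
static unsigned int lock_cachelines(unsigned int n)
{
    return (n * LOCK_BYTES + CACHELINE_BYTES - 1) / CACHELINE_BYTES;
}

/* Keyed locking: a hash of the connection key selects one lock from a
 * single global array shared by all rules (hypothetical helper). */
static unsigned int lock_slot(unsigned int hash)
{
    return hash % LOCK_SLOTS;
}
```

      With 32 slots every lock lives on one of only 2 cache lines; with 256 slots the locks spread over 16 lines, so CPUs taking different locks contend on the same line less often.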
      Suggested-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Suggested-by: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: connlimit: fix UP build · e5ac6eaf
      Florian Westphal authored
      Cannot use ARRAY_SIZE() if spinlock_t is an empty struct (as on UP builds).
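      Why this breaks, as a sketch: ARRAY_SIZE() divides the array's size by the element size, and under GNU C an empty struct has sizeof 0, so the expression becomes 0 / 0 at compile time. The sketch uses a non-empty stand-in type so it compiles, and shows the pattern of indexing with a separate slot-count constant; the names are hypothetical:

```c
/* Simplified ARRAY_SIZE(); the kernel's version adds a type check. */
#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Stand-in lock type. If this were an empty struct (sizeof 0 in GNU C),
 * ARRAY_SIZE(locks) below would be a division by zero. */
typedef struct { int placeholder; } fake_spinlock_t;

/* Keep the slot count in its own constant... */
#define LOCK_SLOTS 256
static fake_spinlock_t locks[LOCK_SLOTS];

/* ...and index with it instead of ARRAY_SIZE(locks), so the result never
 * depends on sizeof(fake_spinlock_t). */
static fake_spinlock_t *lock_for(unsigned int hash)
{
    return &locks[hash % LOCK_SLOTS];
}
```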
      
      Fixes: 1442e750 ("netfilter: connlimit: use keyed locks")
      Reported-by: kbuild test robot <fengguang.wu@intel.com>
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • net-gro: reset skb->truesize in napi_reuse_skb() · e33d0ba8
      Eric Dumazet authored
      Recycling skbs has always been very tough...
      
      This time it appears the GRO layer can accumulate skb->truesize
      adjustments made by drivers when they attach a fragment to an skb.
      
      skb_gro_receive() can only subtract the used part of a fragment
      from skb->truesize.
      
      I spotted this problem after seeing unexpected TcpExtPruneCalled and
      TcpExtTCPRcvCollapsed counts with a recent kernel, where the TCP
      receive window should be sized properly to accept traffic coming
      from a driver that does not overshoot skb->truesize.
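      A toy model of the accounting drift, with made-up sizes (a 768-byte baseline, 4096-byte fragments of which 1500 bytes are used) and hypothetical names; it is not the kernel's skb code. Without a reset, the unused part of each fragment stays charged across reuses; resetting truesize when the skb is recycled removes the drift:

```c
#include <stddef.h>

/* Hypothetical, much-simplified model of an skb recycled by GRO. */
struct fake_skb {
    size_t truesize;   /* memory accounted to the skb's owner */
};

#define BASE_TRUESIZE 768u   /* assumed truesize of a fresh, empty skb */

/* A driver attaching a page fragment charges the whole fragment. */
static void driver_attach_frag(struct fake_skb *skb, size_t frag_truesize)
{
    skb->truesize += frag_truesize;
}

/* GRO can only give back the part of the fragment it actually used. */
static void gro_take_used(struct fake_skb *skb, size_t used)
{
    skb->truesize -= used;
}

/* Recycle the skb for the next packet, optionally resetting truesize to
 * its baseline as napi_reuse_skb() does after this patch. */
static void reuse_skb(struct fake_skb *skb, int reset_truesize)
{
    if (reset_truesize)
        skb->truesize = BASE_TRUESIZE;
}
```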
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: Micrel KSZ8864RMN 4-port managed switch support · 240a12d5
      Philipp Zabel authored
      This patch adds support for the Micrel KSZ8864RMN switch to the spi_ks8995
      driver. The KSZ8864RMN switch has a wider 256-byte register space.
      Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tipc: fix regression bug where node events are not being generated · a5e7ac5c
      Erik Hugne authored
      Commit 5902385a ("tipc: obsolete the remote management feature")
      introduced a regression where node topology events are not generated,
      because the publication that triggers them, {0, <z.c.n>, <z.c.n>}, is
      no longer available. This will break applications that rely on node
      events to discover when nodes join or leave a cluster.
      
      We fix this by advertising the node publication when TIPC enters
      networking mode and withdrawing it upon shutdown.
      Signed-off-by: Erik Hugne <erik.hugne@ericsson.com>
      Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
      Reviewed-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • sxgbe: fix driver probe error path and driver removal leaks · d9bd6461
      françois romieu authored
      sxgbe_drv_probe:  mdio and priv->hw leaks
      sxgbe_drv_remove: clk and priv->hw leaks
      Signed-off-by: Francois Romieu <romieu@fr.zoreil.com>
      Acked-by: Byungho An <bh74.an@samsung.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • françois romieu's avatar
    • net: add busy_poll device feature · d0290214
      Jiri Pirko authored
      Currently there is no way to find out whether a device supports busy
      polling. So add a feature and make it depend on the existence of
      ndo_busy_poll.
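      A minimal sketch of deriving a feature bit from the presence of a driver callback. The structures, the FEAT_BUSY_POLL bit and the registration helper are hypothetical simplifications, not the kernel's net_device or its real feature flag:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical feature bit standing in for the new busy_poll feature. */
#define FEAT_BUSY_POLL (UINT64_C(1) << 0)

/* Simplified device ops: ndo_busy_poll is NULL if unsupported. */
struct fake_netdev_ops {
    int (*ndo_busy_poll)(void *napi);
};

struct fake_netdev {
    const struct fake_netdev_ops *ops;
    uint64_t features;
};

/* Set or clear the feature according to whether the callback exists, so
 * callers can query one flag instead of inspecting the ops table. */
static void fake_register_netdev(struct fake_netdev *dev)
{
    if (dev->ops->ndo_busy_poll)
        dev->features |= FEAT_BUSY_POLL;
    else
        dev->features &= ~FEAT_BUSY_POLL;
}

static bool supports_busy_poll(const struct fake_netdev *dev)
{
    return (dev->features & FEAT_BUSY_POLL) != 0;
}

/* Example driver callback for the usage below. */
static int example_busy_poll(void *napi)
{
    (void)napi;
    return 0;
}
```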
      Signed-off-by: Jiri Pirko <jiri@resnulli.us>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • packet: fix packet_direct_xmit for BQL enabled drivers · 8e2f1a63
      Daniel Borkmann authored
      Currently, in packet_direct_xmit() we test the assigned netdevice queue
      with netif_xmit_frozen_or_stopped() before calling ndo_start_xmit().
      
      This can have the side effect that BQL-enabled drivers, which use
      netdev_tx_sent_queue() internally and thereby set __QUEUE_STATE_STACK_XOFF
      from within the stack, do not fully fill the device's TX ring from
      packet sockets with PACKET_QDISC_BYPASS enabled.
      
      Instead, use a test that ignores the BQL bit, so that bursts can be absorbed
      into the NIC's TX ring. Fix and code suggested by Eric Dumazet, thanks!
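      The difference between the two tests, sketched with queue-state bits modelled on __QUEUE_STATE_DRV_XOFF, __QUEUE_STATE_STACK_XOFF and __QUEUE_STATE_FROZEN. The bit values and function names are assumptions for illustration only:

```c
#include <stdbool.h>

/* Hypothetical queue-state bits (values are arbitrary). */
#define QS_DRV_XOFF   (1u << 0)  /* driver stopped the queue (ring full) */
#define QS_STACK_XOFF (1u << 1)  /* BQL stopped it from within the stack */
#define QS_FROZEN     (1u << 2)  /* queue frozen */

/* Old test: any stop reason, including the BQL bit, blocks transmission. */
static bool frozen_or_stopped(unsigned int state)
{
    return (state & (QS_DRV_XOFF | QS_STACK_XOFF | QS_FROZEN)) != 0;
}

/* New test: ignore the BQL bit, so a qdisc-bypassing sender keeps handing
 * packets to the driver until the driver itself stops the queue. */
static bool frozen_or_drv_stopped(unsigned int state)
{
    return (state & (QS_DRV_XOFF | QS_FROZEN)) != 0;
}
```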
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • packet: report tx_dropped in packet_direct_xmit · 0f97ede4
      Daniel Borkmann authored
      Since commit 015f0688 ("net: net: add a core netdev->tx_dropped
      counter"), we can now account for TX drops from within the core
      stack instead of drivers.
      
      Therefore, fix packet_direct_xmit() to increase the drop count when we
      encounter a problem before the driver's xmit function is called (we do
      not want to account for it twice).
      Suggested-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • xen-netback: Grant copy the header instead of map and memcpy · bdab8275
      Zoltan Kiss authored
      The TX path has an old inefficiency: we grant map the first slot and
      then copy the header part to the linear area. Doing a grant copy of the
      header straight away is more reasonable, especially because there are
      ongoing efforts to make Xen avoid the TLB flush after unmap when the page
      was not touched in Dom0. With the original approach, the memcpy ruined that.
      The key changes:
      - the vif has a tx_copy_ops array again
      - xenvif_tx_build_gops sets up the grant copy operations
      - we don't have to figure out whether the header and first frag are on the same
        grant mapped page or not
      Note that we only grant copy PKT_PROT_LEN bytes from the first slot; the rest (if
      any) will be on the first frag, which is grant mapped. If the first slot is
      smaller than PKT_PROT_LEN, we grant copy that, and later __pskb_pull_tail
      will pull more from the frags (if any).
      Signed-off-by: Zoltan Kiss <zoltan.kiss@citrix.com>
      Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
      Acked-by: Ian Campbell <ian.campbell@citrix.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • xen-netback: Rename map ops · 9074ce24
      Zoltan Kiss authored
      Rename identifiers to state explicitly that they refer to map ops.
      Signed-off-by: Zoltan Kiss <zoltan.kiss@citrix.com>
      Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
      Acked-by: Ian Campbell <ian.campbell@citrix.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: qlcnic: include irq.h for irq definitions · acdd32be
      Josh Boyer authored
      The qlcnic driver fails to build on ARM with errors like:
      
      In file included from drivers/net/ethernet/qlogic/qlcnic/qlcnic.h:36:0,
                       from drivers/net/ethernet/qlogic/qlcnic/qlcnic_hw.c:8:
      drivers/net/ethernet/qlogic/qlcnic/qlcnic_83xx_hw.h:585:1: error: unknown type name 'irqreturn_t'
       irqreturn_t qlcnic_83xx_clear_legacy_intr(struct qlcnic_adapter *);
       ^
      
      Nothing in the driver is explicitly including the irq definitions, so we
      add an include of linux/irq.h to pick them up.
      Signed-off-by: Josh Boyer <jwboyer@fedoraproject.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: enic: include irq.h for irqreturn_t definitions · fef1f07c
      Josh Boyer authored
      The enic driver fails to build on ARM with:
      
      In file included from drivers/net/ethernet/cisco/enic/enic_res.c:40:0:
      drivers/net/ethernet/cisco/enic/enic.h:48:2: error: expected specifier-qualifier-list before 'irqreturn_t'
        irqreturn_t (*isr)(int, void *);
        ^
      
      Nothing in the driver is explicitly including the irq definitions, so we add
      an include of linux/irq.h to pick them up.
      Signed-off-by: Josh Boyer <jwboyer@fedoraproject.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: bnx2x: include irq.h for irqreturn_t definitions · df1efc2d
      Josh Boyer authored
      The bnx2x driver fails to build on ARM with:
      
      In file included from drivers/net/ethernet/broadcom/bnx2x/bnx2x_link.c:28:0:
      drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h:243:1: error: unknown type name 'irqreturn_t'
       irqreturn_t bnx2x_msix_sp_int(int irq, void *dev_instance);
       ^
      drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h:251:1: error: unknown type name 'irqreturn_t'
       irqreturn_t bnx2x_interrupt(int irq, void *dev_instance);
       ^
      
      Nothing in bnx2x_link.c or bnx2x_cmn.h is explicitly including the irq
      definitions, so we add an include of linux/irq.h to pick them up.
      Signed-off-by: Josh Boyer <jwboyer@fedoraproject.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • isdnloop: Validate NUL-terminated strings from user. · 77bc6bed
      YOSHIFUJI Hideaki / 吉藤英明 authored
      Return -EINVAL unless all user-supplied strings are correctly
      NUL-terminated.
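      A minimal sketch of that check, assuming a hypothetical structure with three fixed-size 20-byte strings copied from userspace (loosely modelled on, not claimed to be, the isdnloop ioctl argument). memchr() never looks beyond the buffer, so an unterminated string is detected without over-reading:

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* 0 if buf holds a NUL within its size bytes, -EINVAL otherwise. */
static int check_nul_terminated(const char *buf, size_t size)
{
    return memchr(buf, '\0', size) ? 0 : -EINVAL;
}

/* Hypothetical user-supplied argument with several fixed-size strings. */
struct fake_cdef {
    char num[3][20];
};

/* Reject the whole argument unless every string is terminated. */
static int validate_cdef(const struct fake_cdef *c)
{
    for (size_t i = 0; i < sizeof(c->num) / sizeof(c->num[0]); i++)
        if (check_nul_terminated(c->num[i], sizeof(c->num[i])))
            return -EINVAL;
    return 0;
}
```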
      Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: ti: fix CPTS driver build on arm · 79eb9d28
      Alexei Starovoitov authored
      fix build errors:
      drivers/net/ethernet/ti/cpts.c:266:12: error: 'ETH_HLEN' undeclared (first use in this function)
      drivers/net/ethernet/ti/cpts.c:276:23: error: 'VLAN_HLEN' undeclared (first use in this function)
      
      Fixes: 408eccce ("net: ptp: move PTP classifier in its own file")
      Reported-by: Fengguang Wu <fengguang.wu@intel.com>
      Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
      Suggested-by: Daniel Borkmann <dborkman@redhat.com>
      Cc: Richard Cochran <richardcochran@gmail.com>
      Acked-by: Daniel Borkmann <dborkman@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: vxlan: fix crash when interface is created with no group · 5933a7bb
      Mike Rapoport authored
      If the vxlan interface is created without explicit group definition,
      there are corner cases which may cause kernel panic.
      
      For instance, in the following scenario:
      
      node A:
      $ ip link add dev vxlan42  address 2c:c2:60:00:10:20 type vxlan id 42
      $ ip addr add dev vxlan42 10.0.0.1/24
      $ ip link set up dev vxlan42
      $ arp -i vxlan42 -s 10.0.0.2 2c:c2:60:00:01:02
      $ bridge fdb add dev vxlan42 to 2c:c2:60:00:01:02 dst <IPv4 address>
      $ ping 10.0.0.2
      
      node B:
      $ ip link add dev vxlan42 address 2c:c2:60:00:01:02 type vxlan id 42
      $ ip addr add dev vxlan42 10.0.0.2/24
      $ ip link set up dev vxlan42
      $ arp -i vxlan42 -s 10.0.0.1 2c:c2:60:00:10:20
      
      node B crashes:
      
       vxlan42: 2c:c2:60:00:10:20 migrated from 4011:eca4:c0a8:6466:c0a8:6415:8e09:2118 to (invalid address)
       vxlan42: 2c:c2:60:00:10:20 migrated from 4011:eca4:c0a8:6466:c0a8:6415:8e09:2118 to (invalid address)
       BUG: unable to handle kernel NULL pointer dereference at 0000000000000046
       IP: [<ffffffff8143c459>] ip6_route_output+0x58/0x82
       PGD 7bd89067 PUD 7bd4e067 PMD 0
       Oops: 0000 [#1] SMP
       Modules linked in:
       CPU: 1 PID: 0 Comm: swapper/1 Not tainted 3.14.0-rc8-hvx-xen-00019-g97a5221f-dirty #154
       Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
       task: ffff88007c774f50 ti: ffff88007c79c000 task.ti: ffff88007c79c000
       RIP: 0010:[<ffffffff8143c459>]  [<ffffffff8143c459>] ip6_route_output+0x58/0x82
       RSP: 0018:ffff88007fd03668  EFLAGS: 00010282
       RAX: 0000000000000000 RBX: ffffffff8186a000 RCX: 0000000000000040
       RDX: 0000000000000000 RSI: ffff88007b0e4a80 RDI: ffff88007fd03754
       RBP: ffff88007fd03688 R08: ffff88007b0e4a80 R09: 0000000000000000
       R10: 0200000a0100000a R11: 0001002200000000 R12: ffff88007fd03740
       R13: ffff88007b0e4a80 R14: ffff88007b0e4a80 R15: ffff88007bba0c50
       FS:  0000000000000000(0000) GS:ffff88007fd00000(0000) knlGS:0000000000000000
       CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
       CR2: 0000000000000046 CR3: 000000007bb60000 CR4: 00000000000006e0
       Stack:
        0000000000000000 ffff88007fd037a0 ffffffff8186a000 ffff88007fd03740
        ffff88007fd036c8 ffffffff814320bb 0000000000006e49 ffff88007b8b7360
        ffff88007bdbf200 ffff88007bcbc000 ffff88007b8b7000 ffff88007b8b7360
       Call Trace:
        <IRQ>
        [<ffffffff814320bb>] ip6_dst_lookup_tail+0x2d/0xa4
        [<ffffffff814322a5>] ip6_dst_lookup+0x10/0x12
        [<ffffffff81323b4e>] vxlan_xmit_one+0x32a/0x68c
        [<ffffffff814a325a>] ? _raw_spin_unlock_irqrestore+0x12/0x14
        [<ffffffff8104c551>] ? lock_timer_base.isra.23+0x26/0x4b
        [<ffffffff8132451a>] vxlan_xmit+0x66a/0x6a8
        [<ffffffff8141a365>] ? ipt_do_table+0x35f/0x37e
        [<ffffffff81204ba2>] ? selinux_ip_postroute+0x41/0x26e
        [<ffffffff8139d0c1>] dev_hard_start_xmit+0x2ce/0x3ce
        [<ffffffff8139d491>] __dev_queue_xmit+0x2d0/0x392
        [<ffffffff813b380f>] ? eth_header+0x28/0xb5
        [<ffffffff8139d569>] dev_queue_xmit+0xb/0xd
        [<ffffffff813a5aa6>] neigh_resolve_output+0x134/0x152
        [<ffffffff813db741>] ip_finish_output2+0x236/0x299
        [<ffffffff813dc074>] ip_finish_output+0x98/0x9d
        [<ffffffff813dc749>] ip_output+0x62/0x67
        [<ffffffff813da9f2>] dst_output+0xf/0x11
        [<ffffffff813dc11c>] ip_local_out+0x1b/0x1f
        [<ffffffff813dcf1b>] ip_send_skb+0x11/0x37
        [<ffffffff813dcf70>] ip_push_pending_frames+0x2f/0x33
        [<ffffffff813ff732>] icmp_push_reply+0x106/0x115
        [<ffffffff813ff9e4>] icmp_reply+0x142/0x164
        [<ffffffff813ffb3b>] icmp_echo.part.16+0x46/0x48
        [<ffffffff813c1d30>] ? nf_iterate+0x43/0x80
        [<ffffffff813d8037>] ? xfrm4_policy_check.constprop.11+0x52/0x52
        [<ffffffff813ffb62>] icmp_echo+0x25/0x27
        [<ffffffff814005f7>] icmp_rcv+0x1d2/0x20a
        [<ffffffff813d8037>] ? xfrm4_policy_check.constprop.11+0x52/0x52
        [<ffffffff813d810d>] ip_local_deliver_finish+0xd6/0x14f
        [<ffffffff813d8037>] ? xfrm4_policy_check.constprop.11+0x52/0x52
        [<ffffffff813d7fde>] NF_HOOK.constprop.10+0x4c/0x53
        [<ffffffff813d82bf>] ip_local_deliver+0x4a/0x4f
        [<ffffffff813d7f7b>] ip_rcv_finish+0x253/0x26a
        [<ffffffff813d7d28>] ? inet_add_protocol+0x3e/0x3e
        [<ffffffff813d7fde>] NF_HOOK.constprop.10+0x4c/0x53
        [<ffffffff813d856a>] ip_rcv+0x2a6/0x2ec
        [<ffffffff8139a9a0>] __netif_receive_skb_core+0x43e/0x478
        [<ffffffff812a346f>] ? virtqueue_poll+0x16/0x27
        [<ffffffff8139aa2f>] __netif_receive_skb+0x55/0x5a
        [<ffffffff8139aaaa>] process_backlog+0x76/0x12f
        [<ffffffff8139add8>] net_rx_action+0xa2/0x1ab
        [<ffffffff81047847>] __do_softirq+0xca/0x1d1
        [<ffffffff81047ace>] irq_exit+0x3e/0x85
        [<ffffffff8100b98b>] do_IRQ+0xa9/0xc4
        [<ffffffff814a37ad>] common_interrupt+0x6d/0x6d
        <EOI>
        [<ffffffff810378db>] ? native_safe_halt+0x6/0x8
        [<ffffffff810110c7>] default_idle+0x9/0xd
        [<ffffffff81011694>] arch_cpu_idle+0x13/0x1c
        [<ffffffff8107480d>] cpu_startup_entry+0xbc/0x137
        [<ffffffff8102e741>] start_secondary+0x1a0/0x1a5
       Code: 24 14 e8 f1 e5 01 00 31 d2 a8 32 0f 95 c2 49 8b 44 24 2c 49 0b 44 24 24 74 05 83 ca 04 eb 1c 4d 85 ed 74 17 49 8b 85 a8 02 00 00 <66> 8b 40 46 66 c1 e8 07 83 e0 07 c1 e0 03 09 c2 4c 89 e6 48 89
       RIP  [<ffffffff8143c459>] ip6_route_output+0x58/0x82
        RSP <ffff88007fd03668>
       CR2: 0000000000000046
       ---[ end trace 4612329caab37efd ]---
      
      When a vxlan interface is created without an explicit group definition,
      the default_dst protocol family is initialized to AF_UNSPEC and the
      driver assumes an IPv4 configuration. However, the default_dst protocol
      family is also used to differentiate between the IPv4 and IPv6 cases,
      and since AF_UNSPEC != AF_INET, processing takes the IPv6 path.
      
      Making the IPv4 assumption explicit by setting the default_dst protocol
      family to AF_INET, and preventing the mixing of IPv4 and IPv6 addresses
      in snooped fdb entries, fixes the corner-case crashes.
      Signed-off-by: Mike Rapoport <mike.rapoport@ravellosystems.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>