1. 18 Sep, 2015 9 commits
    • vxlan: set needed headroom correctly · 9dc2ad10
      Jiri Benc authored
      vxlan_setup is called when allocating the net_device, i.e. way before
      vxlan_newlink (or vxlan_dev_configure) is called. This means
      vxlan->default_dst is actually unset in vxlan_setup and the condition that
      sets needed_headroom always takes the else branch.
      
      Set the needed_headroom at the point when we have the information about
      the address family available.
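      A minimal sketch of the idea, not the actual driver code: the headroom
      depends on the outer IP header, so it can only be computed once the
      address family is known. The helper name and the assumption of a plain
      14-byte outer Ethernet header are illustrative; the header sizes are the
      standard fixed ones.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/socket.h>

/* Standard fixed header sizes in bytes. */
enum {
    ETH_HEADER_LEN   = 14,  /* assumption: untagged outer Ethernet header */
    IPV4_HEADER_LEN  = 20,
    IPV6_HEADER_LEN  = 40,
    UDP_HEADER_LEN   = 8,
    VXLAN_HEADER_LEN = 8,
};

/* Hypothetical helper: headroom to reserve once the remote family is known. */
size_t vxlan_headroom_for_family(int family)
{
    assert(family == AF_INET || family == AF_INET6);

    size_t ip_len = (family == AF_INET6) ? IPV6_HEADER_LEN : IPV4_HEADER_LEN;

    return ETH_HEADER_LEN + ip_len + UDP_HEADER_LEN + VXLAN_HEADER_LEN;
}
```

      Calling such a helper at setup time, before the destination is
      configured, would always see an unset family, which is why the
      computation has to move to configuration time.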
      
      Fixes: e4c7ed41 ("vxlan: add ipv6 support")
      Fixes: 2853af6a ("vxlan: use dev->needed_headroom instead of dev->hard_header_len")
      CC: Cong Wang <cwang@twopensource.com>
      Signed-off-by: Jiri Benc <jbenc@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • MAINTAINERS: add arcnet and take maintainership · c38f6ac7
      Michael Grzeschik authored
      Add entry for arcnet to MAINTAINERS file and add myself as the
      maintainer of the subsystem.
      Signed-off-by: Michael Grzeschik <m.grzeschik@pengutronix.de>
      Cc: davem@davemloft.net
      Cc: joe@perches.com
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ARCNET: fix hard_header_len limit · 980137a2
      Michael Grzeschik authored
      For arcnet the bare minimum header only contains the 4 bytes to
      specify source, dest and offset (1, 1 and 2 bytes respectively).
      The corresponding struct is struct arc_hardware.
      
      The struct archdr contains additionally a union of possible soft
      headers. When doing $insertusecasehere packets might well
      include short (or even no?) soft headers.
      
      For this reason only use arc_hardware instead of archdr to
      determine the hard_header_len for an arcnet device.
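      The size difference can be sketched as follows. struct arc_hardware
      follows the layout described above (1 + 1 + 2 bytes); the soft-header
      union and the demo_archdr wrapper are hypothetical stand-ins, since the
      real union members are not listed here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bare hardware header: source, dest and a 2-byte offset. */
struct arc_hardware {
    uint8_t source;
    uint8_t dest;
    uint8_t offset[2];
};

/* Compile-time check that the bare header is exactly 4 bytes. */
static_assert(sizeof(struct arc_hardware) == 4, "bare ARCnet header is 4 bytes");

/* Hypothetical stand-in for the union of soft headers. */
union demo_soft {
    uint8_t raw[8];
};

/* Hypothetical stand-in for a header struct that also carries a soft header. */
struct demo_archdr {
    struct arc_hardware hard;
    union demo_soft soft;
};

/* Minimum header length: hardware header only. */
size_t arc_min_header_len(void)
{
    return sizeof(struct arc_hardware);
}

/* Length if the whole struct, soft header included, were used instead. */
size_t arc_full_header_len(void)
{
    return sizeof(struct demo_archdr);
}
```

      Using the larger value as hard_header_len would reject or mis-handle
      packets that carry a shorter soft header than the union's largest member.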
      Signed-off-by: Michael Grzeschik <m.grzeschik@pengutronix.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth · 1dbb2413
      David S. Miller authored
      Johan Hedberg says:
      
      ====================
      pull request: bluetooth 2015-09-17
      
      Here's one important patch for the 4.3-rc series that fixes an issue
      with Bluetooth LE encryption failing because of a too early check for
      the SMP context.
      
      Please let me know if there are any issues pulling. Thanks.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • atm: deal with setting entry before mkip was called · 34f5b006
      Sasha Levin authored
      If ATMARP_MKIP was not called before ATMARP_ENCAP, the VCC descriptor
      does not exist and we end up dereferencing a NULL pointer:
      
      [1033173.491930] kasan: GPF could be caused by NULL-ptr deref or user memory accessirq event stamp: 123386
      [1033173.493678] general protection fault: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC KASAN
      [1033173.493689] Modules linked in:
      [1033173.493697] CPU: 9 PID: 23815 Comm: trinity-c64 Not tainted 4.2.0-next-20150911-sasha-00043-g353d875-dirty #2545
      [1033173.493706] task: ffff8800630c4000 ti: ffff880063110000 task.ti: ffff880063110000
      [1033173.493823] RIP: clip_ioctl (net/atm/clip.c:320 net/atm/clip.c:689)
      [1033173.493826] RSP: 0018:ffff880063117a88  EFLAGS: 00010203
      [1033173.493828] RAX: dffffc0000000000 RBX: 0000000000000000 RCX: 000000000000000c
      [1033173.493830] RDX: 0000000000000002 RSI: ffffffffb3f10720 RDI: 0000000000000014
      [1033173.493832] RBP: ffff880063117b80 R08: ffff88047574d9a4 R09: 0000000000000000
      [1033173.493834] R10: 0000000000000000 R11: 0000000000000000 R12: 1ffff1000c622f53
      [1033173.493836] R13: ffff8800cb905500 R14: ffff8808d6da2000 R15: 00000000fffffdfd
      [1033173.493840] FS:  00007fa56b92d700(0000) GS:ffff880478000000(0000) knlGS:0000000000000000
      [1033173.493843] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
      [1033173.493845] CR2: 0000000000000000 CR3: 00000000630e8000 CR4: 00000000000006a0
      [1033173.493855] Stack:
      [1033173.493862]  ffffffffb0b60444 000000000000eaea 0000000041b58ab3 ffffffffb3c3ce32
      [1033173.493867]  ffffffffb0b6f3e0 ffffffffb0b60444 ffffffffb5ea2e50 1ffff1000c622f5e
      [1033173.493873]  ffff8800630c4cd8 00000000000ee09a ffffffffb3ec4888 ffffffffb5ea2de8
      [1033173.493874] Call Trace:
      [1033173.494108] do_vcc_ioctl (net/atm/ioctl.c:170)
      [1033173.494113] vcc_ioctl (net/atm/ioctl.c:189)
      [1033173.494116] svc_ioctl (net/atm/svc.c:605)
      [1033173.494200] sock_do_ioctl (net/socket.c:874)
      [1033173.494204] sock_ioctl (net/socket.c:958)
      [1033173.494244] do_vfs_ioctl (fs/ioctl.c:43 fs/ioctl.c:607)
      [1033173.494290] SyS_ioctl (fs/ioctl.c:622 fs/ioctl.c:613)
      [1033173.494295] entry_SYSCALL_64_fastpath (arch/x86/entry/entry_64.S:186)
      [1033173.494362] Code: fa 48 c1 ea 03 80 3c 02 00 0f 85 50 09 00 00 49 8b 9e 60 06 00 00 48 b8 00 00 00 00 00 fc ff df 48 8d 7b 14 48 89 fa 48 c1 ea 03 <0f> b6 04 02 48 89 fa 83 e2 07 38 d0 7f 08 84 c0 0f 85 14 09 00
      All code
      ========
         0:   fa                      cli
         1:   48 c1 ea 03             shr    $0x3,%rdx
         5:   80 3c 02 00             cmpb   $0x0,(%rdx,%rax,1)
         9:   0f 85 50 09 00 00       jne    0x95f
         f:   49 8b 9e 60 06 00 00    mov    0x660(%r14),%rbx
        16:   48 b8 00 00 00 00 00    movabs $0xdffffc0000000000,%rax
        1d:   fc ff df
        20:   48 8d 7b 14             lea    0x14(%rbx),%rdi
        24:   48 89 fa                mov    %rdi,%rdx
        27:   48 c1 ea 03             shr    $0x3,%rdx
        2b:*  0f b6 04 02             movzbl (%rdx,%rax,1),%eax               <-- trapping instruction
        2f:   48 89 fa                mov    %rdi,%rdx
        32:   83 e2 07                and    $0x7,%edx
        35:   38 d0                   cmp    %dl,%al
        37:   7f 08                   jg     0x41
        39:   84 c0                   test   %al,%al
        3b:   0f 85 14 09 00 00       jne    0x955
      
      Code starting with the faulting instruction
      ===========================================
         0:   0f b6 04 02             movzbl (%rdx,%rax,1),%eax
         4:   48 89 fa                mov    %rdi,%rdx
         7:   83 e2 07                and    $0x7,%edx
         a:   38 d0                   cmp    %dl,%al
         c:   7f 08                   jg     0x16
         e:   84 c0                   test   %al,%al
        10:   0f 85 14 09 00 00       jne    0x92a
      [1033173.494366] RIP clip_ioctl (net/atm/clip.c:320 net/atm/clip.c:689)
      [1033173.494368]  RSP <ffff880063117a88>
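      The register values in the trace can be checked by hand. The sketch below
      assumes the usual x86-64 KASAN mapping, shadow = (addr >> 3) + offset,
      with the offset taken from the movabs in the dump, and 48-bit canonical
      addresses. With RBX = 0 (the NULL pointer), the lea yields RDI = 0x14 and
      the shift yields RDX = 2, matching the dump. The function names are
      illustrative.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define KASAN_SHADOW_OFFSET      0xdffffc0000000000ULL /* constant loaded by movabs */
#define KASAN_SHADOW_SCALE_SHIFT 3                     /* shr $0x3 in the dump */

#define KASAN_SHADOW(addr) \
    (((addr) >> KASAN_SHADOW_SCALE_SHIFT) + KASAN_SHADOW_OFFSET)

/* Worked example: field at offset 0x14 of a NULL struct pointer. */
static_assert(KASAN_SHADOW(0x14ULL) == 0xdffffc0000000002ULL,
              "shadow address computed from RDX=2 and RAX=offset");

/* Shadow byte address that the instrumented code reads before the access. */
uint64_t kasan_shadow_addr(uint64_t addr)
{
    return KASAN_SHADOW(addr);
}

/* Assumption: 48-bit virtual addresses, so bits 63..47 must all be equal. */
bool is_canonical_x86_64(uint64_t addr)
{
    uint64_t top = addr >> 47;

    return top == 0 || top == 0x1ffff;
}
```

      The shadow address for a near-NULL pointer is non-canonical, so the
      shadow load itself traps, which is consistent with a general protection
      fault being reported rather than a page fault.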
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ipv6: ip6_fragment: fix headroom tests and skb leak · 1d325d21
      Florian Westphal authored
      David Woodhouse reports skb_under_panic when we try to push ethernet
      header to fragmented ipv6 skbs:
      
       skbuff: skb_under_panic: text:c1277f1e len:1294 put:14 head:dec98000
       data:dec97ffc tail:0xdec9850a end:0xdec98f40 dev:br-lan
      [..]
      ip6_finish_output2+0x196/0x4da
      
      David further debugged this:
        [..] offending fragments were arriving here with skb_headroom(skb)==10.
        Which is reasonable, being the Solos ADSL card's header of 8 bytes
        followed by 2 bytes of PPP frame type.
      
      The problem is that if netfilter ipv6 defragmentation is used, skb_cow()
      in ip6_forward will only see reassembled skb.
      
      Therefore, headroom is overestimated by 8 bytes (we pulled fragment
      header) and we don't check the skbs in the frag_list either.
      
      We can't do these checks in netfilter defrag since outdev isn't known yet.
      
      Furthermore, existing tests in ip6_fragment did not consider the fragment
      or ipv6 header size when checking headroom of the fraglist skbs.
      
      While at it, also fix a skb leak on memory allocation -- ip6_fragment
      must consume the skb.
      
      I tested this with an e1000 driver hacked to not allocate additional
      headroom (we end up in slowpath, since LL_RESERVED_SPACE is 16).
      
      If 2 bytes of headroom are allocated, fastpath is taken (14 byte
      ethernet header was pulled, so 16 byte headroom available in all
      fragments).
      Reported-by: David Woodhouse <dwmw2@infradead.org>
      Diagnosed-by: David Woodhouse <dwmw2@infradead.org>
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Tested-by: David Woodhouse <David.Woodhouse@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • solos-pci: Increase headroom on received packets · ce816eb0
      David Woodhouse authored
      A comment in include/linux/skbuff.h says that:
      
       * Various parts of the networking layer expect at least 32 bytes of
       * headroom, you should not reduce this.
      
      This was demonstrated by a panic when handling fragmented IPv6 packets:
      http://marc.info/?l=linux-netdev&m=144236093519172&w=2
      
      It's not entirely clear if that comment is still valid — and if it is,
      perhaps netif_rx() ought to be enforcing it with a warning.
      
      But either way, it is rather stupid from a performance point of view
      for us to be receiving packets into a buffer which doesn't have enough
      room to prepend an Ethernet header, because it means that *every*
      incoming packet is going to need to be reallocated. So let's fix that.
      Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
      Acked-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: ks8851: Export OF module alias information · 88c79664
      Javier Martinez Canillas authored
      Drivers need to export the OF id table, and it must be built into
      the module; otherwise udev won't have the necessary information to
      autoload the driver module when the device is registered via OF.
      Signed-off-by: Javier Martinez Canillas <javier@osg.samsung.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/mlx4_en: really allow to change RSS key · 4671fc6d
      Eric Dumazet authored
      When changing the RSS key, we do not want to overwrite the user-provided
      key with the one provided by netdev_rss_key_fill(), which is the host
      random key generated at boot time.
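      A minimal sketch of the intended behaviour, not the driver code: the
      host key seeds the private key once at init, and the step that programs
      the hardware copies the private key instead of refilling it. All demo_*
      names, the 40-byte key size and host_key_fill() (a stand-in for
      netdev_rss_key_fill()) are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define RSS_KEY_SIZE 40 /* assumed key size for this sketch */

struct demo_priv { uint8_t rss_key[RSS_KEY_SIZE]; };
struct demo_hw   { uint8_t rss_key[RSS_KEY_SIZE]; };

/* Stand-in for netdev_rss_key_fill(): writes a fixed "host" pattern. */
static void host_key_fill(uint8_t *buf, size_t len)
{
    memset(buf, 0xA5, len);
}

/* Seed the private key with the host key exactly once, at init time. */
void demo_init(struct demo_priv *p)
{
    host_key_fill(p->rss_key, RSS_KEY_SIZE);
}

/* A user-supplied key replaces the private key. */
void demo_set_user_key(struct demo_priv *p, const uint8_t *key)
{
    assert(key != NULL);
    memcpy(p->rss_key, key, RSS_KEY_SIZE);
}

/* Program the hardware from the private key; do not refill from the host key. */
void demo_program_hw(const struct demo_priv *p, struct demo_hw *hw)
{
    memcpy(hw->rss_key, p->rss_key, RSS_KEY_SIZE);
}

/* Returns true if a user key set after init reaches the hardware unchanged. */
bool demo_user_key_survives(void)
{
    struct demo_priv p;
    struct demo_hw hw;
    uint8_t user[RSS_KEY_SIZE];

    memset(user, 0x3C, sizeof(user));
    demo_init(&p);
    demo_set_user_key(&p, user);
    demo_program_hw(&p, &hw);

    return memcmp(hw.rss_key, user, RSS_KEY_SIZE) == 0;
}
```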
      
      Fixes: 947cbb0a ("net/mlx4_en: Support for configurable RSS hash function")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Eyal Perry <eyalpe@mellanox.com>
      CC: Amir Vadai <amirv@mellanox.com>
      Acked-by: Or Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  2. 17 Sep, 2015 7 commits
  3. 15 Sep, 2015 13 commits
    • dccp: drop null test before destroy functions · 20471ed4
      Julia Lawall authored
      Remove unneeded NULL test.
      
      The semantic patch that makes this change is as follows:
      (http://coccinelle.lip6.fr/)
      
      // <smpl>
      @@
      expression x;
      @@
      
      -if (x != NULL)
        \(kmem_cache_destroy\|mempool_destroy\|dma_pool_destroy\)(x);
      
      @@
      expression x;
      @@
      
      -if (x != NULL) {
        \(kmem_cache_destroy\|mempool_destroy\|dma_pool_destroy\)(x);
        x = NULL;
      -}
      // </smpl>
      Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: core: drop null test before destroy functions · adf78eda
      Julia Lawall authored
      Remove unneeded NULL test.
      
      The semantic patch that makes this change is as follows:
      (http://coccinelle.lip6.fr/)
      
      // <smpl>
      @@ expression x; @@
      -if (x != NULL) {
        \(kmem_cache_destroy\|mempool_destroy\|dma_pool_destroy\)(x);
        x = NULL;
      -}
      // </smpl>
      Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • atm: he: drop null test before destroy functions · 58d29e3c
      Julia Lawall authored
      Remove unneeded NULL test.
      
      The semantic patch that makes this change is as follows:
      (http://coccinelle.lip6.fr/)
      
      // <smpl>
      @@ expression x; @@
      -if (x != NULL)
        \(kmem_cache_destroy\|mempool_destroy\|dma_pool_destroy\)(x);
      // </smpl>
      Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • openvswitch: Fix mask generation for nested attributes. · 982b5270
      Jesse Gross authored
      Masks were added to OVS flows in a way that was backwards compatible
      with userspace programs that did not generate masks. As a result, it is
      possible that we may receive flows that do not have a mask and we need
      to synthesize one.
      
      Generating a mask requires iterating over attributes and descending into
      nested attributes. For each level we need to know the size to generate the
      correct mask. We do this with a linked table of attribute types.
      
      Although the logic to handle these nested attributes was there in concept,
      there are a number of bugs in practice. Examples include incomplete links
      between tables, variable length attributes being treated as nested and
      missing sanity checks.
      Signed-off-by: Jesse Gross <jesse@nicira.com>
      Acked-by: Pravin B Shelar <pshelar@nicira.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: stmmac: Use msleep rather then udelay for reset delay · 892aa01d
      Sjoerd Simons authored
      The reset delays used for stmmac are in the order of 10ms to 1 second,
      which is far too long for udelay usage, so switch to using msleep.
      
      Practically this fixes the PHY not being reliably detected in some cases
      as udelay wouldn't actually delay for long enough to let the phy
      reliably be reset.
      Signed-off-by: Sjoerd Simons <sjoerd.simons@collabora.co.uk>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • rtnetlink: catch -EOPNOTSUPP errors from ndo_bridge_getlink · d64f69b0
      Roopa Prabhu authored
      problem reported:
      	kernel 4.1.3
      	------------
      	# bridge vlan
      	port	vlan ids
      	eth0	 1 PVID Egress Untagged
      	 	90
      	 	91
      	 	92
      	 	93
      	 	94
      	 	95
      	 	96
      	 	97
      	 	98
      	 	99
      	 	100
      
      	vmbr0	 1 PVID Egress Untagged
      	 	94
      
      	kernel 4.2
      	-----------
      	# bridge vlan
      	port	vlan ids
      
      ndo_bridge_getlink can return -EOPNOTSUPP when an interface's
      ndo_bridge_getlink op is set to switchdev_port_bridge_getlink
      and CONFIG_SWITCHDEV is not defined. Today this can happen with
      bond, rocker and team devices. This patch adds -EOPNOTSUPP
      checks after calls to ndo_bridge_getlink.
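      The check can be sketched in userspace as follows. This is an
      illustration of the pattern, not the rtnetlink code: the callback type
      and all function names are hypothetical. A device that reports
      -EOPNOTSUPP is skipped, while any other negative value still aborts
      the dump.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical per-device callback: 0 on success, negative errno on failure. */
typedef int (*getlink_fn)(void);

static int getlink_ok(void)          { return 0; }
static int getlink_unsupported(void) { return -EOPNOTSUPP; }
static int getlink_broken(void)      { return -EINVAL; }

/* Returns the number of devices dumped, or the first real error. */
int dump_bridge_links(const getlink_fn *ops, size_t n)
{
    int dumped = 0;

    assert(ops != NULL);
    for (size_t i = 0; i < n; i++) {
        int err = ops[i]();

        if (err == -EOPNOTSUPP)
            continue;       /* op not supported here: skip, keep dumping */
        if (err < 0)
            return err;     /* any other error still stops the dump */
        dumped++;
    }
    return dumped;
}

/* One unsupported device followed by two working ones. */
int demo_mixed_devices(void)
{
    const getlink_fn ops[] = { getlink_unsupported, getlink_ok, getlink_ok };

    return dump_bridge_links(ops, 3);
}

/* A genuine error in the middle of the list. */
int demo_real_error(void)
{
    const getlink_fn ops[] = { getlink_ok, getlink_broken, getlink_ok };

    return dump_bridge_links(ops, 3);
}
```

      Without the -EOPNOTSUPP case, the first unsupported device would end the
      loop early, which matches the empty "bridge vlan" output shown above.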
      
      Fixes: 85fdb956 ("switchdev: cut over to new switchdev_port_bridge_getlink")
      Reported-by: Alexandre DERUMIER <aderumier@odiso.com>
      Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: mvneta: fix DMA buffer unmapping in mvneta_rx() · daf158d0
      Simon Guinot authored
      This patch fixes a regression introduced by commit a84e3289
      ("net: mvneta: fix refilling for Rx DMA buffers"). Due to this commit
      the newly allocated Rx buffers are DMA-unmapped in place of those passed
      to the networking stack. Obviously, this causes data corruption.
      
      This patch fixes the issue by ensuring that the right Rx buffers are
      DMA-unmapped.
      Reported-by: Oren Laskin <oren@igneous.io>
      Signed-off-by: Simon Guinot <simon.guinot@sequanux.org>
      Fixes: a84e3289 ("net: mvneta: fix refilling for Rx DMA buffers")
      Cc: <stable@vger.kernel.org> # v3.8+
      Tested-by: Oren Laskin <oren@igneous.io>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'ip6tunnel_dst' · 244b7f43
      David S. Miller authored
      Martin KaFai Lau says:
      
      ====================
      ipv6: Fix dst_entry refcnt bugs in ip6_tunnel
      
      v4:
      - Fix a compilation error in patch 5 when CONFIG_LOCKDEP is turned on and
        re-test it
      
      v3:
      - Merge a 'if else if' test in patch 4
      - Use rcu_dereference_protected in patch 5 to fix a sparse check when
        CONFIG_SPARSE_RCU_POINTER is enabled
      
      v2:
      - Add patch 4 and 5 to remove the spinlock
      
      v1:
      This patch series is to fix the dst refcnt bugs in ip6_tunnel.
      
      Patch 1 and 2 are the prep works.  Patch 3 is the fix.
      
      I can reproduce the bug by adding and removing the ip6gre tunnel
      while running a super_netperf TCP_CRR test.  I get the following
      trace by adding WARN_ON_ONCE(newrefcnt < 0) to dst_release():
      
      [  312.760432] ------------[ cut here ]------------
      [  312.774664] WARNING: CPU: 2 PID: 10263 at net/core/dst.c:288 dst_release+0xf3/0x100()
      [  312.776041] Modules linked in: k10temp coretemp hwmon ip6_gre ip6_tunnel tunnel6 ipmi_devintf ipmi_msghandler ip6table_filter ip6_tables xt_NFLOG nfnetlink_log nfnetlink xt_comment xt_statistic iptable_filter ip_tables x_tables nfsv3 nfs_acl nfs fscache lockd grace mptctl netconsole autofs4 rpcsec_gss_krb5 auth_rpcgss oid_registry sunrpc ipv6 dm_mod loop iTCO_wdt iTCO_vendor_support serio_raw rtc_cmos pcspkr i2c_i801 i2c_core lpc_ich mfd_core ehci_pci ehci_hcd e1000e mlx4_en ptp pps_core vxlan udp_tunnel ip6_udp_tunnel mlx4_core sg button ext3 jbd mpt2sas raid_class
      [  312.785302] CPU: 2 PID: 10263 Comm: netperf Not tainted 4.2.0-rc8-00046-g4db9b63-dirty #15
      [  312.791695] Hardware name: Quanta Freedom /Windmill-EP, BIOS F03_3B04 09/12/2013
      [  312.792965]  ffffffff819dca2c ffff8811dfbdf6f8 ffffffff816537de ffff88123788fdb8
      [  312.794263]  0000000000000000 ffff8811dfbdf738 ffffffff81052646 ffff8811dfbdf768
      [  312.795593]  ffff881203a98180 00000000ffffffff ffff88242927a000 ffff88120a2532e0
      [  312.796946] Call Trace:
      [  312.797380]  [<ffffffff816537de>] dump_stack+0x45/0x57
      [  312.798288]  [<ffffffff81052646>] warn_slowpath_common+0x86/0xc0
      [  312.799699]  [<ffffffff8105273a>] warn_slowpath_null+0x1a/0x20
      [  312.800852]  [<ffffffff8159f9b3>] dst_release+0xf3/0x100
      [  312.801834]  [<ffffffffa03f1308>] ip6_tnl_dst_store+0x48/0x70 [ip6_tunnel]
      [  312.803738]  [<ffffffffa03fd0b6>] ip6gre_xmit2+0x536/0x720 [ip6_gre]
      [  312.804774]  [<ffffffffa03fd40a>] ip6gre_tunnel_xmit+0x16a/0x410 [ip6_gre]
      [  312.805986]  [<ffffffff8159934b>] dev_hard_start_xmit+0x23b/0x390
      [  312.808810]  [<ffffffff815a2f5f>] ? neigh_destroy+0xef/0x140
      [  312.809843]  [<ffffffff81599a6c>] __dev_queue_xmit+0x48c/0x4f0
      [  312.813931]  [<ffffffff81599ae3>] dev_queue_xmit_sk+0x13/0x20
      [  312.814993]  [<ffffffff815a0832>] neigh_direct_output+0x12/0x20
      [  312.817448]  [<ffffffffa021d633>] ip6_finish_output2+0x183/0x460 [ipv6]
      [  312.818762]  [<ffffffff81306fc5>] ? find_next_bit+0x15/0x20
      [  312.819671]  [<ffffffffa021fd79>] ip6_finish_output+0x89/0xe0 [ipv6]
      [  312.820720]  [<ffffffffa021fe14>] ip6_output+0x44/0xe0 [ipv6]
      [  312.821762]  [<ffffffff815c8809>] ? nf_hook_slow+0x69/0xc0
      [  312.823123]  [<ffffffffa021d232>] ip6_xmit+0x242/0x4c0 [ipv6]
      [  312.824073]  [<ffffffffa021c9f0>] ? ac6_proc_exit+0x20/0x20 [ipv6]
      [  312.825116]  [<ffffffffa024c751>] inet6_csk_xmit+0x61/0xa0 [ipv6]
      [  312.826127]  [<ffffffff815eb590>] tcp_transmit_skb+0x4f0/0x9b0
      [  312.827441]  [<ffffffff815ed267>] tcp_connect+0x637/0x7a0
      [  312.828327]  [<ffffffffa0245906>] tcp_v6_connect+0x2d6/0x550 [ipv6]
      [  312.829581]  [<ffffffff81606f05>] __inet_stream_connect+0x95/0x2f0
      [  312.830600]  [<ffffffff810ae13a>] ? hrtimer_try_to_cancel+0x1a/0xf0
      [  312.833456]  [<ffffffff812fba19>] ? timerqueue_add+0x59/0xb0
      [  312.834407]  [<ffffffff81607198>] inet_stream_connect+0x38/0x50
      [  312.835886]  [<ffffffff8157cb17>] SYSC_connect+0xb7/0xf0
      [  312.840035]  [<ffffffff810af6d3>] ? do_setitimer+0x1b3/0x200
      [  312.840983]  [<ffffffff810af75a>] ? alarm_setitimer+0x3a/0x70
      [  312.841941]  [<ffffffff8157d7ae>] SyS_connect+0xe/0x10
      [  312.842818]  [<ffffffff81659297>] entry_SYSCALL_64_fastpath+0x12/0x6a
      [  312.844206] ---[ end trace 43f3ecd86c3b1313 ]---
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ipv6: Replace spinlock with seqlock and rcu in ip6_tunnel · 70da5b5c
      Martin KaFai Lau authored
      This patch uses a seqlock to ensure consistency between idst->dst and
      idst->cookie.  It also makes dst freeing from the fib tree undergo an
      RCU grace period.
      Signed-off-by: Martin KaFai Lau <kafai@fb.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ipv6: Avoid double dst_free · 8e3d5be7
      Martin KaFai Lau authored
      This is prep work to make dst freeing from the fib tree undergo
      an RCU grace period.
      
      The following is a common paradigm:
      if (ip6_del_rt(rt))
      	dst_free(rt)
      
      which means: if rt cannot be deleted from the fib tree, call dst_free(rt) now.
      1. We don't know whether the ip6_del_rt(rt) failure is because it
         was not managed by the fib tree (e.g. DST_NOCACHE) or because it had
         already been removed from the fib tree.
      2. If rt had been managed by the fib tree, an ip6_del_rt(rt) failure means
         dst_free(rt) has been called already.  A second
         dst_free(rt) is not always obviously safe.  The rt may have
         been destroyed already.
      3. If rt is a DST_NOCACHE, dst_free(rt) should not be called.
      4. It is a blocker for making dst freeing from the fib tree undergo an
         RCU grace period.
      
      This patch uses the DST_NOCACHE flag to indicate that a rt is
      not managed by the fib tree.
      Signed-off-by: Martin KaFai Lau <kafai@fb.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ipv6: Fix dst_entry refcnt bugs in ip6_tunnel · cdf3464e
      Martin KaFai Lau authored
      Problems in the current dst_entry cache in the ip6_tunnel:
      
      1. ip6_tnl_dst_set is racy.  There is no lock to protect it:
         - One major problem is that the dst refcnt gets messed up. For
           example, the same dst_cache can be released multiple times,
           triggering the infamous dst refcnt < 0 warning message.
         - Another issue is the inconsistency between dst_cache and
           dst_cookie.
      
         It can be reproduced by adding and removing the ip6gre tunnel
         while running a super_netperf TCP_CRR test.
      
      2. ip6_tnl_dst_get does not take the dst refcnt before returning
         the dst.
      
      This patch:
      1. Create a percpu dst_entry cache in ip6_tnl
      2. Use a spinlock to protect the dst_cache operations
      3. ip6_tnl_dst_get always takes the dst refcnt before returning
      Signed-off-by: Martin KaFai Lau <kafai@fb.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ipv6: Rename the dst_cache helper functions in ip6_tunnel · f230d1e8
      Martin KaFai Lau authored
      This is prep work to fix the dst_entry refcnt bugs in
      ip6_tunnel.
      
      This patch renames:
      1. ip6_tnl_dst_check() to ip6_tnl_dst_get() to better
         reflect that it will take a dst refcnt in the next patch.
      2. ip6_tnl_dst_store() to ip6_tnl_dst_set() to have a more
         conventional name matching with ip6_tnl_dst_get().
      Signed-off-by: Martin KaFai Lau <kafai@fb.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ipv6: Refactor common ip6gre_tunnel_init codes · a3c119d3
      Martin KaFai Lau authored
      This is prep work to fix the dst_entry refcnt bugs in ip6_tunnel.
      
      This patch refactors common init code used by both
      ip6gre_tunnel_init and ip6gre_tap_init.
      Signed-off-by: Martin KaFai Lau <kafai@fb.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  4. 11 Sep, 2015 11 commits
    • irda: ali-ircc: Fix deadlock in ali_ircc_sir_change_speed() · e8684c88
      Alexey Khoroshilov authored
      ali_ircc_sir_change_speed() is always called with self->lock held,
      so acquiring the lock inside it leads to unavoidable deadlock.
      
      Call graph:
      ali_ircc_sir_change_speed() is called from ali_ircc_change_speed()
        ali_ircc_fir_hard_xmit() under spin_lock_irqsave(&self->lock, flags);
        ali_ircc_sir_hard_xmit() under spin_lock_irqsave(&self->lock, flags);
        ali_ircc_net_ioctl() under spin_lock_irqsave(&self->lock, flags);
        ali_ircc_dma_xmit_complete()
          ali_ircc_fir_interrupt()
            ali_ircc_interrupt() under spin_lock(&self->lock);
        ali_ircc_sir_write_wakeup()
          ali_ircc_sir_interrupt()
            ali_ircc_interrupt() under spin_lock(&self->lock);
      
      The patch removes spin_lock/unlock from ali_ircc_sir_change_speed().
      
      Found by Linux Driver Verification project (linuxtesting.org).
      Signed-off-by: Alexey Khoroshilov <khoroshilov@ispras.ru>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • openvswitch: Fix dependency on IPv6 defrag. · 38c089d1
      Joe Stringer authored
      When NF_CONNTRACK is built-in, NF_DEFRAG_IPV6 is a module, and
      OPENVSWITCH is built-in, the following build error would occur:
      
      net/built-in.o: In function `ovs_ct_execute':
      (.text+0x10f587): undefined reference to `nf_ct_frag6_gather'
      
      Fixes: 7f8a436e ("openvswitch: Add conntrack action")
      Reported-by: Jim Davis <jim.epost@gmail.com>
      Signed-off-by: Joe Stringer <joestringer@nicira.com>
      Acked-by: Pravin B Shelar <pshelar@nicira.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bridge: fix igmpv3 / mldv2 report parsing · c2d4fbd2
      Linus Lüssing authored
      With the newly introduced helper functions the skb pulling is hidden in
      the checksumming function - and undone before returning to the caller.
      
      The IGMPv3 and MLDv2 report parsing functions in the bridge still
      assumed that the skb is pointing to the beginning of the IGMP/MLD
      message while it is now kept at the beginning of the IPv4/6 header,
      breaking the message parsing and creating packet loss.
      
      Fix this by taking the offset between the IP and IGMP/MLD headers into
      account, too.
      
      Fixes: 9afd85c9 ("net: Export IGMP/MLD message validation code")
      Reported-by: Tobias Powalowski <tobias.powalowski@googlemail.com>
      Tested-by: Tobias Powalowski <tobias.powalowski@googlemail.com>
      Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bnx2x: use ktime_get_seconds() for timestamp · a19a19de
      Arnd Bergmann authored
      commit c48f350f "bnx2x: Add MFW dump support" added the
      bnx2x_update_mfw_dump() function that reads the current time and stores
      it in a 32-bit field that gets passed into a buffer in a fixed format.
      
      This is potentially broken when the epoch overflows in 2038, and
      otherwise overflows in 2106. As we're trying to avoid uses of
      struct timeval for this reason, I noticed the addition of this
      function, and tried to rewrite it in a way that is more explicit
      about the overflow and that will keep working once we deprecate
      struct timeval.
      
      I assume that it is not possible to change the ABI any more, otherwise
      we should try to use a 64-bit field for the seconds right away.
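      The two overflow dates follow from the width of the field. A signed
      32-bit seconds counter tops out at 2^31 - 1 (2038-01-19 03:14:07 UTC),
      an unsigned one at 2^32 - 1 (2106-02-07 06:28:15 UTC). The sketch below
      is illustrative only; the helper names are hypothetical and it is not
      the driver code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True while a seconds-since-epoch value still fits a signed 32-bit field. */
bool fits_signed_32(int64_t secs)
{
    return secs >= INT32_MIN && secs <= INT32_MAX;
}

/* True while it still fits an unsigned 32-bit field. */
bool fits_unsigned_32(int64_t secs)
{
    return secs >= 0 && secs <= (int64_t)UINT32_MAX;
}

/*
 * Explicit truncation to a fixed 32-bit field. The conversion is
 * well defined (modulo 2^32), so the wrap after 2106 is visible in
 * the code instead of being hidden in a narrower time type.
 */
uint32_t store_seconds_u32(int64_t secs)
{
    assert(secs >= 0);
    return (uint32_t)secs;
}
```

      Reading the clock as a 64-bit value and truncating in one visible place
      keeps the limitation local to the fixed-format buffer.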
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Cc: Yuval Mintz <Yuval.Mintz@qlogic.com>
      Cc: Ariel Elior <Ariel.Elior@qlogic.com>
      Acked-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • sctp: fix race on protocol/netns initialization · 8e2d61e0
      Marcelo Ricardo Leitner authored
      Consider the case where the sctp module is unloaded and is being
      requested because a user is creating an SCTP socket.
      
      During initialization, sctp will add the new protocol type and then
      initialize pernet subsys:
      
              status = sctp_v4_protosw_init();
              if (status)
                      goto err_protosw_init;
      
              status = sctp_v6_protosw_init();
              if (status)
                      goto err_v6_protosw_init;
      
              status = register_pernet_subsys(&sctp_net_ops);
      
      The problem is that after those calls to sctp_v{4,6}_protosw_init(), it
      is possible for userspace to create SCTP sockets as if the module were
      already fully loaded. If that happens, one possible effect is that we
      will have readers of the net->sctp.local_addr_list list earlier than
      expected, and sctp_net_init() does not take precautions while dealing
      with that list, leading to a potential panic. The effects are not limited
      to that, as sctp_sock_init() will copy a bunch of blank/partially
      initialized values from net->sctp.
      
      The race happens like this:
      
           CPU 0                           |  CPU 1
        socket()                           |
         __sock_create                     | socket()
          inet_create                      |  __sock_create
           list_for_each_entry_rcu(        |
              answer, &inetsw[sock->type], |
              list) {                      |   inet_create
            /* no hits */                  |
           if (unlikely(err)) {            |
            ...                            |
            request_module()               |
            /* socket creation is blocked  |
             * until module fully loaded   |
             */                            |
             sctp_init                     |
              sctp_v4_protosw_init         |
               inet_register_protosw       |
                list_add_rcu(&p->list,     |
                             last_perm);   |
                                           |  list_for_each_entry_rcu(
                                           |     answer, &inetsw[sock->type],
              sctp_v6_protosw_init         |     list) {
                                           |     /* hit, so assumes protocol
                                           |      * is already loaded
                                           |      */
                                           |  /* socket creation continues
                                           |   * before netns is initialized
                                           |   */
              register_pernet_subsys       |
      
      Simply inverting the initialization order between
      register_pernet_subsys() and sctp_v4_protosw_init() is not possible
      because register_pernet_subsys() will create a control sctp socket, so
      the protocol must already be visible by then. Deferring the socket
      creation to a workqueue is not good either, especially because we lose
      the ability to handle its errors.
      
      So, as suggested by Vlad, the fix is to split netns initialization into
      two stages: defaults and control socket. The defaults are already
      loaded by the time we register the protocol, while control socket
      initialization stays at the same point where it happens today.
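
The ordering described above can be modelled in plain user-space C. This is
only a sketch under stated assumptions: struct toy_net and every toy_*
function are hypothetical stand-ins invented for illustration, not the
kernel's data structures or the functions touched by the patch. It shows the
defaults being set up before the protocol becomes visible, with the control
socket created last.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for the per-netns SCTP state (not the kernel struct). */
struct toy_net {
	bool defaults_ready;   /* state that new sockets copy from */
	bool proto_visible;    /* socket() can now find the protocol */
	bool ctrl_sock_ready;  /* control socket has been created */
};

/* Stage 1: initialize only the state that sockets read at creation. */
int toy_defaults_init(struct toy_net *net)
{
	net->defaults_ready = true;
	return 0;
}

/* Make the protocol visible; models sctp_v{4,6}_protosw_init(). */
int toy_protosw_init(struct toy_net *net)
{
	net->proto_visible = true;
	return 0;
}

/* Stage 2: the control socket needs the protocol to be visible already. */
int toy_ctrlsock_init(struct toy_net *net)
{
	if (!net->proto_visible)
		return -1;
	net->ctrl_sock_ready = true;
	return 0;
}

/* A racing socket() is only safe once the defaults are in place. */
bool toy_socket_is_safe(const struct toy_net *net)
{
	return net->proto_visible && net->defaults_ready;
}

/* Split initialization order, with error unwinding in reverse. */
int toy_sctp_init(struct toy_net *net)
{
	int status;

	status = toy_defaults_init(net);
	if (status)
		return status;

	status = toy_protosw_init(net);
	if (status)
		goto err_protosw;

	/* A concurrent socket() may run from here on; defaults are ready. */
	status = toy_ctrlsock_init(net);
	if (status)
		goto err_ctrlsock;

	return 0;

err_ctrlsock:
	net->proto_visible = false;
err_protosw:
	net->defaults_ready = false;
	return status;
}
```

In this model, calling toy_protosw_init() on a zeroed struct toy_net without
toy_defaults_init() first leaves toy_socket_is_safe() false, which is the
window the race diagram describes.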
      
      Fixes: 4db67e80 ("sctp: Make the address lists per network namespace")
      Signed-off-by: Vlad Yasevich <vyasevich@gmail.com>
      Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      8e2d61e0
    • ebpf: emit correct src_reg for conditional jumps · 19539ce7
      Tycho Andersen authored
      Instead of always emitting BPF_REG_X, let's emit BPF_REG_X only when the
      source actually is BPF_X. Always emitting it makes programs generated by
      the classic converter non-importable via bpf(), as the eBPF verifier
      checks that the src_reg is correct or 0. While not a problem yet, this
      will become one when BPF_PROG_DUMP lands and we can potentially dump and
      re-import programs generated by the converter.
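
A minimal user-space sketch of the rule described above, not the kernel
converter itself. The TOY_* names, struct toy_insn and toy_convert_jump() are
hypothetical and exist only for this illustration. The opcode bit values
follow the classic BPF encoding; the register numbers are assumptions.

```c
#include <assert.h>
#include <stdint.h>

/* Classic BPF encoding bits. */
#define TOY_BPF_JMP       0x05
#define TOY_BPF_JEQ       0x10
#define TOY_BPF_K         0x00
#define TOY_BPF_X         0x08
#define TOY_BPF_SRC(code) ((code) & 0x08)

/* Illustrative register numbers for the converter's A and X registers. */
#define TOY_REG_A 0
#define TOY_REG_X 7

/* Simplified instruction layout, invented for this sketch. */
struct toy_insn {
	uint8_t code;
	uint8_t dst_reg;
	uint8_t src_reg;
	int32_t imm;
};

/* Convert one classic conditional jump into the toy eBPF form. */
struct toy_insn toy_convert_jump(uint16_t classic_code, uint32_t k)
{
	struct toy_insn insn = {0};
	uint8_t src = TOY_BPF_SRC(classic_code);

	insn.code = (uint8_t)(TOY_BPF_JMP | (classic_code & 0xf0) | src);
	insn.dst_reg = TOY_REG_A;
	/* Only name X as the source when the jump really compares with X. */
	insn.src_reg = (src == TOY_BPF_X) ? TOY_REG_X : 0;
	/* With BPF_K the comparison operand is the immediate k. */
	insn.imm = (src == TOY_BPF_K) ? (int32_t)k : 0;
	return insn;
}
```

For a BPF_K jump the sketch leaves src_reg at 0, which is the value the
commit message says the verifier accepts for such instructions.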
      Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com>
      CC: Alexei Starovoitov <ast@kernel.org>
      CC: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      19539ce7
    • netlink, mmap: transform mmap skb into full skb on taps · 1853c949
      Daniel Borkmann authored
      Ken-ichirou reported that running netlink in mmap mode for receive in
      combination with nlmon will throw a NULL pointer dereference in
      __kfree_skb() on nlmon_xmit(). In my case I can also trigger an "unable
      to handle kernel paging request". The problem is the skb_clone() in
      __netlink_deliver_tap_skb() for skbs that are mmaped.
      
      The cloned skb doesn't have a destructor, whereas the mmap netlink skb
      has it pointed to netlink_skb_destructor(), set in the handler
      netlink_ring_setup_skb(). That destructor sets skb->head to NULL so
      that __kfree_skb() doesn't perform a skb_release_data() via
      skb_release_all(), which would otherwise free skb->head through
      kfree(head) into the slab allocator even though the netlink mmap
      skb->head points to the mmap buffer. The same has to be done for large
      netlink skbs where the data area is vmalloced. Therefore, as discussed,
      make a copy for these rather rare cases for now. This fixes the issue
      on my and Ken-ichirou's test-cases.
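
The decision described above, copy instead of clone when the data area is not
an ordinary allocation, can be sketched in user-space C. Everything here is
an assumption made for illustration: struct toy_skb, enum toy_backing and the
toy_* helpers are invented names and do not mirror the kernel's sk_buff API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* How the data area of a toy buffer was obtained (illustrative only). */
enum toy_backing { TOY_SLAB, TOY_MMAP_RING, TOY_VMALLOC };

struct toy_skb {
	enum toy_backing backing;
	unsigned char *head;
	size_t len;
	bool shares_head;  /* true for a clone that aliases the original data */
};

/* Buffers whose data needs special release handling must not be aliased. */
bool toy_needs_full_copy(const struct toy_skb *skb)
{
	return skb->backing == TOY_MMAP_RING || skb->backing == TOY_VMALLOC;
}

/* Produce the buffer handed to the tap: a private copy or a cheap clone. */
struct toy_skb *toy_tap_skb(const struct toy_skb *skb)
{
	struct toy_skb *nskb = malloc(sizeof(*nskb));

	if (!nskb)
		return NULL;
	*nskb = *skb;
	if (toy_needs_full_copy(skb)) {
		/* Private, ordinarily allocated data area for the tap. */
		nskb->head = malloc(skb->len ? skb->len : 1);
		if (!nskb->head) {
			free(nskb);
			return NULL;
		}
		memcpy(nskb->head, skb->head, skb->len);
		nskb->backing = TOY_SLAB;
		nskb->shares_head = false;
	} else {
		/* Ordinary buffer: sharing the data area is fine. */
		nskb->shares_head = true;
	}
	return nskb;
}

/* Release a tap buffer; only private copies own their data area. */
void toy_tap_free(struct toy_skb *nskb)
{
	if (nskb && !nskb->shares_head)
		free(nskb->head);
	free(nskb);
}
```

The copy costs an allocation and a memcpy, which the commit message accepts
because these cases are rather rare.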
      
      Reference: http://thread.gmane.org/gmane.linux.network/371129
      Fixes: bcbde0d4 ("net: netlink: virtual tap device management")
      Reported-by: Ken-ichirou MATSUZAWA <chamaken@gmail.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Tested-by: Ken-ichirou MATSUZAWA <chamaken@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      1853c949
    • Merge tag 'sound-fix-4.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound · 64d1def7
      Linus Torvalds authored
      Pull sound fixes from Takashi Iwai:
       "A collection of small fixes since the last update: the HD-audio quirks
        as usual with a USB-audio fix and a trivial fix for the old sparc
        driver"
      
      * tag 'sound-fix-4.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
        ALSA: usb-audio: Change internal PCM order
        ALSA: hda - Fix white noise on Dell M3800
        ALSA: hda - Use ALC880_FIXUP_FUJITSU for FSC Amilo M1437
        ALSA: hda - Enable headphone jack detect on old Fujitsu laptops
        ALSA: sparc: amd7930: Fix module autoload for OF platform driver
        ALSA: hda - Add some FIXUP quirks for white noise on Dell laptop.
      64d1def7
    • Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linux · 04d78e39
      Linus Torvalds authored
      Pull drm fixes from Dave Airlie:
       "Just a bunch of fixes to squeeze in before -rc1:
      
         - three nouveau regression fixes
      
         - one qxl regression fix
      
         - a bunch of i915 fixes
      
        ... and some core displayport/atomic fixes"
      
      * 'drm-fixes' of git://people.freedesktop.org/~airlied/linux:
        drm/nouveau/device: enable c800 quirk for tecra w50
        drm/nouveau/clk/gt215: Unbreak engine pausing for GT21x/MCP7x
        drm/nouveau/gr/nv04: fix big endian setting on gr context
        drm/qxl: validate monitors config modes
        drm/i915: Allow DSI dual link to be configured on any pipe
        drm/i915: Don't try to use DDR DVFS on CHV when disabled in the BIOS
        drm/i915: Fix CSR MMIO address check
        drm/i915: Limit the number of loops for reading a split 64bit register
        drm/i915: Fix broken mst get_hw_state.
        drm/i915: Pass hpd_status_i915[] to intel_get_hpd_pins() in pre-g4x
        uapi/drm/i915_drm.h: fix userspace compilation.
        drm/i915: Always mark the object as dirty when used by the GPU
        drm/dp: Add dp_aux_i2c_speed_khz module param to set the assume i2c bus speed
        drm/dp: Adjust i2c-over-aux retry count based on message size and i2c bus speed
        drm/dp: Define AUX_RETRY_INTERVAL as 500 us
        drm/atomic: Fix bookkeeping with TEST_ONLY, v3.
      04d78e39
    • Merge branch 'linux-4.3' of git://anongit.freedesktop.org/git/nouveau/linux-2.6 into drm-next · 9fbcc7c0
      Dave Airlie authored
      three nouveau regression fixes.
      * 'linux-4.3' of git://anongit.freedesktop.org/git/nouveau/linux-2.6:
        drm/nouveau/device: enable c800 quirk for tecra w50
        drm/nouveau/clk/gt215: Unbreak engine pausing for GT21x/MCP7x
        drm/nouveau/gr/nv04: fix big endian setting on gr context
      9fbcc7c0
    • Merge branch 'for-4.3/blkcg' of git://git.kernel.dk/linux-block · b0a1ea51
      Linus Torvalds authored
      Pull blk-cg updates from Jens Axboe:
       "A bit later in the cycle, but this has been in the block tree for a
        while.  This is basically four patchsets from Tejun, that improve our
        buffered cgroup writeback.  It was dependent on the other cgroup
        changes, but they went in earlier in this cycle.
      
        Series 1 is set of 5 patches that has cgroup writeback updates:
      
         - bdi_writeback iteration fix which could lead to some wb's being
           skipped or repeated during e.g. sync under memory pressure.
      
         - Simplification of wb work wait mechanism.
      
         - Writeback tracepoints updated to report cgroup.
      
        Series 2 is a set of updates for the CFQ cgroup writeback handling:
      
           cfq has always charged all async IOs to the root cgroup.  It didn't
           have much choice as writeback didn't know about cgroups and there
           was no way to tell who to blame for a given writeback IO.
           writeback finally grew support for cgroups and now tags each
           writeback IO with the appropriate cgroup to charge it against.
      
           This patchset updates cfq so that it follows the blkcg each bio is
           tagged with.  Async cfq_queues are now shared across cfq_group,
           which is per-cgroup, instead of per-request_queue cfq_data.  This
           makes all IOs follow the weight based IO resource distribution
           implemented by cfq.
      
           - Switched from GFP_ATOMIC to GFP_NOWAIT as suggested by Jeff.
      
           - Other misc review points addressed, acks added and rebased.
      
        Series 3 is the blkcg policy cleanup patches:
      
           This patchset contains assorted cleanups for blkcg_policy methods
           and blk[c]g_policy_data handling.
      
           - alloc/free added for blkg_policy_data.  exit dropped.
      
           - alloc/free added for blkcg_policy_data.
      
           - blk-throttle's async percpu allocation is replaced with direct
             allocation.
      
           - all methods now take blk[c]g_policy_data instead of blkcg_gq or
             blkcg.
      
        And finally, series 4 is a set of patches cleaning up the blkcg stats
        handling:
      
          blkcg's stats have always been somewhat of a mess.  This patchset
          tries to improve the situation a bit.
      
           - The following patches added to consolidate blkcg entry point and
             blkg creation.  This in itself is an improvement and helps
             collecting common stats on bio issue.
      
           - per-blkg stats now accounted on bio issue rather than request
             completion so that bio based and request based drivers can behave
             the same way.  The issue was spotted by Vivek.
      
           - cfq-iosched implements custom recursive stats and blk-throttle
             implements custom per-cpu stats.  This patchset makes blkcg core
             support both by default.
      
           - cfq-iosched and blk-throttle keep track of the same stats
             multiple times.  Unify them"
      
      * 'for-4.3/blkcg' of git://git.kernel.dk/linux-block: (45 commits)
        blkcg: use CGROUP_WEIGHT_* scale for io.weight on the unified hierarchy
        blkcg: s/CFQ_WEIGHT_*/CFQ_WEIGHT_LEGACY_*/
        blkcg: implement interface for the unified hierarchy
        blkcg: misc preparations for unified hierarchy interface
        blkcg: separate out tg_conf_updated() from tg_set_conf()
        blkcg: move body parsing from blkg_conf_prep() to its callers
        blkcg: mark existing cftypes as legacy
        blkcg: rename subsystem name from blkio to io
        blkcg: refine error codes returned during blkcg configuration
        blkcg: remove unnecessary NULL checks from __cfqg_set_weight_device()
        blkcg: reduce stack usage of blkg_rwstat_recursive_sum()
        blkcg: remove cfqg_stats->sectors
        blkcg: move io_service_bytes and io_serviced stats into blkcg_gq
        blkcg: make blkg_[rw]stat_recursive_sum() to be able to index into blkcg_gq
        blkcg: make blkcg_[rw]stat per-cpu
        blkcg: add blkg_[rw]stat->aux_cnt and replace cfq_group->dead_stats with it
        blkcg: consolidate blkg creation in blkcg_bio_issue_check()
        blk-throttle: improve queue bypass handling
        blkcg: move root blkg lookup optimization from throtl_lookup_tg() to __blkg_lookup()
        blkcg: inline [__]blkg_lookup()
        ...
      b0a1ea51