1. 24 Jun, 2016 26 commits
    • Jakub Sitnicki's avatar
      ipv6: Skip XFRM lookup if dst_entry in socket cache is valid · 48fc8f94
      Jakub Sitnicki authored
      [ Upstream commit 00bc0ef5 ]
      
      At present we perform an xfrm_lookup() for each UDPv6 message we
      send. The lookup involves querying the flow cache (flow_cache_lookup)
      and, in case of a cache miss, creating an XFRM bundle.
      
      If we miss the flow cache, we can end up creating a new bundle and
      deriving the path MTU (xfrm_init_pmtu) from on an already transformed
      dst_entry, which we pass from the socket cache (sk->sk_dst_cache) down
      to xfrm_lookup(). This can happen only if we're caching the dst_entry
      in the socket, that is when we're using a connected UDP socket.
      
      To put it another way, the path MTU shrinks each time we miss the flow
      cache, which later on leads to incorrectly fragmented payload. It can
      be observed with ESPv6 in transport mode:
      
        1) Set up a transformation and lower the MTU to trigger fragmentation
          # ip xfrm policy add dir out src ::1 dst ::1 \
            tmpl src ::1 dst ::1 proto esp spi 1
          # ip xfrm state add src ::1 dst ::1 \
            proto esp spi 1 enc 'aes' 0x0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b
          # ip link set dev lo mtu 1500
      
        2) Monitor the packet flow and set up an UDP sink
          # tcpdump -ni lo -ttt &
          # socat udp6-listen:12345,fork /dev/null &
      
        3) Send a datagram that needs fragmentation with a connected socket
          # perl -e 'print "@" x 1470 | socat - udp6:[::1]:12345
          2016/06/07 18:52:52 socat[724] E read(3, 0x555bb3d5ba00, 8192): Protocol error
          00:00:00.000000 IP6 ::1 > ::1: frag (0|1448) ESP(spi=0x00000001,seq=0x2), length 1448
          00:00:00.000014 IP6 ::1 > ::1: frag (1448|32)
          00:00:00.000050 IP6 ::1 > ::1: ESP(spi=0x00000001,seq=0x3), length 1272
          (^ ICMPv6 Parameter Problem)
          00:00:00.000022 IP6 ::1 > ::1: ESP(spi=0x00000001,seq=0x5), length 136
      
        4) Compare it to a non-connected socket
          # perl -e 'print "@" x 1500' | socat - udp6-sendto:[::1]:12345
          00:00:40.535488 IP6 ::1 > ::1: frag (0|1448) ESP(spi=0x00000001,seq=0x6), length 1448
          00:00:00.000010 IP6 ::1 > ::1: frag (1448|64)
      
      What happens in step (3) is:
      
        1) when connecting the socket in __ip6_datagram_connect(), we
           perform an XFRM lookup, miss the flow cache, create an XFRM
           bundle, and cache the destination,
      
        2) afterwards, when sending the datagram, we perform an XFRM lookup,
           again, miss the flow cache (due to mismatch of flowi6_iif and
           flowi6_oif, which is an issue of its own), and recreate an XFRM
           bundle based on the cached (and already transformed) destination.
      
      To prevent the recreation of an XFRM bundle, avoid an XFRM lookup
      altogether whenever we already have a destination entry cached in the
      socket. This prevents the path MTU shrinkage and brings us on par with
      UDPv4.
      
      The fix also benefits connected PINGv6 sockets, another user of
      ip6_sk_dst_lookup_flow(), who also suffer messages being transformed
      twice.
      
      Joint work with Hannes Frederic Sowa.
      Reported-by: default avatarJan Tluka <jtluka@redhat.com>
      Signed-off-by: default avatarJakub Sitnicki <jkbs@redhat.com>
      Acked-by: default avatarHannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      48fc8f94
    • Guillaume Nault's avatar
      l2tp: fix configuration passed to setup_udp_tunnel_sock() · d532139b
      Guillaume Nault authored
      [ Upstream commit a5c5e2da ]
      
      Unused fields of udp_cfg must be all zeros. Otherwise
      setup_udp_tunnel_sock() fills ->gro_receive and ->gro_complete
      callbacks with garbage, eventually resulting in panic when used by
      udp_gro_receive().
      
      [   72.694123] BUG: unable to handle kernel paging request at ffff880033f87d78
      [   72.695518] IP: [<ffff880033f87d78>] 0xffff880033f87d78
      [   72.696530] PGD 26e2067 PUD 26e3067 PMD 342ed063 PTE 8000000033f87163
      [   72.696530] Oops: 0011 [#1] SMP KASAN
      [   72.696530] Modules linked in: l2tp_ppp l2tp_netlink l2tp_core ip6_udp_tunnel udp_tunnel pptp gre pppox ppp_generic slhc crc32c_intel ghash_clmulni_intel jitterentropy_rng sha256_generic hmac drbg ansi_cprng aesni_intel evdev aes_x86_64 ablk_helper cryptd lrw gf128mul glue_helper serio_raw acpi_cpufreq button proc\
      essor ext4 crc16 jbd2 mbcache virtio_blk virtio_net virtio_pci virtio_ring virtio
      [   72.696530] CPU: 3 PID: 0 Comm: swapper/3 Not tainted 4.7.0-rc1 #1
      [   72.696530] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Debian-1.8.2-1 04/01/2014
      [   72.696530] task: ffff880035b59700 ti: ffff880035b70000 task.ti: ffff880035b70000
      [   72.696530] RIP: 0010:[<ffff880033f87d78>]  [<ffff880033f87d78>] 0xffff880033f87d78
      [   72.696530] RSP: 0018:ffff880035f87bc0  EFLAGS: 00010246
      [   72.696530] RAX: ffffed000698f996 RBX: ffff88003326b840 RCX: ffffffff814cc823
      [   72.696530] RDX: ffff88003326b840 RSI: ffff880033e48038 RDI: ffff880034c7c780
      [   72.696530] RBP: ffff880035f87c18 R08: 000000000000a506 R09: 0000000000000000
      [   72.696530] R10: ffff880035f87b38 R11: ffff880034b9344d R12: 00000000ebfea715
      [   72.696530] R13: 0000000000000000 R14: ffff880034c7c780 R15: 0000000000000000
      [   72.696530] FS:  0000000000000000(0000) GS:ffff880035f80000(0000) knlGS:0000000000000000
      [   72.696530] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [   72.696530] CR2: ffff880033f87d78 CR3: 0000000033c98000 CR4: 00000000000406a0
      [   72.696530] Stack:
      [   72.696530]  ffffffff814cc834 ffff880034b93468 0000001481416818 ffff88003326b874
      [   72.696530]  ffff880034c7ccb0 ffff880033e48038 ffff88003326b840 ffff880034b93462
      [   72.696530]  ffff88003326b88a ffff88003326b88c ffff880034b93468 ffff880035f87c70
      [   72.696530] Call Trace:
      [   72.696530]  <IRQ>
      [   72.696530]  [<ffffffff814cc834>] ? udp_gro_receive+0x1c6/0x1f9
      [   72.696530]  [<ffffffff814ccb1c>] udp4_gro_receive+0x2b5/0x310
      [   72.696530]  [<ffffffff814d989b>] inet_gro_receive+0x4a3/0x4cd
      [   72.696530]  [<ffffffff81431b32>] dev_gro_receive+0x584/0x7a3
      [   72.696530]  [<ffffffff810adf7a>] ? __lock_is_held+0x29/0x64
      [   72.696530]  [<ffffffff814321f7>] napi_gro_receive+0x124/0x21d
      [   72.696530]  [<ffffffffa000b145>] virtnet_receive+0x8df/0x8f6 [virtio_net]
      [   72.696530]  [<ffffffffa000b27e>] virtnet_poll+0x1d/0x8d [virtio_net]
      [   72.696530]  [<ffffffff81431350>] net_rx_action+0x15b/0x3b9
      [   72.696530]  [<ffffffff815893d6>] __do_softirq+0x216/0x546
      [   72.696530]  [<ffffffff81062392>] irq_exit+0x49/0xb6
      [   72.696530]  [<ffffffff81588e9a>] do_IRQ+0xe2/0xfa
      [   72.696530]  [<ffffffff81587a49>] common_interrupt+0x89/0x89
      [   72.696530]  <EOI>
      [   72.696530]  [<ffffffff810b05df>] ? trace_hardirqs_on_caller+0x229/0x270
      [   72.696530]  [<ffffffff8102b3c7>] ? default_idle+0x1c/0x2d
      [   72.696530]  [<ffffffff8102b3c5>] ? default_idle+0x1a/0x2d
      [   72.696530]  [<ffffffff8102bb8c>] arch_cpu_idle+0xa/0xc
      [   72.696530]  [<ffffffff810a6c39>] default_idle_call+0x1a/0x1c
      [   72.696530]  [<ffffffff810a6d96>] cpu_startup_entry+0x15b/0x20f
      [   72.696530]  [<ffffffff81039a81>] start_secondary+0x12c/0x133
      [   72.696530] Code: ff ff ff ff ff ff ff ff ff ff 7f ff ff ff ff ff ff ff 7f 00 7e f8 33 00 88 ff ff 6d 61 58 81 ff ff ff ff 5e de 0a 81 ff ff ff ff <00> 5c e2 34 00 88 ff ff 00 00 00 00 00 00 00 00 00 00 00 00 00
      [   72.696530] RIP  [<ffff880033f87d78>] 0xffff880033f87d78
      [   72.696530]  RSP <ffff880035f87bc0>
      [   72.696530] CR2: ffff880033f87d78
      [   72.696530] ---[ end trace ad7758b9a1dccf99 ]---
      [   72.696530] Kernel panic - not syncing: Fatal exception in interrupt
      [   72.696530] Kernel Offset: disabled
      [   72.696530] ---[ end Kernel panic - not syncing: Fatal exception in interrupt
      
      v2: use empty initialiser instead of "{ NULL }" to avoid relying on
          first field's type.
      
      Fixes: 38fd2af2 ("udp: Add socket based GRO and config")
      Signed-off-by: default avatarGuillaume Nault <g.nault@alphalink.fr>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      d532139b
    • Toshiaki Makita's avatar
      bridge: Don't insert unnecessary local fdb entry on changing mac address · df181e84
      Toshiaki Makita authored
      [ Upstream commit 0b148def ]
      
      The missing br_vlan_should_use() test caused creation of an unneeded
      local fdb entry on changing mac address of a bridge device when there is
      a vlan which is configured on a bridge port but not on the bridge
      device.
      
      Fixes: 2594e906 ("bridge: vlan: add per-vlan struct and move to rhashtables")
      Signed-off-by: default avatarToshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
      Acked-by: default avatarNikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      df181e84
    • Yuchung Cheng's avatar
      tcp: record TLP and ER timer stats in v6 stats · 1a5a4a9d
      Yuchung Cheng authored
      [ Upstream commit ce3cf4ec ]
      
      The v6 tcp stats scan do not provide TLP and ER timer information
      correctly like the v4 version . This patch fixes that.
      
      Fixes: 6ba8a3b1 ("tcp: Tail loss probe (TLP)")
      Fixes: eed530b6 ("tcp: early retransmit")
      Signed-off-by: default avatarYuchung Cheng <ycheng@google.com>
      Signed-off-by: default avatarNeal Cardwell <ncardwell@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      1a5a4a9d
    • Daniel Borkmann's avatar
      bpf, trace: use READ_ONCE for retrieving file ptr · a6878f9f
      Daniel Borkmann authored
      [ Upstream commit 5b6c1b4d ]
      
      In bpf_perf_event_read() and bpf_perf_event_output(), we must use
      READ_ONCE() for fetching the struct file pointer, which could get
      updated concurrently, so we must prevent the compiler from potential
      refetching.
      
      We already do this with tail calls for fetching the related bpf_prog,
      but not so on stored perf events. Semantics for both are the same
      with regards to updates.
      
      Fixes: a43eec30 ("bpf: introduce bpf_perf_event_output() helper")
      Fixes: 35578d79 ("bpf: Implement function bpf_perf_event_read() that get the selected hardware PMU conuter")
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      a6878f9f
    • Elad Kanfi's avatar
      net: nps_enet: Disable interrupts before napi reschedule · 2d83aede
      Elad Kanfi authored
      [ Upstream commit 86651650 ]
      
      Since NAPI works by shutting down event interrupts when theres
      work and turning them on when theres none, the net driver must
      make sure that interrupts are disabled when it reschedules polling.
      By calling napi_reschedule, the driver switches to polling mode,
      therefor there should be no interrupt interference.
      Any received packets will be handled in nps_enet_poll by polling the HW
      indication of received packet until all packets are handled.
      Signed-off-by: default avatarElad Kanfi <eladkan@mellanox.com>
      Acked-by: default avatarNoam Camus <noamca@mellanox.com>
      Tested-by: default avatarAlexey Brodkin <abrodkin@synopsys.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      2d83aede
    • Chen Haiquan's avatar
      vxlan: Accept user specified MTU value when create new vxlan link · 2e42134e
      Chen Haiquan authored
      [ Upstream commit ce577668 ]
      
      When create a new vxlan link, example:
        ip link add vtap mtu 1440 type vxlan vni 1 dev eth0
      
      The argument "mtu" has no effect, because it is not set to conf->mtu. The
      default value is used in vxlan_dev_configure function.
      
      This problem was introduced by commit 0dfbdf41 (vxlan: Factor out device
      configuration).
      
      Fixes: 0dfbdf41 (vxlan: Factor out device configuration)
      Signed-off-by: default avatarChen Haiquan <oc@yunify.com>
      Acked-by: default avatarCong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      2e42134e
    • Marek Vasut's avatar
      net: stmmac: Fix incorrect memcpy source memory · 0a626fb8
      Marek Vasut authored
      [ Upstream commit 643d60bf ]
      
      The memcpy() currently copies mdio_bus_data into new_bus->irq, which
      makes no sense, since the mdio_bus_data structure contains more than
      just irqs. The code was likely supposed to copy mdio_bus_data->irqs
      into the new_bus->irq instead, so fix this.
      
      Fixes: e7f4dc35 ("mdio: Move allocation of interrupts into core")
      Signed-off-by: default avatarMarek Vasut <marex@denx.de>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Giuseppe Cavallaro <peppe.cavallaro@st.com>
      Cc: Alexandre Torgue <alexandre.torgue@st.com>
      Reviewed-by: default avatarAndrew Lunn <andrew@lunn.ch>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      0a626fb8
    • Feng Tang's avatar
      net: alx: use custom skb allocator · 5f47c0ca
      Feng Tang authored
      [ Upstream commit 26c5f03b ]
      
      This patch follows Eric Dumazet's commit 7b701764 for Atheros
      atl1c driver to fix one exactly same bug in alx driver, that the
      network link will be lost in 1-5 minutes after the device is up.
      
      My laptop Lenovo Y580 with Atheros AR8161 ethernet device hit the
      same problem with kernel 4.4, and it will be cured by Jarod Wilson's
      commit c406700c for alx driver which get merged in 4.5. But there
      are still some alx devices can't function well even with Jarod's
      patch, while this patch could make them work fine. More details on
      	https://bugzilla.kernel.org/show_bug.cgi?id=70761
      
      The debug shows the issue is very likely to be related with the RX
      DMA address, specifically 0x...f80, if RX buffer get 0x...f80 several
      times, their will be RX overflow error and device will stop working.
      
      For kernel 4.5.0 with Jarod's patch which works fine with my
      AR8161/Lennov Y580, if I made some change to the
      	__netdev_alloc_skb
      		--> __alloc_page_frag()
      to make the allocated buffer can get an address with 0x...f80,
      then the same error happens. If I make it to 0x...f40 or 0x....fc0,
      everything will be still fine. So I tend to believe that the
      0x..f80 address cause the silicon to behave abnormally.
      
      Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=70761
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Johannes Berg <johannes@sipsolutions.net>
      Cc: Jarod Wilson <jarod@redhat.com>
      Signed-off-by: default avatarFeng Tang <feng.tang@intel.com>
      Tested-by: default avatarOle Lukoie <olelukoie@mail.ru>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      5f47c0ca
    • Ivan Vecera's avatar
      team: don't call netdev_change_features under team->lock · 928cb370
      Ivan Vecera authored
      [ Upstream commit f6988cb6 ]
      
      The team_device_event() notifier calls team_compute_features() to fix
      vlan_features under team->lock to protect team->port_list. The problem is
      that subsequent __team_compute_features() calls netdev_change_features()
      to propagate vlan_features to upper vlan devices while team->lock is still
      taken. This can lead to deadlock when NETIF_F_LRO is modified on lower
      devices or team device itself.
      
      Example:
      The team0 as active backup with eth0 and eth1 NICs. Both eth0 & eth1 are
      LRO capable and LRO is enabled. Thus LRO is also enabled on team0.
      
      The command 'ethtool -K team0 lro off' now hangs due to this deadlock:
      
      dev_ethtool()
      -> ethtool_set_features()
       -> __netdev_update_features(team)
        -> netdev_sync_lower_features()
         -> netdev_update_features(lower_1)
          -> __netdev_update_features(lower_1)
          -> netdev_features_change(lower_1)
           -> call_netdevice_notifiers(...)
            -> team_device_event(lower_1)
             -> team_compute_features(team) [TAKES team->lock]
              -> netdev_change_features(team)
               -> __netdev_update_features(team)
                -> netdev_sync_lower_features()
                 -> netdev_update_features(lower_2)
                  -> __netdev_update_features(lower_2)
                  -> netdev_features_change(lower_2)
                   -> call_netdevice_notifiers(...)
                    -> team_device_event(lower_2)
                     -> team_compute_features(team) [DEADLOCK]
      
      The bug is present in team from the beginning but it appeared after the commit
      fd867d51 (net/core: generic support for disabling netdev features down stack)
      that adds synchronization of features with lower devices.
      
      Fixes: fd867d51 (net/core: generic support for disabling netdev features down stack)
      Cc: Jiri Pirko <jiri@resnulli.us>
      Signed-off-by: default avatarIvan Vecera <ivecera@redhat.com>
      Signed-off-by: default avatarJiri Pirko <jiri@mellanox.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      928cb370
    • Edward Cree's avatar
      sfc: on MC reset, clear PIO buffer linkage in TXQs · f9f1165b
      Edward Cree authored
      [ Upstream commit c0795bf6 ]
      
      Otherwise, if we fail to allocate new PIO buffers, our TXQs will try to
      use the old ones, which aren't there any more.
      
      Fixes: 183233be "sfc: Allocate and link PIO buffers; map them with write-combining"
      Signed-off-by: default avatarEdward Cree <ecree@solarflare.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f9f1165b
    • Gregory CLEMENT's avatar
      net: hwbm: Fix unbalanced spinlock in error case · cba1de5b
      Gregory CLEMENT authored
      [ Upstream commit b388fc74 ]
      
      When hwbm_pool_add exited in error the spinlock was not released. This
      patch fixes this issue.
      
      Fixes: 8cb2d8bf ("net: add a hardware buffer management helper API")
      Reported-by: default avatarJean-Jacques Hiblot <jjhiblot@traphandler.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarGregory CLEMENT <gregory.clement@free-electrons.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      cba1de5b
    • Gregory CLEMENT's avatar
      net: mvneta: Fix lacking spinlock initialization · 4360cde0
      Gregory CLEMENT authored
      [ Upstream commit 91c45e38 ]
      
      The spinlock used by the hwbm functions must be initialized by the
      network driver. This commit fixes this lack and the following erros when
      lockdep is enabled:
      
      INFO: trying to register non-static key.
      the code is fine but needs lockdep annotation.
      turning off the locking correctness validator.
      [<c010ff80>] (unwind_backtrace) from [<c010bd08>] (show_stack+0x10/0x14)
      [<c010bd08>] (show_stack) from [<c032913c>] (dump_stack+0xb4/0xe0)
      [<c032913c>] (dump_stack) from [<c01670e4>] (__lock_acquire+0x1f58/0x2060)
      [<c01670e4>] (__lock_acquire) from [<c0167dec>] (lock_acquire+0xa4/0xd0)
      [<c0167dec>] (lock_acquire) from [<c06f6650>] (_raw_spin_lock_irqsave+0x54/0x68)
      [<c06f6650>] (_raw_spin_lock_irqsave) from [<c058e830>] (hwbm_pool_add+0x1c/0xdc)
      [<c058e830>] (hwbm_pool_add) from [<c043f4e8>] (mvneta_bm_pool_use+0x338/0x490)
      [<c043f4e8>] (mvneta_bm_pool_use) from [<c0443198>] (mvneta_probe+0x654/0x1284)
      [<c0443198>] (mvneta_probe) from [<c03b894c>] (platform_drv_probe+0x4c/0xb0)
      [<c03b894c>] (platform_drv_probe) from [<c03b7158>] (driver_probe_device+0x214/0x2c0)
      [<c03b7158>] (driver_probe_device) from [<c03b72c4>] (__driver_attach+0xc0/0xc4)
      [<c03b72c4>] (__driver_attach) from [<c03b5440>] (bus_for_each_dev+0x68/0x9c)
      [<c03b5440>] (bus_for_each_dev) from [<c03b65b8>] (bus_add_driver+0x1a0/0x218)
      [<c03b65b8>] (bus_add_driver) from [<c03b79cc>] (driver_register+0x78/0xf8)
      [<c03b79cc>] (driver_register) from [<c01018f4>] (do_one_initcall+0x90/0x1dc)
      [<c01018f4>] (do_one_initcall) from [<c0900de4>] (kernel_init_freeable+0x15c/0x1fc)
      [<c0900de4>] (kernel_init_freeable) from [<c06eed90>] (kernel_init+0x8/0x114)
      [<c06eed90>] (kernel_init) from [<c0107910>] (ret_from_fork+0x14/0x24)
      
      Fixes: baa11ebc ("net: mvneta: Use the new hwbm framework")
      Reported-by: default avatarRussell King <rmk+kernel@armlinux.org.uk>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarGregory CLEMENT <gregory.clement@free-electrons.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      4360cde0
    • Daniel Borkmann's avatar
      bpf, inode: disallow userns mounts · 5614cb19
      Daniel Borkmann authored
      [ Upstream commit 612bacad ]
      
      Follow-up to commit e27f4a94 ("bpf: Use mount_nodev not mount_ns
      to mount the bpf filesystem"), which removes the FS_USERNS_MOUNT flag.
      
      The original idea was to have a per mountns instance instead of a
      single global fs instance, but that didn't work out and we had to
      switch to mount_nodev() model. The intent of that middle ground was
      that we avoid users who don't play nice to create endless instances
      of bpf fs which are difficult to control and discover from an admin
      point of view, but at the same time it would have allowed us to be
      more flexible with regard to namespaces.
      
      Therefore, since we now did the switch to mount_nodev() as a fix
      where individual instances are created, we also need to remove userns
      mount flag along with it to avoid running into mentioned situation.
      I don't expect any breakage at this early point in time with removing
      the flag and we can revisit this later should the requirement for
      this come up with future users. This and commit e27f4a94 have
      been split to facilitate tracking should any of them run into the
      unlikely case of causing a regression.
      
      Fixes: b2197755 ("bpf: add support for persistent maps/progs")
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: default avatarHannes Frederic Sowa <hannes@stressinduktion.org>
      Acked-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      5614cb19
    • Ezequiel Garcia's avatar
      ipv4: Fix non-initialized TTL when CONFIG_SYSCTL=n · 02aa6d24
      Ezequiel Garcia authored
      [ Upstream commit 049bbf58 ]
      
      Commit fa50d974 ("ipv4: Namespaceify ip_default_ttl sysctl knob")
      moves the default TTL assignment, and as side-effect IPv4 TTL now
      has a default value only if sysctl support is enabled (CONFIG_SYSCTL=y).
      
      The sysctl_ip_default_ttl is fundamental for IP to work properly,
      as it provides the TTL to be used as default. The defautl TTL may be
      used in ip_selected_ttl, through the following flow:
      
        ip_select_ttl
          ip4_dst_hoplimit
            net->ipv4.sysctl_ip_default_ttl
      
      This commit fixes the issue by assigning net->ipv4.sysctl_ip_default_ttl
      in net_init_net, called during ipv4's initialization.
      
      Without this commit, a kernel built without sysctl support will send
      all IP packets with zero TTL (unless a TTL is explicitly set, e.g.
      with setsockopt).
      
      Given a similar issue might appear on the other knobs that were
      namespaceify, this commit also moves them.
      
      Fixes: fa50d974 ("ipv4: Namespaceify ip_default_ttl sysctl knob")
      Signed-off-by: default avatarEzequiel Garcia <ezequiel@vanguardiasur.com.ar>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      02aa6d24
    • Nicolas Dichtel's avatar
      uapi glibc compat: fix compilation when !__USE_MISC in glibc · d03c60f4
      Nicolas Dichtel authored
      [ Upstream commit f0a3fdca ]
      
      These structures are defined only if __USE_MISC is set in glibc net/if.h
      headers, ie when _BSD_SOURCE or _SVID_SOURCE are defined.
      
      CC: Jan Engelhardt <jengelh@inai.de>
      CC: Josh Boyer <jwboyer@fedoraproject.org>
      CC: Stephen Hemminger <shemming@brocade.com>
      CC: Waldemar Brodkorb <mail@waldemar-brodkorb.de>
      CC: Gabriel Laskar <gabriel@lse.epita.fr>
      CC: Mikko Rapeli <mikko.rapeli@iki.fi>
      Fixes: 4a91cb61 ("uapi glibc compat: fix compile errors when glibc net/if.h included before linux/if.h")
      Signed-off-by: default avatarNicolas Dichtel <nicolas.dichtel@6wind.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      d03c60f4
    • Hannes Frederic Sowa's avatar
      udp: prevent skbs lingering in tunnel socket queues · e6b67488
      Hannes Frederic Sowa authored
      [ Upstream commit e5aed006 ]
      
      In case we find a socket with encapsulation enabled we should call
      the encap_recv function even if just a udp header without payload is
      available. The callbacks are responsible for correctly verifying and
      dropping the packets.
      
      Also, in case the header validation fails for geneve and vxlan we
      shouldn't put the skb back into the socket queue, no one will pick
      them up there.  Instead we can simply discard them in the respective
      encap_recv functions.
      Signed-off-by: default avatarHannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      e6b67488
    • Eric W. Biederman's avatar
      bpf: Use mount_nodev not mount_ns to mount the bpf filesystem · 6584810a
      Eric W. Biederman authored
      [ Upstream commit e27f4a94 ]
      
      While reviewing the filesystems that set FS_USERNS_MOUNT I spotted the
      bpf filesystem.  Looking at the code I saw a broken usage of mount_ns
      with current->nsproxy->mnt_ns. As the code does not acquire a
      reference to the mount namespace it can not possibly be correct to
      store the mount namespace on the superblock as it does.
      
      Replace mount_ns with mount_nodev so that each mount of the bpf
      filesystem returns a distinct instance, and the code is not buggy.
      
      In discussion with Hannes Frederic Sowa it was reported that the use
      of mount_ns was an attempt to have one bpf instance per mount
      namespace, in an attempt to keep resources that pin resources from
      hiding.  That intent simply does not work, the vfs is not built to
      allow that kind of behavior.  Which means that the bpf filesystem
      really is buggy both semantically and in it's implemenation as it does
      not nor can it implement the original intent.
      
      This change is userspace visible, but my experience with similar
      filesystems leads me to believe nothing will break with a model of each
      mount of the bpf filesystem is distinct from all others.
      
      Fixes: b2197755 ("bpf: add support for persistent maps/progs")
      Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Acked-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: default avatar"Eric W. Biederman" <ebiederm@xmission.com>
      Acked-by: default avatarHannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      6584810a
    • Jason Wang's avatar
      tuntap: correctly wake up process during uninit · b8dddea3
      Jason Wang authored
      [ Upstream commit addf8fc4 ]
      
      We used to check dev->reg_state against NETREG_REGISTERED after each
      time we are woke up. But after commit 9e641bdc ("net-tun:
      restructure tun_do_read for better sleep/wakeup efficiency"), it uses
      skb_recv_datagram() which does not check dev->reg_state. This will
      result if we delete a tun/tap device after a process is blocked in the
      reading. The device will wait for the reference count which was held
      by that process for ever.
      
      Fixes this by using RCV_SHUTDOWN which will be checked during
      sk_recv_datagram() before trying to wake up the process during uninit.
      
      Fixes: 9e641bdc ("net-tun: restructure tun_do_read for better
      sleep/wakeup efficiency")
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Xi Wang <xii@google.com>
      Cc: Michael S. Tsirkin <mst@redhat.com>
      Signed-off-by: default avatarJason Wang <jasowang@redhat.com>
      Acked-by: default avatarEric Dumazet <edumazet@google.com>
      Acked-by: default avatarMichael S. Tsirkin <mst@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      b8dddea3
    • Sabrina Dubroca's avatar
      macsec: fix netlink attribute for key id · a70e1edf
      Sabrina Dubroca authored
      [ Upstream commit 1968a0b8 ]
      
      In my last commit I replaced MACSEC_SA_ATTR_KEYID by
      MACSEC_SA_ATTR_KEY.
      
      Fixes: 8acca6ac ("macsec: key identifier is 128 bits, not 64")
      Signed-off-by: default avatarSabrina Dubroca <sd@queasysnail.net>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      a70e1edf
    • Jiri Pirko's avatar
      switchdev: pass pointer to fib_info instead of copy · 2e1d00bf
      Jiri Pirko authored
      [ Upstream commit da4ed551 ]
      
      The problem is that fib_info->nh is [0] so the struct fib_info
      allocation size depends on number of nexthops. If we just copy fib_info,
      we do not copy the nexthops info and driver accesses memory which is not
      ours.
      
      Given the fact that fib4 does not defer operations and therefore it does
      not need copy, just pass the pointer down to drivers as it was done
      before.
      
      Fixes: 850d0cbc ("switchdev: remove pointers from switchdev objects")
      Signed-off-by: default avatarJiri Pirko <jiri@mellanox.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      2e1d00bf
    • Richard Alpe's avatar
      tipc: fix nametable publication field in nl compat · 988ae90a
      Richard Alpe authored
      [ Upstream commit 03aaaa9b ]
      
      The publication field of the old netlink API should contain the
      publication key and not the publication reference.
      
      Fixes: 44a8ae94 (tipc: convert legacy nl name table dump to nl compat)
      Signed-off-by: default avatarRichard Alpe <richard.alpe@ericsson.com>
      Acked-by: default avatarJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      988ae90a
    • Herbert Xu's avatar
      netlink: Fix dump skb leak/double free · c55a7faa
      Herbert Xu authored
      [ Upstream commit 92964c79 ]
      
      When we free cb->skb after a dump, we do it after releasing the
      lock.  This means that a new dump could have started in the time
      being and we'll end up freeing their skb instead of ours.
      
      This patch saves the skb and module before we unlock so we free
      the right memory.
      
      Fixes: 16b304f3 ("netlink: Eliminate kmalloc in netlink dump operation.")
      Reported-by: default avatarBaozeng Ding <sploving1@gmail.com>
      Signed-off-by: default avatarHerbert Xu <herbert@gondor.apana.org.au>
      Acked-by: default avatarCong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      c55a7faa
    • Richard Alpe's avatar
      tipc: check nl sock before parsing nested attributes · ba7963c7
      Richard Alpe authored
      [ Upstream commit 45e093ae ]
      
      Make sure the socket for which the user is listing publication exists
      before parsing the socket netlink attributes.
      
      Prior to this patch a call without any socket caused a NULL pointer
      dereference in tipc_nl_publ_dump().
      Tested-and-reported-by: default avatarBaozeng Ding <sploving1@gmail.com>
      Signed-off-by: default avatarRichard Alpe <richard.alpe@ericsson.com>
      Acked-by: default avatarJon Maloy <jon.maloy@ericsson.cm>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      ba7963c7
    • Ewan D. Milne's avatar
      scsi: Add QEMU CD-ROM to VPD Inquiry Blacklist · 7c1f0a9c
      Ewan D. Milne authored
      commit fbd83006 upstream.
      
      Linux fails to boot as a guest with a QEMU CD-ROM:
      
      [    4.439488] ata2.00: ATAPI: QEMU CD-ROM, 0.8.2, max UDMA/100
      [    4.443649] ata2.00: configured for MWDMA2
      [    4.450267] scsi 1:0:0:0: CD-ROM            QEMU     QEMU CD-ROM      0.8. PQ: 0 ANSI: 5
      [    4.464317] ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
      [    4.464319] ata2.00: BMDMA stat 0x5
      [    4.464339] ata2.00: cmd a0/01:00:00:00:01/00:00:00:00:00/a0 tag 0 dma 16640 in
      [    4.464339]          Inquiry 12 01 00 00 ff 00res 48/20:02:00:24:00/00:00:00:00:00/a0 Emask 0x2 (HSM violation)
      [    4.464341] ata2.00: status: { DRDY DRQ }
      [    4.465864] ata2: soft resetting link
      [    4.625971] ata2.00: configured for MWDMA2
      [    4.628290] ata2: EH complete
      [    4.646670] ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
      [    4.646671] ata2.00: BMDMA stat 0x5
      [    4.646683] ata2.00: cmd a0/01:00:00:00:01/00:00:00:00:00/a0 tag 0 dma 16640 in
      [    4.646683]          Inquiry 12 01 00 00 ff 00res 48/20:02:00:24:00/00:00:00:00:00/a0 Emask 0x2 (HSM violation)
      [    4.646685] ata2.00: status: { DRDY DRQ }
      [    4.648193] ata2: soft resetting link
      
      ...
      
      Fix this by suppressing VPD inquiry for this device.
      Signed-off-by: default avatarEwan D. Milne <emilne@redhat.com>
      Reported-by: default avatarJan Stancek <jstancek@redhat.com>
      Tested-by: default avatarJan Stancek <jstancek@redhat.com>
      Reviewed-by: default avatarJohannes Thumshirn <jthumshirn@suse.de>
      Signed-off-by: default avatarMartin K. Petersen <martin.petersen@oracle.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      7c1f0a9c
    • James Bottomley's avatar
      scsi_lib: correctly retry failed zero length REQ_TYPE_FS commands · 2bb72cec
      James Bottomley authored
      commit a621bac3 upstream.
      
      When SCSI was written, all commands coming from the filesystem
      (REQ_TYPE_FS commands) had data.  This meant that our signal for needing
      to complete the command was the number of bytes completed being equal to
      the number of bytes in the request.  Unfortunately, with the advent of
      flush barriers, we can now get zero length REQ_TYPE_FS commands, which
      confuse this logic because they satisfy the condition every time.  This
      means they never get retried even for retryable conditions, like UNIT
      ATTENTION because we complete them early assuming they're done.  Fix
      this by special casing the early completion condition to recognise zero
      length commands with errors and let them drop through to the retry code.
      Reported-by: default avatarSebastian Parschauer <s.parschauer@gmx.de>
      Signed-off-by: default avatarJames E.J. Bottomley <jejb@linux.vnet.ibm.com>
      Tested-by: default avatarJack Wang <jinpu.wang@profitbricks.com>
      Signed-off-by: default avatarMartin K. Petersen <martin.petersen@oracle.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      2bb72cec
  2. 08 Jun, 2016 14 commits