1. 16 Mar, 2017 40 commits
    • Andrew Collins's avatar
      net: Add netdev all_adj_list refcnt propagation to fix panic · 611854c5
      Andrew Collins authored
      [ Upstream commit 93409033 ]
      
      This is a respin of a patch to fix a relatively easily reproducible kernel
      panic related to the all_adj_list handling for netdevs in recent kernels.
      
      The following sequence of commands will reproduce the issue:
      
      ip link add link eth0 name eth0.100 type vlan id 100
      ip link add link eth0 name eth0.200 type vlan id 200
      ip link add name testbr type bridge
      ip link set eth0.100 master testbr
      ip link set eth0.200 master testbr
      ip link add link testbr mac0 type macvlan
      ip link delete dev testbr
      
      This creates an upper/lower tree of (excuse the poor ASCII art):
      
                  /---eth0.100-eth0
      mac0-testbr-
                  \---eth0.200-eth0
      
      When testbr is deleted, the all_adj_lists are walked, and eth0 is deleted twice from
      the mac0 list. Unfortunately, during setup in __netdev_upper_dev_link, only one
      reference to eth0 is added, so this results in a panic.
      
      This change adds reference count propagation so things are handled properly.
      
      Matthias Schiffer reported a similar crash in batman-adv:
      
      https://github.com/freifunk-gluon/gluon/issues/680
      https://www.open-mesh.org/issues/247
      
      which this patch also seems to resolve.
      Signed-off-by: default avatarAndrew Collins <acollins@cradlepoint.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      611854c5
    • Douglas Caetano dos Santos's avatar
      tcp: fix wrong checksum calculation on MTU probing · 8db120b9
      Douglas Caetano dos Santos authored
      [ Upstream commit 2fe664f1 ]
      
      With TCP MTU probing enabled and offload TX checksumming disabled,
      tcp_mtu_probe() calculated the wrong checksum when a fragment being copied
      into the probe's SKB had an odd length. This was caused by the direct use
      of skb_copy_and_csum_bits() to calculate the checksum, as it pads the
      fragment being copied, if needed. When this fragment was not the last, a
      subsequent call used the previous checksum without considering this
      padding.
      
      The effect was a stale connection in one way, as even retransmissions
      wouldn't solve the problem, because the checksum was never recalculated for
      the full SKB length.
      Signed-off-by: default avatarDouglas Caetano dos Santos <douglascs@taghos.com.br>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      8db120b9
    • Eric Dumazet's avatar
      net: avoid sk_forward_alloc overflows · 58302431
      Eric Dumazet authored
      [ Upstream commit 20c64d5c ]
      
      A malicious TCP receiver, sending SACK, can force the sender to split
      skbs in write queue and increase its memory usage.
      
      Then, when socket is closed and its write queue purged, we might
      overflow sk_forward_alloc (It becomes negative)
      
      sk_mem_reclaim() does nothing in this case, and more than 2GB
      are leaked from TCP perspective (tcp_memory_allocated is not changed)
      
      Then warnings trigger from inet_sock_destruct() and
      sk_stream_kill_queues() seeing a not zero sk_forward_alloc
      
      All TCP stack can be stuck because TCP is under memory pressure.
      
      A simple fix is to preemptively reclaim from sk_mem_uncharge().
      
      This makes sure a socket wont have more than 2 MB forward allocated,
      after burst and idle period.
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      58302431
    • Eric Dumazet's avatar
      tcp: fix overflow in __tcp_retransmit_skb() · 7bf2abbc
      Eric Dumazet authored
      [ Upstream commit ffb4d6c8 ]
      
      If a TCP socket gets a large write queue, an overflow can happen
      in a test in __tcp_retransmit_skb() preventing all retransmits.
      
      The flow then stalls and resets after timeouts.
      
      Tested:
      
      sysctl -w net.core.wmem_max=1000000000
      netperf -H dest -- -s 1000000000
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      7bf2abbc
    • Eric Dumazet's avatar
      net: fix sk_mem_reclaim_partial() · 8d7c2ffc
      Eric Dumazet authored
      commit 1a24e04e upstream.
      
      sk_mem_reclaim_partial() goal is to ensure each socket has
      one SK_MEM_QUANTUM forward allocation. This is needed both for
      performance and better handling of memory pressure situations in
      follow up patches.
      
      SK_MEM_QUANTUM is currently a page, but might be reduced to 4096 bytes
      as some arches have 64KB pages.
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      8d7c2ffc
    • Beniamino Galvani's avatar
      bonding: set carrier off for devices created through netlink · 6765bbc4
      Beniamino Galvani authored
      [ Upstream commit 005db31d ]
      
      Commit e826eafa ("bonding: Call netif_carrier_off after
      register_netdevice") moved netif_carrier_off() from bond_init() to
      bond_create(), but the latter is called only for initial default
      devices and ones created through sysfs:
      
       $ modprobe bonding
       $ echo +bond1 > /sys/class/net/bonding_masters
       $ ip link add bond2 type bond
       $ grep "MII Status" /proc/net/bonding/*
       /proc/net/bonding/bond0:MII Status: down
       /proc/net/bonding/bond1:MII Status: down
       /proc/net/bonding/bond2:MII Status: up
      
      Ensure that carrier is initially off also for devices created through
      netlink.
      Signed-off-by: default avatarBeniamino Galvani <bgalvani@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      6765bbc4
    • Bjørn Mork's avatar
      cdc_ncm: workaround for EM7455 "silent" data interface · 35342020
      Bjørn Mork authored
      [ Upstream commit c086e709 ]
      
      Several Lenovo users have reported problems with their Sierra
      Wireless EM7455 modem. The driver has loaded successfully and
      the MBIM management channel has appeared to work, including
      establishing a connection to the mobile network. But no frames
      have been received over the data interface.
      
      The problem affects all EM7455 and MC7455, and is assumed to
      affect other modems based on the same Qualcomm chipset and
      baseband firmware.
      
      Testing narrowed the problem down to what seems to be a
      firmware timing bug during initialization. Adding a short sleep
      while probing is sufficient to make the problem disappear.
      Experiments have shown that 1-2 ms is too little to have any
      effect, while 10-20 ms is enough to reliably succeed.
      Reported-by: default avatarStefan Armbruster <ml001@armbruster-it.de>
      Reported-by: default avatarRalph Plawetzki <ralph@purejava.org>
      Reported-by: default avatarAndreas Fett <andreas.fett@secunet.com>
      Reported-by: default avatarRasmus Lerdorf <rasmus@lerdorf.com>
      Reported-by: default avatarSamo Ratnik <samo.ratnik@gmail.com>
      Reported-and-tested-by: default avatarAleksander Morgado <aleksander@aleksander.es>
      Signed-off-by: default avatarBjørn Mork <bjorn@mork.no>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      35342020
    • Feng Tang's avatar
      net: alx: Work around the DMA RX overflow issue · e8e0d79f
      Feng Tang authored
      [ Upstream commit 881d0327 ]
      
      Note: This is a verified backported patch for stable 4.4 kernel, and it
      could also be applied to 4.3/4.2/4.1/3.18/3.16
      
      There is a problem with alx devices, that the network link will be
      lost in 1-5 minutes after the device is up.
      
      >From debugging without datasheet, we found the error always
      happen when the DMA RX address is set to 0x....fc0, which is very
      likely to be a HW/silicon problem.
      
      This patch will apply rx skb with 64 bytes longer space, and if the
      allocated skb has a 0x...fc0 address, it will use skb_resever(skb, 64)
      to advance the address, so that the RX overflow can be avoided.
      
      Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=70761Signed-off-by: default avatarFeng Tang <feng.tang@intel.com>
      Suggested-by: default avatarEric Dumazet <edumazet@google.com>
      Tested-by: default avatarOle Lukoie <olelukoie@mail.ru>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      e8e0d79f
    • Tom Goff's avatar
      ipmr/ip6mr: Initialize the last assert time of mfc entries. · 797a8385
      Tom Goff authored
      [ Upstream commit 70a0dec4 ]
      
      This fixes wrong-interface signaling on 32-bit platforms for entries
      created when jiffies > 2^31 + MFC_ASSERT_THRESH.
      Signed-off-by: default avatarTom Goff <thomas.goff@ll.mit.edu>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      797a8385
    • Simon Horman's avatar
      sit: correct IP protocol used in ipip6_err · 98ae009c
      Simon Horman authored
      [ Upstream commit d5d8760b ]
      
      Since 32b8a8e5 ("sit: add IPv4 over IPv4 support")
      ipip6_err() may be called for packets whose IP protocol is
      IPPROTO_IPIP as well as those whose IP protocol is IPPROTO_IPV6.
      
      In the case of IPPROTO_IPIP packets the correct protocol value is not
      passed to ipv4_update_pmtu() or ipv4_redirect().
      
      This patch resolves this problem by using the IP protocol of the packet
      rather than a hard-coded value. This appears to be consistent
      with the usage of the protocol of a packet by icmp_socket_deliver()
      the caller of ipip6_err().
      
      I was able to exercise the redirect case by using a setup where an ICMP
      redirect was received for the destination of the encapsulated packet.
      However, it appears that although incorrect the protocol field is not used
      in this case and thus no problem manifests.  On inspection it does not
      appear that a problem will manifest in the fragmentation needed/update pmtu
      case either.
      
      In short I believe this is a cosmetic fix. None the less, the use of
      IPPROTO_IPV6 seems wrong and confusing.
      Reviewed-by: default avatarDinan Gunawardena <dinan.gunawardena@netronome.com>
      Signed-off-by: default avatarSimon Horman <simon.horman@netronome.com>
      Acked-by: default avatarYOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      98ae009c
    • Jakub Sitnicki's avatar
      ipv6: Skip XFRM lookup if dst_entry in socket cache is valid · e9371d22
      Jakub Sitnicki authored
      [ Upstream commit 00bc0ef5 ]
      
      At present we perform an xfrm_lookup() for each UDPv6 message we
      send. The lookup involves querying the flow cache (flow_cache_lookup)
      and, in case of a cache miss, creating an XFRM bundle.
      
      If we miss the flow cache, we can end up creating a new bundle and
      deriving the path MTU (xfrm_init_pmtu) from on an already transformed
      dst_entry, which we pass from the socket cache (sk->sk_dst_cache) down
      to xfrm_lookup(). This can happen only if we're caching the dst_entry
      in the socket, that is when we're using a connected UDP socket.
      
      To put it another way, the path MTU shrinks each time we miss the flow
      cache, which later on leads to incorrectly fragmented payload. It can
      be observed with ESPv6 in transport mode:
      
        1) Set up a transformation and lower the MTU to trigger fragmentation
          # ip xfrm policy add dir out src ::1 dst ::1 \
            tmpl src ::1 dst ::1 proto esp spi 1
          # ip xfrm state add src ::1 dst ::1 \
            proto esp spi 1 enc 'aes' 0x0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b
          # ip link set dev lo mtu 1500
      
        2) Monitor the packet flow and set up an UDP sink
          # tcpdump -ni lo -ttt &
          # socat udp6-listen:12345,fork /dev/null &
      
        3) Send a datagram that needs fragmentation with a connected socket
          # perl -e 'print "@" x 1470 | socat - udp6:[::1]:12345
          2016/06/07 18:52:52 socat[724] E read(3, 0x555bb3d5ba00, 8192): Protocol error
          00:00:00.000000 IP6 ::1 > ::1: frag (0|1448) ESP(spi=0x00000001,seq=0x2), length 1448
          00:00:00.000014 IP6 ::1 > ::1: frag (1448|32)
          00:00:00.000050 IP6 ::1 > ::1: ESP(spi=0x00000001,seq=0x3), length 1272
          (^ ICMPv6 Parameter Problem)
          00:00:00.000022 IP6 ::1 > ::1: ESP(spi=0x00000001,seq=0x5), length 136
      
        4) Compare it to a non-connected socket
          # perl -e 'print "@" x 1500' | socat - udp6-sendto:[::1]:12345
          00:00:40.535488 IP6 ::1 > ::1: frag (0|1448) ESP(spi=0x00000001,seq=0x6), length 1448
          00:00:00.000010 IP6 ::1 > ::1: frag (1448|64)
      
      What happens in step (3) is:
      
        1) when connecting the socket in __ip6_datagram_connect(), we
           perform an XFRM lookup, miss the flow cache, create an XFRM
           bundle, and cache the destination,
      
        2) afterwards, when sending the datagram, we perform an XFRM lookup,
           again, miss the flow cache (due to mismatch of flowi6_iif and
           flowi6_oif, which is an issue of its own), and recreate an XFRM
           bundle based on the cached (and already transformed) destination.
      
      To prevent the recreation of an XFRM bundle, avoid an XFRM lookup
      altogether whenever we already have a destination entry cached in the
      socket. This prevents the path MTU shrinkage and brings us on par with
      UDPv4.
      
      The fix also benefits connected PINGv6 sockets, another user of
      ip6_sk_dst_lookup_flow(), who also suffer messages being transformed
      twice.
      
      Joint work with Hannes Frederic Sowa.
      Reported-by: default avatarJan Tluka <jtluka@redhat.com>
      Signed-off-by: default avatarJakub Sitnicki <jkbs@redhat.com>
      Acked-by: default avatarHannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      [bwh: Backported to 3.16: deleted code is slightly different]
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      e9371d22
    • Hannes Frederic Sowa's avatar
      udp: prevent skbs lingering in tunnel socket queues · df4db51e
      Hannes Frederic Sowa authored
      [ Upstream commit e5aed006 ]
      
      In case we find a socket with encapsulation enabled we should call
      the encap_recv function even if just a udp header without payload is
      available. The callbacks are responsible for correctly verifying and
      dropping the packets.
      
      Also, in case the header validation fails for geneve and vxlan we
      shouldn't put the skb back into the socket queue, no one will pick
      them up there.  Instead we can simply discard them in the respective
      encap_recv functions.
      Signed-off-by: default avatarHannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      [bwh: Backported to 3.16:
       - Drop changes to geneve
       - vxlan error checking looked a bit different]
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      df4db51e
    • Nikolay Aleksandrov's avatar
      net: bridge: fix old ioctl unlocked net device walk · 0a2628b8
      Nikolay Aleksandrov authored
      [ Upstream commit 31ca0458 ]
      
      get_bridge_ifindices() is used from the old "deviceless" bridge ioctl
      calls which aren't called with rtnl held. The comment above says that it is
      called with rtnl but that is not really the case.
      Here's a sample output from a test ASSERT_RTNL() which I put in
      get_bridge_ifindices and executed "brctl show":
      [  957.422726] RTNL: assertion failed at net/bridge//br_ioctl.c (30)
      [  957.422925] CPU: 0 PID: 1862 Comm: brctl Tainted: G        W  O
      4.6.0-rc4+ #157
      [  957.423009] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
      BIOS 1.8.1-20150318_183358- 04/01/2014
      [  957.423009]  0000000000000000 ffff880058adfdf0 ffffffff8138dec5
      0000000000000400
      [  957.423009]  ffffffff81ce8380 ffff880058adfe58 ffffffffa05ead32
      0000000000000001
      [  957.423009]  00007ffec1a444b0 0000000000000400 ffff880053c19130
      0000000000008940
      [  957.423009] Call Trace:
      [  957.423009]  [<ffffffff8138dec5>] dump_stack+0x85/0xc0
      [  957.423009]  [<ffffffffa05ead32>]
      br_ioctl_deviceless_stub+0x212/0x2e0 [bridge]
      [  957.423009]  [<ffffffff81515beb>] sock_ioctl+0x22b/0x290
      [  957.423009]  [<ffffffff8126ba75>] do_vfs_ioctl+0x95/0x700
      [  957.423009]  [<ffffffff8126c159>] SyS_ioctl+0x79/0x90
      [  957.423009]  [<ffffffff8163a4c0>] entry_SYSCALL_64_fastpath+0x23/0xc1
      
      Since it only reads bridge ifindices, we can use rcu to safely walk the net
      device list. Also remove the wrong rtnl comment above.
      Signed-off-by: default avatarNikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      0a2628b8
    • Ian Campbell's avatar
      VSOCK: do not disconnect socket when peer has shutdown SEND only · 97b7d6be
      Ian Campbell authored
      [ Upstream commit dedc58e0 ]
      
      The peer may be expecting a reply having sent a request and then done a
      shutdown(SHUT_WR), so tearing down the whole socket at this point seems
      wrong and breaks for me with a client which does a SHUT_WR.
      
      Looking at other socket family's stream_recvmsg callbacks doing a shutdown
      here does not seem to be the norm and removing it does not seem to have
      had any adverse effects that I can see.
      
      I'm using Stefan's RFC virtio transport patches, I'm unsure of the impact
      on the vmci transport.
      Signed-off-by: default avatarIan Campbell <ian.campbell@docker.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Stefan Hajnoczi <stefanha@redhat.com>
      Cc: Claudio Imbrenda <imbrenda@linux.vnet.ibm.com>
      Cc: Andy King <acking@vmware.com>
      Cc: Dmitry Torokhov <dtor@vmware.com>
      Cc: Jorgen Hansen <jhansen@vmware.com>
      Cc: Adit Ranadive <aditr@vmware.com>
      Cc: netdev@vger.kernel.org
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      97b7d6be
    • Neil Horman's avatar
      netem: Segment GSO packets on enqueue · ff962567
      Neil Horman authored
      [ Upstream commit 6071bd1a ]
      
      This was recently reported to me, and reproduced on the latest net kernel,
      when attempting to run netperf from a host that had a netem qdisc attached
      to the egress interface:
      
      [  788.073771] ---------------------[ cut here ]---------------------------
      [  788.096716] WARNING: at net/core/dev.c:2253 skb_warn_bad_offload+0xcd/0xda()
      [  788.129521] bnx2: caps=(0x00000001801949b3, 0x0000000000000000) len=2962
      data_len=0 gso_size=1448 gso_type=1 ip_summed=3
      [  788.182150] Modules linked in: sch_netem kvm_amd kvm crc32_pclmul ipmi_ssif
      ghash_clmulni_intel sp5100_tco amd64_edac_mod aesni_intel lrw gf128mul
      glue_helper ablk_helper edac_mce_amd cryptd pcspkr sg edac_core hpilo ipmi_si
      i2c_piix4 k10temp fam15h_power hpwdt ipmi_msghandler shpchp acpi_power_meter
      pcc_cpufreq nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs libcrc32c
      sd_mod crc_t10dif crct10dif_generic mgag200 syscopyarea sysfillrect sysimgblt
      i2c_algo_bit drm_kms_helper ahci ata_generic pata_acpi ttm libahci
      crct10dif_pclmul pata_atiixp tg3 libata crct10dif_common drm crc32c_intel ptp
      serio_raw bnx2 r8169 hpsa pps_core i2c_core mii dm_mirror dm_region_hash dm_log
      dm_mod
      [  788.465294] CPU: 16 PID: 0 Comm: swapper/16 Tainted: G        W
      ------------   3.10.0-327.el7.x86_64 #1
      [  788.511521] Hardware name: HP ProLiant DL385p Gen8, BIOS A28 12/17/2012
      [  788.542260]  ffff880437c036b8 f7afc56532a53db9 ffff880437c03670
      ffffffff816351f1
      [  788.576332]  ffff880437c036a8 ffffffff8107b200 ffff880633e74200
      ffff880231674000
      [  788.611943]  0000000000000001 0000000000000003 0000000000000000
      ffff880437c03710
      [  788.647241] Call Trace:
      [  788.658817]  <IRQ>  [<ffffffff816351f1>] dump_stack+0x19/0x1b
      [  788.686193]  [<ffffffff8107b200>] warn_slowpath_common+0x70/0xb0
      [  788.713803]  [<ffffffff8107b29c>] warn_slowpath_fmt+0x5c/0x80
      [  788.741314]  [<ffffffff812f92f3>] ? ___ratelimit+0x93/0x100
      [  788.767018]  [<ffffffff81637f49>] skb_warn_bad_offload+0xcd/0xda
      [  788.796117]  [<ffffffff8152950c>] skb_checksum_help+0x17c/0x190
      [  788.823392]  [<ffffffffa01463a1>] netem_enqueue+0x741/0x7c0 [sch_netem]
      [  788.854487]  [<ffffffff8152cb58>] dev_queue_xmit+0x2a8/0x570
      [  788.880870]  [<ffffffff8156ae1d>] ip_finish_output+0x53d/0x7d0
      ...
      
      The problem occurs because netem is not prepared to handle GSO packets (as it
      uses skb_checksum_help in its enqueue path, which cannot manipulate these
      frames).
      
      The solution I think is to simply segment the skb in a simmilar fashion to the
      way we do in __dev_queue_xmit (via validate_xmit_skb), with some minor changes.
      When we decide to corrupt an skb, if the frame is GSO, we segment it, corrupt
      the first segment, and enqueue the remaining ones.
      
      tested successfully by myself on the latest net kernel, to which this applies
      Signed-off-by: default avatarNeil Horman <nhorman@tuxdriver.com>
      CC: Jamal Hadi Salim <jhs@mojatatu.com>
      CC: "David S. Miller" <davem@davemloft.net>
      CC: netem@lists.linux-foundation.org
      CC: eric.dumazet@gmail.com
      CC: stephen@networkplumber.org
      Acked-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      [bwh: Backported to 3.16: open-code qdisc_qstats_drop()]
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      ff962567
    • WANG Cong's avatar
      sch_dsmark: update backlog as well · 097ad5ec
      WANG Cong authored
      [ Upstream commit bdf17661 ]
      
      Similarly, we need to update backlog too when we update qlen.
      
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: default avatarCong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      [bwh: Backported to 3.16: open-code qdisc_qstats_backlog_{inc,dec}()]
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      097ad5ec
    • WANG Cong's avatar
      sch_htb: update backlog as well · a3432a6c
      WANG Cong authored
      [ Upstream commit 431e3a8e ]
      
      We saw qlen!=0 but backlog==0 on our production machine:
      
      qdisc htb 1: dev eth0 root refcnt 2 r2q 10 default 1 direct_packets_stat 0 ver 3.17
       Sent 172680457356 bytes 222469449 pkt (dropped 0, overlimits 123575834 requeues 0)
       backlog 0b 72p requeues 0
      
      The problem is we only count qlen for HTB qdisc but not backlog.
      We need to update backlog too when we update qlen, so that we
      can at least know the average packet length.
      
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Acked-by: default avatarJamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: default avatarCong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      [bwh: Backported to 3.16: open-code qdisc_qstats_backlog_{inc,dec}()]
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      a3432a6c
    • Chris Friesen's avatar
      route: do not cache fib route info on local routes with oif · 5499c9cd
      Chris Friesen authored
      [ Upstream commit d6d5e999 ]
      
      For local routes that require a particular output interface we do not want
      to cache the result.  Caching the result causes incorrect behaviour when
      there are multiple source addresses on the interface.  The end result
      being that if the intended recipient is waiting on that interface for the
      packet he won't receive it because it will be delivered on the loopback
      interface and the IP_PKTINFO ipi_ifindex will be set to the loopback
      interface as well.
      
      This can be tested by running a program such as "dhcp_release" which
      attempts to inject a packet on a particular interface so that it is
      received by another program on the same board.  The receiving process
      should see an IP_PKTINFO ipi_ifndex value of the source interface
      (e.g., eth1) instead of the loopback interface (e.g., lo).  The packet
      will still appear on the loopback interface in tcpdump but the important
      aspect is that the CMSG info is correct.
      
      Sample dhcp_release command line:
      
         dhcp_release eth1 192.168.204.222 02:11:33:22:44:66
      Signed-off-by: default avatarAllain Legacy <allain.legacy@windriver.com>
      Signed off-by: Chris Friesen <chris.friesen@windriver.com>
      Reviewed-by: default avatarJulian Anastasov <ja@ssi.bg>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      5499c9cd
    • David S. Miller's avatar
      decnet: Do not build routes to devices without decnet private data. · abc877be
      David S. Miller authored
      [ Upstream commit a36a0d40 ]
      
      In particular, make sure we check for decnet private presence
      for loopback devices.
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      abc877be
    • Rasmus Villemoes's avatar
      lib/vsprintf.c: improve sanity check in vsnprintf() · de699f05
      Rasmus Villemoes authored
      commit 2aa2f9e2 upstream.
      
      On 64 bit, size may very well be huge even if bit 31 happens to be 0.
      Somehow it doesn't feel right that one can pass a 5 GiB buffer but not a
      3 GiB one.  So cap at INT_MAX as was probably the intention all along.
      This is also the made-up value passed by sprintf and vsprintf.
      Signed-off-by: default avatarRasmus Villemoes <linux@rasmusvillemoes.dk>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Randy Dunlap <rdunlap@infradead.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      Cc: Willy Tarreau <w@1wt.eu>
      de699f05
    • Hiroshi Shimamoto's avatar
      sched/cputime: Fix invalid gtime in proc · 0c8a7400
      Hiroshi Shimamoto authored
      commit 2541117b upstream.
      
      /proc/stats shows invalid gtime when the thread is running in guest.
      When vtime accounting is not enabled, we cannot get a valid delta.
      The delta is calculated with now - tsk->vtime_snap, but tsk->vtime_snap
      is only updated when vtime accounting is runtime enabled.
      
      This patch makes task_gtime() just return gtime without computing the
      buggy non-existing tickless delta when vtime accounting is not enabled.
      
      Use context_tracking_is_enabled() to check if vtime is accounting on
      some cpu, in which case only we need to check the tickless delta. This
      way we fix the gtime value regression on machines not running nohz full.
      
      The kernel config contains CONFIG_VIRT_CPU_ACCOUNTING_GEN=y and
      CONFIG_NO_HZ_FULL_ALL=n and boot without nohz_full.
      
      I ran and stop a busy loop in VM and see the gtime in host.
      Dump the 43rd field which shows the gtime in every second:
      
      	 # while :; do awk '{print $3" "$43}' /proc/3955/task/4014/stat; sleep 1; done
      	S 4348
      	R 7064566
      	R 7064766
      	R 7064967
      	R 7065168
      	S 4759
      	S 4759
      
      During running busy loop, it returns large value.
      
      After applying this patch, we can see right gtime.
      
      	 # while :; do awk '{print $3" "$43}' /proc/10913/task/10956/stat; sleep 1; done
      	S 5338
      	R 5365
      	R 5465
      	R 5566
      	R 5666
      	S 5726
      	S 5726
      Signed-off-by: default avatarHiroshi Shimamoto <h-shimamoto@ct.jp.nec.com>
      Signed-off-by: default avatarFrederic Weisbecker <fweisbec@gmail.com>
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Chris Metcalf <cmetcalf@ezchip.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Luiz Capitulino <lcapitulino@redhat.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul E . McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/1447948054-28668-2-git-send-email-fweisbec@gmail.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      0c8a7400
    • David S. Miller's avatar
      irda: Fix lockdep annotations in hashbin_delete(). · 55429dd9
      David S. Miller authored
      commit 4c03b862 upstream.
      
      A nested lock depth was added to the hasbin_delete() code but it
      doesn't actually work some well and results in tons of lockdep splats.
      
      Fix the code instead to properly drop the lock around the operation
      and just keep peeking the head of the hashbin queue.
      Reported-by: default avatarDmitry Vyukov <dvyukov@google.com>
      Tested-by: default avatarDmitry Vyukov <dvyukov@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      55429dd9
    • Al Viro's avatar
      Fix missing sanity check in /dev/sg · 41abcae4
      Al Viro authored
      commit 137d01df upstream.
      
      What happens is that a write to /dev/sg is given a request with non-zero
      ->iovec_count combined with zero ->dxfer_len.  Or with ->dxferp pointing
      to an array full of empty iovecs.
      
      Having write permission to /dev/sg shouldn't be equivalent to the
      ability to trigger BUG_ON() while holding spinlocks...
      
      Found by Dmitry Vyukov and syzkaller.
      
      [ The BUG_ON() got changed to a WARN_ON_ONCE(), but this fixes the
        underlying issue.  - Linus ]
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      Reported-by: default avatarDmitry Vyukov <dvyukov@google.com>
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      [bwh: Backported to 3.16: we're not using iov_iter, but can check the
       byte length after truncation]
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      41abcae4
    • Sergey Senozhatsky's avatar
      printk: use rcuidle console tracepoint · e5ca28d1
      Sergey Senozhatsky authored
      commit fc98c3c8 upstream.
      
      Use rcuidle console tracepoint because, apparently, it may be issued
      from an idle CPU:
      
        hw-breakpoint: Failed to enable monitor mode on CPU 0.
        hw-breakpoint: CPU 0 failed to disable vector catch
      
        ===============================
        [ ERR: suspicious RCU usage.  ]
        4.10.0-rc8-next-20170215+ #119 Not tainted
        -------------------------------
        ./include/trace/events/printk.h:32 suspicious rcu_dereference_check() usage!
      
        other info that might help us debug this:
      
        RCU used illegally from idle CPU!
        rcu_scheduler_active = 2, debug_locks = 0
        RCU used illegally from extended quiescent state!
        2 locks held by swapper/0/0:
         #0:  (cpu_pm_notifier_lock){......}, at: [<c0237e2c>] cpu_pm_exit+0x10/0x54
         #1:  (console_lock){+.+.+.}, at: [<c01ab350>] vprintk_emit+0x264/0x474
      
        stack backtrace:
        CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.10.0-rc8-next-20170215+ #119
        Hardware name: Generic OMAP4 (Flattened Device Tree)
          console_unlock
          vprintk_emit
          vprintk_default
          printk
          reset_ctrl_regs
          dbg_cpu_pm_notify
          notifier_call_chain
          cpu_pm_exit
          omap_enter_idle_coupled
          cpuidle_enter_state
          cpuidle_enter_state_coupled
          do_idle
          cpu_startup_entry
          start_kernel
      
      This RCU warning, however, is suppressed by lockdep_off() in printk().
      lockdep_off() increments the ->lockdep_recursion counter and thus
      disables RCU_LOCKDEP_WARN() and debug_lockdep_rcu_enabled(), which want
      lockdep to be enabled "current->lockdep_recursion == 0".
      
      Link: http://lkml.kernel.org/r/20170217015932.11898-1-sergey.senozhatsky@gmail.comSigned-off-by: default avatarSergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Reported-by: default avatarTony Lindgren <tony@atomide.com>
      Tested-by: default avatarTony Lindgren <tony@atomide.com>
      Acked-by: default avatarPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Acked-by: default avatarSteven Rostedt (VMware) <rostedt@goodmis.org>
      Cc: Petr Mladek <pmladek@suse.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tony Lindgren <tony@atomide.com>
      Cc: Russell King <rmk@armlinux.org.uk>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      e5ca28d1
    • Anoob Soman's avatar
      packet: Do not call fanout_release from atomic contexts · d227a42e
      Anoob Soman authored
      commit 2bd624b4 upstream.
      
      Commit 66644982 ("packet: call fanout_release, while UNREGISTERING a
      netdev"), unfortunately, introduced the following issues.
      
      1. calling mutex_lock(&fanout_mutex) (fanout_release()) from inside
      rcu_read-side critical section. rcu_read_lock disables preemption, most often,
      which prohibits calling sleeping functions.
      
      [  ] include/linux/rcupdate.h:560 Illegal context switch in RCU read-side critical section!
      [  ]
      [  ] rcu_scheduler_active = 1, debug_locks = 0
      [  ] 4 locks held by ovs-vswitchd/1969:
      [  ]  #0:  (cb_lock){++++++}, at: [<ffffffff8158a6c9>] genl_rcv+0x19/0x40
      [  ]  #1:  (ovs_mutex){+.+.+.}, at: [<ffffffffa04878ca>] ovs_vport_cmd_del+0x4a/0x100 [openvswitch]
      [  ]  #2:  (rtnl_mutex){+.+.+.}, at: [<ffffffff81564157>] rtnl_lock+0x17/0x20
      [  ]  #3:  (rcu_read_lock){......}, at: [<ffffffff81614165>] packet_notifier+0x5/0x3f0
      [  ]
      [  ] Call Trace:
      [  ]  [<ffffffff813770c1>] dump_stack+0x85/0xc4
      [  ]  [<ffffffff810c9077>] lockdep_rcu_suspicious+0x107/0x110
      [  ]  [<ffffffff810a2da7>] ___might_sleep+0x57/0x210
      [  ]  [<ffffffff810a2fd0>] __might_sleep+0x70/0x90
      [  ]  [<ffffffff8162e80c>] mutex_lock_nested+0x3c/0x3a0
      [  ]  [<ffffffff810de93f>] ? vprintk_default+0x1f/0x30
      [  ]  [<ffffffff81186e88>] ? printk+0x4d/0x4f
      [  ]  [<ffffffff816106dd>] fanout_release+0x1d/0xe0
      [  ]  [<ffffffff81614459>] packet_notifier+0x2f9/0x3f0
      
      2. calling mutex_lock(&fanout_mutex) inside spin_lock(&po->bind_lock).
      "sleeping function called from invalid context"
      
      [  ] BUG: sleeping function called from invalid context at kernel/locking/mutex.c:620
      [  ] in_atomic(): 1, irqs_disabled(): 0, pid: 1969, name: ovs-vswitchd
      [  ] INFO: lockdep is turned off.
      [  ] Call Trace:
      [  ]  [<ffffffff813770c1>] dump_stack+0x85/0xc4
      [  ]  [<ffffffff810a2f52>] ___might_sleep+0x202/0x210
      [  ]  [<ffffffff810a2fd0>] __might_sleep+0x70/0x90
      [  ]  [<ffffffff8162e80c>] mutex_lock_nested+0x3c/0x3a0
      [  ]  [<ffffffff816106dd>] fanout_release+0x1d/0xe0
      [  ]  [<ffffffff81614459>] packet_notifier+0x2f9/0x3f0
      
      3. calling dev_remove_pack(&fanout->prot_hook), from inside
      spin_lock(&po->bind_lock) or rcu_read-side critical-section. dev_remove_pack()
      -> synchronize_net(), which might sleep.
      
      [  ] BUG: scheduling while atomic: ovs-vswitchd/1969/0x00000002
      [  ] INFO: lockdep is turned off.
      [  ] Call Trace:
      [  ]  [<ffffffff813770c1>] dump_stack+0x85/0xc4
      [  ]  [<ffffffff81186274>] __schedule_bug+0x64/0x73
      [  ]  [<ffffffff8162b8cb>] __schedule+0x6b/0xd10
      [  ]  [<ffffffff8162c5db>] schedule+0x6b/0x80
      [  ]  [<ffffffff81630b1d>] schedule_timeout+0x38d/0x410
      [  ]  [<ffffffff810ea3fd>] synchronize_sched_expedited+0x53d/0x810
      [  ]  [<ffffffff810ea6de>] synchronize_rcu_expedited+0xe/0x10
      [  ]  [<ffffffff8154eab5>] synchronize_net+0x35/0x50
      [  ]  [<ffffffff8154eae3>] dev_remove_pack+0x13/0x20
      [  ]  [<ffffffff8161077e>] fanout_release+0xbe/0xe0
      [  ]  [<ffffffff81614459>] packet_notifier+0x2f9/0x3f0
      
      4. fanout_release() races with calls from different CPU.
      
      To fix the above problems, remove the call to fanout_release() under
      rcu_read_lock(). Instead, call __dev_remove_pack(&fanout->prot_hook) and
      netdev_run_todo will be happy that &dev->ptype_specific list is empty. In order
      to achieve this, I moved dev_{add,remove}_pack() out of fanout_{add,release} to
      __fanout_{link,unlink}. So, call to {,__}unregister_prot_hook() will make sure
      fanout->prot_hook is removed as well.
      
      Fixes: 66644982 ("packet: call fanout_release, while UNREGISTERING a netdev")
      Reported-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarAnoob Soman <anoob.soman@citrix.com>
      Acked-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      [bwh: Backported to 3.16:
       - Don't call fanout_release_data()
       - Adjust context]
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      d227a42e
    • Anoob Soman's avatar
      packet: call fanout_release, while UNREGISTERING a netdev · 5d4c3670
      Anoob Soman authored
      commit 66644982 upstream.
      
      If a socket has FANOUT sockopt set, a new proto_hook is registered
      as part of fanout_add(). When processing a NETDEV_UNREGISTER event in
      af_packet, __fanout_unlink is called for all sockets, but prot_hook which was
      registered as part of fanout_add is not removed. Call fanout_release, on a
      NETDEV_UNREGISTER, which removes prot_hook and removes fanout from the
      fanout_list.
      
      This fixes BUG_ON(!list_empty(&dev->ptype_specific)) in netdev_run_todo()
      Signed-off-by: default avatarAnoob Soman <anoob.soman@citrix.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      5d4c3670
    • Miklos Szeredi's avatar
      vfs: fix uninitialized flags in splice_to_pipe() · 36733716
      Miklos Szeredi authored
      commit 5a81e6a1 upstream.
      
      Flags (PIPE_BUF_FLAG_PACKET, PIPE_BUF_FLAG_GIFT) could remain on the
      unused part of the pipe ring buffer.  Previously splice_to_pipe() left
      the flags value alone, which could result in incorrect behavior.
      
      Uninitialized flags appears to have been there from the introduction of
      the splice syscall.
      Signed-off-by: default avatarMiklos Szeredi <mszeredi@redhat.com>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      [bwh: Backported to 3.16: adjust context, indentation]
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      36733716
    • Anssi Hannula's avatar
      net: xilinx_emaclite: fix freezes due to unordered I/O · c32ff59a
      Anssi Hannula authored
      commit acf138f1 upstream.
      
      The xilinx_emaclite uses __raw_writel and __raw_readl for register
      accesses. Those functions do not imply any kind of memory barriers and
      they may be reordered.
      
      The driver does not seem to take that into account, though, and the
      driver does not satisfy the ordering requirements of the hardware.
      For clear examples, see xemaclite_mdio_write() and xemaclite_mdio_read()
      which try to set MDIO address before initiating the transaction.
      
      I'm seeing system freezes with the driver with GCC 5.4 and current
      Linux kernels on Zynq-7000 SoC immediately when trying to use the
      interface.
      
      In commit 123c1407 ("net: emaclite: Do not use microblaze and ppc
      IO functions") the driver was switched from non-generic
      in_be32/out_be32 (memory barriers, big endian) to
      __raw_readl/__raw_writel (no memory barriers, native endian), so
      apparently the device follows system endianness and the driver was
      originally written with the assumption of memory barriers.
      
      Rather than try to hunt for each case of missing barrier, just switch
      the driver to use iowrite32/ioread32/iowrite32be/ioread32be depending
      on endianness instead.
      
      Tested on little-endian Zynq-7000 ARM SoC FPGA.
      Signed-off-by: default avatarAnssi Hannula <anssi.hannula@bitwise.fi>
      Fixes: 123c1407 ("net: emaclite: Do not use microblaze and ppc IO
      functions")
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      c32ff59a
    • Anssi Hannula's avatar
      net: xilinx_emaclite: fix receive buffer overflow · 7a7ba224
      Anssi Hannula authored
      commit cd224553 upstream.
      
      xilinx_emaclite looks at the received data to try to determine the
      Ethernet packet length but does not properly clamp it if
      proto_type == ETH_P_IP or 1500 < proto_type <= 1518, causing a buffer
      overflow and a panic via skb_panic() as the length exceeds the allocated
      skb size.
      
      Fix those cases.
      
      Also add an additional unconditional check with WARN_ON() at the end.
      Signed-off-by: default avatarAnssi Hannula <anssi.hannula@bitwise.fi>
      Fixes: bb81b2dd ("net: add Xilinx emac lite device driver")
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      7a7ba224
    • Mauro Carvalho Chehab's avatar
      siano: make it work again with CONFIG_VMAP_STACK · 6ef068c8
      Mauro Carvalho Chehab authored
      commit f9c85ee6 upstream.
      
      Reported as a Kaffeine bug:
      	https://bugs.kde.org/show_bug.cgi?id=375811
      
      The USB control messages require DMA to work. We cannot pass
      a stack-allocated buffer, as it is not warranted that the
      stack would be into a DMA enabled area.
      
      On Kernel 4.9, the default is to not accept DMA on stack anymore
      on x86 architecture. On other architectures, this has been a
      requirement since Kernel 2.2. So, after this patch, this driver
      should likely work fine on all archs.
      
      Tested with USB ID 2040:5510: Hauppauge Windham
      Signed-off-by: default avatarMauro Carvalho Chehab <mchehab@s-opensource.com>
      [bwh: Backported to 3.16: adjust context]
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      6ef068c8
    • Eric Dumazet's avatar
      packet: fix races in fanout_add() · bf791623
      Eric Dumazet authored
      commit d199fab6 upstream.
      
      Multiple threads can call fanout_add() at the same time.
      
      We need to grab fanout_mutex earlier to avoid races that could
      lead to one thread freeing po->rollover that was set by another thread.
      
      Do the same in fanout_release(), for peace of mind, and to help us
      finding lockdep issues earlier.
      
      Fixes: dc99f600 ("packet: Add fanout support.")
      Fixes: 0648ab70 ("packet: rollover prepare: per-socket state")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Cc: Willem de Bruijn <willemb@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      [bwh: Backported to 3.16:
       - No rollover queue stats
       - Adjust context]
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      bf791623
    • Anssi Hannula's avatar
      mmc: core: fix multi-bit bus width without high-speed mode · b08a436e
      Anssi Hannula authored
      commit 3d4ef329 upstream.
      
      Commit 577fb131 ("mmc: rework selection of bus speed mode")
      refactored bus width selection code to mmc_select_bus_width().
      
      However, it also altered the behavior to not call the selection code in
      non-high-speed modes anymore.
      
      This causes 1-bit mode to always be used when the high-speed mode is not
      enabled, even though 4-bit and 8-bit bus are valid bus widths in the
      backwards-compatibility (legacy) mode as well (see e.g. 5.3.2 Bus Speed
      Modes in JEDEC 84-B50). This results in a significant regression in
      transfer speeds.
      
      Fix the code to allow 4-bit and 8-bit widths even without high-speed
      mode, as before.
      
      Tested with a Zynq-7000 PicoZed 7020 board.
      
      Fixes: 577fb131 ("mmc: rework selection of bus speed mode")
      Signed-off-by: default avatarAnssi Hannula <anssi.hannula@bitwise.fi>
      Signed-off-by: default avatarUlf Hansson <ulf.hansson@linaro.org>
      [bwh: Backported to 3.16: adjust context]
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      b08a436e
    • Yang Yang's avatar
      futex: Move futex_init() to core_initcall · dcc9b826
      Yang Yang authored
      commit 25f71d1c upstream.
      
      The UEVENT user mode helper is enabled before the initcalls are executed
      and is available when the root filesystem has been mounted.
      
      The user mode helper is triggered by device init calls and the executable
      might use the futex syscall.
      
      futex_init() is marked __initcall which maps to device_initcall, but there
      is no guarantee that futex_init() is invoked _before_ the first device init
      call which triggers the UEVENT user mode helper.
      
      If the user mode helper uses the futex syscall before futex_init() then the
      syscall crashes with a NULL pointer dereference because the futex subsystem
      has not been initialized yet.
      
      Move futex_init() to core_initcall so futexes are initialized before the
      root filesystem is mounted and the usermode helper becomes available.
      
      [ tglx: Rewrote changelog ]
      Signed-off-by: default avatarYang Yang <yang.yang29@zte.com.cn>
      Cc: jiang.biao2@zte.com.cn
      Cc: jiang.zhengxiong@zte.com.cn
      Cc: zhong.weidong@zte.com.cn
      Cc: deng.huali@zte.com.cn
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/1483085875-6130-1-git-send-email-yang.yang29@zte.com.cnSigned-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      dcc9b826
    • Eric Dumazet's avatar
      net/llc: avoid BUG_ON() in skb_orphan() · 8e822a0f
      Eric Dumazet authored
      commit 8b74d439 upstream.
      
      It seems nobody used LLC since linux-3.12.
      
      Fortunately fuzzers like syzkaller still know how to run this code,
      otherwise it would be no fun.
      
      Setting skb->sk without skb->destructor leads to all kinds of
      bugs, we now prefer to be very strict about it.
      
      Ideally here we would use skb_set_owner() but this helper does not exist yet,
      only CAN seems to have a private helper for that.
      
      Fixes: 376c7311 ("net: add a temporary sanity check in skb_orphan()")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Reported-by: default avatarAndrey Konovalov <andreyknvl@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      8e822a0f
    • Ben Hutchings's avatar
      net/sock: Add sock_efree() function · 8f1a8602
      Ben Hutchings authored
      Extracted from commit 62bccb8c ("net-timestamp: Make the clone operation
      stand-alone from phy timestamping").
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      8f1a8602
    • Eric Dumazet's avatar
      l2tp: do not use udp_ioctl() · 3f70197d
      Eric Dumazet authored
      commit 72fb96e7 upstream.
      
      udp_ioctl(), as its name suggests, is used by UDP protocols,
      but is also used by L2TP :(
      
      L2TP should use its own handler, because it really does not
      look the same.
      
      SIOCINQ for instance should not assume UDP checksum or headers.
      
      Thanks to Andrey and syzkaller team for providing the report
      and a nice reproducer.
      
      While crashes only happen on recent kernels (after commit
      7c13f97f ("udp: do fwd memory scheduling on dequeue")), this
      probably needs to be backported to older kernels.
      
      Fixes: 7c13f97f ("udp: do fwd memory scheduling on dequeue")
      Fixes: 85584672 ("udp: Fix udp_poll() and ioctl()")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Reported-by: default avatarAndrey Konovalov <andreyknvl@google.com>
      Acked-by: default avatarPaolo Abeni <pabeni@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      3f70197d
    • Boris Ostrovsky's avatar
      xen-netfront: Delete rx_refill_timer in xennet_disconnect_backend() · e5b48d15
      Boris Ostrovsky authored
      commit 74470954 upstream.
      
      rx_refill_timer should be deleted as soon as we disconnect from the
      backend since otherwise it is possible for the timer to go off before
      we get to xennet_destroy_queues(). If this happens we may dereference
      queue->rx.sring which is set to NULL in xennet_disconnect_backend().
      Signed-off-by: default avatarBoris Ostrovsky <boris.ostrovsky@oracle.com>
      Reviewed-by: default avatarJuergen Gross <jgross@suse.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      [bwh: Backported to 3.16: del_timer_sync() was called from xennet_remove()
       but that's also too late]
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      e5b48d15
    • Steffen Maier's avatar
      scsi: zfcp: fix use-after-free by not tracing WKA port open/close on failed send · f32f04a3
      Steffen Maier authored
      commit 2dfa6688 upstream.
      
      Dan Carpenter kindly reported:
      <quote>
      The patch d27a7cb9: "zfcp: trace on request for open and close of
      WKA port" from Aug 10, 2016, leads to the following static checker
      warning:
      
      	drivers/s390/scsi/zfcp_fsf.c:1615 zfcp_fsf_open_wka_port()
      	warn: 'req' was already freed.
      
      drivers/s390/scsi/zfcp_fsf.c
        1609          zfcp_fsf_start_timer(req, ZFCP_FSF_REQUEST_TIMEOUT);
        1610          retval = zfcp_fsf_req_send(req);
        1611          if (retval)
        1612                  zfcp_fsf_req_free(req);
                                                ^^^
      Freed.
      
        1613  out:
        1614          spin_unlock_irq(&qdio->req_q_lock);
        1615          if (req && !IS_ERR(req))
        1616                  zfcp_dbf_rec_run_wka("fsowp_1", wka_port, req->req_id);
                                                                        ^^^^^^^^^^^
      Use after free.
      
        1617          return retval;
        1618  }
      
      Same thing for zfcp_fsf_close_wka_port() as well.
      </quote>
      
      Rather than relying on req being NULL (or ERR_PTR) for all cases where
      we don't want to trace or should not trace,
      simply check retval which is unconditionally initialized with -EIO != 0
      and it can only become 0 on successful retval = zfcp_fsf_req_send(req).
      With that we can also remove the then again unnecessary unconditional
      initialization of req which was introduced with that earlier commit.
      Reported-by: default avatarDan Carpenter <dan.carpenter@oracle.com>
      Suggested-by: default avatarBenjamin Block <bblock@linux.vnet.ibm.com>
      Signed-off-by: default avatarSteffen Maier <maier@linux.vnet.ibm.com>
      Fixes: d27a7cb9 ("zfcp: trace on request for open and close of WKA port")
      Reviewed-by: default avatarBenjamin Block <bblock@linux.vnet.ibm.com>
      Reviewed-by: default avatarJens Remus <jremus@linux.vnet.ibm.com>
      Signed-off-by: default avatarMartin K. Petersen <martin.petersen@oracle.com>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      f32f04a3
    • Hui Wang's avatar
      ALSA: hda - adding a new NV HDMI/DP codec ID in the driver · 3f9f59a1
      Hui Wang authored
      commit af677166 upstream.
      
      Without this change, the HDMI/DP codec will be recognised as a
      generic codec, and there is no sound when playing through this codec.
      
      As suggested by NVidia side, after adding the new ID in the driver,
      the sound playing works well.
      Signed-off-by: default avatarHui Wang <hui.wang@canonical.com>
      Signed-off-by: default avatarTakashi Iwai <tiwai@suse.de>
      [bwh: Backported to 3.16: don't use HDA_CODEC_ENTRY()]
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      3f9f59a1
    • WANG Cong's avatar
      ping: fix a null pointer dereference · 69961a4c
      WANG Cong authored
      commit 73d2c667 upstream.
      
      Andrey reported a kernel crash:
      
        general protection fault: 0000 [#1] SMP KASAN
        Dumping ftrace buffer:
           (ftrace buffer empty)
        Modules linked in:
        CPU: 2 PID: 3880 Comm: syz-executor1 Not tainted 4.10.0-rc6+ #124
        Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
        task: ffff880060048040 task.stack: ffff880069be8000
        RIP: 0010:ping_v4_push_pending_frames net/ipv4/ping.c:647 [inline]
        RIP: 0010:ping_v4_sendmsg+0x1acd/0x23f0 net/ipv4/ping.c:837
        RSP: 0018:ffff880069bef8b8 EFLAGS: 00010206
        RAX: dffffc0000000000 RBX: ffff880069befb90 RCX: 0000000000000000
        RDX: 0000000000000018 RSI: ffff880069befa30 RDI: 00000000000000c2
        RBP: ffff880069befbb8 R08: 0000000000000008 R09: 0000000000000000
        R10: 0000000000000002 R11: 0000000000000000 R12: ffff880069befab0
        R13: ffff88006c624a80 R14: ffff880069befa70 R15: 0000000000000000
        FS:  00007f6f7c716700(0000) GS:ffff88006de00000(0000) knlGS:0000000000000000
        CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        CR2: 00000000004a6f28 CR3: 000000003a134000 CR4: 00000000000006e0
        Call Trace:
         inet_sendmsg+0x164/0x5b0 net/ipv4/af_inet.c:744
         sock_sendmsg_nosec net/socket.c:635 [inline]
         sock_sendmsg+0xca/0x110 net/socket.c:645
         SYSC_sendto+0x660/0x810 net/socket.c:1687
         SyS_sendto+0x40/0x50 net/socket.c:1655
         entry_SYSCALL_64_fastpath+0x1f/0xc2
      
      This is because we miss a check for NULL pointer for skb_peek() when
      the queue is empty. Other places already have the same check.
      
      Fixes: c319b4d7 ("net: ipv4: add IPPROTO_ICMP socket kind")
      Reported-by: default avatarAndrey Konovalov <andreyknvl@google.com>
      Tested-by: default avatarAndrey Konovalov <andreyknvl@google.com>
      Signed-off-by: default avatarCong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      69961a4c