1. 07 Mar, 2015 40 commits
    • Austin Lund's avatar
      [media] media/rc: Send sync space information on the lirc device · 190b31ab
      Austin Lund authored
      Userspace expects to see a long space before the first pulse is sent on
      the lirc device.  Currently, if a long time has passed and a new packet
      is started, the lirc codec just returns and doesn't send anything.  This
      makes lircd ignore many perfectly valid signals unless they are sent in
      quick sucession.  When a reset event is delivered, we cannot know
      anything about the duration of the space.  But it should be safe to
      assume it has been a long time and we just set the duration to maximum.
      Signed-off-by: default avatarAustin Lund <austin.lund@gmail.com>
      Signed-off-by: default avatarMauro Carvalho Chehab <mchehab@osg.samsung.com>
      
      (cherry picked from commit a8f29e89)
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      190b31ab
    • Saran Maruti Ramanara's avatar
      net: sctp: fix passing wrong parameter header to param_type2af in sctp_process_param · bf431c8f
      Saran Maruti Ramanara authored
      When making use of RFC5061, section 4.2.4. for setting the primary IP
      address, we're passing a wrong parameter header to param_type2af(),
      resulting always in NULL being returned.
      
      At this point, param.p points to a sctp_addip_param struct, containing
      a sctp_paramhdr (type = 0xc004, length = var), and crr_id as a correlation
      id. Followed by that, as also presented in RFC5061 section 4.2.4., comes
      the actual sctp_addr_param, which also contains a sctp_paramhdr, but
      this time with the correct type SCTP_PARAM_IPV{4,6}_ADDRESS that
      param_type2af() can make use of. Since we already hold a pointer to
      addr_param from previous line, just reuse it for param_type2af().
      
      Fixes: d6de3097 ("[SCTP]: Add the handling of "Set Primary IP Address" parameter to INIT")
      Signed-off-by: default avatarSaran Maruti Ramanara <saran.neti@telus.com>
      Signed-off-by: default avatarDaniel Borkmann <dborkman@redhat.com>
      Acked-by: default avatarVlad Yasevich <vyasevich@gmail.com>
      Acked-by: default avatarNeil Horman <nhorman@tuxdriver.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      
      (cherry picked from commit cfbf654e)
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      bf431c8f
    • Florian Westphal's avatar
      ppp: deflate: never return len larger than output buffer · c047a324
      Florian Westphal authored
      When we've run out of space in the output buffer to store more data, we
      will call zlib_deflate with a NULL output buffer until we've consumed
      remaining input.
      
      When this happens, olen contains the size the output buffer would have
      consumed iff we'd have had enough room.
      
      This can later cause skb_over_panic when ppp_generic skb_put()s
      the returned length.
      Reported-by: default avatarIain Douglas <centos@1n6.org.uk>
      Signed-off-by: default avatarFlorian Westphal <fw@strlen.de>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      
      (cherry picked from commit e2a4800e)
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      c047a324
    • Eric Dumazet's avatar
      ipv4: tcp: get rid of ugly unicast_sock · 971b8803
      Eric Dumazet authored
      In commit be9f4a44 ("ipv4: tcp: remove per net tcp_sock")
      I tried to address contention on a socket lock, but the solution
      I chose was horrible :
      
      commit 3a7c384f ("ipv4: tcp: unicast_sock should not land outside
      of TCP stack") addressed a selinux regression.
      
      commit 0980e56e ("ipv4: tcp: set unicast_sock uc_ttl to -1")
      took care of another regression.
      
      commit b5ec8eea ("ipv4: fix ip_send_skb()") fixed another regression.
      
      commit 811230cd ("tcp: ipv4: initialize unicast_sock sk_pacing_rate")
      was another shot in the dark.
      
      Really, just use a proper socket per cpu, and remove the skb_orphan()
      call, to re-enable flow control.
      
      This solves a serious problem with FQ packet scheduler when used in
      hostile environments, as we do not want to allocate a flow structure
      for every RST packet sent in response to a spoofed packet.
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      
      net: sched: fix panic in rate estimators
      
      Doing the following commands on a non idle network device
      panics the box instantly, because cpu_bstats gets overwritten
      by stats.
      
      tc qdisc add dev eth0 root <your_favorite_qdisc>
      ... some traffic (one packet is enough) ...
      tc qdisc replace dev eth0 root est 1sec 4sec <your_favorite_qdisc>
      
      [  325.355596] BUG: unable to handle kernel paging request at ffff8841dc5a074c
      [  325.362609] IP: [<ffffffff81541c9e>] __gnet_stats_copy_basic+0x3e/0x90
      [  325.369158] PGD 1fa7067 PUD 0
      [  325.372254] Oops: 0000 [#1] SMP
      [  325.375514] Modules linked in: ...
      [  325.398346] CPU: 13 PID: 14313 Comm: tc Not tainted 3.19.0-smp-DEV #1163
      [  325.412042] task: ffff8800793ab5d0 ti: ffff881ff2fa4000 task.ti: ffff881ff2fa4000
      [  325.419518] RIP: 0010:[<ffffffff81541c9e>]  [<ffffffff81541c9e>] __gnet_stats_copy_basic+0x3e/0x90
      [  325.428506] RSP: 0018:ffff881ff2fa7928  EFLAGS: 00010286
      [  325.433824] RAX: 000000000000000c RBX: ffff881ff2fa796c RCX: 000000000000000c
      [  325.440988] RDX: ffff8841dc5a0744 RSI: 0000000000000060 RDI: 0000000000000060
      [  325.448120] RBP: ffff881ff2fa7948 R08: ffffffff81cd4f80 R09: 0000000000000000
      [  325.455268] R10: ffff883ff223e400 R11: 0000000000000000 R12: 000000015cba0744
      [  325.462405] R13: ffffffff81cd4f80 R14: ffff883ff223e460 R15: ffff883feea0722c
      [  325.469536] FS:  00007f2ee30fa700(0000) GS:ffff88407fa20000(0000) knlGS:0000000000000000
      [  325.477630] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  325.483380] CR2: ffff8841dc5a074c CR3: 0000003feeae9000 CR4: 00000000001407e0
      [  325.490510] Stack:
      [  325.492524]  ffff883feea0722c ffff883fef719dc0 ffff883feea0722c ffff883ff223e4a0
      [  325.499990]  ffff881ff2fa79a8 ffffffff815424ee ffff883ff223e49c 000000015cba0744
      [  325.507460]  00000000f2fa7978 0000000000000000 ffff881ff2fa79a8 ffff883ff223e4a0
      [  325.514956] Call Trace:
      [  325.517412]  [<ffffffff815424ee>] gen_new_estimator+0x8e/0x230
      [  325.523250]  [<ffffffff815427aa>] gen_replace_estimator+0x4a/0x60
      [  325.529349]  [<ffffffff815718ab>] tc_modify_qdisc+0x52b/0x590
      [  325.535117]  [<ffffffff8155edd0>] rtnetlink_rcv_msg+0xa0/0x240
      [  325.540963]  [<ffffffff8155ed30>] ? __rtnl_unlock+0x20/0x20
      [  325.546532]  [<ffffffff8157f811>] netlink_rcv_skb+0xb1/0xc0
      [  325.552145]  [<ffffffff8155b355>] rtnetlink_rcv+0x25/0x40
      [  325.557558]  [<ffffffff8157f0d8>] netlink_unicast+0x168/0x220
      [  325.563317]  [<ffffffff8157f47c>] netlink_sendmsg+0x2ec/0x3e0
      
      Lets play safe and not use an union : percpu 'pointers' are mostly read
      anyway, and we have typically few qdiscs per host.
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Cc: John Fastabend <john.fastabend@gmail.com>
      Fixes: 22e0f8b9 ("net: sched: make bstats per cpu and estimator RCU safe")
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      
      hyperv: Fix the error processing in netvsc_send()
      
      The existing code frees the skb in EAGAIN case, in which the skb will be
      retried from upper layer and used again.
      Also, the existing code doesn't free send buffer slot in error case, because
      there is no completion message for unsent packets.
      This patch fixes these problems.
      
      (Please also include this patch for stable trees. Thanks!)
      Signed-off-by: default avatarHaiyang Zhang <haiyangz@microsoft.com>
      Reviewed-by: default avatarK. Y. Srinivasan <kys@microsoft.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      
      drivers: net: xgene: fix: Out of order descriptor bytes read
      
      This patch fixes the following kernel crash,
      
      	WARNING: CPU: 2 PID: 0 at net/ipv4/tcp_input.c:3079 tcp_clean_rtx_queue+0x658/0x80c()
      	Call trace:
      	[<fffffe0000096b7c>] dump_backtrace+0x0/0x184
      	[<fffffe0000096d10>] show_stack+0x10/0x1c
      	[<fffffe0000685ea0>] dump_stack+0x74/0x98
      	[<fffffe00000b44e0>] warn_slowpath_common+0x88/0xb0
      	[<fffffe00000b461c>] warn_slowpath_null+0x14/0x20
      	[<fffffe00005b5c1c>] tcp_clean_rtx_queue+0x654/0x80c
      	[<fffffe00005b6228>] tcp_ack+0x454/0x688
      	[<fffffe00005b6ca8>] tcp_rcv_established+0x4a4/0x62c
      	[<fffffe00005bf4b4>] tcp_v4_do_rcv+0x16c/0x350
      	[<fffffe00005c225c>] tcp_v4_rcv+0x8e8/0x904
      	[<fffffe000059d470>] ip_local_deliver_finish+0x100/0x26c
      	[<fffffe000059dad8>] ip_local_deliver+0xac/0xc4
      	[<fffffe000059d6c4>] ip_rcv_finish+0xe8/0x328
      	[<fffffe000059dd3c>] ip_rcv+0x24c/0x38c
      	[<fffffe0000563950>] __netif_receive_skb_core+0x29c/0x7c8
      	[<fffffe0000563ea4>] __netif_receive_skb+0x28/0x7c
      	[<fffffe0000563f54>] netif_receive_skb_internal+0x5c/0xe0
      	[<fffffe0000564810>] napi_gro_receive+0xb4/0x110
      	[<fffffe0000482a2c>] xgene_enet_process_ring+0x144/0x338
      	[<fffffe0000482d18>] xgene_enet_napi+0x1c/0x50
      	[<fffffe0000565454>] net_rx_action+0x154/0x228
      	[<fffffe00000b804c>] __do_softirq+0x110/0x28c
      	[<fffffe00000b8424>] irq_exit+0x8c/0xc0
      	[<fffffe0000093898>] handle_IRQ+0x44/0xa8
      	[<fffffe000009032c>] gic_handle_irq+0x38/0x7c
      	[...]
      
      Software writes poison data into the descriptor bytes[15:8] and upon
      receiving the interrupt, if those bytes are overwritten by the hardware with
      the valid data, software also reads bytes[7:0] and executes receive/tx
      completion logic.
      
      If the CPU executes the above two reads in out of order fashion, then the
      bytes[7:0] will have older data and causing the kernel panic.  We have to
      force the order of the reads and thus this patch introduces read memory
      barrier between these reads.
      Signed-off-by: default avatarIyappan Subramanian <isubramanian@apm.com>
      Signed-off-by: default avatarKeyur Chudgar <kchudgar@apm.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      
      Merge branch 'vlan_get_protocol'
      
      Toshiaki Makita says:
      
      ====================
      Fix checksum error when using stacked vlan
      
      When I was testing 802.1ad, I found several drivers don't take into
      account 802.1ad or multiple vlans when retrieving L3 (IP/IPv6) or
      L4 (TCP/UDP) protocol for checksum offload.
      
      It is mainly due to vlan_get_protocol(), which extracts ether type only
      when it is tagged with single 802.1Q. When 802.1ad is used or there are
      multiple vlans, it extracts vlan protocol and drivers cannot determine
      which L3/L4 protocol is used.
      
      Those drivers, most of which have IP_CSUM/IPV6_CSUM features, get L3/L4
      header-offset by software, so it seems that their checksum offload works
      with multiple vlans if we can parse protocols correctly.
      (They know mac header length, and probably don't care about what is in it.)
      
      And another thing, some of Intel's drivers seem to use skb->protocol where
      vlan_get_protocol() is more suitable.
      
      I tested that at least igb/igbvf on I350 works with this patch set.
      
      Note:
      We can hand a double tagged packet with CHECKSUM_PARTIAL to a HW driver
      by creating a vlan device on a bridge device and enabling vlan_filtering
      of the bridge with 802.1ad protocol.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      
      ixgbevf: Fix checksum error when using stacked vlan
      
      When a skb has multiple vlans and it is CHECKSUM_PARTIAL,
      ixgbevf_tx_csum() fails to get the network protocol and checksum related
      descriptor fields are not configured correctly because skb->protocol
      doesn't show the L3 protocol in this case.
      
      Use first->protocol instead of skb->protocol to get the proper network
      protocol.
      Signed-off-by: default avatarToshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      
      ixgbe: Fix checksum error when using stacked vlan
      
      When a skb has multiple vlans and it is CHECKSUM_PARTIAL,
      ixgbe_tx_csum() fails to get the network protocol and checksum related
      descriptor fields are not configured correctly because skb->protocol
      doesn't show the L3 protocol in this case.
      
      Use vlan_get_protocol() to get the proper network protocol.
      Signed-off-by: default avatarToshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      
      igbvf: Fix checksum error when using stacked vlan
      
      When a skb has multiple vlans and it is CHECKSUM_PARTIAL,
      igbvf_tx_csum() fails to get the network protocol and checksum related
      descriptor fields are not configured correctly because skb->protocol
      doesn't show the L3 protocol in this case.
      
      Use vlan_get_protocol() to get the proper network protocol.
      Signed-off-by: default avatarToshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      
      net: Fix vlan_get_protocol for stacked vlan
      
      vlan_get_protocol() could not get network protocol if a skb has a 802.1ad
      vlan tag or multiple vlans, which caused incorrect checksum calculation
      in several drivers.
      
      Fix vlan_get_protocol() to retrieve network protocol instead of incorrect
      vlan protocol.
      
      As the logic is the same as skb_network_protocol(), create a common helper
      function __vlan_get_protocol() and call it from existing functions.
      Signed-off-by: default avatarToshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      
      net: sctp: fix passing wrong parameter header to param_type2af in sctp_process_param
      
      When making use of RFC5061, section 4.2.4. for setting the primary IP
      address, we're passing a wrong parameter header to param_type2af(),
      resulting always in NULL being returned.
      
      At this point, param.p points to a sctp_addip_param struct, containing
      a sctp_paramhdr (type = 0xc004, length = var), and crr_id as a correlation
      id. Followed by that, as also presented in RFC5061 section 4.2.4., comes
      the actual sctp_addr_param, which also contains a sctp_paramhdr, but
      this time with the correct type SCTP_PARAM_IPV{4,6}_ADDRESS that
      param_type2af() can make use of. Since we already hold a pointer to
      addr_param from previous line, just reuse it for param_type2af().
      
      Fixes: d6de3097 ("[SCTP]: Add the handling of "Set Primary IP Address" parameter to INIT")
      Signed-off-by: default avatarSaran Maruti Ramanara <saran.neti@telus.com>
      Signed-off-by: default avatarDaniel Borkmann <dborkman@redhat.com>
      Acked-by: default avatarVlad Yasevich <vyasevich@gmail.com>
      Acked-by: default avatarNeil Horman <nhorman@tuxdriver.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      
      netlink: fix wrong subscription bitmask to group mapping in
      
      The subscription bitmask passed via struct sockaddr_nl is converted to
      the group number when calling the netlink_bind() and netlink_unbind()
      callbacks.
      
      The conversion is however incorrect since bitmask (1 << 0) needs to be
      mapped to group number 1. Note that you cannot specify the group number 0
      (usually known as _NONE) from setsockopt() using NETLINK_ADD_MEMBERSHIP
      since this is rejected through -EINVAL.
      
      This problem became noticeable since 97840cb6 ("netfilter: nfnetlink:
      fix insufficient validation in nfnetlink_bind") when binding to bitmask
      (1 << 0) in ctnetlink.
      Reported-by: default avatarAndre Tomt <andre@tomt.net>
      Reported-by: default avatarIvan Delalande <colona@arista.com>
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      
      ipv4: Don't increase PMTU with Datagram Too Big message.
      
      RFC 1191 said, "a host MUST not increase its estimate of the Path
      MTU in response to the contents of a Datagram Too Big message."
      Signed-off-by: default avatarLi Wei <lw@cn.fujitsu.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      
      Merge branch 'arm-build-fixes'
      
      Arnd Bergmann says:
      
      ====================
      net: driver fixes from arm randconfig builds
      
      These four patches are fallout from test builds on ARM. I have a
      few more of them in my backlog but have not yet confirmed them
      to still be valid.
      
      The first three patches are about incomplete dependencies on
      old drivers. One could backport them to the beginning of time
      in theory, but there is little value since nobody would run into
      these problems.
      
      The final patch is one I had submitted before together with the
      respective pcmcia patch but forgot to follow up on that. It's
      still a valid but relatively theoretical bug, because the previous
      behavior of the driver was just as broken as what we have in
      mainline.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      
      net: am2150: fix nmclan_cs.c shared interrupt handling
      
      A recent patch tried to work around a valid warning for the use of a
      deprecated interface by blindly changing from the old
      pcmcia_request_exclusive_irq() interface to pcmcia_request_irq().
      
      This driver has an interrupt handler that is not currently aware
      of shared interrupts, but can be easily converted to be.
      At the moment, the driver reads the interrupt status register
      repeatedly until it contains only zeroes in the interesting bits,
      and handles each bit individually.
      
      This patch adds the missing part of returning IRQ_NONE in case none
      of the bits are set to start with, so we can move on to the next
      interrupt source.
      Signed-off-by: default avatarArnd Bergmann <arnd@arndb.de>
      Fixes: 5f5316fc ("am2150: Update nmclan_cs.c to use update PCMCIA API")
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      
      net: lance,ni64: don't build for ARM
      
      The ni65 and lance ethernet drivers manually program the ISA DMA
      controller that is only available on x86 PCs and a few compatible
      systems. Trying to build it on ARM results in this error:
      
      ni65.c: In function 'ni65_probe1':
      ni65.c:496:62: error: 'DMA1_STAT_REG' undeclared (first use in this function)
           ((inb(DMA1_STAT_REG) >> 4) & 0x0f)
                                                                    ^
      ni65.c:496:62: note: each undeclared identifier is reported only once for each function it appears in
      ni65.c:497:63: error: 'DMA2_STAT_REG' undeclared (first use in this function)
           | (inb(DMA2_STAT_REG) & 0xf0);
      
      The DMA1_STAT_REG and DMA2_STAT_REG registers are only defined for
      alpha, mips, parisc, powerpc and x86, although it is not clear
      which subarchitectures actually have them at the correct location.
      
      This patch for now just disables it for ARM, to avoid randconfig
      build errors. We could also decide to limit it to the set of
      architectures on which it does compile, but that might look more
      deliberate than guessing based on where the drivers build.
      Signed-off-by: default avatarArnd Bergmann <arnd@arndb.de>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      
      net: wan: add missing virt_to_bus dependencies
      
      The cosa driver is rather outdated and does not get built on most
      platforms because it requires the ISA_DMA_API symbol. However
      there are some ARM platforms that have ISA_DMA_API but no virt_to_bus,
      and they get this build error when enabling the ltpc driver.
      
      drivers/net/wan/cosa.c: In function 'tx_interrupt':
      drivers/net/wan/cosa.c:1768:3: error: implicit declaration of function 'virt_to_bus'
         unsigned long addr = virt_to_bus(cosa->txbuf);
         ^
      
      The same problem exists for the Hostess SV-11 and Sealevel Systems 4021
      drivers.
      
      This adds another dependency in Kconfig to avoid that configuration.
      Signed-off-by: default avatarArnd Bergmann <arnd@arndb.de>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      
      net: cs89x0: always build platform code if !HAS_IOPORT_MAP
      
      The cs89x0 driver can either be built as an ISA driver or a platform
      driver, the choice is controlled by the CS89x0_PLATFORM Kconfig
      symbol. Building the ISA driver on a system that does not have
      a way to map I/O ports fails with this error:
      
      drivers/built-in.o: In function `cs89x0_ioport_probe.constprop.1':
      :(.init.text+0x4794): undefined reference to `ioport_map'
      :(.init.text+0x4830): undefined reference to `ioport_unmap'
      
      This changes the Kconfig logic to take that option away and
      always force building the platform variant of this driver if
      CONFIG_HAS_IOPORT_MAP is not set. This is the only correct
      choice in this case, and it avoids the build error.
      Signed-off-by: default avatarArnd Bergmann <arnd@arndb.de>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      
      ppp: deflate: never return len larger than output buffer
      
      When we've run out of space in the output buffer to store more data, we
      will call zlib_deflate with a NULL output buffer until we've consumed
      remaining input.
      
      When this happens, olen contains the size the output buffer would have
      consumed iff we'd have had enough room.
      
      This can later cause skb_over_panic when ppp_generic skb_put()s
      the returned length.
      Reported-by: default avatarIain Douglas <centos@1n6.org.uk>
      Signed-off-by: default avatarFlorian Westphal <fw@strlen.de>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      
      Merge branch 'netns'
      
      Nicolas Dichtel says:
      
      ====================
      netns: audit netdevice creation with IFLA_NET_NS_[PID|FD]
      
      When one of these attributes is set, the netdevice is created into the netns
      pointed by IFLA_NET_NS_[PID|FD] (see the call to rtnl_create_link() in
      rtnl_newlink()). Let's call this netns the dest_net. After this creation, if the
      newlink handler exists, it is called with a netns argument that points to the
      netns where the netlink message has been received (called src_net in the code)
      which is the link netns.
      Hence, with one of these attributes, it's possible to create a x-netns
      netdevice.
      
      Here is the result of my code review:
      - all ip tunnels (sit, ipip, ip6_tunnels, gre[tap][v6], ip_vti[6]) does not
        really allows to use this feature: the netdevice is created in the dest_net
        and the src_net is completely ignored in the newlink handler.
      - VLAN properly handles this x-netns creation.
      - bridge ignores src_net, which seems fine (NETIF_F_NETNS_LOCAL is set).
      - CAIF subsystem is not clear for me (I don't know how it works), but it seems
        to wrongly use src_net. Patch #1 tries to fix this, but it was done only by
        code review (and only compile-tested), so please carefully review it. I may
        miss something.
      - HSR subsystem uses src_net to parse IFLA_HSR_SLAVE[1|2], but the netdevice has
        the flag NETIF_F_NETNS_LOCAL, so the question is: does this netdevice really
        supports x-netns? If not, the newlink handler should use the dest_net instead
        of src_net, I can provide the patch.
      - ieee802154 uses also src_net and does not have NETIF_F_NETNS_LOCAL. Same
        question: does this netdevice really supports x-netns?
      - bonding ignores src_net and flag NETIF_F_NETNS_LOCAL is set, ie x-netns is not
        supported. Fine.
      - CAN does not support rtnl/newlink, ok.
      - ipvlan uses src_net and does not have NETIF_F_NETNS_LOCAL. After looking at
        the code, it seems that this drivers support x-netns. Am I right?
      - macvlan/macvtap uses src_net and seems to have x-netns support.
      - team ignores src_net and has the flag NETIF_F_NETNS_LOCAL, ie x-netns is not
        supported. Ok.
      - veth uses src_net and have x-netns support ;-) Ok.
      - VXLAN didn't properly handle this. The link netns (vxlan->net) is the src_net
        and not dest_net (see patch #2). Note that it was already possible to create a
        x-netns vxlan before the commit f01ec1c0 ("vxlan: add x-netns support")
        but the nedevice remains broken.
      
      To summarize:
       - CAIF patch must be carefully reviewed
       - for HSR, ieee802154, ipvlan: is x-netns supported?
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      
      vxlan: setup the right link netns in newlink hdlr
      
      Rename the netns to src_net to avoid confusion with the netns where the
      interface stands. The user may specify IFLA_NET_NS_[PID|FD] to create
      a x-netns netndevice: IFLA_NET_NS_[PID|FD] points to the netns where the
      netdevice stands and src_net to the link netns.
      
      Note that before commit f01ec1c0 ("vxlan: add x-netns support"), it was
      possible to create a x-netns vxlan netdevice, but the netdevice was not
      operational.
      
      Fixes: f01ec1c0 ("vxlan: add x-netns support")
      Signed-off-by: default avatarNicolas Dichtel <nicolas.dichtel@6wind.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      
      caif: remove wrong dev_net_set() call
      
      src_net points to the netns where the netlink message has been received. This
      netns may be different from the netns where the interface is created (because
      the user may add IFLA_NET_NS_[PID|FD]). In this case, src_net is the link netns.
      
      It seems wrong to override the netns in the newlink() handler because if it
      was not already src_net, it means that the user explicitly asks to create the
      netdevice in another netns.
      
      CC: Sjur Brændeland <sjur.brandeland@stericsson.com>
      CC: Dmitry Tarnyagin <dmitry.tarnyagin@lockless.no>
      Fixes: 8391c4aa ("caif: Bugfixes in CAIF netdevice for close and flow control")
      Fixes: c4125400 ("caif-hsi: Add rtnl support")
      Signed-off-by: default avatarNicolas Dichtel <nicolas.dichtel@6wind.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      
      lib/checksum.c: fix build for generic csum_tcpudp_nofold
      
      Fixed commit added from64to32 under _#ifndef do_csum_ but used it
      under _#ifndef csum_tcpudp_nofold_, breaking some builds (Fengguang's
      robot reported TILEGX's). Move from64to32 under the latter.
      
      Fixes: 150ae0e9 ("lib/checksum.c: fix carry in csum_tcpudp_nofold")
      Reported-by: default avatarkbuild test robot <fengguang.wu@intel.com>
      Signed-off-by: default avatarKarl Beldan <karl.beldan@rivierawaves.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: David S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      
      tcp: ipv4: initialize unicast_sock sk_pacing_rate
      
      When I added sk_pacing_rate field, I forgot to initialize its value
      in the per cpu unicast_sock used in ip_send_unicast_reply()
      
      This means that for sch_fq users, RST packets, or ACK packets sent
      on behalf of TIME_WAIT sockets might be sent to slowly or even dropped
      once we reach the per flow limit.
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Fixes: 95bd09eb ("tcp: TSO packets automatic sizing")
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      
      (cherry picked from commit bdbbb852
      811230cd)
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      971b8803
    • Roopa Prabhu's avatar
      bridge: dont send notification when skb->len == 0 in rtnl_bridge_notify · 8c040420
      Roopa Prabhu authored
      Reported in: https://bugzilla.kernel.org/show_bug.cgi?id=92081
      
      This patch avoids calling rtnl_notify if the device ndo_bridge_getlink
      handler does not return any bytes in the skb.
      
      Alternately, the skb->len check can be moved inside rtnl_notify.
      
      For the bridge vlan case described in 92081, there is also a fix needed
      in bridge driver to generate a proper notification. Will fix that in
      subsequent patch.
      
      v2: rebase patch on net tree
      Signed-off-by: default avatarRoopa Prabhu <roopa@cumulusnetworks.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      
      (cherry picked from commit 59ccaaaa)
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      8c040420
    • subashab@codeaurora.org's avatar
      ping: Fix race in free in receive path · abf175c2
      subashab@codeaurora.org authored
      An exception is seen in ICMP ping receive path where the skb
      destructor sock_rfree() tries to access a freed socket. This happens
      because ping_rcv() releases socket reference with sock_put() and this
      internally frees up the socket. Later icmp_rcv() will try to free the
      skb and as part of this, skb destructor is called and which leads
      to a kernel panic as the socket is freed already in ping_rcv().
      
      -->|exception
      -007|sk_mem_uncharge
      -007|sock_rfree
      -008|skb_release_head_state
      -009|skb_release_all
      -009|__kfree_skb
      -010|kfree_skb
      -011|icmp_rcv
      -012|ip_local_deliver_finish
      
      Fix this incorrect free by cloning this skb and processing this cloned
      skb instead.
      
      This patch was suggested by Eric Dumazet
      Signed-off-by: default avatarSubash Abhinov Kasiviswanathan <subashab@codeaurora.org>
      Cc: Eric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      
      (cherry picked from commit fc752f1f)
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      abf175c2
    • Herbert Xu's avatar
      udp_diag: Fix socket skipping within chain · aec9ed7c
      Herbert Xu authored
      While working on rhashtable walking I noticed that the UDP diag
      dumping code is buggy.  In particular, the socket skipping within
      a chain never happens, even though we record the number of sockets
      that should be skipped.
      
      As this code was supposedly copied from TCP, this patch does what
      TCP does and resets num before we walk a chain.
      Signed-off-by: default avatarHerbert Xu <herbert@gondor.apana.org.au>
      Acked-by: default avatarPavel Emelyanov <xemul@parallels.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      
      (cherry picked from commit 86f3cddb)
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      aec9ed7c
    • Hannes Frederic Sowa's avatar
      ipv4: try to cache dst_entries which would cause a redirect · 3cab2e87
      Hannes Frederic Sowa authored
      Not caching dst_entries which cause redirects could be exploited by hosts
      on the same subnet, causing a severe DoS attack. This effect aggravated
      since commit f8864972 ("ipv4: fix dst race in sk_dst_get()").
      
      Lookups causing redirects will be allocated with DST_NOCACHE set which
      will force dst_release to free them via RCU.  Unfortunately waiting for
      RCU grace period just takes too long, we can end up with >1M dst_entries
      waiting to be released and the system will run OOM. rcuos threads cannot
      catch up under high softirq load.
      
      Attaching the flag to emit a redirect later on to the specific skb allows
      us to cache those dst_entries thus reducing the pressure on allocation
      and deallocation.
      
      This issue was discovered by Marcelo Leitner.
      
      Cc: Julian Anastasov <ja@ssi.bg>
      Signed-off-by: default avatarMarcelo Leitner <mleitner@redhat.com>
      Signed-off-by: default avatarFlorian Westphal <fw@strlen.de>
      Signed-off-by: default avatarHannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: default avatarJulian Anastasov <ja@ssi.bg>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      
      (cherry picked from commit df4d9254)
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      3cab2e87
    • Daniel Borkmann's avatar
      net: sctp: fix slab corruption from use after free on INIT collisions · 7fe1bdec
      Daniel Borkmann authored
      When hitting an INIT collision case during the 4WHS with AUTH enabled, as
      already described in detail in commit 1be9a950 ("net: sctp: inherit
      auth_capable on INIT collisions"), it can happen that we occasionally
      still remotely trigger the following panic on server side which seems to
      have been uncovered after the fix from commit 1be9a950 ...
      
      [  533.876389] BUG: unable to handle kernel paging request at 00000000ffffffff
      [  533.913657] IP: [<ffffffff811ac385>] __kmalloc+0x95/0x230
      [  533.940559] PGD 5030f2067 PUD 0
      [  533.957104] Oops: 0000 [#1] SMP
      [  533.974283] Modules linked in: sctp mlx4_en [...]
      [  534.939704] Call Trace:
      [  534.951833]  [<ffffffff81294e30>] ? crypto_init_shash_ops+0x60/0xf0
      [  534.984213]  [<ffffffff81294e30>] crypto_init_shash_ops+0x60/0xf0
      [  535.015025]  [<ffffffff8128c8ed>] __crypto_alloc_tfm+0x6d/0x170
      [  535.045661]  [<ffffffff8128d12c>] crypto_alloc_base+0x4c/0xb0
      [  535.074593]  [<ffffffff8160bd42>] ? _raw_spin_lock_bh+0x12/0x50
      [  535.105239]  [<ffffffffa0418c11>] sctp_inet_listen+0x161/0x1e0 [sctp]
      [  535.138606]  [<ffffffff814e43bd>] SyS_listen+0x9d/0xb0
      [  535.166848]  [<ffffffff816149a9>] system_call_fastpath+0x16/0x1b
      
      ... or depending on the the application, for example this one:
      
      [ 1370.026490] BUG: unable to handle kernel paging request at 00000000ffffffff
      [ 1370.026506] IP: [<ffffffff811ab455>] kmem_cache_alloc+0x75/0x1d0
      [ 1370.054568] PGD 633c94067 PUD 0
      [ 1370.070446] Oops: 0000 [#1] SMP
      [ 1370.085010] Modules linked in: sctp kvm_amd kvm [...]
      [ 1370.963431] Call Trace:
      [ 1370.974632]  [<ffffffff8120f7cf>] ? SyS_epoll_ctl+0x53f/0x960
      [ 1371.000863]  [<ffffffff8120f7cf>] SyS_epoll_ctl+0x53f/0x960
      [ 1371.027154]  [<ffffffff812100d3>] ? anon_inode_getfile+0xd3/0x170
      [ 1371.054679]  [<ffffffff811e3d67>] ? __alloc_fd+0xa7/0x130
      [ 1371.080183]  [<ffffffff816149a9>] system_call_fastpath+0x16/0x1b
      
      With slab debugging enabled, we can see that the poison has been overwritten:
      
      [  669.826368] BUG kmalloc-128 (Tainted: G        W     ): Poison overwritten
      [  669.826385] INFO: 0xffff880228b32e50-0xffff880228b32e50. First byte 0x6a instead of 0x6b
      [  669.826414] INFO: Allocated in sctp_auth_create_key+0x23/0x50 [sctp] age=3 cpu=0 pid=18494
      [  669.826424]  __slab_alloc+0x4bf/0x566
      [  669.826433]  __kmalloc+0x280/0x310
      [  669.826453]  sctp_auth_create_key+0x23/0x50 [sctp]
      [  669.826471]  sctp_auth_asoc_create_secret+0xcb/0x1e0 [sctp]
      [  669.826488]  sctp_auth_asoc_init_active_key+0x68/0xa0 [sctp]
      [  669.826505]  sctp_do_sm+0x29d/0x17c0 [sctp] [...]
      [  669.826629] INFO: Freed in kzfree+0x31/0x40 age=1 cpu=0 pid=18494
      [  669.826635]  __slab_free+0x39/0x2a8
      [  669.826643]  kfree+0x1d6/0x230
      [  669.826650]  kzfree+0x31/0x40
      [  669.826666]  sctp_auth_key_put+0x19/0x20 [sctp]
      [  669.826681]  sctp_assoc_update+0x1ee/0x2d0 [sctp]
      [  669.826695]  sctp_do_sm+0x674/0x17c0 [sctp]
      
      Since this only triggers in some collision-cases with AUTH, the problem at
      heart is that sctp_auth_key_put() on asoc->asoc_shared_key is called twice
      when having refcnt 1, once directly in sctp_assoc_update() and yet again
      from within sctp_auth_asoc_init_active_key() via sctp_assoc_update() on
      the already kzfree'd memory, which is also consistent with the observation
      of the poison decrease from 0x6b to 0x6a (note: the overwrite is detected
      at a later point in time when poison is checked on new allocation).
      
      Reference counting of auth keys revisited:
      
      Shared keys for AUTH chunks are being stored in endpoints and associations
      in endpoint_shared_keys list. On endpoint creation, a null key is being
      added; on association creation, all endpoint shared keys are being cached
      and thus cloned over to the association. struct sctp_shared_key only holds
      a pointer to the actual key bytes, that is, struct sctp_auth_bytes which
      keeps track of users internally through refcounting. Naturally, on assoc
      or enpoint destruction, sctp_shared_key are being destroyed directly and
      the reference on sctp_auth_bytes dropped.
      
      User space can add keys to either list via setsockopt(2) through struct
      sctp_authkey and by passing that to sctp_auth_set_key() which replaces or
      adds a new auth key. There, sctp_auth_create_key() creates a new sctp_auth_bytes
      with refcount 1 and in case of replacement drops the reference on the old
      sctp_auth_bytes. A key can be set active from user space through setsockopt()
      on the id via sctp_auth_set_active_key(), which iterates through either
      endpoint_shared_keys and in case of an assoc, invokes (one of various places)
      sctp_auth_asoc_init_active_key().
      
      sctp_auth_asoc_init_active_key() computes the actual secret from local's
      and peer's random, hmac and shared key parameters and returns a new key
      directly as sctp_auth_bytes, that is asoc->asoc_shared_key, plus drops
      the reference if there was a previous one. The secret, which where we
      eventually double drop the ref comes from sctp_auth_asoc_set_secret() with
      intitial refcount of 1, which also stays unchanged eventually in
      sctp_assoc_update(). This key is later being used for crypto layer to
      set the key for the hash in crypto_hash_setkey() from sctp_auth_calculate_hmac().
      
      To close the loop: asoc->asoc_shared_key is freshly allocated secret
      material and independant of the sctp_shared_key management keeping track
      of only shared keys in endpoints and assocs. Hence, also commit 4184b2a7
      ("net: sctp: fix memory leak in auth key management") is independant of
      this bug here since it concerns a different layer (though same structures
      being used eventually). asoc->asoc_shared_key is reference dropped correctly
      on assoc destruction in sctp_association_free() and when active keys are
      being replaced in sctp_auth_asoc_init_active_key(), it always has a refcount
      of 1. Hence, it's freed prematurely in sctp_assoc_update(). Simple fix is
      to remove that sctp_auth_key_put() from there which fixes these panics.
      
      Fixes: 730fc3d0 ("[SCTP]: Implete SCTP-AUTH parameter processing")
      Signed-off-by: default avatarDaniel Borkmann <dborkman@redhat.com>
      Acked-by: default avatarVlad Yasevich <vyasevich@gmail.com>
      Acked-by: default avatarNeil Horman <nhorman@tuxdriver.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      
      (cherry picked from commit 600ddd68)
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      7fe1bdec
    • Eric Dumazet's avatar
      netxen: fix netxen_nic_poll() logic · e8c3b7d0
      Eric Dumazet authored
      NAPI poll logic now enforces that a poller returns exactly the budget
      when it wants to be called again.
      
      If a driver limits TX completion, it has to return budget as well when
      the limit is hit, not the number of received packets.
      Reported-and-tested-by: default avatarMike Galbraith <umgwanakikbuti@gmail.com>
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Fixes: d75b1ade ("net: less interrupt masking in NAPI")
      Cc: Manish Chopra <manish.chopra@qlogic.com>
      Acked-by: default avatarManish Chopra <manish.chopra@qlogic.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      
      (cherry picked from commit 6088beef)
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      e8c3b7d0
    • Hagen Paul Pfeifer's avatar
      ipv6: stop sending PTB packets for MTU < 1280 · 626c263b
      Hagen Paul Pfeifer authored
      Reduce the attack vector and stop generating IPv6 Fragment Header for
      paths with an MTU smaller than the minimum required IPv6 MTU
      size (1280 byte) - called atomic fragments.
      
      See IETF I-D "Deprecating the Generation of IPv6 Atomic Fragments" [1]
      for more information and how this "feature" can be misused.
      
      [1] https://tools.ietf.org/html/draft-ietf-6man-deprecate-atomfrag-generation-00Signed-off-by: default avatarFernando Gont <fgont@si6networks.com>
      Signed-off-by: default avatarHagen Paul Pfeifer <hagen@jauu.net>
      Acked-by: default avatarHannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      
      (cherry picked from commit 9d289715)
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      626c263b
    • karl beldan's avatar
      lib/checksum.c: fix build for generic csum_tcpudp_nofold · 23f2b873
      karl beldan authored
      Fixed commit added from64to32 under _#ifndef do_csum_ but used it
      under _#ifndef csum_tcpudp_nofold_, breaking some builds (Fengguang's
      robot reported TILEGX's). Move from64to32 under the latter.
      
      Fixes: 150ae0e9 ("lib/checksum.c: fix carry in csum_tcpudp_nofold")
      Reported-by: default avatarkbuild test robot <fengguang.wu@intel.com>
      Signed-off-by: default avatarKarl Beldan <karl.beldan@rivierawaves.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: David S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      
      (cherry picked from commit 9ce35779)
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      23f2b873
    • karl beldan's avatar
      lib/checksum.c: fix build for generic csum_tcpudp_nofold · 6b0dec80
      karl beldan authored
      Fixed commit added from64to32 under _#ifndef do_csum_ but used it
      under _#ifndef csum_tcpudp_nofold_, breaking some builds (Fengguang's
      robot reported TILEGX's). Move from64to32 under the latter.
      
      Fixes: 150ae0e9 ("lib/checksum.c: fix carry in csum_tcpudp_nofold")
      Reported-by: default avatarkbuild test robot <fengguang.wu@intel.com>
      Signed-off-by: default avatarKarl Beldan <karl.beldan@rivierawaves.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: David S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      
      tcp: ipv4: initialize unicast_sock sk_pacing_rate
      
      When I added sk_pacing_rate field, I forgot to initialize its value
      in the per cpu unicast_sock used in ip_send_unicast_reply()
      
      This means that for sch_fq users, RST packets, or ACK packets sent
      on behalf of TIME_WAIT sockets might be sent to slowly or even dropped
      once we reach the per flow limit.
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Fixes: 95bd09eb ("tcp: TSO packets automatic sizing")
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      
      lib/checksum.c: fix carry in csum_tcpudp_nofold
      
      The carry from the 64->32bits folding was dropped, e.g with:
      saddr=0xFFFFFFFF daddr=0xFF0000FF len=0xFFFF proto=0 sum=1,
      csum_tcpudp_nofold returned 0 instead of 1.
      Signed-off-by: default avatarKarl Beldan <karl.beldan@rivierawaves.com>
      Cc: Al Viro <viro@ZenIV.linux.org.uk>
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Mike Frysinger <vapier@gentoo.org>
      Cc: netdev@vger.kernel.org
      Cc: linux-kernel@vger.kernel.org
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      
      (cherry picked from commit 9ce35779
      150ae0e9)
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      6b0dec80
    • David Vrabel's avatar
      Revert "swiotlb-xen: pass dev_addr to swiotlb_tbl_unmap_single" · ebc46030
      David Vrabel authored
      This reverts commit 2c3fc8d2.
      
      This commit broke on x86 PV because entries in the generic SWIOTLB are
      indexed using (pseudo-)physical address not DMA address and these are
      not the same in a x86 PV guest.
      Signed-off-by: default avatarDavid Vrabel <david.vrabel@citrix.com>
      Reviewed-by: default avatarStefano Stabellini <stefano.stabellini@eu.citrix.com>
      
      Revert "swiotlb-xen: pass dev_addr to swiotlb_tbl_unmap_single"
      
      This reverts commit 2c3fc8d2.
      
      This commit broke on x86 PV because entries in the generic SWIOTLB are
      indexed using (pseudo-)physical address not DMA address and these are
      not the same in a x86 PV guest.
      Signed-off-by: default avatarDavid Vrabel <david.vrabel@citrix.com>
      Reviewed-by: default avatarStefano Stabellini <stefano.stabellini@eu.citrix.com>
      
      xen: annotate xen_set_identity_and_remap_chunk() with __init
      
      Commit 5b8e7d80 removed the __init
      annotation from xen_set_identity_and_remap_chunk(). Add it again.
      Signed-off-by: default avatarJuergen Gross <jgross@suse.com>
      Signed-off-by: default avatarDavid Vrabel <david.vrabel@citrix.com>
      
      xen: introduce helper functions to do safe read and write accesses
      
      Introduce two helper functions to safely read and write unsigned long
      values from or to memory when the access may fault because the mapping
      is non-present or read-only.
      
      These helpers can be used instead of open coded uses of __get_user()
      and __put_user() avoiding the need to do casts to fix sparse warnings.
      
      Use the helpers in page.h and p2m.c. This will fix the sparse
      warnings when doing "make C=1".
      Signed-off-by: default avatarJuergen Gross <jgross@suse.com>
      Signed-off-by: default avatarDavid Vrabel <david.vrabel@citrix.com>
      
      xen: Speed up set_phys_to_machine() by using read-only mappings
      
      Instead of checking at each call of set_phys_to_machine() whether a
      new p2m page has to be allocated due to writing an entry in a large
      invalid or identity area, just map those areas read only and react
      to a page fault on write by allocating the new page.
      
      This change will make the common path with no allocation much
      faster as it only requires a single write of the new mfn instead
      of walking the address translation tables and checking for the
      special cases.
      Suggested-by: default avatarDavid Vrabel <david.vrabel@citrix.com>
      Signed-off-by: default avatarJuergen Gross <jgross@suse.com>
      Reviewed-by: default avatarDavid Vrabel <david.vrabel@citrix.com>
      Reviewed-by: default avatarKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Signed-off-by: default avatarDavid Vrabel <david.vrabel@citrix.com>
      
      xen: switch to linear virtual mapped sparse p2m list
      
      At start of the day the Xen hypervisor presents a contiguous mfn list
      to a pv-domain. In order to support sparse memory this mfn list is
      accessed via a three level p2m tree built early in the boot process.
      Whenever the system needs the mfn associated with a pfn this tree is
      used to find the mfn.
      
      Instead of using a software walked tree for accessing a specific mfn
      list entry this patch is creating a virtual address area for the
      entire possible mfn list including memory holes. The holes are
      covered by mapping a pre-defined  page consisting only of "invalid
      mfn" entries. Access to a mfn entry is possible by just using the
      virtual base address of the mfn list and the pfn as index into that
      list. This speeds up the (hot) path of determining the mfn of a
      pfn.
      
      Kernel build on a Dell Latitude E6440 (2 cores, HT) in 64 bit Dom0
      showed following improvements:
      
      Elapsed time: 32:50 ->  32:35
      System:       18:07 ->  17:47
      User:        104:00 -> 103:30
      
      Tested with following configurations:
      - 64 bit dom0, 8GB RAM
      - 64 bit dom0, 128 GB RAM, PCI-area above 4 GB
      - 32 bit domU, 512 MB, 8 GB, 43 GB (more wouldn't work even without
                                          the patch)
      - 32 bit domU, ballooning up and down
      - 32 bit domU, save and restore
      - 32 bit domU with PCI passthrough
      - 64 bit domU, 8 GB, 2049 MB, 5000 MB
      - 64 bit domU, ballooning up and down
      - 64 bit domU, save and restore
      - 64 bit domU with PCI passthrough
      Signed-off-by: default avatarJuergen Gross <jgross@suse.com>
      Signed-off-by: default avatarDavid Vrabel <david.vrabel@citrix.com>
      
      xen: Hide get_phys_to_machine() to be able to tune common path
      
      Today get_phys_to_machine() is always called when the mfn for a pfn
      is to be obtained. Add a wrapper __pfn_to_mfn() as inline function
      to be able to avoid calling get_phys_to_machine() when possible as
      soon as the switch to a linear mapped p2m list has been done.
      Signed-off-by: default avatarJuergen Gross <jgross@suse.com>
      Reviewed-by: default avatarDavid Vrabel <david.vrabel@citrix.com>
      Signed-off-by: default avatarDavid Vrabel <david.vrabel@citrix.com>
      
      x86: Introduce function to get pmd entry pointer
      
      Introduces lookup_pmd_address() to get the address of the pmd entry
      related to a virtual address in the current address space. This
      function is needed for support of a virtual mapped sparse p2m list
      in xen pv domains, as we need the address of the pmd entry, not the
      one of the pte in that case.
      Signed-off-by: default avatarJuergen Gross <jgross@suse.com>
      Signed-off-by: default avatarDavid Vrabel <david.vrabel@citrix.com>
      
      xen: Delay invalidating extra memory
      
      When the physical memory configuration is initialized the p2m entries
      for not pouplated memory pages are set to "invalid". As those pages
      are beyond the hypervisor built p2m list the p2m tree has to be
      extended.
      
      This patch delays processing the extra memory related p2m entries
      during the boot process until some more basic memory management
      functions are callable. This removes the need to create new p2m
      entries until virtual memory management is available.
      Signed-off-by: default avatarJuergen Gross <jgross@suse.com>
      Reviewed-by: default avatarDavid Vrabel <david.vrabel@citrix.com>
      Signed-off-by: default avatarDavid Vrabel <david.vrabel@citrix.com>
      
      xen: Delay m2p_override initialization
      
      The m2p overrides are used to be able to find the local pfn for a
      foreign mfn mapped into the domain. They are used by driver backends
      having to access frontend data.
      
      As this functionality isn't used in early boot it makes no sense to
      initialize the m2p override functions very early. It can be done
      later without doing any harm, removing the need for allocating memory
      via extend_brk().
      
      While at it make some m2p override functions static as they are only
      used internally.
      Signed-off-by: default avatarJuergen Gross <jgross@suse.com>
      Reviewed-by: default avatarDavid Vrabel <david.vrabel@citrix.com>
      Reviewed-by: default avatarKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Signed-off-by: default avatarDavid Vrabel <david.vrabel@citrix.com>
      
      xen: Delay remapping memory of pv-domain
      
      Early in the boot process the memory layout of a pv-domain is changed
      to match the E820 map (either the host one for Dom0 or the Xen one)
      regarding placement of RAM and PCI holes. This requires removing memory
      pages initially located at positions not suitable for RAM and adding
      them later at higher addresses where no restrictions apply.
      
      To be able to operate on the hypervisor supported p2m list until a
      virtual mapped linear p2m list can be constructed, remapping must
      be delayed until virtual memory management is initialized, as the
      initial p2m list can't be extended unlimited at physical memory
      initialization time due to it's fixed structure.
      
      A further advantage is the reduction in complexity and code volume as
      we don't have to be careful regarding memory restrictions during p2m
      updates.
      Signed-off-by: default avatarJuergen Gross <jgross@suse.com>
      Reviewed-by: default avatarDavid Vrabel <david.vrabel@citrix.com>
      Signed-off-by: default avatarDavid Vrabel <david.vrabel@citrix.com>
      
      xen: use common page allocation function in p2m.c
      
      In arch/x86/xen/p2m.c three different allocation functions for
      obtaining a memory page are used: extend_brk(), alloc_bootmem_align()
      or __get_free_page().  Which of those functions is used depends on the
      progress of the boot process of the system.
      
      Introduce a common allocation routine selecting the to be called
      allocation routine dynamically based on the boot progress. This allows
      moving initialization steps without having to care about changing
      allocation calls.
      Signed-off-by: default avatarJuergen Gross <jgross@suse.com>
      Signed-off-by: default avatarDavid Vrabel <david.vrabel@citrix.com>
      
      xen: Make functions static
      
      Some functions in arch/x86/xen/p2m.c are used locally only. Make them
      static. Rearrange the functions in p2m.c to avoid forward declarations.
      Signed-off-by: default avatarJuergen Gross <jgross@suse.com>
      Signed-off-by: default avatarDavid Vrabel <david.vrabel@citrix.com>
      
      xen: fix some style issues in p2m.c
      
      The source arch/x86/xen/p2m.c has some coding style issues. Fix them.
      Signed-off-by: default avatarJuergen Gross <jgross@suse.com>
      Signed-off-by: default avatarDavid Vrabel <david.vrabel@citrix.com>
      
      (cherry picked from commit dbdd7476
      4ef8e3f3)
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      ebc46030
    • Chris Wilson's avatar
      drm/i915: Fix mutex->owner inspection race under DEBUG_MUTEXES · cb55bd0a
      Chris Wilson authored
      commit 226e5ae9 upstream.
      
      If CONFIG_DEBUG_MUTEXES is set, the mutex->owner field is only cleared
      if the mutex debugging is enabled which introduces a race in our
      mutex_is_locked_by() - i.e. we may inspect the old owner value before it
      is acquired by the new task.
      
      This is the root cause of this error:
      
      # diff --git a/kernel/locking/mutex-debug.c b/kernel/locking/mutex-debug.c
      # index 5cf6731..3ef3736 100644
      # --- a/kernel/locking/mutex-debug.c
      # +++ b/kernel/locking/mutex-debug.c
      # @@ -80,13 +80,13 @@ void debug_mutex_unlock(struct mutex *lock)
      # 			DEBUG_LOCKS_WARN_ON(lock->owner != current);
      #
      # 		DEBUG_LOCKS_WARN_ON(!lock->wait_list.prev && !lock->wait_list.next);
      # -		mutex_clear_owner(lock);
      # 	}
      #
      # 	/*
      # 	 * __mutex_slowpath_needs_to_unlock() is explicitly 0 for debug
      # 	 * mutexes so that we can do it here after we've verified state.
      # 	 */
      # +	mutex_clear_owner(lock);
      # 	atomic_set(&lock->count, 1);
      #  }
      
      Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=87955Signed-off-by: default avatarChris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: default avatarDaniel Vetter <daniel.vetter@ffwll.ch>
      Signed-off-by: default avatarJani Nikula <jani.nikula@intel.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      
      (cherry picked from commit 8dcffdd3)
      cb55bd0a
    • Hans de Goede's avatar
      asus-nb-wmi: Add another wapf=4 quirk · 22288f47
      Hans de Goede authored
      Wifi on this laptop does not work unless asus-nb-wmi.wapf=4 is specified on
      the kerne commandline, add a quirk for this.
      
      Cc: stable@vger.kernel.org
      BugLink: https://bugs.launchpad.net/bugs/1173681Signed-off-by: default avatarHans de Goede <hdegoede@redhat.com>
      Signed-off-by: default avatarDarren Hart <dvhart@linux.intel.com>
      
      (cherry picked from commit 841e11cc)
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      22288f47
    • Chris Wilson's avatar
      drm/i915: Force the CS stall for invalidate flushes · 383fafa8
      Chris Wilson authored
      In order to act as a full command barrier by itself, we need to tell the
      pipecontrol to actually stall the command streamer while the flush runs.
      We require the full command barrier before operations like
      MI_SET_CONTEXT, which currently rely on a prior invalidate flush.
      
      References: https://bugs.freedesktop.org/show_bug.cgi?id=83677
      Cc: Simon Farnsworth <simon@farnz.org.uk>
      Cc: Daniel Vetter <daniel@ffwll.ch>
      Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
      Signed-off-by: default avatarChris Wilson <chris@chris-wilson.co.uk>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarJani Nikula <jani.nikula@intel.com>
      
      (cherry picked from commit add284a3)
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      383fafa8
    • Kazuya Mizuguchi's avatar
      usb: renesas_usbhs: gadget: fix NULL pointer dereference in ep_disable() · d7d8e0fa
      Kazuya Mizuguchi authored
      This patch fixes an issue that the NULL pointer dereference happens
      when we uses g_audio driver. Since the g_audio driver will call
      usb_ep_disable() in afunc_set_alt() before it calls usb_ep_enable(),
      the uep->pipe of renesas usbhs driver will be NULL. So, this patch
      adds a condition to avoid the oops.
      Signed-off-by: default avatarKazuya Mizuguchi <kazuya.mizuguchi.ks@renesas.com>
      Signed-off-by: default avatarTakeshi Kihara <takeshi.kihara.df@renesas.com>
      Signed-off-by: default avatarYoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
      Fixes: 2f98382d (usb: renesas_usbhs: Add Renesas USBHS Gadget)
      Cc: <stable@vger.kernel.org> # v3.0+
      Signed-off-by: default avatarFelipe Balbi <balbi@ti.com>
      
      (cherry picked from commit 11432050)
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      d7d8e0fa
    • Gwendal Grignou's avatar
      HID: i2c-hid: prevent buffer overflow in early IRQ · 17fd0ceb
      Gwendal Grignou authored
      commit d1c7e29e upstream.
      
      Before ->start() is called, bufsize size is set to HID_MIN_BUFFER_SIZE,
      64 bytes. While processing the IRQ, we were asking to receive up to
      wMaxInputLength bytes, which can be bigger than 64 bytes.
      
      Later, when ->start is run, a proper bufsize will be calculated.
      
      Given wMaxInputLength is said to be unreliable in other part of the
      code, set to receive only what we can even if it results in truncated
      reports.
      Signed-off-by: default avatarGwendal Grignou <gwendal@chromium.org>
      Reviewed-by: default avatarBenjamin Tissoires <benjamin.tissoires@redhat.com>
      Signed-off-by: default avatarJiri Kosina <jkosina@suse.cz>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      
      (cherry picked from commit e788b6db)
      17fd0ceb
    • Stefano Stabellini's avatar
      Revert "swiotlb-xen: pass dev_addr to swiotlb_tbl_unmap_single" · 764a3524
      Stefano Stabellini authored
      This reverts commit 2c3fc8d2.
      
      This commit broke on x86 PV because entries in the generic SWIOTLB are
      indexed using (pseudo-)physical address not DMA address and these are
      not the same in a x86 PV guest.
      Signed-off-by: default avatarDavid Vrabel <david.vrabel@citrix.com>
      Reviewed-by: default avatarStefano Stabellini <stefano.stabellini@eu.citrix.com>
      
      Revert "swiotlb-xen: pass dev_addr to swiotlb_tbl_unmap_single"
      
      This reverts commit 2c3fc8d2.
      
      This commit broke on x86 PV because entries in the generic SWIOTLB are
      indexed using (pseudo-)physical address not DMA address and these are
      not the same in a x86 PV guest.
      Signed-off-by: default avatarDavid Vrabel <david.vrabel@citrix.com>
      Reviewed-by: default avatarStefano Stabellini <stefano.stabellini@eu.citrix.com>
      
      xen: annotate xen_set_identity_and_remap_chunk() with __init
      
      Commit 5b8e7d80 removed the __init
      annotation from xen_set_identity_and_remap_chunk(). Add it again.
      Signed-off-by: default avatarJuergen Gross <jgross@suse.com>
      Signed-off-by: default avatarDavid Vrabel <david.vrabel@citrix.com>
      
      xen: introduce helper functions to do safe read and write accesses
      
      Introduce two helper functions to safely read and write unsigned long
      values from or to memory when the access may fault because the mapping
      is non-present or read-only.
      
      These helpers can be used instead of open coded uses of __get_user()
      and __put_user() avoiding the need to do casts to fix sparse warnings.
      
      Use the helpers in page.h and p2m.c. This will fix the sparse
      warnings when doing "make C=1".
      Signed-off-by: default avatarJuergen Gross <jgross@suse.com>
      Signed-off-by: default avatarDavid Vrabel <david.vrabel@citrix.com>
      
      xen: Speed up set_phys_to_machine() by using read-only mappings
      
      Instead of checking at each call of set_phys_to_machine() whether a
      new p2m page has to be allocated due to writing an entry in a large
      invalid or identity area, just map those areas read only and react
      to a page fault on write by allocating the new page.
      
      This change will make the common path with no allocation much
      faster as it only requires a single write of the new mfn instead
      of walking the address translation tables and checking for the
      special cases.
      Suggested-by: default avatarDavid Vrabel <david.vrabel@citrix.com>
      Signed-off-by: default avatarJuergen Gross <jgross@suse.com>
      Reviewed-by: default avatarDavid Vrabel <david.vrabel@citrix.com>
      Reviewed-by: default avatarKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Signed-off-by: default avatarDavid Vrabel <david.vrabel@citrix.com>
      
      xen: switch to linear virtual mapped sparse p2m list
      
      At start of the day the Xen hypervisor presents a contiguous mfn list
      to a pv-domain. In order to support sparse memory this mfn list is
      accessed via a three level p2m tree built early in the boot process.
      Whenever the system needs the mfn associated with a pfn this tree is
      used to find the mfn.
      
      Instead of using a software walked tree for accessing a specific mfn
      list entry this patch is creating a virtual address area for the
      entire possible mfn list including memory holes. The holes are
      covered by mapping a pre-defined  page consisting only of "invalid
      mfn" entries. Access to a mfn entry is possible by just using the
      virtual base address of the mfn list and the pfn as index into that
      list. This speeds up the (hot) path of determining the mfn of a
      pfn.
      
      Kernel build on a Dell Latitude E6440 (2 cores, HT) in 64 bit Dom0
      showed following improvements:
      
      Elapsed time: 32:50 ->  32:35
      System:       18:07 ->  17:47
      User:        104:00 -> 103:30
      
      Tested with following configurations:
      - 64 bit dom0, 8GB RAM
      - 64 bit dom0, 128 GB RAM, PCI-area above 4 GB
      - 32 bit domU, 512 MB, 8 GB, 43 GB (more wouldn't work even without
                                          the patch)
      - 32 bit domU, ballooning up and down
      - 32 bit domU, save and restore
      - 32 bit domU with PCI passthrough
      - 64 bit domU, 8 GB, 2049 MB, 5000 MB
      - 64 bit domU, ballooning up and down
      - 64 bit domU, save and restore
      - 64 bit domU with PCI passthrough
      Signed-off-by: default avatarJuergen Gross <jgross@suse.com>
      Signed-off-by: default avatarDavid Vrabel <david.vrabel@citrix.com>
      
      xen: Hide get_phys_to_machine() to be able to tune common path
      
      Today get_phys_to_machine() is always called when the mfn for a pfn
      is to be obtained. Add a wrapper __pfn_to_mfn() as inline function
      to be able to avoid calling get_phys_to_machine() when possible as
      soon as the switch to a linear mapped p2m list has been done.
      Signed-off-by: default avatarJuergen Gross <jgross@suse.com>
      Reviewed-by: default avatarDavid Vrabel <david.vrabel@citrix.com>
      Signed-off-by: default avatarDavid Vrabel <david.vrabel@citrix.com>
      
      x86: Introduce function to get pmd entry pointer
      
      Introduces lookup_pmd_address() to get the address of the pmd entry
      related to a virtual address in the current address space. This
      function is needed for support of a virtual mapped sparse p2m list
      in xen pv domains, as we need the address of the pmd entry, not the
      one of the pte in that case.
      Signed-off-by: default avatarJuergen Gross <jgross@suse.com>
      Signed-off-by: default avatarDavid Vrabel <david.vrabel@citrix.com>
      
      xen: Delay invalidating extra memory
      
      When the physical memory configuration is initialized the p2m entries
      for not pouplated memory pages are set to "invalid". As those pages
      are beyond the hypervisor built p2m list the p2m tree has to be
      extended.
      
      This patch delays processing the extra memory related p2m entries
      during the boot process until some more basic memory management
      functions are callable. This removes the need to create new p2m
      entries until virtual memory management is available.
      Signed-off-by: default avatarJuergen Gross <jgross@suse.com>
      Reviewed-by: default avatarDavid Vrabel <david.vrabel@citrix.com>
      Signed-off-by: default avatarDavid Vrabel <david.vrabel@citrix.com>
      
      xen: Delay m2p_override initialization
      
      The m2p overrides are used to be able to find the local pfn for a
      foreign mfn mapped into the domain. They are used by driver backends
      having to access frontend data.
      
      As this functionality isn't used in early boot it makes no sense to
      initialize the m2p override functions very early. It can be done
      later without doing any harm, removing the need for allocating memory
      via extend_brk().
      
      While at it make some m2p override functions static as they are only
      used internally.
      Signed-off-by: default avatarJuergen Gross <jgross@suse.com>
      Reviewed-by: default avatarDavid Vrabel <david.vrabel@citrix.com>
      Reviewed-by: default avatarKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Signed-off-by: default avatarDavid Vrabel <david.vrabel@citrix.com>
      
      xen: Delay remapping memory of pv-domain
      
      Early in the boot process the memory layout of a pv-domain is changed
      to match the E820 map (either the host one for Dom0 or the Xen one)
      regarding placement of RAM and PCI holes. This requires removing memory
      pages initially located at positions not suitable for RAM and adding
      them later at higher addresses where no restrictions apply.
      
      To be able to operate on the hypervisor supported p2m list until a
      virtual mapped linear p2m list can be constructed, remapping must
      be delayed until virtual memory management is initialized, as the
      initial p2m list can't be extended unlimited at physical memory
      initialization time due to it's fixed structure.
      
      A further advantage is the reduction in complexity and code volume as
      we don't have to be careful regarding memory restrictions during p2m
      updates.
      Signed-off-by: default avatarJuergen Gross <jgross@suse.com>
      Reviewed-by: default avatarDavid Vrabel <david.vrabel@citrix.com>
      Signed-off-by: default avatarDavid Vrabel <david.vrabel@citrix.com>
      
      xen: use common page allocation function in p2m.c
      
      In arch/x86/xen/p2m.c three different allocation functions for
      obtaining a memory page are used: extend_brk(), alloc_bootmem_align()
      or __get_free_page().  Which of those functions is used depends on the
      progress of the boot process of the system.
      
      Introduce a common allocation routine selecting the to be called
      allocation routine dynamically based on the boot progress. This allows
      moving initialization steps without having to care about changing
      allocation calls.
      Signed-off-by: default avatarJuergen Gross <jgross@suse.com>
      Signed-off-by: default avatarDavid Vrabel <david.vrabel@citrix.com>
      
      xen: Make functions static
      
      Some functions in arch/x86/xen/p2m.c are used locally only. Make them
      static. Rearrange the functions in p2m.c to avoid forward declarations.
      Signed-off-by: default avatarJuergen Gross <jgross@suse.com>
      Signed-off-by: default avatarDavid Vrabel <david.vrabel@citrix.com>
      
      xen: fix some style issues in p2m.c
      
      The source arch/x86/xen/p2m.c has some coding style issues. Fix them.
      Signed-off-by: default avatarJuergen Gross <jgross@suse.com>
      Signed-off-by: default avatarDavid Vrabel <david.vrabel@citrix.com>
      
      xen/pci: Use APIC directly when APIC virtualization hardware is available
      
      When hardware supports APIC/x2APIC virtualization we don't need to use
      pirqs for MSI handling and instead use APIC since most APIC accesses
      (MMIO or MSR) will now be processed without VMEXITs.
      
      As an example, netperf on the original code produces this profile
      (collected wih 'xentrace -e 0x0008ffff -T 5'):
      
          342 cpu_change
          260 CPUID
        34638 HLT
        64067 INJ_VIRQ
        28374 INTR
        82733 INTR_WINDOW
           10 NPF
        24337 TRAP
       370610 vlapic_accept_pic_intr
       307528 VMENTRY
       307527 VMEXIT
       140998 VMMCALL
          127 wrap_buffer
      
      After applying this patch the same test shows
      
          230 cpu_change
          260 CPUID
        36542 HLT
          174 INJ_VIRQ
        27250 INTR
          222 INTR_WINDOW
           20 NPF
        24999 TRAP
       381812 vlapic_accept_pic_intr
       166480 VMENTRY
       166479 VMEXIT
        77208 VMMCALL
           81 wrap_buffer
      
      ApacheBench results (ab -n 10000 -c 200) improve by about 10%
      Signed-off-by: default avatarBoris Ostrovsky <boris.ostrovsky@oracle.com>
      Reviewed-by: default avatarKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Reviewed-by: default avatarAndrew Cooper <andrew.cooper3@citrix.com>
      Signed-off-by: default avatarDavid Vrabel <david.vrabel@citrix.com>
      
      xen/pci: Defer initialization of MSI ops on HVM guests
      
      If the hardware supports APIC virtualization we may decide not to use
      pirqs and instead use APIC/x2APIC directly, meaning that we don't want
      to set x86_msi.setup_msi_irqs and x86_msi.teardown_msi_irq to
      Xen-specific routines.  However, x2APIC is not set up by the time
      pci_xen_hvm_init() is called so we need to postpone setting these ops
      until later, when we know which APIC mode is used.
      
      (Note that currently x2APIC is never initialized on HVM guests. This
      may change in the future)
      Signed-off-by: default avatarBoris Ostrovsky <boris.ostrovsky@oracle.com>
      Acked-by: default avatarStefano Stabellini <stefano.stabellini@eu.citrix.com>
      Signed-off-by: default avatarDavid Vrabel <david.vrabel@citrix.com>
      
      xen-pciback: drop SR-IOV VFs when PF driver unloads
      
      When a PF driver unloads, it may find it necessary to leave the VFs
      around simply because of pciback having marked them as assigned to a
      guest. Utilize a suitable notification to let go of the VFs, thus
      allowing the PF to go back into the state it was before its driver
      loaded (which in particular allows the driver to be loaded again with
      it being able to create the VFs anew, but which also allows to then
      pass through the PF instead of the VFs).
      
      Don't do this however for any VFs currently in active use by a guest.
      Signed-off-by: default avatarJan Beulich <jbeulich@suse.com>
      [v2: Removed the switch statement, moved it about]
      [v3: Redid it a bit differently]
      Signed-off-by: default avatarKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Signed-off-by: default avatarDavid Vrabel <david.vrabel@citrix.com>
      
      xen/pciback: Restore configuration space when detaching from a guest.
      
      The commit "xen/pciback: Don't deadlock when unbinding." was using
      the version of pci_reset_function which would lock the device lock.
      That is no good as we can dead-lock. As such we swapped to using
      the lock-less version and requiring that the callers
      of 'pcistub_put_pci_dev' take the device lock. And as such
      this bug got exposed.
      
      Using the lock-less version is  OK, except that we tried to
      use 'pci_restore_state' after the lock-less version of
      __pci_reset_function_locked - which won't work as 'state_saved'
      is set to false. Said 'state_saved' is a toggle boolean that
      is to be used by the sequence of a) pci_save_state/pci_restore_state
      or b) pci_load_and_free_saved_state/pci_restore_state. We don't
      want to use a) as the guest might have messed up the PCI
      configuration space and we want it to revert to the state
      when the PCI device was binded to us. Therefore we pick
      b) to restore the configuration space.
      
      We restore from our 'golden' version of PCI configuration space, when an:
       - Device is unbinded from pciback
       - Device is detached from a guest.
      Reported-by: default avatarSander Eikelenboom <linux@eikelenboom.it>
      Signed-off-by: default avatarKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Signed-off-by: default avatarDavid Vrabel <david.vrabel@citrix.com>
      
      PCI: Expose pci_load_saved_state for public consumption.
      
      We have the pci_load_and_free_saved_state, and pci_store_saved_state
      but are missing the functionality to just load the state
      multiple times in the PCI device without having to free/save
      the state.
      
      This patch makes it possible to use this function.
      Signed-off-by: default avatarKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Acked-by: default avatarBjorn Helgaas <bhelgaas@google.com>
      Signed-off-by: default avatarDavid Vrabel <david.vrabel@citrix.com>
      
      xen/pciback: Remove tons of dereferences
      
      A little cleanup. No functional difference.
      Reviewed-by: default avatarBoris Ostrovsky <boris.ostrovsky@oracle.com>
      Signed-off-by: default avatarKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Signed-off-by: default avatarDavid Vrabel <david.vrabel@citrix.com>
      
      xen/pciback: Print out the domain owning the device.
      
      We had been printing it only if the device was built with
      debug enabled. But this information is useful in the field
      to troubleshoot.
      Signed-off-by: default avatarKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Reviewed-by: default avatarDavid Vrabel <david.vrabel@citrix.com>
      Signed-off-by: default avatarDavid Vrabel <david.vrabel@citrix.com>
      
      xen/pciback: Include the domain id if removing the device whilst still in use
      
      Cleanup the function a bit - also include the id of the
      domain that is using the device.
      Signed-off-by: default avatarKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Reviewed-by: default avatarDavid Vrabel <david.vrabel@citrix.com>
      Signed-off-by: default avatarDavid Vrabel <david.vrabel@citrix.com>
      
      driver core: Provide an wrapper around the mutex to do lockdep warnings
      
      Instead of open-coding it in drivers that want to double check
      that their functions are indeed holding the device lock.
      Signed-off-by: default avatarKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Suggested-by: default avatarDavid Vrabel <david.vrabel@citrix.com>
      Acked-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: default avatarDavid Vrabel <david.vrabel@citrix.com>
      
      xen/pciback: Don't deadlock when unbinding.
      
      As commit 0a9fd015
      'xen/pciback: Document the entry points for 'pcistub_put_pci_dev''
      explained there are four entry points in this function.
      Two of them are when the user fiddles in the SysFS to
      unbind a device which might be in use by a guest or not.
      
      Both 'unbind' states will cause a deadlock as the the PCI lock has
      already been taken, which then pci_device_reset tries to take.
      
      We can simplify this by requiring that all callers of
      pcistub_put_pci_dev MUST hold the device lock. And then
      we can just call the lockless version of pci_device_reset.
      
      To make it even simpler we will modify xen_pcibk_release_pci_dev
      to quality whether it should take a lock or not - as it ends
      up calling xen_pcibk_release_pci_dev and needs to hold the lock.
      Reviewed-by: default avatarBoris Ostrovsky <boris.ostrovsky@oracle.com>
      Signed-off-by: default avatarKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Signed-off-by: default avatarDavid Vrabel <david.vrabel@citrix.com>
      
      swiotlb-xen: pass dev_addr to swiotlb_tbl_unmap_single
      
      Need to pass the pointer within the swiotlb internal buffer to the
      swiotlb library, that in the case of xen_unmap_single is dev_addr, not
      paddr.
      Signed-off-by: default avatarStefano Stabellini <stefano.stabellini@eu.citrix.com>
      Acked-by: default avatarKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      CC: stable@vger.kernel.org
      
      (cherry picked from commit dbdd7476
      4ef8e3f3
      2c3fc8d2)
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      764a3524
    • Guenter Roeck's avatar
      next: sh: Fix compile error · a1e5cc97
      Guenter Roeck authored
      Commit 927609d6 ("kernel: tighten rules for ACCESS ONCE") results in a
      compile failure for sh builds with CONFIG_X2TLB enabled.
      
      arch/sh/mm/gup.c: In function 'gup_get_pte':
      arch/sh/mm/gup.c:20:2: error: invalid initializer
      make[1]: *** [arch/sh/mm/gup.o] Error 1
      
      Replace ACCESS_ONCE with READ_ONCE to fix the problem.
      
      Fixes: 927609d6 ("kernel: tighten rules for ACCESS ONCE")
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Signed-off-by: default avatarGuenter Roeck <linux@roeck-us.net>
      Reviewed-by: default avatarPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Signed-off-by: default avatarChristian Borntraeger <borntraeger@de.ibm.com>
      
      (cherry picked from commit 378af02b)
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      a1e5cc97
    • Christian Borntraeger's avatar
      kernel: Fix sparse warning for ACCESS_ONCE · 4e3fb495
      Christian Borntraeger authored
      Commit 927609d6 ("kernel: tighten rules for ACCESS ONCE") results in
      sparse warnings like "Using plain integer as NULL pointer" - Let's add a
      type cast to the dummy assignment.
      To avoid warnings lik "sparse: warning: cast to restricted __hc32" we also
      use __force on that cast.
      
      Fixes: 927609d6 ("kernel: tighten rules for ACCESS ONCE")
      Signed-off-by: default avatarChristian Borntraeger <borntraeger@de.ibm.com>
      
      (cherry picked from commit c5b19946)
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      4e3fb495
    • Christian Borntraeger's avatar
      kernel: Fix sparse warning for ACCESS_ONCE · 3d9479f9
      Christian Borntraeger authored
      Commit 927609d6 ("kernel: tighten rules for ACCESS ONCE") results in
      sparse warnings like "Using plain integer as NULL pointer" - Let's add a
      type cast to the dummy assignment.
      To avoid warnings lik "sparse: warning: cast to restricted __hc32" we also
      use __force on that cast.
      
      Fixes: 927609d6 ("kernel: tighten rules for ACCESS ONCE")
      Signed-off-by: default avatarChristian Borntraeger <borntraeger@de.ibm.com>
      
      next: sh: Fix compile error
      
      Commit 927609d6 ("kernel: tighten rules for ACCESS ONCE") results in a
      compile failure for sh builds with CONFIG_X2TLB enabled.
      
      arch/sh/mm/gup.c: In function 'gup_get_pte':
      arch/sh/mm/gup.c:20:2: error: invalid initializer
      make[1]: *** [arch/sh/mm/gup.o] Error 1
      
      Replace ACCESS_ONCE with READ_ONCE to fix the problem.
      
      Fixes: 927609d6 ("kernel: tighten rules for ACCESS ONCE")
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Signed-off-by: default avatarGuenter Roeck <linux@roeck-us.net>
      Reviewed-by: default avatarPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Signed-off-by: default avatarChristian Borntraeger <borntraeger@de.ibm.com>
      
      kernel: tighten rules for ACCESS ONCE
      
      Now that all non-scalar users of ACCESS_ONCE have been converted
      to READ_ONCE or ASSIGN once, lets tighten ACCESS_ONCE to only
      work on scalar types.
      This variant was proposed by Alexei Starovoitov.
      Signed-off-by: default avatarChristian Borntraeger <borntraeger@de.ibm.com>
      Reviewed-by: default avatarPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      
      (cherry picked from commit c5b19946
      378af02b
      927609d6)
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      3d9479f9
    • Hector Marco-Gisbert's avatar
      x86, mm/ASLR: Fix stack randomization on 64-bit systems · 02d7840b
      Hector Marco-Gisbert authored
      The issue is that the stack for processes is not properly randomized on
      64 bit architectures due to an integer overflow.
      
      The affected function is randomize_stack_top() in file
      "fs/binfmt_elf.c":
      
        static unsigned long randomize_stack_top(unsigned long stack_top)
        {
                 unsigned int random_variable = 0;
      
                 if ((current->flags & PF_RANDOMIZE) &&
                         !(current->personality & ADDR_NO_RANDOMIZE)) {
                         random_variable = get_random_int() & STACK_RND_MASK;
                         random_variable <<= PAGE_SHIFT;
                 }
                 return PAGE_ALIGN(stack_top) + random_variable;
                 return PAGE_ALIGN(stack_top) - random_variable;
        }
      
      Note that, it declares the "random_variable" variable as "unsigned int".
      Since the result of the shifting operation between STACK_RND_MASK (which
      is 0x3fffff on x86_64, 22 bits) and PAGE_SHIFT (which is 12 on x86_64):
      
      	  random_variable <<= PAGE_SHIFT;
      
      then the two leftmost bits are dropped when storing the result in the
      "random_variable". This variable shall be at least 34 bits long to hold
      the (22+12) result.
      
      These two dropped bits have an impact on the entropy of process stack.
      Concretely, the total stack entropy is reduced by four: from 2^28 to
      2^30 (One fourth of expected entropy).
      
      This patch restores back the entropy by correcting the types involved
      in the operations in the functions randomize_stack_top() and
      stack_maxrandom_size().
      
      The successful fix can be tested with:
      
        $ for i in `seq 1 10`; do cat /proc/self/maps | grep stack; done
        7ffeda566000-7ffeda587000 rw-p 00000000 00:00 0                          [stack]
        7fff5a332000-7fff5a353000 rw-p 00000000 00:00 0                          [stack]
        7ffcdb7a1000-7ffcdb7c2000 rw-p 00000000 00:00 0                          [stack]
        7ffd5e2c4000-7ffd5e2e5000 rw-p 00000000 00:00 0                          [stack]
        ...
      
      Once corrected, the leading bytes should be between 7ffc and 7fff,
      rather than always being 7fff.
      Signed-off-by: default avatarHector Marco-Gisbert <hecmargi@upv.es>
      Signed-off-by: default avatarIsmael Ripoll <iripoll@upv.es>
      [ Rebased, fixed 80 char bugs, cleaned up commit message, added test example and CVE ]
      Signed-off-by: default avatarKees Cook <keescook@chromium.org>
      Cc: <stable@vger.kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Fixes: CVE-2015-1593
      Link: http://lkml.kernel.org/r/20150214173350.GA18393@www.outflux.netSigned-off-by: default avatarBorislav Petkov <bp@suse.de>
      
      (cherry picked from commit 4e7c22d4)
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      02d7840b
    • Thadeu Lima de Souza Cascardo's avatar
      blk-throttle: check stats_cpu before reading it from sysfs · bc289b9a
      Thadeu Lima de Souza Cascardo authored
      When reading blkio.throttle.io_serviced in a recently created blkio
      cgroup, it's possible to race against the creation of a throttle policy,
      which delays the allocation of stats_cpu.
      
      Like other functions in the throttle code, just checking for a NULL
      stats_cpu prevents the following oops caused by that race.
      
      [ 1117.285199] Unable to handle kernel paging request for data at address 0x7fb4d0020
      [ 1117.285252] Faulting instruction address: 0xc0000000003efa2c
      [ 1137.733921] Oops: Kernel access of bad area, sig: 11 [#1]
      [ 1137.733945] SMP NR_CPUS=2048 NUMA PowerNV
      [ 1137.734025] Modules linked in: bridge stp llc kvm_hv kvm binfmt_misc autofs4
      [ 1137.734102] CPU: 3 PID: 5302 Comm: blkcgroup Not tainted 3.19.0 #5
      [ 1137.734132] task: c000000f1d188b00 ti: c000000f1d210000 task.ti: c000000f1d210000
      [ 1137.734167] NIP: c0000000003efa2c LR: c0000000003ef9f0 CTR: c0000000003ef980
      [ 1137.734202] REGS: c000000f1d213500 TRAP: 0300   Not tainted  (3.19.0)
      [ 1137.734230] MSR: 9000000000009032 <SF,HV,EE,ME,IR,DR,RI>  CR: 42008884  XER: 20000000
      [ 1137.734325] CFAR: 0000000000008458 DAR: 00000007fb4d0020 DSISR: 40000000 SOFTE: 0
      GPR00: c0000000003ed3a0 c000000f1d213780 c000000000c59538 0000000000000000
      GPR04: 0000000000000800 0000000000000000 0000000000000000 0000000000000000
      GPR08: ffffffffffffffff 00000007fb4d0020 00000007fb4d0000 c000000000780808
      GPR12: 0000000022000888 c00000000fdc0d80 0000000000000000 0000000000000000
      GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
      GPR20: 000001003e120200 c000000f1d5b0cc0 0000000000000200 0000000000000000
      GPR24: 0000000000000001 c000000000c269e0 0000000000000020 c000000f1d5b0c80
      GPR28: c000000000ca3a08 c000000000ca3dec c000000f1c667e00 c000000f1d213850
      [ 1137.734886] NIP [c0000000003efa2c] .tg_prfill_cpu_rwstat+0xac/0x180
      [ 1137.734915] LR [c0000000003ef9f0] .tg_prfill_cpu_rwstat+0x70/0x180
      [ 1137.734943] Call Trace:
      [ 1137.734952] [c000000f1d213780] [d000000005560520] 0xd000000005560520 (unreliable)
      [ 1137.734996] [c000000f1d2138a0] [c0000000003ed3a0] .blkcg_print_blkgs+0xe0/0x1a0
      [ 1137.735039] [c000000f1d213960] [c0000000003efb50] .tg_print_cpu_rwstat+0x50/0x70
      [ 1137.735082] [c000000f1d2139e0] [c000000000104b48] .cgroup_seqfile_show+0x58/0x150
      [ 1137.735125] [c000000f1d213a70] [c0000000002749dc] .kernfs_seq_show+0x3c/0x50
      [ 1137.735161] [c000000f1d213ae0] [c000000000218630] .seq_read+0xe0/0x510
      [ 1137.735197] [c000000f1d213bd0] [c000000000275b04] .kernfs_fop_read+0x164/0x200
      [ 1137.735240] [c000000f1d213c80] [c0000000001eb8e0] .__vfs_read+0x30/0x80
      [ 1137.735276] [c000000f1d213cf0] [c0000000001eb9c4] .vfs_read+0x94/0x1b0
      [ 1137.735312] [c000000f1d213d90] [c0000000001ebb38] .SyS_read+0x58/0x100
      [ 1137.735349] [c000000f1d213e30] [c000000000009218] syscall_exit+0x0/0x98
      [ 1137.735383] Instruction dump:
      [ 1137.735405] 7c6307b4 7f891800 409d00b8 60000000 60420000 3d420004 392a63b0 786a1f24
      [ 1137.735471] 7d49502a e93e01c8 7d495214 7d2ad214 <7cead02a> e9090008 e9490010 e9290018
      
      And here is one code that allows to easily reproduce this, although this
      has first been found by running docker.
      
      void run(pid_t pid)
      {
      	int n;
      	int status;
      	int fd;
      	char *buffer;
      	buffer = memalign(BUFFER_ALIGN, BUFFER_SIZE);
      	n = snprintf(buffer, BUFFER_SIZE, "%d\n", pid);
      	fd = open(CGPATH "/test/tasks", O_WRONLY);
      	write(fd, buffer, n);
      	close(fd);
      	if (fork() > 0) {
      		fd = open("/dev/sda", O_RDONLY | O_DIRECT);
      		read(fd, buffer, 512);
      		close(fd);
      		wait(&status);
      	} else {
      		fd = open(CGPATH "/test/blkio.throttle.io_serviced", O_RDONLY);
      		n = read(fd, buffer, BUFFER_SIZE);
      		close(fd);
      	}
      	free(buffer);
      	exit(0);
      }
      
      void test(void)
      {
      	int status;
      	mkdir(CGPATH "/test", 0666);
      	if (fork() > 0)
      		wait(&status);
      	else
      		run(getpid());
      	rmdir(CGPATH "/test");
      }
      
      int main(int argc, char **argv)
      {
      	int i;
      	for (i = 0; i < NR_TESTS; i++)
      		test();
      	return 0;
      }
      Reported-by: default avatarRicardo Marin Matinata <rmm@br.ibm.com>
      Signed-off-by: default avatarThadeu Lima de Souza Cascardo <cascardo@linux.vnet.ibm.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarJens Axboe <axboe@fb.com>
      
      (cherry picked from commit 045c47ca)
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      bc289b9a
    • Chen Jie's avatar
      jffs2: fix handling of corrupted summary length · ccbdcbb9
      Chen Jie authored
      sm->offset maybe wrong but magic maybe right, the offset do not have CRC.
      
      Badness at c00c7580 [verbose debug info unavailable]
      NIP: c00c7580 LR: c00c718c CTR: 00000014
      REGS: df07bb40 TRAP: 0700   Not tainted  (2.6.34.13-WR4.3.0.0_standard)
      MSR: 00029000 <EE,ME,CE>  CR: 22084f84  XER: 00000000
      TASK = df84d6e0[908] 'mount' THREAD: df07a000
      GPR00: 00000001 df07bbf0 df84d6e0 00000000 00000001 00000000 df07bb58 00000041
      GPR08: 00000041 c0638860 00000000 00000010 22084f88 100636c8 df814ff8 00000000
      GPR16: df84d6e0 dfa558cc c05adb90 00000048 c0452d30 00000000 000240d0 000040d0
      GPR24: 00000014 c05ae734 c05be2e0 00000000 00000001 00000000 00000000 c05ae730
      NIP [c00c7580] __alloc_pages_nodemask+0x4d0/0x638
      LR [c00c718c] __alloc_pages_nodemask+0xdc/0x638
      Call Trace:
      [df07bbf0] [c00c718c] __alloc_pages_nodemask+0xdc/0x638 (unreliable)
      [df07bc90] [c00c7708] __get_free_pages+0x20/0x48
      [df07bca0] [c00f4a40] __kmalloc+0x15c/0x1ec
      [df07bcd0] [c01fc880] jffs2_scan_medium+0xa58/0x14d0
      [df07bd70] [c01ff38c] jffs2_do_mount_fs+0x1f4/0x6b4
      [df07bdb0] [c020144c] jffs2_do_fill_super+0xa8/0x260
      [df07bdd0] [c020230c] jffs2_fill_super+0x104/0x184
      [df07be00] [c0335814] get_sb_mtd_aux+0x9c/0xec
      [df07be20] [c033596c] get_sb_mtd+0x84/0x1e8
      [df07be60] [c0201ed0] jffs2_get_sb+0x1c/0x2c
      [df07be70] [c0103898] vfs_kern_mount+0x78/0x1e8
      [df07bea0] [c0103a58] do_kern_mount+0x40/0x100
      [df07bec0] [c011fe90] do_mount+0x240/0x890
      [df07bf10] [c0120570] sys_mount+0x90/0xd8
      [df07bf40] [c00110d8] ret_from_syscall+0x0/0x4
      
      === Exception: c01 at 0xff61a34
          LR = 0x100135f0
      Instruction dump:
      38800005 38600000 48010f41 4bfffe1c 4bfc2d15 4bfffe8c 72e90200 4082fc28
      3d20c064 39298860 8809000d 68000001 <0f000000> 2f800000 419efc0c 38000001
      mount: mounting /dev/mtdblock3 on /common failed: Input/output error
      Signed-off-by: default avatarChen Jie <chenjie6@huawei.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarDavid Woodhouse <David.Woodhouse@intel.com>
      
      (cherry picked from commit 164c2406)
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      ccbdcbb9
    • Tomáš Hodek's avatar
      md/raid1: fix read balance when a drive is write-mostly. · 7401d6e3
      Tomáš Hodek authored
      When a drive is marked write-mostly it should only be the
      target of reads if there is no other option.
      
      This behaviour was broken by
      
      commit 9dedf603
          md/raid1: read balance chooses idlest disk for SSD
      
      which causes a write-mostly device to be *preferred* is some cases.
      
      Restore correct behaviour by checking and setting
      best_dist_disk and best_pending_disk rather than best_disk.
      
      We only need to test one of these as they are both changed
      from -1 or >=0 at the same time.
      
      As we leave min_pending and best_dist unchanged, any non-write-mostly
      device will appear better than the write-mostly device.
      Reported-by: default avatarTomáš Hodek <tomas.hodek@volny.cz>
      Reported-by: default avatarDark Penguin <darkpenguin@yandex.ru>
      Signed-off-by: default avatarNeilBrown <neilb@suse.de>
      Link: http://marc.info/?l=linux-raid&m=135982797322422
      Fixes: 9dedf603
      Cc: stable@vger.kernel.org (3.6+)
      
      (cherry picked from commit d1901ef0)
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      7401d6e3
    • NeilBrown's avatar
      md/raid5: Fix livelock when array is both resyncing and degraded. · d1fb6ee2
      NeilBrown authored
      Commit a7854487:
        md: When RAID5 is dirty, force reconstruct-write instead of read-modify-write.
      
      Causes an RCW cycle to be forced even when the array is degraded.
      A degraded array cannot support RCW as that requires reading all data
      blocks, and one may be missing.
      
      Forcing an RCW when it is not possible causes a live-lock and the code
      spins, repeatedly deciding to do something that cannot succeed.
      
      So change the condition to only force RCW on non-degraded arrays.
      Reported-by: default avatarManibalan P <pmanibalan@amiindia.co.in>
      Bisected-by: default avatarJes Sorensen <Jes.Sorensen@redhat.com>
      Tested-by: default avatarJes Sorensen <Jes.Sorensen@redhat.com>
      Signed-off-by: default avatarNeilBrown <neilb@suse.de>
      Fixes: a7854487
      Cc: stable@vger.kernel.org (v3.7+)
      
      (cherry picked from commit 26ac1073)
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      d1fb6ee2
    • Nicolas Saenz Julienne's avatar
      gpio: tps65912: fix wrong container_of arguments · 81bf896a
      Nicolas Saenz Julienne authored
      The gpio_chip operations receive a pointer the gpio_chip struct which is
      contained in the driver's private struct, yet the container_of call in those
      functions point to the mfd struct defined in include/linux/mfd/tps65912.h.
      
      Cc: Stable <stable@vger.kernel.org>
      Signed-off-by: default avatarNicolas Saenz Julienne <nicolassaenzj@gmail.com>
      Signed-off-by: default avatarLinus Walleij <linus.walleij@linaro.org>
      
      (cherry picked from commit 2f97c20e)
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      81bf896a
    • Catalin Marinas's avatar
      arm64: compat Fix siginfo_t -> compat_siginfo_t conversion on big endian · 2f6ed8ad
      Catalin Marinas authored
      The native (64-bit) sigval_t union contains sival_int (32-bit) and
      sival_ptr (64-bit). When a compat application invokes a syscall that
      takes a sigval_t value (as part of a larger structure, e.g.
      compat_sys_mq_notify, compat_sys_timer_create), the compat_sigval_t
      union is converted to the native sigval_t with sival_int overlapping
      with either the least or the most significant half of sival_ptr,
      depending on endianness. When the corresponding signal is delivered to a
      compat application, on big endian the current (compat_uptr_t)sival_ptr
      cast always returns 0 since sival_int corresponds to the top part of
      sival_ptr. This patch fixes copy_siginfo_to_user32() so that sival_int
      is copied to the compat_siginfo_t structure.
      
      Cc: <stable@vger.kernel.org>
      Reported-by: default avatarBamvor Jian Zhang <bamvor.zhangjian@huawei.com>
      Tested-by: default avatarBamvor Jian Zhang <bamvor.zhangjian@huawei.com>
      Signed-off-by: default avatarCatalin Marinas <catalin.marinas@arm.com>
      
      (cherry picked from commit 9d42d48a)
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      2f6ed8ad
    • Martin Vajnar's avatar
      hx4700: regulator: declare full constraints · 7801a4a6
      Martin Vajnar authored
      Since the removal of CONFIG_REGULATOR_DUMMY option, the touchscreen stopped
      working. This patch enables the "replacement" for REGULATOR_DUMMY and
      allows the touchscreen to work even though there is no regulator for "vcc".
      
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarMartin Vajnar <martin.vajnar@gmail.com>
      Signed-off-by: default avatarRobert Jarzmik <robert.jarzmik@free.fr>
      
      (cherry picked from commit a52d2093)
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      7801a4a6
    • Jan Kara's avatar
      udf: Check length of extended attributes and allocation descriptors · 0d31b113
      Jan Kara authored
      Check length of extended attributes and allocation descriptors when
      loading inodes from disk. Otherwise corrupted filesystems could confuse
      the code and make the kernel oops.
      Reported-by: default avatarCarl Henrik Lunde <chlunde@ping.uio.no>
      CC: stable@vger.kernel.org
      Signed-off-by: default avatarJan Kara <jack@suse.cz>
      
      (cherry picked from commit 23b133bd)
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      0d31b113
    • Jay Lan's avatar
      kdb: fix incorrect counts in KDB summary command output · 16301684
      Jay Lan authored
      The output of KDB 'summary' command should report MemTotal, MemFree
      and Buffers output in kB. Current codes report in unit of pages.
      
      A define of K(x) as
      is defined in the code, but not used.
      
      This patch would apply the define to convert the values to kB.
      Please include me on Cc on replies. I do not subscribe to linux-kernel.
      Signed-off-by: default avatarJay Lan <jlan@sgi.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarJason Wessel <jason.wessel@windriver.com>
      
      (cherry picked from commit 14675592)
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      16301684
    • Dmitry Eremin-Solenikov's avatar
      ARM: pxa: add regulator_has_full_constraints to poodle board file · 016e2467
      Dmitry Eremin-Solenikov authored
      Add regulator_has_full_constraints() call to poodle board file to let
      regulator core know that we do not have any additional regulators left.
      This lets it substitute unprovided regulators with dummy ones.
      
      This fixes the following warnings that can be seen on poodle if
      regulators are enabled:
      
      ads7846 spi1.0: unable to get regulator: -517
      spi spi1.0: Driver ads7846 requests probe deferral
      wm8731 0-001b: Failed to get supply 'AVDD': -517
      wm8731 0-001b: Failed to request supplies: -517
      wm8731 0-001b: ASoC: failed to probe component -517
      
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarDmitry Eremin-Solenikov <dbaryshkov@gmail.com>
      Acked-by: default avatarMark Brown <broonie@kernel.org>
      Signed-off-by: default avatarRobert Jarzmik <robert.jarzmik@free.fr>
      
      (cherry picked from commit 9bc78f32)
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      016e2467
    • Dmitry Eremin-Solenikov's avatar
      ARM: pxa: add regulator_has_full_constraints to corgi board file · 857d9547
      Dmitry Eremin-Solenikov authored
      Add regulator_has_full_constraints() call to corgi board file to let
      regulator core know that we do not have any additional regulators left.
      This lets it substitute unprovided regulators with dummy ones.
      
      This fixes the following warnings that can be seen on corgi if
      regulators are enabled:
      
      ads7846 spi1.0: unable to get regulator: -517
      spi spi1.0: Driver ads7846 requests probe deferral
      wm8731 0-001b: Failed to get supply 'AVDD': -517
      wm8731 0-001b: Failed to request supplies: -517
      wm8731 0-001b: ASoC: failed to probe component -517
      corgi-audio corgi-audio: ASoC: failed to instantiate card -517
      
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarDmitry Eremin-Solenikov <dbaryshkov@gmail.com>
      Acked-by: default avatarMark Brown <broonie@kernel.org>
      Signed-off-by: default avatarRobert Jarzmik <robert.jarzmik@free.fr>
      
      (cherry picked from commit 271e8017)
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      857d9547
    • Nicolas Pitre's avatar
      vt: provide notifications on selection changes · acacda61
      Nicolas Pitre authored
      The vcs device's poll/fasync support relies on the vt notifier to signal
      changes to the screen content.  Notifier invocations were missing for
      changes that comes through the selection interface though.  Fix that.
      
      Tested with BRLTTY 5.2.
      Signed-off-by: default avatarNicolas Pitre <nico@linaro.org>
      Cc: Dave Mielke <dave@mielke.cc>
      Cc: stable <stable@vger.kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      
      (cherry picked from commit 19e3ae6b)
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      acacda61
    • Oliver Neukum's avatar
      cdc-acm: add sanity checks · 4128a2c4
      Oliver Neukum authored
      Check the special CDC headers for a plausible minimum length.
      Another big operating systems ignores such garbage.
      Signed-off-by: default avatarOliver Neukum <oneukum@suse.de>
      CC: stable@vger.kernel.org
      Reviewed-by: default avatarAdam Lee <adam8157@gmail.com>
      Tested-by: default avatarAdam Lee <adam8157@gmail.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      
      (cherry picked from commit 7e860a6e)
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      4128a2c4
    • Alan Stern's avatar
      USB: don't cancel queued resets when unbinding drivers · 7a370c88
      Alan Stern authored
      The USB stack provides a mechanism for drivers to request an
      asynchronous device reset (usb_queue_reset_device()).  The mechanism
      uses a work item (reset_ws) embedded in the usb_interface structure
      used by the driver, and the reset is carried out by a work queue
      routine.
      
      The asynchronous reset can race with driver unbinding.  When this
      happens, we try to cancel the queued reset before unbinding the
      driver, on the theory that the driver won't care about any resets once
      it is unbound.
      
      However, thanks to the fact that lockdep now tracks work queue
      accesses, this can provoke a lockdep warning in situations where the
      device reset causes another interface's driver to be unbound; see
      
      	http://marc.info/?l=linux-usb&m=141893165203776&w=2
      
      for an example.  The reason is that the work routine for reset_ws in
      one interface calls cancel_queued_work() for the reset_ws in another
      interface.  Lockdep thinks this might lead to a work routine trying to
      cancel itself.  The simplest solution is not to cancel queued resets
      when unbinding drivers.
      
      This means we now need to acquire a reference to the usb_interface
      when queuing a reset_ws work item and to drop the reference when the
      work routine finishes.  We also need to make sure that the
      usb_interface structure doesn't outlive its parent usb_device; this
      means acquiring and dropping a reference when the interface is created
      and destroyed.
      
      In addition, cancelling a queued reset can fail (if the device is in
      the middle of an earlier reset), and this can cause usb_reset_device()
      to try to rebind an interface that has been deallocated (see
      http://marc.info/?l=linux-usb&m=142175717016628&w=2 for details).
      Acquiring the extra references prevents this failure.
      Signed-off-by: default avatarAlan Stern <stern@rowland.harvard.edu>
      Reported-by: default avatarRussell King - ARM Linux <linux@arm.linux.org.uk>
      Reported-by: default avatarOlivier Sobrie <olivier@sobrie.be>
      Tested-by: default avatarOlivier Sobrie <olivier@sobrie.be>
      Cc: stable <stable@vger.kernel.org> # 3.19
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      
      (cherry picked from commit 524134d4)
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      7a370c88
    • Sebastian Andrzej Siewior's avatar
      usb: core: buffer: smallest buffer should start at ARCH_DMA_MINALIGN · ff355a54
      Sebastian Andrzej Siewior authored
      the following error pops up during "testusb -a -t 10"
      | musb-hdrc musb-hdrc.1.auto: dma_pool_free buffer-128,	f134e000/be842000 (bad dma)
      hcd_buffer_create() creates a few buffers, the smallest has 32 bytes of
      size. ARCH_KMALLOC_MINALIGN is set to 64 bytes. This combo results in
      hcd_buffer_alloc() returning memory which is 32 bytes aligned and it
      might by identified by buffer_offset() as another buffer. This means the
      buffer which is on a 32 byte boundary will not get freed, instead it
      tries to free another buffer with the error message.
      
      This patch fixes the issue by creating the smallest DMA buffer with the
      size of ARCH_KMALLOC_MINALIGN (or 32 in case ARCH_KMALLOC_MINALIGN is
      smaller). This might be 32, 64 or even 128 bytes. The next three pools
      will have the size 128, 512 and 2048.
      In case the smallest pool is 128 bytes then we have only three pools
      instead of four (and zero the first entry in the array).
      The last pool size is always 2048 bytes which is the assumed PAGE_SIZE /
      2 of 4096. I doubt it makes sense to continue using PAGE_SIZE / 2 where
      we would end up with 8KiB buffer in case we have 16KiB pages.
      Instead I think it makes sense to have a common size(s) and extend them
      if there is need to.
      There is a BUILD_BUG_ON() now in case someone has a minalign of more than
      128 bytes.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarSebastian Andrzej Siewior <bigeasy@linutronix.de>
      Acked-by: default avatarAlan Stern <stern@rowland.harvard.edu>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      
      (cherry picked from commit 5efd2ea8)
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      ff355a54
    • Alan Stern's avatar
      USB: fix use-after-free bug in usb_hcd_unlink_urb() · 396a2711
      Alan Stern authored
      The usb_hcd_unlink_urb() routine in hcd.c contains two possible
      use-after-free errors.  The dev_dbg() statement at the end of the
      routine dereferences urb and urb->dev even though both structures may
      have been deallocated.
      
      This patch fixes the problem by storing urb->dev in a local variable
      (avoiding the dereference of urb) and moving the dev_dbg() up before
      the usb_put_dev() call.
      Signed-off-by: default avatarAlan Stern <stern@rowland.harvard.edu>
      Reported-by: default avatarJoe Lawrence <joe.lawrence@stratus.com>
      Tested-by: default avatarJoe Lawrence <joe.lawrence@stratus.com>
      CC: <stable@vger.kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <greg@kroah.com>
      
      (cherry picked from commit c9919790)
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      396a2711