1. 07 Mar, 2015 40 commits
    • Nadav Amit's avatar
      KVM: vmx: Inject #GP on invalid PAT CR · 44a837c5
      Nadav Amit authored
      Guest which sets the PAT CR to invalid value should get a #GP.  Currently, if
      vmx supports loading PAT CR during entry, then the value is not checked.  This
      patch makes the required check in that case.
      Signed-off-by: default avatarNadav Amit <namit@cs.technion.ac.il>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      
      (cherry picked from commit 4566654b)
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      44a837c5
    • Nadav Amit's avatar
      KVM: x86: emulating descriptor load misses long-mode case · fd23ace6
      Nadav Amit authored
      In 64-bit mode a #GP should be delivered to the guest "if the code segment
      descriptor pointed to by the selector in the 64-bit gate doesn't have the L-bit
      set and the D-bit clear." - Intel SDM "Interrupt 13—General Protection
      Exception (#GP)".
      
      This patch fixes the behavior of CS loading emulation code. Although the
      comment says that segment loading is not supported in long mode, this function
      is executed in long mode, so the fix is necassary.
      Signed-off-by: default avatarNadav Amit <namit@cs.technion.ac.il>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      
      (cherry picked from commit 040c8dc8)
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      fd23ace6
    • Nadav Amit's avatar
      KVM: x86: Handle errors when RIP is set during far jumps · 22ced39a
      Nadav Amit authored
      Far jmp/call/ret may fault while loading a new RIP.  Currently KVM does not
      handle this case, and may result in failed vm-entry once the assignment is
      done.  The tricky part of doing so is that loading the new CS affects the
      VMCS/VMCB state, so if we fail during loading the new RIP, we are left in
      unconsistent state.  Therefore, this patch saves on 64-bit the old CS
      descriptor and restores it if loading RIP failed.
      
      This fixes CVE-2014-3647.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarNadav Amit <namit@cs.technion.ac.il>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      
      (cherry picked from commit d1442d85)
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      22ced39a
    • Nadav Amit's avatar
      KVM: x86: Warn if guest virtual address space is not 48-bits · 618f4fab
      Nadav Amit authored
      The KVM emulator code assumes that the guest virtual address space (in 64-bit)
      is 48-bits wide.  Fail the KVM_SET_CPUID and KVM_SET_CPUID2 ioctl if
      userspace tries to create a guest that does not obey this restriction.
      Signed-off-by: default avatarNadav Amit <namit@cs.technion.ac.il>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      
      (cherry picked from commit dd598091)
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      618f4fab
    • Eric Dumazet's avatar
      net: rps: fix cpu unplug · 9597e6a3
      Eric Dumazet authored
      softnet_data.input_pkt_queue is protected by a spinlock that
      we must hold when transferring packets from victim queue to an active
      one. This is because other cpus could still be trying to enqueue packets
      into victim queue.
      
      A second problem is that when we transfert the NAPI poll_list from
      victim to current cpu, we absolutely need to special case the percpu
      backlog, because we do not want to add complex locking to protect
      process_queue : Only owner cpu is allowed to manipulate it, unless cpu
      is offline.
      
      Based on initial patch from Prasad Sodagudi & Subash Abhinov
      Kasiviswanathan.
      
      This version is better because we do not slow down packet processing,
      only make migration safer.
      Reported-by: default avatarPrasad Sodagudi <psodagud@codeaurora.org>
      Reported-by: default avatarSubash Abhinov Kasiviswanathan <subashab@codeaurora.org>
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Cc: Tom Herbert <therbert@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      
      (cherry picked from commit ac64da0b)
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      9597e6a3
    • karl beldan's avatar
      lib/checksum.c: fix build for generic csum_tcpudp_nofold · 836ec0ff
      karl beldan authored
      Fixed commit added from64to32 under _#ifndef do_csum_ but used it
      under _#ifndef csum_tcpudp_nofold_, breaking some builds (Fengguang's
      robot reported TILEGX's). Move from64to32 under the latter.
      
      Fixes: 150ae0e9 ("lib/checksum.c: fix carry in csum_tcpudp_nofold")
      Reported-by: default avatarkbuild test robot <fengguang.wu@intel.com>
      Signed-off-by: default avatarKarl Beldan <karl.beldan@rivierawaves.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: David S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      
      (cherry picked from commit 9ce35779)
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      836ec0ff
    • Peter Kümmel's avatar
      kconfig: Fix warning "‘jump’ may be used uninitialized" · 9dea6103
      Peter Kümmel authored
      Warning:
      In file included from scripts/kconfig/zconf.tab.c:2537:0:
      scripts/kconfig/menu.c: In function ‘get_symbol_str’:
      scripts/kconfig/menu.c:590:18: warning: ‘jump’ may be used uninitialized in this function [-Wmaybe-uninitialized]
           jump->offset = strlen(r->s);
      
      Simplifies the test logic because (head && local) means (jump != 0)
      and makes GCC happy when checking if the jump pointer was initialized.
      Signed-off-by: default avatarPeter Kümmel <syntheticpp@gmx.net>
      Signed-off-by: default avatarMichal Marek <mmarek@suse.cz>
      
      (cherry picked from commit 2d560306)
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      9dea6103
    • karl beldan's avatar
      lib/checksum.c: fix build for generic csum_tcpudp_nofold · fcdaac24
      karl beldan authored
      Fixed commit added from64to32 under _#ifndef do_csum_ but used it
      under _#ifndef csum_tcpudp_nofold_, breaking some builds (Fengguang's
      robot reported TILEGX's). Move from64to32 under the latter.
      
      Fixes: 150ae0e9 ("lib/checksum.c: fix carry in csum_tcpudp_nofold")
      Reported-by: default avatarkbuild test robot <fengguang.wu@intel.com>
      Signed-off-by: default avatarKarl Beldan <karl.beldan@rivierawaves.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: David S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      
      tcp: ipv4: initialize unicast_sock sk_pacing_rate
      
      When I added sk_pacing_rate field, I forgot to initialize its value
      in the per cpu unicast_sock used in ip_send_unicast_reply()
      
      This means that for sch_fq users, RST packets, or ACK packets sent
      on behalf of TIME_WAIT sockets might be sent to slowly or even dropped
      once we reach the per flow limit.
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Fixes: 95bd09eb ("tcp: TSO packets automatic sizing")
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      
      lib/checksum.c: fix carry in csum_tcpudp_nofold
      
      The carry from the 64->32bits folding was dropped, e.g with:
      saddr=0xFFFFFFFF daddr=0xFF0000FF len=0xFFFF proto=0 sum=1,
      csum_tcpudp_nofold returned 0 instead of 1.
      Signed-off-by: default avatarKarl Beldan <karl.beldan@rivierawaves.com>
      Cc: Al Viro <viro@ZenIV.linux.org.uk>
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Mike Frysinger <vapier@gentoo.org>
      Cc: netdev@vger.kernel.org
      Cc: linux-kernel@vger.kernel.org
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      
      (cherry picked from commit 9ce35779
      150ae0e9)
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      fcdaac24
    • Martin Walch's avatar
      kconfig: fix bug in search results string: use strlen(gstr->s), not gstr->len · 083e171b
      Martin Walch authored
      The struct gstr has a capacity that may differ from the actual string length.
      
      However, a string manipulation in the function search_conf made the assumption
      that it is the same, which led to messing up some search results, especially
      when the content of the gstr in use had not yet reached at least 63 chars.
      Signed-off-by: default avatarMartin Walch <walch.martin@web.de>
      Acked-by: default avatarWang YanQing <udknight@gmail.com>
      Acked-by: default avatarBenjamin Poirier <bpoirier@suse.de>
      Reviewed-by: default avatar"Yann E. MORIN" <yann.morin.1998@free.fr>
      Signed-off-by: default avatar"Yann E. MORIN" <yann.morin.1998@free.fr>
      
      (cherry picked from commit 503c8230)
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      083e171b
    • Adrian Hunter's avatar
      mmc: sdhci-acpi: Intel SDIO has broken card detect · 7592d350
      Adrian Hunter authored
      Intel SDIO has broken card detect so add a quirk to reflect that.
      Signed-off-by: default avatarAdrian Hunter <adrian.hunter@intel.com>
      Acked-by: default avatarUlf Hansson <ulf.hansson@linaro.org>
      Signed-off-by: default avatarChris Ball <chris@printf.net>
      
      (cherry picked from commit c6748017)
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      7592d350
    • Nicholas Bellinger's avatar
      target: Drop arbitrary maximum I/O size limit · a9d4e96a
      Nicholas Bellinger authored
      This patch drops the arbitrary maximum I/O size limit in sbc_parse_cdb(),
      which currently for fabric_max_sectors is hardcoded to 8192 (4 MB for 512
      byte sector devices), and for hw_max_sectors is a backend driver dependent
      value.
      
      This limit is problematic because Linux initiators have only recently
      started to honor block limits MAXIMUM TRANSFER LENGTH, and other non-Linux
      based initiators (eg: MSFT Fibre Channel) can also generate I/Os larger
      than 4 MB in size.
      
      Currently when this happens, the following message will appear on the
      target resulting in I/Os being returned with non recoverable status:
      
        SCSI OP 28h with too big sectors 16384 exceeds fabric_max_sectors: 8192
      
      Instead, drop both [fabric,hw]_max_sector checks in sbc_parse_cdb(),
      and convert the existing hw_max_sectors into a purely informational
      attribute used to represent the granuality that backend driver and/or
      subsystem code is splitting I/Os upon.
      
      Also, update FILEIO with an explicit FD_MAX_BYTES check in fd_execute_rw()
      to deal with the one special iovec limitiation case.
      
      v2 changes:
        - Drop hw_max_sectors check in sbc_parse_cdb()
      Reported-by: default avatarLance Gropper <lance.gropper@qosserver.com>
      Reported-by: default avatarStefan Priebe <s.priebe@profihost.ag>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Martin K. Petersen <martin.petersen@oracle.com>
      Cc: Roland Dreier <roland@purestorage.com>
      Cc: stable@vger.kernel.org # 3.4
      Signed-off-by: default avatarNicholas Bellinger <nab@linux-iscsi.org>
      
      (cherry picked from commit 046ba642)
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      a9d4e96a
    • Hans de Goede's avatar
      asus-nb-wmi: Add another wapf=4 quirk · 03d84eea
      Hans de Goede authored
      Wifi on this laptop does not work unless asus-nb-wmi.wapf=4 is specified on
      the kerne commandline, add a quirk for this.
      
      Cc: stable@vger.kernel.org
      BugLink: https://bugs.launchpad.net/bugs/1173681Signed-off-by: default avatarHans de Goede <hdegoede@redhat.com>
      Signed-off-by: default avatarDarren Hart <dvhart@linux.intel.com>
      
      (cherry picked from commit 841e11cc)
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      03d84eea
    • Stanislaw Gruszka's avatar
      asus-nb-wmi: Add wapf4 quirk for the X550VB · bd4dc52c
      Stanislaw Gruszka authored
      X550VB as many others Asus laptops need wapf4 quirk to make RFKILL
      switch be functional. Otherwise system boots with wireless card
      disabled and is only possible to enable it by suspend/resume.
      
      Bug report:
      http://bugzilla.redhat.com/show_bug.cgi?id=1089731#c23Reported-and-tested-by: default avatarVratislav Podzimek <vpodzime@redhat.com>
      Signed-off-by: default avatarStanislaw Gruszka <sgruszka@redhat.com>
      Signed-off-by: default avatarDarren Hart <dvhart@linux.intel.com>
      
      (cherry picked from commit 4ec7a45b)
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      bd4dc52c
    • Hans de Goede's avatar
      asus-nb-wmi: Add wapf4 quirk for the U32U · d187cad6
      Hans de Goede authored
      As reported here: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1173681
      the U32U needs wapf=4 too.
      Signed-off-by: default avatarHans de Goede <hdegoede@redhat.com>
      Signed-off-by: default avatarMatthew Garrett <matthew.garrett@nebula.com>
      
      (cherry picked from commit 831a444e)
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      d187cad6
    • Hans de Goede's avatar
      asus-nb-wmi: Add wapf4 quirk for the X550CC · eec7f389
      Hans de Goede authored
      As reported here: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1173681
      the X550CC needs wapf=4 too.
      Signed-off-by: default avatarHans de Goede <hdegoede@redhat.com>
      Signed-off-by: default avatarMatthew Garrett <matthew.garrett@nebula.com>
      
      (cherry picked from commit 6d6ded3b)
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      eec7f389
    • Hans de Goede's avatar
      asus-nb-wmi: Add wapf4 quirk for the X550CL · d2498a25
      Hans de Goede authored
      As reported here: https://bugs.launchpad.net/bugs/1277959
      the X550CL needs wapf=4 too.
      Signed-off-by: default avatarHans de Goede <hdegoede@redhat.com>
      Signed-off-by: default avatarMatthew Garrett <matthew.garrett@nebula.com>
      
      (cherry picked from commit 22ba58c8)
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      d2498a25
    • Chris Wilson's avatar
      drm/i915: Force the CS stall for invalidate flushes · 5c88b9b4
      Chris Wilson authored
      In order to act as a full command barrier by itself, we need to tell the
      pipecontrol to actually stall the command streamer while the flush runs.
      We require the full command barrier before operations like
      MI_SET_CONTEXT, which currently rely on a prior invalidate flush.
      
      References: https://bugs.freedesktop.org/show_bug.cgi?id=83677
      Cc: Simon Farnsworth <simon@farnz.org.uk>
      Cc: Daniel Vetter <daniel@ffwll.ch>
      Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
      Signed-off-by: default avatarChris Wilson <chris@chris-wilson.co.uk>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarJani Nikula <jani.nikula@intel.com>
      
      (cherry picked from commit add284a3)
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      5c88b9b4
    • Gwendal Grignou's avatar
      HID: i2c-hid: prevent buffer overflow in early IRQ · f49f1d42
      Gwendal Grignou authored
      commit d1c7e29e upstream.
      
      Before ->start() is called, bufsize size is set to HID_MIN_BUFFER_SIZE,
      64 bytes. While processing the IRQ, we were asking to receive up to
      wMaxInputLength bytes, which can be bigger than 64 bytes.
      
      Later, when ->start is run, a proper bufsize will be calculated.
      
      Given wMaxInputLength is said to be unreliable in other part of the
      code, set to receive only what we can even if it results in truncated
      reports.
      Signed-off-by: default avatarGwendal Grignou <gwendal@chromium.org>
      Reviewed-by: default avatarBenjamin Tissoires <benjamin.tissoires@redhat.com>
      Signed-off-by: default avatarJiri Kosina <jkosina@suse.cz>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      (cherry picked from commit f0a431e7)
      f49f1d42
    • Eric W. Biederman's avatar
      umount: Disallow unprivileged mount force · eb30317a
      Eric W. Biederman authored
      Forced unmount affects not just the mount namespace but the underlying
      superblock as well.  Restrict forced unmount to the global root user
      for now.  Otherwise it becomes possible a user in a less privileged
      mount namespace to force the shutdown of a superblock of a filesystem
      in a more privileged mount namespace, allowing a DOS attack on root.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatar"Eric W. Biederman" <ebiederm@xmission.com>
      
      (cherry picked from commit b2f5d4dc)
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      eb30317a
    • Ilya Dryomov's avatar
      libceph: change from BUG to WARN for __remove_osd() asserts · 86cb33d0
      Ilya Dryomov authored
      commit cc9f1f51 upstream.
      
      No reason to use BUG_ON for osd request list assertions.
      Signed-off-by: default avatarIlya Dryomov <idryomov@redhat.com>
      Reviewed-by: default avatarAlex Elder <elder@linaro.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      
      (cherry picked from commit 26e9bfd9)
      86cb33d0
    • Ilya Dryomov's avatar
      libceph: assert both regular and lingering lists in __remove_osd() · a6ef32b4
      Ilya Dryomov authored
      commit 7c6e6fc5 upstream.
      
      It is important that both regular and lingering requests lists are
      empty when the OSD is removed.
      Signed-off-by: default avatarIlya Dryomov <ilya.dryomov@inktank.com>
      Reviewed-by: default avatarAlex Elder <elder@linaro.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      
      (cherry picked from commit 81113cff)
      a6ef32b4
    • Seth Forshee's avatar
      HID: i2c-hid: Limit reads to wMaxInputLength bytes for input events · 20606ef9
      Seth Forshee authored
      d1c7e29e (HID: i2c-hid: prevent buffer overflow in early IRQ)
      changed hid_get_input() to read ihid->bufsize bytes, which can be
      more than wMaxInputLength. This is the case with the Dell XPS 13
      9343, and it is causing events to be missed. In some cases the
      missed events are releases, which can cause the cursor to jump or
      freeze, among other problems. Limit the number of bytes read to
      min(wMaxInputLength, ihid->bufsize) to prevent such problems.
      
      Fixes: d1c7e29e "HID: i2c-hid: prevent buffer overflow in early IRQ"
      Signed-off-by: default avatarSeth Forshee <seth.forshee@canonical.com>
      Reviewed-by: default avatarBenjamin Tissoires <benjamin.tissoires@redhat.com>
      Signed-off-by: default avatarJiri Kosina <jkosina@suse.cz>
      
      (cherry picked from commit 6d00f37e)
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      20606ef9
    • Hannes Frederic Sowa's avatar
      ipv6: replacing a rt6_info needs to purge possible propagated rt6_infos too · 0839fe82
      Hannes Frederic Sowa authored
      Lubomir Rintel reported that during replacing a route the interface
      reference counter isn't correctly decremented.
      
      To quote bug <https://bugzilla.kernel.org/show_bug.cgi?id=91941>:
      | [root@rhel7-5 lkundrak]# sh -x lal
      | + ip link add dev0 type dummy
      | + ip link set dev0 up
      | + ip link add dev1 type dummy
      | + ip link set dev1 up
      | + ip addr add 2001:db8:8086::2/64 dev dev0
      | + ip route add 2001:db8:8086::/48 dev dev0 proto static metric 20
      | + ip route add 2001:db8:8088::/48 dev dev1 proto static metric 10
      | + ip route replace 2001:db8:8086::/48 dev dev1 proto static metric 20
      | + ip link del dev0 type dummy
      | Message from syslogd@rhel7-5 at Jan 23 10:54:41 ...
      |  kernel:unregister_netdevice: waiting for dev0 to become free. Usage count = 2
      |
      | Message from syslogd@rhel7-5 at Jan 23 10:54:51 ...
      |  kernel:unregister_netdevice: waiting for dev0 to become free. Usage count = 2
      
      During replacement of a rt6_info we must walk all parent nodes and check
      if the to be replaced rt6_info got propagated. If so, replace it with
      an alive one.
      
      Fixes: 4a287eba ("IPv6 routing, NLM_F_* flag support: REPLACE and EXCL flags support, warn about missing CREATE flag")
      Reported-by: default avatarLubomir Rintel <lkundrak@v3.sk>
      Signed-off-by: default avatarHannes Frederic Sowa <hannes@stressinduktion.org>
      Tested-by: default avatarLubomir Rintel <lkundrak@v3.sk>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      
      (cherry picked from commit 6e9e16e6)
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      0839fe82
    • karl beldan's avatar
      lib/checksum.c: fix build for generic csum_tcpudp_nofold · 508caf59
      karl beldan authored
      Fixed commit added from64to32 under _#ifndef do_csum_ but used it
      under _#ifndef csum_tcpudp_nofold_, breaking some builds (Fengguang's
      robot reported TILEGX's). Move from64to32 under the latter.
      
      Fixes: 150ae0e9 ("lib/checksum.c: fix carry in csum_tcpudp_nofold")
      Reported-by: default avatarkbuild test robot <fengguang.wu@intel.com>
      Signed-off-by: default avatarKarl Beldan <karl.beldan@rivierawaves.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: David S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      
      (cherry picked from commit 9ce35779)
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      508caf59
    • karl beldan's avatar
      lib/checksum.c: fix build for generic csum_tcpudp_nofold · e9c21363
      karl beldan authored
      Fixed commit added from64to32 under _#ifndef do_csum_ but used it
      under _#ifndef csum_tcpudp_nofold_, breaking some builds (Fengguang's
      robot reported TILEGX's). Move from64to32 under the latter.
      
      Fixes: 150ae0e9 ("lib/checksum.c: fix carry in csum_tcpudp_nofold")
      Reported-by: default avatarkbuild test robot <fengguang.wu@intel.com>
      Signed-off-by: default avatarKarl Beldan <karl.beldan@rivierawaves.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: David S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      
      tcp: ipv4: initialize unicast_sock sk_pacing_rate
      
      When I added sk_pacing_rate field, I forgot to initialize its value
      in the per cpu unicast_sock used in ip_send_unicast_reply()
      
      This means that for sch_fq users, RST packets, or ACK packets sent
      on behalf of TIME_WAIT sockets might be sent to slowly or even dropped
      once we reach the per flow limit.
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Fixes: 95bd09eb ("tcp: TSO packets automatic sizing")
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      
      lib/checksum.c: fix carry in csum_tcpudp_nofold
      
      The carry from the 64->32bits folding was dropped, e.g with:
      saddr=0xFFFFFFFF daddr=0xFF0000FF len=0xFFFF proto=0 sum=1,
      csum_tcpudp_nofold returned 0 instead of 1.
      Signed-off-by: default avatarKarl Beldan <karl.beldan@rivierawaves.com>
      Cc: Al Viro <viro@ZenIV.linux.org.uk>
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Mike Frysinger <vapier@gentoo.org>
      Cc: netdev@vger.kernel.org
      Cc: linux-kernel@vger.kernel.org
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      
      (cherry picked from commit 9ce35779
      150ae0e9)
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      e9c21363
    • David Vrabel's avatar
      Revert "swiotlb-xen: pass dev_addr to swiotlb_tbl_unmap_single" · 2c1405e2
      David Vrabel authored
      This reverts commit 2c3fc8d2.
      
      This commit broke on x86 PV because entries in the generic SWIOTLB are
      indexed using (pseudo-)physical address not DMA address and these are
      not the same in a x86 PV guest.
      Signed-off-by: default avatarDavid Vrabel <david.vrabel@citrix.com>
      Reviewed-by: default avatarStefano Stabellini <stefano.stabellini@eu.citrix.com>
      
      Revert "swiotlb-xen: pass dev_addr to swiotlb_tbl_unmap_single"
      
      This reverts commit 2c3fc8d2.
      
      This commit broke on x86 PV because entries in the generic SWIOTLB are
      indexed using (pseudo-)physical address not DMA address and these are
      not the same in a x86 PV guest.
      Signed-off-by: default avatarDavid Vrabel <david.vrabel@citrix.com>
      Reviewed-by: default avatarStefano Stabellini <stefano.stabellini@eu.citrix.com>
      
      xen: annotate xen_set_identity_and_remap_chunk() with __init
      
      Commit 5b8e7d80 removed the __init
      annotation from xen_set_identity_and_remap_chunk(). Add it again.
      Signed-off-by: default avatarJuergen Gross <jgross@suse.com>
      Signed-off-by: default avatarDavid Vrabel <david.vrabel@citrix.com>
      
      xen: introduce helper functions to do safe read and write accesses
      
      Introduce two helper functions to safely read and write unsigned long
      values from or to memory when the access may fault because the mapping
      is non-present or read-only.
      
      These helpers can be used instead of open coded uses of __get_user()
      and __put_user() avoiding the need to do casts to fix sparse warnings.
      
      Use the helpers in page.h and p2m.c. This will fix the sparse
      warnings when doing "make C=1".
      Signed-off-by: default avatarJuergen Gross <jgross@suse.com>
      Signed-off-by: default avatarDavid Vrabel <david.vrabel@citrix.com>
      
      xen: Speed up set_phys_to_machine() by using read-only mappings
      
      Instead of checking at each call of set_phys_to_machine() whether a
      new p2m page has to be allocated due to writing an entry in a large
      invalid or identity area, just map those areas read only and react
      to a page fault on write by allocating the new page.
      
      This change will make the common path with no allocation much
      faster as it only requires a single write of the new mfn instead
      of walking the address translation tables and checking for the
      special cases.
      Suggested-by: default avatarDavid Vrabel <david.vrabel@citrix.com>
      Signed-off-by: default avatarJuergen Gross <jgross@suse.com>
      Reviewed-by: default avatarDavid Vrabel <david.vrabel@citrix.com>
      Reviewed-by: default avatarKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Signed-off-by: default avatarDavid Vrabel <david.vrabel@citrix.com>
      
      xen: switch to linear virtual mapped sparse p2m list
      
      At start of the day the Xen hypervisor presents a contiguous mfn list
      to a pv-domain. In order to support sparse memory this mfn list is
      accessed via a three level p2m tree built early in the boot process.
      Whenever the system needs the mfn associated with a pfn this tree is
      used to find the mfn.
      
      Instead of using a software walked tree for accessing a specific mfn
      list entry this patch is creating a virtual address area for the
      entire possible mfn list including memory holes. The holes are
      covered by mapping a pre-defined  page consisting only of "invalid
      mfn" entries. Access to a mfn entry is possible by just using the
      virtual base address of the mfn list and the pfn as index into that
      list. This speeds up the (hot) path of determining the mfn of a
      pfn.
      
      Kernel build on a Dell Latitude E6440 (2 cores, HT) in 64 bit Dom0
      showed following improvements:
      
      Elapsed time: 32:50 ->  32:35
      System:       18:07 ->  17:47
      User:        104:00 -> 103:30
      
      Tested with following configurations:
      - 64 bit dom0, 8GB RAM
      - 64 bit dom0, 128 GB RAM, PCI-area above 4 GB
      - 32 bit domU, 512 MB, 8 GB, 43 GB (more wouldn't work even without
                                          the patch)
      - 32 bit domU, ballooning up and down
      - 32 bit domU, save and restore
      - 32 bit domU with PCI passthrough
      - 64 bit domU, 8 GB, 2049 MB, 5000 MB
      - 64 bit domU, ballooning up and down
      - 64 bit domU, save and restore
      - 64 bit domU with PCI passthrough
      Signed-off-by: default avatarJuergen Gross <jgross@suse.com>
      Signed-off-by: default avatarDavid Vrabel <david.vrabel@citrix.com>
      
      xen: Hide get_phys_to_machine() to be able to tune common path
      
      Today get_phys_to_machine() is always called when the mfn for a pfn
      is to be obtained. Add a wrapper __pfn_to_mfn() as inline function
      to be able to avoid calling get_phys_to_machine() when possible as
      soon as the switch to a linear mapped p2m list has been done.
      Signed-off-by: default avatarJuergen Gross <jgross@suse.com>
      Reviewed-by: default avatarDavid Vrabel <david.vrabel@citrix.com>
      Signed-off-by: default avatarDavid Vrabel <david.vrabel@citrix.com>
      
      x86: Introduce function to get pmd entry pointer
      
      Introduces lookup_pmd_address() to get the address of the pmd entry
      related to a virtual address in the current address space. This
      function is needed for support of a virtual mapped sparse p2m list
      in xen pv domains, as we need the address of the pmd entry, not the
      one of the pte in that case.
      Signed-off-by: default avatarJuergen Gross <jgross@suse.com>
      Signed-off-by: default avatarDavid Vrabel <david.vrabel@citrix.com>
      
      xen: Delay invalidating extra memory
      
      When the physical memory configuration is initialized the p2m entries
      for not pouplated memory pages are set to "invalid". As those pages
      are beyond the hypervisor built p2m list the p2m tree has to be
      extended.
      
      This patch delays processing the extra memory related p2m entries
      during the boot process until some more basic memory management
      functions are callable. This removes the need to create new p2m
      entries until virtual memory management is available.
      Signed-off-by: default avatarJuergen Gross <jgross@suse.com>
      Reviewed-by: default avatarDavid Vrabel <david.vrabel@citrix.com>
      Signed-off-by: default avatarDavid Vrabel <david.vrabel@citrix.com>
      
      xen: Delay m2p_override initialization
      
      The m2p overrides are used to be able to find the local pfn for a
      foreign mfn mapped into the domain. They are used by driver backends
      having to access frontend data.
      
      As this functionality isn't used in early boot it makes no sense to
      initialize the m2p override functions very early. It can be done
      later without doing any harm, removing the need for allocating memory
      via extend_brk().
      
      While at it make some m2p override functions static as they are only
      used internally.
      Signed-off-by: default avatarJuergen Gross <jgross@suse.com>
      Reviewed-by: default avatarDavid Vrabel <david.vrabel@citrix.com>
      Reviewed-by: default avatarKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Signed-off-by: default avatarDavid Vrabel <david.vrabel@citrix.com>
      
      xen: Delay remapping memory of pv-domain
      
      Early in the boot process the memory layout of a pv-domain is changed
      to match the E820 map (either the host one for Dom0 or the Xen one)
      regarding placement of RAM and PCI holes. This requires removing memory
      pages initially located at positions not suitable for RAM and adding
      them later at higher addresses where no restrictions apply.
      
      To be able to operate on the hypervisor supported p2m list until a
      virtual mapped linear p2m list can be constructed, remapping must
      be delayed until virtual memory management is initialized, as the
      initial p2m list can't be extended unlimited at physical memory
      initialization time due to it's fixed structure.
      
      A further advantage is the reduction in complexity and code volume as
      we don't have to be careful regarding memory restrictions during p2m
      updates.
      Signed-off-by: default avatarJuergen Gross <jgross@suse.com>
      Reviewed-by: default avatarDavid Vrabel <david.vrabel@citrix.com>
      Signed-off-by: default avatarDavid Vrabel <david.vrabel@citrix.com>
      
      xen: use common page allocation function in p2m.c
      
      In arch/x86/xen/p2m.c three different allocation functions for
      obtaining a memory page are used: extend_brk(), alloc_bootmem_align()
      or __get_free_page().  Which of those functions is used depends on the
      progress of the boot process of the system.
      
      Introduce a common allocation routine selecting the to be called
      allocation routine dynamically based on the boot progress. This allows
      moving initialization steps without having to care about changing
      allocation calls.
      Signed-off-by: default avatarJuergen Gross <jgross@suse.com>
      Signed-off-by: default avatarDavid Vrabel <david.vrabel@citrix.com>
      
      xen: Make functions static
      
      Some functions in arch/x86/xen/p2m.c are used locally only. Make them
      static. Rearrange the functions in p2m.c to avoid forward declarations.
      Signed-off-by: default avatarJuergen Gross <jgross@suse.com>
      Signed-off-by: default avatarDavid Vrabel <david.vrabel@citrix.com>
      
      xen: fix some style issues in p2m.c
      
      The source arch/x86/xen/p2m.c has some coding style issues. Fix them.
      Signed-off-by: default avatarJuergen Gross <jgross@suse.com>
      Signed-off-by: default avatarDavid Vrabel <david.vrabel@citrix.com>
      
      (cherry picked from commit dbdd7476
      4ef8e3f3)
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      2c1405e2
    • Chris Wilson's avatar
      drm/i915: Force the CS stall for invalidate flushes · 7c32670d
      Chris Wilson authored
      In order to act as a full command barrier by itself, we need to tell the
      pipecontrol to actually stall the command streamer while the flush runs.
      We require the full command barrier before operations like
      MI_SET_CONTEXT, which currently rely on a prior invalidate flush.
      
      References: https://bugs.freedesktop.org/show_bug.cgi?id=83677
      Cc: Simon Farnsworth <simon@farnz.org.uk>
      Cc: Daniel Vetter <daniel@ffwll.ch>
      Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
      Signed-off-by: default avatarChris Wilson <chris@chris-wilson.co.uk>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarJani Nikula <jani.nikula@intel.com>
      
      (cherry picked from commit add284a3)
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      7c32670d
    • Gwendal Grignou's avatar
      HID: i2c-hid: prevent buffer overflow in early IRQ · 1fff9f82
      Gwendal Grignou authored
      commit d1c7e29e upstream.
      
      Before ->start() is called, bufsize size is set to HID_MIN_BUFFER_SIZE,
      64 bytes. While processing the IRQ, we were asking to receive up to
      wMaxInputLength bytes, which can be bigger than 64 bytes.
      
      Later, when ->start is run, a proper bufsize will be calculated.
      
      Given wMaxInputLength is said to be unreliable in other part of the
      code, set to receive only what we can even if it results in truncated
      reports.
      Signed-off-by: default avatarGwendal Grignou <gwendal@chromium.org>
      Reviewed-by: default avatarBenjamin Tissoires <benjamin.tissoires@redhat.com>
      Signed-off-by: default avatarJiri Kosina <jkosina@suse.cz>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      
      (cherry picked from commit 6d7f4a3b)
      1fff9f82
    • Stefano Stabellini's avatar
      Revert "swiotlb-xen: pass dev_addr to swiotlb_tbl_unmap_single" · b828503b
      Stefano Stabellini authored
      This reverts commit 2c3fc8d2.
      
      This commit broke on x86 PV because entries in the generic SWIOTLB are
      indexed using (pseudo-)physical address not DMA address and these are
      not the same in a x86 PV guest.
      Signed-off-by: default avatarDavid Vrabel <david.vrabel@citrix.com>
      Reviewed-by: default avatarStefano Stabellini <stefano.stabellini@eu.citrix.com>
      
      Revert "swiotlb-xen: pass dev_addr to swiotlb_tbl_unmap_single"
      
      This reverts commit 2c3fc8d2.
      
      This commit broke on x86 PV because entries in the generic SWIOTLB are
      indexed using (pseudo-)physical address not DMA address and these are
      not the same in a x86 PV guest.
      Signed-off-by: default avatarDavid Vrabel <david.vrabel@citrix.com>
      Reviewed-by: default avatarStefano Stabellini <stefano.stabellini@eu.citrix.com>
      
      xen: annotate xen_set_identity_and_remap_chunk() with __init
      
      Commit 5b8e7d80 removed the __init
      annotation from xen_set_identity_and_remap_chunk(). Add it again.
      Signed-off-by: default avatarJuergen Gross <jgross@suse.com>
      Signed-off-by: default avatarDavid Vrabel <david.vrabel@citrix.com>
      
      xen: introduce helper functions to do safe read and write accesses
      
      Introduce two helper functions to safely read and write unsigned long
      values from or to memory when the access may fault because the mapping
      is non-present or read-only.
      
      These helpers can be used instead of open coded uses of __get_user()
      and __put_user() avoiding the need to do casts to fix sparse warnings.
      
      Use the helpers in page.h and p2m.c. This will fix the sparse
      warnings when doing "make C=1".
      Signed-off-by: default avatarJuergen Gross <jgross@suse.com>
      Signed-off-by: default avatarDavid Vrabel <david.vrabel@citrix.com>
      
      xen: Speed up set_phys_to_machine() by using read-only mappings
      
      Instead of checking at each call of set_phys_to_machine() whether a
      new p2m page has to be allocated due to writing an entry in a large
      invalid or identity area, just map those areas read only and react
      to a page fault on write by allocating the new page.
      
      This change will make the common path with no allocation much
      faster as it only requires a single write of the new mfn instead
      of walking the address translation tables and checking for the
      special cases.
      Suggested-by: default avatarDavid Vrabel <david.vrabel@citrix.com>
      Signed-off-by: default avatarJuergen Gross <jgross@suse.com>
      Reviewed-by: default avatarDavid Vrabel <david.vrabel@citrix.com>
      Reviewed-by: default avatarKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Signed-off-by: default avatarDavid Vrabel <david.vrabel@citrix.com>
      
      xen: switch to linear virtual mapped sparse p2m list
      
      At start of the day the Xen hypervisor presents a contiguous mfn list
      to a pv-domain. In order to support sparse memory this mfn list is
      accessed via a three level p2m tree built early in the boot process.
      Whenever the system needs the mfn associated with a pfn this tree is
      used to find the mfn.
      
      Instead of using a software walked tree for accessing a specific mfn
      list entry this patch is creating a virtual address area for the
      entire possible mfn list including memory holes. The holes are
      covered by mapping a pre-defined  page consisting only of "invalid
      mfn" entries. Access to a mfn entry is possible by just using the
      virtual base address of the mfn list and the pfn as index into that
      list. This speeds up the (hot) path of determining the mfn of a
      pfn.
      
      Kernel build on a Dell Latitude E6440 (2 cores, HT) in 64 bit Dom0
      showed following improvements:
      
      Elapsed time: 32:50 ->  32:35
      System:       18:07 ->  17:47
      User:        104:00 -> 103:30
      
      Tested with following configurations:
      - 64 bit dom0, 8GB RAM
      - 64 bit dom0, 128 GB RAM, PCI-area above 4 GB
      - 32 bit domU, 512 MB, 8 GB, 43 GB (more wouldn't work even without
                                          the patch)
      - 32 bit domU, ballooning up and down
      - 32 bit domU, save and restore
      - 32 bit domU with PCI passthrough
      - 64 bit domU, 8 GB, 2049 MB, 5000 MB
      - 64 bit domU, ballooning up and down
      - 64 bit domU, save and restore
      - 64 bit domU with PCI passthrough
      Signed-off-by: default avatarJuergen Gross <jgross@suse.com>
      Signed-off-by: default avatarDavid Vrabel <david.vrabel@citrix.com>
      
      xen: Hide get_phys_to_machine() to be able to tune common path
      
      Today get_phys_to_machine() is always called when the mfn for a pfn
      is to be obtained. Add a wrapper __pfn_to_mfn() as inline function
      to be able to avoid calling get_phys_to_machine() when possible as
      soon as the switch to a linear mapped p2m list has been done.
      Signed-off-by: default avatarJuergen Gross <jgross@suse.com>
      Reviewed-by: default avatarDavid Vrabel <david.vrabel@citrix.com>
      Signed-off-by: default avatarDavid Vrabel <david.vrabel@citrix.com>
      
      x86: Introduce function to get pmd entry pointer
      
      Introduces lookup_pmd_address() to get the address of the pmd entry
      related to a virtual address in the current address space. This
      function is needed for support of a virtual mapped sparse p2m list
      in xen pv domains, as we need the address of the pmd entry, not the
      one of the pte in that case.
      Signed-off-by: default avatarJuergen Gross <jgross@suse.com>
      Signed-off-by: default avatarDavid Vrabel <david.vrabel@citrix.com>
      
      xen: Delay invalidating extra memory
      
      When the physical memory configuration is initialized the p2m entries
      for not pouplated memory pages are set to "invalid". As those pages
      are beyond the hypervisor built p2m list the p2m tree has to be
      extended.
      
      This patch delays processing the extra memory related p2m entries
      during the boot process until some more basic memory management
      functions are callable. This removes the need to create new p2m
      entries until virtual memory management is available.
      Signed-off-by: default avatarJuergen Gross <jgross@suse.com>
      Reviewed-by: default avatarDavid Vrabel <david.vrabel@citrix.com>
      Signed-off-by: default avatarDavid Vrabel <david.vrabel@citrix.com>
      
      xen: Delay m2p_override initialization
      
      The m2p overrides are used to be able to find the local pfn for a
      foreign mfn mapped into the domain. They are used by driver backends
      having to access frontend data.
      
      As this functionality isn't used in early boot it makes no sense to
      initialize the m2p override functions very early. It can be done
      later without doing any harm, removing the need for allocating memory
      via extend_brk().
      
      While at it make some m2p override functions static as they are only
      used internally.
      Signed-off-by: default avatarJuergen Gross <jgross@suse.com>
      Reviewed-by: default avatarDavid Vrabel <david.vrabel@citrix.com>
      Reviewed-by: default avatarKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Signed-off-by: default avatarDavid Vrabel <david.vrabel@citrix.com>
      
      xen: Delay remapping memory of pv-domain
      
      Early in the boot process the memory layout of a pv-domain is changed
      to match the E820 map (either the host one for Dom0 or the Xen one)
      regarding placement of RAM and PCI holes. This requires removing memory
      pages initially located at positions not suitable for RAM and adding
      them later at higher addresses where no restrictions apply.
      
      To be able to operate on the hypervisor supported p2m list until a
      virtual mapped linear p2m list can be constructed, remapping must
      be delayed until virtual memory management is initialized, as the
      initial p2m list can't be extended unlimited at physical memory
      initialization time due to it's fixed structure.
      
      A further advantage is the reduction in complexity and code volume as
      we don't have to be careful regarding memory restrictions during p2m
      updates.
      Signed-off-by: default avatarJuergen Gross <jgross@suse.com>
      Reviewed-by: default avatarDavid Vrabel <david.vrabel@citrix.com>
      Signed-off-by: default avatarDavid Vrabel <david.vrabel@citrix.com>
      
      xen: use common page allocation function in p2m.c
      
      In arch/x86/xen/p2m.c three different allocation functions for
      obtaining a memory page are used: extend_brk(), alloc_bootmem_align()
      or __get_free_page().  Which of those functions is used depends on the
      progress of the boot process of the system.
      
      Introduce a common allocation routine selecting the to be called
      allocation routine dynamically based on the boot progress. This allows
      moving initialization steps without having to care about changing
      allocation calls.
      Signed-off-by: default avatarJuergen Gross <jgross@suse.com>
      Signed-off-by: default avatarDavid Vrabel <david.vrabel@citrix.com>
      
      xen: Make functions static
      
      Some functions in arch/x86/xen/p2m.c are used locally only. Make them
      static. Rearrange the functions in p2m.c to avoid forward declarations.
      Signed-off-by: default avatarJuergen Gross <jgross@suse.com>
      Signed-off-by: default avatarDavid Vrabel <david.vrabel@citrix.com>
      
      xen: fix some style issues in p2m.c
      
      The source arch/x86/xen/p2m.c has some coding style issues. Fix them.
      Signed-off-by: default avatarJuergen Gross <jgross@suse.com>
      Signed-off-by: default avatarDavid Vrabel <david.vrabel@citrix.com>
      
      xen/pci: Use APIC directly when APIC virtualization hardware is available
      
      When hardware supports APIC/x2APIC virtualization we don't need to use
      pirqs for MSI handling and instead use APIC since most APIC accesses
      (MMIO or MSR) will now be processed without VMEXITs.
      
      As an example, netperf on the original code produces this profile
      (collected wih 'xentrace -e 0x0008ffff -T 5'):
      
          342 cpu_change
          260 CPUID
        34638 HLT
        64067 INJ_VIRQ
        28374 INTR
        82733 INTR_WINDOW
           10 NPF
        24337 TRAP
       370610 vlapic_accept_pic_intr
       307528 VMENTRY
       307527 VMEXIT
       140998 VMMCALL
          127 wrap_buffer
      
      After applying this patch the same test shows
      
          230 cpu_change
          260 CPUID
        36542 HLT
          174 INJ_VIRQ
        27250 INTR
          222 INTR_WINDOW
           20 NPF
        24999 TRAP
       381812 vlapic_accept_pic_intr
       166480 VMENTRY
       166479 VMEXIT
        77208 VMMCALL
           81 wrap_buffer
      
      ApacheBench results (ab -n 10000 -c 200) improve by about 10%
      Signed-off-by: default avatarBoris Ostrovsky <boris.ostrovsky@oracle.com>
      Reviewed-by: default avatarKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Reviewed-by: default avatarAndrew Cooper <andrew.cooper3@citrix.com>
      Signed-off-by: default avatarDavid Vrabel <david.vrabel@citrix.com>
      
      xen/pci: Defer initialization of MSI ops on HVM guests
      
      If the hardware supports APIC virtualization we may decide not to use
      pirqs and instead use APIC/x2APIC directly, meaning that we don't want
      to set x86_msi.setup_msi_irqs and x86_msi.teardown_msi_irq to
      Xen-specific routines.  However, x2APIC is not set up by the time
      pci_xen_hvm_init() is called so we need to postpone setting these ops
      until later, when we know which APIC mode is used.
      
      (Note that currently x2APIC is never initialized on HVM guests. This
      may change in the future)
      Signed-off-by: default avatarBoris Ostrovsky <boris.ostrovsky@oracle.com>
      Acked-by: default avatarStefano Stabellini <stefano.stabellini@eu.citrix.com>
      Signed-off-by: default avatarDavid Vrabel <david.vrabel@citrix.com>
      
      xen-pciback: drop SR-IOV VFs when PF driver unloads
      
      When a PF driver unloads, it may find it necessary to leave the VFs
      around simply because of pciback having marked them as assigned to a
      guest. Utilize a suitable notification to let go of the VFs, thus
      allowing the PF to go back into the state it was before its driver
      loaded (which in particular allows the driver to be loaded again with
      it being able to create the VFs anew, but which also allows to then
      pass through the PF instead of the VFs).
      
      Don't do this however for any VFs currently in active use by a guest.
      Signed-off-by: default avatarJan Beulich <jbeulich@suse.com>
      [v2: Removed the switch statement, moved it about]
      [v3: Redid it a bit differently]
      Signed-off-by: default avatarKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Signed-off-by: default avatarDavid Vrabel <david.vrabel@citrix.com>
      
      xen/pciback: Restore configuration space when detaching from a guest.
      
      The commit "xen/pciback: Don't deadlock when unbinding." was using
      the version of pci_reset_function which would lock the device lock.
      That is no good as we can dead-lock. As such we swapped to using
      the lock-less version and requiring that the callers
      of 'pcistub_put_pci_dev' take the device lock. And as such
      this bug got exposed.
      
      Using the lock-less version is  OK, except that we tried to
      use 'pci_restore_state' after the lock-less version of
      __pci_reset_function_locked - which won't work as 'state_saved'
      is set to false. Said 'state_saved' is a toggle boolean that
      is to be used by the sequence of a) pci_save_state/pci_restore_state
      or b) pci_load_and_free_saved_state/pci_restore_state. We don't
      want to use a) as the guest might have messed up the PCI
      configuration space and we want it to revert to the state
      when the PCI device was binded to us. Therefore we pick
      b) to restore the configuration space.
      
      We restore from our 'golden' version of PCI configuration space, when an:
       - Device is unbinded from pciback
       - Device is detached from a guest.
      Reported-by: default avatarSander Eikelenboom <linux@eikelenboom.it>
      Signed-off-by: default avatarKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Signed-off-by: default avatarDavid Vrabel <david.vrabel@citrix.com>
      
      PCI: Expose pci_load_saved_state for public consumption.
      
      We have the pci_load_and_free_saved_state, and pci_store_saved_state
      but are missing the functionality to just load the state
      multiple times in the PCI device without having to free/save
      the state.
      
      This patch makes it possible to use this function.
      Signed-off-by: default avatarKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Acked-by: default avatarBjorn Helgaas <bhelgaas@google.com>
      Signed-off-by: default avatarDavid Vrabel <david.vrabel@citrix.com>
      
      xen/pciback: Remove tons of dereferences
      
      A little cleanup. No functional difference.
      Reviewed-by: default avatarBoris Ostrovsky <boris.ostrovsky@oracle.com>
      Signed-off-by: default avatarKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Signed-off-by: default avatarDavid Vrabel <david.vrabel@citrix.com>
      
      xen/pciback: Print out the domain owning the device.
      
      We had been printing it only if the device was built with
      debug enabled. But this information is useful in the field
      to troubleshoot.
      Signed-off-by: default avatarKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Reviewed-by: default avatarDavid Vrabel <david.vrabel@citrix.com>
      Signed-off-by: default avatarDavid Vrabel <david.vrabel@citrix.com>
      
      xen/pciback: Include the domain id if removing the device whilst still in use
      
      Cleanup the function a bit - also include the id of the
      domain that is using the device.
      Signed-off-by: default avatarKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Reviewed-by: default avatarDavid Vrabel <david.vrabel@citrix.com>
      Signed-off-by: default avatarDavid Vrabel <david.vrabel@citrix.com>
      
      driver core: Provide an wrapper around the mutex to do lockdep warnings
      
      Instead of open-coding it in drivers that want to double check
      that their functions are indeed holding the device lock.
      Signed-off-by: default avatarKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Suggested-by: default avatarDavid Vrabel <david.vrabel@citrix.com>
      Acked-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: default avatarDavid Vrabel <david.vrabel@citrix.com>
      
      xen/pciback: Don't deadlock when unbinding.
      
      As commit 0a9fd015
      'xen/pciback: Document the entry points for 'pcistub_put_pci_dev''
      explained there are four entry points in this function.
      Two of them are when the user fiddles in the SysFS to
      unbind a device which might be in use by a guest or not.
      
      Both 'unbind' states will cause a deadlock as the the PCI lock has
      already been taken, which then pci_device_reset tries to take.
      
      We can simplify this by requiring that all callers of
      pcistub_put_pci_dev MUST hold the device lock. And then
      we can just call the lockless version of pci_device_reset.
      
      To make it even simpler we will modify xen_pcibk_release_pci_dev
      to quality whether it should take a lock or not - as it ends
      up calling xen_pcibk_release_pci_dev and needs to hold the lock.
      Reviewed-by: default avatarBoris Ostrovsky <boris.ostrovsky@oracle.com>
      Signed-off-by: default avatarKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Signed-off-by: default avatarDavid Vrabel <david.vrabel@citrix.com>
      
      swiotlb-xen: pass dev_addr to swiotlb_tbl_unmap_single
      
      Need to pass the pointer within the swiotlb internal buffer to the
      swiotlb library, that in the case of xen_unmap_single is dev_addr, not
      paddr.
      Signed-off-by: default avatarStefano Stabellini <stefano.stabellini@eu.citrix.com>
      Acked-by: default avatarKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      CC: stable@vger.kernel.org
      
      (cherry picked from commit dbdd7476
      4ef8e3f3
      2c3fc8d2)
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      b828503b
    • Marcelo Tosatti's avatar
      KVM: x86: update masterclock values on TSC writes · 72bd7e64
      Marcelo Tosatti authored
      When the guest writes to the TSC, the masterclock TSC copy must be
      updated as well along with the TSC_OFFSET update, otherwise a negative
      tsc_timestamp is calculated at kvm_guest_time_update.
      
      Once "if (!vcpus_matched && ka->use_master_clock)" is simplified to
      "if (ka->use_master_clock)", the corresponding "if (!ka->use_master_clock)"
      becomes redundant, so remove the do_request boolean and collapse
      everything into a single condition.
      Signed-off-by: default avatarMarcelo Tosatti <mtosatti@redhat.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      
      (cherry picked from commit 7f187922)
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      72bd7e64
    • Jan Kara's avatar
      udf: Check length of extended attributes and allocation descriptors · 862984c5
      Jan Kara authored
      Check length of extended attributes and allocation descriptors when
      loading inodes from disk. Otherwise corrupted filesystems could confuse
      the code and make the kernel oops.
      Reported-by: default avatarCarl Henrik Lunde <chlunde@ping.uio.no>
      CC: stable@vger.kernel.org
      Signed-off-by: default avatarJan Kara <jack@suse.cz>
      
      (cherry picked from commit 23b133bd)
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      862984c5
    • Seth Forshee's avatar
      HID: i2c-hid: Limit reads to wMaxInputLength bytes for input events · 86a16ad3
      Seth Forshee authored
      d1c7e29e (HID: i2c-hid: prevent buffer overflow in early IRQ)
      changed hid_get_input() to read ihid->bufsize bytes, which can be
      more than wMaxInputLength. This is the case with the Dell XPS 13
      9343, and it is causing events to be missed. In some cases the
      missed events are releases, which can cause the cursor to jump or
      freeze, among other problems. Limit the number of bytes read to
      min(wMaxInputLength, ihid->bufsize) to prevent such problems.
      
      Fixes: d1c7e29e "HID: i2c-hid: prevent buffer overflow in early IRQ"
      Signed-off-by: default avatarSeth Forshee <seth.forshee@canonical.com>
      Reviewed-by: default avatarBenjamin Tissoires <benjamin.tissoires@redhat.com>
      Signed-off-by: default avatarJiri Kosina <jkosina@suse.cz>
      
      (cherry picked from commit 6d00f37e)
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      86a16ad3
    • Matej Dubovy's avatar
      Bluetooth: btusb: Add support for Lite-On (04ca) Broadcom based, BCM43142 · 97285a8f
      Matej Dubovy authored
      Please add support for sub BT chip on the combo card
      Broadcom 43142A0 (in Lenovo E145), 04ca:2007
      
      /sys/kernel/debug/usb/devices
      
      T:  Bus=05 Lev=01 Prnt=01 Port=01 Cnt=02 Dev#=  3 Spd=12   MxCh= 0
      D:  Ver= 2.00 Cls=ff(vend.) Sub=01 Prot=01 MxPS=64 #Cfgs=  1
      P:  Vendor=04ca ProdID=2007 Rev= 1.12
      S:  Manufacturer=Broadcom Corp
      S:  Product=BCM43142A0
      S:  SerialNumber=28E347EC73BD
      C:* #Ifs= 4 Cfg#= 1 Atr=e0 MxPwr=  0mA
      I:* If#= 0 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=01 Prot=01 Driver=(none)
      E:  Ad=81(I) Atr=03(Int.) MxPS=  16 Ivl=1ms
      E:  Ad=82(I) Atr=02(Bulk) MxPS=  64 Ivl=0ms
      E:  Ad=02(O) Atr=02(Bulk) MxPS=  64 Ivl=0ms
      I:* If#= 1 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=01 Prot=01 Driver=(none)
      E:  Ad=83(I) Atr=01(Isoc) MxPS=   0 Ivl=1ms
      E:  Ad=03(O) Atr=01(Isoc) MxPS=   0 Ivl=1ms
      I:  If#= 1 Alt= 1 #EPs= 2 Cls=ff(vend.) Sub=01 Prot=01 Driver=(none)
      E:  Ad=83(I) Atr=01(Isoc) MxPS=   9 Ivl=1ms
      E:  Ad=03(O) Atr=01(Isoc) MxPS=   9 Ivl=1ms
      I:  If#= 1 Alt= 2 #EPs= 2 Cls=ff(vend.) Sub=01 Prot=01 Driver=(none)
      E:  Ad=83(I) Atr=01(Isoc) MxPS=  17 Ivl=1ms
      E:  Ad=03(O) Atr=01(Isoc) MxPS=  17 Ivl=1ms
      I:  If#= 1 Alt= 3 #EPs= 2 Cls=ff(vend.) Sub=01 Prot=01 Driver=(none)
      E:  Ad=83(I) Atr=01(Isoc) MxPS=  25 Ivl=1ms
      E:  Ad=03(O) Atr=01(Isoc) MxPS=  25 Ivl=1ms
      I:  If#= 1 Alt= 4 #EPs= 2 Cls=ff(vend.) Sub=01 Prot=01 Driver=(none)
      E:  Ad=83(I) Atr=01(Isoc) MxPS=  33 Ivl=1ms
      E:  Ad=03(O) Atr=01(Isoc) MxPS=  33 Ivl=1ms
      I:  If#= 1 Alt= 5 #EPs= 2 Cls=ff(vend.) Sub=01 Prot=01 Driver=(none)
      E:  Ad=83(I) Atr=01(Isoc) MxPS=  49 Ivl=1ms
      E:  Ad=03(O) Atr=01(Isoc) MxPS=  49 Ivl=1ms
      I:* If#= 2 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=ff Prot=ff Driver=(none)
      E:  Ad=84(I) Atr=02(Bulk) MxPS=  32 Ivl=0ms
      E:  Ad=04(O) Atr=02(Bulk) MxPS=  32 Ivl=0ms
      I:* If#= 3 Alt= 0 #EPs= 0 Cls=fe(app. ) Sub=01 Prot=01 Driver=(none)
      
      Firmware for 04ca:2007 can be extracted from the latest Lenovo E145
      Bluetooth driver for Windows (driver is however described as BCM20702
      but contains also firwmare for BCM43142).
      Search for BCM43142A0_001.001.011.0122.0153.hex within hex files, then
      it must be converted using hex2hcd utility. Rename file to
      BCM43142A0-04ca-2007.hcd, then move to /lib/firmware/brcm/.
      Signed-off-by: default avatarMatej Dubovy <matej.dubovy@gmail.com>
      Signed-off-by: default avatarMarcel Holtmann <marcel@holtmann.org>
      Cc: stable@vger.kernel.org
      
      (cherry picked from commit 8f0c304c)
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      97285a8f
    • Austin Lund's avatar
      [media] media/rc: Send sync space information on the lirc device · 190b31ab
      Austin Lund authored
      Userspace expects to see a long space before the first pulse is sent on
      the lirc device.  Currently, if a long time has passed and a new packet
      is started, the lirc codec just returns and doesn't send anything.  This
      makes lircd ignore many perfectly valid signals unless they are sent in
      quick sucession.  When a reset event is delivered, we cannot know
      anything about the duration of the space.  But it should be safe to
      assume it has been a long time and we just set the duration to maximum.
      Signed-off-by: default avatarAustin Lund <austin.lund@gmail.com>
      Signed-off-by: default avatarMauro Carvalho Chehab <mchehab@osg.samsung.com>
      
      (cherry picked from commit a8f29e89)
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      190b31ab
    • Saran Maruti Ramanara's avatar
      net: sctp: fix passing wrong parameter header to param_type2af in sctp_process_param · bf431c8f
      Saran Maruti Ramanara authored
      When making use of RFC5061, section 4.2.4. for setting the primary IP
      address, we're passing a wrong parameter header to param_type2af(),
      resulting always in NULL being returned.
      
      At this point, param.p points to a sctp_addip_param struct, containing
      a sctp_paramhdr (type = 0xc004, length = var), and crr_id as a correlation
      id. Followed by that, as also presented in RFC5061 section 4.2.4., comes
      the actual sctp_addr_param, which also contains a sctp_paramhdr, but
      this time with the correct type SCTP_PARAM_IPV{4,6}_ADDRESS that
      param_type2af() can make use of. Since we already hold a pointer to
      addr_param from previous line, just reuse it for param_type2af().
      
      Fixes: d6de3097 ("[SCTP]: Add the handling of "Set Primary IP Address" parameter to INIT")
      Signed-off-by: default avatarSaran Maruti Ramanara <saran.neti@telus.com>
      Signed-off-by: default avatarDaniel Borkmann <dborkman@redhat.com>
      Acked-by: default avatarVlad Yasevich <vyasevich@gmail.com>
      Acked-by: default avatarNeil Horman <nhorman@tuxdriver.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      
      (cherry picked from commit cfbf654e)
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      bf431c8f
    • Florian Westphal's avatar
      ppp: deflate: never return len larger than output buffer · c047a324
      Florian Westphal authored
      When we've run out of space in the output buffer to store more data, we
      will call zlib_deflate with a NULL output buffer until we've consumed
      remaining input.
      
      When this happens, olen contains the size the output buffer would have
      consumed iff we'd have had enough room.
      
      This can later cause skb_over_panic when ppp_generic skb_put()s
      the returned length.
      Reported-by: default avatarIain Douglas <centos@1n6.org.uk>
      Signed-off-by: default avatarFlorian Westphal <fw@strlen.de>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      
      (cherry picked from commit e2a4800e)
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      c047a324
    • Eric Dumazet's avatar
      ipv4: tcp: get rid of ugly unicast_sock · 971b8803
      Eric Dumazet authored
      In commit be9f4a44 ("ipv4: tcp: remove per net tcp_sock")
      I tried to address contention on a socket lock, but the solution
      I chose was horrible :
      
      commit 3a7c384f ("ipv4: tcp: unicast_sock should not land outside
      of TCP stack") addressed a selinux regression.
      
      commit 0980e56e ("ipv4: tcp: set unicast_sock uc_ttl to -1")
      took care of another regression.
      
      commit b5ec8eea ("ipv4: fix ip_send_skb()") fixed another regression.
      
      commit 811230cd ("tcp: ipv4: initialize unicast_sock sk_pacing_rate")
      was another shot in the dark.
      
      Really, just use a proper socket per cpu, and remove the skb_orphan()
      call, to re-enable flow control.
      
      This solves a serious problem with FQ packet scheduler when used in
      hostile environments, as we do not want to allocate a flow structure
      for every RST packet sent in response to a spoofed packet.
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      
      net: sched: fix panic in rate estimators
      
      Doing the following commands on a non idle network device
      panics the box instantly, because cpu_bstats gets overwritten
      by stats.
      
      tc qdisc add dev eth0 root <your_favorite_qdisc>
      ... some traffic (one packet is enough) ...
      tc qdisc replace dev eth0 root est 1sec 4sec <your_favorite_qdisc>
      
      [  325.355596] BUG: unable to handle kernel paging request at ffff8841dc5a074c
      [  325.362609] IP: [<ffffffff81541c9e>] __gnet_stats_copy_basic+0x3e/0x90
      [  325.369158] PGD 1fa7067 PUD 0
      [  325.372254] Oops: 0000 [#1] SMP
      [  325.375514] Modules linked in: ...
      [  325.398346] CPU: 13 PID: 14313 Comm: tc Not tainted 3.19.0-smp-DEV #1163
      [  325.412042] task: ffff8800793ab5d0 ti: ffff881ff2fa4000 task.ti: ffff881ff2fa4000
      [  325.419518] RIP: 0010:[<ffffffff81541c9e>]  [<ffffffff81541c9e>] __gnet_stats_copy_basic+0x3e/0x90
      [  325.428506] RSP: 0018:ffff881ff2fa7928  EFLAGS: 00010286
      [  325.433824] RAX: 000000000000000c RBX: ffff881ff2fa796c RCX: 000000000000000c
      [  325.440988] RDX: ffff8841dc5a0744 RSI: 0000000000000060 RDI: 0000000000000060
      [  325.448120] RBP: ffff881ff2fa7948 R08: ffffffff81cd4f80 R09: 0000000000000000
      [  325.455268] R10: ffff883ff223e400 R11: 0000000000000000 R12: 000000015cba0744
      [  325.462405] R13: ffffffff81cd4f80 R14: ffff883ff223e460 R15: ffff883feea0722c
      [  325.469536] FS:  00007f2ee30fa700(0000) GS:ffff88407fa20000(0000) knlGS:0000000000000000
      [  325.477630] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  325.483380] CR2: ffff8841dc5a074c CR3: 0000003feeae9000 CR4: 00000000001407e0
      [  325.490510] Stack:
      [  325.492524]  ffff883feea0722c ffff883fef719dc0 ffff883feea0722c ffff883ff223e4a0
      [  325.499990]  ffff881ff2fa79a8 ffffffff815424ee ffff883ff223e49c 000000015cba0744
      [  325.507460]  00000000f2fa7978 0000000000000000 ffff881ff2fa79a8 ffff883ff223e4a0
      [  325.514956] Call Trace:
      [  325.517412]  [<ffffffff815424ee>] gen_new_estimator+0x8e/0x230
      [  325.523250]  [<ffffffff815427aa>] gen_replace_estimator+0x4a/0x60
      [  325.529349]  [<ffffffff815718ab>] tc_modify_qdisc+0x52b/0x590
      [  325.535117]  [<ffffffff8155edd0>] rtnetlink_rcv_msg+0xa0/0x240
      [  325.540963]  [<ffffffff8155ed30>] ? __rtnl_unlock+0x20/0x20
      [  325.546532]  [<ffffffff8157f811>] netlink_rcv_skb+0xb1/0xc0
      [  325.552145]  [<ffffffff8155b355>] rtnetlink_rcv+0x25/0x40
      [  325.557558]  [<ffffffff8157f0d8>] netlink_unicast+0x168/0x220
      [  325.563317]  [<ffffffff8157f47c>] netlink_sendmsg+0x2ec/0x3e0
      
      Lets play safe and not use an union : percpu 'pointers' are mostly read
      anyway, and we have typically few qdiscs per host.
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Cc: John Fastabend <john.fastabend@gmail.com>
      Fixes: 22e0f8b9 ("net: sched: make bstats per cpu and estimator RCU safe")
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      
      hyperv: Fix the error processing in netvsc_send()
      
      The existing code frees the skb in EAGAIN case, in which the skb will be
      retried from upper layer and used again.
      Also, the existing code doesn't free send buffer slot in error case, because
      there is no completion message for unsent packets.
      This patch fixes these problems.
      
      (Please also include this patch for stable trees. Thanks!)
      Signed-off-by: default avatarHaiyang Zhang <haiyangz@microsoft.com>
      Reviewed-by: default avatarK. Y. Srinivasan <kys@microsoft.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      
      drivers: net: xgene: fix: Out of order descriptor bytes read
      
      This patch fixes the following kernel crash,
      
      	WARNING: CPU: 2 PID: 0 at net/ipv4/tcp_input.c:3079 tcp_clean_rtx_queue+0x658/0x80c()
      	Call trace:
      	[<fffffe0000096b7c>] dump_backtrace+0x0/0x184
      	[<fffffe0000096d10>] show_stack+0x10/0x1c
      	[<fffffe0000685ea0>] dump_stack+0x74/0x98
      	[<fffffe00000b44e0>] warn_slowpath_common+0x88/0xb0
      	[<fffffe00000b461c>] warn_slowpath_null+0x14/0x20
      	[<fffffe00005b5c1c>] tcp_clean_rtx_queue+0x654/0x80c
      	[<fffffe00005b6228>] tcp_ack+0x454/0x688
      	[<fffffe00005b6ca8>] tcp_rcv_established+0x4a4/0x62c
      	[<fffffe00005bf4b4>] tcp_v4_do_rcv+0x16c/0x350
      	[<fffffe00005c225c>] tcp_v4_rcv+0x8e8/0x904
      	[<fffffe000059d470>] ip_local_deliver_finish+0x100/0x26c
      	[<fffffe000059dad8>] ip_local_deliver+0xac/0xc4
      	[<fffffe000059d6c4>] ip_rcv_finish+0xe8/0x328
      	[<fffffe000059dd3c>] ip_rcv+0x24c/0x38c
      	[<fffffe0000563950>] __netif_receive_skb_core+0x29c/0x7c8
      	[<fffffe0000563ea4>] __netif_receive_skb+0x28/0x7c
      	[<fffffe0000563f54>] netif_receive_skb_internal+0x5c/0xe0
      	[<fffffe0000564810>] napi_gro_receive+0xb4/0x110
      	[<fffffe0000482a2c>] xgene_enet_process_ring+0x144/0x338
      	[<fffffe0000482d18>] xgene_enet_napi+0x1c/0x50
      	[<fffffe0000565454>] net_rx_action+0x154/0x228
      	[<fffffe00000b804c>] __do_softirq+0x110/0x28c
      	[<fffffe00000b8424>] irq_exit+0x8c/0xc0
      	[<fffffe0000093898>] handle_IRQ+0x44/0xa8
      	[<fffffe000009032c>] gic_handle_irq+0x38/0x7c
      	[...]
      
      Software writes poison data into the descriptor bytes[15:8] and upon
      receiving the interrupt, if those bytes are overwritten by the hardware with
      the valid data, software also reads bytes[7:0] and executes receive/tx
      completion logic.
      
      If the CPU executes the above two reads in out of order fashion, then the
      bytes[7:0] will have older data and causing the kernel panic.  We have to
      force the order of the reads and thus this patch introduces read memory
      barrier between these reads.
      Signed-off-by: default avatarIyappan Subramanian <isubramanian@apm.com>
      Signed-off-by: default avatarKeyur Chudgar <kchudgar@apm.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      
      Merge branch 'vlan_get_protocol'
      
      Toshiaki Makita says:
      
      ====================
      Fix checksum error when using stacked vlan
      
      When I was testing 802.1ad, I found several drivers don't take into
      account 802.1ad or multiple vlans when retrieving L3 (IP/IPv6) or
      L4 (TCP/UDP) protocol for checksum offload.
      
      It is mainly due to vlan_get_protocol(), which extracts ether type only
      when it is tagged with single 802.1Q. When 802.1ad is used or there are
      multiple vlans, it extracts vlan protocol and drivers cannot determine
      which L3/L4 protocol is used.
      
      Those drivers, most of which have IP_CSUM/IPV6_CSUM features, get L3/L4
      header-offset by software, so it seems that their checksum offload works
      with multiple vlans if we can parse protocols correctly.
      (They know mac header length, and probably don't care about what is in it.)
      
      And another thing, some of Intel's drivers seem to use skb->protocol where
      vlan_get_protocol() is more suitable.
      
      I tested that at least igb/igbvf on I350 works with this patch set.
      
      Note:
      We can hand a double tagged packet with CHECKSUM_PARTIAL to a HW driver
      by creating a vlan device on a bridge device and enabling vlan_filtering
      of the bridge with 802.1ad protocol.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      
      ixgbevf: Fix checksum error when using stacked vlan
      
      When a skb has multiple vlans and it is CHECKSUM_PARTIAL,
      ixgbevf_tx_csum() fails to get the network protocol and checksum related
      descriptor fields are not configured correctly because skb->protocol
      doesn't show the L3 protocol in this case.
      
      Use first->protocol instead of skb->protocol to get the proper network
      protocol.
      Signed-off-by: default avatarToshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      
      ixgbe: Fix checksum error when using stacked vlan
      
      When a skb has multiple vlans and it is CHECKSUM_PARTIAL,
      ixgbe_tx_csum() fails to get the network protocol and checksum related
      descriptor fields are not configured correctly because skb->protocol
      doesn't show the L3 protocol in this case.
      
      Use vlan_get_protocol() to get the proper network protocol.
      Signed-off-by: default avatarToshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      
      igbvf: Fix checksum error when using stacked vlan
      
      When a skb has multiple vlans and it is CHECKSUM_PARTIAL,
      igbvf_tx_csum() fails to get the network protocol and checksum related
      descriptor fields are not configured correctly because skb->protocol
      doesn't show the L3 protocol in this case.
      
      Use vlan_get_protocol() to get the proper network protocol.
      Signed-off-by: default avatarToshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      
      net: Fix vlan_get_protocol for stacked vlan
      
      vlan_get_protocol() could not get network protocol if a skb has a 802.1ad
      vlan tag or multiple vlans, which caused incorrect checksum calculation
      in several drivers.
      
      Fix vlan_get_protocol() to retrieve network protocol instead of incorrect
      vlan protocol.
      
      As the logic is the same as skb_network_protocol(), create a common helper
      function __vlan_get_protocol() and call it from existing functions.
      Signed-off-by: default avatarToshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      
      net: sctp: fix passing wrong parameter header to param_type2af in sctp_process_param
      
      When making use of RFC5061, section 4.2.4. for setting the primary IP
      address, we're passing a wrong parameter header to param_type2af(),
      resulting always in NULL being returned.
      
      At this point, param.p points to a sctp_addip_param struct, containing
      a sctp_paramhdr (type = 0xc004, length = var), and crr_id as a correlation
      id. Followed by that, as also presented in RFC5061 section 4.2.4., comes
      the actual sctp_addr_param, which also contains a sctp_paramhdr, but
      this time with the correct type SCTP_PARAM_IPV{4,6}_ADDRESS that
      param_type2af() can make use of. Since we already hold a pointer to
      addr_param from previous line, just reuse it for param_type2af().
      
      Fixes: d6de3097 ("[SCTP]: Add the handling of "Set Primary IP Address" parameter to INIT")
      Signed-off-by: default avatarSaran Maruti Ramanara <saran.neti@telus.com>
      Signed-off-by: default avatarDaniel Borkmann <dborkman@redhat.com>
      Acked-by: default avatarVlad Yasevich <vyasevich@gmail.com>
      Acked-by: default avatarNeil Horman <nhorman@tuxdriver.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      
      netlink: fix wrong subscription bitmask to group mapping in
      
      The subscription bitmask passed via struct sockaddr_nl is converted to
      the group number when calling the netlink_bind() and netlink_unbind()
      callbacks.
      
      The conversion is however incorrect since bitmask (1 << 0) needs to be
      mapped to group number 1. Note that you cannot specify the group number 0
      (usually known as _NONE) from setsockopt() using NETLINK_ADD_MEMBERSHIP
      since this is rejected through -EINVAL.
      
      This problem became noticeable since 97840cb6 ("netfilter: nfnetlink:
      fix insufficient validation in nfnetlink_bind") when binding to bitmask
      (1 << 0) in ctnetlink.
      Reported-by: default avatarAndre Tomt <andre@tomt.net>
      Reported-by: default avatarIvan Delalande <colona@arista.com>
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      
      ipv4: Don't increase PMTU with Datagram Too Big message.
      
      RFC 1191 said, "a host MUST not increase its estimate of the Path
      MTU in response to the contents of a Datagram Too Big message."
      Signed-off-by: default avatarLi Wei <lw@cn.fujitsu.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      
      Merge branch 'arm-build-fixes'
      
      Arnd Bergmann says:
      
      ====================
      net: driver fixes from arm randconfig builds
      
      These four patches are fallout from test builds on ARM. I have a
      few more of them in my backlog but have not yet confirmed them
      to still be valid.
      
      The first three patches are about incomplete dependencies on
      old drivers. One could backport them to the beginning of time
      in theory, but there is little value since nobody would run into
      these problems.
      
      The final patch is one I had submitted before together with the
      respective pcmcia patch but forgot to follow up on that. It's
      still a valid but relatively theoretical bug, because the previous
      behavior of the driver was just as broken as what we have in
      mainline.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      
      net: am2150: fix nmclan_cs.c shared interrupt handling
      
      A recent patch tried to work around a valid warning for the use of a
      deprecated interface by blindly changing from the old
      pcmcia_request_exclusive_irq() interface to pcmcia_request_irq().
      
      This driver has an interrupt handler that is not currently aware
      of shared interrupts, but can be easily converted to be.
      At the moment, the driver reads the interrupt status register
      repeatedly until it contains only zeroes in the interesting bits,
      and handles each bit individually.
      
      This patch adds the missing part of returning IRQ_NONE in case none
      of the bits are set to start with, so we can move on to the next
      interrupt source.
      Signed-off-by: default avatarArnd Bergmann <arnd@arndb.de>
      Fixes: 5f5316fc ("am2150: Update nmclan_cs.c to use update PCMCIA API")
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      
      net: lance,ni64: don't build for ARM
      
      The ni65 and lance ethernet drivers manually program the ISA DMA
      controller that is only available on x86 PCs and a few compatible
      systems. Trying to build it on ARM results in this error:
      
      ni65.c: In function 'ni65_probe1':
      ni65.c:496:62: error: 'DMA1_STAT_REG' undeclared (first use in this function)
           ((inb(DMA1_STAT_REG) >> 4) & 0x0f)
                                                                    ^
      ni65.c:496:62: note: each undeclared identifier is reported only once for each function it appears in
      ni65.c:497:63: error: 'DMA2_STAT_REG' undeclared (first use in this function)
           | (inb(DMA2_STAT_REG) & 0xf0);
      
      The DMA1_STAT_REG and DMA2_STAT_REG registers are only defined for
      alpha, mips, parisc, powerpc and x86, although it is not clear
      which subarchitectures actually have them at the correct location.
      
      This patch for now just disables it for ARM, to avoid randconfig
      build errors. We could also decide to limit it to the set of
      architectures on which it does compile, but that might look more
      deliberate than guessing based on where the drivers build.
      Signed-off-by: default avatarArnd Bergmann <arnd@arndb.de>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      
      net: wan: add missing virt_to_bus dependencies
      
      The cosa driver is rather outdated and does not get built on most
      platforms because it requires the ISA_DMA_API symbol. However
      there are some ARM platforms that have ISA_DMA_API but no virt_to_bus,
      and they get this build error when enabling the ltpc driver.
      
      drivers/net/wan/cosa.c: In function 'tx_interrupt':
      drivers/net/wan/cosa.c:1768:3: error: implicit declaration of function 'virt_to_bus'
         unsigned long addr = virt_to_bus(cosa->txbuf);
         ^
      
      The same problem exists for the Hostess SV-11 and Sealevel Systems 4021
      drivers.
      
      This adds another dependency in Kconfig to avoid that configuration.
      Signed-off-by: default avatarArnd Bergmann <arnd@arndb.de>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      
      net: cs89x0: always build platform code if !HAS_IOPORT_MAP
      
      The cs89x0 driver can either be built as an ISA driver or a platform
      driver, the choice is controlled by the CS89x0_PLATFORM Kconfig
      symbol. Building the ISA driver on a system that does not have
      a way to map I/O ports fails with this error:
      
      drivers/built-in.o: In function `cs89x0_ioport_probe.constprop.1':
      :(.init.text+0x4794): undefined reference to `ioport_map'
      :(.init.text+0x4830): undefined reference to `ioport_unmap'
      
      This changes the Kconfig logic to take that option away and
      always force building the platform variant of this driver if
      CONFIG_HAS_IOPORT_MAP is not set. This is the only correct
      choice in this case, and it avoids the build error.
      Signed-off-by: default avatarArnd Bergmann <arnd@arndb.de>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      
      ppp: deflate: never return len larger than output buffer
      
      When we've run out of space in the output buffer to store more data, we
      will call zlib_deflate with a NULL output buffer until we've consumed
      remaining input.
      
      When this happens, olen contains the size the output buffer would have
      consumed iff we'd have had enough room.
      
      This can later cause skb_over_panic when ppp_generic skb_put()s
      the returned length.
      Reported-by: default avatarIain Douglas <centos@1n6.org.uk>
      Signed-off-by: default avatarFlorian Westphal <fw@strlen.de>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      
      Merge branch 'netns'
      
      Nicolas Dichtel says:
      
      ====================
      netns: audit netdevice creation with IFLA_NET_NS_[PID|FD]
      
      When one of these attributes is set, the netdevice is created into the netns
      pointed by IFLA_NET_NS_[PID|FD] (see the call to rtnl_create_link() in
      rtnl_newlink()). Let's call this netns the dest_net. After this creation, if the
      newlink handler exists, it is called with a netns argument that points to the
      netns where the netlink message has been received (called src_net in the code)
      which is the link netns.
      Hence, with one of these attributes, it's possible to create a x-netns
      netdevice.
      
      Here is the result of my code review:
      - all ip tunnels (sit, ipip, ip6_tunnels, gre[tap][v6], ip_vti[6]) does not
        really allows to use this feature: the netdevice is created in the dest_net
        and the src_net is completely ignored in the newlink handler.
      - VLAN properly handles this x-netns creation.
      - bridge ignores src_net, which seems fine (NETIF_F_NETNS_LOCAL is set).
      - CAIF subsystem is not clear for me (I don't know how it works), but it seems
        to wrongly use src_net. Patch #1 tries to fix this, but it was done only by
        code review (and only compile-tested), so please carefully review it. I may
        miss something.
      - HSR subsystem uses src_net to parse IFLA_HSR_SLAVE[1|2], but the netdevice has
        the flag NETIF_F_NETNS_LOCAL, so the question is: does this netdevice really
        supports x-netns? If not, the newlink handler should use the dest_net instead
        of src_net, I can provide the patch.
      - ieee802154 uses also src_net and does not have NETIF_F_NETNS_LOCAL. Same
        question: does this netdevice really supports x-netns?
      - bonding ignores src_net and flag NETIF_F_NETNS_LOCAL is set, ie x-netns is not
        supported. Fine.
      - CAN does not support rtnl/newlink, ok.
      - ipvlan uses src_net and does not have NETIF_F_NETNS_LOCAL. After looking at
        the code, it seems that this drivers support x-netns. Am I right?
      - macvlan/macvtap uses src_net and seems to have x-netns support.
      - team ignores src_net and has the flag NETIF_F_NETNS_LOCAL, ie x-netns is not
        supported. Ok.
      - veth uses src_net and have x-netns support ;-) Ok.
      - VXLAN didn't properly handle this. The link netns (vxlan->net) is the src_net
        and not dest_net (see patch #2). Note that it was already possible to create a
        x-netns vxlan before the commit f01ec1c0 ("vxlan: add x-netns support")
        but the nedevice remains broken.
      
      To summarize:
       - CAIF patch must be carefully reviewed
       - for HSR, ieee802154, ipvlan: is x-netns supported?
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      
      vxlan: setup the right link netns in newlink hdlr
      
      Rename the netns to src_net to avoid confusion with the netns where the
      interface stands. The user may specify IFLA_NET_NS_[PID|FD] to create
      a x-netns netndevice: IFLA_NET_NS_[PID|FD] points to the netns where the
      netdevice stands and src_net to the link netns.
      
      Note that before commit f01ec1c0 ("vxlan: add x-netns support"), it was
      possible to create a x-netns vxlan netdevice, but the netdevice was not
      operational.
      
      Fixes: f01ec1c0 ("vxlan: add x-netns support")
      Signed-off-by: default avatarNicolas Dichtel <nicolas.dichtel@6wind.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      
      caif: remove wrong dev_net_set() call
      
      src_net points to the netns where the netlink message has been received. This
      netns may be different from the netns where the interface is created (because
      the user may add IFLA_NET_NS_[PID|FD]). In this case, src_net is the link netns.
      
      It seems wrong to override the netns in the newlink() handler because if it
      was not already src_net, it means that the user explicitly asks to create the
      netdevice in another netns.
      
      CC: Sjur Brændeland <sjur.brandeland@stericsson.com>
      CC: Dmitry Tarnyagin <dmitry.tarnyagin@lockless.no>
      Fixes: 8391c4aa ("caif: Bugfixes in CAIF netdevice for close and flow control")
      Fixes: c4125400 ("caif-hsi: Add rtnl support")
      Signed-off-by: default avatarNicolas Dichtel <nicolas.dichtel@6wind.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      
      lib/checksum.c: fix build for generic csum_tcpudp_nofold
      
      Fixed commit added from64to32 under _#ifndef do_csum_ but used it
      under _#ifndef csum_tcpudp_nofold_, breaking some builds (Fengguang's
      robot reported TILEGX's). Move from64to32 under the latter.
      
      Fixes: 150ae0e9 ("lib/checksum.c: fix carry in csum_tcpudp_nofold")
      Reported-by: default avatarkbuild test robot <fengguang.wu@intel.com>
      Signed-off-by: default avatarKarl Beldan <karl.beldan@rivierawaves.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: David S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      
      tcp: ipv4: initialize unicast_sock sk_pacing_rate
      
      When I added sk_pacing_rate field, I forgot to initialize its value
      in the per cpu unicast_sock used in ip_send_unicast_reply()
      
      This means that for sch_fq users, RST packets, or ACK packets sent
      on behalf of TIME_WAIT sockets might be sent to slowly or even dropped
      once we reach the per flow limit.
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Fixes: 95bd09eb ("tcp: TSO packets automatic sizing")
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      
      (cherry picked from commit bdbbb852
      811230cd)
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      971b8803
    • Roopa Prabhu's avatar
      bridge: dont send notification when skb->len == 0 in rtnl_bridge_notify · 8c040420
      Roopa Prabhu authored
      Reported in: https://bugzilla.kernel.org/show_bug.cgi?id=92081
      
      This patch avoids calling rtnl_notify if the device ndo_bridge_getlink
      handler does not return any bytes in the skb.
      
      Alternately, the skb->len check can be moved inside rtnl_notify.
      
      For the bridge vlan case described in 92081, there is also a fix needed
      in bridge driver to generate a proper notification. Will fix that in
      subsequent patch.
      
      v2: rebase patch on net tree
      Signed-off-by: default avatarRoopa Prabhu <roopa@cumulusnetworks.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      
      (cherry picked from commit 59ccaaaa)
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      8c040420
    • subashab@codeaurora.org's avatar
      ping: Fix race in free in receive path · abf175c2
      subashab@codeaurora.org authored
      An exception is seen in ICMP ping receive path where the skb
      destructor sock_rfree() tries to access a freed socket. This happens
      because ping_rcv() releases socket reference with sock_put() and this
      internally frees up the socket. Later icmp_rcv() will try to free the
      skb and as part of this, skb destructor is called and which leads
      to a kernel panic as the socket is freed already in ping_rcv().
      
      -->|exception
      -007|sk_mem_uncharge
      -007|sock_rfree
      -008|skb_release_head_state
      -009|skb_release_all
      -009|__kfree_skb
      -010|kfree_skb
      -011|icmp_rcv
      -012|ip_local_deliver_finish
      
      Fix this incorrect free by cloning this skb and processing this cloned
      skb instead.
      
      This patch was suggested by Eric Dumazet
      Signed-off-by: default avatarSubash Abhinov Kasiviswanathan <subashab@codeaurora.org>
      Cc: Eric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      
      (cherry picked from commit fc752f1f)
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      abf175c2
    • Herbert Xu's avatar
      udp_diag: Fix socket skipping within chain · aec9ed7c
      Herbert Xu authored
      While working on rhashtable walking I noticed that the UDP diag
      dumping code is buggy.  In particular, the socket skipping within
      a chain never happens, even though we record the number of sockets
      that should be skipped.
      
      As this code was supposedly copied from TCP, this patch does what
      TCP does and resets num before we walk a chain.
      Signed-off-by: default avatarHerbert Xu <herbert@gondor.apana.org.au>
      Acked-by: default avatarPavel Emelyanov <xemul@parallels.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      
      (cherry picked from commit 86f3cddb)
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      aec9ed7c