1. 24 Jun, 2016 40 commits
    • ecryptfs: forbid opening files without mmap handler · dea2cf7c
      Jann Horn authored
      commit 2f36db71 upstream.
      
      This prevents users from triggering a stack overflow through a recursive
      invocation of pagefault handling that involves mapping procfs files into
      virtual memory.
      Signed-off-by: Jann Horn <jannh@google.com>
      Acked-by: Tyler Hicks <tyhicks@canonical.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • memcg: add RCU locking around css_for_each_descendant_pre() in memcg_offline_kmem() · d3f97524
      Tejun Heo authored
      commit 3a06bb78 upstream.
      
      memcg_offline_kmem() may be called from memcg_free_kmem() after a css
      init failure.  memcg_free_kmem() is a ->css_free callback which is
      called without cgroup_mutex, so memcg_offline_kmem() ends up using
      css_for_each_descendant_pre() without any locking.  Fix it by adding RCU
      read locking around it.
      
          mkdir: cannot create directory `65530': No space left on device
          ===============================
          [ INFO: suspicious RCU usage. ]
          4.6.0-work+ #321 Not tainted
          -------------------------------
          kernel/cgroup.c:4008 cgroup_mutex or RCU read lock required!
           [  527.243970] other info that might help us debug this:
           [  527.244715]
          rcu_scheduler_active = 1, debug_locks = 0
          2 locks held by kworker/0:5/1664:
           #0:  ("cgroup_destroy"){.+.+..}, at: [<ffffffff81060ab5>] process_one_work+0x165/0x4a0
           #1:  ((&css->destroy_work)#3){+.+...}, at: [<ffffffff81060ab5>] process_one_work+0x165/0x4a0
           [  527.248098] stack backtrace:
          CPU: 0 PID: 1664 Comm: kworker/0:5 Not tainted 4.6.0-work+ #321
          Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.9.1-1.fc24 04/01/2014
          Workqueue: cgroup_destroy css_free_work_fn
          Call Trace:
            dump_stack+0x68/0xa1
            lockdep_rcu_suspicious+0xd7/0x110
            css_next_descendant_pre+0x7d/0xb0
            memcg_offline_kmem.part.44+0x4a/0xc0
            mem_cgroup_css_free+0x1ec/0x200
            css_free_work_fn+0x49/0x5e0
            process_one_work+0x1c5/0x4a0
            worker_thread+0x49/0x490
            kthread+0xea/0x100
            ret_from_fork+0x1f/0x40
      
      Link: http://lkml.kernel.org/r/20160526203018.GG23194@mtj.duckdns.org
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Vladimir Davydov <vdavydov@virtuozzo.com>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Michal Hocko <mhocko@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • parisc: Fix pagefault crash in unaligned __get_user() call · 1125f3b0
      Helge Deller authored
      commit 8b78f260 upstream.
      
      One of the Debian buildd servers had this crash in the syslog without
      any other information:
      
       Unaligned handler failed, ret = -2
       clock_adjtime (pid 22578): Unaligned data reference (code 28)
       CPU: 1 PID: 22578 Comm: clock_adjtime Tainted: G  E  4.5.0-2-parisc64-smp #1 Debian 4.5.4-1
       task: 000000007d9960f8 ti: 00000001bde7c000 task.ti: 00000001bde7c000
      
            YZrvWESTHLNXBCVMcbcbcbcbOGFRQPDI
       PSW: 00001000000001001111100000001111 Tainted: G            E
       r00-03  000000ff0804f80f 00000001bde7c2b0 00000000402d2be8 00000001bde7c2b0
       r04-07  00000000409e1fd0 00000000fa6f7fff 00000001bde7c148 00000000fa6f7fff
       r08-11  0000000000000000 00000000ffffffff 00000000fac9bb7b 000000000002b4d4
       r12-15  000000000015241c 000000000015242c 000000000000002d 00000000fac9bb7b
       r16-19  0000000000028800 0000000000000001 0000000000000070 00000001bde7c218
       r20-23  0000000000000000 00000001bde7c210 0000000000000002 0000000000000000
       r24-27  0000000000000000 0000000000000000 00000001bde7c148 00000000409e1fd0
       r28-31  0000000000000001 00000001bde7c320 00000001bde7c350 00000001bde7c218
       sr00-03  0000000001200000 0000000001200000 0000000000000000 0000000001200000
       sr04-07  0000000000000000 0000000000000000 0000000000000000 0000000000000000
      
       IASQ: 0000000000000000 0000000000000000 IAOQ: 00000000402d2e84 00000000402d2e88
        IIR: 0ca0d089    ISR: 0000000001200000  IOR: 00000000fa6f7fff
        CPU:        1   CR30: 00000001bde7c000 CR31: ffffffffffffffff
        ORIG_R28: 00000002369fe628
        IAOQ[0]: compat_get_timex+0x2dc/0x3c0
        IAOQ[1]: compat_get_timex+0x2e0/0x3c0
        RP(r2): compat_get_timex+0x40/0x3c0
       Backtrace:
        [<00000000402d4608>] compat_SyS_clock_adjtime+0x40/0xc0
        [<0000000040205024>] syscall_exit+0x0/0x14
      
      This means the userspace program clock_adjtime called the clock_adjtime()
      syscall and then crashed inside the compat_get_timex() function.
      Syscalls should never crash programs; they should return -EFAULT instead.
      
      The IIR register contains the executed instruction, which disassembles
      into "ldw 0(sr3,r5),r9".
      This load-word instruction is part of __get_user(), which tried to read the word
      at %r5/IOR (0xfa6f7fff). Because that address is not word-aligned, the
      unaligned handler jumped in.  The unaligned handler is able to emulate all
      ldw instructions, but it fails if it cannot read the source, e.g. because
      of a page fault.
      
      The following program reproduces the problem:
      
      #define _GNU_SOURCE
      #include <errno.h>
      #include <stdio.h>
      #include <unistd.h>
      #include <sys/syscall.h>
      #include <sys/mman.h>

      int main(void) {
              /* allocate 8k */
              char *ptr = mmap(NULL, 2*4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0);
              if (ptr == MAP_FAILED)
                      return 1;
              /* free second half (upper 4k) and make it invalid. */
              munmap(ptr+4096, 4096);
              /* syscall where the first int is unaligned and extends into the invalid memory region; */
              /* it should fail with errno == EFAULT instead of crashing the program */
              long ret = syscall(__NR_clock_adjtime, 0, ptr+4095);
              printf("ret = %ld, errno = %d (EFAULT = %d)\n", ret, errno, EFAULT);
              return 0;
      }
      
      To fix this issue we simply need to check if the faulting instruction address
      is in the exception fixup table when the unaligned handler failed. If it
      is, call the fixup routine instead of crashing.
      
      While looking at the unaligned handler I found another issue as well: The
      target register should not be modified if the handler was unsuccessful.
      Signed-off-by: Helge Deller <deller@gmx.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • pinctrl: mediatek: fix dual-edge code defect · b5ff1d60
      hongkun.cao authored
      commit 5edf673d upstream.
      
      When a dual-edge IRQ is triggered, an incorrect IRQ is reported if the
      external signal is not stable and that incorrect IRQ has been
      registered.
      Fix this by correcting the register offset.
      Signed-off-by: Hongkun Cao <hongkun.cao@mediatek.com>
      Reviewed-by: Matthias Brugger <matthias.bgg@gmail.com>
      Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • powerpc/pseries: Add POWER8NVL support to ibm,client-architecture-support call · a976f62a
      Thomas Huth authored
      commit 7cc85103 upstream.
      
      If we do not provide the PVR for POWER8NVL, a guest on this system
      currently ends up in PowerISA 2.06 compatibility mode on KVM, since QEMU
      does not provide a generic PowerISA 2.07 mode yet. So some new
      instructions from POWER8 (like "mtvsrd") get disabled for the guest,
      resulting in crashes when using code compiled explicitly for
      POWER8 (e.g. with the "-mcpu=power8" option of GCC).
      
      Fixes: ddee09c0 ("powerpc: Add PVR for POWER8NVL processor")
      Signed-off-by: Thomas Huth <thuth@redhat.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • powerpc: Use privileged SPR number for MMCR2 · cac2863f
      Thomas Huth authored
      commit 8dd75ccb upstream.
      
      We are already using the privileged versions of MMCR0, MMCR1
      and MMCRA in the kernel, so for consistency we should use the
      privileged version of MMCR2, too.
      
      Fixes: 240686c1 ("powerpc: Initialise PMU related regs on Power8")
      Suggested-by: Paul Mackerras <paulus@ozlabs.org>
      Signed-off-by: Thomas Huth <thuth@redhat.com>
      Acked-by: Paul Mackerras <paulus@ozlabs.org>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • powerpc: Fix definition of SIAR and SDAR registers · 4f27ca0e
      Thomas Huth authored
      commit d23fac2b upstream.
      
      The SIAR and SDAR registers are each available under two SPR numbers:
      780 / 781 (unprivileged, but read-only) and 796 / 797 (privileged, but
      read and write). The Linux kernel code currently uses the unprivileged
      SPRs; while this is OK for reading, writing to these registers of
      course does not work.
      Since the KVM code tries to write to these registers, too (see the mtspr
      in book3s_hv_rmhandlers.S), their contents sometimes get lost for the
      guests, e.g. during migration of a VM.
      To fix this issue, simply switch to the privileged SPR numbers instead.
      Signed-off-by: Thomas Huth <thuth@redhat.com>
      Acked-by: Paul Mackerras <paulus@ozlabs.org>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • powerpc/pseries/eeh: Handle RTAS delay requests in configure_bridge · baa6dfd6
      Russell Currey authored
      commit 871e178e upstream.
      
      In the "ibm,configure-pe" and "ibm,configure-bridge" RTAS calls, the
      spec states that values of 9900-9905 can be returned, indicating that
      software should delay for 10^x (where x is the last digit, i.e. 990x)
      milliseconds and attempt the call again. Currently, the kernel doesn't
      know about this, and respecting it fixes some PCI failures when the
      hypervisor is busy.
      
      The delay is capped at 0.2 seconds.
      Signed-off-by: Russell Currey <ruscur@russell.cc>
      Acked-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • arm64: mm: always take dirty state from new pte in ptep_set_access_flags · 5e8b53a4
      Will Deacon authored
      commit 0106d456 upstream.
      
      Commit 66dbd6e6 ("arm64: Implement ptep_set_access_flags() for
      hardware AF/DBM") ensured that pte flags are updated atomically in the
      face of potential concurrent, hardware-assisted updates. However, Alex
      reports that:
      
       | This patch breaks swapping for me.
       | In the broken case, you'll see either systemd cpu time spike (because
       | it's stuck in a page fault loop) or the system hang (because the
       | application owning the screen is stuck in a page fault loop).
      
      It turns out that this is because the 'dirty' argument to
      ptep_set_access_flags is always 0 for read faults, and so we can't use
      it to set PTE_RDONLY. The failing sequence is:
      
        1. We put down a PTE_WRITE | PTE_DIRTY | PTE_AF pte
        2. Memory pressure -> pte_mkold(pte) -> clear PTE_AF
        3. A read faults due to the missing access flag
        4. ptep_set_access_flags is called with dirty = 0, due to the read fault
        5. pte is then made PTE_WRITE | PTE_DIRTY | PTE_AF | PTE_RDONLY (!)
        6. A write faults, but pte_write is true so we get stuck
      
      The solution is to check the new page table entry (as would be done by
      the generic, non-atomic definition of ptep_set_access_flags that just
      calls set_pte_at) to establish the dirty state.
      
      Fixes: 66dbd6e6 ("arm64: Implement ptep_set_access_flags() for hardware AF/DBM")
      Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
      Reported-by: Alexander Graf <agraf@suse.de>
      Tested-by: Alexander Graf <agraf@suse.de>
      Signed-off-by: Will Deacon <will.deacon@arm.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • arm64: Provide "model name" in /proc/cpuinfo for PER_LINUX32 tasks · d0bc1f47
      Catalin Marinas authored
      commit e47b020a upstream.
      
      This patch brings the PER_LINUX32 /proc/cpuinfo format more in line with
      the 32-bit ARM one by providing an additional line:
      
      model name      : ARMv8 Processor rev X (v8l)
      Acked-by: Will Deacon <will.deacon@arm.com>
      Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
      Signed-off-by: Will Deacon <will.deacon@arm.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • crypto: ccp - Fix AES XTS error for request sizes above 4096 · 774920ee
      Tom Lendacky authored
      commit ab6a11a7 upstream.
      
      The ccp-crypto module for AES XTS support has a bug that can allow requests
      greater than 4096 bytes in size to be passed to the CCP hardware. The CCP
      hardware does not support request sizes larger than 4096, resulting in
      incorrect output. The request should actually be handled by the fallback
      mechanism instantiated by the ccp-crypto module.
      
      Add a check to ensure the request size is less than or equal to the maximum
      supported size, and use the fallback mechanism if it is not.
      Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • crypto: public_key: select CRYPTO_AKCIPHER · b440f3ae
      Arnd Bergmann authored
      commit bad6a185 upstream.
      
      In some rare randconfig builds, we can end up with
      ASYMMETRIC_PUBLIC_KEY_SUBTYPE enabled but CRYPTO_AKCIPHER disabled,
      which fails to link because of the reference to crypto_alloc_akcipher:
      
      crypto/built-in.o: In function `public_key_verify_signature':
      :(.text+0x110e4): undefined reference to `crypto_alloc_akcipher'
      
      This adds a Kconfig 'select' statement to ensure the dependency
      is always there.
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • irqchip/gic-v3: Fix ICC_SGI1R_EL1.INTID decoding mask · f32ef5c8
      Marc Zyngier authored
      commit dd5f1b04 upstream.
      
      The INTID mask is wrong, and is made a signed value, which has
      interesting effects in the KVM emulation. Let's sanitize it.
      Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • s390/bpf: reduce maximum program size to 64 KB · 9be2fa20
      Michael Holzheu authored
      commit 0fa96355 upstream.
      
      The s390 BPF compiler currently uses relative branch instructions
      that only support jumps up to 64 KB. Examples are "j", "jnz", "cgrj",
      etc.  Currently the maximum size of s390 BPF programs is set
      to 0x7ffff.  If branches over 64 KB are generated, the kernel can
      crash due to incorrect code.

      So fix this and reduce the maximum size to 64 KB. Programs larger than
      that will be interpreted.
      
      Fixes: ce2b6ad9 ("s390/bpf: increase BPF_SIZE_MAX")
      Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • s390/bpf: fix recache skb->data/hlen for skb_vlan_push/pop · ebf52918
      Michael Holzheu authored
      commit 6edf0aa4 upstream.
      
      When skb_vlan_push/pop is used, the prologue stores the SKB pointer
      on the stack, and it is restored after the BPF_JMP_CALL to
      skb_vlan_push/pop.

      Unfortunately, there are currently two bugs in the code:

       1) The wrong stack slot (offset 170 instead of 176) is used
       2) The wrong register (W1 instead of B1) is saved

      So fix this and use the correct stack slot and register.
      
      Fixes: 9db7f2b8 ("s390/bpf: recache skb->data/hlen for skb_vlan_push/pop")
      Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • gpio: bcm-kona: fix bcm_kona_gpio_reset() warnings · e1c35534
      Ben Dooks authored
      commit b66b2a0a upstream.
      
      bcm_kona_gpio_reset() calls bcm_kona_gpio_write_lock_regs()
      with what looks like the wrong parameter. The write_lock_regs
      function takes a pointer to the registers, not the bcm_kona_gpio
      structure.

      Fix the warning, and probably a bug, by changing the call to
      pass reg_base instead of kona_gpio, which fixes the following warning:
      
      drivers/gpio/gpio-bcm-kona.c:550:47: warning: incorrect type in argument 1
        (different address spaces)
        expected void [noderef] <asn:2>*reg_base
        got struct bcm_kona_gpio *kona_gpio
        warning: incorrect type in argument 1 (different address spaces)
        expected void [noderef] <asn:2>*reg_base
        got struct bcm_kona_gpio *kona_gpio
      Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk>
      Acked-by: Ray Jui <ray.jui@broadcom.com>
      Reviewed-by: Markus Mayer <mmayer@broadcom.com>
      Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • ARM: fix PTRACE_SETVFPREGS on SMP systems · 9edd6fd1
      Russell King authored
      commit e2dfb4b8 upstream.
      
      PTRACE_SETVFPREGS fails to properly mark the VFP register set to be
      reloaded, because it undoes one of the effects of vfp_flush_hwstate().
      
      Specifically vfp_flush_hwstate() sets thread->vfpstate.hard.cpu to
      an invalid CPU number, but vfp_set() overwrites this with the original
      CPU number, thereby rendering the hardware state as apparently "valid",
      even though the software state is more recent.
      
      Fix this by reverting the previous change.
      
      Fixes: 8130b9d7 ("ARM: 7308/1: vfp: flush thread hwstate before copying ptrace registers")
      Acked-by: Will Deacon <will.deacon@arm.com>
      Tested-by: Simon Marchi <simon.marchi@ericsson.com>
      Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • ALSA: hda/realtek: Add T560 docking unit fixup · da7f1c92
      Torsten Hilbrich authored
      commit dab38e43 upstream.
      
      Tested with Lenovo Ultradock. Fixes the non-working headphone jack on
      the docking unit.
      Signed-off-by: Torsten Hilbrich <torsten.hilbrich@secunet.com>
      Tested-by: Torsten Hilbrich <torsten.hilbrich@secunet.com>
      Signed-off-by: Takashi Iwai <tiwai@suse.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • ALSA: hda/realtek - Add support for new codecs ALC700/ALC701/ALC703 · 81999107
      Kailang Yang authored
      commit 6fbae35a upstream.
      
      Support new codecs for ALC700/ALC701/ALC703.
      Signed-off-by: Kailang Yang <kailang@realtek.com>
      Signed-off-by: Takashi Iwai <tiwai@suse.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • ALSA: hda/realtek - ALC256 speaker noise issue · c3fd646b
      Kailang Yang authored
      commit e69e7e03 upstream.
      
      Some registers differ between ALC255 and ALC256, so ALC256 cannot use
      some of the ALC255 register settings. The speaker noise is caused by
      the LDO output voltage control; this patch sets the correct LDO
      register value.
      Signed-off-by: Kailang Yang <kailang@realtek.com>
      Signed-off-by: Takashi Iwai <tiwai@suse.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • ALSA: hda - Fix headset mic detection problem for Dell machine · 1bf80a48
      AceLan Kao authored
      commit f90d83b3 upstream.
      
      Add the pin configuration value of this machine into the pin_quirk
      table to make DELL1_MIC_NO_PRESENCE apply to this machine.
      Signed-off-by: AceLan Kao <acelan.kao@canonical.com>
      Signed-off-by: Takashi Iwai <tiwai@suse.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • ALSA: hda - Add PCI ID for Kabylake · 1f4b7507
      Vinod Koul authored
      commit 35639a0e upstream.
      
      Kabylake shows up as PCI ID 0xa171, and Kabylake-LP as 0x9d71.
      Since these are similar to Skylake, add them to the SKL_PLUS macro.
      Signed-off-by: Vinod Koul <vinod.koul@intel.com>
      Signed-off-by: Takashi Iwai <tiwai@suse.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • KVM: irqfd: fix NULL pointer dereference in kvm_irq_map_gsi · 2cb77b0a
      Paolo Bonzini authored
      commit c622a3c2 upstream.
      
      Found by syzkaller:
      
          BUG: unable to handle kernel NULL pointer dereference at 0000000000000120
          IP: [<ffffffffa0797202>] kvm_irq_map_gsi+0x12/0x90 [kvm]
          PGD 6f80b067 PUD b6535067 PMD 0
          Oops: 0000 [#1] SMP
          CPU: 3 PID: 4988 Comm: a.out Not tainted 4.4.9-300.fc23.x86_64 #1
          [...]
          Call Trace:
           [<ffffffffa0795f62>] irqfd_update+0x32/0xc0 [kvm]
           [<ffffffffa0796c7c>] kvm_irqfd+0x3dc/0x5b0 [kvm]
           [<ffffffffa07943f4>] kvm_vm_ioctl+0x164/0x6f0 [kvm]
           [<ffffffff81241648>] do_vfs_ioctl+0x298/0x480
           [<ffffffff812418a9>] SyS_ioctl+0x79/0x90
           [<ffffffff817a1062>] tracesys_phase2+0x84/0x89
          Code: b5 71 a7 e0 5b 41 5c 41 5d 5d f3 c3 66 66 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48 8b 8f 10 2e 00 00 31 c0 48 89 e5 <39> 91 20 01 00 00 76 6a 48 63 d2 48 8b 94 d1 28 01 00 00 48 85
          RIP  [<ffffffffa0797202>] kvm_irq_map_gsi+0x12/0x90 [kvm]
           RSP <ffff8800926cbca8>
          CR2: 0000000000000120
      
      Testcase:
      
          #include <unistd.h>
          #include <sys/syscall.h>
          #include <string.h>
          #include <stdint.h>
          #include <linux/kvm.h>
          #include <fcntl.h>
          #include <sys/ioctl.h>
      
          long r[26];
      
          int main()
          {
              memset(r, -1, sizeof(r));
              r[2] = open("/dev/kvm", 0);
              r[3] = ioctl(r[2], KVM_CREATE_VM, 0);
      
              struct kvm_irqfd ifd;
              ifd.fd = syscall(SYS_eventfd2, 5, 0);
              ifd.gsi = 3;
              ifd.flags = 2;
              ifd.resamplefd = ifd.fd;
              r[25] = ioctl(r[3], KVM_IRQFD, &ifd);
              return 0;
          }
      Reported-by: Dmitry Vyukov <dvyukov@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • KVM: x86: fix OOPS after invalid KVM_SET_DEBUGREGS · ded4fc62
      Paolo Bonzini authored
      commit d14bdb55 upstream.
      
      MOV to DR6 or DR7 causes a #GP if an attempt is made to write a 1 to
      any of bits 63:32.  However, this is not detected at KVM_SET_DEBUGREGS
      time, and the next KVM_RUN oopses:
      
         general protection fault: 0000 [#1] SMP
         CPU: 2 PID: 14987 Comm: a.out Not tainted 4.4.9-300.fc23.x86_64 #1
         Hardware name: LENOVO 2325F51/2325F51, BIOS G2ET32WW (1.12 ) 05/30/2012
         [...]
         Call Trace:
          [<ffffffffa072c93d>] kvm_arch_vcpu_ioctl_run+0x141d/0x14e0 [kvm]
          [<ffffffffa071405d>] kvm_vcpu_ioctl+0x33d/0x620 [kvm]
          [<ffffffff81241648>] do_vfs_ioctl+0x298/0x480
          [<ffffffff812418a9>] SyS_ioctl+0x79/0x90
          [<ffffffff817a0f2e>] entry_SYSCALL_64_fastpath+0x12/0x71
         Code: 55 83 ff 07 48 89 e5 77 27 89 ff ff 24 fd 90 87 80 81 0f 23 fe 5d c3 0f 23 c6 5d c3 0f 23 ce 5d c3 0f 23 d6 5d c3 0f 23 de 5d c3 <0f> 23 f6 5d c3 0f 0b 66 66 66 66 66 2e 0f 1f 84 00 00 00 00 00
         RIP  [<ffffffff810639eb>] native_set_debugreg+0x2b/0x40
          RSP <ffff88005836bd50>
      
      Testcase (beautified/reduced from syzkaller output):
      
          #include <unistd.h>
          #include <sys/syscall.h>
          #include <string.h>
          #include <stdint.h>
          #include <linux/kvm.h>
          #include <fcntl.h>
          #include <sys/ioctl.h>
      
          long r[8];
      
          int main()
          {
              struct kvm_debugregs dr = { 0 };
      
              r[2] = open("/dev/kvm", O_RDONLY);
              r[3] = ioctl(r[2], KVM_CREATE_VM, 0);
              r[4] = ioctl(r[3], KVM_CREATE_VCPU, 7);
      
              memcpy(&dr,
                     "\x5d\x6a\x6b\xe8\x57\x3b\x4b\x7e\xcf\x0d\xa1\x72"
                     "\xa3\x4a\x29\x0c\xfc\x6d\x44\x00\xa7\x52\xc7\xd8"
                     "\x00\xdb\x89\x9d\x78\xb5\x54\x6b\x6b\x13\x1c\xe9"
                     "\x5e\xd3\x0e\x40\x6f\xb4\x66\xf7\x5b\xe3\x36\xcb",
                     48);
              r[7] = ioctl(r[4], KVM_SET_DEBUGREGS, &dr);
              r[6] = ioctl(r[4], KVM_RUN, 0);
          }
      Reported-by: Dmitry Vyukov <dvyukov@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • vxlan, gre, geneve: Set a large MTU on ovs-created tunnel devices · ce9c0dba
      David Wragg authored
      [ Upstream commit 7e059158 ]
      
      Prior to 4.3, openvswitch tunnel vports (vxlan, gre and geneve) could
      transmit vxlan packets of any size, constrained only by the ability to
      send out the resulting packets.  4.3 introduced netdevs corresponding
      to tunnel vports.  These netdevs have an MTU, which limits the size of
      a packet that can be successfully encapsulated.  The default MTU
      values are low (1500 or less), which is awkwardly small in the context
      of physical networks supporting jumbo frames, and leads to a
      conspicuous change in behaviour for userspace.
      
      Instead, set the MTU on openvswitch-created netdevs to be the relevant
      maximum (i.e. the maximum IP packet size minus any relevant overhead),
      effectively restoring the behaviour prior to 4.3.
      Signed-off-by: David Wragg <david@weave.works>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • geneve: Relax MTU constraints · 51d7c394
      David Wragg authored
      [ Upstream commit 55e5bfb5 ]
      
      Allow the MTU of geneve devices to be set to large values, in order to
      exploit underlying networks with larger frame sizes.
      
      GENEVE does not have a fixed encapsulation overhead (an openvswitch
      rule can add variable length options), so there is no relevant maximum
      MTU to enforce.  A maximum of IP_MAX_MTU is used instead.
      Encapsulated packets that are too big for the underlying network will
      get dropped on the floor.
      Signed-off-by: David Wragg <david@weave.works>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • vxlan: Relax MTU constraints · 3dc44305
      David Wragg authored
      [ Upstream commit 72564b59 ]
      
      Allow the MTU of vxlan devices without an underlying device to be set
      to larger values (up to a maximum based on IP packet limits and vxlan
      overhead).
      
      Previously, their MTUs could not be set to higher than the
      conventional ethernet value of 1500.  This is a very arbitrary value
      in the context of vxlan, and prevented vxlan devices from being able
      to take advantage of jumbo frames etc.
      
      The default MTU remains 1500, for compatibility.
      Signed-off-by: David Wragg <david@weave.works>
      Acked-by: Roopa Prabhu <roopa@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • ipv6: Skip XFRM lookup if dst_entry in socket cache is valid · 4d82f395
      Jakub Sitnicki authored
      [ Upstream commit 00bc0ef5 ]
      
      At present we perform an xfrm_lookup() for each UDPv6 message we
      send. The lookup involves querying the flow cache (flow_cache_lookup)
      and, in case of a cache miss, creating an XFRM bundle.
      
      If we miss the flow cache, we can end up creating a new bundle and
      deriving the path MTU (xfrm_init_pmtu) from an already transformed
      dst_entry, which we pass from the socket cache (sk->sk_dst_cache) down
      to xfrm_lookup(). This can happen only if we're caching the dst_entry
      in the socket, that is, when we're using a connected UDP socket.
      
      To put it another way, the path MTU shrinks each time we miss the flow
      cache, which later on leads to incorrectly fragmented payload. It can
      be observed with ESPv6 in transport mode:
      
        1) Set up a transformation and lower the MTU to trigger fragmentation
          # ip xfrm policy add dir out src ::1 dst ::1 \
            tmpl src ::1 dst ::1 proto esp spi 1
          # ip xfrm state add src ::1 dst ::1 \
            proto esp spi 1 enc 'aes' 0x0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b
          # ip link set dev lo mtu 1500
      
        2) Monitor the packet flow and set up a UDP sink
          # tcpdump -ni lo -ttt &
          # socat udp6-listen:12345,fork /dev/null &
      
        3) Send a datagram that needs fragmentation with a connected socket
          # perl -e 'print "@" x 1470' | socat - udp6:[::1]:12345
          2016/06/07 18:52:52 socat[724] E read(3, 0x555bb3d5ba00, 8192): Protocol error
          00:00:00.000000 IP6 ::1 > ::1: frag (0|1448) ESP(spi=0x00000001,seq=0x2), length 1448
          00:00:00.000014 IP6 ::1 > ::1: frag (1448|32)
          00:00:00.000050 IP6 ::1 > ::1: ESP(spi=0x00000001,seq=0x3), length 1272
          (^ ICMPv6 Parameter Problem)
          00:00:00.000022 IP6 ::1 > ::1: ESP(spi=0x00000001,seq=0x5), length 136
      
        4) Compare it to a non-connected socket
          # perl -e 'print "@" x 1500' | socat - udp6-sendto:[::1]:12345
          00:00:40.535488 IP6 ::1 > ::1: frag (0|1448) ESP(spi=0x00000001,seq=0x6), length 1448
          00:00:00.000010 IP6 ::1 > ::1: frag (1448|64)
      
      What happens in step (3) is:
      
        1) when connecting the socket in __ip6_datagram_connect(), we
           perform an XFRM lookup, miss the flow cache, create an XFRM
           bundle, and cache the destination,
      
        2) afterwards, when sending the datagram, we perform an XFRM lookup,
           again, miss the flow cache (due to a mismatch of flowi6_iif and
           flowi6_oif, which is an issue of its own), and recreate an XFRM
           bundle based on the cached (and already transformed) destination.
      
      To prevent the recreation of an XFRM bundle, avoid an XFRM lookup
      altogether whenever we already have a destination entry cached in the
      socket. This prevents the path MTU shrinkage and brings us on par with
      UDPv4.
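      The shrinkage can be reproduced with a toy model. This is a hypothetical
      sketch, not kernel code: struct dst_sketch, xfrm_bundle_sketch() and the
      48-byte ESP_OVERHEAD_SKETCH are assumptions, and each flow-cache miss is
      modelled as building a bundle on top of whatever entry the socket has
      cached and storing the result back.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical destination entry: mtu is the path MTU derived for it,
 * transformed marks an entry that is already an XFRM bundle. */
struct dst_sketch {
    int mtu;
    bool transformed;
};

#define ESP_OVERHEAD_SKETCH 48 /* assumed per-transform overhead */

/* Models a flow-cache miss: build a bundle on top of 'path' and derive
 * the bundle's MTU from it. Fed an already transformed entry, it
 * subtracts the transform overhead a second time. */
static struct dst_sketch xfrm_bundle_sketch(struct dst_sketch path)
{
    struct dst_sketch bundle = { path.mtu - ESP_OVERHEAD_SKETCH, true };

    return bundle;
}

/* Models one send on a connected socket and returns the MTU used.
 * fixed == false: always redo the lookup on the cached entry.
 * fixed == true:  a valid cached entry is used as-is. */
static int send_mtu_sketch(struct dst_sketch *sk_cache, bool fixed)
{
    if (fixed && sk_cache->transformed)
        return sk_cache->mtu;                  /* skip the XFRM lookup */
    *sk_cache = xfrm_bundle_sketch(*sk_cache); /* re-bundle and re-cache */
    return sk_cache->mtu;
}
```

      Starting from a 1500-byte route, the unfixed path loses another 48 bytes
      on every send, while the fixed path keeps the MTU computed at connect time.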
      
      The fix also benefits connected PINGv6 sockets, another user of
      ip6_sk_dst_lookup_flow(), which also suffer from messages being
      transformed twice.
      
      Joint work with Hannes Frederic Sowa.
      Reported-by: Jan Tluka <jtluka@redhat.com>
      Signed-off-by: Jakub Sitnicki <jkbs@redhat.com>
      Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Guillaume Nault
      l2tp: fix configuration passed to setup_udp_tunnel_sock() · 05cbd46b
      Guillaume Nault authored
      [ Upstream commit a5c5e2da ]
      
      Unused fields of udp_cfg must be all zeros. Otherwise
      setup_udp_tunnel_sock() fills ->gro_receive and ->gro_complete
      callbacks with garbage, eventually resulting in a panic when used by
      udp_gro_receive().
      
      [   72.694123] BUG: unable to handle kernel paging request at ffff880033f87d78
      [   72.695518] IP: [<ffff880033f87d78>] 0xffff880033f87d78
      [   72.696530] PGD 26e2067 PUD 26e3067 PMD 342ed063 PTE 8000000033f87163
      [   72.696530] Oops: 0011 [#1] SMP KASAN
      [   72.696530] Modules linked in: l2tp_ppp l2tp_netlink l2tp_core ip6_udp_tunnel udp_tunnel pptp gre pppox ppp_generic slhc crc32c_intel ghash_clmulni_intel jitterentropy_rng sha256_generic hmac drbg ansi_cprng aesni_intel evdev aes_x86_64 ablk_helper cryptd lrw gf128mul glue_helper serio_raw acpi_cpufreq button processor ext4 crc16 jbd2 mbcache virtio_blk virtio_net virtio_pci virtio_ring virtio
      [   72.696530] CPU: 3 PID: 0 Comm: swapper/3 Not tainted 4.7.0-rc1 #1
      [   72.696530] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Debian-1.8.2-1 04/01/2014
      [   72.696530] task: ffff880035b59700 ti: ffff880035b70000 task.ti: ffff880035b70000
      [   72.696530] RIP: 0010:[<ffff880033f87d78>]  [<ffff880033f87d78>] 0xffff880033f87d78
      [   72.696530] RSP: 0018:ffff880035f87bc0  EFLAGS: 00010246
      [   72.696530] RAX: ffffed000698f996 RBX: ffff88003326b840 RCX: ffffffff814cc823
      [   72.696530] RDX: ffff88003326b840 RSI: ffff880033e48038 RDI: ffff880034c7c780
      [   72.696530] RBP: ffff880035f87c18 R08: 000000000000a506 R09: 0000000000000000
      [   72.696530] R10: ffff880035f87b38 R11: ffff880034b9344d R12: 00000000ebfea715
      [   72.696530] R13: 0000000000000000 R14: ffff880034c7c780 R15: 0000000000000000
      [   72.696530] FS:  0000000000000000(0000) GS:ffff880035f80000(0000) knlGS:0000000000000000
      [   72.696530] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [   72.696530] CR2: ffff880033f87d78 CR3: 0000000033c98000 CR4: 00000000000406a0
      [   72.696530] Stack:
      [   72.696530]  ffffffff814cc834 ffff880034b93468 0000001481416818 ffff88003326b874
      [   72.696530]  ffff880034c7ccb0 ffff880033e48038 ffff88003326b840 ffff880034b93462
      [   72.696530]  ffff88003326b88a ffff88003326b88c ffff880034b93468 ffff880035f87c70
      [   72.696530] Call Trace:
      [   72.696530]  <IRQ>
      [   72.696530]  [<ffffffff814cc834>] ? udp_gro_receive+0x1c6/0x1f9
      [   72.696530]  [<ffffffff814ccb1c>] udp4_gro_receive+0x2b5/0x310
      [   72.696530]  [<ffffffff814d989b>] inet_gro_receive+0x4a3/0x4cd
      [   72.696530]  [<ffffffff81431b32>] dev_gro_receive+0x584/0x7a3
      [   72.696530]  [<ffffffff810adf7a>] ? __lock_is_held+0x29/0x64
      [   72.696530]  [<ffffffff814321f7>] napi_gro_receive+0x124/0x21d
      [   72.696530]  [<ffffffffa000b145>] virtnet_receive+0x8df/0x8f6 [virtio_net]
      [   72.696530]  [<ffffffffa000b27e>] virtnet_poll+0x1d/0x8d [virtio_net]
      [   72.696530]  [<ffffffff81431350>] net_rx_action+0x15b/0x3b9
      [   72.696530]  [<ffffffff815893d6>] __do_softirq+0x216/0x546
      [   72.696530]  [<ffffffff81062392>] irq_exit+0x49/0xb6
      [   72.696530]  [<ffffffff81588e9a>] do_IRQ+0xe2/0xfa
      [   72.696530]  [<ffffffff81587a49>] common_interrupt+0x89/0x89
      [   72.696530]  <EOI>
      [   72.696530]  [<ffffffff810b05df>] ? trace_hardirqs_on_caller+0x229/0x270
      [   72.696530]  [<ffffffff8102b3c7>] ? default_idle+0x1c/0x2d
      [   72.696530]  [<ffffffff8102b3c5>] ? default_idle+0x1a/0x2d
      [   72.696530]  [<ffffffff8102bb8c>] arch_cpu_idle+0xa/0xc
      [   72.696530]  [<ffffffff810a6c39>] default_idle_call+0x1a/0x1c
      [   72.696530]  [<ffffffff810a6d96>] cpu_startup_entry+0x15b/0x20f
      [   72.696530]  [<ffffffff81039a81>] start_secondary+0x12c/0x133
      [   72.696530] Code: ff ff ff ff ff ff ff ff ff ff 7f ff ff ff ff ff ff ff 7f 00 7e f8 33 00 88 ff ff 6d 61 58 81 ff ff ff ff 5e de 0a 81 ff ff ff ff <00> 5c e2 34 00 88 ff ff 00 00 00 00 00 00 00 00 00 00 00 00 00
      [   72.696530] RIP  [<ffff880033f87d78>] 0xffff880033f87d78
      [   72.696530]  RSP <ffff880035f87bc0>
      [   72.696530] CR2: ffff880033f87d78
      [   72.696530] ---[ end trace ad7758b9a1dccf99 ]---
      [   72.696530] Kernel panic - not syncing: Fatal exception in interrupt
      [   72.696530] Kernel Offset: disabled
      [   72.696530] ---[ end Kernel panic - not syncing: Fatal exception in interrupt
      
      v2: use empty initialiser instead of "{ NULL }" to avoid relying on
          first field's type.
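      The requirement that unused fields be zero can be shown with a small
      userspace analogue. The types and functions below are hypothetical
      stand-ins, not the kernel's struct udp_tunnel_sock_cfg or
      setup_udp_tunnel_sock(). A C designated initializer zero-fills every
      member it does not name, which gives the same guarantee as the empty
      initializer from the v2 note (`{ }` is a GNU extension before C23).

```c
#include <assert.h>
#include <stddef.h>

typedef int (*encap_rcv_fn)(int len);
typedef int (*gro_receive_fn)(int len);

/* Hypothetical tunnel config with optional callbacks. */
struct tunnel_cfg_sketch {
    void *sk_user_data;
    int encap_type;
    encap_rcv_fn encap_rcv;
    gro_receive_fn gro_receive; /* optional: must be NULL if unused */
};

struct tunnel_sock_sketch {
    encap_rcv_fn encap_rcv;
    gro_receive_fn gro_receive;
};

/* Copies callbacks verbatim, like a setup helper that trusts its input:
 * stack garbage in an unused field would become a live function pointer. */
static void setup_tunnel_sock_sketch(struct tunnel_sock_sketch *s,
                                     const struct tunnel_cfg_sketch *cfg)
{
    s->encap_rcv = cfg->encap_rcv;
    s->gro_receive = cfg->gro_receive;
}

static int l2tp_rcv_sketch(int len) { return len; }

/* Safe construction: members not named here (gro_receive) are zeroed. */
static struct tunnel_cfg_sketch make_cfg_sketch(void *tunnel)
{
    struct tunnel_cfg_sketch cfg = {
        .sk_user_data = tunnel,
        .encap_type = 1,
        .encap_rcv = l2tp_rcv_sketch,
    };

    return cfg;
}
```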
      
      Fixes: 38fd2af2 ("udp: Add socket based GRO and config")
      Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Toshiaki Makita
      bridge: Don't insert unnecessary local fdb entry on changing mac address · 38f56354
      Toshiaki Makita authored
      [ Upstream commit 0b148def ]
      
      The missing br_vlan_should_use() test caused creation of an unneeded
      local fdb entry on changing mac address of a bridge device when there is
      a vlan which is configured on a bridge port but not on the bridge
      device.
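      A sketch of the missing test under a simplified model in which each VLAN
      entry records whether it is configured on a port, on the bridge device,
      or both. The flag names and helpers are hypothetical; the real
      br_vlan_should_use() works on struct net_bridge_vlan flags that are not
      reproduced here.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flags for a bridge VLAN entry. */
#define VLAN_ON_PORT   0x1 /* configured on some bridge port */
#define VLAN_ON_BRIDGE 0x2 /* also configured on the bridge device itself */

struct vlan_sketch {
    unsigned short vid;
    unsigned int flags;
};

/* Only VLANs present on the bridge device need a local fdb entry
 * for the bridge's own MAC address. */
static bool vlan_should_use_sketch(const struct vlan_sketch *v)
{
    return (v->flags & VLAN_ON_BRIDGE) != 0;
}

/* Counts the local fdb entries a MAC change would insert.
 * check == false models the missing test: every VLAN gets one. */
static int local_fdb_inserts_sketch(const struct vlan_sketch *vlans, int n, bool check)
{
    int inserted = 0;

    for (int i = 0; i < n; i++) {
        if (check && !vlan_should_use_sketch(&vlans[i]))
            continue;   /* port-only VLAN: no local entry needed */
        inserted++;     /* stand-in for inserting a local fdb entry */
    }
    return inserted;
}
```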
      
      Fixes: 2594e906 ("bridge: vlan: add per-vlan struct and move to rhashtables")
      Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
      Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Yuchung Cheng
      tcp: record TLP and ER timer stats in v6 stats · f946ceab
      Yuchung Cheng authored
      [ Upstream commit ce3cf4ec ]
      
      The v6 tcp stats scan does not report TLP and ER timer information
      correctly, unlike the v4 version. This patch fixes that.
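      A sketch of the mapping involved, assuming the v4 code reports TLP (loss
      probe) and ER (early retransmit) timers in the same "timer_active" class
      as the plain retransmit timer. The numeric timer codes and the 1/4/2/0
      column values are our assumptions about the kernel's ICSK_TIME_*
      constants and the /proc/net/tcp format; timer_active_sketch() is
      hypothetical.

```c
#include <assert.h>

/* Assumed pending-timer codes, modelled on the kernel's ICSK_TIME_* values. */
enum {
    TIME_RETRANS_SKETCH       = 1,
    TIME_DACK_SKETCH          = 2,
    TIME_PROBE0_SKETCH        = 3,
    TIME_EARLY_RETRANS_SKETCH = 4,
    TIME_LOSS_PROBE_SKETCH    = 5,
};

/* Hypothetical mapping to the timer_active column:
 * 1 = retransmit-class timer, 4 = zero-window probe, 2 = keepalive, 0 = none. */
static int timer_active_sketch(int pending, int keepalive_pending)
{
    if (pending == TIME_RETRANS_SKETCH ||
        pending == TIME_EARLY_RETRANS_SKETCH ||
        pending == TIME_LOSS_PROBE_SKETCH)
        return 1;   /* TLP and ER are reported like retransmits */
    if (pending == TIME_PROBE0_SKETCH)
        return 4;
    if (keepalive_pending)
        return 2;
    return 0;
}
```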
      
      Fixes: 6ba8a3b1 ("tcp: Tail loss probe (TLP)")
      Fixes: eed530b6 ("tcp: early retransmit")
      Signed-off-by: Yuchung Cheng <ycheng@google.com>
      Signed-off-by: Neal Cardwell <ncardwell@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Chen Haiquan
      vxlan: Accept user specified MTU value when create new vxlan link · 721976e9
      Chen Haiquan authored
      [ Upstream commit ce577668 ]
      
      When creating a new vxlan link, for example:
        ip link add vtap mtu 1440 type vxlan vni 1 dev eth0
      
      The "mtu" argument has no effect, because it is not copied into
      conf->mtu; the default value is used in vxlan_dev_configure() instead.
      
      This problem was introduced by commit 0dfbdf41 (vxlan: Factor out device
      configuration).
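      The missing step can be sketched as follows, assuming a config field
      where 0 means "use the default". The structures and functions are
      hypothetical stand-ins for the netlink attributes and the vxlan
      configuration path, not the driver's code.

```c
#include <assert.h>
#include <stdbool.h>

#define DEFAULT_MTU_SKETCH 1500

/* Hypothetical parsed link attributes and device config. */
struct link_attrs_sketch {
    bool has_mtu; /* "mtu N" was given on the command line */
    int mtu;
};

struct vxlan_conf_sketch {
    int mtu;      /* 0 means "use the default" */
};

/* The step the bug skipped: copy the user's MTU into the config. */
static void vxlan_parse_sketch(const struct link_attrs_sketch *tb,
                               struct vxlan_conf_sketch *conf)
{
    conf->mtu = 0;
    if (tb->has_mtu)
        conf->mtu = tb->mtu;
}

/* Device configuration falls back to the default only when conf->mtu is 0. */
static int vxlan_dev_configure_sketch(const struct vxlan_conf_sketch *conf)
{
    return conf->mtu ? conf->mtu : DEFAULT_MTU_SKETCH;
}
```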
      
      Fixes: 0dfbdf41 (vxlan: Factor out device configuration)
      Signed-off-by: Chen Haiquan <oc@yunify.com>
      Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Ivan Vecera
      team: don't call netdev_change_features under team->lock · 13a055d6
      Ivan Vecera authored
      [ Upstream commit f6988cb6 ]
      
      The team_device_event() notifier calls team_compute_features() to fix
      vlan_features under team->lock to protect team->port_list. The problem is
      that subsequent __team_compute_features() calls netdev_change_features()
      to propagate vlan_features to upper vlan devices while team->lock is still
      taken. This can lead to a deadlock when NETIF_F_LRO is modified on lower
      devices or on the team device itself.
      
      Example:
      team0 is an active-backup team of the eth0 and eth1 NICs. Both eth0 and
      eth1 are LRO capable and have LRO enabled, so LRO is also enabled on team0.
      
      The command 'ethtool -K team0 lro off' now hangs due to this deadlock:
      
      dev_ethtool()
      -> ethtool_set_features()
       -> __netdev_update_features(team)
        -> netdev_sync_lower_features()
         -> netdev_update_features(lower_1)
          -> __netdev_update_features(lower_1)
          -> netdev_features_change(lower_1)
           -> call_netdevice_notifiers(...)
            -> team_device_event(lower_1)
             -> team_compute_features(team) [TAKES team->lock]
              -> netdev_change_features(team)
               -> __netdev_update_features(team)
                -> netdev_sync_lower_features()
                 -> netdev_update_features(lower_2)
                  -> __netdev_update_features(lower_2)
                  -> netdev_features_change(lower_2)
                   -> call_netdevice_notifiers(...)
                    -> team_device_event(lower_2)
                     -> team_compute_features(team) [DEADLOCK]
      
      The bug has been present in team from the beginning, but it only surfaced
      after commit fd867d51 (net/core: generic support for disabling netdev
      features down stack), which added synchronization of features with lower
      devices.
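      The shape of the fix can be sketched in userspace: compute the features
      under the lock, and call the change notification only after unlocking.
      Everything here is hypothetical (names end in _sketch). An error-checking
      pthread mutex stands in for team->lock so that the self-deadlock shows up
      as EDEADLK instead of a hang, and the recursion is cut off after one
      notifier round trip.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for team->lock. An error-checking mutex returns
 * EDEADLK on a recursive lock instead of hanging, so the bug is visible. */
static pthread_mutex_t team_lock_sketch;
static int notifier_depth;
static bool deadlock_seen;

static void team_compute_features_sketch(bool fixed);

/* Stand-in for netdev_change_features(): updating a lower device fires
 * its notifier, which calls back into the team driver (once, here). */
static void netdev_change_features_sketch(bool fixed)
{
    if (notifier_depth++ > 0)
        return;
    team_compute_features_sketch(fixed);
}

static void team_compute_features_sketch(bool fixed)
{
    if (pthread_mutex_lock(&team_lock_sketch) == EDEADLK) {
        deadlock_seen = true;   /* a normal mutex would hang here */
        return;
    }
    /* ... recompute vlan_features from the port list under the lock ... */
    if (!fixed)
        netdev_change_features_sketch(fixed); /* bug: lock still held */
    pthread_mutex_unlock(&team_lock_sketch);
    if (fixed)
        netdev_change_features_sketch(fixed); /* fix: notify after unlock */
}

/* Runs one feature update; returns true if a self-deadlock was detected. */
static bool run_feature_update_sketch(bool fixed)
{
    pthread_mutexattr_t attr;

    pthread_mutexattr_init(&attr);
    pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    pthread_mutex_init(&team_lock_sketch, &attr);
    pthread_mutexattr_destroy(&attr);

    notifier_depth = 0;
    deadlock_seen = false;
    team_compute_features_sketch(fixed);

    pthread_mutex_destroy(&team_lock_sketch);
    return deadlock_seen;
}
```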
      
      Fixes: fd867d51 (net/core: generic support for disabling netdev features down stack)
      Cc: Jiri Pirko <jiri@resnulli.us>
      Signed-off-by: Ivan Vecera <ivecera@redhat.com>
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Edward Cree
      sfc: on MC reset, clear PIO buffer linkage in TXQs · 450db517
      Edward Cree authored
      [ Upstream commit c0795bf6 ]
      
      Otherwise, if we fail to allocate new PIO buffers, our TXQs will try to
      use the old ones, which aren't there any more.
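      A minimal sketch of the ordering, assuming one buffer per TX queue and
      using heap allocations as stand-ins for PIO buffers. struct txq_sketch
      and mc_reset_sketch() are hypothetical, not the sfc driver's API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical TX queue; heap memory stands in for a PIO buffer. */
struct txq_sketch {
    void *piobuf; /* NULL: no PIO, use the normal DMA path */
};

/* Models an MC reset for n queues with one buffer each. The old buffers
 * are gone, so every queue's link is cleared before re-allocating; if
 * allocation fails, the queues stay unlinked instead of pointing at
 * freed buffers. */
static void mc_reset_sketch(struct txq_sketch *txqs, void **piobufs, int n,
                            bool alloc_ok)
{
    for (int i = 0; i < n; i++) {
        free(piobufs[i]);
        piobufs[i] = NULL;
        txqs[i].piobuf = NULL;       /* the fix: drop the stale linkage */
    }
    if (!alloc_ok)
        return;                      /* simulated allocation failure */
    for (int i = 0; i < n; i++) {
        piobufs[i] = malloc(64);
        txqs[i].piobuf = piobufs[i]; /* NULL if malloc failed: still safe */
    }
}
```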
      
      Fixes: 183233be ("sfc: Allocate and link PIO buffers; map them with write-combining")
      Signed-off-by: Edward Cree <ecree@solarflare.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Daniel Borkmann
      bpf, inode: disallow userns mounts · bfe951d5
      Daniel Borkmann authored
      [ Upstream commit 612bacad ]
      
      Follow-up to commit e27f4a94 ("bpf: Use mount_nodev not mount_ns
      to mount the bpf filesystem"), which removes the FS_USERNS_MOUNT flag.
      
      The original idea was to have a per mountns instance instead of a
      single global fs instance, but that didn't work out and we had to
      switch to the mount_nodev() model. The intent of that middle ground was
      to prevent users who don't play nice from creating endless instances
      of bpf fs, which are difficult to control and discover from an admin
      point of view, while still allowing us to be more flexible with regard
      to namespaces.
      
      Therefore, since we have now switched to mount_nodev() as a fix, which
      creates individual instances, we also need to remove the userns mount
      flag along with it to avoid running into the situation described above.
      I don't expect any breakage at this early point in time with removing
      the flag and we can revisit this later should the requirement for
      this come up with future users. This and commit e27f4a94 have
      been split to facilitate tracking should any of them run into the
      unlikely case of causing a regression.
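      The effect of dropping the flag can be sketched as the kind of check
      applied when a mount is requested from a non-initial user namespace. The
      flag value, error code and helper names here are hypothetical; only the
      idea that such mounts require the filesystem to opt in via
      FS_USERNS_MOUNT comes from the text above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for FS_USERNS_MOUNT and EPERM. */
#define FS_USERNS_MOUNT_SKETCH 0x8
#define EPERM_SKETCH           1

struct fs_type_sketch {
    const char *name;
    int fs_flags;
};

/* A caller outside the initial user namespace may only mount
 * filesystems that opt in with the userns-mount flag. */
static int may_mount_sketch(const struct fs_type_sketch *type, bool in_init_userns)
{
    if (!in_init_userns && !(type->fs_flags & FS_USERNS_MOUNT_SKETCH))
        return -EPERM_SKETCH;
    return 0;
}
```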
      
      Fixes: b2197755 ("bpf: add support for persistent maps/progs")
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Nicolas Dichtel
      uapi glibc compat: fix compilation when !__USE_MISC in glibc · f5f16bf6
      Nicolas Dichtel authored
      [ Upstream commit f0a3fdca ]
      
      These structures are defined only if __USE_MISC is set in glibc net/if.h
      headers, i.e. when _BSD_SOURCE or _SVID_SOURCE are defined.
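      A sketch of the preprocessor condition this implies, using hypothetical
      macro names: GLIBC_NET_IF_SEEN stands in for glibc's net/if.h include
      guard and GLIBC_USE_MISC for __USE_MISC, so the sketch does not depend
      on which feature macros a given build sets. The kernel header should
      skip its own definitions only when both are set.

```c
#include <assert.h>

/* Hypothetical: glibc's net/if.h was included ... */
#define GLIBC_NET_IF_SEEN 1
/* ... but GLIBC_USE_MISC is deliberately left undefined, so glibc did
 * not provide the structures (e.g. a strict standards-mode build). */

#if defined(GLIBC_NET_IF_SEEN) && defined(GLIBC_USE_MISC)
#define UAPI_DEF_IF_IFREQ_SKETCH 0 /* glibc already defined the structs */
#else
#define UAPI_DEF_IF_IFREQ_SKETCH 1 /* the kernel header must define them */
#endif

static int uapi_defines_ifreq_sketch(void)
{
    return UAPI_DEF_IF_IFREQ_SKETCH;
}
```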
      
      CC: Jan Engelhardt <jengelh@inai.de>
      CC: Josh Boyer <jwboyer@fedoraproject.org>
      CC: Stephen Hemminger <shemming@brocade.com>
      CC: Waldemar Brodkorb <mail@waldemar-brodkorb.de>
      CC: Gabriel Laskar <gabriel@lse.epita.fr>
      CC: Mikko Rapeli <mikko.rapeli@iki.fi>
      Fixes: 4a91cb61 ("uapi glibc compat: fix compile errors when glibc net/if.h included before linux/if.h")
      Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Hannes Frederic Sowa
      udp: prevent skbs lingering in tunnel socket queues · ab1f253d
      Hannes Frederic Sowa authored
      [ Upstream commit e5aed006 ]
      
      If we find a socket with encapsulation enabled, we should call the
      encap_recv function even if only a udp header without payload is
      available. The callbacks are responsible for correctly verifying and
      dropping the packets.
      
      Also, if header validation fails for geneve or vxlan, we shouldn't put
      the skb back into the socket queue; no one will pick it up there.
      Instead, we can simply discard it in the respective encap_recv
      functions.
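      A toy model of both changes, assuming an 8-byte tunnel header
      (VXLAN-sized) and a handler that returns 0 when it has consumed the
      packet and a positive value to hand it back to plain UDP. The structures
      and functions are hypothetical; fixed == false models the old behaviour,
      where header-only and invalid packets end up queued.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define UDP_HDR_LEN    8
#define TUNNEL_HDR_LEN 8 /* assumed tunnel header size */

struct pkt_sketch {
    size_t len;     /* bytes, including the UDP header */
    bool valid_hdr; /* whether the tunnel header would validate */
};

struct stats_sketch { int queued, dropped, decapped; };

/* Hypothetical encap handler: 0 = consumed, >0 = treat as plain UDP. */
static int encap_rcv_sketch(const struct pkt_sketch *p, struct stats_sketch *st,
                            bool fixed)
{
    if (p->len < UDP_HDR_LEN + TUNNEL_HDR_LEN || !p->valid_hdr) {
        if (!fixed)
            return 1;   /* old: hand back to UDP, which queues it */
        st->dropped++;  /* new: discard here */
        return 0;
    }
    st->decapped++;
    return 0;
}

/* Assumes the caller already checked that a full UDP header is present. */
static void udp_queue_rcv_sketch(const struct pkt_sketch *p, struct stats_sketch *st,
                                 bool fixed)
{
    /* old: the encap handler ran only when there was payload */
    bool call_encap = fixed || p->len > UDP_HDR_LEN;

    if (call_encap && encap_rcv_sketch(p, st, fixed) == 0)
        return;
    st->queued++;       /* lingers on a tunnel socket queue nobody reads */
}
```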
      Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Eric W. Biederman
      bpf: Use mount_nodev not mount_ns to mount the bpf filesystem · 5b7ea922
      Eric W. Biederman authored
      [ Upstream commit e27f4a94 ]
      
      While reviewing the filesystems that set FS_USERNS_MOUNT I spotted the
      bpf filesystem.  Looking at the code I saw a broken usage of mount_ns
      with current->nsproxy->mnt_ns. As the code does not acquire a
      reference to the mount namespace it can not possibly be correct to
      store the mount namespace on the superblock as it does.
      
      Replace mount_ns with mount_nodev so that each mount of the bpf
      filesystem returns a distinct instance, and the code is not buggy.
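      The difference between the two models can be sketched with hypothetical
      types: a mount_ns()-style lookup that hands back one shared instance per
      namespace pointer (without taking a reference on the namespace, which is
      the flaw described above), and a mount_nodev()-style call that always
      returns a fresh instance. None of these names are kernel APIs.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical superblock and mount-namespace types. */
struct sb_sketch { int id; };
struct mnt_ns_sketch { int id; };

static int next_sb_id = 1;

static struct sb_sketch *alloc_sb_sketch(void)
{
    struct sb_sketch *sb = malloc(sizeof(*sb));

    if (sb)
        sb->id = next_sb_id++;
    return sb;
}

/* mount_ns()-style, simplified: one shared instance per namespace,
 * looked up by the namespace's address. No reference is taken, so a
 * freed namespace's address could later match an unrelated one. */
#define MAX_NS_SKETCH 4
static struct {
    const struct mnt_ns_sketch *ns;
    struct sb_sketch *sb;
} ns_table_sketch[MAX_NS_SKETCH];

static struct sb_sketch *mount_ns_sketch(const struct mnt_ns_sketch *ns)
{
    for (int i = 0; i < MAX_NS_SKETCH; i++)
        if (ns_table_sketch[i].ns == ns)
            return ns_table_sketch[i].sb; /* reuse the shared instance */
    for (int i = 0; i < MAX_NS_SKETCH; i++)
        if (ns_table_sketch[i].ns == NULL) {
            ns_table_sketch[i].ns = ns;   /* stored without a reference */
            ns_table_sketch[i].sb = alloc_sb_sketch();
            return ns_table_sketch[i].sb;
        }
    return NULL;                          /* table full */
}

/* mount_nodev()-style: every mount gets its own distinct instance. */
static struct sb_sketch *mount_nodev_sketch(void)
{
    return alloc_sb_sketch();
}
```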
      
      In discussion with Hannes Frederic Sowa it was reported that the use
      of mount_ns was an attempt to have one bpf instance per mount
      namespace, in order to keep objects that pin resources from hiding.
      That intent simply does not work; the vfs is not built to allow that
      kind of behavior. This means that the bpf filesystem really is buggy,
      both semantically and in its implementation, as it does not, nor can
      it, implement the original intent.
      
      This change is userspace visible, but my experience with similar
      filesystems leads me to believe nothing will break with a model in which
      each mount of the bpf filesystem is distinct from all others.
      
      Fixes: b2197755 ("bpf: add support for persistent maps/progs")
      Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Acked-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
      Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Jason Wang
      tuntap: correctly wake up process during uninit · bccd56fa
      Jason Wang authored
      [ Upstream commit addf8fc4 ]
      
      We used to check dev->reg_state against NETREG_REGISTERED each time we
      were woken up. But after commit 9e641bdc ("net-tun:
      restructure tun_do_read for better sleep/wakeup efficiency"), it uses
      skb_recv_datagram(), which does not check dev->reg_state. As a result,
      if we delete a tun/tap device while a process is blocked reading from
      it, the device will wait forever for the reference count held by that
      process.
      
      Fix this by setting RCV_SHUTDOWN, which is checked in
      skb_recv_datagram(), before trying to wake up the process during uninit.
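      The pattern can be sketched with a condition variable: the reader sleeps
      until a packet arrives or the receive side is shut down, and teardown
      sets the flag before waking sleepers. This is a hypothetical userspace
      model; struct tun_sock_sketch and its helpers are not kernel APIs, and
      rcv_shutdown only models RCV_SHUTDOWN.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a tun socket's receive side. */
struct tun_sock_sketch {
    pthread_mutex_t lock;
    pthread_cond_t wait;
    int queued;        /* packets waiting to be read */
    bool rcv_shutdown; /* models RCV_SHUTDOWN in sk->sk_shutdown */
};

/* Blocking read: sleep until a packet arrives or the receive side is shut
 * down. Returns 1 for a packet, 0 once shut down (the reader then lets go). */
static int tun_read_sketch(struct tun_sock_sketch *s)
{
    int ret = 0;

    pthread_mutex_lock(&s->lock);
    while (s->queued == 0 && !s->rcv_shutdown)
        pthread_cond_wait(&s->wait, &s->lock);
    if (s->queued > 0) {
        s->queued--;
        ret = 1;
    }
    pthread_mutex_unlock(&s->lock);
    return ret;
}

/* Teardown: set the flag first, then wake sleepers, so a woken reader
 * sees the shutdown instead of going back to sleep. */
static void tun_uninit_sketch(struct tun_sock_sketch *s)
{
    pthread_mutex_lock(&s->lock);
    s->rcv_shutdown = true;
    pthread_cond_broadcast(&s->wait);
    pthread_mutex_unlock(&s->lock);
}

static void *reader_sketch(void *arg)
{
    static int result;

    result = tun_read_sketch(arg);
    return &result;
}

/* Starts a reader, tears the device down and returns what the reader got
 * (0 means it was released by the shutdown rather than left blocked). */
static int run_teardown_sketch(void)
{
    struct tun_sock_sketch s;
    pthread_t t;
    void *res;
    int ret;

    pthread_mutex_init(&s.lock, NULL);
    pthread_cond_init(&s.wait, NULL);
    s.queued = 0;
    s.rcv_shutdown = false;

    pthread_create(&t, NULL, reader_sketch, &s);
    tun_uninit_sketch(&s);
    pthread_join(t, &res);
    ret = *(int *)res;

    pthread_cond_destroy(&s.wait);
    pthread_mutex_destroy(&s.lock);
    return ret;
}
```

      The result is the same whether the reader was already asleep or had not
      yet started waiting, because the flag is checked under the lock before
      sleeping.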
      
      Fixes: 9e641bdc ("net-tun: restructure tun_do_read for better
      sleep/wakeup efficiency")
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Xi Wang <xii@google.com>
      Cc: Michael S. Tsirkin <mst@redhat.com>
      Signed-off-by: Jason Wang <jasowang@redhat.com>
      Acked-by: Eric Dumazet <edumazet@google.com>
      Acked-by: Michael S. Tsirkin <mst@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Jiri Pirko
      switchdev: pass pointer to fib_info instead of copy · 835d0122
      Jiri Pirko authored
      [ Upstream commit da4ed551 ]
      
      The problem is that fib_info->nh is a zero-length array ([0]), so the
      allocation size of struct fib_info depends on the number of nexthops.
      If we just copy fib_info, we do not copy the nexthop info, and the
      driver accesses memory which is not ours.
      
      Given that fib4 does not defer operations and therefore does not need
      a copy, just pass the pointer down to drivers as was done before.
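      The copy problem can be sketched with a C99 flexible array member, the
      standard spelling of the kernel's [0] idiom. The structures below are
      hypothetical stand-ins, not struct fib_info or struct fib_nh. A
      by-value copy covers only sizeof(struct), i.e. the fixed header, while
      the nexthops live past it in the same allocation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for a route and its nexthops. */
struct nh_sketch {
    int ifindex;
};

struct fib_info_sketch {
    int nhs;               /* number of nexthops that follow */
    struct nh_sketch nh[]; /* flexible array member */
};

static struct fib_info_sketch *fib_info_alloc_sketch(int nhs)
{
    struct fib_info_sketch *fi =
        malloc(sizeof(*fi) + (size_t)nhs * sizeof(fi->nh[0]));

    if (!fi)
        return NULL;
    fi->nhs = nhs;
    for (int i = 0; i < nhs; i++)
        fi->nh[i].ifindex = 100 + i;
    return fi;
}

/* Bytes a struct assignment copies: the header only, never the nexthops. */
static size_t copied_bytes_sketch(void)
{
    return sizeof(struct fib_info_sketch);
}

/* With the pointer, a driver can walk every nexthop in the original
 * allocation. Returns the sum of ifindexes as a simple check. */
static int driver_sum_ifindex_sketch(const struct fib_info_sketch *fi)
{
    int sum = 0;

    for (int i = 0; i < fi->nhs; i++)
        sum += fi->nh[i].ifindex;
    return sum;
}
```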
      
      Fixes: 850d0cbc ("switchdev: remove pointers from switchdev objects")
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>