1. 24 Jun, 2016 40 commits
    • Nitin Gupta's avatar
      sparc64: Reduce TLB flushes during hugepte changes · 87575e31
      Nitin Gupta authored
      [ Upstream commit 24e49ee3 ]
      
      During hugepage map/unmap, TSB and TLB flushes are currently
      issued at every PAGE_SIZE'd boundary which is unnecessary.
      We now issue the flush at REAL_HPAGE_SIZE boundaries only.
      
      Without this patch workloads which unmap a large hugepage
      backed VMA region get CPU lockups due to excessive TLB
      flush calls.
      
      Orabug: 22365539, 22643230, 22995196
      Signed-off-by: default avatarNitin Gupta <nitin.m.gupta@oracle.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      87575e31
    • Babu Moger's avatar
      sparc/PCI: Fix for panic while enabling SR-IOV · ccd02310
      Babu Moger authored
      [ Upstream commit d0c31e02 ]
      
      We noticed this panic while enabling SR-IOV in sparc.
      
      mlx4_core: Mellanox ConnectX core driver v2.2-1 (Jan  1 2015)
      mlx4_core: Initializing 0007:01:00.0
      mlx4_core 0007:01:00.0: Enabling SR-IOV with 5 VFs
      mlx4_core: Initializing 0007:01:00.1
      Unable to handle kernel NULL pointer dereference
      insmod(10010): Oops [#1]
      CPU: 391 PID: 10010 Comm: insmod Not tainted
      		4.1.12-32.el6uek.kdump2.sparc64 #1
      TPC: <dma_supported+0x20/0x80>
      I7: <__mlx4_init_one+0x324/0x500 [mlx4_core]>
      Call Trace:
       [00000000104c5ea4] __mlx4_init_one+0x324/0x500 [mlx4_core]
       [00000000104c613c] mlx4_init_one+0xbc/0x120 [mlx4_core]
       [0000000000725f14] local_pci_probe+0x34/0xa0
       [0000000000726028] pci_call_probe+0xa8/0xe0
       [0000000000726310] pci_device_probe+0x50/0x80
       [000000000079f700] really_probe+0x140/0x420
       [000000000079fa24] driver_probe_device+0x44/0xa0
       [000000000079fb5c] __device_attach+0x3c/0x60
       [000000000079d85c] bus_for_each_drv+0x5c/0xa0
       [000000000079f588] device_attach+0x88/0xc0
       [000000000071acd0] pci_bus_add_device+0x30/0x80
       [0000000000736090] virtfn_add.clone.1+0x210/0x360
       [00000000007364a4] sriov_enable+0x2c4/0x520
       [000000000073672c] pci_enable_sriov+0x2c/0x40
       [00000000104c2d58] mlx4_enable_sriov+0xf8/0x180 [mlx4_core]
       [00000000104c49ac] mlx4_load_one+0x42c/0xd40 [mlx4_core]
      Disabling lock debugging due to kernel taint
      Caller[00000000104c5ea4]: __mlx4_init_one+0x324/0x500 [mlx4_core]
      Caller[00000000104c613c]: mlx4_init_one+0xbc/0x120 [mlx4_core]
      Caller[0000000000725f14]: local_pci_probe+0x34/0xa0
      Caller[0000000000726028]: pci_call_probe+0xa8/0xe0
      Caller[0000000000726310]: pci_device_probe+0x50/0x80
      Caller[000000000079f700]: really_probe+0x140/0x420
      Caller[000000000079fa24]: driver_probe_device+0x44/0xa0
      Caller[000000000079fb5c]: __device_attach+0x3c/0x60
      Caller[000000000079d85c]: bus_for_each_drv+0x5c/0xa0
      Caller[000000000079f588]: device_attach+0x88/0xc0
      Caller[000000000071acd0]: pci_bus_add_device+0x30/0x80
      Caller[0000000000736090]: virtfn_add.clone.1+0x210/0x360
      Caller[00000000007364a4]: sriov_enable+0x2c4/0x520
      Caller[000000000073672c]: pci_enable_sriov+0x2c/0x40
      Caller[00000000104c2d58]: mlx4_enable_sriov+0xf8/0x180 [mlx4_core]
      Caller[00000000104c49ac]: mlx4_load_one+0x42c/0xd40 [mlx4_core]
      Caller[00000000104c5f90]: __mlx4_init_one+0x410/0x500 [mlx4_core]
      Caller[00000000104c613c]: mlx4_init_one+0xbc/0x120 [mlx4_core]
      Caller[0000000000725f14]: local_pci_probe+0x34/0xa0
      Caller[0000000000726028]: pci_call_probe+0xa8/0xe0
      Caller[0000000000726310]: pci_device_probe+0x50/0x80
      Caller[000000000079f700]: really_probe+0x140/0x420
      Caller[000000000079fa24]: driver_probe_device+0x44/0xa0
      Caller[000000000079fb08]: __driver_attach+0x88/0xa0
      Caller[000000000079d90c]: bus_for_each_dev+0x6c/0xa0
      Caller[000000000079f29c]: driver_attach+0x1c/0x40
      Caller[000000000079e35c]: bus_add_driver+0x17c/0x220
      Caller[00000000007a02d4]: driver_register+0x74/0x120
      Caller[00000000007263fc]: __pci_register_driver+0x3c/0x60
      Caller[00000000104f62bc]: mlx4_init+0x60/0xcc [mlx4_core]
      Kernel panic - not syncing: Fatal exception
      Press Stop-A (L1-A) to return to the boot prom
      ---[ end Kernel panic - not syncing: Fatal exception
      
      Details:
      Here is the call sequence
      virtfn_add->__mlx4_init_one->dma_set_mask->dma_supported
      
      The panic happened at line 760(file arch/sparc/kernel/iommu.c)
      
      758 int dma_supported(struct device *dev, u64 device_mask)
      759 {
      760         struct iommu *iommu = dev->archdata.iommu;
      761         u64 dma_addr_mask = iommu->dma_addr_mask;
      762
      763         if (device_mask >= (1UL << 32UL))
      764                 return 0;
      765
      766         if ((device_mask & dma_addr_mask) == dma_addr_mask)
      767                 return 1;
      768
      769 #ifdef CONFIG_PCI
      770         if (dev_is_pci(dev))
      771		return pci64_dma_supported(to_pci_dev(dev), device_mask);
      772 #endif
      773
      774         return 0;
      775 }
      776 EXPORT_SYMBOL(dma_supported);
      
      Same panic happened with Intel ixgbe driver also.
      
      SR-IOV code looks for arch specific data while enabling
      VFs. When VF device is added, driver probe function makes set
      of calls to initialize the pci device. Because the VF device is
      added different way than the normal PF device(which happens via
      of_create_pci_dev for sparc), some of the arch specific initialization
      does not happen for VF device.  That causes panic when archdata is
      accessed.
      
      To fix this, I have used already defined weak function
      pcibios_setup_device to copy archdata from PF to VF.
      Also verified the fix.
      Signed-off-by: default avatarBabu Moger <babu.moger@oracle.com>
      Signed-off-by: default avatarSowmini Varadhan <sowmini.varadhan@oracle.com>
      Reviewed-by: default avatarEthan Zhao <ethan.zhao@oracle.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      ccd02310
    • David S. Miller's avatar
      sparc64: Fix sparc64_set_context stack handling. · b1206090
      David S. Miller authored
      [ Upstream commit 397d1533 ]
      
      Like a signal return, we should use synchronize_user_stack() rather
      than flush_user_windows().
      Reported-by: default avatarIlya Malakhov <ilmalakhovthefirst@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      b1206090
    • Nitin Gupta's avatar
      sparc64: Fix numa node distance initialization · 4185bd68
      Nitin Gupta authored
      [ Upstream commit 36beca65 ]
      
      Orabug: 22495713
      
      Currently, NUMA node distance matrix is initialized only
      when a machine descriptor (MD) exists. However, sun4u
      machines (e.g. Sun Blade 2500) do not have an MD and thus
      distance values were left uninitialized. The initialization
      is now moved such that it happens on both sun4u and sun4v.
      Signed-off-by: default avatarNitin Gupta <nitin.m.gupta@oracle.com>
      Tested-by: default avatarMikael Pettersson <mikpelinux@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      4185bd68
    • David S. Miller's avatar
      sparc64: Fix bootup regressions on some Kconfig combinations. · e9c74337
      David S. Miller authored
      [ Upstream commit 49fa5230 ]
      
      The system call tracing bug fix mentioned in the Fixes tag
      below increased the amount of assembler code in the sequence
      of assembler files included by head_64.S
      
      This caused to total set of code to exceed 0x4000 bytes in
      size, which overflows the expression in head_64.S that works
      to place swapper_tsb at address 0x408000.
      
      When this is violated, the TSB is not properly aligned, and
      also the trap table is not aligned properly either.  All of
      this together results in failed boots.
      
      So, do two things:
      
      1) Simplify some code by using ba,a instead of ba/nop to get
         those bytes back.
      
      2) Add a linker script assertion to make sure that if this
         happens again the build will fail.
      
      Fixes: 1a40b953 ("sparc: Fix system call tracing register handling.")
      Reported-by: default avatarMeelis Roos <mroos@linux.ee>
      Reported-by: default avatarJoerg Abraham <joerg.abraham@nokia.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      e9c74337
    • Mike Frysinger's avatar
      sparc: Fix system call tracing register handling. · c9bc125c
      Mike Frysinger authored
      [ Upstream commit 1a40b953 ]
      
      A system call trace trigger on entry allows the tracing
      process to inspect and potentially change the traced
      process's registers.
      
      Account for that by reloading the %g1 (syscall number)
      and %i0-%i5 (syscall argument) values.  We need to be
      careful to revalidate the range of %g1, and reload the
      system call table entry it corresponds to into %l7.
      Reported-by: default avatarMike Frysinger <vapier@gentoo.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Tested-by: default avatarMike Frysinger <vapier@gentoo.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      c9bc125c
    • Al Viro's avatar
      fix d_walk()/non-delayed __d_free() race · 2b11d80e
      Al Viro authored
      commit 3d56c25e upstream.
      
      Ascend-to-parent logics in d_walk() depends on all encountered child
      dentries not getting freed without an RCU delay.  Unfortunately, in
      quite a few cases it is not true, with hard-to-hit oopsable race as
      the result.
      
      Fortunately, the fix is simiple; right now the rule is "if it ever
      been hashed, freeing must be delayed" and changing it to "if it
      ever had a parent, freeing must be delayed" closes that hole and
      covers all cases the old rule used to cover.  Moreover, pipes and
      sockets remain _not_ covered, so we do not introduce RCU delay in
      the cases which are the reason for having that delay conditional
      in the first place.
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      2b11d80e
    • Jann Horn's avatar
      sched: panic on corrupted stack end · c08b1a59
      Jann Horn authored
      commit 29d64551 upstream.
      
      Until now, hitting this BUG_ON caused a recursive oops (because oops
      handling involves do_exit(), which calls into the scheduler, which in
      turn raises an oops), which caused stuff below the stack to be
      overwritten until a panic happened (e.g.  via an oops in interrupt
      context, caused by the overwritten CPU index in the thread_info).
      
      Just panic directly.
      Signed-off-by: default avatarJann Horn <jannh@google.com>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      c08b1a59
    • Jann Horn's avatar
      proc: prevent stacking filesystems on top · 9beb96b3
      Jann Horn authored
      commit e54ad7f1 upstream.
      
      This prevents stacking filesystems (ecryptfs and overlayfs) from using
      procfs as lower filesystem.  There is too much magic going on inside
      procfs, and there is no good reason to stack stuff on top of procfs.
      
      (For example, procfs does access checks in VFS open handlers, and
      ecryptfs by design calls open handlers from a kernel thread that doesn't
      drop privileges or so.)
      Signed-off-by: default avatarJann Horn <jannh@google.com>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      9beb96b3
    • Andy Lutomirski's avatar
      x86/entry/traps: Don't force in_interrupt() to return true in IST handlers · 035a94d8
      Andy Lutomirski authored
      commit aaee8c3c upstream.
      
      Forcing in_interrupt() to return true if we're not in a bona fide
      interrupt confuses the softirq code.  This fixes warnings like:
      
        NOHZ: local_softirq_pending 282
      
      ... which can happen when running things like selftests/x86.
      
      This will change perf's static percpu buffer usage in IST context.
      I think this is okay, and it's changing the behavior to match
      historical (pre-4.0) behavior.
      Signed-off-by: default avatarAndy Lutomirski <luto@kernel.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Fixes: 95927475 ("x86, traps: Track entry into and exit from IST context")
      Link: http://lkml.kernel.org/r/cdc215f94d118d691d73df35275022331156fb45.1464130360.git.luto@kernel.orgSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      035a94d8
    • Prasun Maiti's avatar
      wext: Fix 32 bit iwpriv compatibility issue with 64 bit Kernel · 47648b58
      Prasun Maiti authored
      commit 3d5fdff4 upstream.
      
      iwpriv app uses iw_point structure to send data to Kernel. The iw_point
      structure holds a pointer. For compatibility Kernel converts the pointer
      as required for WEXT IOCTLs (SIOCIWFIRST to SIOCIWLAST). Some drivers
      may use iw_handler_def.private_args to populate iwpriv commands instead
      of iw_handler_def.private. For those case, the IOCTLs from
      SIOCIWFIRSTPRIV to SIOCIWLASTPRIV will follow the path ndo_do_ioctl().
      Accordingly when the filled up iw_point structure comes from 32 bit
      iwpriv to 64 bit Kernel, Kernel will not convert the pointer and sends
      it to driver. So, the driver may get the invalid data.
      
      The pointer conversion for the IOCTLs (SIOCIWFIRSTPRIV to
      SIOCIWLASTPRIV), which follow the path ndo_do_ioctl(), is mandatory.
      This patch adds pointer conversion from 32 bit to 64 bit and vice versa,
      if the ioctl comes from 32 bit iwpriv to 64 bit Kernel.
      Signed-off-by: default avatarPrasun Maiti <prasunmaiti87@gmail.com>
      Signed-off-by: default avatarUjjal Roy <royujjal@gmail.com>
      Tested-by: default avatarDibyajyoti Ghosh <dibyajyotig@gmail.com>
      Signed-off-by: default avatarJohannes Berg <johannes.berg@intel.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      47648b58
    • Jann Horn's avatar
      ecryptfs: forbid opening files without mmap handler · dea2cf7c
      Jann Horn authored
      commit 2f36db71 upstream.
      
      This prevents users from triggering a stack overflow through a recursive
      invocation of pagefault handling that involves mapping procfs files into
      virtual memory.
      Signed-off-by: default avatarJann Horn <jannh@google.com>
      Acked-by: default avatarTyler Hicks <tyhicks@canonical.com>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      dea2cf7c
    • Tejun Heo's avatar
      memcg: add RCU locking around css_for_each_descendant_pre() in memcg_offline_kmem() · d3f97524
      Tejun Heo authored
      commit 3a06bb78 upstream.
      
      memcg_offline_kmem() may be called from memcg_free_kmem() after a css
      init failure.  memcg_free_kmem() is a ->css_free callback which is
      called without cgroup_mutex and memcg_offline_kmem() ends up using
      css_for_each_descendant_pre() without any locking.  Fix it by adding rcu
      read locking around it.
      
          mkdir: cannot create directory `65530': No space left on device
          ===============================
          [ INFO: suspicious RCU usage. ]
          4.6.0-work+ #321 Not tainted
          -------------------------------
          kernel/cgroup.c:4008 cgroup_mutex or RCU read lock required!
           [  527.243970] other info that might help us debug this:
           [  527.244715]
          rcu_scheduler_active = 1, debug_locks = 0
          2 locks held by kworker/0:5/1664:
           #0:  ("cgroup_destroy"){.+.+..}, at: [<ffffffff81060ab5>] process_one_work+0x165/0x4a0
           #1:  ((&css->destroy_work)#3){+.+...}, at: [<ffffffff81060ab5>] process_one_work+0x165/0x4a0
           [  527.248098] stack backtrace:
          CPU: 0 PID: 1664 Comm: kworker/0:5 Not tainted 4.6.0-work+ #321
          Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.9.1-1.fc24 04/01/2014
          Workqueue: cgroup_destroy css_free_work_fn
          Call Trace:
            dump_stack+0x68/0xa1
            lockdep_rcu_suspicious+0xd7/0x110
            css_next_descendant_pre+0x7d/0xb0
            memcg_offline_kmem.part.44+0x4a/0xc0
            mem_cgroup_css_free+0x1ec/0x200
            css_free_work_fn+0x49/0x5e0
            process_one_work+0x1c5/0x4a0
            worker_thread+0x49/0x490
            kthread+0xea/0x100
            ret_from_fork+0x1f/0x40
      
      Link: http://lkml.kernel.org/r/20160526203018.GG23194@mtj.duckdns.orgSigned-off-by: default avatarTejun Heo <tj@kernel.org>
      Acked-by: default avatarVladimir Davydov <vdavydov@virtuozzo.com>
      Acked-by: default avatarJohannes Weiner <hannes@cmpxchg.org>
      Cc: Michal Hocko <mhocko@kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      d3f97524
    • Helge Deller's avatar
      parisc: Fix pagefault crash in unaligned __get_user() call · 1125f3b0
      Helge Deller authored
      commit 8b78f260 upstream.
      
      One of the debian buildd servers had this crash in the syslog without
      any other information:
      
       Unaligned handler failed, ret = -2
       clock_adjtime (pid 22578): Unaligned data reference (code 28)
       CPU: 1 PID: 22578 Comm: clock_adjtime Tainted: G  E  4.5.0-2-parisc64-smp #1 Debian 4.5.4-1
       task: 000000007d9960f8 ti: 00000001bde7c000 task.ti: 00000001bde7c000
      
            YZrvWESTHLNXBCVMcbcbcbcbOGFRQPDI
       PSW: 00001000000001001111100000001111 Tainted: G            E
       r00-03  000000ff0804f80f 00000001bde7c2b0 00000000402d2be8 00000001bde7c2b0
       r04-07  00000000409e1fd0 00000000fa6f7fff 00000001bde7c148 00000000fa6f7fff
       r08-11  0000000000000000 00000000ffffffff 00000000fac9bb7b 000000000002b4d4
       r12-15  000000000015241c 000000000015242c 000000000000002d 00000000fac9bb7b
       r16-19  0000000000028800 0000000000000001 0000000000000070 00000001bde7c218
       r20-23  0000000000000000 00000001bde7c210 0000000000000002 0000000000000000
       r24-27  0000000000000000 0000000000000000 00000001bde7c148 00000000409e1fd0
       r28-31  0000000000000001 00000001bde7c320 00000001bde7c350 00000001bde7c218
       sr00-03  0000000001200000 0000000001200000 0000000000000000 0000000001200000
       sr04-07  0000000000000000 0000000000000000 0000000000000000 0000000000000000
      
       IASQ: 0000000000000000 0000000000000000 IAOQ: 00000000402d2e84 00000000402d2e88
        IIR: 0ca0d089    ISR: 0000000001200000  IOR: 00000000fa6f7fff
        CPU:        1   CR30: 00000001bde7c000 CR31: ffffffffffffffff
        ORIG_R28: 00000002369fe628
        IAOQ[0]: compat_get_timex+0x2dc/0x3c0
        IAOQ[1]: compat_get_timex+0x2e0/0x3c0
        RP(r2): compat_get_timex+0x40/0x3c0
       Backtrace:
        [<00000000402d4608>] compat_SyS_clock_adjtime+0x40/0xc0
        [<0000000040205024>] syscall_exit+0x0/0x14
      
      This means the userspace program clock_adjtime called the clock_adjtime()
      syscall and then crashed inside the compat_get_timex() function.
      Syscalls should never crash programs, but instead return EFAULT.
      
      The IIR register contains the executed instruction, which disassebles
      into "ldw 0(sr3,r5),r9".
      This load-word instruction is part of __get_user() which tried to read the word
      at %r5/IOR (0xfa6f7fff). This means the unaligned handler jumped in.  The
      unaligned handler is able to emulate all ldw instructions, but it fails if it
      fails to read the source e.g. because of page fault.
      
      The following program reproduces the problem:
      
      #define _GNU_SOURCE
      #include <unistd.h>
      #include <sys/syscall.h>
      #include <sys/mman.h>
      
      int main(void) {
              /* allocate 8k */
              char *ptr = mmap(NULL, 2*4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0);
              /* free second half (upper 4k) and make it invalid. */
              munmap(ptr+4096, 4096);
              /* syscall where first int is unaligned and clobbers into invalid memory region */
              /* syscall should return EFAULT */
              return syscall(__NR_clock_adjtime, 0, ptr+4095);
      }
      
      To fix this issue we simply need to check if the faulting instruction address
      is in the exception fixup table when the unaligned handler failed. If it
      is, call the fixup routine instead of crashing.
      
      While looking at the unaligned handler I found another issue as well: The
      target register should not be modified if the handler was unsuccessful.
      Signed-off-by: default avatarHelge Deller <deller@gmx.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      1125f3b0
    • hongkun.cao's avatar
      pinctrl: mediatek: fix dual-edge code defect · b5ff1d60
      hongkun.cao authored
      commit 5edf673d upstream.
      
      When a dual-edge irq is triggered, an incorrect irq will be reported on
      condition that the external signal is not stable and this incorrect irq
      has been registered.
      Correct the register offset.
      Signed-off-by: default avatarHongkun Cao <hongkun.cao@mediatek.com>
      Reviewed-by: default avatarMatthias Brugger <matthias.bgg@gmail.com>
      Signed-off-by: default avatarLinus Walleij <linus.walleij@linaro.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      b5ff1d60
    • Thomas Huth's avatar
      powerpc/pseries: Add POWER8NVL support to ibm,client-architecture-support call · a976f62a
      Thomas Huth authored
      commit 7cc85103 upstream.
      
      If we do not provide the PVR for POWER8NVL, a guest on this system
      currently ends up in PowerISA 2.06 compatibility mode on KVM, since QEMU
      does not provide a generic PowerISA 2.07 mode yet. So some new
      instructions from POWER8 (like "mtvsrd") get disabled for the guest,
      resulting in crashes when using code compiled explicitly for
      POWER8 (e.g. with the "-mcpu=power8" option of GCC).
      
      Fixes: ddee09c0 ("powerpc: Add PVR for POWER8NVL processor")
      Signed-off-by: default avatarThomas Huth <thuth@redhat.com>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      a976f62a
    • Thomas Huth's avatar
      powerpc: Use privileged SPR number for MMCR2 · cac2863f
      Thomas Huth authored
      commit 8dd75ccb upstream.
      
      We are already using the privileged versions of MMCR0, MMCR1
      and MMCRA in the kernel, so for MMCR2, we should better use
      the privileged versions, too, to be consistent.
      
      Fixes: 240686c1 ("powerpc: Initialise PMU related regs on Power8")
      Suggested-by: default avatarPaul Mackerras <paulus@ozlabs.org>
      Signed-off-by: default avatarThomas Huth <thuth@redhat.com>
      Acked-by: default avatarPaul Mackerras <paulus@ozlabs.org>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      cac2863f
    • Thomas Huth's avatar
      powerpc: Fix definition of SIAR and SDAR registers · 4f27ca0e
      Thomas Huth authored
      commit d23fac2b upstream.
      
      The SIAR and SDAR registers are available twice, one time as SPRs
      780 / 781 (unprivileged, but read-only), and one time as the SPRs
      796 / 797 (privileged, but read and write). The Linux kernel code
      currently uses the unprivileged  SPRs - while this is OK for reading,
      writing to that register of course does not work.
      Since the KVM code tries to write to this register, too (see the mtspr
      in book3s_hv_rmhandlers.S), the contents of this register sometimes get
      lost for the guests, e.g. during migration of a VM.
      To fix this issue, simply switch to the privileged SPR numbers instead.
      Signed-off-by: default avatarThomas Huth <thuth@redhat.com>
      Acked-by: default avatarPaul Mackerras <paulus@ozlabs.org>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      4f27ca0e
    • Russell Currey's avatar
      powerpc/pseries/eeh: Handle RTAS delay requests in configure_bridge · baa6dfd6
      Russell Currey authored
      commit 871e178e upstream.
      
      In the "ibm,configure-pe" and "ibm,configure-bridge" RTAS calls, the
      spec states that values of 9900-9905 can be returned, indicating that
      software should delay for 10^x (where x is the last digit, i.e. 990x)
      milliseconds and attempt the call again. Currently, the kernel doesn't
      know about this, and respecting it fixes some PCI failures when the
      hypervisor is busy.
      
      The delay is capped at 0.2 seconds.
      Signed-off-by: default avatarRussell Currey <ruscur@russell.cc>
      Acked-by: default avatarGavin Shan <gwshan@linux.vnet.ibm.com>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      baa6dfd6
    • Will Deacon's avatar
      arm64: mm: always take dirty state from new pte in ptep_set_access_flags · 5e8b53a4
      Will Deacon authored
      commit 0106d456 upstream.
      
      Commit 66dbd6e6 ("arm64: Implement ptep_set_access_flags() for
      hardware AF/DBM") ensured that pte flags are updated atomically in the
      face of potential concurrent, hardware-assisted updates. However, Alex
      reports that:
      
       | This patch breaks swapping for me.
       | In the broken case, you'll see either systemd cpu time spike (because
       | it's stuck in a page fault loop) or the system hang (because the
       | application owning the screen is stuck in a page fault loop).
      
      It turns out that this is because the 'dirty' argument to
      ptep_set_access_flags is always 0 for read faults, and so we can't use
      it to set PTE_RDONLY. The failing sequence is:
      
        1. We put down a PTE_WRITE | PTE_DIRTY | PTE_AF pte
        2. Memory pressure -> pte_mkold(pte) -> clear PTE_AF
        3. A read faults due to the missing access flag
        4. ptep_set_access_flags is called with dirty = 0, due to the read fault
        5. pte is then made PTE_WRITE | PTE_DIRTY | PTE_AF | PTE_RDONLY (!)
        6. A write faults, but pte_write is true so we get stuck
      
      The solution is to check the new page table entry (as would be done by
      the generic, non-atomic definition of ptep_set_access_flags that just
      calls set_pte_at) to establish the dirty state.
      
      Fixes: 66dbd6e6 ("arm64: Implement ptep_set_access_flags() for hardware AF/DBM")
      Reviewed-by: default avatarCatalin Marinas <catalin.marinas@arm.com>
      Reported-by: default avatarAlexander Graf <agraf@suse.de>
      Tested-by: default avatarAlexander Graf <agraf@suse.de>
      Signed-off-by: default avatarWill Deacon <will.deacon@arm.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      5e8b53a4
    • Catalin Marinas's avatar
      arm64: Provide "model name" in /proc/cpuinfo for PER_LINUX32 tasks · d0bc1f47
      Catalin Marinas authored
      commit e47b020a upstream.
      
      This patch brings the PER_LINUX32 /proc/cpuinfo format more in line with
      the 32-bit ARM one by providing an additional line:
      
      model name      : ARMv8 Processor rev X (v8l)
      Acked-by: default avatarWill Deacon <will.deacon@arm.com>
      Signed-off-by: default avatarCatalin Marinas <catalin.marinas@arm.com>
      Signed-off-by: default avatarWill Deacon <will.deacon@arm.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      d0bc1f47
    • Tom Lendacky's avatar
      crypto: ccp - Fix AES XTS error for request sizes above 4096 · 774920ee
      Tom Lendacky authored
      commit ab6a11a7 upstream.
      
      The ccp-crypto module for AES XTS support has a bug that can allow requests
      greater than 4096 bytes in size to be passed to the CCP hardware. The CCP
      hardware does not support request sizes larger than 4096, resulting in
      incorrect output. The request should actually be handled by the fallback
      mechanism instantiated by the ccp-crypto module.
      
      Add a check to insure the request size is less than or equal to the maximum
      supported size and use the fallback mechanism if it is not.
      Signed-off-by: default avatarTom Lendacky <thomas.lendacky@amd.com>
      Signed-off-by: default avatarHerbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      774920ee
    • Arnd Bergmann's avatar
      crypto: public_key: select CRYPTO_AKCIPHER · b440f3ae
      Arnd Bergmann authored
      commit bad6a185 upstream.
      
      In some rare randconfig builds, we can end up with
      ASYMMETRIC_PUBLIC_KEY_SUBTYPE enabled but CRYPTO_AKCIPHER disabled,
      which fails to link because of the reference to crypto_alloc_akcipher:
      
      crypto/built-in.o: In function `public_key_verify_signature':
      :(.text+0x110e4): undefined reference to `crypto_alloc_akcipher'
      
      This adds a Kconfig 'select' statement to ensure the dependency
      is always there.
      Signed-off-by: default avatarArnd Bergmann <arnd@arndb.de>
      Signed-off-by: default avatarHerbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      b440f3ae
    • Marc Zyngier's avatar
      irqchip/gic-v3: Fix ICC_SGI1R_EL1.INTID decoding mask · f32ef5c8
      Marc Zyngier authored
      commit dd5f1b04 upstream.
      
      The INTID mask is wrong, and is made a signed value, which has
      nteresting effects in the KVM emulation. Let's sanitize it.
      Signed-off-by: default avatarMarc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f32ef5c8
    • Michael Holzheu's avatar
      s390/bpf: reduce maximum program size to 64 KB · 9be2fa20
      Michael Holzheu authored
      commit 0fa96355 upstream.
      
      The s390 BFP compiler currently uses relative branch instructions
      that only support jumps up to 64 KB. Examples are "j", "jnz", "cgrj",
      etc.  Currently the maximum size of s390 BPF programs is set
      to 0x7ffff.  If branches over 64 KB are generated the, kernel can
      crash due to incorrect code.
      
      So fix this an reduce the maximum size to 64 KB. Programs larger than
      that will be interpreted.
      
      Fixes: ce2b6ad9 ("s390/bpf: increase BPF_SIZE_MAX")
      Signed-off-by: default avatarMichael Holzheu <holzheu@linux.vnet.ibm.com>
      Signed-off-by: default avatarMartin Schwidefsky <schwidefsky@de.ibm.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      9be2fa20
    • Michael Holzheu's avatar
      s390/bpf: fix recache skb->data/hlen for skb_vlan_push/pop · ebf52918
      Michael Holzheu authored
      commit 6edf0aa4 upstream.
      
      In case of usage of skb_vlan_push/pop, in the prologue we store
      the SKB pointer on the stack and restore it after BPF_JMP_CALL
      to skb_vlan_push/pop.
      
      Unfortunately currently there are two bugs in the code:
      
       1) The wrong stack slot (offset 170 instead of 176) is used
       2) The wrong register (W1 instead of B1) is saved
      
      So fix this and use correct stack slot and register.
      
      Fixes: 9db7f2b8 ("s390/bpf: recache skb->data/hlen for skb_vlan_push/pop")
      Signed-off-by: default avatarMichael Holzheu <holzheu@linux.vnet.ibm.com>
      Signed-off-by: default avatarMartin Schwidefsky <schwidefsky@de.ibm.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      ebf52918
    • Ben Dooks's avatar
      gpio: bcm-kona: fix bcm_kona_gpio_reset() warnings · e1c35534
      Ben Dooks authored
      commit b66b2a0a upstream.
      
      The bcm_kona_gpio_reset() calls bcm_kona_gpio_write_lock_regs()
      with what looks like the wrong parameter. The write_lock_regs
      function takes a pointer to the registers, not the bcm_kona_gpio
      structure.
      
      Fix the warning, and probably bug by changing the function to
      pass reg_base instead of kona_gpio, fixing the following warning:
      
      drivers/gpio/gpio-bcm-kona.c:550:47: warning: incorrect type in argument 1
        (different address spaces)
        expected void [noderef] <asn:2>*reg_base
        got struct bcm_kona_gpio *kona_gpio
        warning: incorrect type in argument 1 (different address spaces)
        expected void [noderef] <asn:2>*reg_base
        got struct bcm_kona_gpio *kona_gpio
      Signed-off-by: default avatarBen Dooks <ben.dooks@codethink.co.uk>
      Acked-by: default avatarRay Jui <ray.jui@broadcom.com>
      Reviewed-by: default avatarMarkus Mayer <mmayer@broadcom.com>
      Signed-off-by: default avatarLinus Walleij <linus.walleij@linaro.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      e1c35534
    • Russell King's avatar
      ARM: fix PTRACE_SETVFPREGS on SMP systems · 9edd6fd1
      Russell King authored
      commit e2dfb4b8 upstream.
      
      PTRACE_SETVFPREGS fails to properly mark the VFP register set to be
      reloaded, because it undoes one of the effects of vfp_flush_hwstate().
      
      Specifically vfp_flush_hwstate() sets thread->vfpstate.hard.cpu to
      an invalid CPU number, but vfp_set() overwrites this with the original
      CPU number, thereby rendering the hardware state as apparently "valid",
      even though the software state is more recent.
      
      Fix this by reverting the previous change.
      
      Fixes: 8130b9d7 ("ARM: 7308/1: vfp: flush thread hwstate before copying ptrace registers")
      Acked-by: default avatarWill Deacon <will.deacon@arm.com>
      Tested-by: default avatarSimon Marchi <simon.marchi@ericsson.com>
      Signed-off-by: default avatarRussell King <rmk+kernel@armlinux.org.uk>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      9edd6fd1
    • Torsten Hilbrich's avatar
      ALSA: hda/realtek: Add T560 docking unit fixup · da7f1c92
      Torsten Hilbrich authored
      commit dab38e43 upstream.
      
      Tested with Lenovo Ultradock. Fixes the non-working headphone jack on
      the docking unit.
      Signed-off-by: default avatarTorsten Hilbrich <torsten.hilbrich@secunet.com>
      Tested-by: default avatarTorsten Hilbrich <torsten.hilbrich@secunet.com>
      Signed-off-by: default avatarTakashi Iwai <tiwai@suse.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      da7f1c92
    • Kailang Yang's avatar
      ALSA: hda/realtek - Add support for new codecs ALC700/ALC701/ALC703 · 81999107
      Kailang Yang authored
      commit 6fbae35a upstream.
      
      Support new codecs for ALC700/ALC701/ALC703.
      Signed-off-by: default avatarKailang Yang <kailang@realtek.com>
      Signed-off-by: default avatarTakashi Iwai <tiwai@suse.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      81999107
    • Kailang Yang's avatar
      ALSA: hda/realtek - ALC256 speaker noise issue · c3fd646b
      Kailang Yang authored
      commit e69e7e03 upstream.
      
      That is some different register for ALC255 and ALC256.
      ALC256 can't fit with some ALC255 register.
      This issue is cause from LDO output voltage control.
      This patch is updated the right LDO register value.
      Signed-off-by: default avatarKailang Yang <kailang@realtek.com>
      Signed-off-by: default avatarTakashi Iwai <tiwai@suse.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      c3fd646b
    • AceLan Kao's avatar
      ALSA: hda - Fix headset mic detection problem for Dell machine · 1bf80a48
      AceLan Kao authored
      commit f90d83b3 upstream.
      
      Add the pin configuration value of this machine into the pin_quirk
      table to make DELL1_MIC_NO_PRESENCE apply to this machine.
      Signed-off-by: default avatarAceLan Kao <acelan.kao@canonical.com>
      Signed-off-by: default avatarTakashi Iwai <tiwai@suse.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      1bf80a48
    • Vinod Koul's avatar
      ALSA: hda - Add PCI ID for Kabylake · 1f4b7507
      Vinod Koul authored
      commit 35639a0e upstream.
      
      Kabylake shows up as PCI ID 0xa171. And Kabylake-LP as 0x9d71.
      Since these are similar to Skylake add these to SKL_PLUS macro
      Signed-off-by: default avatarVinod Koul <vinod.koul@intel.com>
      Signed-off-by: default avatarTakashi Iwai <tiwai@suse.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      1f4b7507
    • Paolo Bonzini's avatar
      KVM: irqfd: fix NULL pointer dereference in kvm_irq_map_gsi · 2cb77b0a
      Paolo Bonzini authored
      commit c622a3c2 upstream.
      
      Found by syzkaller:
      
          BUG: unable to handle kernel NULL pointer dereference at 0000000000000120
          IP: [<ffffffffa0797202>] kvm_irq_map_gsi+0x12/0x90 [kvm]
          PGD 6f80b067 PUD b6535067 PMD 0
          Oops: 0000 [#1] SMP
          CPU: 3 PID: 4988 Comm: a.out Not tainted 4.4.9-300.fc23.x86_64 #1
          [...]
          Call Trace:
           [<ffffffffa0795f62>] irqfd_update+0x32/0xc0 [kvm]
           [<ffffffffa0796c7c>] kvm_irqfd+0x3dc/0x5b0 [kvm]
           [<ffffffffa07943f4>] kvm_vm_ioctl+0x164/0x6f0 [kvm]
           [<ffffffff81241648>] do_vfs_ioctl+0x298/0x480
           [<ffffffff812418a9>] SyS_ioctl+0x79/0x90
           [<ffffffff817a1062>] tracesys_phase2+0x84/0x89
          Code: b5 71 a7 e0 5b 41 5c 41 5d 5d f3 c3 66 66 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48 8b 8f 10 2e 00 00 31 c0 48 89 e5 <39> 91 20 01 00 00 76 6a 48 63 d2 48 8b 94 d1 28 01 00 00 48 85
          RIP  [<ffffffffa0797202>] kvm_irq_map_gsi+0x12/0x90 [kvm]
           RSP <ffff8800926cbca8>
          CR2: 0000000000000120
      
      Testcase:
      
          #include <unistd.h>
          #include <sys/syscall.h>
          #include <string.h>
          #include <stdint.h>
          #include <linux/kvm.h>
          #include <fcntl.h>
          #include <sys/ioctl.h>
      
          long r[26];
      
          int main()
          {
              memset(r, -1, sizeof(r));
              r[2] = open("/dev/kvm", 0);
              r[3] = ioctl(r[2], KVM_CREATE_VM, 0);
      
              struct kvm_irqfd ifd;
              ifd.fd = syscall(SYS_eventfd2, 5, 0);
              ifd.gsi = 3;
              ifd.flags = 2;
              ifd.resamplefd = ifd.fd;
              r[25] = ioctl(r[3], KVM_IRQFD, &ifd);
              return 0;
          }
      Reported-by: default avatarDmitry Vyukov <dvyukov@google.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: default avatarRadim Krčmář <rkrcmar@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      2cb77b0a
    • Paolo Bonzini's avatar
      KVM: x86: fix OOPS after invalid KVM_SET_DEBUGREGS · ded4fc62
      Paolo Bonzini authored
      commit d14bdb55 upstream.
      
      MOV to DR6 or DR7 causes a #GP if an attempt is made to write a 1 to
      any of bits 63:32.  However, this is not detected at KVM_SET_DEBUGREGS
      time, and the next KVM_RUN oopses:
      
         general protection fault: 0000 [#1] SMP
         CPU: 2 PID: 14987 Comm: a.out Not tainted 4.4.9-300.fc23.x86_64 #1
         Hardware name: LENOVO 2325F51/2325F51, BIOS G2ET32WW (1.12 ) 05/30/2012
         [...]
         Call Trace:
          [<ffffffffa072c93d>] kvm_arch_vcpu_ioctl_run+0x141d/0x14e0 [kvm]
          [<ffffffffa071405d>] kvm_vcpu_ioctl+0x33d/0x620 [kvm]
          [<ffffffff81241648>] do_vfs_ioctl+0x298/0x480
          [<ffffffff812418a9>] SyS_ioctl+0x79/0x90
          [<ffffffff817a0f2e>] entry_SYSCALL_64_fastpath+0x12/0x71
         Code: 55 83 ff 07 48 89 e5 77 27 89 ff ff 24 fd 90 87 80 81 0f 23 fe 5d c3 0f 23 c6 5d c3 0f 23 ce 5d c3 0f 23 d6 5d c3 0f 23 de 5d c3 <0f> 23 f6 5d c3 0f 0b 66 66 66 66 66 2e 0f 1f 84 00 00 00 00 00
         RIP  [<ffffffff810639eb>] native_set_debugreg+0x2b/0x40
          RSP <ffff88005836bd50>
      
      Testcase (beautified/reduced from syzkaller output):
      
          #include <unistd.h>
          #include <sys/syscall.h>
          #include <string.h>
          #include <stdint.h>
          #include <linux/kvm.h>
          #include <fcntl.h>
          #include <sys/ioctl.h>
      
          long r[8];
      
          int main()
          {
              struct kvm_debugregs dr = { 0 };
      
              r[2] = open("/dev/kvm", O_RDONLY);
              r[3] = ioctl(r[2], KVM_CREATE_VM, 0);
              r[4] = ioctl(r[3], KVM_CREATE_VCPU, 7);
      
              memcpy(&dr,
                     "\x5d\x6a\x6b\xe8\x57\x3b\x4b\x7e\xcf\x0d\xa1\x72"
                     "\xa3\x4a\x29\x0c\xfc\x6d\x44\x00\xa7\x52\xc7\xd8"
                     "\x00\xdb\x89\x9d\x78\xb5\x54\x6b\x6b\x13\x1c\xe9"
                     "\x5e\xd3\x0e\x40\x6f\xb4\x66\xf7\x5b\xe3\x36\xcb",
                     48);
              r[7] = ioctl(r[4], KVM_SET_DEBUGREGS, &dr);
              r[6] = ioctl(r[4], KVM_RUN, 0);
          }
      Reported-by: default avatarDmitry Vyukov <dvyukov@google.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: default avatarRadim Krčmář <rkrcmar@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      ded4fc62
    • David Wragg's avatar
      vxlan, gre, geneve: Set a large MTU on ovs-created tunnel devices · ce9c0dba
      David Wragg authored
      [ Upstream commit 7e059158 ]
      
      Prior to 4.3, openvswitch tunnel vports (vxlan, gre and geneve) could
      transmit vxlan packets of any size, constrained only by the ability to
      send out the resulting packets.  4.3 introduced netdevs corresponding
      to tunnel vports.  These netdevs have an MTU, which limits the size of
      a packet that can be successfully encapsulated.  The default MTU
      values are low (1500 or less), which is awkwardly small in the context
      of physical networks supporting jumbo frames, and leads to a
      conspicuous change in behaviour for userspace.
      
      Instead, set the MTU on openvswitch-created netdevs to be the relevant
      maximum (i.e. the maximum IP packet size minus any relevant overhead),
      effectively restoring the behaviour prior to 4.3.
      Signed-off-by: default avatarDavid Wragg <david@weave.works>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      ce9c0dba
    • David Wragg's avatar
      geneve: Relax MTU constraints · 51d7c394
      David Wragg authored
      [ Upstream commit 55e5bfb5 ]
      
      Allow the MTU of geneve devices to be set to large values, in order to
      exploit underlying networks with larger frame sizes.
      
      GENEVE does not have a fixed encapsulation overhead (an openvswitch
      rule can add variable length options), so there is no relevant maximum
      MTU to enforce.  A maximum of IP_MAX_MTU is used instead.
      Encapsulated packets that are too big for the underlying network will
      get dropped on the floor.
      Signed-off-by: default avatarDavid Wragg <david@weave.works>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      51d7c394
    • David Wragg's avatar
      vxlan: Relax MTU constraints · 3dc44305
      David Wragg authored
      [ Upstream commit 72564b59 ]
      
      Allow the MTU of vxlan devices without an underlying device to be set
      to larger values (up to a maximum based on IP packet limits and vxlan
      overhead).
      
      Previously, their MTUs could not be set to higher than the
      conventional ethernet value of 1500.  This is a very arbitrary value
      in the context of vxlan, and prevented vxlan devices from being able
      to take advantage of jumbo frames etc.
      
      The default MTU remains 1500, for compatibility.
      Signed-off-by: default avatarDavid Wragg <david@weave.works>
      Acked-by: default avatarRoopa Prabhu <roopa@cumulusnetworks.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      3dc44305
    • Jakub Sitnicki's avatar
      ipv6: Skip XFRM lookup if dst_entry in socket cache is valid · 4d82f395
      Jakub Sitnicki authored
      [ Upstream commit 00bc0ef5 ]
      
      At present we perform an xfrm_lookup() for each UDPv6 message we
      send. The lookup involves querying the flow cache (flow_cache_lookup)
      and, in case of a cache miss, creating an XFRM bundle.
      
      If we miss the flow cache, we can end up creating a new bundle and
      deriving the path MTU (xfrm_init_pmtu) from on an already transformed
      dst_entry, which we pass from the socket cache (sk->sk_dst_cache) down
      to xfrm_lookup(). This can happen only if we're caching the dst_entry
      in the socket, that is when we're using a connected UDP socket.
      
      To put it another way, the path MTU shrinks each time we miss the flow
      cache, which later on leads to incorrectly fragmented payload. It can
      be observed with ESPv6 in transport mode:
      
        1) Set up a transformation and lower the MTU to trigger fragmentation
          # ip xfrm policy add dir out src ::1 dst ::1 \
            tmpl src ::1 dst ::1 proto esp spi 1
          # ip xfrm state add src ::1 dst ::1 \
            proto esp spi 1 enc 'aes' 0x0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b
          # ip link set dev lo mtu 1500
      
        2) Monitor the packet flow and set up an UDP sink
          # tcpdump -ni lo -ttt &
          # socat udp6-listen:12345,fork /dev/null &
      
        3) Send a datagram that needs fragmentation with a connected socket
          # perl -e 'print "@" x 1470 | socat - udp6:[::1]:12345
          2016/06/07 18:52:52 socat[724] E read(3, 0x555bb3d5ba00, 8192): Protocol error
          00:00:00.000000 IP6 ::1 > ::1: frag (0|1448) ESP(spi=0x00000001,seq=0x2), length 1448
          00:00:00.000014 IP6 ::1 > ::1: frag (1448|32)
          00:00:00.000050 IP6 ::1 > ::1: ESP(spi=0x00000001,seq=0x3), length 1272
          (^ ICMPv6 Parameter Problem)
          00:00:00.000022 IP6 ::1 > ::1: ESP(spi=0x00000001,seq=0x5), length 136
      
        4) Compare it to a non-connected socket
          # perl -e 'print "@" x 1500' | socat - udp6-sendto:[::1]:12345
          00:00:40.535488 IP6 ::1 > ::1: frag (0|1448) ESP(spi=0x00000001,seq=0x6), length 1448
          00:00:00.000010 IP6 ::1 > ::1: frag (1448|64)
      
      What happens in step (3) is:
      
        1) when connecting the socket in __ip6_datagram_connect(), we
           perform an XFRM lookup, miss the flow cache, create an XFRM
           bundle, and cache the destination,
      
        2) afterwards, when sending the datagram, we perform an XFRM lookup,
           again, miss the flow cache (due to mismatch of flowi6_iif and
           flowi6_oif, which is an issue of its own), and recreate an XFRM
           bundle based on the cached (and already transformed) destination.
      
      To prevent the recreation of an XFRM bundle, avoid an XFRM lookup
      altogether whenever we already have a destination entry cached in the
      socket. This prevents the path MTU shrinkage and brings us on par with
      UDPv4.
      
      The fix also benefits connected PINGv6 sockets, another user of
      ip6_sk_dst_lookup_flow(), who also suffer messages being transformed
      twice.
      
      Joint work with Hannes Frederic Sowa.
      Reported-by: default avatarJan Tluka <jtluka@redhat.com>
      Signed-off-by: default avatarJakub Sitnicki <jkbs@redhat.com>
      Acked-by: default avatarHannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      4d82f395
    • Guillaume Nault's avatar
      l2tp: fix configuration passed to setup_udp_tunnel_sock() · 05cbd46b
      Guillaume Nault authored
      [ Upstream commit a5c5e2da ]
      
      Unused fields of udp_cfg must be all zeros. Otherwise
      setup_udp_tunnel_sock() fills ->gro_receive and ->gro_complete
      callbacks with garbage, eventually resulting in panic when used by
      udp_gro_receive().
      
      [   72.694123] BUG: unable to handle kernel paging request at ffff880033f87d78
      [   72.695518] IP: [<ffff880033f87d78>] 0xffff880033f87d78
      [   72.696530] PGD 26e2067 PUD 26e3067 PMD 342ed063 PTE 8000000033f87163
      [   72.696530] Oops: 0011 [#1] SMP KASAN
      [   72.696530] Modules linked in: l2tp_ppp l2tp_netlink l2tp_core ip6_udp_tunnel udp_tunnel pptp gre pppox ppp_generic slhc crc32c_intel ghash_clmulni_intel jitterentropy_rng sha256_generic hmac drbg ansi_cprng aesni_intel evdev aes_x86_64 ablk_helper cryptd lrw gf128mul glue_helper serio_raw acpi_cpufreq button proc\
      essor ext4 crc16 jbd2 mbcache virtio_blk virtio_net virtio_pci virtio_ring virtio
      [   72.696530] CPU: 3 PID: 0 Comm: swapper/3 Not tainted 4.7.0-rc1 #1
      [   72.696530] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Debian-1.8.2-1 04/01/2014
      [   72.696530] task: ffff880035b59700 ti: ffff880035b70000 task.ti: ffff880035b70000
      [   72.696530] RIP: 0010:[<ffff880033f87d78>]  [<ffff880033f87d78>] 0xffff880033f87d78
      [   72.696530] RSP: 0018:ffff880035f87bc0  EFLAGS: 00010246
      [   72.696530] RAX: ffffed000698f996 RBX: ffff88003326b840 RCX: ffffffff814cc823
      [   72.696530] RDX: ffff88003326b840 RSI: ffff880033e48038 RDI: ffff880034c7c780
      [   72.696530] RBP: ffff880035f87c18 R08: 000000000000a506 R09: 0000000000000000
      [   72.696530] R10: ffff880035f87b38 R11: ffff880034b9344d R12: 00000000ebfea715
      [   72.696530] R13: 0000000000000000 R14: ffff880034c7c780 R15: 0000000000000000
      [   72.696530] FS:  0000000000000000(0000) GS:ffff880035f80000(0000) knlGS:0000000000000000
      [   72.696530] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [   72.696530] CR2: ffff880033f87d78 CR3: 0000000033c98000 CR4: 00000000000406a0
      [   72.696530] Stack:
      [   72.696530]  ffffffff814cc834 ffff880034b93468 0000001481416818 ffff88003326b874
      [   72.696530]  ffff880034c7ccb0 ffff880033e48038 ffff88003326b840 ffff880034b93462
      [   72.696530]  ffff88003326b88a ffff88003326b88c ffff880034b93468 ffff880035f87c70
      [   72.696530] Call Trace:
      [   72.696530]  <IRQ>
      [   72.696530]  [<ffffffff814cc834>] ? udp_gro_receive+0x1c6/0x1f9
      [   72.696530]  [<ffffffff814ccb1c>] udp4_gro_receive+0x2b5/0x310
      [   72.696530]  [<ffffffff814d989b>] inet_gro_receive+0x4a3/0x4cd
      [   72.696530]  [<ffffffff81431b32>] dev_gro_receive+0x584/0x7a3
      [   72.696530]  [<ffffffff810adf7a>] ? __lock_is_held+0x29/0x64
      [   72.696530]  [<ffffffff814321f7>] napi_gro_receive+0x124/0x21d
      [   72.696530]  [<ffffffffa000b145>] virtnet_receive+0x8df/0x8f6 [virtio_net]
      [   72.696530]  [<ffffffffa000b27e>] virtnet_poll+0x1d/0x8d [virtio_net]
      [   72.696530]  [<ffffffff81431350>] net_rx_action+0x15b/0x3b9
      [   72.696530]  [<ffffffff815893d6>] __do_softirq+0x216/0x546
      [   72.696530]  [<ffffffff81062392>] irq_exit+0x49/0xb6
      [   72.696530]  [<ffffffff81588e9a>] do_IRQ+0xe2/0xfa
      [   72.696530]  [<ffffffff81587a49>] common_interrupt+0x89/0x89
      [   72.696530]  <EOI>
      [   72.696530]  [<ffffffff810b05df>] ? trace_hardirqs_on_caller+0x229/0x270
      [   72.696530]  [<ffffffff8102b3c7>] ? default_idle+0x1c/0x2d
      [   72.696530]  [<ffffffff8102b3c5>] ? default_idle+0x1a/0x2d
      [   72.696530]  [<ffffffff8102bb8c>] arch_cpu_idle+0xa/0xc
      [   72.696530]  [<ffffffff810a6c39>] default_idle_call+0x1a/0x1c
      [   72.696530]  [<ffffffff810a6d96>] cpu_startup_entry+0x15b/0x20f
      [   72.696530]  [<ffffffff81039a81>] start_secondary+0x12c/0x133
      [   72.696530] Code: ff ff ff ff ff ff ff ff ff ff 7f ff ff ff ff ff ff ff 7f 00 7e f8 33 00 88 ff ff 6d 61 58 81 ff ff ff ff 5e de 0a 81 ff ff ff ff <00> 5c e2 34 00 88 ff ff 00 00 00 00 00 00 00 00 00 00 00 00 00
      [   72.696530] RIP  [<ffff880033f87d78>] 0xffff880033f87d78
      [   72.696530]  RSP <ffff880035f87bc0>
      [   72.696530] CR2: ffff880033f87d78
      [   72.696530] ---[ end trace ad7758b9a1dccf99 ]---
      [   72.696530] Kernel panic - not syncing: Fatal exception in interrupt
      [   72.696530] Kernel Offset: disabled
      [   72.696530] ---[ end Kernel panic - not syncing: Fatal exception in interrupt
      
      v2: use empty initialiser instead of "{ NULL }" to avoid relying on
          first field's type.
      
      Fixes: 38fd2af2 ("udp: Add socket based GRO and config")
      Signed-off-by: default avatarGuillaume Nault <g.nault@alphalink.fr>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      05cbd46b