1. 24 Jun, 2016 40 commits
    • Al Viro's avatar
      fix d_walk()/non-delayed __d_free() race · 21ceba1b
      Al Viro authored
      commit 3d56c25e upstream.
      
      Ascend-to-parent logics in d_walk() depends on all encountered child
      dentries not getting freed without an RCU delay.  Unfortunately, in
      quite a few cases it is not true, with hard-to-hit oopsable race as
      the result.
      
      Fortunately, the fix is simiple; right now the rule is "if it ever
      been hashed, freeing must be delayed" and changing it to "if it
      ever had a parent, freeing must be delayed" closes that hole and
      covers all cases the old rule used to cover.  Moreover, pipes and
      sockets remain _not_ covered, so we do not introduce RCU delay in
      the cases which are the reason for having that delay conditional
      in the first place.
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      21ceba1b
    • Jann Horn's avatar
      sched: panic on corrupted stack end · 80f7d5b0
      Jann Horn authored
      commit 29d64551 upstream.
      
      Until now, hitting this BUG_ON caused a recursive oops (because oops
      handling involves do_exit(), which calls into the scheduler, which in
      turn raises an oops), which caused stuff below the stack to be
      overwritten until a panic happened (e.g.  via an oops in interrupt
      context, caused by the overwritten CPU index in the thread_info).
      
      Just panic directly.
      Signed-off-by: default avatarJann Horn <jannh@google.com>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      80f7d5b0
    • Jann Horn's avatar
      proc: prevent stacking filesystems on top · c4fd3264
      Jann Horn authored
      commit e54ad7f1 upstream.
      
      This prevents stacking filesystems (ecryptfs and overlayfs) from using
      procfs as lower filesystem.  There is too much magic going on inside
      procfs, and there is no good reason to stack stuff on top of procfs.
      
      (For example, procfs does access checks in VFS open handlers, and
      ecryptfs by design calls open handlers from a kernel thread that doesn't
      drop privileges or so.)
      Signed-off-by: default avatarJann Horn <jannh@google.com>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      c4fd3264
    • Andy Lutomirski's avatar
      x86/entry/traps: Don't force in_interrupt() to return true in IST handlers · af1a31e7
      Andy Lutomirski authored
      commit aaee8c3c upstream.
      
      Forcing in_interrupt() to return true if we're not in a bona fide
      interrupt confuses the softirq code.  This fixes warnings like:
      
        NOHZ: local_softirq_pending 282
      
      ... which can happen when running things like selftests/x86.
      
      This will change perf's static percpu buffer usage in IST context.
      I think this is okay, and it's changing the behavior to match
      historical (pre-4.0) behavior.
      Signed-off-by: default avatarAndy Lutomirski <luto@kernel.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Fixes: 95927475 ("x86, traps: Track entry into and exit from IST context")
      Link: http://lkml.kernel.org/r/cdc215f94d118d691d73df35275022331156fb45.1464130360.git.luto@kernel.orgSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      af1a31e7
    • Gerald Schaefer's avatar
      mm: thp: broken page count after commit aa88b68c · a5f09a92
      Gerald Schaefer authored
      commit 770a5370 upstream.
      
      Christian Borntraeger reported a kernel panic after corrupt page counts,
      and it turned out to be a regression introduced with commit aa88b68c
      ("thp: keep huge zero page pinned until tlb flush"), at least on s390.
      
      put_huge_zero_page() was moved over from zap_huge_pmd() to
      release_pages(), and it was replaced by tlb_remove_page().  However,
      release_pages() might not always be triggered by (the arch-specific)
      tlb_remove_page().
      
      On s390 we call free_page_and_swap_cache() from tlb_remove_page(), and
      not tlb_flush_mmu() -> free_pages_and_swap_cache() like the generic
      version, because we don't use the MMU-gather logic.  Although both
      functions have very similar names, they are doing very unsimilar things,
      in particular free_page_xxx is just doing a put_page(), while
      free_pages_xxx calls release_pages().
      
      This of course results in very harmful put_page()s on the huge zero
      page, on architectures where tlb_remove_page() is implemented in this
      way.  It seems to affect only s390 and sh, but sh doesn't have THP
      support, so the problem (currently) probably only exists on s390.
      
      The following quick hack fixed the issue:
      
      Link: http://lkml.kernel.org/r/20160602172141.75c006a9@thinkpadSigned-off-by: default avatarGerald Schaefer <gerald.schaefer@de.ibm.com>
      Reported-by: default avatarChristian Borntraeger <borntraeger@de.ibm.com>
      Tested-by: default avatarChristian Borntraeger <borntraeger@de.ibm.com>
      Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      a5f09a92
    • Prasun Maiti's avatar
      wext: Fix 32 bit iwpriv compatibility issue with 64 bit Kernel · 99642546
      Prasun Maiti authored
      commit 3d5fdff4 upstream.
      
      iwpriv app uses iw_point structure to send data to Kernel. The iw_point
      structure holds a pointer. For compatibility Kernel converts the pointer
      as required for WEXT IOCTLs (SIOCIWFIRST to SIOCIWLAST). Some drivers
      may use iw_handler_def.private_args to populate iwpriv commands instead
      of iw_handler_def.private. For those case, the IOCTLs from
      SIOCIWFIRSTPRIV to SIOCIWLASTPRIV will follow the path ndo_do_ioctl().
      Accordingly when the filled up iw_point structure comes from 32 bit
      iwpriv to 64 bit Kernel, Kernel will not convert the pointer and sends
      it to driver. So, the driver may get the invalid data.
      
      The pointer conversion for the IOCTLs (SIOCIWFIRSTPRIV to
      SIOCIWLASTPRIV), which follow the path ndo_do_ioctl(), is mandatory.
      This patch adds pointer conversion from 32 bit to 64 bit and vice versa,
      if the ioctl comes from 32 bit iwpriv to 64 bit Kernel.
      Signed-off-by: default avatarPrasun Maiti <prasunmaiti87@gmail.com>
      Signed-off-by: default avatarUjjal Roy <royujjal@gmail.com>
      Tested-by: default avatarDibyajyoti Ghosh <dibyajyotig@gmail.com>
      Signed-off-by: default avatarJohannes Berg <johannes.berg@intel.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      99642546
    • Jann Horn's avatar
      ecryptfs: forbid opening files without mmap handler · de4307f6
      Jann Horn authored
      commit 2f36db71 upstream.
      
      This prevents users from triggering a stack overflow through a recursive
      invocation of pagefault handling that involves mapping procfs files into
      virtual memory.
      Signed-off-by: default avatarJann Horn <jannh@google.com>
      Acked-by: default avatarTyler Hicks <tyhicks@canonical.com>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      de4307f6
    • Tejun Heo's avatar
      memcg: add RCU locking around css_for_each_descendant_pre() in memcg_offline_kmem() · 50c596c2
      Tejun Heo authored
      commit 3a06bb78 upstream.
      
      memcg_offline_kmem() may be called from memcg_free_kmem() after a css
      init failure.  memcg_free_kmem() is a ->css_free callback which is
      called without cgroup_mutex and memcg_offline_kmem() ends up using
      css_for_each_descendant_pre() without any locking.  Fix it by adding rcu
      read locking around it.
      
          mkdir: cannot create directory `65530': No space left on device
          ===============================
          [ INFO: suspicious RCU usage. ]
          4.6.0-work+ #321 Not tainted
          -------------------------------
          kernel/cgroup.c:4008 cgroup_mutex or RCU read lock required!
           [  527.243970] other info that might help us debug this:
           [  527.244715]
          rcu_scheduler_active = 1, debug_locks = 0
          2 locks held by kworker/0:5/1664:
           #0:  ("cgroup_destroy"){.+.+..}, at: [<ffffffff81060ab5>] process_one_work+0x165/0x4a0
           #1:  ((&css->destroy_work)#3){+.+...}, at: [<ffffffff81060ab5>] process_one_work+0x165/0x4a0
           [  527.248098] stack backtrace:
          CPU: 0 PID: 1664 Comm: kworker/0:5 Not tainted 4.6.0-work+ #321
          Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.9.1-1.fc24 04/01/2014
          Workqueue: cgroup_destroy css_free_work_fn
          Call Trace:
            dump_stack+0x68/0xa1
            lockdep_rcu_suspicious+0xd7/0x110
            css_next_descendant_pre+0x7d/0xb0
            memcg_offline_kmem.part.44+0x4a/0xc0
            mem_cgroup_css_free+0x1ec/0x200
            css_free_work_fn+0x49/0x5e0
            process_one_work+0x1c5/0x4a0
            worker_thread+0x49/0x490
            kthread+0xea/0x100
            ret_from_fork+0x1f/0x40
      
      Link: http://lkml.kernel.org/r/20160526203018.GG23194@mtj.duckdns.orgSigned-off-by: default avatarTejun Heo <tj@kernel.org>
      Acked-by: default avatarVladimir Davydov <vdavydov@virtuozzo.com>
      Acked-by: default avatarJohannes Weiner <hannes@cmpxchg.org>
      Cc: Michal Hocko <mhocko@kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      50c596c2
    • Helge Deller's avatar
      parisc: Fix pagefault crash in unaligned __get_user() call · 05e1ad39
      Helge Deller authored
      commit 8b78f260 upstream.
      
      One of the debian buildd servers had this crash in the syslog without
      any other information:
      
       Unaligned handler failed, ret = -2
       clock_adjtime (pid 22578): Unaligned data reference (code 28)
       CPU: 1 PID: 22578 Comm: clock_adjtime Tainted: G  E  4.5.0-2-parisc64-smp #1 Debian 4.5.4-1
       task: 000000007d9960f8 ti: 00000001bde7c000 task.ti: 00000001bde7c000
      
            YZrvWESTHLNXBCVMcbcbcbcbOGFRQPDI
       PSW: 00001000000001001111100000001111 Tainted: G            E
       r00-03  000000ff0804f80f 00000001bde7c2b0 00000000402d2be8 00000001bde7c2b0
       r04-07  00000000409e1fd0 00000000fa6f7fff 00000001bde7c148 00000000fa6f7fff
       r08-11  0000000000000000 00000000ffffffff 00000000fac9bb7b 000000000002b4d4
       r12-15  000000000015241c 000000000015242c 000000000000002d 00000000fac9bb7b
       r16-19  0000000000028800 0000000000000001 0000000000000070 00000001bde7c218
       r20-23  0000000000000000 00000001bde7c210 0000000000000002 0000000000000000
       r24-27  0000000000000000 0000000000000000 00000001bde7c148 00000000409e1fd0
       r28-31  0000000000000001 00000001bde7c320 00000001bde7c350 00000001bde7c218
       sr00-03  0000000001200000 0000000001200000 0000000000000000 0000000001200000
       sr04-07  0000000000000000 0000000000000000 0000000000000000 0000000000000000
      
       IASQ: 0000000000000000 0000000000000000 IAOQ: 00000000402d2e84 00000000402d2e88
        IIR: 0ca0d089    ISR: 0000000001200000  IOR: 00000000fa6f7fff
        CPU:        1   CR30: 00000001bde7c000 CR31: ffffffffffffffff
        ORIG_R28: 00000002369fe628
        IAOQ[0]: compat_get_timex+0x2dc/0x3c0
        IAOQ[1]: compat_get_timex+0x2e0/0x3c0
        RP(r2): compat_get_timex+0x40/0x3c0
       Backtrace:
        [<00000000402d4608>] compat_SyS_clock_adjtime+0x40/0xc0
        [<0000000040205024>] syscall_exit+0x0/0x14
      
      This means the userspace program clock_adjtime called the clock_adjtime()
      syscall and then crashed inside the compat_get_timex() function.
      Syscalls should never crash programs, but instead return EFAULT.
      
      The IIR register contains the executed instruction, which disassebles
      into "ldw 0(sr3,r5),r9".
      This load-word instruction is part of __get_user() which tried to read the word
      at %r5/IOR (0xfa6f7fff). This means the unaligned handler jumped in.  The
      unaligned handler is able to emulate all ldw instructions, but it fails if it
      fails to read the source e.g. because of page fault.
      
      The following program reproduces the problem:
      
      #define _GNU_SOURCE
      #include <unistd.h>
      #include <sys/syscall.h>
      #include <sys/mman.h>
      
      int main(void) {
              /* allocate 8k */
              char *ptr = mmap(NULL, 2*4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0);
              /* free second half (upper 4k) and make it invalid. */
              munmap(ptr+4096, 4096);
              /* syscall where first int is unaligned and clobbers into invalid memory region */
              /* syscall should return EFAULT */
              return syscall(__NR_clock_adjtime, 0, ptr+4095);
      }
      
      To fix this issue we simply need to check if the faulting instruction address
      is in the exception fixup table when the unaligned handler failed. If it
      is, call the fixup routine instead of crashing.
      
      While looking at the unaligned handler I found another issue as well: The
      target register should not be modified if the handler was unsuccessful.
      Signed-off-by: default avatarHelge Deller <deller@gmx.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      05e1ad39
    • hongkun.cao's avatar
      pinctrl: mediatek: fix dual-edge code defect · a958e05d
      hongkun.cao authored
      commit 5edf673d upstream.
      
      When a dual-edge irq is triggered, an incorrect irq will be reported on
      condition that the external signal is not stable and this incorrect irq
      has been registered.
      Correct the register offset.
      Signed-off-by: default avatarHongkun Cao <hongkun.cao@mediatek.com>
      Reviewed-by: default avatarMatthias Brugger <matthias.bgg@gmail.com>
      Signed-off-by: default avatarLinus Walleij <linus.walleij@linaro.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      a958e05d
    • Aneesh Kumar K.V's avatar
      powerpc/mm/hash: Fix the reference bit update when handling hash fault · 05036294
      Aneesh Kumar K.V authored
      commit dc47c0c1 upstream.
      
      When we converted the asm routines to C functions, we missed updating
      HPTE_R_R based on _PAGE_ACCESSED. ASM code used to copy over the lower
      bits from pte via.
      
      andi.	r3,r30,0x1fe		/* Get basic set of flags */
      
      We also update the code such that we won't update the Change bit ('C'
      bit) always. This was added by commit c5cf0e30 ("powerpc: Fix
      buglet with MMU hash management").
      
      With hash64, we need to make sure that hardware doesn't do a pte update
      directly. This is because we do end up with entries in TLB with no hash
      page table entry. This happens because when we find a hash bucket full,
      we "evict" a more/less random entry from it. When we do that we don't
      invalidate the TLB (hpte_remove) because we assume the old translation
      is still technically "valid". For more info look at commit
      0608d692("powerpc/mm: Always invalidate tlb on hpte invalidate and
      update").
      
      Thus it's critical that valid hash PTEs always have reference bit set
      and writeable ones have change bit set. We do this by hashing a
      non-dirty linux PTE as read-only and always setting _PAGE_ACCESSED (and
      thus R) when hashing anything else in. Any attempt by Linux at clearing
      those bits also removes the corresponding hash entry.
      
      Commit 5cf0e30bf3d8 did that for 'C' bit by enabling 'C' bit always.
      We don't really need to do that because we never map a RW pte entry
      without setting 'C' bit. On READ fault on a RW pte entry, we still map
      it READ only, hence a store update in the page will still cause a hash
      pte fault.
      
      This patch reverts the part of commit c5cf0e30 ("[PATCH] powerpc:
      Fix buglet with MMU hash management") and retain the updatepp part.
      
      - If we hit the updatepp path on native, the old code without that
        commit, would fail to set C bcause native_hpte_updatepp()
        was implemented to filter the same bits as H_PROTECT and not let C
        through thus we would "upgrade" a RO HPTE to RW without setting C
        thus causing the bug. So the real fix in that commit was the change
        to native_hpte_updatepp
      
      Fixes: 89ff7250 ("powerpc/mm: Convert __hash_page_64K to C")
      Signed-off-by: default avatarBenjamin Herrenschmidt <benh@kernel.crashing.org>
      Signed-off-by: default avatarAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      05036294
    • Thomas Huth's avatar
      powerpc/pseries: Add POWER8NVL support to ibm,client-architecture-support call · 39d615c3
      Thomas Huth authored
      commit 7cc85103 upstream.
      
      If we do not provide the PVR for POWER8NVL, a guest on this system
      currently ends up in PowerISA 2.06 compatibility mode on KVM, since QEMU
      does not provide a generic PowerISA 2.07 mode yet. So some new
      instructions from POWER8 (like "mtvsrd") get disabled for the guest,
      resulting in crashes when using code compiled explicitly for
      POWER8 (e.g. with the "-mcpu=power8" option of GCC).
      
      Fixes: ddee09c0 ("powerpc: Add PVR for POWER8NVL processor")
      Signed-off-by: default avatarThomas Huth <thuth@redhat.com>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      39d615c3
    • Thomas Huth's avatar
      powerpc: Use privileged SPR number for MMCR2 · 59f286ba
      Thomas Huth authored
      commit 8dd75ccb upstream.
      
      We are already using the privileged versions of MMCR0, MMCR1
      and MMCRA in the kernel, so for MMCR2, we should better use
      the privileged versions, too, to be consistent.
      
      Fixes: 240686c1 ("powerpc: Initialise PMU related regs on Power8")
      Suggested-by: default avatarPaul Mackerras <paulus@ozlabs.org>
      Signed-off-by: default avatarThomas Huth <thuth@redhat.com>
      Acked-by: default avatarPaul Mackerras <paulus@ozlabs.org>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      59f286ba
    • Thomas Huth's avatar
      powerpc: Fix definition of SIAR and SDAR registers · d2f0f1d0
      Thomas Huth authored
      commit d23fac2b upstream.
      
      The SIAR and SDAR registers are available twice, one time as SPRs
      780 / 781 (unprivileged, but read-only), and one time as the SPRs
      796 / 797 (privileged, but read and write). The Linux kernel code
      currently uses the unprivileged  SPRs - while this is OK for reading,
      writing to that register of course does not work.
      Since the KVM code tries to write to this register, too (see the mtspr
      in book3s_hv_rmhandlers.S), the contents of this register sometimes get
      lost for the guests, e.g. during migration of a VM.
      To fix this issue, simply switch to the privileged SPR numbers instead.
      Signed-off-by: default avatarThomas Huth <thuth@redhat.com>
      Acked-by: default avatarPaul Mackerras <paulus@ozlabs.org>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      d2f0f1d0
    • Russell Currey's avatar
      powerpc/pseries/eeh: Handle RTAS delay requests in configure_bridge · f3db6767
      Russell Currey authored
      commit 871e178e upstream.
      
      In the "ibm,configure-pe" and "ibm,configure-bridge" RTAS calls, the
      spec states that values of 9900-9905 can be returned, indicating that
      software should delay for 10^x (where x is the last digit, i.e. 990x)
      milliseconds and attempt the call again. Currently, the kernel doesn't
      know about this, and respecting it fixes some PCI failures when the
      hypervisor is busy.
      
      The delay is capped at 0.2 seconds.
      Signed-off-by: default avatarRussell Currey <ruscur@russell.cc>
      Acked-by: default avatarGavin Shan <gwshan@linux.vnet.ibm.com>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f3db6767
    • Will Deacon's avatar
      arm64: mm: always take dirty state from new pte in ptep_set_access_flags · 24ba19ec
      Will Deacon authored
      commit 0106d456 upstream.
      
      Commit 66dbd6e6 ("arm64: Implement ptep_set_access_flags() for
      hardware AF/DBM") ensured that pte flags are updated atomically in the
      face of potential concurrent, hardware-assisted updates. However, Alex
      reports that:
      
       | This patch breaks swapping for me.
       | In the broken case, you'll see either systemd cpu time spike (because
       | it's stuck in a page fault loop) or the system hang (because the
       | application owning the screen is stuck in a page fault loop).
      
      It turns out that this is because the 'dirty' argument to
      ptep_set_access_flags is always 0 for read faults, and so we can't use
      it to set PTE_RDONLY. The failing sequence is:
      
        1. We put down a PTE_WRITE | PTE_DIRTY | PTE_AF pte
        2. Memory pressure -> pte_mkold(pte) -> clear PTE_AF
        3. A read faults due to the missing access flag
        4. ptep_set_access_flags is called with dirty = 0, due to the read fault
        5. pte is then made PTE_WRITE | PTE_DIRTY | PTE_AF | PTE_RDONLY (!)
        6. A write faults, but pte_write is true so we get stuck
      
      The solution is to check the new page table entry (as would be done by
      the generic, non-atomic definition of ptep_set_access_flags that just
      calls set_pte_at) to establish the dirty state.
      
      Fixes: 66dbd6e6 ("arm64: Implement ptep_set_access_flags() for hardware AF/DBM")
      Reviewed-by: default avatarCatalin Marinas <catalin.marinas@arm.com>
      Reported-by: default avatarAlexander Graf <agraf@suse.de>
      Tested-by: default avatarAlexander Graf <agraf@suse.de>
      Signed-off-by: default avatarWill Deacon <will.deacon@arm.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      24ba19ec
    • Catalin Marinas's avatar
      arm64: Provide "model name" in /proc/cpuinfo for PER_LINUX32 tasks · 60a61375
      Catalin Marinas authored
      commit e47b020a upstream.
      
      This patch brings the PER_LINUX32 /proc/cpuinfo format more in line with
      the 32-bit ARM one by providing an additional line:
      
      model name      : ARMv8 Processor rev X (v8l)
      Acked-by: default avatarWill Deacon <will.deacon@arm.com>
      Signed-off-by: default avatarCatalin Marinas <catalin.marinas@arm.com>
      Signed-off-by: default avatarWill Deacon <will.deacon@arm.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      60a61375
    • Tom Lendacky's avatar
      crypto: ccp - Fix AES XTS error for request sizes above 4096 · d6d35c96
      Tom Lendacky authored
      commit ab6a11a7 upstream.
      
      The ccp-crypto module for AES XTS support has a bug that can allow requests
      greater than 4096 bytes in size to be passed to the CCP hardware. The CCP
      hardware does not support request sizes larger than 4096, resulting in
      incorrect output. The request should actually be handled by the fallback
      mechanism instantiated by the ccp-crypto module.
      
      Add a check to insure the request size is less than or equal to the maximum
      supported size and use the fallback mechanism if it is not.
      Signed-off-by: default avatarTom Lendacky <thomas.lendacky@amd.com>
      Signed-off-by: default avatarHerbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      d6d35c96
    • Arnd Bergmann's avatar
      crypto: public_key: select CRYPTO_AKCIPHER · 5aabc1b7
      Arnd Bergmann authored
      commit bad6a185 upstream.
      
      In some rare randconfig builds, we can end up with
      ASYMMETRIC_PUBLIC_KEY_SUBTYPE enabled but CRYPTO_AKCIPHER disabled,
      which fails to link because of the reference to crypto_alloc_akcipher:
      
      crypto/built-in.o: In function `public_key_verify_signature':
      :(.text+0x110e4): undefined reference to `crypto_alloc_akcipher'
      
      This adds a Kconfig 'select' statement to ensure the dependency
      is always there.
      Signed-off-by: default avatarArnd Bergmann <arnd@arndb.de>
      Signed-off-by: default avatarHerbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      5aabc1b7
    • Marc Zyngier's avatar
      irqchip/gic-v3: Fix ICC_SGI1R_EL1.INTID decoding mask · 48f23acc
      Marc Zyngier authored
      commit dd5f1b04 upstream.
      
      The INTID mask is wrong, and is made a signed value, which has
      nteresting effects in the KVM emulation. Let's sanitize it.
      Signed-off-by: default avatarMarc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      48f23acc
    • Michael Holzheu's avatar
      s390/bpf: reduce maximum program size to 64 KB · 73e66190
      Michael Holzheu authored
      commit 0fa96355 upstream.
      
      The s390 BFP compiler currently uses relative branch instructions
      that only support jumps up to 64 KB. Examples are "j", "jnz", "cgrj",
      etc.  Currently the maximum size of s390 BPF programs is set
      to 0x7ffff.  If branches over 64 KB are generated the, kernel can
      crash due to incorrect code.
      
      So fix this an reduce the maximum size to 64 KB. Programs larger than
      that will be interpreted.
      
      Fixes: ce2b6ad9 ("s390/bpf: increase BPF_SIZE_MAX")
      Signed-off-by: default avatarMichael Holzheu <holzheu@linux.vnet.ibm.com>
      Signed-off-by: default avatarMartin Schwidefsky <schwidefsky@de.ibm.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      73e66190
    • Michael Holzheu's avatar
      s390/bpf: fix recache skb->data/hlen for skb_vlan_push/pop · 66babd49
      Michael Holzheu authored
      commit 6edf0aa4 upstream.
      
      In case of usage of skb_vlan_push/pop, in the prologue we store
      the SKB pointer on the stack and restore it after BPF_JMP_CALL
      to skb_vlan_push/pop.
      
      Unfortunately currently there are two bugs in the code:
      
       1) The wrong stack slot (offset 170 instead of 176) is used
       2) The wrong register (W1 instead of B1) is saved
      
      So fix this and use correct stack slot and register.
      
      Fixes: 9db7f2b8 ("s390/bpf: recache skb->data/hlen for skb_vlan_push/pop")
      Signed-off-by: default avatarMichael Holzheu <holzheu@linux.vnet.ibm.com>
      Signed-off-by: default avatarMartin Schwidefsky <schwidefsky@de.ibm.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      66babd49
    • Ricardo Ribalda Delgado's avatar
      gpiolib: Fix unaligned used of reference counters · 55d32af7
      Ricardo Ribalda Delgado authored
      commit f4833b8c upstream.
      
      gpiolib relies on the reference counters to clean up the gpio_device
      structure.
      
      Although the number of get/put is properly aligned on gpiolib.c
      itself, it does not take into consideration how the referece counters
      are affected by other external functions such as cdev_add and device_add.
      
      Because of this, after the last call to put_device, the reference counter
      has a value of +3, therefore never calling gpiodevice_release.
      
      Due to the fact that some of the device  has already been cleaned on
      gpiochip_remove, the library will end up OOPsing the kernel (e.g. a call
      to of_gpiochip_find_and_xlate).
      Signed-off-by: default avatarRicardo Ribalda Delgado <ricardo.ribalda@gmail.com>
      Signed-off-by: default avatarLinus Walleij <linus.walleij@linaro.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      55d32af7
    • Ricardo Ribalda Delgado's avatar
      gpiolib: Fix NULL pointer deference · d917611d
      Ricardo Ribalda Delgado authored
      commit 11f33a6d upstream.
      
      Under some circumstances, a gpiochip might be half cleaned from the
      gpio_device list.
      
      This patch makes sure that the chip pointer is still valid, before
      calling the match function.
      
      [  104.088296] BUG: unable to handle kernel NULL pointer dereference at
      0000000000000090
      [  104.089772] IP: [<ffffffff813d2045>] of_gpiochip_find_and_xlate+0x15/0x80
      [  104.128273] Call Trace:
      [  104.129802]  [<ffffffff813d2030>] ? of_parse_own_gpio+0x1f0/0x1f0
      [  104.131353]  [<ffffffff813cd910>] gpiochip_find+0x60/0x90
      [  104.132868]  [<ffffffff813d21ba>] of_get_named_gpiod_flags+0x9a/0x120
      ...
      [  104.141586]  [<ffffffff8163d12b>] gpio_led_probe+0x11b/0x360
      Signed-off-by: default avatarRicardo Ribalda Delgado <ricardo.ribalda@gmail.com>
      Signed-off-by: default avatarLinus Walleij <linus.walleij@linaro.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      d917611d
    • Ben Dooks's avatar
      gpio: bcm-kona: fix bcm_kona_gpio_reset() warnings · ac63dc3f
      Ben Dooks authored
      commit b66b2a0a upstream.
      
      The bcm_kona_gpio_reset() calls bcm_kona_gpio_write_lock_regs()
      with what looks like the wrong parameter. The write_lock_regs
      function takes a pointer to the registers, not the bcm_kona_gpio
      structure.
      
      Fix the warning, and probably bug by changing the function to
      pass reg_base instead of kona_gpio, fixing the following warning:
      
      drivers/gpio/gpio-bcm-kona.c:550:47: warning: incorrect type in argument 1
        (different address spaces)
        expected void [noderef] <asn:2>*reg_base
        got struct bcm_kona_gpio *kona_gpio
        warning: incorrect type in argument 1 (different address spaces)
        expected void [noderef] <asn:2>*reg_base
        got struct bcm_kona_gpio *kona_gpio
      Signed-off-by: default avatarBen Dooks <ben.dooks@codethink.co.uk>
      Acked-by: default avatarRay Jui <ray.jui@broadcom.com>
      Reviewed-by: default avatarMarkus Mayer <mmayer@broadcom.com>
      Signed-off-by: default avatarLinus Walleij <linus.walleij@linaro.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      ac63dc3f
    • Linus Walleij's avatar
      gpio: bail out silently on NULL descriptors · ecfb6ce5
      Linus Walleij authored
      commit 54d77198 upstream.
      
      In fdeb8e15
      ("gpio: reflect base and ngpio into gpio_device")
      assumed that GPIO descriptors are either valid or error
      pointers, but gpiod_get_[index_]optional() actually return
      NULL descriptors and then all subsequent calls should just
      bail out.
      
      Cc: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
      Cc: Florian Fainelli <f.fainelli@gmail.com>
      Cc: Andrew Lunn <andrew@lunn.ch>
      Fixes: fdeb8e15 ("gpio: reflect base and ngpio into gpio_device")
      Reported-by: default avatarUwe Kleine-König <u.kleine-koenig@pengutronix.de>
      Signed-off-by: default avatarLinus Walleij <linus.walleij@linaro.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      ecfb6ce5
    • Russell King's avatar
      ARM: fix PTRACE_SETVFPREGS on SMP systems · 3a2ec155
      Russell King authored
      commit e2dfb4b8 upstream.
      
      PTRACE_SETVFPREGS fails to properly mark the VFP register set to be
      reloaded, because it undoes one of the effects of vfp_flush_hwstate().
      
      Specifically vfp_flush_hwstate() sets thread->vfpstate.hard.cpu to
      an invalid CPU number, but vfp_set() overwrites this with the original
      CPU number, thereby rendering the hardware state as apparently "valid",
      even though the software state is more recent.
      
      Fix this by reverting the previous change.
      
      Fixes: 8130b9d7 ("ARM: 7308/1: vfp: flush thread hwstate before copying ptrace registers")
      Acked-by: default avatarWill Deacon <will.deacon@arm.com>
      Tested-by: default avatarSimon Marchi <simon.marchi@ericsson.com>
      Signed-off-by: default avatarRussell King <rmk+kernel@armlinux.org.uk>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      3a2ec155
    • Torsten Hilbrich's avatar
      ALSA: hda/realtek: Add T560 docking unit fixup · 0f16822c
      Torsten Hilbrich authored
      commit dab38e43 upstream.
      
      Tested with Lenovo Ultradock. Fixes the non-working headphone jack on
      the docking unit.
      Signed-off-by: default avatarTorsten Hilbrich <torsten.hilbrich@secunet.com>
      Tested-by: default avatarTorsten Hilbrich <torsten.hilbrich@secunet.com>
      Signed-off-by: default avatarTakashi Iwai <tiwai@suse.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      0f16822c
    • Kailang Yang's avatar
      ALSA: hda/realtek - Add support for new codecs ALC700/ALC701/ALC703 · b4d79b9b
      Kailang Yang authored
      commit 6fbae35a upstream.
      
      Support new codecs for ALC700/ALC701/ALC703.
      Signed-off-by: default avatarKailang Yang <kailang@realtek.com>
      Signed-off-by: default avatarTakashi Iwai <tiwai@suse.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      b4d79b9b
    • Kailang Yang's avatar
      ALSA: hda/realtek - ALC256 speaker noise issue · cd8edeef
      Kailang Yang authored
      commit e69e7e03 upstream.
      
      That is some different register for ALC255 and ALC256.
      ALC256 can't fit with some ALC255 register.
      This issue is cause from LDO output voltage control.
      This patch is updated the right LDO register value.
      Signed-off-by: default avatarKailang Yang <kailang@realtek.com>
      Signed-off-by: default avatarTakashi Iwai <tiwai@suse.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      cd8edeef
    • AceLan Kao's avatar
      ALSA: hda - Fix headset mic detection problem for Dell machine · 05c40c5b
      AceLan Kao authored
      commit f90d83b3 upstream.
      
      Add the pin configuration value of this machine into the pin_quirk
      table to make DELL1_MIC_NO_PRESENCE apply to this machine.
      Signed-off-by: default avatarAceLan Kao <acelan.kao@canonical.com>
      Signed-off-by: default avatarTakashi Iwai <tiwai@suse.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      05c40c5b
    • Vinod Koul's avatar
      ALSA: hda - Add PCI ID for Kabylake · 71cf12da
      Vinod Koul authored
      commit 35639a0e upstream.
      
      Kabylake shows up as PCI ID 0xa171. And Kabylake-LP as 0x9d71.
      Since these are similar to Skylake add these to SKL_PLUS macro
      Signed-off-by: default avatarVinod Koul <vinod.koul@intel.com>
      Signed-off-by: default avatarTakashi Iwai <tiwai@suse.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      71cf12da
    • Julien Grall's avatar
      drivers/perf: arm_pmu: Defer the setting of __oprofile_cpu_pmu · 711d9f5d
      Julien Grall authored
      commit 0f254c76 upstream.
      
      The global variable __oprofile_cpu_pmu is set before the PMU is fully
      initialized. If an error occurs before the end of the initialization,
      the PMU will be freed and the variable will contain an invalid pointer.
      
      This will result in a kernel crash when perf will be used.
      
      Fix it by moving the setting of __oprofile_cpu_pmu when the PMU is fully
      initialized (i.e when it is no longer possible to fail).
      Signed-off-by: default avatarJulien Grall <julien.grall@arm.com>
      Acked-by: default avatarMark Rutland <mark.rutland@arm.com>
      Signed-off-by: default avatarWill Deacon <will.deacon@arm.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      711d9f5d
    • Paolo Bonzini's avatar
      KVM: irqfd: fix NULL pointer dereference in kvm_irq_map_gsi · 9ae6bfa4
      Paolo Bonzini authored
      commit c622a3c2 upstream.
      
      Found by syzkaller:
      
          BUG: unable to handle kernel NULL pointer dereference at 0000000000000120
          IP: [<ffffffffa0797202>] kvm_irq_map_gsi+0x12/0x90 [kvm]
          PGD 6f80b067 PUD b6535067 PMD 0
          Oops: 0000 [#1] SMP
          CPU: 3 PID: 4988 Comm: a.out Not tainted 4.4.9-300.fc23.x86_64 #1
          [...]
          Call Trace:
           [<ffffffffa0795f62>] irqfd_update+0x32/0xc0 [kvm]
           [<ffffffffa0796c7c>] kvm_irqfd+0x3dc/0x5b0 [kvm]
           [<ffffffffa07943f4>] kvm_vm_ioctl+0x164/0x6f0 [kvm]
           [<ffffffff81241648>] do_vfs_ioctl+0x298/0x480
           [<ffffffff812418a9>] SyS_ioctl+0x79/0x90
           [<ffffffff817a1062>] tracesys_phase2+0x84/0x89
          Code: b5 71 a7 e0 5b 41 5c 41 5d 5d f3 c3 66 66 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48 8b 8f 10 2e 00 00 31 c0 48 89 e5 <39> 91 20 01 00 00 76 6a 48 63 d2 48 8b 94 d1 28 01 00 00 48 85
          RIP  [<ffffffffa0797202>] kvm_irq_map_gsi+0x12/0x90 [kvm]
           RSP <ffff8800926cbca8>
          CR2: 0000000000000120
      
      Testcase:
      
          #include <unistd.h>
          #include <sys/syscall.h>
          #include <string.h>
          #include <stdint.h>
          #include <linux/kvm.h>
          #include <fcntl.h>
          #include <sys/ioctl.h>
      
          long r[26];
      
          int main()
          {
              memset(r, -1, sizeof(r));
              r[2] = open("/dev/kvm", 0);
              r[3] = ioctl(r[2], KVM_CREATE_VM, 0);
      
              struct kvm_irqfd ifd;
              ifd.fd = syscall(SYS_eventfd2, 5, 0);
              ifd.gsi = 3;
              ifd.flags = 2;
              ifd.resamplefd = ifd.fd;
              r[25] = ioctl(r[3], KVM_IRQFD, &ifd);
              return 0;
          }
      Reported-by: default avatarDmitry Vyukov <dvyukov@google.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: default avatarRadim Krčmář <rkrcmar@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      9ae6bfa4
    • Paolo Bonzini's avatar
      KVM: x86: fix OOPS after invalid KVM_SET_DEBUGREGS · 3076e1a6
      Paolo Bonzini authored
      commit d14bdb55 upstream.
      
      MOV to DR6 or DR7 causes a #GP if an attempt is made to write a 1 to
      any of bits 63:32.  However, this is not detected at KVM_SET_DEBUGREGS
      time, and the next KVM_RUN oopses:
      
         general protection fault: 0000 [#1] SMP
         CPU: 2 PID: 14987 Comm: a.out Not tainted 4.4.9-300.fc23.x86_64 #1
         Hardware name: LENOVO 2325F51/2325F51, BIOS G2ET32WW (1.12 ) 05/30/2012
         [...]
         Call Trace:
          [<ffffffffa072c93d>] kvm_arch_vcpu_ioctl_run+0x141d/0x14e0 [kvm]
          [<ffffffffa071405d>] kvm_vcpu_ioctl+0x33d/0x620 [kvm]
          [<ffffffff81241648>] do_vfs_ioctl+0x298/0x480
          [<ffffffff812418a9>] SyS_ioctl+0x79/0x90
          [<ffffffff817a0f2e>] entry_SYSCALL_64_fastpath+0x12/0x71
         Code: 55 83 ff 07 48 89 e5 77 27 89 ff ff 24 fd 90 87 80 81 0f 23 fe 5d c3 0f 23 c6 5d c3 0f 23 ce 5d c3 0f 23 d6 5d c3 0f 23 de 5d c3 <0f> 23 f6 5d c3 0f 0b 66 66 66 66 66 2e 0f 1f 84 00 00 00 00 00
         RIP  [<ffffffff810639eb>] native_set_debugreg+0x2b/0x40
          RSP <ffff88005836bd50>
      
      Testcase (beautified/reduced from syzkaller output):
      
          #include <unistd.h>
          #include <sys/syscall.h>
          #include <string.h>
          #include <stdint.h>
          #include <linux/kvm.h>
          #include <fcntl.h>
          #include <sys/ioctl.h>
      
          long r[8];
      
          int main()
          {
              struct kvm_debugregs dr = { 0 };
      
              r[2] = open("/dev/kvm", O_RDONLY);
              r[3] = ioctl(r[2], KVM_CREATE_VM, 0);
              r[4] = ioctl(r[3], KVM_CREATE_VCPU, 7);
      
              memcpy(&dr,
                     "\x5d\x6a\x6b\xe8\x57\x3b\x4b\x7e\xcf\x0d\xa1\x72"
                     "\xa3\x4a\x29\x0c\xfc\x6d\x44\x00\xa7\x52\xc7\xd8"
                     "\x00\xdb\x89\x9d\x78\xb5\x54\x6b\x6b\x13\x1c\xe9"
                     "\x5e\xd3\x0e\x40\x6f\xb4\x66\xf7\x5b\xe3\x36\xcb",
                     48);
              r[7] = ioctl(r[4], KVM_SET_DEBUGREGS, &dr);
              r[6] = ioctl(r[4], KVM_RUN, 0);
          }
      Reported-by: default avatarDmitry Vyukov <dvyukov@google.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: default avatarRadim Krčmář <rkrcmar@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      3076e1a6
    • Christoffer Dall's avatar
      KVM: arm/arm64: vgic-v3: Clear all dirty LRs · f25ef62e
      Christoffer Dall authored
      commit fa89c77e upstream.
      
      When saving the state of the list registers, it is critical to
      reset them zero, as we could otherwise leave unexpected EOI
      interrupts pending for virtual level interrupts.
      Signed-off-by: default avatarChristoffer Dall <christoffer.dall@linaro.org>
      Signed-off-by: default avatarMarc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f25ef62e
    • Christoffer Dall's avatar
      KVM: arm/arm64: vgic-v2: Clear all dirty LRs · 05be0960
      Christoffer Dall authored
      commit 4d3afc9b upstream.
      
      When saving the state of the list registers, it is critical to
      reset them zero, as we could otherwise leave unexpected EOI
      interrupts pending for virtual level interrupts.
      Signed-off-by: default avatarChristoffer Dall <christoffer.dall@linaro.org>
      Signed-off-by: default avatarMarc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      05be0960
    • Jakub Sitnicki's avatar
      ipv6: Skip XFRM lookup if dst_entry in socket cache is valid · 48fc8f94
      Jakub Sitnicki authored
      [ Upstream commit 00bc0ef5 ]
      
      At present we perform an xfrm_lookup() for each UDPv6 message we
      send. The lookup involves querying the flow cache (flow_cache_lookup)
      and, in case of a cache miss, creating an XFRM bundle.
      
      If we miss the flow cache, we can end up creating a new bundle and
      deriving the path MTU (xfrm_init_pmtu) from on an already transformed
      dst_entry, which we pass from the socket cache (sk->sk_dst_cache) down
      to xfrm_lookup(). This can happen only if we're caching the dst_entry
      in the socket, that is when we're using a connected UDP socket.
      
      To put it another way, the path MTU shrinks each time we miss the flow
      cache, which later on leads to incorrectly fragmented payload. It can
      be observed with ESPv6 in transport mode:
      
        1) Set up a transformation and lower the MTU to trigger fragmentation
          # ip xfrm policy add dir out src ::1 dst ::1 \
            tmpl src ::1 dst ::1 proto esp spi 1
          # ip xfrm state add src ::1 dst ::1 \
            proto esp spi 1 enc 'aes' 0x0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b
          # ip link set dev lo mtu 1500
      
        2) Monitor the packet flow and set up an UDP sink
          # tcpdump -ni lo -ttt &
          # socat udp6-listen:12345,fork /dev/null &
      
        3) Send a datagram that needs fragmentation with a connected socket
          # perl -e 'print "@" x 1470 | socat - udp6:[::1]:12345
          2016/06/07 18:52:52 socat[724] E read(3, 0x555bb3d5ba00, 8192): Protocol error
          00:00:00.000000 IP6 ::1 > ::1: frag (0|1448) ESP(spi=0x00000001,seq=0x2), length 1448
          00:00:00.000014 IP6 ::1 > ::1: frag (1448|32)
          00:00:00.000050 IP6 ::1 > ::1: ESP(spi=0x00000001,seq=0x3), length 1272
          (^ ICMPv6 Parameter Problem)
          00:00:00.000022 IP6 ::1 > ::1: ESP(spi=0x00000001,seq=0x5), length 136
      
        4) Compare it to a non-connected socket
          # perl -e 'print "@" x 1500' | socat - udp6-sendto:[::1]:12345
          00:00:40.535488 IP6 ::1 > ::1: frag (0|1448) ESP(spi=0x00000001,seq=0x6), length 1448
          00:00:00.000010 IP6 ::1 > ::1: frag (1448|64)
      
      What happens in step (3) is:
      
        1) when connecting the socket in __ip6_datagram_connect(), we
           perform an XFRM lookup, miss the flow cache, create an XFRM
           bundle, and cache the destination,
      
        2) afterwards, when sending the datagram, we perform an XFRM lookup,
           again, miss the flow cache (due to mismatch of flowi6_iif and
           flowi6_oif, which is an issue of its own), and recreate an XFRM
           bundle based on the cached (and already transformed) destination.
      
      To prevent the recreation of an XFRM bundle, avoid an XFRM lookup
      altogether whenever we already have a destination entry cached in the
      socket. This prevents the path MTU shrinkage and brings us on par with
      UDPv4.
      
      The fix also benefits connected PINGv6 sockets, another user of
      ip6_sk_dst_lookup_flow(), who also suffer messages being transformed
      twice.
      
      Joint work with Hannes Frederic Sowa.
      Reported-by: default avatarJan Tluka <jtluka@redhat.com>
      Signed-off-by: default avatarJakub Sitnicki <jkbs@redhat.com>
      Acked-by: default avatarHannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      48fc8f94
    • Guillaume Nault's avatar
      l2tp: fix configuration passed to setup_udp_tunnel_sock() · d532139b
      Guillaume Nault authored
      [ Upstream commit a5c5e2da ]
      
      Unused fields of udp_cfg must be all zeros. Otherwise
      setup_udp_tunnel_sock() fills ->gro_receive and ->gro_complete
      callbacks with garbage, eventually resulting in panic when used by
      udp_gro_receive().
      
      [   72.694123] BUG: unable to handle kernel paging request at ffff880033f87d78
      [   72.695518] IP: [<ffff880033f87d78>] 0xffff880033f87d78
      [   72.696530] PGD 26e2067 PUD 26e3067 PMD 342ed063 PTE 8000000033f87163
      [   72.696530] Oops: 0011 [#1] SMP KASAN
      [   72.696530] Modules linked in: l2tp_ppp l2tp_netlink l2tp_core ip6_udp_tunnel udp_tunnel pptp gre pppox ppp_generic slhc crc32c_intel ghash_clmulni_intel jitterentropy_rng sha256_generic hmac drbg ansi_cprng aesni_intel evdev aes_x86_64 ablk_helper cryptd lrw gf128mul glue_helper serio_raw acpi_cpufreq button proc\
      essor ext4 crc16 jbd2 mbcache virtio_blk virtio_net virtio_pci virtio_ring virtio
      [   72.696530] CPU: 3 PID: 0 Comm: swapper/3 Not tainted 4.7.0-rc1 #1
      [   72.696530] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Debian-1.8.2-1 04/01/2014
      [   72.696530] task: ffff880035b59700 ti: ffff880035b70000 task.ti: ffff880035b70000
      [   72.696530] RIP: 0010:[<ffff880033f87d78>]  [<ffff880033f87d78>] 0xffff880033f87d78
      [   72.696530] RSP: 0018:ffff880035f87bc0  EFLAGS: 00010246
      [   72.696530] RAX: ffffed000698f996 RBX: ffff88003326b840 RCX: ffffffff814cc823
      [   72.696530] RDX: ffff88003326b840 RSI: ffff880033e48038 RDI: ffff880034c7c780
      [   72.696530] RBP: ffff880035f87c18 R08: 000000000000a506 R09: 0000000000000000
      [   72.696530] R10: ffff880035f87b38 R11: ffff880034b9344d R12: 00000000ebfea715
      [   72.696530] R13: 0000000000000000 R14: ffff880034c7c780 R15: 0000000000000000
      [   72.696530] FS:  0000000000000000(0000) GS:ffff880035f80000(0000) knlGS:0000000000000000
      [   72.696530] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [   72.696530] CR2: ffff880033f87d78 CR3: 0000000033c98000 CR4: 00000000000406a0
      [   72.696530] Stack:
      [   72.696530]  ffffffff814cc834 ffff880034b93468 0000001481416818 ffff88003326b874
      [   72.696530]  ffff880034c7ccb0 ffff880033e48038 ffff88003326b840 ffff880034b93462
      [   72.696530]  ffff88003326b88a ffff88003326b88c ffff880034b93468 ffff880035f87c70
      [   72.696530] Call Trace:
      [   72.696530]  <IRQ>
      [   72.696530]  [<ffffffff814cc834>] ? udp_gro_receive+0x1c6/0x1f9
      [   72.696530]  [<ffffffff814ccb1c>] udp4_gro_receive+0x2b5/0x310
      [   72.696530]  [<ffffffff814d989b>] inet_gro_receive+0x4a3/0x4cd
      [   72.696530]  [<ffffffff81431b32>] dev_gro_receive+0x584/0x7a3
      [   72.696530]  [<ffffffff810adf7a>] ? __lock_is_held+0x29/0x64
      [   72.696530]  [<ffffffff814321f7>] napi_gro_receive+0x124/0x21d
      [   72.696530]  [<ffffffffa000b145>] virtnet_receive+0x8df/0x8f6 [virtio_net]
      [   72.696530]  [<ffffffffa000b27e>] virtnet_poll+0x1d/0x8d [virtio_net]
      [   72.696530]  [<ffffffff81431350>] net_rx_action+0x15b/0x3b9
      [   72.696530]  [<ffffffff815893d6>] __do_softirq+0x216/0x546
      [   72.696530]  [<ffffffff81062392>] irq_exit+0x49/0xb6
      [   72.696530]  [<ffffffff81588e9a>] do_IRQ+0xe2/0xfa
      [   72.696530]  [<ffffffff81587a49>] common_interrupt+0x89/0x89
      [   72.696530]  <EOI>
      [   72.696530]  [<ffffffff810b05df>] ? trace_hardirqs_on_caller+0x229/0x270
      [   72.696530]  [<ffffffff8102b3c7>] ? default_idle+0x1c/0x2d
      [   72.696530]  [<ffffffff8102b3c5>] ? default_idle+0x1a/0x2d
      [   72.696530]  [<ffffffff8102bb8c>] arch_cpu_idle+0xa/0xc
      [   72.696530]  [<ffffffff810a6c39>] default_idle_call+0x1a/0x1c
      [   72.696530]  [<ffffffff810a6d96>] cpu_startup_entry+0x15b/0x20f
      [   72.696530]  [<ffffffff81039a81>] start_secondary+0x12c/0x133
      [   72.696530] Code: ff ff ff ff ff ff ff ff ff ff 7f ff ff ff ff ff ff ff 7f 00 7e f8 33 00 88 ff ff 6d 61 58 81 ff ff ff ff 5e de 0a 81 ff ff ff ff <00> 5c e2 34 00 88 ff ff 00 00 00 00 00 00 00 00 00 00 00 00 00
      [   72.696530] RIP  [<ffff880033f87d78>] 0xffff880033f87d78
      [   72.696530]  RSP <ffff880035f87bc0>
      [   72.696530] CR2: ffff880033f87d78
      [   72.696530] ---[ end trace ad7758b9a1dccf99 ]---
      [   72.696530] Kernel panic - not syncing: Fatal exception in interrupt
      [   72.696530] Kernel Offset: disabled
      [   72.696530] ---[ end Kernel panic - not syncing: Fatal exception in interrupt
      
      v2: use empty initialiser instead of "{ NULL }" to avoid relying on
          first field's type.
      
      Fixes: 38fd2af2 ("udp: Add socket based GRO and config")
      Signed-off-by: default avatarGuillaume Nault <g.nault@alphalink.fr>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      d532139b
    • Toshiaki Makita's avatar
      bridge: Don't insert unnecessary local fdb entry on changing mac address · df181e84
      Toshiaki Makita authored
      [ Upstream commit 0b148def ]
      
      The missing br_vlan_should_use() test caused creation of an unneeded
      local fdb entry on changing mac address of a bridge device when there is
      a vlan which is configured on a bridge port but not on the bridge
      device.
      
      Fixes: 2594e906 ("bridge: vlan: add per-vlan struct and move to rhashtables")
      Signed-off-by: default avatarToshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
      Acked-by: default avatarNikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      df181e84