1. 05 Dec, 2018 29 commits
    • Takashi Iwai's avatar
      ALSA: ac97: Fix incorrect bit shift at AC97-SPSA control write · 5d3201bb
      Takashi Iwai authored
      commit 7194eda1 upstream.
      
      The function snd_ac97_put_spsa() gets the bit shift value from the
      associated private_value, but it extracts too much; the current code
      extracts 8 bit values in bits 8-15, but this is a combination of two
      nibbles (bits 8-11 and bits 12-15) for left and right shifts.
      Due to the incorrect bits extraction, the actual shift may go beyond
      the 32bit value, as spotted recently by UBSAN check:
       UBSAN: Undefined behaviour in sound/pci/ac97/ac97_codec.c:836:7
       shift exponent 68 is too large for 32-bit type 'int'
      
      This patch fixes the shift value extraction by masking the properly
      with 0x0f instead of 0xff.
      Reported-and-tested-by: default avatarMeelis Roos <mroos@linux.ee>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarTakashi Iwai <tiwai@suse.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      5d3201bb
    • Takashi Iwai's avatar
      ALSA: wss: Fix invalid snd_free_pages() at error path · 12b2efff
      Takashi Iwai authored
      commit 7b691541 upstream.
      
      Some spurious calls of snd_free_pages() have been overlooked and
      remain in the error paths of wss driver code.  Since runtime->dma_area
      is managed by the PCM core helper, we shouldn't release manually.
      
      Drop the superfluous calls.
      Reviewed-by: default avatarTakashi Sakamoto <o-takashi@sakamocchi.jp>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarTakashi Iwai <tiwai@suse.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      12b2efff
    • Maximilian Heyne's avatar
      fs: fix lost error code in dio_complete · 55eb06b7
      Maximilian Heyne authored
      commit 41e817bc upstream.
      
      commit e2592217 ("fs: simplify the
      generic_write_sync prototype") reworked callers of generic_write_sync(),
      and ended up dropping the error return for the directio path. Prior to
      that commit, in dio_complete(), an error would be bubbled up the stack,
      but after that commit, errors passed on to dio_complete were eaten up.
      
      This was reported on the list earlier, and a fix was proposed in
      https://lore.kernel.org/lkml/20160921141539.GA17898@infradead.org/, but
      never followed up with.  We recently hit this bug in our testing where
      fencing io errors, which were previously erroring out with EIO, were
      being returned as success operations after this commit.
      
      The fix proposed on the list earlier was a little short -- it would have
      still called generic_write_sync() in case `ret` already contained an
      error. This fix ensures generic_write_sync() is only called when there's
      no pending error in the write. Additionally, transferred is replaced
      with ret to bring this code in line with other callers.
      
      Fixes: e2592217 ("fs: simplify the generic_write_sync prototype")
      Reported-by: default avatarRavi Nankani <rnankani@amazon.com>
      Signed-off-by: default avatarMaximilian Heyne <mheyne@amazon.de>
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      CC: Torsten Mehlan <tomeh@amazon.de>
      CC: Uwe Dannowski <uwed@amazon.de>
      CC: Amit Shah <aams@amazon.de>
      CC: David Woodhouse <dwmw@amazon.co.uk>
      CC: stable@vger.kernel.org
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      55eb06b7
    • Jiri Olsa's avatar
      perf/x86/intel: Add generic branch tracing check to intel_pmu_has_bts() · 54f73829
      Jiri Olsa authored
      commit 67266c10 upstream.
      
      Currently we check the branch tracing only by checking for the
      PERF_COUNT_HW_BRANCH_INSTRUCTIONS event of PERF_TYPE_HARDWARE
      type. But we can define the same event with the PERF_TYPE_RAW
      type.
      
      Changing the intel_pmu_has_bts() code to check on event's final
      hw config value, so both HW types are covered.
      
      Adding unlikely to intel_pmu_has_bts() condition calls, because
      it was used in the original code in intel_bts_constraints.
      Signed-off-by: default avatarJiri Olsa <jolsa@kernel.org>
      Acked-by: default avatarPeter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: <stable@vger.kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Link: http://lkml.kernel.org/r/20181121101612.16272-2-jolsa@kernel.orgSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      54f73829
    • Jiri Olsa's avatar
      perf/x86/intel: Move branch tracing setup to the Intel-specific source file · 08c133e8
      Jiri Olsa authored
      commit ed6101bb upstream.
      
      Moving branch tracing setup to Intel core object into separate
      intel_pmu_bts_config function, because it's Intel specific.
      Suggested-by: default avatarPeter Zijlstra <peterz@infradead.org>
      Signed-off-by: default avatarJiri Olsa <jolsa@kernel.org>
      Acked-by: default avatarPeter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: <stable@vger.kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Link: http://lkml.kernel.org/r/20181121101612.16272-1-jolsa@kernel.orgSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      08c133e8
    • Filipe Manana's avatar
      Btrfs: ensure path name is null terminated at btrfs_control_ioctl · 6e1210e2
      Filipe Manana authored
      commit f505754f upstream.
      
      We were using the path name received from user space without checking that
      it is null terminated. While btrfs-progs is well behaved and does proper
      validation and null termination, someone could call the ioctl and pass
      a non-null terminated patch, leading to buffer overrun problems in the
      kernel.  The ioctl is protected by CAP_SYS_ADMIN.
      
      So just set the last byte of the path to a null character, similar to what
      we do in other ioctls (add/remove/resize device, snapshot creation, etc).
      
      CC: stable@vger.kernel.org # 4.4+
      Reviewed-by: default avatarAnand Jain <anand.jain@oracle.com>
      Signed-off-by: default avatarFilipe Manana <fdmanana@suse.com>
      Reviewed-by: default avatarDavid Sterba <dsterba@suse.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      6e1210e2
    • Max Filippov's avatar
      xtensa: fix coprocessor context offset definitions · f4038875
      Max Filippov authored
      commit 03bc996a upstream.
      
      Coprocessor context offsets are used by the assembly code that moves
      coprocessor context between the individual fields of the
      thread_info::xtregs_cp structure and coprocessor registers.
      This fixes coprocessor context clobbering on flushing and reloading
      during normal user code execution and user process debugging in the
      presence of more than one coprocessor in the core configuration.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarMax Filippov <jcmvbkbc@gmail.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f4038875
    • Max Filippov's avatar
      xtensa: enable coprocessors that are being flushed · c26e3c6c
      Max Filippov authored
      commit 2958b666 upstream.
      
      coprocessor_flush_all may be called from a context of a thread that is
      different from the thread being flushed. In that case contents of the
      cpenable special register may not match ti->cpenable of the target
      thread, resulting in unhandled coprocessor exception in the kernel
      context.
      Set cpenable special register to the ti->cpenable of the target register
      for the duration of the flush and restore it afterwards.
      This fixes the following crash caused by coprocessor register inspection
      in native gdb:
      
        (gdb) p/x $w0
        Illegal instruction in kernel: sig: 9 [#1] PREEMPT
        Call Trace:
          ___might_sleep+0x184/0x1a4
          __might_sleep+0x41/0xac
          exit_signals+0x14/0x218
          do_exit+0xc9/0x8b8
          die+0x99/0xa0
          do_illegal_instruction+0x18/0x6c
          common_exception+0x77/0x77
          coprocessor_flush+0x16/0x3c
          arch_ptrace+0x46c/0x674
          sys_ptrace+0x2ce/0x3b4
          system_call+0x54/0x80
          common_exception+0x77/0x77
        note: gdb[100] exited with preempt_count 1
        Killed
      
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarMax Filippov <jcmvbkbc@gmail.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      c26e3c6c
    • Wanpeng Li's avatar
      KVM: X86: Fix scan ioapic use-before-initialization · 3a468e8e
      Wanpeng Li authored
      commit e97f852f upstream.
      
      Reported by syzkaller:
      
       BUG: unable to handle kernel NULL pointer dereference at 00000000000001c8
       PGD 80000003ec4da067 P4D 80000003ec4da067 PUD 3f7bfa067 PMD 0
       Oops: 0000 [#1] PREEMPT SMP PTI
       CPU: 7 PID: 5059 Comm: debug Tainted: G           OE     4.19.0-rc5 #16
       RIP: 0010:__lock_acquire+0x1a6/0x1990
       Call Trace:
        lock_acquire+0xdb/0x210
        _raw_spin_lock+0x38/0x70
        kvm_ioapic_scan_entry+0x3e/0x110 [kvm]
        vcpu_enter_guest+0x167e/0x1910 [kvm]
        kvm_arch_vcpu_ioctl_run+0x35c/0x610 [kvm]
        kvm_vcpu_ioctl+0x3e9/0x6d0 [kvm]
        do_vfs_ioctl+0xa5/0x690
        ksys_ioctl+0x6d/0x80
        __x64_sys_ioctl+0x1a/0x20
        do_syscall_64+0x83/0x6e0
        entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      The reason is that the testcase writes hyperv synic HV_X64_MSR_SINT6 msr
      and triggers scan ioapic logic to load synic vectors into EOI exit bitmap.
      However, irqchip is not initialized by this simple testcase, ioapic/apic
      objects should not be accessed.
      This can be triggered by the following program:
      
          #define _GNU_SOURCE
      
          #include <endian.h>
          #include <stdint.h>
          #include <stdio.h>
          #include <stdlib.h>
          #include <string.h>
          #include <sys/syscall.h>
          #include <sys/types.h>
          #include <unistd.h>
      
          uint64_t r[3] = {0xffffffffffffffff, 0xffffffffffffffff, 0xffffffffffffffff};
      
          int main(void)
          {
          	syscall(__NR_mmap, 0x20000000, 0x1000000, 3, 0x32, -1, 0);
          	long res = 0;
          	memcpy((void*)0x20000040, "/dev/kvm", 9);
          	res = syscall(__NR_openat, 0xffffffffffffff9c, 0x20000040, 0, 0);
          	if (res != -1)
          		r[0] = res;
          	res = syscall(__NR_ioctl, r[0], 0xae01, 0);
          	if (res != -1)
          		r[1] = res;
          	res = syscall(__NR_ioctl, r[1], 0xae41, 0);
          	if (res != -1)
          		r[2] = res;
          	memcpy(
          			(void*)0x20000080,
          			"\x01\x00\x00\x00\x00\x5b\x61\xbb\x96\x00\x00\x40\x00\x00\x00\x00\x01\x00"
          			"\x08\x00\x00\x00\x00\x00\x0b\x77\xd1\x78\x4d\xd8\x3a\xed\xb1\x5c\x2e\x43"
          			"\xaa\x43\x39\xd6\xff\xf5\xf0\xa8\x98\xf2\x3e\x37\x29\x89\xde\x88\xc6\x33"
          			"\xfc\x2a\xdb\xb7\xe1\x4c\xac\x28\x61\x7b\x9c\xa9\xbc\x0d\xa0\x63\xfe\xfe"
          			"\xe8\x75\xde\xdd\x19\x38\xdc\x34\xf5\xec\x05\xfd\xeb\x5d\xed\x2e\xaf\x22"
          			"\xfa\xab\xb7\xe4\x42\x67\xd0\xaf\x06\x1c\x6a\x35\x67\x10\x55\xcb",
          			106);
          	syscall(__NR_ioctl, r[2], 0x4008ae89, 0x20000080);
          	syscall(__NR_ioctl, r[2], 0xae80, 0);
          	return 0;
          }
      
      This patch fixes it by bailing out scan ioapic if ioapic is not initialized in
      kernel.
      Reported-by: default avatarWei Wu <ww9210@gmail.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Wei Wu <ww9210@gmail.com>
      Signed-off-by: default avatarWanpeng Li <wanpengli@tencent.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      3a468e8e
    • Jim Mattson's avatar
      kvm: svm: Ensure an IBPB on all affected CPUs when freeing a vmcb · 43dd9f48
      Jim Mattson authored
      commit fd65d314 upstream.
      
      Previously, we only called indirect_branch_prediction_barrier on the
      logical CPU that freed a vmcb. This function should be called on all
      logical CPUs that last loaded the vmcb in question.
      
      Fixes: 15d45071 ("KVM/x86: Add IBPB support")
      Reported-by: default avatarNeel Natu <neelnatu@google.com>
      Signed-off-by: default avatarJim Mattson <jmattson@google.com>
      Reviewed-by: default avatarKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      43dd9f48
    • Junaid Shahid's avatar
      kvm: mmu: Fix race in emulated page table writes · aa716434
      Junaid Shahid authored
      commit 0e0fee5c upstream.
      
      When a guest page table is updated via an emulated write,
      kvm_mmu_pte_write() is called to update the shadow PTE using the just
      written guest PTE value. But if two emulated guest PTE writes happened
      concurrently, it is possible that the guest PTE and the shadow PTE end
      up being out of sync. Emulated writes do not mark the shadow page as
      unsync-ed, so this inconsistency will not be resolved even by a guest TLB
      flush (unless the page was marked as unsync-ed at some other point).
      
      This is fixed by re-reading the current value of the guest PTE after the
      MMU lock has been acquired instead of just using the value that was
      written prior to calling kvm_mmu_pte_write().
      Signed-off-by: default avatarJunaid Shahid <junaids@google.com>
      Reviewed-by: default avatarWanpeng Li <wanpengli@tencent.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      aa716434
    • Bernd Eckstein's avatar
      usbnet: ipheth: fix potential recvmsg bug and recvmsg bug 2 · ff67a7d3
      Bernd Eckstein authored
      [ Upstream commit 45611c61 ]
      
      The bug is not easily reproducable, as it may occur very infrequently
      (we had machines with 20minutes heavy downloading before it occurred)
      However, on a virual machine (VMWare on Windows 10 host) it occurred
      pretty frequently (1-2 seconds after a speedtest was started)
      
      dev->tx_skb mab be freed via dev_kfree_skb_irq on a callback
      before it is set.
      
      This causes the following problems:
      - double free of the skb or potential memory leak
      - in dmesg: 'recvmsg bug' and 'recvmsg bug 2' and eventually
        general protection fault
      
      Example dmesg output:
      [  134.841986] ------------[ cut here ]------------
      [  134.841987] recvmsg bug: copied 9C24A555 seq 9C24B557 rcvnxt 9C25A6B3 fl 0
      [  134.841993] WARNING: CPU: 7 PID: 2629 at /build/linux-hwe-On9fm7/linux-hwe-4.15.0/net/ipv4/tcp.c:1865 tcp_recvmsg+0x44d/0xab0
      [  134.841994] Modules linked in: ipheth(OE) kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc aesni_intel aes_x86_64 crypto_simd glue_helper cryptd vmw_balloon intel_rapl_perf joydev input_leds serio_raw vmw_vsock_vmci_transport vsock shpchp i2c_piix4 mac_hid binfmt_misc vmw_vmci parport_pc ppdev lp parport autofs4 vmw_pvscsi vmxnet3 hid_generic usbhid hid vmwgfx ttm drm_kms_helper syscopyarea sysfillrect mptspi mptscsih sysimgblt ahci psmouse fb_sys_fops pata_acpi mptbase libahci e1000 drm scsi_transport_spi
      [  134.842046] CPU: 7 PID: 2629 Comm: python Tainted: G        W  OE    4.15.0-34-generic #37~16.04.1-Ubuntu
      [  134.842046] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 05/19/2017
      [  134.842048] RIP: 0010:tcp_recvmsg+0x44d/0xab0
      [  134.842048] RSP: 0018:ffffa6630422bcc8 EFLAGS: 00010286
      [  134.842049] RAX: 0000000000000000 RBX: ffff997616f4f200 RCX: 0000000000000006
      [  134.842049] RDX: 0000000000000007 RSI: 0000000000000082 RDI: ffff9976257d6490
      [  134.842050] RBP: ffffa6630422bd98 R08: 0000000000000001 R09: 000000000004bba4
      [  134.842050] R10: 0000000001e00c6f R11: 000000000004bba4 R12: ffff99760dee3000
      [  134.842051] R13: 0000000000000000 R14: ffff99760dee3514 R15: 0000000000000000
      [  134.842051] FS:  00007fe332347700(0000) GS:ffff9976257c0000(0000) knlGS:0000000000000000
      [  134.842052] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  134.842053] CR2: 0000000001e41000 CR3: 000000020e9b4006 CR4: 00000000003606e0
      [  134.842055] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [  134.842055] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      [  134.842057] Call Trace:
      [  134.842060]  ? aa_sk_perm+0x53/0x1a0
      [  134.842064]  inet_recvmsg+0x51/0xc0
      [  134.842066]  sock_recvmsg+0x43/0x50
      [  134.842070]  SYSC_recvfrom+0xe4/0x160
      [  134.842072]  ? __schedule+0x3de/0x8b0
      [  134.842075]  ? ktime_get_ts64+0x4c/0xf0
      [  134.842079]  SyS_recvfrom+0xe/0x10
      [  134.842082]  do_syscall_64+0x73/0x130
      [  134.842086]  entry_SYSCALL_64_after_hwframe+0x3d/0xa2
      [  134.842086] RIP: 0033:0x7fe331f5a81d
      [  134.842088] RSP: 002b:00007ffe8da98398 EFLAGS: 00000246 ORIG_RAX: 000000000000002d
      [  134.842090] RAX: ffffffffffffffda RBX: ffffffffffffffff RCX: 00007fe331f5a81d
      [  134.842094] RDX: 00000000000003fb RSI: 0000000001e00874 RDI: 0000000000000003
      [  134.842095] RBP: 00007fe32f642c70 R08: 0000000000000000 R09: 0000000000000000
      [  134.842097] R10: 0000000000000000 R11: 0000000000000246 R12: 00007fe332347698
      [  134.842099] R13: 0000000001b7e0a0 R14: 0000000001e00874 R15: 0000000000000000
      [  134.842103] Code: 24 fd ff ff e9 cc fe ff ff 48 89 d8 41 8b 8c 24 10 05 00 00 44 8b 45 80 48 c7 c7 08 bd 59 8b 48 89 85 68 ff ff ff e8 b3 c4 7d ff <0f> 0b 48 8b 85 68 ff ff ff e9 e9 fe ff ff 41 8b 8c 24 10 05 00
      [  134.842126] ---[ end trace b7138fc08c83147f ]---
      [  134.842144] general protection fault: 0000 [#1] SMP PTI
      [  134.842145] Modules linked in: ipheth(OE) kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc aesni_intel aes_x86_64 crypto_simd glue_helper cryptd vmw_balloon intel_rapl_perf joydev input_leds serio_raw vmw_vsock_vmci_transport vsock shpchp i2c_piix4 mac_hid binfmt_misc vmw_vmci parport_pc ppdev lp parport autofs4 vmw_pvscsi vmxnet3 hid_generic usbhid hid vmwgfx ttm drm_kms_helper syscopyarea sysfillrect mptspi mptscsih sysimgblt ahci psmouse fb_sys_fops pata_acpi mptbase libahci e1000 drm scsi_transport_spi
      [  134.842161] CPU: 7 PID: 2629 Comm: python Tainted: G        W  OE    4.15.0-34-generic #37~16.04.1-Ubuntu
      [  134.842162] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 05/19/2017
      [  134.842164] RIP: 0010:tcp_close+0x2c6/0x440
      [  134.842165] RSP: 0018:ffffa6630422bde8 EFLAGS: 00010202
      [  134.842167] RAX: 0000000000000000 RBX: ffff99760dee3000 RCX: 0000000180400034
      [  134.842168] RDX: 5c4afd407207a6c4 RSI: ffffe868495bd300 RDI: ffff997616f4f200
      [  134.842169] RBP: ffffa6630422be08 R08: 0000000016f4d401 R09: 0000000180400034
      [  134.842169] R10: ffffa6630422bd98 R11: 0000000000000000 R12: 000000000000600c
      [  134.842170] R13: 0000000000000000 R14: ffff99760dee30c8 R15: ffff9975bd44fe00
      [  134.842171] FS:  00007fe332347700(0000) GS:ffff9976257c0000(0000) knlGS:0000000000000000
      [  134.842173] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  134.842174] CR2: 0000000001e41000 CR3: 000000020e9b4006 CR4: 00000000003606e0
      [  134.842177] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [  134.842178] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      [  134.842179] Call Trace:
      [  134.842181]  inet_release+0x42/0x70
      [  134.842183]  __sock_release+0x42/0xb0
      [  134.842184]  sock_close+0x15/0x20
      [  134.842187]  __fput+0xea/0x220
      [  134.842189]  ____fput+0xe/0x10
      [  134.842191]  task_work_run+0x8a/0xb0
      [  134.842193]  exit_to_usermode_loop+0xc4/0xd0
      [  134.842195]  do_syscall_64+0xf4/0x130
      [  134.842197]  entry_SYSCALL_64_after_hwframe+0x3d/0xa2
      [  134.842197] RIP: 0033:0x7fe331f5a560
      [  134.842198] RSP: 002b:00007ffe8da982e8 EFLAGS: 00000246 ORIG_RAX: 0000000000000003
      [  134.842200] RAX: 0000000000000000 RBX: 00007fe32f642c70 RCX: 00007fe331f5a560
      [  134.842201] RDX: 00000000008f5320 RSI: 0000000001cd4b50 RDI: 0000000000000003
      [  134.842202] RBP: 00007fe32f6500f8 R08: 000000000000003c R09: 00000000009343c0
      [  134.842203] R10: 0000000000000000 R11: 0000000000000246 R12: 00007fe32f6500d0
      [  134.842204] R13: 00000000008f5320 R14: 00000000008f5320 R15: 0000000001cd4770
      [  134.842205] Code: c8 00 00 00 45 31 e4 49 39 fe 75 4d eb 50 83 ab d8 00 00 00 01 48 8b 17 48 8b 47 08 48 c7 07 00 00 00 00 48 c7 47 08 00 00 00 00 <48> 89 42 08 48 89 10 0f b6 57 34 8b 47 2c 2b 47 28 83 e2 01 80
      [  134.842226] RIP: tcp_close+0x2c6/0x440 RSP: ffffa6630422bde8
      [  134.842227] ---[ end trace b7138fc08c831480 ]---
      
      The proposed patch eliminates a potential racing condition.
      Before, usb_submit_urb was called and _after_ that, the skb was attached
      (dev->tx_skb). So, on a callback it was possible, however unlikely that the
      skb was freed before it was set. That way (because dev->tx_skb was not set
      to NULL after it was freed), it could happen that a skb from a earlier
      transmission was freed a second time (and the skb we should have freed did
      not get freed at all)
      
      Now we free the skb directly in ipheth_tx(). It is not passed to the
      callback anymore, eliminating the posibility of a double free of the same
      skb. Depending on the retval of usb_submit_urb() we use dev_kfree_skb_any()
      respectively dev_consume_skb_any() to free the skb.
      Signed-off-by: default avatarOliver Zweigle <Oliver.Zweigle@faro.com>
      Signed-off-by: default avatarBernd Eckstein <3ernd.Eckstein@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      ff67a7d3
    • Julian Wiedmann's avatar
      s390/qeth: fix length check in SNMP processing · 1d3891c7
      Julian Wiedmann authored
      [ Upstream commit 9a764c1e ]
      
      The response for a SNMP request can consist of multiple parts, which
      the cmd callback stages into a kernel buffer until all parts have been
      received. If the callback detects that the staging buffer provides
      insufficient space, it bails out with error.
      This processing is buggy for the first part of the response - while it
      initially checks for a length of 'data_len', it later copies an
      additional amount of 'offsetof(struct qeth_snmp_cmd, data)' bytes.
      
      Fix the calculation of 'data_len' for the first part of the response.
      This also nicely cleans up the memcpy code.
      
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Signed-off-by: default avatarJulian Wiedmann <jwi@linux.ibm.com>
      Reviewed-by: default avatarUrsula Braun <ubraun@linux.ibm.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      1d3891c7
    • Pan Bian's avatar
      rapidio/rionet: do not free skb before reading its length · 13a3d890
      Pan Bian authored
      [ Upstream commit cfc43519 ]
      
      skb is freed via dev_kfree_skb_any, however, skb->len is read then. This
      may result in a use-after-free bug.
      
      Fixes: e6161d64 ("rapidio/rionet: rework driver initialization and removal")
      Signed-off-by: default avatarPan Bian <bianpan2016@163.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      13a3d890
    • Petr Machata's avatar
      net: skb_scrub_packet(): Scrub offload_fwd_mark · d39ebd19
      Petr Machata authored
      [ Upstream commit b5dd186d ]
      
      When a packet is trapped and the corresponding SKB marked as
      already-forwarded, it retains this marking even after it is forwarded
      across veth links into another bridge. There, since it ingresses the
      bridge over veth, which doesn't have offload_fwd_mark, it triggers a
      warning in nbp_switchdev_frame_mark().
      
      Then nbp_switchdev_allowed_egress() decides not to allow egress from
      this bridge through another veth, because the SKB is already marked, and
      the mark (of 0) of course matches. Thus the packet is incorrectly
      blocked.
      
      Solve by resetting offload_fwd_mark() in skb_scrub_packet(). That
      function is called from tunnels and also from veth, and thus catches the
      cases where traffic is forwarded between bridges and transformed in a
      way that invalidates the marking.
      
      Fixes: 6bc506b4 ("bridge: switchdev: Add forward mark support for stacked devices")
      Fixes: abf4bb6b ("skbuff: Add the offload_mr_fwd_mark field")
      Signed-off-by: default avatarPetr Machata <petrm@mellanox.com>
      Suggested-by: default avatarIdo Schimmel <idosch@mellanox.com>
      Acked-by: default avatarJiri Pirko <jiri@mellanox.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      d39ebd19
    • Sasha Levin's avatar
      Revert "wlcore: Add missing PM call for wlcore_cmd_wait_for_event_or_timeout()" · ad0ee4f5
      Sasha Levin authored
      This reverts commit afeeecc7 which was
      upstream commit 4ec7cece.
      
      From Dietmar May's report on the stable mailing list
      (https://www.spinics.net/lists/stable/msg272201.html):
      
      > I've run into some problems which appear due to (a) recent patch(es) on
      > the wlcore wifi driver.
      >
      > 4.4.160 - commit 3fdd3464
      > 4.9.131 - commit afeeecc7
      >
      > Earlier versions (4.9.130 and 4.4.159 - tested back to 4.4.49) do not
      > exhibit this problem. It is still present in 4.9.141.
      >
      > master as of 4.20.0-rc4 does not exhibit this problem.
      >
      > Basically, during client association when in AP mode (running hostapd),
      > handshake may or may not complete following a noticeable delay. If
      > successful, then the driver fails consistently in warn_slowpath_null
      > during disassociation. If unsuccessful, the wifi client attempts multiple
      > times, sometimes failing repeatedly. I've had clients unable to connect
      > for 3-5 minutes during testing, with the syslog filled with dozens of
      > backtraces. syslog details are below.
      >
      > I'm working on an embedded device with a TI 3352 ARM processor and a
      > murata wl1271 module in sdio mode. We're running a fully patched ubuntu
      > 18.04 ARM build, with a kernel built from kernel.org's stable/linux repo <https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=linux-4.9.y&id=afeeecc764436f31d4447575bb9007732333818c>.
      > Relevant parts of the kernel config are included below.
      >
      > The commit message states:
      >
      > > /I've only seen this few times with the runtime PM patches enabled so
      > > this one is probably not needed before that. This seems to work
      > > currently based on the current PM implementation timer. Let's apply
      > > this separately though in case others are hitting this issue./
      > We're not doing anything explicit with power management. The device is an
      > IoT edge gateway with battery backup, normally running on wall power. The
      > battery is currently used solely to shut down the system cleanly to avoid
      > filesystem corruption.
      >
      > The device tree is configured to keep power in suspend; but the device
      > should never suspend, so in our case, there is no need to call
      > wl1271_ps_elp_wakeup() or wl1271_ps_elp_sleep(), as occurs in the patch.
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      ad0ee4f5
    • Matthias Schwarzott's avatar
      media: em28xx: Fix use-after-free when disconnecting · 6fc74d9f
      Matthias Schwarzott authored
      [ Upstream commit 910b0797 ]
      
      Fix bug by moving the i2c_unregister_device calls after deregistration
      of dvb frontend.
      
      The new style i2c drivers already destroys the frontend object at
      i2c_unregister_device time.
      When the dvb frontend is unregistered afterwards it leads to this oops:
      
        [ 6058.866459] BUG: unable to handle kernel NULL pointer dereference at 00000000000001f8
        [ 6058.866578] IP: dvb_frontend_stop+0x30/0xd0 [dvb_core]
        [ 6058.866644] PGD 0
        [ 6058.866646] P4D 0
      
        [ 6058.866726] Oops: 0000 [#1] SMP
        [ 6058.866768] Modules linked in: rc_pinnacle_pctv_hd(O) em28xx_rc(O) si2157(O) si2168(O) em28xx_dvb(O) em28xx(O) si2165(O) a8293(O) tda10071(O) tea5767(O) tuner(O) cx23885(O) tda18271(O) videobuf2_dvb(O) videobuf2_dma_sg(O) m88ds3103(O) tveeprom(O) cx2341x(O) v4l2_common(O) dvb_core(O) rc_core(O) videobuf2_memops(O) videobuf2_v4l2(O) videobuf2_core(O) videodev(O) media(O) bluetooth ecdh_generic ums_realtek uas rtl8192cu rtl_usb rtl8192c_common rtlwifi usb_storage snd_hda_codec_realtek snd_hda_codec_hdmi snd_hda_codec_generic i2c_mux snd_hda_intel snd_hda_codec snd_hwdep x86_pkg_temp_thermal snd_hda_core kvm_intel kvm irqbypass [last unloaded: videobuf2_memops]
        [ 6058.867497] CPU: 2 PID: 7349 Comm: kworker/2:0 Tainted: G        W  O    4.13.9-gentoo #1
        [ 6058.867595] Hardware name: MEDION E2050 2391/H81H3-EM2, BIOS H81EM2W08.308 08/25/2014
        [ 6058.867692] Workqueue: usb_hub_wq hub_event
        [ 6058.867746] task: ffff88011a15e040 task.stack: ffffc90003074000
        [ 6058.867825] RIP: 0010:dvb_frontend_stop+0x30/0xd0 [dvb_core]
        [ 6058.867896] RSP: 0018:ffffc90003077b58 EFLAGS: 00010293
        [ 6058.867964] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 000000010040001f
        [ 6058.868056] RDX: ffff88011a15e040 RSI: ffffea000464e400 RDI: ffff88001cbe3028
        [ 6058.868150] RBP: ffffc90003077b68 R08: ffff880119390380 R09: 000000010040001f
        [ 6058.868241] R10: ffffc90003077b18 R11: 000000000001e200 R12: ffff88001cbe3028
        [ 6058.868330] R13: ffff88001cbe68d0 R14: ffff8800cf734000 R15: ffff8800cf734098
        [ 6058.868419] FS:  0000000000000000(0000) GS:ffff88011fb00000(0000) knlGS:0000000000000000
        [ 6058.868511] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        [ 6058.868578] CR2: 00000000000001f8 CR3: 00000001113c5000 CR4: 00000000001406e0
        [ 6058.868662] Call Trace:
        [ 6058.868705]  dvb_unregister_frontend+0x2a/0x80 [dvb_core]
        [ 6058.868774]  em28xx_dvb_fini+0x132/0x220 [em28xx_dvb]
        [ 6058.868840]  em28xx_close_extension+0x34/0x90 [em28xx]
        [ 6058.868902]  em28xx_usb_disconnect+0x4e/0x70 [em28xx]
        [ 6058.868968]  usb_unbind_interface+0x6d/0x260
        [ 6058.869025]  device_release_driver_internal+0x150/0x210
        [ 6058.869094]  device_release_driver+0xd/0x10
        [ 6058.869150]  bus_remove_device+0xe4/0x160
        [ 6058.869204]  device_del+0x1ce/0x2f0
        [ 6058.869253]  usb_disable_device+0x99/0x270
        [ 6058.869306]  usb_disconnect+0x8d/0x260
        [ 6058.869359]  hub_event+0x93d/0x1520
        [ 6058.869408]  ? dequeue_task_fair+0xae5/0xd20
        [ 6058.869467]  process_one_work+0x1d9/0x3e0
        [ 6058.869522]  worker_thread+0x43/0x3e0
        [ 6058.869576]  kthread+0x104/0x140
        [ 6058.869602]  ? trace_event_raw_event_workqueue_work+0x80/0x80
        [ 6058.869640]  ? kthread_create_on_node+0x40/0x40
        [ 6058.869673]  ret_from_fork+0x22/0x30
        [ 6058.869698] Code: 54 49 89 fc 53 48 8b 9f 18 03 00 00 0f 1f 44 00 00 41 83 bc 24 04 05 00 00 02 74 0c 41 c7 84 24 04 05 00 00 01 00 00 00 0f ae f0 <48> 8b bb f8 01 00 00 48 85 ff 74 5c e8 df 40 f0 e0 48 8b 93 f8
        [ 6058.869850] RIP: dvb_frontend_stop+0x30/0xd0 [dvb_core] RSP: ffffc90003077b58
        [ 6058.869894] CR2: 00000000000001f8
        [ 6058.875880] ---[ end trace 717eecf7193b3fc6 ]---
      Signed-off-by: default avatarMatthias Schwarzott <zzam@gentoo.org>
      Signed-off-by: default avatarMauro Carvalho Chehab <mchehab@s-opensource.com>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      6fc74d9f
    • Hugh Dickins's avatar
      mm/khugepaged: collapse_shmem() do not crash on Compound · dc62803e
      Hugh Dickins authored
      commit 06a5e126 upstream.
      
      collapse_shmem()'s VM_BUG_ON_PAGE(PageTransCompound) was unsafe: before
      it holds page lock of the first page, racing truncation then extension
      might conceivably have inserted a hugepage there already.  Fail with the
      SCAN_PAGE_COMPOUND result, instead of crashing (CONFIG_DEBUG_VM=y) or
      otherwise mishandling the unexpected hugepage - though later we might
      code up a more constructive way of handling it, with SCAN_SUCCESS.
      
      Link: http://lkml.kernel.org/r/alpine.LSU.2.11.1811261529310.2275@eggly.anvils
      Fixes: f3f0e1d2 ("khugepaged: add support of collapse for tmpfs/shmem pages")
      Signed-off-by: default avatarHugh Dickins <hughd@google.com>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Jerome Glisse <jglisse@redhat.com>
      Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: <stable@vger.kernel.org>	[4.8+]
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      dc62803e
    • Hugh Dickins's avatar
      mm/khugepaged: collapse_shmem() without freezing new_page · 8dcbb5f2
      Hugh Dickins authored
      commit 87c460a0 upstream.
      
      khugepaged's collapse_shmem() does almost all of its work, to assemble
      the huge new_page from 512 scattered old pages, with the new_page's
      refcount frozen to 0 (and refcounts of all old pages so far also frozen
      to 0).  Including shmem_getpage() to read in any which were out on swap,
      memory reclaim if necessary to allocate their intermediate pages, and
      copying over all the data from old to new.
      
      Imagine the frozen refcount as a spinlock held, but without any lock
      debugging to highlight the abuse: it's not good, and under serious load
      heads into lockups - speculative getters of the page are not expecting
      to spin while khugepaged is rescheduled.
      
      One can get a little further under load by hacking around elsewhere; but
      fortunately, freezing the new_page turns out to have been entirely
      unnecessary, with no hacks needed elsewhere.
      
      The huge new_page lock is already held throughout, and guards all its
      subpages as they are brought one by one into the page cache tree; and
      anything reading the data in that page, without the lock, before it has
      been marked PageUptodate, would already be in the wrong.  So simply
      eliminate the freezing of the new_page.
      
      Each of the old pages remains frozen with refcount 0 after it has been
      replaced by a new_page subpage in the page cache tree, until they are
      all unfrozen on success or failure: just as before.  They could be
      unfrozen sooner, but cause no problem once no longer visible to
      find_get_entry(), filemap_map_pages() and other speculative lookups.
      
      Link: http://lkml.kernel.org/r/alpine.LSU.2.11.1811261527570.2275@eggly.anvils
      Fixes: f3f0e1d2 ("khugepaged: add support of collapse for tmpfs/shmem pages")
      Signed-off-by: default avatarHugh Dickins <hughd@google.com>
      Acked-by: default avatarKirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Jerome Glisse <jglisse@redhat.com>
      Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: <stable@vger.kernel.org>	[4.8+]
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      8dcbb5f2
    • Hugh Dickins's avatar
      mm/khugepaged: minor reorderings in collapse_shmem() · c2ca73b7
      Hugh Dickins authored
      commit 042a3082 upstream.
      
      Several cleanups in collapse_shmem(): most of which probably do not
      really matter, beyond doing things in a more familiar and reassuring
      order.  Simplify the failure gotos in the main loop, and on success
      update stats while interrupts still disabled from the last iteration.
      
      Link: http://lkml.kernel.org/r/alpine.LSU.2.11.1811261526400.2275@eggly.anvils
      Fixes: f3f0e1d2 ("khugepaged: add support of collapse for tmpfs/shmem pages")
      Signed-off-by: default avatarHugh Dickins <hughd@google.com>
      Acked-by: default avatarKirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Jerome Glisse <jglisse@redhat.com>
      Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: <stable@vger.kernel.org>	[4.8+]
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      c2ca73b7
    • Hugh Dickins's avatar
      mm/khugepaged: collapse_shmem() remember to clear holes · 5c0ecc2b
      Hugh Dickins authored
      commit 2af8ff29 upstream.
      
      Huge tmpfs testing reminds us that there is no __GFP_ZERO in the gfp
      flags khugepaged uses to allocate a huge page - in all common cases it
      would just be a waste of effort - so collapse_shmem() must remember to
      clear out any holes that it instantiates.
      
      The obvious place to do so, where they are put into the page cache tree,
      is not a good choice: because interrupts are disabled there.  Leave it
      until further down, once success is assured, where the other pages are
      copied (before setting PageUptodate).
      
      Link: http://lkml.kernel.org/r/alpine.LSU.2.11.1811261525080.2275@eggly.anvils
      Fixes: f3f0e1d2 ("khugepaged: add support of collapse for tmpfs/shmem pages")
      Signed-off-by: default avatarHugh Dickins <hughd@google.com>
      Acked-by: default avatarKirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Jerome Glisse <jglisse@redhat.com>
      Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: <stable@vger.kernel.org>	[4.8+]
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      5c0ecc2b
    • Hugh Dickins's avatar
      mm/khugepaged: fix crashes due to misaccounted holes · 0dba3e54
      Hugh Dickins authored
      commit aaa52e34 upstream.
      
      Huge tmpfs testing on a shortish file mapped into a pmd-rounded extent
      hit shmem_evict_inode()'s WARN_ON(inode->i_blocks) followed by
      clear_inode()'s BUG_ON(inode->i_data.nrpages) when the file was later
      closed and unlinked.
      
      khugepaged's collapse_shmem() was forgetting to update mapping->nrpages
      on the rollback path, after it had added but then needs to undo some
      holes.
      
      There is indeed an irritating asymmetry between shmem_charge(), whose
      callers want it to increment nrpages after successfully accounting
      blocks, and shmem_uncharge(), when __delete_from_page_cache() already
      decremented nrpages itself: oh well, just add a comment on that to them
      both.
      
      And shmem_recalc_inode() is supposed to be called when the accounting is
      expected to be in balance (so it can deduce from imbalance that reclaim
      discarded some pages): so change shmem_charge() to update nrpages
      earlier (though it's rare for the difference to matter at all).
      
      Link: http://lkml.kernel.org/r/alpine.LSU.2.11.1811261523450.2275@eggly.anvils
      Fixes: 800d8c63 ("shmem: add huge pages support")
      Fixes: f3f0e1d2 ("khugepaged: add support of collapse for tmpfs/shmem pages")
      Signed-off-by: default avatarHugh Dickins <hughd@google.com>
      Acked-by: default avatarKirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Jerome Glisse <jglisse@redhat.com>
      Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: <stable@vger.kernel.org>	[4.8+]
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      0dba3e54
    • Mike Rapoport's avatar
      shmem: introduce shmem_inode_acct_block · 9815b0fc
      Mike Rapoport authored
      commit 0f079694 upstream.
      
      The shmem_acct_block and the update of used_blocks are following one
      another in all the places they are used.  Combine these two into a
      helper function.
      
      Link: http://lkml.kernel.org/r/1497939652-16528-3-git-send-email-rppt@linux.vnet.ibm.comSigned-off-by: default avatarMike Rapoport <rppt@linux.vnet.ibm.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Hillf Danton <hillf.zj@alibaba-inc.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Pavel Emelyanov <xemul@virtuozzo.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      9815b0fc
    • Mike Rapoport's avatar
      shmem: shmem_charge: verify max_block is not exceeded before inode update · cae7ed25
      Mike Rapoport authored
      commit b1cc94ab upstream.
      
      Patch series "userfaultfd: enable zeropage support for shmem".
      
      These patches enable support for UFFDIO_ZEROPAGE for shared memory.
      
      The first two patches are not strictly related to userfaultfd, they are
      just minor refactoring to reduce amount of code duplication.
      
      This patch (of 7):
      
      Currently we update inode and shmem_inode_info before verifying that
      used_blocks will not exceed max_blocks.  In case it will, we undo the
      update.  Let's switch the order and move the verification of the blocks
      count before the inode and shmem_inode_info update.
      
      Link: http://lkml.kernel.org/r/1497939652-16528-2-git-send-email-rppt@linux.vnet.ibm.comSigned-off-by: default avatarMike Rapoport <rppt@linux.vnet.ibm.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Hillf Danton <hillf.zj@alibaba-inc.com>
      Cc: Pavel Emelyanov <xemul@virtuozzo.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      cae7ed25
    • Hugh Dickins's avatar
      mm/khugepaged: collapse_shmem() stop if punched or truncated · 10e458e6
      Hugh Dickins authored
      commit 701270fa upstream.
      
      Huge tmpfs testing showed that although collapse_shmem() recognizes a
      concurrently truncated or hole-punched page correctly, its handling of
      holes was liable to refill an emptied extent.  Add check to stop that.
      
      Link: http://lkml.kernel.org/r/alpine.LSU.2.11.1811261522040.2275@eggly.anvils
      Fixes: f3f0e1d2 ("khugepaged: add support of collapse for tmpfs/shmem pages")
      Signed-off-by: default avatarHugh Dickins <hughd@google.com>
      Reviewed-by: default avatarMatthew Wilcox <willy@infradead.org>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Jerome Glisse <jglisse@redhat.com>
      Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
      Cc: <stable@vger.kernel.org>	[4.8+]
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      10e458e6
    • Hugh Dickins's avatar
      mm/huge_memory: fix lockdep complaint on 32-bit i_size_read() · b59b24fe
      Hugh Dickins authored
      commit 006d3ff2 upstream.
      
      Huge tmpfs testing, on 32-bit kernel with lockdep enabled, showed that
      __split_huge_page() was using i_size_read() while holding the irq-safe
      lru_lock and page tree lock, but the 32-bit i_size_read() uses an
      irq-unsafe seqlock which should not be nested inside them.
      
      Instead, read the i_size earlier in split_huge_page_to_list(), and pass
      the end offset down to __split_huge_page(): all while holding head page
      lock, which is enough to prevent truncation of that extent before the
      page tree lock has been taken.
      
      Link: http://lkml.kernel.org/r/alpine.LSU.2.11.1811261520070.2275@eggly.anvils
      Fixes: baa355fd ("thp: file pages support for split_huge_page()")
      Signed-off-by: default avatarHugh Dickins <hughd@google.com>
      Acked-by: default avatarKirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Jerome Glisse <jglisse@redhat.com>
      Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: <stable@vger.kernel.org>	[4.8+]
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      b59b24fe
    • Hugh Dickins's avatar
      mm/huge_memory: splitting set mapping+index before unfreeze · ffdad597
      Hugh Dickins authored
      commit 173d9d9f upstream.
      
      Huge tmpfs stress testing has occasionally hit shmem_undo_range()'s
      VM_BUG_ON_PAGE(page_to_pgoff(page) != index, page).
      
      Move the setting of mapping and index up before the page_ref_unfreeze()
      in __split_huge_page_tail() to fix this: so that a page cache lookup
      cannot get a reference while the tail's mapping and index are unstable.
      
      In fact, might as well move them up before the smp_wmb(): I don't see an
      actual need for that, but if I'm missing something, this way round is
      safer than the other, and no less efficient.
      
      You might argue that VM_BUG_ON_PAGE(page_to_pgoff(page) != index, page) is
      misplaced, and should be left until after the trylock_page(); but left as
      is has not crashed since, and gives more stringent assurance.
      
      Link: http://lkml.kernel.org/r/alpine.LSU.2.11.1811261516380.2275@eggly.anvils
      Fixes: e9b61f19 ("thp: reintroduce split_huge_page()")
      Requires: 605ca5ed ("mm/huge_memory.c: reorder operations in __split_huge_page_tail()")
      Signed-off-by: default avatarHugh Dickins <hughd@google.com>
      Acked-by: default avatarKirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
      Cc: Jerome Glisse <jglisse@redhat.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: <stable@vger.kernel.org>	[4.8+]
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      ffdad597
    • Konstantin Khlebnikov's avatar
      mm/huge_memory.c: reorder operations in __split_huge_page_tail() · fb732e62
      Konstantin Khlebnikov authored
      commit 605ca5ed upstream.
      
      THP split makes non-atomic change of tail page flags.  This is almost ok
      because tail pages are locked and isolated but this breaks recent
      changes in page locking: non-atomic operation could clear bit
      PG_waiters.
      
      As a result concurrent sequence get_page_unless_zero() -> lock_page()
      might block forever.  Especially if this page was truncated later.
      
      Fix is trivial: clone flags before unfreezing page reference counter.
      
      This race exists since commit 62906027 ("mm: add PageWaiters
      indicating tasks are waiting for a page bit") while unsave unfreeze
      itself was added in commit 8df651c7 ("thp: cleanup
      split_huge_page()").
      
      clear_compound_head() also must be called before unfreezing page
      reference because after successful get_page_unless_zero() might follow
      put_page() which needs correct compound_head().
      
      And replace page_ref_inc()/page_ref_add() with page_ref_unfreeze() which
      is made especially for that and has semantic of smp_store_release().
      
      Link: http://lkml.kernel.org/r/151844393341.210639.13162088407980624477.stgit@buzzSigned-off-by: default avatarKonstantin Khlebnikov <khlebnikov@yandex-team.ru>
      Acked-by: default avatarKirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Nicholas Piggin <npiggin@gmail.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      fb732e62
    • Hugh Dickins's avatar
      mm/huge_memory: rename freeze_page() to unmap_page() · b48c29b1
      Hugh Dickins authored
      commit 906f9cdf upstream.
      
      The term "freeze" is used in several ways in the kernel, and in mm it
      has the particular meaning of forcing page refcount temporarily to 0.
      freeze_page() is just too confusing a name for a function that unmaps a
      page: rename it unmap_page(), and rename unfreeze_page() remap_page().
      
      Went to change the mention of freeze_page() added later in mm/rmap.c,
      but found it to be incorrect: ordinary page reclaim reaches there too;
      but the substance of the comment still seems correct, so edit it down.
      
      Link: http://lkml.kernel.org/r/alpine.LSU.2.11.1811261514080.2275@eggly.anvils
      Fixes: e9b61f19 ("thp: reintroduce split_huge_page()")
      Signed-off-by: default avatarHugh Dickins <hughd@google.com>
      Acked-by: default avatarKirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Jerome Glisse <jglisse@redhat.com>
      Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: <stable@vger.kernel.org>	[4.8+]
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      b48c29b1
  2. 01 Dec, 2018 11 commits