1. 20 Nov, 2016 4 commits
  2. 19 Nov, 2016 17 commits
    • Merge branch 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux · 50d438fb
      Linus Torvalds authored
      Pull i2c fixes from Wolfram Sang:
       "Some I2C driver bugfixes (and one documentation fix)"
      
      * 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
        i2c: i2c-mux-pca954x: fix deselect enabling for device-tree
        i2c: digicolor: use clk_disable_unprepare instead of clk_unprepare
        i2c: mux: fix up dependencies
        i2c: Documentation: i2c-topology: fix minor whitespace nit
        i2c: mux: demux-pinctrl: make drivers with no pinctrl work again
    • Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm · dce9ce36
      Linus Torvalds authored
      Pull KVM fixes from Radim Krčmář:
       "ARM:
         - Fix handling of the 32bit cycle counter
         - Fix cycle counter filtering
      
        x86:
         - Fix a race leading to double unregistering of user notifiers
         - Amend oversight in kvm_arch_set_irq that turned Hyper-V code dead
         - Use SRCU around kvm_lapic_set_vapic_addr
         - Avoid recursive flushing of asynchronous page faults
         - Do not rely on deferred update in KVM_GET_CLOCK, which fixes #GP
         - Let userspace know that KVM_GET_CLOCK is useful with master clock;
           4.9 changed the return value to better match the guest clock, but
           didn't provide means to let guests take advantage of it"
      
      * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
        kvm: x86: merge kvm_arch_set_irq and kvm_arch_set_irq_inatomic
        KVM: x86: fix missed SRCU usage in kvm_lapic_set_vapic_addr
        KVM: async_pf: avoid recursive flushing of work items
        kvm: kvmclock: let KVM_GET_CLOCK return whether the master clock is in use
        KVM: Disable irq while unregistering user notifier
        KVM: x86: do not go through vcpu in __get_kvmclock_ns
        KVM: arm64: Fix the issues when guest PMCCFILTR is configured
        arm64: KVM: pmu: Fix AArch32 cycle counter access
    • i2c: i2c-mux-pca954x: fix deselect enabling for device-tree · ad092de6
      Alex Hemme authored
      Deselect functionality can be ignored for device-trees with
      "i2c-mux-idle-disconnect" entries if no platform_data is available.
      By enabling the deselect functionality outside the platform_data
      block the logic works as it did in previous kernels.
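
The effect of moving the deselect computation out of the platform_data block can be illustrated with a small model. This is a simplified Python sketch under stated assumptions, not the driver code: `deselect_mask`, its parameters, and the `fixed` switch are hypothetical names introduced only for illustration.

```python
def deselect_mask(nchans, idle_disconnect_dt, pdata_modes=None, fixed=True):
    """Return a bitmask with one bit per mux channel to deselect on exit.

    pdata_modes is a list of per-channel booleans (a stand-in for a
    deselect-on-exit setting), or None when no platform_data is available,
    which is the device-tree-only case. All names here are hypothetical.
    """
    mask = 0
    for num in range(nchans):
        idle_disconnect_pd = False
        if pdata_modes is not None:
            if num >= len(pdata_modes):
                break
            idle_disconnect_pd = pdata_modes[num]
            if not fixed:
                # Modelled pre-fix behaviour: the mask is only updated inside
                # the platform_data branch, so device-tree-only setups are
                # silently skipped.
                mask |= int(idle_disconnect_pd or idle_disconnect_dt) << num
        if fixed:
            # Modelled post-fix behaviour: the mask is updated for every
            # channel, with or without platform_data.
            mask |= int(idle_disconnect_pd or idle_disconnect_dt) << num
    return mask
```

With `pdata_modes=None` and `idle_disconnect_dt=True`, the `fixed=False` variant returns an empty mask, while the `fixed=True` variant sets a bit for every channel.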
      
      Fixes: 7fcac980 ("i2c: i2c-mux-pca954x: convert to use an explicit i2c mux core")
      Cc: <stable@vger.kernel.org> # v4.7+
      Signed-off-by: Alex Hemme <ahemme@cisco.com>
      Signed-off-by: Ziyang Wu <ziywu@cisco.com>
      [touched up a few minor issues /peda]
      Signed-off-by: Peter Rosin <peda@axentia.se>
      Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
    • Merge tag 'powerpc-4.9-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux · f6918382
      Linus Torvalds authored
      Pull powerpc fixes from Michael Ellerman:
       "Fixes marked for stable:
         - fix system reset interrupt winkle wakeups
         - fix setting of AIL in hypervisor mode
      
        Fixes for code merged this cycle:
         - fix exception vector build with 2.23 era binutils
         - fix missing update of HID register on secondary CPUs
      
        Other:
         - fix missing pr_cont()s
         - invalidate ERAT on tlbiel for POWER9 DD1"
      
      * tag 'powerpc-4.9-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
        powerpc/mm: Fix missing update of HID register on secondary CPUs
        powerpc/mm/radix: Invalidate ERAT on tlbiel for POWER9 DD1
        powerpc/64: Fix setting of AIL in hypervisor mode
        powerpc/oops: Fix missing pr_cont()s in instruction dump
        powerpc/oops: Fix missing pr_cont()s in show_regs()
        powerpc/oops: Fix missing pr_cont()s in print_msr_bits() et. al.
        powerpc/oops: Fix missing pr_cont()s in show_stack()
        powerpc: Fix exception vector build with 2.23 era binutils
        powerpc/64s: Fix system reset interrupt winkle wakeups
    • Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 · 384b0dc4
      Linus Torvalds authored
      Pull crypto fixes from Herbert Xu:
       "This fixes the following issues:
      
         - Compiler warning in caam driver that was the last one remaining
      
         - Do not register aes-xts in caam drivers on unsupported platforms
      
         - Regression in algif_hash interface that may lead to an oops"
      
      * 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
        crypto: algif_hash - Fix NULL hash crash with shash
        crypto: caam - fix type mismatch warning
        crypto: caam - do not register AES-XTS mode on LP units
    • Merge tag 'leds_4.9-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/j.anaszewski/linux-leds · 67418976
      Linus Torvalds authored
      Pull LED subsystem update from Jacek Anaszewski:
       "I'd like to announce a new co-maintainer - Pavel Machek"
      
      * tag 'leds_4.9-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/j.anaszewski/linux-leds:
        MAINTAINERS: Add LED subsystem co-maintainer
    • Merge tag 'dmaengine-fix-4.9-rc6' of git://git.infradead.org/users/vkoul/slave-dma · eab8d4bc
      Linus Torvalds authored
      Pull dmaengine fixes from Vinod Koul:
       "Some driver fixes which were pending in my tree:
      
         - return error code fix in edma driver
         - Kconfig fix for generic allocator in mmp_tdma
         - fix uninitialized value in sun6i
         - Runtime pm fixes for cppi"
      
      * tag 'dmaengine-fix-4.9-rc6' of git://git.infradead.org/users/vkoul/slave-dma:
        dmaengine: cppi41: More PM runtime fixes
        dmaengine: cpp41: Fix handling of error path
        dmaengine: cppi41: Fix unpaired pm runtime when only a USB hub is connected
        dmaengine: cppi41: Fix list not empty warning on module removal
        dmaengine: sun6i: fix the uninitialized value for v_lli
        dmaengine: mmp_tdma: add missing select GENERIC_ALLOCATOR in Kconfig
        dmaengine: edma: Fix error return code in edma_alloc_chan_resources()
    • kvm: x86: merge kvm_arch_set_irq and kvm_arch_set_irq_inatomic · a2b07739
      Paolo Bonzini authored
      kvm_arch_set_irq is unused since commit b97e6de9.  Merge
      its functionality with kvm_arch_set_irq_inatomic.
      Reported-by: Jiang Biao <jiang.biao2@zte.com.cn>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Reviewed-by: David Hildenbrand <david@redhat.com>
      Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
    • KVM: x86: fix missed SRCU usage in kvm_lapic_set_vapic_addr · 7301d6ab
      Paolo Bonzini authored
      Reported by syzkaller:
      
          [ INFO: suspicious RCU usage. ]
          4.9.0-rc4+ #47 Not tainted
          -------------------------------
          ./include/linux/kvm_host.h:536 suspicious rcu_dereference_check() usage!
      
          stack backtrace:
          CPU: 1 PID: 6679 Comm: syz-executor Not tainted 4.9.0-rc4+ #47
          Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
           ffff880039e2f6d0 ffffffff81c2e46b ffff88003e3a5b40 0000000000000000
           0000000000000001 ffffffff83215600 ffff880039e2f700 ffffffff81334ea9
           ffffc9000730b000 0000000000000004 ffff88003c4f8420 ffff88003d3f8000
          Call Trace:
           [<     inline     >] __dump_stack lib/dump_stack.c:15
           [<ffffffff81c2e46b>] dump_stack+0xb3/0x118 lib/dump_stack.c:51
           [<ffffffff81334ea9>] lockdep_rcu_suspicious+0x139/0x180 kernel/locking/lockdep.c:4445
           [<     inline     >] __kvm_memslots include/linux/kvm_host.h:534
           [<     inline     >] kvm_memslots include/linux/kvm_host.h:541
           [<ffffffff8105d6ae>] kvm_gfn_to_hva_cache_init+0xa1e/0xce0 virt/kvm/kvm_main.c:1941
           [<ffffffff8112685d>] kvm_lapic_set_vapic_addr+0xed/0x140 arch/x86/kvm/lapic.c:2217
      Reported-by: Dmitry Vyukov <dvyukov@google.com>
      Fixes: fda4e2e8
      Cc: Andrew Honig <ahonig@google.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Reviewed-by: David Hildenbrand <david@redhat.com>
      Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
    • KVM: async_pf: avoid recursive flushing of work items · 22583f0d
      Paolo Bonzini authored
      This was reported by syzkaller:
      
          [ INFO: possible recursive locking detected ]
          4.9.0-rc4+ #49 Not tainted
          ---------------------------------------------
          kworker/2:1/5658 is trying to acquire lock:
           ([ 1644.769018] (&work->work)
          [<     inline     >] list_empty include/linux/compiler.h:243
          [<ffffffff8128dd60>] flush_work+0x0/0x660 kernel/workqueue.c:1511
      
          but task is already holding lock:
           ([ 1644.769018] (&work->work)
          [<ffffffff812916ab>] process_one_work+0x94b/0x1900 kernel/workqueue.c:2093
      
          stack backtrace:
          CPU: 2 PID: 5658 Comm: kworker/2:1 Not tainted 4.9.0-rc4+ #49
          Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
          Workqueue: events async_pf_execute
           ffff8800676ff630 ffffffff81c2e46b ffffffff8485b930 ffff88006b1fc480
           0000000000000000 ffffffff8485b930 ffff8800676ff7e0 ffffffff81339b27
           ffff8800676ff7e8 0000000000000046 ffff88006b1fcce8 ffff88006b1fccf0
          Call Trace:
          ...
          [<ffffffff8128ddf3>] flush_work+0x93/0x660 kernel/workqueue.c:2846
          [<ffffffff812954ea>] __cancel_work_timer+0x17a/0x410 kernel/workqueue.c:2916
          [<ffffffff81295797>] cancel_work_sync+0x17/0x20 kernel/workqueue.c:2951
          [<ffffffff81073037>] kvm_clear_async_pf_completion_queue+0xd7/0x400 virt/kvm/async_pf.c:126
          [<     inline     >] kvm_free_vcpus arch/x86/kvm/x86.c:7841
          [<ffffffff810b728d>] kvm_arch_destroy_vm+0x23d/0x620 arch/x86/kvm/x86.c:7946
          [<     inline     >] kvm_destroy_vm virt/kvm/kvm_main.c:731
          [<ffffffff8105914e>] kvm_put_kvm+0x40e/0x790 virt/kvm/kvm_main.c:752
          [<ffffffff81072b3d>] async_pf_execute+0x23d/0x4f0 virt/kvm/async_pf.c:111
          [<ffffffff8129175c>] process_one_work+0x9fc/0x1900 kernel/workqueue.c:2096
          [<ffffffff8129274f>] worker_thread+0xef/0x1480 kernel/workqueue.c:2230
          [<ffffffff812a5a94>] kthread+0x244/0x2d0 kernel/kthread.c:209
          [<ffffffff831f102a>] ret_from_fork+0x2a/0x40 arch/x86/entry/entry_64.S:433
      
      The reason is that kvm_put_kvm is causing the destruction of the VM, but
      the page fault is still on the ->queue list.  The ->queue list is owned
      by the VCPU, not by the work items, so we cannot just add list_del to
      the work item.
      
      Instead, use work->vcpu to note async page faults that have been resolved
      and will be processed through the done list.  There is no need to flush
      those.
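
The idea of marking resolved faults and skipping them during teardown can be sketched with a toy model. This is a minimal Python illustration under stated assumptions, not the kernel implementation: the `Work` class and the `resolve` and `clear_queue` helpers are hypothetical, and setting `vcpu` to `None` stands in for whatever marker the real code uses.

```python
class Work:
    """Toy stand-in for an async page fault work item (hypothetical)."""

    def __init__(self, vcpu):
        self.vcpu = vcpu        # owning VCPU; None once the fault is resolved
        self.flushed = False


def resolve(work, done_list):
    """Model a work item finishing: queue it on the done list and mark it."""
    done_list.append(work)
    work.vcpu = None


def clear_queue(queue):
    """Model tearing down the per-VCPU queue; return the items that were flushed."""
    flushed = []
    while queue:
        work = queue.pop(0)
        if work.vcpu is None:
            # Already resolved: it is handled through the done list,
            # so there is nothing to flush.
            continue
        work.flushed = True     # stands in for cancelling/flushing the work item
        flushed.append(work)
    return flushed
```

In this model, a work item that triggers the teardown after resolving itself is skipped rather than flushed, which is the recursion the commit message describes avoiding.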
      
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
    • kvm: kvmclock: let KVM_GET_CLOCK return whether the master clock is in use · e3fd9a93
      Paolo Bonzini authored
      Userspace can read the exact value of kvmclock by reading the TSC
      and fetching the timekeeping parameters out of guest memory.  This
      however is brittle and not necessary anymore with KVM 4.11.  Provide
      a mechanism that lets userspace know if the new KVM_GET_CLOCK
      semantics are in effect, and---since we are at it---if the clock
      is stable across all VCPUs.
      
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
    • KVM: Disable irq while unregistering user notifier · 1650b4eb
      Ignacio Alvarado authored
      Function user_notifier_unregister should be called only once for each
      registered user notifier.
      
      Function kvm_arch_hardware_disable can be executed from an IPI context
      which could cause a race condition with a VCPU returning to user mode
      and attempting to unregister the notifier.
      Signed-off-by: Ignacio Alvarado <ikalvarado@google.com>
      Cc: stable@vger.kernel.org
      Fixes: 18863bdd ("KVM: x86 shared msr infrastructure")
      Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
    • KVM: x86: do not go through vcpu in __get_kvmclock_ns · 8b953440
      Paolo Bonzini authored
      Going through the first VCPU is wrong if you follow a KVM_SET_CLOCK with
      a KVM_GET_CLOCK immediately after, without letting the VCPU run and
      call kvm_guest_time_update.
      
      To fix this, compute the kvmclock value ourselves, using the master
      clock (tsc, nsec) pair as the base and the host CPU frequency as
      the scale.
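
The arithmetic described above can be sketched as follows. This is a simplified Python model under stated assumptions, not the kernel code: the function name, its parameters, and the optional `kvmclock_offset` term are hypothetical, and integer division stands in for whatever fixed-point scaling the kernel uses.

```python
NSEC_PER_SEC = 1_000_000_000


def kvmclock_ns(master_tsc, master_ns, tsc_now, tsc_hz, kvmclock_offset=0):
    """Derive a nanosecond clock from a (tsc, nsec) base pair.

    master_tsc / master_ns: the base pair captured at the same instant.
    tsc_now: the current TSC reading on the host.
    tsc_hz: host TSC frequency in Hz, used as the scale.
    kvmclock_offset: hypothetical constant offset added to the result.
    """
    delta_cycles = tsc_now - master_tsc
    delta_ns = delta_cycles * NSEC_PER_SEC // tsc_hz
    return master_ns + delta_ns + kvmclock_offset
```

Because the result depends only on the base pair and the current TSC reading, the model does not need any per-VCPU state to have been updated first.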
      Reported-by: Marcelo Tosatti <mtosatti@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
    • Merge tag 'kvm-arm-for-4.9-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm · e5dbc4bf
      Radim Krčmář authored
      KVM/ARM updates for v4.9-rc6
      
      - Fix handling of the 32bit cycle counter
      - Fix cycle counter filtering
    • Merge tag 'acpi-4.9-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm · 20afa6e2
      Linus Torvalds authored
      Pull ACPI fixes from Rafael Wysocki:
       "They fix an ACPI thermal management regression introduced by a recent
        FADT handling cleanup, an ACPI tools build issue introduced by a
        recent ACPICA commit and a PCC mailbox initialization bug causing
        lockdep to complain loudly.
      
        Specifics:
      
         - Revert a recent ACPICA cleanup that attempted to get rid of all
           FADT version 2 legacy, but broke ACPI thermal management on at
           least one system (Rafael Wysocki).
      
         - Fix cross-compiled builds of ACPI tools that stopped working after
           a recent cleanup related to the handling of header files in ACPICA
           (Lv Zheng).
      
         - Fix a locking issue in the PCC channel initialization code that
           invokes devm_request_irq() under a spinlock (among other things)
           and causes lockdep to complain (Hoan Tran)"
      
      * tag 'acpi-4.9-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
        tools/power/acpi: Remove direct kernel source include reference
        mailbox: PCC: Fix lockdep warning when request PCC channel
        Revert "ACPICA: FADT support cleanup"
    • Merge branch 'rc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild · 04e36857
      Linus Torvalds authored
      Pull kbuild fixes from Michal Marek:
       "Here are some regression fixes for kbuild:
      
         - modversion support for exported asm symbols (Nick Piggin). The
           affected architectures need separate patches adding
           asm-prototypes.h.
      
         - fix rebuilds of lib-ksyms.o (Nick Piggin)
      
         - -fno-PIE builds (Sebastian Siewior and Borislav Petkov). This is
           not a kernel regression, but one of the Debian gcc package.
           Nevertheless, it's quite annoying, so I think it should go into
           mainline and stable now"
      
      * 'rc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild:
        kbuild: Steal gcc's pie from the very beginning
        kbuild: be more careful about matching preprocessed asm ___EXPORT_SYMBOL
        x86/kexec: add -fno-PIE
        scripts/has-stack-protector: add -fno-PIE
        kbuild: add -fno-PIE
        kbuild: modversions for EXPORT_SYMBOL() for asm
        kbuild: prevent lib-ksyms.o rebuilds
    • Merge tag 'nfsd-4.9-2' of git://linux-nfs.org/~bfields/linux · aad931a3
      Linus Torvalds authored
      Pull nfsd bugfix from Bruce Fields:
       "Just one fix for an NFS/RDMA crash"
      
      * tag 'nfsd-4.9-2' of git://linux-nfs.org/~bfields/linux:
        sunrpc: svc_age_temp_xprts_now should not call setsockopt non-tcp transports
  3. 18 Nov, 2016 14 commits
  4. 17 Nov, 2016 5 commits
    • Merge branch 'for-linus' of git://git.kernel.dk/linux-block · 62389867
      Linus Torvalds authored
      Pull block fixes from Jens Axboe:
       "A set of fixes, one for NVMe from Keith, and a set for nvme-{rdma,t,f}
        from the usual suspects, fixing actual problems that would be a shame
        to release 4.9 with"
      
      * 'for-linus' of git://git.kernel.dk/linux-block:
        nvme/pci: Don't free queues on error
        nvmet-rdma: drain the queue-pair just before freeing it
        nvme-rdma: stop and free io queues on connect failure
        nvmet-rdma: don't forget to delete a queue from the list of connection failed
        nvmet: Don't queue fatal error work if csts.cfs is set
        nvme-rdma: reject non-connect commands before the queue is live
        nvmet-rdma: Fix possible NULL deref when handling rdma cm events
    • Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma · 57400d30
      Linus Torvalds authored
      Pull rdma fixes from Doug Ledford:
       "First round of -rc fixes.
      
        Due to various issues, I've been away and couldn't send a pull request
        for about three weeks. There were a number of -rc patches that built
        up in the meantime (some were there already from the early -rc
        stages). Obviously, there were way too many to send now, so I tried to
        pare the list down to the more important patches for the -rc cycle.
      
        Most of the code has had plenty of soak time at the various vendors'
        testing setups, so I doubt there will be another -rc pull request this
        cycle. I also tried to limit the patches to those with smaller
        footprints, so even though a shortlog is longer than I would like, the
        actual diffstat is mostly very small with the exception of just three
        files that had more changes, and a couple files with pure removals.
      
        Summary:
         - Misc Intel hfi1 fixes
         - Misc Mellanox mlx4, mlx5, and rxe fixes
         - A couple cxgb4 fixes"
      
      * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (34 commits)
        iw_cxgb4: invalidate the mr when posting a read_w_inv wr
        iw_cxgb4: set *bad_wr for post_send/post_recv errors
        IB/rxe: Update qp state for user query
        IB/rxe: Clear queue buffer when modifying QP to reset
        IB/rxe: Fix handling of erroneous WR
        IB/rxe: Fix kernel panic in UDP tunnel with GRO and RX checksum
        IB/mlx4: Fix create CQ error flow
        IB/mlx4: Check gid_index return value
        IB/mlx5: Fix NULL pointer dereference on debug print
        IB/mlx5: Fix fatal error dispatching
        IB/mlx5: Resolve soft lock on massive reg MRs
        IB/mlx5: Use cache line size to select CQE stride
        IB/mlx5: Validate requested RQT size
        IB/mlx5: Fix memory leak in query device
        IB/core: Avoid unsigned int overflow in sg_alloc_table
        IB/core: Add missing check for addr_resolve callback return value
        IB/core: Set routable RoCE gid type for ipv4/ipv6 networks
        IB/cm: Mark stale CM id's whenever the mad agent was unregistered
        IB/uverbs: Fix leak of XRC target QPs
        IB/hfi1: Remove incorrect IS_ERR check
        ...
    • Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs · bec1b089
      Linus Torvalds authored
      Pull vfs fixes from Al Viro:
       "A couple of regression fixes"
      
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
        fix iov_iter_advance() for ITER_PIPE
        xattr: Fix setting security xattrs on sockfs
    • Merge tag 'for-linus-4.9-rc5-ofs-1' of git://git.kernel.org/pub/scm/linux/kernel/git/hubcap/linux · d46bc34d
      Linus Torvalds authored
      Pull orangefs fix from Mike Marshall:
       "orangefs: add .owner to debugfs file_operations
      
        Without ".owner = THIS_MODULE" it is possible to crash the kernel by
        unloading the Orangefs module while someone is reading debugfs files"
      
      * tag 'for-linus-4.9-rc5-ofs-1' of git://git.kernel.org/pub/scm/linux/kernel/git/hubcap/linux:
        orangefs: add .owner to debugfs file_operations
    • mremap: fix race between mremap() and page cleanning · 5d190420
      Aaron Lu authored
      Prior to 3.15, there was a race between zap_pte_range() and
      page_mkclean() where writes to a page could be lost.  Dave Hansen
      discovered by inspection that there is a similar race between
      move_ptes() and page_mkclean().
      
      We've been able to reproduce the issue by enlarging the race window with
      a msleep(), but have not been able to hit it without modifying the code.
      So, we think it's a real issue, but one that is difficult or impossible
      to hit in practice.
      
      The zap_pte_range() issue was fixed by commit 1cf35d47 ("mm: split
      'tlb_flush_mmu()' into tlb flushing and memory freeing parts").  This
      patch fixes the race between page_mkclean() and mremap().
      
      Here is one possible way to hit the race: suppose a process has mmapped
      a file with READ | WRITE and SHARED, and has two threads bound to two
      different CPUs, e.g. CPU1 and CPU2.  mmap returned X, then thread 1
      wrote to addr X so that CPU1 now has a writable TLB entry for addr X.
      Thread 2 starts mremapping from addr X to Y while thread 1 cleans the
      page and then writes to the old addr X again.  The second write from
      thread 1 can succeed, but the value will be lost.
      
              thread 1                           thread 2
           (bound to CPU1)                    (bound to CPU2)
      
        1: write 1 to addr X to get a
           writeable TLB on this CPU
      
                                              2: mremap starts
      
                                              3: move_ptes emptied PTE for addr X
                                                 and setup new PTE for addr Y and
                                                 then dropped PTL for X and Y
      
        4: page laundering for N by doing
           fadvise FADV_DONTNEED. When done,
           pageframe N is deemed clean.
      
        5: *write 2 to addr X
      
                                              6: tlb flush for addr X
      
        7: munmap (Y, pagesize) to make the
           page unmapped
      
        8: fadvise with FADV_DONTNEED again
           to kick the page off the pagecache
      
        9: pread the page from file to verify
           the value. If 1 is there, it means
           we have lost the written 2.
      
        * The write may or may not cause a segmentation fault, depending on
          whether the TLB entry is still on the CPU.
      
      Please note that this is only one specific way in which the race can
      occur; it does not mean that the race can only occur in exactly the
      above configuration, e.g. more than 2 threads could be involved and
      fadvise() could be done in another thread, etc.
      
      Anonymous pages can race between mremap() and page reclaim.  For THP:
      a huge PMD is moved by mremap to a new huge PMD, then the new huge PMD
      gets unmapped/split/paged out before the TLB flush has happened for
      the old huge PMD in move_page_tables(), so data can still be written
      to it.  Normal anonymous pages are in a similar situation.
      
      To fix this, check for any dirty PTE in move_ptes()/move_huge_pmd() and,
      if there is one, do the flush before dropping the PTL.  If the flush was
      done in every move_ptes()/move_huge_pmd() call, then there is no need to
      flush the whole range in move_page_tables().  Otherwise, the whole-range
      flush is still needed.
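
The flush decision described above can be sketched as a small model. This is a simplified Python illustration under stated assumptions, not the kernel code: PTEs are represented as dicts with a hypothetical `dirty` key, and the strings appended to `flush_log` merely record where a TLB flush would take place.

```python
def move_ptes(ptes, flush_log):
    """Move one batch of PTEs while the page table lock is (notionally) held.

    ptes is a list of dicts with a boolean 'dirty' key (hypothetical model).
    Returns True when the caller still has to flush this range later,
    i.e. when no dirty PTE was seen and no flush was done here.
    """
    force_flush = any(pte["dirty"] for pte in ptes)
    if force_flush:
        # A dirty PTE was moved: flush before the lock would be dropped.
        flush_log.append("flush-in-move_ptes")
        return False
    return True


def move_page_tables(batches):
    """Move all batches; do one whole-range flush if any batch deferred its flush."""
    flush_log = []
    need_flush = False
    for batch in batches:
        if move_ptes(batch, flush_log):
            need_flush = True
    if need_flush:
        flush_log.append("flush-whole-range")
    return flush_log
```

In this model, a range whose batches all contain a dirty PTE is flushed batch by batch and skips the final whole-range flush, while any batch without a dirty PTE leaves the whole-range flush in place.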
      
      Alternatively, we could track which parts of the range were flushed in
      move_ptes()/move_huge_pmd() and which were not, to avoid flushing the
      whole range in move_page_tables().  But that would require multiple TLB
      flushes for the different sub-ranges and should be less efficient than
      the single whole-range flush.
      
      KBuild test on my Sandybridge desktop doesn't show any noticeable change.
      v4.9-rc4:
        real    5m14.048s
        user    32m19.800s
        sys     4m50.320s
      
      With this commit:
        real    5m13.888s
        user    32m19.330s
        sys     4m51.200s
      Reported-by: Dave Hansen <dave.hansen@intel.com>
      Signed-off-by: Aaron Lu <aaron.lu@intel.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>