1. 15 Aug, 2020 35 commits
    • virtio: pci: constify ioreadX() iomem argument (as in generic implementation) · fe0580ac
      Krzysztof Kozlowski authored
      The ioreadX() helpers have an inconsistent interface.  On some architectures
      the void __iomem * address argument is a pointer to const, on some not.
      
      Implementations of ioreadX() do not modify the memory under the address so
      they can be converted to a "const" version for const-safety and
      consistency among architectures.
      Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
      Cc: Allen Hubbe <allenbh@gmail.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Dave Jiang <dave.jiang@intel.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Helge Deller <deller@gmx.de>
      Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
      Cc: Jakub Kicinski <kuba@kernel.org>
      Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
      Cc: Jason Wang <jasowang@redhat.com>
      Cc: Jon Mason <jdmason@kudzu.us>
      Cc: Kalle Valo <kvalo@codeaurora.org>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: "Michael S. Tsirkin" <mst@redhat.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Richard Henderson <rth@twiddle.net>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Link: http://lkml.kernel.org/r/20200709072837.5869-5-krzk@kernel.org
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • ntb: intel: constify ioreadX() iomem argument (as in generic implementation) · 58184e95
      Krzysztof Kozlowski authored
      The ioreadX() helpers have an inconsistent interface.  On some architectures
      the void __iomem * address argument is a pointer to const, on some not.
      
      Implementations of ioreadX() do not modify the memory under the address so
      they can be converted to a "const" version for const-safety and
      consistency among architectures.
      Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
      Acked-by: Dave Jiang <dave.jiang@intel.com>
      Cc: Allen Hubbe <allenbh@gmail.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Helge Deller <deller@gmx.de>
      Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
      Cc: Jakub Kicinski <kuba@kernel.org>
      Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
      Cc: Jason Wang <jasowang@redhat.com>
      Cc: Jon Mason <jdmason@kudzu.us>
      Cc: Kalle Valo <kvalo@codeaurora.org>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: "Michael S. Tsirkin" <mst@redhat.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Richard Henderson <rth@twiddle.net>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Link: http://lkml.kernel.org/r/20200709072837.5869-4-krzk@kernel.org
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • rtl818x: constify ioreadX() iomem argument (as in generic implementation) · 5ca6ad7d
      Krzysztof Kozlowski authored
      The ioreadX() helpers have an inconsistent interface.  On some architectures
      the void __iomem * address argument is a pointer to const, on some not.
      
      Implementations of ioreadX() do not modify the memory under the address so
      they can be converted to a "const" version for const-safety and
      consistency among architectures.
      Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
      Acked-by: Kalle Valo <kvalo@codeaurora.org>
      Cc: Allen Hubbe <allenbh@gmail.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Dave Jiang <dave.jiang@intel.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Helge Deller <deller@gmx.de>
      Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
      Cc: Jakub Kicinski <kuba@kernel.org>
      Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
      Cc: Jason Wang <jasowang@redhat.com>
      Cc: Jon Mason <jdmason@kudzu.us>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: "Michael S. Tsirkin" <mst@redhat.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Richard Henderson <rth@twiddle.net>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Link: http://lkml.kernel.org/r/20200709072837.5869-3-krzk@kernel.org
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • iomap: constify ioreadX() iomem argument (as in generic implementation) · 8f28ca6b
      Krzysztof Kozlowski authored
      Patch series "iomap: Constify ioreadX() iomem argument", v3.
      
      The ioread8/16/32() helpers and others have an inconsistent interface across
      architectures: some take the address as a pointer to const, some do not.
      
      Nothing seems to stop all of them from taking a pointer to const.
      
      This patch (of 4):
      
      The ioreadX() and ioreadX_rep() helpers have an inconsistent interface.  On
      some architectures the void __iomem * address argument is a pointer to const,
      on some not.
      
      Implementations of ioreadX() do not modify the memory under the address so
      they can be converted to a "const" version for const-safety and
      consistency among architectures.
      
      [krzk@kernel.org: sh: clk: fix assignment from incompatible pointer type for ioreadX()]
        Link: http://lkml.kernel.org/r/20200723082017.24053-1-krzk@kernel.org
      [akpm@linux-foundation.org: fix drivers/mailbox/bcm-pdc-mailbox.c]
        Link: http://lkml.kernel.org/r/202007132209.Rxmv4QyS%25lkp@intel.com
      Suggested-by: Geert Uytterhoeven <geert@linux-m68k.org>
      Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
      Reviewed-by: Arnd Bergmann <arnd@arndb.de>
      Cc: Richard Henderson <rth@twiddle.net>
      Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
      Cc: Helge Deller <deller@gmx.de>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Kalle Valo <kvalo@codeaurora.org>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Jakub Kicinski <kuba@kernel.org>
      Cc: Dave Jiang <dave.jiang@intel.com>
      Cc: Jon Mason <jdmason@kudzu.us>
      Cc: Allen Hubbe <allenbh@gmail.com>
      Cc: "Michael S. Tsirkin" <mst@redhat.com>
      Cc: Jason Wang <jasowang@redhat.com>
      Link: http://lkml.kernel.org/r/20200709072837.5869-1-krzk@kernel.org
      Link: http://lkml.kernel.org/r/20200709072837.5869-2-krzk@kernel.org
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
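The const-qualification described in the ioreadX() series can be illustrated with a minimal userspace sketch. The names `demo_ioread8()` and `demo_read_reg()` are hypothetical stand-ins, not the kernel's helpers, and a plain volatile pointer replaces the `__iomem` annotation. The point is only that a read accessor never writes through its address, so it can accept a pointer to const, and callers holding a const pointer need no cast.

```c
#include <stdint.h>

/*
 * Hypothetical stand-in for ioread8(): the accessor only reads, so the
 * address argument can be a pointer to const.
 */
static inline uint8_t demo_ioread8(const volatile void *addr)
{
	return *(const volatile uint8_t *)addr;
}

/*
 * A caller that only holds a const pointer can pass it straight through.
 * With a non-const prototype this call would trigger a
 * -Wdiscarded-qualifiers warning.
 */
static inline uint8_t demo_read_reg(const uint8_t *regs, unsigned int offset)
{
	return demo_ioread8(regs + offset);
}
```

Passing a non-const pointer to `demo_ioread8()` still works, since C converts a pointer to non-const into a pointer to const implicitly.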
    • sh: use generic strncpy() · f9e7ff9c
      Kuninori Morimoto authored
      Current SH gets the warning below at strncpy():
      
      In file included from ${LINUX}/arch/sh/include/asm/string.h:3,
                       from ${LINUX}/include/linux/string.h:20,
                       from ${LINUX}/include/linux/bitmap.h:9,
                       from ${LINUX}/include/linux/nodemask.h:95,
                       from ${LINUX}/include/linux/mmzone.h:17,
                       from ${LINUX}/include/linux/gfp.h:6,
                       from ${LINUX}/include/linux/slab.h:15,
                       from ${LINUX}/linux/drivers/mmc/host/vub300.c:38:
      ${LINUX}/drivers/mmc/host/vub300.c: In function 'new_system_port_status':
      ${LINUX}/arch/sh/include/asm/string_32.h:51:42: warning: array subscript\
        80 is above array bounds of 'char[26]' [-Warray-bounds]
         : "0" (__dest), "1" (__src), "r" (__src+__n)
                                           ~~~~~^~~~
      
      In general, strncpy() should behave as follows.
      
      	char dest[10];
      	char *src = "12345";
      
      	strncpy(dest, src, 10);
      	/* dest = {'1', '2', '3', '4', '5',
      	           '\0','\0','\0','\0','\0'} */
      
      However, the current SH strncpy() has two issues.
      First, it reads out of bounds (up to src + 10).
      Second, fixing that would need a large fixup, and maintaining
      the __asm__ code is difficult.
      
      To solve these issues, this patch simply uses the generic strncpy()
      instead of the architecture-specific one.
      Signed-off-by: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Cc: Alan Modra <amodra@gmail.com>
      Cc: Bin Meng <bin.meng@windriver.com>
      Cc: Chen Zhou <chenzhou10@huawei.com>
      Cc: Geert Uytterhoeven <geert+renesas@glider.be>
      Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
      Cc: Krzysztof Kozlowski <krzk@kernel.org>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Romain Naour <romain.naour@gmail.com>
      Cc: Sam Ravnborg <sam@ravnborg.org>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Link: https://marc.info/?l=linux-renesas-soc&m=157664657013309
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
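The expected strncpy() behaviour described in the commit above can be sketched in portable C. This is a minimal illustration in the style of a generic byte-by-byte implementation, not a copy of any particular kernel source; the name `demo_strncpy()` is hypothetical. Note that the source pointer stops advancing once the terminating NUL is reached, so the loop never reads past the end of `src`, and the remainder of `dest` is padded with NUL bytes.

```c
#include <stddef.h>

/*
 * Hypothetical generic strncpy(): copies at most count bytes and pads
 * with '\0'. Once the NUL in src has been copied, src is no longer
 * advanced, so nothing beyond the source string is ever read.
 */
static char *demo_strncpy(char *dest, const char *src, size_t count)
{
	char *tmp = dest;

	while (count) {
		if ((*tmp = *src) != '\0')
			src++;
		tmp++;
		count--;
	}
	return dest;
}
```

As with the standard function, the result is not NUL-terminated when `src` is at least `count` bytes long.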
    • sh: clkfwk: remove r8/r16/r32 · a8e3943b
      Kuninori Morimoto authored
      SH gets the warning below:
      
      ${LINUX}/drivers/sh/clk/cpg.c: In function 'r8':
      ${LINUX}/drivers/sh/clk/cpg.c:41:17: warning: passing argument 1 of 'ioread8'
       discards 'const' qualifier from pointer target type [-Wdiscarded-qualifiers]
        return ioread8(addr);
                       ^~~~
      In file included from ${LINUX}/arch/sh/include/asm/io.h:21,
                       from ${LINUX}/include/linux/io.h:13,
                       from ${LINUX}/drivers/sh/clk/cpg.c:14:
      ${LINUX}/include/asm-generic/iomap.h:29:29: note: expected 'void *' but
      argument is of type 'const void *'
       extern unsigned int ioread8(void __iomem *);
                                   ^~~~~~~~~~~~~~
      
      We don't need "const" for r8/r16/r32.  And we don't need r8/r16/r32
      themselves.  This patch cleans them up.
      Signed-off-by: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Cc: Alan Modra <amodra@gmail.com>
      Cc: Bin Meng <bin.meng@windriver.com>
      Cc: Chen Zhou <chenzhou10@huawei.com>
      Cc: Geert Uytterhoeven <geert+renesas@glider.be>
      Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
      Cc: Krzysztof Kozlowski <krzk@kernel.org>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Romain Naour <romain.naour@gmail.com>
      Cc: Sam Ravnborg <sam@ravnborg.org>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      X-MARC-Message: https://marc.info/?l=linux-renesas-soc&m=157852973916903
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • include/asm-generic/vmlinux.lds.h: align ro_after_init · 7f897acb
      Romain Naour authored
      Since the patch [1], building the kernel using a toolchain built with
      binutils 2.33.1 prevents booting an sh4 system under QEMU.  Apply the patch
      provided by Alan Modra [2] that fixes the alignment of rodata.
      
      [1] https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=ebd2263ba9a9124d93bbc0ece63d7e0fae89b40e
      [2] https://www.sourceware.org/ml/binutils/2019-12/msg00112.html
      Signed-off-by: Romain Naour <romain.naour@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Cc: Alan Modra <amodra@gmail.com>
      Cc: Bin Meng <bin.meng@windriver.com>
      Cc: Chen Zhou <chenzhou10@huawei.com>
      Cc: Geert Uytterhoeven <geert+renesas@glider.be>
      Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
      Cc: Krzysztof Kozlowski <krzk@kernel.org>
      Cc: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Sam Ravnborg <sam@ravnborg.org>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: <stable@vger.kernel.org>
      Link: https://marc.info/?l=linux-sh&m=158429470221261
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: annotate a data race in page_zonenum() · c403f6a3
      Qian Cai authored
       BUG: KCSAN: data-race in page_cpupid_xchg_last / put_page
      
       write (marked) to 0xfffffc0d48ec1a00 of 8 bytes by task 91442 on cpu 3:
        page_cpupid_xchg_last+0x51/0x80
        page_cpupid_xchg_last at mm/mmzone.c:109 (discriminator 11)
        wp_page_reuse+0x3e/0xc0
        wp_page_reuse at mm/memory.c:2453
        do_wp_page+0x472/0x7b0
        do_wp_page at mm/memory.c:2798
        __handle_mm_fault+0xcb0/0xd00
        handle_pte_fault at mm/memory.c:4049
        (inlined by) __handle_mm_fault at mm/memory.c:4163
        handle_mm_fault+0xfc/0x2f0
        handle_mm_fault at mm/memory.c:4200
        do_page_fault+0x263/0x6f9
        do_user_addr_fault at arch/x86/mm/fault.c:1465
        (inlined by) do_page_fault at arch/x86/mm/fault.c:1539
        page_fault+0x34/0x40
      
       read to 0xfffffc0d48ec1a00 of 8 bytes by task 94817 on cpu 69:
        put_page+0x15a/0x1f0
        page_zonenum at include/linux/mm.h:923
        (inlined by) is_zone_device_page at include/linux/mm.h:929
        (inlined by) page_is_devmap_managed at include/linux/mm.h:948
        (inlined by) put_page at include/linux/mm.h:1023
        wp_page_copy+0x571/0x930
        wp_page_copy at mm/memory.c:2615
        do_wp_page+0x107/0x7b0
        __handle_mm_fault+0xcb0/0xd00
        handle_mm_fault+0xfc/0x2f0
        do_page_fault+0x263/0x6f9
        page_fault+0x34/0x40
      
       Reported by Kernel Concurrency Sanitizer on:
       CPU: 69 PID: 94817 Comm: systemd-udevd Tainted: G        W  O L 5.5.0-next-20200204+ #6
       Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 07/10/2019
      
      A page never changes its zone number. The zone number happens to be
      stored in the same word as other bits which are modified, but the zone
      number bits will never be modified by any other write, so the reader can
      accept a reload of the zone bits after an intervening write and does not
      need to use READ_ONCE(). Thus, annotate this data race using
      ASSERT_EXCLUSIVE_BITS() to also assert that there are no concurrent
      writes to those bits.
      Suggested-by: Marco Elver <elver@google.com>
      Signed-off-by: Qian Cai <cai@lca.pw>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Cc: Paul E. McKenney <paulmck@kernel.org>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: John Hubbard <jhubbard@nvidia.com>
      Cc: Ira Weiny <ira.weiny@intel.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Link: http://lkml.kernel.org/r/1581619089-14472-1-git-send-email-cai@lca.pw
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
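The reasoning in the page_zonenum() commit above, that the zone bits share a word with other fields yet are never changed by writers, can be sketched in userspace C. The bit layout, the `DEMO_*` constants and both helper names are assumptions chosen for illustration; the real layout of the page flags word depends on kernel configuration, and the kernel-only ASSERT_EXCLUSIVE_BITS() annotation is not modelled here.

```c
#include <stdint.h>

/* Hypothetical layout: zone number in the top 2 bits of a 64-bit word. */
#define DEMO_ZONES_SHIFT   62
#define DEMO_ZONES_MASK    0x3ULL

/* Hypothetical second field in the same word that writers do modify. */
#define DEMO_CPUPID_SHIFT  8
#define DEMO_CPUPID_MASK   0xffffULL

/* Reader: extracts only the zone bits from the shared word. */
static inline unsigned int demo_page_zonenum(uint64_t flags)
{
	return (unsigned int)((flags >> DEMO_ZONES_SHIFT) & DEMO_ZONES_MASK);
}

/* Writer: replaces the cpupid field and leaves every other bit untouched. */
static inline uint64_t demo_set_cpupid(uint64_t flags, uint64_t cpupid)
{
	flags &= ~(DEMO_CPUPID_MASK << DEMO_CPUPID_SHIFT);
	flags |= (cpupid & DEMO_CPUPID_MASK) << DEMO_CPUPID_SHIFT;
	return flags;
}
```

Because the writer masks only its own field, a reader that reloads the word before or after the write extracts the same zone number either way.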
    • mm/swap.c: annotate data races for lru_rotate_pvecs · 7e0cc01e
      Qian Cai authored
      A read of lru_add_pvec->nr can be interrupted, and the interrupt can then
      write to the same variable.  The write has local interrupts disabled, but
      the plain reads result in data races.  However, it is unlikely the compilers
      could do much damage here given that lru_add_pvec->nr is an "unsigned char"
      and there is an existing compiler barrier.  Thus, annotate the reads using
      the data_race() macro.  The data races were reported by KCSAN:
      
       BUG: KCSAN: data-race in lru_add_drain_cpu / rotate_reclaimable_page
      
       write to 0xffff9291ebcb8a40 of 1 bytes by interrupt on cpu 23:
        rotate_reclaimable_page+0x2df/0x490
        pagevec_add at include/linux/pagevec.h:81
        (inlined by) rotate_reclaimable_page at mm/swap.c:259
        end_page_writeback+0x1b5/0x2b0
        end_swap_bio_write+0x1d0/0x280
        bio_endio+0x297/0x560
        dec_pending+0x218/0x430 [dm_mod]
        clone_endio+0xe4/0x2c0 [dm_mod]
        bio_endio+0x297/0x560
        blk_update_request+0x201/0x920
        scsi_end_request+0x6b/0x4a0
        scsi_io_completion+0xb7/0x7e0
        scsi_finish_command+0x1ed/0x2a0
        scsi_softirq_done+0x1c9/0x1d0
        blk_done_softirq+0x181/0x1d0
        __do_softirq+0xd9/0x57c
        irq_exit+0xa2/0xc0
        do_IRQ+0x8b/0x190
        ret_from_intr+0x0/0x42
        delay_tsc+0x46/0x80
        __const_udelay+0x3c/0x40
        __udelay+0x10/0x20
        kcsan_setup_watchpoint+0x202/0x3a0
        __tsan_read1+0xc2/0x100
        lru_add_drain_cpu+0xb8/0x3f0
        lru_add_drain+0x25/0x40
        shrink_active_list+0xe1/0xc80
        shrink_lruvec+0x766/0xb70
        shrink_node+0x2d6/0xca0
        do_try_to_free_pages+0x1f7/0x9a0
        try_to_free_pages+0x252/0x5b0
        __alloc_pages_slowpath+0x458/0x1290
        __alloc_pages_nodemask+0x3bb/0x450
        alloc_pages_vma+0x8a/0x2c0
        do_anonymous_page+0x16e/0x6f0
        __handle_mm_fault+0xcd5/0xd40
        handle_mm_fault+0xfc/0x2f0
        do_page_fault+0x263/0x6f9
        page_fault+0x34/0x40
      
       read to 0xffff9291ebcb8a40 of 1 bytes by task 37761 on cpu 23:
        lru_add_drain_cpu+0xb8/0x3f0
        lru_add_drain_cpu at mm/swap.c:602
        lru_add_drain+0x25/0x40
        shrink_active_list+0xe1/0xc80
        shrink_lruvec+0x766/0xb70
        shrink_node+0x2d6/0xca0
        do_try_to_free_pages+0x1f7/0x9a0
        try_to_free_pages+0x252/0x5b0
        __alloc_pages_slowpath+0x458/0x1290
        __alloc_pages_nodemask+0x3bb/0x450
        alloc_pages_vma+0x8a/0x2c0
        do_anonymous_page+0x16e/0x6f0
        __handle_mm_fault+0xcd5/0xd40
        handle_mm_fault+0xfc/0x2f0
        do_page_fault+0x263/0x6f9
        page_fault+0x34/0x40
      
       2 locks held by oom02/37761:
        #0: ffff9281e5928808 (&mm->mmap_sem#2){++++}, at: do_page_fault
        #1: ffffffffb3ade380 (fs_reclaim){+.+.}, at: fs_reclaim_acquire.part
       irq event stamp: 1949217
       trace_hardirqs_on_thunk+0x1a/0x1c
       __do_softirq+0x2e7/0x57c
       __do_softirq+0x34c/0x57c
       irq_exit+0xa2/0xc0
      
       Reported by Kernel Concurrency Sanitizer on:
       CPU: 23 PID: 37761 Comm: oom02 Not tainted 5.6.0-rc3-next-20200226+ #6
       Hardware name: HP ProLiant BL660c Gen9, BIOS I38 10/17/2018
      Signed-off-by: Qian Cai <cai@lca.pw>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Acked-by: Marco Elver <elver@google.com>
      Link: http://lkml.kernel.org/r/20200228044018.1263-1-cai@lca.pw
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/rmap: annotate a data race at tlb_flush_batched · 9c1177b6
      Qian Cai authored
      mm->tlb_flush_batched could be accessed concurrently as noticed by
      KCSAN,
      
       BUG: KCSAN: data-race in flush_tlb_batched_pending / try_to_unmap_one
      
       write to 0xffff93f754880bd0 of 1 bytes by task 822 on cpu 6:
        try_to_unmap_one+0x59a/0x1ab0
        set_tlb_ubc_flush_pending at mm/rmap.c:635
        (inlined by) try_to_unmap_one at mm/rmap.c:1538
        rmap_walk_anon+0x296/0x650
        rmap_walk+0xdf/0x100
        try_to_unmap+0x18a/0x2f0
        shrink_page_list+0xef6/0x2870
        shrink_inactive_list+0x316/0x880
        shrink_lruvec+0x8dc/0x1380
        shrink_node+0x317/0xd80
        balance_pgdat+0x652/0xd90
        kswapd+0x396/0x8d0
        kthread+0x1e0/0x200
        ret_from_fork+0x27/0x50
      
       read to 0xffff93f754880bd0 of 1 bytes by task 6364 on cpu 4:
        flush_tlb_batched_pending+0x29/0x90
        flush_tlb_batched_pending at mm/rmap.c:682
        change_p4d_range+0x5dd/0x1030
        change_pte_range at mm/mprotect.c:44
        (inlined by) change_pmd_range at mm/mprotect.c:212
        (inlined by) change_pud_range at mm/mprotect.c:240
        (inlined by) change_p4d_range at mm/mprotect.c:260
        change_protection+0x222/0x310
        change_prot_numa+0x3e/0x60
        task_numa_work+0x219/0x350
        task_work_run+0xed/0x140
        prepare_exit_to_usermode+0x2cc/0x2e0
        ret_from_intr+0x32/0x42
      
       Reported by Kernel Concurrency Sanitizer on:
       CPU: 4 PID: 6364 Comm: mtest01 Tainted: G        W    L 5.5.0-next-20200210+ #5
       Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 07/10/2019
      
      The read in flush_tlb_batched_pending() is under the PTL but the write is
      not.  However, mm->tlb_flush_batched is only a bool, so the value is
      unlikely to be torn.  Thus, mark it as an intentional data race by using
      the data_race() macro.
      Signed-off-by: Qian Cai <cai@lca.pw>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Cc: Marco Elver <elver@google.com>
      Link: http://lkml.kernel.org/r/1581450783-8262-1-git-send-email-cai@lca.pw
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/mempool: fix a data race in mempool_free() · abe1de42
      Qian Cai authored
      mempool_t pool.curr_nr could be accessed concurrently as noticed by
      KCSAN,
      
       BUG: KCSAN: data-race in mempool_free / remove_element
      
       write to 0xffffffffa937638c of 4 bytes by task 6359 on cpu 113:
        remove_element+0x4a/0x1c0
        remove_element at mm/mempool.c:132
        mempool_alloc+0x102/0x210
        (inlined by) mempool_alloc at mm/mempool.c:399
        bio_alloc_bioset+0x106/0x2c0
        get_swap_bio+0x49/0x230
        __swap_writepage+0x680/0xc30
        swap_writepage+0x9c/0xf0
        pageout+0x33e/0xae0
        shrink_page_list+0x1f57/0x2870
        shrink_inactive_list+0x316/0x880
        shrink_lruvec+0x8dc/0x1380
        shrink_node+0x317/0xd80
        do_try_to_free_pages+0x1f7/0xa10
        try_to_free_pages+0x26c/0x5e0
        __alloc_pages_slowpath+0x458/0x1290
        <snip>
      
       read to 0xffffffffa937638c of 4 bytes by interrupt on cpu 64:
        mempool_free+0x3e/0x150
        mempool_free at mm/mempool.c:492
        bio_free+0x192/0x280
        bio_put+0x91/0xd0
        end_swap_bio_write+0x1d8/0x280
        bio_endio+0x2c2/0x5b0
        dec_pending+0x22b/0x440 [dm_mod]
        clone_endio+0xe4/0x2c0 [dm_mod]
        bio_endio+0x2c2/0x5b0
        blk_update_request+0x217/0x940
        scsi_end_request+0x6b/0x4d0
        scsi_io_completion+0xb7/0x7e0
        scsi_finish_command+0x223/0x310
        scsi_softirq_done+0x1d5/0x210
        blk_mq_complete_request+0x224/0x250
        scsi_mq_done+0xc2/0x250
        pqi_raid_io_complete+0x5a/0x70 [smartpqi]
        pqi_irq_handler+0x150/0x1410 [smartpqi]
        __handle_irq_event_percpu+0x90/0x540
        handle_irq_event_percpu+0x49/0xd0
        handle_irq_event+0x85/0xca
        handle_edge_irq+0x13f/0x3e0
        do_IRQ+0x86/0x190
        <snip>
      
      The write is under pool->lock but the read is lockless.  Even though
      commit 5b990546 ("mempool: fix and document synchronization and memory
      barrier usage") introduced the smp_wmb() and smp_rmb() pair to improve
      the situation, it is inadequate to protect it from data races which
      could lead to a logic bug, so fix it by adding READ_ONCE() for the read.
      Signed-off-by: Qian Cai <cai@lca.pw>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Cc: Marco Elver <elver@google.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Link: http://lkml.kernel.org/r/1581446384-2131-1-git-send-email-cai@lca.pw
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/list_lru: fix a data race in list_lru_count_one · a1f45935
      Qian Cai authored
      struct list_lru_one l.nr_items could be accessed concurrently as noticed
      by KCSAN,
      
       BUG: KCSAN: data-race in list_lru_count_one / list_lru_isolate_move
      
       write to 0xffffa102789c4510 of 8 bytes by task 823 on cpu 39:
        list_lru_isolate_move+0xf9/0x130
        list_lru_isolate_move at mm/list_lru.c:180
        inode_lru_isolate+0x12b/0x2a0
        __list_lru_walk_one+0x122/0x3d0
        list_lru_walk_one+0x75/0xa0
        prune_icache_sb+0x8b/0xc0
        super_cache_scan+0x1b8/0x250
        do_shrink_slab+0x256/0x6d0
        shrink_slab+0x41b/0x4a0
        shrink_node+0x35c/0xd80
        balance_pgdat+0x652/0xd90
        kswapd+0x396/0x8d0
        kthread+0x1e0/0x200
        ret_from_fork+0x27/0x50
      
       read to 0xffffa102789c4510 of 8 bytes by task 6345 on cpu 56:
        list_lru_count_one+0x116/0x2f0
        list_lru_count_one at mm/list_lru.c:193
        super_cache_count+0xe8/0x170
        do_shrink_slab+0x95/0x6d0
        shrink_slab+0x41b/0x4a0
        shrink_node+0x35c/0xd80
        do_try_to_free_pages+0x1f7/0xa10
        try_to_free_pages+0x26c/0x5e0
        __alloc_pages_slowpath+0x458/0x1290
        __alloc_pages_nodemask+0x3bb/0x450
        alloc_pages_vma+0x8a/0x2c0
        do_anonymous_page+0x170/0x700
        __handle_mm_fault+0xc9f/0xd00
        handle_mm_fault+0xfc/0x2f0
        do_page_fault+0x263/0x6f9
        page_fault+0x34/0x40
      
       Reported by Kernel Concurrency Sanitizer on:
       CPU: 56 PID: 6345 Comm: oom01 Tainted: G        W    L 5.5.0-next-20200205+ #4
       Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 07/10/2019
      
      A shattered l.nr_items could affect the shrinker behaviour due to a data
      race. Fix it by adding READ_ONCE() for the read. Since the writes are
      aligned and up to word-size, assume those are safe from data races to
      avoid readability issues of writing WRITE_ONCE(var, var + val).
      Signed-off-by: Qian Cai <cai@lca.pw>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Cc: Marco Elver <elver@google.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Link: http://lkml.kernel.org/r/1581114679-5488-1-git-send-email-cai@lca.pw
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
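The pattern in the list_lru commit above, a marked lockless read paired with plain writes, can be sketched as follows. `DEMO_READ_ONCE()` is a simplified userspace approximation of the kernel's READ_ONCE() built on a volatile cast and the GNU C `__typeof__` extension; the struct and function names are hypothetical, and the sketch assumes the writer is serialised by a lock that is not shown.

```c
/*
 * Simplified stand-in for READ_ONCE(): the volatile access forces the
 * compiler to emit a single load of the variable instead of re-reading
 * or splitting it. Relies on the GNU C __typeof__ extension.
 */
#define DEMO_READ_ONCE(x) (*(const volatile __typeof__(x) *)&(x))

struct demo_list_lru_one {
	long nr_items;
};

/* Lockless reader: marked access, as the commit message describes. */
static inline long demo_list_lru_count(struct demo_list_lru_one *l)
{
	return DEMO_READ_ONCE(l->nr_items);
}

/*
 * Writer: plain aligned word-sized update. Assumes the caller holds the
 * lock that serialises writers (not modelled in this sketch).
 */
static inline void demo_list_lru_add(struct demo_list_lru_one *l, long val)
{
	l->nr_items += val;
}
```

Keeping the write plain avoids the harder-to-read `WRITE_ONCE(var, var + val)` form that the commit message mentions.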
    • mm/memcontrol: fix a data race in scan count · e0e3f42f
      Qian Cai authored
      struct mem_cgroup_per_node mz.lru_zone_size[zone_idx][lru] could be
      accessed concurrently as noticed by KCSAN,
      
       BUG: KCSAN: data-race in lruvec_lru_size / mem_cgroup_update_lru_size
      
       write to 0xffff9c804ca285f8 of 8 bytes by task 50951 on cpu 12:
        mem_cgroup_update_lru_size+0x11c/0x1d0
        mem_cgroup_update_lru_size at mm/memcontrol.c:1266
        isolate_lru_pages+0x6a9/0xf30
        shrink_active_list+0x123/0xcc0
        shrink_lruvec+0x8fd/0x1380
        shrink_node+0x317/0xd80
        do_try_to_free_pages+0x1f7/0xa10
        try_to_free_pages+0x26c/0x5e0
        __alloc_pages_slowpath+0x458/0x1290
        __alloc_pages_nodemask+0x3bb/0x450
        alloc_pages_vma+0x8a/0x2c0
        do_anonymous_page+0x170/0x700
        __handle_mm_fault+0xc9f/0xd00
        handle_mm_fault+0xfc/0x2f0
        do_page_fault+0x263/0x6f9
        page_fault+0x34/0x40
      
       read to 0xffff9c804ca285f8 of 8 bytes by task 50964 on cpu 95:
        lruvec_lru_size+0xbb/0x270
        mem_cgroup_get_zone_lru_size at include/linux/memcontrol.h:536
        (inlined by) lruvec_lru_size at mm/vmscan.c:326
        shrink_lruvec+0x1d0/0x1380
        shrink_node+0x317/0xd80
        do_try_to_free_pages+0x1f7/0xa10
        try_to_free_pages+0x26c/0x5e0
        __alloc_pages_slowpath+0x458/0x1290
        __alloc_pages_nodemask+0x3bb/0x450
        alloc_pages_current+0xa6/0x120
        alloc_slab_page+0x3b1/0x540
        allocate_slab+0x70/0x660
        new_slab+0x46/0x70
        ___slab_alloc+0x4ad/0x7d0
        __slab_alloc+0x43/0x70
        kmem_cache_alloc+0x2c3/0x420
        getname_flags+0x4c/0x230
        getname+0x22/0x30
        do_sys_openat2+0x205/0x3b0
        do_sys_open+0x9a/0xf0
        __x64_sys_openat+0x62/0x80
        do_syscall_64+0x91/0xb47
        entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
       Reported by Kernel Concurrency Sanitizer on:
       CPU: 95 PID: 50964 Comm: cc1 Tainted: G        W  O L    5.5.0-next-20200204+ #6
       Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 07/10/2019
      
      The write is under lru_lock, but the read is lockless.  The scan
      count is used to determine how aggressively the anon and file LRU lists
      should be scanned.  Load tearing could generate an inefficient heuristic,
      so fix it by adding READ_ONCE() for the read.
      Signed-off-by: Qian Cai <cai@lca.pw>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
      Link: http://lkml.kernel.org/r/20200206034945.2481-1-cai@lca.pw
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/page_counter: fix various data races at memsw · 6e4bd50f
      Qian Cai authored
      Since commit 3e32cb2e ("mm: memcontrol: lockless page counters"),
      memcg->memsw->watermark and memcg->memsw->failcnt can be accessed
      concurrently, as reported by KCSAN,
      
       BUG: KCSAN: data-race in page_counter_try_charge / page_counter_try_charge
      
       read to 0xffff8fb18c4cd190 of 8 bytes by task 1081 on cpu 59:
        page_counter_try_charge+0x4d/0x150 mm/page_counter.c:138
        try_charge+0x131/0xd50 mm/memcontrol.c:2405
        __memcg_kmem_charge_memcg+0x58/0x140
        __memcg_kmem_charge+0xcc/0x280
        __alloc_pages_nodemask+0x1e1/0x450
        alloc_pages_current+0xa6/0x120
        pte_alloc_one+0x17/0xd0
        __pte_alloc+0x3a/0x1f0
        copy_p4d_range+0xc36/0x1990
        copy_page_range+0x21d/0x360
        dup_mmap+0x5f5/0x7a0
        dup_mm+0xa2/0x240
        copy_process+0x1b3f/0x3460
        _do_fork+0xaa/0xa20
        __x64_sys_clone+0x13b/0x170
        do_syscall_64+0x91/0xb47
        entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
       write to 0xffff8fb18c4cd190 of 8 bytes by task 1153 on cpu 120:
        page_counter_try_charge+0x5b/0x150 mm/page_counter.c:139
        try_charge+0x131/0xd50 mm/memcontrol.c:2405
        mem_cgroup_try_charge+0x159/0x460
        mem_cgroup_try_charge_delay+0x3d/0xa0
        wp_page_copy+0x14d/0x930
        do_wp_page+0x107/0x7b0
        __handle_mm_fault+0xce6/0xd40
        handle_mm_fault+0xfc/0x2f0
        do_page_fault+0x263/0x6f9
        page_fault+0x34/0x40
      
       BUG: KCSAN: data-race in page_counter_try_charge / page_counter_try_charge
      
       write to 0xffff88809bbf2158 of 8 bytes by task 11782 on cpu 0:
        page_counter_try_charge+0x100/0x170 mm/page_counter.c:129
        try_charge+0x185/0xbf0 mm/memcontrol.c:2405
        __memcg_kmem_charge_memcg+0x4a/0xe0 mm/memcontrol.c:2837
        __memcg_kmem_charge+0xcf/0x1b0 mm/memcontrol.c:2877
        __alloc_pages_nodemask+0x26c/0x310 mm/page_alloc.c:4780
      
       read to 0xffff88809bbf2158 of 8 bytes by task 11814 on cpu 1:
        page_counter_try_charge+0xef/0x170 mm/page_counter.c:129
        try_charge+0x185/0xbf0 mm/memcontrol.c:2405
        __memcg_kmem_charge_memcg+0x4a/0xe0 mm/memcontrol.c:2837
        __memcg_kmem_charge+0xcf/0x1b0 mm/memcontrol.c:2877
        __alloc_pages_nodemask+0x26c/0x310 mm/page_alloc.c:4780
      
      Since watermark could be compared with or set to garbage due to a data
      race, which would change the code logic, fix it by adding a
      READ_ONCE()/WRITE_ONCE() pair in those places.
      
      The "failcnt" counter tolerates some degree of inaccuracy and is only
      used to report stats, so a data race there is harmless.  Mark it as an
      intentional data race using the data_race() macro.
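      
      The two annotations can be sketched in userspace as below.  This is an
      illustration under stated assumptions, not the patched kernel function:
      the three macros are simplified stand-ins (data_race() is a plain
      pass-through here), C11 atomics replace the kernel's atomic_long_t, and
      struct counter_sketch and counter_try_charge() are hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel macros (GCC/Clang __typeof__). */
#define READ_ONCE(x)       (*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, val) (*(volatile __typeof__(x) *)&(x) = (val))
#define data_race(expr)    (expr)

/* Hypothetical lockless counter with a high-water mark and a failure count. */
struct counter_sketch {
	atomic_ulong usage;
	unsigned long watermark;
	unsigned long failcnt;
};

/* Returns 1 if the charge fits under max, 0 otherwise. */
static inline int counter_try_charge(struct counter_sketch *c,
				     unsigned long nr, unsigned long max)
{
	unsigned long new_usage;

	assert(c != NULL);
	new_usage = atomic_fetch_add(&c->usage, nr) + nr;
	if (new_usage > max) {
		atomic_fetch_sub(&c->usage, nr);
		/* Statistics only: a lost increment is tolerated. */
		data_race(c->failcnt++);
		return 0;
	}
	/* The comparison and the store each use one whole access. */
	if (new_usage > READ_ONCE(c->watermark))
		WRITE_ONCE(c->watermark, new_usage);
	return 1;
}
```

      The watermark can still lag behind a concurrent charge, but it is never
      compared against or set to a torn value.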
      
      Fixes: 3e32cb2e ("mm: memcontrol: lockless page counters")
      Reported-by: syzbot+f36cfe60b1006a94f9dc@syzkaller.appspotmail.com
      Signed-off-by: Qian Cai <cai@lca.pw>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Cc: Marco Elver <elver@google.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Link: http://lkml.kernel.org/r/1581519682-23594-1-git-send-email-cai@lca.pw
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      6e4bd50f
    • Qian Cai's avatar
      mm/swapfile: fix and annotate various data races · a449bf58
      Qian Cai authored
      The swap_info_struct fields si.highest_bit, si.swap_map[offset] and
      si.flags could each be accessed concurrently, as noticed by KCSAN,
      
      === si.highest_bit ===
      
       write to 0xffff8d5abccdc4d4 of 4 bytes by task 5353 on cpu 24:
        swap_range_alloc+0x81/0x130
        swap_range_alloc at mm/swapfile.c:681
        scan_swap_map_slots+0x371/0xb90
        get_swap_pages+0x39d/0x5c0
        get_swap_page+0xf2/0x524
        add_to_swap+0xe4/0x1c0
        shrink_page_list+0x1795/0x2870
        shrink_inactive_list+0x316/0x880
        shrink_lruvec+0x8dc/0x1380
        shrink_node+0x317/0xd80
        do_try_to_free_pages+0x1f7/0xa10
        try_to_free_pages+0x26c/0x5e0
        __alloc_pages_slowpath+0x458/0x1290
      
       read to 0xffff8d5abccdc4d4 of 4 bytes by task 6672 on cpu 70:
        scan_swap_map_slots+0x4a6/0xb90
        scan_swap_map_slots at mm/swapfile.c:892
        get_swap_pages+0x39d/0x5c0
        get_swap_page+0xf2/0x524
        add_to_swap+0xe4/0x1c0
        shrink_page_list+0x1795/0x2870
        shrink_inactive_list+0x316/0x880
        shrink_lruvec+0x8dc/0x1380
        shrink_node+0x317/0xd80
        do_try_to_free_pages+0x1f7/0xa10
        try_to_free_pages+0x26c/0x5e0
        __alloc_pages_slowpath+0x458/0x1290
      
       Reported by Kernel Concurrency Sanitizer on:
       CPU: 70 PID: 6672 Comm: oom01 Tainted: G        W    L 5.5.0-next-20200205+ #3
       Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 07/10/2019
      
      === si.swap_map[offset] ===
      
       write to 0xffffbc370c29a64c of 1 bytes by task 6856 on cpu 86:
        __swap_entry_free_locked+0x8c/0x100
        __swap_entry_free_locked at mm/swapfile.c:1209 (discriminator 4)
        __swap_entry_free.constprop.20+0x69/0xb0
        free_swap_and_cache+0x53/0xa0
        unmap_page_range+0x7f8/0x1d70
        unmap_single_vma+0xcd/0x170
        unmap_vmas+0x18b/0x220
        exit_mmap+0xee/0x220
        mmput+0x10e/0x270
        do_exit+0x59b/0xf40
        do_group_exit+0x8b/0x180
      
       read to 0xffffbc370c29a64c of 1 bytes by task 6855 on cpu 20:
        _swap_info_get+0x81/0xa0
        _swap_info_get at mm/swapfile.c:1140
        free_swap_and_cache+0x40/0xa0
        unmap_page_range+0x7f8/0x1d70
        unmap_single_vma+0xcd/0x170
        unmap_vmas+0x18b/0x220
        exit_mmap+0xee/0x220
        mmput+0x10e/0x270
        do_exit+0x59b/0xf40
        do_group_exit+0x8b/0x180
      
      === si.flags ===
      
       write to 0xffff956c8fc6c400 of 8 bytes by task 6087 on cpu 23:
        scan_swap_map_slots+0x6fe/0xb50
        scan_swap_map_slots at mm/swapfile.c:887
        get_swap_pages+0x39d/0x5c0
        get_swap_page+0x377/0x524
        add_to_swap+0xe4/0x1c0
        shrink_page_list+0x1795/0x2870
        shrink_inactive_list+0x316/0x880
        shrink_lruvec+0x8dc/0x1380
        shrink_node+0x317/0xd80
        do_try_to_free_pages+0x1f7/0xa10
        try_to_free_pages+0x26c/0x5e0
        __alloc_pages_slowpath+0x458/0x1290
      
       read to 0xffff956c8fc6c400 of 8 bytes by task 6207 on cpu 63:
        _swap_info_get+0x41/0xa0
        __swap_info_get at mm/swapfile.c:1114
        put_swap_page+0x84/0x490
        __remove_mapping+0x384/0x5f0
        shrink_page_list+0xff1/0x2870
        shrink_inactive_list+0x316/0x880
        shrink_lruvec+0x8dc/0x1380
        shrink_node+0x317/0xd80
        do_try_to_free_pages+0x1f7/0xa10
        try_to_free_pages+0x26c/0x5e0
        __alloc_pages_slowpath+0x458/0x1290
      
      The writes are under si->lock but the reads are not.  For si.highest_bit
      and si.swap_map[offset], a data race could trigger logic bugs, so fix
      them by using WRITE_ONCE() for the writes and READ_ONCE() for the reads.
      The exceptions are isolated reads that only compare against zero, where
      a data race would cause no harm; annotate those as intentional data
      races using the data_race() macro.
      
      For si.flags, the readers are only interested in a single bit, so a
      data race there would cause no issue.
      
      [cai@lca.pw: add a missing annotation for si->flags in memory.c]
        Link: http://lkml.kernel.org/r/1581612647-5958-1-git-send-email-cai@lca.pw
      Signed-off-by: Qian Cai <cai@lca.pw>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Cc: Marco Elver <elver@google.com>
      Cc: Hugh Dickins <hughd@google.com>
      Link: http://lkml.kernel.org/r/1581095163-12198-1-git-send-email-cai@lca.pw
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      a449bf58
    • Kirill A. Shutemov's avatar
      mm/filemap.c: fix a data race in filemap_fault() · e630bfac
      Kirill A. Shutemov authored
      struct file_ra_state ra.mmap_miss could be accessed concurrently during
      page faults as noticed by KCSAN,
      
       BUG: KCSAN: data-race in filemap_fault / filemap_map_pages
      
       write to 0xffff9b1700a2c1b4 of 4 bytes by task 3292 on cpu 30:
        filemap_fault+0x920/0xfc0
        do_sync_mmap_readahead at mm/filemap.c:2384
        (inlined by) filemap_fault at mm/filemap.c:2486
        __xfs_filemap_fault+0x112/0x3e0 [xfs]
        xfs_filemap_fault+0x74/0x90 [xfs]
        __do_fault+0x9e/0x220
        do_fault+0x4a0/0x920
        __handle_mm_fault+0xc69/0xd00
        handle_mm_fault+0xfc/0x2f0
        do_page_fault+0x263/0x6f9
        page_fault+0x34/0x40
      
       read to 0xffff9b1700a2c1b4 of 4 bytes by task 3313 on cpu 32:
        filemap_map_pages+0xc2e/0xd80
        filemap_map_pages at mm/filemap.c:2625
        do_fault+0x3da/0x920
        __handle_mm_fault+0xc69/0xd00
        handle_mm_fault+0xfc/0x2f0
        do_page_fault+0x263/0x6f9
        page_fault+0x34/0x40
      
       Reported by Kernel Concurrency Sanitizer on:
       CPU: 32 PID: 3313 Comm: systemd-udevd Tainted: G        W    L 5.5.0-next-20200210+ #1
       Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 07/10/2019
      
      ra.mmap_miss contributes to readahead decisions, so a data race on it
      is undesirable.  Both the read and the write are done only under the
      non-exclusive mmap_sem, so two concurrent writers could even underflow
      the counter.  Fix the underflow by working on a local variable and then
      committing a single final store to ra.mmap_miss; a small inaccuracy in
      the counter should be acceptable.
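      
      The local-variable pattern can be sketched in userspace as follows.
      This is an assumption-laden illustration rather than the patched
      filemap code: READ_ONCE()/WRITE_ONCE() are simplified stand-ins, and
      struct ra_sketch and mmap_hit() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel macros (GCC/Clang __typeof__). */
#define READ_ONCE(x)       (*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, val) (*(volatile __typeof__(x) *)&(x) = (val))

/* Hypothetical readahead state holding only the miss counter. */
struct ra_sketch {
	unsigned int mmap_miss;
};

/*
 * Decrement the miss counter without wrapping below zero.  The check and
 * the subtraction both use the same snapshot, so the stored value is never
 * smaller than zero even if another writer changes the field in between.
 */
static inline void mmap_hit(struct ra_sketch *ra)
{
	unsigned int miss;

	assert(ra != NULL);
	miss = READ_ONCE(ra->mmap_miss);
	if (miss)
		WRITE_ONCE(ra->mmap_miss, miss - 1);
}
```

      Two racing callers may still collapse two decrements into one, which is
      the small inaccuracy the commit message accepts.
      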
      Signed-off-by: Kirill A. Shutemov <kirill@shutemov.name>
      Signed-off-by: Qian Cai <cai@lca.pw>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Tested-by: Qian Cai <cai@lca.pw>
      Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
      Cc: Marco Elver <elver@google.com>
      Link: http://lkml.kernel.org/r/20200211030134.1847-1-cai@lca.pw
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      e630bfac
    • Qian Cai's avatar
      mm/swap_state: mark various intentional data races · b96a3db2
      Qian Cai authored
      swap_cache_info.* could be accessed concurrently as noticed by
      KCSAN,
      
       BUG: KCSAN: data-race in lookup_swap_cache / lookup_swap_cache
      
       write to 0xffffffff85517318 of 8 bytes by task 94138 on cpu 101:
        lookup_swap_cache+0x12e/0x460
        lookup_swap_cache at mm/swap_state.c:322
        do_swap_page+0x112/0xeb0
        __handle_mm_fault+0xc7a/0xd00
        handle_mm_fault+0xfc/0x2f0
        do_page_fault+0x263/0x6f9
        page_fault+0x34/0x40
      
       read to 0xffffffff85517318 of 8 bytes by task 91655 on cpu 100:
        lookup_swap_cache+0x117/0x460
        lookup_swap_cache at mm/swap_state.c:322
        shmem_swapin_page+0xc7/0x9e0
        shmem_getpage_gfp+0x2ca/0x16c0
        shmem_fault+0xef/0x3c0
        __do_fault+0x9e/0x220
        do_fault+0x4a0/0x920
        __handle_mm_fault+0xc69/0xd00
        handle_mm_fault+0xfc/0x2f0
        do_page_fault+0x263/0x6f9
        page_fault+0x34/0x40
      
       Reported by Kernel Concurrency Sanitizer on:
       CPU: 100 PID: 91655 Comm: systemd-journal Tainted: G        W  O L 5.5.0-next-20200204+ #6
       Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 07/10/2019
      
       write to 0xffffffff8d717308 of 8 bytes by task 11365 on cpu 87:
         __delete_from_swap_cache+0x681/0x8b0
         __delete_from_swap_cache at mm/swap_state.c:178
      
       read to 0xffffffff8d717308 of 8 bytes by task 11275 on cpu 53:
         __delete_from_swap_cache+0x66e/0x8b0
         __delete_from_swap_cache at mm/swap_state.c:178
      
      Both the read and the write are lockless.  Since swap_cache_info.* are
      only used to print out counter information, it is harmless even if any
      of them misses a few increments due to data races, so just mark them as
      intentional data races using the data_race() macro.
      
      While at it, fix a checkpatch.pl warning,
      
      WARNING: Single statement macros should not use a do {} while (0) loop
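      
      As a rough userspace illustration of both points, the sketch below
      writes the counter helpers as single expressions wrapped in
      data_race(), so no do {} while (0) wrapper is needed.  The data_race()
      definition here is a plain pass-through stand-in, and cache_info and
      record_lookup() are hypothetical names, not the kernel's.

```c
#include <assert.h>

/* Userspace stand-in; the kernel macro also tells KCSAN to ignore the access. */
#define data_race(expr) (expr)

/* Hypothetical statistics block used only for reporting. */
static struct {
	unsigned long add_total;
	unsigned long find_success;
	unsigned long find_total;
} cache_info;

/* Each macro expands to one expression, so it is safe as a lone statement. */
#define INC_CACHE_INFO(x)     data_race(cache_info.x++)
#define ADD_CACHE_INFO(x, nr) data_race(cache_info.x += (nr))

/* Count a lookup, and count it as a hit when one was found. */
static inline void record_lookup(int hit)
{
	INC_CACHE_INFO(find_total);
	if (hit)
		INC_CACHE_INFO(find_success);
}
```
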
      Signed-off-by: Qian Cai <cai@lca.pw>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Cc: Marco Elver <elver@google.com>
      Link: http://lkml.kernel.org/r/20200207003715.1578-1-cai@lca.pw
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      b96a3db2
    • Qian Cai's avatar
      mm/page_io: mark various intentional data races · 7b37e226
      Qian Cai authored
      struct swap_info_struct si.flags could be accessed concurrently as noticed
      by KCSAN,
      
       BUG: KCSAN: data-race in scan_swap_map_slots / swap_readpage
      
       write to 0xffff9c77b80ac400 of 8 bytes by task 91325 on cpu 16:
        scan_swap_map_slots+0x6fe/0xb50
        scan_swap_map_slots at mm/swapfile.c:887
        get_swap_pages+0x39d/0x5c0
        get_swap_page+0x377/0x524
        add_to_swap+0xe4/0x1c0
        shrink_page_list+0x1740/0x2820
        shrink_inactive_list+0x316/0x8b0
        shrink_lruvec+0x8dc/0x1380
        shrink_node+0x317/0xd80
        do_try_to_free_pages+0x1f7/0xa10
        try_to_free_pages+0x26c/0x5e0
        __alloc_pages_slowpath+0x458/0x1290
        __alloc_pages_nodemask+0x3bb/0x450
        alloc_pages_vma+0x8a/0x2c0
        do_anonymous_page+0x170/0x700
        __handle_mm_fault+0xc9f/0xd00
        handle_mm_fault+0xfc/0x2f0
        do_page_fault+0x263/0x6f9
        page_fault+0x34/0x40
      
       read to 0xffff9c77b80ac400 of 8 bytes by task 5422 on cpu 7:
        swap_readpage+0x204/0x6a0
        swap_readpage at mm/page_io.c:380
        read_swap_cache_async+0xa2/0xb0
        swapin_readahead+0x6a0/0x890
        do_swap_page+0x465/0xeb0
        __handle_mm_fault+0xc7a/0xd00
        handle_mm_fault+0xfc/0x2f0
        do_page_fault+0x263/0x6f9
        page_fault+0x34/0x40
      
       Reported by Kernel Concurrency Sanitizer on:
       CPU: 7 PID: 5422 Comm: gmain Tainted: G        W  O L 5.5.0-next-20200204+ #6
       Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 07/10/2019
      
      Other reads,
      
       read to 0xffff91ea33eac400 of 8 bytes by task 11276 on cpu 120:
        __swap_writepage+0x140/0xc20
        __swap_writepage at mm/page_io.c:289
      
       read to 0xffff91ea33eac400 of 8 bytes by task 11264 on cpu 16:
        swap_set_page_dirty+0x44/0x1f4
        swap_set_page_dirty at mm/page_io.c:442
      
      The write is under &si->lock, but the reads are done as lockless.  Since
      the reads only check for a specific bit in the flag, it is harmless even
      if load tearing happens.  Thus, just mark them as intentional data races
      using the data_race() macro.
      
      [cai@lca.pw: add a missing annotation]
        Link: http://lkml.kernel.org/r/1581612585-5812-1-git-send-email-cai@lca.pw
      Signed-off-by: Qian Cai <cai@lca.pw>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Cc: Marco Elver <elver@google.com>
      Link: http://lkml.kernel.org/r/20200207003601.1526-1-cai@lca.pw
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      7b37e226
    • Qian Cai's avatar
      mm/frontswap: mark various intentional data races · 96bdd2bc
      Qian Cai authored
      There are a few information counters that are intentionally not protected
      against increment races, so just annotate them using the data_race()
      macro.
      
       BUG: KCSAN: data-race in __frontswap_store / __frontswap_store
      
       write to 0xffffffff8b7174d8 of 8 bytes by task 6396 on cpu 103:
        __frontswap_store+0x2d0/0x344
        inc_frontswap_failed_stores at mm/frontswap.c:70
        (inlined by) __frontswap_store at mm/frontswap.c:280
        swap_writepage+0x83/0xf0
        pageout+0x33e/0xae0
        shrink_page_list+0x1f57/0x2870
        shrink_inactive_list+0x316/0x880
        shrink_lruvec+0x8dc/0x1380
        shrink_node+0x317/0xd80
        do_try_to_free_pages+0x1f7/0xa10
        try_to_free_pages+0x26c/0x5e0
        __alloc_pages_slowpath+0x458/0x1290
        __alloc_pages_nodemask+0x3bb/0x450
        alloc_pages_vma+0x8a/0x2c0
        do_anonymous_page+0x170/0x700
        __handle_mm_fault+0xc9f/0xd00
        handle_mm_fault+0xfc/0x2f0
        do_page_fault+0x263/0x6f9
        page_fault+0x34/0x40
      
       read to 0xffffffff8b7174d8 of 8 bytes by task 6405 on cpu 47:
        __frontswap_store+0x2b9/0x344
        inc_frontswap_failed_stores at mm/frontswap.c:70
        (inlined by) __frontswap_store at mm/frontswap.c:280
        swap_writepage+0x83/0xf0
        pageout+0x33e/0xae0
        shrink_page_list+0x1f57/0x2870
        shrink_inactive_list+0x316/0x880
        shrink_lruvec+0x8dc/0x1380
        shrink_node+0x317/0xd80
        do_try_to_free_pages+0x1f7/0xa10
        try_to_free_pages+0x26c/0x5e0
        __alloc_pages_slowpath+0x458/0x1290
        __alloc_pages_nodemask+0x3bb/0x450
        alloc_pages_vma+0x8a/0x2c0
        do_anonymous_page+0x170/0x700
        __handle_mm_fault+0xc9f/0xd00
        handle_mm_fault+0xfc/0x2f0
        do_page_fault+0x263/0x6f9
        page_fault+0x34/0x40
      Signed-off-by: Qian Cai <cai@lca.pw>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Cc: Marco Elver <elver@google.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Link: http://lkml.kernel.org/r/1581114499-5042-1-git-send-email-cai@lca.pw
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      96bdd2bc
    • Qian Cai's avatar
      mm/kmemleak: silence KCSAN splats in checksum · 69d0b54d
      Qian Cai authored
      Even if KCSAN is disabled for kmemleak, update_checksum() could still call
      crc32() (which is outside of kmemleak.c) to dereference object->pointer.
      Thus, the value of object->pointer could be accessed concurrently as
      noticed by KCSAN,
      
       BUG: KCSAN: data-race in crc32_le_base / do_raw_spin_lock
      
       write to 0xffffb0ea683a7d50 of 4 bytes by task 23575 on cpu 12:
        do_raw_spin_lock+0x114/0x200
        debug_spin_lock_after at kernel/locking/spinlock_debug.c:91
        (inlined by) do_raw_spin_lock at kernel/locking/spinlock_debug.c:115
        _raw_spin_lock+0x40/0x50
        __handle_mm_fault+0xa9e/0xd00
        handle_mm_fault+0xfc/0x2f0
        do_page_fault+0x263/0x6f9
        page_fault+0x34/0x40
      
       read to 0xffffb0ea683a7d50 of 4 bytes by task 839 on cpu 60:
        crc32_le_base+0x67/0x350
        crc32_le_base+0x67/0x350:
        crc32_body at lib/crc32.c:106
        (inlined by) crc32_le_generic at lib/crc32.c:179
        (inlined by) crc32_le at lib/crc32.c:197
        kmemleak_scan+0x528/0xd90
        update_checksum at mm/kmemleak.c:1172
        (inlined by) kmemleak_scan at mm/kmemleak.c:1497
        kmemleak_scan_thread+0xcc/0xfa
        kthread+0x1e0/0x200
        ret_from_fork+0x27/0x50
      
      If a shattered value was returned due to a data race, it will be corrected
      in the next scan.  Thus, let KCSAN ignore all reads in the region to
      silence KCSAN in case the write side is non-atomic.
      Suggested-by: Marco Elver <elver@google.com>
      Signed-off-by: Qian Cai <cai@lca.pw>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Acked-by: Marco Elver <elver@google.com>
      Acked-by: Catalin Marinas <catalin.marinas@arm.com>
      Link: http://lkml.kernel.org/r/20200317182754.2180-1-cai@lca.pw
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      69d0b54d
    • Xiaoming Ni's avatar
      all arch: remove system call sys_sysctl · 88db0aa2
      Xiaoming Ni authored
      Since commit 61a47c1a ("sysctl: Remove the sysctl system call"),
      sys_sysctl is actually unavailable: any input can only return an error.
      
      We have been warning people about using the sysctl system call for years
      and believe there are no more users.  Even if there are users of this
      interface, if they have not complained or fixed their code by now, they
      probably are not going to, so there is no point in warning them any
      longer.
      
      So completely remove sys_sysctl on all architectures.
      
      [nixiaoming@huawei.com: s390: fix build error for sys_call_table_emu]
       Link: http://lkml.kernel.org/r/20200618141426.16884-1-nixiaoming@huawei.com
      Signed-off-by: Xiaoming Ni <nixiaoming@huawei.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Acked-by: Will Deacon <will@kernel.org>		[arm/arm64]
      Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Aleksa Sarai <cyphar@cyphar.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Bin Meng <bin.meng@windriver.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: chenzefeng <chenzefeng2@huawei.com>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Cc: Christian Brauner <christian@brauner.io>
      Cc: Chris Zankel <chris@zankel.net>
      Cc: David Howells <dhowells@redhat.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Diego Elio Pettenò <flameeyes@flameeyes.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Dominik Brodowski <linux@dominikbrodowski.net>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Helge Deller <deller@gmx.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Iurii Zaikin <yzaikin@google.com>
      Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
      Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Kars de Jong <jongk@linux-m68k.org>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Krzysztof Kozlowski <krzk@kernel.org>
      Cc: Luis Chamberlain <mcgrof@kernel.org>
      Cc: Marco Elver <elver@google.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Martin K. Petersen <martin.petersen@oracle.com>
      Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Max Filippov <jcmvbkbc@gmail.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Miklos Szeredi <mszeredi@redhat.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
      Cc: Nick Piggin <npiggin@gmail.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Olof Johansson <olof@lixom.net>
      Cc: Paul Burton <paulburton@kernel.org>
      Cc: "Paul E. McKenney" <paulmck@kernel.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Randy Dunlap <rdunlap@infradead.org>
      Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
      Cc: Richard Henderson <rth@twiddle.net>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Sami Tolvanen <samitolvanen@google.com>
      Cc: Sargun Dhillon <sargun@sargun.me>
      Cc: Stephen Rothwell <sfr@canb.auug.org.au>
      Cc: Sudeep Holla <sudeep.holla@arm.com>
      Cc: Sven Schnelle <svens@stackframe.org>
      Cc: Thiago Jung Bauermann <bauerman@linux.ibm.com>
      Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Vasily Gorbik <gor@linux.ibm.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Cc: Zhou Yanjie <zhouyanjie@wanyeetech.com>
      Link: http://lkml.kernel.org/r/20200616030734.87257-1-nixiaoming@huawei.com
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      88db0aa2
    • Matthew Wilcox (Oracle)'s avatar
      mm: introduce offset_in_thp · ee6c400f
      Matthew Wilcox (Oracle) authored
      Mirroring offset_in_page(), this gives you the offset within this
      particular page, no matter what size page it is.  It optimises down to
      offset_in_page() if CONFIG_TRANSPARENT_HUGEPAGE is not set.
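      
      The relationship between the two helpers can be sketched in userspace
      as below.  The sketch makes several assumptions: 4 KiB base pages, a
      power-of-two page size described by an order argument instead of a
      struct page pointer, and sketch_ prefixed names that are hypothetical
      and not the kernel's definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 4 KiB base pages. */
#define SKETCH_PAGE_SHIFT 12
#define SKETCH_PAGE_SIZE  (1UL << SKETCH_PAGE_SHIFT)

/* Offset of addr within its base page. */
static inline unsigned long sketch_offset_in_page(uintptr_t addr)
{
	return (unsigned long)(addr & (SKETCH_PAGE_SIZE - 1));
}

/*
 * Offset of addr within a page of 2^order base pages.  With order 0
 * this reduces to sketch_offset_in_page().
 */
static inline unsigned long sketch_offset_in_thp(unsigned int order,
						 uintptr_t addr)
{
	unsigned long size = SKETCH_PAGE_SIZE << order;

	assert(order < 20);	/* keep the shift well inside the type */
	return (unsigned long)(addr & (size - 1));
}
```

      For example, with order 9 (2 MiB pages on this assumption) the address
      0x201234 has offset 0x1234, while its base-page offset is 0x234.
      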
      Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Reviewed-by: David Hildenbrand <david@redhat.com>
      Reviewed-by: William Kucharski <william.kucharski@oracle.com>
      Reviewed-by: Zi Yan <ziy@nvidia.com>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Link: http://lkml.kernel.org/r/20200629151959.15779-8-willy@infradead.org
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      ee6c400f
    • Matthew Wilcox (Oracle)'s avatar
      mm: add thp_head · 2be1d718
      Matthew Wilcox (Oracle) authored
      This is like compound_head() but compiles away when
      CONFIG_TRANSPARENT_HUGEPAGE is not enabled.
      Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Reviewed-by: William Kucharski <william.kucharski@oracle.com>
      Reviewed-by: Zi Yan <ziy@nvidia.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Link: http://lkml.kernel.org/r/20200629151959.15779-7-willy@infradead.org
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      2be1d718
    • Matthew Wilcox (Oracle)'s avatar
      mm: replace hpage_nr_pages with thp_nr_pages · 6c357848
      Matthew Wilcox (Oracle) authored
      The thp prefix is more frequently used than hpage and we should be
      consistent between the various functions.
      
      [akpm@linux-foundation.org: fix mm/migrate.c]
      Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Reviewed-by: William Kucharski <william.kucharski@oracle.com>
      Reviewed-by: Zi Yan <ziy@nvidia.com>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Link: http://lkml.kernel.org/r/20200629151959.15779-6-willy@infradead.org
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      6c357848
    • Matthew Wilcox (Oracle)'s avatar
      mm: add thp_size · af3bbc12
      Matthew Wilcox (Oracle) authored
      This function returns the number of bytes in a THP.  It is like
      page_size(), but compiles to just PAGE_SIZE if CONFIG_TRANSPARENT_HUGEPAGE
      is disabled.
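      
      A hedged userspace sketch of how the size, order and page-count helpers
      relate is shown below.  It assumes 4 KiB base pages and uses a
      hypothetical struct page_sketch plus a SKETCH_THP switch standing in
      for CONFIG_TRANSPARENT_HUGEPAGE; none of this is the kernel's code.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: 4 KiB base pages. */
#define SKETCH_PAGE_SHIFT 12
#define SKETCH_PAGE_SIZE  (1UL << SKETCH_PAGE_SHIFT)

/* Hypothetical stand-in for CONFIG_TRANSPARENT_HUGEPAGE; set to 0 to disable. */
#define SKETCH_THP 1

/* Hypothetical page descriptor carrying only the allocation order. */
struct page_sketch {
	unsigned int order;
};

/* Order of the page; a constant 0 when huge pages are compiled out. */
static inline unsigned int sketch_thp_order(const struct page_sketch *page)
{
	assert(page != NULL);
#if SKETCH_THP
	return page->order;
#else
	return 0;
#endif
}

/* Size in bytes; folds to SKETCH_PAGE_SIZE when the order is constant 0. */
static inline unsigned long sketch_thp_size(const struct page_sketch *page)
{
	return SKETCH_PAGE_SIZE << sketch_thp_order(page);
}

/* Number of base pages; folds to 1 when the order is constant 0. */
static inline unsigned long sketch_thp_nr_pages(const struct page_sketch *page)
{
	return 1UL << sketch_thp_order(page);
}
```

      With SKETCH_THP set to 0, every helper reduces to a compile-time
      constant, which is the effect the commit message describes.
      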
      Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Reviewed-by: William Kucharski <william.kucharski@oracle.com>
      Reviewed-by: Zi Yan <ziy@nvidia.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Link: http://lkml.kernel.org/r/20200629151959.15779-5-willy@infradead.org
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      af3bbc12
    • Matthew Wilcox (Oracle)'s avatar
      mm: add thp_order · 6ffbb458
      Matthew Wilcox (Oracle) authored
      This function returns the order of a transparent huge page.  It compiles
      to 0 if CONFIG_TRANSPARENT_HUGEPAGE is disabled.
      Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Reviewed-by: William Kucharski <william.kucharski@oracle.com>
      Reviewed-by: Zi Yan <ziy@nvidia.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Link: http://lkml.kernel.org/r/20200629151959.15779-4-willy@infradead.org
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      6ffbb458
    • Matthew Wilcox (Oracle)'s avatar
      mm: move page-flags include to top of file · 41901567
      Matthew Wilcox (Oracle) authored
      Give up on the notion that we can remove page-flags.h from mm.h.  There
      are currently 14 inline functions which use a PageFoo function.  Also, two
      of the files directly included by mm.h include page-flags.h themselves,
      and there are probably more indirect inclusions.  So just include it at
      the top like any other header file.
      Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Reviewed-by: William Kucharski <william.kucharski@oracle.com>
      Reviewed-by: Zi Yan <ziy@nvidia.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Link: http://lkml.kernel.org/r/20200629151959.15779-3-willy@infradead.org
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      41901567
    • Matthew Wilcox (Oracle)'s avatar
      mm: store compound_nr as well as compound_order · 1378a5ee
      Matthew Wilcox (Oracle) authored
      Patch series "THP prep patches".
      
      These are some generic cleanups and improvements, which I would like
      merged into mmotm soon.  The first one should be a performance improvement
      for all users of compound pages, and the others are aimed at getting code
      to compile away when CONFIG_TRANSPARENT_HUGEPAGE is disabled (ie small
      systems).  The result is also better documented and less confusing than
      the current mixture of compound, hpage and thp prefixes.
      
      This patch (of 7):
      
      This removes a few instructions from functions which need to know how many
      pages are in a compound page.  The storage used is either page->mapping on
      64-bit or page->index on 32-bit.  Both of these are fine to overlay on
      tail pages.
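      
      The idea of caching the page count next to the order can be sketched in
      userspace as follows.  struct compound_sketch and the sketch_ functions
      are hypothetical; the real fields live in the first tail page, which
      this illustration does not model.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical compound-page metadata: the order plus a cached page count. */
struct compound_sketch {
	unsigned char order;
	unsigned int nr;
};

/* Record the order and precompute the page count once, at setup time. */
static inline void sketch_set_compound_order(struct compound_sketch *p,
					     unsigned int order)
{
	assert(p != NULL && order < 31);
	p->order = (unsigned char)order;
	p->nr = 1U << order;
}

/* Hot path: a plain load instead of a load followed by a shift. */
static inline unsigned long sketch_compound_nr(const struct compound_sketch *p)
{
	assert(p != NULL);
	return p->nr;
}
```

      The trade-off is a few extra bytes of metadata per compound page in
      exchange for one fewer instruction wherever the count is needed.
      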
      Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Reviewed-by: William Kucharski <william.kucharski@oracle.com>
      Reviewed-by: Zi Yan <ziy@nvidia.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Link: http://lkml.kernel.org/r/20200629151959.15779-1-willy@infradead.org
      Link: http://lkml.kernel.org/r/20200629151959.15779-2-willy@infradead.org
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      1378a5ee
    • Greg Kurz's avatar
      mailmap: add entry for Greg Kurz · 14a36a43
      Greg Kurz authored
      I had stopped using gkurz@linux.vnet.ibm.com a while back already, but
      this email address was shut down last June when I quit IBM.  It's about
      time to map it to groug@kaod.org.
      Signed-off-by: Greg Kurz <groug@kaod.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Link: http://lkml.kernel.org/r/159724692879.76040.4938578139173154028.stgit@bahia.lan
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      14a36a43
    • selftests/exec: add file type errno tests · 0f71241a
      Kees Cook authored
      Make sure execve() returns the expected errno values for non-regular
      files.
      Signed-off-by: Kees Cook <keescook@chromium.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Cc: Marc Zyngier <maz@kernel.org>
      Link: http://lkml.kernel.org/r/20200813231723.2725102-3-keescook@chromium.org
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      0f71241a
    • exec: restore EACCES of S_ISDIR execve() · fc4177be
      Kees Cook authored
      Patch series "Fix S_ISDIR execve() errno".
      
      Fix an errno change for execve() of directories, noticed by Marc Zyngier.
      Along with the fix, include a regression test to avoid seeing this return
      in the future.
      
      This patch (of 2):
      
      The return code for attempting to execute a directory has always been
      EACCES.  Adjust the S_ISDIR exec test to reflect the old errno instead of
      the general EISDIR for other kinds of "open" attempts on directories.
      
      Fixes: 633fb6ac ("exec: move S_ISREG() check earlier")
      Reported-by: Marc Zyngier <maz@kernel.org>
      Signed-off-by: Kees Cook <keescook@chromium.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Tested-by: Greg Kroah-Hartman <gregkh@android.com>
      Reviewed-by: Greg Kroah-Hartman <gregkh@google.com>
      Link: http://lkml.kernel.org/r/20200813231723.2725102-2-keescook@chromium.org
      Link: https://lore.kernel.org/lkml/20200813151305.6191993b@why
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      fc4177be
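The behaviour restored by "exec: restore EACCES of S_ISDIR execve()" can be checked from user space. The sketch below is not the selftest added in the companion commit; `exec_errno()` is a hypothetical helper that attempts an execve() and reports the errno it fails with.

```c
#include <errno.h>
#include <unistd.h>

/*
 * Hypothetical helper: try to execve() the given path and return the
 * errno of the failure. If execve() succeeds it never returns, so the
 * return statement is only reached on error.
 */
static int exec_errno(const char *path)
{
	char *const argv[] = { (char *)path, NULL };
	char *const envp[] = { NULL };

	errno = 0;
	execve(path, argv, envp);
	return errno;
}
```

On a kernel with the fix, `exec_errno("/")` is expected to be EACCES rather than EISDIR, since "/" is a directory.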
    • lz4: fix kernel decompression speed · b1a3e75e
      Nick Terrell authored
      This patch replaces all memcpy() calls with LZ4_memcpy() which calls
      __builtin_memcpy() so the compiler can inline it.
      
      LZ4 relies heavily on memcpy() with a constant size being inlined.  In x86
      and i386 pre-boot environments memcpy() cannot be inlined because memcpy()
      doesn't get defined as __builtin_memcpy().
      
      An equivalent patch has been applied upstream so that the next import
      won't lose this change [1].
      
      I've measured the kernel decompression speed using QEMU before and after
      this patch for the x86_64 and i386 architectures.  The speed-up is about
      10x as shown below.
      
      Code   Arch    Kernel Size  Time    Speed
      v5.8   x86_64  11504832 B   148 ms   79 MB/s
      patch  x86_64  11503872 B    13 ms  885 MB/s
      v5.8   i386     9621216 B    91 ms  106 MB/s
      patch  i386     9620224 B    10 ms  962 MB/s
      
      I also measured the time to decompress the initramfs on x86_64, i386, and
      arm.  All three show the same decompression speed before and after, as
      expected.
      
      [1] https://github.com/lz4/lz4/pull/890
      Signed-off-by: Nick Terrell <terrelln@fb.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Cc: Yann Collet <yann.collet.73@gmail.com>
      Cc: Gao Xiang <gaoxiang25@huawei.com>
      Cc: Sven Schmidt <4sschmid@informatik.uni-hamburg.de>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Arvind Sankar <nivedita@alum.mit.edu>
      Link: http://lkml.kernel.org/r/20200803194022.2966806-1-nickrterrell@gmail.com
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      b1a3e75e
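The wrapper described in "lz4: fix kernel decompression speed" can be sketched as follows. The exact macro form is an assumption based on the message (LZ4_memcpy() forwarding to __builtin_memcpy()), and `read_u64()` is a hypothetical helper added only to show a constant-size copy. The builtin requires GCC or Clang.

```c
#include <stdint.h>

/* Assumed shape of the wrapper: route every copy to the compiler builtin. */
#define LZ4_memcpy(dst, src, size) __builtin_memcpy(dst, src, size)

/*
 * Hypothetical helper: a constant-size copy. Because the size is known at
 * compile time and the builtin is used directly, the compiler is free to
 * inline this as a plain load instead of emitting a call to memcpy().
 */
static uint64_t read_u64(const void *p)
{
	uint64_t v;

	LZ4_memcpy(&v, p, sizeof(v));
	return v;
}
```

Going through the builtin directly makes the copy independent of how the surrounding environment defines memcpy(), which is the problem the message describes for the pre-boot code.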
    • Revert "mm/vmstat.c: do not show lowmem reserve protection information of empty zone" · a8a4b7ae
      Baoquan He authored
      This reverts commit 26e7dead.
      
      Sonny reported that one of their tests started failing on the latest
      kernel on their Chrome OS platform.  The root cause is that the above
      commit removed the protection line of empty zone, while the parser used in
      the test relies on the protection line to mark the end of each zone.
      
      Let's revert it to avoid breaking userspace testing or applications.
      
      Fixes: 26e7dead ("mm/vmstat.c: do not show lowmem reserve protection information of empty zone")
      Reported-by: Sonny Rao <sonnyrao@chromium.org>
      Signed-off-by: Baoquan He <bhe@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Reviewed-by: David Hildenbrand <david@redhat.com>
      Acked-by: David Rientjes <rientjes@google.com>
      Cc: <stable@vger.kernel.org>	[5.8.x]
      Link: http://lkml.kernel.org/r/20200811075412.12872-1-bhe@redhat.com
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      a8a4b7ae
    • asm-generic: pgalloc.h: use correct #ifdef to enable pud_alloc_one() · 9922c1de
      Mike Rapoport authored
      The #ifdef statement that guards the generic version of pud_alloc_one() by
      mistake used __HAVE_ARCH_PUD_FREE instead of __HAVE_ARCH_PUD_ALLOC_ONE.
      
      Fix it.
      
      Fixes: d9e8b929 ("asm-generic: pgalloc: provide generic pud_alloc_one() and pud_free_one()")
      Reported-by: kernel test robot <lkp@intel.com>
      Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Link: http://lkml.kernel.org/r/20200812191415.GE163101@linux.ibm.com
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      9922c1de
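The guard mix-up fixed in the pgalloc.h commit can be shown with a toy user-space sketch. It assumes a hypothetical architecture that overrides only the free side. The function bodies are placeholders built on calloc()/free(), not the kernel's implementations, and the signatures are simplified.

```c
#include <stdlib.h>

/* Toy architecture: provides its own pud_free(), nothing else. */
#define __HAVE_ARCH_PUD_FREE

/*
 * Generic fallback, guarded by the macro named after the function it
 * protects. Had this tested __HAVE_ARCH_PUD_FREE instead, the fallback
 * would vanish here and pud_alloc_one() would be left undefined.
 */
#ifndef __HAVE_ARCH_PUD_ALLOC_ONE
static void *pud_alloc_one(void)
{
	return calloc(1, 4096);	/* placeholder for a zeroed page */
}
#endif

#ifdef __HAVE_ARCH_PUD_FREE
/* Placeholder for the toy architecture's own version. */
static void pud_free(void *pud)
{
	free(pud);
}
#endif
```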
  2. 14 Aug, 2020 5 commits
    • Merge tag 'timers-urgent-2020-08-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · b923f124
      Linus Torvalds authored
      Pull timekeeping updates from Thomas Gleixner:
       "A set of timekeeping/VDSO updates:
      
         - Preparatory work to allow S390 to switch over to the generic VDSO
           implementation.
      
           S390 requires that the VDSO data pointer is handed in to the
           counter read function when time namespace support is enabled.
           Adding the pointer is a NOOP for all other architectures because
           the compiler is supposed to optimize that out when it is unused in
           the architecture specific inline. The change also solved a similar
           problem for MIPS which fortunately has time namespaces not yet
           enabled.
      
           S390 needs to update clock related VDSO data independent of the
           timekeeping updates. This was solved so far with yet another
           sequence counter in the S390 implementation. A better solution is
           to utilize the already existing VDSO sequence count for this. The
           core code now exposes helper functions which allow serializing
           against the timekeeper code and against concurrent readers.
      
           S390 needs extra data for their clock readout function. The initial
           common VDSO data structure did not provide a way to add that. It
           now has an embedded architecture-specific struct which defaults
           to an empty struct.
      
           Doing this now avoids tree dependencies and conflicts post rc1 and
           allows all other architectures which work on generic VDSO support
           to work from a common upstream base.
      
         - A trivial comment fix"
      
      * tag 'timers-urgent-2020-08-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        time: Delete repeated words in comments
        lib/vdso: Allow to add architecture-specific vdso data
        timekeeping/vsyscall: Provide vdso_update_begin/end()
        vdso/treewide: Add vdso_data pointer argument to __arch_get_hw_counter()
      b923f124
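The sequence-count serialization mentioned in the pull message (vdso_update_begin/end()) follows the general seqcount pattern, sketched below in user-space C11. This is not the kernel code: `struct toy_vdso_data`, `toy_update_begin()`, `toy_update_end()` and `toy_read()` are hypothetical names, and a single writer is assumed (in the kernel the update side is serialized against the timekeeper).

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical data page: a sequence count plus one arch-specific value. */
struct toy_vdso_data {
	atomic_uint      seq;		/* odd while an update is in progress */
	_Atomic uint64_t arch_val;	/* stands in for arch-specific data */
};

/* Writer side: make the count odd before touching the data. */
static void toy_update_begin(struct toy_vdso_data *vd)
{
	unsigned int s = atomic_load_explicit(&vd->seq, memory_order_relaxed);

	atomic_store_explicit(&vd->seq, s + 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_release);
}

/* Writer side: make the count even again once the data is consistent. */
static void toy_update_end(struct toy_vdso_data *vd)
{
	unsigned int s = atomic_load_explicit(&vd->seq, memory_order_relaxed);

	atomic_store_explicit(&vd->seq, s + 1, memory_order_release);
}

/* Reader side: retry while an update is in flight or the count changed. */
static uint64_t toy_read(struct toy_vdso_data *vd)
{
	unsigned int s1, s2;
	uint64_t v;

	do {
		s1 = atomic_load_explicit(&vd->seq, memory_order_acquire);
		v = atomic_load_explicit(&vd->arch_val, memory_order_relaxed);
		atomic_thread_fence(memory_order_acquire);
		s2 = atomic_load_explicit(&vd->seq, memory_order_relaxed);
	} while ((s1 & 1) || s1 != s2);

	return v;
}
```

Reusing one counter for both timekeeping and architecture updates means readers retry on either kind of update, which is the simplification the message describes over keeping a second S390-only counter.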
    • Merge tag 'timers-core-2020-08-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · b6b178e3
      Linus Torvalds authored
      Pull more timer updates from Thomas Gleixner:
       "A set of posix CPU timer changes which allow deferring the heavy work
        of posix CPU timers into task work context. The tick interrupt is
        reduced to a quick check which queues the work which is doing the
        heavy lifting before returning to user space or going back to guest
        mode. Moving this out is deferring the signal delivery slightly but
        posix CPU timers are inaccurate by nature as they depend on the tick
        so there is no real damage. The relevant test cases all passed.
      
        This lifts the last offender for RT out of the hard interrupt context
        tick handler, but it also has the general benefit that the actual
        heavy work is accounted to the task/process and not to the tick
        interrupt itself.
      
        Further optimizations are possible to break long sighand lock hold and
        interrupt disabled (on !RT kernels) times when a massive amount of
        posix CPU timers (which are unprivileged) is armed for a
        task/process.
      
        This is currently only enabled for x86 because the architecture has to
        ensure that task work is handled in KVM before entering a guest, which
        was just established for x86 with the new common entry/exit code which
        got merged post 5.8 and is not the case for other KVM architectures"
      
      * tag 'timers-core-2020-08-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86: Select POSIX_CPU_TIMERS_TASK_WORK
        posix-cpu-timers: Provide mechanisms to defer timer handling to task_work
        posix-cpu-timers: Split run_posix_cpu_timers()
      b6b178e3
    • Merge tag 'irq-urgent-2020-08-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 1d229a65
      Linus Torvalds authored
      Pull irq fixes from Thomas Gleixner:
       "Two fixes in the core interrupt code which ensure that all error exits
        unlock the descriptor lock"
      
      * tag 'irq-urgent-2020-08-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        genirq: Unlock irq descriptor after errors
        genirq/PM: Always unlock IRQ descriptor in rearm_wake_irq()
      1d229a65
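The class of bug addressed by the two irq fixes (an error exit that returns while the descriptor lock is still held) is commonly avoided with a single unlock label. The sketch below is a toy illustration of that pattern only. `struct toy_desc` and `toy_rearm()` are hypothetical, and the lock is modelled as a plain flag rather than a real spinlock.

```c
#include <errno.h>

/* Hypothetical descriptor: 'locked' models the descriptor lock. */
struct toy_desc {
	int locked;
	int armed;	/* precondition checked under the lock */
};

/*
 * Every path, success or error, leaves through out_unlock, so the lock
 * can never stay held after an early return.
 */
static int toy_rearm(struct toy_desc *desc)
{
	int ret = 0;

	desc->locked = 1;		/* take the lock */

	if (!desc->armed) {
		ret = -EINVAL;		/* error exit still unlocks below */
		goto out_unlock;
	}

	desc->armed = 0;		/* the actual work */

out_unlock:
	desc->locked = 0;		/* release the lock */
	return ret;
}
```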
    • Merge tag 'for-linus' of git://github.com/openrisc/linux · e1d74fbe
      Linus Torvalds authored
      Pull OpenRISC updates from Stafford Horne:
       "A few patches all over the place during this cycle, mostly bug and
        sparse warning fixes for OpenRISC, but a few enhancements too. Note,
        there are two fixups that are not OpenRISC specific.
      
        Non-OpenRISC fixes:
      
         - In init we need to align the init_task correctly to fix an issue
           with MUTEX_FLAGS, reviewed by Peter Z. No one picked this up so I
           kept it on my tree.
      
         - In asm-generic/io.h I fixed up some sparse warnings, OK'd by Arnd.
           Arnd asked to merge it via my tree.
      
        OpenRISC fixes:
      
         - Many fixes for OpenRISC sparse warnings.
      
         - Add support for OpenRISC SMP TLB flushing rather than always
           flushing the entire TLB on every CPU.
      
         - Fix bug when dumping stack via /proc/xxx/stack of user threads"
      
      * tag 'for-linus' of git://github.com/openrisc/linux:
        openrisc: uaccess: Add user address space check to access_ok
        openrisc: signal: Fix sparse address space warnings
        openrisc: uaccess: Remove unused macro __addr_ok
        openrisc: uaccess: Use static inline function in access_ok
        openrisc: uaccess: Fix sparse address space warnings
        openrisc: io: Fixup defines and move include to the end
        asm-generic/io.h: Fix sparse warnings on big-endian architectures
        openrisc: Implement proper SMP tlb flushing
        openrisc: Fix oops caused when dumping stack
        openrisc: Add support for external initrd images
        init: Align init_task to avoid conflict with MUTEX_FLAGS
        openrisc: fix __user in raw_copy_to_user()'s prototype
      e1d74fbe
    • Merge tag 'powerpc-5.9-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux · 7fca4dee
      Linus Torvalds authored
      Pull powerpc fix from Michael Ellerman:
       "One fix for a boot crash on some platforms introduced by the recent
        pkey refactoring.
      
        Thanks to Christian Zigotzky and Aneesh Kumar K.V"
      
      * tag 'powerpc-5.9-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
        powerpc/pkeys: Fix boot failures with Nemo board (A-EON AmigaOne X1000)
      7fca4dee