1. 04 Feb, 2020 16 commits
    • ipc/mqueue.c: update/document memory barriers · c5b2cbdb
      Manfred Spraul authored
      Update and document memory barriers for mqueue.c:
      
      - ewp->state is read without any locks, thus READ_ONCE is required.
      
      - add smp_acquire__after_ctrl_dep() after the READ_ONCE, we need
        acquire semantics if the value is STATE_READY.
      
      - use wake_q_add_safe()
      
      - document why __set_current_state() may be used:
        Reading task->state cannot happen before the wake_q_add() call,
        which happens while holding info->lock. Thus the spin_unlock()
        is the RELEASE, and the spin_lock() is the ACQUIRE.
      
      For completeness: there is also a 3 CPU scenario, if the to be woken
      up task is already on another wake_q.
      Then:
      - CPU1: spin_unlock() of the task that goes to sleep is the RELEASE
      - CPU2: the spin_lock() of the waker is the ACQUIRE
      - CPU2: smp_mb__before_atomic inside wake_q_add() is the RELEASE
      - CPU3: smp_mb__after_spinlock() inside try_to_wake_up() is the ACQUIRE
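
The two-CPU hand-off described above can be sketched in userspace with C11 atomics. This is only an analogy, not the mqueue.c code: the relaxed load stands in for READ_ONCE(), the acquire fence for smp_acquire__after_ctrl_dep(), and `struct waiter`, `waker_publish()` and `waiter_try_consume()` are hypothetical names.

```c
#include <stdatomic.h>
#include <stdbool.h>

enum { STATE_NONE = 0, STATE_READY = 1 };

/* Hypothetical, simplified stand-in for a sleeping receiver. */
struct waiter {
	int msg;           /* payload handed over by the waker */
	atomic_int state;  /* STATE_NONE until the hand-off is complete */
};

/* Waker side: write the payload first, then publish it with RELEASE. */
static void waker_publish(struct waiter *w, int msg)
{
	w->msg = msg;
	atomic_store_explicit(&w->state, STATE_READY, memory_order_release);
}

/*
 * Waiter side: lockless read of the state, then an acquire fence once
 * the value is known to be STATE_READY, so the payload read cannot be
 * satisfied with a stale value.
 */
static bool waiter_try_consume(struct waiter *w, int *out)
{
	if (atomic_load_explicit(&w->state, memory_order_relaxed) != STATE_READY)
		return false;
	atomic_thread_fence(memory_order_acquire);
	*out = w->msg;
	return true;
}
```

The fence is only paid on the path that actually consumes the payload, which is the point of using a control dependency plus acquire instead of an acquire load on every poll.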
      
      Link: http://lkml.kernel.org/r/20191020123305.14715-4-manfred@colorfullife.com
      Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
      Reviewed-by: Davidlohr Bueso <dbueso@suse.de>
      Cc: Waiman Long <longman@redhat.com>
      Cc: <1vier1@web.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Will Deacon <will.deacon@arm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • ipc/mqueue.c: remove duplicated code · ed29f171
      Davidlohr Bueso authored
      pipelined_send() and pipelined_receive() are identical, so merge them.
      
      [manfred@colorfullife.com: add changelog]
      Link: http://lkml.kernel.org/r/20191020123305.14715-3-manfred@colorfullife.com
      Signed-off-by: Davidlohr Bueso <dave@stgolabs.net>
      Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
      Cc: <1vier1@web.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Waiman Long <longman@redhat.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • smp_mb__{before,after}_atomic(): update Documentation · 39323c64
      Manfred Spraul authored
      When adding the _{acquire|release|relaxed}() variants of some atomic
      operations, it was forgotten to update Documentation/memory-barriers.txt:
      
      smp_mb__{before,after}_atomic() is now intended for all RMW operations
      that do not imply a memory barrier.
      
      1)
      	smp_mb__before_atomic();
      	atomic_add();
      
      2)
      	smp_mb__before_atomic();
      	atomic_xchg_relaxed();
      
      3)
      	smp_mb__before_atomic();
      	atomic_fetch_add_relaxed();
      
      Invalid would be:
      	smp_mb__before_atomic();
      	atomic_set();
      
      In addition, the patch splits the long sentence into multiple shorter
      sentences.
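
A loose userspace analogy of case 3) with C11 atomics, assuming a sequentially consistent fence as a stand-in for smp_mb__before_atomic() and a relaxed fetch-add as a stand-in for atomic_fetch_add_relaxed(). The variables and `publish_and_count()` are hypothetical, and the C11 and kernel memory models are not identical.

```c
#include <stdatomic.h>

static int payload;          /* plain data written before the RMW */
static atomic_int refcount;  /* counter updated by a relaxed RMW */

/*
 * Analogy of:
 *	smp_mb__before_atomic();
 *	atomic_fetch_add_relaxed();
 * The RMW itself implies no ordering, so the explicit fence is what
 * orders the payload store before the counter update.
 */
static int publish_and_count(int value)
{
	payload = value;
	atomic_thread_fence(memory_order_seq_cst);
	return atomic_fetch_add_explicit(&refcount, 1, memory_order_relaxed);
}
```

The invalid case from the text has no counterpart here on purpose: a plain store such as atomic_set() is not a read-modify-write operation, so the helper barrier does not apply to it.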
      
      Link: http://lkml.kernel.org/r/20191020123305.14715-2-manfred@colorfullife.com
      Fixes: 654672d4 ("locking/atomics: Add _{acquire|release|relaxed}() variants of some atomic operations")
      Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
      Acked-by: Waiman Long <longman@redhat.com>
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: <1vier1@web.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/memory_hotplug: drop valid_start/valid_end from test_pages_in_a_zone() · 92917998
      David Hildenbrand authored
      The callers are only interested in the actual zone, they don't care about
      boundaries.  Return the zone instead to simplify.
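
A minimal sketch of the "return the zone" shape, assuming a flat lookup table instead of the real memmap. `struct zone` is reduced to a name, and `zone_of_range()` is a hypothetical helper, not the kernel function.

```c
#include <stddef.h>

struct zone {
	const char *name;
};

/*
 * zone_of[pfn] is the zone a page belongs to, or NULL for a hole.
 * Returns the single zone covering [start, end), or NULL if the range
 * is empty, consists only of holes, or mixes zones.
 */
static struct zone *zone_of_range(struct zone *const zone_of[],
				  size_t start, size_t end)
{
	struct zone *zone = NULL;

	for (size_t pfn = start; pfn < end; pfn++) {
		if (!zone_of[pfn])
			continue;		/* skip holes */
		if (zone && zone_of[pfn] != zone)
			return NULL;		/* pages from two zones */
		zone = zone_of[pfn];
	}
	return zone;
}
```

Callers get everything they need from one return value: NULL means "not usable", anything else is the zone itself, so no out-parameters for boundaries are required.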
      
      Link: http://lkml.kernel.org/r/20200110183308.11849-1-david@redhat.com
      Signed-off-by: David Hildenbrand <david@redhat.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: "Rafael J. Wysocki" <rafael@kernel.org>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Oscar Salvador <osalvador@suse.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/memory_hotplug: cleanup __remove_pages() · 52fb87c8
      David Hildenbrand authored
      Let's drop the basically unused section stuff and simplify.
      
      Also, let's use a shorter variant to calculate the number of pages to
      the next section boundary.
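
One way to compute the number of pages up to the next section boundary, as a userspace sketch. PAGES_PER_SECTION is an assumption (128 MiB sections with 4 KiB pages), and `pages_to_next_section()` is a hypothetical helper name.

```c
/* Assumption: 128 MiB sections and 4 KiB pages. */
#define PAGES_PER_SECTION 32768UL
#define SECTION_ALIGN_UP(pfn) \
	(((pfn) + PAGES_PER_SECTION - 1) & ~(PAGES_PER_SECTION - 1))

/*
 * Number of pages to handle in one step: everything up to the next
 * section boundary, capped by the end of the range.
 */
static unsigned long pages_to_next_section(unsigned long pfn,
					   unsigned long end_pfn)
{
	unsigned long to_boundary = SECTION_ALIGN_UP(pfn + 1) - pfn;
	unsigned long remaining = end_pfn - pfn;

	return remaining < to_boundary ? remaining : to_boundary;
}
```

Aligning `pfn + 1` rather than `pfn` matters: for a pfn that already sits on a boundary it yields a full section instead of zero, so a loop advancing by this value always makes progress.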
      
      Link: http://lkml.kernel.org/r/20191006085646.5768-11-david@redhat.com
      Signed-off-by: David Hildenbrand <david@redhat.com>
      Cc: Oscar Salvador <osalvador@suse.de>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
      Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
      Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Logan Gunthorpe <logang@deltatee.com>
      Cc: Pankaj Gupta <pagupta@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/memory_hotplug: drop local variables in shrink_zone_span() · 5d12071c
      David Hildenbrand authored
      Get rid of the unnecessary local variables.
      
      Link: http://lkml.kernel.org/r/20191006085646.5768-10-david@redhat.com
      Signed-off-by: David Hildenbrand <david@redhat.com>
      Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Logan Gunthorpe <logang@deltatee.com>
      Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Oscar Salvador <osalvador@suse.de>
      Cc: Pankaj Gupta <pagupta@redhat.com>
      Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/memory_hotplug: don't check for "all holes" in shrink_zone_span() · 950b68d9
      David Hildenbrand authored
      If we have holes, the holes will automatically get detected and removed
      once we remove the next bigger/smaller section.  The extra checks can go.
      
      Link: http://lkml.kernel.org/r/20191006085646.5768-9-david@redhat.com
      Signed-off-by: David Hildenbrand <david@redhat.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
      Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
      Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Logan Gunthorpe <logang@deltatee.com>
      Cc: Oscar Salvador <osalvador@suse.de>
      Cc: Pankaj Gupta <pagupta@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/memory_hotplug: we always have a zone in find_(smallest|biggest)_section_pfn · 9b05158f
      David Hildenbrand authored
      With shrink_pgdat_span() out of the way, we now always have a valid zone.
      
      Link: http://lkml.kernel.org/r/20191006085646.5768-8-david@redhat.com
      Signed-off-by: David Hildenbrand <david@redhat.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
      Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
      Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Logan Gunthorpe <logang@deltatee.com>
      Cc: Oscar Salvador <osalvador@suse.de>
      Cc: Pankaj Gupta <pagupta@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/memory_hotplug: poison memmap in remove_pfn_range_from_zone() · d33695b1
      David Hildenbrand authored
      Let's poison the pages similar to when adding new memory in
      sparse_add_section().  Also call remove_pfn_range_from_zone() from
      memunmap_pages(), so we can poison the memmap from there as well.
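
The idea of poisoning a memmap range can be sketched in userspace as follows. `struct fake_page` is a hypothetical two-word stand-in for struct page, and the all-ones pattern is an assumption chosen for this illustration.

```c
#include <stdbool.h>
#include <string.h>

/* Assumption for this sketch: poison is an all-ones bit pattern. */
#define POISON_PATTERN (~0UL)

/* Hypothetical, heavily reduced stand-in for struct page. */
struct fake_page {
	unsigned long flags;
	unsigned long private_data;
};

/* Overwrite every byte of the given memmap range with 0xff. */
static void poison_range(struct fake_page *memmap, size_t nr_pages)
{
	memset(memmap, 0xff, nr_pages * sizeof(*memmap));
}

/* A page whose flags word is all ones has not been initialized since. */
static bool page_poisoned(const struct fake_page *page)
{
	return page->flags == POISON_PATTERN;
}
```

Poisoning on removal makes a later access to a stale memmap entry detectable by a cheap check instead of silently reading leftover state.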
      
      Link: http://lkml.kernel.org/r/20191006085646.5768-7-david@redhat.com
      Signed-off-by: David Hildenbrand <david@redhat.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
      Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
      Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Logan Gunthorpe <logang@deltatee.com>
      Cc: Oscar Salvador <osalvador@suse.de>
      Cc: Pankaj Gupta <pagupta@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/memmap_init: update variable name in memmap_init_zone · 1f8d75c1
      Aneesh Kumar K.V authored
      Patch series "mm/memory_hotplug: Shrink zones before removing memory", v6.
      
      This series fixes the access of uninitialized memmaps when shrinking
      zones/nodes and when removing memory.  Also, it contains all fixes for
      crashes that can be triggered when removing certain namespace using
      memunmap_pages() - ZONE_DEVICE, reported by Aneesh.
      
      We stop trying to shrink ZONE_DEVICE, as it's buggy, fixing it would be
      more involved (we don't have SECTION_IS_ONLINE as an indicator), and
      shrinking is only of limited use (set_zone_contiguous() cannot detect the
      ZONE_DEVICE as contiguous).
      
      We continue shrinking !ZONE_DEVICE zones, however, I reduced the amount of
      code to a minimum.  Shrinking is especially necessary to keep
      zone->contiguous set where possible, especially, on memory unplug of DIMMs
      at zone boundaries.
      
      --------------------------------------------------------------------------
      
      Zones are now properly shrunk when offlining memory blocks or when
      onlining fails.  This makes it possible to shrink zones properly on
      memory unplug even if the separate memory blocks of a DIMM were onlined
      to different zones or re-onlined to a different zone after offlining.
      
      Example:
      
      :/# cat /proc/zoneinfo
      Node 1, zone  Movable
              spanned  0
              present  0
              managed  0
      :/# echo "online_movable" > /sys/devices/system/memory/memory41/state
      :/# echo "online_movable" > /sys/devices/system/memory/memory43/state
      :/# cat /proc/zoneinfo
      Node 1, zone  Movable
              spanned  98304
              present  65536
              managed  65536
      :/# echo 0 > /sys/devices/system/memory/memory43/online
      :/# cat /proc/zoneinfo
      Node 1, zone  Movable
              spanned  32768
              present  32768
              managed  32768
      :/# echo 0 > /sys/devices/system/memory/memory41/online
      :/# cat /proc/zoneinfo
      Node 1, zone  Movable
              spanned  0
              present  0
              managed  0
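
The spanned/present numbers in the example above are consistent with 32768 pages per memory block (assumed here: 128 MiB blocks with 4 KiB pages). A small sketch of that arithmetic, where `struct span` and `zone_span()` are hypothetical names and the array index 0 corresponds to memory41:

```c
#include <stdbool.h>
#include <stddef.h>

/* Assumption: 128 MiB memory blocks and 4 KiB pages. */
#define PAGES_PER_BLOCK 32768UL

struct span {
	unsigned long spanned;	/* first to last online block, holes included */
	unsigned long present;	/* online blocks only */
};

/* online[i] says whether the i-th consecutive memory block is online. */
static struct span zone_span(const bool online[], size_t nr_blocks)
{
	struct span s = { 0, 0 };
	size_t first = nr_blocks, last = 0;

	for (size_t i = 0; i < nr_blocks; i++) {
		if (!online[i])
			continue;
		if (first == nr_blocks)
			first = i;
		last = i;
		s.present += PAGES_PER_BLOCK;
	}
	if (first != nr_blocks)
		s.spanned = (last - first + 1) * PAGES_PER_BLOCK;
	return s;
}
```

With memory41 and memory43 online and memory42 offline, the span covers three blocks (98304 pages) while only two are present (65536 pages). Offlining memory43 shrinks both values to 32768.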
      
      This patch (of 6):
      
      The third argument is actually number of pages.  Change the variable name
      from size to nr_pages to indicate this better.
      
      No functional change in this patch.
      
      Link: http://lkml.kernel.org/r/20191006085646.5768-3-david@redhat.com
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
      Signed-off-by: David Hildenbrand <david@redhat.com>
      Reviewed-by: Pankaj Gupta <pagupta@redhat.com>
      Reviewed-by: David Hildenbrand <david@redhat.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
      Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
      Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Logan Gunthorpe <logang@deltatee.com>
      Cc: Oscar Salvador <osalvador@suse.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: factor out next_present_section_nr() · 4c605881
      David Hildenbrand authored
      Let's move it to the header and use the shorter variant from
      mm/page_alloc.c (the original one will also check
      "__highest_present_section_nr + 1", which is not necessary).  While at
      it, make the section_nr in next_pfn() const.
      
      In next_pfn(), we now return section_nr_to_pfn(-1) instead of -1 once we
      exceed __highest_present_section_nr, which doesn't make a difference in
      the caller as it is big enough (>= all sane end_pfn).
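
A userspace model of the shorter variant, assuming a small boolean table in place of the real section bookkeeping. The table, `highest_present_section_nr` and the `sketch_` prefix are hypothetical; only the loop shape is the point.

```c
#include <stdbool.h>

#define NR_SECTIONS 8UL

/* Hypothetical stand-ins for the kernel's section state. */
static bool section_present[NR_SECTIONS];
static unsigned long highest_present_section_nr;

/*
 * Return the next present section after section_nr, or -1 (all ones)
 * once the highest present section has been passed. Passing -1 starts
 * the search at section 0, because the unsigned increment wraps.
 */
static unsigned long sketch_next_present_section_nr(unsigned long section_nr)
{
	while (++section_nr <= highest_present_section_nr) {
		if (section_present[section_nr])
			return section_nr;
	}
	return -1UL;
}
```

The loop bound is the highest present section itself, so no separate check against "highest + 1" is needed.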
      
      Link: http://lkml.kernel.org/r/20200113144035.10848-3-david@redhat.com
      Signed-off-by: David Hildenbrand <david@redhat.com>
      Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: "Jin, Zhi" <zhi.jin@intel.com>
      Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Oscar Salvador <osalvador@suse.de>
      Cc: Pavel Tatashin <pasha.tatashin@oracle.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/page_alloc: fix and rework pfn handling in memmap_init_zone() · 948c436e
      David Hildenbrand authored
      Let's update the pfn manually whenever we continue the loop.  This makes
      the code easier to read but also less error prone (and we can directly fix
      one issue).
      
      When overlap_memmap_init() returns true, pfn is updated to
      "memblock_region_memory_end_pfn(r)".  So it already points at the *next*
      pfn to process.  Incrementing the pfn another time is wrong, we might
      leave one uninitialized.  I spotted this by inspecting the code, so I have
      no idea if this is relevant in practice (with kernelcore=mirror).
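
The off-by-one can be reproduced with a simplified loop. `skip_region()` is a hypothetical stand-in for overlap_memmap_init(), and both loops only count the pfns they would initialize.

```c
#include <stdbool.h>

/*
 * Hypothetical stand-in for overlap_memmap_init(): if *pfn lies inside
 * [skip_start, skip_end), move it to skip_end (the next pfn to process)
 * and report true.
 */
static bool skip_region(unsigned long *pfn, unsigned long skip_start,
			unsigned long skip_end)
{
	if (*pfn >= skip_start && *pfn < skip_end) {
		*pfn = skip_end;
		return true;
	}
	return false;
}

/* Old shape: the for-loop increment also runs after a skip. */
static unsigned long init_buggy(unsigned long start, unsigned long end,
				unsigned long skip_start, unsigned long skip_end)
{
	unsigned long initialized = 0;

	for (unsigned long pfn = start; pfn < end; pfn++) {
		if (skip_region(&pfn, skip_start, skip_end))
			continue;	/* pfn++ now steps over skip_end */
		initialized++;
	}
	return initialized;
}

/* New shape: pfn is advanced manually on every path. */
static unsigned long init_fixed(unsigned long start, unsigned long end,
				unsigned long skip_start, unsigned long skip_end)
{
	unsigned long initialized = 0;
	unsigned long pfn = start;

	while (pfn < end) {
		if (skip_region(&pfn, skip_start, skip_end))
			continue;	/* pfn already points at the next pfn */
		initialized++;
		pfn++;
	}
	return initialized;
}
```

For the range [0, 10) with [3, 6) skipped, the fixed loop handles seven pfns while the old shape handles six, because the pfn right after the skipped region is stepped over.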
      
      Link: http://lkml.kernel.org/r/20200113144035.10848-2-david@redhat.com
      Fixes: a9a9e77f ("mm: move mirrored memory specific code outside of memmap_init_zone")
      Signed-off-by: David Hildenbrand <david@redhat.com>
      Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Reviewed-by: Alexander Duyck <alexander.h.duyck@linux.intel.com>
      Cc: Pavel Tatashin <pasha.tatashin@oracle.com>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Oscar Salvador <osalvador@suse.de>
      Cc: Kirill A. Shutemov <kirill@shutemov.name>
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: "Jin, Zhi" <zhi.jin@intel.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/page_alloc.c: initialize memmap of unavailable memory directly · 4b094b78
      David Hildenbrand authored
      Let's make sure that all memory holes are actually marked PageReserved(),
      that page_to_pfn() produces reliable results, and that these pages are not
      detected as "mmap" pages due to the mapcount.
      
      E.g., booting a x86-64 QEMU guest with 4160 MB:
      
      [    0.010585] Early memory node ranges
      [    0.010586]   node   0: [mem 0x0000000000001000-0x000000000009efff]
      [    0.010588]   node   0: [mem 0x0000000000100000-0x00000000bffdefff]
      [    0.010589]   node   0: [mem 0x0000000100000000-0x0000000143ffffff]
      
      max_pfn is 0x144000.
      
      Before this change:
      
      [root@localhost ~]# ./page-types -r -a 0x144000,
                   flags      page-count       MB  symbolic-flags                     long-symbolic-flags
      0x0000000000000800           16384       64  ___________M_______________________________        mmap
                   total           16384       64
      
      After this change:
      
      [root@localhost ~]# ./page-types -r -a 0x144000,
                   flags      page-count       MB  symbolic-flags                     long-symbolic-flags
      0x0000000100000000           16384       64  ___________________________r_______________        reserved
                   total           16384       64
      
      IOW, especially the unavailable physical memory ("memory hole") in the
      last section would not get properly marked PageReserved() and is indicated
      to be "mmap" memory.
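
The numbers in the two page-types outputs can be checked with a few lines of C. The bit positions follow from the flag values shown above (0x800 is bit 11, 0x100000000 is bit 32); the 4 KiB page size, the 32768 pages per section and the helper names are assumptions of this sketch.

```c
#include <stdbool.h>
#include <stdint.h>

#define KPF_MMAP	11	/* 0x0000000000000800, shown as "mmap" */
#define KPF_RESERVED	32	/* 0x0000000100000000, shown as "reserved" */

/* Assumptions: 4 KiB pages and 128 MiB sections. */
#define PAGE_SIZE_BYTES		4096UL
#define PAGES_PER_SECTION	32768UL

/* Test a single bit of a 64-bit flags word as read from kpageflags. */
static bool kpf_test(uint64_t flags, unsigned int bit)
{
	return (flags >> bit) & 1;
}

/* Pages between pfn and the end of the section that contains it. */
static unsigned long pages_until_section_end(unsigned long pfn)
{
	unsigned long end = (pfn + PAGES_PER_SECTION - 1) &
			    ~(PAGES_PER_SECTION - 1);

	return end - pfn;
}
```

Starting at max_pfn 0x144000, the section ends at 0x148000, which leaves 0x4000 = 16384 pages, or 64 MB. That matches the page-count and MB columns in both outputs.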
      
      Drop the trace of that function from include/linux/mm.h - nobody else
      needs it, and rename it accordingly.
      
      Note: The fake zone/node might not be covered by the zone/node span.  This
      is not an urgent issue (for now, we had the same node/zone due to the
      zeroing).  We'll need a clean way to mark memory holes (e.g., using a page
      type PageHole() if possible or a fake ZONE_INVALID) and eventually stop
      marking these memory holes PageReserved().
      
      Link: http://lkml.kernel.org/r/20191211163201.17179-4-david@redhat.com
      Signed-off-by: David Hildenbrand <david@redhat.com>
      Cc: Oscar Salvador <osalvador@suse.de>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Cc: Bob Picco <bob.picco@oracle.com>
      Cc: Daniel Jordan <daniel.m.jordan@oracle.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Pavel Tatashin <pasha.tatashin@oracle.com>
      Cc: Stephen Rothwell <sfr@canb.auug.org.au>
      Cc: Steven Sistare <steven.sistare@oracle.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • fs/proc/page.c: allow inspection of last section and fix end detection · abec749f
      David Hildenbrand authored
      If max_pfn does not fall onto a section boundary, it is possible to
      inspect PFNs up to max_pfn, and PFNs above max_pfn, however, max_pfn
      itself can't be inspected.  We can have a valid (and online) memmap at and
      above max_pfn if max_pfn is not aligned to a section boundary.  The whole
      early section has a memmap and is marked online.  Being able to inspect
      the state of these PFNs is valuable for debugging, especially because
      max_pfn can change on memory hotplug and expose these memmaps.
      
      Also, querying page flags via "./page-types -r -a 0x144001,"
      (tools/vm/page-types.c) inside a x86-64 guest with 4160MB under QEMU
      results in an (almost) endless loop in user space, because the end is not
      detected properly when starting after max_pfn.
      
      Instead, let's allow inspecting all pages in the highest section and
      return 0 directly if we try to access pages above that section.
      
      While at it, check the count before adjusting it, to avoid masking user
      errors.
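
The read-bounds logic described above can be sketched as follows. The 8-byte entry size, the 32768 pages per section, the -1 error value and the name `kpage_read_bytes()` are assumptions of this illustration, not the fs/proc/page.c code.

```c
/* Assumptions: one 8-byte entry per pfn, 128 MiB sections, 4 KiB pages. */
#define KPMSIZE			8UL
#define PAGES_PER_SECTION	32768UL

/*
 * Returns how many bytes a read at byte `offset` may transfer:
 * -1 for a misaligned request, 0 at or above the end of the highest
 * section, otherwise `count` clamped to that end.
 */
static long kpage_read_bytes(unsigned long offset, unsigned long count,
			     unsigned long max_pfn)
{
	unsigned long max_dump_pfn = (max_pfn + PAGES_PER_SECTION - 1) &
				     ~(PAGES_PER_SECTION - 1);
	unsigned long end = max_dump_pfn * KPMSIZE;

	/* Validate the request before adjusting it. */
	if ((offset % KPMSIZE) || (count % KPMSIZE))
		return -1;
	if (offset >= end)
		return 0;
	if (count > end - offset)
		count = end - offset;
	return (long)count;
}
```

With max_pfn 0x144000, a read for pfn 0x144001 is served because it is still inside the highest section, while a read starting at pfn 0x148000 returns 0 immediately, which is what lets a user-space reader detect the end.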
      
      Link: http://lkml.kernel.org/r/20191211163201.17179-3-david@redhat.com
      Signed-off-by: David Hildenbrand <david@redhat.com>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Cc: Oscar Salvador <osalvador@suse.de>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Stephen Rothwell <sfr@canb.auug.org.au>
      Cc: Bob Picco <bob.picco@oracle.com>
      Cc: Daniel Jordan <daniel.m.jordan@oracle.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Pavel Tatashin <pasha.tatashin@oracle.com>
      Cc: Steven Sistare <steven.sistare@oracle.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/page_alloc.c: fix uninitialized memmaps on a partially populated last section · e822969c
      David Hildenbrand authored
      Patch series "mm: fix max_pfn not falling on section boundary", v2.
      
      Playing with different memory sizes for a x86-64 guest, I discovered that
      some memmaps (highest section if max_mem does not fall on the section
      boundary) are marked as being valid and online, but contain garbage.  We
      have to properly initialize these memmaps.
      
      Looking at /proc/kpageflags and friends, I found some more issues,
      partially related to this.
      
      This patch (of 3):
      
      If max_pfn is not aligned to a section boundary, we can easily run into
      BUGs.  This can e.g., be triggered on x86-64 under QEMU by specifying a
      memory size that is not a multiple of 128MB (e.g., 4097MB, but also
      4160MB).  I was told that on real HW, we can easily have this scenario
      (esp., one of the main reasons sub-section hotadd of devmem was added).
      
      The issue is, that we have a valid memmap (pfn_valid()) for the whole
      section, and the whole section will be marked "online".
      pfn_to_online_page() will succeed, but the memmap contains garbage.
      
      E.g., doing a "./page-types -r -a 0x144001" when QEMU was started with "-m
      4160M" - (see tools/vm/page-types.c):
      
      [  200.476376] BUG: unable to handle page fault for address: fffffffffffffffe
      [  200.477500] #PF: supervisor read access in kernel mode
      [  200.478334] #PF: error_code(0x0000) - not-present page
      [  200.479076] PGD 59614067 P4D 59614067 PUD 59616067 PMD 0
      [  200.479557] Oops: 0000 [#4] SMP NOPTI
      [  200.479875] CPU: 0 PID: 603 Comm: page-types Tainted: G      D W         5.5.0-rc1-next-20191209 #93
      [  200.480646] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-59-gc9ba5276e321-prebuilt.qemu4
      [  200.481648] RIP: 0010:stable_page_flags+0x4d/0x410
      [  200.482061] Code: f3 ff 41 89 c0 48 b8 00 00 00 00 01 00 00 00 45 84 c0 0f 85 cd 02 00 00 48 8b 53 08 48 8b 2b 48f
      [  200.483644] RSP: 0018:ffffb139401cbe60 EFLAGS: 00010202
      [  200.484091] RAX: fffffffffffffffe RBX: fffffbeec5100040 RCX: 0000000000000000
      [  200.484697] RDX: 0000000000000001 RSI: ffffffff9535c7cd RDI: 0000000000000246
      [  200.485313] RBP: ffffffffffffffff R08: 0000000000000000 R09: 0000000000000000
      [  200.485917] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000144001
      [  200.486523] R13: 00007ffd6ba55f48 R14: 00007ffd6ba55f40 R15: ffffb139401cbf08
      [  200.487130] FS:  00007f68df717580(0000) GS:ffff9ec77fa00000(0000) knlGS:0000000000000000
      [  200.487804] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  200.488295] CR2: fffffffffffffffe CR3: 0000000135d48000 CR4: 00000000000006f0
      [  200.488897] Call Trace:
      [  200.489115]  kpageflags_read+0xe9/0x140
      [  200.489447]  proc_reg_read+0x3c/0x60
      [  200.489755]  vfs_read+0xc2/0x170
      [  200.490037]  ksys_pread64+0x65/0xa0
      [  200.490352]  do_syscall_64+0x5c/0xa0
      [  200.490665]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      But it can be triggered much easier via "cat /proc/kpageflags > /dev/null"
      after cold/hot plugging a DIMM to such a system:
      
      [root@localhost ~]# cat /proc/kpageflags > /dev/null
      [  111.517275] BUG: unable to handle page fault for address: fffffffffffffffe
      [  111.517907] #PF: supervisor read access in kernel mode
      [  111.518333] #PF: error_code(0x0000) - not-present page
      [  111.518771] PGD a240e067 P4D a240e067 PUD a2410067 PMD 0
      
      This patch fixes that by at least zero-ing out that memmap (so e.g.,
      page_to_pfn() will not crash).  Commit 907ec5fc ("mm: zero remaining
      unavailable struct pages") tried to fix a similar issue, but forgot to
      consider this special case.
      
      After this patch, there are still problems to solve.  E.g., not all of
      these pages falling into a memory hole will actually get initialized later
      and set PageReserved - they are only zeroed out - but at least the
      immediate crashes are gone.  A follow-up patch will take care of this.
      
      Link: http://lkml.kernel.org/r/20191211163201.17179-2-david@redhat.com
      Fixes: f7f99100 ("mm: stop zeroing memory during allocation in vmemmap")
      Signed-off-by: David Hildenbrand <david@redhat.com>
      Tested-by: Daniel Jordan <daniel.m.jordan@oracle.com>
      Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Pavel Tatashin <pasha.tatashin@oracle.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Steven Sistare <steven.sistare@oracle.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Daniel Jordan <daniel.m.jordan@oracle.com>
      Cc: Bob Picco <bob.picco@oracle.com>
      Cc: Oscar Salvador <osalvador@suse.de>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Stephen Rothwell <sfr@canb.auug.org.au>
      Cc: <stable@vger.kernel.org>	[4.15+]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • ocfs2: fix oops when writing cloned file · 2d797e9f
      Gang He authored
      Writing a cloned file triggers a kernel oops and the user-space command
      process is also killed by the system.  The bug can be reproduced stably
      via:
      
      1) create a file under ocfs2 file system directory.
      
        journalctl -b > aa.txt
      
      2) create a cloned file for this file.
      
        reflink aa.txt bb.txt
      
      3) write the cloned file with dd command.
      
        dd if=/dev/zero of=bb.txt bs=512 count=1 conv=notrunc
      
      The dd command is killed by the kernel, then you can see the oops message
      via dmesg command.
      
      [  463.875404] BUG: kernel NULL pointer dereference, address: 0000000000000028
      [  463.875413] #PF: supervisor read access in kernel mode
      [  463.875416] #PF: error_code(0x0000) - not-present page
      [  463.875418] PGD 0 P4D 0
      [  463.875425] Oops: 0000 [#1] SMP PTI
      [  463.875431] CPU: 1 PID: 2291 Comm: dd Tainted: G           OE     5.3.16-2-default
      [  463.875433] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
      [  463.875500] RIP: 0010:ocfs2_refcount_cow+0xa4/0x5d0 [ocfs2]
      [  463.875505] Code: 06 89 6c 24 38 89 eb f6 44 24 3c 02 74 be 49 8b 47 28
      [  463.875508] RSP: 0018:ffffa2cb409dfce8 EFLAGS: 00010202
      [  463.875512] RAX: ffff8b1ebdca8000 RBX: 0000000000000001 RCX: ffff8b1eb73a9df0
      [  463.875515] RDX: 0000000000056a01 RSI: 0000000000000000 RDI: 0000000000000000
      [  463.875517] RBP: 0000000000000001 R08: ffff8b1eb73a9de0 R09: 0000000000000000
      [  463.875520] R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000000
      [  463.875522] R13: ffff8b1eb922f048 R14: 0000000000000000 R15: ffff8b1eb922f048
      [  463.875526] FS:  00007f8f44d15540(0000) GS:ffff8b1ebeb00000(0000) knlGS:0000000000000000
      [  463.875529] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  463.875532] CR2: 0000000000000028 CR3: 000000003c17a000 CR4: 00000000000006e0
      [  463.875546] Call Trace:
      [  463.875596]  ? ocfs2_inode_lock_full_nested+0x18b/0x960 [ocfs2]
      [  463.875648]  ocfs2_file_write_iter+0xaf8/0xc70 [ocfs2]
      [  463.875672]  new_sync_write+0x12d/0x1d0
      [  463.875688]  vfs_write+0xad/0x1a0
      [  463.875697]  ksys_write+0xa1/0xe0
      [  463.875710]  do_syscall_64+0x60/0x1f0
      [  463.875743]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
      [  463.875758] RIP: 0033:0x7f8f4482ed44
      [  463.875762] Code: 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 80 00 00 00
      [  463.875765] RSP: 002b:00007fff300a79d8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
      [  463.875769] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f8f4482ed44
      [  463.875771] RDX: 0000000000000200 RSI: 000055f771b5c000 RDI: 0000000000000001
      [  463.875774] RBP: 0000000000000200 R08: 00007f8f44af9c78 R09: 0000000000000003
      [  463.875776] R10: 000000000000089f R11: 0000000000000246 R12: 000055f771b5c000
      [  463.875779] R13: 0000000000000200 R14: 0000000000000000 R15: 000055f771b5c000
      
      This regression problem was introduced by commit e74540b2 ("ocfs2:
      protect extent tree in ocfs2_prepare_inode_for_write()").
      
      Link: http://lkml.kernel.org/r/20200121050153.13290-1-ghe@suse.com
      Fixes: e74540b2 ("ocfs2: protect extent tree in ocfs2_prepare_inode_for_write()").
      Signed-off-by: Gang He <ghe@suse.com>
      Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
      Cc: Mark Fasheh <mark@fasheh.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Junxiao Bi <junxiao.bi@oracle.com>
      Cc: Changwei Ge <gechangwei@live.cn>
      Cc: Jun Piao <piaojun@huawei.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  2. 03 Feb, 2020 8 commits
    • initramfs: do not show compression mode choice if INITRAMFS_SOURCE is empty · d4e9056d
      Masahiro Yamada authored
      Since commit ddd09bcc ("initramfs: make compression options not
      depend on INITRAMFS_SOURCE"), Kconfig asks the compression mode for
      the built-in initramfs regardless of INITRAMFS_SOURCE.
      
      It is technically simpler, but pointless from a UI perspective,
      Linus says [1].
      
      When INITRAMFS_SOURCE is empty, usr/Makefile creates a tiny default
      cpio, which is so small that nobody cares about the compression.
      
      This commit hides the Kconfig choice in that case. The default cpio
      is embedded without compression, which was the original behavior.
      
      [1]: https://lkml.org/lkml/2020/2/1/160

      Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      d4e9056d
    • Merge tag 'for-5.6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux · ad801428
      Linus Torvalds authored
      Pull more btrfs updates from David Sterba:
       "Fixes that arrived after the merge window freeze, mostly stable
        material.
      
         - fix race in tree-mod-log element tracking
      
         - fix bio flushing inside extent writepages
      
         - fix assertion when in-memory tracking of discarded extents finds an
           empty tree (eg. after adding a new device)
      
         - update logic of temporary read-only block groups to take into
           account overcommit
      
         - fix some fixup worker corner cases:
             - page could not go through proper COW cycle and the dirty status
               is lost due to page migration
             - deadlock if delayed allocation is performed under page lock
      
         - fix send emitting invalid clones within the same file
      
         - fix statfs reporting 0 free space when global block reserve size is
           larger than remaining free space but there is still space for new
           chunks"
      
      * tag 'for-5.6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
        btrfs: do not zero f_bavail if we have available space
        Btrfs: send, fix emission of invalid clone operations within the same file
        btrfs: do not do delalloc reservation under page lock
        btrfs: drop the -EBUSY case in __extent_writepage_io
        Btrfs: keep pages dirty when using btrfs_writepage_fixup_worker
        btrfs: take overcommit into account in inc_block_group_ro
        btrfs: fix force usage in inc_block_group_ro
        btrfs: Correctly handle empty trees in find_first_clear_extent_bit
        btrfs: flush write bio if we loop in extent_write_cache_pages
        Btrfs: fix race between adding and putting tree mod seq elements and nodes
      ad801428
    • Merge tag 'kgdb-5.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/danielt/linux · e17ac02b
      Linus Torvalds authored
      Pull kgdb updates from Daniel Thompson:
       "Everything for kgdb this time around is either simplifications or
        clean ups.
      
        In particular Douglas Anderson's modifications to the backtrace
        machine in the *last* dev cycle have enabled Doug to tidy up some MIPS
        specific backtrace code and stop sharing certain data structures
      across the kernel. Note that the MIPS folks were on Cc: for the MIPS
        patch and reacted positively (but without an explicit Acked-by).
      
        Doug also got rid of the implicit switching between tasks and register
      sets during some but not all of kdb's backtrace actions (because the
        implicit switching was either confusing for users, pointless or both).
      
        Finally there is a coverity fix and patch to replace open coded
        console traversal with the proper helper function"
      
      * tag 'kgdb-5.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/danielt/linux:
        kdb: Use for_each_console() helper
        kdb: remove redundant assignment to pointer bp
        kdb: Get rid of confusing diag msg from "rd" if current task has no regs
        kdb: Gid rid of implicit setting of the current task / regs
        kdb: kdb_current_task shouldn't be exported
        kdb: kdb_current_regs should be private
        MIPS: kdb: Remove old workaround for backtracing on other CPUs
      e17ac02b
    • Merge tag 'char-misc-5.6-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc · 754beeec
      Linus Torvalds authored
      Pull char/misc fix from Greg KH:
       "Here is a single patch, that fixes up a commit that came in the
        previous char/misc merge.
      
        It fixes a bug in the hpet driver that everyone keeps tripping over in
        their automated testing. Good thing is, people are catching it. Bad
        thing it wasn't caught by anyone testing before this. Oh well...
      
        This has been in linux-next for a few days with no reported issues"
      
      * tag 'char-misc-5.6-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
        char: hpet: Fix out-of-bounds read bug
      754beeec
    • Merge tag 'backlight-next-5.6' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/backlight · 2367da5b
      Linus Torvalds authored
      Pull backlight updates from Lee Jones:
       "Fix-ups:
         - Remove superfluous code in ams369fg06
         - Convert over to GPIO descriptor (gpiod) in bd6107
      
        Bug Fixes:
         - Fix unsigned comparison to less than zero in qcom-wled"
      
      * tag 'backlight-next-5.6' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/backlight:
        backlight: qcom-wled: Fix unsigned comparison to zero
        backlight: bd6107: Convert to use GPIO descriptor
        backlight: ams369fg06: Drop GPIO include
      2367da5b
    • Merge tag 'mfd-next-5.6' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd · af32f3a4
      Linus Torvalds authored
      Pull MFD updates from Lee Jones:
       "New Drivers:
         - Add support for ROHM BD71828 PMICs and GPIOs
         - Add support for Qualcomm Aqstic Audio Codecs WCD9340 and WCD9341
      
        New Device Support:
         - Add support for BD71828 to BD70528 RTC driver
         - Add support for Intel's Jasper Lake to LPSS PCI
      
        New Functionality:
         - Add support for Power Key to ROHM BD71828
         - Add support for Clocks to ROHM BD71828
         - Add support for GPIOs to Dialog DA9062
         - Add support for USB PD Notify to ChromiumOS EC
         - Allow callers to specify args when requesting regmap lookup; syscon
      
        Fix-ups:
         - Improve error handling and sanity checking; atmel-hlcdc, dln2
         - Device Tree support/documentation; bd71828, da9062, xylon,logicvc,
           ab8500, max14577, atmel-usart
         - Match devices using platform IDs; bd7xxxx
         - Refactor BD718x7 regulator component; bd718x7-regulator
         - Use standard interfaces/helpers; syscon, sm501
         - Trivial (whitespace, spelling, etc); ab8500-core, Kconfig
         - Remove unused code; db8500-prcmu, tqmx86
         - Wait until boot has finished before accessing registers;
           madera-core
         - Provide missing register value defaults; cs47l15-tables
         - Allow more time for hardware to reset; madera-core
      
        Bug Fixes:
         - Fix erroneous register values; rohm-bd70528
         - Fix register volatility; axp20x, rn5t618
         - Fix Kconfig dependencies; MFD_MAX77650
         - Fix incorrect compatible string; da9062-core
         - Fix syscon_regmap_lookup_by_phandle_args() stub; syscon"
      
      * tag 'mfd-next-5.6' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd: (41 commits)
        mfd: syscon: Fix syscon_regmap_lookup_by_phandle_args() dummy
        mfd: wcd934x: Add support to wcd9340/wcd9341 codec
        mfd: syscon: Add arguments support for syscon reference
        mfd: rn5t618: Mark ADC control register volatile
        dt-bindings: atmel-usart: Add microchip,sam9x60-{usart, dbgu}
        dt-bindings: atmel-usart: Remove wildcard
        mfd: cros_ec: Add cros-usbpd-notify subdevice
        mfd: da9062: Fix watchdog compatible string
        mfd: madera: Allow more time for hardware reset
        mfd: cs47l15: Add missing register default
        mfd: madera: Wait for boot done before accessing any other registers
        mfd: Kconfig: Rename Samsung to lowercase
        mfd: tqmx86: remove set but not used variable 'i2c_ien'
        mfd: dbx500-prcmu: Drop DSI pll clock functions
        mfd: dbx500-prcmu: Drop set_display_clocks()
        mfd: max77650: Select REGMAP_IRQ in Kconfig
        mfd: axp20x: Mark AXP20X_VBUS_IPSOUT_MGMT as volatile
        mfd: ab8500: Fix ab8500-clk typo
        mfd: intel-lpss: Add Intel Jasper Lake PCI IDs
        dt-bindings: mfd: max14577: Add reference to max14040_battery.txt descriptions
        ...
      af32f3a4
    • Merge tag 'hyperv-next-signed' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux · d0fa9250
      Linus Torvalds authored
      Pull Hyper-V updates from Sasha Levin:
      
       - Most of the commits here are work to enable host-initiated
         hibernation support by Dexuan Cui.
      
       - Fix for a warning shown when host sends non-aligned balloon requests
         by Tianyu Lan.
      
      * tag 'hyperv-next-signed' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux:
        hv_utils: Add the support of hibernation
        hv_utils: Support host-initiated hibernation request
        hv_utils: Support host-initiated restart request
        Tools: hv: Reopen the devices if read() or write() returns errors
        video: hyperv: hyperv_fb: Use physical memory for fb on HyperV Gen 1 VMs.
        Drivers: hv: vmbus: Ignore CHANNELMSG_TL_CONNECT_RESULT(23)
        video: hyperv_fb: Fix hibernation for the deferred IO feature
        Input: hyperv-keyboard: Add the support of hibernation
        hv_balloon: Balloon up according to request page number
      d0fa9250
    • mfd: syscon: Fix syscon_regmap_lookup_by_phandle_args() dummy · 5312f321
      Geert Uytterhoeven authored
      If CONFIG_MFD_SYSCON=n:
      
          include/linux/mfd/syscon.h:54:23: warning: ‘syscon_regmap_lookup_by_phandle_args’ defined but not used [-Wunused-function]
      
      Fix this by adding the missing inline keyword.
      
      Fixes: 6a24f567 ("mfd: syscon: Add arguments support for syscon reference")
      Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
      Signed-off-by: Lee Jones <lee.jones@linaro.org>
      5312f321
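The effect of the missing inline keyword can be sketched in plain C. This is a minimal illustration, not the kernel header: `demo_regmap_lookup` and its parameters are hypothetical names, and userspace `ENOTSUP` stands in for whatever error the real stub returns. It assumes GCC, which does not apply -Wunused-function to functions declared `static inline`.

```c
#include <errno.h>
#include <stddef.h>

struct regmap;	/* opaque here, only used through pointers */

/*
 * Hypothetical stand-in for a header stub that is compiled when a feature
 * is configured out. With "static inline", every file can include the
 * header without a -Wunused-function warning when the stub is never
 * called. A plain "static" definition would warn in each such file.
 */
static inline int demo_regmap_lookup(const char *property, struct regmap **out)
{
	(void)property;		/* unused in the stub */
	*out = NULL;
	return -ENOTSUP;	/* feature not built in */
}
```

A caller sees a consistent "not supported" result whether or not it uses the stub, and files that never call it still build without the warning.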
  3. 02 Feb, 2020 5 commits
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc · 46d6b7be
      Linus Torvalds authored
      Pull sparc fix from David Miller:
       "adjtimex regression fix from Arnd"
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc:
        sparc64: fix adjtimex regression
      46d6b7be
    • Merge tag 'leds-5.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/pavel/linux-leds · 545ae665
      Linus Torvalds authored
      Pull LED updates from Pavel Machek:
      
       - New driver for TI TPS6105X
      
       - Add managed API to get a LED from a device driver
      
       - Misc fixes and updates
      
      * tag 'leds-5.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/pavel/linux-leds: (22 commits)
        leds: lm3692x: Disable chip on brightness 0
        leds: lm3692x: Split out lm3692x_leds_disable
        leds: lm3692x: Move lm3692x_init and rename to lm3692x_leds_enable
        leds: lm3692x: Make sure we don't exceed the maximum LED current
        dt: bindings: lm3692x: Add led-max-microamp property
        leds: lm3692x: Allow to configure over voltage protection
        dt: bindings: lm3692x: Add ti,ovp-microvolt property
        leds: populate the device's of_node
        leds: Add managed API to get a LED from a device driver
        leds: Add of_led_get() and led_put()
        leds: lm3532: add pointer to documentation and fix typo
        leds: lm3532: use extended registration so that LED can be used for backlight
        leds: lm3642: remove warnings for bad strtol, cleanup gotos
        leds: rb532: cleanup whitespace
        ledtrig-pattern: fix email address quoting in MODULE_AUTHOR()
        dt-bindings: mfd: update TI tps6105x chip bindings
        leds: tps6105x: add driver for MFD chip LED mode
        led: max77650: add of_match table
        leds: bd2802: Convert to use GPIO descriptors
        leds: pca963x: Fix open-drain initialization
        ...
      545ae665
    • Merge branch 'pcmcia-next' of git://git.kernel.org/pub/scm/linux/kernel/git/brodo/linux · 15f8e733
      Linus Torvalds authored
      Pull pcmcia updates from Dominik Brodowski:
       "This is a series co-developed by Simon Geis and Lukas Panzer to clean
        up the i82092 PCMCIA device driver"
      
      * 'pcmcia-next' of git://git.kernel.org/pub/scm/linux/kernel/git/brodo/linux:
        PCMCIA/i82092: remove #if 0 block
        PCMCIA/i82092: delete enter/leave macro
        PCMCIA/i82092: include <linux/io.h> instead of <asm/io.h>
        PCMCIA/i82092: shorten the lines with over 80 characters
        PCMCIA/i82092: move assignment out of if condition
        PCMCIA/i82092: change code indentation
        PCMCIA/i82092: insert blank line after declarations
        PCMCIA/i82092: remove braces around single statement blocks
        PCMCIA/i82092: add/remove spaces to improve readability
        PCMCIA/i82092: use dev_<level> instead of printk
      15f8e733
    • btrfs: do not zero f_bavail if we have available space · d55966c4
      Josef Bacik authored
      There was some logic added a while ago to clear out f_bavail in statfs()
      if we did not have enough free metadata space to satisfy our global
      reserve.  This was incorrect at the time, however didn't really pose a
      problem for normal file systems because we would often allocate chunks
      if we got this low on free metadata space, and thus wouldn't really hit
      this case unless we were actually full.
      
      Fast forward to today and now we are much better about not allocating
      metadata chunks all of the time.  Couple this with d792b0f1 ("btrfs:
      always reserve our entire size for the global reserve") which now means
      we'll easily have a larger global reserve than our free space, we are
      now more likely to trip over this while still having plenty of space.
      
      Fix this by skipping this logic if the global rsv's space_info is not
      full.  space_info->full is 0 unless we've attempted to allocate a chunk
      for that space_info and that has failed.  If this happens then the space
      for the global reserve is definitely sacred and we need to report
      b_avail == 0, but before then we can just use our calculated b_avail.

      Reported-by: Martin Steigerwald <martin@lichtvoll.de>
      Fixes: ca8a51b3 ("btrfs: statfs: report zero available if metadata are exhausted")
      CC: stable@vger.kernel.org # 4.5+
      Reviewed-by: Qu Wenruo <wqu@suse.com>
      Tested-By: Martin Steigerwald <martin@lichtvoll.de>
      Signed-off-by: Josef Bacik <josef@toxicpanda.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      d55966c4
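The decision described in this commit message can be modelled as a small pure function. This is a simplified sketch, not the btrfs code: the function and parameter names are hypothetical, and it ignores any other thresholds the real statfs() path may apply.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Simplified model of the available-space decision; all names are
 * hypothetical and only mirror the wording of the commit message.
 */
static uint64_t demo_reported_bavail(uint64_t calculated_bavail,
				     uint64_t free_metadata,
				     uint64_t global_reserve_size,
				     bool metadata_space_full)
{
	/*
	 * Report zero only once a chunk allocation has already failed
	 * (space_info is "full") and the remaining metadata space cannot
	 * cover the global reserve.
	 */
	if (metadata_space_full && free_metadata < global_reserve_size)
		return 0;

	/* Otherwise new chunks can still be allocated, so use the estimate. */
	return calculated_bavail;
}
```

In this model a large global reserve on its own no longer forces a report of zero; the "full" flag has to be set as well.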
    • sparc64: fix adjtimex regression · 11648b83
      Arnd Bergmann authored
      Anatoly Pugachev reported that one of the y2038 patches introduced
      a fatal bug through a stupid typo:
      
      [   96.384129] watchdog: BUG: soft lockup - CPU#8 stuck for 22s!
      ...
      [   96.385624]  [0000000000652ca4] handle_mm_fault+0x84/0x320
      [   96.385668]  [0000000000b6f2bc] do_sparc64_fault+0x43c/0x820
      [   96.385720]  [0000000000407754] sparc64_realfault_common+0x10/0x20
      [   96.385769]  [000000000042fa28] __do_sys_sparc_clock_adjtime+0x28/0x80
      [   96.385819]  [00000000004307f0] sys_sparc_clock_adjtime+0x10/0x20
      [   96.385866]  [0000000000406294] linux_sparc_syscall+0x34/0x44
      
      Fix the code to dereference the correct pointer again.

      Reported-by: Anatoly Pugachev <matorola@gmail.com>
      Tested-by: Anatoly Pugachev <matorola@gmail.com>
      Fixes: 251ec1c1 ("y2038: sparc: remove use of struct timex")
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      11648b83
  4. 01 Feb, 2020 5 commits
    • Merge tag '5.6-rc-small-smb3-fix-for-stable' of git://git.samba.org/sfrench/cifs-2.6 · 94f2630b
      Linus Torvalds authored
      Pull cifs fix from Steve French:
       "Small SMB3 fix for stable (fixes problem with soft mounts)"
      
      * tag '5.6-rc-small-smb3-fix-for-stable' of git://git.samba.org/sfrench/cifs-2.6:
        cifs: update internal module version number
        cifs: fix soft mounts hanging in the reconnect code
      94f2630b
    • vfs: fix do_last() regression · 6404674a
      Al Viro authored
      Brown paperbag time: fetching ->i_uid/->i_mode really should've been
      done from nd->inode.  I even suggested that, but the reason for that has
      slipped through the cracks and I went for dir->d_inode instead - made
      for more "obvious" patch.
      
      Analysis:
      
       - at the entry into do_last() and all the way to step_into(): dir (aka
         nd->path.dentry) is known not to have been freed; so's nd->inode and
         it's equal to dir->d_inode unless we are already doomed to -ECHILD.
         inode of the file to get opened is not known.
      
       - after step_into(): inode of the file to get opened is known; dir
         might be pointing to freed memory/be negative/etc.
      
       - at the call of may_create_in_sticky(): guaranteed to be out of RCU
         mode; inode of the file to get opened is known and pinned; dir might
         be garbage.
      
      The last was the reason for the original patch.  Except that at the
      do_last() entry we can be in RCU mode and it is possible that
      nd->path.dentry->d_inode has already changed under us.
      
      In that case we are going to fail with -ECHILD, but we need to be
      careful; nd->inode is pointing to valid struct inode and it's the same
      as nd->path.dentry->d_inode in "won't fail with -ECHILD" case, so we
      should use that.

      Reported-by: "Rantala, Tommi T. (Nokia - FI/Espoo)" <tommi.t.rantala@nokia.com>
      Reported-by: syzbot+190005201ced78a74ad6@syzkaller.appspotmail.com
      Wearing-brown-paperbag: Al Viro <viro@zeniv.linux.org.uk>
      Cc: stable@kernel.org
      Fixes: d0cb5018 ("do_last(): fetch directory ->i_mode and ->i_uid before it's too late")
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      6404674a
    • Merge tag 'kconfig-v5.6' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild · 14cd0bd0
      Linus Torvalds authored
      Pull Kconfig updates from Masahiro Yamada:
      
       - add 'yes2modconfig' and 'mod2yesconfig' targets (useful mainly for
         turning syzbot configs into more modular ones as a step to minimizing
         the result)
      
       - sanitize help text
      
       - various code cleanups
      
      * tag 'kconfig-v5.6' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
        kconfig: fix documentation typos
        kconfig: fix an "implicit declaration of function" warning
        kconfig: fix nesting of symbol help text
        kconfig: distinguish between dependencies and visibility in help text
        kconfig: list all definitions of a symbol in help text
        kconfig: Add yes2modconfig and mod2yesconfig targets.
        kconfig: use $(PERL) in Makefile
        kconfig: fix too deep indentation in Makefile
        kconfig: localmodconfig: fix indentation for closing brace
        kconfig: localmodconfig: remove unused $config
        kconfig: squash prop_alloc() into menu_add_prop()
        kconfig: remove sym from struct property
        kconfig: remove 'prompt' argument from menu_add_prop()
        kconfig: move prompt handling to menu_add_prompt() from menu_add_prop()
        kconfig: remove 'prompt' symbol
        kconfig: drop T_WORD from the RHS of 'prompt' symbol
        kconfig: use parent->dep as the parentdep of 'menu'
        kconfig: remove the rootmenu check in menu_add_prop()
      14cd0bd0
    • Merge tag 'kbuild-v5.6' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild · 368d060b
      Linus Torvalds authored
      Pull Kbuild updates from Masahiro Yamada:
      
       - detect missing include guard in UAPI headers
      
       - do not create orphan built-in.a or obj-y objects
      
       - generate modules.builtin more simply, and drop tristate.conf
      
       - simplify built-in initramfs creation
      
       - make linux-headers deb package thinner
      
       - optimize the deb package build script
      
       - misc cleanups
      
      * tag 'kbuild-v5.6' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (34 commits)
        builddeb: split libc headers deployment out into a function
        builddeb: split kernel headers deployment out into a function
        builddeb: remove redundant make for ARCH=um
        builddeb: avoid invoking sub-shells where possible
        builddeb: remove redundant $objtree/
        builddeb: match temporary directory name to the package name
        builddeb: remove unneeded files in hdrobjfiles for headers package
        kbuild: use -S instead of -E for precise cc-option test in Kconfig
        builddeb: allow selection of .deb compressor
        kbuild: remove 'Building modules, stage 2.' log
        kbuild: remove *.tmp file when filechk fails
        kbuild: remove PYTHON2 variable
        modpost: assume STT_SPARC_REGISTER is defined
        gen_initramfs.sh: remove intermediate cpio_list on errors
        initramfs: refactor the initramfs build rules
        gen_initramfs.sh: always output cpio even without -o option
        initramfs: add default_cpio_list, and delete -d option support
        initramfs: generate dependency list and cpio at the same time
        initramfs: specify $(src)/gen_initramfs.sh as a prerequisite in Makefile
        initramfs: make initramfs compression choice non-optional
        ...
      368d060b
    • Merge tag 'random_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/random · acd77500
      Linus Torvalds authored
      Pull random changes from Ted Ts'o:
       "Change /dev/random so that it uses the CRNG and blocks only if the
        CRNG hasn't initialized, instead of using the old blocking pool. Also clean
        up archrandom.h, and some other miscellaneous cleanups"
      
      * tag 'random_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/random: (24 commits)
        s390x: Mark archrandom.h functions __must_check
        powerpc: Mark archrandom.h functions __must_check
        powerpc: Use bool in archrandom.h
        x86: Mark archrandom.h functions __must_check
        linux/random.h: Mark CONFIG_ARCH_RANDOM functions __must_check
        linux/random.h: Use false with bool
        linux/random.h: Remove arch_has_random, arch_has_random_seed
        s390: Remove arch_has_random, arch_has_random_seed
        powerpc: Remove arch_has_random, arch_has_random_seed
        x86: Remove arch_has_random, arch_has_random_seed
        random: remove some dead code of poolinfo
        random: fix typo in add_timer_randomness()
        random: Add and use pr_fmt()
        random: convert to ENTROPY_BITS for better code readability
        random: remove unnecessary unlikely()
        random: remove kernel.random.read_wakeup_threshold
        random: delete code to pull data into pools
        random: remove the blocking pool
        random: make /dev/random be almost like /dev/urandom
        random: ignore GRND_RANDOM in getentropy(2)
        ...
      acd77500
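From userspace, the "block only until the CRNG is initialized" behaviour described above is what getrandom(2) with flags set to 0 provides. The helper below is a minimal sketch under stated assumptions: `demo_fill_random` is a hypothetical name, and the code assumes a Linux system whose C library ships `<sys/random.h>` (glibc 2.25 or later).

```c
#include <errno.h>
#include <stddef.h>
#include <sys/random.h>
#include <sys/types.h>

/* Fill buf with len random bytes; returns 0 on success, -1 on error. */
static int demo_fill_random(unsigned char *buf, size_t len)
{
	size_t done = 0;

	while (done < len) {
		/* flags == 0: may block, but only until the CRNG is initialized. */
		ssize_t n = getrandom(buf + done, len - done, 0);

		if (n < 0) {
			if (errno == EINTR)
				continue;	/* interrupted by a signal, retry */
			return -1;
		}
		done += (size_t)n;
	}
	return 0;
}
```

The loop handles short reads and signal interruptions, so callers get either a fully filled buffer or an error.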
  5. 31 Jan, 2020 6 commits
    • Merge tag 'pci-v5.6-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci · 26dca6db
      Linus Torvalds authored
      Pull PCI updates from Bjorn Helgaas:
      
       "Resource management:
      
         - Improve resource assignment for hot-added nested bridges, e.g.,
           Thunderbolt (Nicholas Johnson)
      
        Power management:
      
         - Optionally print config space of devices before suspend (Chen Yu)
      
         - Increase D3 delay for AMD Ryzen5/7 XHCI controllers (Daniel Drake)
      
        Virtualization:
      
         - Generalize DMA alias quirks (James Sewart)
      
         - Add DMA alias quirk for PLX PEX NTB (James Sewart)
      
         - Fix IOV memory leak (Navid Emamdoost)
      
        AER:
      
         - Log which device prevents error recovery (Yicong Yang)
      
        Peer-to-peer DMA:
      
         - Whitelist Intel SkyLake-E (Armen Baloyan)
      
        Broadcom iProc host bridge driver:
      
         - Apply PAXC quirk whether driver is built-in or module (Wei Liu)
      
        Broadcom STB host bridge driver:
      
         - Add Broadcom STB PCIe host controller driver (Jim Quinlan)
      
        Intel Gateway SoC host bridge driver:
      
         - Add driver for Intel Gateway SoC (Dilip Kota)
      
        Intel VMD host bridge driver:
      
         - Add support for DMA aliases on other buses (Jon Derrick)
      
         - Remove dma_map_ops overrides (Jon Derrick)
      
         - Remove now-unused X86_DEV_DMA_OPS (Christoph Hellwig)
      
        NVIDIA Tegra host bridge driver:
      
         - Fix Tegra30 afi_pex2_ctrl register offset (Marcel Ziswiler)
      
        Panasonic UniPhier host bridge driver:
      
         - Remove module code since driver can't be built as a module
           (Masahiro Yamada)
      
        Qualcomm host bridge driver:
      
         - Add support for SDM845 PCIe controller (Bjorn Andersson)
      
        TI Keystone host bridge driver:
      
         - Fix "num-viewport" DT property error handling (Kishon Vijay Abraham I)
      
         - Fix link training retries initiation (Yurii Monakov)
      
         - Fix outbound region mapping (Yurii Monakov)
      
        Misc:
      
         - Add Switchtec Gen4 support (Kelvin Cao)
      
         - Add Switchtec Intercomm Notify and Upstream Error Containment
           support (Logan Gunthorpe)
      
         - Use dma_set_mask_and_coherent() since Switchtec supports 64-bit
           addressing (Wesley Sheng)"
      
      * tag 'pci-v5.6-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (60 commits)
        PCI: Allow adjust_bridge_window() to shrink resource if necessary
        PCI: Set resource size directly in adjust_bridge_window()
        PCI: Rename extend_bridge_window() to adjust_bridge_window()
        PCI: Rename extend_bridge_window() parameter
        PCI: Consider alignment of hot-added bridges when assigning resources
        PCI: Remove local variable usage in pci_bus_distribute_available_resources()
        PCI: Pass size + alignment to pci_bus_distribute_available_resources()
        PCI: Rename variables
        PCI: vmd: Add two VMD Device IDs
        PCI: Remove unnecessary braces
        PCI: brcmstb: Add MSI support
        PCI: brcmstb: Add Broadcom STB PCIe host controller driver
        x86/PCI: Remove X86_DEV_DMA_OPS
        PCI: vmd: Remove dma_map_ops overrides
        iommu/vt-d: Remove VMD child device sanity check
        iommu/vt-d: Use pci_real_dma_dev() for mapping
        PCI: Introduce pci_real_dma_dev()
        x86/PCI: Expose VMD's pci_dev in struct pci_sysdata
        x86/PCI: Add to_pci_sysdata() helper
        PCI/AER: Initialize aer_fifo
        ...
      26dca6db
    • Merge tag 'media/v5.6-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media · 846de71b
      Linus Torvalds authored
      Pull media updates from Mauro Carvalho Chehab:
      
       - New staging driver for Rockchip ISPv1 unit
      
       - New staging driver for Rockchip MIPI Synopsys DPHY RX0
      
       - y2038 fixes at V4L2 API (backward-compatible)
      
       - A dvb core fix when receiving invalid EIT sections
      
       - Some clang-specific warnings got fixed
      
       - Added support for touch V4L2 interface at vivid
      
       - Several drivers were converted to use the new
         i2c_new_scanned_device() kAPI
      
       - Added sm1 support at meson's vdec driver
      
       - Several other driver cleanups, fixes and improvements
      
      * tag 'media/v5.6-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media: (207 commits)
        media: staging/intel-ipu3: remove TODO item about acronyms
        media: v4l2-fwnode: Print the node name while parsing endpoints
        media: Revert "media: staging/intel-ipu3: make imgu use fixed running mode"
        media: mt9v111: constify copied structure
        media: platform: VIDEO_MEDIATEK_JPEG can also depend on MTK_IOMMU
        media: uvcvideo: Add a quirk to force GEO GC6500 Camera bits-per-pixel value
        media: uvcvideo: Avoid cyclic entity chains due to malformed USB descriptors
        media: hantro: fix post-processing NULL pointer dereference
        media: rcar-vin: Use correct pixel format when aligning format
        media: MAINTAINERS: add entry for Rockchip ISP1 driver
        media: staging: rkisp1: add TODO file for staging
        media: staging: rkisp1: add document for rkisp1 meta buffer format
        media: staging: rkisp1: add output device for parameters
        media: staging: rkisp1: add capture device for statistics
        media: staging: rkisp1: add user space ABI definitions
        media: staging: rkisp1: add streaming paths
        media: staging: rkisp1: add Rockchip ISP1 base driver
        media: staging: phy-rockchip-dphy-rx0: add Rockchip MIPI Synopsys DPHY RX0 driver
        media: staging: dt-bindings: add Rockchip MIPI RX D-PHY RX0 yaml bindings
        media: staging: dt-bindings: add Rockchip ISP1 yaml bindings
        ...
      846de71b
    • Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma · 8fdd4019
      Linus Torvalds authored
      Pull rdma updates from Jason Gunthorpe:
       "A very quiet cycle with few notable changes. Mostly the usual list of
        one or two patches to drivers changing something that isn't quite rc
        worthy. The subsystem seems to be seeing a larger number of rework and
        cleanup style patches right now, I feel that several vendors are
        prepping their drivers for new silicon.
      
        Summary:
      
         - Driver updates and cleanup for qedr, bnxt_re, hns, siw, mlx5, mlx4,
           rxe, i40iw
      
         - Larger series doing cleanup and rework for hns and hfi1.
      
         - Some general reworking of the CM code to make it a little more
           understandable
      
         - Unify the different code paths connected to the uverbs FD scheme
      
         - New UAPI ioctls conversions for get context and get async fd
      
         - Trace points for CQ and CM portions of the RDMA stack
      
         - mlx5 driver support for virtio-net formatted rings as RDMA raw
           ethernet QPs
      
         - verbs support for setting the PCI-E relaxed ordering bit on DMA
           traffic connected to a MR
      
         - A couple of bug fixes that came too late to make rc7"
      
      * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (108 commits)
        RDMA/core: Make the entire API tree static
        RDMA/efa: Mask access flags with the correct optional range
        RDMA/cma: Fix unbalanced cm_id reference count during address resolve
        RDMA/umem: Fix ib_umem_find_best_pgsz()
        IB/mlx4: Fix leak in id_map_find_del
        IB/opa_vnic: Spelling correction of 'erorr' to 'error'
        IB/hfi1: Fix logical condition in msix_request_irq
        RDMA/cm: Remove CM message structs
        RDMA/cm: Use IBA functions for complex structure members
        RDMA/cm: Use IBA functions for simple structure members
        RDMA/cm: Use IBA functions for swapping get/set acessors
        RDMA/cm: Use IBA functions for simple get/set acessors
        RDMA/cm: Add SET/GET implementations to hide IBA wire format
        RDMA/cm: Add accessors for CM_REQ transport_type
        IB/mlx5: Return the administrative GUID if exists
        RDMA/core: Ensure that rdma_user_mmap_entry_remove() is a fence
        IB/mlx4: Fix memory leak in add_gid error flow
        IB/mlx5: Expose RoCE accelerator counters
        RDMA/mlx5: Set relaxed ordering when requested
        RDMA/core: Add the core support field to METHOD_GET_CONTEXT
        ...
    • Merge tag 'thermal-v5.6-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/thermal/linux · 68b62e5d
      Linus Torvalds authored
      Pull thermal fixes from Daniel Lezcano:
      
       - Fix a severe docs build failure for cpu idle cooling device (Randy
         Dunlap)
      
       - Fix a spelling mistake in the error message for the stm32 (Colin Ian
         King)
      
      * tag 'thermal-v5.6-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/thermal/linux:
        thermal: stm32: fix spelling mistake "preprare" -> "prepare"
        Documentation: cpu-idle-cooling: fix a SEVERE docs build failure
    • Merge tag 'acpi-5.6-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm · ffda81b6
      Linus Torvalds authored
      Pull more ACPI updates from Rafael Wysocki:
       "Fix up MAINTAINERS entries related to ACPI (Andy Shevchenko)"
      
      * tag 'acpi-5.6-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
        MAINTAINERS: Sort entries in database for X-POWERS AXP288
        MAINTAINERS: Sort entries in database for ACPICA
        MAINTAINERS: Sort entries in database for ACPI
    • Merge tag 'pm-5.6-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm · cf3c8f84
      Linus Torvalds authored
      Pull more power management updates from Rafael Wysocki:
       "Prevent cpufreq from creating excessively large stack frames and fix
        the handling of devices deleted during system-wide resume in the PM
        core (Rafael Wysocki), revert a problematic commit affecting the
        cpupower utility and correct its man page (Thomas Renninger,
        Brahadambal Srinivasan), and improve the intel_pstate_tracer utility
        (Doug Smythies)"
      
      * tag 'pm-5.6-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
        tools/power/x86/intel_pstate_tracer: change several graphs to autoscale y-axis
        tools/power/x86/intel_pstate_tracer: changes for python 3 compatibility
        Correction to manpage of cpupower
        cpufreq: Avoid creating excessively large stack frames
        PM: core: Fix handling of devices deleted during system-wide resume
        cpupower: Revert library ABI changes from commit ae291709