1. 11 Sep, 2022 3 commits
    • Dan Williams's avatar
      xfs: quiet notify_failure EOPNOTSUPP cases · b14d067e
      Dan Williams authored
      Patch series "mm, xfs, dax: Fixes for memory_failure() handling".
      
      I failed to run the memory error injection section of the ndctl test suite
      on linux-next prior to the merge window and as a result some bugs were
      missed.  While the new enabling targeted reflink enabled XFS filesystems
      the bugs cropped up in the surrounding cases of DAX error injection on
      ext4-fsdax and device-dax.
      
      One new assumption / clarification in this set is the notion that if a
      filesystem's ->notify_failure() handler returns -EOPNOTSUPP, then it must
      be the case that the fsdax usage of page->index and page->mapping are
      valid.  I am fairly certain this is true for xfs_dax_notify_failure(), but
      would appreciate another set of eyes.
      
      
      This patch (of 4):
      
      XFS always registers dax_holder_operations regardless of whether the
      filesystem is capable of handling the notifications.  The expectation is
      that if the notify_failure handler cannot run then there are no scenarios
      where it needs to run.  In other words the expected semantic is that
      page->index and page->mapping are valid for memory_failure() when the
      conditions that cause -EOPNOTSUPP in xfs_dax_notify_failure() are present.
      
      A fallback to the generic memory_failure() path is expected so do not warn
      when that happens.
      
      Link: https://lkml.kernel.org/r/166153426798.2758201.15108211981034512993.stgit@dwillia2-xfh.jf.intel.com
      Link: https://lkml.kernel.org/r/166153427440.2758201.6709480562966161512.stgit@dwillia2-xfh.jf.intel.com
      Fixes: 6f643c57 ("xfs: implement ->notify_failure() for XFS")
      Signed-off-by: default avatarDan Williams <dan.j.williams@intel.com>
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      Cc: Shiyang Ruan <ruansy.fnst@fujitsu.com>
      Cc: Darrick J. Wong <djwong@kernel.org>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Goldwyn Rodrigues <rgoldwyn@suse.de>
      Cc: Jane Chu <jane.chu@oracle.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Miaohe Lin <linmiaohe@huawei.com>
      Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
      Cc: Ritesh Harjani <riteshh@linux.ibm.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      b14d067e
    • Mel Gorman's avatar
      mm/page_alloc: fix race condition between build_all_zonelists and page allocation · 3d36424b
      Mel Gorman authored
      Patrick Daly reported the following problem;
      
      	NODE_DATA(nid)->node_zonelists[ZONELIST_FALLBACK] - before offline operation
      	[0] - ZONE_MOVABLE
      	[1] - ZONE_NORMAL
      	[2] - NULL
      
      	For a GFP_KERNEL allocation, alloc_pages_slowpath() will save the
      	offset of ZONE_NORMAL in ac->preferred_zoneref. If a concurrent
      	memory_offline operation removes the last page from ZONE_MOVABLE,
      	build_all_zonelists() & build_zonerefs_node() will update
      	node_zonelists as shown below. Only populated zones are added.
      
      	NODE_DATA(nid)->node_zonelists[ZONELIST_FALLBACK] - after offline operation
      	[0] - ZONE_NORMAL
      	[1] - NULL
      	[2] - NULL
      
      The race is simple -- page allocation could be in progress when a memory
      hot-remove operation triggers a zonelist rebuild that removes zones.  The
      allocation request will still have a valid ac->preferred_zoneref that is
      now pointing to NULL and triggers an OOM kill.
      
      This problem probably always existed but may be slightly easier to trigger
      due to 6aa303de ("mm, vmscan: only allocate and reclaim from zones
      with pages managed by the buddy allocator") which distinguishes between
      zones that are completely unpopulated versus zones that have valid pages
      not managed by the buddy allocator (e.g.  reserved, memblock, ballooning
      etc).  Memory hotplug had multiple stages with timing considerations
      around managed/present page updates, the zonelist rebuild and the zone
      span updates.  As David Hildenbrand puts it
      
      	memory offlining adjusts managed+present pages of the zone
      	essentially in one go. If after the adjustments, the zone is no
      	longer populated (present==0), we rebuild the zone lists.
      
      	Once that's done, we try shrinking the zone (start+spanned
      	pages) -- which results in zone_start_pfn == 0 if there are no
      	more pages. That happens *after* rebuilding the zonelists via
      	remove_pfn_range_from_zone().
      
      The only requirement to fix the race is that a page allocation request
      identifies when a zonelist rebuild has happened since the allocation
      request started and no page has yet been allocated.  Use a seqlock_t to
      track zonelist updates with a lockless read-side of the zonelist and
      protecting the rebuild and update of the counter with a spinlock.
      
      [akpm@linux-foundation.org: make zonelist_update_seq static]
      Link: https://lkml.kernel.org/r/20220824110900.vh674ltxmzb3proq@techsingularity.net
      Fixes: 6aa303de ("mm, vmscan: only allocate and reclaim from zones with pages managed by the buddy allocator")
      Signed-off-by: default avatarMel Gorman <mgorman@techsingularity.net>
      Reported-by: default avatarPatrick Daly <quic_pdaly@quicinc.com>
      Acked-by: default avatarMichal Hocko <mhocko@suse.com>
      Reviewed-by: default avatarDavid Hildenbrand <david@redhat.com>
      Cc: <stable@vger.kernel.org>	[4.9+]
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      3d36424b
    • ChenXiaoSong's avatar
      ntfs: fix BUG_ON in ntfs_lookup_inode_by_name() · 1b513f61
      ChenXiaoSong authored
      Syzkaller reported BUG_ON as follows:
      
      ------------[ cut here ]------------
      kernel BUG at fs/ntfs/dir.c:86!
      invalid opcode: 0000 [#1] PREEMPT SMP KASAN PTI
      CPU: 3 PID: 758 Comm: a.out Not tainted 5.19.0-next-20220808 #5
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
      RIP: 0010:ntfs_lookup_inode_by_name+0xd11/0x2d10
      Code: ff e9 b9 01 00 00 e8 1e fe d6 fe 48 8b 7d 98 49 8d 5d 07 e8 91 85 29 ff 48 c7 45 98 00 00 00 00 e9 5a fb ff ff e8 ff fd d6 fe <0f> 0b e8 f8 fd d6 fe 0f 0b e8 f1 fd d6 fe 48 8b b5 50 ff ff ff 4c
      RSP: 0018:ffff888079607978 EFLAGS: 00010293
      RAX: 0000000000000000 RBX: 0000000000008000 RCX: 0000000000000000
      RDX: ffff88807cf10000 RSI: ffffffff82a4a081 RDI: 0000000000000003
      RBP: ffff888079607a70 R08: 0000000000000001 R09: ffff88807a6d01d7
      R10: ffffed100f4da03a R11: 0000000000000000 R12: ffff88800f0fb110
      R13: ffff88800f0ee000 R14: ffff88800f0fb000 R15: 0000000000000001
      FS:  00007f33b63c7540(0000) GS:ffff888108580000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00007f33b635c090 CR3: 000000000f39e005 CR4: 0000000000770ee0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      PKRU: 55555554
      Call Trace:
       <TASK>
       load_system_files+0x1f7f/0x3620
       ntfs_fill_super+0xa01/0x1be0
       mount_bdev+0x36a/0x440
       ntfs_mount+0x3a/0x50
       legacy_get_tree+0xfb/0x210
       vfs_get_tree+0x8f/0x2f0
       do_new_mount+0x30a/0x760
       path_mount+0x4de/0x1880
       __x64_sys_mount+0x2b3/0x340
       do_syscall_64+0x38/0x90
       entry_SYSCALL_64_after_hwframe+0x63/0xcd
      RIP: 0033:0x7f33b62ff9ea
      Code: 48 8b 0d a9 f4 0b 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 a5 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 76 f4 0b 00 f7 d8 64 89 01 48
      RSP: 002b:00007ffd0c471aa8 EFLAGS: 00000202 ORIG_RAX: 00000000000000a5
      RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f33b62ff9ea
      RDX: 0000000020000000 RSI: 0000000020000100 RDI: 00007ffd0c471be0
      RBP: 00007ffd0c471c60 R08: 00007ffd0c471ae0 R09: 00007ffd0c471c24
      R10: 0000000000000000 R11: 0000000000000202 R12: 000055bac5afc160
      R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
       </TASK>
      Modules linked in:
      ---[ end trace 0000000000000000 ]---
      
      Fix this by adding sanity check on extended system files' directory inode
      to ensure that it is directory, just like ntfs_extend_init() when mounting
      ntfs3.
      
      Link: https://lkml.kernel.org/r/20220809064730.2316892-1-chenxiaosong2@huawei.comSigned-off-by: default avatarChenXiaoSong <chenxiaosong2@huawei.com>
      Cc: Anton Altaparmakov <anton@tuxera.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      1b513f61
  2. 28 Aug, 2022 25 commits
  3. 27 Aug, 2022 12 commits
    • Linus Torvalds's avatar
      Merge tag 'thermal-6.0-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm · 10d4879f
      Linus Torvalds authored
      Pull thermal control fixes from Rafael Wysocki:
       "Fix two issues introduced recently and one driver problem leading to a
        NULL pointer dereference in some cases.
      
        Specifics:
      
         - Add missing EXPORT_SYMBOL_GPL in the thermal core and add back the
           required 'trips' property to the thermal zone DT bindings (Daniel
           Lezcano)
      
         - Prevent the int340x_thermal driver from crashing when a package
           with a buffer of 0 length is returned by an ACPI control method
           evaluated by it (Lee, Chun-Yi)"
      
      * tag 'thermal-6.0-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
        thermal/int340x_thermal: handle data_vault when the value is ZERO_SIZE_PTR
        dt-bindings: thermal: Fix missing required property
        thermal/core: Add missing EXPORT_SYMBOL_GPL
      10d4879f
    • Linus Torvalds's avatar
      Merge tag 'pm-6.0-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm · b98f602d
      Linus Torvalds authored
      Pull power management fix from Rafael Wysocki:
       "Make __resolve_freq() check the presence of the frequency table
        instead of checking whether or not the ->target_index() callback is
        implemented by the driver, because that need not be the case when
        __resolve_freq() is used (Lukasz Luba)"
      
      * tag 'pm-6.0-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
        cpufreq: check only freq_table in __resolve_freq()
      b98f602d
    • Linus Torvalds's avatar
      Merge tag 'acpi-6.0-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm · 2b1ddb59
      Linus Torvalds authored
      Pull ACPI fixes from Rafael Wysocki:
       "These fix issues introduced by recent changes related to the handling
        of ACPI device properties and a coding mistake in the exit path of the
        ACPI processor driver.
      
        Specifics:
      
         - Prevent acpi_thermal_cpufreq_exit() from attempting to remove
           the same frequency QoS request multiple times (Riwen Lu)
      
         - Fix type detection for integer ACPI device properties (Stefan
           Binding)
      
         - Avoid emitting false-positive warnings when processing ACPI
           device properties and drop the useless default case from the
           acpi_copy_property_array_uint() macro (Sakari Ailus)"
      
      * tag 'acpi-6.0-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
        ACPI: property: Remove default association from integer maximum values
        ACPI: property: Ignore already existing data node tags
        ACPI: property: Fix type detection of unified integer reading functions
        ACPI: processor: Remove freq Qos request for all CPUs
      2b1ddb59
    • Linus Torvalds's avatar
      Merge tag 's390-6.0-2' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux · dee18737
      Linus Torvalds authored
      Pull s390 fixes from Vasily Gorbik:
      
       - Fix double free of guarded storage and runtime instrumentation
         control blocks on fork() failure
      
       - Fix triggering write fault when VMA does not allow VM_WRITE
      
      * tag 's390-6.0-2' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
        s390/mm: do not trigger write fault when vma does not allow VM_WRITE
        s390: fix double free of GS and RI CBs on fork() failure
      dee18737
    • Linus Torvalds's avatar
      Merge tag 'for-linus-6.0-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip · 05519f24
      Linus Torvalds authored
      Pull xen fixes from Juergen Gross:
      
       - two minor cleanups
      
       - a fix of the xen/privcmd driver avoiding a possible NULL dereference
         in an error case
      
      * tag 'for-linus-6.0-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
        xen/privcmd: fix error exit of privcmd_ioctl_dm_op()
        xen: move from strlcpy with unused retval to strscpy
        xen: x86: remove setting the obsolete config XEN_MAX_DOMAIN_MEMORY
      05519f24
    • Linus Torvalds's avatar
      Merge tag 'audit-pr-20220826' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit · 17b28d42
      Linus Torvalds authored
      Pull audit fix from Paul Moore:
       "Another small audit patch, this time to fix a bug where the return
        codes were not properly set before the audit filters were run,
        potentially resulting in missed audit records"
      
      * tag 'audit-pr-20220826' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit:
        audit: move audit_return_fixup before the filters
      17b28d42
    • Linus Torvalds's avatar
      Merge tag 'fbdev-for-6.0-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/linux-fbdev · 89b749d8
      Linus Torvalds authored
      Pull fbdev fixes and updates from Helge Deller:
       "Mostly just small patches, with the exception of the bigger indenting
        cleanups in the sisfb and radeonfb drivers.
      
        Two patches should be mentioned though: A fix-up for fbdev if the
        screen resize fails (by Shigeru Yoshida), and a potential divide by
        zero fix in fb_pm2fb (by Letu Ren).
      
        Summary:
      
        Major fixes:
         - Revert the changes for fbcon console when vc_resize() fails
           [Shigeru Yoshida]
         - Avoid a potential divide by zero error in fb_pm2fb [Letu Ren]
      
        Minor fixes:
         - Add missing pci_disable_device() in chipsfb_pci_init() [Yang
           Yingliang]
         - Fix tests for platform_get_irq() failure in omapfb [Yu Zhe]
         - Destroy mutex on freeing struct fb_info in fbsysfs [Shigeru
           Yoshida]
      
        Cleanups:
         - Move fbdev drivers from strlcpy to strscpy [Wolfram Sang]
         - Indenting fixes, comment fixes, ... [Jiapeng Chong & Jilin Yuan]"
      
      * tag 'fbdev-for-6.0-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/linux-fbdev:
        fbdev: fbcon: Properly revert changes when vc_resize() failed
        fbdev: Move fbdev drivers from strlcpy to strscpy
        fbdev: omap: Remove unnecessary print function dev_err()
        fbdev: chipsfb: Add missing pci_disable_device() in chipsfb_pci_init()
        fbdev: fbcon: Destroy mutex on freeing struct fb_info
        fbdev: radeon: Clean up some inconsistent indenting
        fbdev: sisfb: Clean up some inconsistent indenting
        fbdev: fb_pm2fb: Avoid potential divide by zero error
        fbdev: ssd1307fb: Fix repeated words in comments
        fbdev: omapfb: Fix tests for platform_get_irq() failure
      89b749d8
    • Mikulas Patocka's avatar
      provide arch_test_bit_acquire for architectures that define test_bit · d6ffe606
      Mikulas Patocka authored
      Some architectures define their own arch_test_bit and they also need
      arch_test_bit_acquire, otherwise they won't compile.  We also clean up
      the code by using the generic test_bit if that is equivalent to the
      arch-specific version.
      Signed-off-by: default avatarMikulas Patocka <mpatocka@redhat.com>
      Cc: stable@vger.kernel.org
      Fixes: 8238b457 ("wait_on_bit: add an acquire memory barrier")
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      d6ffe606
    • Zhengjun Xing's avatar
      perf stat: Capitalize topdown metrics' names · 48648548
      Zhengjun Xing authored
      Capitalize topdown metrics' names to follow the intel SDM.
      
      Before:
      
       # ./perf stat -a  sleep 1
      
       Performance counter stats for 'system wide':
      
              228,094.05 msec cpu-clock                        #  225.026 CPUs utilized
                     842      context-switches                 #    3.691 /sec
                     224      cpu-migrations                   #    0.982 /sec
                      70      page-faults                      #    0.307 /sec
              23,164,105      cycles                           #    0.000 GHz
              29,403,446      instructions                     #    1.27  insn per cycle
               5,268,185      branches                         #   23.097 K/sec
                  33,239      branch-misses                    #    0.63% of all branches
             136,248,990      slots                            #  597.337 K/sec
              32,976,450      topdown-retiring                 #     24.2% retiring
               4,651,918      topdown-bad-spec                 #      3.4% bad speculation
              26,148,695      topdown-fe-bound                 #     19.2% frontend bound
              72,515,776      topdown-be-bound                 #     53.2% backend bound
               6,008,540      topdown-heavy-ops                #      4.4% heavy operations       #     19.8% light operations
               3,934,049      topdown-br-mispredict            #      2.9% branch mispredict      #      0.5% machine clears
              16,655,439      topdown-fetch-lat                #     12.2% fetch latency          #      7.0% fetch bandwidth
              41,635,972      topdown-mem-bound                #     30.5% memory bound           #     22.7% Core bound
      
             1.013634593 seconds time elapsed
      
      After:
      
       # ./perf stat -a  sleep 1
      
       Performance counter stats for 'system wide':
      
              228,081.94 msec cpu-clock                        #  225.003 CPUs utilized
                     824      context-switches                 #    3.613 /sec
                     224      cpu-migrations                   #    0.982 /sec
                      67      page-faults                      #    0.294 /sec
              22,647,423      cycles                           #    0.000 GHz
              28,870,551      instructions                     #    1.27  insn per cycle
               5,167,099      branches                         #   22.655 K/sec
                  32,383      branch-misses                    #    0.63% of all branches
             133,411,074      slots                            #  584.926 K/sec
              32,352,607      topdown-retiring                 #     24.3% Retiring
               4,456,977      topdown-bad-spec                 #      3.3% Bad Speculation
              25,626,487      topdown-fe-bound                 #     19.2% Frontend Bound
              70,955,316      topdown-be-bound                 #     53.2% Backend Bound
               5,834,844      topdown-heavy-ops                #      4.4% Heavy Operations       #     19.9% Light Operations
               3,738,781      topdown-br-mispredict            #      2.8% Branch Mispredict      #      0.5% Machine Clears
              16,286,803      topdown-fetch-lat                #     12.2% Fetch Latency          #      7.0% Fetch Bandwidth
              40,802,069      topdown-mem-bound                #     30.6% Memory Bound           #     22.6% Core Bound
      
             1.013683125 seconds time elapsed
      Reviewed-by: default avatarKan Liang <kan.liang@linux.intel.com>
      Signed-off-by: default avatarXing Zhengjun <zhengjun.xing@linux.intel.com>
      Acked-by: default avatarIan Rogers <irogers@google.com>
      Cc: Alexander Shishkin <alexander.shishkin@intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: https://lore.kernel.org/r/20220825015458.3252239-1-zhengjun.xing@linux.intel.comSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      48648548
    • Kan Liang's avatar
      perf docs: Update the documentation for the save_type filter · 3126204c
      Kan Liang authored
      Update the documentation to reflect the kernel changes.
      Signed-off-by: default avatarKan Liang <kan.liang@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Link: https://lore.kernel.org/r/20220816125612.2042397-2-kan.liang@linux.intel.comSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      3126204c
    • Ian Rogers's avatar
      perf sched: Fix memory leaks in __cmd_record detected with -fsanitize=address · d72e5cf3
      Ian Rogers authored
      An array of strings is passed to cmd_record but not freed. As
      cmd_record modifies the array, add another array as a copy that can be
      mutated allowing the original array contents to all be freed.
      
      Detected with -fsanitize=address.
      Signed-off-by: default avatarIan Rogers <irogers@google.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Link: https://lore.kernel.org/r/20220824145733.409005-1-irogers@google.comSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      d72e5cf3
    • Andi Kleen's avatar
      perf record: Fix manpage formatting of description of support to hybrid systems · e89eaa61
      Andi Kleen authored
      The Intel hybrid description is written in a different style than the
      rest of the perf record man page. There were some new command line
      options added after it which resulted in very strange section ordering.
      Move the hybrid include last.
      
      Also the sub sections in the hybrid document don't fit the record
      manpage well (especially since it talks about all kinds of unrelated
      commands). I left this for now, but would be better to separate this
      properly in the different man pages.
      
      It would be better to use sub sections for the other sections, but these
      don't seem to be supported in AsciiDoc?
      
      Some of the examples are still misrendered in the manpage with an
      indented troff command, but I don't know how to fix that.
      
      In any case it's now better than before.
      Signed-off-by: default avatarAndi Kleen <ak@linux.intel.com>
      Cc: zhengjun.xing@intel.com
      Link: https://lore.kernel.org/r/20220818100127.249401-1-ak@linux.intel.comSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      e89eaa61