1. 06 Dec, 2020 5 commits
    • mm/zsmalloc.c: drop ZSMALLOC_PGTABLE_MAPPING · e91d8d78
      Minchan Kim authored
      While doing zram testing, I found that decompression sometimes failed
      because the compression buffer was corrupted.  On investigation, I found
      that the commit below calls cond_resched() unconditionally, which can
      cause a problem in atomic context if the task is rescheduled.
      
        BUG: sleeping function called from invalid context at mm/vmalloc.c:108
        in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 946, name: memhog
        3 locks held by memhog/946:
         #0: ffff9d01d4b193e8 (&mm->mmap_lock#2){++++}-{4:4}, at: __mm_populate+0x103/0x160
         #1: ffffffffa3d53de0 (fs_reclaim){+.+.}-{0:0}, at: __alloc_pages_slowpath.constprop.0+0xa98/0x1160
         #2: ffff9d01d56b8110 (&zspage->lock){.+.+}-{3:3}, at: zs_map_object+0x8e/0x1f0
        CPU: 0 PID: 946 Comm: memhog Not tainted 5.9.3-00011-gc5bfc0287345-dirty #316
        Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1 04/01/2014
        Call Trace:
          unmap_kernel_range_noflush+0x2eb/0x350
          unmap_kernel_range+0x14/0x30
          zs_unmap_object+0xd5/0xe0
          zram_bvec_rw.isra.0+0x38c/0x8e0
          zram_rw_page+0x90/0x101
          bdev_write_page+0x92/0xe0
          __swap_writepage+0x94/0x4a0
          pageout+0xe3/0x3a0
          shrink_page_list+0xb94/0xd60
          shrink_inactive_list+0x158/0x460
      
      We can fix this by removing the ZSMALLOC_PGTABLE_MAPPING feature (which
      contains the offending calling code) from zsmalloc.
      
      Even though this option showed some improvement (e.g., 30%) on some
      arm32 platforms, it has been a headache to maintain since it has
      abused APIs[1] (e.g., unmap_kernel_range in atomic context).

      Since we are approaching deprecation of 32-bit machines, the config
      option has been available only for builtin builds since v5.8, and it
      has not been the default option in zsmalloc, it's time to drop the
      option for better maintenance.
      
      [1] http://lore.kernel.org/linux-mm/20201105170249.387069-1-minchan@kernel.org
      
      Fixes: e47110e9 ("mm/vunmap: add cond_resched() in vunmap_pmd_range")
      Signed-off-by: Minchan Kim <minchan@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Cc: Tony Lindgren <tony@atomide.com>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Harish Sriram <harish@linux.ibm.com>
      Cc: Uladzislau Rezki <urezki@gmail.com>
      Cc: <stable@vger.kernel.org>
      Link: https://lkml.kernel.org/r/20201117202916.GA3856507@google.com
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: list_lru: set shrinker map bit when child nr_items is not zero · 8199be00
      Yang Shi authored
      When investigating a slab cache bloat problem, a significant amount of
      negative dentry cache was seen, but confusingly it was neither shrunk
      by the reclaimer (the host has very tight memory) nor by dropping
      caches.  The vmcore shows there are over 14M negative dentry objects on
      the lru, but tracing results show they were not even scanned.
      
      Further investigation shows the memcg's vfs shrinker_map bit is not set,
      so the reclaimer and cache dropping just skip calling the vfs shrinker.
      We therefore have to reboot the hosts to get the memory back.
      
      I didn't manage to come up with a reproducer in a test environment, and
      the problem can't be reproduced after rebooting.  But code inspection
      suggests there is a race between clearing the shrinker map bit and
      reparenting.  The hypothesis is elaborated below.
      
      The memcg hierarchy on our production environment looks like:
      
                      root
                     /    \
                system   user
      
      The main workloads run under the user slice's children, which create
      and remove memcgs frequently.  So reparenting happens very often under
      the user slice, but no task is under the user slice directly.
      
      So with the frequent reparenting and tight memory pressure, the below
      hypothetical race condition may happen:
      
             CPU A                            CPU B
      reparent
          dst->nr_items == 0
                                       shrinker:
                                           total_objects == 0
          add src->nr_items to dst
          set_bit
                                           return SHRINK_EMPTY
                                           clear_bit
      child memcg offline
          replace child's kmemcg_id with
          parent's (in memcg_offline_kmem())
                                        list_lru_del() between shrinker runs
                                           see parent's kmemcg_id
                                           dec dst->nr_items
      reparent again
          dst->nr_items may go negative
          due to concurrent list_lru_del()
      
                                       The second run of shrinker:
                                           read nr_items without any
                                           synchronization, so it may
                                           see intermediate negative
                                           nr_items then total_objects
                                           may return 0 coincidently
      
                                           keep the bit cleared
          dst->nr_items != 0
          skip set_bit
          add src->nr_items to dst
      
      After this point dst->nr_items may never go to zero, so reparenting will
      not set the shrinker_map bit anymore.  And since there is no task under
      the user slice directly, no new object will be added to its lru to set
      the shrinker map bit either.  That bit stays cleared forever.
      
      How does list_lru_del() race with reparenting? Reparenting replaces the
      children's kmemcg_id with the parent's without holding nlru->lock, so
      list_lru_del() may see the parent's kmemcg_id while actually deleting
      items from the child's lru, and so decrement the parent's nr_items.
      The parent's nr_items may therefore go negative, as commit 2788cf0c
      ("memcg: reparent list_lrus and free kmemcg_id on css offline") says.
      
      Since it is impossible for dst->nr_items to go negative and
      src->nr_items to go to zero at the same time, it seems we could set the
      shrinker map bit iff src->nr_items != 0.  We could synchronize
      list_lru_count_one() and reparenting with nlru->lock, but checking
      src->nr_items in reparenting seems the simplest approach and avoids
      lock contention.
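
The difference between the two set-bit rules can be sketched in userspace C. This is only an illustrative model, not the kernel code: `struct lru_model`, `reparent_old_rule()` and `reparent_new_rule()` are hypothetical names, and the "old" rule is inferred from the race diagram above (`dst->nr_items != 0` leads to `skip set_bit`). Each function returns whether the shrinker map bit would be set.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a per-memcg list_lru; only the counter is modeled. */
struct lru_model {
	long nr_items;
};

/* Rule implied by the race diagram: set the bit only if dst was empty. */
static bool reparent_old_rule(struct lru_model *src, struct lru_model *dst)
{
	bool set = (dst->nr_items == 0 && src->nr_items != 0);

	dst->nr_items += src->nr_items;
	src->nr_items = 0;
	return set;
}

/* Rule proposed in the commit message: set the bit iff src has items. */
static bool reparent_new_rule(struct lru_model *src, struct lru_model *dst)
{
	bool set = (src->nr_items != 0);

	if (set) {
		dst->nr_items += src->nr_items;
		src->nr_items = 0;
	}
	return set;
}
```

With `dst->nr_items` transiently at -1 and `src->nr_items` at 5, the old rule moves the items but leaves the bit unset, while the new rule sets it.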
      
      Fixes: fae91d6d ("mm/list_lru.c: set bit in memcg shrinker bitmap on first list_lru item appearance")
      Suggested-by: Roman Gushchin <guro@fb.com>
      Signed-off-by: Yang Shi <shy828301@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Reviewed-by: Roman Gushchin <guro@fb.com>
      Reviewed-by: Shakeel Butt <shakeelb@google.com>
      Acked-by: Kirill Tkhai <ktkhai@virtuozzo.com>
      Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
      Cc: <stable@vger.kernel.org>	[4.19]
      Link: https://lkml.kernel.org/r/20201202171749.264354-1-shy828301@gmail.com
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: memcg/slab: fix obj_cgroup_charge() return value handling · becaba65
      Roman Gushchin authored
      Commit 10befea9 ("mm: memcg/slab: use a single set of kmem_caches
      for all allocations") introduced a regression into the handling of the
      obj_cgroup_charge() return value.  If a non-zero value is returned
      (indicating that one of the memory.max limits was exceeded), the
      allocation should fail instead of falling back to non-accounted mode.
      
      To make the code more readable, move memcg_slab_pre_alloc_hook() and
      memcg_slab_post_alloc_hook() calling conditions into bodies of these
      hooks.
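
The intended return-value handling can be illustrated with a small userspace model. It is a sketch under stated assumptions, not the kernel implementation: `struct objcg_model`, `model_charge()` and `model_pre_alloc_hook()` are hypothetical stand-ins, and the limit check is a simplification of what a charge against memory.max might do.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an accounting group with a hard limit. */
struct objcg_model {
	size_t charged;
	size_t limit;
};

/* Returns 0 on success, non-zero (-ENOMEM here) if the limit would be exceeded. */
static int model_charge(struct objcg_model *cg, size_t size)
{
	if (cg->charged + size > cg->limit)
		return -ENOMEM;
	cg->charged += size;
	return 0;
}

/* Returns true if the allocation may proceed. */
static bool model_pre_alloc_hook(struct objcg_model *cg, size_t size)
{
	if (!cg)
		return true;	/* nothing to account for */

	/* A failed charge must fail the allocation, not skip accounting. */
	if (model_charge(cg, size))
		return false;

	return true;
}
```

Keeping the "should this allocation be accounted" condition inside the hook body, as the commit message describes, leaves the caller with a single boolean to check.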
      
      Fixes: 10befea9 ("mm: memcg/slab: use a single set of kmem_caches for all allocations")
      Signed-off-by: Roman Gushchin <guro@fb.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Reviewed-by: Shakeel Butt <shakeelb@google.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: <stable@vger.kernel.org>
      Link: https://lkml.kernel.org/r/20201127161828.GD840171@carbon.dhcp.thefacebook.com
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • coredump: fix core_pattern parse error · 2bf509d9
      Menglong Dong authored
      'format_corename()' splits 'core_pattern' on spaces when it is in
      pipe mode, and takes helper_argv[0] as the path to the usermode
      executable.  It works fine in most cases.
      
      However, if there is a space between '|' and '/file/path', such as
      '| /usr/lib/systemd/systemd-coredump %P %u %g', then helper_argv[0] will
      be parsed as '', and users will get a 'Core dump to | disabled'.
      
      It is not friendly to users, as the pattern above was valid previously.
      Fix this by ignoring the spaces between '|' and '/file/path'.
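
The parsing behaviour described above can be sketched in userspace C. This is a minimal illustration, not the kernel's format_corename(): the helper name `first_helper_arg()` and its signature are hypothetical, and it extracts only the first argument without any template expansion.

```c
#include <ctype.h>
#include <stddef.h>

/*
 * Copy the first helper argument of a pipe-mode pattern into out.
 * Whitespace between '|' and the path is skipped, so "| /path" and
 * "|/path" yield the same result. Returns the number of bytes copied,
 * or 0 if the pattern is not in pipe mode or names no helper.
 */
static size_t first_helper_arg(const char *pattern, char *out, size_t out_size)
{
	size_t n = 0;

	if (out_size == 0)
		return 0;
	out[0] = '\0';

	if (pattern[0] != '|')
		return 0;
	pattern++;

	/* Ignore spaces between '|' and the executable path. */
	while (*pattern && isspace((unsigned char)*pattern))
		pattern++;

	/* Copy up to the next space, leaving room for the terminator. */
	while (*pattern && !isspace((unsigned char)*pattern) && n + 1 < out_size)
		out[n++] = *pattern++;
	out[n] = '\0';

	return n;
}
```

For the pattern '| /usr/lib/systemd/systemd-coredump %P %u %g', this yields '/usr/lib/systemd/systemd-coredump' rather than an empty string.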
      
      Fixes: 315c6926 ("coredump: split pipe command whitespace before expanding template")
      Signed-off-by: Menglong Dong <dong.menglong@zte.com.cn>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Cc: Paul Wise <pabs3@bonedaddy.net>
      Cc: Jakub Wilk <jwilk@jwilk.net> [https://bugs.debian.org/924398]
      Cc: Neil Horman <nhorman@tuxdriver.com>
      Cc: <stable@vger.kernel.org>
      Link: https://lkml.kernel.org/r/5fb62870.1c69fb81.8ef5d.af76@mx.google.com
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • zlib: export S390 symbols for zlib modules · 11fb479f
      Randy Dunlap authored
      Fix build errors when ZLIB_INFLATE=m and ZLIB_DEFLATE=m and ZLIB_DFLTCC=y
      by exporting the 2 needed symbols in dfltcc_inflate.c.
      
      Fixes these build errors:
      
        ERROR: modpost: "dfltcc_inflate" [lib/zlib_inflate/zlib_inflate.ko] undefined!
        ERROR: modpost: "dfltcc_can_inflate" [lib/zlib_inflate/zlib_inflate.ko] undefined!
      
      Fixes: 12619610 ("lib/zlib: add s390 hardware support for kernel zlib_inflate")
      Reported-by: kernel test robot <lkp@intel.com>
      Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Acked-by: Ilya Leoshkevich <iii@linux.ibm.com>
      Cc: Mikhail Zaslonko <zaslonko@linux.ibm.com>
      Cc: Heiko Carstens <hca@linux.ibm.com>
      Cc: Vasily Gorbik <gor@linux.ibm.com>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Link: https://lkml.kernel.org/r/20201123191712.4882-1-rdunlap@infradead.org
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  2. 05 Dec, 2020 9 commits
  3. 04 Dec, 2020 9 commits
    • block: fix incorrect branching in blk_max_size_offset() · 65f33b35
      Mike Snitzer authored
      If non-zero 'chunk_sectors' is passed in to blk_max_size_offset() that
      override will be incorrectly ignored.
      
      Old blk_max_size_offset() branching, prior to commit 3ee16db3,
      must be used only if passed 'chunk_sectors' override is zero.
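
The intended branching can be sketched as a userspace model. This is an illustration under assumptions, not the block layer's code: `struct limits_model` and `max_size_offset_model()` are hypothetical names, and the queue limits are reduced to the two fields the description mentions.

```c
#include <stdint.h>

/* Hypothetical stand-in for the relevant queue limits. */
struct limits_model {
	unsigned int max_sectors;
	unsigned int chunk_sectors;
};

/*
 * Maximum IO size at a given offset. A non-zero chunk_sectors argument
 * overrides the queue's own chunk_sectors; the queue's value (or plain
 * max_sectors) is consulted only when the override is zero.
 */
static unsigned int max_size_offset_model(const struct limits_model *lim,
					  uint64_t offset,
					  unsigned int chunk_sectors)
{
	if (!chunk_sectors) {
		if (lim->chunk_sectors)
			chunk_sectors = lim->chunk_sectors;
		else
			return lim->max_sectors;
	}

	/* Sectors remaining until the next chunk boundary. */
	if ((chunk_sectors & (chunk_sectors - 1)) == 0)
		chunk_sectors -= (unsigned int)(offset & (chunk_sectors - 1));
	else
		chunk_sectors -= (unsigned int)(offset % chunk_sectors);

	return chunk_sectors < lim->max_sectors ? chunk_sectors : lim->max_sectors;
}
```

In this model, an override of 8 at offset 3 gives 5 sectors even when the queue has no chunk_sectors of its own, which is the case the commit message says was incorrectly ignored.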
      
      Fixes: 3ee16db3 ("dm: fix IO splitting")
      Cc: stable@vger.kernel.org # 5.9
      Reported-by: John Dorminy <jdorminy@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
    • Merge tag 'for-5.10/dm-fixes' of... · b3298500
      Linus Torvalds authored
      Merge tag 'for-5.10/dm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm
      
      Pull device mapper fixes from Mike Snitzer:
      
       - Fix DM's bio splitting changes that were made during v5.9. This
         restores splitting in terms of varied per-target ti->max_io_len
         rather than use block core's single stacked 'chunk_sectors' limit.
      
       - Like DM crypt, update DM integrity to not use crypto drivers that
         have CRYPTO_ALG_ALLOCATES_MEMORY set.
      
       - Fix DM writecache target's argument parsing and status display.
      
       - Remove needless BUG() from dm writecache's persistent_memory_claim()
      
       - Remove old gcc workaround in DM cache target's block_div() for ARM
         link errors now that gcc >= 4.9 is required.
      
       - Fix RCU locking in dm_blk_report_zones and dm_dax_zero_page_range.
      
       - Remove old, and now frowned upon, BUG_ON(in_interrupt()) in
         dm_table_event().
      
       - Remove invalid sparse annotations from dm_prepare_ioctl() and
         dm_unprepare_ioctl().
      
      * tag 'for-5.10/dm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
        dm: remove invalid sparse __acquires and __releases annotations
        dm: fix double RCU unlock in dm_dax_zero_page_range() error path
        dm: fix IO splitting
        dm writecache: remove BUG() and fail gracefully instead
        dm table: Remove BUG_ON(in_interrupt())
        dm: fix bug with RCU locking in dm_blk_report_zones
        Revert "dm cache: fix arm link errors with inline"
        dm writecache: fix the maximum number of arguments
        dm writecache: advance the number of arguments when reporting max_age
        dm integrity: don't use drivers that have CRYPTO_ALG_ALLOCATES_MEMORY
    • dm: remove invalid sparse __acquires and __releases annotations · bde3808b
      Mike Snitzer authored
      Fixes sparse warnings:
      drivers/md/dm.c:508:12: warning: context imbalance in 'dm_prepare_ioctl' - wrong count at exit
      drivers/md/dm.c:543:13: warning: context imbalance in 'dm_unprepare_ioctl' - wrong count at exit
      
      Fixes: 971888c4 ("dm: hold DM table for duration of ioctl rather than use blkdev_get")
      Cc: stable@vger.kernel.org
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
    • dm: fix double RCU unlock in dm_dax_zero_page_range() error path · f05c4403
      Mike Snitzer authored
      Remove redundant dm_put_live_table() in dm_dax_zero_page_range() error
      path to fix sparse warning:
      drivers/md/dm.c:1208:9: warning: context imbalance in 'dm_dax_zero_page_range' - unexpected unlock
      
      Fixes: cdf6cdcd ("dm,dax: Add dax zero_page_range operation")
      Cc: stable@vger.kernel.org
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
    • dm: fix IO splitting · 3ee16db3
      Mike Snitzer authored
      Commit 882ec4e6 ("dm table: stack 'chunk_sectors' limit to account
      for target-specific splitting") caused a couple of regressions:
      1) Using lcm_not_zero() when stacking chunk_sectors was a bug because
         chunk_sectors must reflect the most limited of all devices in the
         IO stack.
      2) DM targets that set max_io_len but that do _not_ provide an
         .iterate_devices method no longer had their IO split properly.
      
      And commit 5091cdec ("dm: change max_io_len() to use
      blk_max_size_offset()") also caused a regression where DM no longer
      supported varied (per target) IO splitting. The implication being the
      potential for severely reduced performance for IO stacks that use a DM
      target like dm-cache to hide performance limitations of a slower
      device (e.g. one that requires 4K IO splitting).
      
      Coming full circle: Fix all these issues by discontinuing stacking
      chunk_sectors up using ti->max_io_len in dm_calculate_queue_limits(),
      add optional chunk_sectors override argument to blk_max_size_offset()
      and update DM's max_io_len() to pass ti->max_io_len to its
      blk_max_size_offset() call.
      
      Passing in an optional chunk_sectors override to blk_max_size_offset()
      allows for code reuse of block's centralized calculation for max IO
      size based on provided offset and split boundary.
      
      Fixes: 882ec4e6 ("dm table: stack 'chunk_sectors' limit to account for target-specific splitting")
      Fixes: 5091cdec ("dm: change max_io_len() to use blk_max_size_offset()")
      Cc: stable@vger.kernel.org
      Reported-by: John Dorminy <jdorminy@redhat.com>
      Reported-by: Bruce Johnston <bjohnsto@redhat.com>
      Reported-by: Kirill Tkhai <ktkhai@virtuozzo.com>
      Reviewed-by: John Dorminy <jdorminy@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Reviewed-by: Jens Axboe <axboe@kernel.dk>
    • Merge tag 'drm-fixes-2020-12-04' of git://anongit.freedesktop.org/drm/drm · e87297fa
      Linus Torvalds authored
      Pull drm fixes from Dave Airlie:
       "This week's regular fixes.
      
        i915 has fixes for a few races, use-after-free, and gpu hangs. Tegra
        just has some minor fixes that I didn't see much point in hanging on
        to. The nouveau fix is for all pre-nv50 cards and was reported a few
        times. Otherwise it's just some amdgpu, and a few misc fixes.
      
        Summary:
      
        amdgpu:
         - SMU11 manual fan fix
         - Renoir display clock fix
         - VCN3 dynamic powergating fix
      
        i915:
         - Program mocs:63 for cache eviction on gen9 (Chris)
         - Protect context lifetime with RCU (Chris)
         - Split the breadcrumb spinlock between global and contexts (Chris)
         - Retain default context state across shrinking (Venkata)
         - Limit frequency drop to RPe on parking (Chris)
         - Return earlier from intel_modeset_init() without display (Jani)
         - Defer initial modeset until after GGTT is initialized (Chris)
      
        nouveau:
         - pre-nv50 regression fix
      
        rockchip:
         - uninitialised LVDS property fix
      
        omap:
         - bridge fix
      
        panel:
         - race fix
      
        mxsfb:
         - fence sync fix
         - modifiers fix
      
        tegra:
         - idr init fix
         - sor fixes
         - output/of cleanup fix"
      
      * tag 'drm-fixes-2020-12-04' of git://anongit.freedesktop.org/drm/drm: (22 commits)
        drm/amdgpu/vcn3.0: remove old DPG workaround
        drm/amdgpu/vcn3.0: stall DPG when WPTR/RPTR reset
        drm/amd/display: Init clock value by current vbios CLKs
        drm/amdgpu/pm/smu11: Fix fan set speed bug
        drm/i915/display: Defer initial modeset until after GGTT is initialised
        drm/i915/display: return earlier from intel_modeset_init() without display
        drm/i915/gt: Limit frequency drop to RPe on parking
        drm/i915/gt: Retain default context state across shrinking
        drm/i915/gt: Split the breadcrumb spinlock between global and contexts
        drm/i915/gt: Protect context lifetime with RCU
        drm/i915/gt: Program mocs:63 for cache eviction on gen9
        drm/omap: sdi: fix bridge enable/disable
        drm/panel: sony-acx565akm: Fix race condition in probe
        drm/rockchip: Avoid uninitialized use of endpoint id in LVDS
        drm/tegra: sor: Disable clocks on error in tegra_sor_init()
        drm/nouveau: make sure ret is initialized in nouveau_ttm_io_mem_reserve
        drm: mxsfb: Implement .format_mod_supported
        drm: mxsfb: fix fence synchronization
        drm/tegra: output: Do not put OF node twice
        drm/tegra: replace idr_init() by idr_init_base()
        ...
    • Merge tag 'drm-misc-fixes-2020-12-03' of git://anongit.freedesktop.org/drm/drm-misc into drm-fixes · de9b485d
      Dave Airlie authored
      One bridge fix for OMAP, one for a race condition in a panel, two for
      uninitialized variables in rockchip and nouveau, and two fixes for mxsfb
      to fix a regression with modifiers and a fix for a fence synchronization
      issue.
      Signed-off-by: Dave Airlie <airlied@redhat.com>
      
      From: Maxime Ripard <maxime@cerno.tech>
      Link: https://patchwork.freedesktop.org/patch/msgid/20201203125943.h2ft2xoywunt5orl@gilmour
    • Merge tag 'amd-drm-fixes-5.10-2020-12-02' of... · 5353219f
      Dave Airlie authored
      Merge tag 'amd-drm-fixes-5.10-2020-12-02' of git://people.freedesktop.org/~agd5f/linux into drm-fixes
      
      amd-drm-fixes-5.10-2020-12-02:
      
      amdgpu:
      - SMU11 manual fan fix
      - Renoir display clock fix
      - VCN3 dynamic powergating fix
      Signed-off-by: Dave Airlie <airlied@redhat.com>
      From: Alex Deucher <alexdeucher@gmail.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20201203044815.41257-1-alexander.deucher@amd.com
    • Merge tag 'drm-intel-fixes-2020-12-03' of... · 94cfbd05
      Dave Airlie authored
      Merge tag 'drm-intel-fixes-2020-12-03' of git://anongit.freedesktop.org/drm/drm-intel into drm-fixes
      
      Fixes for GPU hang, null dereference, suspend-resume, power consumption, and use-after-free.
      
      - Program mocs:63 for cache eviction on gen9 (Chris)
      - Protect context lifetime with RCU (Chris)
      - Split the breadcrumb spinlock between global and contexts (Chris)
      - Retain default context state across shrinking (Venkata)
      - Limit frequency drop to RPe on parking (Chris)
      - Return earlier from intel_modeset_init() without display (Jani)
      - Defer initial modeset until after GGTT is initialized (Chris)
      Signed-off-by: Dave Airlie <airlied@redhat.com>
      
      From: Rodrigo Vivi <rodrigo.vivi@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20201203134705.GA1575873@intel.com
  4. 03 Dec, 2020 17 commits