1. 20 Oct, 2020 4 commits
  2. 15 Oct, 2020 11 commits
    • Bob Peterson's avatar
      gfs2: eliminate GLF_QUEUED flag in favor of list_empty(gl_holders) · e2c6c8a7
      Bob Peterson authored
      Before this patch, glock.c maintained a flag, GLF_QUEUED, which indicated
      when a glock had a holder queued. It was only checked for inode glocks,
      although set and cleared by all glocks, and it was only used to determine
      whether the glock should be held for the minimum hold time before releasing.
      
      The problem is that the flag is not accurate at all. If a process holds
      the glock, the flag is set. When they dequeue the glock, it only cleared
      the flag in cases when the state actually changed. So if the state doesn't
      change, the flag may still be set, even when nothing is queued.
      
      This happens to iopen glocks often: the get held in SH, then the file is
      closed, but the glock remains in SH mode.
      
      We don't need a special flag to indicate this: we can simply tell whether
      the glock has any items queued to the holders queue. It's a waste of cpu
      time to maintain it.
      
      This patch eliminates the flag in favor of simply checking list_empty
      on the glock holders.
      Signed-off-by: default avatarBob Peterson <rpeterso@redhat.com>
      Signed-off-by: default avatarAndreas Gruenbacher <agruenba@redhat.com>
      e2c6c8a7
    • Bob Peterson's avatar
      gfs2: Ignore journal log writes for jdata holes · b2a846db
      Bob Peterson authored
      When flushing out its ail1 list, gfs2_write_jdata_page calls function
      __block_write_full_page passing in function gfs2_get_block_noalloc.
      But there was a problem when a process wrote to a jdata file, then
      truncated it or punched a hole, leaving references to the blocks within
      the new hole in its ail list, which are to be written to the journal log.
      
      In writing them to the journal, after calling gfs2_block_map, function
      gfs2_get_block_noalloc determined that the (hole-punched) block was not
      mapped, so it returned -EIO to generic_writepages, which passed it back
      to gfs2_ail1_start_one. This, in turn, performed a withdraw, assuming
      there was a real IO error writing to the journal.
      
      This might be a valid error when writing metadata to the journal, but for
      journaled data writes, it does not warrant a withdraw.
      
      This patch adds a check to function gfs2_block_map that makes an exception
      for journaled data writes that correspond to jdata holes: If the iomap
      get function returns a block type of IOMAP_HOLE, it instead returns
      -ENODATA which does not cause the withdraw. Other errors are returned as
      before.
      Signed-off-by: default avatarBob Peterson <rpeterso@redhat.com>
      Signed-off-by: default avatarAndreas Gruenbacher <agruenba@redhat.com>
      b2a846db
    • Bob Peterson's avatar
      gfs2: simplify gfs2_block_map · a6645745
      Bob Peterson authored
      Function gfs2_block_map had a lot of redundancy between its create and
      no_create paths. This patch simplifies the code to eliminate the redundancy.
      Signed-off-by: default avatarBob Peterson <rpeterso@redhat.com>
      Signed-off-by: default avatarAndreas Gruenbacher <agruenba@redhat.com>
      a6645745
    • Bob Peterson's avatar
      gfs2: Only set PageChecked if we have a transaction · 6302d6f4
      Bob Peterson authored
      With jdata writes, we frequently got into situations where gfs2 deadlocked
      because of this calling sequence:
      
      gfs2_ail1_start
         gfs2_ail1_flush - for every tr on the sd_ail1_list:
            gfs2_ail1_start_one - for every bd on the tr's tr_ail1_list:
               generic_writepages
      	    write_cache_pages passing __writepage()
      	       calls clear_page_dirty_for_io which calls set_page_dirty:
      	          which calls jdata_set_page_dirty which sets PageChecked.
      	       __writepage() calls
      	          mapping->a_ops->writepage AKA gfs2_jdata_writepage
      
      However, gfs2_jdata_writepage checks if PageChecked is set, and if so, it
      ignores the write and redirties the page. The problem is that write_cache_pages
      calls clear_page_dirty_for_io, which often calls set_page_dirty(). See comments
      in page-writeback.c starting with "Yes, Virginia". If it's jdata,
      set_page_dirty will call jdata_set_page_dirty which will set PageChecked.
      That causes a conflict because it makes it look like the page has been
      redirtied by another writer, in which case we need to skip writing it and
      redirty the page. That ends up in a deadlock because it isn't a "real" writer
      and nothing will ever clear PageChecked.
      
      If we do have a real writer, it will have started a transaction. So this
      patch checks if a transaction is in use, and if not, it skips setting
      PageChecked. That way, the page will be dirtied, cleaned, and written
      appropriately.
      Signed-off-by: default avatarBob Peterson <rpeterso@redhat.com>
      Signed-off-by: default avatarAndreas Gruenbacher <agruenba@redhat.com>
      6302d6f4
    • Bob Peterson's avatar
      gfs2: don't lock sd_ail_lock in gfs2_releasepage · 249ffe18
      Bob Peterson authored
      Patch 380f7c65 changed gfs2_releasepage
      so that it held the sd_ail_lock spin_lock for most of its processing.
      It did this for some mysterious undocumented bug somewhere in the
      evict code path. But in the nine years since, evict has been reworked
      and fixed many times, and so have the transactions and ail list.
      I can't see a reason to hold the sd_ail_lock unless it's protecting
      the actual ail lists hung off the transactions. Therefore, this patch
      removes the locking to increase speed and efficiency, and to further help
      us rework the log flush code to be more concurrent with transactions.
      Signed-off-by: default avatarBob Peterson <rpeterso@redhat.com>
      Signed-off-by: default avatarAndreas Gruenbacher <agruenba@redhat.com>
      249ffe18
    • Bob Peterson's avatar
      gfs2: make gfs2_ail1_empty_one return the count of active items · 36c78309
      Bob Peterson authored
      This patch is one baby step toward simplifying the journal management.
      It simply changes function gfs2_ail1_empty_one from a void to an int and
      makes it return a count of active items. This allows the caller to check
      the return code rather than list_empty on the tr_ail1_list. This way
      we can, in a later patch, combine transaction ail1 and ail2 lists.
      Signed-off-by: default avatarBob Peterson <rpeterso@redhat.com>
      Signed-off-by: default avatarAndreas Gruenbacher <agruenba@redhat.com>
      36c78309
    • Bob Peterson's avatar
      gfs2: Wipe jdata and ail1 in gfs2_journal_wipe, formerly gfs2_meta_wipe · 68942870
      Bob Peterson authored
      Before this patch, when blocks were freed, it called gfs2_meta_wipe to
      take the metadata out of the pending journal blocks. It did this mostly
      by calling another function called gfs2_remove_from_journal. This is
      shortsighted because it does not do anything with jdata blocks which
      may also be in the journal.
      
      This patch expands the function so that it wipes out jdata blocks from
      the journal as well, and it wipes it from the ail1 list if it hasn't
      been written back yet. Since it now processes jdata blocks as well,
      the function has been renamed from gfs2_meta_wipe to gfs2_journal_wipe.
      
      New function gfs2_ail1_wipe wants a static view of the ail list, so it
      locks the sd_ail_lock when removing items. To accomplish this, function
      gfs2_remove_from_journal no longer locks the sd_ail_lock, and it's now
      the caller's responsibility to do so.
      
      I was going to make sd_ail_lock locking conditional, but the practice is
      generally frowned upon. For details, see: https://lwn.net/Articles/109066/Signed-off-by: default avatarBob Peterson <rpeterso@redhat.com>
      Signed-off-by: default avatarAndreas Gruenbacher <agruenba@redhat.com>
      68942870
    • Bob Peterson's avatar
      gfs2: enhance log_blocks trace point to show log blocks free · 97c5e43d
      Bob Peterson authored
      This patch adds some code to enhance the log_blocks trace point. It
      reports the number of free log blocks. This makes the trace point much
      more useful, especially for debugging performance problems when we can
      tell when the journal gets full and needs to wait for flushes, etc.
      Signed-off-by: default avatarBob Peterson <rpeterso@redhat.com>
      Signed-off-by: default avatarAndreas Gruenbacher <agruenba@redhat.com>
      97c5e43d
    • Bob Peterson's avatar
      gfs2: add missing log_blocks trace points in gfs2_write_revokes · 77650bdb
      Bob Peterson authored
      Function gfs2_write_revokes was incrementing and decrementing the number
      of log blocks free, but there was never a log_blocks trace point for it.
      Thus, the free blocks from a log_blocks trace would jump around
      mysteriously.
      
      This patch adds the missing trace points so the trace makes more sense.
      Signed-off-by: default avatarBob Peterson <rpeterso@redhat.com>
      Signed-off-by: default avatarAndreas Gruenbacher <agruenba@redhat.com>
      77650bdb
    • Bob Peterson's avatar
      gfs2: rename gfs2_write_full_page to gfs2_write_jdata_page, remove parm · 21b6924b
      Bob Peterson authored
      Since the function is only used for writing jdata pages, this patch
      simply renames function gfs2_write_full_page to a more appropriate
      name: gfs2_write_jdata_page. This makes the code easier to understand.
      
      The function was only called in one place, which passed in a pointer to
      function gfs2_get_block_noalloc. The function doesn't need to be
      passed in. Therefore, this also eliminates the unnecessary parameter
      to increase efficiency.
      
      I also took the liberty of cleaning up the function comments.
      Signed-off-by: default avatarBob Peterson <rpeterso@redhat.com>
      Signed-off-by: default avatarAndreas Gruenbacher <agruenba@redhat.com>
      21b6924b
    • Anant Thazhemadam's avatar
      gfs2: add validation checks for size of superblock · 0ddc5154
      Anant Thazhemadam authored
      In gfs2_check_sb(), no validation checks are performed with regards to
      the size of the superblock.
      syzkaller detected a slab-out-of-bounds bug that was primarily caused
      because the block size for a superblock was set to zero.
      A valid size for a superblock is a power of 2 between 512 and PAGE_SIZE.
      Performing validation checks and ensuring that the size of the superblock
      is valid fixes this bug.
      
      Reported-by: syzbot+af90d47a37376844e731@syzkaller.appspotmail.com
      Tested-by: syzbot+af90d47a37376844e731@syzkaller.appspotmail.com
      Suggested-by: default avatarAndrew Price <anprice@redhat.com>
      Signed-off-by: default avatarAnant Thazhemadam <anant.thazhemadam@gmail.com>
      [Minor code reordering.]
      Signed-off-by: default avatarAndreas Gruenbacher <agruenba@redhat.com>
      0ddc5154
  3. 14 Oct, 2020 12 commits
  4. 11 Oct, 2020 10 commits
  5. 10 Oct, 2020 3 commits
    • Linus Torvalds's avatar
      Merge branch 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux · da690031
      Linus Torvalds authored
      Pull i2c fixes from Wolfram Sang:
       "Some more driver bugfixes for I2C. Including a revert - the updated
        series for it will come during the next merge window"
      
      * 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
        i2c: owl: Clear NACK and BUS error bits
        Revert "i2c: imx: Fix reset of I2SR_IAL flag"
        i2c: meson: fixup rate calculation with filter delay
        i2c: meson: keep peripheral clock enabled
        i2c: meson: fix clock setting overwrite
        i2c: imx: Fix reset of I2SR_IAL flag
      da690031
    • Vladimir Zapolskiy's avatar
      cifs: Fix incomplete memory allocation on setxattr path · 64b7f674
      Vladimir Zapolskiy authored
      On setxattr() syscall path due to an apprent typo the size of a dynamically
      allocated memory chunk for storing struct smb2_file_full_ea_info object is
      computed incorrectly, to be more precise the first addend is the size of
      a pointer instead of the wanted object size. Coincidentally it makes no
      difference on 64-bit platforms, however on 32-bit targets the following
      memcpy() writes 4 bytes of data outside of the dynamically allocated memory.
      
        =============================================================================
        BUG kmalloc-16 (Not tainted): Redzone overwritten
        -----------------------------------------------------------------------------
      
        Disabling lock debugging due to kernel taint
        INFO: 0x79e69a6f-0x9e5cdecf @offset=368. First byte 0x73 instead of 0xcc
        INFO: Slab 0xd36d2454 objects=85 used=51 fp=0xf7d0fc7a flags=0x35000201
        INFO: Object 0x6f171df3 @offset=352 fp=0x00000000
      
        Redzone 5d4ff02d: cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc  ................
        Object 6f171df3: 00 00 00 00 00 05 06 00 73 6e 72 75 62 00 66 69  ........snrub.fi
        Redzone 79e69a6f: 73 68 32 0a                                      sh2.
        Padding 56254d82: 5a 5a 5a 5a 5a 5a 5a 5a                          ZZZZZZZZ
        CPU: 0 PID: 8196 Comm: attr Tainted: G    B             5.9.0-rc8+ #3
        Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1 04/01/2014
        Call Trace:
         dump_stack+0x54/0x6e
         print_trailer+0x12c/0x134
         check_bytes_and_report.cold+0x3e/0x69
         check_object+0x18c/0x250
         free_debug_processing+0xfe/0x230
         __slab_free+0x1c0/0x300
         kfree+0x1d3/0x220
         smb2_set_ea+0x27d/0x540
         cifs_xattr_set+0x57f/0x620
         __vfs_setxattr+0x4e/0x60
         __vfs_setxattr_noperm+0x4e/0x100
         __vfs_setxattr_locked+0xae/0xd0
         vfs_setxattr+0x4e/0xe0
         setxattr+0x12c/0x1a0
         path_setxattr+0xa4/0xc0
         __ia32_sys_lsetxattr+0x1d/0x20
         __do_fast_syscall_32+0x40/0x70
         do_fast_syscall_32+0x29/0x60
         do_SYSENTER_32+0x15/0x20
         entry_SYSENTER_32+0x9f/0xf2
      
      Fixes: 5517554e ("cifs: Add support for writing attributes on SMB2+")
      Signed-off-by: default avatarVladimir Zapolskiy <vladimir@tuxera.com>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      64b7f674
    • Hugh Dickins's avatar
      mm/khugepaged: fix filemap page_to_pgoff(page) != offset · 033b5d77
      Hugh Dickins authored
      There have been elusive reports of filemap_fault() hitting its
      VM_BUG_ON_PAGE(page_to_pgoff(page) != offset, page) on kernels built
      with CONFIG_READ_ONLY_THP_FOR_FS=y.
      
      Suren has hit it on a kernel with CONFIG_READ_ONLY_THP_FOR_FS=y and
      CONFIG_NUMA is not set: and he has analyzed it down to how khugepaged
      without NUMA reuses the same huge page after collapse_file() failed
      (whereas NUMA targets its allocation to the respective node each time).
      And most of us were usually testing with CONFIG_NUMA=y kernels.
      
      collapse_file(old start)
        new_page = khugepaged_alloc_page(hpage)
        __SetPageLocked(new_page)
        new_page->index = start // hpage->index=old offset
        new_page->mapping = mapping
        xas_store(&xas, new_page)
      
                                filemap_fault
                                  page = find_get_page(mapping, offset)
                                  // if offset falls inside hpage then
                                  // compound_head(page) == hpage
                                  lock_page_maybe_drop_mmap()
                                    __lock_page(page)
      
        // collapse fails
        xas_store(&xas, old page)
        new_page->mapping = NULL
        unlock_page(new_page)
      
      collapse_file(new start)
        new_page = khugepaged_alloc_page(hpage)
        __SetPageLocked(new_page)
        new_page->index = start // hpage->index=new offset
        new_page->mapping = mapping // mapping becomes valid again
      
                                  // since compound_head(page) == hpage
                                  // page_to_pgoff(page) got changed
                                  VM_BUG_ON_PAGE(page_to_pgoff(page) != offset)
      
      An initial patch replaced __SetPageLocked() by lock_page(), which did
      fix the race which Suren illustrates above.  But testing showed that it's
      not good enough: if the racing task's __lock_page() gets delayed long
      after its find_get_page(), then it may follow collapse_file(new start)'s
      successful final unlock_page(), and crash on the same VM_BUG_ON_PAGE.
      
      It could be fixed by relaxing filemap_fault()'s VM_BUG_ON_PAGE to a
      check and retry (as is done for mapping), with similar relaxations in
      find_lock_entry() and pagecache_get_page(): but it's not obvious what
      else might get caught out; and khugepaged non-NUMA appears to be unique
      in exposing a page to page cache, then revoking, without going through
      a full cycle of freeing before reuse.
      
      Instead, non-NUMA khugepaged_prealloc_page() release the old page
      if anyone else has a reference to it (1% of cases when I tested).
      
      Although never reported on huge tmpfs, I believe its find_lock_entry()
      has been at similar risk; but huge tmpfs does not rely on khugepaged
      for its normal working nearly so much as READ_ONLY_THP_FOR_FS does.
      Reported-by: default avatarDenis Lisov <dennis.lissov@gmail.com>
      Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=206569
      Link: https://lore.kernel.org/linux-mm/?q=20200219144635.3b7417145de19b65f258c943%40linux-foundation.orgReported-by: default avatarQian Cai <cai@lca.pw>
      Link: https://lore.kernel.org/linux-xfs/?q=20200616013309.GB815%40lca.pwReported-and-analyzed-by: default avatarSuren Baghdasaryan <surenb@google.com>
      Fixes: 87c460a0 ("mm/khugepaged: collapse_shmem() without freezing new_page")
      Signed-off-by: default avatarHugh Dickins <hughd@google.com>
      Cc: stable@vger.kernel.org # v4.9+
      Reviewed-by: default avatarMatthew Wilcox (Oracle) <willy@infradead.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      033b5d77