• Mel Gorman's avatar
    mm: pin address_space before dereferencing it while isolating an LRU page · 01f10aa3
    Mel Gorman authored
    BugLink: https://bugs.launchpad.net/bugs/1775771
    
    [ Upstream commit 69d763fc ]
    
    Minchan Kim asked the following question -- what locks protects
    address_space destroying when race happens between inode trauncation and
    __isolate_lru_page? Jan Kara clarified by describing the race as follows
    
    CPU1                                            CPU2
    
    truncate(inode)                                 __isolate_lru_page()
      ...
      truncate_inode_page(mapping, page);
        delete_from_page_cache(page)
          spin_lock_irqsave(&mapping->tree_lock, flags);
            __delete_from_page_cache(page, NULL)
              page_cache_tree_delete(..)
                ...                                   mapping = page_mapping(page);
                page->mapping = NULL;
                ...
          spin_unlock_irqrestore(&mapping->tree_lock, flags);
          page_cache_free_page(mapping, page)
            put_page(page)
              if (put_page_testzero(page)) -> false
    - inode now has no pages and can be freed including embedded address_space
    
                                                      if (mapping && !mapping->a_ops->migratepage)
    - we've dereferenced mapping which is potentially already free.
    
    The race is theoretically possible but unlikely.  Before the
    delete_from_page_cache, truncate_cleanup_page is called so the page is
    likely to be !PageDirty or PageWriteback which gets skipped by the only
    caller that checks the mappping in __isolate_lru_page.  Even if the race
    occurs, a substantial amount of work has to happen during a tiny window
    with no preemption but it could potentially be done using a virtual
    machine to artifically slow one CPU or halt it during the critical
    window.
    
    This patch should eliminate the race with truncation by try-locking the
    page before derefencing mapping and aborting if the lock was not
    acquired.  There was a suggestion from Huang Ying to use RCU as a
    side-effect to prevent mapping being freed.  However, I do not like the
    solution as it's an unconventional means of preserving a mapping and
    it's not a context where rcu_read_lock is obviously protecting rcu data.
    
    Link: http://lkml.kernel.org/r/20180104102512.2qos3h5vqzeisrek@techsingularity.net
    Fixes: c8244935 ("mm: compaction: make isolate_lru_page() filter-aware again")
    Signed-off-by: default avatarMel Gorman <mgorman@techsingularity.net>
    Acked-by: default avatarMinchan Kim <minchan@kernel.org>
    Cc: "Huang, Ying" <ying.huang@intel.com>
    Cc: Jan Kara <jack@suse.cz>
    Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
    Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
    Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
    Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
    Signed-off-by: default avatarJuerg Haefliger <juergh@canonical.com>
    Signed-off-by: default avatarKhalid Elmously <khalid.elmously@canonical.com>
    01f10aa3
vmscan.c 111 KB