    s390/mm: fix 2KB pgtable release race · c2c22493
    Alexander Gordeev authored
    There is a race between concurrent 2KB-pgtable release paths when
    both the upper and lower halves of the containing parent page are
    freed, one via page_table_free_rcu() + __tlb_remove_table() and
    the other via page_table_free(). The race might lead to
    corruption, because the list item is removed in page_table_free()
    concurrently with __free_page() in __tlb_remove_table().
    
    Let's assume that first the lower and then the upper 2KB-pgtable
    is freed from a page. Since both halves of the page are allocated,
    the tracking byte (bits 24-31 of the page _refcount) initially has
    the value 0x03:
    
    CPU0				CPU1
    ----				----
    
    page_table_free_rcu() // lower half
    {
    	// _refcount[31..24] == 0x03
    	...
    	atomic_xor_bits(&page->_refcount,
    			0x11U << (0 + 24));
    	// _refcount[31..24] <= 0x12
    	...
    	table = table | (1U << 0);
    	tlb_remove_table(tlb, table);
    }
    ...
    __tlb_remove_table()
    {
    	// _refcount[31..24] == 0x12
    	mask = _table & 3;
    	// mask <= 0x01
    	...
    
    				page_table_free() // upper half
    				{
    					// _refcount[31..24] == 0x12
    					...
    					atomic_xor_bits(
    						&page->_refcount,
    						1U << (1 + 24));
    					// _refcount[31..24] <= 0x10
    					// mask <= 0x10
    					...
    	atomic_xor_bits(&page->_refcount,
    			mask << (4 + 24));
    	// _refcount[31..24] <= 0x00
    	// mask <= 0x00
    	...
    	if (mask != 0) // == false
    		break;
    	fallthrough;
    	...
    					if (mask & 3) // == false
    						...
    					else
    	__free_page(page);			list_del(&page->lru);
    	^^^^^^^^^^^^^^^^^^	RACE!		^^^^^^^^^^^^^^^^^^^^^
    }					...
    				}
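
    The bit arithmetic in the diagram can be replayed in user space.
    The sketch below is only an illustration, not kernel code:
    xor_bits(), struct race_result and replay_race() are hypothetical
    names, xor_bits() is a plain non-atomic stand-in for
    atomic_xor_bits() and is assumed to return the new value, and the
    two CPUs are serialized in the order shown above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, non-atomic stand-in for atomic_xor_bits(); returns the new value. */
static uint32_t xor_bits(uint32_t *v, uint32_t bits)
{
	*v ^= bits;
	return *v;
}

struct race_result {
	unsigned int after_free_rcu;	/* tracking byte after page_table_free_rcu() */
	unsigned int cpu1_mask;		/* mask seen by page_table_free() */
	unsigned int cpu0_mask;		/* mask seen by __tlb_remove_table() */
	int cpu1_list_del;		/* CPU1 takes the list_del() branch */
	int cpu0_free_page;		/* CPU0 reaches __free_page() */
};

/* Replay the interleaving from the diagram; rcu_bit and free_bit select the halves. */
static struct race_result replay_race(unsigned int rcu_bit, unsigned int free_bit)
{
	struct race_result r;
	uint32_t refcount = 0x03U << 24;	/* both halves allocated */

	assert(rcu_bit <= 1 && free_bit <= 1 && rcu_bit != free_bit);

	/* CPU0: page_table_free_rcu() clears the allocation bit, sets the pending bit. */
	r.after_free_rcu = xor_bits(&refcount, 0x11U << (rcu_bit + 24)) >> 24;

	/* CPU1: page_table_free() before the fix clears only its allocation bit. */
	r.cpu1_mask = xor_bits(&refcount, 1U << (free_bit + 24)) >> 24;
	r.cpu1_list_del = !(r.cpu1_mask & 3);

	/* CPU0: __tlb_remove_table() clears the pending bit and sees zero. */
	r.cpu0_mask = xor_bits(&refcount, (1U << rcu_bit) << (4 + 24)) >> 24;
	r.cpu0_free_page = (r.cpu0_mask == 0);

	return r;
}
```

    Calling replay_race(0, 1) reproduces the values 0x12, 0x10 and
    0x00 from the diagram, with both cpu1_list_del and cpu0_free_page
    set, which is the overlap marked as RACE.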
    
    The problem is that page_table_free() clears its bit in the lower
    nibble, which lets __tlb_remove_table() observe a zero tracking
    byte and release the page too early, while page_table_free() still
    has to do list_del(). With this update page_table_free() uses
    logic similar to page_table_free_rcu() + __tlb_remove_table() and
    marks the fragment as pending for removal in the upper nibble until
    after the list_del().
    
    In other words, the parent page is considered unreferenced and
    safe to release only when the lower nibble is already cleared and
    clearing a bit in the upper nibble leaves that nibble zero.
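
    One way to read this rule is as a two-step update of the tracking
    byte. The user-space sketch below is an assumption-laden
    illustration, not the patch itself: xor_bits(), mark_pending() and
    clear_pending() are hypothetical names, xor_bits() is a non-atomic
    stand-in for atomic_xor_bits(), and locking and list handling are
    left out.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, non-atomic stand-in for atomic_xor_bits(); returns the new value. */
static uint32_t xor_bits(uint32_t *v, uint32_t bits)
{
	*v ^= bits;
	return *v;
}

/*
 * Step 1: clear the allocation bit (lower nibble) and set the pending
 * bit (upper nibble) of one half. Returns nonzero while the other half
 * is still allocated, i.e. the page stays on the list.
 */
static int mark_pending(uint32_t *refcount, unsigned int bit)
{
	assert(bit <= 1);
	return ((xor_bits(refcount, 0x11U << (bit + 24)) >> 24) & 0x03U) != 0;
}

/*
 * Step 2, after the list update: clear the pending bit. Returns nonzero
 * only when the whole tracking byte is zero, i.e. the caller may
 * release the parent page.
 */
static int clear_pending(uint32_t *refcount, unsigned int bit)
{
	assert(bit <= 1);
	return (xor_bits(refcount, 0x10U << (bit + 24)) >> 24) == 0;
}
```

    With both halves going through these two steps, the tracking byte
    moves 0x03, 0x12, 0x30, 0x20, 0x00 for the interleaving above, so
    only the path that clears the last pending bit sees zero.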
    
    Cc: stable@vger.kernel.org
    Suggested-by: Vlastimil Babka <vbabka@suse.com>
    Reviewed-by: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
    Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
    Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
pgalloc.c 15.1 KB