• Hugh Dickins's avatar
    tmpfs: demolish old swap vector support · 285b2c4f
    Hugh Dickins authored
    The maximum size of a shmem/tmpfs file has been limited by the maximum
    size of its triple-indirect swap vector.  With 4kB page size, maximum
    filesize was just over 2TB on a 32-bit kernel, but sadly one eighth of
    that on a 64-bit kernel.  (With 8kB page size, maximum filesize was just
    over 4TB on a 64-bit kernel, but 16TB on a 32-bit kernel,
    MAX_LFS_FILESIZE being then more restrictive than swap vector layout.)
    
    It's a shame that tmpfs should be more restrictive than ramfs, and this
    limitation has now been noticed.  Add another level to the swap vector?
    No, it became obscure and hard to maintain, once I complicated it to
    make use of highmem pages nine years ago: better choose another way.
    
    Surely, if 2.4 had had the radix tree pagecache introduced in 2.5, then
    tmpfs would never have invented its own peculiar radix tree: we would
    have fitted swap entries into the common radix tree instead, in much the
    same way as we fit swap entries into page tables.
    
    And why should each file have a separate radix tree for its pages and
    for its swap entries? The swap entries are required precisely where and
    when the pages are not.  We want to put them together in a single radix
    tree: which can then avoid much of the locking which was needed to
    prevent them from being exchanged underneath us.
    
    This also avoids the waste of memory devoted to swap vectors, first in
    the shmem_inode itself, then at least two more pages once a file grew
    beyond 16 data pages (pages accounted by df and du, but not by memcg).
    Allocated upfront, to avoid allocation when under swapping pressure, but
    pure waste when CONFIG_SWAP is not set - I have never spattered around
    the ifdefs to prevent that, preferring this move to sharing the common
    radix tree instead.
    
    There are three downsides to sharing the radix tree.  One, that it binds
    tmpfs more tightly to the rest of mm, either requiring knowledge of swap
    entries in radix tree there, or duplication of its code here in shmem.c.
    I believe that the simplications and memory savings (and probable higher
    performance, not yet measured) justify that.
    
    Two, that on HIGHMEM systems with SWAP enabled, it's the lowmem radix
    nodes that cannot be freed under memory pressure - whereas before it was
    the less precious highmem swap vector pages that could not be freed.
    I'm hoping that 64-bit has now been accessible for long enough, that the
    highmem argument has grown much less persuasive.
    
    Three, that swapoff is slower than it used to be on tmpfs files, since
    it's using a simple generic mechanism not tailored to it: I find this
    noticeable, and shall want to improve, but maybe nobody else will
    notice.
    
    So...  now remove most of the old swap vector code from shmem.c.  But,
    for the moment, keep the simple i_direct vector of 16 pages, with simple
    accessors shmem_put_swap() and shmem_get_swap(), as a toy implementation
    to help mark where swap needs to be handled in subsequent patches.
    Signed-off-by: default avatarHugh Dickins <hughd@google.com>
    Acked-by: default avatarRik van Riel <riel@redhat.com>
    Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
    Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
    285b2c4f
shmem.c 61.8 KB