1. 03 Apr, 2009 40 commits
    • Linus Torvalds's avatar
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-2.6-fscache · 3cc50ac0
      Linus Torvalds authored
      * git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-2.6-fscache: (41 commits)
        NFS: Add mount options to enable local caching on NFS
        NFS: Display local caching state
        NFS: Store pages from an NFS inode into a local cache
        NFS: Read pages from FS-Cache into an NFS inode
        NFS: nfs_readpage_async() needs to be accessible as a fallback for local caching
        NFS: Add read context retention for FS-Cache to call back with
        NFS: FS-Cache page management
        NFS: Add some new I/O counters for FS-Cache doing things for NFS
        NFS: Invalidate FsCache page flags when cache removed
        NFS: Use local disk inode cache
        NFS: Define and create inode-level cache objects
        NFS: Define and create superblock-level objects
        NFS: Define and create server-level objects
        NFS: Register NFS for caching and retrieve the top-level index
        NFS: Permit local filesystem caching to be enabled for NFS
        NFS: Add FS-Cache option bit and debug bit
        NFS: Add comment banners to some NFS functions
        FS-Cache: Make kAFS use FS-Cache
        CacheFiles: A cache that backs onto a mounted filesystem
        CacheFiles: Export things for CacheFiles
        ...
      3cc50ac0
    • Linus Torvalds's avatar
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/agk/linux-2.6-dm · d9b9be02
      Linus Torvalds authored
      * git://git.kernel.org/pub/scm/linux/kernel/git/agk/linux-2.6-dm: (36 commits)
        dm: set queue ordered mode
        dm: move wait queue declaration
        dm: merge pushback and deferred bio lists
        dm: allow uninterruptible wait for pending io
        dm: merge __flush_deferred_io into caller
        dm: move bio_io_error into __split_and_process_bio
        dm: rename __split_bio
        dm: remove unnecessary struct dm_wq_req
        dm: remove unnecessary work queue context field
        dm: remove unnecessary work queue type field
        dm: bio list add bio_list_add_head
        dm snapshot: persistent fix dtr cleanup
        dm snapshot: move status to exception store
        dm snapshot: move ctr parsing to exception store
        dm snapshot: use DMEMIT macro for status
        dm snapshot: remove dm_snap header
        dm snapshot: remove dm_snap header use
        dm exception store: move cow pointer
        dm exception store: move chunk_fields
        dm exception store: move dm_target pointer
        ...
      d9b9be02
    • Linus Torvalds's avatar
      Merge branch 'for-linus' of git://git.open-osd.org/linux-open-osd · 9b59f031
      Linus Torvalds authored
      * 'for-linus' of git://git.open-osd.org/linux-open-osd:
        fs: Add exofs to Kernel build
        exofs: Documentation
        exofs: export_operations
        exofs: super_operations and file_system_type
        exofs: dir_inode and directory operations
        exofs: address_space_operations
        exofs: symlink_inode and fast_symlink_inode operations
        exofs: file and file_inode operations
        exofs: Kbuild, Headers and osd utils
      9b59f031
    • Linus Torvalds's avatar
      Merge branch 'for-linus' of git://oss.sgi.com/xfs/xfs · ac7c1a77
      Linus Torvalds authored
      * 'for-linus' of git://oss.sgi.com/xfs/xfs: (61 commits)
        Revert "xfs: increase the maximum number of supported ACL entries"
        xfs: cleanup uuid handling
        xfs: remove m_attroffset
        xfs: fix various typos
        xfs: pagecache usage optimization
        xfs: remove m_litino
        xfs: kill ino64 mount option
        xfs: kill mutex_t typedef
        xfs: increase the maximum number of supported ACL entries
        xfs: factor out code to find the longest free extent in the AG
        xfs: kill VN_BAD
        xfs: kill vn_atime_* helpers.
        xfs: cleanup xlog_bread
        xfs: cleanup xlog_recover_do_trans
        xfs: remove another leftover of the old inode log item format
        xfs: cleanup log unmount handling
        Fix xfs debug build breakage by pushing xfs_error.h after
        xfs: include header files for prototypes
        xfs: make symbols static
        xfs: move declaration to header file
        ...
      ac7c1a77
    • Linus Torvalds's avatar
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/kyle/parisc-2.6 · 3ba113d1
      Linus Torvalds authored
      * git://git.kernel.org/pub/scm/linux/kernel/git/kyle/parisc-2.6: (23 commits)
        parisc: move dereference_function_descriptor to process.c
        parisc: Move kernel Elf_Fdesc define to <asm/elf.h>
        parisc: fix build when ARCH_HAS_KMAP
        parisc: fix "make tar-pkg"
        parisc: drivers: fix warnings
        parisc: select BUG always
        parisc: asm/pdc.h should include asm/page.h
        parisc: led: remove proc_dir_entry::owner
        parisc: fix macro expansion in atomic.h
        parisc: iosapic: fix build breakage
        parisc: oops_enter()/oops_exit() in die()
        parisc: document light weight syscall ABI
        parisc: blink all or loadavg LEDs on oops
        parisc: add ftrace (function and graph tracer) functionality
        parisc: simplify sys_clone()
        parisc: add LATENCYTOP_SUPPORT and CONFIG_STACKTRACE_SUPPORT
        parisc: allow to build with 16k default kernel page size
        parisc: expose 32/64-bit capabilities in cpuinfo
        parisc: use constants instead of numbers in assembly
        parisc: fix usage of 32bit PTE page table entries on 32bit kernels
        ...
      3ba113d1
    • Linus Torvalds's avatar
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/kyle/rtc-parisc · bad6a5c0
      Linus Torvalds authored
      * git://git.kernel.org/pub/scm/linux/kernel/git/kyle/rtc-parisc:
        powerpc/ps3: Add rtc-ps3
        powerpc: Hook up rtc-generic, and kill rtc-ppc
        m68k: Hook up rtc-generic
        parisc: rtc: Rename rtc-parisc to rtc-generic
        parisc: rtc: Add missing module alias
        parisc: rtc: platform_driver_probe() fixups
        parisc: rtc: get_rtc_time() returns unsigned int
      bad6a5c0
    • Linus Torvalds's avatar
      Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-udf-2.6 · 03c3fa0a
      Linus Torvalds authored
      * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-udf-2.6:
        udf: Don't write integrity descriptor too often
        udf: Try anchor in block 256 first
        udf: Some type fixes and cleanups
        udf: use hardware sector size
        udf: fix novrs mount option
        udf: Fix oops when invalid character in filename occurs
        udf: return f_fsid for statfs(2)
        udf: Add checks to not underflow sector_t
        udf: fix default mode and dmode options handling
        udf: fix sparse warnings:
        udf: unsigned last[i] cannot be less than 0
        udf: implement mode and dmode mounting options
        udf: reduce stack usage of udf_get_filename
        udf: reduce stack usage of udf_load_pvoldesc
        Fix the udf code not to pass structs on stack where possible.
        Remove struct typedefs from fs/udf/ecma_167.h et al.
      03c3fa0a
    • Linus Torvalds's avatar
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/rcu-doc-2.6 · 3e850509
      Linus Torvalds authored
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/rcu-doc-2.6:
        Doc: Fix spelling in RCU/rculist_nulls.txt.
        Doc: Fix wrong API example usage of call_rcu().
        Doc: Fix missing whitespaces in RCU documentation.
      3e850509
    • Akinobu Mita's avatar
      mm: fix misuse of debug_kmap_atomic · a0e0404f
      Akinobu Mita authored
      Commit 7ca43e75 ("mm: use debug_kmap_atomic")
      introduced some debug_kmap_atomic() in wrong places.
      Signed-off-by: default avatarAkinobu Mita <akinobu.mita@gmail.com>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      a0e0404f
    • Kumar Gala's avatar
      Fix highmem PPC build failure · 3688e07f
      Kumar Gala authored
      Commit f4112de6 ("mm: introduce
      debug_kmap_atomic") broke PPC builds with CONFIG_HIGHMEM=y:
      
         CC      init/main.o
        In file included from include/linux/highmem.h:25,
                         from include/linux/pagemap.h:11,
                         from include/linux/mempolicy.h:63,
                         from init/main.c:53:
        arch/powerpc/include/asm/highmem.h: In function 'kmap_atomic_prot':
        arch/powerpc/include/asm/highmem.h:98: error: implicit declaration of function 'debug_kmap_atomic'
        In file included from include/linux/pagemap.h:11,
                         from include/linux/mempolicy.h:63,
                         from init/main.c:53:
        include/linux/highmem.h: At top level:
        include/linux/highmem.h:196: warning: conflicting types for 'debug_kmap_atomic'
        include/linux/highmem.h:196: error: static declaration of 'debug_kmap_atomic' follows non-static declaration
        include/asm/highmem.h:98: error: previous implicit declaration of 'debug_kmap_atomic' was here
        make[1]: *** [init/main.o] Error 1
        make: *** [init] Error 2
      Signed-off-by: default avatarKumar Gala <galak@kernel.crashing.org>
      Acked-by: default avatarAkinobu Mita <akinobu.mita@gmail.com>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      3688e07f
    • Linus Torvalds's avatar
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 · c54c4dec
      Linus Torvalds authored
      * git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
        crypto: ixp4xx - Fix handling of chained sg buffers
        crypto: shash - Fix unaligned calculation with short length
        hwrng: timeriomem - Use phys address rather than virt
      c54c4dec
    • Linus Torvalds's avatar
      Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/gerg/m68knommu · 5de1ccbe
      Linus Torvalds authored
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/gerg/m68knommu: (41 commits)
        m68knommu: improve compile arch switch settings
        m68knommu: fix 5407 ColdFire UART vector setup
        m68knommu: fix 5307 ColdFire UART vector setup
        m68knommu: fix 5249 ColdFire UART vector setup
        m68knommu: fix 5249 ColdFire UART setup
        m68knommu: fix end of uart table marker
        m68knommu: switch to using generic_handle_irq()
        m68k: merge the mmu and non-mmu versions of tlbflush.h
        m68knommu: introduce basic clk infrastructure
        m68k: merge the mmu and non-mmu versions of module.h
        m68knommu: add missing interrupt line definition for UART 2
        m68k: merge the mmu and non-mmu versions of mmu_context.h
        m68k: merge the mmu and non-mmu versions of current.h
        m68k: merge the mmu and non-mmu versions of div64.h
        m68k: merge the mmu and non-mmu versions of bugs.h
        m68k: merge the mmu and non-mmu versions of bug.h
        m68k: use the mmu version of cache.h for m68knommu as well
        m68k: use the mmu version of bootinfo.h for m68knommu as well
        m68k: merge the mmu and non-mmu versions of fb.h
        m68k: merge the mmu and non-mmu versions of segment.h
        ...
      5de1ccbe
    • Linus Torvalds's avatar
      Merge branch 'for-linus' of git://neil.brown.name/md · 223cdea4
      Linus Torvalds authored
      * 'for-linus' of git://neil.brown.name/md: (53 commits)
        md/raid5 revise rules for when to update metadata during reshape
        md/raid5: minor code cleanups in make_request.
        md: remove CONFIG_MD_RAID_RESHAPE config option.
        md/raid5: be more careful about write ordering when reshaping.
        md: don't display meaningless values in sysfs files resync_start and sync_speed
        md/raid5: allow layout and chunksize to be changed on active array.
        md/raid5: reshape using largest of old and new chunk size
        md/raid5: prepare for allowing reshape to change layout
        md/raid5: prepare for allowing reshape to change chunksize.
        md/raid5: clearly differentiate 'before' and 'after' stripes during reshape.
        Documentation/md.txt update
        md: allow number of drives in raid5 to be reduced
        md/raid5: change reshape-progress measurement to cope with reshaping backwards.
        md: add explicit method to signal the end of a reshape.
        md/raid5: enhance raid5_size to work correctly with negative delta_disks
        md/raid5: drop qd_idx from r6_state
        md/raid6: move raid6 data processing to raid6_pq.ko
        md: raid5 run(): Fix max_degraded for raid level 4.
        md: 'array_size' sysfs attribute
        md: centralize ->array_sectors modifications
        ...
      223cdea4
    • Linus Torvalds's avatar
      Merge master.kernel.org:/home/rmk/linux-2.6-arm · 31e6e2da
      Linus Torvalds authored
      * master.kernel.org:/home/rmk/linux-2.6-arm:
        [ARM] fix build-breaking 7a192ec3 commit
        ARM: Add SMSC911X support to Overo platform (V2)
        arm: update omap_ldp defconfig to use smsc911x
        arm: update realview defconfigs to use smsc911x
        arm: update pcm037 defconfig to use smsc911x
        arm: convert omap ldp platform to use smsc911x
        arm: convert realview platform to use smsc911x
        arm: convert pcm037 platform to use smsc911x
        [ARM] 5444/1: ARM: Realview: Fix event-device multiplicators in localtimer.c
        [ARM] 5442/1: pxa/cm-x255: fix reverse RDY gpios in PCMCIA driver
        [ARM] 5441/1: Use pr_err on error paths in at91 pm
        [ARM] 5440/1: Fix VFP state corruption due to preemption during VFP exceptions
        [ARM] 5439/1: Do not clear bit 10 of DFSR during abort handling on ARMv6
        [ARM] 5437/1: Add documentation for "nohlt" kernel parameter
        [ARM] 5436/1: ARM: OMAP: Fix compile for rx51
        [ARM] arch_reset() now takes a second parameter
        [ARM] Kirkwood: small L2 code cleanup
        [ARM] Kirkwood: invalidate L2 cache before enabling it
      31e6e2da
    • Linus Torvalds's avatar
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/bart/linux-hdreg-h-cleanup · ea02259f
      Linus Torvalds authored
      * git://git.kernel.org/pub/scm/linux/kernel/git/bart/linux-hdreg-h-cleanup:
        remove <linux/ata.h> include from <linux/hdreg.h>
        include/linux/hdreg.h: remove unused defines
        isd200: use ATA_* defines instead of *_STAT and *_ERR ones
        include/linux/hdreg.h: cover WIN_* and friends with #ifndef/#endif __KERNEL__
        aoe: WIN_* -> ATA_CMD_*
        isd200: WIN_* -> ATA_CMD_*
        include/linux/hdreg.h: cover struct hd_driveid with #ifndef/#endif __KERNEL__
        xsysace: make it 'struct hd_driveid'-free
        ubd_kern: make it 'struct hd_driveid'-free
        isd200: make it 'struct hd_driveid'-free
      ea02259f
    • David Howells's avatar
      NFS: Add mount options to enable local caching on NFS · b797cac7
      David Howells authored
      Add NFS mount options to allow the local caching support to be enabled.
      
      The attached patch makes it possible for the NFS filesystem to be told to make
      use of the network filesystem local caching service (FS-Cache).
      
      To be able to use this, a recent nfsutils package is required.
      
      There are three variant NFS mount options that can be added to a mount command
      to control caching for a mount.  Only the last one specified takes effect:
      
       (*) Adding "fsc" will request caching.
      
       (*) Adding "fsc=<string>" will request caching and also specify a uniquifier.
      
       (*) Adding "nofsc" will disable caching.
      
      For example:
      
      	mount warthog:/ /a -o fsc
      
      The cache of a particular superblock (NFS FSID) will be shared between all
      mounts of that volume, provided they have the same connection parameters and
      are not marked 'nosharecache'.
      
      Where it is otherwise impossible to distinguish superblocks because all the
      parameters are identical, but the 'nosharecache' option is supplied, a
      uniquifying string must be supplied, else only the first mount will be
      permitted to use the cache.
      
      If there's a key collision, then the second mount will disable caching and give
      a warning into the kernel log.
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      Acked-by: default avatarSteve Dickson <steved@redhat.com>
      Acked-by: default avatarTrond Myklebust <Trond.Myklebust@netapp.com>
      Acked-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      Tested-by: default avatarDaire Byrne <Daire.Byrne@framestore.com>
      b797cac7
    • David Howells's avatar
      NFS: Display local caching state · 5d1acff1
      David Howells authored
      Display the local caching state in /proc/fs/nfsfs/volumes.
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      Acked-by: default avatarSteve Dickson <steved@redhat.com>
      Acked-by: default avatarTrond Myklebust <Trond.Myklebust@netapp.com>
      Acked-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      Tested-by: default avatarDaire Byrne <Daire.Byrne@framestore.com>
      5d1acff1
    • David Howells's avatar
      NFS: Store pages from an NFS inode into a local cache · 7f8e05f6
      David Howells authored
      Store pages from an NFS inode into the cache data storage object associated
      with that inode.
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      Acked-by: default avatarSteve Dickson <steved@redhat.com>
      Acked-by: default avatarTrond Myklebust <Trond.Myklebust@netapp.com>
      Acked-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      Tested-by: default avatarDaire Byrne <Daire.Byrne@framestore.com>
      7f8e05f6
    • David Howells's avatar
      NFS: Read pages from FS-Cache into an NFS inode · 9a9fc1c0
      David Howells authored
      Read pages from an FS-Cache data storage object representing an inode into an
      NFS inode.
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      Acked-by: default avatarSteve Dickson <steved@redhat.com>
      Acked-by: default avatarTrond Myklebust <Trond.Myklebust@netapp.com>
      Acked-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      Tested-by: default avatarDaire Byrne <Daire.Byrne@framestore.com>
      9a9fc1c0
    • David Howells's avatar
      NFS: nfs_readpage_async() needs to be accessible as a fallback for local caching · f42b293d
      David Howells authored
      nfs_readpage_async() needs to be non-static so that it can be used as a
      fallback for the local on-disk caching should an EIO crop up when reading the
      cache.
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      Acked-by: default avatarSteve Dickson <steved@redhat.com>
      Acked-by: default avatarTrond Myklebust <Trond.Myklebust@netapp.com>
      Acked-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      Tested-by: default avatarDaire Byrne <Daire.Byrne@framestore.com>
      f42b293d
    • David Howells's avatar
      NFS: Add read context retention for FS-Cache to call back with · 1fcdf534
      David Howells authored
      Add read context retention so that FS-Cache can call back into NFS when a read
      operation on the cache fails EIO rather than reading data.  This permits NFS to
      then fetch the data from the server instead using the appropriate security
      context.
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      Acked-by: default avatarSteve Dickson <steved@redhat.com>
      Acked-by: default avatarTrond Myklebust <Trond.Myklebust@netapp.com>
      Acked-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      Tested-by: default avatarDaire Byrne <Daire.Byrne@framestore.com>
      1fcdf534
    • David Howells's avatar
      NFS: FS-Cache page management · 545db45f
      David Howells authored
      FS-Cache page management for NFS.  This includes hooking the releasing and
      invalidation of pages marked with PG_fscache (aka PG_private_2) and waiting for
      completion of the write-to-cache flag (PG_fscache_write aka PG_owner_priv_2).
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      Acked-by: default avatarSteve Dickson <steved@redhat.com>
      Acked-by: default avatarTrond Myklebust <Trond.Myklebust@netapp.com>
      Acked-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      Tested-by: default avatarDaire Byrne <Daire.Byrne@framestore.com>
      545db45f
    • David Howells's avatar
      NFS: Add some new I/O counters for FS-Cache doing things for NFS · 6a51091d
      David Howells authored
      Add some new NFS I/O counters for FS-Cache doing things for NFS.  A new line is
      emitted into /proc/pid/mountstats if caching is enabled that looks like:
      
      	fsc: <rok> <rfl> <wok> <wfl> <unc>
      
      Where <rok> is the number of pages read successfully from the cache, <rfl> is
      the number of failed page reads against the cache, <wok> is the number of
      successful page writes to the cache, <wfl> is the number of failed page writes
      to the cache, and <unc> is the number of NFS pages that have been disconnected
      from the cache.
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      Acked-by: default avatarSteve Dickson <steved@redhat.com>
      Acked-by: default avatarTrond Myklebust <Trond.Myklebust@netapp.com>
      Acked-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      Tested-by: default avatarDaire Byrne <Daire.Byrne@framestore.com>
      6a51091d
    • David Howells's avatar
      NFS: Invalidate FsCache page flags when cache removed · d599064a
      David Howells authored
      Invalidate the FsCache page flags on the pages belonging to an inode when the
      cache backing that NFS inode is removed.
      
      This allows a live cache to be withdrawn.
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      Acked-by: default avatarSteve Dickson <steved@redhat.com>
      Acked-by: default avatarTrond Myklebust <Trond.Myklebust@netapp.com>
      Acked-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      Tested-by: default avatarDaire Byrne <Daire.Byrne@framestore.com>
      d599064a
    • David Howells's avatar
      NFS: Use local disk inode cache · ef79c097
      David Howells authored
      Bind data storage objects in the local cache to NFS inodes.
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      Acked-by: default avatarSteve Dickson <steved@redhat.com>
      Acked-by: default avatarTrond Myklebust <Trond.Myklebust@netapp.com>
      Acked-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      Tested-by: default avatarDaire Byrne <Daire.Byrne@framestore.com>
      ef79c097
    • David Howells's avatar
      NFS: Define and create inode-level cache objects · 10329a5d
      David Howells authored
      Define and create inode-level cache data storage objects (as managed by
      nfs_inode structs).
      
      Each inode-level object is created in a superblock-level index object and is
      itself a data storage object into which pages from the inode are stored.
      
      The inode object key is the NFS file handle for the inode.
      
      The inode object is given coherency data to carry in the auxiliary data
      permitted by the cache.  This is a sequence made up of:
      
       (1) i_mtime from the NFS inode.
      
       (2) i_ctime from the NFS inode.
      
       (3) i_size from the NFS inode.
      
       (4) change_attr from the NFSv4 attribute data.
      
      As the cache is a persistent cache, the auxiliary data is checked when a new
      NFS in-memory inode is set up that matches an already existing data storage
      object in the cache.  If the coherency data is the same, the on-disk object is
      retained and used; if not, it is scrapped and a new one created.
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      Acked-by: default avatarSteve Dickson <steved@redhat.com>
      Acked-by: default avatarTrond Myklebust <Trond.Myklebust@netapp.com>
      Acked-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      Tested-by: default avatarDaire Byrne <Daire.Byrne@framestore.com>
      10329a5d
    • David Howells's avatar
      NFS: Define and create superblock-level objects · 08734048
      David Howells authored
      Define and create superblock-level cache index objects (as managed by
      nfs_server structs).
      
      Each superblock object is created in a server level index object and is itself
      an index into which inode-level objects are inserted.
      
      Ideally there would be one superblock-level object per server, and the former
      would be folded into the latter; however, since the "nosharecache" option
      exists this isn't possible.
      
      The superblock object key is a sequence consisting of:
      
       (1) Certain superblock s_flags.
      
       (2) Various connection parameters that serve to distinguish superblocks for
           sget().
      
       (3) The volume FSID.
      
       (4) The security flavour.
      
       (5) The uniquifier length.
      
       (6) The uniquifier text.  This is normally an empty string, unless the fsc=xyz
           mount option was used to explicitly specify a uniquifier.
      
      The key blob is of variable length, depending on the length of (6).
      
      The superblock object is given no coherency data to carry in the auxiliary data
      permitted by the cache.  It is assumed that the superblock is always coherent.
      
      This patch also adds uniquification handling such that two otherwise identical
      superblocks, at least one of which is marked "nosharecache", won't end up
      trying to share the on-disk cache.  It will be possible to manually provide a
      uniquifier through a mount option with a later patch to avoid the error
      otherwise produced.
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      Acked-by: default avatarSteve Dickson <steved@redhat.com>
      Acked-by: default avatarTrond Myklebust <Trond.Myklebust@netapp.com>
      Acked-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      Tested-by: default avatarDaire Byrne <Daire.Byrne@framestore.com>
      08734048
    • David Howells's avatar
      NFS: Define and create server-level objects · 14727281
      David Howells authored
      Define and create server-level cache index objects (as managed by nfs_client
      structs).
      
      Each server object is created in the NFS top-level index object and is itself
      an index into which superblock-level objects are inserted.
      
      Ideally there would be one superblock-level object per server, and the former
      would be folded into the latter; however, since the "nosharecache" option
      exists this isn't possible.
      
      The server object key is a sequence consisting of:
      
       (1) NFS version
      
       (2) Server address family (eg: AF_INET or AF_INET6)
      
       (3) Server port.
      
       (4) Server IP address.
      
      The key blob is of variable length, depending on the length of (4).
      
      The server object is given no coherency data to carry in the auxiliary data
      permitted by the cache.
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      Acked-by: default avatarSteve Dickson <steved@redhat.com>
      Acked-by: default avatarTrond Myklebust <Trond.Myklebust@netapp.com>
      Acked-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      Tested-by: default avatarDaire Byrne <Daire.Byrne@framestore.com>
      14727281
    • David Howells's avatar
      NFS: Register NFS for caching and retrieve the top-level index · 8ec442ae
      David Howells authored
      Register NFS for caching and retrieve the top-level cache index object cookie.
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      Acked-by: default avatarSteve Dickson <steved@redhat.com>
      Acked-by: default avatarTrond Myklebust <Trond.Myklebust@netapp.com>
      Acked-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      Tested-by: default avatarDaire Byrne <Daire.Byrne@framestore.com>
      8ec442ae
    • David Howells's avatar
      NFS: Permit local filesystem caching to be enabled for NFS · 3b9ce977
      David Howells authored
      Permit local filesystem caching to be enabled for NFS in the kernel
      configuration.
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      Acked-by: default avatarSteve Dickson <steved@redhat.com>
      Acked-by: default avatarTrond Myklebust <Trond.Myklebust@netapp.com>
      Acked-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      Tested-by: default avatarDaire Byrne <Daire.Byrne@framestore.com>
      3b9ce977
    • David Howells's avatar
      NFS: Add FS-Cache option bit and debug bit · c6a6f19e
      David Howells authored
      Add FS-Cache option bit to nfs_server struct.  This is set to indicate local
      on-disk caching is enabled for a particular superblock.
      
      Also add debug bit for local caching operations.
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      Acked-by: default avatarSteve Dickson <steved@redhat.com>
      Acked-by: default avatarTrond Myklebust <Trond.Myklebust@netapp.com>
      Acked-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      Tested-by: default avatarDaire Byrne <Daire.Byrne@framestore.com>
      c6a6f19e
    • David Howells's avatar
      NFS: Add comment banners to some NFS functions · 6b9b3514
      David Howells authored
      Add comment banners to some NFS functions so that they can be modified by the
      NFS fscache patches for further information.
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      Acked-by: default avatarSteve Dickson <steved@redhat.com>
      Acked-by: default avatarTrond Myklebust <Trond.Myklebust@netapp.com>
      Acked-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      Tested-by: default avatarDaire Byrne <Daire.Byrne@framestore.com>
      6b9b3514
    • David Howells's avatar
      FS-Cache: Make kAFS use FS-Cache · 9b3f26c9
      David Howells authored
      The attached patch makes the kAFS filesystem in fs/afs/ use FS-Cache, and
      through it any attached caches.  The kAFS filesystem will use caching
      automatically if it's available.
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      Acked-by: default avatarSteve Dickson <steved@redhat.com>
      Acked-by: default avatarTrond Myklebust <Trond.Myklebust@netapp.com>
      Acked-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      Tested-by: default avatarDaire Byrne <Daire.Byrne@framestore.com>
      9b3f26c9
    • David Howells's avatar
      CacheFiles: A cache that backs onto a mounted filesystem · 9ae326a6
      David Howells authored
      Add an FS-Cache cache-backend that permits a mounted filesystem to be used as a
      backing store for the cache.
      
      CacheFiles uses a userspace daemon to do some of the cache management - such as
      reaping stale nodes and culling.  This is called cachefilesd and lives in
      /sbin.  The source for the daemon can be downloaded from:
      
      	http://people.redhat.com/~dhowells/cachefs/cachefilesd.c
      
      And an example configuration from:
      
      	http://people.redhat.com/~dhowells/cachefs/cachefilesd.conf
      
      The filesystem and data integrity of the cache are only as good as those of the
      filesystem providing the backing services.  Note that CacheFiles does not
      attempt to journal anything since the journalling interfaces of the various
      filesystems are very specific in nature.
      
      CacheFiles creates a misc character device - "/dev/cachefiles" - that is used
      to communication with the daemon.  Only one thing may have this open at once,
      and whilst it is open, a cache is at least partially in existence.  The daemon
      opens this and sends commands down it to control the cache.
      
      CacheFiles is currently limited to a single cache.
      
      CacheFiles attempts to maintain at least a certain percentage of free space on
      the filesystem, shrinking the cache by culling the objects it contains to make
      space if necessary - see the "Cache Culling" section.  This means it can be
      placed on the same medium as a live set of data, and will expand to make use of
      spare space and automatically contract when the set of data requires more
      space.
      
      ============
      REQUIREMENTS
      ============
      
      The use of CacheFiles and its daemon requires the following features to be
      available in the system and in the cache filesystem:
      
      	- dnotify.
      
      	- extended attributes (xattrs).
      
      	- openat() and friends.
      
      	- bmap() support on files in the filesystem (FIBMAP ioctl).
      
      	- The use of bmap() to detect a partial page at the end of the file.
      
      It is strongly recommended that the "dir_index" option is enabled on Ext3
      filesystems being used as a cache.
      
      =============
      CONFIGURATION
      =============
      
      The cache is configured by a script in /etc/cachefilesd.conf.  These commands
      set up cache ready for use.  The following script commands are available:
      
       (*) brun <N>%
       (*) bcull <N>%
       (*) bstop <N>%
       (*) frun <N>%
       (*) fcull <N>%
       (*) fstop <N>%
      
      	Configure the culling limits.  Optional.  See the section on culling
      	The defaults are 7% (run), 5% (cull) and 1% (stop) respectively.
      
      	The commands beginning with a 'b' are file space (block) limits, those
      	beginning with an 'f' are file count limits.
      
       (*) dir <path>
      
      	Specify the directory containing the root of the cache.  Mandatory.
      
       (*) tag <name>
      
      	Specify a tag to FS-Cache to use in distinguishing multiple caches.
      	Optional.  The default is "CacheFiles".
      
       (*) debug <mask>
      
      	Specify a numeric bitmask to control debugging in the kernel module.
      	Optional.  The default is zero (all off).  The following values can be
      	OR'd into the mask to collect various information:
      
      		1	Turn on trace of function entry (_enter() macros)
      		2	Turn on trace of function exit (_leave() macros)
      		4	Turn on trace of internal debug points (_debug())
      
      	This mask can also be set through sysfs, eg:
      
      		echo 5 >/sys/modules/cachefiles/parameters/debug
      
      ==================
      STARTING THE CACHE
      ==================
      
      The cache is started by running the daemon.  The daemon opens the cache device,
      configures the cache and tells it to begin caching.  At that point the cache
      binds to fscache and the cache becomes live.
      
      The daemon is run as follows:
      
      	/sbin/cachefilesd [-d]* [-s] [-n] [-f <configfile>]
      
      The flags are:
      
       (*) -d
      
      	Increase the debugging level.  This can be specified multiple times and
      	is cumulative with itself.
      
       (*) -s
      
      	Send messages to stderr instead of syslog.
      
       (*) -n
      
      	Don't daemonise and go into background.
      
       (*) -f <configfile>
      
      	Use an alternative configuration file rather than the default one.
      
      ===============
      THINGS TO AVOID
      ===============
      
      Do not mount other things within the cache as this will cause problems.  The
      kernel module contains its own very cut-down path walking facility that ignores
      mountpoints, but the daemon can't avoid them.
      
      Do not create, rename or unlink files and directories in the cache whilst the
      cache is active, as this may cause the state to become uncertain.
      
      Renaming files in the cache might make objects appear to be other objects (the
      filename is part of the lookup key).
      
      Do not change or remove the extended attributes attached to cache files by the
      cache as this will cause the cache state management to get confused.
      
      Do not create files or directories in the cache, lest the cache get confused or
      serve incorrect data.
      
      Do not chmod files in the cache.  The module creates things with minimal
      permissions to prevent random users being able to access them directly.
      
      =============
      CACHE CULLING
      =============
      
      The cache may need culling occasionally to make space.  This involves
      discarding objects from the cache that have been used less recently than
      anything else.  Culling is based on the access time of data objects.  Empty
      directories are culled if not in use.
      
      Cache culling is done on the basis of the percentage of blocks and the
      percentage of files available in the underlying filesystem.  There are six
      "limits":
      
       (*) brun
       (*) frun
      
           If the amount of free space and the number of available files in the cache
           rises above both these limits, then culling is turned off.
      
       (*) bcull
       (*) fcull
      
           If the amount of available space or the number of available files in the
           cache falls below either of these limits, then culling is started.
      
       (*) bstop
       (*) fstop
      
           If the amount of available space or the number of available files in the
           cache falls below either of these limits, then no further allocation of
           disk space or files is permitted until culling has raised things above
           these limits again.
      
      These must be configured thusly:
      
      	0 <= bstop < bcull < brun < 100
      	0 <= fstop < fcull < frun < 100
      
      Note that these are percentages of available space and available files, and do
      _not_ appear as 100 minus the percentage displayed by the "df" program.
      
      The userspace daemon scans the cache to build up a table of cullable objects.
      These are then culled in least recently used order.  A new scan of the cache is
      started as soon as space is made in the table.  Objects will be skipped if
      their atimes have changed or if the kernel module says it is still using them.
      
      ===============
      CACHE STRUCTURE
      ===============
      
      The CacheFiles module will create two directories in the directory it was
      given:
      
       (*) cache/
      
       (*) graveyard/
      
      The active cache objects all reside in the first directory.  The CacheFiles
      kernel module moves any retired or culled objects that it can't simply unlink
      to the graveyard from which the daemon will actually delete them.
      
      The daemon uses dnotify to monitor the graveyard directory, and will delete
      anything that appears therein.
      
      The module represents index objects as directories with the filename "I..." or
      "J...".  Note that the "cache/" directory is itself a special index.
      
      Data objects are represented as files if they have no children, or directories
      if they do.  Their filenames all begin "D..." or "E...".  If represented as a
      directory, data objects will have a file in the directory called "data" that
      actually holds the data.
      
      Special objects are similar to data objects, except their filenames begin
      "S..." or "T...".
      
      If an object has children, then it will be represented as a directory.
      Immediately in the representative directory are a collection of directories
      named for hash values of the child object keys with an '@' prepended.  Into
      this directory, if possible, will be placed the representations of the child
      objects:
      
      	INDEX     INDEX      INDEX                             DATA FILES
      	========= ========== ================================= ================
      	cache/@4a/I03nfs/@30/Ji000000000000000--fHg8hi8400
      	cache/@4a/I03nfs/@30/Ji000000000000000--fHg8hi8400/@75/Es0g000w...DB1ry
      	cache/@4a/I03nfs/@30/Ji000000000000000--fHg8hi8400/@75/Es0g000w...N22ry
      	cache/@4a/I03nfs/@30/Ji000000000000000--fHg8hi8400/@75/Es0g000w...FP1ry
      
      If the key is so long that it exceeds NAME_MAX with the decorations added on to
      it, then it will be cut into pieces, the first few of which will be used to
      make a nest of directories, and the last one of which will be the objects
      inside the last directory.  The names of the intermediate directories will have
      '+' prepended:
      
      	J1223/@23/+xy...z/+kl...m/Epqr
      
      Note that keys are raw data, and not only may they exceed NAME_MAX in size,
      they may also contain things like '/' and NUL characters, and so they may not
      be suitable for turning directly into a filename.
      
      To handle this, CacheFiles will use a suitably printable filename directly and
      "base-64" encode ones that aren't directly suitable.  The two versions of
      object filenames indicate the encoding:
      
      	OBJECT TYPE	PRINTABLE	ENCODED
      	===============	===============	===============
      	Index		"I..."		"J..."
      	Data		"D..."		"E..."
      	Special		"S..."		"T..."
      
      Intermediate directories are always "@" or "+" as appropriate.
      
      Each object in the cache has an extended attribute label that holds the object
      type ID (required to distinguish special objects) and the auxiliary data from
      the netfs.  The latter is used to detect stale objects in the cache and update
      or retire them.
      
      Note that CacheFiles will erase from the cache any file it doesn't recognise or
      any file of an incorrect type (such as a FIFO file or a device file).
      
      ==========================
      SECURITY MODEL AND SELINUX
      ==========================
      
      CacheFiles is implemented to deal properly with the LSM security features of
      the Linux kernel and the SELinux facility.
      
      One of the problems that CacheFiles faces is that it is generally acting on
      behalf of a process, and running in that process's context, and that includes a
      security context that is not appropriate for accessing the cache - either
      because the files in the cache are inaccessible to that process, or because if
      the process creates a file in the cache, that file may be inaccessible to other
      processes.
      
      The way CacheFiles works is to temporarily change the security context (fsuid,
      fsgid and actor security label) that the process acts as - without changing the
      security context of the process when it the target of an operation performed by
      some other process (so signalling and suchlike still work correctly).
      
      When the CacheFiles module is asked to bind to its cache, it:
      
       (1) Finds the security label attached to the root cache directory and uses
           that as the security label with which it will create files.  By default,
           this is:
      
      	cachefiles_var_t
      
       (2) Finds the security label of the process which issued the bind request
           (presumed to be the cachefilesd daemon), which by default will be:
      
      	cachefilesd_t
      
           and asks LSM to supply a security ID as which it should act given the
           daemon's label.  By default, this will be:
      
      	cachefiles_kernel_t
      
           SELinux transitions the daemon's security ID to the module's security ID
           based on a rule of this form in the policy.
      
      	type_transition <daemon's-ID> kernel_t : process <module's-ID>;
      
           For instance:
      
      	type_transition cachefilesd_t kernel_t : process cachefiles_kernel_t;
      
      The module's security ID gives it permission to create, move and remove files
      and directories in the cache, to find and access directories and files in the
      cache, to set and access extended attributes on cache objects, and to read and
      write files in the cache.
      
      The daemon's security ID gives it only a very restricted set of permissions: it
      may scan directories, stat files and erase files and directories.  It may
      not read or write files in the cache, and so it is precluded from accessing the
      data cached therein; nor is it permitted to create new files in the cache.
      
      There are policy source files available in:
      
      	http://people.redhat.com/~dhowells/fscache/cachefilesd-0.8.tar.bz2
      
      and later versions.  In that tarball, see the files:
      
      	cachefilesd.te
      	cachefilesd.fc
      	cachefilesd.if
      
      They are built and installed directly by the RPM.
      
      If a non-RPM based system is being used, then copy the above files to their own
      directory and run:
      
      	make -f /usr/share/selinux/devel/Makefile
      	semodule -i cachefilesd.pp
      
      You will need checkpolicy and selinux-policy-devel installed prior to the
      build.
      
      By default, the cache is located in /var/fscache, but if it is desirable that
      it should be elsewhere, than either the above policy files must be altered, or
      an auxiliary policy must be installed to label the alternate location of the
      cache.
      
      For instructions on how to add an auxiliary policy to enable the cache to be
      located elsewhere when SELinux is in enforcing mode, please see:
      
      	/usr/share/doc/cachefilesd-*/move-cache.txt
      
      When the cachefilesd rpm is installed; alternatively, the document can be found
      in the sources.
      
      ==================
      A NOTE ON SECURITY
      ==================
      
      CacheFiles makes use of the split security in the task_struct.  It allocates
      its own task_security structure, and redirects current->act_as to point to it
      when it acts on behalf of another process, in that process's context.
      
      The reason it does this is that it calls vfs_mkdir() and suchlike rather than
      bypassing security and calling inode ops directly.  Therefore the VFS and LSM
      may deny the CacheFiles access to the cache data because under some
      circumstances the caching code is running in the security context of whatever
      process issued the original syscall on the netfs.
      
      Furthermore, should CacheFiles create a file or directory, the security
      parameters with that object is created (UID, GID, security label) would be
      derived from that process that issued the system call, thus potentially
      preventing other processes from accessing the cache - including CacheFiles's
      cache management daemon (cachefilesd).
      
      What is required is to temporarily override the security of the process that
      issued the system call.  We can't, however, just do an in-place change of the
      security data as that affects the process as an object, not just as a subject.
      This means it may lose signals or ptrace events for example, and affects what
      the process looks like in /proc.
      
      So CacheFiles makes use of a logical split in the security between the
      objective security (task->sec) and the subjective security (task->act_as).  The
      objective security holds the intrinsic security properties of a process and is
      never overridden.  This is what appears in /proc, and is what is used when a
      process is the target of an operation by some other process (SIGKILL for
      example).
      
      The subjective security holds the active security properties of a process, and
      may be overridden.  This is not seen externally, and is used whan a process
      acts upon another object, for example SIGKILLing another process or opening a
      file.
      
      LSM hooks exist that allow SELinux (or Smack or whatever) to reject a request
      for CacheFiles to run in a context of a specific security label, or to create
      files and directories with another security label.
      
      This documentation is added by the patch to:
      
      	Documentation/filesystems/caching/cachefiles.txt
      Signed-Off-By: default avatarDavid Howells <dhowells@redhat.com>
      Acked-by: default avatarSteve Dickson <steved@redhat.com>
      Acked-by: default avatarTrond Myklebust <Trond.Myklebust@netapp.com>
      Acked-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      Tested-by: default avatarDaire Byrne <Daire.Byrne@framestore.com>
      9ae326a6
    • David Howells's avatar
      CacheFiles: Export things for CacheFiles · 800a9647
      David Howells authored
      Export a number of functions for CacheFiles's use.
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      Acked-by: default avatarSteve Dickson <steved@redhat.com>
      Acked-by: default avatarTrond Myklebust <Trond.Myklebust@netapp.com>
      Acked-by: default avatarRik van Riel <riel@redhat.com>
      Acked-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      Tested-by: default avatarDaire Byrne <Daire.Byrne@framestore.com>
      800a9647
    • David Howells's avatar
      CacheFiles: Permit the page lock state to be monitored · 385e1ca5
      David Howells authored
      Add a function to install a monitor on the page lock waitqueue for a particular
      page, thus allowing the page being unlocked to be detected.
      
      This is used by CacheFiles to detect read completion on a page in the backing
      filesystem so that it can then copy the data to the waiting netfs page.
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      Acked-by: default avatarSteve Dickson <steved@redhat.com>
      Acked-by: default avatarTrond Myklebust <Trond.Myklebust@netapp.com>
      Acked-by: default avatarRik van Riel <riel@redhat.com>
      Acked-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      Tested-by: default avatarDaire Byrne <Daire.Byrne@framestore.com>
      385e1ca5
    • David Howells's avatar
      FS-Cache: Implement data I/O part of netfs API · b5108822
      David Howells authored
      Implement the data I/O part of the FS-Cache netfs API.  The documentation and
      API header file were added in a previous patch.
      
      This patch implements the following functions for the netfs to call:
      
       (*) fscache_attr_changed().
      
           Indicate that the object has changed its attributes.  The only attribute
           currently recorded is the file size.  Only pages within the set file size
           will be stored in the cache.
      
           This operation is submitted for asynchronous processing, and will return
           immediately.  It will return -ENOMEM if an out of memory error is
           encountered, -ENOBUFS if the object is not actually cached, or 0 if the
           operation is successfully queued.
      
       (*) fscache_read_or_alloc_page().
       (*) fscache_read_or_alloc_pages().
      
           Request data be fetched from the disk, and allocate internal metadata to
           track the netfs pages and reserve disk space for unknown pages.
      
           These operations perform semi-asynchronous data reads.  Upon returning
           they will indicate which pages they think can be retrieved from disk, and
           will have set in progress attempts to retrieve those pages.
      
           These will return, in order of preference, -ENOMEM on memory allocation
           error, -ERESTARTSYS if a signal interrupted proceedings, -ENODATA if one
           or more requested pages are not yet cached, -ENOBUFS if the object is not
           actually cached or if there isn't space for future pages to be cached on
           this object, or 0 if successful.
      
           In the case of the multipage function, the pages for which reads are set
           in progress will be removed from the list and the page count decreased
           appropriately.
      
           If any read operations should fail, the completion function will be given
           an error, and will also be passed contextual information to allow the
           netfs to fall back to querying the server for the absent pages.
      
           For each successful read, the page completion function will also be
           called.
      
           Any pages subsequently tracked by the cache will have PG_fscache set upon
           them on return.  fscache_uncache_page() must be called for such pages.
      
           If supplied by the netfs, the mark_pages_cached() cookie op will be
           invoked for any pages now tracked.
      
       (*) fscache_alloc_page().
      
           Allocate internal metadata to track a netfs page and reserve disk space.
      
           This will return -ENOMEM on memory allocation error, -ERESTARTSYS on
           signal, -ENOBUFS if the object isn't cached, or there isn't enough space
           in the cache, or 0 if successful.
      
           Any pages subsequently tracked by the cache will have PG_fscache set upon
           them on return.  fscache_uncache_page() must be called for such pages.
      
           If supplied by the netfs, the mark_pages_cached() cookie op will be
           invoked for any pages now tracked.
      
       (*) fscache_write_page().
      
           Request data be stored to disk.  This may only be called on pages that
           have been read or alloc'd by the above three functions and have not yet
           been uncached.
      
           This will return -ENOMEM on memory allocation error, -ERESTARTSYS on
           signal, -ENOBUFS if the object isn't cached, or there isn't immediately
           enough space in the cache, or 0 if successful.
      
           On a successful return, this operation will have queued the page for
           asynchronous writing to the cache.  The page will be returned with
           PG_fscache_write set until the write completes one way or another.  The
           caller will not be notified if the write fails due to an I/O error.  If
           that happens, the object will become available and all pending writes will
           be aborted.
      
           Note that the cache may batch up page writes, and so it may take a while
           to get around to writing them out.
      
           The caller must assume that until PG_fscache_write is cleared the page is
           use by the cache.  Any changes made to the page may be reflected on disk.
           The page may even be under DMA.
      
       (*) fscache_uncache_page().
      
           Indicate that the cache should stop tracking a page previously read or
           alloc'd from the cache.  If the page was alloc'd only, but unwritten, it
           will not appear on disk.
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      Acked-by: default avatarSteve Dickson <steved@redhat.com>
      Acked-by: default avatarTrond Myklebust <Trond.Myklebust@netapp.com>
      Acked-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      Tested-by: default avatarDaire Byrne <Daire.Byrne@framestore.com>
      b5108822
    • David Howells's avatar
      FS-Cache: Add and document asynchronous operation handling · 952efe7b
      David Howells authored
      Add and document asynchronous operation handling for use by FS-Cache's data
      storage and retrieval routines.
      
      The following documentation is added to:
      
      	Documentation/filesystems/caching/operations.txt
      
      		       ================================
      		       ASYNCHRONOUS OPERATIONS HANDLING
      		       ================================
      
      ========
      OVERVIEW
      ========
      
      FS-Cache has an asynchronous operations handling facility that it uses for its
      data storage and retrieval routines.  Its operations are represented by
      fscache_operation structs, though these are usually embedded into some other
      structure.
      
      This facility is available to and expected to be be used by the cache backends,
      and FS-Cache will create operations and pass them off to the appropriate cache
      backend for completion.
      
      To make use of this facility, <linux/fscache-cache.h> should be #included.
      
      ===============================
      OPERATION RECORD INITIALISATION
      ===============================
      
      An operation is recorded in an fscache_operation struct:
      
      	struct fscache_operation {
      		union {
      			struct work_struct fast_work;
      			struct slow_work slow_work;
      		};
      		unsigned long		flags;
      		fscache_operation_processor_t processor;
      		...
      	};
      
      Someone wanting to issue an operation should allocate something with this
      struct embedded in it.  They should initialise it by calling:
      
      	void fscache_operation_init(struct fscache_operation *op,
      				    fscache_operation_release_t release);
      
      with the operation to be initialised and the release function to use.
      
      The op->flags parameter should be set to indicate the CPU time provision and
      the exclusivity (see the Parameters section).
      
      The op->fast_work, op->slow_work and op->processor flags should be set as
      appropriate for the CPU time provision (see the Parameters section).
      
      FSCACHE_OP_WAITING may be set in op->flags prior to each submission of the
      operation and waited for afterwards.
      
      ==========
      PARAMETERS
      ==========
      
      There are a number of parameters that can be set in the operation record's flag
      parameter.  There are three options for the provision of CPU time in these
      operations:
      
       (1) The operation may be done synchronously (FSCACHE_OP_MYTHREAD).  A thread
           may decide it wants to handle an operation itself without deferring it to
           another thread.
      
           This is, for example, used in read operations for calling readpages() on
           the backing filesystem in CacheFiles.  Although readpages() does an
           asynchronous data fetch, the determination of whether pages exist is done
           synchronously - and the netfs does not proceed until this has been
           determined.
      
           If this option is to be used, FSCACHE_OP_WAITING must be set in op->flags
           before submitting the operation, and the operating thread must wait for it
           to be cleared before proceeding:
      
      		wait_on_bit(&op->flags, FSCACHE_OP_WAITING,
      			    fscache_wait_bit, TASK_UNINTERRUPTIBLE);
      
       (2) The operation may be fast asynchronous (FSCACHE_OP_FAST), in which case it
           will be given to keventd to process.  Such an operation is not permitted
           to sleep on I/O.
      
           This is, for example, used by CacheFiles to copy data from a backing fs
           page to a netfs page after the backing fs has read the page in.
      
           If this option is used, op->fast_work and op->processor must be
           initialised before submitting the operation:
      
      		INIT_WORK(&op->fast_work, do_some_work);
      
       (3) The operation may be slow asynchronous (FSCACHE_OP_SLOW), in which case it
           will be given to the slow work facility to process.  Such an operation is
           permitted to sleep on I/O.
      
           This is, for example, used by FS-Cache to handle background writes of
           pages that have just been fetched from a remote server.
      
           If this option is used, op->slow_work and op->processor must be
           initialised before submitting the operation:
      
      		fscache_operation_init_slow(op, processor)
      
      Furthermore, operations may be one of two types:
      
       (1) Exclusive (FSCACHE_OP_EXCLUSIVE).  Operations of this type may not run in
           conjunction with any other operation on the object being operated upon.
      
           An example of this is the attribute change operation, in which the file
           being written to may need truncation.
      
       (2) Shareable.  Operations of this type may be running simultaneously.  It's
           up to the operation implementation to prevent interference between other
           operations running at the same time.
      
      =========
      PROCEDURE
      =========
      
      Operations are used through the following procedure:
      
       (1) The submitting thread must allocate the operation and initialise it
           itself.  Normally this would be part of a more specific structure with the
           generic op embedded within.
      
       (2) The submitting thread must then submit the operation for processing using
           one of the following two functions:
      
      	int fscache_submit_op(struct fscache_object *object,
      			      struct fscache_operation *op);
      
      	int fscache_submit_exclusive_op(struct fscache_object *object,
      					struct fscache_operation *op);
      
           The first function should be used to submit non-exclusive ops and the
           second to submit exclusive ones.  The caller must still set the
           FSCACHE_OP_EXCLUSIVE flag.
      
           If successful, both functions will assign the operation to the specified
           object and return 0.  -ENOBUFS will be returned if the object specified is
           permanently unavailable.
      
           The operation manager will defer operations on an object that is still
           undergoing lookup or creation.  The operation will also be deferred if an
           operation of conflicting exclusivity is in progress on the object.
      
           If the operation is asynchronous, the manager will retain a reference to
           it, so the caller should put their reference to it by passing it to:
      
      	void fscache_put_operation(struct fscache_operation *op);
      
       (3) If the submitting thread wants to do the work itself, and has marked the
           operation with FSCACHE_OP_MYTHREAD, then it should monitor
           FSCACHE_OP_WAITING as described above and check the state of the object if
           necessary (the object might have died whilst the thread was waiting).
      
           When it has finished doing its processing, it should call
           fscache_put_operation() on it.
      
       (4) The operation holds an effective lock upon the object, preventing other
           exclusive ops conflicting until it is released.  The operation can be
           enqueued for further immediate asynchronous processing by adjusting the
           CPU time provisioning option if necessary, eg:
      
      	op->flags &= ~FSCACHE_OP_TYPE;
      	op->flags |= ~FSCACHE_OP_FAST;
      
           and calling:
      
      	void fscache_enqueue_operation(struct fscache_operation *op)
      
           This can be used to allow other things to have use of the worker thread
           pools.
      
      =====================
      ASYNCHRONOUS CALLBACK
      =====================
      
      When used in asynchronous mode, the worker thread pool will invoke the
      processor method with a pointer to the operation.  This should then get at the
      container struct by using container_of():
      
      	static void fscache_write_op(struct fscache_operation *_op)
      	{
      		struct fscache_storage *op =
      			container_of(_op, struct fscache_storage, op);
      	...
      	}
      
      The caller holds a reference on the operation, and will invoke
      fscache_put_operation() when the processor function returns.  The processor
      function is at liberty to call fscache_enqueue_operation() or to take extra
      references.
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      Acked-by: default avatarSteve Dickson <steved@redhat.com>
      Acked-by: default avatarTrond Myklebust <Trond.Myklebust@netapp.com>
      Acked-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      Tested-by: default avatarDaire Byrne <Daire.Byrne@framestore.com>
      952efe7b
    • David Howells's avatar
      FS-Cache: Implement the cookie management part of the netfs API · ccc4fc3d
      David Howells authored
      Implement the cookie management part of the FS-Cache netfs client API.  The
      documentation and API header file were added in a previous patch.
      
      This patch implements the following three functions:
      
       (1) fscache_acquire_cookie().
      
           Acquire a cookie to represent an object to the netfs.  If the object in
           question is a non-index object, then that object and its parent indices
           will be created on disk at this point if they don't already exist.  Index
           creation is deferred because an index may reside in multiple caches.
      
       (2) fscache_relinquish_cookie().
      
           Retire or release a cookie previously acquired.  At this point, the
           object on disk may be destroyed.
      
       (3) fscache_update_cookie().
      
           Update the in-cache representation of a cookie.  This is used to update
           the auxiliary data for coherency management purposes.
      
      With this patch it is possible to have a netfs instruct a cache backend to
      look up, validate and create metadata on disk and to destroy it again.
      The ability to actually store and retrieve data in the objects so created is
      added in later patches.
      
      Note that these functions will never return an error.  _All_ errors are
      handled internally to FS-Cache.
      
      The worst that can happen is that fscache_acquire_cookie() may return a NULL
      pointer - which is considered a negative cookie pointer and can be passed back
      to any function that takes a cookie without harm.  A negative cookie pointer
      merely suppresses caching at that level.
      
      The stub in linux/fscache.h will detect inline the negative cookie pointer and
      abort the operation as fast as possible.  This means that the compiler doesn't
      have to set up for a call in that case.
      
      See the documentation in Documentation/filesystems/caching/netfs-api.txt for
      more information.
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      Acked-by: default avatarSteve Dickson <steved@redhat.com>
      Acked-by: default avatarTrond Myklebust <Trond.Myklebust@netapp.com>
      Acked-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      Tested-by: default avatarDaire Byrne <Daire.Byrne@framestore.com>
      ccc4fc3d
    • David Howells's avatar
      FS-Cache: Object management state machine · 36c95590
      David Howells authored
      Implement the cache object management state machine.
      
      The following documentation is added to illuminate the working of this state
      machine.  It will also be added as:
      
      	Documentation/filesystems/caching/object.txt
      
      	     ====================================================
      	     IN-KERNEL CACHE OBJECT REPRESENTATION AND MANAGEMENT
      	     ====================================================
      
      ==============
      REPRESENTATION
      ==============
      
      FS-Cache maintains an in-kernel representation of each object that a netfs is
      currently interested in.  Such objects are represented by the fscache_cookie
      struct and are referred to as cookies.
      
      FS-Cache also maintains a separate in-kernel representation of the objects that
      a cache backend is currently actively caching.  Such objects are represented by
      the fscache_object struct.  The cache backends allocate these upon request, and
      are expected to embed them in their own representations.  These are referred to
      as objects.
      
      There is a 1:N relationship between cookies and objects.  A cookie may be
      represented by multiple objects - an index may exist in more than one cache -
      or even by no objects (it may not be cached).
      
      Furthermore, both cookies and objects are hierarchical.  The two hierarchies
      correspond, but the cookies tree is a superset of the union of the object trees
      of multiple caches:
      
      	    NETFS INDEX TREE               :      CACHE 1     :      CACHE 2
      	                                   :                  :
      	                                   :   +-----------+  :
      	                          +----------->|  IObject  |  :
      	      +-----------+       |        :   +-----------+  :
      	      |  ICookie  |-------+        :         |        :
      	      +-----------+       |        :         |        :   +-----------+
      	            |             +------------------------------>|  IObject  |
      	            |                      :         |        :   +-----------+
      	            |                      :         V        :         |
      	            |                      :   +-----------+  :         |
      	            V             +----------->|  IObject  |  :         |
      	      +-----------+       |        :   +-----------+  :         |
      	      |  ICookie  |-------+        :         |        :         V
      	      +-----------+       |        :         |        :   +-----------+
      	            |             +------------------------------>|  IObject  |
      	      +-----+-----+                :         |        :   +-----------+
      	      |           |                :         |        :         |
      	      V           |                :         V        :         |
      	+-----------+     |                :   +-----------+  :         |
      	|  ICookie  |------------------------->|  IObject  |  :         |
      	+-----------+     |                :   +-----------+  :         |
      	      |           V                :         |        :         V
      	      |     +-----------+          :         |        :   +-----------+
      	      |     |  ICookie  |-------------------------------->|  IObject  |
      	      |     +-----------+          :         |        :   +-----------+
      	      V           |                :         V        :         |
      	+-----------+     |                :   +-----------+  :         |
      	|  DCookie  |------------------------->|  DObject  |  :         |
      	+-----------+     |                :   +-----------+  :         |
      	                  |                :                  :         |
      	          +-------+-------+        :                  :         |
      	          |               |        :                  :         |
      	          V               V        :                  :         V
      	    +-----------+   +-----------+  :                  :   +-----------+
      	    |  DCookie  |   |  DCookie  |------------------------>|  DObject  |
      	    +-----------+   +-----------+  :                  :   +-----------+
      	                                   :                  :
      
      In the above illustration, ICookie and IObject represent indices and DCookie
      and DObject represent data storage objects.  Indices may have representation in
      multiple caches, but currently, non-index objects may not.  Objects of any type
      may also be entirely unrepresented.
      
      As far as the netfs API goes, the netfs is only actually permitted to see
      pointers to the cookies.  The cookies themselves and any objects attached to
      those cookies are hidden from it.
      
      ===============================
      OBJECT MANAGEMENT STATE MACHINE
      ===============================
      
      Within FS-Cache, each active object is managed by its own individual state
      machine.  The state for an object is kept in the fscache_object struct, in
      object->state.  A cookie may point to a set of objects that are in different
      states.
      
      Each state has an action associated with it that is invoked when the machine
      wakes up in that state.  There are four logical sets of states:
      
       (1) Preparation: states that wait for the parent objects to become ready.  The
           representations are hierarchical, and it is expected that an object must
           be created or accessed with respect to its parent object.
      
       (2) Initialisation: states that perform lookups in the cache and validate
           what's found and that create on disk any missing metadata.
      
       (3) Normal running: states that allow netfs operations on objects to proceed
           and that update the state of objects.
      
       (4) Termination: states that detach objects from their netfs cookies, that
           delete objects from disk, that handle disk and system errors and that free
           up in-memory resources.
      
      In most cases, transitioning between states is in response to signalled events.
      When a state has finished processing, it will usually set the mask of events in
      which it is interested (object->event_mask) and relinquish the worker thread.
      Then when an event is raised (by calling fscache_raise_event()), if the event
      is not masked, the object will be queued for processing (by calling
      fscache_enqueue_object()).
      
      PROVISION OF CPU TIME
      ---------------------
      
      The work to be done by the various states is given CPU time by the threads of
      the slow work facility (see Documentation/slow-work.txt).  This is used in
      preference to the workqueue facility because:
      
       (1) Threads may be completely occupied for very long periods of time by a
           particular work item.  These state actions may be doing sequences of
           synchronous, journalled disk accesses (lookup, mkdir, create, setxattr,
           getxattr, truncate, unlink, rmdir, rename).
      
       (2) Threads may do little actual work, but may rather spend a lot of time
           sleeping on I/O.  This means that single-threaded and 1-per-CPU-threaded
           workqueues don't necessarily have the right numbers of threads.
      
      LOCKING SIMPLIFICATION
      ----------------------
      
      Because only one worker thread may be operating on any particular object's
      state machine at once, this simplifies the locking, particularly with respect
      to disconnecting the netfs's representation of a cache object (fscache_cookie)
      from the cache backend's representation (fscache_object) - which may be
      requested from either end.
      
      =================
      THE SET OF STATES
      =================
      
      The object state machine has a set of states that it can be in.  There are
      preparation states in which the object sets itself up and waits for its parent
      object to transit to a state that allows access to its children:
      
       (1) State FSCACHE_OBJECT_INIT.
      
           Initialise the object and wait for the parent object to become active.  In
           the cache, it is expected that it will not be possible to look an object
           up from the parent object, until that parent object itself has been looked
           up.
      
      There are initialisation states in which the object sets itself up and accesses
      disk for the object metadata:
      
       (2) State FSCACHE_OBJECT_LOOKING_UP.
      
           Look up the object on disk, using the parent as a starting point.
           FS-Cache expects the cache backend to probe the cache to see whether this
           object is represented there, and if it is, to see if it's valid (coherency
           management).
      
           The cache should call fscache_object_lookup_negative() to indicate lookup
           failure for whatever reason, and should call fscache_obtained_object() to
           indicate success.
      
           At the completion of lookup, FS-Cache will let the netfs go ahead with
           read operations, no matter whether the file is yet cached.  If not yet
           cached, read operations will be immediately rejected with ENODATA until
           the first known page is uncached - as to that point there can be no data
           to be read out of the cache for that file that isn't currently also held
           in the pagecache.
      
       (3) State FSCACHE_OBJECT_CREATING.
      
           Create an object on disk, using the parent as a starting point.  This
           happens if the lookup failed to find the object, or if the object's
           coherency data indicated what's on disk is out of date.  In this state,
           FS-Cache expects the cache to create
      
           The cache should call fscache_obtained_object() if creation completes
           successfully, fscache_object_lookup_negative() otherwise.
      
           At the completion of creation, FS-Cache will start processing write
           operations the netfs has queued for an object.  If creation failed, the
           write ops will be transparently discarded, and nothing recorded in the
           cache.
      
      There are some normal running states in which the object spends its time
      servicing netfs requests:
      
       (4) State FSCACHE_OBJECT_AVAILABLE.
      
           A transient state in which pending operations are started, child objects
           are permitted to advance from FSCACHE_OBJECT_INIT state, and temporary
           lookup data is freed.
      
       (5) State FSCACHE_OBJECT_ACTIVE.
      
           The normal running state.  In this state, requests the netfs makes will be
           passed on to the cache.
      
       (6) State FSCACHE_OBJECT_UPDATING.
      
           The state machine comes here to update the object in the cache from the
           netfs's records.  This involves updating the auxiliary data that is used
           to maintain coherency.
      
      And there are terminal states in which an object cleans itself up, deallocates
      memory and potentially deletes stuff from disk:
      
       (7) State FSCACHE_OBJECT_LC_DYING.
      
           The object comes here if it is dying because of a lookup or creation
           error.  This would be due to a disk error or system error of some sort.
           Temporary data is cleaned up, and the parent is released.
      
       (8) State FSCACHE_OBJECT_DYING.
      
           The object comes here if it is dying due to an error, because its parent
           cookie has been relinquished by the netfs or because the cache is being
           withdrawn.
      
           Any child objects waiting on this one are given CPU time so that they too
           can destroy themselves.  This object waits for all its children to go away
           before advancing to the next state.
      
       (9) State FSCACHE_OBJECT_ABORT_INIT.
      
           The object comes to this state if it was waiting on its parent in
           FSCACHE_OBJECT_INIT, but its parent died.  The object will destroy itself
           so that the parent may proceed from the FSCACHE_OBJECT_DYING state.
      
      (10) State FSCACHE_OBJECT_RELEASING.
      (11) State FSCACHE_OBJECT_RECYCLING.
      
           The object comes to one of these two states when dying once it is rid of
           all its children, if it is dying because the netfs relinquished its
           cookie.  In the first state, the cached data is expected to persist, and
           in the second it will be deleted.
      
      (12) State FSCACHE_OBJECT_WITHDRAWING.
      
           The object transits to this state if the cache decides it wants to
           withdraw the object from service, perhaps to make space, but also due to
           error or just because the whole cache is being withdrawn.
      
      (13) State FSCACHE_OBJECT_DEAD.
      
           The object transits to this state when the in-memory object record is
           ready to be deleted.  The object processor shouldn't ever see an object in
           this state.
      
      THE SET OF EVENTS
      -----------------
      
      There are a number of events that can be raised to an object state machine:
      
       (*) FSCACHE_OBJECT_EV_UPDATE
      
           The netfs requested that an object be updated.  The state machine will ask
           the cache backend to update the object, and the cache backend will ask the
           netfs for details of the change through its cookie definition ops.
      
       (*) FSCACHE_OBJECT_EV_CLEARED
      
           This is signalled in two circumstances:
      
           (a) when an object's last child object is dropped and
      
           (b) when the last operation outstanding on an object is completed.
      
           This is used to proceed from the dying state.
      
       (*) FSCACHE_OBJECT_EV_ERROR
      
           This is signalled when an I/O error occurs during the processing of some
           object.
      
       (*) FSCACHE_OBJECT_EV_RELEASE
       (*) FSCACHE_OBJECT_EV_RETIRE
      
           These are signalled when the netfs relinquishes a cookie it was using.
           The event selected depends on whether the netfs asks for the backing
           object to be retired (deleted) or retained.
      
       (*) FSCACHE_OBJECT_EV_WITHDRAW
      
           This is signalled when the cache backend wants to withdraw an object.
           This means that the object will have to be detached from the netfs's
           cookie.
      
      Because the withdrawing releasing/retiring events are all handled by the object
      state machine, it doesn't matter if there's a collision with both ends trying
      to sever the connection at the same time.  The state machine can just pick
      which one it wants to honour, and that effects the other.
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      Acked-by: default avatarSteve Dickson <steved@redhat.com>
      Acked-by: default avatarTrond Myklebust <Trond.Myklebust@netapp.com>
      Acked-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      Tested-by: default avatarDaire Byrne <Daire.Byrne@framestore.com>
      36c95590