Commit 3df6b7b7 authored by Tim Gardner

UBUNTU: SAUCE: AUFS

Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
parent a1a6d4d3
What: /debug/aufs/si_<id>/
Date: March 2009
Contact: J. R. Okajima <hooanon05g@gmail.com>
Description:
Under /debug/aufs, a directory named si_<id> is created
per aufs mount, where <id> is a unique id generated
internally.
What: /debug/aufs/si_<id>/plink
Date: Apr 2013
Contact: J. R. Okajima <hooanon05g@gmail.com>
Description:
It has three lines and shows information about the
pseudo-links. The first line is a single number
representing the number of buckets. The second line is the
number of pseudo-links per bucket (separated by a
blank). The last line is a single number representing the
total number of pseudo-links.
When the aufs mount option 'noplink' is specified, it
will show "1\n0\n0\n".
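A minimal sketch of reading the three-line format described above. The
names PlinkInfo and parse_plink are hypothetical, not part of aufs, and
the sample string is the documented 'noplink' output.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PlinkInfo:
    buckets: int            # first line: number of buckets
    per_bucket: List[int]   # second line: pseudo-links in each bucket
    total: int              # third line: total number of pseudo-links


def parse_plink(text: str) -> PlinkInfo:
    """Parse the three-line content of a si_<id>/plink file."""
    lines = text.splitlines()
    if len(lines) != 3:
        raise ValueError(f"expected 3 lines, got {len(lines)}")
    return PlinkInfo(
        buckets=int(lines[0]),
        per_bucket=[int(n) for n in lines[1].split()],
        total=int(lines[2]),
    )


# The documented output when 'noplink' is specified.
info = parse_plink("1\n0\n0\n")
```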
What: /debug/aufs/si_<id>/xib
Date: March 2009
Contact: J. R. Okajima <hooanon05g@gmail.com>
Description:
It shows the blocks consumed by xib (External Inode Number
Bitmap), its block size and file size.
When the aufs mount option 'noxino' is specified, it
will be empty. About XINO files, see the aufs manual.
What: /debug/aufs/si_<id>/xino0, xino1 ... xinoN
Date: March 2009
Contact: J. R. Okajima <hooanon05g@gmail.com>
Description:
It shows the blocks consumed by xino (External Inode Number
Translation Table), its link count, block size and file
size.
When the aufs mount option 'noxino' is specified, it
will be empty. About XINO files, see the aufs manual.
What: /debug/aufs/si_<id>/xigen
Date: March 2009
Contact: J. R. Okajima <hooanon05g@gmail.com>
Description:
It shows the blocks consumed by xigen (External Inode
Generation Table), its block size and file size.
If CONFIG_AUFS_EXPORT is disabled, this entry will not
be created.
When the aufs mount option 'noxino' is specified, it
will be empty. About XINO files, see the aufs manual.
What: /sys/fs/aufs/si_<id>/
Date: March 2009
Contact: J. R. Okajima <hooanon05g@gmail.com>
Description:
Under /sys/fs/aufs, a directory named si_<id> is created
per aufs mount, where <id> is a unique id generated
internally.
What: /sys/fs/aufs/si_<id>/br0, br1 ... brN
Date: March 2009
Contact: J. R. Okajima <hooanon05g@gmail.com>
Description:
It shows the absolute path of a member directory (which
is called branch) in aufs, and its permission.
What: /sys/fs/aufs/si_<id>/brid0, brid1 ... bridN
Date: July 2013
Contact: J. R. Okajima <hooanon05g@gmail.com>
Description:
It shows the id of a member directory (which is called
branch) in aufs.
What: /sys/fs/aufs/si_<id>/xi_path
Date: March 2009
Contact: J. R. Okajima <hooanon05g@gmail.com>
Description:
It shows the absolute path of XINO (External Inode Number
Bitmap, Translation Table and Generation Table) file
even if it is the default path.
When the aufs mount option 'noxino' is specified, it
will be empty. About XINO files, see the aufs manual.
# Copyright (C) 2005-2015 Junjiro R. Okajima
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program. If not, see <http://www.gnu.org/licenses/>.
Introduction
----------------------------------------
aufs [ei ju: ef es] | [a u f s]
1. abbrev. for "advanced multi-layered unification filesystem".
2. abbrev. for "another unionfs".
3. abbrev. for "auf das" in German which means "on the" in English.
Ex. "Butter aufs Brot"(G) means "butter onto bread"(E).
But "Filesystem aufs Filesystem" is hard to understand.
AUFS is a filesystem with features:
- multi layered stackable unification filesystem, the member directory
is called a branch.
- branch permission and attribute, 'readonly', 'real-readonly',
'readwrite', 'whiteout-able', 'link-able whiteout', etc. and their
combination.
- internal "file copy-on-write".
- logical deletion, whiteout.
- dynamic branch manipulation, adding, deleting and changing permission.
- allow bypassing aufs, user's direct branch access.
- external inode number translation table and bitmap which maintains the
persistent aufs inode number.
- seekable directory, including NFS readdir.
- file mapping, mmap and sharing pages.
- pseudo-link, hardlink over branches.
- loopback mounted filesystem as a branch.
- several policies to select one among multiple writable branches.
- revert a single systemcall when an error occurs in aufs.
- and more...
Multi Layered Stackable Unification Filesystem
----------------------------------------------------------------------
Most people already know what it is.
It is a filesystem which unifies several directories and provides a
merged single directory. When users access a file, the access will be
passed/re-directed/converted (sorry, I am not sure which English word is
correct) to the real file on the member filesystem. The member
filesystem is called 'lower filesystem' or 'branch' and has a mode,
either 'readonly' or 'readwrite'. The deletion of a file on the lower
readonly branch is handled by creating a 'whiteout' on the upper
writable branch.
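The merged view and the whiteout can be modelled in a few lines. This is
only a toy sketch: branches are plain dictionaries, the ".wh." marker
name is an assumption made for illustration, and union_lookup and
union_unlink are hypothetical names that do not exist in aufs.

```python
from typing import Dict, List, Optional, Tuple

WH_PREFIX = ".wh."  # assumed whiteout marker name, for illustration only

# Each branch is (name, entries); entries maps a file name to its content.
Branch = Tuple[str, Dict[str, str]]


def union_lookup(branches: List[Branch], name: str) -> Optional[Tuple[str, str]]:
    """Search branches from top to bottom; return (branch, content) or None."""
    for branch_name, entries in branches:
        if WH_PREFIX + name in entries:
            return None  # a whiteout hides the name on every lower branch
        if name in entries:
            return branch_name, entries[name]
    return None


def union_unlink(branches: List[Branch], writable: str, name: str) -> None:
    """Delete 'name' from the merged view via a whiteout on the writable branch."""
    for branch_name, entries in branches:
        if branch_name == writable:
            entries.pop(name, None)         # drop a real copy, if any
            entries[WH_PREFIX + name] = ""  # logical deletion
            return
    raise KeyError(writable)


# Upper readwrite branch "rw" stacked on lower readonly branch "ro".
stack = [("rw", {}), ("ro", {"hosts": "127.0.0.1 localhost"})]
before = union_lookup(stack, "hosts")
union_unlink(stack, "rw", "hosts")
after = union_lookup(stack, "hosts")
```

The file on the readonly branch is never touched; it merely becomes
invisible through the union.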
On LKML, there have been discussions about UnionMount (Jan Blunck,
Bharata B Rao and Valerie Aurora) and Unionfs (Erez Zadok). They took
different approaches to implement the merged-view.
The former tries putting it into VFS, and the latter implements it as a
separate filesystem.
(If I misunderstand these implementations, please let me know and
I shall correct it, because it has been a long time since I last read
their source files.)
UnionMount's approach can be small, but it may be hard to share
branches between several UnionMounts since the whiteout in it is
implemented in the inode on the branch filesystem and is always
shared. According to Bharata's post, readdir does not seem to be
finished yet.
There are several known missing features in this implementation, such as
- for users, the inode number may change silently. eg. copy-up.
- link(2) may break by copy-up.
- read(2) may get an obsoleted filedata (fstat(2) too).
- fcntl(F_SETLK) may be broken by copy-up.
- unnecessary copy-up may happen, for example mmap(MAP_PRIVATE) after
open(O_RDWR).
In linux-3.18, the "overlay" filesystem (formerly known as "overlayfs")
was merged into mainline. This is another implementation of UnionMount
as a separate filesystem. All the limitations and known problems of
UnionMount are equally inherited by the "overlay" filesystem.
Unionfs has a longer history. When I started implementing a stackable
filesystem (Aug 2005), it already existed. It has virtual super_block,
inode, dentry and file objects, and each of them has an array pointing
to the lower objects of the same kind. After contributing many patches
for Unionfs, I re-started my project AUFS (Jun 2006).
In AUFS, the structure of the filesystem resembles Unionfs, but I
implemented my own ideas, approaches and enhancements and it became
a totally different one.
Comparing DM snapshot and fs based implementation
- the number of bytes to be copied between devices is much smaller.
- the type of filesystem must be one and only.
- the fs must be writable, no readonly fs, even for the lower original
device. so the compression fs will not be usable. but if we use
loopback mount, we may address this issue.
for instance,
mount /cdrom/squashfs.img /sq
losetup /sq/ext2.img
losetup /somewhere/cow
dmsetup "snapshot /dev/loop0 /dev/loop1 ..."
- it will be difficult (or needs more operations) to extract the
difference between the original device and COW.
- DM snapshot-merge may help a lot when users try merging. in the
fs-layer union, users will use rsync(1).
You may want to read my old paper "Filesystems in LiveCD"
(http://aufs.sourceforge.net/aufs2/report/sq/sq.pdf).
Several characters/aspects/persona of aufs
----------------------------------------------------------------------
Aufs has several characters, aspects or persona.
1. a filesystem, callee of VFS helper
2. sub-VFS, caller of VFS helper for branches
3. a virtual filesystem which maintains persistent inode number
4. reader/writer of files on branches, like an application
1. Callee of VFS Helper
As an ordinary linux filesystem, aufs is a callee of VFS. For instance,
unlink(2) from an application reaches sys_unlink() kernel function and
then vfs_unlink() is called. vfs_unlink() is one of the VFS helpers and
it calls the filesystem-specific unlink operation. Actually aufs
implements the unlink operation but it behaves like a redirector.
2. Caller of VFS Helper for Branches
aufs_unlink() passes the unlink request to the branch filesystem as if
it were called from VFS. So the called unlink operation of the branch
filesystem acts as usual. As a caller of VFS helper, aufs should handle
every necessary pre/post operation for the branch filesystem.
- acquire the lock for the parent dir on a branch
- lookup in a branch
- revalidate dentry on a branch
- mnt_want_write() for a branch
- vfs_unlink() for a branch
- mnt_drop_write() for a branch
- release the lock on a branch
3. Persistent Inode Number
One of the most important issues for a filesystem is to maintain inode
numbers. This is particularly important to support exporting a
filesystem via NFS. Aufs is a virtual filesystem which doesn't have a
backend block device of its own. But some storage is necessary to
keep and maintain the inode numbers. It may be a large space and may not
be suitable to keep in memory. Aufs rents some space from its first
writable branch filesystem (by default) and creates file(s) on it. These
files are created by aufs internally and (currently) removed soon
afterwards, while they are kept open.
Note: Because these files are removed, they are totally gone after
unmounting aufs. It means the inode numbers are not persistent
across unmount or reboot. I have a plan to make them really
persistent which will be important for aufs on NFS server.
4. Read/Write Files Internally (copy-on-write)
Because a branch can be readonly, when you write a file on it, aufs will
"copy-up" it to the upper writable branch internally, and then write the
originally requested data to the file. Generally the kernel doesn't
open/read/write files actively. In aufs, even a single write may cause
an internal "file copy". This behaviour is very similar to the cp(1)
command.
Some people may think it is better to pass such work to user space
helper, instead of doing in kernel space. Actually I am still thinking
about it. But currently I have implemented it in kernel space.
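As a userspace illustration of the copy-up idea (not the in-kernel
implementation), the sketch below treats two ordinary directories as an
upper writable and a lower readonly branch. write_through_union is a
hypothetical helper, and shutil.copy2 stands in for the cp(1)-like
internal copy.

```python
import os
import shutil
import tempfile


def write_through_union(rw: str, ro: str, relpath: str, data: bytes) -> str:
    """Append 'data' to relpath as seen through the union of rw over ro."""
    upper = os.path.join(rw, relpath)
    lower = os.path.join(ro, relpath)
    # The parent dir must exist on the writable branch before writing.
    os.makedirs(os.path.dirname(upper), exist_ok=True)
    if not os.path.exists(upper) and os.path.exists(lower):
        # Copy-up: duplicate the lower file (data and attributes) first.
        shutil.copy2(lower, upper)
    # Then apply the originally requested write to the upper copy only.
    with open(upper, "ab") as f:
        f.write(data)
    return upper


with tempfile.TemporaryDirectory() as top:
    rw, ro = os.path.join(top, "rw"), os.path.join(top, "ro")
    os.makedirs(os.path.join(ro, "etc"))
    os.makedirs(rw)
    with open(os.path.join(ro, "etc", "motd"), "wb") as f:
        f.write(b"hello\n")
    write_through_union(rw, ro, os.path.join("etc", "motd"), b"world\n")
    with open(os.path.join(rw, "etc", "motd"), "rb") as f:
        upper_content = f.read()
    with open(os.path.join(ro, "etc", "motd"), "rb") as f:
        lower_content = f.read()
```

The lower file keeps its original content; only the copied-up file on
the writable branch receives the new data.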
# Copyright (C) 2015 Junjiro R. Okajima
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program. If not, see <http://www.gnu.org/licenses/>.
Support for a branch who has its ->atomic_open()
----------------------------------------------------------------------
Filesystems that implement ->atomic_open() are not the majority. For
example NFSv4 does, and aufs should call NFSv4's ->atomic_open(),
particularly for the open(O_CREAT|O_EXCL, 0400) case. Via any path other
than ->atomic_open(), NFSv4 returns an error for this open(2). I am not
sure whether all filesystems that have ->atomic_open() behave like this,
but NFSv4 surely returns the error.
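The open(O_CREAT|O_EXCL, 0400) case can be observed from userspace on a
local filesystem. The sketch below only shows what the application
expects from a single open(2): the file is created with a mode that
grants no write permission, yet the descriptor returned by the very same
call is writable. It does not involve aufs or NFSv4, and
create_excl_readonly is a hypothetical helper name.

```python
import os
import stat
import tempfile


def create_excl_readonly(path: str, data: bytes) -> int:
    """Create 'path' with mode 0400 and write to it through the same open."""
    # One open(2) both creates the file and returns a writable descriptor,
    # even though the resulting mode grants no write permission.
    fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o400)
    try:
        os.write(fd, data)
    finally:
        os.close(fd)
    return stat.S_IMODE(os.stat(path).st_mode)


with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "new-file")
    mode = create_excl_readonly(target, b"data")
    size = os.stat(target).st_size
```

If creation and opening were split into two separate steps, the second
step would be an ordinary open for write on a file without write
permission, which is where the atomicity matters.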
In order to support ->atomic_open() for aufs, there are a few
approaches.
A. Introduce aufs_atomic_open()
- calls one of VFS:do_last(), lookup_open() or atomic_open() for
branch fs.
B. Introduce aufs_atomic_open() calling create, open and chmod. this is
an aufs user Pip Cet's approach
- calls aufs_create(), VFS finish_open() and notify_change().
- pass fake-mode to finish_open(), and then correct the mode by
notify_change().
C. Extend aufs_open() to call branch fs's ->atomic_open()
- no aufs_atomic_open().
- aufs_lookup() registers the TID to an aufs internal object.
- aufs_create() does nothing when the matching TID is registered, but
registers the mode.
- aufs_open() calls branch fs's ->atomic_open() when the matching
TID is registered.
D. Extend aufs_open() to re-try branch fs's ->open() with superuser's
credential
- no aufs_atomic_open().
- aufs_create() registers the TID to an internal object. this info
represents "this process created this file just now."
- when aufs gets EACCES from branch fs's ->open(), then confirm the
registered TID and re-try open() with superuser's credential.
Pros and cons for each approach.
A.
- straightforward but highly depends upon VFS internal.
- the atomic behaviour is kept.
- some of parameters such as nameidata are hard to reproduce for
branch fs.
- large overhead.
B.
- easy to implement.
- the atomic behaviour is lost.
C.
- the atomic behaviour is kept.
- dirty and tricky.
- VFS checks whether the file is created correctly after calling
->create(), which means this approach doesn't work.
D.
- easy to implement.
- the atomic behaviour is lost.
- to open a file with superuser's credential and give it to a user
process is a bad idea, since the file object keeps the credential
in it. It may affect LSM or something. This approach doesn't work
either.
Approach A is ideal, but it is hard to implement. So here is a
variation of A, which is to be implemented.
A-1. Introduce aufs_atomic_open()
- calls branch fs ->atomic_open() if exists. otherwise calls
vfs_create() and finish_open().
- the demerit is that several of the checks after branch fs
->atomic_open() are lost. in the ordinary case, the checks are
done by VFS:do_last(), lookup_open() and atomic_open(). some can
be implemented in aufs, but not all, I am afraid.
Lookup in a Branch
----------------------------------------------------------------------
Since aufs has a character of sub-VFS (see Introduction), it operates
lookup for branches as VFS does. It may be heavy work. But almost all
lookup operations in aufs are the simplest case, ie. looking up only an
entry directly connected to its parent. Digging down the directory
hierarchy is unnecessary. VFS has a function lookup_one_len() for that
use, and aufs calls it.
When a branch is a remote filesystem, aufs basically relies upon its
->d_revalidate(); aufs also forces the hardest revalidate tests for
it.
For d_revalidate, aufs implements three levels of revalidate tests. See
"Revalidate Dentry and UDBA" for details.
Test Only the Highest One for the Directory Permission (dirperm1 option)
----------------------------------------------------------------------
Let's try a case study.
- aufs has two branches, upper readwrite and lower readonly.
/au = /rw + /ro
- "dirA" exists under /ro, but not under /rw, and its mode is 0700.
- a user invoked "chmod a+rx /au/dirA"
- the internal copy-up is activated and "/rw/dirA" is created and its
permission bits are set to world readable.
- then "/au/dirA" becomes world readable?
In this case, /ro/dirA is still 0700 since it exists in a readonly
branch, or it may be a natively readonly filesystem. If aufs respects
the lower branch, it should not respond to a readdir request from other
users. But the user allowed it by chmod. Should aufs really reject
showing the entries under /ro/dirA?
To be honest, I don't have a good solution for this case. So aufs
implements the 'dirperm1' and 'nodirperm1' mount options, and leaves it
to users.
When dirperm1 is specified, aufs checks only the highest one for the
directory permission, and shows the entries. Otherwise, as usual, it
checks every dir existing on all branches and rejects the request.
As a side effect, the dirperm1 option improves the performance of aufs
because the number of permission checks is reduced when there are many
branches.
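The case study can be expressed as a small model. It is a deliberate
simplification: only the "other" read and search bits are tested, which
is far less than a real permission check, and others_may_list is a
hypothetical name.

```python
import stat
from typing import List


def others_may_list(dir_modes: List[int], dirperm1: bool) -> bool:
    """dir_modes holds the mode of the same dir on each branch, top first."""
    need = stat.S_IROTH | stat.S_IXOTH
    # With dirperm1 only the highest dir is tested, otherwise all of them.
    to_check = dir_modes[:1] if dirperm1 else dir_modes
    return all((mode & need) == need for mode in to_check)


# After "chmod a+rx /au/dirA": /rw/dirA is 0755 while /ro/dirA stays 0700.
modes = [0o755, 0o700]
with_dirperm1 = others_may_list(modes, dirperm1=True)
without_dirperm1 = others_may_list(modes, dirperm1=False)
```

With dirperm1 the request is allowed after inspecting a single
directory; without it the lower 0700 directory causes a rejection.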
Revalidate Dentry and UDBA (User's Direct Branch Access)
----------------------------------------------------------------------
Generally VFS helpers re-validate a dentry as a part of lookup.
0. digging down the directory hierarchy.
1. lock the parent dir by its i_mutex.
2. lookup the final (child) entry.
3. revalidate it.
4. call the actual operation (create, unlink, etc.)
5. unlock the parent dir
If the filesystem implements its ->d_revalidate() (step 3), then it is
called. Actually aufs implements it and checks that the dentry on a
branch is still valid.
But it is not enough. Because aufs has to release the lock for the
parent dir on a branch at the end of ->lookup() (step 2) and
->d_revalidate() (step 3) while the i_mutex of the aufs dir is still
held by VFS.
If the file on a branch is changed directly, eg. bypassing aufs, after
aufs released the lock, then the subsequent operation may cause
an unpleasant result.
This situation is a result of the VFS architecture, in which ->lookup()
and ->d_revalidate() are separated. But I never say it is wrong. It is a
good design from VFS's point of view. It is just not suitable for the
sub-VFS character in aufs.
Aufs supports such cases by three levels of revalidation, which are
selectable by the user.
1. Simple Revalidate
In addition to the native flow in VFS, confirm the child-parent
relationship on the branch just after locking the parent dir on the
branch in the "actual operation" (step 4). When this validation
fails, aufs returns EBUSY. ->d_revalidate() (step 3) in aufs still
checks the validity of the dentry on branches.
2. Monitor Changes Internally by Inotify/Fsnotify
In addition to the above, in the "actual operation" (step 4) aufs
re-looks up the dentry on the branch, and returns EBUSY if it finds a
different dentry.
Additionally, aufs sets the inotify/fsnotify watch for every dir on
branches while it is in cache. When an event is notified, aufs
registers a function to the kernel 'events' thread by schedule_work().
The function sets some special status to the cached aufs dentry and
inode private data. If they are not cached, then aufs has nothing to
do. When the same file is accessed through aufs (step 0-3) later,
aufs will detect the status and refresh all necessary data.
In this mode, aufs has to ignore the events which are fired by aufs
itself.
3. No Extra Validation
This is the simplest test: it doesn't add any additional revalidation
test, and skips the revalidation in step 4. It is useful and improves
aufs performance when the system surely hides the aufs branches from
users, by over-mounting something (or another method).
Branch Manipulation
Since aufs supports dynamic branch manipulation, ie. adding/removing a
branch and changing its permission/attribute, there is a lot of work to
do.
Add a Branch
----------------------------------------------------------------------
o Confirm that the dir to be added exists outside of aufs, including
loopback mount, and check its various attributes.
o Initialize the xino file and whiteout bases if necessary.
See struct.txt.
o Check the owner/group/mode of the directory
When the owner/group/mode of the directory being added differs from the
existing branch, aufs issues a warning because it may pose a
security risk.
For example, when an upper writable branch has a world-writable empty
top directory, a malicious user can create any files on the writable
branch directly, like a manual copy-up and modify. If something like
/etc/{passwd,shadow} exists on the lower readonly branch but not on the
upper writable branch, and the writable branch is world-writable, then a
malicious user may create /etc/passwd on the writable branch directly
and the infected file will be valid in aufs.
I am afraid it can be a security issue, but aufs can do nothing except
produce a warning.
Delete a Branch
----------------------------------------------------------------------
o Confirm the deleting branch is not busy
Generally speaking, there is one merit in adopting the "remount"
interface to manipulate branches: it discards caches. When deleting a
branch, aufs checks the still cached (and connected) dentries and
inodes. If there are any, then they are all in use. An inode without its
corresponding dentry can be alive alone (for example, inotify/fsnotify case).
For the cached one, aufs checks whether the same named entry exists on
other branches.
If the cached one is a directory, because aufs provides a merged view
to users, as long as one dir is left on any branch aufs can show the
dir to users. In this case, the branch can be removed from aufs.
Otherwise aufs rejects deleting the branch.
If any file on the deleting branch is opened by aufs, then aufs
rejects deleting.
Modify the Permission of a Branch
----------------------------------------------------------------------
o Re-initialize or remove the xino file and whiteout bases if necessary.
See struct.txt.
o rw --> ro: Confirm the modifying branch is not busy
Aufs rejects the request if any of these conditions are true.
- a file on the branch is mmap-ed.
- a regular file on the branch is opened for write and there is no
same named entry on the upper branch.
Policies to Select One among Multiple Writable Branches
----------------------------------------------------------------------
When there is more than one writable branch, aufs has to decide
the target branch for file creation or copy-up. By default, the highest
writable branch which has the parent (or ancestor) dir of the target
file is chosen (top-down-parent policy).
At the user's request, aufs implements some other policies to select the
writable branch. For file creation there are several policies:
round-robin, most-free-space, and others. For copy-up there are
top-down-parent, bottom-up-parent, bottom-up and others.
As expected, the round-robin policy selects the branch in a circular
manner. When you have two writable branches and create 10 new files, 5
files will be created on each branch. The mkdir(2) systemcall is an
exception. When you create 10 new directories, all will be created on
the same branch.
The most-free-space policy selects the one which has the most free
space among the writable branches. The amount of free space is
checked by aufs internally, and users can specify its time interval.
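Both create policies are easy to model. The sketch below is
illustrative only: the branch names and free-space figures are made up,
round_robin and most_free_space are hypothetical function names, and the
mkdir(2) exception is not modelled.

```python
from collections import Counter
from itertools import cycle
from typing import Dict, Iterator, List


def round_robin(writable: List[str]) -> Iterator[str]:
    """Yield the writable branches in circular order, one per new file."""
    return cycle(writable)


def most_free_space(free_bytes: Dict[str, int]) -> str:
    """Pick the writable branch that currently reports the most free space."""
    return max(free_bytes, key=lambda name: free_bytes[name])


# Two writable branches and 10 new files: 5 files land on each branch.
picker = round_robin(["rw0", "rw1"])
placement = Counter(next(picker) for _ in range(10))

# Hypothetical free-space figures in bytes.
mfs_choice = most_free_space({"rw0": 10 * 2**30, "rw1": 25 * 2**30})
```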
The policies for copy-up are simpler:
top-down-parent is equivalent to the same-named one in the create policy,
bottom-up-parent selects the writable branch where the parent dir
exists and which is the nearest upper one from the copyup-source,
bottom-up selects the nearest upper writable branch from the
copyup-source, regardless of the existence of the parent dir.
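The three copy-up policies differ only in search direction and in
whether the parent dir must exist. A sketch under stated assumptions:
branch index 0 is the topmost branch, the search is limited to branches
above the copyup-source, and the Branch class and function names are
hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Branch:
    writable: bool
    has_parent: bool  # does the parent dir of the target exist here?


def top_down_parent(brs: List[Branch], src: int) -> Optional[int]:
    """Highest writable branch above src that has the parent dir."""
    for i in range(src):
        if brs[i].writable and brs[i].has_parent:
            return i
    return None


def bottom_up_parent(brs: List[Branch], src: int) -> Optional[int]:
    """Nearest writable branch above src that has the parent dir."""
    for i in range(src - 1, -1, -1):
        if brs[i].writable and brs[i].has_parent:
            return i
    return None


def bottom_up(brs: List[Branch], src: int) -> Optional[int]:
    """Nearest writable branch above src, parent dir or not."""
    for i in range(src - 1, -1, -1):
        if brs[i].writable:
            return i
    return None


# Three writable branches above a readonly copyup-source at index 3.
brs = [Branch(True, True), Branch(True, True),
       Branch(True, False), Branch(False, True)]
choices = (top_down_parent(brs, 3), bottom_up_parent(brs, 3), bottom_up(brs, 3))
```

In this layout the three policies select branch 0, 1 and 2
respectively.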
There are some rules or exceptions to apply these policies.
- If there is a readonly branch above the policy-selected branch and
the parent dir is marked as opaque (a variation of whiteout), or the
target (creating) file is whiteout-ed on the upper readonly branch,
then the result of the policy is ignored and the target file will be
created on the nearest writable branch above the readonly branch.
- If there is a writable branch above the policy-selected branch and
the parent dir is marked as opaque or the target file is whiteout-ed
on the branch, then the result of the policy is ignored and the target
file will be created on the highest one among the upper writable
branches that have diropq or whiteout. In case of whiteout, aufs
removes it as usual.
- link(2) and rename(2) systemcalls are exceptions in every policy.
They try selecting the branch where the source exists if possible,
since copying up a large file will take a long time. If it can't be,
ie. the branch where the source exists is readonly, then they will
follow the copyup policy.
- There is an exception for rename(2) when the target exists.
If the rename target exists, aufs compares the index of the branches
where the source and the target exist and selects the higher
one. If the selected branch is readonly, then aufs follows the
copyup policy.
# Copyright (C) 2011-2015 Junjiro R. Okajima
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software
# Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
File-based Hierarchical Storage Management (FHSM)
----------------------------------------------------------------------
Hierarchical Storage Management (or HSM) is a well-known feature in the
storage world. Aufs provides this feature on a per-file basis with
multiple writable branches, based upon the principle of "Colder, the
Lower". Here the word "colder" refers to the less-used files, and
"lower" refers to the position in the vertical order of the stacked
branches.
These multiple writable branches are prioritized, ie. the topmost one
should be the fastest drive and be used heavily.
o Characters in aufs FHSM story
- aufs itself and a new branch attribute.
- a new ioctl interface to move-down and to establish a connection with
the daemon ("move-down" is a converse of "copy-up").
- userspace tool and daemon.
The userspace daemon establishes a connection with aufs and waits for
notifications. The notified information is very similar to struct
statfs, containing the number of consumed blocks and inodes.
When the consumed blocks/inodes of a branch exceed the user-specified
upper watermark, the daemon activates its move-down process until the
consumed blocks/inodes reach the user-specified lower watermark.
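The two watermarks form a simple hysteresis. The sketch below models
only the daemon-side decision; plan_move_down is a hypothetical name,
the block counts are invented, and ordering the candidates coldest first
is an assumption derived from the "Colder, the Lower" principle.

```python
from typing import List, Tuple


def plan_move_down(consumed: int, upper_mark: int, lower_mark: int,
                   cold_files: List[Tuple[str, int]]) -> Tuple[List[str], int]:
    """Return (files to move down, consumed blocks afterwards).

    cold_files holds (name, blocks) pairs ordered coldest first.
    """
    moved: List[str] = []
    if consumed <= upper_mark:
        return moved, consumed  # below the upper watermark: nothing to do
    for name, blocks in cold_files:
        if consumed <= lower_mark:
            break  # the lower watermark is reached: stop moving
        moved.append(name)
        consumed -= blocks
    return moved, consumed


files = [("a", 10), ("b", 10), ("c", 10), ("d", 10)]
busy = plan_move_down(95, upper_mark=90, lower_mark=70, cold_files=files)
idle = plan_move_down(80, upper_mark=90, lower_mark=70, cold_files=files)
```

A branch at 80 blocks sits between the watermarks and triggers nothing,
while a branch at 95 blocks is drained until it drops to the lower
watermark or below.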
The actual move-down is done by aufs based upon the request from
user-space since we need to maintain the inode number and the internal
pointer arrays in aufs.
Currently aufs FHSM handles the regular files only. Additionally they
must not be hard-linked nor pseudo-linked.
o Cowork of aufs and the user-space daemon
While the userspace daemon holds the connection, aufs sends a
small notification to it whenever aufs writes something into the
writable branch. But it may be costly since aufs issues statfs(2)
internally. So the user can specify a new option to cache the
info. Actually the notification is controlled by these factors.
+ the specified cache time.
+ classified as "force" by aufs internally.
Until the specified time expires, aufs doesn't send the info
except in the forced cases. When aufs decides to force, the info is
always notified to userspace.
For example, the number of free inodes is generally large enough and
a shortage of them happens rarely. So aufs doesn't force the
notification when creating a new file, directory and others. This is
the typical case in which aufs doesn't force.
When aufs writes the actual file data and the file consumes any new
blocks, aufs forces the notification.
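The interplay of the cache time and the "force" classification can be
sketched as a small throttle. This is a model only: the Notifier class
is hypothetical, the clock is passed in explicitly, and restarting the
cache period after every sent notification (forced or not) is an
assumption.

```python
from typing import List, Tuple


class Notifier:
    """Decide whether a write should produce a notification."""

    def __init__(self, cache_sec: float) -> None:
        self.cache_sec = cache_sec
        self.expires_at = 0.0  # time at which the cached info expires

    def should_notify(self, now: float, force: bool) -> bool:
        # Forced events always pass; others wait for the cache to expire.
        if force or now >= self.expires_at:
            self.expires_at = now + self.cache_sec
            return True
        return False


notifier = Notifier(cache_sec=30)
# (time in seconds, forced?) e.g. forced when new blocks were consumed.
events: List[Tuple[float, bool]] = [
    (0, False), (10, False), (20, True), (40, False), (50, False),
]
sent = [notifier.should_notify(t, forced) for t, forced in events]
```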
o Interfaces in aufs
- New branch attribute.
+ fhsm
Specifies that the branch is managed by the FHSM feature. In other
words, it is a participant in the FHSM.
When nofhsm is set to the branch, it will not be the source/target
branch of the move-down operation. This attribute is set
independently from coo and moo attributes, and if you want full
FHSM, you should specify them as well.
- New mount option.
+ fhsm_sec
Specifies the time in seconds during which the less important info is
suppressed from being notified.
- New ioctl.
+ AUFS_CTL_FHSM_FD
creates a new file descriptor from which userspace can read the
notification (a subset of struct statfs) from aufs.
- Module parameter 'brs'
It has to be set to 1. Otherwise the new mount option 'fhsm' will not
be set.
- mount helpers /sbin/mount.aufs and /sbin/umount.aufs
When there are two or more branches with fhsm attributes,
/sbin/mount.aufs invokes the user-space daemon and /sbin/umount.aufs
terminates it. As a result of remounting and branch-manipulation, the
number of branches with fhsm attribute can be one. In this case,
/sbin/mount.aufs will terminate the user-space daemon.
Finally the operation is done as these steps in kernel-space.
- make sure that,
+ no one else is using the file.
+ the file is not hard-linked.
+ the file is not pseudo-linked.
+ the file is a regular file.
+ the parent dir is not opaqued.
- find the target writable branch.
- make sure the file is not whiteout-ed by the upper (than the target)
branch.
- make the parent dir on the target branch.
- mutex lock the inode on the branch.
- unlink the whiteout on the target branch (if exists).
- lookup and create the whiteout-ed temporary name on the target branch.
- copy the file as the whiteout-ed temporary name on the target branch.
- rename the whiteout-ed temporary name to the original name.
- unlink the file on the source branch.
- maintain the internal pointer array and the external inode number
table (XINO).
- maintain the timestamps and other attributes of the parent dir and the
file.
And of course, in every step, an error may happen. So the operation
should restore the original file state after an error happens.
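A userspace sketch of the central steps (check, copy under a temporary
name, rename, unlink the source, restore on error). It is not the kernel
implementation: whiteouts, locking, XINO and timestamp maintenance are
left out, move_down is a hypothetical helper, and the ".tmp." name
merely stands in for the whiteout-ed temporary name.

```python
import os
import shutil
import stat
import tempfile


def move_down(src_branch: str, dst_branch: str, relpath: str) -> None:
    """Move a regular, non-hard-linked file to the lower writable branch."""
    src = os.path.join(src_branch, relpath)
    dst = os.path.join(dst_branch, relpath)
    st = os.lstat(src)
    if not stat.S_ISREG(st.st_mode) or st.st_nlink != 1:
        raise ValueError("only regular files with a single link are handled")
    if os.path.lexists(dst):
        raise FileExistsError(dst)
    # Make the parent dir on the target branch.
    os.makedirs(os.path.dirname(dst), exist_ok=True)
    tmp = os.path.join(os.path.dirname(dst), ".tmp." + os.path.basename(dst))
    try:
        shutil.copy2(src, tmp)  # copy under the temporary name
        os.rename(tmp, dst)     # then expose it under the original name
        os.unlink(src)          # finally drop the file on the source branch
    except OSError:
        # Restore the original state: remove whatever was created so far.
        for leftover in (tmp, dst):
            if os.path.lexists(leftover):
                os.unlink(leftover)
        raise


with tempfile.TemporaryDirectory() as top:
    fast, slow = os.path.join(top, "fast"), os.path.join(top, "slow")
    os.makedirs(os.path.join(fast, "data"))
    os.makedirs(slow)
    with open(os.path.join(fast, "data", "cold.bin"), "wb") as f:
        f.write(b"cold")
    move_down(fast, slow, os.path.join("data", "cold.bin"))
    gone_from_fast = not os.path.exists(os.path.join(fast, "data", "cold.bin"))
    with open(os.path.join(slow, "data", "cold.bin"), "rb") as f:
        moved_content = f.read()
```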
mmap(2) -- File Memory Mapping
----------------------------------------------------------------------
In aufs, the file-mapped pages are handled by a branch fs directly, with
no interaction with aufs. It means aufs_mmap() calls the branch fs's
->mmap().
This approach is simple and good, but there is one problem.
Under /proc, several entries show the mmapped files by their paths (with
device and inode number), and the printed path will be the path on the
branch fs instead of the virtual aufs one.
This is not a problem in most cases, but some utilities such as lsof(1)
(and their users) may expect the path on aufs.
To address this issue, aufs adds a new member called vm_prfile in struct
vm_area_struct (and struct vm_region). The original vm_file points to
the file on the branch fs in order to handle everything correctly as
usual. The new vm_prfile points to a virtual file in aufs, and the
show-functions in procfs refer to vm_prfile if it is set.
Also we need to maintain several other places that touch vm_file,
such as
- fork()/clone() copies vma and the reference count of vm_file is
incremented.
- merging vma maintains the ref count too.
This is not a good approach. It just fakes the printed path. But it
leaves all behaviour around f_mapping unchanged. This is surely an
advantage.
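The selection rule described above can be sketched with simplified stand-in types. This is a minimal model, not the kernel code: struct file_model, struct vma_model and vma_shown_file() are hypothetical names, and only the two pointers discussed in the text are modelled.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel structures (not the real layouts). */
struct file_model {
	const char *path;
};

struct vma_model {
	struct file_model *vm_file;	/* file on the branch fs, does the real work */
	struct file_model *vm_prfile;	/* virtual aufs file, used for printing only */
};

/* Pick the file whose path a procfs-style show function would print. */
const struct file_model *vma_shown_file(const struct vma_model *vma)
{
	assert(vma != NULL);
	return vma->vm_prfile ? vma->vm_prfile : vma->vm_file;
}
```

A mapping that does not come from aufs leaves vm_prfile unset, so the printed path stays exactly what it was before.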
Actually aufs had adopted another complicated approach which calls
generic_file_mmap() and handles struct vm_operations_struct. In this
approach, aufs met a hard problem and I could not solve it without
switching the approach.
There may be one more approach, which is
- bind-mount the branch-root onto the aufs-root internally
- grab the new vfsmount (ie. struct mount)
- lazy-umount the branch-root internally
- in open(2) the aufs-file, open the branch-file with the hidden
vfsmount (instead of the original branch's vfsmount)
- ideally this "bind-mount and lazy-umount" should be done atomically,
but it may be possible from userspace by the mount helper.
Adding the internal hidden vfsmount and using it in opening a file, the
file path under /proc will be printed correctly. This approach looks
smarter, but is not possible I am afraid.
- aufs-root may be bind-mounted later. When it happens, another hidden
vfsmount will be required.
- it is hard to get the chance to bind-mount and lazy-umount
+ in kernel-space, FS can have vfsmount in open(2) via
file->f_path, and aufs can know its vfsmount. But several locks are
already acquired, and if aufs tries to bind-mount and lazy-umount
here, then it may cause a deadlock.
+ in user-space, bind-mount doesn't invoke the mount helper.
- since /proc shows dev and ino, aufs has to give vma this info. It
means a new member vm_prinode will be necessary. This is essentially
equivalent to vm_prfile described above.
I have to give up this "looks-smarter" approach.
# Copyright (C) 2014-2015 Junjiro R. Okajima
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software
# Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
Listing XATTR/EA and getting the value
----------------------------------------------------------------------
For the inode standard attributes (owner, group, timestamps, etc.), aufs
shows the values from the topmost existing file. This behaviour is good
for the non-dir entries since the behaviour exactly matches the shown
information. But for the directories, aufs considers all the same named
entries on the lower branches. Which means, if one of the lower entries
rejects readdir call, then aufs returns an error even if the topmost
entry allows it. This behaviour is necessary to respect the branch fs's
security, but can make users confused since the user-visible standard
attributes don't match the behaviour.
To address this issue, aufs has a mount option called dirperm1 which
checks the permission for the topmost entry only, and ignores the lower
entry's permission.
A similar issue can happen around XATTR.
getxattr(2) and listxattr(2) families behave as if dirperm1 option is
always set. Otherwise these very unpleasant situations would happen.
- listxattr(2) may return the duplicated entries.
- users may not be able to remove or reset the XATTR forever.
XATTR/EA support in the internal (copy,move)-(up,down)
----------------------------------------------------------------------
Generally the extended attributes of inode are categorized as these.
- "security" for LSM and capability.
- "system" for posix ACL, 'acl' mount option is required for the branch
fs generally.
- "trusted" for userspace, CAP_SYS_ADMIN is required.
- "user" for userspace, 'user_xattr' mount option is required for the
branch fs generally.
Moreover there are some other categories. Aufs handles these rather
unpopular categories as the ordinary ones, ie. there is no special
condition nor exception.
In copy-up, the support for XATTR on the dst branch may differ from the
src branch. In this case, the copy-up operation will get an error and
the original user operation which triggered the copy-up will fail. It
can even happen that every copy-up fails.
When both of src and dst branches support XATTR and if an error occurs
during copying XATTR, then the copy-up should fail obviously. That is a
good reason and aufs should return an error to userspace. But when only
the src branch supports that XATTR, aufs should not return an error.
For example, the src branch supports ACL but the dst branch doesn't
because the dst branch may natively un-support it or temporarily
un-support it due to "noacl" mount option. Of course, the dst branch fs
may NOT return an error even if the XATTR is not supported. It is
totally up to the branch fs.
Anyway when the aufs internal copy-up gets an error from the dst branch
fs, then aufs tries removing the just copied entry and returns the error
to the userspace. In the worst case, every copy-up will fail.
For the copy-up operation, there are two basic approaches.
- copy the specified XATTR only (by category above), and return the
error unconditionally if it happens.
- copy all XATTR, and ignore the error on the specified category only.
In order to support XATTR and to implement the correct behaviour, aufs
chooses the latter approach and introduces some new branch attributes,
"icexsec", "icexsys", "icextr", "icexusr", and "icexoth".
They correspond to the XATTR namespaces (see above). Additionally, to be
convenient, "icex" is also provided which means all "icex*" attributes
are set (here the word "icex" stands for "ignore copy-error on XATTR").
The meaning of these attributes is to ignore the error from setting
XATTR on that branch.
Note that aufs tries copying all XATTR unconditionally, and ignores the
error from the dst branch according to the specified attributes.
Some XATTR may have its default value. The default value may come from
the parent dir or the environment. If the default value is set at the
file creating-time, it will be overwritten by copy-up.
Some contradiction may happen I am afraid.
Do we need another attribute to stop copying XATTR? I am unsure. For
now, aufs implements the branch attributes to ignore the error.
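The "copy all XATTR, ignore the error on the specified categories" rule can be sketched as below. This is a minimal model under stated assumptions: the ICEX_* bit values, icex_flag_for() and icex_filter_error() are hypothetical names chosen for the illustration, and only the namespace prefixes listed above are recognised.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical flag bits, one per "icex*" branch attribute. */
enum {
	ICEX_SEC = 1 << 0,	/* icexsec: "security." */
	ICEX_SYS = 1 << 1,	/* icexsys: "system." */
	ICEX_TR  = 1 << 2,	/* icextr:  "trusted." */
	ICEX_USR = 1 << 3,	/* icexusr: "user." */
	ICEX_OTH = 1 << 4,	/* icexoth: any other namespace */
	ICEX_ALL = ICEX_SEC | ICEX_SYS | ICEX_TR | ICEX_USR | ICEX_OTH
};

static int has_prefix(const char *name, const char *prefix)
{
	return strncmp(name, prefix, strlen(prefix)) == 0;
}

/* Map an XATTR name to the flag of its namespace. */
unsigned int icex_flag_for(const char *name)
{
	assert(name != NULL);
	if (has_prefix(name, "security."))
		return ICEX_SEC;
	if (has_prefix(name, "system."))
		return ICEX_SYS;
	if (has_prefix(name, "trusted."))
		return ICEX_TR;
	if (has_prefix(name, "user."))
		return ICEX_USR;
	return ICEX_OTH;
}

/*
 * Filter the result of setting one XATTR on the dst branch: an error is
 * dropped only if the branch carries the matching icex flag; otherwise
 * it is returned unchanged and the copy-up fails.
 */
int icex_filter_error(unsigned int br_flags, const char *name, int err)
{
	if (err && (br_flags & icex_flag_for(name)))
		return 0;
	return err;
}
```

With this shape, every XATTR is still attempted on the dst branch; the branch attributes only decide which failures are tolerated.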
# Copyright (C) 2005-2015 Junjiro R. Okajima
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program. If not, see <http://www.gnu.org/licenses/>.
Export Aufs via NFS
----------------------------------------------------------------------
Here is an approach.
- like xino/xib, add a new file 'xigen' which stores aufs inode
generation.
- iget_locked(): initialize aufs inode generation for a new inode, and
store it in xigen file.
- destroy_inode(): increment aufs inode generation and store it in xigen
file. it is necessary even if it is not unlinked, because any data of
inode may be changed by UDBA.
- encode_fh(): for a root dir, simply return FILEID_ROOT. otherwise
build file handle by
+ branch id (4 bytes)
+ superblock generation (4 bytes)
+ inode number (4 or 8 bytes)
+ parent dir inode number (4 or 8 bytes)
+ inode generation (4 bytes)
+ return value of exportfs_encode_fh() for the parent on a branch (4
bytes)
+ file handle for a branch (by exportfs_encode_fh())
- fh_to_dentry():
+ find the index of a branch from its id in handle, and check that it
still exists in aufs.
+ 1st level: get the inode number from handle and search it in cache.
+ 2nd level: if not found in cache, get the parent inode number from
the handle and search it in cache. and then open the found parent
dir, find the matching inode number by vfs_readdir() and get its
name, and call lookup_one_len() for the target dentry.
+ 3rd level: if the parent dir is not cached, call
exportfs_decode_fh() for a branch and get the parent on a branch,
build a pathname of it, convert it to a pathname in aufs, call
path_lookup(). now aufs gets a parent dir dentry, then handle it as
the 2nd level.
+ to open the dir, aufs needs struct vfsmount. aufs keeps vfsmount
for every branch, but not itself. to get this, (currently) aufs
searches in current->nsproxy->mnt_ns list. It may not be a good
idea, but I could not find another approach.
+ test the generation of the gotten inode.
- every inode operation: they may get EBUSY due to UDBA. in this case,
convert it into ESTALE for NFSD.
- readdir(): call lockdep_on/off() because filldir in NFSD calls
lookup_one_len(), vfs_getattr(), encode_fh() and others.
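The role of the inode generation in the approach above can be sketched with an in-memory table standing in for the xigen file. This is a minimal model, not the aufs implementation: XIGEN_SLOTS, struct fh_model, fh_encode(), inode_destroyed() and fh_check() are hypothetical names, and the branch-related parts of the file handle are left out.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define XIGEN_SLOTS 16	/* hypothetical table size for the sketch */

/* In-memory stand-in for the xigen file: one generation per inode number. */
static uint32_t xigen[XIGEN_SLOTS];

/* The part of a file handle that identifies the aufs inode. */
struct fh_model {
	uint64_t ino;
	uint32_t igen;
};

/* Like encode_fh(): record the current generation in the handle. */
struct fh_model fh_encode(uint64_t ino)
{
	struct fh_model fh;

	assert(ino < XIGEN_SLOTS);
	fh.ino = ino;
	fh.igen = xigen[ino];
	return fh;
}

/* Like destroy_inode(): bump the generation so that old handles go stale. */
void inode_destroyed(uint64_t ino)
{
	assert(ino < XIGEN_SLOTS);
	xigen[ino]++;
}

/* Like the last step of fh_to_dentry(): test the generation. */
int fh_check(const struct fh_model *fh)
{
	assert(fh && fh->ino < XIGEN_SLOTS);
	return fh->igen == xigen[fh->ino] ? 0 : -ESTALE;
}
```

A handle taken before the inode was destroyed no longer matches the stored generation, so the client sees ESTALE instead of silently reaching a different object with the same inode number.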
# Copyright (C) 2005-2015 Junjiro R. Okajima
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program. If not, see <http://www.gnu.org/licenses/>.
Show Whiteout Mode (shwh)
----------------------------------------------------------------------
Generally aufs hides the name of whiteouts. But in some cases, to show
them is very useful for users. For instance, creating a new middle layer
(branch) by merging existing layers.
(borrowing aufs1 HOW-TO from a user, Michael Towers)
When you have three branches,
- Bottom: 'system', squashfs (underlying base system), read-only
- Middle: 'mods', squashfs, read-only
- Top: 'overlay', ram (tmpfs), read-write
The top layer is loaded at boot time and saved at shutdown, to preserve
the changes made to the system during the session.
When larger changes have been made, or smaller changes have accumulated,
the size of the saved top layer data grows. At this point, it would be
nice to be able to merge the two overlay branches ('mods' and 'overlay')
and rewrite the 'mods' squashfs, clearing the top layer and thus
restoring save and load speed.
This merging is simplified by the use of another aufs mount, of just the
two overlay branches using the 'shwh' option.
# mount -t aufs -o ro,shwh,br:/livesys/overlay=ro+wh:/livesys/mods=rr+wh \
aufs /livesys/merge_union
A merged view of these two branches is then available at
/livesys/merge_union, and the new feature is that the whiteouts are
visible!
Note that in 'shwh' mode the aufs mount must be 'ro', which will disable
writing to all branches. Also the default mode for all branches is 'ro'.
It is now possible to save the combined contents of the two overlay
branches to a new squashfs, e.g.:
# mksquashfs /livesys/merge_union /path/to/newmods.squash
This new squashfs archive can be stored on the boot device and the
initramfs will use it to replace the old one at the next boot.
# Copyright (C) 2010-2015 Junjiro R. Okajima
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program. If not, see <http://www.gnu.org/licenses/>.
Dynamically customizable FS operations
----------------------------------------------------------------------
Generally FS operations (struct inode_operations, struct
address_space_operations, struct file_operations, etc.) are defined as
"static const", but that does not mean an FS has only one set of
operations. Some FS have multiple sets of them. For instance, ext2 has
three sets, one for XIP, one for NOBH, and one for normal.
Since aufs overrides and redirects these operations, sometimes aufs has
to change its behaviour according to the branch FS type. More importantly
VFS acts differently if a function (member in the struct) is set or
not. It means aufs should have several sets of operations and select one
among them according to the branch FS definition.
In order to solve this problem and not to affect the behaviour of VFS,
aufs defines these operations dynamically. For instance, aufs defines
dummy direct_IO function for struct address_space_operations, but it may
not be set to the address_space_operations actually. When the branch FS
doesn't have it, aufs doesn't set it to its address_space_operations
while the function definition itself is still alive. So the behaviour
itself will not change, and it will return an error when direct_IO is
not set.
The lifetime of these dynamically generated operation objects is
maintained by the aufs branch object. When the branch is removed from aufs,
the reference counter of the object is decremented. When it reaches
zero, the dynamically generated operation object will be freed.
This approach is designed to support AIO (io_submit), Direct I/O and
XIP (DAX) mainly.
Currently this approach is applied to address_space_operations for
regular files only.
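The idea can be sketched with a one-member operations table. This is a minimal model under stated assumptions: struct aops_model, struct dynop_model, dynop_get() and dynop_put() are hypothetical names, and a plain integer stands in for the reference counter kept by the branch object.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef int (*direct_io_fn)(void);

/* Simplified stand-in for struct address_space_operations. */
struct aops_model {
	direct_io_fn direct_IO;
};

/* A dynamically generated operation object, owned by a branch. */
struct dynop_model {
	struct aops_model ops;
	int refcnt;
};

/* The aufs-side direct_IO: always defined, but only installed on demand. */
static int aufs_direct_io_stub(void)
{
	return 0;
}

/* Build the aufs-side table, mirroring which members the branch fs sets. */
struct dynop_model *dynop_get(const struct aops_model *branch_ops)
{
	struct dynop_model *d;

	assert(branch_ops != NULL);
	d = calloc(1, sizeof(*d));
	if (!d)
		return NULL;
	if (branch_ops->direct_IO)
		d->ops.direct_IO = aufs_direct_io_stub;
	d->refcnt = 1;
	return d;
}

/* Drop one reference; free the object when the count reaches zero. */
int dynop_put(struct dynop_model *d)
{
	assert(d && d->refcnt > 0);
	if (--d->refcnt > 0)
		return 0;
	free(d);
	return 1;	/* the object was freed */
}
```

Because the member is left NULL when the branch fs does not provide it, a caller that tests "is direct_IO set?" gets the same answer from aufs as it would from the branch fs.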
@@ -2029,6 +2029,19 @@ F: include/linux/audit.h
 F: include/uapi/linux/audit.h
 F: kernel/audit*
 
+AUFS (advanced multi layered unification filesystem) FILESYSTEM
+M: "J. R. Okajima" <hooanon05g@gmail.com>
+L: linux-unionfs@vger.kernel.org
+L: aufs-users@lists.sourceforge.net (members only)
+W: http://aufs.sourceforge.net
+T: git://github.com/sfjro/aufs4-linux.git
+S: Supported
+F: Documentation/filesystems/aufs/
+F: Documentation/ABI/testing/debugfs-aufs
+F: Documentation/ABI/testing/sysfs-aufs
+F: fs/aufs/
+F: include/uapi/linux/aufs_type.h
+
 AUXILIARY DISPLAY DRIVERS
 M: Miguel Ojeda Sandonis <miguel.ojeda.sandonis@gmail.com>
 W: http://miguelojeda.es/auxdisplay.htm
......
@@ -556,7 +556,7 @@ static int do_req_filebacked(struct loop_device *lo, struct request *rq)
 }
 
 struct switch_request {
-	struct file *file;
+	struct file *file, *virt_file;
 	struct completion wait;
 };
 
@@ -582,6 +582,7 @@ static void do_loop_switch(struct loop_device *lo, struct switch_request *p)
 	mapping = file->f_mapping;
 	mapping_set_gfp_mask(old_file->f_mapping, lo->old_gfp_mask);
 	lo->lo_backing_file = file;
+	lo->lo_backing_virt_file = p->virt_file;
 	lo->lo_blocksize = S_ISBLK(mapping->host->i_mode) ?
 		mapping->host->i_bdev->bd_block_size : PAGE_SIZE;
 	lo->old_gfp_mask = mapping_gfp_mask(mapping);
@@ -594,11 +595,13 @@ static void do_loop_switch(struct loop_device *lo, struct switch_request *p)
  * First it needs to flush existing IO, it does this by sending a magic
  * BIO down the pipe. The completion of this BIO does the actual switch.
  */
-static int loop_switch(struct loop_device *lo, struct file *file)
+static int loop_switch(struct loop_device *lo, struct file *file,
+		       struct file *virt_file)
 {
 	struct switch_request w;
 
 	w.file = file;
+	w.virt_file = virt_file;
 
 	/* freeze queue and wait for completion of scheduled requests */
 	blk_mq_freeze_queue(lo->lo_queue);
@@ -617,7 +620,16 @@ static int loop_switch(struct loop_device *lo, struct file *file)
  */
 static int loop_flush(struct loop_device *lo)
 {
-	return loop_switch(lo, NULL);
+	return loop_switch(lo, NULL, NULL);
+}
+
+static struct file *loop_real_file(struct file *file)
+{
+	struct file *f = NULL;
+
+	if (file->f_path.dentry->d_sb->s_op->real_loop)
+		f = file->f_path.dentry->d_sb->s_op->real_loop(file);
+	return f;
 }
 
 static void loop_reread_partitions(struct loop_device *lo,
@@ -654,6 +666,7 @@ static int loop_change_fd(struct loop_device *lo, struct block_device *bdev,
 			  unsigned int arg)
 {
 	struct file *file, *old_file;
+	struct file *f, *virt_file = NULL, *old_virt_file;
 	struct inode *inode;
 	int error;
 
@@ -670,9 +683,16 @@ static int loop_change_fd(struct loop_device *lo, struct block_device *bdev,
 	file = fget(arg);
 	if (!file)
 		goto out;
+	f = loop_real_file(file);
+	if (f) {
+		virt_file = file;
+		file = f;
+		get_file(file);
+	}
 
 	inode = file->f_mapping->host;
 	old_file = lo->lo_backing_file;
+	old_virt_file = lo->lo_backing_virt_file;
 
 	error = -EINVAL;
 
@@ -684,17 +704,21 @@ static int loop_change_fd(struct loop_device *lo, struct block_device *bdev,
 		goto out_putf;
 
 	/* and ... switch */
-	error = loop_switch(lo, file);
+	error = loop_switch(lo, file, virt_file);
 	if (error)
 		goto out_putf;
 
 	fput(old_file);
+	if (old_virt_file)
+		fput(old_virt_file);
 	if (lo->lo_flags & LO_FLAGS_PARTSCAN)
 		loop_reread_partitions(lo, bdev);
 	return 0;
 
 out_putf:
 	fput(file);
+	if (virt_file)
+		fput(virt_file);
 out:
 	return error;
 }
@@ -706,6 +730,24 @@ static inline int is_loop_device(struct file *file)
 	return i && S_ISBLK(i->i_mode) && MAJOR(i->i_rdev) == LOOP_MAJOR;
 }
 
+/*
+ * for AUFS
+ * no get/put for file.
+ */
+struct file *loop_backing_file(struct super_block *sb)
+{
+	struct file *ret;
+	struct loop_device *l;
+
+	ret = NULL;
+	if (MAJOR(sb->s_dev) == LOOP_MAJOR) {
+		l = sb->s_bdev->bd_disk->private_data;
+		ret = l->lo_backing_file;
+	}
+	return ret;
+}
+EXPORT_SYMBOL(loop_backing_file);
+
 /* loop sysfs attributes */
 
 static ssize_t loop_attr_show(struct device *dev, char *page,
@@ -863,7 +905,7 @@ static int loop_prepare_queue(struct loop_device *lo)
 static int loop_set_fd(struct loop_device *lo, fmode_t mode,
 		       struct block_device *bdev, unsigned int arg)
 {
-	struct file *file, *f;
+	struct file *file, *f, *virt_file = NULL;
 	struct inode *inode;
 	struct address_space *mapping;
 	unsigned lo_blocksize;
@@ -878,6 +920,12 @@ static int loop_set_fd(struct loop_device *lo, fmode_t mode,
 	file = fget(arg);
 	if (!file)
 		goto out;
+	f = loop_real_file(file);
+	if (f) {
+		virt_file = file;
+		file = f;
+		get_file(file);
+	}
 
 	error = -EBUSY;
 	if (lo->lo_state != Lo_unbound)
@@ -930,6 +978,7 @@ static int loop_set_fd(struct loop_device *lo, fmode_t mode,
 	lo->lo_device = bdev;
 	lo->lo_flags = lo_flags;
 	lo->lo_backing_file = file;
+	lo->lo_backing_virt_file = virt_file;
 	lo->transfer = NULL;
 	lo->ioctl = NULL;
 	lo->lo_sizelimit = 0;
@@ -962,6 +1011,8 @@ static int loop_set_fd(struct loop_device *lo, fmode_t mode,
 
 out_putf:
 	fput(file);
+	if (virt_file)
+		fput(virt_file);
 out:
 	/* This is safe: open() is still holding a reference. */
 	module_put(THIS_MODULE);
@@ -1008,6 +1059,7 @@ loop_init_xfer(struct loop_device *lo, struct loop_func_table *xfer,
 static int loop_clr_fd(struct loop_device *lo)
 {
 	struct file *filp = lo->lo_backing_file;
+	struct file *virt_filp = lo->lo_backing_virt_file;
 	gfp_t gfp = lo->old_gfp_mask;
 	struct block_device *bdev = lo->lo_device;
 
@@ -1039,6 +1091,7 @@ static int loop_clr_fd(struct loop_device *lo)
 	spin_lock_irq(&lo->lo_lock);
 	lo->lo_state = Lo_rundown;
 	lo->lo_backing_file = NULL;
+	lo->lo_backing_virt_file = NULL;
 	spin_unlock_irq(&lo->lo_lock);
 
 	loop_release_xfer(lo);
@@ -1083,6 +1136,8 @@ static int loop_clr_fd(struct loop_device *lo)
 	 * bd_mutex which is usually taken before lo_ctl_mutex.
 	 */
 	fput(filp);
+	if (virt_filp)
+		fput(virt_filp);
 	return 0;
 }
 
......
@@ -46,7 +46,7 @@ struct loop_device {
 	int (*ioctl)(struct loop_device *, int cmd,
 		     unsigned long arg);
 
-	struct file * lo_backing_file;
+	struct file * lo_backing_file, *lo_backing_virt_file;
 	struct block_device *lo_device;
 	unsigned lo_blocksize;
 	void *key_data;
......
@@ -221,6 +221,7 @@ source "fs/pstore/Kconfig"
 source "fs/sysv/Kconfig"
 source "fs/ufs/Kconfig"
 source "fs/exofs/Kconfig"
+source "fs/aufs/Kconfig"
 
 endif # MISC_FILESYSTEMS
 
......
@@ -126,3 +126,4 @@ obj-y += exofs/ # Multiple modules
 obj-$(CONFIG_CEPH_FS) += ceph/
 obj-$(CONFIG_PSTORE) += pstore/
 obj-$(CONFIG_EFIVAR_FS) += efivarfs/
+obj-$(CONFIG_AUFS_FS) += aufs/
config AUFS_FS
tristate "Aufs (Advanced multi layered unification filesystem) support"
help
Aufs is a stackable unification filesystem such as Unionfs,
which unifies several directories and provides a merged single
directory.
In the early days, aufs was entirely re-designed and
re-implemented Unionfs Version 1.x series. Introducing many
original ideas, approaches and improvements, it becomes totally
different from Unionfs while keeping the basic features.
if AUFS_FS
choice
prompt "Maximum number of branches"
default AUFS_BRANCH_MAX_127
help
Specifies the maximum number of branches (or member directories)
in a single aufs. The larger value consumes more system
resources and has a minor impact to performance.
config AUFS_BRANCH_MAX_127
bool "127"
help
Specifies the maximum number of branches (or member directories)
in a single aufs. The larger value consumes more system
resources and has a minor impact to performance.
config AUFS_BRANCH_MAX_511
bool "511"
help
Specifies the maximum number of branches (or member directories)
in a single aufs. The larger value consumes more system
resources and has a minor impact to performance.
config AUFS_BRANCH_MAX_1023
bool "1023"
help
Specifies the maximum number of branches (or member directories)
in a single aufs. The larger value consumes more system
resources and has a minor impact to performance.
config AUFS_BRANCH_MAX_32767
bool "32767"
help
Specifies the maximum number of branches (or member directories)
in a single aufs. The larger value consumes more system
resources and has a minor impact to performance.
endchoice
config AUFS_SBILIST
bool
depends on AUFS_MAGIC_SYSRQ || PROC_FS
default y
help
Automatic configuration for internal use.
When aufs supports Magic SysRq or /proc, enabled automatically.
config AUFS_HNOTIFY
bool "Detect direct branch access (bypassing aufs)"
help
If you want to modify files on branches directly, eg. bypassing aufs,
and want aufs to detect the changes of them fully, then enable this
option and use 'udba=notify' mount option.
Currently there is only one available configuration, "fsnotify".
It will have a negative impact to the performance.
See detail in aufs.5.
choice
prompt "method" if AUFS_HNOTIFY
default AUFS_HFSNOTIFY
config AUFS_HFSNOTIFY
bool "fsnotify"
select FSNOTIFY
endchoice
config AUFS_EXPORT
bool "NFS-exportable aufs"
depends on EXPORTFS
help
If you want to export your mounted aufs via NFS, then enable this
option. There are several requirements for this configuration.
See detail in aufs.5.
config AUFS_INO_T_64
bool
depends on AUFS_EXPORT
depends on 64BIT && !(ALPHA || S390)
default y
help
Automatic configuration for internal use.
/* typedef unsigned long/int __kernel_ino_t */
/* alpha and s390x are int */
config AUFS_XATTR
bool "support for XATTR/EA (including Security Labels)"
help
If your branch fs supports XATTR/EA and you want to make them
available in aufs too, then enable this option and specify the
branch attributes for EA.
See detail in aufs.5.
config AUFS_FHSM
bool "File-based Hierarchical Storage Management"
help
Hierarchical Storage Management (or HSM) is a well-known feature
in the storage world. Aufs provides this feature as file-based
with multiple branches.
These multiple branches are prioritized, ie. the topmost one
should be the fastest drive and be used heavily.
config AUFS_RDU
bool "Readdir in userspace"
help
Aufs has two methods to provide a merged view for a directory,
by a user-space library and by kernel-space natively. The latter
is always enabled but sometimes large and slow.
If you enable this option, install the library in aufs2-util
package, and set some environment variables for your readdir(3),
then the work will be handled in user-space which generally
shows better performance in most cases.
See detail in aufs.5.
config AUFS_SHWH
bool "Show whiteouts"
help
If you want to make the whiteouts in aufs visible, then enable
this option and specify 'shwh' mount option. Although it may
sound like philosophy or something, technically it simply
shows the names of whiteouts while keeping their behaviour.
config AUFS_BR_RAMFS
bool "Ramfs (initramfs/rootfs) as an aufs branch"
help
If you want to use ramfs as an aufs branch fs, then enable this
option. Generally tmpfs is recommended.
Aufs prohibits them from being a branch fs by default, because
initramfs generally becomes unusable after switch_root or
something similar. If you set initramfs as an aufs branch and boot
your system by switch_root, you will meet a problem easily since the
files in initramfs may be inaccessible.
Unless you are going to use ramfs as an aufs branch fs without
switch_root or something, leave it N.
config AUFS_BR_FUSE
bool "Fuse fs as an aufs branch"
depends on FUSE_FS
select AUFS_POLL
help
If you want to use fuse-based userspace filesystem as an aufs
branch fs, then enable this option.
It implements the internal poll(2) operation which is
implemented by fuse only (currently).
config AUFS_POLL
bool
help
Automatic configuration for internal use.
config AUFS_BR_HFSPLUS
bool "Hfsplus as an aufs branch"
depends on HFSPLUS_FS
default y
help
If you want to use hfsplus fs as an aufs branch fs, then enable
this option. This option introduces a small overhead at
copying-up a file on hfsplus.
config AUFS_BDEV_LOOP
bool
depends on BLK_DEV_LOOP
default y
help
Automatic configuration for internal use.
Convert =[ym] into =y.
config AUFS_DEBUG
bool "Debug aufs"
help
Enable this to compile aufs internal debug code.
It will have a negative impact to the performance.
config AUFS_MAGIC_SYSRQ
bool
depends on AUFS_DEBUG && MAGIC_SYSRQ
default y
help
Automatic configuration for internal use.
When aufs supports Magic SysRq, enabled automatically.
endif
include ${src}/magic.mk
ifeq (${CONFIG_AUFS_FS},m)
include ${src}/conf.mk
endif
-include ${src}/priv_def.mk
# cf. include/linux/kernel.h
# enable pr_debug
ccflags-y += -DDEBUG
# sparse requires the full pathname
ifdef M
ccflags-y += -include ${M}/../../include/uapi/linux/aufs_type.h
else
ccflags-y += -include ${srctree}/include/uapi/linux/aufs_type.h
endif
obj-$(CONFIG_AUFS_FS) += aufs.o
aufs-y := module.o sbinfo.o super.o branch.o xino.o sysaufs.o opts.o \
wkq.o vfsub.o dcsub.o \
cpup.o whout.o wbr_policy.o \
dinfo.o dentry.o \
dynop.o \
finfo.o file.o f_op.o \
dir.o vdir.o \
iinfo.o inode.o i_op.o i_op_add.o i_op_del.o i_op_ren.o \
mvdown.o ioctl.o
# all are boolean
aufs-$(CONFIG_PROC_FS) += procfs.o plink.o
aufs-$(CONFIG_SYSFS) += sysfs.o
aufs-$(CONFIG_DEBUG_FS) += dbgaufs.o
aufs-$(CONFIG_AUFS_BDEV_LOOP) += loop.o
aufs-$(CONFIG_AUFS_HNOTIFY) += hnotify.o
aufs-$(CONFIG_AUFS_HFSNOTIFY) += hfsnotify.o
aufs-$(CONFIG_AUFS_EXPORT) += export.o
aufs-$(CONFIG_AUFS_XATTR) += xattr.o
aufs-$(CONFIG_FS_POSIX_ACL) += posix_acl.o
aufs-$(CONFIG_AUFS_FHSM) += fhsm.o
aufs-$(CONFIG_AUFS_POLL) += poll.o
aufs-$(CONFIG_AUFS_RDU) += rdu.o
aufs-$(CONFIG_AUFS_BR_HFSPLUS) += hfsplus.o
aufs-$(CONFIG_AUFS_DEBUG) += debug.o
aufs-$(CONFIG_AUFS_MAGIC_SYSRQ) += sysrq.o
/*
* Copyright (C) 2005-2015 Junjiro R. Okajima
*
* This program, aufs is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation; either version 2 of the License, or
* (at your option) any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program. If not, see <http://www.gnu.org/licenses/>.
*/
/*
* all header files
*/
#ifndef __AUFS_H__
#define __AUFS_H__
#ifdef __KERNEL__
#define AuStub(type, name, body, ...) \
static inline type name(__VA_ARGS__) { body; }
#define AuStubVoid(name, ...) \
AuStub(void, name, , __VA_ARGS__)
#define AuStubInt0(name, ...) \
AuStub(int, name, return 0, __VA_ARGS__)
#include "debug.h"
#include "branch.h"
#include "cpup.h"
#include "dcsub.h"
#include "dbgaufs.h"
#include "dentry.h"
#include "dir.h"
#include "dynop.h"
#include "file.h"
#include "fstype.h"
#include "inode.h"
#include "loop.h"
#include "module.h"
#include "opts.h"
#include "rwsem.h"
#include "spl.h"
#include "super.h"
#include "sysaufs.h"
#include "vfsub.h"
#include "whout.h"
#include "wkq.h"
#endif /* __KERNEL__ */
#endif /* __AUFS_H__ */
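The AuStub macros in aufs.h generate empty static inline functions, so callers need no #ifdef of their own when an optional feature is compiled out. Below is a minimal user-space sketch of that pattern. The three macros are copied from the header; the names `demo_init`, `demo_fin`, `demo_caller` and the `DEMO_FEATURE` switch are hypothetical and do not exist in aufs.

```c
#include <assert.h>

/* the three macros are copied from aufs.h */
#define AuStub(type, name, body, ...) \
	static inline type name(__VA_ARGS__) { body; }
#define AuStubVoid(name, ...) \
	AuStub(void, name, , __VA_ARGS__)
#define AuStubInt0(name, ...) \
	AuStub(int, name, return 0, __VA_ARGS__)

#ifdef DEMO_FEATURE	/* hypothetical switch, stands in for a CONFIG_* */
int demo_init(int *state);
void demo_fin(int *state);
#else
/* feature disabled: both calls compile to nothing */
AuStubInt0(demo_init, int *state)
AuStubVoid(demo_fin, int *state)
#endif

/* a caller that needs no #ifdef of its own */
static inline int demo_caller(void)
{
	int state = 5, err;

	err = demo_init(&state);	/* stub: returns 0 */
	demo_fin(&state);		/* stub: does nothing */
	assert(state == 5);		/* the stubs never touch it */
	return err;
}
```

dbgaufs.h uses the same approach for the case where CONFIG_DEBUG_FS is disabled.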
/*
* Copyright (C) 2005-2015 Junjiro R. Okajima
*
* This program, aufs is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation; either version 2 of the License, or
* (at your option) any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program. If not, see <http://www.gnu.org/licenses/>.
*/
/*
* branch filesystems and xino for them
*/
#ifndef __AUFS_BRANCH_H__
#define __AUFS_BRANCH_H__
#ifdef __KERNEL__
#include <linux/mount.h>
#include "dynop.h"
#include "rwsem.h"
#include "super.h"
/* ---------------------------------------------------------------------- */
/* a xino file */
struct au_xino_file {
struct file *xi_file;
struct mutex xi_nondir_mtx;
/* todo: make xino files an array to support huge inode number */
#ifdef CONFIG_DEBUG_FS
struct dentry *xi_dbgaufs;
#endif
};
/* File-based Hierarchical Storage Management */
struct au_br_fhsm {
#ifdef CONFIG_AUFS_FHSM
struct mutex bf_lock;
unsigned long bf_jiffy;
struct aufs_stfs bf_stfs;
int bf_readable;
#endif
};
/* members for writable branch only */
enum {AuBrWh_BASE, AuBrWh_PLINK, AuBrWh_ORPH, AuBrWh_Last};
struct au_wbr {
struct au_rwsem wbr_wh_rwsem;
struct dentry *wbr_wh[AuBrWh_Last];
atomic_t wbr_wh_running;
#define wbr_whbase wbr_wh[AuBrWh_BASE] /* whiteout base */
#define wbr_plink wbr_wh[AuBrWh_PLINK] /* pseudo-link dir */
#define wbr_orph wbr_wh[AuBrWh_ORPH] /* dir for orphans */
/* mfs mode */
unsigned long long wbr_bytes;
};
/* ext2 has 3 types of operations at least, ext3 has 4 */
#define AuBrDynOp (AuDyLast * 4)
#ifdef CONFIG_AUFS_HFSNOTIFY
/* support for asynchronous destruction */
struct au_br_hfsnotify {
struct fsnotify_group *hfsn_group;
};
#endif
/* sysfs entries */
struct au_brsysfs {
char name[16];
struct attribute attr;
};
enum {
AuBrSysfs_BR,
AuBrSysfs_BRID,
AuBrSysfs_Last
};
/* protected by superblock rwsem */
struct au_branch {
struct au_xino_file br_xino;
aufs_bindex_t br_id;
int br_perm;
struct path br_path;
spinlock_t br_dykey_lock;
struct au_dykey *br_dykey[AuBrDynOp];
atomic_t br_count;
struct au_wbr *br_wbr;
struct au_br_fhsm *br_fhsm;
/* xino truncation */
atomic_t br_xino_running;
#ifdef CONFIG_AUFS_HFSNOTIFY
struct au_br_hfsnotify *br_hfsn;
#endif
#ifdef CONFIG_SYSFS
/* entries under sysfs per mount-point */
struct au_brsysfs br_sysfs[AuBrSysfs_Last];
#endif
};
/* ---------------------------------------------------------------------- */
static inline struct vfsmount *au_br_mnt(struct au_branch *br)
{
return br->br_path.mnt;
}
static inline struct dentry *au_br_dentry(struct au_branch *br)
{
return br->br_path.dentry;
}
static inline struct super_block *au_br_sb(struct au_branch *br)
{
return au_br_mnt(br)->mnt_sb;
}
static inline int au_br_rdonly(struct au_branch *br)
{
return ((au_br_sb(br)->s_flags & MS_RDONLY)
|| !au_br_writable(br->br_perm))
? -EROFS : 0;
}
static inline int au_br_hnotifyable(int brperm __maybe_unused)
{
#ifdef CONFIG_AUFS_HNOTIFY
return !(brperm & AuBrPerm_RR);
#else
return 0;
#endif
}
static inline int au_br_test_oflag(int oflag, struct au_branch *br)
{
int err, exec_flag;
err = 0;
exec_flag = oflag & __FMODE_EXEC;
if (unlikely(exec_flag && path_noexec(&br->br_path)))
err = -EACCES;
return err;
}
/* ---------------------------------------------------------------------- */
/* branch.c */
struct au_sbinfo;
void au_br_free(struct au_sbinfo *sinfo);
int au_br_index(struct super_block *sb, aufs_bindex_t br_id);
struct au_opt_add;
int au_br_add(struct super_block *sb, struct au_opt_add *add, int remount);
struct au_opt_del;
int au_br_del(struct super_block *sb, struct au_opt_del *del, int remount);
long au_ibusy_ioctl(struct file *file, unsigned long arg);
#ifdef CONFIG_COMPAT
long au_ibusy_compat_ioctl(struct file *file, unsigned long arg);
#endif
struct au_opt_mod;
int au_br_mod(struct super_block *sb, struct au_opt_mod *mod, int remount,
int *do_refresh);
struct aufs_stfs;
int au_br_stfs(struct au_branch *br, struct aufs_stfs *stfs);
/* xino.c */
static const loff_t au_loff_max = LLONG_MAX;
int au_xib_trunc(struct super_block *sb);
ssize_t xino_fread(vfs_readf_t func, struct file *file, void *buf, size_t size,
loff_t *pos);
ssize_t xino_fwrite(vfs_writef_t func, struct file *file, void *buf,
size_t size, loff_t *pos);
struct file *au_xino_create2(struct file *base_file, struct file *copy_src);
struct file *au_xino_create(struct super_block *sb, char *fname, int silent);
ino_t au_xino_new_ino(struct super_block *sb);
void au_xino_delete_inode(struct inode *inode, const int unlinked);
int au_xino_write(struct super_block *sb, aufs_bindex_t bindex, ino_t h_ino,
ino_t ino);
int au_xino_read(struct super_block *sb, aufs_bindex_t bindex, ino_t h_ino,
ino_t *ino);
int au_xino_br(struct super_block *sb, struct au_branch *br, ino_t hino,
struct file *base_file, int do_test);
int au_xino_trunc(struct super_block *sb, aufs_bindex_t bindex);
struct au_opt_xino;
int au_xino_set(struct super_block *sb, struct au_opt_xino *xino, int remount);
void au_xino_clr(struct super_block *sb);
struct file *au_xino_def(struct super_block *sb);
int au_xino_path(struct seq_file *seq, struct file *file);
/* ---------------------------------------------------------------------- */
/* Superblock to branch */
static inline
aufs_bindex_t au_sbr_id(struct super_block *sb, aufs_bindex_t bindex)
{
return au_sbr(sb, bindex)->br_id;
}
static inline
struct vfsmount *au_sbr_mnt(struct super_block *sb, aufs_bindex_t bindex)
{
return au_br_mnt(au_sbr(sb, bindex));
}
static inline
struct super_block *au_sbr_sb(struct super_block *sb, aufs_bindex_t bindex)
{
return au_br_sb(au_sbr(sb, bindex));
}
static inline void au_sbr_put(struct super_block *sb, aufs_bindex_t bindex)
{
atomic_dec(&au_sbr(sb, bindex)->br_count);
}
static inline int au_sbr_perm(struct super_block *sb, aufs_bindex_t bindex)
{
return au_sbr(sb, bindex)->br_perm;
}
static inline int au_sbr_whable(struct super_block *sb, aufs_bindex_t bindex)
{
return au_br_whable(au_sbr_perm(sb, bindex));
}
/* ---------------------------------------------------------------------- */
/*
* wbr_wh_read_lock, wbr_wh_write_lock
* wbr_wh_read_unlock, wbr_wh_write_unlock, wbr_wh_downgrade_lock
*/
AuSimpleRwsemFuncs(wbr_wh, struct au_wbr *wbr, &wbr->wbr_wh_rwsem);
#define WbrWhMustNoWaiters(wbr) AuRwMustNoWaiters(&wbr->wbr_wh_rwsem)
#define WbrWhMustAnyLock(wbr) AuRwMustAnyLock(&wbr->wbr_wh_rwsem)
#define WbrWhMustWriteLock(wbr) AuRwMustWriteLock(&wbr->wbr_wh_rwsem)
/* ---------------------------------------------------------------------- */
#ifdef CONFIG_AUFS_FHSM
static inline void au_br_fhsm_init(struct au_br_fhsm *brfhsm)
{
mutex_init(&brfhsm->bf_lock);
brfhsm->bf_jiffy = 0;
brfhsm->bf_readable = 0;
}
static inline void au_br_fhsm_fin(struct au_br_fhsm *brfhsm)
{
mutex_destroy(&brfhsm->bf_lock);
}
#else
AuStubVoid(au_br_fhsm_init, struct au_br_fhsm *brfhsm)
AuStubVoid(au_br_fhsm_fin, struct au_br_fhsm *brfhsm)
#endif
#endif /* __KERNEL__ */
#endif /* __AUFS_BRANCH_H__ */
AuConfStr = CONFIG_AUFS_FS=${CONFIG_AUFS_FS}
define AuConf
ifdef ${1}
AuConfStr += ${1}=${${1}}
endif
endef
AuConfAll = BRANCH_MAX_127 BRANCH_MAX_511 BRANCH_MAX_1023 BRANCH_MAX_32767 \
SBILIST \
HNOTIFY HFSNOTIFY \
EXPORT INO_T_64 \
XATTR \
FHSM \
RDU \
SHWH \
BR_RAMFS \
BR_FUSE POLL \
BR_HFSPLUS \
BDEV_LOOP \
DEBUG MAGIC_SYSRQ
$(foreach i, ${AuConfAll}, \
$(eval $(call AuConf,CONFIG_AUFS_${i})))
AuConfName = ${obj}/conf.str
${AuConfName}.tmp: FORCE
@echo ${AuConfStr} | tr ' ' '\n' | sed -e 's/^/"/' -e 's/$$/\\n"/' > $@
${AuConfName}: ${AuConfName}.tmp
@diff -q $< $@ > /dev/null 2>&1 || { \
echo ' GEN ' $@; \
cp -p $< $@; \
}
FORCE:
clean-files += ${AuConfName} ${AuConfName}.tmp
${obj}/sysfs.o: ${AuConfName}
-include ${srctree}/${src}/conf_priv.mk
/*
* Copyright (C) 2005-2015 Junjiro R. Okajima
*
* This program, aufs is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation; either version 2 of the License, or
* (at your option) any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program. If not, see <http://www.gnu.org/licenses/>.
*/
/*
* copy-up/down functions
*/
#ifndef __AUFS_CPUP_H__
#define __AUFS_CPUP_H__
#ifdef __KERNEL__
#include <linux/path.h>
struct inode;
struct file;
struct au_pin;
void au_cpup_attr_flags(struct inode *dst, unsigned int iflags);
void au_cpup_attr_timesizes(struct inode *inode);
void au_cpup_attr_nlink(struct inode *inode, int force);
void au_cpup_attr_changeable(struct inode *inode);
void au_cpup_igen(struct inode *inode, struct inode *h_inode);
void au_cpup_attr_all(struct inode *inode, int force);
/* ---------------------------------------------------------------------- */
struct au_cp_generic {
struct dentry *dentry;
aufs_bindex_t bdst, bsrc;
loff_t len;
struct au_pin *pin;
unsigned int flags;
};
/* cpup flags */
#define AuCpup_DTIME 1 /* do dtime_store/revert */
#define AuCpup_KEEPLINO (1 << 1) /* do not clear the lower xino,
for link(2) */
#define AuCpup_RENAME (1 << 2) /* rename after cpup */
#define AuCpup_HOPEN (1 << 3) /* call h_open_pre/post() in
cpup */
#define AuCpup_OVERWRITE (1 << 4) /* allow overwriting the
existing entry */
#define AuCpup_RWDST (1 << 5) /* force write target even if
the branch is marked as RO */
#define au_ftest_cpup(flags, name) ((flags) & AuCpup_##name)
#define au_fset_cpup(flags, name) \
do { (flags) |= AuCpup_##name; } while (0)
#define au_fclr_cpup(flags, name) \
do { (flags) &= ~AuCpup_##name; } while (0)
int au_copy_file(struct file *dst, struct file *src, loff_t len);
int au_sio_cpup_simple(struct au_cp_generic *cpg);
int au_sio_cpdown_simple(struct au_cp_generic *cpg);
int au_sio_cpup_wh(struct au_cp_generic *cpg, struct file *file);
int au_cp_dirs(struct dentry *dentry, aufs_bindex_t bdst,
int (*cp)(struct dentry *dentry, aufs_bindex_t bdst,
struct au_pin *pin,
struct dentry *h_parent, void *arg),
void *arg);
int au_cpup_dirs(struct dentry *dentry, aufs_bindex_t bdst);
int au_test_and_cpup_dirs(struct dentry *dentry, aufs_bindex_t bdst);
/* ---------------------------------------------------------------------- */
/* keep timestamps when copyup */
struct au_dtime {
struct dentry *dt_dentry;
struct path dt_h_path;
struct timespec dt_atime, dt_mtime;
};
void au_dtime_store(struct au_dtime *dt, struct dentry *dentry,
struct path *h_path);
void au_dtime_revert(struct au_dtime *dt);
#endif /* __KERNEL__ */
#endif /* __AUFS_CPUP_H__ */
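The cpup flags are accessed through token-pasting macros, so callers write the short flag name (DTIME, RENAME, ...) rather than the full AuCpup_ constant. The following user-space sketch shows how the set, clear and test macros combine. The flag values and macros are copied from cpup.h; `demo_cpup_flags` is a hypothetical helper written only for this illustration.

```c
#include <assert.h>

/* flag values and accessor macros copied from cpup.h */
#define AuCpup_DTIME		1
#define AuCpup_KEEPLINO		(1 << 1)
#define AuCpup_RENAME		(1 << 2)

#define au_ftest_cpup(flags, name)	((flags) & AuCpup_##name)
#define au_fset_cpup(flags, name) \
	do { (flags) |= AuCpup_##name; } while (0)
#define au_fclr_cpup(flags, name) \
	do { (flags) &= ~AuCpup_##name; } while (0)

/* hypothetical helper: build a flag word the way a caller might */
static unsigned int demo_cpup_flags(void)
{
	unsigned int flags = 0;

	au_fset_cpup(flags, DTIME);
	au_fset_cpup(flags, RENAME);
	assert(au_ftest_cpup(flags, DTIME));

	au_fclr_cpup(flags, DTIME);	/* only RENAME remains set */
	return flags;
}
```

Note that au_ftest_cpup() yields the masked bit rather than 0 or 1, so it is meant to be used in a boolean context.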
/*
* Copyright (C) 2005-2015 Junjiro R. Okajima
*
* This program, aufs is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation; either version 2 of the License, or
* (at your option) any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program. If not, see <http://www.gnu.org/licenses/>.
*/
/*
* debugfs interface
*/
#include <linux/debugfs.h>
#include "aufs.h"
#ifndef CONFIG_SYSFS
#error DEBUG_FS depends upon SYSFS
#endif
static struct dentry *dbgaufs;
static const mode_t dbgaufs_mode = S_IRUSR | S_IRGRP | S_IROTH;
/* 20 is the max number of decimal digits in a 64-bit ulong */
struct dbgaufs_arg {
int n;
char a[20 * 4];
};
/*
* common function for all XINO files
*/
static int dbgaufs_xi_release(struct inode *inode __maybe_unused,
struct file *file)
{
kfree(file->private_data);
return 0;
}
static int dbgaufs_xi_open(struct file *xf, struct file *file, int do_fcnt)
{
int err;
struct kstat st;
struct dbgaufs_arg *p;
err = -ENOMEM;
p = kmalloc(sizeof(*p), GFP_NOFS);
if (unlikely(!p))
goto out;
err = 0;
p->n = 0;
file->private_data = p;
if (!xf)
goto out;
err = vfs_getattr(&xf->f_path, &st);
if (!err) {
if (do_fcnt)
p->n = snprintf
(p->a, sizeof(p->a), "%ld, %llux%lu %lld\n",
(long)file_count(xf), st.blocks, st.blksize,
(long long)st.size);
else
p->n = snprintf(p->a, sizeof(p->a), "%llux%lu %lld\n",
st.blocks, st.blksize,
(long long)st.size);
AuDebugOn(p->n >= sizeof(p->a));
} else {
p->n = snprintf(p->a, sizeof(p->a), "err %d\n", err);
err = 0;
}
out:
return err;
}
static ssize_t dbgaufs_xi_read(struct file *file, char __user *buf,
size_t count, loff_t *ppos)
{
struct dbgaufs_arg *p;
p = file->private_data;
return simple_read_from_buffer(buf, count, ppos, p->a, p->n);
}
/* ---------------------------------------------------------------------- */
struct dbgaufs_plink_arg {
int n;
char a[];
};
static int dbgaufs_plink_release(struct inode *inode __maybe_unused,
struct file *file)
{
free_page((unsigned long)file->private_data);
return 0;
}
static int dbgaufs_plink_open(struct inode *inode, struct file *file)
{
int err, i, limit;
unsigned long n, sum;
struct dbgaufs_plink_arg *p;
struct au_sbinfo *sbinfo;
struct super_block *sb;
struct au_sphlhead *sphl;
err = -ENOMEM;
p = (void *)get_zeroed_page(GFP_NOFS);
if (unlikely(!p))
goto out;
err = -EFBIG;
sbinfo = inode->i_private;
sb = sbinfo->si_sb;
si_noflush_read_lock(sb);
if (au_opt_test(au_mntflags(sb), PLINK)) {
limit = PAGE_SIZE - sizeof(p->n);
/* the number of buckets */
n = snprintf(p->a + p->n, limit, "%d\n", AuPlink_NHASH);
p->n += n;
limit -= n;
sum = 0;
for (i = 0, sphl = sbinfo->si_plink;
i < AuPlink_NHASH;
i++, sphl++) {
n = au_sphl_count(sphl);
sum += n;
n = snprintf(p->a + p->n, limit, "%lu ", n);
p->n += n;
limit -= n;
if (unlikely(limit <= 0))
goto out_free;
}
p->a[p->n - 1] = '\n';
/* the sum of plinks */
n = snprintf(p->a + p->n, limit, "%lu\n", sum);
p->n += n;
limit -= n;
if (unlikely(limit <= 0))
goto out_free;
} else {
#define str "1\n0\n0\n"
p->n = sizeof(str) - 1;
strcpy(p->a, str);
#undef str
}
si_read_unlock(sb);
err = 0;
file->private_data = p;
goto out; /* success */
out_free:
si_read_unlock(sb); /* every jump to this label is made with the lock held */
free_page((unsigned long)p);
out:
return err;
}
static ssize_t dbgaufs_plink_read(struct file *file, char __user *buf,
size_t count, loff_t *ppos)
{
struct dbgaufs_plink_arg *p;
p = file->private_data;
return simple_read_from_buffer(buf, count, ppos, p->a, p->n);
}
static const struct file_operations dbgaufs_plink_fop = {
.owner = THIS_MODULE,
.open = dbgaufs_plink_open,
.release = dbgaufs_plink_release,
.read = dbgaufs_plink_read
};
/* ---------------------------------------------------------------------- */
static int dbgaufs_xib_open(struct inode *inode, struct file *file)
{
int err;
struct au_sbinfo *sbinfo;
struct super_block *sb;
sbinfo = inode->i_private;
sb = sbinfo->si_sb;
si_noflush_read_lock(sb);
err = dbgaufs_xi_open(sbinfo->si_xib, file, /*do_fcnt*/0);
si_read_unlock(sb);
return err;
}
static const struct file_operations dbgaufs_xib_fop = {
.owner = THIS_MODULE,
.open = dbgaufs_xib_open,
.release = dbgaufs_xi_release,
.read = dbgaufs_xi_read
};
/* ---------------------------------------------------------------------- */
#define DbgaufsXi_PREFIX "xi"
static int dbgaufs_xino_open(struct inode *inode, struct file *file)
{
int err;
long l;
struct au_sbinfo *sbinfo;
struct super_block *sb;
struct file *xf;
struct qstr *name;
err = -ENOENT;
xf = NULL;
name = &file->f_path.dentry->d_name;
if (unlikely(name->len < sizeof(DbgaufsXi_PREFIX)
|| memcmp(name->name, DbgaufsXi_PREFIX,
sizeof(DbgaufsXi_PREFIX) - 1)))
goto out;
err = kstrtol(name->name + sizeof(DbgaufsXi_PREFIX) - 1, 10, &l);
if (unlikely(err))
goto out;
sbinfo = inode->i_private;
sb = sbinfo->si_sb;
si_noflush_read_lock(sb);
if (l <= au_sbend(sb)) {
xf = au_sbr(sb, (aufs_bindex_t)l)->br_xino.xi_file;
err = dbgaufs_xi_open(xf, file, /*do_fcnt*/1);
} else
err = -ENOENT;
si_read_unlock(sb);
out:
return err;
}
static const struct file_operations dbgaufs_xino_fop = {
.owner = THIS_MODULE,
.open = dbgaufs_xino_open,
.release = dbgaufs_xi_release,
.read = dbgaufs_xi_read
};
void dbgaufs_brs_del(struct super_block *sb, aufs_bindex_t bindex)
{
aufs_bindex_t bend;
struct au_branch *br;
struct au_xino_file *xi;
if (!au_sbi(sb)->si_dbgaufs)
return;
bend = au_sbend(sb);
for (; bindex <= bend; bindex++) {
br = au_sbr(sb, bindex);
xi = &br->br_xino;
debugfs_remove(xi->xi_dbgaufs);
xi->xi_dbgaufs = NULL;
}
}
void dbgaufs_brs_add(struct super_block *sb, aufs_bindex_t bindex)
{
struct au_sbinfo *sbinfo;
struct dentry *parent;
struct au_branch *br;
struct au_xino_file *xi;
aufs_bindex_t bend;
char name[sizeof(DbgaufsXi_PREFIX) + 5]; /* "xi" bindex NUL */
sbinfo = au_sbi(sb);
parent = sbinfo->si_dbgaufs;
if (!parent)
return;
bend = au_sbend(sb);
for (; bindex <= bend; bindex++) {
snprintf(name, sizeof(name), DbgaufsXi_PREFIX "%d", bindex);
br = au_sbr(sb, bindex);
xi = &br->br_xino;
AuDebugOn(xi->xi_dbgaufs);
xi->xi_dbgaufs = debugfs_create_file(name, dbgaufs_mode, parent,
sbinfo, &dbgaufs_xino_fop);
/* ignore an error */
if (unlikely(!xi->xi_dbgaufs))
AuWarn1("failed %s under debugfs\n", name);
}
}
/* ---------------------------------------------------------------------- */
#ifdef CONFIG_AUFS_EXPORT
static int dbgaufs_xigen_open(struct inode *inode, struct file *file)
{
int err;
struct au_sbinfo *sbinfo;
struct super_block *sb;
sbinfo = inode->i_private;
sb = sbinfo->si_sb;
si_noflush_read_lock(sb);
err = dbgaufs_xi_open(sbinfo->si_xigen, file, /*do_fcnt*/0);
si_read_unlock(sb);
return err;
}
static const struct file_operations dbgaufs_xigen_fop = {
.owner = THIS_MODULE,
.open = dbgaufs_xigen_open,
.release = dbgaufs_xi_release,
.read = dbgaufs_xi_read
};
static int dbgaufs_xigen_init(struct au_sbinfo *sbinfo)
{
int err;
/*
* This function is a dynamic '__init' function actually,
* so the tiny check for si_rwsem is unnecessary.
*/
/* AuRwMustWriteLock(&sbinfo->si_rwsem); */
err = -EIO;
sbinfo->si_dbgaufs_xigen = debugfs_create_file
("xigen", dbgaufs_mode, sbinfo->si_dbgaufs, sbinfo,
&dbgaufs_xigen_fop);
if (sbinfo->si_dbgaufs_xigen)
err = 0;
return err;
}
#else
static int dbgaufs_xigen_init(struct au_sbinfo *sbinfo)
{
return 0;
}
#endif /* CONFIG_AUFS_EXPORT */
/* ---------------------------------------------------------------------- */
void dbgaufs_si_fin(struct au_sbinfo *sbinfo)
{
/*
* This function is a dynamic '__fin' function actually,
* so the tiny check for si_rwsem is unnecessary.
*/
/* AuRwMustWriteLock(&sbinfo->si_rwsem); */
debugfs_remove_recursive(sbinfo->si_dbgaufs);
sbinfo->si_dbgaufs = NULL;
kobject_put(&sbinfo->si_kobj);
}
int dbgaufs_si_init(struct au_sbinfo *sbinfo)
{
int err;
char name[SysaufsSiNameLen];
/*
* This function is a dynamic '__init' function actually,
* so the tiny check for si_rwsem is unnecessary.
*/
/* AuRwMustWriteLock(&sbinfo->si_rwsem); */
err = -ENOENT;
if (!dbgaufs) {
AuErr1("/debug/aufs is uninitialized\n");
goto out;
}
err = -EIO;
sysaufs_name(sbinfo, name);
sbinfo->si_dbgaufs = debugfs_create_dir(name, dbgaufs);
if (unlikely(!sbinfo->si_dbgaufs))
goto out;
kobject_get(&sbinfo->si_kobj);
sbinfo->si_dbgaufs_xib = debugfs_create_file
("xib", dbgaufs_mode, sbinfo->si_dbgaufs, sbinfo,
&dbgaufs_xib_fop);
if (unlikely(!sbinfo->si_dbgaufs_xib))
goto out_dir;
sbinfo->si_dbgaufs_plink = debugfs_create_file
("plink", dbgaufs_mode, sbinfo->si_dbgaufs, sbinfo,
&dbgaufs_plink_fop);
if (unlikely(!sbinfo->si_dbgaufs_plink))
goto out_dir;
err = dbgaufs_xigen_init(sbinfo);
if (!err)
goto out; /* success */
out_dir:
dbgaufs_si_fin(sbinfo);
out:
return err;
}
/* ---------------------------------------------------------------------- */
void dbgaufs_fin(void)
{
debugfs_remove(dbgaufs);
}
int __init dbgaufs_init(void)
{
int err;
err = -EIO;
dbgaufs = debugfs_create_dir(AUFS_NAME, NULL);
if (dbgaufs)
err = 0;
return err;
}
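The text built by dbgaufs_plink_open() has three lines: the number of buckets, the pseudo-link count of each bucket separated by a blank, and the total. A user-space reader could check it along the following lines. This is only a sketch: `plink_parse` and `is_digit` are hypothetical helpers, not part of aufs or its utilities, and the sketch assumes the whole file content has already been read into a NUL-terminated string.

```c
#include <assert.h>
#include <stdlib.h>

static int is_digit(char c)
{
	return c >= '0' && c <= '9';
}

/*
 * hypothetical helper: parse the three-line text read from the plink file.
 * returns 0 when the text is well-formed and the per-bucket counts add up
 * to the total on the last line, otherwise -1.
 */
static int plink_parse(const char *text, unsigned long *nbuckets,
		       unsigned long *total)
{
	char *end;
	unsigned long i, sum = 0;

	assert(text && nbuckets && total);

	/* line 1: the number of buckets */
	if (!is_digit(*text))
		return -1;
	*nbuckets = strtoul(text, &end, 10);
	if (*end != '\n')
		return -1;
	text = end + 1;

	/* line 2: one count per bucket, separated by a blank */
	for (i = 0; i < *nbuckets; i++) {
		while (*text == ' ')
			text++;
		if (!is_digit(*text))
			return -1;
		sum += strtoul(text, &end, 10);
		text = end;
	}
	if (*text != '\n')
		return -1;
	text++;

	/* line 3: the total number of pseudo-links */
	if (!is_digit(*text))
		return -1;
	*total = strtoul(text, &end, 10);
	if (*end != '\n')
		return -1;

	return sum == *total ? 0 : -1;
}
```

With the 'noplink' mount option the file content is "1\n0\n0\n", which this parser accepts as one bucket holding zero pseudo-links.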
/*
* Copyright (C) 2005-2015 Junjiro R. Okajima
*
* This program, aufs is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation; either version 2 of the License, or
* (at your option) any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program. If not, see <http://www.gnu.org/licenses/>.
*/
/*
* debugfs interface
*/
#ifndef __DBGAUFS_H__
#define __DBGAUFS_H__
#ifdef __KERNEL__
struct super_block;
struct au_sbinfo;
#ifdef CONFIG_DEBUG_FS
/* dbgaufs.c */
void dbgaufs_brs_del(struct super_block *sb, aufs_bindex_t bindex);
void dbgaufs_brs_add(struct super_block *sb, aufs_bindex_t bindex);
void dbgaufs_si_fin(struct au_sbinfo *sbinfo);
int dbgaufs_si_init(struct au_sbinfo *sbinfo);
void dbgaufs_fin(void);
int __init dbgaufs_init(void);
#else
AuStubVoid(dbgaufs_brs_del, struct super_block *sb, aufs_bindex_t bindex)
AuStubVoid(dbgaufs_brs_add, struct super_block *sb, aufs_bindex_t bindex)
AuStubVoid(dbgaufs_si_fin, struct au_sbinfo *sbinfo)
AuStubInt0(dbgaufs_si_init, struct au_sbinfo *sbinfo)
AuStubVoid(dbgaufs_fin, void)
AuStubInt0(__init dbgaufs_init, void)
#endif /* CONFIG_DEBUG_FS */
#endif /* __KERNEL__ */
#endif /* __DBGAUFS_H__ */
/*
* Copyright (C) 2005-2015 Junjiro R. Okajima
*
* This program, aufs is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation; either version 2 of the License, or
* (at your option) any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program. If not, see <http://www.gnu.org/licenses/>.
*/
/*
* sub-routines for dentry cache
*/
#include "aufs.h"
static void au_dpage_free(struct au_dpage *dpage)
{
int i;
struct dentry **p;
p = dpage->dentries;
for (i = 0; i < dpage->ndentry; i++)
dput(*p++);
free_page((unsigned long)dpage->dentries);
}
int au_dpages_init(struct au_dcsub_pages *dpages, gfp_t gfp)
{
int err;
void *p;
err = -ENOMEM;
dpages->dpages = kmalloc(sizeof(*dpages->dpages), gfp);
if (unlikely(!dpages->dpages))
goto out;
p = (void *)__get_free_page(gfp);
if (unlikely(!p))
goto out_dpages;
dpages->dpages[0].ndentry = 0;
dpages->dpages[0].dentries = p;
dpages->ndpage = 1;
return 0; /* success */
out_dpages:
kfree(dpages->dpages);
out:
return err;
}
void au_dpages_free(struct au_dcsub_pages *dpages)
{
int i;
struct au_dpage *p;
p = dpages->dpages;
for (i = 0; i < dpages->ndpage; i++)
au_dpage_free(p++);
kfree(dpages->dpages);
}
static int au_dpages_append(struct au_dcsub_pages *dpages,
struct dentry *dentry, gfp_t gfp)
{
int err, sz;
struct au_dpage *dpage;
void *p;
dpage = dpages->dpages + dpages->ndpage - 1;
sz = PAGE_SIZE / sizeof(dentry);
if (unlikely(dpage->ndentry >= sz)) {
AuLabel(new dpage);
err = -ENOMEM;
sz = dpages->ndpage * sizeof(*dpages->dpages);
p = au_kzrealloc(dpages->dpages, sz,
sz + sizeof(*dpages->dpages), gfp);
if (unlikely(!p))
goto out;
dpages->dpages = p;
dpage = dpages->dpages + dpages->ndpage;
p = (void *)__get_free_page(gfp);
if (unlikely(!p))
goto out;
dpage->ndentry = 0;
dpage->dentries = p;
dpages->ndpage++;
}
AuDebugOn(au_dcount(dentry) <= 0);
dpage->dentries[dpage->ndentry++] = dget_dlock(dentry);
return 0; /* success */
out:
return err;
}
/* todo: BAD approach */
/* copied from linux/fs/dcache.c */
enum d_walk_ret {
D_WALK_CONTINUE,
D_WALK_QUIT,
D_WALK_NORETRY,
D_WALK_SKIP,
};
extern void d_walk(struct dentry *parent, void *data,
enum d_walk_ret (*enter)(void *, struct dentry *),
void (*finish)(void *));
struct ac_dpages_arg {
int err;
struct au_dcsub_pages *dpages;
struct super_block *sb;
au_dpages_test test;
void *arg;
};
static enum d_walk_ret au_call_dpages_append(void *_arg, struct dentry *dentry)
{
enum d_walk_ret ret;
struct ac_dpages_arg *arg = _arg;
ret = D_WALK_CONTINUE;
if (dentry->d_sb == arg->sb
&& !IS_ROOT(dentry)
&& au_dcount(dentry) > 0
&& au_di(dentry)
&& (!arg->test || arg->test(dentry, arg->arg))) {
arg->err = au_dpages_append(arg->dpages, dentry, GFP_ATOMIC);
if (unlikely(arg->err))
ret = D_WALK_QUIT;
}
return ret;
}
int au_dcsub_pages(struct au_dcsub_pages *dpages, struct dentry *root,
au_dpages_test test, void *arg)
{
struct ac_dpages_arg args = {
.err = 0,
.dpages = dpages,
.sb = root->d_sb,
.test = test,
.arg = arg
};
d_walk(root, &args, au_call_dpages_append, NULL);
return args.err;
}
int au_dcsub_pages_rev(struct au_dcsub_pages *dpages, struct dentry *dentry,
int do_include, au_dpages_test test, void *arg)
{
int err;
err = 0;
write_seqlock(&rename_lock);
spin_lock(&dentry->d_lock);
if (do_include
&& au_dcount(dentry) > 0
&& (!test || test(dentry, arg)))
err = au_dpages_append(dpages, dentry, GFP_ATOMIC);
spin_unlock(&dentry->d_lock);
if (unlikely(err))
goto out;
/*
* RCU for vfsmount is unnecessary since this is a traversal within a
* single mount
*/
while (!IS_ROOT(dentry)) {
dentry = dentry->d_parent; /* rename_lock is locked */
spin_lock(&dentry->d_lock);
if (au_dcount(dentry) > 0
&& (!test || test(dentry, arg)))
err = au_dpages_append(dpages, dentry, GFP_ATOMIC);
spin_unlock(&dentry->d_lock);
if (unlikely(err))
break;
}
out:
write_sequnlock(&rename_lock);
return err;
}
static inline int au_dcsub_dpages_aufs(struct dentry *dentry, void *arg)
{
return au_di(dentry) && dentry->d_sb == arg;
}
int au_dcsub_pages_rev_aufs(struct au_dcsub_pages *dpages,
struct dentry *dentry, int do_include)
{
return au_dcsub_pages_rev(dpages, dentry, do_include,
au_dcsub_dpages_aufs, dentry->d_sb);
}
int au_test_subdir(struct dentry *d1, struct dentry *d2)
{
struct path path[2] = {
{
.dentry = d1
},
{
.dentry = d2
}
};
return path_is_under(path + 0, path + 1);
}
/*
* Copyright (C) 2005-2015 Junjiro R. Okajima
*
* This program, aufs is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation; either version 2 of the License, or
* (at your option) any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program. If not, see <http://www.gnu.org/licenses/>.
*/
/*
* sub-routines for dentry cache
*/
#ifndef __AUFS_DCSUB_H__
#define __AUFS_DCSUB_H__
#ifdef __KERNEL__
#include <linux/dcache.h>
#include <linux/fs.h>
struct au_dpage {
int ndentry;
struct dentry **dentries;
};
struct au_dcsub_pages {
int ndpage;
struct au_dpage *dpages;
};
/* ---------------------------------------------------------------------- */
/* dcsub.c */
int au_dpages_init(struct au_dcsub_pages *dpages, gfp_t gfp);
void au_dpages_free(struct au_dcsub_pages *dpages);
typedef int (*au_dpages_test)(struct dentry *dentry, void *arg);
int au_dcsub_pages(struct au_dcsub_pages *dpages, struct dentry *root,
au_dpages_test test, void *arg);
int au_dcsub_pages_rev(struct au_dcsub_pages *dpages, struct dentry *dentry,
int do_include, au_dpages_test test, void *arg);
int au_dcsub_pages_rev_aufs(struct au_dcsub_pages *dpages,
struct dentry *dentry, int do_include);
int au_test_subdir(struct dentry *d1, struct dentry *d2);
/* ---------------------------------------------------------------------- */
/*
* todo: in linux-3.13, several similar (but faster) helpers were added to
* include/linux/dcache.h. Try them (in the future).
*/
static inline int au_d_hashed_positive(struct dentry *d)
{
int err;
struct inode *inode = d_inode(d);
err = 0;
if (unlikely(d_unhashed(d)
|| d_is_negative(d)
|| !inode->i_nlink))
err = -ENOENT;
return err;
}
static inline int au_d_linkable(struct dentry *d)
{
int err;
struct inode *inode = d_inode(d);
err = au_d_hashed_positive(d);
if (err
&& d_is_positive(d)
&& (inode->i_state & I_LINKABLE))
err = 0;
return err;
}
static inline int au_d_alive(struct dentry *d)
{
int err;
struct inode *inode;
err = 0;
if (!IS_ROOT(d))
err = au_d_hashed_positive(d);
else {
inode = d_inode(d);
if (unlikely(d_unlinked(d)
|| d_is_negative(d)
|| !inode->i_nlink))
err = -ENOENT;
}
return err;
}
static inline int au_alive_dir(struct dentry *d)
{
int err;
err = au_d_alive(d);
if (unlikely(err || IS_DEADDIR(d_inode(d))))
err = -ENOENT;
return err;
}
static inline int au_qstreq(struct qstr *a, struct qstr *b)
{
return a->len == b->len
&& !memcmp(a->name, b->name, a->len);
}
/*
* by the commit
* 360f547 2015-01-25 dcache: let the dentry count go down to zero without
* taking d_lock
* the type of d_lockref.count became int, but the inlined function d_count()
* still returns unsigned int.
* I don't know why. Maybe it is for the sake of all d_count() users?
* Anyway au_dcount() lives on.
*/
static inline int au_dcount(struct dentry *d)
{
return (int)d_count(d);
}
#endif /* __KERNEL__ */
#endif /* __AUFS_DCSUB_H__ */
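au_qstreq() compares two names by length first and only then by content, which works for names that are not NUL-terminated. The user-space sketch below reproduces that logic. `struct demo_qstr` is a hypothetical stand-in that carries only the len and name fields of the kernel's struct qstr, and `demo_qstr_of` is a convenience constructor invented for the example.

```c
#include <assert.h>
#include <string.h>

/* hypothetical stand-in for the kernel's struct qstr (len and name only) */
struct demo_qstr {
	unsigned int len;
	const unsigned char *name;
};

/* same logic as au_qstreq(): compare lengths first, then the bytes */
static inline int demo_qstreq(const struct demo_qstr *a,
			      const struct demo_qstr *b)
{
	return a->len == b->len
		&& !memcmp(a->name, b->name, a->len);
}

/* hypothetical convenience constructor for the demo */
static inline struct demo_qstr demo_qstr_of(const char *s)
{
	struct demo_qstr q;

	assert(s);
	q.len = (unsigned int)strlen(s);
	q.name = (const unsigned char *)s;
	return q;
}
```

Checking the length first means that a name which is merely a prefix of the other is rejected without touching the bytes at all.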
/*
* Copyright (C) 2005-2015 Junjiro R. Okajima
*
* This program, aufs is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation; either version 2 of the License, or
* (at your option) any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program. If not, see <http://www.gnu.org/licenses/>.
*/
/*
* directory operations
*/
#ifndef __AUFS_DIR_H__
#define __AUFS_DIR_H__
#ifdef __KERNEL__
#include <linux/fs.h>
/* ---------------------------------------------------------------------- */
/* need to be faster and smaller */
struct au_nhash {
	unsigned int		nh_num;
	struct hlist_head	*nh_head;
};

struct au_vdir_destr {
	unsigned char	len;
	unsigned char	name[0];
} __packed;

struct au_vdir_dehstr {
	struct hlist_node	hash;
	struct au_vdir_destr	*str;
} ____cacheline_aligned_in_smp;

struct au_vdir_de {
	ino_t			de_ino;
	unsigned char		de_type;
	/* caution: packed */
	struct au_vdir_destr	de_str;
} __packed;

struct au_vdir_wh {
	struct hlist_node	wh_hash;
#ifdef CONFIG_AUFS_SHWH
	ino_t			wh_ino;
	aufs_bindex_t		wh_bindex;
	unsigned char		wh_type;
#else
	aufs_bindex_t		wh_bindex;
#endif
	/* caution: packed */
	struct au_vdir_destr	wh_str;
} __packed;

union au_vdir_deblk_p {
	unsigned char		*deblk;
	struct au_vdir_de	*de;
};

struct au_vdir {
	unsigned char	**vd_deblk;
	unsigned long	vd_nblk;
	struct {
		unsigned long		ul;
		union au_vdir_deblk_p	p;
	} vd_last;

	unsigned long	vd_version;
	unsigned int	vd_deblk_sz;
	unsigned long	vd_jiffy;
} ____cacheline_aligned_in_smp;

/* ---------------------------------------------------------------------- */
/* dir.c */
extern const struct file_operations aufs_dir_fop;
void au_add_nlink(struct inode *dir, struct inode *h_dir);
void au_sub_nlink(struct inode *dir, struct inode *h_dir);
loff_t au_dir_size(struct file *file, struct dentry *dentry);
void au_dir_ts(struct inode *dir, aufs_bindex_t bsrc);
int au_test_empty_lower(struct dentry *dentry);
int au_test_empty(struct dentry *dentry, struct au_nhash *whlist);

/* vdir.c */
unsigned int au_rdhash_est(loff_t sz);
int au_nhash_alloc(struct au_nhash *nhash, unsigned int num_hash, gfp_t gfp);
void au_nhash_wh_free(struct au_nhash *whlist);
int au_nhash_test_longer_wh(struct au_nhash *whlist, aufs_bindex_t btgt,
			    int limit);
int au_nhash_test_known_wh(struct au_nhash *whlist, char *name, int nlen);
int au_nhash_append_wh(struct au_nhash *whlist, char *name, int nlen, ino_t ino,
		       unsigned int d_type, aufs_bindex_t bindex,
		       unsigned char shwh);
void au_vdir_free(struct au_vdir *vdir);
int au_vdir_init(struct file *file);
int au_vdir_fill_de(struct file *file, struct dir_context *ctx);

/* ioctl.c */
long aufs_ioctl_dir(struct file *file, unsigned int cmd, unsigned long arg);

#ifdef CONFIG_AUFS_RDU
/* rdu.c */
long au_rdu_ioctl(struct file *file, unsigned int cmd, unsigned long arg);
#ifdef CONFIG_COMPAT
long au_rdu_compat_ioctl(struct file *file, unsigned int cmd,
			 unsigned long arg);
#endif
#else
AuStub(long, au_rdu_ioctl, return -EINVAL, struct file *file,
       unsigned int cmd, unsigned long arg)
#ifdef CONFIG_COMPAT
AuStub(long, au_rdu_compat_ioctl, return -EINVAL, struct file *file,
       unsigned int cmd, unsigned long arg)
#endif
#endif

#endif /* __KERNEL__ */
#endif /* __AUFS_DIR_H__ */
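
The packed, length-prefixed entries declared in this header (struct au_vdir_de embedding struct au_vdir_destr) suggest that directory entries are stored back to back in byte blocks. The sketch below illustrates that idea in userspace only. The way vdir.c actually sizes, aligns and walks entries is not visible in this diff, so `struct de_sketch`, `de_size()`, `deblk_alloc()`, `de_append()` and `de_count()` are assumptions made for illustration, and the entry is flattened into a single struct to stay within standard C plus the GCC/Clang `packed` attribute.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical flattened analogue of au_vdir_de + au_vdir_destr. */
struct de_sketch {
	uint64_t ino;		/* stands in for ino_t de_ino */
	unsigned char type;	/* stands in for de_type */
	unsigned char len;	/* name length, at most 255 */
	unsigned char name[];	/* name bytes, not NUL-terminated */
} __attribute__((packed));

/* Bytes used by one entry; this sketch assumes no padding between entries. */
static size_t de_size(size_t nlen)
{
	return sizeof(struct de_sketch) + nlen;
}

/* Allocate one zeroed block; the caller releases it with free(). */
static unsigned char *deblk_alloc(size_t sz)
{
	return calloc(1, sz);
}

/* Append an entry at offset *pos; returns 0, or -1 if it does not fit. */
static int de_append(unsigned char *blk, size_t blksz, size_t *pos,
		     uint64_t ino, unsigned char type, const char *name)
{
	size_t nlen;
	struct de_sketch *de;

	assert(blk && pos && name);
	nlen = strlen(name);
	if (nlen > 255 || *pos > blksz || de_size(nlen) > blksz - *pos)
		return -1;
	de = (struct de_sketch *)(blk + *pos);
	de->ino = ino;
	de->type = type;
	de->len = (unsigned char)nlen;
	memcpy(de->name, name, nlen);
	*pos += de_size(nlen);
	return 0;
}

/* Walk the used part of a block and count its entries. */
static size_t de_count(const unsigned char *blk, size_t used)
{
	size_t pos = 0, n = 0;

	while (pos < used) {
		const struct de_sketch *de;

		de = (const struct de_sketch *)(blk + pos);
		pos += de_size(de->len);
		n++;
	}
	return n;
}
```

Because the name is stored with an explicit one-byte length rather than a terminator, each entry costs only its fixed fields plus the name bytes, which fits the header's note that these structures need to be fast and small.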
# defined in ${srctree}/fs/fuse/inode.c
# tristate
ifdef CONFIG_FUSE_FS
ccflags-y += -DFUSE_SUPER_MAGIC=0x65735546
endif
# defined in ${srctree}/fs/xfs/xfs_sb.h
# tristate
ifdef CONFIG_XFS_FS
ccflags-y += -DXFS_SB_MAGIC=0x58465342
endif
# defined in ${srctree}/fs/configfs/mount.c
# tristate
ifdef CONFIG_CONFIGFS_FS
ccflags-y += -DCONFIGFS_MAGIC=0x62656570
endif
# defined in ${srctree}/fs/ubifs/ubifs.h
# tristate
ifdef CONFIG_UBIFS_FS
ccflags-y += -DUBIFS_SUPER_MAGIC=0x24051905
endif
# defined in ${srctree}/fs/hfsplus/hfsplus_raw.h
# tristate
ifdef CONFIG_HFSPLUS_FS
ccflags-y += -DHFSPLUS_SUPER_MAGIC=0x482b
endif
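
Each block above passes a filesystem's superblock magic number on the compiler command line, and only when that filesystem is configured. As a rough illustration of how such constants could be consumed, the following userspace sketch maps a magic value to a name. The table and `fs_name_by_magic()` are hypothetical and are not taken from aufs; the `#ifndef` fallbacks simply repeat the values from the fragment so the example builds without the `-D` flags.

```c
#include <assert.h>
#include <stddef.h>

/* Fallbacks repeating the values passed via ccflags-y in the fragment. */
#ifndef FUSE_SUPER_MAGIC
#define FUSE_SUPER_MAGIC	0x65735546
#endif
#ifndef XFS_SB_MAGIC
#define XFS_SB_MAGIC		0x58465342
#endif
#ifndef CONFIGFS_MAGIC
#define CONFIGFS_MAGIC		0x62656570
#endif
#ifndef UBIFS_SUPER_MAGIC
#define UBIFS_SUPER_MAGIC	0x24051905
#endif
#ifndef HFSPLUS_SUPER_MAGIC
#define HFSPLUS_SUPER_MAGIC	0x482b
#endif

/* The lookup key must be able to hold a 32-bit magic number. */
static_assert(sizeof(unsigned long) >= 4, "magic numbers need 32 bits");

/* Hypothetical lookup table; not part of the aufs sources. */
struct magic_entry {
	unsigned long magic;
	const char *name;
};

static const struct magic_entry magic_table[] = {
	{ FUSE_SUPER_MAGIC,	"fuse" },
	{ XFS_SB_MAGIC,		"xfs" },
	{ CONFIGFS_MAGIC,	"configfs" },
	{ UBIFS_SUPER_MAGIC,	"ubifs" },
	{ HFSPLUS_SUPER_MAGIC,	"hfsplus" },
};

/* Returns the name for a known magic number, or NULL if it is unknown. */
static const char *fs_name_by_magic(unsigned long magic)
{
	size_t i;

	for (i = 0; i < sizeof(magic_table) / sizeof(magic_table[0]); i++)
		if (magic_table[i].magic == magic)
			return magic_table[i].name;
	return NULL;
}
```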