1. 04 Apr, 2017 1 commit
    • Eric W. Biederman's avatar
      vfs: Commit to never having exectuables on proc and sysfs. · 495d1af4
      Eric W. Biederman authored
      commit 22f6b4d3
      
       upstream.
      
      Today proc and sysfs do not contain any executable files.  Several
      applications today mount proc or sysfs without noexec and nosuid and
      then depend on there being no exectuables files on proc or sysfs.
      Having any executable files show on proc or sysfs would cause
      a user space visible regression, and most likely security problems.
      
      Therefore commit to never allowing executables on proc and sysfs by
      adding a new flag to mark them as filesystems without executables and
      enforce that flag.
      
      Test the flag where MNT_NOEXEC is tested today, so that the only user
      visible effect will be that exectuables will be treated as if the
      execute bit is cleared.
      
      The filesystems proc and sysfs do not currently incoporate any
      executable files so this does not result in any user visible effects.
      
      This makes it unnecessary to vet changes to proc and sysfs tightly for
      adding exectuable files or changes to chattr that would modify
      existing files, as no matter what the individual file say they will
      not be treated as exectuable files by the vfs.
      
      Not having to vet changes to closely is important as without this we
      are only one proc_create call (or another goof up in the
      implementation of notify_change) from having problematic executables
      on proc.  Those mistakes are all too easy to make and would create
      a situation where there are security issues or the assumptions of
      some program having to be broken (and cause userspace regressions).
      Signed-off-by: default avatar"Eric W. Biederman" <ebiederm@xmission.com>
      [bwh: Backported to 3.16: we don't have super_block::s_iflags; use
       file_system_type::fs_flags instead]
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      495d1af4
  2. 22 Aug, 2016 1 commit
  3. 15 Jul, 2015 1 commit
  4. 09 Jul, 2015 1 commit
  5. 13 Mar, 2014 1 commit
    • Theodore Ts'o's avatar
      fs: push sync_filesystem() down to the file system's remount_fs() · 02b9984d
      Theodore Ts'o authored
      
      Previously, the no-op "mount -o mount /dev/xxx" operation when the
      file system is already mounted read-write causes an implied,
      unconditional syncfs().  This seems pretty stupid, and it's certainly
      documented or guaraunteed to do this, nor is it particularly useful,
      except in the case where the file system was mounted rw and is getting
      remounted read-only.
      
      However, it's possible that there might be some file systems that are
      actually depending on this behavior.  In most file systems, it's
      probably fine to only call sync_filesystem() when transitioning from
      read-write to read-only, and there are some file systems where this is
      not needed at all (for example, for a pseudo-filesystem or something
      like romfs).
      Signed-off-by: default avatar"Theodore Ts'o" <tytso@mit.edu>
      Cc: linux-fsdevel@vger.kernel.org
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Artem Bityutskiy <dedekind1@gmail.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Evgeniy Dushistov <dus...
      02b9984d
  6. 11 Mar, 2014 1 commit
    • Grant Likely's avatar
      of: remove /proc/device-tree · 8357041a
      Grant Likely authored
      
      The same data is now available in sysfs, so we can remove the code
      that exports it in /proc and replace it with a symlink to the sysfs
      version.
      
      Tested on versatile qemu model and mpc5200 eval board. More testing
      would be appreciated.
      
      v5: Fixed up conflicts with mainline changes
      Signed-off-by: default avatarGrant Likely <grant.likely@secretlab.ca>
      Cc: Rob Herring <rob.herring@calxeda.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Nathan Fontenot <nfont@linux.vnet.ibm.com>
      Cc: Pantelis Antoniou <panto@antoniou-consulting.com>
      8357041a
  7. 27 Aug, 2013 1 commit
    • Eric W. Biederman's avatar
      userns: Better restrictions on when proc and sysfs can be mounted · e51db735
      Eric W. Biederman authored
      
      Rely on the fact that another flavor of the filesystem is already
      mounted and do not rely on state in the user namespace.
      
      Verify that the mounted filesystem is not covered in any significant
      way.  I would love to verify that the previously mounted filesystem
      has no mounts on top but there are at least the directories
      /proc/sys/fs/binfmt_misc and /sys/fs/cgroup/ that exist explicitly
      for other filesystems to mount on top of.
      
      Refactor the test into a function named fs_fully_visible and call that
      function from the mount routines of proc and sysfs.  This makes this
      test local to the filesystems involved and the results current of when
      the mounts take place, removing a weird threading of the user
      namespace, the mount namespace and the filesystems themselves.
      Signed-off-by: default avatar"Eric W. Biederman" <ebiederm@xmission.com>
      e51db735
  8. 26 Aug, 2013 1 commit
    • Eric W. Biederman's avatar
      proc: Restrict mounting the proc filesystem · aee1c13d
      Eric W. Biederman authored
      
      Don't allow mounting the proc filesystem unless the caller has
      CAP_SYS_ADMIN rights over the pid namespace.  The principle here is if
      you create or have capabilities over it you can mount it, otherwise
      you get to live with what other people have mounted.
      
      Andy pointed out that this is needed to prevent users in a user
      namespace from remounting proc and specifying different hidepid and gid
      options on already existing proc mounts.
      
      Cc: stable@vger.kernel.org
      Reported-by: default avatarAndy Lutomirski <luto@amacapital.net>
      Signed-off-by: default avatar"Eric W. Biederman" <ebiederm@xmission.com>
      aee1c13d
  9. 19 Aug, 2013 1 commit
    • Richard Genoud's avatar
      proc: return on proc_readdir error · 94fc5d9d
      Richard Genoud authored
      Commit f0c3b509 ("[readdir] convert procfs") introduced a bug on the
      listing of the proc file-system.  The return value of proc_readdir()
      isn't tested anymore in the proc_root_readdir function.
      
      This lead to an "interesting" behaviour when we are using the getdents()
      system call with a buffer too small: instead of failing, it returns the
      first entries of /proc (enough to fill the given buffer), plus the PID
      directories.
      
      This is not triggered on glibc (as getdents is called with a 32KB
      buffer), but on uclibc, the buffer size is only 1KB, thus some proc
      entries are missing.
      
      See https://lkml.org/lkml/2013/8/12/288
      
       for more background.
      Signed-off-by: default avatarRichard Genoud <richard.genoud@gmail.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      94fc5d9d
  10. 29 Jun, 2013 1 commit
  11. 09 Apr, 2013 1 commit
  12. 27 Mar, 2013 1 commit
    • Eric W. Biederman's avatar
      userns: Restrict when proc and sysfs can be mounted · 87a8ebd6
      Eric W. Biederman authored
      
      Only allow unprivileged mounts of proc and sysfs if they are already
      mounted when the user namespace is created.
      
      proc and sysfs are interesting because they have content that is
      per namespace, and so fresh mounts are needed when new namespaces
      are created while at the same time proc and sysfs have content that
      is shared between every instance.
      
      Respect the policy of who may see the shared content of proc and sysfs
      by only allowing new mounts if there was an existing mount at the time
      the user namespace was created.
      
      In practice there are only two interesting cases: proc and sysfs are
      mounted at their usual places, proc and sysfs are not mounted at all
      (some form of mount namespace jail).
      
      Cc: stable@vger.kernel.org
      Acked-by: default avatarSerge Hallyn <serge.hallyn@canonical.com>
      Signed-off-by: default avatar"Eric W. Biederman" <ebiederm@xmission.com>
      87a8ebd6
  13. 20 Nov, 2012 1 commit
  14. 19 Nov, 2012 4 commits
    • Eric W. Biederman's avatar
      pidns: Make the pidns proc mount/umount logic obvious. · 0a01f2cc
      Eric W. Biederman authored
      
      Track the number of pids in the proc hash table.  When the number of
      pids goes to 0 schedule work to unmount the kernel mount of proc.
      
      Move the mount of proc into alloc_pid when we allocate the pid for
      init.
      
      Remove the surprising calls of pid_ns_release proc in fork and
      proc_flush_task.  Those code paths really shouldn't know about proc
      namespace implementation details and people have demonstrated several
      times that finding and understanding those code paths is difficult and
      non-obvious.
      
      Because of the call path detach pid is alwasy called with the
      rtnl_lock held free_pid is not allowed to sleep, so the work to
      unmounting proc is moved to a work queue.  This has the side benefit
      of not blocking the entire world waiting for the unnecessary
      rcu_barrier in deactivate_locked_super.
      
      In the process of making the code clear and obvious this fixes a bug
      reported by Gao feng <gaofeng@cn.fujitsu.com> where we would leak a
      mount of proc during clone(CLONE_NEWPID|CLONE_NEWNET) if copy_pid_ns
      succeeded and copy_net_ns failed.
      Acked-by: default avatar"Serge E. Hallyn" <serge@hallyn.com>
      Signed-off-by: default avatar"Eric W. Biederman" <ebiederm@xmission.com>
      0a01f2cc
    • Eric W. Biederman's avatar
      pidns: Use task_active_pid_ns where appropriate · 17cf22c3
      Eric W. Biederman authored
      
      The expressions tsk->nsproxy->pid_ns and task_active_pid_ns
      aka ns_of_pid(task_pid(tsk)) should have the same number of
      cache line misses with the practical difference that
      ns_of_pid(task_pid(tsk)) is released later in a processes life.
      
      Furthermore by using task_active_pid_ns it becomes trivial
      to write an unshare implementation for the the pid namespace.
      
      So I have used task_active_pid_ns everywhere I can.
      
      In fork since the pid has not yet been attached to the
      process I use ns_of_pid, to achieve the same effect.
      Signed-off-by: default avatarEric W. Biederman <ebiederm@xmission.com>
      17cf22c3
    • Eric W. Biederman's avatar
      procfs: Don't cache a pid in the root inode. · ae06c7c8
      Eric W. Biederman authored
      
      Now that we have s_fs_info pointing to our pid namespace
      the original reason for the proc root inode having a struct
      pid is gone.
      
      Caching a pid in the root inode has led to some complicated
      code.  Now that we don't need the struct pid, just remove it.
      Signed-off-by: default avatarEric W. Biederman <ebiederm@xmission.com>
      ae06c7c8
    • Eric W. Biederman's avatar
      procfs: Use the proc generic infrastructure for proc/self. · e656d8a6
      Eric W. Biederman authored
      
      I had visions at one point of splitting proc into two filesystems.  If
      that had happened proc/self being the the part of proc that actually deals
      with pids would have been a nice cleanup.  As it is proc/self requires
      a lot of unnecessary infrastructure for a single file.
      
      The only user visible change is that a mounted /proc for a pid namespace
      that is dead now shows a broken proc symlink, instead of being completely
      invisible.  I don't think anyone will notice or care.
      Signed-off-by: default avatarEric W. Biederman <ebiederm@xmission.com>
      e656d8a6
  15. 05 Oct, 2012 1 commit
  16. 14 Jul, 2012 2 commits
  17. 15 May, 2012 1 commit
  18. 05 Apr, 2012 1 commit
  19. 11 Jan, 2012 2 commits
    • Vasiliy Kulikov's avatar
      procfs: add hidepid= and gid= mount options · 0499680a
      Vasiliy Kulikov authored
      Add support for mount options to restrict access to /proc/PID/
      directories.  The default backward-compatible "relaxed" behaviour is left
      untouched.
      
      The first mount option is called "hidepid" and its value defines how much
      info about processes we want to be available for non-owners:
      
      hidepid=0 (default) means the old behavior - anybody may read all
      world-readable /proc/PID/* files.
      
      hidepid=1 means users may not access any /proc/<pid>/ directories, but
      their own.  Sensitive files like cmdline, sched*, status are now protected
      against other users.  As permission checking done in proc_pid_permission()
      and files' permissions are left untouched, programs expecting specific
      files' modes are not confused.
      
      hidepid=2 means hidepid=1 plus all /proc/PID/ will be invisible to other
      users.  It doesn't mean that it hides whether a process exists (it can be
      learned by other means, e.g.  by kill -0 $PID), but it hides process' euid
      and egid.  It compicates intruder's task of gathering info about running
      processes, whether some daemon runs with elevated privileges, whether
      another user runs some sensitive program, whether other users run any
      program at all, etc.
      
      gid=XXX defines a group that will be able to gather all processes' info
      (as in hidepid=0 mode).  This group should be used instead of putting
      nonroot user in sudoers file or something.  However, untrusted users (like
      daemons, etc.) which are not supposed to monitor the tasks in the whole
      system should not be added to the group.
      
      hidepid=1 or higher is designed to restrict access to procfs files, which
      might reveal some sensitive private information like precise keystrokes
      timings:
      
      http://www.openwall.com/lists/oss-security/2011/11/05/3
      
      hidepid=1/2 doesn't break monitoring userspace tools.  ps, top, pgrep, and
      conky gracefully handle EPERM/ENOENT and behave as if the current user is
      the only user running processes.  pstree shows the process subtree which
      contains "pstree" process.
      
      Note: the patch doesn't deal with setuid/setgid issues of keeping
      preopened descriptors of procfs files (like
      https://lkml.org/lkml/2011/2/7/368
      
      ).  We rely on that the leaked
      information like the scheduling counters of setuid apps doesn't threaten
      anybody's privacy - only the user started the setuid program may read the
      counters.
      Signed-off-by: default avatarVasiliy Kulikov <segoon@openwall.com>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Randy Dunlap <rdunlap@xenotime.net>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Greg KH <greg@kroah.com>
      Cc: Theodore Tso <tytso@MIT.EDU>
      Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
      Cc: James Morris <jmorris@namei.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Hugh Dickins <hughd@google.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      0499680a
    • Vasiliy Kulikov's avatar
      procfs: parse mount options · 97412950
      Vasiliy Kulikov authored
      
      Add support for procfs mount options.  Actual mount options are coming in
      the next patches.
      Signed-off-by: default avatarVasiliy Kulikov <segoon@openwall.com>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Randy Dunlap <rdunlap@xenotime.net>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Greg KH <greg@kroah.com>
      Cc: Theodore Tso <tytso@MIT.EDU>
      Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
      Cc: James Morris <jmorris@namei.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      97412950
  20. 09 Dec, 2011 1 commit
  21. 27 Jul, 2011 1 commit
    • David Howells's avatar
      proc: make struct proc_dir_entry::name a terminal array rather than a pointer · 09570f91
      David Howells authored
      
      Since __proc_create() appends the name it is given to the end of the PDE
      structure that it allocates, there isn't a need to store a name pointer.
      Instead we can just replace the name pointer with a terminal char array of
      _unspecified_ length.  The compiler will simply append the string to statically
      defined variables of PDE type overlapping any hole at the end of the structure
      and, unlike specifying an explicitly _zero_ length array, won't give a warning
      if you try to statically initialise it with a string of more than zero length.
      
      Also, whilst we're at it:
      
       (1) Move namelen to end just prior to name and reduce it to a single byte
           (name shouldn't be longer than NAME_MAX).
      
       (2) Move pde_unload_lock two places further on so that if it's four bytes in
           size on a 64-bit machine, it won't cause an unused hole in the PDE struct.
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      Signed-off-by: default avatarAlexey Dobriyan <adobriyan@gmail.com>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      09570f91
  22. 12 Jun, 2011 1 commit
  23. 24 Mar, 2011 2 commits
  24. 29 Oct, 2010 2 commits
  25. 15 Oct, 2010 1 commit
    • Arnd Bergmann's avatar
      llseek: automatically add .llseek fop · 6038f373
      Arnd Bergmann authored
      
      All file_operations should get a .llseek operation so we can make
      nonseekable_open the default for future file operations without a
      .llseek pointer.
      
      The three cases that we can automatically detect are no_llseek, seq_lseek
      and default_llseek. For cases where we can we can automatically prove that
      the file offset is always ignored, we use noop_llseek, which maintains
      the current behavior of not returning an error from a seek.
      
      New drivers should normally not use noop_llseek but instead use no_llseek
      and call nonseekable_open at open time.  Existing drivers can be converted
      to do the same when the maintainer knows for certain that no user code
      relies on calling seek on the device file.
      
      The generated code is often incorrectly indented and right now contains
      comments that clarify for each added line why a specific variant was
      chosen. In the version that gets submitted upstream, the comments will
      be gone and I will manually fix the indentation, because there does not
      seem to be a way to do that using coccinelle.
      
      Some amount of new code is currently sitting in linux-next that should get
      the same modifications, which I will do at the end of the merge window.
      
      Many thanks to Julia Lawall for helping me learn to write a semantic
      patch that does all this.
      
      ===== begin semantic patch =====
      // This adds an llseek= method to all file operations,
      // as a preparation for making no_llseek the default.
      //
      // The rules are
      // - use no_llseek explicitly if we do nonseekable_open
      // - use seq_lseek for sequential files
      // - use default_llseek if we know we access f_pos
      // - use noop_llseek if we know we don't access f_pos,
      //   but we still want to allow users to call lseek
      //
      @ open1 exists @
      identifier nested_open;
      @@
      nested_open(...)
      {
      <+...
      nonseekable_open(...)
      ...+>
      }
      
      @ open exists@
      identifier open_f;
      identifier i, f;
      identifier open1.nested_open;
      @@
      int open_f(struct inode *i, struct file *f)
      {
      <+...
      (
      nonseekable_open(...)
      |
      nested_open(...)
      )
      ...+>
      }
      
      @ read disable optional_qualifier exists @
      identifier read_f;
      identifier f, p, s, off;
      type ssize_t, size_t, loff_t;
      expression E;
      identifier func;
      @@
      ssize_t read_f(struct file *f, char *p, size_t s, loff_t *off)
      {
      <+...
      (
         *off = E
      |
         *off += E
      |
         func(..., off, ...)
      |
         E = *off
      )
      ...+>
      }
      
      @ read_no_fpos disable optional_qualifier exists @
      identifier read_f;
      identifier f, p, s, off;
      type ssize_t, size_t, loff_t;
      @@
      ssize_t read_f(struct file *f, char *p, size_t s, loff_t *off)
      {
      ... when != off
      }
      
      @ write @
      identifier write_f;
      identifier f, p, s, off;
      type ssize_t, size_t, loff_t;
      expression E;
      identifier func;
      @@
      ssize_t write_f(struct file *f, const char *p, size_t s, loff_t *off)
      {
      <+...
      (
        *off = E
      |
        *off += E
      |
        func(..., off, ...)
      |
        E = *off
      )
      ...+>
      }
      
      @ write_no_fpos @
      identifier write_f;
      identifier f, p, s, off;
      type ssize_t, size_t, loff_t;
      @@
      ssize_t write_f(struct file *f, const char *p, size_t s, loff_t *off)
      {
      ... when != off
      }
      
      @ fops0 @
      identifier fops;
      @@
      struct file_operations fops = {
       ...
      };
      
      @ has_llseek depends on fops0 @
      identifier fops0.fops;
      identifier llseek_f;
      @@
      struct file_operations fops = {
      ...
       .llseek = llseek_f,
      ...
      };
      
      @ has_read depends on fops0 @
      identifier fops0.fops;
      identifier read_f;
      @@
      struct file_operations fops = {
      ...
       .read = read_f,
      ...
      };
      
      @ has_write depends on fops0 @
      identifier fops0.fops;
      identifier write_f;
      @@
      struct file_operations fops = {
      ...
       .write = write_f,
      ...
      };
      
      @ has_open depends on fops0 @
      identifier fops0.fops;
      identifier open_f;
      @@
      struct file_operations fops = {
      ...
       .open = open_f,
      ...
      };
      
      // use no_llseek if we call nonseekable_open
      ////////////////////////////////////////////
      @ nonseekable1 depends on !has_llseek && has_open @
      identifier fops0.fops;
      identifier nso ~= "nonseekable_open";
      @@
      struct file_operations fops = {
      ...  .open = nso, ...
      +.llseek = no_llseek, /* nonseekable */
      };
      
      @ nonseekable2 depends on !has_llseek @
      identifier fops0.fops;
      identifier open.open_f;
      @@
      struct file_operations fops = {
      ...  .open = open_f, ...
      +.llseek = no_llseek, /* open uses nonseekable */
      };
      
      // use seq_lseek for sequential files
      /////////////////////////////////////
      @ seq depends on !has_llseek @
      identifier fops0.fops;
      identifier sr ~= "seq_read";
      @@
      struct file_operations fops = {
      ...  .read = sr, ...
      +.llseek = seq_lseek, /* we have seq_read */
      };
      
      // use default_llseek if there is a readdir
      ///////////////////////////////////////////
      @ fops1 depends on !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
      identifier fops0.fops;
      identifier readdir_e;
      @@
      // any other fop is used that changes pos
      struct file_operations fops = {
      ... .readdir = readdir_e, ...
      +.llseek = default_llseek, /* readdir is present */
      };
      
      // use default_llseek if at least one of read/write touches f_pos
      /////////////////////////////////////////////////////////////////
      @ fops2 depends on !fops1 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
      identifier fops0.fops;
      identifier read.read_f;
      @@
      // read fops use offset
      struct file_operations fops = {
      ... .read = read_f, ...
      +.llseek = default_llseek, /* read accesses f_pos */
      };
      
      @ fops3 depends on !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
      identifier fops0.fops;
      identifier write.write_f;
      @@
      // write fops use offset
      struct file_operations fops = {
      ... .write = write_f, ...
      +	.llseek = default_llseek, /* write accesses f_pos */
      };
      
      // Use noop_llseek if neither read nor write accesses f_pos
      ///////////////////////////////////////////////////////////
      
      @ fops4 depends on !fops1 && !fops2 && !fops3 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
      identifier fops0.fops;
      identifier read_no_fpos.read_f;
      identifier write_no_fpos.write_f;
      @@
      // write fops use offset
      struct file_operations fops = {
      ...
       .write = write_f,
       .read = read_f,
      ...
      +.llseek = noop_llseek, /* read and write both use no f_pos */
      };
      
      @ depends on has_write && !has_read && !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
      identifier fops0.fops;
      identifier write_no_fpos.write_f;
      @@
      struct file_operations fops = {
      ... .write = write_f, ...
      +.llseek = noop_llseek, /* write uses no f_pos */
      };
      
      @ depends on has_read && !has_write && !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
      identifier fops0.fops;
      identifier read_no_fpos.read_f;
      @@
      struct file_operations fops = {
      ... .read = read_f, ...
      +.llseek = noop_llseek, /* read uses no f_pos */
      };
      
      @ depends on !has_read && !has_write && !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
      identifier fops0.fops;
      @@
      struct file_operations fops = {
      ...
      +.llseek = noop_llseek, /* no read or write fn */
      };
      ===== End semantic patch =====
      Signed-off-by: default avatarArnd Bergmann <arnd@arndb.de>
      Cc: Julia Lawall <julia@diku.dk>
      Cc: Christoph Hellwig <hch@infradead.org>
      6038f373
  26. 27 May, 2010 1 commit
  27. 03 Mar, 2010 1 commit
  28. 09 May, 2009 1 commit
  29. 27 Mar, 2009 1 commit
  30. 05 Jan, 2009 1 commit
    • Alexey Dobriyan's avatar
      proc: stop using BKL · b4df2b92
      Alexey Dobriyan authored
      
      There are four BKL users in proc: de_put(), proc_lookup_de(),
      proc_readdir_de(), proc_root_readdir(),
      
      1) de_put()
      -----------
      de_put() is classic atomic_dec_and_test() refcount wrapper -- no BKL
      needed. BKL doesn't matter to possible refcount leak as well.
      
      2) proc_lookup_de()
      -------------------
      Walking PDE list is protected by proc_subdir_lock(), proc_get_inode() is
      potentially blocking, all callers of proc_lookup_de() eventually end up
      from ->lookup hooks which is protected by directory's ->i_mutex -- BKL
      doesn't protect anything.
      
      3) proc_readdir_de()
      --------------------
      "." and ".." part doesn't need BKL, walking PDE list is under
      proc_subdir_lock, calling filldir callback is potentially blocking
      because it writes to luserspace. All proc_readdir_de() callers
      eventually come from ->readdir hook which is under directory's
      ->i_mutex -- BKL doesn't protect anything.
      
      4) proc_root_readdir_de()
      -------------------------
      proc_root_readdir_de is ->readdir hook, see (3).
      
      Since readdir hooks doesn't use BKL anymore, switch to
      generic_file_llseek, since it also takes directory's i_mutex.
      Signed-off-by: default avatarAlexey Dobriyan <adobriyan@gmail.com>
      b4df2b92
  31. 23 Oct, 2008 2 commits
  32. 29 Apr, 2008 1 commit
    • Denis V. Lunev's avatar
      proc: introduce proc_create_data to setup de->data · 59b74351
      Denis V. Lunev authored
      This set of patches fixes an proc ->open'less usage due to ->proc_fops flip in
      the most part of the kernel code.  The original OOPS is described in the
      commit 2d3a4e36
      
      :
      
          Typical PDE creation code looks like:
      
          	pde = create_proc_entry("foo", 0, NULL);
          	if (pde)
          		pde->proc_fops = &foo_proc_fops;
      
          Notice that PDE is first created, only then ->proc_fops is set up to
          final value. This is a problem because right after creation
          a) PDE is fully visible in /proc , and
          b) ->proc_fops are proc_file_operations which do not have ->open callback. So, it's
             possible to ->read without ->open (see one class of oopses below).
      
          The fix is new API called proc_create() which makes sure ->proc_fops are
          set up before gluing PDE to main tree. Typical new code looks like:
      
          	pde = proc_create("foo", 0, NULL, &foo_proc_fops);
          	if (!pde)
          		return -ENOMEM;
      
          Fix most networking users for a start.
      
          In the long run, create_proc_entry() for regular files will go.
      
      In addition to this, proc_create_data is introduced to fix reading from
      proc without PDE->data. The race is basically the same as above.
      
      create_proc_entries is replaced in the entire kernel code as new method
      is also simply better.
      
      This patch:
      
      The problem is the same as for de->proc_fops.  Right now PDE becomes visible
      without data set.  So, the entry could be looked up without data.  This, in
      most cases, will simply OOPS.
      
      proc_create_data call is created to address this issue.  proc_create now
      becomes a wrapper around it.
      Signed-off-by: default avatarDenis V. Lunev <den@openvz.org>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: "J. Bruce Fields" <bfields@fieldses.org>
      Cc: Alessandro Zummo <a.zummo@towertech.it>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Cc: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Bjorn Helgaas <bjorn.helgaas@hp.com>
      Cc: Chris Mason <chris.mason@oracle.com>
      Acked-by: default avatarDavid Howells <dhowells@redhat.com>
      Cc: Dmitry Torokhov <dtor@mail.ru>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Grant Grundler <grundler@parisc-linux.org>
      Cc: Greg Kroah-Hartman <gregkh@suse.de>
      Cc: Haavard Skinnemoen <hskinnemoen@atmel.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
      Cc: Jaroslav Kysela <perex@suse.cz>
      Cc: Jeff Garzik <jgarzik@pobox.com>
      Cc: Jeff Mahoney <jeffm@suse.com>
      Cc: Jesper Nilsson <jesper.nilsson@axis.com>
      Cc: Karsten Keil <kkeil@suse.de>
      Cc: Kyle McMartin <kyle@parisc-linux.org>
      Cc: Len Brown <lenb@kernel.org>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
      Cc: Matthew Wilcox <matthew@wil.cx>
      Cc: Mauro Carvalho Chehab <mchehab@infradead.org>
      Cc: Mikael Starvik <starvik@axis.com>
      Cc: Nadia Derbey <Nadia.Derbey@bull.net>
      Cc: Neil Brown <neilb@suse.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Osterlund <petero2@telia.com>
      Cc: Pierre Peiffer <peifferp@gmail.com>
      Cc: Russell King <rmk@arm.linux.org.uk>
      Cc: Takashi Iwai <tiwai@suse.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      59b74351