  1. 20 Nov, 2008 1 commit
  2. 19 Nov, 2008 4 commits
    • Btrfs: fix free space accounting when unpinning extents · 07103a3c
      Josef Bacik authored
      This patch fixes what I hope is the last early ENOSPC bug left.  I did not know
      that pinned extents would merge into one big extent when inserted into the
      pinned extent tree, so I was adding free space to a single block group for an
      extent that could possibly span multiple block groups.
      
      This is a big issue because first that space doesn't exist in that block group,
      and second we won't actually use that space because there are a bunch of other
      checks to make sure we're allocating within the constraints of the block group.
      
      This patch fixes the problem by adding the btrfs_add_free_space to
      btrfs_update_pinned_extents which makes sure we are adding the appropriate
      amount of free space to the appropriate block group.  Thanks much to Lee Trager
      for running my myriad of debug patches to help me track this problem down.
      Thank you,
      Signed-off-by: Josef Bacik <jbacik@redhat.com>
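The accounting problem described in this commit can be modelled outside the kernel. The following is a minimal sketch, not the btrfs implementation: `split_by_block_group` and the `(start, length)` tuples are hypothetical names and layouts chosen for illustration. It shows how one merged pinned range can be cut into pieces so that each block group is credited only with the bytes that fall inside it.

```python
def split_by_block_group(start, length, block_groups):
    """Split the byte range [start, start + length) along block group borders.

    block_groups is a list of (group_start, group_len) tuples that are
    assumed not to overlap. Returns (group_start, piece_start, piece_len)
    tuples, one per block group that the range touches.
    """
    end = start + length
    pieces = []
    for group_start, group_len in sorted(block_groups):
        group_end = group_start + group_len
        lo = max(start, group_start)   # first byte of the overlap
        hi = min(end, group_end)       # one past the last byte of the overlap
        if lo < hi:
            pieces.append((group_start, lo, hi - lo))
    return pieces


# Hypothetical layout: three block groups of 100 bytes each, and one
# merged pinned extent that starts in the first and ends in the third.
groups = [(0, 100), (100, 100), (200, 100)]
for group_start, piece_start, piece_len in split_by_block_group(80, 150, groups):
    print(group_start, piece_start, piece_len)
```

Crediting the whole 150 bytes to the first group would report space that the group does not contain, which is the failure mode the commit message describes.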
    • Btrfs: Do fsync log replay when mount -o ro, except when on readonly media · 7c2ca468
      Chris Mason authored
      fsync log replay can change the filesystem, so it cannot be delayed until
      mount -o rw,remount, and it can't be forgotten entirely.  So, this patch
      changes btrfs to do what reiserfs, ext3 and xfs do, which is to do the
      log replay even when mounted readonly.
      
      On a readonly device if log replay is required, the mount is aborted.
      
      Getting all of this right required fixing up some of the error
      handling in open_ctree.
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
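The policy in this commit message reduces to a small decision table. The sketch below is only a model of that table, with a hypothetical function name and string results; it does not reflect how open_ctree is actually structured. Note that the ro/rw mount option is deliberately not an input, since the message says replay happens either way.

```python
def mount_decision(log_needs_replay, device_read_only):
    """Return what a mount attempt does under the policy described above.

    The ro/rw mount option is not a parameter because, per the commit
    message, it does not change whether the log is replayed.
    """
    if not log_needs_replay:
        return "mount"
    if device_read_only:
        # Replay would have to write to media that cannot be written.
        return "abort"
    return "replay-then-mount"


for needs_replay in (False, True):
    for ro_device in (False, True):
        print(needs_replay, ro_device, mount_decision(needs_replay, ro_device))
```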
    • Btrfs: Avoid writeback stalls · d2c3f4f6
      Chris Mason authored
      While building large bios in writepages, btrfs may end up waiting
      for other page writeback to finish if WB_SYNC_ALL is used.
      
      While it is waiting, the bio it is building has a number of pages with the
      writeback bit set and they aren't getting to the disk any time soon.  This
      patch lowers the latencies of writeback in general by sending down the bio
      being built before waiting for other pages.
      
      The bio submission code tries to limit the total number of async bios in
      flight by waiting when we're over a certain number of async bios.  But,
      the waits are happening while writepages is building bios, and this can easily
      lead to stalls and other problems for people calling wait_on_page_writeback.
      
      The current fix is to let the congestion tests take care of waiting.
      
      sync() and others make sure to drain the current async requests to make
      sure that everything that was pending when the sync was started really gets
      to disk.  The code would drain pending requests both before and after
      submitting a new request.
      
      But, if one of the requests is waiting for page writeback to finish,
      the draining waits might block that page writeback.  This changes the
      draining code to only wait after submitting the bio being processed.
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
  3. 18 Nov, 2008 10 commits
    • Btrfs: switch back to wait_on_page_writeback to wait on metadata writes · 105d931d
      Chris Mason authored
      The extent based waiting was using more CPU, and other fixes have helped
      with the unplug storm problems.
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
    • Btrfs: unplug all devices in the unplug call back · 9f0ba5bd
      Chris Mason authored
      For larger multi-device filesystems, there was logic to limit the
      devices unplugged to just the one backing the page that was sent to our
      sync_page function.
      
      But, the code wasn't always unplugging the right device.  Since this was
      just an optimization, disable it for now.
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
    • Btrfs: Some fixes for batching extent insert. · b4eec2ca
      Liu Hui authored
      In insert_extents(), when ret==1 and last is not zero, it should
      check whether the item just inserted is the last item in this batch of
      inserts. If so, it should just break from the loop. Without the break, 'cur =
      insert_list->next' will make no sense because the list is empty now,
      and 'op' will point to an unexpected place.
      
      There are also some trivial fixes in this patch, including one comment
      typo and the deletion of two redundant lines.
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
    • Btrfs: prevent loops in the directory tree when creating snapshots · ea9e8b11
      Chris Mason authored
      For a directory tree:
      
      /mnt/subvolA/subvolB
      
      btrfsctl -s /mnt/subvolA/subvolB /mnt
      
      Will create a directory loop with subvolA under subvolB.  This
      commit uses the forward refs for each subvol and snapshot to error out
      before creating the loop.
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
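One way to picture the loop check is as a reachability test over forward refs. This is a simplified model under stated assumptions, not the kernel's actual check: the dictionary layout and the names `descendants` and `would_create_loop` are hypothetical. A snapshot carries the forward refs of the root it was taken from, so placing it inside a subvol that is already reachable from that root would close a cycle.

```python
def descendants(forward_refs, start):
    """Return every subvol reachable from start by following forward refs."""
    seen = set()
    stack = list(forward_refs.get(start, ()))
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        stack.extend(forward_refs.get(node, ()))
    return seen


def would_create_loop(forward_refs, parent, source):
    """True if a snapshot of source placed inside parent would form a loop."""
    return parent in descendants(forward_refs, source)


# Model of the layout above: the top-level root references subvolA,
# and subvolA references subvolB.
refs = {"root": {"subvolA"}, "subvolA": {"subvolB"}, "subvolB": set()}
print(would_create_loop(refs, parent="subvolB", source="root"))   # loop
print(would_create_loop(refs, parent="root", source="subvolB"))   # no loop
```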
    • Btrfs: Add backrefs and forward refs for subvols and snapshots · 0660b5af
      Chris Mason authored
      Subvols and snapshots can now be referenced from any point in the directory
      tree.  We need to maintain back refs for them so we can find lost
      subvols.
      
      Forward refs are added so that we know all of the subvols and
      snapshots referenced anywhere in the directory tree of a single subvol.  This
      can be used to do recursive snapshotting (but it isn't yet) and it is
      also used to detect and prevent directory loops when creating new snapshots.
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
    • Btrfs: Give each subvol and snapshot their own anonymous devid · 3394e160
      Chris Mason authored
      Each subvolume has its own private inode number space, and so we need
      to fill in different device numbers for each subvolume to avoid confusing
      applications.
      
      This commit puts a struct super_block into struct btrfs_root so it can
      call set_anon_super() and get a different device number generated for
      each root.
      
      btrfs_rename is changed to prevent renames across subvols.
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
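The confusion mentioned in this commit message comes from the common convention of identifying a file by its (st_dev, st_ino) pair, for example when detecting hard links. The sketch below illustrates that convention with made-up numbers; the helper name and the device and inode values are hypothetical.

```python
def count_distinct_files(stat_pairs):
    """Count files the way a (st_dev, st_ino) identity check would."""
    return len(set(stat_pairs))


# Hypothetical values: inode 256 is used once in each of two subvolumes.
shared_dev = [(1, 256), (1, 256)]      # both subvolumes report device 1
separate_dev = [(1, 256), (2, 256)]    # each subvolume has its own device

print(count_distinct_files(shared_dev))
print(count_distinct_files(separate_dev))
```

With a shared device number the two unrelated files collapse into one identity; with per-subvolume device numbers they stay distinct.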
    • Btrfs: Allow subvolumes and snapshots anywhere in the directory tree · 3de4586c
      Chris Mason authored
      Before, all snapshots and subvolumes lived in a single flat directory.  This
      was awkward and confusing because the single flat directory was only writable
      with the ioctls.
      
      This commit changes the ioctls to create subvols and snapshots at any
      point in the directory tree.  This requires making separate ioctls for
      snapshot and subvol creation instead of combining them into one.
      
      The subvol ioctl does:
      
      btrfsctl -S subvol_name parent_dir
      
      After the ioctl is done subvol_name lives inside parent_dir.
      
      The snapshot ioctl does:
      
      btrfsctl -s path_for_snapshot root_to_snapshot
      
      path_for_snapshot can be an absolute or relative path.  btrfsctl breaks it up
      into directory and basename components.
      
      root_to_snapshot can be any file or directory in the FS.  The snapshot
      is taken of the entire root where that file lives.
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
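The directory and basename split described for path_for_snapshot can be illustrated with the standard library. This is a sketch of the idea only; btrfsctl is a C program and its real parsing may differ, and `split_snapshot_path` is a hypothetical helper. Treating a bare name as relative to the current directory is an assumption.

```python
import os.path


def split_snapshot_path(path):
    """Split a snapshot path into (directory, basename).

    A trailing slash is ignored, and a bare name is assumed to live in
    the current directory.
    """
    directory, name = os.path.split(path.rstrip("/"))
    return (directory or ".", name)


for example in ("/mnt/subvolA/snap1", "snap1", "dir/snap1/"):
    print(example, split_snapshot_path(example))
```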
    • Btrfs: Add some debugging around the ENOSPC bugs · 4ce4cb52
      Josef Bacik authored
      Some people are still reporting problems with early enospc.  This
      will help narrow down the cause.
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
    • Btrfs: fix free space leak · e3e469f8
      Josef Bacik authored
      In my batch delete/update/insert patch I introduced a free space leak.  The
      extent that we do the original search on in free_extents is never pinned, so we
      always update the block group saying that it has free space, but the free space never
      actually gets added to the free space tree, since op->del will always be 0 and
      it's never actually added to the pinned extents tree.
      
      This patch fixes this problem by making sure we call pin_down_bytes on the
      pending extent op and set op->del to the return value of pin_down_bytes so
      update_block_group is called with the right value.  This seems to fix the case
      where we were getting ENOSPC when there was plenty of space available.
      Signed-off-by: Josef Bacik <jbacik@redhat.com>
  4. 15 Nov, 2008 17 commits
  5. 14 Nov, 2008 8 commits
    • libata: improve phantom device detection · 6a6b97d3
      Tejun Heo authored
      Currently libata uses four methods to detect device presence.
      
      1. PHY status if available.
      2. TF register R/W test (only promotes presence, never demotes)
      3. device signature after reset
      4. IDENTIFY failure detection in SFF state machine
      
      The combination of the above works well in most cases but recently there
      have been a few reports where a phantom device causes unnecessary
      delay during probe.  In both cases, PHY status wasn't available.  In
      one case, it passed #2 and #3 and failed IDENTIFY with ATA_ERR which
      didn't qualify as #4.  The other failed #2 but as it passed #3 and #4,
      it still caused failure.
      
      In both cases, the phantom device reported diagnostic failure, so these
      cases can be safely worked around by considering any !ATA_DRQ IDENTIFY
      failure as NODEV_HINT if diagnostic failure is set.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
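The workaround in the last paragraph of this commit message is a single predicate over the status register and the diagnostic result. The sketch below models it in isolation; the function name is hypothetical, the bit positions are the conventional ATA status bits and should be treated as assumptions here, and the real libata code has more context than this.

```python
ATA_ERR = 1 << 0  # error bit in the status register (assumed position)
ATA_DRQ = 1 << 3  # data request bit in the status register (assumed position)


def identify_failure_is_nodev_hint(status, diagnostic_failed):
    """Model of the rule: a failed IDENTIFY with DRQ clear counts as a
    'no device' hint when the device also reported diagnostic failure."""
    return bool(diagnostic_failed) and not (status & ATA_DRQ)


# IDENTIFY failed with only ERR set, and diagnostics had failed.
print(identify_failure_is_nodev_hint(ATA_ERR, diagnostic_failed=True))
# Same status, but diagnostics passed, so the hint is not raised.
print(identify_failure_is_nodev_hint(ATA_ERR, diagnostic_failed=False))
```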
    • 9p: restrict RDMA usage · 4ff429e6
      Randy Dunlap authored
      linux-next:
      
      Make 9p's RDMA option depend on INET since it uses Infiniband rdma_*
      functions and that code depends on INET.  Otherwise 9p can try to
      use symbols which don't exist.
      
      ERROR: "rdma_destroy_id" [net/9p/9pnet_rdma.ko] undefined!
      ERROR: "rdma_connect" [net/9p/9pnet_rdma.ko] undefined!
      ERROR: "rdma_create_id" [net/9p/9pnet_rdma.ko] undefined!
      ERROR: "rdma_create_qp" [net/9p/9pnet_rdma.ko] undefined!
      ERROR: "rdma_resolve_route" [net/9p/9pnet_rdma.ko] undefined!
      ERROR: "rdma_disconnect" [net/9p/9pnet_rdma.ko] undefined!
      ERROR: "rdma_resolve_addr" [net/9p/9pnet_rdma.ko] undefined!
      
      I used an if/endif block so that the menu items would remain
      presented together.
      
      Also correct an article adjective.
      Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
      Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
    • Create/use more directory structure in the Documentation/ tree. · 31c00fc1
      Randy Dunlap authored
      Create Documentation/blockdev/ sub-directory and populate it.
      Populate the Documentation/serial/ sub-directory.
      Move MSI-HOWTO.txt to Documentation/PCI/.
      Move ioctl-number.txt to Documentation/ioctl/.
      Update all relevant 00-INDEX files.
      Update all relevant Kconfig files and source files.
      Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
    • [S390] fix s390x_newuname · d2f019fe
      Martin Schwidefsky authored
      The uname system call for 64 bit compares current->personality without
      masking the upper 16 bits. If e.g. READ_IMPLIES_EXEC is set the result
      of a uname system call will always be s390x even if the process uses
      the s390 personality.
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
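The bug is a comparison against an unmasked value, which is easy to reproduce in a small model. In the sketch below, the constant values follow my recollection of the generic Linux personality definitions and the mask width is an assumption; `machine_name` is a hypothetical stand-in for the uname logic, not the s390 code.

```python
PER_MASK = 0x00FF              # assumed mask for the base personality
PER_LINUX32 = 0x0008           # assumed value of the 31/32-bit personality
READ_IMPLIES_EXEC = 0x0400000  # assumed value of one personality flag bit


def machine_name(personality, mask_flags=True):
    """Return the machine string a uname call would report in this model.

    With mask_flags=False the comparison uses the raw value, which is
    the behaviour the commit message describes as the bug.
    """
    base = personality & PER_MASK if mask_flags else personality
    return "s390" if base == PER_LINUX32 else "s390x"


value = PER_LINUX32 | READ_IMPLIES_EXEC
print(machine_name(value, mask_flags=False))  # unmasked comparison
print(machine_name(value, mask_flags=True))   # masked comparison
```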
    • [S390] dasd: log sense for fatal errors · a9cffb22
      Stefan Haberland authored
      The logging of sense data for fatal errors was accidentally removed
      during Hyper PAV implementation.
      Signed-off-by: Stefan Haberland <stefan.haberland@de.ibm.com>
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
    • [S390] cpu topology: fix locking · 74af2831
      Heiko Carstens authored
      cpu_coregroup_map used to grab a mutex on s390 since it was only
      called from process context.  Since c7c22e4d "block: add support
      for IO CPU affinity" this is not true anymore: it now also gets
      called from softirq context.
      
      To prevent possible deadlocks change this in architecture code and
      use a spinlock instead of a mutex.
      
      Cc: stable@kernel.org
      Cc: Jens Axboe <jens.axboe@oracle.com>
      Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
    • [S390] cio: Fix refcount after moving devices. · 85acc407
      Cornelia Huck authored
      In ccw_device_move_to_orphanage(), a replacing ccw_device
      is searched via get_{disc,orphaned}_ccwdev_by_dev_id()
      which obtain a reference on the returned ccw_device.
      This reference must be given up again after the device
      has been moved to its new parent.
      Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
    • [S390] ftrace: fix kernel stack backchain walking · 50bec4ce
      Heiko Carstens authored
      With CONFIG_IRQSOFF_TRACER the trace_hardirqs_off() function includes
      a call to __builtin_return_address(1). But we call trace_hardirqs_off()
      from early entry code. There we have just a single stack frame.
      So this results in a kernel stack backchain walk that would walk beyond
      the kernel stack. Following the NULL terminated backchain this results
      in a lowcore read access.
      
      To fix this we simply call trace_hardirqs_off_caller() and pass the
      current instruction pointer.
      Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>