1. 26 Feb, 2009 1 commit
  2. 02 Feb, 2009 1 commit
  3. 05 Jan, 2009 9 commits
    • ocfs2: Add directory block trailers. · 87d35a74
      Mark Fasheh authored
      
      Future ocfs2 features metaecc and indexed directories need to store a
      little bit of data in each dirblock.  For compatibility, we place this
      in a trailer at the end of the dirblock.  The trailer presents itself as an
      empty dirent, so that if the features are turned off, it can be reused
      without requiring a tunefs scan.
      
      This code adds the trailer and validates it when the block is read in.
      
      [ Mark is the original author, but I reinserted this code before his
        dir index work.  -- Joel ]
      Signed-off-by: Joel Becker <joel.becker@oracle.com>
      Signed-off-by: Mark Fasheh <mfasheh@suse.com>
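The trailer idea can be sketched as follows. This is an illustration only: the struct layout, the "DEMOTRL" signature, and every `demo_*` name are assumptions made up for this example, not the ocfs2 on-disk format. The point it shows is that a trailer with a zero inode field and a `rec_len` covering exactly itself reads as an unused dirent to code that does not know about trailers.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical trailer layout (not the real ocfs2 structure). The first
 * two fields mirror the start of a directory entry so that the trailer
 * parses as an unused entry. */
struct demo_trailer {
    uint64_t inode;        /* always 0: "this entry is unused" */
    uint16_t rec_len;      /* always sizeof(struct demo_trailer) */
    uint8_t  signature[8]; /* identifies the block as carrying a trailer */
    uint8_t  payload[14];  /* room for per-block feature data */
};

/* The trailer occupies the last bytes of the directory block. */
static size_t demo_trailer_offset(size_t blocksize)
{
    return blocksize - sizeof(struct demo_trailer);
}

/* Write a fresh trailer at the end of the block. */
static void demo_init_trailer(uint8_t *block, size_t blocksize)
{
    struct demo_trailer t;

    memset(&t, 0, sizeof(t));
    t.rec_len = (uint16_t)sizeof(t);
    memcpy(t.signature, "DEMOTRL", sizeof(t.signature));
    memcpy(block + demo_trailer_offset(blocksize), &t, sizeof(t));
}

/* Validate the trailer after a block read. Returns 1 if it is sane. */
static int demo_validate_trailer(const uint8_t *block, size_t blocksize)
{
    struct demo_trailer t;

    memcpy(&t, block + demo_trailer_offset(blocksize), sizeof(t));
    return t.inode == 0 &&
           t.rec_len == sizeof(t) &&
           memcmp(t.signature, "DEMOTRL", sizeof(t.signature)) == 0;
}
```

A caller would run `demo_init_trailer()` when a directory block is created and `demo_validate_trailer()` each time one is read in.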
    • ocfs2: Use metadata-specific ocfs2_journal_access_*() functions. · 13723d00
      Joel Becker authored
      
      The per-metadata-type ocfs2_journal_access_*() functions hook up jbd2
      commit triggers and allow us to compute metadata ecc right before the
      buffers are written out.  This commit provides ecc for inodes, extent
      blocks, group descriptors, and quota blocks.  It is not safe to use
      extended attributes and metaecc at the same time yet.
      
      The ocfs2_extent_tree and ocfs2_path abstractions in alloc.c both hide
      the type of block at their root.  Before, it didn't matter, but now the
      root block must use the appropriate ocfs2_journal_access_*() function.
      To keep this abstract, the structures now have a pointer to the matching
      journal_access function and a wrapper call to call it.
      
      A few places use naked ocfs2_write_block() calls instead of adding the
      blocks to the journal.  We make sure to calculate their checksum and ecc
      before the write.
      
      Since we pass around the journal_access functions, let's typedef them
      in ocfs2.h.
      Signed-off-by: Joel Becker <joel.becker@oracle.com>
      Signed-off-by: Mark Fasheh <mfasheh@suse.com>
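The "pointer plus wrapper" arrangement described above might look roughly like this. The types and `demo_*` names are hypothetical stand-ins; the real functions take a journal handle and a buffer_head, which are omitted here so the sketch stays self-contained.

```c
/* Hypothetical stand-in for a buffer with an attached commit trigger. */
enum demo_trigger {
    DEMO_TRIGGER_NONE,
    DEMO_TRIGGER_INODE,
    DEMO_TRIGGER_EXTENT_BLOCK
};

struct demo_buffer {
    enum demo_trigger trigger; /* which per-type trigger is hooked up */
};

/* One typedef shared by every per-metadata-type access function. */
typedef int (*demo_journal_access_func)(struct demo_buffer *bh, int type);

/* Access function for inode blocks: attaches the inode trigger. */
static int demo_journal_access_di(struct demo_buffer *bh, int type)
{
    (void)type;
    bh->trigger = DEMO_TRIGGER_INODE;
    return 0;
}

/* Access function for extent blocks: attaches the extent block trigger. */
static int demo_journal_access_eb(struct demo_buffer *bh, int type)
{
    (void)type;
    bh->trigger = DEMO_TRIGGER_EXTENT_BLOCK;
    return 0;
}

/* The tree hides what kind of block sits at its root, so it carries the
 * matching access function alongside the root buffer. */
struct demo_extent_tree {
    struct demo_buffer *root_bh;
    demo_journal_access_func root_journal_access;
};

/* Wrapper: callers never need to know the root block's type. */
static int demo_et_root_journal_access(struct demo_extent_tree *et, int type)
{
    return et->root_journal_access(et->root_bh, type);
}
```

Code that walks the tree calls only the wrapper, and whoever constructs the tree picks the function that matches the root block.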
    • ocfs2: Add the underlying blockcheck code. · 70ad1ba7
      Joel Becker authored
      
      This is the code that computes crc32 and ecc for ocfs2 metadata blocks.
      There are high-level functions that check whether the filesystem has the
      ecc feature, mid-level functions that work on a single block or array of
      buffer_heads, and the low-level ecc hamming code that can handle
      multiple buffers like crc32_le().
      
      It's not hooked up to the filesystem yet.
      Signed-off-by: Joel Becker <joel.becker@oracle.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Mark Fasheh <mfasheh@suse.com>
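As a rough illustration of the low-level ecc layer, here is a small Hamming-style parity sketch. It is not the kernel's blockcheck code: the `demo_*` names and return codes are invented for this example, it leaves out the crc32 half entirely, and it handles only a single buffer and a single flipped bit. Each data bit is assigned a position in the code word that skips the power-of-two slots, and the parity value is the XOR of the positions of all set bits.

```c
#include <stddef.h>
#include <stdint.h>

/* Map 0-based data bit i to its 1-based code word position, skipping
 * the power-of-two positions that a Hamming code reserves for parity. */
static unsigned int demo_code_pos(unsigned int i)
{
    unsigned int b = i + 1;

    for (unsigned int p = 1; p <= b; p <<= 1)
        b++;
    return b;
}

/* Parity is the XOR of the code positions of every set data bit. */
static uint32_t demo_hamming_encode(const uint8_t *data, size_t len)
{
    uint32_t parity = 0;

    for (size_t i = 0; i < len * 8; i++)
        if (data[i / 8] & (1u << (i % 8)))
            parity ^= demo_code_pos((unsigned int)i);
    return parity;
}

/* Compare stored parity against the data and repair a single flipped bit.
 * Returns 0 if clean, 1 if a data bit was repaired, 2 if the stored
 * parity itself has a flipped bit, -1 if the damage is not correctable. */
static int demo_hamming_fix(uint8_t *data, size_t len, uint32_t stored)
{
    uint32_t diff = stored ^ demo_hamming_encode(data, len);

    if (diff == 0)
        return 0;
    if ((diff & (diff - 1)) == 0)
        return 2; /* power of two: a parity position, data is intact */
    for (size_t i = 0; i < len * 8; i++) {
        if (demo_code_pos((unsigned int)i) == diff) {
            data[i / 8] ^= (uint8_t)(1u << (i % 8));
            return 1;
        }
    }
    return -1;
}
```

Because a data bit never maps to a power-of-two position, a difference that is a power of two can only come from damage to the stored parity value.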
    • ocfs2: Enable quota accounting on mount, disable on umount · 19ece546
      Jan Kara authored
      
      Enable quota usage tracking on mount and disable it on umount. Also
      add support for quota on and quota off quotactls and usrquota and
      grpquota mount options. Add quota features among supported ones.
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Mark Fasheh <mfasheh@suse.com>
    • ocfs2: Implement quota recovery · 2205363d
      Jan Kara authored
      
      Implement functions for recovery after a crash. The functions just read
      the local quota file and sync its info to the global quota file.
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Mark Fasheh <mfasheh@suse.com>
    • ocfs2: Wrap extent block reads in a dedicated function. · 5e96581a
      Joel Becker authored
      
      We weren't consistently checking extent blocks after we read them.
      Most places checked the signature, but none checked h_blkno or
      h_fs_signature.  Create a toplevel ocfs2_read_extent_block() that does
      the read and the validation.
      Signed-off-by: Joel Becker <joel.becker@oracle.com>
      Signed-off-by: Mark Fasheh <mfasheh@suse.com>
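A read-then-validate wrapper of the kind described could be sketched like this. Everything here is hypothetical: the struct, the "DEMOEB1" signature, the error codes, and the in-memory array that stands in for the disk. The `fs_id` field plays the role of the filesystem-wide identifier that the message calls h_fs_signature.

```c
#include <stdint.h>
#include <string.h>

#define DEMO_EB_SIGNATURE "DEMOEB1" /* made-up signature, 7 chars + NUL */

enum {
    DEMO_OK = 0,
    DEMO_ERR_SIGNATURE = -1,
    DEMO_ERR_BLKNO = -2,
    DEMO_ERR_FS_ID = -3
};

/* Hypothetical extent block header. */
struct demo_extent_block {
    char     signature[8];
    uint64_t blkno; /* the block's own number, as recorded on disk */
    uint32_t fs_id; /* identifies the filesystem the block belongs to */
};

/* All three checks live in one place, so no caller can forget one. */
static int demo_validate_extent_block(const struct demo_extent_block *eb,
                                      uint64_t blkno, uint32_t fs_id)
{
    if (memcmp(eb->signature, DEMO_EB_SIGNATURE, sizeof(eb->signature)) != 0)
        return DEMO_ERR_SIGNATURE;
    if (eb->blkno != blkno)
        return DEMO_ERR_BLKNO;
    if (eb->fs_id != fs_id)
        return DEMO_ERR_FS_ID;
    return DEMO_OK;
}

/* Top-level read: fetch the block, then validate before handing it back.
 * The "disk" is just an array indexed by block number in this sketch. */
static int demo_read_extent_block(const struct demo_extent_block *disk,
                                  uint64_t blkno, uint32_t fs_id,
                                  struct demo_extent_block *out)
{
    *out = disk[blkno];
    return demo_validate_extent_block(out, blkno, fs_id);
}
```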
    • ocfs2: Morph the haphazard OCFS2_IS_VALID_GROUP_DESC() checks. · 42035306
      Joel Becker authored
      
      Random places in the code would check a group descriptor bh to see if it
      was valid. The previous commit unified descriptor block reads,
      validating all block reads in the same place.  Thus, these checks are no
      longer necessary.  Rather than eliminate them, however, we change them
      to BUG_ON() checks.  This ensures the assumptions remain true.  All of
      the code paths to these checks have been audited to ensure they come
      from a validated descriptor read.
      Signed-off-by: Joel Becker <joel.becker@oracle.com>
      Signed-off-by: Mark Fasheh <mfasheh@suse.com>
    • ocfs2: Morph the haphazard OCFS2_IS_VALID_DINODE() checks. · 10995aa2
      Joel Becker authored
      
      Random places in the code would check a dinode bh to see if it was
      valid.  Not only did they do different levels of validation, they
      handled errors in different ways.
      
      The previous commit unified inode block reads, validating all block
      reads in the same place.  Thus, these haphazard checks are no longer
      necessary.  Rather than eliminate them, however, we change them to
      BUG_ON() checks.  This ensures the assumptions remain true.  All of the
      code paths to these checks have been audited to ensure they come from a
      validated inode read.
      Signed-off-by: Joel Becker <joel.becker@oracle.com>
      Signed-off-by: Mark Fasheh <mfasheh@suse.com>
    • ocfs2: add POSIX ACL API · 929fb014
      Tiger Yang authored
      
      This patch adds POSIX ACL (access control list) APIs to ocfs2. We convert a
      struct posix_acl into multiple ocfs2_acl_entry structures and treat them as
      a single extended attribute entry.
      Signed-off-by: Tiger Yang <tiger.yang@oracle.com>
      Signed-off-by: Mark Fasheh <mfasheh@suse.com>
  4. 01 Dec, 2008 1 commit
  5. 10 Nov, 2008 1 commit
  6. 14 Oct, 2008 1 commit
    • ocfs2: Switch over to JBD2. · 2b4e30fb
      Joel Becker authored
      
      ocfs2 wants JBD2 for many reasons, not the least of which is that JBD is
      limiting our maximum filesystem size.
      
      It's a pretty trivial change.  Most functions are just renamed.  The
      only functional change is moving to Jan's inode-based ordered data mode.
      It's better, too.
      
      Because JBD2 reads and writes JBD journals, this is compatible with any
      existing filesystem.  It can even interact with JBD-based ocfs2 as long
      as the journal is formatted for JBD.
      
      We provide a compatibility option so that paranoid people can still use
      JBD for the time being.  This will go away shortly.
      
      [ Moved call of ocfs2_begin_ordered_truncate() from ocfs2_delete_inode() to
        ocfs2_truncate_for_delete(). --Mark ]
      Signed-off-by: Joel Becker <joel.becker@oracle.com>
      Signed-off-by: Mark Fasheh <mfasheh@suse.com>
  7. 13 Oct, 2008 7 commits
    • ocfs2: Add the 'inode64' mount option. · 12462f1d
      Joel Becker authored
      
      Now that ocfs2 limits inode numbers to 32bits, add a mount option to
      disable the limit.  This parallels XFS.  64bit systems can handle the
      larger inode numbers.
      
      [ Added description of inode64 mount option in ocfs2.txt. --Mark ]
      Signed-off-by: Joel Becker <joel.becker@oracle.com>
      Signed-off-by: Mark Fasheh <mfasheh@suse.com>
    • ocfs2: Add incompatible flag for extended attribute · 8154da3d
      Tiger Yang authored
      
      This patch adds the s_incompat flag for extended attribute support. This
      helps us ensure that older versions of Ocfs2 or ocfs2-tools will not be able
      to mount a volume with xattr support.
      Signed-off-by: Tiger Yang <tiger.yang@oracle.com>
      Signed-off-by: Mark Fasheh <mfasheh@suse.com>
    • ocfs2: Add extended attribute support · cf1d6c76
      Tiger Yang authored
      
      This patch implements storing extended attributes either in the inode or in a
      single external block. We only store EAs in-inode when blocksize > 512 or the
      inode block has free space for them. When an EA's value is larger than 80
      bytes, we store the value in a b-tree outside the inode or block.
      Signed-off-by: Tiger Yang <tiger.yang@oracle.com>
      Signed-off-by: Mark Fasheh <mfasheh@suse.com>
    • ocfs2: reserve inline space for extended attribute · fdd77704
      Tiger Yang authored
      
      Add the structures and helper functions we want for handling inline extended
      attributes. We also update the inline-data handlers so that they properly
      function in the event that we have both inline data and inline attributes
      sharing an inode block.
      Signed-off-by: Tiger Yang <tiger.yang@oracle.com>
      Signed-off-by: Mark Fasheh <mfasheh@suse.com>
    • ocfs2: track local alloc state via debugfs · 9a8ff578
      Mark Fasheh authored
      
      A per-mount debugfs file, "local_alloc", is created which, when read, will
      expose the live state of the node's local alloc file. Performance impact is
      minimal, only a bit of memory overhead per mount point. Still, the code is
      hidden behind CONFIG_OCFS2_FS_STATS. This feature will help us debug
      local alloc performance problems on a live system.
      Signed-off-by: Mark Fasheh <mfasheh@suse.com>
    • ocfs2: throttle back local alloc when low on disk space · 9c7af40b
      Mark Fasheh authored
      
      Ocfs2's local allocator disables itself for the duration of a mount point
      when it has trouble allocating a large enough area from the primary bitmap.
      That can cause performance problems, especially for disks which were only
      temporarily full or fragmented. This patch allows for the allocator to
      shrink its window first, before being disabled. Later, it can also be
      re-enabled so that any performance drop is minimized.
      
      To do this, we allow the value of osb->local_alloc_bits to be shrunk when
      needed. The default value is recorded in a mostly read-only variable so that
      we can re-initialize when required.
      
      Locking had to be updated so that we could protect changes to
      local_alloc_bits. Mostly this involves protecting various local alloc values
      with the osb spinlock. A new state is also added, OCFS2_LA_THROTTLED, which
      is used when the local allocator has shrunk, but is not disabled. If the
      available space dips below 1 megabyte, the local alloc file is disabled. In
      either case, local alloc is re-enabled 30 seconds after the event, or when
      an appropriate amount of bits is seen in the primary bitmap.
      Signed-off-by: Mark Fasheh <mfasheh@suse.com>
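The state changes described above can be modelled with a small state machine. This is a sketch under several assumptions: the window is halved on each failure (the message does not say by how much it shrinks), time is a plain integer in seconds, locking is left out entirely, and re-enabling when enough bits appear in the primary bitmap is not modelled. All `demo_*` names are hypothetical.

```c
enum demo_la_state {
    DEMO_LA_ENABLED,
    DEMO_LA_THROTTLED, /* window has shrunk but is still in use */
    DEMO_LA_DISABLED
};

struct demo_la {
    enum demo_la_state state;
    unsigned int default_bits; /* mostly read-only default window size */
    unsigned int bits;         /* current window, cf. local_alloc_bits */
    unsigned int min_bits;     /* number of bits equal to 1 megabyte */
    long event_time;           /* when we last throttled or disabled */
};

/* Called when a window of la->bits could not be allocated from the
 * primary bitmap. Halving is an assumption made for this sketch. */
static void demo_la_alloc_failed(struct demo_la *la, long now)
{
    unsigned int shrunk = la->bits / 2;

    la->event_time = now;
    if (shrunk < la->min_bits) {
        la->state = DEMO_LA_DISABLED;
    } else {
        la->bits = shrunk;
        la->state = DEMO_LA_THROTTLED;
    }
}

/* Periodic check: 30 seconds after the event, go back to the default. */
static void demo_la_tick(struct demo_la *la, long now)
{
    if (la->state != DEMO_LA_ENABLED && now - la->event_time >= 30) {
        la->bits = la->default_bits;
        la->state = DEMO_LA_ENABLED;
    }
}
```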
    • ocfs2: Track local alloc bits internally · ebcee4b5
      Mark Fasheh authored
      
      Do this instead of tracking absolute local alloc size. This avoids
      needless recalculation of bits from bytes in localalloc.c. Additionally,
      the value is now in a more natural unit for internal file system bitmap
      work.
      Signed-off-by: Mark Fasheh <mfasheh@suse.com>
  8. 31 Jul, 2008 1 commit
    • [PATCH 2/2] ocfs2: Fix race between mount and recovery · 539d8264
      Sunil Mushran authored
      As the fs recovery is asynchronous, there is a small chance that another
      node can mount (and thus recover) the slot before the recovery thread
      gets to it.
      
      If this happens, the recovery thread will block indefinitely on the
      journal/slot lock as that lock will be held for the duration of the mount
      (by design) by the node assigned to that slot.
      
      The solution implemented is to keep track of the journal replays using
      a recovery generation in the journal inode, which will be incremented by the
      thread replaying that journal. The recovery thread, before attempting the
      blocking lock on the journal/slot lock, will compare the generation on disk
      with what it has cached and skip recovery if it does not match.
      
      This bug appears to have been inadvertently introduced during the mount/umount
      vote removal by mainline commit 34d024f8.  In the mount voting scheme, the
      messaging would indirectly indicate that the slot was being recovered.
      Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
      Signed-off-by: Mark Fasheh <mfasheh@suse.com>
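The generation check can be sketched in a few lines. The structures and `demo_*` names are hypothetical; in the real code the generation lives in the journal inode and the comparison happens before the blocking lock is attempted.

```c
#include <stdint.h>

/* Hypothetical on-disk state of one slot's journal. */
struct demo_journal {
    uint32_t recovery_generation; /* bumped by whoever replays it */
};

/* What the recovery thread remembered when recovery was queued. */
struct demo_recovery_request {
    uint32_t cached_generation;
};

/* Whoever replays the journal increments the on-disk generation. */
static void demo_replay_journal(struct demo_journal *j)
{
    j->recovery_generation++;
}

/* Before taking the blocking journal/slot lock, compare generations.
 * A mismatch means another node already replayed the journal, so the
 * recovery thread must skip it instead of blocking on the lock. */
static int demo_should_recover(const struct demo_journal *j,
                               const struct demo_recovery_request *req)
{
    return j->recovery_generation == req->cached_generation;
}
```

If a mounting node calls `demo_replay_journal()` between the moment the request is queued and the moment the recovery thread runs, `demo_should_recover()` returns 0 and the thread never touches the lock.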
  9. 14 Jul, 2008 1 commit
  10. 18 Apr, 2008 11 commits
    • ocfs2: Add inode stealing for ocfs2_reserve_new_inode · 4d0ddb2c
      Tao Ma authored
      
      Inode allocation is modified to look in other nodes' allocators during
      extreme out-of-space situations. We retry our own slot when space is freed
      back to the global bitmap, or whenever we've allocated more than 1024 inodes
      from another slot.
      Signed-off-by: Tao Ma <tao.ma@oracle.com>
      Signed-off-by: Mark Fasheh <mfasheh@suse.com>
    • ocfs2: Add the USERSPACE_STACK incompat bit. · b61817e1
      Joel Becker authored
      
      The filesystem gains the USERSPACE_STACK incompat bit and the
      s_cluster_info field on the superblock.  When a userspace stack is in
      use, the name of the stack is stored on-disk for mount-time
      verification.
      
      The "cluster_stack" option is added to mount(2) processing.  The mount
      process needs to pass the matching stack name.  If the passed name and
      the on-disk name do not match, the mount is failed.
      
      When using the classic o2cb stack, the incompat bit is *not* set and no
      mount option is used other than the usual heartbeat=local.  Thus, the
      filesystem is compatible with older tools.
      Signed-off-by: Joel Becker <joel.becker@oracle.com>
      Signed-off-by: Mark Fasheh <mfasheh@suse.com>
    • ocfs2: Remove CANCELGRANT from the view of dlmglue. · de551246
      Joel Becker authored
      
      o2dlm has the non-standard behavior of providing a cancel callback
      (unlock_ast) even when the cancel has failed (the locking operation
      succeeded without canceling).  This is called CANCELGRANT after the
      status code sent to the callback.  fs/dlm does not provide this
      callback, so dlmglue must be changed to live without it.
      o2dlm_unlock_ast_wrapper() in stackglue now ignores CANCELGRANT calls.
      
      Because dlmglue no longer sees CANCELGRANT, ocfs2_unlock_ast() no longer
      needs to check for it.  ocfs2_locking_ast() must catch that a cancel was
      tried and clear the cancel state.
      
      Making these changes opens up a locking race.  dlmglue uses the
      OCFS2_LOCK_BUSY flag to ensure only one thread is calling the dlm at any
      one time.  But dlmglue must unlock the lockres before calling into the
      dlm.  In the small window of time between unlocking the lockres and
      calling the dlm, the downconvert thread can try to cancel the lock.  The
      downconvert thread is checking the OCFS2_LOCK_BUSY flag - it doesn't
      know that ocfs2_dlm_lock() has not yet been called.
      
      Because ocfs2_dlm_lock() has not yet been called, the cancel operation
      will just be a no-op.  There's nothing to cancel.  With CANCELGRANT,
      dlmglue uses the CANCELGRANT callback to clear up the cancel state.
      When it comes around again, it will retry the cancel.  Eventually, the
      first thread will have called into ocfs2_dlm_lock(), and either the
      lock or the cancel will succeed.  The downconvert thread can then do its
      downconvert.
      
      Without CANCELGRANT, there is nothing to clean up the cancellation
      state.  The downconvert thread does not know to retry its operations.
      More importantly, the original lock may be blocking on the other node
      that is trying to cancel us.  With neither able to make progress, the
      ast is never called and the cancellation state is never cleaned up that
      way.  dlmglue is deadlocked.
      
      The OCFS2_LOCK_PENDING flag is introduced to remedy this window.  It is
      set at the same time OCFS2_LOCK_BUSY is.  Thus, the downconvert thread
      can check whether the lock is cancelable.  If not, it just loops around
      to try again.  Once ocfs2_dlm_lock() is called, the thread then clears
      OCFS2_LOCK_PENDING and wakes the downconvert thread.  Now, if the
      downconvert thread finds the lock BUSY, it can safely try to cancel it.
      Whether the cancel works or not, the state will be properly set and the
      lock processing can continue.
      Signed-off-by: Joel Becker <joel.becker@oracle.com>
      Signed-off-by: Mark Fasheh <mfasheh@suse.com>
    • ocfs2: Move o2hb functionality into the stack glue. · 6953b4c0
      Joel Becker authored
      
      The last bit of classic stack used directly in ocfs2 code is o2hb.
      Specifically, the check for heartbeat during mount and the call to
      ocfs2_hb_ctl during unmount.
      
      We create an extra API, ocfs2_cluster_hangup(), to encapsulate the call
      to ocfs2_hb_ctl.  Other stacks will just leave hangup() empty.
      
      The check for heartbeat is moved into ocfs2_cluster_connect().  It will
      be matched by a similar check for other stacks.
      
      With this change, only stackglue.c includes cluster/ headers.
      Signed-off-by: Joel Becker <joel.becker@oracle.com>
      Signed-off-by: Mark Fasheh <mfasheh@suse.com>
    • ocfs2: Abstract out node number queries. · 19fdb624
      Joel Becker authored
      
      ocfs2 asks the cluster stack for the local node's node number for two
      reasons: to fill the slot map and to print it. While the slot map isn't
      necessary for userspace cluster stacks, the printing is very nice for
      debugging. Thus we add ocfs2_cluster_this_node() as a generic API to get
      this value. It is anticipated that the slot map will not be used under a
      userspace cluster stack, so validity checks of the node num only need to
      exist in the slot map code. Otherwise, it just gets used and printed as an
      opaque value.
      
      [ Fixed up some "int" versus "unsigned int" issues and made osb->node_num
        truly opaque. --Mark ]
      Signed-off-by: Joel Becker <joel.becker@oracle.com>
      Signed-off-by: Mark Fasheh <mfasheh@suse.com>
    • ocfs2: Introduce the new ocfs2_cluster_connect/disconnect() API. · 4670c46d
      Joel Becker authored
      
      This step introduces a cluster stack agnostic API for initializing and
      exiting.  fs/ocfs2/dlmglue.c no longer uses o2cb/o2dlm knowledge to
      connect to the stack.  It is all handled in stackglue.c.
      
      heartbeat.c no longer needs to know how it gets called.
      ocfs2_do_node_down() is now a clean recovery trigger.
      
      The big gotcha is the ordering of initializations and de-initializations done
      underneath ocfs2_cluster_connect().  ocfs2_dlm_init() used to do all
      o2dlm initialization in one block.  Thus, the o2dlm functionality of
      ocfs2_cluster_connect() is very straightforward.  ocfs2_dlm_shutdown(),
      however, did a few things between de-registration of the eviction
      callback and actually shutting down the domain.  Now de-registration and
      shutdown of the domain are wrapped within the single
      ocfs2_cluster_disconnect() call.  I've checked the code paths to make
      sure we can safely tear down things in ocfs2_dlm_shutdown() before
      calling ocfs2_cluster_disconnect().  The filesystem has already set
      itself to ignore the callback.
      Signed-off-by: Joel Becker <joel.becker@oracle.com>
      Signed-off-by: Mark Fasheh <mfasheh@suse.com>
    • ocfs2: Create the lock status block union. · 8f2c9c1b
      Joel Becker authored
      
      Wrap the lock status block (lksb) in a union.  Later we will add a union
      element for the fs/dlm lksb.  Create accessors for the status and lvb
      fields.
      
      Other than a debugging function, dlmglue.c does not directly reference
      the o2dlm locking path anymore.
      Signed-off-by: Joel Becker <joel.becker@oracle.com>
      Signed-off-by: Mark Fasheh <mfasheh@suse.com>
    • ocfs2: New slot map format · 386a2ef8
      Joel Becker authored
      
      The old slot map had a few limitations:
      
      - It was limited to one block, so the maximum slot count was 255.
      - Each slot was signed 16bits, limiting node numbers to INT16_MAX.
      - An empty slot was marked by the magic 0xFFFF (-1).
      
      The new slot map format provides 32bit node numbers (UINT32_MAX), a
      separate space to mark a slot in use, and extra room to grow.  The slot
      map is now bounded by i_size, not a block.
      Signed-off-by: Joel Becker <joel.becker@oracle.com>
      Signed-off-by: Mark Fasheh <mfasheh@suse.com>
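The difference between the two formats can be sketched as below. The struct and `demo_*` names are hypothetical and the layout is not the real on-disk one; the sketch only shows the three properties listed above: a validity marker separate from the node number, a full 32-bit node number, and a slot count derived from the file size rather than from one block.

```c
#include <stddef.h>
#include <stdint.h>

/* Old format: one signed 16-bit value per slot, -1 (0xFFFF) means empty. */
#define DEMO_OLD_EMPTY_SLOT ((int16_t)-1)

/* New format (hypothetical layout): validity is its own field. */
struct demo_new_slot {
    uint8_t  valid;    /* nonzero when the slot is in use */
    uint32_t node_num; /* any value up to UINT32_MAX */
};

/* Convert an old-format slot. Negative values, including the empty
 * marker, cannot be real node numbers and become invalid slots. */
static struct demo_new_slot demo_slot_from_old(int16_t old)
{
    struct demo_new_slot s = { 0, 0 };

    if (old >= 0) {
        s.valid = 1;
        s.node_num = (uint32_t)old;
    }
    return s;
}

/* Mark a slot in use by a node; no value is reserved as magic. */
static struct demo_new_slot demo_slot_set(uint32_t node_num)
{
    struct demo_new_slot s = { 1, node_num };

    return s;
}

/* The number of slots is bounded by the map file's size, not a block. */
static uint64_t demo_slot_count(uint64_t i_size)
{
    return i_size / sizeof(struct demo_new_slot);
}
```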
    • ocfs2: De-magic the in-memory slot map. · fc881fa0
      Joel Becker authored
      
      The in-memory slot map uses the same magic as the on-disk one.  There is
      a special value to mark a slot as invalid.  It relies on the size of
      certain types and so on.
      
      Write a new in-memory map that keeps validity as a separate field.  Outside
      of the I/O functions, OCFS2_INVALID_SLOT now means what it is supposed to.
      It also is no longer tied to the type size.
      
      This also means that only the I/O functions refer to 16bit quantities.
      Signed-off-by: Joel Becker <joel.becker@oracle.com>
      Signed-off-by: Mark Fasheh <mfasheh@suse.com>
    • ocfs2: Change the recovery map to an array of node numbers. · 553abd04
      Joel Becker authored
      
      The old recovery map was a bitmap of node numbers.  This was sufficient
      for the maximum node number of 254.  Going forward, we want node numbers
      to be UINT32.  Thus, we need a new recovery map.
      
      Note that we can't keep track of slots here.  We must write down the
      node number to recover *before* we get the locks needed to convert a
      node number into a slot number.
      
      The recovery map is now an array of unsigned ints, max_slots in size.
      It moves to journal.c with the rest of recovery.
      
      Because it needs to be initialized, we move all of recovery initialization
      into a new function, ocfs2_recovery_init().  This actually cleans up
      ocfs2_initialize_super() a little as well.  Following on, recovery cleanup
      becomes part of ocfs2_recovery_exit().
      
      A number of node map functions are rendered obsolete and are removed.
      
      Finally, waiting on recovery is wrapped in a function rather than naked
      checks on the recovery_event.  This is a cleanup from Mark.
      Signed-off-by: Joel Becker <joel.becker@oracle.com>
      Signed-off-by: Mark Fasheh <mfasheh@suse.com>
    • ocfs2: Make ocfs2_slot_info private. · d85b20e4
      Joel Becker authored
      
      Just use osb_lock around the ocfs2_slot_info data.  This allows us to
      take the ocfs2_slot_info structure private in slot_info.c.  All access
      is now via accessors.
      Signed-off-by: Joel Becker <joel.becker@oracle.com>
      Signed-off-by: Mark Fasheh <mfasheh@suse.com>
  11. 08 Feb, 2008 1 commit
    • byteorder: move le32_add_cpu & friends from OCFS2 to core · 8b5f6883
      Marcin Slusarz authored
      
      This patchset moves the le*_add_cpu and be*_add_cpu functions from OCFS2 to a
      core header (1st), converts the ext3 filesystem to this API (2nd) and replaces
      XFS's differently named functions with the new ones (3rd).
      
      There are many places where these functions will be useful.  Just look at:
      
        grep -r 'cpu_to_[ble12346]*([ble12346]*_to_cpu.*[-+]' linux-src/
      
      The patch for ext3 is an example of what the conversions will probably look like.
      
      This patch:
      
      - move inline functions which add native byte order variable to
        little/big endian variable to core header
        * le16_add_cpu(__le16 *var, u16 val)
        * le32_add_cpu(__le32 *var, u32 val)
        * le64_add_cpu(__le64 *var, u64 val)
        * be32_add_cpu(__be32 *var, u32 val)
      - add for completeness:
        * be16_add_cpu(__be16 *var, u16 val)
        * be64_add_cpu(__be64 *var, u64 val)
      Signed-off-by: Marcin Slusarz <marcin.slusarz@gmail.com>
      Acked-by: Mark Fasheh <mark.fasheh@oracle.com>
      Cc: David Chinner <dgc@sgi.com>
      Cc: Timothy Shimmin <tes@sgi.com>
      Cc: <linux-ext4@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
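For readers outside the kernel, the pattern these helpers replace can be shown with a userspace sketch. The `demo_le32` type and the `demo_*` functions are stand-ins written for this example (the kernel uses `__le32` with `le32_to_cpu()` and `cpu_to_le32()`); the value is stored as explicit bytes so the sketch behaves the same on any host byte order.

```c
#include <stdint.h>

/* Stand-in for the kernel's __le32: four bytes, least significant first. */
typedef struct {
    uint8_t b[4];
} demo_le32;

/* Little-endian storage to native integer. */
static uint32_t demo_le32_to_cpu(demo_le32 v)
{
    return (uint32_t)v.b[0] |
           ((uint32_t)v.b[1] << 8) |
           ((uint32_t)v.b[2] << 16) |
           ((uint32_t)v.b[3] << 24);
}

/* Native integer to little-endian storage. */
static demo_le32 demo_cpu_to_le32(uint32_t v)
{
    demo_le32 r;

    r.b[0] = (uint8_t)(v & 0xff);
    r.b[1] = (uint8_t)((v >> 8) & 0xff);
    r.b[2] = (uint8_t)((v >> 16) & 0xff);
    r.b[3] = (uint8_t)((v >> 24) & 0xff);
    return r;
}

/* Add a native-order value to a little-endian variable in place:
 * convert to native, add, convert back. */
static void demo_le32_add_cpu(demo_le32 *var, uint32_t val)
{
    *var = demo_cpu_to_le32(demo_le32_to_cpu(*var) + val);
}
```

The helper turns an open-coded `x = cpu_to_le32(le32_to_cpu(x) + n)` into a single call, which is the kind of expression the grep above is meant to find.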
  12. 07 Feb, 2008 1 commit
    • ocfs2: Negotiate locking protocol versions. · d24fbcda
      Joel Becker authored
      
      Currently, when ocfs2 nodes connect via TCP, they advertise their
      compatibility level.  If the versions do not match, two nodes cannot speak
      to each other and they disconnect. As a result, this provides no forward or
      backwards compatibility.
      
      This patch implements a simple protocol negotiation at the dlm level by
      introducing a major/minor version number scheme for entities that
      communicate.  Specifically, o2dlm has a major/minor version for interaction
      with o2dlm on other nodes, and ocfs2 itself has a major/minor version for
      interacting with the filesystem on other nodes.
      
      This will allow rolling upgrades of ocfs2 clusters when changes to the
      locking or network protocols can be done in a backwards compatible manner.
      In those cases, only the minor number is changed and the negotiated protocol
      minor is returned from dlm join. In the far less likely event that a
      required protocol change makes backwards compatibility impossible, we simply
      bump the major number.
      Signed-off-by: Joel Becker <joel.becker@oracle.com>
      Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
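A major/minor negotiation of this shape could be sketched as follows. The struct and function names are hypothetical, and settling on the lower of the two minor numbers is an assumption of this sketch; the message only says that a negotiated minor is returned from the dlm join.

```c
#include <stdint.h>

/* Hypothetical protocol version pair. */
struct demo_proto_version {
    uint8_t major;
    uint8_t minor;
};

/* Negotiate between the local and remote versions. A major mismatch is
 * incompatible and returns -1. Otherwise both sides speak the lower
 * minor (an assumption of this sketch) and 0 is returned. */
static int demo_negotiate(const struct demo_proto_version *local,
                          const struct demo_proto_version *remote,
                          struct demo_proto_version *agreed)
{
    if (local->major != remote->major)
        return -1;

    agreed->major = local->major;
    agreed->minor = local->minor < remote->minor ? local->minor
                                                 : remote->minor;
    return 0;
}
```

Under this scheme a node running 1.3 can join a cluster running 1.1 during a rolling upgrade, and both sides fall back to the 1.1 behaviour until every node has been upgraded.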
  13. 25 Jan, 2008 4 commits
    • ocfs2: document access rules for blocked_lock_list · 7ec373cf
      Mark Fasheh authored
      
      ocfs2_super->blocked_lock_list and ocfs2_super->blocked_lock_count have some
      usage restrictions which aren't immediately obvious to anyone reading the
      code. It's a good idea to document this so that we avoid making costly
      mistakes in the future.
      Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
    • [PATCH 2/2] ocfs2: cluster aware flock() · 53fc622b
      Mark Fasheh authored
      
      Hook up ocfs2_flock(), using the new flock lock type in dlmglue.c. A new
      mount option, "localflocks" is added so that users can revert to old
      functionality as need be.
      Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
    • [PATCH 1/2] ocfs2: add flock lock type · cf8e06f1
      Mark Fasheh authored
      
      This adds a new dlmglue lock type which is intended to back flock()
      requests.
      
      Since these locks are driven from userspace, usage rules are much more
      liberal than the typical Ocfs2 internal cluster lock. As a result, we can't
      make use of most dlmglue features - lock caching and lock level
      optimizations in particular. Additionally, userspace is free to deadlock
      itself, so we have to deal with that in the same way as the rest of the
      kernel - by allowing a signal to abort a lock request.
      
      In order to keep ocfs2_cluster_lock() complexity down, ocfs2_file_lock()
      does its own dlm coordination. We still use the same helper functions
      though, so duplicated code is kept to a minimum.
      Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
    • ocfs2: Local alloc window size changeable via mount option · 2fbe8d1e
      Sunil Mushran authored
      
      Local alloc is a performance optimization in ocfs2 in which a node
      takes a window of bits from the global bitmap and then uses that for
      all small local allocations. This window size is fixed to 8MB currently.
      This patch allows users to specify the window size in MB including
      disabling it by passing in 0. If the number specified is too large,
      the fs will use the default value of 8MB.
      
      mount -o localalloc=X /dev/sdX /mntpoint
      Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
      Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
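The option handling described above can be sketched in userspace C. The function name, the `max_mb` parameter, and the error convention are assumptions made for this example; the message does not say how "too large" is decided, so the limit is passed in by the caller here.

```c
#include <stdlib.h>
#include <string.h>

#define DEMO_DEFAULT_LA_MB 8u /* default window size from the message */

/* Parse a single "localalloc=X" option, X in megabytes.
 * 0 disables local alloc; a value above max_mb (a hypothetical limit
 * supplied by the caller) falls back to the 8MB default.
 * Returns 0 on success, -1 if the option is malformed. */
static int demo_parse_localalloc(const char *opt, unsigned int max_mb,
                                 unsigned int *out_mb)
{
    static const char prefix[] = "localalloc=";
    const size_t plen = sizeof(prefix) - 1;
    unsigned long mb;
    char *end;

    if (strncmp(opt, prefix, plen) != 0 || opt[plen] == '\0')
        return -1;

    mb = strtoul(opt + plen, &end, 10);
    if (*end != '\0')
        return -1; /* trailing garbage after the number */

    *out_mb = mb > max_mb ? DEMO_DEFAULT_LA_MB : (unsigned int)mb;
    return 0;
}
```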