An error occurred fetching the project authors.
  1. 13 Oct, 2016 1 commit
  2. 30 Sep, 2016 2 commits
  3. 06 Sep, 2016 2 commits
    • Dmitry Monakhov's avatar
      ext4: improve ext4lazyinit scalability · e22834f0
      Dmitry Monakhov authored
      ext4lazyinit is a global thread. This thread performs itable
      initalization under li_list_mtx mutex.
      
      It basically does the following:
      ext4_lazyinit_thread
        ->mutex_lock(&eli->li_list_mtx);
        ->ext4_run_li_request(elr)
          ->ext4_init_inode_table-> Do a lot of IO if the list is large
      
      And when new mount/umount arrive they have to block on ->li_list_mtx
      because  lazy_thread holds it during full walk procedure.
      ext4_fill_super
       ->ext4_register_li_request
         ->mutex_lock(&ext4_li_info->li_list_mtx);
         ->list_add(&elr->lr_request, &ext4_li_info >li_request_list);
      In my case mount takes 40minutes on server with 36 * 4Tb HDD.
      Common user may face this in case of very slow dev ( /dev/mmcblkXXX)
      Even more. If one of filesystems was frozen lazyinit_thread will simply
      block on sb_start_write() so other mount/umount will be stuck forever.
      
      This patch changes logic like follows:
      - grab ->s_umount read sem before processing new li_request.
        After that it is safe to drop li_list_mtx because all callers of
        li_remove_request are holding ->s_umount for write.
      - li_thread skips frozen SB's
      
      Locking order:
      Mh KOrder is asserted by umount path like follows: s_umount ->li_list_mtx so
      the only way to to grab ->s_mount inside li_thread is via down_read_trylock
      
      xfstests:ext4/023
      #PSBM-49658
      Signed-off-by: default avatarDmitry Monakhov <dmonakhov@openvz.org>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      e22834f0
    • Jan Kara's avatar
      ext4: enable quota enforcement based on mount options · 49da9392
      Jan Kara authored
      When quota information is stored in quota files, we enable only quota
      accounting on mount and enforcement is enabled only in response to
      Q_QUOTAON quotactl. To make ext4 behavior consistent with XFS, we add a
      possibility to enable quota enforcement on mount by specifying
      corresponding quota mount option (usrquota, grpquota, prjquota).
      Signed-off-by: default avatarJan Kara <jack@suse.cz>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      49da9392
  4. 01 Aug, 2016 1 commit
  5. 15 Jul, 2016 1 commit
    • Vegard Nossum's avatar
      ext4: short-cut orphan cleanup on error · c65d5c6c
      Vegard Nossum authored
      If we encounter a filesystem error during orphan cleanup, we should stop.
      Otherwise, we may end up in an infinite loop where the same inode is
      processed again and again.
      
          EXT4-fs (loop0): warning: checktime reached, running e2fsck is recommended
          EXT4-fs error (device loop0): ext4_mb_generate_buddy:758: group 2, block bitmap and bg descriptor inconsistent: 6117 vs 0 free clusters
          Aborting journal on device loop0-8.
          EXT4-fs (loop0): Remounting filesystem read-only
          EXT4-fs error (device loop0) in ext4_free_blocks:4895: Journal has aborted
          EXT4-fs error (device loop0) in ext4_do_update_inode:4893: Journal has aborted
          EXT4-fs error (device loop0) in ext4_do_update_inode:4893: Journal has aborted
          EXT4-fs error (device loop0) in ext4_ext_remove_space:3068: IO failure
          EXT4-fs error (device loop0) in ext4_ext_truncate:4667: Journal has aborted
          EXT4-fs error (device loop0) in ext4_orphan_del:2927: Journal has aborted
          EXT4-fs error (device loop0) in ext4_do_update_inode:4893: Journal has aborted
          EXT4-fs (loop0): Inode 16 (00000000618192a0): orphan list check failed!
          [...]
          EXT4-fs (loop0): Inode 16 (0000000061819748): orphan list check failed!
          [...]
          EXT4-fs (loop0): Inode 16 (0000000061819bf0): orphan list check failed!
          [...]
      
      See-also: c9eb13a9 ("ext4: fix hang when processing corrupted orphaned inode list")
      Cc: Jan Kara <jack@suse.cz>
      Signed-off-by: default avatarVegard Nossum <vegard.nossum@oracle.com>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      Cc: stable@vger.kernel.org
      c65d5c6c
  6. 10 Jul, 2016 1 commit
  7. 06 Jul, 2016 1 commit
    • Theodore Ts'o's avatar
      ext4: validate s_reserved_gdt_blocks on mount · 5b9554dc
      Theodore Ts'o authored
      If s_reserved_gdt_blocks is extremely large, it's possible for
      ext4_init_block_bitmap(), which is called when ext4 sets up an
      uninitialized block bitmap, to corrupt random kernel memory.  Add the
      same checks which e2fsck has --- it must never be larger than
      blocksize / sizeof(__u32) --- and then add a backup check in
      ext4_init_block_bitmap() in case the superblock gets modified after
      the file system is mounted.
      Reported-by: default avatarVegard Nossum <vegard.nossum@oracle.com>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      Cc: stable@vger.kernel.org
      5b9554dc
  8. 04 Jul, 2016 1 commit
    • Pranay Kr. Srivastava's avatar
      ext4: Fix WARN_ON_ONCE in ext4_commit_super() · 4743f839
      Pranay Kr. Srivastava authored
      If there are racing calls to ext4_commit_super() it's possible for
      another writeback of the superblock to result in the buffer being
      marked with an error after we check if the buffer is marked as having
      a write error and the buffer up-to-date flag is set again.  If that
      happens mark_buffer_dirty() can end up throwing a WARN_ON_ONCE.
      
      Fix this by moving this check to write before we call
      write_buffer_dirty(), and keeping the buffer locked during this whole
      sequence.
      Signed-off-by: default avatarPranay Kr. Srivastava <pranjas@gmail.com>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      4743f839
  9. 03 Jul, 2016 1 commit
  10. 07 Jun, 2016 1 commit
  11. 17 May, 2016 1 commit
  12. 26 Apr, 2016 1 commit
  13. 04 Apr, 2016 1 commit
    • Kirill A. Shutemov's avatar
      mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros · 09cbfeaf
      Kirill A. Shutemov authored
      PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced *long* time
      ago with promise that one day it will be possible to implement page
      cache with bigger chunks than PAGE_SIZE.
      
      This promise never materialized.  And unlikely will.
      
      We have many places where PAGE_CACHE_SIZE assumed to be equal to
      PAGE_SIZE.  And it's constant source of confusion on whether
      PAGE_CACHE_* or PAGE_* constant should be used in a particular case,
      especially on the border between fs and mm.
      
      Global switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause to much
      breakage to be doable.
      
      Let's stop pretending that pages in page cache are special.  They are
      not.
      
      The changes are pretty straight-forward:
      
       - <foo> << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
      
       - <foo> >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
      
       - PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN};
      
       - page_cache_get() -> get_page();
      
       - page_cache_release() -> put_page();
      
      This patch contains automated changes generated with coccinelle using
      script below.  For some reason, coccinelle doesn't patch header files.
      I've called spatch for them manually.
      
      The only adjustment after coccinelle is revert of changes to
      PAGE_CAHCE_ALIGN definition: we are going to drop it later.
      
      There are few places in the code where coccinelle didn't reach.  I'll
      fix them manually in a separate patch.  Comments and documentation also
      will be addressed with the separate patch.
      
      virtual patch
      
      @@
      expression E;
      @@
      - E << (PAGE_CACHE_SHIFT - PAGE_SHIFT)
      + E
      
      @@
      expression E;
      @@
      - E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT)
      + E
      
      @@
      @@
      - PAGE_CACHE_SHIFT
      + PAGE_SHIFT
      
      @@
      @@
      - PAGE_CACHE_SIZE
      + PAGE_SIZE
      
      @@
      @@
      - PAGE_CACHE_MASK
      + PAGE_MASK
      
      @@
      expression E;
      @@
      - PAGE_CACHE_ALIGN(E)
      + PAGE_ALIGN(E)
      
      @@
      expression E;
      @@
      - page_cache_get(E)
      + get_page(E)
      
      @@
      expression E;
      @@
      - page_cache_release(E)
      + put_page(E)
      Signed-off-by: default avatarKirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Acked-by: default avatarMichal Hocko <mhocko@suse.com>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      09cbfeaf
  14. 03 Apr, 2016 1 commit
    • Theodore Ts'o's avatar
      ext4: ignore quota mount options if the quota feature is enabled · c325a67c
      Theodore Ts'o authored
      Previously, ext4 would fail the mount if the file system had the quota
      feature enabled and quota mount options (used for the older quota
      setups) were present.  This broke xfstests, since xfs silently ignores
      the usrquote and grpquota mount options if they are specified.  This
      commit changes things so that we are consistent with xfs; having the
      mount options specified is harmless, so no sense break users by
      forbidding them.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      c325a67c
  15. 01 Apr, 2016 2 commits
    • Theodore Ts'o's avatar
      ext4: avoid calling dquot_get_next_id() if quota is not enabled · 8f0e8746
      Theodore Ts'o authored
      This should be fixed in the quota layer so we can test with the quota
      mutex held, but for now, we need this to avoid tests from crashing the
      kernel aborting the regression test suite.
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      8f0e8746
    • Theodore Ts'o's avatar
      ext4: add lockdep annotations for i_data_sem · daf647d2
      Theodore Ts'o authored
      With the internal Quota feature, mke2fs creates empty quota inodes and
      quota usage tracking is enabled as soon as the file system is mounted.
      Since quotacheck is no longer preallocating all of the blocks in the
      quota inode that are likely needed to be written to, we are now seeing
      a lockdep false positive caused by needing to allocate a quota block
      from inside ext4_map_blocks(), while holding i_data_sem for a data
      inode.  This results in this complaint:
      
        Possible unsafe locking scenario:
      
              CPU0                    CPU1
              ----                    ----
         lock(&ei->i_data_sem);
                                      lock(&s->s_dquot.dqio_mutex);
                                      lock(&ei->i_data_sem);
         lock(&s->s_dquot.dqio_mutex);
      
      Google-Bug-Id: 27907753
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      Cc: stable@vger.kernel.org
      daf647d2
  16. 13 Mar, 2016 1 commit
  17. 09 Mar, 2016 2 commits
    • Jan Kara's avatar
      ext4: remove i_ioend_count · 600be30a
      Jan Kara authored
      Remove counter of pending io ends as it is unused.
      Signed-off-by: default avatarJan Kara <jack@suse.cz>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      600be30a
    • Jan Kara's avatar
      ext4: use i_mutex to serialize unaligned AIO DIO · e142d052
      Jan Kara authored
      Currently we've used hashed aio_mutex to serialize unaligned AIO DIO.
      However the code cleanups that happened after 2011 when the lock was
      introduced made aio_mutex acquired at almost the same places where we
      already have exclusion using i_mutex. So just use i_mutex for the
      exclusion of unaligned AIO DIO.
      
      The change moves waiting for pending unwritten extent conversion under
      i_mutex. That makes special handling of O_APPEND writes unnecessary and
      also avoids possible livelocking of unaligned AIO DIO with aligned one
      (nothing was preventing contiguous stream of aligned AIO DIOs to let
      unaligned AIO DIO wait forever).
      Signed-off-by: default avatarJan Kara <jack@suse.cz>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      e142d052
  18. 23 Feb, 2016 1 commit
  19. 22 Feb, 2016 1 commit
    • Jan Kara's avatar
      ext4: convert to mbcache2 · 82939d79
      Jan Kara authored
      The conversion is generally straightforward. The only tricky part is
      that xattr block corresponding to found mbcache entry can get freed
      before we get buffer lock for that block. So we have to check whether
      the entry is still valid after getting buffer lock.
      Signed-off-by: default avatarJan Kara <jack@suse.cz>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      82939d79
  20. 19 Feb, 2016 1 commit
  21. 09 Feb, 2016 1 commit
  22. 22 Jan, 2016 1 commit
    • Al Viro's avatar
      wrappers for ->i_mutex access · 5955102c
      Al Viro authored
      parallel to mutex_{lock,unlock,trylock,is_locked,lock_nested},
      inode_foo(inode) being mutex_foo(&inode->i_mutex).
      
      Please, use those for access to ->i_mutex; over the coming cycle
      ->i_mutex will become rwsem, with ->lookup() done with it held
      only shared.
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      5955102c
  23. 15 Jan, 2016 1 commit
    • Vladimir Davydov's avatar
      kmemcg: account certain kmem allocations to memcg · 5d097056
      Vladimir Davydov authored
      Mark those kmem allocations that are known to be easily triggered from
      userspace as __GFP_ACCOUNT/SLAB_ACCOUNT, which makes them accounted to
      memcg.  For the list, see below:
      
       - threadinfo
       - task_struct
       - task_delay_info
       - pid
       - cred
       - mm_struct
       - vm_area_struct and vm_region (nommu)
       - anon_vma and anon_vma_chain
       - signal_struct
       - sighand_struct
       - fs_struct
       - files_struct
       - fdtable and fdtable->full_fds_bits
       - dentry and external_name
       - inode for all filesystems. This is the most tedious part, because
         most filesystems overwrite the alloc_inode method.
      
      The list is far from complete, so feel free to add more objects.
      Nevertheless, it should be close to "account everything" approach and
      keep most workloads within bounds.  Malevolent users will be able to
      breach the limit, but this was possible even with the former "account
      everything" approach (simply because it did not account everything in
      fact).
      
      [akpm@linux-foundation.org: coding-style fixes]
      Signed-off-by: default avatarVladimir Davydov <vdavydov@virtuozzo.com>
      Acked-by: default avatarJohannes Weiner <hannes@cmpxchg.org>
      Acked-by: default avatarMichal Hocko <mhocko@suse.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Greg Thelen <gthelen@google.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      5d097056
  24. 08 Jan, 2016 2 commits
  25. 07 Dec, 2015 2 commits
    • Jan Kara's avatar
      ext4: document lock ordering · e74031fd
      Jan Kara authored
      We have enough locks that it's probably worth documenting the lock
      ordering rules we have in ext4.
      Signed-off-by: default avatarJan Kara <jack@suse.com>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      e74031fd
    • Jan Kara's avatar
      ext4: fix races between page faults and hole punching · ea3d7209
      Jan Kara authored
      Currently, page faults and hole punching are completely unsynchronized.
      This can result in page fault faulting in a page into a range that we
      are punching after truncate_pagecache_range() has been called and thus
      we can end up with a page mapped to disk blocks that will be shortly
      freed. Filesystem corruption will shortly follow. Note that the same
      race is avoided for truncate by checking page fault offset against
      i_size but there isn't similar mechanism available for punching holes.
      
      Fix the problem by creating new rw semaphore i_mmap_sem in inode and
      grab it for writing over truncate, hole punching, and other functions
      removing blocks from extent tree and for read over page faults. We
      cannot easily use i_data_sem for this since that ranks below transaction
      start and we need something ranking above it so that it can be held over
      the whole truncate / hole punching operation. Also remove various
      workarounds we had in the code to reduce race window when page fault
      could have created pages with stale mapping information.
      Signed-off-by: default avatarJan Kara <jack@suse.com>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      ea3d7209
  26. 16 Nov, 2015 1 commit
    • Dan Williams's avatar
      ext2, ext4: warn when mounting with dax enabled · ef83b6e8
      Dan Williams authored
      Similar to XFS warn when mounting DAX while it is still considered under
      development.  Also, aspects of the DAX implementation, for example
      synchronization against multiple faults and faults causing block
      allocation, depend on the correct implementation in the filesystem.  The
      maturity of a given DAX implementation is filesystem specific.
      
      Cc: <stable@vger.kernel.org>
      Cc: "Theodore Ts'o" <tytso@mit.edu>
      Cc: Matthew Wilcox <willy@linux.intel.com>
      Cc: linux-ext4@vger.kernel.org
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Reported-by: default avatarDave Chinner <david@fromorbit.com>
      Acked-by: default avatarJan Kara <jack@suse.com>
      Signed-off-by: default avatarDan Williams <dan.j.williams@intel.com>
      ef83b6e8
  27. 07 Nov, 2015 1 commit
    • Mel Gorman's avatar
      mm, page_alloc: distinguish between being unable to sleep, unwilling to sleep... · d0164adc
      Mel Gorman authored
      mm, page_alloc: distinguish between being unable to sleep, unwilling to sleep and avoiding waking kswapd
      
      __GFP_WAIT has been used to identify atomic context in callers that hold
      spinlocks or are in interrupts.  They are expected to be high priority and
      have access one of two watermarks lower than "min" which can be referred
      to as the "atomic reserve".  __GFP_HIGH users get access to the first
      lower watermark and can be called the "high priority reserve".
      
      Over time, callers had a requirement to not block when fallback options
      were available.  Some have abused __GFP_WAIT leading to a situation where
      an optimisitic allocation with a fallback option can access atomic
      reserves.
      
      This patch uses __GFP_ATOMIC to identify callers that are truely atomic,
      cannot sleep and have no alternative.  High priority users continue to use
      __GFP_HIGH.  __GFP_DIRECT_RECLAIM identifies callers that can sleep and
      are willing to enter direct reclaim.  __GFP_KSWAPD_RECLAIM to identify
      callers that want to wake kswapd for background reclaim.  __GFP_WAIT is
      redefined as a caller that is willing to enter direct reclaim and wake
      kswapd for background reclaim.
      
      This patch then converts a number of sites
      
      o __GFP_ATOMIC is used by callers that are high priority and have memory
        pools for those requests. GFP_ATOMIC uses this flag.
      
      o Callers that have a limited mempool to guarantee forward progress clear
        __GFP_DIRECT_RECLAIM but keep __GFP_KSWAPD_RECLAIM. bio allocations fall
        into this category where kswapd will still be woken but atomic reserves
        are not used as there is a one-entry mempool to guarantee progress.
      
      o Callers that are checking if they are non-blocking should use the
        helper gfpflags_allow_blocking() where possible. This is because
        checking for __GFP_WAIT as was done historically now can trigger false
        positives. Some exceptions like dm-crypt.c exist where the code intent
        is clearer if __GFP_DIRECT_RECLAIM is used instead of the helper due to
        flag manipulations.
      
      o Callers that built their own GFP flags instead of starting with GFP_KERNEL
        and friends now also need to specify __GFP_KSWAPD_RECLAIM.
      
      The first key hazard to watch out for is callers that removed __GFP_WAIT
      and was depending on access to atomic reserves for inconspicuous reasons.
      In some cases it may be appropriate for them to use __GFP_HIGH.
      
      The second key hazard is callers that assembled their own combination of
      GFP flags instead of starting with something like GFP_KERNEL.  They may
      now wish to specify __GFP_KSWAPD_RECLAIM.  It's almost certainly harmless
      if it's missed in most cases as other activity will wake kswapd.
      Signed-off-by: default avatarMel Gorman <mgorman@techsingularity.net>
      Acked-by: default avatarVlastimil Babka <vbabka@suse.cz>
      Acked-by: default avatarMichal Hocko <mhocko@suse.com>
      Acked-by: default avatarJohannes Weiner <hannes@cmpxchg.org>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Vitaly Wool <vitalywool@gmail.com>
      Cc: Rik van Riel <riel@redhat.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      d0164adc
  28. 19 Oct, 2015 2 commits
  29. 18 Oct, 2015 1 commit
    • Daeho Jeong's avatar
      ext4, jbd2: ensure entering into panic after recording an error in superblock · 4327ba52
      Daeho Jeong authored
      If a EXT4 filesystem utilizes JBD2 journaling and an error occurs, the
      journaling will be aborted first and the error number will be recorded
      into JBD2 superblock and, finally, the system will enter into the
      panic state in "errors=panic" option.  But, in the rare case, this
      sequence is little twisted like the below figure and it will happen
      that the system enters into panic state, which means the system reset
      in mobile environment, before completion of recording an error in the
      journal superblock. In this case, e2fsck cannot recognize that the
      filesystem failure occurred in the previous run and the corruption
      wouldn't be fixed.
      
      Task A                        Task B
      ext4_handle_error()
      -> jbd2_journal_abort()
        -> __journal_abort_soft()
          -> __jbd2_journal_abort_hard()
          | -> journal->j_flags |= JBD2_ABORT;
          |
          |                         __ext4_abort()
          |                         -> jbd2_journal_abort()
          |                         | -> __journal_abort_soft()
          |                         |   -> if (journal->j_flags & JBD2_ABORT)
          |                         |           return;
          |                         -> panic()
          |
          -> jbd2_journal_update_sb_errno()
      Tested-by: default avatarHobin Woo <hobin.woo@samsung.com>
      Signed-off-by: default avatarDaeho Jeong <daeho.jeong@samsung.com>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      Cc: stable@vger.kernel.org
      4327ba52
  30. 17 Oct, 2015 3 commits
  31. 23 Sep, 2015 1 commit