1. 09 Jul, 2011 6 commits
    • memcg: fix reclaimable lru check in memcg · 4d0c066d
      KAMEZAWA Hiroyuki authored
      Currently, mem_cgroup_hierarchical_reclaim() uses
      mem_cgroup_local_usage() to check whether a memcg contains reclaimable
      pages.  If it contains none, the routine skips that memcg.

      However, mem_cgroup_local_usage() counts Unevictable pages and cannot
      handle the "noswap" condition correctly, so the check does not work on
      a swapless system.

      This patch adds test_mem_cgroup_reclaimable() and uses it in place of
      mem_cgroup_local_usage().  test_mem_cgroup_reclaimable() looks at the
      LRU counters and returns the correct answer to the caller.  The new
      function also takes a "noswap" argument so that it can look at only the
      FILE LRUs when necessary.
      
      [akpm@linux-foundation.org: coding-style fixes]
      [akpm@linux-foundation.org: fix kerneldoc layout]
      Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Ying Han <yinghan@google.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
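The check described in the memcg commit above can be modelled in plain userspace C. This is only a sketch: the names `lru_counts` and `group_reclaimable` are hypothetical, and the counters are passed in directly, whereas the kernel function reads per-memcg LRU statistics. It assumes only what the commit message states: Unevictable pages never count as reclaimable, and with `noswap` set only the FILE LRUs are considered.

```c
#include <stdbool.h>

/* Hypothetical per-group LRU page counts; not the kernel's data layout. */
struct lru_counts {
    unsigned long inactive_anon;
    unsigned long active_anon;
    unsigned long inactive_file;
    unsigned long active_file;
    unsigned long unevictable;   /* deliberately never consulted below */
};

/*
 * Return true if the group has pages that reclaim could make progress on.
 * File pages are always candidates. Anonymous pages are candidates only
 * when swapping is allowed (noswap == false).
 */
bool group_reclaimable(const struct lru_counts *c, bool noswap)
{
    if (c->inactive_file + c->active_file > 0)
        return true;
    if (noswap)
        return false;
    return c->inactive_anon + c->active_anon > 0;
}
```

A group holding only unevictable pages is reported as not reclaimable, which is the case a plain usage counter gets wrong.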
    • mm: __tlb_remove_page() check the correct batch · 0b43c3aa
      Shaohua Li authored
      __tlb_remove_page() switches to a new batch page, but still checks space
      in the old batch.  This check always fails, and causes a forced tlb flush.
      Signed-off-by: Shaohua Li <shaohua.li@intel.com>
      Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
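The stale-pointer pattern behind the __tlb_remove_page() fix can be shown with a small userspace model. Everything here is an assumption made for illustration: the `batch` and `gather` structs, the tiny `BATCH_MAX`, and the pre-chained next batch (this model does not allocate one). The point is the marked line: after switching batches, the remaining space must be computed from the new active batch, not from the pointer taken before the switch.

```c
#include <stddef.h>

#define BATCH_MAX 4   /* tiny capacity, chosen only for illustration */

struct batch {
    struct batch *next;
    unsigned int nr;              /* slots used */
    unsigned int max;             /* slots available, <= BATCH_MAX */
    void *pages[BATCH_MAX];
};

struct gather {
    struct batch *active;
};

/* Switch to the next chained batch; returns 1 on success, 0 if none. */
int next_batch(struct gather *g)
{
    if (!g->active->next)
        return 0;
    g->active = g->active->next;
    return 1;
}

/*
 * Queue a page and return the number of free slots left in the active
 * batch; 0 tells the caller it must flush. Assumes the active batch has
 * room for at least one page on entry.
 */
unsigned int remove_page(struct gather *g, void *page)
{
    struct batch *b = g->active;

    b->pages[b->nr++] = page;
    if (b->nr == b->max) {
        if (!next_batch(g))
            return 0;
        b = g->active;   /* re-read: without this, b is the old, full batch */
    }
    return b->max - b->nr;
}
```

Without the re-read, the return value after a successful switch would be computed from the full batch and would always be 0, forcing a flush even though the new batch is empty.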
    • mm: vmscan: only read new_classzone_idx from pgdat when reclaiming successfully · 215ddd66
      Mel Gorman authored
      During allocator-intensive workloads, kswapd will be woken frequently
      causing free memory to oscillate between the high and min watermark.  This
      is expected behaviour.  Unfortunately, if the highest zone is small, a
      problem occurs.
      
      When balance_pgdat() returns, it may be at a lower classzone_idx than it
      started because the highest zone was unreclaimable.  Before checking
      whether it should go to sleep, though, kswapd checks
      pgdat->classzone_idx, which will be MAX_NR_ZONES-1 when there is no
      other activity.  It interprets this as having been woken up while
      reclaiming, skips scheduling and reclaims again.  As there is no useful
      reclaim work to do, it enters a loop of shrinking slab, consuming loads
      of CPU until the highest zone becomes reclaimable for a long period of
      time.

      There are two problems here.  1) If the returned classzone or order is
      lower, it'll continue reclaiming without scheduling.  2) If the highest
      zone was marked unreclaimable but balance_pgdat() returns immediately at
      DEF_PRIORITY, the new lower classzone is not communicated back to kswapd()
      for sleeping.

      This patch does two related things.  First, if the end_zone is
      unreclaimable, this information is communicated back.  Second, if the
      classzone or order was reduced due to failing to reclaim, new information
      is not read from pgdat and instead an attempt is made to go to sleep.  Due
      to this, it is also necessary that pgdat->classzone_idx be initialised
      each time to pgdat->nr_zones - 1 to avoid re-reads being interpreted as
      wakeups.
      Signed-off-by: Mel Gorman <mgorman@suse.de>
      Reported-by: Pádraig Brady <P@draigBrady.com>
      Tested-by: Pádraig Brady <P@draigBrady.com>
      Tested-by: Andrew Lutomirski <luto@mit.edu>
      Acked-by: Rik van Riel <riel@redhat.com>
      Cc: Minchan Kim <minchan.kim@gmail.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: <stable@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: vmscan: evaluate the watermarks against the correct classzone · da175d06
      Mel Gorman authored
      When deciding if kswapd is sleeping prematurely, the classzone is taken
      into account but this is different to what balance_pgdat() and the
      allocator are doing.  Specifically, the DMA zone will be checked based on
      the classzone used when waking kswapd, which could be for a GFP_KERNEL or
      GFP_HIGHMEM request.  The lowmem reserve limit kicks in, the watermark is
      not met, and kswapd thinks it's sleeping prematurely, which keeps kswapd
      awake in error.
      Signed-off-by: Mel Gorman <mgorman@suse.de>
      Reported-by: Pádraig Brady <P@draigBrady.com>
      Tested-by: Pádraig Brady <P@draigBrady.com>
      Tested-by: Andrew Lutomirski <luto@mit.edu>
      Acked-by: Rik van Riel <riel@redhat.com>
      Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: <stable@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
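The effect of the lowmem reserve on the watermark check in the commit above can be illustrated with a simplified, order-0-only model. The struct, the function name and the page counts are all made up for this sketch, and it assumes the usual form of the test: a zone passes only if its free pages exceed the watermark plus the reserve held back for the classzone it is checked against. Reading "the correct classzone" as the zone's own index is also an assumption here.

```c
#include <stdbool.h>

#define NR_ZONES 3   /* e.g. DMA, Normal, HighMem in this toy model */

/* Hypothetical zone summary; not the kernel's struct zone. */
struct zone_model {
    unsigned long free_pages;
    unsigned long high_wmark;
    /* pages held back from allocations that target a higher zone */
    unsigned long lowmem_reserve[NR_ZONES];
};

/* Order-0 watermark test against a given classzone index. */
bool watermark_ok(const struct zone_model *z, int classzone_idx)
{
    return z->free_pages > z->high_wmark + z->lowmem_reserve[classzone_idx];
}
```

With 1000 free pages, a high watermark of 500 and reserves of {0, 800, 1600}, the zone passes when checked against index 0 but fails against index 1 or 2. Under these assumptions, a low zone that is balanced for its own allocations can still look unbalanced when evaluated against the higher classzone that woke kswapd.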
    • mm: vmscan: do not apply pressure to slab if we are not applying pressure to zone · d7868dae
      Mel Gorman authored
      During allocator-intensive workloads, kswapd will be woken frequently
      causing free memory to oscillate between the high and min watermark.  This
      is expected behaviour.
      
      When kswapd applies pressure to zones during node balancing, it checks if
      the zone is above a high+balance_gap threshold.  If it is, it does not
      apply pressure but it unconditionally shrinks slab on a global basis which
      is excessive.  In the event kswapd is being kept awake due to a high small
      unreclaimable zone, it skips zone shrinking but still calls shrink_slab().
      
      Once pressure has been applied, the check for whether the zone is
      unreclaimable is made before the check for whether all_unreclaimable
      should be set.  Missing the unreclaimable state in this way can cause
      has_under_min_watermark_zone to be set because of an unreclaimable
      zone, preventing kswapd from backing off on congestion_wait().
      Signed-off-by: Mel Gorman <mgorman@suse.de>
      Reported-by: Pádraig Brady <P@draigBrady.com>
      Tested-by: Pádraig Brady <P@draigBrady.com>
      Tested-by: Andrew Lutomirski <luto@mit.edu>
      Acked-by: Rik van Riel <riel@redhat.com>
      Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
      Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: <stable@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
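The first change described in the commit above, tying slab shrinking to zone shrinking, can be sketched as follows. This is a toy model with hypothetical names (`zone_levels`, `pressure_stats`, `balance_zone`); it counts the calls that would be made instead of performing any reclaim, and it does not model the all_unreclaimable ordering change.

```c
#include <stdbool.h>

/* Hypothetical zone summary used only by this sketch. */
struct zone_levels {
    unsigned long free_pages;
    unsigned long high_wmark;
    unsigned long balance_gap;
};

/* Counts how often each kind of pressure would have been applied. */
struct pressure_stats {
    unsigned int zone_shrinks;
    unsigned int slab_shrinks;
};

/* A zone needs pressure only while it is at or below high + balance_gap. */
bool zone_needs_pressure(const struct zone_levels *z)
{
    return z->free_pages <= z->high_wmark + z->balance_gap;
}

/*
 * Apply pressure to one zone. Slab is shrunk only together with the zone,
 * so a zone that is skipped causes no slab shrinking either.
 */
void balance_zone(const struct zone_levels *z, struct pressure_stats *s)
{
    if (!zone_needs_pressure(z))
        return;
    s->zone_shrinks++;
    s->slab_shrinks++;
}
```

The behaviour the commit calls excessive corresponds to incrementing `slab_shrinks` before the early return.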
    • mm: vmscan: correct check for kswapd sleeping in sleeping_prematurely · 08951e54
      Mel Gorman authored
      During allocator-intensive workloads, kswapd will be woken frequently
      causing free memory to oscillate between the high and min watermark.  This
      is expected behaviour.  Unfortunately, if the highest zone is small, a
      problem occurs.
      
      This seems to happen most with recent sandybridge laptops but it's
      probably a coincidence as some of these laptops just happen to have a
      small Normal zone.  The reproduction case is almost always copying
      large files, during which kswapd pegs at 100% CPU until the file is
      deleted or the cache is dropped.
      
      The problem is mostly down to sleeping_prematurely() keeping kswapd awake
      when the highest zone is small and unreclaimable and compounded by the
      fact we shrink slabs even when not shrinking zones causing a lot of time
      to be spent in shrinkers and a lot of memory to be reclaimed.
      
      Patch 1 corrects sleeping_prematurely to check the zones matching
      	the classzone_idx instead of all zones.
      
      Patch 2 avoids shrinking slab when we are not shrinking a zone.
      
      Patch 3 notes that sleeping_prematurely is checking lower zones against
      	a high classzone which is not what allocators or balance_pgdat()
      	is doing, leading to an artificial belief that kswapd should be
      	still awake.

      Patch 4 notes that when balance_pgdat() gives up on a high zone, the
      	decision is not communicated to sleeping_prematurely().
      
      This problem affects 2.6.38.8 for certain and is expected to affect 2.6.39
      and 3.0-rc4 as well.  If accepted, they need to go to -stable to be picked
      up by distros and this series is against 3.0-rc4.  I've cc'd people that
      reported similar problems recently to see if they still suffer from the
      problem and if this fixes it.
      
      This patch: correct the check for kswapd sleeping in sleeping_prematurely()
      
      During allocator-intensive workloads, kswapd will be woken frequently
      causing free memory to oscillate between the high and min watermark.  This
      is expected behaviour.
      
      A problem occurs if the highest zone is small.  balance_pgdat() only
      considers unreclaimable zones when priority is DEF_PRIORITY but
      sleeping_prematurely considers all zones.  It's possible for this sequence
      to occur:
      
        1. kswapd wakes up and enters balance_pgdat()
        2. At DEF_PRIORITY, marks highest zone unreclaimable
        3. At DEF_PRIORITY-1, ignores highest zone setting end_zone
        4. At DEF_PRIORITY-1, calls shrink_slab freeing memory from
              highest zone, clearing all_unreclaimable. Highest zone
              is still unbalanced
        5. kswapd returns and calls sleeping_prematurely
        6. sleeping_prematurely looks at *all* zones, not just the ones
           being considered by balance_pgdat. The highest small zone
           has all_unreclaimable cleared but the zone is not
           balanced. all_zones_ok is false so kswapd stays awake
      
      This patch corrects the behaviour of sleeping_prematurely to check the
      zones balance_pgdat() checked.
      Signed-off-by: Mel Gorman <mgorman@suse.de>
      Reported-by: Pádraig Brady <P@draigBrady.com>
      Tested-by: Pádraig Brady <P@draigBrady.com>
      Tested-by: Andrew Lutomirski <luto@mit.edu>
      Acked-by: Rik van Riel <riel@redhat.com>
      Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
      Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: <stable@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
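The six-step sequence in the commit above ends with sleeping_prematurely inspecting a zone that balance_pgdat() had already stopped considering. A minimal model of the corrected check is sketched below. The names `zone_status` and `zones_ok_up_to` are hypothetical, and the sketch assumes the caller passes the index of the highest zone that balancing actually looked at.

```c
#include <stdbool.h>

/* Hypothetical per-zone summary; not the kernel's struct zone. */
struct zone_status {
    bool populated;   /* zone has pages at all */
    bool balanced;    /* zone currently meets its watermark */
};

/*
 * Return true if every populated zone from index 0 up to and including
 * classzone_idx is balanced. Zones above classzone_idx are ignored, so a
 * small unbalanced top zone that balancing gave up on cannot veto sleep.
 */
bool zones_ok_up_to(const struct zone_status *zones, int classzone_idx)
{
    for (int i = 0; i <= classzone_idx; i++) {
        if (!zones[i].populated)
            continue;
        if (!zones[i].balanced)
            return false;
    }
    return true;
}
```

With three zones where only the top one is unbalanced, the check passes for `classzone_idx` 1 and fails for `classzone_idx` 2, which mirrors the difference between checking the zones balance_pgdat() considered and checking all zones.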
  2. 08 Jul, 2011 5 commits
  3. 07 Jul, 2011 21 commits
  4. 06 Jul, 2011 8 commits
    • xfs: unpin stale inodes directly in IOP_COMMITTED · 1316d4da
      Dave Chinner authored
      When inodes are marked stale in a transaction, they are treated
      specially when the inode log item is being inserted into the AIL.
      It tries to avoid moving the log item forward in the AIL due to a
      race condition with writing the underlying buffer back to disk.
      This was "fixed" in commit de25c181 ("xfs: avoid moving stale inodes
      in the AIL").
      
      To avoid moving the item forward, we return a LSN smaller than the
      commit_lsn of the completing transaction, thereby trying to trick
      the commit code into not moving the inode forward at all. I'm not
      sure this ever worked as intended - it assumes the inode is already
      in the AIL, but I don't think the returned LSN would have been small
      enough to prevent moving the inode. It appears that the reason it
      worked is that the lower LSN of the inodes meant they were inserted
      into the AIL and flushed before the inode buffer (which was moved to
      the commit_lsn of the transaction).
      
      The big problem is that with delayed logging, the returning of the
      different LSN means insertion takes the slow, non-bulk path.  Worse
      yet is that insertion is to a position -before- the commit_lsn so it
      is doing an AIL traversal on every insertion, and has to walk over
      all the items that have already been inserted into the AIL. It's
      expensive.
      
      To compound the matter further, with delayed logging inodes are
      likely to go from clean to stale in a single checkpoint, which means
      they aren't even in the AIL at all when we come across them at AIL
      insertion time. Hence these were all getting inserted into the AIL
      when they simply do not need to be as inodes marked XFS_ISTALE are
      never written back.
      
      Transactional/recovery integrity is maintained in this case by the
      other items in the unlink transaction that were modified (e.g. the
      AGI btree blocks) and committed in the same checkpoint.
      
      So to fix this, simply unpin the stale inodes directly in
      xfs_inode_item_committed() and return -1 to indicate that the AIL
      insertion code does not need to do any further processing of these
      inodes.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Alex Elder <aelder@sgi.com>
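The "return -1 to skip AIL insertion" convention described in the XFS commit above can be sketched with a toy model. All names here (`lsn_t`, `LSN_SKIP`, `log_item`, `item_committed`, `commit_bulk`) are hypothetical stand-ins and none of this is XFS code. The sketch only shows the control flow the message describes: a stale item is unpinned inside the committed callback, which then returns a sentinel so the insertion loop does no further work on it.

```c
#include <stdbool.h>
#include <stddef.h>

typedef long long lsn_t;
#define LSN_SKIP ((lsn_t)-1)   /* sentinel: caller must not insert the item */

/* Hypothetical log item; real log items carry far more state. */
struct log_item {
    bool stale;
    bool pinned;
    bool in_ail;
    lsn_t lsn;
};

/* Committed callback: stale items are unpinned here and then skipped. */
lsn_t item_committed(struct log_item *it, lsn_t commit_lsn)
{
    if (it->stale) {
        it->pinned = false;
        return LSN_SKIP;
    }
    return commit_lsn;
}

/*
 * Insert all non-skipped items at commit_lsn and return how many were
 * inserted. Unpinning of the inserted items is not modelled here.
 */
size_t commit_bulk(struct log_item *items, size_t n, lsn_t commit_lsn)
{
    size_t inserted = 0;

    for (size_t i = 0; i < n; i++) {
        lsn_t lsn = item_committed(&items[i], commit_lsn);

        if (lsn == LSN_SKIP)
            continue;
        items[i].in_ail = true;
        items[i].lsn = lsn;
        inserted++;
    }
    return inserted;
}
```

Because every inserted item shares `commit_lsn`, the loop never has to search for an earlier position, which is the slow path the commit message describes for items that return a different LSN.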
    • Documentation: fix cgroup blkio throttle filenames · 9b61fc4c
      Andrea Righi authored
      All the blkio.throttle.* file names are incorrectly reported without
      ".throttle" in the documentation. Fix it.
      Signed-off-by: Andrea Righi <andrea@betterlinux.com>
      Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
      Acked-by: Vivek Goyal <vgoyal@redhat.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Documentation: update CodingStyle memory allocators · 316b3799
      Jesper Juhl authored
      The list of available general purpose memory allocators in
      Documentation/CodingStyle chapter 14 is incomplete. This patch adds
      the missing vzalloc() to the list.
      Signed-off-by: Jesper Juhl <jj@chaosbits.net>
      Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • MAINTAINERS: move kernel-doc patches location · 0dcb6d73
      Randy Dunlap authored
      Move location of quilt series for kernel-doc patches.
      Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Merge branch 'v4l_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-2.6 · de3796e7
      Linus Torvalds authored
      * 'v4l_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-2.6: (46 commits)
        [media] rc: call input_sync after scancode reports
        [media] imon: allow either proto on unknown 0xffdc
        [media] imon: auto-config ffdc 7e device
        [media] saa7134: fix raw IR timeout value
        [media] rc: fix ghost keypresses with certain hw
        [media] [staging] lirc_serial: allocate irq at init time
        [media] lirc_zilog: fix spinning rx thread
        [media] keymaps: fix table for pinnacle pctv hd devices
        [media] ite-cir: 8709 needs to use pnp resource 2
        [media] V4L: mx1-camera: fix uninitialized variable
        [media] omap_vout: Added check in reqbuf & mmap for buf_size allocation
        [media] OMAP_VOUT: Change hardcoded device node number to -1
        [media] OMAP_VOUTLIB: Fix wrong resizer calculation
        [media] uvcvideo: Disable the queue when failing to start
        [media] uvcvideo: Remove buffers from the queues when freeing
        [media] uvcvideo: Ignore entities for terminals with no supported format
        [media] v4l: Don't access media entity after is has been destroyed
        [media] media: omap3isp: fix a potential NULL deref
        [media] media: vb2: fix allocation failure check
        [media] media: vb2: reset queued_count value during queue reinitialization
        ...
      
      Fix up trivial conflict in MAINTAINERS as per Mauro
    • FDPIC: Fix memory leak · bcb65a79
      Davidlohr Bueso authored
      The shdr4extnum variable isn't being freed in the cleanup process of
      elf_fdpic_core_dump().
      Signed-off-by: Davidlohr Bueso <dave@gnu.org>
      Signed-off-by: David Howells <dhowells@redhat.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • PM / Hibernate: Fix free_unnecessary_pages() · 4d4cf23c
      Rafael J. Wysocki authored
      There is a bug in free_unnecessary_pages() that causes it to
      attempt to free too many pages in some cases, which triggers the
      BUG_ON() in memory_bm_clear_bit() for copy_bm.  Namely, if
      count_data_pages() is initially greater than alloc_normal, we get
      to_free_normal equal to 0 and "save" greater than 0.  In that case,
      if the sum of "save" and count_highmem_pages() is greater than
      alloc_highmem, we subtract a positive number from to_free_normal.
      Hence, since to_free_normal was 0 before the subtraction and is
      an unsigned int, the result is converted to a huge positive number
      that is used as the number of pages to free.
      
      Fix this bug by checking if to_free_normal is actually greater
      than or equal to the number we're going to subtract from it.
      Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
      Reported-and-tested-by: Matthew Garrett <mjg@redhat.com>
      Cc: stable@kernel.org
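The failure mode in the hibernate commit above is ordinary unsigned wraparound. The two helpers below are hypothetical and not taken from the kernel source; they contrast a bare subtraction with one guarded by the "greater than or equal" check the commit message describes. Clamping to zero when the check fails is an assumption of this sketch.

```c
/* Bare subtraction: wraps to a huge value when b > a. */
unsigned int unguarded_sub(unsigned int a, unsigned int b)
{
    return a - b;
}

/*
 * Guarded subtraction: only subtract when a is at least b.
 * Otherwise report zero pages rather than wrapping around.
 */
unsigned int guarded_sub(unsigned int a, unsigned int b)
{
    return a >= b ? a - b : 0;
}
```

Subtracting 5 from an unsigned 0 gives a value just below `UINT_MAX` with the first helper and 0 with the second. Used as a page count, the wrapped value is the "huge positive number" the message says ends up tripping the BUG_ON().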
    • resource: ability to resize an allocated resource · 23c570a6
      Ram Pai authored
      Provides the ability to resize a resource that is already allocated.
      This functionality is put in place to support reallocation needs of
      PCI resources.
      Signed-off-by: Ram Pai <linuxram@us.ibm.com>
      Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>