1. 11 May, 2009 2 commits
    • Ryusuke Konishi's avatar
      nilfs2: fix lock order reversal in nilfs_clean_segments ioctl · 4f6b8288
      Ryusuke Konishi authored
      This is a companion patch to ("nilfs2: fix possible circular locking
      for get information ioctls").
      
      This corrects lock order reversal between mm->mmap_sem and
      nilfs->ns_segctor_sem in nilfs_clean_segments() which was detected by
      lockdep check:
      
       =======================================================
       [ INFO: possible circular locking dependency detected ]
       2.6.30-rc3-nilfs-00003-g360bdc1 #7
       -------------------------------------------------------
       mmap/5294 is trying to acquire lock:
        (&nilfs->ns_segctor_sem){++++.+}, at: [<d0d0e846>] nilfs_transaction_begin+0xb6/0x10c [nilfs2]
      
       but task is already holding lock:
        (&mm->mmap_sem){++++++}, at: [<c043700a>] do_page_fault+0x1d8/0x30a
      
       which lock already depends on the new lock.
      
       the existing dependency chain (in reverse order) is:
      
       -> #1 (&mm->mmap_sem){++++++}:
              [<c01470a5>] __lock_acquire+0x1066/0x13b0
              [<c01474a9>] lock_acquire+0xba/0xdd
              [<c01836bc>] might_fault+0x68/0x88
              [<c023c61d>] copy_from_user+0x2a/0x111
              [<d0d120d0>] nilfs_ioctl_prepare_clean_segments+0x1d/0xf1 [nilfs2]
              [<d0d0e2aa>] nilfs_clean_segments+0x6d/0x1b9 [nilfs2]
              [<d0d11f68>] nilfs_ioctl+0x2ad/0x318 [nilfs2]
              [<c01a3be7>] vfs_ioctl+0x22/0x69
              [<c01a408e>] do_vfs_ioctl+0x460/0x499
              [<c01a4107>] sys_ioctl+0x40/0x5a
              [<c01031a4>] sysenter_do_call+0x12/0x38
              [<ffffffff>] 0xffffffff
      
       -> #0 (&nilfs->ns_segctor_sem){++++.+}:
              [<c0146e0b>] __lock_acquire+0xdcc/0x13b0
              [<c01474a9>] lock_acquire+0xba/0xdd
              [<c0433f1d>] down_read+0x2a/0x3e
              [<d0d0e846>] nilfs_transaction_begin+0xb6/0x10c [nilfs2]
              [<d0cfe0e5>] nilfs_page_mkwrite+0xe7/0x154 [nilfs2]
              [<c0183b0b>] __do_fault+0x165/0x376
              [<c01855cd>] handle_mm_fault+0x287/0x5d1
              [<c043712d>] do_page_fault+0x2fb/0x30a
              [<c0435462>] error_code+0x72/0x78
              [<ffffffff>] 0xffffffff
      
      where nilfs_clean_segments() holds:
      
        nilfs->ns_segctor_sem -> copy_from_user()
                                   --> page fault -> mm->mmap_sem
      
      And, page fault path may hold:
      
        page fault -> mm->mmap_sem
               --> nilfs_page_mkwrite() -> nilfs->ns_segctor_sem
      
      Even though nilfs_clean_segments() does not perform write access on
      given user pages, it may cause deadlock because nilfs->ns_segctor_sem
      is shared per device and mm->mmap_sem can be shared with other tasks.
      
      To avoid this problem, this patch moves all calls of copy_from_user()
      outside the nilfs->ns_segctor_sem lock in the ioctl.
      Signed-off-by: default avatarRyusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
      4f6b8288
    • Ryusuke Konishi's avatar
      nilfs2: fix possible circular locking for get information ioctls · 47eb6b9c
      Ryusuke Konishi authored
      This is one of two patches which are to correct possible circular
      locking between mm->mmap_sem and nilfs->ns_segctor_sem.
      
      The problem was detected by lockdep check as follows:
      
       =======================================================
       [ INFO: possible circular locking dependency detected ]
       2.6.30-rc3-nilfs-00002-g3552613 #6
       -------------------------------------------------------
       mmap/5418 is trying to acquire lock:
       (&nilfs->ns_segctor_sem){++++.+}, at: [<d0d0e852>] nilfs_transaction_begin+0xb6/0x10c [nilfs2]
      
       but task is already holding lock:
       (&mm->mmap_sem){++++++}, at: [<c043700a>] do_page_fault+0x1d8/0x30a
      
       which lock already depends on the new lock.
      
       the existing dependency chain (in reverse order) is:
      
       -> #1 (&mm->mmap_sem){++++++}:
       [<c01470a5>] __lock_acquire+0x1066/0x13b0
       [<c01474a9>] lock_acquire+0xba/0xdd
       [<c01836bc>] might_fault+0x68/0x88
       [<c023c730>] copy_to_user+0x2c/0xfc
       [<d0d11b4f>] nilfs_ioctl_wrap_copy+0x103/0x160 [nilfs2]
       [<d0d11fa9>] nilfs_ioctl+0x30a/0x3b0 [nilfs2]
       [<c01a3be7>] vfs_ioctl+0x22/0x69
       [<c01a408e>] do_vfs_ioctl+0x460/0x499
       [<c01a4107>] sys_ioctl+0x40/0x5a
       [<c01031a4>] sysenter_do_call+0x12/0x38
       [<ffffffff>] 0xffffffff
      
       -> #0 (&nilfs->ns_segctor_sem){++++.+}:
       [<c0146e0b>] __lock_acquire+0xdcc/0x13b0
       [<c01474a9>] lock_acquire+0xba/0xdd
       [<c0433f1d>] down_read+0x2a/0x3e
       [<d0d0e852>] nilfs_transaction_begin+0xb6/0x10c [nilfs2]
       [<d0cfe0e5>] nilfs_page_mkwrite+0xe7/0x154 [nilfs2]
       [<c0183b0b>] __do_fault+0x165/0x376
       [<c01855cd>] handle_mm_fault+0x287/0x5d1
       [<c043712d>] do_page_fault+0x2fb/0x30a
       [<c0435462>] error_code+0x72/0x78
       [<ffffffff>] 0xffffffff
      
       other info that might help us debug this:
      
       1 lock held by mmap/5418:
       #0:  (&mm->mmap_sem){++++++}, at: [<c043700a>] do_page_fault+0x1d8/0x30a
      
       stack backtrace:
       Pid: 5418, comm: mmap Not tainted 2.6.30-rc3-nilfs-00002-g3552613 #6
       Call Trace:
       [<c0432145>] ? printk+0xf/0x12
       [<c0145c48>] print_circular_bug_tail+0xaa/0xb5
       [<c0146e0b>] __lock_acquire+0xdcc/0x13b0
       [<d0d10149>] ? nilfs_sufile_get_stat+0x1e/0x105 [nilfs2]
       [<c013b59a>] ? up_read+0x16/0x2c
       [<d0d10225>] ? nilfs_sufile_get_stat+0xfa/0x105 [nilfs2]
       [<c01474a9>] lock_acquire+0xba/0xdd
       [<d0d0e852>] ? nilfs_transaction_begin+0xb6/0x10c [nilfs2]
       [<c0433f1d>] down_read+0x2a/0x3e
       [<d0d0e852>] ? nilfs_transaction_begin+0xb6/0x10c [nilfs2]
       [<d0d0e852>] nilfs_transaction_begin+0xb6/0x10c [nilfs2]
       [<d0cfe0e5>] nilfs_page_mkwrite+0xe7/0x154 [nilfs2]
       [<c0183b0b>] __do_fault+0x165/0x376
       [<c01855cd>] handle_mm_fault+0x287/0x5d1
       [<c043700a>] ? do_page_fault+0x1d8/0x30a
       [<c013b54f>] ? down_read_trylock+0x39/0x43
       [<c043712d>] do_page_fault+0x2fb/0x30a
       [<c0436e32>] ? do_page_fault+0x0/0x30a
       [<c0435462>] error_code+0x72/0x78
       [<c0436e32>] ? do_page_fault+0x0/0x30a
      
      This makes the lock granularity of nilfs->ns_segctor_sem finer than
      that of the mmap semaphore for ioctl commands except
      nilfs_clean_segments().
      
      The successive patch ("nilfs2: fix lock order reversal in
      nilfs_clean_segments ioctl") is required to fully resolve the problem.
      Signed-off-by: default avatarRyusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
      47eb6b9c
  2. 10 May, 2009 1 commit
  3. 09 May, 2009 4 commits
    • Ryusuke Konishi's avatar
      nilfs2: fix circular locking dependency of writer mutex · 201913ed
      Ryusuke Konishi authored
      This fixes the following circular locking dependency problem:
      
       =======================================================
       [ INFO: possible circular locking dependency detected ]
       2.6.30-rc3 #5
       -------------------------------------------------------
       segctord/3895 is trying to acquire lock:
        (&nilfs->ns_writer_mutex){+.+...}, at: [<d0d02172>]
         nilfs_mdt_get_block+0x89/0x20f [nilfs2]
      
       but task is already holding lock:
        (&bmap->b_sem){++++..}, at: [<d0d02d99>]
         nilfs_bmap_propagate+0x14/0x2e [nilfs2]
      
       which lock already depends on the new lock.
      
      The bugfix is done by replacing call sites of nilfs_get_writer() which
      are never called from read-only context with direct dereferencing of
      pointer to a writable FS-instance.
      Signed-off-by: default avatarRyusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
      201913ed
    • Ryusuke Konishi's avatar
      nilfs2: fix possible recovery failure due to block creation without writer · 85c2a74f
      Ryusuke Konishi authored
      Some function calls in nilfs_prepare_segment_for_recovery() may fail
      because they can create blocks on meta data files without configuring
      a writable FS-instance.  Concretely, nilfs_mdt_create_block() routine
      of meta data files will fail in that case.
      
      This fixes the problem by temporarily attaching a writable FS-instace
      during the function is called.
      Signed-off-by: default avatarRyusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
      85c2a74f
    • Linus Torvalds's avatar
      Linux 2.6.30-rc5 · 091bf762
      Linus Torvalds authored
      091bf762
    • Linus Torvalds's avatar
      Merge git://git.infradead.org/mtd-2.6 · 621c2559
      Linus Torvalds authored
      * git://git.infradead.org/mtd-2.6:
        mtd: fix timeout in M25P80 driver
        mtd: Bug in m25p80.c during whole-chip erase
        mtd: expose subpage size via sysfs
        mtd: mtd in mtd_release is unused without CONFIG_MTD_CHAR
      621c2559
  4. 08 May, 2009 14 commits
  5. 07 May, 2009 13 commits
    • David Howells's avatar
      NOMMU: Don't check vm_region::vm_start is page aligned in add_nommu_region() · 8c9ed899
      David Howells authored
      Don't check vm_region::vm_start is page aligned in add_nommu_region() because
      the region may reflect some non-page-aligned mapped file, such as could be
      obtained from RomFS XIP.
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      Acked-by: default avatarGreg Ungerer <gerg@uclinux.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      8c9ed899
    • Linus Torvalds's avatar
      Merge branch 'for-linus' of git://neil.brown.name/md · ee7fee0b
      Linus Torvalds authored
      * 'for-linus' of git://neil.brown.name/md:
        md: remove rd%d links immediately after stopping an array.
        md: remove ability to explicit set an inactive array to 'clean'.
        md: constify VFTs
        md: tidy up status_resync to handle large arrays.
        md: fix some (more) errors with bitmaps on devices larger than 2TB.
        md/raid10: don't clear bitmap during recovery if array will still be degraded.
        md: fix loading of out-of-date bitmap.
      ee7fee0b
    • Linus Torvalds's avatar
      random: make get_random_int() more random · 8a0a9bd4
      Linus Torvalds authored
      It's a really simple patch that basically just open-codes the current
      "secure_ip_id()" call, but when open-coding it we now use a _static_
      hashing area, so that it gets updated every time.
      
      And to make sure somebody can't just start from the same original seed of
      all-zeroes, and then do the "half_md4_transform()" over and over until
      they get the same sequence as the kernel has, each iteration also mixes in
      the same old "current->pid + jiffies" we used - so we should now have a
      regular strong pseudo-number generator, but we also have one that doesn't
      have a single seed.
      
      Note: the "pid + jiffies" is just meant to be a tiny tiny bit of noise. It
      has no real meaning. It could be anything. I just picked the previous
      seed, it's just that now we keep the state in between calls and that will
      feed into the next result, and that should make all the difference.
      
      I made that hash be a per-cpu data just to avoid cache-line ping-pong:
      having multiple CPU's write to the same data would be fine for randomness,
      and add yet another layer of chaos to it, but since get_random_int() is
      supposed to be a fast interface I did it that way instead. I considered
      using "__raw_get_cpu_var()" to avoid any preemption overhead while still
      getting the hash be _mostly_ ping-pong free, but in the end good taste won
      out.
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      8a0a9bd4
    • Linus Torvalds's avatar
      Merge master.kernel.org:/home/rmk/linux-2.6-arm · 2c66fa7e
      Linus Torvalds authored
      * master.kernel.org:/home/rmk/linux-2.6-arm:
        [ARM] 5507/1: support R_ARM_MOVW_ABS_NC and MOVT_ABS relocation types
        [ARM] 5506/1: davinci: DMA_32BIT_MASK --> DMA_BIT_MASK(32)
        i.MX31: Disable CPU_32v6K in mx3_defconfig.
        mx3fb: Fix compilation with CONFIG_PM
        mx27ads: move PBC mapping out of vmalloc space
        MXC: remove BUG_ON in interrupt handler
        mx31: remove mx31moboard_defconfig
        ARM: ARCH_MXC should select HAVE_CLK
        mxc : BUG in imx_dma_request
        mxc : Clean up properly when imx_dma_free() used without imx_dma_disable()
        [ARM] mv78xx0: update defconfig
        [ARM] orion5x: update defconfig
        [ARM] Kirkwood: update defconfig
        [ARM] Kconfig typo fix:  "PXA930" -> "CPU_PXA930".
        [ARM] S3C2412: Add missing cache flush in suspend code
        [ARM] S3C: Add UDIVSLOT support for newer UARTS
        [ARM] S3C64XX: Add S3C64XX_PA_IIS{0,1} to <mach/map.h>
      2c66fa7e
    • Paul Gortmaker's avatar
      [ARM] 5507/1: support R_ARM_MOVW_ABS_NC and MOVT_ABS relocation types · ae51e609
      Paul Gortmaker authored
      From: Bruce Ashfield <bruce.ashfield@windriver.com>
      
      To fully support the armv7-a instruction set/optimizations, support
      for the R_ARM_MOVW_ABS_NC and R_ARM_MOVT_ABS relocation types is
      required.
      
      The MOVW and MOVT are both load-immediate instructions, MOVW loads 16
      bits into the bottom half of a register, and MOVT loads 16 bits into the
      top half of a register.
      
      The relocation information for these instructions has a full 32 bit
      value, plus an addend which is stored in the 16 immediate bits in the
      instruction itself.  The immediate bits in the instruction are not
      contiguous (the register # splits it into a 4 bit and 12 bit value),
      so the addend has to be extracted accordingly and added to the value.
      The value is then split and put into the instruction; a MOVW uses the
      bottom 16 bits of the value, and a MOVT uses the top 16 bits.
      Signed-off-by: default avatarDavid Borman <david.borman@windriver.com>
      Signed-off-by: default avatarBruce Ashfield <bruce.ashfield@windriver.com>
      Signed-off-by: default avatarPaul Gortmaker <paul.gortmaker@windriver.com>
      Signed-off-by: default avatarRussell King <rmk+kernel@arm.linux.org.uk>
      ae51e609
    • Kevin Hilman's avatar
      [ARM] 5506/1: davinci: DMA_32BIT_MASK --> DMA_BIT_MASK(32) · a029b706
      Kevin Hilman authored
      As per commit 284901a9, use
      DMA_BIT_MASK(n)
      Signed-off-by: default avatarKevin Hilman <khilman@deeprootsystems.com>
      Signed-off-by: default avatarRussell King <rmk+kernel@arm.linux.org.uk>
      a029b706
    • NeilBrown's avatar
      md: remove rd%d links immediately after stopping an array. · c4647292
      NeilBrown authored
      md maintains link in sys/mdXX/md/ to identify which device has
      which role in the array. e.g.
         rd2 -> dev-sda
      
      indicates that the device with role '2' in the array is sda.
      
      These links are only present when the array is active.  They are
      created immediately after ->run is called, and so should be removed
      immediately after ->stop is called.
      However they are currently removed a little bit later, and it is
      possible for ->run to be called again, thus adding these links, before
      they are removed.
      
      So move the removal earlier so they are consistently only present when
      the array is active.
      Signed-off-by: default avatarNeilBrown <neilb@suse.de>
      c4647292
    • NeilBrown's avatar
      md: remove ability to explicit set an inactive array to 'clean'. · 5bf29597
      NeilBrown authored
      Being able to write 'clean' to an 'array_state' of an inactive array
      to activate it in 'clean' mode is both unnecessary and inconvenient.
      
      It is unnecessary because the same can be achieved by writing
      'active'.  This activates and array, but it still remains 'clean'
      until the first write.
      
      It is inconvenient because writing 'clean' is more often used to
      cause an 'active' array to revert to 'clean' mode (thus blocking
      any writes until a 'write-pending' is promoted to 'active').
      
      Allowing 'clean' to both activate an array and mark an active array as
      clean can lead to races:  One program writes 'clean' to mark the
      active array as clean at the same time as another program writes
      'inactive' to deactivate (stop) and active array.  Depending on which
      writes first, the array could be deactivated and immediately
      reactivated which isn't what was desired.
      
      So just disable the use of 'clean' to activate an array.
      
      This avoids a race that can be triggered with mdadm-3.0 and external
      metadata, so it suitable for -stable.
      Reported-by: default avatarRafal Marszewski <rafal.marszewski@intel.com>
      Acked-by: default avatarDan Williams <dan.j.williams@intel.com>
      Cc: <stable@kernel.org>
      Signed-off-by: default avatarNeilBrown <neilb@suse.de>
      5bf29597
    • Jan Engelhardt's avatar
      md: constify VFTs · 110518bc
      Jan Engelhardt authored
      Signed-off-by: default avatarJan Engelhardt <jengelh@medozas.de>
      Signed-off-by: default avatarNeilBrown <neilb@suse.de>
      110518bc
    • NeilBrown's avatar
      md: tidy up status_resync to handle large arrays. · dd71cf6b
      NeilBrown authored
      Two problems in status_resync.
      1/ It still used Kilobytes as the basic block unit, while most code
         now uses sectors uniformly.
      2/ It doesn't allow for the possibility that max_sectors exceeds
         the range of "unsigned long".
      
      So
       - change "max_blocks" to "max_sectors", and store sector numbers
         in there and in 'resync'
       - Make 'rt' a 'sector_t' so it can temporarily hold the number of
         remaining sectors.
       - use sector_div rather than normal division.
       - change the magic '100' used to preserve precision to '32'.
         + making it a power of 2 makes division easier
         + it doesn't need to be as large as it was chosen when we averaged
           speed over the entire run.  Now we average speed over the last 30
           seconds or so.
      Reported-by: default avatar"Mario 'BitKoenig' Holbe" <Mario.Holbe@TU-Ilmenau.DE>
      Signed-off-by: default avatarNeilBrown <neilb@suse.de>
      dd71cf6b
    • NeilBrown's avatar
      md: fix some (more) errors with bitmaps on devices larger than 2TB. · db305e50
      NeilBrown authored
      If a write intent bitmap covers more than 2TB, we sometimes work with
      values beyond 32bit, so these need to be sector_t.  This patches
      add the required casts to some unsigned longs that are being shifted
      up.
      
      This will affect any raid10 larger than 2TB, or any raid1/4/5/6 with
      member devices that are larger than 2TB.
      Signed-off-by: default avatarNeilBrown <neilb@suse.de>
      Reported-by: default avatar"Mario 'BitKoenig' Holbe" <Mario.Holbe@TU-Ilmenau.DE>
      Cc: stable@kernel.org
      db305e50
    • NeilBrown's avatar
      md/raid10: don't clear bitmap during recovery if array will still be degraded. · 18055569
      NeilBrown authored
      If we have a raid10 with multiple missing devices, and we recover just
      one of these to a spare, then we risk (depending on the bitmap and
      array chunk size) clearing bits of the bitmap for which recovery isn't
      complete (because a device is still missing).
      
      This can lead to a subsequent "re-add" being recovered without
      any IO happening, which would result in loss of data.
      
      This patch takes the safe approach of not clearing bitmap bits
      if the array will still be degraded.
      
      This patch is suitable for all active -stable kernels.
      
      Cc: stable@kernel.org
      Signed-off-by: default avatarNeilBrown <neilb@suse.de>
      18055569
    • NeilBrown's avatar
      md: fix loading of out-of-date bitmap. · b74fd282
      NeilBrown authored
      When md is loading a bitmap which it knows is out of date, it fills
      each page with 1s and writes it back out again.  However the
      write_page call makes used of bitmap->file_pages and
      bitmap->last_page_size which haven't been set correctly yet.  So this
      can sometimes fail.
      
      Move the setting of file_pages and last_page_size to before the call
      to write_page.
      
      This bug can cause the assembly on an array to fail, thus making the
      data inaccessible.  Hence I think it is a suitable candidate for
      -stable.
      
      Cc: stable@kernel.org
      Reported-by: default avatarVojtech Pavlik <vojtech@suse.cz>
      Signed-off-by: default avatarNeilBrown <neilb@suse.de>
      b74fd282
  6. 06 May, 2009 6 commits