1. 30 Oct, 2012 8 commits
  2. 25 Oct, 2012 1 commit
    • Jianpeng Ma's avatar
      block: Add blk_rq_pos(rq) to sort rq when plushing · 975927b9
      Jianpeng Ma authored
      My workload is a raid5 which had 16 disks. And used our filesystem to
      write using direct-io mode.
      
      I used the blktrace to find those message:
      8,16   0     6647     2.453665504  2579  M   W 7493152 + 8 [md0_raid5]
      8,16   0     6648     2.453672411  2579  Q   W 7493160 + 8 [md0_raid5]
      8,16   0     6649     2.453672606  2579  M   W 7493160 + 8 [md0_raid5]
      8,16   0     6650     2.453679255  2579  Q   W 7493168 + 8 [md0_raid5]
      8,16   0     6651     2.453679441  2579  M   W 7493168 + 8 [md0_raid5]
      8,16   0     6652     2.453685948  2579  Q   W 7493176 + 8 [md0_raid5]
      8,16   0     6653     2.453686149  2579  M   W 7493176 + 8 [md0_raid5]
      8,16   0     6654     2.453693074  2579  Q   W 7493184 + 8 [md0_raid5]
      8,16   0     6655     2.453693254  2579  M   W 7493184 + 8 [md0_raid5]
      8,16   0     6656     2.453704290  2579  Q   W 7493192 + 8 [md0_raid5]
      8,16   0     6657     2.453704482  2579  M   W 7493192 + 8 [md0_raid5]
      8,16   0     6658     2.453715016  2579  Q   W 7493200 + 8 [md0_raid5]
      8,16   0     6659     2.453715247  2579  M   W 7493200 + 8 [md0_raid5]
      8,16   0     6660     2.453721730  2579  Q   W 7493208 + 8 [md0_raid5]
      8,16   0     6661     2.453721974  2579  M   W 7493208 + 8 [md0_raid5]
      8,16   0     6662     2.453728202  2579  Q   W 7493216 + 8 [md0_raid5]
      8,16   0     6663     2.453728436  2579  M   W 7493216 + 8 [md0_raid5]
      8,16   0     6664     2.453734782  2579  Q   W 7493224 + 8 [md0_raid5]
      8,16   0     6665     2.453735019  2579  M   W 7493224 + 8 [md0_raid5]
      8,16   0     6666     2.453741401  2579  Q   W 7493232 + 8 [md0_raid5]
      8,16   0     6667     2.453741632  2579  M   W 7493232 + 8 [md0_raid5]
      8,16   0     6668     2.453748148  2579  Q   W 7493240 + 8 [md0_raid5]
      8,16   0     6669     2.453748386  2579  M   W 7493240 + 8 [md0_raid5]
      8,16   0     6670     2.453851843  2579  I   W 7493144 + 104 [md0_raid5]
      8,16   0        0     2.453853661     0  m   N cfq2579 insert_request
      8,16   0     6671     2.453854064  2579  I   W 7493120 + 24 [md0_raid5]
      8,16   0        0     2.453854439     0  m   N cfq2579 insert_request
      8,16   0     6672     2.453854793  2579  U   N [md0_raid5] 2
      8,16   0        0     2.453855513     0  m   N cfq2579 Not idling.st->count:1
      8,16   0        0     2.453855927     0  m   N cfq2579 dispatch_insert
      8,16   0        0     2.453861771     0  m   N cfq2579 dispatched a request
      8,16   0        0     2.453862248     0  m   N cfq2579 activate rq,drv=1
      8,16   0     6673     2.453862332  2579  D   W 7493120 + 24 [md0_raid5]
      8,16   0        0     2.453865957     0  m   N cfq2579 Not idling.st->count:1
      8,16   0        0     2.453866269     0  m   N cfq2579 dispatch_insert
      8,16   0        0     2.453866707     0  m   N cfq2579 dispatched a request
      8,16   0        0     2.453867061     0  m   N cfq2579 activate rq,drv=2
      8,16   0     6674     2.453867145  2579  D   W 7493144 + 104 [md0_raid5]
      8,16   0     6675     2.454147608     0  C   W 7493120 + 24 [0]
      8,16   0        0     2.454149357     0  m   N cfq2579 complete rqnoidle 0
      8,16   0     6676     2.454791505     0  C   W 7493144 + 104 [0]
      8,16   0        0     2.454794803     0  m   N cfq2579 complete rqnoidle 0
      8,16   0        0     2.454795160     0  m   N cfq schedule dispatch
      
      From above messages,we can find rq[W 7493144 + 104] and rq[W
      7493120 + 24] do not merge.
      Because the bio order is:
        8,16   0     6638     2.453619407  2579  Q   W 7493144 + 8 [md0_raid5]
        8,16   0     6639     2.453620460  2579  G   W 7493144 + 8 [md0_raid5]
        8,16   0     6640     2.453639311  2579  Q   W 7493120 + 8 [md0_raid5]
        8,16   0     6641     2.453639842  2579  G   W 7493120 + 8 [md0_raid5]
      The bio(7493144) first and bio(7493120) later.So the subsequent
      bios will be divided into two parts.
      When flushing plug-list,because elv_attempt_insert_merge only support
      backmerge,not supporting frontmerge.
      So rq[7493120 + 24] can't merge with rq[7493144 + 104].
      
      From my test,i found those situation can count 25% in our system.
      Using this patch, there is no this situation.
      Signed-off-by: default avatarJianpeng Ma <majianpeng@gmail.com>
      CC:Shaohua Li <shli@kernel.org>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      975927b9
  3. 23 Oct, 2012 2 commits
  4. 22 Oct, 2012 7 commits
    • Anna Leuschner's avatar
      vfs: fix: don't increase bio_slab_max if krealloc() fails · 386bc35a
      Anna Leuschner authored
      Without the patch, bio_slab_max, representing bio_slabs capacity, is increased before krealloc() of bio_slabs. If krealloc() fails, bio_slab_max is too high. Fix that by only updating bio_slab_max if krealloc() is successful.
      Signed-off-by: default avatarAnna Leuschner <anna.m.leuschner@gmail.com>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      386bc35a
    • Jun'ichi Nomura's avatar
      blkcg: stop iteration early if root_rl is the only request list · 65c77fd9
      Jun'ichi Nomura authored
      __blk_queue_next_rl() finds next request list based on blkg_list
      while skipping root_blkg in the list.
      OTOH, root_rl is special as it may exist even without root_blkg.
      
      Though the later part of the function handles such a case correctly,
      exiting early is good for readability of the code.
      Signed-off-by: default avatarJun'ichi Nomura <j-nomura@ce.jp.nec.com>
      Cc: Tejun Heo <tj@kernel.org>
      Acked-by: default avatarVivek Goyal <vgoyal@redhat.com>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      65c77fd9
    • Jun'ichi Nomura's avatar
      blkcg: Fix use-after-free of q->root_blkg and q->root_rl.blkg · 65635cbc
      Jun'ichi Nomura authored
      blk_put_rl() does not call blkg_put() for q->root_rl because we
      don't take request list reference on q->root_blkg.
      However, if root_blkg is once attached then detached (freed),
      blk_put_rl() is confused by the bogus pointer in q->root_blkg.
      
      For example, with !CONFIG_BLK_DEV_THROTTLING &&
      CONFIG_CFQ_GROUP_IOSCHED,
      switching IO scheduler from cfq to deadline will cause system stall
      after the following warning with 3.6:
      
      > WARNING: at /work/build/linux/block/blk-cgroup.h:250
      > blk_put_rl+0x4d/0x95()
      > Modules linked in: bridge stp llc sunrpc acpi_cpufreq freq_table mperf
      > ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4
      > Pid: 0, comm: swapper/0 Not tainted 3.6.0 #1
      > Call Trace:
      >  <IRQ>  [<ffffffff810453bd>] warn_slowpath_common+0x85/0x9d
      >  [<ffffffff810453ef>] warn_slowpath_null+0x1a/0x1c
      >  [<ffffffff811d5f8d>] blk_put_rl+0x4d/0x95
      >  [<ffffffff811d614a>] __blk_put_request+0xc3/0xcb
      >  [<ffffffff811d71a3>] blk_finish_request+0x232/0x23f
      >  [<ffffffff811d76c3>] ? blk_end_bidi_request+0x34/0x5d
      >  [<ffffffff811d76d1>] blk_end_bidi_request+0x42/0x5d
      >  [<ffffffff811d7728>] blk_end_request+0x10/0x12
      >  [<ffffffff812cdf16>] scsi_io_completion+0x207/0x4d5
      >  [<ffffffff812c6fcf>] scsi_finish_command+0xfa/0x103
      >  [<ffffffff812ce2f8>] scsi_softirq_done+0xff/0x108
      >  [<ffffffff811dcea5>] blk_done_softirq+0x8d/0xa1
      >  [<ffffffff810915d5>] ?
      >  generic_smp_call_function_single_interrupt+0x9f/0xd7
      >  [<ffffffff8104cf5b>] __do_softirq+0x102/0x213
      >  [<ffffffff8108a5ec>] ? lock_release_holdtime+0xb6/0xbb
      >  [<ffffffff8104d2b4>] ? raise_softirq_irqoff+0x9/0x3d
      >  [<ffffffff81424dfc>] call_softirq+0x1c/0x30
      >  [<ffffffff81011beb>] do_softirq+0x4b/0xa3
      >  [<ffffffff8104cdb0>] irq_exit+0x53/0xd5
      >  [<ffffffff8102d865>] smp_call_function_single_interrupt+0x34/0x36
      >  [<ffffffff8142486f>] call_function_single_interrupt+0x6f/0x80
      >  <EOI>  [<ffffffff8101800b>] ? mwait_idle+0x94/0xcd
      >  [<ffffffff81018002>] ? mwait_idle+0x8b/0xcd
      >  [<ffffffff81017811>] cpu_idle+0xbb/0x114
      >  [<ffffffff81401fbd>] rest_init+0xc1/0xc8
      >  [<ffffffff81401efc>] ? csum_partial_copy_generic+0x16c/0x16c
      >  [<ffffffff81cdbd3d>] start_kernel+0x3d4/0x3e1
      >  [<ffffffff81cdb79e>] ? kernel_init+0x1f7/0x1f7
      >  [<ffffffff81cdb2dd>] x86_64_start_reservations+0xb8/0xbd
      >  [<ffffffff81cdb3e3>] x86_64_start_kernel+0x101/0x110
      
      This patch clears q->root_blkg and q->root_rl.blkg when root blkg
      is destroyed.
      Signed-off-by: default avatarJun'ichi Nomura <j-nomura@ce.jp.nec.com>
      Acked-by: default avatarVivek Goyal <vgoyal@redhat.com>
      Acked-by: default avatarTejun Heo <tj@kernel.org>
      Cc: stable@kernel.org
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      65635cbc
    • Randy Dunlap's avatar
      module_signing: fix printk format warning · 0390c883
      Randy Dunlap authored
      Fix the warning:
      
        kernel/module_signing.c:195:2: warning: format '%lu' expects type 'long unsigned int', but argument 3 has type 'size_t'
      
      by using the proper 'z' modifier for printing a size_t.
      Signed-off-by: default avatarRandy Dunlap <rdunlap@xenotime.net>
      Cc: David Howells <dhowells@redhat.com>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      0390c883
    • Linus Torvalds's avatar
      Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/geert/linux-m68k · 4fe7e866
      Linus Torvalds authored
      Pull m68k updates from Geert Uytterhoeven:
       "Just the expected UAPI disintegration and the "new" kcmp syscall."
      
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/geert/linux-m68k:
        m68k: Wire up kcmp
        m68k: Remove empty #ifdef/#else/#endif block
        UAPI: (Scripted) Disintegrate arch/m68k/include/asm
      4fe7e866
    • Dmitry Torokhov's avatar
      Input: fix use-after-free introduced with dynamic minor changes · 4a215aad
      Dmitry Torokhov authored
      Commit 7f8d4cad ("Input: extend the number of event (and other)
      devices") made evdev, joydev and mousedev to embed struct cdev into
      their respective structures representing input devices.
      
      Unfortunately character device structure may outlive the parent
      structure unless we do not set it up as parent of character device so
      that it will stay pinned until character device is freed.
      
      Also, now that parent structure is pinned while character device exists
      we do not need to pin and unpin it every time user opens or closes it.
      Reported-by: default avatarDave Jones <davej@redhat.com>
      Signed-off-by: default avatarDmitry Torokhov <dmitry.torokhov@gmail.com>
      Acked-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      4a215aad
    • Dmitry Torokhov's avatar
      char_dev: pin parent kobject · 2f0157f1
      Dmitry Torokhov authored
      In certain cases (for example when a cdev structure is embedded into
      another object whose lifetime is controlled by a separate kobject) it is
      beneficial to tie lifetime of another object to the lifetime of
      character device so that related object is not freed until after
      char_dev object is freed.
      
      To achieve this let's pin kobject's parent when doing cdev_add() and
      unpin when last reference to cdev structure is being released.
      Signed-off-by: default avatarDmitry Torokhov <dmitry.torokhov@gmail.com>
      Acked-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      2f0157f1
  5. 20 Oct, 2012 9 commits
    • Linus Torvalds's avatar
      Linux 3.7-rc2 · 6f0c0580
      Linus Torvalds authored
      6f0c0580
    • Linus Torvalds's avatar
      Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/cmarinas/linux-aarch64 · 198190a1
      Linus Torvalds authored
      Pull arm64 fixes from Catalin Marinas:
       "Main changes:
         - AArch64 Linux compilation fixes following 3.7-rc1 changes
           (MODULES_USE_ELF_RELA, update_vsyscall() prototype)
         - Unnecessary register setting in start_thread() (thanks to Al Viro)
         - ptrace fixes"
      
      * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/cmarinas/linux-aarch64:
        arm64: fix alignment padding in assembly code
        arm64: ptrace: use HW_BREAKPOINT_EMPTY type for disabled breakpoints
        arm64: ptrace: make structure padding explicit for debug registers
        arm64: No need to set the x0-x2 registers in start_thread()
        arm64: Ignore memory blocks below PHYS_OFFSET
        arm64: Fix the update_vsyscall() prototype
        arm64: Select MODULES_USE_ELF_RELA
        arm64: Remove duplicate inclusion of mmu_context.h in smp.c
      198190a1
    • Marc Zyngier's avatar
      arm64: fix alignment padding in assembly code · aeed41a9
      Marc Zyngier authored
      An interesting effect of using the generic version of linkage.h
      is that the padding is defined in terms of x86 NOPs, which can have
      even more interesting effects when the assembly code looks like this:
      
      ENTRY(func1)
      	mov	x0, xzr
      ENDPROC(func1)
      	// fall through
      ENTRY(func2)
      	mov	x0, #1
      	ret
      ENDPROC(func2)
      
      Admittedly, the code is not very nice. But having code from another
      architecture doesn't look completely sane either.
      
      The fix is to add arm64's version of linkage.h, which causes the insertion
      of proper AArch64 NOPs.
      Signed-off-by: default avatarMarc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: default avatarCatalin Marinas <catalin.marinas@arm.com>
      aeed41a9
    • Kees Cook's avatar
      use clamp_t in UNAME26 fix · 31fd84b9
      Kees Cook authored
      The min/max call needed to have explicit types on some architectures
      (e.g. mn10300). Use clamp_t instead to avoid the warning:
      
        kernel/sys.c: In function 'override_release':
        kernel/sys.c:1287:10: warning: comparison of distinct pointer types lacks a cast [enabled by default]
      Reported-by: default avatarFengguang Wu <fengguang.wu@intel.com>
      Signed-off-by: default avatarKees Cook <keescook@chromium.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      31fd84b9
    • Linus Torvalds's avatar
      Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 8c1bee68
      Linus Torvalds authored
      Pull perf fixes from Ingo Molnar:
       "Assorted small fixes"
      
      * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        perf python: Properly link with libtraceevent
        perf hists browser: Add back callchain folding symbol
        perf tools: Fix build on sparc.
        perf python: Link with libtraceevent
        perf python: Initialize 'page_size' variable
        tools lib traceevent: Fix missed freeing of subargs in free_arg() in filter
        lib tools traceevent: Add back pevent assignment in __pevent_parse_format()
        perf hists browser: Fix off-by-two bug on the first column
        perf tools: Remove warnings on JIT samples for srcline sort key
        perf tools: Fix segfault when using srcline sort key
        perf: Require exclude_guest to use PEBS - kernel side enforcement
        perf tool: Precise mode requires exclude_guest
      8c1bee68
    • Arnaldo Carvalho de Melo's avatar
      perf python: Properly link with libtraceevent · 45bff41a
      Arnaldo Carvalho de Melo authored
      Namhyung Kim reported that the build fails with:
      
        GEN python/perf.so
        gcc: error: python_ext_build/tmp//../../libtraceevent.a: No such file or directory
        error: command 'gcc' failed with exit status 1
        cp: cannot stat `python_ext_build/lib/perf.so': No such file or directory
        make: *** [python/perf.so] Error 1
      
      We need to propagate the TE_PATH variable to the setup.py file.
      Reported-by: default avatarNamhyung Kim <namhyung@kernel.org>
      Signed-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      Link: http://lkml.kernel.org/n/tip-8umiPbm4sxpknKivbjgykhut@git.kernel.org
      [ Fixed superfluous variable build error. ]
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      45bff41a
    • Ingo Molnar's avatar
      Merge tag 'perf-urgent-for-mingo' of... · a448a031
      Ingo Molnar authored
      Merge tag 'perf-urgent-for-mingo' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/urgent
      
      Pull perf/urgent fixes from Arnaldo Carvalho de Melo:
      
      * The python binding needs to link with libtraceevent and to initialize
        the 'page_size' variable so that mmaping works again.
      
      * The callchain folding character that appears on the TUI just before
        the overhead had disappeared due to recent changes, add it back.
      
      * Intel PEBS in VT-x context uses the DS address as a guest linear address,
        even though its programmed by the host as a host linear address. This either
        results in guest memory corruption and or the hardware faulting and 'crashing'
        the virtual machine.  Therefore we have to disable PEBS on VT-x enter and
        re-enable on VT-x exit, enforcing a strict exclude_guest.
      
        Kernel side enforcement fix by Peter Zijlstra, tooling side fix by David Ahern.
      
      * Fix build on sparc due to UAPI, fix from David Miller.
      
      * Fixes for the srclike sort key for unresolved symbols and when processing
        samples in JITted code, where we don't have an ELF file, just an special
        symbol table, fixes from Namhyung Kim.
      
      * Fix some leaks in libtraceevent, from Steven Rostedt.
      Signed-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      a448a031
    • Linus Torvalds's avatar
      Merge tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc · 37820108
      Linus Torvalds authored
      Pull ARM soc fixes from Olof Johansson:
       "A set of fixes and some minor cleanups for -rc2:
      
         - A series from Arnd that fixes warnings in drivers and other code
           included by ARM defconfigs.  Most have been acked by corresponding
           maintainers (and seem quite hard to argue not picking up anyway in
           the few exception cases).
         - A few misc patches from the list for integrator/vt8500/i.MX
         - A batch of fixes to OMAP platforms, fixing:
           - boot problems on beaglebone,
           - regression fixes for local timers
           - clockdomain locking fixes
           - a few boot/sparse warnings
         - For Tegra:
           - Clock rate calculation overflow fix
           - Revert a change that removed timer clocks and a fix for symbol
             name clashes
         - For Renesas:
           - IO accessor / annotation cleanups to remove warnings
         - For Kirkwood/Dove/mvebu:
           - Fixes for device trees for Dove (some minor cleanups, some fixes)
           - Fixes for the mvebu gpio driver
           - Fix build problem for Feroceon due to missing ifdefs
           - Fix lsxl DTS files"
      
      * tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (31 commits)
        ARM: kirkwood: fix buttons on lsxl boards
        ARM: kirkwood: fix LEDs names for lsxl boards
        ARM: Kirkwood: fix disabling CACHE_FEROCEON_L2
        gpio: mvebu: Add missing breaks in mvebu_gpio_irq_set_type
        ARM: dove: Add crypto engine to DT
        ARM: dove: Remove watchdog from DT
        ARM: dove: Restructure SoC device tree descriptor
        ARM: dove: Fix clock names of sata and gbe
        ARM: dove: Fix tauros2 device tree init
        ARM: dove: Add pcie clock support
        ARM: OMAP2+: Allow kernel to boot even if GPMC fails to reserve memory
        ARM: OMAP: clockdomain: Fix locking on _clkdm_clk_hwmod_enable / disable
        ARM: s3c: mark s3c2440_clk_add as __init_refok
        spi/s3c64xx: use correct dma_transfer_direction type
        ARM: OMAP4: devices: fixup OMAP4 DMIC platform device error message
        ARM: OMAP2+: clock data: Add dev-id for the omap-gpmc dummy fck
        ARM: OMAP: resolve sparse warning concerning debug_card_init()
        ARM: OMAP4: Fix twd_local_timer_register regression
        ARM: tegra: add tegra_timer clock
        ARM: tegra: rename tegra system timer
        ...
      37820108
    • David Howells's avatar
      MODSIGN: Move the magic string to the end of a module and eliminate the search · caabe240
      David Howells authored
      Emit the magic string that indicates a module has a signature after the
      signature data instead of before it.  This allows module_sig_check() to
      be made simpler and faster by the elimination of the search for the
      magic string.  Instead we just need to do a single memcmp().
      
      This works because at the end of the signature data there is the
      fixed-length signature information block.  This block then falls
      immediately prior to the magic number.
      
      From the contents of the information block, it is trivial to calculate
      the size of the signature data and thus the size of the actual module
      data.
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      caabe240
  6. 19 Oct, 2012 13 commits