1. 28 Feb, 2009 12 commits
    • Ingo Molnar's avatar
      Merge branch 'tip/tracing/ftrace' of... · acdb2c28
      Ingo Molnar authored
      Merge branch 'tip/tracing/ftrace' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-2.6-trace into tracing/ftrace
      acdb2c28
    • Steven Rostedt's avatar
      tracing: create the C style tracing for the irq subsystem · f2034f1e
      Steven Rostedt authored
      This patch utilizes the TRACE_EVENT_FORMAT macro to enable the C style
      faster tracing for the irq subsystem trace points.
      Signed-off-by: default avatarSteven Rostedt <srostedt@redhat.com>
      f2034f1e
    • Steven Rostedt's avatar
      tracing: create the C style tracing for the sched subsystem · 62992804
      Steven Rostedt authored
      This patch utilizes the TRACE_EVENT_FORMAT macro to enable the C style
      faster tracing for the sched subsystem trace points.
      Signed-off-by: default avatarSteven Rostedt <srostedt@redhat.com>
      62992804
    • Steven Rostedt's avatar
      tracing: add raw fast tracing interface for trace events · fd994989
      Steven Rostedt authored
      This patch adds the interface to enable the C style trace points.
      In the directory /debugfs/tracing/events/subsystem/event
      We now have three files:
      
       enable : values 0 or 1 to enable or disable the trace event.
      
       available_types: values 'raw' and 'printf' which indicate the tracing
             types available for the trace point. If a developer does not
             use the TRACE_EVENT_FORMAT macro and just uses the TRACE_FORMAT
             macro, then only 'printf' will be available. This file is
             read only.
      
       type: values 'raw' or 'printf'. This indicates which type of tracing
             is active for that trace point. 'printf' is the default and
             if 'raw' is not available, this file is read only.
      
       # echo raw > /debug/tracing/events/sched/sched_wakeup/type
       # echo 1 > /debug/tracing/events/sched/sched_wakeup/enable
      
       Will enable the C style tracing for the sched_wakeup trace point.
      Signed-off-by: default avatarSteven Rostedt <srostedt@redhat.com>
      fd994989
    • Steven Rostedt's avatar
      tracing: add raw trace point recording infrastructure · c32e827b
      Steven Rostedt authored
      Impact: lower overhead tracing
      
      The current event tracer can automatically pick up trace points
      that are registered with the TRACE_FORMAT macro. But it required
      a printf format string and parsing. Although, this adds the ability
      to get guaranteed information like task names and such, it took
      a hit in overhead processing. This processing can add about 500-1000
      nanoseconds overhead, but in some cases that too is considered
      too much and we want to shave off as much from this overhead as
      possible.
      
      Tom Zanussi recently posted tracing patches to lkml that are based
      on a nice idea about capturing the data via C structs using
      STRUCT_ENTER, STRUCT_EXIT type of macros.
      
      I liked that method very much, but did not like the implementation
      that required a developer to add data/code in several disjoint
      locations.
      
      This patch extends the event_tracer macros to do a similar "raw C"
      approach that Tom Zanussi did. But instead of having the developers
      needing to tweak a bunch of code all over the place, they can do it
      all in one macro - preferably placed near the code that it is
      tracing. That makes it much more likely that tracepoints will be
      maintained on an ongoing basis by the code they modify.
      
      The new macro TRACE_EVENT_FORMAT is created for this approach. (Note,
      a developer may still utilize the more low level DECLARE_TRACE macros
      if they don't care about getting their traces automatically in the event
      tracer.)
      
      They can also use the existing TRACE_FORMAT if they don't need to code
      the tracepoint in C, but just want to use the convenience of printf.
      
      So if the developer wants to "hardwire" a tracepoint in the fastest
      possible way, and wants to acquire their data via a user space utility
      in a raw binary format, or wants to see it in the trace output but not
      sacrifice any performance, then they can implement the faster but
      more complex TRACE_EVENT_FORMAT macro.
      
      Here's what usage looks like:
      
        TRACE_EVENT_FORMAT(name,
      	TPPROTO(proto),
      	TPARGS(args),
      	TPFMT(fmt, fmt_args),
      	TRACE_STUCT(
      		TRACE_FIELD(type1, item1, assign1)
      		TRACE_FIELD(type2, item2, assign2)
      			[...]
      	),
      	TPRAWFMT(raw_fmt)
      	);
      
      Note name, proto, args, and fmt, are all identical to what TRACE_FORMAT
      uses.
      
       name: is the unique identifier of the trace point
       proto: The proto type that the trace point uses
       args: the args in the proto type
       fmt: printf format to use with the event printf tracer
       fmt_args: the printf argments to match fmt
      
       TRACE_STRUCT starts the ability to create a structure.
       Each item in the structure is defined with a TRACE_FIELD
      
        TRACE_FIELD(type, item, assign)
      
       type: the C type of item.
       item: the name of the item in the stucture
       assign: what to assign the item in the trace point callback
      
       raw_fmt is a way to pretty print the struct. It must match
        the order of the items are added in TRACE_STUCT
      
       An example of this would be:
      
       TRACE_EVENT_FORMAT(sched_wakeup,
      	TPPROTO(struct rq *rq, struct task_struct *p, int success),
      	TPARGS(rq, p, success),
      	TPFMT("task %s:%d %s",
      	      p->comm, p->pid, success?"succeeded":"failed"),
      	TRACE_STRUCT(
      		TRACE_FIELD(pid_t, pid, p->pid)
      		TRACE_FIELD(int, success, success)
      	),
      	TPRAWFMT("task %d success=%d")
      	);
      
       This creates us a unique struct of:
      
       struct {
      	pid_t		pid;
      	int		success;
       };
      
       And the way the call back would assign these values would be:
      
      	entry->pid = p->pid;
      	entry->success = success;
      
      The nice part about this is that the creation of the assignent is done
      via macro magic in the event tracer.  Once the TRACE_EVENT_FORMAT is
      created, the developer will then have a faster method to record
      into the ring buffer. They do not need to worry about the tracer itself.
      
      The developer would only need to touch the files in include/trace/*.h
      
      Again, I would like to give special thanks to Tom Zanussi for this
      nice idea.
      
      Idea-from: Tom Zanussi <tzanussi@gmail.com>
      Signed-off-by: default avatarSteven Rostedt <srostedt@redhat.com>
      c32e827b
    • Steven Rostedt's avatar
      tracing: add interface to write into current tracer buffer · ef5580d0
      Steven Rostedt authored
      Right now all tracers must manage their own trace buffers. This was
      to enforce tracers to be independent in case we finally decide to
      allow each tracer to have their own trace buffer.
      
      But now we are adding event tracing that writes to the current tracer's
      buffer. This adds an interface to allow events to write to the current
      tracer buffer without having to manage its own. Since event tracing
      has no "tracer", and is just a way to hook into any other tracer.
      Signed-off-by: default avatarSteven Rostedt <srostedt@redhat.com>
      ef5580d0
    • Steven Rostedt's avatar
      tracing: add subsystem sched for sched events · 3d7ba938
      Steven Rostedt authored
      Add the TRACE_SYSTEM sched for the sched events.
      Signed-off-by: default avatarSteven Rostedt <srostedt@redhat.com>
      3d7ba938
    • Steven Rostedt's avatar
      tracing: add subsystem irq for irq events · 0ec2ef15
      Steven Rostedt authored
      Add the TRACE_SYSTEM irq for the irq events.
      Signed-off-by: default avatarSteven Rostedt <srostedt@redhat.com>
      0ec2ef15
    • Steven Rostedt's avatar
      tracing: make the set_event and available_events subsystem aware · b628b3e6
      Steven Rostedt authored
      This patch makes the event files, set_event and available_events
      aware of the subsystem.
      
      Now you can enable an entire subsystem with:
      
        echo 'irq:*' > set_event
      
      Note: the '*' is not needed.
      Signed-off-by: default avatarSteven Rostedt <srostedt@redhat.com>
      b628b3e6
    • Ingo Molnar's avatar
      Merge branch 'tip/tracing/ftrace' of... · e9abf4c5
      Ingo Molnar authored
      Merge branch 'tip/tracing/ftrace' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-2.6-trace into tracing/ftrace
      e9abf4c5
    • Steven Rostedt's avatar
      tracing: add subsystem level to trace events · 6ecc2d1c
      Steven Rostedt authored
      If a trace point header defines TRACE_SYSTEM, then it will add the
      following trace points into that event system.
      
      If include/trace/irq_event_types.h has:
      
       #define TRACE_SYSTEM irq
      
      at the top and
      
       #undef TRACE_SYSTEM
      
      at the bottom, then a directory "irq" will be created in the
      /debug/tracing/events directory. Inside that directory will contain the
      two trace points that are defined in include/trace/irq_event_types.h.
      
      Only adding the above to irq and not to sched, we get:
      
       # ls /debug/tracing/events/
      irq                     sched_process_exit  sched_signal_send  sched_wakeup_new
      sched_kthread_stop      sched_process_fork  sched_switch
      sched_kthread_stop_ret  sched_process_free  sched_wait_task
      sched_migrate_task      sched_process_wait  sched_wakeup
      
       # ls /debug/tracing/events/irq
      irq_handler_entry  irq_handler_exit
      
      If we add #define TRACE_SYSTEM sched to the trace/sched_event_types.h
      then the rest of the trace events will be put in a sched directory
      within the events directory.
      
      I've been playing with this idea of the subsystem for a while, but
      recently Tom Zanussi posted some patches to lkml that included this
      method. Tom's approach was clean and got me to finally put some effort
      to clean up the event trace points.
      
      Thanks to Tom Zanussi for demonstrating how nice the subsystem
      method is.
      Signed-off-by: default avatarSteven Rostedt <srostedt@redhat.com>
      6ecc2d1c
    • Steven Rostedt's avatar
      tracing: move trace point formats to files in include/trace directory · eb594e45
      Steven Rostedt authored
      Impact: clean up
      
      To further facilitate the ease of adding trace points for developers, this
      patch creates include/trace/trace_events.h and
      include/trace/trace_event_types.h.
      
      The former file will hold the trace/<type>.h files and the latter will hold
      the trace/<type>_event_types.h files.
      
      To create new tracepoints and to have them automatically
      appear in the event tracer, a developer makes the trace/<type>.h file
      which includes <linux/tracepoint.h> and the trace/<type>_event_types.h file.
      
      The trace/<type>_event_types.h file will hold the TRACE_FORMAT
      macros.
      
      Then add the trace/<type>.h file to trace/trace_events.h,
      and add the trace/<type>_event_types.h to the trace_event_types.h file.
      
      No need to modify files elsewhere.
      Signed-off-by: default avatarSteven Rostedt <srostedt@redhat.com>
      eb594e45
  2. 27 Feb, 2009 9 commits
  3. 26 Feb, 2009 19 commits
    • Ingo Molnar's avatar
      Merge branch 'sched/clock' into tracing/ftrace · 5d0859ce
      Ingo Molnar authored
      Conflicts:
      	kernel/sched_clock.c
      5d0859ce
    • Ingo Molnar's avatar
      x86: set X86_FEATURE_TSC_RELIABLE · 83ce4009
      Ingo Molnar authored
      If the TSC is constant and non-stop, also set it reliable.
      
      (We will turn this off in DMI quirks for multi-chassis systems)
      
      The performance number on a 16-way Nehalem system running
      32 tasks that context-switch between each other is significant:
      
         sched_clock_stable=0		sched_clock_stable=1
         ....................         ....................
         22.456925 million/sec        24.306972 million/sec   [+8.2%]
      
      lmbench's "lat_ctx -s 0 2" goes from 0.63 microseconds to
      0.59 microseconds - a 6.7% increase in context-switching
      performance.
      
      Perfstat of 1 million pipe context switches between two tasks:
      
       Performance counter stats for './pipe-test-1m':
      
             [before]           [after]
         ............      ............
         37621.421089      36436.848378    task clock ticks     (msecs)
      
                    0                 0    CPU migrations       (events)
              2000274           2000189    context switches     (events)
                  194               193    pagefaults           (events)
           8433799643        8171016416    CPU cycles           (events) -3.21%
           8370133368        8180999694    instructions         (events) -2.31%
              4158565           3895941    cache references     (events) -6.74%
                44312             46264    cache misses         (events)
      
          2349.287976       2279.362465    wall-time            (msecs)  -3.06%
      
      The speedup comes straight from the reduction in the instruction
      count. sched_clock_cpu() got simpler and the whole workload thus
      executes faster.
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      83ce4009
    • Ingo Molnar's avatar
      sched: allow architectures to specify sched_clock_stable · b342501c
      Ingo Molnar authored
      Allow CONFIG_HAVE_UNSTABLE_SCHED_CLOCK architectures to still specify
      that their sched_clock() implementation is reliable.
      
      This will be used by x86 to switch on a faster sched_clock_cpu()
      implementation on certain CPU types.
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      b342501c
    • Linus Torvalds's avatar
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable · 64e71303
      Linus Torvalds authored
      * git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable:
        Btrfs: try committing transaction before returning ENOSPC
        Btrfs: add better -ENOSPC handling
      64e71303
    • Linus Torvalds's avatar
      Merge branch 'for-linus' of git://git.kernel.dk/linux-2.6-block · babb29b0
      Linus Torvalds authored
      * 'for-linus' of git://git.kernel.dk/linux-2.6-block:
        xen/blkfront: use blk_rq_map_sg to generate ring entries
        block: reduce stack footprint of blk_recount_segments()
        cciss: shorten 30s timeout on controller reset
        block: add documentation for register_blkdev()
        block: fix bogus gcc warning for uninitialized var usage
      babb29b0
    • Linus Torvalds's avatar
      Merge branch 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc · 6fc79d40
      Linus Torvalds authored
      * 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc:
        powerpc: Fix 64bit __copy_tofrom_user() regression
        powerpc: Fix 64bit memcpy() regression
        powerpc: Fix load/store float double alignment handler
      6fc79d40
    • Linus Torvalds's avatar
      Make ieee1394_init a fs-initcall · 86883c27
      Linus Torvalds authored
      It needs to happen before any firewire driver actually registers itself,
      and that was previously handled by having the Makefile list the core
      ieee1394 files before the drivers.
      
      But now there are firewire drivers in drivers/media, and the Makefile
      games aren't enough.  So just make ieee1394_init happen earlier in the
      init sequence, the way all other bus layers already do.
      Reported-and-tested-by: default avatarIngo Molnar <mingo@elte.hu>
      Cc: Stefan Richter <stefanr@s5r6.in-berlin.de>
      Cc: Henrik Kurelid <henrik@kurelid.se>
      Cc: Mauro Carvalho Chehab <mchehab@infradead.org>
      Cc: Ben Backx <ben@bbackx.com>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      86883c27
    • Ingo Molnar's avatar
      tracing: implement trace_clock_*() APIs · 14131f2f
      Ingo Molnar authored
      Impact: implement new tracing timestamp APIs
      
      Add three trace clock variants, with differing scalability/precision
      tradeoffs:
      
       -   local: CPU-local trace clock
       -  medium: scalable global clock with some jitter
       -  global: globally monotonic, serialized clock
      
      Make the ring-buffer use the local trace clock internally.
      Acked-by: default avatarPeter Zijlstra <peterz@infradead.org>
      Acked-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      14131f2f
    • Ingo Molnar's avatar
      sched: sched_clock() improvement: use in_nmi() · 6409c4da
      Ingo Molnar authored
      make sure we dont execute more complex sched_clock() code in NMI context.
      Acked-by: default avatarPeter Zijlstra <peterz@infradead.org>
      Acked-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      6409c4da
    • Jason Baron's avatar
      tracing, genirq: add irq enter and exit trace events · af39241b
      Jason Baron authored
      Impact: add new tracepoints
      
      Add them to the generic IRQ code, that way every architecture
      gets these new tracepoints, not just x86.
      
      Using Steve's new 'TRACE_FORMAT', I can get function graph
      trace as follows using the original two IRQ tracepoints:
      
       3)               |    handle_IRQ_event() {
       3)               |    /* (irq_handler_entry) irq=28 handler=eth0 */
       3)               |    e1000_intr_msi() {
       3)   2.460 us    |      __napi_schedule();
       3)   9.416 us    |    }
       3)               |    /* (irq_handler_exit) irq=28 handler=eth0 return=handled */
       3) + 22.935 us   |  }
      Signed-off-by: default avatarJason Baron <jbaron@redhat.com>
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      Acked-by: default avatarPeter Zijlstra <peterz@infradead.org>
      Acked-by: default avatarMasami Hiramatsu <mhiramat@redhat.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Mathieu Desnoyers <compudj@krystal.dyndns.org>
      Cc: "Frank Ch. Eigler" <fche@redhat.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      af39241b
    • Frederic Weisbecker's avatar
      tracing/core: make the per cpu trace files in per cpu directories · 8656e7a2
      Frederic Weisbecker authored
      Impact: restructure the VFS layout of per CPU trace buffers
      
      The per cpu trace files are all in a single directory:
      /debug/tracing/per_cpu. In case of a large number of cpu, the
      content of this directory becomes messy so we create now one
      directory per cpu inside /debug/tracing/per_cpu which contain
      each their own trace_pipe and trace files.
      
      Ie:
      
       /debug/tracing$ ls -R per_cpu
       per_cpu:
       cpu0  cpu1
      
       per_cpu/cpu0:
       trace  trace_pipe
      
       per_cpu/cpu1:
       trace  trace_pipe
      Signed-off-by: default avatarFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      8656e7a2
    • Jens Axboe's avatar
      xen/blkfront: use blk_rq_map_sg to generate ring entries · 9e973e64
      Jens Axboe authored
      On occasion, the request will apparently have more segments than we
      fit into the ring. Jens says:
      
      > The second problem is that the block layer then appears to create one
      > too many segments, but from the dump it has rq->nr_phys_segments ==
      > BLKIF_MAX_SEGMENTS_PER_REQUEST. I suspect the latter is due to
      > xen-blkfront not handling the merging on its own. It should check that
      > the new page doesn't form part of the previous page. The
      > rq_for_each_segment() iterates all single bits in the request, not dma
      > segments. The "easiest" way to do this is to call blk_rq_map_sg() and
      > then iterate the mapped sg list. That will give you what you are
      > looking for.
      
      > Here's a test patch, compiles but otherwise untested. I spent more
      > time figuring out how to enable XEN than to code it up, so YMMV!
      > Probably the sg list wants to be put inside the ring and only
      > initialized on allocation, then you can get rid of the sg on stack and
      > sg_init_table() loop call in the function. I'll leave that, and the
      > testing, to you.
      
      [Moved sg array into info structure, and initialize once. -J]
      Signed-off-by: default avatarJens Axboe <jens.axboe@oracle.com>
      Signed-off-by: default avatarJeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
      9e973e64
    • Jens Axboe's avatar
      block: reduce stack footprint of blk_recount_segments() · 1e428079
      Jens Axboe authored
      blk_recalc_rq_segments() requires a request structure passed in, which
      we don't have from blk_recount_segments(). So the latter allocates one on
      the stack, using > 400 bytes of stack for that. This can cause us to spill
      over one page of stack from ext4 at least:
      
       0)     4560     400   blk_recount_segments+0x43/0x62
       1)     4160      32   bio_phys_segments+0x1c/0x24
       2)     4128      32   blk_rq_bio_prep+0x2a/0xf9
       3)     4096      32   init_request_from_bio+0xf9/0xfe
       4)     4064     112   __make_request+0x33c/0x3f6
       5)     3952     144   generic_make_request+0x2d1/0x321
       6)     3808      64   submit_bio+0xb9/0xc3
       7)     3744      48   submit_bh+0xea/0x10e
       8)     3696     368   ext4_mb_init_cache+0x257/0xa6a [ext4]
       9)     3328     288   ext4_mb_regular_allocator+0x421/0xcd9 [ext4]
      10)     3040     160   ext4_mb_new_blocks+0x211/0x4b4 [ext4]
      11)     2880     336   ext4_ext_get_blocks+0xb61/0xd45 [ext4]
      12)     2544      96   ext4_get_blocks_wrap+0xf2/0x200 [ext4]
      13)     2448      80   ext4_da_get_block_write+0x6e/0x16b [ext4]
      14)     2368     352   mpage_da_map_blocks+0x7e/0x4b3 [ext4]
      15)     2016     352   ext4_da_writepages+0x2ce/0x43c [ext4]
      16)     1664      32   do_writepages+0x2d/0x3c
      17)     1632     144   __writeback_single_inode+0x162/0x2cd
      18)     1488      96   generic_sync_sb_inodes+0x1e3/0x32b
      19)     1392      16   sync_sb_inodes+0xe/0x10
      20)     1376      48   writeback_inodes+0x69/0xb3
      21)     1328     208   balance_dirty_pages_ratelimited_nr+0x187/0x2f9
      22)     1120     224   generic_file_buffered_write+0x1d4/0x2c4
      23)      896     176   __generic_file_aio_write_nolock+0x35f/0x393
      24)      720      80   generic_file_aio_write+0x6c/0xc8
      25)      640      80   ext4_file_write+0xa9/0x137 [ext4]
      26)      560     320   do_sync_write+0xf0/0x137
      27)      240      48   vfs_write+0xb3/0x13c
      28)      192      64   sys_write+0x4c/0x74
      29)      128     128   system_call_fastpath+0x16/0x1b
      
      Split the segment counting out into a __blk_recalc_rq_segments() helper
      to avoid allocating an onstack request just for checking the physical
      segment count.
      Signed-off-by: default avatarJens Axboe <jens.axboe@oracle.com>
      1e428079
    • Jens Axboe's avatar
      cciss: shorten 30s timeout on controller reset · 5e4c91c8
      Jens Axboe authored
      If reset_devices is set for kexec, then cciss will delay 30 seconds
      since the old 5i controller _may_ need that long to recover. Replace
      the long sleep with incremental sleep and tests to reduce the 30 seconds
      to worst case for 5i, so that other controllers will proceed quickly.
      Reviewed-by: default avatarMike Miller <mike.miller@hp.com>
      Signed-off-by: default avatarJens Axboe <jens.axboe@oracle.com>
      5e4c91c8
    • Márton Németh's avatar
      block: add documentation for register_blkdev() · 9e8c0bcc
      Márton Németh authored
      Add documentation for register_blkdev() function and for the parameters.
      Signed-off-by: default avatarMárton Németh <nm127@freemail.hu>
      Cc: Greg Kroah-Hartman <gregkh@suse.de>
      Signed-off-by: default avatarJens Axboe <jens.axboe@oracle.com>
      9e8c0bcc
    • Jens Axboe's avatar
      block: fix bogus gcc warning for uninitialized var usage · b2bf9683
      Jens Axboe authored
      Newer gcc throw this warning:
      
              fs/bio.c: In function ?bio_alloc_bioset?:
              fs/bio.c:305: warning: ?p? may be used uninitialized in this function
      
      since it cannot figure out that 'p' is only ever used if 'bs' is non-NULL.
      Signed-off-by: default avatarJens Axboe <jens.axboe@oracle.com>
      b2bf9683
    • Mark Nelson's avatar
      powerpc: Fix 64bit __copy_tofrom_user() regression · f72b728b
      Mark Nelson authored
      This fixes a regression introduced by commit
      a4e22f02 ("powerpc: Update 64bit
      __copy_tofrom_user() using CPU_FTR_UNALIGNED_LD_STD").
      
      The same bug that existed in the 64bit memcpy() also exists here so fix
      it here too. The fix is the same as that applied to memcpy() with the
      addition of fixes for the exception handling code required for
      __copy_tofrom_user().
      
      This stops us reading beyond the end of the source region we were told
      to copy.
      Signed-off-by: default avatarMark Nelson <markn@au1.ibm.com>
      Signed-off-by: default avatarBenjamin Herrenschmidt <benh@kernel.crashing.org>
      f72b728b
    • Mark Nelson's avatar
      powerpc: Fix 64bit memcpy() regression · e423b9ec
      Mark Nelson authored
      This fixes a regression introduced by commit
      25d6e2d7 ("powerpc: Update 64bit memcpy()
      using CPU_FTR_UNALIGNED_LD_STD").
      
      This commit allowed CPUs that have the CPU_FTR_UNALIGNED_LD_STD CPU
      feature bit present to do the memcpy() with unaligned load doubles. But,
      along with this came a bug where our final load double would read bytes
      beyond a page boundary and into the next (unmapped) page. This was caught
      by enabling CONFIG_DEBUG_PAGEALLOC,
      
      The fix was to read only the number of bytes that we need to store rather
      than reading a full 8-byte doubleword and storing only a portion of that.
      
      In order to minimise the amount of existing code touched we use the
      original do_tail for the src_unaligned case.
      
      Below is an example of the regression, as reported by Sachin Sant:
      
      Unable to handle kernel paging request for data at address 0xc00000003f380000
      Faulting instruction address: 0xc000000000039574
      cpu 0x1: Vector: 300 (Data Access) at [c00000003baf3020]
          pc: c000000000039574: .memcpy+0x74/0x244
          lr: d00000000244916c: .ext3_xattr_get+0x288/0x2f4 [ext3]
          sp: c00000003baf32a0
         msr: 8000000000009032
         dar: c00000003f380000
       dsisr: 40000000
        current = 0xc00000003e54b010
        paca    = 0xc000000000a53680
          pid   = 1840, comm = readahead
      enter ? for help
      [link register   ] d00000000244916c .ext3_xattr_get+0x288/0x2f4 [ext3]
      [c00000003baf32a0] d000000002449104 .ext3_xattr_get+0x220/0x2f4 [ext3]
      (unreliab
      le)
      [c00000003baf3390] d00000000244a6e8 .ext3_xattr_security_get+0x40/0x5c [ext3]
      [c00000003baf3400] c000000000148154 .generic_getxattr+0x74/0x9c
      [c00000003baf34a0] c000000000333400 .inode_doinit_with_dentry+0x1c4/0x678
      [c00000003baf3560] c00000000032c6b0 .security_d_instantiate+0x50/0x68
      [c00000003baf35e0] c00000000013c818 .d_instantiate+0x78/0x9c
      [c00000003baf3680] c00000000013ced0 .d_splice_alias+0xf0/0x120
      [c00000003baf3720] d00000000243e05c .ext3_lookup+0xec/0x134 [ext3]
      [c00000003baf37c0] c000000000131e74 .do_lookup+0x110/0x260
      [c00000003baf3880] c000000000134ed0 .__link_path_walk+0xa98/0x1010
      [c00000003baf3970] c0000000001354a0 .path_walk+0x58/0xc4
      [c00000003baf3a20] c000000000135720 .do_path_lookup+0x138/0x1e4
      [c00000003baf3ad0] c00000000013645c .path_lookup_open+0x6c/0xc8
      [c00000003baf3b70] c000000000136780 .do_filp_open+0xcc/0x874
      [c00000003baf3d10] c0000000001251e0 .do_sys_open+0x80/0x140
      [c00000003baf3dc0] c00000000016aaec .compat_sys_open+0x24/0x38
      [c00000003baf3e30] c00000000000855c syscall_exit+0x0/0x40
      Signed-off-by: default avatarBenjamin Herrenschmidt <benh@kernel.crashing.org>
      e423b9ec
    • Michael Neuling's avatar
      powerpc: Fix load/store float double alignment handler · 49f297f8
      Michael Neuling authored
      When we introduced VSX, we changed the way FPRs are stored in the
      thread_struct.  Unfortunately we missed the load/store float double
      alignment handler code when updating how we access FPRs in the
      thread_struct.
      
      Below fixes this and merges the little/big endian case.
      Signed-off-by: default avatarMichael Neuling <mikey@neuling.org>
      Signed-off-by: default avatarBenjamin Herrenschmidt <benh@kernel.crashing.org>
      49f297f8