1. 03 Mar, 2017 1 commit
    • Steven Rostedt (VMware)'s avatar
      ftrace/graph: Do not modify the EMPTY_HASH for the function_graph filter · 92ad18ec
      Steven Rostedt (VMware) authored
      On boot up, if the kernel command line sets a graph funtion with the kernel
      command line options "ftrace_graph_filter" or "ftrace_graph_notrace" then it
      updates the corresponding function graph hash, ftrace_graph_hash or
      ftrace_graph_notrace_hash respectively. Unfortunately, at boot up, these
      variables are pointers to the "EMPTY_HASH" which is a constant used as a
      placeholder when a hash has no entities. The problem was that the comand
      line version to set the hashes updated the actual EMPTY_HASH instead of
      creating a new hash for the function graph. This broke the EMPTY_HASH
      because not only did it modify a constant (not sure how that was allowed to
      happen, except maybe because it was done at early boot, const variables were
      still mutable), but it made the filters have functions listed in them when
      they were actually empty.
      
      The kernel command line function needs to allocate a new hash for the
      function graph filters and assign the necessary variables to that new hash
      instead.
      
      Link: http://lkml.kernel.org/r/1488420091.7212.17.camel@linux.intel.com
      
      Cc: Namhyung Kim <namhyung@kernel.org>
      Fixes: b9b0c831 ("ftrace: Convert graph filter to use hash tables")
      Reported-by: default avatarTodd Brandt <todd.e.brandt@linux.intel.com>
      Tested-by: default avatarTodd Brandt <todd.e.brandt@linux.intel.com>
      Signed-off-by: default avatarSteven Rostedt (VMware) <rostedt@goodmis.org>
      92ad18ec
  2. 27 Feb, 2017 1 commit
  3. 17 Feb, 2017 1 commit
  4. 15 Feb, 2017 8 commits
  5. 10 Feb, 2017 1 commit
  6. 03 Feb, 2017 8 commits
    • Steven Rostedt (VMware)'s avatar
      ftrace: Have set_graph_function handle multiple functions in one write · e704eff3
      Steven Rostedt (VMware) authored
      Currently, only one function can be written to set_graph_function and
      set_graph_notrace. The last function in the list will have saved, even
      though other functions will be added then removed.
      
      Change the behavior to be the same as set_ftrace_function as to allow
      multiple functions to be written. If any one fails, none of them will be
      added. The addition of the functions are done at the end when the file is
      closed.
      Acked-by: default avatarNamhyung Kim <namhyung@kernel.org>
      Signed-off-by: default avatarSteven Rostedt (VMware) <rostedt@goodmis.org>
      e704eff3
    • Steven Rostedt (VMware)'s avatar
      ftrace: Do not hold references of ftrace_graph_{notrace_}hash out of graph_lock · 649b988b
      Steven Rostedt (VMware) authored
      The hashs ftrace_graph_hash and ftrace_graph_notrace_hash are modified
      within the graph_lock being held. Holding a pointer to them and passing them
      along can lead to a use of a stale pointer (fgd->hash). Move assigning the
      pointer and its use to within the holding of the lock. Note, it's an
      rcu_sched protected data, and other instances of referencing them are done
      with preemption disabled. But the file manipuation code must be protected by
      the lock.
      
      The fgd->hash pointer is set to NULL when the lock is being released.
      Acked-by: default avatarNamhyung Kim <namhyung@kernel.org>
      Signed-off-by: default avatarSteven Rostedt (VMware) <rostedt@goodmis.org>
      649b988b
    • Steven Rostedt (VMware)'s avatar
      tracing: Reset parser->buffer to allow multiple "puts" · 0e684b65
      Steven Rostedt (VMware) authored
      trace_parser_put() simply frees the allocated parser buffer. But it does not
      reset the pointer that was freed. This means that if trace_parser_put() is
      called on the same parser more than once, it will corrupt the allocation
      system. Setting parser->buffer to NULL after free allows it to be called
      more than once without any ill effect.
      Acked-by: default avatarNamhyung Kim <namhyung@kernel.org>
      Signed-off-by: default avatarSteven Rostedt (VMware) <rostedt@goodmis.org>
      0e684b65
    • Steven Rostedt (VMware)'s avatar
      ftrace: Have set_graph_functions handle write with RDWR · ae98d27a
      Steven Rostedt (VMware) authored
      Since reading the set_graph_functions uses seq functions, which sets the
      file->private_data pointer to a seq_file descriptor. On writes the
      ftrace_graph_data descriptor is set to file->private_data. But if the file
      is opened for RDWR, the ftrace_graph_write() will incorrectly use the
      file->private_data descriptor instead of
      ((struct seq_file *)file->private_data)->private pointer, and this can crash
      the kernel.
      Acked-by: default avatarNamhyung Kim <namhyung@kernel.org>
      Signed-off-by: default avatarSteven Rostedt (VMware) <rostedt@goodmis.org>
      ae98d27a
    • Steven Rostedt (VMware)'s avatar
      ftrace: Reset fgd->hash in ftrace_graph_write() · d4ad9a1c
      Steven Rostedt (VMware) authored
      fgd->hash is saved and then freed, but is never reset to either
      ftrace_graph_hash nor ftrace_graph_notrace_hash. But if multiple writes are
      performed, then the freed hash could be accessed again.
      
       # cd /sys/kernel/debug/tracing
       # head -1000 available_filter_functions > /tmp/funcs
       # cat /tmp/funcs > set_graph_function
      
      Causes:
      
       general protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC
       Modules linked in:  [...]
       CPU: 2 PID: 1337 Comm: cat Not tainted 4.10.0-rc2-test-00010-g6b052e9 #32
       Hardware name: Hewlett-Packard HP Compaq Pro 6300 SFF/339A, BIOS K01 v02.05 05/07/2012
       task: ffff880113a12200 task.stack: ffffc90001940000
       RIP: 0010:free_ftrace_hash+0x7c/0x160
       RSP: 0018:ffffc90001943db0 EFLAGS: 00010246
       RAX: 6b6b6b6b6b6b6b6b RBX: 6b6b6b6b6b6b6b6b RCX: 6b6b6b6b6b6b6b6b
       RDX: 0000000000000002 RSI: 0000000000000001 RDI: ffff8800ce1e1d40
       RBP: ffff8800ce1e1d50 R08: 0000000000000000 R09: 0000000000006400
       R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
       R13: ffff8800ce1e1d40 R14: 0000000000004000 R15: 0000000000000001
       FS:  00007f9408a07740(0000) GS:ffff88011e500000(0000) knlGS:0000000000000000
       CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
       CR2: 0000000000aee1f0 CR3: 0000000116bb4000 CR4: 00000000001406e0
       Call Trace:
        ? ftrace_graph_write+0x150/0x190
        ? __vfs_write+0x1f6/0x210
        ? __audit_syscall_entry+0x17f/0x200
        ? rw_verify_area+0xdb/0x210
        ? _cond_resched+0x2b/0x50
        ? __sb_start_write+0xb4/0x130
        ? vfs_write+0x1c8/0x330
        ? SyS_write+0x62/0xf0
        ? do_syscall_64+0xa3/0x1b0
        ? entry_SYSCALL64_slow_path+0x25/0x25
       Code: 01 48 85 db 0f 84 92 00 00 00 b8 01 00 00 00 d3 e0 85 c0 7e 3f 83 e8 01 48 8d 6f 10 45 31 e4 4c 8d 34 c5 08 00 00 00 49 8b 45 08 <4a> 8b 34 20 48 85 f6 74 13 48 8b 1e 48 89 ef e8 20 fa ff ff 48
       RIP: free_ftrace_hash+0x7c/0x160 RSP: ffffc90001943db0
       ---[ end trace 999b48216bf4b393 ]---
      Acked-by: default avatarNamhyung Kim <namhyung@kernel.org>
      Signed-off-by: default avatarSteven Rostedt (VMware) <rostedt@goodmis.org>
      d4ad9a1c
    • Steven Rostedt (VMware)'s avatar
      ftrace: Replace (void *)1 with a meaningful macro name FTRACE_GRAPH_EMPTY · 555fc781
      Steven Rostedt (VMware) authored
      When the set_graph_function or set_graph_notrace contains no records, a
      banner is displayed of either "#### all functions enabled ####" or
      "#### all functions disabled ####" respectively. To tell the seq operations
      to do this, (void *)1 is passed as a return value. Instead of using a
      hardcoded meaningless variable, define it as a macro.
      Acked-by: default avatarNamhyung Kim <namhyung@kernel.org>
      Signed-off-by: default avatarSteven Rostedt (VMware) <rostedt@goodmis.org>
      555fc781
    • Steven Rostedt (VMware)'s avatar
      ftrace: Create a slight optimization on searching the ftrace_hash · 2b2c279c
      Steven Rostedt (VMware) authored
      This is a micro-optimization, but as it has to deal with a fast path of the
      function tracer, these optimizations can be noticed.
      
      The ftrace_lookup_ip() returns true if the given ip is found in the hash. If
      it's not found or the hash is NULL, it returns false. But there's some cases
      that a NULL hash is a true, and the ftrace_hash_empty() is tested before
      calling ftrace_lookup_ip() in those cases. But as ftrace_lookup_ip() tests
      that first, that adds a few extra unneeded instructions in those cases.
      
      A new static "always_inlined" function is created that does not perform the
      hash empty test. This most only be used by callers that do the check first
      anyway, as an empty or NULL hash could cause a crash if a lookup is
      performed on it.
      
      Also add kernel doc for the ftrace_lookup_ip() main function.
      Acked-by: default avatarNamhyung Kim <namhyung@kernel.org>
      Signed-off-by: default avatarSteven Rostedt (VMware) <rostedt@goodmis.org>
      2b2c279c
    • Steven Rostedt (VMware)'s avatar
      tracing: Add ftrace_hash_key() helper function · 2b0cce0e
      Steven Rostedt (VMware) authored
      Replace the couple of use cases that has small logic to produce the ftrace
      function key id with a helper function. No need for duplicate code.
      Acked-by: default avatarNamhyung Kim <namhyung@kernel.org>
      Signed-off-by: default avatarSteven Rostedt (VMware) <rostedt@goodmis.org>
      2b0cce0e
  7. 20 Jan, 2017 3 commits
  8. 19 Jan, 2017 2 commits
  9. 17 Jan, 2017 2 commits
    • Steven Rostedt (VMware)'s avatar
      tracing: Process constants for (un)likely() profiler · d45ae1f7
      Steven Rostedt (VMware) authored
      When running the likely/unlikely profiler, one of the results did not look
      accurate. It noted that the unlikely() in link_path_walk() was 100%
      incorrect. When I added a trace_printk() to see what was happening there, it
      became 80% correct! Looking deeper into what whas happening, I found that
      gcc split that if statement into two paths. One where the if statement
      became a constant, the other path a variable. The other path had the if
      statement always hit (making the unlikely there, always false), but since
      the #define unlikely() has:
      
        #define unlikely() (__builtin_constant_p(x) ? !!(x) : __branch_check__(x, 0))
      
      Where constants are ignored by the branch profiler, the "constant" path
      made by the compiler was ignored, even though it was hit 80% of the time.
      
      By just passing the constant value to the __branch_check__() function and
      tracing it out of line (as always correct, as likely/unlikely isn't a factor
      for constants), then we get back the accurate readings of branches that were
      optimized by gcc causing part of the execution to become constant.
      Signed-off-by: default avatarSteven Rostedt (VMware) <rostedt@goodmis.org>
      d45ae1f7
    • Kenny Yu's avatar
      uprobe: Find last occurrence of ':' when parsing uprobe PATH:OFFSET · 6496bb72
      Kenny Yu authored
      Previously, `create_trace_uprobe` found the *first* occurence
      of the ':' character when parsing `PATH:OFFSET` for a uprobe.
      However, if the path contains a ':' character, then the function
      would parse the path incorrectly. Even worse, if the path does not
      exist, the subsequent call to `kern_path()` would set `ret` to
      `ENOENT`, leading to very cryptic errno values in user space.
      
      The fix is to find the *last* occurence of ':'.
      
      How to repro:: The write fails with "No such file or directory", suggesting
      incorrectly that the `uprobe_events` file does not exist.
      
        $ mkdir testing && cd testing
        $ cp /bin/bash .
        $ cp /bin/bash ./bash:with:colon
        $ echo "p:uprobes/p__root_testing_bash_0x6 /root/testing/bash:0x6" > /sys/kernel/debug/tracing/uprobe_events     # this works
        $ echo "p:uprobes/p__root_testing_bash_with_colon_0x6 /root/testing/bash:with:colon:0x6" >> /sys/kernel/debug/tracing/uprobe_events     # this doesn't
        -bash: echo: write error: No such file or directory
      
      With the patch:
      
        $ echo "p:uprobes/p__root_testing_bash_0x6 /root/testing/bash:0x6" > /sys/kernel/debug/tracing/uprobe_events     # this still works
        $ echo "p:uprobes/p__root_testing_bash_with_colon_0x6 /root/testing/bash:with:colon:0x6" >> /sys/kernel/debug/tracing/uprobe_events     # this works now too!
        $ cat /sys/kernel/debug/tracing/uprobe_events
        p:uprobes/p__root_testing_bash_0x6 /root/testing/bash:0x0000000000000006
        p:uprobes/p__root_testing_bash_with_colon_0x6 /root/testing/bash:with:colon:0x0000000000000006
      
      Link: http://lkml.kernel.org/r/20170113165834.4081016-1-kennyyu@fb.comSigned-off-by: default avatarKenny Yu <kennyyu@fb.com>
      Reviewed-by: default avatarOmar Sandoval <osandov@fb.com>
      Signed-off-by: default avatarSteven Rostedt (VMware) <rostedt@goodmis.org>
      6496bb72
  10. 01 Jan, 2017 2 commits
    • Linus Torvalds's avatar
      Linux 4.10-rc2 · 0c744ea4
      Linus Torvalds authored
      0c744ea4
    • Linus Torvalds's avatar
      Merge branch 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm · 4759d386
      Linus Torvalds authored
      Pull DAX updates from Dan Williams:
       "The completion of Jan's DAX work for 4.10.
      
        As I mentioned in the libnvdimm-for-4.10 pull request, these are some
        final fixes for the DAX dirty-cacheline-tracking invalidation work
        that was merged through the -mm, ext4, and xfs trees in -rc1. These
        patches were prepared prior to the merge window, but we waited for
        4.10-rc1 to have a stable merge base after all the prerequisites were
        merged.
      
        Quoting Jan on the overall changes in these patches:
      
           "So I'd like all these 6 patches to go for rc2. The first three
            patches fix invalidation of exceptional DAX entries (a bug which
            is there for a long time) - without these patches data loss can
            occur on power failure even though user called fsync(2). The other
            three patches change locking of DAX faults so that ->iomap_begin()
            is called in a more relaxed locking context and we are safe to
            start a transaction there for ext4"
      
        These have received a build success notification from the kbuild
        robot, and pass the latest libnvdimm unit tests. There have not been
        any -next releases since -rc1, so they have not appeared there"
      
      * 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
        ext4: Simplify DAX fault path
        dax: Call ->iomap_begin without entry lock during dax fault
        dax: Finish fault completely when loading holes
        dax: Avoid page invalidation races and unnecessary radix tree traversals
        mm: Invalidate DAX radix tree entries only if appropriate
        ext2: Return BH_New buffers for zeroed blocks
      4759d386
  11. 30 Dec, 2016 2 commits
  12. 29 Dec, 2016 2 commits
    • Olof Johansson's avatar
      mm/filemap: fix parameters to test_bit() · 98473f9f
      Olof Johansson authored
       mm/filemap.c: In function 'clear_bit_unlock_is_negative_byte':
        mm/filemap.c:933:9: error: too few arguments to function 'test_bit'
          return test_bit(PG_waiters);
               ^~~~~~~~
      
      Fixes: b91e1302 ('mm: optimize PageWaiters bit use for unlock_page()')
      Signed-off-by: default avatarOlof Johansson <olof@lixom.net>
      Brown-paper-bag-by: default avatarLinus Torvalds <dummy@duh.com>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      98473f9f
    • Linus Torvalds's avatar
      mm: optimize PageWaiters bit use for unlock_page() · b91e1302
      Linus Torvalds authored
      In commit 62906027 ("mm: add PageWaiters indicating tasks are
      waiting for a page bit") Nick Piggin made our page locking no longer
      unconditionally touch the hashed page waitqueue, which not only helps
      performance in general, but is particularly helpful on NUMA machines
      where the hashed wait queues can bounce around a lot.
      
      However, the "clear lock bit atomically and then test the waiters bit"
      sequence turns out to be much more expensive than it needs to be,
      because you get a nasty stall when trying to access the same word that
      just got updated atomically.
      
      On architectures where locking is done with LL/SC, this would be trivial
      to fix with a new primitive that clears one bit and tests another
      atomically, but that ends up not working on x86, where the only atomic
      operations that return the result end up being cmpxchg and xadd.  The
      atomic bit operations return the old value of the same bit we changed,
      not the value of an unrelated bit.
      
      On x86, we could put the lock bit in the high bit of the byte, and use
      "xadd" with that bit (where the overflow ends up not touching other
      bits), and look at the other bits of the result.  However, an even
      simpler model is to just use a regular atomic "and" to clear the lock
      bit, and then the sign bit in eflags will indicate the resulting state
      of the unrelated bit #7.
      
      So by moving the PageWaiters bit up to bit #7, we can atomically clear
      the lock bit and test the waiters bit on x86 too.  And architectures
      with LL/SC (which is all the usual RISC suspects), the particular bit
      doesn't matter, so they are fine with this approach too.
      
      This avoids the extra access to the same atomic word, and thus avoids
      the costly stall at page unlock time.
      
      The only downside is that the interface ends up being a bit odd and
      specialized: clear a bit in a byte, and test the sign bit.  Nick doesn't
      love the resulting name of the new primitive, but I'd rather make the
      name be descriptive and very clear about the limitation imposed by
      trying to work across all relevant architectures than make it be some
      generic thing that doesn't make the odd semantics explicit.
      
      So this introduces the new architecture primitive
      
          clear_bit_unlock_is_negative_byte();
      
      and adds the trivial implementation for x86.  We have a generic
      non-optimized fallback (that just does a "clear_bit()"+"test_bit(7)"
      combination) which can be overridden by any architecture that can do
      better.  According to Nick, Power has the same hickup x86 has, for
      example, but some other architectures may not even care.
      
      All these optimizations mean that my page locking stress-test (which is
      just executing a lot of small short-lived shell scripts: "make test" in
      the git source tree) no longer makes our page locking look horribly bad.
      Before all these optimizations, just the unlock_page() costs were just
      over 3% of all CPU overhead on "make test".  After this, it's down to
      0.66%, so just a quarter of the cost it used to be.
      
      (The difference on NUMA is bigger, but there this micro-optimization is
      likely less noticeable, since the big issue on NUMA was not the accesses
      to 'struct page', but the waitqueue accesses that were already removed
      by Nick's earlier commit).
      Acked-by: default avatarNick Piggin <npiggin@gmail.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Bob Peterson <rpeterso@redhat.com>
      Cc: Steven Whitehouse <swhiteho@redhat.com>
      Cc: Andrew Lutomirski <luto@kernel.org>
      Cc: Andreas Gruenbacher <agruenba@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      b91e1302
  13. 28 Dec, 2016 2 commits
    • Linus Torvalds's avatar
      Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 · 2d706e79
      Linus Torvalds authored
      Pull crypto fix from Herbert Xu:
       "This fixes a hash corruption bug in the marvell driver"
      
      * 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
        crypto: marvell - Copy IVDIG before launching partial DMA ahash requests
      2d706e79
    • Linus Torvalds's avatar
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · 8f18e4d0
      Linus Torvalds authored
      Pull networking fixes from David Miller:
      
       1) Various ipvlan fixes from Eric Dumazet and Mahesh Bandewar.
      
          The most important is to not assume the packet is RX just because
          the destination address matches that of the device. Such an
          assumption causes problems when an interface is put into loopback
          mode.
      
       2) If we retry when creating a new tc entry (because we dropped the
          RTNL mutex in order to load a module, for example) we end up with
          -EAGAIN and then loop trying to replay the request. But we didn't
          reset some state when looping back to the top like this, and if
          another thread meanwhile inserted the same tc entry we were trying
          to, we re-link it creating an enless loop in the tc chain. Fix from
          Daniel Borkmann.
      
       3) There are two different WRITE bits in the MDIO address register for
          the stmmac chip, depending upon the chip variant. Due to a bug we
          could set them both, fix from Hock Leong Kweh.
      
       4) Fix mlx4 bug in XDP_TX handling, from Tariq Toukan.
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net:
        net: stmmac: fix incorrect bit set in gmac4 mdio addr register
        r8169: add support for RTL8168 series add-on card.
        net: xdp: remove unused bfp_warn_invalid_xdp_buffer()
        openvswitch: upcall: Fix vlan handling.
        ipv4: Namespaceify tcp_tw_reuse knob
        net: korina: Fix NAPI versus resources freeing
        net, sched: fix soft lockup in tc_classify
        net/mlx4_en: Fix user prio field in XDP forward
        tipc: don't send FIN message from connectionless socket
        ipvlan: fix multicast processing
        ipvlan: fix various issues in ipvlan_process_multicast()
      8f18e4d0
  14. 27 Dec, 2016 5 commits