1. 28 Mar, 2017 1 commit
    • Steven Rostedt (VMware)'s avatar
      ftrace/x86: Do no run CPU sync when there is only one CPU online · 2b87965a
      Steven Rostedt (VMware) authored
      Moving enabling of function tracing to early boot, even before scheduling is
      enabled, means that it is not safe to enable interrupts. When function
      tracing was enabled at boot up, it use to happen after scheduling and the
      other CPUs were brought up. That required running a sync across all CPUs
      when modifying the function hook locations in the code. To do the
      synchronization, interrupts had to be enabled. Now function tracing can be
      started before the other CPUs are brought up, and enabling interrupts in
      that case is dangerous. As only tho boot CPU is active, there is no reason
      to run the synchronization. If the online CPU count is one, do not bother
      doing the synchronization. This removes the need to enable interrupts.
      
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Signed-off-by: default avatarSteven Rostedt (VMware) <rostedt@goodmis.org>
      2b87965a
  2. 25 Mar, 2017 4 commits
    • Steven Rostedt (VMware)'s avatar
      tracing: Move trace_handle_return() out of line · af0009fc
      Steven Rostedt (VMware) authored
      Currently trace_handle_return() looks like this:
      
       static inline enum print_line_t trace_handle_return(struct trace_seq *s)
       {
              return trace_seq_has_overflowed(s) ?
                      TRACE_TYPE_PARTIAL_LINE : TRACE_TYPE_HANDLED;
       }
      
      Where trace_seq_overflowed(s) is:
      
       static inline bool trace_seq_has_overflowed(struct trace_seq *s)
       {
      	return s->full || seq_buf_has_overflowed(&s->seq);
       }
      
      And seq_buf_has_overflowed(&s->seq) is:
      
       static inline bool
       seq_buf_has_overflowed(struct seq_buf *s)
       {
      	return s->len > s->size;
       }
      
      Making trace_handle_return() into:
      
       return (s->full || (s->seq->len > s->seq->size)) ?
                 TRACE_TYPE_PARTIAL_LINE :
                 TRACE_TYPE_HANDLED;
      
      One would think this is not an issue to keep as an inline. But because this
      is used in the TRACE_EVENT() macro, it is extended for every tracepoint in
      the system. Taking a look at a single tracepoint x86_irq_vector (was the
      first one I randomly chosen). As trace_handle_return is used in the
      TRACE_EVENT() macro of trace_raw_output_##call() we disassemble
      trace_raw_output_x86_irq_vector and do a diff:
      
      - is the original
      + is the out-of-line code
      
      I removed identical lines that were different just due to different
      addresses.
      
      --- /tmp/irq-vec-orig	2017-03-16 09:12:48.569384851 -0400
      +++ /tmp/irq-vec-ool	2017-03-16 09:13:39.378153385 -0400
      @@ -6,27 +6,23 @@
              53                      push   %rbx
              48 89 fb                mov    %rdi,%rbx
              4c 8b a7 c0 20 00 00    mov    0x20c0(%rdi),%r12
              e8 f7 72 13 00          callq  ffffffff81155c80 <trace_raw_output_prep>
              83 f8 01                cmp    $0x1,%eax
              74 05                   je     ffffffff8101e993 <trace_raw_output_x86_irq_vector+0x23>
              5b                      pop    %rbx
              41 5c                   pop    %r12
              5d                      pop    %rbp
              c3                      retq
              41 8b 54 24 08          mov    0x8(%r12),%edx
      -       48 8d bb 98 10 00 00    lea    0x1098(%rbx),%rdi
      +       48 81 c3 98 10 00 00    add    $0x1098,%rbx
      -       48 c7 c6 7b 8a a0 81    mov    $0xffffffff81a08a7b,%rsi
      +       48 c7 c6 ab 8a a0 81    mov    $0xffffffff81a08aab,%rsi
      -       e8 c5 85 13 00          callq  ffffffff81156f70 <trace_seq_printf>
      
       === here's the start of the main difference ===
      
      +       48 89 df                mov    %rbx,%rdi
      +       e8 62 7e 13 00          callq  ffffffff81156810 <trace_seq_printf>
      -       8b 93 b8 20 00 00       mov    0x20b8(%rbx),%edx
      -       31 c0                   xor    %eax,%eax
      -       85 d2                   test   %edx,%edx
      -       75 11                   jne    ffffffff8101e9c8 <trace_raw_output_x86_irq_vector+0x58>
      -       48 8b 83 a8 20 00 00    mov    0x20a8(%rbx),%rax
      -       48 39 83 a0 20 00 00    cmp    %rax,0x20a0(%rbx)
      -       0f 93 c0                setae  %al
      +       48 89 df                mov    %rbx,%rdi
      +       e8 4a c5 12 00          callq  ffffffff8114af00 <trace_handle_return>
              5b                      pop    %rbx
      -       0f b6 c0                movzbl %al,%eax
      
       === end ===
      
              41 5c                   pop    %r12
              5d                      pop    %rbp
              c3                      retq
      
      If you notice, the original has 22 bytes of text more than the out of line
      version. As this is for every TRACE_EVENT() defined in the system, this can
      become quite large.
      
         text	   data	    bss	    dec	    hex	filename
      8690305	5450490	1298432	15439227	 eb957b	vmlinux-orig
      8681725	5450490	1298432	15430647	 eb73f7	vmlinux-handle
      
      This change has a total of 8580 bytes in savings.
      
       $ objdump -dr /tmp/vmlinux-orig | grep '^[0-9a-f]* <trace_raw_output' | wc -l
      324
      
      That's 324 tracepoints. But this does not include modules (which contain
      many more tracepoints). For an allyesconfig build:
      
       $ objdump -dr vmlinux-allyes-orig | grep '^[0-9a-f]* <trace_raw_output' | wc -l
      1401
      
      That's 1401 tracepoints giving us:
      
         text    data     bss     dec     hex filename
      137920629       140221067       53264384        331406080       13c0db00 vmlinux-allyes-orig
      137827709       140221067       53264384        331313160       13bf7008 vmlinux-allyes-handle
      
      92920 bytes in savings!!!
      
      Link: http://lkml.kernel.org/r/20170315021431.13107-2-andi@firstfloor.orgReported-by: default avatarAndi Kleen <andi@firstfloor.org>
      Signed-off-by: default avatarSteven Rostedt (VMware) <rostedt@goodmis.org>
      af0009fc
    • Steven Rostedt (VMware)'s avatar
      ftrace: Allow for function tracing to record init functions on boot up · 42c269c8
      Steven Rostedt (VMware) authored
      Adding a hook into free_reserve_area() that informs ftrace that boot up init
      text is being free, lets ftrace safely remove those init functions from its
      records, which keeps ftrace from trying to modify text that no longer
      exists.
      
      Note, this still does not allow for tracing .init text of modules, as
      modules require different work for freeing its init code.
      
      Link: http://lkml.kernel.org/r/1488502497.7212.24.camel@linux.intel.com
      
      Cc: linux-mm@kvack.org
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Requested-by: default avatarTodd Brandt <todd.e.brandt@linux.intel.com>
      Signed-off-by: default avatarSteven Rostedt (VMware) <rostedt@goodmis.org>
      42c269c8
    • Steven Rostedt (VMware)'s avatar
      ftrace: Have function tracing start in early boot up · dbeafd0d
      Steven Rostedt (VMware) authored
      Register the function tracer right after the tracing buffers are initialized
      in early boot up. This will allow function tracing to begin early if it is
      enabled via the kernel command line.
      Signed-off-by: default avatarSteven Rostedt (VMware) <rostedt@goodmis.org>
      dbeafd0d
    • Steven Rostedt (VMware)'s avatar
      tracing: Postpone tracer start-up tests till the system is more robust · 9afecfbb
      Steven Rostedt (VMware) authored
      As tracing can now be enabled very early in boot up, even before some
      critical system services (like scheduling), do not run the tracer selftests
      until after early_initcall() is performed. If a tracer is registered before
      such time, it is saved off in a list and the test is run when the system is
      able to handle more diverse functions.
      Signed-off-by: default avatarSteven Rostedt (VMware) <rostedt@goodmis.org>
      9afecfbb
  3. 24 Mar, 2017 2 commits
  4. 20 Mar, 2017 5 commits
    • Linus Torvalds's avatar
      Linux 4.11-rc3 · 97da3854
      Linus Torvalds authored
      97da3854
    • Linus Torvalds's avatar
      mm/swap: don't BUG_ON() due to uninitialized swap slot cache · 452b94b8
      Linus Torvalds authored
      This BUG_ON() triggered for me once at shutdown, and I don't see a
      reason for the check.  The code correctly checks whether the swap slot
      cache is usable or not, so an uninitialized swap slot cache is not
      actually problematic afaik.
      
      I've temporarily just switched the BUG_ON() to a WARN_ON_ONCE(), since
      I'm not sure why that seemingly pointless check was there.  I suspect
      the real fix is to just remove it entirely, but for now we'll warn about
      it but not bring the machine down.
      
      Cc: "Huang, Ying" <ying.huang@intel.com>
      Cc: Tim Chen <tim.c.chen@linux.intel.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      452b94b8
    • Linus Torvalds's avatar
      Merge tag 'powerpc-4.11-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux · a07a6e41
      Linus Torvalds authored
      Pull more powerpc fixes from Michael Ellerman:
       "A couple of minor powerpc fixes for 4.11:
      
         - wire up statx() syscall
      
         - don't print a warning on memory hotplug when HPT resizing isn't
           available
      
        Thanks to: David Gibson, Chandan Rajendra"
      
      * tag 'powerpc-4.11-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
        powerpc/pseries: Don't give a warning when HPT resizing isn't available
        powerpc: Wire up statx() syscall
      a07a6e41
    • Linus Torvalds's avatar
      Merge branch 'parisc-4.11-2' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux · 4571bc5a
      Linus Torvalds authored
      Pull parisc fixes from Helge Deller:
      
       - Mikulas Patocka added support for R_PARISC_SECREL32 relocations in
         modules with CONFIG_MODVERSIONS.
      
       - Dave Anglin optimized the cache flushing for vmap ranges.
      
       - Arvind Yadav provided a fix for a potential NULL pointer dereference
         in the parisc perf code (and some code cleanups).
      
       - I wired up the new statx system call, fixed some compiler warnings
         with the access_ok() macro and fixed shutdown code to really halt a
         system at shutdown instead of crashing & rebooting.
      
      * 'parisc-4.11-2' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
        parisc: Fix system shutdown halt
        parisc: perf: Fix potential NULL pointer dereference
        parisc: Avoid compiler warnings with access_ok()
        parisc: Wire up statx system call
        parisc: Optimize flush_kernel_vmap_range and invalidate_kernel_vmap_range
        parisc: support R_PARISC_SECREL32 relocation in modules
      4571bc5a
    • Linus Torvalds's avatar
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending · 8aa34172
      Linus Torvalds authored
      Pull SCSI target fixes from Nicholas Bellinger:
       "The bulk of the changes are in qla2xxx target driver code to address
        various issues found during Cavium/QLogic's internal testing (stable
        CC's included), along with a few other stability and smaller
        miscellaneous improvements.
      
        There are also a couple of different patch sets from Mike Christie,
        which have been a result of his work to use target-core ALUA logic
        together with tcm-user backend driver.
      
        Finally, a patch to address some long standing issues with
        pass-through SCSI export of TYPE_TAPE + TYPE_MEDIUM_CHANGER devices,
        which will make folks using physical (or virtual) magnetic tape happy"
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending: (28 commits)
        qla2xxx: Update driver version to 9.00.00.00-k
        qla2xxx: Fix delayed response to command for loop mode/direct connect.
        qla2xxx: Change scsi host lookup method.
        qla2xxx: Add DebugFS node to display Port Database
        qla2xxx: Use IOCB interface to submit non-critical MBX.
        qla2xxx: Add async new target notification
        qla2xxx: Export DIF stats via debugfs
        qla2xxx: Improve T10-DIF/PI handling in driver.
        qla2xxx: Allow relogin to proceed if remote login did not finish
        qla2xxx: Fix sess_lock & hardware_lock lock order problem.
        qla2xxx: Fix inadequate lock protection for ABTS.
        qla2xxx: Fix request queue corruption.
        qla2xxx: Fix memory leak for abts processing
        qla2xxx: Allow vref count to timeout on vport delete.
        tcmu: Convert cmd_time_out into backend device attribute
        tcmu: make cmd timeout configurable
        tcmu: add helper to check if dev was configured
        target: fix race during implicit transition work flushes
        target: allow userspace to set state to transitioning
        target: fix ALUA transition timeout handling
        ...
      8aa34172
  5. 19 Mar, 2017 15 commits
  6. 18 Mar, 2017 13 commits
    • Nicholas Bellinger's avatar
      tcmu: Convert cmd_time_out into backend device attribute · 7d7a7435
      Nicholas Bellinger authored
      Instead of putting cmd_time_out under ../target/core/user_0/foo/control,
      which has historically been used by parameters needed for initial
      backend device configuration, go ahead and move cmd_time_out into
      a backend device attribute.
      
      In order to do this, tcmu_module_init() has been updated to create
      a local struct configfs_attribute **tcmu_attrs, that is based upon
      the existing passthrough_attrib_attrs along with the new cmd_time_out
      attribute.  Once **tcm_attrs has been setup, go ahead and point
      it at tcmu_ops->tb_dev_attrib_attrs so it's picked up by target-core.
      
      Also following MNC's previous change, ->cmd_time_out is stored in
      milliseconds but exposed via configfs in seconds.  Also, note this
      patch restricts the modification of ->cmd_time_out to before +
      after the TCMU device has been configured, but not while it has
      active fabric exports.
      
      Cc: Mike Christie <mchristi@redhat.com>
      Signed-off-by: default avatarNicholas Bellinger <nab@linux-iscsi.org>
      7d7a7435
    • Mike Christie's avatar
      tcmu: make cmd timeout configurable · af980e46
      Mike Christie authored
      A single daemon could implement multiple types of devices
      using multuple types of real devices that may not support
      restarting from crashes and/or handling tcmu timeouts. This
      makes the cmd timeout configurable, so handlers that do not
      support it can turn if off for now.
      Signed-off-by: default avatarMike Christie <mchristi@redhat.com>
      Signed-off-by: default avatarNicholas Bellinger <nab@linux-iscsi.org>
      af980e46
    • Mike Christie's avatar
      tcmu: add helper to check if dev was configured · 972c7f16
      Mike Christie authored
      This adds a helper to check if the dev was configured. It
      will be used in the next patch to prevent updates to some
      config settings after the device has been setup.
      Signed-off-by: default avatarMike Christie <mchristi@redhat.com>
      Signed-off-by: default avatarNicholas Bellinger <nab@linux-iscsi.org>
      972c7f16
    • Linus Torvalds's avatar
      Merge tag 'openrisc-for-linus' of git://github.com/openrisc/linux · 93afaa45
      Linus Torvalds authored
      Pull OpenRISC fixes from Stafford Horne:
       "OpenRISC fixes for build issues that were exposed by kbuild robots
        after 4.11 merge. All from allmodconfig builds. This includes:
      
         - bug in the handling of 8-byte get_user() calls
      
         - module build failure due to multile missing symbol exports"
      
      * tag 'openrisc-for-linus' of git://github.com/openrisc/linux:
        openrisc: Export symbols needed by modules
        openrisc: fix issue handling 8 byte get_user calls
        openrisc: xchg: fix `computed is not used` warning
      93afaa45
    • Mike Christie's avatar
      target: fix race during implicit transition work flushes · 760bf578
      Mike Christie authored
      This fixes the following races:
      
      1. core_alua_do_transition_tg_pt could have read
      tg_pt_gp_alua_access_state and gone into this if chunk:
      
      if (!explicit &&
              atomic_read(&tg_pt_gp->tg_pt_gp_alua_access_state) ==
                 ALUA_ACCESS_STATE_TRANSITION) {
      
      and then core_alua_do_transition_tg_pt_work could update the
      state. core_alua_do_transition_tg_pt would then only set
      tg_pt_gp_alua_pending_state and the tg_pt_gp_alua_access_state would
      not get updated with the second calls state.
      
      2. core_alua_do_transition_tg_pt could be setting
      tg_pt_gp_transition_complete while the tg_pt_gp_transition_work
      is already completing. core_alua_do_transition_tg_pt then waits on the
      completion that will never be called.
      
      To handle these issues, we just call flush_work which will return when
      core_alua_do_transition_tg_pt_work has completed so there is no need
      to do the complete/wait. And, if core_alua_do_transition_tg_pt_work
      was running, instead of trying to sneak in the state change, we just
      schedule up another core_alua_do_transition_tg_pt_work call.
      
      Note that this does not handle a possible race where there are multiple
      threads call core_alua_do_transition_tg_pt at the same time. I think
      we need a mutex in target_tg_pt_gp_alua_access_state_store.
      Signed-off-by: default avatarMike Christie <mchristi@redhat.com>
      Signed-off-by: default avatarNicholas Bellinger <nab@linux-iscsi.org>
      760bf578
    • Mike Christie's avatar
      target: allow userspace to set state to transitioning · 1ca4d4fa
      Mike Christie authored
      Userspace target_core_user handlers like tcmu-runner may want to set the
      ALUA state to transitioning while it does implicit transitions. This
      patch allows that state when set from configfs.
      Signed-off-by: default avatarMike Christie <mchristi@redhat.com>
      Signed-off-by: default avatarNicholas Bellinger <nab@linux-iscsi.org>
      1ca4d4fa
    • Mike Christie's avatar
      target: fix ALUA transition timeout handling · d7175373
      Mike Christie authored
      The implicit transition time tells initiators the min time
      to wait before timing out a transition. We currently schedule
      the transition to occur in tg_pt_gp_implicit_trans_secs
      seconds so there is no room for delays. If
      core_alua_do_transition_tg_pt_work->core_alua_update_tpg_primary_metadata
      needs to write out info to a remote file, then the initiator can
      easily time out the operation.
      Signed-off-by: default avatarMike Christie <mchristi@redhat.com>
      Signed-off-by: default avatarNicholas Bellinger <nab@linux-iscsi.org>
      d7175373
    • Mike Christie's avatar
      target: Use system workqueue for ALUA transitions · 207ee841
      Mike Christie authored
      If tcmu-runner is processing a STPG and needs to change the kernel's
      ALUA state then we cannot use the same work queue for task management
      requests and ALUA transitions, because we could deadlock. The problem
      occurs when a STPG times out before tcmu-runner is able to
      call into target_tg_pt_gp_alua_access_state_store->
      core_alua_do_port_transition -> core_alua_do_transition_tg_pt ->
      queue_work. In this case, the tmr is on the work queue waiting for
      the STPG to complete, but the STPG transition is now queued behind
      the waiting tmr.
      
      Note:
      This bug will also be fixed by this patch:
      http://www.spinics.net/lists/target-devel/msg14560.html
      which switches the tmr code to use the system workqueues.
      
      For both, I am not sure if we need a dedicated workqueue since
      it is not a performance path and I do not think we need WQ_MEM_RECLAIM
      to make forward progress to free up memory like the block layer does.
      Signed-off-by: default avatarMike Christie <mchristi@redhat.com>
      Signed-off-by: default avatarNicholas Bellinger <nab@linux-iscsi.org>
      207ee841
    • Mike Christie's avatar
      target: fail ALUA transitions for pscsi · 0a414572
      Mike Christie authored
      We do not setup the LU group for pscsi devices, so if you write
      a state to alua_access_state that will cause a transition you will
      get a NULL pointer dereference.
      
      This patch will fail attempts to try and transition the path
      for backend devices that set the TRANSPORT_FLAG_PASSTHROUGH_ALUA
      flag.
      Signed-off-by: default avatarMike Christie <mchristi@redhat.com>
      Signed-off-by: default avatarNicholas Bellinger <nab@linux-iscsi.org>
      0a414572
    • Mike Christie's avatar
      target: allow ALUA setup for some passthrough backends · 530c6891
      Mike Christie authored
      This patch allows passthrough backends to use the core/base LIO
      ALUA setup and state checks, but still handle the execution of
      commands.
      
      This will allow the target_core_user module to execute STPG and RTPG
      in userspace, and not have to duplicate the ALUA state checks, path
      information (needed so we can check if command is executable on
      specific paths) and setup (rtslib sets/updates the configfs ALUA
      interface like it does for iblock or file).
      
      For STPG, the target_core_user userspace daemon, tcmu-runner will
      still execute the STPG, and to update the core/base LIO state it
      will use the existing configfs interface. For RTPG, tcmu-runner
      will loop over configfs and/or cache the state.
      Signed-off-by: default avatarMike Christie <mchristi@redhat.com>
      Signed-off-by: default avatarNicholas Bellinger <nab@linux-iscsi.org>
      530c6891
    • Mike Christie's avatar
      tcmu: return on first Opt parse failure · 2579325c
      Mike Christie authored
      We only were returing failure if the last opt to be parsed failed.
      This has a return failure when we first detect a failure.
      Signed-off-by: default avatarMike Christie <mchristi@redhat.com>
      Signed-off-by: default avatarNicholas Bellinger <nab@linux-iscsi.org>
      2579325c
    • Mike Christie's avatar
      tcmu: allow hw_max_sectors greater than 128 · 3abaa2bf
      Mike Christie authored
      tcmu hard codes the hw_max_sectors to 128 which is a litle small.
      Userspace uses the max_sectors to report the optimal IO size and
      some initiators perform better with larger IOs (open-iscsi seems
      to do better with 256 to 512 depending on the test).
      
      (Fix do not display hw max sectors twice - MNC)
      Signed-off-by: default avatarMike Christie <mchristi@redhat.com>
      Signed-off-by: default avatarNicholas Bellinger <nab@linux-iscsi.org>
      3abaa2bf
    • Nicholas Bellinger's avatar
      target: Drop pointless tfo->check_stop_free check · 9c28ca4f
      Nicholas Bellinger authored
      All in-tree fabric drivers provide a tfo->check_stop_free(),
      so there is no need to do the extra check within existing
      transport_cmd_check_stop_to_fabric() code.
      
      Just to be sure, add a check in target_fabric_tf_ops_check()
      to notify any out-of-tree drivers that might be missing it.
      Signed-off-by: default avatarNicholas Bellinger <nab@linux-iscsi.org>
      9c28ca4f