1. 05 Feb, 2019 6 commits
    • powerpc/eeh: Correct retries in eeh_pe_reset_full() · 195482c3
      Sam Bobroff authored
      Currently, eeh_pe_reset_full() will only attempt to reset a PE more
      than once if activating the reset state and deactivating it both
      succeed, but later polling shows that it hasn't become active.
      
      Change this so that it will try up to three times for any reason other
      than an unrecoverable slot error, and adjust the message generation so
      that it's clear whether the reset has ultimately succeeded or failed.
      This allows the reset to succeed in some situations where it would
      currently fail.
      Signed-off-by: Sam Bobroff <sbobroff@linux.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
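
      The retry policy described above can be sketched in plain C. This is a minimal, self-contained model, not the kernel's code: reset_with_retries(), enum reset_result and the scripted test helper are hypothetical names, and a reset attempt is reduced to a callback that reports success, a recoverable failure, or an unrecoverable slot error.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical outcomes of a single reset attempt (not kernel error codes). */
enum reset_result {
    RESET_OK,        /* reset done and the PE became active */
    RESET_FAILED,    /* recoverable failure: worth another try */
    RESET_SLOT_DEAD  /* unrecoverable slot error: stop immediately */
};

#define MAX_RESET_TRIES 3

/*
 * Retry a reset up to MAX_RESET_TRIES times for any failure other than an
 * unrecoverable slot error, then print a single message stating whether
 * the reset ultimately succeeded or failed.
 */
static enum reset_result reset_with_retries(enum reset_result (*attempt)(void *),
                                            void *ctx, int *tries_used)
{
    enum reset_result res = RESET_FAILED;
    int tries = 0;

    while (tries < MAX_RESET_TRIES) {
        tries++;
        res = attempt(ctx);
        if (res != RESET_FAILED)
            break;  /* success, or an error that retrying cannot fix */
    }

    if (res == RESET_OK)
        printf("reset succeeded after %d attempt(s)\n", tries);
    else
        printf("reset failed after %d attempt(s)\n", tries);

    if (tries_used)
        *tries_used = tries;
    return res;
}

/* Test helper: replays a scripted sequence of attempt results. */
struct script {
    const enum reset_result *seq;
    int pos;
};

static enum reset_result scripted_attempt(void *ctx)
{
    struct script *s = ctx;
    return s->seq[s->pos++];
}
```

      For example, a script of {RESET_FAILED, RESET_OK} succeeds on the second attempt, while {RESET_SLOT_DEAD} stops after the first attempt without retrying.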
    • powerpc/eeh: Improve recovery of passed-through devices · 1ef52073
      Sam Bobroff authored
      Currently, the EEH recovery process considers passed-through devices
      as if they were not EEH-aware, which can cause them to be removed as
      part of recovery.  Because device removal requires cooperation from
      the guest, this may lead to the process stalling or deadlocking.
      Also, if devices are removed on the host side, they will be removed
      from their IOMMU group, making recovery in the guest impossible.
      
      Therefore, alter the recovery process so that passed-through devices
      are not removed but are instead left frozen (and marked isolated)
      until the guest performs its own recovery.  If firmware thaws a
      passed-through PE because its parent PE (which was not passed
      through) has been thawed, re-freeze it.
      Signed-off-by: Sam Bobroff <sbobroff@linux.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
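
      A simplified model of the new policy, assuming a PE tree in which each node carries a passed-through flag: host-side recovery never removes passed-through PEs, and when firmware thaws a parent PE (modelled here as thawing its whole subtree), any passed-through descendants are re-frozen and marked isolated. All names (struct pe, fw_thaw(), refreeze_passed(), thaw_for_recovery(), recover_subtree()) are hypothetical and only illustrate the idea.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified PE: a tree node with a few state flags. */
struct pe {
    bool passed;        /* passed through to a guest */
    bool frozen;
    bool isolated;
    bool removed;       /* host removed the devices under this PE */
    struct pe *child;   /* first child */
    struct pe *next;    /* next sibling */
};

/* Model of firmware thawing a PE, which also thaws its descendants. */
static void fw_thaw(struct pe *pe)
{
    pe->frozen = false;
    for (struct pe *c = pe->child; c; c = c->next)
        fw_thaw(c);
}

/* Re-freeze (and isolate) any passed-through PE below pe. */
static void refreeze_passed(struct pe *pe)
{
    for (struct pe *c = pe->child; c; c = c->next) {
        if (c->passed) {
            c->frozen = true;
            c->isolated = true;   /* left for the guest to recover */
        }
        refreeze_passed(c);
    }
}

/* Thaw a non-passed PE without letting passed-through children stay thawed. */
static void thaw_for_recovery(struct pe *pe)
{
    fw_thaw(pe);
    refreeze_passed(pe);
}

/* Host-side recovery: remove devices, but never passed-through ones. */
static void recover_subtree(struct pe *pe)
{
    if (!pe->passed)
        pe->removed = true;
    for (struct pe *c = pe->child; c; c = c->next)
        recover_subtree(c);
}
```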
    • powerpc/eeh: Add include_passed to eeh_clear_pe_frozen_state() · 4d8e325d
      Sam Bobroff authored
      Add a parameter to eeh_clear_pe_frozen_state() that allows
      passed-through PEs to be excluded. Update callers to always pass true
      so that there is no change in behaviour.
      
      This is to prepare for follow-up work for passed-through devices.
      Signed-off-by: Sam Bobroff <sbobroff@linux.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
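
      A sketch of what an include_passed parameter does to a recursive traversal, using a hypothetical toy PE tree rather than the kernel's struct eeh_pe. Skipping a passed-through PE's whole subtree is an assumption made here for simplicity; passing true reproduces the old behaviour, which is what the updated callers do.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy PE node. */
struct pe {
    bool passed;        /* passed through to a guest */
    bool frozen;
    struct pe *child;   /* first child */
    struct pe *next;    /* next sibling */
};

/*
 * Clear the frozen state of a PE and its descendants. With
 * include_passed == false, passed-through PEs (and, in this sketch,
 * everything below them) are left untouched.
 */
static void clear_frozen(struct pe *pe, bool include_passed)
{
    if (!include_passed && pe->passed)
        return;
    pe->frozen = false;
    for (struct pe *c = pe->child; c; c = c->next)
        clear_frozen(c, include_passed);
}
```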
    • powerpc/eeh: Add include_passed to eeh_pe_state_clear() · 9ed5ca66
      Sam Bobroff authored
      Add a parameter to eeh_pe_state_clear() that allows passed-through PEs
      to be excluded. Update callers to always pass true so that there is no
      change in behaviour.
      
      Also refactor to use direct traversal, to allow the removal of some
      boilerplate.
      
      This is to prepare for follow-up work for passed-through devices.
      Signed-off-by: Sam Bobroff <sbobroff@linux.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
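
      The "direct traversal" refactor can be illustrated by putting a callback-driven walk next to a directly recursive one. Both are hypothetical simplifications (traverse(), state_clear_cb() and state_clear() are not kernel functions, and the flag values are made up); they compute the same result, but the direct version needs no callback and no argument-packing struct.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define PE_ISOLATED 0x1u   /* hypothetical flag values */
#define PE_RESET    0x2u

struct pe {
    unsigned int state;
    bool passed;
    struct pe *child;   /* first child */
    struct pe *next;    /* next sibling */
};

/* "Before": a generic walk driven by a callback plus an argument struct. */
struct clear_args {
    unsigned int flags;
    bool include_passed;
};

static void traverse(struct pe *pe, void (*fn)(struct pe *, void *), void *data)
{
    fn(pe, data);
    for (struct pe *c = pe->child; c; c = c->next)
        traverse(c, fn, data);
}

static void clear_cb(struct pe *pe, void *data)
{
    struct clear_args *a = data;

    if (!a->include_passed && pe->passed)
        return;
    pe->state &= ~a->flags;
}

static void state_clear_cb(struct pe *root, unsigned int flags, bool include_passed)
{
    struct clear_args a = { flags, include_passed };

    traverse(root, clear_cb, &a);
}

/* "After": direct recursion, no callback or argument struct needed. */
static void state_clear(struct pe *root, unsigned int flags, bool include_passed)
{
    if (include_passed || !root->passed)
        root->state &= ~flags;
    for (struct pe *c = root->child; c; c = c->next)
        state_clear(c, flags, include_passed);
}
```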
    • powerpc/eeh: remove sw_state from eeh_unfreeze_pe() · 188fdea6
      Sam Bobroff authored
      eeh_unfreeze_pe() performs two operations: unfreezing a PE (which may
      cause firmware to unfreeze child PEs as well) and de-isolating the PE
      and its children.
      
      To simplify this and support future work, separate out the
      de-isolation and perform it at the call sites (when necessary).
      
      There should be no change in behaviour.
      Signed-off-by: Sam Bobroff <sbobroff@linux.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
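
      A sketch of the split, assuming a toy PE tree in which the firmware unfreeze is modelled as thawing a PE and its children: unfreeze_pe() touches only the frozen state, and a call site that needs de-isolation calls a separate helper afterwards. All names here are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define PE_ISOLATED 0x1u   /* hypothetical flag value */

struct pe {
    unsigned int state;
    bool frozen;
    struct pe *child;   /* first child */
    struct pe *next;    /* next sibling */
};

/* Model of the firmware unfreeze: thaws the PE and its children. */
static int fw_unfreeze(struct pe *pe)
{
    pe->frozen = false;
    for (struct pe *c = pe->child; c; c = c->next)
        fw_unfreeze(c);
    return 0;
}

/* After the split: unfreezing no longer touches the software state. */
static int unfreeze_pe(struct pe *pe)
{
    return fw_unfreeze(pe);
}

/* De-isolation is a separate step that callers perform when needed. */
static void clear_isolated(struct pe *pe)
{
    pe->state &= ~PE_ISOLATED;
    for (struct pe *c = pe->child; c; c = c->next)
        clear_isolated(c);
}

/* A hypothetical call site that used to pass sw_state == true. */
static int unfreeze_and_deisolate(struct pe *pe)
{
    int rc = unfreeze_pe(pe);

    if (rc == 0)
        clear_isolated(pe);
    return rc;
}
```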
    • powerpc/eeh: Cleanup eeh_pe_clear_frozen_state() · 3376cb91
      Sam Bobroff authored
      The 'clear_sw_state' parameter for eeh_pe_clear_frozen_state() is
      redundant because it has no effect (except in the rare case of a
      hardware error part way through unfreezing a tree of PEs, where it
      would dangerously allow partial de-isolation before returning
      failure).
      
      It is passed down to __eeh_pe_clear_frozen_state(), and from there to
      eeh_unfreeze_pe(), where it causes EEH_PE_ISOLATED to be removed
      from the state of each PE during the traversal.  However, when the
      traversal finishes, EEH_PE_ISOLATED is unconditionally removed by a
      call to eeh_pe_state_clear() regardless of the parameter's value.
      
      So remove the flag and pass false to eeh_unfreeze_pe() (to avoid the
      rare case described above, as it was before the flag was introduced).
      Also, perform the recursion directly in the function and eliminate a
      bit of boilerplate.
      
      There should be no change in functionality, except as mentioned above.
      Signed-off-by: Sam Bobroff <sbobroff@linux.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
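
      The reasoning above can be checked against a small model of the pre-cleanup code, in which the old flag is still present. The names and the fw_fails test hook are hypothetical. On success, the final unconditional clear makes the flag irrelevant; on a failure part way through the tree, passing true leaves the PEs processed so far de-isolated, which is why the cleanup passes false.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define PE_ISOLATED 0x1u   /* hypothetical flag value */

struct pe {
    unsigned int state;
    bool frozen;
    bool fw_fails;      /* test hook: simulate a hardware error on unfreeze */
    struct pe *child;   /* first child */
    struct pe *next;    /* next sibling */
};

/* Old-style unfreeze: optionally clears ISOLATED as it goes (sw_state). */
static int unfreeze(struct pe *pe, bool sw_state)
{
    if (pe->fw_fails)
        return -1;
    pe->frozen = false;
    if (sw_state)
        pe->state &= ~PE_ISOLATED;
    return 0;
}

/* Depth-first unfreeze of the tree, stopping at the first failure. */
static int clear_frozen_tree(struct pe *pe, bool sw_state)
{
    int rc = unfreeze(pe, sw_state);

    for (struct pe *c = pe->child; c && rc == 0; c = c->next)
        rc = clear_frozen_tree(c, sw_state);
    return rc;
}

static void clear_isolated(struct pe *pe)
{
    pe->state &= ~PE_ISOLATED;
    for (struct pe *c = pe->child; c; c = c->next)
        clear_isolated(c);
}

/* After a successful traversal, ISOLATED is cleared regardless of sw_state. */
static int clear_pe_frozen_state(struct pe *root, bool sw_state)
{
    int rc = clear_frozen_tree(root, sw_state);

    if (rc == 0)
        clear_isolated(root);
    return rc;
}
```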
  2. 04 Feb, 2019 1 commit
  3. 03 Feb, 2019 2 commits
  4. 01 Feb, 2019 1 commit
  5. 31 Jan, 2019 5 commits
    • powerpc/livepatch: return -ERRNO values in save_stack_trace_tsk_reliable() · 3de27dcf
      Joe Lawrence authored
      To match its x86 counterpart, save_stack_trace_tsk_reliable() should
      return -EINVAL in the cases where it currently returns 1.  No caller
      currently differentiates between non-zero error codes, but let's keep the
      arch-specific implementations consistent.
      Signed-off-by: Joe Lawrence <joe.lawrence@redhat.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
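
      A minimal sketch of the convention change, using a hypothetical check_frame() helper rather than the real unwinder: error paths return a negative errno value, and a caller that only tests for non-zero behaves identically under either convention.

```c
#include <assert.h>
#include <errno.h>

/*
 * Hypothetical frame check in a reliable unwinder: a stack pointer at or
 * beyond the end of the task's stack means the trace cannot be trusted.
 * It returns -EINVAL where the old convention would have returned 1.
 */
static int check_frame(unsigned long sp, unsigned long stack_end)
{
    if (sp >= stack_end)
        return -EINVAL;
    return 0;
}

/* Callers only test for non-zero, so they behave the same either way. */
static int trace_is_reliable(unsigned long sp, unsigned long stack_end)
{
    return check_frame(sp, stack_end) == 0;
}
```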
    • powerpc/livepatch: small cleanups in save_stack_trace_tsk_reliable() · 29a77bbb
      Joe Lawrence authored
      Mostly cosmetic changes:
      
      - Group common stack pointer code at the top
      - Simplify the first frame logic
      - Convert the stack frame iteration into a for loop
      - Check for trace->nr_entries overflow before adding any entries to the array
      Suggested-by: Nicolai Stange <nstange@suse.de>
      Signed-off-by: Joe Lawrence <joe.lawrence@redhat.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
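
      A sketch of the loop shape described in the list, using a hypothetical linked frame structure in place of a real powerpc stack: the walk is a single for loop over backchain pointers, and the overflow check runs before an entry is stored. The -1 return value is only a placeholder for an error code.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_ENTRIES 4

struct trace {
    unsigned int nr_entries;
    unsigned int max_entries;   /* must not exceed MAX_ENTRIES */
    unsigned long entries[MAX_ENTRIES];
};

/* Hypothetical simplified frame: a backchain link and a saved return address. */
struct frame {
    const struct frame *backchain;  /* NULL marks the end of the stack */
    unsigned long lr;               /* saved return address */
};

/*
 * Walk the frames with a single for loop. The overflow check comes before
 * storing an entry, so the array is never written past max_entries.
 */
static int walk(const struct frame *first, struct trace *trace)
{
    for (const struct frame *f = first; f; f = f->backchain) {
        if (trace->nr_entries >= trace->max_entries)
            return -1;   /* placeholder error: trace does not fit */
        trace->entries[trace->nr_entries++] = f->lr;
    }
    return 0;
}
```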
    • powerpc/livepatch: relax reliable stack tracer checks for first-frame · 18be3760
      Joe Lawrence authored
      The bottom-most stack frame (the first to be unwound) may be largely
      uninitialized, because the "Power Architecture 64-Bit ELF V2 ABI" only
      requires its backchain pointer to be set.
      
      The reliable stack tracer should be careful when verifying this frame:
      skip checks on STACK_FRAME_LR_SAVE and STACK_FRAME_MARKER offsets that
      may contain uninitialized residual data.
      
      Fixes: df78d3f6 ("powerpc/livepatch: Implement reliable stack tracing for the consistency model")
      Signed-off-by: Joe Lawrence <joe.lawrence@redhat.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
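
      A toy model of the relaxed check, assuming a hypothetical frame layout with a backchain, a saved LR and a marker word, plus an arbitrary marker value: the first frame is only followed through its backchain, while every later frame is rejected if it carries the exception marker or a return address outside a made-up kernel-text range.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define REGS_MARKER 0x52454753UL   /* arbitrary stand-in marker ("REGS") */

/* Hypothetical simplified frame layout. */
struct frame {
    const struct frame *backchain;
    unsigned long lr;       /* may hold stale data in the first frame */
    unsigned long marker;   /* may hold stale data in the first frame */
};

/* Toy "kernel text" range for the return-address check. */
static bool lr_is_kernel_text(unsigned long lr)
{
    return lr >= 0x1000 && lr < 0x2000;
}

/* True if the walk is reliable; the first frame only needs a backchain. */
static bool walk_is_reliable(const struct frame *first)
{
    bool firstframe = true;

    for (const struct frame *f = first; f; f = f->backchain, firstframe = false) {
        if (firstframe)
            continue;           /* skip LR and marker checks here */
        if (f->marker == REGS_MARKER)
            return false;       /* exception frame: not reliable */
        if (!lr_is_kernel_text(f->lr))
            return false;
    }
    return true;
}
```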
    • powerpc/64s: Make reliable stacktrace dependency clearer · a50d3250
      Nicolai Stange authored
      Make the HAVE_RELIABLE_STACKTRACE Kconfig option depend on
      PPC_BOOK3S_64 for documentation purposes. Before this patch, it
      depended on PPC64 && CPU_LITTLE_ENDIAN, and because CPU_LITTLE_ENDIAN
      implies PPC_BOOK3S_64, there's no functional change here.
      Signed-off-by: Nicolai Stange <nstange@suse.de>
      Signed-off-by: Joe Lawrence <joe.lawrence@redhat.com>
      [mpe: Split out of larger patch]
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/64s: Clear on-stack exception marker upon exception return · eddd0b33
      Nicolai Stange authored
      The ppc64 specific implementation of the reliable stacktracer,
      save_stack_trace_tsk_reliable(), bails out and reports an "unreliable
      trace" whenever it finds an exception frame on the stack. Stack frames
      are classified as exception frames if the STACK_FRAME_REGS_MARKER
      magic, as written by exception prologues, is found at a particular
      location.
      
      However, as observed by Joe Lawrence, non-exception stack frames can in
      practice alias with prior exception frames, so the reliable
      stacktracer can find a stale STACK_FRAME_REGS_MARKER on the stack. It
      then falsely reports an unreliable stacktrace and prevents any live
      patching transition from finishing. This condition lasts until the
      stack frame is overwritten or initialized by a function call or other
      means.
      
      In principle, we could mitigate this by making the exception frame
      classification condition in save_stack_trace_tsk_reliable() stronger:
      in addition to testing for STACK_FRAME_REGS_MARKER, we could also take
      into account that for all exceptions executing on the kernel stack
        - their stack frames' backlink pointers always match what is saved
          in their pt_regs instance's ->gpr[1] slot and that
        - their exception frame size equals STACK_INT_FRAME_SIZE, a value
          uncommonly large for non-exception frames.
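
      For illustration only, the stronger classification considered above might look like the following; this is the alternative the message goes on to reject, not the merged change. The struct layout, the marker value and INT_FRAME_SIZE (a stand-in for STACK_INT_FRAME_SIZE) are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define REGS_MARKER    0x52454753UL   /* arbitrary stand-in marker value */
#define INT_FRAME_SIZE 512UL          /* hypothetical exception frame size */

/* Simplified view of a candidate frame on the kernel stack. */
struct frame_view {
    unsigned long sp;          /* address of this frame */
    unsigned long backlink;    /* backchain pointer stored in the frame */
    unsigned long marker;      /* word at the marker offset */
    unsigned long saved_gpr1;  /* gpr[1] in the pt_regs that would follow */
};

/* Marker-only test: a stale marker produces a false positive. */
static bool is_exception_frame_weak(const struct frame_view *f)
{
    return f->marker == REGS_MARKER;
}

/* Stronger test: marker, matching saved gpr[1], and exception frame size. */
static bool is_exception_frame_strong(const struct frame_view *f)
{
    return f->marker == REGS_MARKER &&
           f->backlink == f->saved_gpr1 &&
           f->backlink - f->sp == INT_FRAME_SIZE;
}
```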
      
      However, while these are currently true, relying on them would make
      the reliable stacktrace implementation more sensitive towards future
      changes in the exception entry code. Note that false negatives, i.e.
      not detecting exception frames, would silently break the live patching
      consistency model.
      
      Furthermore, certain other places (diagnostic stacktraces, perf, xmon)
      rely on STACK_FRAME_REGS_MARKER as well.
      
      Make the exception exit code clear the on-stack
      STACK_FRAME_REGS_MARKER for those exceptions running on the "normal"
      kernel stack and returning to kernelspace: because the topmost frame
      is ignored by the reliable stack tracer anyway, returns to userspace
      don't need to take care of clearing the marker.
      
      Furthermore, as I don't have the ability to test this on Book 3E or 32
      bits, limit the change to Book 3S and 64 bits.
      
      Fixes: df78d3f6 ("powerpc/livepatch: Implement reliable stack tracing for the consistency model")
      Reported-by: Joe Lawrence <joe.lawrence@redhat.com>
      Signed-off-by: Nicolai Stange <nstange@suse.de>
      Signed-off-by: Joe Lawrence <joe.lawrence@redhat.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
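
      The change described above lives in the exception exit assembly; the C model below only sketches the idea. The marker value, struct exc_frame and both functions are hypothetical: entry writes the marker, and exit clears it only when returning to kernel space, since returns to user space can leave it in place.

```c
#include <assert.h>
#include <stdbool.h>

#define REGS_MARKER 0x52454753UL   /* arbitrary stand-in marker value */

/* Hypothetical exception frame: only the marker word is modelled. */
struct exc_frame {
    unsigned long marker;
};

/* Exception entry writes the marker into the on-stack frame. */
static void exception_entry(struct exc_frame *f)
{
    f->marker = REGS_MARKER;
}

/*
 * Exception exit: when returning to kernel space, clear the marker so a
 * later non-exception frame reusing this stack memory cannot alias it.
 * Returns to user space skip this, as the topmost frame is ignored anyway.
 */
static void exception_exit(struct exc_frame *f, bool returning_to_user)
{
    if (!returning_to_user)
        f->marker = 0;
}
```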
  6. 30 Jan, 2019 8 commits
  7. 15 Jan, 2019 12 commits
  8. 14 Jan, 2019 5 commits