29 Aug, 2016 (13 commits)
    • s390/crypto: simplify CPACF encryption / decryption functions · 7bac4f5b
      Martin Schwidefsky authored
      The double while loops of the CTR mode encryption / decryption functions
      are overly complex for little gain. Simplify the functions to a single
      while loop at the cost of an additional memcpy of a few bytes for every
      4K page worth of data.
      Adapt the other crypto functions to make them all look alike.
      Reviewed-by: Harald Freudenberger <freude@linux.vnet.ibm.com>
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
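      A minimal sketch in C of the single-loop shape described above; the
      function name and the commented cpacf_kmctr call are illustrative
      placeholders, not the actual s390 code:

      #include <string.h>

      #define AES_BLOCK_SIZE 16
      #define PAGE_SIZE 4096UL

      static void ctr_walk_sketch(unsigned char *out, const unsigned char *in,
                                  unsigned long nbytes, unsigned char *iv)
      {
              unsigned char buf[AES_BLOCK_SIZE];
              unsigned long n;

              /* one while loop handles all full blocks, at most one
                 4K page per hardware call */
              while (nbytes >= AES_BLOCK_SIZE) {
                      n = nbytes > PAGE_SIZE ? PAGE_SIZE
                                             : nbytes & ~(AES_BLOCK_SIZE - 1UL);
                      /* cpacf_kmctr(fc, key, out, in, n, iv); */
                      out += n;
                      in += n;
                      nbytes -= n;
              }
              /* trailing partial block: the extra memcpy mentioned in
                 the message pads it to a full block and copies back */
              if (nbytes) {
                      memcpy(buf, in, nbytes);
                      /* process the one padded block via the hardware */
                      memcpy(out, buf, nbytes);
              }
              (void)iv;
      }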
    • s390/crypto: cpacf function detection · 69c0e360
      Martin Schwidefsky authored
      The CPACF code makes some assumptions about the availability of hardware
      support. E.g. if the machine supports KM(AES-256) without chaining, it is
      assumed that KMC(AES-256) with chaining is available as well. For the
      existing CPUs this is true, but the architecturally correct way is to
      check each CPACF function on its own. This is what the query function
      of each instruction is all about.
      Reviewed-by: Harald Freudenberger <freude@linux.vnet.ibm.com>
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
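      The architecturally correct check might look like this sketch; the
      mask layout matches the 128-bit status word the query functions
      return, but the names and the 0x14 function code are illustrative:

      typedef struct { unsigned char bytes[16]; } cpacf_mask_t;

      /* test the bit for function code 'func', most significant bit first */
      static inline int cpacf_test_func(cpacf_mask_t *mask, unsigned int func)
      {
              return (mask->bytes[func >> 3] & (0x80 >> (func & 7))) != 0;
      }

      static void detect_sketch(void)
      {
              cpacf_mask_t km_mask = { { 0 } }, kmc_mask = { { 0 } };

              /* one query per instruction: the real code would issue the
                 query function of KM and of KMC here to fill the masks,
                 never inferring KMC support from KM support */
              int km_aes_256 = cpacf_test_func(&km_mask, 0x14);
              int kmc_aes_256 = cpacf_test_func(&kmc_mask, 0x14);

              (void)km_aes_256; (void)kmc_aes_256;
      }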
    • s390/crypto: simplify init / exit functions · d863d594
      Martin Schwidefsky authored
      The aes and des modules register multiple crypto algorithms
      depending on the availability of specific CPACF instructions.
      To simplify the deregistration with crypto_unregister_alg, add
      an array with pointers to the successfully registered algorithms
      and use it for the error handling in the init function and in
      the module exit function.
      Reviewed-by: Harald Freudenberger <freude@linux.vnet.ibm.com>
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
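      The bookkeeping could look like the following sketch (names assumed,
      modeled on the description above): every successful registration is
      recorded, and both the init error path and module exit unregister
      exactly those entries:

      #include <linux/crypto.h>

      static struct crypto_alg *aes_s390_algs_ptr[5];
      static int aes_s390_algs_num;

      static int aes_s390_register_alg(struct crypto_alg *alg)
      {
              int ret = crypto_register_alg(alg);

              if (!ret)       /* remember it for later deregistration */
                      aes_s390_algs_ptr[aes_s390_algs_num++] = alg;
              return ret;
      }

      static void aes_s390_fini(void)
      {
              /* shared by the init error path and the module exit path */
              while (aes_s390_algs_num--)
                      crypto_unregister_alg(aes_s390_algs_ptr[aes_s390_algs_num]);
      }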
    • s390/crypto: simplify return code handling · 0177db01
      Martin Schwidefsky authored
      The CPACF instructions can complete with three different condition codes:
      CC=0 for successful completion, CC=1 if the protected key verification
      failed, and CC=3 for partial completion.
      
      The inline functions will restart the CPACF instruction on partial
      completion; this removes the CC=3 case. The CC=1 case is only relevant
      for the protected key functions of the KM, KMC, KMAC and KMCTR
      instructions. As the protected key functions are not used by the
      current code, there is no need for any kind of return code handling.
      Reviewed-by: Harald Freudenberger <freude@linux.vnet.ibm.com>
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
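      The contract can be pictured with this hypothetical wrapper (not the
      real inline assembly): CC=3 is consumed by the retry loop, and CC=1
      cannot occur for the clear-key function codes used here, so nothing
      needs to be returned:

      static inline void cpacf_km_sketch(unsigned long func, void *param,
                                         unsigned char *dest,
                                         const unsigned char *src, long len)
      {
              int cc;

              do {
                      /* the real code issues the KM instruction here;
                         the hardware advances src/dest and sets cc */
                      cc = 0;
              } while (cc == 3);      /* partial completion: restart */

              /* cc == 1 (protected key verification failed) is only
                 possible for protected key function codes, which the
                 current code never uses, hence no return value */
              (void)func; (void)param; (void)dest; (void)src; (void)len;
      }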
    • s390/crypto: cleanup cpacf function codes · edc63a37
      Martin Schwidefsky authored
      Use a separate define for the decryption modifier bit instead of
      duplicating the function codes for encryption / decryption.
      In addition, use an unsigned type for the function code.
      Reviewed-by: Harald Freudenberger <freude@linux.vnet.ibm.com>
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
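      In code the cleanup amounts to something like the sketch below. The
      modifier bit 0x80 and the KM-AES-128 function code 0x12 match the
      architecture definition, but take the define names as illustrative:

      #define CPACF_DECRYPT           0x80    /* one modifier-bit define */
      #define CPACF_KM_AES_128        0x12    /* function code */

      static unsigned int km_fc_sketch(int decrypt)
      {
              unsigned int fc = CPACF_KM_AES_128;     /* unsigned type */

              if (decrypt)
                      fc |= CPACF_DECRYPT;    /* no duplicated *_DEC codes */
              return fc;
      }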
    • RAID/s390: add SIMD implementation for raid6 gen/xor · 474fd6e8
      Martin Schwidefsky authored
      Using vector registers is considerably faster, about 3.7x for gen()
      compared to the best integer algorithm:
      
      raid6: vx128x8  gen() 19705 MB/s
      raid6: vx128x8  xor() 11886 MB/s
      raid6: using algorithm vx128x8 gen() 19705 MB/s
      raid6: .... xor() 11886 MB/s, rmw enabled
      
      vs the software algorithms:
      
      raid6: int64x1  gen()  3018 MB/s
      raid6: int64x1  xor()  1429 MB/s
      raid6: int64x2  gen()  4661 MB/s
      raid6: int64x2  xor()  3143 MB/s
      raid6: int64x4  gen()  5392 MB/s
      raid6: int64x4  xor()  3509 MB/s
      raid6: int64x8  gen()  4441 MB/s
      raid6: int64x8  xor()  3207 MB/s
      raid6: using algorithm int64x4 gen() 5392 MB/s
      raid6: .... xor() 3509 MB/s, rmw enabled
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
    • s390/nmi: improve revalidation of fpu / vector registers · 8f149ea6
      Martin Schwidefsky authored
      The machine check handler will do one of two things if the floating-point
      control, a floating point register or a vector register cannot be
      revalidated:
      1) if the PSW indicates user mode the process is terminated
      2) if the PSW indicates kernel mode the system is stopped
      
      Unconditionally stopping the system in case 2) is incorrect.
      
      There are three possible outcomes if the floating-point control, a
      floating point register or a vector register cannot be revalidated
      (sketched in code after this entry):
      1) The kernel is inside a kernel_fpu_begin/kernel_fpu_end block and
         needs the register. The system is stopped.
      2) No active kernel_fpu_begin/kernel_fpu_end block and the CIF_FPU bit
         is not set. The user space process needs the register and is killed.
      3) No active kernel_fpu_begin/kernel_fpu_end block and the CIF_FPU bit
         is set. Neither the kernel nor the user space process needs the
         lost register. Just revalidate it and continue.
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
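      The three cases map onto a simple decision, sketched here with
      assumed names; the real logic lives in the s390 machine check
      handler:

      static void mchk_fpu_decision_sketch(int in_kernel_fpu_block,
                                           int cif_fpu_set)
      {
              if (in_kernel_fpu_block) {
                      /* 1) the kernel needs the registers: stop the
                         system (disabled wait in the real handler) */
              } else if (!cif_fpu_set) {
                      /* 2) the user space registers were live: the
                         process is killed */
              } else {
                      /* 3) CIF_FPU set: the user registers are already
                         saved, nobody needs the lost contents; simply
                         revalidate and continue */
              }
      }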
    • s390/fpu: improve kernel_fpu_[begin|end] · 7f79695c
      Martin Schwidefsky authored
      In case of nested use of the FPU or vector registers in the kernel,
      the current code uses the FPU/vector register mask of the previous
      contexts to decide which registers to save and restore. E.g. if the
      previous context used KERNEL_VXR_V0V7 and the next context wants to
      use KERNEL_VXR_V24V31, the first 8 vector registers are stored to the
      FPU state structure. But this is not necessary, as the next context
      does not use these registers.
      
      Rework the FPU/vector register save and restore code. The new code
      does a few things differently (see the sketch after this entry):
      1) A lowcore field is used instead of a per-cpu variable.
      2) The kernel_fpu_end function now has two parameters just like
         kernel_fpu_begin. The register flags are required by both
         functions to save / restore the minimal register set.
      3) The inline functions kernel_fpu_begin/kernel_fpu_end now do the
         update of the register masks. If the user space FPU registers
         have already been stored, neither save_fpu_regs nor the
         __kernel_fpu_begin/__kernel_fpu_end functions have to be called
         for the first context. In this case kernel_fpu_begin adds 7
         instructions and kernel_fpu_end adds 4 instructions.
      4) The inline assemblies in __kernel_fpu_begin / __kernel_fpu_end
         to save / restore the vector registers are simplified a bit.
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
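      A sketch of the reworked API shape; the flag values mirror the
      KERNEL_* masks named above, everything else is illustrative:

      #define KERNEL_FPC              0x01
      #define KERNEL_VXR_V0V7         0x02
      #define KERNEL_VXR_V8V15        0x04
      #define KERNEL_VXR_V16V23       0x08
      #define KERNEL_VXR_V24V31       0x10

      struct kernel_fpu_sketch {
              unsigned int mask;      /* registers this context clobbers */
              /* + save area for fpc and the vector registers */
      };

      /* both functions take the flags, so only the overlap with the
         previous context's mask (tracked in a lowcore field) has to be
         saved on begin and restored on end */
      static inline void kernel_fpu_begin_sketch(struct kernel_fpu_sketch *s,
                                                 unsigned int flags)
      {
              s->mask = flags;
              /* save only: flags & mask-of-previous-context */
      }

      static inline void kernel_fpu_end_sketch(struct kernel_fpu_sketch *s,
                                               unsigned int flags)
      {
              /* restore only: flags & mask-of-previous-context */
              (void)s; (void)flags;
      }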
    • s390/vx: allow to include vx-insn.h with .include · 0eab11c7
      Martin Schwidefsky authored
      To make vx-insn.h more versatile, avoid cpp preprocessor macros and
      allow plain numbers to be used for vector and general purpose register
      operands. With that you can emit an .include from a C file into the
      assembler text and then use the vx-insn macros in inline assemblies.
      
      For example:
      
      asm (".include \"asm/vx-insn.h\"");
      
      static inline void xor_vec(int x, int y, int z)
      {
      	asm volatile("VX %0,%1,%2"
      		     : : "i" (x), "i" (y), "i" (z));
      }
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
    • s390/time: avoid races when updating tb_update_count · 67f03de5
      David Hildenbrand authored
      The increment might not be atomic and we're not holding the
      timekeeper_lock, so we might lose an update to the count, leaving the
      VDSO trapped in a loop. As other architectures also simply update the
      values, and the count doesn't seem to affect the reloading of these
      values in the VDSO code, let's just remove the update of
      tb_update_count.
      Suggested-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
    • s390/time: fixup the clock comparator on all cpus · 0c00b1e0
      David Hildenbrand authored
      Until now fixup_cc was left unset, so only the clock comparator of the
      cpu actually doing the sync was fixed up.
      Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
    • s390/time: cleanup etr leftovers · ca64f639
      David Hildenbrand authored
      There are still some etr leftovers and wrong comments; let's clean
      them up.
      Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
    • s390/time: simplify stp time syncs · 41ad0220
      David Hildenbrand authored
      The way we call do_adjtimex() today is broken. It has no effect, as
      ADJ_OFFSET_SINGLESHOT (0x0001) in the kernel maps to !ADJ_ADJTIME,
      in contrast to user space where it maps to ADJ_OFFSET_SINGLESHOT |
      ADJ_ADJTIME (0x8001). !ADJ_ADJTIME will silently ignore all adjustments
      unless STA_PLL is active. We could switch to ADJ_ADJTIME or turn
      STA_PLL on, but we would still run into some problems:
      
      - Even when switching to nanoseconds, we lose accuracy.
      - Successive calls to do_adjtimex() will simply overwrite any leftovers
        from the previous call, if those were not fully handled yet.
      - Anything that NTP does via the sysctl interface heavily interferes
        with our use.
      - !ADJ_ADJTIME will silently round offsets greater or smaller than
        0.5 seconds.
      
      Reusing do_adjtimex() here just feels wrong. The whole STP
      synchronization currently only works *somehow* because do_adjtimex()
      does nothing and our TOD clock jumps in time, although it shouldn't.
      This is especially bad as the clock could jump backwards in time. We
      will have to find another way to fix this up.
      
      As leap seconds are also not properly handled yet, let's just get rid of
      all this complex logic altogether and use the correct clock_delta for
      fixing up the clock comparator and keeping the sched_clock monotonic.
      
      This change should have no effect on the current STP mechanism. Once
      we know how to best handle sync events and leap second updates, we'll
      start with a fresh implementation.
      Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
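      The flag mismatch is easy to demonstrate; the values below are the
      timex.h definitions of that era, the program itself is just an
      illustration:

      #include <stdio.h>

      #define ADJ_ADJTIME             0x8000  /* kernel-internal value */
      #define ADJ_OFFSET_SINGLESHOT   0x0001  /* kernel-internal value */

      int main(void)
      {
              /* user space ADJ_OFFSET_SINGLESHOT is really 0x8001: */
              printf("user space singleshot: 0x%04x\n",
                     ADJ_OFFSET_SINGLESHOT | ADJ_ADJTIME);
              /* a bare 0x0001 inside the kernel means !ADJ_ADJTIME, so
                 the offset is silently ignored unless STA_PLL is set */
              return 0;
      }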