Commits · 907fa061cc24f7092db4fcb7938b9cb6ea7dd15e · Kirill Smelkov / linux

28 Jun, 2016 16 commits

Heiko Carstens authored Jun 20, 2016

Now that hopefully all inline assemblies have been converted to single
basic blocks we can enable kcov on s390.

Note that this patch does not disable as many files on s390 like the
x86 variant does. Right now I didn't see a reason to do that, however
additional files or directories can be excluded at any time.

The runtime overhead seems to be quite high.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>

907fa061

s390/cpumf: use basic block for ecctr inline assembly · e238c15e

Heiko Carstens authored Jun 20, 2016

Use only simple inline assemblies which consist of a single basic
block if the register asm construct is being used.

Otherwise gcc would generate broken code if the compiler option
--sanitize-coverage=trace-pc would be used.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>

e238c15e

s390/hypfs: use basic block for diag inline assembly · e030c112

Heiko Carstens authored Jun 20, 2016

Use only simple inline assemblies which consist of a single basic
block if the register asm construct is being used.

Otherwise gcc would generate broken code if the compiler option
--sanitize-coverage=trace-pc would be used.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>

e030c112

s390/sysinfo: use basic block for stsi inline assembly · 2c79813a

Heiko Carstens authored Jun 20, 2016

Use only simple inline assemblies which consist of a single basic
block if the register asm construct is being used.

Otherwise gcc would generate broken code if the compiler option
--sanitize-coverage=trace-pc would be used.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>

2c79813a

s390/smp: use basic blocks for sigp inline assemblies · 80a60f6e

Heiko Carstens authored Jun 20, 2016

Use only simple inline assemblies which consist of a single basic
block if the register asm construct is being used.

Otherwise gcc would generate broken code if the compiler option
--sanitize-coverage=trace-pc would be used.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>

80a60f6e

s390/cio: use basic blocks for i/o inline assemblies · a4d9b97c

Heiko Carstens authored Jun 20, 2016

Use only simple inline assemblies which consist of a single basic
block if the register asm construct is being used.

Otherwise gcc would generate broken code if the compiler option
--sanitize-coverage=trace-pc would be used.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Acked-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>

a4d9b97c

s390/cio: use basic blocks for cmf inline assemblies · 7b4ff87c

Heiko Carstens authored Jun 20, 2016

Use only simple inline assemblies which consist of a single basic
block if the register asm construct is being used.

Otherwise gcc would generate broken code if the compiler option
--sanitize-coverage=trace-pc would be used.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Acked-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>

7b4ff87c

s390/pci: use basic blocks for pci inline assemblies · 7b411ac6

Heiko Carstens authored Jun 20, 2016

Use only simple inline assemblies which consist of a single basic
block if the register asm construct is being used.

Otherwise gcc would generate broken code if the compiler option
--sanitize-coverage=trace-pc would be used.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Acked-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>

7b411ac6

s390/iucv: use basic blocks for iucv inline assemblies · eb090ad2

Heiko Carstens authored Jun 20, 2016

Use only simple inline assemblies which consist of a single basic
block if the register asm construct is being used.

Otherwise gcc would generate broken code if the compiler option
--sanitize-coverage=trace-pc would be used.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>

eb090ad2

s390/crypto: use basic blocks for ap bus inline assemblies · 11d37673

Heiko Carstens authored Jun 20, 2016

Use only simple inline assemblies which consist of a single basic
block if the register asm construct is being used.

Otherwise gcc would generate broken code if the compiler option
--sanitize-coverage=trace-pc would be used.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>

11d37673

s390/mm: use basic block for essa inline assembly · 931641c6

Heiko Carstens authored Jun 20, 2016

Use only simple inline assemblies which consist of a single basic
block if the register asm construct is being used.

Otherwise gcc would generate broken code if the compiler option
--sanitize-coverage=trace-pc would be used.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>

931641c6

s390/lib: use basic blocks for inline assemblies · db7f5eef

Heiko Carstens authored Jun 20, 2016

Use only simple inline assemblies which consist of a single basic
block if the register asm construct is being used.

Otherwise gcc would generate broken code if the compiler option
--sanitize-coverage=trace-pc would be used.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>

db7f5eef

s390/uaccess: fix __put_get_user_asm define · dc4aace1

Heiko Carstens authored Jun 20, 2016

The __put_get_user_asm defines an inline assmembly which makes use of
the asm register construct. The parameters passed to that define may
also contain function calls.

It is a gcc restriction that between register asm statements and the
use of any such annotated variables function calls may clobber the
register / variable contents. Or in other words: gcc would generate
broken code.

This can be achieved e.g. with the following code:

    get_user(x, func() ? a : b);

where the call of func would clobber register zero which is used by
the __put_get_user_asm define.
To avoid this add two static inline functions which don't have these
side effects.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>

dc4aace1

s390/cpuinfo: rename cpu field to cpu number · 109ab954

Heiko Carstens authored Jun 21, 2016

The cpu field name within /proc/cpuinfo has a conflict with the
powerpc and sparc output where it contains the cpu model name.  So
rename the field name to cpu number which shouldn't generate any
conflicts.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>

109ab954

s390/perf: remove perf_release/reserver_sampling functions · 2764196f

Heiko Carstens authored Jun 11, 2016

Now that the oprofile sampling code is gone there is only one user of
the sampling facility left. Therefore the reserve and release
functions can be removed.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Reviewed-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>

2764196f

s390/oprofile: remove hardware sampler support · 93dd49d0

Heiko Carstens authored Jun 10, 2016

Remove hardware sampler support from oprofile module.

The oprofile user space utilty has been switched to use the kernel
perf interface, for which we also provide hardware sampling support.

In addition the hardware sampling support is also slightly broken: it
supports only 16 bits for the pid and therefore would generate wrong
results on machines which have a pid >64k.

Also the pt_regs structure which was passed to oprofile common code
cannot necessarily be used to generate sane backtraces, since the
task(s) in question may run while the samples are fed to oprofile.
So the result would be more or less random.

However given that the only user space tools switched to the perf
interface already four years ago the hardware sampler code seems to be
unused code, and therefore it should be reasonable to remove it.

The timer based oprofile support continues to work.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Acked-by: Andreas Arnez <arnez@linux.vnet.ibm.com>
Acked-by: Andreas Krebbel <krebbel@linux.vnet.ibm.com>
Acked-by: Robert Richter <rric@kernel.org>
Reviewed-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>

93dd49d0

15 Jun, 2016 4 commits

s390: remove math emulation code · 11a7752e

Heiko Carstens authored Jul 01, 2015

The last in-kernel user is gone so we can finally remove this code.
Reviewed-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>

11a7752e

s390: calculate loops_per_jiffies with fp instructions · 9c203239

Heiko Carstens authored Jul 01, 2015

Implement calculation of loops_per_jiffies with fp instructions which
are available on all 64 bit machines.
To save and restore floating point register context use the new vx support
functions.
Reviewed-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>

9c203239

s390: Updated kernel config files · a086171a

Hendrik Brueckner authored Jun 24, 2015

Signed-off-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>

a086171a

s390/crc32-vx: add crypto API module for optimized CRC-32 algorithms · f848dbd3

Hendrik Brueckner authored Apr 28, 2015

Add a crypto API module to access the vector extension based CRC-32
implementations.  Users can request the optimized implementation through
the shash crypto API interface.
Signed-off-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>

f848dbd3

14 Jun, 2016 3 commits

s390/crc32-vx: use vector instructions to optimize CRC-32 computation · 19c93787

Hendrik Brueckner authored Apr 28, 2015

Use vector instructions to optimize the computation of CRC-32 checksums.
An optimized version is provided for CRC-32 (IEEE 802.3 Ethernet) in
normal and bitreflected domain, as well as, for bitreflected CRC-32C
(Castagnoli).
Signed-off-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>

19c93787

s390/vx: add support functions for in-kernel FPU use · 04864808

Hendrik Brueckner authored Feb 18, 2015

Introduce the kernel_fpu_begin() and kernel_fpu_end() function
to enclose any in-kernel use of FPU instructions and registers.
In enclosed sections, you can perform floating-point or vector
(SIMD) computations.  The functions take care of saving and
restoring FPU register contents and controls.

For usage details, see the guidelines in arch/s390/include/asm/fpu/api.h
Signed-off-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>

04864808

s390/mm: fix compile for PAGE_DEFAULT_KEY != 0 · de3fa841

Heiko Carstens authored Jun 14, 2016

The usual problem for code that is ifdef'ed out is that it doesn't
compile after a while. That's also the case for the storage key
initialisation code, if it would be used (set PAGE_DEFAULT_KEY to
something not zero):

./arch/s390/include/asm/page.h: In function 'storage_key_init_range':
./arch/s390/include/asm/page.h:36:2: error: implicit declaration of function '__storage_key_init_range'

Since the code itself has been useful for debugging purposes several
times, remove the ifdefs and make sure the code gets compiler
coverage. The cost for this is eight bytes.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>

de3fa841

13 Jun, 2016 17 commits

s390/topology: remove z10 special handling · 86d18a55

Heiko Carstens authored May 25, 2016

I don't have a z10 to test this anymore, so I have no idea if the code
works at all or even crashes. I can try to emulate, but it is just
guess work.

Nor do we know if the z10 special handling is performance wise still
better than the generic handling. There have been a lot of changes to
the scheduler.

Therefore let's play safe and remove the special handling.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>

86d18a55

s390/topology: add drawer scheduling domain level · adac0f1e

Heiko Carstens authored May 25, 2016

The z13 machine added a fourth level to the cpu topology
information. The new top level is called drawer.

A drawer contains two books, which used to be the top level.

Adding this additional scheduling domain did show performance
improvements for some workloads of up to 8%, while there don't
seem to be any workloads impacted in a negative way.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>

adac0f1e

topology/sysfs: provide drawer id and siblings attributes · a62247e1

Heiko Carstens authored May 21, 2016

The s390 cpu topology gained another hierarchy level. The top level is
now called drawer and contains several books. A book used to be the
top level.

In order to expose the cpu topology to user space allow to create new
sysfs attributes dependent on CONFIG_SCHED_DRAWER which an
architecture may define and select.

These additional attributes will be available:

/sys/devices/system/cpu/cpuX/topology/drawer_id
/sys/devices/system/cpu/cpuX/topology/drawer_siblings
/sys/devices/system/cpu/cpuX/topology/drawer_siblings_list
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>

a62247e1

s390/ipl: rename diagnose enums · 0599eead

Heiko Carstens authored Jun 11, 2016

Rename DIAG308_IPL and DIAG308_DUMP to DIAG308_LOAD_CLEAR and
DIAG308_LOAD_NORMAL_DUMP to better reflect the associated IPL
functions.
Suggested-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Suggested-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Acked-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Reviewed-by: Peter Oberparleiter <oberpar@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>

0599eead

s390/ipl: use load normal for LPAR re-ipl · 0f7451ff

Heiko Carstens authored Jun 10, 2016

Avoid clearing memory for CCW-type re-ipl within a logical
partition. This can save a significant amount of time if a logical
partition contains a lot of memory.

On the other hand we still clear memory if running within a second
level hypervisor, since the hypervisor can simply free all memory that
was used for the guest.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Reviewed-by: Peter Oberparleiter <oberpar@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>

0f7451ff

s390: avoid extable collisions · 6c22c986

Heiko Carstens authored Jun 10, 2016

We have some inline assemblies where the extable entry points to a
label at the end of an inline assembly which is not followed by an
instruction.

On the other hand we have also inline assemblies where the extable
entry points to the first instruction of an inline assembly.

If a first type inline asm (extable point to empty label at the end)
would be directly followed by a second type inline asm (extable points
to first instruction) then we would have two different extable entries
that point to the same instruction but would have a different target
address.

This can lead to quite random behaviour, depending on sorting order.

I verified that we currently do not have such collisions within the
kernel. However to avoid such subtle bugs add a couple of nop
instructions to those inline assemblies which contain an extable that
points to an empty label.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>

6c22c986

s390/uaccess: use __builtin_expect for get_user/put_user · ee64baf4

Heiko Carstens authored Jun 13, 2016

We always expect that get_user and put_user return with zero. Give the
compiler a hint so it can slightly optimize the code and avoid
branches.
This is the same what x86 got with commit a76cf66e ("x86/uaccess:
Tell the compiler that uaccess is unlikely to fault").
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>

ee64baf4

s390/uaccess: fix whitespace damage · b8ac5e2f

Heiko Carstens authored Jun 10, 2016

Fix some whitespace damage that was introduced by me with a
query-replace when removing 31 bit support.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>

b8ac5e2f

s390/pci: ensure to not cross a dma segment boundary · 8ee2db3c

Sebastian Ott authored Jun 03, 2016

When we use the iommu_area_alloc helper to get dma addresses
we specify the boundary_size parameter but not the offset (called
shift in this context).

As long as the offset (start_dma) is a multiple of the boundary
we're ok (on current machines start_dma always seems to be 4GB).

Don't leave this to chance and specify the offset for iommu_area_alloc.
Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Reviewed-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>

8ee2db3c

s390/pci: ensure page aligned dma start address · 53b1bc9a

Sebastian Ott authored Jun 03, 2016

We don't have an architectural guarantee on the value of
the dma offset but rely on it to be at least page aligned.
Enforce page alignemt of start_dma.
Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Reviewed-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>

53b1bc9a

s390: use SPARSE_IRQ · bb98f396

Sebastian Ott authored Jun 02, 2016

Use dynamically allocated irq descriptors on s390 which allows
us to get rid of the s390 specific config option PCI_NR_MSI and
exploit more MSI interrupts. Also the size of the kernel image
is reduced by 131K (using performance_defconfig).
Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>

bb98f396

s390/Documentation: improve sort command for trace buffer · 1b8b9c81

Thomas Richter authored Jun 08, 2016

When s390 traces with hex_ascii or sprintf view are
extracted and sorted, use the sort option -s (stable)
to avoid multiple lines with the same time stamp being
sorted using the rest of the line as secondary key.
Signed-off-by: Thomas Richter <tmricht@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>

1b8b9c81

s390: use __section macro everywhere · 72a9b02d

Heiko Carstens authored Jun 07, 2016

Small cleanup patch to use the shorter __section macro everywhere.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>

72a9b02d

s390: add proper __ro_after_init support · d07a980c

Heiko Carstens authored Jun 07, 2016

On s390 __ro_after_init is currently mapped to __read_mostly which
means that data marked as __ro_after_init will not be protected.

Reason for this is that the common code __ro_after_init implementation
is x86 centric: the ro_after_init data section was added to rodata,
since x86 enables write protection to kernel text and rodata very
late. On s390 we have write protection for these sections enabled with
the initial page tables. So adding the ro_after_init data section to
rodata does not work on s390.

In order to make __ro_after_init work properly on s390 move the
ro_after_init data, right behind rodata. Unlike the rodata section it
will be marked read-only later after all init calls happened.

This s390 specific implementation adds new __start_ro_after_init and
__end_ro_after_init labels. Everything in between will be marked
read-only after the init calls happened. In addition to the
__ro_after_init data move also the exception table there, since from a
practical point of view it fits the __ro_after_init requirements.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>

d07a980c

vmlinux.lds.h: allow arch specific handling of ro_after_init data section · 32fb2fc5

Heiko Carstens authored Jun 07, 2016

commit c74ba8b3 ("arch: Introduce post-init read-only memory")
introduced the __ro_after_init attribute which allows to add variables
to the ro_after_init data section.

This new section was added to rodata, even though it contains writable
data. This in turn causes problems on architectures which mark the
page table entries read-only that point to rodata very early.

This patch allows architectures to implement an own handling of the
.data..ro_after_init section.
Usually that would be:
- mark the rodata section read-only very early
- mark the ro_after_init section read-only within mark_rodata_ro
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>

32fb2fc5

s390/mm: simplify the TLB flushing code · 64f31d58

Martin Schwidefsky authored May 25, 2016

ptep_flush_lazy and pmdp_flush_lazy use mm->context.attach_count to
decide between a lazy TLB flush vs an immediate TLB flush. The field
contains two 16-bit counters, the number of CPUs that have the mm
attached and can create TLB entries for it and the number of CPUs in
the middle of a page table update.

The __tlb_flush_asce, ptep_flush_direct and pmdp_flush_direct functions
use the attach counter and a mask check with mm_cpumask(mm) to decide
between a local flush local of the current CPU and a global flush.

For all these functions the decision between lazy vs immediate and
local vs global TLB flush can be based on CPU masks. There are two
masks: the mm->context.cpu_attach_mask with the CPUs that are actively
using the mm, and the mm_cpumask(mm) with the CPUs that have used the
mm since the last full flush. The decision between lazy vs immediate
flush is based on the mm->context.cpu_attach_mask, to decide between
local vs global flush the mm_cpumask(mm) is used.

With this patch all checks will use the CPU masks, the old counter
mm->context.attach_count with its two 16-bit values is turned into a
single counter mm->context.flush_count that keeps track of the number
of CPUs with incomplete page table updates. The sole user of this
counter is finish_arch_post_lock_switch() which waits for the end of
all page table updates.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>

64f31d58

bitmap: bitmap_equal memcmp optimization · 7dd96816

Martin Schwidefsky authored May 25, 2016

The bitmap_equal function has optimized code for small bitmaps with less
than BITS_PER_LONG bits. For larger bitmaps the out-of-line function
__bitmap_equal is called.

For a constant number of bits divisible by BITS_PER_LONG the memcmp
function can be used. For s390 gcc knows how to optimize this function,
memcmp calls with up to 256 bytes / 2048 bits are translated into a
single instruction.
Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>

7dd96816