• Ross Zwisler's avatar
    x86/asm: Add support for the CLWB instruction · d9dc64f3
    Ross Zwisler authored
    Add support for the new CLWB (cache line write back)
    instruction.  This instruction was announced in the document
    "Intel Architecture Instruction Set Extensions Programming
    Reference" with reference number 319433-022.
    
      https://software.intel.com/sites/default/files/managed/0d/53/319433-022.pdf
    
    The CLWB instruction is used to write back the contents of
    dirtied cache lines to memory without evicting the cache lines
    from the processor's cache hierarchy.  This should be used in
    favor of clflushopt or clflush in cases where you require the
    cache line to be written to memory but plan to access the data
    again in the near future.
    
    One of the main use cases for this is with persistent memory
    where CLWB can be used with PCOMMIT to ensure that data has been
    accepted to memory and is durable on the DIMM.
    
    This function shows how to properly use CLWB/CLFLUSHOPT/CLFLUSH
    and PCOMMIT with appropriate fencing:
    
    void flush_and_commit_buffer(void *vaddr, unsigned int size)
    {
    	void *vend = vaddr + size - 1;
    
    	for (; vaddr < vend; vaddr += boot_cpu_data.x86_clflush_size)
    		clwb(vaddr);
    
    	/* Flush any possible final partial cacheline */
    	clwb(vend);
    
    	/*
    	 * Use SFENCE to order CLWB/CLFLUSHOPT/CLFLUSH cache flushes.
    	 * (MFENCE via mb() also works)
    	 */
    	wmb();
    
    	/* PCOMMIT and the required SFENCE for ordering */
    	pcommit_sfence();
    }
    
    After this function completes the data pointed to by vaddr is
    has been accepted to memory and will be durable if the vaddr
    points to persistent memory.
    
    Regarding the details of how the alternatives assembly is set
    up, we need one additional byte at the beginning of the CLFLUSH
    so that we can flip it into a CLFLUSHOPT by changing that byte
    into a 0x66 prefix.  Two options are to either insert a 1 byte
    ASM_NOP1, or to add a 1 byte NOP_DS_PREFIX.  Both have no
    functional effect with the plain CLFLUSH, but I've been told
    that executing a CLFLUSH + prefix should be faster than
    executing a CLFLUSH + NOP.
    
    We had to hard code the assembly for CLWB because, lacking the
    ability to assemble the CLWB instruction itself, the next
    closest thing is to have an xsaveopt instruction with a 0x66
    prefix.  Unfortunately XSAVEOPT itself is also relatively new,
    and isn't included by all the GCC versions that the kernel needs
    to support.
    Signed-off-by: default avatarRoss Zwisler <ross.zwisler@linux.intel.com>
    Acked-by: default avatarBorislav Petkov <bp@suse.de>
    Acked-by: default avatarH. Peter Anvin <hpa@linux.intel.com>
    Cc: Andy Lutomirski <luto@amacapital.net>
    Cc: Borislav Petkov <bp@alien8.de>
    Cc: Brian Gerst <brgerst@gmail.com>
    Cc: Denys Vlasenko <dvlasenk@redhat.com>
    Cc: H. Peter Anvin <hpa@zytor.com>
    Cc: Linus Torvalds <torvalds@linux-foundation.org>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Link: http://lkml.kernel.org/r/1422377631-8986-3-git-send-email-ross.zwisler@linux.intel.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
    d9dc64f3
cpufeature.h 26.8 KB