Commits · 1f716d05f8daee4f393dc568ea7a53c7ecfd0bfc · Kirill Smelkov / linux


06 May, 2016 2 commits

nfit: add sysfs dimm 'family' and 'dsm_mask' attributes · a94e3fbe

Dan Williams authored 8 years ago

Communicate the command format and supported functions to userspace
tooling.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
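Userspace tooling might consume the new `dsm_mask` attribute as shown in this minimal sketch. It assumes the attribute reads back as a hexadecimal bitmask string (for example `0x3fe`) in which bit N set means _DSM function N is supported. Both the string format and the helper name `dsm_function_supported()` are assumptions; neither comes from this commit.

```c
#include <stdbool.h>
#include <stdlib.h>

/*
 * Hypothetical helper: parse a sysfs-style hex mask such as "0x3fe\n"
 * and report whether _DSM function 'fn' is advertised.
 * Assumes bit N of the mask corresponds to function N.
 */
static bool dsm_function_supported(const char *mask_str, unsigned int fn)
{
	char *end;
	unsigned long long mask = strtoull(mask_str, &end, 0);

	if (end == mask_str || fn >= 64)
		return false;	/* unparsable string or out-of-range function */
	return (mask >> fn) & 1ULL;
}
```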

nfit: disable vendor specific commands · 87554098

Dan Williams authored 8 years ago

Module option to limit userspace to the publicly defined command set.
For cases where private DIMM commands may be interfering with the
kernel's handling of DIMM state, this option can be set to block
vendor-specific commands.

Cc: Jerry Hoemann <jerry.hoemann@hpe.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
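The policy check this option implies can be sketched as follows, assuming a boolean module parameter and a single command number reserved for vendor-specific payloads. The names `block_vendor_specific`, `CMD_VENDOR` and `check_cmd_allowed()`, the command numbers, and the `-EPERM` error code are all hypothetical. The real driver's identifiers and error code may differ.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical command numbers; CMD_VENDOR is the vendor passthrough. */
enum { CMD_SMART = 1, CMD_GET_CONFIG_DATA = 5, CMD_VENDOR = 9 };

/* Stand-in for a module option set at load time. */
static bool block_vendor_specific;

/* Return 0 if the command may be forwarded to firmware, -EPERM otherwise. */
static int check_cmd_allowed(int cmd)
{
	if (cmd == CMD_VENDOR && block_vendor_specific)
		return -EPERM;
	return 0;
}
```

Keeping the decision in one predicate separates the policy (whether to trust undocumented commands) from the command marshaling code.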

02 May, 2016 1 commit

nfit: fix translation of command status results · 2eea6582

Dan Williams authored 8 years ago

When the command is transported successfully, the 'status' field of the
result is valid.  Add the missing check and translation of the status
field at the end of acpi_nfit_ctl().  Otherwise, we fail to handle
reported errors and assume that commands complete successfully.

Reported-by: Linda Knippers <linda.knippers@hpe.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
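A successful transport is not the same as a successful command, as this sketch shows. For illustration only, it assumes that the low 16 bits of the returned 'status' carry a status code, where 0 means success and 1 means "not supported". The errno mapping and the name `translate_dsm_status()` are hypothetical.

```c
#include <errno.h>
#include <stdint.h>

/*
 * Hypothetical translation step run after the _DSM call returns.
 * transport_rc: result of evaluating the method (0 = payload is valid).
 * status:       the 'status' field from the output payload.
 */
static int translate_dsm_status(int transport_rc, uint32_t status)
{
	if (transport_rc)
		return transport_rc;	/* no valid payload to inspect */

	switch (status & 0xffff) {	/* assumed: low 16 bits = status code */
	case 0:
		return 0;
	case 1:
		return -EOPNOTSUPP;
	default:
		return -EIO;		/* any other reported failure */
	}
}
```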

28 Apr, 2016 2 commits

nfit, libnvdimm: limited/whitelisted dimm command marshaling mechanism · 31eca76b

Dan Williams authored 8 years ago

There are currently 4 known similar but incompatible definitions of the
command sets that can be sent to an NVDIMM through ACPI.  It is also
clear that future platform generations (ACPI or not) will continue to
revise and extend the DIMM command set as new devices and use cases
arrive.

It is obviously untenable for these command definitions to keep
diverging, and to that end a standardization process has begun to
provide for a unified specification.  However, that leaves the problem
of what to do with this first generation, where vendors are already
shipping divergent implementations.

The Linux kernel can support these initial diverged platforms without
giving platform-firmware free rein to continue to diverge and compound
kernel maintenance overhead.  The kernel implementation can encourage
standardization in two ways:

1/ Require that any function code that userspace wants to send be
   explicitly white-listed in the implementation.  For ACPI this means
   function codes marked as supported by acpi_check_dsm() may
   only be invoked if they appear in the white-list.  A function must be
   publicly documented before it is added to the white-list.

2/ The above restrictions can be trivially bypassed by using the
   "vendor-specific" payload command.  However, since vendor-specific
   commands are by definition not publicly documented and have the
   potential to corrupt the kernel's view of the dimm state, we provide a
   toggle to disable vendor-specific operations.  Enabling undefined
   behavior is a policy decision that can be made by the platform owner
   and encourages firmware implementations to choose public over
   private command implementations.

Based on an initial patch from Jerry Hoemann
Cc: Jerry Hoemann <jerry.hoemann@hpe.com>
Cc: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
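Rule 1/ amounts to intersecting two bitmasks: the functions firmware claims to support (as probed via acpi_check_dsm()) and the functions the kernel has whitelisted. In this minimal sketch, `WHITELIST_MASK` and its bit positions are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical whitelist: publicly documented functions 1..6. */
#define WHITELIST_MASK ((1ULL << 1) | (1ULL << 2) | (1ULL << 3) | \
			(1ULL << 4) | (1ULL << 5) | (1ULL << 6))

/* Functions userspace may invoke: supported by firmware AND whitelisted. */
static uint64_t effective_dsm_mask(uint64_t firmware_mask)
{
	return firmware_mask & WHITELIST_MASK;
}

static bool may_invoke(uint64_t firmware_mask, unsigned int fn)
{
	return fn < 64 && ((effective_dsm_mask(firmware_mask) >> fn) & 1);
}
```

A function that firmware advertises but that is not yet publicly documented (bit 7 here) stays blocked until it is added to the whitelist.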

nfit, libnvdimm: clarify "commands" vs "_DSMs" · e3654eca

Dan Williams authored 8 years ago

Clarify the distinction between "commands", the ioctls userspace calls
to request the kernel take some action on a given dimm device, and
"_DSMs", the actual function numbers used in the firmware interface to
the DIMM.  _DSMs are ACPI-specific, whereas commands are generic to the
Linux kernel.

This is in preparation for breaking the 1:1 implicit relationship
between the kernel ioctl number space and the firmware specific function
numbers.

Cc: Jerry Hoemann <jerry.hoemann@hpe.com>
Cc: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

11 Apr, 2016 2 commits

libnvdimm, nfit: Use ACPI_SIG_NFIT instead of hard coded string · 82595423

Lee, Chun-Yi authored 9 years ago

It's minor, but it is still better to use ACPI_SIG_NFIT instead of a
hard-coded string.

Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Len Brown <lenb@kernel.org>
Signed-off-by: Lee, Chun-Yi <jlee@suse.com>
Acked-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

libnvdimm, nfit: report multiple interface codes per-dimm · 8cc6ddfc

Dan Williams authored 8 years ago

Starting with ACPI 6.1 an NFIT table will report multiple 'NVDIMM
Control Region Structure' instances per-dimm, one for each supported
format interface.  Report those codes in sysfs with the following layout:

    nmemX/nfit/formats
    nmemX/nfit/format
    nmemX/nfit/format1
    nmemX/nfit/format2
    ...
    nmemX/nfit/formatN

Here format2 through formatN are theoretical, as there are no known
DIMMs that support more than two interface formats.

This layout is compatible with existing libndctl binaries that only
expect one code per-dimm as they will ignore nmemX/nfit/formats and
nmemX/nfit/formatN.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
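The naming scheme above can be sketched as a helper that produces the attribute name for the Nth format code. The first code keeps the legacy name `format`, which is what keeps older libndctl binaries working. `format_attr_name()` is a hypothetical helper, not a kernel function.

```c
#include <stdio.h>

/*
 * Build the sysfs attribute name for format index 'idx' (0-based):
 * idx 0 -> "format", idx 1 -> "format1", idx 2 -> "format2", ...
 * Returns the snprintf() result (length excluding the terminator).
 */
static int format_attr_name(char *buf, size_t len, unsigned int idx)
{
	if (idx == 0)
		return snprintf(buf, len, "format");
	return snprintf(buf, len, "format%u", idx);
}
```

A reader would first consult `formats` to learn how many codes exist, then visit `format`, `format1`, and so on.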

09 Mar, 2016 1 commit

ACPI: Change NFIT driver to insert new resource · af1996ef

Toshi Kani authored 9 years ago

ACPI 6 defines persistent memory (PMEM) ranges in multiple
firmware interfaces: e820, EFI, and the ACPI NFIT table.  This EFI
change, however, exposes a bug in the grub bootloader, which
treats the EFI_PERSISTENT_MEMORY type as regular memory and corrupts
stored user data [1].

Therefore, the BIOS may set the generic reserved type in e820 and EFI
to cover PMEM ranges.  The kernel can initialize PMEM ranges from the
ACPI NFIT table alone.

This scheme causes a problem in the iomem table, though.  On x86,
for instance, e820_reserve_resources() initializes top-level entries
(iomem_resource.child) from the e820 table at early boot-time.
This creates a "reserved" entry for a PMEM range, which prevents
region_intersects() from identifying the range as PMEM.

Change acpi_nfit_register_region() to call acpi_nfit_insert_resource(),
which calls insert_resource() to insert a PMEM entry from NFIT when
the iomem table does not have a PMEM entry already.  That is, when
a PMEM range is marked as reserved type in e820, it inserts a
"Persistent Memory" entry, with the following result:

 + "Persistent Memory"
    + "reserved"

This allows the EINJ driver, which calls region_intersects() to check
PMEM ranges, to keep working even if the BIOS sets the reserved type
(or sets nothing) for PMEM ranges in e820 and EFI.

[1]: https://lists.gnu.org/archive/html/grub-devel/2015-11/msg00209.html

Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
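A "reserved" entry defeats a type check, as a simplified, flat model of the top-level iomem entries and an overlap test in the spirit of region_intersects() shows. The structure and helper below are hypothetical simplifications. The kernel's struct resource is a tree, and region_intersects() may match on different criteria than a name string.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Simplified top-level iomem entry (the real struct resource is a tree). */
struct iomem_entry {
	uint64_t start, end;	/* inclusive range */
	const char *name;
};

/* True if any entry named 'name' overlaps [start, end]. */
static bool range_has_type(const struct iomem_entry *e, int n,
			   uint64_t start, uint64_t end, const char *name)
{
	for (int i = 0; i < n; i++)
		if (e[i].start <= end && start <= e[i].end &&
		    strcmp(e[i].name, name) == 0)
			return true;
	return false;
}
```

Under this model, the fix is a decision rule: when `range_has_type(..., "Persistent Memory")` is false for an NFIT PMEM range, insert a new "Persistent Memory" parent that encloses the existing "reserved" entry, as in the tree shown above.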

06 Mar, 2016 1 commit

nfit, libnvdimm: clear poison command support · d4f32367

Dan Williams authored 9 years ago

Add the boiler-plate for a 'clear error' command based on section
9.20.7.6 "Function Index 4 - Clear Uncorrectable Error" from the ACPI
6.1 specification, and add a reference implementation in nfit_test.

Reviewed-by: Vishal Verma <vishal.l.verma@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

05 Mar, 2016 6 commits

nfit: disable userspace initiated ars during scrub · 87bf572e

Dan Williams authored 9 years ago

While the nfit driver is issuing address range scrub commands and
reaping the results, do not permit an ars_start command issued from
userspace.  The scrub thread assumes that all ars completions are for
scrubs initiated by platform firmware at boot, or by the nfit driver.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
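The gate this commit describes can be sketched as follows, with locking omitted for brevity. Only a userspace ars_start is refused while the driver's scrub is active. The structure, enum, function name and the `-EBUSY` return value are hypothetical.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical per-bus state: set while the driver's scrub thread runs. */
struct bus_state {
	bool scrub_active;
};

enum ars_cmd { ARS_CAP, ARS_START, ARS_STATUS };

/*
 * Gate userspace-originated ARS commands. Only ARS_START is refused:
 * the scrub thread assumes every completion it reaps belongs either to
 * platform firmware or to itself.
 */
static int user_ars_allowed(const struct bus_state *bus, enum ars_cmd cmd)
{
	if (cmd == ARS_START && bus->scrub_active)
		return -EBUSY;
	return 0;
}
```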

nfit: scrub and register regions in a workqueue · 1cf03c00

Dan Williams authored 9 years ago

Address range scrub is a potentially long running process that we want
to complete before any pmem regions are registered. Perform this
operation asynchronously to allow other drivers to load in the meantime.

Platform firmware may have initiated a partial scrub prior to the driver
loading, so we must be careful to consume those results before kicking
off kernel initiated scrubs on other regions.

This rework also makes the registration path more tolerant of scrub
errors in that it splits scrubbing into 2 phases. The first phase
synchronously waits for a platform-firmware initiated scrub to complete.
The second phase scans the remaining address ranges asynchronously and
notifies the related driver(s) when the scrub completes.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
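This single-threaded model sketches the ordering of the two phases: firmware-initiated results are consumed before any kernel-initiated scrub starts. The structure, field names and function are hypothetical. The real driver runs the second phase from a workqueue and polls ARS status rather than flipping booleans.

```c
#include <stdbool.h>

/* Hypothetical per-region bookkeeping; not the driver's real structures. */
struct region {
	bool fw_scrub_pending;	/* firmware started an ARS before driver load */
	bool scrubbed;		/* scrub results are known */
};

/*
 * Phase 1 consumes the results of firmware-initiated scrubs; phase 2
 * scrubs the remaining regions. Returns the number of kernel-initiated
 * scrubs issued in phase 2.
 */
static int scrub_regions(struct region *r, int n)
{
	int issued = 0;

	/* Phase 1: synchronous, reap what firmware already started. */
	for (int i = 0; i < n; i++) {
		if (!r[i].fw_scrub_pending)
			continue;
		r[i].fw_scrub_pending = false;	/* stands in for polling ARS status */
		r[i].scrubbed = true;
	}

	/* Phase 2: asynchronous in the driver; notify consumers on completion. */
	for (int i = 0; i < n; i++) {
		if (r[i].scrubbed)
			continue;
		r[i].scrubbed = true;		/* stands in for ARS start + status */
		issued++;
	}
	return issued;
}
```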

nfit, libnvdimm: async region scrub workqueue · 7ae0fa43

Dan Williams authored 9 years ago

Introduce a workqueue that will be used to run address range scrub
asynchronously with the rest of nvdimm device probing.

Userspace still wants notification when probing operations complete, so
introduce a new callback to flush this workqueue when userspace is
awaiting probe completion.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>

nfit, tools/testing/nvdimm: unify common init for acpi_nfit_desc · a61fe6f7

Dan Williams authored 9 years ago

The nvdimm unit test infrastructure performs its own initialization of
an acpi_nfit_desc to specify test overrides over the native
implementation. Make it clear which attributes and operations it is
overriding by re-using acpi_nfit_init_desc() as a common starting point.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>

libnvdimm, nfit: centralize command status translation · aef25338

Dan Williams authored 9 years ago

The return value from an 'ndctl_fn' reports the command execution
status, i.e. whether the command was properly formatted and
successfully submitted to the bus provider.  The new 'cmd_rc' parameter
allows the bus provider to communicate command-specific results,
translated into common error codes.

Convert the ARS commands to this scheme to:

1/ Consolidate status reporting

2/ Prepare for expanding ars unit test cases

3/ Make the implementation more generic

Cc: Vishal Verma <vishal.l.verma@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
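This sketch shows the split between the return value (was the command submitted?) and 'cmd_rc' (what did the command report?). The real 'ndctl_fn' takes more parameters. `mock_ndctl()` and `run_cmd()` are hypothetical, simplified stand-ins.

```c
#include <errno.h>

/*
 * Hypothetical bus-provider callback following the split described above:
 *  - return value: was the command well-formed and submitted?
 *  - *cmd_rc:     command-specific outcome, translated to an errno.
 */
static int mock_ndctl(int cmd, int fw_status, int *cmd_rc)
{
	if (cmd < 0)
		return -EINVAL;		/* malformed: never reached firmware */

	*cmd_rc = fw_status ? -EIO : 0;	/* submitted; report its result */
	return 0;
}

/* Caller pattern: check submission first, then the command result. */
static int run_cmd(int cmd, int fw_status)
{
	int cmd_rc = 0;
	int rc = mock_ndctl(cmd, fw_status, &cmd_rc);

	return rc ? rc : cmd_rc;
}
```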

nfit: Continue init even if ARS commands are unimplemented · 6e2452df

Vishal Verma authored 9 years ago

If firmware doesn't implement any of the ARS commands, take that to
mean that ARS is unsupported, and continue to initialize regions without
bad block lists. We cannot make the assumption that ARS commands will be
unconditionally supported on all NVDIMMs.

Reported-by: Haozhong Zhang <haozhong.zhang@intel.com>
Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
Acked-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
Tested-by: Haozhong Zhang <haozhong.zhang@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

24 Feb, 2016 1 commit

nfit: update address range scrub commands to the acpi 6.1 format · 4577b066

Dan Williams authored 9 years ago

The original format of these commands from the "NVDIMM DSM Interface
Example" [1] is superseded by the ACPI 6.1 definition of the "NVDIMM Root
Device _DSMs" [2].

[1]: http://pmem.io/documents/NVDIMM_DSM_Interface_Example.pdf
[2]: http://www.uefi.org/sites/default/files/resources/ACPI_6_1.pdf
     "9.20.7 NVDIMM Root Device _DSMs"

Changes include:
1/ New 'restart' fields in ars_status.  Unfortunately, these are
   inserted in the middle of the existing definition, so this change
   is not backwards compatible.  The expectation is that shipping
   platforms will only ever support the ACPI 6.1 definition.

2/ New status values for ars_start ('busy') and ars_status ('overflow').

Cc: Vishal Verma <vishal.l.verma@intel.com>
Cc: Linda Knippers <linda.knippers@hpe.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
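offsetof() shows why inserting fields "in the middle" breaks compatibility. The two structures below are simplified illustrations, not the exact kernel definitions. A consumer compiled against the old layout would read `type` from the wrong offset.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified, illustrative layouts; not the exact kernel definitions. */
struct ars_status_v0 {
	uint32_t status;
	uint32_t out_length;
	uint64_t address;
	uint64_t length;
	uint16_t type;			/* follows 'length' directly */
};

struct ars_status_v61 {
	uint32_t status;
	uint32_t out_length;
	uint64_t address;
	uint64_t length;
	uint64_t restart_address;	/* new fields inserted mid-structure */
	uint64_t restart_length;
	uint16_t type;			/* ...so this now lives 16 bytes later */
};
```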

19 Feb, 2016 2 commits

libnvdimm, tools/testing/nvdimm: fix 'ars_status' output buffer sizing · 747ffe11

Dan Williams authored 9 years ago

Use the output length specified in the command to size the receive
buffer rather than the arbitrary 4K limit.

This bug was hiding the fact that the ndctl implementation of
ndctl_bus_cmd_new_ars_status() was not specifying an output buffer size.

Cc: <stable@vger.kernel.org>
Cc: Vishal Verma <vishal.l.verma@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

nfit: fix multi-interface dimm handling, acpi6.1 compatibility · 6697b2cf

Dan Williams authored 9 years ago

ACPI 6.1 clarified that multi-interface dimms require multiple control
region entries (DCRs) per dimm.  Previously we were assuming that a
control region is only present when block-data-windows are present.
This implementation was done with an eye to remaining compatible with
the looser ACPI 6.0 interpretation of this table.

1/ When coalescing the memory device (MEMDEV) tables for a single dimm,
coalesce on device_handle rather than control region index.

2/ Whenever we discover a control region with non-zero block windows,
re-scan for block-data-window (BDW) entries.

We may need to revisit this if a DIMM ever implements a format interface
outside of blk or pmem, but that is not on the foreseeable horizon.

Cc: <stable@vger.kernel.org>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
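Coalescing on device_handle rather than on the control region index (change 1/ above) can be sketched with a trimmed-down, hypothetical MEMDEV entry. A multi-interface DIMM contributes one entry per control region but only one device_handle.

```c
#include <stdint.h>

/* Hypothetical, trimmed-down MEMDEV entry. */
struct memdev {
	uint32_t device_handle;
	uint16_t dcr_index;	/* control region (DCR) index */
};

/*
 * Count distinct DIMMs by coalescing on device_handle. Keying on
 * dcr_index instead would count a multi-interface DIMM (one DCR per
 * format interface) as several DIMMs.
 */
static int count_dimms(const struct memdev *m, int n)
{
	int dimms = 0;

	for (int i = 0; i < n; i++) {
		int seen = 0;

		for (int j = 0; j < i; j++)
			if (m[j].device_handle == m[i].device_handle)
				seen = 1;
		dimms += !seen;
	}
	return dimms;
}
```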

09 Jan, 2016 1 commit

libnvdimm: Add a poison list and export badblocks · 0caeef63

Vishal Verma authored 9 years ago

During region creation, perform Address Range Scrubs (ARS) for the SPA
(System Physical Address) ranges to retrieve known poison locations from
firmware. Add a new data structure 'nd_poison' which is used as a list
in nvdimm_bus to store these poison locations.

When creating a pmem namespace, if there is any known poison associated
with its physical address space, convert the poison ranges to bad sectors
that are exposed using the badblocks interface.

Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
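Converting a poison byte range to bad sectors is integer arithmetic: every sector the range touches is marked bad. This sketch assumes 512-byte sectors and poison lying entirely inside the namespace. The structure and function names are hypothetical.

```c
#include <stdint.h>

#define SECTOR_SIZE 512ULL

/* Hypothetical badblocks-style output: first sector and sector count. */
struct bad_range {
	uint64_t sector;
	uint64_t count;
};

/*
 * Convert a poisoned byte range [addr, addr + len) into the 512-byte
 * sectors it touches, relative to the namespace start 'ns_start'.
 * Assumes ns_start <= addr and len > 0.
 */
static struct bad_range poison_to_sectors(uint64_t ns_start, uint64_t addr,
					  uint64_t len)
{
	uint64_t first = (addr - ns_start) / SECTOR_SIZE;
	uint64_t last = (addr - ns_start + len - 1) / SECTOR_SIZE;
	struct bad_range r = { first, last - first + 1 };

	return r;
}
```

For example, 4 poisoned bytes starting 510 bytes into the namespace straddle a sector boundary, so sectors 0 and 1 are both reported.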

11 Dec, 2015 1 commit

nfit: acpi_nfit_notify(): Do not leave device locked · d91e8928

Alexey Khoroshilov authored 9 years ago

Even if dev->driver is NULL because we are being removed,
it is safer not to leave the device locked.

Found by Linux Driver Verification project (linuxtesting.org).

Signed-off-by: Alexey Khoroshilov <khoroshilov@ispras.ru>
Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Reviewed-by: Vishal Verma <vishal.l.verma@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
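The bug class here is an early return that skips the unlock. The usual kernel remedy is a single exit label. This sketch uses a counting stand-in for the device lock, so the invariant (`locked == 0` on return) is easy to check. All names are hypothetical stand-ins for device_lock()/device_unlock().

```c
#include <stddef.h>

/* Hypothetical stand-ins for a device and its lock. */
struct fake_dev {
	int locked;
	void *driver;	/* NULL while the device is being removed */
};

static void dev_lock(struct fake_dev *d)   { d->locked++; }
static void dev_unlock(struct fake_dev *d) { d->locked--; }

/* Notify handler: every exit path after the lock goes through out_unlock. */
static int notify(struct fake_dev *d)
{
	int handled = 0;

	dev_lock(d);
	if (!d->driver)
		goto out_unlock;	/* being removed: nothing to do */

	handled = 1;			/* ... process the event ... */

out_unlock:
	dev_unlock(d);
	return handled;
}
```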

30 Nov, 2015 3 commits

nfit: Adjust for different _FIT and NFIT headers · 6b577c9d

Linda Knippers authored 9 years ago

When support for _FIT was added, the code presumed that the data
returned by the _FIT method is identical to the NFIT table, which
starts with an acpi_table_header.  However, the _FIT is defined
to return data in the format of a series of NFIT type structure
entries and, as a method, has an acpi_object header rather than
an acpi_table_header.

To address the differences, explicitly save the acpi_table_header
from the NFIT, since it is accessible through /sys, and change
the nfit pointer in the acpi_desc structure to point to the
table entries rather than the headers.

Reported-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Linda Knippers <linda.knippers@hpe.com>
Acked-by: Vishal Verma <vishal.l.verma@intel.com>
[vishal: fix up unit test for new header assumptions]
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
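The two buffers differ only in where the first sub-table entry begins, as this sketch shows. It assumes a 36-byte ACPI table header followed by 4 reserved bytes in the NFIT, and a _FIT payload that starts directly with entries. The size macros and `first_entry()` are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed sizes: 36-byte ACPI table header + 4 reserved NFIT bytes. */
#define ACPI_TABLE_HEADER_SIZE	36
#define NFIT_HEADER_SIZE	(ACPI_TABLE_HEADER_SIZE + 4)

/*
 * Return a pointer to the first sub-table entry and its byte length.
 * A _FIT payload has no table header; an NFIT does.
 */
static const uint8_t *first_entry(const uint8_t *buf, size_t buf_len,
				  int is_fit, size_t *entries_len)
{
	size_t skip = is_fit ? 0 : NFIT_HEADER_SIZE;

	if (buf_len < skip) {
		*entries_len = 0;
		return NULL;
	}
	*entries_len = buf_len - skip;
	return buf + skip;
}
```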

nfit: Fix the check for a successful NFIT merge · ff5a55f8

Linda Knippers authored 9 years ago

Missed previously due to a lack of test coverage on a platform that
provided a valid response to _FIT.

Signed-off-by: Linda Knippers <linda.knippers@hpe.com>
Acked-by: Vishal Verma <vishal.l.verma@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

nfit: Account for table size length variation · 826c416f

Linda Knippers authored 9 years ago

The sizes of NFIT tables don't necessarily match the sizes of the
data structures that we use for them.  For example, the NVDIMM
Control Region Structure table is shorter for a device with
no block control windows than for a device with block control windows.
Other tables, such as the Flush Hint Address Structure and the
Interleave Structure, are variable length by definition.

Account for the size difference when comparing table entries by
using the actual table size from the table header if it's less
than the structure size.

Signed-off-by: Linda Knippers <linda.knippers@hpe.com>
Acked-by: Vishal Verma <vishal.l.verma@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
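The comparison rule is "compare only the bytes the table actually contains", as in this minimal sketch. The function name and parameters are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Compare a newly seen table entry with a cached one. 'table_len' is the
 * length from the entry's own header; 'struct_size' is sizeof() of the
 * C structure used to hold it. Bytes beyond the shorter of the two are
 * not part of the table and are ignored.
 */
static bool entries_equal(const void *a, const void *b,
			  size_t table_len, size_t struct_size)
{
	size_t n = table_len < struct_size ? table_len : struct_size;

	return memcmp(a, b, n) == 0;
}
```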

02 Nov, 2015 2 commits

acpi: nfit: Add support for hot-add · 20985164

Vishal Verma authored 9 years ago

Add a .notify callback to the acpi_nfit_driver that gets called on a
hotplug event. From this, evaluate the _FIT ACPI method, which returns
the updated NFIT with handles for the hot-plugged NVDIMM.

Iterate over the new NFIT, add any new tables found, and
register/enable the corresponding regions.

In the nfit test framework, after normal initialization, update the NFIT
with a new hot-plugged NVDIMM, and directly call into the driver to
update its view of the available regions.

Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Toshi Kani <toshi.kani@hpe.com>
Cc: Elliott, Robert <elliott@hpe.com>
Cc: Jeff Moyer <jmoyer@redhat.com>
Cc: <linux-acpi@vger.kernel.org>
Cc: <linux-nvdimm@lists.01.org>
Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

nfit: in acpi_nfit_init, break on a 0-length table · 564d5011

Vishal Verma authored 9 years ago

If acpi_nfit_init is called (such as from nfit_test) with an nfit table
that has more memory allocated than it needs (and a similarly large
'size' field), add_tables would happily keep adding null SPA Range
tables, filling up all available memory.

Make it friendlier by breaking out if a 0-length header is found in any
of the tables.

Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: <linux-acpi@vger.kernel.org>
Cc: <linux-nvdimm@lists.01.org>
Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
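Walking packed sub-tables safely needs exactly this guard, as the sketch below shows. Each NFIT sub-table begins with a 16-bit type and a 16-bit length. A 0-length header would never advance the cursor, so zero-filled slack at the end of an over-allocated buffer must stop the walk. The function name is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Each NFIT sub-table starts with a 16-bit type and a 16-bit length. */
struct subtable_hdr {
	uint16_t type;
	uint16_t length;
};

/*
 * Walk packed sub-tables in buf[0..size). Stops at the end of the buffer,
 * at a truncated entry, or at a 0-length header, which would otherwise
 * never advance 'off'. Returns the number of entries visited.
 */
static int walk_subtables(const uint8_t *buf, size_t size)
{
	size_t off = 0;
	int count = 0;

	while (off + sizeof(struct subtable_hdr) <= size) {
		struct subtable_hdr hdr;

		memcpy(&hdr, buf + off, sizeof(hdr));	/* avoid unaligned access */
		if (hdr.length == 0 || off + hdr.length > size)
			break;
		count++;
		off += hdr.length;
	}
	return count;
}
```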

22 Oct, 2015 1 commit

ACPICA: Update NFIT table to rename a flags field · ca321d1c

Bob Moore authored 9 years ago

ACPICA commit 534deab97fb416a13bfede15c538e2c9eac9384a

Updated one of the memory subtable flags to clarify.

Link: https://github.com/acpica/acpica/commit/534deab9
Signed-off-by: Bob Moore <robert.moore@intel.com>
Signed-off-by: Lv Zheng <lv.zheng@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

14 Oct, 2015 1 commit

move io-64-nonatomic*.h out of asm-generic · 2f8e2c87

Christoph Hellwig authored 9 years ago

These are not implementations of default architecture code but helpers
for drivers. Move them to where they belong.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Darren Hart <dvhart@linux.intel.com>
Acked-by: Hitoshi Mitake <mitake.hitoshi@lab.ntt.co.jp>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>

27 Aug, 2015 3 commits

x86, pmem: clarify that ARCH_HAS_PMEM_API implies PMEM mapped WB · 96601adb

Dan Williams authored 9 years ago

Given that a write-back (WB) mapping plus non-temporal stores is
expected to be the most efficient way to access PMEM, update the
definition of ARCH_HAS_PMEM_API to imply arch support for
WB-mapped-PMEM.  This is needed as a pre-requisite for adding PMEM to
the direct map and mapping it with struct page.

The above clarification for X86_64 means that memcpy_to_pmem() is
permitted to use the non-temporal arch_memcpy_to_pmem() rather than
needlessly fall back to default_memcpy_to_pmem() when the pcommit
instruction is not available.  When arch_memcpy_to_pmem() is not
guaranteed to flush writes out of cache, i.e. on older X86_32
implementations where non-temporal stores may just dirty cache,
ARCH_HAS_PMEM_API is simply disabled.

The default fall back for persistent memory handling remains.  Namely,
map it with the WT (write-through) cache-type and hope for the best.

arch_has_pmem_api() is updated to only indicate whether the arch
provides the proper helpers to meet the minimum "writes are visible
outside the cache hierarchy after memcpy_to_pmem() + wmb_pmem()".  Code
that cares whether wmb_pmem() actually flushes writes to pmem must now
call arch_has_wmb_pmem() directly.

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>
[hch: set ARCH_HAS_PMEM_API=n on x86_32]
Reviewed-by: Christoph Hellwig <hch@lst.de>
[toshi: x86_32 compile fixes]
Signed-off-by: Toshi Kani <toshi.kani@hp.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

nd_blk: change aperture mapping from WC to WB · 67a3e8fe

Ross Zwisler authored 9 years ago

This should result in a pretty sizeable performance gain for reads.  For
rough comparison I did some simple read testing using PMEM to compare
reads of write combining (WC) mappings vs write-back (WB).  This was
done on a random lab machine.

PMEM reads from a write combining mapping:
	# dd of=/dev/null if=/dev/pmem0 bs=4096 count=100000
	100000+0 records in
	100000+0 records out
	409600000 bytes (410 MB) copied, 9.2855 s, 44.1 MB/s

PMEM reads from a write-back mapping:
	# dd of=/dev/null if=/dev/pmem0 bs=4096 count=1000000
	1000000+0 records in
	1000000+0 records out
	4096000000 bytes (4.1 GB) copied, 3.44034 s, 1.2 GB/s

To be able to safely support a write-back aperture I needed to add
support for the "read flush" _DSM flag, as outlined in the DSM spec:

http://pmem.io/documents/NVDIMM_DSM_Interface_Example.pdf

This flag tells the ND BLK driver that it needs to flush the cache lines
associated with the aperture after the aperture is moved but before any
new data is read.  This ensures that any stale cache lines from the
previous contents of the aperture will be discarded from the processor
cache, and the new data will be read properly from the DIMM.  We know
that the cache lines are clean and will be discarded without any
writeback because either a) the previous aperture operation was a read,
and we never modified the contents of the aperture, or b) the previous
aperture operation was a write and we must have written back the dirtied
contents of the aperture to the DIMM before the I/O was completed.

In order to add support for the "read flush" flag I needed to add a
generic routine to invalidate cache lines, mmio_flush_range().  This is
protected by the ARCH_HAS_MMIO_FLUSH Kconfig variable, and is currently
only supported on x86.

Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
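A cache-line invalidation routine in the spirit of mmio_flush_range() visits every cache line overlapping the aperture, starting from the aligned-down address. This sketch assumes 64-byte cache lines. `flush_range()` is hypothetical, and the callback stands in for an instruction such as clflush.

```c
#include <stddef.h>
#include <stdint.h>

#define CACHELINE 64u	/* assumed cache-line size */

/*
 * Invalidate every cache line overlapping [addr, addr + len), as a
 * "read flush" would after moving the aperture. 'flush_line' may be
 * NULL to only count the lines. Returns the number of lines visited.
 */
static unsigned int flush_range(uintptr_t addr, size_t len,
				void (*flush_line)(uintptr_t))
{
	uintptr_t p = addr & ~(uintptr_t)(CACHELINE - 1);	/* align down */
	uintptr_t end = addr + len;
	unsigned int lines = 0;

	for (; p < end; p += CACHELINE) {
		if (flush_line)
			flush_line(p);
		lines++;
	}
	return lines;
}
```

An unaligned 64-byte range touches two cache lines, which is why the start is rounded down rather than used as-is.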

nfit: Clarify memory device state flags strings · 402bae59

Toshi Kani authored 9 years ago

ACPI 6.0 NFIT Memory Device State Flags in Table 5-129 defines
NVDIMM status as follows.  These bits convey several kinds of
information, such as failures, pending events, and capabilities.

  Bit [0] set to 1 to indicate that the previous SAVE to the
  Memory Device failed.
  Bit [1] set to 1 to indicate that the last RESTORE from the
  Memory Device failed.
  Bit [2] set to 1 to indicate that platform flush of data to
  Memory Device failed. As a result, the restored data content
  may be inconsistent even if SAVE and RESTORE do not indicate
  failure.
  Bit [3] set to 1 to indicate that the Memory Device is observed
  to be not armed prior to OSPM hand off. A Memory Device is
  considered armed if it is able to accept persistent writes.
  Bit [4] set to 1 to indicate that the Memory Device observed
  SMART and health events prior to OSPM handoff.

/sys/bus/nd/devices/nmemX/nfit/flags shows this flags info.
The output strings associated with the bits are "save", "restore",
"smart", etc., which can be confusing as they may be interpreted
as positive status, i.e. save succeeded.

Also change the dev_info() message in acpi_nfit_register_dimms()
to be consistent with the sysfs flags strings.

Reported-by: Robert Elliott <elliott@hp.com>
Signed-off-by: Toshi Kani <toshi.kani@hp.com>
[ross: rename 'not_arm' to 'not_armed']
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
[djbw: defer adding bit5, HEALTH_ENABLED, for now]
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
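This sketch renders the flag bits as failure-oriented strings. Bit positions follow the ACPI 6.0 table quoted above. The strings themselves are illustrative: only "not_armed" is confirmed by this commit's trailer. The table and `format_flags()` are hypothetical.

```c
#include <stdint.h>
#include <stdio.h>

/* Illustrative names indexed by bit position (bits 0..4 above). */
static const char *const flag_names[] = {
	"save_fail",	/* bit 0: previous SAVE failed */
	"restore_fail",	/* bit 1: last RESTORE failed */
	"flush_fail",	/* bit 2: platform flush failed */
	"not_armed",	/* bit 3: not armed prior to OSPM hand off */
	"smart_event",	/* bit 4: SMART/health events observed */
};

/* Format set bits as a space-separated list; 'len' must be > 0. */
static void format_flags(uint16_t flags, char *buf, size_t len)
{
	size_t used = 0;

	buf[0] = '\0';
	for (unsigned int i = 0; i < 5; i++) {
		if (!(flags & (1u << i)))
			continue;
		int n = snprintf(buf + used, len - used, "%s%s",
				 used ? " " : "", flag_names[i]);
		if (n < 0 || (size_t)n >= len - used)
			break;	/* out of space: output is truncated */
		used += n;
	}
}
```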

25 Aug, 2015 1 commit

nfit, nd_blk: BLK status register is only 32 bits · de4a196c

Ross Zwisler authored 9 years ago

Only read 32 bits for the BLK status register in read_blk_stat().

The format and size of this register is defined in the
"NVDIMM Driver Writer's guide":

http://pmem.io/documents/NVDIMM_Driver_Writers_Guide.pdf

Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Reported-by: Nicholas Moulin <nicholas.w.moulin@linux.intel.com>
Tested-by: Nicholas Moulin <nicholas.w.moulin@linux.intel.com>
Reviewed-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
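This minimal sketch reads the register with an access of exactly its width. A 64-byte-wide... rather, a 64-bit load at the same address would also pull in the adjacent 4 bytes, which are not part of the register. The helper name is hypothetical; kernel code would use readl() on an __iomem pointer.

```c
#include <stdint.h>

/*
 * Read a 32-bit status register with a 32-bit access. 'volatile' keeps
 * the compiler from caching or widening the access, as with MMIO.
 */
static uint32_t read_status32(const volatile void *reg)
{
	return *(const volatile uint32_t *)reg;
}
```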

28 Jul, 2015 2 commits

nfit: Don't check _STA on NVDIMM devices · 60e95f43

Linda Knippers authored 9 years ago

The _STA only applies to the root device, not the individual NVDIMMs,
so don't check it here.  NVDIMM device state flags are checked elsewhere.

Signed-off-by: Linda Knippers <linda.knippers@hp.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

libnvdimm: Add DSM support for Address Range Scrub commands · 39c686b8

Vishal Verma authored 9 years ago

Add support for the three ARS DSM commands:
- Query ARS Capabilities - Queries the firmware to check if a given
  range supports scrub, and if so, which type (persistent vs. volatile)
- Start ARS - Starts a scrub for a given range/type
- Query ARS Status - Checks status of a previously started scrub, and
  provides the error logs if any.

  The commands are described by the example DSM spec at:
  http://pmem.io/documents/NVDIMM_DSM_Interface_Example.pdf

Also add these commands to the nfit_test test framework, and return
canned data.

Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

10 Jul, 2015 2 commits

nfit: add support for NVDIMM "latch" flag · f0f2c072

Ross Zwisler authored 9 years ago

Add support in the NFIT BLK I/O path for the "latch" flag
defined in the "Get Block NVDIMM Flags" _DSM function:

http://pmem.io/documents/NVDIMM_DSM_Interface_Example.pdf

This flag requires the driver to read back the command register after it
is written in the block I/O path.  This ensures that the hardware has
fully processed the new command and moved the aperture appropriately.

Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
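This minimal sketch shows the "latch" behaviour: after writing the command register, read it back when the flag is set. For posted memory-mapped writes, a read from the same device forces the preceding write to complete. The function name and the 64-bit register width are assumptions.

```c
#include <stdint.h>

/*
 * Write the block-window command register and, when the DIMM reports
 * the "latch" flag, read it back so the write is known to have been
 * processed before the aperture is used.
 */
static void write_blk_ctl(volatile uint64_t *cmd_reg, uint64_t cmd, int latch)
{
	*cmd_reg = cmd;
	if (latch)
		(void)*cmd_reg;	/* read-back forces the posted write to complete */
}
```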

nfit: update block I/O path to use PMEM API · c2ad2954

Ross Zwisler authored 9 years ago

Update the nfit block I/O path to use the new PMEM API and to adhere to
the read/write flows outlined in the "NVDIMM Block Window Driver
Writer's Guide":

http://pmem.io/documents/NVDIMM_Driver_Writers_Guide.pdf

This includes adding support for targeted NVDIMM flushes called "flush
hints" in the ACPI 6.0 specification:

http://www.uefi.org/sites/default/files/resources/ACPI_6.0.pdf

For performance and media durability, the mapping for a BLK aperture is
moved to a write-combining mapping, which is consistent with
memcpy_to_pmem() and wmb_blk().

Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

30 Jun, 2015 1 commit

nfit: fix smatch "use after null check" report · 193ccca4

Dan Williams authored 9 years ago

drivers/acpi/nfit.c:1224 acpi_nfit_blk_region_enable()
         error: we previously assumed 'nfit_mem' could be null (see line 1223)

drivers/acpi/nfit.c
  1222          nfit_mem = nvdimm_provider_data(nvdimm);
  1223          if (!nfit_mem || !nfit_mem->dcr || !nfit_mem->bdw) {
                     ^^^^^^^^
Check.

  1224                  dev_dbg(dev, "%s: missing%s%s%s\n", __func__,
  1225                                  nfit_mem ? "" : " nfit_mem",
  1226                                  nfit_mem->dcr ? "" : " dcr",
                                        ^^^^^^^^^^^^^
Unchecked dereference.

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
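One way to build the "missing ..." message without the unchecked dereference is to guard each field test with the pointer test. This sketch uses a hypothetical stub in place of struct nfit_mem and formats into a buffer instead of calling dev_dbg().

```c
#include <stdio.h>

/* Trimmed-down, hypothetical stand-in for struct nfit_mem. */
struct nfit_mem_stub {
	void *dcr;
	void *bdw;
};

/* Each field test is guarded by the NULL test on the pointer itself. */
static int describe_missing(char *buf, size_t len,
			    const struct nfit_mem_stub *m)
{
	return snprintf(buf, len, "missing%s%s%s",
			m ? "" : " nfit_mem",
			(m && m->dcr) ? "" : " dcr",
			(m && m->bdw) ? "" : " bdw");
}
```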

26 Jun, 2015 4 commits

libnvdimm: Add sysfs numa_node to NVDIMM devices · 74ae66c3

Toshi Kani authored 9 years ago

Add support for sysfs 'numa_node' to I/O-related NVDIMM devices
under /sys/bus/nd/devices: regionN, namespaceN.0, and bttN.x.

An example of numa_node values on a 2-socket system with a single
NVDIMM range on each socket is shown below.
  /sys/bus/nd/devices
  |-- btt0.0/numa_node:0
  |-- btt1.0/numa_node:1
  |-- btt1.1/numa_node:1
  |-- namespace0.0/numa_node:0
  |-- namespace1.0/numa_node:1
  |-- region0/numa_node:0
  |-- region1/numa_node:1

These numa_node files are then linked under the block class of
their device names.
  /sys/class/block/pmem0/device/numa_node:0
  /sys/class/block/pmem1s/device/numa_node:1

This enables numactl(8) to accept 'block:' and 'file:' paths of
pmem and btt devices as shown in the examples below.
  numactl --preferred block:pmem0 --show
  numactl --preferred file:/dev/pmem1s --show

Signed-off-by: Toshi Kani <toshi.kani@hp.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

libnvdimm: Set numa_node to NVDIMM devices · 41d7a6d6

Toshi Kani authored 9 years ago

The ACPI NFIT table has System Physical Address Range Structure entries
that describe the proximity ID of each range when
ACPI_NFIT_PROXIMITY_VALID is set in the flags.

Change acpi_nfit_register_region() to map a proximity ID to its node ID,
and set it to a new numa_node field of nd_region_desc, which is then
conveyed to the nd_region device.

The device core arranges for btt and namespace devices to inherit their
node from their parent region.

Signed-off-by: Toshi Kani <toshi.kani@hp.com>
[djbw: move set_dev_node() from region.c to bus.c]
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
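This sketch shows the mapping from a SPA range to a node. It falls back to "no node" when the proximity-valid flag is clear or the domain is unknown. The flag value, the lookup table and the names are assumptions for illustration. In the kernel, the lookup itself would be pxm_to_node().

```c
#include <stdint.h>

#define NO_NODE (-1)		/* stand-in for NUMA_NO_NODE */
#define PROXIMITY_VALID 0x2	/* assumed flag bit, for illustration */

/* Hypothetical proximity-domain -> node table (normally from SRAT). */
static const int pxm_to_node_map[] = { 0, 1 };

static int spa_range_node(uint16_t flags, uint32_t proximity_domain)
{
	const uint32_t nr = sizeof(pxm_to_node_map) / sizeof(pxm_to_node_map[0]);

	if (!(flags & PROXIMITY_VALID))
		return NO_NODE;		/* firmware gave no locality hint */
	if (proximity_domain >= nr)
		return NO_NODE;		/* unknown domain */
	return pxm_to_node_map[proximity_domain];
}
```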

libnvdimm, nfit: handle unarmed dimms, mark namespaces read-only · 58138820

Dan Williams authored 9 years ago

Upon detection of an unarmed dimm in a region, arrange for descendant
BTT, PMEM, or BLK instances to be read-only. A dimm is primarily marked
"unarmed" via flags passed by platform firmware (NFIT).

The flags in the NFIT memory device sub-structure indicate the state of
the data on the nvdimm relative to its energy source or last "flush to
persistence". For the most part there is nothing the driver can do but
advertise the state of these flags in sysfs and emit a message if
firmware indicates that the contents of the device may be corrupted.
However, for the case of ACPI_NFIT_MEM_ARMED, the driver can arrange for
the block devices incorporating that nvdimm to be marked read-only.
This is a safe default as the data is still available and new writes are
held off until the administrator either forces read-write mode, or the
energy source becomes armed.

A 'read_only' attribute is added to REGION devices to allow for
overriding the default read-only policy of all descendant block devices.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
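The default policy plus override can be sketched as follows: a region is read-only if any member DIMM is unarmed, unless the administrator has set the region's 'read_only' attribute. The structures, the tri-state override encoding and the function name are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical per-DIMM and per-region state. */
struct dimm {
	bool armed;
};

struct region {
	const struct dimm *dimms;
	int ndimms;
	int ro_override;	/* -1: default policy, 0: force rw, 1: force ro */
};

/* A region defaults to read-only if any member DIMM is not armed. */
static bool region_read_only(const struct region *r)
{
	if (r->ro_override >= 0)
		return r->ro_override;	/* administrator's 'read_only' choice */

	for (int i = 0; i < r->ndimms; i++)
		if (!r->dimms[i].armed)
			return true;
	return false;
}
```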

tools/testing/nvdimm: libnvdimm unit test infrastructure · 6bc75619

Dan Williams authored 9 years ago

'libnvdimm' is the first driver sub-system in the kernel to implement
mocking for unit test coverage.  The nfit_test module gets built as an
external module and arranges for external module replacements of nfit,
libnvdimm, nd_pmem, and nd_blk.  These replacements use the linker
--wrap option to redirect calls to ioremap() + request_mem_region() to
custom defined unit test resources.  The end result is a fully
functional nvdimm_bus, as far as userspace is concerned, but with the
capability to perform otherwise destructive tests on emulated resources.

Q: Why not use QEMU for this emulation?
QEMU is not suitable for unit testing.  QEMU's role is to faithfully
emulate the platform.  A unit test's role is to unfaithfully implement
the platform with the goal of triggering bugs in the corners of the
sub-system implementation.  As bugs are discovered in platforms, or the
sub-system itself, the unit tests are extended to backstop a fix with a
reproducer unit test.

Another problem with QEMU is that it would require coordination of 3
software projects instead of 2 (kernel + libndctl [1]) to maintain and
execute the tests.  The chances of bit rot and the difficulty of
getting the tests running go up non-linearly with the number of
components involved.


Q: Why submit this to the kernel tree instead of external modules in
   libndctl?
Simple, to alleviate the same risk that out-of-tree external modules
face.  Updates to drivers/nvdimm/ can be immediately evaluated to see if
they have any impact on tools/testing/nvdimm/.


Q: What are the negative implications of merging this?
It is a unique maintenance burden because the purpose of mocking an
interface to enable a unit test is to purposefully short circuit the
semantics of a routine to enable testing.  For example
__wrap_ioremap_cache() fakes the pmem driver into "ioremap()'ing" a test
resource buffer allocated by dma_alloc_coherent().  The future
maintenance burden hits when someone changes the semantics of
ioremap_cache() and wonders what the implications are for the unit test.

[1]: https://github.com/pmem/ndctl

Cc: <linux-acpi@vger.kernel.org>
Cc: Lv Zheng <lv.zheng@intel.com>
Cc: Robert Moore <robert.moore@intel.com>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
