- 19 Mar, 2019 8 commits
-
-
Joe Carnuccio authored
This patch adds multipe firmware dump template and segments support for ISP27XX/28XX. Signed-off-by: Joe Carnuccio <joe.carnuccio@cavium.com> Signed-off-by: Himanshu Madhani <hmadhani@marvell.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
Joe Carnuccio authored
This patch does following: - Clean up NVRAM code. - Optimizes reading of primary/secondary flash image validation. - Remove 0xff mask and make correct width in FLT structure. - Use endian macros to assign static fields in fwdump header. - Correct fdwt checksum calculation. - Simplify ql_dump_buffer() interface usage. - Add endianizers to 27xx firmware image validator. - fixes compiler warnings for big endian architecture. Signed-off-by: Joe Carnuccio <joe.carnuccio@cavium.com> Signed-off-by: Himanshu Madhani <hmadhani@marvell.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
Joe Carnuccio authored
This patch fixes reported speed for min_link and max_supported speed. Also rename sysfs nodes link_speed and max_supported to be consistent with {min|max}_suuported_speed. Signed-off-by: Joe Carnuccio <joe.carnuccio@cavium.com> Signed-off-by: Mike Hernandez <mhernandez@marvell.com> Signed-off-by: Himanshu Madhani <hmadhani@marvell.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
Joe Carnuccio authored
This patch adds sysfs node for serdes_version and also cleans up port_speed display. Signed-off-by: Joe Carnuccio <joe.carnuccio@cavium.com> Signed-off-by: Himanshu Madhani <hmadhani@marvell.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
Joe Carnuccio authored
This patch adds PCI device ID ISP28XX for Gen7 support. Also signature determination for primary/secondary flash image for ISP27XX/28XX is aded as part of Gen7 support. Signed-off-by: Joe Carnuccio <joe.carnuccio@cavium.com> Signed-off-by: Himanshu Madhani <hmadhani@marvell.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
Joe Carnuccio authored
This patch fixes qla27xx_dump_{mpi|ram} api for ISP27XX. Signed-off-by: Joe Carnuccio <joe.carnuccio@cavium.com> Signed-off-by: Himanshu Madhani <hmadhani@marvell.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
Joe Carnuccio authored
This patch removes FW default template as there will never be case where the default template would be invoked. Signed-off-by: Joe Carnuccio <joe.carnuccio@cavium.com> Signed-off-by: Himanshu Madhani <hmadhani@marvell.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
Joe Carnuccio authored
This patch adds new sysfs node to display firmware attributes and port number. Signed-off-by: Joe Carnuccio <joe.carnuccio@cavium.com> Signed-off-by: Himanshu Madhani <hmadhani@marvell.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
- 18 Mar, 2019 16 commits
-
-
Suganath Prabu authored
Updated driver version to 28.100.00.00, which is equivalent to OOB Phase 9. Signed-off-by: Suganath Prabu <suganath-prabu.subramani@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
Suganath Prabu authored
* Reduce the threshold value to 1/4 of the queue depth. * With this FW can find enough entries to post the Reply Descriptors in the reply descriptor post queue. * With module param, user can play with threshold value, the same irqpoll_weight is used as the budget in processing of reply descriptor post queues in _base_process_reply_queue. Signed-off-by: Suganath Prabu <suganath-prabu.subramani@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
Suganath Prabu authored
Driver uses "reply descriptor post queues" in round robin fashion so that IO's are distributed to all the available reply descriptor post queues equally. With this each reply descriptor post queue load is balanced. This is enabled only if CPUs count to MSI-X vector count ratio is X:1 (where X > 1) This improves performance and also fixes soft lockups. Signed-off-by: Suganath Prabu <suganath-prabu.subramani@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
Suganath Prabu authored
Issue Description: We have seen cpu lock up issue from fields if system has greater (more than 96) logical cpu count. SAS3.0 controller (Invader series) supports at max 96 msix vector and SAS3.5 product (Ventura) supports at max 128 msix vectors. This may be a generic issue (if PCI device supports completion on multiple reply queues). Let me explain it w.r.t to mpt3sas supported h/w just to simplify the problem and possible changes to handle such issues. IT HBA (mpt3sas) supports multiple reply queues in completion path. Driver creates MSI-x vectors for controller as "min of (FW supported Reply queue, Logical CPUs)". If submitter is not interrupted via completion on same CPU, there is a loop in the IO path. This behavior can cause hard/soft CPU lockups, IO timeout, system sluggish etc. Example - one CPU (e.g. CPU A) is busy submitting the IOs and another CPU (e.g. CPU B) is busy with processing the corresponding IO's reply descriptors from reply descriptor queue upon receiving the interrupts from HBA. If the CPU A is continuously pumping the IOs then always CPU B (which is executing the ISR) will see the valid reply descriptors in the reply descriptor queue and it will be continuously processing those reply descriptor in a loop without quitting the ISR handler. Mpt3sas driver will exit ISR handler if it finds unused reply descriptor in the reply descriptor queue. Since CPU A will be continuously sending the IOs, CPU B may always see a valid reply descriptor (posted by HBA Firmware after processing the IO) in the reply descriptor queue. In worst case, driver will not quit from this loop in the ISR handler. Eventually, CPU lockup will be detected by watchdog. Above mentioned behavior is not common if "rq_affinity" set to 2 or affinity_hint is honored by irqbalance as "exact". If rq_affinity is set to 2, submitter will be always interrupted via completion on same CPU. If irqbalance is using "exact" policy, interrupt will be delivered to submitter CPU. If CPU counts to MSI-X vectors (reply descriptor Queues) count ratio is not 1:1, we still have exposure of issue explained above and for that we don't have any solution. Exposure of soft/hard lockup if CPU count is more than MSI-x supported by device. If CPUs count to MSI-x vectors count ratio is not 1:1, (Other way, if CPU counts to MSI-x vector count ratio is something like X:1, where X > 1) then 'exact' irqbalance policy OR rq_affinity = 2 won't help to avoid CPU hard/soft lockups. There won't be any one to one mapping between CPU to MSI-x vector instead one MSI-x interrupt (or reply descriptor queue) is shared with group/set of CPUs and there is a possibility of having a loop in the IO path within that CPU group and may observe lockups. For example: Consider a system having two NUMA nodes and each node having four logical CPUs and also consider that number of MSI-x vectors enabled on the HBA is two, then CPUs count to MSI-x vector count ratio as 4:1. e.g. MSIx vector 0 is affinity to CPU 0, CPU 1, CPU 2 & CPU 3 of NUMA node 0 and MSI-x vector 1 is affinity to CPU 4, CPU 5, CPU 6 & CPU 7 of NUMA node 1. numactl --hardware available: 2 nodes (0-1) node 0 cpus: 0 1 2 3 --> MSI-x 0 node 0 size: 65536 MB node 0 free: 63176 MB node 1 cpus: 4 5 6 7 -->MSI-x 1 node 1 size: 65536 MB node 1 free: 63176 MB Assume that user started an application which uses all the CPUs of NUMA node 0 for issuing the IOs. Only one CPU from affinity list (it can be any cpu since this behavior depends upon irqbalance) CPU0 will receive the interrupts from MSIx vector 0 for all the IOs. Eventually, CPU 0 IO submission percentage will be decreasing and ISR processing percentage will be increasing as it is more busy with processing the interrupts. Gradually IO submission percentage on CPU 0 will be zero and it's ISR processing percentage will be 100 percentage as IO loop has already formed within the NUMA node 0, i.e. CPU 1, CPU 2 & CPU 3 will be continuously busy with submitting the heavy IOs and only CPU 0 is busy in the ISR path as it always find the valid reply descriptor in the reply descriptor queue. Eventually, we will observe the hard lockup here. Chances of occurring of hard/soft lockups are directly proportional to value of X. If value of X is high, then chances of observing CPU lockups is high. Solution: Use IRQ poll interface defined in " irq_poll.c". mpt3sas driver will execute ISR routine in Softirq context and it will always quit the loop based on budget provided in IRQ poll interface. In these scenarios (i.e. where CPUs count to MSI-X vectors count ratio is X:1 (where X > 1)), IRQ poll interface will avoid CPU hard lockups due to voluntary exit from the reply queue processing based on budget. Note - Only one MSI-x vector is busy doing processing. Irqstat output: IRQs / 1 second(s) IRQ# TOTAL NODE0 NODE1 NODE2 NODE3 NAME 44 122871 122871 0 0 0 IR-PCI-MSI-edge mpt3sas0-msix0 45 0 0 0 0 0 IR-PCI-MSI-edge mpt3sas0-msix1 We use this approach only if cpu count is more than FW supported MSI-x vector Signed-off-by: Suganath Prabu <suganath-prabu.subramani@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
Suganath Prabu authored
Separate out processing of reply descriptor post queue from _base_interrupt to _base_process_reply_queue. Signed-off-by: Suganath Prabu <suganath-prabu.subramani@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
Suganath Prabu authored
Fixed typo in request_desript_type. request_desript_type --> request_descript_type. Signed-off-by: Suganath Prabu <suganath-prabu.subramani@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
Alan Adamson authored
The product_id and revision attributes will allow for the modification of the T10 Model and Revision strings returned in inquiry responses. Its value can be viewed and modified via the ConfigFS path at: target/core/$backstore/$name/wwn/product_id target/core/$backstore/$name/wwn/revision [mkp: dropped parentheses as requested by Bart] Signed-off-by: Alan Adamson <alan.adamson@oracle.com> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
Don Brace authored
Reviewed-by: Gerry Morong <gerry.morong@microsemi.com> Reviewed-by: David Carroll <david.carroll@microsemi.com> Reviewed-by: Scott Benesh <scott.benesh@microsemi.com> Reviewed-by: Scott Teel <scott.teel@microsemi.com> Reviewed-by: Kevin Barnett <kevin.barnett@microsemi.com> Signed-off-by: Don Brace <don.brace@microsemi.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
Don Brace authored
Reviewed-by: David Carroll <david.carroll@microsemi.com> Reviewed-by: Scott Benesh <scott.benesh@microsemi.com> Reviewed-by: Scott Teel <scott.teel@microsemi.com> Reviewed-by: Kevin Barnett <kevin.barnett@microsemi.com> Signed-off-by: Don Brace <don.brace@microsemi.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
Don Brace authored
Reviewed-by: Gerry Morong <gerry.morong@microsemi.com> Reviewed-by: Scott Benesh <scott.benesh@microsemi.com> Reviewed-by: Scott Teel <scott.teel@microsemi.com> Reviewed-by: David Carroll <david.carroll@microsemi.com> Reviewed-by: Kevin Barnett <kevin.barnett@microsemi.com> Signed-off-by: Don Brace <don.brace@microsemi.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
Ajish Koshy authored
Reviewed-by: Scott Teel <scott.teel@microsemi.com> Reviewed-by: Scott Benesh <scott.benesh@microsemi.com> Reviewed-by: David Carroll <david.carroll@microsemi.com> Reviewed-by: Kevin Barnett <kevin.barnett@microsemi.com> Signed-off-by: Ajish Koshy <ajish.koshy@microsemi.com> Signed-off-by: Don Brace <don.brace@microsemi.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
Kevin Barnett authored
Reviewed-by: Scott Benesh <scott.benesh@microsemi.com> Reviewed-by: David Carroll <david.carroll@microsemi.com> Reviewed-by: Kevin Barnett <kevin.barnett@microsemi.com> Signed-off-by: Kevin Barnett <kevin.barnett@microsemi.com> Signed-off-by: Don Brace <don.brace@microsemi.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
Don Brace authored
Reviewed-by: Gerry Morong <gerry.morong@microsemi.com> Reviewed-by: Scott Teel <scott.teel@microsemi.com> Signed-off-by: Don Brace <don.brace@microsemi.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
Don Brace authored
There are times when a TUR can take longer than the DEFAULT_TIMEOUT value. The timeout code is not correct as the function exits with an automatic as the completion variable...To be fixed later. Remove the TUR timeout. Reviewed-by: Scott Benesh <scott.benesh@microsemi.com> Reviewed-by: Scott Teel <scott.teel@microsemi.com> Signed-off-by: Don Brace <don.brace@microsemi.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
Don Brace authored
Correct a 'rare' race condition where a disk is failed after a device list has been obtained from the controller and before attempting to get the device id. Reviewed-by: Scott Teel <scott.teel@microsemi.com> Signed-off-by: Don Brace <don.brace@microsemi.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
Don Brace authored
Multipath failures are normally detected at the frequency of the event thread. Detect LUN failures earlier by checking request completion status. Reviewed-by: Bader Ali-saleh <bader.ali-saleh@microsemi.com> Reviewed-by: Scott Benesh <scott.benesh@microsemi.com> Reviewed-by: Scott Teel <scott.teel@microsemi.com> Reviewed-by: Prasad Munirathnam <Prasad.Munirathnam@microsemi.com> Reviewed-by: Kevin Barnett <kevin.barnett@microsemi.com> Signed-off-by: Don Brace <don.brace@microsemi.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
- 17 Mar, 2019 14 commits
-
-
Linus Torvalds authored
-
git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuildLinus Torvalds authored
Pull more Kbuild updates from Masahiro Yamada: - add more Build-Depends to Debian source package - prefix header search paths with $(srctree)/ - make modpost show verbose section mismatch warnings - avoid hard-coded CROSS_COMPILE for h8300 - fix regression for Debian make-kpkg command - add semantic patch to detect missing put_device() - fix some warnings of 'make deb-pkg' - optimize NOSTDINC_FLAGS evaluation - add warnings about redundant generic-y - clean up Makefiles and scripts * tag 'kbuild-v5.1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: kconfig: remove stale lxdialog/.gitignore kbuild: force all architectures except um to include mandatory-y kbuild: warn redundant generic-y Revert "modsign: Abort modules_install when signing fails" kbuild: Make NOSTDINC_FLAGS a simply expanded variable kbuild: deb-pkg: avoid implicit effects coccinelle: semantic code search for missing put_device() kbuild: pkg: grep include/config/auto.conf instead of $KCONFIG_CONFIG kbuild: deb-pkg: introduce is_enabled and if_enabled_echo to builddeb kbuild: deb-pkg: add CONFIG_ prefix to kernel config options kbuild: add workaround for Debian make-kpkg kbuild: source include/config/auto.conf instead of ${KCONFIG_CONFIG} unicore32: simplify linker script generation for decompressor h8300: use cc-cross-prefix instead of hardcoding h8300-unknown-linux- kbuild: move archive command to scripts/Makefile.lib modpost: always show verbose warning for section mismatch ia64: prefix header search path with $(srctree)/ libfdt: prefix header search paths with $(srctree)/ deb-pkg: generate correct build dependencies
-
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds authored
Pull x86 asm updates from Thomas Gleixner: "Two cleanup patches removing dead conditionals and unused code" * 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/asm: Remove unused __constant_c_x_memset() macro and inlines x86/asm: Remove dead __GNUC__ conditionals
-
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds authored
Pull perf fixes from Thomas Gleixner: "Three fixes for the fallout from the TSX errata workaround: - Prevent memory corruption caused by a unchecked out of bound array index. - Two trivial fixes to address compiler warnings" * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/x86/intel: Make dev_attr_allow_tsx_force_abort static perf/x86: Fixup typo in stub functions perf/x86/intel: Fix memory corruption
-
git://git.kernel.org/pub/scm/linux/kernel/git/xen/tipLinus Torvalds authored
Pull xen fix from Juergen Gross: "A fix for a Xen bug introduced by David's series for excluding ballooned pages in vmcores" * tag 'for-linus-5.1b-rc1b-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: xen/balloon: Fix mapping PG_offline pages to user space
-
git://github.com/martinetd/linuxLinus Torvalds authored
Pull 9p updates from Dominique Martinet: "Here is a 9p update for 5.1; there honestly hasn't been much. Two fixes (leak on invalid mount argument and possible deadlock on i_size update on 32bit smp) and a fall-through warning cleanup" * tag '9p-for-5.1' of git://github.com/martinetd/linux: 9p/net: fix memory leak in p9_client_create 9p: use inode->i_lock to protect i_size_write() under 32-bit 9p: mark expected switch fall-through
-
kbuild test robot authored
Fixes: 400816f6 ("perf/x86/intel: Implement support for TSX Force Abort") Signed-off-by: kbuild test robot <lkp@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org> Cc: kbuild-all@01.org Cc: Borislav Petkov <bp@alien8.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20190313184243.GA10820@lkp-sb-ep06
-
Masahiro Yamada authored
When this .gitignore was added, lxdialog was an independent hostprogs-y. Now that all objects in lxdialog/ are directly linked to mconf, the lxdialog is no longer generated. Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
-
Masahiro Yamada authored
Currently, every arch/*/include/uapi/asm/Kbuild explicitly includes the common Kbuild.asm file. Factor out the duplicated include directives to scripts/Makefile.asm-generic so that no architecture would opt out of the mandatory-y mechanism. um is not forced to include mandatory-y since it is a very exceptional case which does not support UAPI. Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
-
Masahiro Yamada authored
The generic-y is redundant under the following condition: - arch has its own implementation - the same header is added to generated-y - the same header is added to mandatory-y If a redundant generic-y is found, the warning like follows is displayed: scripts/Makefile.asm-generic:20: redundant generic-y found in arch/arm/include/asm/Kbuild: timex.h I fixed up arch Kbuild files found by this. Suggested-by: Sam Ravnborg <sam@ravnborg.org> Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
-
Douglas Anderson authored
This reverts commit caf6fe91. The commit was fine but is no longer needed as of commit 3a2429e1 ("kbuild: change if_changed_rule for multi-line recipe"). Let's go back to using ";" to be consistent. For some discussion, see: https://lkml.kernel.org/r/CAK7LNASde0Q9S5GKeQiWhArfER4S4wL1=R_FW8q0++_X3T5=hQ@mail.gmail.comSigned-off-by: Douglas Anderson <dianders@chromium.org> Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
-
Douglas Anderson authored
During a simple no-op (nothing changed) build I saw 39 invocations of the C compiler with the argument "-print-file-name=include". We don't need to call the C compiler 39 times for this--one time will suffice. Let's change NOSTDINC_FLAGS to a simply expanded variable to avoid this since there doesn't appear to be any reason it should be recursively expanded. On my build this shaved ~400 ms off my "no-op" build. Note that the recursive expansion seems to date back to the (really old) commit e8f5bdb0 ("[PATCH] Makefile include path ordering"). It's a little unclear to me if the point of that patch was to switch the variable to be recursively expanded (which it did) or to avoid directly assigning to NOSTDINC_FLAGS (AKA to switch to +=) because someone else (out of tree?) was setting it. I presume later since if the only goal was to switch to recursive expansion the patch would have just removed the ":". Signed-off-by: Douglas Anderson <dianders@chromium.org> Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
-
Arseny Maslennikov authored
* The man page for dpkg-source(1) notes: > -b, --build directory [format-specific-parameters] > Build a source package (--build since dpkg 1.17.14). > <...> > > dpkg-source will build the source package with the first > format found in this ordered list: the format indicated > with the --format command line option, the format > indicated in debian/source/format, “1.0”. The fallback > to “1.0” is deprecated and will be removed at some point > in the future, you should always document the desired > source format in debian/source/format. See section > SOURCE PACKAGE FORMATS for an extensive description of > the various source package formats. Thus it would be more foolproof to explicitly use 1.0 (as we always did) than to rely on dpkg-source's defaults. * In a similar vein, debian/rules is not made executable by mkdebian, and dpkg-source warns about that but still silently fixes the file. Let's be explicit once again. Signed-off-by: Arseny Maslennikov <ar@cs.msu.ru> Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
-
Wen Yang authored
The of_find_device_by_node() takes a reference to the underlying device structure, we should release that reference. The implementation of this semantic code search is: In a function, for a local variable returned by calling of_find_device_by_node(), a, if it is released by a function such as put_device()/of_dev_put()/platform_device_put() after the last use, it is considered that there is no reference leak; b, if it is passed back to the caller via dev_get_drvdata()/platform_get_drvdata()/get_device(), etc., the reference will be released in other functions, and the current function also considers that there is no reference leak; c, for the rest of the situation, the current function should release the reference by calling put_device, this code search will report the corresponding error message. By using this semantic code search, we have found some object reference leaks, such as: commit 11907e9d ("ASoC: fsl-asoc-card: fix object reference leaks in fsl_asoc_card_probe") commit a12085d1 ("mtd: rawnand: atmel: fix possible object reference leak") commit 11493f26 ("mtd: rawnand: jz4780: fix possible object reference leak") There are still dozens of reference leaks in the current kernel code. Further, for the case of b, the object returned to other functions may also have a reference leak, we will continue to develop other cocci scripts to further check the reference leak. Signed-off-by: Wen Yang <wen.yang99@zte.com.cn> Reviewed-by: Julia Lawall <Julia.Lawall@lip6.fr> Reviewed-by: Markus Elfring <Markus.Elfring@web.de> Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
-
- 16 Mar, 2019 2 commits
-
-
git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linuxLinus Torvalds authored
Pull pidfd system call from Christian Brauner: "This introduces the ability to use file descriptors from /proc/<pid>/ as stable handles on struct pid. Even if a pid is recycled the handle will not change. For a start these fds can be used to send signals to the processes they refer to. With the ability to use /proc/<pid> fds as stable handles on struct pid we can fix a long-standing issue where after a process has exited its pid can be reused by another process. If a caller sends a signal to a reused pid it will end up signaling the wrong process. With this patchset we enable a variety of use cases. One obvious example is that we can now safely delegate an important part of process management - sending signals - to processes other than the parent of a given process by sending file descriptors around via scm rights and not fearing that the given process will have been recycled in the meantime. It also allows for easy testing whether a given process is still alive or not by sending signal 0 to a pidfd which is quite handy. There has been some interest in this feature e.g. from systems management (systemd, glibc) and container managers. I have requested and gotten comments from glibc to make sure that this syscall is suitable for their needs as well. In the future I expect it to take on most other pid-based signal syscalls. But such features are left for the future once they are needed. This has been sitting in linux-next for quite a while and has not caused any issues. It comes with selftests which verify basic functionality and also test that a recycled pid cannot be signaled via a pidfd. Jon has written about a prior version of this patchset. It should cover the basic functionality since not a lot has changed since then: https://lwn.net/Articles/773459/ The commit message for the syscall itself is extensively documenting the syscall, including it's functionality and extensibility" * tag 'pidfd-v5.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux: selftests: add tests for pidfd_send_signal() signal: add pidfd_send_signal() syscall
-
git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimmLinus Torvalds authored
Pull device-dax updates from Dan Williams: "New device-dax infrastructure to allow persistent memory and other "reserved" / performance differentiated memories, to be assigned to the core-mm as "System RAM". Some users want to use persistent memory as additional volatile memory. They are willing to cope with potential performance differences, for example between DRAM and 3D Xpoint, and want to use typical Linux memory management apis rather than a userspace memory allocator layered over an mmap() of a dax file. The administration model is to decide how much Persistent Memory (pmem) to use as System RAM, create a device-dax-mode namespace of that size, and then assign it to the core-mm. The rationale for device-dax is that it is a generic memory-mapping driver that can be layered over any "special purpose" memory, not just pmem. On subsequent boots udev rules can be used to restore the memory assignment. One implication of using pmem as RAM is that mlock() no longer keeps data off persistent media. For this reason it is recommended to enable NVDIMM Security (previously merged for 5.0) to encrypt pmem contents at rest. We considered making this recommendation an actively enforced requirement, but in the end decided to leave it as a distribution / administrator policy to allow for emulation and test environments that lack security capable NVDIMMs. Summary: - Replace the /sys/class/dax device model with /sys/bus/dax, and include a compat driver so distributions can opt-in to the new ABI. - Allow for an alternative driver for the device-dax address-range - Introduce the 'kmem' driver to hotplug / assign a device-dax address-range to the core-mm. - Arrange for the device-dax target-node to be onlined so that the newly added memory range can be uniquely referenced by numa apis" NOTE! I'm not entirely happy with the whole "PMEM as RAM" model because we currently have special - and very annoying rules in the kernel about accessing PMEM only with the "MC safe" accessors, because machine checks inside the regular repeat string copy functions can be fatal in some (not described) circumstances. And apparently the PMEM modules can cause that a lot more than regular RAM. The argument is that this happens because PMEM doesn't necessarily get scrubbed at boot like RAM does, but that is planned to be added for the user space tooling. Quoting Dan from another email: "The exposure can be reduced in the volatile-RAM case by scanning for and clearing errors before it is onlined as RAM. The userspace tooling for that can be in place before v5.1-final. There's also runtime notifications of errors via acpi_nfit_uc_error_notify() from background scrubbers on the DIMM devices. With that mechanism the kernel could proactively clear newly discovered poison in the volatile case, but that would be additional development more suitable for v5.2. I understand the concern, and the need to highlight this issue by tapping the brakes on feature development, but I don't see PMEM as RAM making the situation worse when the exposure is also there via DAX in the PMEM case. Volatile-RAM is arguably a safer use case since it's possible to repair pages where the persistent case needs active application coordination" * tag 'devdax-for-5.1' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: device-dax: "Hotplug" persistent memory for use like normal RAM mm/resource: Let walk_system_ram_range() search child resources mm/memory-hotplug: Allow memory resources to be children mm/resource: Move HMM pr_debug() deeper into resource code mm/resource: Return real error codes from walk failures device-dax: Add a 'modalias' attribute to DAX 'bus' devices device-dax: Add a 'target_node' attribute device-dax: Auto-bind device after successful new_id acpi/nfit, device-dax: Identify differentiated memory with a unique numa-node device-dax: Add /sys/class/dax backwards compatibility device-dax: Add support for a dax override driver device-dax: Move resource pinning+mapping into the common driver device-dax: Introduce bus + driver model device-dax: Start defining a dax bus model device-dax: Remove multi-resource infrastructure device-dax: Kill dax_region base device-dax: Kill dax_region ida
-