- 17 Apr, 2017 8 commits
-
-
Saeed Mahameed authored
Create mlx5e IPoIB netdevice profile skeleton in the new ipoib.c file with empty implementation. Downstream patches will provide the full mlx5 rdma netdevice acceleration support for IPoIB into this new file, by using the mlx5e netdevice profile and new mlx5_channels APIs and infrastructures. Same as already done in mlx5e NIC netdevice and switchdev mode VF representors. Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Reviewed-by: Erez Shitrit <erezsh@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Saeed Mahameed authored
In preparation for mlx5e RDMA net_device support, here we generalize mlx5e_attach/detach in a way that those functions will be agnostic to link type. For that we move ethernet specific NIC net device logic out of those functions into {nic,rep}_{enable/disable} mlx5e NIC and representor profiles callbacks. Also some of the logic was moved only to NIC profile since it is not right to have this logic for representor net device (e.g. set port MTU). Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Reviewed-by: Erez Shitrit <erezsh@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Erez Shitrit authored
Get the relevant capabilities if the device supports ipoib_enhanced_offloads and init the flow steering table accordingly. Signed-off-by: Erez Shitrit <erezsh@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Erez Shitrit authored
IB flow tables need the underlay qp to perform flow steering. Here we change the API of the flow tables creation to accept the underlay QP number as a parameter in order to support IB (IPoIB) flow steering. Signed-off-by: Erez Shitrit <erezsh@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Erez Shitrit authored
New capability bit: ipoib_enhanced_offloads, which indicates the new ability of a UD QP to do RSS and enhanced IPoIB offloads and acceleration. Add underlay_qpn to the TIS and flow_table objects in order to support the SET_ROOT command, to connect IPoIB QPs with flow steering tables. Signed-off-by: Erez Shitrit <erezsh@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Haiyang Zhang authored
Azure hosts do not currently support non-TCP port numbers in vRSS hashing. For example, the UDP packet loss rate will be high if port numbers are also included in the vRSS hash. So this patch uses only IP addresses for hashing non-TCP traffic. Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com> Reviewed-by: Stephen Hemminger <sthemmin@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>
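As a rough illustration of the hashing rule described in this commit, the sketch below mixes port numbers into the hash only for TCP flows. The `struct flow` layout, the toy `mix()` step, and every name here are hypothetical stand-ins for illustration; this is not the netvsc driver's code.

```c
#include <assert.h>
#include <stdint.h>

#define PROTO_TCP 6  /* IANA protocol number for TCP */
#define PROTO_UDP 17 /* IANA protocol number for UDP */

/* Hypothetical flow description; not a kernel structure. */
struct flow {
    uint32_t saddr, daddr; /* IPv4 source and destination address */
    uint16_t sport, dport; /* L4 source and destination port */
    uint8_t proto;         /* IP protocol number */
};

/* Toy mixing step, used only for illustration. */
static uint32_t mix(uint32_t h, uint32_t v)
{
    return h * 31u + v;
}

/* Always hash the addresses; add the ports only for TCP. */
static uint32_t flow_hash(const struct flow *f)
{
    uint32_t h = mix(mix(0, f->saddr), f->daddr);

    if (f->proto == PROTO_TCP)
        h = mix(h, ((uint32_t)f->sport << 16) | f->dport);
    return h;
}
```

With this rule, two UDP flows between the same pair of addresses land on the same hash regardless of their ports, while TCP flows are still spread by port.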
-
Haiyang Zhang authored
If the outgoing skb has an RX queue mapping available, we use the queue number directly, rather than putting it through the Send Indirection Table. Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com> Reviewed-by: Stephen Hemminger <sthemmin@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>
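The queue selection described in this commit can be sketched roughly as follows. The `struct pkt` fields, the table size, and the final modulo clamp are assumptions made for the example; the driver's real data structures and bounds handling may differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SEND_TABLE_SIZE 16 /* hypothetical indirection table size */

/* Hypothetical packet metadata; stands in for the skb fields. */
struct pkt {
    bool rx_queue_recorded; /* was an RX queue mapping recorded? */
    uint16_t rx_queue;      /* recorded RX queue number */
    uint32_t hash;          /* flow hash of the packet */
};

/*
 * Use the recorded RX queue directly when available; otherwise look the
 * queue up in the send indirection table. num_queues must be nonzero.
 */
static uint16_t pick_tx_queue(const struct pkt *p,
                              const uint16_t send_table[SEND_TABLE_SIZE],
                              uint16_t num_queues)
{
    uint16_t q;

    if (p->rx_queue_recorded)
        q = p->rx_queue;
    else
        q = send_table[p->hash % SEND_TABLE_SIZE];

    /* Assumed clamp so the result is always a valid queue index. */
    return (uint16_t)(q % num_queues);
}
```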
-
Vivien Didelot authored
This patch moves the legacy DSA code as is from dsa.c to legacy.c, except for the few shared symbols, which remain in dsa.c. Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 16 Apr, 2017 1 commit
-
-
git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
David S. Miller authored
Conflicts were simply overlapping changes. In the net/ipv4/route.c case the code had simply moved around a little bit and the same fix was made in both 'net' and 'net-next'. In the net/sched/sch_generic.c case a fix in 'net' happened at the same time that a new argument was added to qdisc_hash_add(). Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 15 Apr, 2017 3 commits
-
-
git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input
Linus Torvalds authored
Pull input fixes from Dmitry Torokhov: "Just a small update to xpad driver to recognize yet another gamepad, and another change making sure userio.h is exported" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input: Input: xpad - add support for Razer Wildcat gamepad uapi: add missing install of userio.h
-
git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Linus Torvalds authored
Pull networking fixes from David Miller: "Things seem to be settling down as far as networking is concerned, let's hope this trend continues... 1) Add iov_iter_revert() and use it to fix the behavior of skb_copy_datagram_msg() et al., from Al Viro. 2) Fix the protocol used in the synthetic SKB we cons up for the purposes of doing a simulated route lookup for RTM_GETROUTE requests. From Florian Larysch. 3) Don't add noop_qdisc to the per-device qdisc hashes, from Cong Wang. 4) Don't call netdev_change_features with the team lock held, from Xin Long. 5) Revert TCP F-RTO extension to catch more spurious timeouts because it interacts very badly with some middle-boxes. From Yuchung Cheng. 6) Fix the loss of error values in l2tp {s,g}etsockopt calls, from Guillaume Nault. 7) ctnetlink uses bit positions where it should be using bit masks, fix from Liping Zhang. 8) Missing RCU locking in netfilter helper code, from Gao Feng. 9) Avoid double frees and use-after-frees in tcp_disconnect(), from Eric Dumazet. 10) Don't do a changelink before we register the netdevice in bridging, from Ido Schimmel. 
11) Lock the ipv6 device address list properly, from Rabin Vincent" * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (29 commits) netfilter: ipt_CLUSTERIP: Fix wrong conntrack netns refcnt usage netfilter: nft_hash: do not dump the auto generated seed drivers: net: usb: qmi_wwan: add QMI_QUIRK_SET_DTR for Telit PID 0x1201 ipv6: Fix idev->addr_list corruption net: xdp: don't export dev_change_xdp_fd() bridge: netlink: register netdevice before executing changelink bridge: implement missing ndo_uninit() bpf: reference may_access_skb() from __bpf_prog_run() tcp: clear saved_syn in tcp_disconnect() netfilter: nf_ct_expect: use proper RCU list traversal/update APIs netfilter: ctnetlink: skip dumping expect when nfct_help(ct) is NULL netfilter: make it safer during the inet6_dev->addr_list traversal netfilter: ctnetlink: make it safer when checking the ct helper name netfilter: helper: Add the rcu lock when call __nf_conntrack_helper_find netfilter: ctnetlink: using bit to represent the ct event netfilter: xt_TCPMSS: add more sanity tests on tcph->doff net: tcp: Increase TCP_MIB_OUTRSTS even though fail to alloc skb l2tp: don't mask errors in pppol2tp_getsockopt() l2tp: don't mask errors in pppol2tp_setsockopt() tcp: restrict F-RTO to work-around broken middle-boxes ...
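Item 7 in the list above names a common class of bug: testing an event word against a bit position instead of the mask derived from that position. A minimal sketch with made-up event names (these are not the ctnetlink definitions):

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical event numbers, expressed as bit positions. */
enum { EV_NEW = 0, EV_RELATED = 1, EV_DESTROY = 2 };

/* Buggy: uses the bit position itself as if it were a mask. */
static bool has_event_buggy(unsigned int events, unsigned int pos)
{
    return (events & pos) != 0;
}

/* Fixed: converts the position to a mask before testing. */
static bool has_event_fixed(unsigned int events, unsigned int pos)
{
    return (events & (1u << pos)) != 0;
}
```

With only `EV_NEW` set, the buggy test misses `EV_NEW` (position 0 masks nothing) and falsely reports `EV_RELATED`, whose position value 1 happens to overlap the `EV_NEW` bit.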
-
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Linus Torvalds authored
Pull x86 fixes from Thomas Gleixner: "A set of small fixes for x86: - fix locking in RDT to prevent memory leaks and freeing in use memory - prevent setting invalid values for vdso32_enabled which cause inconsistencies for user space resulting in application crashes. - plug a race in the vdso32 code between fork and sysctl which causes inconsistencies for user space resulting in application crashes. - make MPX signal delivery work in compat mode - make the dmesg output of traps and faults readable again" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/intel_rdt: Fix locking in rdtgroup_schemata_write() x86/debug: Fix the printk() debug output of signal_fault(), do_trap() and do_general_protection() x86/vdso: Plug race between mapping and ELF header setup x86/vdso: Ensure vdso32_enabled gets set to valid values only x86/signals: Fix lower/upper bound reporting in compat siginfo
-
- 14 Apr, 2017 28 commits
-
-
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Linus Torvalds authored
Pull perf fixes from Thomas Gleixner: "Two small fixes for perf: - the move to support cross arch annotation introduced per arch initialization requirements, fulfill them for s/390 (Christian Borntraeger) - add the missing initialization to the LBR entries to avoid exposing random or stale data" * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/x86: Avoid exposing wrong/stale data in intel_pmu_lbr_read_32() perf annotate s390: Fix perf annotate error -95 (4.10 regression)
-
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Linus Torvalds authored
Pull irq fixes from Thomas Gleixner: "The irq department provides: - two fixes for the CPU affinity spread infrastructure to prevent unbalanced spreading in corner cases which leads to horrible performance, because interrupts are rather aggregated than spread - add a missing spinlock initializer in the imx-gpcv2 init code" * 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: irqchip/irq-imx-gpcv2: Fix spinlock initialization irq/affinity: Fix extra vecs calculation irq/affinity: Fix CPU spread for unbalanced nodes
-
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Linus Torvalds authored
Pull EFI fixes from Thomas Gleixner: "Three fixes from EFI land: - prevent accessing a Graphic Output Device (GOP) which the kernel does not know to handle - prevent PCI reconfiguration to modify a BAR which covers the framebuffer because that's already in use through the EFI GOP interface - avoid reserving EFI runtime regions as this results in bogus memory mappings" * 'efi-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/efi: Don't try to reserve runtime regions efi/fb: Avoid reconfiguration of BAR that covers the framebuffer efi/libstub: Skip GOP with PIXEL_BLT_ONLY format
-
git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs
Linus Torvalds authored
Pull btrfs fixes from Chris Mason: "Dave Sterba collected a few more fixes for the last rc. These aren't marked for stable, but I'm putting them in with a batch we're testing/sending by hand for this release" * 'for-linus-4.11' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: Btrfs: fix potential use-after-free for cloned bio Btrfs: fix segmentation fault when doing dio read Btrfs: fix invalid dereference in btrfs_retry_endio btrfs: drop the nossd flag when remounting with -o ssd
-
git://git.samba.org/sfrench/cifs-2.6
Linus Torvalds authored
Pull more CIFS fixes from Steve French: "As promised, here is the remaining set of cifs/smb3 fixes for stable (and a fix for one regression) now that they have had additional review and testing" * 'for-next' of git://git.samba.org/sfrench/cifs-2.6: CIFS: Fix SMB3 mount without specifying a security mechanism CIFS: store results of cifs_reopen_file to avoid infinite wait CIFS: remove bad_network_name flag CIFS: reconnect thread reschedule itself CIFS: handle guest access errors to Windows shares CIFS: Fix null pointer deref during read resp processing
-
git://github.com/bzolnier/linux
Linus Torvalds authored
Pull fbdev fixes from Bartlomiej Zolnierkiewicz: - fix probing time checks in omapfb driver (regression fix) - fix optional VBAT support in ssd1307fb driver (regression fix) - fix connecting to backend in xen-fbfront driver * tag 'fbdev-v4.11-rc6' of git://github.com/bzolnier/linux: fbdev: omapfb: delete check_required_callbacks() xen, fbfront: fix connecting to backend fbdev/ssd1307fb: fix optional VBAT support
-
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Linus Torvalds authored
Pull power management fixes from Rafael Wysocki: "These fix a cpufreq core regression related to CPU online/offline and several issues in the turbostat and cpupower utilities. Specifics: - Allow CPUs to be put back online even if the cpufreq driver is unable to work with them (eg. due to missing information from platform firmware), which was the previous behavior expected by users, but changed in the 4.9 time frame (Chen Yu). - Fix a few minor issues in the turbostat utility, introduced mostly during the recent update of it (Len Brown, Doug Smythies). - Fix a cpupower utility bug causing it to report incorrect values for turbo frequencies in some cases (Ben Hutchings)" * tag 'pm-4.11-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: cpupower: Fix turbo frequency reporting for pre-Sandy Bridge cores cpufreq: Bring CPUs up even if cpufreq_online() failed tools/power turbostat: update version number tools/power turbostat: fix impossibly large CPU%c1 value tools/power turbostat: turbostat.8 add missing column definitions tools/power turbostat: update HWP dump to decimal from hex tools/power turbostat: enable package THERM_INTERRUPT dump tools/power turbostat: show missing Core and GFX power on SKL and KBL tools/power turbostat: bugfix: GFXMHz column not changing
-
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Linus Torvalds authored
Pull ACPI fixes from Rafael Wysocki: "These revert a recent ACPICA commit that turned out to be problematic and fix a device enumeration breakage from the 4.8 cycle. Specifics: - Revert a recent ACPICA commit targeted at catching firmware bugs which promptly did that and caused functional problems to appear (Rafael Wysocki). - Fix a device enumeration problem introduced in the 4.8 time frame which caused the ACPI docking station driver to report incorrect status via sysfs among other things (Rafael Wysocki)" * tag 'acpi-4.11-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: Revert "ACPICA: Resources: Not a valid resource if buffer length too long" ACPI / scan: Set the visited flag for all enumerated devices
-
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Linus Torvalds authored
Pull CONFIG_STRICT_DEVMEM fix from Kees Cook: "Fixes /dev/mem to read back zeros for System RAM areas in the 1MB exception area on x86 to avoid exposing RAM or tripping hardened usercopy" * tag 'devmem-v4.11-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: mm: Tighten x86 /dev/mem with zeroing reads
-
git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost
Linus Torvalds authored
Pull virtio fixes from Michael S. Tsirkin: "virtio oops fixes. The virtio pci rework using shared interrupts caused a lot of issues. We tried to fix them but ran out of time. Revert for now, and revisit the issue for the next kernel. Luckily we are able to do this without losing automatic interrupt NUMA affinity which was the main motivator for the rework" * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: virtio-pci: Remove affinity hint before freeing the interrupt Revert "virtio_pci: remove struct virtio_pci_vq_info" Revert "virtio_pci: use shared interrupts for virtqueues" Revert "virtio_pci: don't duplicate the msix_enable flag in struct pci_dev" Revert "virtio_pci: simplify MSI-X setup" Revert "virtio_pci: fix out of bound access for msix_names" MAINTAINERS: fix virtio file pattern virtio_console: fix uninitialized variable use virtio_net: clear MTU when out of range virtio: allow drivers to validate features virtio_net: enable big packets for large MTU values
-
git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf
David S. Miller authored
Pablo Neira Ayuso says: ==================== Netfilter fixes for net The following patchset contains Netfilter fixes for your net tree, they are: 1) Missing TCP header sanity check in TCPMSS target, from Eric Dumazet. 2) Incorrect event message type for related conntracks created via ctnetlink, from Liping Zhang. 3) Fix incorrect rcu locking when handling helpers from ctnetlink, from Gao Feng. 4) Fix missing rcu locking when updating helper, from Liping Zhang. 5) Fix missing read_lock_bh when iterating over list of device addresses from TPROXY and redirect, also from Liping. 6) Fix crash when trying to dump expectations from conntrack with no helper via ctnetlink, from Liping. 7) Missing RCU protection to expectation list update given ctnetlink iterates over the list under rcu read lock side, from Liping too. 8) Don't dump autogenerated seed in nft_hash to userspace, this is very confusing to the user, again from Liping. 9) Fix wrong conntrack netns module refcount in ipt_CLUSTERIP, from Gao Feng. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Aaro Koskinen authored
Commit 561eb9d0 ("fbdev: omap/lcd: Make callbacks optional") made panel callbacks optional but forgot to update check_required_callbacks(). As a result many (all?) OMAP systems using omapfb will crash at boot. Fix by deleting the whole function. Fixes: 561eb9d0 ("fbdev: omap/lcd: Make callbacks optional") Signed-off-by: Aaro Koskinen <aaro.koskinen@iki.fi> Cc: Tomi Valkeinen <tomi.valkeinen@ti.com> Cc: Lars-Peter Clausen <lars@metafoo.de> Signed-off-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
-
Rafael J. Wysocki authored
* acpi-scan-fixes: ACPI / scan: Set the visited flag for all enumerated devices * acpica-fixes: Revert "ACPICA: Resources: Not a valid resource if buffer length too long"
-
Rafael J. Wysocki authored
* pm-cpufreq-fixes: cpufreq: Bring CPUs up even if cpufreq_online() failed * pm-tools-fixes: cpupower: Fix turbo frequency reporting for pre-Sandy Bridge cores tools/power turbostat: update version number tools/power turbostat: fix impossibly large CPU%c1 value tools/power turbostat: turbostat.8 add missing column definitions tools/power turbostat: update HWP dump to decimal from hex tools/power turbostat: enable package THERM_INTERRUPT dump tools/power turbostat: show missing Core and GFX power on SKL and KBL tools/power turbostat: bugfix: GFXMHz column not changing
-
Tyler Baker authored
The raw_spinlock in the IMX GPCV2 interrupt chip is not initialized before usage. That results in a lockdep splat: INFO: trying to register non-static key. the code is fine but needs lockdep annotation. turning off the locking correctness validator. Add the missing raw_spin_lock_init() to the setup code. Fixes: e324c4dc ("irqchip/imx-gpcv2: IMX GPCv2 driver for wakeup sources") Signed-off-by: Tyler Baker <tyler.baker@linaro.org> Reviewed-by: Fabio Estevam <fabio.estevam@nxp.com> Cc: jason@lakedaemon.net Cc: marc.zyngier@arm.com Cc: shawnguo@kernel.org Cc: andrew.smirnov@gmail.com Cc: linux-arm-kernel@lists.infradead.org Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/20170413222731.5917-1-tyler.baker@linaro.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
-
Peter Zijlstra authored
When the perf_branch_entry::{in_tx,abort,cycles} fields were added, intel_pmu_lbr_read_32() wasn't updated to initialize them. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Cc: <stable@vger.kernel.org> Fixes: 135c5612 ("perf/x86/intel: Support Haswell/v4 LBR format") Signed-off-by: Ingo Molnar <mingo@kernel.org>
-
Linus Torvalds authored
Merge fixes from Andrew Morton: "11 fixes. The presence of 'thp: reduce indentation level in change_huge_pmd()' is unfortunate. But the patchset had been decently reviewed and tested before we decided it was needed in -stable and I felt it best not to churn things at the last minute" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: mailmap: add Martin Kepplinger's email zsmalloc: expand class bit zram: do not use copy_page with non-page aligned address zram: fix operator precedence to get offset hugetlbfs: fix offset overflow in hugetlbfs mmap thp: fix MADV_DONTNEED vs clear soft dirty race thp: fix MADV_DONTNEED vs. MADV_FREE race mm: drop unused pmdp_huge_get_and_clear_notify() thp: fix MADV_DONTNEED vs. numa balancing race thp: reduce indentation level in change_huge_pmd() z3fold: fix page locking in z3fold_alloc()
-
Martin Kepplinger authored
Set the partly deprecated companies' email addresses as aliases for the personal one. Link: http://lkml.kernel.org/r/1491984622-17321-1-git-send-email-martin.kepplinger@ginzinger.com Signed-off-by: Martin Kepplinger <martin.kepplinger@ginzinger.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Minchan Kim authored
On 64K page systems, zsmalloc now has 257 classes, so 8 class bits are not enough. As a result, the system is corrupted when zsmalloc stores 65536-byte data (i.e., index number 256). This patch increases the class bits as a simple fix suitable for stable backport. We should clean up this mess soon. (index, size): (0, 32), (1, 288), ..., (204, 52256), (256, 65536). Fixes: 3783689a ("zsmalloc: introduce zspage structure") Link: http://lkml.kernel.org/r/1492042622-12074-3-git-send-email-minchan@kernel.org Signed-off-by: Minchan Kim <minchan@kernel.org> Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
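To see why 257 classes do not fit in 8 bits: indices run from 0 to 256, while an 8-bit field holds only 0 to 255, so index 256 silently wraps to 0. The sketch below shows the wrap with two hypothetical header structs. The 9-bit width is an assumption chosen for illustration; the commit message only says the class bits were increased.

```c
#include <assert.h>

/* Hypothetical headers; only the width of the class field matters here. */
struct hdr_8bit { unsigned int class_idx : 8; };
struct hdr_9bit { unsigned int class_idx : 9; };

/* Store idx in an 8-bit field and read it back (wraps modulo 256). */
static unsigned int roundtrip_8bit(unsigned int idx)
{
    struct hdr_8bit h;

    h.class_idx = idx;
    return h.class_idx;
}

/* Same round trip through a 9-bit field (holds 0..511). */
static unsigned int roundtrip_9bit(unsigned int idx)
{
    struct hdr_9bit h;

    h.class_idx = idx;
    return h.class_idx;
}
```

Here `roundtrip_8bit(256)` yields 0, so an object of the largest class would be treated as belonging to the smallest one, while `roundtrip_9bit(256)` preserves the index.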
-
Minchan Kim authored
copy_page is a memcpy optimized for page-aligned addresses. If it is used with a non-page-aligned address, it can corrupt memory, which means system corruption. With zram, it can happen with 1. 64K architecture 2. partial IO 3. slub debug. Partial IO needs to allocate a page, and zram allocates it via kmalloc. With slub debug, kmalloc(PAGE_SIZE) doesn't return a page-size-aligned address. And finally, copy_page(mem, cmem) corrupts memory. So, this patch changes it to memcpy. Actually, we don't need to change the zram_bvec_write part because zsmalloc returns a page-aligned address in the case of the PAGE_SIZE class, but it's not good to rely on the internals of zsmalloc. Note: When this patch is merged to stable, clear_page should be fixed, too. Unfortunately, recent zram removes it by the "same page merge" feature, so it's hard to backport this patch to the -stable tree. I will handle it when I receive the mail from the stable tree maintainer to merge this patch to backport. Fixes: 42e99bd9 ("zram: optimize memory operations with clear_page()/copy_page()") Link: http://lkml.kernel.org/r/1492042622-12074-2-git-send-email-minchan@kernel.org Signed-off-by: Minchan Kim <minchan@kernel.org> Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Minchan Kim authored
In zram_rw_page, the logic to get the offset is wrong because of operator precedence (i.e., "<<" binds tighter than "&"). With the wrong offset, zram can corrupt the user's data. This patch fixes it. Fixes: 8c7f0102 ("zram: implement rw_page operation of zram") Link: http://lkml.kernel.org/r/1492042622-12074-1-git-send-email-minchan@kernel.org Signed-off-by: Minchan Kim <minchan@kernel.org> Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
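The precedence issue can be reproduced in isolation. In C, `a & b << c` parses as `a & (b << c)`, so a mask that should be applied before the shift gets shifted first. The sketch below spells out both parses; the constants assume 512-byte sectors and 4 KiB pages, and the function names are illustrative rather than taken from the zram source.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative constants, assuming 512-byte sectors and 4 KiB pages. */
#define SECTOR_SHIFT 9
#define SECTORS_PER_PAGE 8

/*
 * How "sector & (SECTORS_PER_PAGE - 1) << SECTOR_SHIFT" is parsed:
 * the mask is shifted first, then ANDed with the sector number.
 */
static uint64_t offset_as_parsed(uint64_t sector)
{
    return sector & ((uint64_t)(SECTORS_PER_PAGE - 1) << SECTOR_SHIFT);
}

/* Intended: take the sector index within the page, then convert to bytes. */
static uint64_t offset_intended(uint64_t sector)
{
    return (sector & (SECTORS_PER_PAGE - 1)) << SECTOR_SHIFT;
}
```

For sector 3, the intended byte offset within the page is 3 * 512 = 1536, whereas the unparenthesized form evaluates to 0.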
-
Mike Kravetz authored
If mmap() maps a file, it can be passed an offset into the file at which the mapping is to start. Offset could be a negative value when represented as a loff_t. The offset plus length will be used to update the file size (i_size) which is also a loff_t. Validate the value of offset and offset + length to make sure they do not overflow and appear as negative. Found by syzkaller with commit ff8c0c53 ("mm/hugetlb.c: don't call region_abort if region_chg fails") applied. Prior to this commit, the overflow would still occur but we would luckily return ENOMEM. To reproduce: mmap(0, 0x2000, 0, 0x40021, 0xffffffffffffffffULL, 0x8000000000000000ULL); Resulted in, kernel BUG at mm/hugetlb.c:742! Call Trace: hugetlbfs_evict_inode+0x80/0xa0 evict+0x24a/0x620 iput+0x48f/0x8c0 dentry_unlink_inode+0x31f/0x4d0 __dentry_kill+0x292/0x5e0 dput+0x730/0x830 __fput+0x438/0x720 ____fput+0x1a/0x20 task_work_run+0xfe/0x180 exit_to_usermode_loop+0x133/0x150 syscall_return_slowpath+0x184/0x1c0 entry_SYSCALL_64_fastpath+0xab/0xad Fixes: ff8c0c53 ("mm/hugetlb.c: don't call region_abort if region_chg fails") Link: http://lkml.kernel.org/r/1491951118-30678-1-git-send-email-mike.kravetz@oracle.com Reported-by: Vegard Nossum <vegard.nossum@oracle.com> Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com> Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Michal Hocko <mhocko@suse.com> Cc: "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
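The kind of validation this commit describes can be sketched in userspace C as below. `int64_t` stands in for the kernel's `loff_t`, and `range_is_valid()` is a hypothetical helper written for this example, not the function added by the patch. The check rejects a negative offset and any length that would push `offset + len` past the largest positive signed 64-bit value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: returns true when both offset and offset + len
 * are representable as non-negative signed 64-bit values.
 */
static bool range_is_valid(int64_t offset, uint64_t len)
{
    /* An offset with the top bit set appears negative as a signed value. */
    if (offset < 0)
        return false;

    /* Compare against the remaining headroom instead of adding,
     * so the check itself cannot overflow. */
    if (len > (uint64_t)INT64_MAX - (uint64_t)offset)
        return false;

    return true;
}
```

The reproducer in the commit message passes an offset of 0x8000000000000000, which is negative when interpreted as a signed 64-bit value, so the first check alone would reject it.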
-
Kirill A. Shutemov authored
Yet another instance of the same race. Fix is identical to change_huge_pmd(). See "thp: fix MADV_DONTNEED vs. numa balancing race" for more details. Link: http://lkml.kernel.org/r/20170302151034.27829-5-kirill.shutemov@linux.intel.com Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Hillf Danton <hillf.zj@alibaba-inc.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Kirill A. Shutemov authored
Both MADV_DONTNEED and MADV_FREE are handled with down_read(mmap_sem). It's critical to not clear pmd intermittently while handling MADV_FREE to avoid race with MADV_DONTNEED:

CPU0:                                      CPU1:
                                           madvise_free_huge_pmd()
                                            pmdp_huge_get_and_clear_full()
madvise_dontneed()
 zap_pmd_range()
  pmd_trans_huge(*pmd) == 0 (without ptl)
  // skip the pmd
                                            set_pmd_at();
                                            // pmd is re-established

It results in MADV_DONTNEED skipping the pmd, leaving it not cleared. It violates MADV_DONTNEED interface and can result in userspace misbehaviour. Basically it's the same race as with numa balancing in change_huge_pmd(), but a bit simpler to mitigate: we don't need to preserve dirty/young flags here due to MADV_FREE functionality. [kirill.shutemov@linux.intel.com: Urgh... Power is special again] Link: http://lkml.kernel.org/r/20170303102636.bhd2zhtpds4mt62a@black.fi.intel.com Link: http://lkml.kernel.org/r/20170302151034.27829-4-kirill.shutemov@linux.intel.com Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Acked-by: Minchan Kim <minchan@kernel.org> Cc: Minchan Kim <minchan@kernel.org> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Hillf Danton <hillf.zj@alibaba-inc.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Kirill A. Shutemov authored
Dave noticed that after fixing MADV_DONTNEED vs numa balancing race the last pmdp_huge_get_and_clear_notify() user is gone. Let's drop the helper. Link: http://lkml.kernel.org/r/20170306112047.24809-1-kirill.shutemov@linux.intel.com Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Kirill A. Shutemov authored
In the prot_numa case, we are under down_read(mmap_sem). It's critical to not clear pmd intermittently to avoid race with MADV_DONTNEED which is also under down_read(mmap_sem):

CPU0:                                      CPU1:
                                           change_huge_pmd(prot_numa=1)
                                            pmdp_huge_get_and_clear_notify()
madvise_dontneed()
 zap_pmd_range()
  pmd_trans_huge(*pmd) == 0 (without ptl)
  // skip the pmd
                                            set_pmd_at();
                                            // pmd is re-established

The race makes MADV_DONTNEED miss the huge pmd and not clear it, which may break userspace. Found by code analysis, never saw triggered. Link: http://lkml.kernel.org/r/20170302151034.27829-3-kirill.shutemov@linux.intel.com Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Hillf Danton <hillf.zj@alibaba-inc.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Kirill A. Shutemov authored
Patch series "thp: fix few MADV_DONTNEED races". For MADV_DONTNEED to work properly with huge pages, it's critical to not clear pmd intermittently unless you hold down_write(mmap_sem). Otherwise MADV_DONTNEED can miss the THP which can lead to userspace breakage. See example of such race in commit message of patch 2/4. All these races are found by code inspection. I haven't seen them triggered. I don't think it's worth applying them to stable@. This patch (of 4): Restructure code in preparation for a fix. Link: http://lkml.kernel.org/r/20170302151034.27829-2-kirill.shutemov@linux.intel.com Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Hillf Danton <hillf.zj@alibaba-inc.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Vitaly Wool authored
Stress testing of the current z3fold implementation on an 8-core system revealed it was possible that a z3fold page deleted from its unbuddied list in z3fold_alloc() would be put on another unbuddied list by z3fold_free() while z3fold_alloc() is still processing it. This has been introduced with commit 5a27aa82 ("z3fold: add kref refcounting") due to the removal of special handling of a z3fold page not on any list in z3fold_free(). To fix this, the z3fold page lock should be taken in z3fold_alloc() before the pool lock is released. To avoid deadlocking, we just try to lock the page as soon as we get a hold of it, and if trylock fails, we drop this page and take the next one. Signed-off-by: Vitaly Wool <vitalywool@gmail.com> Cc: Dan Streetman <ddstreet@ieee.org> Cc: <Oleksiy.Avramchenko@sony.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
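The "try to lock, and move on if that fails" pattern described in this commit can be sketched in userspace with a C11 `atomic_flag` standing in for the page lock. All names here (`struct page_stub`, `page_trylock()`, `take_first_unlocked()`) are hypothetical, and the pool lock that the real code holds while scanning is omitted, so this only illustrates the trylock-and-skip step.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical page with a simple test-and-set lock. */
struct page_stub {
    atomic_flag locked;
    int id;
};

/* Returns true if the lock was acquired, false if it was already held. */
static bool page_trylock(struct page_stub *p)
{
    return !atomic_flag_test_and_set(&p->locked);
}

static void page_unlock(struct page_stub *p)
{
    atomic_flag_clear(&p->locked);
}

/*
 * Walk the candidates and return the first page whose lock could be
 * taken; pages that are busy are skipped instead of waited on, which
 * is what avoids the deadlock. Returns NULL if every page is busy.
 */
static struct page_stub *take_first_unlocked(struct page_stub *pages, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (page_trylock(&pages[i]))
            return &pages[i];
    return NULL;
}
```

The caller owns the lock on the returned page and is expected to release it with `page_unlock()` once done.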
-