- 04 Nov, 2017 22 commits
-
-
Vadim Fedorenko authored
Some time ago Eric Dumazet suggested a "hack the IFF_XMIT_DST_RELEASE flag on the vlan netdev". But the last comment was "does not support properly bonding/team.(If the real_dev->privflags IFF_XMIT_DST_RELEASE bit changes, we want to update all the vlans at the same time )" I've extended that patch to support changes of IFF_XMIT_DST_RELEASE in bonding/team. Both bonding and team call netdev_change_features() after recalculation of features including priv_flags IFF_XMIT_DST_RELEASE bit. So the only thing needed to support is to recheck this bit in vlan_transfer_features(). Suggested-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Vadim Fedorenko <vfedorenko@yandex-team.ru> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Wei Yongjun authored
Fixes the following sparse warnings: drivers/net/phy/phylink.c:570:6: warning: symbol 'phylink_phy_change' was not declared. Should it be static? Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Merge tag 'wireless-drivers-next-for-davem-2017-11-03' of git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers-next Kalle Valo says: ==================== wireless-drivers-next patches for 4.15 Mostly fixes this time, but also few new features. Major changes: wil6210 * remove ssid debugfs file rsi * add WOWLAN support for suspend, hibernate and shutdown states ath10k * add support for CCMP-256, GCMP and GCMP-256 ciphers on hardware where it's supported (QCA99x0 and QCA4019) ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller authored
Files removed in 'net-next' had their license header updated in 'net'. We take the remove from 'net-next'. Signed-off-by: David S. Miller <davem@davemloft.net>
-
Vijaya Mohan Guvva authored
Return success if the same dispatch function is being registered for a given opcode and subcode, there by allow multiple switchdev enable and disables. Signed-off-by: Vijaya Mohan Guvva <vijaya.guvva@cavium.com> Signed-off-by: Satanand Burla <satananda.burla@cavium.com> Signed-off-by: Felix Manlunas <felix.manlunas@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Jiri Pirko says: ==================== mlxsw: Handle changes in GRE configuration Petr says: Until now, when an IP tunnel was offloaded by the mlxsw driver, the offload was pretty much static, and changes in Linux configuration were not reflected in the hardware. That led to discrepancies between traffic flows in slow path and fast path. The work-around used to be to remove all routes that forward to the netdevice and re-add them. This is clearly suboptimal, but actually, as of the decap-only patchset, it's not even enough anymore, and one needs to go all the way and simply drop the tunnel and recreate it correctly. With this patchset, the NETDEV_CHANGE events that are generated for changes of up'd tunnel netdevices are captured and interpreted to correctly reconfigure the HW in accordance with changes requested at the software layer. In addition, NETDEV_CHANGEUPPER, NETDEV_UP and NETDEV_DOWN are now handled not only for tunnel devices themselves, but also for their bound devices. Each change is then translated to one or more of the following updates to the HW configuration: - refresh of offload of local route that corresponds to tunnel's local address - refresh of the loopback RIF - refresh of offloads of routes that forward to the changed tunnel - removal of tunnel offloads These tools are used to implement the following configuration changes: - addition of a new offloadable tunnel with local address that conflicts with that of an already-offloaded tunnel (the existing tunnel is onloaded, the new one isn't offloaded) - changes to TTL, TOS that make tunnel unsuitable for offloading - changes to ikey, okey, remote - changes to local, which when they cause conflict with another tunnel, lead to onloading of both newly-conflicting tunnels - migration of a bound device of an offloaded tunnel device to a different VRF - changes to what device is bound to a tunnel device (i.e. like what "ip tunnel change name g dev another" does) - changes to up / down state of a bound device. A down bound device doesn't forward encapsulated traffic anymore, but decap still works. This patchset starts with a suite of patches that adapt the existing code base step by step to facilitate introduction of the offloading code. The five substantial patches at the end then implement the changes mentioned above. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Petr Machata authored
When the bound device of a tunnel device is down, encapsulated packets are not egressed anymore, but tunnel decap still works. Extend mlxsw_sp_nexthop_rif_update() to take IFF_UP into consideration when deciding whether a given next hop should be offloaded. Because the new logic was added to mlxsw_sp_nexthop_rif_update(), this fixes the case where a newly-added tunnel has a down bound device, which would previously be fully offloaded. Now the down state of the bound device is noted and next hops forwarding to such tunnel are not offloaded. In addition to that, notice NETDEV_UP and NETDEV_DOWN of a bound device to force refresh of tunnel encap route offloads. Signed-off-by: Petr Machata <petrm@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Petr Machata authored
When a bound device of an IP-in-IP tunnel changes, such as through 'ip tunnel change name $name dev $dev', the loopback backing the tunnel needs to be recreated. Signed-off-by: Petr Machata <petrm@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Petr Machata authored
Changes to L3 tunnel netdevices (through `ip tunnel change' as well as `ip link set') lead to NETDEV_CHANGE being generated on the tunnel device. Because what is relevant for the tunnel in question depends on the tunnel type, handling of the event is dispatched to the IPIP module through a newly-added interface mlxsw_sp_ipip_ops.ol_netdev_change(). IPIP tunnels now remember the last set of tunnel parameters in struct mlxsw_sp_ipip_entry.parms, and use it to figure out what exactly has changed. Signed-off-by: Petr Machata <petrm@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Petr Machata authored
When a bound device of a tunnel netdevice changes VRF, the loopback RIF that backs the tunnel needs to be updated and existing encapsulating routes need to be refreshed. Note that several tunnels can share the same bound device, in which case all the impacted tunnels need to be updated. Signed-off-by: Petr Machata <petrm@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Petr Machata authored
The approach for offloading IP tunnels implemented currently by mlxsw doesn't allow two tunnels that have the same local IP address in the same (underlay) VRF. Previously, offloads were introduced on demand as encap routes were formed. When such a route was created that would cause offload of a conflicting tunnel, mlxsw_sp_ipip_entry_create() would detect it and return -EEXIST, which would propagate up and cause FIB abort. Now however IPIP entries are created as soon as an offloadable netdevice is created, and the failure prevents creation of such device. Furthermore, if the driver is installed at the point where such conflicting tunnels exist, the failure actually prevents successful modprobe. Furthermore, follow-up patches implement handling of NETDEV_CHANGE due to the local address change. However, NETDEV_CHANGE can't be vetoed. The failure merely means that the offloads weren't updated, but the change in Linux configuration is not rolled back. It is thus desirable to have a robust way of handling these conflicts, which can later be reused for handling NETDEV_CHANGE as well. To fix this, when a conflicting tunnel is created, instead of failing, simply pull the old tunnel to slow path and reject offloading the new one. Introduce two functions: mlxsw_sp_ipip_entry_demote_tunnel() and mlxsw_sp_ipip_demote_tunnel_by_saddr() to handle this. Make them both public, because they will be useful later on in this patchset. Signed-off-by: Petr Machata <petrm@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Petr Machata authored
When trying to determine whether there are other offloaded tunnels with the same local address, mlxsw_sp_ipip_entry_create() should look for a tunnel with matching UL protocol, matching saddr, in the same VRF. However instead of taking into account the UL protocol of the tunnel netdevice (which mlxsw_sp_ipip_entry_saddr_matches() then compares to the UL protocol of inspected IPIP entry), it deduces the UL protocol from the inspected IPIP entry (and that's compared to itself). This is currently immaterial, because only one tunnel type is offloaded, and therefore the UL protocol always matches, but introducing support for a tunnel with IPv6 underlay would uncover this error. Signed-off-by: Petr Machata <petrm@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Petr Machata authored
The work that needs to be done to update HW configuration in response to changes is similar to what __mlxsw_sp_ipip_entry_update_tunnel() already does, but with a number of twists: each change requires a different subset of things to happen. Extend the function to support all these uses, and allow finely-grained configuration of what should happen at each call through a suite of function arguments. Publish the updated function to allow use from the spectrum_ipip module. Signed-off-by: Petr Machata <petrm@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Petr Machata authored
The work that's done by mlxsw_sp_netdevice_ipip_ol_vrf_event() is a good basis for a more versatile function that would take care of all sorts of tunnel updates requests: __mlxsw_sp_ipip_entry_update_tunnel(). Extract that function. Factor out a helper mlxsw_sp_ipip_entry_ol_lb_update() as well. Signed-off-by: Petr Machata <petrm@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Petr Machata authored
The function mlxsw_sp_rif_create() takes an extack parameter. So far, for creation of loopback interfaces, NULL was passed. For some events however the extack can be extracted and passed along. So do that for NETDEV_CHANGEUPPER handler. Use the opportunity to update the type of info argument that mlxsw_sp_netdevice_ipip_ol_event() takes. Follow-up patches will introduce handling of more changes, and some of them carry an extack as well, but in an info structure of a different type. Though not strictly erroneous (the pointer could be cast whichever way), it makes no sense to pretend the value is always of a certain type, when in fact it isn't. So change the prototype of the above-mentioned function as well. Signed-off-by: Petr Machata <petrm@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Petr Machata authored
The piece of logic to promote decap route, if any, is useful for generic tunnel updates, not just for handling of NETDEV_UP events on tunnel interfaces. Extract it to a separate function. Signed-off-by: Petr Machata <petrm@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Petr Machata authored
This function only ever returns 0, so don't pretend it returns anything useful and just make it void. Signed-off-by: Petr Machata <petrm@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Petr Machata authored
Signed-off-by: Petr Machata <petrm@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Petr Machata authored
To implement NETDEV_CHANGE notifications on IP-in-IP tunnels, the handler needs to figure out what actually changed, to understand how exactly to update the offloads. It will do so by storing struct ip_tunnel_parm with previous configuration, and comparing that to the new version. To facilitate these comparisons, extract the code that operates on struct ip_tunnel_parm from the existing accessor functions, and make those a thin wrapper that extracts tunnel parameters and dispatches. Signed-off-by: Petr Machata <petrm@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Petr Machata authored
These functions ideologically belong to the IPIP module, and some follow-up work will benefit from their presence there. Signed-off-by: Petr Machata <petrm@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Petr Machata authored
Some of the code down the road needs this logic as well. Signed-off-by: Petr Machata <petrm@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Petr Machata authored
To distinguish between events related to tunnel device itself and its bound device, rename a number of functions related to handling tunneling netdevice events to include _ol_ (for "overlay") in the name. That leaves room in the namespace for underlay-related functions, which would have _ul_ in the name. Signed-off-by: Petr Machata <petrm@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 03 Nov, 2017 18 commits
-
-
git://git.kernel.org/pub/scm/linux/kernel/git/clk/linuxLinus Torvalds authored
Pull clk fix from Stephen Boyd: "One fix for USB clks on Uniphier PXs3 SoCs" * tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux: clk: uniphier: fix clock data for PXs3
-
git://git.kernel.org/pub/scm/linux/kernel/git/cmetcalf/linux-tileLinus Torvalds authored
Pull arch/tile fixes from Chris Metcalf: "Two one-line bug fixes" * git://git.kernel.org/pub/scm/linux/kernel/git/cmetcalf/linux-tile: arch/tile: Implement ->set_state_oneshot_stopped() tile: pass machine size to sparse
-
git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsiLinus Torvalds authored
Pull SCSI fix from James Bottomley: "One minor fix in the error leg of the qla2xxx driver (it oopses the system if we get an error trying to start the internal kernel thread). The fix is minor because the problem isn't often encountered in the field (although it can be induced by inserting the module in a low memory environment)" * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: scsi: qla2xxx: Fix oops in qla2x00_probe_one error path
-
Chris Metcalf authored
set_state_oneshot_stopped() is called by the clkevt core, when the next event is required at an expiry time of 'KTIME_MAX'. This normally happens with NO_HZ_{IDLE|FULL} in both LOWRES/HIGHRES modes. This patch makes the clockevent device to stop on such an event, to avoid spurious interrupts, as explained by: commit 8fff52fd ("clockevents: Introduce CLOCK_EVT_STATE_ONESHOT_STOPPED state"). Signed-off-by: Chris Metcalf <cmetcalf@mellanox.com>
-
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linuxLinus Torvalds authored
Pull powerpc fixes from Michael Ellerman: "Some more powerpc fixes for 4.14. This is bigger than I like to send at rc7, but that's at least partly because I didn't send any fixes last week. If it wasn't for the IMC driver, which is new and getting heavy testing, the diffstat would look a bit better. I've also added ftrace on big endian to my test suite, so we shouldn't break that again in future. - A fix to the handling of misaligned paste instructions (P9 only), where a change to a #define has caused the check for the instruction to always fail. - The preempt handling was unbalanced in the radix THP flush (P9 only). Though we don't generally use preempt we want to keep it working as much as possible. - Two fixes for IMC (P9 only), one when booting with restricted number of CPUs and one in the error handling when initialisation fails due to firmware etc. - A revert to fix function_graph on big endian machines, and then a rework of the reverted patch to fix kprobes blacklist handling on big endian machines. Thanks to: Anju T Sudhakar, Guilherme G. Piccoli, Madhavan Srinivasan, Naveen N. Rao, Nicholas Piggin, Paul Mackerras" * tag 'powerpc-4.14-6' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: powerpc/perf: Fix core-imc hotplug callback failure during imc initialization powerpc/kprobes: Dereference function pointers only if the address does not belong to kernel text Revert "powerpc64/elfv1: Only dereference function descriptor for non-text symbols" powerpc/64s/radix: Fix preempt imbalance in TLB flush powerpc: Fix check for copy/paste instructions in alignment handler powerpc/perf: Fix IMC allocation routine
-
git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmcLinus Torvalds authored
Pull MMC fixes from Ulf Hansson: "Fix dw_mmc request timeout issues" * tag 'mmc-v4.14-rc4-3' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc: mmc: dw_mmc: Fix the DTO timeout calculation mmc: dw_mmc: Add locking to the CTO timer mmc: dw_mmc: Fix the CTO timeout calculation mmc: dw_mmc: cancel the CTO timer after a voltage switch
-
git://people.freedesktop.org/~airlied/linuxLinus Torvalds authored
Pull drm fixes from Dave Airlie: - one nouveau regression fix - some amdgpu fixes for stable to fix hangs on some harvested Polaris GPUs - a set of KASAN and regression fixes for i915, their CI system seems to be working pretty well now. * tag 'drm-fixes-for-v4.14-rc8' of git://people.freedesktop.org/~airlied/linux: drm/amdgpu: allow harvesting check for Polaris VCE drm/amdgpu: return -ENOENT from uvd 6.0 early init for harvesting drm/i915: Check incoming alignment for unfenced buffers (on i915gm) drm/nouveau/kms/nv50: use the correct state for base channel notifier setup drm/i915: Hold rcu_read_lock when iterating over the radixtree (vma idr) drm/i915: Hold rcu_read_lock when iterating over the radixtree (objects) drm/i915/edp: read edp display control registers unconditionally drm/i915: Do not rely on wm preservation for ILK watermarks drm/i915: Cancel the modeset retry work during modeset cleanup
-
git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds authored
Pull networking fixes from David Miller: "Hopefully this is the last batch of networking fixes for 4.14 Fingers crossed... 1) Fix stmmac to use the proper sized OF property read, from Bhadram Varka. 2) Fix use after free in net scheduler tc action code, from Cong Wang. 3) Fix SKB control block mangling in tcp_make_synack(). 4) Use proper locking in fib_dump_info(), from Florian Westphal. 5) Fix IPG encodings in systemport driver, from Florian Fainelli. 6) Fix division by zero in NV TCP congestion control module, from Konstantin Khlebnikov. 7) Fix use after free in nf_reject_ipv4, from Tejaswi Tanikella" * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: net: systemport: Correct IPG length settings tcp: do not mangle skb->cb[] in tcp_make_synack() fib: fib_dump_info can no longer use __in_dev_get_rtnl stmmac: use of_property_read_u32 instead of read_u8 net_sched: hold netns refcnt for each action net_sched: acquire RTNL in tc_action_net_exit() net: vrf: correct FRA_L3MDEV encode type tcp_nv: fix division by zero in tcpnv_acked() netfilter: nf_reject_ipv4: Fix use-after-free in send_reset netfilter: nft_set_hash: disable fast_ops for 2-len keys
-
Linus Torvalds authored
Merge misc fixes from Andrew Morton: "7 fixes" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: mm, swap: fix race between swap count continuation operations mm/huge_memory.c: deposit page table when copying a PMD migration entry initramfs: fix initramfs rebuilds w/ compression after disabling fs/hugetlbfs/inode.c: fix hwpoison reserve accounting ocfs2: fstrim: Fix start offset of first cluster group during fstrim mm, /proc/pid/pagemap: fix soft dirty marking for PMD migration entry userfaultfd: hugetlbfs: prevent UFFDIO_COPY to fill beyond the end of i_size
-
Paul Burton authored
MIPS will soon not be a part of Imagination Technologies, and as such many @imgtec.com email addresses will no longer be valid. This patch updates the addresses for those who: - Have 10 or more patches in mainline authored using an @imgtec.com email address, or any patches dated within the past year. - Are still with Imagination but leaving as part of the MIPS business unit, as determined from an internal email address list. - Haven't already updated their email address (ie. JamesH) or expressed a desire to be excluded (ie. Maciej). - Acked v2 or earlier of this patch, which leaves Deng-Cheng, Matt & myself. New addresses are of the form firstname.lastname@mips.com, and all verified against an internal email address list. An entry is added to .mailmap for each person such that get_maintainer.pl will report the new addresses rather than @imgtec.com addresses which will soon be dead. Instances of the affected addresses throughout the tree are then mechanically replaced with the new @mips.com address. Signed-off-by: Paul Burton <paul.burton@mips.com> Cc: Deng-Cheng Zhu <dengcheng.zhu@imgtec.com> Cc: Deng-Cheng Zhu <dengcheng.zhu@mips.com> Acked-by: Dengcheng Zhu <dengcheng.zhu@mips.com> Cc: Matt Redfearn <matt.redfearn@imgtec.com> Cc: Matt Redfearn <matt.redfearn@mips.com> Acked-by: Matt Redfearn <matt.redfearn@mips.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: linux-kernel@vger.kernel.org Cc: linux-mips@linux-mips.org Cc: trivial@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Rafael J. Wysocki authored
Commit 890da9cf (Revert "x86: do not use cpufreq_quick_get() for /proc/cpuinfo "cpu MHz"") is not sufficient to restore the previous behavior of "cpu MHz" in /proc/cpuinfo on x86 due to some changes made after the commit it has reverted. To address this, make the code in question use arch_freq_get_on_cpu() which also is used by cpufreq for reporting the current frequency of CPUs and since that function doesn't really depend on cpufreq in any way, drop the CONFIG_CPU_FREQ dependency for the object file containing it. Also refactor arch_freq_get_on_cpu() somewhat to avoid IPIs and return cached values right away if it is called very often over a short time (to prevent user space from triggering IPI storms through it). Fixes: 890da9cf (Revert "x86: do not use cpufreq_quick_get() for /proc/cpuinfo "cpu MHz"") Cc: stable@kernel.org # 4.13 - together with 890da9cfSigned-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Huang Ying authored
One page may store a set of entries of the sis->swap_map (swap_info_struct->swap_map) in multiple swap clusters. If some of the entries has sis->swap_map[offset] > SWAP_MAP_MAX, multiple pages will be used to store the set of entries of the sis->swap_map. And the pages are linked with page->lru. This is called swap count continuation. To access the pages which store the set of entries of the sis->swap_map simultaneously, previously, sis->lock is used. But to improve the scalability of __swap_duplicate(), swap cluster lock may be used in swap_count_continued() now. This may race with add_swap_count_continuation() which operates on a nearby swap cluster, in which the sis->swap_map entries are stored in the same page. The race can cause wrong swap count in practice, thus cause unfreeable swap entries or software lockup, etc. To fix the race, a new spin lock called cont_lock is added to struct swap_info_struct to protect the swap count continuation page list. This is a lock at the swap device level, so the scalability isn't very well. But it is still much better than the original sis->lock, because it is only acquired/released when swap count continuation is used. Which is considered rare in practice. If it turns out that the scalability becomes an issue for some workloads, we can split the lock into some more fine grained locks. Link: http://lkml.kernel.org/r/20171017081320.28133-1-ying.huang@intel.com Fixes: 235b6217 ("mm/swap: add cluster lock") Signed-off-by: "Huang, Ying" <ying.huang@intel.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Shaohua Li <shli@kernel.org> Cc: Tim Chen <tim.c.chen@intel.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Aaron Lu <aaron.lu@intel.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Hugh Dickins <hughd@google.com> Cc: <stable@vger.kernel.org> [4.11+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Zi Yan authored
We need to deposit pre-allocated PTE page table when a PMD migration entry is copied in copy_huge_pmd(). Otherwise, we will leak the pre-allocated page and cause a NULL pointer dereference later in zap_huge_pmd(). The missing counters during PMD migration entry copy process are added as well. The bug report is here: https://lkml.org/lkml/2017/10/29/214 Link: http://lkml.kernel.org/r/20171030144636.4836-1-zi.yan@sent.com Fixes: 84c3fc4e ("mm: thp: check pmd migration entry in common path") Signed-off-by: Zi Yan <zi.yan@cs.rutgers.edu> Reported-by: Fengguang Wu <fengguang.wu@intel.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Florian Fainelli authored
This is a follow-up to commit 57ddfdaa ("initramfs: fix disabling of initramfs (and its compression)"). This particular commit fixed the use case where we build the kernel with an initramfs with no compression, and then we build the kernel with no initramfs. Now this still left us with the same case as described here: http://lkml.kernel.org/r/20170521033337.6197-1-f.fainelli@gmail.com not working with initramfs compression. This can be seen by the following steps/timestamps: https://www.spinics.net/lists/kernel/msg2598153.html .initramfs_data.cpio.gz.cmd is correct: cmd_usr/initramfs_data.cpio.gz := /bin/bash ./scripts/gen_initramfs_list.sh -o usr/initramfs_data.cpio.gz -u 1000 -g 1000 /home/fainelli/work/uclinux-rootfs/romfs /home/fainelli/work/uclinux-rootfs/misc/initramfs.dev and was generated the first time we did generate the gzip initramfs, so the command has not changed, nor its arguments, so we just don't call it, no initramfs cpio is re-generated as a consequence. The fix for this problem is just to properly keep track of the .initramfs_cpio_data.d file by suffixing it with the compression extension. This takes care of properly tracking dependencies such that the initramfs get (re)generated any time files are added/deleted etc. Link: http://lkml.kernel.org/r/20170930033936.6722-1-f.fainelli@gmail.com Fixes: db2aa7fd ("initramfs: allow again choice of the embedded initramfs compression algorithm") Fixes: 9e3596b0 ("kbuild: initramfs cleanup, set target from Kconfig") Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Cc: "Francisco Blas Izquierdo Riera (klondike)" <klondike@xiscosoft.net> Cc: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Mike Kravetz authored
Calling madvise(MADV_HWPOISON) on a hugetlbfs page will result in bad (negative) reserved huge page counts. This may not happen immediately, but may happen later when the underlying file is removed or filesystem unmounted. For example: AnonHugePages: 0 kB ShmemHugePages: 0 kB HugePages_Total: 1 HugePages_Free: 0 HugePages_Rsvd: 18446744073709551615 HugePages_Surp: 0 Hugepagesize: 2048 kB In routine hugetlbfs_error_remove_page(), hugetlb_fix_reserve_counts is called after remove_huge_page. hugetlb_fix_reserve_counts is designed to only be called/used only if a failure is returned from hugetlb_unreserve_pages. Therefore, call hugetlb_unreserve_pages as required and only call hugetlb_fix_reserve_counts in the unlikely event that hugetlb_unreserve_pages returns an error. Link: http://lkml.kernel.org/r/20171019230007.17043-2-mike.kravetz@oracle.com Fixes: 78bb9203 ("mm: hwpoison: dissolve in-use hugepage in unrecoverable memory error") Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com> Acked-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Aneesh Kumar <aneesh.kumar@linux.vnet.ibm.com> Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Ashish Samant authored
The first cluster group descriptor is not stored at the start of the group but at an offset from the start. We need to take this into account while doing fstrim on the first cluster group. Otherwise we will wrongly start fstrim a few blocks after the desired start block and the range can cross over into the next cluster group and zero out the group descriptor there. This can cause filesytem corruption that cannot be fixed by fsck. Link: http://lkml.kernel.org/r/1507835579-7308-1-git-send-email-ashish.samant@oracle.comSigned-off-by: Ashish Samant <ashish.samant@oracle.com> Reviewed-by: Junxiao Bi <junxiao.bi@oracle.com> Reviewed-by: Joseph Qi <jiangqi903@gmail.com> Cc: Mark Fasheh <mfasheh@versity.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Huang Ying authored
When the pagetable is walked in the implementation of /proc/<pid>/pagemap, pmd_soft_dirty() is used for both the PMD huge page map and the PMD migration entries. That is wrong, pmd_swp_soft_dirty() should be used for the PMD migration entries instead because the different page table entry flag is used. As a result, /proc/pid/pagemap may report incorrect soft dirty information for PMD migration entries. Link: http://lkml.kernel.org/r/20171017081818.31795-1-ying.huang@intel.com Fixes: 84c3fc4e ("mm: thp: check pmd migration entry in common path") Signed-off-by: "Huang, Ying" <ying.huang@intel.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Acked-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Michal Hocko <mhocko@suse.com> Cc: David Rientjes <rientjes@google.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Hugh Dickins <hughd@google.com> Cc: "Jérôme Glisse" <jglisse@redhat.com> Cc: Daniel Colascione <dancol@google.com> Cc: Zi Yan <zi.yan@cs.rutgers.edu> Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Andrea Arcangeli authored
This oops: kernel BUG at fs/hugetlbfs/inode.c:484! RIP: remove_inode_hugepages+0x3d0/0x410 Call Trace: hugetlbfs_setattr+0xd9/0x130 notify_change+0x292/0x410 do_truncate+0x65/0xa0 do_sys_ftruncate.constprop.3+0x11a/0x180 SyS_ftruncate+0xe/0x10 tracesys+0xd9/0xde was caused by the lack of i_size check in hugetlb_mcopy_atomic_pte. mmap() can still succeed beyond the end of the i_size after vmtruncate zapped vmas in those ranges, but the faults must not succeed, and that includes UFFDIO_COPY. We could differentiate the retval to userland to represent a SIGBUS like a page fault would do (vs SIGSEGV), but it doesn't seem very useful and we'd need to pick a random retval as there's no meaningful syscall retval that would differentiate from SIGSEGV and SIGBUS, there's just -EFAULT. Link: http://lkml.kernel.org/r/20171016223914.2421-2-aarcange@redhat.comSigned-off-by: Andrea Arcangeli <aarcange@redhat.com> Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com> Cc: Mike Rapoport <rppt@linux.vnet.ibm.com> Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-