- 30 Mar, 2017 11 commits
-
-
Eric Dumazet authored
[ Upstream commit 15bb7745 ] icsk_ack.lrcvtime has a 0 value at socket creation time. tcpi_last_data_recv can have bogus value if no payload is ever received. This patch initializes icsk_ack.lrcvtime for active sessions in tcp_finish_connect(), and for passive sessions in tcp_create_openreq_child() Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Daniel Borkmann authored
[ Upstream commit a97e50cc ] In sk_clone_lock(), we create a new socket and inherit most of the parent's members via sock_copy() which memcpy()'s various sections. Now, in case the parent socket had a BPF socket filter attached, then newsk->sk_filter points to the same instance as the original sk->sk_filter. sk_filter_charge() is then called on the newsk->sk_filter to take a reference and should that fail due to hitting max optmem, we bail out and release the newsk instance. The issue is that commit 278571ba ("net: filter: simplify socket charging") wrongly combined the dismantle path with the failure path of xfrm_sk_clone_policy(). This means, even when charging failed, we call sk_free_unlock_clone() on the newsk, which then still points to the same sk_filter as the original sk. Thus, sk_free_unlock_clone() calls into __sk_destruct() eventually where it tests for present sk_filter and calls sk_filter_uncharge() on it, which potentially lets sk_omem_alloc wrap around and releases the eBPF prog and sk_filter structure from the (still intact) parent. Fix it by making sure that when sk_filter_charge() failed, we reset newsk->sk_filter back to NULL before passing to sk_free_unlock_clone(), so that we don't mess with the parents sk_filter. Only if xfrm_sk_clone_policy() fails, we did reach the point where either the parent's filter was NULL and as a result newsk's as well or where we previously had a successful sk_filter_charge(), thus for that case, we do need sk_filter_uncharge() to release the prior taken reference on sk_filter. Fixes: 278571ba ("net: filter: simplify socket charging") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Eric Dumazet authored
[ Upstream commit c64c0b3c ] Alexander reported a KMSAN splat caused by reads of uninitialized field (tb_id_in) from user provided struct fib_result_nl It turns out nl_fib_input() sanity tests on user input is a bit wrong : User can pretend nlh->nlmsg_len is big enough, but provide at sendmsg() time a too small buffer. Reported-by: Alexander Potapenko <glider@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Doug Berger authored
[ Upstream commit 31739eae ] Commit 6ac3ce82 ("net: bcmgenet: Remove excessive PHY reset") removed the bcmgenet_mii_reset() function from bcmgenet_power_up() and bcmgenet_internal_phy_setup() functions. In so doing it broke the reset of the internal PHY devices used by the GENETv1-GENETv3 which required this reset before the UniMAC was enabled. It also broke the internal GPHY devices used by the GENETv4 because the config_init that installed the AFE workaround was no longer occurring after the reset of the GPHY performed by bcmgenet_phy_power_set() in bcmgenet_internal_phy_setup(). In addition the code in bcmgenet_internal_phy_setup() related to the "enable APD" comment goes with the bcmgenet_mii_reset() so it should have also been removed. Commit bd4060a6 ("net: bcmgenet: Power on integrated GPHY in bcmgenet_power_up()") moved the bcmgenet_phy_power_set() call to the bcmgenet_power_up() function, but failed to remove it from the bcmgenet_internal_phy_setup() function. Had it done so, the bcmgenet_internal_phy_setup() function would have been empty and could have been removed at that time. Commit 5dbebbb4 ("net: bcmgenet: Software reset EPHY after power on") was submitted to correct the functional problems introduced by commit 6ac3ce82 ("net: bcmgenet: Remove excessive PHY reset"). It was included in v4.4 and made available on 4.3-stable. Unfortunately, it didn't fully revert the commit because this bcmgenet_mii_reset() doesn't apply the soft reset to the internal GPHY used by GENETv4 like the previous one did. This prevents the restoration of the AFE work- arounds for internal GPHY devices after the bcmgenet_phy_power_set() in bcmgenet_internal_phy_setup(). This commit takes the alternate approach of removing the unnecessary bcmgenet_internal_phy_setup() function which shouldn't have been in v4.3 so that when bcmgenet_mii_reset() was restored it should have only gone into bcmgenet_power_up(). This will avoid the problems while also removing the redundancy (and hopefully some of the confusion). Fixes: 6ac3ce82 ("net: bcmgenet: Remove excessive PHY reset") Signed-off-by: Doug Berger <opendmb@gmail.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Gal Pressman authored
[ Upstream commit 8ab7e2ae ] RX packets statistics ('rx_packets' counter) used to count LRO packets as one, even though it contains multiple segments. This patch will increment the counter by the number of segments, and align the driver with the behavior of other drivers in the stack. Note that no information is lost in this patch due to 'rx_lro_packets' counter existence. Before, ethtool showed: $ ethtool -S ens6 | egrep "rx_packets|rx_lro_packets" rx_packets: 435277 rx_lro_packets: 35847 rx_packets_phy: 1935066 Now, we will see the more logical statistics: $ ethtool -S ens6 | egrep "rx_packets|rx_lro_packets" rx_packets: 1935066 rx_lro_packets: 35847 rx_packets_phy: 1935066 Fixes: e586b3b0 ("net/mlx5: Ethernet Datapath files") Signed-off-by: Gal Pressman <galp@mellanox.com> Cc: kernel-team@fb.com Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Maor Gottlieb authored
[ Upstream commit 5f40b4ed ] With ConnectX-4 sharing SRQs from the same space as QPs, we hit a limit preventing some applications to allocate needed QPs amount. Double the size to 256K. Fixes: e126ba97 ('mlx5: Add driver for Mellanox Connect-IB adapters') Signed-off-by: Maor Gottlieb <maorg@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Andrey Ulanov authored
[ Upstream commit 7df9c246 ] Dmitry has reported that a BUG_ON() condition in unix_notinflight() may be triggered by a simple code that forwards unix socket in an SCM_RIGHTS message. That is caused by incorrect unix socket GC implementation in unix_gc(). The GC first collects list of candidates, then (a) decrements their "children's" inflight counter, (b) checks which inflight counters are now 0, and then (c) increments all inflight counters back. (a) and (c) are done by calling scan_children() with inc_inflight or dec_inflight as the second argument. Commit 6209344f ("net: unix: fix inflight counting bug in garbage collector") changed scan_children() such that it no longer considers sockets that do not have UNIX_GC_CANDIDATE flag. It also added a block of code that that unsets this flag _before_ invoking scan_children(, dec_iflight, ). This may lead to incorrect inflight counters for some sockets. This change fixes this bug by changing order of operations: UNIX_GC_CANDIDATE is now unset only after all inflight counters are restored to the original state. kernel BUG at net/unix/garbage.c:149! RIP: 0010:[<ffffffff8717ebf4>] [<ffffffff8717ebf4>] unix_notinflight+0x3b4/0x490 net/unix/garbage.c:149 Call Trace: [<ffffffff8716cfbf>] unix_detach_fds.isra.19+0xff/0x170 net/unix/af_unix.c:1487 [<ffffffff8716f6a9>] unix_destruct_scm+0xf9/0x210 net/unix/af_unix.c:1496 [<ffffffff86a90a01>] skb_release_head_state+0x101/0x200 net/core/skbuff.c:655 [<ffffffff86a9808a>] skb_release_all+0x1a/0x60 net/core/skbuff.c:668 [<ffffffff86a980ea>] __kfree_skb+0x1a/0x30 net/core/skbuff.c:684 [<ffffffff86a98284>] kfree_skb+0x184/0x570 net/core/skbuff.c:705 [<ffffffff871789d5>] unix_release_sock+0x5b5/0xbd0 net/unix/af_unix.c:559 [<ffffffff87179039>] unix_release+0x49/0x90 net/unix/af_unix.c:836 [<ffffffff86a694b2>] sock_release+0x92/0x1f0 net/socket.c:570 [<ffffffff86a6962b>] sock_close+0x1b/0x20 net/socket.c:1017 [<ffffffff81a76b8e>] __fput+0x34e/0x910 fs/file_table.c:208 [<ffffffff81a771da>] ____fput+0x1a/0x20 fs/file_table.c:244 [<ffffffff81483ab0>] task_work_run+0x1a0/0x280 kernel/task_work.c:116 [< inline >] exit_task_work include/linux/task_work.h:21 [<ffffffff8141287a>] do_exit+0x183a/0x2640 kernel/exit.c:828 [<ffffffff8141383e>] do_group_exit+0x14e/0x420 kernel/exit.c:931 [<ffffffff814429d3>] get_signal+0x663/0x1880 kernel/signal.c:2307 [<ffffffff81239b45>] do_signal+0xc5/0x2190 arch/x86/kernel/signal.c:807 [<ffffffff8100666a>] exit_to_usermode_loop+0x1ea/0x2d0 arch/x86/entry/common.c:156 [< inline >] prepare_exit_to_usermode arch/x86/entry/common.c:190 [<ffffffff81009693>] syscall_return_slowpath+0x4d3/0x570 arch/x86/entry/common.c:259 [<ffffffff881478e6>] entry_SYSCALL_64_fastpath+0xc4/0xc6 Link: https://lkml.org/lkml/2017/3/6/252Signed-off-by: Andrey Ulanov <andreyu@google.com> Reported-by: Dmitry Vyukov <dvyukov@google.com> Fixes: 6209344f ("net: unix: fix inflight counting bug in garbage collector") Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Lendacky, Thomas authored
[ Upstream commit 622c36f1 ] Newer hardware does not provide a cumulative payload length when multiple descriptors are needed to handle the data. Once the MTU increases beyond the size that can be handled by a single descriptor, the SKB does not get built properly by the driver. The driver will now calculate the size of the data buffers used by the hardware. The first buffer of the first descriptor is for packet headers or packet headers and data when the headers can't be split. Subsequent descriptors in a multi-descriptor chain will not use the first buffer. The second buffer is used by all the descriptors in the chain for payload data. Based on whether the driver is processing the first, intermediate, or last descriptor it can calculate the buffer usage and build the SKB properly. Tested and verified on both old and new hardware. Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Eric Dumazet authored
[ Upstream commit 22a0e18e ] I mistakenly added the code to release sk->sk_frag in sk_common_release() instead of sk_destruct() TCP sockets using sk->sk_allocation == GFP_ATOMIC do no call sk_common_release() at close time, thus leaking one (order-3) page. iSCSI is using such sockets. Fixes: 5640f768 ("net: use a per task frag allocator") Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Florian Fainelli authored
[ Upstream commit 5371bbf4 ] Suspending the PHY would be putting it in a low power state where it may no longer allow us to do Wake-on-LAN. Fixes: cc013fb4 ("net: bcmgenet: correctly suspend and resume PHY device") Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Or Gerlitz authored
[ Upstream commit 3d20f1f7 ] When dealing with ipv6 source tunnel key address attribute (OVS_TUNNEL_KEY_ATTR_IPV6_SRC) we are wrongly setting the tunnel dst ip, fix that. Fixes: 6b26ba3a ('openvswitch: netlink attributes for IPv6 tunneling') Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Reported-by: Paul Blakey <paulb@mellanox.com> Acked-by: Jiri Benc <jbenc@redhat.com> Acked-by: Joe Stringer <joe@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
- 26 Mar, 2017 29 commits
-
-
Greg Kroah-Hartman authored
-
Theodore Ts'o authored
commit 2ba3e6e8 upstream. It is OK for s_first_meta_bg to be equal to the number of block group descriptor blocks. (It rarely happens, but it shouldn't cause any problems.) https://bugzilla.kernel.org/show_bug.cgi?id=194567 Fixes: 3a4b77cdSigned-off-by: Theodore Ts'o <tytso@mit.edu> Cc: Jiri Slaby <jslaby@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Tahsin Erdogan authored
commit 320661b0 upstream. Update to pcpu_nr_empty_pop_pages in pcpu_alloc() is currently done without holding pcpu_lock. This can lead to bad updates to the variable. Add missing lock calls. Fixes: b539b87f ("percpu: implmeent pcpu_nr_empty_pop_pages and chunk->nr_populated") Signed-off-by: Tahsin Erdogan <tahsin@google.com> Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Andreas Gruenbacher authored
commit 28ea06c4 upstream. Commit 88ffbf3e switches to using rhashtables for glocks, hashing over the entire struct lm_lockname instead of its individual fields. On some architectures, struct lm_lockname contains a hole of uninitialized memory due to alignment rules, which now leads to incorrect hash values. Get rid of that hole. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Johan Hovold authored
commit 68c32f9c upstream. Make sure to check the number of endpoints to avoid dereferencing a NULL-pointer should a malicious device lack endpoints. Fixes: cf7776dc ("[PATCH] isdn4linux: Siemens Gigaset drivers - direct USB connection") Cc: Hansjoerg Lipp <hjlipp@web.de> Signed-off-by: Johan Hovold <johan@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Max Lohrmann authored
commit 13603685 upstream. As reported by Max, the Windows 2008 R2 chkdsk utility expects VERIFY_16 to be supported, and does not handle the returned CHECK_CONDITION properly, resulting in an infinite loop. The kernel will log huge amounts of this error: kernel: TARGET_CORE[iSCSI]: Unsupported SCSI Opcode 0x8f, sending CHECK_CONDITION. Signed-off-by: Max Lohrmann <post@wickenrode.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Chris Leech authored
commit 6f8830f5 upstream. There's a rather long standing regression from the commit "libiscsi: Reduce locking contention in fast path" Depending on iSCSI target behavior, it's possible to hit the case in iscsi_complete_task where the task is still on a pending list (!list_empty(&task->running)). When that happens the task is removed from the list while holding the session back_lock, but other task list modification occur under the frwd_lock. That leads to linked list corruption and eventually a panicked system. Rather than back out the session lock split entirely, in order to try and keep some of the performance gains this patch adds another lock to maintain the task lists integrity. Major enterprise supported kernels have been backing out the lock split for while now, thanks to the efforts at IBM where a lab setup has the most reliable reproducer I've seen on this issue. This patch has been tested there successfully. Signed-off-by: Chris Leech <cleech@redhat.com> Fixes: 659743b0 ("[SCSI] libiscsi: Reduce locking contention in fast path") Reported-by: Prashantha Subbarao <psubbara@us.ibm.com> Reviewed-by: Guilherme G. Piccoli <gpiccoli@linux.vnet.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Anton Blanchard authored
commit 85e8a239 upstream. We see lpfc devices regularly fail during kexec. Fix this by adding a shutdown method which mirrors the remove method. Signed-off-by: Anton Blanchard <anton@samba.org> Reviewed-by: Mauricio Faria de Oliveira <mauricfo@linux.vnet.ibm.com> Tested-by: Mauricio Faria de Oliveira <mauricfo@linux.vnet.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Nicholas Bellinger authored
commit a04e54f2 upstream. The following fixes a divide by zero OOPs with TYPE_TAPE due to pscsi_tape_read_blocksize() failing causing a zero sd->sector_size being propigated up via dev_attrib.hw_block_size. It also fixes another long-standing bug where TYPE_TAPE and TYPE_MEDIMUM_CHANGER where using pscsi_create_type_other(), which does not call scsi_device_get() to take the device reference. Instead, rename pscsi_create_type_rom() to pscsi_create_type_nondisk() and use it for all cases. Finally, also drop a dump_stack() in pscsi_get_blocks() for non TYPE_DISK, which in modern target-core can get invoked via target_sense_desc_format() during CHECK_CONDITION. Reported-by: Malcolm Haak <insanemal@gmail.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Shaohua Li authored
commit 61eb2b43 upstream. Neil Brown pointed out a potential deadlock in raid 10 code with bio_split/chain. The raid1 code could have the same issue, but recent barrier rework makes it less likely to happen. The deadlock happens in below sequence: 1. generic_make_request(bio), this will set current->bio_list 2. raid10_make_request will split bio to bio1 and bio2 3. __make_request(bio1), wait_barrer, add underlayer disk bio to current->bio_list 4. __make_request(bio2), wait_barrer If raise_barrier happens between 3 & 4, since wait_barrier runs at 3, raise_barrier waits for IO completion from 3. And since raise_barrier sets barrier, 4 waits for raise_barrier. But IO from 3 can't be dispatched because raid10_make_request() doesn't finished yet. The solution is to adjust the IO ordering. Quotes from Neil: " It is much safer to: if (need to split) { split = bio_split(bio, ...) bio_chain(...) make_request_fn(split); generic_make_request(bio); } else make_request_fn(mddev, bio); This way we first process the initial section of the bio (in 'split') which will queue some requests to the underlying devices. These requests will be queued in generic_make_request. Then we queue the remainder of the bio, which will be added to the end of the generic_make_request queue. Then we return. generic_make_request() will pop the lower-level device requests off the queue and handle them first. Then it will process the remainder of the original bio once the first section has been fully processed. " Note, this only happens in read path. In write path, the bio is flushed to underlaying disks either by blk flush (from schedule) or offladed to raid1/10d. It's queued in current->bio_list. Cc: Coly Li <colyli@suse.de> Suggested-by: NeilBrown <neilb@suse.com> Reviewed-by: Jack Wang <jinpu.wang@profitbricks.com> Signed-off-by: Shaohua Li <shli@fb.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Michael Ellerman authored
commit 97ee351b upstream. Recent toolchains force the TOC to be 256 byte aligned. We need to enforce this alignment in the zImage linker script, otherwise pointers to our TOC variables (__toc_start) could be incorrect. If the actual start of the TOC and __toc_start don't have the same value we crash early in the zImage wrapper. Suggested-by: Alan Modra <amodra@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Rafael J. Wysocki authored
commit 9b4f603e upstream. There is a missing newline in show_cpuinfo_cur_freq(), so add it, but while at it clean that function up somewhat too. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Peter Zijlstra authored
commit e7cc4865 upstream. While hunting for clues to a use-after-free, Oleg spotted that perf_event_init_context() can loose an error value with the result that fork() can succeed even though we did not fully inherit the perf event context. Spotted-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: oleg@redhat.com Fixes: 889ff015 ("perf/core: Split context's event group list into pinned and non-pinned lists") Link: http://lkml.kernel.org/r/20170316125823.190342547@infradead.orgSigned-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Linus Torvalds authored
commit 474c9015 upstream. gcc-7 has an "optimization" pass that completely screws up, and generates the code expansion for the (impossible) case of calling ilog2() with a zero constant, even when the code gcc compiles does not actually have a zero constant. And we try to generate a compile-time error for anybody doing ilog2() on a constant where that doesn't make sense (be it zero or negative). So now gcc7 will fail the build due to our sanity checking, because it created that constant-zero case that didn't actually exist in the source code. There's a whole long discussion on the kernel mailing about how to work around this gcc bug. The gcc people themselevs have discussed their "feature" in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=72785 but it's all water under the bridge, because while it looked at one point like it would be solved by the time gcc7 was released, that was not to be. So now we have to deal with this compiler braindamage. And the only simple approach seems to be to just delete the code that tries to warn about bad uses of ilog2(). So now "ilog2()" will just return 0 not just for the value 1, but for any non-positive value too. It's not like I can recall anybody having ever actually tried to use this function on any invalid value, but maybe the sanity check just meant that such code never made it out in public. Reported-by: Laura Abbott <labbott@redhat.com> Cc: John Stultz <john.stultz@linaro.org>, Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Jiri Slaby <jslaby@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Andi Kleen authored
commit 725fc629 upstream. Linux preallocates the task structs of the idle tasks for all possible CPUs. This currently means they all end up on node 0. This also implies that the cache line of MWAIT, which is around the flags field in the task struct, are all located in node 0. We see a noticeable performance improvement on Knights Landing CPUs when the cache lines used for MWAIT are located in the local nodes of the CPUs using them. I would expect this to give a (likely slight) improvement on other systems too. The patch implements placing the idle task in the node of its CPUs, by passing the right target node to copy_process() [akpm@linux-foundation.org: use NUMA_NO_NODE, not a bare -1] Link: http://lkml.kernel.org/r/1463492694-15833-1-git-send-email-andi@firstfloor.orgSigned-off-by: Andi Kleen <ak@linux.intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Sumit Semwal <sumit.semwal@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Vitaly Kuznetsov authored
commit 757647e1 upstream. Recent changes to 'struct flow_keys' (e.g commit d34af823 ("net: Add VLAN ID to flow_keys")) introduced a performance regression in netvsc driver. Is problem is, however, not the above mentioned commit but the fact that netvsc_set_hash() function did some assumptions on the struct flow_keys data layout and this is wrong. Get rid of netvsc_set_hash() by switching to skb_get_hash(). This change will also imply switching to Jenkins hash from the currently used Toeplitz but it seems there is no good excuse for Toeplitz to stay. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Cc: Sumit Semwal <sumit.semwal@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Jason Gunthorpe authored
commit 727f28b8 upstream. The interrupt is always allocated with devm_request_irq so it must always be freed with devm_free_irq. Fixes: 448e9c55 ("tpm_tis: verify interrupt during init") Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Acked-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> Tested-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> Tested-by: Martin Wilck <Martin.Wilck@ts.fujitsu.com> Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> Acked-by: Peter Huewe <peterhuewe@gmx.de> Cc: Sumit Semwal <sumit.semwal@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Dave Airlie authored
commit e9c5e740 upstream. this fixes the build on arm. Signed-off-by: Dave Airlie <airlied@redhat.com> Cc: Sumit Semwal <sumit.semwal@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Sebastian Ott authored
commit dba59909 upstream. After a failure during registration of the dma_table (because of the function being in error state) we free its memory but don't reset the associated pointer to zero. When we then receive a notification from firmware (about the function being in error state) we'll try to walk and free the dma_table again. Fix this by resetting the dma_table pointer. In addition to that make sure that we free the iommu_bitmap when appropriate. Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com> Reviewed-by: Gerald Schaefer <gerald.schaefer@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Sumit Semwal <sumit.semwal@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Thomas Huth authored
commit 708e75a3 upstream. If kvmppc_handle_exit_pr() calls kvmppc_emulate_instruction() to emulate one instruction (in the BOOK3S_INTERRUPT_H_EMUL_ASSIST case), it calls kvmppc_core_queue_program() afterwards if kvmppc_emulate_instruction() returned EMULATE_FAIL, so the guest gets an program interrupt for the illegal opcode. However, the kvmppc_emulate_instruction() also tried to inject a program exception for this already, so the program interrupt gets injected twice and the return address in srr0 gets destroyed. All other callers of kvmppc_emulate_instruction() are also injecting a program interrupt, and since the callers have the right knowledge about the srr1 flags that should be used, it is the function kvmppc_emulate_instruction() that should _not_ inject program interrupts, so remove the kvmppc_core_queue_program() here. This fixes the issue discovered by Laurent Vivier with kvm-unit-tests where the logs are filled with these messages when the test tries to execute an illegal instruction: Couldn't emulate instruction 0x00000000 (op 0 xop 0) kvmppc_handle_exit_pr: emulation at 700 failed (00000000) Signed-off-by: Thomas Huth <thuth@redhat.com> Reviewed-by: Alexander Graf <agraf@suse.de> Tested-by: Laurent Vivier <lvivier@redhat.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org> Cc: Sumit Semwal <sumit.semwal@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Ross Lagerwall authored
commit 707e59ba upstream. The following commit: 1fb3a8b2 ("xen/spinlock: Fix locking path engaging too soon under PVHVM.") ... moved the initalization of the kicker interrupt until after native_cpu_up() is called. However, when using qspinlocks, a CPU may try to kick another CPU that is spinning (because it has not yet initialized its kicker interrupt), resulting in the following crash during boot: kernel BUG at /build/linux-Ay7j_C/linux-4.4.0/drivers/xen/events/events_base.c:1210! invalid opcode: 0000 [#1] SMP ... RIP: 0010:[<ffffffff814c97c9>] [<ffffffff814c97c9>] xen_send_IPI_one+0x59/0x60 ... Call Trace: [<ffffffff8102be9e>] xen_qlock_kick+0xe/0x10 [<ffffffff810cabc2>] __pv_queued_spin_unlock+0xb2/0xf0 [<ffffffff810ca6d1>] ? __raw_callee_save___pv_queued_spin_unlock+0x11/0x20 [<ffffffff81052936>] ? check_tsc_warp+0x76/0x150 [<ffffffff81052aa6>] check_tsc_sync_source+0x96/0x160 [<ffffffff81051e28>] native_cpu_up+0x3d8/0x9f0 [<ffffffff8102b315>] xen_hvm_cpu_up+0x35/0x80 [<ffffffff8108198c>] _cpu_up+0x13c/0x180 [<ffffffff81081a4a>] cpu_up+0x7a/0xa0 [<ffffffff81f80dfc>] smp_init+0x7f/0x81 [<ffffffff81f5a121>] kernel_init_freeable+0xef/0x212 [<ffffffff81817f30>] ? rest_init+0x80/0x80 [<ffffffff81817f3e>] kernel_init+0xe/0xe0 [<ffffffff8182488f>] ret_from_fork+0x3f/0x70 [<ffffffff81817f30>] ? rest_init+0x80/0x80 To fix this, only send the kick if the target CPU's interrupt has been initialized. This check isn't racy, because the target is waiting for the spinlock, so it won't have initialized the interrupt in the meantime. Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: David Vrabel <david.vrabel@citrix.com> Cc: Juergen Gross <jgross@suse.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Cc: xen-devel@lists.xenproject.org Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Sumit Semwal <sumit.semwal@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Vitaly Kuznetsov authored
commit a9f61ca7 upstream. When we crash from NMI context (e.g. after NMI injection from host when 'sysctl -w kernel.unknown_nmi_panic=1' is set) we hit kernel BUG at mm/vmalloc.c:1530! as vfree() is denied. While the issue could be solved with in_nmi() check instead I opted for skipping vfree on all sorts of crashes to reduce the amount of work which can cause consequent crashes. We don't really need to free anything on crash. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Cc: Sumit Semwal <sumit.semwal@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Vitaly Kuznetsov authored
commit 77c0c973 upstream. When we iterate through all HA regions in handle_pg_range() we have an assumption that all these regions are sorted in the list and the 'start_pfn >= has->end_pfn' check is enough to find the proper region. Unfortunately it's not the case with WS2016 where host can hot-add regions in a different order. We end up modifying the wrong HA region and crashing later on pages online. Modify the check to make sure we found the region we were searching for while iterating. Fix the same check in pfn_covered() as well. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Cc: Sumit Semwal <sumit.semwal@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Mika Westerberg authored
commit bcb48cca upstream. The Cherryview GPIO controller has 8 or 16 wires connected to the I/O-APIC which can be used directly by the platform/BIOS or drivers. One such wire is used as SCI (System Control Interrupt) which ACPI depends on to be able to trigger GPEs (General Purpose Events). The pinctrl driver itself uses another IRQ resource which is wire OR of all the 8 (or 16) wires and follows what BIOS has programmed to the IntSel register of each pin. Currently the driver masks all interrupts at probe time and this prevents these direct interrupts from working as expected. The reason for this is that some early stage prototypes had some pins misconfigured causing lots of spurious interrupts. We fix this by leaving the interrupt mask untouched. This allows SCI and other direct interrupts work properly. What comes to the possible spurious interrupts we switch the default handler to be handle_bad_irq() instead of handle_simple_irq() (which was not correct anyway). Reported-by: Yu C Chen <yu.c.chen@intel.com> Reported-by: Anisse Astier <anisse@astier.eu> Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com> Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Cc: Sumit Semwal <sumit.semwal@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Alex Hung authored
commit e34fbbac upstream. Some system supports hybrid graphics and its discrete VGA does not have any connectors and therefore has no _DOD method. Signed-off-by: Alex Hung <alex.hung@canonical.com> Reviewed-by: Aaron Lu <aaron.lu@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: Sumit Semwal <sumit.semwal@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Manoj N. Kumar authored
commit 83430833 upstream. With the current value of cmd_per_lun at 16, the throughput over a single adapter is limited to around 150kIOPS. Increase the value of cmd_per_lun to 256 to improve throughput. With this change a single adapter is able to attain close to the maximum throughput (380kIOPS). Also change the number of RRQ entries that can be queued. Signed-off-by: Manoj N. Kumar <manoj@linux.vnet.ibm.com> Acked-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com> Reviewed-by: Uma Krishnan <ukrishn@linux.vnet.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Cc: Sumit Semwal <sumit.semwal@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Wang, Rui Y authored
commit ddef4824 upstream. mcryptd_create_hash() fails by returning -EINVAL, causing any driver using mcryptd to fail to load. It is because it needs to set its statesize properly. Signed-off-by: Rui Wang <rui.y.wang@intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Cc: Sumit Semwal <sumit.semwal@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Wang, Rui Y authored
commit 1a078340 upstream. cryptd_create_hash() fails by returning -EINVAL. It is because after 8996eafd ("crypto: ahash - ensure statesize is non-zero") all ahash drivers must have a non-zero statesize. This patch fixes the problem by properly assigning the statesize. Signed-off-by: Rui Wang <rui.y.wang@intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Cc: Sumit Semwal <sumit.semwal@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Wang, Rui Y authored
commit 3a020a72 upstream. ghash_clmulni_intel fails to load on Linux 4.3+ with the following message: "modprobe: ERROR: could not insert 'ghash_clmulni_intel': Invalid argument" After 8996eafd ("crypto: ahash - ensure statesize is non-zero") all ahash drivers are required to implement import()/export(), and must have a non- zero statesize. This patch has been tested with the algif_hash interface. The calculated digest values, after several rounds of import()s and export()s, match those calculated by tcrypt. Signed-off-by: Rui Wang <rui.y.wang@intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Cc: Sumit Semwal <sumit.semwal@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-