- 06 Dec, 2014 9 commits
-
-
Cristian Stoica authored
Replace equivalent (and partially incorrect) scatter-gather functions with ones from crypto-API. The replacement is motivated by page-faults in sg_copy_part triggered by successive calls to crypto_hash_update. The following fault appears after calling crypto_ahash_update twice, first with 13 and then with 285 bytes: Unable to handle kernel paging request for data at address 0x00000008 Faulting instruction address: 0xf9bf9a8c Oops: Kernel access of bad area, sig: 11 [#1] SMP NR_CPUS=8 CoreNet Generic Modules linked in: tcrypt(+) caamhash caam_jr caam tls CPU: 6 PID: 1497 Comm: cryptomgr_test Not tainted 3.12.19-rt30-QorIQ-SDK-V1.6+g9fda9f2 #75 task: e9308530 ti: e700e000 task.ti: e700e000 NIP: f9bf9a8c LR: f9bfcf28 CTR: c0019ea0 REGS: e700fb80 TRAP: 0300 Not tainted (3.12.19-rt30-QorIQ-SDK-V1.6+g9fda9f2) MSR: 00029002 <CE,EE,ME> CR: 44f92024 XER: 20000000 DEAR: 00000008, ESR: 00000000 GPR00: f9bfcf28 e700fc30 e9308530 e70b1e55 00000000 ffffffdd e70b1e54 0bebf888 GPR08: 902c7ef5 c0e771e2 00000002 00000888 c0019ea0 00000000 00000000 c07a4154 GPR16: c08d0000 e91a8f9c 00000001 e98fb400 00000100 e9c83028 e70b1e08 e70b1d48 GPR24: e992ce10 e70b1dc8 f9bfe4f4 e70b1e55 ffffffdd e70b1ce0 00000000 00000000 NIP [f9bf9a8c] sg_copy+0x1c/0x100 [caamhash] LR [f9bfcf28] ahash_update_no_ctx+0x628/0x660 [caamhash] Call Trace: [e700fc30] [f9bf9c50] sg_copy_part+0xe0/0x160 [caamhash] (unreliable) [e700fc50] [f9bfcf28] ahash_update_no_ctx+0x628/0x660 [caamhash] [e700fcb0] [f954e19c] crypto_tls_genicv+0x13c/0x300 [tls] [e700fd10] [f954e65c] crypto_tls_encrypt+0x5c/0x260 [tls] [e700fd40] [c02250ec] __test_aead.constprop.9+0x2bc/0xb70 [e700fe40] [c02259f0] alg_test_aead+0x50/0xc0 [e700fe60] [c02241e4] alg_test+0x114/0x2e0 [e700fee0] [c022276c] cryptomgr_test+0x4c/0x60 [e700fef0] [c004f658] kthread+0x98/0xa0 [e700ff40] [c000fd04] ret_from_kernel_thread+0x5c/0x64 Signed-off-by:
Herbert Xu <herbert@gondor.apana.org.au> (cherry picked from commit 307fd543) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
-
Michael Ellerman authored
We don't expect to get errors from the hypervisor when reading the rng, but if we do we should pass the error up to the hwrng driver. Otherwise the hwrng driver will continue calling us forever. Signed-off-by:
Michael Ellerman <michael@ellerman.id.au> Signed-off-by:
Herbert Xu <herbert@gondor.apana.org.au> (cherry picked from commit d319fe2a) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
-
Cong Wang authored
Since f660daac (oom: thaw threads if oom killed thread is frozen before deferring) OOM killer relies on being able to thaw a frozen task to handle OOM situation but a3201227 (freezer: make freezing() test freeze conditions in effect instead of TIF_FREEZE) has reorganized the code and stopped clearing freeze flag in __thaw_task. This means that the target task only wakes up and goes into the fridge again because the freezing condition hasn't changed for it. This reintroduces the bug fixed by f660daac. Fix the issue by checking for TIF_MEMDIE thread flag in freezing_slow_path and exclude the task from freezing completely. If a task was already frozen it would get woken by __thaw_task from OOM killer and get out of freezer after rechecking freezing(). Changes since v1 - put TIF_MEMDIE check into freezing_slowpath rather than in __refrigerator as per Oleg - return __thaw_task into oom_scan_process_thread because oom_kill_process will not wake task in the fridge because it is sleeping uninterruptible [mhocko@suse.cz: rewrote the changelog] Fixes: a3201227 (freezer: make freezing() test freeze conditions in effect instead of TIF_FREEZE) Cc: 3.3+ <stable@vger.kernel.org> # 3.3+ Signed-off-by:
Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by:
Michal Hocko <mhocko@suse.cz> Acked-by:
Oleg Nesterov <oleg@redhat.com> Signed-off-by:
Rafael J. Wysocki <rafael.j.wysocki@intel.com> (cherry picked from commit 51fae6da) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
-
Quentin Casasnovas authored
Fixes commit 2f06fa04 which was an incorrect backported version of commit d6b41cb0 upstream. If val_count is zero we return -EINVAL with map->lock_arg locked, which will deadlock the kernel next time we try to acquire this lock. This was introduced by f5942dd ("regmap: fix possible ZERO_SIZE_PTR pointer dereferencing error.") which improperly back-ported d6b41cb0. This issue was found during review of Ubuntu Trusty 3.13.0-40.68 kernel to prepare Ksplice rebootless updates. Fixes: f5942dd ("regmap: fix possible ZERO_SIZE_PTR pointer dereferencing error.") Signed-off-by:
Quentin Casasnovas <quentin.casasnovas@oracle.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org> (cherry picked from commit 197b3975) (cherry picked from commit HEAD) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
-
bob picco authored
This patch attempts to do a few things. The highlights are: 1) enable SPARSE_IRQ unconditionally, 2) kills off !SPARSE_IRQ code 3) allocates ivector_table at boot time and 4) default to cookie only VIRQ mechanism for supported firmware. The first firmware with cookie only support for me appears on T5. You can optionally force the HV firmware to not cookie only mode which is the sysino support. The sysino is a deprecated HV mechanism according to the most recent SPARC Virtual Machine Specification. HV_GRP_INTR is what controls the cookie/sysino firmware versioning. The history of this interface is: 1) Major version 1.0 only supported sysino based interrupt interfaces. 2) Major version 2.0 added cookie based VIRQs, however due to the fact that OSs were using the VIRQs without negoatiating major version 2.0 (Linux and Solaris are both guilty), the VIRQs calls were allowed even with major version 1.0 To complicate things even further, the VIRQ interfaces were only actually hooked up in the hypervisor for LDC interrupt sources. VIRQ calls on other device types would result in HV_EINVAL errors. So effectively, major version 2.0 is unusable. 3) Major version 3.0 was created to signal use of VIRQs and the fact that the hypervisor has these calls hooked up for all interrupt sources, not just those for LDC devices. A new boot option is provided should cookie only HV support have issues. hvirq - this is the version for HV_GRP_INTR. This is related to HV API versioning. The code attempts major=3 first by default. The option can be used to override this default. I've tested with SPARSE_IRQ on T5-8, M7-4 and T4-X and Jalap?no. Signed-off-by:
Bob Picco <bob.picco@oracle.com> Signed-off-by:
David S. Miller <davem@davemloft.net> (cherry picked from commit ee6a9333) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
-
bob picco authored
The T5 (niagara5) has different PCR related HV fast trap values and a new HV API Group. This patch utilizes these and shares when possible with niagara4. We use the same sparc_pmu niagara4_pmu. Should there be new effort to obtain the MCU perf statistics then this would have to be changed. Cc: sparclinux@vger.kernel.org Signed-off-by:
Bob Picco <bob.picco@oracle.com> Signed-off-by:
David S. Miller <davem@davemloft.net> (cherry picked from commit 05aa1651) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
-
Xiubo Li authored
Since we cannot make sure the 'val_count' will always be none zero here, and then if it equals to zero, the kmemdup() will return ZERO_SIZE_PTR, which equals to ((void *)16). So this patch fix this with just doing the zero check before calling kmemdup(). Signed-off-by:
Xiubo Li <Li.Xiubo@freescale.com> Signed-off-by:
Mark Brown <broonie@kernel.org> Cc: stable@vger.kernel.org (cherry picked from commit d6b41cb0) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
-
Bryan O'Donoghue authored
This patch is to enable the USB gadget device for Intel Quark X1000 Signed-off-by:
Bryan O'Donoghue <bryan.odonoghue@intel.com> Signed-off-by:
Bing Niu <bing.niu@intel.com> Signed-off-by:
Alvin (Weike) Chen <alvin.chen@intel.com> Signed-off-by:
Felipe Balbi <balbi@ti.com> (cherry picked from commit a68df706) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
-
Stanislaw Gruszka authored
X550VB as many others Asus laptops need wapf4 quirk to make RFKILL switch be functional. Otherwise system boots with wireless card disabled and is only possible to enable it by suspend/resume. Bug report: http://bugzilla.redhat.com/show_bug.cgi?id=1089731#c23Reported-and-tested-by:
Vratislav Podzimek <vpodzime@redhat.com> Signed-off-by:
Stanislaw Gruszka <sgruszka@redhat.com> Signed-off-by:
Darren Hart <dvhart@linux.intel.com> (cherry picked from commit 4ec7a45b) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
-
- 22 Nov, 2014 2 commits
-
-
Sasha Levin authored
Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
-
Willy Tarreau authored
Commits b7c9e4ee ("sysctl net: Keep tcp_syn_retries inside the boundary") and eedcafdc ("net: check net.core.somaxconn sysctl values") were missing a .strategy entry which is still required in 2.6.32. Because of this, the Ubuntu kernel team has faced kernel dumps during their testing. Tyler Hicks and Luis Henriques proposed this patch to fix the issue, which properly sets .strategy as needed in 2.6.32. Reported-by:
Luis Henriques <luis.henriques@canonical.com> Cc: tyler.hicks@canonical.com Cc: Michal Tesar <mtesar@redhat.com> Cc: David S. Miller <davem@davemloft.net> Signed-off-by:
Willy Tarreau <w@1wt.eu> (cherry picked from commit b90422de) (cherry picked from commit HEAD) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
-
- 21 Nov, 2014 29 commits
-
-
Matthew Leach authored
When copying in a struct msghdr from the user, if the user has set the msg_namelen parameter to a negative value it gets clamped to a valid size due to a comparison between signed and unsigned values. Ensure the syscall errors when the user passes in a negative value. Signed-off-by:
Matthew Leach <matthew.leach@arm.com> Signed-off-by:
David S. Miller <davem@davemloft.net> (cherry picked from commit dbb490b9) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
-
Vlad Yasevich authored
The following commit: b6c40d68 net: only invoke dev->change_rx_flags when device is UP tried to fix a problem with VLAN devices and promiscuouse flag setting. The issue was that VLAN device was setting a flag on an interface that was down, thus resulting in bad promiscuity count. This commit blocked flag propagation to any device that is currently down. A later commit: deede2fa vlan: Don't propagate flag changes on down interfaces fixed VLAN code to only propagate flags when the VLAN interface is up, thus fixing the same issue as above, only localized to VLAN. The problem we have now is that if we have create a complex stack involving multiple software devices like bridges, bonds, and vlans, then it is possible that the flags would not propagate properly to the physical devices. A simple examle of the scenario is the following: eth0----> bond0 ----> bridge0 ---> vlan50 If bond0 or eth0 happen to be down at the time bond0 is added to the bridge, then eth0 will never have promisc mode set which is currently required for operation as part of the bridge. As a result, packets with vlan50 will be dropped by the interface. The only 2 devices that implement the special flag handling are VLAN and DSA and they both have required code to prevent incorrect flag propagation. As a result we can remove the generic solution introduced in b6c40d68 and leave it to the individual devices to decide whether they will block flag propagation or not. Reported-by:
Stefan Priebe <s.priebe@profihost.ag> Suggested-by:
Veaceslav Falico <vfalico@redhat.com> Signed-off-by:
Vlad Yasevich <vyasevic@redhat.com> Signed-off-by:
David S. Miller <davem@davemloft.net> (cherry picked from commit d2615bf4) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
-
Dan Carpenter authored
If kmsg->msg_namelen > sizeof(struct sockaddr_storage) then in the original code that would lead to memory corruption in the kernel if you had audit configured. If you didn't have audit configured it was harmless. There are some programs such as beta versions of Ruby which use too large of a buffer and returning an error code breaks them. We should clamp the ->msg_namelen value instead. Fixes: 1661bf36 ("net: heap overflow in __audit_sockaddr()") Reported-by:
Eric Wong <normalperson@yhbt.net> Signed-off-by:
Dan Carpenter <dan.carpenter@oracle.com> Tested-by:
Eric Wong <normalperson@yhbt.net> Acked-by:
Eric Dumazet <edumazet@google.com> Signed-off-by:
David S. Miller <davem@davemloft.net> (cherry picked from commit db31c55a) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
-
Roman Gushchin authored
It's possible to assign an invalid value to the net.core.somaxconn sysctl variable, because there is no checks at all. The sk_max_ack_backlog field of the sock structure is defined as unsigned short. Therefore, the backlog argument in inet_listen() shouldn't exceed USHRT_MAX. The backlog argument in the listen() syscall is truncated to the somaxconn value. So, the somaxconn value shouldn't exceed 65535 (USHRT_MAX). Also, negative values of somaxconn are meaningless. before: $ sysctl -w net.core.somaxconn=256 net.core.somaxconn = 256 $ sysctl -w net.core.somaxconn=65536 net.core.somaxconn = 65536 $ sysctl -w net.core.somaxconn=-100 net.core.somaxconn = -100 after: $ sysctl -w net.core.somaxconn=256 net.core.somaxconn = 256 $ sysctl -w net.core.somaxconn=65536 error: "Invalid argument" setting key "net.core.somaxconn" $ sysctl -w net.core.somaxconn=-100 error: "Invalid argument" setting key "net.core.somaxconn" Based on a prior patch from Changli Gao. Signed-off-by:
Roman Gushchin <klamm@yandex-team.ru> Reported-by:
Changli Gao <xiaosuo@gmail.com> Suggested-by:
Eric Dumazet <edumazet@google.com> Acked-by:
Eric Dumazet <edumazet@google.com> Signed-off-by:
David S. Miller <davem@davemloft.net> (cherry picked from commit 5f671d6b) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
-
Michal Tesar authored
Limit the min/max value passed to the /proc/sys/net/ipv4/tcp_syn_retries. Signed-off-by:
Michal Tesar <mtesar@redhat.com> Signed-off-by:
David S. Miller <davem@davemloft.net> (cherry picked from commit 651e9271) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
-
Nikola Pajkovsky authored
crypto_larval_lookup should only return a larval if it created one. Any larval created by another entity must be processed through crypto_larval_wait before being returned. Otherwise this will lead to a larval being killed twice, which will most likely lead to a crash. Cc: stable@vger.kernel.org Reported-by:
Kees Cook <keescook@chromium.org> Tested-by:
Kees Cook <keescook@chromium.org> Signed-off-by:
Herbert Xu <herbert@gondor.apana.org.au> (cherry picked from commit 77dbd7a9) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
-
Julian Anastasov authored
Fix CHECKSUM_PARTIAL handling. Tested for IPv4 TCP, UDP not tested because it needs network card with HW CSUM support. May be fixes problem where IPVS can not be used in virtual boxes. Problem appears with DNAT to local address when the local stack sends reply in CHECKSUM_PARTIAL mode. Fix tcp_dnat_handler and udp_dnat_handler to provide vaddr and daddr in right order (old and new IP) when calling tcp_partial_csum_update/udp_partial_csum_update (CHECKSUM_PARTIAL). Signed-off-by:
Julian Anastasov <ja@ssi.bg> Signed-off-by:
Simon Horman <horms@verge.net.au> (cherry picked from commit 5bc9068e) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
-
Willy Tarreau authored
syscall_trace_enter() and syscall_trace_leave() are only called from within asm code and do not need to be declared in the .c at all. Removing their reference fixes the build issue that was happening with gcc 4.7. Both Sven-Haegar Koch and Christoph Biedl confirmed this patch addresses their respective build issues. Cc: Sven-Haegar Koch <haegar@sdinet.de> Cc: Christoph Biedl <linux-kernel.bfrz@manchmal.in-ulm.de> Signed-off-by:
Willy Tarreau <w@1wt.eu> (cherry picked from commit 40c74e0d) (cherry picked from commit HEAD) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
-
Weiping Pan authored
Jay Fenlason (fenlason@redhat.com) found a bug, that recvfrom() on an RDS socket can return the contents of random kernel memory to userspace if it was called with a address length larger than sizeof(struct sockaddr_in). rds_recvmsg() also fails to set the addr_len paramater properly before returning, but that's just a bug. There are also a number of cases wher recvfrom() can return an entirely bogus address. Anything in rds_recvmsg() that returns a non-negative value but does not go through the "sin = (struct sockaddr_in *)msg->msg_name;" code path at the end of the while(1) loop will return up to 128 bytes of kernel memory to userspace. And I write two test programs to reproduce this bug, you will see that in rds_server, fromAddr will be overwritten and the following sock_fd will be destroyed. Yes, it is the programmer's fault to set msg_namelen incorrectly, but it is better to make the kernel copy the real length of address to user space in such case. How to run the test programs ? I test them on 32bit x86 system, 3.5.0-rc7. 1 compile gcc -o rds_client rds_client.c gcc -o rds_server rds_server.c 2 run ./rds_server on one console 3 run ./rds_client on another console 4 you will see something like: server is waiting to receive data... old socket fd=3 server received data from client:data from client msg.msg_namelen=32 new socket fd=-1067277685 sendmsg() : Bad file descriptor /***************** rds_client.c ********************/ int main(void) { int sock_fd; struct sockaddr_in serverAddr; struct sockaddr_in toAddr; char recvBuffer[128] = "data from client"; struct msghdr msg; struct iovec iov; sock_fd = socket(AF_RDS, SOCK_SEQPACKET, 0); if (sock_fd < 0) { perror("create socket error\n"); exit(1); } memset(&serverAddr, 0, sizeof(serverAddr)); serverAddr.sin_family = AF_INET; serverAddr.sin_addr.s_addr = inet_addr("127.0.0.1"); serverAddr.sin_port = htons(4001); if (bind(sock_fd, (struct sockaddr*)&serverAddr, sizeof(serverAddr)) < 0) { perror("bind() error\n"); close(sock_fd); exit(1); } memset(&toAddr, 0, sizeof(toAddr)); toAddr.sin_family = AF_INET; toAddr.sin_addr.s_addr = inet_addr("127.0.0.1"); toAddr.sin_port = htons(4000); msg.msg_name = &toAddr; msg.msg_namelen = sizeof(toAddr); msg.msg_iov = &iov; msg.msg_iovlen = 1; msg.msg_iov->iov_base = recvBuffer; msg.msg_iov->iov_len = strlen(recvBuffer) + 1; msg.msg_control = 0; msg.msg_controllen = 0; msg.msg_flags = 0; if (sendmsg(sock_fd, &msg, 0) == -1) { perror("sendto() error\n"); close(sock_fd); exit(1); } printf("client send data:%s\n", recvBuffer); memset(recvBuffer, '\0', 128); msg.msg_name = &toAddr; msg.msg_namelen = sizeof(toAddr); msg.msg_iov = &iov; msg.msg_iovlen = 1; msg.msg_iov->iov_base = recvBuffer; msg.msg_iov->iov_len = 128; msg.msg_control = 0; msg.msg_controllen = 0; msg.msg_flags = 0; if (recvmsg(sock_fd, &msg, 0) == -1) { perror("recvmsg() error\n"); close(sock_fd); exit(1); } printf("receive data from server:%s\n", recvBuffer); close(sock_fd); return 0; } /***************** rds_server.c ********************/ int main(void) { struct sockaddr_in fromAddr; int sock_fd; struct sockaddr_in serverAddr; unsigned int addrLen; char recvBuffer[128]; struct msghdr msg; struct iovec iov; sock_fd = socket(AF_RDS, SOCK_SEQPACKET, 0); if(sock_fd < 0) { perror("create socket error\n"); exit(0); } memset(&serverAddr, 0, sizeof(serverAddr)); serverAddr.sin_family = AF_INET; serverAddr.sin_addr.s_addr = inet_addr("127.0.0.1"); serverAddr.sin_port = htons(4000); if (bind(sock_fd, (struct sockaddr*)&serverAddr, sizeof(serverAddr)) < 0) { perror("bind error\n"); close(sock_fd); exit(1); } printf("server is waiting to receive data...\n"); msg.msg_name = &fromAddr; /* * I add 16 to sizeof(fromAddr), ie 32, * and pay attention to the definition of fromAddr, * recvmsg() will overwrite sock_fd, * since kernel will copy 32 bytes to userspace. * * If you just use sizeof(fromAddr), it works fine. * */ msg.msg_namelen = sizeof(fromAddr) + 16; /* msg.msg_namelen = sizeof(fromAddr); */ msg.msg_iov = &iov; msg.msg_iovlen = 1; msg.msg_iov->iov_base = recvBuffer; msg.msg_iov->iov_len = 128; msg.msg_control = 0; msg.msg_controllen = 0; msg.msg_flags = 0; while (1) { printf("old socket fd=%d\n", sock_fd); if (recvmsg(sock_fd, &msg, 0) == -1) { perror("recvmsg() error\n"); close(sock_fd); exit(1); } printf("server received data from client:%s\n", recvBuffer); printf("msg.msg_namelen=%d\n", msg.msg_namelen); printf("new socket fd=%d\n", sock_fd); strcat(recvBuffer, "--data from server"); if (sendmsg(sock_fd, &msg, 0) == -1) { perror("sendmsg()\n"); close(sock_fd); exit(1); } } close(sock_fd); return 0; } Signed-off-by:
Weiping Pan <wpan@redhat.com> Signed-off-by:
David S. Miller <davem@davemloft.net> (cherry picked from commit 06b6a1cf) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
-
Mathias Krause authored
For stream sockets the code misses to update the msg_namelen member to 0 and therefore makes net/socket.c leak the local, uninitialized sockaddr_storage variable to userland -- 128 bytes of kernel stack memory. The msg_namelen update is also missing for datagram sockets in case the socket is shutting down during receive. Fix both issues by setting msg_namelen to 0 early. It will be updated later if we're going to fill the msg_name member. Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Signed-off-by:
Mathias Krause <minipli@googlemail.com> Signed-off-by:
David S. Miller <davem@davemloft.net> (cherry picked from commit c77a4b9c) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
-
Mathias Krause authored
The current code does not fill the msg_name member in case it is set. It also does not set the msg_namelen member to 0 and therefore makes net/socket.c leak the local, uninitialized sockaddr_storage variable to userland -- 128 bytes of kernel stack memory. Fix that by simply setting msg_namelen to 0 as obviously nobody cared about vcc_recvmsg() not filling the msg_name in case it was set. Signed-off-by:
Mathias Krause <minipli@googlemail.com> Signed-off-by:
David S. Miller <davem@davemloft.net> (cherry picked from commit 9b3e617f) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
-
Eric Dumazet authored
In various network workloads, __do_softirq() latencies can be up to 20 ms if HZ=1000, and 200 ms if HZ=100. This is because we iterate 10 times in the softirq dispatcher, and some actions can consume a lot of cycles. This patch changes the fallback to ksoftirqd condition to : - A time limit of 2 ms. - need_resched() being set on current task When one of this condition is met, we wakeup ksoftirqd for further softirq processing if we still have pending softirqs. Using need_resched() as the only condition can trigger RCU stalls, as we can keep BH disabled for too long. I ran several benchmarks and got no significant difference in throughput, but a very significant reduction of latencies (one order of magnitude) : In following bench, 200 antagonist "netperf -t TCP_RR" are started in background, using all available cpus. Then we start one "netperf -t TCP_RR", bound to the cpu handling the NIC IRQ (hard+soft) Before patch : # netperf -H 7.7.7.84 -t TCP_RR -T2,2 -- -k RT_LATENCY,MIN_LATENCY,MAX_LATENCY,P50_LATENCY,P90_LATENCY,P99_LATENCY,MEAN_LATENCY,STDDEV_LATENCY MIGRATED TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 7.7.7.84 () port 0 AF_INET : first burst 0 : cpu bind RT_LATENCY=550110.424 MIN_LATENCY=146858 MAX_LATENCY=997109 P50_LATENCY=305000 P90_LATENCY=550000 P99_LATENCY=710000 MEAN_LATENCY=376989.12 STDDEV_LATENCY=184046.92 After patch : # netperf -H 7.7.7.84 -t TCP_RR -T2,2 -- -k RT_LATENCY,MIN_LATENCY,MAX_LATENCY,P50_LATENCY,P90_LATENCY,P99_LATENCY,MEAN_LATENCY,STDDEV_LATENCY MIGRATED TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 7.7.7.84 () port 0 AF_INET : first burst 0 : cpu bind RT_LATENCY=40545.492 MIN_LATENCY=9834 MAX_LATENCY=78366 P50_LATENCY=33583 P90_LATENCY=59000 P99_LATENCY=69000 MEAN_LATENCY=38364.67 STDDEV_LATENCY=12865.26 Signed-off-by:
Eric Dumazet <edumazet@google.com> Cc: David Miller <davem@davemloft.net> Cc: Tom Herbert <therbert@google.com> Cc: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by:
David S. Miller <davem@davemloft.net> (cherry picked from commit c10d7367) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
-
Eric Dumazet authored
We should use time_after_eq() to get maximum latency of two ticks, instead of three. Bug added in commit 24f8b238 (net: increase receive packet quantum) Signed-off-by:
Eric Dumazet <edumazet@google.com> Signed-off-by:
David S. Miller <davem@davemloft.net> (cherry picked from commit d1f41b67) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
-
Mathias Krause authored
If RFCOMM_DEFER_SETUP is set in the flags, rfcomm_sock_recvmsg() returns early with 0 without updating the possibly set msg_namelen member. This, in turn, leads to a 128 byte kernel stack leak in net/socket.c. Fix this by updating msg_namelen in this case. For all other cases it will be handled in bt_sock_stream_recvmsg(). Cc: Marcel Holtmann <marcel@holtmann.org> Cc: Gustavo Padovan <gustavo@padovan.org> Cc: Johan Hedberg <johan.hedberg@gmail.com> Signed-off-by:
Mathias Krause <minipli@googlemail.com> Signed-off-by:
David S. Miller <davem@davemloft.net> (cherry picked from commit e11e0455) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
-
James Bottomley authored
USB surprise removal of sr is triggering an oops in scsi_dispatch_command(). What seems to be happening is that USB is hanging on to a queue reference until the last close of the upper device, so the crash is caused by surprise remove of a mounted CD followed by attempted unmount. The problem is that USB doesn't issue its final commands as part of the SCSI teardown path, but on last close when the block queue is long gone. The long term fix is probably to make sr do the teardown in the same way as sd (so remove all the lower bits on ejection, but keep the upper disk alive until last close of user space). However, the current oops can be simply fixed by not allowing any commands to be sent to a dead queue. Cc: stable@kernel.org Signed-off-by:
James Bottomley <JBottomley@Parallels.com> (cherry picked from commit bfe159a5) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
-
Ian Abbott authored
`s626_enc_insn_config()` is incorrectly dereferencing `insn->data` which is a pointer to user memory. It should be dereferencing the separate `data` parameter that points to a copy of the data in kernel memory. Signed-off-by:
Ian Abbott <abbotti@mev.co.uk> Reviewed-by:
H Hartley Sweeten <hsweeten@visionengravers.com> Cc: stable <stable@vger.kernel.org> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org> (cherry picked from commit b655c2c4) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
-
Romain Francoise authored
Before v2.6.38 CONFIG_EXPERT was known as CONFIG_EMBEDDED but the Kconfig entry was not changed to match when upstream commit 628c6246 ("x86, random: Architectural inlines to get random integers with RDRAND") was backported. Signed-off-by:
Romain Francoise <romain@orebokech.com> Signed-off-by:
Willy Tarreau <w@1wt.eu> (cherry picked from commit 119274d6) (cherry picked from commit HEAD) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
-
Hugh Dickins authored
In fuzzing with trinity, lockdep protested "possible irq lock inversion dependency detected" when isolate_lru_page() reenabled interrupts while still holding the supposedly irq-safe tree_lock: invalidate_inode_pages2 invalidate_complete_page2 spin_lock_irq(&mapping->tree_lock) clear_page_mlock isolate_lru_page spin_unlock_irq(&zone->lru_lock) isolate_lru_page() is correct to enable interrupts unconditionally: invalidate_complete_page2() is incorrect to call clear_page_mlock() while holding tree_lock, which is supposed to nest inside lru_lock. Both truncate_complete_page() and invalidate_complete_page() call clear_page_mlock() before taking tree_lock to remove page from radix_tree. I guess invalidate_complete_page2() preferred to test PageDirty (again) under tree_lock before committing to the munlock; but since the page has already been unmapped, its state is already somewhat inconsistent, and no worse if clear_page_mlock() moved up. Reported-by:
Sasha Levin <levinsasha928@gmail.com> Deciphered-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Hugh Dickins <hughd@google.com> Acked-by:
Mel Gorman <mel@csn.ul.ie> Cc: Rik van Riel <riel@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michel Lespinasse <walken@google.com> Cc: Ying Han <yinghan@google.com> Cc: <stable@vger.kernel.org> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org> (cherry picked from commit ec4d9f62) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
-
Oleg Nesterov authored
Minor cleanup. ____call_usermodehelper() can simply return, no need to call do_exit() explicitely. Signed-off-by:
Oleg Nesterov <oleg@redhat.com> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Tejun Heo <tj@kernel.org> Cc: David Rientjes <rientjes@google.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org> (cherry picked from commit 5b9bd473) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
-
Kees Cook authored
Fix possible overflow of the buffer used for expanding environment variables when building file list. In the extremely unlikely case of an attacker having control over the environment variables visible to gen_init_cpio, control over the contents of the file gen_init_cpio parses, and gen_init_cpio was built without compiler hardening, the attacker can gain arbitrary execution control via a stack buffer overflow. $ cat usr/crash.list file foo ${BIG}${BIG}${BIG}${BIG}${BIG}${BIG} 0755 0 0 $ BIG=$(perl -e 'print "A" x 4096;') ./usr/gen_init_cpio usr/crash.list *** buffer overflow detected ***: ./usr/gen_init_cpio terminated This also replaces the space-indenting with tabs. Patch based on existing fix extracted from grsecurity. Signed-off-by:
Kees Cook <keescook@chromium.org> Cc: Michal Marek <mmarek@suse.cz> Cc: Brad Spengler <spender@grsecurity.net> Cc: PaX Team <pageexec@freemail.hu> Cc: <stable@vger.kernel.org> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org> (cherry picked from commit 20f1de65) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
-
Ben Hutchings authored
This reverts commit 2af3af56, which was commit 6c4088ac upstream. This broke compilation of the driver in 2.6.32.y as the early_io{remap,unmap}() functions are not defined for ia64. The driver can *only* be built for ia64 (even in current mainline), so a fix for x86_64 is pointless. Signed-off-by:
Ben Hutchings <ben@decadent.org.uk> Signed-off-by:
Willy Tarreau <w@1wt.eu> (cherry picked from commit 01ab25d5) (cherry picked from commit HEAD) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
-
Linus Torvalds authored
Add a new interface, add_device_randomness() for adding data to the random pool that is likely to differ between two devices (or possibly even per boot). This would be things like MAC addresses or serial numbers, or the read-out of the RTC. This does *not* add any actual entropy to the pool, but it initializes the pool to different values for devices that might otherwise be identical and have very little entropy available to them (particularly common in the embedded world). [ Modified by tytso to mix in a timestamp, since there may be some variability caused by the time needed to detect/configure the hardware in question. ] Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by:
"Theodore Ts'o" <tytso@mit.edu> Cc: stable@vger.kernel.org (cherry picked from commit a2080a67) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
-
bjschuma@gmail.com authored
This allows distros to remove the line from their modprobe configuration. Signed-off-by:
Bryan Schumaker <bjschuma@netapp.com> Cc: stable@vger.kernel.org Signed-off-by:
Trond Myklebust <Trond.Myklebust@netapp.com> (cherry picked from commit 425e776d) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
-
Thomas Jarosch authored
Some BIOS implementations leave the Intel GPU interrupts enabled, even though no one is handling them (f.e. i915 driver is never loaded). Additionally the interrupt destination is not set up properly and the interrupt ends up -somewhere-. These spurious interrupts are "sticky" and the kernel disables the (shared) interrupt line after 100.000+ generated interrupts. Fix it by disabling the still enabled interrupts. This resolves crashes often seen on monitor unplug. Tested on the following boards: - Intel DH61CR: Affected - Intel DH67BL: Affected - Intel S1200KP server board: Affected - Asus P8H61-M LE: Affected, but system does not crash. Probably the IRQ ends up somewhere unnoticed. According to reports on the net, the Intel DH61WW board is also affected. Many thanks to Jesse Barnes from Intel for helping with the register configuration and to Intel in general for providing public hardware documentation. Signed-off-by:
Thomas Jarosch <thomas.jarosch@intra2net.com> Tested-by:
Charlie Suffin <charlie.suffin@stratus.com> Signed-off-by:
Jesse Barnes <jbarnes@virtuousgeek.org> (cherry picked from commit f67fd55f) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
-
Salman Qazi authored
When a machine boots up, the TSC generally gets reset. However, when kexec is used to boot into a kernel, the TSC value would be carried over from the previous kernel. The computation of cycns_offset in set_cyc2ns_scale is prone to an overflow, if the machine has been up more than 208 days prior to the kexec. The overflow happens when we multiply *scale, even though there is enough room to store the final answer. We fix this issue by decomposing tsc_now into the quotient and remainder of division by CYC2NS_SCALE_FACTOR and then performing the multiplication separately on the two components. Refactor code to share the calculation with the previous fix in __cycles_2_ns(). Signed-off-by:
Salman Qazi <sqazi@google.com> Acked-by:
John Stultz <john.stultz@linaro.org> Acked-by:
Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Turner <pjt@google.com> Cc: john stultz <johnstul@us.ibm.com> Link: http://lkml.kernel.org/r/20120310004027.19291.88460.stgit@dungbeetle.mtv.corp.google.comSigned-off-by:
Ingo Molnar <mingo@elte.hu> (cherry picked from commit 9993bc63) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
-
Sasha Levin authored
'long secs' is passed as divisor to div_s64, which accepts a 32bit divisor. On 64bit machines that value is trimmed back from 8 bytes back to 4, causing a divide by zero when the number is bigger than (1 << 32) - 1 and all 32 lower bits are 0. Use div64_long() instead. Signed-off-by:
Sasha Levin <levinsasha928@gmail.com> Cc: johnstul@us.ibm.com Link: http://lkml.kernel.org/r/1331829374-31543-2-git-send-email-levinsasha928@gmail.com Cc: stable@vger.kernel.org Signed-off-by:
Thomas Gleixner <tglx@linutronix.de> (cherry picked from commit a078c6d0) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
-
Willy Tarreau authored
Initial stable commit : 2215d910 This patch backported into 2.6.32.55 is enabled when CONFIG_AMD_NB is set, but this config option does not exist in 2.6.32, it was called CONFIG_K8_NB, so the fix was never applied. Some other changes were needed to make it work. first, the correct include file name was asm/k8.h and not asm/amd_nb.h, and second, amd_get_mmconfig_range() is needed and was merged by previous patch. Thanks to Jiri Slabi who reported the issue and diagnosed all the dependencies. Signed-off-by:
Willy Tarreau <w@1wt.eu> Cc: Jiri Slaby <jslaby@suse.cz> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Jesse Barnes <jbarnes@virtuousgeek.org> Signed-off-by:
Willy Tarreau <w@1wt.eu> (cherry picked from commit 46e8a56a) (cherry picked from commit HEAD) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
-
Ben Hutchings authored
commit 32974ad4 upstream This just changes Kconfig rather than touching all the other files the original commit did. Patch description from the original commit : | [IA64] Remove COMPAT_IA32 support | | This has been broken since May 2008 when Al Viro killed altroot support. | Since nobody has complained, it would appear that there are no users of | this code (A plausible theory since the main OSVs that support ia64 prefer | to use the IA32-EL software emulation). | | Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by:
Ben Hutchings <ben@decadent.org.uk> Signed-off-by:
Willy Tarreau <w@1wt.eu> (cherry picked from commit d9a25c03) (cherry picked from commit HEAD) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
-
Heiko Carstens authored
For kernels < 3.0 the backport of 048cd4e5 "compat: fix compile breakage on s390" will break compilation... Re-add a single #include <asm/compat.h> in order to fix this. This patch is _not_ necessary for upstream, only for stable kernels which include the "build fix" mentioned above. Reported-by:
Jiri Slaby <jslaby@suse.cz> Signed-off-by:
Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by:
Willy Tarreau <w@1wt.eu> (cherry picked from commit ee116431) (cherry picked from commit HEAD) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
-