- 07 Mar, 2015 40 commits
-
-
subashab@codeaurora.org authored
An exception is seen in ICMP ping receive path where the skb destructor sock_rfree() tries to access a freed socket. This happens because ping_rcv() releases socket reference with sock_put() and this internally frees up the socket. Later icmp_rcv() will try to free the skb and as part of this, skb destructor is called and which leads to a kernel panic as the socket is freed already in ping_rcv(). -->|exception -007|sk_mem_uncharge -007|sock_rfree -008|skb_release_head_state -009|skb_release_all -009|__kfree_skb -010|kfree_skb -011|icmp_rcv -012|ip_local_deliver_finish Fix this incorrect free by cloning this skb and processing this cloned skb instead. This patch was suggested by Eric Dumazet Signed-off-by:
Subash Abhinov Kasiviswanathan <subashab@codeaurora.org> Cc: Eric Dumazet <edumazet@google.com> Signed-off-by:
Eric Dumazet <edumazet@google.com> Signed-off-by:
David S. Miller <davem@davemloft.net> (cherry picked from commit fc752f1f) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
-
Herbert Xu authored
While working on rhashtable walking I noticed that the UDP diag dumping code is buggy. In particular, the socket skipping within a chain never happens, even though we record the number of sockets that should be skipped. As this code was supposedly copied from TCP, this patch does what TCP does and resets num before we walk a chain. Signed-off-by:
Herbert Xu <herbert@gondor.apana.org.au> Acked-by:
Pavel Emelyanov <xemul@parallels.com> Signed-off-by:
David S. Miller <davem@davemloft.net> (cherry picked from commit 86f3cddb) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
-
Hannes Frederic Sowa authored
Not caching dst_entries which cause redirects could be exploited by hosts on the same subnet, causing a severe DoS attack. This effect aggravated since commit f8864972 ("ipv4: fix dst race in sk_dst_get()"). Lookups causing redirects will be allocated with DST_NOCACHE set which will force dst_release to free them via RCU. Unfortunately waiting for RCU grace period just takes too long, we can end up with >1M dst_entries waiting to be released and the system will run OOM. rcuos threads cannot catch up under high softirq load. Attaching the flag to emit a redirect later on to the specific skb allows us to cache those dst_entries thus reducing the pressure on allocation and deallocation. This issue was discovered by Marcelo Leitner. Cc: Julian Anastasov <ja@ssi.bg> Signed-off-by:
Marcelo Leitner <mleitner@redhat.com> Signed-off-by:
Florian Westphal <fw@strlen.de> Signed-off-by:
Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by:
Julian Anastasov <ja@ssi.bg> Signed-off-by:
David S. Miller <davem@davemloft.net> (cherry picked from commit df4d9254) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
-
Daniel Borkmann authored
When hitting an INIT collision case during the 4WHS with AUTH enabled, as already described in detail in commit 1be9a950 ("net: sctp: inherit auth_capable on INIT collisions"), it can happen that we occasionally still remotely trigger the following panic on server side which seems to have been uncovered after the fix from commit 1be9a950 ... [ 533.876389] BUG: unable to handle kernel paging request at 00000000ffffffff [ 533.913657] IP: [<ffffffff811ac385>] __kmalloc+0x95/0x230 [ 533.940559] PGD 5030f2067 PUD 0 [ 533.957104] Oops: 0000 [#1] SMP [ 533.974283] Modules linked in: sctp mlx4_en [...] [ 534.939704] Call Trace: [ 534.951833] [<ffffffff81294e30>] ? crypto_init_shash_ops+0x60/0xf0 [ 534.984213] [<ffffffff81294e30>] crypto_init_shash_ops+0x60/0xf0 [ 535.015025] [<ffffffff8128c8ed>] __crypto_alloc_tfm+0x6d/0x170 [ 535.045661] [<ffffffff8128d12c>] crypto_alloc_base+0x4c/0xb0 [ 535.074593] [<ffffffff8160bd42>] ? _raw_spin_lock_bh+0x12/0x50 [ 535.105239] [<ffffffffa0418c11>] sctp_inet_listen+0x161/0x1e0 [sctp] [ 535.138606] [<ffffffff814e43bd>] SyS_listen+0x9d/0xb0 [ 535.166848] [<ffffffff816149a9>] system_call_fastpath+0x16/0x1b ... or depending on the the application, for example this one: [ 1370.026490] BUG: unable to handle kernel paging request at 00000000ffffffff [ 1370.026506] IP: [<ffffffff811ab455>] kmem_cache_alloc+0x75/0x1d0 [ 1370.054568] PGD 633c94067 PUD 0 [ 1370.070446] Oops: 0000 [#1] SMP [ 1370.085010] Modules linked in: sctp kvm_amd kvm [...] [ 1370.963431] Call Trace: [ 1370.974632] [<ffffffff8120f7cf>] ? SyS_epoll_ctl+0x53f/0x960 [ 1371.000863] [<ffffffff8120f7cf>] SyS_epoll_ctl+0x53f/0x960 [ 1371.027154] [<ffffffff812100d3>] ? anon_inode_getfile+0xd3/0x170 [ 1371.054679] [<ffffffff811e3d67>] ? __alloc_fd+0xa7/0x130 [ 1371.080183] [<ffffffff816149a9>] system_call_fastpath+0x16/0x1b With slab debugging enabled, we can see that the poison has been overwritten: [ 669.826368] BUG kmalloc-128 (Tainted: G W ): Poison overwritten [ 669.826385] INFO: 0xffff880228b32e50-0xffff880228b32e50. First byte 0x6a instead of 0x6b [ 669.826414] INFO: Allocated in sctp_auth_create_key+0x23/0x50 [sctp] age=3 cpu=0 pid=18494 [ 669.826424] __slab_alloc+0x4bf/0x566 [ 669.826433] __kmalloc+0x280/0x310 [ 669.826453] sctp_auth_create_key+0x23/0x50 [sctp] [ 669.826471] sctp_auth_asoc_create_secret+0xcb/0x1e0 [sctp] [ 669.826488] sctp_auth_asoc_init_active_key+0x68/0xa0 [sctp] [ 669.826505] sctp_do_sm+0x29d/0x17c0 [sctp] [...] [ 669.826629] INFO: Freed in kzfree+0x31/0x40 age=1 cpu=0 pid=18494 [ 669.826635] __slab_free+0x39/0x2a8 [ 669.826643] kfree+0x1d6/0x230 [ 669.826650] kzfree+0x31/0x40 [ 669.826666] sctp_auth_key_put+0x19/0x20 [sctp] [ 669.826681] sctp_assoc_update+0x1ee/0x2d0 [sctp] [ 669.826695] sctp_do_sm+0x674/0x17c0 [sctp] Since this only triggers in some collision-cases with AUTH, the problem at heart is that sctp_auth_key_put() on asoc->asoc_shared_key is called twice when having refcnt 1, once directly in sctp_assoc_update() and yet again from within sctp_auth_asoc_init_active_key() via sctp_assoc_update() on the already kzfree'd memory, which is also consistent with the observation of the poison decrease from 0x6b to 0x6a (note: the overwrite is detected at a later point in time when poison is checked on new allocation). Reference counting of auth keys revisited: Shared keys for AUTH chunks are being stored in endpoints and associations in endpoint_shared_keys list. On endpoint creation, a null key is being added; on association creation, all endpoint shared keys are being cached and thus cloned over to the association. struct sctp_shared_key only holds a pointer to the actual key bytes, that is, struct sctp_auth_bytes which keeps track of users internally through refcounting. Naturally, on assoc or enpoint destruction, sctp_shared_key are being destroyed directly and the reference on sctp_auth_bytes dropped. User space can add keys to either list via setsockopt(2) through struct sctp_authkey and by passing that to sctp_auth_set_key() which replaces or adds a new auth key. There, sctp_auth_create_key() creates a new sctp_auth_bytes with refcount 1 and in case of replacement drops the reference on the old sctp_auth_bytes. A key can be set active from user space through setsockopt() on the id via sctp_auth_set_active_key(), which iterates through either endpoint_shared_keys and in case of an assoc, invokes (one of various places) sctp_auth_asoc_init_active_key(). sctp_auth_asoc_init_active_key() computes the actual secret from local's and peer's random, hmac and shared key parameters and returns a new key directly as sctp_auth_bytes, that is asoc->asoc_shared_key, plus drops the reference if there was a previous one. The secret, which where we eventually double drop the ref comes from sctp_auth_asoc_set_secret() with intitial refcount of 1, which also stays unchanged eventually in sctp_assoc_update(). This key is later being used for crypto layer to set the key for the hash in crypto_hash_setkey() from sctp_auth_calculate_hmac(). To close the loop: asoc->asoc_shared_key is freshly allocated secret material and independant of the sctp_shared_key management keeping track of only shared keys in endpoints and assocs. Hence, also commit 4184b2a7 ("net: sctp: fix memory leak in auth key management") is independant of this bug here since it concerns a different layer (though same structures being used eventually). asoc->asoc_shared_key is reference dropped correctly on assoc destruction in sctp_association_free() and when active keys are being replaced in sctp_auth_asoc_init_active_key(), it always has a refcount of 1. Hence, it's freed prematurely in sctp_assoc_update(). Simple fix is to remove that sctp_auth_key_put() from there which fixes these panics. Fixes: 730fc3d0 ("[SCTP]: Implete SCTP-AUTH parameter processing") Signed-off-by:
Daniel Borkmann <dborkman@redhat.com> Acked-by:
Vlad Yasevich <vyasevich@gmail.com> Acked-by:
Neil Horman <nhorman@tuxdriver.com> Signed-off-by:
David S. Miller <davem@davemloft.net> (cherry picked from commit 600ddd68) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
-
Eric Dumazet authored
NAPI poll logic now enforces that a poller returns exactly the budget when it wants to be called again. If a driver limits TX completion, it has to return budget as well when the limit is hit, not the number of received packets. Reported-and-tested-by:
Mike Galbraith <umgwanakikbuti@gmail.com> Signed-off-by:
Eric Dumazet <edumazet@google.com> Fixes: d75b1ade ("net: less interrupt masking in NAPI") Cc: Manish Chopra <manish.chopra@qlogic.com> Acked-by:
Manish Chopra <manish.chopra@qlogic.com> Signed-off-by:
David S. Miller <davem@davemloft.net> (cherry picked from commit 6088beef) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
-
Hagen Paul Pfeifer authored
Reduce the attack vector and stop generating IPv6 Fragment Header for paths with an MTU smaller than the minimum required IPv6 MTU size (1280 byte) - called atomic fragments. See IETF I-D "Deprecating the Generation of IPv6 Atomic Fragments" [1] for more information and how this "feature" can be misused. [1] https://tools.ietf.org/html/draft-ietf-6man-deprecate-atomfrag-generation-00Signed-off-by:
Fernando Gont <fgont@si6networks.com> Signed-off-by:
Hagen Paul Pfeifer <hagen@jauu.net> Acked-by:
Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by:
David S. Miller <davem@davemloft.net> (cherry picked from commit 9d289715) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
-
karl beldan authored
Fixed commit added from64to32 under _#ifndef do_csum_ but used it under _#ifndef csum_tcpudp_nofold_, breaking some builds (Fengguang's robot reported TILEGX's). Move from64to32 under the latter. Fixes: 150ae0e9 ("lib/checksum.c: fix carry in csum_tcpudp_nofold") Reported-by:
kbuild test robot <fengguang.wu@intel.com> Signed-off-by:
Karl Beldan <karl.beldan@rivierawaves.com> Cc: Eric Dumazet <edumazet@google.com> Cc: David S. Miller <davem@davemloft.net> Signed-off-by:
David S. Miller <davem@davemloft.net> (cherry picked from commit 9ce35779) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
-
karl beldan authored
Fixed commit added from64to32 under _#ifndef do_csum_ but used it under _#ifndef csum_tcpudp_nofold_, breaking some builds (Fengguang's robot reported TILEGX's). Move from64to32 under the latter. Fixes: 150ae0e9 ("lib/checksum.c: fix carry in csum_tcpudp_nofold") Reported-by:
kbuild test robot <fengguang.wu@intel.com> Signed-off-by:
Karl Beldan <karl.beldan@rivierawaves.com> Cc: Eric Dumazet <edumazet@google.com> Cc: David S. Miller <davem@davemloft.net> Signed-off-by:
David S. Miller <davem@davemloft.net> tcp: ipv4: initialize unicast_sock sk_pacing_rate When I added sk_pacing_rate field, I forgot to initialize its value in the per cpu unicast_sock used in ip_send_unicast_reply() This means that for sch_fq users, RST packets, or ACK packets sent on behalf of TIME_WAIT sockets might be sent to slowly or even dropped once we reach the per flow limit. Signed-off-by:
Eric Dumazet <edumazet@google.com> Fixes: 95bd09eb ("tcp: TSO packets automatic sizing") Signed-off-by:
David S. Miller <davem@davemloft.net> lib/checksum.c: fix carry in csum_tcpudp_nofold The carry from the 64->32bits folding was dropped, e.g with: saddr=0xFFFFFFFF daddr=0xFF0000FF len=0xFFFF proto=0 sum=1, csum_tcpudp_nofold returned 0 instead of 1. Signed-off-by:
Karl Beldan <karl.beldan@rivierawaves.com> Cc: Al Viro <viro@ZenIV.linux.org.uk> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Mike Frysinger <vapier@gentoo.org> Cc: netdev@vger.kernel.org Cc: linux-kernel@vger.kernel.org Cc: stable@vger.kernel.org Signed-off-by:
Eric Dumazet <edumazet@google.com> Signed-off-by:
David S. Miller <davem@davemloft.net> (cherry picked from commit 9ce35779 150ae0e9) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
-
David Vrabel authored
This reverts commit 2c3fc8d2. This commit broke on x86 PV because entries in the generic SWIOTLB are indexed using (pseudo-)physical address not DMA address and these are not the same in a x86 PV guest. Signed-off-by:
David Vrabel <david.vrabel@citrix.com> Reviewed-by:
Stefano Stabellini <stefano.stabellini@eu.citrix.com> Revert "swiotlb-xen: pass dev_addr to swiotlb_tbl_unmap_single" This reverts commit 2c3fc8d2. This commit broke on x86 PV because entries in the generic SWIOTLB are indexed using (pseudo-)physical address not DMA address and these are not the same in a x86 PV guest. Signed-off-by:
David Vrabel <david.vrabel@citrix.com> Reviewed-by:
Stefano Stabellini <stefano.stabellini@eu.citrix.com> xen: annotate xen_set_identity_and_remap_chunk() with __init Commit 5b8e7d80 removed the __init annotation from xen_set_identity_and_remap_chunk(). Add it again. Signed-off-by:
Juergen Gross <jgross@suse.com> Signed-off-by:
David Vrabel <david.vrabel@citrix.com> xen: introduce helper functions to do safe read and write accesses Introduce two helper functions to safely read and write unsigned long values from or to memory when the access may fault because the mapping is non-present or read-only. These helpers can be used instead of open coded uses of __get_user() and __put_user() avoiding the need to do casts to fix sparse warnings. Use the helpers in page.h and p2m.c. This will fix the sparse warnings when doing "make C=1". Signed-off-by:
Juergen Gross <jgross@suse.com> Signed-off-by:
David Vrabel <david.vrabel@citrix.com> xen: Speed up set_phys_to_machine() by using read-only mappings Instead of checking at each call of set_phys_to_machine() whether a new p2m page has to be allocated due to writing an entry in a large invalid or identity area, just map those areas read only and react to a page fault on write by allocating the new page. This change will make the common path with no allocation much faster as it only requires a single write of the new mfn instead of walking the address translation tables and checking for the special cases. Suggested-by:
David Vrabel <david.vrabel@citrix.com> Signed-off-by:
Juergen Gross <jgross@suse.com> Reviewed-by:
David Vrabel <david.vrabel@citrix.com> Reviewed-by:
Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by:
David Vrabel <david.vrabel@citrix.com> xen: switch to linear virtual mapped sparse p2m list At start of the day the Xen hypervisor presents a contiguous mfn list to a pv-domain. In order to support sparse memory this mfn list is accessed via a three level p2m tree built early in the boot process. Whenever the system needs the mfn associated with a pfn this tree is used to find the mfn. Instead of using a software walked tree for accessing a specific mfn list entry this patch is creating a virtual address area for the entire possible mfn list including memory holes. The holes are covered by mapping a pre-defined page consisting only of "invalid mfn" entries. Access to a mfn entry is possible by just using the virtual base address of the mfn list and the pfn as index into that list. This speeds up the (hot) path of determining the mfn of a pfn. Kernel build on a Dell Latitude E6440 (2 cores, HT) in 64 bit Dom0 showed following improvements: Elapsed time: 32:50 -> 32:35 System: 18:07 -> 17:47 User: 104:00 -> 103:30 Tested with following configurations: - 64 bit dom0, 8GB RAM - 64 bit dom0, 128 GB RAM, PCI-area above 4 GB - 32 bit domU, 512 MB, 8 GB, 43 GB (more wouldn't work even without the patch) - 32 bit domU, ballooning up and down - 32 bit domU, save and restore - 32 bit domU with PCI passthrough - 64 bit domU, 8 GB, 2049 MB, 5000 MB - 64 bit domU, ballooning up and down - 64 bit domU, save and restore - 64 bit domU with PCI passthrough Signed-off-by:
Juergen Gross <jgross@suse.com> Signed-off-by:
David Vrabel <david.vrabel@citrix.com> xen: Hide get_phys_to_machine() to be able to tune common path Today get_phys_to_machine() is always called when the mfn for a pfn is to be obtained. Add a wrapper __pfn_to_mfn() as inline function to be able to avoid calling get_phys_to_machine() when possible as soon as the switch to a linear mapped p2m list has been done. Signed-off-by:
Juergen Gross <jgross@suse.com> Reviewed-by:
David Vrabel <david.vrabel@citrix.com> Signed-off-by:
David Vrabel <david.vrabel@citrix.com> x86: Introduce function to get pmd entry pointer Introduces lookup_pmd_address() to get the address of the pmd entry related to a virtual address in the current address space. This function is needed for support of a virtual mapped sparse p2m list in xen pv domains, as we need the address of the pmd entry, not the one of the pte in that case. Signed-off-by:
Juergen Gross <jgross@suse.com> Signed-off-by:
David Vrabel <david.vrabel@citrix.com> xen: Delay invalidating extra memory When the physical memory configuration is initialized the p2m entries for not pouplated memory pages are set to "invalid". As those pages are beyond the hypervisor built p2m list the p2m tree has to be extended. This patch delays processing the extra memory related p2m entries during the boot process until some more basic memory management functions are callable. This removes the need to create new p2m entries until virtual memory management is available. Signed-off-by:
Juergen Gross <jgross@suse.com> Reviewed-by:
David Vrabel <david.vrabel@citrix.com> Signed-off-by:
David Vrabel <david.vrabel@citrix.com> xen: Delay m2p_override initialization The m2p overrides are used to be able to find the local pfn for a foreign mfn mapped into the domain. They are used by driver backends having to access frontend data. As this functionality isn't used in early boot it makes no sense to initialize the m2p override functions very early. It can be done later without doing any harm, removing the need for allocating memory via extend_brk(). While at it make some m2p override functions static as they are only used internally. Signed-off-by:
Juergen Gross <jgross@suse.com> Reviewed-by:
David Vrabel <david.vrabel@citrix.com> Reviewed-by:
Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by:
David Vrabel <david.vrabel@citrix.com> xen: Delay remapping memory of pv-domain Early in the boot process the memory layout of a pv-domain is changed to match the E820 map (either the host one for Dom0 or the Xen one) regarding placement of RAM and PCI holes. This requires removing memory pages initially located at positions not suitable for RAM and adding them later at higher addresses where no restrictions apply. To be able to operate on the hypervisor supported p2m list until a virtual mapped linear p2m list can be constructed, remapping must be delayed until virtual memory management is initialized, as the initial p2m list can't be extended unlimited at physical memory initialization time due to it's fixed structure. A further advantage is the reduction in complexity and code volume as we don't have to be careful regarding memory restrictions during p2m updates. Signed-off-by:
Juergen Gross <jgross@suse.com> Reviewed-by:
David Vrabel <david.vrabel@citrix.com> Signed-off-by:
David Vrabel <david.vrabel@citrix.com> xen: use common page allocation function in p2m.c In arch/x86/xen/p2m.c three different allocation functions for obtaining a memory page are used: extend_brk(), alloc_bootmem_align() or __get_free_page(). Which of those functions is used depends on the progress of the boot process of the system. Introduce a common allocation routine selecting the to be called allocation routine dynamically based on the boot progress. This allows moving initialization steps without having to care about changing allocation calls. Signed-off-by:
Juergen Gross <jgross@suse.com> Signed-off-by:
David Vrabel <david.vrabel@citrix.com> xen: Make functions static Some functions in arch/x86/xen/p2m.c are used locally only. Make them static. Rearrange the functions in p2m.c to avoid forward declarations. Signed-off-by:
Juergen Gross <jgross@suse.com> Signed-off-by:
David Vrabel <david.vrabel@citrix.com> xen: fix some style issues in p2m.c The source arch/x86/xen/p2m.c has some coding style issues. Fix them. Signed-off-by:
Juergen Gross <jgross@suse.com> Signed-off-by:
David Vrabel <david.vrabel@citrix.com> (cherry picked from commit dbdd7476 4ef8e3f3) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
-
Chris Wilson authored
commit 226e5ae9 upstream. If CONFIG_DEBUG_MUTEXES is set, the mutex->owner field is only cleared if the mutex debugging is enabled which introduces a race in our mutex_is_locked_by() - i.e. we may inspect the old owner value before it is acquired by the new task. This is the root cause of this error: # diff --git a/kernel/locking/mutex-debug.c b/kernel/locking/mutex-debug.c # index 5cf6731..3ef3736 100644 # --- a/kernel/locking/mutex-debug.c # +++ b/kernel/locking/mutex-debug.c # @@ -80,13 +80,13 @@ void debug_mutex_unlock(struct mutex *lock) # DEBUG_LOCKS_WARN_ON(lock->owner != current); # # DEBUG_LOCKS_WARN_ON(!lock->wait_list.prev && !lock->wait_list.next); # - mutex_clear_owner(lock); # } # # /* # * __mutex_slowpath_needs_to_unlock() is explicitly 0 for debug # * mutexes so that we can do it here after we've verified state. # */ # + mutex_clear_owner(lock); # atomic_set(&lock->count, 1); # } Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=87955Signed-off-by:
Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by:
Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by:
Jani Nikula <jani.nikula@intel.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org> (cherry picked from commit 8dcffdd3)
-
Hans de Goede authored
Wifi on this laptop does not work unless asus-nb-wmi.wapf=4 is specified on the kerne commandline, add a quirk for this. Cc: stable@vger.kernel.org BugLink: https://bugs.launchpad.net/bugs/1173681Signed-off-by:
Hans de Goede <hdegoede@redhat.com> Signed-off-by:
Darren Hart <dvhart@linux.intel.com> (cherry picked from commit 841e11cc) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
-
Chris Wilson authored
In order to act as a full command barrier by itself, we need to tell the pipecontrol to actually stall the command streamer while the flush runs. We require the full command barrier before operations like MI_SET_CONTEXT, which currently rely on a prior invalidate flush. References: https://bugs.freedesktop.org/show_bug.cgi?id=83677 Cc: Simon Farnsworth <simon@farnz.org.uk> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Signed-off-by:
Chris Wilson <chris@chris-wilson.co.uk> Cc: stable@vger.kernel.org Signed-off-by:
Jani Nikula <jani.nikula@intel.com> (cherry picked from commit add284a3) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
-
Kazuya Mizuguchi authored
This patch fixes an issue that the NULL pointer dereference happens when we uses g_audio driver. Since the g_audio driver will call usb_ep_disable() in afunc_set_alt() before it calls usb_ep_enable(), the uep->pipe of renesas usbhs driver will be NULL. So, this patch adds a condition to avoid the oops. Signed-off-by:
Kazuya Mizuguchi <kazuya.mizuguchi.ks@renesas.com> Signed-off-by:
Takeshi Kihara <takeshi.kihara.df@renesas.com> Signed-off-by:
Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com> Fixes: 2f98382d (usb: renesas_usbhs: Add Renesas USBHS Gadget) Cc: <stable@vger.kernel.org> # v3.0+ Signed-off-by:
Felipe Balbi <balbi@ti.com> (cherry picked from commit 11432050) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
-
Gwendal Grignou authored
commit d1c7e29e upstream. Before ->start() is called, bufsize size is set to HID_MIN_BUFFER_SIZE, 64 bytes. While processing the IRQ, we were asking to receive up to wMaxInputLength bytes, which can be bigger than 64 bytes. Later, when ->start is run, a proper bufsize will be calculated. Given wMaxInputLength is said to be unreliable in other part of the code, set to receive only what we can even if it results in truncated reports. Signed-off-by:
Gwendal Grignou <gwendal@chromium.org> Reviewed-by:
Benjamin Tissoires <benjamin.tissoires@redhat.com> Signed-off-by:
Jiri Kosina <jkosina@suse.cz> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org> (cherry picked from commit e788b6db)
-
Stefano Stabellini authored
This reverts commit 2c3fc8d2. This commit broke on x86 PV because entries in the generic SWIOTLB are indexed using (pseudo-)physical address not DMA address and these are not the same in a x86 PV guest. Signed-off-by:
David Vrabel <david.vrabel@citrix.com> Reviewed-by:
Stefano Stabellini <stefano.stabellini@eu.citrix.com> Revert "swiotlb-xen: pass dev_addr to swiotlb_tbl_unmap_single" This reverts commit 2c3fc8d2. This commit broke on x86 PV because entries in the generic SWIOTLB are indexed using (pseudo-)physical address not DMA address and these are not the same in a x86 PV guest. Signed-off-by:
David Vrabel <david.vrabel@citrix.com> Reviewed-by:
Stefano Stabellini <stefano.stabellini@eu.citrix.com> xen: annotate xen_set_identity_and_remap_chunk() with __init Commit 5b8e7d80 removed the __init annotation from xen_set_identity_and_remap_chunk(). Add it again. Signed-off-by:
Juergen Gross <jgross@suse.com> Signed-off-by:
David Vrabel <david.vrabel@citrix.com> xen: introduce helper functions to do safe read and write accesses Introduce two helper functions to safely read and write unsigned long values from or to memory when the access may fault because the mapping is non-present or read-only. These helpers can be used instead of open coded uses of __get_user() and __put_user() avoiding the need to do casts to fix sparse warnings. Use the helpers in page.h and p2m.c. This will fix the sparse warnings when doing "make C=1". Signed-off-by:
Juergen Gross <jgross@suse.com> Signed-off-by:
David Vrabel <david.vrabel@citrix.com> xen: Speed up set_phys_to_machine() by using read-only mappings Instead of checking at each call of set_phys_to_machine() whether a new p2m page has to be allocated due to writing an entry in a large invalid or identity area, just map those areas read only and react to a page fault on write by allocating the new page. This change will make the common path with no allocation much faster as it only requires a single write of the new mfn instead of walking the address translation tables and checking for the special cases. Suggested-by:
David Vrabel <david.vrabel@citrix.com> Signed-off-by:
Juergen Gross <jgross@suse.com> Reviewed-by:
David Vrabel <david.vrabel@citrix.com> Reviewed-by:
Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by:
David Vrabel <david.vrabel@citrix.com> xen: switch to linear virtual mapped sparse p2m list At start of the day the Xen hypervisor presents a contiguous mfn list to a pv-domain. In order to support sparse memory this mfn list is accessed via a three level p2m tree built early in the boot process. Whenever the system needs the mfn associated with a pfn this tree is used to find the mfn. Instead of using a software walked tree for accessing a specific mfn list entry this patch is creating a virtual address area for the entire possible mfn list including memory holes. The holes are covered by mapping a pre-defined page consisting only of "invalid mfn" entries. Access to a mfn entry is possible by just using the virtual base address of the mfn list and the pfn as index into that list. This speeds up the (hot) path of determining the mfn of a pfn. Kernel build on a Dell Latitude E6440 (2 cores, HT) in 64 bit Dom0 showed following improvements: Elapsed time: 32:50 -> 32:35 System: 18:07 -> 17:47 User: 104:00 -> 103:30 Tested with following configurations: - 64 bit dom0, 8GB RAM - 64 bit dom0, 128 GB RAM, PCI-area above 4 GB - 32 bit domU, 512 MB, 8 GB, 43 GB (more wouldn't work even without the patch) - 32 bit domU, ballooning up and down - 32 bit domU, save and restore - 32 bit domU with PCI passthrough - 64 bit domU, 8 GB, 2049 MB, 5000 MB - 64 bit domU, ballooning up and down - 64 bit domU, save and restore - 64 bit domU with PCI passthrough Signed-off-by:
Juergen Gross <jgross@suse.com> Signed-off-by:
David Vrabel <david.vrabel@citrix.com> xen: Hide get_phys_to_machine() to be able to tune common path Today get_phys_to_machine() is always called when the mfn for a pfn is to be obtained. Add a wrapper __pfn_to_mfn() as inline function to be able to avoid calling get_phys_to_machine() when possible as soon as the switch to a linear mapped p2m list has been done. Signed-off-by:
Juergen Gross <jgross@suse.com> Reviewed-by:
David Vrabel <david.vrabel@citrix.com> Signed-off-by:
David Vrabel <david.vrabel@citrix.com> x86: Introduce function to get pmd entry pointer Introduces lookup_pmd_address() to get the address of the pmd entry related to a virtual address in the current address space. This function is needed for support of a virtual mapped sparse p2m list in xen pv domains, as we need the address of the pmd entry, not the one of the pte in that case. Signed-off-by:
Juergen Gross <jgross@suse.com> Signed-off-by:
David Vrabel <david.vrabel@citrix.com> xen: Delay invalidating extra memory When the physical memory configuration is initialized the p2m entries for not pouplated memory pages are set to "invalid". As those pages are beyond the hypervisor built p2m list the p2m tree has to be extended. This patch delays processing the extra memory related p2m entries during the boot process until some more basic memory management functions are callable. This removes the need to create new p2m entries until virtual memory management is available. Signed-off-by:
Juergen Gross <jgross@suse.com> Reviewed-by:
David Vrabel <david.vrabel@citrix.com> Signed-off-by:
David Vrabel <david.vrabel@citrix.com> xen: Delay m2p_override initialization The m2p overrides are used to be able to find the local pfn for a foreign mfn mapped into the domain. They are used by driver backends having to access frontend data. As this functionality isn't used in early boot it makes no sense to initialize the m2p override functions very early. It can be done later without doing any harm, removing the need for allocating memory via extend_brk(). While at it make some m2p override functions static as they are only used internally. Signed-off-by:
Juergen Gross <jgross@suse.com> Reviewed-by:
David Vrabel <david.vrabel@citrix.com> Reviewed-by:
Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by:
David Vrabel <david.vrabel@citrix.com> xen: Delay remapping memory of pv-domain Early in the boot process the memory layout of a pv-domain is changed to match the E820 map (either the host one for Dom0 or the Xen one) regarding placement of RAM and PCI holes. This requires removing memory pages initially located at positions not suitable for RAM and adding them later at higher addresses where no restrictions apply. To be able to operate on the hypervisor supported p2m list until a virtual mapped linear p2m list can be constructed, remapping must be delayed until virtual memory management is initialized, as the initial p2m list can't be extended unlimited at physical memory initialization time due to it's fixed structure. A further advantage is the reduction in complexity and code volume as we don't have to be careful regarding memory restrictions during p2m updates. Signed-off-by:
Juergen Gross <jgross@suse.com> Reviewed-by:
David Vrabel <david.vrabel@citrix.com> Signed-off-by:
David Vrabel <david.vrabel@citrix.com> xen: use common page allocation function in p2m.c In arch/x86/xen/p2m.c three different allocation functions for obtaining a memory page are used: extend_brk(), alloc_bootmem_align() or __get_free_page(). Which of those functions is used depends on the progress of the boot process of the system. Introduce a common allocation routine selecting the to be called allocation routine dynamically based on the boot progress. This allows moving initialization steps without having to care about changing allocation calls. Signed-off-by:
Juergen Gross <jgross@suse.com> Signed-off-by:
David Vrabel <david.vrabel@citrix.com> xen: Make functions static Some functions in arch/x86/xen/p2m.c are used locally only. Make them static. Rearrange the functions in p2m.c to avoid forward declarations. Signed-off-by:
Juergen Gross <jgross@suse.com> Signed-off-by:
David Vrabel <david.vrabel@citrix.com> xen: fix some style issues in p2m.c The source arch/x86/xen/p2m.c has some coding style issues. Fix them. Signed-off-by:
Juergen Gross <jgross@suse.com> Signed-off-by:
David Vrabel <david.vrabel@citrix.com> xen/pci: Use APIC directly when APIC virtualization hardware is available When hardware supports APIC/x2APIC virtualization we don't need to use pirqs for MSI handling and instead use APIC since most APIC accesses (MMIO or MSR) will now be processed without VMEXITs. As an example, netperf on the original code produces this profile (collected wih 'xentrace -e 0x0008ffff -T 5'): 342 cpu_change 260 CPUID 34638 HLT 64067 INJ_VIRQ 28374 INTR 82733 INTR_WINDOW 10 NPF 24337 TRAP 370610 vlapic_accept_pic_intr 307528 VMENTRY 307527 VMEXIT 140998 VMMCALL 127 wrap_buffer After applying this patch the same test shows 230 cpu_change 260 CPUID 36542 HLT 174 INJ_VIRQ 27250 INTR 222 INTR_WINDOW 20 NPF 24999 TRAP 381812 vlapic_accept_pic_intr 166480 VMENTRY 166479 VMEXIT 77208 VMMCALL 81 wrap_buffer ApacheBench results (ab -n 10000 -c 200) improve by about 10% Signed-off-by:
Boris Ostrovsky <boris.ostrovsky@oracle.com> Reviewed-by:
Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Reviewed-by:
Andrew Cooper <andrew.cooper3@citrix.com> Signed-off-by:
David Vrabel <david.vrabel@citrix.com> xen/pci: Defer initialization of MSI ops on HVM guests If the hardware supports APIC virtualization we may decide not to use pirqs and instead use APIC/x2APIC directly, meaning that we don't want to set x86_msi.setup_msi_irqs and x86_msi.teardown_msi_irq to Xen-specific routines. However, x2APIC is not set up by the time pci_xen_hvm_init() is called so we need to postpone setting these ops until later, when we know which APIC mode is used. (Note that currently x2APIC is never initialized on HVM guests. This may change in the future) Signed-off-by:
Boris Ostrovsky <boris.ostrovsky@oracle.com> Acked-by:
Stefano Stabellini <stefano.stabellini@eu.citrix.com> Signed-off-by:
David Vrabel <david.vrabel@citrix.com> xen-pciback: drop SR-IOV VFs when PF driver unloads When a PF driver unloads, it may find it necessary to leave the VFs around simply because of pciback having marked them as assigned to a guest. Utilize a suitable notification to let go of the VFs, thus allowing the PF to go back into the state it was before its driver loaded (which in particular allows the driver to be loaded again with it being able to create the VFs anew, but which also allows to then pass through the PF instead of the VFs). Don't do this however for any VFs currently in active use by a guest. Signed-off-by:
Jan Beulich <jbeulich@suse.com> [v2: Removed the switch statement, moved it about] [v3: Redid it a bit differently] Signed-off-by:
Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by:
David Vrabel <david.vrabel@citrix.com> xen/pciback: Restore configuration space when detaching from a guest. The commit "xen/pciback: Don't deadlock when unbinding." was using the version of pci_reset_function which would lock the device lock. That is no good as we can dead-lock. As such we swapped to using the lock-less version and requiring that the callers of 'pcistub_put_pci_dev' take the device lock. And as such this bug got exposed. Using the lock-less version is OK, except that we tried to use 'pci_restore_state' after the lock-less version of __pci_reset_function_locked - which won't work as 'state_saved' is set to false. Said 'state_saved' is a toggle boolean that is to be used by the sequence of a) pci_save_state/pci_restore_state or b) pci_load_and_free_saved_state/pci_restore_state. We don't want to use a) as the guest might have messed up the PCI configuration space and we want it to revert to the state when the PCI device was binded to us. Therefore we pick b) to restore the configuration space. We restore from our 'golden' version of PCI configuration space, when an: - Device is unbinded from pciback - Device is detached from a guest. Reported-by:
Sander Eikelenboom <linux@eikelenboom.it> Signed-off-by:
Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by:
David Vrabel <david.vrabel@citrix.com> PCI: Expose pci_load_saved_state for public consumption. We have the pci_load_and_free_saved_state, and pci_store_saved_state but are missing the functionality to just load the state multiple times in the PCI device without having to free/save the state. This patch makes it possible to use this function. Signed-off-by:
Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Acked-by:
Bjorn Helgaas <bhelgaas@google.com> Signed-off-by:
David Vrabel <david.vrabel@citrix.com> xen/pciback: Remove tons of dereferences A little cleanup. No functional difference. Reviewed-by:
Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by:
Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by:
David Vrabel <david.vrabel@citrix.com> xen/pciback: Print out the domain owning the device. We had been printing it only if the device was built with debug enabled. But this information is useful in the field to troubleshoot. Signed-off-by:
Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Reviewed-by:
David Vrabel <david.vrabel@citrix.com> Signed-off-by:
David Vrabel <david.vrabel@citrix.com> xen/pciback: Include the domain id if removing the device whilst still in use Cleanup the function a bit - also include the id of the domain that is using the device. Signed-off-by:
Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Reviewed-by:
David Vrabel <david.vrabel@citrix.com> Signed-off-by:
David Vrabel <david.vrabel@citrix.com> driver core: Provide an wrapper around the mutex to do lockdep warnings Instead of open-coding it in drivers that want to double check that their functions are indeed holding the device lock. Signed-off-by:
Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Suggested-by:
David Vrabel <david.vrabel@citrix.com> Acked-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by:
David Vrabel <david.vrabel@citrix.com> xen/pciback: Don't deadlock when unbinding. As commit 0a9fd015 'xen/pciback: Document the entry points for 'pcistub_put_pci_dev'' explained there are four entry points in this function. Two of them are when the user fiddles in the SysFS to unbind a device which might be in use by a guest or not. Both 'unbind' states will cause a deadlock as the the PCI lock has already been taken, which then pci_device_reset tries to take. We can simplify this by requiring that all callers of pcistub_put_pci_dev MUST hold the device lock. And then we can just call the lockless version of pci_device_reset. To make it even simpler we will modify xen_pcibk_release_pci_dev to quality whether it should take a lock or not - as it ends up calling xen_pcibk_release_pci_dev and needs to hold the lock. Reviewed-by:
Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by:
Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by:
David Vrabel <david.vrabel@citrix.com> swiotlb-xen: pass dev_addr to swiotlb_tbl_unmap_single Need to pass the pointer within the swiotlb internal buffer to the swiotlb library, that in the case of xen_unmap_single is dev_addr, not paddr. Signed-off-by:
Stefano Stabellini <stefano.stabellini@eu.citrix.com> Acked-by:
Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> CC: stable@vger.kernel.org (cherry picked from commit dbdd7476 4ef8e3f3 2c3fc8d2) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
-
Guenter Roeck authored
Commit 927609d6 ("kernel: tighten rules for ACCESS ONCE") results in a compile failure for sh builds with CONFIG_X2TLB enabled. arch/sh/mm/gup.c: In function 'gup_get_pte': arch/sh/mm/gup.c:20:2: error: invalid initializer make[1]: *** [arch/sh/mm/gup.o] Error 1 Replace ACCESS_ONCE with READ_ONCE to fix the problem. Fixes: 927609d6 ("kernel: tighten rules for ACCESS ONCE") Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by:
Guenter Roeck <linux@roeck-us.net> Reviewed-by:
Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by:
Christian Borntraeger <borntraeger@de.ibm.com> (cherry picked from commit 378af02b) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
-
Christian Borntraeger authored
Commit 927609d6 ("kernel: tighten rules for ACCESS ONCE") results in sparse warnings like "Using plain integer as NULL pointer" - Let's add a type cast to the dummy assignment. To avoid warnings lik "sparse: warning: cast to restricted __hc32" we also use __force on that cast. Fixes: 927609d6 ("kernel: tighten rules for ACCESS ONCE") Signed-off-by:
Christian Borntraeger <borntraeger@de.ibm.com> (cherry picked from commit c5b19946) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
-
Christian Borntraeger authored
Commit 927609d6 ("kernel: tighten rules for ACCESS ONCE") results in sparse warnings like "Using plain integer as NULL pointer" - Let's add a type cast to the dummy assignment. To avoid warnings lik "sparse: warning: cast to restricted __hc32" we also use __force on that cast. Fixes: 927609d6 ("kernel: tighten rules for ACCESS ONCE") Signed-off-by:
Christian Borntraeger <borntraeger@de.ibm.com> next: sh: Fix compile error Commit 927609d6 ("kernel: tighten rules for ACCESS ONCE") results in a compile failure for sh builds with CONFIG_X2TLB enabled. arch/sh/mm/gup.c: In function 'gup_get_pte': arch/sh/mm/gup.c:20:2: error: invalid initializer make[1]: *** [arch/sh/mm/gup.o] Error 1 Replace ACCESS_ONCE with READ_ONCE to fix the problem. Fixes: 927609d6 ("kernel: tighten rules for ACCESS ONCE") Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by:
Guenter Roeck <linux@roeck-us.net> Reviewed-by:
Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by:
Christian Borntraeger <borntraeger@de.ibm.com> kernel: tighten rules for ACCESS ONCE Now that all non-scalar users of ACCESS_ONCE have been converted to READ_ONCE or ASSIGN once, lets tighten ACCESS_ONCE to only work on scalar types. This variant was proposed by Alexei Starovoitov. Signed-off-by:
Christian Borntraeger <borntraeger@de.ibm.com> Reviewed-by:
Paul E. McKenney <paulmck@linux.vnet.ibm.com> (cherry picked from commit c5b19946 378af02b 927609d6) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
-
Hector Marco-Gisbert authored
The issue is that the stack for processes is not properly randomized on 64 bit architectures due to an integer overflow. The affected function is randomize_stack_top() in file "fs/binfmt_elf.c": static unsigned long randomize_stack_top(unsigned long stack_top) { unsigned int random_variable = 0; if ((current->flags & PF_RANDOMIZE) && !(current->personality & ADDR_NO_RANDOMIZE)) { random_variable = get_random_int() & STACK_RND_MASK; random_variable <<= PAGE_SHIFT; } return PAGE_ALIGN(stack_top) + random_variable; return PAGE_ALIGN(stack_top) - random_variable; } Note that, it declares the "random_variable" variable as "unsigned int". Since the result of the shifting operation between STACK_RND_MASK (which is 0x3fffff on x86_64, 22 bits) and PAGE_SHIFT (which is 12 on x86_64): random_variable <<= PAGE_SHIFT; then the two leftmost bits are dropped when storing the result in the "random_variable". This variable shall be at least 34 bits long to hold the (22+12) result. These two dropped bits have an impact on the entropy of process stack. Concretely, the total stack entropy is reduced by four: from 2^28 to 2^30 (One fourth of expected entropy). This patch restores back the entropy by correcting the types involved in the operations in the functions randomize_stack_top() and stack_maxrandom_size(). The successful fix can be tested with: $ for i in `seq 1 10`; do cat /proc/self/maps | grep stack; done 7ffeda566000-7ffeda587000 rw-p 00000000 00:00 0 [stack] 7fff5a332000-7fff5a353000 rw-p 00000000 00:00 0 [stack] 7ffcdb7a1000-7ffcdb7c2000 rw-p 00000000 00:00 0 [stack] 7ffd5e2c4000-7ffd5e2e5000 rw-p 00000000 00:00 0 [stack] ... Once corrected, the leading bytes should be between 7ffc and 7fff, rather than always being 7fff. Signed-off-by:
Hector Marco-Gisbert <hecmargi@upv.es> Signed-off-by:
Ismael Ripoll <iripoll@upv.es> [ Rebased, fixed 80 char bugs, cleaned up commit message, added test example and CVE ] Signed-off-by:
Kees Cook <keescook@chromium.org> Cc: <stable@vger.kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Fixes: CVE-2015-1593 Link: http://lkml.kernel.org/r/20150214173350.GA18393@www.outflux.netSigned-off-by:
Borislav Petkov <bp@suse.de> (cherry picked from commit 4e7c22d4) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
-
Thadeu Lima de Souza Cascardo authored
When reading blkio.throttle.io_serviced in a recently created blkio cgroup, it's possible to race against the creation of a throttle policy, which delays the allocation of stats_cpu. Like other functions in the throttle code, just checking for a NULL stats_cpu prevents the following oops caused by that race. [ 1117.285199] Unable to handle kernel paging request for data at address 0x7fb4d0020 [ 1117.285252] Faulting instruction address: 0xc0000000003efa2c [ 1137.733921] Oops: Kernel access of bad area, sig: 11 [#1] [ 1137.733945] SMP NR_CPUS=2048 NUMA PowerNV [ 1137.734025] Modules linked in: bridge stp llc kvm_hv kvm binfmt_misc autofs4 [ 1137.734102] CPU: 3 PID: 5302 Comm: blkcgroup Not tainted 3.19.0 #5 [ 1137.734132] task: c000000f1d188b00 ti: c000000f1d210000 task.ti: c000000f1d210000 [ 1137.734167] NIP: c0000000003efa2c LR: c0000000003ef9f0 CTR: c0000000003ef980 [ 1137.734202] REGS: c000000f1d213500 TRAP: 0300 Not tainted (3.19.0) [ 1137.734230] MSR: 9000000000009032 <SF,HV,EE,ME,IR,DR,RI> CR: 42008884 XER: 20000000 [ 1137.734325] CFAR: 0000000000008458 DAR: 00000007fb4d0020 DSISR: 40000000 SOFTE: 0 GPR00: c0000000003ed3a0 c000000f1d213780 c000000000c59538 0000000000000000 GPR04: 0000000000000800 0000000000000000 0000000000000000 0000000000000000 GPR08: ffffffffffffffff 00000007fb4d0020 00000007fb4d0000 c000000000780808 GPR12: 0000000022000888 c00000000fdc0d80 0000000000000000 0000000000000000 GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 GPR20: 000001003e120200 c000000f1d5b0cc0 0000000000000200 0000000000000000 GPR24: 0000000000000001 c000000000c269e0 0000000000000020 c000000f1d5b0c80 GPR28: c000000000ca3a08 c000000000ca3dec c000000f1c667e00 c000000f1d213850 [ 1137.734886] NIP [c0000000003efa2c] .tg_prfill_cpu_rwstat+0xac/0x180 [ 1137.734915] LR [c0000000003ef9f0] .tg_prfill_cpu_rwstat+0x70/0x180 [ 1137.734943] Call Trace: [ 1137.734952] [c000000f1d213780] [d000000005560520] 0xd000000005560520 (unreliable) [ 1137.734996] [c000000f1d2138a0] [c0000000003ed3a0] .blkcg_print_blkgs+0xe0/0x1a0 [ 1137.735039] [c000000f1d213960] [c0000000003efb50] .tg_print_cpu_rwstat+0x50/0x70 [ 1137.735082] [c000000f1d2139e0] [c000000000104b48] .cgroup_seqfile_show+0x58/0x150 [ 1137.735125] [c000000f1d213a70] [c0000000002749dc] .kernfs_seq_show+0x3c/0x50 [ 1137.735161] [c000000f1d213ae0] [c000000000218630] .seq_read+0xe0/0x510 [ 1137.735197] [c000000f1d213bd0] [c000000000275b04] .kernfs_fop_read+0x164/0x200 [ 1137.735240] [c000000f1d213c80] [c0000000001eb8e0] .__vfs_read+0x30/0x80 [ 1137.735276] [c000000f1d213cf0] [c0000000001eb9c4] .vfs_read+0x94/0x1b0 [ 1137.735312] [c000000f1d213d90] [c0000000001ebb38] .SyS_read+0x58/0x100 [ 1137.735349] [c000000f1d213e30] [c000000000009218] syscall_exit+0x0/0x98 [ 1137.735383] Instruction dump: [ 1137.735405] 7c6307b4 7f891800 409d00b8 60000000 60420000 3d420004 392a63b0 786a1f24 [ 1137.735471] 7d49502a e93e01c8 7d495214 7d2ad214 <7cead02a> e9090008 e9490010 e9290018 And here is one code that allows to easily reproduce this, although this has first been found by running docker. void run(pid_t pid) { int n; int status; int fd; char *buffer; buffer = memalign(BUFFER_ALIGN, BUFFER_SIZE); n = snprintf(buffer, BUFFER_SIZE, "%d\n", pid); fd = open(CGPATH "/test/tasks", O_WRONLY); write(fd, buffer, n); close(fd); if (fork() > 0) { fd = open("/dev/sda", O_RDONLY | O_DIRECT); read(fd, buffer, 512); close(fd); wait(&status); } else { fd = open(CGPATH "/test/blkio.throttle.io_serviced", O_RDONLY); n = read(fd, buffer, BUFFER_SIZE); close(fd); } free(buffer); exit(0); } void test(void) { int status; mkdir(CGPATH "/test", 0666); if (fork() > 0) wait(&status); else run(getpid()); rmdir(CGPATH "/test"); } int main(int argc, char **argv) { int i; for (i = 0; i < NR_TESTS; i++) test(); return 0; } Reported-by:
Ricardo Marin Matinata <rmm@br.ibm.com> Signed-off-by:
Thadeu Lima de Souza Cascardo <cascardo@linux.vnet.ibm.com> Cc: stable@vger.kernel.org Signed-off-by:
Jens Axboe <axboe@fb.com> (cherry picked from commit 045c47ca) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
-
Chen Jie authored
sm->offset maybe wrong but magic maybe right, the offset do not have CRC. Badness at c00c7580 [verbose debug info unavailable] NIP: c00c7580 LR: c00c718c CTR: 00000014 REGS: df07bb40 TRAP: 0700 Not tainted (2.6.34.13-WR4.3.0.0_standard) MSR: 00029000 <EE,ME,CE> CR: 22084f84 XER: 00000000 TASK = df84d6e0[908] 'mount' THREAD: df07a000 GPR00: 00000001 df07bbf0 df84d6e0 00000000 00000001 00000000 df07bb58 00000041 GPR08: 00000041 c0638860 00000000 00000010 22084f88 100636c8 df814ff8 00000000 GPR16: df84d6e0 dfa558cc c05adb90 00000048 c0452d30 00000000 000240d0 000040d0 GPR24: 00000014 c05ae734 c05be2e0 00000000 00000001 00000000 00000000 c05ae730 NIP [c00c7580] __alloc_pages_nodemask+0x4d0/0x638 LR [c00c718c] __alloc_pages_nodemask+0xdc/0x638 Call Trace: [df07bbf0] [c00c718c] __alloc_pages_nodemask+0xdc/0x638 (unreliable) [df07bc90] [c00c7708] __get_free_pages+0x20/0x48 [df07bca0] [c00f4a40] __kmalloc+0x15c/0x1ec [df07bcd0] [c01fc880] jffs2_scan_medium+0xa58/0x14d0 [df07bd70] [c01ff38c] jffs2_do_mount_fs+0x1f4/0x6b4 [df07bdb0] [c020144c] jffs2_do_fill_super+0xa8/0x260 [df07bdd0] [c020230c] jffs2_fill_super+0x104/0x184 [df07be00] [c0335814] get_sb_mtd_aux+0x9c/0xec [df07be20] [c033596c] get_sb_mtd+0x84/0x1e8 [df07be60] [c0201ed0] jffs2_get_sb+0x1c/0x2c [df07be70] [c0103898] vfs_kern_mount+0x78/0x1e8 [df07bea0] [c0103a58] do_kern_mount+0x40/0x100 [df07bec0] [c011fe90] do_mount+0x240/0x890 [df07bf10] [c0120570] sys_mount+0x90/0xd8 [df07bf40] [c00110d8] ret_from_syscall+0x0/0x4 === Exception: c01 at 0xff61a34 LR = 0x100135f0 Instruction dump: 38800005 38600000 48010f41 4bfffe1c 4bfc2d15 4bfffe8c 72e90200 4082fc28 3d20c064 39298860 8809000d 68000001 <0f000000> 2f800000 419efc0c 38000001 mount: mounting /dev/mtdblock3 on /common failed: Input/output error Signed-off-by:
Chen Jie <chenjie6@huawei.com> Cc: <stable@vger.kernel.org> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
David Woodhouse <David.Woodhouse@intel.com> (cherry picked from commit 164c2406) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
-
Tomáš Hodek authored
When a drive is marked write-mostly it should only be the target of reads if there is no other option. This behaviour was broken by commit 9dedf603 md/raid1: read balance chooses idlest disk for SSD which causes a write-mostly device to be *preferred* is some cases. Restore correct behaviour by checking and setting best_dist_disk and best_pending_disk rather than best_disk. We only need to test one of these as they are both changed from -1 or >=0 at the same time. As we leave min_pending and best_dist unchanged, any non-write-mostly device will appear better than the write-mostly device. Reported-by:
Tomáš Hodek <tomas.hodek@volny.cz> Reported-by:
Dark Penguin <darkpenguin@yandex.ru> Signed-off-by:
NeilBrown <neilb@suse.de> Link: http://marc.info/?l=linux-raid&m=135982797322422 Fixes: 9dedf603 Cc: stable@vger.kernel.org (3.6+) (cherry picked from commit d1901ef0) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
-
NeilBrown authored
Commit a7854487: md: When RAID5 is dirty, force reconstruct-write instead of read-modify-write. Causes an RCW cycle to be forced even when the array is degraded. A degraded array cannot support RCW as that requires reading all data blocks, and one may be missing. Forcing an RCW when it is not possible causes a live-lock and the code spins, repeatedly deciding to do something that cannot succeed. So change the condition to only force RCW on non-degraded arrays. Reported-by:
Manibalan P <pmanibalan@amiindia.co.in> Bisected-by:
Jes Sorensen <Jes.Sorensen@redhat.com> Tested-by:
Jes Sorensen <Jes.Sorensen@redhat.com> Signed-off-by:
NeilBrown <neilb@suse.de> Fixes: a7854487 Cc: stable@vger.kernel.org (v3.7+) (cherry picked from commit 26ac1073) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
-
Nicolas Saenz Julienne authored
The gpio_chip operations receive a pointer the gpio_chip struct which is contained in the driver's private struct, yet the container_of call in those functions point to the mfd struct defined in include/linux/mfd/tps65912.h. Cc: Stable <stable@vger.kernel.org> Signed-off-by:
Nicolas Saenz Julienne <nicolassaenzj@gmail.com> Signed-off-by:
Linus Walleij <linus.walleij@linaro.org> (cherry picked from commit 2f97c20e) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
-
Catalin Marinas authored
The native (64-bit) sigval_t union contains sival_int (32-bit) and sival_ptr (64-bit). When a compat application invokes a syscall that takes a sigval_t value (as part of a larger structure, e.g. compat_sys_mq_notify, compat_sys_timer_create), the compat_sigval_t union is converted to the native sigval_t with sival_int overlapping with either the least or the most significant half of sival_ptr, depending on endianness. When the corresponding signal is delivered to a compat application, on big endian the current (compat_uptr_t)sival_ptr cast always returns 0 since sival_int corresponds to the top part of sival_ptr. This patch fixes copy_siginfo_to_user32() so that sival_int is copied to the compat_siginfo_t structure. Cc: <stable@vger.kernel.org> Reported-by:
Bamvor Jian Zhang <bamvor.zhangjian@huawei.com> Tested-by:
Bamvor Jian Zhang <bamvor.zhangjian@huawei.com> Signed-off-by:
Catalin Marinas <catalin.marinas@arm.com> (cherry picked from commit 9d42d48a) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
-
Martin Vajnar authored
Since the removal of CONFIG_REGULATOR_DUMMY option, the touchscreen stopped working. This patch enables the "replacement" for REGULATOR_DUMMY and allows the touchscreen to work even though there is no regulator for "vcc". Cc: stable@vger.kernel.org Signed-off-by:
Martin Vajnar <martin.vajnar@gmail.com> Signed-off-by:
Robert Jarzmik <robert.jarzmik@free.fr> (cherry picked from commit a52d2093) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
-
Jan Kara authored
Check length of extended attributes and allocation descriptors when loading inodes from disk. Otherwise corrupted filesystems could confuse the code and make the kernel oops. Reported-by:
Carl Henrik Lunde <chlunde@ping.uio.no> CC: stable@vger.kernel.org Signed-off-by:
Jan Kara <jack@suse.cz> (cherry picked from commit 23b133bd) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
-
Jay Lan authored
The output of KDB 'summary' command should report MemTotal, MemFree and Buffers output in kB. Current codes report in unit of pages. A define of K(x) as is defined in the code, but not used. This patch would apply the define to convert the values to kB. Please include me on Cc on replies. I do not subscribe to linux-kernel. Signed-off-by:
Jay Lan <jlan@sgi.com> Cc: <stable@vger.kernel.org> Signed-off-by:
Jason Wessel <jason.wessel@windriver.com> (cherry picked from commit 14675592) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
-
Dmitry Eremin-Solenikov authored
Add regulator_has_full_constraints() call to poodle board file to let regulator core know that we do not have any additional regulators left. This lets it substitute unprovided regulators with dummy ones. This fixes the following warnings that can be seen on poodle if regulators are enabled: ads7846 spi1.0: unable to get regulator: -517 spi spi1.0: Driver ads7846 requests probe deferral wm8731 0-001b: Failed to get supply 'AVDD': -517 wm8731 0-001b: Failed to request supplies: -517 wm8731 0-001b: ASoC: failed to probe component -517 Cc: stable@vger.kernel.org Signed-off-by:
Dmitry Eremin-Solenikov <dbaryshkov@gmail.com> Acked-by:
Mark Brown <broonie@kernel.org> Signed-off-by:
Robert Jarzmik <robert.jarzmik@free.fr> (cherry picked from commit 9bc78f32) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
-
Dmitry Eremin-Solenikov authored
Add regulator_has_full_constraints() call to corgi board file to let regulator core know that we do not have any additional regulators left. This lets it substitute unprovided regulators with dummy ones. This fixes the following warnings that can be seen on corgi if regulators are enabled: ads7846 spi1.0: unable to get regulator: -517 spi spi1.0: Driver ads7846 requests probe deferral wm8731 0-001b: Failed to get supply 'AVDD': -517 wm8731 0-001b: Failed to request supplies: -517 wm8731 0-001b: ASoC: failed to probe component -517 corgi-audio corgi-audio: ASoC: failed to instantiate card -517 Cc: stable@vger.kernel.org Signed-off-by:
Dmitry Eremin-Solenikov <dbaryshkov@gmail.com> Acked-by:
Mark Brown <broonie@kernel.org> Signed-off-by:
Robert Jarzmik <robert.jarzmik@free.fr> (cherry picked from commit 271e8017) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
-
Nicolas Pitre authored
The vcs device's poll/fasync support relies on the vt notifier to signal changes to the screen content. Notifier invocations were missing for changes that comes through the selection interface though. Fix that. Tested with BRLTTY 5.2. Signed-off-by:
Nicolas Pitre <nico@linaro.org> Cc: Dave Mielke <dave@mielke.cc> Cc: stable <stable@vger.kernel.org> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org> (cherry picked from commit 19e3ae6b) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
-
Oliver Neukum authored
Check the special CDC headers for a plausible minimum length. Another big operating systems ignores such garbage. Signed-off-by:
Oliver Neukum <oneukum@suse.de> CC: stable@vger.kernel.org Reviewed-by:
Adam Lee <adam8157@gmail.com> Tested-by:
Adam Lee <adam8157@gmail.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org> (cherry picked from commit 7e860a6e) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
-
Alan Stern authored
The USB stack provides a mechanism for drivers to request an asynchronous device reset (usb_queue_reset_device()). The mechanism uses a work item (reset_ws) embedded in the usb_interface structure used by the driver, and the reset is carried out by a work queue routine. The asynchronous reset can race with driver unbinding. When this happens, we try to cancel the queued reset before unbinding the driver, on the theory that the driver won't care about any resets once it is unbound. However, thanks to the fact that lockdep now tracks work queue accesses, this can provoke a lockdep warning in situations where the device reset causes another interface's driver to be unbound; see http://marc.info/?l=linux-usb&m=141893165203776&w=2 for an example. The reason is that the work routine for reset_ws in one interface calls cancel_queued_work() for the reset_ws in another interface. Lockdep thinks this might lead to a work routine trying to cancel itself. The simplest solution is not to cancel queued resets when unbinding drivers. This means we now need to acquire a reference to the usb_interface when queuing a reset_ws work item and to drop the reference when the work routine finishes. We also need to make sure that the usb_interface structure doesn't outlive its parent usb_device; this means acquiring and dropping a reference when the interface is created and destroyed. In addition, cancelling a queued reset can fail (if the device is in the middle of an earlier reset), and this can cause usb_reset_device() to try to rebind an interface that has been deallocated (see http://marc.info/?l=linux-usb&m=142175717016628&w=2 for details). Acquiring the extra references prevents this failure. Signed-off-by:
Alan Stern <stern@rowland.harvard.edu> Reported-by:
Russell King - ARM Linux <linux@arm.linux.org.uk> Reported-by:
Olivier Sobrie <olivier@sobrie.be> Tested-by:
Olivier Sobrie <olivier@sobrie.be> Cc: stable <stable@vger.kernel.org> # 3.19 Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org> (cherry picked from commit 524134d4) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
-
Sebastian Andrzej Siewior authored
the following error pops up during "testusb -a -t 10" | musb-hdrc musb-hdrc.1.auto: dma_pool_free buffer-128, f134e000/be842000 (bad dma) hcd_buffer_create() creates a few buffers, the smallest has 32 bytes of size. ARCH_KMALLOC_MINALIGN is set to 64 bytes. This combo results in hcd_buffer_alloc() returning memory which is 32 bytes aligned and it might by identified by buffer_offset() as another buffer. This means the buffer which is on a 32 byte boundary will not get freed, instead it tries to free another buffer with the error message. This patch fixes the issue by creating the smallest DMA buffer with the size of ARCH_KMALLOC_MINALIGN (or 32 in case ARCH_KMALLOC_MINALIGN is smaller). This might be 32, 64 or even 128 bytes. The next three pools will have the size 128, 512 and 2048. In case the smallest pool is 128 bytes then we have only three pools instead of four (and zero the first entry in the array). The last pool size is always 2048 bytes which is the assumed PAGE_SIZE / 2 of 4096. I doubt it makes sense to continue using PAGE_SIZE / 2 where we would end up with 8KiB buffer in case we have 16KiB pages. Instead I think it makes sense to have a common size(s) and extend them if there is need to. There is a BUILD_BUG_ON() now in case someone has a minalign of more than 128 bytes. Cc: stable@vger.kernel.org Signed-off-by:
Sebastian Andrzej Siewior <bigeasy@linutronix.de> Acked-by:
Alan Stern <stern@rowland.harvard.edu> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org> (cherry picked from commit 5efd2ea8) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
-
Alan Stern authored
The usb_hcd_unlink_urb() routine in hcd.c contains two possible use-after-free errors. The dev_dbg() statement at the end of the routine dereferences urb and urb->dev even though both structures may have been deallocated. This patch fixes the problem by storing urb->dev in a local variable (avoiding the dereference of urb) and moving the dev_dbg() up before the usb_put_dev() call. Signed-off-by:
Alan Stern <stern@rowland.harvard.edu> Reported-by:
Joe Lawrence <joe.lawrence@stratus.com> Tested-by:
Joe Lawrence <joe.lawrence@stratus.com> CC: <stable@vger.kernel.org> Signed-off-by:
Greg Kroah-Hartman <greg@kroah.com> (cherry picked from commit c9919790) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
-
Lennart Sorensen authored
Added the USB serial console device ID for Siemens Ruggedcom devices which have a USB port for their serial console. Signed-off-by:
Len Sorensen <lsorense@csclub.uwaterloo.ca> Cc: stable <stable@vger.kernel.org> Signed-off-by:
Johan Hovold <johan@kernel.org> (cherry picked from commit a6f03312) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
-
Peter Hurley authored
Commit 26df6d13 ("tty: Add EXTPROC support for LINEMODE") allows a process which has opened a pty master to send _any_ signal to the process group of the pty slave. Although potentially exploitable by a malicious program running a setuid program on a pty slave, it's unknown if this exploit currently exists. Limit to signals actually used. Cc: Theodore Ts'o <tytso@mit.edu> Cc: Howard Chu <hyc@symas.com> Cc: One Thousand Gnomes <gnomes@lxorguk.ukuu.org.uk> Cc: Jiri Slaby <jslaby@suse.cz> Cc: <stable@vger.kernel.org> # 2.6.36+ Signed-off-by:
Peter Hurley <peter@hurleysoftware.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org> (cherry picked from commit 37480a05) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
-
Matthew Wilcox authored
The 'pfn' returned by axonram was completely bogus, and has been since 2008. Signed-off-by:
Matthew Wilcox <matthew.r.wilcox@intel.com> Reviewed-by:
Jan Kara <jack@suse.cz> Reviewed-by:
Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: stable@vger.kernel.org Signed-off-by:
Jens Axboe <axboe@fb.com> (cherry picked from commit 91117a20) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
-
Paul Moore authored
Using the IPCB() macro to get the IPv4 options is convenient, but unfortunately NetLabel often needs to examine the CIPSO option outside of the scope of the IP layer in the stack. While historically IPCB() worked above the IP layer, due to the inclusion of the inet_skb_param struct at the head of the {tcp,udp}_skb_cb structs, recent commit 971f10ec ("tcp: better TCP_SKB_CB layout to reduce cache line misses") reordered the tcp_skb_cb struct and invalidated this IPCB() trick. This patch fixes the problem by creating a new function, cipso_v4_optptr(), which locates the CIPSO option inside the IP header without calling IPCB(). Unfortunately, this isn't as fast as a simple lookup so some additional tweaks were made to limit the use of this new function. Cc: <stable@vger.kernel.org> # 3.18 Reported-by:
Casey Schaufler <casey@schaufler-ca.com> Signed-off-by:
Paul Moore <pmoore@redhat.com> Tested-by:
Casey Schaufler <casey@schaufler-ca.com> (cherry picked from commit 04f81f01) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
-
Jeff Moyer authored
Hi, If you can manage to submit an async write as the first async I/O from the context of a process with realtime scheduling priority, then a cfq_queue is allocated, but filed into the wrong async_cfqq bucket. It ends up in the best effort array, but actually has realtime I/O scheduling priority set in cfqq->ioprio. The reason is that cfq_get_queue assumes the default scheduling class and priority when there is no information present (i.e. when the async cfqq is created): static struct cfq_queue * cfq_get_queue(struct cfq_data *cfqd, bool is_sync, struct cfq_io_cq *cic, struct bio *bio, gfp_t gfp_mask) { const int ioprio_class = IOPRIO_PRIO_CLASS(cic->ioprio); const int ioprio = IOPRIO_PRIO_DATA(cic->ioprio); cic->ioprio starts out as 0, which is "invalid". So, class of 0 (IOPRIO_CLASS_NONE) is passed to cfq_async_queue_prio like so: async_cfqq = cfq_async_queue_prio(cfqd, ioprio_class, ioprio); static struct cfq_queue ** cfq_async_queue_prio(struct cfq_data *cfqd, int ioprio_class, int ioprio) { switch (ioprio_class) { case IOPRIO_CLASS_RT: return &cfqd->async_cfqq[0][ioprio]; case IOPRIO_CLASS_NONE: ioprio = IOPRIO_NORM; /* fall through */ case IOPRIO_CLASS_BE: return &cfqd->async_cfqq[1][ioprio]; case IOPRIO_CLASS_IDLE: return &cfqd->async_idle_cfqq; default: BUG(); } } Here, instead of returning a class mapped from the process' scheduling priority, we get back the bucket associated with IOPRIO_CLASS_BE. Now, there is no queue allocated there yet, so we create it: cfqq = cfq_find_alloc_queue(cfqd, is_sync, cic, bio, gfp_mask); That function ends up doing this: cfq_init_cfqq(cfqd, cfqq, current->pid, is_sync); cfq_init_prio_data(cfqq, cic); cfq_init_cfqq marks the priority as having changed. Then, cfq_init_prio data does this: ioprio_class = IOPRIO_PRIO_CLASS(cic->ioprio); switch (ioprio_class) { default: printk(KERN_ERR "cfq: bad prio %x\n", ioprio_class); case IOPRIO_CLASS_NONE: /* * no prio set, inherit CPU scheduling settings */ cfqq->ioprio = task_nice_ioprio(tsk); cfqq->ioprio_class = task_nice_ioclass(tsk); break; So we basically have two code paths that treat IOPRIO_CLASS_NONE differently, which results in an RT async cfqq filed into a best effort bucket. Attached is a patch which fixes the problem. I'm not sure how to make it cleaner. Suggestions would be welcome. Signed-off-by:
Jeff Moyer <jmoyer@redhat.com> Tested-by:
Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com> Cc: stable@kernel.org Signed-off-by:
Jens Axboe <axboe@fb.com> (cherry picked from commit c6ce1943) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
-