- 30 Oct, 2015 11 commits
-
-
Mikulas Patocka authored
commit 042745ee upstream. Commit 3a0f9aae ("dm raid: round region_size to power of two") intended to make sure that the default region size is a power of two. However, the logic in that commit is incorrect and sets the variable region_size to 0 or 1, depending on whether min_region_size is a power of two. Fix this logic, using roundup_pow_of_two(), so that region_size is properly rounded up to the next power of two. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Fixes: 3a0f9aae ("dm raid: round region_size to power of two") Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
-
Yitian Bu authored
commit 4873867e upstream. from Designware I2S datasheet, tx/rx XRUN irq is cleared by reading register TOR/ROR, rather than by writing into them. Signed-off-by: Yitian Bu <yitian.bu@tangramtek.com> Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
-
Ben Dooks authored
commit 19e79687 upstream. On the OMAP AM3517 platform the uart4_ick gets registered twice, causing any power management to /dev/ttyO3 to fail when trying to wake the device up. This solves the following oops: [] Unhandled fault: external abort on non-linefetch (0x1028) at 0xfa09e008 [] PC is at serial_omap_pm+0x48/0x15c [] LR is at _raw_spin_unlock_irqrestore+0x30/0x5c Fixes: aafd900c ("CLK: TI: add omap3 clock init file") Cc: mturquette@baylibre.com Cc: sboyd@codeaurora.org Cc: linux-clk@vger.kernel.org Cc: linux-omap@vger.kernel.org Cc: linux-kernel@lists.codethink.co.uk Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk> Signed-off-by: Tero Kristo <t-kristo@ti.com> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
-
Steve French authored
commit 646200a0 upstream. The error paths in set_file_size for cifs and smb3 are incorrect. In the unlikely event that a server did not support set file info of the file size, the code incorrectly falls back to trying SMBWriteX (note that only the original core SMB Write, used for example by DOS, can set the file size this way - this actually does not work for the more recent SMBWriteX). The idea was since the old DOS SMB Write could set the file size if you write zero bytes at that offset then use that if server rejects the normal set file info call. Fortunately the SMBWriteX will never be sent on the wire (except when file size is zero) since the length and offset fields were reversed in the two places in this function that call SMBWriteX causing the fall back path to return an error. It is also important to never call an SMB request from an SMB2/sMB3 session (which theoretically would be possible, and can cause a brief session drop, although the client recovers) so this should be fixed. In practice this path does not happen with modern servers but the error fall back to SMBWriteX is clearly wrong. Removing the calls to SMBWriteX in the error paths in cifs_set_file_size Pointed out by PaX/grsecurity team Signed-off-by: Steve French <steve.french@primarydata.com> Reported-by: PaX Team <pageexec@freemail.hu> CC: Emese Revfy <re.emese@gmail.com> CC: Brad Spengler <spender@grsecurity.net> [ luis: backported to 3.16: adjusted context ] Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
-
Junichi Nomura authored
commit 2a708cff upstream. __dm_destroy() takes io_barrier SRCU lock (dm_get_live_table) and suspend_lock in reverse order. Doing so can cause AB-BA deadlock: __dm_destroy dm_swap_table --------------------------------------------------- mutex_lock(suspend_lock) dm_get_live_table() srcu_read_lock(io_barrier) dm_sync_table() synchronize_srcu(io_barrier) .. waiting for dm_put_live_table() mutex_lock(suspend_lock) .. waiting for suspend_lock Fix this by taking the locks in proper order. Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com> Fixes: ab7c7bb6 ("dm: hold suspend_lock while suspending device during device deletion") Acked-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> [ luis: backported to 3.16: adjusted context ] Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
-
Steve Wise authored
commit c91aed98 upstream. The server rdma_read_chunk_lcl() and rdma_read_chunk_frmr() functions were not taking into account the initial page_offset when determining the rdma read length. This resulted in a read who's starting address and length exceeded the base/bounds of the frmr. The server gets an async error from the rdma device and kills the connection, and the client then reconnects and resends. This repeats indefinitely, and the application hangs. Most work loads don't tickle this bug apparently, but one test hit it every time: building the linux kernel on a 16 core node with 'make -j 16 O=/mnt/0' where /mnt/0 is a ramdisk mounted via NFSRDMA. This bug seems to only be tripped with devices having small fastreg page list depths. I didn't see it with mlx4, for instance. Fixes: 0bf48289 ('svcrdma: refactor marshalling logic') Signed-off-by: Steve Wise <swise@opengridcomputing.com> Tested-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com> [ luis: backported to 3.16: adjusted context ] Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
-
Christian Borntraeger authored
commit adc0b7fb upstream. my gcc 5.1 used an ldgr instruction with a register != 0,2,4,6 for spilling/filling into a floating point register in our decompressor. This will cause an AFP-register data exception as the decompressor did not setup the additional floating point registers via cr0. That causes a program check loop that looked like a hang with one "Uncompressing Linux... " message (directly booted via kvm) or a loop of "Uncompressing Linux... " messages (when booted via zipl boot loader). The offending code in my build was 48e400: e3 c0 af ff ff 71 lay %r12,-1(%r10) -->48e406: b3 c1 00 1c ldgr %f1,%r12 48e40a: ec 6c 01 22 02 7f clij %r6,2,12,0x48e64e but gcc could do spilling into an fpr at any function. We can simply disable floating point support at that early stage. Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
-
Vitaly Kuznetsov authored
commit 0b34a166 upstream. Currently there is a number of issues preventing PVHVM Xen guests from doing successful kexec/kdump: - Bound event channels. - Registered vcpu_info. - PIRQ/emuirq mappings. - shared_info frame after XENMAPSPACE_shared_info operation. - Active grant mappings. Basically, newly booted kernel stumbles upon already set up Xen interfaces and there is no way to reestablish them. In Xen-4.7 a new feature called 'soft reset' is coming. A guest performing kexec/kdump operation is supposed to call SCHEDOP_shutdown hypercall with SHUTDOWN_soft_reset reason before jumping to new kernel. Hypervisor (with some help from toolstack) will do full domain cleanup (but keeping its memory and vCPU contexts intact) returning the guest to the state it had when it was first booted and thus allowing it to start over. Doing SHUTDOWN_soft_reset on Xen hypervisors which don't support it is probably OK as by default all unknown shutdown reasons cause domain destroy with a message in toolstack log: 'Unknown shutdown reason code 5. Destroying domain.' which gives a clue to what the problem is and eliminates false expectations. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com> [ luis: backported to 3.16: - use CONFIG_KEXEC instead of CONFIG_KEXEC_CORE, as suggested by David Vrabel ] Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
-
Mark Brown authored
commit 176fc2d5 upstream. The in kernel snprintf() will conveniently return the actual length of the printed string even if not given an output beffer at all so just do that rather than relying on the user to pass in a suitable buffer, ensuring that we don't need to worry if the buffer was truncated due to the size of the buffer passed in. Reported-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
-
Mark Brown authored
commit b763ec17 upstream. If a read is attempted which is smaller than the line length then we may underflow the subtraction we're doing with the unsigned size_t type so move some of the calculation to be additions on the right hand side instead in order to avoid this. Reported-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
-
Luis Henriques authored
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
-
- 28 Oct, 2015 29 commits
-
-
Filipe Manana authored
commit 808f80b4 upstream. My previous fix in commit 005efedf ("Btrfs: fix read corruption of compressed and shared extents") was effective only if the compressed extents cover a file range with a length that is not a multiple of 16 pages. That's because the detection of when we reached a different range of the file that shares the same compressed extent as the previously processed range was done at extent_io.c:__do_contiguous_readpages(), which covers subranges with a length up to 16 pages, because extent_readpages() groups the pages in clusters no larger than 16 pages. So fix this by tracking the start of the previously processed file range's extent map at extent_readpages(). The following test case for fstests reproduces the issue: seq=`basename $0` seqres=$RESULT_DIR/$seq echo "QA output created by $seq" tmp=/tmp/$$ status=1 # failure is the default! trap "_cleanup; exit \$status" 0 1 2 3 15 _cleanup() { rm -f $tmp.* } # get standard environment, filters and checks . ./common/rc . ./common/filter # real QA test starts here _need_to_be_root _supported_fs btrfs _supported_os Linux _require_scratch _require_cloner rm -f $seqres.full test_clone_and_read_compressed_extent() { local mount_opts=$1 _scratch_mkfs >>$seqres.full 2>&1 _scratch_mount $mount_opts # Create our test file with a single extent of 64Kb that is going to # be compressed no matter which compression algo is used (zlib/lzo). $XFS_IO_PROG -f -c "pwrite -S 0xaa 0K 64K" \ $SCRATCH_MNT/foo | _filter_xfs_io # Now clone the compressed extent into an adjacent file offset. $CLONER_PROG -s 0 -d $((64 * 1024)) -l $((64 * 1024)) \ $SCRATCH_MNT/foo $SCRATCH_MNT/foo echo "File digest before unmount:" md5sum $SCRATCH_MNT/foo | _filter_scratch # Remount the fs or clear the page cache to trigger the bug in # btrfs. Because the extent has an uncompressed length that is a # multiple of 16 pages, all the pages belonging to the second range # of the file (64K to 128K), which points to the same extent as the # first range (0K to 64K), had their contents full of zeroes instead # of the byte 0xaa. This was a bug exclusively in the read path of # compressed extents, the correct data was stored on disk, btrfs # just failed to fill in the pages correctly. _scratch_remount echo "File digest after remount:" # Must match the digest we got before. md5sum $SCRATCH_MNT/foo | _filter_scratch } echo -e "\nTesting with zlib compression..." test_clone_and_read_compressed_extent "-o compress=zlib" _scratch_unmount echo -e "\nTesting with lzo compression..." test_clone_and_read_compressed_extent "-o compress=lzo" status=0 exit Signed-off-by: Filipe Manana <fdmanana@suse.com> Tested-by: Timofey Titovets <nefelim4ag@gmail.com> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
-
David Howells authored
commit 911b79cd upstream. If request_key() is used to find a keyring, only do the search part - don't do the construction part if the keyring was not found by the search. We don't really want keyrings in the negative instantiated state since the rejected/negative instantiation error value in the payload is unioned with keyring metadata. Now the kernel gives an error: request_key("keyring", "#selinux,bdekeyring", "keyring", KEY_SPEC_USER_SESSION_KEYRING) = -1 EPERM (Operation not permitted) Signed-off-by: David Howells <dhowells@redhat.com> Cc: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
-
David Howells authored
commit f05819df upstream. The following sequence of commands: i=`keyctl add user a a @s` keyctl request2 keyring foo bar @t keyctl unlink $i @s tries to invoke an upcall to instantiate a keyring if one doesn't already exist by that name within the user's keyring set. However, if the upcall fails, the code sets keyring->type_data.reject_error to -ENOKEY or some other error code. When the key is garbage collected, the key destroy function is called unconditionally and keyring_destroy() uses list_empty() on keyring->type_data.link - which is in a union with reject_error. Subsequently, the kernel tries to unlink the keyring from the keyring names list - which oopses like this: BUG: unable to handle kernel paging request at 00000000ffffff8a IP: [<ffffffff8126e051>] keyring_destroy+0x3d/0x88 ... Workqueue: events key_garbage_collector ... RIP: 0010:[<ffffffff8126e051>] keyring_destroy+0x3d/0x88 RSP: 0018:ffff88003e2f3d30 EFLAGS: 00010203 RAX: 00000000ffffff82 RBX: ffff88003bf1a900 RCX: 0000000000000000 RDX: 0000000000000000 RSI: 000000003bfc6901 RDI: ffffffff81a73a40 RBP: ffff88003e2f3d38 R08: 0000000000000152 R09: 0000000000000000 R10: ffff88003e2f3c18 R11: 000000000000865b R12: ffff88003bf1a900 R13: 0000000000000000 R14: ffff88003bf1a908 R15: ffff88003e2f4000 ... CR2: 00000000ffffff8a CR3: 000000003e3ec000 CR4: 00000000000006f0 ... Call Trace: [<ffffffff8126c756>] key_gc_unused_keys.constprop.1+0x5d/0x10f [<ffffffff8126ca71>] key_garbage_collector+0x1fa/0x351 [<ffffffff8105ec9b>] process_one_work+0x28e/0x547 [<ffffffff8105fd17>] worker_thread+0x26e/0x361 [<ffffffff8105faa9>] ? rescuer_thread+0x2a8/0x2a8 [<ffffffff810648ad>] kthread+0xf3/0xfb [<ffffffff810647ba>] ? kthread_create_on_node+0x1c2/0x1c2 [<ffffffff815f2ccf>] ret_from_fork+0x3f/0x70 [<ffffffff810647ba>] ? kthread_create_on_node+0x1c2/0x1c2 Note the value in RAX. This is a 32-bit representation of -ENOKEY. The solution is to only call ->destroy() if the key was successfully instantiated. Reported-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: David Howells <dhowells@redhat.com> Tested-by: Dmitry Vyukov <dvyukov@google.com> Cc: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
-
David Howells authored
commit 94c4554b upstream. There appears to be a race between: (1) key_gc_unused_keys() which frees key->security and then calls keyring_destroy() to unlink the name from the name list (2) find_keyring_by_name() which calls key_permission(), thus accessing key->security, on a key before checking to see whether the key usage is 0 (ie. the key is dead and might be cleaned up). Fix this by calling ->destroy() before cleaning up the core key data - including key->security. Reported-by: Petr Matousek <pmatouse@redhat.com> Signed-off-by: David Howells <dhowells@redhat.com> Cc: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
-
Sabrina Dubroca authored
Without this length argument, we can read past the end of the iovec in memcpy_toiovec because we have no way of knowing the total length of the iovec's buffers. This is needed for stable kernels where 89c22d8c ("net: Fix skb csum races when peeking") has been backported but that don't have the ioviter conversion, which is almost all the stable trees <= 3.18. This also fixes a kernel crash for NFS servers when the client uses -onfsvers=3,proto=udp to mount the export. Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Reviewed-by: Hannes Frederic Sowa <hannes@stressinduktion.org> [ luis: backported to 3.16: - dropped changes to net/rxrpc/ar-recvmsg.c ] Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
-
Arad, Ronen authored
commit db65a3aa upstream. netlink_dump() allocates skb based on the calculated min_dump_alloc or a per socket max_recvmsg_len. min_alloc_size is maximum space required for any single netdev attributes as calculated by rtnl_calcit(). max_recvmsg_len tracks the user provided buffer to netlink_recvmsg. It is capped at 16KiB. The intention is to avoid small allocations and to minimize the number of calls required to obtain dump information for all net devices. netlink_dump packs as many small messages as could fit within an skb that was sized for the largest single netdev information. The actual space available within an skb is larger than what is requested. It could be much larger and up to near 2x with align to next power of 2 approach. Allowing netlink_dump to use all the space available within the allocated skb increases the buffer size a user has to provide to avoid truncaion (i.e. MSG_TRUNG flag set). It was observed that with many VLANs configured on at least one netdev, a larger buffer of near 64KiB was necessary to avoid "Message truncated" error in "ip link" or "bridge [-c[ompressvlans]] vlan show" when min_alloc_size was only little over 32KiB. This patch trims skb to allocated size in order to allow the user to avoid truncation with more reasonable buffer size. Signed-off-by: Ronen Arad <ronen.arad@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
-
Konstantin Khlebnikov authored
commit 598c12d0 upstream. When openvswitch tries allocate memory from offline numa node 0: stats = kmem_cache_alloc_node(flow_stats_cache, GFP_KERNEL | __GFP_ZERO, 0) It catches VM_BUG_ON(nid < 0 || nid >= MAX_NUMNODES || !node_online(nid)) [ replaced with VM_WARN_ON(!node_online(nid)) recently ] in linux/gfp.h This patch disables numa affinity in this case. Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Acked-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
-
Joe Perches authored
commit 077cb37f upstream. It seems that kernel memory can leak into userspace by a kmalloc, ethtool_get_strings, then copy_to_user sequence. Avoid this by using kcalloc to zero fill the copied buffer. Signed-off-by: Joe Perches <joe@perches.com> Acked-by: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
-
Guillaume Nault authored
commit e6740165 upstream. Since commit 2b018d57 ("pppoe: drop PPPOX_ZOMBIEs in pppoe_release"), pppoe_release() calls dev_put(po->pppoe_dev) if sk is in the PPPOX_ZOMBIE state. But pppoe_flush_dev() can set sk->sk_state to PPPOX_ZOMBIE _and_ reset po->pppoe_dev to NULL. This leads to the following oops: [ 570.140800] BUG: unable to handle kernel NULL pointer dereference at 00000000000004e0 [ 570.142931] IP: [<ffffffffa018c701>] pppoe_release+0x50/0x101 [pppoe] [ 570.144601] PGD 3d119067 PUD 3dbc1067 PMD 0 [ 570.144601] Oops: 0000 [#1] SMP [ 570.144601] Modules linked in: l2tp_ppp l2tp_netlink l2tp_core ip6_udp_tunnel udp_tunnel pppoe pppox ppp_generic slhc loop crc32c_intel ghash_clmulni_intel jitterentropy_rng sha256_generic hmac drbg ansi_cprng aesni_intel aes_x86_64 ablk_helper cryptd lrw gf128mul glue_helper acpi_cpufreq evdev serio_raw processor button ext4 crc16 mbcache jbd2 virtio_net virtio_blk virtio_pci virtio_ring virtio [ 570.144601] CPU: 1 PID: 15738 Comm: ppp-apitest Not tainted 4.2.0 #1 [ 570.144601] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Debian-1.8.2-1 04/01/2014 [ 570.144601] task: ffff88003d30d600 ti: ffff880036b60000 task.ti: ffff880036b60000 [ 570.144601] RIP: 0010:[<ffffffffa018c701>] [<ffffffffa018c701>] pppoe_release+0x50/0x101 [pppoe] [ 570.144601] RSP: 0018:ffff880036b63e08 EFLAGS: 00010202 [ 570.144601] RAX: 0000000000000000 RBX: ffff880034340000 RCX: 0000000000000206 [ 570.144601] RDX: 0000000000000006 RSI: ffff88003d30dd20 RDI: ffff88003d30dd20 [ 570.144601] RBP: ffff880036b63e28 R08: 0000000000000001 R09: 0000000000000000 [ 570.144601] R10: 00007ffee9b50420 R11: ffff880034340078 R12: ffff8800387ec780 [ 570.144601] R13: ffff8800387ec7b0 R14: ffff88003e222aa0 R15: ffff8800387ec7b0 [ 570.144601] FS: 00007f5672f48700(0000) GS:ffff88003fc80000(0000) knlGS:0000000000000000 [ 570.144601] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 570.144601] CR2: 00000000000004e0 CR3: 0000000037f7e000 CR4: 00000000000406a0 [ 570.144601] Stack: [ 570.144601] ffffffffa018f240 ffff8800387ec780 ffffffffa018f240 ffff8800387ec7b0 [ 570.144601] ffff880036b63e48 ffffffff812caabe ffff880039e4e000 0000000000000008 [ 570.144601] ffff880036b63e58 ffffffff812cabad ffff880036b63ea8 ffffffff811347f5 [ 570.144601] Call Trace: [ 570.144601] [<ffffffff812caabe>] sock_release+0x1a/0x75 [ 570.144601] [<ffffffff812cabad>] sock_close+0xd/0x11 [ 570.144601] [<ffffffff811347f5>] __fput+0xff/0x1a5 [ 570.144601] [<ffffffff811348cb>] ____fput+0x9/0xb [ 570.144601] [<ffffffff81056682>] task_work_run+0x66/0x90 [ 570.144601] [<ffffffff8100189e>] prepare_exit_to_usermode+0x8c/0xa7 [ 570.144601] [<ffffffff81001a26>] syscall_return_slowpath+0x16d/0x19b [ 570.144601] [<ffffffff813babb1>] int_ret_from_sys_call+0x25/0x9f [ 570.144601] Code: 48 8b 83 c8 01 00 00 a8 01 74 12 48 89 df e8 8b 27 14 e1 b8 f7 ff ff ff e9 b7 00 00 00 8a 43 12 a8 0b 74 1c 48 8b 83 a8 04 00 00 <48> 8b 80 e0 04 00 00 65 ff 08 48 c7 83 a8 04 00 00 00 00 00 00 [ 570.144601] RIP [<ffffffffa018c701>] pppoe_release+0x50/0x101 [pppoe] [ 570.144601] RSP <ffff880036b63e08> [ 570.144601] CR2: 00000000000004e0 [ 570.200518] ---[ end trace 46956baf17349563 ]--- pppoe_flush_dev() has no reason to override sk->sk_state with PPPOX_ZOMBIE. pppox_unbind_sock() already sets sk->sk_state to PPPOX_DEAD, which is the correct state given that sk is unbound and po->pppoe_dev is NULL. Fixes: 2b018d57 ("pppoe: drop PPPOX_ZOMBIEs in pppoe_release") Tested-by: Oleksii Berezhniak <core@irc.lg.ua> Signed-off-by: Guillaume Nault <g.nault@alphalink.fr> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
-
Eric Dumazet authored
commit c7c49b8f upstream. Greg reported crashes hitting the following check in __sk_backlog_rcv() BUG_ON(!sock_flag(sk, SOCK_MEMALLOC)); The pfmemalloc bit is currently checked in sk_filter(). This works correctly for TCP, because sk_filter() is ran in tcp_v[46]_rcv() before hitting the prequeue or backlog checks. For UDP or other protocols, this does not work, because the sk_filter() is ran from sock_queue_rcv_skb(), which might be called _after_ backlog queuing if socket is owned by user by the time packet is processed by softirq handler. Fixes: b4b9e355 ("netvm: set PF_MEMALLOC as appropriate during SKB processing") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Greg Thelen <gthelen@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
-
Pravin B Shelar authored
commit 31b33dfb upstream. Earlier patch 6ae459bd tried to detect void ckecksum partial skb by comparing pull length to checksum offset. But it does not work for all cases since checksum-offset depends on updates to skb->data. Following patch fixes it by validating checksum start offset after skb-data pointer is updated. Negative value of checksum offset start means there is no need to checksum. Fixes: 6ae459bd ("skbuff: Fix skb checksum flag on skb pull") Reported-by: Andrew Vagin <avagin@odin.com> Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
-
Pravin B Shelar authored
commit 6ae459bd upstream. VXLAN device can receive skb with checksum partial. But the checksum offset could be in outer header which is pulled on receive. This results in negative checksum offset for the skb. Such skb can cause the assert failure in skb_checksum_help(). Following patch fixes the bug by setting checksum-none while pulling outer header. Following is the kernel panic msg from old kernel hitting the bug. ------------[ cut here ]------------ kernel BUG at net/core/dev.c:1906! RIP: 0010:[<ffffffff81518034>] skb_checksum_help+0x144/0x150 Call Trace: <IRQ> [<ffffffffa0164c28>] queue_userspace_packet+0x408/0x470 [openvswitch] [<ffffffffa016614d>] ovs_dp_upcall+0x5d/0x60 [openvswitch] [<ffffffffa0166236>] ovs_dp_process_packet_with_key+0xe6/0x100 [openvswitch] [<ffffffffa016629b>] ovs_dp_process_received_packet+0x4b/0x80 [openvswitch] [<ffffffffa016c51a>] ovs_vport_receive+0x2a/0x30 [openvswitch] [<ffffffffa0171383>] vxlan_rcv+0x53/0x60 [openvswitch] [<ffffffffa01734cb>] vxlan_udp_encap_recv+0x8b/0xf0 [openvswitch] [<ffffffff8157addc>] udp_queue_rcv_skb+0x2dc/0x3b0 [<ffffffff8157b56f>] __udp4_lib_rcv+0x1cf/0x6c0 [<ffffffff8157ba7a>] udp_rcv+0x1a/0x20 [<ffffffff8154fdbd>] ip_local_deliver_finish+0xdd/0x280 [<ffffffff81550128>] ip_local_deliver+0x88/0x90 [<ffffffff8154fa7d>] ip_rcv_finish+0x10d/0x370 [<ffffffff81550365>] ip_rcv+0x235/0x300 [<ffffffff8151ba1d>] __netif_receive_skb+0x55d/0x620 [<ffffffff8151c360>] netif_receive_skb+0x80/0x90 [<ffffffff81459935>] virtnet_poll+0x555/0x6f0 [<ffffffff8151cd04>] net_rx_action+0x134/0x290 [<ffffffff810683d8>] __do_softirq+0xa8/0x210 [<ffffffff8162fe6c>] call_softirq+0x1c/0x30 [<ffffffff810161a5>] do_softirq+0x65/0xa0 [<ffffffff810687be>] irq_exit+0x8e/0xb0 [<ffffffff81630733>] do_IRQ+0x63/0xe0 [<ffffffff81625f2e>] common_interrupt+0x6e/0x6e Reported-by: Anupam Chanda <achanda@vmware.com> Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Acked-by: Tom Herbert <tom@herbertland.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
-
Andrey Vagin authored
commit e9193d60 upstream. Now send with MSG_PEEK can return data from multiple SKBs. Unfortunately we take into account the peek offset for each skb, that is wrong. We need to apply the peek offset only once. In addition, the peek offset should be used only if MSG_PEEK is set. Cc: "David S. Miller" <davem@davemloft.net> (maintainer:NETWORKING Cc: Eric Dumazet <edumazet@google.com> (commit_signer:1/14=7%) Cc: Aaron Conole <aconole@bytheb.org> Fixes: 9f389e35 ("af_unix: return data from multiple SKBs on recv() with MSG_PEEK flag") Signed-off-by: Andrey Vagin <avagin@openvz.org> Tested-by: Aaron Conole <aconole@bytheb.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
-
Aaron Conole authored
commit 9f389e35 upstream. AF_UNIX sockets now return multiple skbs from recv() when MSG_PEEK flag is set. This is referenced in kernel bugzilla #12323 @ https://bugzilla.kernel.org/show_bug.cgi?id=12323 As described both in the BZ and lkml thread @ http://lkml.org/lkml/2008/1/8/444 calling recv() with MSG_PEEK on an AF_UNIX socket only reads a single skb, where the desired effect is to return as much skb data has been queued, until hitting the recv buffer size (whichever comes first). The modified MSG_PEEK path will now move to the next skb in the tree and jump to the again: label, rather than following the natural loop structure. This requires duplicating some of the loop head actions. This was tested using the python socketpair python code attached to the bugzilla issue. Signed-off-by: Aaron Conole <aconole@bytheb.org> Signed-off-by: David S. Miller <davem@davemloft.net> [ luis: backported to 3.16: used davem's backport to 3.14 ] Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
-
Aaron Conole authored
commit 4613012d upstream. As suggested by Eric Dumazet this change replaces the complaints by the compiler when misusing the API. Signed-off-by: Aaron Conole <aconole@bytheb.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
-
Alexander Couzens authored
commit 06a15f51 upstream. There is a small chance that tunnel_free() is called before tunnel->del_work scheduled resulting in a zero pointer dereference. Signed-off-by: Alexander Couzens <lynxis@fe80.eu> Acked-by: James Chapman <jchapman@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
-
Nikhil Badola authored
commit f8786a91 upstream. Incoming packets in high speed are randomly corrupted by h/w resulting in multiple errors. This workaround makes FS as default mode in all affected socs by disabling HS chirp signalling.This errata does not affect FS and LS mode. Forces all HS devices to connect in FS mode for all socs affected by this erratum: P3041 and P2041 rev 1.0 and 1.1 P5020 and P5010 rev 1.0 and 2.0 P5040, P1010 and T4240 rev 1.0 Signed-off-by: Ramneek Mehresh <ramneek.mehresh@freescale.com> Signed-off-by: Nikhil Badola <nikhil.badola@freescale.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
-
Nikhil Badola authored
commit 523f1dec upstream. USB controller version-2.5 requires to enable internal UTMI phy and program PTS field in PORTSC register before asserting controller reset. This is must for successful resetting of the controller and subsequent enumeration of usb devices Signed-off-by: Nikhil Badola <nikhil.badola@freescale.com> Signed-off-by: Suresh Gupta <suresh.gupta@freescale.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
-
Manfred Spraul authored
commit e8577d1f upstream. ipc_addid() makes a new ipc identifier visible to everyone. New objects start as locked, so that the caller can complete the initialization after the call. Within struct sem_array, at least sma->sem_base and sma->sem_nsems are accessed without any locks, therefore this approach doesn't work. Thus: Move the ipc_addid() to the end of the initialization. Signed-off-by: Manfred Spraul <manfred@colorfullife.com> Reported-by: Rik van Riel <riel@redhat.com> Acked-by: Rik van Riel <riel@redhat.com> Acked-by: Davidlohr Bueso <dave@stgolabs.net> Acked-by: Rafael Aquini <aquini@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
-
lucien authored
commit f648f807 upstream. Commit f8d96052 ("sctp: Enforce retransmission limit during shutdown") fixed a problem with excessive retransmissions in the SHUTDOWN_PENDING by not resetting the association overall_error_count. This allowed the association to better enforce assoc.max_retrans limit. However, the same issue still exists when the association is in SHUTDOWN_RECEIVED state. In this state, HB-ACKs will continue to reset the overall_error_count for the association would extend the lifetime of association unnecessarily. This patch solves this by resetting the overall_error_count whenever the current state is small then SCTP_STATE_SHUTDOWN_PENDING. As a small side-effect, we end up also handling SCTP_STATE_SHUTDOWN_ACK_SENT and SCTP_STATE_SHUTDOWN_SENT states, but they are not really impacted because we disable Heartbeats in those states. Fixes: Commit f8d96052 ("sctp: Enforce retransmission limit during shutdown") Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Acked-by: Vlad Yasevich <vyasevich@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
-
Jann Horn authored
commit fbb18169 upstream. It was possible for an attacking user to trick root (or another user) into writing his coredumps into an attacker-readable, pre-existing file using rename() or link(), causing the disclosure of secret data from the victim process' virtual memory. Depending on the configuration, it was also possible to trick root into overwriting system files with coredumps. Fix that issue by never writing coredumps into existing files. Requirements for the attack: - The attack only applies if the victim's process has a nonzero RLIMIT_CORE and is dumpable. - The attacker can trick the victim into coredumping into an attacker-writable directory D, either because the core_pattern is relative and the victim's cwd is attacker-writable or because an absolute core_pattern pointing to a world-writable directory is used. - The attacker has one of these: A: on a system with protected_hardlinks=0: execute access to a folder containing a victim-owned, attacker-readable file on the same partition as D, and the victim-owned file will be deleted before the main part of the attack takes place. (In practice, there are lots of files that fulfill this condition, e.g. entries in Debian's /var/lib/dpkg/info/.) This does not apply to most Linux systems because most distros set protected_hardlinks=1. B: on a system with protected_hardlinks=1: execute access to a folder containing a victim-owned, attacker-readable and attacker-writable file on the same partition as D, and the victim-owned file will be deleted before the main part of the attack takes place. (This seems to be uncommon.) C: on any system, independent of protected_hardlinks: write access to a non-sticky folder containing a victim-owned, attacker-readable file on the same partition as D (This seems to be uncommon.) The basic idea is that the attacker moves the victim-owned file to where he expects the victim process to dump its core. The victim process dumps its core into the existing file, and the attacker reads the coredump from it. If the attacker can't move the file because he does not have write access to the containing directory, he can instead link the file to a directory he controls, then wait for the original link to the file to be deleted (because the kernel checks that the link count of the corefile is 1). A less reliable variant that requires D to be non-sticky works with link() and does not require deletion of the original link: link() the file into D, but then unlink() it directly before the kernel performs the link count check. On systems with protected_hardlinks=0, this variant allows an attacker to not only gain information from coredumps, but also clobber existing, victim-writable files with coredumps. (This could theoretically lead to a privilege escalation.) Signed-off-by: Jann Horn <jann@thejh.net> Cc: Kees Cook <keescook@chromium.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
-
Thomas Hellstrom authored
commit ed7d78b2 upstream. The commit "drm/vmwgfx: Fix up user_dmabuf refcounting", while fixing a kernel crash introduced a NULL pointer dereference on older hardware. Fix this. Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com> Reviewed-by: Sinclair Yeh <syeh@vmware.com> Reviewed-by: Brian Paul <brianp@vmware.com> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
-
Joonsoo Kim authored
commit 03a2d2a3 upstream. Commit description is copied from the original post of this bug: http://comments.gmane.org/gmane.linux.kernel.mm/135349 Kernels after v3.9 use kmalloc_size(INDEX_NODE + 1) to get the next larger cache size than the size index INDEX_NODE mapping. In kernels 3.9 and earlier we used malloc_sizes[INDEX_L3 + 1].cs_size. However, sometimes we can't get the right output we expected via kmalloc_size(INDEX_NODE + 1), causing a BUG(). The mapping table in the latest kernel is like: index = {0, 1, 2 , 3, 4, 5, 6, n} size = {0, 96, 192, 8, 16, 32, 64, 2^n} The mapping table before 3.10 is like this: index = {0 , 1 , 2, 3, 4 , 5 , 6, n} size = {32, 64, 96, 128, 192, 256, 512, 2^(n+3)} The problem on my mips64 machine is as follows: (1) When configured DEBUG_SLAB && DEBUG_PAGEALLOC && DEBUG_LOCK_ALLOC && DEBUG_SPINLOCK, the sizeof(struct kmem_cache_node) will be "150", and the macro INDEX_NODE turns out to be "2": #define INDEX_NODE kmalloc_index(sizeof(struct kmem_cache_node)) (2) Then the result of kmalloc_size(INDEX_NODE + 1) is 8. (3) Then "if(size >= kmalloc_size(INDEX_NODE + 1)" will lead to "size = PAGE_SIZE". (4) Then "if ((size >= (PAGE_SIZE >> 3))" test will be satisfied and "flags |= CFLGS_OFF_SLAB" will be covered. (5) if (flags & CFLGS_OFF_SLAB)" test will be satisfied and will go to "cachep->slabp_cache = kmalloc_slab(slab_size, 0u)", and the result here may be NULL while kernel bootup. (6) Finally,"BUG_ON(ZERO_OR_NULL_PTR(cachep->slabp_cache));" causes the BUG info as the following shows (may be only mips64 has this problem): This patch fixes the problem of kmalloc_size(INDEX_NODE + 1) and removes the BUG by adding 'size >= 256' check to guarantee that all necessary small sized slabs are initialized regardless sequence of slab size in mapping table. Fixes: e3366016 ("slab: Use common kmalloc_index/kmalloc_size...") Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Reported-by: Liuhailong <liu.hailong6@zte.com.cn> Acked-by: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
-
Andy Shevchenko authored
commit 6bea0f6d upstream. In case we have less than maximum allowed channels (8) and autoconfiguration is enabled the DWC_PARAMS read is wrong because it uses different arithmetic to what is needed for channel priority setup. Re-do the caclulations properly. This now works on AVR32 board well. Fixes: fed2574b (dw_dmac: introduce software emulation of LLP transfers) Cc: yitian.bu@tangramtek.com Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Vinod Koul <vinod.koul@intel.com> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
-
John Stultz authored
commit 67dfae0c upstream. This patch fixes one cases where abs() was being used with 64-bit nanosecond values, where the result may be capped at 32-bits. This potentially could cause watchdog false negatives on 32-bit systems, so this patch addresses the issue by using abs64(). Signed-off-by: John Stultz <john.stultz@linaro.org> Cc: Prarit Bhargava <prarit@redhat.com> Cc: Richard Cochran <richardcochran@gmail.com> Cc: Ingo Molnar <mingo@kernel.org> Link: http://lkml.kernel.org/r/1442279124-7309-2-git-send-email-john.stultz@linaro.orgSigned-off-by: Thomas Gleixner <tglx@linutronix.de> [ luis: backported to 3.16: adjusted context ] Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
-
Stephen Smalley authored
commit ab76f7b4 upstream. Unused space between the end of __ex_table and the start of rodata can be left W+x in the kernel page tables. Extend the setting of the NX bit to cover this gap by starting from text_end rather than rodata_start. Before: ---[ High Kernel Mapping ]--- 0xffffffff80000000-0xffffffff81000000 16M pmd 0xffffffff81000000-0xffffffff81600000 6M ro PSE GLB x pmd 0xffffffff81600000-0xffffffff81754000 1360K ro GLB x pte 0xffffffff81754000-0xffffffff81800000 688K RW GLB x pte 0xffffffff81800000-0xffffffff81a00000 2M ro PSE GLB NX pmd 0xffffffff81a00000-0xffffffff81b3b000 1260K ro GLB NX pte 0xffffffff81b3b000-0xffffffff82000000 4884K RW GLB NX pte 0xffffffff82000000-0xffffffff82200000 2M RW PSE GLB NX pmd 0xffffffff82200000-0xffffffffa0000000 478M pmd After: ---[ High Kernel Mapping ]--- 0xffffffff80000000-0xffffffff81000000 16M pmd 0xffffffff81000000-0xffffffff81600000 6M ro PSE GLB x pmd 0xffffffff81600000-0xffffffff81754000 1360K ro GLB x pte 0xffffffff81754000-0xffffffff81800000 688K RW GLB NX pte 0xffffffff81800000-0xffffffff81a00000 2M ro PSE GLB NX pmd 0xffffffff81a00000-0xffffffff81b3b000 1260K ro GLB NX pte 0xffffffff81b3b000-0xffffffff82000000 4884K RW GLB NX pte 0xffffffff82000000-0xffffffff82200000 2M RW PSE GLB NX pmd 0xffffffff82200000-0xffffffffa0000000 478M pmd Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov> Acked-by: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Link: http://lkml.kernel.org/r/1443704662-3138-1-git-send-email-sds@tycho.nsa.govSigned-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
-
Mel Gorman authored
commit 2f84a899 upstream. SunDong reported the following on https://bugzilla.kernel.org/show_bug.cgi?id=103841 I think I find a linux bug, I have the test cases is constructed. I can stable recurring problems in fedora22(4.0.4) kernel version, arch for x86_64. I construct transparent huge page, when the parent and child process with MAP_SHARE, MAP_PRIVATE way to access the same huge page area, it has the opportunity to lead to huge page copy on write failure, and then it will munmap the child corresponding mmap area, but then the child mmap area with VM_MAYSHARE attributes, child process munmap this area can trigger VM_BUG_ON in set_vma_resv_flags functions (vma - > vm_flags & VM_MAYSHARE). There were a number of problems with the report (e.g. it's hugetlbfs that triggers this, not transparent huge pages) but it was fundamentally correct in that a VM_BUG_ON in set_vma_resv_flags() can be triggered that looks like this vma ffff8804651fd0d0 start 00007fc474e00000 end 00007fc475e00000 next ffff8804651fd018 prev ffff8804651fd188 mm ffff88046b1b1800 prot 8000000000000027 anon_vma (null) vm_ops ffffffff8182a7a0 pgoff 0 file ffff88106bdb9800 private_data (null) flags: 0x84400fb(read|write|shared|mayread|maywrite|mayexec|mayshare|dontexpand|hugetlb) ------------ kernel BUG at mm/hugetlb.c:462! SMP Modules linked in: xt_pkttype xt_LOG xt_limit [..] CPU: 38 PID: 26839 Comm: map Not tainted 4.0.4-default #1 Hardware name: Dell Inc. PowerEdge R810/0TT6JF, BIOS 2.7.4 04/26/2012 set_vma_resv_flags+0x2d/0x30 The VM_BUG_ON is correct because private and shared mappings have different reservation accounting but the warning clearly shows that the VMA is shared. When a private COW fails to allocate a new page then only the process that created the VMA gets the page -- all the children unmap the page. If the children access that data in the future then they get killed. The problem is that the same file is mapped shared and private. During the COW, the allocation fails, the VMAs are traversed to unmap the other private pages but a shared VMA is found and the bug is triggered. This patch identifies such VMAs and skips them. Signed-off-by: Mel Gorman <mgorman@techsingularity.net> Reported-by: SunDong <sund_sky@126.com> Reviewed-by: Michal Hocko <mhocko@suse.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: David Rientjes <rientjes@google.com> Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
-
Dirk Müller authored
commit d2922422 upstream. The cpu feature flags are not ever going to change, so warning everytime can cause a lot of kernel log spam (in our case more than 10GB/hour). The warning seems to only occur when nested virtualization is enabled, so it's probably triggered by a KVM bug. This is a sensible and safe change anyway, and the KVM bug fix might not be suitable for stable releases anyway. Signed-off-by: Dirk Mueller <dmueller@suse.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
-
Matt Fleming authored
commit a5caa209 upstream. Beginning with UEFI v2.5 EFI_PROPERTIES_TABLE was introduced that signals that the firmware PE/COFF loader supports splitting code and data sections of PE/COFF images into separate EFI memory map entries. This allows the kernel to map those regions with strict memory protections, e.g. EFI_MEMORY_RO for code, EFI_MEMORY_XP for data, etc. Unfortunately, an unwritten requirement of this new feature is that the regions need to be mapped with the same offsets relative to each other as observed in the EFI memory map. If this is not done crashes like this may occur, BUG: unable to handle kernel paging request at fffffffefe6086dd IP: [<fffffffefe6086dd>] 0xfffffffefe6086dd Call Trace: [<ffffffff8104c90e>] efi_call+0x7e/0x100 [<ffffffff81602091>] ? virt_efi_set_variable+0x61/0x90 [<ffffffff8104c583>] efi_delete_dummy_variable+0x63/0x70 [<ffffffff81f4e4aa>] efi_enter_virtual_mode+0x383/0x392 [<ffffffff81f37e1b>] start_kernel+0x38a/0x417 [<ffffffff81f37495>] x86_64_start_reservations+0x2a/0x2c [<ffffffff81f37582>] x86_64_start_kernel+0xeb/0xef Here 0xfffffffefe6086dd refers to an address the firmware expects to be mapped but which the OS never claimed was mapped. The issue is that included in these regions are relative addresses to other regions which were emitted by the firmware toolchain before the "splitting" of sections occurred at runtime. Needless to say, we don't satisfy this unwritten requirement on x86_64 and instead map the EFI memory map entries in reverse order. The above crash is almost certainly triggerable with any kernel newer than v3.13 because that's when we rewrote the EFI runtime region mapping code, in commit d2f7cbe7 ("x86/efi: Runtime services virtual mapping"). For kernel versions before v3.13 things may work by pure luck depending on the fragmentation of the kernel virtual address space at the time we map the EFI regions. Instead of mapping the EFI memory map entries in reverse order, where entry N has a higher virtual address than entry N+1, map them in the same order as they appear in the EFI memory map to preserve this relative offset between regions. This patch has been kept as small as possible with the intention that it should be applied aggressively to stable and distribution kernels. It is very much a bugfix rather than support for a new feature, since when EFI_PROPERTIES_TABLE is enabled we must map things as outlined above to even boot - we have no way of asking the firmware not to split the code/data regions. In fact, this patch doesn't even make use of the more strict memory protections available in UEFI v2.5. That will come later. Suggested-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Reported-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Matt Fleming <matt.fleming@intel.com> Cc: Borislav Petkov <bp@suse.de> Cc: Chun-Yi <jlee@suse.com> Cc: Dave Young <dyoung@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: James Bottomley <JBottomley@Odin.com> Cc: Lee, Chun-Yi <jlee@suse.com> Cc: Leif Lindholm <leif.lindholm@linaro.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Matthew Garrett <mjg59@srcf.ucam.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Jones <pjones@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Link: http://lkml.kernel.org/r/1443218539-7610-2-git-send-email-matt@codeblueprint.co.ukSigned-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
-