- 21 Nov, 2014 22 commits
-
-
Ezequiel Garcia authored
The UBI_IOCVOLUP ioctl is used to start an update and also to truncate a volume. In the first case, a "volume updated" notification is dispatched when the update is done. This commit makes the "volume updated" notification also be sent when the volume is truncated. This is required for UBI block and gluebi to get notified about the new volume size.

Signed-off-by: Ezequiel Garcia <ezequiel.garcia@free-electrons.com>
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Cc: stable@vger.kernel.org # v3.15+
(cherry picked from commit fda322a1)
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
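For illustration, a sketch of the shape of such a fix, modeled on UBI's update path in drivers/mtd/ubi/upd.c; the surrounding context and helper names are abbreviations, not the verbatim diff:

	/* Sketch: a zero-length update request truncates the volume, so
	 * dispatch the same notification the normal update path sends
	 * when it completes.  Context abbreviated from ubi_start_update(). */
	if (bytes == 0) {
		err = clear_update_marker(ubi, vol, 0);
		if (err)
			return err;
		/* let UBI block and gluebi learn the new (zero) size */
		err = ubi_volume_notify(ubi, vol, UBI_VOLUME_UPDATED);
		if (err)
			return err;
	}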
-
Al Viro authored
schedule_delayed_work() is a cheap no-op when the work is already pending. Don't bother with the ->wbuf_queued logic - it's both broken (cancelling ->wbuf_dwork leaves it set, as spotted by Jeff Harris) and pointless. It's cheaper to let schedule_delayed_work() handle that case.

Reported-by: Jeff Harris <jefftharris@gmail.com>
Tested-by: Jeff Harris <jefftharris@gmail.com>
Cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
(cherry picked from commit 99358a1c)
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
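A sketch of what the simplified trigger looks like once the flag is gone (abbreviated from jffs2's wbuf code; treat the details as illustrative):

	static void jffs2_dirty_trigger(struct jffs2_sb_info *c)
	{
		struct super_block *sb = OFNI_BS_2SFFJ(c);
		unsigned long delay;

		if (sb->s_flags & MS_RDONLY)
			return;

		delay = msecs_to_jiffies(dirty_writeback_interval * 10);
		/* No ->wbuf_queued tracking: if the work is already
		 * pending, queueing it again is a cheap no-op. */
		queue_delayed_work(system_long_wq, &c->wbuf_dwork, delay);
	}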
-
Takashi Iwai authored
In compat mode, we copy each field of the snd_pcm_status struct but don't touch the reserved fields, and this leaves uninitialized values there. Meanwhile the native ioctl does zero-clear the whole structure, so we should follow the same rule in compat mode, too.

Reported-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
(cherry picked from commit 317168d0)
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
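The rule the fix enforces, as a sketch modeled on sound/core/pcm_compat.c (names abbreviated): zero the whole status structure before filling individual fields, exactly as the native ioctl does.

	static int snd_pcm_status_user_compat(struct snd_pcm_substream *substream,
					      struct snd_pcm_status32 __user *src)
	{
		struct snd_pcm_status status;
		int err;

		memset(&status, 0, sizeof(status));	/* like the native ioctl */
		err = snd_pcm_status(substream, &status);
		if (err < 0)
			return err;
		/* ...copy fields into the 32-bit layout; the reserved tail
		 * is now zero instead of uninitialized stack data... */
		return 0;
	}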
-
Dmitry Kasatkin authored
evm_inode_setxattr() can be called with no value. The function does not check the length, so the following command can be used to produce a kernel oops: setfattr -n security.evm FOO. This patch fixes it.

Changes in v3:
* there is no reason to return different error codes for EVM_XATTR_HMAC and non EVM_XATTR_HMAC. Remove the unnecessary test.

Changes in v2:
* testing for validity of xattr type

[ 1106.396921] BUG: unable to handle kernel NULL pointer dereference at (null)
[ 1106.398192] IP: [<ffffffff812af7b8>] evm_inode_setxattr+0x2a/0x48
[ 1106.399244] PGD 29048067 PUD 290d7067 PMD 0
[ 1106.399953] Oops: 0000 [#1] SMP
[ 1106.400020] Modules linked in: bridge stp llc evdev serio_raw i2c_piix4 button fuse
[ 1106.400020] CPU: 0 PID: 3635 Comm: setxattr Not tainted 3.16.0-kds+ #2936
[ 1106.400020] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[ 1106.400020] task: ffff8800291a0000 ti: ffff88002917c000 task.ti: ffff88002917c000
[ 1106.400020] RIP: 0010:[<ffffffff812af7b8>]  [<ffffffff812af7b8>] evm_inode_setxattr+0x2a/0x48
[ 1106.400020] RSP: 0018:ffff88002917fd50  EFLAGS: 00010246
[ 1106.400020] RAX: 0000000000000000 RBX: ffff88002917fdf8 RCX: 0000000000000000
[ 1106.400020] RDX: 0000000000000000 RSI: ffffffff818136d3 RDI: ffff88002917fdf8
[ 1106.400020] RBP: ffff88002917fd68 R08: 0000000000000000 R09: 00000000003ec1df
[ 1106.400020] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8800438a0a00
[ 1106.400020] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[ 1106.400020] FS:  00007f7dfa7d7740(0000) GS:ffff88005da00000(0000) knlGS:0000000000000000
[ 1106.400020] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1106.400020] CR2: 0000000000000000 CR3: 000000003763e000 CR4: 00000000000006f0
[ 1106.400020] Stack:
[ 1106.400020]  ffff8800438a0a00 ffff88002917fdf8 0000000000000000 ffff88002917fd98
[ 1106.400020]  ffffffff812a1030 ffff8800438a0a00 ffff88002917fdf8 0000000000000000
[ 1106.400020]  0000000000000000 ffff88002917fde0 ffffffff8116d08a ffff88002917fdc8
[ 1106.400020] Call Trace:
[ 1106.400020]  [<ffffffff812a1030>] security_inode_setxattr+0x5d/0x6a
[ 1106.400020]  [<ffffffff8116d08a>] vfs_setxattr+0x6b/0x9f
[ 1106.400020]  [<ffffffff8116d1e0>] setxattr+0x122/0x16c
[ 1106.400020]  [<ffffffff811687e8>] ? mnt_want_write+0x21/0x45
[ 1106.400020]  [<ffffffff8114d011>] ? __sb_start_write+0x10f/0x143
[ 1106.400020]  [<ffffffff811687e8>] ? mnt_want_write+0x21/0x45
[ 1106.400020]  [<ffffffff811687c0>] ? __mnt_want_write+0x48/0x4f
[ 1106.400020]  [<ffffffff8116d3e6>] SyS_setxattr+0x6e/0xb0
[ 1106.400020]  [<ffffffff81529da9>] system_call_fastpath+0x16/0x1b
[ 1106.400020] Code: c3 0f 1f 44 00 00 55 48 89 e5 41 55 49 89 d5 41 54 49 89 fc 53 48 89 f3 48 c7 c6 d3 36 81 81 48 89 df e8 18 22 04 00 85 c0 75 07 <41> 80 7d 00 02 74 0d 48 89 de 4c 89 e7 e8 5a fe ff ff eb 03 83
[ 1106.400020] RIP  [<ffffffff812af7b8>] evm_inode_setxattr+0x2a/0x48
[ 1106.400020]  RSP <ffff88002917fd50>
[ 1106.400020] CR2: 0000000000000000
[ 1106.428061] ---[ end trace ae08331628ba3050 ]---

Reported-by: Jan Kara <jack@suse.cz>
Signed-off-by: Dmitry Kasatkin <d.kasatkin@samsung.com>
Cc: stable@vger.kernel.org
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
(cherry picked from commit 3b1deef6)
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
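The resulting check looks roughly like this (a sketch of security/integrity/evm/evm_main.c after the fix; treat it as illustrative):

	int evm_inode_setxattr(struct dentry *dentry, const char *xattr_name,
			       const void *xattr_value, size_t xattr_value_len)
	{
		const struct evm_ima_xattr_data *xattr_data = xattr_value;

		if (strcmp(xattr_name, XATTR_NAME_EVM) == 0) {
			if (!xattr_value_len)
				return -EINVAL;	/* no value: reject early */
			if (xattr_data->type != EVM_IMA_XATTR_DIGSIG)
				return -EPERM;
		}
		return evm_protect_xattr(dentry, xattr_name, xattr_value,
					 xattr_value_len);
	}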
-
Andy Lutomirski authored
Rusty noticed a Really Bad Bug (tm) in my NT fix. The entry code reads out of bounds, causing the NT fix to be unreliable. But, and this is much, much worse, if your stack is somehow just below the top of the direct map (or a hole), you read out of bounds and crash.

Excerpt from the crash:

[    1.129513] RSP: 0018:ffff88001da4bf88  EFLAGS: 00010296
  2b:*	f7 84 24 90 00 00 00 	testl  $0x4000,0x90(%rsp)

That read is deterministically above the top of the stack. I thought I even single-stepped through this code when I wrote it to check the offset, but I clearly screwed it up.

Fixes: 8c7aa698 ("x86_64, entry: Filter RFLAGS.NT on entry from userspace")
Reported-by: Rusty Russell <rusty@ozlabs.org>
Cc: stable@vger.kernel.org
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit 653bc77a)
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
-
Andy Lutomirski authored
commit 8c7aa698 upstream.

The NT flag doesn't do anything in long mode other than causing IRET to #GP. Oddly, CPL3 code can still set NT using popf. Entry via hardware or software interrupt clears NT automatically, so the only relevant entries are fast syscalls.

If user code causes kernel code to run with NT set, then there's at least some (small) chance that it could cause trouble. For example, user code could cause a call to EFI code with NT set, and who knows what would happen? Apparently some games on Wine sometimes do this (!), and, if an IRET return happens, they will segfault. That segfault cannot be handled, because signal delivery fails, too.

This patch programs the CPU to clear NT on entry via SYSCALL (both 32-bit and 64-bit, by my reading of the AMD APM), and it clears NT in software on entry via SYSENTER.

To save a few cycles, this borrows a trick from Jan Beulich in Xen: it checks whether NT is set before trying to clear it. As a result, it seems to have very little effect on SYSENTER performance on my machine.

There's another minor bug fix in here: it looks like the CFI annotations were wrong if CONFIG_AUDITSYSCALL=n.

Testers beware: on Xen, SYSENTER with NT set turns into a GPF.

I haven't touched anything on 32-bit kernels.

The syscall mask change comes from a variant of this patch by Anish Bhatt.

Note to stable maintainers: there is no known security issue here. A misguided program can set NT and cause the kernel to try and fail to deliver SIGSEGV, crashing the program. This patch fixes Far Cry on Wine: https://bugs.winehq.org/show_bug.cgi?id=33275

Reported-by: Anish Bhatt <anish@chelsio.com>
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Link: http://lkml.kernel.org/r/395749a5d39a29bd3e4b35899cf3a3c1340e5595.1412189265.git.luto@amacapital.net
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit b1f7cac1)
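A sketch of the two halves of the approach (illustrative, not the exact arch/x86 diff): SYSCALL entries let the CPU mask NT via the FMASK MSR, while the SYSENTER path tests bit 14 (0x4000, as in the crash excerpt of the follow-up fix) and clears it in software only when set.

	/* Sketch: any RFLAGS bit set in MSR_SYSCALL_MASK is cleared by
	 * the CPU itself on SYSCALL entry, so add NT to the mask. */
	wrmsrl(MSR_SYSCALL_MASK,
	       X86_EFLAGS_TF | X86_EFLAGS_DF | X86_EFLAGS_IF |
	       X86_EFLAGS_IOPL | X86_EFLAGS_AC | X86_EFLAGS_NT);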
-
Oleg Nesterov authored
Add preempt_disable() + preempt_enable() around math_state_restore() in __restore_xstate_sig(). Otherwise __switch_to() after __thread_fpu_begin() can overwrite fpu->state we are going to restore.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Link: http://lkml.kernel.org/r/20140902175717.GA21649@redhat.com
Cc: <stable@vger.kernel.org> # v3.7+
Reviewed-by: Suresh Siddha <sbsiddha@gmail.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
(cherry picked from commit df24fb85)
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
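Abbreviated, the fix amounts to the following around the restore (a sketch of the relevant part of __restore_xstate_sig(); surrounding context elided):

	set_used_math();
	preempt_disable();	/* keep __switch_to() away until the
				 * registers are actually restored */
	if (use_eager_fpu())
		math_state_restore();
	preempt_enable();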
-
Ben Hutchings authored
It is currently possible to execve() an x32 executable on an x86_64 kernel that has only ia32 compat enabled. However all its syscalls will fail, even _exit(). This usually causes it to segfault.

Change the ELF compat architecture check so that x32 executables are rejected if we don't support the x32 ABI.

Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Link: http://lkml.kernel.org/r/1410120305.6822.9.camel@decadent.org.uk
Cc: stable@vger.kernel.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
(cherry picked from commit 0e6d3112)
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
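A sketch of the tightened check, modeled on arch/x86/include/asm/elf.h (treat as illustrative): an x32 binary is an EM_X86_64 ELF that takes the compat path, so it must only be accepted when the x32 ABI is actually built in.

	#define compat_elf_check_arch(x)				\
		(elf_check_arch_ia32(x) ||				\
		 (IS_ENABLED(CONFIG_X86_X32_ABI) &&			\
		  (x)->e_machine == EM_X86_64))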
-
Artem Bityutskiy authored
Hu (hujianyang <hujianyang@huawei.com>) discovered an issue in the 'empty_log_bytes()' function, which calculates how many bytes are left in the log:

"If 'c->lhead_lnum + 1 == c->ltail_lnum' and 'c->lhead_offs == c->leb_size', 'h' would be equal to 't' and 'empty_log_bytes()' would return 'c->log_bytes' instead of 0."

At this point it is not clear what the consequences of this would be, and whether it may lead to any problems, but this patch addresses the issue just in case.

Cc: stable@vger.kernel.org
Tested-by: hujianyang <hujianyang@huawei.com>
Reported-by: hujianyang <hujianyang@huawei.com>
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
(cherry picked from commit ba29e721)
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
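A sketch of the repaired helper, following the description above (fs/ubifs/log.c; illustrative rather than the verbatim upstream diff):

	static inline long long empty_log_bytes(const struct ubifs_info *c)
	{
		long long h, t;

		h = (long long)c->lhead_lnum * c->leb_size + c->lhead_offs;
		t = (long long)c->ltail_lnum * c->leb_size;

		if (h > t)
			return c->log_bytes - h + t;
		else if (h != t)
			return t - h;
		else if (c->lhead_lnum != c->ltail_lnum)
			return 0;	/* head wrapped right up to the tail */
		else
			return c->log_bytes;	/* log is genuinely empty */
	}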
-
Mikulas Patocka authored
This patch makes it possible to kill a process looping in cont_expand_zero. A process may spend a lot of time in this function, so it is desirable to be able to kill it.

It happened to me that I wanted to copy a piece of data from the disk to a file. By mistake, I used the "seek" parameter to dd instead of "skip". Due to the "seek" parameter, dd attempted to extend the file and became stuck doing so - the only possibility was to reset the machine or wait many hours until the filesystem ran out of space and cont_expand_zero failed. We need this patch to be able to terminate the process.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
(cherry picked from commit c2ca0fcd)
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
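The change itself is small; a sketch of the body of cont_expand_zero()'s zeroing loop (fs/buffer.c, abbreviated):

	balance_dirty_pages_ratelimited(mapping);

	/* allow the loop to be interrupted by SIGKILL */
	if (unlikely(fatal_signal_pending(current))) {
		err = -EINTR;
		goto out;
	}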
-
Roger Tseng authored
The current code erroneously fills the last byte of the R2 response with an undefined value. In addition, the controller actually 'offloads' the last byte (CRC7, end bit) while receiving an R2 response, so it is impossible to get the actual value. This could cause the mmc stack to obtain an inconsistent CID from the same card after resume and misidentify it as a different card.

Fix by assigning a dummy CRC and end bit: {7'b0, 1} = 0x1 to the last byte of R2.

Cc: <stable@vger.kernel.org> # v3.8+
Fixes: ff984e57 ("mmc: Add realtek pcie sdmmc host driver")
Signed-off-by: Roger Tseng <rogerable@realtek.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
(cherry picked from commit d1419d50)
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
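A sketch of the assignment (the buffer index is an assumption for illustration; see drivers/mmc/host/rtsx_pci_sdmmc.c for the real context):

	if (rsp_type == SD_RSP_TYPE_R2) {
		/* The controller strips the final {CRC7, end bit} byte of
		 * an R2 response; synthesize it: CRC7 = 7'b0, end bit = 1. */
		ptr[16] = 0x1;	/* last raw byte, ends up in resp[3]'s LSB */
	}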
-
Ondrej Zary authored
Currently, ata_sff_softreset is skipped for controllers with no ctl port. But that also skips ata_sff_dev_classify, which is required for device detection. This means that libata is currently broken on controllers with no ctl port.

No device connected:
[    1.872480] pata_isapnp 01:01.02: activated
[    1.889823] scsi2 : pata_isapnp
[    1.890109] ata3: PATA max PIO0 cmd 0x1e8 ctl 0x0 irq 11
[    6.888110] ata3.01: qc timeout (cmd 0xec)
[    6.888179] ata3.01: failed to IDENTIFY (I/O error, err_mask=0x5)
[   16.888085] ata3.01: qc timeout (cmd 0xec)
[   16.888147] ata3.01: failed to IDENTIFY (I/O error, err_mask=0x5)
[   46.888086] ata3.01: qc timeout (cmd 0xec)
[   46.888148] ata3.01: failed to IDENTIFY (I/O error, err_mask=0x5)
[   51.888100] ata3.00: qc timeout (cmd 0xec)
[   51.888160] ata3.00: failed to IDENTIFY (I/O error, err_mask=0x5)
[   61.888079] ata3.00: qc timeout (cmd 0xec)
[   61.888141] ata3.00: failed to IDENTIFY (I/O error, err_mask=0x5)
[   91.888089] ata3.00: qc timeout (cmd 0xec)
[   91.888152] ata3.00: failed to IDENTIFY (I/O error, err_mask=0x5)

ATAPI device connected:
[    1.882061] pata_isapnp 01:01.02: activated
[    1.893430] scsi2 : pata_isapnp
[    1.893719] ata3: PATA max PIO0 cmd 0x1e8 ctl 0x0 irq 11
[    6.892107] ata3.01: qc timeout (cmd 0xec)
[    6.892171] ata3.01: failed to IDENTIFY (I/O error, err_mask=0x5)
[   16.892079] ata3.01: qc timeout (cmd 0xec)
[   16.892138] ata3.01: failed to IDENTIFY (I/O error, err_mask=0x5)
[   46.892079] ata3.01: qc timeout (cmd 0xec)
[   46.892138] ata3.01: failed to IDENTIFY (I/O error, err_mask=0x5)
[   46.908586] ata3.00: ATAPI: ACER CD-767E/O, V1.5X, max PIO2, CDB intr
[   46.924570] ata3.00: configured for PIO0 (device error ignored)
[   46.926295] scsi 2:0:0:0: CD-ROM            ACER     CD-767E/O        1.5X PQ: 0 ANSI: 5
[   46.984519] sr0: scsi3-mmc drive: 6x/6x xa/form2 tray
[   46.984592] cdrom: Uniform CD-ROM driver Revision: 3.20

So don't skip ata_sff_softreset, just skip the reset part of ata_bus_softreset if the ctl port is not available. This makes the IDE port on ES968 behave correctly:

No device connected:
[    4.670888] pata_isapnp 01:01.02: activated
[    4.673207] scsi host2: pata_isapnp
[    4.673675] ata3: PATA max PIO0 cmd 0x1e8 ctl 0x0 irq 11
[    7.081840] Adding 2541652k swap on /dev/sda2.  Priority:-1 extents:1 across:2541652k

ATAPI device connected:
[    4.704362] pata_isapnp 01:01.02: activated
[    4.706620] scsi host2: pata_isapnp
[    4.706877] ata3: PATA max PIO0 cmd 0x1e8 ctl 0x0 irq 11
[    4.872782] ata3.00: ATAPI: ACER CD-767E/O, V1.5X, max PIO2, CDB intr
[    4.888673] ata3.00: configured for PIO0 (device error ignored)
[    4.893984] scsi 2:0:0:0: CD-ROM            ACER     CD-767E/O        1.5X PQ: 0 ANSI: 5
[    7.015578] Adding 2541652k swap on /dev/sda2.  Priority:-1 extents:1 across:2541652k

Signed-off-by: Ondrej Zary <linux@rainbow-software.org>
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: stable@vger.kernel.org
(cherry picked from commit 6d8ca28f)
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
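Roughly, the reset helper becomes (a sketch of the libata-sff change; abbreviated):

	static int ata_bus_softreset(struct ata_port *ap, unsigned int devmask,
				     unsigned long deadline)
	{
		struct ata_ioports *ioaddr = &ap->ioaddr;

		/* Pulse SRST only when a ctl port exists; device
		 * classification in the caller must run either way. */
		if (ap->ioaddr.ctl_addr) {
			iowrite8(ap->ctl, ioaddr->ctl_addr);
			udelay(20);
			iowrite8(ap->ctl | ATA_SRST, ioaddr->ctl_addr);
			udelay(20);
			iowrite8(ap->ctl, ioaddr->ctl_addr);
			ap->last_ctl = ap->ctl;
		}

		return ata_sff_wait_after_reset(&ap->link, devmask, deadline);
	}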
-
Scott Carter authored
The Broadcom OSB4 IDE Controller (vendor and device IDs: 1166:0211) does not support 64-KB DMA transfers. Whenever a 64-KB DMA transfer is attempted, the transfer fails and messages similar to the following are written to the console log:

[ 2431.851125] sr 0:0:0:0: [sr0]  Unhandled sense code
[ 2431.851139] sr 0:0:0:0: [sr0]  Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[ 2431.851152] sr 0:0:0:0: [sr0]  Sense Key : Hardware Error [current]
[ 2431.851166] sr 0:0:0:0: [sr0]  Add. Sense: Logical unit communication time-out
[ 2431.851182] sr 0:0:0:0: [sr0] CDB: Read(10): 28 00 00 00 76 f4 00 00 40 00
[ 2431.851210] end_request: I/O error, dev sr0, sector 121808

When the libata and pata_serverworks modules are recompiled with ATA_DEBUG and ATA_VERBOSE_DEBUG defined in libata.h, the 64-KB transfer size in the scatter-gather list can be seen in the console log:

[ 2664.897267] sr 9:0:0:0: [sr0] Send:
[ 2664.897274] 0xf63d85e0
[ 2664.897283] sr 9:0:0:0: [sr0] CDB:
[ 2664.897288] Read(10): 28 00 00 00 7f b4 00 00 40 00
[ 2664.897319] buffer = 0xf6d6fbc0, bufflen = 131072, queuecommand 0xf81b7700
[ 2664.897331] ata_scsi_dump_cdb: CDB (1:0,0,0) 28 00 00 00 7f b4 00 00 40
[ 2664.897338] ata_scsi_translate: ENTER
[ 2664.897345] ata_sg_setup: ENTER, ata1
[ 2664.897356] ata_sg_setup: 3 sg elements mapped
[ 2664.897364] ata_bmdma_fill_sg: PRD[0] = (0x66FD2000, 0xE000)
[ 2664.897371] ata_bmdma_fill_sg: PRD[1] = (0x65000000, 0x10000) ------------------------------------------------------> =======
[ 2664.897378] ata_bmdma_fill_sg: PRD[2] = (0x66A10000, 0x2000)
[ 2664.897386] ata1: ata_dev_select: ENTER, device 0, wait 1
[ 2664.897422] ata_sff_tf_load: feat 0x1 nsect 0x0 lba 0x0 0x0 0xFC
[ 2664.897428] ata_sff_tf_load: device 0xA0
[ 2664.897448] ata_sff_exec_command: ata1: cmd 0xA0
[ 2664.897457] ata_scsi_translate: EXIT
[ 2664.897462] leaving scsi_dispatch_cmnd()
[ 2664.897497] Doing sr request, dev = sr0, block = 0
[ 2664.897507] sr0 : reading 64/256 512 byte blocks.
[ 2664.897553] ata_sff_hsm_move: ata1: protocol 7 task_state 1 (dev_stat 0x58)
[ 2664.897560] atapi_send_cdb: send cdb
[ 2666.910058] ata_bmdma_port_intr: ata1: host_stat 0x64
[ 2666.910079] __ata_sff_port_intr: ata1: protocol 7 task_state 3
[ 2666.910093] ata_sff_hsm_move: ata1: protocol 7 task_state 3 (dev_stat 0x51)
[ 2666.910101] ata_sff_hsm_move: ata1: protocol 7 task_state 4 (dev_stat 0x51)
[ 2666.910129] sr 9:0:0:0: [sr0] Done:
[ 2666.910136] 0xf63d85e0 TIMEOUT

lspci shows that the driver used for the Broadcom OSB4 IDE Controller is pata_serverworks:

00:0f.1 IDE interface: Broadcom OSB4 IDE Controller (prog-if 8e [Master SecP SecO PriP])
        Flags: bus master, medium devsel, latency 64
        [virtual] Memory at 000001f0 (32-bit, non-prefetchable) [size=8]
        [virtual] Memory at 000003f0 (type 3, non-prefetchable) [size=1]
        I/O ports at 0170 [size=8]
        I/O ports at 0374 [size=4]
        I/O ports at 1440 [size=16]
        Kernel driver in use: pata_serverworks

The pata_serverworks driver supports five distinct device IDs, one being the OSB4 and the other four belonging to the CSB series. The CSB series appears to support 64-KB DMA transfers, as tests on a machine with an SAI2 motherboard containing a Broadcom CSB5 IDE Controller (vendor and device IDs: 1166:0212) showed no problems with 64-KB DMA transfers.

This problem was first discovered when attempting to install openSUSE from a DVD on a machine with an STL2 motherboard. Using the pata_serverworks module, older releases of openSUSE will not install at all due to the timeouts. Releases of openSUSE prior to 11.3 can be installed by disabling the pata_serverworks module using the brokenmodules boot parameter, which causes the serverworks module to be used instead. Recent releases of openSUSE (12.2 and later) include better error recovery and will install, though very slowly. On all openSUSE releases, the problem can be recreated on a machine containing a Broadcom OSB4 IDE Controller by mounting an install DVD and running a command similar to the following:

find /mnt -type f -print | xargs cat > /dev/null

The patch below corrects the problem. Similar to the other ATA drivers that do not support 64-KB DMA transfers, the patch changes the ata_port_operations qc_prep vector to point to a routine that breaks any 64-KB segment into two 32-KB segments and changes the scsi_host_template sg_tablesize element to reduce by half the number of scatter/gather elements allowed. These two changes affect only the OSB4.

Signed-off-by: Scott Carter <ccscott@funsoft.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: stable@vger.kernel.org
(cherry picked from commit 37017ac6)
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
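libata already provides a "dumb" PRD filler for exactly this kind of hardware, so the change boils down to pointing the OSB4 at it (a sketch of drivers/ata/pata_serverworks.c; the inherited ops name is an assumption):

	static struct scsi_host_template serverworks_osb4_sht = {
		ATA_BMDMA_SHT(DRV_NAME),
		.sg_tablesize	= LIBATA_DUMB_MAX_PRD,	/* half the entries */
	};

	static struct ata_port_operations serverworks_osb4_port_ops = {
		.inherits	= &serverworks_base_port_ops,	/* assumption */
		.qc_prep	= ata_bmdma_dumb_qc_prep, /* splits 64-KB segs */
		.cable_detect	= serverworks_cable_detect,
		.mode_filter	= serverworks_osb4_filter,
	};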
-
Guenter Roeck authored
This reverts commit 3189eddb ("percpu: free percpu allocation info for uniprocessor system").

The commit causes a hang with a crisv32 image. This may be an architecture problem, but at least for now the revert is necessary to be able to boot a crisv32 image.

Cc: Tejun Heo <tj@kernel.org>
Cc: Honggang Li <enjoymindful@gmail.com>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Tejun Heo <tj@kernel.org>
Fixes: 3189eddb ("percpu: free percpu allocation info for uniprocessor system")
Cc: stable@vger.kernel.org # Please don't apply 3189eddb
(cherry picked from commit bb2e226b)
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
-
Benjamin Coddington authored
If rpc.statd is restarted, upcalls to monitor hosts can fail with ECONNREFUSED. In that case, force a lookup of statd's new port and retry the upcall.

Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
(cherry picked from commit 173b3afc)
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
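A sketch of the retry, abbreviated from the NSM monitor upcall path in fs/lockd (illustrative):

	int status = rpc_call_sync(clnt, msg, RPC_TASK_SOFTCONN);

	if (status == -ECONNREFUSED) {
		/* statd was restarted: force rpcbind to look up its new
		 * port, then try the MON/UNMON call once more */
		rpc_force_rebind(clnt);
		status = rpc_call_sync(clnt, msg, RPC_TASK_SOFTCONN);
	}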
-
Andy Shevchenko authored
Commit 3b57de95 brought support for a different number of filter bins, but didn't update the PCI driver accordingly. This patch appends the default values when the device is enumerated via the PCI bus.

Fixes: 3b57de95 ("net: stmmac: Support devicetree configs for mcast and ucast filter entries")
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: stable@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 1e19e084)
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
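A sketch of the defaults added for PCI enumeration (the values mirror the driver's common defaults, but treat them as assumptions):

	static void stmmac_default_data(struct plat_stmmacenet_data *plat)
	{
		plat->clk_csr = 2;
		plat->has_gmac = 1;
		plat->force_sf_dma_mode = 1;

		/* previously only devicetree probing set these up */
		plat->multicast_filter_bins = HASH_TABLE_SIZE;
		plat->unicast_filter_entries = 1;
	}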
-
Ben Hutchings authored
These drivers now call ipv6_proxy_select_ident(), which is defined only if CONFIG_INET is enabled. However, they have really depended on CONFIG_INET for as long as they have allowed sending GSO packets from userland.

Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Fixes: f43798c2 ("tun: Allow GSO using virtio_net_hdr")
Fixes: b9fb9ee0 ("macvtap: add GSO/csum offload support")
Fixes: 5188cd44 ("drivers/net, ipv6: Select IPv6 fragment idents for virtio UFO packets")
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit de11b0e8)
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
-
Or Gerlitz authored
For VXLAN/NVGRE encapsulation, the current HW doesn't support offloading both the outer UDP TX checksum and the inner TCP/UDP TX checksum. The driver doesn't advertise SKB_GSO_UDP_TUNNEL_CSUM, yet we were wrongly telling the HW to offload the outer UDP checksum for encapsulated packets; fix that.

Fixes: 837052d0 ("net/mlx4_en: Add netdev support for TCP/IP offloads of vxlan tunneling")
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit a4f2dacb)
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
-
Anish Bhatt authored
win0_lock was being used uninitialized, resulting in warning traces being seen when lock debugging is enabled (and just wrong).

Fixes: fc5ab020 ("cxgb4: Replaced the backdoor mechanism to access the HW memory with PCIe Window method")
Signed-off-by: Anish Bhatt <anish@chelsio.com>
Signed-off-by: Casey Leedom <leedom@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit e327c225)
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
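The fix is essentially a one-liner in the adapter probe path; a sketch (cxgb4, context abbreviated):

	/* initialize before anyone can take the lock */
	spin_lock_init(&adapter->win0_lock);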
-
Sathya Perla authored
The commit "net: Save TX flow hash in sock and set in skbuf on xmit" introduced the inet_set_txhash() and ip6_set_txhash() routines to calculate and record flow hash(sk_txhash) in the socket structure. sk_txhash is used to set skb->hash which is used to spread flows across multiple TXQs. But, the above routines are invoked before the source port of the connection is created. Because of this all outgoing connections that just differ in the source port get hashed into the same TXQ. This patch fixes this problem for IPv4/6 by invoking the the above routines after the source port is available for the socket. Fixes: b73c3d0e("net: Save TX flow hash in sock and set in skbuf on xmit") Signed-off-by:
Sathya Perla <sathya.perla@emulex.com> Acked-by:
Eric Dumazet <edumazet@google.com> Signed-off-by:
David S. Miller <davem@davemloft.net> (cherry picked from commit 9e7ceb06) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
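For the IPv4 side, the reordering looks roughly like this (a sketch of tcp_v4_connect(); abbreviated):

	/* pick the source port first... */
	err = inet_hash_connect(&tcp_death_row, sk);
	if (err)
		goto failure;

	/* ...then hash, so the source port participates in sk_txhash */
	inet_set_txhash(sk);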
-
Vasily Averin authored
ip_setup_cork(), called inside ip_append_data(), steals the dst entry from rt to cork, and in case of errors in __ip_append_data() nobody frees the stolen dst entry.

Fixes: 2e77d89b ("net: avoid a pair of dst_hold()/dst_release() in ip_append_data()")
Signed-off-by: Vasily Averin <vvs@parallels.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 4062090e)
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
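A sketch of the idea only (the real diff lives in net/ipv4/ip_output.c): whoever fails after ip_setup_cork() owns the stolen dst and must release it on the error path.

	/* Illustrative helper, not the upstream diff: release the dst that
	 * ip_setup_cork() moved from the rtable into the cork. */
	static void ip_cork_release_dst(struct inet_cork *cork)
	{
		dst_release(cork->dst);	/* dst_release(NULL) is a no-op */
		cork->dst = NULL;
	}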
-
Jiri Pirko authored
fib_nh_match does not match nexthops correctly. Example:

ip route add 172.16.10/24 nexthop via 192.168.122.12 dev eth0 \
             nexthop via 192.168.122.13 dev eth0
ip route del 172.16.10/24 nexthop via 192.168.122.14 dev eth0 \
             nexthop via 192.168.122.15 dev eth0

The del command is successful and the route is removed. After this patch is applied, the route is correctly matched and the result is:

RTNETLINK answers: No such process

Please consider this for stable trees as well.

Fixes: 4e902c57 ("[IPv4]: FIB configuration using struct fib_config")
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit f76936d0)
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
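The corrected loop compares each nexthop of the request against the route's nexthops (a sketch along the lines of net/ipv4/fib_semantics.c; abbreviated, names illustrative):

	for_nexthops(fi) {
		int attrlen;

		if (!rtnh_ok(rtnh, remaining))
			return -EINVAL;

		if (rtnh->rtnh_ifindex && rtnh->rtnh_ifindex != nh->nh_oif)
			return 1;	/* device mismatch */

		attrlen = rtnh_attrlen(rtnh);
		if (attrlen > 0) {
			struct nlattr *nla;

			nla = nla_find(rtnh_attrs(rtnh), attrlen, RTA_GATEWAY);
			if (nla && nla_get_be32(nla) != nh->nh_gw)
				return 1;	/* gateway mismatch */
		}

		rtnh = rtnh_next(rtnh, &remaining);
	} endfor_nexthops(fi);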
-
- 08 Nov, 2014 1 commit
-
-
Sasha Levin authored
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
-
- 07 Nov, 2014 17 commits
-
-
Xiubo Li authored
If 'map->dev' is NULL, dev_name() will hit a NULL pointer dereference. So we need to check the 'map->dev' pointer before calling dev_name(). We should also make sure that the 'name' pointer passed to debugfs_create_dir() is not NULL, so fall back to a default "dummy" debugfs name when both the 'name' pointer and 'map->dev' are NULL.

Signed-off-by: Xiubo Li <Li.Xiubo@freescale.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Cc: stable@vger.kernel.org
(cherry picked from commit 2c98e0c1)
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
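A sketch of the fallback, modeled on drivers/base/regmap/regmap-debugfs.c (abbreviated):

	void regmap_debugfs_init(struct regmap *map, const char *name)
	{
		const char *devname = "dummy";

		if (map->dev)		/* no-bus regmaps may lack a device */
			devname = dev_name(map->dev);

		if (name) {
			map->debugfs_name = kasprintf(GFP_KERNEL, "%s-%s",
						      devname, name);
			name = map->debugfs_name;
		} else {
			name = devname;
		}

		map->debugfs = debugfs_create_dir(name, regmap_debugfs_root);
	}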
-
Michael Ellerman authored
The kernel defines the function spin_is_locked(), which can be used to check if a spinlock is currently locked. Using spin_is_locked() on a lock you don't hold is obviously racy. That is, even though you may observe that the lock is unlocked, it may become locked at any time.

There is (at least) one exception to that, which is if two locks are used as a pair, and the holder of each checks the status of the other before doing any update.

Assuming *A and *B are two locks, and *COUNTER is a shared non-atomic value:

The first CPU does:

	spin_lock(*A)

	if spin_is_locked(*B)
		# nothing
	else
		smp_mb()
		LOAD r = *COUNTER
		r++
		STORE *COUNTER = r

	spin_unlock(*A)

And the second CPU does:

	spin_lock(*B)

	if spin_is_locked(*A)
		# nothing
	else
		smp_mb()
		LOAD r = *COUNTER
		r++
		STORE *COUNTER = r

	spin_unlock(*B)

Although this is a strange locking construct, it should work.

It seems to be understood, but not documented, that spin_is_locked() is not a memory barrier, so in the examples above and below the caller inserts its own memory barrier before acting on the result of spin_is_locked().

For now we assume spin_is_locked() is implemented as below, and we break it out in our examples:

	bool spin_is_locked(*LOCK) {
		LOAD l = *LOCK
		return l.locked
	}

Our intuition is that there should be no problem even if the two code sequences run simultaneously such as:

	CPU 0                    CPU 1
	==================================================
	spin_lock(*A)            spin_lock(*B)
	LOAD b = *B              LOAD a = *A
	if b.locked # true       if a.locked # true
	# nothing                # nothing
	spin_unlock(*A)          spin_unlock(*B)

If one CPU gets the lock before the other then it will do the update and the other CPU will back off:

	CPU 0                    CPU 1
	==================================================
	spin_lock(*A)
	LOAD b = *B
	                         spin_lock(*B)
	if b.locked # false
	                         LOAD a = *A
	else
	                         if a.locked # true
	smp_mb()
	                         # nothing
	LOAD r1 = *COUNTER
	                         spin_unlock(*B)
	r1++
	STORE *COUNTER = r1
	spin_unlock(*A)

However in reality spin_lock() itself is not indivisible. On powerpc we implement it as a load-and-reserve and store-conditional. Ignoring the retry logic for the lost reservation case, it boils down to:

	spin_lock(*LOCK) {
		LOAD l = *LOCK
		l.locked = true
		STORE *LOCK = l
		ACQUIRE_BARRIER
	}

The ACQUIRE_BARRIER is required to give spin_lock() ACQUIRE semantics as defined in memory-barriers.txt:

	This acts as a one-way permeable barrier. It guarantees that all
	memory operations after the ACQUIRE operation will appear to happen
	after the ACQUIRE operation with respect to the other components of
	the system.

On modern powerpc systems we use lwsync for ACQUIRE_BARRIER. lwsync is also known as "lightweight sync", or "sync 1". As described in Power ISA v2.07 section B.2.1.1, in this scenario the lwsync is not the barrier itself. It instead causes the LOAD of *LOCK to act as the barrier, preventing any loads or stores in the locked region from occurring prior to the load of *LOCK.

Whether this behaviour is in accordance with the definition of ACQUIRE semantics in memory-barriers.txt is open to discussion, we may switch to a different barrier in future.

What this means in practice is that the following can occur:

	CPU 0                    CPU 1
	==================================================
	LOAD a = *A              LOAD b = *B
	a.locked = true          b.locked = true
	LOAD b = *B              LOAD a = *A
	STORE *A = a             STORE *B = b
	if b.locked # false      if a.locked # false
	else                     else
	smp_mb()                 smp_mb()
	LOAD r1 = *COUNTER       LOAD r2 = *COUNTER
	r1++                     r2++
	STORE *COUNTER = r1      STORE *COUNTER = r2	# Lost update
	spin_unlock(*A)          spin_unlock(*B)

That is, the load of *B can occur prior to the store that makes *A visibly locked. And similarly for CPU 1. The result is both CPUs hold their lock and believe the other lock is unlocked.

The easiest fix for this is to add a full memory barrier to the start of spin_is_locked(), so adding to our previous definition would give us:

	bool spin_is_locked(*LOCK) {
		smp_mb()
		LOAD l = *LOCK
		return l.locked
	}

The new barrier orders the store to the lock we are locking vs the load of the other lock:

	CPU 0                    CPU 1
	==================================================
	LOAD a = *A              LOAD b = *B
	a.locked = true          b.locked = true
	STORE *A = a             STORE *B = b
	smp_mb()                 smp_mb()
	LOAD b = *B              LOAD a = *A
	if b.locked # true       if a.locked # true
	# nothing                # nothing
	spin_unlock(*A)          spin_unlock(*B)

Although the above example is theoretical, there is code similar to this example in sem_lock() in ipc/sem.c. This commit in addition to the next commit appears to be a fix for crashes we are seeing in that code where we believe this race happens in practice.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
(cherry picked from commit 51d7d520)
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
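The resulting powerpc implementation is tiny (a sketch of arch/powerpc/include/asm/spinlock.h; abbreviated):

	static inline int arch_spin_is_locked(arch_spinlock_t *lock)
	{
		smp_mb();	/* order our own lock's store vs. this load */
		return !arch_spin_value_unlocked(*lock);
	}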
-
Vince Weaver authored
This was discussed back in February:
https://lkml.org/lkml/2014/2/18/956
But I never saw a patch come out of it.

On IvyBridge we share the SandyBridge cache event tables, but the dTLB-load-miss event is not compatible. Patch it up after the fact to the proper DTLB_LOAD_MISSES.DEMAND_LD_MISS_CAUSES_A_WALK.

Signed-off-by: Vince Weaver <vincent.weaver@maine.edu>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/alpine.DEB.2.11.1407141528200.17214@vincent-weaver-1.umelst.maine.edu
Signed-off-by: Ingo Molnar <mingo@kernel.org>
(cherry picked from commit 1996388e)
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
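The fix-up is a single table override after the SandyBridge tables are copied (a sketch of the IvyBridge branch of intel_pmu_init(); the raw encoding 0x8108 - umask 0x81, event 0x08 - is derived from the event name above and should be treated as an assumption):

	memcpy(hw_cache_event_ids, snb_hw_cache_event_ids,
	       sizeof(hw_cache_event_ids));

	/* dTLB-load-misses on IVB is different than SNB:
	 * DTLB_LOAD_MISSES.DEMAND_LD_MISS_CAUSES_A_WALK */
	hw_cache_event_ids[C(DTLB)][C(OP_READ)][C(RESULT_MISS)] = 0x8108;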
-
Xiubo Li authored
We cannot be sure that 'val_count' is always non-zero here, and if it is zero, kmemdup() will return ZERO_SIZE_PTR, which equals ((void *)16). So this patch fixes this by doing a zero check before calling kmemdup().

Signed-off-by: Xiubo Li <Li.Xiubo@freescale.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Cc: stable@vger.kernel.org
(cherry picked from commit d6b41cb0)
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
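The guard, sketched in place in regmap's bulk-write path (abbreviated):

	if (!val_count)
		return -EINVAL;	/* kmemdup(.., 0, ..) returns ZERO_SIZE_PTR */

	wval = kmemdup(val, val_count * val_bytes, GFP_KERNEL);
	if (!wval)
		return -ENOMEM;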
-
Xiubo Li authored
If 'map->dev' is NULL, dev_name() will hit a NULL pointer dereference. So we need to check the 'map->dev' pointer before calling dev_name(). We should also make sure that the 'name' pointer passed to debugfs_create_dir() is not NULL, so fall back to a default "dummy" debugfs name when both the 'name' pointer and 'map->dev' are NULL.

Signed-off-by: Xiubo Li <Li.Xiubo@freescale.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Cc: stable@vger.kernel.org
(cherry picked from commit 2c98e0c1)
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
-
David S. Miller authored
Make sure, at compile time, that the kernel can properly support whatever MAX_PHYS_ADDRESS_BITS is defined to. On M7 chips, use a max_phys_bits value of 49.

Based upon a patch by Bob Picco.

Signed-off-by: David S. Miller <davem@davemloft.net>
Acked-by: Bob Picco <bob.picco@oracle.com>
(cherry picked from commit 7c0fa0f2)
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
-
David S. Miller authored
Now that we use 4-level page tables, we can provide up to 53-bits of virtual address space to the user.

Adjust the VA hole based upon the capabilities of the cpu type probed.

Signed-off-by: David S. Miller <davem@davemloft.net>
Acked-by: Bob Picco <bob.picco@oracle.com>
(cherry picked from commit 4397bed0)
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
-
David S. Miller authored
This has become necessary with chips that support more than 43-bits of physical addressing.

Based almost entirely upon a patch by Bob Picco.

Signed-off-by: David S. Miller <davem@davemloft.net>
Acked-by: Bob Picco <bob.picco@oracle.com>
(cherry picked from commit ac55c768)
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
-
David S. Miller authored
Instead of returning false we should at least check the most basic things, otherwise page table corruptions will be very difficult to debug.

PMD and PTE tables are of size PAGE_SIZE, so none of the sub-PAGE_SIZE bits should be set. We also complement this with a check that the physical address the pud/pmd points to is valid memory.

PowerPC was used as a guide while implementing this.

Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 26cf4325)
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
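As a sketch of the first half of that idea (illustrative only, not the sparc64 diff): a PMD that points at a PTE table must be PAGE_SIZE-aligned, so any low bit is a sign of corruption.

	/* sketch: table pointers are PAGE_SIZE-aligned, so sub-page bits
	 * in a non-huge pmd indicate corruption */
	#define pmd_bad(pmd)	(pmd_val(pmd) & ~PAGE_MASK)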
-
David S. Miller authored
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit eaf85da8)
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
-
David S. Miller authored
The large PMD path needs to check _PAGE_VALID not _PAGE_PRESENT, to decide if it needs to bail and return 0.

pmd_large() should therefore just check _PAGE_PMD_HUGE.

Calls to gup_huge_pmd() are guarded with a check of pmd_large(), so we just need to add a valid bit check.

Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 04df419d)
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
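The added guard is a one-liner at the top of the huge-PMD walker (a sketch of arch/sparc/mm/gup.c; abbreviated):

	/* bail on splitting/invalid huge PMDs; the pmd_large() check in
	 * the caller only looks at _PAGE_PMD_HUGE */
	if (!(pmd_val(pmd) & _PAGE_VALID))
		return 0;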
-
David S. Miller authored
This code was mistakenly using the exec bit from the PMD in all cases, even when the PMD isn't a huge PMD.

If it's not a huge PMD, test the exec bit in the individual ptes down in tlb_batch_pmd_scan().

Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 5b1e94fa)
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
-
David S. Miller authored
Now that we have 64-bits for PMDs we can stop using special encodings for the huge PMD values, and just put real PTEs in there.

We allocate a _PAGE_PMD_HUGE bit to distinguish between plain PMDs and huge ones. It is the same for both 4U and 4V PTE layouts. We also use _PAGE_SPECIAL to indicate the splitting state, since a huge PMD cannot also be special.

All of the PMD --> PTE translation code disappears, and most of the huge PMD bit modifications and tests just degenerate into the PTE operations. In particular USER_PGTABLE_CHECK_PMD_HUGE becomes trivial.

As a side effect, normal PMDs don't shift the physical address around. This also speeds up the page table walks in the TLB miss paths since they don't have to do the shifts any more.

Another non-trivial aspect is that pte_modify() has to be changed to preserve the _PAGE_PMD_HUGE bits as well as the page size field of the pte.

Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit a7b9403f)
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
-
David S. Miller authored
To make the page tables compact, we were using 32-bit PGDs and PMDs. We only had to support <= 43 bits of physical addresses so this was quite feasible.

In order to support larger physical addresses we have to move to 64-bit PGDs and PMDs.

Most of the changes are straight-forward:

1) {pgd,pmd}_t --> unsigned long

2) Anything that tries to use plain "unsigned int" types with pgd/pmd values needs to be adjusted. In particular things like "0U" become "0UL".

3) {PGDIR,PMD}_BITS decrease by one.

4) In the assembler page table walkers, use "ldxa" instead of "lduwa" and adjust the low bit masks to clear out the low 3 bits instead of just the low 2 bits during pgd/pmd address formation. Also, use PTRS_PER_PGD and PTRS_PER_PMD in the sizing of the swapper_{pg_dir,low_pmd_dir} arrays.

This patch does not try to take advantage of having 64-bits in the PMDs to simplify the hugepage code, that will come in a subsequent change.

Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 2b77933c)
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
-
David S. Miller authored
The impetus for this is that we would like to move to 64-bit PMDs and PGDs, but that would result in only supporting a 42-bit address space with the current page table layout. It'd be nice to support at least 43-bits.

The reason we'd end up with only 42-bits after making PMDs and PGDs 64-bit is that we only use half-page sized PTE tables in order to make PMDs line up to 4MB, the hardware huge page size we use.

So what we do here is we make huge pages 8MB, and fabricate them using 4MB hw TLB entries.

Facilitate this by providing a "REAL_HPAGE_SHIFT" which is used in places that really need to operate on hardware 4MB pages.

Use full pages (512 entries) for PTE tables, and adjust PMD_SHIFT, PGD_SHIFT, and the build time CPP test as needed. Use a CPP test to make sure REAL_HPAGE_SHIFT and the _PAGE_SZHUGE_* we use match up.

This makes the pgtable cache completely unused, so remove the code managing it and the state used in mm_context_t. Now we have less spinlocks taken in the page table allocation path.

The technique we use to fabricate the 8MB pages is to transfer bit 22 from the missing virtual address into the PTEs physical address field. That takes care of the transparent huge pages case.

For hugetlb, we fill things in at the PTE level and that code already puts the sub huge page physical bits into the PTEs, based upon the offset, so there is nothing special we need to do. It all just works out.

So, a small amount of complexity in the THP case, but this code is about to get much simpler when we move the 64-bit PMDs as we can move away from the fancy 32-bit huge PMD encoding and just put a real PTE value in there.

With bug fixes and help from Bob Picco.

Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 37b3a8ff)
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
-
David S. Miller authored
Choose PAGE_OFFSET dynamically based upon cpu type.

Original UltraSPARC-I (spitfire) chips only supported a 44-bit virtual address space. Newer chips (T4 and later) support 52-bit virtual addresses and up to 47-bits of physical memory space. Therefore we have to adjust PAGE_SIZE dynamically based upon the capabilities of the chip.

Note that this change alone does not allow us to support > 43-bit physical memory, to do that we need to re-arrange our page table support. The current encodings of the pmd_t and pgd_t pointers restricts us to "32 + 11" == 43 bits.

This change can waste quite a bit of memory for the various tables. In particular, a future change should work to size and allocate kern_linear_bitmap[] and sparc64_valid_addr_bitmap[] dynamically. This isn't easy as we really cannot take a TLB miss when accessing kern_linear_bitmap[]. We'd have to lock it into the TLB or similar.

Signed-off-by: David S. Miller <davem@davemloft.net>
Acked-by: Bob Picco <bob.picco@oracle.com>
(cherry picked from commit b2d43834)
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
-
David S. Miller authored
Some parts of the code use '41' others use '42', make them all use the same value.

Signed-off-by: David S. Miller <davem@davemloft.net>
Acked-by: Bob Picco <bob.picco@oracle.com>
(cherry picked from commit f998c9c0)
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
-