1. 15 Feb, 2016 22 commits
    • Helge Deller's avatar
      parisc: Fix syscall restarts · 70e2633a
      Helge Deller authored
      commit 71a71fb5 upstream.
      
      On parisc syscalls which are interrupted by signals sometimes failed to
      restart and instead returned -ENOSYS which in the worst case lead to
      userspace crashes.
      A similiar problem existed on MIPS and was fixed by commit e967ef02
      ("MIPS: Fix restart of indirect syscalls").
      
      On parisc the current syscall restart code assumes that all syscall
      callers load the syscall number in the delay slot of the ble
      instruction. That's how it is e.g. done in the unistd.h header file:
      	ble 0x100(%sr2, %r0)
      	ldi #syscall_nr, %r20
      Because of that assumption the current code never restored %r20 before
      returning to userspace.
      
      This assumption is at least not true for code which uses the glibc
      syscall() function, which instead uses this syntax:
      	ble 0x100(%sr2, %r0)
      	copy regX, %r20
      where regX depend on how the compiler optimizes the code and register
      usage.
      
      This patch fixes this problem by adding code to analyze how the syscall
      number is loaded in the delay branch and - if needed - copy the syscall
      number to regX prior returning to userspace for the syscall restart.
      Signed-off-by: default avatarHelge Deller <deller@gmx.de>
      Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      70e2633a
    • Helge Deller's avatar
      parisc: Drop unused MADV_xxxK_PAGES flags from asm/mman.h · 7248da87
      Helge Deller authored
      commit dcbf0d29 upstream.
      
      Drop the MADV_xxK_PAGES flags, which were never used and were from a proposed
      API which was never integrated into the generic Linux kernel code.
      Signed-off-by: default avatarHelge Deller <deller@gmx.de>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      7248da87
    • Andy Leiserson's avatar
      fix calculation of meta_bg descriptor backups · a6abf86b
      Andy Leiserson authored
      commit 904dad47 upstream.
      
      "group" is the group where the backup will be placed, and is
      initialized to zero in the declaration. This meant that backups for
      meta_bg descriptors were erroneously written to the backup block group
      descriptors in groups 1 and (desc_per_block-1).
      
      Reproduction information:
        mke2fs -Fq -t ext4 -b 1024 -O ^resize_inode /tmp/foo.img 16G
        truncate -s 24G /tmp/foo.img
        losetup /dev/loop0 /tmp/foo.img
        mount /dev/loop0 /mnt
        resize2fs /dev/loop0
        umount /dev/loop0
        dd if=/dev/zero of=/dev/loop0 bs=1024 count=2
        e2fsck -fy /dev/loop0
        losetup -d /dev/loop0
      Signed-off-by: default avatarAndy Leiserson <andy@leiserson.org>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      a6abf86b
    • Jan Kara's avatar
      jbd2: Fix unreclaimed pages after truncate in data=journal mode · 7c000068
      Jan Kara authored
      commit bc23f0c8 upstream.
      
      Ted and Namjae have reported that truncated pages don't get timely
      reclaimed after being truncated in data=journal mode. The following test
      triggers the issue easily:
      
      for (i = 0; i < 1000; i++) {
      	pwrite(fd, buf, 1024*1024, 0);
      	fsync(fd);
      	fsync(fd);
      	ftruncate(fd, 0);
      }
      
      The reason is that journal_unmap_buffer() finds that truncated buffers
      are not journalled (jh->b_transaction == NULL), they are part of
      checkpoint list of a transaction (jh->b_cp_transaction != NULL) and have
      been already written out (!buffer_dirty(bh)). We clean such buffers but
      we leave them in the checkpoint list. Since checkpoint transaction holds
      a reference to the journal head, these buffers cannot be released until
      the checkpoint transaction is cleaned up. And at that point we don't
      call release_buffer_page() anymore so pages detached from mapping are
      lingering in the system waiting for reclaim to find them and free them.
      
      Fix the problem by removing buffers from transaction checkpoint lists
      when journal_unmap_buffer() finds out they don't have to be there
      anymore.
      Reported-and-tested-by: default avatarNamjae Jeon <namjae.jeon@samsung.com>
      Fixes: de1b7941Signed-off-by: default avatarJan Kara <jack@suse.cz>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      7c000068
    • Qiu Peiyang's avatar
      tracing: Fix setting of start_index in find_next() · 96ecc0e4
      Qiu Peiyang authored
      commit f36d1be2 upstream.
      
      When we do cat /sys/kernel/debug/tracing/printk_formats, we hit kernel
      panic at t_show.
      
      general protection fault: 0000 [#1] PREEMPT SMP
      CPU: 0 PID: 2957 Comm: sh Tainted: G W  O 3.14.55-x86_64-01062-gd4acdc7 #2
      RIP: 0010:[<ffffffff811375b2>]
       [<ffffffff811375b2>] t_show+0x22/0xe0
      RSP: 0000:ffff88002b4ebe80  EFLAGS: 00010246
      RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000004
      RDX: 0000000000000004 RSI: ffffffff81fd26a6 RDI: ffff880032f9f7b1
      RBP: ffff88002b4ebe98 R08: 0000000000001000 R09: 000000000000ffec
      R10: 0000000000000000 R11: 000000000000000f R12: ffff880004d9b6c0
      R13: 7365725f6d706400 R14: ffff880004d9b6c0 R15: ffffffff82020570
      FS:  0000000000000000(0000) GS:ffff88003aa00000(0063) knlGS:00000000f776bc40
      CS:  0010 DS: 002b ES: 002b CR0: 0000000080050033
      CR2: 00000000f6c02ff0 CR3: 000000002c2b3000 CR4: 00000000001007f0
      Call Trace:
       [<ffffffff811dc076>] seq_read+0x2f6/0x3e0
       [<ffffffff811b749b>] vfs_read+0x9b/0x160
       [<ffffffff811b7f69>] SyS_read+0x49/0xb0
       [<ffffffff81a3a4b9>] ia32_do_call+0x13/0x13
       ---[ end trace 5bd9eb630614861e ]---
      Kernel panic - not syncing: Fatal exception
      
      When the first time find_next calls find_next_mod_format, it should
      iterate the trace_bprintk_fmt_list to find the first print format of
      the module. However in current code, start_index is smaller than *pos
      at first, and code will not iterate the list. Latter container_of will
      get the wrong address with former v, which will cause mod_fmt be a
      meaningless object and so is the returned mod_fmt->fmt.
      
      This patch will fix it by correcting the start_index. After fixed,
      when the first time calls find_next_mod_format, start_index will be
      equal to *pos, and code will iterate the trace_bprintk_fmt_list to
      get the right module printk format, so is the returned mod_fmt->fmt.
      
      Link: http://lkml.kernel.org/r/5684B900.9000309@intel.com
      
      Fixes: 102c9323 "tracing: Add __tracepoint_string() to export string pointers"
      Signed-off-by: default avatarQiu Peiyang <peiyangx.qiu@intel.com>
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      96ecc0e4
    • Boris BREZILLON's avatar
      mtd: mtdpart: fix add_mtd_partitions error path · 2b774589
      Boris BREZILLON authored
      commit e5bae867 upstream.
      
      If we fail to allocate a partition structure in the middle of the partition
      creation process, the already allocated partitions are never removed, which
      means they are still present in the partition list and their resources are
      never freed.
      Signed-off-by: default avatarBoris Brezillon <boris.brezillon@free-electrons.com>
      Signed-off-by: default avatarBrian Norris <computersforpeace@gmail.com>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      2b774589
    • Hon Ching \(Vicky\) Lo's avatar
      vTPM: fix memory allocation flag for rtce buffer at kernel boot · 9100d165
      Hon Ching \(Vicky\) Lo authored
      commit 60ecd86c upstream.
      
      At ibm vtpm initialzation, tpm_ibmvtpm_probe() registers its interrupt
      handler, ibmvtpm_interrupt, which calls ibmvtpm_crq_process to allocate
      memory for rtce buffer.  The current code uses 'GFP_KERNEL' as the
      type of kernel memory allocation, which resulted a warning at
      kernel/lockdep.c.  This patch uses 'GFP_ATOMIC' instead so that the
      allocation is high-priority and does not sleep.
      Signed-off-by: default avatarHon Ching(Vicky) Lo <honclo@linux.vnet.ibm.com>
      Signed-off-by: default avatarPeter Huewe <peterhuewe@gmx.de>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      9100d165
    • Uri Mashiach's avatar
      wlcore/wl12xx: spi: fix NULL pointer dereference (Oops) · d5dfe938
      Uri Mashiach authored
      commit e47301b0 upstream.
      
      Fix the below Oops when trying to modprobe wlcore_spi.
      The oops occurs because the wl1271_power_{off,on}()
      function doesn't check the power() function pointer.
      
      [   23.401447] Unable to handle kernel NULL pointer dereference at
      virtual address 00000000
      [   23.409954] pgd = c0004000
      [   23.412922] [00000000] *pgd=00000000
      [   23.416693] Internal error: Oops: 80000007 [#1] SMP ARM
      [   23.422168] Modules linked in: wl12xx wlcore mac80211 cfg80211
      musb_dsps musb_hdrc usbcore usb_common snd_soc_simple_card evdev joydev
      omap_rng wlcore_spi snd_soc_tlv320aic23_i2c rng_core snd_soc_tlv320aic23
      c_can_platform c_can can_dev snd_soc_davinci_mcasp snd_soc_edma
      snd_soc_omap omap_wdt musb_am335x cpufreq_dt thermal_sys hwmon
      [   23.453253] CPU: 0 PID: 36 Comm: kworker/0:2 Not tainted
      4.2.0-00002-g951efee-dirty #233
      [   23.461720] Hardware name: Generic AM33XX (Flattened Device Tree)
      [   23.468123] Workqueue: events request_firmware_work_func
      [   23.473690] task: de32efc0 ti: de4ee000 task.ti: de4ee000
      [   23.479341] PC is at 0x0
      [   23.482112] LR is at wl12xx_set_power_on+0x28/0x124 [wlcore]
      [   23.488074] pc : [<00000000>]    lr : [<bf2581f0>]    psr: 60000013
      [   23.488074] sp : de4efe50  ip : 00000002  fp : 00000000
      [   23.500162] r10: de7cdd00  r9 : dc848800  r8 : bf27af00
      [   23.505663] r7 : bf27a1a8  r6 : dcbd8a80  r5 : dce0e2e0  r4 :
      dce0d2e0
      [   23.512536] r3 : 00000000  r2 : 00000000  r1 : 00000001  r0 :
      dc848810
      [   23.519412] Flags: nZCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM
      Segment kernel
      [   23.527109] Control: 10c5387d  Table: 9cb78019  DAC: 00000015
      [   23.533160] Process kworker/0:2 (pid: 36, stack limit = 0xde4ee218)
      [   23.539760] Stack: (0xde4efe50 to 0xde4f0000)
      
      [...]
      
      [   23.665030] [<bf2581f0>] (wl12xx_set_power_on [wlcore]) from
      [<bf25f7ac>] (wlcore_nvs_cb+0x118/0xa4c [wlcore])
      [   23.675604] [<bf25f7ac>] (wlcore_nvs_cb [wlcore]) from [<c04387ec>]
      (request_firmware_work_func+0x30/0x58)
      [   23.685784] [<c04387ec>] (request_firmware_work_func) from
      [<c0058e2c>] (process_one_work+0x1b4/0x4b4)
      [   23.695591] [<c0058e2c>] (process_one_work) from [<c0059168>]
      (worker_thread+0x3c/0x4a4)
      [   23.704124] [<c0059168>] (worker_thread) from [<c005ee68>]
      (kthread+0xd4/0xf0)
      [   23.711747] [<c005ee68>] (kthread) from [<c000f598>]
      (ret_from_fork+0x14/0x3c)
      [   23.719357] Code: bad PC value
      [   23.722760] ---[ end trace 981be8510db9b3a9 ]---
      
      Prevent oops by validationg power() pointer value before
      calling the function.
      Signed-off-by: default avatarUri Mashiach <uri.mashiach@compulab.co.il>
      Acked-by: default avatarIgor Grinberg <grinberg@compulab.co.il>
      Signed-off-by: default avatarKalle Valo <kvalo@codeaurora.org>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      d5dfe938
    • Uri Mashiach's avatar
      wlcore/wl12xx: spi: fix oops on firmware load · eb0c82e1
      Uri Mashiach authored
      commit 9b2761cb upstream.
      
      The maximum chunks used by the function is
      (SPI_AGGR_BUFFER_SIZE / WSPI_MAX_CHUNK_SIZE + 1).
      The original commands array had space for
      (SPI_AGGR_BUFFER_SIZE / WSPI_MAX_CHUNK_SIZE) commands.
      When the last chunk is used (len > 4 * WSPI_MAX_CHUNK_SIZE), the last
      command is stored outside the bounds of the commands array.
      
      Oops 5 (page fault) is generated during current wl1271 firmware load
      attempt:
      
      root@debian-armhf:~# ifconfig wlan0 up
      [  294.312399] Unable to handle kernel paging request at virtual address
      00203fc4
      [  294.320173] pgd = de528000
      [  294.323028] [00203fc4] *pgd=00000000
      [  294.326916] Internal error: Oops: 5 [#1] SMP ARM
      [  294.331789] Modules linked in: bnep rfcomm bluetooth ipv6 arc4 wl12xx
      wlcore mac80211 musb_dsps cfg80211 musb_hdrc usbcore usb_common
      wlcore_spi omap_rng rng_core musb_am335x omap_wdt cpufreq_dt thermal_sys
      hwmon
      [  294.351838] CPU: 0 PID: 1827 Comm: ifconfig Not tainted
      4.2.0-00002-g3e9ad27-dirty #78
      [  294.360154] Hardware name: Generic AM33XX (Flattened Device Tree)
      [  294.366557] task: dc9d6d40 ti: de550000 task.ti: de550000
      [  294.372236] PC is at __spi_validate+0xa8/0x2ac
      [  294.376902] LR is at __spi_sync+0x78/0x210
      [  294.381200] pc : [<c049c760>]    lr : [<c049ebe0>]    psr: 60000013
      [  294.381200] sp : de551998  ip : de5519d8  fp : 00200000
      [  294.393242] r10: de551c8c  r9 : de5519d8  r8 : de3a9000
      [  294.398730] r7 : de3a9258  r6 : de3a9400  r5 : de551a48  r4 :
      00203fbc
      [  294.405577] r3 : 00000000  r2 : 00000000  r1 : 00000000  r0 :
      de3a9000
      [  294.412420] Flags: nZCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM
      Segment user
      [  294.419918] Control: 10c5387d  Table: 9e528019  DAC: 00000015
      [  294.425954] Process ifconfig (pid: 1827, stack limit = 0xde550218)
      [  294.432437] Stack: (0xde551998 to 0xde552000)
      
      ...
      
      [  294.883613] [<c049c760>] (__spi_validate) from [<c049ebe0>]
      (__spi_sync+0x78/0x210)
      [  294.891670] [<c049ebe0>] (__spi_sync) from [<bf036598>]
      (wl12xx_spi_raw_write+0xfc/0x148 [wlcore_spi])
      [  294.901661] [<bf036598>] (wl12xx_spi_raw_write [wlcore_spi]) from
      [<bf21c694>] (wlcore_boot_upload_firmware+0x1ec/0x458 [wlcore])
      [  294.914038] [<bf21c694>] (wlcore_boot_upload_firmware [wlcore]) from
      [<bf24532c>] (wl12xx_boot+0xc10/0xfac [wl12xx])
      [  294.925161] [<bf24532c>] (wl12xx_boot [wl12xx]) from [<bf20d5cc>]
      (wl1271_op_add_interface+0x5b0/0x910 [wlcore])
      [  294.936364] [<bf20d5cc>] (wl1271_op_add_interface [wlcore]) from
      [<bf15c4ac>] (ieee80211_do_open+0x44c/0xf7c [mac80211])
      [  294.947963] [<bf15c4ac>] (ieee80211_do_open [mac80211]) from
      [<c0537978>] (__dev_open+0xa8/0x110)
      [  294.957307] [<c0537978>] (__dev_open) from [<c0537bf8>]
      (__dev_change_flags+0x88/0x148)
      [  294.965713] [<c0537bf8>] (__dev_change_flags) from [<c0537cd0>]
      (dev_change_flags+0x18/0x48)
      [  294.974576] [<c0537cd0>] (dev_change_flags) from [<c05a55a0>]
      (devinet_ioctl+0x6b4/0x7d0)
      [  294.983191] [<c05a55a0>] (devinet_ioctl) from [<c0517040>]
      (sock_ioctl+0x1e4/0x2bc)
      [  294.991244] [<c0517040>] (sock_ioctl) from [<c017d378>]
      (do_vfs_ioctl+0x420/0x6b0)
      [  294.999208] [<c017d378>] (do_vfs_ioctl) from [<c017d674>]
      (SyS_ioctl+0x6c/0x7c)
      [  295.006880] [<c017d674>] (SyS_ioctl) from [<c000f4c0>]
      (ret_fast_syscall+0x0/0x54)
      [  295.014835] Code: e1550004 e2444034 0a00007d e5953018 (e5942008)
      [  295.021544] ---[ end trace 66ed188198f4e24e ]---
      Signed-off-by: default avatarUri Mashiach <uri.mashiach@compulab.co.il>
      Acked-by: default avatarIgor Grinberg <grinberg@compulab.co.il>
      Signed-off-by: default avatarKalle Valo <kvalo@codeaurora.org>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      eb0c82e1
    • Johan Hovold's avatar
      spi: fix parent-device reference leak · 9671d925
      Johan Hovold authored
      commit 157f38f9 upstream.
      
      Fix parent-device reference leak due to SPI-core taking an unnecessary
      reference to the parent when allocating the master structure, a
      reference that was never released.
      
      Note that driver core takes its own reference to the parent when the
      master device is registered.
      
      Fixes: 49dce689 ("spi doesn't need class_device")
      Signed-off-by: default avatarJohan Hovold <johan@kernel.org>
      Signed-off-by: default avatarMark Brown <broonie@kernel.org>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      9671d925
    • Vignesh R's avatar
      spi: ti-qspi: Fix data corruption seen on r/w stress test · 3b8d4228
      Vignesh R authored
      commit bc27a539 upstream.
      
      Writing invalid command to QSPI_SPI_CMD_REG will terminate current
      transfer and de-assert the chip select. This has to be done before
      calling spi_finalize_current_message(). Because
      spi_finalize_current_message() will mark the end of current message
      transfer and schedule the next transfer. If the chipselect is not
      de-asserted before calling spi_finalize_current_message() then the next
      transfer will overlap with the previous transfer leading to data
      corruption.
      __spi_pump_message() can be called either from kthread worker context or
      directly from the calling process's context. It is possible that these
      two calls can race against each other. But race is serialized by
      checking whether master->cur_msg == NULL (pointer to msg being handled
      by transfer_one() at present). The master->cur_msg is set to NULL when
      spi_finalize_current_message() is called on that message, which means
      calling spi_finalize_current_message() allows __spi_sync() to pump next
      message in calling process context.
      Now if spi-ti-qspi calls spi_finalize_current_message() before we
      terminate transfer at hardware side, if __spi_pump_message() is called
      from process context then the successive transactions can overlap.
      
      Fix this by moving writing invalid command to QSPI_SPI_CMD_REG to
      before calling spi_finalize_current_message() call.
      Signed-off-by: default avatarVignesh R <vigneshr@ti.com>
      Signed-off-by: default avatarMark Brown <broonie@kernel.org>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      3b8d4228
    • David Mosberger-Tang's avatar
      spi: atmel: Fix DMA-setup for transfers with more than 8 bits per word · faed3a9b
      David Mosberger-Tang authored
      commit 06515f83 upstream.
      
      The DMA-slave configuration depends on the whether <= 8 or > 8 bits
      are transferred per word, so we need to call
      atmel_spi_dma_slave_config() with the correct value.
      Signed-off-by: default avatarDavid Mosberger <davidm@egauge.net>
      Signed-off-by: default avatarNicolas Ferre <nicolas.ferre@atmel.com>
      Signed-off-by: default avatarMark Brown <broonie@kernel.org>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      faed3a9b
    • Mauricio Faria de Oliveira's avatar
      Revert "dm mpath: fix stalls when handling invalid ioctls" · ec3b2cdc
      Mauricio Faria de Oliveira authored
      commit 47796938 upstream.
      
      This reverts commit a1989b33.
      
      That commit introduced a regression at least for the case of the SG_IO ioctl()
      running without CAP_SYS_RAWIO capability (e.g., unprivileged users) when there
      are no active paths: the ioctl() fails with the ENOTTY errno immediately rather
      than blocking due to queue_if_no_path until a path becomes active, for example.
      
      That case happens to be exercised by QEMU KVM guests with 'scsi-block' devices
      (qemu "-device scsi-block" [1], libvirt "<disk type='block' device='lun'>" [2])
      from multipath devices; which leads to SCSI/filesystem errors in such a guest.
      
      More general scenarios can hit that regression too. The following demonstration
      employs a SG_IO ioctl() with a standard SCSI INQUIRY command for this objective
      (some output & user changes omitted for brevity and comments added for clarity).
      
      Reverting that commit restores normal operation (queueing) in failing scenarios;
      tested on linux-next (next-20151022).
      
      1) Test-case is based on sg_simple0 [3] (just SG_IO; remove SG_GET_VERSION_NUM)
      
          $ cat sg_simple0.c
          ... see [3] ...
          $ sed '/SG_GET_VERSION_NUM/,/}/d' sg_simple0.c > sgio_inquiry.c
          $ gcc sgio_inquiry.c -o sgio_inquiry
      
      2) The ioctl() works fine with active paths present.
      
          # multipath -l 85ag56
          85ag56 (...) dm-19 IBM     ,2145
          size=60G features='1 queue_if_no_path' hwhandler='0' wp=rw
          |-+- policy='service-time 0' prio=0 status=active
          | |- 8:0:11:0  sdz  65:144  active undef running
          | `- 9:0:9:0   sdbf 67:144  active undef running
          `-+- policy='service-time 0' prio=0 status=enabled
            |- 8:0:12:0  sdae 65:224  active undef running
            `- 9:0:12:0  sdbo 68:32   active undef running
      
          $ ./sgio_inquiry /dev/mapper/85ag56
          Some of the INQUIRY command's response:
              IBM       2145              0000
          INQUIRY duration=0 millisecs, resid=0
      
      3) The ioctl() fails with ENOTTY errno with _no_ active paths present,
         for unprivileged users (rather than blocking due to queue_if_no_path).
      
          # for path in $(multipath -l 85ag56 | grep -o 'sd[a-z]\+'); \
                do multipathd -k"fail path $path"; done
      
          # multipath -l 85ag56
          85ag56 (...) dm-19 IBM     ,2145
          size=60G features='1 queue_if_no_path' hwhandler='0' wp=rw
          |-+- policy='service-time 0' prio=0 status=enabled
          | |- 8:0:11:0  sdz  65:144  failed undef running
          | `- 9:0:9:0   sdbf 67:144  failed undef running
          `-+- policy='service-time 0' prio=0 status=enabled
            |- 8:0:12:0  sdae 65:224  failed undef running
            `- 9:0:12:0  sdbo 68:32   failed undef running
      
          $ ./sgio_inquiry /dev/mapper/85ag56
          sg_simple0: Inquiry SG_IO ioctl error: Inappropriate ioctl for device
      
      4) dmesg shows that scsi_verify_blk_ioctl() failed for SG_IO (0x2285);
         it returns -ENOIOCTLCMD, later replaced with -ENOTTY in vfs_ioctl().
      
          $ dmesg
          <...>
          [] device-mapper: multipath: Failing path 65:144.
          [] device-mapper: multipath: Failing path 67:144.
          [] device-mapper: multipath: Failing path 65:224.
          [] device-mapper: multipath: Failing path 68:32.
          [] sgio_inquiry: sending ioctl 2285 to a partition!
      
      5) The ioctl() only works if the SYS_CAP_RAWIO capability is present
         (then queueing happens -- in this example, queue_if_no_path is set);
         this is due to a conditional check in scsi_verify_blk_ioctl().
      
          # capsh --drop=cap_sys_rawio -- -c './sgio_inquiry /dev/mapper/85ag56'
          sg_simple0: Inquiry SG_IO ioctl error: Inappropriate ioctl for device
      
          # ./sgio_inquiry /dev/mapper/85ag56 &
          [1] 72830
      
          # cat /proc/72830/stack
          [<c00000171c0df700>] 0xc00000171c0df700
          [<c000000000015934>] __switch_to+0x204/0x350
          [<c000000000152d4c>] msleep+0x5c/0x80
          [<c00000000077dfb0>] dm_blk_ioctl+0x70/0x170
          [<c000000000487c40>] blkdev_ioctl+0x2b0/0x9b0
          [<c0000000003128e4>] block_ioctl+0x64/0xd0
          [<c0000000002dd3b0>] do_vfs_ioctl+0x490/0x780
          [<c0000000002dd774>] SyS_ioctl+0xd4/0xf0
          [<c000000000009358>] system_call+0x38/0xd0
      
      6) This is the function call chain exercised in this analysis:
      
      SYSCALL_DEFINE3(ioctl, <...>) @ fs/ioctl.c
          -> do_vfs_ioctl()
              -> vfs_ioctl()
                  ...
                  error = filp->f_op->unlocked_ioctl(filp, cmd, arg);
                  ...
                      -> dm_blk_ioctl() @ drivers/md/dm.c
                          -> multipath_ioctl() @ drivers/md/dm-mpath.c
                              ...
                              (bdev = NULL, due to no active paths)
                              ...
                              if (!bdev || <...>) {
                                  int err = scsi_verify_blk_ioctl(NULL, cmd);
                                  if (err)
                                      r = err;
                              }
                              ...
                                  -> scsi_verify_blk_ioctl() @ block/scsi_ioctl.c
                                      ...
                                      if (bd && bd == bd->bd_contains) // not taken (bd = NULL)
                                          return 0;
                                      ...
                                      if (capable(CAP_SYS_RAWIO)) // not taken (unprivileged user)
                                          return 0;
                                      ...
                                      printk_ratelimited(KERN_WARNING
                                                 "%s: sending ioctl %x to a partition!\n" <...>);
      
                                      return -ENOIOCTLCMD;
                                  <-
                              ...
                              return r ? : <...>
                          <-
                  ...
                  if (error == -ENOIOCTLCMD)
                      error = -ENOTTY;
                   out:
                      return error;
                  ...
      
      Links:
      [1] http://git.qemu.org/?p=qemu.git;a=commit;h=336a6915bc7089fb20fea4ba99972ad9a97c5f52
      [2] https://libvirt.org/formatdomain.html#elementsDisks (see 'disk' -> 'device')
      [3] http://tldp.org/HOWTO/SCSI-Generic-HOWTO/pexample.html (Revision 1.2, 2002-05-03)
      Signed-off-by: default avatarMauricio Faria de Oliveira <mauricfo@linux.vnet.ibm.com>
      Signed-off-by: default avatarMike Snitzer <snitzer@redhat.com>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      ec3b2cdc
    • Dmitry V. Levin's avatar
      sh64: fix __NR_fgetxattr · afa6ea6a
      Dmitry V. Levin authored
      commit 2d33fa10 upstream.
      
      According to arch/sh/kernel/syscalls_64.S and common sense, __NR_fgetxattr
      has to be defined to 259, but it doesn't.  Instead, it's defined to 269,
      which is of course used by another syscall, __NR_sched_setaffinity in this
      case.
      
      This bug was found by strace test suite.
      Signed-off-by: default avatarDmitry V. Levin <ldv@altlinux.org>
      Acked-by: default avatarGeert Uytterhoeven <geert+renesas@glider.be>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      afa6ea6a
    • xuejiufei's avatar
      ocfs2/dlm: clear refmap bit of recovery lock while doing local recovery cleanup · 4e7094d5
      xuejiufei authored
      commit c95a5180 upstream.
      
      When recovery master down, dlm_do_local_recovery_cleanup() only remove
      the $RECOVERY lock owned by dead node, but do not clear the refmap bit.
      Which will make umount thread falling in dead loop migrating $RECOVERY
      to the dead node.
      Signed-off-by: default avatarxuejiufei <xuejiufei@huawei.com>
      Reviewed-by: default avatarJoseph Qi <joseph.qi@huawei.com>
      Cc: Mark Fasheh <mfasheh@suse.de>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Junxiao Bi <junxiao.bi@oracle.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      4e7094d5
    • xuejiufei's avatar
      ocfs2/dlm: ignore cleaning the migration mle that is inuse · 327f9703
      xuejiufei authored
      commit bef5502d upstream.
      
      We have found that migration source will trigger a BUG that the refcount
      of mle is already zero before put when the target is down during
      migration.  The situation is as follows:
      
      dlm_migrate_lockres
        dlm_add_migration_mle
        dlm_mark_lockres_migrating
        dlm_get_mle_inuse
        <<<<<< Now the refcount of the mle is 2.
        dlm_send_one_lockres and wait for the target to become the
        new master.
        <<<<<< o2hb detect the target down and clean the migration
        mle. Now the refcount is 1.
      
      dlm_migrate_lockres woken, and put the mle twice when found the target
      goes down which trigger the BUG with the following message:
      
        "ERROR: bad mle: ".
      Signed-off-by: default avatarJiufei Xue <xuejiufei@huawei.com>
      Reviewed-by: default avatarJoseph Qi <joseph.qi@huawei.com>
      Cc: Mark Fasheh <mfasheh@suse.de>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Junxiao Bi <junxiao.bi@oracle.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      327f9703
    • Richard Weinberger's avatar
      kernel/signal.c: unexport sigsuspend() · 562b5d5e
      Richard Weinberger authored
      commit 9d8a7652 upstream.
      
      sigsuspend() is nowhere used except in signal.c itself, so we can mark it
      static do not pollute the global namespace.
      
      But this patch is more than a boring cleanup patch, it fixes a real issue
      on UserModeLinux.  UML has a special console driver to display ttys using
      xterm, or other terminal emulators, on the host side.  Vegard reported
      that sometimes UML is unable to spawn a xterm and he's facing the
      following warning:
      
        WARNING: CPU: 0 PID: 908 at include/linux/thread_info.h:128 sigsuspend+0xab/0xc0()
      
      It turned out that this warning makes absolutely no sense as the UML
      xterm code calls sigsuspend() on the host side, at least it tries.  But
      as the kernel itself offers a sigsuspend() symbol the linker choose this
      one instead of the glibc wrapper.  Interestingly this code used to work
      since ever but always blocked signals on the wrong side.  Some recent
      kernel change made the WARN_ON() trigger and uncovered the bug.
      
      It is a wonderful example of how much works by chance on computers. :-)
      
      Fixes: 68f3f16d ("new helper: sigsuspend()")
      Signed-off-by: default avatarRichard Weinberger <richard@nod.at>
      Reported-by: default avatarVegard Nossum <vegard.nossum@oracle.com>
      Tested-by: default avatarVegard Nossum <vegard.nossum@oracle.com>
      Acked-by: default avatarOleg Nesterov <oleg@redhat.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      562b5d5e
    • OGAWA Hirofumi's avatar
      fat: fix fake_offset handling on error path · 8ffd304f
      OGAWA Hirofumi authored
      commit 928a4771 upstream.
      
      For the root directory, .  and ..  are faked (using dir_emit_dots()) and
      ctx->pos is reset from 2 to 0.
      
      A corrupted root directory could cause fat_get_entry() to fail, but
      ->iterate() (fat_readdir()) reports progress to the VFS (with ctx->pos
      rewound to 0), so any following calls to ->iterate() continue to return
      the same entries again and again.
      
      The result is that userspace will never see the end of the directory,
      causing e.g.  'ls' to hang in a getdents() loop.
      
      [hirofumi@mail.parknet.co.jp: cleanup and make sure to correct fake_offset]
      Reported-by: default avatarVegard Nossum <vegard.nossum@oracle.com>
      Tested-by: default avatarVegard Nossum <vegard.nossum@oracle.com>
      Signed-off-by: default avatarRichard Weinberger <richard.weinberger@gmail.com>
      Signed-off-by: default avatarOGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      8ffd304f
    • Arnd Bergmann's avatar
      remoteproc: avoid stack overflow in debugfs file · 881aa623
      Arnd Bergmann authored
      commit 92792e48 upstream.
      
      Recent gcc versions warn about reading from a negative offset of
      an on-stack array:
      
      drivers/remoteproc/remoteproc_debugfs.c: In function 'rproc_recovery_write':
      drivers/remoteproc/remoteproc_debugfs.c:167:9: warning: 'buf[4294967295u]' may be used uninitialized in this function [-Wmaybe-uninitialized]
      
      I don't see anything in sys_write() that prevents us from
      being called with a zero 'count' argument, so we should
      add an extra check in rproc_recovery_write() to prevent the
      access and avoid the warning.
      Signed-off-by: default avatarArnd Bergmann <arnd@arndb.de>
      Fixes: 2e37abb8 ("remoteproc: create a 'recovery' debugfs entry")
      Signed-off-by: default avatarOhad Ben-Cohen <ohad@wizery.com>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      881aa623
    • Oleg Nesterov's avatar
      proc: actually make proc_fd_permission() thread-friendly · 71755056
      Oleg Nesterov authored
      commit 54708d28 upstream.
      
      The commit 96d0df79 ("proc: make proc_fd_permission() thread-friendly")
      fixed the access to /proc/self/fd from sub-threads, but introduced another
      problem: a sub-thread can't access /proc/<tid>/fd/ or /proc/thread-self/fd
      if generic_permission() fails.
      
      Change proc_fd_permission() to check same_thread_group(pid_task(), current).
      
      Fixes: 96d0df79 ("proc: make proc_fd_permission() thread-friendly")
      Reported-by: default avatar"Jin, Yihua" <yihua.jin@intel.com>
      Signed-off-by: default avatarOleg Nesterov <oleg@redhat.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      71755056
    • Jiri Slaby's avatar
      Revert "ocfs2: fix umask ignored issue" · 9d7b1354
      Jiri Slaby authored
      This reverts commit e1f20b83, upstream
      commit 8f1eb487. This commit fixes
      702e5bc6 ("ocfs2: use generic posix ACL infrastructure"), which is only
      in 3.14.
      
      So this commit should have never been applied to 3.12 and it can
      cause sgid inheritance issues.
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      9d7b1354
    • Ben Hutchings's avatar
      pipe: Fix buffer offset after partially failed read · 6316a3e9
      Ben Hutchings authored
      Quoting the RHEL advisory:
      
      > It was found that the fix for CVE-2015-1805 incorrectly kept buffer
      > offset and buffer length in sync on a failed atomic read, potentially
      > resulting in a pipe buffer state corruption. A local, unprivileged user
      > could use this flaw to crash the system or leak kernel memory to user
      > space. (CVE-2016-0774, Moderate)
      
      The same flawed fix was applied to stable branches from 2.6.32.y to
      3.14.y inclusive, and I was able to reproduce the issue on 3.2.y.
      We need to give pipe_iov_copy_to_user() a separate offset variable
      and only update the buffer offset if it succeeds.
      
      References: https://rhn.redhat.com/errata/RHSA-2016-0103.htmlSigned-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      6316a3e9
  2. 12 Feb, 2016 18 commits