1. 17 Feb, 2016 25 commits
    • Jan Kara's avatar
      jbd2: Fix unreclaimed pages after truncate in data=journal mode · 0d7e2f0d
      Jan Kara authored
      commit bc23f0c8 upstream.
      
      Ted and Namjae have reported that truncated pages don't get timely
      reclaimed after being truncated in data=journal mode. The following test
      triggers the issue easily:
      
      for (i = 0; i < 1000; i++) {
      	pwrite(fd, buf, 1024*1024, 0);
      	fsync(fd);
      	fsync(fd);
      	ftruncate(fd, 0);
      }
      
      The reason is that journal_unmap_buffer() finds that truncated buffers
      are not journalled (jh->b_transaction == NULL), they are part of
      checkpoint list of a transaction (jh->b_cp_transaction != NULL) and have
      been already written out (!buffer_dirty(bh)). We clean such buffers but
      we leave them in the checkpoint list. Since checkpoint transaction holds
      a reference to the journal head, these buffers cannot be released until
      the checkpoint transaction is cleaned up. And at that point we don't
      call release_buffer_page() anymore so pages detached from mapping are
      lingering in the system waiting for reclaim to find them and free them.
      
      Fix the problem by removing buffers from transaction checkpoint lists
      when journal_unmap_buffer() finds out they don't have to be there
      anymore.
      Reported-and-tested-by: default avatarNamjae Jeon <namjae.jeon@samsung.com>
      Fixes: de1b7941Signed-off-by: default avatarJan Kara <jack@suse.cz>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      0d7e2f0d
    • Qiu Peiyang's avatar
      tracing: Fix setting of start_index in find_next() · da9165f6
      Qiu Peiyang authored
      commit f36d1be2 upstream.
      
      When we do cat /sys/kernel/debug/tracing/printk_formats, we hit kernel
      panic at t_show.
      
      general protection fault: 0000 [#1] PREEMPT SMP
      CPU: 0 PID: 2957 Comm: sh Tainted: G W  O 3.14.55-x86_64-01062-gd4acdc7 #2
      RIP: 0010:[<ffffffff811375b2>]
       [<ffffffff811375b2>] t_show+0x22/0xe0
      RSP: 0000:ffff88002b4ebe80  EFLAGS: 00010246
      RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000004
      RDX: 0000000000000004 RSI: ffffffff81fd26a6 RDI: ffff880032f9f7b1
      RBP: ffff88002b4ebe98 R08: 0000000000001000 R09: 000000000000ffec
      R10: 0000000000000000 R11: 000000000000000f R12: ffff880004d9b6c0
      R13: 7365725f6d706400 R14: ffff880004d9b6c0 R15: ffffffff82020570
      FS:  0000000000000000(0000) GS:ffff88003aa00000(0063) knlGS:00000000f776bc40
      CS:  0010 DS: 002b ES: 002b CR0: 0000000080050033
      CR2: 00000000f6c02ff0 CR3: 000000002c2b3000 CR4: 00000000001007f0
      Call Trace:
       [<ffffffff811dc076>] seq_read+0x2f6/0x3e0
       [<ffffffff811b749b>] vfs_read+0x9b/0x160
       [<ffffffff811b7f69>] SyS_read+0x49/0xb0
       [<ffffffff81a3a4b9>] ia32_do_call+0x13/0x13
       ---[ end trace 5bd9eb630614861e ]---
      Kernel panic - not syncing: Fatal exception
      
      When the first time find_next calls find_next_mod_format, it should
      iterate the trace_bprintk_fmt_list to find the first print format of
      the module. However in current code, start_index is smaller than *pos
      at first, and code will not iterate the list. Latter container_of will
      get the wrong address with former v, which will cause mod_fmt be a
      meaningless object and so is the returned mod_fmt->fmt.
      
      This patch will fix it by correcting the start_index. After fixed,
      when the first time calls find_next_mod_format, start_index will be
      equal to *pos, and code will iterate the trace_bprintk_fmt_list to
      get the right module printk format, so is the returned mod_fmt->fmt.
      
      Link: http://lkml.kernel.org/r/5684B900.9000309@intel.com
      
      Fixes: 102c9323 "tracing: Add __tracepoint_string() to export string pointers"
      Signed-off-by: default avatarQiu Peiyang <peiyangx.qiu@intel.com>
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      da9165f6
    • Christoph Biedl's avatar
      PCI: Fix minimum allocation address overwrite · f65088dd
      Christoph Biedl authored
      commit 3460baa6 upstream.
      
      Commit 36e097a8 ("PCI: Split out bridge window override of minimum
      allocation address") claimed to do no functional changes but unfortunately
      did: The "min" variable is altered.  At least the AVM A1 PCMCIA adapter was
      no longer detected, breaking ISDN operation.
      
      Use a local copy of "min" to restore the previous behaviour.
      
      [bhelgaas: avoid gcc "?:" extension for portability and readability]
      Fixes: 36e097a8 ("PCI: Split out bridge window override of minimum allocation address")
      Signed-off-by: default avatarChristoph Biedl <linux-kernel.bfrz@manchmal.in-ulm.de>
      Signed-off-by: default avatarBjorn Helgaas <bhelgaas@google.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f65088dd
    • Boris BREZILLON's avatar
      mtd: mtdpart: fix add_mtd_partitions error path · 6188cfb9
      Boris BREZILLON authored
      commit e5bae867 upstream.
      
      If we fail to allocate a partition structure in the middle of the partition
      creation process, the already allocated partitions are never removed, which
      means they are still present in the partition list and their resources are
      never freed.
      Signed-off-by: default avatarBoris Brezillon <boris.brezillon@free-electrons.com>
      Signed-off-by: default avatarBrian Norris <computersforpeace@gmail.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      6188cfb9
    • Hon Ching \(Vicky\) Lo's avatar
      vTPM: fix memory allocation flag for rtce buffer at kernel boot · 20605785
      Hon Ching \(Vicky\) Lo authored
      commit 60ecd86c upstream.
      
      At ibm vtpm initialzation, tpm_ibmvtpm_probe() registers its interrupt
      handler, ibmvtpm_interrupt, which calls ibmvtpm_crq_process to allocate
      memory for rtce buffer.  The current code uses 'GFP_KERNEL' as the
      type of kernel memory allocation, which resulted a warning at
      kernel/lockdep.c.  This patch uses 'GFP_ATOMIC' instead so that the
      allocation is high-priority and does not sleep.
      Signed-off-by: default avatarHon Ching(Vicky) Lo <honclo@linux.vnet.ibm.com>
      Signed-off-by: default avatarPeter Huewe <peterhuewe@gmx.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      20605785
    • Uri Mashiach's avatar
      wlcore/wl12xx: spi: fix NULL pointer dereference (Oops) · 38a2d810
      Uri Mashiach authored
      commit e47301b0 upstream.
      
      Fix the below Oops when trying to modprobe wlcore_spi.
      The oops occurs because the wl1271_power_{off,on}()
      function doesn't check the power() function pointer.
      
      [   23.401447] Unable to handle kernel NULL pointer dereference at
      virtual address 00000000
      [   23.409954] pgd = c0004000
      [   23.412922] [00000000] *pgd=00000000
      [   23.416693] Internal error: Oops: 80000007 [#1] SMP ARM
      [   23.422168] Modules linked in: wl12xx wlcore mac80211 cfg80211
      musb_dsps musb_hdrc usbcore usb_common snd_soc_simple_card evdev joydev
      omap_rng wlcore_spi snd_soc_tlv320aic23_i2c rng_core snd_soc_tlv320aic23
      c_can_platform c_can can_dev snd_soc_davinci_mcasp snd_soc_edma
      snd_soc_omap omap_wdt musb_am335x cpufreq_dt thermal_sys hwmon
      [   23.453253] CPU: 0 PID: 36 Comm: kworker/0:2 Not tainted
      4.2.0-00002-g951efee-dirty #233
      [   23.461720] Hardware name: Generic AM33XX (Flattened Device Tree)
      [   23.468123] Workqueue: events request_firmware_work_func
      [   23.473690] task: de32efc0 ti: de4ee000 task.ti: de4ee000
      [   23.479341] PC is at 0x0
      [   23.482112] LR is at wl12xx_set_power_on+0x28/0x124 [wlcore]
      [   23.488074] pc : [<00000000>]    lr : [<bf2581f0>]    psr: 60000013
      [   23.488074] sp : de4efe50  ip : 00000002  fp : 00000000
      [   23.500162] r10: de7cdd00  r9 : dc848800  r8 : bf27af00
      [   23.505663] r7 : bf27a1a8  r6 : dcbd8a80  r5 : dce0e2e0  r4 :
      dce0d2e0
      [   23.512536] r3 : 00000000  r2 : 00000000  r1 : 00000001  r0 :
      dc848810
      [   23.519412] Flags: nZCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM
      Segment kernel
      [   23.527109] Control: 10c5387d  Table: 9cb78019  DAC: 00000015
      [   23.533160] Process kworker/0:2 (pid: 36, stack limit = 0xde4ee218)
      [   23.539760] Stack: (0xde4efe50 to 0xde4f0000)
      
      [...]
      
      [   23.665030] [<bf2581f0>] (wl12xx_set_power_on [wlcore]) from
      [<bf25f7ac>] (wlcore_nvs_cb+0x118/0xa4c [wlcore])
      [   23.675604] [<bf25f7ac>] (wlcore_nvs_cb [wlcore]) from [<c04387ec>]
      (request_firmware_work_func+0x30/0x58)
      [   23.685784] [<c04387ec>] (request_firmware_work_func) from
      [<c0058e2c>] (process_one_work+0x1b4/0x4b4)
      [   23.695591] [<c0058e2c>] (process_one_work) from [<c0059168>]
      (worker_thread+0x3c/0x4a4)
      [   23.704124] [<c0059168>] (worker_thread) from [<c005ee68>]
      (kthread+0xd4/0xf0)
      [   23.711747] [<c005ee68>] (kthread) from [<c000f598>]
      (ret_from_fork+0x14/0x3c)
      [   23.719357] Code: bad PC value
      [   23.722760] ---[ end trace 981be8510db9b3a9 ]---
      
      Prevent oops by validationg power() pointer value before
      calling the function.
      Signed-off-by: default avatarUri Mashiach <uri.mashiach@compulab.co.il>
      Acked-by: default avatarIgor Grinberg <grinberg@compulab.co.il>
      Signed-off-by: default avatarKalle Valo <kvalo@codeaurora.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      38a2d810
    • Uri Mashiach's avatar
      wlcore/wl12xx: spi: fix oops on firmware load · 2cdbd2b0
      Uri Mashiach authored
      commit 9b2761cb upstream.
      
      The maximum chunks used by the function is
      (SPI_AGGR_BUFFER_SIZE / WSPI_MAX_CHUNK_SIZE + 1).
      The original commands array had space for
      (SPI_AGGR_BUFFER_SIZE / WSPI_MAX_CHUNK_SIZE) commands.
      When the last chunk is used (len > 4 * WSPI_MAX_CHUNK_SIZE), the last
      command is stored outside the bounds of the commands array.
      
      Oops 5 (page fault) is generated during current wl1271 firmware load
      attempt:
      
      root@debian-armhf:~# ifconfig wlan0 up
      [  294.312399] Unable to handle kernel paging request at virtual address
      00203fc4
      [  294.320173] pgd = de528000
      [  294.323028] [00203fc4] *pgd=00000000
      [  294.326916] Internal error: Oops: 5 [#1] SMP ARM
      [  294.331789] Modules linked in: bnep rfcomm bluetooth ipv6 arc4 wl12xx
      wlcore mac80211 musb_dsps cfg80211 musb_hdrc usbcore usb_common
      wlcore_spi omap_rng rng_core musb_am335x omap_wdt cpufreq_dt thermal_sys
      hwmon
      [  294.351838] CPU: 0 PID: 1827 Comm: ifconfig Not tainted
      4.2.0-00002-g3e9ad27-dirty #78
      [  294.360154] Hardware name: Generic AM33XX (Flattened Device Tree)
      [  294.366557] task: dc9d6d40 ti: de550000 task.ti: de550000
      [  294.372236] PC is at __spi_validate+0xa8/0x2ac
      [  294.376902] LR is at __spi_sync+0x78/0x210
      [  294.381200] pc : [<c049c760>]    lr : [<c049ebe0>]    psr: 60000013
      [  294.381200] sp : de551998  ip : de5519d8  fp : 00200000
      [  294.393242] r10: de551c8c  r9 : de5519d8  r8 : de3a9000
      [  294.398730] r7 : de3a9258  r6 : de3a9400  r5 : de551a48  r4 :
      00203fbc
      [  294.405577] r3 : 00000000  r2 : 00000000  r1 : 00000000  r0 :
      de3a9000
      [  294.412420] Flags: nZCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM
      Segment user
      [  294.419918] Control: 10c5387d  Table: 9e528019  DAC: 00000015
      [  294.425954] Process ifconfig (pid: 1827, stack limit = 0xde550218)
      [  294.432437] Stack: (0xde551998 to 0xde552000)
      
      ...
      
      [  294.883613] [<c049c760>] (__spi_validate) from [<c049ebe0>]
      (__spi_sync+0x78/0x210)
      [  294.891670] [<c049ebe0>] (__spi_sync) from [<bf036598>]
      (wl12xx_spi_raw_write+0xfc/0x148 [wlcore_spi])
      [  294.901661] [<bf036598>] (wl12xx_spi_raw_write [wlcore_spi]) from
      [<bf21c694>] (wlcore_boot_upload_firmware+0x1ec/0x458 [wlcore])
      [  294.914038] [<bf21c694>] (wlcore_boot_upload_firmware [wlcore]) from
      [<bf24532c>] (wl12xx_boot+0xc10/0xfac [wl12xx])
      [  294.925161] [<bf24532c>] (wl12xx_boot [wl12xx]) from [<bf20d5cc>]
      (wl1271_op_add_interface+0x5b0/0x910 [wlcore])
      [  294.936364] [<bf20d5cc>] (wl1271_op_add_interface [wlcore]) from
      [<bf15c4ac>] (ieee80211_do_open+0x44c/0xf7c [mac80211])
      [  294.947963] [<bf15c4ac>] (ieee80211_do_open [mac80211]) from
      [<c0537978>] (__dev_open+0xa8/0x110)
      [  294.957307] [<c0537978>] (__dev_open) from [<c0537bf8>]
      (__dev_change_flags+0x88/0x148)
      [  294.965713] [<c0537bf8>] (__dev_change_flags) from [<c0537cd0>]
      (dev_change_flags+0x18/0x48)
      [  294.974576] [<c0537cd0>] (dev_change_flags) from [<c05a55a0>]
      (devinet_ioctl+0x6b4/0x7d0)
      [  294.983191] [<c05a55a0>] (devinet_ioctl) from [<c0517040>]
      (sock_ioctl+0x1e4/0x2bc)
      [  294.991244] [<c0517040>] (sock_ioctl) from [<c017d378>]
      (do_vfs_ioctl+0x420/0x6b0)
      [  294.999208] [<c017d378>] (do_vfs_ioctl) from [<c017d674>]
      (SyS_ioctl+0x6c/0x7c)
      [  295.006880] [<c017d674>] (SyS_ioctl) from [<c000f4c0>]
      (ret_fast_syscall+0x0/0x54)
      [  295.014835] Code: e1550004 e2444034 0a00007d e5953018 (e5942008)
      [  295.021544] ---[ end trace 66ed188198f4e24e ]---
      Signed-off-by: default avatarUri Mashiach <uri.mashiach@compulab.co.il>
      Acked-by: default avatarIgor Grinberg <grinberg@compulab.co.il>
      Signed-off-by: default avatarKalle Valo <kvalo@codeaurora.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      2cdbd2b0
    • Johan Hovold's avatar
      spi: fix parent-device reference leak · 91f4fe2a
      Johan Hovold authored
      commit 157f38f9 upstream.
      
      Fix parent-device reference leak due to SPI-core taking an unnecessary
      reference to the parent when allocating the master structure, a
      reference that was never released.
      
      Note that driver core takes its own reference to the parent when the
      master device is registered.
      
      Fixes: 49dce689 ("spi doesn't need class_device")
      Signed-off-by: default avatarJohan Hovold <johan@kernel.org>
      Signed-off-by: default avatarMark Brown <broonie@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      91f4fe2a
    • Vignesh R's avatar
      spi: ti-qspi: Fix data corruption seen on r/w stress test · 456a39ba
      Vignesh R authored
      commit bc27a539 upstream.
      
      Writing invalid command to QSPI_SPI_CMD_REG will terminate current
      transfer and de-assert the chip select. This has to be done before
      calling spi_finalize_current_message(). Because
      spi_finalize_current_message() will mark the end of current message
      transfer and schedule the next transfer. If the chipselect is not
      de-asserted before calling spi_finalize_current_message() then the next
      transfer will overlap with the previous transfer leading to data
      corruption.
      __spi_pump_message() can be called either from kthread worker context or
      directly from the calling process's context. It is possible that these
      two calls can race against each other. But race is serialized by
      checking whether master->cur_msg == NULL (pointer to msg being handled
      by transfer_one() at present). The master->cur_msg is set to NULL when
      spi_finalize_current_message() is called on that message, which means
      calling spi_finalize_current_message() allows __spi_sync() to pump next
      message in calling process context.
      Now if spi-ti-qspi calls spi_finalize_current_message() before we
      terminate transfer at hardware side, if __spi_pump_message() is called
      from process context then the successive transactions can overlap.
      
      Fix this by moving writing invalid command to QSPI_SPI_CMD_REG to
      before calling spi_finalize_current_message() call.
      Signed-off-by: default avatarVignesh R <vigneshr@ti.com>
      Signed-off-by: default avatarMark Brown <broonie@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      456a39ba
    • David Mosberger-Tang's avatar
      spi: atmel: Fix DMA-setup for transfers with more than 8 bits per word · 1156eaf0
      David Mosberger-Tang authored
      commit 06515f83 upstream.
      
      The DMA-slave configuration depends on the whether <= 8 or > 8 bits
      are transferred per word, so we need to call
      atmel_spi_dma_slave_config() with the correct value.
      Signed-off-by: default avatarDavid Mosberger <davidm@egauge.net>
      Signed-off-by: default avatarNicolas Ferre <nicolas.ferre@atmel.com>
      Signed-off-by: default avatarMark Brown <broonie@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      1156eaf0
    • Mauricio Faria de Oliveira's avatar
      Revert "dm mpath: fix stalls when handling invalid ioctls" · 04b627d5
      Mauricio Faria de Oliveira authored
      commit 47796938 upstream.
      
      This reverts commit a1989b33.
      
      That commit introduced a regression at least for the case of the SG_IO ioctl()
      running without CAP_SYS_RAWIO capability (e.g., unprivileged users) when there
      are no active paths: the ioctl() fails with the ENOTTY errno immediately rather
      than blocking due to queue_if_no_path until a path becomes active, for example.
      
      That case happens to be exercised by QEMU KVM guests with 'scsi-block' devices
      (qemu "-device scsi-block" [1], libvirt "<disk type='block' device='lun'>" [2])
      from multipath devices; which leads to SCSI/filesystem errors in such a guest.
      
      More general scenarios can hit that regression too. The following demonstration
      employs a SG_IO ioctl() with a standard SCSI INQUIRY command for this objective
      (some output & user changes omitted for brevity and comments added for clarity).
      
      Reverting that commit restores normal operation (queueing) in failing scenarios;
      tested on linux-next (next-20151022).
      
      1) Test-case is based on sg_simple0 [3] (just SG_IO; remove SG_GET_VERSION_NUM)
      
          $ cat sg_simple0.c
          ... see [3] ...
          $ sed '/SG_GET_VERSION_NUM/,/}/d' sg_simple0.c > sgio_inquiry.c
          $ gcc sgio_inquiry.c -o sgio_inquiry
      
      2) The ioctl() works fine with active paths present.
      
          # multipath -l 85ag56
          85ag56 (...) dm-19 IBM     ,2145
          size=60G features='1 queue_if_no_path' hwhandler='0' wp=rw
          |-+- policy='service-time 0' prio=0 status=active
          | |- 8:0:11:0  sdz  65:144  active undef running
          | `- 9:0:9:0   sdbf 67:144  active undef running
          `-+- policy='service-time 0' prio=0 status=enabled
            |- 8:0:12:0  sdae 65:224  active undef running
            `- 9:0:12:0  sdbo 68:32   active undef running
      
          $ ./sgio_inquiry /dev/mapper/85ag56
          Some of the INQUIRY command's response:
              IBM       2145              0000
          INQUIRY duration=0 millisecs, resid=0
      
      3) The ioctl() fails with ENOTTY errno with _no_ active paths present,
         for unprivileged users (rather than blocking due to queue_if_no_path).
      
          # for path in $(multipath -l 85ag56 | grep -o 'sd[a-z]\+'); \
                do multipathd -k"fail path $path"; done
      
          # multipath -l 85ag56
          85ag56 (...) dm-19 IBM     ,2145
          size=60G features='1 queue_if_no_path' hwhandler='0' wp=rw
          |-+- policy='service-time 0' prio=0 status=enabled
          | |- 8:0:11:0  sdz  65:144  failed undef running
          | `- 9:0:9:0   sdbf 67:144  failed undef running
          `-+- policy='service-time 0' prio=0 status=enabled
            |- 8:0:12:0  sdae 65:224  failed undef running
            `- 9:0:12:0  sdbo 68:32   failed undef running
      
          $ ./sgio_inquiry /dev/mapper/85ag56
          sg_simple0: Inquiry SG_IO ioctl error: Inappropriate ioctl for device
      
      4) dmesg shows that scsi_verify_blk_ioctl() failed for SG_IO (0x2285);
         it returns -ENOIOCTLCMD, later replaced with -ENOTTY in vfs_ioctl().
      
          $ dmesg
          <...>
          [] device-mapper: multipath: Failing path 65:144.
          [] device-mapper: multipath: Failing path 67:144.
          [] device-mapper: multipath: Failing path 65:224.
          [] device-mapper: multipath: Failing path 68:32.
          [] sgio_inquiry: sending ioctl 2285 to a partition!
      
      5) The ioctl() only works if the SYS_CAP_RAWIO capability is present
         (then queueing happens -- in this example, queue_if_no_path is set);
         this is due to a conditional check in scsi_verify_blk_ioctl().
      
          # capsh --drop=cap_sys_rawio -- -c './sgio_inquiry /dev/mapper/85ag56'
          sg_simple0: Inquiry SG_IO ioctl error: Inappropriate ioctl for device
      
          # ./sgio_inquiry /dev/mapper/85ag56 &
          [1] 72830
      
          # cat /proc/72830/stack
          [<c00000171c0df700>] 0xc00000171c0df700
          [<c000000000015934>] __switch_to+0x204/0x350
          [<c000000000152d4c>] msleep+0x5c/0x80
          [<c00000000077dfb0>] dm_blk_ioctl+0x70/0x170
          [<c000000000487c40>] blkdev_ioctl+0x2b0/0x9b0
          [<c0000000003128e4>] block_ioctl+0x64/0xd0
          [<c0000000002dd3b0>] do_vfs_ioctl+0x490/0x780
          [<c0000000002dd774>] SyS_ioctl+0xd4/0xf0
          [<c000000000009358>] system_call+0x38/0xd0
      
      6) This is the function call chain exercised in this analysis:
      
      SYSCALL_DEFINE3(ioctl, <...>) @ fs/ioctl.c
          -> do_vfs_ioctl()
              -> vfs_ioctl()
                  ...
                  error = filp->f_op->unlocked_ioctl(filp, cmd, arg);
                  ...
                      -> dm_blk_ioctl() @ drivers/md/dm.c
                          -> multipath_ioctl() @ drivers/md/dm-mpath.c
                              ...
                              (bdev = NULL, due to no active paths)
                              ...
                              if (!bdev || <...>) {
                                  int err = scsi_verify_blk_ioctl(NULL, cmd);
                                  if (err)
                                      r = err;
                              }
                              ...
                                  -> scsi_verify_blk_ioctl() @ block/scsi_ioctl.c
                                      ...
                                      if (bd && bd == bd->bd_contains) // not taken (bd = NULL)
                                          return 0;
                                      ...
                                      if (capable(CAP_SYS_RAWIO)) // not taken (unprivileged user)
                                          return 0;
                                      ...
                                      printk_ratelimited(KERN_WARNING
                                                 "%s: sending ioctl %x to a partition!\n" <...>);
      
                                      return -ENOIOCTLCMD;
                                  <-
                              ...
                              return r ? : <...>
                          <-
                  ...
                  if (error == -ENOIOCTLCMD)
                      error = -ENOTTY;
                   out:
                      return error;
                  ...
      
      Links:
      [1] http://git.qemu.org/?p=qemu.git;a=commit;h=336a6915bc7089fb20fea4ba99972ad9a97c5f52
      [2] https://libvirt.org/formatdomain.html#elementsDisks (see 'disk' -> 'device')
      [3] http://tldp.org/HOWTO/SCSI-Generic-HOWTO/pexample.html (Revision 1.2, 2002-05-03)
      Signed-off-by: default avatarMauricio Faria de Oliveira <mauricfo@linux.vnet.ibm.com>
      Signed-off-by: default avatarMike Snitzer <snitzer@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      04b627d5
    • Dmitry V. Levin's avatar
      sh64: fix __NR_fgetxattr · aaebbf19
      Dmitry V. Levin authored
      commit 2d33fa10 upstream.
      
      According to arch/sh/kernel/syscalls_64.S and common sense, __NR_fgetxattr
      has to be defined to 259, but it doesn't.  Instead, it's defined to 269,
      which is of course used by another syscall, __NR_sched_setaffinity in this
      case.
      
      This bug was found by strace test suite.
      Signed-off-by: default avatarDmitry V. Levin <ldv@altlinux.org>
      Acked-by: default avatarGeert Uytterhoeven <geert+renesas@glider.be>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      aaebbf19
    • xuejiufei's avatar
      ocfs2/dlm: clear refmap bit of recovery lock while doing local recovery cleanup · e25c03af
      xuejiufei authored
      commit c95a5180 upstream.
      
      When recovery master down, dlm_do_local_recovery_cleanup() only remove
      the $RECOVERY lock owned by dead node, but do not clear the refmap bit.
      Which will make umount thread falling in dead loop migrating $RECOVERY
      to the dead node.
      Signed-off-by: default avatarxuejiufei <xuejiufei@huawei.com>
      Reviewed-by: default avatarJoseph Qi <joseph.qi@huawei.com>
      Cc: Mark Fasheh <mfasheh@suse.de>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Junxiao Bi <junxiao.bi@oracle.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      e25c03af
    • xuejiufei's avatar
      ocfs2/dlm: ignore cleaning the migration mle that is inuse · 6f7b1e5e
      xuejiufei authored
      commit bef5502d upstream.
      
      We have found that migration source will trigger a BUG that the refcount
      of mle is already zero before put when the target is down during
      migration.  The situation is as follows:
      
      dlm_migrate_lockres
        dlm_add_migration_mle
        dlm_mark_lockres_migrating
        dlm_get_mle_inuse
        <<<<<< Now the refcount of the mle is 2.
        dlm_send_one_lockres and wait for the target to become the
        new master.
        <<<<<< o2hb detect the target down and clean the migration
        mle. Now the refcount is 1.
      
      dlm_migrate_lockres woken, and put the mle twice when found the target
      goes down which trigger the BUG with the following message:
      
        "ERROR: bad mle: ".
      Signed-off-by: default avatarJiufei Xue <xuejiufei@huawei.com>
      Reviewed-by: default avatarJoseph Qi <joseph.qi@huawei.com>
      Cc: Mark Fasheh <mfasheh@suse.de>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Junxiao Bi <junxiao.bi@oracle.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      6f7b1e5e
    • Junxiao Bi's avatar
      ocfs2: fix SGID not inherited issue · e98aa53e
      Junxiao Bi authored
      commit 854ee2e9 upstream.
      
      Commit 8f1eb487 ("ocfs2: fix umask ignored issue") introduced an
      issue, SGID of sub dir was not inherited from its parents dir.  It is
      because SGID is set into "inode->i_mode" in ocfs2_get_init_inode(), but
      is overwritten by "mode" which don't have SGID set later.
      
      Fixes: 8f1eb487 ("ocfs2: fix umask ignored issue")
      Signed-off-by: default avatarJunxiao Bi <junxiao.bi@oracle.com>
      Cc: Mark Fasheh <mfasheh@suse.de>
      Cc: Joel Becker <jlbec@evilplan.org>
      Acked-by: default avatarSrinivas Eeda <srinivas.eeda@oracle.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      e98aa53e
    • Richard Weinberger's avatar
      kernel/signal.c: unexport sigsuspend() · 79db232a
      Richard Weinberger authored
      commit 9d8a7652 upstream.
      
      sigsuspend() is nowhere used except in signal.c itself, so we can mark it
      static do not pollute the global namespace.
      
      But this patch is more than a boring cleanup patch, it fixes a real issue
      on UserModeLinux.  UML has a special console driver to display ttys using
      xterm, or other terminal emulators, on the host side.  Vegard reported
      that sometimes UML is unable to spawn a xterm and he's facing the
      following warning:
      
        WARNING: CPU: 0 PID: 908 at include/linux/thread_info.h:128 sigsuspend+0xab/0xc0()
      
      It turned out that this warning makes absolutely no sense as the UML
      xterm code calls sigsuspend() on the host side, at least it tries.  But
      as the kernel itself offers a sigsuspend() symbol the linker choose this
      one instead of the glibc wrapper.  Interestingly this code used to work
      since ever but always blocked signals on the wrong side.  Some recent
      kernel change made the WARN_ON() trigger and uncovered the bug.
      
      It is a wonderful example of how much works by chance on computers. :-)
      
      Fixes: 68f3f16d ("new helper: sigsuspend()")
      Signed-off-by: default avatarRichard Weinberger <richard@nod.at>
      Reported-by: default avatarVegard Nossum <vegard.nossum@oracle.com>
      Tested-by: default avatarVegard Nossum <vegard.nossum@oracle.com>
      Acked-by: default avatarOleg Nesterov <oleg@redhat.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      79db232a
    • OGAWA Hirofumi's avatar
      fat: fix fake_offset handling on error path · 4c132ee7
      OGAWA Hirofumi authored
      commit 928a4771 upstream.
      
      For the root directory, .  and ..  are faked (using dir_emit_dots()) and
      ctx->pos is reset from 2 to 0.
      
      A corrupted root directory could cause fat_get_entry() to fail, but
      ->iterate() (fat_readdir()) reports progress to the VFS (with ctx->pos
      rewound to 0), so any following calls to ->iterate() continue to return
      the same entries again and again.
      
      The result is that userspace will never see the end of the directory,
      causing e.g.  'ls' to hang in a getdents() loop.
      
      [hirofumi@mail.parknet.co.jp: cleanup and make sure to correct fake_offset]
      Reported-by: default avatarVegard Nossum <vegard.nossum@oracle.com>
      Tested-by: default avatarVegard Nossum <vegard.nossum@oracle.com>
      Signed-off-by: default avatarRichard Weinberger <richard.weinberger@gmail.com>
      Signed-off-by: default avatarOGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      4c132ee7
    • Arnd Bergmann's avatar
      remoteproc: avoid stack overflow in debugfs file · 4f02ad96
      Arnd Bergmann authored
      commit 92792e48 upstream.
      
      Recent gcc versions warn about reading from a negative offset of
      an on-stack array:
      
      drivers/remoteproc/remoteproc_debugfs.c: In function 'rproc_recovery_write':
      drivers/remoteproc/remoteproc_debugfs.c:167:9: warning: 'buf[4294967295u]' may be used uninitialized in this function [-Wmaybe-uninitialized]
      
      I don't see anything in sys_write() that prevents us from
      being called with a zero 'count' argument, so we should
      add an extra check in rproc_recovery_write() to prevent the
      access and avoid the warning.
      Signed-off-by: default avatarArnd Bergmann <arnd@arndb.de>
      Fixes: 2e37abb8 ("remoteproc: create a 'recovery' debugfs entry")
      Signed-off-by: default avatarOhad Ben-Cohen <ohad@wizery.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      4f02ad96
    • Oleg Nesterov's avatar
      proc: actually make proc_fd_permission() thread-friendly · e8c26ae7
      Oleg Nesterov authored
      commit 54708d28 upstream.
      
      The commit 96d0df79 ("proc: make proc_fd_permission() thread-friendly")
      fixed the access to /proc/self/fd from sub-threads, but introduced another
      problem: a sub-thread can't access /proc/<tid>/fd/ or /proc/thread-self/fd
      if generic_permission() fails.
      
      Change proc_fd_permission() to check same_thread_group(pid_task(), current).
      
      Fixes: 96d0df79 ("proc: make proc_fd_permission() thread-friendly")
      Reported-by: default avatar"Jin, Yihua" <yihua.jin@intel.com>
      Signed-off-by: default avatarOleg Nesterov <oleg@redhat.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      e8c26ae7
    • Ioan-Adrian Ratiu's avatar
      HID: usbhid: fix recursive deadlock · fcb65392
      Ioan-Adrian Ratiu authored
      commit e470127e upstream.
      
      The critical section protected by usbhid->lock in hid_ctrl() is too
      big and because of this it causes a recursive deadlock. "Too big" means
      the case statement and the call to hid_input_report() do not need to be
      protected by the spinlock (no URB operations are done inside them).
      
      The deadlock happens because in certain rare cases drivers try to grab
      the lock while handling the ctrl irq which grabs the lock before them
      as described above. For example newer wacom tablets like 056a:033c try
      to reschedule proximity reads from wacom_intuos_schedule_prox_event()
      calling hid_hw_request() -> usbhid_request() -> usbhid_submit_report()
      which tries to grab the usbhid lock already held by hid_ctrl().
      
      There are two ways to get out of this deadlock:
          1. Make the drivers work "around" the ctrl critical region, in the
          wacom case for ex. by delaying the scheduling of the proximity read
          request itself to a workqueue.
          2. Shrink the critical region so the usbhid lock protects only the
          instructions which modify usbhid state, calling hid_input_report()
          with the spinlock unlocked, allowing the device driver to grab the
          lock first, finish and then grab the lock afterwards in hid_ctrl().
      
      This patch implements the 2nd solution.
      Signed-off-by: default avatarIoan-Adrian Ratiu <adi@adirat.com>
      Signed-off-by: default avatarJiri Kosina <jkosina@suse.cz>
      Signed-off-by: default avatarJason Gerecke <jason.gerecke@wacom.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      fcb65392
    • Seth Jennings's avatar
      drivers/base/memory.c: prohibit offlining of memory blocks with missing sections · e9fcef00
      Seth Jennings authored
      commit 26bbe7ef upstream.
      
      Commit bdee237c ("x86: mm: Use 2GB memory block size on large-memory
      x86-64 systems") and 982792c7 ("x86, mm: probe memory block size for
      generic x86 64bit") introduced large block sizes for x86.  This made it
      possible to have multiple sections per memory block where previously,
      there was a only every one section per block.
      
      Since blocks consist of contiguous ranges of section, there can be holes
      in the blocks where sections are not present.  If one attempts to
      offline such a block, a crash occurs since the code is not designed to
      deal with this.
      
      This patch is a quick fix to gaurd against the crash by not allowing
      blocks with non-present sections to be offlined.
      
      Addresses https://bugzilla.kernel.org/show_bug.cgi?id=107781Signed-off-by: default avatarSeth Jennings <sjennings@variantweb.net>
      Reported-by: default avatarAndrew Banman <abanman@sgi.com>
      Cc: Daniel J Blueman <daniel@numascale.com>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: Greg KH <greg@kroah.com>
      Cc: Russ Anderson <rja@sgi.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      e9fcef00
    • Mike Snitzer's avatar
      dm btree: fix leak of bufio-backed block in btree_split_sibling error path · dd1abb32
      Mike Snitzer authored
      commit 30ce6e1c upstream.
      
      The block allocated at the start of btree_split_sibling() is never
      released if later insert_at() fails.
      
      Fix this by releasing the previously allocated bufio block using
      unlock_block().
      Reported-by: default avatarMikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: default avatarMike Snitzer <snitzer@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      dd1abb32
    • Herbert Xu's avatar
      crypto: algif_hash - Only export and import on sockets with data · d5cdc58a
      Herbert Xu authored
      commit 4afa5f96 upstream.
      
      The hash_accept call fails to work on sockets that have not received
      any data.  For some algorithm implementations it may cause crashes.
      
      This patch fixes this by ensuring that we only export and import on
      sockets that have received data.
      Reported-by: default avatarHarsh Jain <harshjain.prof@gmail.com>
      Signed-off-by: default avatarHerbert Xu <herbert@gondor.apana.org.au>
      Tested-by: default avatarStephan Mueller <smueller@chronox.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      d5cdc58a
    • libin's avatar
      recordmcount: Fix endianness handling bug for nop_mcount · f5acefe8
      libin authored
      commit c84da8b9 upstream.
      
      In nop_mcount, shdr->sh_offset and welp->r_offset should handle
      endianness properly, otherwise it will trigger Segmentation fault
      if the recordmcount main and file.o have different endianness.
      
      Link: http://lkml.kernel.org/r/563806C7.7070606@huawei.comSigned-off-by: default avatarLi Bin <huawei.libin@huawei.com>
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f5acefe8
    • Greg Kroah-Hartman's avatar
      xhci: fix placement of call to usb_disabled() · ac6c30ad
      Greg Kroah-Hartman authored
      In the backport of 1eaf35e4, the call to
      usb_disabled() was too late, after we had already done some allocation.
      Move that call to the top of the function instead, making the logic
      match what is intended and is in the original patch.
      Reported-by: default avatarLuis Henriques <luis.henriques@canonical.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      ac6c30ad
  2. 29 Jan, 2016 15 commits
    • Greg Kroah-Hartman's avatar
      Linux 3.14.60 · 757bcff7
      Greg Kroah-Hartman authored
      757bcff7
    • Yang Shi's avatar
      arm64: restore bogomips information in /proc/cpuinfo · 5fb9b4fd
      Yang Shi authored
      commit 92e788b7 upstream.
      
      As previously reported, some userspace applications depend on bogomips
      showed by /proc/cpuinfo. Although there is much less legacy impact on
      aarch64 than arm, it does break libvirt.
      
      This patch reverts commit 326b16db ("arm64: delay: don't bother
      reporting bogomips in /proc/cpuinfo"), but with some tweak due to
      context change and without the pr_info().
      
      Fixes: 326b16db ("arm64: delay: don't bother reporting bogomips in /proc/cpuinfo")
      Signed-off-by: default avatarYang Shi <yang.shi@linaro.org>
      Acked-by: default avatarWill Deacon <will.deacon@arm.com>
      Cc: <stable@vger.kernel.org> # 3.12+
      Signed-off-by: default avatarCatalin Marinas <catalin.marinas@arm.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      5fb9b4fd
    • Guenter Roeck's avatar
      mn10300: Select CONFIG_HAVE_UID16 to fix build failure · 89423dbc
      Guenter Roeck authored
      commit c86576ea upstream.
      
      mn10300 builds fail with
      
      fs/stat.c: In function 'cp_old_stat':
      fs/stat.c:163:2: error: 'old_uid_t' undeclared
      
      ipc/util.c: In function 'ipc64_perm_to_ipc_perm':
      ipc/util.c:540:2: error: 'old_uid_t' undeclared
      
      Select CONFIG_HAVE_UID16 and remove local definition of CONFIG_UID16
      to fix the problem.
      
      Fixes: fbc416ff ("arm64: fix building without CONFIG_UID16")
      Cc: Arnd Bergmann <arnd@arndb.de>
      Acked-by: default avatarArnd Bergmann <arnd@arndb.de>
      Acked-by: default avatarAcked-by: David Howells <dhowells@redhat.com>
      Signed-off-by: default avatarGuenter Roeck <linux@roeck-us.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      89423dbc
    • Andrew Morton's avatar
      openrisc: fix CONFIG_UID16 setting · e8809090
      Andrew Morton authored
      commit 04ea1e91 upstream.
      
      openrisc-allnoconfig:
      
        kernel/uid16.c: In function 'SYSC_setgroups16':
        kernel/uid16.c:184:2: error: implicit declaration of function 'groups_alloc'
        kernel/uid16.c:184:13: warning: assignment makes pointer from integer without a cast
      
      openrisc shouldn't be setting CONFIG_UID16 when CONFIG_MULTIUSER=n.
      
      Fixes: 2813893f ("kernel: conditionally support non-root users, groups and capabilities")
      Reported-by: default avatarFengguang Wu <fengguang.wu@gmail.com>
      Cc: Iulia Manda <iulia.manda21@gmail.com>
      Cc: Josh Triplett <josh@joshtriplett.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      e8809090
    • Richard Purdie's avatar
      HID: core: Avoid uninitialized buffer access · 389642cf
      Richard Purdie authored
      commit 79b568b9 upstream.
      
      hid_connect adds various strings to the buffer but they're all
      conditional. You can find circumstances where nothing would be written
      to it but the kernel will still print the supposedly empty buffer with
      printk. This leads to corruption on the console/in the logs.
      
      Ensure buf is initialized to an empty string.
      Signed-off-by: default avatarRichard Purdie <richard.purdie@linuxfoundation.org>
      [dvhart: Initialize string to "" rather than assign buf[0] = NULL;]
      Cc: Jiri Kosina <jikos@kernel.org>
      Cc: linux-input@vger.kernel.org
      Signed-off-by: default avatarDarren Hart <dvhart@linux.intel.com>
      Signed-off-by: default avatarJiri Kosina <jkosina@suse.cz>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      389642cf
    • Mikulas Patocka's avatar
      parisc iommu: fix panic due to trying to allocate too large region · c91ab59c
      Mikulas Patocka authored
      commit e46e31a3 upstream.
      
      When using the Promise TX2+ SATA controller on PA-RISC, the system often
      crashes with kernel panic, for example just writing data with the dd
      utility will make it crash.
      
      Kernel panic - not syncing: drivers/parisc/sba_iommu.c: I/O MMU @ 000000000000a000 is out of mapping resources
      
      CPU: 0 PID: 18442 Comm: mkspadfs Not tainted 4.4.0-rc2 #2
      Backtrace:
       [<000000004021497c>] show_stack+0x14/0x20
       [<0000000040410bf0>] dump_stack+0x88/0x100
       [<000000004023978c>] panic+0x124/0x360
       [<0000000040452c18>] sba_alloc_range+0x698/0x6a0
       [<0000000040453150>] sba_map_sg+0x260/0x5b8
       [<000000000c18dbb4>] ata_qc_issue+0x264/0x4a8 [libata]
       [<000000000c19535c>] ata_scsi_translate+0xe4/0x220 [libata]
       [<000000000c19a93c>] ata_scsi_queuecmd+0xbc/0x320 [libata]
       [<0000000040499bbc>] scsi_dispatch_cmd+0xfc/0x130
       [<000000004049da34>] scsi_request_fn+0x6e4/0x970
       [<00000000403e95a8>] __blk_run_queue+0x40/0x60
       [<00000000403e9d8c>] blk_run_queue+0x3c/0x68
       [<000000004049a534>] scsi_run_queue+0x2a4/0x360
       [<000000004049be68>] scsi_end_request+0x1a8/0x238
       [<000000004049de84>] scsi_io_completion+0xfc/0x688
       [<0000000040493c74>] scsi_finish_command+0x17c/0x1d0
      
      The cause of the crash is not exhaustion of the IOMMU space, there is
      plenty of free pages. The function sba_alloc_range is called with size
      0x11000, thus the pages_needed variable is 0x11. The function
      sba_search_bitmap is called with bits_wanted 0x11 and boundary size is
      0x10 (because dma_get_seg_boundary(dev) returns 0xffff).
      
      The function sba_search_bitmap attempts to allocate 17 pages that must not
      cross 16-page boundary - it can't satisfy this requirement
      (iommu_is_span_boundary always returns true) and fails even if there are
      many free entries in the IOMMU space.
      
      How did it happen that we try to allocate 17 pages that don't cross
      16-page boundary? The cause is in the function iommu_coalesce_chunks. This
      function tries to coalesce adjacent entries in the scatterlist. The
      function does several checks if it may coalesce one entry with the next,
      one of those checks is this:
      
      	if (startsg->length + dma_len > max_seg_size)
      		break;
      
      When it finishes coalescing adjacent entries, it allocates the mapping:
      
      sg_dma_len(contig_sg) = dma_len;
      dma_len = ALIGN(dma_len + dma_offset, IOVP_SIZE);
      sg_dma_address(contig_sg) =
      	PIDE_FLAG
      	| (iommu_alloc_range(ioc, dev, dma_len) << IOVP_SHIFT)
      	| dma_offset;
      
      It is possible that (startsg->length + dma_len > max_seg_size) is false
      (we are just near the 0x10000 max_seg_size boundary), so the funcion
      decides to coalesce this entry with the next entry. When the coalescing
      succeeds, the function performs
      	dma_len = ALIGN(dma_len + dma_offset, IOVP_SIZE);
      And now, because of non-zero dma_offset, dma_len is greater than 0x10000.
      iommu_alloc_range (a pointer to sba_alloc_range) is called and it attempts
      to allocate 17 pages for a device that must not cross 16-page boundary.
      
      To fix the bug, we must make sure that dma_len after addition of
      dma_offset and alignment doesn't cross the segment boundary. I.e. change
      	if (startsg->length + dma_len > max_seg_size)
      		break;
      to
      	if (ALIGN(dma_len + dma_offset + startsg->length, IOVP_SIZE) > max_seg_size)
      		break;
      
      This patch makes this change (it precalculates max_seg_boundary at the
      beginning of the function iommu_coalesce_chunks). I also added a check
      that the mapping length doesn't exceed dma_get_seg_boundary(dev) (it is
      not needed for Promise TX2+ SATA, but it may be needed for other devices
      that have dma_get_seg_boundary lower than dma_get_max_seg_size).
      Signed-off-by: default avatarMikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: default avatarHelge Deller <deller@gmx.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      c91ab59c
    • Will Deacon's avatar
      arm64: mm: ensure that the zero page is visible to the page table walker · c05b4395
      Will Deacon authored
      commit 32d63978 upstream.
      
      In paging_init, we allocate the zero page, memset it to zero and then
      point TTBR0 to it in order to avoid speculative fetches through the
      identity mapping.
      
      In order to guarantee that the freshly zeroed page is indeed visible to
      the page table walker, we need to execute a dsb instruction prior to
      writing the TTBR.
      Signed-off-by: default avatarWill Deacon <will.deacon@arm.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      c05b4395
    • John Blackwood's avatar
      arm64: Clear out any singlestep state on a ptrace detach operation · c87aa8ce
      John Blackwood authored
      commit 5db4fd8c upstream.
      
      Make sure to clear out any ptrace singlestep state when a ptrace(2)
      PTRACE_DETACH call is made on arm64 systems.
      
      Otherwise, the previously ptraced task will die off with a SIGTRAP
      signal if the debugger just previously singlestepped the ptraced task.
      Signed-off-by: default avatarJohn Blackwood <john.blackwood@ccur.com>
      [will: added comment to justify why this is in the arch code]
      Signed-off-by: default avatarWill Deacon <will.deacon@arm.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      c87aa8ce
    • Arnd Bergmann's avatar
      arm64: fix building without CONFIG_UID16 · e858dfb6
      Arnd Bergmann authored
      commit fbc416ff upstream.
      
      As reported by Michal Simek, building an ARM64 kernel with CONFIG_UID16
      disabled currently fails because the system call table still needs to
      reference the individual function entry points that are provided by
      kernel/sys_ni.c in this case, and the declarations are hidden inside
      of #ifdef CONFIG_UID16:
      
      arch/arm64/include/asm/unistd32.h:57:8: error: 'sys_lchown16' undeclared here (not in a function)
       __SYSCALL(__NR_lchown, sys_lchown16)
      
      I believe this problem only exists on ARM64, because older architectures
      tend to not need declarations when their system call table is built
      in assembly code, while newer architectures tend to not need UID16
      support. ARM64 only uses these system calls for compatibility with
      32-bit ARM binaries.
      
      This changes the CONFIG_UID16 check into CONFIG_HAVE_UID16, which is
      set unconditionally on ARM64 with CONFIG_COMPAT, so we see the
      declarations whenever we need them, but otherwise the behavior is
      unchanged.
      
      Fixes: af1839eb ("Kconfig: clean up the long arch list for the UID16 config option")
      Signed-off-by: default avatarArnd Bergmann <arnd@arndb.de>
      Acked-by: default avatarWill Deacon <will.deacon@arm.com>
      Signed-off-by: default avatarCatalin Marinas <catalin.marinas@arm.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      e858dfb6
    • Marc Zyngier's avatar
      arm64: KVM: Fix AArch32 to AArch64 register mapping · fb68a784
      Marc Zyngier authored
      commit c0f09634 upstream.
      
      When running a 32bit guest under a 64bit hypervisor, the ARMv8
      architecture defines a mapping of the 32bit registers in the 64bit
      space. This includes banked registers that are being demultiplexed
      over the 64bit ones.
      
      On exceptions caused by an operation involving a 32bit register, the
      HW exposes the register number in the ESR_EL2 register. It was so
      far understood that SW had to distinguish between AArch32 and AArch64
      accesses (based on the current AArch32 mode and register number).
      
      It turns out that I misinterpreted the ARM ARM, and the clue is in
      D1.20.1: "For some exceptions, the exception syndrome given in the
      ESR_ELx identifies one or more register numbers from the issued
      instruction that generated the exception. Where the exception is
      taken from an Exception level using AArch32 these register numbers
      give the AArch64 view of the register."
      
      Which means that the HW is already giving us the translated version,
      and that we shouldn't try to interpret it at all (for example, doing
      an MMIO operation from the IRQ mode using the LR register leads to
      very unexpected behaviours).
      
      The fix is thus not to perform a call to vcpu_reg32() at all from
      vcpu_reg(), and use whatever register number is supplied directly.
      The only case we need to find out about the mapping is when we
      actively generate a register access, which only occurs when injecting
      a fault in a guest.
      Reviewed-by: default avatarRobin Murphy <robin.murphy@arm.com>
      Signed-off-by: default avatarMarc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: default avatarChristoffer Dall <christoffer.dall@linaro.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      fb68a784
    • Ulrich Weigand's avatar
      scripts/recordmcount.pl: support data in text section on powerpc · 8cf051e1
      Ulrich Weigand authored
      commit 2e50c4be upstream.
      
      If a text section starts out with a data blob before the first
      function start label, disassembly parsing doing in recordmcount.pl
      gets confused on powerpc, leading to creation of corrupted module
      objects.
      
      This was not a problem so far since the compiler would never create
      such text sections.  However, this has changed with a recent change
      in GCC 6 to support distances of > 2GB between a function and its
      assoicated TOC in the ELFv2 ABI, exposing this problem.
      
      There is already code in recordmcount.pl to handle such data blobs
      on the sparc64 platform.  This patch uses the same method to handle
      those on powerpc as well.
      Acked-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      Signed-off-by: default avatarUlrich Weigand <ulrich.weigand@de.ibm.com>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      8cf051e1
    • Boqun Feng's avatar
      powerpc: Make {cmp}xchg* and their atomic_ versions fully ordered · a45c8a06
      Boqun Feng authored
      commit 81d7a329 upstream.
      
      According to memory-barriers.txt, xchg*, cmpxchg* and their atomic_
      versions all need to be fully ordered, however they are now just
      RELEASE+ACQUIRE, which are not fully ordered.
      
      So also replace PPC_RELEASE_BARRIER and PPC_ACQUIRE_BARRIER with
      PPC_ATOMIC_ENTRY_BARRIER and PPC_ATOMIC_EXIT_BARRIER in
      __{cmp,}xchg_{u32,u64} respectively to guarantee fully ordered semantics
      of atomic{,64}_{cmp,}xchg() and {cmp,}xchg(), as a complement of commit
      b97021f8 ("powerpc: Fix atomic_xxx_return barrier semantics")
      
      This patch depends on patch "powerpc: Make value-returning atomics fully
      ordered" for PPC_ATOMIC_ENTRY_BARRIER definition.
      Signed-off-by: default avatarBoqun Feng <boqun.feng@gmail.com>
      Reviewed-by: default avatarPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Acked-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      a45c8a06
    • Boqun Feng's avatar
      powerpc: Make value-returning atomics fully ordered · 2af28251
      Boqun Feng authored
      commit 49e9cf3f upstream.
      
      According to memory-barriers.txt:
      
      > Any atomic operation that modifies some state in memory and returns
      > information about the state (old or new) implies an SMP-conditional
      > general memory barrier (smp_mb()) on each side of the actual
      > operation ...
      
      Which mean these operations should be fully ordered. However on PPC,
      PPC_ATOMIC_ENTRY_BARRIER is the barrier before the actual operation,
      which is currently "lwsync" if SMP=y. The leading "lwsync" can not
      guarantee fully ordered atomics, according to Paul Mckenney:
      
      https://lkml.org/lkml/2015/10/14/970
      
      To fix this, we define PPC_ATOMIC_ENTRY_BARRIER as "sync" to guarantee
      the fully-ordered semantics.
      
      This also makes futex atomics fully ordered, which can avoid possible
      memory ordering problems if userspace code relies on futex system call
      for fully ordered semantics.
      
      Fixes: b97021f8 ("powerpc: Fix atomic_xxx_return barrier semantics")
      Signed-off-by: default avatarBoqun Feng <boqun.feng@gmail.com>
      Reviewed-by: default avatarPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Acked-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      2af28251
    • Michael Neuling's avatar
      powerpc/tm: Check for already reclaimed tasks · 70812a42
      Michael Neuling authored
      commit 7f821fc9 upstream.
      
      Currently we can hit a scenario where we'll tm_reclaim() twice.  This
      results in a TM bad thing exception because the second reclaim occurs
      when not in suspend mode.
      
      The scenario in which this can happen is the following.  We attempt to
      deliver a signal to userspace.  To do this we need obtain the stack
      pointer to write the signal context.  To get this stack pointer we
      must tm_reclaim() in case we need to use the checkpointed stack
      pointer (see get_tm_stackpointer()).  Normally we'd then return
      directly to userspace to deliver the signal without going through
      __switch_to().
      
      Unfortunatley, if at this point we get an error (such as a bad
      userspace stack pointer), we need to exit the process.  The exit will
      result in a __switch_to().  __switch_to() will attempt to save the
      process state which results in another tm_reclaim().  This
      tm_reclaim() now causes a TM Bad Thing exception as this state has
      already been saved and the processor is no longer in TM suspend mode.
      Whee!
      
      This patch checks the state of the MSR to ensure we are TM suspended
      before we attempt the tm_reclaim().  If we've already saved the state
      away, we should no longer be in TM suspend mode.  This has the
      additional advantage of checking for a potential TM Bad Thing
      exception.
      
      Found using syscall fuzzer.
      
      Fixes: fb09692e ("powerpc: Add reclaim and recheckpoint functions for context switching transactional memory processes")
      Signed-off-by: default avatarMichael Neuling <mikey@neuling.org>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      70812a42
    • Michael Neuling's avatar
      powerpc/tm: Block signal return setting invalid MSR state · a327f056
      Michael Neuling authored
      commit d2b9d2a5 upstream.
      
      Currently we allow both the MSR T and S bits to be set by userspace on
      a signal return.  Unfortunately this is a reserved configuration and
      will cause a TM Bad Thing exception if attempted (via rfid).
      
      This patch checks for this case in both the 32 and 64 bit signals
      code.  If both T and S are set, we mark the context as invalid.
      
      Found using a syscall fuzzer.
      
      Fixes: 2b0a576d ("powerpc: Add new transactional memory state to the signal context")
      Signed-off-by: default avatarMichael Neuling <mikey@neuling.org>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      a327f056