1. 25 Feb, 2016 3 commits
  2. 24 Feb, 2016 37 commits
    • Naoya Horiguchi's avatar
      mm: soft-offline: check return value in second __get_any_page() call · 5b48ad31
      Naoya Horiguchi authored
      commit d96b339f upstream.
      
      I saw the following BUG_ON triggered in a testcase where a process calls
      madvise(MADV_SOFT_OFFLINE) on thps, along with a background process that
      calls migratepages command repeatedly (doing ping-pong among different
      NUMA nodes) for the first process:
      
         Soft offlining page 0x60000 at 0x700000600000
         __get_any_page: 0x60000 free buddy page
         page:ffffea0001800000 count:0 mapcount:-127 mapping:          (null) index:0x1
         flags: 0x1fffc0000000000()
         page dumped because: VM_BUG_ON_PAGE(atomic_read(&page->_count) == 0)
         ------------[ cut here ]------------
         kernel BUG at /src/linux-dev/include/linux/mm.h:342!
         invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC
         Modules linked in: cfg80211 rfkill crc32c_intel serio_raw virtio_balloon i2c_piix4 virtio_blk virtio_net ata_generic pata_acpi
         CPU: 3 PID: 3035 Comm: test_alloc_gene Tainted: G           O    4.4.0-rc8-v4.4-rc8-160107-1501-00000-rc8+ #74
         Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
         task: ffff88007c63d5c0 ti: ffff88007c210000 task.ti: ffff88007c210000
         RIP: 0010:[<ffffffff8118998c>]  [<ffffffff8118998c>] put_page+0x5c/0x60
         RSP: 0018:ffff88007c213e00  EFLAGS: 00010246
         Call Trace:
           put_hwpoison_page+0x4e/0x80
           soft_offline_page+0x501/0x520
           SyS_madvise+0x6bc/0x6f0
           entry_SYSCALL_64_fastpath+0x12/0x6a
         Code: 8b fc ff ff 5b 5d c3 48 89 df e8 b0 fa ff ff 48 89 df 31 f6 e8 c6 7d ff ff 5b 5d c3 48 c7 c6 08 54 a2 81 48 89 df e8 a4 c5 01 00 <0f> 0b 66 90 66 66 66 66 90 55 48 89 e5 41 55 41 54 53 48 8b 47
         RIP  [<ffffffff8118998c>] put_page+0x5c/0x60
          RSP <ffff88007c213e00>
      
      The root cause resides in get_any_page() which retries to get a refcount
      of the page to be soft-offlined.  This function calls
      put_hwpoison_page(), expecting that the target page is putback to LRU
      list.  But it can be also freed to buddy.  So the second check need to
      care about such case.
      
      Fixes: af8fae7c ("mm/memory-failure.c: clean up soft_offline_page()")
      Signed-off-by: default avatarNaoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Sasha Levin <sasha.levin@oracle.com>
      Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Jerome Marchand <jmarchan@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Steve Capper <steve.capper@linaro.org>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: David Rientjes <rientjes@google.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      5b48ad31
    • Roman Gushchin's avatar
      fuse: break infinite loop in fuse_fill_write_pages() · affc1b9e
      Roman Gushchin authored
      commit 3ca8138f upstream.
      
      I got a report about unkillable task eating CPU. Further
      investigation shows, that the problem is in the fuse_fill_write_pages()
      function. If iov's first segment has zero length, we get an infinite
      loop, because we never reach iov_iter_advance() call.
      
      Fix this by calling iov_iter_advance() before repeating an attempt to
      copy data from userspace.
      
      A similar problem is described in 124d3b70 ("fix writev regression:
      pan hanging unkillable and un-straceable"). If zero-length segmend
      is followed by segment with invalid address,
      iov_iter_fault_in_readable() checks only first segment (zero-length),
      iov_iter_copy_from_user_atomic() skips it, fails at second and
      returns zero -> goto again without skipping zero-length segment.
      
      Patch calls iov_iter_advance() before goto again: we'll skip zero-length
      segment at second iteraction and iov_iter_fault_in_readable() will detect
      invalid address.
      
      Special thanks to Konstantin Khlebnikov, who helped a lot with the commit
      description.
      
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Maxim Patlasov <mpatlasov@parallels.com>
      Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
      Signed-off-by: default avatarRoman Gushchin <klamm@yandex-team.ru>
      Signed-off-by: default avatarMiklos Szeredi <miklos@szeredi.hu>
      Fixes: ea9b9907 ("fuse: implement perform_write")
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      affc1b9e
    • Linus Walleij's avatar
      ARM: 8517/1: ICST: avoid arithmetic overflow in icst_hz() · 8c6bd581
      Linus Walleij authored
      commit 5070fb14 upstream.
      
      When trying to set the ICST 307 clock to 25174000 Hz I ran into
      this arithmetic error: the icst_hz_to_vco() correctly figure out
      DIVIDE=2, RDW=100 and VDW=99 yielding a frequency of
      25174000 Hz out of the VCO. (I replicated the icst_hz() function
      in a spreadsheet to verify this.)
      
      However, when I called icst_hz() on these VCO settings it would
      instead return 4122709 Hz. This causes an error in the common
      clock driver for ICST as the common clock framework will call
      .round_rate() on the clock which will utilize icst_hz_to_vco()
      followed by icst_hz() suggesting the erroneous frequency, and
      then the clock gets set to this.
      
      The error did not manifest in the old clock framework since
      this high frequency was only used by the CLCD, which calls
      clk_set_rate() without first calling clk_round_rate() and since
      the old clock framework would not call clk_round_rate() before
      setting the frequency, the correct values propagated into
      the VCO.
      
      After some experimenting I figured out that it was due to a simple
      arithmetic overflow: the divisor for 24Mhz reference frequency
      as reference becomes 24000000*2*(99+8)=0x132212400 and the "1"
      in bit 32 overflows and is lost.
      
      But introducing an explicit 64-by-32 bit do_div() and casting
      the divisor into (u64) we get the right frequency back, and the
      right frequency gets set.
      
      Tested on the ARM Versatile.
      
      Cc: linux-clk@vger.kernel.org
      Cc: Pawel Moll <pawel.moll@arm.com>
      Signed-off-by: default avatarLinus Walleij <linus.walleij@linaro.org>
      Signed-off-by: default avatarRussell King <rmk+kernel@arm.linux.org.uk>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      8c6bd581
    • Linus Walleij's avatar
      ARM: 8519/1: ICST: try other dividends than 1 · d306ee9e
      Linus Walleij authored
      commit e972c374 upstream.
      
      Since the dawn of time the ICST code has only supported divide
      by one or hang in an eternal loop. Luckily we were always dividing
      by one because the reference frequency for the systems using
      the ICSTs is 24MHz and the [min,max] values for the PLL input
      if [10,320] MHz for ICST307 and [6,200] for ICST525, so the loop
      will always terminate immediately without assigning any divisor
      for the reference frequency.
      
      But for the code to make sense, let's insert the missing i++
      Reported-by: default avatarDavid Binderman <dcb314@hotmail.com>
      Signed-off-by: default avatarLinus Walleij <linus.walleij@linaro.org>
      Signed-off-by: default avatarRussell King <rmk+kernel@arm.linux.org.uk>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      d306ee9e
    • Anson Huang's avatar
      ARM: 8471/1: need to save/restore arm register(r11) when it is corrupted · 02d2716d
      Anson Huang authored
      commit fa0708b3 upstream.
      
      In cpu_v7_do_suspend routine, r11 is used while it is NOT
      saved/restored, different compiler may have different usage
      of ARM general registers, so it may cause issues during
      calling cpu_v7_do_suspend.
      
      We meet kernel fault occurs when using GCC 4.8.3, r11 contains
      valid value before calling into cpu_v7_do_suspend, but when returned
      from this routine, r11 is corrupted and lead to kernel fault.
      Doing save/restore for those corrupted registers is a must in
      assemble code.
      Signed-off-by: default avatarAnson Huang <Anson.Huang@freescale.com>
      Reviewed-by: default avatarNicolas Pitre <nico@linaro.org>
      Signed-off-by: default avatarRussell King <rmk+kernel@arm.linux.org.uk>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      02d2716d
    • Helmut Klein's avatar
      ARM: dts: Kirkwood: Fix QNAP TS219 power-off · 58893eec
      Helmut Klein authored
      commit 5442f0ea upstream.
      
      The "reg" entry in the "poweroff" section of "kirkwood-ts219.dtsi"
      addressed the wrong uart (0 = console). This patch changes the address
      to select uart 1, which is the uart connected to the pic
      microcontroller, which can switch the device off.
      Signed-off-by: default avatarHelmut Klein <hgkr.klein@gmail.com>
      Signed-off-by: default avatarAndrew Lunn <andrew@lunn.ch>
      Fixes: 4350a47b ("ARM: Kirkwood: Make use of the QNAP Power off driver.")
      Signed-off-by: default avatarGregory CLEMENT <gregory.clement@free-electrons.com>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      58893eec
    • Andrew Gabbasov's avatar
      udf: Check output buffer length when converting name to CS0 · fc530807
      Andrew Gabbasov authored
      commit bb00c898 upstream.
      
      If a name contains at least some characters with Unicode values
      exceeding single byte, the CS0 output should have 2 bytes per character.
      And if other input characters have single byte Unicode values, then
      the single input byte is converted to 2 output bytes, and the length
      of output becomes larger than the length of input. And if the input
      name is long enough, the output length may exceed the allocated buffer
      length.
      
      All this means that conversion from UTF8 or NLS to CS0 requires
      checking of output length in order to stop when it exceeds the given
      output buffer size.
      
      [JK: Make code return -ENAMETOOLONG instead of silently truncating the
      name]
      Signed-off-by: default avatarAndrew Gabbasov <andrew_gabbasov@mentor.com>
      Signed-off-by: default avatarJan Kara <jack@suse.cz>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      fc530807
    • Andrew Gabbasov's avatar
      udf: Prevent buffer overrun with multi-byte characters · b20777e7
      Andrew Gabbasov authored
      commit ad402b26 upstream.
      
      udf_CS0toUTF8 function stops the conversion when the output buffer
      length reaches UDF_NAME_LEN-2, which is correct maximum name length,
      but, when checking, it leaves the space for a single byte only,
      while multi-bytes output characters can take more space, causing
      buffer overflow.
      
      Similar error exists in udf_CS0toNLS function, that restricts
      the output length to UDF_NAME_LEN, while actual maximum allowed
      length is UDF_NAME_LEN-2.
      
      In these cases the output can override not only the current buffer
      length field, causing corruption of the name buffer itself, but also
      following allocation structures, causing kernel crash.
      
      Adjust the output length checks in both functions to prevent buffer
      overruns in case of multi-bytes UTF8 or NLS characters.
      Signed-off-by: default avatarAndrew Gabbasov <andrew_gabbasov@mentor.com>
      Signed-off-by: default avatarJan Kara <jack@suse.cz>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      b20777e7
    • Vegard Nossum's avatar
      udf: limit the maximum number of indirect extents in a row · ac79eb71
      Vegard Nossum authored
      commit b0918d9f upstream.
      
      udf_next_aext() just follows extent pointers while extents are marked as
      indirect. This can loop forever for corrupted filesystem. Limit number
      the of indirect extents we are willing to follow in a row.
      
      [JK: Updated changelog, limit, style]
      Signed-off-by: default avatarVegard Nossum <vegard.nossum@oracle.com>
      Cc: Jan Kara <jack@suse.com>
      Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarJan Kara <jack@suse.cz>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      ac79eb71
    • Andrew Elble's avatar
      nfs: Fix race in __update_open_stateid() · d92f5225
      Andrew Elble authored
      commit 361cad3c upstream.
      
      We've seen this in a packet capture - I've intermixed what I
      think was going on. The fix here is to grab the so_lock sooner.
      
      1964379 -> #1 open (for write) reply seqid=1
      1964393 -> #2 open (for read) reply seqid=2
      
        __nfs4_close(), state->n_wronly--
        nfs4_state_set_mode_locked(), changes state->state = [R]
        state->flags is [RW]
        state->state is [R], state->n_wronly == 0, state->n_rdonly == 1
      
      1964398 -> #3 open (for write) call -> because close is already running
      1964399 -> downgrade (to read) call seqid=2 (close of #1)
      1964402 -> #3 open (for write) reply seqid=3
      
       __update_open_stateid()
         nfs_set_open_stateid_locked(), changes state->flags
         state->flags is [RW]
         state->state is [R], state->n_wronly == 0, state->n_rdonly == 1
         new sequence number is exposed now via nfs4_stateid_copy()
      
         next step would be update_open_stateflags(), pending so_lock
      
      1964403 -> downgrade reply seqid=2, fails with OLD_STATEID (close of #1)
      
         nfs4_close_prepare() gets so_lock and recalcs flags -> send close
      
      1964405 -> downgrade (to read) call seqid=3 (close of #1 retry)
      
         __update_open_stateid() gets so_lock
       * update_open_stateflags() updates state->n_wronly.
         nfs4_state_set_mode_locked() updates state->state
      
         state->flags is [RW]
         state->state is [RW], state->n_wronly == 1, state->n_rdonly == 1
      
       * should have suppressed the preceding nfs4_close_prepare() from
         sending open_downgrade
      
      1964406 -> write call
      1964408 -> downgrade (to read) reply seqid=4 (close of #1 retry)
      
         nfs_clear_open_stateid_locked()
         state->flags is [R]
         state->state is [RW], state->n_wronly == 1, state->n_rdonly == 1
      
      1964409 -> write reply (fails, openmode)
      Signed-off-by: default avatarAndrew Elble <aweits@rit.edu>
      Signed-off-by: default avatarTrond Myklebust <trond.myklebust@primarydata.com>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      d92f5225
    • Anton Protopopov's avatar
      cifs: fix erroneous return value · 69ef7d4f
      Anton Protopopov authored
      commit 4b550af5 upstream.
      
      The setup_ntlmv2_rsp() function may return positive value ENOMEM instead
      of -ENOMEM in case of kmalloc failure.
      Signed-off-by: default avatarAnton Protopopov <a.s.protopopov@gmail.com>
      Signed-off-by: default avatarSteve French <smfrench@gmail.com>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      69ef7d4f
    • Vasily Averin's avatar
      cifs_dbg() outputs an uninitialized buffer in cifs_readdir() · ec6e80b8
      Vasily Averin authored
      commit 01b9b0b2 upstream.
      
      In some cases tmp_bug can be not filled in cifs_filldir and stay uninitialized,
      therefore its printk with "%s" modifier can leak content of kernelspace memory.
      If old content of this buffer does not contain '\0' access bejond end of
      allocated object can crash the host.
      Signed-off-by: default avatarVasily Averin <vvs@virtuozzo.com>
      Signed-off-by: default avatarSteve French <steve.french@primarydata.com>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      ec6e80b8
    • Yong Li's avatar
      iio: dac: mcp4725: set iio name property in sysfs · d4cd077a
      Yong Li authored
      commit 97a249e9 upstream.
      
      Without this change, the name entity for mcp4725 is missing in
      /sys/bus/iio/devices/iio\:device*/name
      
      With this change, name is reported correctly
      Signed-off-by: default avatarYong Li <sdliyong@gmail.com>
      Signed-off-by: default avatarJonathan Cameron <jic23@kernel.org>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      d4cd077a
    • Lars-Peter Clausen's avatar
      iio: adis_buffer: Fix out-of-bounds memory access · 8506cf1d
      Lars-Peter Clausen authored
      commit d590faf9 upstream.
      
      The SPI tx and rx buffers are both supposed to be scan_bytes amount of
      bytes large and a common allocation is used to allocate both buffers. This
      puts the beginning of the tx buffer scan_bytes bytes after the rx buffer.
      The initialization of the tx buffer pointer is done adding scan_bytes to
      the beginning of the rx buffer, but since the rx buffer is of type __be16
      this will actually add two times as much and the tx buffer ends up pointing
      after the allocated buffer.
      
      Fix this by using scan_count, which is scan_bytes / 2, instead of
      scan_bytes when initializing the tx buffer pointer.
      
      Fixes: aacff892 ("staging:iio:adis: Preallocate transfer message")
      Signed-off-by: default avatarLars-Peter Clausen <lars@metafoo.de>
      Signed-off-by: default avatarJonathan Cameron <jic23@kernel.org>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      8506cf1d
    • Dan Carpenter's avatar
      iio: fix some warning messages · 3a9194b1
      Dan Carpenter authored
      commit 231bfe53 upstream.
      
      WARN_ON() only takes a condition argument.  I have changed these to
      WARN() instead.
      Signed-off-by: default avatarDan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: default avatarJonathan Cameron <jic23@kernel.org>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      3a9194b1
    • Lars-Peter Clausen's avatar
      iio: ad5064: Fix ad5629/ad5669 shift · c66e8114
      Lars-Peter Clausen authored
      commit 5dcbe97b upstream.
      
      The ad5629/ad5669 are the I2C variant of the ad5628/ad5668, which has a SPI
      interface. They are mostly identical with the exception that the shift
      factor is different. Currently the driver does not take care of this
      difference which leads to incorrect DAC output values.
      
      Fix this by introducing a custom channel spec for the ad5629/ad5669 with
      the correct shift factor.
      
      Fixes: commit 6a17a076 ("iio:dac:ad5064: Add support for the ad5629r and ad5669r")
      Signed-off-by: default avatarLars-Peter Clausen <lars@metafoo.de>
      Signed-off-by: default avatarJonathan Cameron <jic23@kernel.org>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      c66e8114
    • Michael Hennerich's avatar
      iio:ad5064: Make sure ad5064_i2c_write() returns 0 on success · baae260f
      Michael Hennerich authored
      commit 03fe472e upstream.
      
      i2c_master_send() returns the number of bytes transferred on success while
      the ad5064 driver expects that the write() callback returns 0 on success.
      Fix that by translating any non negative return value of i2c_master_send()
      to 0.
      
      Fixes: commit 6a17a076 ("iio:dac:ad5064: Add support for the ad5629r and ad5669r")
      Signed-off-by: default avatarMichael Hennerich <michael.hennerich@analog.com>
      Signed-off-by: default avatarLars-Peter Clausen <lars@metafoo.de>
      Signed-off-by: default avatarJonathan Cameron <jic23@kernel.org>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      baae260f
    • Vladimir Zapolskiy's avatar
      iio: lpc32xx_adc: fix warnings caused by enabling unprepared clock · 0e2e731e
      Vladimir Zapolskiy authored
      commit 01bb70ae upstream.
      
      If common clock framework is configured, the driver generates a warning,
      which is fixed by this change:
      
          root@devkit3250:~# cat /sys/bus/iio/devices/iio\:device0/in_voltage0_raw
          ------------[ cut here ]------------
          WARNING: CPU: 0 PID: 724 at drivers/clk/clk.c:727 clk_core_enable+0x2c/0xa4()
          Modules linked in: sc16is7xx snd_soc_uda1380
          CPU: 0 PID: 724 Comm: cat Not tainted 4.3.0-rc2+ #198
          Hardware name: LPC32XX SoC (Flattened Device Tree)
          Backtrace:
          [<>] (dump_backtrace) from [<>] (show_stack+0x18/0x1c)
          [<>] (show_stack) from [<>] (dump_stack+0x20/0x28)
          [<>] (dump_stack) from [<>] (warn_slowpath_common+0x90/0xb8)
          [<>] (warn_slowpath_common) from [<>] (warn_slowpath_null+0x24/0x2c)
          [<>] (warn_slowpath_null) from [<>] (clk_core_enable+0x2c/0xa4)
          [<>] (clk_core_enable) from [<>] (clk_enable+0x24/0x38)
          [<>] (clk_enable) from [<>] (lpc32xx_read_raw+0x38/0x80)
          [<>] (lpc32xx_read_raw) from [<>] (iio_read_channel_info+0x70/0x94)
          [<>] (iio_read_channel_info) from [<>] (dev_attr_show+0x28/0x4c)
          [<>] (dev_attr_show) from [<>] (sysfs_kf_seq_show+0x8c/0xf0)
          [<>] (sysfs_kf_seq_show) from [<>] (kernfs_seq_show+0x2c/0x30)
          [<>] (kernfs_seq_show) from [<>] (seq_read+0x1c8/0x440)
          [<>] (seq_read) from [<>] (kernfs_fop_read+0x38/0x170)
          [<>] (kernfs_fop_read) from [<>] (do_readv_writev+0x16c/0x238)
          [<>] (do_readv_writev) from [<>] (vfs_readv+0x50/0x58)
          [<>] (vfs_readv) from [<>] (default_file_splice_read+0x1a4/0x308)
          [<>] (default_file_splice_read) from [<>] (do_splice_to+0x78/0x84)
          [<>] (do_splice_to) from [<>] (splice_direct_to_actor+0xc8/0x1cc)
          [<>] (splice_direct_to_actor) from [<>] (do_splice_direct+0xa0/0xb8)
          [<>] (do_splice_direct) from [<>] (do_sendfile+0x1a8/0x30c)
          [<>] (do_sendfile) from [<>] (SyS_sendfile64+0x104/0x10c)
          [<>] (SyS_sendfile64) from [<>] (ret_fast_syscall+0x0/0x38)
      Signed-off-by: default avatarVladimir Zapolskiy <vz@mleia.com>
      Signed-off-by: default avatarJonathan Cameron <jic23@kernel.org>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      0e2e731e
    • Lars-Peter Clausen's avatar
      iio:ad7793: Fix ad7785 product ID · 3eefb1c1
      Lars-Peter Clausen authored
      commit 785171fd upstream.
      
      While the datasheet for the AD7785 lists 0xXB as the product ID the actual
      product ID is 0xX3.
      
      Fix the product ID otherwise the driver will reject the device due to non
      matching IDs.
      
      Fixes: e786cc26 ("staging:iio:ad7793: Implement stricter id checking")
      Signed-off-by: default avatarLars-Peter Clausen <lars@metafoo.de>
      Signed-off-by: default avatarJonathan Cameron <jic23@kernel.org>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      3eefb1c1
    • James Bottomley's avatar
      scsi: fix soft lockup in scsi_remove_target() on module removal · bf7642bd
      James Bottomley authored
      commit 90a88d6e upstream.
      
      This softlockup is currently happening:
      
      [  444.088002] NMI watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [kworker/1:1:29]
      [  444.088002] Modules linked in: lpfc(-) qla2x00tgt(O) qla2xxx_scst(O) scst_vdisk(O) scsi_transport_fc libcrc32c scst(O) dlm configfs nfsd lockd grace nfs_acl auth_rpcgss sunrpc ed
      d snd_pcm_oss snd_mixer_oss snd_seq snd_seq_device dm_mod iTCO_wdt snd_hda_codec_realtek snd_hda_codec_generic gpio_ich iTCO_vendor_support ppdev snd_hda_intel snd_hda_codec snd_hda
      _core snd_hwdep tg3 snd_pcm snd_timer libphy lpc_ich parport_pc ptp acpi_cpufreq snd pps_core fjes parport i2c_i801 ehci_pci tpm_tis tpm sr_mod cdrom soundcore floppy hwmon sg 8250_
      fintek pcspkr i915 drm_kms_helper uhci_hcd ehci_hcd drm fb_sys_fops sysimgblt sysfillrect syscopyarea i2c_algo_bit usbcore button video usb_common fan ata_generic ata_piix libata th
      ermal
      [  444.088002] CPU: 1 PID: 29 Comm: kworker/1:1 Tainted: G           O    4.4.0-rc5-2.g1e923a3-default #1
      [  444.088002] Hardware name: FUJITSU SIEMENS ESPRIMO E           /D2164-A1, BIOS 5.00 R1.10.2164.A1               05/08/2006
      [  444.088002] Workqueue: fc_wq_4 fc_rport_final_delete [scsi_transport_fc]
      [  444.088002] task: f6266ec0 ti: f6268000 task.ti: f6268000
      [  444.088002] EIP: 0060:[<c07e7044>] EFLAGS: 00000286 CPU: 1
      [  444.088002] EIP is at _raw_spin_unlock_irqrestore+0x14/0x20
      [  444.088002] EAX: 00000286 EBX: f20d3800 ECX: 00000002 EDX: 00000286
      [  444.088002] ESI: f50ba800 EDI: f2146848 EBP: f6269ec8 ESP: f6269ec8
      [  444.088002]  DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
      [  444.088002] CR0: 8005003b CR2: 08f96600 CR3: 363ae000 CR4: 000006d0
      [  444.088002] Stack:
      [  444.088002]  f6269eec c066b0f7 00000286 f2146848 f50ba808 f50ba800 f50ba800 f2146a90
      [  444.088002]  f2146848 f6269f08 f8f0a4ed f3141000 f2146800 f2146a90 f619fa00 00000040
      [  444.088002]  f6269f40 c026cb25 00000001 166c6392 00000061 f6757140 f6136340 00000004
      [  444.088002] Call Trace:
      [  444.088002]  [<c066b0f7>] scsi_remove_target+0x167/0x1c0
      [  444.088002]  [<f8f0a4ed>] fc_rport_final_delete+0x9d/0x1e0 [scsi_transport_fc]
      [  444.088002]  [<c026cb25>] process_one_work+0x155/0x3e0
      [  444.088002]  [<c026cde7>] worker_thread+0x37/0x490
      [  444.088002]  [<c027214b>] kthread+0x9b/0xb0
      [  444.088002]  [<c07e72c1>] ret_from_kernel_thread+0x21/0x40
      
      What appears to be happening is that something has pinned the target
      so it can't go into STARGET_DEL via final release and the loop in
      scsi_remove_target spins endlessly until that happens.
      
      The fix for this soft lockup is to not keep looping over a device that
      we've called remove on but which hasn't gone into DEL state.  This
      patch will retain a simplistic memory of the last target and not keep
      looping over it.
      Reported-by: default avatarSebastian Herbszt <herbszt@gmx.de>
      Tested-by: default avatarSebastian Herbszt <herbszt@gmx.de>
      Fixes: 40998193Signed-off-by: default avatarJames Bottomley <James.Bottomley@HansenPartnership.com>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      bf7642bd
    • Mika Westerberg's avatar
      SCSI: Add Marvell Console to VPD blacklist · 14b9a5ea
      Mika Westerberg authored
      commit 82c43310 upstream.
      
      I have a Marvell 88SE9230 SATA Controller that has some sort of
      integrated console SCSI device attached to one of the ports.
      
        ata14: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
        ata14.00: ATAPI: MARVELL VIRTUALL, 1.09, max UDMA/66
        ata14.00: configured for UDMA/66
        scsi 13:0:0:0: Processor         Marvell  Console 1.01 PQ: 0 ANSI: 5
      
      Sending it VPD INQUIRY command seem to always fail with following error:
      
        ata14.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6
        ata14.00: irq_stat 0x40000001
        ata14.00: cmd a0/01:00:00:00:01/00:00:00:00:00/a0 tag 2 dma 16640 in
                  Inquiry 12 01 00 00 ff 00res 00/00:00:00:00:00/00:00:00:00:00/00 Emask 0x3 (HSM violation)
        ata14: hard resetting link
      
      This has been minor annoyance (only error printed on dmesg) until commit
      09e2b0b1 ("scsi: rescan VPD attributes") added call to scsi_attach_vpd()
      in scsi_rescan_device(). The commit causes the system to splat out
      following errors continuously without ever reaching the UI:
      
        ata14.00: configured for UDMA/66
        ata14: EH complete
        ata14.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6
        ata14.00: irq_stat 0x40000001
        ata14.00: cmd a0/01:00:00:00:01/00:00:00:00:00/a0 tag 6 dma 16640 in
                  Inquiry 12 01 00 00 ff 00res 00/00:00:00:00:00/00:00:00:00:00/00 Emask 0x3 (HSM violation)
        ata14: hard resetting link
        ata14: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
        ata14.00: configured for UDMA/66
        ata14: EH complete
        ata14.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6
        ata14.00: irq_stat 0x40000001
        ata14.00: cmd a0/01:00:00:00:01/00:00:00:00:00/a0 tag 7 dma 16640 in
                  Inquiry 12 01 00 00 ff 00res 00/00:00:00:00:00/00:00:00:00:00/00 Emask 0x3 (HSM violation)
      
      Without in-depth understanding of SCSI layer and the Marvell controller,
      I suspect this happens because when the link goes down (because of an
      error) we schedule scsi_rescan_device() which again fails to read VPD
      data... ad infinitum.
      
      Since VPD data cannot be read from the device anyway we prevent the SCSI
      layer from even trying by blacklisting the device. This gets away the
      error and the system starts up normally.
      
      [mkp: Widened the match to all revisions of this device]
      Signed-off-by: default avatarMika Westerberg <mika.westerberg@linux.intel.com>
      Reported-by: default avatarKirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Reported-by: default avatarAlexander Duyck <alexander.duyck@gmail.com>
      Signed-off-by: default avatarMartin K. Petersen <martin.petersen@oracle.com>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      14b9a5ea
    • Hannes Reinecke's avatar
      scsi_dh_rdac: always retry MODE SELECT on command lock violation · 6c3964e4
      Hannes Reinecke authored
      commit d2d06d4f upstream.
      
      If MODE SELECT returns with sense '05/91/36' (command lock violation)
      it should always be retried without counting the number of retries.
      During an HBA upgrade or similar circumstances one might see a flood
      of MODE SELECT command from various HBAs, which will easily trigger
      the sense code and exceed the retry count.
      Signed-off-by: default avatarHannes Reinecke <hare@suse.de>
      Reviewed-by: default avatarJohannes Thumshirn <jthumshirn@suse.de>
      Signed-off-by: default avatarMartin K. Petersen <martin.petersen@oracle.com>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      6c3964e4
    • Kirill A. Shutemov's avatar
      drivers/scsi/sg.c: mark VMA as VM_IO to prevent migration · 26d026b8
      Kirill A. Shutemov authored
      commit 461c7fa1 upstream.
      
      Reduced testcase:
      
          #include <fcntl.h>
          #include <unistd.h>
          #include <sys/mman.h>
          #include <numaif.h>
      
          #define SIZE 0x2000
      
          int main()
          {
              int fd;
              void *p;
      
              fd = open("/dev/sg0", O_RDWR);
              p = mmap(NULL, SIZE, PROT_EXEC, MAP_PRIVATE | MAP_LOCKED, fd, 0);
              mbind(p, SIZE, 0, NULL, 0, MPOL_MF_MOVE);
              return 0;
          }
      
      We shouldn't try to migrate pages in sg VMA as we don't have a way to
      update Sg_scatter_hold::pages accordingly from mm core.
      
      Let's mark the VMA as VM_IO to indicate to mm core that the VMA is not
      migratable.
      Signed-off-by: default avatarKirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Reported-by: default avatarDmitry Vyukov <dvyukov@google.com>
      Acked-by: default avatarVlastimil Babka <vbabka@suse.cz>
      Cc: Doug Gilbert <dgilbert@interlog.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Shiraz Hashim <shashim@codeaurora.org>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Sasha Levin <sasha.levin@oracle.com>
      Cc: syzkaller <syzkaller@googlegroups.com>
      Cc: Kostya Serebryany <kcc@google.com>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      26d026b8
    • Alan Stern's avatar
      SCSI: fix crashes in sd and sr runtime PM · d0adf936
      Alan Stern authored
      commit 13b43891 upstream.
      
      Runtime suspend during driver probe and removal can cause problems.
      The driver's runtime_suspend or runtime_resume callbacks may invoked
      before the driver has finished binding to the device or after the
      driver has unbound from the device.
      
      This problem shows up with the sd and sr drivers, and can cause disk
      or CD/DVD drives to become unusable as a result.  The fix is simple.
      The drivers store a pointer to the scsi_disk or scsi_cd structure as
      their private device data when probing is finished, so we simply have
      to be sure to clear the private data during removal and test it during
      runtime suspend/resume.
      
      This fixes <https://bugs.debian.org/801925>.
      Signed-off-by: default avatarAlan Stern <stern@rowland.harvard.edu>
      Reported-by: default avatarPaul Menzel <paul.menzel@giantmonkey.de>
      Reported-by: default avatarErich Schubert <erich@debian.org>
      Reported-by: default avatarAlexandre Rossi <alexandre.rossi@gmail.com>
      Tested-by: default avatarPaul Menzel <paul.menzel@giantmonkey.de>
      Tested-by: default avatarErich Schubert <erich@debian.org>
      Signed-off-by: default avatarJames Bottomley <James.Bottomley@HansenPartnership.com>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      d0adf936
    • Nicholas Bellinger's avatar
      iscsi-target: Fix potential dead-lock during node acl delete · 31de1ca8
      Nicholas Bellinger authored
      commit 26a99c19 upstream.
      
      This patch is a iscsi-target specific bug-fix for a dead-lock
      that can occur during explicit struct se_node_acl->acl_group
      se_session deletion via configfs rmdir(2), when iscsi-target
      time2retain timer is still active.
      
      It changes iscsi-target to obtain se_portal_group->session_lock
      internally using spin_in_locked() to check for the specific
      se_node_acl configfs shutdown rmdir(2) case.
      
      Note this patch is intended for stable, and the subsequent
      v4.5-rc patch converts target_core_tpg.c to use proper
      se_sess->sess_kref reference counting for both se_node_acl
      deletion + se_node_acl->queue_depth se_session restart.
      Reported-by: default avatar: Sagi Grimberg <sagig@mellanox.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Hannes Reinecke <hare@suse.de>
      Cc: Andy Grover <agrover@redhat.com>
      Cc: Mike Christie <michaelc@cs.wisc.edu>
      Signed-off-by: default avatarNicholas Bellinger <nab@linux-iscsi.org>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      31de1ca8
    • Ken Xue's avatar
      SCSI: Fix NULL pointer dereference in runtime PM · 229ad79a
      Ken Xue authored
      commit 4fd41a85 upstream.
      
      The routines in scsi_pm.c assume that if a runtime-PM callback is
      invoked for a SCSI device, it can only mean that the device's driver
      has asked the block layer to handle the runtime power management (by
      calling blk_pm_runtime_init(), which among other things sets q->dev).
      
      However, this assumption turns out to be wrong for things like the ses
      driver.  Normally ses devices are not allowed to do runtime PM, but
      userspace can override this setting.  If this happens, the kernel gets
      a NULL pointer dereference when blk_post_runtime_resume() tries to use
      the uninitialized q->dev pointer.
      
      This patch fixes the problem by checking q->dev in block layer before
      handle runtime PM. Since ses doesn't define any PM callbacks and call
      blk_pm_runtime_init(), the crash won't occur.
      
      This fixes Bugzilla #101371.
      https://bugzilla.kernel.org/show_bug.cgi?id=101371
      
      More discussion can be found from below link.
      http://marc.info/?l=linux-scsi&m=144163730531875&w=2Signed-off-by: default avatarKen Xue <Ken.Xue@amd.com>
      Acked-by: default avatarAlan Stern <stern@rowland.harvard.edu>
      Cc: Xiangliang Yu <Xiangliang.Yu@amd.com>
      Cc: James E.J. Bottomley <JBottomley@odin.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Michael Terry <Michael.terry@canonical.com>
      Signed-off-by: default avatarJens Axboe <axboe@fb.com>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      229ad79a
    • Bart Van Assche's avatar
      Fix a memory leak in scsi_host_dev_release() · 81a7d1af
      Bart Van Assche authored
      commit b49493f9 upstream.
      
      Avoid that kmemleak reports the following memory leak if a
      SCSI LLD calls scsi_host_alloc() and scsi_host_put() but neither
      scsi_host_add() nor scsi_host_remove(). The following shell
      command triggers that scenario:
      
      for ((i=0; i<2; i++)); do
        srp_daemon -oac |
        while read line; do
          echo $line >/sys/class/infiniband_srp/srp-mlx4_0-1/add_target
        done
      done
      
      unreferenced object 0xffff88021b24a220 (size 8):
        comm "srp_daemon", pid 56421, jiffies 4295006762 (age 4240.750s)
        hex dump (first 8 bytes):
          68 6f 73 74 35 38 00 a5                          host58..
        backtrace:
          [<ffffffff8151014a>] kmemleak_alloc+0x7a/0xc0
          [<ffffffff81165c1e>] __kmalloc_track_caller+0xfe/0x160
          [<ffffffff81260d2b>] kvasprintf+0x5b/0x90
          [<ffffffff81260e2d>] kvasprintf_const+0x8d/0xb0
          [<ffffffff81254b0c>] kobject_set_name_vargs+0x3c/0xa0
          [<ffffffff81337e3c>] dev_set_name+0x3c/0x40
          [<ffffffff81355757>] scsi_host_alloc+0x327/0x4b0
          [<ffffffffa03edc8e>] srp_create_target+0x4e/0x8a0 [ib_srp]
          [<ffffffff8133778b>] dev_attr_store+0x1b/0x20
          [<ffffffff811f27fa>] sysfs_kf_write+0x4a/0x60
          [<ffffffff811f1e8e>] kernfs_fop_write+0x14e/0x180
          [<ffffffff81176eef>] __vfs_write+0x2f/0xf0
          [<ffffffff811771e4>] vfs_write+0xa4/0x100
          [<ffffffff81177c64>] SyS_write+0x54/0xc0
          [<ffffffff8151b257>] entry_SYSCALL_64_fastpath+0x12/0x6f
      Signed-off-by: default avatarBart Van Assche <bart.vanassche@sandisk.com>
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      Reviewed-by: default avatarJohannes Thumshirn <jthumshirn@suse.de>
      Reviewed-by: default avatarSagi Grimberg <sagig@mellanox.com>
      Reviewed-by: default avatarLee Duncan <lduncan@suse.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Hannes Reinecke <hare@suse.de>
      Signed-off-by: default avatarMartin K. Petersen <martin.petersen@oracle.com>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      81a7d1af
    • Nicholas Bellinger's avatar
      iscsi-target: Fix rx_login_comp hang after login failure · 1ff4ce3f
      Nicholas Bellinger authored
      commit ca82c2bd upstream.
      
      This patch addresses a case where iscsi_target_do_tx_login_io()
      fails sending the last login response PDU, after the RX/TX
      threads have already been started.
      
      The case centers around iscsi_target_rx_thread() not invoking
      allow_signal(SIGINT) before the send_sig(SIGINT, ...) occurs
      from the failure path, resulting in RX thread hanging
      indefinately on iscsi_conn->rx_login_comp.
      
      Note this bug is a regression introduced by:
      
        commit e5419865
        Author: Nicholas Bellinger <nab@linux-iscsi.org>
        Date:   Wed Jul 22 23:14:19 2015 -0700
      
            iscsi-target: Fix iscsit_start_kthreads failure OOPs
      
      To address this bug, complete ->rx_login_complete for good
      measure in the failure path, and immediately return from
      RX thread context if connection state did not actually reach
      full feature phase (TARG_CONN_STATE_LOGGED_IN).
      
      Cc: Sagi Grimberg <sagig@mellanox.com>
      Signed-off-by: default avatarNicholas Bellinger <nab@linux-iscsi.org>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      1ff4ce3f
    • Peter Oberparleiter's avatar
      scsi_sysfs: Fix queue_ramp_up_period return code · 54c4a58c
      Peter Oberparleiter authored
      commit 863e02d0 upstream.
      
      Writing a number to /sys/bus/scsi/devices/<sdev>/queue_ramp_up_period
      returns the value of that number instead of the number of bytes written.
      This behavior can confuse programs expecting POSIX write() semantics.
      Fix this by returning the number of bytes written instead.
      Signed-off-by: default avatarPeter Oberparleiter <oberpar@linux.vnet.ibm.com>
      Reviewed-by: default avatarHannes Reinecke <hare@suse.de>
      Reviewed-by: default avatarMatthew R. Ochs <mrochs@linux.vnet.ibm.com>
      Reviewed-by: default avatarEwan D. Milne <emilne@redhat.com>
      Signed-off-by: default avatarMartin K. Petersen <martin.petersen@oracle.com>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      54c4a58c
    • Christoph Hellwig's avatar
      scsi: restart list search after unlock in scsi_remove_target · 259d9d2f
      Christoph Hellwig authored
      commit 40998193 upstream.
      
      When dropping a lock while iterating a list we must restart the search
      as other threads could have manipulated the list under us.  Without this
      we can get stuck in an endless loop.  This bug was introduced by
      
      commit bc3f02a7
      Author: Dan Williams <djbw@fb.com>
      Date:   Tue Aug 28 22:12:10 2012 -0700
      
          [SCSI] scsi_remove_target: fix softlockup regression on hot remove
      
      Which was itself trying to fix a reported soft lockup issue
      
      http://thread.gmane.org/gmane.linux.kernel/1348679
      
      However, we believe even with this revert of the original patch, the soft
      lockup problem has been fixed by
      
      commit f2495e22
      Author: James Bottomley <JBottomley@Parallels.com>
      Date:   Tue Jan 21 07:01:41 2014 -0800
      
          [SCSI] dual scan thread bug fix
      
      Thanks go to Dan Williams <dan.j.williams@intel.com> for tracking all this
      prior history down.
      Reported-by: default avatarJohannes Thumshirn <jthumshirn@suse.de>
      Signed-off-by: default avatarChristoph Hellwig <hch@lst.de>
      Tested-by: default avatarJohannes Thumshirn <jthumshirn@suse.de>
      Reviewed-by: default avatarJohannes Thumshirn <jthumshirn@suse.de>
      Fixes: bc3f02a7Signed-off-by: default avatarJames Bottomley <JBottomley@Odin.com>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      259d9d2f
    • James Bottomley's avatar
      klist: fix starting point removed bug in klist iterators · 1661a369
      James Bottomley authored
      commit 00cd29b7 upstream.
      
      The starting node for a klist iteration is often passed in from
      somewhere way above the klist infrastructure, meaning there's no
      guarantee the node is still on the list.  We've seen this in SCSI where
      we use bus_find_device() to iterate through a list of devices.  In the
      face of heavy hotplug activity, the last device returned by
      bus_find_device() can be removed before the next call.  This leads to
      
      Dec  3 13:22:02 localhost kernel: WARNING: CPU: 2 PID: 28073 at include/linux/kref.h:47 klist_iter_init_node+0x3d/0x50()
      Dec  3 13:22:02 localhost kernel: Modules linked in: scsi_debug x86_pkg_temp_thermal kvm_intel kvm irqbypass crc32c_intel joydev iTCO_wdt dcdbas ipmi_devintf acpi_power_meter iTCO_vendor_support ipmi_si imsghandler pcspkr wmi acpi_cpufreq tpm_tis tpm shpchp lpc_ich mfd_core nfsd nfs_acl lockd grace sunrpc tg3 ptp pps_core
      Dec  3 13:22:02 localhost kernel: CPU: 2 PID: 28073 Comm: cat Not tainted 4.4.0-rc1+ #2
      Dec  3 13:22:02 localhost kernel: Hardware name: Dell Inc. PowerEdge R320/08VT7V, BIOS 2.0.22 11/19/2013
      Dec  3 13:22:02 localhost kernel: ffffffff81a20e77 ffff880613acfd18 ffffffff81321eef 0000000000000000
      Dec  3 13:22:02 localhost kernel: ffff880613acfd50 ffffffff8107ca52 ffff88061176b198 0000000000000000
      Dec  3 13:22:02 localhost kernel: ffffffff814542b0 ffff880610cfb100 ffff88061176b198 ffff880613acfd60
      Dec  3 13:22:02 localhost kernel: Call Trace:
      Dec  3 13:22:02 localhost kernel: [<ffffffff81321eef>] dump_stack+0x44/0x55
      Dec  3 13:22:02 localhost kernel: [<ffffffff8107ca52>] warn_slowpath_common+0x82/0xc0
      Dec  3 13:22:02 localhost kernel: [<ffffffff814542b0>] ? proc_scsi_show+0x20/0x20
      Dec  3 13:22:02 localhost kernel: [<ffffffff8107cb4a>] warn_slowpath_null+0x1a/0x20
      Dec  3 13:22:02 localhost kernel: [<ffffffff8167225d>] klist_iter_init_node+0x3d/0x50
      Dec  3 13:22:02 localhost kernel: [<ffffffff81421d41>] bus_find_device+0x51/0xb0
      Dec  3 13:22:02 localhost kernel: [<ffffffff814545ad>] scsi_seq_next+0x2d/0x40
      [...]
      
      And an eventual crash. It can actually occur in any hotplug system
      which has a device finder and a starting device.
      
      We can fix this globally by making sure the starting node for
      klist_iter_init_node() is actually a member of the list before using it
      (and by starting from the beginning if it isn't).
      Reported-by: default avatarEwan D. Milne <emilne@redhat.com>
      Tested-by: default avatarEwan D. Milne <emilne@redhat.com>
      Signed-off-by: default avatarJames Bottomley <James.Bottomley@HansenPartnership.com>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      1661a369
    • Arnd Bergmann's avatar
      tracing: Fix freak link error caused by branch tracer · 0081cd7b
      Arnd Bergmann authored
      commit b33c8ff4 upstream.
      
      In my randconfig tests, I came across a bug that involves several
      components:
      
      * gcc-4.9 through at least 5.3
      * CONFIG_GCOV_PROFILE_ALL enabling -fprofile-arcs for all files
      * CONFIG_PROFILE_ALL_BRANCHES overriding every if()
      * The optimized implementation of do_div() that tries to
        replace a library call with an division by multiplication
      * code in drivers/media/dvb-frontends/zl10353.c doing
      
              u32 adc_clock = 450560; /* 45.056 MHz */
              if (state->config.adc_clock)
                      adc_clock = state->config.adc_clock;
              do_div(value, adc_clock);
      
      In this case, gcc fails to determine whether the divisor
      in do_div() is __builtin_constant_p(). In particular, it
      concludes that __builtin_constant_p(adc_clock) is false, while
      __builtin_constant_p(!!adc_clock) is true.
      
      That in turn throws off the logic in do_div() that also uses
      __builtin_constant_p(), and instead of picking either the
      constant- optimized division, and the code in ilog2() that uses
      __builtin_constant_p() to figure out whether it knows the answer at
      compile time. The result is a link error from failing to find
      multiple symbols that should never have been called based on
      the __builtin_constant_p():
      
      dvb-frontends/zl10353.c:138: undefined reference to `____ilog2_NaN'
      dvb-frontends/zl10353.c:138: undefined reference to `__aeabi_uldivmod'
      ERROR: "____ilog2_NaN" [drivers/media/dvb-frontends/zl10353.ko] undefined!
      ERROR: "__aeabi_uldivmod" [drivers/media/dvb-frontends/zl10353.ko] undefined!
      
      This patch avoids the problem by changing __trace_if() to check
      whether the condition is known at compile-time to be nonzero, rather
      than checking whether it is actually a constant.
      
      I see this one link error in roughly one out of 1600 randconfig builds
      on ARM, and the patch fixes all known instances.
      
      Link: http://lkml.kernel.org/r/1455312410-1058841-1-git-send-email-arnd@arndb.deAcked-by: default avatarNicolas Pitre <nico@linaro.org>
      Fixes: ab3c9c68 ("branch tracer, intel-iommu: fix build with CONFIG_BRANCH_TRACER=y")
      Signed-off-by: default avatarArnd Bergmann <arnd@arndb.de>
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      0081cd7b
    • Steven Rostedt's avatar
      tools lib traceevent: Fix output of %llu for 64 bit values read on 32 bit machines · 2a4305da
      Steven Rostedt authored
      commit 32abc2ed upstream.
      
      When a long value is read on 32 bit machines for 64 bit output, the
      parsing needs to change "%lu" into "%llu", as the value is read
      natively.
      
      Unfortunately, if "%llu" is already there, the code will add another "l"
      to it and fail to parse it properly.
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      Acked-by: default avatarNamhyung Kim <namhyung@kernel.org>
      Link: http://lkml.kernel.org/r/20151116172516.4b79b109@gandalf.local.homeSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      2a4305da
    • Jann Horn's avatar
      ptrace: use fsuid, fsgid, effective creds for fs access checks · 66d3ad93
      Jann Horn authored
      commit caaee623 upstream.
      
      By checking the effective credentials instead of the real UID / permitted
      capabilities, ensure that the calling process actually intended to use its
      credentials.
      
      To ensure that all ptrace checks use the correct caller credentials (e.g.
      in case out-of-tree code or newly added code omits the PTRACE_MODE_*CREDS
      flag), use two new flags and require one of them to be set.
      
      The problem was that when a privileged task had temporarily dropped its
      privileges, e.g.  by calling setreuid(0, user_uid), with the intent to
      perform following syscalls with the credentials of a user, it still passed
      ptrace access checks that the user would not be able to pass.
      
      While an attacker should not be able to convince the privileged task to
      perform a ptrace() syscall, this is a problem because the ptrace access
      check is reused for things in procfs.
      
      In particular, the following somewhat interesting procfs entries only rely
      on ptrace access checks:
      
       /proc/$pid/stat - uses the check for determining whether pointers
           should be visible, useful for bypassing ASLR
       /proc/$pid/maps - also useful for bypassing ASLR
       /proc/$pid/cwd - useful for gaining access to restricted
           directories that contain files with lax permissions, e.g. in
           this scenario:
           lrwxrwxrwx root root /proc/13020/cwd -> /root/foobar
           drwx------ root root /root
           drwxr-xr-x root root /root/foobar
           -rw-r--r-- root root /root/foobar/secret
      
      Therefore, on a system where a root-owned mode 6755 binary changes its
      effective credentials as described and then dumps a user-specified file,
      this could be used by an attacker to reveal the memory layout of root's
      processes or reveal the contents of files he is not allowed to access
      (through /proc/$pid/cwd).
      
      [akpm@linux-foundation.org: fix warning]
      Signed-off-by: default avatarJann Horn <jann@thejh.net>
      Acked-by: default avatarKees Cook <keescook@chromium.org>
      Cc: Casey Schaufler <casey@schaufler-ca.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: James Morris <james.l.morris@oracle.com>
      Cc: "Serge E. Hallyn" <serge.hallyn@ubuntu.com>
      Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Willy Tarreau <w@1wt.eu>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      66d3ad93
    • Peter Feiner's avatar
      perf trace: Fix documentation for -i · f3dff1cb
      Peter Feiner authored
      commit 956959f6 upstream.
      
      The -i flag was incorrectly listed as a short flag for --no-inherit.  It
      should have only been listed as a short flag for --input.
      
      This documentation error has existed since the --input flag was
      introduced in 6810fc91 (perf trace: Add
      option to analyze events in a file versus live).
      Signed-off-by: default avatarPeter Feiner <pfeiner@google.com>
      Cc: David Ahern <dsahern@gmail.com>
      Link: http://lkml.kernel.org/r/1446657706-14518-1-git-send-email-pfeiner@google.com
      Fixes: 6810fc91 ("perf trace: Add option to analyze events in a file versus live")
      Signed-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      f3dff1cb
    • Peter Zijlstra's avatar
      perf: Fix inherited events vs. tracepoint filters · c788b454
      Peter Zijlstra authored
      commit b71b437e upstream.
      
      Arnaldo reported that tracepoint filters seem to misbehave (ie. not
      apply) on inherited events.
      
      The fix is obvious; filters are only set on the actual (parent)
      event, use the normal pattern of using this parent event for filters.
      This is safe because each child event has a reference to it.
      Reported-by: default avatarArnaldo Carvalho de Melo <acme@kernel.org>
      Tested-by: default avatarArnaldo Carvalho de Melo <acme@kernel.org>
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Frédéric Weisbecker <fweisbec@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: http://lkml.kernel.org/r/20151102095051.GN17308@twins.programming.kicks-ass.netSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      c788b454
    • Filipe Manana's avatar
      Btrfs: fix hang on extent buffer lock caused by the inode_paths ioctl · 2c0d636d
      Filipe Manana authored
      commit 0c0fe3b0 upstream.
      
      While doing some tests I ran into an hang on an extent buffer's rwlock
      that produced the following trace:
      
      [39389.800012] NMI watchdog: BUG: soft lockup - CPU#15 stuck for 22s! [fdm-stress:32166]
      [39389.800016] NMI watchdog: BUG: soft lockup - CPU#14 stuck for 22s! [fdm-stress:32165]
      [39389.800016] Modules linked in: btrfs dm_mod ppdev xor sha256_generic hmac raid6_pq drbg ansi_cprng aesni_intel i2c_piix4 acpi_cpufreq aes_x86_64 ablk_helper tpm_tis parport_pc i2c_core sg cryptd evdev psmouse lrw tpm parport gf128mul serio_raw pcspkr glue_helper processor button loop autofs4 ext4 crc16 mbcache jbd2 sd_mod sr_mod cdrom ata_generic virtio_scsi ata_piix libata virtio_pci virtio_ring crc32c_intel scsi_mod e1000 virtio floppy [last unloaded: btrfs]
      [39389.800016] irq event stamp: 0
      [39389.800016] hardirqs last  enabled at (0): [<          (null)>]           (null)
      [39389.800016] hardirqs last disabled at (0): [<ffffffff8104e58d>] copy_process+0x638/0x1a35
      [39389.800016] softirqs last  enabled at (0): [<ffffffff8104e58d>] copy_process+0x638/0x1a35
      [39389.800016] softirqs last disabled at (0): [<          (null)>]           (null)
      [39389.800016] CPU: 14 PID: 32165 Comm: fdm-stress Not tainted 4.4.0-rc6-btrfs-next-18+ #1
      [39389.800016] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS by qemu-project.org 04/01/2014
      [39389.800016] task: ffff880175b1ca40 ti: ffff8800a185c000 task.ti: ffff8800a185c000
      [39389.800016] RIP: 0010:[<ffffffff810902af>]  [<ffffffff810902af>] queued_spin_lock_slowpath+0x57/0x158
      [39389.800016] RSP: 0018:ffff8800a185fb80  EFLAGS: 00000202
      [39389.800016] RAX: 0000000000000101 RBX: ffff8801710c4e9c RCX: 0000000000000101
      [39389.800016] RDX: 0000000000000100 RSI: 0000000000000001 RDI: 0000000000000001
      [39389.800016] RBP: ffff8800a185fb98 R08: 0000000000000001 R09: 0000000000000000
      [39389.800016] R10: ffff8800a185fb68 R11: 6db6db6db6db6db7 R12: ffff8801710c4e98
      [39389.800016] R13: ffff880175b1ca40 R14: ffff8800a185fc10 R15: ffff880175b1ca40
      [39389.800016] FS:  00007f6d37fff700(0000) GS:ffff8802be9c0000(0000) knlGS:0000000000000000
      [39389.800016] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [39389.800016] CR2: 00007f6d300019b8 CR3: 0000000037c93000 CR4: 00000000001406e0
      [39389.800016] Stack:
      [39389.800016]  ffff8801710c4e98 ffff8801710c4e98 ffff880175b1ca40 ffff8800a185fbb0
      [39389.800016]  ffffffff81091e11 ffff8801710c4e98 ffff8800a185fbc8 ffffffff81091895
      [39389.800016]  ffff8801710c4e98 ffff8800a185fbe8 ffffffff81486c5c ffffffffa067288c
      [39389.800016] Call Trace:
      [39389.800016]  [<ffffffff81091e11>] queued_read_lock_slowpath+0x46/0x60
      [39389.800016]  [<ffffffff81091895>] do_raw_read_lock+0x3e/0x41
      [39389.800016]  [<ffffffff81486c5c>] _raw_read_lock+0x3d/0x44
      [39389.800016]  [<ffffffffa067288c>] ? btrfs_tree_read_lock+0x54/0x125 [btrfs]
      [39389.800016]  [<ffffffffa067288c>] btrfs_tree_read_lock+0x54/0x125 [btrfs]
      [39389.800016]  [<ffffffffa0622ced>] ? btrfs_find_item+0xa7/0xd2 [btrfs]
      [39389.800016]  [<ffffffffa069363f>] btrfs_ref_to_path+0xd6/0x174 [btrfs]
      [39389.800016]  [<ffffffffa0693730>] inode_to_path+0x53/0xa2 [btrfs]
      [39389.800016]  [<ffffffffa0693e2e>] paths_from_inode+0x117/0x2ec [btrfs]
      [39389.800016]  [<ffffffffa0670cff>] btrfs_ioctl+0xd5b/0x2793 [btrfs]
      [39389.800016]  [<ffffffff8108a8b0>] ? arch_local_irq_save+0x9/0xc
      [39389.800016]  [<ffffffff81276727>] ? __this_cpu_preempt_check+0x13/0x15
      [39389.800016]  [<ffffffff8108a8b0>] ? arch_local_irq_save+0x9/0xc
      [39389.800016]  [<ffffffff8118b3d4>] ? rcu_read_unlock+0x3e/0x5d
      [39389.800016]  [<ffffffff811822f8>] do_vfs_ioctl+0x42b/0x4ea
      [39389.800016]  [<ffffffff8118b4f3>] ? __fget_light+0x62/0x71
      [39389.800016]  [<ffffffff8118240e>] SyS_ioctl+0x57/0x79
      [39389.800016]  [<ffffffff814872d7>] entry_SYSCALL_64_fastpath+0x12/0x6f
      [39389.800016] Code: b9 01 01 00 00 f7 c6 00 ff ff ff 75 32 83 fe 01 89 ca 89 f0 0f 45 d7 f0 0f b1 13 39 f0 74 04 89 c6 eb e2 ff ca 0f 84 fa 00 00 00 <8b> 03 84 c0 74 04 f3 90 eb f6 66 c7 03 01 00 e9 e6 00 00 00 e8
      [39389.800012] Modules linked in: btrfs dm_mod ppdev xor sha256_generic hmac raid6_pq drbg ansi_cprng aesni_intel i2c_piix4 acpi_cpufreq aes_x86_64 ablk_helper tpm_tis parport_pc i2c_core sg cryptd evdev psmouse lrw tpm parport gf128mul serio_raw pcspkr glue_helper processor button loop autofs4 ext4 crc16 mbcache jbd2 sd_mod sr_mod cdrom ata_generic virtio_scsi ata_piix libata virtio_pci virtio_ring crc32c_intel scsi_mod e1000 virtio floppy [last unloaded: btrfs]
      [39389.800012] irq event stamp: 0
      [39389.800012] hardirqs last  enabled at (0): [<          (null)>]           (null)
      [39389.800012] hardirqs last disabled at (0): [<ffffffff8104e58d>] copy_process+0x638/0x1a35
      [39389.800012] softirqs last  enabled at (0): [<ffffffff8104e58d>] copy_process+0x638/0x1a35
      [39389.800012] softirqs last disabled at (0): [<          (null)>]           (null)
      [39389.800012] CPU: 15 PID: 32166 Comm: fdm-stress Tainted: G             L  4.4.0-rc6-btrfs-next-18+ #1
      [39389.800012] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS by qemu-project.org 04/01/2014
      [39389.800012] task: ffff880179294380 ti: ffff880034a60000 task.ti: ffff880034a60000
      [39389.800012] RIP: 0010:[<ffffffff81091e8d>]  [<ffffffff81091e8d>] queued_write_lock_slowpath+0x62/0x72
      [39389.800012] RSP: 0018:ffff880034a639f0  EFLAGS: 00000206
      [39389.800012] RAX: 0000000000000101 RBX: ffff8801710c4e98 RCX: 0000000000000000
      [39389.800012] RDX: 00000000000000ff RSI: 0000000000000000 RDI: ffff8801710c4e9c
      [39389.800012] RBP: ffff880034a639f8 R08: 0000000000000001 R09: 0000000000000000
      [39389.800012] R10: ffff880034a639b0 R11: 0000000000001000 R12: ffff8801710c4e98
      [39389.800012] R13: 0000000000000001 R14: ffff880172cbc000 R15: ffff8801710c4e00
      [39389.800012] FS:  00007f6d377fe700(0000) GS:ffff8802be9e0000(0000) knlGS:0000000000000000
      [39389.800012] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [39389.800012] CR2: 00007f6d3d3c1000 CR3: 0000000037c93000 CR4: 00000000001406e0
      [39389.800012] Stack:
      [39389.800012]  ffff8801710c4e98 ffff880034a63a10 ffffffff81091963 ffff8801710c4e98
      [39389.800012]  ffff880034a63a30 ffffffff81486f1b ffffffffa0672cb3 ffff8801710c4e00
      [39389.800012]  ffff880034a63a78 ffffffffa0672cb3 ffff8801710c4e00 ffff880034a63a58
      [39389.800012] Call Trace:
      [39389.800012]  [<ffffffff81091963>] do_raw_write_lock+0x72/0x8c
      [39389.800012]  [<ffffffff81486f1b>] _raw_write_lock+0x3a/0x41
      [39389.800012]  [<ffffffffa0672cb3>] ? btrfs_tree_lock+0x119/0x251 [btrfs]
      [39389.800012]  [<ffffffffa0672cb3>] btrfs_tree_lock+0x119/0x251 [btrfs]
      [39389.800012]  [<ffffffffa061aeba>] ? rcu_read_unlock+0x5b/0x5d [btrfs]
      [39389.800012]  [<ffffffffa061ce13>] ? btrfs_root_node+0xda/0xe6 [btrfs]
      [39389.800012]  [<ffffffffa061ce83>] btrfs_lock_root_node+0x22/0x42 [btrfs]
      [39389.800012]  [<ffffffffa062046b>] btrfs_search_slot+0x1b8/0x758 [btrfs]
      [39389.800012]  [<ffffffff810fc6b0>] ? time_hardirqs_on+0x15/0x28
      [39389.800012]  [<ffffffffa06365db>] btrfs_lookup_inode+0x31/0x95 [btrfs]
      [39389.800012]  [<ffffffff8108d62f>] ? trace_hardirqs_on+0xd/0xf
      [39389.800012]  [<ffffffff8148482b>] ? mutex_lock_nested+0x397/0x3bc
      [39389.800012]  [<ffffffffa068821b>] __btrfs_update_delayed_inode+0x59/0x1c0 [btrfs]
      [39389.800012]  [<ffffffffa068858e>] __btrfs_commit_inode_delayed_items+0x194/0x5aa [btrfs]
      [39389.800012]  [<ffffffff81486ab7>] ? _raw_spin_unlock+0x31/0x44
      [39389.800012]  [<ffffffffa0688a48>] __btrfs_run_delayed_items+0xa4/0x15c [btrfs]
      [39389.800012]  [<ffffffffa0688d62>] btrfs_run_delayed_items+0x11/0x13 [btrfs]
      [39389.800012]  [<ffffffffa064048e>] btrfs_commit_transaction+0x234/0x96e [btrfs]
      [39389.800012]  [<ffffffffa0618d10>] btrfs_sync_fs+0x145/0x1ad [btrfs]
      [39389.800012]  [<ffffffffa0671176>] btrfs_ioctl+0x11d2/0x2793 [btrfs]
      [39389.800012]  [<ffffffff8108a8b0>] ? arch_local_irq_save+0x9/0xc
      [39389.800012]  [<ffffffff81140261>] ? __might_fault+0x4c/0xa7
      [39389.800012]  [<ffffffff81140261>] ? __might_fault+0x4c/0xa7
      [39389.800012]  [<ffffffff8108a8b0>] ? arch_local_irq_save+0x9/0xc
      [39389.800012]  [<ffffffff8118b3d4>] ? rcu_read_unlock+0x3e/0x5d
      [39389.800012]  [<ffffffff811822f8>] do_vfs_ioctl+0x42b/0x4ea
      [39389.800012]  [<ffffffff8118b4f3>] ? __fget_light+0x62/0x71
      [39389.800012]  [<ffffffff8118240e>] SyS_ioctl+0x57/0x79
      [39389.800012]  [<ffffffff814872d7>] entry_SYSCALL_64_fastpath+0x12/0x6f
      [39389.800012] Code: f0 0f b1 13 85 c0 75 ef eb 2a f3 90 8a 03 84 c0 75 f8 f0 0f b0 13 84 c0 75 f0 ba ff 00 00 00 eb 0a f0 0f b1 13 ff c8 74 0b f3 90 <8b> 03 83 f8 01 75 f7 eb ed c6 43 04 00 5b 5d c3 0f 1f 44 00 00
      
      This happens because in the code path executed by the inode_paths ioctl we
      end up nesting two calls to read lock a leaf's rwlock when after the first
      call to read_lock() and before the second call to read_lock(), another
      task (running the delayed items as part of a transaction commit) has
      already called write_lock() against the leaf's rwlock. This situation is
      illustrated by the following diagram:
      
               Task A                       Task B
      
        btrfs_ref_to_path()               btrfs_commit_transaction()
          read_lock(&eb->lock);
      
                                            btrfs_run_delayed_items()
                                              __btrfs_commit_inode_delayed_items()
                                                __btrfs_update_delayed_inode()
                                                  btrfs_lookup_inode()
      
                                                    write_lock(&eb->lock);
                                                      --> task waits for lock
      
          read_lock(&eb->lock);
          --> makes this task hang
              forever (and task B too
      	of course)
      
      So fix this by avoiding doing the nested read lock, which is easily
      avoidable. This issue does not happen if task B calls write_lock() after
      task A does the second call to read_lock(), however there does not seem
      to exist anything in the documentation that mentions what is the expected
      behaviour for recursive locking of rwlocks (leaving the idea that doing
      so is not a good usage of rwlocks).
      
      Also, as a side effect necessary for this fix, make sure we do not
      needlessly read lock extent buffers when the input path has skip_locking
      set (used when called from send).
      Signed-off-by: default avatarFilipe Manana <fdmanana@suse.com>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      2c0d636d