1. 25 Feb, 2016 18 commits
  2. 24 Feb, 2016 22 commits
    • mm: soft-offline: check return value in second __get_any_page() call · 5b48ad31
      Naoya Horiguchi authored
      commit d96b339f upstream.
      
      I saw the following BUG_ON triggered in a testcase where a process calls
      madvise(MADV_SOFT_OFFLINE) on thps, along with a background process that
      calls migratepages command repeatedly (doing ping-pong among different
      NUMA nodes) for the first process:
      
         Soft offlining page 0x60000 at 0x700000600000
         __get_any_page: 0x60000 free buddy page
         page:ffffea0001800000 count:0 mapcount:-127 mapping:          (null) index:0x1
         flags: 0x1fffc0000000000()
         page dumped because: VM_BUG_ON_PAGE(atomic_read(&page->_count) == 0)
         ------------[ cut here ]------------
         kernel BUG at /src/linux-dev/include/linux/mm.h:342!
         invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC
         Modules linked in: cfg80211 rfkill crc32c_intel serio_raw virtio_balloon i2c_piix4 virtio_blk virtio_net ata_generic pata_acpi
         CPU: 3 PID: 3035 Comm: test_alloc_gene Tainted: G           O    4.4.0-rc8-v4.4-rc8-160107-1501-00000-rc8+ #74
         Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
         task: ffff88007c63d5c0 ti: ffff88007c210000 task.ti: ffff88007c210000
         RIP: 0010:[<ffffffff8118998c>]  [<ffffffff8118998c>] put_page+0x5c/0x60
         RSP: 0018:ffff88007c213e00  EFLAGS: 00010246
         Call Trace:
           put_hwpoison_page+0x4e/0x80
           soft_offline_page+0x501/0x520
           SyS_madvise+0x6bc/0x6f0
           entry_SYSCALL_64_fastpath+0x12/0x6a
         Code: 8b fc ff ff 5b 5d c3 48 89 df e8 b0 fa ff ff 48 89 df 31 f6 e8 c6 7d ff ff 5b 5d c3 48 c7 c6 08 54 a2 81 48 89 df e8 a4 c5 01 00 <0f> 0b 66 90 66 66 66 66 90 55 48 89 e5 41 55 41 54 53 48 8b 47
         RIP  [<ffffffff8118998c>] put_page+0x5c/0x60
          RSP <ffff88007c213e00>
      
      The root cause resides in get_any_page(), which retries to get a refcount
      of the page to be soft-offlined.  This function calls
      put_hwpoison_page(), expecting that the target page is put back on the LRU
      list.  But the page can also be freed to buddy, so the second check needs
      to handle that case.
      
      Fixes: af8fae7c ("mm/memory-failure.c: clean up soft_offline_page()")
      Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Sasha Levin <sasha.levin@oracle.com>
      Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Jerome Marchand <jmarchan@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Steve Capper <steve.capper@linaro.org>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: David Rientjes <rientjes@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
    • fuse: break infinite loop in fuse_fill_write_pages() · affc1b9e
      Roman Gushchin authored
      commit 3ca8138f upstream.
      
      I got a report about an unkillable task eating CPU. Further
      investigation shows that the problem is in the fuse_fill_write_pages()
      function. If the iov's first segment has zero length, we get an infinite
      loop, because we never reach the iov_iter_advance() call.
      
      Fix this by calling iov_iter_advance() before repeating an attempt to
      copy data from userspace.
      
      A similar problem is described in 124d3b70 ("fix writev regression:
      pan hanging unkillable and un-straceable"). If a zero-length segment
      is followed by a segment with an invalid address,
      iov_iter_fault_in_readable() checks only the first (zero-length) segment,
      iov_iter_copy_from_user_atomic() skips it, fails at the second and
      returns zero -> goto again without skipping the zero-length segment.
      
      The patch calls iov_iter_advance() before goto again: we'll skip the
      zero-length segment at the second iteration and
      iov_iter_fault_in_readable() will detect the invalid address.
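
      The retry behaviour described above can be modelled in user space. The
      following is only a sketch: struct seg, struct iter, fault_in(),
      copy_atomic(), advance() and fill_pages() are hypothetical stand-ins, not
      the kernel's iov_iter API, and the model copies whole segments at a time.
      An iteration cap stands in for the infinite loop.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's iov_iter; not the real API. */
struct seg { int valid; size_t len; };	/* valid == 0 models a bad user address */
struct iter { const struct seg *segs; size_t nsegs; size_t idx; };

/* Models the fault-in check: it inspects only the current segment. */
static int fault_in(const struct iter *it)
{
	const struct seg *s;

	assert(it->idx < it->nsegs);
	s = &it->segs[it->idx];
	return (s->len > 0 && !s->valid) ? -1 : 0;
}

/* Models the atomic copy: skips empty segments, returns 0 at a bad address. */
static size_t copy_atomic(const struct iter *it)
{
	size_t i = it->idx;

	while (i < it->nsegs && it->segs[i].len == 0)
		i++;
	if (i == it->nsegs || !it->segs[i].valid)
		return 0;
	return it->segs[i].len;
}

/* Models the advance step: steps past empty segments, then past copied data. */
static void advance(struct iter *it, size_t bytes)
{
	while (it->idx < it->nsegs && it->segs[it->idx].len == 0)
		it->idx++;
	if (bytes > 0 && it->idx < it->nsegs)
		it->idx++;
}

/*
 * Returns 0 when all segments are consumed, -1 when a bad address is
 * detected, and -2 when the iteration cap is hit (the model's stand-in
 * for looping forever).
 */
static int fill_pages(struct iter *it, int advance_before_retry)
{
	for (int tries = 0; tries < 1000; tries++) {
		size_t copied;

		if (it->idx >= it->nsegs)
			return 0;
		if (fault_in(it))
			return -1;
		copied = copy_atomic(it);
		if (copied == 0 && !advance_before_retry)
			continue;	/* old behaviour: retry without progress */
		advance(it, copied);
	}
	return -2;
}
```

      With a zero-length segment followed by an invalid one, calling
      fill_pages() with advance_before_retry set reports the fault on the
      second pass, while the old behaviour only stops at the cap.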
      
      Special thanks to Konstantin Khlebnikov, who helped a lot with the commit
      description.
      
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Maxim Patlasov <mpatlasov@parallels.com>
      Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
      Signed-off-by: Roman Gushchin <klamm@yandex-team.ru>
      Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
      Fixes: ea9b9907 ("fuse: implement perform_write")
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
    • ARM: 8517/1: ICST: avoid arithmetic overflow in icst_hz() · 8c6bd581
      Linus Walleij authored
      commit 5070fb14 upstream.
      
      When trying to set the ICST 307 clock to 25174000 Hz I ran into
      this arithmetic error: icst_hz_to_vco() correctly figures out
      DIVIDE=2, RDW=100 and VDW=99 yielding a frequency of
      25174000 Hz out of the VCO. (I replicated the icst_hz() function
      in a spreadsheet to verify this.)
      
      However, when I called icst_hz() on these VCO settings it would
      instead return 4122709 Hz. This causes an error in the common
      clock driver for ICST as the common clock framework will call
      .round_rate() on the clock which will utilize icst_hz_to_vco()
      followed by icst_hz() suggesting the erroneous frequency, and
      then the clock gets set to this.
      
      The error did not manifest in the old clock framework since
      this high frequency was only used by the CLCD, which calls
      clk_set_rate() without first calling clk_round_rate(). Since
      the old clock framework would not call clk_round_rate() before
      setting the frequency, the correct values propagated into
      the VCO.
      
      After some experimenting I figured out that it was due to a simple
      arithmetic overflow: the divisor for the 24 MHz reference frequency
      becomes 24000000*2*(99+8)=0x132212400 and the "1" in bit 32 overflows
      and is lost.
      
      By introducing an explicit 64-by-32 bit do_div() and casting
      the divisor into (u64) we get the right frequency back, and the
      right frequency gets set.
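
      The wrap-around can be reproduced outside the kernel. The sketch below
      assumes the frequency is computed as ref * 2 * (v + 8) / ((r + 2) * s)
      and that "DIVIDE=2" means an output divider of 2; both are assumptions
      made for illustration, and hz_32bit() and hz_64bit() are hypothetical
      names. Plain 64-bit division stands in for do_div().

```c
#include <assert.h>
#include <stdint.h>

/* 32-bit arithmetic: the product ref * 2 * (v + 8) wraps modulo 2^32. */
static uint32_t hz_32bit(uint32_t ref, uint32_t v, uint32_t r, uint32_t s)
{
	assert(s != 0);
	return ref * 2 * (v + 8) / ((r + 2) * s);
}

/* 64-bit arithmetic: widening one operand keeps bit 32 of the product. */
static uint32_t hz_64bit(uint32_t ref, uint32_t v, uint32_t r, uint32_t s)
{
	uint64_t product;

	assert(s != 0);
	product = (uint64_t)ref * 2 * (v + 8);
	return (uint32_t)(product / ((r + 2) * s));
}
```

      For ref = 24000000, v = 99, r = 100 and s = 2, hz_32bit() returns
      4122709, the erroneous value quoted above, while hz_64bit() returns
      25176470, which is close to the requested 25174000 Hz.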
      
      Tested on the ARM Versatile.
      
      Cc: linux-clk@vger.kernel.org
      Cc: Pawel Moll <pawel.moll@arm.com>
      Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
      Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
    • ARM: 8519/1: ICST: try other dividends than 1 · d306ee9e
      Linus Walleij authored
      commit e972c374 upstream.
      
      Since the dawn of time the ICST code has only supported divide
      by one or hang in an eternal loop. Luckily we were always dividing
      by one because the reference frequency for the systems using
      the ICSTs is 24 MHz and the [min,max] values for the PLL input
      are [10,320] MHz for ICST307 and [6,200] MHz for ICST525, so the loop
      will always terminate immediately without assigning any divisor
      for the reference frequency.
      
      But for the code to make sense, let's insert the missing i++.
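
      A loop of this shape might look as follows. This is a sketch only:
      find_divisor(), the divisor table and the frequency range are
      hypothetical and are not taken from the ICST driver.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Returns the index of the first divisor whose product with freq lies in
 * (vco_min, vco_max], or -1 if none fits.
 */
static int find_divisor(uint64_t freq, const unsigned char *div, size_t n,
			uint64_t vco_min, uint64_t vco_max)
{
	size_t i = 0;

	assert(n > 0);
	do {
		uint64_t f = freq * div[i];

		if (f > vco_min && f <= vco_max)
			return (int)i;
		i++;	/* without this increment the loop re-tests div[0] forever */
	} while (i < n);

	return -1;
}
```

      With the increment in place the search either finds a fitting divisor
      or terminates after n attempts.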

      Reported-by: David Binderman <dcb314@hotmail.com>
      Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
      Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
    • ARM: 8471/1: need to save/restore arm register(r11) when it is corrupted · 02d2716d
      Anson Huang authored
      commit fa0708b3 upstream.
      
      In the cpu_v7_do_suspend routine, r11 is used but is NOT
      saved/restored. Different compilers may use the ARM general
      registers differently, so this may cause issues when calling
      cpu_v7_do_suspend.
      
      We hit a kernel fault when using GCC 4.8.3: r11 contains a valid
      value before calling into cpu_v7_do_suspend, but on return from
      this routine r11 is corrupted, which leads to a kernel fault.
      Saving and restoring such clobbered registers is a must in
      assembly code.

      Signed-off-by: Anson Huang <Anson.Huang@freescale.com>
      Reviewed-by: Nicolas Pitre <nico@linaro.org>
      Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
    • ARM: dts: Kirkwood: Fix QNAP TS219 power-off · 58893eec
      Helmut Klein authored
      commit 5442f0ea upstream.
      
      The "reg" entry in the "poweroff" section of "kirkwood-ts219.dtsi"
      addressed the wrong uart (0 = console). This patch changes the address
      to select uart 1, which is the uart connected to the pic
      microcontroller, which can switch the device off.

      Signed-off-by: Helmut Klein <hgkr.klein@gmail.com>
      Signed-off-by: Andrew Lunn <andrew@lunn.ch>
      Fixes: 4350a47b ("ARM: Kirkwood: Make use of the QNAP Power off driver.")
      Signed-off-by: Gregory CLEMENT <gregory.clement@free-electrons.com>
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
    • udf: Check output buffer length when converting name to CS0 · fc530807
      Andrew Gabbasov authored
      commit bb00c898 upstream.
      
      If a name contains at least some characters with Unicode values
      exceeding single byte, the CS0 output should have 2 bytes per character.
      And if other input characters have single byte Unicode values, then
      the single input byte is converted to 2 output bytes, and the length
      of output becomes larger than the length of input. And if the input
      name is long enough, the output length may exceed the allocated buffer
      length.
      
      All this means that conversion from UTF8 or NLS to CS0 requires
      checking of output length in order to stop when it exceeds the given
      output buffer size.
      
      [JK: Make code return -ENAMETOOLONG instead of silently truncating the
      name]
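
      The kind of check described here can be sketched as follows. This is a
      simplified model and not the real CS0 encoding: widen_name() is a
      hypothetical helper that turns every input byte into a 16-bit
      big-endian unit and refuses to write past the output buffer.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/*
 * Returns the number of output bytes written, or -ENAMETOOLONG when the
 * output buffer cannot hold the whole converted name.
 */
static int widen_name(const unsigned char *in, size_t in_len,
		      unsigned char *out, size_t out_size)
{
	size_t o = 0;

	assert(in != NULL && out != NULL);
	for (size_t i = 0; i < in_len; i++) {
		if (out_size - o < 2)	/* check before writing, never after */
			return -ENAMETOOLONG;
		out[o++] = 0;		/* high byte */
		out[o++] = in[i];	/* low byte */
	}
	return (int)o;
}
```

      A 4-byte name fits an 8-byte buffer exactly; a 5-byte name yields
      -ENAMETOOLONG instead of a silently truncated result.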

      Signed-off-by: Andrew Gabbasov <andrew_gabbasov@mentor.com>
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
    • udf: Prevent buffer overrun with multi-byte characters · b20777e7
      Andrew Gabbasov authored
      commit ad402b26 upstream.
      
      The udf_CS0toUTF8 function stops the conversion when the output buffer
      length reaches UDF_NAME_LEN-2, which is the correct maximum name length,
      but when checking it leaves space for a single byte only,
      while multi-byte output characters can take more space, causing
      a buffer overflow.
      
      A similar error exists in the udf_CS0toNLS function, which restricts
      the output length to UDF_NAME_LEN, while the actual maximum allowed
      length is UDF_NAME_LEN-2.
      
      In these cases the output can overwrite not only the current buffer
      length field, causing corruption of the name buffer itself, but also
      the following allocation structures, causing a kernel crash.
      
      Adjust the output length checks in both functions to prevent buffer
      overruns in case of multi-byte UTF8 or NLS characters.
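
      Reserving room for the whole multi-byte character before writing it
      can be sketched like this. The helpers utf8_len() and encode_bounded()
      are hypothetical and only illustrate the bound check; they are not the
      UDF conversion routines, and surrogate pairs are ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Number of UTF-8 bytes needed for one 16-bit code unit. */
static size_t utf8_len(uint16_t c)
{
	return c < 0x80 ? 1 : c < 0x800 ? 2 : 3;
}

/*
 * Encodes in[] into out[] and stops before the first character that would
 * not fit in max_len bytes. Returns the number of bytes written.
 */
static size_t encode_bounded(const uint16_t *in, size_t n,
			     unsigned char *out, size_t max_len)
{
	size_t o = 0;

	assert(in != NULL && out != NULL);
	for (size_t i = 0; i < n; i++) {
		uint16_t c = in[i];
		size_t need = utf8_len(c);

		if (need > max_len - o)	/* reserve room for the whole character */
			break;
		if (need == 1) {
			out[o++] = (unsigned char)c;
		} else if (need == 2) {
			out[o++] = (unsigned char)(0xC0 | (c >> 6));
			out[o++] = (unsigned char)(0x80 | (c & 0x3F));
		} else {
			out[o++] = (unsigned char)(0xE0 | (c >> 12));
			out[o++] = (unsigned char)(0x80 | ((c >> 6) & 0x3F));
			out[o++] = (unsigned char)(0x80 | (c & 0x3F));
		}
	}
	return o;
}
```

      Checking only for one free byte would let a three-byte character run
      past max_len; comparing against the full encoded length avoids that.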

      Signed-off-by: Andrew Gabbasov <andrew_gabbasov@mentor.com>
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
    • udf: limit the maximum number of indirect extents in a row · ac79eb71
      Vegard Nossum authored
      commit b0918d9f upstream.
      
      udf_next_aext() just follows extent pointers while extents are marked as
      indirect. This can loop forever on a corrupted filesystem. Limit the
      number of indirect extents we are willing to follow in a row.
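
      A bounded walk of this kind might be sketched as below. struct extent,
      resolve_extent() and the limit MAX_INDIR_HOPS are hypothetical; the
      actual limit chosen by the patch is not stated here.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define MAX_INDIR_HOPS 16	/* hypothetical limit, for illustration only */

struct extent { int indirect; size_t next; };	/* next: index pointed to */

/*
 * Follows indirect extents starting at idx. Returns the index of the first
 * direct extent, or -EIO after too many indirect extents in a row.
 */
static int resolve_extent(const struct extent *tab, size_t n, size_t idx)
{
	int hops = 0;

	assert(idx < n);
	while (tab[idx].indirect) {
		if (++hops > MAX_INDIR_HOPS)
			return -EIO;	/* likely a cycle on a corrupted image */
		idx = tab[idx].next;
		if (idx >= n)
			return -EIO;
	}
	return (int)idx;
}
```

      A table whose indirect extents point at each other now produces an
      error instead of an endless walk.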
      
      [JK: Updated changelog, limit, style]
      Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
      Cc: Jan Kara <jack@suse.com>
      Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
    • nfs: Fix race in __update_open_stateid() · d92f5225
      Andrew Elble authored
      commit 361cad3c upstream.
      
      We've seen this in a packet capture - I've intermixed what I
      think was going on. The fix here is to grab the so_lock sooner.
      
      1964379 -> #1 open (for write) reply seqid=1
      1964393 -> #2 open (for read) reply seqid=2
      
        __nfs4_close(), state->n_wronly--
        nfs4_state_set_mode_locked(), changes state->state = [R]
        state->flags is [RW]
        state->state is [R], state->n_wronly == 0, state->n_rdonly == 1
      
      1964398 -> #3 open (for write) call -> because close is already running
      1964399 -> downgrade (to read) call seqid=2 (close of #1)
      1964402 -> #3 open (for write) reply seqid=3
      
       __update_open_stateid()
         nfs_set_open_stateid_locked(), changes state->flags
         state->flags is [RW]
         state->state is [R], state->n_wronly == 0, state->n_rdonly == 1
         new sequence number is exposed now via nfs4_stateid_copy()
      
         next step would be update_open_stateflags(), pending so_lock
      
      1964403 -> downgrade reply seqid=2, fails with OLD_STATEID (close of #1)
      
         nfs4_close_prepare() gets so_lock and recalcs flags -> send close
      
      1964405 -> downgrade (to read) call seqid=3 (close of #1 retry)
      
         __update_open_stateid() gets so_lock
       * update_open_stateflags() updates state->n_wronly.
         nfs4_state_set_mode_locked() updates state->state
      
         state->flags is [RW]
         state->state is [RW], state->n_wronly == 1, state->n_rdonly == 1
      
       * should have suppressed the preceding nfs4_close_prepare() from
         sending open_downgrade
      
      1964406 -> write call
      1964408 -> downgrade (to read) reply seqid=4 (close of #1 retry)
      
         nfs_clear_open_stateid_locked()
         state->flags is [R]
         state->state is [RW], state->n_wronly == 1, state->n_rdonly == 1
      
      1964409 -> write reply (fails, openmode)

      Signed-off-by: Andrew Elble <aweits@rit.edu>
      Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
    • cifs: fix erroneous return value · 69ef7d4f
      Anton Protopopov authored
      commit 4b550af5 upstream.
      
      The setup_ntlmv2_rsp() function may return positive value ENOMEM instead
      of -ENOMEM in case of kmalloc failure.
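
      Why the sign matters can be shown with a small user-space sketch.
      alloc_response() and failing_alloc() are hypothetical helpers; the
      allocator is passed in only so the failure path can be exercised.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Demonstration allocator that always fails. */
static void *failing_alloc(size_t n)
{
	(void)n;
	return NULL;
}

/* Kernel-style convention: 0 on success, negative errno on failure. */
static int alloc_response(void *(*alloc)(size_t), size_t len,
			  unsigned char **out)
{
	unsigned char *buf;

	assert(alloc != NULL && out != NULL);
	buf = alloc(len);
	if (!buf)
		return -ENOMEM;	/* a positive ENOMEM would slip past "rc < 0" checks */
	*out = buf;
	return 0;
}
```

      Callers that test for a negative return code only see the failure
      when the error is returned as -ENOMEM.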

      Signed-off-by: Anton Protopopov <a.s.protopopov@gmail.com>
      Signed-off-by: Steve French <smfrench@gmail.com>
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
    • cifs_dbg() outputs an uninitialized buffer in cifs_readdir() · ec6e80b8
      Vasily Averin authored
      commit 01b9b0b2 upstream.
      
      In some cases tmp_buf may not be filled in by cifs_filldir and stays
      uninitialized, so printing it with the "%s" modifier can leak the content
      of kernel memory. If the old content of this buffer does not contain '\0',
      an access beyond the end of the allocated object can crash the host.

      Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
      Signed-off-by: Steve French <steve.french@primarydata.com>
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
    • iio: dac: mcp4725: set iio name property in sysfs · d4cd077a
      Yong Li authored
      commit 97a249e9 upstream.
      
      Without this change, the name entity for mcp4725 is missing in
      /sys/bus/iio/devices/iio\:device*/name
      
      With this change, name is reported correctly

      Signed-off-by: Yong Li <sdliyong@gmail.com>
      Signed-off-by: Jonathan Cameron <jic23@kernel.org>
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
    • iio: adis_buffer: Fix out-of-bounds memory access · 8506cf1d
      Lars-Peter Clausen authored
      commit d590faf9 upstream.
      
      The SPI tx and rx buffers are both supposed to be scan_bytes amount of
      bytes large and a common allocation is used to allocate both buffers. This
      puts the beginning of the tx buffer scan_bytes bytes after the rx buffer.
      The initialization of the tx buffer pointer is done adding scan_bytes to
      the beginning of the rx buffer, but since the rx buffer is of type __be16
      this will actually add two times as much and the tx buffer ends up pointing
      after the allocated buffer.
      
      Fix this by using scan_count, which is scan_bytes / 2, instead of
      scan_bytes when initializing the tx buffer pointer.
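
      The pointer arithmetic behind this bug can be illustrated in plain C,
      with uint16_t standing in for __be16. struct xfer_bufs and
      alloc_bufs() are hypothetical names used only for this sketch.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* One allocation holding an rx half followed by a tx half. */
struct xfer_bufs { uint16_t *rx; uint8_t *tx; };

static int alloc_bufs(struct xfer_bufs *b, size_t scan_bytes)
{
	size_t scan_count = scan_bytes / 2;	/* number of 16-bit words */

	assert(b != NULL && scan_bytes % 2 == 0);
	b->rx = malloc(2 * scan_bytes);
	if (!b->rx)
		return -1;
	/* rx + scan_count advances scan_count * sizeof(uint16_t) == scan_bytes bytes. */
	b->tx = (uint8_t *)(b->rx + scan_count);
	return 0;
}
```

      Adding scan_bytes to the 16-bit pointer instead would move it
      2 * scan_bytes bytes, which is the end of the allocation rather than
      its midpoint.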
      
      Fixes: aacff892 ("staging:iio:adis: Preallocate transfer message")
      Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
      Signed-off-by: Jonathan Cameron <jic23@kernel.org>
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
    • iio: fix some warning messages · 3a9194b1
      Dan Carpenter authored
      commit 231bfe53 upstream.
      
      WARN_ON() only takes a condition argument.  I have changed these to
      WARN() instead.

      Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: Jonathan Cameron <jic23@kernel.org>
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
    • iio: ad5064: Fix ad5629/ad5669 shift · c66e8114
      Lars-Peter Clausen authored
      commit 5dcbe97b upstream.
      
      The ad5629/ad5669 are the I2C variants of the ad5628/ad5668, which have an
      SPI interface. They are mostly identical, except that the shift
      factor is different. Currently the driver does not take care of this
      difference, which leads to incorrect DAC output values.
      
      Fix this by introducing a custom channel spec for the ad5629/ad5669 with
      the correct shift factor.
      
      Fixes: commit 6a17a076 ("iio:dac:ad5064: Add support for the ad5629r and ad5669r")
      Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
      Signed-off-by: Jonathan Cameron <jic23@kernel.org>
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
    • iio:ad5064: Make sure ad5064_i2c_write() returns 0 on success · baae260f
      Michael Hennerich authored
      commit 03fe472e upstream.
      
      i2c_master_send() returns the number of bytes transferred on success while
      the ad5064 driver expects that the write() callback returns 0 on success.
      Fix that by translating any non negative return value of i2c_master_send()
      to 0.
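
      The translation between the two return conventions can be sketched as
      follows. send_fn, write_reg(), send_ok() and send_fail() are
      hypothetical; a function pointer replaces the real I2C call so the
      sketch runs in user space.

```c
#include <assert.h>
#include <stddef.h>

/* Transport callback: returns the number of bytes sent, or a negative error. */
typedef int (*send_fn)(const unsigned char *buf, size_t len);

/* Wrapper with the "0 on success, negative error on failure" convention. */
static int write_reg(send_fn send, const unsigned char *buf, size_t len)
{
	int ret;

	assert(send != NULL);
	ret = send(buf, len);
	if (ret < 0)
		return ret;	/* propagate the error unchanged */
	return 0;		/* any non-negative byte count means success */
}

/* Demonstration transports. */
static int send_ok(const unsigned char *buf, size_t len)
{
	(void)buf;
	return (int)len;
}

static int send_fail(const unsigned char *buf, size_t len)
{
	(void)buf;
	(void)len;
	return -5;
}
```

      Without the translation, a successful 3-byte transfer would hand the
      value 3 to a caller that treats any non-zero result as a failure.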
      
      Fixes: commit 6a17a076 ("iio:dac:ad5064: Add support for the ad5629r and ad5669r")
      Signed-off-by: Michael Hennerich <michael.hennerich@analog.com>
      Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
      Signed-off-by: Jonathan Cameron <jic23@kernel.org>
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
    • iio: lpc32xx_adc: fix warnings caused by enabling unprepared clock · 0e2e731e
      Vladimir Zapolskiy authored
      commit 01bb70ae upstream.
      
      If common clock framework is configured, the driver generates a warning,
      which is fixed by this change:
      
          root@devkit3250:~# cat /sys/bus/iio/devices/iio\:device0/in_voltage0_raw
          ------------[ cut here ]------------
          WARNING: CPU: 0 PID: 724 at drivers/clk/clk.c:727 clk_core_enable+0x2c/0xa4()
          Modules linked in: sc16is7xx snd_soc_uda1380
          CPU: 0 PID: 724 Comm: cat Not tainted 4.3.0-rc2+ #198
          Hardware name: LPC32XX SoC (Flattened Device Tree)
          Backtrace:
          [<>] (dump_backtrace) from [<>] (show_stack+0x18/0x1c)
          [<>] (show_stack) from [<>] (dump_stack+0x20/0x28)
          [<>] (dump_stack) from [<>] (warn_slowpath_common+0x90/0xb8)
          [<>] (warn_slowpath_common) from [<>] (warn_slowpath_null+0x24/0x2c)
          [<>] (warn_slowpath_null) from [<>] (clk_core_enable+0x2c/0xa4)
          [<>] (clk_core_enable) from [<>] (clk_enable+0x24/0x38)
          [<>] (clk_enable) from [<>] (lpc32xx_read_raw+0x38/0x80)
          [<>] (lpc32xx_read_raw) from [<>] (iio_read_channel_info+0x70/0x94)
          [<>] (iio_read_channel_info) from [<>] (dev_attr_show+0x28/0x4c)
          [<>] (dev_attr_show) from [<>] (sysfs_kf_seq_show+0x8c/0xf0)
          [<>] (sysfs_kf_seq_show) from [<>] (kernfs_seq_show+0x2c/0x30)
          [<>] (kernfs_seq_show) from [<>] (seq_read+0x1c8/0x440)
          [<>] (seq_read) from [<>] (kernfs_fop_read+0x38/0x170)
          [<>] (kernfs_fop_read) from [<>] (do_readv_writev+0x16c/0x238)
          [<>] (do_readv_writev) from [<>] (vfs_readv+0x50/0x58)
          [<>] (vfs_readv) from [<>] (default_file_splice_read+0x1a4/0x308)
          [<>] (default_file_splice_read) from [<>] (do_splice_to+0x78/0x84)
          [<>] (do_splice_to) from [<>] (splice_direct_to_actor+0xc8/0x1cc)
          [<>] (splice_direct_to_actor) from [<>] (do_splice_direct+0xa0/0xb8)
          [<>] (do_splice_direct) from [<>] (do_sendfile+0x1a8/0x30c)
          [<>] (do_sendfile) from [<>] (SyS_sendfile64+0x104/0x10c)
          [<>] (SyS_sendfile64) from [<>] (ret_fast_syscall+0x0/0x38)

      Signed-off-by: Vladimir Zapolskiy <vz@mleia.com>
      Signed-off-by: Jonathan Cameron <jic23@kernel.org>
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
    • iio:ad7793: Fix ad7785 product ID · 3eefb1c1
      Lars-Peter Clausen authored
      commit 785171fd upstream.
      
      While the datasheet for the AD7785 lists 0xXB as the product ID the actual
      product ID is 0xX3.
      
      Fix the product ID otherwise the driver will reject the device due to non
      matching IDs.
      
      Fixes: e786cc26 ("staging:iio:ad7793: Implement stricter id checking")
      Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
      Signed-off-by: Jonathan Cameron <jic23@kernel.org>
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
    • scsi: fix soft lockup in scsi_remove_target() on module removal · bf7642bd
      James Bottomley authored
      commit 90a88d6e upstream.
      
      This softlockup is currently happening:
      
      [  444.088002] NMI watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [kworker/1:1:29]
      [  444.088002] Modules linked in: lpfc(-) qla2x00tgt(O) qla2xxx_scst(O) scst_vdisk(O) scsi_transport_fc libcrc32c scst(O) dlm configfs nfsd lockd grace nfs_acl auth_rpcgss sunrpc edd snd_pcm_oss snd_mixer_oss snd_seq snd_seq_device dm_mod iTCO_wdt snd_hda_codec_realtek snd_hda_codec_generic gpio_ich iTCO_vendor_support ppdev snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep tg3 snd_pcm snd_timer libphy lpc_ich parport_pc ptp acpi_cpufreq snd pps_core fjes parport i2c_i801 ehci_pci tpm_tis tpm sr_mod cdrom soundcore floppy hwmon sg 8250_fintek pcspkr i915 drm_kms_helper uhci_hcd ehci_hcd drm fb_sys_fops sysimgblt sysfillrect syscopyarea i2c_algo_bit usbcore button video usb_common fan ata_generic ata_piix libata thermal
      [  444.088002] CPU: 1 PID: 29 Comm: kworker/1:1 Tainted: G           O    4.4.0-rc5-2.g1e923a3-default #1
      [  444.088002] Hardware name: FUJITSU SIEMENS ESPRIMO E           /D2164-A1, BIOS 5.00 R1.10.2164.A1               05/08/2006
      [  444.088002] Workqueue: fc_wq_4 fc_rport_final_delete [scsi_transport_fc]
      [  444.088002] task: f6266ec0 ti: f6268000 task.ti: f6268000
      [  444.088002] EIP: 0060:[<c07e7044>] EFLAGS: 00000286 CPU: 1
      [  444.088002] EIP is at _raw_spin_unlock_irqrestore+0x14/0x20
      [  444.088002] EAX: 00000286 EBX: f20d3800 ECX: 00000002 EDX: 00000286
      [  444.088002] ESI: f50ba800 EDI: f2146848 EBP: f6269ec8 ESP: f6269ec8
      [  444.088002]  DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
      [  444.088002] CR0: 8005003b CR2: 08f96600 CR3: 363ae000 CR4: 000006d0
      [  444.088002] Stack:
      [  444.088002]  f6269eec c066b0f7 00000286 f2146848 f50ba808 f50ba800 f50ba800 f2146a90
      [  444.088002]  f2146848 f6269f08 f8f0a4ed f3141000 f2146800 f2146a90 f619fa00 00000040
      [  444.088002]  f6269f40 c026cb25 00000001 166c6392 00000061 f6757140 f6136340 00000004
      [  444.088002] Call Trace:
      [  444.088002]  [<c066b0f7>] scsi_remove_target+0x167/0x1c0
      [  444.088002]  [<f8f0a4ed>] fc_rport_final_delete+0x9d/0x1e0 [scsi_transport_fc]
      [  444.088002]  [<c026cb25>] process_one_work+0x155/0x3e0
      [  444.088002]  [<c026cde7>] worker_thread+0x37/0x490
      [  444.088002]  [<c027214b>] kthread+0x9b/0xb0
      [  444.088002]  [<c07e72c1>] ret_from_kernel_thread+0x21/0x40
      
      What appears to be happening is that something has pinned the target
      so it can't go into STARGET_DEL via final release and the loop in
      scsi_remove_target spins endlessly until that happens.
      
      The fix for this soft lockup is to not keep looping over a device that
      we've called remove on but which hasn't gone into DEL state.  This
      patch will retain a simplistic memory of the last target and not keep
      looping over it.

      Reported-by: Sebastian Herbszt <herbszt@gmx.de>
      Tested-by: Sebastian Herbszt <herbszt@gmx.de>
      Fixes: 40998193
      Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
    • SCSI: Add Marvell Console to VPD blacklist · 14b9a5ea
      Mika Westerberg authored
      commit 82c43310 upstream.
      
      I have a Marvell 88SE9230 SATA Controller that has some sort of
      integrated console SCSI device attached to one of the ports.
      
        ata14: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
        ata14.00: ATAPI: MARVELL VIRTUALL, 1.09, max UDMA/66
        ata14.00: configured for UDMA/66
        scsi 13:0:0:0: Processor         Marvell  Console 1.01 PQ: 0 ANSI: 5
      
      Sending it a VPD INQUIRY command seems to always fail with the following error:
      
        ata14.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6
        ata14.00: irq_stat 0x40000001
        ata14.00: cmd a0/01:00:00:00:01/00:00:00:00:00/a0 tag 2 dma 16640 in
                  Inquiry 12 01 00 00 ff 00res 00/00:00:00:00:00/00:00:00:00:00/00 Emask 0x3 (HSM violation)
        ata14: hard resetting link
      
      This has been a minor annoyance (only an error printed in dmesg) until
      commit 09e2b0b1 ("scsi: rescan VPD attributes") added a call to
      scsi_attach_vpd() in scsi_rescan_device(). That commit causes the system
      to splat out the following errors continuously without ever reaching the UI:
      
        ata14.00: configured for UDMA/66
        ata14: EH complete
        ata14.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6
        ata14.00: irq_stat 0x40000001
        ata14.00: cmd a0/01:00:00:00:01/00:00:00:00:00/a0 tag 6 dma 16640 in
                  Inquiry 12 01 00 00 ff 00res 00/00:00:00:00:00/00:00:00:00:00/00 Emask 0x3 (HSM violation)
        ata14: hard resetting link
        ata14: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
        ata14.00: configured for UDMA/66
        ata14: EH complete
        ata14.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6
        ata14.00: irq_stat 0x40000001
        ata14.00: cmd a0/01:00:00:00:01/00:00:00:00:00/a0 tag 7 dma 16640 in
                  Inquiry 12 01 00 00 ff 00res 00/00:00:00:00:00/00:00:00:00:00/00 Emask 0x3 (HSM violation)
      
      Without an in-depth understanding of the SCSI layer and the Marvell
      controller, I suspect this happens because when the link goes down
      (because of an error) we schedule scsi_rescan_device(), which again fails
      to read VPD data... ad infinitum.
      
      Since VPD data cannot be read from the device anyway, we prevent the SCSI
      layer from even trying by blacklisting the device. This gets rid of the
      error and the system starts up normally.
      
      [mkp: Widened the match to all revisions of this device]

      Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
      Reported-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Reported-by: Alexander Duyck <alexander.duyck@gmail.com>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
    • scsi_dh_rdac: always retry MODE SELECT on command lock violation · 6c3964e4
      Hannes Reinecke authored
      commit d2d06d4f upstream.
      
      If MODE SELECT returns with sense '05/91/36' (command lock violation)
      it should always be retried without counting the number of retries.
      During an HBA upgrade or similar circumstances one might see a flood
      of MODE SELECT commands from various HBAs, which will easily trigger
      the sense code and exceed the retry count.
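
      The retry policy can be modelled with a scripted list of command
      results. This is a sketch under stated assumptions: enum result and
      submit_with_retries() are hypothetical, and a finite array of outcomes
      replaces the real device so the model always terminates.

```c
#include <assert.h>
#include <stddef.h>

enum result { R_OK, R_LOCK_VIOLATION, R_OTHER_ERROR };

/*
 * Walks a scripted sequence of command outcomes. Lock violations are
 * retried without touching the retry budget; other errors consume it.
 * Returns 0 on success and -1 when the budget or the script runs out.
 */
static int submit_with_retries(const enum result *seq, size_t n,
			       int max_retries)
{
	int retries = 0;

	assert(seq != NULL);
	for (size_t i = 0; i < n; i++) {
		if (seq[i] == R_OK)
			return 0;
		if (seq[i] == R_LOCK_VIOLATION)
			continue;	/* retry, not counted */
		if (++retries > max_retries)
			return -1;
	}
	return -1;
}
```

      Five lock violations followed by a success still succeed with a budget
      of one retry, whereas two other errors exhaust the same budget.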

      Signed-off-by: Hannes Reinecke <hare@suse.de>
      Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>