1. 04 May, 2016 40 commits
    • Cyrille Pitchen's avatar
      mtd: spi-nor: remove micron_quad_enable() · 87261de3
      Cyrille Pitchen authored
      commit 3b5394a3 upstream.
      
      This patch remove the micron_quad_enable() function which force the Quad
      SPI mode. However, once this mode is enabled, the Micron memory expect ALL
      commands to use the SPI 4-4-4 protocol. Hence a failure does occur when
      calling spi_nor_wait_till_ready() right after the update of the Enhanced
      Volatile Configuration Register (EVCR) in the micron_quad_enable() as
      the SPI controller driver is not aware about the protocol change.
      
      Since there is almost no performance increase using Fast Read 4-4-4
      commands instead of Fast Read 1-1-4 commands, we rather keep on using the
      Extended SPI mode than enabling the Quad SPI mode.
      
      Let's take the example of the pretty standard use of 8 dummy cycles during
      Fast Read operations on 64KB erase sectors:
      
      Fast Read 1-1-4 requires 8 cycles for the command, then 24 cycles for the
      3byte address followed by 8 dummy clock cycles and finally 65536*2 cycles
      for the read data; so 131112 clock cycles.
      
      On the other hand the Fast Read 4-4-4 would require 2 cycles for the
      command, then 6 cycles for the 3byte address followed by 8 dummy clock
      cycles and finally 65536*2 cycles for the read data. So 131088 clock
      cycles. The theorical bandwidth increase is 0.0%.
      
      Now using Fast Read operations on 512byte pages:
      Fast Read 1-1-4 needs 8+24+8+(512*2) = 1064 clock cycles whereas Fast
      Read 4-4-4 would requires 2+6+8+(512*2) = 1040 clock cycles. Hence the
      theorical bandwidth increase is 2.3%.
      Consecutive reads for non sequential pages is not a relevant use case so
      The Quad SPI mode is not worth it.
      
      mtd_speedtest seems to confirm these figures.
      Signed-off-by: default avatarCyrille Pitchen <cyrille.pitchen@atmel.com>
      Fixes: 548cd3ab ("mtd: spi-nor: Add quad I/O support for Micron SPI NOR")
      Signed-off-by: default avatarBrian Norris <computersforpeace@gmail.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      87261de3
    • Geert Uytterhoeven's avatar
      serial: sh-sci: Remove cpufreq notifier to fix crash/deadlock · 447ea0a3
      Geert Uytterhoeven authored
      commit ff1cab37 upstream.
      
      The BSP team noticed that there is spin/mutex lock issue on sh-sci when
      CPUFREQ is used.  The issue is that the notifier function may call
      mutex_lock() while the spinlock is held, which can lead to a BUG().
      This may happen if CPUFREQ is changed while another CPU calls
      clk_get_rate().
      
      Taking the spinlock was added to the notifier function in commit
      e552de24 ("sh-sci: add platform device private data"), to
      protect the list of serial ports against modification during traversal.
      At that time the Common Clock Framework didn't exist yet, and
      clk_get_rate() just returned clk->rate without taking a mutex.
      Note that since commit d535a230 ("serial: sh-sci: Require a
      device per port mapping."), there's no longer a list of serial ports to
      traverse, and taking the spinlock became superfluous.
      
      To fix the issue, just remove the cpufreq notifier:
        1. The notifier doesn't work correctly: all it does is update stored
           clock rates; it does not update the divider in the hardware.
           The divider will only be updated when calling sci_set_termios().
           I believe this was broken back in 2004, when the old
           drivers/char/sh-sci.c driver (where the notifier did update the
           divider) was replaced by drivers/serial/sh-sci.c (where the
           notifier just updated port->uartclk).
           Cfr. full-history-linux commits 6f8deaef2e9675d9 ("[PATCH] sh: port
           sh-sci driver to the new API") and 3f73fe878dc9210a ("[PATCH]
           Remove old sh-sci driver").
        2. On modern SoCs, the sh-sci parent clock rate is no longer related
           to the CPU clock rate anyway, so using a cpufreq notifier is
           futile.
      Signed-off-by: default avatarGeert Uytterhoeven <geert+renesas@glider.be>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      
      447ea0a3
    • Eryu Guan's avatar
      ext4: fix NULL pointer dereference in ext4_mark_inode_dirty() · c745297b
      Eryu Guan authored
      commit 5e1021f2 upstream.
      
      ext4_reserve_inode_write() in ext4_mark_inode_dirty() could fail on
      error (e.g. EIO) and iloc.bh can be NULL in this case. But the error is
      ignored in the following "if" condition and ext4_expand_extra_isize()
      might be called with NULL iloc.bh set, which triggers NULL pointer
      dereference.
      
      This is uncovered by commit 8b4953e1 ("ext4: reserve code points for
      the project quota feature"), which enlarges the ext4_inode size, and
      run the following script on new kernel but with old mke2fs:
      
        #/bin/bash
        mnt=/mnt/ext4
        devname=ext4-error
        dev=/dev/mapper/$devname
        fsimg=/home/fs.img
      
        trap cleanup 0 1 2 3 9 15
      
        cleanup()
        {
                umount $mnt >/dev/null 2>&1
                dmsetup remove $devname
                losetup -d $backend_dev
                rm -f $fsimg
                exit 0
        }
      
        rm -f $fsimg
        fallocate -l 1g $fsimg
        backend_dev=`losetup -f --show $fsimg`
        devsize=`blockdev --getsz $backend_dev`
      
        good_tab="0 $devsize linear $backend_dev 0"
        error_tab="0 $devsize error $backend_dev 0"
      
        dmsetup create $devname --table "$good_tab"
      
        mkfs -t ext4 $dev
        mount -t ext4 -o errors=continue,strictatime $dev $mnt
      
        dmsetup load $devname --table "$error_tab" && dmsetup resume $devname
        echo 3 > /proc/sys/vm/drop_caches
        ls -l $mnt
        exit 0
      
      [ Patch changed to simplify the function a tiny bit. -- Ted ]
      Signed-off-by: default avatarEryu Guan <guaneryu@gmail.com>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      Cc: Jan Kara <jack@suse.cz>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      c745297b
    • Karol Herbst's avatar
      x86/mm/kmmio: Fix mmiotrace for hugepages · 8481fdf6
      Karol Herbst authored
      commit cfa52c0c upstream.
      
      Because Linux might use bigger pages than the 4K pages to handle those mmio
      ioremaps, the kmmio code shouldn't rely on the pade id as it currently does.
      
      Using the memory address instead of the page id lets us look up how big the
      page is and what its base address is, so that we won't get a page fault
      within the same page twice anymore.
      Tested-by: default avatarPierre Moreau <pierre.morrow@free.fr>
      Signed-off-by: default avatarKarol Herbst <nouveau@karolherbst.de>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Luis R. Rodriguez <mcgrof@suse.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Toshi Kani <toshi.kani@hp.com>
      Cc: linux-mm@kvack.org
      Cc: linux-x86_64@vger.kernel.org
      Cc: nouveau@lists.freedesktop.org
      Cc: pq@iki.fi
      Cc: rostedt@goodmis.org
      Link: http://lkml.kernel.org/r/1456966991-6861-1-git-send-email-nouveau@karolherbst.deSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      8481fdf6
    • Arnaldo Carvalho de Melo's avatar
      perf evlist: Reference count the cpu and thread maps at set_maps() · 36828721
      Arnaldo Carvalho de Melo authored
      commit a55e5663 upstream.
      
      We were dropping the reference we possibly held but not obtaining one
      for the new maps, which we will drop at perf_evlist__delete(), fix it.
      
      This was caught by Steven Noonan in some of the machines which would
      produce this output when caught by glibc debug mechanisms:
      
        $ sudo perf test 21
        21: Test object code reading                                 :***
        Error in `perf': corrupted double-linked list: 0x00000000023ffcd0 ***
        ======= Backtrace: =========
        /usr/lib/libc.so.6(+0x72055)[0x7f25be0f3055]
        /usr/lib/libc.so.6(+0x779b6)[0x7f25be0f89b6]
        /usr/lib/libc.so.6(+0x7a0ed)[0x7f25be0fb0ed]
        /usr/lib/libc.so.6(__libc_calloc+0xba)[0x7f25be0fceda]
        perf(parse_events_lex_init_extra+0x38)[0x4cfff8]
        perf(parse_events+0x55)[0x4a0615]
        perf(perf_evlist__config+0xcf)[0x4eeb2f]
        perf[0x479f82]
        perf(test__code_reading+0x1e)[0x47ad4e]
        perf(cmd_test+0x5dd)[0x46452d]
        perf[0x47f4e3]
        perf(main+0x603)[0x42c723]
        /usr/lib/libc.so.6(__libc_start_main+0xf0)[0x7f25be0a1610]
        perf(_start+0x29)[0x42c859]
      
      Further investigation using valgrind led to the reference count imbalance fixed
      in this patch.
      Reported-and-Tested-by: default avatarSteven Noonan <steven@uplinklabs.net>
      Report-Link: http://lkml.kernel.org/r/CAKbGBLjC2Dx5vshxyGmQkcD+VwiAQLbHoXA9i7kvRB2-2opHZQ@mail.gmail.com
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Fixes: f30a79b0 ("perf tools: Add reference counting for cpu_map object")
      Link: http://lkml.kernel.org/n/tip-j0u1bdhr47sa511sgg76kb8h@git.kernel.orgSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      36828721
    • Michael Hennerich's avatar
      drivers/misc/ad525x_dpot: AD5274 fix RDAC read back errors · ab2c82dc
      Michael Hennerich authored
      commit f3df53e4 upstream.
      
      Fix RDAC read back errors caused by a typo. Value must shift by 2.
      
      Fixes: a4bd3949 ("drivers/misc/ad525x_dpot.c: new features")
      Signed-off-by: default avatarMichael Hennerich <michael.hennerich@analog.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      ab2c82dc
    • Krzysztof Kozlowski's avatar
      rtc: max77686: Properly handle regmap_irq_get_virq() error code · f5513114
      Krzysztof Kozlowski authored
      commit fb166ba1 upstream.
      
      The regmap_irq_get_virq() can return 0 or -EINVAL in error conditions
      but driver checked only for value of 0.
      
      This could lead to a cast of -EINVAL to an unsigned int used as a
      interrupt number for devm_request_threaded_irq(). Although this is not
      yet fatal (devm_request_threaded_irq() will just fail with -EINVAL) but
      might be a misleading when diagnosing errors.
      Signed-off-by: default avatarKrzysztof Kozlowski <k.kozlowski@samsung.com>
      Fixes: 6f1c1e71 ("mfd: max77686: Convert to use regmap_irq")
      Reviewed-by: default avatarJavier Martinez Canillas <javier@osg.samsung.com>
      Signed-off-by: default avatarAlexandre Belloni <alexandre.belloni@free-electrons.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f5513114
    • Alexandre Belloni's avatar
      rtc: rx8025: remove rv8803 id · 11dd7f9a
      Alexandre Belloni authored
      commit aaa3cee5 upstream.
      
      The rv8803 has its own driver that should be used. Remove its id from
      the rx8025 driver.
      
      Fixes: b1f9d790Signed-off-by: default avatarAlexandre Belloni <alexandre.belloni@free-electrons.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      11dd7f9a
    • Dan Carpenter's avatar
      rtc: ds1685: passing bogus values to irq_restore · 83fe55ba
      Dan Carpenter authored
      commit 8c09b9fd upstream.
      
      We call spin_lock_irqrestore with "flags" set to zero instead of to the
      value from spin_lock_irqsave().
      
      Fixes: aaaf5fbf ('rtc: add driver for DS1685 family of real time clocks')
      Signed-off-by: default avatarDan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: default avatarAlexandre Belloni <alexandre.belloni@free-electrons.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      83fe55ba
    • Geert Uytterhoeven's avatar
      rtc: vr41xx: Wire up alarm_irq_enable · 041f2ca3
      Geert Uytterhoeven authored
      commit a25f4a95 upstream.
      
      drivers/rtc/rtc-vr41xx.c:229: warning: ‘vr41xx_rtc_alarm_irq_enable’ defined but not used
      
      Apparently the conversion to alarm_irq_enable forgot to wire up the
      callback.
      
      Fixes: 16380c15 ("RTC: Convert rtc drivers to use the alarm_irq_enable method")
      Signed-off-by: default avatarGeert Uytterhoeven <geert@linux-m68k.org>
      Signed-off-by: default avatarAlexandre Belloni <alexandre.belloni@free-electrons.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      041f2ca3
    • Alexander Kochetkov's avatar
      rtc: hym8563: fix invalid year calculation · 1392ec2a
      Alexander Kochetkov authored
      commit d5861262 upstream.
      
      Year field must be in BCD format, according to
      hym8563 datasheet.
      
      Due to the bug year 2016 became 2010.
      
      Fixes: dcaf0384 ("rtc: add hym8563 rtc-driver")
      Signed-off-by: default avatarAlexander Kochetkov <al.kochet@gmail.com>
      Signed-off-by: default avatarAlexandre Belloni <alexandre.belloni@free-electrons.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      1392ec2a
    • Jon Hunter's avatar
      PM / Domains: Fix removal of a subdomain · cab4c949
      Jon Hunter authored
      commit beda5fc1 upstream.
      
      Commit 30e7a65b (PM / Domains: Ensure subdomain is not in use
      before removing) added a test to ensure that a subdomain is not a
      master to another subdomain or if any devices are using the subdomain
      before removing. This change incorrectly used the "slave_links" list to
      determine if the subdomain is a master to another subdomain, where it
      should have been using the "master_links" list instead. The
      "slave_links" list will never be empty for a subdomain and so a
      subdomain can never be removed. Fix this by testing if the
      "master_links" list is empty instead.
      
      Fixes: 30e7a65b (PM / Domains: Ensure subdomain is not in use before removing)
      Signed-off-by: default avatarJon Hunter <jonathanh@nvidia.com>
      Reviewed-by: default avatarThierry Reding <treding@nvidia.com>
      Acked-by: default avatarUlf Hansson <ulf.hansson@linaro.org>
      Acked-by: default avatarKevin Hilman <khilman@baylibre.com>
      Signed-off-by: default avatarRafael J. Wysocki <rafael.j.wysocki@intel.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      cab4c949
    • Viresh Kumar's avatar
      PM / OPP: Initialize u_volt_min/max to a valid value · dabe1416
      Viresh Kumar authored
      commit c88c395f upstream.
      
      We kept u_volt_min/max initialized to 0, when only the target voltage is
      present in DT, instead of the target/min/max triplet.
      
      This didn't go well with the regulator framework, as on few calls the
      min voltage was set to target and max was set to 0 and so resulted in a
      kernel crash like below:
      
      kernel BUG at ../drivers/regulator/core.c:216!
      
      [<c0684af4>] (regulator_check_voltage) from [<c06857ac>] (regulator_set_voltage_unlocked+0x58/0x230)
      [<c06857ac>] (regulator_set_voltage_unlocked) from [<c06859ac>] (regulator_set_voltage+0x28/0x54)
      [<c06859ac>] (regulator_set_voltage) from [<c0775b28>] (_set_opp_voltage+0x30/0x98)
      [<c0775b28>] (_set_opp_voltage) from [<c0776630>] (dev_pm_opp_set_rate+0xf0/0x28c)
      [<c0776630>] (dev_pm_opp_set_rate) from [<c096f784>] (__cpufreq_driver_target+0x184/0x2b4)
      [<c096f784>] (__cpufreq_driver_target) from [<c0973760>] (dbs_check_cpu+0x1b0/0x1f4)
      [<c0973760>] (dbs_check_cpu) from [<c0973f30>] (cpufreq_governor_dbs+0x324/0x5c4)
      [<c0973f30>] (cpufreq_governor_dbs) from [<c0970958>] (__cpufreq_governor+0xe4/0x1ec)
      [<c0970958>] (__cpufreq_governor) from [<c09711e0>] (cpufreq_init_policy+0x64/0x8c)
      [<c09711e0>] (cpufreq_init_policy) from [<c09718cc>] (cpufreq_online+0x2fc/0x708)
      [<c09718cc>] (cpufreq_online) from [<c0765ff0>] (subsys_interface_register+0x94/0xd8)
      [<c0765ff0>] (subsys_interface_register) from [<c0970530>] (cpufreq_register_driver+0x14c/0x19c)
      [<c0970530>] (cpufreq_register_driver) from [<c09746dc>] (dt_cpufreq_probe+0x70/0xec)
      [<c09746dc>] (dt_cpufreq_probe) from [<c076907c>] (platform_drv_probe+0x4c/0xb0)
      [<c076907c>] (platform_drv_probe) from [<c07678e0>] (driver_probe_device+0x214/0x2c0)
      [<c07678e0>] (driver_probe_device) from [<c0767a18>] (__driver_attach+0x8c/0x90)
      [<c0767a18>] (__driver_attach) from [<c0765c2c>] (bus_for_each_dev+0x68/0x9c)
      [<c0765c2c>] (bus_for_each_dev) from [<c0766d78>] (bus_add_driver+0x1a0/0x218)
      [<c0766d78>] (bus_add_driver) from [<c076810c>] (driver_register+0x78/0xf8)
      [<c076810c>] (driver_register) from [<c0301d74>] (do_one_initcall+0x90/0x1d8)
      [<c0301d74>] (do_one_initcall) from [<c1100e14>] (kernel_init_freeable+0x15c/0x1fc)
      [<c1100e14>] (kernel_init_freeable) from [<c0b27a0c>] (kernel_init+0x8/0xf0)
      [<c0b27a0c>] (kernel_init) from [<c0307d78>] (ret_from_fork+0x14/0x3c)
      Code: e1550004 baffffeb e3a00000 e8bd8070 (e7f001f2)
      
      Fix that by initializing u_volt_min/max to the target voltage in such cases.
      Reported-and-tested-by: default avatarKrzysztof Kozlowski <k.kozlowski@samsung.com>
      Fixes: 27465902 (PM / OPP: Add support to parse "operating-points-v2" bindings)
      Signed-off-by: default avatarViresh Kumar <viresh.kumar@linaro.org>
      Signed-off-by: default avatarRafael J. Wysocki <rafael.j.wysocki@intel.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      dabe1416
    • Dan Carpenter's avatar
      misc: mic/scif: fix wrap around tests · 4f8e29e7
      Dan Carpenter authored
      commit 7b64dbf8 upstream.
      
      Signed integer overflow is undefined.  Also I added a check for
      "(offset < 0)" in scif_unregister() because that makes it match the
      other conditions and because I didn't want to subtract a negative.
      
      Fixes: ba612aa8 ('misc: mic: SCIF memory registration and unregistration')
      Signed-off-by: default avatarDan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      4f8e29e7
    • Ben Hutchings's avatar
      misc/bmp085: Enable building as a module · 01c8261c
      Ben Hutchings authored
      commit 50e6315d upstream.
      
      Commit 985087db 'misc: add support for bmp18x chips to the bmp085
      driver' changed the BMP085 config symbol to a boolean.  I see no
      reason why the shared code cannot be built as a module, so change it
      back to tristate.
      
      Fixes: 985087db ("misc: add support for bmp18x chips to the bmp085 driver")
      Cc: Eric Andersson <eric.andersson@unixphere.com>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      Acked-by: default avatarArnd Bergmann <arnd@arndb.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      01c8261c
    • Michal Marek's avatar
      lib/mpi: Endianness fix · 81b3a56e
      Michal Marek authored
      commit 3ee0cb5f upstream.
      
      The limbs are integers in the host endianness, so we can't simply
      iterate over the individual bytes. The current code happens to work on
      little-endian, because the order of the limbs in the MPI array is the
      same as the order of the bytes in each limb, but it breaks on
      big-endian.
      
      Fixes: 0f74fbf7 ("MPI: Fix mpi_read_buffer")
      Signed-off-by: default avatarMichal Marek <mmarek@suse.com>
      Signed-off-by: default avatarHerbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      81b3a56e
    • Sushaanth Srirangapathi's avatar
      fbdev: da8xx-fb: fix videomodes of lcd panels · 0658e8c5
      Sushaanth Srirangapathi authored
      commit 713fced8 upstream.
      
      Commit 028cd86b ("video: da8xx-fb: fix the polarities of the
      hsync/vsync pulse") fixes polarities of HSYNC/VSYNC pulse but
      forgot to update known_lcd_panels[] which had sync values
      according to old logic. This breaks LCD at least on DA850 EVM.
      
      This patch fixes this issue and I have tested this for panel
      "Sharp_LK043T1DG01" using DA850 EVM board.
      
      Fixes: 028cd86b ("video: da8xx-fb: fix the polarities of the hsync/vsync pulse")
      Signed-off-by: default avatarSushaanth Srirangapathi <sushaanth.s@ti.com>
      Signed-off-by: default avatarTomi Valkeinen <tomi.valkeinen@ti.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      0658e8c5
    • Arnd Bergmann's avatar
      scsi_dh: force modular build if SCSI is a module · 9d9fefc8
      Arnd Bergmann authored
      commit 0c994c03 upstream.
      
      When the scsi_dh core was moved into the scsi core module,
      CONFIG_SCSI_DH became a 'bool' option, and now anything depending on it
      can be built-in even when CONFIG_SCSI=m. This of course cannot link
      successfully:
      
      drivers/scsi/built-in.o: In function `rdac_init':
      scsi_dh_alua.c:(.init.text+0x14): undefined reference to `scsi_register_device_handler'
      scsi_dh_alua.c:(.init.text+0x64): undefined reference to `scsi_unregister_device_handler'
      drivers/scsi/built-in.o: In function `alua_init':
      scsi_dh_alua.c:(.init.text+0xb0): undefined reference to `scsi_register_device_handler'
      
      As a workaround, this adds an extra dependency on CONFIG_SCSI, so
      Kconfig can figure out whether built-in is allowed or not.
      Signed-off-by: default avatarArnd Bergmann <arnd@arndb.de>
      Fixes: 086b91d0 ("scsi_dh: integrate into the core SCSI code")
      Signed-off-by: default avatarMartin K. Petersen <martin.petersen@oracle.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      9d9fefc8
    • Arnd Bergmann's avatar
      paride: make 'verbose' parameter an 'int' again · aea6995a
      Arnd Bergmann authored
      commit dec63a4d upstream.
      
      gcc-6.0 found an ancient bug in the paride driver, which had a
      "module_param(verbose, bool, 0);" since before 2.6.12, but actually uses
      it to accept '0', '1' or '2' as arguments:
      
        drivers/block/paride/pd.c: In function 'pd_init_dev_parms':
        drivers/block/paride/pd.c:298:29: warning: comparison of constant '1' with boolean expression is always false [-Wbool-compare]
         #define DBMSG(msg) ((verbose>1)?(msg):NULL)
      
      In 2012, Rusty did a cleanup patch that also changed the type of the
      variable to 'bool', which introduced what is now a gcc warning.
      
      This changes the type back to 'int' and adapts the module_param() line
      instead, so it should work as documented in case anyone ever cares about
      running the ancient driver with debugging.
      
      Fixes: 90ab5ee9 ("module_param: make bool parameters really bool (drivers & misc)")
      Signed-off-by: default avatarArnd Bergmann <arnd@arndb.de>
      Rusty Russell <rusty@rustcorp.com.au>
      Cc: Tim Waugh <tim@cyberelk.net>
      Cc: Sudip Mukherjee <sudipm.mukherjee@gmail.com>
      Cc: Jens Axboe <axboe@fb.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      aea6995a
    • Arnd Bergmann's avatar
      regulator: s5m8767: fix get_register() error handling · 72291d61
      Arnd Bergmann authored
      commit e07ff943 upstream.
      
      The s5m8767_pmic_probe() function calls s5m8767_get_register() to
      read data without checking the return code, which produces a compile-time
      warning when that data is accessed:
      
      drivers/regulator/s5m8767.c: In function 's5m8767_pmic_probe':
      drivers/regulator/s5m8767.c:924:7: error: 'enable_reg' may be used uninitialized in this function [-Werror=maybe-uninitialized]
      drivers/regulator/s5m8767.c:944:30: error: 'enable_val' may be used uninitialized in this function [-Werror=maybe-uninitialized]
      
      This changes the s5m8767_get_register() function to return a -EINVAL
      not just for an invalid register number but also for an invalid
      regulator number, as both would result in returning uninitialized
      data. The s5m8767_pmic_probe() function is then changed accordingly
      to fail on a read error, as all the other callers of s5m8767_get_register()
      already do.
      
      In practice this probably cannot happen, as we don't call
      s5m8767_get_register() with invalid arguments, but the gcc
      warning seems valid in principle, in terms writing safe
      error checking.
      Signed-off-by: default avatarArnd Bergmann <arnd@arndb.de>
      Fixes: 9c4c6055 ("regulator: s5m8767: Convert to use regulator_[enable|disable|is_enabled]_regmap")
      Signed-off-by: default avatarMark Brown <broonie@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      72291d61
    • Vladimir Zapolskiy's avatar
      irqchip/mxs: Fix error check of of_io_request_and_map() · e60711a1
      Vladimir Zapolskiy authored
      commit edf8fcdc upstream.
      
      The of_io_request_and_map() returns a valid pointer in iomem region or
      ERR_PTR(), check for NULL always fails and may cause a NULL pointer
      dereference on error path.
      
      Fixes: 25e34b44 ("irqchip/mxs: Prepare driver for hardware with different offsets")
      Signed-off-by: default avatarVladimir Zapolskiy <vz@mleia.com>
      Cc: Jason Cooper <jason@lakedaemon.net>
      Cc: Marc Zyngier <marc.zyngier@arm.com>
      Cc: Oleksij Rempel <linux@rempel-privat.de>
      Cc: Sascha Hauer <kernel@pengutronix.de>
      Cc: Shawn Guo <shawnguo@kernel.org>
      Cc: linux-arm-kernel@lists.infradead.org
      Link: http://lkml.kernel.org/r/1457486500-10237-1-git-send-email-vz@mleia.comSigned-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      e60711a1
    • Vladimir Zapolskiy's avatar
      irqchip/sunxi-nmi: Fix error check of of_io_request_and_map() · fd66dc5d
      Vladimir Zapolskiy authored
      commit cfe199af upstream.
      
      The of_io_request_and_map() returns a valid pointer in iomem region or
      ERR_PTR(), check for NULL always fails and may cause a NULL pointer
      dereference on error path.
      
      Fixes: 0e841b04 ("irqchip/sunxi-nmi: Switch to of_io_request_and_map() from of_iomap()")
      Signed-off-by: default avatarVladimir Zapolskiy <vz@mleia.com>
      Cc: Jason Cooper <jason@lakedaemon.net>
      Cc: Marc Zyngier <marc.zyngier@arm.com>
      Cc: Chen-Yu Tsai <wens@csie.org>
      Cc: Maxime Ripard <maxime.ripard@free-electrons.com>
      Cc: linux-arm-kernel@lists.infradead.org
      Link: http://lkml.kernel.org/r/1457486489-10189-1-git-send-email-vz@mleia.comSigned-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      fd66dc5d
    • Huibin Hong's avatar
      spi/rockchip: Make sure spi clk is on in rockchip_spi_set_cs · 791e8462
      Huibin Hong authored
      commit b920cc31 upstream.
      
      Rockchip_spi_set_cs could be called by spi_setup, but
      spi_setup may be called by device driver after runtime suspend.
      Then the spi clock is closed, rockchip_spi_set_cs may access the
      spi registers, which causes cpu block in some socs.
      
      Fixes: 64e36824 ("spi/rockchip: add driver for Rockchip RK3xxx")
      Signed-off-by: default avatarHuibin Hong <huibin.hong@rock-chips.com>
      Signed-off-by: default avatarMark Brown <broonie@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      791e8462
    • Peter Zijlstra's avatar
      locking/mcs: Fix mcs_spin_lock() ordering · 23a67ddd
      Peter Zijlstra authored
      commit 920c720a upstream.
      
      Similar to commit b4b29f94 ("locking/osq: Fix ordering of node
      initialisation in osq_lock") the use of xchg_acquire() is
      fundamentally broken with MCS like constructs.
      
      Furthermore, it turns out we rely on the global transitivity of this
      operation because the unlock path observes the pointer with a
      READ_ONCE(), not an smp_load_acquire().
      
      This is non-critical because the MCS code isn't actually used and
      mostly serves as documentation, a stepping stone to the more complex
      things we've build on top of the idea.
      Reported-by: default avatarAndrea Parri <parri.andrea@gmail.com>
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Will Deacon <will.deacon@arm.com>
      Fixes: 3552a07a ("locking/mcs: Use acquire/release semantics")
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      23a67ddd
    • Thierry Reding's avatar
      regulator: core: Fix nested locking of supplies · 5a58f809
      Thierry Reding authored
      commit 70a7fb80 upstream.
      
      Commit fa731ac7 ("regulator: core: avoid unused variable warning")
      introduced a subtle change in how supplies are locked. Where previously
      code was always locking the regulator of the current iteration, the new
      implementation only locks the regulator if it has a supply. For any
      given power tree that means that the root will never get locked.
      
      On the other hand the regulator_unlock_supply() will still release all
      the locks, which in turn causes the lock debugging code to warn about a
      mutex being unlocked which wasn't locked.
      
      Cc: Mark Brown <broonie@kernel.org>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Fixes: fa731ac7 ("regulator: core: avoid unused variable warning")
      Signed-off-by: default avatarThierry Reding <treding@nvidia.com>
      Signed-off-by: default avatarMark Brown <broonie@kernel.org>
      Cc: Guenter Roeck <linux@roeck-us.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      5a58f809
    • Mark Brown's avatar
      regulator: core: Ensure we lock all regulators · 29c9f634
      Mark Brown authored
      commit 49a6bb7a upstream.
      
      The latest workaround for the lockdep interface's not using the second
      argument of mutex_lock_nested() changed the loop missed locking the last
      regulator due to a thinko with the loop termination condition exiting
      one regulator too soon.
      Reported-by: default avatarTyler Baker <tyler.baker@linaro.org>
      Signed-off-by: default avatarMark Brown <broonie@kernel.org>
      Cc: Guenter Roeck <linux@roeck-us.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      29c9f634
    • Arnd Bergmann's avatar
      regulator: core: fix regulator_lock_supply regression · f500da32
      Arnd Bergmann authored
      commit bb41897e upstream.
      
      As noticed by Geert Uytterhoeven, my patch to avoid a harmless build warning
      in regulator_lock_supply() was total crap and introduced a real bug:
      
      > [ BUG: bad unlock balance detected! ]
      > kworker/u4:0/6 is trying to release lock (&rdev->mutex) at:
      > [<c0247b84>] regulator_set_voltage+0x38/0x50
      
      we still lock the regulator supplies, but not the actual regulators,
      so we are missing a lock, and the unlock is unbalanced.
      
      This rectifies it by first locking the regulator device itself before
      using the same loop as before to lock its supplies.
      Reported-by: default avatarGeert Uytterhoeven <geert@linux-m68k.org>
      Signed-off-by: default avatarArnd Bergmann <arnd@arndb.de>
      Fixes: 716fec9d1965 ("[SUBMITTED] regulator: core: avoid unused variable warning")
      Signed-off-by: default avatarMark Brown <broonie@kernel.org>
      Cc: Guenter Roeck <linux@roeck-us.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f500da32
    • Greg Kroah-Hartman's avatar
      Revert "regulator: core: Fix nested locking of supplies" · 34af67eb
      Greg Kroah-Hartman authored
      This reverts commit b1999fa6 which was
      commit 70a7fb80 upstream.
      
      It causes run-time breakage in the 4.4-stable tree and more patches are
      needed to be applied first before this one in order to resolve the
      issue.
      Reported-by: default avatarGuenter Roeck <linux@roeck-us.net>
      Cc: Mark Brown <broonie@kernel.org>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Thierry Reding <treding@nvidia.com>
      Cc: Mark Brown <broonie@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      34af67eb
    • Sakari Ailus's avatar
      videobuf2-v4l2: Verify planes array in buffer dequeueing · 19a4e46b
      Sakari Ailus authored
      commit 2c1f6951 upstream.
      
      When a buffer is being dequeued using VIDIOC_DQBUF IOCTL, the exact buffer
      which will be dequeued is not known until the buffer has been removed from
      the queue. The number of planes is specific to a buffer, not to the queue.
      
      This does lead to the situation where multi-plane buffers may be requested
      and queued with n planes, but VIDIOC_DQBUF IOCTL may be passed an argument
      struct with fewer planes.
      
      __fill_v4l2_buffer() however uses the number of planes from the dequeued
      videobuf2 buffer, overwriting kernel memory (the m.planes array allocated
      in video_usercopy() in v4l2-ioctl.c)  if the user provided fewer
      planes than the dequeued buffer had. Oops!
      
      Fixes: b0e0e1f8 ("[media] media: videobuf2: Prepare to divide videobuf2")
      Signed-off-by: default avatarSakari Ailus <sakari.ailus@linux.intel.com>
      Acked-by: default avatarHans Verkuil <hans.verkuil@cisco.com>
      Signed-off-by: default avatarMauro Carvalho Chehab <mchehab@osg.samsung.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      19a4e46b
    • Sakari Ailus's avatar
      videobuf2-core: Check user space planes array in dqbuf · 3a4b3d18
      Sakari Ailus authored
      commit e7e0c3e2 upstream.
      
      The number of planes in videobuf2 is specific to a buffer. In order to
      verify that the planes array provided by the user is long enough, a new
      vb2_buf_op is required.
      
      Call __verify_planes_array() when the dequeued buffer is known. Return an
      error to the caller if there was one, otherwise remove the buffer from the
      done list.
      Signed-off-by: default avatarSakari Ailus <sakari.ailus@linux.intel.com>
      Acked-by: default avatarHans Verkuil <hans.verkuil@cisco.com>
      Signed-off-by: default avatarMauro Carvalho Chehab <mchehab@osg.samsung.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      3a4b3d18
    • Ignat Korchagin's avatar
      USB: usbip: fix potential out-of-bounds write · 4a1bb501
      Ignat Korchagin authored
      commit b348d7dd upstream.
      
      Fix potential out-of-bounds write to urb->transfer_buffer
      usbip handles network communication directly in the kernel. When receiving a
      packet from its peer, usbip code parses headers according to protocol. As
      part of this parsing urb->actual_length is filled. Since the input for
      urb->actual_length comes from the network, it should be treated as untrusted.
      Any entity controlling the network may put any value in the input and the
      preallocated urb->transfer_buffer may not be large enough to hold the data.
      Thus, the malicious entity is able to write arbitrary data to kernel memory.
      Signed-off-by: default avatarIgnat Korchagin <ignat.korchagin@gmail.com>
      Cc: Sasha Levin <sasha.levin@oracle.com>
      Signed-off-by: default avatarPaul Gortmaker <paul.gortmaker@windriver.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      4a1bb501
    • Tejun Heo's avatar
      cgroup: make sure a parent css isn't freed before its children · 3c6266d5
      Tejun Heo authored
      commit 8bb5ef79 upstream.
      
      There are three subsystem callbacks in css shutdown path -
      css_offline(), css_released() and css_free().  Except for
      css_released(), cgroup core didn't guarantee the order of invocation.
      css_offline() or css_free() could be called on a parent css before its
      children.  This behavior is unexpected and led to bugs in cpu and
      memory controller.
      
      The previous patch updated ordering for css_offline() which fixes the
      cpu controller issue.  While there currently isn't a known bug caused
      by misordering of css_free() invocations, let's fix it too for
      consistency.
      
      css_free() ordering can be trivially fixed by moving putting of the
      parent css below css_free() invocation.
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      3c6266d5
    • Minchan Kim's avatar
      mm/hwpoison: fix wrong num_poisoned_pages accounting · 36abe727
      Minchan Kim authored
      commit d7e69488 upstream.
      
      Currently, migration code increses num_poisoned_pages on *failed*
      migration page as well as successfully migrated one at the trial of
      memory-failure.  It will make the stat wrong.  As well, it marks the
      page as PG_HWPoison even if the migration trial failed.  It would mean
      we cannot recover the corrupted page using memory-failure facility.
      
      This patches fixes it.
      Signed-off-by: default avatarMinchan Kim <minchan@kernel.org>
      Reported-by: default avatarVlastimil Babka <vbabka@suse.cz>
      Acked-by: default avatarNaoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      36abe727
    • Minchan Kim's avatar
      mm: vmscan: reclaim highmem zone if buffer_heads is over limit · 87c855f1
      Minchan Kim authored
      commit 7bf52fb8 upstream.
      
      We have been reclaimed highmem zone if buffer_heads is over limit but
      commit 6b4f7799 ("mm: vmscan: invoke slab shrinkers from
      shrink_zone()") changed the behavior so it doesn't reclaim highmem zone
      although buffer_heads is over the limit.  This patch restores the logic.
      
      Fixes: 6b4f7799 ("mm: vmscan: invoke slab shrinkers from shrink_zone()")
      Signed-off-by: default avatarMinchan Kim <minchan@kernel.org>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      87c855f1
    • Gerald Schaefer's avatar
      numa: fix /proc/<pid>/numa_maps for THP · e513b90a
      Gerald Schaefer authored
      commit 28093f9f upstream.
      
      In gather_pte_stats() a THP pmd is cast into a pte, which is wrong
      because the layouts may differ depending on the architecture.  On s390
      this will lead to inaccurate numa_maps accounting in /proc because of
      misguided pte_present() and pte_dirty() checks on the fake pte.
      
      On other architectures pte_present() and pte_dirty() may work by chance,
      but there may be an issue with direct-access (dax) mappings w/o
      underlying struct pages when HAVE_PTE_SPECIAL is set and THP is
      available.  In vm_normal_page() the fake pte will be checked with
      pte_special() and because there is no "special" bit in a pmd, this will
      always return false and the VM_PFNMAP | VM_MIXEDMAP checking will be
      skipped.  On dax mappings w/o struct pages, an invalid struct page
      pointer would then be returned that can crash the kernel.
      
      This patch fixes the numa_maps THP handling by introducing new "_pmd"
      variants of the can_gather_numa_stats() and vm_normal_page() functions.
      Signed-off-by: default avatarGerald Schaefer <gerald.schaefer@de.ibm.com>
      Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Konstantin Khlebnikov <koct9i@gmail.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Jerome Marchand <jmarchan@redhat.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Michael Holzheu <holzheu@linux.vnet.ibm.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      e513b90a
    • Konstantin Khlebnikov's avatar
      mm/huge_memory: replace VM_NO_THP VM_BUG_ON with actual VMA check · be591a68
      Konstantin Khlebnikov authored
      commit 3486b85a upstream.
      
      Khugepaged detects own VMAs by checking vm_file and vm_ops but this way
      it cannot distinguish private /dev/zero mappings from other special
      mappings like /dev/hpet which has no vm_ops and popultes PTEs in mmap.
      
      This fixes false-positive VM_BUG_ON and prevents installing THP where
      they are not expected.
      
      Link: http://lkml.kernel.org/r/CACT4Y+ZmuZMV5CjSFOeXviwQdABAgT7T+StKfTqan9YDtgEi5g@mail.gmail.com
      Fixes: 78f11a25 ("mm: thp: fix /dev/zero MAP_PRIVATE and vm_flags cleanups")
      Signed-off-by: default avatarKonstantin Khlebnikov <koct9i@gmail.com>
      Reported-by: default avatarDmitry Vyukov <dvyukov@google.com>
      Acked-by: default avatarVlastimil Babka <vbabka@suse.cz>
      Acked-by: default avatarKirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      be591a68
    • Tejun Heo's avatar
      memcg: relocate charge moving from ->attach to ->post_attach · 52526076
      Tejun Heo authored
      commit 264a0ae1 upstream.
      
      Hello,
      
      So, this ended up a lot simpler than I originally expected.  I tested
      it lightly and it seems to work fine.  Petr, can you please test these
      two patches w/o the lru drain drop patch and see whether the problem
      is gone?
      
      Thanks.
      ------ 8< ------
      If charge moving is used, memcg performs relabeling of the affected
      pages from its ->attach callback which is called under both
      cgroup_threadgroup_rwsem and thus can't create new kthreads.  This is
      fragile as various operations may depend on workqueues making forward
      progress which relies on the ability to create new kthreads.
      
      There's no reason to perform charge moving from ->attach which is deep
      in the task migration path.  Move it to ->post_attach which is called
      after the actual migration is finished and cgroup_threadgroup_rwsem is
      dropped.
      
      * move_charge_struct->mm is added and ->can_attach is now responsible
        for pinning and recording the target mm.  mem_cgroup_clear_mc() is
        updated accordingly.  This also simplifies mem_cgroup_move_task().
      
      * mem_cgroup_move_task() is now called from ->post_attach instead of
        ->attach.
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Acked-by: default avatarMichal Hocko <mhocko@kernel.org>
      Debugged-and-tested-by: default avatarPetr Mladek <pmladek@suse.com>
      Reported-by: default avatarCyril Hrubis <chrubis@suse.cz>
      Reported-by: default avatarJohannes Weiner <hannes@cmpxchg.org>
      Fixes: 1ed13287 ("sched, cgroup: replace signal_struct->group_rwsem with a global percpu_rwsem")
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      52526076
    • Tejun Heo's avatar
      cgroup, cpuset: replace cpuset_post_attach_flush() with cgroup_subsys->post_attach callback · d5209747
      Tejun Heo authored
      commit 5cf1cacb upstream.
      
      Since e93ad19d ("cpuset: make mm migration asynchronous"), cpuset
      kicks off asynchronous NUMA node migration if necessary during task
      migration and flushes it from cpuset_post_attach_flush() which is
      called at the end of __cgroup_procs_write().  This is to avoid
      performing migration with cgroup_threadgroup_rwsem write-locked which
      can lead to deadlock through dependency on kworker creation.
      
      memcg has a similar issue with charge moving, so let's convert it to
      an official callback rather than the current one-off cpuset specific
      function.  This patch adds cgroup_subsys->post_attach callback and
      makes cpuset register cpuset_post_attach_flush() as its ->post_attach.
      
      The conversion is mostly one-to-one except that the new callback is
      called under cgroup_mutex.  This is to guarantee that no other
      migration operations are started before ->post_attach callbacks are
      finished.  cgroup_mutex is one of the outermost mutex in the system
      and has never been and shouldn't be a problem.  We can add specialized
      synchronization around __cgroup_procs_write() but I don't think
      there's any noticeable benefit.
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Cc: Li Zefan <lizefan@huawei.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Michal Hocko <mhocko@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      d5209747
    • Jesper Dangaard Brouer's avatar
      slub: clean up code for kmem cgroup support to kmem_cache_free_bulk · a4e25ff3
      Jesper Dangaard Brouer authored
      commit 376bf125 upstream.
      
      This change is primarily an attempt to make it easier to realize the
      optimizations the compiler performs in-case CONFIG_MEMCG_KMEM is not
      enabled.
      
      Performance wise, even when CONFIG_MEMCG_KMEM is compiled in, the
      overhead is zero.  This is because, as long as no process have enabled
      kmem cgroups accounting, the assignment is replaced by asm-NOP
      operations.  This is possible because memcg_kmem_enabled() uses a
      static_key_false() construct.
      
      It also helps readability as it avoid accessing the p[] array like:
      p[size - 1] which "expose" that the array is processed backwards inside
      helper function build_detached_freelist().
      
      Lastly this also makes the code more robust, in error case like passing
      NULL pointers in the array.  Which were previously handled before commit
      03374518 ("slub: add missing kmem cgroup support to
      kmem_cache_free_bulk").
      
      Fixes: 03374518 ("slub: add missing kmem cgroup support to kmem_cache_free_bulk")
      Signed-off-by: default avatarJesper Dangaard Brouer <brouer@redhat.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Vladimir Davydov <vdavydov@virtuozzo.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      a4e25ff3
    • Roman Pen's avatar
      workqueue: fix ghost PENDING flag while doing MQ IO · 2da9606a
      Roman Pen authored
      commit 346c09f8 upstream.
      
      The bug in a workqueue leads to a stalled IO request in MQ ctx->rq_list
      with the following backtrace:
      
      [  601.347452] INFO: task kworker/u129:5:1636 blocked for more than 120 seconds.
      [  601.347574]       Tainted: G           O    4.4.5-1-storage+ #6
      [  601.347651] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      [  601.348142] kworker/u129:5  D ffff880803077988     0  1636      2 0x00000000
      [  601.348519] Workqueue: ibnbd_server_fileio_wq ibnbd_dev_file_submit_io_worker [ibnbd_server]
      [  601.348999]  ffff880803077988 ffff88080466b900 ffff8808033f9c80 ffff880803078000
      [  601.349662]  ffff880807c95000 7fffffffffffffff ffffffff815b0920 ffff880803077ad0
      [  601.350333]  ffff8808030779a0 ffffffff815b01d5 0000000000000000 ffff880803077a38
      [  601.350965] Call Trace:
      [  601.351203]  [<ffffffff815b0920>] ? bit_wait+0x60/0x60
      [  601.351444]  [<ffffffff815b01d5>] schedule+0x35/0x80
      [  601.351709]  [<ffffffff815b2dd2>] schedule_timeout+0x192/0x230
      [  601.351958]  [<ffffffff812d43f7>] ? blk_flush_plug_list+0xc7/0x220
      [  601.352208]  [<ffffffff810bd737>] ? ktime_get+0x37/0xa0
      [  601.352446]  [<ffffffff815b0920>] ? bit_wait+0x60/0x60
      [  601.352688]  [<ffffffff815af784>] io_schedule_timeout+0xa4/0x110
      [  601.352951]  [<ffffffff815b3a4e>] ? _raw_spin_unlock_irqrestore+0xe/0x10
      [  601.353196]  [<ffffffff815b093b>] bit_wait_io+0x1b/0x70
      [  601.353440]  [<ffffffff815b056d>] __wait_on_bit+0x5d/0x90
      [  601.353689]  [<ffffffff81127bd0>] wait_on_page_bit+0xc0/0xd0
      [  601.353958]  [<ffffffff81096db0>] ? autoremove_wake_function+0x40/0x40
      [  601.354200]  [<ffffffff81127cc4>] __filemap_fdatawait_range+0xe4/0x140
      [  601.354441]  [<ffffffff81127d34>] filemap_fdatawait_range+0x14/0x30
      [  601.354688]  [<ffffffff81129a9f>] filemap_write_and_wait_range+0x3f/0x70
      [  601.354932]  [<ffffffff811ced3b>] blkdev_fsync+0x1b/0x50
      [  601.355193]  [<ffffffff811c82d9>] vfs_fsync_range+0x49/0xa0
      [  601.355432]  [<ffffffff811cf45a>] blkdev_write_iter+0xca/0x100
      [  601.355679]  [<ffffffff81197b1a>] __vfs_write+0xaa/0xe0
      [  601.355925]  [<ffffffff81198379>] vfs_write+0xa9/0x1a0
      [  601.356164]  [<ffffffff811c59d8>] kernel_write+0x38/0x50
      
      The underlying device is a null_blk, with default parameters:
      
        queue_mode    = MQ
        submit_queues = 1
      
      Verification that nullb0 has something inflight:
      
      root@pserver8:~# cat /sys/block/nullb0/inflight
             0        1
      root@pserver8:~# find /sys/block/nullb0/mq/0/cpu* -name rq_list -print -exec cat {} \;
      ...
      /sys/block/nullb0/mq/0/cpu2/rq_list
      CTX pending:
              ffff8838038e2400
      ...
      
      During debug it became clear that stalled request is always inserted in
      the rq_list from the following path:
      
         save_stack_trace_tsk + 34
         blk_mq_insert_requests + 231
         blk_mq_flush_plug_list + 281
         blk_flush_plug_list + 199
         wait_on_page_bit + 192
         __filemap_fdatawait_range + 228
         filemap_fdatawait_range + 20
         filemap_write_and_wait_range + 63
         blkdev_fsync + 27
         vfs_fsync_range + 73
         blkdev_write_iter + 202
         __vfs_write + 170
         vfs_write + 169
         kernel_write + 56
      
      So blk_flush_plug_list() was called with from_schedule == true.
      
      If from_schedule is true, that means that finally blk_mq_insert_requests()
      offloads execution of __blk_mq_run_hw_queue() and uses kblockd workqueue,
      i.e. it calls kblockd_schedule_delayed_work_on().
      
      That means, that we race with another CPU, which is about to execute
      __blk_mq_run_hw_queue() work.
      
      Further debugging shows the following traces from different CPUs:
      
        CPU#0                                  CPU#1
        ----------------------------------     -------------------------------
        reqeust A inserted
        STORE hctx->ctx_map[0] bit marked
        kblockd_schedule...() returns 1
        <schedule to kblockd workqueue>
                                               request B inserted
                                               STORE hctx->ctx_map[1] bit marked
                                               kblockd_schedule...() returns 0
        *** WORK PENDING bit is cleared ***
        flush_busy_ctxs() is executed, but
        bit 1, set by CPU#1, is not observed
      
      As a result request B pended forever.
      
      This behaviour can be explained by speculative LOAD of hctx->ctx_map on
      CPU#0, which is reordered with clear of PENDING bit and executed _before_
      actual STORE of bit 1 on CPU#1.
      
      The proper fix is an explicit full barrier <mfence>, which guarantees
      that clear of PENDING bit is to be executed before all possible
      speculative LOADS or STORES inside actual work function.
      Signed-off-by: default avatarRoman Pen <roman.penyaev@profitbricks.com>
      Cc: Gioh Kim <gi-oh.kim@profitbricks.com>
      Cc: Michael Wang <yun.wang@profitbricks.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: linux-block@vger.kernel.org
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      2da9606a