1. 26 Sep, 2014 40 commits
    • Lai Jiangshan's avatar
      workqueue: ensure @task is valid across kthread_stop() · 770e3572
      Lai Jiangshan authored
      When a kworker should die, the kworkre is notified through WORKER_DIE
      flag instead of kthread_should_stop().  This, IIRC, is primarily to
      keep the test synchronized inside worker_pool lock.  WORKER_DIE is
      first set while holding pool->lock, the lock is dropped and
      kthread_stop() is called.
      
      Unfortunately, this means that there's a slight chance that the target
      kworker may see WORKER_DIE before kthread_stop() finishes and exits
      and frees the target task before or during kthread_stop().
      
      Fix it by pinning the target task before setting WORKER_DIE and
      putting it after kthread_stop() is done.
      
      tj: Improved patch description and comment.  Moved pinning above
          WORKER_DIE for better signify what it's protecting.
      
      CC: stable@vger.kernel.org
      Signed-off-by: default avatarLai Jiangshan <laijs@cn.fujitsu.com>
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      
      (cherry picked from commit 5bdfff96)
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      770e3572
    • Alan Stern's avatar
      usb-storage: enable multi-LUN scanning when needed · af8c400a
      Alan Stern authored
      People sometimes create their own custom-configured kernels and forget
      to enable CONFIG_SCSI_MULTI_LUN.  This causes problems when they plug
      in a USB storage device (such as a card reader) with more than one
      LUN.
      
      Fortunately, we can tell fairly easily when a storage device claims to
      have more than one LUN.  When that happens, this patch asks the SCSI
      layer to probe all the LUNs automatically, regardless of the config
      setting.
      
      The patch also updates the Kconfig help text for usb-storage,
      explaining that CONFIG_SCSI_MULTI_LUN may be necessary.
      Signed-off-by: default avatarAlan Stern <stern@rowland.harvard.edu>
      Reported-by: default avatarThomas Raschbacher <lordvan@lordvan.com>
      CC: Matthew Dharm <mdharm-usb@one-eyed-alien.net>
      CC: James Bottomley <James.Bottomley@HansenPartnership.com>
      Cc: stable <stable@vger.kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      
      (cherry picked from commit 823d12c9)
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      af8c400a
    • Mikulas Patocka's avatar
      alpha: fix broken network checksum · 839b0e91
      Mikulas Patocka authored
      The patch 3ddc5b46 breaks networking on
      alpha (there is a follow-up fix 5cfe8f1b,
      but networking is still broken even with the second patch).
      
      The patch 3ddc5b46 makes
      csum_partial_copy_from_user check the pointer with access_ok. However,
      csum_partial_copy_from_user is called also from csum_partial_copy_nocheck
      and csum_partial_copy_nocheck is called on kernel pointers and it is
      supposed not to check pointer validity.
      
      This bug results in ssh session hangs if the system is loaded and bulk
      data are printed to ssh terminal.
      
      This patch fixes csum_partial_copy_nocheck to call set_fs(KERNEL_DS), so
      that access_ok in csum_partial_copy_from_user accepts kernel-space
      addresses.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarMikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: default avatarMatt Turner <mattst88@gmail.com>
      
      (cherry picked from commit 0ef38d70)
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      839b0e91
    • Dan Carpenter's avatar
      net: clamp ->msg_namelen instead of returning an error · b79fa518
      Dan Carpenter authored
      If kmsg->msg_namelen > sizeof(struct sockaddr_storage) then in the
      original code that would lead to memory corruption in the kernel if you
      had audit configured.  If you didn't have audit configured it was
      harmless.
      
      There are some programs such as beta versions of Ruby which use too
      large of a buffer and returning an error code breaks them.  We should
      clamp the ->msg_namelen value instead.
      
      Fixes: 1661bf36 ("net: heap overflow in __audit_sockaddr()")
      Reported-by: default avatarEric Wong <normalperson@yhbt.net>
      Signed-off-by: default avatarDan Carpenter <dan.carpenter@oracle.com>
      Tested-by: default avatarEric Wong <normalperson@yhbt.net>
      Acked-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      
      (cherry picked from commit db31c55a)
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      b79fa518
    • Jean Delvare's avatar
      hwmon: (w83l768ng) Fix fan speed control range · 4ed4e043
      Jean Delvare authored
      The W83L786NG stores the fan speed on 4 bits while the sysfs interface
      uses a 0-255 range. Thus the driver should scale the user input down
      to map it to the device range, and scale up the value read from the
      device before presenting it to the user. The reserved register nibble
      should be left unchanged.
      Signed-off-by: default avatarJean Delvare <khali@linux-fr.org>
      Cc: stable@vger.kernel.org
      Reviewed-by: default avatarGuenter Roeck <linux@roeck-us.net>
      
      (cherry picked from commit 33a7ab91)
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      4ed4e043
    • Michael Neuling's avatar
      powerpc/signals: Improved mark VSX not saved with small contexts fix · 336d6cf2
      Michael Neuling authored
      In a recent patch:
        commit c13f20ac
        Author: Michael Neuling <mikey@neuling.org>
        powerpc/signals: Mark VSX not saved with small contexts
      
      We fixed an issue but an improved solution was later discussed after the patch
      was merged.
      
      Firstly, this patch doesn't handle the 64bit signals case, which could also hit
      this issue (but has never been reported).
      
      Secondly, the original patch isn't clear what MSR VSX should be set to.  The
      new approach below always clears the MSR VSX bit (to indicate no VSX is in the
      context) and sets it only in the specific case where VSX is available (ie. when
      VSX has been used and the signal context passed has space to provide the
      state).
      
      This reverts the original patch and replaces it with the improved solution.  It
      also adds a 64 bit version.
      Signed-off-by: default avatarMichael Neuling <mikey@neuling.org>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarBenjamin Herrenschmidt <benh@kernel.crashing.org>
      
      (cherry picked from commit ec67ad82)
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      336d6cf2
    • Michael Neuling's avatar
      powerpc/signals: Mark VSX not saved with small contexts · 76b28eb5
      Michael Neuling authored
      commit c13f20ac upstream.
      
      The VSX MSR bit in the user context indicates if the context contains VSX
      state.  Currently we set this when the process has touched VSX at any stage.
      
      Unfortunately, if the user has not provided enough space to save the VSX state,
      we can't save it but we currently still set the MSR VSX bit.
      
      This patch changes this to clear the MSR VSX bit when the user doesn't provide
      enough space.  This indicates that there is no valid VSX state in the user
      context.
      
      This is needed to support get/set/make/swapcontext for applications that use
      VSX but only provide a small context.  For example, getcontext in glibc
      provides a smaller context since the VSX registers don't need to be saved over
      the glibc function call.  But since the program calling getcontext may have
      used VSX, the kernel currently says the VSX state is valid when it's not.  If
      the returned context is then used in setcontext (ie. a small context without
      VSX but with MSR VSX set), the kernel will refuse the context.  This situation
      has been reported by the glibc community.
      
      Based on patch from Carlos O'Donell.
      Tested-by: default avatarHaren Myneni <haren@linux.vnet.ibm.com>
      Signed-off-by: default avatarMichael Neuling <mikey@neuling.org>
      Signed-off-by: default avatarBenjamin Herrenschmidt <benh@kernel.crashing.org>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      (cherry picked from commit 3bf4e8c8)
      76b28eb5
    • Jan Kara's avatar
      ext2: Fix fs corruption in ext2_get_xip_mem() · 5163683b
      Jan Kara authored
      Commit 8e3dffc6 "Ext2: mark inode dirty after the function
      dquot_free_block_nodirty is called" unveiled a bug in __ext2_get_block()
      called from ext2_get_xip_mem(). That function called ext2_get_block()
      mistakenly asking it to map 0 blocks while 1 was intended. Before the
      above mentioned commit things worked out fine by luck but after that commit
      we started returning that we allocated 0 blocks while we in fact
      allocated 1 block and thus allocation was looping until all blocks in
      the filesystem were exhausted.
      
      Fix the problem by properly asking for one block and also add assertion
      in ext2_get_blocks() to catch similar problems.
      Reported-and-tested-by: default avatarAndiry Xu <andiry.xu@gmail.com>
      Signed-off-by: default avatarJan Kara <jack@suse.cz>
      
      (cherry picked from commit 7ba3ec57)
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      5163683b
    • Paul Gortmaker's avatar
      8250_pci: fix warnings in backport of Broadcom TruManage support · 580d2659
      Paul Gortmaker authored
      commit 7400ce7e (v3.4.92-76-g7400ce7e)
      was a backport of commit ebebd49a upstream
      ("8250/16?50: Add support for Broadcom TruManage redirected serial port")
      
      However, in the context of 3.4.x kernels, the pci setup code was
      expecting a struct uart_port and not a struct uart_8250_port, leading to
      the following concerning warnings:
      
      drivers/tty/serial/8250/8250_pci.c: In function ‘pci_brcm_trumanage_setup’:
      drivers/tty/serial/8250/8250_pci.c:1086:2: warning: passing argument 3 of ‘pci_default_setup’ from incompatible pointer type [enabled by default]
        int ret = pci_default_setup(priv, board, port, idx);
        ^
      drivers/tty/serial/8250/8250_pci.c:1036:1: note: expected ‘struct uart_port *’ but argument is of type ‘struct uart_8250_port *’
       pci_default_setup(struct serial_private *priv,
       ^
      drivers/tty/serial/8250/8250_pci.c: At top level:
      drivers/tty/serial/8250/8250_pci.c:1746:3: warning: initialization from incompatible pointer type [enabled by default]
         .setup  = pci_brcm_trumanage_setup,
         ^
      drivers/tty/serial/8250/8250_pci.c:1746:3: warning: (near initialization for ‘pci_serial_quirks[56].setup’) [enabled by default]
      
      I'd also expect the initialization to not function correctly, and
      perhaps dereference random garbage due to this.  Since the uart_port
      is a field within the uart_8250_port, the adaptation to fix these
      warnings is a straightforward removal of a layer of indirection.
      
      Cc: Stephen Hurd <shurd@broadcom.com>
      Cc: Michael Chan <mchan@broadcom.com>
      Cc: Ben Hutchings <ben@decadent.org.uk>
      Cc: Rui Xiang <rui.xiang@huawei.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: default avatarPaul Gortmaker <paul.gortmaker@windriver.com>
      Signed-off-by: default avatarZefan Li <lizefan@huawei.com>
      (cherry picked from commit 82a938ab)
      
      (cherry picked from commit HEAD)
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      580d2659
    • Stefan Kristiansson's avatar
      openrisc: add missing header inclusion · 27a615f0
      Stefan Kristiansson authored
      Prevents build issue with updated toolchain
      Reported-by: default avatarJack Thomasson <jkt@moonlitsw.com>
      Tested-by: default avatarChristian Svensson <blue@cmd.nu>
      Signed-off-by: default avatarStefan Kristiansson <stefan.kristiansson@saunalahti.fi>
      Signed-off-by: default avatarJonas Bonn <jonas@southpole.se>
      
      (cherry picked from commit 160d8378)
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      27a615f0
    • Johan Hovold's avatar
      USB: serial: fix potential heap buffer overflow · e547051b
      Johan Hovold authored
      Make sure to verify the number of ports requested by subdriver to avoid
      writing beyond the end of fixed-size array in interface data.
      
      The current usb-serial implementation is limited to eight ports per
      interface but failed to verify that the number of ports requested by a
      subdriver (which could have been determined from device descriptors) did
      not exceed this limit.
      
      Cc: stable <stable@vger.kernel.org>
      Signed-off-by: default avatarJohan Hovold <johan@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      
      (cherry picked from commit 5654699f)
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      e547051b
    • Mark Rutland's avatar
      ARM: 8129/1: errata: work around Cortex-A15 erratum 830321 using dummy strex · a8252b76
      Mark Rutland authored
      On revisions of Cortex-A15 prior to r3p3, a CLREX instruction at PL1 may
      falsely trigger a watchpoint exception, leading to potential data aborts
      during exception return and/or livelock.
      
      This patch resolves the issue in the following ways:
      
        - Replacing our uses of CLREX with a dummy STREX sequence instead (as
          we did for v6 CPUs).
      
        - Removing the clrex code from v7_exit_coherency_flush and derivatives,
          since this only exists as a minor performance improvement when
          non-cached exclusives are in use (Linux doesn't use these).
      
      Benchmarking on a variety of ARM cores revealed no measurable
      performance difference with this change applied, so the change is
      performed unconditionally and no new Kconfig entry is added.
      Signed-off-by: default avatarMark Rutland <mark.rutland@arm.com>
      Signed-off-by: default avatarWill Deacon <will.deacon@arm.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarRussell King <rmk+kernel@arm.linux.org.uk>
      
      (cherry picked from commit 2c32c65e)
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      a8252b76
    • Mark Rutland's avatar
      ARM: 8128/1: abort: don't clear the exclusive monitors · bc4c7e40
      Mark Rutland authored
      The ARMv6 and ARMv7 early abort handlers clear the exclusive monitors
      upon entry to the kernel, but this is redundant:
      
        - We clear the monitors on every exception return since commit
          200b812d ("Clear the exclusive monitor when returning from an
          exception"), so this is not necessary to ensure the monitors are
          cleared before returning from a fault handler.
      
        - Any dummy STREX will target a temporary scratch area in memory, and
          may succeed or fail without corrupting useful data. Its status value
          will not be used.
      
        - Any other STREX in the kernel must be preceded by an LDREX, which
          will initialise the monitors consistently and will not depend on the
          earlier state of the monitors.
      
      Therefore we have no reason to care about the initial state of the
      exclusive monitors when a data abort is taken, and clearing the monitors
      prior to exception return (as we already do) is sufficient.
      
      This patch removes the redundant clearing of the exclusive monitors from
      the early abort handlers.
      Signed-off-by: default avatarMark Rutland <mark.rutland@arm.com>
      Acked-by: default avatarWill Deacon <will.deacon@arm.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarRussell King <rmk+kernel@arm.linux.org.uk>
      
      (cherry picked from commit 85868313)
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      bc4c7e40
    • Jiri Kosina's avatar
      HID: magicmouse: sanity check report size in raw_event() callback · 1e2d552a
      Jiri Kosina authored
      The report passed to us from transport driver could potentially be
      arbitrarily large, therefore we better sanity-check it so that
      magicmouse_emit_touch() gets only valid values of raw_id.
      
      Cc: stable@vger.kernel.org
      Reported-by: default avatarSteven Vittitoe <scvitti@google.com>
      Signed-off-by: default avatarJiri Kosina <jkosina@suse.cz>
      
      (cherry picked from commit c54def7b)
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      1e2d552a
    • Stephen Hemminger's avatar
      USB: sisusb: add device id for Magic Control USB video · c1112865
      Stephen Hemminger authored
      I have a j5 create (JUA210) USB 2 video device and adding it device id
      to SIS USB video gets it to work.
      Signed-off-by: default avatarStephen Hemminger <stephen@networkplumber.org>
      Cc: stable <stable@vger.kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      
      (cherry picked from commit 5b6b80ae)
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      c1112865
    • Benjamin Tissoires's avatar
      HID: logitech-dj: prevent false errors to be shown · d4ee521d
      Benjamin Tissoires authored
      Commit "HID: logitech: perform bounds checking on device_id early
      enough" unfortunately leaks some errors to dmesg which are not real
      ones:
      - if the report is not a DJ one, then there is not point in checking
        the device_id
      - the receiver (index 0) can also receive some notifications which
        can be safely ignored given the current implementation
      
      Move out the test regarding the report_id and also discards
      printing errors when the receiver got notified.
      
      Fixes: ad3e14d7
      
      Cc: stable@vger.kernel.org
      Reported-and-tested-by: default avatarMarkus Trippelsdorf <markus@trippelsdorf.de>
      Signed-off-by: default avatarBenjamin Tissoires <benjamin.tissoires@redhat.com>
      Signed-off-by: default avatarJiri Kosina <jkosina@suse.cz>
      
      (cherry picked from commit 5abfe85c)
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      d4ee521d
    • Jaša Bartelj's avatar
      USB: ftdi_sio: Added PID for new ekey device · 46f96315
      Jaša Bartelj authored
      Added support to the ftdi_sio driver for ekey Converter USB which
      uses an FT232BM chip.
      Signed-off-by: default avatarJaša Bartelj <jasa.bartelj@gmail.com>
      Cc: stable <stable@vger.kernel.org>
      Signed-off-by: default avatarJohan Hovold <johan@kernel.org>
      
      (cherry picked from commit 646907f5)
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      46f96315
    • Greg KH's avatar
      USB: serial: pl2303: add device id for ztek device · 5c478599
      Greg KH authored
      This adds a new device id to the pl2303 driver for the ZTEK device.
      Reported-by: default avatarMike Chu <Mike-Chu@prolific.com.tw>
      Cc: stable <stable@vger.kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: default avatarJohan Hovold <johan@kernel.org>
      
      (cherry picked from commit 91fcb1ce)
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      5c478599
    • Brennan Ashton's avatar
      USB: option: add VIA Telecom CDS7 chipset device id · 4c1a1c23
      Brennan Ashton authored
      This VIA Telecom baseband processor is used is used by by u-blox in both the
      FW2770 and FW2760 products and may be used in others as well.
      
      This patch has been tested on both of these modem versions.
      Signed-off-by: default avatarBrennan Ashton <bashton@brennanashton.com>
      Cc: stable <stable@vger.kernel.org>
      Signed-off-by: default avatarJohan Hovold <johan@kernel.org>
      
      (cherry picked from commit d7730273)
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      4c1a1c23
    • Max Filippov's avatar
      xtensa: fix a6 and a7 handling in fast_syscall_xtensa · b87c951d
      Max Filippov authored
      Remove restoring a6 on some return paths and instead modify and restore
      it in a single place, using symbolic name.
      Correctly restore a7 from PT_AREG7 in case of illegal a6 value.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarMax Filippov <jcmvbkbc@gmail.com>
      
      (cherry picked from commit d1b6ba82)
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      b87c951d
    • Max Filippov's avatar
      xtensa: fix TLBTEMP_BASE_2 region handling in fast_second_level_miss · eb398c35
      Max Filippov authored
      Current definition of TLBTEMP_BASE_2 is always 32K above the
      TLBTEMP_BASE_1, whereas fast_second_level_miss handler for the TLBTEMP
      region analyzes virtual address bit (PAGE_SHIFT + DCACHE_ALIAS_ORDER)
      to determine TLBTEMP region where the fault happened. The size of the
      TLBTEMP region is also checked incorrectly: not 64K, but twice data
      cache way size (whicht may as well be less than the instruction cache
      way size).
      
      Fix TLBTEMP_BASE_2 to be TLBTEMP_BASE_1 + data cache way size.
      Provide TLBTEMP_SIZE that is a greater of doubled data cache way size or
      the instruction cache way size, and use it to determine if the second
      level TLB miss occured in the TLBTEMP region.
      
      Practical occurence of page faults in the TLBTEMP area is extremely
      rare, this code can be tested by deletion of all w[di]tlb instructions
      in the tlbtemp_mapping region.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarMax Filippov <jcmvbkbc@gmail.com>
      
      (cherry picked from commit 7128039f)
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      eb398c35
    • Alan Douglas's avatar
      xtensa: fix address checks in dma_{alloc,free}_coherent · 523733da
      Alan Douglas authored
      Virtual address is translated to the XCHAL_KSEG_CACHED region in the
      dma_free_coherent, but is checked to be in the 0...XCHAL_KSEG_SIZE
      range.
      
      Change check for end of the range from 'addr >= X' to 'addr > X - 1' to
      handle the case of X == 0.
      
      Replace 'if (C) BUG();' construct with 'BUG_ON(C);'.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarAlan Douglas <adouglas@cadence.com>
      Signed-off-by: default avatarMax Filippov <jcmvbkbc@gmail.com>
      
      (cherry picked from commit 1ca49463)
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      523733da
    • Arjun Sreedharan's avatar
      pata_scc: propagate return value of scc_wait_after_reset · bff54be9
      Arjun Sreedharan authored
      scc_bus_softreset not necessarily should return zero.
      Propagate the error code.
      Signed-off-by: default avatarArjun Sreedharan <arjun024@gmail.com>
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Cc: stable@vger.kernel.org
      
      (cherry picked from commit 4dc7c76c)
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      bff54be9
    • Anton Blanchard's avatar
      ibmveth: Fix endian issues with rx_no_buffer statistic · 4132d9dd
      Anton Blanchard authored
      Hidden away in the last 8 bytes of the buffer_list page is a solitary
      statistic. It needs to be byte swapped or else ethtool -S will
      produce numbers that terrify the user.
      
      Since we do this in multiple places, create a helper function with a
      comment explaining what is going on.
      Signed-off-by: default avatarAnton Blanchard <anton@samba.org>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      
      (cherry picked from commit cbd52281)
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      4132d9dd
    • David S. Miller's avatar
      sparc64: Do not insert non-valid PTEs into the TSB hash table. · cc241ba4
      David S. Miller authored
      The assumption was that update_mmu_cache() (and the equivalent for PMDs) would
      only be called when the PTE being installed will be accessible by the user.
      
      This is not true for code paths originating from remove_migration_pte().
      
      There are dire consequences for placing a non-valid PTE into the TSB.  The TLB
      miss frramework assumes thatwhen a TSB entry matches we can just load it into
      the TLB and return from the TLB miss trap.
      
      So if a non-valid PTE is in there, we will deadlock taking the TLB miss over
      and over, never satisfying the miss.
      
      Just exit early from update_mmu_cache() and friends in this situation.
      
      Based upon a report and patch from Christopher Alexander Tobias Schulze.
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      
      (cherry picked from commit 18f38132)
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      cc241ba4
    • H. Peter Anvin's avatar
      x86, espfix: Make espfix64 a Kconfig option, fix UML · 5d66c0c7
      H. Peter Anvin authored
      Make espfix64 a hidden Kconfig option.  This fixes the x86-64 UML
      build which had broken due to the non-existence of init_espfix_bsp()
      in UML: since UML uses its own Kconfig, this option does not appear in
      the UML build.
      
      This also makes it possible to make support for 16-bit segments a
      configuration option, for the people who want to minimize the size of
      the kernel.
      Reported-by: default avatarIngo Molnar <mingo@kernel.org>
      Signed-off-by: default avatarH. Peter Anvin <hpa@zytor.com>
      Cc: Richard Weinberger <richard@nod.at>
      Link: http://lkml.kernel.org/r/1398816946-3351-1-git-send-email-hpa@linux.intel.com
      
      (cherry picked from commit 197725de)
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      5d66c0c7
    • Thomas Gleixner's avatar
      rtmutex: Fix deadlock detector for real · 42262f2c
      Thomas Gleixner authored
      The current deadlock detection logic does not work reliably due to the
      following early exit path:
      
      	/*
      	 * Drop out, when the task has no waiters. Note,
      	 * top_waiter can be NULL, when we are in the deboosting
      	 * mode!
      	 */
      	if (top_waiter && (!task_has_pi_waiters(task) ||
      			   top_waiter != task_top_pi_waiter(task)))
      		goto out_unlock_pi;
      
      So this not only exits when the task has no waiters, it also exits
      unconditionally when the current waiter is not the top priority waiter
      of the task.
      
      So in a nested locking scenario, it might abort the lock chain walk
      and therefor miss a potential deadlock.
      
      Simple fix: Continue the chain walk, when deadlock detection is
      enabled.
      
      We also avoid the whole enqueue, if we detect the deadlock right away
      (A-A). It's an optimization, but also prevents that another waiter who
      comes in after the detection and before the task has undone the damage
      observes the situation and detects the deadlock and returns
      -EDEADLOCK, which is wrong as the other task is not in a deadlock
      situation.
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Reviewed-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
      Cc: stable@vger.kernel.org
      Link: http://lkml.kernel.org/r/20140522031949.725272460@linutronix.deSigned-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      
      (cherry picked from commit 397335f0)
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      42262f2c
    • Nadav Amit's avatar
      KVM: x86: Increase the number of fixed MTRR regs to 10 · 8a98f151
      Nadav Amit authored
      commit 682367c4 upstream.
      
      Recent Intel CPUs have 10 variable range MTRRs. Since operating systems
      sometime make assumptions on CPUs while they ignore capability MSRs, it is
      better for KVM to be consistent with recent CPUs. Reporting more MTRRs than
      actually supported has no functional implications.
      Signed-off-by: default avatarNadav Amit <namit@cs.technion.ac.il>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      
      (cherry picked from commit 80bbfbaa)
      8a98f151
    • Thomas Hellstrom's avatar
      drm/vmwgfx: Fix incorrect write to read-only register v2: · 4194290b
      Thomas Hellstrom authored
      Commit "drm/vmwgfx: correct fb_fix_screeninfo.line_length", while fixing a
      vmwgfx fbdev bug, also writes the pitch to a supposedly read-only register:
      SVGA_REG_BYTES_PER_LINE, while it should be (and also in fact is) written to
      SVGA_REG_PITCHLOCK.
      
      This patch is Cc'd stable because of the unknown effects writing to this
      register might have, particularly on older device versions.
      
      v2: Updated log message.
      
      Cc: stable@vger.kernel.org
      Cc: Christopher Friedt <chrisfriedt@gmail.com>
      Tested-by: default avatarChristopher Friedt <chrisfriedt@gmail.com>
      Signed-off-by: default avatarThomas Hellstrom <thellstrom@vmware.com>
      Reviewed-by: default avatarJakob Bornecrantz <jakob@vmware.com>
      
      (cherry picked from commit 4e578080)
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      4194290b
    • Jaganath Kanakkassery's avatar
      Bluetooth: Fix invalid length check in l2cap_information_rsp() · 48818e14
      Jaganath Kanakkassery authored
      The length check is invalid since the length varies with type of
      info response.
      
      This was introduced by the commit cb3b3152
      
      Because of this, l2cap info rsp is not handled and command reject is sent.
      
      > ACL data: handle 11 flags 0x02 dlen 16
              L2CAP(s): Info rsp: type 2 result 0
                Extended feature mask 0x00b8
                  Enhanced Retransmission mode
                  Streaming mode
                  FCS Option
                  Fixed Channels
      < ACL data: handle 11 flags 0x00 dlen 10
              L2CAP(s): Command rej: reason 0
                Command not understood
      
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarJaganath Kanakkassery <jaganath.k@samsung.com>
      Signed-off-by: default avatarChan-Yeol Park <chanyeol.park@samsung.com>
      Acked-by: default avatarJohan Hedberg <johan.hedberg@intel.com>
      Signed-off-by: default avatarGustavo Padovan <gustavo.padovan@collabora.co.uk>
      
      ath9k_htc: Handle IDLE state transition properly
      
      Make sure that a chip reset is done when IDLE is turned
      off - this fixes authentication timeouts.
      
      Cc: stable@vger.kernel.org
      Reported-by: default avatarIgnacy Gawedzki <i@lri.fr>
      Signed-off-by: default avatarSujith Manoharan <c_manoha@qca.qualcomm.com>
      Signed-off-by: default avatarJohn W. Linville <linville@tuxdriver.com>
      
      ath9k: fix an RCU issue in calling ieee80211_get_tx_rates
      
      ath_txq_schedule is called outside of the drv_tx call, so it needs RCU
      protection.
      Signed-off-by: default avatarFelix Fietkau <nbd@openwrt.org>
      Signed-off-by: default avatarJohn W. Linville <linville@tuxdriver.com>
      
      Bluetooth: Fix invalid length check in l2cap_information_rsp()
      
      The length check is invalid since the length varies with type of
      info response.
      
      This was introduced by the commit cb3b3152
      
      Because of this, l2cap info rsp is not handled and command reject is sent.
      
      > ACL data: handle 11 flags 0x02 dlen 16
              L2CAP(s): Info rsp: type 2 result 0
                Extended feature mask 0x00b8
                  Enhanced Retransmission mode
                  Streaming mode
                  FCS Option
                  Fixed Channels
      < ACL data: handle 11 flags 0x00 dlen 10
              L2CAP(s): Command rej: reason 0
                Command not understood
      
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarJaganath Kanakkassery <jaganath.k@samsung.com>
      Signed-off-by: default avatarChan-Yeol Park <chanyeol.park@samsung.com>
      Acked-by: default avatarJohan Hedberg <johan.hedberg@intel.com>
      Signed-off-by: default avatarGustavo Padovan <gustavo.padovan@collabora.co.uk>
      
      Merge branch 'for-john' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211
      
      nl80211: fix attrbuf access race by allocating a separate one
      
      Since my commit 3713b4e3 ("nl80211: allow splitting wiphy
      information in dumps"), nl80211_dump_wiphy() uses the global
      nl80211_fam.attrbuf for parsing the incoming data. This wouldn't
      be a problem if it only did so on the first dump iteration which
      is locked against other commands in generic netlink, but due to
      space constraints in cb->args (the needed state doesn't fit) I
      decided to always parse the original message. That's racy though
      since nl80211_fam.attrbuf could be used by some other parsing in
      generic netlink concurrently.
      
      For now, fix this by allocating a separate parse buffer (it's a
      bit too big for the stack, currently 1448 bytes on 64-bit). For
      -next, I'll change the code to parse into the global buffer in
      the first round only and then allocate a smaller buffer to keep
      the data in cb->args.
      Reported-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Acked-by: default avatarDavid S. Miller <davem@davemloft.net>
      Acked-by: default avatarJohn W. Linville <linville@tuxdriver.com>
      Signed-off-by: default avatarJohannes Berg <johannes.berg@intel.com>
      
      (cherry picked from commit da9910ac
      3f6fa3d4)
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      48818e14
    • Konrad Rzeszutek Wilk's avatar
      intel_idle: Don't register CPU notifier if we are not running. · 63c9b9e3
      Konrad Rzeszutek Wilk authored
      The 'intel_idle_probe' probes the CPU and sets the CPU notifier.
      But if later on during the module initialization we fail (say
      in cpuidle_register_driver), we stop loading, but we neglect
      to unregister the CPU notifier.  This means that during CPU
      hotplug events the system will fail:
      
      calling  intel_idle_init+0x0/0x326 @ 1
      intel_idle: MWAIT substates: 0x1120
      intel_idle: v0.4 model 0x2A
      intel_idle: lapic_timer_reliable_states 0xffffffff
      intel_idle: intel_idle yielding to none
      initcall intel_idle_init+0x0/0x326 returned -19 after 14 usecs
      
      ... some time later, offlining and onlining a CPU:
      
      cpu 3 spinlock event irq 62
      BUG: unable to ] __cpuidle_register_device+0x1c/0x120
      PGD 99b8b067 PUD 99b95067 PMD 0
      Oops: 0000 [#1] SMP
      Modules linked in: xen_evtchn nouveau mxm_wmi wmi radeon ttm i915 fbcon tileblit font atl1c bitblit softcursor drm_kms_helper video xen_blkfront xen_netfront fb_sys_fops sysimgblt sysfillrect syscopyarea xenfs xen_privcmd mperf
      CPU 0
      Pid: 2302, comm: udevd Not tainted 3.8.0-rc3upstream-00249-g09ad159 #1 MSI MS-7680/H61M-P23 (MS-7680)
      RIP: e030:[<ffffffff814d956c>]  [<ffffffff814d956c>] __cpuidle_register_device+0x1c/0x120
      RSP: e02b:ffff88009dacfcb8  EFLAGS: 00010286
      RAX: 0000000000000000 RBX: ffff880105380000 RCX: 000000000000001c
      RDX: 0000000000000000 RSI: 0000000000000055 RDI: ffff880105380000
      RBP: ffff88009dacfce8 R08: ffffffff81a4f048 R09: 0000000000000008
      R10: 0000000000000008 R11: 0000000000000000 R12: ffff880105380000
      R13: 00000000ffffffdd R14: 0000000000000000 R15: ffffffff81a523d0
      FS:  00007f37bd83b7a0(0000) GS:ffff880105200000(0000) knlGS:0000000000000000
      CS:  e033 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 0000000000000008 CR3: 00000000a09ea000 CR4: 0000000000042660
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
      Process udevd (pid: 2302, threadinfo ffff88009dace000, task ffff88009afb47f0)
      Stack:
       ffffffff8107f2d0 ffffffff810c2fb7 ffff88009dacfce8 00000000ffffffea
       ffff880105380000 00000000ffffffdd ffff88009dacfd08 ffffffff814d9882
       0000000000000003 ffff880105380000 ffff88009dacfd28 ffffffff81340afd
      Call Trace:
       [<ffffffff8107f2d0>] ? collect_cpu_info_local+0x30/0x30
       [<ffffffff810c2fb7>] ? __might_sleep+0xe7/0x100
       [<ffffffff814d9882>] cpuidle_register_device+0x32/0x70
       [<ffffffff81340afd>] intel_idle_cpu_init+0xad/0x110
       [<ffffffff81340bc8>] cpu_hotplug_notify+0x68/0x80
       [<ffffffff8166023d>] notifier_call_chain+0x4d/0x70
       [<ffffffff810bc369>] __raw_notifier_call_chain+0x9/0x10
       [<ffffffff81094a4b>] __cpu_notify+0x1b/0x30
       [<ffffffff81652cf7>] _cpu_up+0x103/0x14b
       [<ffffffff81652e18>] cpu_up+0xd9/0xec
       [<ffffffff8164a254>] store_online+0x94/0xd0
       [<ffffffff814122fb>] dev_attr_store+0x1b/0x20
       [<ffffffff81216404>] sysfs_write_file+0xf4/0x170
       [<ffffffff811a1024>] vfs_write+0xb4/0x130
       [<ffffffff811a17ea>] sys_write+0x5a/0xa0
       [<ffffffff816643a9>] system_call_fastpath+0x16/0x1b
      Code: 03 18 00 c9 c3 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 48 83 ec 30 48 89 5d e8 4c 89 65 f0 48 89 fb 4c 89 6d f8 e8 84 08 00 00 <48> 8b 78 08 49 89 c4 e8 f8 7f c1 ff 89 c2 b8 ea ff ff ff 84 d2
      RIP  [<ffffffff814d956c>] __cpuidle_register_device+0x1c/0x120
       RSP <ffff88009dacfcb8>
      
      This patch fixes that by moving the CPU notifier registration
      as the last item to be done by the module.
      Signed-off-by: default avatarKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Reviewed-by: default avatarSrivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
      Cc: 3.6+ <stable@vger.kernel.org>
      Signed-off-by: default avatarRafael J. Wysocki <rafael.j.wysocki@intel.com>
      
      (cherry picked from commit 6f8c2e79)
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      63c9b9e3
    • Florian Westphal's avatar
      net: ipv4: ip_forward: fix inverted local_df test · dd62a1f9
      Florian Westphal authored
      local_df means 'ignore DF bit if set', so if its set we're
      allowed to perform ip fragmentation.
      
      This wasn't noticed earlier because the output path also drops such skbs
      (and emits needed icmp error) and because netfilter ip defrag did not
      set local_df until couple of days ago.
      
      Only difference is that DF-packets-larger-than MTU now discarded
      earlier (f.e. we avoid pointless netfilter postrouting trip).
      
      While at it, drop the repeated test ip_exceeds_mtu, checking it once
      is enough...
      
      Fixes: fe6cc55f ("net: ip, ipv6: handle gso skbs in forwarding path")
      Signed-off-by: default avatarFlorian Westphal <fw@strlen.de>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      
      (cherry picked from commit ca6c5d4a)
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      dd62a1f9
    • Christopher Friedt's avatar
      drm/vmwgfx: correct fb_fix_screeninfo.line_length · 11b7678d
      Christopher Friedt authored
      commit aa6de142 upstream.
      
      Previously, the vmwgfx_fb driver would allow users to call FBIOSET_VINFO, but it would not adjust
      the FINFO properly, resulting in distorted screen rendering. The patch corrects that behaviour.
      
      See https://bugs.gentoo.org/show_bug.cgi?id=494794 for examples.
      Signed-off-by: default avatarChristopher Friedt <chrisfriedt@gmail.com>
      Reviewed-by: default avatarThomas Hellstrom <thellstrom@vmware.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      
      (cherry picked from commit b8a0ddef)
      11b7678d
    • Roger Quadros's avatar
      ARM: OMAP3: hwmod data: Correct clock domains for USB modules · ce3fd6ed
      Roger Quadros authored
      OMAP3 doesn't contain "l3_init_clkdm" clock domain. Use the
      proper clock domains for USB Host and USB TLL modules.
      
      Gets rid of the following warnings during boot
       omap_hwmod: usb_host_hs: could not associate to clkdm l3_init_clkdm
       omap_hwmod: usb_tll_hs: could not associate to clkdm l3_init_clkdm
      Reported-by: default avatarNishanth Menon <nm@ti.com>
      Cc: Paul Walmsley <paul@pwsan.com>
      Signed-off-by: default avatarRoger Quadros <rogerq@ti.com>
      Fixes: de231388 ("ARM: OMAP: USB: EHCI and OHCI hwmod structures for OMAP3")
      Cc: Keshava Munegowda <keshava_mgowda@ti.com>
      Cc: Partha Basak <parthab@india.ti.com>
      Signed-off-by: default avatarPaul Walmsley <paul@pwsan.com>
      
      (cherry picked from commit c6c56697)
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      ce3fd6ed
    • Nicholas Santos's avatar
      HID: usbhid: quirk for Formosa IR receiver · 1df94751
      Nicholas Santos authored
      Patch to add the Formosa Industrial Computing, Inc. Infrared Receiver
      [IR605A/Q] to hid-ids.h and hid-quirks.c.  This IR receiver causes about a 10
      second timeout when the usbhid driver attempts to initialze the device.  Adding
      this device to the quirks list with HID_QUIRK_NO_INIT_REPORTS removes the
      delay.
      Signed-off-by: default avatarNicholas Santos <nicholas.santos@gmail.com>
      [jkosina@suse.cz: fix ordering]
      Signed-off-by: default avatarJiri Kosina <jkosina@suse.cz>
      
      (cherry picked from commit 320cde19)
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      1df94751
    • Lai Jiangshan's avatar
      workqueue: ensure @task is valid across kthread_stop() · e68da712
      Lai Jiangshan authored
      When a kworker should die, the kworkre is notified through WORKER_DIE
      flag instead of kthread_should_stop().  This, IIRC, is primarily to
      keep the test synchronized inside worker_pool lock.  WORKER_DIE is
      first set while holding pool->lock, the lock is dropped and
      kthread_stop() is called.
      
      Unfortunately, this means that there's a slight chance that the target
      kworker may see WORKER_DIE before kthread_stop() finishes and exits
      and frees the target task before or during kthread_stop().
      
      Fix it by pinning the target task before setting WORKER_DIE and
      putting it after kthread_stop() is done.
      
      tj: Improved patch description and comment.  Moved pinning above
          WORKER_DIE for better signify what it's protecting.
      
      CC: stable@vger.kernel.org
      Signed-off-by: default avatarLai Jiangshan <laijs@cn.fujitsu.com>
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      
      (cherry picked from commit 5bdfff96)
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      e68da712
    • Jason Gunthorpe's avatar
      tpm: Provide a generic means to override the chip returned timeouts · 6b2a55ea
      Jason Gunthorpe authored
      Some Atmel TPMs provide completely wrong timeouts from their
      TPM_CAP_PROP_TIS_TIMEOUT query. This patch detects that and returns
      new correct values via a DID/VID table in the TIS driver.
      
      Tested on ARM using an AT97SC3204T FW version 37.16
      
      Cc: <stable@vger.kernel.org>
      [PHuewe: without this fix these 'broken' Atmel TPMs won't function on
      older kernels]
      Signed-off-by: default avatar"Berg, Christopher" <Christopher.Berg@atmel.com>
      Signed-off-by: default avatarJason Gunthorpe <jgunthorpe@obsidianresearch.com>
      Signed-off-by: default avatarPeter Huewe <peterhuewe@gmx.de>
      
      (cherry picked from commit 8e54caf4)
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      6b2a55ea
    • Linus Torvalds's avatar
      vfs: avoid non-forwarding large load after small store in path lookup · 8aaa881c
      Linus Torvalds authored
      The performance regression that Josef Bacik reported in the pathname
      lookup (see commit 99d263d4 "vfs: fix bad hashing of dentries") made
      me look at performance stability of the dcache code, just to verify that
      the problem was actually fixed.  That turned up a few other problems in
      this area.
      
      There are a few cases where we exit RCU lookup mode and go to the slow
      serializing case when we shouldn't, Al has fixed those and they'll come
      in with the next VFS pull.
      
      But my performance verification also shows that link_path_walk() turns
      out to have a very unfortunate 32-bit store of the length and hash of
      the name we look up, followed by a 64-bit read of the combined hash_len
      field.  That screws up the processor store to load forwarding, causing
      an unnecessary hickup in this critical routine.
      
      It's caused by the ugly calling convention for the "hash_name()"
      function, and easily fixed by just making hash_name() fill in the whole
      'struct qstr' rather than passing it a pointer to just the hash value.
      
      With that, the profile for this function looks much smoother.
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      
      Merge branch 'parisc-3.17-1' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux
      
      Pull parisc updates from Helge Deller:
       "The most important patch is a new Light Weigth Syscall (LWS) for 8,
        16, 32 and 64 bit atomic CAS operations which is required in order to
        be able to implement the atomic gcc builtins on our platform.
      
        Other than that, we wire up the seccomp, getrandom and memfd_create
        syscalls, fixes a minor off-by-one bug and a wrong printk string"
      
      * 'parisc-3.17-1' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
        parisc: Implement new LWS CAS supporting 64 bit operations.
        parisc: Wire up seccomp, getrandom and memfd_create syscalls
        parisc: dino: fix %d confusingly prefixed with 0x in format string
        parisc: sys_hpux: NUL terminator is one past the end
      
      Merge tag 'ntb-3.17' of git://github.com/jonmason/ntb
      
      Pull ntb driver bugfixes from Jon Mason:
       "NTB driver fixes for queue spread and buffer alignment.  Also, update
        to MAINTAINERS to reflect new e-mail address"
      
      * tag 'ntb-3.17' of git://github.com/jonmason/ntb:
        ntb: Add alignment check to meet hardware requirement
        MAINTAINERS: update NTB info
        NTB: correct the spread of queues over mw's
      
      Merge branch 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
      
      Pull ARM irq chip fixes from Thomas Gleixner:
       "Another pile of ARM specific irq chip fixlets:
      
         - off by one bugs in the crossbar driver
         - missing annotations
         - a bunch of "make it compile" updates
      
        I pulled the lot today from Jason, but it has been in -next for at
        least a week"
      
      * 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        irqchip: gic-v3: Declare rdist as __percpu pointer to __iomem pointer
        irqchip: gic: Make gic_default_routable_irq_domain_ops static
        irqchip: exynos-combiner: Fix compilation error on ARM64
        irqchip: crossbar: Off by one bugs in init
        irqchip: gic-v3: Tag all low level accessors __maybe_unused
        irqchip: gic-v3: Only define gic_peek_irq() when building SMP
      
      Merge tag 'irqchip-urgent-3.17' of git://git.infradead.org/users/jcooper/linux into irq/urgent
      
      irqchip fixes for v3.17 from Jason Cooper
      
       - GIC/GICV3: Various fixlets
       - crossbar: Fix off-by-one bug
       - exynos-combiner: Fix arm64 build error
      
      ntb: Add alignment check to meet hardware requirement
      
      The NTB translate register must have the value to be BAR size aligned.
      This alignment check make sure that the DMA memory allocated has the
      proper alignment. Another requirement for NTB to function properly with
      memory window BAR size greater or equal to 4M is to use the CMA feature
      in 3.16 kernel with the appropriate CONFIG_CMA_ALIGNMENT and
      CONFIG_CMA_SIZE_MBYTES set.
      Signed-off-by: default avatarDave Jiang <dave.jiang@intel.com>
      Signed-off-by: default avatarJon Mason <jdmason@kudzu.us>
      
      MAINTAINERS: update NTB info
      
      Update my contact info to my personal email address and add Dave Jiang.
      Signed-off-by: default avatarJon Mason <jon.mason@intel.com>
      Signed-off-by: default avatarDave Jiang <dave.jiang@intel.com>
      
      NTB: correct the spread of queues over mw's
      
      The detection of an uneven number of queues on the given memory windows
      was not correct.  The mw_num is zero based and the mod should be
      division to spread them evenly over the mw's.
      Signed-off-by: default avatarJon Mason <jon.mason@intel.com>
      
      Merge branches 'locking-urgent-for-linus' and 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
      
      Pull futex and timer fixes from Thomas Gleixner:
       "A oneliner bugfix for the jinxed futex code:
      
         - Drop hash bucket lock in the error exit path.  I really could slap
           myself for intruducing that bug while fixing all the other horror
           in that code three month ago ...
      
        and the timer department is not too proud about the following fixes:
      
         - Deal with a long standing rounding bug in the timeval to jiffies
           conversion.  It's a real issue and this fix fell through the cracks
           for quite some time.
      
         - Another round of alarmtimer fixes.  Finally this code gets used
           more widely and the subtle issues hidden for quite some time are
           noticed and fixed.  Nothing really exciting, just the itty bitty
           details which bite the serious users here and there"
      
      * 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        futex: Unlock hb->lock in futex_wait_requeue_pi() error path
      
      * 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        alarmtimer: Lock k_itimer during timer callback
        alarmtimer: Do not signal SIGEV_NONE timers
        alarmtimer: Return relative times in timer_gettime
        jiffies: Fix timeval conversion to jiffies
      
      parisc: Implement new LWS CAS supporting 64 bit operations.
      
      The current LWS cas only works correctly for 32bit. The new LWS allows
      for CAS operations of variable size.
      Signed-off-by: default avatarGuy Martin <gmsoft@tuxicoman.be>
      Cc: <stable@vger.kernel.org> # 3.13+
      Signed-off-by: default avatarHelge Deller <deller@gmx.de>
      
      vfs: fix bad hashing of dentries
      
      Josef Bacik found a performance regression between 3.2 and 3.10 and
      narrowed it down to commit bfcfaa77 ("vfs: use 'unsigned long'
      accesses for dcache name comparison and hashing"). He reports:
      
       "The test case is essentially
      
            for (i = 0; i < 1000000; i++)
                    mkdir("a$i");
      
        On xfs on a fio card this goes at about 20k dir/sec with 3.2, and 12k
        dir/sec with 3.10.  This is because we spend waaaaay more time in
        __d_lookup on 3.10 than in 3.2.
      
        The new hashing function for strings is suboptimal for <
        sizeof(unsigned long) string names (and hell even > sizeof(unsigned
        long) string names that I've tested).  I broke out the old hashing
        function and the new one into a userspace helper to get real numbers
        and this is what I'm getting:
      
            Old hash table had 1000000 entries, 0 dupes, 0 max dupes
            New hash table had 12628 entries, 987372 dupes, 900 max dupes
            We had 11400 buckets with a p50 of 30 dupes, p90 of 240 dupes, p99 of 567 dupes for the new hash
      
        My test does the hash, and then does the d_hash into a integer pointer
        array the same size as the dentry hash table on my system, and then
        just increments the value at the address we got to see how many
        entries we overlap with.
      
        As you can see the old hash function ended up with all 1 million
        entries in their own bucket, whereas the new one they are only
        distributed among ~12.5k buckets, which is why we're using so much
        more CPU in __d_lookup".
      
      The reason for this hash regression is two-fold:
      
       - On 64-bit architectures the down-mixing of the original 64-bit
         word-at-a-time hash into the final 32-bit hash value is very
         simplistic and suboptimal, and just adds the two 32-bit parts
         together.
      
         In particular, because there is no bit shuffling and the mixing
         boundary is also a byte boundary, similar character patterns in the
         low and high word easily end up just canceling each other out.
      
       - the old byte-at-a-time hash mixed each byte into the final hash as it
         hashed the path component name, resulting in the low bits of the hash
         generally being a good source of hash data.  That is not true for the
         word-at-a-time case, and the hash data is distributed among all the
         bits.
      
      The fix is the same in both cases: do a better job of mixing the bits up
      and using as much of the hash data as possible.  We already have the
      "hash_32|64()" functions to do that.
      Reported-by: default avatarJosef Bacik <jbacik@fb.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Chris Mason <clm@fb.com>
      Cc: linux-fsdevel@vger.kernel.org
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      
      alarmtimer: Lock k_itimer during timer callback
      
      Locks the k_itimer's it_lock member when handling the alarm timer's
      expiry callback.
      
      The regular posix timers defined in posix-timers.c have this lock held
      during timout processing because their callbacks are routed through
      posix_timer_fn().  The alarm timers follow a different path, so they
      ought to grab the lock somewhere else.
      
      Cc: stable@vger.kernel.org
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Richard Cochran <richardcochran@gmail.com>
      Cc: Prarit Bhargava <prarit@redhat.com>
      Cc: Sharvil Nanavati <sharvil@google.com>
      Signed-off-by: default avatarRichard Larocque <rlarocque@google.com>
      Signed-off-by: default avatarJohn Stultz <john.stultz@linaro.org>
      
      alarmtimer: Do not signal SIGEV_NONE timers
      
      Avoids sending a signal to alarm timers created with sigev_notify set to
      SIGEV_NONE by checking for that special case in the timeout callback.
      
      The regular posix timers avoid sending signals to SIGEV_NONE timers by
      not scheduling any callbacks for them in the first place.  Although it
      would be possible to do something similar for alarm timers, it's simpler
      to handle this as a special case in the timeout.
      
      Prior to this patch, the alarm timer would ignore the sigev_notify value
      and try to deliver signals to the process anyway.  Even worse, the
      sanity check for the value of sigev_signo is skipped when SIGEV_NONE was
      specified, so the signal number could be bogus.  If sigev_signo was an
      unitialized value (as it often would be if SIGEV_NONE is used), then
      it's hard to predict which signal will be sent.
      
      Cc: stable@vger.kernel.org
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Richard Cochran <richardcochran@gmail.com>
      Cc: Prarit Bhargava <prarit@redhat.com>
      Cc: Sharvil Nanavati <sharvil@google.com>
      Signed-off-by: default avatarRichard Larocque <rlarocque@google.com>
      Signed-off-by: default avatarJohn Stultz <john.stultz@linaro.org>
      
      alarmtimer: Return relative times in timer_gettime
      
      Returns the time remaining for an alarm timer, rather than the time at
      which it is scheduled to expire.  If the timer has already expired or it
      is not currently scheduled, the it_value's members are set to zero.
      
      This new behavior matches that of the other posix-timers and the POSIX
      specifications.
      
      This is a change in user-visible behavior, and may break existing
      applications.  Hopefully, few users rely on the old incorrect behavior.
      
      Cc: stable@vger.kernel.org
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Richard Cochran <richardcochran@gmail.com>
      Cc: Prarit Bhargava <prarit@redhat.com>
      Cc: Sharvil Nanavati <sharvil@google.com>
      Signed-off-by: default avatarRichard Larocque <rlarocque@google.com>
      [jstultz: minor style tweak]
      Signed-off-by: default avatarJohn Stultz <john.stultz@linaro.org>
      
      jiffies: Fix timeval conversion to jiffies
      
      timeval_to_jiffies tried to round a timeval up to an integral number
      of jiffies, but the logic for doing so was incorrect: intervals
      corresponding to exactly N jiffies would become N+1. This manifested
      itself particularly repeatedly stopping/starting an itimer:
      
      setitimer(ITIMER_PROF, &val, NULL);
      setitimer(ITIMER_PROF, NULL, &val);
      
      would add a full tick to val, _even if it was exactly representable in
      terms of jiffies_ (say, the result of a previous rounding.)  Doing
      this repeatedly would cause unbounded growth in val.  So fix the math.
      
      Here's what was wrong with the conversion: we essentially computed
      (eliding seconds)
      
      jiffies = usec  * (NSEC_PER_USEC/TICK_NSEC)
      
      by using scaling arithmetic, which took the best approximation of
      NSEC_PER_USEC/TICK_NSEC with denominator of 2^USEC_JIFFIE_SC =
      x/(2^USEC_JIFFIE_SC), and computed:
      
      jiffies = (usec * x) >> USEC_JIFFIE_SC
      
      and rounded this calculation up in the intermediate form (since we
      can't necessarily exactly represent TICK_NSEC in usec.) But the
      scaling arithmetic is a (very slight) *over*approximation of the true
      value; that is, instead of dividing by (1 usec/ 1 jiffie), we
      effectively divided by (1 usec/1 jiffie)-epsilon (rounding
      down). This would normally be fine, but we want to round timeouts up,
      and we did so by adding 2^USEC_JIFFIE_SC - 1 before the shift; this
      would be fine if our division was exact, but dividing this by the
      slightly smaller factor was equivalent to adding just _over_ 1 to the
      final result (instead of just _under_ 1, as desired.)
      
      In particular, with HZ=1000, we consistently computed that 10000 usec
      was 11 jiffies; the same was true for any exact multiple of
      TICK_NSEC.
      
      We could possibly still round in the intermediate form, adding
      something less than 2^USEC_JIFFIE_SC - 1, but easier still is to
      convert usec->nsec, round in nanoseconds, and then convert using
      time*spec*_to_jiffies.  This adds one constant multiplication, and is
      not observably slower in microbenchmarks on recent x86 hardware.
      
      Tested: the following program:
      
      int main() {
        struct itimerval zero = {{0, 0}, {0, 0}};
        /* Initially set to 10 ms. */
        struct itimerval initial = zero;
        initial.it_interval.tv_usec = 10000;
        setitimer(ITIMER_PROF, &initial, NULL);
        /* Save and restore several times. */
        for (size_t i = 0; i < 10; ++i) {
          struct itimerval prev;
          setitimer(ITIMER_PROF, &zero, &prev);
          /* on old kernels, this goes up by TICK_USEC every iteration */
          printf("previous value: %ld %ld %ld %ld\n",
                 prev.it_interval.tv_sec, prev.it_interval.tv_usec,
                 prev.it_value.tv_sec, prev.it_value.tv_usec);
          setitimer(ITIMER_PROF, &prev, NULL);
        }
          return 0;
      }
      
      Cc: stable@vger.kernel.org
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Paul Turner <pjt@google.com>
      Cc: Richard Cochran <richardcochran@gmail.com>
      Cc: Prarit Bhargava <prarit@redhat.com>
      Reviewed-by: default avatarPaul Turner <pjt@google.com>
      Reported-by: default avatarAaron Jacobs <jacobsa@google.com>
      Signed-off-by: default avatarAndrew Hunter <ahh@google.com>
      [jstultz: Tweaked to apply to 3.17-rc]
      Signed-off-by: default avatarJohn Stultz <john.stultz@linaro.org>
      
      futex: Unlock hb->lock in futex_wait_requeue_pi() error path
      
      futex_wait_requeue_pi() calls futex_wait_setup(). If
      futex_wait_setup() succeeds it returns with hb->lock held and
      preemption disabled. Now the sanity check after this does:
      
              if (match_futex(&q.key, &key2)) {
      	   	ret = -EINVAL;
      		goto out_put_keys;
      	}
      
      which releases the keys but does not release hb->lock.
      
      So we happily return to user space with hb->lock held and therefor
      preemption disabled.
      
      Unlock hb->lock before taking the exit route.
      Reported-by: default avatarDave "Trinity" Jones <davej@redhat.com>
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Reviewed-by: default avatarDarren Hart <dvhart@linux.intel.com>
      Reviewed-by: default avatarDavidlohr Bueso <dave@stgolabs.net>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: stable@vger.kernel.org
      Link: http://lkml.kernel.org/r/alpine.DEB.2.10.1409112318500.4178@nanosSigned-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      
      irqchip: gic-v3: Declare rdist as __percpu pointer to __iomem pointer
      
      The __percpu __iomem annotations on the rdist base are contradictory
      and confuse static checkers such as sparse.
      
      This patch fixes the anotations so that rdist is described as a __percpu
      pointer to an __iomem pointer.
      
      Cc: Jason Cooper <jason@lakedaemon.net>
      Cc: Marc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: default avatarWill Deacon <will.deacon@arm.com>
      Acked-by: default avatarMarc Zyngier <marc.zyngier@arm.com>
      Link: https://lkml.kernel.org/r/1409062410-25891-9-git-send-email-will.deacon@arm.comSigned-off-by: default avatarJason Cooper <jason@lakedaemon.net>
      
      irqchip: gic: Make gic_default_routable_irq_domain_ops static
      
      The internal irq domain ops for the GIC are not used directly anywhere
      else, so make them static. This gets rid of a sparse warning on the
      file.
      
      Cc: Jason Cooper <jason@lakedaemon.net>
      Cc: Marc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: default avatarWill Deacon <will.deacon@arm.com>
      Acked-by: default avatarMarc Zyngier <marc.zyngier@arm.com>
      Link: https://lkml.kernel.org/r/1409062410-25891-8-git-send-email-will.deacon@arm.comSigned-off-by: default avatarJason Cooper <jason@lakedaemon.net>
      
      irqchip: exynos-combiner: Fix compilation error on ARM64
      
      The following compilation error occurs on 64-bit Exynos7 SoC:
      
      drivers/irqchip/exynos-combiner.c: In function ‘combiner_irq_domain_map’:
      drivers/irqchip/exynos-combiner.c:162:2: error: implicit declaration of function ‘set_irq_flags’ [-Werror=implicit-function-declaration]
        set_irq_flags(irq, IRQF_VALID | IRQF_PROBE);
        ^
      drivers/irqchip/exynos-combiner.c:162:21: error: ‘IRQF_VALID’ undeclared (first use in this function)
        set_irq_flags(irq, IRQF_VALID | IRQF_PROBE);
                           ^
      drivers/irqchip/exynos-combiner.c:162:21: note: each undeclared identifier is reported only once for each function it appears in
      drivers/irqchip/exynos-combiner.c:162:34: error: ‘IRQF_PROBE’ undeclared (first use in this function)
        set_irq_flags(irq, IRQF_VALID | IRQF_PROBE);
      
      Fix the build error by including linux/interrupt.h.
      Signed-off-by: default avatarNaveen Krishna Chatradhi <ch.naveen@samsung.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Jason Cooper <jason@lakedaemon.net>
      Cc: Sudeep Holla <sudeep.holla@arm.com>
      Link: https://lkml.kernel.org/r/1409722329-18309-1-git-send-email-ch.naveen@samsung.comSigned-off-by: default avatarJason Cooper <jason@lakedaemon.net>
      
      parisc: Wire up seccomp, getrandom and memfd_create syscalls
      
      With secure computing we only support the SECCOMP_MODE_STRICT mode for
      now.
      Signed-off-by: default avatarHelge Deller <deller@gmx.de>
      
      parisc: dino: fix %d confusingly prefixed with 0x in format string
      Signed-off-by: default avatarHans Wennborg <hans@hanshq.net>
      Signed-off-by: default avatarHelge Deller <deller@gmx.de>
      
      parisc: sys_hpux: NUL terminator is one past the end
      
      We allocate "len" number of chars so we should put the NUL at "len - 1"
      to avoid corrupting memory.  Btw, strlen_user() is different from the
      normal strlen() function because it includes NUL terminator in the
      count.
      Signed-off-by: default avatarDan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: default avatarHelge Deller <deller@gmx.de>
      
      irqchip: crossbar: Off by one bugs in init
      
      My static checker complains that the ">" should be ">=" or else we go
      beyond the end of the cb->irq_map[] array on the next line.
      Signed-off-by: default avatarDan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: default avatarJason Cooper <jason@lakedaemon.net>
      
      irqchip: gic-v3: Tag all low level accessors __maybe_unused
      
      This is only really needed for gic_write_sgi1r in the !SMP case since it
      is only referenced in the SMP initialisation code but it seems better to
      have these functions all next to each other and declared consistently.
      Signed-off-by: default avatarMark Brown <broonie@linaro.org>
      Link: https://lkml.kernel.org/r/1406748194-21094-1-git-send-email-broonie@kernel.orgSigned-off-by: default avatarJason Cooper <jason@lakedaemon.net>
      
      irqchip: gic-v3: Only define gic_peek_irq() when building SMP
      
      If building with CONFIG_SMP disbled (for example, with allnoconfig) then
      GCC complains that the static function gic_peek_irq() is defined but not
      used since the only reference is in the SMP initialisation code. Fix this
      by moving the function definition inside the ifdef.
      Signed-off-by: default avatarMark Brown <broonie@linaro.org>
      Acked-by: default avatarMarc Zyngier <marc.zyngier@arm.com>
      Link: https://lkml.kernel.org/r/1406480224-24628-1-git-send-email-broonie@kernel.orgSigned-off-by: default avatarJason Cooper <jason@lakedaemon.net>
      
      (cherry picked from commit 9226b5b4
      99d263d4)
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      8aaa881c
    • Al Viro's avatar
      dcache.c: get rid of pointless macros · 145fce8d
      Al Viro authored
      D_HASH{MASK,BITS} are used once each, both in the same function (d_hash()).
      At this point they are actively misguiding - they imply that values are
      compiler constants, which is no longer true.
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      
      (cherry picked from commit 482db906)
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      145fce8d
    • Tejun Heo's avatar
      blkcg: don't call into policy draining if root_blkg is already gone · 3f2c76f9
      Tejun Heo authored
      While a queue is being destroyed, all the blkgs are destroyed and its
      ->root_blkg pointer is set to NULL.  If someone else starts to drain
      while the queue is in this state, the following oops happens.
      
        NULL pointer dereference at 0000000000000028
        IP: [<ffffffff8144e944>] blk_throtl_drain+0x84/0x230
        PGD e4a1067 PUD b773067 PMD 0
        Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
        Modules linked in: cfq_iosched(-) [last unloaded: cfq_iosched]
        CPU: 1 PID: 537 Comm: bash Not tainted 3.16.0-rc3-work+ #2
        Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
        task: ffff88000e222250 ti: ffff88000efd4000 task.ti: ffff88000efd4000
        RIP: 0010:[<ffffffff8144e944>]  [<ffffffff8144e944>] blk_throtl_drain+0x84/0x230
        RSP: 0018:ffff88000efd7bf0  EFLAGS: 00010046
        RAX: 0000000000000000 RBX: ffff880015091450 RCX: 0000000000000001
        RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
        RBP: ffff88000efd7c10 R08: 0000000000000000 R09: 0000000000000001
        R10: ffff88000e222250 R11: 0000000000000000 R12: ffff880015091450
        R13: ffff880015092e00 R14: ffff880015091d70 R15: ffff88001508fc28
        FS:  00007f1332650740(0000) GS:ffff88001fa80000(0000) knlGS:0000000000000000
        CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
        CR2: 0000000000000028 CR3: 0000000009446000 CR4: 00000000000006e0
        Stack:
         ffffffff8144e8f6 ffff880015091450 0000000000000000 ffff880015091d80
         ffff88000efd7c28 ffffffff8144ae2f ffff880015091450 ffff88000efd7c58
         ffffffff81427641 ffff880015091450 ffffffff82401f00 ffff880015091450
        Call Trace:
         [<ffffffff8144ae2f>] blkcg_drain_queue+0x1f/0x60
         [<ffffffff81427641>] __blk_drain_queue+0x71/0x180
         [<ffffffff81429b3e>] blk_queue_bypass_start+0x6e/0xb0
         [<ffffffff814498b8>] blkcg_deactivate_policy+0x38/0x120
         [<ffffffff8144ec44>] blk_throtl_exit+0x34/0x50
         [<ffffffff8144aea5>] blkcg_exit_queue+0x35/0x40
         [<ffffffff8142d476>] blk_release_queue+0x26/0xd0
         [<ffffffff81454968>] kobject_cleanup+0x38/0x70
         [<ffffffff81454848>] kobject_put+0x28/0x60
         [<ffffffff81427505>] blk_put_queue+0x15/0x20
         [<ffffffff817d07bb>] scsi_device_dev_release_usercontext+0x16b/0x1c0
         [<ffffffff810bc339>] execute_in_process_context+0x89/0xa0
         [<ffffffff817d064c>] scsi_device_dev_release+0x1c/0x20
         [<ffffffff817930e2>] device_release+0x32/0xa0
         [<ffffffff81454968>] kobject_cleanup+0x38/0x70
         [<ffffffff81454848>] kobject_put+0x28/0x60
         [<ffffffff817934d7>] put_device+0x17/0x20
         [<ffffffff817d11b9>] __scsi_remove_device+0xa9/0xe0
         [<ffffffff817d121b>] scsi_remove_device+0x2b/0x40
         [<ffffffff817d1257>] sdev_store_delete+0x27/0x30
         [<ffffffff81792ca8>] dev_attr_store+0x18/0x30
         [<ffffffff8126f75e>] sysfs_kf_write+0x3e/0x50
         [<ffffffff8126ea87>] kernfs_fop_write+0xe7/0x170
         [<ffffffff811f5e9f>] vfs_write+0xaf/0x1d0
         [<ffffffff811f69bd>] SyS_write+0x4d/0xc0
         [<ffffffff81d24692>] system_call_fastpath+0x16/0x1b
      
      776687bc ("block, blk-mq: draining can't be skipped even if
      bypass_depth was non-zero") made it easier to trigger this bug by
      making blk_queue_bypass_start() drain even when it loses the first
      bypass test to blk_cleanup_queue(); however, the bug has always been
      there even before the commit as blk_queue_bypass_start() could race
      against queue destruction, win the initial bypass test but perform the
      actual draining after blk_cleanup_queue() already destroyed all blkgs.
      
      Fix it by skippping calling into policy draining if all the blkgs are
      already gone.
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Reported-by: default avatarShirish Pargaonkar <spargaonkar@suse.com>
      Reported-by: default avatarSasha Levin <sasha.levin@oracle.com>
      Reported-by: default avatarJet Chen <jet.chen@intel.com>
      Cc: stable@vger.kernel.org
      Tested-by: default avatarShirish Pargaonkar <spargaonkar@suse.com>
      Signed-off-by: default avatarJens Axboe <axboe@fb.com>
      
      Revert "bio: modify __bio_add_page() to accept pages that don't start a new segment"
      
      This reverts commit 254c4407.
      
      It causes crashes with cryptsetup, even after a few iterations and
      updates. Drop it for now.
      
      blkcg: don't call into policy draining if root_blkg is already gone
      
      While a queue is being destroyed, all the blkgs are destroyed and its
      ->root_blkg pointer is set to NULL.  If someone else starts to drain
      while the queue is in this state, the following oops happens.
      
        NULL pointer dereference at 0000000000000028
        IP: [<ffffffff8144e944>] blk_throtl_drain+0x84/0x230
        PGD e4a1067 PUD b773067 PMD 0
        Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
        Modules linked in: cfq_iosched(-) [last unloaded: cfq_iosched]
        CPU: 1 PID: 537 Comm: bash Not tainted 3.16.0-rc3-work+ #2
        Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
        task: ffff88000e222250 ti: ffff88000efd4000 task.ti: ffff88000efd4000
        RIP: 0010:[<ffffffff8144e944>]  [<ffffffff8144e944>] blk_throtl_drain+0x84/0x230
        RSP: 0018:ffff88000efd7bf0  EFLAGS: 00010046
        RAX: 0000000000000000 RBX: ffff880015091450 RCX: 0000000000000001
        RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
        RBP: ffff88000efd7c10 R08: 0000000000000000 R09: 0000000000000001
        R10: ffff88000e222250 R11: 0000000000000000 R12: ffff880015091450
        R13: ffff880015092e00 R14: ffff880015091d70 R15: ffff88001508fc28
        FS:  00007f1332650740(0000) GS:ffff88001fa80000(0000) knlGS:0000000000000000
        CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
        CR2: 0000000000000028 CR3: 0000000009446000 CR4: 00000000000006e0
        Stack:
         ffffffff8144e8f6 ffff880015091450 0000000000000000 ffff880015091d80
         ffff88000efd7c28 ffffffff8144ae2f ffff880015091450 ffff88000efd7c58
         ffffffff81427641 ffff880015091450 ffffffff82401f00 ffff880015091450
        Call Trace:
         [<ffffffff8144ae2f>] blkcg_drain_queue+0x1f/0x60
         [<ffffffff81427641>] __blk_drain_queue+0x71/0x180
         [<ffffffff81429b3e>] blk_queue_bypass_start+0x6e/0xb0
         [<ffffffff814498b8>] blkcg_deactivate_policy+0x38/0x120
         [<ffffffff8144ec44>] blk_throtl_exit+0x34/0x50
         [<ffffffff8144aea5>] blkcg_exit_queue+0x35/0x40
         [<ffffffff8142d476>] blk_release_queue+0x26/0xd0
         [<ffffffff81454968>] kobject_cleanup+0x38/0x70
         [<ffffffff81454848>] kobject_put+0x28/0x60
         [<ffffffff81427505>] blk_put_queue+0x15/0x20
         [<ffffffff817d07bb>] scsi_device_dev_release_usercontext+0x16b/0x1c0
         [<ffffffff810bc339>] execute_in_process_context+0x89/0xa0
         [<ffffffff817d064c>] scsi_device_dev_release+0x1c/0x20
         [<ffffffff817930e2>] device_release+0x32/0xa0
         [<ffffffff81454968>] kobject_cleanup+0x38/0x70
         [<ffffffff81454848>] kobject_put+0x28/0x60
         [<ffffffff817934d7>] put_device+0x17/0x20
         [<ffffffff817d11b9>] __scsi_remove_device+0xa9/0xe0
         [<ffffffff817d121b>] scsi_remove_device+0x2b/0x40
         [<ffffffff817d1257>] sdev_store_delete+0x27/0x30
         [<ffffffff81792ca8>] dev_attr_store+0x18/0x30
         [<ffffffff8126f75e>] sysfs_kf_write+0x3e/0x50
         [<ffffffff8126ea87>] kernfs_fop_write+0xe7/0x170
         [<ffffffff811f5e9f>] vfs_write+0xaf/0x1d0
         [<ffffffff811f69bd>] SyS_write+0x4d/0xc0
         [<ffffffff81d24692>] system_call_fastpath+0x16/0x1b
      
      776687bc ("block, blk-mq: draining can't be skipped even if
      bypass_depth was non-zero") made it easier to trigger this bug by
      making blk_queue_bypass_start() drain even when it loses the first
      bypass test to blk_cleanup_queue(); however, the bug has always been
      there even before the commit as blk_queue_bypass_start() could race
      against queue destruction, win the initial bypass test but perform the
      actual draining after blk_cleanup_queue() already destroyed all blkgs.
      
      Fix it by skippping calling into policy draining if all the blkgs are
      already gone.
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Reported-by: default avatarShirish Pargaonkar <spargaonkar@suse.com>
      Reported-by: default avatarSasha Levin <sasha.levin@oracle.com>
      Reported-by: default avatarJet Chen <jet.chen@intel.com>
      Cc: stable@vger.kernel.org
      Tested-by: default avatarShirish Pargaonkar <spargaonkar@suse.com>
      Signed-off-by: default avatarJens Axboe <axboe@fb.com>
      
      bio: modify __bio_add_page() to accept pages that don't start a new segment
      
      The original behaviour is to refuse to add a new page if the maximum
      number of segments has been reached, regardless of the fact the page we
      are going to add can be merged into the last segment or not.
      
      Unfortunately, when the system runs under heavy memory fragmentation
      conditions, a driver may try to add multiple pages to the last segment.
      The original code won't accept them and EBUSY will be reported to
      userspace.
      
      This patch modifies the function so it refuses to add a page only in case
      the latter starts a new segment and the maximum number of segments has
      already been reached.
      
      The bug can be easily reproduced with the st driver:
      
      1) set CONFIG_SCSI_MPT2SAS_MAX_SGE or CONFIG_SCSI_MPT3SAS_MAX_SGE  to 16
      2) modprobe st buffer_kbs=1024
      3) #dd if=/dev/zero of=/dev/st0 bs=1M count=10
         dd: error writing `/dev/st0': Device or resource busy
      
      [ming.lei@canonical.com: update bi_iter.bi_size before recounting segments]
      Signed-off-by: default avatarMaurizio Lombardi <mlombard@redhat.com>
      Signed-off-by: default avatarMing Lei <ming.lei@canonical.com>
      Tested-by: default avatarDongsu Park <dongsu.park@profitbricks.com>
      Tested-by: default avatarJet Chen <jet.chen@intel.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Kent Overstreet <kmo@daterainc.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarJens Axboe <axboe@fb.com>
      
      block: fix SG_[GS]ET_RESERVED_SIZE ioctl when max_sectors is huge
      
      SG_GET_RESERVED_SIZE and SG_SET_RESERVED_SIZE ioctls access a reserved
      buffer in bytes as int type.  The value needs to be capped at the request
      queue's max_sectors.  But integer overflow is not correctly handled in
      the calculation when converting max_sectors from sectors to bytes.
      Signed-off-by: default avatarAkinobu Mita <akinobu.mita@gmail.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: "James E.J. Bottomley" <JBottomley@parallels.com>
      Cc: Douglas Gilbert <dgilbert@interlog.com>
      Cc: linux-scsi@vger.kernel.org
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarJens Axboe <axboe@fb.com>
      
      block: fix BLKSECTGET ioctl when max_sectors is greater than USHRT_MAX
      
      BLKSECTGET ioctl loads the request queue's max_sectors as unsigned
      short value to the argument pointer.  So if the max_sector is greater
      than USHRT_MAX, the upper 16 bits of that is just discarded.
      
      In such case, USHRT_MAX is more preferable than the lower 16 bits of
      max_sectors.
      Signed-off-by: default avatarAkinobu Mita <akinobu.mita@gmail.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: "James E.J. Bottomley" <JBottomley@parallels.com>
      Cc: Douglas Gilbert <dgilbert@interlog.com>
      Cc: linux-scsi@vger.kernel.org
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarJens Axboe <axboe@fb.com>
      
      block/partitions/efi.c: kerneldoc fixing
      
      Adding function documentation and fixing kerneldoc warnings
      ('field: description' uniformization).
      
      Cc: Davidlohr Bueso <davidlohr@hp.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: default avatarFabian Frederick <fabf@skynet.be>
      Signed-off-by: default avatarJens Axboe <axboe@fb.com>
      
      block/partitions/msdos.c: code clean-up
      
      checkpatch fixing:
      WARNING: Missing a blank line after declarations
      WARNING: space prohibited between function name and open parenthesis '('
      ERROR: spaces required around that '<' (ctx:VxV)
      
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarFabian Frederick <fabf@skynet.be>
      Signed-off-by: default avatarJens Axboe <axboe@fb.com>
      
      block/partitions/amiga.c: replace nolevel printk by pr_err
      
      Also add no prefix pr_fmt to avoid any future default format update
      
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarFabian Frederick <fabf@skynet.be>
      Signed-off-by: default avatarJens Axboe <axboe@fb.com>
      
      block/partitions/aix.c: replace count*size kzalloc by kcalloc
      
      kcalloc manages count*sizeof overflow.
      
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarFabian Frederick <fabf@skynet.be>
      Signed-off-by: default avatarJens Axboe <axboe@fb.com>
      
      bio-integrity: add "bip_max_vcnt" into struct bio_integrity_payload
      
      Commit 08778795 ("block: Fix nr_vecs for inline integrity vectors") from
      Martin introduces the function bip_integrity_vecs(get the useful vectors)
      to fix the issue about nr_vecs for inline integrity vectors that reported
      by David Milburn.
      
      But it seems that bip_integrity_vecs() will return the wrong number if the
      bio is not based on any bio_set for some reason(bio->bi_pool == NULL),
      because in that case, the bip_inline_vecs[0] is malloced directly.  So
      here we add the bip_max_vcnt to record the count of vector slots, and
      cleanup the function bip_integrity_vecs().
      Signed-off-by: default avatarGu Zheng <guz.fnst@cn.fujitsu.com>
      Cc: Martin K. Petersen <martin.petersen@oracle.com>
      Cc: Kent Overstreet <kmo@daterainc.com>
      Signed-off-by: default avatarJens Axboe <axboe@fb.com>
      
      blk-mq: use percpu_ref for mq usage count
      
      Currently, blk-mq uses a percpu_counter to keep track of how many
      usages are in flight.  The percpu_counter is drained while freezing to
      ensure that no usage is left in-flight after freezing is complete.
      blk_mq_queue_enter/exit() and blk_mq_[un]freeze_queue() implement this
      per-cpu gating mechanism.
      
      This type of code has relatively high chance of subtle bugs which are
      extremely difficult to trigger and it's way too hairy to be open coded
      in blk-mq.  percpu_ref can serve the same purpose after the recent
      changes.  This patch replaces the open-coded per-cpu usage counting
      and draining mechanism with percpu_ref.
      
      blk_mq_queue_enter() performs tryget_live on the ref and exit()
      performs put.  blk_mq_freeze_queue() kills the ref and waits until the
      reference count reaches zero.  blk_mq_unfreeze_queue() revives the ref
      and wakes up the waiters.
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Nicholas A. Bellinger <nab@linux-iscsi.org>
      Cc: Kent Overstreet <kmo@daterainc.com>
      Signed-off-by: default avatarJens Axboe <axboe@fb.com>
      
      blk-mq: collapse __blk_mq_drain_queue() into blk_mq_freeze_queue()
      
      Keeping __blk_mq_drain_queue() as a separate function doesn't buy us
      anything and it's gonna be further simplified.  Let's flatten it into
      its caller.
      
      This patch doesn't make any functional change.
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Nicholas A. Bellinger <nab@linux-iscsi.org>
      Signed-off-by: default avatarJens Axboe <axboe@fb.com>
      
      blk-mq: decouble blk-mq freezing from generic bypassing
      
      blk_mq freezing is entangled with generic bypassing which bypasses
      blkcg and io scheduler and lets IO requests fall through the block
      layer to the drivers in FIFO order.  This allows forward progress on
      IOs with the advanced features disabled so that those features can be
      configured or altered without worrying about stalling IO which may
      lead to deadlock through memory allocation.
      
      However, generic bypassing doesn't quite fit blk-mq.  blk-mq currently
      doesn't make use of blkcg or ioscheds and it maps bypssing to
      freezing, which blocks request processing and drains all the in-flight
      ones.  This causes problems as bypassing assumes that request
      processing is online.  blk-mq works around this by conditionally
      allowing request processing for the problem case - during queue
      initialization.
      
      Another weirdity is that except for during queue cleanup, bypassing
      started on the generic side prevents blk-mq from processing new
      requests but doesn't drain the in-flight ones.  This shouldn't break
      anything but again highlights that something isn't quite right here.
      
      The root cause is conflating blk-mq freezing and generic bypassing
      which are two different mechanisms.  The only intersecting purpose
      that they serve is during queue cleanup.  Let's properly separate
      blk-mq freezing from generic bypassing and simply use it where
      necessary.
      
      * request_queue->mq_freeze_depth is added and
        blk_mq_[un]freeze_queue() now operate on this counter instead of
        ->bypass_depth.  The replacement for QUEUE_FLAG_BYPASS isn't added
        but the counter is tested directly.  This will be further updated by
        later changes.
      
      * blk_mq_drain_queue() is dropped and "__" prefix is dropped from
        blk_mq_freeze_queue().  Queue cleanup path now calls
        blk_mq_freeze_queue() directly.
      
      * blk_queue_enter()'s fast path condition is simplified to simply
        check @q->mq_freeze_depth.  Previously, the condition was
      
      	!blk_queue_dying(q) &&
      	    (!blk_queue_bypass(q) || !blk_queue_init_done(q))
      
        mq_freeze_depth is incremented right after dying is set and
        blk_queue_init_done() exception isn't necessary as blk-mq doesn't
        start frozen, which only leaves the blk_queue_bypass() test which
        can be replaced by @q->mq_freeze_depth test.
      
      This change simplifies the code and reduces confusion in the area.
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Nicholas A. Bellinger <nab@linux-iscsi.org>
      Signed-off-by: default avatarJens Axboe <axboe@fb.com>
      
      block, blk-mq: draining can't be skipped even if bypass_depth was non-zero
      
      Currently, both blk_queue_bypass_start() and blk_mq_freeze_queue()
      skip queue draining if bypass_depth was already above zero.  The
      assumption is that the one which bumped the bypass_depth should have
      performed draining already; however, there's nothing which prevents a
      new instance of bypassing/freezing from starting before the previous
      one finishes draining.  The current code may allow the later
      bypassing/freezing instances to complete while there still are
      in-flight requests which haven't finished draining.
      
      Fix it by draining regardless of bypass_depth.  We still skip draining
      from blk_queue_bypass_start() while the queue is initializing to avoid
      introducing excessive delays during boot.  INIT_DONE setting is moved
      above the initial blk_queue_bypass_end() so that bypassing attempts
      can't slip inbetween.
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Nicholas A. Bellinger <nab@linux-iscsi.org>
      Signed-off-by: default avatarJens Axboe <axboe@fb.com>
      
      blk-mq: fix a memory ordering bug in blk_mq_queue_enter()
      
      blk-mq uses a percpu_counter to keep track of how many usages are in
      flight.  The percpu_counter is drained while freezing to ensure that
      no usage is left in-flight after freezing is complete.
      
      blk_mq_queue_enter/exit() and blk_mq_[un]freeze_queue() implement this
      per-cpu gating mechanism; unfortunately, it contains a subtle bug -
      smp_wmb() in blk_mq_queue_enter() doesn't prevent prevent the cpu from
      fetching @q->bypass_depth before incrementing @q->mq_usage_counter and
      if freezing happens inbetween the caller can slip through and freezing
      can be complete while there are active users.
      
      Use smp_mb() instead so that bypass_depth and mq_usage_counter
      modifications and tests are properly interlocked.
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Nicholas A. Bellinger <nab@linux-iscsi.org>
      Signed-off-by: default avatarJens Axboe <axboe@fb.com>
      
      Merge branch 'for-3.17' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu into for-3.17/core
      
      Merge the percpu_ref changes from Tejun, he says they are stable now.
      
      percpu-refcount: implement percpu_ref_reinit() and percpu_ref_is_zero()
      
      Now that explicit invocation of percpu_ref_exit() is necessary to free
      the percpu counter, we can implement percpu_ref_reinit() which
      reinitializes a released percpu_ref.  This can be used implement
      scalable gating switch which can be drained and then re-opened without
      worrying about memory allocation failures.
      
      percpu_ref_is_zero() is added to be used in a sanity check in
      percpu_ref_exit().  As this function will be useful for other purposes
      too, make it a public interface.
      
      v2: Use smp_read_barrier_depends() instead of smp_load_acquire().  We
          only need data dep barrier and smp_load_acquire() is stronger and
          heavier on some archs.  Spotted by Lai Jiangshan.
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Cc: Kent Overstreet <kmo@daterainc.com>
      Cc: Christoph Lameter <cl@linux-foundation.org>
      Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
      
      percpu-refcount: require percpu_ref to be exited explicitly
      
      Currently, a percpu_ref undoes percpu_ref_init() automatically by
      freeing the allocated percpu area when the percpu_ref is killed.
      While seemingly convenient, this has the following niggles.
      
      * It's impossible to re-init a released reference counter without
        going through re-allocation.
      
      * In the similar vein, it's impossible to initialize a percpu_ref
        count with static percpu variables.
      
      * We need and have an explicit destructor anyway for failure paths -
        percpu_ref_cancel_init().
      
      This patch removes the automatic percpu counter freeing in
      percpu_ref_kill_rcu() and repurposes percpu_ref_cancel_init() into a
      generic destructor now named percpu_ref_exit().  percpu_ref_destroy()
      is considered but it gets confusing with percpu_ref_kill() while
      "exit" clearly indicates that it's the counterpart of
      percpu_ref_init().
      
      All percpu_ref_cancel_init() users are updated to invoke
      percpu_ref_exit() instead and explicit percpu_ref_exit() calls are
      added to the destruction path of all percpu_ref users.
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Acked-by: default avatarBenjamin LaHaise <bcrl@kvack.org>
      Cc: Kent Overstreet <kmo@daterainc.com>
      Cc: Christoph Lameter <cl@linux-foundation.org>
      Cc: Benjamin LaHaise <bcrl@kvack.org>
      Cc: Nicholas A. Bellinger <nab@linux-iscsi.org>
      Cc: Li Zefan <lizefan@huawei.com>
      
      percpu-refcount: use unsigned long for pcpu_count pointer
      
      percpu_ref->pcpu_count is a percpu pointer with a status flag in its
      lowest bit.  As such, it always goes through arithmetic operations
      which is very cumbersome to do on a pointer.  It has to be first
      casted to unsigned long and then back.
      
      Let's just make the field unsigned long so that we can skip the first
      casts.  While at it, rename it to pcpu_counter_ptr to clarify that
      it's a pointer value.
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Cc: Kent Overstreet <kmo@daterainc.com>
      Cc: Christoph Lameter <cl@linux-foundation.org>
      
      percpu-refcount: add helpers for ->percpu_count accesses
      
      * All four percpu_ref_*() operations implemented in the header file
        perform the same operation to determine whether the percpu_ref is
        alive and extract the percpu pointer.  Factor out the common logic
        into __pcpu_ref_alive().  This doesn't change the generated code.
      
      * There are a couple places in percpu-refcount.c which masks out
        PCPU_REF_DEAD to obtain the percpu pointer.  Factor it out into
        pcpu_count_ptr().
      
      * The above changes make the WARN_ON_ONCE() conditional at the top of
        percpu_ref_kill_and_confirm() the only user of REF_STATUS().  Test
        PCPU_REF_DEAD directly and remove REF_STATUS().
      
      This patch doesn't introduce any functional change.
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Cc: Kent Overstreet <kmo@daterainc.com>
      Cc: Christoph Lameter <cl@linux-foundation.org>
      
      percpu-refcount: one bit is enough for REF_STATUS
      
      percpu-refcount currently reserves two lowest bits of its percpu
      pointer to indicate its state; however, only one bit is used for
      PCPU_REF_DEAD.
      
      Simplify it by removing PCPU_STATUS_BITS/MASK and testing
      PCPU_REF_DEAD directly.  This also allows the compiler to choose a
      more efficient instruction depending on the architecture.
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Cc: Kent Overstreet <kmo@daterainc.com>
      Cc: Christoph Lameter <cl@linux-foundation.org>
      
      percpu-refcount, aio: use percpu_ref_cancel_init() in ioctx_alloc()
      
      ioctx_alloc() reaches inside percpu_ref and directly frees
      ->pcpu_count in its failure path, which is quite gross.  percpu_ref
      has been providing a proper interface to do this,
      percpu_ref_cancel_init(), for quite some time now.  Let's use that
      instead.
      
      This patch doesn't introduce any behavior changes.
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Acked-by: default avatarBenjamin LaHaise <bcrl@kvack.org>
      Cc: Kent Overstreet <kmo@daterainc.com>
      
      workqueue: stronger test in process_one_work()
      
      After the recent changes, when POOL_DISASSOCIATED is cleared, the
      running worker's local CPU should be the same as pool->cpu without any
      exception even during cpu-hotplug.  Update the sanity check in
      process_one_work() accordingly.
      
      This patch changes "(proposition_A && proposition_B && proposition_C)"
      to "(proposition_B && proposition_C)", so if the old compound
      proposition is true, the new one must be true too. so this will not
      hide any possible bug which can be caught by the old test.
      
      tj: Minor updates to the description.
      
      CC: Jason J. Herne <jjherne@linux.vnet.ibm.com>
      CC: Sasha Levin <sasha.levin@oracle.com>
      Signed-off-by: default avatarLai Jiangshan <laijs@cn.fujitsu.com>
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      
      workqueue: clear POOL_DISASSOCIATED in rebind_workers()
      
      The commit a9ab775b ("workqueue: directly restore CPU affinity of
      workers from CPU_ONLINE") moved the pool->lock into rebind_workers()
      without also moving "pool->flags &= ~POOL_DISASSOCIATED".
      
      There is nothing wrong with "pool->flags &= ~POOL_DISASSOCIATED" not
      being moved together, but there isn't any benefit either. We move it
      into rebind_workers() and achieve these benefits:
      
      1) Better readability.  POOL_DISASSOCIATED is cleared in
         rebind_workers() as expected.
      
      2) When POOL_DISASSOCIATED is cleared, we can ensure that all the
         running workers of the pool are on the local CPU (pool->cpu).
      
      tj: Cosmetic updates to the code and description.
      Signed-off-by: default avatarLai Jiangshan <laijs@cn.fujitsu.com>
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      
      percpu: Use ALIGN macro instead of hand coding alignment calculation
      Signed-off-by: default avatarChristoph Lameter <cl@linux.com>
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      
      percpu: invoke __verify_pcpu_ptr() from the generic part of accessors and operations
      
      __verify_pcpu_ptr() is used to verify that a specified parameter is
      actually an percpu pointer by percpu accessor and operation
      implementations.  Currently, where it's called isn't clearly defined
      and we just ensure that it's invoked at least once for all accessors
      and operations.
      
      The lack of clarity on when it should be called isn't nice and given
      that this is a completely generic issue, there's no reason to make
      archs worry about it.
      
      This patch updates __verify_pcpu_ptr() invocations such that it's
      always invoked from the final generic wrapper once per access or
      operation.  As this is already the case for {raw|this}_cpu_*()
      definitions through __pcpu_size_*(), only the {raw|per|this}_cpu_ptr()
      accessors need to be updated.
      
      This change makes it unnecessary for archs to worry about
      __verify_pcpu_ptr().  x86's arch_raw_cpu_ptr() is updated accordingly.
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Cc: Christoph Lameter <cl@linux-foundation.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      
      percpu: preffity percpu header files
      
      percpu macros are difficult to read.  It's partly because they're
      fairly complex but also because they simply lack visual and
      conventional consistency to an unusual degree.  The preceding patches
      tried to organize macro definitions consistently by their roles.  This
      patch makes the following cosmetic changes to improve overall
      readability.
      
      * Use consistent convention for multi-line macro definitions - "do {"
        or "({" are now put on their own lines and the line continuing '\'
        are all put on the same column.
      
      * Temp variables used inside macro are consistently given "__" prefix.
      
      * When a macro argument is passed to another macro or a function,
        putting extra parenthses around it doesn't help anything.  Don't put
        them.
      
      * _this_cpu_generic_*() are renamed to this_cpu_generic_*() so that
        they're consistent with raw_cpu_generic_*().
      
      * Reorganize raw_cpu_*() and this_cpu_*() definitions so that trivial
        wrappers are collected in one place after actual operation
        definitions.
      
      * Other misc cleanups including reorganizing comments.
      
      All changes in this patch are cosmetic and cause no functional
      difference.
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Acked-by: default avatarChristoph Lameter <cl@linux.com>
      
      percpu: use raw_cpu_*() to define __this_cpu_*()
      
      __this_cpu_*() operations are the same as raw_cpu_*() operations
      except for the added __this_cpu_preempt_check().  Curiously, these
      were defined using __pcu_size_call_*() instead of being layered on top
      of raw_cpu_*().
      
      Let's layer them so that __this_cpu_*() are defined in terms of
      raw_cpu_*().  It's simpler and less error-prone this way.
      
      This patch doesn't introduce any functional difference.
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Acked-by: default avatarChristoph Lameter <cl@linux.com>
      
      percpu: reorder macros in percpu header files
      
      * In include/asm-generic/percpu.h, collect {raw|_this}_cpu_generic*()
        macros into one place.  They were dispersed through
        {raw|this}_cpu_*_N() definitions and the visiual inconsistency was
        making following the code unnecessarily difficult.
      
      * In include/linux/percpu-defs.h, move __verify_pcpu_ptr() later in
        the file so that it's right above accessor definitions where it's
        actually used.
      
      This is pure reorganization.
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Acked-by: default avatarChristoph Lameter <cl@linux.com>
      
      percpu: move {raw|this}_cpu_*() definitions to include/linux/percpu-defs.h
      
      We're in the process of moving all percpu accessors and operations to
      include/linux/percpu-defs.h so that they're available to arch headers
      without having to include full include/linux/percpu.h which may cause
      cyclic inclusion dependency.
      
      This patch moves {raw|this}_cpu_*() definitions from
      include/linux/percpu.h to include/linux/percpu-defs.h.  The code is
      moved mostly verbatim; however, raw_cpu_*() are placed above
      this_cpu_*() which is more conventional as the raw operations may be
      used to defined other variants.
      
      This is pure reorganization.
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Acked-by: default avatarChristoph Lameter <cl@linux.com>
      
      percpu: move generic {raw|this}_cpu_*_N() definitions to include/asm-generic/percpu.h
      
      {raw|this}_cpu_*_N() operations are expected to be provided by archs
      and the generic definitions are provided as fallbacks.  As such, these
      firmly belong to include/asm-generic/percpu.h.
      
      Move the generic definitions to include/asm-generic/percpu.h.  The
      code is moved mostly verbatim; however, raw_cpu_*_N() are placed above
      this_cpu_*_N() which is more conventional as the raw operations may be
      used to defined other variants.
      
      This is pure reorganization.
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Acked-by: default avatarChristoph Lameter <cl@linux.com>
      
      percpu: only allow sized arch overrides for {raw|this}_cpu_*() ops
      
      Currently, percpu allows two separate methods for overriding
      {raw|this}_cpu_*() ops - for a given operation, an arch can provide
      whole replacement or sized sub operations to override specific parts
      of it.  e.g. arch either can provide this_cpu_add() or
      this_cpu_add_4() to override only the 4 byte operation.
      
      While quite flexible on a glance, the dual-overriding scheme
      complicates the code path for no actual gain.  It compilcates the
      already complex operation definitions and if an arch wants to override
      all sizes, it can easily provide all variants anyway.  In fact, no
      arch is actually making use of whole operation override.
      
      Another oddity is that __this_cpu_*() operations are defined in the
      same way as raw_cpu_*() but ignores full overrides of the raw_cpu_*()
      and doesn't allow full operation override, so if an arch provides
      whole overrides for raw_cpu_*() operations __this_cpu_*() ends up
      using the generic implementations.
      
      More importantly, it takes away the layering between arch-specific and
      generic parts making it impossible for the generic part to implement
      arch-independent features on top of arch-specific overrides.
      
      This patch removes the support for whole operation overrides.  As no
      arch is using it, this doesn't cause any actual difference.
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Acked-by: default avatarChristoph Lameter <cl@linux.com>
      
      percpu: reorganize include/linux/percpu-defs.h
      
      Reorganize for better readability.
      
      * Accessor definitions are collected into one place and SMP and UP now
        define them in the same order.
      
      * Definitions are layered when possible - e.g. per_cpu() is now
        defined in terms of this_cpu_ptr().
      
      * Rather pointless comment dropped.
      
      * per_cpu(), __raw_get_cpu_var() and __get_cpu_var() are defined in a
        way which can be shared between SMP and UP and moved out of
        CONFIG_SMP blocks.
      
      This patch doesn't introduce any functional difference.
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Cc: Christoph Lameter <cl@linux-foundation.org>
      
      percpu: move accessors from include/linux/percpu.h to percpu-defs.h
      
      include/linux/percpu-defs.h is gonna host all accessors and operations
      so that arch headers can make use of them too without worrying about
      circular dependency through include/linux/percpu.h.
      
      This patch moves the following accessors from include/linux/percpu.h
      to include/linux/percpu-defs.h.
      
      * get/put_cpu_var()
      * get/put_cpu_ptr()
      * per_cpu_ptr()
      
      This is pure reorgniazation.
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Acked-by: default avatarChristoph Lameter <cl@linux.com>
      
      percpu: include/asm-generic/percpu.h should contain only arch-overridable parts
      
      The roles of the various percpu header files has become unclear.
      There are four header files involved.
      
       include/linux/percpu-defs.h
       include/linux/percpu.h
       include/asm-generic/percpu.h
       arch/*/include/asm/percpu.h
      
      The original intention for include/asm-generic/percpu.h is providing
      generic definitions for arch-overridable parts; however, it now hosts
      various stuff which can't be overridden by archs.
      
      Also, include/linux/percpu-defs.h was initially added to contain
      section and percpu variable definition macros so that arch header
      files can make use of them without worrying about introducing cyclic
      inclusion dependency by including include/linux/percpu.h; however,
      arch headers sometimes need to access percpu variables too and this is
      one of the reasons why some accessors were implemented in
      include/linux/asm-generic/percpu.h.
      
      Let's clear up the situation by making include/asm-generic/percpu.h
      contain only arch-overridable parts and moving accessors and
      operations into include/linux/percpu-defs.  Note that this patch only
      moves things from include/asm-generic/percpu.h.
      include/linux/percpu.h will be taken care of by later patches.
      
      This patch moves the followings.
      
      * SHIFT_PERCPU_PTR() / VERIFY_PERCPU_PTR()
      * per_cpu()
      * raw_cpu_ptr()
      * this_cpu_ptr()
      * __get_cpu_var()
      * __raw_get_cpu_var()
      * __this_cpu_ptr()
      * PER_CPU_[SHARED_]ALIGNED_SECTION
      * PER_CPU_[SHARED_]ALIGNED_SECTION
      * PER_CPU_FIRST_SECTION
      
      This patch is pure reorganization.
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Acked-by: default avatarChristoph Lameter <cl@linux.com>
      
      percpu: introduce arch_raw_cpu_ptr()
      
      Currently, archs can override raw_cpu_ptr() directly; however, we
      wanna build a layer of indirection in the generic part of percpu so
      that we can implement generic features there without affecting archs.
      
      Introduce arch_raw_cpu_ptr() which is used to define raw_cpu_ptr() by
      generic percpu code.  The two are identical for now.  x86 is currently
      the only arch which overrides raw_cpu_ptr() and is converted to
      define arch_raw_cpu_ptr() instead.
      
      This doesn't introduce any functional difference.
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Cc: Christoph Lameter <cl@linux-foundation.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      
      percpu: disallow archs from overriding SHIFT_PERCPU_PTR()
      
      It has been about half a decade since all archs started using the
      dynamic percpu allocator and thus the same SHIFT_PERCPU_PTR()
      implementation.  There's no benefit in overriding SHIFT_PERCPU_PTR()
      anymore.
      
      Remove #ifndef around it to clarify that this is identical regardless
      of the arch.
      
      This patch doesn't cause any functional difference.
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Acked-by: default avatarChristoph Lameter <cl@linux.com>
      
      (cherry picked from commit 2a1b4cf2
      0b462c89)
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      3f2c76f9