1. 18 Jul, 2017 40 commits
    • Suzuki K Poulose's avatar
      kvm: arm/arm64: Fix locking for kvm_free_stage2_pgd · 2a992347
      Suzuki K Poulose authored
      commit 8b3405e3 upstream.
      
      In kvm_free_stage2_pgd() we don't hold the kvm->mmu_lock while calling
      unmap_stage2_range() on the entire memory range for the guest. This could
      cause problems with other callers (e.g, munmap on a memslot) trying to
      unmap a range. And since we have to unmap the entire Guest memory range
      holding a spinlock, make sure we yield the lock if necessary, after we
      unmap each PUD range.
      
      Fixes: commit d5d8184d ("KVM: ARM: Memory virtualization setup")
      Cc: Paolo Bonzini <pbonzin@redhat.com>
      Cc: Marc Zyngier <marc.zyngier@arm.com>
      Cc: Christoffer Dall <christoffer.dall@linaro.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Signed-off-by: default avatarSuzuki K Poulose <suzuki.poulose@arm.com>
      [ Avoid vCPU starvation and lockup detector warnings ]
      Signed-off-by: default avatarMarc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: default avatarSuzuki K Poulose <suzuki.poulose@arm.com>
      Signed-off-by: default avatarChristoffer Dall <cdall@linaro.org>
      [bwh: Backported to 3.16:
       - unmap_stage2_range() is a wrapper around unmap_range(), which is also used for
         HYP page table setup.  So unmap_range() should do the cond_resched_lock(), but
         only if kvm != NULL.
       - Adjust context]
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      2a992347
    • Yuejie Shi's avatar
      af_key: Add lock to key dump · fd75aee7
      Yuejie Shi authored
      commit 89e357d8 upstream.
      
      A dump may come in the middle of another dump, modifying its dump
      structure members. This race condition will result in NULL pointer
      dereference in kernel. So add a lock to prevent that race.
      
      Fixes: 83321d6b ("[AF_KEY]: Dump SA/SP entries non-atomically")
      Signed-off-by: default avatarYuejie Shi <syjcnss@gmail.com>
      Signed-off-by: default avatarSteffen Klassert <steffen.klassert@secunet.com>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      fd75aee7
    • Nicholas Bellinger's avatar
      iscsi-target: Drop work-around for legacy GlobalSAN initiator · c136bc3a
      Nicholas Bellinger authored
      commit 1c99de98 upstream.
      
      Once upon a time back in 2009, a work-around was added to support
      the GlobalSAN iSCSI initiator v3.3 for MacOSX, which during login
      did not propose nor respond to MaxBurstLength, FirstBurstLength,
      DefaultTime2Wait and DefaultTime2Retain keys.
      
      The work-around in iscsi_check_proposer_for_optional_reply()
      allowed the missing keys to be proposed, but did not require
      waiting for a response before moving to full feature phase
      operation.  This allowed GlobalSAN v3.3 to work out-of-the
      box, and for many years we didn't run into login interopt
      issues with any other initiators..
      
      Until recently, when Martin tried a QLogic 57840S iSCSI Offload
      HBA on Windows 2016 which completed login, but subsequently
      failed with:
      
          Got unknown iSCSI OpCode: 0x43
      
      The issue was QLogic MSFT side did not propose DefaultTime2Wait +
      DefaultTime2Retain, so LIO proposes them itself, and immediately
      transitions to full feature phase because of the GlobalSAN hack.
      However, the QLogic MSFT side still attempts to respond to
      DefaultTime2Retain + DefaultTime2Wait, even though LIO has set
      ISCSI_FLAG_LOGIN_NEXT_STAGE3 + ISCSI_FLAG_LOGIN_TRANSIT
      in last login response.
      
      So while the QLogic MSFT side should have been proposing these
      two keys to start, it was doing the correct thing per RFC-3720
      attempting to respond to proposed keys before transitioning to
      full feature phase.
      
      All that said, recent versions of GlobalSAN iSCSI (v5.3.0.541)
      does correctly propose the four keys during login, making the
      original work-around moot.
      
      So in order to allow QLogic MSFT to run unmodified as-is, go
      ahead and drop this long standing work-around.
      Reported-by: default avatarMartin Svec <martin.svec@zoner.cz>
      Cc: Martin Svec <martin.svec@zoner.cz>
      Cc: Himanshu Madhani <Himanshu.Madhani@cavium.com>
      Cc: Arun Easi <arun.easi@cavium.com>
      Signed-off-by: default avatarNicholas Bellinger <nab@linux-iscsi.org>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      c136bc3a
    • Song Hongyan's avatar
      iio: hid-sensor-attributes: Fix sensor property setting failure. · dc49f841
      Song Hongyan authored
      commit bba6d9e4 upstream.
      
      When system bootup without get sensor property, set sensor
      property will be fail.
      
      If no get_feature operation done before set_feature, the sensor
      properties will all be the initialized value, which is not the
      same with sensor real properties. When set sensor property it will
      write back to sensor the changed perperty data combines with other
      sensor properties data, it is not right and may be dangerous.
      
      In order to get all sensor properties, choose to read one of the sensor
      properties(no matter read any sensor peroperty, driver will get all
      the peroperties and return the requested one).
      
      Fixes: 73c6768b ("iio: hid-sensors: Common attribute and trigger")
      Signed-off-by: default avatarSong Hongyan <hongyan.song@intel.com>
      Acked-by: default avatarSrinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
      Signed-off-by: default avatarJonathan Cameron <jic23@kernel.org>
      [bwh: Backported to 3.16:
       - sensor_hub_get_feature() doesn't take a 'buffer_size' parameter
       - Adjust context]
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      dc49f841
    • Nikolaus Schulz's avatar
      iio: core: Fix IIO_VAL_FRACTIONAL_LOG2 for negative values · 2c10d3e6
      Nikolaus Schulz authored
      commit 7fd6592d upstream.
      
      Fix formatting of negative values of type IIO_VAL_FRACTIONAL_LOG2 by
      switching from do_div(), which can't handle negative numbers, to
      div_s64_rem().  Also use shift_right for shifting, which is safe with
      negative values.
      Signed-off-by: default avatarNikolaus Schulz <nikolaus.schulz@avionic-design.de>
      Reviewed-by: default avatarLars-Peter Clausen <lars@metafoo.de>
      Signed-off-by: default avatarJonathan Cameron <jic23@kernel.org>
      [bwh: Backported to 3.16:
       - Use vals[] instead of tmp{0,1}
       - Keep using sprintf()]
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      2c10d3e6
    • Michal Nazarewicz's avatar
      include/linux/kernel.h: change abs() macro so it uses consistent return type · bc11e0dc
      Michal Nazarewicz authored
      commit 8f57e4d9 upstream.
      
      Rewrite abs() so that its return type does not depend on the
      architecture and no unexpected type conversion happen inside of it.  The
      only conversion is from unsigned to signed type.  char is left as a
      return type but treated as a signed type regradless of it's actual
      signedness.
      
      With the old version, int arguments were promoted to long and depending
      on architecture a long argument might result in s64 or long return type
      (which may or may not be the same).
      
      This came after some back and forth with Nicolas.  The current macro has
      different return type (for the same input type) depending on
      architecture which might be midly iritating.
      
      An alternative version would promote to int like so:
      
      	#define abs(x)	__abs_choose_expr(x, long long,			\
      			__abs_choose_expr(x, long,			\
      			__builtin_choose_expr(				\
      				sizeof(x) <= sizeof(int),		\
      				({ int __x = (x); __x<0?-__x:__x; }),	\
      				((void)0))))
      
      I have no preference but imagine Linus might.  :] Nicolas argument against
      is that promoting to int causes iconsistent behaviour:
      
      	int main(void) {
      		unsigned short a = 0, b = 1, c = a - b;
      		unsigned short d = abs(a - b);
      		unsigned short e = abs(c);
      		printf("%u %u\n", d, e);  // prints: 1 65535
      	}
      
      Then again, no sane person expects consistent behaviour from C integer
      arithmetic.  ;)
      
      Note:
      
        __builtin_types_compatible_p(unsigned char, char) is always false, and
        __builtin_types_compatible_p(signed char, char) is also always false.
      Signed-off-by: default avatarMichal Nazarewicz <mina86@mina86.com>
      Reviewed-by: default avatarNicolas Pitre <nico@linaro.org>
      Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
      Cc: Wey-Yi Guy <wey-yi.w.guy@intel.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      [bwh: Backported to 3.16: adjust filename]
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      bc11e0dc
    • Michal Nazarewicz's avatar
      kernel.h: make abs() work with 64-bit types · 5ab49e93
      Michal Nazarewicz authored
      commit c8299cb6 upstream.
      
      For 64-bit arguments, the abs macro casts it to an int which leads to
      lost precision and may cause incorrect results.  To deal with 64-bit
      types abs64 macro has been introduced but still there are places where
      abs macro is used incorrectly.
      
      To deal with the problem, expand abs macro such that it operates on s64
      type when dealing with 64-bit types while still returning long when
      dealing with smaller types.
      
      This fixes one known bug (per John):
      
      The internal clocksteering done for fine-grained error correction uses a
      : logarithmic approximation, so any time adjtimex() adjusts the clock
      : steering, timekeeping_freqadjust() quickly approximates the correct clock
      : frequency over a series of ticks.
      :
      : Unfortunately, the logic in timekeeping_freqadjust(), introduced in commit
      : dc491596 (Rework frequency adjustments to work better w/ nohz),
      : used the abs() function with a s64 error value to calculate the size of
      : the approximated adjustment to be made.
      :
      : Per include/linux/kernel.h: "abs() should not be used for 64-bit types
      : (s64, u64, long long) - use abs64()".
      :
      : Thus on 32-bit platforms, this resulted in the clocksteering to take a
      : quite dampended random walk trying to converge on the proper frequency,
      : which caused the adjustments to be made much slower then intended (most
      : easily observed when large adjustments are made).
      Signed-off-by: default avatarMichal Nazarewicz <mina86@mina86.com>
      Reported-by: default avatarJohn Stultz <john.stultz@linaro.org>
      Tested-by: default avatarJohn Stultz <john.stultz@linaro.org>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      5ab49e93
    • Guillaume Nault's avatar
      l2tp: take a reference on sessions used in genetlink handlers · b301c9b7
      Guillaume Nault authored
      commit 2777e2ab upstream.
      
      Callers of l2tp_nl_session_find() need to hold a reference on the
      returned session since there's no guarantee that it isn't going to
      disappear from under them.
      
      Relying on the fact that no l2tp netlink message may be processed
      concurrently isn't enough: sessions can be deleted by other means
      (e.g. by closing the PPPOL2TP socket of a ppp pseudowire).
      
      l2tp_nl_cmd_session_delete() is a bit special: it runs a callback
      function that may require a previous call to session->ref(). In
      particular, for ppp pseudowires, the callback is l2tp_session_delete(),
      which then calls pppol2tp_session_close() and dereferences the PPPOL2TP
      socket. The socket might already be gone at the moment
      l2tp_session_delete() calls session->ref(), so we need to take a
      reference during the session lookup. So we need to pass the do_ref
      variable down to l2tp_session_get() and l2tp_session_get_by_ifname().
      
      Since all callers have to be updated, l2tp_session_find_by_ifname() and
      l2tp_nl_session_find() are renamed to reflect their new behaviour.
      
      Fixes: 309795f4 ("l2tp: Add netlink control API for L2TP")
      Signed-off-by: default avatarGuillaume Nault <g.nault@alphalink.fr>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      [bwh: Backported to 3.16: adjust context]
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      b301c9b7
    • Guillaume Nault's avatar
      l2tp: fix duplicate session creation · 877dc0d7
      Guillaume Nault authored
      commit dbdbc73b upstream.
      
      l2tp_session_create() relies on its caller for checking for duplicate
      sessions. This is racy since a session can be concurrently inserted
      after the caller's verification.
      
      Fix this by letting l2tp_session_create() verify sessions uniqueness
      upon insertion. Callers need to be adapted to check for
      l2tp_session_create()'s return code instead of calling
      l2tp_session_find().
      
      pppol2tp_connect() is a bit special because it has to work on existing
      sessions (if they're not connected) or to create a new session if none
      is found. When acting on a preexisting session, a reference must be
      held or it could go away on us. So we have to use l2tp_session_get()
      instead of l2tp_session_find() and drop the reference before exiting.
      
      Fixes: d9e31d17 ("l2tp: Add L2TP ethernet pseudowire support")
      Fixes: fd558d18 ("l2tp: Split pppol2tp patch into separate l2tp and ppp parts")
      Signed-off-by: default avatarGuillaume Nault <g.nault@alphalink.fr>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      877dc0d7
    • Guillaume Nault's avatar
      l2tp: ensure session can't get removed during pppol2tp_session_ioctl() · 611ddd1a
      Guillaume Nault authored
      commit 57377d63 upstream.
      
      Holding a reference on session is required before calling
      pppol2tp_session_ioctl(). The session could get freed while processing the
      ioctl otherwise. Since pppol2tp_session_ioctl() uses the session's socket,
      we also need to take a reference on it in l2tp_session_get().
      
      Fixes: fd558d18 ("l2tp: Split pppol2tp patch into separate l2tp and ppp parts")
      Signed-off-by: default avatarGuillaume Nault <g.nault@alphalink.fr>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      611ddd1a
    • Guillaume Nault's avatar
      l2tp: fix race in l2tp_recv_common() · 3129845f
      Guillaume Nault authored
      commit 61b9a047 upstream.
      
      Taking a reference on sessions in l2tp_recv_common() is racy; this
      has to be done by the callers.
      
      To this end, a new function is required (l2tp_session_get()) to
      atomically lookup a session and take a reference on it. Callers then
      have to manually drop this reference.
      
      Fixes: fd558d18 ("l2tp: Split pppol2tp patch into separate l2tp and ppp parts")
      Signed-off-by: default avatarGuillaume Nault <g.nault@alphalink.fr>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      [bwh: Backported to 3.16: adjust context]
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      3129845f
    • Uwe Kleine-König's avatar
      serial: mxs-auart: Fix baudrate calculation · a3cd4f8c
      Uwe Kleine-König authored
      commit a6040bc6 upstream.
      
      The reference manual for the i.MX28 recommends to calculate the divisor
      as
      
      	divisor = (UARTCLK * 32) / baud rate, rounded to the nearest integer
      
      , so let's do this. For a typical setup of UARTCLK = 24 MHz and baud
      rate = 115200 this changes the divisor from 6666 to 6667 and so the
      actual baud rate improves from 115211.521 Bd (error ≅ 0.01 %) to
      115194.240 Bd (error ≅ 0.005 %).
      Signed-off-by: default avatarUwe Kleine-König <u.kleine-koenig@pengutronix.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      [bwh: Backported to 3.16: adjust context, indentation]
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      a3cd4f8c
    • Stefan Wahren's avatar
      serial: mxs-auart: fix baud rate range · c7c32716
      Stefan Wahren authored
      commit df57cf6a upstream.
      
      Currently mxs-auart doesn't care correctly about the baud rate divisor.
      According to reference manual the baud rate divisor must be between
      0x000000EC and 0x003FFFC0. So calculate the possible baud rate range
      and use it for uart_get_baud_rate().
      Signed-off-by: default avatarStefan Wahren <stefan.wahren@i2se.com>
      Reviewed-by: default avatarFabio Estevam <fabio.estevam@freescale.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      c7c32716
    • Hans de Goede's avatar
      mmc: sdhci: Disable runtime pm when the sdio_irq is enabled · 9af42458
      Hans de Goede authored
      commit 923713b3 upstream.
      
      SDIO cards may need clock to send the card interrupt to the host.
      
      On a cherrytrail tablet with a RTL8723BS wifi chip, without this patch
      pinging the tablet results in:
      
      PING 192.168.1.14 (192.168.1.14) 56(84) bytes of data.
      64 bytes from 192.168.1.14: icmp_seq=1 ttl=64 time=78.6 ms
      64 bytes from 192.168.1.14: icmp_seq=2 ttl=64 time=1760 ms
      64 bytes from 192.168.1.14: icmp_seq=3 ttl=64 time=753 ms
      64 bytes from 192.168.1.14: icmp_seq=4 ttl=64 time=3.88 ms
      64 bytes from 192.168.1.14: icmp_seq=5 ttl=64 time=795 ms
      64 bytes from 192.168.1.14: icmp_seq=6 ttl=64 time=1841 ms
      64 bytes from 192.168.1.14: icmp_seq=7 ttl=64 time=810 ms
      64 bytes from 192.168.1.14: icmp_seq=8 ttl=64 time=1860 ms
      64 bytes from 192.168.1.14: icmp_seq=9 ttl=64 time=812 ms
      64 bytes from 192.168.1.14: icmp_seq=10 ttl=64 time=48.6 ms
      
      Where as with this patch I get:
      
      PING 192.168.1.14 (192.168.1.14) 56(84) bytes of data.
      64 bytes from 192.168.1.14: icmp_seq=1 ttl=64 time=3.96 ms
      64 bytes from 192.168.1.14: icmp_seq=2 ttl=64 time=1.97 ms
      64 bytes from 192.168.1.14: icmp_seq=3 ttl=64 time=17.2 ms
      64 bytes from 192.168.1.14: icmp_seq=4 ttl=64 time=2.46 ms
      64 bytes from 192.168.1.14: icmp_seq=5 ttl=64 time=2.83 ms
      64 bytes from 192.168.1.14: icmp_seq=6 ttl=64 time=1.40 ms
      64 bytes from 192.168.1.14: icmp_seq=7 ttl=64 time=2.10 ms
      64 bytes from 192.168.1.14: icmp_seq=8 ttl=64 time=1.40 ms
      64 bytes from 192.168.1.14: icmp_seq=9 ttl=64 time=2.04 ms
      64 bytes from 192.168.1.14: icmp_seq=10 ttl=64 time=1.40 ms
      
      Cc: Dong Aisheng <b29396@freescale.com>
      Cc: Ian W MORRISON <ianwmorrison@gmail.com>
      Signed-off-by: default avatarHans de Goede <hdegoede@redhat.com>
      Acked-by: default avatarAdrian Hunter <adrian.hunter@intel.com>
      Acked-by: default avatarDong Aisheng <aisheng.dong@nxp.com>
      Signed-off-by: default avatarUlf Hansson <ulf.hansson@linaro.org>
      [bwh: Backported to 3.16: adjust context]
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      9af42458
    • Thomas Hellstrom's avatar
      drm/vmwgfx: Remove getparam error message · 447ed742
      Thomas Hellstrom authored
      commit 53e16798 upstream.
      
      The mesa winsys sometimes uses unimplemented parameter requests to
      check for features. Remove the error message to avoid bloating the
      kernel log.
      Signed-off-by: default avatarThomas Hellstrom <thellstrom@vmware.com>
      Reviewed-by: default avatarBrian Paul <brianp@vmware.com>
      Reviewed-by: default avatarSinclair Yeh <syeh@vmware.com>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      447ed742
    • Thomas Hellstrom's avatar
      drm/ttm, drm/vmwgfx: Relax permission checking when opening surfaces · ef9e7a11
      Thomas Hellstrom authored
      commit fe25deb7 upstream.
      
      Previously, when a surface was opened using a legacy (non prime) handle,
      it was verified to have been created by a client in the same master realm.
      Relax this so that opening is also allowed recursively if the client
      already has the surface open.
      
      This works around a regression in svga mesa where opening of a shared
      surface is used recursively to obtain surface information.
      Signed-off-by: default avatarThomas Hellstrom <thellstrom@vmware.com>
      Reviewed-by: default avatarSinclair Yeh <syeh@vmware.com>
      [bwh: Backported to 3.16: adjust context]
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      ef9e7a11
    • Murray McAllister's avatar
      drm/vmwgfx: avoid calling vzalloc with a 0 size in vmw_get_cap_3d_ioctl() · dc7d9316
      Murray McAllister authored
      commit 63774069 upstream.
      
      In vmw_get_cap_3d_ioctl(), a user can supply 0 for a size that is
      used in vzalloc(). This eventually calls dump_stack() (in warn_alloc()),
      which can leak useful addresses to dmesg.
      
      Add check to avoid a size of 0.
      Signed-off-by: default avatarMurray McAllister <murray.mcallister@insomniasec.com>
      Reviewed-by: default avatarSinclair Yeh <syeh@vmware.com>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      dc7d9316
    • Thomas Hellstrom's avatar
      drm/vmwgfx: Type-check lookups of fence objects · 1ab4c8e8
      Thomas Hellstrom authored
      commit f7652afa upstream.
      
      A malicious caller could otherwise hand over handles to other objects
      causing all sorts of interesting problems.
      
      Testing done: Ran a Fedora 25 desktop using both Xorg and
      gnome-shell/Wayland.
      Signed-off-by: default avatarThomas Hellstrom <thellstrom@vmware.com>
      Reviewed-by: default avatarSinclair Yeh <syeh@vmware.com>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      1ab4c8e8
    • Nicholas Bellinger's avatar
      iscsi-target: Fix TMR reference leak during session shutdown · c7e911db
      Nicholas Bellinger authored
      commit efb2ea77 upstream.
      
      This patch fixes a iscsi-target specific TMR reference leak
      during session shutdown, that could occur when a TMR was
      quiesced before the hand-off back to iscsi-target code
      via transport_cmd_check_stop_to_fabric().
      
      The reference leak happens because iscsit_free_cmd() was
      incorrectly skipping the final target_put_sess_cmd() for
      TMRs when transport_generic_free_cmd() returned zero because
      the se_cmd->cmd_kref did not reach zero, due to the missing
      se_cmd assignment in original code.
      
      The result was iscsi_cmd and it's associated se_cmd memory
      would be freed once se_sess->sess_cmd_map where released,
      but the associated se_tmr_req was leaked and remained part
      of se_device->dev_tmr_list.
      
      This bug would manfiest itself as kernel paging request
      OOPsen in core_tmr_lun_reset(), when a left-over se_tmr_req
      attempted to dereference it's se_cmd pointer that had
      already been released during normal session shutdown.
      
      To address this bug, go ahead and treat ISCSI_OP_SCSI_CMD
      and ISCSI_OP_SCSI_TMFUNC the same when there is an extra
      se_cmd->cmd_kref to drop in iscsit_free_cmd(), and use
      op_scsi to signal __iscsit_free_cmd() when the former
      needs to clear any further iscsi related I/O state.
      Reported-by: default avatarRob Millner <rlm@daterainc.com>
      Cc: Rob Millner <rlm@daterainc.com>
      Reported-by: default avatarChu Yuan Lin <cyl@datera.io>
      Cc: Chu Yuan Lin <cyl@datera.io>
      Tested-by: default avatarChu Yuan Lin <cyl@datera.io>
      Signed-off-by: default avatarNicholas Bellinger <nab@linux-iscsi.org>
      [bwh: Backported to 3.16: adjust context]
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      c7e911db
    • Sebastian Siewior's avatar
      ubi/upd: Always flush after prepared for an update · 6550ef6f
      Sebastian Siewior authored
      commit 9cd9a21c upstream.
      
      In commit 6afaf8a4 ("UBI: flush wl before clearing update marker") I
      managed to trigger and fix a similar bug. Now here is another version of
      which I assumed it wouldn't matter back then but it turns out UBI has a
      check for it and will error out like this:
      
      |ubi0 warning: validate_vid_hdr: inconsistent used_ebs
      |ubi0 error: validate_vid_hdr: inconsistent VID header at PEB 592
      
      All you need to trigger this is? "ubiupdatevol /dev/ubi0_0 file" + a
      powercut in the middle of the operation.
      ubi_start_update() sets the update-marker and puts all EBs on the erase
      list. After that userland can proceed to write new data while the old EB
      aren't erased completely. A powercut at this point is usually not that
      much of a tragedy. UBI won't give read access to the static volume
      because it has the update marker. It will most likely set the corrupted
      flag because it misses some EBs.
      So we are all good. Unless the size of the image that has been written
      differs from the old image in the magnitude of at least one EB. In that
      case UBI will find two different values for `used_ebs' and refuse to
      attach the image with the error message mentioned above.
      
      So in order not to get in the situation, the patch will ensure that we
      wait until everything is removed before it tries to write any data.
      The alternative would be to detect such a case and remove all EBs at the
      attached time after we processed the volume-table and see the
      update-marker set. The patch looks bigger and I doubt it is worth it
      since usually the write() will wait from time to time for a new EB since
      usually there not that many spare EB that can be used.
      Signed-off-by: default avatarSebastian Andrzej Siewior <bigeasy@linutronix.de>
      Signed-off-by: default avatarRichard Weinberger <richard@nod.at>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      6550ef6f
    • Heiko Carstens's avatar
      s390/uaccess: get_user() should zero on failure (again) · db49ae76
      Heiko Carstens authored
      commit d09c5373 upstream.
      
      Commit fd2d2b19 ("s390: get_user() should zero on failure")
      intended to fix s390's get_user() implementation which did not zero
      the target operand if the read from user space faulted. Unfortunately
      the patch has no effect: the corresponding inline assembly specifies
      that the operand is only written to ("=") and the previous value is
      discarded.
      
      Therefore the compiler is free to and actually does omit the zero
      initialization.
      
      To fix this simply change the contraint modifier to "+", so the
      compiler cannot omit the initialization anymore.
      
      Fixes: c9ca7841 ("s390/uaccess: provide inline variants of get_user/put_user")
      Fixes: fd2d2b19 ("s390: get_user() should zero on failure")
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: default avatarHeiko Carstens <heiko.carstens@de.ibm.com>
      Signed-off-by: default avatarMartin Schwidefsky <schwidefsky@de.ibm.com>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      db49ae76
    • Guillaume Nault's avatar
      l2tp: purge socket queues in the .destruct() callback · 8f2293a7
      Guillaume Nault authored
      commit e91793bb upstream.
      
      The Rx path may grab the socket right before pppol2tp_release(), but
      nothing guarantees that it will enqueue packets before
      skb_queue_purge(). Therefore, the socket can be destroyed without its
      queues fully purged.
      
      Fix this by purging queues in pppol2tp_session_destruct() where we're
      guaranteed nothing is still referencing the socket.
      
      Fixes: 9e9cb622 ("l2tp: fix userspace reception on plain L2TP sockets")
      Signed-off-by: default avatarGuillaume Nault <g.nault@alphalink.fr>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      8f2293a7
    • Mathias Nyman's avatar
      xhci: Manually give back cancelled URB if we can't queue it for cancel · ce339190
      Mathias Nyman authored
      commit d3519b9d upstream.
      
      xhci needs to take care of four scenarios when asked to cancel a URB.
      
      1 URB is not queued or already given back.
        usb_hcd_check_unlink_urb() will return an error, we pass the error on
      
      2 We fail to find xhci internal structures from urb private data such as
        virtual device and endpoint ring.
        Give back URB immediately, can't do anything about internal structures.
      
      3 URB private data has valid pointers to xhci internal data, but host is
        not  responding.
        give back URB immedately and remove the URB from the endpoint lists.
      
      4 Everyting is working
        add URB to cancel list, queue a command to stop the endpoint, after
        which the URB can be turned to no-op or skipped, removed from lists,
        and given back.
      
      We failed to give back the urb in case 2 where the correct device and
      endpoint pointers could not be retrieved from URB private data.
      
      This caused a hang on Dell Inspiron 5558/0VNM2T at resume from suspend
      as urb was never returned.
      
      [  245.270505] INFO: task rtsx_usb_ms_1:254 blocked for more than 120 seconds.
      [  245.272244]       Tainted: G        W       4.11.0-rc3-ARCH #2
      [  245.273983] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      [  245.275737] rtsx_usb_ms_1   D    0   254      2 0x00000000
      [  245.277524] Call Trace:
      [  245.279278]  __schedule+0x2d3/0x8a0
      [  245.281077]  schedule+0x3d/0x90
      [  245.281961]  usb_kill_urb.part.3+0x6c/0xa0 [usbcore]
      [  245.282861]  ? wake_atomic_t_function+0x60/0x60
      [  245.283760]  usb_kill_urb+0x21/0x30 [usbcore]
      [  245.284649]  usb_start_wait_urb+0xe5/0x170 [usbcore]
      [  245.285541]  ? try_to_del_timer_sync+0x53/0x80
      [  245.286434]  usb_bulk_msg+0xbd/0x160 [usbcore]
      [  245.287326]  rtsx_usb_send_cmd+0x63/0x90 [rtsx_usb]
      
      Reported-by: diego.viola@gmail.com
      Tested-by: diego.viola@gmail.com
      Signed-off-by: default avatarMathias Nyman <mathias.nyman@linux.intel.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      [bwh: Backported to 3.16: adjust context]
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      ce339190
    • Josh Poimboeuf's avatar
      ACPI: Fix incompatibility with mcount-based function graph tracing · 0dc84f3b
      Josh Poimboeuf authored
      commit 61b79e16 upstream.
      
      Paul Menzel reported a warning:
      
        WARNING: CPU: 0 PID: 774 at /build/linux-ROBWaj/linux-4.9.13/kernel/trace/trace_functions_graph.c:233 ftrace_return_to_handler+0x1aa/0x1e0
        Bad frame pointer: expected f6919d98, received f6919db0
          from func acpi_pm_device_sleep_wake return to c43b6f9d
      
      The warning means that function graph tracing is broken for the
      acpi_pm_device_sleep_wake() function.  That's because the ACPI Makefile
      unconditionally sets the '-Os' gcc flag to optimize for size.  That's an
      issue because mcount-based function graph tracing is incompatible with
      '-Os' on x86, thanks to the following gcc bug:
      
        https://gcc.gnu.org/bugzilla/show_bug.cgi?id=42109
      
      I have another patch pending which will ensure that mcount-based
      function graph tracing is never used with CONFIG_CC_OPTIMIZE_FOR_SIZE on
      x86.
      
      But this patch is needed in addition to that one because the ACPI
      Makefile overrides that config option for no apparent reason.  It has
      had this flag since the beginning of git history, and there's no related
      comment, so I don't know why it's there.  As far as I can tell, there's
      no reason for it to be there.  The appropriate behavior is for it to
      honor CONFIG_CC_OPTIMIZE_FOR_{SIZE,PERFORMANCE} like the rest of the
      kernel.
      Reported-by: default avatarPaul Menzel <pmenzel@molgen.mpg.de>
      Signed-off-by: default avatarJosh Poimboeuf <jpoimboe@redhat.com>
      Acked-by: default avatarSteven Rostedt (VMware) <rostedt@goodmis.org>
      Signed-off-by: default avatarRafael J. Wysocki <rafael.j.wysocki@intel.com>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      0dc84f3b
    • James Morse's avatar
      ACPI / APEI: Add missing synchronize_rcu() on NOTIFY_SCI removal · 01bfe308
      James Morse authored
      commit 7d64f82c upstream.
      
      When removing a GHES device notified by SCI, list_del_rcu() is used,
      ghes_remove() should call synchronize_rcu() before it goes on to call
      kfree(ghes), otherwise concurrent RCU readers may still hold this list
      entry after it has been freed.
      Signed-off-by: default avatarJames Morse <james.morse@arm.com>
      Reviewed-by: default avatar"Huang, Ying" <ying.huang@intel.com>
      Fixes: 81e88fdc (ACPI, APEI, Generic Hardware Error Source POLL/IRQ/NMI notification type support)
      Signed-off-by: default avatarRafael J. Wysocki <rafael.j.wysocki@intel.com>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      01bfe308
    • Joerg Roedel's avatar
      ACPI: Do not create a platform_device for IOAPIC/IOxAPIC · 4e153cff
      Joerg Roedel authored
      commit 08f63d97 upstream.
      
      No platform-device is required for IO(x)APICs, so don't even
      create them.
      
      [ rjw: This fixes a problem with leaking platform device objects
        after IOAPIC/IOxAPIC hot-removal events.]
      Signed-off-by: default avatarJoerg Roedel <jroedel@suse.de>
      Signed-off-by: default avatarRafael J. Wysocki <rafael.j.wysocki@intel.com>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      4e153cff
    • Arnd Bergmann's avatar
      virtio_balloon: prevent uninitialized variable use · 3b5d8b04
      Arnd Bergmann authored
      commit f0bb2d50 upstream.
      
      The latest gcc-7.0.1 snapshot reports a new warning:
      
      virtio/virtio_balloon.c: In function 'update_balloon_stats':
      virtio/virtio_balloon.c:258:26: error: 'events[2]' is used uninitialized in this function [-Werror=uninitialized]
      virtio/virtio_balloon.c:260:26: error: 'events[3]' is used uninitialized in this function [-Werror=uninitialized]
      virtio/virtio_balloon.c:261:56: error: 'events[18]' is used uninitialized in this function [-Werror=uninitialized]
      virtio/virtio_balloon.c:262:56: error: 'events[17]' is used uninitialized in this function [-Werror=uninitialized]
      
      This seems absolutely right, so we should add an extra check to
      prevent copying uninitialized stack data into the statistics.
      >From all I can tell, this has been broken since the statistics code
      was originally added in 2.6.34.
      
      Fixes: 9564e138 ("virtio: Add memory statistics reporting to the balloon driver (V4)")
      Signed-off-by: default avatarArnd Bergmann <arnd@arndb.de>
      Signed-off-by: default avatarLadi Prosek <lprosek@redhat.com>
      Signed-off-by: default avatarMichael S. Tsirkin <mst@redhat.com>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      3b5d8b04
    • Ladi Prosek's avatar
      virtio_balloon: init 1st buffer in stats vq · 710aa47c
      Ladi Prosek authored
      commit fc865322 upstream.
      
      When init_vqs runs, virtio_balloon.stats is either uninitialized or
      contains stale values. The host updates its state with garbage data
      because it has no way of knowing that this is just a marker buffer
      used for signaling.
      
      This patch updates the stats before pushing the initial buffer.
      
      Alternative fixes:
      * Push an empty buffer in init_vqs. Not easily done with the current
        virtio implementation and violates the spec "Driver MUST supply the
        same subset of statistics in all buffers submitted to the statsq".
      * Push a buffer with invalid tags in init_vqs. Violates the same
        spec clause, plus "invalid tag" is not really defined.
      
      Note: the spec says:
      	When using the legacy interface, the device SHOULD ignore all values in
      	the first buffer in the statsq supplied by the driver after device
      	initialization. Note: Historically, drivers supplied an uninitialized
      	buffer in the first buffer.
      
      Unfortunately QEMU does not seem to implement the recommendation
      even for the legacy interface.
      Signed-off-by: default avatarLadi Prosek <lprosek@redhat.com>
      Signed-off-by: default avatarMichael S. Tsirkin <mst@redhat.com>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      710aa47c
    • Benjamin Herrenschmidt's avatar
      powerpc: Disable HFSCR[TM] if TM is not supported · aca2e939
      Benjamin Herrenschmidt authored
      commit 7ed23e1b upstream.
      
      On Power8 & Power9 the early CPU inititialisation in __init_HFSCR()
      turns on HFSCR[TM] (Hypervisor Facility Status and Control Register
      [Transactional Memory]), but that doesn't take into account that TM
      might be disabled by CPU features, or disabled by the kernel being built
      with CONFIG_PPC_TRANSACTIONAL_MEM=n.
      
      So later in boot, when we have setup the CPU features, clear HSCR[TM] if
      the TM CPU feature has been disabled. We use CPU_FTR_TM_COMP to account
      for the CONFIG_PPC_TRANSACTIONAL_MEM=n case.
      
      Without this a KVM guest might try use TM, even if told not to, and
      cause an oops in the host kernel. Typically the oops is seen in
      __kvmppc_vcore_entry() and may or may not be fatal to the host, but is
      always bad news.
      
      In practice all shipping CPU revisions do support TM, and all host
      kernels we are aware of build with TM support enabled, so no one should
      actually be able to hit this in the wild.
      
      Fixes: 2a3563b0 ("powerpc: Setup in HFSCR for POWER8")
      Signed-off-by: default avatarBenjamin Herrenschmidt <benh@kernel.crashing.org>
      Tested-by: default avatarSam Bobroff <sam.bobroff@au1.ibm.com>
      [mpe: Rewrite change log with input from Sam, add Fixes/stable]
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      [bwh: Backported to 3.16: adjust context]
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      aca2e939
    • Gao Feng's avatar
      netfilter: nf_nat_snmp: Fix panic when snmp_trap_helper fails to register · db6a9ec5
      Gao Feng authored
      commit 75c689dc upstream.
      
      In the commit 93557f53 ("netfilter: nf_conntrack: nf_conntrack snmp
      helper"), the snmp_helper is replaced by nf_nat_snmp_hook. So the
      snmp_helper is never registered. But it still tries to unregister the
      snmp_helper, it could cause the panic.
      
      Now remove the useless snmp_helper and the unregister call in the
      error handler.
      
      Fixes: 93557f53 ("netfilter: nf_conntrack: nf_conntrack snmp helper")
      Signed-off-by: default avatarGao Feng <fgao@ikuai8.com>
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      db6a9ec5
    • Alan Stern's avatar
      USB: fix linked-list corruption in rh_call_control() · af168762
      Alan Stern authored
      commit 16336820 upstream.
      
      Using KASAN, Dmitry found a bug in the rh_call_control() routine: If
      buffer allocation fails, the routine returns immediately without
      unlinking its URB from the control endpoint, eventually leading to
      linked-list corruption.
      
      This patch fixes the problem by jumping to the end of the routine
      (where the URB is unlinked) when an allocation failure occurs.
      Signed-off-by: default avatarAlan Stern <stern@rowland.harvard.edu>
      Reported-and-tested-by: default avatarDmitry Vyukov <dvyukov@google.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      af168762
    • Theodore Ts'o's avatar
      ext4: lock the xattr block before checksuming it · 9140c0ce
      Theodore Ts'o authored
      commit dac7a4b4 upstream.
      
      We must lock the xattr block before calculating or verifying the
      checksum in order to avoid spurious checksum failures.
      
      https://bugzilla.kernel.org/show_bug.cgi?id=193661Reported-by: default avatarColin Ian King <colin.king@canonical.com>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      [bwh: Backported to 3.16: adjust context]
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      9140c0ce
    • Arnd Bergmann's avatar
      IB/qib: fix false-postive maybe-uninitialized warning · 48b0c712
      Arnd Bergmann authored
      commit f6aafac1 upstream.
      
      aarch64-linux-gcc-7 complains about code it doesn't fully understand:
      
      drivers/infiniband/hw/qib/qib_iba7322.c: In function 'qib_7322_txchk_change':
      include/asm-generic/bitops/non-atomic.h:105:35: error: 'shadow' may be used uninitialized in this function [-Werror=maybe-uninitialized]
      
      The code is right, and despite trying hard, I could not come up with a version
      that I liked better than just adding a fake initialization here to shut up the
      warning.
      
      Fixes: f931551b ("IB/qib: Add new qib driver for QLogic PCIe InfiniBand adapters")
      Signed-off-by: default avatarArnd Bergmann <arnd@arndb.de>
      Acked-by: default avatarIra Weiny <ira.weiny@intel.com>
      Signed-off-by: default avatarDoug Ledford <dledford@redhat.com>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      48b0c712
    • Nathan Sullivan's avatar
      net: phy: handle state correctly in phy_stop_machine · b7538f41
      Nathan Sullivan authored
      commit 49d52e81 upstream.
      
      If the PHY is halted on stop, then do not set the state to PHY_UP.  This
      ensures the phy will be restarted later in phy_start when the machine is
      started again.
      
      Fixes: 00db8189 ("This patch adds a PHY Abstraction Layer to the Linux Kernel, enabling ethernet drivers to remain as ignorant as is reasonable of the connected PHY's design and operation details.")
      Signed-off-by: default avatarNathan Sullivan <nathan.sullivan@ni.com>
      Signed-off-by: default avatarBrad Mouring <brad.mouring@ni.com>
      Acked-by: default avatarXander Huff <xander.huff@ni.com>
      Acked-by: default avatarKyle Roeschley <kyle.roeschley@ni.com>
      Reviewed-by: default avatarFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      b7538f41
    • Takashi Iwai's avatar
      ALSA: seq: Fix race during FIFO resize · e91325f2
      Takashi Iwai authored
      commit 2d7d5400 upstream.
      
      When a new event is queued while processing to resize the FIFO in
      snd_seq_fifo_clear(), it may lead to a use-after-free, as the old pool
      that is being queued gets removed.  For avoiding this race, we need to
      close the pool to be deleted and sync its usage before actually
      deleting it.
      
      The issue was spotted by syzkaller.
      Reported-by: default avatarDmitry Vyukov <dvyukov@google.com>
      Signed-off-by: default avatarTakashi Iwai <tiwai@suse.de>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      e91325f2
    • Arnd Bergmann's avatar
      hwmon: (asus_atk0110) fix uninitialized data access · b615ac75
      Arnd Bergmann authored
      commit a2125d02 upstream.
      
      The latest gcc-7 snapshot adds a warning to point out that when
      atk_read_value_old or atk_read_value_new fails, we copy
      uninitialized data into sensor->cached_value:
      
      drivers/hwmon/asus_atk0110.c: In function 'atk_input_show':
      drivers/hwmon/asus_atk0110.c:651:26: error: 'value' may be used uninitialized in this function [-Werror=maybe-uninitialized]
      
      Adding an error check avoids this. All versions of the driver
      are affected.
      
      Fixes: 2c03d07a ("hwmon: Add Asus ATK0110 support")
      Signed-off-by: default avatarArnd Bergmann <arnd@arndb.de>
      Reviewed-by: default avatarLuca Tettamanti <kronos.it@gmail.com>
      Signed-off-by: default avatarGuenter Roeck <linux@roeck-us.net>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      b615ac75
    • David Hildenbrand's avatar
      KVM: kvm_io_bus_unregister_dev() should never fail · 33588110
      David Hildenbrand authored
      commit 90db1043 upstream.
      
      No caller currently checks the return value of
      kvm_io_bus_unregister_dev(). This is evil, as all callers silently go on
      freeing their device. A stale reference will remain in the io_bus,
      getting at least used again, when the iobus gets teared down on
      kvm_destroy_vm() - leading to use after free errors.
      
      There is nothing the callers could do, except retrying over and over
      again.
      
      So let's simply remove the bus altogether, print an error and make
      sure no one can access this broken bus again (returning -ENOMEM on any
      attempt to access it).
      
      Fixes: e93f8a0f ("KVM: convert io_bus to SRCU")
      Reported-by: default avatarDmitry Vyukov <dvyukov@google.com>
      Reviewed-by: default avatarCornelia Huck <cornelia.huck@de.ibm.com>
      Signed-off-by: default avatarDavid Hildenbrand <david@redhat.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      [bwh: Backported to 3.16:
       - Drop changes to kvm_io_bus_get_dev()
       - Adjust context]
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      33588110
    • Peter Xu's avatar
      KVM: x86: clear bus pointer when destroyed · 937a0196
      Peter Xu authored
      commit df630b8c upstream.
      
      When releasing the bus, let's clear the bus pointers to mark it out. If
      any further device unregister happens on this bus, we know that we're
      done if we found the bus being released already.
      Signed-off-by: default avatarPeter Xu <peterx@redhat.com>
      Signed-off-by: default avatarRadim Krčmář <rkrcmar@redhat.com>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      937a0196
    • Ankur Arora's avatar
      xen/acpi: upload PM state from init-domain to Xen · 0f98d587
      Ankur Arora authored
      commit 1914f0cd upstream.
      
      This was broken in commit cd979883 ("xen/acpi-processor:
      fix enabling interrupts on syscore_resume"). do_suspend (from
      xen/manage.c) and thus xen_resume_notifier never get called on
      the initial-domain at resume (it is if running as guest.)
      
      The rationale for the breaking change was that upload_pm_data()
      potentially does blocking work in syscore_resume(). This patch
      addresses the original issue by scheduling upload_pm_data() to
      execute in workqueue context.
      
      Cc: Stanislaw Gruszka <sgruszka@redhat.com>
      Based-on-patch-by: default avatarKonrad Wilk <konrad.wilk@oracle.com>
      Reviewed-by: default avatarKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Reviewed-by: default avatarStanislaw Gruszka <sgruszka@redhat.com>
      Signed-off-by: default avatarAnkur Arora <ankur.a.arora@oracle.com>
      Signed-off-by: default avatarBoris Ostrovsky <boris.ostrovsky@oracle.com>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      0f98d587
    • Ilya Dryomov's avatar
      libceph: force GFP_NOIO for socket allocations · 5ff2d2e6
      Ilya Dryomov authored
      commit 633ee407 upstream.
      
      sock_alloc_inode() allocates socket+inode and socket_wq with
      GFP_KERNEL, which is not allowed on the writeback path:
      
          Workqueue: ceph-msgr con_work [libceph]
          ffff8810871cb018 0000000000000046 0000000000000000 ffff881085d40000
          0000000000012b00 ffff881025cad428 ffff8810871cbfd8 0000000000012b00
          ffff880102fc1000 ffff881085d40000 ffff8810871cb038 ffff8810871cb148
          Call Trace:
          [<ffffffff816dd629>] schedule+0x29/0x70
          [<ffffffff816e066d>] schedule_timeout+0x1bd/0x200
          [<ffffffff81093ffc>] ? ttwu_do_wakeup+0x2c/0x120
          [<ffffffff81094266>] ? ttwu_do_activate.constprop.135+0x66/0x70
          [<ffffffff816deb5f>] wait_for_completion+0xbf/0x180
          [<ffffffff81097cd0>] ? try_to_wake_up+0x390/0x390
          [<ffffffff81086335>] flush_work+0x165/0x250
          [<ffffffff81082940>] ? worker_detach_from_pool+0xd0/0xd0
          [<ffffffffa03b65b1>] xlog_cil_force_lsn+0x81/0x200 [xfs]
          [<ffffffff816d6b42>] ? __slab_free+0xee/0x234
          [<ffffffffa03b4b1d>] _xfs_log_force_lsn+0x4d/0x2c0 [xfs]
          [<ffffffff811adc1e>] ? lookup_page_cgroup_used+0xe/0x30
          [<ffffffffa039a723>] ? xfs_reclaim_inode+0xa3/0x330 [xfs]
          [<ffffffffa03b4dcf>] xfs_log_force_lsn+0x3f/0xf0 [xfs]
          [<ffffffffa039a723>] ? xfs_reclaim_inode+0xa3/0x330 [xfs]
          [<ffffffffa03a62c6>] xfs_iunpin_wait+0xc6/0x1a0 [xfs]
          [<ffffffff810aa250>] ? wake_atomic_t_function+0x40/0x40
          [<ffffffffa039a723>] xfs_reclaim_inode+0xa3/0x330 [xfs]
          [<ffffffffa039ac07>] xfs_reclaim_inodes_ag+0x257/0x3d0 [xfs]
          [<ffffffffa039bb13>] xfs_reclaim_inodes_nr+0x33/0x40 [xfs]
          [<ffffffffa03ab745>] xfs_fs_free_cached_objects+0x15/0x20 [xfs]
          [<ffffffff811c0c18>] super_cache_scan+0x178/0x180
          [<ffffffff8115912e>] shrink_slab_node+0x14e/0x340
          [<ffffffff811afc3b>] ? mem_cgroup_iter+0x16b/0x450
          [<ffffffff8115af70>] shrink_slab+0x100/0x140
          [<ffffffff8115e425>] do_try_to_free_pages+0x335/0x490
          [<ffffffff8115e7f9>] try_to_free_pages+0xb9/0x1f0
          [<ffffffff816d56e4>] ? __alloc_pages_direct_compact+0x69/0x1be
          [<ffffffff81150cba>] __alloc_pages_nodemask+0x69a/0xb40
          [<ffffffff8119743e>] alloc_pages_current+0x9e/0x110
          [<ffffffff811a0ac5>] new_slab+0x2c5/0x390
          [<ffffffff816d71c4>] __slab_alloc+0x33b/0x459
          [<ffffffff815b906d>] ? sock_alloc_inode+0x2d/0xd0
          [<ffffffff8164bda1>] ? inet_sendmsg+0x71/0xc0
          [<ffffffff815b906d>] ? sock_alloc_inode+0x2d/0xd0
          [<ffffffff811a21f2>] kmem_cache_alloc+0x1a2/0x1b0
          [<ffffffff815b906d>] sock_alloc_inode+0x2d/0xd0
          [<ffffffff811d8566>] alloc_inode+0x26/0xa0
          [<ffffffff811da04a>] new_inode_pseudo+0x1a/0x70
          [<ffffffff815b933e>] sock_alloc+0x1e/0x80
          [<ffffffff815ba855>] __sock_create+0x95/0x220
          [<ffffffff815baa04>] sock_create_kern+0x24/0x30
          [<ffffffffa04794d9>] con_work+0xef9/0x2050 [libceph]
          [<ffffffffa04aa9ec>] ? rbd_img_request_submit+0x4c/0x60 [rbd]
          [<ffffffff81084c19>] process_one_work+0x159/0x4f0
          [<ffffffff8108561b>] worker_thread+0x11b/0x530
          [<ffffffff81085500>] ? create_worker+0x1d0/0x1d0
          [<ffffffff8108b6f9>] kthread+0xc9/0xe0
          [<ffffffff8108b630>] ? flush_kthread_worker+0x90/0x90
          [<ffffffff816e1b98>] ret_from_fork+0x58/0x90
          [<ffffffff8108b630>] ? flush_kthread_worker+0x90/0x90
      
      Use memalloc_noio_{save,restore}() to temporarily force GFP_NOIO here.
      
      Link: http://tracker.ceph.com/issues/19309Reported-by: default avatarSergey Jerusalimov <wintchester@gmail.com>
      Signed-off-by: default avatarIlya Dryomov <idryomov@gmail.com>
      Reviewed-by: default avatarJeff Layton <jlayton@redhat.com>
      [bwh: Backported to 3.16:
       - memalloc_noio_{save,restore}() are declared in <linux/sched.h>
       - Adjust context]
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      5ff2d2e6