1. 10 Oct, 2014 40 commits
    • ring-buffer: Fix infinite spin in reading buffer · 739ff0ec
      Steven Rostedt (Red Hat) authored
      Commit 651e22f2 "ring-buffer: Always reset iterator to reader page"
      fixed one bug but in the process caused another one. The reset is to
      update the header page, but that fix also changed the way the cached
      reads were updated. The cache reads are used to test if an iterator
      needs to be updated or not.
      
      A ring buffer iterator, when created, disables writes to the ring buffer
      but does not stop other readers or consuming reads from happening.
      Although all readers are synchronized via a lock, they are only
      synchronized when in the ring buffer functions. Those functions may
      be called by any number of readers. The iterator continues down when
      it's not interrupted by a consuming reader. If a consuming read
      occurs, the iterator starts from the beginning of the buffer.
      
      The way the iterator sees that a consuming read has happened since
      its last read is by checking the reader "cache". The cache holds the
      last counts of the read and the reader page itself.
      
      Commit 651e22f2 changed what was saved by the cache_read when
      the rb_iter_reset() occurred, making the iterator never match the cache.
      Then if the iterator calls rb_iter_reset(), it will go into an
      infinite loop by checking if the cache doesn't match, doing the reset
      and retrying, just to see that the cache still doesn't match! That
      should never happen, as the reset is supposed to set the cache to the
      current value and there are locks that keep a consuming reader from
      having access to the data.
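
A minimal sketch of the cache check described above, not the kernel's code: the struct and function names (`cpu_buffer`, `iter`, `iter_reset`, `iter_is_stale`, `iter_sync`) are hypothetical stand-ins. It only illustrates why the reset must store exactly the values that the staleness check later compares against.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the per-CPU buffer and iterator. */
struct cpu_buffer {
	unsigned long read;        /* count of events consumed so far */
	const void *reader_page;   /* page currently owned by the reader */
};

struct iter {
	struct cpu_buffer *buf;
	unsigned long cache_read;        /* buf->read as of the last reset */
	const void *cache_reader_page;   /* buf->reader_page as of the last reset */
};

/* Record the current reader state so later checks can detect a change. */
static void iter_reset(struct iter *it)
{
	it->cache_read = it->buf->read;
	it->cache_reader_page = it->buf->reader_page;
}

/* True when a consuming read has happened since the last reset. */
static bool iter_is_stale(const struct iter *it)
{
	return it->cache_read != it->buf->read ||
	       it->cache_reader_page != it->buf->reader_page;
}

/* Reset until the cache matches; returns how many resets were needed. */
static int iter_sync(struct iter *it)
{
	int resets = 0;

	while (iter_is_stale(it)) {
		iter_reset(it);
		resets++;
	}
	return resets;
}
```

With a reset that stores the compared values, `iter_sync()` needs at most one pass while the buffer is not changing. If the reset stored anything else, the loop above would never terminate, which is the failure mode the commit message describes.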
      
      Fixes: 651e22f2 "ring-buffer: Always reset iterator to reader page"
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

      (cherry picked from commit 24607f11)
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
    • perf: fix perf bug in fork() · fda98b27
      Peter Zijlstra authored
      Oleg noticed that a cleanup by Sylvain actually uncovered a bug; by
      calling perf_event_free_task() when failing sched_fork() we will not yet
      have done the memset() on ->perf_event_ctxp[] and will therefore try and
      'free' the inherited contexts, which are still in use by the parent
      process.  This is bad.
      Suggested-by: Oleg Nesterov <oleg@redhat.com>
      Reported-by: Oleg Nesterov <oleg@redhat.com>
      Reported-by: Sylvain 'ythier' Hitier <sylvain.hitier@gmail.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

      (cherry picked from commit 6c72e350)
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
    • udf: Avoid infinite loop when processing indirect ICBs · cbcecd10
      Jan Kara authored
      We did not implement any bound on the number of indirect ICBs we follow when
      loading an inode. Thus a corrupted medium could cause the kernel to go into an
      infinite loop, possibly causing a stack overflow.

      Fix the possible stack overflow by removing recursion from
      __udf_read_inode() and limit the number of indirect ICBs we follow to avoid
      infinite loops.
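
The general technique (an iterative walk with a hop limit instead of unbounded recursion) can be sketched as follows. This is an illustration under assumptions, not the UDF code: `struct icb`, `resolve_icb()` and the bound `MAX_INDIRECT_HOPS` are hypothetical, and the table stands in for on-disk data.

```c
#include <stddef.h>

#define MAX_INDIRECT_HOPS 16   /* hypothetical bound, not the kernel's value */

/* Hypothetical record: next < 0 means "this is the final entry". */
struct icb {
	int next;
};

/*
 * Follow indirect entries iteratively, starting at 'start'.
 * Returns the index of the final entry, or -1 if the chain is longer than
 * MAX_INDIRECT_HOPS (for example because corrupted data forms a cycle)
 * or points outside the table.
 */
static int resolve_icb(const struct icb *table, size_t count, int start)
{
	int cur = start;
	int hops;

	for (hops = 0; hops <= MAX_INDIRECT_HOPS; hops++) {
		if (cur < 0 || (size_t)cur >= count)
			return -1;
		if (table[cur].next < 0)
			return cur;
		cur = table[cur].next;
	}
	return -1;
}
```

Because the walk is a loop, stack usage stays constant no matter how long the chain is, and the hop limit turns a cyclic chain into an error return rather than a hang.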
      Signed-off-by: Jan Kara <jack@suse.cz>

      (cherry picked from commit c03aa9f6)
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
    • ipvs: fix ipv6 hook registration for local replies · 82b32079
      Julian Anastasov authored
      commit fc604767
      ("ipvs: changes for local real server") from 2.6.37
      introduced DNAT support to local real server but the
      IPv6 LOCAL_OUT handler ip_vs_local_reply6() is
      registered incorrectly as IPv4 hook causing any outgoing
      IPv4 traffic to be dropped depending on the IP header values.
      
      Chris tracked down the problem to CONFIG_IP_VS_IPV6=y
      Bug report: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1349768
      Reported-by: Chris J Arges <chris.j.arges@canonical.com>
      Tested-by: Chris J Arges <chris.j.arges@canonical.com>
      Signed-off-by: Julian Anastasov <ja@ssi.bg>
      Signed-off-by: Simon Horman <horms@verge.net.au>

      (cherry picked from commit eb90b0c7)
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
    • ipvs: Maintain all DSCP and ECN bits for ipv6 tun forwarding · ae9f5ce2
      Alex Gartrell authored
      Previously, only the four high bits of the tclass were maintained in the
      ipv6 case.  This matches the behavior of ipv4, though whether or not we
      should reflect ECN bits may be up for debate.
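
For illustration, the 8-bit IPv6 traffic class (6 DSCP bits followed by 2 ECN bits) straddles the first two bytes of the header: the low nibble of byte 0 holds its high four bits and the high nibble of byte 1 holds its low four bits. The helpers below are a hypothetical sketch of the difference between keeping all eight bits and keeping only the high four; they are not the ipvs code.

```c
#include <stdint.h>

/* Full 8-bit traffic class from the first two bytes of an IPv6 header. */
static uint8_t ipv6_tclass(const uint8_t hdr[2])
{
	return (uint8_t)(((hdr[0] & 0x0f) << 4) | (hdr[1] >> 4));
}

/* Only the high four bits of the traffic class; the low four are dropped. */
static uint8_t ipv6_tclass_high4(const uint8_t hdr[2])
{
	return (uint8_t)((hdr[0] & 0x0f) << 4);
}

/* DSCP is the upper six bits of the traffic class. */
static uint8_t tclass_dscp(uint8_t tclass)
{
	return (uint8_t)(tclass >> 2);
}

/* ECN is the lower two bits of the traffic class. */
static uint8_t tclass_ecn(uint8_t tclass)
{
	return (uint8_t)(tclass & 0x03);
}
```

For a header starting with bytes 0x6b 0x90, the full traffic class is 0xb9 (DSCP 0x2e, ECN 1), whereas keeping only the high four bits yields 0xb0 and loses both the low DSCP bits and the ECN bits.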
      Signed-off-by: Alex Gartrell <agartrell@fb.com>
      Acked-by: Julian Anastasov <ja@ssi.bg>
      Signed-off-by: Simon Horman <horms@verge.net.au>

      (cherry picked from commit 76f084bc)
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
    • md/raid1: initialise start_next_window for READ case to avoid hang · 396258de
      NeilBrown authored
      r1_bio->start_next_window is not initialised in the READ
      case, so allow_barrier may incorrectly decrement
         conf->current_window_requests
      which can cause raise_barrier() to block forever.
      
      Fixes: 79ef3a8a
      cc: stable@vger.kernel.org (v3.13+)
      Reported-by: Brassow Jonathan <jbrassow@redhat.com>
      Signed-off-by: NeilBrown <neilb@suse.de>

      (cherry picked from commit f0cc9a05)
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
    • md/raid1: fix_read_error should act on all non-faulty devices. · 0d0178c9
      NeilBrown authored
      If a device is being recovered it is not InSync and is not Faulty.
      
      If a read error is experienced on that device, fix_read_error()
      will be called, but it ignores non-InSync devices.  So it will
      neither fix the error nor fail the device.
      
      It is incorrect that fix_read_error() ignores non-InSync devices.
      It should only ignore Faulty devices.  So fix it.
      
      This became a bug when we allowed reading from a device that was being
      recovered.  It is suitable for any subsequent -stable kernel.
      
      Fixes: da8840a7
      Cc: stable@vger.kernel.org (v3.5+)
      Reported-by: Alexander Lyakas <alex.bolshoy@gmail.com>
      Tested-by: Alexander Lyakas <alex.bolshoy@gmail.com>
      Signed-off-by: NeilBrown <neilb@suse.de>

      (cherry picked from commit b8cb6b4c)
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
    • md/raid1: count resync requests in nr_pending. · 7583c35b
      NeilBrown authored
      Both normal IO and resync IO can be retried with reschedule_retry()
      and so be counted into ->nr_queued, but only normal IO gets counted in
      ->nr_pending.
      
      Before the recent improvement to RAID1 resync there could only
      possibly have been one or the other on the queue.  When handling a
      read failure it could only be normal IO.  So when handle_read_error()
      called freeze_array() the fact that freeze_array only compares
      ->nr_queued against ->nr_pending was safe.
      
      But now that these two types can interleave, we can have both normal
      and resync IO requests queued, so we need to count them both in
      nr_pending.
      
      This error can lead to freeze_array() hanging if there is a read
      error, so it is suitable for -stable.
      
      Fixes: 79ef3a8a
      cc: stable@vger.kernel.org (v3.13+)
      Reported-by: Brassow Jonathan <jbrassow@redhat.com>
      Signed-off-by: NeilBrown <neilb@suse.de>

      (cherry picked from commit 34e97f17)
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
    • md/raid1: be more cautious where we read-balance during resync. · d15bcf73
      NeilBrown authored
      commit c6d119cf upstream.
      
      commit 79ef3a8a made
      it possible for reads to happen concurrently with resync.
      This means that we need to be more careful where read_balancing
      is allowed during resync - we can no longer be sure that any
      resync that has already started will definitely finish.
      
      So keep read_balancing to before recovery_cp, which is conservative
      but safe.
      
      This bug makes it possible to read from a device that doesn't
      have up-to-date data, so it can cause data corruption.
      So it is suitable for any kernel since 3.11.
      
      Fixes: 79ef3a8a
      Signed-off-by: NeilBrown <neilb@suse.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

      (cherry picked from commit dd777df4)

      (cherry picked from commit HEAD)
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
    • media: cx18: fix kernel oops with tda8290 tuner · a57cd698
      Hans Verkuil authored
      commit 6a03dc92 upstream.
      
      This was caused by an uninitialized setup.config field.
      
      Based on a suggestion from Devin Heitmueller.
      Signed-off-by: Hans Verkuil <hans.verkuil@cisco.com>
      Thanks-to: Devin Heitmueller <dheitmueller@kernellabs.com>
      Reported-by: Scott Robinson <scott.robinson55@gmail.com>
      Tested-by: Hans Verkuil <hans.verkuil@cisco.com>
      Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

      (cherry picked from commit 94146fe3)

      (cherry picked from commit HEAD)
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
    • media: af9033: feed clock to RF tuner · fa19e70e
      Antti Palosaari authored
      commit 9dc0f3fe upstream.
      
      The IT9135 RF tuner clock comes from the demodulator. We need to enable it
      early in demod init, before any tuner I/O. Currently it is enabled
      by the tuner driver itself, but that is too late and performance is
      reduced because some registers are not updated correctly. The clock is
      disabled automatically when the demod is put to sleep.
      
      Cc: Bimow Chen <Bimow.Chen@ite.com.tw>
      Signed-off-by: Antti Palosaari <crope@iki.fi>
      Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

      (cherry picked from commit 994e79d2)

      (cherry picked from commit HEAD)
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
    • media: af9035: new IDs: add support for PCTV 78e and PCTV 79e · c469113b
      Malcolm Priestley authored
      commit a04646c0 upstream.
      
      Add the following IDs for these it9135 devices:
      USB_PID_PCTV_78E (0x025a) for PCTV 78e
      USB_PID_PCTV_79E (0x0262) for PCTV 79e
      Signed-off-by: Malcolm Priestley <tvboxspy@gmail.com>
      Cc: Antti Palosaari <crope@iki.fi>
      Signed-off-by: Antti Palosaari <crope@iki.fi>
      Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

      (cherry picked from commit 5e80de30)

      (cherry picked from commit HEAD)
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
    • Fix nasty 32-bit overflow bug in buffer i/o code. · ecbeb346
      Anton Altaparmakov authored
      On 32-bit architectures, the legacy buffer_head functions are not always
      handling the sector number with the proper 64-bit types, and will thus
      fail on 4TB+ disks.
      
      Any code that uses __getblk() (and thus bread(), breadahead(),
      sb_bread(), sb_breadahead(), sb_getblk()), and calls it using a 64-bit
      block on a 32-bit arch (where "long" is 32-bit) causes an infinite loop
      in __getblk_slow() with an infinite stream of errors logged to dmesg
      like this:
      
        __find_get_block_slow() failed. block=6740375944, b_blocknr=2445408648
        b_state=0x00000020, b_size=512
        device sda1 blocksize: 512
      
      Note how in hex block is 0x191C1F988 and b_blocknr is 0x91C1F988 i.e. the
      top 32-bits are missing (in this case the 0x1 at the top).
      
      This is because grow_dev_page() is broken and has a 32-bit overflow: it
      computes the block number by left-shifting the page index value (a
      pgoff_t, which is just 32 bits on 32-bit architectures).  The top
      bits get lost because the pgoff_t is not cast to sector_t (64-bit)
      before the shift.
      
      This patch fixes this issue by type casting "index" to sector_t before
      doing the left shift.
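
The overflow can be reproduced in user space with fixed-width types. This is a sketch, not the kernel code: `pgoff32_t` and `sector64_t` are stand-ins for a 32-bit pgoff_t and a 64-bit sector_t, the function names are hypothetical, and the shift count of 3 assumes 512-byte blocks in 4096-byte pages (8 blocks per page).

```c
#include <stdint.h>

/* Stand-ins for pgoff_t on a 32-bit arch and for a 64-bit sector_t. */
typedef uint32_t pgoff32_t;
typedef uint64_t sector64_t;

/* Buggy: the shift happens in 32 bits, so the top bits are lost
 * before the result is widened to 64 bits. */
static sector64_t first_block_buggy(pgoff32_t index, unsigned int sizebits)
{
	return (pgoff32_t)(index << sizebits);
}

/* Fixed: widen to 64 bits first, then shift. */
static sector64_t first_block_fixed(pgoff32_t index, unsigned int sizebits)
{
	return (sector64_t)index << sizebits;
}
```

Using the block number from the log above, 6740375944 (0x191C1F988), the page index is 6740375944 >> 3 = 842546993. `first_block_buggy(842546993, 3)` gives 2445408648 (0x91C1F988), matching the b_blocknr in the error message, while `first_block_fixed(842546993, 3)` gives back 6740375944.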
      
      Note this is not a theoretical bug but has been seen in the field on a
      4TiB hard drive with logical sector size 512 bytes.
      
      This patch has been verified to fix the infinite loop problem on 3.17-rc5
      kernel using a 4TB disk image mounted using "-o loop".  Without this patch
      doing a "find /nt" where /nt is an NTFS volume causes the infinite loop
      100% reproducibly whilst with the patch it works fine as expected.
      Signed-off-by: Anton Altaparmakov <aia21@cantab.net>
      Cc: stable@vger.kernel.org
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

      (cherry picked from commit f2d5a944)
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
    • drm/radeon/px: fix module unload · 3fa700f0
      Alex Deucher authored
      Use the new vga_switcheroo_fini_domain_pm_ops function
      to unregister the pm ops.
      
      Based on a patch from:
      Pali Rohár <pali.rohar@gmail.com>
      
      bug:
      https://bugzilla.kernel.org/show_bug.cgi?id=84431
      Reviewed-by: Ben Skeggs <bskeggs@redhat.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Pali Rohár <pali.rohar@gmail.com>
      Cc: Ben Skeggs <bskeggs@redhat.com>

      (cherry picked from commit 2e97140d)
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
    • drm/nouveau/runpm: fix module unload · 3184db94
      Alex Deucher authored
      Use the new vga_switcheroo_fini_domain_pm_ops function
      to unregister the pm ops.
      
      Based on a patch from:
      Pali Rohár <pali.rohar@gmail.com>
      
      bug:
      https://bugzilla.kernel.org/show_bug.cgi?id=84431
      Reviewed-by: Ben Skeggs <bskeggs@redhat.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
      Cc: stable@vger.kernel.org
      Cc: Ben Skeggs <bskeggs@redhat.com>

      (cherry picked from commit 53beaa01)
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
    • arm: armv7: perf: fix armv7 ref-cycles error · 89df6dd6
      Zhiqiang Zhang authored
      The ref-cycles event is specific to Intel cores, but it can still be used on
      the arm architecture with the wrong return value on 3.10 stable. This patch
      fixes the bug and makes it return NOT SUPPORTED explicitly.

      Upstream, this bug has been fixed in another way, which changes more than one
      file and more than 1000 lines. The primary commit is
      6b7658ec.  Besides, we cannot simply
      cherry-pick it.
      Signed-off-by: Zhiqiang Zhang <zhangzhiqiang.zhang@huawei.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Christopher Covington <cov@codeaurora.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

      (cherry picked from commit e7a0374e)

      (cherry picked from commit HEAD)
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
    • perf: Fix a race condition in perf_remove_from_context() · e7bfd6e2
      Cong Wang authored
      We saw a kernel soft lockup in perf_remove_from_context().
      It looks like the `perf` process, when exiting, could not get
      out of the retry loop. Meanwhile, the target process was forking
      a child. So either the target process should execute the smp
      function call to deactivate the event (if it was running) or it should
      do a context switch which deactivates the event.

      It seems we optimize out a context switch in perf_event_context_sched_out(),
      and more importantly, we still test an obsolete task pointer when
      retrying, so no one would actually deactivate that event in this situation.
      Fix it directly by reloading the task pointer in perf_remove_from_context().
      
      This should cure the above soft lockup.
      Signed-off-by: Cong Wang <cwang@twopensource.com>
      Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: Peter Zijlstra <peterz@infradead.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: <stable@vger.kernel.org>
      Link: http://lkml.kernel.org/r/1409696840-843-1-git-send-email-xiyou.wangcong@gmail.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>

      (cherry picked from commit 3577af70)
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
    • IB/core: When marshaling uverbs path, clear unused fields · b281319e
      Matan Barak authored
      When marshaling a user path to the kernel struct ib_sa_path, we need
      to zero smac and dmac and set the vlan id to the "no vlan" value.
      
      Fixes: dd5f03be ("IB/core: Ethernet L2 attributes in verbs/cm structures")
      Reported-by: Aleksey Senin <alekseys@mellanox.com>
      Signed-off-by: Matan Barak <matanb@mellanox.com>
      Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: Roland Dreier <roland@purestorage.com>

      (cherry picked from commit a59c5850)
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
    • alarmtimer: Lock k_itimer during timer callback · a3d590ca
      Richard Larocque authored
      Locks the k_itimer's it_lock member when handling the alarm timer's
      expiry callback.
      
      The regular posix timers defined in posix-timers.c have this lock held
      during timeout processing because their callbacks are routed through
      posix_timer_fn().  The alarm timers follow a different path, so they
      ought to grab the lock somewhere else.
      
      Cc: stable@vger.kernel.org
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Richard Cochran <richardcochran@gmail.com>
      Cc: Prarit Bhargava <prarit@redhat.com>
      Cc: Sharvil Nanavati <sharvil@google.com>
      Signed-off-by: Richard Larocque <rlarocque@google.com>
      Signed-off-by: John Stultz <john.stultz@linaro.org>

      (cherry picked from commit 474e941b)
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
    • alarmtimer: Do not signal SIGEV_NONE timers · da1301a7
      Richard Larocque authored
      Avoids sending a signal to alarm timers created with sigev_notify set to
      SIGEV_NONE by checking for that special case in the timeout callback.
      
      The regular posix timers avoid sending signals to SIGEV_NONE timers by
      not scheduling any callbacks for them in the first place.  Although it
      would be possible to do something similar for alarm timers, it's simpler
      to handle this as a special case in the timeout.
      
      Prior to this patch, the alarm timer would ignore the sigev_notify value
      and try to deliver signals to the process anyway.  Even worse, the
      sanity check for the value of sigev_signo is skipped when SIGEV_NONE was
      specified, so the signal number could be bogus.  If sigev_signo was an
      uninitialized value (as it often would be if SIGEV_NONE is used), then
      it's hard to predict which signal will be sent.
      
      Cc: stable@vger.kernel.org
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Richard Cochran <richardcochran@gmail.com>
      Cc: Prarit Bhargava <prarit@redhat.com>
      Cc: Sharvil Nanavati <sharvil@google.com>
      Signed-off-by: Richard Larocque <rlarocque@google.com>
      Signed-off-by: John Stultz <john.stultz@linaro.org>

      (cherry picked from commit 265b81d2)
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
    • alarmtimer: Return relative times in timer_gettime · c0e52d5e
      Richard Larocque authored
      Returns the time remaining for an alarm timer, rather than the time at
      which it is scheduled to expire.  If the timer has already expired or it
      is not currently scheduled, the it_value's members are set to zero.
      
      This new behavior matches that of the other posix-timers and the POSIX
      specifications.
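
The relative-time behaviour can be sketched as below. This is an illustrative user-space helper, not the alarmtimer code: `struct remaining`, `time_remaining()` and the nanosecond arguments are hypothetical stand-ins for the it_value fields and the kernel's time types.

```c
#include <stdint.h>

/* Hypothetical stand-in for the it_value members (seconds, nanoseconds). */
struct remaining {
	int64_t sec;
	long nsec;
};

/*
 * Return the time left until 'expires_ns', measured from 'now_ns'.
 * A timer that is not armed, or that has already expired, reports zero
 * rather than an absolute expiry time.
 */
static struct remaining time_remaining(int64_t expires_ns, int64_t now_ns,
				       int armed)
{
	struct remaining r = { 0, 0 };
	int64_t left;

	if (!armed || expires_ns <= now_ns)
		return r;

	left = expires_ns - now_ns;
	r.sec = left / 1000000000LL;
	r.nsec = (long)(left % 1000000000LL);
	return r;
}
```

For a timer due at 5.5 s on a clock that currently reads 2 s, the helper reports 3 s and 500000000 ns; once the clock passes the expiry, both members are zero.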
      
      This is a change in user-visible behavior, and may break existing
      applications.  Hopefully, few users rely on the old incorrect behavior.
      
      Cc: stable@vger.kernel.org
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Richard Cochran <richardcochran@gmail.com>
      Cc: Prarit Bhargava <prarit@redhat.com>
      Cc: Sharvil Nanavati <sharvil@google.com>
      Signed-off-by: Richard Larocque <rlarocque@google.com>
      [jstultz: minor style tweak]
      Signed-off-by: John Stultz <john.stultz@linaro.org>

      (cherry picked from commit e86fea76)
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
    • parisc: Only use -mfast-indirect-calls option for 32-bit kernel builds · 181c6190
      John David Anglin authored
      In spite of what the GCC manual says, the -mfast-indirect-calls has
      never been supported in the 64-bit parisc compiler. Indirect calls have
      always been done using function descriptors irrespective of the
      -mfast-indirect-calls option.
      
      Recently, it was noticed that a function descriptor was always requested
      when the -mfast-indirect-calls option was specified. This caused
      problems when the option was used in application code and doesn't make
      any sense because the whole point of the option is to avoid using a
      function descriptor for indirect calls.
      
      Fixing this broke 64-bit kernel builds.
      
      I will fix GCC but for now we need the attached change. This results in
      the same kernel code as before.
      Signed-off-by: John David Anglin <dave.anglin@bell.net>
      Cc: stable@vger.kernel.org  # v3.0+
      Signed-off-by: Helge Deller <deller@gmx.de>

      (cherry picked from commit d26a7730)
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
    • parisc: Implement new LWS CAS supporting 64 bit operations. · f82ac1f7
      Guy Martin authored
      The current LWS cas only works correctly for 32bit. The new LWS allows
      for CAS operations of variable size.
      Signed-off-by: Guy Martin <gmsoft@tuxicoman.be>
      Cc: <stable@vger.kernel.org> # 3.13+
      Signed-off-by: Helge Deller <deller@gmx.de>

      (cherry picked from commit 89206491)
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
    • tty/serial: at91: BUG: disable interrupts when !UART_ENABLE_MS() · 6b2cae48
      Richard Genoud authored
      In set_termios(), interrupts were not disabled if UART_ENABLE_MS() was
      false.
      
      Tested on at91sam9g35.
      Signed-off-by: Richard Genoud <richard.genoud@gmail.com>
      Cc: stable <stable@vger.kernel.org> # >= 3.16
      Reviewed-by: Peter Hurley <peter@hurleysoftware.com>
      Acked-by: Nicolas Ferre <nicolas.ferre@atmel.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

      (cherry picked from commit 35b675b9)
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
    • powerpc: Add smp_mb()s to arch_spin_unlock_wait() · f92547ad
      Michael Ellerman authored
      Similar to the previous commit which described why we need to add a
      barrier to arch_spin_is_locked(), we have a similar problem with
      spin_unlock_wait().
      
      We need a barrier on entry to ensure any spinlock we have previously
      taken is visibly locked prior to the load of lock->slock.
      
      It's also not clear if spin_unlock_wait() is intended to have ACQUIRE
      semantics. For now be conservative and add a barrier on exit to give it
      ACQUIRE semantics.
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

      (cherry picked from commit 78e05b14)
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
    • powerpc/perf: Fix ABIv2 kernel backtraces · 9b413bea
      Anton Blanchard authored
      ABIv2 kernels are failing to backtrace through the kernel. An example:
      
      39.30%  readseek2_proce  [kernel.kallsyms]    [k] find_get_entry
                  |
                  --- find_get_entry
                     __GI___libc_read
      
      The problem is in valid_next_sp() where we check that the new stack
      pointer is at least STACK_FRAME_OVERHEAD below the previous one.
      
      ABIv1 has a minimum stack frame size of 112 bytes consisting of 48 bytes
      and 64 bytes of parameter save area. ABIv2 changes that to 32 bytes
      with no parameter save area.

      STACK_FRAME_OVERHEAD is in theory the minimum stack frame size,
      but we have over 240 uses of it, some of which assume that it includes
      space for the parameter area.

      We need to work through all our stack defines and rationalise them
      but let's fix perf now by creating STACK_FRAME_MIN_SIZE and using it
      in valid_next_sp(). This fixes the issue:
      
      30.64%  readseek2_proce  [kernel.kallsyms]    [k] find_get_entry
                  |
                  --- find_get_entry
                     pagecache_get_page
                     generic_file_read_iter
                     new_sync_read
                     vfs_read
                     sys_read
                     syscall_exit
                     __GI___libc_read
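
To illustrate why the threshold matters, here is a simplified sketch of a frame-distance check. It is not the kernel's valid_next_sp(): the function name, the macro names and the 16-byte alignment test are assumptions, and only the sizes 112 and 32 come from the description above.

```c
#include <stdbool.h>

#define FRAME_MIN_ABIV1 112UL  /* 48 bytes + 64 bytes of parameter save area */
#define FRAME_MIN_ABIV2  32UL  /* no parameter save area */

/*
 * Accept 'sp' as the next frame when walking up from 'prev_sp' only if it
 * is aligned (assumed 16 bytes here) and at least 'min_frame' bytes away.
 */
static bool next_sp_plausible(unsigned long sp, unsigned long prev_sp,
			      unsigned long min_frame)
{
	if (sp & 0xf)
		return false;
	return sp >= prev_sp + min_frame;
}
```

A legitimate 32-byte ABIv2 frame (for example prev_sp 0x1000, sp 0x1020) is rejected when the check uses the 112-byte ABIv1 minimum and accepted with the 32-byte minimum, which is consistent with the truncated backtrace shown first and the complete one shown after the fix.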
      
      Cc: stable@vger.kernel.org # 3.16+
      Reported-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: Anton Blanchard <anton@samba.org>

      (cherry picked from commit 85101af1)
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
    • sched: Fix unreleased llc_shared_mask bit during CPU hotplug · 87a5b1a2
      Wanpeng Li authored
      The following bug can be triggered by hot adding and removing a large number of
      xen domain0's vcpus repeatedly:
      
      	BUG: unable to handle kernel NULL pointer dereference at 0000000000000004 IP: [..] find_busiest_group
      	PGD 5a9d5067 PUD 13067 PMD 0
      	Oops: 0000 [#3] SMP
      	[...]
      	Call Trace:
      	load_balance
      	? _raw_spin_unlock_irqrestore
      	idle_balance
      	__schedule
      	schedule
      	schedule_timeout
      	? lock_timer_base
      	schedule_timeout_uninterruptible
      	msleep
      	lock_device_hotplug_sysfs
      	online_store
      	dev_attr_store
      	sysfs_write_file
      	vfs_write
      	SyS_write
      	system_call_fastpath
      
      Last level cache shared mask is built during CPU up and the
      build_sched_domain() routine takes advantage of it to setup
      the sched domain CPU topology.
      
      However, llc_shared_mask is not released during CPU disable,
      which leads to an invalid sched domain CPU topology.

      This patch fixes it by releasing the llc_shared_mask correctly
      during CPU disable.
      
      Yasuaki also reported that this can happen on real hardware:
      
        https://lkml.org/lkml/2014/7/22/1018
      
      His case is here:
      
      	==
      	Here is an example on my system.
      	My system has 4 sockets and each socket has 15 cores and HT is
	enabled. In this case, each core of sockets is numbered as
      	follows:
      
      		 | CPU#
      	Socket#0 | 0-14 , 60-74
      	Socket#1 | 15-29, 75-89
      	Socket#2 | 30-44, 90-104
      	Socket#3 | 45-59, 105-119
      
      	Then llc_shared_mask of CPU#30 has 0x3fff80000001fffc0000000.
      
      	It means that last level cache of Socket#2 is shared with
      	CPU#30-44 and 90-104.
      
      	When hot-removing socket#2 and #3, each core of sockets is
      	numbered as follows:
      
      		 | CPU#
      	Socket#0 | 0-14 , 60-74
      	Socket#1 | 15-29, 75-89
      
      	But llc_shared_mask is not cleared. So llc_shared_mask of CPU#30
      	remains having 0x3fff80000001fffc0000000.
      
      	After that, when hot-adding socket#2 and #3, each core of
      	sockets is numbered as follows:
      
      		 | CPU#
      	Socket#0 | 0-14 , 60-74
      	Socket#1 | 15-29, 75-89
      	Socket#2 | 30-59
      	Socket#3 | 90-119
      
      	Then llc_shared_mask of CPU#30 becomes
      	0x3fff8000fffffffc0000000. It means that last level cache of
      	Socket#2 is shared with CPU#30-59 and 90-104. So the mask has
      	the wrong value.
      Signed-off-by: Wanpeng Li <wanpeng.li@linux.intel.com>
      Tested-by: Linn Crosetto <linn@hp.com>
      Reviewed-by: Borislav Petkov <bp@suse.de>
      Reviewed-by: Toshi Kani <toshi.kani@hp.com>
      Reviewed-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Cc: <stable@vger.kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Prarit Bhargava <prarit@redhat.com>
      Cc: Steven Rostedt <srostedt@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/1411547885-48165-1-git-send-email-wanpeng.li@linux.intel.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>

      (cherry picked from commit 03bd4e1f)
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
    • ocfs2/dlm: do not get resource spinlock if lockres is new · 8e663ddf
      Joseph Qi authored
      There is a deadlock case which was reported by Guozhonghua:
        https://oss.oracle.com/pipermail/ocfs2-devel/2014-September/010079.html
      
      This case is caused by &res->spinlock and &dlm->master_lock
      misordering in different threads.
      
      It was introduced by commit 8d400b81 ("ocfs2/dlm: Clean up refmap
      helpers").  Since lockres is new, it does not require the
      &res->spinlock.  So remove it.
      
      Fixes: 8d400b81 ("ocfs2/dlm: Clean up refmap helpers")
      Signed-off-by: Joseph Qi <joseph.qi@huawei.com>
      Reviewed-by: joyce.xue <xuejiufei@huawei.com>
      Reported-by: Guozhonghua <guozhonghua@h3c.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      
      (cherry picked from commit 5760a97c)
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
      8e663ddf
    • Andreas Rohner's avatar
      nilfs2: fix data loss with mmap() · 66c262aa
      Andreas Rohner authored
      This bug leads to reproducible silent data loss, despite the use of
      msync(), sync() and a clean unmount of the file system.  It is easily
      reproducible with the following script:
      
        ----------------[BEGIN SCRIPT]--------------------
        mkfs.nilfs2 -f /dev/sdb
        mount /dev/sdb /mnt
      
        dd if=/dev/zero bs=1M count=30 of=/mnt/testfile
      
        umount /mnt
        mount /dev/sdb /mnt
        CHECKSUM_BEFORE="$(md5sum /mnt/testfile)"
      
        /root/mmaptest/mmaptest /mnt/testfile 30 10 5
      
        sync
        CHECKSUM_AFTER="$(md5sum /mnt/testfile)"
        umount /mnt
        mount /dev/sdb /mnt
        CHECKSUM_AFTER_REMOUNT="$(md5sum /mnt/testfile)"
        umount /mnt
      
        echo "BEFORE MMAP:\t$CHECKSUM_BEFORE"
        echo "AFTER MMAP:\t$CHECKSUM_AFTER"
        echo "AFTER REMOUNT:\t$CHECKSUM_AFTER_REMOUNT"
        ----------------[END SCRIPT]--------------------
      
      The mmaptest tool looks something like this (very simplified, with
      error checking removed):
      
        ----------------[BEGIN mmaptest]--------------------
        data = mmap(NULL, file_size - file_offset, PROT_READ | PROT_WRITE,
                    MAP_SHARED, fd, file_offset);
      
        for (i = 0; i < write_count; ++i) {
              memcpy(data + i * 4096, buf, sizeof(buf));
              msync(data, file_size - file_offset, MS_SYNC);
        }
        ----------------[END mmaptest]--------------------
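
      For readers who want something compilable, the following is a
      minimal sketch of the same write pattern and not the author's
      mmaptest tool. The helper names, the fixed 4096-byte page size and
      the fill-byte parameter are assumptions made for illustration; the
      meaning of the three numeric arguments passed to mmaptest in the
      script is not given here and is not reproduced.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

#define TEST_PAGE 4096	/* assumed page size, as in the snippet above */

/* Create a zero-filled file of `pages` pages from a mkstemp() template. */
static int make_test_file(char *tmpl, size_t pages)
{
	int fd = mkstemp(tmpl);

	if (fd < 0)
		return -1;
	if (ftruncate(fd, (off_t)(pages * TEST_PAGE)) < 0) {
		close(fd);
		return -1;
	}
	return close(fd);
}

/*
 * Write `write_count` pages of `fill` through a shared mapping of `path`,
 * calling msync() after each page.  The file must already be at least
 * write_count * TEST_PAGE bytes long.  Returns 0 on success, -1 on error.
 */
static int mmap_fill_pages(const char *path, size_t write_count,
			   unsigned char fill)
{
	unsigned char buf[TEST_PAGE];
	size_t len = write_count * TEST_PAGE;
	unsigned char *data;
	int fd, ret = -1;

	assert(write_count > 0);
	memset(buf, fill, sizeof(buf));

	fd = open(path, O_RDWR);
	if (fd < 0)
		return -1;

	data = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (data == MAP_FAILED)
		goto out_close;

	for (size_t i = 0; i < write_count; i++) {
		memcpy(data + i * TEST_PAGE, buf, sizeof(buf));
		if (msync(data, len, MS_SYNC) < 0)
			goto out_unmap;
	}
	ret = 0;

out_unmap:
	munmap(data, len);
out_close:
	close(fd);
	return ret;
}

/* Count how many of the first `pages` pages, read back with read(), hold only `fill`. */
static long count_filled_pages(const char *path, size_t pages,
			       unsigned char fill)
{
	unsigned char buf[TEST_PAGE], want[TEST_PAGE];
	long n = 0;
	int fd = open(path, O_RDONLY);

	if (fd < 0)
		return -1;
	memset(want, fill, sizeof(want));
	for (size_t i = 0; i < pages; i++) {
		if (read(fd, buf, sizeof(buf)) != (ssize_t)sizeof(buf))
			break;
		if (memcmp(buf, want, sizeof(buf)) == 0)
			n++;
	}
	close(fd);
	return n;
}
```

      Reading the file back immediately is served from the page cache, so
      it will look correct even on an affected kernel.  Observing the
      data loss still requires an unmount and remount between the write
      and the check, as the script does.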
      
      The output of the script looks something like this:
      
        BEFORE MMAP:    281ed1d5ae50e8419f9b978aab16de83  /mnt/testfile
        AFTER MMAP:     6604a1c31f10780331a6850371b3a313  /mnt/testfile
        AFTER REMOUNT:  281ed1d5ae50e8419f9b978aab16de83  /mnt/testfile
      
      So it is clear that the changes done using mmap() do not survive a
      remount.  This can be reproduced 100% of the time.  The problem was
      introduced in commit 136e8770 ("nilfs2: fix issue of
      nilfs_set_page_dirty() for page at EOF boundary").
      
      If the page was read with mpage_readpage() or mpage_readpages() for
      example, then it has no buffers attached to it.  In that case
      page_has_buffers(page) in nilfs_set_page_dirty() will be false.
      Therefore nilfs_set_file_dirty() is never called and the pages are never
      collected and never written to disk.
      
      This patch fixes the problem by also calling nilfs_set_file_dirty() if the
      page has no buffers attached to it.
      
      [akpm@linux-foundation.org: s/PAGE_SHIFT/PAGE_CACHE_SHIFT/]
      Signed-off-by: Andreas Rohner <andreas.rohner@gmx.net>
      Tested-by: Andreas Rohner <andreas.rohner@gmx.net>
      Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      
      (cherry picked from commit 56d7acc7)
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
      66c262aa
    • Andrey Vagin's avatar
      fs/notify: don't show f_handle if exportfs_encode_inode_fh failed · b8df69c1
      Andrey Vagin authored
      Currently we handle only ENOSPC.  In case of other errors the file_handle
      variable isn't filled properly and we will show a part of the stack.
      Signed-off-by: Andrey Vagin <avagin@openvz.org>
      Acked-by: Cyrill Gorcunov <gorcunov@openvz.org>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      
      (cherry picked from commit 7e882481)
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
      b8df69c1
    • Andrey Vagin's avatar
      fsnotify/fdinfo: use named constants instead of hardcoded values · 88f73632
      Andrey Vagin authored
      MAX_HANDLE_SZ is equal to 128, but currently the size of pad is only 64
      bytes, so exportfs_encode_inode_fh can return an error.
      Signed-off-by: Andrey Vagin <avagin@openvz.org>
      Acked-by: Cyrill Gorcunov <gorcunov@openvz.org>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      
      (cherry picked from commit 1fc98d11)
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
      88f73632
    • Rasmus Villemoes's avatar
      kcmp: fix standard comparison bug · 58430160
      Rasmus Villemoes authored
      The C operator <= defines a perfectly fine total ordering on the set of
      values representable in a long.  However, unlike its namesake in the
      integers, it is not translation invariant, meaning that we do not have
      "b <= c" iff "a+b <= a+c" for all a,b,c.
      
      This means that it is always wrong to try to boil down the relationship
      between two longs to a question about the sign of their difference,
      because the resulting relation [a LEQ b iff a-b <= 0] is neither
      anti-symmetric nor transitive.  The former is due to -LONG_MIN==LONG_MIN
      (take any two a,b with a-b = LONG_MIN; then a LEQ b and b LEQ a, but a !=
      b).  The latter can either be seen observing that x LEQ x+1 for all x,
      implying x LEQ x+1 LEQ x+2 ...  LEQ x-1 LEQ x; or more directly with the
      simple example a=LONG_MIN, b=0, c=1, for which a-b < 0, b-c < 0, but a-c >
      0.
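
      The two counterexamples above can be checked mechanically.  The
      sketch below is an illustration and not the kernel's code; both
      function names are made up here.  The subtraction is done on
      unsigned long so that the wraparound is well defined in C, and the
      conversion back to long is assumed to wrap as it does on the usual
      two's-complement targets.

```c
#include <assert.h>
#include <limits.h>

/*
 * Difference-based relation: "a LEQ b iff a-b <= 0".
 * The unsigned subtraction wraps modulo 2^N; converting the result back
 * to long is implementation-defined but wraps on common compilers.
 */
static int leq_by_difference(long a, long b)
{
	return (long)((unsigned long)a - (unsigned long)b) <= 0;
}

/* Direct comparison: 0 if equal, 1 if a < b, 2 if a > b.  Never subtracts. */
static int cmp_direct(long a, long b)
{
	assert(!(a < b && a > b));	/* the two outcomes are mutually exclusive */
	return (a < b) | ((a > b) << 1);
}
```

      With a=LONG_MIN, b=0, c=1, leq_by_difference() accepts (a,b) and
      (b,c) but rejects (a,c), and it accepts both (LONG_MIN,0) and
      (0,LONG_MIN).  cmp_direct() orders all three pairs consistently.
      Its 0/1/2 encoding was chosen to line up with the normalize[] table
      in the test program below.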
      
      Note that it makes absolutely no difference that a transmogrifying bijection
      has been applied before the comparison is done.  In fact, had the
      obfuscation not been done, one could probably not observe the bug
      (assuming all values being compared always lie in one half of the address
      space, the mathematical value of a-b is always representable in a long).
      As it stands, one can easily obtain three file descriptors exhibiting the
      non-transitivity of kcmp().
      
      Side note 1: I can't see that ensuring the MSB of the multiplier is
      set serves any purpose other than obfuscating the obfuscating code.
      
      Side note 2:
      #include <stdio.h>
      #include <stdlib.h>
      #include <string.h>
      #include <fcntl.h>
      #include <unistd.h>
      #include <assert.h>
      #include <sys/syscall.h>
      
      enum kcmp_type {
              KCMP_FILE,
              KCMP_VM,
              KCMP_FILES,
              KCMP_FS,
              KCMP_SIGHAND,
              KCMP_IO,
              KCMP_SYSVSEM,
              KCMP_TYPES,
      };
      pid_t pid;
      
      int kcmp(pid_t pid1, pid_t pid2, int type,
      	 unsigned long idx1, unsigned long idx2)
      {
      	return syscall(SYS_kcmp, pid1, pid2, type, idx1, idx2);
      }
      int cmp_fd(int fd1, int fd2)
      {
      	int c = kcmp(pid, pid, KCMP_FILE, fd1, fd2);
      	if (c < 0) {
      		perror("kcmp");
      		exit(1);
      	}
      	assert(0 <= c && c < 3);
      	return c;
      }
      int cmp_fdp(const void *a, const void *b)
      {
      	static const int normalize[] = {0, -1, 1};
      	return normalize[cmp_fd(*(int*)a, *(int*)b)];
      }
      #define MAX 100 /* This is plenty; I've seen it trigger for MAX==3 */
      int main(int argc, char *argv[])
      {
      	int r, s, count = 0;
      	int REL[3] = {0,0,0};
      	int fd[MAX];
      	pid = getpid();
      	while (count < MAX) {
      		r = open("/dev/null", O_RDONLY);
      		if (r < 0)
      			break;
      		fd[count++] = r;
      	}
      	printf("opened %d file descriptors\n", count);
      	for (r = 0; r < count; ++r) {
      		for (s = r+1; s < count; ++s) {
      			REL[cmp_fd(fd[r], fd[s])]++;
      		}
      	}
      	printf("== %d\t< %d\t> %d\n", REL[0], REL[1], REL[2]);
      	qsort(fd, count, sizeof(fd[0]), cmp_fdp);
      	memset(REL, 0, sizeof(REL));
      
      	for (r = 0; r < count; ++r) {
      		for (s = r+1; s < count; ++s) {
      			REL[cmp_fd(fd[r], fd[s])]++;
      		}
      	}
      	printf("== %d\t< %d\t> %d\n", REL[0], REL[1], REL[2]);
      	return (REL[0] + REL[2] != 0);
      }
      Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
      Reviewed-by: Cyrill Gorcunov <gorcunov@openvz.org>
      "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      
      (cherry picked from commit acbbe6fb)
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
      58430160
    • Steven Rostedt (Red Hat)'s avatar
      ftrace: Update all ftrace_ops for a ftrace_hash_ops update · 14ed89e4
      Steven Rostedt (Red Hat) authored
      When updating what an ftrace_ops traces, if it is registered (that is,
      actively tracing), and that ftrace_ops uses the shared global_ops
      local_hash, then we need to update all tracers that are active and
      also share the global_ops' ftrace_hash_ops.
      
      Cc: stable@vger.kernel.org # 3.16 (apply after 3.17-rc4 is out)
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      
      (cherry picked from commit 84261912)
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
      14ed89e4
    • Felipe Balbi's avatar
      usb: dwc3: fix TRB completion when multiple TRBs are started · 98f09ad0
      Felipe Balbi authored
      After commit 2ec2a8be (usb: dwc3: gadget:
      always enable IOC on bulk/interrupt transfers)
      we created a situation where it was possible to
      hang a bulk/interrupt endpoint if we had more
      than one pending request in our queue and they
      were both started with a single Start Transfer
      command.
      
      The problem triggers because we had not enabled
      Transfer In Progress event for those endpoints
      and we were not able to process early giveback
      of requests completed without LST bit set.
      
      Fix the problem by finally enabling Xfer In Progress
      event for all endpoint types, except control.
      
      Fixes: 2ec2a8be (usb: dwc3: gadget: always
      	enable IOC on bulk/interrupt transfers)
      Cc: <stable@vger.kernel.org> # v3.14+
      Reported-by: Pratyush Anand <pratyush.anand@st.com>
      Signed-off-by: Felipe Balbi <balbi@ti.com>
      
      (cherry picked from commit 0b93a4c8)
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
      98f09ad0
    • Jens Axboe's avatar
      genhd: fix leftover might_sleep() in blk_free_devt() · 76bef4ac
      Jens Axboe authored
      Commit 2da78092 changed the locking from a mutex to a spinlock,
      so we no longer sleep in this context. But there was a leftover
      might_sleep() in there, which now triggers since we do the final
      free from an RCU callback. Get rid of it.
      Reported-by: Pontus Fuchs <pontus.fuchs@gmail.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
      
      (cherry picked from commit 46f341ff)
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
      76bef4ac
    • J. Bruce Fields's avatar
      lockd: fix rpcbind crash on lockd startup failure · 39c3e370
      J. Bruce Fields authored
      Nikita Yushchenko reported that booting a kernel with init=/bin/sh and
      then nfs mounting without portmap or rpcbind running using a busybox
      mount resulted in:
      
        # mount -t nfs 10.30.130.21:/opt /mnt
        svc: failed to register lockdv1 RPC service (errno 111).
        lockd_up: makesock failed, error=-111
        Unable to handle kernel paging request for data at address 0x00000030
        Faulting instruction address: 0xc055e65c
        Oops: Kernel access of bad area, sig: 11 [#1]
        MPC85xx CDS
        Modules linked in:
        CPU: 0 PID: 1338 Comm: mount Not tainted 3.10.44.cge #117
        task: cf29cea0 ti: cf35c000 task.ti: cf35c000
        NIP: c055e65c LR: c0566490 CTR: c055e648
        REGS: cf35dad0 TRAP: 0300   Not tainted  (3.10.44.cge)
        MSR: 00029000 <CE,EE,ME>  CR: 22442488  XER: 20000000
        DEAR: 00000030, ESR: 00000000
      
        GPR00: c05606f4 cf35db80 cf29cea0 cf0ded80 cf0dedb8 00000001 1dec3086
        00000000
        GPR08: 00000000 c07b1640 00000007 1dec3086 22442482 100b9758 00000000
        10090ae8
        GPR16: 00000000 000186a5 00000000 00000000 100c3018 bfa46edc 100b0000
        bfa46ef0
        GPR24: cf386ae0 c07834f0 00000000 c0565f88 00000001 cf0dedb8 00000000
        cf0ded80
        NIP [c055e65c] call_start+0x14/0x34
        LR [c0566490] __rpc_execute+0x70/0x250
        Call Trace:
        [cf35db80] [00000080] 0x80 (unreliable)
        [cf35dbb0] [c05606f4] rpc_run_task+0x9c/0xc4
        [cf35dbc0] [c0560840] rpc_call_sync+0x50/0xb8
        [cf35dbf0] [c056ee90] rpcb_register_call+0x54/0x84
        [cf35dc10] [c056f24c] rpcb_register+0xf8/0x10c
        [cf35dc70] [c0569e18] svc_unregister.isra.23+0x100/0x108
        [cf35dc90] [c0569e38] svc_rpcb_cleanup+0x18/0x30
        [cf35dca0] [c0198c5c] lockd_up+0x1dc/0x2e0
        [cf35dcd0] [c0195348] nlmclnt_init+0x2c/0xc8
        [cf35dcf0] [c015bb5c] nfs_start_lockd+0x98/0xec
        [cf35dd20] [c015ce6c] nfs_create_server+0x1e8/0x3f4
        [cf35dd90] [c0171590] nfs3_create_server+0x10/0x44
        [cf35dda0] [c016528c] nfs_try_mount+0x158/0x1e4
        [cf35de20] [c01670d0] nfs_fs_mount+0x434/0x8c8
        [cf35de70] [c00cd3bc] mount_fs+0x20/0xbc
        [cf35de90] [c00e4f88] vfs_kern_mount+0x50/0x104
        [cf35dec0] [c00e6e0c] do_mount+0x1d0/0x8e0
        [cf35df10] [c00e75ac] SyS_mount+0x90/0xd0
        [cf35df40] [c000ccf4] ret_from_syscall+0x0/0x3c
      
      The addition of svc_shutdown_net() resulted in two calls to
      svc_rpcb_cleanup(); the second is no longer necessary and crashes when
      it calls rpcb_register_call with clnt=NULL.
      Reported-by: Nikita Yushchenko <nyushchenko@dev.rtsoft.ru>
      Fixes: 679b033d "lockd: ensure we tear down any live sockets when socket creation fails during lockd_up"
      Cc: stable@vger.kernel.org
      Acked-by: Jeff Layton <jlayton@primarydata.com>
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
      
      (cherry picked from commit 7c17705e)
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
      39c3e370
    • Larry Finger's avatar
      rtlwifi: rtl8192cu: Add new ID · 1acfa1fa
      Larry Finger authored
      The Sitecom WLA-2102 adapter uses this driver.
      Reported-by: Nico Baggus <nico-linux@noci.xs4all.nl>
      Signed-off-by: Larry Finger <Larry.Finger@lwfinger.net>
      Cc: Nico Baggus <nico-linux@noci.xs4all.nl>
      Cc: Stable <stable@vger.kernel.org>
      Signed-off-by: John W. Linville <linville@tuxdriver.com>
      
      (cherry picked from commit c6651716)
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
      1acfa1fa
    • Tejun Heo's avatar
      percpu: perform tlb flush after pcpu_map_pages() failure · f8e97f7b
      Tejun Heo authored
      If pcpu_map_pages() fails midway, it unmaps the already mapped pages.
      Currently, it doesn't flush tlb after the partial unmapping.  This may
      be okay in most cases as the established mapping hasn't been used at
      that point but it can go wrong and when it goes wrong it'd be
      extremely difficult to track down.
      
      Flush tlb after the partial unmapping.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: stable@vger.kernel.org
      
      (cherry picked from commit 849f5169)
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
      f8e97f7b
    • Tejun Heo's avatar
      percpu: fix pcpu_alloc_pages() failure path · 144319ad
      Tejun Heo authored
      When pcpu_alloc_pages() fails midway, pcpu_free_pages() is invoked to
      free what has already been allocated.  The invocation is across the
      whole requested range and pcpu_free_pages() will try to free all
      non-NULL pages; unfortunately, this is incorrect as
      pcpu_get_pages_and_bitmap(), unlike what its comment suggests, doesn't
      clear the pages array and thus the array may have entries from the
      previous invocations making the partial failure path free incorrect
      pages.
      
      Fix it by open-coding the partial freeing of the already allocated
      pages.
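
      The open-coded partial free can be illustrated in userspace.  This
      is a sketch of the pattern only and not the percpu code:
      try_alloc(), fail_on_call and alloc_slots() are hypothetical names,
      and malloc() stands in for the page allocator.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical failure injection: the Nth call to try_alloc() fails (0 = never). */
static int fail_on_call;
static int alloc_calls;

static void *try_alloc(void)
{
	alloc_calls++;
	if (alloc_calls == fail_on_call)
		return NULL;
	return malloc(64);
}

/*
 * Fill slots[0..n) with fresh allocations.  The array is NOT cleared
 * first, so it may still hold pointers from an earlier use.  On failure,
 * free only the entries this call allocated and leave the rest untouched.
 * Returns 0 on success, -1 on failure.
 */
static int alloc_slots(void **slots, size_t n)
{
	size_t i;

	assert(slots != NULL);
	for (i = 0; i < n; i++) {
		void *p = try_alloc();

		if (!p)
			goto err;
		slots[i] = p;
	}
	return 0;

err:
	/* Walk back over [0, i): exactly the entries allocated above. */
	while (i-- > 0) {
		free(slots[i]);
		slots[i] = NULL;
	}
	return -1;
}
```

      Freeing every non-NULL entry in the whole range instead would also
      free the leftover pointers in slots[i..n), which is the mistake the
      message describes.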
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: stable@vger.kernel.org
      
      (cherry picked from commit f0d27965)
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
      144319ad
    • Honggang Li's avatar
      Revert "percpu: free percpu allocation info for uniprocessor system" · de431443
      Honggang Li authored
      This reverts commit 3189eddb ("percpu: free percpu allocation info for
      uniprocessor system").
      
      The commit causes a hang with a crisv32 image. This may be an architecture
      problem, but at least for now the revert is necessary to be able to boot a
      crisv32 image.
      
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Honggang Li <enjoymindful@gmail.com>
      Signed-off-by: Guenter Roeck <linux@roeck-us.net>
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Fixes: 3189eddb ("percpu: free percpu allocation info for uniprocessor system")
      Cc: stable@vger.kernel.org # Please don't apply 3189eddb
      
      percpu-refcount: make percpu_ref based on longs instead of ints
      
      percpu_ref is currently based on ints and the number of refs it can
      cover is (1 << 31).  This makes it impossible to use a percpu_ref to
      count memory objects or pages on 64bit machines as it may overflow.
      This forces those users to somehow aggregate the references before
      contributing to the percpu_ref which is often cumbersome and sometimes
      challenging to get the same level of performance as using the
      percpu_ref directly.
      
      While using ints for the percpu counters makes them pack tighter on
      64bit machines, the possible gain from using ints instead of longs is
      extremely small compared to the overall gain from per-cpu operation.
      This patch makes percpu_ref based on longs so that it can be used to
      directly count memory objects or pages.
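
      To put the limit in perspective, here is a small arithmetic sketch.
      The helper names are invented and the 4 KiB page size used in the
      note below is an assumption, not something this message states.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Number of pages needed to cover `bytes` with `page_size`-byte pages. */
static uint64_t pages_for(uint64_t bytes, uint64_t page_size)
{
	assert(page_size > 0);
	return (bytes + page_size - 1) / page_size;
}

/* True if `count` references fit in a signed int-based counter. */
static int fits_in_int(uint64_t count)
{
	return count <= (uint64_t)INT_MAX;
}

/* True if `count` references fit in a signed long-based counter. */
static int fits_in_long(uint64_t count)
{
	return count <= (uint64_t)LONG_MAX;
}
```

      With 4 KiB pages, 8 TiB of memory is 2^31 pages, which already
      exceeds INT_MAX, while the same count fits easily in a 64-bit long.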
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Kent Overstreet <kmo@daterainc.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      
      percpu-refcount: improve WARN messages
      
      percpu_ref's WARN messages can be a lot more helpful by indicating
      who's the culprit.  Make them report the release function that the
      offending percpu-refcount is associated with.  This should make it a
      lot easier to track down the reported invalid refcnting operations.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Kent Overstreet <kmo@daterainc.com>
      
      percpu: fix locking regression in the failure path of pcpu_alloc()
      
      While updating locking, b38d08f3 ("percpu: restructure locking")
      broke pcpu_create_chunk() creation path in pcpu_alloc().  It returns
      without releasing pcpu_alloc_mutex.  Fix it.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Reported-by: Julia Lawall <julia.lawall@lip6.fr>
      
      percpu-refcount: add @gfp to percpu_ref_init()
      
      Percpu allocator now supports allocation mask.  Add @gfp to
      percpu_ref_init() so that !GFP_KERNEL allocation masks can be used
      with percpu_refs too.
      
      This patch doesn't make any functional difference.
      
      v2: blk-mq conversion was missing.  Updated.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Kent Overstreet <koverstreet@google.com>
      Cc: Benjamin LaHaise <bcrl@kvack.org>
      Cc: Li Zefan <lizefan@huawei.com>
      Cc: Nicholas A. Bellinger <nab@linux-iscsi.org>
      Cc: Jens Axboe <axboe@kernel.dk>
      
      proportions: add @gfp to init functions
      
      Percpu allocator now supports allocation mask.  Add @gfp to
      [flex_]proportions init functions so that !GFP_KERNEL allocation masks
      can be used with them too.
      
      This patch doesn't make any functional difference.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Cc: Peter Zijlstra <peterz@infradead.org>
      
      percpu_counter: add @gfp to percpu_counter_init()
      
      Percpu allocator now supports allocation mask.  Add @gfp to
      percpu_counter_init() so that !GFP_KERNEL allocation masks can be used
      with percpu_counters too.
      
      We could have left percpu_counter_init() alone and added
      percpu_counter_init_gfp(); however, the number of users isn't that
      high and introducing _gfp variants to all percpu data structures would
      be quite ugly, so let's just do the conversion.  This is the one with
      the most users.  Other percpu data structures are a lot easier to
      convert.
      
      This patch doesn't make any functional difference.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Jan Kara <jack@suse.cz>
      Acked-by: "David S. Miller" <davem@davemloft.net>
      Cc: x86@kernel.org
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: "Theodore Ts'o" <tytso@mit.edu>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      
      percpu_counter: make percpu_counters_lock irq-safe
      
      percpu_counter is scheduled to grow @gfp support to allow atomic
      initialization.  This patch makes percpu_counters_lock irq-safe so
      that it can be safely used from atomic contexts.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      
      percpu: implement asynchronous chunk population
      
      The percpu allocator now supports atomic allocations by only
      allocating from already populated areas but the mechanism to ensure
      that there's an adequate amount of populated areas was missing.
      
      This patch expands pcpu_balance_work so that in addition to freeing
      excess free chunks it also populates chunks to maintain an adequate
      level of populated areas.  pcpu_alloc() schedules pcpu_balance_work if
      the amount of free populated areas is too low or after an atomic
      allocation failure.
      
      * PERCPU_DYNAMIC_RESERVE is increased by two pages to account for
        PCPU_EMPTY_POP_PAGES_LOW.
      
      * pcpu_async_enabled is added to gate both async jobs -
        chunk->map_extend_work and pcpu_balance_work - so that we don't end
        up scheduling them while the needed subsystems aren't up yet.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      
      percpu: rename pcpu_reclaim_work to pcpu_balance_work
      
      pcpu_reclaim_work will also be used to populate chunks asynchronously.
      Rename it to pcpu_balance_work in preparation.  pcpu_reclaim() is
      renamed to pcpu_balance_workfn() and some of its local variables are
      renamed too.
      
      This is pure rename.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      
      percpu: implement pcpu_nr_empty_pop_pages and chunk->nr_populated
      
      pcpu_nr_empty_pop_pages counts the number of empty populated pages
      across all chunks and chunk->nr_populated counts the number of
      populated pages in a chunk.  Both will be used to implement pre/async
      population for atomic allocations.
      
      pcpu_chunk_[de]populated() are added to update chunk->populated,
      chunk->nr_populated and pcpu_nr_empty_pop_pages together.  All
      successful chunk [de]populations should be followed by the
      corresponding pcpu_chunk_[de]populated() calls.
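
      The point of these helpers is that the bitmap and both counters
      change in one place.  A userspace toy of that bookkeeping might
      look as follows; the toy_ names and the 16-page chunk size are
      assumptions, and because the toy does not track allocations every
      populated page is counted as empty.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_CHUNK_PAGES 16	/* hypothetical chunk size in pages */

struct toy_chunk {
	bool populated[TOY_CHUNK_PAGES];	/* stands in for chunk->populated */
	int nr_populated;			/* stands in for chunk->nr_populated */
};

/* Stands in for pcpu_nr_empty_pop_pages, summed across all chunks. */
static int toy_nr_empty_pop_pages;

/* Record that pages [start, end) were populated; update all three together. */
static void toy_chunk_populated(struct toy_chunk *c, int start, int end)
{
	assert(0 <= start && start <= end && end <= TOY_CHUNK_PAGES);
	for (int i = start; i < end; i++) {
		assert(!c->populated[i]);
		c->populated[i] = true;
	}
	c->nr_populated += end - start;
	toy_nr_empty_pop_pages += end - start;
}

/* Record that pages [start, end) were depopulated; the exact reverse. */
static void toy_chunk_depopulated(struct toy_chunk *c, int start, int end)
{
	assert(0 <= start && start <= end && end <= TOY_CHUNK_PAGES);
	for (int i = start; i < end; i++) {
		assert(c->populated[i]);
		c->populated[i] = false;
	}
	c->nr_populated -= end - start;
	toy_nr_empty_pop_pages -= end - start;
}
```

      Calling the helper only after a successful [de]population keeps the
      per-chunk count and the global count from drifting apart.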
      Signed-off-by: Tejun Heo <tj@kernel.org>
      
      percpu: make sure chunk->map array has available space
      
      An allocation attempt may require extending chunk->map array which
      requires GFP_KERNEL context which isn't available for atomic
      allocations.  This patch ensures that chunk->map array usually keeps
      some amount of available space by directly allocating buffer space
      during GFP_KERNEL allocations and scheduling async extension during
      atomic ones.  This should make atomic allocation failures from map
      space exhaustion rare.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      
      percpu: implement [__]alloc_percpu_gfp()
      
      Now that pcpu_alloc_area() can allocate only from populated areas,
      it's easy to add atomic allocation support to [__]alloc_percpu().
      Update pcpu_alloc() so that it accepts @gfp and skips all the blocking
      operations and allocates only from the populated areas if @gfp doesn't
      contain GFP_KERNEL.  New interface functions [__]alloc_percpu_gfp()
      are added.
      
      While this means that atomic allocations are possible, this isn't
      complete yet as there's no mechanism to ensure that a certain amount of
      populated areas is kept available and atomic allocations may keep
      failing under certain conditions.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      
      percpu: indent the population block in pcpu_alloc()
      
      The next patch will conditionalize the population block in
      pcpu_alloc() which will end up making a rather large indentation
      change obfuscating the actual logic change.  This patch puts the block
      under "if (true)" so that the next patch can avoid indentation
      changes.  The definitions of the local variables which are used only in
      the block are moved into the block.
      
      This patch is purely cosmetic.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      
      percpu: make pcpu_alloc_area() capable of allocating only from populated areas
      
      Update pcpu_alloc_area() so that it can skip unpopulated areas if the
      new parameter @pop_only is true.  This is implemented by a new
      function, pcpu_fit_in_area(), which determines the amount of head
      padding considering the alignment and populated state.
      
      @pop_only is currently always false but this will be used to implement
      atomic allocation.
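
      The effect of @pop_only can be shown with a page-granular toy.
      This is a deliberately simplified model and not pcpu_fit_in_area():
      it ignores alignment and head padding, and find_run() and its
      parameters are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_NPAGES 16	/* hypothetical chunk size in pages */

/*
 * Return the first page index where `want` consecutive free pages start,
 * or -1 if there is none.  With pop_only set, every page of the run must
 * also be populated, so the caller never has to populate (and thus block).
 */
static int find_run(const bool *is_free, const bool *populated,
		    int want, bool pop_only)
{
	int run = 0;

	assert(want > 0 && want <= TOY_NPAGES);
	for (int i = 0; i < TOY_NPAGES; i++) {
		bool ok = is_free[i] && (!pop_only || populated[i]);

		run = ok ? run + 1 : 0;
		if (run == want)
			return i - want + 1;
	}
	return -1;
}
```

      With pop_only false the search returns the first free run whether
      or not it is populated.  With pop_only true it skips ahead to a
      populated run, or fails if no populated run is long enough.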
      Signed-off-by: Tejun Heo <tj@kernel.org>
      
      percpu: restructure locking
      
      At first, the percpu allocator required a sleepable context for both
      alloc and free paths and used pcpu_alloc_mutex to protect everything.
      Later, pcpu_lock was introduced to protect the index data structure so
      that the free path can be invoked from atomic contexts.  The
      conversion only updated what's necessary and left most of the
      allocation path under pcpu_alloc_mutex.
      
      The percpu allocator is planned to add support for atomic allocation
      and this patch restructures locking so that the coverage of
      pcpu_alloc_mutex is further reduced.
      
      * pcpu_alloc() now grabs pcpu_alloc_mutex only while creating a new
        chunk and populating the allocated area.  Everything else is now
        protected solely by pcpu_lock.
      
        After this change, multiple instances of pcpu_extend_area_map() may
        race but the function already implements sufficient synchronization
        using pcpu_lock.
      
        This also allows multiple allocators to arrive at new chunk
        creation.  To avoid creating multiple empty chunks back-to-back, a
        new chunk is created iff there is no other empty chunk after
        grabbing pcpu_alloc_mutex.
      
      * pcpu_lock is now held while modifying chunk->populated bitmap.
        After this, all data structures are protected by pcpu_lock.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      
      percpu: make percpu-km set chunk->populated bitmap properly
      
      percpu-km instantiates the whole chunk on creation and doesn't make
      use of chunk->populated bitmap and leaves it as zero.  While this
      currently doesn't cause any problem, the inconsistency makes it
      difficult to build further logic on top of chunk->populated.  This
      patch makes percpu-km fill chunk->populated on creation so that the
      bitmap is always consistent.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Christoph Lameter <cl@linux.com>
      
      percpu: move region iterations out of pcpu_[de]populate_chunk()
      
      Previously, pcpu_[de]populate_chunk() were called with the range which
      may contain multiple target regions in it and
      pcpu_[de]populate_chunk() iterated over the regions.  This has the
      benefit of batching up cache flushes for all the regions; however,
      we're planning to add more bookkeeping logic around [de]population to
      support atomic allocations and this delegation of iterations gets in
      the way.
      
      This patch moves the region iterations out of
      pcpu_[de]populate_chunk() into its callers - pcpu_alloc() and
      pcpu_reclaim() - so that we can later add logic to track more states
      around them.  This change may make cache and tlb flushes more frequent
      but multi-region [de]populations are rare anyway and if this actually
      becomes a problem, it's not difficult to factor out cache flushes as
      separate callbacks which are directly invoked from percpu.c.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      
      percpu: move common parts out of pcpu_[de]populate_chunk()
      
      percpu-vm and percpu-km implement separate versions of
      pcpu_[de]populate_chunk() and some parts which are or should be common
      are currently in the specific implementations.  Make the following
      changes.
      
      * Allocated area clearing is moved from the pcpu_populate_chunk()
        implementations to pcpu_alloc().  This makes percpu-km's version
        a noop.
      
      * Quick exit tests in pcpu_[de]populate_chunk() of percpu-vm are moved
        to their respective callers so that they are applied to percpu-km
        too.  This doesn't make any meaningful difference as both functions
        are noop for percpu-km; however, this is more consistent and will
        help implementing atomic allocation support.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      
      percpu: remove @may_alloc from pcpu_get_pages()
      
      pcpu_get_pages() creates the temp pages array if not already allocated
      and returns the pointer to it.  As the function is called from both
      [de]population paths and depopulation can only happen after at least
      one successful population, the param doesn't make any difference - the
      allocation will always happen on the population path anyway.
      
      Remove @may_alloc from pcpu_get_pages().  Also, add a lockdep
      assertion on pcpu_alloc_mutex instead of vaguely stating that the
      exclusion is the caller's responsibility.
      Signed-off-by: Tejun Heo <tj@kernel.org>
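
      A rough userspace sketch of the lazily allocated temp array described
      above, under stated assumptions: get_pages(), temp_pages, NR_UNITS,
      UNIT_PAGES and alloc_mutex_held are hypothetical names, and a plain
      assert() on a flag stands in for the lockdep assertion, which has no
      userspace equivalent here.

```c
#include <assert.h>
#include <stdlib.h>

#define NR_UNITS   4	/* hypothetical number of per-cpu units */
#define UNIT_PAGES 8	/* hypothetical pages per unit */

/* Stand-in for "caller holds the allocation mutex". */
static int alloc_mutex_held;

/* Temp pages array, created on first use and then reused. */
static void **temp_pages;

/* Return the temp array, allocating it on the first call. */
static void **get_pages(void)
{
	/* Check the locking rule instead of only documenting it. */
	assert(alloc_mutex_held);

	if (!temp_pages)
		temp_pages = calloc((size_t)NR_UNITS * UNIT_PAGES,
				    sizeof(temp_pages[0]));
	return temp_pages;
}
```

      No "may allocate" flag is needed in this sketch: the first caller is
      always on the population path, so later callers simply see the array
      that already exists.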
      
      percpu: remove the usage of separate populated bitmap in percpu-vm
      
      percpu-vm uses pcpu_get_pages_and_bitmap() to acquire temp pages array
      and populated bitmap and uses the two during [de]population.  The temp
      bitmap is used only to build the new bitmap that is copied to
      chunk->populated after the operation succeeds; however, the new bitmap
      can be trivially set after success without using the temp bitmap.
      
      This patch removes the temp populated bitmap usage from percpu-vm.c.
      
      * pcpu_get_pages_and_bitmap() is renamed to pcpu_get_pages() and no
        longer hands out the temp bitmap.
      
      * @populated argument is dropped from all the related functions.
        @populated updates in pcpu_[un]map_pages() are dropped.
      
      * Two loops in pcpu_map_pages() are merged.
      
      * pcpu_[de]populate_chunk() modify chunk->populated bitmap directly
        from @page_start and @page_end after success.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Christoph Lameter <cl@linux.com>
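
      The idea of updating the bitmap only after success, without a temp
      copy, can be sketched in userspace like this.  It is an illustration
      under assumptions, not the percpu-vm code: struct chunk,
      alloc_pages_range() and populate_chunk() are hypothetical names, and
      the fail_at parameter exists only to simulate an allocation failure.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_PAGES 8	/* hypothetical chunk size in pages */

struct chunk {
	bool populated[NR_PAGES];	/* stand-in for chunk->populated */
	bool backed[NR_PAGES];		/* page currently has backing memory */
};

/* Back pages [start, end); fail at page fail_at and undo earlier work. */
static int alloc_pages_range(struct chunk *c, size_t start, size_t end,
			     size_t fail_at)
{
	for (size_t i = start; i < end; i++) {
		if (i == fail_at) {
			/* Roll back the pages backed so far. */
			while (i-- > start)
				c->backed[i] = false;
			return -1;
		}
		c->backed[i] = true;
	}
	return 0;
}

/* Populate [start, end); touch the bitmap only once everything worked. */
static int populate_chunk(struct chunk *c, size_t start, size_t end,
			  size_t fail_at)
{
	assert(start <= end && end <= NR_PAGES);

	if (alloc_pages_range(c, start, end, fail_at))
		return -1;	/* bitmap left exactly as it was */

	for (size_t i = start; i < end; i++)
		c->populated[i] = true;
	return 0;
}
```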
      
      percpu: free percpu allocation info for uniprocessor system
      
      Currently, only SMP systems free the percpu allocation info.
      Uniprocessor systems should free it too. For example, on an x86 UML
      virtual machine with 256MB of memory, the UML kernel wastes one page
      of memory.
      Signed-off-by: Honggang Li <enjoymindful@gmail.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: stable@vger.kernel.org
      
      (cherry picked from commit bb2e226b
      3189eddb)
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
      de431443