1. 12 Aug, 2014 40 commits
    • Lars-Peter Clausen's avatar
      ALSA: control: Handle numid overflow · 108869ef
      Lars-Peter Clausen authored
      Each control gets automatically assigned its numids when the control is created.
      The allocation is done by incrementing the numid by the amount of allocated
      numids per allocation. This means that excessive creation and destruction of
      controls (e.g. via SNDRV_CTL_IOCTL_ELEM_ADD/REMOVE) can cause the id to
      eventually overflow. Currently when this happens for the control that caused the
      overflow kctl->id.numid + kctl->count will also over flow causing it to be
      smaller than kctl->id.numid. Most of the code assumes that this is something
      that can not happen, so we need to make sure that it won't happen
      Signed-off-by: default avatarLars-Peter Clausen <lars@metafoo.de>
      Acked-by: default avatarJaroslav Kysela <perex@perex.cz>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarTakashi Iwai <tiwai@suse.de>
      
      (cherry picked from commit ac902c11)
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      108869ef
    • Lars-Peter Clausen's avatar
      ALSA: control: Don't access controls outside of protected regions · 570f4da0
      Lars-Peter Clausen authored
      A control that is visible on the card->controls list can be freed at any time.
      This means we must not access any of its memory while not holding the
      controls_rw_lock. Otherwise we risk a use after free access.
      Signed-off-by: default avatarLars-Peter Clausen <lars@metafoo.de>
      Acked-by: default avatarJaroslav Kysela <perex@perex.cz>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarTakashi Iwai <tiwai@suse.de>
      
      (cherry picked from commit fd9f26e4)
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      570f4da0
    • Lars-Peter Clausen's avatar
      ALSA: control: Fix replacing user controls · 87d85bfd
      Lars-Peter Clausen authored
      There are two issues with the current implementation for replacing user
      controls. The first is that the code does not check if the control is actually a
      user control and neither does it check if the control is owned by the process
      that tries to remove it. That allows userspace applications to remove arbitrary
      controls, which can cause a user after free if a for example a driver does not
      expect a control to be removed from under its feed.
      
      The second issue is that on one hand when a control is replaced the
      user_ctl_count limit is not checked and on the other hand the user_ctl_count is
      increased (even though the number of user controls does not change). This allows
      userspace, once the user_ctl_count limit as been reached, to repeatedly replace
      a control until user_ctl_count overflows. Once that happens new controls can be
      added effectively bypassing the user_ctl_count limit.
      
      Both issues can be fixed by instead of open-coding the removal of the control
      that is to be replaced to use snd_ctl_remove_user_ctl(). This function does
      proper permission checks as well as decrements user_ctl_count after the control
      has been removed.
      
      Note that by using snd_ctl_remove_user_ctl() the check which returns -EBUSY at
      beginning of the function if the control already exists is removed. This is not
      a problem though since the check is quite useless, because the lock that is
      protecting the control list is released between the check and before adding the
      new control to the list, which means that it is possible that a different
      control with the same settings is added to the list after the check. Luckily
      there is another check that is done while holding the lock in snd_ctl_add(), so
      we'll rely on that to make sure that the same control is not added twice.
      Signed-off-by: default avatarLars-Peter Clausen <lars@metafoo.de>
      Acked-by: default avatarJaroslav Kysela <perex@perex.cz>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarTakashi Iwai <tiwai@suse.de>
      
      (cherry picked from commit 82262a46)
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      87d85bfd
    • Lars-Peter Clausen's avatar
      ALSA: control: Protect user controls against concurrent access · c7021e92
      Lars-Peter Clausen authored
      The user-control put and get handlers as well as the tlv do not protect against
      concurrent access from multiple threads. Since the state of the control is not
      updated atomically it is possible that either two write operations or a write
      and a read operation race against each other. Both can lead to arbitrary memory
      disclosure. This patch introduces a new lock that protects user-controls from
      concurrent access. Since applications typically access controls sequentially
      than in parallel a single lock per card should be fine.
      Signed-off-by: default avatarLars-Peter Clausen <lars@metafoo.de>
      Acked-by: default avatarJaroslav Kysela <perex@perex.cz>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarTakashi Iwai <tiwai@suse.de>
      
      (cherry picked from commit 07f4d9d7)
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      c7021e92
    • Suravee Suthikulpanit's avatar
      arm64/dma: Removing ARCH_HAS_DMA_GET_REQUIRED_MASK macro · e7c6cd59
      Suravee Suthikulpanit authored
      Arm64 does not define dma_get_required_mask() function.
      Therefore, it should not define the ARCH_HAS_DMA_GET_REQUIRED_MASK.
      This causes build errors in some device drivers (e.g. mpt2sas)
      Signed-off-by: default avatarSuravee Suthikulpanit <Suravee.Suthikulpanit@amd.com>
      Signed-off-by: default avatarCatalin Marinas <catalin.marinas@arm.com>
      Cc: <stable@vger.kernel.org>
      
      (cherry picked from commit f3a183cb)
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      e7c6cd59
    • Will Deacon's avatar
      arm64: uid16: fix __kernel_old_{gid,uid}_t definitions · 4c05f098
      Will Deacon authored
      Whilst native arm64 applications don't have the 16-bit UID/GID syscalls
      wired up, compat tasks can still access them. The 16-bit wrappers for
      these syscalls use __kernel_old_uid_t and __kernel_old_gid_t, which must
      be 16-bit data types to maintain compatibility with the 16-bit UIDs used
      by compat applications.
      
      This patch defines 16-bit __kernel_old_{gid,uid}_t types for arm64
      instead of using the 32-bit types provided by asm-generic.
      Signed-off-by: default avatarWill Deacon <will.deacon@arm.com>
      Acked-by: default avatarArnd Bergmann <arnd@arndb.de>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarCatalin Marinas <catalin.marinas@arm.com>
      
      (cherry picked from commit 34c65c43)
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      4c05f098
    • Will Deacon's avatar
      arm64: ptrace: change fs when passing kernel pointer to regset code · 56fe47b7
      Will Deacon authored
      Our compat PTRACE_POKEUSR implementation simply passes the user data to
      regset_copy_from_user after some simple range checking. Unfortunately,
      the data in question has already been copied to the kernel stack by this
      point, so the subsequent access_ok check fails and the ptrace request
      returns -EFAULT. This causes problems tracing fork() with older versions
      of strace.
      
      This patch briefly changes the fs to KERNEL_DS, so that the access_ok
      check passes even with a kernel address.
      Signed-off-by: default avatarWill Deacon <will.deacon@arm.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarCatalin Marinas <catalin.marinas@arm.com>
      
      (cherry picked from commit c1688707)
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      56fe47b7
    • ChiaHao's avatar
      arm64: Bug fix in stack alignment exception · 1b21d964
      ChiaHao authored
      The value of ESR has been stored into x1, and should be directly pass to
      do_sp_pc_abort function, "MOV x1, x25" is an extra operation and do_sp_pc_abort
      will get the wrong value of ESR.
      Signed-off-by: default avatarChiaHao <andy.jhshiu@gmail.com>
      Signed-off-by: default avatarCatalin Marinas <catalin.marinas@arm.com>
      Cc: <stable@vger.kernel.org>
      
      (cherry picked from commit 3906c2b5)
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      1b21d964
    • Alan Stern's avatar
      USB: EHCI: avoid BIOS handover on the HASEE E200 · 1d55e6ea
      Alan Stern authored
      Leandro Liptak reports that his HASEE E200 computer hangs when we ask
      the BIOS to hand over control of the EHCI host controller.  This
      definitely sounds like a bug in the BIOS, but at the moment there is
      no way to fix it.
      
      This patch works around the problem by avoiding the handoff whenever
      the motherboard and BIOS version match those of Leandro's computer.
      Signed-off-by: default avatarAlan Stern <stern@rowland.harvard.edu>
      Reported-by: default avatarLeandro Liptak <leandroliptak@gmail.com>
      Tested-by: default avatarLeandro Liptak <leandroliptak@gmail.com>
      CC: <stable@vger.kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      
      (cherry picked from commit b0a50e92)
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      1d55e6ea
    • Greg Kroah-Hartman's avatar
      Revert "uio: fix vma io range check in mmap" · b437ce34
      Greg Kroah-Hartman authored
      This reverts commit ddb09754.
      
      Linus objected to this originally, I can see why it might be needed, but
      given that no one spoke up defending this patch, I'm going to revert it.
      
      If you have hardware that requires this change, please speak up in the
      future and defend the patch.
      Reported-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Cc: Bin Wang <binw@marvell.com>
      Cc: Nobuhiro Iwamatsu <nobuhiro.iwamatsu.yj@renesas.com>
      Cc: Norbert Ciosek <norbertciosek@gmail.com>
      Cc: stable <stable@vger.kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      
      (cherry picked from commit b29f680c)
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      b437ce34
    • Dan Carpenter's avatar
      iio: adc: at91: signedness bug in at91_adc_get_trigger_value_by_name() · 864bc733
      Dan Carpenter authored
      at91_adc_get_trigger_value_by_name() was returning -ENOMEM truncated to
      a positive u8 and that doesn't work.  I've changed it to int and
      refactored it to preserve the error code.
      Signed-off-by: default avatarDan Carpenter <dan.carpenter@oracle.com>
      Acked-by: default avatarAlexandre Belloni <alexandre.belloni@free-electrons.com>
      Tested-by: default avatarAlexandre Belloni <alexandre.belloni@free-electrons.com>
      Signed-off-by: default avatarJonathan Cameron <jic23@kernel.org>
      Cc: Stable@vger.kernel.org
      
      (cherry picked from commit 4f3bcd87)
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      864bc733
    • Mario Schuknecht's avatar
      staging: iio: tsl2x7x_core: fix proximity treshold · 28fb0ff7
      Mario Schuknecht authored
      Consider high byte of proximity min and max treshold in function
      'tsl2x7x_chip_on'. So far, the high byte was not set.
      
      Cc: Stable@vger.kernel.org
      Signed-off-by: default avatarMario Schuknecht <mario.schuknecht@dresearch-fe.de>
      Signed-off-by: default avatarJonathan Cameron <jic23@kernel.org>
      
      (cherry picked from commit c404618c)
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      28fb0ff7
    • Dmitry Kasatkin's avatar
      ima: introduce ima_kernel_read() · 875e5abe
      Dmitry Kasatkin authored
      Commit 8aac6270 "move exit_task_namespaces() outside of exit_notify"
      introduced the kernel opps since the kernel v3.10, which happens when
      Apparmor and IMA-appraisal are enabled at the same time.
      
      ----------------------------------------------------------------------
      [  106.750167] BUG: unable to handle kernel NULL pointer dereference at
      0000000000000018
      [  106.750221] IP: [<ffffffff811ec7da>] our_mnt+0x1a/0x30
      [  106.750241] PGD 0
      [  106.750254] Oops: 0000 [#1] SMP
      [  106.750272] Modules linked in: cuse parport_pc ppdev bnep rfcomm
      bluetooth rpcsec_gss_krb5 nfsd auth_rpcgss nfs_acl nfs lockd sunrpc
      fscache dm_crypt intel_rapl x86_pkg_temp_thermal intel_powerclamp
      kvm_intel snd_hda_codec_hdmi kvm crct10dif_pclmul crc32_pclmul
      ghash_clmulni_intel aesni_intel aes_x86_64 glue_helper lrw gf128mul
      ablk_helper cryptd snd_hda_codec_realtek dcdbas snd_hda_intel
      snd_hda_codec snd_hwdep snd_pcm snd_page_alloc snd_seq_midi
      snd_seq_midi_event snd_rawmidi psmouse snd_seq microcode serio_raw
      snd_timer snd_seq_device snd soundcore video lpc_ich coretemp mac_hid lp
      parport mei_me mei nbd hid_generic e1000e usbhid ahci ptp hid libahci
      pps_core
      [  106.750658] CPU: 6 PID: 1394 Comm: mysqld Not tainted 3.13.0-rc7-kds+ #15
      [  106.750673] Hardware name: Dell Inc. OptiPlex 9010/0M9KCM, BIOS A08
      09/19/2012
      [  106.750689] task: ffff8800de804920 ti: ffff880400fca000 task.ti:
      ffff880400fca000
      [  106.750704] RIP: 0010:[<ffffffff811ec7da>]  [<ffffffff811ec7da>]
      our_mnt+0x1a/0x30
      [  106.750725] RSP: 0018:ffff880400fcba60  EFLAGS: 00010286
      [  106.750738] RAX: 0000000000000000 RBX: 0000000000000100 RCX:
      ffff8800d51523e7
      [  106.750764] RDX: ffffffffffffffea RSI: ffff880400fcba34 RDI:
      ffff880402d20020
      [  106.750791] RBP: ffff880400fcbae0 R08: 0000000000000000 R09:
      0000000000000001
      [  106.750817] R10: 0000000000000000 R11: 0000000000000001 R12:
      ffff8800d5152300
      [  106.750844] R13: ffff8803eb8df510 R14: ffff880400fcbb28 R15:
      ffff8800d51523e7
      [  106.750871] FS:  0000000000000000(0000) GS:ffff88040d200000(0000)
      knlGS:0000000000000000
      [  106.750910] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  106.750935] CR2: 0000000000000018 CR3: 0000000001c0e000 CR4:
      00000000001407e0
      [  106.750962] Stack:
      [  106.750981]  ffffffff813434eb ffff880400fcbb20 ffff880400fcbb18
      0000000000000000
      [  106.751037]  ffff8800de804920 ffffffff8101b9b9 0001800000000000
      0000000000000100
      [  106.751093]  0000010000000000 0000000000000002 000000000000000e
      ffff8803eb8df500
      [  106.751149] Call Trace:
      [  106.751172]  [<ffffffff813434eb>] ? aa_path_name+0x2ab/0x430
      [  106.751199]  [<ffffffff8101b9b9>] ? sched_clock+0x9/0x10
      [  106.751225]  [<ffffffff8134a68d>] aa_path_perm+0x7d/0x170
      [  106.751250]  [<ffffffff8101b945>] ? native_sched_clock+0x15/0x80
      [  106.751276]  [<ffffffff8134aa73>] aa_file_perm+0x33/0x40
      [  106.751301]  [<ffffffff81348c5e>] common_file_perm+0x8e/0xb0
      [  106.751327]  [<ffffffff81348d78>] apparmor_file_permission+0x18/0x20
      [  106.751355]  [<ffffffff8130c853>] security_file_permission+0x23/0xa0
      [  106.751382]  [<ffffffff811c77a2>] rw_verify_area+0x52/0xe0
      [  106.751407]  [<ffffffff811c789d>] vfs_read+0x6d/0x170
      [  106.751432]  [<ffffffff811cda31>] kernel_read+0x41/0x60
      [  106.751457]  [<ffffffff8134fd45>] ima_calc_file_hash+0x225/0x280
      [  106.751483]  [<ffffffff8134fb52>] ? ima_calc_file_hash+0x32/0x280
      [  106.751509]  [<ffffffff8135022d>] ima_collect_measurement+0x9d/0x160
      [  106.751536]  [<ffffffff810b552d>] ? trace_hardirqs_on+0xd/0x10
      [  106.751562]  [<ffffffff8134f07c>] ? ima_file_free+0x6c/0xd0
      [  106.751587]  [<ffffffff81352824>] ima_update_xattr+0x34/0x60
      [  106.751612]  [<ffffffff8134f0d0>] ima_file_free+0xc0/0xd0
      [  106.751637]  [<ffffffff811c9635>] __fput+0xd5/0x300
      [  106.751662]  [<ffffffff811c98ae>] ____fput+0xe/0x10
      [  106.751687]  [<ffffffff81086774>] task_work_run+0xc4/0xe0
      [  106.751712]  [<ffffffff81066fad>] do_exit+0x2bd/0xa90
      [  106.751738]  [<ffffffff8173c958>] ? retint_swapgs+0x13/0x1b
      [  106.751763]  [<ffffffff8106780c>] do_group_exit+0x4c/0xc0
      [  106.751788]  [<ffffffff81067894>] SyS_exit_group+0x14/0x20
      [  106.751814]  [<ffffffff8174522d>] system_call_fastpath+0x1a/0x1f
      [  106.751839] Code: c3 0f 1f 44 00 00 55 48 89 e5 e8 22 fe ff ff 5d c3
      0f 1f 44 00 00 55 65 48 8b 04 25 c0 c9 00 00 48 8b 80 28 06 00 00 48 89
      e5 5d <48> 8b 40 18 48 39 87 c0 00 00 00 0f 94 c0 c3 0f 1f 80 00 00 00
      [  106.752185] RIP  [<ffffffff811ec7da>] our_mnt+0x1a/0x30
      [  106.752214]  RSP <ffff880400fcba60>
      [  106.752236] CR2: 0000000000000018
      [  106.752258] ---[ end trace 3c520748b4732721 ]---
      ----------------------------------------------------------------------
      
      The reason for the oops is that IMA-appraisal uses "kernel_read()" when
      file is closed. kernel_read() honors LSM security hook which calls
      Apparmor handler, which uses current->nsproxy->mnt_ns. The 'guilty'
      commit changed the order of cleanup code so that nsproxy->mnt_ns was
      not already available for Apparmor.
      
      Discussion about the issue with Al Viro and Eric W. Biederman suggested
      that kernel_read() is too high-level for IMA. Another issue, except
      security checking, that was identified is mandatory locking. kernel_read
      honors it as well and it might prevent IMA from calculating necessary hash.
      It was suggested to use simplified version of the function without security
      and locking checks.
      
      This patch introduces special version ima_kernel_read(), which skips security
      and mandatory locking checking. It prevents the kernel oops to happen.
      Signed-off-by: default avatarDmitry Kasatkin <d.kasatkin@samsung.com>
      Suggested-by: default avatarEric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: default avatarMimi Zohar <zohar@linux.vnet.ibm.com>
      Cc: <stable@vger.kernel.org>
      
      (cherry picked from commit 0430e49b)
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      875e5abe
    • Mimi Zohar's avatar
      evm: prohibit userspace writing 'security.evm' HMAC value · 71351849
      Mimi Zohar authored
      Calculating the 'security.evm' HMAC value requires access to the
      EVM encrypted key.  Only the kernel should have access to it.  This
      patch prevents userspace tools(eg. setfattr, cp --preserve=xattr)
      from setting/modifying the 'security.evm' HMAC value directly.
      Signed-off-by: default avatarMimi Zohar <zohar@linux.vnet.ibm.com>
      Cc: <stable@vger.kernel.org>
      
      (cherry picked from commit 2fb1c9a4)
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      71351849
    • Wang, Xiaoming's avatar
      ALSA: compress: Cancel the optimization of compiler and fix the size of struct for all platform. · feb8b472
      Wang, Xiaoming authored
      Cancel the optimization of compiler for struct snd_compr_avail
      which size will be 0x1c in 32bit kernel while 0x20 in 64bit
      kernel under the optimizer. That will make compaction between
      32bit and 64bit. So add packed to fix the size of struct
      snd_compr_avail to 0x1c for all platform.
      Signed-off-by: default avatarZhang Dongxing <dongxing.zhang@intel.com>
      Signed-off-by: default avatarxiaoming wang <xiaoming.wang@intel.com>
      Acked-by: default avatarVinod Koul <vinod.koul@intel.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarTakashi Iwai <tiwai@suse.de>
      
      (cherry picked from commit 2bd0ae46)
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      feb8b472
    • Lukas Czerner's avatar
      dm thin: update discard_granularity to reflect the thin-pool blocksize · 40c17691
      Lukas Czerner authored
      DM thinp already checks whether the discard_granularity of the data
      device is a factor of the thin-pool block size.  But when using the
      dm-thin-pool's discard passdown support, DM thinp was not selecting the
      max of the underlying data device's discard_granularity and the
      thin-pool's block size.
      
      Update set_discard_limits() to set discard_granularity to the max of
      these values.  This enables blkdev_issue_discard() to properly align the
      discards that are sent to the DM thin device on a full block boundary.
      As such each discard will now cover an entire DM thin-pool block and the
      block will be reclaimed.
      Reported-by: default avatarZdenek Kabelac <zkabelac@redhat.com>
      Signed-off-by: default avatarLukas Czerner <lczerner@redhat.com>
      Signed-off-by: default avatarMike Snitzer <snitzer@redhat.com>
      Cc: stable@vger.kernel.org
      
      (cherry picked from commit 09869de5)
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      40c17691
    • Michael Neuling's avatar
      powerpc: Don't setup CPUs with bad status · b6b27e70
      Michael Neuling authored
      OPAL will mark a CPU that is guarded as "bad" in the status property of the CPU
      node.
      
      Unfortunatley Linux doesn't check this property and will put the bad CPU in the
      present map.  This has caused hangs on booting when we try to unsplit the core.
      
      This patch checks the CPU is avaliable via this status property before putting
      it in the present map.
      Signed-off-by: default avatarMichael Neuling <mikey@neuling.org>
      Tested-by: default avatarAnton Blanchard <anton@samba.org>
      cc: stable@vger.kernel.org
      Signed-off-by: default avatarBenjamin Herrenschmidt <benh@kernel.crashing.org>
      
      (cherry picked from commit 59a53afe)
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      b6b27e70
    • Viresh Kumar's avatar
      watchdog: sp805: Set watchdog_device->timeout from ->set_timeout() · f9cd885a
      Viresh Kumar authored
      Implementation of ->set_timeout() is supposed to set 'timeout' field of 'struct
      watchdog_device' passed to it. sp805 was rather setting this in a local
      variable. Fix it.
      Reported-by: default avatarArun Ramamurthy <arun.ramamurthy@broadcom.com>
      Signed-off-by: default avatarViresh Kumar <viresh.kumar@linaro.org>
      Reviewed-by: default avatarGuenter Roeck <linux@roeck-us.net>
      Signed-off-by: default avatarWim Van Sebroeck <wim@iguana.be>
      Cc: <stable@vger.kernel.org> # 2.6.36+
      
      (cherry picked from commit 938626d9)
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      f9cd885a
    • Gabor Juhos's avatar
      watchdog: ath79_wdt: avoid spurious restarts on AR934x · 435b04a6
      Gabor Juhos authored
      On some AR934x based systems, where the frequency of
      the AHB bus is relatively high, the built-in watchdog
      causes a spurious restart when it gets enabled.
      
      The possible cause of these restarts is that the timeout
      value written into the TIMER register does not reaches
      the hardware in time.
      
      Add an explicit delay into the ath79_wdt_enable function
      to avoid the spurious restarts.
      Signed-off-by: default avatarGabor Juhos <juhosg@openwrt.org>
      Reviewed-by: default avatarGuenter Roeck <linux@roeck-us.net>
      Signed-off-by: default avatarWim Van Sebroeck <wim@iguana.be>
      Cc: <stable@vger.kernel.org>
      
      (cherry picked from commit 23afeb61)
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      435b04a6
    • Andy Lutomirski's avatar
      auditsc: audit_krule mask accesses need bounds checking · bab0ad3e
      Andy Lutomirski authored
      Fixes an easy DoS and possible information disclosure.
      
      This does nothing about the broken state of x32 auditing.
      
      eparis: If the admin has enabled auditd and has specifically loaded
      audit rules.  This bug has been around since before git.  Wow...
      
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarAndy Lutomirski <luto@amacapital.net>
      Signed-off-by: default avatarEric Paris <eparis@redhat.com>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      
      (cherry picked from commit a3c54931)
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      bab0ad3e
    • Mateusz Guzik's avatar
      NFS: populate ->net in mount data when remounting · 8a8dea4c
      Mateusz Guzik authored
      Otherwise the kernel oopses when remounting with IPv6 server because
      net is dereferenced in dev_get_by_name.
      
      Use net ns of current thread so that dev_get_by_name does not operate on
      foreign ns. Changing the address is prohibited anyway so this should not
      affect anything.
      Signed-off-by: default avatarMateusz Guzik <mguzik@redhat.com>
      Cc: linux-nfs@vger.kernel.org
      Cc: linux-kernel@vger.kernel.org
      Cc: stable@vger.kernel.org # 3.4+
      Signed-off-by: default avatarTrond Myklebust <trond.myklebust@primarydata.com>
      
      (cherry picked from commit a914722f)
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      8a8dea4c
    • Chris Mason's avatar
      Btrfs: fix double free in find_lock_delalloc_range · 4aabd2a8
      Chris Mason authored
      We need to NULL the cached_state after freeing it, otherwise
      we might free it again if find_delalloc_range doesn't find anything.
      Signed-off-by: default avatarChris Mason <clm@fb.com>
      cc: stable@vger.kernel.org
      
      (cherry picked from commit 7d788742)
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      4aabd2a8
    • J. Bruce Fields's avatar
      nfsd4: fix FREE_STATEID lockowner leak · 9ab1c772
      J. Bruce Fields authored
      27b11428 ("nfsd4: remove lockowner when removing lock stateid")
      introduced a memory leak.
      
      Cc: stable@vger.kernel.org
      Reported-by: default avatarJeff Layton <jeff.layton@primarydata.com>
      Signed-off-by: default avatarJ. Bruce Fields <bfields@redhat.com>
      
      (cherry picked from commit 48385408)
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      9ab1c772
    • Hans de Goede's avatar
      Input: elantech - don't set bit 1 of reg_10 when the no_hw_res quirk is set · d70aacab
      Hans de Goede authored
      The touchpad on the GIGABYTE U2442 not only stops communicating when we try
      to set bit 3 (enable real hardware resolution) of reg_10, but on some BIOS
      versions also when we set bit 1 (enable two finger mode auto correct).
      
      I've asked the original reporter of:
      https://bugzilla.kernel.org/show_bug.cgi?id=61151
      
      To check that not setting bit 1 does not lead to any adverse effects on his
      model / BIOS revision, and it does not, so this commit fixes the touchpad
      not working on these versions by simply never setting bit 1 for laptop
      models with the no_hw_res quirk.
      Reported-and-tested-by: default avatarJames Lademann <jwlademann@gmail.com>
      Tested-by: default avatarPhilipp Wolfer <ph.wolfer@gmail.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarHans de Goede <hdegoede@redhat.com>
      Signed-off-by: default avatarDmitry Torokhov <dmitry.torokhov@gmail.com>
      
      (cherry picked from commit fb4f8f56)
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      d70aacab
    • Hans de Goede's avatar
      Input: elantech - deal with clickpads reporting right button events · 42f9e96d
      Hans de Goede authored
      At least the Dell Vostro 5470 elantech *clickpad* reports right button
      clicks when clicked in the right bottom area:
      
      https://bugzilla.redhat.com/show_bug.cgi?id=1103528
      
      This is different from how (elantech) clickpads normally operate, normally
      no matter where the user clicks on the pad the pad always reports a left
      button event, since there is only 1 hardware button beneath the path.
      
      It looks like Dell has put 2 buttons under the pad, one under each bottom
      corner, causing this.
      
      Since this however still clearly is a real clickpad hardware-wise, we still
      want to report it as such to userspace, so that things like finger movement
      in the bottom area can be properly ignored as it should be on clickpads.
      
      So deal with this weirdness by simply mapping a right click to a left click
      on elantech clickpads. As an added advantage this is something which we can
      simply do on all elantech clickpads, so no need to add special quirks for
      this weird model.
      Reported-and-tested-by: default avatarElder Marco <eldermarco@gmail.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarHans de Goede <hdegoede@redhat.com>
      Signed-off-by: default avatarDmitry Torokhov <dmitry.torokhov@gmail.com>
      
      (cherry picked from commit cd9e83e2)
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      42f9e96d
    • Lai Jiangshan's avatar
      idr: fix overflow bug during maximum ID calculation at maximum height · a0b340a3
      Lai Jiangshan authored
      idr_replace() open-codes the logic to calculate the maximum valid ID
      given the height of the idr tree; unfortunately, the open-coded logic
      doesn't account for the fact that the top layer may have unused slots
      and over-shifts the limit to zero when the tree is at its maximum
      height.
      
      The following test code shows it fails to replace the value for
      id=((1<<27)+42):
      
        static void test5(void)
        {
              int id;
              DEFINE_IDR(test_idr);
        #define TEST5_START ((1<<27)+42) /* use the highest layer */
      
              printk(KERN_INFO "Start test5\n");
              id = idr_alloc(&test_idr, (void *)1, TEST5_START, 0, GFP_KERNEL);
              BUG_ON(id != TEST5_START);
              TEST_BUG_ON(idr_replace(&test_idr, (void *)2, TEST5_START) != (void *)1);
              idr_destroy(&test_idr);
              printk(KERN_INFO "End of test5\n");
        }
      
      Fix the bug by using idr_max() which correctly takes into account the
      maximum allowed shift.
      
      sub_alloc() shares the same problem and may incorrectly fail with
      -EAGAIN; however, this bug doesn't affect correct operation because
      idr_get_empty_slot(), which already uses idr_max(), retries with the
      increased @id in such cases.
      
      [tj@kernel.org: Updated patch description.]
      Signed-off-by: default avatarLai Jiangshan <laijs@cn.fujitsu.com>
      Acked-by: default avatarTejun Heo <tj@kernel.org>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      
      (cherry picked from commit 3afb69cb)
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      a0b340a3
    • Matthew Dempsky's avatar
      ptrace: fix fork event messages across pid namespaces · 97ec9c38
      Matthew Dempsky authored
      When tracing a process in another pid namespace, it's important for fork
      event messages to contain the child's pid as seen from the tracer's pid
      namespace, not the parent's.  Otherwise, the tracer won't be able to
      correlate the fork event with later SIGTRAP signals it receives from the
      child.
      
      We still risk a race condition if a ptracer from a different pid
      namespace attaches after we compute the pid_t value.  However, sending a
      bogus fork event message in this unlikely scenario is still a vast
      improvement over the status quo where we always send bogus fork event
      messages to debuggers in a different pid namespace than the forking
      process.
      Signed-off-by: default avatarMatthew Dempsky <mdempsky@chromium.org>
      Acked-by: default avatarOleg Nesterov <oleg@redhat.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Julien Tinnes <jln@chromium.org>
      Cc: Roland McGrath <mcgrathr@chromium.org>
      Cc: Jan Kratochvil <jan.kratochvil@redhat.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      
      (cherry picked from commit 4e52365f)
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      97ec9c38
    • Boris BREZILLON's avatar
      rtc: rtc-at91rm9200: fix infinite wait for ACKUPD irq · 9f00f801
      Boris BREZILLON authored
      The rtc user must wait at least 1 sec between each time/calandar update
      (see atmel's datasheet chapter "Updating Time/Calendar").
      
      Use the 1Hz interrupt to update the at91_rtc_upd_rdy flag and wait for
      the at91_rtc_wait_upd_rdy event if the rtc is not ready.
      
      This patch fixes a deadlock in an uninterruptible wait when the RTC is
      updated more than once every second.  AFAICT the bug is here from the
      beginning, but I think we should at least backport this fix to 3.10 and
      the following longterm and stable releases.
      Signed-off-by: default avatarBoris BREZILLON <boris.brezillon@free-electrons.com>
      Reported-by: default avatarBryan Evenson <bevenson@melinkcorp.com>
      Tested-by: default avatarBryan Evenson <bevenson@melinkcorp.com>
      Cc: Andrew Victor <linux@maxim.org.za>
      Cc: Nicolas Ferre <nicolas.ferre@atmel.com>
      Cc: Jean-Christophe Plagniol-Villard <plagnioj@jcrosoft.com>
      Cc: Alessandro Zummo <a.zummo@towertech.it>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      
      (cherry picked from commit 2fe121e1)
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      9f00f801
    • Johannes Weiner's avatar
      mm: vmscan: clear kswapd's special reclaim powers before exiting · dbf15567
      Johannes Weiner authored
      When kswapd exits, it can end up taking locks that were previously held
      by allocating tasks while they waited for reclaim.  Lockdep currently
      warns about this:
      
      On Wed, May 28, 2014 at 06:06:34PM +0800, Gu Zheng wrote:
      >  inconsistent {RECLAIM_FS-ON-W} -> {IN-RECLAIM_FS-R} usage.
      >  kswapd2/1151 [HC0[0]:SC0[0]:HE1:SE1] takes:
      >   (&sig->group_rwsem){+++++?}, at: exit_signals+0x24/0x130
      >  {RECLAIM_FS-ON-W} state was registered at:
      >     mark_held_locks+0xb9/0x140
      >     lockdep_trace_alloc+0x7a/0xe0
      >     kmem_cache_alloc_trace+0x37/0x240
      >     flex_array_alloc+0x99/0x1a0
      >     cgroup_attach_task+0x63/0x430
      >     attach_task_by_pid+0x210/0x280
      >     cgroup_procs_write+0x16/0x20
      >     cgroup_file_write+0x120/0x2c0
      >     vfs_write+0xc0/0x1f0
      >     SyS_write+0x4c/0xa0
      >     tracesys+0xdd/0xe2
      >  irq event stamp: 49
      >  hardirqs last  enabled at (49):  _raw_spin_unlock_irqrestore+0x36/0x70
      >  hardirqs last disabled at (48):  _raw_spin_lock_irqsave+0x2b/0xa0
      >  softirqs last  enabled at (0):  copy_process.part.24+0x627/0x15f0
      >  softirqs last disabled at (0):            (null)
      >
      >  other info that might help us debug this:
      >   Possible unsafe locking scenario:
      >
      >         CPU0
      >         ----
      >    lock(&sig->group_rwsem);
      >    <Interrupt>
      >      lock(&sig->group_rwsem);
      >
      >   *** DEADLOCK ***
      >
      >  no locks held by kswapd2/1151.
      >
      >  stack backtrace:
      >  CPU: 30 PID: 1151 Comm: kswapd2 Not tainted 3.10.39+ #4
      >  Call Trace:
      >    dump_stack+0x19/0x1b
      >    print_usage_bug+0x1f7/0x208
      >    mark_lock+0x21d/0x2a0
      >    __lock_acquire+0x52a/0xb60
      >    lock_acquire+0xa2/0x140
      >    down_read+0x51/0xa0
      >    exit_signals+0x24/0x130
      >    do_exit+0xb5/0xa50
      >    kthread+0xdb/0x100
      >    ret_from_fork+0x7c/0xb0
      
      This is because the kswapd thread is still marked as a reclaimer at the
      time of exit.  But because it is exiting, nobody is actually waiting on
      it to make reclaim progress anymore, and it's nothing but a regular
      thread at this point.  Be tidy and strip it of all its powers
      (PF_MEMALLOC, PF_SWAPWRITE, PF_KSWAPD, and the lockdep reclaim state)
      before returning from the thread function.
      Signed-off-by: default avatarJohannes Weiner <hannes@cmpxchg.org>
      Reported-by: default avatarGu Zheng <guz.fnst@cn.fujitsu.com>
      Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Cc: Tang Chen <tangchen@cn.fujitsu.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      
      (cherry picked from commit 71abdc15)
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      dbf15567
    • Bart Van Assche's avatar
      IB/umad: Fix use-after-free on close · 2a07b56b
      Bart Van Assche authored
      Avoid that closing /dev/infiniband/umad<n> or /dev/infiniband/issm<n>
      triggers a use-after-free.  __fput() invokes f_op->release() before it
      invokes cdev_put().  Make sure that the ib_umad_device structure is
      freed by the cdev_put() call instead of f_op->release().  This avoids
      that changing the port mode from IB into Ethernet and back to IB
      followed by restarting opensmd triggers the following kernel oops:
      
          general protection fault: 0000 [#1] PREEMPT SMP
          RIP: 0010:[<ffffffff810cc65c>]  [<ffffffff810cc65c>] module_put+0x2c/0x170
          Call Trace:
           [<ffffffff81190f20>] cdev_put+0x20/0x30
           [<ffffffff8118e2ce>] __fput+0x1ae/0x1f0
           [<ffffffff8118e35e>] ____fput+0xe/0x10
           [<ffffffff810723bc>] task_work_run+0xac/0xe0
           [<ffffffff81002a9f>] do_notify_resume+0x9f/0xc0
           [<ffffffff814b8398>] int_signal+0x12/0x17
      
      Reference: https://bugzilla.kernel.org/show_bug.cgi?id=75051Signed-off-by: default avatarBart Van Assche <bvanassche@acm.org>
      Reviewed-by: default avatarYann Droneaud <ydroneaud@opteya.com>
      Cc: <stable@vger.kernel.org> # 3.x: 8ec0a0e6: IB/umad: Fix error handling
      Signed-off-by: default avatarRoland Dreier <roland@purestorage.com>
      
      (cherry picked from commit 60e1751c)
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      2a07b56b
    • Nicholas Bellinger's avatar
      iscsi-target: Reject mutual authentication with reflected CHAP_C · 204dc41e
      Nicholas Bellinger authored
      This patch adds an explicit check in chap_server_compute_md5() to ensure
      the CHAP_C value received from the initiator during mutual authentication
      does not match the original CHAP_C provided by the target.
      
      This is in line with RFC-3720, section 8.2.1:
      
         Originators MUST NOT reuse the CHAP challenge sent by the Responder
         for the other direction of a bidirectional authentication.
         Responders MUST check for this condition and close the iSCSI TCP
         connection if it occurs.
      Reported-by: default avatarTejas Vaykole <tejas.vaykole@calsoftinc.com>
      Cc: stable@vger.kernel.org # 3.1+
      Signed-off-by: default avatarNicholas Bellinger <nab@linux-iscsi.org>
      
      (cherry picked from commit 1d2b60a5)
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      204dc41e
    • Kailang Yang's avatar
      ALSA: hda/realtek - Add support of ALC891 codec · 796ac613
      Kailang Yang authored
      New codec support for ALC891.
      Signed-off-by: default avatarKailang Yang <kailang@realtek.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarTakashi Iwai <tiwai@suse.de>
      
      (cherry picked from commit b6c5fbad)
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      796ac613
    • Anton Blanchard's avatar
      powerpc: 64bit sendfile is capped at 2GB · 8651fe5d
      Anton Blanchard authored
      commit 8f9c0119 (compat: fs: Generic compat_sys_sendfile
      implementation) changed the PowerPC 64bit sendfile call from
      sys_sendile64 to sys_sendfile.
      
      Unfortunately this broke sendfile of lengths greater than 2G because
      sys_sendfile caps at MAX_NON_LFS. Restore what we had previously which
      fixes the bug.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarAnton Blanchard <anton@samba.org>
      Signed-off-by: default avatarBenjamin Herrenschmidt <benh@kernel.crashing.org>
      
      (cherry picked from commit 5d73320a)
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      8651fe5d
    • Benjamin Herrenschmidt's avatar
      powerpc/serial: Use saner flags when creating legacy ports · b2f3a242
      Benjamin Herrenschmidt authored
      We had a mix & match of flags used when creating legacy ports
      depending on where we found them in the device-tree. Among others
      we were missing UPF_SKIP_TEST for some kind of ISA ports which is
      a problem as quite a few UARTs out there don't support the loopback
      test (such as a lot of BMCs).
      
      Let's pick the set of flags used by the SoC code and generalize it
      which means autoconf, no loopback test, irq maybe shared and fixed
      port.
      
      Sending to stable as the lack of UPF_SKIP_TEST is breaking
      serial on some machines so I want this back into distros
      Signed-off-by: default avatarBenjamin Herrenschmidt <benh@kernel.crashing.org>
      CC: stable@vger.kernel.org
      
      (cherry picked from commit c4cad90f)
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      b2f3a242
    • Tony Luck's avatar
      mm/memory-failure.c: don't let collect_procs() skip over processes for MF_ACTION_REQUIRED · 627f9634
      Tony Luck authored
      When Linux sees an "action optional" machine check (where h/w has reported
      an error that is not in the current execution path) we generally do not
      want to signal a process, since most processes do not have a SIGBUS
      handler - we'd just prematurely terminate the process for a problem that
      they might never actually see.
      
      task_early_kill() decides whether to consider a process - and it checks
      whether this specific process has been marked for early signals with
      "prctl", or if the system administrator has requested early signals for
      all processes using /proc/sys/vm/memory_failure_early_kill.
      
      But for MF_ACTION_REQUIRED case we must not defer.  The error is in the
      execution path of the current thread so we must send the SIGBUS
      immediatley.
      
      Fix by passing a flag argument through collect_procs*() to
      task_early_kill() so it knows whether we can defer or must take action.
      Signed-off-by: default avatarTony Luck <tony.luck@intel.com>
      Signed-off-by: default avatarNaoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Chen Gong <gong.chen@linux.jf.intel.com>
      Cc: <stable@vger.kernel.org>	[3.2+]
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      
      (cherry picked from commit 74614de1)
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      627f9634
    • Tony Luck's avatar
      mm/memory-failure.c-failure: send right signal code to correct thread · 2f051a36
      Tony Luck authored
      When a thread in a multi-threaded application hits a machine check because
      of an uncorrectable error in memory - we want to send the SIGBUS with
      si.si_code = BUS_MCEERR_AR to that thread.  Currently we fail to do that
      if the active thread is not the primary thread in the process.
      collect_procs() just finds primary threads and this test:
      
      	if ((flags & MF_ACTION_REQUIRED) && t == current) {
      
      will see that the thread we found isn't the current thread and so send a
      si.si_code = BUS_MCEERR_AO to the primary (and nothing to the active
      thread at this time).
      
      We can fix this by checking whether "current" shares the same mm with the
      process that collect_procs() said owned the page.  If so, we send the
      SIGBUS to current (with code BUS_MCEERR_AR).
      Signed-off-by: default avatarTony Luck <tony.luck@intel.com>
      Signed-off-by: default avatarNaoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Reported-by: default avatarOtto Bruggeman <otto.g.bruggeman@intel.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Chen Gong <gong.chen@linux.jf.intel.com>
      Cc: <stable@vger.kernel.org>	[3.2+]
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      
      (cherry picked from commit a70ffcac)
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      2f051a36
    • Mel Gorman's avatar
      mm: page_alloc: use word-based accesses for get/set pageblock bitmaps · 45ba11d7
      Mel Gorman authored
      The test_bit operations in get/set pageblock flags are expensive.  This
      patch reads the bitmap on a word basis and use shifts and masks to isolate
      the bits of interest.  Similarly masks are used to set a local copy of the
      bitmap and then use cmpxchg to update the bitmap if there have been no
      other changes made in parallel.
      
      In a test running dd onto tmpfs the overhead of the pageblock-related
      functions went from 1.27% in profiles to 0.5%.
      
      In addition to the performance benefits, this patch closes races that are
      possible between:
      
      a) get_ and set_pageblock_migratetype(), where get_pageblock_migratetype()
         reads part of the bits before and other part of the bits after
         set_pageblock_migratetype() has updated them.
      
      b) set_pageblock_migratetype() and set_pageblock_skip(), where the non-atomic
         read-modify-update set bit operation in set_pageblock_skip() will cause
         lost updates to some bits changed in the set_pageblock_migratetype().
      
      Joonsoo Kim first reported the case a) via code inspection.  Vlastimil
      Babka's testing with a debug patch showed that either a) or b) occurs
      roughly once per mmtests' stress-highalloc benchmark (although not
      necessarily in the same pageblock).  Furthermore during development of
      unrelated compaction patches, it was observed that frequent calls to
      {start,undo}_isolate_page_range() the race occurs several thousands of
      times and has resulted in NULL pointer dereferences in move_freepages()
      and free_one_page() in places where free_list[migratetype] is
      manipulated by e.g.  list_move().  Further debugging confirmed that
      migratetype had invalid value of 6, causing out of bounds access to the
      free_list array.
      
      That confirmed that the race exist, although it may be extremely rare,
      and currently only fatal where page isolation is performed due to
      memory hot remove.  Races on pageblocks being updated by
      set_pageblock_migratetype(), where both old and new migratetype are
      lower MIGRATE_RESERVE, currently cannot result in an invalid value
      being observed, although theoretically they may still lead to
      unexpected creation or destruction of MIGRATE_RESERVE pageblocks.
      Furthermore, things could get suddenly worse when memory isolation is
      used more, or when new migratetypes are added.
      
      After this patch, the race has no longer been observed in testing.
      Signed-off-by: default avatarMel Gorman <mgorman@suse.de>
      Acked-by: default avatarVlastimil Babka <vbabka@suse.cz>
      Reported-by: default avatarJoonsoo Kim <iamjoonsoo.kim@lge.com>
      Reported-and-tested-by: default avatarVlastimil Babka <vbabka@suse.cz>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Theodore Ts'o <tytso@mit.edu>
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      
      (cherry picked from commit e58469ba)
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      45ba11d7
    • Michal Hocko's avatar
      memcg: do not hang on OOM when killed by userspace OOM access to memory reserves · b43ece61
      Michal Hocko authored
      Eric has reported that he can see task(s) stuck in memcg OOM handler
      regularly.  The only way out is to
      
      	echo 0 > $GROUP/memory.oom_control
      
      His usecase is:
      
      - Setup a hierarchy with memory and the freezer (disable kernel oom and
        have a process watch for oom).
      
      - In that memory cgroup add a process with one thread per cpu.
      
      - In one thread slowly allocate once per second I think it is 16M of ram
        and mlock and dirty it (just to force the pages into ram and stay
        there).
      
      - When oom is achieved loop:
        * attempt to freeze all of the tasks.
        * if frozen send every task SIGKILL, unfreeze, remove the directory in
          cgroupfs.
      
      Eric has then pinpointed the issue to be memcg specific.
      
      All tasks are sitting on the memcg_oom_waitq when memcg oom is disabled.
      Those that have received fatal signal will bypass the charge and should
      continue on their way out.  The tricky part is that the exit path might
      trigger a page fault (e.g.  exit_robust_list), thus the memcg charge,
      while its memcg is still under OOM because nobody has released any charges
      yet.
      
      Unlike with the in-kernel OOM handler the exiting task doesn't get
      TIF_MEMDIE set so it doesn't shortcut further charges of the killed task
      and falls to the memcg OOM again without any way out of it as there are no
      fatal signals pending anymore.
      
      This patch fixes the issue by checking PF_EXITING early in
      mem_cgroup_try_charge and bypass the charge same as if it had fatal
      signal pending or TIF_MEMDIE set.
      
      Normally exiting tasks (aka not killed) will bypass the charge now but
      this should be OK as the task is leaving and will release memory and
      increasing the memory pressure just to release it in a moment seems
      dubious wasting of cycles.  Besides that charges after exit_signals should
      be rare.
      
      I am bringing this patch again (rebased on the current mmotm tree). I
      hope we can move forward finally. If there is still an opposition then
      I would really appreciate a concurrent approach so that we can discuss
      alternatives.
      
      http://comments.gmane.org/gmane.linux.kernel.stable/77650 is a reference
      to the followup discussion when the patch has been dropped from the mmotm
      last time.
      Reported-by: default avatarEric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: default avatarMichal Hocko <mhocko@suse.cz>
      Acked-by: default avatarDavid Rientjes <rientjes@google.com>
      Acked-by: default avatarJohannes Weiner <hannes@cmpxchg.org>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      
      (cherry picked from commit d8dc595c)
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      b43ece61
    • Mel Gorman's avatar
      mm: vmscan: do not throttle based on pfmemalloc reserves if node has no ZONE_NORMAL · ab93cf1a
      Mel Gorman authored
      throttle_direct_reclaim() is meant to trigger during swap-over-network
      during which the min watermark is treated as a pfmemalloc reserve.  It
      throttes on the first node in the zonelist but this is flawed.
      
      The user-visible impact is that a process running on CPU whose local
      memory node has no ZONE_NORMAL will stall for prolonged periods of time,
      possibly indefintely.  This is due to throttle_direct_reclaim thinking the
      pfmemalloc reserves are depleted when in fact they don't exist on that
      node.
      
      On a NUMA machine running a 32-bit kernel (I know) allocation requests
      from CPUs on node 1 would detect no pfmemalloc reserves and the process
      gets throttled.  This patch adjusts throttling of direct reclaim to
      throttle based on the first node in the zonelist that has a usable
      ZONE_NORMAL or lower zone.
      
      [akpm@linux-foundation.org: coding-style fixes]
      Signed-off-by: default avatarMel Gorman <mgorman@suse.de>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      
      (cherry picked from commit 675becce)
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      ab93cf1a
    • Hugh Dickins's avatar
      mm: fix sleeping function warning from __put_anon_vma · 4a98fb1a
      Hugh Dickins authored
      Trinity reports BUG:
      
        sleeping function called from invalid context at kernel/locking/rwsem.c:47
        in_atomic(): 0, irqs_disabled(): 0, pid: 5787, name: trinity-c27
      
      __might_sleep < down_write < __put_anon_vma < page_get_anon_vma <
      migrate_pages < compact_zone < compact_zone_order < try_to_compact_pages ..
      
      Right, since conversion to mutex then rwsem, we should not put_anon_vma()
      from inside an rcu_read_lock()ed section: fix the two places that did so.
      And add might_sleep() to anon_vma_free(), as suggested by Peter Zijlstra.
      
      Fixes: 88c22088 ("mm: optimize page_lock_anon_vma() fast-path")
      Reported-by: default avatarDave Jones <davej@redhat.com>
      Signed-off-by: default avatarHugh Dickins <hughd@google.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      
      (cherry picked from commit 7f39dda9)
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      4a98fb1a