1. 14 Nov, 2014 40 commits
    • Michal Hocko's avatar
      OOM, PM: OOM killed task shouldn't escape PM suspend · e033782a
      Michal Hocko authored
      commit 5695be14 upstream.
      
      PM freezer relies on having all tasks frozen by the time devices are
      getting frozen so that no task will touch them while they are getting
      frozen. But OOM killer is allowed to kill an already frozen task in
      order to handle OOM situtation. In order to protect from late wake ups
      OOM killer is disabled after all tasks are frozen. This, however, still
      keeps a window open when a killed task didn't manage to die by the time
      freeze_processes finishes.
      
      Reduce the race window by checking all tasks after OOM killer has been
      disabled. This is still not race free completely unfortunately because
      oom_killer_disable cannot stop an already ongoing OOM killer so a task
      might still wake up from the fridge and get killed without
      freeze_processes noticing. Full synchronization of OOM and freezer is,
      however, too heavy weight for this highly unlikely case.
      
      Introduce and check oom_kills counter which gets incremented early when
      the allocator enters __alloc_pages_may_oom path and only check all the
      tasks if the counter changes during the freezing attempt. The counter
      is updated so early to reduce the race window since allocator checked
      oom_killer_disabled which is set by PM-freezing code. A false positive
      will push the PM-freezer into a slow path but that is not a big deal.
      
      Changes since v1
      - push the re-check loop out of freeze_processes into
        check_frozen_processes and invert the condition to make the code more
        readable as per Rafael
      
      Fixes: f660daac (oom: thaw threads if oom killed thread is frozen before deferring)
      Signed-off-by: default avatarMichal Hocko <mhocko@suse.cz>
      Signed-off-by: default avatarRafael J. Wysocki <rafael.j.wysocki@intel.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      e033782a
    • Cong Wang's avatar
      freezer: Do not freeze tasks killed by OOM killer · f95ad6ed
      Cong Wang authored
      commit 51fae6da upstream.
      
      Since f660daac (oom: thaw threads if oom killed thread is frozen
      before deferring) OOM killer relies on being able to thaw a frozen task
      to handle OOM situation but a3201227 (freezer: make freezing() test
      freeze conditions in effect instead of TIF_FREEZE) has reorganized the
      code and stopped clearing freeze flag in __thaw_task. This means that
      the target task only wakes up and goes into the fridge again because the
      freezing condition hasn't changed for it. This reintroduces the bug
      fixed by f660daac.
      
      Fix the issue by checking for TIF_MEMDIE thread flag in
      freezing_slow_path and exclude the task from freezing completely. If a
      task was already frozen it would get woken by __thaw_task from OOM killer
      and get out of freezer after rechecking freezing().
      
      Changes since v1
      - put TIF_MEMDIE check into freezing_slowpath rather than in __refrigerator
        as per Oleg
      - return __thaw_task into oom_scan_process_thread because
        oom_kill_process will not wake task in the fridge because it is
        sleeping uninterruptible
      
      [mhocko@suse.cz: rewrote the changelog]
      Fixes: a3201227 (freezer: make freezing() test freeze conditions in effect instead of TIF_FREEZE)
      Signed-off-by: default avatarCong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: default avatarMichal Hocko <mhocko@suse.cz>
      Acked-by: default avatarOleg Nesterov <oleg@redhat.com>
      Signed-off-by: default avatarRafael J. Wysocki <rafael.j.wysocki@intel.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f95ad6ed
    • Jan Kara's avatar
      ext4: fix oops when loading block bitmap failed · 6ce75ebb
      Jan Kara authored
      commit 599a9b77 upstream.
      
      When we fail to load block bitmap in __ext4_new_inode() we will
      dereference NULL pointer in ext4_journal_get_write_access(). So check
      for error from ext4_read_block_bitmap().
      
      Coverity-id: 989065
      Signed-off-by: default avatarJan Kara <jack@suse.cz>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      6ce75ebb
    • Pali Rohár's avatar
      cpufreq: intel_pstate: Fix setting max_perf_pct in performance policy · 4bf70f9f
      Pali Rohár authored
      commit 36b4bed5 upstream.
      
      Code which changes policy to powersave changes also max_policy_pct based on
      max_freq. Code which change max_perf_pct has upper limit base on value
      max_policy_pct. When policy is changing from powersave back to performance
      then max_policy_pct is not changed. Which means that changing max_perf_pct is
      not possible to high values if max_freq was too low in powersave policy.
      
      Test case:
      
      $ cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_min_freq
      800000
      $ cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq
      3300000
      $ cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
      performance
      $ cat /sys/devices/system/cpu/intel_pstate/max_perf_pct
      100
      
      $ echo powersave > /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
      $ echo 800000 > /sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq
      $ echo 20 > /sys/devices/system/cpu/intel_pstate/max_perf_pct
      
      $ cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
      powersave
      $ cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq
      800000
      $ cat /sys/devices/system/cpu/intel_pstate/max_perf_pct
      20
      
      $ echo performance > /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
      $ echo 3300000 > /sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq
      $ echo 100 > /sys/devices/system/cpu/intel_pstate/max_perf_pct
      
      $ cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
      performance
      $ cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq
      3300000
      $ cat /sys/devices/system/cpu/intel_pstate/max_perf_pct
      24
      
      And now intel_pstate driver allows to set maximal value for max_perf_pct based
      on max_policy_pct which is 24 for previous powersave max_freq 800000.
      
      This patch will set default value for max_policy_pct when setting policy to
      performance so it will allow to set also max value for max_perf_pct.
      Signed-off-by: default avatarPali Rohár <pali.rohar@gmail.com>
      Acked-by: default avatarDirk Brandewie <dirk.j.brandewie@intel.com>
      Signed-off-by: default avatarRafael J. Wysocki <rafael.j.wysocki@intel.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      4bf70f9f
    • Jan Kara's avatar
      ext4: fix overflow when updating superblock backups after resize · 57ce0ed4
      Jan Kara authored
      commit 9378c676 upstream.
      
      When there are no meta block groups update_backups() will compute the
      backup block in 32-bit arithmetics thus possibly overflowing the block
      number and corrupting the filesystem. OTOH filesystems without meta
      block groups larger than 16 TB should be rare. Fix the problem by doing
      the counting in 64-bit arithmetics.
      
      Coverity-id: 741252
      Signed-off-by: default avatarJan Kara <jack@suse.cz>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      Reviewed-by: default avatarLukas Czerner <lczerner@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      57ce0ed4
    • Darrick J. Wong's avatar
      ext4: check s_chksum_driver when looking for bg csum presence · 209f5484
      Darrick J. Wong authored
      commit 813d32f9 upstream.
      
      Convert the ext4_has_group_desc_csum predicate to look for a checksum
      driver instead of the metadata_csum flag and change the bg checksum
      calculation function to look for GDT_CSUM before taking the crc16
      path.
      
      Without this patch, if we mount with ^uninit_bg,^metadata_csum and
      later metadata_csum gets turned on by accident, the block group
      checksum functions will incorrectly assume that checksumming is
      enabled (metadata_csum) but that crc16 should be used
      (!s_chksum_driver).  This is totally wrong, so fix the predicate
      and the checksum formula selection.
      
      (Granted, if the metadata_csum feature bit gets enabled on a live FS
      then something underhanded is going on, but we could at least avoid
      writing garbage into the on-disk fields.)
      Signed-off-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      Reviewed-by: default avatarDmitry Monakhov <dmonakhov@openvz.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      209f5484
    • Eric Sandeen's avatar
      ext4: fix reservation overflow in ext4_da_write_begin · 1f1ccdde
      Eric Sandeen authored
      commit 0ff8947f upstream.
      
      Delalloc write journal reservations only reserve 1 credit,
      to update the inode if necessary.  However, it may happen
      once in a filesystem's lifetime that a file will cross
      the 2G threshold, and require the LARGE_FILE feature to
      be set in the superblock as well, if it was not set already.
      
      This overruns the transaction reservation, and can be
      demonstrated simply on any ext4 filesystem without the LARGE_FILE
      feature already set:
      
      dd if=/dev/zero of=testfile bs=1 seek=2147483646 count=1 \
      	conv=notrunc of=testfile
      sync
      dd if=/dev/zero of=testfile bs=1 seek=2147483647 count=1 \
      	conv=notrunc of=testfile
      
      leads to:
      
      EXT4-fs: ext4_do_update_inode:4296: aborting transaction: error 28 in __ext4_handle_dirty_super
      EXT4-fs error (device loop0) in ext4_do_update_inode:4301: error 28
      EXT4-fs error (device loop0) in ext4_reserve_inode_write:4757: Readonly filesystem
      EXT4-fs error (device loop0) in ext4_dirty_inode:4876: error 28
      EXT4-fs error (device loop0) in ext4_da_write_end:2685: error 28
      
      Adjust the number of credits based on whether the flag is
      already set, and whether the current write may extend past the
      LARGE_FILE limit.
      Signed-off-by: default avatarEric Sandeen <sandeen@redhat.com>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      Reviewed-by: default avatarAndreas Dilger <adilger@dilger.ca>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      1f1ccdde
    • Theodore Ts'o's avatar
      ext4: add ext4_iget_normal() which is to be used for dir tree lookups · 65f25799
      Theodore Ts'o authored
      commit f4bb2981 upstream.
      
      If there is a corrupted file system which has directory entries that
      point at reserved, metadata inodes, prohibit them from being used by
      treating them the same way we treat Boot Loader inodes --- that is,
      mark them to be bad inodes.  This prohibits them from being opened,
      deleted, or modified via chmod, chown, utimes, etc.
      
      In particular, this prevents a corrupted file system which has a
      directory entry which points at the journal inode from being deleted
      and its blocks released, after which point Much Hilarity Ensues.
      Reported-by: default avatarSami Liedes <sami.liedes@iki.fi>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      65f25799
    • Dmitry Monakhov's avatar
      ext4: grab missed write_count for EXT4_IOC_SWAP_BOOT · 744539f1
      Dmitry Monakhov authored
      commit 3e67cfad upstream.
      
      Otherwise this provokes complain like follows:
      WARNING: CPU: 12 PID: 5795 at fs/ext4/ext4_jbd2.c:48 ext4_journal_check_start+0x4e/0xa0()
      Modules linked in: brd iTCO_wdt lpc_ich mfd_core igb ptp dm_mirror dm_region_hash dm_log dm_mod
      CPU: 12 PID: 5795 Comm: python Not tainted 3.17.0-rc2-00175-gae5344f #158
      Hardware name: Intel Corporation W2600CR/W2600CR, BIOS SE5C600.86B.99.99.x028.061320111235 06/13/2011
       0000000000000030 ffff8808116cfd28 ffffffff815c7dfc 0000000000000030
       0000000000000000 ffff8808116cfd68 ffffffff8106ce8c ffff8808116cfdc8
       ffff880813b16000 ffff880806ad6ae8 ffffffff81202008 0000000000000000
      Call Trace:
       [<ffffffff815c7dfc>] dump_stack+0x51/0x6d
       [<ffffffff8106ce8c>] warn_slowpath_common+0x8c/0xc0
       [<ffffffff81202008>] ? ext4_ioctl+0x9e8/0xeb0
       [<ffffffff8106ceda>] warn_slowpath_null+0x1a/0x20
       [<ffffffff8122867e>] ext4_journal_check_start+0x4e/0xa0
       [<ffffffff81228c10>] __ext4_journal_start_sb+0x90/0x110
       [<ffffffff81202008>] ext4_ioctl+0x9e8/0xeb0
       [<ffffffff8107b0bd>] ? ptrace_stop+0x24d/0x2f0
       [<ffffffff81088530>] ? alloc_pid+0x480/0x480
       [<ffffffff8107b1f2>] ? ptrace_do_notify+0x92/0xb0
       [<ffffffff81186545>] do_vfs_ioctl+0x4e5/0x550
       [<ffffffff815cdbcb>] ? _raw_spin_unlock_irq+0x2b/0x40
       [<ffffffff81186603>] SyS_ioctl+0x53/0x80
       [<ffffffff815ce2ce>] tracesys+0xd0/0xd5
      Reviewed-by: default avatarJan Kara <jack@suse.cz>
      Signed-off-by: default avatarDmitry Monakhov <dmonakhov@openvz.org>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      744539f1
    • Jan Kara's avatar
      ext4: don't check quota format when there are no quota files · b69805f8
      Jan Kara authored
      commit 279bf6d3 upstream.
      
      The check whether quota format is set even though there are no
      quota files with journalled quota is pointless and it actually
      makes it impossible to turn off journalled quotas (as there's
      no way to unset journalled quota format). Just remove the check.
      Signed-off-by: default avatarJan Kara <jack@suse.cz>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      b69805f8
    • Darrick J. Wong's avatar
      ext4: check EA value offset when loading · cfcc2239
      Darrick J. Wong authored
      commit a0626e75 upstream.
      
      When loading extended attributes, check each entry's value offset to
      make sure it doesn't collide with the entries.
      
      Without this check it is easy to crash the kernel by mounting a
      malicious FS containing a file with an EA wherein e_value_offs = 0 and
      e_value_size > 0 and then deleting the EA, which corrupts the name
      list.
      
      (See the f_ea_value_crash test's FS image in e2fsprogs for an example.)
      Signed-off-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      cfcc2239
    • Darrick J. Wong's avatar
      jbd2: free bh when descriptor block checksum fails · c38e36f1
      Darrick J. Wong authored
      commit 064d8389 upstream.
      
      Free the buffer head if the journal descriptor block fails checksum
      verification.
      
      This is the jbd2 port of the e2fsprogs patch "e2fsck: free bh on csum
      verify error in do_one_pass".
      Signed-off-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      Reviewed-by: default avatarEric Sandeen <sandeen@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      c38e36f1
    • David Daney's avatar
      MIPS: tlbex: Properly fix HUGE TLB Refill exception handler · b0de6ef3
      David Daney authored
      commit 9e0f162a upstream.
      
      In commit 8393c524 (MIPS: tlbex: Fix a missing statement for
      HUGETLB), the TLB Refill handler was fixed so that non-OCTEON targets
      would work properly with huge pages.  The change was incorrect in that
      it broke the OCTEON case.
      
      The problem is shown here:
      
          xxx0:	df7a0000 	ld	k0,0(k1)
          .
          .
          .
          xxxc0:	df610000 	ld	at,0(k1)
          xxxc4:	335a0ff0 	andi	k0,k0,0xff0
          xxxc8:	e825ffcd 	bbit1	at,0x5,0x0
          xxxcc:	003ad82d 	daddu	k1,at,k0
          .
          .
          .
      
      In the non-octeon case there is a destructive test for the huge PTE
      bit, and then at 0, $k0 is reloaded (that is what the 8393c524
      patch added).
      
      In the octeon case, we modify k1 in the branch delay slot, but we
      never need k0 again, so the new load is not needed, but since k1 is
      modified, if we do the load, we load from a garbage location and then
      get a nested TLB Refill, which is seen in userspace as either SIGBUS
      or SIGSEGV (depending on the garbage).
      
      The real fix is to only do this reloading if it is needed, and never
      where it is harmful.
      Signed-off-by: default avatarDavid Daney <david.daney@cavium.com>
      Cc: Huacai Chen <chenhc@lemote.com>
      Cc: Fuxin Zhang <zhangfx@lemote.com>
      Cc: Zhangjin Wu <wuzhangjin@gmail.com>
      Cc: linux-mips@linux-mips.org
      Patchwork: https://patchwork.linux-mips.org/patch/8151/Signed-off-by: default avatarRalf Baechle <ralf@linux-mips.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      b0de6ef3
    • Nicholas Bellinger's avatar
      target: Fix APTPL metadata handling for dynamic MappedLUNs · ec0f40e8
      Nicholas Bellinger authored
      commit e2480563 upstream.
      
      This patch fixes a bug in handling of SPC-3 PR Activate Persistence
      across Target Power Loss (APTPL) logic where re-creation of state for
      MappedLUNs from dynamically generated NodeACLs did not occur during
      I_T Nexus establishment.
      
      It adds the missing core_scsi3_check_aptpl_registration() call during
      core_tpg_check_initiator_node_acl() -> core_tpg_add_node_to_devs() in
      order to replay any pre-loaded APTPL metadata state associated with
      the newly connected SCSI Initiator Port.
      
      Cc: Mike Christie <michaelc@cs.wisc.edu>
      Signed-off-by: default avatarNicholas Bellinger <nab@linux-iscsi.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      ec0f40e8
    • Quinn Tran's avatar
      target: Fix queue full status NULL pointer for SCF_TRANSPORT_TASK_SENSE · b036d398
      Quinn Tran authored
      commit 082f58ac upstream.
      
      During temporary resource starvation at lower transport layer, command
      is placed on queue full retry path, which expose this problem.  The TCM
      queue full handling of SCF_TRANSPORT_TASK_SENSE currently sends the same
      cmd twice to lower layer.  The 1st time led to cmd normal free path.
      The 2nd time cause Null pointer access.
      
      This regression bug was originally introduced v3.1-rc code in the
      following commit:
      
      commit e057f533
      Author: Christoph Hellwig <hch@infradead.org>
      Date:   Mon Oct 17 13:56:41 2011 -0400
      
          target: remove the transport_qf_callback se_cmd callback
      Signed-off-by: default avatarQuinn Tran <quinn.tran@qlogic.com>
      Signed-off-by: default avatarSaurav Kashyap <saurav.kashyap@qlogic.com>
      Signed-off-by: default avatarNicholas Bellinger <nab@linux-iscsi.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      b036d398
    • Joern Engel's avatar
      qla_target: don't delete changed nacls · e80395d5
      Joern Engel authored
      commit f4c24db1 upstream.
      
      The code is currently riddled with "drop the hardware_lock to avoid a
      deadlock" bugs that expose races.  One of those races seems to expose a
      valid warning in tcm_qla2xxx_clear_nacl_from_fcport_map.  Add some
      bandaid to it.
      Signed-off-by: default avatarJoern Engel <joern@logfs.org>
      Signed-off-by: default avatarNicholas Bellinger <nab@linux-iscsi.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      e80395d5
    • Anton Kolesov's avatar
      ARC: Update order of registers in KGDB to match GDB 7.5 · 73213e6c
      Anton Kolesov authored
      commit ebc0c74e upstream.
      
      Order of registers has changed in GDB moving from 6.8 to 7.5. This patch
      updates KGDB to work properly with GDB 7.5, though makes it incompatible
      with 6.8.
      Signed-off-by: default avatarAnton Kolesov <Anton.Kolesov@synopsys.com>
      Signed-off-by: default avatarVineet Gupta <vgupta@synopsys.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      73213e6c
    • Vineet Gupta's avatar
      ARC: [nsimosci] Allow "headless" models to boot · e27cd756
      Vineet Gupta authored
      commit 5c05483e upstream.
      
      There are certain test configuration of virtual platform which don't
      have any real console device (uart/pgu). So add tty0 as a fallback console
      device to allow system to boot and be accessible via telnet
      
      Otherwise with ttyS0 as only console, but 8250 disabled in kernel build,
      init chokes.
      Reported-by: default avatarAnton Kolesov <akolesov@synopsys.com>
      Signed-off-by: default avatarVineet Gupta <vgupta@synopsys.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      e27cd756
    • Nadav Amit's avatar
      KVM: x86: Emulator fixes for eip canonical checks on near branches · 0efb0baa
      Nadav Amit authored
      commit 234f3ce4 upstream.
      
      Before changing rip (during jmp, call, ret, etc.) the target should be asserted
      to be canonical one, as real CPUs do.  During sysret, both target rsp and rip
      should be canonical. If any of these values is noncanonical, a #GP exception
      should occur.  The exception to this rule are syscall and sysenter instructions
      in which the assigned rip is checked during the assignment to the relevant
      MSRs.
      
      This patch fixes the emulator to behave as real CPUs do for near branches.
      Far branches are handled by the next patch.
      
      This fixes CVE-2014-3647.
      Signed-off-by: default avatarNadav Amit <namit@cs.technion.ac.il>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      0efb0baa
    • Nadav Amit's avatar
      KVM: x86: Fix wrong masking on relative jump/call · d092975d
      Nadav Amit authored
      commit 05c83ec9 upstream.
      
      Relative jumps and calls do the masking according to the operand size, and not
      according to the address size as the KVM emulator does today.
      
      This patch fixes KVM behavior.
      Signed-off-by: default avatarNadav Amit <namit@cs.technion.ac.il>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      d092975d
    • Michael S. Tsirkin's avatar
      kvm: x86: don't kill guest on unknown exit reason · e56b9c47
      Michael S. Tsirkin authored
      commit 2bc19dc3 upstream.
      
      KVM_EXIT_UNKNOWN is a kvm bug, we don't really know whether it was
      triggered by a priveledged application.  Let's not kill the guest: WARN
      and inject #UD instead.
      Signed-off-by: default avatarMichael S. Tsirkin <mst@redhat.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      e56b9c47
    • Nadav Amit's avatar
      KVM: x86: Check non-canonical addresses upon WRMSR · ea306147
      Nadav Amit authored
      commit 854e8bb1 upstream.
      
      Upon WRMSR, the CPU should inject #GP if a non-canonical value (address) is
      written to certain MSRs. The behavior is "almost" identical for AMD and Intel
      (ignoring MSRs that are not implemented in either architecture since they would
      anyhow #GP). However, IA32_SYSENTER_ESP and IA32_SYSENTER_EIP cause #GP if
      non-canonical address is written on Intel but not on AMD (which ignores the top
      32-bits).
      
      Accordingly, this patch injects a #GP on the MSRs which behave identically on
      Intel and AMD.  To eliminate the differences between the architecutres, the
      value which is written to IA32_SYSENTER_ESP and IA32_SYSENTER_EIP is turned to
      canonical value before writing instead of injecting a #GP.
      
      Some references from Intel and AMD manuals:
      
      According to Intel SDM description of WRMSR instruction #GP is expected on
      WRMSR "If the source register contains a non-canonical address and ECX
      specifies one of the following MSRs: IA32_DS_AREA, IA32_FS_BASE, IA32_GS_BASE,
      IA32_KERNEL_GS_BASE, IA32_LSTAR, IA32_SYSENTER_EIP, IA32_SYSENTER_ESP."
      
      According to AMD manual instruction manual:
      LSTAR/CSTAR (SYSCALL): "The WRMSR instruction loads the target RIP into the
      LSTAR and CSTAR registers.  If an RIP written by WRMSR is not in canonical
      form, a general-protection exception (#GP) occurs."
      IA32_GS_BASE and IA32_FS_BASE (WRFSBASE/WRGSBASE): "The address written to the
      base field must be in canonical form or a #GP fault will occur."
      IA32_KERNEL_GS_BASE (SWAPGS): "The address stored in the KernelGSbase MSR must
      be in canonical form."
      
      This patch fixes CVE-2014-3610.
      Signed-off-by: default avatarNadav Amit <namit@cs.technion.ac.il>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      ea306147
    • Andy Honig's avatar
      KVM: x86: Improve thread safety in pit · ca09be78
      Andy Honig authored
      commit 2febc839 upstream.
      
      There's a race condition in the PIT emulation code in KVM.  In
      __kvm_migrate_pit_timer the pit_timer object is accessed without
      synchronization.  If the race condition occurs at the wrong time this
      can crash the host kernel.
      
      This fixes CVE-2014-3611.
      Signed-off-by: default avatarAndrew Honig <ahonig@google.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      ca09be78
    • Andy Honig's avatar
      KVM: x86: Prevent host from panicking on shared MSR writes. · 1bea37d6
      Andy Honig authored
      commit 8b3c3104 upstream.
      
      The previous patch blocked invalid writes directly when the MSR
      is written.  As a precaution, prevent future similar mistakes by
      gracefulling handle GPs caused by writes to shared MSRs.
      Signed-off-by: default avatarAndrew Honig <ahonig@google.com>
      [Remove parts obsoleted by Nadav's patch. - Paolo]
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      1bea37d6
    • Quentin Casasnovas's avatar
      kvm: fix excessive pages un-pinning in kvm_iommu_map error path. · 3ea61129
      Quentin Casasnovas authored
      commit 3d32e4db upstream.
      
      The third parameter of kvm_unpin_pages() when called from
      kvm_iommu_map_pages() is wrong, it should be the number of pages to un-pin
      and not the page size.
      
      This error was facilitated with an inconsistent API: kvm_pin_pages() takes
      a size, but kvn_unpin_pages() takes a number of pages, so fix the problem
      by matching the two.
      
      This was introduced by commit 350b8bdd ("kvm: iommu: fix the third parameter
      of kvm_iommu_put_pages (CVE-2014-3601)"), which fixes the lack of
      un-pinning for pages intended to be un-pinned (i.e. memory leak) but
      unfortunately potentially aggravated the number of pages we un-pin that
      should have stayed pinned. As far as I understand though, the same
      practical mitigations apply.
      
      This issue was found during review of Red Hat 6.6 patches to prepare
      Ksplice rebootless updates.
      
      Thanks to Vegard for his time on a late Friday evening to help me in
      understanding this code.
      
      Fixes: 350b8bdd ("kvm: iommu: fix the third parameter of... (CVE-2014-3601)")
      Signed-off-by: default avatarQuentin Casasnovas <quentin.casasnovas@oracle.com>
      Signed-off-by: default avatarVegard Nossum <vegard.nossum@oracle.com>
      Signed-off-by: default avatarJamie Iles <jamie.iles@oracle.com>
      Reviewed-by: default avatarSasha Levin <sasha.levin@oracle.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      3ea61129
    • Axel Lin's avatar
      media: tda7432: Fix setting TDA7432_MUTE bit for TDA7432_RF register · 6033d642
      Axel Lin authored
      commit 91ba0e59 upstream.
      
      Fix a copy-paste bug when converting to the control framework.
      
      Fixes: commit 5d478e0d ("[media] tda7432: convert to the control framework")
      Signed-off-by: default avatarAxel Lin <axel.lin@ingics.com>
      Signed-off-by: default avatarMauro Carvalho Chehab <m.chehab@samsung.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      6033d642
    • Ulrich Eckhardt's avatar
      media: ds3000: fix LNB supply voltage on Tevii S480 on initialization · f879c8cc
      Ulrich Eckhardt authored
      commit 8c5bcded upstream.
      
      The Tevii S480 outputs 18V on startup for the LNB supply voltage and does not
      automatically power down. This blocks other receivers connected
      to a satellite channel router (EN50494), since the receivers can not send the
      required DiSEqC sequences when the Tevii card is connected to a the same SCR.
      
      This patch switches off the LNB supply voltage on initialization of the frontend.
      
      [mchehab@osg.samsung.com: add a comment about why we're explicitly
       turning off voltage at device init]
      Signed-off-by: default avatarUlrich Eckhardt <uli@uli-eckhardt.de>
      Signed-off-by: default avatarMauro Carvalho Chehab <mchehab@osg.samsung.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f879c8cc
    • Frank Schaefer's avatar
      media: em28xx-v4l: give back all active video buffers to the vb2 core properly on streaming stop · 7dbdd901
      Frank Schaefer authored
      commit 627530c3 upstream.
      
      When a new video frame is started, the driver takes the next video buffer from
      the list of active buffers and moves it to dev->usb_ctl.vid_buf / dev->usb_ctl.vbi_buf
      for further processing.
      
      On streaming stop we currently only give back the pending buffers from the list
      but not the ones which are currently processed.
      
      This causes the following warning from the vb2 core since kernel 3.15:
      
      ...
       ------------[ cut here ]------------
       WARNING: CPU: 1 PID: 2284 at drivers/media/v4l2-core/videobuf2-core.c:2115 __vb2_queue_cancel+0xed/0x150 [videobuf2_core]()
       [...]
       Call Trace:
        [<c0769c46>] dump_stack+0x48/0x69
        [<c0245b69>] warn_slowpath_common+0x79/0x90
        [<f925e4ad>] ? __vb2_queue_cancel+0xed/0x150 [videobuf2_core]
        [<f925e4ad>] ? __vb2_queue_cancel+0xed/0x150 [videobuf2_core]
        [<c0245bfd>] warn_slowpath_null+0x1d/0x20
        [<f925e4ad>] __vb2_queue_cancel+0xed/0x150 [videobuf2_core]
        [<f925fa35>] vb2_internal_streamoff+0x35/0x90 [videobuf2_core]
        [<f925fac5>] vb2_streamoff+0x35/0x60 [videobuf2_core]
        [<f925fb27>] vb2_ioctl_streamoff+0x37/0x40 [videobuf2_core]
        [<f8e45895>] v4l_streamoff+0x15/0x20 [videodev]
        [<f8e4925d>] __video_do_ioctl+0x23d/0x2d0 [videodev]
        [<f8e49020>] ? video_ioctl2+0x20/0x20 [videodev]
        [<f8e48c63>] video_usercopy+0x203/0x5a0 [videodev]
        [<f8e49020>] ? video_ioctl2+0x20/0x20 [videodev]
        [<c039d0e7>] ? fsnotify+0x1e7/0x2b0
        [<f8e49012>] video_ioctl2+0x12/0x20 [videodev]
        [<f8e49020>] ? video_ioctl2+0x20/0x20 [videodev]
        [<f8e4461e>] v4l2_ioctl+0xee/0x130 [videodev]
        [<f8e44530>] ? v4l2_open+0xf0/0xf0 [videodev]
        [<c0378de2>] do_vfs_ioctl+0x2e2/0x4d0
        [<c0368eec>] ? vfs_write+0x13c/0x1c0
        [<c0369a8f>] ? vfs_writev+0x2f/0x50
        [<c0379028>] SyS_ioctl+0x58/0x80
        [<c076fff3>] sysenter_do_call+0x12/0x12
       ---[ end trace 5545f934409f13f4 ]---
      ...
      
      Many thanks to Hans Verkuil, whose recently added check in the vb2 core unveiled
      this long standing issue and who has investigated it further.
      Signed-off-by: default avatarFrank Schäfer <fschaefer.oss@googlemail.com>
      Signed-off-by: default avatarMauro Carvalho Chehab <m.chehab@samsung.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      7dbdd901
    • Maciej Matraszek's avatar
      media: v4l2-common: fix overflow in v4l_bound_align_image() · ed070066
      Maciej Matraszek authored
      commit 3bacc10c upstream.
      
      Fix clamp_align() used in v4l_bound_align_image() to prevent overflow
      when passed large value like UINT32_MAX.
      
       In the current implementation:
          clamp_align(UINT32_MAX, 8, 8192, 3)
      
      returns 8, because in line:
      
          x = (x + (1 << (align - 1))) & mask;
      
      x overflows to (-1 + 4) & 0x7 = 3, while expected value is 8192.
      
      v4l_bound_align_image() is heavily used in VIDIOC_S_FMT and
      VIDIOC_SUBDEV_S_FMT ioctls handlers, and documentation of the latter
      explicitly states that:
      
      "The modified format should be as close as possible to the original
      request."
        -- http://linuxtv.org/downloads/v4l-dvb-apis/vidioc-subdev-g-fmt.html
      
      Thus one would expect, that passing UINT32_MAX as format width and
      height will result in setting maximum possible resolution for the
      device. Particularly, when the driver doesn't support
      VIDIOC_ENUM_FRAMESIZES ioctl, which is common in the codebase.
      
      Fixes changeset: b0d3159bSigned-off-by: default avatarMaciej Matraszek <m.matraszek@samsung.com>
      Acked-by: default avatarSakari Ailus <sakari.ailus@linux.intel.com>
      Signed-off-by: default avatarMauro Carvalho Chehab <mchehab@osg.samsung.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      ed070066
    • Ben Skeggs's avatar
      drm/nouveau/bios: memset dcb struct to zero before parsing · 3b163cb4
      Ben Skeggs authored
      commit 595d373f upstream.
      
      Fixes type/mask calculation being based on uninitialised data for VGA
      outputs.
      Signed-off-by: default avatarBen Skeggs <bskeggs@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      3b163cb4
    • Ezequiel Garcia's avatar
      drm/tilcdc: Fix the error path in tilcdc_load() · a3cb4455
      Ezequiel Garcia authored
      commit b478e336 upstream.
      
      The current error path calls tilcdc_unload() in case of an error to release
      the resources. However, this is wrong because not all resources have been
      allocated by the time an error occurs in tilcdc_load().
      
      To fix it, this commit adds proper labels to bail out at the different
      stages in the load function, and release only the resources actually allocated.
      Tested-by: default avatarDarren Etheridge <detheridge@ti.com>
      Tested-by: default avatarJohannes Pointner <johannes.pointner@br-automation.com>
      Signed-off-by: default avatarEzequiel Garcia <ezequiel@vanguardiasur.com.ar>
      Signed-off-by: default avatarDave Airlie <airlied@redhat.com>
      Fixes: 3a490122 ("drm/tilcdc: panel: fix leak when unloading the module")
      Signed-off-by: default avatarMatwey V. Kornilov <matwey.kornilov@gmail.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      a3cb4455
    • Benjamin Herrenschmidt's avatar
      drm/ast: Fix HW cursor image · da1185cf
      Benjamin Herrenschmidt authored
      commit 1e99cfa8 upstream.
      
      The translation from the X driver to the KMS one typo'ed a couple
      of array indices, causing the HW cursor to look weird (blocky with
      leaking edge colors). This fixes it.
      Signed-off-by: default avatarBenjamin Herrenschmidt <benh@kernel.crashing.org>
      Signed-off-by: default avatarDave Airlie <airlied@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      da1185cf
    • Hans de Goede's avatar
      Input: i8042 - quirks for Fujitsu Lifebook A544 and Lifebook AH544 · 55a72275
      Hans de Goede authored
      commit 993b3a3f upstream.
      
      These models need i8042.notimeout, otherwise the touchpad will not work.
      
      BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=69731
      BugLink: https://bugzilla.redhat.com/show_bug.cgi?id=1111138Signed-off-by: default avatarHans de Goede <hdegoede@redhat.com>
      Signed-off-by: default avatarDmitry Torokhov <dmitry.torokhov@gmail.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      55a72275
    • Hans de Goede's avatar
      Input: i8042 - add noloop quirk for Asus X750LN · 3b8bb8fb
      Hans de Goede authored
      commit 9ff84a17 upstream.
      
      Without this the aux port does not get detected, and consequently the
      touchpad will not work.
      
      https://bugzilla.redhat.com/show_bug.cgi?id=1110011Signed-off-by: default avatarHans de Goede <hdegoede@redhat.com>
      Signed-off-by: default avatarDmitry Torokhov <dmitry.torokhov@gmail.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      3b8bb8fb
    • Mikulas Patocka's avatar
      framebuffer: fix border color · a276227f
      Mikulas Patocka authored
      commit f74a289b upstream.
      
      The framebuffer code uses the current background color to fill the border
      when switching consoles, however, this results in inconsistent behavior.
      For example:
      - start Midnigh Commander
      - the border is black
      - switch to another console and switch back
      - the border is cyan
      - type something into the command line in mc
      - the border is cyan
      - switch to another console and switch back
      - the border is black
      - press F9 to go to menu
      - the border is black
      - switch to another console and switch back
      - the border is dark blue
      
      When switching to a console with Midnight Commander, the border is random
      color that was left selected by the slang subsystem.
      
      This patch fixes this inconsistency by always using black as the
      background color when switching consoles.
      Signed-off-by: default avatarMikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: default avatarTomi Valkeinen <tomi.valkeinen@ti.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      a276227f
    • Prarit Bhargava's avatar
      modules, lock around setting of MODULE_STATE_UNFORMED · 5ade1695
      Prarit Bhargava authored
      commit d3051b48 upstream.
      
      A panic was seen in the following sitation.
      
      There are two threads running on the system. The first thread is a system
      monitoring thread that is reading /proc/modules. The second thread is
      loading and unloading a module (in this example I'm using my simple
      dummy-module.ko).  Note, in the "real world" this occurred with the qlogic
      driver module.
      
      When doing this, the following panic occurred:
      
       ------------[ cut here ]------------
       kernel BUG at kernel/module.c:3739!
       invalid opcode: 0000 [#1] SMP
       Modules linked in: binfmt_misc sg nfsv3 rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache intel_powerclamp coretemp kvm_intel kvm crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel lrw igb gf128mul glue_helper iTCO_wdt iTCO_vendor_support ablk_helper ptp sb_edac cryptd pps_core edac_core shpchp i2c_i801 pcspkr wmi lpc_ich ioatdma mfd_core dca ipmi_si nfsd ipmi_msghandler auth_rpcgss nfs_acl lockd sunrpc xfs libcrc32c sr_mod cdrom sd_mod crc_t10dif crct10dif_common mgag200 syscopyarea sysfillrect sysimgblt i2c_algo_bit drm_kms_helper ttm isci drm libsas ahci libahci scsi_transport_sas libata i2c_core dm_mirror dm_region_hash dm_log dm_mod [last unloaded: dummy_module]
       CPU: 37 PID: 186343 Comm: cat Tainted: GF          O--------------   3.10.0+ #7
       Hardware name: Intel Corporation S2600CP/S2600CP, BIOS RMLSDP.86I.00.29.D696.1311111329 11/11/2013
       task: ffff8807fd2d8000 ti: ffff88080fa7c000 task.ti: ffff88080fa7c000
       RIP: 0010:[<ffffffff810d64c5>]  [<ffffffff810d64c5>] module_flags+0xb5/0xc0
       RSP: 0018:ffff88080fa7fe18  EFLAGS: 00010246
       RAX: 0000000000000003 RBX: ffffffffa03b5200 RCX: 0000000000000000
       RDX: 0000000000001000 RSI: ffff88080fa7fe38 RDI: ffffffffa03b5000
       RBP: ffff88080fa7fe28 R08: 0000000000000010 R09: 0000000000000000
       R10: 0000000000000000 R11: 000000000000000f R12: ffffffffa03b5000
       R13: ffffffffa03b5008 R14: ffffffffa03b5200 R15: ffffffffa03b5000
       FS:  00007f6ae57ef740(0000) GS:ffff88101e7a0000(0000) knlGS:0000000000000000
       CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
       CR2: 0000000000404f70 CR3: 0000000ffed48000 CR4: 00000000001407e0
       DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
       DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
       Stack:
        ffffffffa03b5200 ffff8810101e4800 ffff88080fa7fe70 ffffffff810d666c
        ffff88081e807300 000000002e0f2fbf 0000000000000000 ffff88100f257b00
        ffffffffa03b5008 ffff88080fa7ff48 ffff8810101e4800 ffff88080fa7fee0
       Call Trace:
        [<ffffffff810d666c>] m_show+0x19c/0x1e0
        [<ffffffff811e4d7e>] seq_read+0x16e/0x3b0
        [<ffffffff812281ed>] proc_reg_read+0x3d/0x80
        [<ffffffff811c0f2c>] vfs_read+0x9c/0x170
        [<ffffffff811c1a58>] SyS_read+0x58/0xb0
        [<ffffffff81605829>] system_call_fastpath+0x16/0x1b
       Code: 48 63 c2 83 c2 01 c6 04 03 29 48 63 d2 eb d9 0f 1f 80 00 00 00 00 48 63 d2 c6 04 13 2d 41 8b 0c 24 8d 50 02 83 f9 01 75 b2 eb cb <0f> 0b 66 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48 89 e5 41
       RIP  [<ffffffff810d64c5>] module_flags+0xb5/0xc0
        RSP <ffff88080fa7fe18>
      
          Consider the two processes running on the system.
      
          CPU 0 (/proc/modules reader)
          CPU 1 (loading/unloading module)
      
          CPU 0 opens /proc/modules, and starts displaying data for each module by
          traversing the modules list via fs/seq_file.c:seq_open() and
          fs/seq_file.c:seq_read().  For each module in the modules list, seq_read
          does
      
                  op->start()  <-- this is a pointer to m_start()
                  op->show()   <- this is a pointer to m_show()
                  op->stop()   <-- this is a pointer to m_stop()
      
          The m_start(), m_show(), and m_stop() module functions are defined in
          kernel/module.c. The m_start() and m_stop() functions acquire and release
          the module_mutex respectively.
      
          ie) When reading /proc/modules, the module_mutex is acquired and released
          for each module.
      
          m_show() is called with the module_mutex held.  It accesses the module
          struct data and attempts to write out module data.  It is in this code
          path that the above BUG_ON() warning is encountered, specifically m_show()
          calls
      
          static char *module_flags(struct module *mod, char *buf)
          {
                  int bx = 0;
      
                  BUG_ON(mod->state == MODULE_STATE_UNFORMED);
          ...
      
          The other thread, CPU 1, in unloading the module calls the syscall
          delete_module() defined in kernel/module.c.  The module_mutex is acquired
          for a short time, and then released.  free_module() is called without the
          module_mutex.  free_module() then sets mod->state = MODULE_STATE_UNFORMED,
          also without the module_mutex.  Some additional code is called and then the
          module_mutex is reacquired to remove the module from the modules list:
      
              /* Now we can delete it from the lists */
              mutex_lock(&module_mutex);
              stop_machine(__unlink_module, mod, NULL);
              mutex_unlock(&module_mutex);
      
      This is the sequence of events that leads to the panic.
      
      CPU 1 is removing dummy_module via delete_module().  It acquires the
      module_mutex, and then releases it.  CPU 1 has NOT set dummy_module->state to
      MODULE_STATE_UNFORMED yet.
      
      CPU 0, which is reading the /proc/modules, acquires the module_mutex and
      acquires a pointer to the dummy_module which is still in the modules list.
      CPU 0 calls m_show for dummy_module.  The check in m_show() for
      MODULE_STATE_UNFORMED passed for dummy_module even though it is being
      torn down.
      
      Meanwhile CPU 1, which has been continuing to remove dummy_module without
      holding the module_mutex, now calls free_module() and sets
      dummy_module->state to MODULE_STATE_UNFORMED.
      
      CPU 0 now calls module_flags() with dummy_module and ...
      
      static char *module_flags(struct module *mod, char *buf)
      {
              int bx = 0;
      
              BUG_ON(mod->state == MODULE_STATE_UNFORMED);
      
      and BOOM.
      
      Acquire and release the module_mutex lock around the setting of
      MODULE_STATE_UNFORMED in the teardown path, which should resolve the
      problem.
      
      Testing: In the unpatched kernel I can panic the system within 1 minute by
      doing
      
      while (true) do insmod dummy_module.ko; rmmod dummy_module.ko; done
      
      and
      
      while (true) do cat /proc/modules; done
      
      in separate terminals.
      
      In the patched kernel I was able to run just over one hour without seeing
      any issues.  I also verified the output of panic via sysrq-c and the output
      of /proc/modules looks correct for all three states for the dummy_module.
      
              dummy_module 12661 0 - Unloading 0xffffffffa03a5000 (OE-)
              dummy_module 12661 0 - Live 0xffffffffa03bb000 (OE)
              dummy_module 14015 1 - Loading 0xffffffffa03a5000 (OE+)
      Signed-off-by: default avatarPrarit Bhargava <prarit@redhat.com>
      Reviewed-by: default avatarOleg Nesterov <oleg@redhat.com>
      Signed-off-by: default avatarRusty Russell <rusty@rustcorp.com.au>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      5ade1695
    • Alexey Khoroshilov's avatar
      dm log userspace: fix memory leak in dm_ulog_tfr_init failure path · fdef68fb
      Alexey Khoroshilov authored
      commit 56ec16cb upstream.
      
      If cn_add_callback() fails in dm_ulog_tfr_init(), it does not
      deallocate prealloced memory but calls cn_del_callback().
      
      Found by Linux Driver Verification project (linuxtesting.org).
      Signed-off-by: default avatarAlexey Khoroshilov <khoroshilov@ispras.ru>
      Reviewed-by: default avatarJonathan Brassow <jbrassow@redhat.com>
      Signed-off-by: default avatarMike Snitzer <snitzer@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      fdef68fb
    • Mike Snitzer's avatar
      block: fix alignment_offset math that assumes io_min is a power-of-2 · a63bea06
      Mike Snitzer authored
      commit b8839b8c upstream.
      
      The math in both blk_stack_limits() and queue_limit_alignment_offset()
      assume that a block device's io_min (aka minimum_io_size) is always a
      power-of-2.  Fix the math such that it works for non-power-of-2 io_min.
      
      This issue (of alignment_offset != 0) became apparent when testing
      dm-thinp with a thinp blocksize that matches a RAID6 stripesize of
      1280K.  Commit fdfb4c8c ("dm thin: set minimum_io_size to pool's data
      block size") unlocked the potential for alignment_offset != 0 due to
      the dm-thin-pool's io_min possibly being a non-power-of-2.
      Signed-off-by: default avatarMike Snitzer <snitzer@redhat.com>
      Acked-by: default avatarMartin K. Petersen <martin.petersen@oracle.com>
      Signed-off-by: default avatarJens Axboe <axboe@fb.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      a63bea06
    • Lai Jiangshan's avatar
      drbd: compute the end before rb_insert_augmented() · 29772e3e
      Lai Jiangshan authored
      commit 82cfb90b upstream.
      
      Commit 98683650 "Merge branch 'drbd-8.4_ed6' into
      for-3.8-drivers-drbd-8.4_ed6" switches to the new augment API, but the
      new API requires that the tree is augmented before rb_insert_augmented()
      is called, which is missing.
      
      So we add the augment-code to drbd_insert_interval() when it travels the
      tree up to down before rb_insert_augmented().  See the example in
      include/linux/interval_tree_generic.h or Documentation/rbtree.txt.
      
      drbd_insert_interval() may cancel the insertion when traveling, in this
      case, the just added augment-code does nothing before cancel since the
      @this node is already in the subtrees in this case.
      
      CC: Michel Lespinasse <walken@google.com>
      Signed-off-by: default avatarLai Jiangshan <laijs@cn.fujitsu.com>
      Signed-off-by: default avatarAndreas Gruenbacher <agruen@linbit.com>
      Signed-off-by: default avatarPhilipp Reisner <philipp.reisner@linbit.com>
      Signed-off-by: default avatarJens Axboe <axboe@fb.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      29772e3e
    • Joe Thornber's avatar
      dm bufio: update last_accessed when relinking a buffer · 3f626317
      Joe Thornber authored
      commit eb76faf5 upstream.
      
      The 'last_accessed' member of the dm_buffer structure was only set when
      the the buffer was created.  This led to each buffer being discarded
      after dm_bufio_max_age time even if it was used recently.  In practice
      this resulted in all thinp metadata being evicted soon after being read
      -- this is particularly problematic for metadata intensive workloads
      like multithreaded small random IO.
      
      'last_accessed' is now updated each time the buffer is moved to the head
      of the LRU list, so the buffer is now properly discarded if it was not
      used in dm_bufio_max_age time.
      Signed-off-by: default avatarJoe Thornber <ejt@redhat.com>
      Signed-off-by: default avatarMikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: default avatarMike Snitzer <snitzer@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      3f626317