1. 22 Oct, 2015 25 commits
  2. 18 Sep, 2015 15 commits
    • Zefan Li's avatar
      Linux 3.4.109 · 4a55c0cf
      Zefan Li authored
      4a55c0cf
    • Eric Dumazet's avatar
      udp: fix behavior of wrong checksums · 1c50a0ae
      Eric Dumazet authored
      commit beb39db5 upstream.
      
      We have two problems in UDP stack related to bogus checksums :
      
      1) We return -EAGAIN to application even if receive queue is not empty.
         This breaks applications using edge trigger epoll()
      
      2) Under UDP flood, we can loop forever without yielding to other
         processes, potentially hanging the host, especially on non SMP.
      
      This patch is an attempt to make things better.
      
      We might in the future add extra support for rt applications
      wanting to better control time spent doing a recv() in a hostile
      environment. For example we could validate checksums before queuing
      packets in socket receive queue.
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Cc: Willem de Bruijn <willemb@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarZefan Li <lizefan@huawei.com>
      1c50a0ae
    • Thomas Gleixner's avatar
      sched: Queue RT tasks to head when prio drops · aaedb090
      Thomas Gleixner authored
      commit 81a44c54 upstream.
      
      The following scenario does not work correctly:
      
      Runqueue of CPUx contains two runnable and pinned tasks:
      
       T1: SCHED_FIFO, prio 80
       T2: SCHED_FIFO, prio 80
      
      T1 is on the cpu and executes the following syscalls (classic priority
      ceiling scenario):
      
       sys_sched_setscheduler(pid(T1), SCHED_FIFO, .prio = 90);
       ...
       sys_sched_setscheduler(pid(T1), SCHED_FIFO, .prio = 80);
       ...
      
      Now T1 gets preempted by T3 (SCHED_FIFO, prio 95). After T3 goes back
      to sleep the scheduler picks T2. Surprise!
      
      The same happens w/o actual preemption when T1 is forced into the
      scheduler due to a sporadic NEED_RESCHED event. The scheduler invokes
      pick_next_task() which returns T2. So T1 gets preempted and scheduled
      out.
      
      This happens because sched_setscheduler() dequeues T1 from the prio 90
      list and then enqueues it on the tail of the prio 80 list behind T2.
      This violates the POSIX spec and surprises user space which relies on
      the guarantee that SCHED_FIFO tasks are not scheduled out unless they
      give the CPU up voluntarily or are preempted by a higher priority
      task. In the latter case the preempted task must get back on the CPU
      after the preempting task schedules out again.
      
      We fixed a similar issue already in commit 60db48ca (sched: Queue a
      deboosted task to the head of the RT prio queue). The same treatment
      is necessary for sched_setscheduler(). So enqueue to head of the prio
      bucket list if the priority of the task is lowered.
      
      It might be possible that existing user space relies on the current
      behaviour, but it can be considered highly unlikely due to the corner
      case nature of the application scenario.
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: default avatarSebastian Andrzej Siewior <bigeasy@linutronix.de>
      Signed-off-by: default avatarPeter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/1391803122-4425-6-git-send-email-bigeasy@linutronix.deSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      Signed-off-by: default avatarZefan Li <lizefan@huawei.com>
      aaedb090
    • Ben Hutchings's avatar
      pipe: iovec: Fix memory corruption when retrying atomic copy as non-atomic · a39bf4a8
      Ben Hutchings authored
      pipe_iov_copy_{from,to}_user() may be tried twice with the same iovec,
      the first time atomically and the second time not.  The second attempt
      needs to continue from the iovec position, pipe buffer offset and
      remaining length where the first attempt failed, but currently the
      pipe buffer offset and remaining length are reset.  This will corrupt
      the piped data (possibly also leading to an information leak between
      processes) and may also corrupt kernel memory.
      
      This was fixed upstream by commits f0d1bec9 ("new helper:
      copy_page_from_iter()") and 637b58c2 ("switch pipe_read() to
      copy_page_to_iter()"), but those aren't suitable for stable.  This fix
      for older kernel versions was made by Seth Jennings for RHEL and I
      have extracted it from their update.
      
      CVE-2015-1805
      
      References: https://bugzilla.redhat.com/show_bug.cgi?id=1202855Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      [lizf: Backported to 3.4: adjust context]
      Signed-off-by: default avatarZefan Li <lizefan@huawei.com>
      a39bf4a8
    • Ralf Baechle's avatar
      NET: ROSE: Don't dereference NULL neighbour pointer. · bee5f3e2
      Ralf Baechle authored
      commit d496f784 upstream.
      
      A ROSE socket doesn't necessarily always have a neighbour pointer so check
      if the neighbour pointer is valid before dereferencing it.
      Signed-off-by: default avatarRalf Baechle <ralf@linux-mips.org>
      Tested-by: default avatarBernard Pidoux <f6bvp@free.fr>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarZefan Li <lizefan@huawei.com>
      bee5f3e2
    • Dan Williams's avatar
      block: fix ext_dev_lock lockdep report · bd3fa757
      Dan Williams authored
      commit 4d66e5e9 upstream.
      
       =================================
       [ INFO: inconsistent lock state ]
       4.1.0-rc7+ #217 Tainted: G           O
       ---------------------------------
       inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
       swapper/6/0 [HC0[0]:SC1[1]:HE1:SE0] takes:
        (ext_devt_lock){+.?...}, at: [<ffffffff8143a60c>] blk_free_devt+0x3c/0x70
       {SOFTIRQ-ON-W} state was registered at:
         [<ffffffff810bf6b1>] __lock_acquire+0x461/0x1e70
         [<ffffffff810c1947>] lock_acquire+0xb7/0x290
         [<ffffffff818ac3a8>] _raw_spin_lock+0x38/0x50
         [<ffffffff8143a07d>] blk_alloc_devt+0x6d/0xd0  <-- take the lock in process context
      [..]
        [<ffffffff810bf64e>] __lock_acquire+0x3fe/0x1e70
        [<ffffffff810c00ad>] ? __lock_acquire+0xe5d/0x1e70
        [<ffffffff810c1947>] lock_acquire+0xb7/0x290
        [<ffffffff8143a60c>] ? blk_free_devt+0x3c/0x70
        [<ffffffff818ac3a8>] _raw_spin_lock+0x38/0x50
        [<ffffffff8143a60c>] ? blk_free_devt+0x3c/0x70
        [<ffffffff8143a60c>] blk_free_devt+0x3c/0x70    <-- take the lock in softirq
        [<ffffffff8143bfec>] part_release+0x1c/0x50
        [<ffffffff8158edf6>] device_release+0x36/0xb0
        [<ffffffff8145ac2b>] kobject_cleanup+0x7b/0x1a0
        [<ffffffff8145aad0>] kobject_put+0x30/0x70
        [<ffffffff8158f147>] put_device+0x17/0x20
        [<ffffffff8143c29c>] delete_partition_rcu_cb+0x16c/0x180
        [<ffffffff8143c130>] ? read_dev_sector+0xa0/0xa0
        [<ffffffff810e0e0f>] rcu_process_callbacks+0x2ff/0xa90
        [<ffffffff810e0dcf>] ? rcu_process_callbacks+0x2bf/0xa90
        [<ffffffff81067e2e>] __do_softirq+0xde/0x600
      
      Neil sees this in his tests and it also triggers on pmem driver unbind
      for the libnvdimm tests.  This fix is on top of an initial fix by Keith
      for incorrect usage of mutex_lock() in this path: 2da78092 "block:
      Fix dev_t minor allocation lifetime".  Both this and 2da78092 are
      candidates for -stable.
      
      Fixes: 2da78092 ("block: Fix dev_t minor allocation lifetime")
      Cc: Keith Busch <keith.busch@intel.com>
      Reported-by: default avatarNeilBrown <neilb@suse.de>
      Signed-off-by: default avatarDan Williams <dan.j.williams@intel.com>
      Signed-off-by: default avatarJens Axboe <axboe@fb.com>
      [lizf: Backported to 3.4: adjust context]
      Signed-off-by: default avatarZefan Li <lizefan@huawei.com>
      bd3fa757
    • Vasily Averin's avatar
      bridge: superfluous skb->nfct check in br_nf_dev_queue_xmit · 59c4dd5e
      Vasily Averin authored
      commit aff09ce3 upstream.
      
      Currently bridge can silently drop ipv4 fragments.
      If node have loaded nf_defrag_ipv4 module but have no nf_conntrack_ipv4,
      br_nf_pre_routing defragments incoming ipv4 fragments
      but nfct check in br_nf_dev_queue_xmit does not allow re-fragment combined
      packet back, and therefore it is dropped in br_dev_queue_push_xmit without
      incrementing of any failcounters
      
      It seems the only way to hit the ip_fragment code in the bridge xmit
      path is to have a fragment list whose reassembled fragments go over
      the mtu. This only happens if nf_defrag is enabled. Thanks to
      Florian Westphal for providing feedback to clarify this.
      
      Defragmentation ipv4 is required not only in conntracks but at least in
      TPROXY target and socket match, therefore #ifdef is changed from
      NF_CONNTRACK_IPV4 to NF_DEFRAG_IPV4
      Signed-off-by: default avatarVasily Averin <vvs@openvz.org>
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      Cc: Kirill Tkhai <ktkhai@odin.com>
      Signed-off-by: default avatarZefan Li <lizefan@huawei.com>
      59c4dd5e
    • Junling Zheng's avatar
      net: socket: Fix the wrong returns for recvmsg and sendmsg · dbccb188
      Junling Zheng authored
      Based on 08adb7da upstream.
      
      We found that after v3.10.73, recvmsg might return -EFAULT while -EINVAL
      was expected.
      
      We tested it through the recvmsg01 testcase come from LTP testsuit. It set
      msg->msg_namelen to -1 and the recvmsg syscall returned errno 14, which is
      unexpected (errno 22 is expected):
      
      recvmsg01    4  TFAIL  :  invalid socket length ; returned -1 (expected -1),
      errno 14 (expected 22)
      
      Linux mainline has no this bug for commit 08adb7da fixes it accidentally.
      However, it is too large and complex to be backported to LTS 3.10.
      
      Commit 281c9c36 (net: compat: Update get_compat_msghdr() to match
      copy_msghdr_from_user() behaviour) made get_compat_msghdr() return
      error if msg_sys->msg_namelen was negative, which changed the behaviors
      of recvmsg and sendmsg syscall in a lib32 system:
      
      Before commit 281c9c36, get_compat_msghdr() wouldn't fail and it would
      return -EINVAL in move_addr_to_user() or somewhere if msg_sys->msg_namelen
      was invalid and then syscall returned -EINVAL, which is correct.
      
      And now, when msg_sys->msg_namelen is negative, get_compat_msghdr() will
      fail and wants to return -EINVAL, however, the outer syscall will return
      -EFAULT directly, which is unexpected.
      
      This patch gets the return value of get_compat_msghdr() as well as
      copy_msghdr_from_user(), then returns this expected value if
      get_compat_msghdr() fails.
      
      Fixes: 281c9c36 (net: compat: Update get_compat_msghdr() to match copy_msghdr_from_user() behaviour)
      Signed-off-by: default avatarJunling Zheng <zhengjunling@huawei.com>
      Signed-off-by: default avatarHanbing Xu <xuhanbing@huawei.com>
      Cc: Li Zefan <lizefan@huawei.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: David Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: default avatarZefan Li <lizefan@huawei.com>
      dbccb188
    • Xie XiuQi's avatar
      ipmi: fix timeout calculation when bmc is disconnected · e08ca627
      Xie XiuQi authored
      commit e21404dc upstream.
      
      Loading ipmi_si module while bmc is disconnected, we found the timeout
      is longer than 5 secs.  Actually it takes about 3 mins and 20
      secs.(HZ=250)
      
      error message as below:
        Dec 12 19:08:59 linux kernel: IPMI BT: timeout in RD_WAIT [ ] 1 retries left
        Dec 12 19:08:59 linux kernel: BT: write 4 bytes seq=0x01 03 18 00 01
        [...]
        Dec 12 19:12:19 linux kernel: IPMI BT: timeout in RD_WAIT [ ]
        Dec 12 19:12:19 linux kernel: failed 2 retries, sending error response
        Dec 12 19:12:19 linux kernel: IPMI: BT reset (takes 5 secs)
        Dec 12 19:12:19 linux kernel: IPMI BT: flag reset [ ]
      
      Function wait_for_msg_done() use schedule_timeout_uninterruptible(1) to
      sleep 1 tick, so we should subtract jiffies_to_usecs(1) instead of 100
      usecs from timeout.
      Reported-by: default avatarHu Shiyuan <hushiyuan@huawei.com>
      Signed-off-by: default avatarXie XiuQi <xiexiuqi@huawei.com>
      Signed-off-by: default avatarCorey Minyard <cminyard@mvista.com>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarZefan Li <lizefan@huawei.com>
      e08ca627
    • Suresh Siddha's avatar
      x86, kvm: fix kvm's usage of kernel_fpu_begin/end() · b10e02da
      Suresh Siddha authored
      commit b1a74bf8 upstream.
      
      Preemption is disabled between kernel_fpu_begin/end() and as such
      it is not a good idea to use these routines in kvm_load/put_guest_fpu()
      which can be very far apart.
      
      kvm_load/put_guest_fpu() routines are already called with
      preemption disabled and KVM already uses the preempt notifier to save
      the guest fpu state using kvm_put_guest_fpu().
      
      So introduce __kernel_fpu_begin/end() routines which don't touch
      preemption and use them instead of kernel_fpu_begin/end()
      for KVM's use model of saving/restoring guest FPU state.
      
      Also with this change (and with eagerFPU model), fix the host cr0.TS vm-exit
      state in the case of VMX. For eagerFPU case, host cr0.TS is always clear.
      So no need to worry about it. For the traditional lazyFPU restore case,
      change the cr0.TS bit for the host state during vm-exit to be always clear
      and cr0.TS bit is set in the __vmx_load_host_state() when the FPU
      (guest FPU or the host task's FPU) state is not active. This ensures
      that the host/guest FPU state is properly saved, restored
      during context-switch and with interrupts (using irq_fpu_usable()) not
      stomping on the active FPU state.
      Signed-off-by: default avatarSuresh Siddha <suresh.b.siddha@intel.com>
      Link: http://lkml.kernel.org/r/1348164109.26695.338.camel@sbsiddha-desk.sc.intel.com
      Cc: Avi Kivity <avi@redhat.com>
      Signed-off-by: default avatarH. Peter Anvin <hpa@linux.intel.com>
      Signed-off-by: default avatarZefan Li <lizefan@huawei.com>
      [xr: Backported to 3.4: Adjust context]
      Signed-off-by: default avatarRui Xiang <rui.xiang@huawei.com>
      Signed-off-by: default avatarZefan Li <lizefan@huawei.com>
      b10e02da
    • Suresh Siddha's avatar
      x86, kvm: use kernel_fpu_begin/end() in kvm_load/put_guest_fpu() · ac573c10
      Suresh Siddha authored
      commit 9c1c3fac upstream.
      
      kvm's guest fpu save/restore should be wrapped around
      kernel_fpu_begin/end(). This will avoid for example taking a DNA
      in kvm_load_guest_fpu() when it tries to load the fpu immediately
      after doing unlazy_fpu() on the host side.
      
      More importantly this will prevent the host process fpu from being
      corrupted.
      Signed-off-by: default avatarSuresh Siddha <suresh.b.siddha@intel.com>
      Link: http://lkml.kernel.org/r/1345842782-24175-4-git-send-email-suresh.b.siddha@intel.com
      Cc: Avi Kivity <avi@redhat.com>
      Signed-off-by: default avatarH. Peter Anvin <hpa@linux.intel.com>
      Cc: Rui Xiang <rui.xiang@huawei.com>
      Signed-off-by: default avatarZefan Li <lizefan@huawei.com>
      ac573c10
    • David S. Miller's avatar
      ipv4: Missing sk_nulls_node_init() in ping_unhash(). · 7014d74f
      David S. Miller authored
      commit a134f083 upstream.
      
      If we don't do that, then the poison value is left in the ->pprev
      backlink.
      
      This can cause crashes if we do a disconnect, followed by a connect().
      Tested-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Reported-by: default avatarWen Xu <hotdog3645@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarZefan Li <lizefan@huawei.com>
      7014d74f
    • Benjamin Randazzo's avatar
      md: use kzalloc() when bitmap is disabled · 842c3621
      Benjamin Randazzo authored
      commit b6878d9e upstream.
      
      In drivers/md/md.c get_bitmap_file() uses kmalloc() for creating a
      mdu_bitmap_file_t called "file".
      
      5769         file = kmalloc(sizeof(*file), GFP_NOIO);
      5770         if (!file)
      5771                 return -ENOMEM;
      
      This structure is copied to user space at the end of the function.
      
      5786         if (err == 0 &&
      5787             copy_to_user(arg, file, sizeof(*file)))
      5788                 err = -EFAULT
      
      But if bitmap is disabled only the first byte of "file" is initialized
      with zero, so it's possible to read some bytes (up to 4095) of kernel
      space memory from user space. This is an information leak.
      
      5775         /* bitmap disabled, zero the first byte and copy out */
      5776         if (!mddev->bitmap_info.file)
      5777                 file->pathname[0] = '\0';
      Signed-off-by: default avatarBenjamin Randazzo <benjamin@randazzo.fr>
      Signed-off-by: default avatarNeilBrown <neilb@suse.com>
      [lizf: Backported to 3.4: fix both branches]
      Signed-off-by: default avatarZefan Li <lizefan@huawei.com>
      842c3621
    • Jan Kara's avatar
      udf: Check length of extended attributes and allocation descriptors · 97186c09
      Jan Kara authored
      commit 23b133bd upstream.
      
      Check length of extended attributes and allocation descriptors when
      loading inodes from disk. Otherwise corrupted filesystems could confuse
      the code and make the kernel oops.
      Reported-by: default avatarCarl Henrik Lunde <chlunde@ping.uio.no>
      Signed-off-by: default avatarJan Kara <jack@suse.cz>
      [lizf: Backported to 3.4:
       - call make_bad_inode() and then return
       - relace bs with inode->i_sb->s_blocksize]
      Signed-off-by: default avatarZefan Li <lizefan@huawei.com>
      97186c09
    • Steven Rostedt's avatar
      tracing: Have filter check for balanced ops · ea1e8ee0
      Steven Rostedt authored
      commit 2cf30dc1 upstream.
      
      When the following filter is used it causes a warning to trigger:
      
       # cd /sys/kernel/debug/tracing
       # echo "((dev==1)blocks==2)" > events/ext4/ext4_truncate_exit/filter
      -bash: echo: write error: Invalid argument
       # cat events/ext4/ext4_truncate_exit/filter
      ((dev==1)blocks==2)
      ^
      parse_error: No error
      
       ------------[ cut here ]------------
       WARNING: CPU: 2 PID: 1223 at kernel/trace/trace_events_filter.c:1640 replace_preds+0x3c5/0x990()
       Modules linked in: bnep lockd grace bluetooth  ...
       CPU: 3 PID: 1223 Comm: bash Tainted: G        W       4.1.0-rc3-test+ #450
       Hardware name: Hewlett-Packard HP Compaq Pro 6300 SFF/339A, BIOS K01 v02.05 05/07/2012
        0000000000000668 ffff8800c106bc98 ffffffff816ed4f9 ffff88011ead0cf0
        0000000000000000 ffff8800c106bcd8 ffffffff8107fb07 ffffffff8136b46c
        ffff8800c7d81d48 ffff8800d4c2bc00 ffff8800d4d4f920 00000000ffffffea
       Call Trace:
        [<ffffffff816ed4f9>] dump_stack+0x4c/0x6e
        [<ffffffff8107fb07>] warn_slowpath_common+0x97/0xe0
        [<ffffffff8136b46c>] ? _kstrtoull+0x2c/0x80
        [<ffffffff8107fb6a>] warn_slowpath_null+0x1a/0x20
        [<ffffffff81159065>] replace_preds+0x3c5/0x990
        [<ffffffff811596b2>] create_filter+0x82/0xb0
        [<ffffffff81159944>] apply_event_filter+0xd4/0x180
        [<ffffffff81152bbf>] event_filter_write+0x8f/0x120
        [<ffffffff811db2a8>] __vfs_write+0x28/0xe0
        [<ffffffff811dda43>] ? __sb_start_write+0x53/0xf0
        [<ffffffff812e51e0>] ? security_file_permission+0x30/0xc0
        [<ffffffff811dc408>] vfs_write+0xb8/0x1b0
        [<ffffffff811dc72f>] SyS_write+0x4f/0xb0
        [<ffffffff816f5217>] system_call_fastpath+0x12/0x6a
       ---[ end trace e11028bd95818dcd ]---
      
      Worse yet, reading the error message (the filter again) it says that
      there was no error, when there clearly was. The issue is that the
      code that checks the input does not check for balanced ops. That is,
      having an op between a closed parenthesis and the next token.
      
      This would only cause a warning, and fail out before doing any real
      harm, but it should still not caues a warning, and the error reported
      should work:
      
       # cd /sys/kernel/debug/tracing
       # echo "((dev==1)blocks==2)" > events/ext4/ext4_truncate_exit/filter
      -bash: echo: write error: Invalid argument
       # cat events/ext4/ext4_truncate_exit/filter
      ((dev==1)blocks==2)
      ^
      parse_error: Meaningless filter expression
      
      And give no kernel warning.
      
      Link: http://lkml.kernel.org/r/20150615175025.7e809215@gandalf.local.home
      
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Reported-by: default avatarVince Weaver <vincent.weaver@maine.edu>
      Tested-by: default avatarVince Weaver <vincent.weaver@maine.edu>
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      [lizf: Backported to 3.4: remove the check for OP_NOT, as it's not supported.]
      Signed-off-by: default avatarZefan Li <lizefan@huawei.com>
      ea1e8ee0