1. 15 Feb, 2016 36 commits
  2. 03 Feb, 2016 4 commits
    • Mateusz Guzik's avatar
      prctl: take mmap sem for writing to protect against others · 92d4a198
      Mateusz Guzik authored
      [ Upstream commit ddf1d398 ]
      
      An unprivileged user can trigger an oops on a kernel with
      CONFIG_CHECKPOINT_RESTORE.
      
      proc_pid_cmdline_read takes mmap_sem for reading and obtains args + env
      start/end values. These get sanity checked as follows:
              BUG_ON(arg_start > arg_end);
              BUG_ON(env_start > env_end);
      
      These can be changed by prctl_set_mm. Turns out also takes the semaphore for
      reading, effectively rendering it useless. This results in:
      
        kernel BUG at fs/proc/base.c:240!
        invalid opcode: 0000 [#1] SMP
        Modules linked in: virtio_net
        CPU: 0 PID: 925 Comm: a.out Not tainted 4.4.0-rc8-next-20160105dupa+ #71
        Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
        task: ffff880077a68000 ti: ffff8800784d0000 task.ti: ffff8800784d0000
        RIP: proc_pid_cmdline_read+0x520/0x530
        RSP: 0018:ffff8800784d3db8  EFLAGS: 00010206
        RAX: ffff880077c5b6b0 RBX: ffff8800784d3f18 RCX: 0000000000000000
        RDX: 0000000000000002 RSI: 00007f78e8857000 RDI: 0000000000000246
        RBP: ffff8800784d3e40 R08: 0000000000000008 R09: 0000000000000001
        R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000050
        R13: 00007f78e8857800 R14: ffff88006fcef000 R15: ffff880077c5b600
        FS:  00007f78e884a740(0000) GS:ffff88007b200000(0000) knlGS:0000000000000000
        CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
        CR2: 00007f78e8361770 CR3: 00000000790a5000 CR4: 00000000000006f0
        Call Trace:
          __vfs_read+0x37/0x100
          vfs_read+0x82/0x130
          SyS_read+0x58/0xd0
          entry_SYSCALL_64_fastpath+0x12/0x76
        Code: 4c 8b 7d a8 eb e9 48 8b 9d 78 ff ff ff 4c 8b 7d 90 48 8b 03 48 39 45 a8 0f 87 f0 fe ff ff e9 d1 fe ff ff 4c 8b 7d 90 eb c6 0f 0b <0f> 0b 0f 0b 66 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00
        RIP   proc_pid_cmdline_read+0x520/0x530
        ---[ end trace 97882617ae9c6818 ]---
      
      Turns out there are instances where the code just reads aformentioned
      values without locking whatsoever - namely environ_read and get_cmdline.
      
      Interestingly these functions look quite resilient against bogus values,
      but I don't believe this should be relied upon.
      
      The first patch gets rid of the oops bug by grabbing mmap_sem for
      writing.
      
      The second patch is optional and puts locking around aformentioned
      consumers for safety.  Consumers of other fields don't seem to benefit
      from similar treatment and are left untouched.
      
      This patch (of 2):
      
      The code was taking the semaphore for reading, which does not protect
      against readers nor concurrent modifications.
      
      The problem could cause a sanity checks to fail in procfs's cmdline
      reader, resulting in an OOPS.
      
      Note that some functions perform an unlocked read of various mm fields,
      but they seem to be fine despite possible modificaton.
      Signed-off-by: default avatarMateusz Guzik <mguzik@redhat.com>
      Acked-by: default avatarCyrill Gorcunov <gorcunov@openvz.org>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Cc: Jarod Wilson <jarod@redhat.com>
      Cc: Jan Stancek <jstancek@redhat.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Anshuman Khandual <anshuman.linux@gmail.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      92d4a198
    • James Bottomley's avatar
      string_helpers: fix precision loss for some inputs · c81f4f03
      James Bottomley authored
      [ Upstream commit 564b026f ]
      
      It was noticed that we lose precision in the final calculation for some
      inputs.  The most egregious example is size=3000 blk_size=1900 in units
      of 10 should yield 5.70 MB but in fact yields 3.00 MB (oops).
      
      This is because the current algorithm doesn't correctly account for
      all the remainders in the logarithms.  Fix this by doing a correct
      calculation in the remainders based on napier's algorithm.
      
      Additionally, now we have the correct result, we have to account for
      arithmetic rounding because we're printing 3 digits of precision.  This
      means that if the fourth digit is five or greater, we have to round up,
      so add a section to ensure correct rounding.  Finally account for all
      possible inputs correctly, including zero for block size.
      
      Fixes: b9f28d86Signed-off-by: default avatarJames Bottomley <JBottomley@Odin.com>
      Reported-by: default avatarVitaly Kuznetsov <vkuznets@redhat.com>
      Cc: <stable@vger.kernel.org>	[delay until after 4.4 release]
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      c81f4f03
    • Vitaly Kuznetsov's avatar
      lib/string_helpers.c: fix infinite loop in string_get_size() · 4aba5827
      Vitaly Kuznetsov authored
      [ Upstream commit 62bef58a ]
      
      Some string_get_size() calls (e.g.:
       string_get_size(1, 512, STRING_UNITS_10, ..., ...)
       string_get_size(15, 64, STRING_UNITS_10, ..., ...)
      ) result in an infinite loop. The problem is that if size is equal to
      divisor[units]/blk_size and is smaller than divisor[units] we'll end
      up with size == 0 when we start doing sf_cap calculations:
      
      For string_get_size(1, 512, STRING_UNITS_10, ..., ...) case:
         ...
         remainder = do_div(size, divisor[units]); -> size is 0, remainder is 1
         remainder *= blk_size; -> remainder is 512
         ...
         size *= blk_size; -> size is still 0
         size += remainder / divisor[units]; -> size is still 0
      
      The caller causing the issue is sd_read_capacity(), the problem was
      noticed on Hyper-V, such weird size was reported by host when scanning
      collides with device removal.  This is probably a separate issue worth
      fixing, this patch is intended to prevent the library routine from
      infinite looping.
      Signed-off-by: default avatarVitaly Kuznetsov <vkuznets@redhat.com>
      Acked-by: default avatarJames Bottomley <JBottomley@Odin.com>
      Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
      Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
      Cc: "K. Y. Srinivasan" <kys@microsoft.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      4aba5827
    • Junil Lee's avatar
      zsmalloc: fix migrate_zspage-zs_free race condition · 37301986
      Junil Lee authored
      [ Upstream commit c102f07c ]
      
      record_obj() in migrate_zspage() does not preserve handle's
      HANDLE_PIN_BIT, set by find_aloced_obj()->trypin_tag(), and implicitly
      (accidentally) un-pins the handle, while migrate_zspage() still performs
      an explicit unpin_tag() on the that handle.  This additional explicit
      unpin_tag() introduces a race condition with zs_free(), which can pin
      that handle by this time, so the handle becomes un-pinned.
      
      Schematically, it goes like this:
      
        CPU0                                        CPU1
        migrate_zspage
          find_alloced_obj
            trypin_tag
              set HANDLE_PIN_BIT                    zs_free()
                                                      pin_tag()
        obj_malloc() -- new object, no tag
        record_obj() -- remove HANDLE_PIN_BIT           set HANDLE_PIN_BIT
        unpin_tag()  -- remove zs_free's HANDLE_PIN_BIT
      
      The race condition may result in a NULL pointer dereference:
      
        Unable to handle kernel NULL pointer dereference at virtual address 00000000
        CPU: 0 PID: 19001 Comm: CookieMonsterCl Tainted:
        PC is at get_zspage_mapping+0x0/0x24
        LR is at obj_free.isra.22+0x64/0x128
        Call trace:
           get_zspage_mapping+0x0/0x24
           zs_free+0x88/0x114
           zram_free_page+0x64/0xcc
           zram_slot_free_notify+0x90/0x108
           swap_entry_free+0x278/0x294
           free_swap_and_cache+0x38/0x11c
           unmap_single_vma+0x480/0x5c8
           unmap_vmas+0x44/0x60
           exit_mmap+0x50/0x110
           mmput+0x58/0xe0
           do_exit+0x320/0x8dc
           do_group_exit+0x44/0xa8
           get_signal+0x538/0x580
           do_signal+0x98/0x4b8
           do_notify_resume+0x14/0x5c
      
      This patch keeps the lock bit in migration path and update value
      atomically.
      Signed-off-by: default avatarJunil Lee <junil0814.lee@lge.com>
      Signed-off-by: default avatarMinchan Kim <minchan@kernel.org>
      Acked-by: default avatarVlastimil Babka <vbabka@suse.cz>
      Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
      Cc: <stable@vger.kernel.org> [4.1+]
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      37301986