1. 27 Jun, 2016 2 commits
    • gfs2: Get rid of gfs2_ilookup · ec5ec66b
      Andreas Gruenbacher authored
      Now that gfs2_lookup_by_inum only takes the inode glock for new inodes
      (and not for cached inodes anymore), there no longer is a need to
      optimize the cached-inode case in gfs2_get_dentry or delete_work_func,
      and gfs2_ilookup can be removed.
      
      In addition, gfs2_get_dentry wasn't checking the GFS2_DIF_SYSTEM flag in
      i_diskflags in the gfs2_ilookup case (see gfs2_lookup_by_inum); this
      inconsistency goes away as well.
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
    • gfs2: Fix gfs2_lookup_by_inum lock inversion · 3ce37b2c
      Andreas Gruenbacher authored
      The current gfs2_lookup_by_inum takes the glock of a presumed inode
      identified by block number, verifies that the block is indeed an inode,
      and then instantiates and reads the new inode via gfs2_inode_lookup.
      
      However, instantiating a new inode may block on freeing a previous
      instance of that inode (__wait_on_freeing_inode), and freeing an inode
      requires taking the glock already held, leading to lock inversion and
      deadlock.
      
      Fix this by first instantiating the new inode, then verifying that the
      block is an inode (if required), and then reading in the new inode, all
      in gfs2_inode_lookup.
      
      If the block we are looking for is not an inode, we discard the new
      inode via iget_failed, which marks inodes as bad and unhashes them.
      Other tasks waiting on that inode will get a bad inode back from
      ilookup or iget_locked; in that case, retry the lookup.
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
  2. 17 Jun, 2016 1 commit
  3. 10 Jun, 2016 1 commit
    • GFS2: don't set rgrp gl_object until it's inserted into rgrp tree · 36e4ad03
      Bob Peterson authored
      Before this patch, function read_rindex_entry would set a rgrp
      glock's gl_object pointer to itself before inserting the rgrp into
      the rgrp rbtree. The problem is: if another process was also reading
      the rgrp in, and had already inserted its newly created rgrp, then
      the second call to read_rindex_entry would overwrite that value,
      then return a bad return code to the caller. Later, other functions
      would reference the now-freed rgrp memory by way of gl_object.
      In some cases, that could result in gfs2_rgrp_brelse being called
      twice for the same rgrp: once for the failed attempt and once for
      the "real" rgrp release. Eventually the kernel would panic.
      There are also a number of other things that could go wrong when
      a kernel module is accessing freed storage. For example, this could
      result in rgrp corruption because the fake rgrp would point to a
      fake bitmap in memory too, causing gfs2_inplace_reserve to search
      some random memory for free blocks, and find some, since we were
      never setting rgd->rd_bits to NULL before freeing it.
      
      This patch fixes the problem by not setting gl_object until we
      have successfully inserted the rgrp into the rbtree. Also, it sets
      rd_bits to NULL as it frees them, which will ensure any accidental
      access to the wrong rgrp will result in a kernel panic rather than
      file system corruption, which is preferred.
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
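The ordering fix described in this commit can be sketched in userspace C. This is a simplified model under stated assumptions, not the GFS2 code: `model_rgrp`, `model_glock`, `model_insert` and `model_read_entry` are hypothetical names, and a linked list stands in for the rgrp rbtree. It only illustrates that the glock's object pointer is published after the insert has succeeded, so a caller that loses the race frees its own copy without overwriting the winner's pointer.

```c
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

struct model_rgrp;

/* Hypothetical stand-in for a glock; only the object pointer is modelled. */
struct model_glock {
    struct model_rgrp *object;
};

/* Hypothetical stand-in for a resource group descriptor. */
struct model_rgrp {
    unsigned long addr;
    struct model_glock *gl;
    struct model_rgrp *next;
};

/* Insert keyed by addr; a list stands in for the rbtree. */
static int model_insert(struct model_rgrp **head, struct model_rgrp *rgd)
{
    for (struct model_rgrp *p = *head; p; p = p->next)
        if (p->addr == rgd->addr)
            return -EEXIST;     /* someone else already inserted it */
    rgd->next = *head;
    *head = rgd;
    return 0;
}

static int model_read_entry(struct model_rgrp **head, struct model_glock *gl,
                            unsigned long addr)
{
    struct model_rgrp *rgd = calloc(1, sizeof(*rgd));
    int err;

    if (!rgd)
        return -ENOMEM;
    rgd->addr = addr;
    rgd->gl = gl;

    err = model_insert(head, rgd);
    if (err) {
        free(rgd);              /* gl->object was never pointed at rgd */
        return err;
    }
    gl->object = rgd;           /* publish only after a successful insert */
    return 0;
}
```

Calling `model_read_entry` twice for the same address returns `-EEXIST` the second time and leaves `gl->object` pointing at the first, still-live entry.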
  4. 24 May, 2016 36 commits
    • Merge tag 'for-linus-4.7-rc0-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip · 29567292
      Linus Torvalds authored
      Pull xen bug fixes from David Vrabel.
      
      * tag 'for-linus-4.7-rc0-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
        xen: use same main loop for counting and remapping pages
        xen/events: Don't move disabled irqs
        xen/x86: actually allocate legacy interrupts on PV guests
        Xen: don't warn about 2-byte wchar_t in efi
        xen/gntdev: reduce copy batch size to 16
        xen/x86: don't lose event interrupts
    • Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost · ecaba718
      Linus Torvalds authored
      Pull virtio updates from Michael Tsirkin:
       "Looks like a quiet cycle for virtio.  There's a new inorder option for
        the ringtest tool, and a bugfix for balloon for ppc platforms when
        using virtio 1 mode"
      
      * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
        ringtest: pass buf != NULL
        virtio_balloon: fix PFN format for virtio-1
        virtio: add inorder option
    • Merge tag 'nios2-v4.7' of git://git.kernel.org/pub/scm/linux/kernel/git/lftan/nios2 · e989cc56
      Linus Torvalds authored
      Pull nios2 update from Ley Foon Tan:
       - add order-only DTC dependency to %.dtb target
       - fix libgcc location detection
      
      * tag 'nios2-v4.7' of git://git.kernel.org/pub/scm/linux/kernel/git/lftan/nios2:
        nios2: Add order-only DTC dependency to %.dtb target
        nios2: Fix libgcc location detection
    • Merge tag 'microblaze-4.7-rc1' of git://git.monstr.eu/linux-2.6-microblaze · 36b150bb
      Linus Torvalds authored
      Pull Microblaze updates from Michal Simek:
      
       - Wire-up new syscalls
      
       - Fix link error
      
      * tag 'microblaze-4.7-rc1' of git://git.monstr.eu/linux-2.6-microblaze:
        microblaze: pci: export isa_io_base to fix link errors
        microblaze: Wire up userfaultfd, membarrier, mlock2 syscalls
    • xen: use same main loop for counting and remapping pages · dd14be92
      Juergen Gross authored
      Instead of having two functions for cycling through the E820 map in
      order to count the pages to be remapped and remap them later, just use
      one function with a caller-supplied sub-function called for each region
      to be processed. This eliminates the possibility of a mismatch between
      the two loops, which showed up in certain configurations.
      Suggested-by: Ed Swierk <eswierk@skyportsystems.com>
      Signed-off-by: Juergen Gross <jgross@suse.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: David Vrabel <david.vrabel@citrix.com>
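The "one loop, caller-supplied sub-function" idea from this commit can be illustrated with a small userspace sketch. All names here (`model_region`, `model_foreach_remap_area`, `model_count_pages`, `model_remap_pages`) are hypothetical and the region layout is invented for illustration; the actual Xen setup code is not reproduced. Because the selection logic lives in a single loop, the counting pass and the remapping pass cannot disagree about which regions they visit.

```c
#include <stddef.h>

/* Hypothetical stand-in for one E820 region; not the kernel's struct. */
struct model_region {
    unsigned long start_pfn;
    unsigned long n_pfns;
    int needs_remap;
};

/* Per-region callback: receives the region and the running total. */
typedef unsigned long (*model_region_fn)(const struct model_region *r,
                                         unsigned long acc);

/* The single loop: region selection is decided here and only here. */
static unsigned long model_foreach_remap_area(const struct model_region *map,
                                              size_t n, model_region_fn fn)
{
    unsigned long acc = 0;

    for (size_t i = 0; i < n; i++) {
        if (!map[i].needs_remap)
            continue;
        acc = fn(&map[i], acc);
    }
    return acc;
}

/* First pass: only count the pages. */
static unsigned long model_count_pages(const struct model_region *r,
                                       unsigned long acc)
{
    return acc + r->n_pfns;
}

/* Records what the second pass touched. */
static unsigned long model_remapped_total;

/* Second pass: a real implementation would remap the pages here. */
static unsigned long model_remap_pages(const struct model_region *r,
                                       unsigned long acc)
{
    model_remapped_total += r->n_pfns;
    return acc + r->n_pfns;
}
```

Running the loop once with `model_count_pages` and once with `model_remap_pages` over the same map yields the same total by construction.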
    • xen/events: Don't move disabled irqs · f0f39387
      Ross Lagerwall authored
      Commit ff1e22e7 ("xen/events: Mask a moving irq") open-coded
      irq_move_irq() but left out checking if the IRQ is disabled. This broke
      resuming from suspend since it tries to move a (disabled) irq without
      holding the IRQ's desc->lock. Fix it by adding in a check for disabled
      IRQs.
      
      The resulting stacktrace was:
      kernel BUG at /build/linux-UbQGH5/linux-4.4.0/kernel/irq/migration.c:31!
      invalid opcode: 0000 [#1] SMP
      Modules linked in: xenfs xen_privcmd ...
      CPU: 0 PID: 9 Comm: migration/0 Not tainted 4.4.0-22-generic #39-Ubuntu
      Hardware name: Xen HVM domU, BIOS 4.6.1-xs125180 05/04/2016
      task: ffff88003d75ee00 ti: ffff88003d7bc000 task.ti: ffff88003d7bc000
      RIP: 0010:[<ffffffff810e26e2>]  [<ffffffff810e26e2>] irq_move_masked_irq+0xd2/0xe0
      RSP: 0018:ffff88003d7bfc50  EFLAGS: 00010046
      RAX: 0000000000000000 RBX: ffff88003d40ba00 RCX: 0000000000000001
      RDX: 0000000000000001 RSI: 0000000000000100 RDI: ffff88003d40bad8
      RBP: ffff88003d7bfc68 R08: 0000000000000000 R09: ffff88003d000000
      R10: 0000000000000000 R11: 000000000000023c R12: ffff88003d40bad0
      R13: ffffffff81f3a4a0 R14: 0000000000000010 R15: 00000000ffffffff
      FS:  0000000000000000(0000) GS:ffff88003da00000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00007fd4264de624 CR3: 0000000037922000 CR4: 00000000003406f0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Stack:
       ffff88003d40ba38 0000000000000024 0000000000000000 ffff88003d7bfca0
       ffffffff814c8d92 00000010813ef89d 00000000805ea732 0000000000000009
       0000000000000024 ffff88003cc39b80 ffff88003d7bfce0 ffffffff814c8f66
      Call Trace:
       [<ffffffff814c8d92>] eoi_pirq+0xb2/0xf0
       [<ffffffff814c8f66>] __startup_pirq+0xe6/0x150
       [<ffffffff814ca659>] xen_irq_resume+0x319/0x360
       [<ffffffff814c7e75>] xen_suspend+0xb5/0x180
       [<ffffffff81120155>] multi_cpu_stop+0xb5/0xe0
       [<ffffffff811200a0>] ? cpu_stop_queue_work+0x80/0x80
       [<ffffffff811203d0>] cpu_stopper_thread+0xb0/0x140
       [<ffffffff810a94e6>] ? finish_task_switch+0x76/0x220
       [<ffffffff810ca731>] ? __raw_callee_save___pv_queued_spin_unlock+0x11/0x20
       [<ffffffff810a3935>] smpboot_thread_fn+0x105/0x160
       [<ffffffff810a3830>] ? sort_range+0x30/0x30
       [<ffffffff810a0588>] kthread+0xd8/0xf0
       [<ffffffff810a04b0>] ? kthread_create_on_node+0x1e0/0x1e0
       [<ffffffff8182568f>] ret_from_fork+0x3f/0x70
       [<ffffffff810a04b0>] ? kthread_create_on_node+0x1e0/0x1e0
      Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
      Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: David Vrabel <david.vrabel@citrix.com>
    • xen/x86: actually allocate legacy interrupts on PV guests · 702f9260
      Stefano Stabellini authored
      b4ff8389 is incomplete: it relies on nr_legacy_irqs() to get the number
      of legacy interrupts when actually nr_legacy_irqs() returns 0 after
      probe_8259A(). Use NR_IRQS_LEGACY instead.
      Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
      CC: stable@vger.kernel.org
    • Xen: don't warn about 2-byte wchar_t in efi · 971a69db
      Arnd Bergmann authored
      The XEN UEFI code has become available on the ARM architecture
      recently, but now causes a link-time warning:
      
      ld: warning: drivers/xen/efi.o uses 2-byte wchar_t yet the output is to use 4-byte wchar_t; use of wchar_t values across objects may fail
      
      This seems harmless, because the efi code only uses 2-byte
      characters when interacting with EFI, so we don't pass on those
      strings to elsewhere in the system, and we just need to
      silence the warning.
      
      It is not clear to me whether we actually need to build the file
      with the -fshort-wchar flag, but if we do, then we should also
      pass --no-wchar-size-warning to the linker, to avoid the warning.
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
      Fixes: 37060935dc04 ("ARM64: XEN: Add a function to initialize Xen specific UEFI runtime services")
    • xen/gntdev: reduce copy batch size to 16 · 36ae220a
      David Vrabel authored
      IOCTL_GNTDEV_GRANT_COPY batches copy operations to reduce the number
      of hypercalls.  The stack is used to avoid a memory allocation in a
      hot path. However, a batch size of 24 requires more than 1024 bytes of
      stack which in some configurations causes a compiler warning.
      
          xen/gntdev.c: In function ‘gntdev_ioctl_grant_copy’:
          xen/gntdev.c:949:1: warning: the frame size of 1248 bytes is
          larger than 1024 bytes [-Wframe-larger-than=]
      
      This is a harmless warning as there is still plenty of stack spare,
      but people keep trying to "fix" it.  Reduce the batch size to 16 to
      reduce stack usage to less than 1024 bytes.  This should have minimal
      impact on performance.
      Signed-off-by: David Vrabel <david.vrabel@citrix.com>
    • xen/x86: don't lose event interrupts · c06b6d70
      Stefano Stabellini authored
      On slow platforms with unreliable TSC, such as QEMU emulated machines,
      it is possible for the kernel to request the next event in the past. In
      that case, in the current implementation of xen_vcpuop_clockevent, we
      simply return -ETIME. To be precise, Xen returns -ETIME and we pass
      it on. However, the result of this is a missed event, which simply causes
      the kernel to hang.
      
      Instead it is better to always ask the hypervisor for a timer event,
      even if the timeout is in the past. That way there are no lost
      interrupts and the kernel survives. To do that, remove the
      VCPU_SSHOTTMR_future flag.
      Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
      Acked-by: Juergen Gross <jgross@suse.com>
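The effect of dropping the "future" flag can be shown with a toy model of the hypervisor side. This is a sketch under assumptions: `model_vcpu_timer`, `model_set_singleshot_timer` and `MODEL_SSHOTTMR_FUTURE` are hypothetical stand-ins (the last one for VCPU_SSHOTTMR_future), and the real hypercall interface is not reproduced. With the flag set, a deadline that has already passed is rejected with -ETIME and no timer is armed; without it, the timer is armed anyway, so the event is not lost.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-in for VCPU_SSHOTTMR_future. */
#define MODEL_SSHOTTMR_FUTURE 0x1u

/* Hypothetical model of a per-vCPU single-shot timer. */
struct model_vcpu_timer {
    int armed;
    uint64_t deadline_ns;
};

/*
 * Model of the hypervisor side: with the "future" flag, a deadline in
 * the past is refused; without it, the timer is armed regardless and
 * would simply fire as soon as possible.
 */
static int model_set_singleshot_timer(struct model_vcpu_timer *t,
                                      uint64_t now_ns, uint64_t deadline_ns,
                                      unsigned int flags)
{
    if ((flags & MODEL_SSHOTTMR_FUTURE) && deadline_ns < now_ns)
        return -ETIME;          /* caller gets no event at all */

    t->armed = 1;
    t->deadline_ns = deadline_ns;
    return 0;
}
```

In this model, passing `flags == 0` for a stale deadline still arms the timer, which mirrors the behaviour the commit message asks for.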
    • Merge branch 'akpm' (patches from Andrew) · 84787c57
      Linus Torvalds authored
      Merge yet more updates from Andrew Morton:
      
       - Oleg's "wait/ptrace: assume __WALL if the child is traced".  It's a
         kernel-based workaround for existing userspace issues.
      
       - A few hotfixes
      
       - befs cleanups
      
       - nilfs2 updates
      
       - sys_wait() changes
      
       - kexec updates
      
       - kdump
      
       - scripts/gdb updates
      
       - the last of the MM queue
      
       - a few other misc things
      
      * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (84 commits)
        kgdb: depends on VT
        drm/amdgpu: make amdgpu_mn_get wait for mmap_sem killable
        drm/radeon: make radeon_mn_get wait for mmap_sem killable
        drm/i915: make i915_gem_mmap_ioctl wait for mmap_sem killable
        uprobes: wait for mmap_sem for write killable
        prctl: make PR_SET_THP_DISABLE wait for mmap_sem killable
        exec: make exec path waiting for mmap_sem killable
        aio: make aio_setup_ring killable
        coredump: make coredump_wait wait for mmap_sem for write killable
        vdso: make arch_setup_additional_pages wait for mmap_sem for write killable
        ipc, shm: make shmem attach/detach wait for mmap_sem killable
        mm, fork: make dup_mmap wait for mmap_sem for write killable
        mm, proc: make clear_refs killable
        mm: make vm_brk killable
        mm, elf: handle vm_brk error
        mm, aout: handle vm_brk failures
        mm: make vm_munmap killable
        mm: make vm_mmap killable
        mm: make mmap_sem for write waits killable for mm syscalls
        MAINTAINERS: add co-maintainer for scripts/gdb
        ...
    • Merge tag 'linux-kselftest-4.7-rc1' of... · d62a0234
      Linus Torvalds authored
      Merge tag 'linux-kselftest-4.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
      
      Pull kselftest updates from Shuah Khan:
       "This update for Kselftest adds:
      
         - a new ftrace testcase
         - fixes for ftrace and intel_pstate tests"
      
      * tag 'linux-kselftest-4.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
        tools: testing: define the _GNU_SOURCE macro
        kselftests/ftrace: Add a test case for event pid filtering
        kselftests/ftrace: Detect tracefs mount point
    • Merge tag 'trace-v4.7-3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace · 4496a1d9
      Linus Torvalds authored
      Pull tracing fix from Steven Rostedt:
       "Reviewing the selftest I recently submitted, I realize that the second
        part of it uses my old hack to get the PID of the spawned background
        tasks, which doesn't work for all shells, instead of the common use of
        $!"
      
      * tag 'trace-v4.7-3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
        ftracetest: Use proper logic to find process PID
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/cmetcalf/linux-tile · d6542d76
      Linus Torvalds authored
      Pull arch/tile updates from Chris Metcalf:
       "This is an even quieter cycle than usual"
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/cmetcalf/linux-tile:
        Fix typo
        Fix typo
        Fix typo
        tile: sort the "select" lines in the TILE/TILEGX configs
        tile: clarify barrier semantics of atomic_add_return
        tile/defconfigs: Remove CONFIG_IPV6_PRIVACY
    • Merge branch 'for-4.7-dw' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata · 3ec438af
      Linus Torvalds authored
      Pull libata sata_dwc_460ex updates from Tejun Heo:
       "Patches to bring sata_dwc_460ex up to snuff.
      
        It was a separate pull request because it depends on dmaengine dw
        platform changes which are now in mainline"
      
      * 'for-4.7-dw' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata: (24 commits)
        ata: dwc: add DMADEVICES dependency
        powerpc/4xx: Device tree update for the 460ex DWC SATA
        ata: sata_dwc_460ex: make debug messages neat
        ata: sata_dwc_460ex: supply physical address of FIFO to DMA
        ata: sata_dwc_460ex: use devm_ioremap
        ata: sata_dwc_460ex: tidy up sata_dwc_clear_dmacr()
        ata: sata_dwc_460ex: use readl/writel_relaxed()
        ata: sata_dwc_460ex: switch to new dmaengine_terminate_* API
        ata: sata_dwc_460ex: add __iomem to register base pointer
        ata: sata_dwc_460ex: get rid of incorrect cast
        ata: sata_dwc_460ex: get rid of some pointless casts
        ata: sata_dwc_460ex: remove empty libata callback
        ata: sata_dwc_460ex: correct HOSTDEV{P}_FROM_*() macros
        ata: sata_dwc_460ex: get rid of global data
        ata: sata_dwc_460ex: add phy support
        ata: sata_dwc_460ex: use "dmas" DT property to find dma channel
        ata: sata_dwc_460ex: don't call ata_sff_qc_issue() on DMA commands
        ata: sata_dwc_460ex: skip dma setup for non-dma commands
        ata: sata_dwc_460ex: select only core part of DMA driver
        ata: sata_dwc_460ex: DMA is always a flow controller
        ...
    • Merge branch 'for-4.7-zac' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata · e4f7bdc2
      Linus Torvalds authored
      Pull libata ZAC support from Tejun Heo:
       "This contains Zone ATA Command support for Shingled Magnetic Recording
        devices.
      
        In addition to sending the new commands down to the device, as ZAC
        commands depend on getting a lot of responses from the device, piping
        up responses is beefed up too.  However, it doesn't involve changes to
        libata core mechanism or its interaction with upper layers, so I'm not
        expecting too many fallouts.
      
        Kudos to Hannes for driving SMR support"
      
      * 'for-4.7-zac' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata: (28 commits)
        libata: support host-aware and host-managed ZAC devices
        libata: support device-managed ZAC devices
        libata: NCQ encapsulation for ZAC MANAGEMENT OUT
        libata: Implement ZBC OUT translation
        libata: implement ZBC IN translation
        libata: fixup ZAC device disabling
        libata-scsi: Generate sense code for disabled devices
        libata-trace: decode subcommands
        libata: Check log page directory before accessing pages
        libata: Add command definitions for NCQ Encapsulation for READ LOG DMA EXT
        libata: Separate out ata_dev_config_ncq_send_recv()
        libata/libsas: Define ATA_CMD_NCQ_NON_DATA
        libsas: enable FPDMA SEND/RECEIVE
        libata: do not attempt to retrieve sense code twice
        libata-scsi: Set information sense field for invalid parameter
        libata-scsi: set bit pointer for sense code information
        libata-scsi: Set field pointer in sense code
        scsi: add scsi_set_sense_field_pointer()
        libata: Implement control mode page to select sense format
        libata-scsi: generate correct ATA pass-through sense
        ...
    • Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security · 3159ee58
      Linus Torvalds authored
      Pull more security subsystem updates from James Morris:
       "Minor updates for the keys code"
      
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security:
        MAINTAINERS: Update keyrings record and add asymmetric keys record
        lib: asn1_decoder - add MODULE_LICENSE("GPL")
        KEYS: The PKCS#7 test key type should use the secondary keyring
    • kgdb: depends on VT · c5d2cac0
      Jiri Slaby authored
      With VT=n, the kernel build fails with:
      
        drivers/built-in.o: In function `kgdboc_pre_exp_handler':
        kgdboc.c:(.text+0x7b5aa): undefined reference to `fg_console'
        kgdboc.c:(.text+0x7b5ce): undefined reference to `vc_cons'
        kgdboc.c:(.text+0x7b5d5): undefined reference to `vc_cons'
      
      kgdboc.o is built when KGDB_SERIAL_CONSOLE is set.  So make
      KGDB_SERIAL_CONSOLE depend on HW_CONSOLE which includes those symbols.
      
      Link: http://lkml.kernel.org/r/1459412955-4696-1-git-send-email-jslaby@suse.cz
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
      Reported-by: "Jim Davis" <jim.epost@gmail.com>
      Acked-by: Jason Wessel <jason.wessel@windriver.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • drm/amdgpu: make amdgpu_mn_get wait for mmap_sem killable · b5637051
      Michal Hocko authored
      amdgpu_mn_get, which is called during the ioctl path, relies on mmap_sem
      for write.  If the waiting task gets killed by the oom killer it would
      block oom_reaper from asynchronous address space reclaim and reduce the
      chances of timely OOM resolving.  Wait for the lock in the killable mode
      and return with EINTR if the task got killed while waiting.
      
      [arnd@arndb.de: use ERR_PTR() to return from amdgpu_mn_get]
      Signed-off-by: Michal Hocko <mhocko@suse.com>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Reviewed-by: Christian König <christian.koenig@amd.com>
      Cc: David Airlie <airlied@linux.ie>
      Cc: Alex Deucher <alexander.deucher@amd.com>
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
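The "wait killable, return EINTR" pattern that this commit and the following mmap_sem commits share can be modelled in userspace. This is only a sketch: `model_rwsem`, `model_task`, `model_down_write_killable` and `model_modify_address_space` are hypothetical names, and waiting is simulated with a tick counter rather than real sleeping. The kernel primitive the series relies on is down_write_killable; its implementation is not reproduced here.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of a semaphore that is currently held by someone else. */
struct model_rwsem {
    int busy_ticks;     /* ticks until the current holder releases; >= 0 */
    bool write_locked;  /* true while the caller holds it for write */
};

/* Hypothetical model of the waiting task. */
struct model_task {
    int kill_at_tick;   /* tick at which a fatal signal arrives, -1 = never */
};

/*
 * Wait for the lock, but give up with -EINTR once a fatal signal is
 * pending, instead of sleeping uninterruptibly.
 */
static int model_down_write_killable(struct model_rwsem *sem,
                                     const struct model_task *task)
{
    for (int tick = 0; ; tick++) {
        if (sem->busy_ticks == 0) {
            sem->write_locked = true;
            return 0;
        }
        if (task->kill_at_tick >= 0 && tick >= task->kill_at_tick)
            return -EINTR;
        sem->busy_ticks--;      /* the holder makes progress */
    }
}

/* Caller pattern: bail out early and propagate the error. */
static int model_modify_address_space(struct model_rwsem *sem,
                                      const struct model_task *task)
{
    if (model_down_write_killable(sem, task))
        return -EINTR;

    /* ... the work that needs the lock held for write would go here ... */

    sem->write_locked = false;  /* release */
    return 0;
}
```

A task that is killed while waiting returns -EINTR without ever taking the lock, so it can exit promptly; a task that is not killed proceeds as before.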
    • drm/radeon: make radeon_mn_get wait for mmap_sem killable · 2267c299
      Michal Hocko authored
      radeon_mn_get, which is called during the ioctl path, relies on mmap_sem
      for write.  If the waiting task gets killed by the oom killer it would
      block oom_reaper from asynchronous address space reclaim and reduce the
      chances of timely OOM resolving.  Wait for the lock in the killable mode
      and return with EINTR if the task got killed while waiting.
      Signed-off-by: Michal Hocko <mhocko@suse.com>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Reviewed-by: Christian König <christian.koenig@amd.com>
      Cc: Alex Deucher <alexander.deucher@amd.com>
      Cc: David Airlie <airlied@linux.ie>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • drm/i915: make i915_gem_mmap_ioctl wait for mmap_sem killable · 80a89a5e
      Michal Hocko authored
      i915_gem_mmap_ioctl relies on mmap_sem for write.  If the waiting task
      gets killed by the oom killer it would block oom_reaper from
      asynchronous address space reclaim and reduce the chances of timely OOM
      resolving.  Wait for the lock in the killable mode and return with EINTR
      if the task got killed while waiting.
      Signed-off-by: Michal Hocko <mhocko@suse.com>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Cc: Daniel Vetter <daniel.vetter@intel.com>
      Cc: David Airlie <airlied@linux.ie>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • uprobes: wait for mmap_sem for write killable · 598fdc1d
      Michal Hocko authored
      xol_add_vma needs mmap_sem for write.  If the waiting task gets killed
      by the oom killer it would block oom_reaper from asynchronous address
      space reclaim and reduce the chances of timely OOM resolving.  Wait for
      the lock in the killable mode and return with EINTR if the task got
      killed while waiting.
      
      Do not warn in dup_xol_work if __create_xol_area failed due to a pending
      fatal signal, because such a warning is usually taken to indicate a
      kernel issue.
      Signed-off-by: Michal Hocko <mhocko@suse.com>
      Acked-by: Oleg Nesterov <oleg@redhat.com>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • prctl: make PR_SET_THP_DISABLE wait for mmap_sem killable · 17b0573d
      Michal Hocko authored
      PR_SET_THP_DISABLE requires mmap_sem for write.  If the waiting task
      gets killed by the oom killer it would block oom_reaper from
      asynchronous address space reclaim and reduce the chances of timely OOM
      resolving.  Wait for the lock in the killable mode and return with EINTR
      if the task got killed while waiting.
      Signed-off-by: Michal Hocko <mhocko@suse.com>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Acked-by: Alex Thorlton <athorlton@sgi.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • exec: make exec path waiting for mmap_sem killable · f268dfe9
      Michal Hocko authored
      setup_arg_pages requires mmap_sem for write.  If the waiting task gets
      killed by the oom killer it would block oom_reaper from asynchronous
      address space reclaim and reduce the chances of timely OOM resolving.
      Wait for the lock in the killable mode and return with EINTR if the task
      got killed while waiting.  All the callers already handle the error
      path and the fatal signal doesn't need any additional treatment.
      
      The same applies to __bprm_mm_init.
      Signed-off-by: Michal Hocko <mhocko@suse.com>
      Acked-by: Oleg Nesterov <oleg@redhat.com>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • aio: make aio_setup_ring killable · 013373e8
      Michal Hocko authored
      aio_setup_ring waits for mmap_sem in writable mode.  If the waiting task
      gets killed by the oom killer it would block oom_reaper from
      asynchronous address space reclaim and reduce the chances of timely OOM
      resolving.  Wait for the lock in the killable mode and return with EINTR
      if the task got killed while waiting.  This will also expedite the
      return to the userspace and do_exit.
      Signed-off-by: Michal Hocko <mhocko@suse.com>
      Acked-by: Jeff Moyer <jmoyer@redhat.com>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Cc: Benamin LaHaise <bcrl@kvack.org>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • coredump: make coredump_wait wait for mmap_sem for write killable · 4136c26b
      Michal Hocko authored
      coredump_wait currently waits for mmap_sem for write, which can prevent
      oom_reaper from reclaiming the oom victim's address space asynchronously
      because that requires mmap_sem for read.  This might happen if the oom
      victim is multi-threaded and some thread(s) are holding mmap_sem for read
      (e.g. in a page fault) and are stuck in the page allocator while other
      thread(s) have already reached coredump_wait.
      
      This patch simply uses down_write_killable and bails out with EINTR if
      the lock got interrupted by the fatal signal.  do_coredump will return
      right away and do_group_exit will take care to zap the whole thread
      group.
      Signed-off-by: Michal Hocko <mhocko@suse.com>
      Acked-by: Oleg Nesterov <oleg@redhat.com>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • vdso: make arch_setup_additional_pages wait for mmap_sem for write killable · 69048176
      Michal Hocko authored
      Most architectures rely on mmap_sem for write in their
      arch_setup_additional_pages.  If the waiting task gets killed by the oom
      killer it would block oom_reaper from asynchronous address space reclaim
      and reduce the chances of timely OOM resolving.  Wait for the lock in
      the killable mode and return with EINTR if the task got killed while
      waiting.
      Signed-off-by: Michal Hocko <mhocko@suse.com>
      Acked-by: Andy Lutomirski <luto@amacapital.net>	[x86 vdso]
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • ipc, shm: make shmem attach/detach wait for mmap_sem killable · 91f4f94e
      Michal Hocko authored
      shmat and shmdt rely on mmap_sem for write.  If the waiting task gets
      killed by the oom killer it would block oom_reaper from asynchronous
      address space reclaim and reduce the chances of timely OOM resolving.
      Wait for the lock in the killable mode and return with EINTR if the task
      got killed while waiting.
      Signed-off-by: Michal Hocko <mhocko@suse.com>
      Acked-by: Davidlohr Bueso <dave@stgolabs.net>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Cc: Hugh Dickins <hughd@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      91f4f94e
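
      The killable-wait pattern described by the commit above and its
      siblings can be sketched in userspace C.  This is a minimal model, not
      kernel code: fake_rwsem, fake_task, fake_down_write_killable,
      fake_up_write, fake_detach and the tick-based timing are all
      hypothetical stand-ins invented for illustration.  Only the return
      contract (0 once the lock is taken, -EINTR if the waiter is killed
      first) is taken from the commit messages.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; these are NOT the kernel's types. */
#define NEVER_KILLED INT_MAX

struct fake_rwsem {
    bool write_locked;  /* true while a writer holds the semaphore */
    int free_at_tick;   /* simulated time at which current holders release it */
};

struct fake_task {
    int killed_at_tick; /* simulated time a fatal signal arrives, or NEVER_KILLED */
};

/*
 * Models only the return contract of a killable write lock:
 * 0 once the lock is taken, -EINTR if the waiter is killed first.
 */
int fake_down_write_killable(struct fake_rwsem *sem, const struct fake_task *task)
{
    assert(sem != NULL && task != NULL);
    for (int tick = 0; ; tick++) {
        if (tick >= sem->free_at_tick) {
            sem->write_locked = true;   /* lock became free: take it */
            return 0;
        }
        if (tick >= task->killed_at_tick)
            return -EINTR;              /* killed while waiting: lock never taken */
    }
}

void fake_up_write(struct fake_rwsem *sem)
{
    sem->write_locked = false;
}

/* Hypothetical caller shaped like a shallow syscall path. */
int fake_detach(struct fake_rwsem *mmap_sem, const struct fake_task *task)
{
    if (fake_down_write_killable(mmap_sem, task))
        return -EINTR;  /* nothing was modified yet, so nothing to undo */

    /* ... the address-space update would happen here ... */

    fake_up_write(mmap_sem);
    return 0;
}
```

      In this model a task killed at tick 2 while the lock only becomes free
      at tick 5 gets -EINTR without ever holding the lock, so it cannot stand
      in the way of a reader; a task that is never killed takes the lock,
      does its work and returns 0.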
    • mm, fork: make dup_mmap wait for mmap_sem for write killable · 7c051267
      Michal Hocko authored
      dup_mmap needs to lock current's mm mmap_sem for write.  If the waiting
      task gets killed by the oom killer it would block oom_reaper from
      asynchronous address space reclaim and reduce the chances of timely OOM
      resolving.  Wait for the lock in the killable mode and return with EINTR
      if the task got killed while waiting.
      Signed-off-by: Michal Hocko <mhocko@suse.com>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Konstantin Khlebnikov <koct9i@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      7c051267
    • mm, proc: make clear_refs killable · 4e80153a
      Michal Hocko authored
      CLEAR_REFS_MM_HIWATER_RSS and CLEAR_REFS_SOFT_DIRTY rely on mmap_sem
      for write.  If the waiting task gets killed by the oom killer while it
      operates on current's mm, it would block oom_reaper from asynchronous
      address space reclaim and reduce the chances of timely OOM resolving.
      Wait for the lock in the killable mode and return with EINTR if the
      task got killed while waiting.  This will also expedite the return to
      userspace and do_exit even if the mm is remote.
      Signed-off-by: Michal Hocko <mhocko@suse.com>
      Acked-by: Oleg Nesterov <oleg@redhat.com>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Cc: Petr Cermak <petrcermak@chromium.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      4e80153a
    • mm: make vm_brk killable · 2d6c9282
      Michal Hocko authored
      Now that all the callers handle vm_brk failure we can change it to wait
      for mmap_sem killably, so that oom_reaper does not get blocked just
      because vm_brk is blocked behind mmap_sem readers.
      Signed-off-by: Michal Hocko <mhocko@suse.com>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      2d6c9282
    • mm, elf: handle vm_brk error · ecc2bc8a
      Michal Hocko authored
      load_elf_library doesn't handle vm_brk failure although nothing really
      indicates it cannot do that, because the function is allowed to fail due
      to vm_mmap failures already.  This might not be a problem now, but a
      later patch will make vm_brk killable (i.e. waiting for mmap_sem for
      write will become killable), so the failure will become more probable.
      Signed-off-by: Michal Hocko <mhocko@suse.com>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      ecc2bc8a
    • mm, aout: handle vm_brk failures · 864778b1
      Michal Hocko authored
      vm_brk is allowed to fail but load_aout_binary simply ignores the error
      and happily continues.  I haven't noticed any problem from that in real
      life, but later patches will make the failure more likely because vm_brk
      will become killable (i.e. waiting for mmap_sem for write will become
      killable), so we should be more careful now.
      
      The error handling should be quite straightforward because there are
      calls to vm_mmap which check the error properly already.  The only
      notable exception is set_brk, which is called after the beyond_if label.
      But nothing indicates that we cannot move it above set_binfmt, as the
      two do not depend on each other, and so fail before we do set_binfmt and
      alter reference counting.
      Signed-off-by: Michal Hocko <mhocko@suse.com>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      864778b1
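
      The ordering argument in the commit above (check the vm_brk result and
      fail before set_binfmt) can be illustrated with a small userspace
      sketch.  Everything here is a hypothetical stand-in and not the actual
      binfmt_aout code: fake_loader_state, fake_vm_brk and the two
      fake_load_* functions are invented names, and binfmt_committed merely
      models the state change that set_binfmt represents.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-ins; these are NOT the kernel's loader structures. */
struct fake_loader_state {
    int brk_error;          /* test knob: 0, or a negative errno the fake vm_brk returns */
    bool binfmt_committed;  /* models the state change done by set_binfmt */
};

/* Stand-in for a vm_brk call that is allowed to fail. */
int fake_vm_brk(const struct fake_loader_state *st)
{
    return st->brk_error;
}

/* "Before" shape: the error is dropped and loading carries on regardless. */
int fake_load_ignoring_errors(struct fake_loader_state *st)
{
    (void)fake_vm_brk(st);       /* failure silently ignored */
    st->binfmt_committed = true;
    return 0;
}

/* "After" shape: the brk setup runs first and its error is propagated. */
int fake_load_checking_errors(struct fake_loader_state *st)
{
    int retval = fake_vm_brk(st);

    if (retval < 0)
        return retval;           /* fail before any state is committed */

    st->binfmt_committed = true;
    return 0;
}
```

      With brk_error set to -EINTR, the checking variant returns -EINTR and
      leaves binfmt_committed untouched, while the ignoring variant reports
      success on top of a failed brk setup.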
    • mm: make vm_munmap killable · ae798783
      Michal Hocko authored
      Almost all current users of vm_munmap ignore the return value and so
      do not handle potential errors.  This means that some VMAs might stay
      behind.  This patch doesn't try to solve those potential problems.
      Quite the contrary, it adds a new failure mode by using
      down_write_killable in vm_munmap.  This should be safer than other
      failure modes, though, because the process is guaranteed to die as soon
      as it leaves the kernel and exit_mmap will clean the whole address space.
      
      This will help in OOM conditions when the oom victim might be stuck
      waiting for the mmap_sem for write, which in turn can block oom_reaper,
      which relies on the mmap_sem for read to make forward progress and
      reclaim the address space of the victim.
      Signed-off-by: Michal Hocko <mhocko@suse.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Konstantin Khlebnikov <koct9i@gmail.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      ae798783
    • mm: make vm_mmap killable · 9fbeb5ab
      Michal Hocko authored
      All the callers of vm_mmap seem to check for failure already and bail
      out in one way or another on error, which means that we can change it
      to use the killable version of vm_mmap_pgoff and return -EINTR if the
      current task gets killed while waiting for mmap_sem.  This also means
      that vm_mmap_pgoff can be killable by default and drop the additional
      parameter.
      
      This will help in OOM conditions when the oom victim might be stuck
      waiting for the mmap_sem for write, which in turn can block oom_reaper,
      which relies on the mmap_sem for read to make forward progress and
      reclaim the address space of the victim.
      
      Please note that load_elf_binary ignores the vm_mmap error in the
      current->personality & MMAP_PAGE_ZERO case, but that shouldn't be a
      problem because the address is not used anywhere and we never return to
      userspace if we got killed.
      Signed-off-by: Michal Hocko <mhocko@suse.com>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      9fbeb5ab
    • mm: make mmap_sem for write waits killable for mm syscalls · dc0ef0df
      Michal Hocko authored
      This is follow-up work for oom_reaper [1].  As the async OOM killing
      depends on mmap_sem for read, we would really appreciate it if a holder
      for write didn't stand in the way.  This patchset changes many
      down_write calls to be killable to help those cases when the writer is
      blocked waiting for readers to release the lock, and so helps
      __oom_reap_task to process the oom victim.
      
      Most of the patches are really trivial because the lock is held from
      shallow syscall paths where we can return EINTR trivially and allow the
      current task to die (note that EINTR will never get to the userspace as
      the task has fatal signal pending).  Others seem to be easy as well as
      the callers are already handling fatal errors and bail and return to
      userspace which should be sufficient to handle the failure gracefully.
      I am not familiar with all those code paths so a deeper review is really
      appreciated.
      
      As this work is touching more areas which are not directly connected I
      have tried to keep the CC list as small as possible and people who I
      believed would be familiar are CCed only to the specific patches (all
      should have received the cover though).
      
      This patchset is based on linux-next and it depends on
      down_write_killable for rw_semaphores which got merged into tip
      locking/rwsem branch and it is merged into this next tree.  I guess it
      would be easiest to route these patches via mmotm because of the
      dependency on the tip tree, but if the respective maintainers prefer
      another way I have no objections.
      
      I haven't covered all the down_write(mm->mmap_sem) instances here:
      
        $ git grep "down_write(.*\<mmap_sem\>)" next/master | wc -l
        98
        $ git grep "down_write(.*\<mmap_sem\>)" | wc -l
        62
      
      I have tried to cover those which should be relatively easy to review in
      this series because this alone should be a nice improvement.  Other
      places can be changed on top.
      
      [0] http://lkml.kernel.org/r/1456752417-9626-1-git-send-email-mhocko@kernel.org
      [1] http://lkml.kernel.org/r/1452094975-551-1-git-send-email-mhocko@kernel.org
      [2] http://lkml.kernel.org/r/1456750705-7141-1-git-send-email-mhocko@kernel.org
      
      This patch (of 18):
      
      This is the first step in making mmap_sem write waiters killable.  It
      focuses on the trivial ones, which take the lock early after entering
      the syscall and do not change any state before that.
      
      Therefore it is very easy to change them to use down_write_killable and
      immediately return with -EINTR.  This will allow the waiter to pass away
      without blocking the mmap_sem, which might be required to make forward
      progress.  E.g. the oom reaper will need the lock for reading to
      dismantle the OOM victim address space.
      
      The only tricky function in this patch is vm_mmap_pgoff, which has many
      call sites via vm_mmap.  To reduce the risk, keep vm_mmap with the
      original non-killable semantics for now.
      
      vm_munmap callers do not bother checking the return value so open code
      it into the munmap syscall path for now for simplicity.
      Signed-off-by: Michal Hocko <mhocko@suse.com>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Konstantin Khlebnikov <koct9i@gmail.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      dc0ef0df