1. 05 Jul, 2019 3 commits
    • mm/vmscan.c: prevent useless kswapd loops · dffcac2c
      Shakeel Butt authored
      In production we have noticed hard lockups on large machines running
      large jobs due to kswapd hoarding the LRU lock within isolate_lru_pages()
      when sc->reclaim_idx is 0, which is a small zone.  The LRU was a couple
      hundred GiB and the condition (page_zonenum(page) > sc->reclaim_idx) in
      isolate_lru_pages() was basically skipping GiBs of pages while holding
      the LRU spinlock with interrupts disabled.
      
      On further inspection, it seems like there are two issues:
      
      (1) If kswapd on the return from balance_pgdat() could not sleep (i.e.
          node is still unbalanced), the classzone_idx is unintentionally set
          to 0 and the whole reclaim cycle of kswapd will try to reclaim only
          the lowest and smallest zone while traversing the whole memory.
      
      (2) Fundamentally isolate_lru_pages() is really bad when the
          allocation has woken kswapd for a smaller zone on a very large machine
          running very large jobs.  It can hoard the LRU spinlock while skipping
          over 100s of GiBs of pages.
      
      This patch only fixes (1).  (2) needs a more fundamental solution.  To
      fix (1), in the kswapd context, if pgdat->kswapd_classzone_idx is
      invalid, use the classzone_idx of the previous kswapd loop; otherwise use
      the one the waker has requested.
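
The selection rule described above can be sketched in plain C. This is a minimal userspace model, not the kernel code: the function name, the sentinel `INVALID_ZONE_IDX`, and the use of plain `int` for zone indices are all assumptions made for illustration.

```c
/* Hypothetical sentinel meaning "no waker has requested a zone index". */
#define INVALID_ZONE_IDX (-1)

/*
 * Pick the zone index for the next reclaim pass: the waker's request if
 * there is one, otherwise whatever the previous pass used. Falling back
 * to the previous index (rather than to 0) is the point of the fix.
 */
int next_classzone_idx(int requested_idx, int prev_idx)
{
    if (requested_idx == INVALID_ZONE_IDX)
        return prev_idx;
    return requested_idx;
}
```

With no pending request and a previous index of 2, the model keeps reclaiming for index 2 instead of dropping to the smallest zone.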
      
      Link: http://lkml.kernel.org/r/20190701201847.251028-1-shakeelb@google.com
      Fixes: e716f2eb ("mm, vmscan: prevent kswapd sleeping prematurely due to mismatched classzone_idx")
      Signed-off-by: Shakeel Butt <shakeelb@google.com>
      Reviewed-by: Yang Shi <yang.shi@linux.alibaba.com>
      Acked-by: Mel Gorman <mgorman@techsingularity.net>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Hillf Danton <hdanton@sina.com>
      Cc: Roman Gushchin <guro@fb.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • fs/userfaultfd.c: disable irqs for fault_pending and event locks · cbcfa130
      Eric Biggers authored
      When IOCB_CMD_POLL is used on a userfaultfd, aio_poll() disables IRQs
      and takes kioctx::ctx_lock, then userfaultfd_ctx::fd_wqh.lock.
      
      This may have to wait for userfaultfd_ctx::fd_wqh.lock to be released by
      userfaultfd_ctx_read(), which in turn can be waiting for
      userfaultfd_ctx::fault_pending_wqh.lock or
      userfaultfd_ctx::event_wqh.lock.
      
      But elsewhere the fault_pending_wqh and event_wqh locks are taken with
      IRQs enabled.  Since the IRQ handler may take kioctx::ctx_lock, lockdep
      reports that a deadlock is possible.
      
      Fix it by always disabling IRQs when taking the fault_pending_wqh and
      event_wqh locks.
      
      Commit ae62c16e ("userfaultfd: disable irqs when taking the
      waitqueue lock") didn't fix this because it only accounted for the
      fd_wqh lock, not the other locks nested inside it.
      
      Link: http://lkml.kernel.org/r/20190627075004.21259-1-ebiggers@kernel.org
      Fixes: bfe4037e ("aio: implement IOCB_CMD_POLL")
      Signed-off-by: Eric Biggers <ebiggers@google.com>
      Reported-by: syzbot+fab6de82892b6b9c6191@syzkaller.appspotmail.com
      Reported-by: syzbot+53c0b767f7ca0dc0c451@syzkaller.appspotmail.com
      Reported-by: syzbot+a3accb352f9c22041cfa@syzkaller.appspotmail.com
      Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: <stable@vger.kernel.org>	[4.19+]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/page_alloc.c: fix regression with deferred struct page init · b9705d87
      Juergen Gross authored
      Commit 0e56acae ("mm: initialize MAX_ORDER_NR_PAGES at a time
      instead of doing larger sections") is causing a regression on some
      systems when the kernel is booted as Xen dom0.
      
      The system will just hang in early boot.
      
      The reason is an endless loop in get_page_from_freelist() in case the first
      zone looked at has no free memory.  deferred_grow_zone() is always
      returning true due to the following code snippet:
      
        /* If the zone is empty somebody else may have cleared out the zone */
        if (!deferred_init_mem_pfn_range_in_zone(&i, zone, &spfn, &epfn,
                                                 first_deferred_pfn)) {
                pgdat->first_deferred_pfn = ULONG_MAX;
                pgdat_resize_unlock(pgdat, &flags);
                return true;
        }
      
      This in turn results in the loop as get_page_from_freelist() is assuming
      forward progress can be made by doing some more struct page
      initialization.
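
The failure mode can be illustrated with a toy model in C. This is neither the kernel code nor the actual patch: `struct zone_model`, `grow_zone()` and `alloc_attempts()` are hypothetical names, and the sketch only shows why a "grow" helper must report progress honestly if its caller retries on success.

```c
#include <stdbool.h>

/* Hypothetical model of a zone with pages still awaiting deferred init. */
struct zone_model {
    unsigned long deferred_pages; /* not yet initialized */
    unsigned long free_pages;     /* initialized and allocatable */
};

/* Report progress only when something was actually initialized. */
bool grow_zone(struct zone_model *z)
{
    if (z->deferred_pages == 0)
        return false; /* nothing left: retrying cannot help */
    z->free_pages += z->deferred_pages;
    z->deferred_pages = 0;
    return true;
}

/*
 * Allocation loop that retries only while growing makes progress.
 * Returns the number of attempts used, or -1 if allocation failed.
 */
int alloc_attempts(struct zone_model *z, int max_attempts)
{
    int attempts = 0;

    while (attempts < max_attempts) {
        attempts++;
        if (z->free_pages > 0) {
            z->free_pages--;
            return attempts;
        }
        if (!grow_zone(z))
            return -1; /* an unconditional "true" here would spin forever */
    }
    return -1;
}
```

If `grow_zone()` returned true for an empty zone, the loop would only stop at `max_attempts`; the real caller has no such bound, which matches the hang described above.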
      
      Link: http://lkml.kernel.org/r/20190620160821.4210-1-jgross@suse.com
      Fixes: 0e56acae ("mm: initialize MAX_ORDER_NR_PAGES at a time instead of doing larger sections")
      Signed-off-by: Juergen Gross <jgross@suse.com>
      Suggested-by: Alexander Duyck <alexander.h.duyck@linux.intel.com>
      Acked-by: Alexander Duyck <alexander.h.duyck@linux.intel.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
      Cc: Mike Rapoport <rppt@linux.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  2. 04 Jul, 2019 4 commits
    • Merge tag 'sound-5.2' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound · c212ddae
      Linus Torvalds authored
      Pull sound fixes from Takashi Iwai:
       "Here are a collection of small fixes for:
      
         - A race with ASoC HD-audio registration
      
         - LINE6 usb-audio memory overwrite by malformed descriptor
      
         - FireWire MIDI handling
      
         - Missing cast for bit shifts in a few USB-audio quirks
      
         - The wrong function calls in minor OSS sequencer code paths
      
         - A couple of HD-audio quirks"
      
      * tag 'sound-5.2' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
        ALSA: line6: Fix write on zero-sized buffer
        ALSA: hda: Fix widget_mutex incomplete protection
        ALSA: firewire-lib/fireworks: fix miss detection of received MIDI messages
        ALSA: seq: fix incorrect order of dest_client/dest_ports arguments
        ALSA: hda/realtek - Change front mic location for Lenovo M710q
        ALSA: usb-audio: fix sign unintended sign extension on left shifts
        ALSA: hda/realtek: Add quirks for several Clevo notebook barebones
    • ptrace: Fix ->ptracer_cred handling for PTRACE_TRACEME · 6994eefb
      Jann Horn authored
      Fix two issues:
      
      When called for PTRACE_TRACEME, ptrace_link() would obtain an RCU
      reference to the parent's objective credentials, then give that pointer
      to get_cred().  However, the object lifetime rules for things like
      struct cred do not permit unconditionally turning an RCU reference into
      a stable reference.
      
      PTRACE_TRACEME records the parent's credentials as if the parent was
      acting as the subject, but that's not the case.  If a malicious
      unprivileged child uses PTRACE_TRACEME and the parent is privileged, and
      at a later point, the parent process becomes attacker-controlled
      (because it drops privileges and calls execve()), the attacker ends up
      with control over two processes with a privileged ptrace relationship,
      which can be abused to ptrace a suid binary and obtain root privileges.
      
      Fix both of these by always recording the credentials of the process
      that is requesting the creation of the ptrace relationship:
      current_cred() can't change under us, and current is the proper subject
      for access control.
      
      This change is theoretically userspace-visible, but I am not aware of
      any code that it will actually break.
      
      Fixes: 64b875f7 ("ptrace: Capture the ptracer's creds not PT_PTRACE_CAP")
      Signed-off-by: Jann Horn <jannh@google.com>
      Acked-by: Oleg Nesterov <oleg@redhat.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Merge tag 'trace-v5.2-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace · 550d1f5b
      Linus Torvalds authored
      Pull tracing fixes from Steven Rostedt:
       "This includes three fixes:
      
         - Fix a deadlock from a previous fix to keep module loading and
           function tracing text modifications from stepping on each other
           (this has a few patches to help document the issue in comments)
      
         - Fix a crash when the snapshot buffer gets out of sync with the main
           ring buffer
      
         - Fix a memory leak when reading the memory logs"
      
      * tag 'trace-v5.2-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
        ftrace/x86: Anotate text_mutex split between ftrace_arch_code_modify_post_process() and ftrace_arch_code_modify_prepare()
        tracing/snapshot: Resize spare buffer if size changed
        tracing: Fix memory leak in tracing_err_log_open()
        ftrace/x86: Add a comment to why we take text_mutex in ftrace_arch_code_modify_prepare()
        ftrace/x86: Remove possible deadlock between register_kprobe() and ftrace_run_update_code()
    • Merge tag 'gpio-v5.2-4' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio · 179c96d9
      Linus Torvalds authored
      Pull GPIO fix from Linus Walleij:
       "A single fixup for the SPI CS gpios that regressed in the current
        kernel cycle"
      
      * tag 'gpio-v5.2-4' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio:
        gpio/spi: Fix spi-gpio regression on active high CS
  3. 03 Jul, 2019 4 commits
  4. 02 Jul, 2019 3 commits
    • gpio/spi: Fix spi-gpio regression on active high CS · fbbf145a
      Linus Walleij authored
      I ran into an intriguing bug caused by
      commit "spi: gpio: Don't request CS GPIO in DT use-case"
      affecting all SPI GPIO devices with an active high
      chip select line.
      
      The commit switches the CS gpio handling over to the GPIO
      core, which will parse and handle "cs-gpios" from the OF
      node without even calling down to the driver to get the
      job done.
      
      However the GPIO core handles the standard bindings in
      Documentation/devicetree/bindings/spi/spi-controller.yaml
      that specifies that active high CS needs to be specified
      using "spi-cs-high" in the DT node.
      
      The code in drivers/spi/spi-gpio.c never respected this
      and never tried to inspect subnodes to see if they contained
      "spi-cs-high" like the gpiolib OF quirks do. Instead the
      only way to get an active high CS was to tag it in the
      device tree using the flags cell such as
      cs-gpios = <&gpio 4 GPIO_ACTIVE_HIGH>;
      
      This alters the quirks to not inspect the subnodes of SPI
      masters on "spi-gpio" for the standard attribute "spi-cs-high",
      making old device trees work as expected.
      
      This semantic is a bit ambiguous, but just allowing the
      flags on the GPIO descriptor to modify polarity is what
      the kernel at large mostly uses so let's encourage that.
      
      Fixes: 249e2632 ("spi: gpio: Don't request CS GPIO in DT use-case")
      Cc: Andrey Smirnov <andrew.smirnov@gmail.com>
      Cc: linux-gpio@vger.kernel.org
      Cc: linux-spi@vger.kernel.org
      Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
    • ftrace/x86: Anotate text_mutex split between... · 074376ac
      Jiri Kosina authored
      ftrace/x86: Anotate text_mutex split between ftrace_arch_code_modify_post_process() and ftrace_arch_code_modify_prepare()
      
      ftrace_arch_code_modify_prepare() is acquiring text_mutex, while the
      corresponding release is happening in ftrace_arch_code_modify_post_process().
      
      This has already been documented in the code, but let's also make the fact
      that this is intentional clear to the semantic analysis tools such as sparse.
      
      Link: http://lkml.kernel.org/r/nycvar.YFH.7.76.1906292321170.27227@cbobk.fhfr.pm
      
      Fixes: 39611265 ("ftrace/x86: Add a comment to why we take text_mutex in ftrace_arch_code_modify_prepare()")
      Fixes: d5b844a2 ("ftrace/x86: Remove possible deadlock between register_kprobe() and ftrace_run_update_code()")
      Signed-off-by: Jiri Kosina <jkosina@suse.cz>
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
    • ALSA: line6: Fix write on zero-sized buffer · 34501219
      Takashi Iwai authored
      LINE6 drivers allocate their buffers based on the value returned from
      usb_maxpacket() calls.  A manipulated device may return zero for
      this, which results in a kmalloc() with zero size (which may
      succeed) while other parts of the driver code write packet
      data with a fixed size, overrunning the buffer.
      
      This patch adds a simple sanity check that rejects the invalid buffer
      size to avoid that problem.
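
A sanity check of this kind might look like the following userspace sketch. It is an illustration only: `alloc_packet_buffer()` is a hypothetical helper, `malloc()` stands in for `kmalloc()`, and the choice of `-EINVAL` for the zero-size case is an assumption rather than something the commit message states.

```c
#include <errno.h>
#include <stdlib.h>

/*
 * Hypothetical helper: allocate a packet buffer whose size comes from a
 * device-reported value. A reported size of zero is rejected up front,
 * so later fixed-size writes never target a zero-length allocation.
 * Returns 0 on success or a negative errno value.
 */
int alloc_packet_buffer(unsigned int max_packet_size, unsigned char **out)
{
    *out = NULL;
    if (max_packet_size == 0)
        return -EINVAL; /* untrusted device reported a nonsensical size */

    *out = malloc(max_packet_size);
    return *out ? 0 : -ENOMEM;
}
```

The caller is expected to check the return value before writing any packet data and to free the buffer when done.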
      
      Reported-by: syzbot+219f00fb49874dcaea17@syzkaller.appspotmail.com
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Takashi Iwai <tiwai@suse.de>
  5. 01 Jul, 2019 3 commits
    • ALSA: hda: Fix widget_mutex incomplete protection · 98482377
      Evan Green authored
      The widget_mutex was introduced to serialize callers to
      hda_widget_sysfs_{re}init. However, its protection of the sysfs widget array
      is incomplete. For example, it is acquired around the call to
      hda_widget_sysfs_reinit(), which actually creates the new array, but is no
      longer held when codec->num_nodes and codec->start_nid are updated. So
      the lock ensures one thread sets up the new array at a time, but doesn't
      ensure which thread's value will end up in codec->num_nodes. If a larger
      num_nodes wins but a smaller array was set up, the next call to
      refresh_widgets() will touch free memory as it iterates over codec->num_nodes
      that aren't there.
      
      The widget_lock really protects both the tree as well as codec->num_nodes,
      start_nid, and end_nid, so make sure it's held across that update. It should
      also be held during snd_hdac_get_sub_nodes(), so that a very old read from that
      function doesn't end up clobbering a later update.
      
      Fixes: ed180abb ("ALSA: hda: Fix race between creating and refreshing sysfs entries")
      Signed-off-by: Evan Green <evgreen@chromium.org>
      Signed-off-by: Takashi Iwai <tiwai@suse.de>
    • ALSA: firewire-lib/fireworks: fix miss detection of received MIDI messages · 7fbd1753
      Takashi Sakamoto authored
      In IEC 61883-6, 8 MIDI data streams are multiplexed into a single
      MIDI conformant data channel. The index of a stream is calculated as
      the value of the data block counter modulo 8.
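
The stream index calculation described above reduces to one line of C. The function and macro names below are hypothetical, and the sketch ignores the firmware quirk that this commit works around.

```c
/* Number of MIDI data streams multiplexed into one data channel. */
#define MIDI_STREAMS_PER_CHANNEL 8

/* Map a data block counter value to the MIDI stream it carries. */
unsigned int midi_stream_index(unsigned int data_block_counter)
{
    return data_block_counter % MIDI_STREAMS_PER_CHANNEL;
}
```

Because the index depends directly on the counter, a counter that is off by some amount in the CIP header attributes data to the wrong stream.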
      
      In fireworks, the value of the data block counter in the CIP header has a
      quirk with firmware versions v5.0.0, v5.7.3 and v5.8.0. This causes the
      ALSA IEC 61883-1/6 packet streaming engine to miss MIDI messages.
      
      This commit fixes the missed detection by modifying the value of the data
      block counter used for the modulo calculation.
      
      For maintainers, this bug has existed since commit 18f5ed36 ("ALSA:
      fireworks/firewire-lib: add support for recent firmware quirk") in Linux
      kernel v4.2. There have been many changes since that commit.  This fix can
      be backported to Linux kernel v4.4 or later. I tagged a base commit for the
      backport for your convenience.
      
      Besides, my work for Linux kernel v5.3 brings heavy code refactoring, and
      some structure members are renamed in 'sound/firewire/amdtp-stream.h'.
      This patch will conflict when the -rc tree carrying it is merged with
      the latest tree. I request maintainers to resolve the conflict by
      replacing 'tx_first_dbc' with 'ctx_data.tx.first_dbc'.
      
      Fixes: df075fee ("ALSA: firewire-lib: complete AM824 data block processing layer")
      Cc: <stable@vger.kernel.org> # v4.4+
      Signed-off-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>
      Signed-off-by: Takashi Iwai <tiwai@suse.de>
    • fork: return proper negative error code · 28dd29c0
      Christian Brauner authored
      Make sure to return a proper negative error code from copy_process()
      when anon_inode_getfile() fails with CLONE_PIDFD.
      Otherwise _do_fork() will not detect an error and get_task_pid() will
      operate on a nonsensical pointer:
      
      R10: 0000000000000000 R11: 0000000000000246 R12: 00000000006dbc2c
      R13: 00007ffc15fbb0ff R14: 00007ff07e47e9c0 R15: 0000000000000000
      kasan: CONFIG_KASAN_INLINE enabled
      kasan: GPF could be caused by NULL-ptr deref or user memory access
      general protection fault: 0000 [#1] PREEMPT SMP KASAN
      CPU: 1 PID: 7990 Comm: syz-executor290 Not tainted 5.2.0-rc6+ #9
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      RIP: 0010:__read_once_size include/linux/compiler.h:194 [inline]
      RIP: 0010:get_task_pid+0xe1/0x210 kernel/pid.c:372
      Code: 89 ff e8 62 27 5f 00 49 8b 07 44 89 f1 4c 8d bc c8 90 01 00 00 eb 0c
      e8 0d fe 25 00 49 81 c7 38 05 00 00 4c 89 f8 48 c1 e8 03 <80> 3c 18 00 74
      08 4c 89 ff e8 31 27 5f 00 4d 8b 37 e8 f9 47 12 00
      RSP: 0018:ffff88808a4a7d78 EFLAGS: 00010203
      RAX: 00000000000000a7 RBX: dffffc0000000000 RCX: ffff888088180600
      RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
      RBP: ffff88808a4a7d90 R08: ffffffff814fb3a8 R09: ffffed1015d66bf8
      R10: ffffed1015d66bf8 R11: 1ffff11015d66bf7 R12: 0000000000041ffc
      R13: 1ffff11011494fbc R14: 0000000000000000 R15: 000000000000053d
      FS:  00007ff07e47e700(0000) GS:ffff8880aeb00000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00000000004b5100 CR3: 0000000094df2000 CR4: 00000000001406e0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Call Trace:
        _do_fork+0x1b9/0x5f0 kernel/fork.c:2360
        __do_sys_clone kernel/fork.c:2454 [inline]
        __se_sys_clone kernel/fork.c:2448 [inline]
        __x64_sys_clone+0xc1/0xd0 kernel/fork.c:2448
        do_syscall_64+0xfe/0x140 arch/x86/entry/common.c:301
        entry_SYSCALL_64_after_hwframe+0x49/0xbe
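
The class of bug behind this trace can be sketched in userspace C. The helpers below re-create the kernel's error-pointer idiom under different names (`err_ptr`, `ptr_err`, `is_err`), and `get_file()` and `create_with_fd()` are hypothetical stand-ins, not the real `anon_inode_getfile()` or `copy_process()`.

```c
#include <errno.h>
#include <stdint.h>

/* Illustrative copy of the error-pointer convention: the top 4095
 * values of the address space encode negative errno codes. */
#define MAX_ERRNO 4095

static inline void *err_ptr(long error) { return (void *)(intptr_t)error; }
static inline long ptr_err(const void *ptr) { return (long)(intptr_t)ptr; }
static inline int is_err(const void *ptr)
{
    return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical call that yields either an object or an error pointer. */
void *get_file(int fail)
{
    static int dummy_file;
    return fail ? err_ptr(-ENOMEM) : (void *)&dummy_file;
}

/* Returns a non-negative fd on success or a negative errno on failure. */
long create_with_fd(int fail)
{
    long retval = 3; /* pretend this fd number was reserved earlier */
    void *file = get_file(fail);

    if (is_err(file)) {
        /* Without this assignment the positive fd would be returned
         * and the caller would mistake the failure for success. */
        retval = ptr_err(file);
        return retval;
    }
    return retval;
}
```

A caller that tests only for negative values sees the failure once the error code is propagated; a leftover positive value would be indistinguishable from success.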
      
      Link: https://lore.kernel.org/lkml/000000000000e0dc0d058c9e7142@google.com
      Reported-and-tested-by: syzbot+002e636502bc4b64eb5c@syzkaller.appspotmail.com
      Fixes: 6fd2fe49 ("copy_process(): don't use ksys_close() on cleanups")
      Cc: Jann Horn <jannh@google.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Christian Brauner <christian@brauner.io>
  6. 30 Jun, 2019 3 commits
  7. 29 Jun, 2019 20 commits