1. 06 Feb, 2018 1 commit
  2. 04 Feb, 2018 4 commits
  3. 29 Jan, 2018 4 commits
    • Linus Torvalds's avatar
      Merge tag 'acpi-4.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm · 1a9a126b
      Linus Torvalds authored
      Pull ACPI updates from Rafael Wysocki:
       "The majority of this is an update of the ACPICA kernel code to
        upstream revision 20171215 with a cosmetic change and a maintainers
        information update on top of it.
      
        The rest is mostly some minor fixes and cleanups in the ACPI drivers
        and cleanups to initialization on x86.
      
        Specifics:
      
         - Update the ACPICA kernel code to upstream revision 20171215 including:
            * Support for ACPI 6.0A changes in the NFIT table (Bob Moore)
            * Local 64-bit divide in string conversions (Bob Moore)
            * Fix for a regression in acpi_evaluate_object_type() (Bob Moore)
            * Fixes for memory leaks during package object resolution (Bob
              Moore)
            * Deployment of safe version of strncpy() (Bob Moore)
            * Debug and messaging updates (Bob Moore)
            * Support for PDTT, SDEV, TPM2 tables in iASL and tools (Bob
              Moore)
            * Null pointer dereference avoidance in Op and cleanups (Colin Ian
              King)
            * Fix for memory leak from building prefixed pathname (Erik
              Schmauss)
            * Coding style fixes, disassembler and compiler updates (Hanjun
              Guo, Erik Schmauss)
            * Additional PPTT flags from ACPI 6.2 (Jeremy Linton)
            * Fix for an off-by-one error in acpi_get_timer_duration()
              (Jung-uk Kim)
            * Infinite loop detection timeout and utilities cleanups (Lv
              Zheng)
            * Windows 10 version 1607 and 1703 OSI strings (Mario
              Limonciello)
      
         - Update ACPICA information in MAINTAINERS to reflect the current
           status of ACPICA maintenance and rename a local variable in one
           function to match the corresponding upstream code (Rafael Wysocki)
      
         - Clean up ACPI-related initialization on x86 (Andy Shevchenko)
      
         - Add support for Intel Merrifield to the ACPI GPIO code (Andy
           Shevchenko)
      
         - Clean up ACPI PMIC drivers (Andy Shevchenko, Arvind Yadav)
      
         - Fix the ACPI Generic Event Device (GED) driver to free IRQs on
           shutdown and clean up the PCI IRQ Link driver (Sinan Kaya)
      
         - Make the GHES code call into the AER driver on all errors and clean
           up the ACPI APEI code (Colin Ian King, Tyler Baicar)
      
         - Make the IA64 ACPI NUMA code parse all SRAT entries (Ganapatrao
           Kulkarni)
      
         - Add a lid switch blacklist to the ACPI button driver and make it
           print extra debug messages on lid events (Hans de Goede)
      
         - Add quirks for Asus GL502VSK and UX305LA to the ACPI battery driver
           and clean it up somewhat (Bjørn Mork, Kai-Heng Feng)
      
         - Add device link for CHT SD card dependency on I2C to the ACPI LPSS
           (Intel SoCs) driver and make it avoid creating platform device
           objects for devices without MMIO resources (Adrian Hunter, Hans de
           Goede)
      
         - Fix the ACPI GPE mask kernel command line parameter handling
           (Prarit Bhargava)
      
         - Fix the handling of (incorrectly exposed) backlight interfaces
           without LCD (Hans de Goede)
      
         - Fix the usage of debugfs_create_*() in the ACPI EC driver (Geert
           Uytterhoeven)"
      
      * tag 'acpi-4.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (62 commits)
        ACPI/PCI: pci_link: reduce verbosity when IRQ is enabled
        ACPI / LPSS: Do not instiate platform_dev for devs without MMIO resources
        ACPI / PMIC: Convert to use builtin_platform_driver() macro
        ACPI / x86: boot: Propagate error code in acpi_gsi_to_irq()
        ACPICA: Update version to 20171215
        ACPICA: trivial style fix, no functional change
        ACPICA: Fix a couple memory leaks during package object resolution
        ACPICA: Recognize the Windows 10 version 1607 and 1703 OSI strings
        ACPICA: DT compiler: prevent error if optional field at the end of table is not present
        ACPICA: Rename a global variable, no functional change
        ACPICA: Create and deploy safe version of strncpy
        ACPICA: Cleanup the global variables and update comments
        ACPICA: Debugger: fix slight indentation issue
        ACPICA: Fix a regression in the acpi_evaluate_object_type() interface
        ACPICA: Update for a few debug output statements
        ACPICA: Debug output, no functional change
        ACPI: EC: Fix debugfs_create_*() usage
        ACPI / video: Default lcd_only to true on Win8-ready and newer machines
        ACPI / x86: boot: Don't setup SCI on HW-reduced platforms
        ACPI / x86: boot: Use INVALID_ACPI_IRQ instead of 0 for acpi_sci_override_gsi
        ...
      1a9a126b
    • Linus Torvalds's avatar
      Merge tag 'pm-4.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm · 7f3fdd40
      Linus Torvalds authored
      Pull power management updates from Rafael Wysocki:
       "This includes some infrastructure changes in the PM core, mostly
        related to integration between runtime PM and system-wide suspend and
        hibernation, plus some driver changes depending on them and fixes for
        issues in that area which have become quite apparent recently.
      
        Also included are changes making more x86-based systems use the Low
        Power Sleep S0 _DSM interface by default, which turned out to be
        necessary to handle power button wakeups from suspend-to-idle on
        Surface Pro3.
      
        On the cpufreq front we have fixes and cleanups in the core, some new
        hardware support, driver updates and the removal of some unused code
        from the CPU cooling thermal driver.
      
        Apart from this, the Operating Performance Points (OPP) framework is
        prepared to be used with power domains in the future and there is a
        usual bunch of assorted fixes and cleanups.
      
        Specifics:
      
         - Define a PM driver flag allowing drivers to request that their
           devices be left in suspend after system-wide transitions to the
           working state if possible and add support for it to the PCI bus
           type and the ACPI PM domain (Rafael Wysocki).
      
         - Make the PM core carry out optimizations for devices with driver PM
           flags set in some cases and make a few drivers set those flags
           (Rafael Wysocki).
      
         - Fix and clean up wrapper routines allowing runtime PM device
           callbacks to be re-used for system-wide PM, change the generic
           power domains (genpd) framework to stop using those routines
           incorrectly and fix up a driver depending on that behavior of genpd
           (Rafael Wysocki, Ulf Hansson, Geert Uytterhoeven).
      
         - Fix and clean up the PM core's device wakeup framework and
           re-factor system-wide PM core code related to device wakeup
           (Rafael Wysocki, Ulf Hansson, Brian Norris).
      
         - Make more x86-based systems use the Low Power Sleep S0 _DSM
           interface by default (to fix power button wakeup from
           suspend-to-idle on Surface Pro3) and add a kernel command line
           switch to tell it to ignore the system sleep blacklist in the ACPI
           core (Rafael Wysocki).
      
         - Fix a race condition related to cpufreq governor module removal and
           clean up the governor management code in the cpufreq core (Rafael
           Wysocki).
      
         - Drop the unused generic code related to the handling of the static
           power energy usage model in the CPU cooling thermal driver along
           with the corresponding documentation (Viresh Kumar).
      
         - Add mt2712 support to the Mediatek cpufreq driver (Andrew-sh
           Cheng).
      
         - Add a new operating point to the imx6ul and imx6q cpufreq drivers
           and switch the latter to using clk_bulk_get() (Anson Huang, Dong
           Aisheng).
      
         - Add support for multiple regulators to the TI cpufreq driver along
           with a new DT binding related to that and clean up that driver
           somewhat (Dave Gerlach).
      
         - Fix a powernv cpufreq driver regression leading to incorrect CPU
           frequency reporting, fix that driver to deal with non-continguous
           P-states correctly and clean it up (Gautham Shenoy, Shilpasri
           Bhat).
      
         - Add support for frequency scaling on Armada 37xx SoCs through the
           generic DT cpufreq driver (Gregory CLEMENT).
      
         - Fix error code paths in the mvebu cpufreq driver (Gregory CLEMENT).
      
         - Fix a transition delay setting regression in the longhaul cpufreq
           driver (Viresh Kumar).
      
         - Add Skylake X (server) support to the intel_pstate cpufreq driver
           and clean up that driver somewhat (Srinivas Pandruvada).
      
         - Clean up the cpufreq statistics collection code (Viresh Kumar).
      
         - Drop cluster terminology and dependency on physical_package_id from
           the PSCI driver and drop dependency on arm_big_little from the SCPI
           cpufreq driver (Sudeep Holla).
      
         - Add support for system-wide suspend and resume to the RAPL power
           capping driver and drop a redundant semicolon from it (Zhen Han,
           Luis de Bethencourt).
      
         - Make SPI domain validation (in the SCSI SPI transport driver) and
           system-wide suspend mutually exclusive as they rely on the same
           underlying mechanism and cannot be carried out at the same time
           (Bart Van Assche).
      
         - Fix the computation of the amount of memory to preallocate in the
           hibernation core and clean up one function in there (Rainer Fiebig,
           Kyungsik Lee).
      
         - Prepare the Operating Performance Points (OPP) framework for being
           used with power domains and clean up one function in it (Viresh
           Kumar, Wei Yongjun).
      
         - Clean up the generic sysfs interface for device PM (Andy
           Shevchenko).
      
         - Fix several minor issues in power management frameworks and clean
           them up a bit (Arvind Yadav, Bjorn Andersson, Geert Uytterhoeven,
           Gustavo Silva, Julia Lawall, Luis de Bethencourt, Paul Gortmaker,
           Sergey Senozhatsky, gaurav jindal).
      
         - Make it easier to disable PM via Kconfig (Mark Brown).
      
         - Clean up the cpupower and intel_pstate_tracer utilities (Doug
           Smythies, Laura Abbott)"
      
      * tag 'pm-4.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (89 commits)
        PCI / PM: Remove spurious semicolon
        cpufreq: scpi: remove arm_big_little dependency
        drivers: psci: remove cluster terminology and dependency on physical_package_id
        powercap: intel_rapl: Fix trailing semicolon
        dmaengine: rcar-dmac: Make DMAC reinit during system resume explicit
        PM / runtime: Allow no callbacks in pm_runtime_force_suspend|resume()
        PM / hibernate: Drop unused parameter of enough_swap
        PM / runtime: Check ignore_children in pm_runtime_need_not_resume()
        PM / runtime: Rework pm_runtime_force_suspend/resume()
        PM / genpd: Stop/start devices without pm_runtime_force_suspend/resume()
        cpufreq: powernv: Dont assume distinct pstate values for nominal and pmin
        cpufreq: intel_pstate: Add Skylake servers support
        cpufreq: intel_pstate: Replace bxt_funcs with core_funcs
        platform/x86: surfacepro3: Support for wakeup from suspend-to-idle
        ACPI / PM: Use Low Power S0 Idle on more systems
        PM / wakeup: Print warn if device gets enabled as wakeup source during sleep
        PM / domains: Don't skip driver's ->suspend|resume_noirq() callbacks
        PM / core: Propagate wakeup_path status flag in __device_suspend_late()
        PM / core: Re-structure code for clearing the direct_complete flag
        powercap: add suspend and resume mechanism for SOC power limit
        ...
      7f3fdd40
    • Linus Torvalds's avatar
      Merge tag 'sound-4.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound · 1c1f395b
      Linus Torvalds authored
      Pull sound updates from Takashi Iwai:
       "The major changes in the core API side in this cycle are the still
        on-going ASoC componentization works. Other than that, only few small
        changes such as 20bit PCM format support are found.
      
        Meanwhile the rest majority of changes are for ASoC drivers:
      
         - Large cleanups of some of the TI CODEC drivers
      
         - Continued work on Intel ASoC stuff for new quirks, ACPI GPIO
           handling, Kconfigs and lots of cleanups
      
         - Refactoring of the Freescale SSI driver, as preliminary work for
           the upcoming changes
      
         - Work on ST DFSDM driver, including the required IIO patches
      
         - New drivers for Allwinner A83T, Maxim MAX89373, SocioNext UiniPhier
           EVEA Tempo Semiconductor TSCS42xx and TI PCM816x, TAS5722 and
           TAS6424 devices
      
         - Removal of dead codes for SN95031 and board drivers
      
        Last but not least, a few HD-audio and USB-audio quirks are included
        as usual, too"
      
      * tag 'sound-4.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (303 commits)
        ALSA: hda - Reduce the suspend time consumption for ALC256
        ASoC: use seq_file to dump the contents of dai_list,platform_list and codec_list
        ASoC: soc-core: add missing EXPORT_SYMBOL_GPL() for snd_soc_rtdcom_lookup
        IIO: ADC: stm32-dfsdm: remove unused variable again
        ASoC: bcm2835: fix hw_params error when device is in prepared state
        ASoC: mxs-sgtl5000: Do not print error on probe deferral
        ASoC: sgtl5000: Do not print error on probe deferral
        ASoC: Intel: remove select on non-existing SND_SOC_INTEL_COMMON
        ALSA: usb-audio: Support changing input on Sound Blaster E1
        ASoC: Intel: remove second duplicated assignment to pointer 'res'
        ALSA: hda/realtek - update ALC215 depop optimize
        ALSA: hda/realtek - Support headset mode for ALC215/ALC285/ALC289
        ALSA: pcm: Fix trailing semicolon
        ASoC: add Component level .read/.write
        ASoC: cx20442: fix regression by adding back .read/.write
        ASoC: uda1380: fix regression by adding back .read/.write
        ASoC: tlv320dac33: fix regression by adding back .read/.write
        ALSA: hda - Use IS_REACHABLE() for dependency on input
        IIO: ADC: stm32-dfsdm: fix static check warning
        IIO: ADC: stm32-dfsdm: code optimization
        ...
      1c1f395b
    • Linus Torvalds's avatar
      Merge tag 'init_task-20180117' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs · 49f9c355
      Linus Torvalds authored
      Pull init_task initializer cleanups from David Howells:
       "It doesn't seem useful to have the init_task in a header file rather
        than in a normal source file. We could consolidate init_task handling
        instead and expand out various macros.
      
        Here's a series of patches that consolidate init_task handling:
      
         (1) Make THREAD_SIZE available to vmlinux.lds for cris, hexagon and
             openrisc.
      
         (2) Alter the INIT_TASK_DATA linker script macro to set
             init_thread_union and init_stack rather than defining these in C.
      
             Insert init_task and init_thread_into into the init_stack area in
             the linker script as appropriate to the configuration, with
             different section markers so that they end up correctly ordered.
      
             We can then get merge ia64's init_task.c into the main one.
      
             We then have a bunch of single-use INIT_*() macros that seem only
             to be macros because they used to be used per-arch. We can then
             expand these in place of the user and get rid of a few lines and
             a lot of backslashes.
      
         (3) Expand INIT_TASK() in place.
      
         (4) Expand in place various small INIT_*() macros that are defined
             conditionally. Expand them and surround them by #if[n]def/#endif
             in the .c file as it takes fewer lines.
      
         (5) Expand INIT_SIGNALS() and INIT_SIGHAND() in place.
      
         (6) Expand INIT_STRUCT_PID in place.
      
        These macros can then be discarded"
      
      * tag 'init_task-20180117' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs:
        Expand INIT_STRUCT_PID and remove
        Expand the INIT_SIGNALS and INIT_SIGHAND macros and remove
        Expand various INIT_* macros and remove
        Expand INIT_TASK() in init/init_task.c and remove
        Construct init thread stack in the linker script rather than by union
        openrisc: Make THREAD_SIZE available to vmlinux.lds
        hexagon: Make THREAD_SIZE available to vmlinux.lds
        cris: Make THREAD_SIZE available to vmlinux.lds
      49f9c355
  4. 28 Jan, 2018 8 commits
  5. 27 Jan, 2018 2 commits
    • Thomas Gleixner's avatar
      hrtimer: Reset hrtimer cpu base proper on CPU hotplug · d5421ea4
      Thomas Gleixner authored
      The hrtimer interrupt code contains a hang detection and mitigation
      mechanism, which prevents that a long delayed hrtimer interrupt causes a
      continous retriggering of interrupts which prevent the system from making
      progress. If a hang is detected then the timer hardware is programmed with
      a certain delay into the future and a flag is set in the hrtimer cpu base
      which prevents newly enqueued timers from reprogramming the timer hardware
      prior to the chosen delay. The subsequent hrtimer interrupt after the delay
      clears the flag and resumes normal operation.
      
      If such a hang happens in the last hrtimer interrupt before a CPU is
      unplugged then the hang_detected flag is set and stays that way when the
      CPU is plugged in again. At that point the timer hardware is not armed and
      it cannot be armed because the hang_detected flag is still active, so
      nothing clears that flag. As a consequence the CPU does not receive hrtimer
      interrupts and no timers expire on that CPU which results in RCU stalls and
      other malfunctions.
      
      Clear the flag along with some other less critical members of the hrtimer
      cpu base to ensure starting from a clean state when a CPU is plugged in.
      
      Thanks to Paul, Sebastian and Anna-Maria for their help to get down to the
      root cause of that hard to reproduce heisenbug. Once understood it's
      trivial and certainly justifies a brown paperbag.
      
      Fixes: 41d2e494 ("hrtimer: Tune hrtimer_interrupt hang logic")
      Reported-by: default avatarPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Sebastian Sewior <bigeasy@linutronix.de>
      Cc: Anna-Maria Gleixner <anna-maria@linutronix.de>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/alpine.DEB.2.20.1801261447590.2067@nanos
      d5421ea4
    • H. Peter Anvin's avatar
      x86: Mark hpa as a "Designated Reviewer" for the time being · 8a95b74d
      H. Peter Anvin authored
      Due to some unfortunate events, I have not been directly involved in
      the x86 kernel patch flow for a while now.  I have also not been able
      to ramp back up by now like I had hoped to, and after reviewing what I
      will need to work on both internally at Intel and elsewhere in the near
      term, it is clear that I am not going to be able to ramp back up until
      late 2018 at the very earliest.
      
      It is not acceptable to not recognize that this load is currently
      taken by Ingo and Thomas without my direct participation, so I mark
      myself as R: (designated reviewer) rather than M: (maintainer) until
      further notice.  This is in fact recognizing the de facto situation
      for the past few years.
      
      I have obviously no intention of going away, and I will do everything
      within my power to improve Linux on x86 and x86 for Linux.  This,
      however, puts credit where it is due and reflects a change of focus.
      
      This patch also removes stale entries for portions of the x86
      architecture which have not been maintained separately from arch/x86
      for a long time.  If there is a reason to re-introduce them then that
      can happen later.
      Signed-off-by: default avatarH. Peter Anvin <h.peter.anvin@intel.com>
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Cc: Bruce Schlobohm <bruce.schlobohm@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/20180125195934.5253-1-hpa@zytor.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      8a95b74d
  6. 26 Jan, 2018 13 commits
  7. 25 Jan, 2018 8 commits
    • Lyude Paul's avatar
      drm/nouveau: Move irq setup/teardown to pci ctor/dtor · 0fd189a9
      Lyude Paul authored
      For a while we've been having issues with seemingly random interrupts
      coming from nvidia cards when resuming them. Originally the fix for this
      was thought to be just re-arming the MSI interrupt registers right after
      re-allocating our IRQs, however it seems a lot of what we do is both
      wrong and not even nessecary.
      
      This was made apparent by what appeared to be a regression in the
      mainline kernel that started introducing suspend/resume issues for
      nouveau:
      
              a0c9259d (irq/matrix: Spread interrupts on allocation)
      
      After this commit was introduced, we started getting interrupts from the
      GPU before we actually re-allocated our own IRQ (see references below)
      and assigned the IRQ handler. Investigating this turned out that the
      problem was not with the commit, but the fact that nouveau even
      free/allocates it's irqs before and after suspend/resume.
      
      For starters: drivers in the linux kernel haven't had to handle
      freeing/re-allocating their IRQs during suspend/resume cycles for quite
      a while now. Nouveau seems to be one of the few drivers left that still
      does this, despite the fact there's no reason we actually need to since
      disabling interrupts from the device side should be enough, as the
      kernel is already smart enough to know to disable host-side interrupts
      for us before going into suspend. Since we were tearing down our IRQs by
      hand however, that means there was a short period during resume where
      interrupts could be received before we re-allocated our IRQ which would
      lead to us getting an unhandled IRQ. Since we never handle said IRQ and
      re-arm the interrupt registers, this would cause us to miss all of the
      interrupts from the GPU and cause our init process to start timing out
      on anything requiring interrupts.
      
      So, since this whole setup/teardown every suspend/resume cycle is
      useless anyway, move irq setup/teardown into the pci subdev's ctor/dtor
      functions instead so they're only called at driver load and driver
      unload. This should fix most of the issues with pending interrupts on
      resume, along with getting suspend/resume for nouveau to work again.
      
      As well, this probably means we can also just remove the msi rearm call
      inside nvkm_pci_init(). But since our main focus here is to fix
      suspend/resume before 4.15, we'll save that for a later patch.
      Signed-off-by: default avatarLyude Paul <lyude@redhat.com>
      Cc: Karol Herbst <kherbst@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarBen Skeggs <bskeggs@redhat.com>
      0fd189a9
    • Nicolas Dichtel's avatar
      net: don't call update_pmtu unconditionally · f15ca723
      Nicolas Dichtel authored
      Some dst_ops (e.g. md_dst_ops)) doesn't set this handler. It may result to:
      "BUG: unable to handle kernel NULL pointer dereference at           (null)"
      
      Let's add a helper to check if update_pmtu is available before calling it.
      
      Fixes: 52a589d5 ("geneve: update skb dst pmtu on tx path")
      Fixes: a93bf0ff ("vxlan: update skb dst pmtu on tx path")
      CC: Roman Kapl <code@rkapl.cz>
      CC: Xin Long <lucien.xin@gmail.com>
      Signed-off-by: default avatarNicolas Dichtel <nicolas.dichtel@6wind.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      f15ca723
    • Linus Torvalds's avatar
      Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm · 6e20630e
      Linus Torvalds authored
      Pull KVM fixes from Radim Krčmář:
       "Fix races and a potential use after free in the s390 cmma migration
        code"
      
      * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
        KVM: s390: add proper locking for CMMA migration bitmap
      6e20630e
    • Linus Torvalds's avatar
      Merge tag 'for-4.15-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux · 525273fb
      Linus Torvalds authored
      Pull btrfs fix from David Sterba:
       "It's been reported recently that readdir can list stale entries under
        some conditions. Fix it."
      
      * tag 'for-4.15-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
        Btrfs: fix stale entries in readdir
      525273fb
    • Dan Streetman's avatar
      net: tcp: close sock if net namespace is exiting · 4ee806d5
      Dan Streetman authored
      When a tcp socket is closed, if it detects that its net namespace is
      exiting, close immediately and do not wait for FIN sequence.
      
      For normal sockets, a reference is taken to their net namespace, so it will
      never exit while the socket is open.  However, kernel sockets do not take a
      reference to their net namespace, so it may begin exiting while the kernel
      socket is still open.  In this case if the kernel socket is a tcp socket,
      it will stay open trying to complete its close sequence.  The sock's dst(s)
      hold a reference to their interface, which are all transferred to the
      namespace's loopback interface when the real interfaces are taken down.
      When the namespace tries to take down its loopback interface, it hangs
      waiting for all references to the loopback interface to release, which
      results in messages like:
      
      unregister_netdevice: waiting for lo to become free. Usage count = 1
      
      These messages continue until the socket finally times out and closes.
      Since the net namespace cleanup holds the net_mutex while calling its
      registered pernet callbacks, any new net namespace initialization is
      blocked until the current net namespace finishes exiting.
      
      After this change, the tcp socket notices the exiting net namespace, and
      closes immediately, releasing its dst(s) and their reference to the
      loopback interface, which lets the net namespace continue exiting.
      
      Link: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1711407
      Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=97811Signed-off-by: default avatarDan Streetman <ddstreet@canonical.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      4ee806d5
    • Peter Zijlstra's avatar
      perf/x86: Fix perf,x86,cpuhp deadlock · efe951d3
      Peter Zijlstra authored
      More lockdep gifts, a 5-way lockup race:
      
      	perf_event_create_kernel_counter()
      	  perf_event_alloc()
      	    perf_try_init_event()
      	      x86_pmu_event_init()
      		__x86_pmu_event_init()
      		  x86_reserve_hardware()
       #0		    mutex_lock(&pmc_reserve_mutex);
      		    reserve_ds_buffer()
       #1		      get_online_cpus()
      
      	perf_event_release_kernel()
      	  _free_event()
      	    hw_perf_event_destroy()
      	      x86_release_hardware()
       #0		mutex_lock(&pmc_reserve_mutex)
      		release_ds_buffer()
       #1		  get_online_cpus()
      
       #1	do_cpu_up()
      	  perf_event_init_cpu()
       #2	    mutex_lock(&pmus_lock)
       #3	    mutex_lock(&ctx->mutex)
      
      	sys_perf_event_open()
      	  mutex_lock_double()
       #3	    mutex_lock(ctx->mutex)
       #4	    mutex_lock_nested(ctx->mutex, 1);
      
      	perf_try_init_event()
       #4	  mutex_lock_nested(ctx->mutex, 1)
      	  x86_pmu_event_init()
      	    intel_pmu_hw_config()
      	      x86_add_exclusive()
       #0		mutex_lock(&pmc_reserve_mutex)
      
      Fix it by using ordering constructs instead of locking.
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      efe951d3
    • Peter Zijlstra's avatar
      perf/core: Fix ctx::mutex deadlock · 0c7296ca
      Peter Zijlstra authored
      Lockdep noticed the following 3-way lockup scenario:
      
      	sys_perf_event_open()
      	  perf_event_alloc()
      	    perf_try_init_event()
       #0	      ctx = perf_event_ctx_lock_nested(1)
      	      perf_swevent_init()
      		swevent_hlist_get()
       #1		  mutex_lock(&pmus_lock)
      
      	perf_event_init_cpu()
       #1	  mutex_lock(&pmus_lock)
       #2	  mutex_lock(&ctx->mutex)
      
      	sys_perf_event_open()
      	  mutex_lock_double()
       #2	   mutex_lock()
       #0	   mutex_lock_nested()
      
      And while we need that perf_event_ctx_lock_nested() for HW PMUs such
      that they can iterate the sibling list, trying to match it to the
      available counters, the software PMUs need do no such thing. Exclude
      them.
      
      In particular the swevent triggers the above invertion, while the
      tpevent PMU triggers a more elaborate one through their event_mutex.
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      0c7296ca
    • Peter Zijlstra's avatar
      perf/core: Fix another perf,trace,cpuhp lock inversion · 43fa87f7
      Peter Zijlstra authored
      Lockdep noticed the following 3-way lockup race:
      
              perf_trace_init()
       #0       mutex_lock(&event_mutex)
                perf_trace_event_init()
                  perf_trace_event_reg()
                    tp_event->class->reg() := tracepoint_probe_register
       #1              mutex_lock(&tracepoints_mutex)
                        trace_point_add_func()
       #2                  static_key_enable()
      
       #2	do_cpu_up()
      	  perf_event_init_cpu()
       #3	    mutex_lock(&pmus_lock)
       #4	    mutex_lock(&ctx->mutex)
      
      	perf_ioctl()
       #4	  ctx = perf_event_ctx_lock()
      	  _perf_iotcl()
      	    ftrace_profile_set_filter()
       #0	      mutex_lock(&event_mutex)
      
      Fudge it for now by noting that the tracepoint state does not depend
      on the event <-> context relation. Ugly though :/
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      43fa87f7