1. 07 Oct, 2019 10 commits
  2. 04 Oct, 2019 5 commits
    • arm64: ftrace: Ensure synchronisation in PLT setup for Neoverse-N1 #1542419 · dd8a1f13
      James Morse authored
      CPUs affected by Neoverse-N1 #1542419 may execute a stale instruction if
      it was recently modified. The affected sequence requires freshly written
      instructions to be executable before a branch to them is updated.
      
      There are very few places in the kernel that modify executable text,
      and all but one come with sufficient synchronisation:
       * The module loader's flush_module_icache() calls flush_icache_range(),
         which does a kick_all_cpus_sync()
       * bpf_int_jit_compile() calls flush_icache_range().
       * Kprobes calls aarch64_insn_patch_text(), which does its work in
         stop_machine().
       * static keys and ftrace both patch between nops and branches to
         existing kernel code (not generated code).
      
      The affected sequence is the interaction between ftrace and modules.
      The module PLT is cleaned using __flush_icache_range() as the trampoline
      shouldn't be executable until we update the branch to it.
      
      Drop the double-underscore so that this path runs kick_all_cpus_sync()
      too.
      Signed-off-by: James Morse <james.morse@arm.com>
      Signed-off-by: Will Deacon <will@kernel.org>
    • arm64: Fix incorrect irqflag restore for priority masking for compat · f46f27a5
      James Morse authored
      Commit bd82d4bd ("arm64: Fix incorrect irqflag restore for priority
      masking") added a macro to the entry.S call paths that leave the
      PSTATE.I bit set. This tells the pPNMI masking logic that interrupts
      are masked by the CPU, not by the PMR. This value is read back by
      local_daif_save().
      
      Commit bd82d4bd added this call to el0_svc, as el0_svc_handler
      is called with interrupts masked. el0_svc_compat was missed, but should
      be covered in the same way as both of these paths end up in
      el0_svc_common(), which expects to unmask interrupts.
      
      Fixes: bd82d4bd ("arm64: Fix incorrect irqflag restore for priority masking")
      Signed-off-by: James Morse <james.morse@arm.com>
      Cc: Julien Thierry <julien.thierry.kdev@gmail.com>
      Signed-off-by: Will Deacon <will@kernel.org>
    • arm64: mm: avoid virt_to_phys(init_mm.pgd) · e4365f96
      Mark Rutland authored
      If we take an unhandled fault in the kernel, we call show_pte() to dump
      the {PGDP,PGD,PUD,PMD,PTE} values for the corresponding page table walk,
      where the PGDP value is virt_to_phys(mm->pgd).
      
      The boot-time and runtime kernel page tables, init_pg_dir and
      swapper_pg_dir respectively, are kernel symbols. Thus, it is not valid
      to call virt_to_phys() on either of these, though we'll do so if we take
      a fault on a TTBR1 address.
      
      When CONFIG_DEBUG_VIRTUAL is not selected, virt_to_phys() will silently
      fix this up. However, when CONFIG_DEBUG_VIRTUAL is selected, this
      results in splats as below. Depending on when these occur, they can
      happen to suppress information needed to debug the original unhandled
      fault, such as the backtrace:
      
      | Unable to handle kernel paging request at virtual address ffff7fffec73cf0f
      | Mem abort info:
      |   ESR = 0x96000004
      |   EC = 0x25: DABT (current EL), IL = 32 bits
      |   SET = 0, FnV = 0
      |   EA = 0, S1PTW = 0
      | Data abort info:
      |   ISV = 0, ISS = 0x00000004
      |   CM = 0, WnR = 0
      | ------------[ cut here ]------------
      | virt_to_phys used for non-linear address: 00000000102c9dbe (swapper_pg_dir+0x0/0x1000)
      | WARNING: CPU: 1 PID: 7558 at arch/arm64/mm/physaddr.c:15 __virt_to_phys+0xe0/0x170 arch/arm64/mm/physaddr.c:12
      | Kernel panic - not syncing: panic_on_warn set ...
      | SMP: stopping secondary CPUs
      | Dumping ftrace buffer:
      |    (ftrace buffer empty)
      | Kernel Offset: disabled
      | CPU features: 0x0002,23000438
      | Memory Limit: none
      | Rebooting in 1 seconds..
      
      We can avoid this by ensuring that we call __pa_symbol() for
      init_mm.pgd, as this will always be a kernel symbol. As the dumped
      {PGD,PUD,PMD,PTE} values are the raw values from the relevant entries we
      don't need to handle these specially.
      Signed-off-by: Mark Rutland <mark.rutland@arm.com>
      Acked-by: Catalin Marinas <catalin.marinas@arm.com>
      Cc: James Morse <james.morse@arm.com>
      Signed-off-by: Will Deacon <will@kernel.org>
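The distinction this commit relies on can be sketched in user space. The toy model below is not kernel code: every `toy_`/`TOY_` name and every address constant is a made-up assumption, chosen only so that the linear map and the kernel image use different virtual-to-physical offsets. It shows why a linear-map translation is wrong for a kernel symbol, why the non-debug path can silently fix it up, and why a symbol-specific helper in the spirit of `__pa_symbol()` is always valid for `init_mm.pgd`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical layout: none of these values are taken from a real kernel. */
#define TOY_PAGE_OFFSET   UINT64_C(0xffff000000000000) /* start of linear map   */
#define TOY_LM_SIZE       UINT64_C(0x0000008000000000) /* size of linear map    */
#define TOY_PHYS_OFFSET   UINT64_C(0x0000000080000000) /* phys of linear start  */
#define TOY_KIMAGE_VADDR  UINT64_C(0xffff800010000000) /* virt of kernel image  */
#define TOY_KIMAGE_PHYS   UINT64_C(0x0000000080080000) /* phys of kernel image  */

/* True if va falls inside the toy linear map. */
static bool toy_is_lm_address(uint64_t va)
{
    return va >= TOY_PAGE_OFFSET && (va - TOY_PAGE_OFFSET) < TOY_LM_SIZE;
}

/* Translation that is only valid for linear-map addresses. */
static uint64_t toy_lm_to_phys(uint64_t va)
{
    return va - TOY_PAGE_OFFSET + TOY_PHYS_OFFSET;
}

/* Translation that is only valid for kernel-image (symbol) addresses. */
static uint64_t toy_kimg_to_phys(uint64_t va)
{
    return va - TOY_KIMAGE_VADDR + TOY_KIMAGE_PHYS;
}

/* Models the non-debug virt_to_phys(): silently picks the right translation. */
static uint64_t toy_virt_to_phys_nodebug(uint64_t va)
{
    return toy_is_lm_address(va) ? toy_lm_to_phys(va) : toy_kimg_to_phys(va);
}

/* Models the debug check: a non-linear address is what triggers the splat. */
static bool toy_debug_virtual_would_warn(uint64_t va)
{
    return !toy_is_lm_address(va);
}

/* Models a symbol-only helper: callers promise va is a kernel symbol. */
static uint64_t toy_pa_symbol(uint64_t va)
{
    assert(!toy_is_lm_address(va));
    return toy_kimg_to_phys(va);
}
```

In this model, passing a kernel-image address to `toy_debug_virtual_would_warn()` returns true, while `toy_pa_symbol()` translates the same address without complaint, which mirrors the change described in the commit message.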
    • arm64: cpufeature: Effectively expose FRINT capability to userspace · 7230f7e9
      Julien Grall authored
      The HWCAP framework will detect a new capability based on the sanitized
      version of the ID registers.
      
      Sanitization is based on a whitelist, so any field not described will end
      up zeroed.
      
      At the moment, ID_AA64ISAR1_EL1.FRINTTS is not described in
      ftr_id_aa64isar1. This means the field will be zeroed, and therefore
      userspace will not be able to see the HWCAP even if the hardware
      supports the feature.
      
      This can be fixed by describing the field in ftr_id_aa64isar1.
      
      Fixes: ca9503fc ("arm64: Expose FRINT capabilities to userspace")
      Signed-off-by: Julien Grall <julien.grall@arm.com>
      Cc: mark.brown@arm.com
      Signed-off-by: Will Deacon <will@kernel.org>
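The whitelist behaviour described in this commit can be illustrated with a small user-space sketch. It is a deliberately simplified model, not the kernel's implementation: `struct toy_ftr_field` and the `toy_` functions are hypothetical, and the real sanitisation logic does considerably more than masking. The FRINTTS position (bits [35:32] of ID_AA64ISAR1_EL1) follows the architectural register layout; the other described field is just a placeholder.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified descriptor for one described register field. */
struct toy_ftr_field {
    unsigned int shift; /* lowest bit of the field */
    unsigned int width; /* field width in bits     */
};

#define TOY_FRINTTS_SHIFT 32 /* ID_AA64ISAR1_EL1.FRINTTS is bits [35:32] */

/* Keep only the described fields; everything else reads back as zero. */
static uint64_t toy_sanitise(uint64_t raw,
                             const struct toy_ftr_field *fields, size_t n)
{
    uint64_t keep = 0;

    for (size_t i = 0; i < n; i++) {
        assert(fields[i].width > 0 && fields[i].width < 64);
        keep |= ((UINT64_C(1) << fields[i].width) - 1) << fields[i].shift;
    }
    return raw & keep;
}

/* Extract a 4-bit field from a register value. */
static unsigned int toy_field(uint64_t reg, unsigned int shift)
{
    return (unsigned int)((reg >> shift) & 0xf);
}

/* FRINTTS value visible after sanitisation, with or without a descriptor. */
static unsigned int toy_frintts_seen(uint64_t raw, int describe_frintts)
{
    const struct toy_ftr_field fields[] = {
        { 0, 4 },                 /* placeholder for an already-described field */
        { TOY_FRINTTS_SHIFT, 4 }, /* the field this commit starts describing    */
    };
    size_t n = describe_frintts ? 2 : 1;

    return toy_field(toy_sanitise(raw, fields, n), TOY_FRINTTS_SHIFT);
}
```

With a raw value that has FRINTTS set to 1, `toy_frintts_seen(raw, 0)` yields 0 and `toy_frintts_seen(raw, 1)` yields 1, which is the difference between the HWCAP being hidden and being visible.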
    • arm64: Mark functions using explicit register variables as '__always_inline' · a48e61de
      Will Deacon authored
      As of ac7c3e4f ("compiler: enable CONFIG_OPTIMIZE_INLINING forcibly"),
      inline functions are no longer annotated with '__always_inline', which
      allows the compiler to decide whether inlining is really a good idea or
      not. Although this is a great idea on paper, the reality is that AArch64
      GCC prior to 9.1 has been shown to get confused when creating an
      out-of-line copy of a function passing explicit 'register' variables
      into an inline assembly block:
      
        https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91111
      
      It's not clear whether this is specific to arm64 or not but, for now,
      ensure that all of our functions using 'register' variables are marked
      as '__always_inline' so that the old behaviour is effectively preserved.
      
      Hopefully other architectures are luckier with their compilers.
      
      Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
      Cc: Nicolas Saenz Julienne <nsaenzjulienne@suse.de>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Nick Desaulniers <ndesaulniers@google.com>
      Signed-off-by: Will Deacon <will@kernel.org>
  3. 01 Oct, 2019 3 commits
    • docs: arm64: Fix indentation and doc formatting · a2b99dca
      Adam Zerella authored
      Sphinx generates the following warnings for the arm64 doc
      pages:
      
      Documentation/arm64/memory.rst:158: WARNING: Unexpected indentation.
      Documentation/arm64/memory.rst:162: WARNING: Unexpected indentation.
      
      These indentation warnings can be resolved by using code
      highlighting instead.
      Signed-off-by: Adam Zerella <adam.zerella@gmail.com>
      Signed-off-by: Will Deacon <will@kernel.org>
    • arm64/sve: Fix wrong free for task->thread.sve_state · 4585fc59
      Masayoshi Mizuma authored
      A system with the SVE feature crashed because the memory pointed
      to by task->thread.sve_state had been corrupted.
      
      This happens because sve_state is freed while forking a child
      process. The child's sve_state pointer is the same as the
      parent's, because the child's task_struct is copied from the
      parent's. If copy_process() fails somewhere, for example in
      copy_creds(), then the sve_state is freed even though the parent
      is still alive.
      The flow is as follows.
      
      copy_process
              p = dup_task_struct
                  => arch_dup_task_struct
                      *dst = *src;  // copy the entire region.
      :
              retval = copy_creds
              if (retval < 0)
                      goto bad_fork_free;
      :
      bad_fork_free:
      ...
              delayed_free_task(p);
                => free_task
                   => arch_release_task_struct
                      => fpsimd_release_task
                         => __sve_free
                            => kfree(task->thread.sve_state);
                               // free the parent's sve_state
      
      Move the clearing of the child's sve_state pointer and TIF_SVE flag
      to arch_dup_task_struct() so that the child doesn't free the
      parent's sve_state.
      There is no need to wait until copy_process() to clear TIF_SVE for
      dst, because the thread flags for dst are initialized already by
      copying the src task_struct.
      This change simplifies the code, so get rid of comments that are no
      longer needed.
      
      As a note, arm64 used to have thread_info on the stack, so it
      was not possible to clear TIF_SVE until the stack was initialized.
      Since commit c02433dd ("arm64: split thread_info from task stack"),
      thread_info is part of the task, so it should be valid to modify
      the flag from arch_dup_task_struct().
      
      Cc: stable@vger.kernel.org # 4.15.x-
      Fixes: bc0ee476 ("arm64/sve: Core task context handling")
      Signed-off-by: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com>
      Reported-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
      Suggested-by: Dave Martin <Dave.Martin@arm.com>
      Reviewed-by: Dave Martin <Dave.Martin@arm.com>
      Tested-by: Julien Grall <julien.grall@arm.com>
      Signed-off-by: Will Deacon <will@kernel.org>
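The ownership problem behind this fix can be reproduced in miniature outside the kernel. The sketch below is a user-space model under stated assumptions: `struct toy_task` and all `toy_` functions are hypothetical stand-ins, with `malloc()`/`free()` playing the role of the kernel allocator. It shows the fixed behaviour, where the duplicate drops its copy of the pointer immediately, so an error path that releases the child cannot free the parent's buffer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define TOY_SVE_STATE_SIZE 64 /* arbitrary buffer size for the model */

/* Toy stand-in for the parts of task_struct that matter here. */
struct toy_task {
    void *sve_state; /* models task->thread.sve_state */
    bool  tif_sve;   /* models the TIF_SVE thread flag */
};

/* Models arch_dup_task_struct() after the fix. */
static void toy_dup_task(struct toy_task *dst, const struct toy_task *src)
{
    *dst = *src;           /* copies the pointer along with everything else */
    dst->sve_state = NULL; /* the child must not own the parent's buffer    */
    dst->tif_sve = false;
}

/* Models the release path that ends in kfree(task->thread.sve_state). */
static void toy_release_task(struct toy_task *t)
{
    free(t->sve_state); /* free(NULL) is a no-op */
    t->sve_state = NULL;
}

/* Simulates a fork that fails right after duplication.
 * Returns true if the parent's state is still intact afterwards. */
static bool toy_failed_fork_keeps_parent_state(void)
{
    struct toy_task parent = {
        .sve_state = malloc(TOY_SVE_STATE_SIZE),
        .tif_sve = true,
    };
    struct toy_task child;
    bool ok;

    if (!parent.sve_state)
        return false;
    memset(parent.sve_state, 0xAB, TOY_SVE_STATE_SIZE);

    toy_dup_task(&child, &parent);
    ok = (child.sve_state == NULL) && !child.tif_sve;

    toy_release_task(&child); /* the error path: frees nothing of the parent's */

    ok = ok && ((unsigned char *)parent.sve_state)[0] == 0xAB && parent.tif_sve;
    toy_release_task(&parent);
    return ok;
}
```

Removing the two assignments after `*dst = *src` in `toy_dup_task()` would recreate the bug in this model: the child's release would free the buffer the parent still points to.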
    • arm64: errata: Update stale comment · 7a292b6c
      Thierry Reding authored
      Commit 73f38166 ("arm64: Advertise mitigation of Spectre-v2, or lack
      thereof") renamed the caller of the install_bp_hardening_cb() function
      but forgot to update a comment, which can be confusing when trying to
      follow the code flow.
      
      Fixes: 73f38166 ("arm64: Advertise mitigation of Spectre-v2, or lack thereof")
      Signed-off-by: Thierry Reding <treding@nvidia.com>
      Signed-off-by: Will Deacon <will@kernel.org>
  4. 30 Sep, 2019 17 commits
    • Linux 5.4-rc1 · 54ecb8f7
      Linus Torvalds authored
    • Merge tag 'for-5.4-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux · bb48a591
      Linus Torvalds authored
      Pull btrfs fixes from David Sterba:
       "A bunch of fixes that accumulated in recent weeks, mostly material for
        stable.
      
        Summary:
      
         - fix for regression from 5.3 that prevents using balance convert
           with the single profile
      
         - qgroup fixes: rescan race, accounting leak with multiple writers,
           potential leak after io failure recovery
      
         - fix for use after free in relocation (reported by KASAN)
      
         - other error handling fixups"
      
      * tag 'for-5.4-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
        btrfs: qgroup: Fix reserved data space leak if we have multiple reserve calls
        btrfs: qgroup: Fix the wrong target io_tree when freeing reserved data space
        btrfs: Fix a regression which we can't convert to SINGLE profile
        btrfs: relocation: fix use-after-free on dead relocation roots
        Btrfs: fix race setting up and completing qgroup rescan workers
        Btrfs: fix missing error return if writeback for extent buffer never started
        btrfs: adjust dirty_metadata_bytes after writeback failure of extent buffer
        Btrfs: fix selftests failure due to uninitialized i_mode in test inodes
    • Merge tag 'csky-for-linus-5.4-rc1' of git://github.com/c-sky/csky-linux · 80b29b6b
      Linus Torvalds authored
      Pull csky updates from Guo Ren:
       "This round of csky subsystem just some fixups:
      
         - Fix mb() synchronization problem
      
         - Fix dma_alloc_coherent with PAGE_SO attribute
      
         - Fix cache_op failed when cross memory ZONEs
      
         - Optimize arch_sync_dma_for_cpu/device with dma_inv_range
      
         - Fix ioremap function losing
      
         - Fix arch_get_unmapped_area() implementation
      
         - Fix defer cache flush for 610
      
         - Support kernel non-aligned access
      
         - Fix 610 vipt cache flush mechanism
      
         - Fix add zero_fp fixup perf backtrace panic
      
         - Move static keyword to the front of declaration
      
         - Fix csky_pmu.max_period assignment
      
         - Use generic free_initrd_mem()
      
         - entry: Remove unneeded need_resched() loop"
      
      * tag 'csky-for-linus-5.4-rc1' of git://github.com/c-sky/csky-linux:
        csky: Move static keyword to the front of declaration
        csky: entry: Remove unneeded need_resched() loop
        csky: Fixup csky_pmu.max_period assignment
        csky: Fixup add zero_fp fixup perf backtrace panic
        csky: Use generic free_initrd_mem()
        csky: Fixup 610 vipt cache flush mechanism
        csky: Support kernel non-aligned access
        csky: Fixup defer cache flush for 610
        csky: Fixup arch_get_unmapped_area() implementation
        csky: Fixup ioremap function losing
        csky: Optimize arch_sync_dma_for_cpu/device with dma_inv_range
        csky/dma: Fixup cache_op failed when cross memory ZONEs
        csky: Fixup dma_alloc_coherent with PAGE_SO attribute
        csky: Fixup mb() synchronization problem
    • Merge tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc · cef0aa0c
      Linus Torvalds authored
      Pull ARM SoC fixes from Olof Johansson:
       "A few fixes that have trickled in through the merge window:
      
         - Video fixes for OMAP due to panel-dpi driver removal
      
         - Clock fixes for OMAP that broke no-idle quirks + nfsroot on DRA7
      
         - Fixing arch version on ASpeed ast2500
      
         - Two fixes for reset handling on ARM SCMI"
      
      * tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc:
        ARM: aspeed: ast2500 is ARMv6K
        reset: reset-scmi: add missing handle initialisation
        firmware: arm_scmi: reset: fix reset_state assignment in scmi_domain_reset
        bus: ti-sysc: Remove unpaired sysc_clkdm_deny_idle()
        ARM: dts: logicpd-som-lv: Fix i2c2 and i2c3 Pin mux
        ARM: dts: am3517-evm: Fix missing video
        ARM: dts: logicpd-torpedo-baseboard: Fix missing video
        ARM: omap2plus_defconfig: Fix missing video
        bus: ti-sysc: Fix handling of invalid clocks
        bus: ti-sysc: Fix clock handling for no-idle quirks
    • Merge tag 'trace-v5.4-3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace · cf4f493b
      Linus Torvalds authored
      Pull tracing fixes from Steven Rostedt:
       "A few more tracing fixes:
      
         - Fix a buffer overflow by checking nr_args correctly in probes
      
         - Fix a warning that is reported by clang
      
         - Fix a possible memory leak in error path of filter processing
      
         - Fix the selftest that checks for failures, but wasn't failing
      
         - Minor clean up on call site output of a memory trace event"
      
      * tag 'trace-v5.4-3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
        selftests/ftrace: Fix same probe error test
        mm, tracing: Print symbol name for call_site in trace events
        tracing: Have error path in predicate_parse() free its allocated memory
        tracing: Fix clang -Wint-in-bool-context warnings in IF_ASSIGN macro
        tracing/probe: Fix to check the difference of nr_args before adding probe
    • Merge tag 'mmc-v5.4-2' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc · c710364f
      Linus Torvalds authored
      Pull more MMC updates from Ulf Hansson:
       "A couple more updates/fixes for MMC:
      
         - sdhci-pci: Add Genesys Logic GL975x support
      
         - sdhci-tegra: Recover loss in throughput for DMA
      
         - sdhci-of-esdhc: Fix DMA bug"
      
      * tag 'mmc-v5.4-2' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc:
        mmc: host: sdhci-pci: Add Genesys Logic GL975x support
        mmc: tegra: Implement ->set_dma_mask()
        mmc: sdhci: Let drivers define their DMA mask
        mmc: sdhci-of-esdhc: set DMA snooping based on DMA coherence
        mmc: sdhci: improve ADMA error reporting
    • csky: Move static keyword to the front of declaration · 9af032a3
      Krzysztof Wilczynski authored
      Move the static keyword to the front of declaration of
      csky_pmu_of_device_ids, and resolve the following compiler
      warning that can be seen when building with warnings
      enabled (W=1):
      
      arch/csky/kernel/perf_event.c:1340:1: warning:
        ‘static’ is not at beginning of declaration [-Wold-style-declaration]
      Signed-off-by: Krzysztof Wilczynski <kw@linux.com>
      Signed-off-by: Guo Ren <guoren@kernel.org>
    • csky: entry: Remove unneeded need_resched() loop · a2139d3b
      Valentin Schneider authored
      Since the enabling and disabling of IRQs within preempt_schedule_irq()
      is contained in a need_resched() loop, we don't need the outer arch
      code loop.
      Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
      Signed-off-by: Guo Ren <guoren@kernel.org>
    • Merge tag 'char-misc-5.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc · 97f9a3c4
      Linus Torvalds authored
      Pull Documentation/process update from Greg KH:
       "Here are two small Documentation/process/embargoed-hardware-issues.rst
        file updates that missed my previous char/misc pull request.
      
        The first one adds an Intel representative for the process, and the
        second one cleans up the text a bit more when it comes to how the
        disclosure rules work, as it was a bit confusing to some companies"
      
      * tag 'char-misc-5.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
        Documentation/process: Clarify disclosure rules
        Documentation/process: Volunteer as the ambassador for Intel
    • Merge branch 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs · 1eb80d6f
      Linus Torvalds authored
      Pull more vfs updates from Al Viro:
       "A couple of misc patches"
      
      * 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
        afs dynroot: switch to simple_dir_operations
        fs/handle.c - fix up kerneldoc
    • Merge tag '5.4-rc-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6 · 7edee522
      Linus Torvalds authored
      Pull more cifs updates from Steve French:
       "Fixes from the recent SMB3 Test events and Storage Developer
        Conference (held the last two weeks).
      
        Here are nine smb3 patches including an important patch for debugging
        traces with wireshark, with three patches marked for stable.
      
        Additional fixes from last week to better handle some newly discovered
        reparse points, a fix for the create/mkdir path to set the mode
        more atomically (in the SMB3 Create security descriptor context), and one
        for path name processing are still being tested so are not included
        here"
      
      * tag '5.4-rc-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6:
        CIFS: Fix oplock handling for SMB 2.1+ protocols
        smb3: missing ACL related flags
        smb3: pass mode bits into create calls
        smb3: Add missing reparse tags
        CIFS: fix max ea value size
        fs/cifs/sess.c: Remove set but not used variable 'capabilities'
        fs/cifs/smb2pdu.c: Make SMB2_notify_init static
        smb3: fix leak in "open on server" perf counter
        smb3: allow decryption keys to be dumped by admin for debugging
    • csky: Fixup csky_pmu.max_period assignment · 3a09d8e2
      Mao Han authored
      csky_pmu.max_period has type u64, but BIT() returns an unsigned long,
      which is only 32 bits wide on C-SKY. The initialization of max_period
      is therefore incorrect when count_width is bigger than 32.
      
      Use BIT_ULL() instead.
      Signed-off-by: Mao Han <han_mao@c-sky.com>
      Signed-off-by: Guo Ren <ren_guo@c-sky.com>
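The width problem this commit describes can be shown with a short, portable sketch. It is an illustration under assumptions, not the driver's code: `toy_max_period()` is a hypothetical helper, and the mask-minus-one form of the computation is assumed rather than taken from the source. `BIT_ULL()` is redefined locally so the example builds outside the kernel tree.

```c
#include <assert.h>
#include <stdint.h>

/* Local stand-in for the kernel macro: shifts a 64-bit unsigned long long.
 * A macro that shifts an unsigned long instead cannot represent bit 32 or
 * above on a target where unsigned long is 32 bits wide. */
#define BIT_ULL(n) (1ULL << (n))

/* Hypothetical helper: largest value of a counter count_width bits wide.
 * The result is held in a 64-bit type, so the shift must be 64-bit too. */
static uint64_t toy_max_period(unsigned int count_width)
{
    /* Shifting by 64 or more would be undefined even for a 64-bit type. */
    assert(count_width > 0 && count_width < 64);
    return BIT_ULL(count_width) - 1;
}
```

For example, `toy_max_period(48)` gives `0xffffffffffff`, a value that does not fit in a 32-bit unsigned long at all.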
    • csky: Fixup add zero_fp fixup perf backtrace panic · 48ede51f
      Guo Ren authored
      We need to set fp to zero so that the backtrace knows where the frame
      chain ends. This patch fixes a perf callchain panic that occurred
      because the backtrace did not know where the fp chain ended.
      Signed-off-by: Guo Ren <ren_guo@c-sky.com>
      Reported-by: Mao Han <han_mao@c-sky.com>
    • csky: Use generic free_initrd_mem() · fdbdcddc
      Mike Rapoport authored
      The csky implementation of free_initrd_mem() is an open-coded version of
      free_reserved_area() without poisoning.
      
      Remove it and make csky use the generic version of free_initrd_mem().
      Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
      Signed-off-by: Guo Ren <guoren@kernel.org>
    • Merge branch 'entropy' · 3f2dc279
      Linus Torvalds authored
      Merge active entropy generation updates.
      
      This is admittedly partly "for discussion".  We need to have a way
      forward for the boot time deadlocks where user space ends up waiting for
      more entropy, but no entropy is forthcoming because the system is
      entirely idle just waiting for something to happen.
      
      While this was triggered by what is arguably a user space bug with
      GDM/gnome-session asking for secure randomness during early boot, when
      they didn't even need any such truly secure thing, the issue ends up
      being that our "getrandom()" interface is prone to that kind of
      confusion, because people don't think very hard about whether they want
      to block for sufficient amounts of entropy.
      
      The approach here-in is to decide to not just passively wait for entropy
      to happen, but to start actively collecting it if it is missing.  This
      is not necessarily always possible, but if the architecture has a CPU
      cycle counter, there is a fair amount of noise in the exact timings of
      reasonably complex loads.
      
      We may end up tweaking the load and the entropy estimates, but this
      should be at least a reasonable starting point.
      
      As part of this, we also revert the revert of the ext4 IO pattern
      improvement that ended up triggering the reported lack of external
      entropy.
      
      * getrandom() active entropy waiting:
        Revert "Revert "ext4: make __ext4_get_inode_loc plug""
        random: try to actively add entropy rather than passively wait for it
    • Revert "Revert "ext4: make __ext4_get_inode_loc plug"" · 02f03c42
      Linus Torvalds authored
      This reverts commit 72dbcf72.
      
      Instead of waiting forever for entropy that may just not happen, we now
      try to actively generate entropy when required, and are thus hopefully
      avoiding the problem that caused the nice ext4 IO pattern fix to be
      reverted.
      
      So revert the revert.
      
      Cc: Ahmed S. Darwish <darwish.07@gmail.com>
      Cc: Ted Ts'o <tytso@mit.edu>
      Cc: Willy Tarreau <w@1wt.eu>
      Cc: Alexander E. Patrakov <patrakov@gmail.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • random: try to actively add entropy rather than passively wait for it · 50ee7529
      Linus Torvalds authored
      For 5.3 we had to revert a nice ext4 IO pattern improvement, because it
      caused a bootup regression due to lack of entropy at bootup together
      with arguably broken user space that was asking for secure random
      numbers when it really didn't need to.
      
      See commit 72dbcf72 (Revert "ext4: make __ext4_get_inode_loc plug").
      
      This aims to solve the issue by actively generating entropy noise using
      the CPU cycle counter when waiting for the random number generator to
      initialize.  This only works when you have a high-frequency time stamp
      counter available, but that's the case on all modern x86 CPU's, and on
      most other modern CPU's too.
      
      What we do is to generate jitter entropy from the CPU cycle counter
      under a somewhat complex load: calling the scheduler while also
      guaranteeing a certain amount of timing noise by also triggering a
      timer.
      
      I'm sure we can tweak this, and that people will want to look at other
      alternatives, but there's been a number of papers written on jitter
      entropy, and this should really be fairly conservative by crediting one
      bit of entropy for every timer-induced jump in the cycle counter.  Not
      because the timer itself would be all that unpredictable, but because
      the interaction between the timer and the loop is going to be.
      
      Even if (and perhaps particularly if) the timer actually happens on
      another CPU, the cacheline interaction between the loop that reads the
      cycle counter and the timer itself firing is going to add perturbations
      to the cycle counter values that get mixed into the entropy pool.
      
      As Thomas pointed out, with a modern out-of-order CPU, even quite simple
      loops show a fair amount of hard-to-predict timing variability even in
      the absence of external interrupts.  But this tries to take that further
      by actually having a fairly complex interaction.
      
      This is not going to solve the entropy issue for architectures that have
      no CPU cycle counter, but it's not clear how (and if) that is solvable,
      and the hardware in question is largely starting to be irrelevant.  And
      by doing this we can at least avoid some of the even more contentious
      approaches (like making the entropy waiting time out in order to avoid
      the possibly unbounded waiting).
      
      Cc: Ahmed Darwish <darwish.07@gmail.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Theodore Ts'o <tytso@mit.edu>
      Cc: Nicholas Mc Guire <hofrat@opentech.at>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Willy Tarreau <w@1wt.eu>
      Cc: Alexander E. Patrakov <patrakov@gmail.com>
      Cc: Lennart Poettering <mzxreary@0pointer.de>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  5. 29 Sep, 2019 5 commits