1. 17 Jun, 2021 1 commit
  2. 11 Jun, 2021 5 commits
    • cpuidle: teo: Use kerneldoc documentation in admin-guide · 154ae8bb
      Rafael J. Wysocki authored
      There are two descriptions of the TEO (Timer Events Oriented) cpuidle
      governor in the kernel source tree, one in the C file containing its
      code and one in cpuidle.rst which is part of admin-guide.
      
      Instead of trying to keep them both in sync and in order to reduce
      text duplication, include the governor description from the C file
      directly into cpuidle.rst.
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
    • cpuidle: teo: Rework most recent idle duration values treatment · 77577558
      Rafael J. Wysocki authored
      The TEO (Timer Events Oriented) cpuidle governor uses several most
      recent idle duration values for a given CPU to refine the idle state
      selection in case the previous long-term trends have not been
      followed recently and a new trend appears to be forming.  That is
      done by computing the average of the most recent idle duration
      values falling below the time till the next timer event ("sleep
      length"), provided that they are the majority of the most recent
      idle duration values taken into account, and using it as the new
      expected idle duration value.
      
      However, idle state selection based on that value may not be optimal,
      because the average does not really indicate which of the idle states
      with target residencies less than or equal to it is likely to be the
      best fit.
      
      Thus, instead of computing the average, make the governor carry out
      computations based on the distribution of the most recent idle
      duration values among the bins corresponding to different idle
      states.  Namely, if the majority of the most recent idle duration
      values taken into consideration are less than the current sleep
      length (which means that the CPU is likely to wake up early), find
      the idle state closest to the "candidate" one "matching" the sleep
      length whose target residency is less than or equal to the majority
      of the most recent idle duration values that have fallen below the
      current sleep length (which means that it is likely to be "shallow
      enough" this time).
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
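The bin-based check described in this commit can be sketched roughly as follows. This is a simplified illustration, not the governor's actual code: the function name `pick_state_from_recent`, the `recent[]` array layout, and the `nr_recent` parameter are assumptions made for the example. Here `recent[i]` counts the recent idle durations that fell into the bin of idle state `i`, and `candidate` is the index of the state "matching" the sleep length.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: return the index of the idle state to select.
 * recent[i] is the number of recent idle durations that landed in the
 * bin of state i; bins below 'candidate' represent early wakeups.
 */
int pick_state_from_recent(const unsigned int *recent, int candidate,
                           unsigned int nr_recent)
{
    unsigned int below = 0, acc = 0;
    int i;

    assert(recent != NULL && candidate >= 0);

    /* Count the recent events that fell below the candidate state. */
    for (i = 0; i < candidate; i++)
        below += recent[i];

    /* No majority of early wakeups: keep the candidate state. */
    if (2 * below <= nr_recent)
        return candidate;

    /*
     * Walk towards shallower states and stop at the first one (closest
     * to the candidate) whose bin, together with the bins between it
     * and the candidate, holds the majority of the early wakeups.
     */
    for (i = candidate - 1; i >= 0; i--) {
        acc += recent[i];
        if (2 * acc > below)
            return i;
    }

    return candidate;
}
```

With `recent = {1, 5, 1, 0}`, `candidate = 3` and `nr_recent = 9`, seven of the nine recent values are early wakeups and most of them sit in bin 1 or above, so the sketch settles on state 1 instead of state 3.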
    • cpuidle: teo: Change the main idle state selection logic · c410a9a1
      Rafael J. Wysocki authored
      Two aspects of the current main idle state selection logic in the
      TEO (Timer Events Oriented) cpuidle governor are quite questionable.
      
      First of all, the "hits" and "misses" metrics used by it are only
      updated for a given idle state if the time till the next timer event
      ("sleep length") is between the target residency of that state and
      the target residency of the next one.  Consequently, they are likely
      to become stale if the sleep length tends to fall outside that
      interval, which increases the likelihood of suboptimal idle state
      selection.
      
      Second, the decision on whether or not to select the idle state
      "matching" the sleep length is based on the metrics collected for
      that state alone, whereas in principle the metrics collected for
      the other idle states should be taken into consideration when that
      decision is made.  For example, if the measured idle duration is less
      than the target residency of the idle state "matching" the sleep
      length, then it is also less than the target residency of any deeper
      idle state and that should be taken into account when considering
      whether or not to select any of those states, but currently it is
      not.
      
      In order to address the above shortcomings, modify the main idle
      state selection logic in the TEO governor to take the metrics
      collected for all of the idle states into account when deciding
      whether or not to select the one "matching" the sleep length.
      
      Moreover, drop the "misses" metric that becomes redundant after the
      above change and rename the "early_hits" metric to "intercepts" so
      that its role is better reflected by its name (the idea being that
      if a CPU wakes up earlier than indicated by the sleep length, then
      it must be a result of a non-timer interrupt that "intercepts" the
      CPU).
      
      Also rename the states[] array in struct teo_cpu to
      state_bins[] to avoid confusing it with the states[] array in
      struct cpuidle_driver and update the documentation to match the
      new code (and make it more comprehensive while at it).
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
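One way to take the metrics of all idle states into account is sketched below. It is only an illustration under stated assumptions: the commit message does not give the exact comparison, so the rule used here (intercepts of all shallower bins versus hits plus intercepts of the candidate and deeper bins) is an assumption, as are the names `struct bin_metrics` and `likely_early_wakeup`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-bin metrics, loosely modeled on state_bins[]. */
struct bin_metrics {
    unsigned int hits;        /* wakeups close to the sleep length */
    unsigned int intercepts;  /* wakeups earlier than the sleep length */
};

/*
 * Return true if the CPU looks likely to wake up before reaching the
 * target residency of the candidate state, judging by every bin
 * rather than by the candidate's own metrics alone.
 */
bool likely_early_wakeup(const struct bin_metrics *bins, int nr_bins,
                         int candidate)
{
    unsigned int shallower = 0, rest = 0;
    int i;

    assert(bins != NULL && candidate >= 0 && candidate < nr_bins);

    /* Intercepts recorded for states shallower than the candidate. */
    for (i = 0; i < candidate; i++)
        shallower += bins[i].intercepts;

    /* Everything recorded for the candidate and deeper states. */
    for (i = candidate; i < nr_bins; i++)
        rest += bins[i].hits + bins[i].intercepts;

    return shallower > rest;
}
```

When this returns true, a governor following this scheme would look for a shallower state instead of the one "matching" the sleep length.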
    • cpuidle: teo: Cosmetic modification of teo_select() · b18e0de1
      Rafael J. Wysocki authored
      Initialize local variables in teo_select() where they are declared.
      
      No functional impact.
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
    • cpuidle: teo: Cosmetic modifications of teo_update() · f53cbdab
      Rafael J. Wysocki authored
      Rename a local variable in teo_update() so that its purpose is better
      reflected by its name and use one more local variable in the loop
      over the CPU idle states in that function to make the code somewhat
      easier to read.
      
      No functional impact.
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
  3. 09 Jun, 2021 1 commit
    • intel_idle: Adjust the SKX C6 parameters if PC6 is disabled · 64233338
      Chen Yu authored
      Because cpuidle assumes worst-case C-state parameters, PC6 parameters
      are used for describing C6, which is worst-case for requesting CC6.
      When PC6 is enabled, this is appropriate. But if PC6 is disabled
      in the BIOS, the exit latency and target residency should be adjusted
      accordingly.
      
      Exit latency:
      Previously the C6 exit latency was measured as the PC6 exit latency.
      With PC6 disabled, the C6 exit latency should be the one of CC6.
      
      Target residency:
      With PC6 disabled, an idle duration within [CC6, PC6) would make the
      idle governor choose C1E over C6, which reduces energy efficiency.
      The bar for requesting C6 should be lowered when PC6 is disabled.
      
      To fill this gap, check if PC6 is disabled in the BIOS in the
      MSR_PKG_CST_CONFIG_CONTROL(0xe2) register. If so, use the CC6 exit latency
      for C6 and set target_residency to 3 times the new exit latency. [This
      is consistent with how intel_idle driver uses _CST to calculate the
      target_residency.] As a result, the OS would be more likely to choose C6
      over C1E when PC6 is disabled, which is reasonable, because if C6 is
      enabled, it implies that the user cares about energy, so choosing C6 more
      frequently makes sense.
      
      The new CC6 exit latency of 92us was measured with wult[1] on SKX via NIC
      wakeup as the 99.99th percentile. CLX and CPX have the same CPU model
      number as SKX, and their CC6 exit latencies (96us and 89us, respectively)
      are similar to the SKX one, so reuse the SKX value for them.
      
      There is a concern that it might be better to use a more generic approach
      instead of optimizing every platform. However, if the required code
      complexity and different PC6 bit interpretation on different platforms
      are taken into account, tuning the code per platform seems to be an
      acceptable tradeoff.
      
      Link: https://intel.github.io/wult/ # [1]
      Suggested-by: Len Brown <len.brown@intel.com>
      Signed-off-by: Chen Yu <yu.c.chen@intel.com>
      Reviewed-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
      [ rjw: Subject and changelog edits ]
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
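The parameter adjustment can be sketched as below. The 92us exit latency and the "3 times the exit latency" rule come from the commit message; everything else is an assumption made for illustration: the struct and function names, the use of the low three bits of MSR_PKG_CST_CONFIG_CONTROL as the package C-state limit, and the rule that a limit value below 2 means PC6 is disabled. As the commit message notes, the PC6 bit interpretation differs between platforms.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical container for the two C6 parameters being tuned. */
struct c6_params {
    unsigned int exit_latency_us;
    unsigned int target_residency_us;
};

#define PKG_CST_LIMIT_MASK      0x7u  /* assumption: limit field in bits 2:0 */
#define SKX_CC6_EXIT_LATENCY_US 92u   /* measured value quoted above */

/*
 * If the (assumed) package C-state limit says PC6 is disabled, describe
 * C6 with CC6 numbers instead of the worst-case PC6 ones.
 */
void adjust_c6_params(struct c6_params *c6, uint64_t pkg_cst_config)
{
    assert(c6 != NULL);

    /* Assumption: limit values 0 and 1 mean no package C6. */
    if ((pkg_cst_config & PKG_CST_LIMIT_MASK) < 2) {
        c6->exit_latency_us = SKX_CC6_EXIT_LATENCY_US;
        c6->target_residency_us = 3 * SKX_CC6_EXIT_LATENCY_US;
    }
}
```

With PC6 disabled, the target residency becomes 3 × 92us = 276us, so the governor needs a much shorter expected idle duration before it requests C6.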
  4. 06 Jun, 2021 11 commits
    • Linux 5.13-rc5 · 614124be
      Linus Torvalds authored
    • Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi · 90d56a3d
      Linus Torvalds authored
      Pull SCSI fixes from James Bottomley:
       "Five small and fairly minor fixes, all in drivers"
      
      * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
        scsi: scsi_devinfo: Add blacklist entry for HPE OPEN-V
        scsi: ufs: ufs-mediatek: Fix HCI version in some platforms
        scsi: qedf: Do not put host in qedf_vport_create() unconditionally
        scsi: lpfc: Fix failure to transmit ABTS on FC link
        scsi: target: core: Fix warning on realtime kernels
    • Merge tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 · 20e41d9b
      Linus Torvalds authored
      Pull ext4 fixes from Ted Ts'o:
       "Miscellaneous ext4 bug fixes"
      
      * tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
        ext4: Only advertise encrypted_casefold when encryption and unicode are enabled
        ext4: fix no-key deletion for encrypt+casefold
        ext4: fix memory leak in ext4_fill_super
        ext4: fix fast commit alignment issues
        ext4: fix bug on in ext4_es_cache_extent as ext4_split_extent_at failed
        ext4: fix accessing uninit percpu counter variable with fast_commit
        ext4: fix memory leak in ext4_mb_init_backend on error path.
    • Merge tag 'arm-soc-fixes-v5.13-2' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc · decad3e1
      Linus Torvalds authored
      Pull ARM SoC fixes from Olof Johansson:
       "A set of fixes that have been coming in over the last few weeks, the
        usual mix of fixes:
      
         - DT fixups for TI K3
      
         - SATA drive detection fix for TI DRA7
      
         - Power management fixes and a few build warning removals for OMAP
      
         - OP-TEE fix to use standard API for UUID exporting
      
         - DT fixes for a handful of i.MX boards
      
        And a few other smaller items"
      
      * tag 'arm-soc-fixes-v5.13-2' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (29 commits)
        arm64: meson: select COMMON_CLK
        soc: amlogic: meson-clk-measure: remove redundant dev_err call in meson_msr_probe()
        ARM: OMAP1: ams-delta: remove unused function ams_delta_camera_power
        bus: ti-sysc: Fix flakey idling of uarts and stop using swsup_sidle_act
        ARM: dts: imx: emcon-avari: Fix nxp,pca8574 #gpio-cells
        ARM: dts: imx7d-pico: Fix the 'tuning-step' property
        ARM: dts: imx7d-meerkat96: Fix the 'tuning-step' property
        arm64: dts: freescale: sl28: var1: fix RGMII clock and voltage
        arm64: dts: freescale: sl28: var4: fix RGMII clock and voltage
        ARM: imx: pm-imx27: Include "common.h"
        arm64: dts: zii-ultra: fix 12V_MAIN voltage
        arm64: dts: zii-ultra: remove second GEN_3V3 regulator instance
        arm64: dts: ls1028a: fix memory node
        bus: ti-sysc: Fix am335x resume hang for usb otg module
        ARM: OMAP2+: Fix build warning when mmc_omap is not built
        ARM: OMAP1: isp1301-omap: Add missing gpiod_add_lookup_table function
        ARM: OMAP1: Fix use of possibly uninitialized irq variable
        optee: use export_uuid() to copy client UUID
        arm64: dts: ti: k3*: Introduce reg definition for interrupt routers
        arm64: dts: ti: k3-am65|j721e|am64: Map the dma / navigator subsystem via explicit ranges
        ...
    • Merge tag 'powerpc-5.13-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux · bd7b12aa
      Linus Torvalds authored
      Pull powerpc fixes from Michael Ellerman:
       "Fix our KVM reverse map real-mode handling since we enabled huge
        vmalloc (in some configurations).
      
        Revert a recent change to our IOMMU code which broke some devices.
      
        Fix KVM handling of FSCR on P7/P8, which could have possibly let a
         guest crash its QEMU.
      
        Fix kprobes validation of prefixed instructions across page boundary.
      
        Thanks to Alexey Kardashevskiy, Christophe Leroy, Fabiano Rosas,
        Frederic Barrat, Naveen N. Rao, and Nicholas Piggin"
      
      * tag 'powerpc-5.13-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
        Revert "powerpc/kernel/iommu: Align size for IOMMU_PAGE_SIZE() to save TCEs"
        KVM: PPC: Book3S HV: Save host FSCR in the P7/8 path
        powerpc: Fix reverse map real-mode address lookup with huge vmalloc
        powerpc/kprobes: Fix validation of prefixed instructions across page boundary
    • Merge tag 'x86_urgent_for_v5.13-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 773ac53b
      Linus Torvalds authored
      Pull x86 fixes from Borislav Petkov:
       "A bunch of x86/urgent stuff accumulated for the last two weeks so
        lemme unload it to you.
      
        It should be all totally risk-free, of course. :-)
      
         - Fix out-of-spec hardware (1st gen Hygon) which does not implement
           MSR_AMD64_SEV even though the spec clearly states so, and check
           CPUID bits first.
      
         - Send only one signal to a task when it is a SEGV_PKUERR si_code
           type.
      
         - Do away with all the wankery of reserving X amount of memory in the
           first megabyte to prevent BIOS corrupting it and simply and
           unconditionally reserve the whole first megabyte.
      
         - Make alternatives NOP optimization work at an arbitrary position
           within the patched sequence because the compiler can put
           single-byte NOPs for alignment anywhere in the sequence (32-bit
           retpoline), vs our previous assumption that the NOPs are only
           appended.
      
         - Force-disable ENQCMD[S] instructions support and remove
           update_pasid() because of insufficient protection against FPU state
           modification in an interrupt context, among other xstate horrors
           which are being addressed at the moment. This one limits the
           fallout until proper enablement.
      
         - Use cpu_feature_enabled() in the idxd driver so that it can be
           build-time disabled through the defines in disabled-features.h.
      
         - Fix LVT thermal setup for SMI delivery mode by making sure the APIC
           LVT value is read before APIC initialization so that softlockups
           during boot do not happen at least on one machine.
      
         - Mark all legacy interrupts as legacy vectors when the IO-APIC is
           disabled and when all legacy interrupts are routed through the PIC"
      
      * tag 'x86_urgent_for_v5.13-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/sev: Check SME/SEV support in CPUID first
        x86/fault: Don't send SIGSEGV twice on SEGV_PKUERR
        x86/setup: Always reserve the first 1M of RAM
        x86/alternative: Optimize single-byte NOPs at an arbitrary position
        x86/cpufeatures: Force disable X86_FEATURE_ENQCMD and remove update_pasid()
        dmaengine: idxd: Use cpu_feature_enabled()
        x86/thermal: Fix LVT thermal setup for SMI delivery mode
        x86/apic: Mark _all_ legacy interrupts when IO/APIC is missing
    • ext4: Only advertise encrypted_casefold when encryption and unicode are enabled · e71f99f2
      Daniel Rosenberg authored
      Encrypted casefolding is only supported when both encryption and
      casefolding are enabled in the config.
      
      Fixes: 471fbbea ("ext4: handle casefolding with encryption")
      Cc: stable@vger.kernel.org # 5.13+
      Signed-off-by: Daniel Rosenberg <drosen@google.com>
      Link: https://lore.kernel.org/r/20210603094849.314342-1-drosen@google.com
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
    • ext4: fix no-key deletion for encrypt+casefold · 63e7f128
      Daniel Rosenberg authored
      Commit 471fbbea ("ext4: handle casefolding with encryption") is
      missing a few checks for the encryption key which are needed to
      support deleting encrypted casefolded files when the key is not
      present.
      
      This bug made it impossible to delete encrypted+casefolded directories
      without the encryption key, due to errors like:
      
          W         : EXT4-fs warning (device vdc): __ext4fs_dirhash:270: inode #49202: comm Binder:378_4: Siphash requires key
      
      Repro steps in kvm-xfstests test appliance:
            mkfs.ext4 -F -E encoding=utf8 -O encrypt /dev/vdc
            mount /vdc
            mkdir /vdc/dir
            chattr +F /vdc/dir
            keyid=$(head -c 64 /dev/zero | xfs_io -c add_enckey /vdc | awk '{print $NF}')
            xfs_io -c "set_encpolicy $keyid" /vdc/dir
            for i in `seq 1 100`; do
                mkdir /vdc/dir/$i
            done
            xfs_io -c "rm_enckey $keyid" /vdc
            rm -rf /vdc/dir # fails with the bug
      
      Fixes: 471fbbea ("ext4: handle casefolding with encryption")
      Signed-off-by: Daniel Rosenberg <drosen@google.com>
      Link: https://lore.kernel.org/r/20210522004132.2142563-1-drosen@google.com
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
    • ext4: fix memory leak in ext4_fill_super · afd09b61
      Alexey Makhalov authored
      Buffer head references must be released before calling kill_bdev();
      otherwise the buffer head (and its page referenced by b_data) will not
      be freed by kill_bdev, and subsequently that bh will be leaked.
      
      If blocksizes differ, sb_set_blocksize() will kill current buffers and
      page cache by using kill_bdev(). The superblock is then reread using
      the correct blocksize. However, sb_set_blocksize() did not fully free
      the superblock page and buffer head: being busy, they were not freed
      and leaked instead.
      
      This can easily be reproduced by running the following in an infinite
      loop:
      
        systemctl start <ext4_on_lvm>.mount, and
        systemctl stop <ext4_on_lvm>.mount
      
      ... since systemd creates a cgroup for each slice which it mounts, the
      bh leak gets amplified by a dying memory cgroup that also never gets
      freed, and the memory consumption is much more easily noticed.
      
      Fixes: ce40733c ("ext4: Check for return value from sb_set_blocksize")
      Fixes: ac27a0ec ("ext4: initial copy of files from ext3")
      Link: https://lore.kernel.org/r/20210521075533.95732-1-amakhalov@vmware.com
      Signed-off-by: Alexey Makhalov <amakhalov@vmware.com>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Cc: stable@kernel.org
    • ext4: fix fast commit alignment issues · a7ba36bc
      Harshad Shirwadkar authored
      Fast commit recovery data on disk may not be aligned. So, when the
      recovery code reads it, this patch makes sure that fast commit info
      found on-disk is first memcpy-ed into an aligned variable before
      accessing it. As a consequence, we also remove some macros that
      could have resulted in unaligned accesses.
      
      Cc: stable@kernel.org
      Fixes: 8016e29f ("ext4: fast commit recovery path")
      Signed-off-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com>
      Link: https://lore.kernel.org/r/20210519215920.2037527-1-harshads@google.com
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
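The copy-before-access pattern mentioned in this commit looks roughly like the sketch below. The `struct tl_header` layout and the `read_tl_header` helper are hypothetical stand-ins for the on-disk fast commit structures, and byte-order conversion is left out to keep the example short.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical tag/length header as it might appear in a raw buffer. */
struct tl_header {
    uint16_t tag;
    uint16_t len;
};

/*
 * Read a header from a possibly unaligned position. Casting 'buf' to
 * 'struct tl_header *' and dereferencing it could be an unaligned
 * access, so the bytes are copied into an aligned local variable first.
 */
struct tl_header read_tl_header(const uint8_t *buf)
{
    struct tl_header tl;

    assert(buf != NULL);
    memcpy(&tl, buf, sizeof(tl));
    return tl;
}
```

Calling `read_tl_header(raw + 1)` on a byte buffer is safe even though `raw + 1` is not suitably aligned for a `uint16_t`.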
    • ext4: fix bug on in ext4_es_cache_extent as ext4_split_extent_at failed · 082cd4ec
      Ye Bin authored
      We got the following BUG_ON when running fsstress with IO fault injection:
      [130747.323114] kernel BUG at fs/ext4/extents_status.c:762!
      [130747.323117] Internal error: Oops - BUG: 0 [#1] SMP
      ......
      [130747.334329] Call trace:
      [130747.334553]  ext4_es_cache_extent+0x150/0x168 [ext4]
      [130747.334975]  ext4_cache_extents+0x64/0xe8 [ext4]
      [130747.335368]  ext4_find_extent+0x300/0x330 [ext4]
      [130747.335759]  ext4_ext_map_blocks+0x74/0x1178 [ext4]
      [130747.336179]  ext4_map_blocks+0x2f4/0x5f0 [ext4]
      [130747.336567]  ext4_mpage_readpages+0x4a8/0x7a8 [ext4]
      [130747.336995]  ext4_readpage+0x54/0x100 [ext4]
      [130747.337359]  generic_file_buffered_read+0x410/0xae8
      [130747.337767]  generic_file_read_iter+0x114/0x190
      [130747.338152]  ext4_file_read_iter+0x5c/0x140 [ext4]
      [130747.338556]  __vfs_read+0x11c/0x188
      [130747.338851]  vfs_read+0x94/0x150
      [130747.339110]  ksys_read+0x74/0xf0
      
      This patch follows Jan Kara's suggestion in:
      https://patchwork.ozlabs.org/project/linux-ext4/patch/20210428085158.3728201-1-yebin10@huawei.com/
      "I see. Now I understand your patch. Honestly, seeing how fragile is trying
      to fix extent tree after split has failed in the middle, I would probably
      go even further and make sure we fix the tree properly in case of ENOSPC
      and EDQUOT (those are easily user triggerable).  Anything else indicates a
      HW problem or fs corruption so I'd rather leave the extent tree as is and
      don't try to fix it (which also means we will not create overlapping
      extents)."
      
      Cc: stable@kernel.org
      Signed-off-by: Ye Bin <yebin10@huawei.com>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Link: https://lore.kernel.org/r/20210506141042.3298679-1-yebin10@huawei.com
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
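The policy in Jan Kara's suggestion can be expressed as a small predicate. This is a sketch of the decision only, not the code from the patch; the function name `should_fix_extent_tree` is made up for the example, and `err` is assumed to follow the kernel convention of zero or a negative errno value.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/*
 * Decide whether the extent tree should be fixed up after a failed
 * split. Only the easily user-triggerable errors qualify; anything
 * else points at a hardware problem or fs corruption, in which case
 * the tree is left as is.
 */
bool should_fix_extent_tree(int err)
{
    assert(err <= 0);
    return err == -ENOSPC || err == -EDQUOT;
}
```

Under this rule an `-EIO` from the injected IO fault leaves the extent tree untouched, which also avoids creating overlapping extents.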
  5. 05 Jun, 2021 22 commits