    x86/resctrl: Separate arch and fs resctrl locks · fb700810
    James Morse authored
    resctrl has one mutex that is taken by both the architecture-specific code
    and the filesystem parts. The two interact via cpuhp, where the architecture
    code updates the domain list. Filesystem handlers that walk the domain list
    should not run concurrently with the cpuhp callback modifying the list.
    
    Exposing a lock from the filesystem code means the interface is not cleanly
    defined, and creates the possibility of cross-architecture lock ordering
    headaches. The interaction only exists so that certain filesystem paths are
    serialised against CPU hotplug. The CPU hotplug code already has a mechanism to
    do this using cpus_read_lock().
    
    MPAM's monitors have an overflow interrupt, so it needs to be possible to walk
    the domains list in irq context. RCU is ideal for this, but some paths need to
    be able to sleep to allocate memory.
    
    Because resctrl_{on,off}line_cpu() take the rdtgroup_mutex as part of a cpuhp
    callback, cpus_read_lock() must always be taken first.
    rdtgroup_schemata_write() already does this.
    
    Most of the filesystem code's domain list walkers are currently protected by
    the rdtgroup_mutex taken in rdtgroup_kn_lock_live().  The exceptions are
    rdt_bit_usage_show() and the mon_config helpers which take the lock directly.
    
    Make the domain list protected by RCU. An architecture-specific lock prevents
    concurrent writers. rdt_bit_usage_show() could walk the domain list using RCU,
    but to keep all the filesystem operations the same, this is changed to call
    cpus_read_lock().  The mon_config helpers send multiple IPIs, so take
    cpus_read_lock() in these cases too.
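
As a rough illustration of "RCU for readers, a lock for writers", the sketch below models only the publish side in userspace with C11 atomics. It is an assumption-laden stand-in, not the kernel's RCU list API: the `model_*` names are hypothetical, and removal is deliberately omitted because real RCU must defer freeing until existing readers have finished.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>

struct model_domain {
	int id;
	struct model_domain *_Atomic next;
};

static struct model_domain *_Atomic model_head;
/* Writers serialise against each other with a lock; readers never take it. */
static pthread_mutex_t model_list_lock = PTHREAD_MUTEX_INITIALIZER;

/* Writer: fully initialise the node, then publish it with release ordering. */
static int model_domain_add(int id)
{
	struct model_domain *d = malloc(sizeof(*d));

	if (!d)
		return -1;
	d->id = id;

	pthread_mutex_lock(&model_list_lock);
	atomic_init(&d->next,
		    atomic_load_explicit(&model_head, memory_order_relaxed));
	atomic_store_explicit(&model_head, d, memory_order_release);
	pthread_mutex_unlock(&model_list_lock);
	return 0;
}

/*
 * Reader: lock-free walk. Acquire loads pair with the writer's release
 * store, so a reader sees either the old list or a fully built new node.
 */
static int model_domain_count(void)
{
	struct model_domain *d;
	int n = 0;

	for (d = atomic_load_explicit(&model_head, memory_order_acquire); d;
	     d = atomic_load_explicit(&d->next, memory_order_acquire))
		n++;
	return n;
}
```

Because the reader never blocks, a walk like `model_domain_count()` is the kind of path that could run from interrupt context, while paths that need to sleep rely on a sleeping lock instead.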
    
    The other filesystem list walkers need to be able to sleep.  Add
    cpus_read_lock() to rdtgroup_kn_lock_live() so that the cpuhp callbacks can't
    be invoked while filesystem operations are occurring.
    
    Add lockdep_assert_cpus_held() in the cases where the rdtgroup_kn_lock_live()
    call isn't obvious.
    
    Resctrl's domain online/offline calls now need to take the rdtgroup_mutex
    themselves.
    
      [ bp: Fold in a build fix: https://lore.kernel.org/r/87zfvwieli.ffs@tglx ]
    Signed-off-by: James Morse <james.morse@arm.com>
    Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
    Reviewed-by: Shaopeng Tan <tan.shaopeng@fujitsu.com>
    Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
    Reviewed-by: Babu Moger <babu.moger@amd.com>
    Tested-by: Shaopeng Tan <tan.shaopeng@fujitsu.com>
    Tested-by: Peter Newman <peternewman@google.com>
    Tested-by: Babu Moger <babu.moger@amd.com>
    Tested-by: Carl Worth <carl@os.amperecomputing.com> # arm64
    Link: https://lore.kernel.org/r/20240213184438.16675-25-james.morse@arm.com
    Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>