Commit e610a466 authored by Nathan Lynch, committed by Michael Ellerman

powerpc/pseries/mobility: rebuild cacheinfo hierarchy post-migration

It's common for the platform to replace the cache device nodes after a
migration. Since the cacheinfo code is never informed about this, it
never drops its references to the source system's cache nodes. It then
winds up in an inconsistent state, resulting in warnings and oopses as
soon as a CPU is onlined or offlined after the migration, e.g.:

  cache for /cpus/l3-cache@3113(Unified) refers to cache for /cpus/l2-cache@200d(Unified)
  WARNING: CPU: 15 PID: 86 at arch/powerpc/kernel/cacheinfo.c:176 release_cache+0x1bc/0x1d0
  [...]
  NIP release_cache+0x1bc/0x1d0
  LR  release_cache+0x1b8/0x1d0
  Call Trace:
    release_cache+0x1b8/0x1d0 (unreliable)
    cacheinfo_cpu_offline+0x1c4/0x2c0
    unregister_cpu_online+0x1b8/0x260
    cpuhp_invoke_callback+0x114/0xf40
    cpuhp_thread_fun+0x270/0x310
    smpboot_thread_fn+0x2c8/0x390
    kthread+0x1b8/0x1c0
    ret_from_kernel_thread+0x5c/0x68
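The sketch below is a minimal user-space model of this failure mode. The types `dt_node` and `cache_obj` and the helper functions are hypothetical stand-ins. They are not the kernel's `struct device_node` or the cacheinfo internals. The model shows that an object holding a cached pointer to a device node still points at the detached node after the platform swaps in a replacement. This remains true even when the replacement node has the same path.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical user-space stand-ins for illustration only; these are
 * NOT the kernel's struct device_node or cacheinfo's struct cache.
 */
struct dt_node {
	const char *path;
	bool attached;		/* cleared when firmware removes the node */
};

struct cache_obj {
	struct dt_node *ofnode;	/* node this cache object was built from */
};

/* Build a cache object that keeps a reference to its device node. */
static struct cache_obj cache_from_node(struct dt_node *np)
{
	struct cache_obj c = { .ofnode = np };
	return c;
}

/* Firmware-side replacement: the old node is detached and a new one appears. */
static void replace_node(struct dt_node *old_np, struct dt_node *new_np)
{
	old_np->attached = false;
	new_np->attached = true;
}

/* A cache object is only valid if its node is still in the live tree. */
static bool cache_is_stale(const struct cache_obj *c)
{
	return c->ofnode == NULL || !c->ofnode->attached;
}
```

Suppose a caller builds `c = cache_from_node(&src)` and then calls `replace_node(&src, &dst)`. After that, `cache_is_stale(&c)` returns true, because nothing updated `c.ofnode`. That is the model's analogue of cacheinfo never being told about the swap.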

Using device tree notifiers won't work because we want to rebuild the
hierarchy only after all the removals and additions have occurred and
the device tree is in a consistent state. Call cacheinfo_teardown()
before processing device tree updates, and rebuild the hierarchy with
cacheinfo_rebuild() afterward.
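A toy model can illustrate the argument against per-change notifiers. Everything in it is a hypothetical assumption, not taken from the kernel or the firmware's update stream: the `cnode` and `tree` types, `apply()`, the `l2-src`/`l3-dst` names, and the removal order in `migration[]`. In this model, a check run after each individual update (as a notifier would run it) observes an intermediate tree in which the L2 node's next-level reference dangles. A single rebuild after the whole batch sees only the final, consistent tree. Here the "rebuild" is represented simply by one consistency check.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define MAX_NODES 8

/* Hypothetical model of cache nodes in a device tree (not kernel types). */
struct cnode {
	char name[16];
	char next_level[16];	/* name of the next cache level, "" if none */
	bool present;
};

struct tree {
	struct cnode nodes[MAX_NODES];
};

enum op_kind { OP_REMOVE, OP_ADD };

/* One hypothetical update as a migration might deliver it. */
struct update {
	enum op_kind kind;
	const char *name;
	const char *next_level;	/* only used by OP_ADD */
};

static struct cnode *find(struct tree *t, const char *name)
{
	for (size_t i = 0; i < MAX_NODES; i++)
		if (t->nodes[i].present && strcmp(t->nodes[i].name, name) == 0)
			return &t->nodes[i];
	return NULL;
}

static void apply(struct tree *t, const struct update *u)
{
	if (u->kind == OP_REMOVE) {
		struct cnode *n = find(t, u->name);
		if (n)
			n->present = false;
		return;
	}
	for (size_t i = 0; i < MAX_NODES; i++) {
		struct cnode *n = &t->nodes[i];
		if (!n->present) {	/* reuse the first free slot */
			snprintf(n->name, sizeof(n->name), "%s", u->name);
			snprintf(n->next_level, sizeof(n->next_level), "%s", u->next_level);
			n->present = true;
			return;
		}
	}
}

/* Every present node's next_level must name another present node. */
static bool consistent(struct tree *t)
{
	for (size_t i = 0; i < MAX_NODES; i++) {
		struct cnode *n = &t->nodes[i];
		if (n->present && n->next_level[0] != '\0' && !find(t, n->next_level))
			return false;
	}
	return true;
}

static void init_source_tree(struct tree *t)
{
	memset(t, 0, sizeof(*t));
	apply(t, &(struct update){ OP_ADD, "l3-src", "" });
	apply(t, &(struct update){ OP_ADD, "l2-src", "l3-src" });
}

/* Assumed update order: removals first, L3 before L2, then additions. */
static const struct update migration[] = {
	{ OP_REMOVE, "l3-src", NULL },
	{ OP_REMOVE, "l2-src", NULL },
	{ OP_ADD, "l3-dst", "" },
	{ OP_ADD, "l2-dst", "l3-dst" },
};

/* Notifier-style: check after every single update; count bad states seen. */
static int count_inconsistent_steps(struct tree *t, const struct update *ups, size_t n)
{
	int bad = 0;
	for (size_t i = 0; i < n; i++) {
		apply(t, &ups[i]);
		if (!consistent(t))
			bad++;
	}
	return bad;
}

/* Batched: (teardown), apply every update, then check once at the end. */
static bool batched_update(struct tree *t, const struct update *ups, size_t n)
{
	for (size_t i = 0; i < n; i++)
		apply(t, &ups[i]);
	return consistent(t);
}
```

With the update order assumed here, `count_inconsistent_steps()` reports one bad intermediate state, and `batched_update()` returns true. The exact count depends on the order in which removals and additions arrive. The batched approach does not depend on that order at all.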

Fixes: 410bccf9 ("powerpc/pseries: Partition migration in the kernel")
Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
parent e59a175f
@@ -23,6 +23,7 @@
 #include <asm/machdep.h>
 #include <asm/rtas.h>
 #include "pseries.h"
+#include "../../kernel/cacheinfo.h"
 
 static struct kobject *mobility_kobj;
@@ -345,11 +346,20 @@ void post_mobility_fixup(void)
 	 */
 	cpus_read_lock();
 
+	/*
+	 * It's common for the destination firmware to replace cache
+	 * nodes. Release all of the cacheinfo hierarchy's references
+	 * before updating the device tree.
+	 */
+	cacheinfo_teardown();
+
 	rc = pseries_devicetree_update(MIGRATION_SCOPE);
 	if (rc)
 		printk(KERN_ERR "Post-mobility device tree update "
 			"failed: %d\n", rc);
 
+	cacheinfo_rebuild();
+
 	cpus_read_unlock();
 
 	/* Possibly switch to a new RFI flush type */
...