1. 04 Sep, 2024 8 commits
    • printk: nbcon: Introduce printer kthreads · 76f258bf
      Thomas Gleixner authored
      Provide the main implementation for running a printer kthread
      per nbcon console that is takeover/handover aware. This
      includes:
      
      - new mandatory write_thread() callback
      - kthread creation
      - kthread main printing loop
      - kthread wakeup mechanism
      - kthread shutdown
      
      kthread creation is a bit tricky because consoles may register
      before kthreads can be created. In such cases, registration
      will succeed, even though no kthread exists. Once kthreads can
      be created, an early_initcall will set @printk_kthreads_ready.
      If there are no registered boot consoles, the early_initcall
      creates the kthreads for all registered nbcon consoles. If
      kthread creation fails, the related console is unregistered.
      
      If there are registered boot consoles when
      @printk_kthreads_ready is set, no kthreads are created until
      the final boot console unregisters.
      
      Once kthread creation finally occurs, @printk_kthreads_running
      is set so that the system knows kthreads are available for all
      registered nbcon consoles.
      
      If @printk_kthreads_running is already set when the console
      is registering, the kthread is created during registration. If
      kthread creation fails, the registration will fail.
      
      Until @printk_kthreads_running is set, console printing occurs
      directly via the console_lock.
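
The creation rules above can be modelled in userspace as a small state machine. This is only a sketch under stated assumptions: `struct printer_state` and all `model_*` helpers are hypothetical names, not kernel code. The two flags merely stand in for @printk_kthreads_ready and @printk_kthreads_running, and kthread creation failure is not modelled.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model of the flags described in the text. */
struct printer_state {
	bool kthreads_ready;   /* stands in for @printk_kthreads_ready */
	bool kthreads_running; /* stands in for @printk_kthreads_running */
	int boot_consoles;     /* number of registered boot consoles */
};

/* Models the early_initcall: kthreads can be created from now on. */
static void model_kthreads_become_ready(struct printer_state *s)
{
	s->kthreads_ready = true;
	/* Kthreads are only started if no boot console is registered. */
	if (s->boot_consoles == 0)
		s->kthreads_running = true;
}

/* Models a boot console unregistering. */
static void model_boot_console_unregister(struct printer_state *s)
{
	assert(s->boot_consoles > 0);
	s->boot_consoles--;
	/* The final boot console leaving triggers the deferred creation. */
	if (s->kthreads_ready && s->boot_consoles == 0)
		s->kthreads_running = true;
}

/* A newly registering console gets its kthread during registration
 * only if kthreads are already running. */
static bool model_register_creates_kthread(const struct printer_state *s)
{
	return s->kthreads_running;
}

/* Until kthreads are running, printing happens directly. */
static bool model_must_print_directly(const struct printer_state *s)
{
	return !s->kthreads_running;
}
```

In this model, a system that becomes ready while one boot console is still registered keeps printing directly until that console unregisters.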
      
      kthread shutdown on system shutdown/reboot is necessary to
      ensure the printer kthreads finish their printing so that the
      system can cleanly transition back to direct printing via the
      console_lock in order to reliably push out the final
      shutdown/reboot messages. @printk_kthreads_running is cleared
      before shutting down the individual kthreads.
      
      The kthread uses a new mandatory write_thread() callback that
      is called with both device_lock() and the console context
      acquired.
      
      The console ownership handling is necessary for synchronization
      against write_atomic() which is synchronized only via the
      console context ownership.
      
      The device_lock() serializes acquiring the console context with
      NBCON_PRIO_NORMAL. It is needed in case the device_lock() does
      not disable preemption. It prevents the following race:
      
      CPU0				CPU1
      
       [ task A ]
      
       nbcon_context_try_acquire()
         # success with NORMAL prio
         # .unsafe == false;  // safe for takeover
      
       [ schedule: task A -> B ]
      
      				WARN_ON()
      				  nbcon_atomic_flush_pending()
      				    nbcon_context_try_acquire()
      				      # success with EMERGENCY prio
      
      				      # flushing
      				      nbcon_context_release()
      
      				      # HERE: con->nbcon_state is free
      				      #       to take by anyone !!!
      
       nbcon_context_try_acquire()
         # success with NORMAL prio [ task B ]
      
       [ schedule: task B -> A ]
      
       nbcon_enter_unsafe()
         nbcon_context_can_proceed()
      
      BUG: nbcon_context_can_proceed() returns "true" because
           the console is owned by a context on CPU0 with
           NBCON_PRIO_NORMAL.
      
           But it should return "false". The console is owned
           by a context from task B and we do the check
           in a context from task A.
      
      Note that with these changes, the printer kthreads do not yet
      take over full responsibility for nbcon printing during normal
      operation. These changes only focus on the lifecycle of the
      kthreads.
      
      Co-developed-by: John Ogness <john.ogness@linutronix.de>
      Signed-off-by: John Ogness <john.ogness@linutronix.de>
      Signed-off-by: Thomas Gleixner (Intel) <tglx@linutronix.de>
      Reviewed-by: Petr Mladek <pmladek@suse.com>
      Link: https://lore.kernel.org/r/20240904120536.115780-7-john.ogness@linutronix.de
      Signed-off-by: Petr Mladek <pmladek@suse.com>
    • printk: nbcon: Init @nbcon_seq to highest possible · fb9fabf3
      John Ogness authored
      When initializing an nbcon console, have nbcon_alloc() set
      @nbcon_seq to the highest possible sequence number. For all
      practical purposes, this will guarantee that the console
      will have nothing to print until later when @nbcon_seq is
      set to the proper initial printing value.
      
      This will be particularly important once kthread printing is
      introduced because nbcon_alloc() can create/start the kthread
      before the desired initial sequence number is known.
      
      Signed-off-by: John Ogness <john.ogness@linutronix.de>
      Reviewed-by: Petr Mladek <pmladek@suse.com>
      Link: https://lore.kernel.org/r/20240904120536.115780-6-john.ogness@linutronix.de
      Signed-off-by: Petr Mladek <pmladek@suse.com>
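
The effect of starting @nbcon_seq at the highest possible value can be illustrated with a userspace sketch. Everything here is an assumption made for illustration: `struct model_console`, `MODEL_SEQ_MAX`, and the `model_*` helpers are hypothetical, and a plain `uint64_t` stands in for whatever type the kernel actually uses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed stand-in for "the highest possible sequence number". */
#define MODEL_SEQ_MAX UINT64_MAX

/* Hypothetical console with only the field that matters here. */
struct model_console {
	uint64_t nbcon_seq; /* next record this console wants to print */
};

/* Models allocation: park the console at the highest sequence. */
static void model_alloc(struct model_console *con)
{
	assert(con != NULL);
	con->nbcon_seq = MODEL_SEQ_MAX;
}

/*
 * A console has pending output if the log already contains the record
 * it wants next. @next_seq is the sequence number the log will assign
 * to its next new record.
 */
static bool model_has_pending(const struct model_console *con,
			      uint64_t next_seq)
{
	assert(con != NULL);
	return con->nbcon_seq < next_seq;
}
```

A printer that starts early sees no pending records in this model until the proper initial sequence number is stored.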
    • printk: nbcon: Add context to usable() and emit() · 6cb58cfe
      John Ogness authored
      The nbcon consoles will have two callbacks to be used for
      different contexts. In order to determine if an nbcon console
      is usable, console_is_usable() must know if it is a context
      that will need to use the optional write_atomic() callback.
      Also, nbcon_emit_next_record() must know which callback it
      needs to call.
      
      Add an extra parameter @use_atomic to console_is_usable() and
      nbcon_emit_next_record() to specify this.
      
      Since so far only the write_atomic() callback exists,
      @use_atomic is set to true for all call sites.
      
      For legacy consoles, @use_atomic is not used.
      
      Signed-off-by: John Ogness <john.ogness@linutronix.de>
      Reviewed-by: Petr Mladek <pmladek@suse.com>
      Link: https://lore.kernel.org/r/20240904120536.115780-5-john.ogness@linutronix.de
      Signed-off-by: Petr Mladek <pmladek@suse.com>
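
The role of the @use_atomic parameter can be sketched in userspace as follows. This is not the kernel implementation: `struct model_console`, the `demo_*` callbacks, and the `model_*` functions are hypothetical, and the second callback is assumed to be the thread-context one that later commits in this series introduce.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical console with one callback per printing context. */
struct model_console {
	void (*write_atomic)(const char *msg); /* optional */
	void (*write_thread)(const char *msg); /* assumed second callback */
};

enum model_path { MODEL_PATH_NONE, MODEL_PATH_ATOMIC, MODEL_PATH_THREAD };

/* Records which callback ran last, so the sketch can be checked. */
static enum model_path model_last_path = MODEL_PATH_NONE;

static void demo_write_atomic(const char *msg)
{
	(void)msg;
	model_last_path = MODEL_PATH_ATOMIC;
}

static void demo_write_thread(const char *msg)
{
	(void)msg;
	model_last_path = MODEL_PATH_THREAD;
}

/* Usable only if the callback needed by this context exists. */
static bool model_console_is_usable(const struct model_console *con,
				    bool use_atomic)
{
	assert(con != NULL);
	return use_atomic ? con->write_atomic != NULL
			  : con->write_thread != NULL;
}

/* Emit one record through the callback selected by @use_atomic. */
static bool model_emit(const struct model_console *con, bool use_atomic,
		       const char *msg)
{
	if (!model_console_is_usable(con, use_atomic))
		return false;
	if (use_atomic)
		con->write_atomic(msg);
	else
		con->write_thread(msg);
	return true;
}
```

A console that provides only one of the two callbacks is usable from the matching context and rejected from the other.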
    • printk: Flush console on unregister_console() · 0e53e2d9
      John Ogness authored
      Ensure consoles have flushed pending records before
      unregistering. The console should print up to at least its
      related "console disabled" record.
      
      Signed-off-by: John Ogness <john.ogness@linutronix.de>
      Reviewed-by: Petr Mladek <pmladek@suse.com>
      Link: https://lore.kernel.org/r/20240904120536.115780-4-john.ogness@linutronix.de
      Signed-off-by: Petr Mladek <pmladek@suse.com>
    • printk: Fail pr_flush() if before SYSTEM_SCHEDULING · e37577eb
      John Ogness authored
      A follow-up change adds pr_flush() to console unregistration.
      However, with boot consoles, unregistration can happen very
      early if regular consoles are also registering.
      In this case the pr_flush() is not important because all
      consoles are flushed when checking the initial console sequence
      number.
      
      Allow pr_flush() to fail if @system_state has not yet reached
      SYSTEM_SCHEDULING. This avoids might_sleep() and msleep()
      explosions that would otherwise occur:
      
      [    0.436739][    T0] printk: legacy console [ttyS0] enabled
      [    0.439820][    T0] printk: legacy bootconsole [earlyser0] disabled
      [    0.446822][    T0] BUG: scheduling while atomic: swapper/0/0/0x00000002
      [    0.450491][    T0] 1 lock held by swapper/0/0:
      [    0.457897][    T0]  #0: ffffffff82ae5f88 (console_mutex){+.+.}-{4:4}, at: console_list_lock+0x20/0x70
      [    0.463141][    T0] Modules linked in:
      [    0.465307][    T0] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.10.0-rc1+ #372
      [    0.469394][    T0] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2 04/01/2014
      [    0.474402][    T0] Call Trace:
      [    0.476246][    T0]  <TASK>
      [    0.481473][    T0]  dump_stack_lvl+0x93/0xb0
      [    0.483949][    T0]  dump_stack+0x10/0x20
      [    0.486256][    T0]  __schedule_bug+0x68/0x90
      [    0.488753][    T0]  __schedule+0xb9b/0xd80
      [    0.491179][    T0]  ? lock_release+0xb5/0x270
      [    0.493732][    T0]  schedule+0x43/0x170
      [    0.495998][    T0]  schedule_timeout+0xc5/0x1e0
      [    0.498634][    T0]  ? __pfx_process_timeout+0x10/0x10
      [    0.501522][    T0]  ? msleep+0x13/0x50
      [    0.503728][    T0]  msleep+0x3c/0x50
      [    0.505847][    T0]  __pr_flush.constprop.0.isra.0+0x56/0x500
      [    0.509050][    T0]  ? _printk+0x58/0x80
      [    0.511332][    T0]  ? lock_is_held_type+0x9c/0x110
      [    0.514106][    T0]  unregister_console_locked+0xe1/0x450
      [    0.517144][    T0]  register_console+0x509/0x620
      [    0.519827][    T0]  ? __pfx_univ8250_console_init+0x10/0x10
      [    0.523042][    T0]  univ8250_console_init+0x24/0x40
      [    0.525845][    T0]  console_init+0x43/0x210
      [    0.528280][    T0]  start_kernel+0x493/0x980
      [    0.530773][    T0]  x86_64_start_reservations+0x18/0x30
      [    0.533755][    T0]  x86_64_start_kernel+0xae/0xc0
      [    0.536473][    T0]  common_startup_64+0x12c/0x138
      [    0.539210][    T0]  </TASK>
      
      And then the kernel goes into an infinite loop complaining about:
      
      1. releasing a pinned lock
      2. unpinning an unpinned lock
      3. bad: scheduling from the idle thread!
      4. goto 1
      
      Signed-off-by: John Ogness <john.ogness@linutronix.de>
      Reviewed-by: Petr Mladek <pmladek@suse.com>
      Link: https://lore.kernel.org/r/20240904120536.115780-3-john.ogness@linutronix.de
      Signed-off-by: Petr Mladek <pmladek@suse.com>
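
The early-exit guard described in this commit can be sketched in userspace. The names below are hypothetical: `enum model_system_state` only imitates the ordering implied by the text (a booting state that precedes the scheduling state), and `model_pr_flush()` compares sequence numbers where real code would sleep and re-check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, ordered system states; earlier states compare lower. */
enum model_system_state {
	MODEL_SYSTEM_BOOTING,
	MODEL_SYSTEM_SCHEDULING,
	MODEL_SYSTEM_RUNNING,
};

/* Hypothetical console tracking how far it has printed. */
struct model_console {
	unsigned long long seq;
};

/*
 * Returns true if the console has printed up to @target_seq.
 * Before the scheduler is available the function fails at once,
 * because waiting would require sleeping.
 */
static bool model_pr_flush(enum model_system_state state,
			   const struct model_console *con,
			   unsigned long long target_seq)
{
	assert(con != NULL);
	if (state < MODEL_SYSTEM_SCHEDULING)
		return false;
	/* Real code would sleep and poll here; the model only compares. */
	return con->seq >= target_seq;
}
```

The point of the guard is that the failure path never reaches anything that could sleep, so a caller running before scheduling is possible gets a plain failure result.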
    • printk: nbcon: Add function for printers to reacquire ownership · bd07d864
      John Ogness authored
      Since ownership can be lost at any time due to handover or
      takeover, a printing context _must_ be prepared to back out
      immediately and carefully. However, there are scenarios where
      the printing context must reacquire ownership in order to
      finalize or revert hardware changes.
      
      One such example is when interrupts are disabled during
      printing. No other context will automagically re-enable the
      interrupts. For this case, the disabling context _must_
      reacquire nbcon ownership so that it can re-enable the
      interrupts.
      
      Provide nbcon_reacquire_nobuf() for exactly this purpose. It
      allows a printing context to reacquire ownership using the same
      priority as its previous ownership.
      
      Note that after a successful reacquire the printing context
      will have no output buffer because that has been lost. This
      function cannot be used to resume printing.
      
      Signed-off-by: John Ogness <john.ogness@linutronix.de>
      Reviewed-by: Petr Mladek <pmladek@suse.com>
      Link: https://lore.kernel.org/r/20240904120536.115780-2-john.ogness@linutronix.de
      Signed-off-by: Petr Mladek <pmladek@suse.com>
    • printk: nbcon: Use raw_cpu_ptr() instead of open coding · d33d5e68
      John Ogness authored
      There is no need to open code a non-migration-checking
      this_cpu_ptr(). That is exactly what raw_cpu_ptr() is.
      
      Signed-off-by: John Ogness <john.ogness@linutronix.de>
      Reviewed-by: Petr Mladek <pmladek@suse.com>
      Link: https://lore.kernel.org/r/87plpum4jw.fsf@jogness.linutronix.de
      Signed-off-by: Petr Mladek <pmladek@suse.com>
    • printk: Use the BITS_PER_LONG macro · 85a147a9
      Jinjie Ruan authored
      sizeof(unsigned long) * 8 is the number of bits in an unsigned long
      variable. Replace it with the BITS_PER_LONG macro for simplicity.
      
      Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com>
      Reviewed-by: John Ogness <john.ogness@linutronix.de>
      Reviewed-by: Petr Mladek <pmladek@suse.com>
      Link: https://lore.kernel.org/r/20240903035358.308482-1-ruanjinjie@huawei.com
      Signed-off-by: Petr Mladek <pmladek@suse.com>
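
The equivalence behind this cleanup can be checked in userspace. The sketch below does not use the kernel macro itself: `MODEL_BITS_PER_LONG` is a hypothetical stand-in derived from `ULONG_MAX`, assuming `unsigned long` is either 32 or 64 bits wide.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's BITS_PER_LONG. */
#if ULONG_MAX == 0xffffffffUL
#define MODEL_BITS_PER_LONG 32
#else
#define MODEL_BITS_PER_LONG 64
#endif

/* The open-coded form that the commit replaces. */
static size_t open_coded_bits(void)
{
	/* The literal 8 silently assumes 8-bit bytes. */
	assert(CHAR_BIT == 8);
	return sizeof(unsigned long) * 8;
}
```

Both forms yield the same value on such platforms; the named constant just states the intent directly.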
  2. 21 Aug, 2024 32 commits