• Linus Torvalds's avatar
    printk: reinstate KERN_CONT for printing continuation lines · 4bcc595c
    Linus Torvalds authored
    Long long ago the kernel log buffer was a buffered stream of bytes, very
    much like stdio in user space.  It supported log levels by scanning the
    stream and noticing the log level markers at the beginning of each line,
    but if you wanted to print a partial line in multiple chunks, you just
    did multiple printk() calls, and it just automatically worked.
    
    Except when it didn't, and you had very confusing output when different
    lines got all mixed up with each other.  Then you got fragment lines
    mixing with each other, or with non-fragment lines, because it was
    traditionally impossible to tell whether a printk() call was a
    continuation or not.
    
    To at least help clarify the issue of continuation lines, we added a
    KERN_CONT marker back in 2007 to mark continuation lines:
    
      47492527 ("printk: add KERN_CONT annotation").
    
    That continuation marker was initially an empty string, and didn't
    actuall make any semantic difference.  But it at least made it possible
    to annotate the source code, and have check-patch notice that a printk()
    didn't need or want a log level marker, because it was a continuation of
    a previous line.
    
    To avoid the ambiguity between a continuation line that had that
    KERN_CONT marker, and a printk with no level information at all, we then
    in 2009 made KERN_CONT be a real log level marker which meant that we
    could now reliably tell the difference between the two cases.
    
      5fd29d6c ("printk: clean up handling of log-levels and newlines")
    
    and we could take advantage of that to make sure we didn't mix up
    continuation lines with lines that just didn't have any loglevel at all.
    
    Then, in 2012, the kernel log buffer was changed to be a "record" based
    log, where each line was a record that has a loglevel and a timestamp.
    
    You can see the beginning of that conversion in commits
    
      e11fea92 ("kmsg: export printk records to the /dev/kmsg interface")
      7ff9554b ("printk: convert byte-buffer to variable-length record buffer")
    
    with a number of follow-up commits to fix some painful fallout from that
    conversion.  Over all, it took a couple of months to sort out most of
    it.  But the upside was that you could have concurrent readers (and
    writers) of the kernel log and not have lines with mixed output in them.
    
    And one particular pain-point for the record-based kernel logging was
    exactly the fragmentary lines that are generated in smaller chunks.  In
    order to still log them as one recrod, the continuation lines need to be
    attached to the previous record properly.
    
    However the explicit continuation record marker that is actually useful
    for this exact case was actually removed in aroundm the same time by commit
    
      61e99ab8 ("printk: remove the now unnecessary "C" annotation for KERN_CONT")
    
    due to the incorrect belief that KERN_CONT wasn't meaningful.  The
    ambiguity between "is this a continuation line" or "is this a plain
    printk with no log level information" was reintroduced, and in fact
    became an even bigger pain point because there was now the whole
    record-level merging of kernel messages going on.
    
    This patch reinstates the KERN_CONT as a real non-empty string marker,
    so that the ambiguity is fixed once again.
    
    But it's not a plain revert of that original removal: in the four years
    since we made KERN_CONT an empty string again, not only has the format
    of the log level markers changed, we've also had some usage changes in
    this area.
    
    For example, some ACPI code seems to use KERN_CONT _together_ with a log
    level, and now uses both the KERN_CONT marker and (for example) a
    KERN_INFO marker to show that it's an informational continuation of a
    line.
    
    Which is actually not a bad idea - if the continuation line cannot be
    attached to its predecessor, without the log level information we don't
    know what log level to assign to it (and we traditionally just assigned
    it the default loglevel).  So having both a log level and the KERN_CONT
    marker is not necessarily a bad idea, but it does mean that we need to
    actually iterate over potentially multiple markers, rather than just a
    single one.
    
    Also, since KERN_CONT was still conceptually needed, and encouraged, but
    didn't actually _do_ anything, we've also had the reverse problem:
    rather than having too many annotations it has too few, and there is bit
    rot with code that no longer marks the continuation lines with the
    KERN_CONT marker.
    
    So this patch not only re-instates the non-empty KERN_CONT marker, it
    also fixes up the cases of bit-rot I noticed in my own logs.
    
    There are probably other cases where KERN_CONT will be needed to be
    added, either because it is new code that never dealt with the need for
    KERN_CONT, or old code that has bitrotted without anybody noticing.
    
    That said, we should strive to avoid the need for KERN_CONT.  It does
    result in real problems for logging, and should generally not be seen as
    a good feature.  If we some day can get rid of the feature entirely,
    because nobody does any fragmented printk calls, that would be lovely.
    
    But until that point, let's at mark the code that relies on the hacky
    multi-fragment kernel printk's.  Not only does it avoid the ambiguity,
    it also annotates code as "maybe this would be good to fix some day".
    
    (That said, particularly during single-threaded bootup, the downsides of
    KERN_CONT are very limited.  Things get much hairier when you have
    multiple threads going on and user level reading and writing logs too).
    Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
    4bcc595c
policydb.c 69.7 KB