Commit 4bcc595c authored by Linus Torvalds's avatar Linus Torvalds

printk: reinstate KERN_CONT for printing continuation lines

Long long ago the kernel log buffer was a buffered stream of bytes, very
much like stdio in user space.  It supported log levels by scanning the
stream and noticing the log level markers at the beginning of each line,
but if you wanted to print a partial line in multiple chunks, you just
did multiple printk() calls, and it just automatically worked.

Except when it didn't, and you had very confusing output when different
lines got all mixed up with each other.  Then you got fragment lines
mixing with each other, or with non-fragment lines, because it was
traditionally impossible to tell whether a printk() call was a
continuation or not.

To at least help clarify the issue of continuation lines, we added a
KERN_CONT marker back in 2007 to mark continuation lines:

  47492527 ("printk: add KERN_CONT annotation").

That continuation marker was initially an empty string, and didn't
actuall make any semantic difference.  But it at least made it possible
to annotate the source code, and have check-patch notice that a printk()
didn't need or want a log level marker, because it was a continuation of
a previous line.

To avoid the ambiguity between a continuation line that had that
KERN_CONT marker, and a printk with no level information at all, we then
in 2009 made KERN_CONT be a real log level marker which meant that we
could now reliably tell the difference between the two cases.

  5fd29d6c ("printk: clean up handling of log-levels and newlines")

and we could take advantage of that to make sure we didn't mix up
continuation lines with lines that just didn't have any loglevel at all.

Then, in 2012, the kernel log buffer was changed to be a "record" based
log, where each line was a record that has a loglevel and a timestamp.

You can see the beginning of that conversion in commits

  e11fea92 ("kmsg: export printk records to the /dev/kmsg interface")
  7ff9554b ("printk: convert byte-buffer to variable-length record buffer")

with a number of follow-up commits to fix some painful fallout from that
conversion.  Over all, it took a couple of months to sort out most of
it.  But the upside was that you could have concurrent readers (and
writers) of the kernel log and not have lines with mixed output in them.

And one particular pain-point for the record-based kernel logging was
exactly the fragmentary lines that are generated in smaller chunks.  In
order to still log them as one recrod, the continuation lines need to be
attached to the previous record properly.

However the explicit continuation record marker that is actually useful
for this exact case was actually removed in aroundm the same time by commit

  61e99ab8 ("printk: remove the now unnecessary "C" annotation for KERN_CONT")

due to the incorrect belief that KERN_CONT wasn't meaningful.  The
ambiguity between "is this a continuation line" or "is this a plain
printk with no log level information" was reintroduced, and in fact
became an even bigger pain point because there was now the whole
record-level merging of kernel messages going on.

This patch reinstates the KERN_CONT as a real non-empty string marker,
so that the ambiguity is fixed once again.

But it's not a plain revert of that original removal: in the four years
since we made KERN_CONT an empty string again, not only has the format
of the log level markers changed, we've also had some usage changes in
this area.

For example, some ACPI code seems to use KERN_CONT _together_ with a log
level, and now uses both the KERN_CONT marker and (for example) a
KERN_INFO marker to show that it's an informational continuation of a
line.

Which is actually not a bad idea - if the continuation line cannot be
attached to its predecessor, without the log level information we don't
know what log level to assign to it (and we traditionally just assigned
it the default loglevel).  So having both a log level and the KERN_CONT
marker is not necessarily a bad idea, but it does mean that we need to
actually iterate over potentially multiple markers, rather than just a
single one.

Also, since KERN_CONT was still conceptually needed, and encouraged, but
didn't actually _do_ anything, we've also had the reverse problem:
rather than having too many annotations it has too few, and there is bit
rot with code that no longer marks the continuation lines with the
KERN_CONT marker.

So this patch not only re-instates the non-empty KERN_CONT marker, it
also fixes up the cases of bit-rot I noticed in my own logs.

There are probably other cases where KERN_CONT will be needed to be
added, either because it is new code that never dealt with the need for
KERN_CONT, or old code that has bitrotted without anybody noticing.

That said, we should strive to avoid the need for KERN_CONT.  It does
result in real problems for logging, and should generally not be seen as
a good feature.  If we some day can get rid of the feature entirely,
because nobody does any fragmented printk calls, that would be lovely.

But until that point, let's at mark the code that relies on the hacky
multi-fragment kernel printk's.  Not only does it avoid the ambiguity,
it also annotates code as "maybe this would be good to fix some day".

(That said, particularly during single-threaded bootup, the downsides of
KERN_CONT are very limited.  Things get much hairier when you have
multiple threads going on and user level reading and writing logs too).
Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
parent c8d2bc9b
...@@ -3182,11 +3182,11 @@ static int do_bind_con_driver(const struct consw *csw, int first, int last, ...@@ -3182,11 +3182,11 @@ static int do_bind_con_driver(const struct consw *csw, int first, int last,
pr_info("Console: switching "); pr_info("Console: switching ");
if (!deflt) if (!deflt)
printk("consoles %d-%d ", first+1, last+1); printk(KERN_CONT "consoles %d-%d ", first+1, last+1);
if (j >= 0) { if (j >= 0) {
struct vc_data *vc = vc_cons[j].d; struct vc_data *vc = vc_cons[j].d;
printk("to %s %s %dx%d\n", printk(KERN_CONT "to %s %s %dx%d\n",
vc->vc_can_do_color ? "colour" : "mono", vc->vc_can_do_color ? "colour" : "mono",
desc, vc->vc_cols, vc->vc_rows); desc, vc->vc_cols, vc->vc_rows);
...@@ -3195,7 +3195,7 @@ static int do_bind_con_driver(const struct consw *csw, int first, int last, ...@@ -3195,7 +3195,7 @@ static int do_bind_con_driver(const struct consw *csw, int first, int last,
update_screen(vc); update_screen(vc);
} }
} else } else
printk("to %s\n", desc); printk(KERN_CONT "to %s\n", desc);
retval = 0; retval = 0;
err: err:
......
...@@ -20,7 +20,7 @@ ...@@ -20,7 +20,7 @@
* line that had no enclosing \n). Only to be used by core/arch code * line that had no enclosing \n). Only to be used by core/arch code
* during early bootup (a continued line is not SMP-safe otherwise). * during early bootup (a continued line is not SMP-safe otherwise).
*/ */
#define KERN_CONT "" #define KERN_CONT KERN_SOH "c"
/* integer equivalents of KERN_<LEVEL> */ /* integer equivalents of KERN_<LEVEL> */
#define LOGLEVEL_SCHED -2 /* Deferred messages from sched code #define LOGLEVEL_SCHED -2 /* Deferred messages from sched code
......
...@@ -16,6 +16,7 @@ static inline int printk_get_level(const char *buffer) ...@@ -16,6 +16,7 @@ static inline int printk_get_level(const char *buffer)
switch (buffer[1]) { switch (buffer[1]) {
case '0' ... '7': case '0' ... '7':
case 'd': /* KERN_DEFAULT */ case 'd': /* KERN_DEFAULT */
case 'c': /* KERN_CONT */
return buffer[1]; return buffer[1];
} }
} }
......
...@@ -655,11 +655,8 @@ static ssize_t msg_print_ext_header(char *buf, size_t size, ...@@ -655,11 +655,8 @@ static ssize_t msg_print_ext_header(char *buf, size_t size,
* better readable output. 'c' in the record flags mark the first * better readable output. 'c' in the record flags mark the first
* fragment of a line, '+' the following. * fragment of a line, '+' the following.
*/ */
if (msg->flags & LOG_CONT && !(prev_flags & LOG_CONT)) if (msg->flags & LOG_CONT)
cont = 'c'; cont = (prev_flags & LOG_CONT) ? '+' : 'c';
else if ((msg->flags & LOG_CONT) ||
((prev_flags & LOG_CONT) && !(msg->flags & LOG_PREFIX)))
cont = '+';
return scnprintf(buf, size, "%u,%llu,%llu,%c;", return scnprintf(buf, size, "%u,%llu,%llu,%c;",
(msg->facility << 3) | msg->level, seq, ts_usec, cont); (msg->facility << 3) | msg->level, seq, ts_usec, cont);
...@@ -1819,10 +1816,9 @@ asmlinkage int vprintk_emit(int facility, int level, ...@@ -1819,10 +1816,9 @@ asmlinkage int vprintk_emit(int facility, int level,
/* strip kernel syslog prefix and extract log level or control flags */ /* strip kernel syslog prefix and extract log level or control flags */
if (facility == 0) { if (facility == 0) {
int kern_level = printk_get_level(text); int kern_level;
if (kern_level) { while ((kern_level = printk_get_level(text)) != 0) {
const char *end_of_header = printk_skip_level(text);
switch (kern_level) { switch (kern_level) {
case '0' ... '7': case '0' ... '7':
if (level == LOGLEVEL_DEFAULT) if (level == LOGLEVEL_DEFAULT)
...@@ -1830,14 +1826,13 @@ asmlinkage int vprintk_emit(int facility, int level, ...@@ -1830,14 +1826,13 @@ asmlinkage int vprintk_emit(int facility, int level,
/* fallthrough */ /* fallthrough */
case 'd': /* KERN_DEFAULT */ case 'd': /* KERN_DEFAULT */
lflags |= LOG_PREFIX; lflags |= LOG_PREFIX;
break;
case 'c': /* KERN_CONT */
lflags |= LOG_CONT;
} }
/*
* No need to check length here because vscnprintf text_len -= 2;
* put '\0' at the end of the string. Only valid and text += 2;
* newly printed level is detected.
*/
text_len -= end_of_header - text;
text = (char *)end_of_header;
} }
} }
...@@ -1852,7 +1847,7 @@ asmlinkage int vprintk_emit(int facility, int level, ...@@ -1852,7 +1847,7 @@ asmlinkage int vprintk_emit(int facility, int level,
* Flush the conflicting buffer. An earlier newline was missing, * Flush the conflicting buffer. An earlier newline was missing,
* or another task also prints continuation lines. * or another task also prints continuation lines.
*/ */
if (cont.len && (lflags & LOG_PREFIX || cont.owner != current)) if (cont.len && (!(lflags & LOG_CONT) || cont.owner != current))
cont_flush(LOG_NEWLINE); cont_flush(LOG_NEWLINE);
/* buffer line if possible, otherwise store it right away */ /* buffer line if possible, otherwise store it right away */
...@@ -1874,7 +1869,7 @@ asmlinkage int vprintk_emit(int facility, int level, ...@@ -1874,7 +1869,7 @@ asmlinkage int vprintk_emit(int facility, int level,
* a newline, flush and append the newline. * a newline, flush and append the newline.
*/ */
if (cont.len) { if (cont.len) {
if (cont.owner == current && !(lflags & LOG_PREFIX)) if (cont.owner == current && (lflags & LOG_CONT))
stored = cont_add(facility, level, text, stored = cont_add(facility, level, text,
text_len); text_len);
cont_flush(LOG_NEWLINE); cont_flush(LOG_NEWLINE);
......
...@@ -527,9 +527,9 @@ static int policydb_index(struct policydb *p) ...@@ -527,9 +527,9 @@ static int policydb_index(struct policydb *p)
printk(KERN_DEBUG "SELinux: %d users, %d roles, %d types, %d bools", printk(KERN_DEBUG "SELinux: %d users, %d roles, %d types, %d bools",
p->p_users.nprim, p->p_roles.nprim, p->p_types.nprim, p->p_bools.nprim); p->p_users.nprim, p->p_roles.nprim, p->p_types.nprim, p->p_bools.nprim);
if (p->mls_enabled) if (p->mls_enabled)
printk(", %d sens, %d cats", p->p_levels.nprim, printk(KERN_CONT ", %d sens, %d cats", p->p_levels.nprim,
p->p_cats.nprim); p->p_cats.nprim);
printk("\n"); printk(KERN_CONT "\n");
printk(KERN_DEBUG "SELinux: %d classes, %d rules\n", printk(KERN_DEBUG "SELinux: %d classes, %d rules\n",
p->p_classes.nprim, p->te_avtab.nel); p->p_classes.nprim, p->te_avtab.nel);
......
Markdown is supported
0%
or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment