• Yonghong Song's avatar
    bpf: Use get_file_rcu() instead of get_file() for task_file iterator · cf28f3bb
    Yonghong Song authored
    With latest `bpftool prog` command, we observed the following kernel
    panic.
        BUG: kernel NULL pointer dereference, address: 0000000000000000
        #PF: supervisor instruction fetch in kernel mode
        #PF: error_code(0x0010) - not-present page
        PGD dfe894067 P4D dfe894067 PUD deb663067 PMD 0
        Oops: 0010 [#1] SMP
        CPU: 9 PID: 6023 ...
        RIP: 0010:0x0
        Code: Bad RIP value.
        RSP: 0000:ffffc900002b8f18 EFLAGS: 00010286
        RAX: ffff8883a405f400 RBX: ffff888e46a6bf00 RCX: 000000008020000c
        RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff8883a405f400
        RBP: ffff888e46a6bf50 R08: 0000000000000000 R09: ffffffff81129600
        R10: ffff8883a405f300 R11: 0000160000000000 R12: 0000000000002710
        R13: 000000e9494b690c R14: 0000000000000202 R15: 0000000000000009
        FS:  00007fd9187fe700(0000) GS:ffff888e46a40000(0000) knlGS:0000000000000000
        CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        CR2: ffffffffffffffd6 CR3: 0000000de5d33002 CR4: 0000000000360ee0
        DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
        DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
        Call Trace:
         <IRQ>
         rcu_core+0x1a4/0x440
         __do_softirq+0xd3/0x2c8
         irq_exit+0x9d/0xa0
         smp_apic_timer_interrupt+0x68/0x120
         apic_timer_interrupt+0xf/0x20
         </IRQ>
        RIP: 0033:0x47ce80
        Code: Bad RIP value.
        RSP: 002b:00007fd9187fba40 EFLAGS: 00000206 ORIG_RAX: ffffffffffffff13
        RAX: 0000000000000002 RBX: 00007fd931789160 RCX: 000000000000010c
        RDX: 00007fd9308cdfb4 RSI: 00007fd9308cdfb4 RDI: 00007ffedd1ea0a8
        RBP: 00007fd9187fbab0 R08: 000000000000000e R09: 000000000000002a
        R10: 0000000000480210 R11: 00007fd9187fc570 R12: 00007fd9316cc400
        R13: 0000000000000118 R14: 00007fd9308cdfb4 R15: 00007fd9317a9380
    
    After further analysis, the bug is triggered by
    Commit eaaacd23 ("bpf: Add task and task/file iterator targets")
    which introduced task_file bpf iterator, which traverses all open file
    descriptors for all tasks in the current namespace.
    The latest `bpftool prog` calls a task_file bpf program to traverse
    all files in the system in order to associate processes with progs/maps, etc.
    When traversing files for a given task, rcu read_lock is taken to
    access all files in a file_struct. But it used get_file() to grab
    a file, which is not right. It is possible file->f_count is 0 and
    get_file() will unconditionally increase it.
    Later put_file() may cause all kind of issues with the above
    as one of sympotoms.
    
    The failure can be reproduced with the following steps in a few seconds:
        $ cat t.c
        #include <stdio.h>
        #include <sys/types.h>
        #include <sys/stat.h>
        #include <fcntl.h>
        #include <unistd.h>
    
        #define N 10000
        int fd[N];
        int main() {
          int i;
    
          for (i = 0; i < N; i++) {
            fd[i] = open("./note.txt", 'r');
            if (fd[i] < 0) {
               fprintf(stderr, "failed\n");
               return -1;
            }
          }
          for (i = 0; i < N; i++)
            close(fd[i]);
    
          return 0;
        }
        $ gcc -O2 t.c
        $ cat run.sh
        #/bin/bash
        for i in {1..100}
        do
          while true; do ./a.out; done &
        done
        $ ./run.sh
        $ while true; do bpftool prog >& /dev/null; done
    
    This patch used get_file_rcu() which only grabs a file if the
    file->f_count is not zero. This is to ensure the file pointer
    is always valid. The above reproducer did not fail for more
    than 30 minutes.
    
    Fixes: eaaacd23 ("bpf: Add task and task/file iterator targets")
    Suggested-by: default avatarJosef Bacik <josef@toxicpanda.com>
    Signed-off-by: default avatarYonghong Song <yhs@fb.com>
    Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
    Reviewed-by: default avatarJosef Bacik <josef@toxicpanda.com>
    Link: https://lore.kernel.org/bpf/20200817174214.252601-1-yhs@fb.com
    cf28f3bb
task_iter.c 8.2 KB