1. 06 Nov, 2020 23 commits
  2. 04 Nov, 2020 6 commits
  3. 02 Nov, 2020 1 commit
    • bpf: Fix error path in htab_map_alloc() · 8aaeed81
      Eric Dumazet authored
      syzbot was able to trigger a use-after-free in htab_map_alloc() [1]
      
      htab_map_alloc() lacks a call to lockdep_unregister_key() in its error path.
      
      Since lockdep_register_key() and lockdep_unregister_key() cannot fail,
      it seems better to call them right after htab allocation and right before
      htab freeing, which avoids adding more goto labels to htab_map_alloc().
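
      The pairing described above can be sketched in userspace C. This is only
      an illustration, not the kernel patch: demo_register_key(),
      demo_unregister_key(), struct demo_htab and the fail_buckets switch are
      hypothetical stand-ins, and the "key" is a plain counter rather than a
      struct lock_class_key.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-ins for lockdep_register_key()/lockdep_unregister_key():
 * they only count how many keys are currently registered. */
static int registered_keys;

struct demo_htab {
	int lockdep_key;	/* placeholder for struct lock_class_key */
	void *buckets;
};

static void demo_register_key(int *key)   { (void)key; registered_keys++; }
static void demo_unregister_key(int *key) { (void)key; registered_keys--; }

/* Allocate a table; fail_buckets simulates a later allocation failure. */
static struct demo_htab *demo_htab_alloc(bool fail_buckets)
{
	struct demo_htab *htab = calloc(1, sizeof(*htab));

	if (!htab)
		return NULL;

	/* Register right after the allocation: this step cannot fail. */
	demo_register_key(&htab->lockdep_key);

	htab->buckets = fail_buckets ? NULL : malloc(64);
	if (!htab->buckets)
		goto free_htab;

	return htab;

free_htab:
	/* Unregister right before freeing, so no key outlives its table. */
	demo_unregister_key(&htab->lockdep_key);
	free(htab);
	return NULL;
}

static void demo_htab_free(struct demo_htab *htab)
{
	free(htab->buckets);
	demo_unregister_key(&htab->lockdep_key);
	free(htab);
}
```

      Because the register/unregister calls sit directly next to the
      allocation and the free, every error label that frees the table also
      drops the key, and no extra label is needed for it.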
      
      [1]
      BUG: KASAN: use-after-free in lockdep_register_key+0x356/0x3e0 kernel/locking/lockdep.c:1182
      Read of size 8 at addr ffff88805fa67ad8 by task syz-executor.3/2356
      
      CPU: 1 PID: 2356 Comm: syz-executor.3 Not tainted 5.9.0-syzkaller #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       __dump_stack lib/dump_stack.c:77 [inline]
       dump_stack+0x107/0x163 lib/dump_stack.c:118
       print_address_description.constprop.0.cold+0xae/0x4c8 mm/kasan/report.c:385
       __kasan_report mm/kasan/report.c:545 [inline]
       kasan_report.cold+0x1f/0x37 mm/kasan/report.c:562
       lockdep_register_key+0x356/0x3e0 kernel/locking/lockdep.c:1182
       htab_init_buckets kernel/bpf/hashtab.c:144 [inline]
       htab_map_alloc+0x6c5/0x14a0 kernel/bpf/hashtab.c:521
       find_and_alloc_map kernel/bpf/syscall.c:122 [inline]
       map_create kernel/bpf/syscall.c:825 [inline]
       __do_sys_bpf+0xa80/0x5180 kernel/bpf/syscall.c:4381
       do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
       entry_SYSCALL_64_after_hwframe+0x44/0xa9
      RIP: 0033:0x45deb9
      Code: 0d b4 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 db b3 fb ff c3 66 2e 0f 1f 84 00 00 00 00
      RSP: 002b:00007f0eafee1c78 EFLAGS: 00000246 ORIG_RAX: 0000000000000141
      RAX: ffffffffffffffda RBX: 0000000000001a00 RCX: 000000000045deb9
      RDX: 0000000000000040 RSI: 0000000020000040 RDI: 405a020000000000
      RBP: 000000000118bf60 R08: 0000000000000000 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000246 R12: 000000000118bf2c
      R13: 00007ffd3cf9eabf R14: 00007f0eafee29c0 R15: 000000000118bf2c
      
      Allocated by task 2053:
       kasan_save_stack+0x1b/0x40 mm/kasan/common.c:48
       kasan_set_track mm/kasan/common.c:56 [inline]
       __kasan_kmalloc.constprop.0+0xc2/0xd0 mm/kasan/common.c:461
       kmalloc include/linux/slab.h:554 [inline]
       kzalloc include/linux/slab.h:666 [inline]
       htab_map_alloc+0xdf/0x14a0 kernel/bpf/hashtab.c:454
       find_and_alloc_map kernel/bpf/syscall.c:122 [inline]
       map_create kernel/bpf/syscall.c:825 [inline]
       __do_sys_bpf+0xa80/0x5180 kernel/bpf/syscall.c:4381
       do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
       entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      Freed by task 2053:
       kasan_save_stack+0x1b/0x40 mm/kasan/common.c:48
       kasan_set_track+0x1c/0x30 mm/kasan/common.c:56
       kasan_set_free_info+0x1b/0x30 mm/kasan/generic.c:355
       __kasan_slab_free+0x102/0x140 mm/kasan/common.c:422
       slab_free_hook mm/slub.c:1544 [inline]
       slab_free_freelist_hook+0x5d/0x150 mm/slub.c:1577
       slab_free mm/slub.c:3142 [inline]
       kfree+0xdb/0x360 mm/slub.c:4124
       htab_map_alloc+0x3f9/0x14a0 kernel/bpf/hashtab.c:549
       find_and_alloc_map kernel/bpf/syscall.c:122 [inline]
       map_create kernel/bpf/syscall.c:825 [inline]
       __do_sys_bpf+0xa80/0x5180 kernel/bpf/syscall.c:4381
       do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
       entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      The buggy address belongs to the object at ffff88805fa67800
       which belongs to the cache kmalloc-1k of size 1024
      The buggy address is located 728 bytes inside of
       1024-byte region [ffff88805fa67800, ffff88805fa67c00)
      The buggy address belongs to the page:
      page:000000003c5582c4 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x5fa60
      head:000000003c5582c4 order:3 compound_mapcount:0 compound_pincount:0
      flags: 0xfff00000010200(slab|head)
      raw: 00fff00000010200 ffffea0000bc1200 0000000200000002 ffff888010041140
      raw: 0000000000000000 0000000000100010 00000001ffffffff 0000000000000000
      page dumped because: kasan: bad access detected
      
      Memory state around the buggy address:
       ffff88805fa67980: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
       ffff88805fa67a00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      >ffff88805fa67a80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                                                          ^
       ffff88805fa67b00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
       ffff88805fa67b80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      
      Fixes: c50eb518 ("bpf: Use separate lockdep class for each hashtab")
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Link: https://lore.kernel.org/bpf/20201102114100.3103180-1-eric.dumazet@gmail.com
  4. 30 Oct, 2020 3 commits
    • Merge branch 'bpf: safeguard hashtab locking in NMI context' · cb5dc5b0
      Alexei Starovoitov authored
      Song Liu says:
      
      ====================
      LOCKDEP NMI warning highlighted potential deadlock of hashtab in NMI
      context:
      
      [   74.828971] ================================
      [   74.828972] WARNING: inconsistent lock state
      [   74.828973] 5.9.0-rc8+ #275 Not tainted
      [   74.828974] --------------------------------
      [   74.828975] inconsistent {INITIAL USE} -> {IN-NMI} usage.
      [   74.828976] taskset/1174 [HC2[2]:SC0[0]:HE0:SE1] takes:
      [...]
      [   74.828999]  Possible unsafe locking scenario:
      [   74.828999]
      [   74.829000]        CPU0
      [   74.829001]        ----
      [   74.829001]   lock(&htab->buckets[i].raw_lock);
      [   74.829003]   <Interrupt>
      [   74.829004]     lock(&htab->buckets[i].raw_lock);
      
      Please refer to patch 1/2 for full trace.
      
      This warning is a false alert, as "INITIAL USE" and "IN-NMI" in the tests
      are from different hashtabs. On the other hand, in theory, it is possible
      to deadlock when a hashtab is accessed from both non-NMI and NMI contexts.
      Patch 1/2 fixes this false alert by assigning a separate lockdep class to
      each hashtab. Patch 2/2 introduces map_locked counters, which are similar to
      the bpf_prog_active counter, to avoid hashtab deadlock in NMI context.
      ====================
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • bpf: Avoid hashtab deadlock with map_locked · 20b6cc34
      Song Liu authored
      If a hashtab is accessed in both non-NMI and NMI context, the system may
      deadlock on bucket->lock. Fix this issue with the percpu counter map_locked.
      map_locked rejects concurrent access to the same bucket from the same CPU.
      To reduce memory overhead, map_locked is not added per bucket. Instead,
      8 percpu counters are added to each hashtab. Buckets are assigned to these
      counters based on the lower bits of their hashes.
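
      A minimal userspace sketch of the counter scheme, assuming a fixed number
      of simulated CPUs and plain ints in place of real percpu variables. The
      names demo_lock_bucket(), demo_unlock_bucket(), DEMO_LOCK_COUNT and
      DEMO_NR_CPUS are hypothetical; the real code would also take the bucket
      spinlock where the comment indicates.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define DEMO_LOCK_COUNT	8			/* counters per table */
#define DEMO_LOCK_MASK	(DEMO_LOCK_COUNT - 1)
#define DEMO_NR_CPUS	4			/* simulated CPUs */

/* One row of counters per simulated CPU (stand-in for percpu data). */
static int demo_map_locked[DEMO_NR_CPUS][DEMO_LOCK_COUNT];

/* Returns 0 on success, -EBUSY if this CPU already holds the counter
 * that the hash maps to (e.g. an NMI interrupted an update). */
static int demo_lock_bucket(int cpu, uint32_t hash)
{
	int *cnt = &demo_map_locked[cpu][hash & DEMO_LOCK_MASK];

	if (++(*cnt) != 1) {
		--(*cnt);	/* undo and refuse instead of spinning */
		return -EBUSY;
	}
	/* the bucket spinlock would be taken here */
	return 0;
}

static void demo_unlock_bucket(int cpu, uint32_t hash)
{
	/* the bucket spinlock would be released here */
	--demo_map_locked[cpu][hash & DEMO_LOCK_MASK];
}
```

      Since only the lower three bits of the hash select a counter, two
      different buckets can share one; in this sketch a nested access on the
      same CPU is then refused even though the buckets differ, which is the
      price of not keeping one counter per bucket.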
      Signed-off-by: Song Liu <songliubraving@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20201029071925.3103400-3-songliubraving@fb.com
    • bpf: Use separate lockdep class for each hashtab · c50eb518
      Song Liu authored
      If a hashtab is accessed in both NMI and non-NMI contexts, it may cause
      deadlock in bucket->lock. LOCKDEP NMI warning highlighted this issue:
      
      ./test_progs -t stacktrace
      
      [   74.828970]
      [   74.828971] ================================
      [   74.828972] WARNING: inconsistent lock state
      [   74.828973] 5.9.0-rc8+ #275 Not tainted
      [   74.828974] --------------------------------
      [   74.828975] inconsistent {INITIAL USE} -> {IN-NMI} usage.
      [   74.828976] taskset/1174 [HC2[2]:SC0[0]:HE0:SE1] takes:
      [   74.828977] ffffc90000ee96b0 (&htab->buckets[i].raw_lock){....}-{2:2}, at: htab_map_update_elem+0x271/0x5a0
      [   74.828981] {INITIAL USE} state was registered at:
      [   74.828982]   lock_acquire+0x137/0x510
      [   74.828983]   _raw_spin_lock_irqsave+0x43/0x90
      [   74.828984]   htab_map_update_elem+0x271/0x5a0
      [   74.828984]   0xffffffffa0040b34
      [   74.828985]   trace_call_bpf+0x159/0x310
      [   74.828986]   perf_trace_run_bpf_submit+0x5f/0xd0
      [   74.828987]   perf_trace_urandom_read+0x1be/0x220
      [   74.828988]   urandom_read_nowarn.isra.0+0x26f/0x380
      [   74.828989]   vfs_read+0xf8/0x280
      [   74.828989]   ksys_read+0xc9/0x160
      [   74.828990]   do_syscall_64+0x33/0x40
      [   74.828991]   entry_SYSCALL_64_after_hwframe+0x44/0xa9
      [   74.828992] irq event stamp: 1766
      [   74.828993] hardirqs last  enabled at (1765): [<ffffffff82800ace>] asm_exc_page_fault+0x1e/0x30
      [   74.828994] hardirqs last disabled at (1766): [<ffffffff8267df87>] irqentry_enter+0x37/0x60
      [   74.828995] softirqs last  enabled at (856): [<ffffffff81043e7c>] fpu__clear+0xac/0x120
      [   74.828996] softirqs last disabled at (854): [<ffffffff81043df0>] fpu__clear+0x20/0x120
      [   74.828997]
      [   74.828998] other info that might help us debug this:
      [   74.828999]  Possible unsafe locking scenario:
      [   74.828999]
      [   74.829000]        CPU0
      [   74.829001]        ----
      [   74.829001]   lock(&htab->buckets[i].raw_lock);
      [   74.829003]   <Interrupt>
      [   74.829004]     lock(&htab->buckets[i].raw_lock);
      [   74.829006]
      [   74.829006]  *** DEADLOCK ***
      [   74.829007]
      [   74.829008] 1 lock held by taskset/1174:
      [   74.829008]  #0: ffff8883ec3fd020 (&cpuctx_lock){-...}-{2:2}, at: perf_event_task_tick+0x101/0x650
      [   74.829012]
      [   74.829013] stack backtrace:
      [   74.829014] CPU: 0 PID: 1174 Comm: taskset Not tainted 5.9.0-rc8+ #275
      [   74.829015] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-2.el7 04/01/2014
      [   74.829016] Call Trace:
      [   74.829016]  <NMI>
      [   74.829017]  dump_stack+0x9a/0xd0
      [   74.829018]  lock_acquire+0x461/0x510
      [   74.829019]  ? lock_release+0x6b0/0x6b0
      [   74.829020]  ? stack_map_get_build_id_offset+0x45e/0x800
      [   74.829021]  ? htab_map_update_elem+0x271/0x5a0
      [   74.829022]  ? rcu_read_lock_held_common+0x1a/0x50
      [   74.829022]  ? rcu_read_lock_held+0x5f/0xb0
      [   74.829023]  _raw_spin_lock_irqsave+0x43/0x90
      [   74.829024]  ? htab_map_update_elem+0x271/0x5a0
      [   74.829025]  htab_map_update_elem+0x271/0x5a0
      [   74.829026]  bpf_prog_1fd9e30e1438d3c5_oncpu+0x9c/0xe88
      [   74.829027]  bpf_overflow_handler+0x127/0x320
      [   74.829028]  ? perf_event_text_poke_output+0x4d0/0x4d0
      [   74.829029]  ? sched_clock_cpu+0x18/0x130
      [   74.829030]  __perf_event_overflow+0xae/0x190
      [   74.829030]  handle_pmi_common+0x34c/0x470
      [   74.829031]  ? intel_pmu_save_and_restart+0x90/0x90
      [   74.829032]  ? lock_acquire+0x3f8/0x510
      [   74.829033]  ? lock_release+0x6b0/0x6b0
      [   74.829034]  intel_pmu_handle_irq+0x11e/0x240
      [   74.829034]  perf_event_nmi_handler+0x40/0x60
      [   74.829035]  nmi_handle+0x110/0x360
      [   74.829036]  ? __intel_pmu_enable_all.constprop.0+0x72/0xf0
      [   74.829037]  default_do_nmi+0x6b/0x170
      [   74.829038]  exc_nmi+0x106/0x130
      [   74.829038]  end_repeat_nmi+0x16/0x55
      [   74.829039] RIP: 0010:__intel_pmu_enable_all.constprop.0+0x72/0xf0
      [   74.829042] Code: 2f 1f 03 48 8d bb b8 0c 00 00 e8 29 09 41 00 48 ...
      [   74.829043] RSP: 0000:ffff8880a604fc90 EFLAGS: 00000002
      [   74.829044] RAX: 000000070000000f RBX: ffff8883ec2195a0 RCX: 000000000000038f
      [   74.829045] RDX: 0000000000000007 RSI: ffffffff82e72c20 RDI: ffff8883ec21a258
      [   74.829046] RBP: 000000070000000f R08: ffffffff8101b013 R09: fffffbfff0a7982d
      [   74.829047] R10: ffffffff853cc167 R11: fffffbfff0a7982c R12: 0000000000000000
      [   74.829049] R13: ffff8883ec3f0af0 R14: ffff8883ec3fd120 R15: ffff8883e9c92098
      [   74.829049]  ? intel_pmu_lbr_enable_all+0x43/0x240
      [   74.829050]  ? __intel_pmu_enable_all.constprop.0+0x72/0xf0
      [   74.829051]  ? __intel_pmu_enable_all.constprop.0+0x72/0xf0
      [   74.829052]  </NMI>
      [   74.829053]  perf_event_task_tick+0x48d/0x650
      [   74.829054]  scheduler_tick+0x129/0x210
      [   74.829054]  update_process_times+0x37/0x70
      [   74.829055]  tick_sched_handle.isra.0+0x35/0x90
      [   74.829056]  tick_sched_timer+0x8f/0xb0
      [   74.829057]  __hrtimer_run_queues+0x364/0x7d0
      [   74.829058]  ? tick_sched_do_timer+0xa0/0xa0
      [   74.829058]  ? enqueue_hrtimer+0x1e0/0x1e0
      [   74.829059]  ? recalibrate_cpu_khz+0x10/0x10
      [   74.829060]  ? ktime_get_update_offsets_now+0x1a3/0x360
      [   74.829061]  hrtimer_interrupt+0x1bb/0x360
      [   74.829062]  ? rcu_read_lock_sched_held+0xa1/0xd0
      [   74.829063]  __sysvec_apic_timer_interrupt+0xed/0x3d0
      [   74.829064]  sysvec_apic_timer_interrupt+0x3f/0xd0
      [   74.829064]  ? asm_sysvec_apic_timer_interrupt+0xa/0x20
      [   74.829065]  asm_sysvec_apic_timer_interrupt+0x12/0x20
      [   74.829066] RIP: 0033:0x7fba18d579b4
      [   74.829068] Code: 74 54 44 0f b6 4a 04 41 83 e1 0f 41 80 f9 ...
      [   74.829069] RSP: 002b:00007ffc9ba69570 EFLAGS: 00000206
      [   74.829071] RAX: 00007fba192084c0 RBX: 00007fba18c24d28 RCX: 00000000000007a4
      [   74.829072] RDX: 00007fba18c30488 RSI: 0000000000000000 RDI: 000000000000037b
      [   74.829073] RBP: 00007fba18ca5760 R08: 00007fba18c248fc R09: 00007fba18c94c30
      [   74.829074] R10: 000000000000002f R11: 0000000000073c30 R12: 00007ffc9ba695e0
      [   74.829075] R13: 00000000000003f3 R14: 00007fba18c21ac8 R15: 00000000000058d6
      
      However, such a warning should not apply across multiple hashtabs. The
      system will not deadlock if one hashtab is used in NMI context while
      another hashtab is used in non-NMI context.

      Use a separate lockdep class for each hashtab, so that we don't get this
      false alert.
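
      Why a shared class produces the false alert can be shown with a toy
      model. This is not how lockdep is implemented: struct demo_lock_class,
      struct demo_table and demo_acquire() are hypothetical, and the model
      only records in which contexts a class has been acquired.

```c
#include <assert.h>
#include <stdbool.h>

enum { DEMO_CTX_NORMAL = 1, DEMO_CTX_NMI = 2 };

/* Toy lock class: remembers every context it was acquired in. */
struct demo_lock_class {
	unsigned int seen_ctx;
};

struct demo_table {
	struct demo_lock_class *cls;	/* class used by this table's locks */
};

/* Record an acquisition; returns true when the class has now been seen
 * in both contexts, i.e. when a checker of this kind would warn. */
static bool demo_acquire(struct demo_table *t, unsigned int ctx)
{
	t->cls->seen_ctx |= ctx;
	return t->cls->seen_ctx == (DEMO_CTX_NORMAL | DEMO_CTX_NMI);
}
```

      With one class shared by two tables, using the first table in normal
      context and the second in NMI context trips the check. With one class
      per table it does not, while a single table used in both contexts is
      still reported.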
      Signed-off-by: Song Liu <songliubraving@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20201029071925.3103400-2-songliubraving@fb.com
  5. 28 Oct, 2020 1 commit
    • bpf: Permit cond_resched for some iterators · cf83b2d2
      Yonghong Song authored
      Commit e679654a ("bpf: Fix a rcu_sched stall issue with
      bpf task/task_file iterator") tried to fix an RCU stall warning
      caused by the bpf task_file iterator when running
      "bpftool prog".
      
            rcu: INFO: rcu_sched self-detected stall on CPU
            rcu: 	7-....: (20999 ticks this GP) idle=302/1/0x4000000000000000 softirq=1508852/1508852 fqs=4913
            	(t=21031 jiffies g=2534773 q=179750)
            NMI backtrace for cpu 7
            CPU: 7 PID: 184195 Comm: bpftool Kdump: loaded Tainted: G        W         5.8.0-00004-g68bfc7f8c1b4 #6
            Hardware name: Quanta Twin Lakes MP/Twin Lakes Passive MP, BIOS F09_3A17 05/03/2019
            Call Trace:
            <IRQ>
            dump_stack+0x57/0x70
            nmi_cpu_backtrace.cold+0x14/0x53
            ? lapic_can_unplug_cpu.cold+0x39/0x39
            nmi_trigger_cpumask_backtrace+0xb7/0xc7
            rcu_dump_cpu_stacks+0xa2/0xd0
            rcu_sched_clock_irq.cold+0x1ff/0x3d9
            ? tick_nohz_handler+0x100/0x100
            update_process_times+0x5b/0x90
            tick_sched_timer+0x5e/0xf0
            __hrtimer_run_queues+0x12a/0x2a0
            hrtimer_interrupt+0x10e/0x280
            __sysvec_apic_timer_interrupt+0x51/0xe0
            asm_call_on_stack+0xf/0x20
            </IRQ>
            sysvec_apic_timer_interrupt+0x6f/0x80
            ...
            task_file_seq_next+0x52/0xa0
            bpf_seq_read+0xb9/0x320
            vfs_read+0x9d/0x180
            ksys_read+0x5f/0xe0
            do_syscall_64+0x38/0x60
            entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      The fix was to limit the number of bpf program runs to
      one million. This fixed the problem in most cases. But
      we also found that under heavy load, which can increase the wallclock
      time for bpf_seq_read(), the warning may still be possible.

      For example, calling bpf_delay() in the "while" loop of
      bpf_seq_read(), which introduces an artificial delay,
      makes the warning show up in my qemu run.
      
        static unsigned q;
        volatile unsigned *p = &q;
        volatile unsigned long long ll;
        static void bpf_delay(void)
        {
               int i, j;
      
               for (i = 0; i < 10000; i++)
                       for (j = 0; j < 10000; j++)
                               ll += *p;
        }
      
      There are two ways to fix this issue. One is to reduce the above
      one million threshold to, say, 100,000 and hope that the rcu warning
      does not show up any more. The other is to introduce a target feature
      which enables bpf_seq_read() to call cond_resched().

      This patch takes the second approach, as the first approach may cause
      more -EAGAIN failures for read() syscalls. Note that not all bpf_iter
      targets can permit cond_resched() in bpf_seq_read(): for some, e.g. the
      netlink seq iterator, the rcu read lock critical section spans
      seq_ops->next() -> seq_ops->show() -> seq_ops->next().
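
      The opt-in can be pictured with a small userspace sketch. All names here
      (DEMO_ITER_RESCHED, struct demo_iter_target, demo_seq_read(),
      demo_cond_resched()) are hypothetical, and the reschedule is replaced by
      a counter so the behaviour is observable; it is not the code from this
      patch.

```c
#include <assert.h>
#include <stdbool.h>

#define DEMO_ITER_RESCHED	(1u << 0)	/* hypothetical feature bit */

struct demo_iter_target {
	const char *name;
	unsigned int feature;	/* features the target opts in to */
};

/* Stand-in for cond_resched(): only counts how often it was reached. */
static int demo_resched_calls;
static void demo_cond_resched(void) { demo_resched_calls++; }

/* Walk nr_objs objects; give up the CPU between objects only when the
 * target declared that this is safe. Returns the number of objects shown. */
static int demo_seq_read(const struct demo_iter_target *tgt, int nr_objs)
{
	bool can_resched = tgt->feature & DEMO_ITER_RESCHED;
	int shown = 0;

	for (int i = 0; i < nr_objs; i++) {
		shown++;		/* stands in for seq_ops->show() */
		if (can_resched)
			demo_cond_resched();
	}
	return shown;
}
```

      A target that holds the rcu read lock across next()/show() simply leaves
      the bit clear and keeps the old behaviour, so the check stays in one
      place in the read loop rather than in each target.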
      
      For the kernel code with the above hack, "bpftool p" roughly takes
      38 seconds to finish on my VM with 184 bpf program runs.
      Using the following command, I am able to collect the number of
      context switches:
         perf stat -e context-switches -- ./bpftool p >& log
      Without this patch,
         69      context-switches
      With this patch,
         75      context-switches
      This patch adds 6 additional context switches, i.e. a reschedule roughly
      every 6 seconds, which avoids the long stretches without rescheduling
      that may cause the above RCU warnings.
      Signed-off-by: Yonghong Song <yhs@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: Andrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/bpf/20201028061054.1411116-1-yhs@fb.com
  6. 23 Oct, 2020 6 commits
    • Merge tag 'net-5.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · 3cb12d27
      Linus Torvalds authored
      Pull networking fixes from Jakub Kicinski:
       "Cross-tree/merge window issues:
      
         - rtl8150: don't incorrectly assign random MAC addresses; a fix late
           in the 5.9 cycle started depending on the return code of a function
           which changed with the 5.10 PR from the usb subsystem
      
        Current release regressions:
      
         - Revert "virtio-net: ethtool configurable RXCSUM", it was causing
           crashes at probe when control vq was not negotiated/available
      
        Previous release regressions:
      
         - ixgbe: fix probing of multi-port 10 Gigabit Intel NICs with an MDIO
           bus, only first device would be probed correctly
      
         - nexthop: Fix performance regression in nexthop deletion by
           effectively switching from recently added synchronize_rcu() to
           synchronize_rcu_expedited()
      
         - netsec: ignore 'phy-mode' device property on ACPI systems; the
           property is not populated correctly by the firmware, but firmware
           configures the PHY so just keep boot settings
      
        Previous releases - always broken:
      
         - tcp: fix to update snd_wl1 in bulk receiver fast path, addressing
           bulk transfers getting "stuck"
      
         - icmp: randomize the global rate limiter to prevent attackers from
           getting useful signal
      
         - r8169: fix operation under forced interrupt threading, make the
           driver always use hard irqs, even on RT, given the handler is light
           and only wants to schedule napi (and do so through a _irqoff()
           variant, preferably)
      
         - bpf: Enforce pointer id generation for all may-be-null register
           type to avoid pointers erroneously getting marked as null-checked
      
         - tipc: re-configure queue limit for broadcast link
      
         - net/sched: act_tunnel_key: fix OOB write in case of IPv6 ERSPAN
           tunnels
      
         - fix various issues in chelsio inline tls driver
      
        Misc:
      
         - bpf: improve just-added bpf_redirect_neigh() helper api to support
           supplying nexthop by the caller - in case BPF program has already
           done a lookup we can avoid doing another one
      
         - remove unnecessary break statements
      
         - make MPTCP not select IPV6, but rather depend on it"
      
      * tag 'net-5.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (62 commits)
        tcp: fix to update snd_wl1 in bulk receiver fast path
        net: Properly typecast int values to set sk_max_pacing_rate
        netfilter: nf_fwd_netdev: clear timestamp in forwarding path
        ibmvnic: save changed mac address to adapter->mac_addr
        selftests: mptcp: depends on built-in IPv6
        Revert "virtio-net: ethtool configurable RXCSUM"
        rtnetlink: fix data overflow in rtnl_calcit()
        net: ethernet: mtk-star-emac: select REGMAP_MMIO
        net: hdlc_raw_eth: Clear the IFF_TX_SKB_SHARING flag after calling ether_setup
        net: hdlc: In hdlc_rcv, check to make sure dev is an HDLC device
        bpf, libbpf: Guard bpf inline asm from bpf_tail_call_static
        bpf, selftests: Extend test_tc_redirect to use modified bpf_redirect_neigh()
        bpf: Fix bpf_redirect_neigh helper api to support supplying nexthop
        mptcp: depends on IPV6 but not as a module
        sfc: move initialisation of efx->filter_sem to efx_init_struct()
        mpls: load mpls_gso after mpls_iptunnel
        net/sched: act_tunnel_key: fix OOB write in case of IPv6 ERSPAN tunnels
        net/sched: act_gate: Unlock ->tcfa_lock in tc_setup_flow_action()
        net: dsa: bcm_sf2: make const array static, makes object smaller
        mptcp: MPTCP_IPV6 should depend on IPV6 instead of selecting it
        ...
    • Merge tag 'gfs2-for-5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2 · 0adc313c
      Linus Torvalds authored
      Pull gfs2 updates from Andreas Gruenbacher:
      
       - Use iomap for non-journaled buffered I/O. This largely eliminates
         buffer heads on filesystems where the block size matches the page
         size. Many thanks to Christoph Hellwig for this patch!
      
       - Fixes for some more journaled data filesystem bugs, found by running
         xfstests with data journaling on for all files (chattr +j $MNT) (Bob
         Peterson)
      
       - gfs2_evict_inode refactoring (Bob Peterson)
      
       - Use the statfs data in the journal during recovery instead of reading
         it in from the local statfs inodes (Abhi Das)
      
       - Several other minor fixes by various people
      
      * tag 'gfs2-for-5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2: (30 commits)
        gfs2: Recover statfs info in journal head
        gfs2: lookup local statfs inodes prior to journal recovery
        gfs2: Add fields for statfs info in struct gfs2_log_header_host
        gfs2: Ignore subsequent errors after withdraw in rgrp_go_sync
        gfs2: Eliminate gl_vm
        gfs2: Only access gl_delete for iopen glocks
        gfs2: Fix comments to glock_hash_walk
        gfs2: eliminate GLF_QUEUED flag in favor of list_empty(gl_holders)
        gfs2: Ignore journal log writes for jdata holes
        gfs2: simplify gfs2_block_map
        gfs2: Only set PageChecked if we have a transaction
        gfs2: don't lock sd_ail_lock in gfs2_releasepage
        gfs2: make gfs2_ail1_empty_one return the count of active items
        gfs2: Wipe jdata and ail1 in gfs2_journal_wipe, formerly gfs2_meta_wipe
        gfs2: enhance log_blocks trace point to show log blocks free
        gfs2: add missing log_blocks trace points in gfs2_write_revokes
        gfs2: rename gfs2_write_full_page to gfs2_write_jdata_page, remove parm
        gfs2: add validation checks for size of superblock
        gfs2: use-after-free in sysfs deregistration
        gfs2: Fix NULL pointer dereference in gfs2_rgrp_dump
        ...
    • Merge tag '5.10-rc-smb3-fixes-part1' of git://git.samba.org/sfrench/cifs-2.6 · 0613ed91
      Linus Torvalds authored
      Pull cifs updates from Steve French:
      
       - add support for recognizing special file types (char/block/fifo/
         symlink) for files created by Linux on WSL (a format we plan to move
         to as the default for creating special files on Linux, as it has
         advantages over the other current option, the SFU format) in readdir.
      
       - fix double queries to root directory when directory leases not
         supported (e.g. Samba)
      
       - fix querying mode bits (modefromsid mount option) for special file
         types
      
       - stronger encryption (gcm256), disabled by default until tested more
         broadly
      
       - allow querying owner when server reports 'well known SID' on query
         dir with SMB3.1.1 POSIX extensions
      
      * tag '5.10-rc-smb3-fixes-part1' of git://git.samba.org/sfrench/cifs-2.6: (30 commits)
        SMB3: add support for recognizing WSL reparse tags
        cifs: remove bogus debug code
        smb3.1.1: fix typo in compression flag
        cifs: move smb version mount options into fs_context.c
        cifs: move cache mount options to fs_context.ch
        cifs: move security mount options into fs_context.ch
        cifs: add files to host new mount api
        smb3: do not try to cache root directory if dir leases not supported
        smb3: fix stat when special device file and mounted with modefromsid
        cifs: Print the address and port we are connecting to in generic_ip_connect()
        SMB3: Resolve data corruption of TCP server info fields
        cifs: make const array static, makes object smaller
        SMB3.1.1: Fix ids returned in POSIX query dir
        smb3: add dynamic trace point to trace when credits obtained
        smb3.1.1: do not fail if no encryption required but server doesn't support it
        cifs: Return the error from crypt_message when enc/dec key not found.
        smb3.1.1: set gcm256 when requested
        smb3.1.1: rename nonces used for GCM and CCM encryption
        smb3.1.1: print warning if server does not support requested encryption type
        smb3.1.1: add new module load parm enable_gcm_256
        ...
    • Merge tag 'vfs-5.10-merge-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux · c4728cfb
      Linus Torvalds authored
      Pull clone/dedupe/remap code refactoring from Darrick Wong:
       "Move the generic file range remap (aka reflink and dedupe) functions
        out of mm/filemap.c and fs/read_write.c and into fs/remap_range.c to
        reduce clutter in the first two files"
      
      * tag 'vfs-5.10-merge-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
        vfs: move the generic write and copy checks out of mm
        vfs: move the remap range helpers to remap_range.c
        vfs: move generic_remap_checks out of mm
    • Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm · f9a705ad
      Linus Torvalds authored
      Pull KVM updates from Paolo Bonzini:
       "For x86, there is a new alternative and (in the future) more scalable
        implementation of extended page tables that does not need a reverse
        map from guest physical addresses to host physical addresses.
      
        For now it is disabled by default because it is still lacking a few of
        the existing MMU's bells and whistles. However it is a very solid
        piece of work and it is already available for people to hammer on it.
      
        Other updates:
      
        ARM:
         - New page table code for both hypervisor and guest stage-2
         - Introduction of a new EL2-private host context
         - Allow EL2 to have its own private per-CPU variables
         - Support of PMU event filtering
         - Complete rework of the Spectre mitigation
      
        PPC:
         - Fix for running nested guests with in-kernel IRQ chip
         - Fix race condition causing occasional host hard lockup
         - Minor cleanups and bugfixes
      
        x86:
         - allow trapping unknown MSRs to userspace
         - allow userspace to force #GP on specific MSRs
         - INVPCID support on AMD
         - nested AMD cleanup, on demand allocation of nested SVM state
         - hide PV MSRs and hypercalls for features not enabled in CPUID
         - new test for MSR_IA32_TSC writes from host and guest
         - cleanups: MMU, CPUID, shared MSRs
         - LAPIC latency optimizations and bugfixes"
      
      * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (232 commits)
        kvm: x86/mmu: NX largepage recovery for TDP MMU
        kvm: x86/mmu: Don't clear write flooding count for direct roots
        kvm: x86/mmu: Support MMIO in the TDP MMU
        kvm: x86/mmu: Support write protection for nesting in tdp MMU
        kvm: x86/mmu: Support disabling dirty logging for the tdp MMU
        kvm: x86/mmu: Support dirty logging for the TDP MMU
        kvm: x86/mmu: Support changed pte notifier in tdp MMU
        kvm: x86/mmu: Add access tracking for tdp_mmu
        kvm: x86/mmu: Support invalidate range MMU notifier for TDP MMU
        kvm: x86/mmu: Allocate struct kvm_mmu_pages for all pages in TDP MMU
        kvm: x86/mmu: Add TDP MMU PF handler
        kvm: x86/mmu: Remove disallowed_hugepage_adjust shadow_walk_iterator arg
        kvm: x86/mmu: Support zapping SPTEs in the TDP MMU
        KVM: Cache as_id in kvm_memory_slot
        kvm: x86/mmu: Add functions to handle changed TDP SPTEs
        kvm: x86/mmu: Allocate and free TDP MMU roots
        kvm: x86/mmu: Init / Uninit the TDP MMU
        kvm: x86/mmu: Introduce tdp_iter
        KVM: mmu: extract spte.h and spte.c
        KVM: mmu: Separate updating a PTE from kvm_set_pte_rmapp
        ...
    • Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost · 9313f802
      Linus Torvalds authored
      Pull virtio updates from Michael Tsirkin:
       "vhost, vdpa, and virtio cleanups and fixes
      
        A very quiet cycle, no new features"
      
      * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
        MAINTAINERS: add URL for virtio-mem
        vhost_vdpa: remove unnecessary spin_lock in vhost_vring_call
        vringh: fix __vringh_iov() when riov and wiov are different
        vdpa/mlx5: Setup driver only if VIRTIO_CONFIG_S_DRIVER_OK
        s390: virtio: PV needs VIRTIO I/O device protection
        virtio: let arch advertise guest's memory access restrictions
        vhost_vdpa: Fix duplicate included kernel.h
        vhost: reduce stack usage in log_used
        virtio-mem: Constify mem_id_table
        virtio_input: Constify id_table
        virtio-balloon: Constify id_table
        vdpa/mlx5: Fix failure to bring link up
        vdpa/mlx5: Make use of a specific 16 bit endianness API