1. 23 Oct, 2019 1 commit
    • Daniel Borkmann's avatar
      bpf: Fix use after free in bpf_get_prog_name · 3b4d9eb2
      Daniel Borkmann authored
      There is one more problematic case I noticed while recently fixing BPF kallsyms
      handling in cd7455f1 ("bpf: Fix use after free in subprog's jited symbol
      removal") and that is bpf_get_prog_name().
      
      If BTF has been attached to the prog, then we may be able to fetch the function
      signature type id in kallsyms through prog->aux->func_info[prog->aux->func_idx].type_id.
      However, while the BTF object itself is torn down via RCU callback, the prog's
      aux->func_info is immediately freed via kvfree(prog->aux->func_info) once the
      prog's refcount either hit zero or when subprograms were already exposed via
      kallsyms and we hit the error path added in 5482e9a9 ("bpf: Fix memleak in
      aux->func_info and aux->btf").
      
      This violates RCU as well since kallsyms could be walked in parallel where we
      could access aux->func_info. Hence, defer kvfree() to after RCU grace period.
      Looking at ba64e7d8 ("bpf: btf: support proper non-jit func info") there
      is no reason/dependency where we couldn't defer the kvfree(aux->func_info) into
      the RCU callback.
      
      Fixes: 5482e9a9 ("bpf: Fix memleak in aux->func_info and aux->btf")
      Fixes: ba64e7d8 ("bpf: btf: support proper non-jit func info")
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Acked-by: default avatarYonghong Song <yhs@fb.com>
      Cc: Martin KaFai Lau <kafai@fb.com>
      Link: https://lore.kernel.org/bpf/875f2906a7c1a0691f2d567b4d8e4ea2739b1e88.1571779205.git.daniel@iogearbox.net
      3b4d9eb2
  2. 22 Oct, 2019 1 commit
    • Daniel Borkmann's avatar
      bpf: Fix use after free in subprog's jited symbol removal · cd7455f1
      Daniel Borkmann authored
      syzkaller managed to trigger the following crash:
      
        [...]
        BUG: unable to handle page fault for address: ffffc90001923030
        #PF: supervisor read access in kernel mode
        #PF: error_code(0x0000) - not-present page
        PGD aa551067 P4D aa551067 PUD aa552067 PMD a572b067 PTE 80000000a1173163
        Oops: 0000 [#1] PREEMPT SMP KASAN
        CPU: 0 PID: 7982 Comm: syz-executor912 Not tainted 5.4.0-rc3+ #0
        Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
        RIP: 0010:bpf_jit_binary_hdr include/linux/filter.h:787 [inline]
        RIP: 0010:bpf_get_prog_addr_region kernel/bpf/core.c:531 [inline]
        RIP: 0010:bpf_tree_comp kernel/bpf/core.c:600 [inline]
        RIP: 0010:__lt_find include/linux/rbtree_latch.h:115 [inline]
        RIP: 0010:latch_tree_find include/linux/rbtree_latch.h:208 [inline]
        RIP: 0010:bpf_prog_kallsyms_find kernel/bpf/core.c:674 [inline]
        RIP: 0010:is_bpf_text_address+0x184/0x3b0 kernel/bpf/core.c:709
        [...]
        Call Trace:
         kernel_text_address kernel/extable.c:147 [inline]
         __kernel_text_address+0x9a/0x110 kernel/extable.c:102
         unwind_get_return_address+0x4c/0x90 arch/x86/kernel/unwind_frame.c:19
         arch_stack_walk+0x98/0xe0 arch/x86/kernel/stacktrace.c:26
         stack_trace_save+0xb6/0x150 kernel/stacktrace.c:123
         save_stack mm/kasan/common.c:69 [inline]
         set_track mm/kasan/common.c:77 [inline]
         __kasan_kmalloc+0x11c/0x1b0 mm/kasan/common.c:510
         kasan_slab_alloc+0xf/0x20 mm/kasan/common.c:518
         slab_post_alloc_hook mm/slab.h:584 [inline]
         slab_alloc mm/slab.c:3319 [inline]
         kmem_cache_alloc+0x1f5/0x2e0 mm/slab.c:3483
         getname_flags+0xba/0x640 fs/namei.c:138
         getname+0x19/0x20 fs/namei.c:209
         do_sys_open+0x261/0x560 fs/open.c:1091
         __do_sys_open fs/open.c:1115 [inline]
         __se_sys_open fs/open.c:1110 [inline]
         __x64_sys_open+0x87/0x90 fs/open.c:1110
         do_syscall_64+0xf7/0x1c0 arch/x86/entry/common.c:290
         entry_SYSCALL_64_after_hwframe+0x49/0xbe
        [...]
      
      After further debugging it turns out that we walk kallsyms while in parallel
      we tear down a BPF program which contains subprograms that have been JITed
      though the program itself has not been fully exposed and is eventually bailing
      out with error.
      
      The bpf_prog_kallsyms_del_subprogs() in bpf_prog_load()'s error path removes
      the symbols, however, bpf_prog_free() tears down the JIT memory too early via
      scheduled work. Instead, it needs to properly respect RCU grace period as the
      kallsyms walk for BPF is under RCU.
      
      Fix it by refactoring __bpf_prog_put()'s tear down and reuse it in our error
      path where we defer final destruction when we have subprogs in the program.
      
      Fixes: 7d1982b4 ("bpf: fix panic in prog load calls cleanup")
      Fixes: 1c2a088a ("bpf: x64: add JIT support for multi-function programs")
      Reported-by: syzbot+710043c5d1d5b5013bc7@syzkaller.appspotmail.com
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Tested-by: syzbot+710043c5d1d5b5013bc7@syzkaller.appspotmail.com
      Link: https://lore.kernel.org/bpf/55f6367324c2d7e9583fa9ccf5385dcbba0d7a6e.1571752452.git.daniel@iogearbox.net
      cd7455f1
  3. 21 Oct, 2019 1 commit
  4. 18 Oct, 2019 2 commits
  5. 14 Oct, 2019 1 commit
  6. 13 Oct, 2019 13 commits
    • YueHaibing's avatar
      netdevsim: Fix error handling in nsim_fib_init and nsim_fib_exit · 33902b4a
      YueHaibing authored
      In nsim_fib_init(), if register_fib_notifier failed, nsim_fib_net_ops
      should be unregistered before return.
      
      In nsim_fib_exit(), unregister_fib_notifier should be called before
      nsim_fib_net_ops be unregistered, otherwise may cause use-after-free:
      
      BUG: KASAN: use-after-free in nsim_fib_event_nb+0x342/0x570 [netdevsim]
      Read of size 8 at addr ffff8881daaf4388 by task kworker/0:3/3499
      
      CPU: 0 PID: 3499 Comm: kworker/0:3 Not tainted 5.3.0-rc7+ #30
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014
      Workqueue: ipv6_addrconf addrconf_dad_work [ipv6]
      Call Trace:
       __dump_stack lib/dump_stack.c:77 [inline]
       dump_stack+0xa9/0x10e lib/dump_stack.c:113
       print_address_description+0x65/0x380 mm/kasan/report.c:351
       __kasan_report+0x149/0x18d mm/kasan/report.c:482
       kasan_report+0xe/0x20 mm/kasan/common.c:618
       nsim_fib_event_nb+0x342/0x570 [netdevsim]
       notifier_call_chain+0x52/0xf0 kernel/notifier.c:95
       __atomic_notifier_call_chain+0x78/0x140 kernel/notifier.c:185
       call_fib_notifiers+0x30/0x60 net/core/fib_notifier.c:30
       call_fib6_entry_notifiers+0xc1/0x100 [ipv6]
       fib6_add+0x92e/0x1b10 [ipv6]
       __ip6_ins_rt+0x40/0x60 [ipv6]
       ip6_ins_rt+0x84/0xb0 [ipv6]
       __ipv6_ifa_notify+0x4b6/0x550 [ipv6]
       ipv6_ifa_notify+0xa5/0x180 [ipv6]
       addrconf_dad_completed+0xca/0x640 [ipv6]
       addrconf_dad_work+0x296/0x960 [ipv6]
       process_one_work+0x5c0/0xc00 kernel/workqueue.c:2269
       worker_thread+0x5c/0x670 kernel/workqueue.c:2415
       kthread+0x1d7/0x200 kernel/kthread.c:255
       ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:352
      
      Allocated by task 3388:
       save_stack+0x19/0x80 mm/kasan/common.c:69
       set_track mm/kasan/common.c:77 [inline]
       __kasan_kmalloc.constprop.3+0xa0/0xd0 mm/kasan/common.c:493
       kmalloc include/linux/slab.h:557 [inline]
       kzalloc include/linux/slab.h:748 [inline]
       ops_init+0xa9/0x220 net/core/net_namespace.c:127
       __register_pernet_operations net/core/net_namespace.c:1135 [inline]
       register_pernet_operations+0x1d4/0x420 net/core/net_namespace.c:1212
       register_pernet_subsys+0x24/0x40 net/core/net_namespace.c:1253
       nsim_fib_init+0x12/0x70 [netdevsim]
       veth_get_link_ksettings+0x2b/0x50 [veth]
       do_one_initcall+0xd4/0x454 init/main.c:939
       do_init_module+0xe0/0x330 kernel/module.c:3490
       load_module+0x3c2f/0x4620 kernel/module.c:3841
       __do_sys_finit_module+0x163/0x190 kernel/module.c:3931
       do_syscall_64+0x72/0x2e0 arch/x86/entry/common.c:296
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      Freed by task 3534:
       save_stack+0x19/0x80 mm/kasan/common.c:69
       set_track mm/kasan/common.c:77 [inline]
       __kasan_slab_free+0x130/0x180 mm/kasan/common.c:455
       slab_free_hook mm/slub.c:1423 [inline]
       slab_free_freelist_hook mm/slub.c:1474 [inline]
       slab_free mm/slub.c:3016 [inline]
       kfree+0xe9/0x2d0 mm/slub.c:3957
       ops_free net/core/net_namespace.c:151 [inline]
       ops_free_list.part.7+0x156/0x220 net/core/net_namespace.c:184
       ops_free_list net/core/net_namespace.c:182 [inline]
       __unregister_pernet_operations net/core/net_namespace.c:1165 [inline]
       unregister_pernet_operations+0x221/0x2a0 net/core/net_namespace.c:1224
       unregister_pernet_subsys+0x1d/0x30 net/core/net_namespace.c:1271
       nsim_fib_exit+0x11/0x20 [netdevsim]
       nsim_module_exit+0x16/0x21 [netdevsim]
       __do_sys_delete_module kernel/module.c:1015 [inline]
       __se_sys_delete_module kernel/module.c:958 [inline]
       __x64_sys_delete_module+0x244/0x330 kernel/module.c:958
       do_syscall_64+0x72/0x2e0 arch/x86/entry/common.c:296
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      Reported-by: default avatarHulk Robot <hulkci@huawei.com>
      Fixes: 59c84b9f ("netdevsim: Restore per-network namespace accounting for fib entries")
      Signed-off-by: default avatarYueHaibing <yuehaibing@huawei.com>
      Acked-by: default avatarJakub Kicinski <jakub.kicinski@netronome.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      33902b4a
    • Cédric Le Goater's avatar
      net/ibmvnic: Fix EOI when running in XIVE mode. · 11d49ce9
      Cédric Le Goater authored
      pSeries machines on POWER9 processors can run with the XICS (legacy)
      interrupt mode or with the XIVE exploitation interrupt mode. These
      interrupt contollers have different interfaces for interrupt
      management : XICS uses hcalls and XIVE loads and stores on a page.
      H_EOI being a XICS interface the enable_scrq_irq() routine can fail
      when the machine runs in XIVE mode.
      
      Fix that by calling the EOI handler of the interrupt chip.
      
      Fixes: f23e0643 ("ibmvnic: Clear pending interrupt after device reset")
      Signed-off-by: default avatarCédric Le Goater <clg@kaod.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      11d49ce9
    • Alexandre Belloni's avatar
      net: lpc_eth: avoid resetting twice · c23936fa
      Alexandre Belloni authored
      __lpc_eth_shutdown is called after __lpc_eth_reset but it is already
      calling __lpc_eth_reset. Avoid resetting the IP twice.
      Signed-off-by: default avatarAlexandre Belloni <alexandre.belloni@bootlin.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      c23936fa
    • David S. Miller's avatar
      Merge branch 'tcp-address-KCSAN-reports-in-tcp_poll-part-I' · 3f233809
      David S. Miller authored
      Eric Dumazet says:
      
      ====================
      tcp: address KCSAN reports in tcp_poll() (part I)
      
      This all started with a KCSAN report (included
      in "tcp: annotate tp->rcv_nxt lockless reads" changelog)
      
      tcp_poll() runs in a lockless way. This means that about
      all accesses of tcp socket fields done in tcp_poll() context
      need annotations otherwise KCSAN will complain about data-races.
      
      While doing this detective work, I found a more serious bug,
      addressed by the first patch ("tcp: add rcu protection around
      tp->fastopen_rsk").
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      3f233809
    • Eric Dumazet's avatar
      tcp: annotate sk->sk_wmem_queued lockless reads · ab4e846a
      Eric Dumazet authored
      For the sake of tcp_poll(), there are few places where we fetch
      sk->sk_wmem_queued while this field can change from IRQ or other cpu.
      
      We need to add READ_ONCE() annotations, and also make sure write
      sides use corresponding WRITE_ONCE() to avoid store-tearing.
      
      sk_wmem_queued_add() helper is added so that we can in
      the future convert to ADD_ONCE() or equivalent if/when
      available.
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      ab4e846a
    • Eric Dumazet's avatar
      tcp: annotate sk->sk_sndbuf lockless reads · e292f05e
      Eric Dumazet authored
      For the sake of tcp_poll(), there are few places where we fetch
      sk->sk_sndbuf while this field can change from IRQ or other cpu.
      
      We need to add READ_ONCE() annotations, and also make sure write
      sides use corresponding WRITE_ONCE() to avoid store-tearing.
      
      Note that other transports probably need similar fixes.
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      e292f05e
    • Eric Dumazet's avatar
      tcp: annotate sk->sk_rcvbuf lockless reads · ebb3b78d
      Eric Dumazet authored
      For the sake of tcp_poll(), there are few places where we fetch
      sk->sk_rcvbuf while this field can change from IRQ or other cpu.
      
      We need to add READ_ONCE() annotations, and also make sure write
      sides use corresponding WRITE_ONCE() to avoid store-tearing.
      
      Note that other transports probably need similar fixes.
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      ebb3b78d
    • Eric Dumazet's avatar
      tcp: annotate tp->urg_seq lockless reads · d9b55bf7
      Eric Dumazet authored
      There two places where we fetch tp->urg_seq while
      this field can change from IRQ or other cpu.
      
      We need to add READ_ONCE() annotations, and also make
      sure write side use corresponding WRITE_ONCE() to avoid
      store-tearing.
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      d9b55bf7
    • Eric Dumazet's avatar
      tcp: annotate tp->snd_nxt lockless reads · e0d694d6
      Eric Dumazet authored
      There are few places where we fetch tp->snd_nxt while
      this field can change from IRQ or other cpu.
      
      We need to add READ_ONCE() annotations, and also make
      sure write sides use corresponding WRITE_ONCE() to avoid
      store-tearing.
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      e0d694d6
    • Eric Dumazet's avatar
      tcp: annotate tp->write_seq lockless reads · 0f317464
      Eric Dumazet authored
      There are few places where we fetch tp->write_seq while
      this field can change from IRQ or other cpu.
      
      We need to add READ_ONCE() annotations, and also make
      sure write sides use corresponding WRITE_ONCE() to avoid
      store-tearing.
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      0f317464
    • Eric Dumazet's avatar
      tcp: annotate tp->copied_seq lockless reads · 7db48e98
      Eric Dumazet authored
      There are few places where we fetch tp->copied_seq while
      this field can change from IRQ or other cpu.
      
      We need to add READ_ONCE() annotations, and also make
      sure write sides use corresponding WRITE_ONCE() to avoid
      store-tearing.
      
      Note that tcp_inq_hint() was already using READ_ONCE(tp->copied_seq)
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      7db48e98
    • Eric Dumazet's avatar
      tcp: annotate tp->rcv_nxt lockless reads · dba7d9b8
      Eric Dumazet authored
      There are few places where we fetch tp->rcv_nxt while
      this field can change from IRQ or other cpu.
      
      We need to add READ_ONCE() annotations, and also make
      sure write sides use corresponding WRITE_ONCE() to avoid
      store-tearing.
      
      Note that tcp_inq_hint() was already using READ_ONCE(tp->rcv_nxt)
      
      syzbot reported :
      
      BUG: KCSAN: data-race in tcp_poll / tcp_queue_rcv
      
      write to 0xffff888120425770 of 4 bytes by interrupt on cpu 0:
       tcp_rcv_nxt_update net/ipv4/tcp_input.c:3365 [inline]
       tcp_queue_rcv+0x180/0x380 net/ipv4/tcp_input.c:4638
       tcp_rcv_established+0xbf1/0xf50 net/ipv4/tcp_input.c:5616
       tcp_v4_do_rcv+0x381/0x4e0 net/ipv4/tcp_ipv4.c:1542
       tcp_v4_rcv+0x1a03/0x1bf0 net/ipv4/tcp_ipv4.c:1923
       ip_protocol_deliver_rcu+0x51/0x470 net/ipv4/ip_input.c:204
       ip_local_deliver_finish+0x110/0x140 net/ipv4/ip_input.c:231
       NF_HOOK include/linux/netfilter.h:305 [inline]
       NF_HOOK include/linux/netfilter.h:299 [inline]
       ip_local_deliver+0x133/0x210 net/ipv4/ip_input.c:252
       dst_input include/net/dst.h:442 [inline]
       ip_rcv_finish+0x121/0x160 net/ipv4/ip_input.c:413
       NF_HOOK include/linux/netfilter.h:305 [inline]
       NF_HOOK include/linux/netfilter.h:299 [inline]
       ip_rcv+0x18f/0x1a0 net/ipv4/ip_input.c:523
       __netif_receive_skb_one_core+0xa7/0xe0 net/core/dev.c:5004
       __netif_receive_skb+0x37/0xf0 net/core/dev.c:5118
       netif_receive_skb_internal+0x59/0x190 net/core/dev.c:5208
       napi_skb_finish net/core/dev.c:5671 [inline]
       napi_gro_receive+0x28f/0x330 net/core/dev.c:5704
       receive_buf+0x284/0x30b0 drivers/net/virtio_net.c:1061
      
      read to 0xffff888120425770 of 4 bytes by task 7254 on cpu 1:
       tcp_stream_is_readable net/ipv4/tcp.c:480 [inline]
       tcp_poll+0x204/0x6b0 net/ipv4/tcp.c:554
       sock_poll+0xed/0x250 net/socket.c:1256
       vfs_poll include/linux/poll.h:90 [inline]
       ep_item_poll.isra.0+0x90/0x190 fs/eventpoll.c:892
       ep_send_events_proc+0x113/0x5c0 fs/eventpoll.c:1749
       ep_scan_ready_list.constprop.0+0x189/0x500 fs/eventpoll.c:704
       ep_send_events fs/eventpoll.c:1793 [inline]
       ep_poll+0xe3/0x900 fs/eventpoll.c:1930
       do_epoll_wait+0x162/0x180 fs/eventpoll.c:2294
       __do_sys_epoll_pwait fs/eventpoll.c:2325 [inline]
       __se_sys_epoll_pwait fs/eventpoll.c:2311 [inline]
       __x64_sys_epoll_pwait+0xcd/0x170 fs/eventpoll.c:2311
       do_syscall_64+0xcf/0x2f0 arch/x86/entry/common.c:296
       entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      Reported by Kernel Concurrency Sanitizer on:
      CPU: 1 PID: 7254 Comm: syz-fuzzer Not tainted 5.3.0+ #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Reported-by: default avatarsyzbot <syzkaller@googlegroups.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      dba7d9b8
    • Eric Dumazet's avatar
      tcp: add rcu protection around tp->fastopen_rsk · d983ea6f
      Eric Dumazet authored
      Both tcp_v4_err() and tcp_v6_err() do the following operations
      while they do not own the socket lock :
      
      	fastopen = tp->fastopen_rsk;
       	snd_una = fastopen ? tcp_rsk(fastopen)->snt_isn : tp->snd_una;
      
      The problem is that without appropriate barrier, the compiler
      might reload tp->fastopen_rsk and trigger a NULL deref.
      
      request sockets are protected by RCU, we can simply add
      the missing annotations and barriers to solve the issue.
      
      Fixes: 168a8f58 ("tcp: TCP Fast Open Server - main code path")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      d983ea6f
  7. 12 Oct, 2019 2 commits
    • David S. Miller's avatar
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf · 8caf8a91
      David S. Miller authored
      Alexei Starovoitov says:
      
      ====================
      pull-request: bpf 2019-10-12
      
      The following pull-request contains BPF updates for your *net* tree.
      
      The main changes are:
      
      1) a bunch of small fixes. Nothing critical.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      8caf8a91
    • David Howells's avatar
      rxrpc: Fix possible NULL pointer access in ICMP handling · f0308fb0
      David Howells authored
      If an ICMP packet comes in on the UDP socket backing an AF_RXRPC socket as
      the UDP socket is being shut down, rxrpc_error_report() may get called to
      deal with it after sk_user_data on the UDP socket has been cleared, leading
      to a NULL pointer access when this local endpoint record gets accessed.
      
      Fix this by just returning immediately if sk_user_data was NULL.
      
      The oops looks like the following:
      
      #PF: supervisor read access in kernel mode
      #PF: error_code(0x0000) - not-present page
      ...
      RIP: 0010:rxrpc_error_report+0x1bd/0x6a9
      ...
      Call Trace:
       ? sock_queue_err_skb+0xbd/0xde
       ? __udp4_lib_err+0x313/0x34d
       __udp4_lib_err+0x313/0x34d
       icmp_unreach+0x1ee/0x207
       icmp_rcv+0x25b/0x28f
       ip_protocol_deliver_rcu+0x95/0x10e
       ip_local_deliver+0xe9/0x148
       __netif_receive_skb_one_core+0x52/0x6e
       process_backlog+0xdc/0x177
       net_rx_action+0xf9/0x270
       __do_softirq+0x1b6/0x39a
       ? smpboot_register_percpu_thread+0xce/0xce
       run_ksoftirqd+0x1d/0x42
       smpboot_thread_fn+0x19e/0x1b3
       kthread+0xf1/0xf6
       ? kthread_delayed_work_timer_fn+0x83/0x83
       ret_from_fork+0x24/0x30
      
      Fixes: 17926a79 ("[AF_RXRPC]: Provide secure RxRPC sockets for use by userspace and kernel both")
      Reported-by: syzbot+611164843bd48cc2190c@syzkaller.appspotmail.com
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      f0308fb0
  8. 11 Oct, 2019 4 commits
  9. 10 Oct, 2019 15 commits
    • Jacob Keller's avatar
      net: update net_dim documentation after rename · 2168da45
      Jacob Keller authored
      Commit 8960b389 ("linux/dim: Rename externally used net_dim
      members") renamed the net_dim API, removing the "net_" prefix from the
      structures and functions. The patch didn't update the net_dim.txt
      documentation file.
      
      Fix the documentation so that its examples match the current code.
      
      Fixes: 8960b389 ("linux/dim: Rename externally used net_dim members", 2019-06-25)
      Fixes: c002bd52 ("linux/dim: Rename externally exposed macros", 2019-06-25)
      Fixes: 4f75da36 ("linux/dim: Move implementation to .c files")
      Cc: Tal Gilboa <talgi@mellanox.com>
      Signed-off-by: default avatarJacob Keller <jacob.e.keller@intel.com>
      Signed-off-by: default avatarJakub Kicinski <jakub.kicinski@netronome.com>
      2168da45
    • Heiner Kallweit's avatar
      r8169: fix jumbo packet handling on resume from suspend · 4ebcb113
      Heiner Kallweit authored
      Mariusz reported that invalid packets are sent after resume from
      suspend if jumbo packets are active. It turned out that his BIOS
      resets chip settings to non-jumbo on resume. Most chip settings are
      re-initialized on resume from suspend by calling rtl_hw_start(),
      so let's add configuring jumbo to this function.
      There's nothing wrong with the commit marked as fixed, it's just
      the first one where the patch applies cleanly.
      
      Fixes: 7366016d ("r8169: read common register for PCI commit")
      Reported-by: default avatarMariusz Bialonczyk <manio@skyboo.net>
      Tested-by: default avatarMariusz Bialonczyk <manio@skyboo.net>
      Signed-off-by: default avatarHeiner Kallweit <hkallweit1@gmail.com>
      Signed-off-by: default avatarJakub Kicinski <jakub.kicinski@netronome.com>
      4ebcb113
    • Eric Dumazet's avatar
      net: silence KCSAN warnings about sk->sk_backlog.len reads · 70c26558
      Eric Dumazet authored
      sk->sk_backlog.len can be written by BH handlers, and read
      from process contexts in a lockless way.
      
      Note the write side should also use WRITE_ONCE() or a variant.
      We need some agreement about the best way to do this.
      
      syzbot reported :
      
      BUG: KCSAN: data-race in tcp_add_backlog / tcp_grow_window.isra.0
      
      write to 0xffff88812665f32c of 4 bytes by interrupt on cpu 1:
       sk_add_backlog include/net/sock.h:934 [inline]
       tcp_add_backlog+0x4a0/0xcc0 net/ipv4/tcp_ipv4.c:1737
       tcp_v4_rcv+0x1aba/0x1bf0 net/ipv4/tcp_ipv4.c:1925
       ip_protocol_deliver_rcu+0x51/0x470 net/ipv4/ip_input.c:204
       ip_local_deliver_finish+0x110/0x140 net/ipv4/ip_input.c:231
       NF_HOOK include/linux/netfilter.h:305 [inline]
       NF_HOOK include/linux/netfilter.h:299 [inline]
       ip_local_deliver+0x133/0x210 net/ipv4/ip_input.c:252
       dst_input include/net/dst.h:442 [inline]
       ip_rcv_finish+0x121/0x160 net/ipv4/ip_input.c:413
       NF_HOOK include/linux/netfilter.h:305 [inline]
       NF_HOOK include/linux/netfilter.h:299 [inline]
       ip_rcv+0x18f/0x1a0 net/ipv4/ip_input.c:523
       __netif_receive_skb_one_core+0xa7/0xe0 net/core/dev.c:5004
       __netif_receive_skb+0x37/0xf0 net/core/dev.c:5118
       netif_receive_skb_internal+0x59/0x190 net/core/dev.c:5208
       napi_skb_finish net/core/dev.c:5671 [inline]
       napi_gro_receive+0x28f/0x330 net/core/dev.c:5704
       receive_buf+0x284/0x30b0 drivers/net/virtio_net.c:1061
       virtnet_receive drivers/net/virtio_net.c:1323 [inline]
       virtnet_poll+0x436/0x7d0 drivers/net/virtio_net.c:1428
       napi_poll net/core/dev.c:6352 [inline]
       net_rx_action+0x3ae/0xa50 net/core/dev.c:6418
      
      read to 0xffff88812665f32c of 4 bytes by task 7292 on cpu 0:
       tcp_space include/net/tcp.h:1373 [inline]
       tcp_grow_window.isra.0+0x6b/0x480 net/ipv4/tcp_input.c:413
       tcp_event_data_recv+0x68f/0x990 net/ipv4/tcp_input.c:717
       tcp_rcv_established+0xbfe/0xf50 net/ipv4/tcp_input.c:5618
       tcp_v4_do_rcv+0x381/0x4e0 net/ipv4/tcp_ipv4.c:1542
       sk_backlog_rcv include/net/sock.h:945 [inline]
       __release_sock+0x135/0x1e0 net/core/sock.c:2427
       release_sock+0x61/0x160 net/core/sock.c:2943
       tcp_recvmsg+0x63b/0x1a30 net/ipv4/tcp.c:2181
       inet_recvmsg+0xbb/0x250 net/ipv4/af_inet.c:838
       sock_recvmsg_nosec net/socket.c:871 [inline]
       sock_recvmsg net/socket.c:889 [inline]
       sock_recvmsg+0x92/0xb0 net/socket.c:885
       sock_read_iter+0x15f/0x1e0 net/socket.c:967
       call_read_iter include/linux/fs.h:1864 [inline]
       new_sync_read+0x389/0x4f0 fs/read_write.c:414
       __vfs_read+0xb1/0xc0 fs/read_write.c:427
       vfs_read fs/read_write.c:461 [inline]
       vfs_read+0x143/0x2c0 fs/read_write.c:446
      
      Reported by Kernel Concurrency Sanitizer on:
      CPU: 0 PID: 7292 Comm: syz-fuzzer Not tainted 5.3.0+ #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Reported-by: default avatarsyzbot <syzkaller@googlegroups.com>
      Signed-off-by: default avatarJakub Kicinski <jakub.kicinski@netronome.com>
      70c26558
    • Eric Dumazet's avatar
      net: annotate sk->sk_rcvlowat lockless reads · eac66402
      Eric Dumazet authored
      sock_rcvlowat() or int_sk_rcvlowat() might be called without the socket
      lock for example from tcp_poll().
      
      Use READ_ONCE() to document the fact that other cpus might change
      sk->sk_rcvlowat under us and avoid KCSAN splats.
      
      Use WRITE_ONCE() on write sides too.
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarJakub Kicinski <jakub.kicinski@netronome.com>
      eac66402
    • Eric Dumazet's avatar
      net: silence KCSAN warnings around sk_add_backlog() calls · 8265792b
      Eric Dumazet authored
      sk_add_backlog() callers usually read sk->sk_rcvbuf without
      owning the socket lock. This means sk_rcvbuf value can
      be changed by other cpus, and KCSAN complains.
      
      Add READ_ONCE() annotations to document the lockless nature
      of these reads.
      
      Note that writes over sk_rcvbuf should also use WRITE_ONCE(),
      but this will be done in separate patches to ease stable
      backports (if we decide this is relevant for stable trees).
      
      BUG: KCSAN: data-race in tcp_add_backlog / tcp_recvmsg
      
      write to 0xffff88812ab369f8 of 8 bytes by interrupt on cpu 1:
       __sk_add_backlog include/net/sock.h:902 [inline]
       sk_add_backlog include/net/sock.h:933 [inline]
       tcp_add_backlog+0x45a/0xcc0 net/ipv4/tcp_ipv4.c:1737
       tcp_v4_rcv+0x1aba/0x1bf0 net/ipv4/tcp_ipv4.c:1925
       ip_protocol_deliver_rcu+0x51/0x470 net/ipv4/ip_input.c:204
       ip_local_deliver_finish+0x110/0x140 net/ipv4/ip_input.c:231
       NF_HOOK include/linux/netfilter.h:305 [inline]
       NF_HOOK include/linux/netfilter.h:299 [inline]
       ip_local_deliver+0x133/0x210 net/ipv4/ip_input.c:252
       dst_input include/net/dst.h:442 [inline]
       ip_rcv_finish+0x121/0x160 net/ipv4/ip_input.c:413
       NF_HOOK include/linux/netfilter.h:305 [inline]
       NF_HOOK include/linux/netfilter.h:299 [inline]
       ip_rcv+0x18f/0x1a0 net/ipv4/ip_input.c:523
       __netif_receive_skb_one_core+0xa7/0xe0 net/core/dev.c:5004
       __netif_receive_skb+0x37/0xf0 net/core/dev.c:5118
       netif_receive_skb_internal+0x59/0x190 net/core/dev.c:5208
       napi_skb_finish net/core/dev.c:5671 [inline]
       napi_gro_receive+0x28f/0x330 net/core/dev.c:5704
       receive_buf+0x284/0x30b0 drivers/net/virtio_net.c:1061
       virtnet_receive drivers/net/virtio_net.c:1323 [inline]
       virtnet_poll+0x436/0x7d0 drivers/net/virtio_net.c:1428
       napi_poll net/core/dev.c:6352 [inline]
       net_rx_action+0x3ae/0xa50 net/core/dev.c:6418
      
      read to 0xffff88812ab369f8 of 8 bytes by task 7271 on cpu 0:
       tcp_recvmsg+0x470/0x1a30 net/ipv4/tcp.c:2047
       inet_recvmsg+0xbb/0x250 net/ipv4/af_inet.c:838
       sock_recvmsg_nosec net/socket.c:871 [inline]
       sock_recvmsg net/socket.c:889 [inline]
       sock_recvmsg+0x92/0xb0 net/socket.c:885
       sock_read_iter+0x15f/0x1e0 net/socket.c:967
       call_read_iter include/linux/fs.h:1864 [inline]
       new_sync_read+0x389/0x4f0 fs/read_write.c:414
       __vfs_read+0xb1/0xc0 fs/read_write.c:427
       vfs_read fs/read_write.c:461 [inline]
       vfs_read+0x143/0x2c0 fs/read_write.c:446
       ksys_read+0xd5/0x1b0 fs/read_write.c:587
       __do_sys_read fs/read_write.c:597 [inline]
       __se_sys_read fs/read_write.c:595 [inline]
       __x64_sys_read+0x4c/0x60 fs/read_write.c:595
       do_syscall_64+0xcf/0x2f0 arch/x86/entry/common.c:296
       entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      Reported by Kernel Concurrency Sanitizer on:
      CPU: 0 PID: 7271 Comm: syz-fuzzer Not tainted 5.3.0+ #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Reported-by: default avatarsyzbot <syzkaller@googlegroups.com>
      Signed-off-by: default avatarJakub Kicinski <jakub.kicinski@netronome.com>
      8265792b
    • Eric Dumazet's avatar
      tcp: annotate lockless access to tcp_memory_pressure · 1f142c17
      Eric Dumazet authored
      tcp_memory_pressure is read without holding any lock,
      and its value could be changed on other cpus.
      
      Use READ_ONCE() to annotate these lockless reads.
      
      The write side is already using atomic ops.
      
      Fixes: b8da51eb ("tcp: introduce tcp_under_memory_pressure()")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarJakub Kicinski <jakub.kicinski@netronome.com>
      1f142c17
    • Eric Dumazet's avatar
      net: add {READ|WRITE}_ONCE() annotations on ->rskq_accept_head · 60b173ca
      Eric Dumazet authored
      reqsk_queue_empty() is called from inet_csk_listen_poll() while
      other cpus might write ->rskq_accept_head value.
      
      Use {READ|WRITE}_ONCE() to avoid compiler tricks
      and potential KCSAN splats.
      
      Fixes: fff1f300 ("tcp: add a spinlock to protect struct request_sock_queue")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarJakub Kicinski <jakub.kicinski@netronome.com>
      60b173ca
    • Eric Dumazet's avatar
      net: avoid possible false sharing in sk_leave_memory_pressure() · 503978ac
      Eric Dumazet authored
      As mentioned in https://github.com/google/ktsan/wiki/READ_ONCE-and-WRITE_ONCE#it-may-improve-performance
      a C compiler can legally transform :
      
      if (memory_pressure && *memory_pressure)
              *memory_pressure = 0;
      
      to :
      
      if (memory_pressure)
              *memory_pressure = 0;
      
      Fixes: 06044751 ("tcp: add TCPMemoryPressuresChrono counter")
      Fixes: 180d8cd9 ("foundations of per-cgroup memory pressure controlling.")
      Fixes: 3ab224be ("[NET] CORE: Introducing new memory accounting interface.")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarJakub Kicinski <jakub.kicinski@netronome.com>
      503978ac
    • Eric Dumazet's avatar
      tun: remove possible false sharing in tun_flow_update() · 4ffdd22e
      Eric Dumazet authored
      As mentioned in https://github.com/google/ktsan/wiki/READ_ONCE-and-WRITE_ONCE#it-may-improve-performance
      a C compiler can legally transform
      
      if (e->queue_index != queue_index)
      	e->queue_index = queue_index;
      
      to :
      
      	e->queue_index = queue_index;
      
      Note that the code using jiffies has no issue, since jiffies
      has volatile attribute.
      
      if (e->updated != jiffies)
          e->updated = jiffies;
      
      Fixes: 83b1bc12 ("tun: align write-heavy flow entry members to a cache line")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Cc: Zhang Yu <zhangyu31@baidu.com>
      Cc: Wang Li <wangli39@baidu.com>
      Cc: Li RongQing <lirongqing@baidu.com>
      Acked-by: default avatarJason Wang <jasowang@redhat.com>
      Signed-off-by: default avatarJakub Kicinski <jakub.kicinski@netronome.com>
      4ffdd22e
    • Eric Dumazet's avatar
      netfilter: conntrack: avoid possible false sharing · e37542ba
      Eric Dumazet authored
      As hinted by KCSAN, we need at least one READ_ONCE()
      to prevent a compiler optimization.
      
      More details on :
      https://github.com/google/ktsan/wiki/READ_ONCE-and-WRITE_ONCE#it-may-improve-performance
      
      sysbot report :
      BUG: KCSAN: data-race in __nf_ct_refresh_acct / __nf_ct_refresh_acct
      
      read to 0xffff888123eb4f08 of 4 bytes by interrupt on cpu 0:
       __nf_ct_refresh_acct+0xd4/0x1b0 net/netfilter/nf_conntrack_core.c:1796
       nf_ct_refresh_acct include/net/netfilter/nf_conntrack.h:201 [inline]
       nf_conntrack_tcp_packet+0xd40/0x3390 net/netfilter/nf_conntrack_proto_tcp.c:1161
       nf_conntrack_handle_packet net/netfilter/nf_conntrack_core.c:1633 [inline]
       nf_conntrack_in+0x410/0xaa0 net/netfilter/nf_conntrack_core.c:1727
       ipv4_conntrack_in+0x27/0x40 net/netfilter/nf_conntrack_proto.c:178
       nf_hook_entry_hookfn include/linux/netfilter.h:135 [inline]
       nf_hook_slow+0x83/0x160 net/netfilter/core.c:512
       nf_hook include/linux/netfilter.h:260 [inline]
       NF_HOOK include/linux/netfilter.h:303 [inline]
       ip_rcv+0x12f/0x1a0 net/ipv4/ip_input.c:523
       __netif_receive_skb_one_core+0xa7/0xe0 net/core/dev.c:5004
       __netif_receive_skb+0x37/0xf0 net/core/dev.c:5118
       netif_receive_skb_internal+0x59/0x190 net/core/dev.c:5208
       napi_skb_finish net/core/dev.c:5671 [inline]
       napi_gro_receive+0x28f/0x330 net/core/dev.c:5704
       receive_buf+0x284/0x30b0 drivers/net/virtio_net.c:1061
       virtnet_receive drivers/net/virtio_net.c:1323 [inline]
       virtnet_poll+0x436/0x7d0 drivers/net/virtio_net.c:1428
       napi_poll net/core/dev.c:6352 [inline]
       net_rx_action+0x3ae/0xa50 net/core/dev.c:6418
       __do_softirq+0x115/0x33f kernel/softirq.c:292
      
      write to 0xffff888123eb4f08 of 4 bytes by task 7191 on cpu 1:
       __nf_ct_refresh_acct+0xfb/0x1b0 net/netfilter/nf_conntrack_core.c:1797
       nf_ct_refresh_acct include/net/netfilter/nf_conntrack.h:201 [inline]
       nf_conntrack_tcp_packet+0xd40/0x3390 net/netfilter/nf_conntrack_proto_tcp.c:1161
       nf_conntrack_handle_packet net/netfilter/nf_conntrack_core.c:1633 [inline]
       nf_conntrack_in+0x410/0xaa0 net/netfilter/nf_conntrack_core.c:1727
       ipv4_conntrack_local+0xbe/0x130 net/netfilter/nf_conntrack_proto.c:200
       nf_hook_entry_hookfn include/linux/netfilter.h:135 [inline]
       nf_hook_slow+0x83/0x160 net/netfilter/core.c:512
       nf_hook include/linux/netfilter.h:260 [inline]
       __ip_local_out+0x1f7/0x2b0 net/ipv4/ip_output.c:114
       ip_local_out+0x31/0x90 net/ipv4/ip_output.c:123
       __ip_queue_xmit+0x3a8/0xa40 net/ipv4/ip_output.c:532
       ip_queue_xmit+0x45/0x60 include/net/ip.h:236
       __tcp_transmit_skb+0xdeb/0x1cd0 net/ipv4/tcp_output.c:1158
       __tcp_send_ack+0x246/0x300 net/ipv4/tcp_output.c:3685
       tcp_send_ack+0x34/0x40 net/ipv4/tcp_output.c:3691
       tcp_cleanup_rbuf+0x130/0x360 net/ipv4/tcp.c:1575
      
      Reported by Kernel Concurrency Sanitizer on:
      CPU: 1 PID: 7191 Comm: syz-fuzzer Not tainted 5.3.0+ #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      
      Fixes: cc169213 ("netfilter: conntrack: avoid same-timeout update")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Reported-by: default avatarsyzbot <syzkaller@googlegroups.com>
      Cc: Jozsef Kadlecsik <kadlec@netfilter.org>
      Cc: Florian Westphal <fw@strlen.de>
      Acked-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: default avatarJakub Kicinski <jakub.kicinski@netronome.com>
      e37542ba
    • Nicolas Dichtel's avatar
      netns: fix NLM_F_ECHO mechanism for RTM_NEWNSID · 993e4c92
      Nicolas Dichtel authored
      The flag NLM_F_ECHO aims to reply to the user the message notified to all
      listeners.
      It was not the case with the command RTM_NEWNSID, let's fix this.
      
      Fixes: 0c7aecd4 ("netns: add rtnl cmd to add and get peer netns ids")
      Reported-by: default avatarGuillaume Nault <gnault@redhat.com>
      Signed-off-by: default avatarNicolas Dichtel <nicolas.dichtel@6wind.com>
      Acked-by: default avatarGuillaume Nault <gnault@redhat.com>
      Tested-by: default avatarGuillaume Nault <gnault@redhat.com>
      Signed-off-by: default avatarJakub Kicinski <jakub.kicinski@netronome.com>
      993e4c92
    • Daniele Palmas's avatar
      net: usb: qmi_wwan: add Telit 0x1050 composition · e0ae2c57
      Daniele Palmas authored
      This patch adds support for Telit FN980 0x1050 composition
      
      0x1050: tty, adb, rmnet, tty, tty, tty, tty
      Signed-off-by: default avatarDaniele Palmas <dnlplm@gmail.com>
      Acked-by: default avatarBjørn Mork <bjorn@mork.no>
      Signed-off-by: default avatarJakub Kicinski <jakub.kicinski@netronome.com>
      e0ae2c57
    • YueHaibing's avatar
      act_mirred: Fix mirred_init_module error handling · 11c9a7d3
      YueHaibing authored
      If tcf_register_action failed, mirred_device_notifier
      should be unregistered.
      
      Fixes: 3b87956e ("net sched: fix race in mirred device removal")
      Signed-off-by: default avatarYueHaibing <yuehaibing@huawei.com>
      Signed-off-by: default avatarJakub Kicinski <jakub.kicinski@netronome.com>
      11c9a7d3
    • Vinicius Costa Gomes's avatar
      net: taprio: Fix returning EINVAL when configuring without flags · a954380a
      Vinicius Costa Gomes authored
      When configuring a taprio instance if "flags" is not specified (or
      it's zero), taprio currently replies with an "Invalid argument" error.
      
      So, set the return value to zero after we are done with all the
      checks.
      
      Fixes: 9c66d156 ("taprio: Add support for hardware offloading")
      Signed-off-by: default avatarVinicius Costa Gomes <vinicius.gomes@intel.com>
      Acked-by: default avatarVladimir Oltean <olteanv@gmail.com>
      Signed-off-by: default avatarJakub Kicinski <jakub.kicinski@netronome.com>
      a954380a
    • Jakub Kicinski's avatar
      Merge branch 's390-qeth-fixes' · 8cd6f4fe
      Jakub Kicinski authored
      Julian Wiedmann says:
      
      ====================
      s390/qeth: fixes 2019-10-08
      
      Alexandra fixes two issues in the initialization code for vnicc cmds.
      One is an uninitialized variable when a cmd fails, the other that we
      wouldn't recover correctly if the device's supported features changed.
      ====================
      Signed-off-by: default avatarJakub Kicinski <jakub.kicinski@netronome.com>
      8cd6f4fe