1. 25 Aug, 2019 3 commits
    • Yi-Hung Wei's avatar
      openvswitch: Fix conntrack cache with timeout · 71778951
      Yi-Hung Wei authored
      This patch addresses a conntrack cache issue with timeout policy.
      Currently, we do not check if the timeout extension is set properly in the
      cached conntrack entry.  Thus, after packet recirculate from conntrack
      action, the timeout policy is not applied properly.  This patch fixes the
      aforementioned issue.
      
      Fixes: 06bd2bdf ("openvswitch: Add timeout support to ct action")
      Reported-by: default avatarkbuild test robot <lkp@intel.com>
      Signed-off-by: default avatarYi-Hung Wei <yihung.wei@gmail.com>
      Acked-by: default avatarPravin B Shelar <pshelar@ovn.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      71778951
    • Alexey Kodanev's avatar
      ipv4: mpls: fix mpls_xmit for iptunnel · 803f3e22
      Alexey Kodanev authored
      When using mpls over gre/gre6 setup, rt->rt_gw4 address is not set, the
      same for rt->rt_gw_family.  Therefore, when rt->rt_gw_family is checked
      in mpls_xmit(), neigh_xmit() call is skipped. As a result, such setup
      doesn't work anymore.
      
      This issue was found with LTP mpls03 tests.
      
      Fixes: 1550c171 ("ipv4: Prepare rtable for IPv6 gateway")
      Signed-off-by: default avatarAlexey Kodanev <alexey.kodanev@oracle.com>
      Reviewed-by: default avatarDavid Ahern <dsahern@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      803f3e22
    • David Ahern's avatar
      nexthop: Fix nexthop_num_path for blackhole nexthops · 9b5f6841
      David Ahern authored
      Donald reported this sequence:
        ip next add id 1 blackhole
        ip next add id 2 blackhole
        ip ro add 1.1.1.1/32 nhid 1
        ip ro add 1.1.1.2/32 nhid 2
      
      would cause a crash. Backtrace is:
      
      [  151.302790] general protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN PTI
      [  151.304043] CPU: 1 PID: 277 Comm: ip Not tainted 5.3.0-rc5+ #37
      [  151.305078] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.1-1 04/01/2014
      [  151.306526] RIP: 0010:fib_add_nexthop+0x8b/0x2aa
      [  151.307343] Code: 35 f7 81 48 8d 14 01 c7 02 f1 f1 f1 f1 c7 42 04 01 f4 f4 f4 48 89 f2 48 c1 ea 03 65 48 8b 0c 25 28 00 00 00 48 89 4d d0 31 c9 <80> 3c 02 00 74 08 48 89 f7 e8 1a e8 53 ff be 08 00 00 00 4c 89 e7
      [  151.310549] RSP: 0018:ffff888116c27340 EFLAGS: 00010246
      [  151.311469] RAX: dffffc0000000000 RBX: ffff8881154ece00 RCX: 0000000000000000
      [  151.312713] RDX: 0000000000000004 RSI: 0000000000000020 RDI: ffff888115649b40
      [  151.313968] RBP: ffff888116c273d8 R08: ffffed10221e3757 R09: ffff888110f1bab8
      [  151.315212] R10: 0000000000000001 R11: ffff888110f1bab3 R12: ffff888115649b40
      [  151.316456] R13: 0000000000000020 R14: ffff888116c273b0 R15: ffff888115649b40
      [  151.317707] FS:  00007f60b4d8d800(0000) GS:ffff88811ac00000(0000) knlGS:0000000000000000
      [  151.319113] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  151.320119] CR2: 0000555671ffdc00 CR3: 00000001136ba005 CR4: 0000000000020ee0
      [  151.321367] Call Trace:
      [  151.321820]  ? fib_nexthop_info+0x635/0x635
      [  151.322572]  fib_dump_info+0xaa4/0xde0
      [  151.323247]  ? fib_create_info+0x2431/0x2431
      [  151.324008]  ? napi_alloc_frag+0x2a/0x2a
      [  151.324711]  rtmsg_fib+0x2c4/0x3be
      [  151.325339]  fib_table_insert+0xe2f/0xeee
      ...
      
      fib_dump_info incorrectly has nhs = 0 for blackhole nexthops, so it
      believes the nexthop object is a multipath group (nhs != 1) and ends
      up down the nexthop_mpath_fill_node() path which is wrong for a
      blackhole.
      
      The blackhole check in nexthop_num_path is leftover from early days
      of the blackhole implementation which did not initialize the device.
      In the end the design was simpler (fewer special case checks) to set
      the device to loopback in nh_info, so the check in nexthop_num_path
      should have been removed.
      
      Fixes: 430a0491 ("nexthop: Add support for nexthop groups")
      Reported-by: default avatarDonald Sharp <sharpd@cumulusnetworks.com>
      Signed-off-by: default avatarDavid Ahern <dsahern@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      9b5f6841
  2. 24 Aug, 2019 12 commits
    • Zhu Yanjun's avatar
      net: rds: add service level support in rds-info · e0e6d062
      Zhu Yanjun authored
      >From IB specific 7.6.5 SERVICE LEVEL, Service Level (SL)
      is used to identify different flows within an IBA subnet.
      It is carried in the local route header of the packet.
      
      Before this commit, run "rds-info -I". The outputs are as
      below:
      "
      RDS IB Connections:
       LocalAddr  RemoteAddr Tos SL  LocalDev               RemoteDev
      192.2.95.3  192.2.95.1  2   0  fe80::21:28:1a:39  fe80::21:28:10:b9
      192.2.95.3  192.2.95.1  1   0  fe80::21:28:1a:39  fe80::21:28:10:b9
      192.2.95.3  192.2.95.1  0   0  fe80::21:28:1a:39  fe80::21:28:10:b9
      "
      After this commit, the output is as below:
      "
      RDS IB Connections:
       LocalAddr  RemoteAddr Tos SL  LocalDev               RemoteDev
      192.2.95.3  192.2.95.1  2   2  fe80::21:28:1a:39  fe80::21:28:10:b9
      192.2.95.3  192.2.95.1  1   1  fe80::21:28:1a:39  fe80::21:28:10:b9
      192.2.95.3  192.2.95.1  0   0  fe80::21:28:1a:39  fe80::21:28:10:b9
      "
      
      The commit fe3475af ("net: rds: add per rds connection cache
      statistics") adds cache_allocs in struct rds_info_rdma_connection
      as below:
      struct rds_info_rdma_connection {
      ...
              __u32           rdma_mr_max;
              __u32           rdma_mr_size;
              __u8            tos;
              __u32           cache_allocs;
       };
      The peer struct in rds-tools of struct rds_info_rdma_connection is as
      below:
      struct rds_info_rdma_connection {
      ...
              uint32_t        rdma_mr_max;
              uint32_t        rdma_mr_size;
              uint8_t         tos;
              uint8_t         sl;
              uint32_t        cache_allocs;
      };
      The difference between userspace and kernel is the member variable sl.
      In the kernel struct, the member variable sl is missing. This will
      introduce risks. So it is necessary to use this commit to avoid this risk.
      
      Fixes: fe3475af ("net: rds: add per rds connection cache statistics")
      CC: Joe Jin <joe.jin@oracle.com>
      CC: JUNXIAO_BI <junxiao.bi@oracle.com>
      Suggested-by: default avatarGerd Rausch <gerd.rausch@oracle.com>
      Signed-off-by: default avatarZhu Yanjun <yanjun.zhu@oracle.com>
      Acked-by: default avatarSantosh Shilimkar <santosh.shilimkar@oracle.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      e0e6d062
    • John Fastabend's avatar
      net: route dump netlink NLM_F_MULTI flag missing · e93fb3e9
      John Fastabend authored
      An excerpt from netlink(7) man page,
      
        In multipart messages (multiple nlmsghdr headers with associated payload
        in one byte stream) the first and all following headers have the
        NLM_F_MULTI flag set, except for the last  header  which  has the type
        NLMSG_DONE.
      
      but, after (ee28906f) there is a missing NLM_F_MULTI flag in the middle of a
      FIB dump. The result is user space applications following above man page
      excerpt may get confused and may stop parsing msg believing something went
      wrong.
      
      In the golang netlink lib [0] the library logic stops parsing believing the
      message is not a multipart message. Found this running Cilium[1] against
      net-next while adding a feature to auto-detect routes. I noticed with
      multiple route tables we no longer could detect the default routes on net
      tree kernels because the library logic was not returning them.
      
      Fix this by handling the fib_dump_info_fnhe() case the same way the
      fib_dump_info() handles it by passing the flags argument through the
      call chain and adding a flags argument to rt_fill_info().
      
      Tested with Cilium stack and auto-detection of routes works again. Also
      annotated libs to dump netlink msgs and inspected NLM_F_MULTI and
      NLMSG_DONE flags look correct after this.
      
      Note: In inet_rtm_getroute() pass rt_fill_info() '0' for flags the same
      as is done for fib_dump_info() so this looks correct to me.
      
      [0] https://github.com/vishvananda/netlink/
      [1] https://github.com/cilium/
      
      Fixes: ee28906f ("ipv4: Dump route exceptions if requested")
      Signed-off-by: default avatarJohn Fastabend <john.fastabend@gmail.com>
      Reviewed-by: default avatarStefano Brivio <sbrivio@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      e93fb3e9
    • Julian Wiedmann's avatar
      s390/qeth: reject oversized SNMP requests · 292a50e3
      Julian Wiedmann authored
      Commit d4c08afa ("s390/qeth: streamline SNMP cmd code") removed
      the bounds checking for req_len, under the assumption that the check in
      qeth_alloc_cmd() would suffice.
      
      But that code path isn't sufficiently robust to handle a user-provided
      data_length, which could overflow (when adding the cmd header overhead)
      before being checked against QETH_BUFSIZE. We end up allocating just a
      tiny iob, and the subsequent copy_from_user() writes past the end of
      that iob.
      
      Special-case this path and add a coarse bounds check, to protect against
      maliciuous requests. This let's the subsequent code flow do its normal
      job and precise checking, without risk of overflow.
      
      Fixes: d4c08afa ("s390/qeth: streamline SNMP cmd code")
      Reported-by: default avatarDan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: default avatarJulian Wiedmann <jwi@linux.ibm.com>
      Reviewed-by: default avatarUrsula Braun <ubraun@linux.ibm.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      292a50e3
    • zhanglin's avatar
      sock: fix potential memory leak in proto_register() · b45ce321
      zhanglin authored
      If protocols registered exceeded PROTO_INUSE_NR, prot will be
      added to proto_list, but no available bit left for prot in
      proto_inuse_idx.
      
      Changes since v2:
      * Propagate the error code properly
      Signed-off-by: default avatarzhanglin <zhang.lin16@zte.com.cn>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      b45ce321
    • David S. Miller's avatar
      Merge tag 'mlx5-fixes-2019-08-22' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux · d37fb975
      David S. Miller authored
      Saeed Mahameed says:
      
      ====================
      Mellanox, mlx5 fixes 2019-08-22
      
      This series introduces some fixes to mlx5 driver.
      
      1) Form Moshe, two fixes for firmware health reporter
      2) From Eran, two ktls fixes.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      d37fb975
    • Andrew Lunn's avatar
      MAINTAINERS: Add phylink keyword to SFF/SFP/SFP+ MODULE SUPPORT · 0c69b19f
      Andrew Lunn authored
      Russell king maintains phylink, as part of the SFP module support.
      However, much of the review work is about drivers swapping from phylib
      to phylink. Such changes don't make changes to the phylink core, and
      so the F: rules in MAINTAINERS don't match. Add a K:, keywork rule,
      which hopefully get_maintainers will match against for patches to MAC
      drivers swapping to phylink.
      Signed-off-by: default avatarAndrew Lunn <andrew@lunn.ch>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      0c69b19f
    • David S. Miller's avatar
      Merge branch 'collect_md-mode-dev-null' · 9b45ff91
      David S. Miller authored
      Hangbin Liu says:
      
      ====================
      fix dev null pointer dereference when send packets larger than mtu in collect_md mode
      
      When we send a packet larger than PMTU, we need to reply with
      icmp_send(ICMP_FRAG_NEEDED) or icmpv6_send(ICMPV6_PKT_TOOBIG).
      
      But with collect_md mode, kernel will crash while accessing the dst dev
      as __metadata_dst_init() init dst->dev to NULL by default. Here is what
      the code path looks like, for GRE:
      
      - ip6gre_tunnel_xmit
        - ip6gre_xmit_ipv4
          - __gre6_xmit
            - ip6_tnl_xmit
              - if skb->len - t->tun_hlen - eth_hlen > mtu; return -EMSGSIZE
          - icmp_send
            - net = dev_net(rt->dst.dev); <-- here
        - ip6gre_xmit_ipv6
          - __gre6_xmit
            - ip6_tnl_xmit
              - if skb->len - t->tun_hlen - eth_hlen > mtu; return -EMSGSIZE
          - icmpv6_send
            ...
            - decode_session4
              - oif = skb_dst(skb)->dev->ifindex; <-- here
            - decode_session6
              - oif = skb_dst(skb)->dev->ifindex; <-- here
      
      We could not fix it in __metadata_dst_init() as there is no dev supplied.
      Look in to the __icmp_send()/decode_session{4,6} code we could find the dst
      dev is actually not needed. In __icmp_send(), we could get the net by skb->dev.
      For decode_session{4,6}, as it was called by xfrm_decode_session_reverse()
      in this scenario, the oif is not used by
      fl4->flowi4_oif = reverse ? skb->skb_iif : oif;
      
      The reproducer is easy:
      
      ovs-vsctl add-br br0
      ip link set br0 up
      ovs-vsctl add-port br0 gre0 -- set interface gre0 type=gre options:remote_ip=$dst_addr
      ip link set gre0 up
      ip addr add ${local_gre6}/64 dev br0
      ping6 $remote_gre6 -s 1500
      
      The kernel will crash like
      [40595.821651] BUG: kernel NULL pointer dereference, address: 0000000000000108
      [40595.822411] #PF: supervisor read access in kernel mode
      [40595.822949] #PF: error_code(0x0000) - not-present page
      [40595.823492] PGD 0 P4D 0
      [40595.823767] Oops: 0000 [#1] SMP PTI
      [40595.824139] CPU: 0 PID: 2831 Comm: handler12 Not tainted 5.2.0 #57
      [40595.824788] Hardware name: Red Hat KVM, BIOS 1.11.1-3.module+el8.1.0+2983+b2ae9c0a 04/01/2014
      [40595.825680] RIP: 0010:__xfrm_decode_session+0x6b/0x930
      [40595.826219] Code: b7 c0 00 00 00 b8 06 00 00 00 66 85 d2 0f b7 ca 48 0f 45 c1 44 0f b6 2c 06 48 8b 47 58 48 83 e0 fe 0f 84 f4 04 00 00 48 8b 00 <44> 8b 80 08 01 00 00 41 f6 c4 01 4c 89 e7
      ba 58 00 00 00 0f 85 47
      [40595.828155] RSP: 0018:ffffc90000a73438 EFLAGS: 00010286
      [40595.828705] RAX: 0000000000000000 RBX: ffff8881329d7100 RCX: 0000000000000000
      [40595.829450] RDX: 0000000000000000 RSI: ffff8881339e70ce RDI: ffff8881329d7100
      [40595.830191] RBP: ffffc90000a73470 R08: 0000000000000000 R09: 000000000000000a
      [40595.830936] R10: 0000000000000000 R11: 0000000000000000 R12: ffffc90000a73490
      [40595.831682] R13: 000000000000002c R14: ffff888132ff1301 R15: ffff8881329d7100
      [40595.832427] FS:  00007f5bfcfd6700(0000) GS:ffff88813ba00000(0000) knlGS:0000000000000000
      [40595.833266] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [40595.833883] CR2: 0000000000000108 CR3: 000000013a368000 CR4: 00000000000006f0
      [40595.834633] Call Trace:
      [40595.835392]  ? rt6_multipath_hash+0x4c/0x390
      [40595.835853]  icmpv6_route_lookup+0xcb/0x1d0
      [40595.836296]  ? icmpv6_xrlim_allow+0x3e/0x140
      [40595.836751]  icmp6_send+0x537/0x840
      [40595.837125]  icmpv6_send+0x20/0x30
      [40595.837494]  tnl_update_pmtu.isra.27+0x19d/0x2a0 [ip_tunnel]
      [40595.838088]  ip_md_tunnel_xmit+0x1b6/0x510 [ip_tunnel]
      [40595.838633]  gre_tap_xmit+0x10c/0x160 [ip_gre]
      [40595.839103]  dev_hard_start_xmit+0x93/0x200
      [40595.839551]  sch_direct_xmit+0x101/0x2d0
      [40595.839967]  __dev_queue_xmit+0x69f/0x9c0
      [40595.840399]  do_execute_actions+0x1717/0x1910 [openvswitch]
      [40595.840987]  ? validate_set.isra.12+0x2f5/0x3d0 [openvswitch]
      [40595.841596]  ? reserve_sfa_size+0x31/0x130 [openvswitch]
      [40595.842154]  ? __ovs_nla_copy_actions+0x1b4/0xad0 [openvswitch]
      [40595.842778]  ? __kmalloc_reserve.isra.50+0x2e/0x80
      [40595.843285]  ? should_failslab+0xa/0x20
      [40595.843696]  ? __kmalloc+0x188/0x220
      [40595.844078]  ? __alloc_skb+0x97/0x270
      [40595.844472]  ovs_execute_actions+0x47/0x120 [openvswitch]
      [40595.845041]  ovs_packet_cmd_execute+0x27d/0x2b0 [openvswitch]
      [40595.845648]  genl_family_rcv_msg+0x3a8/0x430
      [40595.846101]  genl_rcv_msg+0x47/0x90
      [40595.846476]  ? __alloc_skb+0x83/0x270
      [40595.846866]  ? genl_family_rcv_msg+0x430/0x430
      [40595.847335]  netlink_rcv_skb+0xcb/0x100
      [40595.847777]  genl_rcv+0x24/0x40
      [40595.848113]  netlink_unicast+0x17f/0x230
      [40595.848535]  netlink_sendmsg+0x2ed/0x3e0
      [40595.848951]  sock_sendmsg+0x4f/0x60
      [40595.849323]  ___sys_sendmsg+0x2bd/0x2e0
      [40595.849733]  ? sock_poll+0x6f/0xb0
      [40595.850098]  ? ep_scan_ready_list.isra.14+0x20b/0x240
      [40595.850634]  ? _cond_resched+0x15/0x30
      [40595.851032]  ? ep_poll+0x11b/0x440
      [40595.851401]  ? _copy_to_user+0x22/0x30
      [40595.851799]  __sys_sendmsg+0x58/0xa0
      [40595.852180]  do_syscall_64+0x5b/0x190
      [40595.852574]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
      [40595.853105] RIP: 0033:0x7f5c00038c7d
      [40595.853489] Code: c7 20 00 00 75 10 b8 2e 00 00 00 0f 05 48 3d 01 f0 ff ff 73 31 c3 48 83 ec 08 e8 8e f7 ff ff 48 89 04 24 b8 2e 00 00 00 0f 05 <48> 8b 3c 24 48 89 c2 e8 d7 f7 ff ff 48 89
      d0 48 83 c4 08 48 3d 01
      [40595.855443] RSP: 002b:00007f5bfcf73c00 EFLAGS: 00003293 ORIG_RAX: 000000000000002e
      [40595.856244] RAX: ffffffffffffffda RBX: 00007f5bfcf74a60 RCX: 00007f5c00038c7d
      [40595.856990] RDX: 0000000000000000 RSI: 00007f5bfcf73c60 RDI: 0000000000000015
      [40595.857736] RBP: 0000000000000004 R08: 0000000000000b7c R09: 0000000000000110
      [40595.858613] R10: 0001000800050004 R11: 0000000000003293 R12: 000055c2d8329da0
      [40595.859401] R13: 00007f5bfcf74120 R14: 0000000000000347 R15: 00007f5bfcf73c60
      [40595.860185] Modules linked in: ip_gre ip_tunnel gre openvswitch nsh nf_conncount nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 sunrpc bochs_drm ttm drm_kms_helper drm pcspkr joydev i2c_piix4 qemu_fw_cfg xfs libcrc32c virtio_net net_failover serio_raw failover ata_generic virtio_blk pata_acpi floppy
      [40595.863155] CR2: 0000000000000108
      [40595.863551] ---[ end trace 22209bbcacb4addd ]---
      
      v4: Julian Anastasov remind skb->dev also could be NULL in icmp_send. We'd
      better still use dst.dev and do a check to avoid crash.
      
      v3: only replace pkg to packets in cover letter. So I didn't update the version
      info in the follow up patches.
      
      v2: fix it in __icmp_send() and decode_session{4,6} separately instead of
      updating shared dst dev in {ip_md, ip6}_tunnel_xmit.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      9b45ff91
    • Hangbin Liu's avatar
      xfrm/xfrm_policy: fix dst dev null pointer dereference in collect_md mode · c3b4c3a4
      Hangbin Liu authored
      In decode_session{4,6} there is a possibility that the skb dst dev is NULL,
      e,g, with tunnel collect_md mode, which will cause kernel crash.
      Here is what the code path looks like, for GRE:
      
      - ip6gre_tunnel_xmit
        - ip6gre_xmit_ipv6
          - __gre6_xmit
            - ip6_tnl_xmit
              - if skb->len - t->tun_hlen - eth_hlen > mtu; return -EMSGSIZE
          - icmpv6_send
            - icmpv6_route_lookup
              - xfrm_decode_session_reverse
                - decode_session4
                  - oif = skb_dst(skb)->dev->ifindex; <-- here
                - decode_session6
                  - oif = skb_dst(skb)->dev->ifindex; <-- here
      
      The reason is __metadata_dst_init() init dst->dev to NULL by default.
      We could not fix it in __metadata_dst_init() as there is no dev supplied.
      On the other hand, the skb_dst(skb)->dev is actually not needed as we
      called decode_session{4,6} via xfrm_decode_session_reverse(), so oif is not
      used by: fl4->flowi4_oif = reverse ? skb->skb_iif : oif;
      
      So make a dst dev check here should be clean and safe.
      
      v4: No changes.
      
      v3: No changes.
      
      v2: fix the issue in decode_session{4,6} instead of updating shared dst dev
      in {ip_md, ip6}_tunnel_xmit.
      
      Fixes: 8d79266b ("ip6_tunnel: add collect_md mode to IPv6 tunnels")
      Signed-off-by: default avatarHangbin Liu <liuhangbin@gmail.com>
      Tested-by: default avatarJonathan Lemon <jonathan.lemon@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      c3b4c3a4
    • Hangbin Liu's avatar
      ipv4/icmp: fix rt dst dev null pointer dereference · e2c69393
      Hangbin Liu authored
      In __icmp_send() there is a possibility that the rt->dst.dev is NULL,
      e,g, with tunnel collect_md mode, which will cause kernel crash.
      Here is what the code path looks like, for GRE:
      
      - ip6gre_tunnel_xmit
        - ip6gre_xmit_ipv4
          - __gre6_xmit
            - ip6_tnl_xmit
              - if skb->len - t->tun_hlen - eth_hlen > mtu; return -EMSGSIZE
          - icmp_send
            - net = dev_net(rt->dst.dev); <-- here
      
      The reason is __metadata_dst_init() init dst->dev to NULL by default.
      We could not fix it in __metadata_dst_init() as there is no dev supplied.
      On the other hand, the reason we need rt->dst.dev is to get the net.
      So we can just try get it from skb->dev when rt->dst.dev is NULL.
      
      v4: Julian Anastasov remind skb->dev also could be NULL. We'd better
      still use dst.dev and do a check to avoid crash.
      
      v3: No changes.
      
      v2: fix the issue in __icmp_send() instead of updating shared dst dev
      in {ip_md, ip6}_tunnel_xmit.
      
      Fixes: c8b34e68 ("ip_tunnel: Add tnl_update_pmtu in ip_md_tunnel_xmit")
      Signed-off-by: default avatarHangbin Liu <liuhangbin@gmail.com>
      Reviewed-by: default avatarJulian Anastasov <ja@ssi.bg>
      Acked-by: default avatarJonathan Lemon <jonathan.lemon@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      e2c69393
    • Yi-Hung Wei's avatar
      openvswitch: Fix log message in ovs conntrack · 12c6bc38
      Yi-Hung Wei authored
      Fixes: 06bd2bdf ("openvswitch: Add timeout support to ct action")
      Signed-off-by: default avatarYi-Hung Wei <yihung.wei@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      12c6bc38
    • David S. Miller's avatar
      Merge branch 'ieee802154-for-davem-2019-08-24' of... · 12e2e15d
      David S. Miller authored
      Merge branch 'ieee802154-for-davem-2019-08-24' of git://git.kernel.org/pub/scm/linux/kernel/git/sschmidt/wpan
      
      Stefan Schmidt says:
      
      ====================
      pull-request: ieee802154 for net 2019-08-24
      
      An update from ieee802154 for your *net* tree.
      
      Yue Haibing fixed two bugs discovered by KASAN in the hwsim driver for
      ieee802154 and Colin Ian King cleaned up a redundant variable assignment.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      12e2e15d
    • David S. Miller's avatar
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf · 211c4624
      David S. Miller authored
      Daniel Borkmann says:
      
      ====================
      pull-request: bpf 2019-08-24
      
      The following pull-request contains BPF updates for your *net* tree.
      
      The main changes are:
      
      1) Fix verifier precision tracking with BPF-to-BPF calls, from Alexei.
      
      2) Fix a use-after-free in prog symbol exposure, from Daniel.
      
      3) Several s390x JIT fixes plus BE related fixes in BPF kselftests, from Ilya.
      
      4) Fix memory leak by unpinning XDP umem pages in error path, from Ivan.
      
      5) Fix a potential use-after-free on flow dissector detach, from Jakub.
      
      6) Fix bpftool to close prog fd after showing metadata, from Quentin.
      
      7) BPF kselftest config and TEST_PROGS_EXTENDED fixes, from Anders.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      211c4624
  3. 23 Aug, 2019 7 commits
    • Ilya Leoshkevich's avatar
      bpf: allow narrow loads of some sk_reuseport_md fields with offset > 0 · 2c238177
      Ilya Leoshkevich authored
      test_select_reuseport fails on s390 due to verifier rejecting
      test_select_reuseport_kern.o with the following message:
      
      	; data_check.eth_protocol = reuse_md->eth_protocol;
      	18: (69) r1 = *(u16 *)(r6 +22)
      	invalid bpf_context access off=22 size=2
      
      This is because on big-endian machines casts from __u32 to __u16 are
      generated by referencing the respective variable as __u16 with an offset
      of 2 (as opposed to 0 on little-endian machines).
      
      The verifier already has all the infrastructure in place to allow such
      accesses, it's just that they are not explicitly enabled for
      eth_protocol field. Enable them for eth_protocol field by using
      bpf_ctx_range instead of offsetof.
      
      Ditto for ip_protocol, bind_inany and len, since they already allow
      narrowing, and the same problem can arise when working with them.
      
      Fixes: 2dbb9b9e ("bpf: Introduce BPF_PROG_TYPE_SK_REUSEPORT")
      Signed-off-by: default avatarIlya Leoshkevich <iii@linux.ibm.com>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      2c238177
    • Daniel Borkmann's avatar
      bpf: fix use after free in prog symbol exposure · c751798a
      Daniel Borkmann authored
      syzkaller managed to trigger the warning in bpf_jit_free() which checks via
      bpf_prog_kallsyms_verify_off() for potentially unlinked JITed BPF progs
      in kallsyms, and subsequently trips over GPF when walking kallsyms entries:
      
        [...]
        8021q: adding VLAN 0 to HW filter on device batadv0
        8021q: adding VLAN 0 to HW filter on device batadv0
        WARNING: CPU: 0 PID: 9869 at kernel/bpf/core.c:810 bpf_jit_free+0x1e8/0x2a0
        Kernel panic - not syncing: panic_on_warn set ...
        CPU: 0 PID: 9869 Comm: kworker/0:7 Not tainted 5.0.0-rc8+ #1
        Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
        Workqueue: events bpf_prog_free_deferred
        Call Trace:
         __dump_stack lib/dump_stack.c:77 [inline]
         dump_stack+0x113/0x167 lib/dump_stack.c:113
         panic+0x212/0x40b kernel/panic.c:214
         __warn.cold.8+0x1b/0x38 kernel/panic.c:571
         report_bug+0x1a4/0x200 lib/bug.c:186
         fixup_bug arch/x86/kernel/traps.c:178 [inline]
         do_error_trap+0x11b/0x200 arch/x86/kernel/traps.c:271
         do_invalid_op+0x36/0x40 arch/x86/kernel/traps.c:290
         invalid_op+0x14/0x20 arch/x86/entry/entry_64.S:973
        RIP: 0010:bpf_jit_free+0x1e8/0x2a0
        Code: 02 4c 89 e2 83 e2 07 38 d0 7f 08 84 c0 0f 85 86 00 00 00 48 ba 00 02 00 00 00 00 ad de 0f b6 43 02 49 39 d6 0f 84 5f fe ff ff <0f> 0b e9 58 fe ff ff 48 b8 00 00 00 00 00 fc ff df 4c 89 e2 48 c1
        RSP: 0018:ffff888092f67cd8 EFLAGS: 00010202
        RAX: 0000000000000007 RBX: ffffc90001947000 RCX: ffffffff816e9d88
        RDX: dead000000000200 RSI: 0000000000000008 RDI: ffff88808769f7f0
        RBP: ffff888092f67d00 R08: fffffbfff1394059 R09: fffffbfff1394058
        R10: fffffbfff1394058 R11: ffffffff89ca02c7 R12: ffffc90001947002
        R13: ffffc90001947020 R14: ffffffff881eca80 R15: ffff88808769f7e8
        BUG: unable to handle kernel paging request at fffffbfff400d000
        #PF error: [normal kernel read fault]
        PGD 21ffee067 P4D 21ffee067 PUD 21ffed067 PMD 9f942067 PTE 0
        Oops: 0000 [#1] PREEMPT SMP KASAN
        CPU: 0 PID: 9869 Comm: kworker/0:7 Not tainted 5.0.0-rc8+ #1
        Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
        Workqueue: events bpf_prog_free_deferred
        RIP: 0010:bpf_get_prog_addr_region kernel/bpf/core.c:495 [inline]
        RIP: 0010:bpf_tree_comp kernel/bpf/core.c:558 [inline]
        RIP: 0010:__lt_find include/linux/rbtree_latch.h:115 [inline]
        RIP: 0010:latch_tree_find include/linux/rbtree_latch.h:208 [inline]
        RIP: 0010:bpf_prog_kallsyms_find+0x107/0x2e0 kernel/bpf/core.c:632
        Code: 00 f0 ff ff 44 38 c8 7f 08 84 c0 0f 85 fa 00 00 00 41 f6 45 02 01 75 02 0f 0b 48 39 da 0f 82 92 00 00 00 48 89 d8 48 c1 e8 03 <42> 0f b6 04 30 84 c0 74 08 3c 03 0f 8e 45 01 00 00 8b 03 48 c1 e0
        [...]
      
      Upon further debugging, it turns out that whenever we trigger this
      issue, the kallsyms removal in bpf_prog_ksym_node_del() was /skipped/
      but yet bpf_jit_free() reported that the entry is /in use/.
      
      Problem is that symbol exposure via bpf_prog_kallsyms_add() but also
      perf_event_bpf_event() were done /after/ bpf_prog_new_fd(). Once the
      fd is exposed to the public, a parallel close request came in right
      before we attempted to do the bpf_prog_kallsyms_add().
      
      Given at this time the prog reference count is one, we start to rip
      everything underneath us via bpf_prog_release() -> bpf_prog_put().
      The memory is eventually released via deferred free, so we're seeing
      that bpf_jit_free() has a kallsym entry because we added it from
      bpf_prog_load() but /after/ bpf_prog_put() from the remote CPU.
      
      Therefore, move both notifications /before/ we install the fd. The
      issue was never seen between bpf_prog_alloc_id() and bpf_prog_new_fd()
      because upon bpf_prog_get_fd_by_id() we'll take another reference to
      the BPF prog, so we're still holding the original reference from the
      bpf_prog_load().
      
      Fixes: 6ee52e2a ("perf, bpf: Introduce PERF_RECORD_BPF_EVENT")
      Fixes: 74451e66 ("bpf: make jited programs visible in traces")
      Reported-by: syzbot+bd3bba6ff3fcea7a6ec6@syzkaller.appspotmail.com
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Cc: Song Liu <songliubraving@fb.com>
      c751798a
    • Alexei Starovoitov's avatar
      bpf: fix precision tracking in presence of bpf2bpf calls · 6754172c
      Alexei Starovoitov authored
      While adding extra tests for precision tracking and extra infra
      to adjust verifier heuristics the existing test
      "calls: cross frame pruning - liveness propagation" started to fail.
      The root cause is the same as described in verifer.c comment:
      
       * Also if parent's curframe > frame where backtracking started,
       * the verifier need to mark registers in both frames, otherwise callees
       * may incorrectly prune callers. This is similar to
       * commit 7640ead9 ("bpf: verifier: make sure callees don't prune with caller differences")
       * For now backtracking falls back into conservative marking.
      
      Turned out though that returning -ENOTSUPP from backtrack_insn() and
      doing mark_all_scalars_precise() in the current parentage chain is not enough.
      Depending on how is_state_visited() heuristic is creating parentage chain
      it's possible that callee will incorrectly prune caller.
      Fix the issue by setting precise=true earlier and more aggressively.
      Before this fix the precision tracking _within_ functions that don't do
      bpf2bpf calls would still work. Whereas now precision tracking is completely
      disabled when bpf2bpf calls are present anywhere in the program.
      
      No difference in cilium tests (they don't have bpf2bpf calls).
      No difference in test_progs though some of them have bpf2bpf calls,
      but precision tracking wasn't effective there.
      
      Fixes: b5dc0163 ("bpf: precise scalar_value tracking")
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      6754172c
    • Jakub Sitnicki's avatar
      flow_dissector: Fix potential use-after-free on BPF_PROG_DETACH · db38de39
      Jakub Sitnicki authored
      Call to bpf_prog_put(), with help of call_rcu(), queues an RCU-callback to
      free the program once a grace period has elapsed. The callback can run
      together with new RCU readers that started after the last grace period.
      New RCU readers can potentially see the "old" to-be-freed or already-freed
      pointer to the program object before the RCU update-side NULLs it.
      
      Reorder the operations so that the RCU update-side resets the protected
      pointer before the end of the grace period after which the program will be
      freed.
      
      Fixes: d58e468b ("flow_dissector: implements flow dissector BPF hook")
      Reported-by: default avatarLorenz Bauer <lmb@cloudflare.com>
      Signed-off-by: default avatarJakub Sitnicki <jakub@cloudflare.com>
      Acked-by: default avatarPetar Penkov <ppenkov@google.com>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      db38de39
    • Heiner Kallweit's avatar
      Revert "r8169: remove not needed call to dma_sync_single_for_device" · 345b9326
      Heiner Kallweit authored
      This reverts commit f072218c.
      
      As reported by Aaro this patch causes network problems on
      MIPS Loongson platform. Therefore revert it.
      
      Fixes: f072218c ("r8169: remove not needed call to dma_sync_single_for_device")
      Signed-off-by: default avatarHeiner Kallweit <hkallweit1@gmail.com>
      Reported-by: default avatarAaro Koskinen <aaro.koskinen@iki.fi>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      345b9326
    • Sabrina Dubroca's avatar
      ipv6: propagate ipv6_add_dev's error returns out of ipv6_find_idev · db0b99f5
      Sabrina Dubroca authored
      Currently, ipv6_find_idev returns NULL when ipv6_add_dev fails,
      ignoring the specific error value. This results in addrconf_add_dev
      returning ENOBUFS in all cases, which is unfortunate in cases such as:
      
          # ip link add dummyX type dummy
          # ip link set dummyX mtu 1200 up
          # ip addr add 2000::/64 dev dummyX
          RTNETLINK answers: No buffer space available
      
      Commit a317a2f1 ("ipv6: fail early when creating netdev named all
      or default") introduced error returns in ipv6_add_dev. Before that,
      that function would simply return NULL for all failures.
      Signed-off-by: default avatarSabrina Dubroca <sd@queasysnail.net>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      db0b99f5
    • Justin.Lee1@Dell.com's avatar
      net/ncsi: Fix the payload copying for the request coming from Netlink · f6edbf2d
      Justin.Lee1@Dell.com authored
      The request coming from Netlink should use the OEM generic handler.
      
      The standard command handler expects payload in bytes/words/dwords
      but the actual payload is stored in data if the request is coming from Netlink.
      Signed-off-by: default avatarJustin Lee <justin.lee1@dell.com>
      Reviewed-by: default avatarVijay Khemka <vijaykhemka@fb.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      f6edbf2d
  4. 22 Aug, 2019 11 commits
  5. 21 Aug, 2019 7 commits