• Ido Schimmel's avatar
    nexthop: Fix infinite nexthop bucket dump when using maximum nexthop ID · 8743aeff
    Ido Schimmel authored
    A netlink dump callback can return a positive number to signal that more
    information needs to be dumped or zero to signal that the dump is
    complete. In the second case, the core netlink code will append the
    NLMSG_DONE message to the skb in order to indicate to user space that
    the dump is complete.
    
    The nexthop bucket dump callback always returns a positive number if
    nexthop buckets were filled in the provided skb, even if the dump is
    complete. This means that a dump will span at least two recvmsg() calls
    as long as nexthop buckets are present. In the last recvmsg() call the
    dump callback will not fill in any nexthop buckets because the previous
    call indicated that the dump should restart from the last dumped nexthop
    ID plus one.
    
     # ip link add name dummy1 up type dummy
     # ip nexthop add id 1 dev dummy1
     # ip nexthop add id 10 group 1 type resilient buckets 2
     # strace -e sendto,recvmsg -s 5 ip nexthop bucket
     sendto(3, [[{nlmsg_len=24, nlmsg_type=RTM_GETNEXTHOPBUCKET, nlmsg_flags=NLM_F_REQUEST|NLM_F_DUMP, nlmsg_seq=1691396980, nlmsg_pid=0}, {family=AF_UNSPEC, data="\x00\x00\x00\x00\x00"...}], {nlmsg_len=0, nlmsg_type=0 /* NLMSG_??? */, nlmsg_flags=0, nlmsg_seq=0, nlmsg_pid=0}], 152, 0, NULL, 0) = 152
     recvmsg(3, {msg_name={sa_family=AF_NETLINK, nl_pid=0, nl_groups=00000000}, msg_namelen=12, msg_iov=[{iov_base=NULL, iov_len=0}], msg_iovlen=1, msg_controllen=0, msg_flags=MSG_TRUNC}, MSG_PEEK|MSG_TRUNC) = 128
     recvmsg(3, {msg_name={sa_family=AF_NETLINK, nl_pid=0, nl_groups=00000000}, msg_namelen=12, msg_iov=[{iov_base=[[{nlmsg_len=64, nlmsg_type=RTM_NEWNEXTHOPBUCKET, nlmsg_flags=NLM_F_MULTI, nlmsg_seq=1691396980, nlmsg_pid=347}, {family=AF_UNSPEC, data="\x00\x00\x00\x00\x00"...}], [{nlmsg_len=64, nlmsg_type=RTM_NEWNEXTHOPBUCKET, nlmsg_flags=NLM_F_MULTI, nlmsg_seq=1691396980, nlmsg_pid=347}, {family=AF_UNSPEC, data="\x00\x00\x00\x00\x00"...}]], iov_len=32768}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0) = 128
     id 10 index 0 idle_time 6.66 nhid 1
     id 10 index 1 idle_time 6.66 nhid 1
     recvmsg(3, {msg_name={sa_family=AF_NETLINK, nl_pid=0, nl_groups=00000000}, msg_namelen=12, msg_iov=[{iov_base=NULL, iov_len=0}], msg_iovlen=1, msg_controllen=0, msg_flags=MSG_TRUNC}, MSG_PEEK|MSG_TRUNC) = 20
     recvmsg(3, {msg_name={sa_family=AF_NETLINK, nl_pid=0, nl_groups=00000000}, msg_namelen=12, msg_iov=[{iov_base=[{nlmsg_len=20, nlmsg_type=NLMSG_DONE, nlmsg_flags=NLM_F_MULTI, nlmsg_seq=1691396980, nlmsg_pid=347}, 0], iov_len=32768}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0) = 20
     +++ exited with 0 +++
    
    This behavior is both inefficient and buggy. If the last nexthop to be
    dumped had the maximum ID of 0xffffffff, then the dump will restart from
    0 (0xffffffff + 1) and never end:
    
     # ip link add name dummy1 up type dummy
     # ip nexthop add id 1 dev dummy1
     # ip nexthop add id $((2**32-1)) group 1 type resilient buckets 2
     # ip nexthop bucket
     id 4294967295 index 0 idle_time 5.55 nhid 1
     id 4294967295 index 1 idle_time 5.55 nhid 1
     id 4294967295 index 0 idle_time 5.55 nhid 1
     id 4294967295 index 1 idle_time 5.55 nhid 1
     [...]
    
    Fix by adjusting the dump callback to return zero when the dump is
    complete. After the fix only one recvmsg() call is made and the
    NLMSG_DONE message is appended to the RTM_NEWNEXTHOPBUCKET responses:
    
     # ip link add name dummy1 up type dummy
     # ip nexthop add id 1 dev dummy1
     # ip nexthop add id $((2**32-1)) group 1 type resilient buckets 2
     # strace -e sendto,recvmsg -s 5 ip nexthop bucket
     sendto(3, [[{nlmsg_len=24, nlmsg_type=RTM_GETNEXTHOPBUCKET, nlmsg_flags=NLM_F_REQUEST|NLM_F_DUMP, nlmsg_seq=1691396737, nlmsg_pid=0}, {family=AF_UNSPEC, data="\x00\x00\x00\x00\x00"...}], {nlmsg_len=0, nlmsg_type=0 /* NLMSG_??? */, nlmsg_flags=0, nlmsg_seq=0, nlmsg_pid=0}], 152, 0, NULL, 0) = 152
     recvmsg(3, {msg_name={sa_family=AF_NETLINK, nl_pid=0, nl_groups=00000000}, msg_namelen=12, msg_iov=[{iov_base=NULL, iov_len=0}], msg_iovlen=1, msg_controllen=0, msg_flags=MSG_TRUNC}, MSG_PEEK|MSG_TRUNC) = 148
     recvmsg(3, {msg_name={sa_family=AF_NETLINK, nl_pid=0, nl_groups=00000000}, msg_namelen=12, msg_iov=[{iov_base=[[{nlmsg_len=64, nlmsg_type=RTM_NEWNEXTHOPBUCKET, nlmsg_flags=NLM_F_MULTI, nlmsg_seq=1691396737, nlmsg_pid=350}, {family=AF_UNSPEC, data="\x00\x00\x00\x00\x00"...}], [{nlmsg_len=64, nlmsg_type=RTM_NEWNEXTHOPBUCKET, nlmsg_flags=NLM_F_MULTI, nlmsg_seq=1691396737, nlmsg_pid=350}, {family=AF_UNSPEC, data="\x00\x00\x00\x00\x00"...}], [{nlmsg_len=20, nlmsg_type=NLMSG_DONE, nlmsg_flags=NLM_F_MULTI, nlmsg_seq=1691396737, nlmsg_pid=350}, 0]], iov_len=32768}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0) = 148
     id 4294967295 index 0 idle_time 6.61 nhid 1
     id 4294967295 index 1 idle_time 6.61 nhid 1
     +++ exited with 0 +++
    
    Note that if the NLMSG_DONE message cannot be appended because of size
    limitations, then another recvmsg() will be needed, but the core netlink
    code will not invoke the dump callback and simply reply with a
    NLMSG_DONE message since it knows that the callback previously returned
    zero.
    
    Add a test that fails before the fix:
    
     # ./fib_nexthops.sh -t basic_res
     [...]
     TEST: Maximum nexthop ID dump                                       [FAIL]
     [...]
    
    And passes after it:
    
     # ./fib_nexthops.sh -t basic_res
     [...]
     TEST: Maximum nexthop ID dump                                       [ OK ]
     [...]
    
    Fixes: 8a1bbabb ("nexthop: Add netlink handlers for bucket dump")
    Signed-off-by: default avatarIdo Schimmel <idosch@nvidia.com>
    Reviewed-by: default avatarPetr Machata <petrm@nvidia.com>
    Reviewed-by: default avatarDavid Ahern <dsahern@kernel.org>
    Link: https://lore.kernel.org/r/20230808075233.3337922-4-idosch@nvidia.comSigned-off-by: default avatarJakub Kicinski <kuba@kernel.org>
    8743aeff
fib_nexthops.sh 69.1 KB