    bpf: Keep the (rcv) timestamp behavior for the existing tc-bpf@ingress · 7449197d
    Martin KaFai Lau authored
    The current tc-bpf@ingress reads and writes __sk_buff->tstamp
    as a (rcv) timestamp, which is currently either 0 (not available)
    or ktime_get_real().  This patch keeps backward compatibility with
    the (rcv) timestamp expectation at ingress.  If skb->tstamp holds
    the delivery_time, the bpf insn rewrite will read 0 for tc-bpf
    running at ingress, since a (rcv) timestamp is not available.  When
    writing at ingress, it will also clear the skb->mono_delivery_time bit.
    
    /* BPF_READ: a = __sk_buff->tstamp */
    if (!skb->tc_at_ingress || !skb->mono_delivery_time)
    	a = skb->tstamp;
    else
    	a = 0;
    
    /* BPF_WRITE: __sk_buff->tstamp = a */
    if (skb->tc_at_ingress)
    	skb->mono_delivery_time = 0;
    skb->tstamp = a;
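The two rewrites above can be modeled in plain C as a minimal sketch. The struct `mini_skb` and the functions `tstamp_read()` and `tstamp_write()` are hypothetical stand-ins for illustration only; they are not the kernel's sk_buff, and the real convert_ctx_access() emits BPF instructions rather than calling C functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the sk_buff fields the rewrite touches. */
struct mini_skb {
	uint64_t tstamp;         /* (rcv) timestamp or mono delivery time */
	bool mono_delivery_time; /* true: tstamp holds a mono delivery time */
	bool tc_at_ingress;      /* true: tc-bpf is running at ingress */
};

/* Hypothetical model of BPF_READ: a = __sk_buff->tstamp */
uint64_t tstamp_read(const struct mini_skb *skb)
{
	assert(skb);
	/* Report 0 only at ingress when tstamp holds a mono delivery time. */
	if (!skb->tc_at_ingress || !skb->mono_delivery_time)
		return skb->tstamp;
	return 0;
}

/* Hypothetical model of BPF_WRITE: __sk_buff->tstamp = a */
void tstamp_write(struct mini_skb *skb, uint64_t a)
{
	assert(skb);
	/* At ingress the written value is treated as a (rcv) timestamp. */
	if (skb->tc_at_ingress)
		skb->mono_delivery_time = false;
	skb->tstamp = a;
}
```

For example, an skb with `tstamp = 1000` and `mono_delivery_time` set reads back as 0 at ingress but as 1000 at egress, and a write at ingress clears `mono_delivery_time`.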
    
    [ A note on BPF_CGROUP_INET_INGRESS, which can also access
      skb->tstamp.  By that point, the skb is being delivered locally
      and skb_clear_delivery_time() has already been called,
      so skb->tstamp will only hold the (rcv) timestamp. ]
    
    If tc-bpf@egress writes 0 to skb->tstamp, the skb->mono_delivery_time
    bit has to be cleared as well.  This could be done in
    convert_ctx_access().  However, a later patch will also expose
    the skb->mono_delivery_time bit as __sk_buff->delivery_time_type.
    Changing delivery_time_type in the background may surprise
    the user, e.g. a second read of __sk_buff->delivery_time_type
    may need a READ_ONCE() to avoid compiler optimization.  Thus,
    anticipating the needs of the later patch, this patch checks
    !skb->tstamp after running the tc-bpf and clears the
    skb->mono_delivery_time bit if needed.  See the earlier discussion
    on v4 [0].
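The post-run check can be sketched as follows, assuming the program is modeled as a plain C callback rather than a real BPF program. The struct `mini_skb`, the type `prog_fn`, the wrapper `run_tc_bpf()` and the two example programs are hypothetical names for illustration; the verdict value 0 is arbitrary.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the relevant sk_buff fields. */
struct mini_skb {
	uint64_t tstamp;
	bool mono_delivery_time;
};

/* Hypothetical: a tc-bpf program modeled as a C callback. */
typedef int (*prog_fn)(struct mini_skb *skb);

/* Hypothetical wrapper: run the program, then fix up the mono bit. */
int run_tc_bpf(prog_fn prog, struct mini_skb *skb)
{
	int verdict;

	assert(prog && skb);
	verdict = prog(skb);
	/*
	 * Clearing the bit after the program returns, not at the write
	 * itself, means the program never sees it change underneath it.
	 */
	if (!skb->tstamp)
		skb->mono_delivery_time = false;
	return verdict;
}

/* Hypothetical example program: writes 0 to tstamp. */
int prog_clear_tstamp(struct mini_skb *skb)
{
	skb->tstamp = 0;
	return 0;
}

/* Hypothetical example program: leaves tstamp untouched. */
int prog_noop(struct mini_skb *skb)
{
	(void)skb;
	return 0;
}
```

Running `prog_clear_tstamp` leaves both `tstamp` and `mono_delivery_time` cleared, while `prog_noop` leaves the delivery time and its bit intact.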
    
    The bpf insn rewrite requires the skb's mono_delivery_time and
    tc_at_ingress bits.  They are moved up in sk_buff so that the bpf
    rewrite can be done at a fixed offset.  tc_skip_classify is moved
    together with tc_at_ingress.  To free one bit for mono_delivery_time,
    csum_not_inet, which is currently used by sctp, is moved down.
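Keeping both bits in one byte at a fixed offset lets the read rewrite use a single byte load and one mask compare instead of two separate bitfield accesses. A minimal sketch, assuming a hypothetical flags byte with hypothetical masks `MONO_DELIVERY_TIME_MASK` and `TC_AT_INGRESS_MASK`; the real bit positions in sk_buff depend on bitfield endianness and are not shown here.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bit positions within one flags byte at a fixed offset. */
#define MONO_DELIVERY_TIME_MASK (1u << 0)
#define TC_AT_INGRESS_MASK      (1u << 1)
#define MONO_TC_MASK (MONO_DELIVERY_TIME_MASK | TC_AT_INGRESS_MASK)

/* C11 compile-time check: both bits must fit in a single byte. */
static_assert((MONO_TC_MASK & 0xffu) == MONO_TC_MASK,
	      "mono/tc masks must fit in one byte");

/*
 * Hypothetical equivalent of the BPF_READ rewrite using one byte load:
 * (!tc_at_ingress || !mono_delivery_time) <=> (flags & both) != both
 */
uint64_t tstamp_read_flags(uint8_t flags, uint64_t tstamp)
{
	if ((flags & MONO_TC_MASK) != MONO_TC_MASK)
		return tstamp;
	return 0; /* at ingress with a delivery time: no (rcv) timestamp */
}
```

The rewrite is just De Morgan's law applied to the condition in the BPF_READ pseudocode: both bits set is the only case that reads 0, and unrelated bits in the same byte are masked off.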
    
    [0]: https://lore.kernel.org/bpf/20220217015043.khqwqklx45c4m4se@kafai-mbp.dhcp.thefacebook.com/
    
    Signed-off-by: Martin KaFai Lau <kafai@fb.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>