Commit 836e66c2 authored by Daniel Borkmann's avatar Daniel Borkmann Committed by Alexei Starovoitov

bpf: Fix up bpf_skb_adjust_room helper's skb csum setting

Lorenz recently reported:

  In our TC classifier cls_redirect [0], we use the following sequence of
  helper calls to decapsulate a GUE (basically IP + UDP + custom header)
  encapsulated packet:

    bpf_skb_adjust_room(skb, -encap_len, BPF_ADJ_ROOM_MAC, BPF_F_ADJ_ROOM_FIXED_GSO)
    bpf_redirect(skb->ifindex, BPF_F_INGRESS)

  It seems like some checksums of the inner headers are not validated in
  this case. For example, a TCP SYN packet with invalid TCP checksum is
  still accepted by the network stack and elicits a SYN ACK. [...]

  That is, we receive the following packet from the driver:

    | ETH | IP | UDP | GUE | IP | TCP |
    skb->ip_summed == CHECKSUM_UNNECESSARY

  ip_summed is CHECKSUM_UNNECESSARY because our NICs do rx checksum offloading.
  On this packet we run skb_adjust_room_mac(-encap_len), and get the following:

    | ETH | IP | TCP |
    skb->ip_summed == CHECKSUM_UNNECESSARY

  Note that ip_summed is still CHECKSUM_UNNECESSARY. After bpf_redirect()'ing
  into the ingress, we end up in tcp_v4_rcv(). There, skb_checksum_init() is
  turned into a no-op due to CHECKSUM_UNNECESSARY.

The bpf_skb_adjust_room() helper is not aware of protocol specifics. Internally,
it handles the CHECKSUM_COMPLETE case via skb_postpull_rcsum(), but that does
not cover CHECKSUM_UNNECESSARY. In this case skb->csum_level of the original
skb prior to bpf_skb_adjust_room() call was 0, that is, covering UDP. Right now
there is no way to adjust the skb->csum_level. NICs that have checksum offload
disabled (CHECKSUM_NONE) or that support CHECKSUM_COMPLETE are not affected.

Use a safe default for CHECKSUM_UNNECESSARY by resetting to CHECKSUM_NONE and
add a flag to the helper called BPF_F_ADJ_ROOM_NO_CSUM_RESET that allows users
from opting out. Opting out is useful for the case where we don't remove/add
full protocol headers, or for the case where a user wants to adjust the csum
level manually e.g. through bpf_csum_level() helper that is added in subsequent
patch.

The bpf_skb_proto_{4_to_6,6_to_4}() for NAT64/46 translation from the BPF
bpf_skb_change_proto() helper uses bpf_skb_net_hdr_{push,pop}() pair internally
as well but doesn't change layers, only transitions between v4 to v6 and vice
versa, therefore no adoption is required there.

  [0] https://lore.kernel.org/bpf/20200424185556.7358-1-lmb@cloudflare.com/

Fixes: 2be7e212 ("bpf: add bpf_skb_adjust_room helper")
Reported-by: default avatarLorenz Bauer <lmb@cloudflare.com>
Reported-by: default avatarAlan Maguire <alan.maguire@oracle.com>
Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
Signed-off-by: default avatarLorenz Bauer <lmb@cloudflare.com>
Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
Reviewed-by: default avatarAlan Maguire <alan.maguire@oracle.com>
Link: https://lore.kernel.org/bpf/CACAyw9-uU_52esMd1JjuA80fRPHJv5vsSg8GnfW3t_qDU4aVKQ@mail.gmail.com/
Link: https://lore.kernel.org/bpf/11a90472e7cce83e76ddbfce81fdfce7bfc68808.1591108731.git.daniel@iogearbox.net
parent 9a25c1df
...@@ -3919,6 +3919,14 @@ static inline void __skb_incr_checksum_unnecessary(struct sk_buff *skb) ...@@ -3919,6 +3919,14 @@ static inline void __skb_incr_checksum_unnecessary(struct sk_buff *skb)
} }
} }
static inline void __skb_reset_checksum_unnecessary(struct sk_buff *skb)
{
if (skb->ip_summed == CHECKSUM_UNNECESSARY) {
skb->ip_summed = CHECKSUM_NONE;
skb->csum_level = 0;
}
}
/* Check if we need to perform checksum complete validation. /* Check if we need to perform checksum complete validation.
* *
* Returns true if checksum complete is needed, false otherwise * Returns true if checksum complete is needed, false otherwise
......
...@@ -1635,6 +1635,13 @@ union bpf_attr { ...@@ -1635,6 +1635,13 @@ union bpf_attr {
* Grow or shrink the room for data in the packet associated to * Grow or shrink the room for data in the packet associated to
* *skb* by *len_diff*, and according to the selected *mode*. * *skb* by *len_diff*, and according to the selected *mode*.
* *
* By default, the helper will reset any offloaded checksum
* indicator of the skb to CHECKSUM_NONE. This can be avoided
* by the following flag:
*
* * **BPF_F_ADJ_ROOM_NO_CSUM_RESET**: Do not reset offloaded
* checksum data of the skb to CHECKSUM_NONE.
*
* There are two supported modes at this time: * There are two supported modes at this time:
* *
* * **BPF_ADJ_ROOM_MAC**: Adjust room at the mac layer * * **BPF_ADJ_ROOM_MAC**: Adjust room at the mac layer
...@@ -3433,6 +3440,7 @@ enum { ...@@ -3433,6 +3440,7 @@ enum {
BPF_F_ADJ_ROOM_ENCAP_L3_IPV6 = (1ULL << 2), BPF_F_ADJ_ROOM_ENCAP_L3_IPV6 = (1ULL << 2),
BPF_F_ADJ_ROOM_ENCAP_L4_GRE = (1ULL << 3), BPF_F_ADJ_ROOM_ENCAP_L4_GRE = (1ULL << 3),
BPF_F_ADJ_ROOM_ENCAP_L4_UDP = (1ULL << 4), BPF_F_ADJ_ROOM_ENCAP_L4_UDP = (1ULL << 4),
BPF_F_ADJ_ROOM_NO_CSUM_RESET = (1ULL << 5),
}; };
enum { enum {
......
...@@ -3113,7 +3113,8 @@ static int bpf_skb_net_shrink(struct sk_buff *skb, u32 off, u32 len_diff, ...@@ -3113,7 +3113,8 @@ static int bpf_skb_net_shrink(struct sk_buff *skb, u32 off, u32 len_diff,
{ {
int ret; int ret;
if (flags & ~BPF_F_ADJ_ROOM_FIXED_GSO) if (unlikely(flags & ~(BPF_F_ADJ_ROOM_FIXED_GSO |
BPF_F_ADJ_ROOM_NO_CSUM_RESET)))
return -EINVAL; return -EINVAL;
if (skb_is_gso(skb) && !skb_is_gso_tcp(skb)) { if (skb_is_gso(skb) && !skb_is_gso_tcp(skb)) {
...@@ -3163,7 +3164,8 @@ BPF_CALL_4(bpf_skb_adjust_room, struct sk_buff *, skb, s32, len_diff, ...@@ -3163,7 +3164,8 @@ BPF_CALL_4(bpf_skb_adjust_room, struct sk_buff *, skb, s32, len_diff,
u32 off; u32 off;
int ret; int ret;
if (unlikely(flags & ~BPF_F_ADJ_ROOM_MASK)) if (unlikely(flags & ~(BPF_F_ADJ_ROOM_MASK |
BPF_F_ADJ_ROOM_NO_CSUM_RESET)))
return -EINVAL; return -EINVAL;
if (unlikely(len_diff_abs > 0xfffU)) if (unlikely(len_diff_abs > 0xfffU))
return -EFAULT; return -EFAULT;
...@@ -3191,6 +3193,8 @@ BPF_CALL_4(bpf_skb_adjust_room, struct sk_buff *, skb, s32, len_diff, ...@@ -3191,6 +3193,8 @@ BPF_CALL_4(bpf_skb_adjust_room, struct sk_buff *, skb, s32, len_diff,
ret = shrink ? bpf_skb_net_shrink(skb, off, len_diff_abs, flags) : ret = shrink ? bpf_skb_net_shrink(skb, off, len_diff_abs, flags) :
bpf_skb_net_grow(skb, off, len_diff_abs, flags); bpf_skb_net_grow(skb, off, len_diff_abs, flags);
if (!ret && !(flags & BPF_F_ADJ_ROOM_NO_CSUM_RESET))
__skb_reset_checksum_unnecessary(skb);
bpf_compute_data_pointers(skb); bpf_compute_data_pointers(skb);
return ret; return ret;
......
...@@ -1635,6 +1635,13 @@ union bpf_attr { ...@@ -1635,6 +1635,13 @@ union bpf_attr {
* Grow or shrink the room for data in the packet associated to * Grow or shrink the room for data in the packet associated to
* *skb* by *len_diff*, and according to the selected *mode*. * *skb* by *len_diff*, and according to the selected *mode*.
* *
* By default, the helper will reset any offloaded checksum
* indicator of the skb to CHECKSUM_NONE. This can be avoided
* by the following flag:
*
* * **BPF_F_ADJ_ROOM_NO_CSUM_RESET**: Do not reset offloaded
* checksum data of the skb to CHECKSUM_NONE.
*
* There are two supported modes at this time: * There are two supported modes at this time:
* *
* * **BPF_ADJ_ROOM_MAC**: Adjust room at the mac layer * * **BPF_ADJ_ROOM_MAC**: Adjust room at the mac layer
...@@ -3433,6 +3440,7 @@ enum { ...@@ -3433,6 +3440,7 @@ enum {
BPF_F_ADJ_ROOM_ENCAP_L3_IPV6 = (1ULL << 2), BPF_F_ADJ_ROOM_ENCAP_L3_IPV6 = (1ULL << 2),
BPF_F_ADJ_ROOM_ENCAP_L4_GRE = (1ULL << 3), BPF_F_ADJ_ROOM_ENCAP_L4_GRE = (1ULL << 3),
BPF_F_ADJ_ROOM_ENCAP_L4_UDP = (1ULL << 4), BPF_F_ADJ_ROOM_ENCAP_L4_UDP = (1ULL << 4),
BPF_F_ADJ_ROOM_NO_CSUM_RESET = (1ULL << 5),
}; };
enum { enum {
......
Markdown is supported
0%
or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment