• Jamal Hadi Salim's avatar
    net, sched: Fix SKB_NOT_DROPPED_YET splat under debug config · 40cb2fdf
    Jamal Hadi Salim authored
    Getting the following splat [1] with CONFIG_DEBUG_NET=y and this
    reproducer [2]. Problem seems to be that classifiers clear 'struct
    tcf_result::drop_reason', thereby triggering the warning in
    __kfree_skb_reason() due to reason being 'SKB_NOT_DROPPED_YET' (0).
    
    Fixed by disambiguating a legit error from a verdict with a bogus drop_reason
    
    [1]
    WARNING: CPU: 0 PID: 181 at net/core/skbuff.c:1082 kfree_skb_reason+0x38/0x130
    Modules linked in:
    CPU: 0 PID: 181 Comm: mausezahn Not tainted 6.6.0-rc6-custom-ge43e6d95 #682
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-1.fc37 04/01/2014
    RIP: 0010:kfree_skb_reason+0x38/0x130
    [...]
    Call Trace:
     <IRQ>
     __netif_receive_skb_core.constprop.0+0x837/0xdb0
     __netif_receive_skb_one_core+0x3c/0x70
     process_backlog+0x95/0x130
     __napi_poll+0x25/0x1b0
     net_rx_action+0x29b/0x310
     __do_softirq+0xc0/0x29b
     do_softirq+0x43/0x60
     </IRQ>
    
    [2]
    
    ip link add name veth0 type veth peer name veth1
    ip link set dev veth0 up
    ip link set dev veth1 up
    tc qdisc add dev veth1 clsact
    tc filter add dev veth1 ingress pref 1 proto all flower dst_mac 00:11:22:33:44:55 action drop
    mausezahn veth0 -a own -b 00:11:22:33:44:55 -q -c 1
    
    Ido reported:
    
      [...] getting the following splat [1] with CONFIG_DEBUG_NET=y and this
      reproducer [2]. Problem seems to be that classifiers clear 'struct
      tcf_result::drop_reason', thereby triggering the warning in
      __kfree_skb_reason() due to reason being 'SKB_NOT_DROPPED_YET' (0). [...]
    
      [1]
      WARNING: CPU: 0 PID: 181 at net/core/skbuff.c:1082 kfree_skb_reason+0x38/0x130
      Modules linked in:
      CPU: 0 PID: 181 Comm: mausezahn Not tainted 6.6.0-rc6-custom-ge43e6d95 #682
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-1.fc37 04/01/2014
      RIP: 0010:kfree_skb_reason+0x38/0x130
      [...]
      Call Trace:
       <IRQ>
       __netif_receive_skb_core.constprop.0+0x837/0xdb0
       __netif_receive_skb_one_core+0x3c/0x70
       process_backlog+0x95/0x130
       __napi_poll+0x25/0x1b0
       net_rx_action+0x29b/0x310
       __do_softirq+0xc0/0x29b
       do_softirq+0x43/0x60
       </IRQ>
    
      [2]
      #!/bin/bash
    
      ip link add name veth0 type veth peer name veth1
      ip link set dev veth0 up
      ip link set dev veth1 up
      tc qdisc add dev veth1 clsact
      tc filter add dev veth1 ingress pref 1 proto all flower dst_mac 00:11:22:33:44:55 action drop
      mausezahn veth0 -a own -b 00:11:22:33:44:55 -q -c 1
    
    What happens is that inside most classifiers the tcf_result is copied over
    from a filter template e.g. *res = f->res which then implicitly overrides
    the prior SKB_DROP_REASON_TC_{INGRESS,EGRESS} default drop code which was
    set via sch_handle_{ingress,egress}() for kfree_skb_reason().
    
    Commit text above copied verbatim from Daniel. The general idea of the patch
    is not very different from what Ido originally posted but instead done at the
    cls_api codepath.
    
    Fixes: 54a59aed
    
     ("net, sched: Make tc-related drop reason more flexible")
    Reported-by: default avatarIdo Schimmel <idosch@idosch.org>
    Signed-off-by: default avatarJamal Hadi Salim <jhs@mojatatu.com>
    Link: https://lore.kernel.org/netdev/ZTjY959R+AFXf3Xy@shredder
    
    Reviewed-by: default avatarSimon Horman <horms@kernel.org>
    Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
    40cb2fdf
cls_api.c 97.4 KB