• Vladimir Oltean's avatar
    net/sched: taprio: keep child Qdisc refcount elevated at 2 in offload mode · 25b0d4e4
    Vladimir Oltean authored
    Normally, Qdiscs have one reference on them held by their owner and one
    held for each TXQ to which they are attached, however this is not the
    case with the children of an offloaded taprio. Instead, the taprio qdisc
    currently lives in the following fragile equilibrium.
    
    In the software scheduling case, taprio attaches itself (the root Qdisc)
    to all TXQs, thus having a refcount of 1 + the number of TX queues. In
    this mode, the q->qdiscs[] children are not visible directly to the
    Qdisc API. The lifetime of the Qdiscs from this private array lasts
    until qdisc_destroy() -> taprio_destroy().
    
    In the fully offloaded case, the root taprio has a refcount of 1, and
    all child q->qdiscs[] also have a refcount of 1. The child q->qdiscs[]
    are attached to the netdev TXQs directly and thus are visible to the
    Qdisc API, however taprio loses a reference to them very early - during
    qdisc_graft(parent==NULL) -> taprio_attach(). At that time, taprio frees
    the q->qdiscs[] array to not leak memory, but interestingly, it does not
    release a reference on these qdiscs because it doesn't effectively own
    them - they are created by taprio but owned by the Qdisc core, and will
    be freed by qdisc_graft(parent==NULL, new==NULL) -> qdisc_put(old) when
    the Qdisc is deleted or when the child Qdisc is replaced with something
    else.
    
    My interest is to change this equilibrium such that taprio also owns a
    reference on the q->qdiscs[] child Qdiscs for the lifetime of the root
    Qdisc, including in full offload mode. I want this because I would like
    taprio_leaf(), taprio_dump_class(), taprio_dump_class_stats() to have
    insight into q->qdiscs[] for the software scheduling mode - currently
    they look at dev_queue->qdisc_sleeping, which is, as mentioned, the same
    as the root taprio.
    
    The following set of changes is necessary:
    - don't free q->qdiscs[] early in taprio_attach(), free it late in
      taprio_destroy() for consistency with software mode. But:
    - currently that's not possible, because taprio doesn't own a reference
      on q->qdiscs[]. So hold that reference - once during the initial
      attach() and once during subsequent graft() calls when the child is
      changed.
    - always keep track of the current child in q->qdiscs[], even for full
      offload mode, so that we free in taprio_destroy() what we should, and
      not something stale.
    Signed-off-by: default avatarVladimir Oltean <vladimir.oltean@nxp.com>
    Link: https://lore.kernel.org/r/20230807193324.4128292-3-vladimir.oltean@nxp.comSigned-off-by: default avatarJakub Kicinski <kuba@kernel.org>
    25b0d4e4
sch_taprio.c 66.3 KB