    libbpf: Add BPF_CORE_WRITE_BITFIELD() macro · 2f708035
    Daniel Xu authored
    === Motivation ===
    
    Similar to reading from CO-RE bitfields, we need a CO-RE aware bitfield
    writing wrapper to make the verifier happy.
    
    Two alternatives to this approach are:
    
    1. Use the upcoming `preserve_static_offset` [0] attribute to disable
       CO-RE on specific structs.
    2. Use broader byte-sized writes to write to bitfields.
    
    (1) is a bit hard to use. It requires specific and not-very-obvious
    annotations to bpftool generated vmlinux.h. It's also not generally
    available in released LLVM versions yet.
    
    (2) makes the code quite hard to read and write. Especially if
    BPF_CORE_READ_BITFIELD() is already being used, it makes more sense
    to have an inverse helper for writing.
    
    === Implementation details ===
    
    Since the logic is a bit non-obvious, I thought it would be helpful
    to explain exactly what's going on.
    
    To start, it helps to explain what LSHIFT_U64 (lshift) and RSHIFT_U64
    (rshift) are designed to mean. Consider the core of the
    BPF_CORE_READ_BITFIELD() algorithm:
    
            val <<= __CORE_RELO(s, field, LSHIFT_U64);
            val = val >> __CORE_RELO(s, field, RSHIFT_U64);
    
    Basically what happens is we lshift to clear the non-relevant (blank)
    higher order bits. Then we rshift to bring the relevant bits (bitfield)
    down to LSB position (while also clearing blank lower order bits). To
    illustrate:
    
            Start:    ........XXX......
            Lshift:   XXX......00000000
            Rshift:   00000000000000XXX
    
    where `.` means blank bit, `0` means 0 bit, and `X` means bitfield bit.
    
    After the two operations, the bitfield is ready to be interpreted as a
    regular integer.
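
    The two shifts can be tried outside of BPF with a small sketch. The
    helper name extract_bitfield() is hypothetical and not part of libbpf.
    It takes the shift amounts as plain arguments instead of obtaining
    them from __CORE_RELO(), and it assumes an unsigned bitfield (a
    logical right shift).

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (not part of libbpf): extract an unsigned bitfield
 * from a u64 using the same two shifts as the snippet above. In real BPF
 * code the shift amounts come from __CORE_RELO(); here they are arguments.
 */
static uint64_t extract_bitfield(uint64_t val, unsigned int lshift,
				 unsigned int rshift)
{
	/* Shifting a 64-bit value by 64 or more is undefined in C. */
	assert(lshift <= rshift && rshift < 64);

	val <<= lshift;	/* clear the blank bits above the bitfield */
	val >>= rshift;	/* bring the bitfield down to the LSB position */
	return val;
}
```

    For a 3-bit field with 8 blank bits below it, lshift is 53 and rshift
    is 61. extract_bitfield(0xFFFFFFFFFFFFFDFFULL, 53, 61) then yields 5,
    no matter what the surrounding bits contain.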
    
    Next, we want to build an alternative (but more helpful) mental model
    of lshift and rshift. That is, to consider:
    
    * rshift as the total number of blank bits in the u64
    * lshift as the number of blank bits left of the bitfield in the u64
    
    Take a moment to consider why that is true by consulting the above
    diagram.
    
    With this insight, we can now define the following relationship:
    
                  bitfield
                     _
                    | |
            0.....00XXX0...00
            |      |   |    |
            |______|   |    |
             lshift    |    |
                       |____|
                  (rshift - lshift)
    
    That is, we know the number of higher order blank bits is just lshift.
    And the number of lower order blank bits is (rshift - lshift).
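
    As a quick check of this relationship, the sketch below derives both
    shift amounts from a bitfield width and the number of lower order
    blank bits. The struct and the helper shifts_for() are hypothetical
    names used only for illustration. In libbpf the values come from CO-RE
    relocations rather than from a computation like this.

```c
#include <assert.h>

/* Hypothetical pair of shift amounts for a bitfield inside a u64. */
struct bitfield_shifts {
	unsigned int lshift;	/* blank bits left of (above) the bitfield */
	unsigned int rshift;	/* total number of blank bits in the u64 */
};

/*
 * Hypothetical helper: low_blank is the number of blank bits below the
 * bitfield and width is the number of bitfield bits.
 */
static struct bitfield_shifts shifts_for(unsigned int low_blank,
					 unsigned int width)
{
	struct bitfield_shifts s;

	assert(width >= 1 && width <= 64 && low_blank <= 64 - width);

	s.rshift = 64 - width;		/* every bit outside the bitfield */
	s.lshift = s.rshift - low_blank; /* what is left sits above it */
	return s;
}
```

    For a 3-bit field with 8 blank bits below it, this gives rshift = 61
    and lshift = 53, so (rshift - lshift) recovers the 8 lower order blank
    bits.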
    
    Finally, we can examine the core of the write side algorithm:
    
            mask = (~0ULL << rshift) >> lshift;              // 1
            val = (val & ~mask) | ((nval << rpad) & mask);   // 2
    
    1. Compute a mask where the set bits are the bitfield bits. The first
       left shift zeros out exactly the number of blank bits, leaving a
       bitfield sized set of 1s. The subsequent right shift inserts the
       correct amount of higher order blank bits.
    
    2. On the left of the `|`, mask out the bitfield bits. This creates
       0s where the new bitfield bits will go. On the right of the `|`,
       shift nval left by rpad = (rshift - lshift), the number of lower
       order blank bits, to bring it into the correct bit position, and
       mask out any bits that fall outside of the bitfield. Finally,
       OR'ing the two halves gives the final set of bits to write back.
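
    The two steps can be exercised on plain integers with the sketch
    below. The function name write_bitfield() is hypothetical. It mirrors
    only the mask and merge logic and leaves out the CO-RE relocations and
    the sized load and store around it.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (not part of libbpf): return val with the bitfield
 * described by lshift/rshift replaced by nval.
 */
static uint64_t write_bitfield(uint64_t val, uint64_t nval,
			       unsigned int lshift, unsigned int rshift)
{
	unsigned int rpad;
	uint64_t mask;

	/* Shifting a 64-bit value by 64 or more is undefined in C. */
	assert(lshift <= rshift && rshift < 64);

	rpad = rshift - lshift;			/* lower order blank bits */
	mask = (~0ULL << rshift) >> lshift;	/* 1s exactly on the bitfield */

	/* Keep everything outside the bitfield, then merge in nval. */
	return (val & ~mask) | ((nval << rpad) & mask);
}
```

    With lshift = 53 and rshift = 61 the mask is 0x700. Writing 5 into an
    all-ones u64 gives 0xFFFFFFFFFFFFFDFF, and an oversized nval such as
    0xF is truncated to the 3 bits that fit.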
    
    [0]: https://reviews.llvm.org/D133361

    Co-developed-by: Eduard Zingerman <eddyz87@gmail.com>
    Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
    Co-developed-by: Jonathan Lemon <jlemon@aviatrix.com>
    Signed-off-by: Jonathan Lemon <jlemon@aviatrix.com>
    Acked-by: Andrii Nakryiko <andrii@kernel.org>
    Signed-off-by: Daniel Xu <dxu@dxuuu.xyz>
    Link: https://lore.kernel.org/r/4d3dd215a4fd57d980733886f9c11a45e1a9adf3.1702325874.git.dxu@dxuuu.xyz
    Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
bpf_core_read.h 20.2 KB