    cc: Symbol resolution with multiple executable regions per module · f141d1b9
    Sasha Goldshtein authored
    The symbol resolution code used to assume for most purposes that
    there is a single executable region per module. When there were
    several, there was no crash, but symbols were not resolved correctly.
    The reason is that the symbol offsets are relative to the first
    executable region's start address, but bcc would resolve them
    relative to the region in which they appeared. For example, consider
    the following memory regions of a module libfoo.so loaded into
    some process:
    
      1000-2000 r-xp libfoo.so
      2000-3000 rw-p libfoo.so
      3000-4000 r-xp libfoo.so
      4000-5000 r--- libfoo.so
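
    The listing above can be reduced to per-module executable ranges with a
    short parser. This is a minimal C++ sketch, not bcc's code: the `Range`
    struct and `executable_ranges()` function are hypothetical, and it
    assumes the simplified "start-end perms path" line format shown above,
    with hexadecimal addresses as in /proc/PID/maps.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical: one executable address range [start, end).
struct Range {
  uint64_t start;
  uint64_t end;
};

// Groups the executable regions of each module by path. Assumes the
// simplified "start-end perms path" format with hex addresses; real
// /proc/PID/maps lines also carry offset, device and inode fields,
// which this sketch does not handle.
std::map<std::string, std::vector<Range>> executable_ranges(
    const std::vector<std::string>& lines) {
  std::map<std::string, std::vector<Range>> modules;
  for (const std::string& line : lines) {
    std::istringstream in(line);
    std::string span, perms, path;
    if (!(in >> span >> perms >> path)) continue;       // skip malformed lines
    if (perms.size() < 3 || perms[2] != 'x') continue;  // keep executable regions only
    const auto dash = span.find('-');
    if (dash == std::string::npos) continue;
    Range r{static_cast<uint64_t>(std::stoull(span.substr(0, dash), nullptr, 16)),
            static_cast<uint64_t>(std::stoull(span.substr(dash + 1), nullptr, 16))};
    // Lines arrive in address order, so each module's ranges stay sorted.
    modules[path].push_back(r);
  }
  return modules;
}
```

    For the four lines above, libfoo.so ends up with two ranges,
    0x1000-0x2000 and 0x3000-0x4000; the rw-p and r--- regions are dropped.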
    
    Now, suppose there is a symbol bar() loaded at address 3500. In
    the binary on disk, bar() is at offset 2500 from the beginning of
    the module (but not the beginning of the 3000-4000 region!). When
    we look at the candidate regions, we find 3000-4000 and discover
    that 3500 lies within it. We then compute 3500-3000 = 500 as the
    offset from the beginning of that region, and look for a symbol
    that contains the relative address 500. As a result, we might find
    an unrelated symbol in the region 1000-2000 (the one covering
    address 1500), and report that address 3500 corresponds to that
    symbol rather than to bar().
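
    The failure can be reproduced with a small made-up symbol table. In this
    C++ sketch (not bcc's implementation), `find_symbol()`,
    `region_relative_offset()` and `example_symbols()` are illustrative
    names, the addresses are read as hexadecimal, and the symbol names and
    sizes are invented.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <optional>
#include <string>
#include <vector>

// Hypothetical types for illustration only.
struct Range { uint64_t start, end; };
struct Symbol { std::string name; uint64_t size; };

// Made-up symbol table for libfoo.so, keyed by offset from the first
// executable region (0x1000). Names and sizes are invented.
std::map<uint64_t, Symbol> example_symbols() {
  return {{0x450, {"unrelated_fn", 0x100}}, {0x2500, {"bar", 0x50}}};
}

// Returns the symbol whose [start, start + size) span covers offset.
std::optional<std::string> find_symbol(const std::map<uint64_t, Symbol>& syms,
                                       uint64_t offset) {
  auto it = syms.upper_bound(offset);  // first symbol starting after offset
  if (it == syms.begin()) return std::nullopt;
  --it;                                // last symbol starting at or before offset
  if (offset < it->first + it->second.size) return it->second.name;
  return std::nullopt;
}

// The old, buggy behaviour: the offset is measured from whichever
// executable region happens to contain the address.
std::optional<uint64_t> region_relative_offset(const std::vector<Range>& ranges,
                                               uint64_t addr) {
  for (const Range& r : ranges)
    if (addr >= r.start && addr < r.end) return addr - r.start;
  return std::nullopt;
}
```

    With regions 0x1000-0x2000 and 0x3000-0x4000, address 0x3500 yields the
    region-relative offset 0x500, which lands inside unrelated_fn, while the
    module-relative offset 0x3500 - 0x1000 = 0x2500 finds bar().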
    
    This commit fixes the situation by keeping only a single `Module`
    instance for each module, even if that module spans multiple
    executable regions. We record the start and end address of every
    executable region so we can determine whether an address (like 3500
    in the above example) lies within the module. To find the actual
    symbol, however, we need only the address's offset from the start
    of the _first_ executable region, and we look up the symbol at that
    offset.
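
    The fixed lookup can be sketched as a single module object that keeps
    every executable range but always measures offsets from the first one.
    The `Module`, `Range` and `Symbol` types and `example_module()` below are
    hypothetical and simplified; bcc's actual classes differ, and the sketch
    assumes the ranges are stored in ascending address order.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <optional>
#include <string>
#include <vector>

struct Range { uint64_t start, end; };
struct Symbol { std::string name; uint64_t size; };

// One Module per binary, however many executable regions it maps.
struct Module {
  std::string path;
  std::vector<Range> ranges;           // all executable regions, ascending
  std::map<uint64_t, Symbol> symbols;  // keyed by offset from the first region

  // True if addr falls inside any of the module's executable regions.
  bool contains(uint64_t addr) const {
    for (const Range& r : ranges)
      if (addr >= r.start && addr < r.end) return true;
    return false;
  }

  // Resolves addr using its offset from the start of the *first* region,
  // regardless of which region actually contains it.
  std::optional<std::string> resolve(uint64_t addr) const {
    if (!contains(addr)) return std::nullopt;  // also guarantees ranges is non-empty
    const uint64_t offset = addr - ranges.front().start;
    auto it = symbols.upper_bound(offset);
    if (it == symbols.begin()) return std::nullopt;
    --it;
    if (offset < it->first + it->second.size) return it->second.name;
    return std::nullopt;
  }
};

// Hypothetical libfoo.so from the example, with a made-up symbol table.
Module example_module() {
  return Module{"libfoo.so",
                {{0x1000, 0x2000}, {0x3000, 0x4000}},
                {{0x450, {"unrelated_fn", 0x100}}, {0x2500, {"bar", 0x50}}}};
}
```

    Here `resolve(0x3500)` returns "bar", and an address in the
    non-executable 0x2000-0x3000 gap is rejected by `contains()` before any
    symbol lookup happens.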
    
    This was discovered and fixed while tracing .NET Core processes on
    Linux, where libcoreclr.so (the main CLR binary) has several
    executable regions. Resolving symbols from any but the first region
    would produce totally bogus results.