spec/lib/banzai/reference_parser/base_parser_spec.rb · e8faadaefb658f9b2ba52cf86d6b7ca13df4af64 · nexedi / gitlab-ce

Prevent stack overflowing with a lot of user references · e8faadae

Sean McGivern authored Dec 24, 2019

When an issue, MR, or epic contains a lot of user references (even to
the same user), we use the method
Banzai::ReferenceParser::BaseParser.collection_objects_for_ids to load
those users and maintain the loaded users for the current request in a
RequestStore cache.

This method calls Hash#values_at with the splatted IDs - so one argument
per ID passed in. If enough IDs are passed in, it's possible for the
arguments here to overflow the stack.

Let's say we have:

    ids = [1] * 1_000_000; ids.length
    #=> 1000000
    cache = {}
    #=> {}
    cache.values_at(*ids).compact
    # SystemStackError: stack level too deep

We could do this `cache.values_at(*ids.uniq).compact`, but that would
fail in this case:

    distinct_ids = 1.upto(1_000_000).to_a; distinct_ids.length
    #=> 1000000
    cache.values_at(*distinct_ids.uniq).compact
    # SystemStackError: stack level too deep

So the best option seems to be to just write a slower, but
non-stack-consuming, version:

    Benchmark.realtime { ids.uniq.map { |id| cache[id] }.compact }
    #=> 0.15192799968644977
    Benchmark.realtime { distinct_ids.uniq.map { |id| cache[id] }.compact }
    #=> 0.3621319998055696

To test this locally, you can replace the last issue's description with
a lot of mentions of a user (here I've created a user with the username
't'), but it takes a long time:

    problem = ('@t ' * 1_000_000)[0...1_000_000]; problem.length
    #=> 1000000
    Issue.last.update!(description: problem_string)

Visiting that issue in the UI will then overflow the stack. With this
change, it won't (although it is very slow).

e8faadae