Commit 755ed5e3 authored by Patrick Steinhardt's avatar Patrick Steinhardt

checks: Fix combinatorial explosion in `#commits_for()`

The function `#commits_for()` has been introduced via 156ce433
(checks: Implement infrastructure for bulk diff checks, 2021-07-29) in
order to allow bulk-loading of commits. What this function does is given
a set of new commits and a specific object ID, it will do a graph walk
of these new commits starting from this ID in order to extract only
those commits which have been newly introduced via this object ID.

As it turns out though, the implementation has a bug which causes
combinatorial explosion: if a commit is reachable via multiple commits,
then it will be walked and returned multiple times. This can happen if
there are merge commits, where we'll now repeatedly walk all common
ancestors of both commits. In case these again contain merge commits,
the common ancestors again get walked multiple times. The result is that
commits get walked exponentially many times. For the following graph
with criss-cross merges, this causes us to return 768 commits instead of
the expected 18:

      o---o---o---o---o---o---o---o
     / \ / \ / \ / \ / \ / \ / \ / \
    o   X   X   X   X   X   X   X   o
    \  / \ / \ / \ / \ / \ / \ / \ /
      o---o---o---o---o---o---o---o

Fix the bug by removing seen commit IDs from the hash tracking commits
by their object ID. Furthermore, the pending queue is converted to a set
such that we don't re-add IDs which we have already seen before such
that it doesn't exhibit exponential growth.

Changelog: fixed
parent bb1ae909
...@@ -48,16 +48,28 @@ module Gitlab ...@@ -48,16 +48,28 @@ module Gitlab
commits_by_id = commits.index_by(&:id) commits_by_id = commits.index_by(&:id)
result = [] result = []
pending = [newrev] pending = Set[newrev]
# We go up the parent chain of our newrev and collect all commits which # We go up the parent chain of our newrev and collect all commits which
# are new. In case a commit's ID cannot be found in the set of new # are new. In case a commit's ID cannot be found in the set of new
# commits, then it must already be a preexisting commit. # commits, then it must already be a preexisting commit.
pending.each do |rev| while pending.any?
commit = commits_by_id[rev] rev = pending.first
pending.delete(rev)
# Remove the revision from commit candidates such that we don't walk
# it multiple times. If the hash doesn't contain the revision, then
# we have either already walked the commit or it's not new.
commit = commits_by_id.delete(rev)
next if commit.nil? next if commit.nil?
pending.push(*commit.parent_ids) # Only add the parent ID to the pending set if we actually know its
# commit to guards us against readding an ID which we have already
# queued up before.
commit.parent_ids.each do |parent_id|
pending.add(parent_id) if commits_by_id.has_key?(parent_id)
end
result << commit result << commit
end end
......
...@@ -182,23 +182,8 @@ RSpec.describe Gitlab::Checks::ChangesAccess do ...@@ -182,23 +182,8 @@ RSpec.describe Gitlab::Checks::ChangesAccess do
create_commit('b1', %w[b2 a2]), create_commit('b1', %w[b2 a2]),
create_commit('a2', %w[a3 b3]), create_commit('a2', %w[a3 b3]),
create_commit('b2', %w[b3 a3]), create_commit('b2', %w[b3 a3]),
create_commit('b2', %w[b3 a3]),
create_commit('a2', %w[a3 b3]),
create_commit('a3', %w[c]),
create_commit('b3', %w[c]),
create_commit('b3', %w[c]),
create_commit('a3', %w[c]),
create_commit('b3', %w[c]),
create_commit('a3', %w[c]),
create_commit('a3', %w[c]), create_commit('a3', %w[c]),
create_commit('b3', %w[c]), create_commit('b3', %w[c]),
create_commit('c', []),
create_commit('c', []),
create_commit('c', []),
create_commit('c', []),
create_commit('c', []),
create_commit('c', []),
create_commit('c', []),
create_commit('c', []) create_commit('c', [])
] ]
end end
......
Markdown is supported
0%
or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment