Commit 8a5681f2 authored by Patrick Steinhardt

push_rules: Implement bulk-checking of file sizes

The file size check inspects the size of each newly pushed blob and
rejects the ref update if any blob exceeds a configured threshold. This
check is expensive, though: to find new blobs via a graph walk, we need
to walk both all preexisting refs and all new refs. As such, the check
scales not only with the number of changes, but also with the size of
the repository itself.

Now that `#new_blobs` knows to handle multiple new revisions at once in
a single RPC call to Gitaly, we can convert this check to use a single
bulk-load of new blobs. While this doesn't help much with the positive
side of the graph walk, it does amortize the negative walk over all
preexisting refs and will thus in most cases yield a significant
speedup when multiple changes are checked.
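
As a rough illustration of the amortization described above, the sketch
below compares one lookup per change with a single bulk lookup. It is
not GitLab's code: `FakeRepository`, `Blob`, `check_per_change` and
`check_bulk` are hypothetical names, and the counter merely stands in
for the cost of walking all preexisting refs once per call.

```ruby
# Hypothetical stand-ins for illustration only; not GitLab's actual API.
Blob = Struct.new(:path, :byte_size)

class FakeRepository
  attr_reader :negative_walks

  def initialize(blobs_by_rev)
    @blobs_by_rev = blobs_by_rev
    @negative_walks = 0
  end

  # Accepts a single revision or an array of revisions. Every call pays
  # for one simulated walk over all preexisting refs.
  def new_blobs(revs)
    @negative_walks += 1
    Array(revs).flat_map { |rev| @blobs_by_rev.fetch(rev, []) }
  end
end

# Assumption: one megabyte is treated as 1024 * 1024 bytes.
BYTES_PER_MB = 1024.0 * 1024.0

def find_large_blob(blobs, max_mb)
  blobs.find { |blob| blob.byte_size / BYTES_PER_MB > max_mb }
end

# One lookup per change: the negative walk is repeated for every revision.
def check_per_change(repo, newrevs, max_mb)
  newrevs.each do |rev|
    large = find_large_blob(repo.new_blobs(rev), max_mb)
    return large if large
  end
  nil
end

# One bulk lookup: the negative walk happens once for all revisions.
def check_bulk(repo, newrevs, max_mb)
  find_large_blob(repo.new_blobs(newrevs), max_mb)
end
```

With N pushed revisions, the per-change variant in this sketch performs
up to N simulated negative walks, while the bulk variant always performs
one.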

Ideally, we'd go even further and enumerate new blobs directly via the
quarantine directory: we wouldn't have to do a graph walk at all in this
case, but could just directly look up all new blobs. While this would be
as fast as we can get, the downside is that we wouldn't have blob paths
available anymore given that these blobs wouldn't have been walked via a
tree object. We would still be able to at least present the blob ID to
the user, but the user experience is definitely worse in this case.
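
To make the trade-off concrete, here is a minimal sketch of what a
path-less check could look like. It is not part of this commit. It
assumes the objects in the quarantine directory have already been listed
in the default `git cat-file --batch-check` output format
(`<object id> <type> <size>`); how that listing is obtained is left out,
and `oversized_blob_ids` is a hypothetical helper.

```ruby
# Assumption: one megabyte is treated as 1024 * 1024 bytes.
BYTES_PER_MEGABYTE = 1024.0 * 1024.0

# Takes text in the default `git cat-file --batch-check` format and
# returns the IDs of all blobs larger than the limit. Only object IDs
# are available here: no tree was walked, so no paths are known.
def oversized_blob_ids(batch_check_output, max_file_size_mb)
  batch_check_output.each_line.filter_map do |line|
    oid, type, size = line.split
    next unless type == "blob"

    oid if size.to_i / BYTES_PER_MEGABYTE > max_file_size_mb
  end
end

# Made-up sample listing, for illustration only.
sample = <<~OUT
  1111111111111111111111111111111111111111 blob 120
  2222222222222222222222222222222222222222 tree 64
  3333333333333333333333333333333333333333 blob 7340032
OUT

oversized_blob_ids(sample, 5).each do |oid|
  puts "Blob #{oid} is larger than the allowed size of 5 MB"
end
```

The resulting message can only name the blob ID, which is the user
experience regression mentioned above.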

We may still decide to take this step at a later point, given that it is a
huge performance win (e.g. on gitlab-org/gitlab, we're talking about
10ms vs 30s). But for now, this commit only does the uncontroversial
part of batch-computing new blobs.

Changelog: performance
parent 081d1780
@@ -13,20 +13,16 @@ module EE
       logger.log_timed(LOG_MESSAGE) do
         max_file_size = push_rule.max_file_size
-        changes_access.changes.each do |change|
-          newrev = change[:newrev]
+        newrevs = changes_access.changes.map { |change| change[:newrev] }
-          next if newrev.blank? || ::Gitlab::Git.blank_ref?(newrev)
+        blobs = project.repository.new_blobs(newrevs, dynamic_timeout: logger.time_left)
-          blobs = project.repository.new_blobs(newrev, dynamic_timeout: logger.time_left)
+        large_blob = blobs.find do |blob|
+          ::Gitlab::Utils.bytes_to_megabytes(blob.size) > max_file_size
+        end
-          large_blob = blobs.find do |blob|
-            ::Gitlab::Utils.bytes_to_megabytes(blob.size) > max_file_size
-          end
-          if large_blob
-            raise ::Gitlab::GitAccess::ForbiddenError, %Q{File "#{large_blob.path}" is larger than the allowed size of #{max_file_size} MB}
-          end
+        if large_blob
+          raise ::Gitlab::GitAccess::ForbiddenError, %Q{File "#{large_blob.path}" is larger than the allowed size of #{max_file_size} MB}
+        end
-        end
       end