    Expire project caches once per push instead of once per ref · f14647fd
    Stan Hu authored
    Previously `ProjectCacheWorker` would be scheduled once per ref, which
    would generate unnecessary I/O and load on Sidekiq, especially if many
    tags or branches were pushed at once. `ProjectCacheWorker` would expire
    three items:
    
    1. Repository size: This only needs to be updated once per push.
    2. Commit count: This only needs to be updated if the default branch
       is updated.
    3. Project method caches: These only need to be updated if the default
       branch changes, and then only if certain files change (e.g. README
       or CHANGELOG).
    
    Because the third item requires looking at the actual changes in the
    commit deltas, we schedule one `ProjectCacheWorker` to handle the first
    two cases, and schedule a separate `ProjectCacheWorker` for the third
    case if it is needed. As a result, this brings down the number of
    `ProjectCacheWorker` jobs from N to 2.
    
    Closes https://gitlab.com/gitlab-org/gitlab-ce/issues/52046
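The scheduling described in the commit message can be sketched roughly as follows. This is a minimal illustration, not the code from the commit: `FakeProjectCacheWorker` and `schedule_cache_expiry` are hypothetical names, and the fake worker only records its arguments instead of enqueuing a Sidekiq job. The argument order mirrors `perform(project_id, files, statistics, refresh_statistics)` from the worker.

```ruby
# Hypothetical stand-in for a Sidekiq worker: records jobs instead of enqueuing them.
class FakeProjectCacheWorker
  @jobs = []

  class << self
    attr_reader :jobs

    def perform_async(*args)
      @jobs << args
    end
  end
end

# Hypothetical helper: schedules cache expiry once for a whole push.
# refs               - Array of ref names updated by the push.
# default_branch_ref - The ref name of the project's default branch.
# changed_file_types - File types (e.g. :readme) touched on the default branch.
def schedule_cache_expiry(project_id, refs, default_branch_ref, changed_file_types)
  default_branch_updated = refs.include?(default_branch_ref)

  # Job 1: statistics only. Repository size is refreshed on every push; the
  # commit count only when the default branch was updated.
  statistics = [:repository_size]
  statistics << :commit_count if default_branch_updated
  FakeProjectCacheWorker.perform_async(project_id, [], statistics, true)

  # Job 2: method caches only, and only when relevant files changed on the
  # default branch. Statistics are skipped because job 1 covers them.
  if default_branch_updated && changed_file_types.any?
    FakeProjectCacheWorker.perform_async(project_id, changed_file_types, [], false)
  end
end

# A push that updates the default branch plus 50 tags.
refs = ["refs/heads/master"] + (1..50).map { |i| "refs/tags/v#{i}" }
schedule_cache_expiry(42, refs, "refs/heads/master", [:readme])

puts FakeProjectCacheWorker.jobs.size
```

In this sketch the number of recorded jobs does not depend on how many refs the push contains, which is the point of the change.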
project_cache_worker.rb 1.98 KB
# frozen_string_literal: true

# Worker for updating any project specific caches.
class ProjectCacheWorker
  include ApplicationWorker

  LEASE_TIMEOUT = 15.minutes.to_i

  # project_id - The ID of the project for which to flush the cache.
  # files - An Array containing extra types of files to refresh such as
  #         `:readme` to flush the README and `:changelog` to flush the
  #         CHANGELOG.
  # statistics - An Array containing columns from ProjectStatistics to
  #              refresh. If empty, all columns will be refreshed.
  # refresh_statistics - A boolean that determines whether project statistics should
  #                     be updated.
  # rubocop: disable CodeReuse/ActiveRecord
  def perform(project_id, files = [], statistics = [], refresh_statistics = true)
    project = Project.find_by(id: project_id)

    return unless project

    update_statistics(project, statistics) if refresh_statistics

    return unless project.repository.exists?

    project.repository.refresh_method_caches(files.map(&:to_sym))

    project.cleanup
  end
  # rubocop: enable CodeReuse/ActiveRecord

  # NOTE: if we successfully obtain the lease, we trigger both an immediate
  # update and another one in 15 minutes. That way, we only need to wait for
  # the statistics to become accurate if they were already updated once in
  # the last 15 minutes.
  def update_statistics(project, statistics = [])
    return if Gitlab::Database.read_only?
    return unless try_obtain_lease_for(project.id, statistics)

    Projects::UpdateStatisticsService.new(project, nil, statistics: statistics).execute

    UpdateProjectStatisticsWorker.perform_in(LEASE_TIMEOUT, project.id, statistics)
  end

  private

  def try_obtain_lease_for(project_id, statistics)
    Gitlab::ExclusiveLease
      .new(project_cache_worker_key(project_id, statistics), timeout: LEASE_TIMEOUT)
      .try_obtain
  end

  def project_cache_worker_key(project_id, statistics)
    ["project_cache_worker", project_id, *statistics.sort].join(":")
  end
end
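To illustrate how the lease key behaves, here is a small standalone sketch. It copies `project_cache_worker_key` out of the worker so it can run without GitLab. The sample project ID and column names are assumptions chosen for the example.

```ruby
# Standalone copy of the key-building logic, outside the worker class.
def project_cache_worker_key(project_id, statistics)
  ["project_cache_worker", project_id, *statistics.sort].join(":")
end

# The same columns in a different order produce the same key, because the
# list is sorted before joining.
key_a = project_cache_worker_key(1, %w[repository_size commit_count])
key_b = project_cache_worker_key(1, %w[commit_count repository_size])

# An empty list (meaning "all columns") produces a key with no column suffix.
key_c = project_cache_worker_key(1, [])

puts key_a          # project_cache_worker:1:commit_count:repository_size
puts key_a == key_b # true
puts key_c          # project_cache_worker:1
```

Because the key includes the requested columns, a job that refreshes a specific set of columns takes a different lease from a job that refreshes all columns for the same project.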