    Reduce the number of buckets in Sidekiq histograms · 7c912e14
    Bob Van Landuyt authored
    Because of the wide range of buckets used for these metrics and the
    large number of pods running, the cardinality of these series made it
    hard to query the Prometheus instance serving these.
    
    As a result, some of the metrics that are used for service monitoring
    and alerting were failing to record in Thanos. By reducing the number
    of buckets we're hoping to improve the rule evaluations and prevent
    missing series for Sidekiq.
    
    This brings the number of series for the
    `sidekiq_jobs_completion_seconds` and
    `sidekiq_jobs_queue_duration_seconds` down from over 8k to about 1.5k
    each.
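
To illustrate why trimming buckets shrinks the series count so sharply, here is a rough sketch of the arithmetic. It assumes the usual Prometheus exposition of a histogram: one series per explicit bucket, an implicit `+Inf` bucket, plus `_sum` and `_count`, for every label combination on every pod. The `histogram_series` helper and all numbers are hypothetical and are not taken from this commit.

```ruby
# Rough cardinality estimate for one Prometheus histogram.
# All inputs below are made-up illustration values, not GitLab's real ones.
def histogram_series(bucket_count:, label_sets:, pods:)
  # Per label set: explicit buckets + the implicit +Inf bucket + _sum + _count.
  per_label_set = bucket_count + 1 + 2
  per_label_set * label_sets * pods
end

wide   = histogram_series(bucket_count: 20, label_sets: 10, pods: 4)
narrow = histogram_series(bucket_count: 5,  label_sets: 10, pods: 4)

puts "wide buckets:   #{wide} series"
puts "narrow buckets: #{narrow} series"
```

Because the bucket count is multiplied by both the label combinations and the number of pods, removing a handful of buckets removes that many series for every combination, which is why the reduction matters most on large fleets.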
    
    This also reduces the number of buckets used for measuring the total
    time a job spends per resource: cpu, db, gitaly or elasticsearch.
    
    Changelog: changed
server_metrics.rb 8.19 KB