Commit a348f748 authored by Alex Kalderimis

Merge branch 'mwaw/add_time_frame_attribute_to_aggregated_metrics' into 'master'

Add time frame attribute to aggregated metric

See merge request gitlab-org/gitlab!54570
parents 8e7780e2 5f51a17c
......@@ -934,6 +934,10 @@ To add data for aggregated metrics into Usage Ping payload you should add corres
- `operator`: Operator that defines how the aggregated metric data is counted. Available operators are:
- `OR`: Removes duplicates and counts all entries that triggered any of listed events.
- `AND`: Removes duplicates and counts all elements that were observed triggering all of following events.
- `time_frame`: One or more valid time frames. Use these to limit the data included in the aggregated metric to events within a specific date range. Valid time frames are:
- `7d`: Last seven days of data.
- `28d`: Last twenty-eight days of data.
- `all`: All historical data. Only available for `database` sourced aggregated metrics.
- `source`: Data source used to collect all events data included in aggregated metric. Valid data sources are:
- [`database`](#database-sourced-aggregated-metrics)
- [`redis`](#redis-sourced-aggregated-metrics)
......@@ -949,18 +953,24 @@ To add data for aggregated metrics into Usage Ping payload you should add corres
Example aggregated metric entries:
```yaml
- name: product_analytics_test_metrics_union_redis_sourced
- name: example_metrics_union
operator: OR
events: ['i_search_total', 'i_search_advanced', 'i_search_paid']
source: redis
- name: product_analytics_test_metrics_intersection_with_feautre_flag_database_sourced
time_frame:
- 7d
- 28d
- name: example_metrics_intersection
operator: AND
source: database
time_frame:
- 28d
- all
events: ['dependency_scanning_pipeline_all_time', 'container_scanning_pipeline_all_time']
feature_flag: example_aggregated_metric
```
Aggregated metrics are added under `aggregated_metrics` key in both `counts_weekly` and `counts_monthly` top level keys in Usage Ping payload.
Aggregated metrics collected in the `7d` and `28d` time frames are added to the Usage Ping payload under the `aggregated_metrics` sub-key in the `counts_weekly` and `counts_monthly` top-level keys, respectively.
```ruby
{
......@@ -973,14 +983,35 @@ Aggregated metrics are added under `aggregated_metrics` key in both `counts_week
:project_snippets => 407,
:promoted_issues => 719,
:aggregated_metrics => {
:product_analytics_test_metrics_union => 7,
:product_analytics_test_metrics_intersection_with_feautre_flag => 2
:example_metrics_union => 7,
:example_metrics_intersection => 2
},
:snippets => 2513
}
}
```
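As a rough illustration of how the `time_frame` attribute decides which payload section a metric appears in, the following sketch filters a list of definitions by time frame. The `definitions` array and the `names_for` helper are hypothetical and only mirror the shape of the YAML entries above; they are not GitLab's implementation.

```ruby
# Hypothetical definitions, shaped like the YAML entries above.
definitions = [
  { name: 'example_metrics_union', source: 'redis', time_frame: %w[7d 28d] },
  { name: 'example_metrics_intersection', source: 'database', time_frame: %w[28d all] }
]

# Returns the names of the definitions that opt in to the given time frame.
# (Hypothetical helper, for illustration only.)
def names_for(definitions, time_frame)
  definitions
    .select { |definition| definition[:time_frame].include?(time_frame) }
    .map { |definition| definition[:name] }
end

weekly  = names_for(definitions, '7d')  # candidates for counts_weekly
monthly = names_for(definitions, '28d') # candidates for counts_monthly
```

Under these assumptions, only the definitions that list `7d` are considered for `counts_weekly`, and only those that list `28d` are considered for `counts_monthly`.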
Aggregated metrics for the `all` time frame are present in the `counts` top-level key, with the `aggregate_` prefix added to their names.
For example:
`example_metrics_intersection`
Becomes:
`counts.aggregate_example_metrics_intersection`
```ruby
{
:counts => {
:deployments => 11003,
:successful_deployments => 178,
:failed_deployments => 1275,
:aggregate_example_metrics_intersection => 12
}
}
```
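The renaming can be sketched in plain Ruby. This is a minimal illustration that assumes all-time results arrive as a hash keyed by metric name; the input values are made up, and the prefixing and rounding follow the `aggregated_metrics_data` method changed in this commit.

```ruby
# Hypothetical all-time results, keyed by aggregated metric name.
all_time_data = { example_metrics_intersection: 12.3 }

# Prefix each key with `aggregate_` and round each value to an integer.
counts = all_time_data.to_h do |name, value|
  ["aggregate_#{name}".to_sym, value.round]
end
```

The prefix keeps aggregated metrics from colliding with the regular counters that already live under `counts`.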
### Redis sourced aggregated metrics
> - [Introduced](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/45979) in GitLab 13.6.
......@@ -992,6 +1023,7 @@ you must fulfill the following requirements:
[`known_events/*.yml`](#known-events-are-added-automatically-in-usage-data-payload) files.
1. All events listed at `events` attribute must have the same `redis_slot` attribute.
1. All events listed at `events` attribute must have the same `aggregation` attribute.
1. The `time_frame` attribute must not include the `all` value, which is unavailable for Redis sourced aggregated metrics.
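
The last requirement can be checked mechanically. The sketch below is a hypothetical helper, not part of GitLab, that lists Redis sourced definitions that request the `all` time frame. It assumes definitions are hashes with `:name`, `:source`, and `:time_frame` keys.

```ruby
# Returns the names of Redis sourced definitions that wrongly request `all`.
# (Hypothetical helper; the hash keys mirror the YAML attributes.)
def invalid_redis_definitions(definitions)
  definitions
    .select { |d| d[:source] == 'redis' && d[:time_frame].include?('all') }
    .map { |d| d[:name] }
end

sample = [
  { name: 'ok_redis', source: 'redis', time_frame: %w[7d 28d] },
  { name: 'bad_redis', source: 'redis', time_frame: %w[28d all] },
  { name: 'ok_database', source: 'database', time_frame: %w[28d all] }
]

offenders = invalid_redis_definitions(sample)
```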
### Database sourced aggregated metrics
......@@ -1051,17 +1083,24 @@ end
#### Add new aggregated metric definition
After all metrics are persisted, you can add an aggregated metric definition at
[`aggregated_metrics/`](https://gitlab.com/gitlab-org/gitlab/-/blob/master/lib/gitlab/usage_data_counters/aggregated_metrics/). When adding definitions for metrics names listed in the
`events:` attribute, use the same names you passed in the `metric_name` argument
while persisting metrics in previous step.
[`aggregated_metrics/`](https://gitlab.com/gitlab-org/gitlab/-/blob/master/lib/gitlab/usage_data_counters/aggregated_metrics/).
To declare the aggregate of metrics collected with [Estimated Batch Counters](#estimated-batch-counters),
you must fulfill the following requirements:
- Metric names listed in the `events:` attribute must match the names you passed in the `metric_name` argument while persisting metrics in the previous step.
- Every metric listed in the `events:` attribute must be persisted for **every** selected `time_frame:` value.
Example definition:
```yaml
- name: product_analytics_test_metrics_intersection_database_sourced
- name: example_metrics_intersection_database_sourced
operator: AND
source: database
events: ['dependency_scanning_pipeline', 'container_scanning_pipeline']
time_frame:
- 28d
- all
```
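To make the second requirement concrete, the following sketch lists the event and time frame pairs that a definition needs but that have not been persisted. Both the `persisted` set and the `missing_persisted_metrics` helper are assumptions made for illustration; GitLab does not necessarily track persisted metrics this way.

```ruby
require 'set'

# Returns every [event, time_frame] pair the definition requires
# that is absent from the set of persisted pairs.
# (Hypothetical helper, for illustration only.)
def missing_persisted_metrics(definition, persisted)
  definition[:events]
    .product(definition[:time_frame])
    .reject { |pair| persisted.include?(pair) }
end

definition = {
  name: 'example_metrics_intersection_database_sourced',
  events: %w[dependency_scanning_pipeline container_scanning_pipeline],
  time_frame: %w[28d all]
}

# Hypothetical record of what was persisted in the previous step.
persisted = Set[
  %w[dependency_scanning_pipeline 28d],
  %w[dependency_scanning_pipeline all],
  %w[container_scanning_pipeline 28d]
]

missing = missing_persisted_metrics(definition, persisted)
```

In this made-up case, `container_scanning_pipeline` was never persisted for the `all` time frame, so the definition would not satisfy the requirement.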
## Example Usage Ping payload
......
......@@ -11,6 +11,7 @@ module Gitlab
AggregatedMetricError = Class.new(StandardError)
UnknownAggregationOperator = Class.new(AggregatedMetricError)
UnknownAggregationSource = Class.new(AggregatedMetricError)
DisallowedAggregationTimeFrame = Class.new(AggregatedMetricError)
DATABASE_SOURCE = 'database'
REDIS_SOURCE = 'redis'
......@@ -30,25 +31,38 @@ module Gitlab
@recorded_at = recorded_at
end
def all_time_data
aggregated_metrics_data(start_date: nil, end_date: nil, time_frame: Gitlab::Utils::UsageData::ALL_TIME_TIME_FRAME_NAME)
end
def monthly_data
aggregated_metrics_data(**monthly_time_range)
aggregated_metrics_data(**monthly_time_range.merge(time_frame: Gitlab::Utils::UsageData::TWENTY_EIGHT_DAYS_TIME_FRAME_NAME))
end
def weekly_data
aggregated_metrics_data(**weekly_time_range)
aggregated_metrics_data(**weekly_time_range.merge(time_frame: Gitlab::Utils::UsageData::SEVEN_DAYS_TIME_FRAME_NAME))
end
private
attr_accessor :aggregated_metrics, :recorded_at
def aggregated_metrics_data(start_date:, end_date:)
def aggregated_metrics_data(start_date:, end_date:, time_frame:)
aggregated_metrics.each_with_object({}) do |aggregation, data|
next if aggregation[:feature_flag] && Feature.disabled?(aggregation[:feature_flag], default_enabled: :yaml, type: :development)
next unless aggregation[:time_frame].include?(time_frame)
case aggregation[:source]
when REDIS_SOURCE
data[aggregation[:name]] = calculate_count_for_aggregation(aggregation: aggregation, start_date: start_date, end_date: end_date)
if time_frame == Gitlab::Utils::UsageData::ALL_TIME_TIME_FRAME_NAME
data[aggregation[:name]] = Gitlab::Utils::UsageData::FALLBACK
Gitlab::ErrorTracking
.track_and_raise_for_dev_exception(
DisallowedAggregationTimeFrame.new("Aggregation time frame: 'all' is not allowed for aggregation with source: '#{REDIS_SOURCE}'")
)
else
data[aggregation[:name]] = calculate_count_for_aggregation(aggregation: aggregation, start_date: start_date, end_date: end_date)
end
when DATABASE_SOURCE
next unless Feature.enabled?('database_sourced_aggregated_metrics', default_enabled: false, type: :development)
......
......@@ -55,15 +55,15 @@ module Gitlab
end
def time_period_to_human_name(time_period)
return Gitlab::Utils::UsageData::ALL_TIME_PERIOD_HUMAN_NAME if time_period.blank?
return Gitlab::Utils::UsageData::ALL_TIME_TIME_FRAME_NAME if time_period.blank?
start_date = time_period.first.to_date
end_date = time_period.last.to_date
if (end_date - start_date).to_i > 7
Gitlab::Utils::UsageData::MONTHLY_PERIOD_HUMAN_NAME
Gitlab::Utils::UsageData::TWENTY_EIGHT_DAYS_TIME_FRAME_NAME
else
Gitlab::Utils::UsageData::WEEKLY_PERIOD_HUMAN_NAME
Gitlab::Utils::UsageData::SEVEN_DAYS_TIME_FRAME_NAME
end
end
end
......
......@@ -60,6 +60,7 @@ module Gitlab
.merge(compliance_unique_visits_data)
.merge(search_unique_visits_data)
.merge(redis_hll_counters)
.deep_merge(aggregated_metrics_data)
end
end
......@@ -224,8 +225,7 @@ module Gitlab
project_snippets: count(ProjectSnippet.where(last_28_days_time_period)),
projects_with_alerts_created: distinct_count(::AlertManagement::Alert.where(last_28_days_time_period), :project_id)
}.merge(
snowplow_event_counts(last_28_days_time_period(column: :collector_tstamp)),
aggregated_metrics_monthly
snowplow_event_counts(last_28_days_time_period(column: :collector_tstamp))
).tap do |data|
data[:snippets] = add(data[:personal_snippets], data[:project_snippets])
end
......@@ -250,10 +250,7 @@ module Gitlab
def system_usage_data_weekly
{
counts_weekly: {
}.merge(
aggregated_metrics_weekly
)
counts_weekly: {}
}
end
......@@ -713,15 +710,13 @@ module Gitlab
{ redis_hll_counters: ::Gitlab::UsageDataCounters::HLLRedisCounter.unique_events_data }
end
def aggregated_metrics_monthly
{
aggregated_metrics: aggregated_metrics.monthly_data
}
end
def aggregated_metrics_weekly
def aggregated_metrics_data
{
aggregated_metrics: aggregated_metrics.weekly_data
counts_weekly: { aggregated_metrics: aggregated_metrics.weekly_data },
counts_monthly: { aggregated_metrics: aggregated_metrics.monthly_data },
counts: aggregated_metrics
.all_time_data
.to_h { |key, value| ["aggregate_#{key}".to_sym, value.round] }
}
end
......
......@@ -11,6 +11,7 @@
operator: OR
feature_flag: usage_data_code_review_aggregation
source: redis
time_frame: [7d, 28d]
events: [
'i_code_review_user_single_file_diffs',
'i_code_review_user_create_mr',
......@@ -54,6 +55,7 @@
operator: OR
feature_flag: usage_data_code_review_aggregation
source: redis
time_frame: [7d, 28d]
events: [
'i_code_review_user_single_file_diffs',
'i_code_review_user_create_mr',
......@@ -96,6 +98,7 @@
operator: OR
feature_flag: usage_data_code_review_aggregation
source: redis
time_frame: [7d, 28d]
events: [
'i_code_review_user_vs_code_api_request'
]
......@@ -7,6 +7,10 @@
# source: defines which datasource will be used to locate events that should be included in aggregated metric. Valid values are:
# - database
# - redis
# time_frame: defines time frames for aggregated metrics:
# - 7d - last 7 days
# - 28d - last 28 days
# - all - all available historical data; this time frame is not available for the redis source
# feature_flag: name of development feature flag that will be checked before metrics aggregation is performed.
# Corresponding feature flag should have `default_enabled` attribute set to `false`.
# This attribute is OPTIONAL and can be omitted, when `feature_flag` is missing no feature flag will be checked.
......@@ -14,18 +18,22 @@
- name: compliance_features_track_unique_visits_union
operator: OR
source: redis
time_frame: [7d, 28d]
events: ['g_compliance_audit_events', 'g_compliance_dashboard', 'i_compliance_audit_events', 'a_compliance_audit_events_api', 'i_compliance_credential_inventory']
- name: product_analytics_test_metrics_union
operator: OR
source: redis
time_frame: [7d, 28d]
events: ['i_search_total', 'i_search_advanced', 'i_search_paid']
- name: product_analytics_test_metrics_intersection
operator: AND
source: redis
time_frame: [7d, 28d]
events: ['i_search_total', 'i_search_advanced', 'i_search_paid']
- name: incident_management_alerts_total_unique_counts
operator: OR
source: redis
time_frame: [7d, 28d]
events: [
'incident_management_alert_status_changed',
'incident_management_alert_assigned',
......@@ -35,6 +43,7 @@
- name: incident_management_incidents_total_unique_counts
operator: OR
source: redis
time_frame: [7d, 28d]
events: [
'incident_management_incident_created',
'incident_management_incident_reopened',
......@@ -51,10 +60,11 @@
- name: i_testing_paid_monthly_active_user_total
operator: OR
source: redis
time_frame: [7d, 28d]
events: [
'i_testing_web_performance_widget_total',
'i_testing_full_code_quality_report_total',
'i_testing_group_code_coverage_visit_total',
'i_testing_load_performance_widget_total',
'i_testing_metrics_report_widget_total'
]
'i_testing_web_performance_widget_total',
'i_testing_full_code_quality_report_total',
'i_testing_group_code_coverage_visit_total',
'i_testing_load_performance_widget_total',
'i_testing_metrics_report_widget_total'
]
......@@ -40,9 +40,9 @@ module Gitlab
FALLBACK = -1
DISTRIBUTED_HLL_FALLBACK = -2
ALL_TIME_PERIOD_HUMAN_NAME = "all_time"
WEEKLY_PERIOD_HUMAN_NAME = "weekly"
MONTHLY_PERIOD_HUMAN_NAME = "monthly"
ALL_TIME_TIME_FRAME_NAME = "all"
SEVEN_DAYS_TIME_FRAME_NAME = "7d"
TWENTY_EIGHT_DAYS_TIME_FRAME_NAME = "28d"
def count(relation, column = nil, batch: true, batch_size: nil, start: nil, finish: nil)
if batch
......
......@@ -69,7 +69,7 @@ RSpec.describe Gitlab::Usage::Metrics::Aggregates::Sources::PostgresHll, :clean_
it 'persists serialized data in Redis' do
Gitlab::Redis::SharedState.with do |redis|
expect(redis).to receive(:set).with("#{metric_1}_weekly-#{recorded_at.to_i}", '{"141":1,"56":1}', ex: 120.hours)
expect(redis).to receive(:set).with("#{metric_1}_7d-#{recorded_at.to_i}", '{"141":1,"56":1}', ex: 120.hours)
end
save_aggregated_metrics
......@@ -81,7 +81,7 @@ RSpec.describe Gitlab::Usage::Metrics::Aggregates::Sources::PostgresHll, :clean_
it 'persists serialized data in Redis' do
Gitlab::Redis::SharedState.with do |redis|
expect(redis).to receive(:set).with("#{metric_1}_monthly-#{recorded_at.to_i}", '{"141":1,"56":1}', ex: 120.hours)
expect(redis).to receive(:set).with("#{metric_1}_28d-#{recorded_at.to_i}", '{"141":1,"56":1}', ex: 120.hours)
end
save_aggregated_metrics
......@@ -93,7 +93,7 @@ RSpec.describe Gitlab::Usage::Metrics::Aggregates::Sources::PostgresHll, :clean_
it 'persists serialized data in Redis' do
Gitlab::Redis::SharedState.with do |redis|
expect(redis).to receive(:set).with("#{metric_1}_all_time-#{recorded_at.to_i}", '{"141":1,"56":1}', ex: 120.hours)
expect(redis).to receive(:set).with("#{metric_1}_all-#{recorded_at.to_i}", '{"141":1,"56":1}', ex: 120.hours)
end
save_aggregated_metrics
......
......@@ -23,6 +23,22 @@ RSpec.describe 'aggregated metrics' do
end
end
RSpec::Matchers.define :have_known_time_frame do
allowed_time_frames = [
Gitlab::Utils::UsageData::ALL_TIME_TIME_FRAME_NAME,
Gitlab::Utils::UsageData::TWENTY_EIGHT_DAYS_TIME_FRAME_NAME,
Gitlab::Utils::UsageData::SEVEN_DAYS_TIME_FRAME_NAME
]
match do |aggregate|
(aggregate[:time_frame] - allowed_time_frames).empty?
end
failure_message do |aggregate|
"Aggregate with name: `#{aggregate[:name]}` uses disallowed time_frame `#{aggregate[:time_frame] - allowed_time_frames}`"
end
end
let_it_be(:known_events) do
Gitlab::UsageDataCounters::HLLRedisCounter.known_events
end
......@@ -38,10 +54,18 @@ RSpec.describe 'aggregated metrics' do
expect(aggregated_metrics).to all has_known_source
end
it 'all aggregated metrics has known time frame' do
expect(aggregated_metrics).to all have_known_time_frame
end
aggregated_metrics&.select { |agg| agg[:source] == Gitlab::Usage::Metrics::Aggregates::REDIS_SOURCE }&.each do |aggregate|
context "for #{aggregate[:name]} aggregate of #{aggregate[:events].join(' ')}" do
let_it_be(:events_records) { known_events.select { |event| aggregate[:events].include?(event[:name]) } }
it "does not include 'all' time frame for Redis sourced aggregate" do
expect(aggregate[:time_frame]).not_to include(Gitlab::Utils::UsageData::ALL_TIME_TIME_FRAME_NAME)
end
it "only refers to known events" do
expect(aggregate[:events]).to all be_known_event
end
......
......@@ -1375,25 +1375,20 @@ RSpec.describe Gitlab::UsageData, :aggregate_failures do
end
end
describe '.aggregated_metrics_weekly' do
subject(:aggregated_metrics_payload) { described_class.aggregated_metrics_weekly }
describe '.aggregated_metrics_data' do
it 'uses ::Gitlab::Usage::Metrics::Aggregates::Aggregate methods', :aggregate_failures do
expected_payload = {
counts_weekly: { aggregated_metrics: { global_search_gmau: 123 } },
counts_monthly: { aggregated_metrics: { global_search_gmau: 456 } },
counts: { aggregate_global_search_gmau: 789 }
}
it 'uses ::Gitlab::Usage::Metrics::Aggregates::Aggregate#weekly_data', :aggregate_failures do
expect_next_instance_of(::Gitlab::Usage::Metrics::Aggregates::Aggregate) do |instance|
expect(instance).to receive(:weekly_data).and_return(global_search_gmau: 123)
expect(instance).to receive(:monthly_data).and_return(global_search_gmau: 456)
expect(instance).to receive(:all_time_data).and_return(global_search_gmau: 789)
end
expect(aggregated_metrics_payload).to eq(aggregated_metrics: { global_search_gmau: 123 })
end
end
describe '.aggregated_metrics_monthly' do
subject(:aggregated_metrics_payload) { described_class.aggregated_metrics_monthly }
it 'uses ::Gitlab::Usage::Metrics::Aggregates::Aggregate#monthly_data', :aggregate_failures do
expect_next_instance_of(::Gitlab::Usage::Metrics::Aggregates::Aggregate) do |instance|
expect(instance).to receive(:monthly_data).and_return(global_search_gmau: 123)
end
expect(aggregated_metrics_payload).to eq(aggregated_metrics: { global_search_gmau: 123 })
expect(described_class.aggregated_metrics_data).to eq(expected_payload)
end
end
......