Commit e437426b authored by Suzanne Selhorn's avatar Suzanne Selhorn

Merge branch 'russell/edit-telemetry-docs' into 'master'

Edited Telemetry docs per review

See merge request gitlab-org/gitlab!32615
parents d3c771ea e7e770db
...@@ -46,25 +46,25 @@ More useful links: ...@@ -46,25 +46,25 @@ More useful links:
In this section we will explain the six different technologies we use to gather product usage data. In this section we will explain the six different technologies we use to gather product usage data.
**Snowplow JS (Frontend)** ### Snowplow JS (Frontend)
Snowplow is an enterprise-grade marketing and product analytics platform which helps track the way users engage with our website and application. [Snowplow JS](https://github.com/snowplow/snowplow/wiki/javascript-tracker) is a frontend tracker for client-side events. Snowplow is an enterprise-grade marketing and product analytics platform which helps track the way users engage with our website and application. [Snowplow JS](https://github.com/snowplow/snowplow/wiki/javascript-tracker) is a frontend tracker for client-side events.
**Snowplow Ruby (Backend)** ### Snowplow Ruby (Backend)
Snowplow is an enterprise-grade marketing and product analytics platform which helps track the way users engage with our website and application. [Snowplow Ruby](https://github.com/snowplow/snowplow/wiki/ruby-tracker) is a backend tracker for server-side events. Snowplow is an enterprise-grade marketing and product analytics platform which helps track the way users engage with our website and application. [Snowplow Ruby](https://github.com/snowplow/snowplow/wiki/ruby-tracker) is a backend tracker for server-side events.
**Usage Ping** ### Usage Ping
Usage Ping is a method for GitLab Inc to collect usage data on a GitLab instance. Usage Ping is primarily composed of row counts for different tables in the instance’s database. By comparing these counts month over month (or week over week), we can get a rough sense for how an instance is using the different features within the product. This high-level data is used to help our product, support, and sales teams. Usage Ping is a method for GitLab Inc to collect usage data on a GitLab instance. Usage Ping is primarily composed of row counts for different tables in the instance’s database. By comparing these counts month over month (or week over week), we can get a rough sense for how an instance is using the different features within the product. This high-level data is used to help our product, support, and sales teams.
Read more about how this works in the [Usage Ping guide](usage_ping.md) Read more about how this works in the [Usage Ping guide](usage_ping.md)
**Database import** ### Database import
Database imports are full imports of data into GitLab's data warehouse. For GitLab.com, the PostgreSQL database is loaded into Snowflake data warehouse every 6 hours. For more details, see the [data team handbook](https://about.gitlab.com/handbook/business-ops/data-team/#extract-and-load). Database imports are full imports of data into GitLab's data warehouse. For GitLab.com, the PostgreSQL database is loaded into Snowflake data warehouse every 6 hours. For more details, see the [data team handbook](https://about.gitlab.com/handbook/business-ops/data-team/#extract-and-load).
**Log system** ### Log system
System logs are the application logs generated from running the GitLab Rails application. For more details, see the [log system](../../administration/logs.md) and [logging infrastructure](https://gitlab.com/gitlab-com/runbooks/tree/master/logging/doc#logging-infrastructure-overview). System logs are the application logs generated from running the GitLab Rails application. For more details, see the [log system](../../administration/logs.md) and [logging infrastructure](https://gitlab.com/gitlab-com/runbooks/tree/master/logging/doc#logging-infrastructure-overview).
...@@ -83,52 +83,52 @@ Our different tracking tools allows us to track different types of events. The e ...@@ -83,52 +83,52 @@ Our different tracking tools allows us to track different types of events. The e
| Logs | ❌ | ❌ | ❌ | ❌ | ✅ | | Logs | ❌ | ❌ | ❌ | ❌ | ✅ |
| External services | ❌ | ❌ | ❌ | ❌ | ❌ | | External services | ❌ | ❌ | ❌ | ❌ | ❌ |
**Database counts** ### Database counts
- How many Projects have been created by unique users - How many Projects have been created by unique users
- How many users logged in the past 28 day - How many users logged in the past 28 day
Database counts are row counts for different tables in an instance’s database. These are SQL count queries which have been filtered, grouped, or aggregated which provide high level usage data. The full list of available tables can be found in [structure.sql](https://gitlab.com/gitlab-org/gitlab/-/blob/master/db/structure.sql) Database counts are row counts for different tables in an instance’s database. These are SQL count queries which have been filtered, grouped, or aggregated which provide high level usage data. The full list of available tables can be found in [structure.sql](https://gitlab.com/gitlab-org/gitlab/-/blob/master/db/structure.sql)
**Pageview events** ### Pageview events
- How many sessions visited the /dashboard/groups page - How many sessions visited the /dashboard/groups page
**UI Events** ### UI Events
- How many sessions clicked on a button or link - How many sessions clicked on a button or link
- How many sessions closed a modal - How many sessions closed a modal
UI events are any interface-driven actions from the browser including click data. UI events are any interface-driven actions from the browser including click data.
**CRUD or API events** ### CRUD or API events
- How many Git pushes were made - How many Git pushes were made
- How many GraphQL queries were made - How many GraphQL queries were made
- How many requests were made to a Rails action or controller. - How many requests were made to a Rails action or controller.
These are backend events that include the creation, read, update, deletion of records and other events that might be triggered from layers that aren't necessarily only available in the interface. These are backend events that include the creation, read, update, deletion of records, and other events that might be triggered from layers other than those available in the interface.
**Event funnels** ### Event funnels
- How many sessions performed action A, B, then C - How many sessions performed action A, B, then C
- What is our conversion rate from step A to B? - What is our conversion rate from step A to B?
**PostgreSQL data** ### PostgreSQL data
These are raw database records which can be explored using business intelligence tools like Sisense. The full list of available tables can be found in [structure.sql](https://gitlab.com/gitlab-org/gitlab/-/blob/master/db/structure.sql) These are raw database records which can be explored using business intelligence tools like Sisense. The full list of available tables can be found in [structure.sql](https://gitlab.com/gitlab-org/gitlab/-/blob/master/db/structure.sql)
**Logs** ### Logs
These are raw logs such as the [Production logs](../../administration/logs.md#production_jsonlog), [API logs](../../administration/logs.md#api_jsonlog), or [Sidekiq logs](../../administration/logs.md#sidekiqlog). See the [overview of Logging Infrastructure](https://gitlab.com/gitlab-com/runbooks/tree/master/logging/doc#logging-infrastructure-overview) for more details. These are raw logs such as the [Production logs](../../administration/logs.md#production_jsonlog), [API logs](../../administration/logs.md#api_jsonlog), or [Sidekiq logs](../../administration/logs.md#sidekiqlog). See the [overview of Logging Infrastructure](https://gitlab.com/gitlab-com/runbooks/tree/master/logging/doc#logging-infrastructure-overview) for more details.
**External services** ### External services
These are external services a GitLab instance interacts with such as an [external storage provider](../../administration/static_objects_external_storage.md) or an [external container registry](../../administration/packages/container_registry.md#use-an-external-container-registry-with-gitlab-as-an-auth-endpoint). These services must be able to send data back into a GitLab instance for data to be tracked. These are external services a GitLab instance interacts with such as an [external storage provider](../../administration/static_objects_external_storage.md) or an [external container registry](../../administration/packages/container_registry.md#use-an-external-container-registry-with-gitlab-as-an-auth-endpoint). These services must be able to send data back into a GitLab instance for data to be tracked.
## Telemetry systems overview ## Telemetry systems overview
The systems overview is a simplified diagram showing the interactions between GitLab Inc and self-managed nstances. The systems overview is a simplified diagram showing the interactions between GitLab Inc and self-managed instances.
![Telemetry_Overview](../img/telemetry_system_overview.png) ![Telemetry_Overview](../img/telemetry_system_overview.png)
...@@ -140,7 +140,7 @@ For Telemetry purposes, GitLab Inc has three major components: ...@@ -140,7 +140,7 @@ For Telemetry purposes, GitLab Inc has three major components:
1. [Data Infrastructure](https://about.gitlab.com/handbook/business-ops/data-team/data-infrastructure/): This contains everything managed by our data team including Sisense Dashboards for visualization, Snowflake for Data Warehousing, incoming data sources such as PostgreSQL Pipeline and S3 Bucket, and lastly our data collectors [GitLab.com's Snowplow Collector](https://about.gitlab.com/handbook/engineering/infrastructure/library/snowplow/) and GitLab's Versions Application. 1. [Data Infrastructure](https://about.gitlab.com/handbook/business-ops/data-team/data-infrastructure/): This contains everything managed by our data team including Sisense Dashboards for visualization, Snowflake for Data Warehousing, incoming data sources such as PostgreSQL Pipeline and S3 Bucket, and lastly our data collectors [GitLab.com's Snowplow Collector](https://about.gitlab.com/handbook/engineering/infrastructure/library/snowplow/) and GitLab's Versions Application.
1. GitLab.com: This is the production GitLab application which is made up of a Client and Server. On the Client or browser side, a Snowplow JS Tracker (Frontend) is used to track client-side events. On the Server or application side, a Snowplow Ruby Tracker (Backend) is used to track server-side events. The server also contains Usage Ping which leverages a PostgreSQL database and a Redis in-memory data store to report on usage data. Lastly, the server also contains System Logs which are generated from running the GitLab application. 1. GitLab.com: This is the production GitLab application which is made up of a Client and Server. On the Client or browser side, a Snowplow JS Tracker (Frontend) is used to track client-side events. On the Server or application side, a Snowplow Ruby Tracker (Backend) is used to track server-side events. The server also contains Usage Ping which leverages a PostgreSQL database and a Redis in-memory data store to report on usage data. Lastly, the server also contains System Logs which are generated from running the GitLab application.
1. [Monitoring infrastructure](https://about.gitlab.com/handbook/engineering/monitoring/): This is the infrastructure used to ensure GitLab.com is operating smoothly. System Logs are sent from GitLab.com to our monitoring infrastructure and collected by a FluentD collector. From FluentD, logs are either sent to long term Google Cloud Services cold storage via Stackdriver, or, they are sent to our Elastic Cluster via Cloud Pub/Sub which can be explored in real-time using Kibana 1. [Monitoring infrastructure](https://about.gitlab.com/handbook/engineering/monitoring/): This is the infrastructure used to ensure GitLab.com is operating smoothly. System Logs are sent from GitLab.com to our monitoring infrastructure and collected by a FluentD collector. From FluentD, logs are either sent to long term Google Cloud Services cold storage via Stackdriver, or, they are sent to our Elastic Cluster via Cloud Pub/Sub which can be explored in real-time using Kibana.
### Self-managed ### Self-managed
...@@ -151,7 +151,7 @@ For Telemetry purposes, self-managed instances have two major components: ...@@ -151,7 +151,7 @@ For Telemetry purposes, self-managed instances have two major components:
### Differences between GitLab Inc and Self-managed ### Differences between GitLab Inc and Self-managed
As shown by the orange lines, on GitLab.com Snowplow JS, Snowplow Ruby, Usage Ping, and PostgreSQL database imports all flow into GitLab Inc's data fnfrastructure. However, on self-managed, only Usage Ping flows into GitLab Inc's data infrastructure. As shown by the orange lines, on GitLab.com Snowplow JS, Snowplow Ruby, Usage Ping, and PostgreSQL database imports all flow into GitLab Inc's data infrastructure. However, on self-managed, only Usage Ping flows into GitLab Inc's data infrastructure.
As shown by the green lines, on GitLab.com system logs flow into GitLab Inc's monitoring infrastructure. On self-managed, there are no logs sent to GitLab Inc's monitoring infrastructure. As shown by the green lines, on GitLab.com system logs flow into GitLab Inc's monitoring infrastructure. On self-managed, there are no logs sent to GitLab Inc's monitoring infrastructure.
......
Markdown is supported
0%
or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment