Commit 237b228c authored by Axel García's avatar Axel García Committed by Fiona Neill

Add details regarding pseudonymization on the GSC

- Implementation details
- Notes about user_id being pseudonymized
- SQL examples
parent 78eb758e
...@@ -21,8 +21,25 @@ For the recommended frontend tracking implementation, see [Usage recommendations ...@@ -21,8 +21,25 @@ For the recommended frontend tracking implementation, see [Usage recommendations
Structured events and page views include the [`gitlab_standard`](schemas.md#gitlab_standard) Structured events and page views include the [`gitlab_standard`](schemas.md#gitlab_standard)
context, using the `window.gl.snowplowStandardContext` object which includes context, using the `window.gl.snowplowStandardContext` object which includes
[default data](https://gitlab.com/gitlab-org/gitlab/-/blob/master/app/views/layouts/_snowplow.html.haml) [default data](https://gitlab.com/gitlab-org/gitlab/-/blob/master/app/views/layouts/_snowplow.html.haml)
as base. This object can be modified for any subsequent structured event fired, as base:
although it's not recommended.
| Property | Example |
| -------- | ------- |
| `context_generated_at` | `"2022-01-01T01:00:00.000Z"` |
| `environment` | `"production"` |
| `extra` | `{}` |
| `namespace_id` | `123` |
| `plan` | `"gold"` |
| `project_id` | `456` |
| `source` | `"gitlab-rails"` |
| `user_id` | `789`* |
_\* Undergoes a pseudonymization process at the collector level._
These properties [are overriden](https://gitlab.com/gitlab-org/gitlab/-/blob/master/app/assets/javascripts/tracking/get_standard_context.js)
with frontend-specific values, like `source` (`gitlab-javascript`), `google_analytics_id`
and the custom `extra` object. You can modify this object for any subsequent
structured event that fires, although this is not recommended.
Tracking implementations must have an `action` and a `category`. You can provide additional Tracking implementations must have an `action` and a `category`. You can provide additional
properties from the [structured event taxonomy](index.md#structured-event-taxonomy), in properties from the [structured event taxonomy](index.md#structured-event-taxonomy), in
...@@ -396,13 +413,13 @@ Use the following arguments: ...@@ -396,13 +413,13 @@ Use the following arguments:
|------------|---------------------------|---------------|-----------------------------------------------------------------------------------------------------------------------------------| |------------|---------------------------|---------------|-----------------------------------------------------------------------------------------------------------------------------------|
| `category` | String | | Area or aspect of the application. For example, `HealthCheckController` or `Lfs::FileTransformer`. | | `category` | String | | Area or aspect of the application. For example, `HealthCheckController` or `Lfs::FileTransformer`. |
| `action` | String | | The action being taken. For example, a controller action such as `create`, or an Active Record callback. | | `action` | String | | The action being taken. For example, a controller action such as `create`, or an Active Record callback. |
| `label` | String | nil | The specific element or object to act on. This can be one of the following: the label of the element, for example, a tab labeled 'Create from template' for `create_from_template`; a unique identifier if no text is available, for example, `groups_dropdown_close` for closing the Groups dropdown in the top bar; or the name or title attribute of a record being created. | | `label` | String | `nil` | The specific element or object to act on. This can be one of the following: the label of the element, for example, a tab labeled 'Create from template' for `create_from_template`; a unique identifier if no text is available, for example, `groups_dropdown_close` for closing the Groups dropdown in the top bar; or the name or title attribute of a record being created. |
| `property` | String | nil | Any additional property of the element, or object being acted on. | | `property` | String | `nil` | Any additional property of the element, or object being acted on. |
| `value` | Numeric | nil | Describes a numeric value (decimal) directly related to the event. This could be the value of an input. For example, `10` when clicking `internal` visibility. | | `value` | Numeric | `nil` | Describes a numeric value (decimal) directly related to the event. This could be the value of an input. For example, `10` when clicking `internal` visibility. |
| `context` | Array\[SelfDescribingJSON\] | nil | An array of custom contexts to send with this event. Most events should not have any custom contexts. | | `context` | Array\[SelfDescribingJSON\] | `nil` | An array of custom contexts to send with this event. Most events should not have any custom contexts. |
| `project` | Project | nil | The project associated with the event. | | `project` | Project | `nil` | The project associated with the event. |
| `user` | User | nil | The user associated with the event. | | `user` | User | `nil` | The user associated with the event. This value undergoes a pseudonymization process at the collector level. |
| `namespace` | Namespace | nil | The namespace associated with the event. | | `namespace` | Namespace | `nil` | The namespace associated with the event. |
| `extra` | Hash | `{}` | Additional keyword arguments are collected into a hash and sent with the event. | | `extra` | Hash | `{}` | Additional keyword arguments are collected into a hash and sent with the event. |
### Unit testing ### Unit testing
......
...@@ -150,6 +150,23 @@ ORDER BY page_view_start DESC ...@@ -150,6 +150,23 @@ ORDER BY page_view_start DESC
LIMIT 100 LIMIT 100
``` ```
#### Top 20 users who fired `reply_comment_button` in the last 30 days
```sql
SELECT
count(*) as hits,
se_action,
se_category,
gsc_pseudonymized_user_id
FROM legacy.snowplow_gitlab_events_30
WHERE
se_label = 'reply_comment_button'
AND gsc_pseudonymized_user_id IS NOT NULL
GROUP BY gsc_pseudonymized_user_id, se_category, se_action
ORDER BY count(*) DESC
LIMIT 20
```
#### Query JSON formatted data #### Query JSON formatted data
```sql ```sql
......
...@@ -10,17 +10,18 @@ This page provides Snowplow schema reference for GitLab events. ...@@ -10,17 +10,18 @@ This page provides Snowplow schema reference for GitLab events.
## `gitlab_standard` ## `gitlab_standard`
We are including the [`gitlab_standard` schema](https://gitlab.com/gitlab-org/iglu/-/blob/master/public/schemas/com.gitlab/gitlab_standard/jsonschema/) with every event. See [Standardize Snowplow Schema](https://gitlab.com/groups/gitlab-org/-/epics/5218) for details. We are including the [`gitlab_standard` schema](https://gitlab.com/gitlab-org/iglu/-/blob/master/public/schemas/com.gitlab/gitlab_standard/jsonschema/) for structured events and page views.
The [`StandardContext`](https://gitlab.com/gitlab-org/gitlab/-/blob/master/lib/gitlab/tracking/standard_context.rb) The [`StandardContext`](https://gitlab.com/gitlab-org/gitlab/-/blob/master/lib/gitlab/tracking/standard_context.rb)
class represents this schema in the application. Some properties are automatically populated for [frontend](https://gitlab.com/gitlab-org/gitlab/-/blob/master/app/views/layouts/_snowplow.html.haml) class represents this schema in the application. Some properties are
events. [automatically populated for frontend events](implementation.md#snowplow-javascript-frontend-tracking),
and can be [provided manually for backend events](implementation.md#implement-ruby-backend-tracking).
| Field Name | Required | Default value | Type | Description | | Field Name | Required | Default value | Type | Description |
|----------------|:-------------------:|-----------------------|--|---------------------------------------------------------------------------------------------| |----------------|:-------------------:|-----------------------|--|---------------------------------------------------------------------------------------------|
| `project_id` | **{dotted-circle}** | Current project ID * | integer | | | `project_id` | **{dotted-circle}** | Current project ID * | integer | |
| `namespace_id` | **{dotted-circle}** | Current group/namespace ID * | integer | | | `namespace_id` | **{dotted-circle}** | Current group/namespace ID * | integer | |
| `user_id` | **{dotted-circle}** | Current user ID * | integer | User database record ID attribute. This file undergoes a pseudonymization process at the collector level. | | `user_id` | **{dotted-circle}** | Current user ID * | integer | User database record ID attribute. This value undergoes a pseudonymization process at the collector level. |
| `context_generated_at` | **{dotted-circle}** | Current timestamp | string (date time format) | Timestamp indicating when context was generated. | | `context_generated_at` | **{dotted-circle}** | Current timestamp | string (date time format) | Timestamp indicating when context was generated. |
| `environment` | **{check-circle}** | Current environment | string (max 32 chars) | Name of the source environment, such as `production` or `staging` | | `environment` | **{check-circle}** | Current environment | string (max 32 chars) | Name of the source environment, such as `production` or `staging` |
| `source` | **{check-circle}** | Event source | string (max 32 chars) | Name of the source application, such as `gitlab-rails` or `gitlab-javascript` | | `source` | **{check-circle}** | Event source | string (max 32 chars) | Name of the source application, such as `gitlab-rails` or `gitlab-javascript` |
......
Markdown is supported
0%
or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment