Commit da76de12 authored by Lin Jen-Shin

Merge branch '207819-use-the-environment-auto-stop-feature' into 'master'

Auto-stop Review Apps after 48 hours

Closes #207819

See merge request gitlab-org/gitlab!26279
parents cdad37fe c69f4eaf
@@ -75,6 +75,7 @@ review-build-cng:
 name: review/${CI_COMMIT_REF_NAME}
 url: https://gitlab-${CI_ENVIRONMENT_SLUG}.${REVIEW_APPS_DOMAIN}
 on_stop: review-stop
+auto_stop_in: 48 hours
 review-deploy:
 extends:
...
@@ -79,27 +79,38 @@ subgraph "CNG-mirror pipeline"
 **Additional notes:**
 - If the `review-deploy` job keep failing (note that we already retry it twice),
-please post a message in the `#quality` channel and/or create a ~Quality ~bug
+please post a message in the `#g_qe_engineering_productivity` channel and/or create a `~"Engineering Productivity"` `~"ep::review apps"` `~bug`
 issue with a link to your merge request. Note that the deployment failure can
 reveal an actual problem introduced in your merge request (i.e. this isn't
 necessarily a transient failure)!
-- If the `review-qa-smoke` job keep failing (note that we already retry it twice),
+- If the `review-qa-smoke` job keeps failing (note that we already retry it twice),
 please check the job's logs: you could discover an actual problem introduced in
 your merge request. You can also download the artifacts to see screenshots of
 the page at the time the failures occurred. If you don't find the cause of the
 failure or if it seems unrelated to your change, please post a message in the
 `#quality` channel and/or create a ~Quality ~bug issue with a link to your
 merge request.
-- The manual [`review-stop`][gitlab-ci-yml] in the `test` stage can be used to
+- The manual `review-stop` can be used to
 stop a Review App manually, and is also started by GitLab once a merge
 request's branch is deleted after being merged.
-- Review Apps are cleaned up regularly via a pipeline schedule that runs
-the [`schedule:review-cleanup`][gitlab-ci-yml] job.
 - The Kubernetes cluster is connected to the `gitlab-{ce,ee}` projects using
 [GitLab's Kubernetes integration][gitlab-k8s-integration]. This basically
 allows to have a link to the Review App directly from the merge request
 widget.
+### Auto-stopping of Review Apps
+Review Apps are automatically stopped 2 days after the last deployment thanks to
+the [Environment auto-stop](../../ci/environments.html#environments-auto-stop) feature.
+If you need your Review App to stay up for a longer time, you can
+[pin its environment](../../ci/environments.html#auto-stop-example).
+The `review-cleanup` job that automatically runs in scheduled
+pipelines (and is manual in merge request) stops stale Review Apps after 5 days,
+deletes their environment after 6 days, and cleans up any dangling Helm releases
+and Kubernetes resources after 7 days.
 ## QA runs
 On every [pipeline][gitlab-pipeline] in the `qa` stage (which comes after the
@@ -206,7 +217,7 @@ aids in identifying load spikes on the cluster, and if nodes are problematic or
 **Potential cause:**
-That could be a sign that the [`schedule:review-cleanup`][gitlab-ci-yml] job is
+That could be a sign that the `review-cleanup` job is
 failing to cleanup stale Review Apps and Kubernetes resources.
 **Where to look for further debugging:**
@@ -270,7 +281,7 @@ kubectl get cm --sort-by='{.metadata.creationTimestamp}' | grep 'review-' | grep
 ### Using K9s
-[K9s] is a powerful command line dashboard which allows you to filter by labels. This can help identify trends with apps exceeding the [review-app resource requests](https://gitlab.com/gitlab-org/gitlab/blob/master/scripts/review_apps/base-config.yaml). Kubernetes will schedule pods to nodes based on resource requests and allow for CPU usage up to the limits.
+[K9s] is a powerful command line dashboard which allows you to filter by labels. This can help identify trends with apps exceeding the [review-app resource requests](https://gitlab.com/gitlab-org/gitlab/-/blob/master/scripts/review_apps/base-config.yaml). Kubernetes will schedule pods to nodes based on resource requests and allow for CPU usage up to the limits.
 - In K9s you can sort or add filters by typing the `/` character
 - `-lrelease=<review-app-slug>` - filters down to all pods for a release. This aids in determining what is having issues in a single deployment
@@ -387,13 +398,11 @@ find a way to limit it to only us.**
 [helm-chart]: https://gitlab.com/gitlab-org/charts/gitlab/
 [review-apps-ce]: https://console.cloud.google.com/kubernetes/clusters/details/us-central1-a/review-apps-ce?project=gitlab-review-apps
 [review-apps-ee]: https://console.cloud.google.com/kubernetes/clusters/details/us-central1-b/review-apps-ee?project=gitlab-review-apps
-[review-apps.sh]: https://gitlab.com/gitlab-org/gitlab/blob/master/scripts/review_apps/review-apps.sh
-[automated_cleanup.rb]: https://gitlab.com/gitlab-org/gitlab/blob/master/scripts/review_apps/automated_cleanup.rb
-[Auto-DevOps.gitlab-ci.yml]: https://gitlab.com/gitlab-org/gitlab/blob/master/lib/gitlab/ci/templates/Auto-DevOps.gitlab-ci.yml
-[gitlab-ci-yml]: https://gitlab.com/gitlab-org/gitlab/blob/master/.gitlab-ci.yml
+[review-apps.sh]: https://gitlab.com/gitlab-org/gitlab/-/blob/master/scripts/review_apps/review-apps.sh
+[automated_cleanup.rb]: https://gitlab.com/gitlab-org/gitlab/-/blob/master/scripts/review_apps/automated_cleanup.rb
+[Auto-DevOps.gitlab-ci.yml]: https://gitlab.com/gitlab-org/gitlab/-/blob/master/lib/gitlab/ci/templates/Auto-DevOps.gitlab-ci.yml
 [gitlab-k8s-integration]: ../../user/project/clusters/index.md
 [K9s]: https://github.com/derailed/k9s
-[password-bug]: https://gitlab.com/gitlab-org/gitlab-foss/issues/53621
 ---
...
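The staged cleanup described in the documentation change (stop stale Review Apps after 5 days, delete their environment after 6 days) amounts to comparing the last deployment time against two thresholds. The following is a minimal sketch of that comparison, not the code of `automated_cleanup.rb`: `cleanup_action` and `SECONDS_PER_DAY` are hypothetical names, and the `:delete` branch is an assumption, since the diff only shows the `elsif`/`else` branches of the real method.

```ruby
SECONDS_PER_DAY = 24 * 60 * 60

# Hypothetical helper: decides what the cleanup job would do with a Review App
# whose last deployment happened at `deployed_at`.
def cleanup_action(deployed_at, now: Time.now, days_for_stop: 5, days_for_delete: 6)
  stop_threshold = now - days_for_stop * SECONDS_PER_DAY
  delete_threshold = now - days_for_delete * SECONDS_PER_DAY

  if deployed_at < delete_threshold
    :delete # assumed branch, not visible in the diff
  elsif deployed_at < stop_threshold
    :stop
  else
    :leave
  end
end

now = Time.utc(2020, 3, 10, 12, 0, 0)
puts cleanup_action(Time.utc(2020, 3, 9, 12, 0, 0), now: now) # deployed 1 day ago
puts cleanup_action(Time.utc(2020, 3, 4, 18, 0, 0), now: now) # deployed 5.75 days ago
puts cleanup_action(Time.utc(2020, 3, 3, 0, 0, 0), now: now)  # deployed 7.5 days ago
```

Checking the delete threshold first matters: an app older than 6 days is also older than 5 days, so testing the stop threshold first would never reach the delete case.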
@@ -54,7 +54,7 @@ class AutomatedCleanup
 end
 def perform_gitlab_environment_cleanup!(days_for_stop:, days_for_delete:)
-puts "Checking for review apps not updated in the last #{days_for_stop} days..."
+puts "Checking for Review Apps not updated in the last #{days_for_stop} days..."
 checked_environments = []
 delete_threshold = threshold_time(days: days_for_delete)
@@ -84,7 +84,7 @@ class AutomatedCleanup
 elsif deployed_at < stop_threshold
 stop_environment(environment, deployment)
 else
-print_release_state(subject: 'Review app', release_name: environment.slug, release_date: last_deploy, action: 'leaving')
+print_release_state(subject: 'Review App', release_name: environment.slug, release_date: last_deploy, action: 'leaving')
 end
 checked_environments << environment.slug
@@ -94,9 +94,9 @@ class AutomatedCleanup
 end
 def perform_helm_releases_cleanup!(days:)
-puts "Checking for Helm releases not updated in the last #{days} days..."
+puts "Checking for Helm releases that are FAILED or not updated in the last #{days} days..."
-threshold_day = threshold_time(days: days)
+threshold = threshold_time(days: days)
 releases_to_delete = []
@@ -104,7 +104,7 @@ class AutomatedCleanup
 # Prevents deleting `dns-gitlab-review-app` releases or other unrelated releases
 next unless release.name.start_with?('review-')
-if release.status == 'FAILED' || release.last_update < threshold_day
+if release.status == 'FAILED' || release.last_update < threshold
 releases_to_delete << release
 else
 print_release_state(subject: 'Release', release_name: release.name, release_date: release.last_update, action: 'leaving')
@@ -180,14 +180,14 @@ end
 automated_cleanup = AutomatedCleanup.new
-timed('Review apps cleanup') do
+timed('Review Apps cleanup') do
-automated_cleanup.perform_gitlab_environment_cleanup!(days_for_stop: 2, days_for_delete: 3)
+automated_cleanup.perform_gitlab_environment_cleanup!(days_for_stop: 5, days_for_delete: 6)
 end
 puts
 timed('Helm releases cleanup') do
-automated_cleanup.perform_helm_releases_cleanup!(days: 3)
+automated_cleanup.perform_helm_releases_cleanup!(days: 7)
 end
 exit(0)
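The selection rule used by `perform_helm_releases_cleanup!` (keep only `review-` releases, then pick those that are `FAILED` or not updated since the threshold) can be tried in isolation. This is a rough sketch under stated assumptions: the `Release` struct, the `releases_to_delete` method, and the sample data are hypothetical stand-ins, and the real script gets its release objects from a Helm client that is not shown in this diff.

```ruby
# Hypothetical stand-in for the release objects handled by the script.
Release = Struct.new(:name, :status, :last_update)

# Returns the releases that the rule in the diff would mark for deletion.
def releases_to_delete(releases, threshold:)
  releases.select do |release|
    # Skip `dns-gitlab-review-app` and other unrelated releases.
    next false unless release.name.start_with?('review-')

    release.status == 'FAILED' || release.last_update < threshold
  end
end

now = Time.utc(2020, 3, 10)
threshold = now - 7 * 24 * 60 * 60 # 7 days, matching `days: 7` in the diff

# Illustrative sample data only.
releases = [
  Release.new('review-fresh', 'DEPLOYED', Time.utc(2020, 3, 9)),
  Release.new('review-failed', 'FAILED', Time.utc(2020, 3, 9)),
  Release.new('review-stale', 'DEPLOYED', Time.utc(2020, 3, 1)),
  Release.new('dns-gitlab-review-app', 'DEPLOYED', Time.utc(2020, 1, 1))
]

puts releases_to_delete(releases, threshold: threshold).map(&:name)
```

With this sample data, the failed release is selected even though it was updated recently, and the old `dns-gitlab-review-app` release is skipped because of the name prefix check.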