---
info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#designated-technical-writers
type: reference
---
# Pipeline Efficiency
[CI/CD Pipelines](index.md) are the fundamental building blocks for [GitLab CI/CD](../README.md).
Making pipelines more efficient helps you save developer time, which:
- Speeds up your DevOps processes
- Reduces costs
- Shortens the development feedback loop
It's common that new teams or projects start with slow and inefficient pipelines,
and improve their configuration over time through trial and error. A better process is
to use pipeline features that improve efficiency right away, and get a faster software
development lifecycle earlier.
First ensure you are familiar with [GitLab CI/CD fundamentals](../introduction/index.md)
and understand the [quick start guide](../quick_start/README.md).
## Identify bottlenecks and common failures
The easiest indicators to check for inefficient pipelines are the runtimes of the jobs,
stages, and the total runtime of the pipeline itself. The total pipeline duration is
heavily influenced by the:
- Total number of stages and jobs
- Dependencies between jobs
- The ["critical path"](#directed-acyclic-graphs-dag-visualization), which represents
the minimum and maximum pipeline duration
Additional points to pay attention to relate to [GitLab Runners](../runners/README.md):
- Availability of the runners and the resources they are provisioned with
- Build dependencies and their installation time (see the caching sketch after this list)
- [Container image size](#docker-images)
- Network latency and slow connections
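If dependency installation dominates job runtime, caching downloads between pipeline runs can help. The following is a minimal sketch, assuming a Node.js project with a `package-lock.json` file; the image, paths, and commands are illustrative assumptions, not a prescribed setup:
```yaml
# Caching sketch (assumed Node.js project): reuse downloaded dependencies
# between pipeline runs to cut installation time.
default:
  image: node:18-alpine        # hypothetical image choice
  cache:
    key:
      files:
        - package-lock.json    # a new cache key only when the lockfile changes
    paths:
      - .npm/                  # npm cache directory inside the project

install_and_test:
  script:
    - npm ci --cache .npm --prefer-offline
    - npm test
```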
Pipelines that frequently fail unnecessarily also cause slowdowns in the development
lifecycle. You should look for problematic patterns with failed jobs:
- Flaky unit tests which fail randomly, or produce unreliable test results.
- Test coverage drops and code quality issues correlated with that behavior.
- Failures that can be safely ignored, but that halt the pipeline instead (see the `allow_failure` sketch after this list).
- Tests that fail at the end of a long pipeline, but could be in an earlier stage,
causing delayed feedback.
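For failures that are safe to ignore, `allow_failure` keeps the pipeline moving. A minimal sketch, assuming a non-blocking style check job (the job name and script are assumptions):
```yaml
# Non-blocking job sketch: the pipeline continues even if this job fails,
# so a safely ignorable failure does not halt later stages.
style_check:
  stage: test
  script:
    - ./run-style-check.sh   # hypothetical script
  allow_failure: true        # report the failure, but do not block the pipeline
```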
## Pipeline analysis
Analyze the performance of your pipeline to find ways to improve efficiency. Analysis
can help identify possible blockers in the CI/CD infrastructure. This includes analyzing:
- Job workloads
- Bottlenecks in the execution times
- The overall pipeline architecture
It's important to understand and document the pipeline workflows, and discuss possible
actions and changes. Refactoring pipelines may need careful interaction between teams
in the DevSecOps lifecycle.
Pipeline analysis can help identify issues with cost efficiency. For example, [runners](../runners/README.md)
hosted with a paid cloud service may be provisioned with:
- More resources than needed for CI/CD pipelines, wasting money.
- Not enough resources, causing slow runtimes and wasting time.
### Pipeline insights
The [Pipeline success and duration charts](index.md#pipeline-success-and-duration-charts)
give information about pipeline runtime and failed job counts.
Tests like [unit tests](../unit_test_reports.md), integration tests, end-to-end tests,
[code quality](../../user/project/merge_requests/code_quality.md) tests, and others
ensure that problems are automatically found by the CI/CD pipeline. There could be many
pipeline stages involved causing long runtimes.
You can reduce overall runtime by running jobs that test different things in parallel,
in the same stage. The downside is that you need more runners running simultaneously
to support the parallel jobs.
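A minimal sketch of this layout, with assumed job names and scripts, where three independent test jobs share one `test` stage and run in parallel:
```yaml
# Parallel test jobs sketch: all three jobs belong to the same `test` stage,
# so they run at the same time when enough runners are available.
stages:
  - test

unit_tests:
  stage: test
  script: ./run-unit-tests.sh          # hypothetical test scripts

integration_tests:
  stage: test
  script: ./run-integration-tests.sh

code_quality_checks:
  stage: test
  script: ./run-code-quality.sh
```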
The [testing levels for GitLab](../../development/testing_guide/testing_levels.md)
provide an example of a complex testing strategy with many components involved.
### Directed Acyclic Graphs (DAG) visualization
The [Directed Acyclic Graph](../directed_acyclic_graph/index.md) (DAG) visualization can help analyze the critical path in
the pipeline and understand possible blockers.
![CI Pipeline Critical Path with DAG](img/ci_efficiency_pipeline_dag_critical_path.png)
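A minimal sketch of a DAG relationship using the `needs` keyword; the job names, scripts, and artifact paths are assumptions:
```yaml
# DAG sketch: `package_backend` starts as soon as `build_backend` finishes,
# without waiting for the slower `build_frontend` job in the same stage.
stages:
  - build
  - package

build_backend:
  stage: build
  script: ./build-backend.sh           # hypothetical build scripts
  artifacts:
    paths:
      - backend/dist/

build_frontend:
  stage: build
  script: ./build-frontend.sh

package_backend:
  stage: package
  needs: ["build_backend"]             # only this dependency must finish first
  script: ./package-backend.sh
```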
### Pipeline monitoring
Global pipeline health is a key indicator to monitor along with job and pipeline duration.
[CI/CD analytics](index.md#pipeline-success-and-duration-charts) give a visual
representation of pipeline health.
Instance administrators have access to additional [performance metrics and self-monitoring](../../administration/monitoring/index.md).
You can fetch specific pipeline health metrics from the [API](../../api/README.md).
External monitoring tools can poll the API and verify pipeline health or collect
metrics for long term SLA analytics.
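One lightweight approach is a scheduled job that polls the pipelines API with `curl`. The following is a minimal sketch; the project ID, token variable, and instance URL are assumptions:
```yaml
# Scheduled API polling sketch: count recent failed pipelines on `main`
# for an (assumed) project with ID 42.
check_pipeline_health:
  image: alpine:latest
  rules:
    - if: $CI_PIPELINE_SOURCE == "schedule"
  script:
    - apk add --no-cache curl jq
    - >
      curl --silent --header "PRIVATE-TOKEN: $MONITOR_TOKEN"
      "https://gitlab.example.com/api/v4/projects/42/pipelines?ref=main&status=failed&per_page=100"
      | jq 'length'
```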
For example, the [GitLab CI Pipelines Exporter](https://github.com/mvisonneau/gitlab-ci-pipelines-exporter)
for Prometheus fetches metrics from the API. It can check branches in projects automatically
and get the pipeline status and duration. In combination with a Grafana dashboard,
this helps build an actionable view for your operations team. Metric graphs can also
be embedded into incidents, making problem resolution easier.
![Grafana Dashboard for GitLab CI Pipelines Prometheus Exporter](img/ci_efficiency_pipeline_health_grafana_dashboard.png)
Alternatively, you can use a monitoring tool that can execute scripts, such as
[`check_gitlab`](https://gitlab.com/6uellerBpanda/check_gitlab).
#### Runner monitoring
You can also [monitor CI runners](https://docs.gitlab.com/runner/monitoring/) on
their host systems, or in clusters like Kubernetes. This includes checking:
- Disk and disk IO
- CPU usage
- Memory
- Runner process resources
The [Prometheus Node Exporter](https://prometheus.io/docs/guides/node-exporter/)
can monitor runners on Linux hosts, and [`kube-state-metrics`](https://github.com/kubernetes/kube-state-metrics)
runs in a Kubernetes cluster.
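For example, a Prometheus scrape configuration for runner hosts running the Node Exporter might look like the following sketch; the hostnames are assumptions:
```yaml
# Prometheus scrape sketch: collect host metrics (CPU, memory, disk, IO)
# from Node Exporter instances running on the runner machines.
scrape_configs:
  - job_name: gitlab-runner-hosts
    static_configs:
      - targets:
          - "runner-01.example.com:9100"   # Node Exporter default port
          - "runner-02.example.com:9100"
```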
You can also test [GitLab Runner auto-scaling](https://docs.gitlab.com/runner/configuration/autoscale.html)
with cloud providers, and define offline times to reduce costs.
#### Dashboards and incident management
Use your existing monitoring tools and dashboards to integrate CI/CD pipeline monitoring,
or build them from scratch. Ensure that the runtime data is actionable and useful
to teams, and that operations/SRE teams can identify problems early enough.
[Incident management](../../operations/incident_management/index.md) can help here too,
with embedded metric charts and all valuable details to analyze the problem.
### Storage usage
Review the storage use of the following to help analyze costs and efficiency:
- [Job artifacts](job_artifacts.md) and their [`expire_in`](../yaml/README.md#artifactsexpire_in)
configuration. If kept for too long, storage usage grows and could slow pipelines down.
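A minimal sketch of setting an artifact expiry; the job, paths, and duration are assumptions:
```yaml
# Artifact expiry sketch: keep build output for one week instead of forever,
# so storage usage does not grow without bound.
build:
  stage: build
  script: ./build.sh          # hypothetical build script
  artifacts:
    paths:
      - dist/
    expire_in: 1 week
```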