So, you started adopting DevOps in your company. But how can you tell whether it’s actually improving your processes? You have to measure that success somehow, and you can do so by monitoring a few key DevOps metrics.

There are many ways to assess the quality of a system or application, but in this article, we’ll focus on key metrics that help evaluate the quality of your processes. By tracking these indicators, you can gain deeper insights into your strengths and weaknesses, improve your DevOps best practices, and leverage the right tools and software for continuous improvement.

DORA Metrics

DORA stands for DevOps Research and Assessment, the team behind the long-running State of DevOps research program. Google acquired DORA in 2018.

DORA uses data-driven insights to promote DevOps best practices, focusing on helping organizations develop and deliver software more effectively. Now part of Google Cloud, DORA continues to publish DevOps studies and reports that help enterprises improve software delivery.

Deployment Frequency

An important metric for DevOps success is the number of deployments in a given timeframe. A high deployment frequency indicates that new business value is delivered more frequently and in smaller increments.

Frequent deployments also mean smaller changes per release, which reduces the risk and the cost of a failed deployment. In turn, this increases overall customer satisfaction.

In the 2021 State of DevOps report, DORA researchers found that elite teams deploy on demand, multiple times per day; high performers deploy between once per week and once per month; and medium and low performers deploy somewhere between once per month and once every six months, or even less often. If you are on the lower end, you might want to consider increasing your deployment frequency.

Related: 10 BEST DEVOPS DEPLOYMENT TOOLS FOR QA TEAMS

How to Measure Deployment Frequency

To measure deployment frequency, collect from your pipeline tool (Azure DevOps, Jenkins, etc.) the number of successful production deployments and count them over a fixed window, per day, week, or month. The higher the frequency, the better. Some teams also track the share of builds that make it to production (Deployments / Total Builds * 100) as a complementary indicator.
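
As a minimal sketch, assuming you have already exported the timestamps of successful production deployments from your pipeline tool (the data shape here is hypothetical), you could compute the frequency in Python:

```python
from datetime import datetime, timedelta

def deployments_per_week(deploy_times):
    """Average number of successful production deployments per week.

    `deploy_times` is assumed to be a list of datetimes exported from
    your pipeline tool (Azure DevOps, Jenkins, etc.).
    """
    if len(deploy_times) < 2:
        return float(len(deploy_times))
    span = max(deploy_times) - min(deploy_times)
    weeks = max(span / timedelta(weeks=1), 1.0)  # treat short spans as one week
    return len(deploy_times) / weeks

# Example: three deployments spread over roughly two weeks
deploys = [datetime(2024, 5, 1), datetime(2024, 5, 6), datetime(2024, 5, 14)]
print(f"{deployments_per_week(deploys):.1f} deployments per week")
```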

Lead Time for Changes

Another DORA metric is the lead time for changes. One of the main advantages of DevOps is the ability to release quickly, so it’s a good idea to measure how long a work item takes to get from code commit to running in production. This covers the entire cycle of an item: development, testing, and delivery.

A shorter lead time is usually better, so the objective is to reduce the total deployment time. This can be done by improving test integration and automation, for example.

Related: WHAT IS DEVOPS RELEASE MANAGEMENT AND 4 BEST PRACTICES

How to Measure Lead Time

Like any other metric, lead time brings little value until you have enough data points. To compute the lead time for changes, average the commit-to-production time across multiple changes over a period. Averaging is essential because no two changes are the same, and lead times will vary depending on the scope and kind of change.
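
For illustration, here is a minimal sketch in Python; the (committed, deployed) timestamp pairs are hypothetical and would in practice come from your version control and pipeline tools:

```python
from datetime import datetime, timedelta

# Hypothetical (committed, deployed) timestamp pairs; in practice you would
# join each commit to the deployment that first shipped it.
changes = [
    (datetime(2024, 5, 1, 9, 0), datetime(2024, 5, 2, 15, 0)),
    (datetime(2024, 5, 3, 11, 0), datetime(2024, 5, 3, 17, 30)),
    (datetime(2024, 5, 6, 10, 0), datetime(2024, 5, 9, 12, 0)),
]

lead_times = [deployed - committed for committed, deployed in changes]
mean_lead_time = sum(lead_times, timedelta()) / len(lead_times)
print(f"Mean lead time for changes: {mean_lead_time}")
```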

Mean Time to Recovery (MTTR)

This metric refers to the time it takes for the organization to recover from a production failure. 

As much as we hate it, unplanned outages or failures are a natural part of any system’s life. And since they are inevitable, what matters is how long it takes to restore the system or the application. 

The metric is significant because it helps DevOps teams, at startups and more established companies alike, create more reliable systems.


How to Measure Mean Time to Recovery

MTTR can be calculated by keeping track of the average time between when a defect was reported and when the fix was deployed to production. It is done as part of the continuous monitoring activities performed by the DevOps teams.
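
As a rough sketch, assuming you log when each incident was reported and when its fix reached production (the records below are hypothetical):

```python
from datetime import datetime, timedelta

# Hypothetical incident log: (reported_at, fix_deployed_at) pairs
incidents = [
    (datetime(2024, 5, 2, 10, 0), datetime(2024, 5, 2, 11, 30)),
    (datetime(2024, 5, 10, 22, 15), datetime(2024, 5, 11, 1, 0)),
]

recovery_times = [fixed - reported for reported, fixed in incidents]
mttr = sum(recovery_times, timedelta()) / len(recovery_times)
print(f"MTTR: {mttr}")  # 2:07:30 for the sample data
```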

According to DORA’s report, elite teams have a mean time to recover of under an hour, high-performing teams under a day, and medium- and low-performing teams may take anywhere from a day to a month or more.

Change Failure Rate

Last but not least of the DORA metrics, the Change Failure Rate is the percentage of code changes that result in incidents, defects, rollbacks, or other production failures. It looks at how many deployments fail once released into production.

It shows how stable and efficient your DevOps processes are. One goal of tracking the change failure rate is to identify where additional DevOps automation would help, since increased automation makes software more consistent and dependable, and therefore more likely to succeed in production.

Unlike the previous metrics, which focus on speed, the Change Failure Rate directly measures the quality of the software. A lower change failure rate means that fewer failures reach production, which should increase customer satisfaction.

How to Measure Change Failure Rate

Of course, the aim is to have the Change Failure Rate as low as possible. To calculate it, divide the number of deployment failures by the total number of deployments. 
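
The arithmetic is straightforward; a minimal Python helper might look like this (the example counts are made up):

```python
def change_failure_rate(failed_deployments, total_deployments):
    """Change failure rate as a percentage of all production deployments."""
    if total_deployments == 0:
        return 0.0
    return failed_deployments / total_deployments * 100

# Example: 3 of 40 deployments this month caused an incident or rollback
print(f"{change_failure_rate(3, 40):.1f}%")  # 7.5%
```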

Other Notable DevOps Metrics

Apart from the DORA metrics, other essential metrics can give insights into the performance of the DevOps team.

Pre-Deployment Performance Monitoring Tools and Techniques

Monitoring application performance before deployment is crucial for identifying potential issues and ensuring smooth operation post-deployment. Tools like Retrace, New Relic, and Datadog help DevOps engineers detect performance problems, hidden errors, and other issues before they impact end users.

Key Techniques for Pre-Deployment Monitoring

  • Load Testing: Simulates high-traffic scenarios to identify potential bottlenecks (see the sketch after this list).
  • Code Profiling: Helps developers analyze application performance at a granular level.
  • Automated Testing: Ensures code changes do not introduce regressions before deployment.
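
To make the load-testing technique concrete, here is a minimal sketch using only Python’s standard library; dedicated tools such as JMeter, k6, or Locust are the usual choice in practice, and the URL below is hypothetical:

```python
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor

URL = "https://staging.example.com/health"  # hypothetical pre-production endpoint

def timed_request(_):
    """Issue one GET request and return its latency in seconds."""
    start = time.perf_counter()
    with urllib.request.urlopen(URL, timeout=10) as response:
        response.read()
    return time.perf_counter() - start

# Fire 100 requests through 10 concurrent workers and report latency percentiles.
with ThreadPoolExecutor(max_workers=10) as pool:
    latencies = sorted(pool.map(timed_request, range(100)))

print(f"p50: {latencies[49]:.3f}s  p95: {latencies[94]:.3f}s")
```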

Post-Deployment Application Performance Monitoring

After deployment, whether or not you practice testing in production, continuous monitoring of critical application metrics is essential for detecting changes in usage patterns. Tools like Prometheus, Splunk, and AppDynamics help monitor real-time performance; a small example of deriving a few of these metrics from raw request logs follows the list below.

Key Metrics to Monitor Post-Deployment

  • CPU and Memory Usage: Identifies resource utilization trends.
  • Response Time: Measures the time it takes for the application to respond to requests.
  • Error Rate: Tracks the number of failed requests over time.
  • Throughput: Measures the number of requests processed per second.
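
As a minimal sketch, assuming you can parse your access logs into (timestamp, status, latency) tuples (the sample data here is made up), error rate, throughput, and response time fall out directly:

```python
# Hypothetical parsed access-log entries: (timestamp_s, status_code, latency_s)
requests = [
    (0.2, 200, 0.031), (0.9, 200, 0.054), (1.4, 500, 0.480),
    (2.1, 200, 0.044), (2.8, 404, 0.012), (3.5, 200, 0.038),
]

window = max(t for t, _, _ in requests) - min(t for t, _, _ in requests)
throughput = len(requests) / window                                  # requests/second
error_rate = sum(status >= 500 for _, status, _ in requests) / len(requests) * 100
avg_response = sum(latency for _, _, latency in requests) / len(requests)

print(f"{throughput:.1f} req/s, {error_rate:.1f}% server errors, "
      f"{avg_response * 1000:.0f} ms average response time")
```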

The Role of Custom Metrics in Application Performance Monitoring

Custom metrics allow teams to monitor specific aspects of application performance tailored to their needs.

For example, Stackify uses custom metrics to track how many log messages are received via their API per minute. This metric provides insights into the volume of data flowing through their system and helps detect potential issues before they escalate.
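
In the same spirit, here is a minimal, hypothetical sketch of such a per-minute counter in Python; in production you would publish the value through your APM tool’s custom-metrics API rather than keep it in memory:

```python
import threading
import time
from collections import Counter

class PerMinuteCounter:
    """Thread-safe counter bucketed by wall-clock minute."""

    def __init__(self):
        self._lock = threading.Lock()
        self._buckets = Counter()

    def increment(self):
        minute = int(time.time() // 60)
        with self._lock:
            self._buckets[minute] += 1

    def last_full_minute(self):
        previous = int(time.time() // 60) - 1
        with self._lock:
            return self._buckets[previous]

messages_received = PerMinuteCounter()
messages_received.increment()  # call this from your API handler
print(messages_received.last_full_minute())  # 0 until a full minute has passed
```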

Examples of Custom Metrics

  • API Request Latency: Measures the delay between sending a request and receiving a response.
  • Database Query Performance: Tracks query execution times and potential bottlenecks.
  • User Behavior Analytics: Monitors feature adoption and engagement trends.

Cycle Time

Cycle time measures the time between starting work on a specific item and when it becomes available to end users. For development teams, cycle time typically runs from a code commit to when that change is deployed to production.

A shorter cycle time indicates an efficient workflow, reducing bottlenecks and increasing development speed.

How to Measure Cycle Time

Cycle time is calculated by tracking timestamps of commits, code merges, and deployments in tools like GitHub, GitLab, or Jenkins.

Cycle Time = Deployment Time - Commit Time
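
Applying the formula above, a minimal sketch with hypothetical (committed, deployed) timestamps might use the median, which is less sensitive to outliers than the mean:

```python
from datetime import datetime
from statistics import median

# Hypothetical (committed, deployed) timestamp pairs from GitHub/Jenkins data
items = [
    (datetime(2024, 5, 1, 9, 0), datetime(2024, 5, 1, 16, 0)),
    (datetime(2024, 5, 2, 10, 0), datetime(2024, 5, 4, 11, 0)),
    (datetime(2024, 5, 6, 8, 0), datetime(2024, 5, 6, 13, 30)),
]

cycle_times = [deployed - committed for committed, deployed in items]
print(f"Median cycle time: {median(cycle_times)}")  # 7:00:00 for the sample data
```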

Impact of Cycle Time on Workflow Efficiency

Understanding the relationship between Cycle Time and workflow efficiency is essential for teams looking to improve their processes. The longer the cycle time, the more work is in progress, and the less efficient the workflows. Optimizing and improving the efficiency of workflows can help reduce Cycle Time.

Strategies for Reducing Cycle Time

To improve Cycle Time, teams should:

  • Optimize CI/CD pipelines for faster integration and deployment.
  • Automate builds and testing.
  • Improve collaboration between development and operations teams.
  • Reduce dependencies between tasks to minimize bottlenecks.

Mean Time to Detection (MTTD)

Mean Time to Detection (MTTD) is the time it takes to detect a production failure and flag it as an issue. This metric helps evaluate the effectiveness of your monitoring and alerting systems. The lower the MTTD, the sooner you can fix the problem and deploy the fix before it affects end users.

How to Measure MTTD

MTTD can be measured by tracking the average time between an issue's occurrence and its detection by monitoring tools or users' reporting. Lowering MTTD involves improving automated monitoring, refining alerting mechanisms, and ensuring proactive issue detection.
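
A minimal sketch, assuming you record when each failure actually began, when it was first flagged, and what flagged it (the records below are hypothetical):

```python
from datetime import datetime, timedelta

# Hypothetical incident records: when the failure began, when it was first
# flagged, and whether monitoring or a user report flagged it.
incidents = [
    {"began": datetime(2024, 5, 2, 10, 0),
     "detected": datetime(2024, 5, 2, 10, 4), "by": "monitoring"},
    {"began": datetime(2024, 5, 9, 14, 0),
     "detected": datetime(2024, 5, 9, 15, 10), "by": "user"},
]

mttd = sum((i["detected"] - i["began"] for i in incidents), timedelta()) / len(incidents)
caught = sum(i["by"] == "monitoring" for i in incidents) / len(incidents) * 100
print(f"MTTD: {mttd}, caught by monitoring: {caught:.0f}%")  # 0:37:00, 50%
```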

Passed Automated Tests

It’s worth striving for solid test coverage, especially with automated tests. And here, I’m talking about unit, integration, UI, and end-to-end tests. However, good coverage alone is not enough to ensure the quality of the software. What also matters is the percentage of these tests that pass.

Of course, the goal is to have a percentage of passed tests as close to 100% as possible. Monitoring this metric can also reveal how often new developments break existing tests.

How to Measure the Percentage of Passed Automated Tests

The calculation is a simple percentage: multiply the number of passed tests by 100, then divide by the total number of tests. You can get this information from the pipeline tool that runs the builds (Jenkins, Azure DevOps, CircleCI, etc.).
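
Most CI tools can export results in the JUnit XML format, so a minimal sketch might parse that report directly (the file name below is hypothetical):

```python
import xml.etree.ElementTree as ET

def pass_rate(junit_xml_path):
    """Percentage of passed tests in a JUnit-style XML report."""
    root = ET.parse(junit_xml_path).getroot()
    total = failed = 0
    for suite in root.iter("testsuite"):
        total += int(suite.get("tests", 0))
        # Skipped tests are counted as not-failed in this simple version.
        failed += int(suite.get("failures", 0)) + int(suite.get("errors", 0))
    return (total - failed) / total * 100 if total else 0.0

print(f"{pass_rate('test-results.xml'):.1f}% of tests passed")
```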

The number can be a good indicator of the quality of the product. However, it can also be tricky if you have flaky or unreliable tests.

Defect Escape Rate

In a utopian world, all our apps would be defect-free. However, that’s rarely the case. Ideally, defects are caught during the development and testing phases of the DevOps process, not in production.

This metric helps to determine the efficacy of your testing processes and the overall quality of your program. A high defect escape rate suggests that procedures must be improved and that more automation is needed, whereas a low rate (ideally near zero) implies a high-quality application.

How to Measure the Defect Escape Rate

To measure this, you can use your bug tracking tool and, for each open defect, track where it has been detected—whether the testing or the production environment (or any other environment you might be using, such as UAT).
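
As a minimal sketch, assuming you can export each defect tagged with the environment where it was first detected (the sample data here is made up):

```python
from collections import Counter

# Hypothetical bug-tracker export: each defect tagged with the environment
# where it was first detected.
defects = ["testing", "testing", "production", "UAT", "testing", "production"]

found_in = Counter(defects)
escape_rate = found_in["production"] / sum(found_in.values()) * 100
print(f"Defect escape rate: {escape_rate:.1f}%")  # 33.3% for the sample data
```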

Customer Tickets

Customer happiness is a driving force for innovation, and with good reason: a flawless user experience is good customer service and typically corresponds to a rise in sales. As a result, customer tickets, and especially how often they escalate, are a good indicator of how well your DevOps transition is going.

Customers should not act as quality control by reporting defects and bugs. Hence, a decrease in customer tickets is a good sign of good application performance.

Final Thoughts

Like any other methodology, DevOps is only successful if implemented correctly, and you can’t judge that success without the right DevOps metrics. If you keep an eye on them and continuously work to improve them, your application will always exude quality.

And speaking of quality, if you want to stay updated with news and articles, subscribe to our newsletter!

Andreea Draniceanu

Hi there! My name is Andreea, I’m a software test engineer based in Romania. I’ve been in the software industry for over 10 years. Currently my main focus is UI test automation with C#, but I love exploring all QA-related areas 😊