DevOps Metrics

Why should we care about DevOps metrics, and what are they? All engineering, including software, needs metrics to track performance, but many metrics, when measured individually, can be 'gamed' or don't encourage the right behaviors or incentives. This has been an issue with metrics for many years. The DORA metrics are a step in the right direction: they combine several metrics that together encourage the right behaviors and incentives, and hence encourage DevOps teams to perform at a high level.

This project focuses on helping you collect and analyze four key metrics of high performing DevOps teams from GitHub and Azure DevOps. DORA's "State of DevOps" research and the book Accelerate highlighted four driving indicators of high performing DevOps teams. While these four metrics are widely used in DevOps discussions, it's challenging to implement and capture all of them.

  • Deployment frequency: Number of deployments to production. This is important, as it highlights how often you can deploy to production, which in turn indicates mature automated testing and a mature CI/CD pipeline for releasing to production.
  • Lead time for changes: Time from committing a change to deployment to production. How quickly can we change a line of code and have it running in production? Again, this indicates mature automated testing and a mature CI/CD pipeline able to handle changes.
  • Mean time to restore (MTTR): How quickly production is restored after an outage or degradation. When there is a degradation, how quickly can the system auto-heal itself or scale to handle increased load? This one is controversial, as it's challenging to compare different events that cause degradation.
  • Change failure rate: After a production deployment, was it successful? Or was a fix or rollback required after the fact? How often is a change we made 'successful'? This ties in well with deployment frequency and lead time for changes, but is challenging to measure, as it requires a sign-off of success: not just that the code deployed correctly, but that the deployment caused no adverse effects or degradation of the system.
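
As a rough illustration of the two count-based metrics above, the following Python sketch computes deployment frequency and change failure rate from a hand-written list of deployments. The `Deployment` record, its fields, and the sample data are hypothetical and are not taken from this project's code.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Deployment:
    day: date         # day the production deployment ran
    needed_fix: bool  # True if a fix or rollback was required afterwards


def deployment_frequency(deployments, period_days):
    """Average number of production deployments per day over the period."""
    return len(deployments) / period_days


def change_failure_rate(deployments):
    """Share of deployments that needed a fix or rollback (0.0 to 1.0)."""
    if not deployments:
        return 0.0
    failures = sum(1 for d in deployments if d.needed_fix)
    return failures / len(deployments)


# Hypothetical sample: four deployments in a 7-day window, one rolled back.
sample = [
    Deployment(date(2023, 1, 2), False),
    Deployment(date(2023, 1, 3), True),
    Deployment(date(2023, 1, 5), False),
    Deployment(date(2023, 1, 6), False),
]

frequency = deployment_frequency(sample, period_days=7)
failure_rate = change_failure_rate(sample)
print(f"{frequency:.2f} deployments/day, {failure_rate:.0%} change failure rate")
```

The hard part in practice is not the arithmetic but deciding what counts as a failed deployment, which is why the sketch takes that flag as an input rather than deriving it.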

High performing metrics (Chart from page 9 of the State of DevOps 2021 report)

A demo website displaying the metrics can be viewed here. More information about high performing DevOps metrics can be found in a blog post here.

The current solution:

We currently have all four metrics implemented and undergoing a pilot. There is a Probot for GitHub. (The Azure DevOps widget is on hold to focus on GitHub.)

  • Deployment Frequency, in both Azure DevOps and GitHub:

    • How does it work? We look at the number of successful pipeline runs.
    • Assumptions/things we can't currently measure:
      • The build is multi-stage, and leads to a deployment in a production environment.
      • We only look at a single branch (usually the main branch), hence we ignore feature branches (as these probably aren't deploying to production)
    • Current limitations: Only one build/run/branch can be specified
  • Lead time for changes, in both Azure DevOps and GitHub:

    • How does it work? We look at the successful pipeline runs and match them with pull requests
    • Assumptions/things we can't currently measure:
      • We currently count the pull request and deployment durations, averaging them for the time period to create the lead time for changes metric.
      • We measure from the last commit on a branch to the PR close/merge time. Development time is variable and depends on the task, so it isn't included in this measurement.
      • We assume we are following a git flow process, creating feature branches and merging back to the main branch, which is deployed to production on the completion of pull requests
      • We assume that the user requires pull requests to merge work into the main branch. We look at all work that is not on this main branch, hence we currently only support one main branch.
    • Current limitations: Only one repo and main branch can be specified
  • Time to restore service, in Azure

    • How does it work? We set up Azure Monitor alerts on our resources. For example, on our web service we have alerts for HTTP 500 and HTTP 403 errors, as well as for CPU and RAM usage. If any of these alerts is triggered, we capture the alert in an Azure function and save it to Azure table storage, where we can aggregate the alerts and measure the duration of the outage. When the alert is later resolved, the resolution flows through the same workflow, which saves it and records the restoration of service.
    • Assumptions/things we can't currently measure:
      • Our project is hosted in Azure
      • The production environment is contained in a single resource group
      • There are appropriate alerts set up on each of the resources, each with action groups to save the alert to Azure Storage
      • We generate an SLA, but it's entirely based on the MTTR time - assuming the application is "not available" during this time
    • Current limitations:
      • Only one production resource group can be specified
      • If there is a catastrophic resource group failure (e.g. the group is deleted), there is a high chance that some or all of the alerts will also be deleted
  • Change failure rate, in Azure DevOps and GitHub

    • How does it work? We look at builds, and let the user indicate if it was successful or a failure. By default (currently), the build is considered a failure. (We are going to change this to success by default later)
    • Assumptions/things we can't currently measure:
      • The build is multi-stage, and leads to a deployment in a production environment.
      • We only look at a single branch (usually the main branch), hence we ignore feature branches (as these probably aren't deploying to production)
      • The user has reviewed the build/deployment and confirmed that the production deployment was successful
    • Current limitations: Only one build/run can be specified
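
The lead time and time to restore calculations described above could be sketched roughly as follows in Python. This is only an illustration under stated assumptions: the function names, the tuple layouts, and the sample timestamps are hypothetical, and the project's actual .NET implementation may differ. Overlapping outages are simply added together here.

```python
from datetime import datetime, timedelta


def average(durations):
    """Mean of a list of timedeltas; zero when the list is empty."""
    if not durations:
        return timedelta(0)
    return sum(durations, timedelta(0)) / len(durations)


def lead_time_for_changes(pull_requests, deployments):
    """Average PR duration plus average deployment duration.

    pull_requests: (last_commit_time, merge_time) pairs
    deployments:   (start_time, finish_time) pairs for successful runs
    """
    pr_avg = average([merged - committed for committed, merged in pull_requests])
    deploy_avg = average([finished - started for started, finished in deployments])
    return pr_avg + deploy_avg


def outages_from_alerts(events):
    """Pair each 'fired' event with the next 'resolved' event of the same alert.

    events: (timestamp, alert_name, state) tuples; state is 'fired' or 'resolved'
    """
    open_alerts = {}
    outages = []
    for timestamp, name, state in sorted(events):
        if state == "fired":
            # Keep the earliest start if the same alert fires again while open.
            open_alerts.setdefault(name, timestamp)
        elif state == "resolved" and name in open_alerts:
            outages.append(timestamp - open_alerts.pop(name))
    return outages


def sla(outages, period):
    """Availability over the period, treating every outage as full downtime."""
    downtime = sum(outages, timedelta(0))
    return 1 - downtime / period


# Hypothetical sample data for one day.
t0 = datetime(2023, 1, 1, 9, 0)
prs = [(t0, t0 + timedelta(hours=2)), (t0, t0 + timedelta(hours=4))]
runs = [(t0, t0 + timedelta(minutes=10)), (t0, t0 + timedelta(minutes=20))]
events = [
    (t0, "HTTP500", "fired"),
    (t0 + timedelta(minutes=30), "HTTP500", "resolved"),
    (t0 + timedelta(hours=5), "CPU", "fired"),
    (t0 + timedelta(hours=6, minutes=30), "CPU", "resolved"),
]

lead_time = lead_time_for_changes(prs, runs)
outages = outages_from_alerts(events)
mttr = average(outages)
availability = sla(outages, period=timedelta(days=1))
print(lead_time, mttr, f"{availability:.2%}")
```

Alerts that never receive a "resolved" event are ignored by this sketch, so an ongoing outage would not yet count toward MTTR or the SLA.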

Architecture

Developed in .NET 7. A GitHub action runs the CI/CD process.

Currently the CI/CD process:

  1. Builds the code
  2. Runs the unit tests
  3. Deploys the Probot code to an Azure web app (http://devops-prod-eu-probot.azurewebsites.net/) (Currently disabled)
  4. Deploys the webservice to an Azure web app (https://devops-prod-eu-service.azurewebsites.net)
  5. Deploys the demo website to an Azure web app (https://devops-prod-eu-web.azurewebsites.net)
  6. Deploys the function website to an Azure function

Dependabot runs daily to check for dependency upgrades.

Architecture diagram

Badges

The API can generate a URL for static badges, but more work is needed.
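
One common way to build a static badge URL is the Shields.io path format `https://img.shields.io/badge/<label>-<message>-<color>`. The Python sketch below assumes Shields.io is the badge provider, which this README does not confirm, and `static_badge_url` is a hypothetical helper rather than part of this project's API.

```python
from urllib.parse import quote


def static_badge_url(label, message, color):
    """Build a Shields.io static badge URL (assumed provider)."""

    def escape(text):
        # In Shields.io static badge paths, "-" separates the parts and "_"
        # means a space, so literal ones are doubled before URL-encoding.
        text = text.replace("-", "--").replace("_", "__")
        return quote(text, safe="")

    return f"https://img.shields.io/badge/{escape(label)}-{escape(message)}-{color}"


url = static_badge_url("Deployment frequency", "High", "green")
print(url)
```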

Setup

(Old Manual Steps - Outdated!) Manually Deploying Azure Infra

  • Run the infrastructure setup script [Currently \src\DevOpsMetrics.Infrastructure\DeployInfrastructureToAzure.ps1]
  • DevOpsMetrics.Service setup: The Key Vault URL and Application Insights ID are set as part of the setup script
  • Browse to [website name].azurewebsites.net/Home/Settings, and set up your projects as needed. Note that all secrets are loaded into the Key Vault and are controlled by you!

Automatically Deploying Azure Infra and Code using GitHub Actions

1) Create a Service Principal with the Owner role on the subscription in Azure

2) Set the following GitHub Secrets:

  • AzureDevOpsPATToken: Azure DevOps PAT token
  • GitHubClientId: Client ID of the OAuth application in GitHub
  • GitHubClientSecret: Secret of the OAuth application in GitHub
  • AZ_CLIENT_ID: Client ID of the Azure Service Principal to deploy this tool
  • AZ_CLIENT_SECRET: Client Secret of the Azure Service Principal to deploy this tool
  • AZ_SUBSCRIPTION_ID: Subscription ID of the Azure Service Principal
  • AZ_TENANT_ID: Tenant ID of the Azure Service Principal
  • AZ_RESOURCES_SUFFIX: Suffix for the Azure Resources - as many resources must have unique names

3) Run the IaC CI/CD pipeline to create the Infra components

4) Run the other pipelines to deploy the solution
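
Before running the pipelines, it can help to confirm that every secret from step 2 has been created. The Python sketch below is a hypothetical pre-flight check and not part of this project: it compares the secret names listed above against a list of names you supply, for example copied from the repository's secrets settings page.

```python
# Secret names taken from step 2 of the setup instructions.
REQUIRED_SECRETS = [
    "AzureDevOpsPATToken",
    "GitHubClientId",
    "GitHubClientSecret",
    "AZ_CLIENT_ID",
    "AZ_CLIENT_SECRET",
    "AZ_SUBSCRIPTION_ID",
    "AZ_TENANT_ID",
    "AZ_RESOURCES_SUFFIX",
]


def missing_secrets(configured_names):
    """Return the required secret names that are not in configured_names."""
    configured = set(configured_names)
    return [name for name in REQUIRED_SECRETS if name not in configured]


# Hypothetical example: only the Azure secrets have been created so far.
configured = [
    "AZ_CLIENT_ID",
    "AZ_CLIENT_SECRET",
    "AZ_SUBSCRIPTION_ID",
    "AZ_TENANT_ID",
    "AZ_RESOURCES_SUFFIX",
]
missing = missing_secrets(configured)
print("Missing secrets:", ", ".join(missing) if missing else "none")
```

The check only compares names; it cannot tell whether a secret's value is correct.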

To debug/run tests

  • To debug issues on the web service:
    • Turn on Development mode in App Service by defining the configuration setting "ASPNETCORE_ENVIRONMENT=Development"
    • Access Swagger using "{server url}/swagger"
    • Invoke the desired call

What's next?

  • Upgrades to packaging and setup (in progress)
  • Upgrades to store data in CosmosDB (currently in Azure storage)
  • Reviewing the current GitHub Probot approach, to find a better target than issues (perhaps a metrics readme.md file?)
  • Support for more scenarios, releases, etc.
  • Azure DevOps marketplace integrations, so you can see the changes in real time on your project/repo. (Lower priority, to focus on GitHub.)
