| Endpoint | Authentication | Description |
|---|---|---|
| /api | Basic | Health check |
| /api/users | Basic | List all available users |
All API calls require Basic Authentication, which is supplied through the `Authorization` header. For example, the following command queries the health check endpoint and extracts its status:
APP_API_STATUS=$(curl --get --silent --header "Authorization: Basic dGVzdDp0ZXN0MTIz" http://127.0.0.1:3000/api | jq .status)
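
The token in the command above is the Base64 encoding of `test:test123`. As a minimal sketch, the header can be built from a username and password instead of being hard-coded. The function name `basic_auth_header` is a hypothetical helper, not part of the project.

```shell
# Hypothetical helper: build a Basic Authentication header from credentials.
# The header value is "Basic " followed by base64("username:password").
basic_auth_header() {
  # printf avoids the trailing newline that echo would add to the encoded input;
  # tr strips any line wrapping that base64 may insert for long credentials.
  token=$(printf '%s:%s' "$1" "$2" | base64 | tr -d '\n')
  printf 'Authorization: Basic %s' "$token"
}

# Example (assumes the API from the table above is running locally on port 3000):
# curl --get --silent --header "$(basic_auth_header test test123)" http://127.0.0.1:3000/api/users
```

curl can also do the encoding itself when given `--user test:test123`, which sends the same header.
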
First, we identify the metrics we want to focus on. For our implementation, we chose the ones listed under Metrics.
With those metrics in mind, we decide what to measure and how, quantifying the results into useful Statistics.
These can then be Analyzed and Visualized to give an overview of the system's health. We can also set Alarms and Alerts that notify us when a threshold is crossed.
- Lead Time: Measures the time from implementation through testing to release/delivery
- Mean Time To Recovery: Measures how quickly faults are identified and resolved
- Deployment Frequency: Measures the number of deployments (e.g. routine, bugfix, hotfix)
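
As a rough illustration of how the three metrics above could be quantified, the sketch below works on Unix timestamps in seconds. The function names and input formats are assumptions made for this example: incidents arrive as one `detected resolved` pair per line, and deployments as one timestamp per line.

```shell
# Hypothetical helper: lead time in seconds between start of work and release.
lead_time() {
  echo $(( $2 - $1 ))
}

# Hypothetical helper: mean time to recovery in seconds.
# Reads one incident per line on stdin: "<detected_epoch> <resolved_epoch>".
mean_time_to_recovery() {
  awk 'NF == 2 { total += $2 - $1; n++ }
       END { if (n > 0) printf "%d\n", total / n; else print 0 }'
}

# Hypothetical helper: average deployments per day over a period.
# Reads one deployment timestamp per line on stdin; $1 is the period in days.
deployments_per_day() {
  awk -v days="$1" 'NF { n++ } END { printf "%.2f\n", n / days }'
}
```

For instance, `printf '%s\n' '1000 2800' '5000 10400' | mean_time_to_recovery` averages two incidents that took 1800 and 5400 seconds to resolve.
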
- CPU Utilization: Usage exceeds 80%
- Memory: Usage exceeds 80%
- Storage/Diskspace: Usage exceeds 80%
- Service Uptime: Downtime stays below 1%
- API Request Errors: Failure rate stays below 10%
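
The thresholds above can be checked with a small comparison. This is a minimal sketch with hypothetical function names, and how the raw values are collected depends on the platform. `awk` does the comparison because shell arithmetic is integer-only.

```shell
# Hypothetical helper: exit 0 when a percentage is strictly above a limit.
exceeds_threshold() {
  awk -v value="$1" -v limit="$2" 'BEGIN { exit !(value + 0 > limit + 0) }'
}

# Hypothetical helper: failure rate as a percentage of total requests.
failure_rate() {
  awk -v failed="$1" -v total="$2" 'BEGIN {
    if (total == 0) print "0.0"; else printf "%.1f\n", failed * 100 / total
  }'
}

# Example: disk usage of the root filesystem, checked against the 80% limit.
# disk_used=$(df -P / | awk 'NR == 2 { gsub("%", "", $5); print $5 }')
# if exceeds_threshold "$disk_used" 80; then echo "Diskspace above 80%"; fi
```
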
- Email: Outlook, Gmail
- Notification Bot: Slack, Telegram
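
A sketch of sending an alert to the bot channels listed above, using a Slack incoming webhook and the Telegram Bot API `sendMessage` method. The function names and the environment variables `SLACK_WEBHOOK_URL`, `TELEGRAM_BOT_TOKEN`, and `TELEGRAM_CHAT_ID` are assumptions for this example. Email delivery is left out because it depends on the mail setup.

```shell
# Hypothetical helper: format a one-line alert message.
format_alert() {
  printf '[ALERT] %s is at %s%% (threshold %s%%)' "$1" "$2" "$3"
}

# Hypothetical helper: post a message to a Slack incoming webhook.
# jq builds the JSON payload so that quotes in the message are escaped safely.
notify_slack() {
  payload=$(jq --null-input --arg text "$1" '{text: $text}')
  curl --silent --fail --request POST \
    --header 'Content-Type: application/json' \
    --data "$payload" "$SLACK_WEBHOOK_URL"
}

# Hypothetical helper: send a message through the Telegram Bot API.
notify_telegram() {
  curl --silent --fail \
    --data-urlencode "chat_id=$TELEGRAM_CHAT_ID" \
    --data-urlencode "text=$1" \
    "https://api.telegram.org/bot${TELEGRAM_BOT_TOKEN}/sendMessage"
}

# Example:
# notify_slack "$(format_alert 'CPU Utilization' 91 80)"
```
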
- https://aws.amazon.com/products/management-and-governance/use-cases/monitoring-and-observability
- https://devops.com/top-5-best-practices-devops-monitoring/
- https://www.atlassian.com/devops/devops-tools/devops-monitoring
- https://www.atlassian.com/devops/frameworks/devops-metrics
- https://stackify.com/15-metrics-for-devops-success/#post-14669-_2ljwt1mgqvyy
- https://www.digitalocean.com/community/tutorials/an-introduction-to-metrics-monitoring-and-alerting
- https://cloud.google.com/architecture/devops/devops-measurement-monitoring-and-observability
- https://devops.com/metrics-logs-and-traces-the-golden-triangle-of-observability-in-monitoring/
- https://jenkins-x.io/blog/2019/07/29/jenkins-x-observability/