slo

Title of the SLO Document

This document describes the SLOs for the chatbot service.

Status Published
Author Aamir Raza
Date 2021-04-26
Reviewers
Approvers
Approval date
Revisit date

Service Overview

The platform transforms the chatbot-based user and consumer experience and is integrated with Facebook and Line Messenger. The platform's core services are:

  1. Chatbot management web application
  2. Service for sending messages to and receiving messages from the messenger
  3. A group of services that factor out processing common to the two services above

Users interact with the application through the messenger; their inflows and utterances are sent back to the messaging service. A serverless pipeline is built in between the user and the messaging service: the serverless component stacks the JSON data and returns an HTTP response immediately. The web application handles user management, CRUD of chatbot utterance content, and database linkage with the client. It sits behind a cloud load balancer and uses Fastly for static file delivery. A group of gRPC-based services provides common functionality such as narrowing down users. A memory store is used to distribute scenarios to users. All data is persisted in a single database for CRUD operations.

The SLOs use a four-week rolling window.
Each objective has a separate error budget.
Error budget = 100% minus the goal for that objective.
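
As a worked example of the formula above, the sketch below converts an objective into a count of allowed bad events. The function name `error_budget_events` is illustrative and not part of any existing tooling; the total of 1,000,000 events is the figure the SLO table in this document assumes. Targets are passed as strings so that values such as 99.99 are handled exactly rather than as rounded floats.

```python
from fractions import Fraction


def error_budget_events(slo_target_percent: str, total_events: int) -> int:
    """Return the allowed bad events: (100% - SLO target) of total events.

    slo_target_percent is a string such as "99.9" (hypothetical interface).
    """
    budget_fraction = (100 - Fraction(slo_target_percent)) / 100
    # Round down: a partial event cannot be spent.
    return int(budget_fraction * total_events)


if __name__ == "__main__":
    # Availability targets taken from the SLO table in this document.
    targets = [
        ("API availability", "97"),
        ("Web server availability", "99.9"),
        ("gRPC server availability", "99.99"),
    ]
    for name, target in targets:
        print(name, error_budget_events(target, 1_000_000))
```

With 1,000,000 requests, the three availability targets give budgets of 30,000, 1,000, and 100 errors respectively.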

An SLI captures the ratio of good events to total events.
The error budget gives the number of allowed bad events.
The error rate is the ratio of bad events to total events.
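
The three definitions above can be sketched in a few lines. This is a minimal illustration, assuming event counts are already available from monitoring; the function names and the `budget_consumed` helper are assumptions made for the example, not part of the service.

```python
def sli(good_events: int, total_events: int) -> float:
    """SLI: ratio of good events to total events."""
    if total_events <= 0:
        raise ValueError("total_events must be positive")
    return good_events / total_events


def error_rate(good_events: int, total_events: int) -> float:
    """Error rate: ratio of bad events to total events."""
    if total_events <= 0:
        raise ValueError("total_events must be positive")
    return (total_events - good_events) / total_events


def budget_consumed(good_events: int, total_events: int, slo_target: float) -> float:
    """Fraction of the error budget already spent (1.0 means exhausted).

    slo_target is a fraction strictly between 0 and 1, e.g. 0.97 for 97%.
    """
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be between 0 and 1")
    allowed_bad = (1.0 - slo_target) * total_events
    bad = total_events - good_events
    return bad / allowed_bad


if __name__ == "__main__":
    good, total = 985_000, 1_000_000
    print(sli(good, total), error_rate(good, total), budget_consumed(good, total, 0.97))
```

For example, 985,000 good requests out of 1,000,000 against a 97% objective means 15,000 of the 30,000 allowed errors are used, so about half of the budget is spent.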

SLIs and SLOs

Request Driven API

Error budget values assume a total of 1,000,000 requests.

| Category | SLI | SLO | Error budget | Error rate | Source |
|---|---|---|---|---|---|
| Availability | Proportion of successful HTTP requests / total HTTP requests. Any HTTP status code other than 500-599 is considered successful. | 97% success | 3% = 30,000 errors | 3% | |
| Latency | Proportion of fast requests (< 400 ms) / total requests | 90% of requests < 400 ms | 10% = 100,000 requests ≥ 400 ms | 10% | |
| | Proportion of fast requests (< 800 ms) / total requests | 97% of requests < 800 ms | 3% = 30,000 requests ≥ 800 ms | 3% | |
| | Proportion of slow requests (< 6000 ms) / total requests | 80% of requests < 6000 ms | 20% = 200,000 requests ≥ 6 s | 20% | |
| | Proportion of slow requests (< 8000 ms) / total requests | 89% of requests < 8000 ms | 11% = 110,000 requests ≥ 8 s | 11% | |
| Error | Explicit (HTTP 500-599): proportion of requests with an error status code / total HTTP requests | 3% errors | 3% = 30,000 errors | 3% | |
| | Implicit (HTTP 200 but with wrong content): proportion of requests with wrong content / total HTTP requests | 1% errors | 1% = 10,000 errors | 1% | |
| | Policy: committed to a 1 s response time but delayed | 3% conflict with the defined policy | 3% = 30,000 errors | 3% | |
| Quality | Proportion of successful requests when CPU is overloaded (90%) | 80% success | 20% = 200,000 errors | 20% | |
| | Proportion of successful requests when memory is overloaded (90%) | 80% success | 20% = 200,000 errors | 20% | |
| | Proportion of successful requests when the datastore is unavailable | 80% success | 20% = 200,000 errors | 20% | |
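
The availability and error SLIs for the request-driven API can be derived from per-request records. The sketch below is one possible way to count them, assuming each record carries a status code, a flag for whether the response content was correct, and a latency. The `RequestRecord` type and its field names are hypothetical, and how "wrong content" is detected is left outside the example.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class RequestRecord:
    status: int        # HTTP status code
    content_ok: bool   # False when the body was wrong (detection is assumed)
    latency_ms: float  # response time in milliseconds


def api_slis(records: List[RequestRecord]) -> Dict[str, float]:
    """Compute availability and the three error ratios over a set of requests."""
    total = len(records)
    if total == 0:
        raise ValueError("no requests recorded")
    # Explicit errors: HTTP 500-599. Everything else counts as available.
    explicit = sum(1 for r in records if 500 <= r.status <= 599)
    # Implicit errors: HTTP 200 but with wrong content.
    implicit = sum(1 for r in records if r.status == 200 and not r.content_ok)
    # Policy violations: slower than the committed 1 s response time.
    policy = sum(1 for r in records if r.latency_ms > 1000)
    return {
        "availability": (total - explicit) / total,
        "explicit_error_rate": explicit / total,
        "implicit_error_rate": implicit / total,
        "policy_violation_rate": policy / total,
    }


if __name__ == "__main__":
    sample = [
        RequestRecord(200, True, 120),
        RequestRecord(200, False, 300),
        RequestRecord(503, True, 50),
        RequestRecord(404, True, 1500),
    ]
    print(api_slis(sample))
```

Note that a 404 counts as available here, because only status codes 500-599 are treated as failures.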

Web server

| Category | SLI | SLO | Error budget | Error rate | Source |
|---|---|---|---|---|---|
| Availability | Proportion of successful web requests / total web requests | 99.9% success | 0.1% = 1,000 errors | 0.1% | |
| Latency | Proportion of fast requests (< 200 ms) / total requests | 90% of requests < 200 ms | 10% = 100,000 requests ≥ 200 ms | 10% | |
| | Proportion of fast requests (< 1000 ms) / total requests | 99% of requests < 1000 ms | 1% = 10,000 requests ≥ 1 s | 1% | |
| | Proportion of slow requests (< 6000 ms) / total requests | 80% of requests < 6000 ms | 20% = 200,000 requests ≥ 6 s | 20% | |
| | Proportion of slow requests (< 8000 ms) / total requests | 89% of requests < 8000 ms | 11% = 110,000 requests ≥ 8 s | 11% | |
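
A latency SLI of this kind is the share of requests that finish under a threshold. The following sketch checks a list of observed latencies against the two fast-request targets for the web server (90% under 200 ms, 99% under 1000 ms). The function names and the `WEB_SERVER_TARGETS` mapping are assumptions for illustration; in practice the counts would more likely come from histogram buckets in the monitoring system.

```python
from typing import Dict, Sequence


def fraction_faster_than(latencies_ms: Sequence[float], threshold_ms: float) -> float:
    """Share of requests whose latency is strictly below the threshold."""
    if not latencies_ms:
        raise ValueError("no latency samples")
    fast = sum(1 for value in latencies_ms if value < threshold_ms)
    return fast / len(latencies_ms)


def check_latency_slos(
    latencies_ms: Sequence[float], targets: Dict[float, float]
) -> Dict[float, bool]:
    """Map each threshold (ms) to whether its target fraction is met."""
    return {
        threshold: fraction_faster_than(latencies_ms, threshold) >= goal
        for threshold, goal in targets.items()
    }


# Fast-request targets from the web server rows: threshold in ms -> target fraction.
WEB_SERVER_TARGETS = {200: 0.90, 1000: 0.99}

if __name__ == "__main__":
    observed = [50, 120, 180, 190, 199, 150, 100, 90, 60, 950]
    print(check_latency_slos(observed, WEB_SERVER_TARGETS))
```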

gRPC Server

| Category | SLI | SLO | Error budget | Error rate | Source |
|---|---|---|---|---|---|
| Availability | Proportion of successful gRPC requests / total gRPC requests | 99.99% success | 0.01% = 100 errors | 0.01% | |
| Latency | Proportion of fast requests (< 200 ms) / total requests | 90% of requests < 200 ms | 10% = 100,000 requests ≥ 200 ms | 10% | |
| | Proportion of fast requests (< 1000 ms) / total requests | 97% of requests < 1000 ms | 3% = 30,000 requests ≥ 1 s | 3% | |
| | Proportion of slow requests (< 6000 ms) / total requests | 80% of requests < 6000 ms | 20% = 200,000 requests ≥ 6 s | 20% | |
| | Proportion of slow requests (< 8000 ms) / total requests | 89% of requests < 8000 ms | 11% = 110,000 requests ≥ 8 s | 11% | |

Pipeline

| Category | SLI | SLO | Error budget | Error rate | Source |
|---|---|---|---|---|---|
| Freshness | Proportion of records read from the table recently, where "recently" means within 1 min to 10 min. Uses metrics from the API and HTTP server. | | | | |
| | Count of all data requests for "api" and "webserver" with 1 min freshness / total data requests | 90% of reads use data written in the previous 1 min | 10% = 100,000 reads use data written more than 1 min ago | 10% | |
| | Count of all data requests for "api" and "webserver" with 10 min freshness / total data requests | 99% of reads use data written in the previous 10 min | 1% = 10,000 reads use data written more than 10 min ago | 1% | |
| Correctness | Proportion of records injected into the table by a prober that result in correct data being read. The prober should export an outcome metric. | 99.999% of records injected by the prober result in correct output | | | |
| Completeness | Proportion of hours in which 100% of data is processed (no data skipped): count of pipeline runs that processed 100% of records / total pipeline runs | 99% of pipeline runs cover 100% of the data | 1% = 10 pipeline runs, for a total of 1,000 pipeline runs | 1% | |
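
The freshness and completeness SLIs for the pipeline can be sketched as simple ratios. This example assumes that each read reports the age of the data it returned and that each pipeline run reports how many records it processed and how many it was expected to process. Both inputs and the function names are hypothetical, and treating data exactly at the age limit as fresh is a choice made for the sketch.

```python
from typing import Sequence, Tuple


def freshness_sli(data_ages_s: Sequence[float], max_age_s: float) -> float:
    """Share of reads that used data written within the last max_age_s seconds."""
    if not data_ages_s:
        raise ValueError("no reads recorded")
    fresh = sum(1 for age in data_ages_s if age <= max_age_s)
    return fresh / len(data_ages_s)


def completeness_sli(runs: Sequence[Tuple[int, int]]) -> float:
    """Share of pipeline runs that processed 100% of their records.

    Each run is a (processed_records, expected_records) pair.
    """
    if not runs:
        raise ValueError("no pipeline runs recorded")
    complete = sum(1 for processed, expected in runs if processed == expected)
    return complete / len(runs)


if __name__ == "__main__":
    ages = [10, 30, 59, 61, 400, 700]  # seconds since the data was written
    print(freshness_sli(ages, 60))     # 1 min freshness
    print(freshness_sli(ages, 600))    # 10 min freshness
    print(completeness_sli([(100, 100), (99, 100), (50, 50), (10, 10)]))
```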

Suggestions: An overview of the monitoring technique and the existing infrastructure should also be mentioned.
The development technology stack, with exact versions and languages, should be mentioned.

