
fastapi_service's Introduction

Demo

Problem

A service is provided that serves 1024 bits every second over HTTP. Each value is a low-dimensional representation of 1 second of video, generated by a neural network. We are tasked with finding repeated seconds of video within a time window of one day.

Research has revealed that the best way to find repeated sections is to compare the bitwise XOR of the current 1024-bit value with the previous 1024-bit value. If we call the current value $B_{i}$, then the comparison value $C_{i}$ will be

$$ C_{i} = B_{i} \oplus B_{i-1} $$
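As a minimal sketch of this step, the function below computes $C_{i}$ from two consecutive values. It assumes each 1024-bit value arrives as 128 raw bytes; the source does not state the wire encoding, and the name `comparison_value` is hypothetical.

```python
VALUE_BYTES = 128  # 1024 bits; assumes the service delivers raw bytes


def comparison_value(current: bytes, previous: bytes) -> bytes:
    """Return C_i = B_i XOR B_{i-1} for two 1024-bit values."""
    if len(current) != VALUE_BYTES or len(previous) != VALUE_BYTES:
        raise ValueError("expected two 1024-bit (128-byte) values")
    # Converting to int lets us XOR the whole value in one operation.
    xored = int.from_bytes(current, "big") ^ int.from_bytes(previous, "big")
    return xored.to_bytes(VALUE_BYTES, "big")
```

XOR-ing a value with itself yields all zeros, so two identical consecutive seconds produce an all-zero comparison value.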

When comparing this value to other candidates $C_{j}$ from the past day, we first require that the first 256 bits match exactly, and then assign a score based on the remaining 768 bits, weighting bits towards the start of the array more heavily. The score between the current second $t$ and an earlier second $d$ can thus be expressed as:

$$ s_{t,d} = \left(\prod_{i=0}^{255} \lnot\left(C_{t}[i] \oplus C_{d}[i]\right)\right) \times \left(\sum_{i=0}^{767} \frac{1024-i}{1024} \, \lnot\left(C_{t}[i+256] \oplus C_{d}[i+256]\right)\right) $$
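The score formula can be sketched in Python as follows. This is an illustration under stated assumptions, not the repository's implementation: bit 0 is taken to be the most significant bit of the first byte (the source does not specify bit order), and the names `bits` and `score` are hypothetical.

```python
N_BITS = 1024   # total bits per comparison value
N_EXACT = 256   # leading bits that must match exactly


def bits(value: bytes) -> list:
    """Expand bytes into a list of 0/1 ints, MSB of byte 0 first (assumed order)."""
    return [(byte >> (7 - k)) & 1 for byte in value for k in range(8)]


def score(c_t: bytes, c_d: bytes) -> float:
    """Compute s_{t,d} for two 1024-bit comparison values."""
    a, b = bits(c_t), bits(c_d)
    if len(a) != N_BITS or len(b) != N_BITS:
        raise ValueError("expected two 1024-bit (128-byte) values")
    # Product term: any mismatch in the first 256 bits zeroes the score.
    if a[:N_EXACT] != b[:N_EXACT]:
        return 0.0
    # Sum term: each matching bit contributes (1024 - i) / 1024,
    # so earlier bits weigh more than later ones.
    return sum(
        (N_BITS - i) / N_BITS
        for i in range(N_BITS - N_EXACT)
        if a[i + N_EXACT] == b[i + N_EXACT]
    )
```

With this weighting, two identical comparison values reach the maximum score of 480.375, which gives a sense of scale when choosing the threshold.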

A match is then determined by multiplying the scores of the 5 most recent seconds, going back in time, and comparing the product to a threshold:

$$ S_{t,d} = \prod_{i=0}^{4}s_{t-i,d-i} > T $$

where $T$ is some threshold float value, which we can adjust.
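A short sketch of the match decision, assuming the five per-second scores have already been computed. The function names are hypothetical; `scores[i]` is taken to hold $s_{t-i,d-i}$.

```python
import math
from typing import Sequence

WINDOW = 5  # number of consecutive seconds multiplied together


def combined_score(scores: Sequence[float]) -> float:
    """Return S_{t,d}, the product of s_{t-i,d-i} for i = 0..4."""
    if len(scores) != WINDOW:
        raise ValueError(f"expected exactly {WINDOW} scores")
    return math.prod(scores)


def is_match(scores: Sequence[float], threshold: float) -> bool:
    """A match requires the product to be strictly greater than T."""
    return combined_score(scores) > threshold
```

Because the scores are multiplied, a single second whose first 256 bits do not match (score 0) rules out the whole 5-second window for any non-negative threshold.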

The requirement is to report the times which match to another service, which is available via REST POST and requires an input of the format:

{
  "channel": "",
  "time": 0,
  "match_times": [0]
}

Times are UNIX timestamps (integers), and the channel is a string that identifies the video source.

The team would like only the top 10 matches, based on score, if more than 10 occur for a given time.
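The reporting payload and the top-10 rule might be combined as in the sketch below. It assumes the matches for a given second are available as a mapping from matched timestamp to score; `build_report` and that mapping are hypothetical, while the payload keys come from the format above.

```python
import heapq
from typing import Dict

MAX_MATCHES = 10  # report at most this many matches per second


def build_report(channel: str, time: int, scored_matches: Dict[int, float]) -> dict:
    """Build the POST body, keeping only the highest-scoring matches."""
    # nlargest returns the entries sorted by score, highest first.
    top = heapq.nlargest(MAX_MATCHES, scored_matches.items(), key=lambda kv: kv[1])
    return {
        "channel": channel,
        "time": time,
        "match_times": [timestamp for timestamp, _ in top],
    }
```

For example, passing 15 scored matches returns a payload whose `match_times` holds only the 10 timestamps with the highest scores.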

Solution

1. Technology

  • Programming Language: Python
  • Frameworks: FastAPI
  • Data Storage: Redis, Firestore (GCP NoSQL database)
  • Deployment: Docker for containerization, Kubernetes for orchestration
  • Testing: Pytest for unit tests, Postman for API testing

2. Estimations & Required Resources

  • Time estimate: Initial development - 2 weeks, Testing and debugging - 1 week
  • QPS: number of sources (could be 100)
    • In the FastAPI server implementation, QPS depends on the number of sources, since each one provides a 1024-bit value every second.
  • Storage:
    • Redis: 0.03 GB * number of sources
      • Value: 1024 * 60 * 60 * 24 * 2 (to compare with the previous day) bits ≈ 0.02 GB
      • Key: 20 (average text length of source + timestamp) * 8 * 4 * 60 * 60 * 24 * 2 bits ≈ 0.01 GB
      • Stored entries expire and are removed automatically.
    • Firestore: 0.003 GB * number of sources
      • For the product (Π-notation) score: (8 bytes (timestamp) + 8 bytes (score)) * 8 * 60 * 60 * 24 * 2 bits ≈ 0.003 GB
      • Expired data is cleaned out regularly.
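The storage figures above can be reproduced with a few lines of arithmetic. The sketch below uses the same factors as the list and assumes 1 GB = 10^9 bytes; the variable names are hypothetical.

```python
SECONDS_PER_DAY = 60 * 60 * 24
DAYS_KEPT = 2             # current day plus the previous day
BITS_PER_GB = 8 * 10**9   # assumes decimal gigabytes

# Per-source estimates, in bits, using the factors from the list above.
redis_value_bits = 1024 * SECONDS_PER_DAY * DAYS_KEPT
redis_key_bits = 20 * 8 * 4 * SECONDS_PER_DAY * DAYS_KEPT
firestore_bits = (8 + 8) * 8 * SECONDS_PER_DAY * DAYS_KEPT


def to_gb(n_bits: int) -> float:
    """Convert a bit count to gigabytes."""
    return n_bits / BITS_PER_GB


redis_total_gb = to_gb(redis_value_bits + redis_key_bits)
```

Unrounded, the Redis value and key estimates come to about 0.022 GB and 0.014 GB per source, so the combined figure is closer to 0.036 GB than to the sum of the rounded values.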

3. Requirements

  • Send HTTP GET requests to fetch the 1024-bit values.
  • Implement XOR comparison and scoring mechanism.
  • Store values and scores with timestamps.
  • Send HTTP POST requests reporting matches to another service.
  • Provide a configurable threshold for match detection.
  • Limit reported matches to the top 10 based on score.

4. Scalability

  • Additional sources: Scale horizontally by deploying multiple instances of the service behind a load balancer.
  • Increased comparison time: Optimize algorithms for efficiency and consider parallel processing techniques.
  • Increased bit array size: Design the system to handle larger bit arrays efficiently, potentially using distributed computing.

Usage

Send POST requests (for example, from a scheduler) to the /report endpoint with the following request body:

{
  "source": "string",
  "source_url": "string",
  "threshold": 0,
  "reporting_url": "string"
}
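A request with this body could be prepared using only the standard library, as sketched below. The base URL and all field values are placeholders, and `build_report_request` is a hypothetical helper; only the `/report` path and the field names come from this README.

```python
import json
import urllib.request


def build_report_request(
    base_url: str,
    source: str,
    source_url: str,
    threshold: float,
    reporting_url: str,
) -> urllib.request.Request:
    """Prepare (but do not send) a POST request for the /report endpoint."""
    body = {
        "source": source,
        "source_url": source_url,
        "threshold": threshold,
        "reporting_url": reporting_url,
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/report",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Placeholder values; replace them with the real service addresses.
request = build_report_request(
    base_url="http://localhost:8000",
    source="channel-a",
    source_url="http://example.com/bits",
    threshold=0,
    reporting_url="http://example.com/matches",
)
```

Passing `request` to `urllib.request.urlopen` would send it; a scheduler could do this once per second for each source.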

System Diagram

[Image: system diagram]

Getting Started

Build the Docker image:

docker-compose build

To run the docker-compose environment:

docker-compose up

To run the unit tests:

docker-compose run --rm app py.test app/tests --cov=app

or

pip install -r requirements.txt
pytest app/tests --cov

fastapi_service's People

Contributors

thomas-cloudmile, thomas-chiang
