A service serves 1024 bits every second over HTTP. Each value is a low-dimensional representation of one second of video, generated by a neural network. The task is to find repeated seconds of video within a one-day time window.
Research has shown that the best way to find repeated sections is to take the bitwise XOR of the current 1024-bit value and the previous 1024-bit value. If we call the current value
When comparing this value to other options
A match is then given by comparing the 5 scores going back in time, according to
where
Matching times must be reported to another service, which is available via REST POST and expects a request body of the following format:
```json
{
  "channel": "",
  "time": 0,
  "match_times": [0]
}
```
`time` and `match_times` contain UNIX timestamps (integers), and `channel` is a string identifying the video source.
If more than 10 matches occur for a given time, the team would like only the top 10, ranked by score.
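A minimal sketch of building that request body with the top-10 limit, assuming matches arrive as hypothetical `(match_time, score)` pairs. Whether a higher or a lower score means a better match depends on the scoring formula, so the direction is a parameter here (an assumption, not part of the spec):

```python
import heapq
import json
from typing import Iterable, Tuple

MAX_MATCHES = 10  # report at most the top 10 matches per time


def build_report(
    channel: str,
    report_time: int,
    matches: Iterable[Tuple[int, float]],
    higher_is_better: bool = True,
) -> dict:
    """Build the POST body for one reported time.

    `matches` holds hypothetical (match_time, score) pairs. The ranking
    direction is an assumption and is therefore configurable.
    """
    pick = heapq.nlargest if higher_is_better else heapq.nsmallest
    # Keep only the best MAX_MATCHES pairs, ranked by score.
    best = pick(MAX_MATCHES, matches, key=lambda m: m[1])
    return {
        "channel": channel,
        "time": int(report_time),
        "match_times": [int(t) for t, _ in best],
    }


# Example: 12 candidate matches; only the 10 best scores are kept.
candidates = [(1_700_000_000 + i, float(i)) for i in range(12)]
body = build_report("channel-1", 1_700_086_400, candidates)
print(json.dumps(body))
```

`heapq.nlargest`/`nsmallest` avoid sorting the full candidate list when only 10 entries are needed.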
- Programming Language: Python
- Frameworks: FastAPI
- Data Storage: Redis, Firestore (GCP NoSQL database)
- Deployment: Docker for containerization, Kubernetes for orchestration
- Testing: Pytest for unit tests, Postman for API testing
- Time estimate: Initial development - 2 weeks, Testing and debugging - 1 week
- QPS: each source provides one 1024-bit value per second, so QPS equals the number of sources (could be 100).
- With the FastAPI server implementation, load therefore scales with the number of sources providing 1024-bit values.
- Storage:
  - Redis: 0.03 GB per source
    - Value: 1024 * 60 * 60 * 24 * 2 (to compare with the previous day) bits ≈ 0.02 GB
    - Key: 20 (average text length of source + timestamp) * 8 * 4 * 60 * 60 * 24 * 2 bits ≈ 0.01 GB
    - Entries expire and are removed automatically.
  - Firestore: 0.003 GB per source
    - Pi-notation scores: (8 bytes (timestamp) + 8 bytes (score)) * 8 * 60 * 60 * 24 * 2 bits ≈ 0.003 GB
    - Expired data is cleaned out regularly.
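The storage figures above can be recomputed per source with a short script. It uses the same factors as the estimates (including the 8 * 4 bits per key character) and assumes decimal gigabytes (10^9 bytes); the two-day window also gives a natural expiry time for Redis entries, shown here only as a hypothetical constant:

```python
SECONDS_PER_DAY = 60 * 60 * 24
DAYS_KEPT = 2          # current day plus the previous day for comparison
BITS_PER_BYTE = 8
GB = 10**9             # decimal GB; with GiB (2**30) the figures are ~7% smaller


def bits_to_gb(bits: int) -> float:
    """Convert a bit count to decimal gigabytes."""
    return bits / BITS_PER_BYTE / GB


# Redis values: one 1024-bit vector per second.
redis_value_gb = bits_to_gb(1024 * SECONDS_PER_DAY * DAYS_KEPT)
# Redis keys: ~20 characters (source + timestamp), 8 * 4 bits each as in the estimate.
redis_key_gb = bits_to_gb(20 * 8 * 4 * SECONDS_PER_DAY * DAYS_KEPT)
# Firestore: an 8-byte timestamp plus an 8-byte score per second.
firestore_gb = bits_to_gb((8 + 8) * BITS_PER_BYTE * SECONDS_PER_DAY * DAYS_KEPT)

# Hypothetical expiry window (in seconds) for Redis entries: two days.
redis_ttl_seconds = SECONDS_PER_DAY * DAYS_KEPT

print(f"Redis values: {redis_value_gb:.4f} GB per source")
print(f"Redis keys:   {redis_key_gb:.4f} GB per source")
print(f"Firestore:    {firestore_gb:.4f} GB per source")
print(f"Redis TTL:    {redis_ttl_seconds} s")
```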
- Send HTTP GET requests to fetch the 1024-bit values.
- Implement the XOR comparison and scoring mechanism.
- Store the values and scores with their timestamps.
- Send HTTP POST requests reporting matches to another service.
- Provide a configurable threshold for match detection.
- Limit reported matches to the top 10 based on score.
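A minimal sketch of the XOR comparison and a configurable threshold check. The exact scoring formula is not given in this text, so the score below (the number of differing bits, i.e. the popcount of the XOR) and the match rule (each of the last 5 scores at or below the threshold) are hypothetical stand-ins, not the project's actual method:

```python
from collections import deque
from typing import Deque

VALUE_BYTES = 1024 // 8  # each value is 1024 bits = 128 bytes
WINDOW = 5               # number of scores compared going back in time


def xor_bits(a: bytes, b: bytes) -> int:
    """Bitwise XOR of two 1024-bit values, returned as a Python int."""
    if len(a) != VALUE_BYTES or len(b) != VALUE_BYTES:
        raise ValueError("expected 128-byte (1024-bit) values")
    return int.from_bytes(a, "big") ^ int.from_bytes(b, "big")


def hamming_score(a: bytes, b: bytes) -> int:
    """Hypothetical score: number of differing bits (0 means identical)."""
    return bin(xor_bits(a, b)).count("1")


def is_match(recent_scores: Deque[int], threshold: int) -> bool:
    """Hypothetical rule: match when all of the last WINDOW scores are <= threshold."""
    return len(recent_scores) == WINDOW and all(s <= threshold for s in recent_scores)


# Usage: keep a rolling window of the most recent scores.
scores: Deque[int] = deque(maxlen=WINDOW)
current = bytes(VALUE_BYTES)
candidate = bytes([0b1]) + bytes(VALUE_BYTES - 1)  # differs from `current` in one bit
for _ in range(WINDOW):
    scores.append(hamming_score(current, candidate))
print(is_match(scores, threshold=2))  # True
```

Python integers handle 1024-bit XOR natively, so no extra bit-array library is needed for a single comparison.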
- Additional sources: Scale horizontally by deploying multiple instances of the service behind a load balancer.
- Increased comparison time: Optimize the algorithms for efficiency and consider parallel processing.
- Increased bit array size: Design the system to handle larger bit arrays efficiently, potentially using distributed computing.
Send POST requests (from the scheduler) to the `/report` endpoint with the following request body:
```json
{
  "source": "string",
  "source_url": "string",
  "threshold": 0,
  "reporting_url": "string"
}
```
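A minimal client-side sketch of that request using only the standard library. The base URL `http://localhost:8000` and the example values are hypothetical, and the field meanings (source name, URL serving its 1024-bit values, match threshold, URL of the reporting service) are inferred from the field names:

```python
import json
import urllib.request

# Hypothetical base URL of this service; adjust to wherever it is deployed.
BASE_URL = "http://localhost:8000"


def build_report_request(
    source: str,
    source_url: str,
    threshold: int,
    reporting_url: str,
    base_url: str = BASE_URL,
) -> urllib.request.Request:
    """Build (but do not send) a POST request to the /report endpoint."""
    body = {
        "source": source,
        "source_url": source_url,
        "threshold": threshold,
        "reporting_url": reporting_url,
    }
    return urllib.request.Request(
        f"{base_url}/report",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_report_request(
    source="channel-1",
    source_url="http://bits.example.com/channel-1",
    threshold=10,
    reporting_url="http://matches.example.com/report",
)
# urllib.request.urlopen(req) would send it; not called here so the sketch runs offline.
```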
Build the Docker image:
```shell
docker-compose build
```
To run the docker-compose environment:
```shell
docker-compose up
```
To run the unit tests:
```shell
docker-compose run --rm app pytest app/tests --cov=app
```
Or, locally without Docker:
```shell
pip install -r requirements.txt
pytest app/tests --cov
```