
dyte_log_server's Introduction

Dyte Backend Task 🧭

📚 | Problem Statement

  • Develop a log ingestor system that can efficiently handle vast volumes of log data.
  • Offer a simple interface for querying this data using full-text search or specific field filters.
  • The logs should be ingested (in the log ingestor) over HTTP, on port 3000.

Log Ingestor:

  • Develop a mechanism to ingest logs in the provided format.
  • Ensure scalability to handle high volumes of logs efficiently.
  • Mitigate potential bottlenecks such as I/O operations, database write speeds, etc.
  • Make sure that the logs are ingested via an HTTP server, which runs on port 3000 by default.
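One common way to ease the database-write bottleneck mentioned above is to buffer incoming logs and write them in batches rather than one at a time. The following is a minimal sketch of that idea, not code from this repository: `BatchBuffer`, `max_size`, and the `sink` callable are hypothetical names chosen for illustration.

```python
from typing import Callable, Dict, List

LogRecord = Dict[str, object]


class BatchBuffer:
    """Collects log records and hands them to `sink` in batches."""

    def __init__(self, sink: Callable[[List[LogRecord]], None], max_size: int = 100) -> None:
        if max_size < 1:
            raise ValueError("max_size must be at least 1")
        self._sink = sink
        self._max_size = max_size
        self._pending: List[LogRecord] = []

    def add(self, record: LogRecord) -> None:
        """Buffer one record; flush automatically once the batch is full."""
        self._pending.append(record)
        if len(self._pending) >= self._max_size:
            self.flush()

    def flush(self) -> None:
        """Send any buffered records to the sink as a single batch."""
        if self._pending:
            batch, self._pending = self._pending, []
            self._sink(batch)


# Usage: a list stands in for the real database writer (an assumption).
written: List[List[LogRecord]] = []
buffer = BatchBuffer(sink=written.append, max_size=2)
for i in range(5):
    buffer.add({"level": "info", "message": f"event {i}"})
buffer.flush()  # write the final, partially filled batch
```

A real ingestor would likely also flush on a timer so that a slow trickle of logs is not held in memory indefinitely.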

Query Interface:

  • Offer a user interface (Web UI or CLI) for full-text search across logs.
  • Include filters based on:
    • level
    • message
    • resourceId
    • timestamp
    • traceId
    • spanId
    • commit
    • metadata.parentResourceId
  • Aim for efficient and quick search results.
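To make the field filters concrete, here is a small in-memory sketch of exact-match filtering, including the nested `metadata.parentResourceId` field. It is an illustration only: the helper names `get_field` and `filter_logs` and all sample values other than "error" and "server-1234" are assumptions, and full-text search on `message` is not covered here.

```python
from typing import Any, Dict, List

Log = Dict[str, Any]


def get_field(log: Log, path: str) -> Any:
    """Resolve a dotted path such as 'metadata.parentResourceId'."""
    value: Any = log
    for key in path.split("."):
        if not isinstance(value, dict) or key not in value:
            return None  # missing field never matches a filter value
        value = value[key]
    return value


def filter_logs(logs: List[Log], filters: Dict[str, Any]) -> List[Log]:
    """Keep only the logs whose fields equal every requested filter value."""
    return [
        log for log in logs
        if all(get_field(log, path) == wanted for path, wanted in filters.items())
    ]


# Hypothetical sample data for illustration.
logs = [
    {"level": "error", "resourceId": "server-1234",
     "metadata": {"parentResourceId": "server-0987"}},
    {"level": "info", "resourceId": "server-1234",
     "metadata": {"parentResourceId": "server-0001"}},
]

errors = filter_logs(logs, {"level": "error"})
children = filter_logs(logs, {"metadata.parentResourceId": "server-0001"})
```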

🎯 | Sample Queries

The following are some sample queries that will be executed for validation.

  • Find all logs with the level set to "error".
  • Search for logs with the message containing the term "Failed to connect".
  • Retrieve all logs related to resourceId "server-1234".
  • Filter logs between the timestamp "2023-09-10T00:00:00Z" and "2023-09-15T23:59:59Z". (Bonus)
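Since the query interface is built on Elasticsearch, the four sample queries could be expressed as query DSL bodies along the following lines. This is a sketch under assumptions, not the repository's actual queries: it assumes `level` and `resourceId` are indexed as keyword fields, `message` as a text field, and `timestamp` as a date field.

```python
import json

# Each value is a request body for the Elasticsearch search API.
SAMPLE_QUERIES = {
    # Exact match on a keyword field (assumed mapping).
    "level_error": {"query": {"term": {"level": "error"}}},
    # Phrase match on an analyzed text field (assumed mapping).
    "message_contains": {"query": {"match_phrase": {"message": "Failed to connect"}}},
    "resource": {"query": {"term": {"resourceId": "server-1234"}}},
    # Inclusive range on a date field (assumed mapping).
    "time_range": {
        "query": {
            "range": {
                "timestamp": {
                    "gte": "2023-09-10T00:00:00Z",
                    "lte": "2023-09-15T23:59:59Z",
                }
            }
        }
    },
}

for name, body in SAMPLE_QUERIES.items():
    print(name, json.dumps(body))
```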

๐ŸŒ | Test Project

  • Clone this repository.
  • Install Docker Desktop.
  • Run docker-compose up --build in the root directory of the project. The first run takes a minute or two to complete.
  • On first start, the Cassandra container can take a while to create its superuser, which causes the consumer to fail. If that happens, stop the containers and run the command again.
  • The Web UI will be available at http://localhost:3000/static/index.html.

Note:

  • The Web UI is very basic, and is only meant for testing the API.

๐Ÿ“ | System Design

[Image: Project Logo]

Features Implemented:

  • Web UI running on port 3000.
  • Filters based on:
    • level
    • message
    • resourceId
    • timestamp
    • traceId
    • spanId
    • commit
    • metadata.parentResourceId
  • Search within specific date ranges.
  • Combining multiple filters in one query.
  • Real-time log ingestion and searching.
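Combining multiple filters with a date range maps naturally onto an Elasticsearch `bool` query. The builder below is a hedged sketch of how such a request body might be assembled. `build_query` and its parameters are hypothetical, and the field mappings (keyword fields for exact filters, a date field for `timestamp`) are assumptions rather than details taken from this repository.

```python
from typing import Any, Dict, List, Optional


def build_query(
    filters: Optional[Dict[str, str]] = None,
    text: Optional[str] = None,
    start: Optional[str] = None,
    end: Optional[str] = None,
) -> Dict[str, Any]:
    """Build a search body from exact filters, free text, and a time window."""
    must: List[Dict[str, Any]] = []
    filter_clauses: List[Dict[str, Any]] = []

    if text:
        # Scored full-text match on the message field.
        must.append({"match": {"message": text}})
    for field, value in (filters or {}).items():
        # Exact, unscored match on each requested field.
        filter_clauses.append({"term": {field: value}})

    time_range: Dict[str, str] = {}
    if start:
        time_range["gte"] = start
    if end:
        time_range["lte"] = end
    if time_range:
        filter_clauses.append({"range": {"timestamp": time_range}})

    if not must and not filter_clauses:
        return {"query": {"match_all": {}}}

    bool_query: Dict[str, Any] = {}
    if must:
        bool_query["must"] = must
    if filter_clauses:
        bool_query["filter"] = filter_clauses
    return {"query": {"bool": bool_query}}


body = build_query(
    filters={"level": "error", "resourceId": "server-1234"},
    text="Failed to connect",
    start="2023-09-10T00:00:00Z",
    end="2023-09-15T23:59:59Z",
)
```

Placing the exact-match and range clauses under `filter` rather than `must` keeps them out of relevance scoring, which suits yes/no conditions such as a log level.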

Current Architecture:

  • Containerized approach to solving the problem statement.
  • FastAPI is used as the web framework because of its non-blocking, async I/O, which helps ingest logs at a higher rate.
  • Apache Kafka is used as the message broker. It decouples the ingestion process from the querying process.
  • Apache Cassandra is used as the database. It is a highly scalable NoSQL database that stores the logs in a distributed manner and offers high read/write throughput.
  • The query interface is built on Elasticsearch, a distributed, RESTful search and analytics engine that returns search results quickly, even on large datasets.
  • The logs are ingested via an HTTP server, which runs on port 3000 by default.
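The decoupling described above can be illustrated with in-memory stand-ins: a queue in place of the Kafka topic, a dict in place of the Cassandra table, and a tiny inverted index in place of Elasticsearch. This is a conceptual sketch only. The exact data flow in this repository (for example, which component writes to Elasticsearch) is an assumption, and every name below is hypothetical.

```python
import queue
import threading
from collections import defaultdict
from typing import Any, Dict, List, Optional, Set

Log = Dict[str, Any]

topic: "queue.Queue[Optional[Log]]" = queue.Queue()  # stand-in for a Kafka topic
table: Dict[int, Log] = {}                            # stand-in for a Cassandra table
index: Dict[str, Set[int]] = defaultdict(set)         # stand-in for a search index


def ingest(log: Log) -> None:
    """What the HTTP handler does: enqueue the log and return immediately."""
    topic.put(log)


def consume() -> None:
    """Consumer loop: persist each log, then index its message tokens."""
    next_id = 0
    while True:
        log = topic.get()
        if log is None:  # sentinel value tells the consumer to stop
            break
        table[next_id] = log
        for token in str(log.get("message", "")).lower().split():
            index[token].add(next_id)
        next_id += 1


def search(term: str) -> List[Log]:
    """Look up logs whose message contains the given single token."""
    return [table[i] for i in sorted(index.get(term.lower(), set()))]


worker = threading.Thread(target=consume)
worker.start()
ingest({"level": "error", "message": "Failed to connect to DB"})
ingest({"level": "info", "message": "Connected"})
topic.put(None)  # signal shutdown once all logs are queued
worker.join()
```

Because `ingest` only enqueues, a slow database or index write delays the consumer but not the HTTP response, which is the point of placing a broker between the two.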

Future Scope:

  • The current architecture is a very basic implementation of the problem statement.
  • Depending upon the scale, the entire architecture can be scaled horizontally.
  • Load balancing can be added to handle high volumes of logs efficiently, for example with AWS ELB.
  • Apache Flink can be set up between Kafka and Cassandra for streaming, processing and analytics.
  • Cassandra can be used as a data lake, and Apache Spark can be used for analytics.
  • JWT authentication can be implemented for the Web UI. (Not implemented due to time constraints.)
  • Regex filters can be implemented on Elasticsearch. (Not implemented due to time constraints.)

๐Ÿง‘๐Ÿฝ | Author

Kaustav Mukhopadhyay


