
kafka-go-cardinality

About this project:

This project is a solution to the Data Engineer Challenge. It sends statistics to the output topic, and the error margin of its unique-user counts is under 1%. It doesn't handle latency (messages that arrive late), but it does work with historical data: you simply re-run it on a Kafka topic, since the entire point is for it to be stateless.

The original solution that got me a job is at commit: 56103583efeb057546bd402882908f7a0fda9ba2

The current and improved solution is for my personal satisfaction...

How to run this project:

You can simply use go run or, if you prefer, Docker. If you use Docker, make sure the container has access to the host network so it can reach the local Kafka instance. The project reads a few environment variables, but they have sensible defaults, so you don't have to set them. Here's the list of the environment variables and their default values:

KAFKA_BROKER=localhost:9092
USERS_TOPIC=users
STATS_TOPIC=stats

To send messages to a Kafka topic, you can use this command:

gzcat stream.jsonl.gz | kafka-console-producer --bootstrap-server localhost:9092 --topic users

To receive messages from a Kafka topic, you can use this command:

kafka-console-consumer --bootstrap-server localhost:9092 --topic stats

You can find sample data (stream.jsonl.gz) here.

Contributors

matejamaric

Issues

Handle latency up to X seconds

POSSIBLE SOLUTION:
There should be two HyperLogLogs and two lastFlush values.
When a message's timestamp minus the interval is more than X seconds past a HyperLogLog's lastFlush, that HyperLogLog's result should be emitted.
