Coding Challenge

Introduction

At Datafiniti, we have several million product records in our database that we've collected from several retailers on the internet. We're tasked with keeping those records as up-to-date as we can. This means our data pipeline needs to be blazing fast at getting data from the internet into our database. This coding challenge exposes you to a miniature prototype of our pipeline and tasks you with speeding up the rate at which data moves through it.

Objective

To reiterate, our goal is to get data, in the form of JSON objects, from point A to point B as fast as we possibly can. These JSON objects start out enqueued in a shared in-memory cache (Redis in this case). We provide you with code that simply dequeues records one at a time and imports them into Elasticsearch. Your objective is to do whatever it takes to increase the rate at which records are inserted into Elasticsearch. You can complete this challenge by improving either the Java or Node.js code we provide. Feel free to change any part of this codebase or introduce/replace technologies in order to increase import rates. The only rule of this challenge is that records must be enqueued somewhere and then imported into Elasticsearch. This may seem daunting at first, but don't worry: we provide plenty of tools, which we explain in the sections below.
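As a rough illustration of the "one record at a time" cost described above, the sketch below shows how several records could be packed into a single request body in the newline-delimited format that Elasticsearch's `_bulk` endpoint accepts. It is not the provided baseline and not a prescribed solution. The function name `buildBulkBody`, the sample records, and the index name `records` are assumptions made for the example, and depending on your Elasticsearch version the action line may also need a document type.

```javascript
// Assumption: the target index is called "records"; adjust to your setup.
const INDEX_NAME = 'records';

// Build a newline-delimited JSON body for Elasticsearch's _bulk endpoint.
// Each record contributes two lines: an action line and the document itself.
function buildBulkBody(records, indexName = INDEX_NAME) {
  if (records.length === 0) {
    return '';
  }
  const lines = [];
  for (const record of records) {
    lines.push(JSON.stringify({ index: { _index: indexName } }));
    lines.push(JSON.stringify(record));
  }
  // The bulk format requires the body to end with a newline.
  return lines.join('\n') + '\n';
}

// Example usage with two hypothetical product records.
const body = buildBulkBody([
  { name: 'Widget', price: 9.99 },
  { name: 'Gadget', price: 19.99 },
]);
console.log(body);
```

The resulting string would be sent as the body of one `POST /_bulk` request with the `Content-Type: application/x-ndjson` header, so many records share a single round trip.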

Setup Instructions

Prerequisites

  • Docker

    • If you don't already have docker you can download and install it from the following links:
    • For those of you on Windows or a Mac: increase the memory allocated to Docker to at least 4GB. This setting can be found within Docker's preferences. Feel free to reach out if you don't have that much memory available on your machine. Linux users don't need to worry about this because there's no virtual machine running between your host OS and your Docker containers.
  • Fork this repository, clone it down and cd into it.

  • The repository contains a docker composition that sets up the following docker containers:

    • Redis
    • Elasticsearch
    • Kibana. This is a fancy web UI that you can use to monitor import metrics in order to benchmark your solution
    • A dev container with either java or nodejs installed depending on which you decide to use
  • Follow the instructions below for your language of choice

NodeJS

Instructions Screencast

  1. From within the coding-challenge directory, run ./bin/setup-nodejs.sh

  2. Once the output is done printing to the terminal click on this link.

  3. Click the Monitoring button on the far left

  4. Click Enable Monitoring

  5. Open another terminal and type in docker exec -it node-dev bash

    • This is equivalent to sshing into a virtual machine that has nodejs installed
  6. To run the code we've provided run the following commands

    cd code
    npm install
    node baseline.js
    • This code will seed 10,000 product records into redis and then begin to import one record at a time into elasticsearch
  7. Head over to the browser window that you have kibana open in

  8. Click on indices and then records

  9. Watch the Indexing Rate graph to see how fast the provided solution is.

    • You'll be using this graph to benchmark your solution
    • Your solution will be assessed by first running baseline.js and then your solution. If the graph indicates that your solution is indexing records faster than the baseline, then congratulations, you did it!
  10. Include a file named solution.js that runs your solution. Including more than 1 file is perfectly fine so long as we can kick everything off by running node solution.js. Additionally, feel free to modify any and all parts of the code provided in this challenge.

Java

Instructions Screencast

  1. From within the coding-challenge directory, run ./bin/setup-java.sh

  2. Once the output is done printing to the terminal click on this link.

  3. Click the Monitoring button on the far left

  4. Click Enable Monitoring

  5. Open another terminal and type in docker exec -it java-dev bash

    • This is equivalent to sshing into a virtual machine that has java installed
  6. To run the code we've provided run the following commands

    cd code
    
    # This command will compile the baseline solution we provide to you and run it.
    ./bin/run.sh
    • This code will seed 10,000 product records into redis and then begin to import one record at a time into elasticsearch
  7. Head over to the browser window that you have kibana open in

  8. Click on indices and then records

  9. Watch the Indexing Rate graph to see how fast the provided solution is.

    • You'll be using this graph to benchmark your solution
    • Your solution will be assessed by first running Baseline and then your solution. If the graph indicates that your solution is indexing records faster than the baseline, then congratulations, you did it!
  10. Provide a way for us to start your solution in a class named Solution. You can include all of your solution in that class or create additional classes as you see fit. Just make sure that we can run your solution from the Main class by doing something like Solution.start() or Solution.run(). Additionally, feel free to modify any and all parts of the code provided in this challenge.

    • You can run your code using ./bin/run.sh so long as you run your solution from within Main

Cleanup and Submission

  • Once you're done, from within the coding-challenge directory, run ./bin/teardown.sh in order to delete all containers and volumes
  • Send over an email with a link to your forked repo and we'll take a look ASAP!

Hints

  • Speeding up the import rate does not require some fancy algorithm or data structure.
  • We will be trying your solution against different amounts of seeded records, likely somewhere between 10,000 and 50,000. Make sure your solution isn't hardcoded for just 10,000 records.
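One way to avoid depending on a fixed record count is to loop until the queue reports that it is empty, instead of iterating a set number of times. The sketch below illustrates that idea with a hypothetical in-memory stand-in for the queue. `makeQueue`, `takeBatch`, and `drain` are names invented for this example and are not part of the provided code. A real version would talk to Redis and would most likely be asynchronous.

```javascript
// Hypothetical in-memory stand-in for the real queue; names are illustrative.
function makeQueue(items) {
  const pending = [...items];
  return {
    // Remove and return up to `count` items from the front of the queue.
    takeBatch(count) {
      return pending.splice(0, count);
    },
  };
}

// Keep pulling batches until the queue comes back empty,
// rather than looping a fixed number of times.
function drain(queue, batchSize, handleBatch) {
  let total = 0;
  for (;;) {
    const batch = queue.takeBatch(batchSize);
    if (batch.length === 0) {
      break;
    }
    handleBatch(batch);
    total += batch.length;
  }
  return total;
}

// Example usage: 25 seeded records drained in batches of 10.
const seeded = Array.from({ length: 25 }, (_, i) => ({ id: i }));
const sizes = [];
const processed = drain(makeQueue(seeded), 10, (batch) => sizes.push(batch.length));
console.log(processed, sizes); // 25 [ 10, 10, 5 ]
```

Because the loop stops only on an empty batch, the same code handles 10,000 or 50,000 seeded records without changes.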

Good Luck!
