de-code-challenge-saleem's Introduction

Data-Engineering-Challenge

Take-home assignment based on the Demyst Python libraries and building a model.

Description

In this challenge you will use our Analytics Python package, guided by its documentation, to demonstrate your understanding of APIs and your knowledge of Python. The API documentation is available at https://demyst.com/docs/python/api-reference/. You can also log in to our platform through the website, where you will find the input file for this challenge. You need to append external data to that file using the Python API.

Once you are done with the data enrichment, you will have to predict the target variable (safety_flag). The input file is available through the Transfer Files section of the platform.

Requirement

Perform three sub-tasks for submission:

  1. Analyze and clean the input file using Python/Pandas in a Jupyter Notebook.
  2. Enrich the cleaned input with external data from the providers available on the platform.
  3. Use the enriched file to predict the target variable (safety_flag). You can use any model building packages/tools. Make sure the model can be re-trained.
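Sub-tasks 1 and 3 can be sketched roughly as follows. This is only an illustration, not a reference solution: it assumes pandas and scikit-learn are installed, the cleaning rules are generic guesses rather than anything derived from the real input file, and the feature column names passed to `train` are hypothetical. The enrichment call (sub-task 2) is deliberately left out because it depends on the platform API.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

TARGET = "safety_flag"  # target column named in the challenge text


def _strip_text(value):
    """Trim strings; turn empty strings into NaN; leave other values alone."""
    if isinstance(value, str):
        stripped = value.strip()
        return stripped if stripped else float("nan")
    return value


def clean(df: pd.DataFrame) -> pd.DataFrame:
    """Apply generic, repeatable cleaning steps (assumed rules, adjust to the data)."""
    out = df.copy()
    # Normalise column names so later steps can rely on them.
    out.columns = [str(c).strip().lower().replace(" ", "_") for c in out.columns]
    # Trim whitespace in text columns and treat empty strings as missing.
    for col in out.columns:
        is_text = (pd.api.types.is_object_dtype(out[col])
                   or pd.api.types.is_string_dtype(out[col]))
        if is_text:
            out[col] = out[col].map(_strip_text)
    # Drop exact duplicates, then rows that have no target value.
    out = out.drop_duplicates()
    if TARGET in out.columns:
        out = out.dropna(subset=[TARGET])
    return out.reset_index(drop=True)


def build_model(numeric_cols: list, categorical_cols: list) -> Pipeline:
    """Bundle preprocessing and classifier so the whole thing refits in one call."""
    numeric = Pipeline([
        ("impute", SimpleImputer(strategy="median")),
        ("scale", StandardScaler()),
    ])
    categorical = Pipeline([
        ("impute", SimpleImputer(strategy="most_frequent")),
        ("encode", OneHotEncoder(handle_unknown="ignore")),
    ])
    features = ColumnTransformer([
        ("num", numeric, numeric_cols),
        ("cat", categorical, categorical_cols),
    ])
    # Logistic regression is a placeholder choice, not a recommendation.
    return Pipeline([("features", features),
                     ("clf", LogisticRegression(max_iter=1000))])


def train(df: pd.DataFrame, numeric_cols: list, categorical_cols: list) -> Pipeline:
    """Fit a fresh model; calling this again on new data is the re-training step."""
    model = build_model(numeric_cols, categorical_cols)
    model.fit(df[numeric_cols + categorical_cols], df[TARGET])
    return model
```

Keeping the preprocessing inside the pipeline means re-training is a single call to `train` on a refreshed frame, and the same transformations are applied at prediction time.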

A Jupyter Notebook showing the steps you took to clean the input and enrich it through our Analytics Python package, along with the model, must be pushed to the GitHub repo with any supporting files. Please remember to cache the enriched data to save cost, as there is an upper limit on data enrichment. A smart tip: run a few records first to check what the input and output should look like.
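One way to follow the caching advice is a small on-disk cache keyed by record. The sketch below uses only the standard library. `enrich_fn` is a hypothetical placeholder for whatever call you make through the Analytics package (it is not a real function from that package), and the `id` key field and JSON cache file are assumptions as well.

```python
import json
from pathlib import Path
from typing import Callable, Iterable


def enrich_with_cache(
    records: Iterable[dict],
    enrich_fn: Callable[[dict], dict],  # hypothetical wrapper around the real API call
    cache_path: str,
    key_field: str = "id",              # assumed unique identifier per record
) -> list:
    """Enrich each record at most once, reusing results stored on disk."""
    path = Path(cache_path)
    cache = json.loads(path.read_text()) if path.exists() else {}
    results = []
    for record in records:
        key = str(record[key_field])
        if key not in cache:
            # Only uncached records trigger a (billable) enrichment call.
            cache[key] = enrich_fn(record)
            # Persist after every new result so a crash does not lose paid-for data.
            path.write_text(json.dumps(cache))
        results.append(cache[key])
    return results
```

For a trial run, pass only the first few records (for example `records[:5]`) and inspect the returned dictionaries before enriching the full file. Records already enriched in the trial are served from the cache on the full run.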

If you use any third-party libraries or non-standard build tools, document the build instructions clearly in a README file.

Evaluation

  • Coding style (30%): ease of maintenance; terseness; use of best practices; use of the latest technologies, libraries, and clever coding techniques. Appropriate choice of third-party libraries or frameworks is encouraged.
  • Data scrubbing (20%): steps taken to clean the data, and the possibility of automating the cleansing step through scripts.
  • Documentation (20%): is the API doc self-explanatory?
  • Modelling (30%): understanding of models and analytics, and their implementation on the dataset to predict the outcome.

Feel free to ask any questions as you tackle the challenge! Have fun!

de-code-challenge-saleem's People

Contributors

virsal, hps257
