Take-home assignment: Demyst Python libraries and model building
In this challenge, you will use our Analytics Python package, guided by its documentation, to demonstrate your understanding of APIs and your knowledge of Python. The API documentation is available at https://demyst.com/docs/python/api-reference/. You can also log in to our platform through the website, where you will find the input file for this challenge. You need to enrich this file with external data using the Python API.
Once you have finished enriching the data, you will need to predict the target variable (safety_flag). The input file is available in the Transfer Files section of the platform.
Complete the following three sub-tasks for your submission:
- Analyze and clean the input file using Python/Pandas in a Jupyter Notebook.
- Enrich the cleaned input with external data from the providers available on the platform.
- Use the enriched file to predict the target variable (safety_flag). You can use any model-building packages or tools. Make sure the model can be retrained.
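One way to make the model easy to retrain is to bundle preprocessing and the classifier into a single scikit-learn `Pipeline`, so that retraining on new or re-enriched data is one `fit` call. The sketch below is not a prescribed approach: it assumes scikit-learn is installed and that the enriched data is a pandas DataFrame containing a `safety_flag` column. The logistic regression is only a placeholder baseline, and the names `build_pipeline` and `train` are hypothetical.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer, make_column_selector
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline, make_pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

TARGET = "safety_flag"


def build_pipeline() -> Pipeline:
    """Preprocessing and classifier in one object (hypothetical helper)."""
    # Numeric columns: fill gaps with the median, then standardise.
    numeric = make_pipeline(SimpleImputer(strategy="median"), StandardScaler())
    # Non-numeric columns: fill gaps with the most frequent value, then one-hot
    # encode; categories unseen during training are ignored at prediction time.
    categorical = make_pipeline(
        SimpleImputer(strategy="most_frequent"),
        OneHotEncoder(handle_unknown="ignore"),
    )
    preprocess = ColumnTransformer([
        ("num", numeric, make_column_selector(dtype_include="number")),
        ("cat", categorical, make_column_selector(dtype_exclude="number")),
    ])
    # Placeholder baseline model; swap in any estimator with fit/predict.
    return Pipeline([
        ("preprocess", preprocess),
        ("model", LogisticRegression(max_iter=1000)),
    ])


def train(df: pd.DataFrame, target: str = TARGET) -> Pipeline:
    """(Re)train from scratch on an enriched DataFrame (hypothetical helper)."""
    features = df.drop(columns=[target])
    return build_pipeline().fit(features, df[target])
```

Because every preprocessing step is learned inside the pipeline, retraining on a refreshed enriched file re-fits imputers, encoders, and the model together, and the fitted object can be persisted (for example with `joblib`) and reused for prediction.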
A Jupyter Notebook showing the steps you took to clean the input, enrich it through our Analytics Python package, and build the model must be pushed to the GitHub repository, along with any supporting files. Please remember to cache the enriched data to save cost, as there is an upper limit on data enrichment. Tip: run a few records first to see what the input and output look like.
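Caching can be as simple as writing the enriched result to disk the first time and reading it back afterwards. The sketch below deliberately does not call the Analytics package: `enrich_fn` is a hypothetical placeholder for whatever enrichment call you make through it, and `cached_enrich` is likewise a hypothetical helper. Pickle is used only because it preserves pandas dtypes without extra dependencies.

```python
from pathlib import Path
from typing import Callable

import pandas as pd

# Hypothetical placeholder type: any callable that takes the input rows and
# returns them with the external columns appended. Wrap the real API call in it.
EnrichFn = Callable[[pd.DataFrame], pd.DataFrame]


def cached_enrich(inputs: pd.DataFrame, enrich_fn: EnrichFn, cache_path: Path) -> pd.DataFrame:
    """Call enrich_fn at most once per cache file; later calls read from disk."""
    if cache_path.exists():
        # Cache hit: no enrichment request is made, so no cost is incurred.
        return pd.read_pickle(cache_path)
    enriched = enrich_fn(inputs)
    cache_path.parent.mkdir(parents=True, exist_ok=True)
    enriched.to_pickle(cache_path)
    return enriched
```

For the trial run, pass `inputs.head(5)` with its own cache file (for example `cache/trial.pkl`), inspect the output, and only then run the full file against a separate cache path. The cache is keyed by path alone, so delete the file if the inputs change.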
If you use any third-party libraries or non-standard build tools, document the build instructions clearly in a README file.
Submissions are assessed against the following weighted criteria:
- Coding style (30%): ease of maintenance; terseness; use of best practices; use of the latest technologies, libraries, and clever coding techniques; etc. An appropriate choice of third-party libraries or frameworks is encouraged.
- Data scrubbing (20%): the steps taken to clean the data, and how far the cleansing step can be automated through scripts.
- Documentation (20%): Is the API documentation self-explanatory?
- Modelling (30%): understanding of models and analytics, and their implementation on the dataset to predict the outcome.
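To make the cleansing step automatable, it can live in plain functions that a notebook, a script, or a scheduler calls identically. The sketch below only shows generic steps (header normalisation, whitespace trimming, blank-to-missing, de-duplication); the real input file will need its own rules, and the names `clean` and `clean_file` are hypothetical.

```python
from pathlib import Path

import pandas as pd


def _strip_or_none(value):
    """Strip surrounding whitespace from strings; blank strings become None (missing)."""
    if isinstance(value, str):
        value = value.strip()
        return value if value else None
    return value


def clean(df: pd.DataFrame) -> pd.DataFrame:
    """Generic, repeatable cleaning steps; extend with rules specific to the input file."""
    out = df.copy()
    # Normalise headers: trim, lowercase, collapse whitespace into underscores.
    out.columns = out.columns.str.strip().str.lower().str.replace(r"\s+", "_", regex=True)
    # Tidy text cells in every non-numeric column; numeric columns are left untouched.
    for col in out.columns:
        if not pd.api.types.is_numeric_dtype(out[col]):
            out[col] = out[col].map(_strip_or_none)
    # Drop exact duplicate rows, then rows in which every value is missing.
    return out.drop_duplicates().dropna(how="all").reset_index(drop=True)


def clean_file(src: Path, dst: Path) -> pd.DataFrame:
    """Read a raw CSV, clean it, write the result, and return it."""
    cleaned = clean(pd.read_csv(src))
    cleaned.to_csv(dst, index=False)
    return cleaned
```

Keeping `clean` free of file I/O makes it easy to unit-test, while `clean_file` is the thin entry point a command-line wrapper or scheduled job would call.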
Feel free to ask any questions as you tackle the challenge! Have fun!