Assignment for Programming for Big Data Course https://github.com/barrysheppard/B8IT105-CA4
Name: Barry Sheppard Student Number: 10387786
- import_gitlog.py has code for parsing and cleaning
- test_import_gitlog.py has unit tests for import_gitlog.py
- heatmap.py is third-party code with a custom addition at the end of the file
- Analysis.ipynb has the actual analysis along with comments
Assignment 4 is based on transforming a large dataset in text format (over 5000 lines of text). You will need to scrub (clean) the data and place it into the relevant holder/container objects. Once the data is in these objects you will see that there are 422 different sets of commit objects. Your task is to analyse these 422 objects, which are held in a list, and come up with 3 interesting statistical pieces of information for this dataset, with supporting evidence of "interestingness". Your code for calculating the analysis should be documented and tested. Tests should be in a separate file runnable from the command line. Your statistical analytics conclusions should be in a Word document explaining, in approximately 500 words, the information that you have gleaned from the dataset. You will be required to submit your code via GitHub along with all documentation and tests. The deadline is the 4th November 2018 on Moodle @ 23:55.
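
The parse-then-analyse flow described in the brief could look roughly like the sketch below. It is illustrative only and is not the code in import_gitlog.py. The log layout it expects (commits delimited by a line of 72 dashes, with a pipe-delimited header of revision, author and date) is an assumption, and the names `Commit`, `parse_commits` and `commits_per_author` are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass

# Assumption: each commit in the log is delimited by a line of 72 dashes.
SEPARATOR = "-" * 72


@dataclass
class Commit:
    """Hypothetical container for one cleaned commit entry."""
    revision: str
    author: str
    date: str
    comment: str


def _build_commit(block):
    # Assumption: the first line of a block is "revision | author | date | ...".
    fields = [field.strip() for field in block[0].split("|")]
    revision, author, date = fields[0], fields[1], fields[2]
    # Remaining non-blank lines are treated as the commit comment.
    return Commit(revision, author, date, " ".join(block[1:]))


def parse_commits(lines):
    """Group log lines into Commit objects, one per separator-delimited block."""
    commits = []
    block = []
    for raw in lines:
        line = raw.strip()
        if line == SEPARATOR:
            if block:
                commits.append(_build_commit(block))
            block = []
        elif line:  # skip blank lines
            block.append(line)
    if block:  # handle a final block with no trailing separator
        commits.append(_build_commit(block))
    return commits


def commits_per_author(commits):
    """Count commits per author, one simple statistic over the commit list."""
    return Counter(commit.author for commit in commits)
```

With a real log file, the lines could be read with `open(path)` and passed straight to `parse_commits`. A test file kept separate from the parsing code, such as test_import_gitlog.py, can typically be run from the command line with `python -m unittest test_import_gitlog`.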