The code should run with no issues using Python version 3.6.3.
Libraries from the Anaconda distribution used in the project:
- sys
- numpy
- pandas
- matplotlib.pyplot
- seaborn
- sklearn.cluster
- sklearn.preprocessing
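To confirm an environment has everything listed above before opening the notebooks, a small standard-library check like the following could help. It is a sketch, not part of the project: the helper name `missing_modules` is hypothetical, and it only checks that each top-level package can be found, without importing it.

```python
import importlib.util
import sys

# Module names taken from the library list above; sys is part of the standard library.
REQUIRED_MODULES = [
    "numpy",
    "pandas",
    "matplotlib.pyplot",
    "seaborn",
    "sklearn.cluster",
    "sklearn.preprocessing",
]


def missing_modules(names):
    """Return the names whose top-level package cannot be found (hypothetical helper)."""
    missing = []
    for name in names:
        # Checking only the top-level package avoids actually importing anything.
        top_level = name.split(".")[0]
        if importlib.util.find_spec(top_level) is None:
            missing.append(name)
    return missing


if __name__ == "__main__":
    print("Python", sys.version.split()[0], "(the project targets 3.6.3)")
    print("Missing:", missing_modules(REQUIRED_MODULES) or "none")
```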
The project also uses an external function library, "proj1_func_library", which is not part of the Anaconda distribution. It is defined in:
- proj1_func_library.py
For this project, I was interested in using Stack Overflow data from 2020 to answer three questions:
- What are the top five hottest programming languages in 2020?
- What kinds of people want to keep learning new technologies?
- How could Stack Overflow segment its visitors according to their behaviour on the platform?
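For the first question, the core counting step can be sketched with pandas. This is an illustration, not the code in the notebook: it assumes the survey stores each respondent's languages as one semicolon-separated string (e.g. "Python;SQL") in a column called `LanguageWorkedWith`, and the function name `top_languages` is hypothetical.

```python
import pandas as pd


def top_languages(responses, column="LanguageWorkedWith", n=5):
    """Count how many respondents mention each language and return the n most common.

    Assumption: each cell holds semicolon-separated language names, e.g. "Python;SQL".
    """
    counts = (
        responses[column]
        .dropna()          # skip respondents who left the question blank
        .str.split(";")    # "Python;SQL" -> ["Python", "SQL"]
        .explode()         # one row per (respondent, language) pair
        .str.strip()
        .value_counts()    # sorted from most to least frequent
    )
    return counts.head(n)
```

Splitting and exploding before counting means a respondent who uses several languages contributes one vote to each of them, which matches a "how many people use X" reading of popularity.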
There are three Jupyter notebooks (*.ipynb) here, one for each of the questions above, and each follows the CRISP-DM process:
- project_1_question_1.ipynb
- project_1_question_2.ipynb
- project_1_question_3.ipynb
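The third notebook relies on sklearn.preprocessing and sklearn.cluster. As a rough sketch of what a scale-then-cluster step looks like, the snippet below standardizes the features and fits k-means. k-means is an assumption here (the README does not name the clustering algorithm), and the function name `segment_visitors` and the example feature columns are hypothetical.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler


def segment_visitors(features, n_clusters=3, random_state=42):
    """Assign each row of a numeric DataFrame to a cluster (k-means is an assumption).

    Returns a Series of cluster labels aligned with the input index.
    """
    # Standardize so that features with large ranges do not dominate the distance metric.
    scaled = StandardScaler().fit_transform(features)
    # n_init is set explicitly so results do not depend on version-specific defaults.
    model = KMeans(n_clusters=n_clusters, n_init=10, random_state=random_state)
    labels = model.fit_predict(scaled)
    return pd.Series(labels, index=features.index, name="segment")
```

For example, with hypothetical behaviour columns such as visit frequency and number of answers posted, `segment_visitors(features, n_clusters=2)` separates light users from heavy users. In practice the number of clusters would be chosen by inspecting something like the within-cluster sum of squares (`model.inertia_`) across several values.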
All three notebooks require the external library "proj1_func_library", which is defined in:
- proj1_func_library.py
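When a notebook is launched from the folder that contains proj1_func_library.py, a plain `import proj1_func_library` usually works. If the notebooks are run from somewhere else, a minimal sketch using `sys` is to put that folder on the module search path first. The helper name `add_project_dir_to_path` is hypothetical, and the folder location is an assumption you would adjust.

```python
import sys
from pathlib import Path


def add_project_dir_to_path(project_dir="."):
    """Put project_dir at the front of sys.path so modules stored there can be imported."""
    path = str(Path(project_dir).resolve())
    if path not in sys.path:
        sys.path.insert(0, path)
    return path


# Assumption: proj1_func_library.py sits in the current working directory.
add_project_dir_to_path(".")
# import proj1_func_library
```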
For easier reading, there are also three HTML files exported from the notebooks above:
- project_1_question_1.html
- project_1_question_2.html
- project_1_question_3.html
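The README does not say how the HTML files were produced. One common way, sketched below as an assumption, is Jupyter's `nbconvert` command-line tool (`jupyter nbconvert --to html <notebook>`), which writes an .html file next to each notebook. The helper names are hypothetical, and running `export_all` requires Jupyter to be installed.

```python
import subprocess

NOTEBOOKS = [
    "project_1_question_1.ipynb",
    "project_1_question_2.ipynb",
    "project_1_question_3.ipynb",
]


def nbconvert_command(notebook):
    """Build the nbconvert command that exports one notebook to HTML."""
    return ["jupyter", "nbconvert", "--to", "html", notebook]


def export_all(notebooks=NOTEBOOKS):
    """Export every notebook; check=True raises if any conversion fails."""
    for notebook in notebooks:
        subprocess.run(nbconvert_command(notebook), check=True)
```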
The main findings of the analysis can be found in my Medium post here.
Credit goes to Stack Overflow for the data. You can find the licensing for the data and other descriptive information at the Kaggle link available here. Otherwise, feel free to use the code here as you like!
(version 1.2 in friends_group)