It's hard to get a clear view of what to learn and know to be employable, especially when you're not following a traditional curriculum.
This list is a compilation of the most-wanted skills for data scientists, based on online job offers.
I collected hundreds of data scientist job offers in Paris, France, in November 2020. This list may not be representative of the most-wanted skills in other areas or countries.
The raw data extracted from job offers is visible in JobOffers.md.
The lists are ordered by how frequently each skill is mentioned in the offers.
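As an illustration of that ordering, here is a minimal sketch of how mentions could be counted and ranked. It is not the script used to build these lists: the `offers` sample, the `skills` list and the `count_mentions` helper are all hypothetical, and the real raw data lives in JobOffers.md.

```python
import re
from collections import Counter

# Hypothetical sample offers; the real raw data is in JobOffers.md.
offers = [
    "Data scientist: Python, SQL and Tableau required.",
    "We use Python and R for time series analysis.",
    "Strong SQL skills, Python is a plus.",
]

# Hypothetical list of skills to look for.
skills = ["Python", "R", "SQL", "Tableau"]


def count_mentions(offers, skills):
    """Count in how many offers each skill appears at least once."""
    counts = Counter()
    for text in offers:
        for skill in skills:
            # Whole-word, case-insensitive match, so "R" does not match "required".
            pattern = r"(?<!\w)" + re.escape(skill) + r"(?!\w)"
            if re.search(pattern, text, flags=re.IGNORECASE):
                counts[skill] += 1
    return counts


# most_common() sorts skills from most to least mentioned.
ranking = count_mentions(offers, skills).most_common()
for skill, n in ranking:
    print(f"- {skill}: {n}")
```

Counting each skill once per offer (rather than once per occurrence) keeps a single verbose offer from skewing the ranking; that is a design choice of this sketch, not necessarily what was done for the lists here.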
- Understand and implement scientific papers.
- Statistical methodology. Statistical testing, p-values.
- General statistics knowledge. Distributions, Bayesian inference, statistical models, probabilities.
- Time series analysis.
- Sequential analysis.
- Scoring.
- Regression.
- Econometrics.
- Game theory.
- Complexity estimation.
- Graph theory.
- Approximation algorithms.
- K-nearest neighbours.
- Deep learning. Neural networks theory.
- Decision tree / Gradient boosted decision tree.
- Regression / Logistic regression.
- Reinforcement learning.
- Convolutional Neural Network.
- Natural language processing.
- Ensemble modeling.
- Recommendation.
- Clustering.
- Auto-encoder.
- Restricted Boltzmann machine.
- Qlik.
- Google Data Studio.
- Plotly / Dash. For Python/R.
- Shiny. For R.
- Chartio.
- Matplotlib / Seaborn. For Python.
- Bokeh. For Python, R wrapper.
- Graphviz. For Python/R.
- Kibana.
- PowerBI.
- Sweetviz. For Python.
- Dataiku.
- Druid.
- H2O.ai.
Python was mentioned 2x more often than R, but both are in high demand.
SQL is as much in demand as R; it appears to be an essential skill.
Dashboarding in general is a top-demanded skill.
- Python.
- R.
- C++.
- Pandas / Numpy. Essential Python data handling libs.
- Scikit-learn.
- Tensorflow / Keras.
- PyTorch.
- PySpark. Connect your Python script to a Spark stack.
- NLTK. Natural language processing lib.
- Scipy.
- MXNet. Deep learning lib.
- XGBoost. Gradient boosted decision trees in Python and R.
- CatBoost. Yandex gradient boosted decision trees in Python and R.
- LGBM (LightGBM). Microsoft gradient boosted decision trees in Python and R.
- Prophet. Facebook time series forecasting lib.
- Libsvm. Support vector machines in Python.
- Apache Spark. With Hive and Airflow.
- Hadoop.
- Tableau.
- Linux / Shell scripting.
- Git / GitLab / GitHub.
- Docker.
- CI/CD. Jenkins, GitLab.
- ElasticSearch.
- Excel.
- Google Cloud. Functions, Storage, BigQuery.
- AWS.
- SQL.
- NoSQL / Relational algebra. Appears 5x less often than SQL, but still worth learning.
Soft skills were mentioned nearly as often as "Python" or "Tensorflow", so they seem really important.
- Communication. Being able to explain complex algorithms to non-technical clients or other employees. Being able to write reports and documentation on your research work.
- Self-organisation. Being able to organize your work without direct instructions.
- Business intelligence / CRM. Being able to understand how AI can improve a business and its customer relationship management.
- Technology watch. Being able to organise and document a technology watch so your company and colleagues stay up to date with state-of-the-art techniques.