This repository contains introductory material on common steps in the machine learning pipeline. It consists of several Jupyter notebooks, each covering one task, and introduces privacy-preserving machine learning through encrypted inference. Finally, it introduces the Membership Inference Attack (MIA), which can lead to serious privacy leakage.
This repository was created by the Euclid team, Democritus University of Thrace, Dept. of Electrical & Computer Engineering.
- Exploratory Data Analysis: Summarize the main characteristics of a dataset.
- Feature Engineering: Select, transform and generate new features on a dataset.
- Standardization and Normalization: Scale variables.
- Imputation: Handle NaN/null values.
- Feature Selection: Reduce the number of input variables.
- UnderSampling and OverSampling: Handle imbalanced datasets.
- Deep Learning with PyTorch: Build deep neural networks with PyTorch.
- Cross Validation: Evaluate the generalization of a machine learning model.
- Tuning with Grid Search: Find optimal hyperparameters for a given model by exhaustively searching a parameter grid.
- Tuning with Bayesian Optimization: Find optimal hyperparameters for a given model using Bayesian optimization.
- Introduction to Regression: Build regressor models.
- Introduction to Community Detection on Graphs: Find clusters in graph-based representations.
- Dimensionality Reduction: Project or transform data into a low-dimensional space.
- Introducing CryptoNets: Evaluate a model on encrypted data.
- Introducing MIA: Determine whether a given sample was part of the training set used to build a predictive model.
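
To give a flavour of what the notebooks cover, the Standardization and Normalization entry above can be illustrated with a minimal sketch. It is written in plain Python for readability and is not taken from the notebooks, which may well rely on scikit-learn scalers instead. The function names `standardize` and `normalize` and the toy values are assumptions made for this example.

```python
from statistics import mean, pstdev


def standardize(values):
    """Rescale values to zero mean and unit variance (z-scores)."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    if sigma == 0:
        # A constant feature carries no spread; map it to zeros.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def normalize(values):
    """Rescale values linearly into the [0, 1] range (min-max scaling)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant feature has no range; map it to zeros.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Toy feature column, invented for illustration only.
toy = [2.0, 4.0, 6.0, 8.0]
z_scores = standardize(toy)
min_max = normalize(toy)
print(z_scores)
print(min_max)
```

Standardization keeps the shape of the distribution but centres it, while min-max normalization bounds the values, which makes it more sensitive to outliers.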
We recommend using Conda with Python 3.8+, or an external Jupyter environment such as Colab or Kaggle.
- imbalanced_learn
- matplotlib
- numpy
- pandas
- scikit_learn
- torch
- optuna
- seaborn
- tenseal
- networkx
- python-louvain
- notebook
pip install -r requirements.txt
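
After installation, a quick check that every dependency is importable can save time before opening the notebooks. The sketch below is not part of the repository. It assumes the usual import names for the listed packages (for example, `imbalanced_learn` is imported as `imblearn`, `scikit_learn` as `sklearn`, and `python-louvain` as `community`), and the helper name `missing_packages` is hypothetical.

```python
from importlib.util import find_spec

# Maps each package name from the requirements list to the module name
# it is imported under (assumed to be the standard import names).
IMPORT_NAMES = {
    "imbalanced_learn": "imblearn",
    "matplotlib": "matplotlib",
    "numpy": "numpy",
    "pandas": "pandas",
    "scikit_learn": "sklearn",
    "torch": "torch",
    "optuna": "optuna",
    "seaborn": "seaborn",
    "tenseal": "tenseal",
    "networkx": "networkx",
    "python-louvain": "community",
    "notebook": "notebook",
}


def missing_packages(import_names=IMPORT_NAMES):
    """Return the package names whose module cannot be found."""
    return [
        package
        for package, module in import_names.items()
        if find_spec(module) is None
    ]


missing = missing_packages()
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All listed packages are importable.")
```

The check only looks up each module without importing it, so it runs quickly even when heavy libraries such as `torch` are installed.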