Feature engineering is a course offered on Codecademy. This repository is a collection of exercises I completed for the course.
Techniques for transforming categorical data, scaling data, and working with date-time features are applied to a clothing company dataset from Kaggle.
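The three kinds of transformation mentioned above can be sketched with pandas. This is a minimal illustration on made-up data: the column names (`size`, `category`, `price`, `review_date`) and values are assumptions for the example and are not taken from the Kaggle dataset or the course solutions.

```python
import pandas as pd

# Hypothetical toy data; column names and values are illustrative only.
df = pd.DataFrame({
    "size": ["S", "M", "L", "M"],
    "category": ["dress", "top", "dress", "skirt"],
    "price": [20.0, 35.0, 50.0, 35.0],
    "review_date": ["2021-01-15", "2021-03-02", "2021-03-20", "2021-07-04"],
})

# Ordinal encoding: sizes have a natural order, so map them to integers.
size_order = {"S": 0, "M": 1, "L": 2}
df["size_encoded"] = df["size"].map(size_order)

# One-hot encoding: categories have no order, so expand them into indicator columns.
df = pd.get_dummies(df, columns=["category"], dtype=int)

# Standardization: subtract the mean and divide by the population standard deviation.
df["price_scaled"] = (df["price"] - df["price"].mean()) / df["price"].std(ddof=0)

# Date-time features: parse the strings, then extract parts of the date.
df["review_date"] = pd.to_datetime(df["review_date"])
df["review_month"] = df["review_date"].dt.month
df["review_weekday"] = df["review_date"].dt.dayofweek  # Monday = 0

print(df.head())
```

Ordinal encoding suits categories with a meaningful order, while one-hot encoding avoids implying an order that does not exist.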
Survey data collected by Fabio Mendoza Palechor and Alexis de la Hoz Manotas (sourced from the UCI Machine Learning Repository) are analysed using a regression model to predict whether survey respondents are obese. A number of wrapper methods are tested and compared, including:
- sequential forward selection
- sequential backward floating selection
- recursive feature elimination
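As a rough sketch of how two of these wrapper methods can be run, the example below uses scikit-learn's `SequentialFeatureSelector` and `RFE` on synthetic data. The data, the logistic regression estimator, and the choice of three features are assumptions for illustration, not the setup used in the exercises. The floating variant is not shown because scikit-learn's selector does not support it; mlxtend's `SequentialFeatureSelector` offers a `floating` option for that.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE, SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for the survey data: 8 features, 3 of them informative.
X, y = make_classification(
    n_samples=200, n_features=8, n_informative=3, n_redundant=0, random_state=0
)

model = LogisticRegression(max_iter=1000)

# Sequential forward selection: start with no features and, at each step,
# add the one that gives the best cross-validated score.
sfs = SequentialFeatureSelector(
    model, n_features_to_select=3, direction="forward", cv=5
)
sfs.fit(X, y)

# Recursive feature elimination: fit the model, drop the feature with the
# smallest coefficient magnitude, and repeat until 3 features remain.
rfe = RFE(model, n_features_to_select=3)
rfe.fit(X, y)

sfs_features = sfs.get_support(indices=True)
rfe_features = rfe.get_support(indices=True)
print("Forward selection kept:", sfs_features)
print("RFE kept:", rfe_features)
```

The two methods can disagree, since forward selection ranks features by cross-validated score while RFE ranks them by coefficient size.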
A set of logistic regression models was trained with scikit-learn on a wine quality dataset from the UCI Machine Learning Repository. The goal of the project was to:
- implement different logistic classifiers
- find the best ridge-regularized classifier using hyperparameter tuning
- implement a tuned lasso-regularized feature selection method
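The goals above might look roughly like the following in scikit-learn. This is a sketch on synthetic data, not the wine quality dataset; the grid of `C` values and the lasso strength `C=0.05` are arbitrary assumptions chosen for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the wine data: 10 features, 4 of them informative.
X, y = make_classification(
    n_samples=300, n_features=10, n_informative=4, n_redundant=0, random_state=0
)

# Ridge (L2) is the default penalty of LogisticRegression. C is the inverse
# regularization strength, so smaller C means stronger regularization.
ridge = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
C_grid = [0.001, 0.01, 0.1, 1.0, 10.0]  # assumed grid, for illustration
search = GridSearchCV(ridge, {"logisticregression__C": C_grid}, cv=5)
search.fit(X, y)
print("Best C:", search.best_params_["logisticregression__C"])

# Lasso (L1) can shrink coefficients exactly to zero, so the features with
# nonzero coefficients form the selected subset.
lasso = make_pipeline(
    StandardScaler(),
    LogisticRegression(penalty="l1", solver="liblinear", C=0.05),
)
lasso.fit(X, y)
coef = lasso.named_steps["logisticregression"].coef_.ravel()
selected = np.flatnonzero(coef)
print("Features kept by lasso:", selected)
```

Scaling sits inside the pipeline so that each cross-validation fold is standardized using only its own training split.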
In this project, you will classify particles as gamma (signal) or hadron (background). Given that the features are correlated, you will perform PCA to get a new set of features and select the ones that contain the most information. The dataset was generated by a Monte Carlo program, CORSIKA, described in D. Heck et al., CORSIKA, A Monte Carlo code to simulate extensive air showers, Forschungszentrum Karlsruhe FZKA 6019 (1998).
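A minimal sketch of the PCA step, assuming scikit-learn and synthetic correlated data in place of the particle dataset. The 95% variance threshold is an assumption for the example, not a value from the project.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic correlated features: 5 observed columns driven by 2 latent
# factors plus a little noise (a stand-in for the real measurements).
latent = rng.normal(size=(500, 2))
mixing = rng.normal(size=(2, 5))
X = latent @ mixing + 0.1 * rng.normal(size=(500, 5))

# PCA is sensitive to scale, so standardize each feature first.
X_std = StandardScaler().fit_transform(X)

# A float n_components keeps the smallest number of components whose
# cumulative explained variance reaches that fraction (here 95%).
pca = PCA(n_components=0.95, svd_solver="full")
X_reduced = pca.fit_transform(X_std)

print("Components kept:", pca.n_components_)
print("Explained variance ratios:", pca.explained_variance_ratio_)
```

The reduced features in `X_reduced` are uncorrelated linear combinations of the originals, ordered by how much variance each explains.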