Use Kaggle’s Red Wine Quality dataset to build various classification models to predict whether a particular red wine is “good quality” or not. Each wine in this dataset is given a “quality” score between 0 and 10. In this project, the quality of a wine is determined by 11 input variables. The objective is to determine which features are most indicative of a good-quality wine.
In the dataset, several features are used to classify the quality of wine; many of them are chemical properties.
- volatile acidity : The gaseous acids present in wine.
- fixed acidity : The primary fixed acids found in wine are tartaric, succinic, citric, and malic.
- residual sugar : Amount of sugar left after fermentation.
- citric acid : A weak organic acid found naturally in citrus fruits.
- chlorides : Amount of salt present in wine.
- free sulfur dioxide : SO2 is used to protect wine from oxidation and microbial spoilage.
- total sulfur dioxide
- pH : In wine, pH is used to check acidity.
- density
- sulphates : Added sulfites preserve freshness and protect wine from oxidation and bacteria.
- alcohol : Percent of alcohol present in wine.
Besides the chemical features, there is one feature named Type, which records the type of wine. Here the types are red and white, and the share of red wine is greater than that of white.
Next, we import the required libraries and follow these steps:
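To make the "good quality" target concrete, here is a minimal sketch of how the 0 to 10 quality score could be turned into a binary label. The cutoff of 7, the column names `quality` and `good`, and the helper `add_good_label` are assumptions made for illustration; the text above does not fix a threshold. The sample rows are made up and only show the transformation.

```python
import pandas as pd

# The 11 input variables listed above, written as column names
# (the exact spelling in the data file is an assumption).
FEATURES = [
    "fixed acidity", "volatile acidity", "citric acid", "residual sugar",
    "chlorides", "free sulfur dioxide", "total sulfur dioxide", "density",
    "pH", "sulphates", "alcohol",
]

# Assumption: a score of 7 or higher counts as "good quality".
GOOD_THRESHOLD = 7


def add_good_label(df, threshold=GOOD_THRESHOLD):
    """Return a copy of df with a binary 'good' column derived from 'quality'."""
    out = df.copy()
    out["good"] = (out["quality"] >= threshold).astype(int)
    return out


# Tiny made-up sample, used only to show the transformation.
sample = pd.DataFrame({"quality": [3, 5, 6, 7, 8]})
labeled = add_good_label(sample)
print(labeled["good"].tolist())  # [0, 0, 0, 1, 1]
```

A different cutoff can be passed through the `threshold` argument if another definition of "good" fits the project better.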
1. import the data
2. clean data
3. split the data into a training set and a test set, so that some wines are used for training and some for testing
4. create a model with a decision tree
5. create a model
6. train the model
7. make predictions
8. evaluate the model
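The steps above can be sketched with pandas and scikit-learn as follows. This is an illustration under stated assumptions, not the project's actual code: the data is a small synthetic stand-in with three of the listed features, the `good` label comes from a made-up rule, and the tree depth, split ratio, and random seeds are arbitrary choices. With the real data, step 1 would instead read the Kaggle CSV file.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# 1. Import the data. A synthetic stand-in keeps the sketch self-contained;
#    the real project would load the Kaggle file with pd.read_csv instead.
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "alcohol": rng.uniform(8, 14, n),
    "volatile acidity": rng.uniform(0.1, 1.2, n),
    "sulphates": rng.uniform(0.3, 1.5, n),
})
# Made-up rule so that the synthetic label depends on the features.
df["good"] = ((df["alcohol"] > 11) & (df["volatile acidity"] < 0.7)).astype(int)

# 2. Clean the data: drop rows with missing values and exact duplicates.
df = df.dropna().drop_duplicates()

# 3. Split into a training set and a test set (80% / 20%, an assumed ratio).
X = df.drop(columns="good")
y = df["good"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

# 4-5. Create the model: a decision tree with an assumed depth limit.
model = DecisionTreeClassifier(max_depth=3, random_state=0)

# 6. Train the model on the training set.
model.fit(X_train, y_train)

# 7. Make predictions for the held-out wines.
y_pred = model.predict(X_test)

# 8. Evaluate: share of test wines classified correctly.
accuracy = accuracy_score(y_test, y_pred)
print(f"test accuracy: {accuracy:.2f}")

# Which features the tree relied on most (relevant to the stated objective).
importances = pd.Series(
    model.feature_importances_, index=X.columns
).sort_values(ascending=False)
print(importances)
```

The `feature_importances_` values give one possible view of which features matter most for the prediction; they describe this particular tree and can change with a different model or split.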