UCI Iris Classification

Description

A Python script that predicts iris species from sepal and petal dimensions (length and width). The species in this dataset are iris-setosa, iris-versicolor, and iris-virginica. The dataset is part of the University of California, Irvine (UCI) Machine Learning Repository.

Libraries used in this example include pandas, seaborn, matplotlib, and scikit-learn. The algorithm used is the k-nearest neighbors algorithm.
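The snippets in this README operate on a pandas DataFrame named df with the columns sepal-length, sepal-width, petal-length, petal-width, and species. A minimal sketch of one way to build it is shown here. Loading from scikit-learn's bundled copy of the dataset and the helper name load_iris_df are assumptions for illustration; the original script may read the UCI CSV file instead.

```python
import pandas as pd
from sklearn.datasets import load_iris

# Column names match the ones used in the snippets in this README.
FEATURES = ['sepal-length', 'sepal-width', 'petal-length', 'petal-width']


def load_iris_df():
    """Hypothetical helper: build the DataFrame from scikit-learn's copy of the data."""
    iris = load_iris()
    df = pd.DataFrame(iris.data, columns=FEATURES)
    # Map the integer class labels to names such as 'iris-setosa'.
    df['species'] = ['iris-' + str(iris.target_names[i]) for i in iris.target]
    return df


df = load_iris_df()
print(df.head())
print(df['species'].value_counts())
```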

Analysis

First, we make box and whisker plots to see the range of values for petal and sepal dimensions.

[Figures: box and whisker plots of petal length, petal width, sepal length, and sepal width]
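Box and whisker plots like these could be produced with seaborn along the following lines. This is a sketch, not the original plotting code: grouping by species, the 2x2 layout, and building df from scikit-learn's bundled dataset are all assumptions.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
from sklearn.datasets import load_iris

FEATURES = ['sepal-length', 'sepal-width', 'petal-length', 'petal-width']

# Assumption: df is rebuilt here so the example runs on its own.
iris = load_iris()
df = pd.DataFrame(iris.data, columns=FEATURES)
df['species'] = ['iris-' + str(iris.target_names[i]) for i in iris.target]

# One box plot per measurement, with one box per species.
box_fig, axes = plt.subplots(2, 2, figsize=(10, 8))
for ax, feature in zip(axes.ravel(), FEATURES):
    sns.boxplot(data=df, x='species', y=feature, ax=ax)
    ax.set_title(feature)
box_fig.tight_layout()
```

Call plt.show() or box_fig.savefig('boxplots.png') afterwards to view the result.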

Next, plot histograms of the same data.

[Figures: histograms of petal length, petal width, sepal length, and sepal width]
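A possible way to draw the histograms with matplotlib is sketched here. Overlaying the three species in each panel, the bin count, and rebuilding df from scikit-learn's bundled dataset are assumptions made for illustration.

```python
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.datasets import load_iris

FEATURES = ['sepal-length', 'sepal-width', 'petal-length', 'petal-width']

# Assumption: df is rebuilt here so the example runs on its own.
iris = load_iris()
df = pd.DataFrame(iris.data, columns=FEATURES)
df['species'] = ['iris-' + str(iris.target_names[i]) for i in iris.target]

# One panel per measurement; each species is drawn as a translucent histogram.
hist_fig, axes = plt.subplots(2, 2, figsize=(10, 8))
for ax, feature in zip(axes.ravel(), FEATURES):
    for name, group in df.groupby('species'):
        ax.hist(group[feature], bins=10, alpha=0.6, label=name)
    ax.set_xlabel(feature)
    ax.set_ylabel('count')
axes[0, 0].legend()
hist_fig.tight_layout()
```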

These plots give a good visual overview of the data. Violin plots condense the range and the distribution into two graphs: one for petal length and one for sepal length.

[Figures: violin plots of petal length and sepal length]
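The two violin plots could be sketched with seaborn as follows. Splitting each plot by species and rebuilding df from scikit-learn's bundled dataset are assumptions, not details confirmed by the original script.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
from sklearn.datasets import load_iris

FEATURES = ['sepal-length', 'sepal-width', 'petal-length', 'petal-width']

# Assumption: df is rebuilt here so the example runs on its own.
iris = load_iris()
df = pd.DataFrame(iris.data, columns=FEATURES)
df['species'] = ['iris-' + str(iris.target_names[i]) for i in iris.target]

# Each violin shows the spread and the estimated density of one species.
violin_fig, (petal_ax, sepal_ax) = plt.subplots(1, 2, figsize=(12, 5))
sns.violinplot(data=df, x='species', y='petal-length', ax=petal_ax)
petal_ax.set_title('petal-length')
sns.violinplot(data=df, x='species', y='sepal-length', ax=sepal_ax)
sepal_ax.set_title('sepal-length')
violin_fig.tight_layout()
```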

Since there is only one dataset, split it into a training set and a testing set. Most of the data (70%) goes into the training set.

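from sklearn.model_selection import train_test_split

#hold out 30% of the rows for testing; random_state makes the split reproducible
train, test = train_test_split(df, test_size = 0.3, random_state = 0)

#take data features and output for training and testing
train_x = train[['sepal-length', 'sepal-width', 'petal-length', 'petal-width']]
train_y = train['species']

#the test features and labels must come from the held-out rows, not from train
test_x = test[['sepal-length', 'sepal-width', 'petal-length', 'petal-width']]
test_y = test['species']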

This example uses the k-nearest neighbors algorithm. Use the following script to fit the model and score it on the test set:

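from sklearn import metrics
from sklearn.neighbors import KNeighborsClassifier

#fit a 3-nearest-neighbors classifier, then score it on the test set
model = KNeighborsClassifier(n_neighbors = 3)
model.fit(train_x, train_y)
prediction = model.predict(test_x)
print(metrics.accuracy_score(test_y, prediction))
print(' ')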

This returns pretty good results, but what would happen if we separated the petal and sepal measurements? To find out, split the data into a training section and a testing section again. The only difference this time is that you will do it twice: once for the petal measurements and once for the sepal measurements.

#split the dataset
petal = df[['petal-length', 'petal-width', 'species']]
sepal = df[['sepal-length', 'sepal-width', 'species']]

#split the data into a training and testing section again

#petals
train_petal, test_petal = train_test_split(petal, test_size = 0.3, random_state = 0)
train_petal_x = train_petal[['petal-length', 'petal-width']]
train_petal_y = train_petal['species']

test_petal_x = test_petal[['petal-length', 'petal-width']]
test_petal_y = test_petal['species']

#sepals
train_sepal, test_sepal = train_test_split(sepal, test_size = 0.3, random_state = 0)
train_sepal_x = train_sepal[['sepal-length', 'sepal-width']]
train_sepal_y = train_sepal['species']

test_sepal_x = test_sepal[['sepal-length', 'sepal-width']]
test_sepal_y = test_sepal['species']

Retrain the model for this new scenario:

print('New training session:')
#petals
model = KNeighborsClassifier(n_neighbors = 3)
model.fit(train_petal_x, train_petal_y)
prediction = model.predict(test_petal_x)
print('Petal prediction: ')
print(metrics.accuracy_score(test_petal_y, prediction))
print(' ')

#sepals
model = KNeighborsClassifier(n_neighbors = 3)
model.fit(train_sepal_x, train_sepal_y)
prediction = model.predict(test_sepal_x)
print('Sepal prediction: ')
print(metrics.accuracy_score(test_sepal_y, prediction))

It can be seen that restricting the model to the petal measurements (length and width) gives a better prediction than using only the sepal measurements or all four features together.
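A single 70/30 split can make this comparison sensitive to which rows happen to land in the test set. One way to check it more robustly is 5-fold cross-validation over the same three feature sets. Note that this swaps in cross_val_score in place of the single split used above, and that rebuilding df from scikit-learn's bundled dataset is an assumption made so the sketch runs on its own.

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

FEATURES = ['sepal-length', 'sepal-width', 'petal-length', 'petal-width']

# Assumption: df is rebuilt here so the example runs on its own.
iris = load_iris()
df = pd.DataFrame(iris.data, columns=FEATURES)
df['species'] = ['iris-' + str(iris.target_names[i]) for i in iris.target]

feature_sets = {
    'all': FEATURES,
    'petal': ['petal-length', 'petal-width'],
    'sepal': ['sepal-length', 'sepal-width'],
}

# Mean accuracy over 5 stratified folds for each feature set.
scores = {}
for name, columns in feature_sets.items():
    model = KNeighborsClassifier(n_neighbors=3)
    fold_scores = cross_val_score(model, df[columns], df['species'], cv=5)
    scores[name] = fold_scores.mean()
    print(f'{name}: {scores[name]:.3f}')
```

Because every feature set is scored on the same folds, differences between the three numbers reflect the features and not the luck of the split.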

Acknowledgements

This project was made with guidance from various Kaggle kernels and other tutorials. These include this tutorial on machinelearningmastery.com and this IPython Notebook by I,Coder.

Sources and Helpful Links

https://archive.ics.uci.edu/ml/datasets/iris
https://www.kaggle.com/adityabhat24/iris-data-analysis-and-machine-learning-python
https://www.kaggle.com/uciml/iris/home
https://www.kaggle.com/ash316/ml-from-scratch-with-iris
