
rohinegi548 / eda-and-machine-learning-avocado-prices-predictions


Exploratory Data Analysis and Price Predictions for Avocado Dataset based on Machine Learning



EDA-and-Machine-Learning-Avocado-Prices-Predictions


Avocado Dataset Analysis and ML Prediction

    Table of Contents

  • Problem Statement
  • Data Loading and Description
  • Data Profiling
    • Understanding the Dataset
    • Profiling
    • Preprocessing
  • Data Visualisation and Questions answered
    • Q.1 Which type of Avocado is more in demand (Conventional or Organic)?
    • Q.2 In what range does the Average price lie, and what does its distribution look like?
    • Q.3 How is the Average price distributed over the months for Conventional and Organic types?
    • Q.4 What are the TOP 5 regions where the Average price is highest?
    • Q.5 What are the TOP 5 regions where the Average consumption is highest?
    • Q.6 In which year and for which region was the Average price the highest?
    • Q.7 How is the price distributed over the Date column?
    • Q.8 How are the dataset features correlated with each other?
  • Feature Engineering for Model building
  • Model selection/predictions
    • P.1 Are we good with Linear Regression? Let's find out.
    • P.2 Are we good with Decision Tree Regression? Let's find out.
    • P.3 Are we good with Random Forest Regressor? Let's find out.
  • Let's see the final Actual vs Predicted sample.
  • Conclusions

Problem Statement

  • The notebook explores the basic use of Pandas and covers the basic commands of exploratory data analysis (EDA).
  • In this study, we try to predict the Avocado's Average Price from the other features (Total Bags, Date, Type, Year, Region…).
  • The variables of the dataset are the following:

  • Categorical: ‘region’, ‘type’
  • Date: ‘Date’
  • Numerical: ‘Total Volume’, ‘4046’, ‘4225’, ‘4770’, ‘Total Bags’, ‘Small Bags’, ‘Large Bags’, ‘XLarge Bags’, ‘Year’
  • Target: ‘AveragePrice’
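The variable groups above can be turned into a feature matrix and a target vector. The following is only a minimal sketch, not the notebook's actual preprocessing: the sample rows are made up, the derived `Month` feature and the helper name `split_features_target` are illustrative assumptions, and the column names are copied from the list above (their casing in the real CSV may differ).

```python
import pandas as pd

# Column groups as listed above.
CATEGORICAL = ["region", "type"]
DATE = "Date"
NUMERICAL = ["Total Volume", "4046", "4225", "4770", "Total Bags",
             "Small Bags", "Large Bags", "XLarge Bags", "Year"]
TARGET = "AveragePrice"


def split_features_target(df: pd.DataFrame):
    """Return (X, y): one-hot encoded features and the target column.

    Hypothetical helper for illustration; not taken from the notebook.
    """
    df = df.copy()
    # Parse the date and derive a numeric month feature (an assumed choice).
    df[DATE] = pd.to_datetime(df[DATE])
    df["Month"] = df[DATE].dt.month
    # One-hot encode the categorical columns so models receive numeric input.
    X = pd.get_dummies(df[NUMERICAL + ["Month"] + CATEGORICAL], columns=CATEGORICAL)
    y = df[TARGET]
    return X, y


# Made-up rows, only to show the shape of the data.
sample = pd.DataFrame({
    "Date": ["2018-01-07", "2018-01-14"],
    "AveragePrice": [1.10, 1.55],
    "Total Volume": [1000.0, 400.0],
    "4046": [300.0, 100.0],
    "4225": [400.0, 150.0],
    "4770": [50.0, 10.0],
    "Total Bags": [250.0, 140.0],
    "Small Bags": [200.0, 100.0],
    "Large Bags": [40.0, 35.0],
    "XLarge Bags": [10.0, 5.0],
    "type": ["conventional", "organic"],
    "Year": [2018, 2018],
    "region": ["Albany", "Albany"],
})

X, y = split_features_target(sample)
print(X.columns.tolist())
```

Keeping the target out of `X` from the start avoids leaking ‘AveragePrice’ into the features when a regressor is fitted later.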

Data Loading and Description

This data was provided by INSAID. It was downloaded from the Hass Avocado Board website in May 2018 and compiled into a single CSV. It represents weekly 2018 retail scan data for National retail volume (units) and price. The dataset comprises 18249 observations of 14 columns. Below is a table showing the names of the columns and their descriptions.
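Ahead of the column descriptions, a loading step could look roughly like the sketch below. The helper names are hypothetical, and an in-memory CSV with made-up rows and only a few columns stands in for the real file, whose path is not given here. The expected shape is the one stated above.

```python
import io

import pandas as pd

EXPECTED_SHAPE = (18249, 14)  # observations x columns, as stated above


def load_avocado(source) -> pd.DataFrame:
    """Read the CSV from a path or file-like object and parse the Date column."""
    return pd.read_csv(source, parse_dates=["Date"])


def matches_expected_shape(df: pd.DataFrame) -> bool:
    """True if the frame has the row and column counts quoted in the description."""
    return df.shape == EXPECTED_SHAPE


# Stand-in for the real file: two made-up rows and a subset of the columns.
# The empty first header cell is what makes pandas create 'Unnamed: 0'.
demo_csv = io.StringIO(
    ",Date,AveragePrice,Total Volume,type,region\n"
    "0,2018-01-07,1.10,1000.0,conventional,Albany\n"
    "1,2018-01-14,1.55,400.0,organic,Albany\n"
)
demo = load_avocado(demo_csv)
print(demo.shape, matches_expected_shape(demo))
```

With the real CSV, `matches_expected_shape` gives a quick check that the download is complete before any profiling starts.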

The less obvious numerical variable names are explained in the table below:

| Feature | Description |
|---|---|
| ‘Unnamed: 0’ | A redundant index column that will be removed later |
| ‘Total Volume’ | Total sales volume of avocados |
| ‘4046’ | Total sales volume of Small/Medium Hass Avocado |
| ‘4225’ | Total sales volume of Large Hass Avocado |
| ‘4770’ | Total sales volume of Extra Large Hass Avocado |
| ‘Total Bags’ | Total number of Bags sold |
| ‘Small Bags’ | Total number of Small Bags sold |
| ‘Large Bags’ | Total number of Large Bags sold |
| ‘XLarge Bags’ | Total number of XLarge Bags sold |
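The column descriptions suggest two simple cleaning checks, sketched below on made-up rows. The first drops the leftover index column, which pandas names `Unnamed: 0` when a CSV header cell is empty. The second rests on an assumption that is not stated in the table: that the three bag-size columns add up to ‘Total Bags’. Both helper names are hypothetical.

```python
import pandas as pd

INDEX_COLUMN = "Unnamed: 0"
BAG_PARTS = ["Small Bags", "Large Bags", "XLarge Bags"]


def drop_index_column(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy without the leftover index column, if it is present."""
    return df.drop(columns=[INDEX_COLUMN], errors="ignore")


def bag_mismatches(df: pd.DataFrame, tol: float = 0.5) -> pd.DataFrame:
    """Rows where the bag sizes do not add up to 'Total Bags' (within tol).

    ASSUMPTION: the bag columns are additive; the table does not say so.
    """
    gap = (df["Total Bags"] - df[BAG_PARTS].sum(axis=1)).abs()
    return df[gap > tol]


# Made-up rows: the second one is deliberately inconsistent.
demo = pd.DataFrame({
    "Unnamed: 0": [0, 1],
    "Total Bags": [250.0, 140.0],
    "Small Bags": [200.0, 100.0],
    "Large Bags": [40.0, 35.0],
    "XLarge Bags": [10.0, 0.0],
})
clean = drop_index_column(demo)
print(list(clean.columns))
print(len(bag_mismatches(clean)))
```

Passing `errors="ignore"` keeps `drop_index_column` safe to call again on a frame that has already been cleaned.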

-->Use this while viewing notebook
