
women_shoe_analysis's Introduction

Women's Shoe Data Analysis

Objective

The objective of this project is to explore the women's shoe sample dataset provided by Datafiniti (https://datafiniti.co).

This project answers the questions below using visualization:

  1. What are the top 10 most expensive brands?

  2. Which brands have a wide price distribution?

  3. Does the unavailability of an item affect its price?

  4. Which brand is listed the most among retailers?

Dataset

The data is publicly available on Kaggle: https://www.kaggle.com/datafiniti/womens-shoes-prices#Datafiniti_Womens_Shoes.csv

The dataset consists of 10,000 listings of women's shoes updated between January and October 2018, and another 10,000 updated between March 2019 and May 2019.

Note that the dataset is provided only as a SAMPLE and might be incomplete, so the results of this analysis might not reflect actual conditions.

Quality

prices.availability uses inconsistent values for in-stock and out-of-stock items, so we convert them to a single encoding (available, unavailable, or missing):

True, TRUE, In Stock = 1
False, Out Of Stock = 0
nan = -1
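The mapping above could be applied with a small helper like the one below. This is a minimal sketch in plain Python, not the project's actual code. The function name `encode_availability` is hypothetical, and treating any unrecognised value as missing (-1) is an assumption that goes beyond the three cases listed.

```python
import math

# Lower-cased keys, so "True" and "TRUE" both resolve to the same entry.
AVAILABILITY_CODES = {
    "true": 1,
    "in stock": 1,
    "false": 0,
    "out of stock": 0,
}

def encode_availability(value):
    """Return 1 (available), 0 (unavailable) or -1 (missing or unrecognised)."""
    # Missing cells usually arrive as None or as a float NaN.
    if value is None or (isinstance(value, float) and math.isnan(value)):
        return -1
    # Assumption: anything outside the known labels is treated as missing.
    return AVAILABILITY_CODES.get(str(value).strip().lower(), -1)
```

With a pandas DataFrame, the same function could be passed to the column's `map` method.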

brand is quite messy. The column contains values that refer to the same brand but differ in capitalization or spelling (e.g. Nike, nike and nike shoes), so we need to convert them to a consistent name.
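One way to make brand names consistent is to normalise case and whitespace first, then look up known variants in an alias table. The sketch below assumes that approach. `normalize_brand` and `BRAND_ALIASES` are hypothetical names, and the single alias entry is taken from the Nike example above. A real table would have to be built by inspecting the data.

```python
import re
import string

# Hypothetical alias table: maps a lower-cased variant to its canonical key.
BRAND_ALIASES = {
    "nike shoes": "nike",
}

def normalize_brand(raw):
    """Collapse case and spacing variants of a brand name to one spelling."""
    key = re.sub(r"\s+", " ", raw.strip().lower())
    key = BRAND_ALIASES.get(key, key)
    # capwords capitalises each whitespace-separated word and keeps
    # apostrophes intact, unlike str.title().
    return string.capwords(key)
```

Spelling errors that are not in the alias table pass through unchanged, so the table still needs manual review.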

prices.amountMax and prices.amountMin seem to contain outliers, with prices set at 5000 and 999.99. We will remove these rows.
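A minimal sketch of that filter follows, assuming each listing is a dict keyed by column name and that the outliers are exactly the two values mentioned above. The function name `drop_placeholder_prices` is hypothetical.

```python
# The two suspicious price points named in the text.
PLACEHOLDER_PRICES = {5000.0, 999.99}

def drop_placeholder_prices(rows):
    """Keep only listings where neither price bound is a placeholder value."""
    return [
        row
        for row in rows
        if row["prices.amountMin"] not in PLACEHOLDER_PRICES
        and row["prices.amountMax"] not in PLACEHOLDER_PRICES
    ]
```

An exact-match filter also removes any genuine listing priced at 999.99, which is a trade-off worth checking against the data.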

Some columns contain missing values. Since we are not using those columns, we leave them as is.

Tidiness

categories, color and sizes contain comma-separated values. These columns need to be split into separate values if we want to evaluate them.
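If these columns were needed, one common way to tidy them is to emit one row per value. The sketch below shows that idea in plain Python. The helper names `split_values` and `explode` are hypothetical, and rows are assumed to be dicts keyed by column name.

```python
def split_values(cell):
    """Split a comma-separated cell into a list of trimmed, non-empty values."""
    if not cell:
        return []
    return [part.strip() for part in cell.split(",") if part.strip()]

def explode(rows, column):
    """Return one copy of each row per value found in the given column."""
    out = []
    for row in rows:
        for value in split_values(row.get(column)):
            out.append({**row, column: value})
    return out
```

Note that a row whose cell is empty or missing produces no output rows in this sketch.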

Visualization

Most Expensive

most_expensive

From the bar chart, it is clear that Tabitha Simmons is the most expensive brand listed among the retailers.
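The text does not state how "most expensive" was computed. As one possible reading, the sketch below ranks brands by the mean of prices.amountMax. That metric, the function name `top_brands_by_mean_price`, and the dict-per-row layout are all assumptions.

```python
from collections import defaultdict
from statistics import mean

def top_brands_by_mean_price(rows, n=10):
    """Return the n brands with the highest mean prices.amountMax."""
    prices = defaultdict(list)
    for row in rows:
        prices[row["brand"]].append(row["prices.amountMax"])
    averages = {brand: mean(values) for brand, values in prices.items()}
    # Sort by average price, highest first, and keep the top n.
    return sorted(averages.items(), key=lambda item: item[1], reverse=True)[:n]
```

A brand with a single expensive listing can top this ranking, so a minimum listing count per brand may be worth adding.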

Price Distribution

price_distribution

A wide price distribution might indicate that these brands are often on sale, which would explain the large price differences over time.
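One simple way to quantify how wide a brand's prices are is the gap between its lowest prices.amountMin and its highest prices.amountMax. The sketch below uses that measure as an assumption. It is not necessarily the measure behind the chart, and `price_range_by_brand` is a hypothetical name.

```python
def price_range_by_brand(rows):
    """Return (brand, highest max price - lowest min price), widest first."""
    lows, highs = {}, {}
    for row in rows:
        brand = row["brand"]
        low, high = row["prices.amountMin"], row["prices.amountMax"]
        lows[brand] = min(lows.get(brand, low), low)
        highs[brand] = max(highs.get(brand, high), high)
    ranges = {brand: highs[brand] - lows[brand] for brand in lows}
    return sorted(ranges.items(), key=lambda item: item[1], reverse=True)
```

Because the range is sensitive to single extreme listings, it is best computed after outliers have been removed.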

Unavailability vs Price

availability

It is often rumored that retailers lower the price of an unavailable item to make themselves look like a bargain retailer. However, according to the violin plot above, this is not the case.

The left violin is the price distribution when items are not available, and the right is the distribution when items are available. They are similarly shaped, with similar interquartile ranges. Hence there is no evidence that an item's price is reduced when it is unavailable.
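The visual comparison can be backed by computing the quartiles for each availability group. The sketch below assumes the encoded availability is stored under a hypothetical key `availability_code` and that prices.amountMax is the price being compared. Neither detail is confirmed by the text.

```python
from collections import defaultdict
from statistics import quantiles

def quartiles_by_availability(rows):
    """Return q1, median, q3 and IQR of the price for each availability code."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["availability_code"]].append(row["prices.amountMax"])
    summary = {}
    for code, prices in groups.items():
        if len(prices) < 2:
            continue  # quantiles() needs at least two data points
        q1, median, q3 = quantiles(prices, n=4)
        summary[code] = {"q1": q1, "median": median, "q3": q3, "iqr": q3 - q1}
    return summary
```

Similar medians and IQRs for codes 0 and 1 would agree with what the violin plot shows.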

Most listed brands

wordcloud

The word cloud above is generated from the most frequently occurring words. The most listed brand is Journee Collection, as it takes up the biggest portion of the 'shoe' ;)
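The same question can be cross-checked numerically by counting listings per brand. The sketch below is a minimal version using the standard library, with a hypothetical function name. It counts whole brand names, whereas the word cloud is based on individual words.

```python
from collections import Counter

def most_listed_brands(brands, n=5):
    """Return the n most frequent brand names with their listing counts."""
    return Counter(brands).most_common(n)
```

Counts are only meaningful after brand names have been made consistent, since spelling variants would otherwise be tallied separately.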

Contributors

albertsundjaja
