To perform Exploratory Data Analysis (EDA) on the given data set.
The primary aim of exploratory analysis is to examine the data for distribution, outliers, and anomalies, so that the findings can direct specific testing of your hypothesis.
Import the required packages (pandas, numpy, seaborn).
Read the given .csv file into a DataFrame using read_csv().
Get information about the data using info(), head() and tail().
Remove the non-numerical columns using the drop() method.
Replace the null values using fillna().
Count the unique values in a column using value_counts().
Plot the counts as a histogram or bar graph.
Find the pairwise correlation of all columns in the DataFrame using corr().
Save the final data set to a file.
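The cleaning steps listed above (dropping non-numerical columns, filling nulls, counting unique values) can be sketched as follows. This is a minimal illustration on a small made-up frame, not the supermarket file itself. The sample values, the use of select_dtypes() to find the columns to drop, and the choice of the column median as the fill value are all assumptions.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the values are invented for illustration only.
sample = pd.DataFrame({
    "City": ["Yangon", "Mandalay", "Yangon", "Naypyitaw"],
    "Payment": ["Cash", "Ewallet", "Cash", "Credit card"],
    "Quantity": [7, 5, np.nan, 8],
    "Total": [548.97, 80.22, 340.53, np.nan],
})

# Drop the non-numerical columns (found here with select_dtypes).
non_numeric = sample.select_dtypes(exclude="number").columns
numeric_df = sample.drop(columns=non_numeric)

# Replace nulls; the column median is one common choice, not the only one.
clean_df = numeric_df.fillna(numeric_df.median())

# Count the unique values of a categorical column in the original frame.
city_counts = sample["City"].value_counts()

print(clean_df)
print(city_counts)
```

The median is used here because it is less sensitive to outliers than the mean; a constant or the mean may suit other data better.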
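# Import the required packages
import pandas as pd
import numpy as np
import seaborn as sns

# Read the data set and inspect its structure
df = pd.read_csv("supermarket.csv")
df.info()
df.head()
df.tail()

# Count the null values in each column
df.isnull().sum()

# Count the unique values in the categorical columns
df["City"].value_counts()
df["Gender"].value_counts()
df["Payment"].value_counts()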
sns.countplot(x="Invoice ID",data=df)
# Total and gross income are continuous, so a histogram is used instead of a count plot
sns.histplot(x="Total", data=df)
sns.histplot(x="gross income", data=df)
sns.countplot(x="Payment", data=df)
sns.displot(df["cogs"])
sns.countplot(x="Gender",hue="Quantity",data=df)
# This filter matches rows only if "Product line" has been numerically encoded
sns.displot(df[df["Product line"] == 0]["Total"])
pd.crosstab(df["Payment"],df["Quantity"])
pd.crosstab(df["Gender"],df["Quantity"])
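# Restrict the correlation to numeric columns; pandas 2.x raises an error on non-numeric ones
df.corr(numeric_only=True)
sns.heatmap(df.corr(numeric_only=True), annot=True)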
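The program does not show the final step of saving the data set. A minimal sketch of that step, assuming a small stand-in frame and an in-memory buffer in place of a real file path (the frame contents are hypothetical):

```python
import io

import pandas as pd

# Hypothetical cleaned frame standing in for the result of the earlier steps.
clean_df = pd.DataFrame({
    "Quantity": [7, 5, 7, 8],
    "cogs": [70.0, 50.0, 70.0, 80.0],
    "Total": [73.5, 52.5, 73.5, 84.0],
})

# Write the frame as CSV; index=False keeps the row index out of the file.
buffer = io.StringIO()
clean_df.to_csv(buffer, index=False)

# Read it back to confirm that the columns and values survived the round trip.
buffer.seek(0)
restored = pd.read_csv(buffer)
print(restored)
```

In a real run, a file path would be passed to to_csv() in place of the buffer; the output file name is left to the reader.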
Thus, the Exploratory Data Analysis (EDA) on the given data set was completed successfully.