
1)a) Data Cleaning:

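import pandas as pd

# Load the data set and inspect it
df = pd.read_csv("Data_set.csv")
print(df.head())
print(df.tail())
df.info()  # info() prints its report directly and returns None
print(df.dtypes)
print(df.isnull().sum())

# Fill missing heights with the column mean before dropping rows;
# if dropna() ran first, there would be nothing left for fillna() to fill
x = df['height'].mean()
df['height'] = df['height'].fillna(x)
df['Date'] = pd.to_datetime(df['Date'])

# Drop rows that still contain missing values, then duplicate rows
# (inplace expects the boolean True, not the string 'True')
df.dropna(inplace=True)
df.drop_duplicates(inplace=True)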
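The order of the cleaning steps matters, so here is a minimal sketch of the same steps on a small in-memory frame. The toy values and the helper name `clean` are assumptions made for illustration; only the column names `height` and `Date` come from the snippet above.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for Data_set.csv
raw = pd.DataFrame({
    "height": [150.0, np.nan, 170.0, 170.0, 160.0],
    "Date": ["2024-01-01", "2024-01-02", "2024-01-03", "2024-01-03", None],
})

def clean(frame):
    """Fill heights, parse dates, then drop incomplete and duplicate rows."""
    out = frame.copy()
    # Impute first, so rows with a missing height survive the later dropna()
    out["height"] = out["height"].fillna(out["height"].mean())
    # Missing dates become NaT and are removed by dropna() below
    out["Date"] = pd.to_datetime(out["Date"])
    out = out.dropna()
    out = out.drop_duplicates()
    return out.reset_index(drop=True)

cleaned = clean(raw)
print(cleaned)
```

Returning a new frame instead of using inplace=True keeps the raw data available for comparison; either style works for a script of this size.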

1)b) Detect and Remove Outliers:

import pandas as pd
import matplotlib.pyplot as plt

# Load the data set and inspect it
data = pd.read_csv("Data_set.csv")
print(data.head())
print(data.tail())
data.info()
print(data.dtypes)
print(data.isnull().sum())

# Box plot before removing outliers
plt.boxplot(data['price'])
plt.show()

# Interquartile range and the 1.5 * IQR fences
Q1 = data['price'].quantile(0.25)
Q3 = data['price'].quantile(0.75)
IQR = Q3 - Q1
lower = Q1 - 1.5 * IQR
upper = Q3 + 1.5 * IQR

# Remove rows whose price lies on or outside the fences
outliers = (data['price'] <= lower) | (data['price'] >= upper)
data = data[~outliers]

# Box plot after removing outliers
plt.boxplot(data['price'])
plt.show()
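To make the fence arithmetic concrete, the following sketch applies the 1.5 * IQR rule to a short hypothetical price series. The values and the helper name `iqr_bounds` are assumptions for illustration and do not come from Data_set.csv.

```python
import pandas as pd

# Hypothetical prices; 100 is the planted outlier
prices = pd.Series([10, 12, 11, 13, 12, 11, 14, 100], name="price")

def iqr_bounds(values, k=1.5):
    """Return the (lower, upper) fences Q1 - k*IQR and Q3 + k*IQR."""
    q1 = values.quantile(0.25)
    q3 = values.quantile(0.75)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

lower, upper = iqr_bounds(prices)

# Keep only the values strictly inside the fences
kept = prices[(prices > lower) & (prices < upper)]
print(lower, upper)
print(kept.tolist())
```

A boolean mask selects rows by value, so it does not depend on the frame having a default integer index the way position-based dropping does.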

2,3) Feature Selection Techniques:

import pandas as pd
from sklearn.feature_selection import SelectKBest
from sklearn.feature_selection import chi2

# Load the data set and inspect it
data = pd.read_csv("CarPrice.csv")
print(data.head())
print(data.tail())
data.info()
print(data.dtypes)
print(data.isnull().sum())
data.dropna(inplace=True)
data.drop_duplicates(inplace=True)

# Encode fuel type as 0/1 and drop the remaining columns that chi2 cannot use
# (chi2 requires non-negative numeric features)
data["fueltype"] = data["fueltype"].map({"gas": 1, "diesel": 0})
data.drop(['enginetype','carbody','symboling','CarName','aspiration','doornumber','drivewheel','enginelocation','cylindernumber','fuelsystem'], axis=1, inplace=True)

# Separate the features from the target so the target is not scored against itself.
# The target must be a column of int64 datatype, not float64.
X = data.drop(columns=["horsepower"])
y = data["horsepower"]

selector = SelectKBest(chi2, k=10)
X_selected = selector.fit_transform(X, y)
print("Selected Features:", X_selected.shape)
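fit_transform returns a plain array, so the names of the chosen columns are lost. A minimal sketch of recovering them with get_support() follows; the toy frame, the column `doors`, and the repeated horsepower values are assumptions for illustration, not values from CarPrice.csv.

```python
import pandas as pd
from sklearn.feature_selection import SelectKBest, chi2

# Hypothetical toy data; chi2 needs non-negative features and a class-like target
cars = pd.DataFrame({
    "enginesize": [90, 92, 150, 200, 95, 198],
    "citympg": [30, 31, 24, 18, 30, 19],
    "doors": [4, 4, 4, 4, 4, 4],  # constant column, carries no information
    "horsepower": [70, 70, 110, 160, 70, 160],
})

# Keep the target out of the feature matrix
X = cars.drop(columns=["horsepower"])
y = cars["horsepower"]

selector = SelectKBest(chi2, k=2)
X_new = selector.fit_transform(X, y)

# get_support() returns a boolean mask over the columns of X
selected = list(X.columns[selector.get_support()])
print("Selected features:", selected)
print("Shape:", X_new.shape)
```

Note that chi2 treats every distinct target value as a separate class, which is why an integer-valued target is expected.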

4,5) Data Visualization:

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Load the data set and inspect it
data = pd.read_csv("CarPrice.csv")
print(data.head())
print(data.tail())
data.info()
print(data.dtypes)
print(data.isnull().sum())
data.dropna(inplace=True)
data.drop_duplicates(inplace=True)

# Distribution of a single numeric column
sns.boxplot(x=data['horsepower'])
plt.show()
sns.countplot(x=data['horsepower'])
plt.show()
sns.histplot(x=data['horsepower'])
plt.show()

# Horsepower against row index
sns.lineplot(x=data.index, y=data['horsepower'])
plt.show()

# Share of each fuel type
x = data['fueltype'].value_counts()
plt.pie(x.values, labels=x.index, autopct='%1.1f%%')
plt.show()

# Mean horsepower per fuel type
sns.barplot(x=data['fueltype'], y=data['horsepower'])
plt.show()
sns.scatterplot(x=data.index, y=data['horsepower'])
plt.show()

# A heatmap needs a correlation matrix, so correlate all numeric columns
# (Series.corr() on a single column requires a second series)
sns.heatmap(data.corr(numeric_only=True), annot=True)
plt.show()
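The pie chart and the heatmap both draw tables that pandas computes first, and looking at those tables directly can help when a plot looks wrong. This is a small sketch on hypothetical values; the `price` column and all numbers are assumptions for illustration.

```python
import pandas as pd

# Hypothetical toy data standing in for CarPrice.csv
cars = pd.DataFrame({
    "fueltype": ["gas", "gas", "diesel", "gas"],
    "horsepower": [100, 120, 80, 140],
    "price": [10000, 12000, 9000, 14000],
})

# Percentages behind the pie chart slices
share = cars["fueltype"].value_counts(normalize=True) * 100
print(share)

# Matrix behind the heatmap; numeric_only skips the text column
corr = cars.corr(numeric_only=True)
print(corr)
```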
