EX-05-Feature-Generation

AIM

To read the given data, perform the feature generation process, and save the data to a file.

Explanation

Feature generation (also known as feature construction, feature extraction or feature engineering) is the process of transforming existing features into new features that relate more closely to the target.
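
As a small illustration of the idea, the sketch below derives one new feature from two existing ones. The toy rows and the column names SibSp, Parch and FamilySize are assumptions borrowed from the public Titanic data set; they are not used in the code of this experiment.

```python
import pandas as pd

# Toy rows; SibSp and Parch are assumed column names (siblings/spouses and
# parents/children aboard) and are not taken from the experiment's code.
passengers = pd.DataFrame({
    "SibSp": [1, 0, 3],
    "Parch": [0, 0, 2],
})

# Generate a new feature by combining two existing ones.
passengers["FamilySize"] = passengers["SibSp"] + passengers["Parch"] + 1
print(passengers)
```

The new column carries information that neither input column expresses on its own, which is the point of generating features.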

ALGORITHM

STEP 1

Read the given Data

STEP 2

Clean the data set using the data cleaning process

STEP 3

Apply feature generation techniques to all the features of the data set

STEP 4

Save the data to the file
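
The four steps can be sketched end to end as follows. This is a minimal sketch under stated assumptions: the inline CSV, the city names, the weather_rank mapping and the output file name features.csv are all hypothetical stand-ins, and the file is written to a temporary directory.

```python
import io
import os
import tempfile

import pandas as pd

# Hypothetical inline CSV standing in for the real input file.
raw_csv = io.StringIO(
    "City,Ord_1\n"
    "Delhi,Hot\n"
    "Chennai,\n"
    "Delhi,Cold\n"
    "Mumbai,Hot\n"
)

# Step 1: read the data.
frame = pd.read_csv(raw_csv)

# Step 2: clean it by filling the missing category with the most frequent one.
frame["Ord_1"] = frame["Ord_1"].fillna(frame["Ord_1"].mode()[0])

# Step 3: generate an ordered numeric feature from the text category.
weather_rank = {"Cold": 0, "Warm": 1, "Hot": 2, "Very Hot": 3}
frame["weather"] = frame["Ord_1"].map(weather_rank)

# Step 4: save the result to a file (assumed name, temporary location).
out_path = os.path.join(tempfile.mkdtemp(), "features.csv")
frame.to_csv(out_path, index=False)
print(pd.read_csv(out_path))
```

Passing index=False keeps the row index out of the saved file, so reading it back yields the same columns.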

CODE

DATA.CSV

import pandas as pd
df=pd.read_csv('/content/data.csv')
df

from sklearn.preprocessing import OrdinalEncoder
# Education levels listed from lowest to highest so the codes preserve their order
education=['High School','Diploma','Bachelors','Masters','PhD']
enc=OrdinalEncoder(categories=[education])
enc.fit_transform(df[['Ord_2']])

# Store the codes in a new column and drop the original text column
df['education']=enc.fit_transform(df[['Ord_2']])
df=df.drop('Ord_2',axis=1)
df

# Weather categories listed from coldest to hottest
weather=['Cold','Warm','Hot','Very Hot']
enc1=OrdinalEncoder(categories=[weather])
enc1.fit_transform(df[['Ord_1']])

df['weather']=enc1.fit_transform(df[['Ord_1']])
df
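
To illustrate what the ordinal encoding above produces, the following minimal sketch runs OrdinalEncoder on a made-up column. The toy values are assumptions, not rows from data.csv; each category is replaced by its position in the supplied list.

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

# Made-up sample values; only the category names come from the code above.
toy = pd.DataFrame({"Ord_1": ["Hot", "Cold", "Very Hot", "Warm"]})

# The list order defines the codes: Cold=0, Warm=1, Hot=2, Very Hot=3.
order = ["Cold", "Warm", "Hot", "Very Hot"]
encoder = OrdinalEncoder(categories=[order])
toy["weather"] = encoder.fit_transform(toy[["Ord_1"]]).ravel()
print(toy)
```

Without the explicit list, the encoder sorts the categories alphabetically, which would put Hot before Warm.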

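from sklearn.preprocessing import OneHotEncoder
# sparse_output replaces the older sparse argument in scikit-learn 1.2 and later
ohe=OneHotEncoder(sparse_output=False)
# Name the one-hot columns after the City categories instead of 0, 1, 2, ...
enc=pd.DataFrame(ohe.fit_transform(df[['City']]),columns=ohe.get_feature_names_out(),index=df.index)
df=pd.concat([df,enc],axis=1)
df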

from category_encoders import BinaryEncoder
be=BinaryEncoder()
# Binary-encode bin_1 and bin_2, then append both results to df
bin1_data=be.fit_transform(df[['bin_1']])
bin2_data=be.fit_transform(df[['bin_2']])
df=pd.concat([df,bin1_data,bin2_data],axis=1)
df


from sklearn.preprocessing import StandardScaler
df1=df.copy()
# scikit-learn rejects a mix of string and non-string column names
df1.columns = df1.columns.astype(str)

# Keep only the numeric columns by dropping those that still hold strings
df1 = df1.select_dtypes(exclude=['object'])

# Standardize each column to zero mean and unit variance
sc = StandardScaler()
df1_scaled = pd.DataFrame(sc.fit_transform(df1), columns=df1.columns)

df1_scaled

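# Rescale each column to the range [0, 1]
from sklearn.preprocessing import MinMaxScaler
sc=MinMaxScaler()
df2=pd.DataFrame(sc.fit_transform(df1),columns=df1.columns)
df2

# Divide each column by its largest absolute value
from sklearn.preprocessing import MaxAbsScaler
sc1=MaxAbsScaler()
df3=pd.DataFrame(sc1.fit_transform(df1),columns=df1.columns)
df3

# Center on the median and scale by the interquartile range
from sklearn.preprocessing import RobustScaler
sc2=RobustScaler()
df4=pd.DataFrame(sc2.fit_transform(df1),columns=df1.columns)
df4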
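
The scalers used in this experiment differ mainly in how they react to outliers. The following sketch applies four scikit-learn scalers to one made-up column containing a single large value; the numbers are assumptions chosen for illustration only.

```python
import numpy as np
from sklearn.preprocessing import (
    MaxAbsScaler,
    MinMaxScaler,
    RobustScaler,
    StandardScaler,
)

# One made-up feature column with an outlier (100).
values = np.array([[1.0], [2.0], [3.0], [4.0], [100.0]])

scaled = {
    "standard": StandardScaler().fit_transform(values).ravel(),  # (x - mean) / std
    "minmax": MinMaxScaler().fit_transform(values).ravel(),      # (x - min) / (max - min)
    "maxabs": MaxAbsScaler().fit_transform(values).ravel(),      # x / max(|x|)
    "robust": RobustScaler().fit_transform(values).ravel(),      # (x - median) / IQR
}

for name, column in scaled.items():
    print(f"{name:>8}: {np.round(column, 3)}")
```

With MinMaxScaler and MaxAbsScaler the outlier squeezes the other four values close to zero, while RobustScaler keeps them spread out because the median and interquartile range ignore the extreme value.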

OUTPUT:

Screenshots of the notebook output for data.csv (captured 2023-04-25).

ENCODING.CSV

import pandas as pd
df=pd.read_csv('/content/Encoding Data.csv')
df

from category_encoders import BinaryEncoder
be=BinaryEncoder()
data=be.fit_transform(df["bin_1"])
df=pd.concat([df,data],axis=1)
df

# Binary-encode bin_2 and append the result
new_data = be.fit_transform(df["bin_2"])
df=pd.concat([df,new_data],axis=1)
df

df1=df.copy()
from sklearn.preprocessing import OrdinalEncoder

# nom_0 is nominal, so the encoder's default (sorted) category order is used
oe=OrdinalEncoder()
df1["nom_0"] = oe.fit_transform(df1[["nom_0"]])

# ord_2 is ordinal, so its categories are listed from coldest to hottest
temp=['Cold','Warm','Hot']
oe2=OrdinalEncoder(categories=[temp])
df1['ord_2'] = oe2.fit_transform(df1[['ord_2']])

df1

from sklearn.preprocessing import StandardScaler
# Continue with the encoded df1; scikit-learn rejects a mix of string and non-string column names
df1.columns = df1.columns.astype(str)

# Keep only the numeric columns by dropping those that still hold strings
df1 = df1.select_dtypes(exclude=['object'])

# Standardize each column to zero mean and unit variance
sc = StandardScaler()
df1_scaled = pd.DataFrame(sc.fit_transform(df1), columns=df1.columns)
df1_scaled

#feature scaling
from sklearn.preprocessing import MinMaxScaler
sc=MinMaxScaler()
df0=pd.DataFrame(sc.fit_transform(df1),columns=df1.columns)
df0

from sklearn.preprocessing import MaxAbsScaler
sc2=MaxAbsScaler()
df3=pd.DataFrame(sc2.fit_transform(df1),columns=df1.columns)
df3

from sklearn.preprocessing import RobustScaler
sc3=RobustScaler()
df4=pd.DataFrame(sc3.fit_transform(df1),columns=df1.columns)
df4
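
The idea behind binary encoding can be shown without the library: give each category an integer code, then spread the binary digits of that code across columns. The helper binary_encode below is a hand-written illustration, not the category_encoders implementation, and the sample values "Y" and "N" are assumptions; the exact codes and column layout the library produces may differ.

```python
import pandas as pd


def binary_encode(series: pd.Series) -> pd.DataFrame:
    """Illustrative helper: encode categories as the bits of an integer code."""
    # Assign codes 1..k in order of first appearance.
    codes = {cat: i for i, cat in enumerate(pd.unique(series), start=1)}
    # Number of binary digits needed for the largest code.
    width = max(codes.values()).bit_length()
    rows = [
        [(codes[value] >> shift) & 1 for shift in range(width - 1, -1, -1)]
        for value in series
    ]
    columns = [f"{series.name}_{i}" for i in range(width)]
    return pd.DataFrame(rows, columns=columns, index=series.index)


# Made-up sample column.
sample = pd.Series(["Y", "N", "Y", "N"], name="bin_1")
encoded = binary_encode(sample)
print(encoded)
```

Because k categories need only about log2(k) columns, this representation stays much narrower than one-hot encoding for columns with many distinct values.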

OUTPUT:

Screenshots of the notebook output for Encoding Data.csv (captured 2023-04-25).

TITANIC_DATASET.CSV

CODE:

import pandas as pd
df=pd.read_csv("titanic_dataset.csv")
df

#removing unwanted data
df=df.drop(columns=["Name","Ticket","Cabin"])

#data cleaning
df.isnull().sum()

df["Age"]=df["Age"].fillna(df["Age"].median())
df["Embarked"]=df["Embarked"].fillna(df["Embarked"].mode()[0])

df.isnull().sum()

df


#feature encoding
from category_encoders import BinaryEncoder
be=BinaryEncoder()
data=be.fit_transform(df[["Sex"]])
df=pd.concat([df,data],axis=1)
df

df1=df.copy()
from sklearn.preprocessing import LabelEncoder, OrdinalEncoder
embark=['S','C','Q']
e1=OrdinalEncoder(categories=[embark])
df1['Embarked'] = e1.fit_transform(df[['Embarked']])
df1

from sklearn.preprocessing import StandardScaler
# Continue with the encoded df1; scikit-learn rejects a mix of string and non-string column names
df1.columns = df1.columns.astype(str)

# Keep only the numeric columns by dropping those that still hold strings
df1 = df1.select_dtypes(exclude=['object'])

# Standardize each column to zero mean and unit variance
sc = StandardScaler()
df1_scaled = pd.DataFrame(sc.fit_transform(df1), columns=df1.columns)
df1_scaled

#feature scaling
from sklearn.preprocessing import MinMaxScaler
sc=MinMaxScaler()
df2=pd.DataFrame(sc.fit_transform(df1),columns=df1.columns)
df2

from sklearn.preprocessing import StandardScaler
sc1=StandardScaler()
df3=pd.DataFrame(sc1.fit_transform(df1),columns=df1.columns)
df3

from sklearn.preprocessing import MaxAbsScaler
sc2=MaxAbsScaler()
df4=pd.DataFrame(sc2.fit_transform(df1),columns=df1.columns)
df4

from sklearn.preprocessing import RobustScaler
sc3=RobustScaler()
df5=pd.DataFrame(sc3.fit_transform(df1),columns=df1.columns)
df5
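
Embarked names a port and has no natural order, so an ordinal code such as S=0, C=1, Q=2 imposes a ranking that a model may read as meaningful. One alternative is one-hot encoding, sketched below with pandas.get_dummies on made-up rows. The toy values are assumptions and this is not the encoding used in the experiment above.

```python
import pandas as pd

# Made-up rows; Age and Embarked are column names used in the code above.
toy = pd.DataFrame({
    "Age": [22.0, 38.0, 26.0],
    "Embarked": ["S", "C", "Q"],
})

# One indicator column per port, so no order is implied between ports.
one_hot = pd.get_dummies(toy, columns=["Embarked"], dtype=int)
print(one_hot)
```

Each row has exactly one indicator set to 1, and the untouched Age column is carried over unchanged.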

OUTPUT:

Screenshots of the notebook output for titanic_dataset.csv (captured 2023-04-25).

RESULT:

Thus, the experiment was executed successfully.

Contributors

harish-ragavendra-25, karthi-govindharaju
