
Data Mining Projects

This repository contains multiple projects covering several important topics in Data Mining.

About

Under the supervision of Prof. Ehsan Nazerfard

Spring 2023

1. Data Preprocessing

The objective of this project is to apply a variety of preprocessing techniques to show why understanding, cleaning, and refining a raw dataset matters. The topics covered are:

  1. Managing NaN values
  2. Processing non-numeric data through Label Encoding and One Hot Encoding
  3. Implementing Data Augmentation
  4. Utilizing Upsampling and Downsampling methods
  5. Applying the SMOTETomek and SMOTEENN approaches
  6. Normalizing the data
  7. Conducting Principal Component Analysis (PCA)
  8. Creating plots and visualizations
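The first two steps (NaN handling and encoding non-numeric data) can be sketched with NumPy alone. This is a minimal illustration, not the project's own code. The helper names (`impute_mean`, `label_encode`, `one_hot`) and the toy values are hypothetical. In practice, pandas (`DataFrame.fillna`, `pd.get_dummies`) and scikit-learn (`LabelEncoder`, `OneHotEncoder`) provide the same operations.

```python
import numpy as np

# Hypothetical toy values shaped like penguin records (NOT taken from the real dataset).
body_mass = np.array([3750.0, np.nan, 3250.0, 4500.0, np.nan])
island = np.array(["Torgersen", "Biscoe", "Dream", "Biscoe", "Dream"])


def impute_mean(values: np.ndarray) -> np.ndarray:
    """Return a copy of `values` with NaNs replaced by the mean of the observed entries."""
    filled = values.copy()
    filled[np.isnan(filled)] = np.nanmean(values)
    return filled


def label_encode(categories: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Map each category to an integer code; classes come back sorted alphabetically."""
    classes, codes = np.unique(categories, return_inverse=True)
    return classes, codes


def one_hot(codes: np.ndarray, n_classes: int) -> np.ndarray:
    """One binary column per category: row i has a 1 in column codes[i]."""
    return np.eye(n_classes, dtype=int)[codes]


mass_filled = impute_mean(body_mass)
classes, codes = label_encode(island)
encoded = one_hot(codes, len(classes))
print(mass_filled)
print(classes, codes)
print(encoded)
```

Mean imputation is only one option; dropping incomplete rows or using the median are common alternatives. Label encoding imposes an artificial order on the categories (here Biscoe < Dream < Torgersen), which can mislead distance-based models. One-hot encoding avoids that order at the cost of one column per category.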

Libraries: Scikit-learn, Pandas, Imbalanced-learn, Matplotlib
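Imbalanced-learn provides SMOTETomek and SMOTEENN as `imblearn.combine.SMOTETomek` and `imblearn.combine.SMOTEENN`, both used through `fit_resample`. Each combines SMOTE oversampling with a cleaning step, using Tomek links or Edited Nearest Neighbours respectively. The plainer upsampling and downsampling step can be sketched as random resampling in NumPy. The function `random_resample` and the toy arrays below are hypothetical. Imbalanced-learn's `RandomOverSampler` and `RandomUnderSampler` are the library counterparts.

```python
import numpy as np


def random_resample(X, y, strategy="up", seed=0):
    """Balance classes by random duplication ("up") or random removal ("down").

    Upsampling draws extra rows of each smaller class with replacement until it
    matches the largest class; downsampling keeps a random subset (without
    replacement) of each class so it matches the smallest class.
    """
    if strategy not in ("up", "down"):
        raise ValueError("strategy must be 'up' or 'down'")
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max() if strategy == "up" else counts.min()
    keep = []
    for c in classes:
        idx = np.flatnonzero(y == c)
        if strategy == "up":
            extra = rng.choice(idx, size=target - len(idx), replace=True)
            keep.append(np.concatenate([idx, extra]))
        else:
            keep.append(rng.choice(idx, size=target, replace=False))
    keep = np.concatenate(keep)
    return X[keep], y[keep]  # rows stay paired with their labels


# Hypothetical imbalanced data: 6 samples of class 0, 2 of class 1.
X = np.arange(16, dtype=float).reshape(8, 2)
y = np.array([0, 0, 0, 0, 0, 0, 1, 1])
X_up, y_up = random_resample(X, y, "up")
X_down, y_down = random_resample(X, y, "down")
print(np.bincount(y_up), np.bincount(y_down))  # class counts after balancing
```

Random upsampling only repeats existing rows. SMOTE-based methods instead interpolate new synthetic minority samples, which is why the project reaches for SMOTETomek and SMOTEENN.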

About the Dataset: The dataset used in this project is Palmer Penguins. It was collected to distinguish three penguin species (Adelie, Gentoo, and Chinstrap), and it records 7 features for each penguin.
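Normalization and PCA can be sketched as z-score standardization followed by projection onto the top right-singular vectors of the centered data. The matrix below is random and only stands in for the numeric penguin measurements. `standardize` and `pca` are hypothetical helper names. Scikit-learn's `StandardScaler` and `PCA` (with its `explained_variance_ratio_` attribute) are the library counterparts.

```python
import numpy as np


def standardize(X):
    """Z-score normalization: zero mean and unit (population) variance per column."""
    return (X - X.mean(axis=0)) / X.std(axis=0)


def pca(X, n_components=2):
    """Project X onto its top principal components using the SVD of the centered data."""
    Xc = X - X.mean(axis=0)                       # PCA requires centered data
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_components]                # principal axes, one per row
    explained = S**2 / np.sum(S**2)               # explained variance ratio per axis
    return Xc @ components.T, components, explained[:n_components]


rng = np.random.default_rng(42)
# Hypothetical stand-in for 4 correlated numeric features (random, not real penguin data).
X = rng.normal(size=(100, 4)) @ rng.normal(size=(4, 4))
Z = standardize(X)
scores, components, ratio = pca(Z, n_components=2)
print(scores.shape, ratio)
```

Standardizing first matters because PCA follows variance. Without it, a feature measured in grams (body mass) would dominate features measured in millimetres.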

2. Regression and Classification

The objective of this project is to apply various machine learning techniques to housing price data, illustrating how both regression and classification methods are used and how they perform:

  1. Q-box analysis
  2. Comparison between Linear Regression and Polynomial Regression
  3. Calculation of Mean Squared Error
  4. Classification methods such as Decision Trees, Random Forests, K-Nearest Neighbors (KNN), Linear and Non-Linear Support Vector Machines (SVM)
  5. Multi-class classification employing Deep Learning techniques
  6. Utilization of a Confusion Matrix
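The comparison between linear and polynomial regression, scored by mean squared error, MSE = (1/n) Σ (y_i − ŷ_i)², can be sketched with NumPy's least-squares `polyfit`. The data is synthetic with a built-in quadratic trend and is not houseprice.csv. `mse` and `fit_poly` are hypothetical helpers. Scikit-learn's `PolynomialFeatures`, `LinearRegression`, and `mean_squared_error` are the library equivalents.

```python
import numpy as np


def mse(y_true, y_pred):
    """Mean squared error: the average of the squared residuals."""
    return float(np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2))


def fit_poly(x, y, degree):
    """Least-squares polynomial fit; degree=1 is ordinary linear regression."""
    return np.poly1d(np.polyfit(x, y, deg=degree))


rng = np.random.default_rng(0)
# Hypothetical 1-D data with a quadratic trend (think area vs. price); not the real dataset.
x = np.linspace(0, 10, 50)
y = 2.0 + 0.5 * x + 0.3 * x**2 + rng.normal(scale=1.0, size=x.size)

linear = fit_poly(x, y, degree=1)
quadratic = fit_poly(x, y, degree=2)
print("linear MSE:   ", mse(y, linear(x)))
print("quadratic MSE:", mse(y, quadratic(x)))
```

MSE computed on the training data can never favor the lower-degree model, because the polynomial model contains the linear one. A fair comparison therefore scores both models on a held-out test split.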

Libraries: Scikit-learn, TensorFlow, Pandas, NumPy, Matplotlib

About the Dataset: The dataset used in this project is a house price prediction dataset (houseprice.csv). For each house it records the area, the number of rooms, whether it has parking, storage, and an elevator, the address, and the corresponding price.
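Of the listed classifiers, K-Nearest Neighbors is short enough to sketch from scratch. The sketch also builds the confusion matrix used to evaluate it. The points and three-class labels are hypothetical (think of them as price bands). `knn_predict` and `confusion_matrix` are illustrative names. Scikit-learn's `KNeighborsClassifier` and `sklearn.metrics.confusion_matrix` are the library versions.

```python
import numpy as np


def knn_predict(X_train, y_train, X_test, k=3):
    """Label each test row by majority vote among its k nearest training rows (Euclidean).

    Vote ties go to the smallest class index (np.bincount(...).argmax()).
    """
    d = np.linalg.norm(X_test[:, None, :] - X_train[None, :, :], axis=2)
    nearest = np.argsort(d, axis=1)[:, :k]
    return np.array([np.bincount(v).argmax() for v in y_train[nearest]])


def confusion_matrix(y_true, y_pred, n_classes):
    """cm[i, j] counts samples whose true class is i and predicted class is j."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    np.add.at(cm, (y_true, y_pred), 1)
    return cm


# Hypothetical 2-D features with three classes (e.g. low / mid / high price band).
X_train = np.array([[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5],
                    [10, 0], [10, 1], [11, 0]], dtype=float)
y_train = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2])
X_test = np.array([[0.5, 0.5], [5.5, 5.5], [10.5, 0.5], [2.0, 2.0]])
y_test = np.array([0, 1, 2, 1])

y_pred = knn_predict(X_train, y_train, X_test, k=3)
cm = confusion_matrix(y_test, y_pred, n_classes=3)
print(y_pred)
print(cm)
print("accuracy:", np.trace(cm) / cm.sum())
```

The last test point is labelled class 1 but sits nearer the class-0 training points, so it is counted in row 1, column 0. Off-diagonal cells like this show which classes the model confuses, which a single accuracy number hides.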

3. K-means Algorithm

The goal of this project is to build an understanding of clustering through hands-on exploration and to implement clustering in Python.

  1. Generation of a similarity matrix using cosine similarity and Euclidean distance
  2. Implementation of the K-means algorithm
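The two pairwise measures can be sketched with NumPy broadcasting. The three points are hypothetical and were chosen to show how the measures can disagree. The function names are illustrative. `sklearn.metrics.pairwise.cosine_similarity` and `sklearn.metrics.pairwise.euclidean_distances` compute the same matrices.

```python
import numpy as np


def cosine_similarity_matrix(X):
    """S[i, j] = (x_i . x_j) / (||x_i|| ||x_j||); 1 means the rows point the same way."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    Xn = X / norms                     # assumes no all-zero rows
    return Xn @ Xn.T


def euclidean_distance_matrix(X):
    """D[i, j] = ||x_i - x_j||, computed for all pairs at once via broadcasting."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=2))


X = np.array([[1.0, 0.0], [2.0, 0.0], [0.0, 3.0]])
S = cosine_similarity_matrix(X)
D = euclidean_distance_matrix(X)
print(np.round(S, 3))
print(np.round(D, 3))
```

Rows 0 and 1 point in the same direction, so their cosine similarity is 1 even though they are a Euclidean distance of 1 apart. The Euclidean matrix measures distance (0 means identical), so it must be converted before it can be used as a similarity, for example with 1 / (1 + D).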

Simulated Result: [figure panels C1, C2, C3, C4]

Libraries: Scikit-learn, Matplotlib, NumPy

4. Final Project: Persian Spotify

This project uses a Persian music dataset to make several kinds of predictions:

  1. Data analysis and review, including Exploratory Data Analysis (EDA) and PCA visualization.
  2. Application of regression to predict music popularity.
  3. Classification of music into traditional and non-traditional categories.
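A common first EDA check before regressing on popularity is to rank features by their Pearson correlation with the target. This section does not name the 32 features, so the columns below are synthetic and named `feature_a` to `feature_c`. `rank_by_correlation` is a hypothetical helper. With pandas and Seaborn, `DataFrame.corr()` combined with `seaborn.heatmap` shows the same information visually.

```python
import numpy as np


def rank_by_correlation(X, y, names):
    """Pearson correlation of each column of X with y, sorted by absolute strength."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    r = (Xc * yc[:, None]).sum(axis=0) / (
        np.sqrt((Xc ** 2).sum(axis=0)) * np.sqrt((yc ** 2).sum())
    )
    order = np.argsort(-np.abs(r))
    return [(names[i], float(r[i])) for i in order]


rng = np.random.default_rng(7)
n = 500
names = ["feature_a", "feature_b", "feature_c"]   # hypothetical feature names
X = rng.normal(size=(n, 3))
# Synthetic target: driven by feature_a (positively) and feature_c (negatively).
popularity = 3.0 * X[:, 0] - 1.0 * X[:, 2] + rng.normal(scale=0.5, size=n)

for name, r in rank_by_correlation(X, popularity, names):
    print(f"{name}: {r:+.3f}")
ranking = rank_by_correlation(X, popularity, names)
```

Correlation only captures linear, one-feature-at-a-time relationships. It is a screening step before fitting the regression model, not a replacement for it.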

Libraries: Scikit-learn, Matplotlib, NumPy, Seaborn

About the Dataset: This dataset contains 10,632 songs from 69 Iranian artists, each described by 32 features.
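Separating traditional from non-traditional music is a binary classification task. This section does not say which classifier the project uses, so this sketch swaps in plain logistic regression trained by batch gradient descent. It runs on two synthetic Gaussian clusters that stand in for song features, and encoding "traditional" as label 1 is an arbitrary choice. `fit_logistic` and `predict` are hypothetical names. Scikit-learn's `LogisticRegression` is the production-grade counterpart.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def fit_logistic(X, y, lr=0.1, n_iter=2000):
    """Binary logistic regression by batch gradient descent on the mean log-loss."""
    Xb = np.hstack([np.ones((len(X), 1)), X])    # prepend a bias column
    w = np.zeros(Xb.shape[1])
    for _ in range(n_iter):
        p = sigmoid(Xb @ w)
        w -= lr * Xb.T @ (p - y) / len(y)          # gradient of the mean log-loss
    return w


def predict(X, w, threshold=0.5):
    """Return 1 where the predicted probability reaches the threshold, else 0."""
    Xb = np.hstack([np.ones((len(X), 1)), X])
    return (sigmoid(Xb @ w) >= threshold).astype(int)


rng = np.random.default_rng(3)
n = 200
# Synthetic stand-in: class 0 ("non-traditional") around (-2, -2), class 1 around (2, 2).
X = np.vstack([rng.normal(-2, 1, size=(n, 2)), rng.normal(2, 1, size=(n, 2))])
y = np.concatenate([np.zeros(n), np.ones(n)])

idx = rng.permutation(len(y))                      # shuffled 80/20 train/test split
train, test = idx[:320], idx[320:]
w = fit_logistic(X[train], y[train])
acc = float((predict(X[test], w) == y[test]).mean())
print("test accuracy:", acc)
```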

Contributors: amirbehnam1009
