Git Product home page Git Product logo

nashville-housing-data-cleaning's Introduction

Nashville Housing Data Cleaning Project

Overview

This project involves cleaning and transforming data from the Nashville Housing dataset using SQL queries. The primary tasks include exploring the data, changing date formats, populating missing property addresses, breaking down address columns, standardizing 'SoldAsVacant' values, removing duplicates, and deleting unused columns.

SQL Queries

Exploring Data

  • Retrieve all records from the NashvilleHousing table.
SELECT * 
FROM NashvilleHousing;

ุฏ

Changing SaleDate Format

Alter the table to add a new column (SaleDateConverted) with a standardized date format. Update the new column with converted dates.

ALTER TABLE NashvilleHousing
ADD SaleDateConverted DATE;

UPDATE NashvilleHousing
SET SaleDateConverted = CONVERT(date, SaleDate);

Populating Property Address Data

Populate missing property addresses by updating records based on ParcelID.

-- Populating Property Address Data
-- (code snippet for SELECT statement not shown here)

-- Update records with missing property addresses
UPDATE N1
SET PropertyAddress = ISNULL(N1.PropertyAddress, N2.PropertyAddress)
FROM NashvilleHousing N1
JOIN NashvilleHousing N2
	ON N1.ParcelID = N2.ParcelID
	AND N1.UniqueID <> N2.UniqueID
WHERE N1.PropertyAddress IS NULL;

Breaking Property Address Column

Break down the PropertyAddress column into two individual columns: Address and City.

-- Breaking Property Address Column
-- (code snippet for SELECT statement not shown here)

-- Add new columns
ALTER TABLE NashvilleHousing
ADD Address NVARCHAR(255);

-- Update the new columns
UPDATE NashvilleHousing
SET Address = SUBSTRING( PropertyAddress, 1, CHARINDEX(',', PropertyAddress)-1);

ALTER TABLE NashvilleHousing
ADD City NVARCHAR(255);

UPDATE NashvilleHousing
SET City = SUBSTRING(PropertyAddress, CHARINDEX(',', PropertyAddress)+1, LEN(PropertyAddress));

Changing 'Y' and 'N' to 'Yes' and 'No' in SoldAsVacant Column

Standardize values in the SoldAsVacant column.

-- Changing 'Y' and 'N' to 'Yes' and 'No' in SoldAsVacant Column
-- (code snippet for SELECT statement not shown here)

-- Update the column
UPDATE NashvilleHousing
SET SoldAsVacant = CASE WHEN SoldAsVacant = 'Y' THEN 'Yes'
	   WHEN SoldAsVacant = 'N' THEN 'No'
	   ELSE SoldAsVacant
	   END;

Removing Duplicates

Remove duplicate records based on specific columns.

-- Removing Duplicates
-- (code snippet for WITH statement not shown here)

-- Delete duplicates
WITH RowNumCte AS (
    -- ...
)
DELETE 
FROM RowNumCte
WHERE row_num > 1;

Deleting Unused Columns

Remove columns that are no longer needed.

-- Deleting Unused Columns
ALTER TABLE NashvilleHousing
DROP COLUMN OwnerAddress, TaxDistrict, PropertyAddress;
### Checking the Result
Verify that the cleaning process was successful.

Happy Cleaning!

nashville-housing-data-cleaning's People

Contributors

yaseentheanalyst avatar

Stargazers

Yousef Ahmed avatar

Watchers

 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.