data-analyst-portfolio's Introduction

Data-Analytics-Projects:

This repository mainly contains the projects I completed in the Udacity Data Analysis Nanodegree.

Udacity's online data analyst program is preparing me for a career as a data analyst by teaching me to clean and organize data, uncover patterns and insights, draw meaningful conclusions, and clearly communicate critical findings. I am developing proficiency in Python and its data analysis libraries (NumPy, pandas, Matplotlib) and in SQL as I build a portfolio of projects.

Tip: for data science projects in Python, I recommend installing these basic libraries: NumPy, pandas, SciPy, scikit-learn, Matplotlib, and seaborn.
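A quick way to confirm that these libraries are available in the active environment is a short check script. This is a minimal sketch using only the standard library; `check_libraries` is a hypothetical helper, and it only detects packages (installing them would still go through `conda install` or `pip install`).

```python
import importlib.util

# Import names of the recommended libraries (scikit-learn is imported as "sklearn").
RECOMMENDED = {
    "NumPy": "numpy",
    "pandas": "pandas",
    "SciPy": "scipy",
    "scikit-learn": "sklearn",
    "Matplotlib": "matplotlib",
    "seaborn": "seaborn",
}


def check_libraries(libraries=RECOMMENDED):
    """Return {display name: True/False} depending on whether each module can be found."""
    return {
        name: importlib.util.find_spec(module) is not None
        for name, module in libraries.items()
    }


if __name__ == "__main__":
    for name, installed in check_libraries().items():
        print(f"{name:<13} {'installed' if installed else 'missing'}")
```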

Part 1 - Intro to Data Analysis

Subjects Covered:

  • Anaconda: Learn to use Anaconda to manage packages and environments for use with Python
  • Jupyter Notebook: Learn to use this open-source web application
  • Data Analysis Process
  • NumPy for 1D and 2D Data
  • pandas Series and DataFrames
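The difference between 1D and 2D data, and between NumPy arrays and pandas objects, is easiest to see side by side. A minimal sketch with made-up exam scores (the student names and numbers are illustrative only):

```python
import numpy as np
import pandas as pd

# 1D NumPy array: vectorized arithmetic and statistics without explicit loops.
scores = np.array([70, 85, 90, 65])
print(scores.mean())            # 77.5
print(scores[scores > 80])      # boolean indexing: [85 90]

# 2D NumPy array: rows are students, columns are two exams (illustrative numbers).
exams = np.array([[70, 80],
                  [85, 95],
                  [90, 60]])
col_means = exams.mean(axis=0)  # one mean per column (per exam)
row_means = exams.mean(axis=1)  # one mean per row (per student): 75, 90, 75

# pandas Series: 1D values with an index of labels.
s = pd.Series([70, 85, 90], index=["ana", "ben", "cara"])
print(s["ben"])                 # 85

# pandas DataFrame: 2D labeled data in which each column is a Series.
df = pd.DataFrame(exams, index=["ana", "ben", "cara"], columns=["exam1", "exam2"])
print(df["exam2"].max())        # 95
print(df.loc["cara", "exam1"])  # 90
```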

Project 1: Investigate a Dataset (TMDb Movie Data)

I was provided a dataset reflecting data collected from an experiment and used statistical techniques to answer questions about it, presenting my conclusions and recommendations in a report. In this project, I chose one of Udacity's curated datasets and investigated it using NumPy and pandas, completing the entire data analysis process from posing a question to sharing the findings.
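The question, wrangle, explore, conclude cycle can be sketched in a few lines of pandas. This is a hedged illustration, not the project notebook: the rows are invented, the column names (`release_year`, `budget`, `revenue`) are assumptions, and treating a zero budget or revenue as missing is an assumed cleaning rule.

```python
import pandas as pd

# Hypothetical rows; column names are assumed, not copied from the project notebook.
movies = pd.DataFrame({
    "title": ["A", "B", "C", "D", "E"],
    "release_year": [2010, 2010, 2011, 2011, 2011],
    "budget": [100, 0, 50, 200, 80],
    "revenue": [300, 120, 60, 500, 0],
})

# 1. Pose a question: do bigger budgets go with bigger revenues, and how does revenue vary by year?
# 2. Wrangle: treat a zero budget or revenue as missing (an assumption) and drop those rows.
clean = movies.assign(
    budget=movies["budget"].where(movies["budget"] > 0),
    revenue=movies["revenue"].where(movies["revenue"] > 0),
).dropna(subset=["budget", "revenue"])

# 3. Explore: mean revenue per release year and the budget/revenue correlation.
revenue_by_year = clean.groupby("release_year")["revenue"].mean()
budget_revenue_corr = clean["budget"].corr(clean["revenue"])

# 4. Draw conclusions and communicate them (in the project: a written report with plots).
print(revenue_by_year)
print(f"Pearson r between budget and revenue: {budget_revenue_corr:.2f}")
```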

Project 2: Data Analysis with No-show Medical Appointments Data

Project 3: Data Analysis with NICS and Census Data

Part 2 - Practical Statistics

Subjects Covered:

  • Probability
  • Conditional Probability
  • Binomial Distribution
  • Sampling Distribution and Central Limit Theorem
  • Descriptive Statistics
  • Inferential Statistics
  • Confidence Levels and Intervals
  • Hypothesis Testing
  • T-tests and A/B Tests
  • Regression
  • Multiple Linear Regression
  • Logistic Regression
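Several of these topics (sampling distributions, the Central Limit Theorem, confidence intervals) meet in one small simulation. A minimal sketch of a bootstrap percentile confidence interval for a mean with NumPy, on synthetic data; bootstrapping is one common way to build such an interval, not necessarily the exact method used in the course, and `bootstrap_mean_ci` is a hypothetical helper.

```python
import numpy as np


def bootstrap_mean_ci(sample, n_boot=10_000, confidence=0.95, seed=0):
    """Bootstrap the sampling distribution of the mean and return a percentile CI."""
    rng = np.random.default_rng(seed)
    sample = np.asarray(sample, dtype=float)
    # Resample with replacement n_boot times; each row is one bootstrap sample.
    resamples = rng.choice(sample, size=(n_boot, sample.size), replace=True)
    boot_means = resamples.mean(axis=1)  # the simulated sampling distribution of the mean
    alpha = 1 - confidence
    lower, upper = np.percentile(boot_means, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return boot_means, (lower, upper)


# Synthetic heights, not taken from any course dataset.
heights = np.random.default_rng(42).normal(loc=67, scale=3, size=200)
boot_means, (ci_low, ci_high) = bootstrap_mean_ci(heights)

print(f"sample mean = {heights.mean():.2f}, 95% CI = ({ci_low:.2f}, {ci_high:.2f})")
# CLT check: the spread of bootstrap means should be close to s / sqrt(n).
print(f"std of bootstrap means = {boot_means.std():.3f}, "
      f"s / sqrt(n) = {heights.std(ddof=1) / np.sqrt(heights.size):.3f}")
```

The final print compares the simulated standard error with the Central Limit Theorem's s / sqrt(n); the two should agree closely for a sample of this size.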

Project 4: Analyze A/B Test Results (company data in ab_data.csv)

Using Python, I gathered data from a variety of sources, assessed its quality and tidiness, and cleaned it. I documented the wrangling efforts in a Jupyter Notebook and showcased them through analyses and visualizations using Python and SQL. I then used A/B testing and regression methods to decide whether the company should launch a new webpage or keep the old one.
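The launch decision can be framed as a one-sided two-proportion z-test on conversion rates. A minimal standard-library sketch; the counts are hypothetical (in practice they would come from grouping the A/B data by group and summing a conversion column, whose names are assumed here), and `two_proportion_ztest` is an illustrative helper. statsmodels offers `proportions_ztest` in `statsmodels.stats.proportion` for the same test.

```python
import math


def two_proportion_ztest(conv_new, n_new, conv_old, n_old):
    """One-sided z-test of H0: p_new <= p_old against H1: p_new > p_old."""
    p_new = conv_new / n_new
    p_old = conv_old / n_old
    # Pooled conversion rate under the null hypothesis that both pages convert equally.
    p_pool = (conv_new + conv_old) / (n_new + n_old)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_new + 1 / n_old))
    z = (p_new - p_old) / se
    # Upper-tail p-value from the standard normal CDF, written with math.erf.
    p_value = 1 - 0.5 * (1 + math.erf(z / math.sqrt(2)))
    return z, p_value


# Hypothetical counts, not the results from ab_data.csv.
z, p = two_proportion_ztest(conv_new=1_180, n_new=10_000, conv_old=1_200, n_old=10_000)
print(f"z = {z:.3f}, one-sided p-value = {p:.3f}")
print("launch new page" if p < 0.05 else "keep old page (fail to reject H0)")
```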

Part 3 - Data Extraction and Wrangling

Subjects Covered:

  • GATHERING DATA
    • Gather data from multiple sources, including gathering files, programmatically downloading files, web-scraping data, and accessing data from APIs
    • Import data of various file formats into pandas, including flat files (e.g. TSV), HTML files, TXT files, and JSON files
    • Store gathered data in a PostgreSQL database
  • ASSESSING DATA
    • Assess data visually and programmatically using pandas
    • Distinguish between dirty data (content or “quality” issues) and messy data (structural or “tidiness” issues)
    • Identify data quality issues and categorize them using metrics: validity, accuracy, completeness, consistency, and uniformity
  • CLEANING DATA
    • Identify each step of the data cleaning process (defining, coding, and testing)
    • Clean data using Python and pandas
    • Test cleaning code visually and programmatically using Python
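A compact sketch of the gather, assess, and clean cycle in pandas, with tiny inline data standing in for a downloaded TSV file and an API's JSON response. All data and column names are made up; storing the result in PostgreSQL is left out.

```python
import io
import json

import pandas as pd

# GATHER: a small TSV "file" and a JSON payload, inlined so the sketch is self-contained.
tsv_text = "id\tname\tscore\n1\tAna\t90\n2\tben\t\n2\tben\t\n3\tCara\t75\n"
api_json = '[{"id": 1, "city": "Lima"}, {"id": 2, "city": "Oslo"}, {"id": 3, "city": "Pune"}]'

scores = pd.read_csv(io.StringIO(tsv_text), sep="\t")
cities = pd.DataFrame(json.loads(api_json))

# ASSESS (programmatically): missing values, duplicate rows, inconsistent capitalization.
print(scores.isnull().sum())
print("duplicate rows:", scores.duplicated().sum())
print(scores["name"].unique())

# CLEAN, following define -> code -> test.
# Define: drop duplicate rows, capitalize names consistently, join the two sources on id.
# Code:
clean = (
    scores.drop_duplicates()
    .assign(name=lambda d: d["name"].str.capitalize())
    .merge(cities, on="id", how="left")
)
# Test (the missing score is only flagged here; how to handle it is a separate decision):
assert not clean.duplicated().any()
assert (clean["name"] == clean["name"].str.capitalize()).all()
print(clean)
```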

Project 5: Case Study in Data Wrangling

I divided this project into three parts: gathering data, assessing data, and cleaning data. The dataset I used was very messy, so I first worked out what was wrong with it and found three major problems: missing values, an untidy structure, and data quality issues. I cleaned the data step by step until all of these problems were resolved, leaving a clean dataset. I performed all the cleaning operations in a Jupyter notebook using pandas and NumPy.
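The three problem types (missing values, untidy structure, quality issues) can each be shown on a toy table. This is a hedged sketch, not the project's actual dataset: the wide year-per-column layout is a hypothetical example of untidy data, and dropping missing rows is just one possible choice.

```python
import pandas as pd

# Untidy (wide) layout: one column per year, so the variable "year" lives in the headers.
wide = pd.DataFrame({
    "country": ["A", "B"],
    "2019": [10.0, None],
    "2020": [12.0, 8.0],
})

# Tidiness: melt so that each row is one observation (country, year, value).
tidy = wide.melt(id_vars="country", var_name="year", value_name="value")
tidy["year"] = tidy["year"].astype(int)  # quality: the years were strings taken from headers

# Missing values: inspect first, then decide how to handle them (here: drop them).
print(tidy["value"].isnull().sum())      # 1
tidy = tidy.dropna(subset=["value"]).reset_index(drop=True)
print(tidy)
```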

Project 6: Wrangle and Analyze WeRateDogs Tweet Data

I collected data from different sources, assessed it visually and programmatically, and cleaned it so that it could later be visualized and analyzed for insights.
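One typical programmatic cleaning step for tweet data is pulling a numeric field out of free text. As a hedged illustration, assuming each tweet states a rating in the form `13/10` inside a `text` column (the column name and sample tweets are hypothetical):

```python
import pandas as pd

# Hypothetical tweet texts; the real archive and its column names may differ.
tweets = pd.DataFrame({
    "text": [
        "This is Bella. She hopes her smile made you smile. 13/10",
        "Meet Max. Incredibly good boy 12/10 would pet",
        "Just a photo of the beach. No dog here",
    ]
})

# Extract "numerator/denominator" with a regular expression; the optional
# fractional part also accepts decimal numerators such as 9.75/10.
ratings = tweets["text"].str.extract(r"(\d+(?:\.\d+)?)/(\d+)")
ratings.columns = ["rating_numerator", "rating_denominator"]
tweets = tweets.join(ratings.astype(float))  # texts without a match get NaN

print(tweets[["rating_numerator", "rating_denominator"]])
```

The regex takes the first fraction it finds, so texts that contain other fractions (dates, "24/7") would still need manual review.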

Part 4 - Data Visualization

Subjects Covered:

  • Univariate exploration of data (histograms, bar charts, axis limits, and different scales)
  • Bivariate exploration of data (scatter plots, clustered bar charts, violin and box plots, faceting)
  • Multivariate exploration of data (encodings, plot matrices, feature engineering)
  • Explanatory visualizations (storytelling with data, polishing plots, creating a slide deck)
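The univariate topic of axis scales is easiest to see on skewed data: on a linear axis most values bunch up at the left, while log-spaced bins on a log axis spread them out. A minimal Matplotlib sketch on synthetic data; `log_histogram` is a hypothetical helper and assumes strictly positive values.

```python
import matplotlib

matplotlib.use("Agg")  # render off-screen so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np


def log_histogram(values, n_bins=20):
    """Histogram of strictly positive, right-skewed data with log-spaced bins on a log x-axis."""
    values = np.asarray(values, dtype=float)
    # Geometrically spaced bin edges look evenly sized once the axis is log-scaled.
    bins = np.geomspace(values.min(), values.max(), n_bins + 1)
    fig, ax = plt.subplots()
    ax.hist(values, bins=bins)
    ax.set_xscale("log")
    ax.set_xlabel("value (log scale)")
    ax.set_ylabel("count")
    return fig, ax


# Synthetic right-skewed data (for example, prices); not taken from any project dataset.
prices = np.random.default_rng(0).lognormal(mean=7, sigma=1, size=1_000)
fig, ax = log_histogram(prices)
```

Calling `fig.savefig(...)` writes the chart to an image file, for example for a slide deck.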

Project 7: Data Visualization with Diamond Data

Applying data visualization techniques to a dataset of diamond characteristics and prices.

Project 8: Communicate Data Findings with Ford GoBike Sharing Data

In this project, I used Python’s data visualization tools to systematically explore the bike dataset for its properties and relationships between variables. Then, I created a presentation that communicates the findings to others.
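A sketch of this kind of exploration, assuming trip records with `duration_sec`, `start_time`, and `user_type` columns (assumed names with hypothetical values, not the actual dataset): derive new features, then summarize how trip length relates to user type and weekday.

```python
import pandas as pd

# Hypothetical trip records; column names are assumed for illustration.
trips = pd.DataFrame({
    "duration_sec": [300, 1200, 450, 2400, 600, 900],
    "start_time": pd.to_datetime([
        "2019-02-01 08:05", "2019-02-02 14:30", "2019-02-04 17:45",
        "2019-02-03 11:00", "2019-02-05 08:20", "2019-02-09 16:10",
    ]),
    "user_type": ["Subscriber", "Customer", "Subscriber", "Customer", "Subscriber", "Customer"],
})

# Feature engineering: trip length in minutes and the weekday of the start time.
trips["duration_min"] = trips["duration_sec"] / 60
trips["weekday"] = trips["start_time"].dt.day_name()

# Bivariate summary: median trip length per user type.
median_by_type = trips.groupby("user_type")["duration_min"].median()
print(median_by_type)

# Trip counts per weekday and user type, ready for a clustered bar chart.
counts = pd.crosstab(trips["weekday"], trips["user_type"])
print(counts)
```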

Project 9: Data Visualization with Prosper Loan Data

Contributors

mkumar7
