Abhiram Muktineni's Projects
Taking on this challenge for fun and daily practice, and to unlock the secret of persistence and consistency.
This data set contains attributes of animal bite incidents reported to Louisville Metro Department of Public Health and Wellness, investigated and analyzed by Commonwealth of Kentucky's CHFS Office of Applications and Technology Services (OATS). Personal/identifying data has been removed.
The Colab Notebook in this repository does a great job explaining why you would use Spark and PySpark, walking you through the technical setup, illustrating how to talk to the DataFrame API, and distinguishing these DataFrames from RDDs (Resilient Distributed Datasets, essentially big lists stored in different locations), a cornerstone of the PySpark toolkit. (Credits: LinkedIn Learning - Apache PySpark by Example)
A mini project on working with an API and performing data wrangling on Carl Zeiss Meditec stock prices from the Frankfurt Stock Exchange.
In this project, I'll be making use of automated feature engineering to predict customer churn.
The main purpose of working through this exercise has been to hone my visualization abilities and understanding of Bayesian parameter optimization in Python for a LightGBM model.
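To make the idea concrete, here is a self-contained sketch of Bayesian optimization in one dimension: a Gaussian-process surrogate plus an expected-improvement acquisition function. The `objective` is a hypothetical stand-in for a cross-validated LightGBM score over a learning rate, not the case study's actual model:

```python
import math
import numpy as np

def objective(lr):
    # Hypothetical stand-in for a cross-validated LightGBM score;
    # it peaks at a learning rate of 0.3.
    return -(lr - 0.3) ** 2

def rbf(a, b, ls=0.2):
    # Squared-exponential kernel between two 1-D point sets.
    return np.exp(-0.5 * ((a[:, None] - b[None, :]) / ls) ** 2)

def gp_posterior(X, y, Xs, noise=1e-6):
    # Gaussian-process posterior mean and std at candidate points Xs.
    K = rbf(X, X) + noise * np.eye(len(X))
    Ks = rbf(X, Xs)
    mu = Ks.T @ np.linalg.solve(K, y)
    var = np.clip(1.0 - np.sum(Ks * np.linalg.solve(K, Ks), axis=0), 1e-12, None)
    return mu, np.sqrt(var)

def expected_improvement(mu, sigma, best):
    # How much we expect each candidate to beat the best score so far.
    z = (mu - best) / sigma
    cdf = 0.5 * (1.0 + np.vectorize(math.erf)(z / math.sqrt(2.0)))
    pdf = np.exp(-0.5 * z ** 2) / math.sqrt(2.0 * math.pi)
    return (mu - best) * cdf + sigma * pdf

X = np.array([0.05, 0.9])                  # two initial trials
y = np.array([objective(v) for v in X])
grid = np.linspace(0.01, 1.0, 200)

for _ in range(8):                         # eight optimization steps
    mu, sigma = gp_posterior(X, y, grid)
    nxt = grid[np.argmax(expected_improvement(mu, sigma, y.max()))]
    X, y = np.append(X, nxt), np.append(y, objective(nxt))

best_lr = X[np.argmax(y)]
```

In practice the case study would use a library for this, but the loop above is the whole idea: fit a surrogate, pick the most promising point, evaluate, repeat.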
In this case study, I put what I've learned about Euclidean and Manhattan distance to the test, applying both distance metrics to the same dataset and visualizing their distances.
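The two metrics compared in this case study can be sketched in a few lines of plain Python (the 3-4-5 point pair is illustrative):

```python
import math

def euclidean(p, q):
    # Straight-line distance: square root of summed squared differences.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def manhattan(p, q):
    # "City block" distance: sum of absolute differences along each axis.
    return sum(abs(a - b) for a, b in zip(p, q))

a, b = (0, 0), (3, 4)
euclidean(a, b)  # 5.0 (the 3-4-5 triangle)
manhattan(a, b)  # 7 (3 blocks east + 4 blocks north)
```

The gap between the two values on the same points is exactly what the visualizations in the case study make visible.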
In this case study, I've implemented the K-Means clustering algorithm, found the value for K using the Elbow method, the Silhouette method, and the Gap statistic, and visualized the clusters with Principal Components Analysis (PCA). I've used real data containing information on marketing newsletters and email campaigns, as well as transaction-level data from customers.
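A compact sketch of K-Means plus the Elbow method on synthetic data; the three Gaussian blobs below are hypothetical stand-ins for the real marketing and transaction features:

```python
import numpy as np

rng = np.random.default_rng(0)
# Three synthetic 2-D "customer" clusters (illustrative only).
data = np.vstack([rng.normal(c, 0.3, size=(40, 2))
                  for c in ((0, 0), (4, 0), (2, 3))])

def kmeans(X, k, iters=20, seed=0):
    r = np.random.default_rng(seed)
    centers = X[r.choice(len(X), k, replace=False)]
    for _ in range(iters):
        # Assign each point to its nearest center, then move the centers.
        labels = ((X[:, None] - centers[None]) ** 2).sum(-1).argmin(1)
        centers = np.array([X[labels == j].mean(0) if (labels == j).any()
                            else centers[j] for j in range(k)])
    labels = ((X[:, None] - centers[None]) ** 2).sum(-1).argmin(1)
    return labels, ((X - centers[labels]) ** 2).sum()

# Elbow method: inertia (total within-cluster squared distance) drops
# sharply until k reaches the true cluster count, then levels off.
inertias = {k: kmeans(data, k)[1] for k in range(1, 7)}
```

Plotting `inertias` against k shows the "elbow" at k = 3 here; the Silhouette method and Gap statistic used in the case study are alternative ways to pick the same knee.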
This repository contains a case study in which I put into practice what I have been learning about cosine similarity: seeing how it is calculated on a numeric dataset and exploring its utility for record matching and NLP projects.
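The calculation itself is short; the bag-of-words counts below are made up for illustration:

```python
import math

def cosine_similarity(u, v):
    # cos(theta) = u.v / (|u||v|); 1.0 means identical direction,
    # regardless of vector length.
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) *
                  math.sqrt(sum(b * b for b in v)))

# Toy bag-of-words counts; doc_c is doc_a's counts doubled.
doc_a = [2, 1, 0, 1]
doc_b = [2, 1, 1, 1]
doc_c = [4, 2, 0, 2]

cosine_similarity(doc_a, doc_c)  # 1.0: same direction, different length
```

That scale invariance is why cosine similarity works well for matching records and documents of very different sizes.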
This case study is part of Springboard DS curriculum
In this case study, as a Data Scientist working with the US federal government's Health and Environment department, I've been tasked with determining whether sales for the oldest and most powerful cigarette producer in the country are increasing or declining. Cowboy Cigarettes (TM, est. 1890) is the US's longest-running cigarette manufacturer. Like many cigarette companies, however, they haven't always been public about their sales and marketing data. The available post-war historical data runs for only 11 years after they resumed production in 1949, stopping in 1960 before resuming again in 1970. My job is to use just the 1949-1960 data to predict whether the manufacturer's cigarette sales increased, decreased, or stayed the same in the early 60s. I need to make a probable reconstruction of the manufacturer's sales record, predicting the future from the perspective of the past, to contribute my part of a full report on US public health in relation to major cigarette companies. The report will then be combined with other studies executed by my colleagues to provide important government advice.
This repository includes all of my projects, ongoing or completed, during my Career Track.
This case study is part of the Springboard DS curriculum
Exercises to accompany the free Springboard introductory data science "taster" course.
This case study is part of the Springboard DS curriculum
The goal of this case study is to gain a full understanding of how gradient boosting improves predictions using information from the residuals. First, I'll apply this method to a regression problem and then to a classification problem using the Titanic dataset.
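The "fit the residuals" idea can be shown end to end with decision stumps on a toy regression target (the step function below is illustrative, not the case-study data):

```python
import numpy as np

def best_stump(x, r):
    # Weak learner: the single split that best fits the residuals
    # with two constant values.
    best = None
    for t in np.unique(x)[:-1]:
        pred = np.where(x <= t, r[x <= t].mean(), r[x > t].mean())
        err = ((r - pred) ** 2).sum()
        if best is None or err < best[0]:
            best = (err, t, r[x <= t].mean(), r[x > t].mean())
    return best[1:]

def boost(x, y, rounds=100, lr=0.1):
    pred = np.full(len(y), y.mean())      # start from the mean prediction
    for _ in range(rounds):
        resid = y - pred                  # residuals = what we still get wrong
        t, lval, rval = best_stump(x, resid)
        pred += lr * np.where(x <= t, lval, rval)  # small step toward residuals
    return pred

x = np.arange(10.0)
y = np.where(x < 5, 1.0, 3.0)             # a step function the stumps can learn
pred = boost(x, y)
```

Each round shrinks the residuals by a factor related to the learning rate, which is exactly the mechanism LightGBM/XGBoost-style boosters exploit at scale.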
In this case study, I will be using the Grid Search method to identify the optimal number of neighbors to use in the K-nearest neighbor model.
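Grid search over the number of neighbors is just a scored loop; here is a self-contained sketch with a from-scratch KNN on synthetic two-class data (the real case study uses actual data and a library implementation):

```python
import numpy as np

rng = np.random.default_rng(1)
# Two synthetic classes (illustrative stand-in data).
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(3, 1, (50, 2))])
y = np.array([0] * 50 + [1] * 50)
idx = rng.permutation(100)
train, val = idx[:70], idx[70:]

def knn_predict(Xtr, ytr, Xte, k):
    # Majority vote among the k nearest training points (odd k avoids ties).
    d = ((Xte[:, None] - Xtr[None]) ** 2).sum(-1)
    nn = np.argsort(d, axis=1)[:, :k]
    return (ytr[nn].mean(axis=1) > 0.5).astype(int)

# Grid search: score every candidate k on a held-out validation split.
scores = {}
for k in (1, 3, 5, 7, 9, 11):
    pred = knn_predict(X[train], y[train], X[val], k)
    scores[k] = (pred == y[val]).mean()
best_k = max(scores, key=scores.get)
```

Scikit-learn's `GridSearchCV` does the same thing with cross-validation folds instead of a single split.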
This repository is part of Springboard's DS curriculum
This project is part of Springboard DS curriculum
This case study is part of the Springboard DS curriculum
Keras is one of the most extensible and user-friendly neural network libraries in the pool of available open-source tools. You saw a little Keras earlier in this course. The beauty of Keras is its modularity; it enables the user to build up a neural network with handy, Lego-like components. Keras has components for layers, activation functions, and optimizers, all of which save you the time (and trouble) of encoding these yourself and making them mutually compatible in a Python script.
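A minimal sketch of that modularity, assuming TensorFlow/Keras is installed; the layer sizes and input width are illustrative only:

```python
from tensorflow import keras
from tensorflow.keras import layers

# Snap the Lego-like pieces together: an input spec, two Dense layers
# with built-in activations, and a stock optimizer + loss.
model = keras.Sequential([
    keras.Input(shape=(4,)),
    layers.Dense(16, activation="relu"),
    layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])
```

Every piece here (layers, activations, the Adam optimizer) is a prebuilt, mutually compatible component, which is the point the paragraph above makes.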
Python Data Science Handbook: full text in Jupyter Notebooks
In Phase 2 of the TripDataAnalysis project, building upon the findings and insights gained in Phase 1 (https://github.com/abhiram540/TripDataAnalysis-CityOfLouisville-DocklessVehicles), we will explore advanced time series forecasting methods. The focus of this phase will be to make predictions using FBProphet and Vector Autoregression (VAR) models.
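The VAR idea can be sketched without any forecasting library: regress today's vector of observations on yesterday's. The two simulated series below are hypothetical stand-ins for, say, scooter and bike trip counts:

```python
import numpy as np

rng = np.random.default_rng(2)
# Simulate two linked series from a known VAR(1) process.
A_true = np.array([[0.6, 0.2],
                   [0.1, 0.5]])
z = np.zeros((200, 2))
for t in range(1, 200):
    z[t] = A_true @ z[t - 1] + rng.normal(0, 0.1, 2)

# VAR(1) in a nutshell: z_t = A z_{t-1} + noise, with A estimated
# by least squares on the lagged observations.
Y, X = z[1:], z[:-1]
A_hat = np.linalg.lstsq(X, Y, rcond=None)[0].T

forecast = A_hat @ z[-1]   # one-step-ahead forecast for both series
```

Libraries such as statsmodels generalize this to longer lags, confidence intervals, and lag-order selection, which is what the project itself would use.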
Using data provided by the Commonwealth of Kentucky's Department of Transportation and Louisville Metro, this project analyzes the trip data of electric find-and-ride dockless vehicles and predicts the demand in various neighborhoods within urban areas throughout the city.