tboudart


tboudart's Projects

chicago-crime-regression-analysis

As part of a group project, I developed separate regression models in R to predict the daily number of batteries and robberies in Chicago using four different datasets. I tested interaction and second-order terms and used stepwise feature selection to find the best model for the given data. I compared several candidate models using cross-validation and chose the one that minimized the cross-validation error while remaining reasonably simple. I checked the residual assumptions, and both models exhibited autocorrelation, as indicated by rejecting the null hypothesis of the Durbin-Watson test. If I had more time, I would try an ARMA model instead of multiple regression.
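For readers unfamiliar with the Durbin-Watson test mentioned above, the following is a minimal sketch of the test statistic only (not the p-value), written in Python for illustration even though the project itself used R. The residual values are made up and the function name is hypothetical. Values near 2 suggest no first-order autocorrelation, values toward 0 suggest positive autocorrelation, and values toward 4 suggest negative autocorrelation.

```python
def durbin_watson(residuals):
    """Return the Durbin-Watson statistic for residuals ordered in time."""
    if len(residuals) < 2:
        raise ValueError("need at least two residuals")
    # Sum of squared differences between consecutive residuals.
    numerator = sum(
        (residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals))
    )
    # Sum of squared residuals.
    denominator = sum(e ** 2 for e in residuals)
    if denominator == 0:
        raise ValueError("residuals are all zero")
    return numerator / denominator


# Hypothetical residuals, for illustration only.
slowly_drifting = [1.0, 0.9, 0.8, 0.6, 0.5, 0.3]  # neighbors are similar
alternating = [1.0, -1.0, 1.0, -1.0]              # neighbors flip sign

print(durbin_watson(slowly_drifting))  # well below 2: positive autocorrelation
print(durbin_watson(alternating))      # above 2: negative autocorrelation
```

In practice the statistic would be computed on the fitted model's residuals in date order, and a library routine would supply the accompanying p-value.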

financial-markets-regression-analysis

My role in this group project was to perform regression analysis on quarterly financial data to predict a company's market capitalization. I used R to develop ordinary least squares (OLS), stepwise, ridge, lasso, relaxed lasso, and elastic net regression models. I first used stepwise and OLS regression to develop a model and examine its residual plots. The plot of the residuals against the predicted values indicated multiplicative errors, so I took the natural log of the dependent variable. The resulting model's R² dropped substantially. After examining scatter plots of log market capitalization against the independent variables, I found that the independent variables also had to be transformed to produce a linear relationship. Using the log transformation of both the dependent and independent variables, I developed models with all the regression techniques mentioned, aiming to balance R² against model parsimony. All the models produced similar results, with an R² of around 0.80. Since the OLS model was the easiest to explain, had similar residual plots, and had the highest R² of all the models, I selected it as the best model.
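The log-log transformation described above can be illustrated with a small sketch. It is written in Python for illustration (the project used R), uses a single made-up predictor rather than the project's data, and the names `fit_line`, `fit_log_log`, `revenue`, and `market_cap` are hypothetical. When y is proportional to a power of x, regressing log(y) on log(x) gives a straight line whose slope is the exponent.

```python
import math


def fit_line(xs, ys):
    """Ordinary least squares for one predictor; returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def fit_log_log(xs, ys):
    """Fit log(y) = intercept + slope * log(x); inputs must be positive."""
    return fit_line([math.log(x) for x in xs], [math.log(y) for y in ys])


# Made-up data following y = 2 * x^1.5 exactly (no noise), for illustration.
revenue = [1.0, 2.0, 4.0, 8.0, 16.0]
market_cap = [2.0 * r ** 1.5 for r in revenue]

intercept, slope = fit_log_log(revenue, market_cap)
print(slope)                # recovers the exponent, about 1.5
print(math.exp(intercept))  # recovers the multiplier, about 2.0
```

With real data the fit would not be exact, and several predictors would be handled by a multiple regression routine rather than this one-variable formula.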

global-terrorism-data-visualization

I completed a group project in my data visualization course using Global Terrorism data covering 1970 to 2017. My contribution to the project was an interactive Shiny dashboard built in R. I coded the dashboard and its graphs entirely on my own using ggplot2, designing them around the data, the user, and the task. I created seven subsets of the main data frame so the user can choose among seven different qualitative attributes of interest. The user can set a minimum number of fatalities for a class to be displayed in the graphs and animate through the years. The dashboard lets the user examine trends in qualitative attributes such as terrorist organizations, target types, and attack types, to see how they change over time or relate to each other in specific years. I also have experience coding other types of graphs in R and using Tableau.

globaltemperaturetimeseries

greencoffeewebcrawlers

I developed Python programs to scrape data from multiple unroasted coffee bean vendors and structure it in Excel tables. I used the Python libraries Beautiful Soup, Requests, and XlsxWriter to gather the information I needed to guide my coffee buying decisions. A key feature in those decisions was the cupping score, which each vendor calculates differently. I therefore standardized the vendors' cupping scores into Z-scores so I could compare the price per cupping score across vendors. I also have experience building web crawlers with Python by extending the HTMLParser class.
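The Z-score standardization described above can be sketched as follows. This is an illustrative example only, not the repository's code: the vendor names and scores are invented, and the use of the population standard deviation (rather than the sample standard deviation) is an assumption.

```python
from statistics import mean, pstdev


def z_scores(scores):
    """Standardize scores to mean 0 and standard deviation 1."""
    mu = mean(scores)
    sigma = pstdev(scores)  # assumption: population standard deviation
    if sigma == 0:
        raise ValueError("all scores are identical; Z-scores are undefined")
    return [(s - mu) / sigma for s in scores]


# Invented vendors that grade on different scales (100-point vs. 10-point).
vendor_scores = {
    "vendor_a": [84.0, 86.0, 88.0],
    "vendor_b": [7.5, 8.0, 8.5],
}

# Standardize within each vendor so scores become comparable across vendors.
standardized = {
    vendor: z_scores(scores) for vendor, scores in vendor_scores.items()
}

for vendor, zs in standardized.items():
    print(vendor, [round(z, 3) for z in zs])
```

After standardization, each coffee's score expresses how far it sits above or below that vendor's own average, which is what makes a cross-vendor comparison meaningful.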

life-expectancy-regression-analysis-and-classification

I contributed to a group project using the Life Expectancy (WHO) dataset from Kaggle, where I performed regression analysis to predict life expectancy and classification to label countries as developed or developing. The project was completed in Python using the pandas, Matplotlib, NumPy, seaborn, scikit-learn, and statsmodels libraries. The regression models were fitted on the entire dataset, along with subsets for developed and developing countries. I tested ordinary least squares, lasso, ridge, and random forest regression models. Random forest regression performed the best on all three datasets and did not overfit the training set. The testing set R² was 0.96 for the entire dataset and for the developing country subset. The developed country subset achieved an R² of 0.8. I tested seven different classification algorithms to classify a country as developing or developed. The models obtained testing set balanced accuracies ranging from 86% to 99%. From best to worst, the models were gradient boosting, random forest, Adaptive Boosting (AdaBoost), decision tree, k-nearest neighbors, support-vector machines, and naive Bayes. I tuned all the models' hyperparameters. None of the models overfitted the training set.
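Balanced accuracy, the metric reported above, is the average of the per-class recalls, which matters when one class (here, developing countries) is much larger than the other. The sketch below is a plain-Python illustration with invented labels, not the project's code; scikit-learn provides an equivalent `balanced_accuracy_score` function.

```python
def balanced_accuracy(y_true, y_pred):
    """Return the mean of the per-class recalls."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    recalls = []
    for label in sorted(set(y_true)):
        # Positions where the true class is `label`.
        positions = [i for i, y in enumerate(y_true) if y == label]
        correct = sum(1 for i in positions if y_pred[i] == label)
        recalls.append(correct / len(positions))
    return sum(recalls) / len(recalls)


# Invented, imbalanced labels: 8 developing countries and 2 developed.
y_true = ["developing"] * 8 + ["developed"] * 2
# A lazy classifier that always predicts the majority class.
y_pred = ["developing"] * 10

plain_accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(plain_accuracy)                     # 0.8
print(balanced_accuracy(y_true, y_pred))  # 0.5
```

The lazy classifier looks respectable on plain accuracy but scores no better than chance on balanced accuracy, which is why the latter is the more informative number for imbalanced classes.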

tanzanian-water-pumps-clustering-and-classification

For this group project, I performed cluster analysis and classification using Python to predict one of three classes for water pumps: functional, functional but needs repair, and non-functional. I used clustering to look for hidden structure in the data, with the aim of fitting separate classifiers per cluster and getting better results than with the entire dataset. Unfortunately, neither k-means clustering, DBSCAN, hierarchical clustering, nor OPTICS produced well-defined clusters. The entire dataset was therefore used for fitting the classification algorithms. The two classification techniques I was responsible for were k-nearest neighbors and a stacked generalization ensemble. For the latter, I combined the best models each group member developed. All the models had a hard time predicting the "functional but needs repair" class. My best model achieved an accuracy of only 76%.

tylerboudart.com

I taught myself HTML, CSS, and JavaScript to build this website from scratch. This is the first website I have coded, and I tried to incorporate as many HTML semantic elements as I could. I do not intend to be a web designer. I built the site to learn more coding languages, such as JavaScript, and to better understand how websites work so that I can build web crawlers in Python more efficiently.
