
tboudart's Projects

chicago-crime-regression-analysis

As part of a group project, I developed separate regression models in R to predict the daily number of batteries and robberies in Chicago using four different datasets. I tested interaction and second-order terms and used stepwise feature selection to find the best model for the given data. I compared several candidate models using cross-validation and chose the one that minimized cross-validation error while remaining reasonably simple. I checked the residual assumptions, and both models exhibit autocorrelation: the Durbin-Watson test rejects the null hypothesis of no autocorrelation. If I had more time, I would try an ARMA model instead of multiple regression.
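The Durbin-Watson statistic mentioned above can be computed directly from a model's residuals. The sketch below is a minimal Python illustration (the project itself used R); the function name and the sample residuals are made up for the example and are not taken from the project. Values near 2 suggest no first-order autocorrelation, values toward 0 suggest positive autocorrelation, and values toward 4 suggest negative autocorrelation.

```python
def durbin_watson(residuals):
    """Return the Durbin-Watson statistic for a sequence of residuals."""
    if len(residuals) < 2:
        raise ValueError("need at least two residuals")
    # Numerator: squared differences between consecutive residuals.
    numerator = sum(
        (residuals[t] - residuals[t - 1]) ** 2
        for t in range(1, len(residuals))
    )
    # Denominator: sum of squared residuals.
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator


# Hypothetical residuals, chosen only to show the two extremes.
constant = [1.0, 1.0, 1.0, 1.0]        # each residual repeats the last
alternating = [1.0, -1.0, 1.0, -1.0]   # each residual flips sign

print(durbin_watson(constant))     # 0.0
print(durbin_watson(alternating))  # 3.0
```

A formal test also needs critical values or a p-value, which this sketch does not compute.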

financial-markets-regression-analysis

My role in this group project was to perform regression analysis on quarterly financial data to predict a company's market capitalization. I used R to develop ordinary least squares (OLS), stepwise, ridge, lasso, relaxed lasso, and elastic net regression models. I first used stepwise and OLS regression to develop a model and examine its residual plots. The plot of residuals against predicted values indicated multiplicative errors, so I took the natural log of the dependent variable. The resulting model's R² was significantly lower. After examining scatter plots of log market capitalization against the independent variables, I found that the independent variables also had to be transformed to produce a linear relationship. Using the log transformation of both the dependent and independent variables, I developed models with all the regression techniques mentioned, aiming to balance R² against parsimony. All the models produced similar results, with an R² of around 0.80. Since OLS is the easiest to explain, had similar residual plots, and had the highest R² of all the models, it was the best model developed.
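To illustrate why logging only the dependent variable can hurt the fit while logging both sides restores linearity, here is a minimal Python sketch (the project itself used R). The data are synthetic and follow an assumed power law; the variable names `revenue` and `market_cap` and the helper functions are hypothetical, not taken from the project.

```python
import math


def fit_line(xs, ys):
    """Ordinary least squares with one predictor: returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def r_squared(xs, ys):
    """R^2 of the least-squares line of ys on xs."""
    intercept, slope = fit_line(xs, ys)
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Synthetic data assumed to follow market_cap = 2 * revenue ** 1.5.
revenue = [1.0, 2.0, 4.0, 8.0, 16.0]
market_cap = [2.0 * x ** 1.5 for x in revenue]

log_x = [math.log(x) for x in revenue]
log_y = [math.log(y) for y in market_cap]

r2_log_y_only = r_squared(revenue, log_y)  # log(y) against raw x: still curved
r2_log_log = r_squared(log_x, log_y)       # log(y) against log(x): linear
intercept, slope = fit_line(log_x, log_y)  # slope recovers the exponent 1.5

print(r2_log_y_only < r2_log_log)  # True
```

In the log-log fit, the slope estimates the power-law exponent and the intercept estimates the log of the multiplicative constant.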

global-terrorism-data-visualization

I completed a group project in my data visualization course using Global Terrorism data covering 1970 to 2017. My contribution to the project was creating an interactive Shiny dashboard in R. I coded the dashboard and its graphs myself using ggplot2, designing them around the data, the user, and the task. I created seven subsets of the main data frame so the user can choose among seven qualitative attributes of interest. The user can set a minimum number of fatalities for a class to be displayed in the graphs and can animate through the years. The dashboard lets the user examine trends in qualitative attributes such as terrorist organizations, target types, and attack types, to see how they change over time or relate to each other in specific years. I also have experience coding other types of graphs in R and using Tableau.

greencoffeewebcrawlers

I developed Python programs to scrape data from multiple unroasted coffee bean vendors and structure it in Excel tables. I used the Python libraries Beautiful Soup, Requests, and XlsxWriter to gather the information I needed to guide my coffee buying decisions. A key feature in those decisions was the cupping score, which each vendor calculates differently. I therefore standardized the vendors' cupping scores into z-scores so I could compare the price per cupping score across vendors. I also have experience building web crawlers with Python by extending the HTMLParser class.
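The standardization step can be sketched as follows. This is a minimal illustration, not the project's code: the vendor lists and their scoring scales are invented, and it assumes each vendor's scores are standardized against that vendor's own mean and population standard deviation.

```python
from statistics import mean, pstdev


def z_scores(scores):
    """Standardize scores to mean 0 and population standard deviation 1."""
    mu = mean(scores)
    sigma = pstdev(scores)
    if sigma == 0:
        raise ValueError("all scores are identical; z-scores are undefined")
    return [(s - mu) / sigma for s in scores]


# Hypothetical cupping scores on two different vendor scales.
vendor_a = [84.0, 86.0, 88.0]  # assumed 100-point scale
vendor_b = [7.0, 8.0, 9.0]     # assumed 10-point scale

z_a = z_scores(vendor_a)
z_b = z_scores(vendor_b)

# After standardization, the best bean from each vendor sits the same
# distance above its vendor's average, so the two become comparable.
print(z_a)
print(z_b)
```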

life-expectancy-regression-analysis-and-classification

I contributed to a group project using the Life Expectancy (WHO) dataset from Kaggle, where I performed regression analysis to predict life expectancy and classification to label countries as developed or developing. The project was completed in Python using the pandas, Matplotlib, NumPy, seaborn, scikit-learn, and statsmodels libraries. The regression models were fitted on the entire dataset and on separate subsets for developed and developing countries. I tested ordinary least squares, lasso, ridge, and random forest regression models. Random forest regression performed best on all three datasets and did not overfit the training set. The testing set R² was 0.96 for the entire dataset and for the developing country subset, and 0.8 for the developed country subset. I tested seven classification algorithms to classify a country as developing or developed. The models obtained testing set balanced accuracies ranging from 86% to 99%. From best to worst, the models were gradient boosting, random forest, Adaptive Boosting (AdaBoost), decision tree, k-nearest neighbors, support-vector machines, and naive Bayes. I tuned all the models' hyperparameters. None of the models overfitted the training set.
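Balanced accuracy, the metric reported above, is the average of the per-class recalls, which makes it more informative than plain accuracy when one class dominates. The sketch below is a minimal pure-Python illustration with invented labels; the project itself used scikit-learn, and the helper name here is hypothetical.

```python
def balanced_accuracy(y_true, y_pred):
    """Return the mean of per-class recall over the classes in y_true."""
    recalls = []
    for cls in sorted(set(y_true)):
        # Positions where the true label is this class.
        idx = [i for i, label in enumerate(y_true) if label == cls]
        correct = sum(1 for i in idx if y_pred[i] == cls)
        recalls.append(correct / len(idx))
    return sum(recalls) / len(recalls)


# Hypothetical imbalanced labels: a classifier that always predicts
# the majority class looks good on accuracy but not on balanced accuracy.
y_true = ["developing"] * 8 + ["developed"] * 2
y_pred = ["developing"] * 10

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(accuracy)                           # 0.8
print(balanced_accuracy(y_true, y_pred))  # 0.5
```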

tanzanian-water-pumps-clustering-and-classification

For this group project, I performed cluster analysis and classification using Python to predict one of three classes for water pumps: functional, functional but needs repair, and non-functional. I used clustering to look for hidden structure in the data, hoping that fitting separate classifiers to each cluster would give better results than using the entire dataset. Unfortunately, none of k-means clustering, DBSCAN, hierarchical clustering, or OPTICS produced well-defined clusters. The entire dataset was therefore used for fitting classification algorithms. The two classification techniques I was responsible for were k-nearest neighbors and a stacked generalization ensemble. For the latter, I combined the best models each group member developed. All the models had a hard time predicting the functional but needs repair class. My best model achieved an accuracy of only 76%.
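For readers unfamiliar with k-nearest neighbors, the idea is to label a new point by majority vote among its k closest training points. The sketch below is a minimal pure-Python illustration with invented two-feature data and an assumed Euclidean distance; it is not the project's model, and the function name is hypothetical.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=3):
    """Predict the label of query by majority vote among its k nearest points."""
    # Sort training points by Euclidean distance to the query.
    neighbors = sorted(
        zip(train_X, train_y),
        key=lambda pair: math.dist(pair[0], query),
    )
    votes = Counter(label for _, label in neighbors[:k])
    return votes.most_common(1)[0][0]


# Hypothetical, already-scaled features for six pumps.
train_X = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1),
           (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
train_y = ["functional"] * 3 + ["non-functional"] * 3

print(knn_predict(train_X, train_y, (1.1, 1.0)))  # functional
print(knn_predict(train_X, train_y, (5.0, 5.2)))  # non-functional
```

Because the vote depends on distances, features on very different scales are usually standardized before fitting.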

tylerboudart.com

I taught myself HTML, CSS, and JavaScript to build this website from scratch. This is the first website I have coded, and I tried to incorporate as many HTML semantic elements as I could. I do not intend to be a web designer; I built the site to learn more languages, such as JavaScript, and to better understand how websites work so that I can build web crawlers in Python more efficiently.
