Cynthia Correa's Projects
Using autoregression on stocks and temperature data
Curated list of Linguistic Resources for doing NLP & CL on Spanish
Code for tools developed by the BizOps team
I clean up, munge, plot, and characterize personal movement monitoring data. This project offers examples of how to use the lubridate, plyr, and knitr R packages.
I test my machine learning prowess by predicting whether entries in FitBit data correspond to walking, running, or going up stairs. The Random Forest algorithm gives almost perfect accuracy and correctly predicts the 20 test cases.
Files for my user page, which you can view at
Notebook to retrieve a stream of tweets and save to a .txt file
Use set and get functions in R to cache a computationally-intensive result so that you can re-use that result without having to re-calculate it.
Plotting Assignment 1 for Exploratory Data Analysis
Using data from the UCI Machine Learning Repository, we create code that generates various plots of energy consumption over time.
Perform autoregression to predict climate change over the next decade from a dataset of temperatures and other environmental metrics for hundreds of cities worldwide.
Exploratory analysis of the Instacart grocery store purchase data
Make an SQL database out of your iTunes music library and use Python and machine learning algorithms to predict star ratings for all your songs.
A Python implementation of LightFM, a hybrid recommendation algorithm.
Guides to get up and running with Looker!
Merge two datasets from the UC Irvine Human Activity Recognition archive and use them to calculate the means and standard deviations of the variables.
Use a Keras neural network model to perform digit recognition on the MNIST dataset
Repository for "Network/Graph Analysis in Python", a 3-hour training session held at ODSC East 2018.
Install and try out PySpark on your local machine
US stock market data since 2009
scikit-learn: machine learning in Python
I use SciPy to train 6 ML algorithms on the Iris dataset to predict the species of each sample based on the petal and sepal length and width. I use a test harness with 10-fold cross-validation. KNN gives the best results, with 90% accuracy on the validation set.
Example source code and projects for the Looker SDKs
I perform sentiment analysis on tweets about Donald Trump and Hillary Clinton leading up to the 2016 presidential election. I also look for activity upticks corresponding to election debates and other events. I use the Twitter API and my R code to retrieve the tweets. The sentiment analysis is done using the tm (text mining) R package.
💫 Industrial-strength Natural Language Processing (NLP) with Python and Cython
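
The set/get caching idea from the R caching project above can be sketched in a few lines. This is a minimal illustration in Python rather than R, not the code from that repository; the names Cached, result, slow_sum_of_squares, and call_log are hypothetical and chosen only for this example. The holder stores an input value, computes a result the first time it is requested, reuses it afterwards, and discards it whenever the input is replaced.

```python
class Cached:
    """Hold an input value plus a lazily computed, cached result (illustrative only)."""

    _MISSING = object()  # sentinel meaning "nothing cached yet"

    def __init__(self, value):
        self._value = value
        self._result = Cached._MISSING

    def get(self):
        """Return the stored input value."""
        return self._value

    def set(self, value):
        """Replace the input value and invalidate any cached result."""
        self._value = value
        self._result = Cached._MISSING

    def result(self, compute):
        """Return the cached result, calling compute(value) only if needed."""
        if self._result is Cached._MISSING:
            self._result = compute(self._value)
        return self._result


call_log = []  # records each real computation so reuse is visible


def slow_sum_of_squares(numbers):
    """Stand-in for a computationally intensive function."""
    call_log.append(list(numbers))
    return sum(n * n for n in numbers)


box = Cached([1, 2, 3])
first = box.result(slow_sum_of_squares)   # computed on first request
second = box.result(slow_sum_of_squares)  # served from the cache
box.set([4, 5])                           # new input clears the cache
third = box.result(slow_sum_of_squares)   # computed again for the new input
```

In this sketch the cache holds a single result and assumes the same compute function is passed each time; for caching keyed on function arguments, Python's standard functools.lru_cache may be the simpler choice.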