Name: Arnab Sarkar
Type: User
Bio: Budding Data Scientist, with more than 3 years of experience in Oracle PL/SQL, Oracle E-Business Suite ERP platform, Oracle Demantra and Talend Data Integration
Location: Charlottesville, VA
Arnab Sarkar's Projects
Implementation of Bayesian approaches to develop a sentiment prediction model over the publicly available IMDb movie review dataset.
Capstone project for Wikimedia's Trust and Safety team. Involves abuse detection in user comments and prediction of future user risk, using both textual features and activity-based non-textual features. Implemented in Python, using Pandas, NumPy, NLTK, scikit-learn and more.
CS5010 final project implemented in Python, using a football player dataset based on the FIFA 2018 game, scraped from 'SOFIFA.com'. We explore and visualize various player attributes, and finally create classification models (Logistic Regression, Random Forests) to predict a player's position on the basis of their attributes.
An initiative to provide infrastructure for reproducible workflows around open data.
This is the final project for the Linear Models Statistics course, implemented using R and associated libraries. We fit various multiple linear regression models on county-level census data features and try to predict the risk that a person from a given county might be suffering from cancer.
Text analytics on Steam Reviews.
SYS 6018: Applied Data Mining, Competition 3: Predicting Blogger Characteristics
SYS 6018: Applied Data Mining, Competition 1: Titanic Dataset
This is a data mining project, primarily implemented in R, based on the US National Prisoner Exoneration data. It looks into the relationship between the different features (given below) and the CTE (Conviction to Exoneration Time).