This project is the first part of a larger project that aims to predict sentiment from Twitch chatrooms. Our goal here is to provide a simple example of how this can be done. We proceed by cleaning our data and using two different models: a bidirectional LSTM neural network and a branched neural network. The bidirectional LSTM is a rough implementation of what is described in this article, and the branched model is the one used in this article. In the future, we will explore model selection further, optimize parameters, and develop a user interface to make the analysis accessible and digestible.
The data used and its documentation can be found here.
We use the CMU Noah's Ark tokenizer, called twokenizer. It was initially developed for Twitter, and Barbieri's article (linked above) suggests that a modified version can be useful for Twitch. The twokenize.py file was downloaded from here.
We include a few data cleaning functions in preprocessing.py and emotes.py. We use an API to get the list of emotes from Twitch for each streamer. Since the amount of data is very large, we can only access the list of emotes from one channel at a time.
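As a rough illustration of the kind of cleaning described above, here is a minimal sketch. It is not the code from preprocessing.py or emotes.py: the function names, the whitespace tokenization, and the hard-coded emote set are all assumptions made for the example. In the project the emote list would come from the API, one channel at a time, and tokenization would be handled by twokenize.py.

```python
import re

# Hypothetical emote set for a single channel. In the project this list
# would be fetched from the emote API rather than hard-coded.
CHANNEL_EMOTES = {"Kappa", "PogChamp", "LUL"}


def clean_message(message, emotes=CHANNEL_EMOTES):
    """Split a chat message on whitespace, drop URLs, keep emotes
    unchanged, and lowercase every other token."""
    cleaned = []
    for token in message.split():
        if token in emotes:
            # Emote names are matched case-sensitively, so leave them as-is.
            cleaned.append(token)
        elif re.match(r"https?://", token):
            # Skip links; they carry little sentiment on their own.
            continue
        else:
            cleaned.append(token.lower())
    return cleaned


def emote_counts(tokens, emotes=CHANNEL_EMOTES):
    """Count how often each known emote appears in a token list."""
    counts = {}
    for token in tokens:
        if token in emotes:
            counts[token] = counts.get(token, 0) + 1
    return counts


tokens = clean_message("That was AMAZING PogChamp https://example.com Kappa")
print(tokens)
print(emote_counts(tokens))
```

Keeping emotes intact while lowercasing everything else is one possible design choice; it treats an emote as a distinct token rather than as an ordinary word.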
There is a short analysis at the end of each notebook evaluating the accuracy of the corresponding model.
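For reference, accuracy here can be read as the fraction of messages whose predicted label matches the true label. The sketch below shows that computation in plain Python; the function name and the example labels are made up for illustration and are not taken from the notebooks.

```python
def accuracy(y_true, y_pred):
    """Return the fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("cannot compute accuracy on empty input")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


# Made-up labels: 1 = positive sentiment, 0 = negative sentiment.
true_labels = [1, 0, 1, 1]
predicted_labels = [1, 0, 0, 1]
print(accuracy(true_labels, predicted_labels))  # 0.75
```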