chatbots-of-me's Introduction

FB chat data chatbots

Two chatbots trained on my Facebook Messenger chat data to talk like me. One uses the Doc2Vec implementation in the Python library gensim; the other is built on the ChatterBot library. Because my chat data is not too plentiful (~70k messages sent by me and ~35k decent input-output pairs), the strategy was to take the user's input sentence, find the most similar recorded input, and return the corresponding recorded output.
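
The retrieval strategy described above can be sketched in a few lines. This is only an illustration, not the code from this repo: it swaps in plain bag-of-words cosine similarity as a stand-in for Doc2Vec or ChatterBot, and the `PAIRS` data and the `tokenize`, `cosine` and `respond` names are made up for the example.

```python
import math
import re
from collections import Counter

# Hypothetical toy data standing in for the recorded (input, output) pairs.
PAIRS = [
    ("how are you doing", "pretty good, you?"),
    ("want to grab lunch tomorrow", "sure, what time?"),
    ("did you see the game last night", "no, I missed it"),
]


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def respond(user_input, pairs=PAIRS):
    """Return the recorded output whose recorded input best matches user_input."""
    query = Counter(tokenize(user_input))
    _, best_output = max(
        pairs, key=lambda pair: cosine(query, Counter(tokenize(pair[0])))
    )
    return best_output
```

With the toy data, `respond("how are you")` picks the reply recorded for "how are you doing". A learned embedding such as Doc2Vec would replace `cosine` over word counts with similarity between inferred document vectors; the lookup step stays the same.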

I also analysed my chat data; the analysis can be found here.

The bots

Doc2Vec is well suited for this task (and was very performant), but likely needs a corpus much larger than my chat history. I also did not experiment thoroughly with parameters such as the word vector dimension count. ChatterBot prefers to be trained on full conversations, but here it had to be trained on isolated input-output pairs so that it would learn "character" only from me. This may be one reason it was much slower. To keep responses from taking minutes, the ChatterBot-based bot was trained on only 20% of the data. Despite that, its responses generally seemed slightly more on-topic, and it was less prone to repeating itself than the Doc2Vec-based bot.

Running

My chat data and the models trained on it are not included, for obvious reasons. To run the bots on your own data, first change your FB language to English and the time format to 24h, then download your FB data (instructions). Place the messages folder from the download in the same folder as the scripts, delete all subfolders of messages so that only the html files remain, and then run scrape.py, datagen.py, train.py and finally chat.py in succession.
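
If you would rather not start the four scripts by hand, a small runner along these lines could do it. It is a sketch under assumptions: the script names and their order come from the instructions above, but `build_commands` and `run_pipeline` are hypothetical helpers that are not part of this repo, and the scripts are assumed to need no command-line arguments.

```python
import subprocess
import sys

# Script names and their order are taken from the run instructions.
SCRIPTS = ["scrape.py", "datagen.py", "train.py", "chat.py"]


def build_commands(python=sys.executable, scripts=SCRIPTS):
    """Build one command per script, in the order the scripts must run."""
    return [[python, script] for script in scripts]


def run_pipeline(commands):
    """Run each command in turn, stopping at the first failure."""
    for command in commands:
        # check=True raises CalledProcessError on a non-zero exit code,
        # so a failed step prevents the later steps from running.
        subprocess.run(command, check=True)
```

Calling `run_pipeline(build_commands())` from the folder that holds the scripts and the messages folder runs all four steps, ending in the interactive chat.py.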

January 2018
Andreas Vija
