lunarwatcher / nn-chatbot

A chatbot with a neural network, which can be used in the console, on Discord or in the Stack Exchange network chat.

License: Apache License 2.0

Python 9.55% Java 38.79% Kotlin 51.66%

nn-chatbot stackexchange discord-py neural-network recurrent-neural-networks chatbot tensorflow tensorlayer python java

nn-chatbot's Introduction

NN-chatbot

This is a chatbot designed for neural network interaction in addition to the default command-based system. Note that it's still a work in progress, so there are bound to be bugs.

Documentation notice

The documentation is currently being written, and some of it is completely outdated. The readme is up to date (more or less), but it still needs editing (the same as the rest of the documentation). See #47

Install

Dependencies

discord.py - not necessary if you're using the Java backend
numpy
tensorlayer
tensorflow (tensorflow-gpu is recommended - CPU is extremely slow)
sklearn
tensorboard (will be added later)
asyncio
nltk
Python 3.6 (anything under 3.5 requires code edits because of the async keyword and type hints. However, 3.6 is the only tested version). Note that 3.7 is currently not supported by TensorFlow, which will cause problems with that dependency.
Java 8 - The Java module downloads its dependencies as .jars using Gradle. Just run it, Gradle will take care of the rest

Please note:

In some cases, the dependencies of this project's dependencies don't get installed. For example, Tensorlayer requires Scipy, which for some reason doesn't get installed. Installing the packages with pip should be enough to avoid this. Should there still be missing dependencies, install them manually with pip.

Dataset

The Cornell Movie-Dialogs Corpus is the dataset the bot is designed for at the moment. There will be support for custom datasets (and conversation scraping for personalized conversations), but that's an issue for later.

Setup

The text files in the Cornell Movie-Dialogs Corpus go in a directory called raw_data.

Finally, when all the data is added, run bot.py. It'll set up the necessary files and start training once it's done. Checkpoints are saved every epoch (and each one overwrites the previous save).

Running it

The bot is split in two parts:

NN backend
Java bot

The Java bot has support for the NN backend, but can also be run without it.

Note that you don't have to train the network before using BotCore.java. If you decide to use the neural net, you can train it while running the Java bot, but the chat feature won't be available until after the Flask server comes online.

The bot.py file supports CLI arguments. They are:

--help     | shows the help message and exits
--training | whether or not the bot trains (boolean)
--mode     | The mode to run in (int). 0 for console, 1 for the flask server.
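
The arguments above could be parsed roughly like this. This is a minimal sketch, not the actual code in bot.py: the default values, the accepted boolean spellings, and the helper names str_to_bool and build_parser are all assumptions made for illustration.

```python
import argparse


def str_to_bool(value: str) -> bool:
    """Parse common spellings of true/false (the accepted spellings are an assumption)."""
    lowered = value.lower()
    if lowered in ("true", "t", "yes", "y", "1"):
        return True
    if lowered in ("false", "f", "no", "n", "0"):
        return False
    raise argparse.ArgumentTypeError("expected a boolean, got %r" % value)


def build_parser() -> argparse.ArgumentParser:
    # argparse adds --help automatically.
    parser = argparse.ArgumentParser(description="NN-chatbot backend (sketch)")
    # Defaults are assumptions; the readme does not state them.
    parser.add_argument("--training", type=str_to_bool, default=False,
                        help="whether or not the bot trains")
    parser.add_argument("--mode", type=int, choices=(0, 1), default=0,
                        help="0 for console, 1 for the Flask server")
    return parser


# Parse an explicit argument list so the sketch runs without a real command line.
args = build_parser().parse_args(["--training", "true", "--mode", "1"])
print(args.training, args.mode)
```

With a parser like this, a non-integer or out-of-range --mode is rejected by argparse before any training code runs.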

NOTE:

It's recommended that you cd into the root directory when running bot.py. The compiled .jar file runs from the root directory, meaning it calls bot.py relative to that. When bot.py is called from the root directory, that directory is also the working directory for the Python script, meaning it looks for folders and files in rootDir/, not rootDir/Network/.

System

The code is based on a bag of words, in order to be able to vectorize the words properly.
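
To illustrate what a bag of words means here, the following is a minimal sketch and not the project's actual preprocessing. The function names build_vocab and vectorize, the whitespace tokenization, and the reserved unknown-word slot are assumptions.

```python
from collections import Counter


def build_vocab(sentences, vocab_size):
    """Map the most frequent words to integer ids; id 0 is reserved for unknown words."""
    counts = Counter(word for s in sentences for word in s.lower().split())
    vocab = {"<unk>": 0}
    for word, _ in counts.most_common(vocab_size - 1):
        vocab[word] = len(vocab)
    return vocab


def vectorize(sentence, vocab):
    """Return a count vector with one slot per vocabulary entry."""
    vector = [0] * len(vocab)
    for word in sentence.lower().split():
        # Words outside the vocabulary are counted in the <unk> slot.
        vector[vocab.get(word, 0)] += 1
    return vector


corpus = ["hello there", "hello world"]
vocab = build_vocab(corpus, vocab_size=4)
vec = vectorize("hello unknown world", vocab)
print(vocab, vec)
```

The vector length is tied to the vocabulary size, which is why changing the vocab size changes the dimensions the network sees.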

Known issues

Changing the vocab size prevents loading of previous save

Unfortunately, by design. The Embedding layer takes a fixed input which is saved in the checkpoint, meaning changes in dimensions will prevent loading of the previous model. Figuring out a way to get an expanding vocab is a top priority.

TensorFlow throws OutOfMemoryErrors

The easy way: Reduce the input dim or the vocab size. The hard way: Get a better GPU with more VRAM, or get more GPUs. If you use the CPU, get more RAM.

Nonsense replies

Train more.

No memory

Image showing the differences between some of the core chatbot types (not embedded because it didn't load properly)

General chatbots are incredibly hard to make. Creating knowledge-based generative models isn't exactly something that's documented in the TensorFlow documentation (or anywhere else for that matter). For now, getting sensible replies is the top priority, along with expandable vocabulary. Adding generative memory (not retrieval-based) with context is a task for later.

Training is slow

Use a GPU if possible. If you already are, you'll need either more GPUs or more powerful ones. Hardware is unfortunately a problem when it comes to neural networks. Decreasing the vocab size also speeds it up a little.

Notes

With the Java/Kotlin bot, using the revision command requires access to git, meaning git has to be added to the path or made accessible to the program in some other way.

Licensing

Copyright 2018 LunarWatcher

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

  http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

nn-chatbot's People

Forkers

whispres dawgswaffle

nn-chatbot's Issues

Optimize boot

Only applies to chatting

Launch the bot and net async, so the bot can get online quickly independently of the network.

Status

Prints how many events from each user (by UID) have been intercepted. For example:

User     | Events
SomeUser | 912
User2    | 132

Applies on Discord too. Use site-specific commands to deal with the async bs of getting usernames through discord.py.
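
The counting and the table could be sketched as follows. This is an illustration in Python rather than the bot's Java/Kotlin code, and the names event_counts, record_event and format_status, as well as the UID-to-name mapping passed in, are hypothetical.

```python
from collections import Counter

# user id -> number of intercepted events
event_counts = Counter()


def record_event(user_id):
    """Count one intercepted event for the given user id."""
    event_counts[user_id] += 1


def format_status(names):
    """Render the counts as a two-column table, most active users first."""
    rows = [(names.get(uid, str(uid)), count)
            for uid, count in event_counts.most_common()]
    width = max([len("User")] + [len(name) for name, _ in rows])
    lines = ["%s | Events" % "User".ljust(width)]
    lines += ["%s | %d" % (name.ljust(width), count) for name, count in rows]
    return "\n".join(lines)


for uid in (1, 1, 2, 1):
    record_event(uid)
table = format_status({1: "SomeUser", 2: "User2"})
print(table)
```

Keying the counter on the UID and resolving names only when printing keeps the username lookup (the site-specific part) out of the hot path.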

Remove eval command

This just exists to inform about why the eval command was removed and what's required for it to be re-added.

The problem:

[TRIGGER]eval while(true)print("infinite recursive loop")

Attempted solutions:

Kotlin coroutines - cancel and cancelAndJoin don't actually stop the execution of the script
Regular threads - do not work. stop, join, interrupt, etc. don't actually stop it.

Assuming a timeout can be added, and it works, the eval command can be re-added. Until then, it's not an option to do so.

Truncate discord help command

2000 char limit. Splitting is already happening (horribly though, that's another item on the TODO-list), but having the entire help list takes up a lot of space. [trigger]help <command name> exists, so help needs to be added to the commands too.

Enable disabling sites

Add booleans in config.py to disable entire output sites.

I.e.:

useDiscord = True

Disabling it disables booting the site and adding it to the site list (for privileges).
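
A minimal sketch of how such flags could gate the boot, assuming one boolean per site. Only useDiscord comes from this issue; the second flag and the enabled_sites helper are hypothetical.

```python
# config.py (sketch): one flag per output site.
useDiscord = True
useStackExchange = False  # hypothetical second flag


def enabled_sites():
    """Return the names of the sites that should be booted and registered."""
    flags = {"discord": useDiscord, "stackexchange": useStackExchange}
    return [name for name, enabled in flags.items() if enabled]


sites_to_boot = enabled_sites()
print(sites_to_boot)
```

Anything that builds the site list for privileges would iterate over the same result, so a disabled site is skipped in both places.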

Make an index for sites

I noticed you have several loops to find the site desired.

https://github.com/LunarWatcher/NN-chatbot/blob/master/Commands.py
Lines 28 - 32, 38 - 42, etc.

What I think would help is that since the sites object is an array, we could create an index object. What would be done is making a new line after line 20 that reads: siteIndexs = {}

A new line would be appended after line 31 that reads: siteIndexs[site.name] = sites.indexOf(site)

That enables us to do the following:

instead of: for site in sites: #some verification method
we would use:
sites[siteIndexs[site.name]]
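
The suggestion above can be sketched in Python as follows. Note that Python lists have index(), not indexOf(); the sketch uses enumerate instead so the list is only walked once. The Site class and find_site helper are hypothetical stand-ins, not code from Commands.py.

```python
class Site:
    """Hypothetical stand-in for the bot's site objects."""

    def __init__(self, name):
        self.name = name


sites = [Site("discord"), Site("stackoverflow"), Site("metastackexchange")]

# Build the index once; enumerate avoids a list.index() scan per site.
siteIndexs = {site.name: position for position, site in enumerate(sites)}


def find_site(name):
    """Look a site up by name without looping; returns None if it is unknown."""
    position = siteIndexs.get(name)
    return sites[position] if position is not None else None


print(find_site("stackoverflow").name)
```

The index has to be rebuilt whenever sites are added, removed or reordered, otherwise the stored positions go stale.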

Line 20, Commands.py

sites: list() = []

Throws a syntax error:

 sites: list() = []
      ^
SyntaxError: invalid syntax
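
The caret points at a variable annotation, which is syntax that was added in Python 3.6 (PEP 526), so an older interpreter rejects the line this way. Assuming that is the cause here, the sketch below shows the annotated form next to a type comment that older versions also parse. The name legacy_sites and the element type are placeholders.

```python
from typing import List

# Variable annotation (PEP 526): valid on Python 3.6 and newer only.
sites: List[object] = []

# Type comment: the same hint in a form that older interpreters also parse.
legacy_sites = []  # type: List[object]

print(sites, legacy_sites)
```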

Alphabetical help command

For all the hard-coded commands (learned commands are an exception).

Stack Exchange OpenID deprecation

SE OpenID will be deprecated on July 25th. Most likely, this will break chat.SE login. From what I can tell, it looks like MSE and SO are unaffected, as their login doesn't seem to use OpenID, though I can't be entirely sure.

It's probable that at least one Stack Exchange chat site breaks, depending on the flow. This might work, but it's unclear how much of the current login system is affected. There are currently no POST requests to the gifs mentioned, which is why it's possible that SO and MSE login will break too.

It's hard to debug until OpenID is removed completely, since there isn't a problem yet. But it's in less than 20 days, so it still needs fixing soon.

Learn and unlearn

Breaking: Database autosave wipes data

Database autosave ends up wiping the data.

Summon and unsummon is broken

Idk why, doesn't work though

Java 10 support

Ranks not working on Discord

The ranks don't work on Discord for some reason. Could be a data storage issue

Time command - timezone input

Make it so that users can create their own vocab for the bot.

I think it would be useful if users could make their own vocab the first time before they make saves for the bot. This would require 2 things:

A way to access the chatroom posts.
A way to write to a file.

The first one can be solved by SO-chatbot by Zirak, combined with the modification of the console.log function.
The second one can be done by opening a new window and using document.write.

requirements.txt

Restart

Command to restart the bot. Would make life easier. Most likely, this means adding a gradle task and executing it before calling system.exit, but this would be much easier with a core system where the bot itself is external from the process running it. Using Python for this may be an option, and it seems fairly easy too.

Clean up requirements.txt

Requests is vulnerable to CVE-2018-18074. IIRC, all the Python code using requests was removed. With the cleanup, there's a bit of code that apparently wasn't updated

Move SE to an entirely event-based system

Currently, the flow for events on SE is:

Receives an event from the websocket
Checks if the event has an r[roomid] key (without brackets). Creates a new JsonNode
Checks if the JsonNode created in the previous step has a key called e
- If yes, moves to a bunch of if-statements with known, mapped, events
- If no, returns.

While this works, it gets messy. In addition, it's not flexible. The goal should be:

Receives an event from the websocket
Parses it like the current flow (necessary to get to the event object)
Gets the event_id
Uses the event_id to find related callbacks (if any) and forwards the JsonNode from the parsing to them

This would be more flexible, and also enable the registration of multiple callbacks for different events.
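
The target flow could be sketched like this, in Python rather than the bot's Java/Kotlin. The payload shape is an assumption: the sketch treats e as a list of event objects and reads the identifier the issue calls event_id from a field named event_type. The register and dispatch names are hypothetical.

```python
from collections import defaultdict

# event id -> list of callbacks registered for it
callbacks = defaultdict(list)


def register(event_id, callback):
    """Register one more callback for an event id."""
    callbacks[event_id].append(callback)


def dispatch(payload, room_id):
    """Forward each event in the room's payload to its callbacks; return the call count."""
    room = payload.get("r%d" % room_id)
    if room is None or "e" not in room:
        return 0
    handled = 0
    for event in room["e"]:
        # Events without registered callbacks are silently skipped.
        for callback in callbacks.get(event.get("event_type"), ()):
            callback(event)
            handled += 1
    return handled


seen = []
register(1, seen.append)
count = dispatch({"r5": {"e": [{"event_type": 1, "content": "hi"},
                               {"event_type": 2}]}}, 5)
print(count, seen)
```

Adding support for a new event then means registering a callback instead of extending the chain of if-statements.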

Summon and unsummon commands (SE)

Backup command

In light of recent bugs (I'm talking to you #50), it would be a good idea to have a backup command. This command would be accessible to rank 8 users and up, and would create a separate file. This is why it's separate from the save command: the backup command would create a new file, whereas the save command saves into the existing one.

Add message editing, deletion, and reactions

This only applies to SE and Discord. Twitch does not have message reactions, editing, deletion, starring, or anything similar.

Better learned commands and argument handling

Example of a learned command currently deployed in my instance:

name: charge
output: *charges @%s with %s*

The current usage is:

[TRIGGER]charge who, what

However, other commands work differently and split by space instead. For consistency, learned command arguments should be:

[TRIGGER]charge who what

But this presents one problem: How are multi-word arguments handled? They should be handled like this:

[TRIGGER]charge who "with what"
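
One way to get this behaviour is shell-style splitting. The sketch below uses Python's shlex module as an illustration; it is not the bot's parser, and the function names are hypothetical.

```python
import shlex


def parse_arguments(raw):
    """Split on whitespace, keeping double-quoted phrases together."""
    return shlex.split(raw)


def run_learned_command(output, raw_arguments):
    """Fill the %s placeholders of a learned command with the parsed arguments."""
    arguments = parse_arguments(raw_arguments)
    expected = output.count("%s")
    if len(arguments) != expected:
        raise ValueError("expected %d arguments, got %d" % (expected, len(arguments)))
    return output % tuple(arguments)


reply = run_learned_command("*charges @%s with %s*", 'who "with what"')
print(reply)
```

One caveat with this approach: shlex.split raises ValueError on unbalanced quotes, so an argument containing a lone apostrophe would need escaping or a more forgiving parser.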

Show username in getRank

Contributing to the Wiki

Given GitHub doesn't support pull requests for the wiki repository, contributors need a workaround to submit and collaborate on documentation.

I found a few possible suggestions on the internet:

Including a github wiki in a repository as a submodule
Enabling pull requests on github wikis setup
Contributors clone the wiki; make changes to their version of the wiki repo; submit a link to their repo as an Issue; and request for it to be added to the main wiki.

Performance issues

There are quite a lot of them. There's not a list of them at the moment.

For now, it's the obvious stuff that needs to be fixed, like unnecessary calls to get fields, unnecessary iterating, and generally other inefficient calls or design.

Doge

Such command. Much fun. WOW!

API cleanup

Most of the API is a mess. It needs to be refactored, and specifically the Java module.

The neural net could use some improvements too, specifically on memory use. It might need to be re-written from scratch.

Ping triggers commands

CLI arguments

Save time - add CLI arguments to allow direct boot without prompting the user.

Necessary stuff:

Parsing
Set variables if the CLI argument for the relevant var isn't found
Without getting in the way of the force training system

Relevant code:

Inside if __name__ == "__main__" in bot.py

Output formatting

don ' t you just hate reading messages like this ? Especially , since all the formatting is messed up by the bot and it ' s training . . .

Anyways, the "problem" (by design obviously) created here is (supposed to be) cleaned up here, but it does so in a horrible, non-functioning manner. It works perfectly fine on the first few replaces, but after that the poorly constructed replace statements don't clean up properly.
These need to be fixed
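
As an illustration of a cleanup that does not depend on the order of chained replace calls, here is a minimal regex-based sketch. It is not the bot's current code, the detokenize name is hypothetical, and it only covers apostrophes and common closing punctuation.

```python
import re


def detokenize(text):
    """Undo the extra spaces a whitespace tokenizer leaves around punctuation."""
    # Rejoin contractions and possessives: "don ' t" -> "don't".
    text = re.sub(r"(\w) ' (\w)", r"\1'\2", text)
    # Remove the whitespace before closing punctuation: "this ?" -> "this?".
    text = re.sub(r"\s+([?.!,;:])", r"\1", text)
    return text


cleaned = detokenize("don ' t you just hate this ? it ' s training . . .")
print(cleaned)
```

Each pattern is applied across the whole string in one pass, so later occurrences are handled the same way as the first ones. Quoted words surrounded by spaced apostrophes are not handled and would need a separate rule.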

Minification of bot.py

Please mark this as "enhancement".
Also please delay this until the project is finished.

Use any tricks and shortcuts you know to make bot.py smaller for quicker download size.
I know Python cannot be compressed exactly like JS (made into 1 line).
Still, it could be made a bit shorter.

netStat

Prints whether or not the Flask server is online.

Lombok doesn't work with Java 9

Could be a version problem for that matter. When using Java 9, the bot crashes at the lombok compiler. As a result, the bot doesn't run.

Functions in taught command

Learned commands are very static at the moment. They currently only support arguments with %s or {id}, but that's where it stops. By adding functions (e.g. URL encoding), the learned commands can be more complex than they currently are.

Clarify API

#54 escalated quickly.

Clean up the multi-command Kotlin files
Make the classes more clear
Merge the site config classes
Remove redundant classes
Do API calls instead of storing usernames

Refactor CommandCenter

Having one instance of every command for every site is a massive memory waste. It should be refactored to have a single instance, where each class still uses the existing functions (e.g. loadSE) to register that site as using a specific set of commands (aside from the shared ones). This would also require refactoring how the site is passed, but it shouldn't be a problem.

Java/Kotlin

I'm going to hate myself for this, but convert the main bot to Java/Kotlin and use Python for the backend

Better docs

The documentation is a mess and a lot of it is non-existent. Documentation needs to be added as soon as possible

Current progress

Setup documentation - thanks to Whispres
Command/listener documentation
- Extending existing ones
Code documentation (Javadoc/Kdoc) - not very critical.
Interaction guide
- rank documentation

Notes

Interaction guide

A basic "how to use the bot". Creating command and listener documentation would take care of a large part of this, but it should also explain how arguments work for commands (currently a comma-separated list, #61 changes that a bit but still), as well as the CLI-style arguments (--somename "argument content").

Should also cover ranks

Start and stop the network as a command

Make an `if ...` statement for discord.py.

You should make it so that the user does not have to manually disable several lines. The lines could simply be wrapped in an if statement. You could do this by adding these lines in the config.py file:

"useDiscord": "true",
"useStackExchange":"true"
