RedDictio

Using machine learning to analyze subreddit comments for hate and toxicity.

Table of Contents
  1. About The Project
  2. Contributing
  3. Contact
  4. Acknowledgments

About The Project

RedDictio is a test of our ability to create a web page, hook it up to a hosted database, scrape data from Reddit, and judge that data using a neural network. It touches several fields in computing, including Database Design, Data Engineering, Data Science, Machine Learning, Cloud Computing, and Web Development.
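
As a rough illustration of how these pieces fit together, the flow from scraped comment to scored database row can be sketched as below. This is a minimal sketch and not the project's actual code: the `Comment` fields, the `comments` table layout, the `store_scored_comments` helper, and the 0.5 threshold are all hypothetical, the neural network is represented by any function that maps text to a score, and an in-memory SQLite database stands in for the hosted one.

```python
import sqlite3
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class Comment:
    """A scraped comment (field names are hypothetical)."""
    comment_id: str
    subreddit: str
    body: str


def store_scored_comments(
    conn: sqlite3.Connection,
    comments: Iterable[Comment],
    score_fn: Callable[[str], float],
    threshold: float = 0.5,  # assumed cut-off, not taken from the project
) -> int:
    """Score each comment, save it, and return how many were flagged."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS comments ("
        "comment_id TEXT PRIMARY KEY, subreddit TEXT, body TEXT, "
        "score REAL, flagged INTEGER)"
    )
    flagged = 0
    for c in comments:
        score = score_fn(c.body)  # stand-in for the neural network
        is_flagged = int(score >= threshold)
        flagged += is_flagged
        conn.execute(
            "INSERT OR REPLACE INTO comments VALUES (?, ?, ?, ?, ?)",
            (c.comment_id, c.subreddit, c.body, score, is_flagged),
        )
    conn.commit()
    return flagged
```

Keeping the scoring function as a parameter means the model can be swapped or reworked without touching the storage code.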

Issues and Solutions

  • Data for the neural network was one of the biggest issues. Deciding what counts as hateful language is a serious matter, so it is important to detect hate with the highest accuracy possible. We tested two datasets. The first came from Reddit, but the model reached only roughly 75% accuracy on its own validation set. The second was generated by another neural network, and the model reached only 80% accuracy on its validation set. We tried combining the two, but that did not improve accuracy. Finally, we used a Twitter dataset, which reached 95% accuracy. This is by no means perfect and a higher accuracy should be aimed for, but it was a good choice for the project given the time constraints.
  • We considered hosting the neural network on Google Vertex AI or on a virtual machine. Both were good options, but we could not get either to work well at a low enough price. Vertex AI would have cost us over $100 a month, while a virtual machine would have cost a few dollars a day. Because we wanted the project to stay hosted permanently, not just for this semester, we looked for another solution and ended up hosting the neural network on Google Drive. This is certainly not the best solution, but it is the cheapest and the most effective for the price.
  • We had the option of handling all of the processing on Google Cloud, but because of our inexperience and the cost, we ended up running it on Google Colab, a cloud service provided by Google. Its first premium tier costs roughly $10 a month, which is significantly cheaper than the other options Google Cloud offered. The only drawback is that it has to be run and monitored manually, but it still uses Google's GPUs and storage instead of our own computers.
  • We tried using Google Query, but it would not work with our data and its setup tutorials were very confusing. We then tried Cloud SQL, but it was far too expensive. As a result, we went for the safer option of using SQLite3 and hosting the DB directly with the web page. This was not a good choice, since it would slow the server and make the DB very hard to edit, but we wanted an option that worked. Luckily, we discovered Amazon RDS and migrated all the data and code to MySQL. Now all the processing connects to the RDS DB and updates it remotely, while the web page accesses the same DB remotely. We can therefore update the DB dynamically and change the data whenever we want, all for free.
  • At first we hosted the web page through Google App Engine; however, we were being charged ~$10/day, which is out of the question for broke college students. Google Cloud has many similarly named products, so it was confusing to figure out which one to use. After researching the cost of each option, we switched to Google Cloud Run and have only been charged a few cents since then.
  • Since we access our database remotely and auto-deploy to Google Cloud Run, it would be a terrible idea to make those credentials public. To keep the credentials out of the repository, we used GitHub Secrets and environment variables. GitHub Secrets is a built-in GitHub feature that lets the user store credentials and other ‘secrets’ and use them in workflows. This is what enables the auto-deployment to Google Cloud Run. The environment variables had to be configured in Google Cloud Run to supply the credentials needed to access the remote database.
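
The environment-variable approach from the last point can be sketched as follows. This is a minimal sketch under stated assumptions, not the project's actual code: the variable names (`DB_HOST`, `DB_USER`, `DB_PASSWORD`, `DB_NAME`, `DB_PORT`) and the `DbConfig` and `load_db_config` names are hypothetical. The idea is that the application reads the database credentials at startup and fails loudly if any are missing, so nothing sensitive has to live in the repository.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class DbConfig:
    """Connection settings for the remote database (names are hypothetical)."""
    host: str
    user: str
    password: str
    database: str
    port: int = 3306  # default MySQL port


def load_db_config(env: Mapping[str, str] = os.environ) -> DbConfig:
    """Build the config from environment variables, failing if any is unset."""
    required = ("DB_HOST", "DB_USER", "DB_PASSWORD", "DB_NAME")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return DbConfig(
        host=env["DB_HOST"],
        user=env["DB_USER"],
        password=env["DB_PASSWORD"],
        database=env["DB_NAME"],
        port=int(env.get("DB_PORT", "3306")),
    )
```

Passing the environment in as a mapping keeps the function easy to test with a plain dictionary, while the default still reads the real process environment when deployed.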

Future Goals

  • Rework the neural network
  • Limit the number of comments displayed on one page
  • Add more neural networks
  • Allow users to vote on whether a comment is hateful or not


Built With

Neural Network

Web Page


Contributing

At this time we aren't looking for any contributors. If you feel like you have an idea that would benefit this project, please feel free to contact either Jairo or Ryan.


Contact

Jairo Garciga - LinkedIn | GitHub
Ryan Smith - LinkedIn | GitHub

Project Link: https://github.com/rpsmith77/RedDictio


Acknowledgments

Thank you to everyone who helped with this. Special shout out to:

