Agoraphon

A Flask application for analyzing activity on an online discussion forum, using scraping, indexing, analytics, relational graph and NLP.

Agor@phon

Objectives

The Agor@phon project aims to contribute to the knowledge of natural language for machine learning purposes and to provide real-world material for studying phenomena such as disinformation, propaganda, and hate/extremist speech.

Developed by a researcher-programmer for scientific research, the tool takes the form of a web platform that handles the collection and in-depth analysis of multimodal content published on an online discussion forum.

NLP / linguistic application

The collected content will make it possible to build real-world French text corpora, which are rare compared with English ones. This resource is all the more valuable because, on the forum studied here, the language has a very oral, slangy style, with idioms specific to its user community. Such language is hard for natural language processing systems, whose algorithms are mainly trained on texts written by professionals (e.g. press articles) and/or intended for a broad readership (e.g. Wikipedia).

Disinformation and hate speech investigation

The resulting datasets will also support the study of phenomena such as propaganda, fake news, and trolling, as well as hate/extremist speech, for which any online communication platform may be fertile ground. What sets the type of forum studied here apart is that users can register truly anonymously (no phone number or verified professional email is required), which facilitates opportunistic or impulsive contributions, whether starting a discussion or joining one. Moreover, most users have no interest in building or nurturing a community of followers. Unlike on other platforms, where family, friends and colleagues may identify them, they can post freely without worrying about their reputation or popularity. And when social desirability is not at stake, anything goes…

Note that, on this research topic too, large French corpora are scarce or else concentrated on easily accessible sources (e.g. Twitter).

In addition, the forum is a point of convergence for many kinds of sources: social networks, micro-blogging, video- and image-sharing platforms, news sites, and even messaging apps such as Telegram or WhatsApp, screenshots of which users share on the forum. The forum thus opens onto a spectrum much wider than itself and offers material for capturing societal trends.

How it works

[Nine images illustrating how the application works; not reproduced here.]

Stack and Architecture

The application is built with the Flask 1.1 framework and written in Python 3.8.

The whole system is based on a distributed architecture. Three servers are involved: the first is dedicated to scraping; the second to data indexing and retrieval; and the third hosts the application, where the data mining, analytics, and visualization tasks are performed.
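The flow between the three servers can be sketched as below. This is a minimal single-process illustration under stated assumptions, not the project's implementation: each tier is a plain Python class rather than a separate server, the indexing tier is a toy in-memory inverted index, and the analytics tier computes only a reply graph between authors (one possible form of the "relational graph" mentioned above). All class, method and field names (`Post`, `ScrapingTier`, `IndexingTier`, `AnalyticsTier`, `reply_to`) are hypothetical.

```python
# Single-process sketch of the three-tier flow (scraping -> indexing -> analytics).
# ASSUMPTION: in Agor@phon each tier runs on its own server and the tiers talk
# over the network; here they are plain classes so the data flow fits in one file.
# All class, method and field names are hypothetical.
import re
from collections import Counter, defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional, Set

TOKEN_RE = re.compile(r"\w+")


def tokenize(text: str) -> List[str]:
    """Naive lowercase word tokenizer (a stand-in for real French NLP tooling)."""
    return TOKEN_RE.findall(text.lower())


@dataclass(frozen=True)
class Post:
    post_id: int
    author: str
    text: str
    reply_to: Optional[int] = None  # id of the post this one answers, if any


class ScrapingTier:
    """Tier 1: normalizes raw scraped records into Post objects."""

    def collect(self, raw_records: Iterable[dict]) -> List[Post]:
        return [
            Post(r["id"], r["author"], r["text"], r.get("reply_to"))
            for r in raw_records
        ]


class IndexingTier:
    """Tier 2: stores posts and maintains an inverted index for retrieval."""

    def __init__(self) -> None:
        self.posts: Dict[int, Post] = {}
        self.index: Dict[str, Set[int]] = defaultdict(set)

    def add(self, posts: Iterable[Post]) -> None:
        for post in posts:
            self.posts[post.post_id] = post
            for token in tokenize(post.text):
                self.index[token].add(post.post_id)

    def search(self, query: str) -> List[Post]:
        """Return the posts containing every query term (boolean AND), sorted by id."""
        terms = tokenize(query)
        if not terms:
            return []
        # .get() avoids inserting empty entries into the defaultdict.
        ids = set.intersection(*(self.index.get(t, set()) for t in terms))
        return [self.posts[i] for i in sorted(ids)]


class AnalyticsTier:
    """Tier 3: analytics over data fetched from the indexing tier."""

    def __init__(self, indexer: IndexingTier) -> None:
        self.indexer = indexer

    def reply_graph(self) -> Dict[str, Counter]:
        """Directed author -> replied-to author graph, weighted by reply count."""
        graph: Dict[str, Counter] = defaultdict(Counter)
        posts = self.indexer.posts
        for post in posts.values():
            # Top-level posts and replies to posts that were never scraped have
            # no known parent and are skipped.
            parent = posts.get(post.reply_to)
            if parent is not None:
                graph[post.author][parent.author] += 1
        return dict(graph)


if __name__ == "__main__":
    raw = [
        {"id": 1, "author": "alice", "text": "Nouvelle video sur le sujet"},
        {"id": 2, "author": "bob", "text": "La video est fake", "reply_to": 1},
        {"id": 3, "author": "carol", "text": "Source ?", "reply_to": 1},
        {"id": 4, "author": "alice", "text": "La source est dans la video", "reply_to": 3},
    ]
    indexer = IndexingTier()
    indexer.add(ScrapingTier().collect(raw))
    print([p.post_id for p in indexer.search("source video")])  # [4]
    print(AnalyticsTier(indexer).reply_graph())
```

Keeping each tier behind a narrow interface (`collect`, `add`/`search`, `reply_graph`) is what would allow the tiers to be split across machines, for instance by replacing the direct method calls with HTTP requests; how Agor@phon actually connects its servers is not described here.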

Status

This project is in progress.

Contributing

Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.

Please make sure to update tests as appropriate.

Authors

  • Initial work: Stephanie BLANCHET, R&D Cognitician. Data Pythonist.
  • Contact: [email protected]

License

This project is licensed under the MIT License - see MIT for details.

