snp_filter's Introduction

SNP Filter

SNP Filter reads social media posts and filters them against three sets of tags. It is currently used to detect posts that contain terrorist propaganda. The three tag sets are:

  • Terrorism Tags
  • Motivation Tags
  • Out-of-context (OOC) Tags

Sample content of each Tag set is given in the table below:

Tag Set                 Content
Terrorism Tags (S1)     Terms for terrorism agents or terrorist actions, such as {isis, kill, threaten, taliban}
Motivation Tags (S2)    {Encourage, Inspire, etc.}
OOC Tags (S3)           {"Name of People from Unrelated Context", etc.}

Algorithms

Preprocess Data

Input: twitter
Output: preprocessedTwitter
1) Split the post into independent sentences
2) Remove the stop words from the text
3) Decompose compound tokens, such as #supportISIS => support ISIS
4) Stem the words to their roots, such as support, supported, supporting => support
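
The preprocessing steps above can be sketched in Java roughly as follows. This is a minimal illustration, not the repository's code: the class name PreprocessSketch, the tiny stop-word list, and the suffix-stripping stem method are all assumptions made for the example, and a real stemmer (for instance a Porter stemmer) would replace the naive one. Hashtags are decomposed before stop words are removed here, so that the parts of a hashtag are checked as well.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper class, not taken from the repository.
class PreprocessSketch {

    // Tiny illustrative stop-word list (assumption); a real list is much longer.
    private static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("the", "a", "an", "is", "to", "of", "and", "we"));

    // Step 1: split a post into sentences at whitespace that follows '.', '!' or '?'.
    static List<String> splitSentences(String post) {
        List<String> sentences = new ArrayList<>();
        for (String s : post.split("(?<=[.!?])\\s+")) {
            if (!s.trim().isEmpty()) {
                sentences.add(s.trim());
            }
        }
        return sentences;
    }

    // Step 3: "#supportISIS" -> "support ISIS" by splitting at lower-to-upper case boundaries.
    static String decomposeHashtag(String token) {
        String body = token.startsWith("#") ? token.substring(1) : token;
        return body.replaceAll("(?<=[a-z])(?=[A-Z])", " ");
    }

    // Step 4: naive suffix stripping (assumption), standing in for a real stemmer.
    static String stem(String word) {
        if (word.length() > 5 && word.endsWith("ing")) {
            return word.substring(0, word.length() - 3);
        }
        if (word.length() > 4 && word.endsWith("ed")) {
            return word.substring(0, word.length() - 2);
        }
        return word;
    }

    // Steps 2 to 4 for one sentence: decompose, drop stop words, lowercase and stem.
    static List<String> preprocess(String sentence) {
        List<String> tokens = new ArrayList<>();
        for (String raw : sentence.split("\\s+")) {
            for (String part : decomposeHashtag(raw).split("\\s+")) {
                String word = part.replaceAll("[^A-Za-z0-9]", "").toLowerCase(Locale.ROOT);
                if (word.isEmpty() || STOP_WORDS.contains(word)) {
                    continue;
                }
                tokens.add(stem(word));
            }
        }
        return tokens;
    }
}
```

With these assumptions, PreprocessSketch.preprocess("We supported #supportISIS.") yields the tokens support, support, isis.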

Filter Algorithm

Input: a new tweet
Output: accepted or not accepted
1) if the tweet contains words in filter set 1 (S1) then
2)     if the tweet contains words in filter set 2 (S2) then
3)         if the tweet contains words in filter set 3 (S3) then
4)             return not accepted
5)         else
6)             return accepted
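
As a rough Java sketch of the nested check above: the class name FilterSketch, the method names, and the placeholder OOC entry are hypothetical, and the tag sets hold only the sample words from the table. The pseudocode does not say what happens when a post has no word from set 1 or set 2; this sketch assumes such posts are not accepted.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical class, not taken from the repository.
class FilterSketch {

    // Sample tags from the README table; real sets would be larger.
    static final Set<String> TERRORISM_TAGS =
            new HashSet<>(Arrays.asList("isis", "kill", "threaten", "taliban"));
    static final Set<String> MOTIVATION_TAGS =
            new HashSet<>(Arrays.asList("encourage", "inspire"));
    // Placeholder entry (assumption); the README only describes this set in general terms.
    static final Set<String> OOC_TAGS =
            new HashSet<>(Arrays.asList("examplename"));

    // True when at least one token is also a tag.
    static boolean containsAny(List<String> tokens, Set<String> tags) {
        return !Collections.disjoint(tokens, tags);
    }

    // Accept only posts with an S1 word and an S2 word and no S3 word.
    static boolean accept(List<String> tokens) {
        if (!containsAny(tokens, TERRORISM_TAGS)) {
            return false; // assumption: not covered by the pseudocode
        }
        if (!containsAny(tokens, MOTIVATION_TAGS)) {
            return false; // assumption: not covered by the pseudocode
        }
        return !containsAny(tokens, OOC_TAGS);
    }
}
```

The tokens passed to accept are expected to be preprocessed already (lowercased and stemmed), so that they can be compared directly with the lowercase tags.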

How to Run the Code

Basic Filtering

Mainapp.java performs the basic filtering. Run this file and the filtered set of tweets is printed to standard output. This set should contain the tweets that carry motivational terrorism content.

Limitations

1) Multi-word Phrases

Examples of multi-word phrases: suicide belt, cut their heads off, call upon

Our matching algorithm matches keywords in a 1-gram (unigram) model, so it does not work well on multi-word phrases. For example, it is hard to match a tweet that mentions a suicide belt. The problem arises because, when we break down a sentence, the smallest unit is a token made up of a single word, such as embassy, isis, or division. The tag sets are likewise made up of single words, such as terrorism, isis, and threaten. A two-word phrase therefore never appears as a single token that could be compared against a tag.
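
The limitation can be reproduced with a few lines of Java. The sketch below is illustrative only: the class NgramSketch and its bigrams method are hypothetical and not part of the repository. It shows that a list of single-word tokens never contains a two-word phrase, and that joining adjacent tokens into 2-grams is one possible way to make such a phrase matchable.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical class, not taken from the repository.
class NgramSketch {

    // Joins each pair of adjacent tokens into one two-word token.
    static List<String> bigrams(List<String> tokens) {
        List<String> result = new ArrayList<>();
        for (int i = 0; i + 1 < tokens.size(); i++) {
            result.add(tokens.get(i) + " " + tokens.get(i + 1));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("wear", "suicide", "belt");
        // Unigram tokens cannot match a two-word tag.
        System.out.println(tokens.contains("suicide belt"));          // prints false
        // The 2-grams are "wear suicide" and "suicide belt".
        System.out.println(bigrams(tokens).contains("suicide belt")); // prints true
    }
}
```

Matching against 2-grams would also require the tag sets to contain two-word entries, which the current single-word tag sets do not.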

Contributors

shirish57, xiemengrain
