
authoridentification's Introduction


This GitHub repository contains the source code required to reproduce the results reported in the paper "Authorship Identification of Microtext Using Capsule Networks", accepted in IEEE Transactions on Computational Social Systems, 2021.

Authors

  • Chanchal Suman
  • Ayush Raj
  • Sriparna Saha
  • Pushpak Bhattacharyya

Abstract

Authorship attribution is an important task, as it identifies the author of a written text from a set of suspect authors. Different methodologies of anonymous writing have been discovered with the rising usage of social media. This anonymous writing leads to an increase in malicious and suspicious activities, and anonymity makes it difficult to find the suspect. Authorship attribution helps to find the writer of a suspect text from a set of suspects. Different social media platforms such as Twitter, Facebook, Instagram, etc. are used regularly by the users for sharing their daily life activities. Finding the writer of micro-texts is considered the toughest task, due to the shorter length of the suspect piece of text. We present a Capsule based Convolutional Neural Network model over character n-grams for performing the authorship attribution task. Capsule with Kervolutional Neural Networks (KNNs) has also been utilized for this task. We also present different analyses of our developed system, which improve its interpretability. Heat-maps for different models illustrate the relevant text fragments for the prediction task. A standard Twitter dataset is used for evaluating the performance of the developed systems. The experimental evaluation shows that capsule-based CNNs and capsule-based KNNs perform competitively and are able to outperform previous methods. The source codes will be publicly available after acceptance of this work.

Dataset

We collected tweets from a large number of authors and built two types of datasets:

  • Datasets with a varying number of tweets: we randomly selected 50 authors and built the following datasets:

    • Dataset with 50 Tweets per author
    • Dataset with 100 Tweets per author
    • Dataset with 200 Tweets per author
    • Dataset with 500 Tweets per author
    • Dataset with 1000 Tweets per author
  • Datasets with a varying number of authors: we fixed the number of tweets at 200 per author and built the following datasets:

    • Dataset with 100 authors
    • Dataset with 200 authors
    • Dataset with 500 authors
    • Dataset with 1000 authors
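
The subsets listed above could be drawn with a short sampling routine. The following is only a minimal sketch, not the code used for the paper: the function name `build_subset`, the `tweets_by_author` dictionary layout, and the fixed seed are all assumptions made for illustration.

```python
import random


def build_subset(tweets_by_author, n_authors, tweets_per_author, seed=0):
    """Pick n_authors at random and keep tweets_per_author tweets for each.

    tweets_by_author is a hypothetical dict mapping an author id to a list
    of that author's tweets; it is not the repository's actual data format.
    """
    rng = random.Random(seed)  # fixed seed so the subset is reproducible
    # Only authors with enough tweets can contribute to this subset.
    eligible = sorted(
        author
        for author, tweets in tweets_by_author.items()
        if len(tweets) >= tweets_per_author
    )
    if len(eligible) < n_authors:
        raise ValueError("not enough authors with the requested number of tweets")
    chosen = rng.sample(eligible, n_authors)
    # Sample without replacement so no tweet is repeated within an author.
    return {
        author: rng.sample(tweets_by_author[author], tweets_per_author)
        for author in chosen
    }
```

Under these assumptions, `build_subset(corpus, 50, 200)` would correspond to the "200 Tweets per author" setting, and `build_subset(corpus, 100, 200)` to the "100 authors" setting.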

Models

Character Unigram with CNN

A character unigram representation is constructed for each text and fed to the input layer of the neural network. The detailed layers of the network are shown in the accompanying diagram.
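
As a rough illustration of the character unigram step, each tweet can be turned into a fixed-length sequence of character indices before it reaches the network. This is a sketch under stated assumptions rather than the repository's preprocessing: the helper names, the reserved indices 0 (padding) and 1 (unknown character), and the default length of 140 are all hypothetical choices.

```python
PAD_ID = 0  # assumed index reserved for padding
UNK_ID = 1  # assumed index reserved for characters not seen in training


def build_char_vocab(texts):
    """Map every distinct character in the training texts to an integer id."""
    chars = sorted({ch for text in texts for ch in text})
    # Ids start at 2 because 0 and 1 are reserved above.
    return {ch: idx + 2 for idx, ch in enumerate(chars)}


def encode_unigrams(text, vocab, max_len=140):
    """Encode a text as character ids, truncated or padded to max_len."""
    ids = [vocab.get(ch, UNK_ID) for ch in text[:max_len]]
    return ids + [PAD_ID] * (max_len - len(ids))
```

The resulting integer sequences would typically be passed through an embedding layer ahead of the convolutional layers, although the exact input handling used in the paper is not stated here.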

Results

Dataset with varying number of tweets:

50 Tweets 100 Tweets 200 Tweets 500 Tweets 1000 Tweets

Supplementary file required for the main paper: supplementary-file.pdf
