Git Product home page Git Product logo

climachange_claims's Introduction

Table of contents

Introduction

Objective:

  • To tag and model ACTOR(any Person/Organisation) and their corresponding CLAIM(S) made on environmental issues, emmisions and/or climate-change and whether they support the claim or not. We also wish to generate and visualize a poltical discourse network hence generated from the annotations and analysis.

For Linux/Ubuntu based systems:

#### 1. Install python virtualenv:
$ pip install virtualenv 

#### 2a. Create a virtual environment
$ virtualenv climachange_env
## OR
$ python3 -m virtualenv climachange_env 

#### 2b. Add virtual environment to jupyter-notebook (for using the .ipynb notebooks)
$ python3 -m ipykernel install --user --name=climachange_env  

#### 3. Activate virtual environment
$ source climachange_env/bin/activate

#### 4. Create git clone repo
$ git clone https://github.com/parashar-lonewolf/ClimaChange_Claims.git

#### 5. Move to folder
$ cd ClimaChange_Claims

#### 6. Install requirements
$ pip install -r requirements.txt 

#### 7. Run Annotator
$ python3  ACTOR_CLAIM_ANNOTATOR.py 

The annotator

Instructions for using the Annotator :

  • All tagged data will be stored in ConLL format as: ConLLformat_annotator.txt (here you see ConLLformat_annotator_ank/ian.txt) as shown here
  • All claims and tagged data will be stored in Save folder (here you see Claims_ank/ian.txt)

The GUI windows

To explain how this works, As an example lets talk about a claim made by Bill Clinton from the NYT article from 1997/06/23 (this is stored in Claims.txt which is the Save folder)

" But in a sharp dispute with France and other European nations that dominated this morning's final session of ''The Summit of the Eight,'' President Clinton refused to commit the United States to a specific reduction in the emission of carbon dioxide and the other greenhouse gases that contribute to global warming, agreeing only to ''substantial reductions'' by 2010. "

1.

  • "'So do you think this has a CLAIM ?'

You answer yes by selecting a new actor, here President and Clinton and clicking Next (if the annnotator has already correctly chosen the actor for you, you may just click Next)

Window 1

2.

  • 'Select the Claim boundaries'

This brings us to the claim annotations, where we have to select the words to be used as boundaries, i.e the first word and the last word of the claim sequence. You can select commit and warming, on the window. Choose if the actor Supports or is Against the claim (in this case since he ' "refused" to commit ', it's Against) and click on Next

Window 2

  1. if you select Cancel at WINDOW 1, you remove that sentence from tagging criterias.
  2. if you select Exit the prgram ends and, when restarted, it restarts at the candidate sentence quit, so progress is autosaved by default.
  3. If you select Save Prev Annotations, it will save all current progress of dataset annotations in the Save folder as Progress_Manager.txt(saved here as Progress_Manager_ank/ian.txt), and once you exit and restart, the annotation will continue from the last annotation point.
  4. In case, you made a mistake in the annotation, the current version addresses the issue as:
    • click on Save Prev Annotations
    • press Exit
    • rerun the annotator program

ConLL annotated data format and example

  • ConLL format used: (Using above output as example)
  • FORMAT: Token no in sent | Fine-grained tag | Token entity type | token Relation | Actor-Claim-O tag | (optional)(+/-) supports/opposes ConLL data

Claim classification

All the tagged Claims are classified into 5 clusters using K-means:

Claim Cluster

The Tagger

CRF Tagger

This notebook has a functional models made using CRF as a proof of concept, since we havent annoatted enough data, the model files are also available for use in Pickles folder. We have trained 3 models each of clustered and unclustered data and run them for 500,1000 and 1500 iterations respectively

BiLSTM Tagger

The Taggers below use BERT embeddings, from here: https://github.com/google-research/bert

  1. BERT Tiny: L=2, H=128, A=2
  2. BERT base uncased: L=12, H=768, A=12

This notebook has a functional models made using BiLSTM as a proof of concept, since we havent annoatted enough data.

BiLSTM-CRF Tagger

  1. BERT Tiny: L=2, H=128, A=2

This notebook also has functional models made using BiLSTM + CRF models as a proof of concept, and that they yield much better results than only BiLSTM models, given we havent annotated enough data.

Political discourse network

As an example we show you two of our Political discourse graphs generated from our annotations, from 1997 and 2002

  • 1997
  • 2002

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.