🌮 TACO -- Twitter Arguments from COnversations

This repository contains the annotation framework, dataset and code used for the resource paper "TACO -- Twitter Arguments from COnversations". To use the baseline model, please visit Hugging Face.

Table of Contents:

Repository Layout
Findings
Example
Licensing
Contact
Acknowledgements

Repository Layout

data
1. README.md: A data-specific README for TACO and its annotation process.
2. annotation_framework.pdf: The annotation framework for TACO.
3. conversations.csv: Having stored the structure of all collected conversations.
4. majority_votes.csv: All the majority votes, which serve as the labeled ground truth.
5. worker_decisions.csv: All individual expert decisions.
notebooks
1. dataset_statistics.ipynb: For comparing the dataset statistics.
2. classifier_cv.ipynb: For training and evaluating the baseline model.
outputs
1. bertweet_cv_predictions.csv: The ground truth and cross-validation results of the baseline model.

Findings

Dataset Metadata

                 Language  Sample       Total            Query-Time        Key-Date
Abortion         English   486 (26.8%)   29,939  (5.0%)  2021/08/15-10/16  S.B.8 took effet on 2021/09/01.
Brexit           English   535 (29.5%)  427,260 (70.9%)  2020/01/01-03/01  Brexit took effect on 2020/02/01.
GoT              English   192 (10.6%)   61,705 (10.2%)  2019/04/01-05/01  GOT S8 premiered (HBO-US) on 2019/04/14.
LOTRROP          English   209 (11.5%)   14,014  (2.3%)  2022/02/01-03/01  LOTRROP teaser trailer was released on 2022/02/13.
SquidGame        English   226 (12.5%)   51,215  (8.5%)  2021/09/10-10/10  Squid Game was released (Netflix wordlwide) on 2021/09/17.
TwitterTakeover  English   166  (9.1%)   18,531  (3.1%)  2022/04/01-05/01  Elon Musk offers $43 billion to purchase Twitter on 2022/04/14.

Dataset Distribution

  Argument                    No-Argument
  865 (49.88%)                869 (50.12%)
  
  Reason       Statement      Notification  None
  581 (33.50%) 284 (16.38%)   500 (28.84%)  369 (21.28%)

Conversational Reply Patterns

              Reason  Statement   Notification    None
      Reason    0.51       0.12           0.31    0.06
   Statement    0.38       0.21           0.33    0.08
Notification    0.26       0.08           0.57    0.09
        None    0.26       0.08           0.44    0.22

Performance Multi-Class Classification Task (BERTweet)

                precision    recall  f1-score   support
        Reason     0.7369    0.7522    0.7445       581
     Statement     0.5437    0.5915    0.5666       284
  Notification     0.7902    0.7760    0.7830       500
          None     0.8387    0.7751    0.8056       369

      accuracy                         0.7376      1734
     macro avg     0.7274    0.7237    0.7249      1734
  weighted avg     0.7423    0.7376    0.7395      1734

Performance Binary Classification Task (BERTweet)

               precision    recall  f1-score   support
  No-Argument     0.8666    0.8297    0.8477       869
     Argument     0.8359    0.8717    0.8534       865

     accuracy                         0.8506      1734
    macro avg     0.8513    0.8507    0.8506      1734
 weighted avg     0.8513    0.8506    0.8506      1734

Textual Features

                   Reason Statement Notification      None
  Average Length      213       122          156        63
            URLs    34.6%      8.1%        71.6%      7.6%
   external URLs    41.8%     17.4%        49.7%     17.9%
          Emojis    11.9%     14.1%        16.0%     35.8%
        Hashtags    45.8%     38.7%        60.0%     12.2%
           Users    65.9%     68.0%        56.4%     91.3%
Discourse Marker    32.9%     19.0%        11.4%      8.7%

Error Analysis

              Reason  Statement   Notification   None
      Reason     437         76             66      2
   Statement      73        168             13     30
Notification      63         26            388     23
        None      20         39             24    286

Example Conversation

Licensing

TACO -- Twitter Arguments from Conversations by Marc Feger is licensed under CC BY-NC-SA 4.0

Contact

Please contact [email protected] or [email protected].

Acknowledgements

We thank Aylin Feger, Tillmann Junk, Andreas Burbach, Talha Caliskan, and Aaron Schneider for their contributions to the annotation process in this paper.

tomatenmarc / taco Goto Github PK

taco's Introduction

🌮 TACO -- Twitter Arguments from COnversations

Repository Layout

Findings

Dataset Metadata

Dataset Distribution

Conversational Reply Patterns

Performance Multi-Class Classification Task (BERTweet)

Performance Binary Classification Task (BERTweet)

Textual Features

Error Analysis

Example Conversation

Licensing

Contact

Acknowledgements

taco's People

Stargazers

Watchers

Forkers

Recommend Projects

Recommend Topics

Recommend Org