fake-video-corpus

To our knowledge, this is the first annotated dataset of debunked and verified user-generated videos (UGVs), together with multiple near-duplicate reposted versions of each. For details refer to Fake Video Corpus.

The dataset comprises videos from a variety of event categories, such as politics, sports, natural disasters, accidents, and wars. Currently, it consists of 200 unique debunked videos (for simplicity also referred to as fake) and 180 unique verified videos (also referred to as real). The fake videos are of several types:

  • Staged videos where actors perform scripted actions under direction.
  • Videos where contextual information is false (e.g. the claimed video location is wrong).
  • Past videos presented as UGV from breaking events.
  • Videos whose visual or audio content has been altered through editing.
  • Computer-generated Imagery (CGI) posing as real.

The dataset was extended through a largely automated, systematic process that combines text search with near-duplicate video retrieval, followed by manual annotation according to a set of guidelines. More specifically:

  1. For each video in the original set, the video title was used as input.
  2. The title was reformulated to a more general form (called the “event title”). For example, a video with title “Video Tornado IRMA en Florida EEUU Video impactante” was assigned to event “Tornado IRMA at Florida”.
  3. The event title was translated from English into four major languages: Russian, Arabic, French, and German using Google Translate. These languages were selected after preliminary tests indicated that near-duplicate videos appear with increased frequency in these languages.
  4. The video title, event title, and the four translations were submitted as separate queries to the three target platforms: YouTube, Facebook, and Twitter. All returned videos were aggregated in a common pool.
  5. A near-duplicate retrieval algorithm was used to search within this pool for near-duplicates of the video.
  6. After manual inspection, erroneous results were removed and only actual near-duplicates were retained.
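
Steps 1 to 4 above can be sketched roughly as follows. This is only an illustration of the query fan-out, not the authors' code: the function names `build_queries` and `collect_pool`, the language codes, and the `search` callback are all hypothetical, and the translations are assumed to have been obtained beforehand (for example from Google Translate).

```python
from typing import Callable, Dict, Iterable, List, Set

# Assumed language codes for Russian, Arabic, French, and German.
LANGUAGES = ("ru", "ar", "fr", "de")
PLATFORMS = ("YouTube", "Facebook", "Twitter")


def build_queries(video_title: str, event_title: str,
                  translations: Dict[str, str]) -> List[str]:
    """Return the six queries for one video: the original title,
    the event title, and the four translated event titles."""
    missing = [lang for lang in LANGUAGES if lang not in translations]
    if missing:
        raise ValueError(f"missing translations for: {missing}")
    return [video_title, event_title] + [translations[lang] for lang in LANGUAGES]


def collect_pool(queries: Iterable[str],
                 search: Callable[[str, str], Iterable[str]]) -> Set[str]:
    """Send every query to every platform and merge the returned video
    identifiers into one pool. `search(platform, query)` is a hypothetical
    callback standing in for each platform's search API."""
    pool: Set[str] = set()
    for query in queries:
        for platform in PLATFORMS:
            pool.update(search(platform, query))
    return pool
```

With six queries and three platforms, `search` is called 18 times per video. The resulting pool is what the near-duplicate retrieval step (step 5) would then search.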

The overall dataset consists of 3957 videos annotated as fake and 2458 annotated as real.

Categories for near-duplicates of fake videos include:

  • Fake: those that reproduce the same false claims
  • Uncertain: those that express doubts on the veracity of the claim
  • Debunk: those that attempt to debunk the original claim
  • Parody: those that use the content for fun/entertainment
  • Real: those that contain the earlier, original source from which the fake was made

Categories for near-duplicates of real videos include:

  • Real: those that reproduce the same factual claims
  • Uncertain: those that express doubts on the veracity of the claim
  • Debunk: those that attempt to debunk their claims as false
  • Parody: those that use the content for fun/entertainment
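
When working with the annotations, it may help to check that each near-duplicate label is valid for the class of its original video. The sketch below encodes the categories listed above. The lowercase label strings and the name `is_valid_label` are assumptions made for illustration and may not match the values actually stored in the CSV files.

```python
from typing import Dict, FrozenSet

# Allowed near-duplicate labels, keyed by the class of the original video.
# Label spellings are assumed; adjust them to the values found in the data.
ALLOWED_LABELS: Dict[str, FrozenSet[str]] = {
    "fake": frozenset({"fake", "uncertain", "debunk", "parody", "real"}),
    "real": frozenset({"real", "uncertain", "debunk", "parody"}),
}


def is_valid_label(seed_class: str, label: str) -> bool:
    """Return True if `label` is a listed category for near-duplicates
    of a video whose class is `seed_class` ("fake" or "real")."""
    allowed = ALLOWED_LABELS.get(seed_class.strip().lower())
    if allowed is None:
        raise ValueError(f"unknown seed class: {seed_class!r}")
    return label.strip().lower() in allowed
```

Note the asymmetry: a near-duplicate of a fake video can be labelled real (the earlier original source), whereas the list for real videos has no fake category.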

Facebook videos that were relevant to the dataset but were published by individual users (and thus could not be accessed through the API) were excluded from this dataset.

Dataset

The initial 200 fake and 180 real videos are contained in FVC.csv.

The near-duplicates are contained in FVC_dup.csv.

The text queries used to retrieve the near-duplicates are contained in FVC_text_queries.csv.
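
A quick way to inspect these files is to read the header and count the rows. The sketch below uses only the Python standard library and makes no assumption about column names, which are not documented here. It does assume the files are comma-separated, UTF-8 encoded, and start with a header row. The function names are hypothetical.

```python
import csv
from typing import IO, List, Tuple


def summarize_csv(stream: IO[str]) -> Tuple[List[str], int]:
    """Return (column names, number of data rows) for an open CSV stream."""
    reader = csv.DictReader(stream)
    row_count = sum(1 for _ in reader)
    return list(reader.fieldnames or []), row_count


def summarize_file(path: str) -> Tuple[List[str], int]:
    """Open `path` as UTF-8 text and summarize it with `summarize_csv`."""
    with open(path, newline="", encoding="utf-8") as handle:
        return summarize_csv(handle)
```

For example, calling `summarize_file("FVC.csv")` from the repository root returns the column names together with the number of videos listed in the file. The same call works for FVC_dup.csv and FVC_text_queries.csv.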

License and acknowledgement

The video dataset is provided under the Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) license.

The video dataset is supported by the InVID project, which is funded by the European Commission under contract number 687786.

If you use this video dataset for your research, please include a citation to the following paper: Papadopoulou, O., Zampoglou, M., Papadopoulos, S., & Kompatsiaris, Y. (2018). A Corpus of Debunked and Verified User-Generated Videos. Online Information Review. Accepted for publication.

@article{papadopoulou2018corpus,
  author = "Papadopoulou, Olga and Zampoglou, Markos and Papadopoulos, Symeon and Kompatsiaris, Ioannis",
  title = "A corpus of debunked and verified user-generated videos",
  journal = "Online Information Review",
  doi = "10.1108/OIR-03-2018-0101",
  year={2018},
  publisher={Emerald Publishing Limited}
}

If you encounter any issues in this process, please get in touch with Olga Papadopoulou [email protected].
