Along with the COVID-19 pandemic, we are also fighting an "infodemic". Fake news and rumors are rampant on social media, and believing in rumors can cause significant harm. The harm is further exacerbated during a pandemic.
To tackle this, we curate and release a manually annotated dataset of real and fake news on COVID-19, collected from social media posts and articles. We frame the problem as binary classification (real vs. fake) and benchmark the annotated dataset with four machine learning baselines: Decision Tree (DT), Logistic Regression (LR), Gradient Boosted Decision Trees (GDBT), and Support Vector Machine (SVM). We have also implemented LSTM and BERT models, for six models in total.
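To make the binary setup concrete, here is a minimal, dependency-free sketch of one kind of baseline: logistic regression on bag-of-words counts, trained with batch gradient descent on log loss. This README does not specify the features, library, or hyperparameters used for the baselines, so the tokenizer, learning rate, epoch count, and toy posts below are all hypothetical; it is an illustration, not the authors' implementation.

```python
import math
import re
from collections import Counter

def tokenize(text):
    # Lowercase and keep alphanumeric word tokens only (a simplifying assumption).
    return re.findall(r"[a-z0-9]+", text.lower())

def featurize(text, vocab):
    # Bag-of-words counts, restricted to words seen during training.
    return Counter(t for t in tokenize(text) if t in vocab)

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train_logreg(texts, labels, epochs=200, lr=0.5):
    """Batch gradient descent on log loss. labels: 1 = fake, 0 = real."""
    vocab = {t for text in texts for t in tokenize(text)}
    feats = [featurize(t, vocab) for t in texts]
    weights = {t: 0.0 for t in vocab}
    bias = 0.0
    n = len(texts)
    for _ in range(epochs):
        grad_w = {t: 0.0 for t in vocab}
        grad_b = 0.0
        for x, y in zip(feats, labels):
            z = bias + sum(weights[t] * c for t, c in x.items())
            err = sigmoid(z) - y  # derivative of log loss with respect to z
            for t, c in x.items():
                grad_w[t] += err * c
            grad_b += err
        # Average the gradient over the batch and take one step.
        for t in vocab:
            weights[t] -= lr * grad_w[t] / n
        bias -= lr * grad_b / n
    return vocab, weights, bias

def predict(model, text):
    # Returns 1 (fake) if the predicted probability is at least 0.5, else 0 (real).
    vocab, weights, bias = model
    x = featurize(text, vocab)
    z = bias + sum(weights[t] * c for t, c in x.items())
    return 1 if sigmoid(z) >= 0.5 else 0

# Hypothetical toy posts, not taken from the dataset.
TRAIN_TEXTS = [
    "drinking bleach cures covid",
    "5g towers spread the virus",
    "garlic cures covid instantly",
    "who reports new covid cases today",
    "ministry of health confirms vaccine trial results",
    "cdc updates guidance on masks",
]
TRAIN_LABELS = [1, 1, 1, 0, 0, 0]

if __name__ == "__main__":
    model = train_logreg(TRAIN_TEXTS, TRAIN_LABELS)
    print(predict(model, "bleach cures the virus"))
    print(predict(model, "health ministry confirms new cases"))
```

A real benchmark would use a library implementation with regularization, richer features such as TF-IDF, and a held-out evaluation split.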
For example, the following two posts belong to fake and real categories, respectively.
The model that achieves the highest F1 score is selected as the best model.
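The selection rule amounts to scoring every model's predictions on the same labelled split and taking the maximum. The sketch below assumes binary F1 with fake (label 1) as the positive class; the reported scores may instead be weighted or macro averages, so treat it as illustrative. The model names and predictions in the usage example are hypothetical and are not the benchmark results.

```python
def f1_score(y_true, y_pred, positive=1):
    """Binary F1 = 2 * precision * recall / (precision + recall) for the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision or recall is 0 (or undefined), so F1 is 0.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def select_best(predictions, y_true):
    """predictions maps model name -> predicted labels; returns (best_name, best_f1)."""
    scores = {name: f1_score(y_true, preds) for name, preds in predictions.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]

if __name__ == "__main__":
    y_true = [1, 1, 0, 0, 1, 0]
    preds = {
        "model_a": [1, 1, 0, 0, 0, 0],  # P = 1.0, R = 2/3
        "model_b": [1, 0, 1, 0, 1, 1],  # P = 0.5, R = 2/3
    }
    best, score = select_best(preds, y_true)
    print(f"{best} {score:.3f}")  # prints "model_a 0.800"
```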
Dataset Used: https://github.com/diptamath/covid_fake_news/tree/main/data
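A minimal sketch for reading the linked data into (text, label) pairs using only the standard library. The column names `tweet` and `label` and the label strings `real`/`fake` are assumptions about the CSV files in that folder; check the header of the file you download and adjust `text_col` and `label_col` if they differ.

```python
import csv

# Assumed label strings in the CSV; fake news is treated as the positive class.
LABELS = {"fake": 1, "real": 0}

def load_posts(path, text_col="tweet", label_col="label"):
    """Read (texts, labels) from a CSV. Column names are assumptions; verify the header."""
    texts, labels = [], []
    with open(path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            texts.append(row[text_col].strip())
            labels.append(LABELS[row[label_col].strip().lower()])
    return texts, labels
```

An unexpected label string raises a `KeyError`, which surfaces annotation typos early instead of silently mislabelling a post.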
- The SVM model achieves the highest F1 score (93.45%), so we conclude that, among the models evaluated, it is best at detecting fake news on this dataset.
- In descending order of F1 score, the models rank: SVM > LR > BERT > LSTM > GDBT > DT