Large multi-labelled news dataset for the study of misinformation in news articles
Dataset of 713k articles collected between February 2018 and November 2018 from 194 news and media outlets, including what the authors consider mainstream, hyper-partisan, and conspiracy sources. The authors incorporate ground truth ratings of the sources from eight different "assessment sites" (NewsGuard, Pew Research Center, Wikipedia, OpenSources, Media Bias/Fact Check (MBFC), AllSides, BuzzFeed News, and Politifact) covering multiple dimensions of veracity, including reliability, bias, transparency, adherence to journalistic standards, and consumer trust.
Detailed information about this dataset can be found in the publication: Nørregaard, J., et al. (2019). NELA-GT-2018: A large multi-labelled news dataset for the study of misinformation in news articles. Proceedings of the International AAAI Conference on Web and Social Media, Vol. 13.