Open Datasets by Third Parties Related to COVID-19
CDC COVID Data Tracker
Centers for Disease Control and Prevention (CDC)
The website provides official statistics on cases, deaths, vaccinations and trends of COVID-19 in the United States, updated daily by 8 pm ET. Data are shown in maps and charts, and can be downloaded in tabular format.
Carnegie Mellon University
Twitter misinformation dataset called "CMU-MisCov19", with 4,573 annotated tweets across 17 themes in the COVID-19 discourse. It also includes an annotation codebook for the different COVID-19 themes on Twitter, with descriptions and examples, which the community can use to collect further annotations. Further details on the dataset, and our analysis based on it, can be found in Memon, S. A., & Carley, K. M. (2020). Characterizing COVID-19 Misinformation Communities Using a Novel Twitter Dataset. arXiv:2008.00791. In adherence to Twitter’s terms and conditions, full tweet JSONs are not provided; instead, a ".csv" file with the tweet IDs is supplied so that the tweets can be rehydrated. The dataset also provides the annotations and the creation date of each tweet, so that the results of our analyses can be reproduced.
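Because only tweet IDs are distributed, working with an ID-only dataset like this one usually starts by reading the IDs and grouping them into batches for a hydration tool. The following is a minimal sketch under stated assumptions: the column names (`tweet_id`, `annotation`, `created_at`), the sample rows, and the batch size are all hypothetical and should be matched to the actual file headers and to the limits of whichever hydration tool is used.

```python
import csv
import io
from typing import Iterator, List

# Hypothetical sample rows; the real file's columns and values may differ.
SAMPLE_CSV = """tweet_id,annotation,created_at
1240000000000000001,theme_a,2020-03-18
1240000000000000002,theme_b,2020-03-18
1240000000000000003,theme_a,2020-03-19
"""


def read_tweet_ids(handle, id_column: str = "tweet_id") -> List[str]:
    """Read tweet IDs as strings so that long numeric IDs are never rounded."""
    reader = csv.DictReader(handle)
    return [row[id_column].strip() for row in reader if row.get(id_column)]


def batched(ids: List[str], size: int = 100) -> Iterator[List[str]]:
    """Yield consecutive batches of at most `size` IDs."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


# Example run on the in-memory sample; replace io.StringIO(...) with
# open("path/to/file.csv", newline="") for a real file.
ids = read_tweet_ids(io.StringIO(SAMPLE_CSV))
batches = list(batched(ids, size=2))
print(batches)
```

Keeping the IDs as strings is a deliberate choice: spreadsheet tools and float-based parsers can silently alter long numeric identifiers.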
The Pennsylvania State University
Diverse COVID-19 healthcare misinformation dataset, including fake news on websites and social platforms, along with users' social engagement with such news. It includes 4,251 news items, 296,000 related user engagements, 926 social platform posts about COVID-19, and ground truth labels.
CORD-19: The Covid-19 Open Research Dataset
Allen Institute for AI
CORD-19 is a resource of over 200,000 scholarly articles, including over 100,000 with full text, about COVID-19, SARS-CoV-2, and related coronaviruses. This freely available dataset is provided to the global research community to apply recent advances in natural language processing and other AI techniques to generate new insights in support of the ongoing fight against this infectious disease. There is a growing urgency for these approaches because of the rapid acceleration in new coronavirus literature, making it difficult for the medical research community to keep up.
Coronavirus (COVID-19) Tweets Dataset
Jawaharlal Nehru University
This dataset includes CSV files that contain IDs and sentiment scores of the tweets related to the COVID-19 pandemic. The tweets have been collected by an ongoing project deployed at https://live.rlamsal.com.np. The model monitors the real-time Twitter feed for coronavirus-related tweets using 90+ different keywords and hashtags that are commonly used when referencing the pandemic. This dataset was wholly redesigned on March 20, 2020, to comply with the content redistribution policy set by Twitter.
Coronavirus en YouTube
Universitat Politècnica de València
This dataset contains, on the one hand, the initial sample of 73,268 videos retrieved on YouTube from specific queries related to COVID-19 and Spain and, on the other hand, the final sample of 39,702 videos in which the term coronavirus, COVID-19 or SARS-COV-2 appears in the title or description. Different descriptors (author, channel, publication date, categorization, title, duration) and metrics (views, likes, dislikes and comments) are offered for each video. The data was obtained using the Webometric Analyst tool during May 2020. This dataset is part of the study entitled "Covid-19: metric analysis of videos and communication channels on YouTube", accepted for publication in the journal El profesional de la Información.
COVID-19 Data Repository
Johns Hopkins University
Data repository for the 2019 Novel Coronavirus Visual Dashboard operated by the Johns Hopkins University Center for Systems Science and Engineering (JHU CSSE). It is also supported by the ESRI Living Atlas Team and the Johns Hopkins University Applied Physics Lab (JHU APL).
COVID-19 Fact-checkers Dataset
Social Media Lab - Ryerson University
The COVID-19 Fact Checkers Dataset is a comprehensive list of over 200 active fact-checking organizations and groups that verify COVID-19 misinformation. The dataset is maintained by Ryerson University’s Social Media Lab as part of an international initiative to study the proliferation of COVID-19 misinformation and to map fact-checking activities around the world in partnership with the World Health Organization (WHO). It was created to provide the public with a better understanding of the COVID-19 fact-checking ecosystem and is intended for use by policy makers and others to make data-informed decisions in the fight against COVID-19 misinformation.
COVID-19 Mobility Monitoring project
ISI Foundation and Cuebiq
Data from the COVID-19 Mobility Monitoring project, which analyses anonymized location data to understand the effect of mobility restrictions and behavioral changes on the current international COVID-19 outbreak.
COVID-19 Reddit Algo-Tracker
COVID-19 content being promoted by Reddit algorithms
COVID-19 World Survey Data API
University of Maryland
API for accessing the daily global Facebook symptoms survey data
University of Southern California
Ongoing collection of tweet IDs associated with the novel coronavirus COVID-19 (SARS-CoV-2), which commenced on January 28, 2020. Twitter’s search API was used to gather historical Tweets from the preceding 7 days, so the first Tweets in our dataset date back to January 21, 2020. Twitter’s streaming API was leveraged to follow specified accounts and to collect, in real time, Tweets that mention specific keywords. To comply with Twitter’s Terms of Service, only the Tweet IDs of the collected Tweets are publicly released. The data is released for non-commercial research use.
COVID19 Infodemics Observatory
CoMuNe Lab - Fondazione Bruno Kessler
Results from the analysis of infodemics due to unreliable content in online social media. Specifically, public posts on Twitter are analyzed with state-of-the-art machine learning techniques for: (1) population emotional state; (2) bot/human classification; (3) news reliability.
Sardar Vallabhbhai National Institute of Technology
Dataset that comprises labelled news articles of misinformation spread during the Infodemic. It contains approximately 2,800 news articles, real and fake, scraped from Poynter and other fact-checking sites. It also contains information such as the source URL, publication date and country of origin of each news article. Potential applications of this dataset include research areas such as classification of intention, the study of spatial and temporal features, and linguistic indicators, which can provide further insight into misinformation and help mitigate its effect as much as possible.
CSH Covid-19 Control Strategies List (CCCSL)
Complexity Science Hub
A structured open dataset of government interventions in response to COVID-19
Data for the Open COVID-19 Data Working Group
University of Washington
Location for summaries and analysis of data related to n-CoV 2019, first reported in Wuhan, China
Data on COVID-19 by Our World in Data
Our World in Data
The complete COVID-19 dataset is a collection of COVID-19 data maintained by Our World in Data. It is updated daily and includes data on confirmed cases, deaths, hospitalizations, and testing, as well as other variables of potential interest.
A series of projects to identify, collect, and analyze the value data can provide to the ongoing COVID-19 pandemic
The GDELT Project monitors the world broadcast, print, and web news from nearly every corner of every country in over 100 languages and identifies the people, locations, organizations, themes, sources, emotions, counts, quotes, images and events driving our global society every second of every day, creating a free open platform for computing on the entire world.
Health Intervention Tracking for COVID-19 (HIT-COVID) Data
Boston University and Johns Hopkins University
The Health Intervention Tracking for COVID-19 (HIT-COVID) project tracks the implementation and relaxation of public health and social measures (PHSMs) taken by governments to slow transmission of SARS-CoV-2 globally. Hundreds of volunteer data contributors were trained, provided with standardized field definitions, and given access to an online forum for asking questions and sharing ideas.
Institutional and news media tweet dataset for COVID-19 social science research
Universitat Autònoma de Barcelona
Open access data repository for the institutional/news media tweet dataset collected during the COVID-19 pandemic
Mozilla COVID dataset
Data about user browsing in Mozilla Firefox to understand social distancing
The CoVidAffect dataset
Data from CoVidAffect, a nationwide citizen science project that aims to provide longitudinal data on mood changes following the COVID-19 outbreak in the Spanish territory
Webhose’s free datasets
News articles, blog posts and online discussions that mention “CoronaVirus”
COVID-19 related content across Wikipedia projects
MediaFutures is funded by the European Union's Horizon 2020 Programme, under grant agreement number 951962. MediaFutures is a Europe-wide consortium. This website is managed on behalf of the consortium by Eurecat, whose main address is Carrer de Bilbao, 72, 08013 Barcelona (Spain).