COVID-19 related datasets

Open Datasets by Third Parties Related to COVID-19

CDC COVID Data Tracker

Centers for Disease Control and Prevention (CDC)


The website provides official statistics on cases, deaths, vaccinations and trends of COVID-19 in the United States, updated daily by 8 pm ET. Data are shown in maps and charts, and can be downloaded in tabular format.

CMU-MisCov19

Carnegie Mellon University


Twitter misinformation dataset called "CMU-MisCov19", with 4,573 annotated tweets covering 17 themes in the COVID-19 discourse. It also includes an annotation codebook for the different COVID-19 themes on Twitter, with their descriptions and examples, which the community can use to collect further annotations. Further details about the dataset, and our analysis based on it, can be found in Memon, S. A., & Carley, K. M. (2020). Characterizing COVID-19 Misinformation Communities Using a Novel Twitter Dataset. arXiv:2008.00791. In adherence to Twitter’s terms and conditions, full tweet JSON objects are not provided; instead, a .csv file lists the tweet IDs so that the tweets can be rehydrated. The dataset also provides the annotations and the creation date of each tweet so that the results of our analyses can be reproduced.
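Because the dataset ships tweet IDs rather than full tweets, a typical first step before rehydration is to load the IDs and split them into request-sized batches. The sketch below is a minimal illustration under stated assumptions: the CSV column name `tweet_id` is hypothetical (check the actual header), and the batch size of 100 reflects the per-request limit of Twitter's v1.1 `statuses/lookup` endpoint. The API call itself is deliberately left out.

```python
import csv
import io
from typing import Iterator, List

# Assumption: the lookup endpoint accepts at most 100 IDs per request,
# as Twitter's v1.1 statuses/lookup did.
BATCH_SIZE = 100


def read_tweet_ids(csv_text: str, id_column: str = "tweet_id") -> List[str]:
    """Return tweet IDs from CSV text.

    The default column name "tweet_id" is hypothetical; pass the real
    header name used by the dataset file.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    # Keep IDs as strings: 64-bit tweet IDs lose precision if parsed as floats.
    return [row[id_column].strip() for row in reader if row[id_column].strip()]


def batches(ids: List[str], size: int = BATCH_SIZE) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` IDs."""
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


if __name__ == "__main__":
    # Placeholder rows for illustration only, not real dataset content.
    sample = "tweet_id,label\n1234567890123456789,misinfo\n42,other\n"
    for batch in batches(read_tweet_ids(sample)):
        print(batch)  # one request's worth of IDs
```

Keeping IDs as strings avoids a common pitfall when spreadsheets or numeric parsers silently round large IDs; the same batching step applies to the other ID-only tweet datasets in this list.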

CoAID

The Pennsylvania State University


Diverse COVID-19 healthcare misinformation dataset, including fake news on websites and social platforms, along with users' social engagement with such news. It includes 4,251 news items, 296,000 related user engagements, 926 social platform posts about COVID-19, and ground truth labels.

CORD-19: The Covid-19 Open Research Dataset

Allen Institute for AI


CORD-19 is a resource of over 200,000 scholarly articles, including over 100,000 with full text, about COVID-19, SARS-CoV-2, and related coronaviruses. This freely available dataset is provided to the global research community to apply recent advances in natural language processing and other AI techniques to generate new insights in support of the ongoing fight against this infectious disease. There is a growing urgency for these approaches because of the rapid acceleration in new coronavirus literature, making it difficult for the medical research community to keep up.

Coronavirus (COVID-19) Tweets Dataset

Jawaharlal Nehru University


This dataset includes CSV files that contain the IDs and sentiment scores of tweets related to the COVID-19 pandemic. The tweets have been collected by an ongoing project deployed at https://live.rlamsal.com.np. The model monitors the real-time Twitter feed for coronavirus-related tweets using 90+ different keywords and hashtags that are commonly used when referencing the pandemic. The dataset was completely redesigned on March 20, 2020, to comply with the content redistribution policy set by Twitter.

Coronavirus en YouTube

Universitat Politècnica de València


This dataset contains, on the one hand, the initial sample of 73,268 videos retrieved on YouTube from specific queries related to COVID-19 and Spain and, on the other hand, the final sample of 39,702 videos in which the term coronavirus, COVID-19, or SARS-CoV-2 appears in the title or description of the video. Different descriptors (author, channel, publication date, categorization, title, duration) and metrics (views, likes, dislikes, and comments) are offered for each video. The data were obtained using the Webometric Analyst tool during May 2020. This dataset is part of the study entitled "Covid-19: metric analysis of videos and communication channels on YouTube", accepted for publication in the journal El profesional de la Información.

COVID-19 Data Repository

Johns Hopkins University


Data repository for the 2019 Novel Coronavirus Visual Dashboard operated by the Johns Hopkins University Center for Systems Science and Engineering (JHU CSSE), also supported by the ESRI Living Atlas Team and the Johns Hopkins University Applied Physics Lab (JHU APL).

COVID-19 Fact-checkers Dataset

Social Media Lab - Ryerson University


The COVID-19 Fact Checkers Dataset is a comprehensive list of over 200 active fact-checking organizations and groups that verify COVID-19 misinformation. The dataset is maintained by Ryerson University’s Social Media Lab as part of an international initiative, in partnership with the World Health Organization (WHO), to study the proliferation of COVID-19 misinformation and to map fact-checking activities around the world. It was created to give the public a better understanding of the COVID-19 fact-checking ecosystem and is intended for use by policymakers and others to make data-informed decisions in the fight against COVID-19 misinformation.

COVID-19 Mobility Monitoring project

ISI Foundation and Cuebiq


Data from the COVID-19 Mobility Monitoring project, which analyzes anonymized location data to understand the effect of mobility restrictions and behavioral changes on the ongoing international COVID-19 outbreak.

COVID-19 Reddit Algo-Tracker

Cornell University


COVID-19 content being promoted by Reddit's algorithms

COVID-19 World Survey Data API

University of Maryland


API for accessing the daily global Facebook symptoms survey data

COVID-19-TweetIDs

University of Southern California


Ongoing collection of tweet IDs associated with the novel coronavirus COVID-19 (SARS-CoV-2), which commenced on January 28, 2020. Twitter’s search API was used to gather historical tweets from the preceding 7 days, so the first tweets in our dataset date back to January 21, 2020. Twitter’s streaming API was leveraged to follow specified accounts and to collect, in real time, tweets that mention specific keywords. To comply with Twitter’s Terms of Service, only the tweet IDs of the collected tweets are publicly released. The data is released for non-commercial research use.

COVID19 Infodemics Observatory

CoMuNe Lab - Fondazione Bruno Kessler


Results from the analysis of infodemics caused by unreliable content on online social media. Specifically, public posts on Twitter are analyzed with state-of-the-art machine learning techniques for (1) population emotional state, (2) bot/human classification, and (3) news reliability.

COVID19FN

Sardar Vallabhbhai National Institute of Technology


Dataset comprising labeled news articles about misinformation spread during the infodemic. It contains approximately 2,800 news articles, real and fake, scraped from Poynter and other fact-checking sites. It also contains information such as the source URL, publication date, and country of origin of each news article. Potential applications include research on intent classification, spatial and temporal features, and linguistic cues that could provide further insight into misinformation and help mitigate its effects.

CSH Covid-19 Control Strategies List (CCCSL)

Complexity Science Hub


A structured open dataset of government interventions in response to COVID-19

Data for the Open COVID-19 Data Working Group

University of Washington


Location for summaries and analysis of data related to 2019-nCoV, first reported in Wuhan, China

Data on COVID-19 by Our World in Data

Our World in Data


The complete COVID-19 dataset is a collection of COVID-19 data maintained by Our World in Data. It is updated daily and includes data on confirmed cases, deaths, hospitalizations, and testing, as well as other variables of potential interest.

Data4COVID19

The GovLab


A series of projects to identify, collect, and analyze the value that data can provide in responding to the ongoing COVID-19 pandemic

GDELT

Google Jigsaw


The GDELT Project monitors the world's broadcast, print, and web news from nearly every corner of every country in over 100 languages and identifies the people, locations, organizations, themes, sources, emotions, counts, quotes, images, and events driving our global society every second of every day, creating a free open platform for computing on the entire world.

Health Intervention Tracking for COVID-19 (HIT-COVID) Data

Boston University and Johns Hopkins University


The Health Intervention Tracking for COVID-19 (HIT-COVID) project tracks the implementation and relaxation of public health and social measures (PHSMs) taken by governments to slow the transmission of SARS-CoV-2 globally. Hundreds of volunteer data contributors were trained and provided with standardized field definitions and access to an online forum for asking questions and sharing ideas.

Institutional and news media tweet dataset for COVID-19 social science research

Universitat Autònoma de Barcelona


Open-access data repository for a dataset of institutional and news media tweets during the COVID-19 pandemic

Mozilla COVID dataset

Mozilla Foundation


Data on user browsing in Mozilla Firefox, used to understand social distancing

The CoVidAffect dataset

CoVidAffect project


Data from CoVidAffect, a nationwide citizen science project aimed at providing longitudinal data on mood changes following the COVID-19 outbreak in Spain

Webhose’s free datasets

Webhose


News articles, blog posts and online discussions that mention “CoronaVirus”

WMF COVID-19

Wikimedia Foundation


COVID-19 related content across Wikipedia projects