General Datasets

Open datasets and dataset collections by third parties

Components.one datasets

Components publication and research group


A collection of large media-related datasets, including 2.7 million news articles and essays from the last 7 years and 10,000 articles from the front page of the Times.

Data.europa.eu

European Union


Data.europa.eu, the official portal for European data, is the access point for open data from Europe and its Member States. It already includes over 1 million datasets from 36 states, labelled and categorized according to different criteria. It is a good starting point for finding datasets on a variety of topics.

Kaggle dataset collection

Kaggle (Google LLC)


A collaborative collection of datasets for data science and machine learning. It includes datasets shared by the community on a very wide range of topics, under different licences and in different formats.

Planet.osm - OpenStreetMap data dumps

OpenStreetMap


OpenStreetMap is a collaborative project to create a free, editable geographic database of the world. Planet.osm is a weekly dump of all OpenStreetMap geographic data for the whole planet. There are also files called Extracts, which contain OpenStreetMap data for individual continents, countries, and metropolitan areas.

Webhose’s free datasets

Webhose


A collection of free datasets that includes data from a range of different sources, languages and categories. Data sources include news articles, blog posts and online discussions.

Wikipedia data dumps

Wikimedia Foundation


The Wikimedia downloads provide data dumps for all the Wikimedia projects, including Wikipedia in each language edition. The dumps are updated monthly, and include the latest version of the wiki, as well as the complete log of all the previous revisions of each page in XML format.
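Because a dump with full revision history is far too large to load into memory, it is usually read as a stream. The following is a minimal sketch, not an official tool: it assumes the standard MediaWiki export layout (`page` elements containing a `title` and one or more `revision` elements), and the function name `count_revisions` and the inline `SAMPLE` document are made up for illustration.

```python
import io
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the XML namespace, e.g. '{http://...}page' -> 'page'."""
    return tag.rsplit("}", 1)[-1]


def count_revisions(source):
    """Yield (title, revision_count) for each <page>, reading as a stream."""
    title, revisions = None, 0
    for _event, elem in ET.iterparse(source, events=("end",)):
        name = local_name(elem.tag)
        if name == "title":
            title = elem.text
        elif name == "revision":
            revisions += 1
            elem.clear()  # drop the revision text once it has been counted
        elif name == "page":
            yield title, revisions
            title, revisions = None, 0
            elem.clear()  # free the finished page before moving on


# Tiny hand-written stand-in for a dump file (assumption: real dumps
# follow the same page/title/revision structure, with more fields).
SAMPLE = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">
  <page>
    <title>Example</title>
    <revision><id>1</id><text>first</text></revision>
    <revision><id>2</id><text>second</text></revision>
  </page>
  <page>
    <title>Other</title>
    <revision><id>3</id><text>only</text></revision>
  </page>
</mediawiki>"""

if __name__ == "__main__":
    for page_title, n in count_revisions(io.BytesIO(SAMPLE)):
        print(page_title, n)
```

For a real dump, the same function can be given an open file object instead of the in-memory sample. Compressed dumps can be opened with the standard library's `bz2.open` or `gzip.open`, depending on the file.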

World news media - GDELT

Google Jigsaw


The GDELT Project monitors the world's broadcast, print, and web news from nearly every country in over 100 languages. It identifies the people, locations, organizations, themes, sources, emotions, counts, quotes, images and events driving global society, and provides a free, open platform for computing on the entire world.