Open tools for retrieving data from online platforms
DMI TCAT
Twitter Capture and Analysis Toolset
Toolset to retrieve tweets from the Twitter API, and to refine and analyze them in various ways. DMI-TCAT provides robust and reproducible data capture and analysis, and interlinks with existing analytical software.
Facebook Marketing API
pySocialWatcher Python library
A social data collector for the Facebook Marketing API. This tool queries the Facebook Marketing API for the number of Facebook users sharing a specific interest, filtered by demographic criteria such as country, age range, gender, education level, language and citizenship.
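Because each audience count corresponds to one combination of filters, a collection run amounts to enumerating every combination and sending one query per combination. The sketch below only builds those combinations and sends nothing. The key names (`geo_locations`, `age_min`, `age_max`, `genders`) follow the Marketing API targeting-spec format as we understand it, and the filter values are hypothetical. This is not pySocialWatcher's own interface.

```python
from itertools import product

# Hypothetical filter values for illustration only.
COUNTRIES = ["IT", "FR"]
AGE_RANGES = [(18, 24), (25, 34)]
GENDERS = {"male": 1, "female": 2}  # assumed Marketing API codes: 1 = male, 2 = female


def build_targeting_specs(countries, age_ranges, genders):
    """Return one targeting-spec dict per combination of demographic filters."""
    specs = []
    for country, (age_min, age_max), gender in product(
        countries, age_ranges, genders.values()
    ):
        specs.append({
            "geo_locations": {"countries": [country]},
            "age_min": age_min,
            "age_max": age_max,
            "genders": [gender],
        })
    return specs


specs = build_targeting_specs(COUNTRIES, AGE_RANGES, GENDERS)
print(len(specs))  # 2 countries * 2 age ranges * 2 genders = 8
```

The number of queries grows multiplicatively with each added filter, so adding one more dimension (for example languages) quickly multiplies the number of API calls.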
Hydrator
Reconstructing Twitter ID datasets
Electron-based desktop application for hydrating Twitter ID datasets. Twitter's Terms of Service do not allow sharing full Twitter datasets, but tweet IDs can be shared. Hydrator reconstructs a whole Twitter dataset, in JSON or CSV format, from the tweet IDs by querying the Twitter API for the data corresponding to each ID.
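Hydration works in batches because the tweet lookup endpoints accept at most 100 IDs per request. The following minimal sketch splits an ID list into such batches and prepares one lookup URL per batch. It assumes the v2 lookup endpoint `https://api.twitter.com/2/tweets` with an `ids` parameter, omits authentication and fetching, and is not Hydrator's own code.

```python
from urllib.parse import urlencode

LOOKUP_URL = "https://api.twitter.com/2/tweets"  # assumed v2 tweet lookup endpoint
BATCH_SIZE = 100  # lookup endpoints accept at most 100 IDs per request


def batch_ids(tweet_ids, size=BATCH_SIZE):
    """Yield consecutive slices of at most `size` IDs from a list."""
    for start in range(0, len(tweet_ids), size):
        yield tweet_ids[start:start + size]


def lookup_urls(tweet_ids):
    """Build one lookup URL per batch; authentication and fetching are left out."""
    return [
        LOOKUP_URL + "?" + urlencode({"ids": ",".join(batch)})
        for batch in batch_ids(tweet_ids)
    ]


ids = [str(1000 + i) for i in range(250)]  # hypothetical tweet IDs
urls = lookup_urls(ids)
print(len(urls))  # 3 requests: 100 + 100 + 50 IDs
```

IDs of deleted or protected tweets are not returned by the API, which is why a hydrated dataset is usually smaller than the original ID list.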
Hyphe
Web corpus curation
Website crawler with a built-in web interface for exploration and control. It lets users build a web corpus as a set of web pages and the links between them, which they curate and organize. The exploration starts from an initial seed of one or more websites chosen by the user, which is then expanded over one or more iterations.
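The seed-and-expand process can be modeled as a breadth-first expansion over a link graph, where each iteration follows the links of the sites added in the previous round and the user's curation decisions filter what enters the corpus. The sketch below is an illustrative model only. The link graph is hypothetical and stands in for crawled pages, and the `rejected` set is a simplified stand-in for the user's decisions. Hyphe itself is operated through its web interface, not through this code.

```python
# Hypothetical link graph standing in for pages fetched by a crawler.
LINKS = {
    "a.org": ["b.org", "c.org"],
    "b.org": ["d.org"],
    "c.org": ["spam.com"],
    "d.org": [],
    "spam.com": [],
}


def expand_corpus(seeds, links, iterations, rejected=frozenset()):
    """Grow a corpus from seed sites, one round of link-following per iteration.

    Sites in `rejected` model the user's curation decisions and are never added.
    """
    corpus = set(seeds)
    frontier = set(seeds)
    for _ in range(iterations):
        discovered = set()
        for site in frontier:
            discovered.update(links.get(site, []))
        # Only new, non-rejected sites enter the corpus and are expanded next round.
        frontier = discovered - corpus - rejected
        corpus |= frontier
    return corpus


print(sorted(expand_corpus({"a.org"}, LINKS, iterations=2, rejected={"spam.com"})))
```

Limiting the number of iterations keeps the corpus focused, since each extra round can add many loosely related sites.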
Reddit API
PRAW Python library
Python Reddit API Wrapper (PRAW) is a Python library for accessing Reddit data through the Reddit API, in compliance with the platform’s policies and the API’s rules.
Scrapy
Data scraping in Python
Fast, high-level web crawling and scraping framework for Python. Scrapy can be used to build website crawlers that collect data from different websites using the Python programming language.
Tumblr Tool
Tumblr data extraction
Script that extracts data from the microblogging and social networking website Tumblr, and allows for creating co-tag networks and tabular post stats. Data can be queried through a web interface.
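In a co-tag network, two tags are linked when they appear on the same post, and the link weight counts how many posts share both tags. The minimal sketch below computes those weighted edges from lists of post tags. The lowercasing of tags and the sample posts are assumptions made for illustration, and this is not the Tumblr Tool's own implementation.

```python
from collections import Counter
from itertools import combinations


def cotag_edges(posts):
    """Count how many posts each unordered pair of tags appears on together."""
    edges = Counter()
    for tags in posts:
        # Normalize case (an assumption) and sort so each pair has one canonical order.
        unique = sorted(set(tag.lower() for tag in tags))
        for pair in combinations(unique, 2):
            edges[pair] += 1
    return edges


posts = [["art", "photography"], ["art", "Photography", "film"], ["film"]]  # hypothetical posts
edges = cotag_edges(posts)
print(edges[("art", "photography")])  # 2
```

The resulting counter maps directly onto a weighted edge list that network tools such as Gephi can import.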
Twarc
Command line tool and Python library for collecting and archiving Twitter JSON data via the Twitter API
twarc is a command-line tool and Python library for collecting and archiving Twitter JSON data via the Twitter API. It provides separate commands for the older v1.1 API (twarc) and for the newer v2 API, including Academic Access (twarc2). It also has an ecosystem of plugins for working with the collected data. While its primary use is academic, it works just as well with the "Standard" v2 API and the "Premium" v1.1 APIs.
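By default, twarc2 writes its results as JSON Lines, one line per API response page, with the tweets of each page under a `data` key. The standard-library sketch below iterates over the tweets in such a file. The page layout reflects our understanding of twarc2's default output, and the sample lines are made up; for files that already hold one tweet per line (for example after `twarc2 flatten`), the loop would need adjusting.

```python
import io
import json


def iter_tweets(lines):
    """Yield tweet objects from twarc2-style JSONL, one API response page per line."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        page = json.loads(line)
        # Each v2 response page keeps its tweets under "data"; empty pages may lack it.
        yield from page.get("data", [])


# Made-up two-page sample; in practice pass an open file, e.g. open("results.jsonl").
sample = io.StringIO(
    '{"data": [{"id": "1", "text": "hello"}, {"id": "2", "text": "world"}], "meta": {}}\n'
    '{"data": [{"id": "3", "text": "again"}], "meta": {}}\n'
)
tweets = list(iter_tweets(sample))
print(len(tweets))  # 3
```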
Twitter API
Twitter API official documentation
Through the Twitter API it is possible to retrieve public tweets about specific topics or query terms, and to monitor the debate in real time. The documentation includes tools and libraries for working with the API in different programming languages, as well as step-by-step tutorials.
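A query-term search is an authenticated HTTP request whose parameters encode the search query. The sketch below prepares, but does not send, such a request. It assumes the v2 recent-search endpoint, its `query` and `max_results` parameters (10 to 100 results per page), and bearer-token authentication. Access tiers and terms for this API have changed over time, so check the current documentation before relying on these details.

```python
from urllib.parse import urlencode
from urllib.request import Request

SEARCH_URL = "https://api.twitter.com/2/tweets/search/recent"  # assumed v2 endpoint


def build_search_request(query, bearer_token, max_results=100):
    """Prepare (but do not send) a v2 recent-search request for a query."""
    if not 10 <= max_results <= 100:
        raise ValueError("max_results must be between 10 and 100")
    params = urlencode({"query": query, "max_results": max_results})
    return Request(
        f"{SEARCH_URL}?{params}",
        headers={"Authorization": f"Bearer {bearer_token}"},
    )


# "YOUR_TOKEN" is a placeholder for a real bearer token.
req = build_search_request("climate lang:en", "YOUR_TOKEN")
print(req.full_url)
```

Sending it with `urllib.request.urlopen(req)` would return a JSON page of results. Real-time monitoring uses separate streaming endpoints, which are not covered here.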
Wikipedia API
Wikipedia-API Python library
Python library for easily querying the Wikipedia API. Through the API it is possible to retrieve live data from Wikipedia, including the text and metadata of any Wikipedia page, as well as images, links between articles, and revision history.
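To show what such a library wraps, the sketch below calls the MediaWiki Action API directly instead of using Wikipedia-API. It builds a revision-history query with the `action=query`, `prop=revisions` and `rvprop` parameters and parses a response in the `formatversion=2` layout. The sample response is made up for illustration, and no network request is made.

```python
import json
from urllib.parse import urlencode

API_URL = "https://en.wikipedia.org/w/api.php"


def revisions_url(title, limit=10):
    """URL for the latest `limit` revisions of a page via the MediaWiki Action API."""
    params = {
        "action": "query",
        "prop": "revisions",
        "titles": title,
        "rvprop": "timestamp|user|comment",
        "rvlimit": limit,
        "format": "json",
        "formatversion": 2,  # pages are returned as a list instead of a dict keyed by ID
    }
    return API_URL + "?" + urlencode(params)


def parse_revisions(payload):
    """Extract (timestamp, user) pairs from a formatversion=2 JSON response."""
    pages = json.loads(payload)["query"]["pages"]
    return [
        (rev["timestamp"], rev["user"])
        for page in pages
        for rev in page.get("revisions", [])
    ]


# Made-up response with the shape of a formatversion=2 reply, for illustration only.
sample = ('{"query": {"pages": [{"title": "Example", "revisions": '
          '[{"timestamp": "2024-01-01T00:00:00Z", "user": "Alice", "comment": "typo"}]}]}}')
print(parse_revisions(sample))
```

Wikimedia asks API clients to identify themselves with a descriptive User-Agent header, so a real request should set one.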
YouTube Data Tools
YouTube data extraction
Collection of simple tools for extracting data from the YouTube platform via the YouTube Data API v3. The scripts that query the YouTube API can be run locally or launched directly online through a web interface.
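API v3 returns search results in pages, and each response carries a `nextPageToken` that requests the following page. The following sketch follows those tokens. It assumes the `search` endpoint with the `part`, `q`, `type`, `maxResults` (at most 50), `pageToken` and `key` parameters, and it takes the HTTP call as a `fetch` argument so it can be shown with fake pages. It is not YouTube Data Tools' own code.

```python
from urllib.parse import urlencode

SEARCH_URL = "https://www.googleapis.com/youtube/v3/search"


def search_url(query, api_key, page_token=None):
    """Build a search request URL; the API caps maxResults at 50 per page."""
    params = {"part": "snippet", "q": query, "type": "video",
              "maxResults": 50, "key": api_key}
    if page_token:
        params["pageToken"] = page_token
    return SEARCH_URL + "?" + urlencode(params)


def collect_items(fetch, query, api_key, max_pages=5):
    """Follow nextPageToken until no token remains or max_pages is reached.

    `fetch` is any callable that takes a URL and returns the decoded JSON dict,
    so the HTTP client and error handling can be supplied separately.
    """
    items, token = [], None
    for _ in range(max_pages):
        page = fetch(search_url(query, api_key, token))
        items.extend(page.get("items", []))
        token = page.get("nextPageToken")
        if not token:
            break
    return items


# Fake two-page response standing in for real API calls.
FAKE_PAGES = {
    None: {"items": [{"id": 1}, {"id": 2}], "nextPageToken": "P2"},
    "P2": {"items": [{"id": 3}]},
}


def fake_fetch(url):
    return FAKE_PAGES["P2" if "pageToken=P2" in url else None]


print(len(collect_items(fake_fetch, "data journalism", "YOUR_KEY")))  # 3
```

The `max_pages` cap is a deliberate safeguard, because every request consumes part of the daily API quota.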