Data Science

Open source Python packages for data science

Apache Superset

Data exploration and visualization platform


Superset is fast, lightweight, intuitive, and loaded with options that make it easy for users of all skill sets to explore and visualize their data, from simple line charts to highly detailed geospatial charts.

Gensim

Topic modelling


Gensim is a Python library for topic modelling, document indexing, and similarity retrieval with large corpora. Gensim also provides a set of pretrained machine learning models and pretrained vectors built from large datasets collected from sources such as Twitter and Wikipedia.

Grafana

Interactive visualizations


Grafana allows you to query, visualize, alert on, and understand your metrics no matter where they are stored. Create, explore, and share dashboards with your team and foster a data-driven culture.

Keras

Deep learning API based on TensorFlow


Keras is a deep learning API written in Python, running on top of the machine learning platform TensorFlow. It was developed with a focus on enabling fast experimentation.

LIME

Explain the predictions of any ML classifier


LIME (local interpretable model-agnostic explanations) explains what machine learning classifiers or models are doing. LIME supports explaining individual predictions for text classifiers or classifiers that act on tables or images. This package is able to explain any black box classifier with two or more classes. The only requirement is that the classifier implements a function that takes in raw text or a numpy array and outputs a probability for each class. Support for scikit-learn classifiers is built-in.

Matplotlib

Data visualization


Comprehensive library for creating static, animated, and interactive visualizations in Python. Matplotlib is widely used for quick visualizations and to get an understanding of a dataset.
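A minimal sketch of a quick static visualization follows. The data values are made up, and the non-interactive `Agg` backend is chosen here only so the example runs without a display; in a notebook or desktop session you would normally call `plt.show()` instead of saving to a buffer.

```python
import io

import matplotlib

matplotlib.use("Agg")  # non-interactive backend, so no display is required
import matplotlib.pyplot as plt

# Made-up data for illustration.
xs = [1, 2, 3, 4, 5]
ys = [1, 4, 9, 16, 25]

fig, ax = plt.subplots()
ax.plot(xs, ys, marker="o")
ax.set_xlabel("x")
ax.set_ylabel("x squared")
ax.set_title("Quick look at a small dataset")

# Render the figure to PNG bytes in memory instead of opening a window.
buffer = io.BytesIO()
fig.savefig(buffer, format="png")
plt.close(fig)
png_bytes = buffer.getvalue()
```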

NetworkX

Complex networks


Python package for the creation, manipulation, and study of the structure, dynamics, and functions of complex networks. With NetworkX it is possible to model relationships between entities by building directed and undirected graphs and to extract quantitative information from them. It can be used for a variety of network-related tasks such as community detection, identifying connected components, and visualizing relationships using graphs.
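The following sketch shows how relationships between entities might be modelled and queried. The names and edges are invented for the example.

```python
import networkx as nx

# Undirected graph: an edge means the two people know each other (made-up data).
G = nx.Graph()
G.add_edges_from([("alice", "bob"), ("bob", "carol"), ("dave", "erin")])

# Connected components: groups of nodes reachable from one another.
components = sorted(sorted(c) for c in nx.connected_components(G))

# Degree: number of edges attached to each node.
degrees = dict(G.degree())

# Directed graph: edge direction matters, e.g. "alice follows bob".
D = nx.DiGraph()
D.add_edge("alice", "bob")

print(components)
print(degrees["bob"])
print(D.has_edge("alice", "bob"), D.has_edge("bob", "alice"))
```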

NLTK

Natural Language Processing


NLTK is a natural language processing toolkit widely used to process text data with the Python programming language. With NLTK it is possible to compute text similarity, lemmatize words, remove stop words, perform sentiment analysis, and more. NLTK offers a suite of open source Python modules, data sets, and tutorials supporting research and development in Natural Language Processing.
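As a small sketch of text similarity, the example below compares two made-up sentences using stemming and Jaccard distance, plus an edit distance between two words. It deliberately uses a stemmer and plain `str.split` rather than a lemmatizer or NLTK tokenizer, because those need corpus downloads via `nltk.download`; that substitution is a choice made for this example only.

```python
from nltk.metrics.distance import edit_distance, jaccard_distance
from nltk.stem import PorterStemmer

stemmer = PorterStemmer()

# Made-up sentences, tokenized naively on whitespace.
tokens_a = "the cats are running".split()
tokens_b = "a cat was running".split()

# Reduce each token to its stem so "cats" and "cat" compare as equal.
stems_a = {stemmer.stem(token) for token in tokens_a}
stems_b = {stemmer.stem(token) for token in tokens_b}

# Jaccard similarity: shared stems divided by all distinct stems.
similarity = 1 - jaccard_distance(stems_a, stems_b)

# Levenshtein edit distance between two words.
typo_distance = edit_distance("kitten", "sitting")

print(round(similarity, 3), typo_distance)
```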

Plotly

Interactive graphing


Interactive graphing library for Python. With Plotly, users can visually represent a dataset and study its relationships interactively. Plotly supports various types of plots, such as line charts, scatter plots, histograms, and box plots, and offers many possibilities for graph customization.

PyTorch

An open source machine learning framework that accelerates the path from research prototyping to production deployment.


PyTorch is a Python package that provides two high-level features: tensor computation (like NumPy) with strong GPU acceleration, and deep neural networks built on a tape-based autograd system.
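Both features can be sketched in a few lines. The values below are arbitrary, and the example falls back to the CPU when no GPU is available.

```python
import torch

# Autograd: operations on x are recorded so gradients can be computed later.
x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
y = (x ** 2).sum()
y.backward()          # computes dy/dx and stores it in x.grad
gradient = x.grad     # for y = sum(x^2), the gradient is 2 * x

# Tensor computation, placed on the GPU when one is available.
device = "cuda" if torch.cuda.is_available() else "cpu"
a = torch.ones(2, 3, device=device)
b = a * 2 + 1

print(gradient, b)
```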

scikit-learn

Python module for machine learning built on top of SciPy


Scikit-learn is a package for building machine learning models such as clustering, classification, and regression models. It also provides a variety of utilities for data transformation, such as feature extraction, encoding, and normalization.
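The sketch below combines a data transformation step with a classifier on the bundled iris dataset. The choice of scaler, model, and split parameters is illustrative only.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Load a small built-in dataset and hold out a test split.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Chain normalization and classification into a single estimator.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=200))
model.fit(X_train, y_train)

# Mean accuracy on the held-out data.
accuracy = model.score(X_test, y_test)
print(accuracy)
```

Putting the scaler inside the pipeline means it is fitted on the training split only, which avoids leaking test data into the transformation.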

SHAP

Game theoretic approach to explain ML models


SHAP (SHapley Additive exPlanations) is a game theoretic approach to explain the output of any machine learning model. It connects optimal credit allocation with local explanations using the classic Shapley values from game theory and their related extensions.

TensorFlow

An open source machine learning framework


TensorFlow is an end-to-end open source platform for machine learning. It has a comprehensive, flexible ecosystem of tools, libraries, and community resources that lets researchers push the state-of-the-art in ML and developers easily build and deploy ML-powered applications.