This repository contains a curated list of awesome open source libraries that will help you deploy, monitor, version, scale, and secure your production machine learning.
10 Min Video Overview
This 10-minute video provides an overview of the motivations for machine learning operations as well as a high-level overview of some of the tools in this repo.
Explaining Black Box Models and Datasets
- Aequitas – An open-source bias audit toolkit for data scientists, machine learning researchers, and policymakers to audit machine learning models for discrimination and bias, and to make informed and equitable decisions around developing and deploying predictive risk-assessment tools.
- Alibi – Alibi is an open source Python library aimed at machine learning model inspection and interpretation. The initial focus of the library is on black-box, instance-based model explanations.
- anchor – Code for the paper “High precision model agnostic explanations”, a model-agnostic system that explains the behaviour of complex models with high-precision rules called anchors.
- captum – model interpretability and understanding library for PyTorch developed by Facebook. It contains general purpose implementations of integrated gradients, saliency maps, smoothgrad, vargrad and others for PyTorch models.
- casme – Example of using classifier-agnostic saliency map extraction on ImageNet, presented in the paper “Classifier-agnostic saliency map extraction”.
- ContrastiveExplanation (Foil Trees) – Python script for model agnostic contrastive/counterfactual explanations for machine learning. Accompanying code for the paper “Contrastive Explanations with Local Foil Trees”.
- DeepLIFT – Codebase that contains the methods in the paper “Learning important features through propagating activation differences”. Here are the slides and the video of the 15-minute talk given at ICML.
- DeepVis Toolbox – This is the code required to run the Deep Visualization Toolbox, as well as to generate the neuron-by-neuron visualizations using regularized optimization. The toolbox and methods are described casually here and more formally in this paper.
- ELI5 – “Explain Like I’m 5” is a Python package which helps to debug machine learning classifiers and explain their predictions.
- FACETS – Facets contains two robust visualizations to aid in understanding and analyzing machine learning datasets. Get a sense of the shape of each feature of your dataset using Facets Overview, or explore individual observations using Facets Dive.
- Fairlearn – Fairlearn is a Python toolkit to assess and mitigate unfairness in machine learning models.
- FairML – FairML is a Python toolbox for auditing machine learning models for bias.
- fairness – This repository is meant to facilitate the benchmarking of fairness aware machine learning algorithms based on this paper.
- GEBI – Global Explanations for Bias Identification – Attention-based, summarized post-hoc explanations for detecting and identifying bias in data. We propose a global explanation and introduce a step-by-step framework on how to detect and test bias. Python package for image data.
- IBM AI Explainability 360 – Interpretability and explainability of data and machine learning models including a comprehensive set of algorithms that cover different dimensions of explanations along with proxy explainability metrics.
- IBM AI Fairness 360 – A comprehensive set of fairness metrics for datasets and machine learning models, explanations for these metrics, and algorithms to mitigate bias in datasets and models.
- iNNvestigate – An open-source library for analyzing Keras models visually by methods such as DeepTaylor-Decomposition, PatternNet, Saliency Maps, and Integrated Gradients.
- Integrated-Gradients – This repository provides code for implementing integrated gradients for networks with image inputs.
- InterpretML – InterpretML is an open-source package for training interpretable models and explaining blackbox systems.
- keras-vis – keras-vis is a high-level toolkit for visualizing and debugging your trained keras neural net models. Currently supported visualizations include: Activation maximization, Saliency maps, Class activation maps.
- L2X – Code for replicating the experiments in the paper “Learning to Explain: An Information-Theoretic Perspective on Model Interpretation” at ICML 2018
- Lightwood – A PyTorch-based framework that breaks down machine learning problems into smaller blocks that can be glued together seamlessly, with the objective of building predictive models with one line of code.
- LIME – Local Interpretable Model-agnostic Explanations for machine learning models.
- LOFO Importance – LOFO (Leave One Feature Out) Importance calculates the importances of a set of features based on a metric of choice, for a model of choice, by iteratively removing each feature from the set, and evaluating the performance of the model, with a validation scheme of choice, based on the chosen metric.
- MindsDB – MindsDB is an Explainable AutoML framework for developers. With MindsDB you can build, train and use state-of-the-art ML models with as little as one line of code.
- mljar-supervised – An Automated Machine Learning (AutoML) python package for tabular data. It can handle: Binary Classification, MultiClass Classification and Regression. It provides feature engineering, explanations and markdown reports.
- NETRON – Viewer for neural network, deep learning and machine learning models.
- pyBreakDown – A model agnostic tool for decomposition of predictions from black boxes. Break Down Table shows contributions of every variable to a final prediction.
- rationale – Code to implement learning rationales behind predictions with code for paper “Rationalizing Neural Predictions”
- responsibly – Toolkit for auditing and mitigating bias and fairness of machine learning systems
- SHAP – SHapley Additive exPlanations is a unified approach to explain the output of any machine learning model.
- Skater – Skater is a unified framework to enable Model Interpretation for all forms of model to help one build an Interpretable machine learning system often needed for real world use-cases
- Tensorboard’s Tensorboard WhatIf – Tensorboard screen to analyse the interactions between inference results and data inputs.
- Tensorflow’s cleverhans – An adversarial example library for constructing attacks, building defenses, and benchmarking both. A Python library to benchmark a system’s vulnerability to adversarial examples.
- tensorflow’s lucid – Lucid is a collection of infrastructure and tools for research in neural network interpretability.
- tensorflow’s Model Analysis – TensorFlow Model Analysis (TFMA) is a library for evaluating TensorFlow models. It allows users to evaluate their models on large amounts of data in a distributed manner, using the same metrics defined in their trainer.
- themis-ml – themis-ml is a Python library built on top of pandas and sklearn that implements fairness-aware machine learning algorithms.
- Themis – Themis is a testing-based approach for measuring discrimination in a software system.
- TreeInterpreter – Package for interpreting scikit-learn’s decision tree and random forest predictions. Allows decomposing each prediction into bias and feature contribution components as described in http://blog.datadive.net/interpreting-random-forests/.
- woe – Tools for WoE Transformation mostly used in ScoreCard Model for credit rating
- XAI – eXplainableAI – An eXplainability toolbox for machine learning.
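The leave-one-feature-out procedure described in the LOFO Importance entry above can be illustrated with a small, dependency-free sketch. This is not the LOFO library's API: the toy data, the 1-nearest-neighbour scorer, the single hold-out split used as the validation scheme, and the names `nearest_neighbour_accuracy` and `lofo_importance` are all assumptions made for illustration.

```python
from typing import Callable, Dict, List, Sequence

Row = Dict[str, float]

# Toy data (hypothetical): "signal" separates the classes, "noise" does not.
train_X: List[Row] = [
    {"signal": 0.0, "noise": 0.9},
    {"signal": 0.1, "noise": 0.1},
    {"signal": 1.0, "noise": 0.8},
    {"signal": 0.9, "noise": 0.2},
]
train_y = [0, 0, 1, 1]
valid_X: List[Row] = [
    {"signal": 0.02, "noise": 0.82},
    {"signal": 0.97, "noise": 0.12},
    {"signal": 0.20, "noise": 0.75},
    {"signal": 0.80, "noise": 0.25},
]
valid_y = [0, 1, 0, 1]


def nearest_neighbour_accuracy(features: Sequence[str]) -> float:
    """Hold-out accuracy of a 1-nearest-neighbour classifier using only `features`."""
    def distance(a: Row, b: Row) -> float:
        return sum((a[f] - b[f]) ** 2 for f in features)

    correct = 0
    for row, label in zip(valid_X, valid_y):
        nearest = min(range(len(train_X)), key=lambda i: distance(row, train_X[i]))
        correct += int(train_y[nearest] == label)
    return correct / len(valid_X)


def lofo_importance(
    score: Callable[[Sequence[str]], float], features: Sequence[str]
) -> Dict[str, float]:
    """Importance of a feature = baseline score minus the score without it."""
    baseline = score(features)
    return {
        left_out: baseline - score([f for f in features if f != left_out])
        for left_out in features
    }


importances = lofo_importance(nearest_neighbour_accuracy, ["signal", "noise"])
print(importances)  # {'signal': 0.75, 'noise': 0.0}
```

Dropping `signal` costs accuracy while dropping `noise` costs nothing, which is the ranking the method is meant to surface. A real evaluation would typically use cross-validation and retrain the model for each left-out feature.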
Privacy Preserving Machine Learning
Model and Data Versioning
- Apache Marvin – A platform for model deployment and versioning that hides all complexity under the hood: data scientists just need to set up the server and write their code in an extended Jupyter notebook.
- Catalyst – High-level utils for PyTorch DL & RL research. It was developed with a focus on reproducibility, fast experimentation and code/ideas reusing.
- D6tflow – A python library that allows for building complex data science workflows on Python.
- Data Version Control (DVC) – A Git-like tool that works alongside Git to allow version management of data and models.
- FGLab – Machine learning dashboard, designed to make prototyping experiments easier.
- Flor – Easy to use logger and automatic version controller made for data scientists who write ML code
- Hangar – Version control for tensor data, git-like semantics on numerical data with high speed and efficiency.
- MLflow – Open source platform to manage the ML lifecycle, including experimentation, reproducibility and deployment.
- MLWatcher – MLWatcher is a Python agent that records a large variety of time-series metrics of your running ML classification algorithm. It enables you to monitor in real time.
- ModelChimp – Framework to track and compare all the results and parameters from machine learning models (Video)
- ModelDB – An open-source system to version machine learning models, including their ingredients (code, data, config, and environment), and to track ML metadata across the model lifecycle.
- Pachyderm – Open source distributed processing framework built on Kubernetes, focused mainly on dynamic building of production machine learning pipelines – (Video)
- Polyaxon – A platform for reproducible and scalable machine learning and deep learning on Kubernetes – (Video)
- PredictionIO – An open source Machine Learning Server built on top of a state-of-the-art open source stack for developers and data scientists to create predictive engines for any machine learning task
- Quilt Data – Versioning, reproducibility and deployment of data and models.
- Sacred – Tool to help you configure, organize, log and reproduce machine learning experiments.
- steppy – Lightweight Python 3 library for fast and reproducible machine learning experimentation. Introduces a simple interface that enables clean machine learning pipeline design.
- Studio.ML – Model management framework which minimizes the overhead involved with scheduling, running, monitoring and managing artifacts of your machine learning experiments.
- TRAINS – Auto-Magical Experiment Manager & Version Control for AI.
Model Training Orchestration
Model Serving and Monitoring
Adversarial Robustness Libraries
- AdvBox – generate adversarial examples from the command line with 0 coding using PaddlePaddle, PyTorch, Caffe2, MxNet, Keras, and TensorFlow. Includes 10 attacks and also 6 defenses. Used to implement StealthTshirt at DEFCON!
- Adversarial DNN Playground – think TensorFlow Playground, but for Adversarial Examples! A visualization tool designed for learning and teaching – the attack library is limited in size, but it has a nice front-end to it with buttons you can press!
- AdverTorch – library for adversarial attacks / defenses specifically for PyTorch.
- Alibi Detect – alibi-detect is a Python package focused on outlier, adversarial and concept drift detection. The package aims to cover both online and offline detectors for tabular data, text, images and time series. The outlier detection methods should allow the user to identify global, contextual and collective outliers.
- Artificial Adversary – AirBnB’s library to generate text that reads the same to a human but passes adversarial classifiers.
- CleverHans – library for testing adversarial attacks / defenses maintained by some of the most important names in adversarial ML, namely Ian Goodfellow (ex-Google Brain, now Apple) and Nicolas Papernot (Google Brain). Comes with some nice tutorials!
- DEEPSEC – another systematic tool for attacking and defending deep learning models.
- EvadeML – benchmarking and visualization tool for adversarial ML maintained by Weilin Xu, a PhD at University of Virginia, working with David Evans. Has a tutorial on re-implementation of one of the most important adversarial defense papers – feature squeezing (same team).
- Foolbox – second biggest adversarial library. Has an even longer list of attacks – but no defenses or evaluation metrics. Geared more towards computer vision. Code easier to understand / modify than ART – also better for exploring blackbox attacks on surrogate models.
- IBM Adversarial Robustness 360 Toolbox (ART) – at the time of writing this is the most complete off-the-shelf resource for testing adversarial attacks and defenses. It includes a library of 15 attacks, 10 empirical defenses, and some nice evaluation metrics. Neural networks only.
- MIA – A library for running membership inference attacks (MIA) against machine learning models.
- Nicholas Carlini’s Adversarial ML reading list – not a library, but a curated list of the most important adversarial papers by one of the leading minds in Adversarial ML, Nicholas Carlini. If you want to discover the 10 papers that matter the most – I would start here.
- Robust ML – another robustness resource maintained by some of the leading names in adversarial ML. They specifically focus on defenses, and ones that have published code available next to papers. Practical and useful.
- TextFool – plausible looking adversarial examples for text generation.
- Trickster – Library and experiments for attacking machine learning in discrete domains using graph search.
Neural Architecture Search
Data Science Notebook Frameworks
- Apache Zeppelin – Web-based notebook that enables data-driven, interactive data analytics and collaborative documents with SQL, Scala and more.
- Binder – Binder hosts notebooks in an executable environment (for free).
- H2O Flow – Jupyter notebook-like interface for H2O to create, save and re-use “flows”
- Hydrogen – A plugin for ATOM that enables it to become a jupyter-notebook-like interface that prints the outputs directly in the editor.
- Jupyter Notebooks – Web interface python sandbox environments for reproducible development
- ML Workspace – All-in-one web IDE for machine learning and data science. Combines Jupyter, VS Code, Tensorflow, and many other tools/libraries into one Docker image.
- Papermill – Papermill is a library for parameterizing notebooks and executing them like Python scripts.
- Polynote – Polynote is an experimental polyglot notebook environment. Currently, it supports Scala and Python (with or without Spark), SQL, and Vega.
- RMarkdown – The rmarkdown package is a next generation implementation of R Markdown based on Pandoc.
- Stencila – Stencila is a platform for creating, collaborating on, and sharing data driven content. Content that is transparent and reproducible.
- Voilà – Voilà turns Jupyter notebooks into standalone web applications that can e.g. be used as dashboards.
Industrial Strength Visualisation libraries
Industrial Strength NLP
- Blackstone – Blackstone is a spaCy model and library for processing long-form, unstructured legal text. Blackstone is an experimental research project from the Incorporated Council of Law Reporting for England and Wales’ research lab, ICLR&D.
- CTRL – A Conditional Transformer Language Model for Controllable Generation released by SalesForce
- Facebook’s XLM – PyTorch original implementation of Cross-lingual Language Model Pretraining which includes BERT, XLM, NMT, XNLI, PKM, etc.
- Flair – Simple framework for state-of-the-art NLP developed by Zalando which builds directly on PyTorch.
- Github’s Semantic – Github’s text library for parsing, analyzing, and comparing source code across many languages.
- GluonNLP – GluonNLP is a toolkit that enables easy text preprocessing, datasets loading and neural models building to help you speed up your Natural Language Processing (NLP) research.
- GNES – Generic Neural Elastic Search is a cloud-native semantic search system based on deep neural networks.
- Grover – Grover is a model for Neural Fake News — both generation and detection. However, it probably can also be used for other generation tasks.
- Kashgari – Kashgari is a simple and powerful NLP transfer learning framework for building a state-of-the-art model in 5 minutes for named entity recognition (NER), part-of-speech tagging (PoS), and text classification tasks.
- OpenAI GPT-2 – OpenAI’s code from their paper “Language Models are Unsupervised Multitask Learners”.
- sense2vec – A Pytorch library that allows for training and using sense2vec models, which are models that leverage the same approach as word2vec, but also leverage part-of-speech attributes for each token, which allows them to be “meaning-aware”
- Snorkel – Snorkel is a system for quickly generating training data with weak supervision https://snorkel.org.
- SpaCy – Industrial-strength natural language processing library built with python and cython by the explosion.ai team.
- Stable Baselines – A fork of OpenAI Baselines, implementations of reinforcement learning algorithms http://stable-baselines.readthedocs.io/.
- Tensorflow Lingvo – A framework for building neural networks in Tensorflow, particularly sequence models. Lingvo: A TensorFlow Framework for Sequence Modeling.
- Tensorflow Text – TensorFlow Text provides a collection of text related classes and ops ready to use with TensorFlow 2.0.
- Wav2Letter++ – A speech to text system developed by Facebook’s FAIR teams.
- YouTokenToMe – YouTokenToMe is an unsupervised text tokenizer focused on computational efficiency. It currently implements fast Byte Pair Encoding (BPE) [Sennrich et al.].
- 🤗Transformers – Huggingface’s library of state-of-the-art pretrained models for Natural Language Processing (NLP).
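The Byte Pair Encoding (BPE) technique mentioned in the YouTokenToMe entry above can be sketched in a few lines of plain Python. This is a teaching sketch rather than YouTokenToMe's API or its optimized implementation: the function names, the toy corpus, the lack of an end-of-word marker, and the tie-breaking rule (prefer the lexicographically larger pair) are all assumptions.

```python
from collections import Counter
from typing import Dict, List, Tuple

Word = Tuple[str, ...]
Pair = Tuple[str, str]


def count_pairs(words: Dict[Word, int]) -> Counter:
    """Count adjacent symbol pairs, weighted by word frequency."""
    pairs: Counter = Counter()
    for symbols, freq in words.items():
        for left, right in zip(symbols, symbols[1:]):
            pairs[(left, right)] += freq
    return pairs


def merge_pair(words: Dict[Word, int], pair: Pair) -> Dict[Word, int]:
    """Replace every occurrence of `pair` with a single merged symbol."""
    merged: Dict[Word, int] = {}
    for symbols, freq in words.items():
        out: List[str] = []
        i = 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        key = tuple(out)
        merged[key] = merged.get(key, 0) + freq
    return merged


def learn_bpe(corpus: List[str], num_merges: int) -> List[Pair]:
    """Greedily learn merge rules from the most frequent adjacent pairs."""
    words: Dict[Word, int] = dict(Counter(tuple(word) for word in corpus))
    merges: List[Pair] = []
    for _ in range(num_merges):
        pairs = count_pairs(words)
        if not pairs:
            break
        # Assumed tie-break: highest count, then lexicographically larger pair.
        best = max(pairs, key=lambda p: (pairs[p], p))
        merges.append(best)
        words = merge_pair(words, best)
    return merges


def encode(word: str, merges: List[Pair]) -> List[str]:
    """Tokenize a word by applying the learned merges in order."""
    words: Dict[Word, int] = {tuple(word): 1}
    for pair in merges:
        words = merge_pair(words, pair)
    return list(next(iter(words)))


corpus = ["low"] * 5 + ["lower"] * 2 + ["newest"] * 6 + ["widest"] * 3
merges = learn_bpe(corpus, num_merges=4)
print(merges)                    # [('s', 't'), ('e', 'st'), ('o', 'w'), ('l', 'ow')]
print(encode("lowest", merges))  # ['low', 'est']
```

The word "lowest" never appears in the toy corpus, yet it is split into two subwords that do, which is the property that makes BPE useful for open vocabularies.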
Data Pipeline ETL Frameworks
- Apache Airflow – Data Pipeline framework built in Python, including scheduler, DAG definition and a UI for visualisation
- Apache Nifi – Apache NiFi was made for dataflow. It supports highly configurable directed graphs of data routing, transformation, and system mediation logic.
- Argo Workflows – Argo Workflows is an open source container-native workflow engine for orchestrating parallel jobs on Kubernetes. Argo Workflows is implemented as a Kubernetes CRD (Custom Resource Definition).
- Azkaban – Azkaban is a batch workflow job scheduler created at LinkedIn to run Hadoop jobs. Azkaban resolves the ordering through job dependencies and provides an easy to use web user interface to maintain and track your workflows.
- Basin – Visual programming editor for building Spark and PySpark pipelines
- Bonobo – ETL framework for Python 3.5+ with focus on simple atomic operations working concurrently on rows of data
- Chronos – More of a job scheduler for Mesos than ETL pipeline. [OUTDATED]
- Couler – Unified interface for constructing and managing machine learning workflows on different workflow engines, such as Argo Workflows, Tekton Pipelines, and Apache Airflow.
- Dagster – A data orchestrator for machine learning, analytics, and ETL.
- Genie – Job orchestration engine to interface and trigger the execution of jobs from Hadoop-based systems
- Gokart – Wrapper of the data pipeline Luigi
- Kedro – Kedro is a workflow development tool that helps you build data pipelines that are robust, scalable, deployable, reproducible and versioned. Visualization of the kedro workflows can be done by
- Luigi – Luigi is a Python module that helps you build complex pipelines of batch jobs, handling dependency resolution, workflow management, visualisation, etc
- Metaflow – A framework for data scientists to easily build and manage real-life data science projects.
- Neuraxle – A framework for building neat pipelines, providing the right abstractions to chain your data transformation and prediction steps with data streaming, as well as doing hyperparameter searches (AutoML).
- Oozie – Workflow scheduler for Hadoop jobs
- PipelineX – Based on Kedro and MLflow. Full comparison given at https://github.com/Minyus/Python_Packages_for_Pipeline_Workflow
- Prefect Core – Workflow management system that makes it easy to take your data pipelines and add semantics like retries, logging, dynamic mapping, caching, failure notifications, and more.
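Several of the tools above (Azkaban, Luigi, Airflow) derive the execution order of a pipeline from declared job dependencies. The core idea can be sketched with the standard library's `graphlib` module (Python 3.9+). The task names and the `run_pipeline` helper are hypothetical and do not correspond to any of the listed frameworks' APIs.

```python
from graphlib import CycleError, TopologicalSorter
from typing import Callable, Dict, List, Set

# Hypothetical pipeline: each task maps to the set of tasks it depends on.
dependencies: Dict[str, Set[str]] = {
    "extract": set(),
    "clean": {"extract"},
    "features": {"clean"},
    "train": {"features"},
    "report": {"train", "clean"},
}


def run_pipeline(
    dependencies: Dict[str, Set[str]], tasks: Dict[str, Callable[[], None]]
) -> List[str]:
    """Run every task after all of its dependencies; return the order used."""
    executed: List[str] = []
    # static_order() yields each node only after all of its predecessors.
    for name in TopologicalSorter(dependencies).static_order():
        tasks[name]()
        executed.append(name)
    return executed


log: List[str] = []
# Each placeholder task just records that it ran.
tasks = {name: (lambda n=name: log.append(f"ran {n}")) for name in dependencies}
order = run_pipeline(dependencies, tasks)
print(order)  # ['extract', 'clean', 'features', 'train', 'report']

# A circular dependency is rejected instead of looping forever.
try:
    list(TopologicalSorter({"a": {"b"}, "b": {"a"}}).static_order())
    cycle_detected = False
except CycleError:
    cycle_detected = True
```

Production schedulers add retries, logging, parallel execution of independent tasks, and persistence on top of this ordering step.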
Data Labelling Tools and Frameworks
- COCO Annotator – Web-based image segmentation tool for object detection, localization and keypoints
- Computer Vision Annotation Tool (CVAT) – OpenCV’s web-based annotation tool for both videos and images for computer vision algorithms.
- Doccano – Open source text annotation tools for humans, providing functionality for sentiment analysis, named entity recognition, and machine translation.
- ImageTagger – Image labelling tool with support for collaboration, supporting bounding box, polygon, line, point labelling, label export, etc.
- ImgLab – Image annotation tool for bounding boxes with auto-suggestion and extensibility for plugins.
- Label Studio – Multi-domain data labeling and annotation tool with standardized output format
- Labelimg – Open source graphical image annotation tool written in Python, using Qt for its graphical interface and focusing primarily on bounding boxes.
- makesense.ai – Free to use online tool for labelling photos. Prepared labels can be downloaded in one of multiple supported formats.
- MedTagger – A collaborative framework for annotating medical datasets using crowdsourcing.
- OpenLabeling – Open source tool for labelling images with support for labels, edges, as well as image resizing and zooming in.
- PixelAnnotationTool – Image annotation tool with the ability to “colour” on the images to select labels for segmentation. The process is semi-automated with OpenCV’s marker-based watershed algorithm.
- Semantic Segmentation Editor – Hitachi’s Open source tool for labelling camera and LIDAR data.
- Superintendent – superintendent provides an ipywidget-based interactive labelling tool for your data.
- VGG Image Annotator (VIA) – A simple and standalone manual annotation software for image, audio and video. VIA runs in a web browser and does not require any installation or setup.
- Visual Object Tagging Tool (VOTT) – Microsoft’s Open Source electron app for labelling videos and images for object detection models (with active learning functionality)
Data Storage Optimisation
- Alluxio – A virtual distributed storage system that bridges the gap between computation frameworks and storage systems.
- Apache Arrow – In-memory columnar representation of data compatible with Pandas, Hadoop-based systems, etc
- Apache Druid – A high performance real-time analytics database. https://druid.apache.org/. An introduction to Druid, your Interactive Analytics at (big) Scale.
- Apache Ignite – A memory-centric distributed database, caching, and processing platform for transactional, analytical, and streaming workloads delivering in-memory speeds at petabyte scale. TensorFlow on Apache Ignite, Distributed ML in Apache Ignite
- Apache Kafka – Distributed streaming platform framework
- Apache Parquet – On-disk columnar representation of data compatible with Pandas, Hadoop-based systems, etc
- Apache Pinot – A realtime distributed OLAP datastore https://pinot.apache.org. Comparison of the Open Source OLAP Systems for Big Data: ClickHouse, Druid, and Pinot.
- BayesDB – Database that allows for built-in non-parametric Bayesian model discovery and querying for data on a database-like interface – (Video)
- ClickHouse – ClickHouse is an open source column-oriented database management system supported by Yandex – (Video)
- EdgeDB – NoSQL interface for Postgres that allows for object interaction with stored data
- HopsFS – HDFS-compatible file system with scale-out strongly consistent metadata.
- InfluxDB – Scalable datastore for metrics, events, and real-time analytics.
- TimescaleDB – An open-source time-series SQL database optimized for fast ingest and complex queries. Packaged as a PostgreSQL extension. Time-series ML in TimescaleDB
Function as a Service Frameworks
Computation load distribution frameworks
- Apache Spark MLlib – Apache Spark’s scalable machine learning library in Java, Scala, Python and R
- Beam – Apache Beam is a unified programming model for Batch and Streaming https://beam.apache.org/
- BigDL – Deep learning framework on top of Spark/Hadoop to distribute data and computations across an HDFS system
- Dask – Distributed parallel processing framework for Pandas and NumPy computations – (Video)
- DEAP – A novel evolutionary computation framework for rapid prototyping and testing of ideas. It seeks to make algorithms explicit and data structures transparent. It works in perfect harmony with parallelisation mechanisms such as multiprocessing and SCOOP.
- DeepSpeed – A deep learning optimization library (lightweight PyTorch wrapper) that makes distributed training easy, efficient, and effective.
- Fiber – Distributed computing library for modern computer clusters from Uber.
- Hadoop Open Platform-as-a-service (HOPS) – A multi-tenancy open source framework with a RESTful API for data science on Hadoop, which supports Spark and Tensorflow/Keras, is Python-first, and provides a lot of features
- Horovod – Uber’s distributed training framework for TensorFlow, Keras, and PyTorch
- NumPyWren – Scientific computing framework built on top of PyWren to enable NumPy-like distributed computations
- PyWren – Answers the question of the “cloud button” for Python function execution. It’s a framework that abstracts AWS Lambda to enable data scientists to execute any Python function – (Video)
- PyTorch Lightning – Lightweight PyTorch research framework that allows you to easily scale your models to GPUs and TPUs and use all the latest best practices, without the engineering boilerplate – (Video)
- Ray – Ray is a flexible, high-performance distributed execution framework for machine learning (VIDEO)
- Vespa – Vespa is an engine for low-latency computation over large data sets. https://vespa.ai
Model serialisation formats
- Java PMML API – Java libraries for consuming and producing PMML files containing models from different frameworks, including:
- MMdnn – Cross-framework solution to convert, visualize and diagnose deep neural network models.
- Neural Network Exchange Format (NNEF) – A standard format to store models across Torch, Caffe, TensorFlow, Theano, Chainer, Caffe2, PyTorch, and MXNet
- ONNX – Open Neural Network Exchange Format
- PFA – Created by the same organisation as PMML, the Portable Format for Analytics is an emerging standard for statistical models and data transformation engines.
- PMML – The Predictive Model Markup Language standard in XML – (Video)
Optimized calculation frameworks
- CuDF – GPU DataFrame Library http://rapids.ai
- CuML – RAPIDS Machine Learning Library
- CuPy – NumPy-like API accelerated with CUDA
- H2O-3 – Fast scalable Machine Learning platform for smarter applications: Deep Learning, Gradient Boosting & XGBoost, Random Forest, Generalized Linear Modeling (Logistic Regression, Elastic Net), K-Means, PCA, Stacked Ensembles, Automatic Machine Learning (AutoML), etc.
- Modin – Speed up your Pandas workflows by changing a single line of code
- Numba – A compiler for Python array and numerical functions
- NumpyGroupies – Optimised tools for group-indexing operations: aggregated sum and more
- Vulkan Kompute – Blazing fast, lightweight and mobile phone-enabled Vulkan compute framework optimized for advanced GPU data processing use cases.
- Weld – High-performance runtime for data analytics applications. Interview with Weld’s main contributor
Data Stream Processing
Outlier and Anomaly Detection
Feature Engineering Automation
- auto-sklearn – Framework to automate algorithm and hyperparameter tuning for sklearn
- AutoGluon – Automated feature, model, and hyperparameter selection for tabular, image, and text data on top of popular machine learning libraries (Scikit-Learn, LightGBM, CatBoost, PyTorch, MXNet)
- AutoML-GS – Automatic feature and model search with code generation in Python, on top of common data science libraries (tensorflow, sklearn, etc)
- automl – Automated feature engineering, feature/model selection, hyperparameter optimisation
- Colombus – A scalable framework to perform exploratory feature selection implemented in R
- Feature Engine – Feature-engine is a Python library that contains several transformers to engineer features for use in machine learning models.
- Featuretools – An open source framework for automated feature engineering
- keras-tuner – Keras Tuner is an easy-to-use, distributable hyperparameter optimization framework that solves the pain points of performing a hyperparameter search. Keras Tuner makes it easy to define a search space and leverage included algorithms to find the best hyperparameter values.
- sklearn-deap – Use evolutionary algorithms instead of gridsearch in scikit-learn.
- TPOT – Automation of sklearn pipeline creation (including feature selection, pre-processor, etc)
- tsfresh – Automatic extraction of relevant features from time series
- mljar-supervised – An Automated Machine Learning (AutoML) python package for tabular data. It can handle: Binary Classification, MultiClass Classification and Regression. It provides feature engineering, explanations and markdown reports.
Commercial Platforms
- Algorithmia – Cloud platform to build, deploy and serve machine learning models (Video)
- allegro ai Enterprise – Automagical open-source ML & DL experiment manager and ML-Ops solution.
- Amazon SageMaker – End-to-end machine learning development and deployment interface where you are able to build notebooks that use EC2 instances as backend, and then can host models exposed on an API
- bigml – E2E machine learning platform.
- cnvrg.io – An end-to-end platform to manage, build and automate machine learning
- Comet.ml – Machine learning experiment management. Free for open source and students (Video)
- Cubonacci – The Cubonacci platform manages deployment, versioning, infrastructure, monitoring and lineage for you, eliminating risk and minimizing time-to-market.
- D2iQ KUDO for Kubeflow – Enterprise machine learning platform that runs in the cloud, on premises (incl. air-gapped), in hybrid environments, or on the edge; based on Kubeflow and open-source Kubernetes Universal Declarative Operators (KUDO).
- DAGsHub – Community platform for Open Source ML – Manage experiments, data & models and create collaborative ML projects easily.
- Dataiku – Collaborative data science platform powering both self-service analytics and the operationalization of machine learning models in production.
- DataRobot – Automated machine learning platform which enables users to build and deploy machine learning models.
- Datatron – Machine Learning Model Governance Platform for all your AI models in production for large Enterprises.
- deepsense AIOps – Enhances multi-cloud & data center IT Operations via traffic analysis, risk analysis, anomaly detection, predictive maintenance, root cause analysis, service ticket analysis and event consolidation.
- Deep Cognition Deep Learning Studio – E2E platform for deep learning.
- deepsense Safety – AI-driven solution to increase worksite safety via safety procedure check, threat detection and hazardous zones monitoring.
- deepsense Quality – Automating laborious quality control tasks.
- Google Cloud Machine Learning Engine – Managed service that enables developers and data scientists to build and bring machine learning models to production.
- H2O Driverless AI – Automates key machine learning tasks, delivering automatic feature engineering, model validation, model tuning, model selection and deployment, machine learning interpretability, bring your own recipe, time-series and automatic pipeline generation for model scoring. (Video)
- IBM Watson Machine Learning – Create, train, and deploy self-learning models using an automated, collaborative workflow.
- Iguazio Data Science Platform – Bring your Data Science to life by automating MLOps with end-to-end machine learning pipelines, transforming AI projects into real-world business outcomes, and supporting real-time performance at enterprise scale.
- Labelbox – Image labelling service with support for semantic segmentation (brush & superpixels), bounding boxes and nested classifications.
- Logical Clocks Hopsworks – Enterprise version of Hopsworks with a Feature Store and scale-out ML pipeline design and operation.
- MCenter – MLOps platform automates the deployment, ongoing optimization, and governance of machine learning applications in production.
- Microsoft Azure Machine Learning service – Build, train, and deploy models from the cloud to the edge.
- MLJAR – Platform for rapid prototyping, developing and deploying machine learning models.
- neptune.ml – community-friendly platform supporting data scientists in creating and sharing machine learning models. Neptune facilitates teamwork, infrastructure management, models comparison and reproducibility.
- Prodigy – Active learning-based data annotation. Allows you to train a model and pick the most ‘uncertain’ samples for labeling from an unlabeled pool.
- Scribble Enrich – Customizable, auditable, privacy-aware feature store. It is designed to help mid-sized data teams gain trust in the data that they use for training and analysis, and support emerging needs such as drift computation and bias assessment.
- SKIL – Software distribution designed to help enterprise IT teams manage, deploy, and retrain machine learning models at scale.
- Skytree 16.0 – End to end machine learning platform (Video)
- Spell – Flexible end-to-end MLOps / Machine Learning Platform. (Video)
- Superb AI – ML DataOps platform providing various tools to build, label, manage and iterate on training data.
- Talend Studio
- Valohai – Machine orchestration, version control and pipeline management for deep learning.