Devtool Review

Each week, Console reviews the best tools for developers.


Fake production data for ML.

ML, Open Source
Our review

What we like

Enables privacy-compliant training of ML models on realistic fake data. Connects to your prod database via SQL, transforms (fakes) the data, then trains your model. Provides a report after training to confirm how realistic the generated data was. Supports tabular and event-driven models. Export to ipynb. Cloud or self-hosted.

What we don't like

No dark mode. Pretty hefty minimum requirements for self-hosting, but ML training is compute & GPU intensive.

Reviewed: 2023-03-02

Developer Interview

With Ander Steele, Lead Data Scientist


What is Tonic? Why did you build it? Tonic is the fake data company. We provide fake data so that developers and QA can test their application code against safe, de-identified versions of their application database.

Why would you use fake or synthetic data instead of real data? Particularly in regulated industries, access to data is very difficult. It may be impossible to test against production data. Even outside of regulated industries, you don't necessarily want to be working with customer data as a developer. The more distance you can get from real sensitive data as a developer or data scientist, the safer everyone is.

Recently, we've been building a product, Djinn, which is focused on the data-science workflow and its use cases. Instead of requiring an entire de-identified application database, the data-science workflow typically starts with a dataset, which is some view or table within that application database.

The tool takes these views and builds generative models that are capable of producing synthetic data with the same statistical properties as the real data. Synthetic data can be used in place of real data to train machine-learning models, build dashboards, or do exploratory data analysis.
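
To make the idea of "same statistical properties" concrete, here is a minimal Python sketch. It is not Djinn's actual model, which the interview does not describe. It assumes a hypothetical two-column dataset (age and spend), fits a deliberately simple generative model (a Gaussian for one column plus a linear relationship with Gaussian noise for the other), samples synthetic rows from it, and compares means and correlation between the real and synthetic data. All names and numbers are illustrative assumptions.

```python
import random
import statistics


def pearson(xs, ys):
    """Pearson correlation between two equally long columns."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def fit_model(xs, ys):
    """Fit a toy generative model: x ~ Normal, y = a + b*x + Normal noise."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sx = statistics.stdev(xs)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)
    slope = cov / sx ** 2
    intercept = my - slope * mx
    residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    return {
        "mean_x": mx,
        "std_x": sx,
        "slope": slope,
        "intercept": intercept,
        "noise": statistics.stdev(residuals),
    }


def sample_rows(model, n, rng):
    """Draw n synthetic (x, y) rows from the fitted model."""
    rows = []
    for _ in range(n):
        x = rng.gauss(model["mean_x"], model["std_x"])
        y = model["intercept"] + model["slope"] * x + rng.gauss(0.0, model["noise"])
        rows.append((x, y))
    return rows


# Stand-in for the "real" view: hypothetical (age, spend) rows.
rng = random.Random(0)
real_rows = []
for _ in range(2000):
    age = rng.gauss(40.0, 10.0)
    spend = 20.0 + 3.0 * age + rng.gauss(0.0, 15.0)
    real_rows.append((age, spend))

real_age = [r[0] for r in real_rows]
real_spend = [r[1] for r in real_rows]

model = fit_model(real_age, real_spend)
synthetic_rows = sample_rows(model, 2000, rng)
syn_age = [r[0] for r in synthetic_rows]
syn_spend = [r[1] for r in synthetic_rows]

# Compare summary statistics of the real and synthetic columns.
print("mean age   real/synthetic:", statistics.mean(real_age), statistics.mean(syn_age))
print("mean spend real/synthetic:", statistics.mean(real_spend), statistics.mean(syn_spend))
print("correlation real/synthetic:", pearson(real_age, real_spend), pearson(syn_age, syn_spend))
```

No synthetic row corresponds to a real individual, yet the column means and the age-to-spend correlation stay close to the originals. A production tool would need a far richer model to handle categorical columns, skewed distributions, and events over time, and sampling from a fitted model does not by itself guarantee privacy.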

This lets data scientists get started on their job a lot faster than they could if they were waiting around for access to real data. This is something that we're quite excited about. In addition to the privacy use case, we're increasingly seeing people who want to use synthetic data for data augmentation.

About Console

Console is the place developers go to find the best tools. Each week, our newsletter picks out the most interesting tools and new releases. We keep track of everything - dev tools, devops, cloud, and APIs - so you don't have to.