Blog

Company updates, tutorials, research, and more!

Assessing the Quality of Synthetic Data with Cleanlab Studio

07/12/2023

Use AI to measure the quality of LLM-generated data, automatically detecting unrealistic synthetic examples and underrepresented tails of the real data distribution.

Elías Snorrason

Enhancing Product Analytics and E-commerce with Cleanlab Studio

07/06/2023

How to use AI to catch errors in product listings, and how doing so boosts the accuracy of product categorization and analytics efforts.

Sanjana Garg

Beware of Unreliable Data in Model Evaluation: A LLM Prompt Selection case study with Flan-T5

06/29/2023

You may choose suboptimal prompts for your LLM (or make other suboptimal choices via model evaluation) unless you clean your test data.

Chris Mauck
Jonas Mueller

Improving Legal Judgement Prediction with Cleanlab Studio

06/27/2023

A legal sector case study using Cleanlab Studio to produce better models for making predictions (e.g., of final judgements) based on court case documents.

Hui Wen Goh

Improving any OpenAI Language Model by Systematically Improving its Data

06/01/2023

Reduce LLM prediction error by 37% via data-centric AI.

Chris Mauck
Jonas Mueller

Datalab: A Linter for ML Datasets

05/16/2023

Catch issues in your data/labels. This unified audit uses your ML model to automatically detect various problems in real-world datasets that can be fixed to produce a better model.

Elías Snorrason
Sanjana Garg
Hui Wen Goh
Jesse Cummings
Jonas Mueller

CleanVision: Audit your Image Data for better Computer Vision

03/22/2023

Introducing an open-source Python package to automatically identify common issues in image datasets.

Sanjana Garg
Ulyana Tkachenko
Yiming Chen
Elías Snorrason
Jonas Mueller

ActiveLab: Active Learning with Data Re-Labeling

03/02/2023

ActiveLab helps you optimally choose which data to (re)label, lowering the cost to train an accurate ML model.

Hui Wen Goh
Jonas Mueller

Cleanlab: The History, Present, and Future

04/01/2022

How an MIT grad student project became a company with tech used by Google, Amazon, Tesla, Uber, Facebook, and companies around the world.

Curtis Northcutt

Cleanlab Studio: Issues Found in Popular Datasets

The Cleanlab Studio Audit uses AI to auto-detect problems in given data. Explore all sorts of issues found in popular datasets by Cleanlab Studio!

Detecting Dataset Drift and Non-IID Sampling: A k-Nearest Neighbors approach that works for Image/Text/Audio/Numeric Data

05/30/2023

A simple method to determine if a dataset violates the IID assumption in common ways (e.g. temporal drift, or interaction between almost adjacent datapoints).