Oopsie Data

All successful robots are alike; each unsuccessful robot is unsuccessful in its own way.

L30 Tolstoy, Anna Kareni-Bot

Oopsie is a multi-lab effort to build the first large-scale dataset of real robot manipulation failures.

Today’s robotics datasets contain only successes. But a policy that has only seen things go right never learns what a bad grasp looks like or how to avoid failures. The failure scenarios that would teach it this are produced constantly, during policy rollouts in every lab, but are ignored and thrown away.

We want to stop throwing them away. Real failures and suboptimal behavior are the missing ingredient for reinforcement learning, reward modeling, failure prediction, and world modeling, and no amount of synthetic noise injection substitutes for them. Getting there takes data spanning many robots, tasks, and setups, which is more than any single lab can collect.

So here is our ask: next time you roll out a policy on a real robot (for policy evaluation, play data collection, online RL training, or anything similar), keep the rollouts and send them to us, failures and successes alike. We provide the toolkit to record and annotate them, and contributing labs get early access to the dataset and co-authorship on the public release.

Details

This website is everything you need to start contributing. For a longer introduction to why suboptimal and failure data matters in robotics, see the motivation and why you should contribute. If you are ready, the quickstart guide walks through the workflow end to end and links out to the details of each step, and the Oopsie toolkit is what you will use to record and annotate the rollouts. Please refer to the FAQ for any questions you might have.

If our software does not fit your workflow, tell us and we will help you share your data anyway. Please also reach out with any suggestions, because we want this dataset to be useful for you and your research.

An example of a common evaluation failure

Below are two example episodes from the initial dataset: one successful grasp and one failure on an Aloha robot using a diffusion policy. Recorded under similar conditions with the same policy, they highlight the fine-grained differences the dataset is designed to capture.

Even in a simple ball-grasping task, a slight gripper offset can cause failure. Capturing these nuances (and other common failure modes) is a core goal of this project.

Successful episode

The robot is able to pick up the ball and place it in the bowl.

Failure episode

The robot fails to grasp the ball and instead pushes it, causing it to roll off the table.