Quickstart Guide

To contribute data to the Oopsie Dataset, we ask that you collect recordings of your robot policy evaluations, both successes and failures. Our toolkit provides utilities for formatting robot evaluation data consistently across embodiments. Finally, use our annotation tool to quickly add a brief description of each failed trajectory and upload your labeled data to the project repository.

This page provides a brief overview of all the necessary steps to contribute to the project using our tooling:

  1. Registration
  2. Installation and setup
  3. Data recording and annotation
  4. Data submission

For each step, we provide more detailed instructions, code, and API documentation in the Oopsie Toolkit section of this website. If you run into any trouble, please refer to it for additional information and check the FAQ as well. If you still have questions, do not hesitate to open an issue on GitHub or contact the team.


1. Registration

To submit data to the official Oopsie Data Project, you need to register your lab. We will review the registration and send you a lab-specific ID and a Hugging Face token, which are used to submit data to the central repository. Only your lab will have access to this data until the public release, and we will notify all contributors before their data is released!

To register, please use the registration form. We only need one registration per lab. If your lab is already registered, please ask your lab's contact person for the submission token.


2. Installation and setup

2.1 Installation

Full instructions

To install our data collection and annotation tooling, we recommend using uv or pip. We have tested our toolkit with Python 3.8 and 3.12; please contact us if you run into trouble with another version.

To download and install the oopsie-tools package, simply activate your environment and run

git clone https://github.com/oopsie-data/oopsie-tools
cd oopsie-tools

pip install -e .
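
If you prefer uv, an equivalent editable install should work as well. As a sketch (assuming uv is already installed and you are in the cloned repository):

uv venv
source .venv/bin/activate
uv pip install -e .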

2.2 Creating a robot profile

Full instructions

To record robot and policy metadata, we use a setup-specific YAML file, the robot profile. A template and example robot profiles can be found in config/robot_profiles. To start, use the template or the closest existing profile and modify it with your specific information. For a full list of keys and detailed information, please refer to the full instructions.

The robot profile captures both the robot embodiment information and the policy metadata. This means you have to overwrite the policy field or create a separate profile if you want to evaluate more than one policy. This is required because some keys, such as the action space, are specific to the policy and not just to the embodiment.
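
For illustration only, a robot profile might look roughly like the sketch below. The key names here are placeholders, not the real schema, so always start from the template in config/robot_profiles:

# Illustrative sketch -- key names are placeholders; see the template for the real schema.
robot:
  name: franka_panda            # embodiment information
  camera_views: [wrist, front]
policy:
  name: my_pick_place_policy    # policy metadata; overwrite or duplicate the profile per policy
  action_space: joint_velocity  # policy-specific, not just embodiment-specific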

2.3 Setting up the contributor config

To contribute data, you will need to put the lab ID and Hugging Face token you received after registration in config/contributor_config.yaml. Please make sure that you use the lab ID exactly as provided (including capitalization); otherwise you cannot access the lab-specific repository.
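
As a rough illustration (the key names below are placeholders; use the ones in the shipped config/contributor_config.yaml):

# Sketch only -- keep the key names from the shipped contributor_config.yaml.
lab_id: YourLabID        # must match the provided ID exactly, including capitalization
hf_token: hf_xxxxxxxx    # Hugging Face token received after registration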


3. Data recording and annotation

3.1 Data collection

Full instructions

We provide several tools to collect data and save it in the required format for annotation and submission. If you are using a standard framework for policy execution and evaluation, check the examples provided in examples/inference_examples for a growing list of ready-to-use scripts.

We envision three possible workflows for data collection:

  1. In-the-loop collection and annotation: If you want to collect data and immediately annotate each episode with its success/failure outcome and a failure description, we provide an all-in-one tool that automatically saves your evaluation data and launches a browser tool for annotation. For a minimal code example, see here.
  2. Bulk collection and annotation: If you only want to record the data and annotate each episode later, we provide a stand-alone tool for recording. For a minimal code example, see here; a rough usage sketch also follows this list.
  3. Custom collection and bulk annotation: If your setup is incompatible with our EpisodeRecorder, or if you have already collected data and simply want to format it into the Oopsie Data format for annotation and submission, see option 3 here.
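
To give a flavor of workflow 2, the sketch below wraps an evaluation loop around the recorder. The import path and the method names (start_episode, record_step, end_episode) are assumptions for illustration only; please follow the linked minimal examples for the actual EpisodeRecorder API.

# Illustrative sketch only: the import path and EpisodeRecorder method names are
# assumptions, not the documented API. See the minimal examples linked above.
from oopsie_tools import EpisodeRecorder  # import path assumed

# env and policy below stand in for your own environment and policy objects.
recorder = EpisodeRecorder(robot_profile="config/robot_profiles/my_robot.yaml")

num_episodes = 10
for episode in range(num_episodes):
    obs = env.reset()
    recorder.start_episode()
    done = False
    while not done:
        action = policy(obs)
        recorder.record_step(obs, action)   # log the observation/action pair
        obs, reward, done, info = env.step(action)
    recorder.end_episode()                  # write the episode in the Oopsie Data format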

3.2 Annotation

Full instructions

To make the data useful for downstream projects, we require that each episode is annotated with failure information. At minimum, each episode needs to be marked as a success or failure, and the "Describe what went wrong" field needs to be filled. In addition, each annotator can optionally fill out a short failure questionnaire.
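
Conceptually, the minimum annotation per episode boils down to something like the following (purely illustrative; the actual field names and storage format are defined by the annotation tool):

# Purely illustrative -- the annotation tool defines the real field names and format.
episode_0042:
  success: false
  failure_description: "Gripper closed too early and knocked the mug off the table."
  failure_questionnaire: {}   # optional short questionnaire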


4. Data submission

Full instructions

To submit your data, ensure that you have provided the lab ID and Hugging Face token (see section 2.3) and that your data is properly annotated. After that, you can simply run

python scripts/validate_and_upload/upload.py \
  --samples_dir /path/to/formatted_data
