Dataset Format

Episodes are stored as HDF5 files (.h5), one file per episode. This is the native format written by EpisodeRecorder and read by the annotation tool. For dataset distribution, episodes will be exported to training-ready formats such as RLDS and LeRobot. If you use the provided toolkit, you do not need to worry about the data format; but if you are writing a separate converter, or simply want to know what the data looks like under the hood, this page explains the format in detail.

We chose a file-per-episode format because it gives the annotation tool the freedom to group and bulk-annotate episodes however you like. For more details, see the explanation of the data annotation tool.

Note that we save considerably more metadata than comparable projects such as the DROID dataset. This is deliberate: our dataset is designed to make cross-embodiment training easier, so we explicitly collect information such as joint names and the rotation representation. For details, see the Robot Setup.


HDF5 Episode Schema

Each HDF5 file follows the oopsiedata_format_v1 schema:

episode.h5
├── [attr] schema = "oopsiedata_format_v1"
├── [attr] language_instruction    (str)
├── [attr] episode_id              (str)        # unique in submission group
├── [attr] lab_id                  (str)        # assigned at sign-up
├── [attr] operator_name           (str)        # can be pseudonymized
├── [attr] robot_profile           (str)        # json-serialized RobotSetup profile
├── [attr] timestamp               (float)      # unix timestamp of episode start
│
├── episode_annotations/           (group)      # written by annotation tool after rollout
│   └── <annotator_name>/          (group)      # one subgroup per annotator
│       ├── [attr] source              (str)    # e.g. "human"
│       ├── [attr] timestamp           (str)    # ISO timestamp of annotation
│       ├── [attr] success             (float)  # 1.0 = success, 0.0 = failure
│       ├── [attr] failure_description (str)
│       ├── [attr] taxonomy            (str)    # json: {failure_category, severity}
│       └── [attr] additional_notes    (str)
│
├── observations/                  (group)
│   ├── video_paths/               (group)
│   │   └── <camera_name>          (dataset, str) # relative path to .mp4 file
│   └── robot_states/              (group)
│       ├── gripper_position       (dataset, float64, shape [N, 1])
│       ├── cartesian_position     (dataset, float64, shape [N, D])
│       └── joint_position         (dataset, float64, shape [N, D])
│
└── actions/                       (group)
    ├── joint_position             (dataset, float64, shape [N, D])
    ├── joint_velocity             (dataset, float64, shape [N, D])
    ├── gripper_binary             (dataset, float64, shape [N, 1])
    ├── gripper_position           (dataset, float64, shape [N, 1])
    ├── gripper_velocity           (dataset, float64, shape [N, 1])
    ├── base_position              (dataset, float64, shape [N, 3])
    ├── base_velocity              (dataset, float64, shape [N, 3])
    ├── cartesian_position         (dataset, float64, shape [N, 7/14])
    └── cartesian_velocity         (dataset, float64, shape [N, 6/12])

N is the number of recorded timesteps and D is the number of degrees of freedom of the robot. For bi-arm setups, simply concatenate the actions of the left and right arm.
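
For example, reading an episode back with h5py could look like the following minimal sketch. All keys follow the schema above; the file name is a placeholder.

import json

import h5py

with h5py.File("episode_1.h5", "r") as f:
    assert f.attrs["schema"] == "oopsiedata_format_v1"
    instruction = f.attrs["language_instruction"]
    profile = json.loads(f.attrs["robot_profile"])   # RobotSetup profile

    # Per-timestep robot state observations, each shaped [N, ...].
    joint_pos = f["observations/robot_states/joint_position"][:]

    # Only one actions/ entry is guaranteed to be non-empty (see the
    # Important Points below), so check the size before reading.
    ds = f["actions/joint_position"]
    joint_cmds = ds[:] if ds.size > 0 else None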

Important Points

  • We assume that many data collection setups will not be able to record every action format. We therefore require only one entry in actions/ to be a valid tensor dataset; the others are stored as empty HDF5 datasets (see the sketch after this list).
  • Please provide unnormalized, absolute actions; this makes the actions easier to use and reduces the number of conversions needed.
  • We furthermore assume that the rotation component of the Cartesian position action space is encoded as quaternions. Tooling for converting the most common representations into quaternions is provided in the episode recorder.
  • The gripper action should be set in one of three keys: gripper_binary, gripper_position, or gripper_velocity. Note that we currently only support two-finger gripper setups.
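
The following is a hedged sketch of how the actions/ group could be written when only joint-position commands are available. The dataset names follow the schema above, but the writing pattern, and the exact shape of the empty placeholder datasets, are our assumptions rather than EpisodeRecorder's actual code.

import h5py
import numpy as np

N, D = 250, 7                           # timesteps, DoF (illustrative values)
joint_position_cmds = np.zeros((N, D))  # your real commands go here

with h5py.File("episode_1.h5", "a") as f:
    act = f.require_group("actions")
    act.create_dataset("joint_position", data=joint_position_cmds)
    # All unused action formats are stored as empty datasets.
    for key in ("joint_velocity", "gripper_binary", "gripper_position",
                "gripper_velocity", "base_position", "base_velocity",
                "cartesian_position", "cartesian_velocity"):
        act.create_dataset(key, shape=(0,), dtype="float64")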

Field Reference

Root Attributes

Field                 Type        Required  Description
schema                str attr    Yes       "oopsiedata_format_v1"
language_instruction  str attr    Yes       Natural language task description
episode_id            str attr    Yes       Unique disambiguation ID per episode
lab_id                str attr    Yes       Lab identifier for multi-lab tracking
operator_name         str attr    No        Name of the operator running the rollout
robot_profile         str attr    Yes       JSON-serialized RobotSetup profile (includes robot_id, control_freq, joint names, camera names, etc.)
timestamp             float attr  Yes       Unix timestamp of episode start
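
For illustration, the root attributes could be written with h5py as follows. The profile contents shown here are made up; the real profile is the JSON-serialized RobotSetup from your rig.

import json
import time

import h5py

profile = {"robot_id": "franka_panda", "control_freq": 15}  # illustrative only

with h5py.File("episode_1.h5", "a") as f:
    f.attrs["schema"] = "oopsiedata_format_v1"
    f.attrs["language_instruction"] = "pick up the red block"
    f.attrs["episode_id"] = "ep_0001"      # unique within the submission group
    f.attrs["lab_id"] = "lab_042"          # assigned at sign-up
    f.attrs["robot_profile"] = json.dumps(profile)
    f.attrs["timestamp"] = time.time()     # unix timestamp of episode start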

Episode Annotations

The episode_annotations/ group is written by the annotation tool after rollout, not during recording. It contains one subgroup per annotator:

Field                Type        Description
source               str attr    Annotation source, e.g. "human"
timestamp            str attr    ISO timestamp of when the annotation was made
success              float attr  1.0 = success, 0.0 = failure
failure_description  str attr    Free-text description of the failure
taxonomy             str attr    JSON string: {"failure_category": ..., "severity": ...}
additional_notes     str attr    Any other annotator notes

The annotation tool provides a simple interface for editing these fields per episode, or in bulk across a group of episodes.
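
A minimal sketch of such an annotation subgroup written with h5py; the annotator name "alice" and the taxonomy labels are placeholders (the valid labels come from the questionnaire described further below):

import json
from datetime import datetime, timezone

import h5py

with h5py.File("episode_1.h5", "a") as f:
    grp = f.require_group("episode_annotations/alice")
    grp.attrs["source"] = "human"
    grp.attrs["timestamp"] = datetime.now(timezone.utc).isoformat()
    grp.attrs["success"] = 0.0
    grp.attrs["failure_description"] = "gripper closed before reaching the block"
    # Placeholder labels; the valid values are defined by the questionnaire.
    grp.attrs["taxonomy"] = json.dumps(
        {"failure_category": "grasp", "severity": "minor"}
    )
    grp.attrs["additional_notes"] = ""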

Video Paths

Camera video files are stored under observations/video_paths/ as relative paths to MP4 files co-located with the HDF5 file. Camera names are user-defined; use descriptive names for consistency:

wrist_cam          # Wrist-mounted camera
overhead_cam       # Top-down view
left_shoulder_cam  # Left over-shoulder view
right_shoulder_cam # Right over-shoulder view
front_cam          # Frontal view

Keeping the videos in separate files, rather than storing raw pixel observations in the HDF5 file, lets us display and store high-resolution videos without massive storage inflation during annotation. These files will be post-processed for the dataset release.
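
Resolving a video at load time is then just a matter of joining the stored relative path with the episode's directory, for example:

from pathlib import Path

import h5py

episode_path = Path("samples/evaluation_session_1/episode_1.h5")  # placeholder
with h5py.File(episode_path, "r") as f:
    rel = f["observations/video_paths/wrist_cam"][()]
    rel = rel.decode() if isinstance(rel, bytes) else rel  # h5py may return bytes

video_path = episode_path.parent / rel  # MP4s are co-located with the .h5 file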

Robot State Observations (per timestep)

Stored under observations/robot_states/.

Field               Type     Shape   Description
gripper_position    float64  (N, 1)  Gripper position state
cartesian_position  float64  (N, D)  End-effector Cartesian pose; shape depends on robot profile
joint_position      float64  (N, D)  Joint position state

Actions (per timestep)

Stored under actions/. Unused fields are written as empty HDF5 datasets.

Field               Type     Shape      Description
joint_position      float64  (N, D)     Commanded joint positions in absolute (unnormalized) space
joint_velocity      float64  (N, D)     Commanded joint velocities
cartesian_position  float64  (N, 7/14)  Commanded end-effector Cartesian pose (position + quaternion rotation)
cartesian_velocity  float64  (N, 6/12)  Commanded end-effector Cartesian velocity (3 linear + 3 angular per arm)
gripper_binary      float64  (N, 1)     Binary gripper command (open/close)
gripper_position    float64  (N, 1)     Commanded gripper position
gripper_velocity    float64  (N, 1)     Commanded gripper velocity
base_position       float64  (N, 3)     Commanded base position (usually deltas)
base_velocity       float64  (N, 3)     Commanded base velocity (x, y, yaw)

To make the dataset usable across embodiments, we ask that you save absolute end-effector position commands. However, we acknowledge that the best action-space encoding can vary from policy to policy. For common embodiments, such as the Franka arm used in the standard DROID setup and the ALOHA bi-arm manipulator, we provide utilities to convert between joint-space and end-effector-space representations.
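
The toolkit ships its own conversion helpers; purely as a stand-in illustration, SciPy can assemble quaternion-encoded Cartesian position actions from Euler-angle commands:

import numpy as np
from scipy.spatial.transform import Rotation

N = 250                   # timesteps (illustrative)
xyz = np.zeros((N, 3))    # absolute end-effector positions, [N, 3]
euler = np.zeros((N, 3))  # roll/pitch/yaw commands, [N, 3]

# SciPy returns quaternions in (x, y, z, w) order; check the toolkit's
# converters for the component order the dataset actually expects.
quat = Rotation.from_euler("xyz", euler).as_quat()        # [N, 4]
cartesian_position = np.concatenate([xyz, quat], axis=1)  # [N, 7]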

If you want to contribute data from a different embodiment and your policy does not allow you to output end-effector positions, please let us know.


Failure Annotation Schema

Annotations are written into the episode_annotations/<annotator_name>/ subgroup by the annotation tool. The taxonomy attribute holds a JSON string with structured failure labels:

{
  "failure_category": "<category>",
  "severity": "<severity>"
}

The questionnaire driving these fields is defined in oopsie_tools/annotation_tool/questionnaire.yaml and can be filled out using the annotation tool.
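
Reading annotations back is straightforward; for example, iterating over all annotator subgroups and decoding the taxonomy JSON:

import json

import h5py

with h5py.File("episode_1.h5", "r") as f:
    for name, grp in f["episode_annotations"].items():
        taxonomy = json.loads(grp.attrs["taxonomy"])
        print(name, grp.attrs["success"], taxonomy["failure_category"])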


Directory Layout

We strongly recommend saving the data in a directory layout structured as follows:

samples_directory
├── evaluation_session_1
│   ├── episode_1.h5
│   ├── episode_1_camera1.mp4
│   ├── episode_1_camera2.mp4
│   ├── episode_2.h5
│   ├── episode_2_camera1.mp4
│   ├── episode_2_camera2.mp4
│   └── ...
│
├── evaluation_session_2
│   ├── episode_1.h5
│   ├── episode_1_camera1.mp4
│   ├── episode_1_camera2.mp4
│   └── ...
│
└── ...

By default, session and episode names contain the lab ID and a timestamp.

Keeping all episodes from one evaluation session in a single directory lets you use the bulk annotation utilities in the data annotation tool. Note that we make no assumptions about file names; all of our tools simply search a nested directory structure for HDF5 files. The paths of the associated MP4 camera videos must be stored in the HDF5 files themselves, as described above.
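
For example, discovering all episodes under a samples directory is a single recursive glob (a sketch of the behavior described above, not our tools' actual code):

from pathlib import Path

samples_dir = Path("samples_directory")
episodes = sorted(
    p for p in samples_dir.rglob("*") if p.suffix in {".h5", ".hdf5"}
)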


Additional Data Format Constraints

Video constraints (enforced by validate.py):

  • Resolution: 180–1280 px on each side
  • Duration: 2–300 seconds
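
As a rough sketch of the kind of check validate.py performs (the actual script may differ), OpenCV can read both properties:

import cv2

cap = cv2.VideoCapture("episode_1_camera1.mp4")
width = cap.get(cv2.CAP_PROP_FRAME_WIDTH)
height = cap.get(cv2.CAP_PROP_FRAME_HEIGHT)
fps = cap.get(cv2.CAP_PROP_FPS)
frames = cap.get(cv2.CAP_PROP_FRAME_COUNT)
cap.release()

duration = frames / fps if fps else 0.0
resolution_ok = all(180 <= side <= 1280 for side in (width, height))
duration_ok = 2.0 <= duration <= 300.0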
