Dataset Format
Episodes are stored as HDF5 files (.h5), one file per episode. This is the native format written by EpisodeRecorder and read by the annotation tool. For dataset distribution, episodes will be exported to training-ready formats such as RLDS and LeRobot. If you use the provided toolkit, you do not need to worry about the data format. If you write your own converter, or simply want to know how the data looks under the hood, this page explains it in detail.
We chose a file-per-episode format because it makes the annotation tool more flexible: episodes can be freely grouped and bulk-annotated. For more details, see the explanation of the data annotation tool.
Note that we save far more metadata than other projects such as the DROID dataset. This is deliberate, as our dataset is designed to make cross-embodiment training easier. For example, we explicitly record the joint names and the rotation representation. For details, see Robot Setup.
HDF5 Episode Schema
Each HDF5 file follows the oopsiedata_format_v1 schema:
episode.h5
├── [attr] schema = "oopsiedata_format_v1"
├── [attr] language_instruction (str)
├── [attr] episode_id (str) # unique in submission group
├── [attr] lab_id (str) # assigned at sign_up
├── [attr] operator_name (str) # can be pseudonymized
├── [attr] robot_profile (str) # json-serialized RobotSetup profile
├── [attr] timestamp (float) # unix timestamp of episode start
│
├── episode_annotations/ (group) # written by annotation tool after rollout
│   └── <annotator_name>/ (group) # one subgroup per annotator
│       ├── [attr] source (str) # e.g. "human"
│       ├── [attr] timestamp (str) # ISO timestamp of annotation
│       ├── [attr] success (float) # 1.0 = success, 0.0 = failure
│       ├── [attr] failure_description (str)
│       ├── [attr] taxonomy (str) # json: {failure_category, severity}
│       └── [attr] additional_notes (str)
│
├── observations/ (group)
│   ├── video_paths/ (group)
│   │   └── <camera_name> (dataset, str) # relative path to .mp4 file
│   └── robot_states/ (group)
│       ├── gripper_position (dataset, float64, shape [N, 1])
│       ├── cartesian_position (dataset, float64, shape [N, D])
│       └── joint_position (dataset, float64, shape [N, D])
│
└── actions/ (group)
    ├── joint_position (dataset, float64, shape [N, D])
    ├── joint_velocity (dataset, float64, shape [N, D])
    ├── gripper_binary (dataset, float64, shape [N, 1])
    ├── gripper_position (dataset, float64, shape [N, 1])
    ├── gripper_velocity (dataset, float64, shape [N, 1])
    ├── base_position (dataset, float64, shape [N, 3])
    ├── base_velocity (dataset, float64, shape [N, 3])
    ├── cartesian_position (dataset, float64, shape [N, 7/14])
    └── cartesian_velocity (dataset, float64, shape [N, 6/12])
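N is the number of recorded timesteps and D is the robot's number of degrees of freedom. For bi-arm setups, simply concatenate the actions of the left and right arm; this is why cartesian_position and cartesian_velocity list two widths (7/14 and 6/12 for one and two arms, respectively).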
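As a minimal sketch of the bi-arm convention, assume two single-arm streams of shape [N, 7] (position + quaternion) and [N, 6] (3 linear + 3 angular velocity). Concatenating along the last axis yields the [N, 14] and [N, 12] shapes listed in the schema. The helper name concat_bi_arm and the left-before-right ordering are our assumptions; check the ordering your converter expects.

```python
import numpy as np


def concat_bi_arm(left: np.ndarray, right: np.ndarray) -> np.ndarray:
    """Hypothetical helper: join per-arm arrays of shape [N, k] into one [N, 2k] array.

    Assumes the left arm's columns come first.
    """
    if left.shape != right.shape:
        raise ValueError(f"arm shapes differ: {left.shape} vs {right.shape}")
    return np.concatenate([left, right], axis=-1)


N = 100  # number of recorded timesteps
left_pose = np.zeros((N, 7))   # x, y, z + 4 quaternion components
right_pose = np.zeros((N, 7))
left_vel = np.zeros((N, 6))    # 3 linear + 3 angular components
right_vel = np.zeros((N, 6))

cartesian_position = concat_bi_arm(left_pose, right_pose)  # shape (100, 14)
cartesian_velocity = concat_bi_arm(left_vel, right_vel)    # shape (100, 12)
```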
Important Points
- We assume that many data collection setups cannot record every action format. We therefore only require one entry in actions/ to be a valid tensor dataset; the others are stored as empty HDF5 datasets.
- Please provide unnormalized, absolute actions; this makes the actions easier to use and reduces the number of conversions.
- We further assume that the rotation component of the cartesian_position action space is encoded as a quaternion. The episode recorder provides tooling for converting the most common rotation representations into quaternions.
- Store the gripper action in one of the three keys gripper_binary, gripper_position, or gripper_velocity. Note that we currently only support two-finger gripper setups.
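The points above can be illustrated with a minimal h5py sketch that writes a single-arm episode in which only actions/cartesian_position and actions/gripper_position hold data and every other action key is an empty dataset. This is not the EpisodeRecorder implementation. The helper write_minimal_episode, the use of h5py.Empty for unused keys, the [x, y, z, qx, qy, qz, qw] quaternion order, and all profile, lab, and file names are illustrative assumptions. The episode_annotations/ group is left out because the annotation tool writes it later.

```python
import json
import os
import tempfile
import time

import h5py
import numpy as np

# All action keys from the schema; unused ones are written as empty datasets.
ACTION_KEYS = (
    "joint_position", "joint_velocity",
    "gripper_binary", "gripper_position", "gripper_velocity",
    "base_position", "base_velocity",
    "cartesian_position", "cartesian_velocity",
)


def write_minimal_episode(path, *, episode_id, instruction, lab_id, robot_profile,
                          video_paths, states, actions, operator_name=None):
    """Hypothetical helper: write one episode following oopsiedata_format_v1."""
    unknown = set(actions) - set(ACTION_KEYS)
    if unknown:
        raise ValueError(f"unknown action keys: {sorted(unknown)}")
    if not actions:
        raise ValueError("at least one action key must hold data")
    lengths = {len(a) for a in actions.values()}
    if len(lengths) != 1:
        raise ValueError(f"populated action keys disagree on N: {lengths}")

    with h5py.File(path, "w") as f:
        # Root attributes
        f.attrs["schema"] = "oopsiedata_format_v1"
        f.attrs["language_instruction"] = instruction
        f.attrs["episode_id"] = episode_id
        f.attrs["lab_id"] = lab_id
        if operator_name is not None:  # optional field
            f.attrs["operator_name"] = operator_name
        f.attrs["robot_profile"] = json.dumps(robot_profile)
        f.attrs["timestamp"] = time.time()  # write time, as a stand-in for episode start

        # Observations: relative video paths and per-timestep robot states
        obs = f.create_group("observations")
        vids = obs.create_group("video_paths")
        for cam, rel_path in video_paths.items():
            vids.create_dataset(cam, data=rel_path)  # scalar string dataset
        rs = obs.create_group("robot_states")
        for key, arr in states.items():
            rs.create_dataset(key, data=np.asarray(arr, dtype=np.float64))

        # Actions: populated keys get data, the rest are empty datasets
        act = f.create_group("actions")
        for key in ACTION_KEYS:
            if key in actions:
                act.create_dataset(key, data=np.asarray(actions[key], dtype=np.float64))
            else:
                act.create_dataset(key, data=h5py.Empty("f8"))


# Usage: a 50-step single-arm episode (all values are placeholders)
N = 50
pose = np.zeros((N, 7))
pose[:, 6] = 1.0  # identity rotation, assuming [x, y, z, qx, qy, qz, qw]
out = os.path.join(tempfile.mkdtemp(), "episode_1.h5")
write_minimal_episode(
    out,
    episode_id="episode_1",
    instruction="put the cup in the sink",
    lab_id="example_lab",
    robot_profile={"robot_id": "example_arm", "control_freq": 15},
    video_paths={"wrist_cam": "episode_1_wrist_cam.mp4"},
    states={
        "gripper_position": np.zeros((N, 1)),
        "cartesian_position": pose,
        "joint_position": np.zeros((N, 7)),
    },
    actions={"cartesian_position": pose, "gripper_position": np.zeros((N, 1))},
)
```

When reading such a file, an empty dataset created with h5py.Empty reports a shape of None, which is an easy way to tell unused action keys apart from populated ones.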
Field Reference
Root Attributes
| Field | Type | Required | Description |
|---|---|---|---|
| schema | str attr | Yes | "oopsiedata_format_v1" |
| language_instruction | str attr | Yes | Natural language task description |
| episode_id | str attr | Yes | Unique disambiguation ID per episode |
| lab_id | str attr | Yes | Lab identifier for multi-lab tracking |
| operator_name | str attr | No | Name of the operator running the rollout |
| robot_profile | str attr | Yes | JSON-serialized RobotSetup profile (includes robot_id, control_freq, joint names, camera names, etc.) |
| timestamp | float attr | Yes | Unix timestamp of episode start |
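Because most of these fields are plain string attributes, it can help to check them before submitting. A minimal sketch, assuming the attributes have already been read into a dictionary (for example with dict(f.attrs) in h5py). check_root_attrs is a hypothetical helper, not part of the provided toolkit, and validate.py may check more or different things.

```python
import json
from collections.abc import Mapping

# Required root attributes and their expected Python types (operator_name is optional).
REQUIRED_ROOT_ATTRS = {
    "schema": str,
    "language_instruction": str,
    "episode_id": str,
    "lab_id": str,
    "robot_profile": str,
    "timestamp": float,  # numpy.float64 read back by h5py is a float subclass
}


def check_root_attrs(attrs: Mapping) -> list[str]:
    """Return a list of problems with an episode's root attributes (empty if none)."""
    problems = []
    for name, expected in REQUIRED_ROOT_ATTRS.items():
        if name not in attrs:
            problems.append(f"missing required attribute: {name}")
        elif not isinstance(attrs[name], expected):
            problems.append(
                f"{name} should be {expected.__name__}, got {type(attrs[name]).__name__}"
            )
    if "schema" in attrs and attrs["schema"] != "oopsiedata_format_v1":
        problems.append(f"unexpected schema: {attrs['schema']!r}")
    profile = attrs.get("robot_profile")
    if isinstance(profile, str):
        try:
            json.loads(profile)  # must be a JSON-serialized RobotSetup profile
        except json.JSONDecodeError:
            problems.append("robot_profile is not valid JSON")
    return problems
```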
Episode Annotations
The episode_annotations/ group is written by the annotation tool after rollout, not during recording. It contains one subgroup per annotator:
| Field | Type | Description |
|---|---|---|
| source | str attr | Annotation source, e.g. "human" |
| timestamp | str attr | ISO timestamp of when the annotation was made |
| success | float attr | 1.0 = success, 0.0 = failure |
| failure_description | str attr | Free-text description of the failure |
| taxonomy | str attr | JSON string: {"failure_category": ..., "severity": ...} |
| additional_notes | str attr | Any other annotator notes |
The annotation tool provides a simple interface for editing these fields per episode, or in bulk across a group of episodes.
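The annotation tool normally writes these attributes for you. If you produce annotations in your own pipeline, building one annotator's attribute set could look like the minimal sketch below. make_annotation_attrs is a hypothetical helper, recording the timestamp in UTC is our choice, and valid category and severity values should come from the annotation questionnaire rather than from this example.

```python
import json
from datetime import datetime, timezone


def make_annotation_attrs(success: bool, failure_category: str = "", severity: str = "",
                          failure_description: str = "", additional_notes: str = "",
                          source: str = "human") -> dict:
    """Hypothetical helper: attributes for one episode_annotations/<annotator_name>/ group."""
    return {
        "source": source,
        "timestamp": datetime.now(timezone.utc).isoformat(),  # ISO timestamp, UTC
        "success": 1.0 if success else 0.0,                    # stored as a float
        "failure_description": failure_description,
        "taxonomy": json.dumps({"failure_category": failure_category, "severity": severity}),
        "additional_notes": additional_notes,
    }
```

With h5py, the resulting dictionary could then be written via f.require_group("episode_annotations/<annotator_name>").attrs.update(attrs).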
Video Paths
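The camera videos are referenced under observations/video_paths/: each camera has one string dataset holding the path of its MP4 file relative to the HDF5 file, and the MP4 files themselves are co-located with the HDF5 file. Camera names are user-defined; for consistency, please use descriptive names such as: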
wrist_cam # Wrist-mounted camera
overhead_cam # Top-down view
left_shoulder_cam # Left over-shoulder view
right_shoulder_cam # Right over-shoulder view
front_cam # Frontal view
Storing the videos as separate MP4 files, rather than as raw pixel observations inside the HDF5 file, lets us store and display high-resolution video during annotation without inflating storage. These video files will be post-processed for the dataset release.
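Since the stored paths are relative, a reader has to resolve them against the directory containing the HDF5 file. A minimal stdlib sketch; resolve_video_paths is hypothetical, and accepting bytes values is only a precaution in case the strings are read back without decoding.

```python
from pathlib import Path


def resolve_video_paths(h5_path, relative_paths: dict) -> dict:
    """Hypothetical helper: map camera names to MP4 paths next to the HDF5 file."""
    base = Path(h5_path).parent
    resolved = {}
    for cam, rel in relative_paths.items():
        if isinstance(rel, bytes):  # undecoded HDF5 string
            rel = rel.decode("utf-8")
        if Path(rel).is_absolute():
            raise ValueError(f"{cam}: expected a relative path, got {rel!r}")
        resolved[cam] = base / rel
    return resolved
```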
Robot State Observations (per timestep)
Stored under observations/robot_states/.
| Field | Type | Shape | Description |
|---|---|---|---|
| gripper_position | float64 | (N, 1) | Gripper position state |
| cartesian_position | float64 | (N, D) | End-effector Cartesian pose; shape depends on robot profile |
| joint_position | float64 | (N, D) | Joint position state |
Actions (per timestep)
Stored under actions/. Unused fields are written as empty HDF5 datasets.
| Field | Type | Shape | Description |
|---|---|---|---|
| joint_position | float64 | (N, D) | Commanded joint positions in absolute (unnormalized) space |
| joint_velocity | float64 | (N, D) | Commanded joint velocities |
| cartesian_position | float64 | (N, 7/14) | Commanded end-effector Cartesian pose: 3D position + 4D quaternion rotation per arm (14 for bi-arm setups) |
| cartesian_velocity | float64 | (N, 6/12) | Commanded end-effector Cartesian velocity: first 3 linear + next 3 angular per arm (12 for bi-arm setups) |
| gripper_binary | float64 | (N, 1) | Binary gripper command (open/close) |
| gripper_position | float64 | (N, 1) | Commanded gripper position |
| gripper_velocity | float64 | (N, 1) | Commanded gripper velocity |
| base_position | float64 | (N, 3) | Commanded base position (usually expressed as deltas) |
| base_velocity | float64 | (N, 3) | Commanded base velocity (linear x, y and angular yaw) |
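The episode recorder ships conversion tooling for the rotation component, but if you convert in your own pipeline, the following numpy sketch turns roll-pitch-yaw angles into the quaternion part of a cartesian_position row. Both the Euler convention (R = Rz(yaw) Ry(pitch) Rx(roll), angles in radians) and the scalar-last [qx, qy, qz, qw] order are assumptions, as are the helper names; check them against the episode recorder's tooling and your robot profile before use.

```python
import numpy as np


def rpy_to_quat_xyzw(rpy) -> np.ndarray:
    """Convert roll-pitch-yaw (radians, R = Rz(yaw) @ Ry(pitch) @ Rx(roll))
    to unit quaternions in assumed scalar-last order [qx, qy, qz, qw]."""
    half = 0.5 * np.asarray(rpy, dtype=np.float64)
    cr, cp, cy = np.cos(half).T  # each of shape (N,) for input of shape (N, 3)
    sr, sp, sy = np.sin(half).T
    qw = cr * cp * cy + sr * sp * sy
    qx = sr * cp * cy - cr * sp * sy
    qy = cr * sp * cy + sr * cp * sy
    qz = cr * cp * sy - sr * sp * cy
    return np.stack([qx, qy, qz, qw], axis=-1)


def pose_from_position_rpy(position, rpy) -> np.ndarray:
    """Hypothetical helper: build [N, 7] rows [x, y, z, qx, qy, qz, qw]."""
    position = np.asarray(position, dtype=np.float64)
    return np.concatenate([position, rpy_to_quat_xyzw(rpy)], axis=-1)
```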
To make the dataset usable across embodiments, we ask that you save absolute end-effector position commands. However, we acknowledge that the best action space encoding can vary from policy to policy. For common embodiments, such as the Franka arm used in the standard DROID setup and the ALOHA bi-arm manipulator, we provide utilities to convert between joint-space and end-effector (EEF) representations.
If you want to contribute data from a different embodiment and your policy cannot output end-effector positions, please let us know.
Failure Annotation Schema
Annotations are written into the episode_annotations/<annotator_name>/ subgroup by the annotation tool. The taxonomy attribute holds a JSON string with structured failure labels:
{
"failure_category": "<category>",
"severity": "<severity>"
}
The questionnaire driving these fields is defined in oopsie_tools/annotation_tool/questionnaire.yaml and can be filled out using the annotation tool.
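When reading annotations back, the taxonomy string has to be decoded. A small sketch, assuming the attribute may arrive as str or bytes depending on how it was written; parse_taxonomy is a hypothetical helper that only checks for the two keys shown above, not for valid questionnaire values.

```python
import json


def parse_taxonomy(raw) -> tuple[str, str]:
    """Hypothetical helper: parse a taxonomy attribute into (failure_category, severity)."""
    if isinstance(raw, bytes):
        raw = raw.decode("utf-8")
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("taxonomy must be a JSON object")
    missing = {"failure_category", "severity"} - data.keys()
    if missing:
        raise ValueError(f"taxonomy is missing keys: {sorted(missing)}")
    return data["failure_category"], data["severity"]
```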
Directory Layout
We strongly recommend saving the data in a directory layout like the following:
samples_directory
├── evaluation_session_1
│   ├── episode_1.hdf5
│   ├── episode_1_camera1.mp4
│   ├── episode_1_camera2.mp4
│   ├── episode_2.hdf5
│   ├── episode_2_camera1.mp4
│   ├── episode_2_camera2.mp4
│   └── ...
│
├── evaluation_session_2
│   ├── episode_1.hdf5
│   ├── episode_1_camera1.mp4
│   ├── episode_1_camera2.mp4
│   └── ...
│
└── ...
By default, session and episode names include the lab ID and a timestamp.
Keeping all episodes from one evaluation session in one directory lets you easily use the bulk annotation utilities of the data annotation tool. We make no assumptions about file names: all of our tools simply search the nested directory structure for HDF5 files. The paths of the associated MP4 camera videos must therefore be stored in the HDF5 files themselves, as described above.
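A stdlib sketch of that kind of discovery, grouping episode files by the session directory that contains them. find_episodes is a hypothetical helper, not the annotation tool's own code, and matching both .h5 and .hdf5 is our assumption since both extensions appear on this page.

```python
from pathlib import Path

HDF5_SUFFIXES = {".h5", ".hdf5"}  # both extensions appear on this page


def find_episodes(root) -> dict[str, list[Path]]:
    """Hypothetical helper: group every HDF5 file under `root` by its parent directory."""
    root = Path(root)
    sessions: dict[str, list[Path]] = {}
    for path in sorted(root.rglob("*")):  # recursive, so nested sessions are found too
        if path.is_file() and path.suffix.lower() in HDF5_SUFFIXES:
            session = str(path.parent.relative_to(root))
            sessions.setdefault(session, []).append(path)
    return sessions
```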
Additional Data Format Constraints
Video constraints (enforced by validate.py):
- Resolution: 180–1280 px on each side
- Duration: 2–300 seconds
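To catch problems before running validate.py, the same bounds can be checked on video metadata you already have (for example width, height, and duration reported by ffprobe). This is a hypothetical helper that assumes inclusive bounds; validate.py remains the authoritative check.

```python
def check_video_constraints(width: int, height: int, duration_s: float) -> list[str]:
    """Hypothetical helper: compare video metadata against the listed constraints."""
    problems = []
    for name, side in (("width", width), ("height", height)):
        if not 180 <= side <= 1280:  # resolution bound applies to each side
            problems.append(f"{name} {side} px outside 180-1280 px")
    if not 2 <= duration_s <= 300:
        problems.append(f"duration {duration_s} s outside 2-300 s")
    return problems
```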