Prepare Data#

If you are not a developer, you can skip these instructions. The required data is downloaded automatically from HuggingFace when you run the code.

At the current stage, the HuggingFace dataset is private, so you need an access token to download it. See tips for using HuggingFace here.
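
As an illustration of supplying the token programmatically, here is a minimal sketch that assumes the `huggingface_hub` package is installed and that the token is stored in the `HF_TOKEN` environment variable. The helper names `resolve_hf_token` and `download_roboverse_data` are hypothetical and are not part of the project; the dataset ID is taken from the clone URL on this page.

```python
import os


def resolve_hf_token(env=None):
    """Return the HuggingFace token from the environment, or None if unset."""
    env = os.environ if env is None else env
    token = env.get("HF_TOKEN", "").strip()
    return token or None


def download_roboverse_data(local_dir="roboverse_data", token=None):
    """Download a snapshot of the dataset (assumes `pip install huggingface_hub`)."""
    # Imported lazily so that the token helper works without the package.
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id="RoboVerseOrg/roboverse_data",
        repo_type="dataset",
        local_dir=local_dir,
        token=token or resolve_hf_token(),
    )
```

Calling `download_roboverse_data()` without a valid token should fail with an authorization error while the dataset remains private.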

Data on HuggingFace#

Download the release-ready data by running:

git clone git@hf.co:datasets/RoboVerseOrg/roboverse_data

If Git LFS is not yet installed on your machine, run the following commands to install it:

sudo apt install git-lfs
git lfs install

File structure#

The release-ready data strictly follows the format below. It is divided into two parts: metasim/data and roboverse_data.

metasim/data should hold a relatively small amount of data and allow continuous updates, so it contains the data needed for the quickstarts and the robot assets.
roboverse_data is the place for large, stable data such as trajectories, object assets, and materials, because once a task has been successfully transferred, its trajectory and object assets rarely change.

Note again that the robot assets are not included in roboverse_learn, but in metasim/data.

metasim
|-- data
    |-- quickstarts
        |-- ...
    |-- robots (deprecated)
        |-- {robot}
            |-- usd
                |-- *.usd
            |-- urdf
                |-- *.urdf
            |-- mjcf
                |-- *.xml

roboverse_data
|-- robots
    |-- {robot}
        |-- usd
            |-- *.usd
        |-- urdf
            |-- *.urdf
        |-- mjcf
            |-- *.xml

|-- trajs
    |-- {benchmark}
        |-- {task}
            |-- v2
                |-- {robot}_v2.pkl

|-- assets
    |-- {benchmark}
        |-- {task}
            |-- {object}
                |-- textures
                    |-- *.png
                |-- usd
                    |-- *.usd
                |-- urdf
                    |-- *.urdf
                |-- mjcf
                    |-- *.xml
|-- materials
    |-- arnold
        |-- {Category}
            |-- *.mdl
    |-- vMaterial_2
        |-- {Category}
            |-- *.mdl
    |-- ...

|-- scenes
    |-- {source} (e.g. arnold, physcene)
        |-- {scene}
            |-- textures
                |-- *.png
            |-- usd
                |-- *.usd
            |-- urdf
                |-- *.urdf
            |-- mjcf
                |-- *.xml
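
The layout above can be mirrored in a small path helper. The following is a minimal sketch, not code from the repository: the function names `traj_path` and `asset_dir` are hypothetical, and the benchmark, task, robot, and object names used to call them are placeholders.

```python
from pathlib import Path

# Sub-directories that appear under each {object} in the tree above.
ASSET_FORMATS = {"textures", "usd", "urdf", "mjcf"}


def traj_path(root, benchmark, task, robot):
    """Build trajs/{benchmark}/{task}/v2/{robot}_v2.pkl under the data root."""
    return Path(root) / "trajs" / benchmark / task / "v2" / f"{robot}_v2.pkl"


def asset_dir(root, benchmark, task, obj, fmt):
    """Build assets/{benchmark}/{task}/{object}/{fmt} under the data root."""
    if fmt not in ASSET_FORMATS:
        raise ValueError(f"unknown asset format: {fmt!r}")
    return Path(root) / "assets" / benchmark / task / obj / fmt
```

For example, `traj_path("roboverse_data", "rlbench", "close_box", "franka")` points at the `franka_v2.pkl` file inside the task's `v2` directory.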

Naming convention:

The {benchmark} is in lowercase
The {task} is in snake_case
The {robot} is in snake_case
The {object} is in snake_case
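
These rules are easy to check before committing new data. The sketch below is illustrative only: it assumes snake_case means lowercase letters and digits separated by single underscores, which this page does not define precisely, and the function names are hypothetical.

```python
import re

# Assumed definition: starts with a letter, single underscores between parts.
SNAKE_CASE = re.compile(r"[a-z][a-z0-9]*(_[a-z0-9]+)*")


def is_snake_case(name):
    """True if the whole name matches the assumed snake_case pattern."""
    return SNAKE_CASE.fullmatch(name) is not None


def check_names(benchmark, task, robot, obj):
    """Return the list of fields that violate the naming convention."""
    violations = []
    if not benchmark.islower():
        violations.append("benchmark")
    for field, value in (("task", task), ("robot", robot), ("object", obj)):
        if not is_snake_case(value):
            violations.append(field)
    return violations
```

An empty list means all four names follow the convention; otherwise the list names the offending fields.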

Data on Google Drive (Deprecated)#

Please download the data from here and put it under ./data.
Download the converted isaaclab data from here and put it under ./data_isaaclab.

Then create a symlink at ./data_isaaclab/source_data that points to ./data/source_data:

cd data_isaaclab
ln -s ../data/source_data source_data

Sync data with Google Drive#

We recommend using rclone to sync the data with Google Drive.

Setup rclone#

Create an rclone config by following the official guide. During setup, you need to create your own client ID and token by following this.

The final rclone config should look like this:

[roboverse]
type = drive
client_id = <your_client_id>
client_secret = <your_client_secret>
scope = drive
token = <your_token>
team_drive =
root_folder_id = 1ORMP3__KIlXettN8eUCF3YQNybZQxzkw

[roboverse_isaaclab]
type = drive
client_id = <your_client_id>
client_secret = <your_client_secret>
scope = drive
token = <your_token>
team_drive =
root_folder_id = 1nF-5SU4nC6S_vgC_NL7E7Rt63Zl9sYqW
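
Because the rclone config is an INI-style file, it can be sanity-checked with Python's standard `configparser`. This is a minimal sketch under that assumption; the function name `missing_rclone_settings` and the set of keys treated as required are choices made for illustration, based only on the example above.

```python
import configparser

# Keys that carry a value in the example config; team_drive may stay empty.
REQUIRED_KEYS = ("type", "client_id", "client_secret", "scope", "token", "root_folder_id")


def missing_rclone_settings(config_text, remotes=("roboverse", "roboverse_isaaclab")):
    """Return human-readable problems found in the given config text."""
    # Interpolation is disabled so '%' characters in a token are kept as-is.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(config_text)
    problems = []
    for remote in remotes:
        if not parser.has_section(remote):
            problems.append(f"{remote}: section missing")
            continue
        for key in REQUIRED_KEYS:
            if not parser.get(remote, key, fallback="").strip():
                problems.append(f"{remote}: {key} missing or empty")
    return problems
```

Running `rclone config file` prints the location of the config file, whose contents can then be read and passed to this function.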

Download data#

rclone copy -P roboverse: data --exclude='demo/**'
rclone copy -P roboverse_isaaclab: data_isaaclab

To speed up the download, you can set --multi-thread-streams=16 (the default is 4).

Upload data#

For example, if you are a developer migrating the rlbench task, you can upload the assets and source demos to Google Drive by running:

rclone copy -P data/assets/rlbench roboverse_isaaclab:assets/rlbench --exclude='.thumbs/**'
rclone copy -P data/source_data/rlbench roboverse:data/source_data --include='trajectory-unified.pkl'

Before uploading, replace -P with -n (dry run) to check what would be transferred.
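
If you script these transfers, the dry-run habit can be made the default. The following is a small sketch of a command builder, with the hypothetical name `rclone_copy_command`; it only assembles the argument list from the flags shown on this page and does not run anything by itself.

```python
def rclone_copy_command(src, dst, *, dry_run=True, excludes=(), includes=(), streams=None):
    """Assemble an `rclone copy` argument list; defaults to a dry run (-n)."""
    # -n previews the transfer, -P shows progress during a real transfer.
    cmd = ["rclone", "copy", "-n" if dry_run else "-P", src, dst]
    for pattern in excludes:
        cmd.append(f"--exclude={pattern}")
    for pattern in includes:
        cmd.append(f"--include={pattern}")
    if streams is not None:
        cmd.append(f"--multi-thread-streams={streams}")
    return cmd
```

The returned list can be passed to `subprocess.run(cmd, check=True)`. Passing the arguments as a list avoids shell quoting, so patterns such as `.thumbs/**` need no extra quotes.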