# Prepare Data

If you are not a developer, you do not need to follow these instructions. The required data is downloaded automatically from HuggingFace when you run the code.

At the current stage, the HuggingFace dataset is private, so you need an access token. See tips for using HuggingFace [here](../developer_guide/tips/huggingface.md).

## Data on HuggingFace

Download the release-ready data by running:

```bash
git clone git@hf.co:datasets/RoboVerseOrg/roboverse_data
```

If you have not used Git LFS on your machine before, install and initialize it first:

```bash
sudo apt install git-lfs
git lfs install
```

### File structure

The release-ready data strictly follows the format below. It is divided into two parts: `metasim/data` and `roboverse_data`.

- `metasim/data` should hold a relatively small amount of data and allow continuous updates, so it contains the data needed for the quickstarts and the robot assets.
- `roboverse_data` is the place for large, stable data such as trajectories, object assets, and materials; once a task is successfully transferred, its trajectory and object assets rarely change.

Note that the robot assets are not included in `roboverse_learn`, but in `metasim/data`.

```text
metasim
|-- data
    |-- quickstarts
        |-- ...
    |-- robots (deprecated)
        |-- {robot}
            |-- usd
                |-- *.usd
            |-- urdf
                |-- *.urdf
            |-- mjcf
                |-- *.xml

roboverse_data
|-- robots
    |-- {robot}
        |-- usd
            |-- *.usd
        |-- urdf
            |-- *.urdf
        |-- mjcf
            |-- *.xml
|-- trajs
    |-- {benchmark}
        |-- {task}
            |-- v2
                |-- {robot}_v2.pkl
|-- assets
    |-- {benchmark}
        |-- {task}
            |-- {object}
                |-- textures
                    |-- *.png
                |-- usd
                    |-- *.usd
                |-- urdf
                    |-- *.urdf
                |-- mjcf
                    |-- *.xml
|-- materials
    |-- arnold
        |-- {Category}
            |-- *.mdl
    |-- vMaterial_2
        |-- {Category}
            |-- *.mdl
    |-- ...
|-- scenes
    |-- {source} (e.g. arnold, physcene)
        |-- {scene}
            |-- textures
                |-- *.png
            |-- usd
                |-- *.usd
            |-- urdf
                |-- *.urdf
            |-- mjcf
                |-- *.xml
```

Naming convention:

- `{benchmark}` is in lowercase
- `{task}` is in snake_case
- `{robot}` is in snake_case
- `{object}` is in snake_case

## Data on Google Drive (Deprecated)

1. Download the data from [here](https://drive.google.com/drive/folders/1ORMP3__KIlXettN8eUCF3YQNybZQxzkw) and put it under `./data`.
2. Download the converted IsaacLab data from [here](https://drive.google.com/drive/folders/1nF-5SU4nC6S_vgC_NL7E7Rt63Zl9sYqW) and put it under `./data_isaaclab`.
3. Create a symlink `./data_isaaclab/source_data` pointing to `./data/source_data`:

```bash
cd data_isaaclab
ln -s ../data/source_data source_data
```

### Sync data with Google Drive

We recommend using rclone to sync the data with Google Drive.

#### Set up rclone

Create an rclone config following the [official guide](https://rclone.org/drive/), during which you need to create your own client ID and token as described [here](https://rclone.org/drive/#making-your-own-client-id). The final rclone config should look like this:

```
[roboverse]
type = drive
client_id =
client_secret =
scope = drive
token =
team_drive =
root_folder_id = 1ORMP3__KIlXettN8eUCF3YQNybZQxzkw

[roboverse_isaaclab]
type = drive
client_id =
client_secret =
scope = drive
token =
team_drive =
root_folder_id = 1nF-5SU4nC6S_vgC_NL7E7Rt63Zl9sYqW
```

#### Download data

```bash
rclone copy -P roboverse: data --exclude='demo/**'
rclone copy -P roboverse_isaaclab: data_isaaclab
```

You can speed up the transfer by setting `--multi-thread-streams=16` (the default is 4).
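Applied to the full download, the commands might look like the following sketch; the stream count is only a suggestion, and the remote names are the ones defined in the config above:

```bash
# Download with 16 parallel streams per file instead of the default 4
rclone copy -P --multi-thread-streams=16 roboverse: data --exclude='demo/**'
rclone copy -P --multi-thread-streams=16 roboverse_isaaclab: data_isaaclab
```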
#### Upload data

For example, if you are a developer migrating the RLBench tasks, you can upload the assets and source demos to Google Drive by running:

```bash
rclone copy -P data/assets/rlbench roboverse_isaaclab:assets/rlbench --exclude='.thumbs/**'
rclone copy -P data/source_data/rlbench roboverse:data/source_data --include='trajectory-unified.pkl'
```

You ***should*** first replace `-P` with `-n` (dry run) to check what will be transferred before actually uploading.
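A dry run of the RLBench upload above could look like this; nothing is transferred, rclone only reports what it would copy:

```bash
# Dry run: report what would be uploaded without transferring anything
rclone copy -n data/assets/rlbench roboverse_isaaclab:assets/rlbench --exclude='.thumbs/**'
rclone copy -n data/source_data/rlbench roboverse:data/source_data --include='trajectory-unified.pkl'
```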