# Prepare Data

If you are not a developer, you do not need to follow these instructions. The required data is downloaded automatically from HuggingFace when you run the code.

At the current stage, the HuggingFace dataset is private, so you need an access token. See tips for using HuggingFace [here](../developer_guide/tips/huggingface.md).

## Data on HuggingFace

Download the release-ready data by running:

```bash
git clone git@hf.co:datasets/RoboVerseOrg/roboverse_data
```

If you have not used Git LFS on your machine before, install and initialize it first:

```bash
sudo apt install git-lfs
git lfs install
```

### File structure

The release-ready data strictly follows the format below. It is divided into two parts: `metasim/data` and `roboverse_data`.

- `metasim/data` should hold a relatively small amount of data and allow continuous updates, so it contains the data needed for the quickstarts and the robot assets.
- `roboverse_data` is the place for large, stable data such as trajectories, object assets, and materials; once a task is successfully transferred, its trajectory and object assets rarely change.

Note that the robot assets are not included in `roboverse_learn`, but in `metasim/data`.

```text
metasim
|-- data
    |-- quickstarts
        |-- ...
    |-- robots (deprecated)
        |-- {robot}
            |-- usd
                |-- *.usd
            |-- urdf
                |-- *.urdf
            |-- mjcf
                |-- *.xml

roboverse_data
|-- robots
    |-- {robot}
        |-- usd
            |-- *.usd
        |-- urdf
            |-- *.urdf
        |-- mjcf
            |-- *.xml
|-- trajs
    |-- {benchmark}
        |-- {task}
            |-- v2
                |-- {robot}_v2.pkl
|-- assets
    |-- {benchmark}
        |-- {task}
            |-- {object}
                |-- textures
                    |-- *.png
                |-- usd
                    |-- *.usd
                |-- urdf
                    |-- *.urdf
                |-- mjcf
                    |-- *.xml
|-- materials
    |-- arnold
        |-- {Category}
            |-- *.mdl
    |-- vMaterial_2
        |-- {Category}
            |-- *.mdl
    |-- ...
|-- scenes
    |-- {source} (e.g. arnold, physcene)
        |-- {scene}
            |-- textures
                |-- *.png
            |-- usd
                |-- *.usd
            |-- urdf
                |-- *.urdf
            |-- mjcf
                |-- *.xml
```

Naming convention:

- `{benchmark}` is in lowercase
- `{task}` is in snake_case
- `{robot}` is in snake_case
- `{object}` is in snake_case

## Data on Google Drive (Deprecated)

1. Download the data from [here](https://drive.google.com/drive/folders/1ORMP3__KIlXettN8eUCF3YQNybZQxzkw) and put it under `./data`.
2. Download the converted IsaacLab data from [here](https://drive.google.com/drive/folders/1nF-5SU4nC6S_vgC_NL7E7Rt63Zl9sYqW) and put it under `./data_isaaclab`.
3. Create a symlink `./data_isaaclab/source_data` pointing to `./data/source_data`:

```bash
cd data_isaaclab
ln -s ../data/source_data source_data
```

### Sync data with Google Drive

We recommend using rclone to sync the data with Google Drive.

#### Set up rclone

Create an rclone config following the [official guide](https://rclone.org/drive/), during which you need to create your own client ID and token as described [here](https://rclone.org/drive/#making-your-own-client-id). The final rclone config should look like this:

```
[roboverse]
type = drive
client_id =
client_secret =
scope = drive
token =
team_drive =
root_folder_id = 1ORMP3__KIlXettN8eUCF3YQNybZQxzkw

[roboverse_isaaclab]
type = drive
client_id =
client_secret =
scope = drive
token =
team_drive =
root_folder_id = 1nF-5SU4nC6S_vgC_NL7E7Rt63Zl9sYqW
```

#### Download data

```bash
rclone copy -P roboverse: data --exclude='demo/**'
rclone copy -P roboverse_isaaclab: data_isaaclab
```

You can speed up the transfer by setting `--multi-thread-streams=16` (the default is 4).
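Applied to the full download, the commands might look like the following sketch; the stream count is only a suggestion, and the remote names are the ones defined in the config above:

```bash
# Download with 16 parallel streams per file instead of the default 4
rclone copy -P --multi-thread-streams=16 roboverse: data --exclude='demo/**'
rclone copy -P --multi-thread-streams=16 roboverse_isaaclab: data_isaaclab
```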
#### Upload data

For example, if you are a developer migrating the RLBench tasks, you can upload the assets and source demos to Google Drive by running:

```bash
rclone copy -P data/assets/rlbench roboverse_isaaclab:assets/rlbench --exclude='.thumbs/**'
rclone copy -P data/source_data/rlbench roboverse:data/source_data --include='trajectory-unified.pkl'
```

You ***should*** first replace `-P` with `-n` (dry run) to check what will be transferred before actually uploading.
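A dry run of the RLBench upload above could look like this; nothing is transferred, rclone only reports what it would copy:

```bash
# Dry run: report what would be uploaded without transferring anything
rclone copy -n data/assets/rlbench roboverse_isaaclab:assets/rlbench --exclude='.thumbs/**'
rclone copy -n data/source_data/rlbench roboverse:data/source_data --include='trajectory-unified.pkl'
```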