Commit

Address more comments
Cadene committed Mar 19, 2024
1 parent b420ab8 commit 7d5d99e
Showing 2 changed files with 24 additions and 5 deletions.
23 changes: 18 additions & 5 deletions README.md
@@ -148,9 +148,9 @@ DATA_DIR="tests/data" pytest -sx tests

**Datasets**

-To add a pytorch rl dataset to the hub, first login and use a token generated from [huggingface settings](https://huggingface.co/settings/tokens) with write access:
+To add a dataset to the hub, first login and use a token generated from [huggingface settings](https://huggingface.co/settings/tokens) with write access:
```
-huggingface-cli login --token $HUGGINGFACE_TOKEN --add-to-git-credential
+huggingface-cli login --token ${HUGGINGFACE_TOKEN} --add-to-git-credential
```

Then you can upload it to the hub with:
@@ -160,6 +160,12 @@ HF_HUB_ENABLE_HF_TRANSFER=1 huggingface-cli upload $HF_USER/$DATASET data/$DATAS
--revision v1.0
```

+You will need to set the corresponding version as a default argument in your dataset class:
+```python
+version: str | None = "v1.0",
+```
+See: [`lerobot/common/datasets/pusht.py`](https://github.com/Cadene/lerobot/blob/main/lerobot/common/datasets/pusht.py)
+
For instance, for [cadene/pusht](https://huggingface.co/datasets/cadene/pusht), we used:
```
HF_USER=cadene
@@ -169,7 +175,7 @@ DATASET=pusht
If you want to improve an existing dataset, you can download it locally with:
```
mkdir -p data/$DATASET
-HF_HUB_ENABLE_HF_TRANSFER=1 huggingface-cli download $HF_USER/$DATASET \
+HF_HUB_ENABLE_HF_TRANSFER=1 huggingface-cli download ${HF_USER}/$DATASET \
--repo-type dataset \
--local-dir data/$DATASET \
--local-dir-use-symlinks=False \
@@ -181,15 +187,22 @@ Iterate on your code and dataset with:
DATA_DIR=data python train.py
```

-Then upload a new version (v2.0 or v1.1 if the changes are respectively more or less significant):
+Upload a new version (v2.0 or v1.1 if the changes are respectively more or less significant):
```
HF_HUB_ENABLE_HF_TRANSFER=1 huggingface-cli upload $HF_USER/$DATASET data/$DATASET \
--repo-type dataset \
--revision v1.1 \
--delete "*"
```

-And you might want to mock the dataset if you need to update the unit tests as well:
+Then you will need to set the corresponding version as a default argument in your dataset class:
+```python
+version: str | None = "v1.1",
+```
+See: [`lerobot/common/datasets/pusht.py`](https://github.com/Cadene/lerobot/blob/main/lerobot/common/datasets/pusht.py)
+
+
+Finally, you might want to mock the dataset if you need to update the unit tests as well:
```
python tests/scripts/mock_dataset.py --in-data-dir data/$DATASET --out-data-dir tests/data/$DATASET
```
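
The versioning rule stated in the README text (v2.0 for more significant changes, v1.1 for less significant ones) could be sketched as a small helper. This is an illustration only: `bump_version` and its `significant` flag are hypothetical names and are not part of the commit or of lerobot.

```python
def bump_version(version: str, significant: bool) -> str:
    """Return the next revision tag for a dataset (hypothetical helper).

    Assumes tags of the form "v<major>.<minor>", as in the README examples.
    """
    major, minor = (int(part) for part in version.removeprefix("v").split("."))
    if significant:
        # More significant change: bump the major number and reset the minor.
        return f"v{major + 1}.0"
    # Less significant change: bump the minor number only.
    return f"v{major}.{minor + 1}"
```

The returned tag would then be passed to `--revision` in the upload command and set as the default `version` argument in the dataset class.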
6 changes: 6 additions & 0 deletions lerobot/common/datasets/abstract.py
@@ -35,6 +35,12 @@ def __init__(
         self.version = version
         self.shuffle = shuffle
         self.root = root
+
+        if self.root is not None and self.version is not None:
+            logging.warning(
+                f"The version of the dataset ({self.version}) is not enforced when root is provided ({self.root})."
+            )
+
         storage = self._download_or_load_dataset()
 
         super().__init__(
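
To make the effect of the added check concrete, here is a minimal, self-contained sketch of the same condition. `SketchDataset` and its `dataset_id` parameter are hypothetical stand-ins and not the class in `abstract.py`; only the `version`/`root` check and the warning text follow the diff.

```python
from __future__ import annotations

import logging
from pathlib import Path


class SketchDataset:
    """Hypothetical stand-in for the dataset base class, for illustration only."""

    def __init__(
        self,
        dataset_id: str,
        version: str | None = "v1.0",
        root: Path | None = None,
    ):
        self.dataset_id = dataset_id
        self.version = version
        self.root = root

        # Same condition as in the diff: when a local root is given,
        # the requested version is not enforced, so warn the caller.
        if self.root is not None and self.version is not None:
            logging.warning(
                f"The version of the dataset ({self.version}) is not enforced "
                f"when root is provided ({self.root})."
            )
```

Under this sketch, `SketchDataset("pusht", version="v1.1", root=Path("data"))` logs the warning, while omitting `root` or passing `version=None` does not.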
