Improve push_dataset_to_hub API + Add unit tests #231
Nice, I prefer raw_dir and out_dir.
We will need to propagate the change to the README and other parts of the code.
Let's do it once we have converged on the API for push_dataset_to_hub.
For instance, I am still wondering if we should rename it convert_to_lerobot_dataset.py with an option --push-to-hub.
With the current state of this PR, we have push_dataset_to_hub.py with the options --out-dir and --dry-run 1 to save locally without pushing to the hub.
Updated it to fix the README and docs. OK to merge, @Cadene, so I can push the next fix?
Quick behavior update: "save-tests-to-disk" now defaults to 0.
Thanks for the PR :)
We should definitely improve push_dataset_to_hub.py.
I just want to highlight that we don't have proper unit tests for push_dataset_to_hub.py for now. We prioritized other features.
If we start iterating heavily on push_dataset_to_hub.py, we might need to implement unit tests in a future PR to ensure we don't break things.
push_dataset_to_hub arguments to --raw-dir and --out-dir
Co-authored-by: Remi <re.cadene@gmail.com>
all good to merge
lgtm
Left comments mainly about the interface.
I didn't look into each _format.py script; tell me if you want another pair of eyes on those.
@@ -18,58 +18,39 @@
or store it locally. LeRobot dataset format is lightweight, fast to load from, and does not require any
installation of neural net specific packages like pytorch, tensorflow, jax.

Example:
Example of how to download raw datasets, convert them into LeRobotDataset format, and push them to the hub:
```
python lerobot/scripts/push_dataset_to_hub.py \
```
Co-authored-by: Simon Alibert <75076266+aliberts@users.noreply.github.com>
What this does
- Improve push_dataset_to_hub.py API
  - Instead of --data-dir data and an implicit location for the raw directory, uses explicit --raw-dir data/pusht_raw with the full path to the directory
  - Instead of --save-to-disk 1 and an implicit location for the output directory, uses explicit --local-dir data/pusht with the full path to the directory
  - Instead of --community-id lerobot and --dataset-id pusht, uses simpler --repo-id lerobot/pusht
  - Instead of --debug 1 to run on the first episode, uses more configurable --episodes 0
  - Instead of --save-tests-to-disk 1 and an implicit location for the dataset test directory, uses explicit --tests-data-dir tests/data/pusht
  - Instead of --dry-run 1, uses explicit --push-to-hub 0 (by default to 1)
  - Remove --revision, hardcode CODEBASE_VERSION
  - Add --force-override 1 (by default 0), to not delete the local directory by mistake
  - If no --local-dir is provided, uses hardcoded /tmp/{REPO_ID} to cache images if needed
- Add unit tests for push_dataset_to_hub.py
- cadene page
- download_hub for all push datasets instead of lift (see below)

How to try the code
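The flag changes described above can be sketched as a small argparse parser. This is only an illustration under stated assumptions, not the script's actual implementation: the function names build_parser and resolve_local_dir are hypothetical, and which flags are required, as well as the list form of --episodes, are guesses. Only the flag names, the defaults for --push-to-hub (1) and --force-override (0), and the /tmp/{REPO_ID} fallback come from the description.

```python
import argparse
from pathlib import Path
from typing import Optional


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the flags named in the PR description."""
    parser = argparse.ArgumentParser(description="Sketch of the reworked CLI (not the real script).")
    # Assumption: --raw-dir and --repo-id are treated as required here.
    parser.add_argument("--raw-dir", type=Path, required=True, help="Full path to the raw dataset directory.")
    parser.add_argument("--repo-id", type=str, required=True, help="Hub repository, e.g. lerobot/pusht.")
    parser.add_argument("--local-dir", type=Path, default=None, help="Full path to the output directory.")
    parser.add_argument("--push-to-hub", type=int, default=1, help="Set to 0 to skip the upload.")
    parser.add_argument("--force-override", type=int, default=0, help="Set to 1 to allow replacing the local directory.")
    # Assumption: episodes are passed as a space-separated list of indices.
    parser.add_argument("--episodes", type=int, nargs="*", default=None, help="Episode indices to convert.")
    parser.add_argument("--tests-data-dir", type=Path, default=None, help="Where to write test data, if anywhere.")
    return parser


def resolve_local_dir(local_dir: Optional[Path], repo_id: str) -> Path:
    """Fall back to /tmp/{REPO_ID} when no --local-dir is given."""
    return local_dir if local_dir is not None else Path("/tmp") / repo_id


# Parse an explicit argument list so the sketch does not depend on sys.argv.
args = build_parser().parse_args(
    ["--raw-dir", "data/pusht_raw", "--repo-id", "lerobot/pusht", "--episodes", "0", "--push-to-hub", "0"]
)
local_dir = resolve_local_dir(args.local_dir, args.repo_id)
print(local_dir, args.episodes, bool(args.push_to_hub))
```

With no --local-dir on the command line, the sketch resolves the working directory from the repo id, which matches the fallback described in the list; passing --local-dir data/pusht would take precedence.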
How it was tested
CI