feat(dataset): allow overriding tolerance_s #403

Draft
wants to merge 2 commits into
base: main
7 changes: 7 additions & 0 deletions lerobot/common/datasets/lerobot_dataset.py
@@ -49,13 +49,15 @@ def __init__(
        image_transforms: Callable | None = None,
        delta_timestamps: dict[list[float]] | None = None,
        video_backend: str | None = None,
        tolerance_s: float | None = None,
    ):
        super().__init__()
        self.repo_id = repo_id
        self.root = root
        self.split = split
        self.image_transforms = image_transforms
        self.delta_timestamps = delta_timestamps
        self.tolerance_s = tolerance_s
        # load data from hub or locally when root is provided
        # TODO(rcadene, aliberts): implement faster transfer
        # https://huggingface.co/docs/huggingface_hub/en/guides/download#faster-downloads
@@ -126,6 +128,9 @@ def tolerance_s(self) -> float:
        are not close enough from the requested frames. It is only used when `delta_timestamps`
        is provided or when loading video frames from mp4 files.
        """
        if self.tolerance_s is not None:
Collaborator

I think the "right" way to handle this would be to have a private attribute self._tolerance which you set to 1 / fps - 1e-4 in the __init__ (btw fps should also be private as in self._fps and then might need its own getter method). Then here you just need to return self._tolerance.

OR, if we don't want to bother with encapsulation, this getter should disappear and we should just have an attribute self.tolerance_s. The only danger here is that it's independent of the fps from an interface perspective.

@Cadene I think you should weigh in here as I know you're not a fan of using private attributes.
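
A minimal sketch of the first option described in this comment, assuming a stripped-down stand-in class (`DatasetSketch` is hypothetical and is not the real `LeRobotDataset`; the attribute is named `_tolerance_s` here purely for illustration). The override is resolved once in `__init__`, stored in a private attribute, and exposed through a read-only property. Because the private attribute and the property have different names, the getter never reads itself.

```python
from __future__ import annotations


class DatasetSketch:
    """Hypothetical stand-in for a dataset class; not the real LeRobotDataset."""

    def __init__(self, fps: int, tolerance_s: float | None = None):
        if fps <= 0:
            raise ValueError("fps must be positive")
        self._fps = fps
        # Resolve the tolerance once: use the override if one was given,
        # otherwise derive it from fps (1e-4 accounts for numerical error).
        self._tolerance_s = tolerance_s if tolerance_s is not None else 1 / fps - 1e-4

    @property
    def fps(self) -> int:
        # Read-only view of the private fps value.
        return self._fps

    @property
    def tolerance_s(self) -> float:
        # Read-only view of the tolerance resolved in __init__.
        return self._tolerance_s


default_ds = DatasetSketch(fps=30)
override_ds = DatasetSketch(fps=30, tolerance_s=0.05)
print(default_ds.tolerance_s)   # derived from fps
print(override_ds.tolerance_s)  # explicit override
```

With this layout the tolerance can only be changed through the constructor, which keeps it tied to `fps` from an interface perspective, at the cost of the extra encapsulation the comment mentions.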

Collaborator

@villekuosmanen After discussing internally, we will put your PR on hold for a bit while we think about a design.
cc @alexander-soare

Collaborator

@villekuosmanen yeah, so since I am encountering this right now with some RL stuff I am doing, we just want to get that working, then use whatever it was I did to make it happen. It might be this approach, but there may be other ways of handling it.

Thanks a lot mate!

Author

yep no problem, happy to wait.

            return self.tolerance_s

        # 1e-4 to account for possible numerical error
        return 1 / self.fps - 1e-4

@@ -190,6 +195,7 @@ def from_preloaded(
        info=None,
        videos_dir=None,
        video_backend=None,
        tolerance_s: float | None = None,
    ) -> "LeRobotDataset":
        """Create a LeRobot Dataset from existing data and attributes instead of loading from the filesystem.

@@ -212,6 +218,7 @@ def from_preloaded(
        obj.info = info if info is not None else {}
        obj.videos_dir = videos_dir
        obj.video_backend = video_backend if video_backend is not None else "pyav"
        obj.tolerance_s = tolerance_s
        return obj


11 changes: 9 additions & 2 deletions lerobot/scripts/control_robot.py
@@ -300,10 +300,10 @@ def record(
    tags=None,
    num_image_writers=8,
    force_override=False,
    dataset_sync_tolerance_s=None,
):
    # TODO(rcadene): Add option to record logs
    # TODO(rcadene): Clean this function via decomposition in higher level functions

    _, dataset_name = repo_id.split("/")
    if dataset_name.startswith("eval_") and policy is None:
        raise ValueError(
@@ -634,6 +634,7 @@ def on_press(key):
        episode_data_index=episode_data_index,
        info=info,
        videos_dir=videos_dir,
        tolerance_s=dataset_sync_tolerance_s,
    )
    if run_compute_stats:
        logging.info("Computing dataset statistics")
@@ -798,7 +799,13 @@ def replay(robot: Robot, episode: int, fps: int | None = None, root="data", repo
        nargs="*",
        help="Any key=value arguments to override config values (use dots for.nested=overrides)",
    )

    parser_record.add_argument(
Collaborator

Does this matter? It gets lost in the hub upload anyway right? Then when someone wants to actually use the dataset they can set the tolerance as they wish.

Author

I was having an issue when calculating stats from the dataset at the end of record which is why I put it here. Happy to delete the arg if there is some other, better way to do it.

Collaborator

It's weird that this is the case, as you are not using delta_timestamps, which is the only way I could see the tolerance guard being triggered. Do you have an exception trace? (Only dig it up if you can be bothered, as this PR is on hold anyway.)
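
For context, a rough sketch of what a tolerance guard of this kind could look like, assuming requested timestamps are matched to the nearest available frame timestamp. `nearest_frame_indices` is a hypothetical helper written for illustration; it is not a function from the LeRobot codebase and may differ from how the library performs the check.

```python
import bisect


def nearest_frame_indices(frame_ts, query_ts, tolerance_s):
    """Return, for each query timestamp, the index of the closest frame.

    frame_ts must be sorted in ascending order. Raises ValueError when the
    closest frame is further away than tolerance_s.
    """
    if not frame_ts:
        raise ValueError("frame_ts must not be empty")
    indices = []
    for q in query_ts:
        pos = bisect.bisect_left(frame_ts, q)
        # The closest frame is either just before or just after the query.
        candidates = [i for i in (pos - 1, pos) if 0 <= i < len(frame_ts)]
        best = min(candidates, key=lambda i: abs(frame_ts[i] - q))
        if abs(frame_ts[best] - q) > tolerance_s:
            raise ValueError(
                f"query {q:.4f}s is {abs(frame_ts[best] - q):.4f}s from the "
                f"nearest frame, above tolerance {tolerance_s:.4f}s"
            )
        indices.append(best)
    return indices


fps = 10
frame_ts = [i / fps for i in range(5)]  # 0.0, 0.1, 0.2, 0.3, 0.4
tolerance_s = 1 / fps - 1e-4            # default derived from fps
print(nearest_frame_indices(frame_ts, [0.0, 0.21, 0.38], tolerance_s))  # [0, 2, 4]
```

A looser override only changes the threshold in the comparison, so frames recorded with more timing jitter would pass the same check.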

Author

no I don't unfortunately :(

        "--dataset-sync-tolerance-s",
        type=float,
        default=None,
        help="Override the maximum syncronisation tolerance (in seconds) between frames allowed by the LeRobot Dataset. Not passing an argument means we use FPS settings to infer the tolerance.",
    )

    parser_replay = subparsers.add_parser("replay", parents=[base_parser])
    parser_replay.add_argument(
        "--fps", type=none_or_int, default=None, help="Frames per second (set to None to disable)"