[Train] Unify configurations between LightningConfigBuilder.checkpointing() and AIR CheckpointConfig. #35920

Closed
woshiyyya opened this issue May 31, 2023 · 0 comments · Fixed by #36368
Labels
enhancement Request for new feature and/or capability ray-team-created Ray Team created train Ray Train Related Issue

woshiyyya (Member) commented May 31, 2023

Description

Specifying the same checkpoint configuration in two places, LightningConfigBuilder.checkpointing() and the AIR CheckpointConfig, is confusing: users don't know which config actually controls the checkpointing logic.

The proposed solution, depending on which config the user specifies:

AIR CheckpointConfig only: no-op. By default, PTL (PyTorch Lightning) only saves the latest checkpoint.
PTL checkpoint config only: create a matching AIR CheckpointConfig for the user. Otherwise AIR saves all checkpoints, which takes a lot of disk storage.
Both: throw a warning if the metrics don't match, since AIR and PTL can monitor different metrics.
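The reconciliation rules above can be sketched as a single function. This is a minimal, hypothetical illustration, not Ray's implementation: `AirCheckpointSpec`, `PtlCheckpointSpec`, and `resolve_checkpoint_configs` are made-up stand-ins. It assumes the three outcomes correspond to "only AIR specified", "only PTL specified", and "both specified". It also assumes the field mapping PTL `save_top_k` → AIR `num_to_keep`, `monitor` → `checkpoint_score_attribute`, and `mode` → `checkpoint_score_order`.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class AirCheckpointSpec:
    """Hypothetical stand-in for the relevant fields of AIR's CheckpointConfig."""
    num_to_keep: Optional[int] = None      # None = keep every checkpoint
    score_attribute: Optional[str] = None  # metric used to rank checkpoints
    score_order: str = "max"               # "max" or "min"


@dataclass
class PtlCheckpointSpec:
    """Hypothetical stand-in for the ModelCheckpoint arguments passed to checkpointing()."""
    monitor: Optional[str] = None  # None = no metric, PTL keeps the latest checkpoint(s)
    save_top_k: int = 1            # -1 = keep all, k = keep the k best
    mode: str = "min"              # "min" or "max"


def resolve_checkpoint_configs(
    air: Optional[AirCheckpointSpec],
    ptl: Optional[PtlCheckpointSpec],
) -> Tuple[Optional[AirCheckpointSpec], List[str]]:
    """Return the AIR spec to use plus any warning messages (assumed rules)."""
    messages: List[str] = []

    if ptl is None:
        # Only AIR specified (or neither): no-op, PTL's default keeps just the latest checkpoint.
        return air, messages

    if air is None:
        # Only PTL specified: derive a matching AIR spec so AIR does not keep every checkpoint.
        # save_top_k == 0 (PTL saves nothing) is not handled in this sketch.
        num_to_keep = None if ptl.save_top_k < 0 else ptl.save_top_k
        derived = AirCheckpointSpec(
            num_to_keep=num_to_keep,
            score_attribute=ptl.monitor,
            score_order=ptl.mode,
        )
        return derived, messages

    # Both specified: keep the user's AIR spec, but warn if the metrics differ.
    if air.score_attribute != ptl.monitor:
        messages.append(
            f"AIR ranks checkpoints by {air.score_attribute!r} but PTL monitors "
            f"{ptl.monitor!r}; the two may keep different checkpoints."
        )
    return air, messages
```

For example, passing only `PtlCheckpointSpec(monitor="val_loss", save_top_k=3, mode="min")` yields an AIR spec that keeps the 3 checkpoints with the lowest `val_loss`. The design choice sketched here is that an explicit AIR config always wins when both are given, and a mismatch is surfaced as a warning rather than an error.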

Use case

No response
