-
Notifications
You must be signed in to change notification settings - Fork 3.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support multi-run with hydra + DDP #18175
base: master
Are you sure you want to change the base?
Conversation
adb38f3
to
e65d344
Compare
282fbfa
to
1f81663
Compare
cwd = get_original_cwd() | ||
rundir = f'"{HydraConfig.get().run.dir}"' | ||
# Set output_subdir null since we don't want different subprocesses trying to write to config.yaml |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this comment is still useful, we could move it down to the corresponding line.
raise RuntimeError("DDP with multirun requires saved config file") | ||
else: # Used saved config for new run | ||
hydra_subdir = rundir / hydra_cfg.output_subdir | ||
command += ["-cp", str(hydra_subdir), "-cn", "config.yaml"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the name of the file always guaranteed to be config.yaml
?
Hey @nisheethlahoti |
@awaelchli I realized that the implementation above isn't entirely correct (it doesn't properly handle user overrides that start with |
@nisheethlahoti @awaelchli I ran into the problem of a sweep/multirun of multiple PL training jobs with DDP resulting in spawned ddp processes (train_ddp_process_1, etc.) creating multiple output folders of the same multirun, each folder containing a copy of multirun.yaml and DDP subprocess log files. This is partially because my I came up with my own solution. This is the monkey patch I use: def _hydra_subprocess_cmd(local_rank: int):
"""
Monkey patching for lightning.fabric.strategies.launchers.subprocess_script._hydra_subprocess_cmd
Temporarily fixes the problem of unnecessarily creating log folders for DDP subprocesses in Hydra multirun/sweep.
"""
import __main__ # local import to avoid https://github.com/Lightning-AI/lightning/issues/15218
from hydra.core.hydra_config import HydraConfig
from hydra.types import RunMode
from hydra.utils import get_original_cwd, to_absolute_path
# when user is using hydra find the absolute path
if __main__.__spec__ is None: # pragma: no-cover
command = [sys.executable, to_absolute_path(sys.argv[0])]
else:
command = [sys.executable, "-m", __main__.__spec__.name]
command += sys.argv[1:] + [f"hydra.job.name=train_ddp_process_{local_rank}", "hydra.output_subdir=null"]
cwd = get_original_cwd()
config = HydraConfig.get()
if config.mode == RunMode.RUN:
command += [f'hydra.run.dir="{config.run.dir}"']
elif config.mode == RunMode.MULTIRUN:
command += [f'hydra.sweep.dir="{config.sweep.dir}"']
print(command)
return command, cwd I think in your solution using |
|
GitGuardian id | Secret | Commit | Filename | |
---|---|---|---|---|
- | Generic High Entropy Secret | 78fa3af | tests/tests_app/utilities/test_login.py | View secret |
- | Base64 Basic Authentication | 78fa3af | tests/tests_app/utilities/test_login.py | View secret |
🛠 Guidelines to remediate hardcoded secrets
- Understand the implications of revoking this secret by investigating where it is used in your code.
- Replace and store your secret safely. Learn here the best practices.
- Revoke and rotate this secret.
- If possible, rewrite git history. Rewriting git history is not a trivial act. You might completely break other contributing developers' workflow and you risk accidentally deleting legitimate data.
To avoid such incidents in the future consider
- following these best practices for managing and storing secrets including API keys and other credentials
- install secret detection on pre-commit to catch secret before it leaves your machine and ease remediation.
🦉 GitGuardian detects secrets in your source code to help developers and security teams secure the modern development process. You are seeing this because you or someone else with access to this repository has authorized GitGuardian to scan your pull request.
Our GitHub checks need improvements? Share your feedbacks!
Support multi-run with hydra + DDP (when output_subdir is non-None).
Implementation of the idea from #11617 in a restricted setting (when saved config is actually present).
Before submitting
PR review
Anyone in the community is welcome to review the PR.
Before you start reviewing, make sure you have read the review guidelines. In short, see the following bullet-list:
Reviewer checklist