Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix custom imports for both distributed and single device #1731

Merged
merged 3 commits into from
Oct 1, 2024

Conversation

RdoubleA
Copy link
Contributor

@RdoubleA RdoubleA commented Oct 1, 2024

Context

What is the purpose of this PR? Is it to

  • add a new feature
  • fix a bug
  • update tests and/or documentation
  • other (please add here)

Closes #1540. To run custom recipes, you now need to specify via dotpath as if it were a module. this enables running with python -m under the hood of torch.distributed.run which will automatically prepend the current directory to sys.path.

In instantiate, we append the current working directory to the sys.path so it is recognized by import_module in _get_component_from_path in the Python path. This applies to both single device and distributed runs - previously we appended to sys.path BEFORE torch.distributed.run (#1617) which discards it for some reason. Now, we do it within instantiate itself.

Test plan

Log before the fix, the CWD is not present in sys.path right before import_module:

2024-10-01:12:14:30,820 INFO     [_utils.py:75] pythonpath: ['/data/users/rafiayub/torchtune-rafiayub/recipes', '/home/rafiayub/.conda/envs/torchtune/lib/python310.zip', '/home/rafiayub/.conda/envs/torchtune/lib/python3.10', '/home/rafiayub/.conda/envs/torchtune/lib/python3.10/lib-dynload', '/home/rafiayub/.conda/envs/torchtune/lib/python3.10/site-packages', '__editable__.torchtune-0.0.0.finder.__path_hook__', '/data/users/rafiayub/torchtune-rafiayub/docs/src/pytorch-sphinx-theme', '/tmp/tmpgqp_rg5o']
2024-10-01:12:14:30,820 INFO     [_utils.py:76] cwd: /data/users/rafiayub
2024-10-01:12:14:30,820 INFO     [_utils.py:77] import path ['data', 'dataset', 'custom_dataset']

Log after the fix, you can see the CWD at the end of sys.path:

2024-10-01:12:16:57,555 INFO     [_utils.py:77] pythonpath: ['/data/users/rafiayub/torchtune-rafiayub/recipes', '/home/rafiayub/.conda/envs/torchtune/lib/python310.zip', '/home/rafiayub/.conda/envs/torchtune/lib/python3.10', '/home/rafiayub/.conda/envs/torchtune/lib/python3.10/lib-dynload', '/home/rafiayub/.conda/envs/torchtune/lib/python3.10/site-packages', '__editable__.torchtune-0.0.0.finder.__path_hook__', '/data/users/rafiayub/torchtune-rafiayub/docs/src/pytorch-sphinx-theme', '/tmp/tmp695bcof_', '/data/users/rafiayub']
2024-10-01:12:16:57,555 INFO     [_utils.py:78] cwd: /data/users/rafiayub
2024-10-01:12:16:57,555 INFO     [_utils.py:79] import path ['data', 'dataset', 'custom_dataset']

Tested both distributed and single device runs by using tune cp and modifying config to point to a local custom dataset builder at data.dataset.custom_dataset:

def custom_dataset(tokenizer):
	return alpaca_dataset(tokenizer)

Also tested the four combinations:

  • built-in single device recipe + custom component: tune run full_finetune_single_device --config config/my_config_single_device.yaml
  • custom single device recipe + custom imports + custom component: tune run recipe.tune_single --config config/my_config_single_device.yaml
  • built-in distributed recipe + custom component: tune run --nproc_per_node 4 full_finetune_distributed --config config/my_config.yaml
  • custom distributedrecipe + custom imports + custom component: tune run --nproc_per_node 4 recipe.tune_dist --config config/my_config.yaml

Copy link

pytorch-bot bot commented Oct 1, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/torchtune/1731

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (1 Unrelated Failure)

As of commit d7be38f with merge base 10b02e0 (image):

BROKEN TRUNK - The following job failed but were present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Oct 1, 2024
Copy link
Contributor

@pbontrager pbontrager left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really like this change! Can you do one additional test where you copy a recipe to your cwd, modify it with "import local_module" at the top, and verify it works with distributed?

@RdoubleA RdoubleA merged commit 29888d1 into pytorch:main Oct 1, 2024
16 of 17 checks passed
@RdoubleA RdoubleA deleted the fix_custom_imports branch October 1, 2024 22:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Custom imports are broken
3 participants