
[BUG] pt: setting batch_size to list throws errors #3475

Closed
njzjz opened this issue Mar 17, 2024 · 0 comments · Fixed by #3614

njzjz commented Mar 17, 2024

Bug summary

In the PyTorch backend, setting batch_size to a list throws an error, as shown below.

DeePMD-kit Version

v3.0.0a0-28-ged831c88

TensorFlow Version (the PyTorch version is given, since this bug is in the PyTorch backend)

PT v2.2.0+cu121-g8ac9b20d4b0

How did you download the software?

Built from source

Input Files, Running Commands, Error Log, etc.

Traceback (most recent call last):
  File "/home/jz748/anaconda3/bin/dp", line 8, in <module>
    sys.exit(main())
  File "/home/jz748/codes/deepmd-kit/deepmd/main.py", line 807, in main
    deepmd_main(args)
  File "/home/jz748/anaconda3/lib/python3.10/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 347, in wrapper
    return f(*args, **kwargs)
  File "/home/jz748/codes/deepmd-kit/deepmd/pt/entrypoints/main.py", line 306, in main
    train(FLAGS)
  File "/home/jz748/codes/deepmd-kit/deepmd/pt/entrypoints/main.py", line 270, in train
    trainer = get_trainer(
  File "/home/jz748/codes/deepmd-kit/deepmd/pt/entrypoints/main.py", line 166, in get_trainer
    ) = prepare_trainer_input_single(
  File "/home/jz748/codes/deepmd-kit/deepmd/pt/entrypoints/main.py", line 149, in prepare_trainer_input_single
    train_data_single = DpLoaderSet(
  File "/home/jz748/codes/deepmd-kit/deepmd/pt/utils/dataloader.py", line 129, in __init__
    system_dataloader = DataLoader(
  File "/home/jz748/anaconda3/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 356, in __init__
    batch_sampler = BatchSampler(sampler, batch_size, drop_last)
  File "/home/jz748/anaconda3/lib/python3.10/site-packages/torch/utils/data/sampler.py", line 267, in __init__
    raise ValueError(f"batch_size should be a positive integer value, but got batch_size={batch_size}")
ValueError: batch_size should be a positive integer value, but got batch_size=[1, 1, 1]
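The traceback shows that `DpLoaderSet` passes the whole `batch_size` list to `torch.utils.data.DataLoader`, whose `BatchSampler` accepts only a single positive integer. The sketch below shows one way to handle this. It assumes that a list gives one batch size per training system, which is what the diff in the reproduction steps suggests. It expands the setting into one integer per system before each per-system loader is built. `resolve_batch_sizes` is a hypothetical helper, not DeePMD-kit API, and this is not necessarily how #3614 fixes the bug.

```python
from typing import List, Sequence, Union


def resolve_batch_sizes(
    batch_size: Union[int, Sequence[int]], nsystems: int
) -> List[int]:
    """Expand a batch_size setting into one positive integer per system.

    Hypothetical helper (assumption): an int applies to every system,
    and a list/tuple must provide exactly one entry per system.
    """
    # bool is a subclass of int, so reject it explicitly.
    if isinstance(batch_size, bool):
        raise TypeError("batch_size must be an int or a list of ints")
    if isinstance(batch_size, int):
        sizes = [batch_size] * nsystems
    elif isinstance(batch_size, (list, tuple)):
        if len(batch_size) != nsystems:
            raise ValueError(
                f"batch_size has {len(batch_size)} entries "
                f"but there are {nsystems} systems"
            )
        sizes = list(batch_size)
    else:
        raise TypeError("batch_size must be an int or a list of ints")
    # Apply the same per-value check that torch's BatchSampler performs,
    # so a bad entry fails early with a clear message.
    for size in sizes:
        if isinstance(size, bool) or not isinstance(size, int) or size <= 0:
            raise ValueError(
                f"batch_size should be a positive integer value, but got {size!r}"
            )
    return sizes


print(resolve_batch_sizes([1, 1, 1], 3))  # [1, 1, 1]
print(resolve_batch_sizes(4, 2))          # [4, 4]
```

With this expansion, the loader for system `i` would receive `batch_size=sizes[i]`, so torch only ever sees a plain integer.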

Steps to Reproduce

cd examples/water/se_atten

Make the following modification:

diff --git a/examples/water/se_atten/input_torch.json b/examples/water/se_atten/input_torch.json
index 7e9cf06f..0188228e 100644
--- a/examples/water/se_atten/input_torch.json
+++ b/examples/water/se_atten/input_torch.json
@@ -68,7 +68,7 @@
         "../data/data_1",
         "../data/data_2"
       ],
-      "batch_size": 1,
+      "batch_size": [1, 1, 1],
       "_comment": "that's all"
     },
     "validation_data": {

Then run

dp --pt train input_torch.json

Further Information, Files, and Links

If this cannot be resolved before the stable release, the documentation needs to be updated:
https://docs.deepmodeling.com/projects/deepmd/en/latest/train/train-input.html#argument:training/training_data/batch_size
