[BUG] pt: setting `batch_size` to `list` throws errors #3475

njzjz · 2024-03-17T05:08:41Z

Bug summary

In the PyTorch backend, setting `batch_size` to a list raises a `ValueError`, as shown in the traceback below.

DeePMD-kit Version

v3.0.0a0-28-ged831c88

TensorFlow Version

PT v2.2.0+cu121-g8ac9b20d4b0

How did you download the software?

Built from source

Input Files, Running Commands, Error Log, etc.

Traceback (most recent call last):
  File "/home/jz748/anaconda3/bin/dp", line 8, in <module>
    sys.exit(main())
  File "/home/jz748/codes/deepmd-kit/deepmd/main.py", line 807, in main
    deepmd_main(args)
  File "/home/jz748/anaconda3/lib/python3.10/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 347, in wrapper
    return f(*args, **kwargs)
  File "/home/jz748/codes/deepmd-kit/deepmd/pt/entrypoints/main.py", line 306, in main
    train(FLAGS)
  File "/home/jz748/codes/deepmd-kit/deepmd/pt/entrypoints/main.py", line 270, in train
    trainer = get_trainer(
  File "/home/jz748/codes/deepmd-kit/deepmd/pt/entrypoints/main.py", line 166, in get_trainer
    ) = prepare_trainer_input_single(
  File "/home/jz748/codes/deepmd-kit/deepmd/pt/entrypoints/main.py", line 149, in prepare_trainer_input_single
    train_data_single = DpLoaderSet(
  File "/home/jz748/codes/deepmd-kit/deepmd/pt/utils/dataloader.py", line 129, in __init__
    system_dataloader = DataLoader(
  File "/home/jz748/anaconda3/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 356, in __init__
    batch_sampler = BatchSampler(sampler, batch_size, drop_last)
  File "/home/jz748/anaconda3/lib/python3.10/site-packages/torch/utils/data/sampler.py", line 267, in __init__
    raise ValueError(f"batch_size should be a positive integer value, but got batch_size={batch_size}")
ValueError: batch_size should be a positive integer value, but got batch_size=[1, 1, 1]
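
The last frames of the traceback suggest that the list reaches `torch.utils.data.DataLoader` unchanged as `batch_size`, and the sampler there accepts only a positive integer. A minimal sketch of that kind of check, assuming a stand-in function `check_batch_size` (hypothetical, written for illustration; it is not PyTorch's actual code):

```python
def check_batch_size(batch_size):
    """Hypothetical stand-in for the validation that fails in the traceback."""
    # bool is a subclass of int in Python, so it is excluded explicitly.
    if (
        isinstance(batch_size, bool)
        or not isinstance(batch_size, int)
        or batch_size <= 0
    ):
        raise ValueError(
            "batch_size should be a positive integer value, "
            f"but got batch_size={batch_size}"
        )
    return batch_size


check_batch_size(1)  # the original setting is accepted

try:
    check_batch_size([1, 1, 1])  # the value from the modified input_torch.json
except ValueError as err:
    message = str(err)
    print(message)
```

The printed message matches the last line of the traceback, which is consistent with the list being forwarded as-is rather than being split per system.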

Steps to Reproduce

cd examples/water/se_atten

Apply the following modification:

diff --git a/examples/water/se_atten/input_torch.json b/examples/water/se_atten/input_torch.json
index 7e9cf06f..0188228e 100644
--- a/examples/water/se_atten/input_torch.json
+++ b/examples/water/se_atten/input_torch.json
@@ -68,7 +68,7 @@
         "../data/data_1",
         "../data/data_2"
       ],
-      "batch_size": 1,
+      "batch_size": [1, 1, 1],
       "_comment": "that's all"
     },
     "validation_data": {

Then run

dp --pt train input_torch.json

Further Information, Files, and Links

The documentation needs to be updated if this cannot be resolved before the stable release:
https://docs.deepmodeling.com/projects/deepmd/en/latest/train/train-input.html#argument:training/training_data/batch_size
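
One possible way to handle the list form is to normalize `batch_size` into one integer per system before any per-system `DataLoader` is built. The sketch below is an assumption for illustration only and is not the change made in DeePMD-kit; the helper name `per_system_batch_sizes` and its arguments are hypothetical.

```python
def per_system_batch_sizes(batch_size, n_systems):
    """Return one positive integer batch size per system (hypothetical helper).

    An int is broadcast to every system; a list must have one entry per system.
    """
    if isinstance(batch_size, bool):
        raise TypeError("batch_size must be an int or a list of ints")
    if isinstance(batch_size, int):
        sizes = [batch_size] * n_systems
    elif isinstance(batch_size, (list, tuple)):
        sizes = list(batch_size)
        if len(sizes) != n_systems:
            raise ValueError(
                f"got {len(sizes)} batch sizes for {n_systems} systems"
            )
    else:
        raise TypeError("batch_size must be an int or a list of ints")

    # Every entry must itself pass the positive-integer requirement.
    for size in sizes:
        if isinstance(size, bool) or not isinstance(size, int) or size <= 0:
            raise ValueError(f"invalid per-system batch size: {size!r}")
    return sizes


# Both the original and the modified setting map to one value per system.
print(per_system_batch_sizes(1, 3))
print(per_system_batch_sizes([1, 1, 1], 3))
```

Each loader would then receive a single integer, for example `sizes[i]` for the `i`-th system, so the integer-only check is never given the whole list.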

njzjz added the bug label Mar 17, 2024

njzjz added this to the v3.0.0 milestone Mar 17, 2024

github-project-automation bot added this to Multiple backend support for DeePMD-kit Mar 17, 2024

github-project-automation bot moved this to Todo in Multiple backend support for DeePMD-kit Mar 17, 2024

wanghan-iapcm assigned CaRoLZhangxy Mar 18, 2024

njzjz mentioned this issue Mar 24, 2024

[Feature Request] Support different backends for DeePMD-kit deepmodeling/dpgen#1462

Closed

CaRoLZhangxy mentioned this issue Mar 27, 2024

pt: support list format batch size #3614

Merged

njzjz linked a pull request Mar 27, 2024 that will close this issue

pt: support list format batch size #3614

Merged

github-merge-queue bot pushed a commit that referenced this issue Mar 28, 2024

pt: support list format batch size (#3614)

7933c5e

#3475 --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

njzjz closed this as completed Mar 28, 2024

github-project-automation bot moved this from Todo to Done in Multiple backend support for DeePMD-kit Mar 28, 2024

njzjz mentioned this issue May 11, 2024

[BUG] Inconsistency between the batch_size specifications in input.json for tf and pt backends #3770

Closed
