
[BUG] Model converted from PT to TF backend could not run with TF #3997

Closed
Cloudac7 opened this issue Jul 19, 2024 · 4 comments · Fixed by #4007
Assignees
Labels
bug reproduced This bug has been reproduced by developers

Comments

@Cloudac7
Contributor

Cloudac7 commented Jul 19, 2024

Bug summary

I am working on multi-task training with DeePMD-kit v3.0.0b0, and after the freezing step I get a model head that uses the se_a descriptor. I then ran dp --pt convert-backend frozen_model.pth frozen_model.pb (with and without --pt, the result is the same) to obtain frozen_model.pb. However, the converted model could not be used when running LAMMPS with either v2.2.9 or v3.0.0b0, raising the following error:

Setting up Verlet run ...
  Unit style    : metal
  Current step  : 0
  Time step     : 0.0005
INVALID_ARGUMENT: 2 root error(s) found.
  (0) INVALID_ARGUMENT: Input to reshape is a tensor with 504000 values, but the requested shape requires a multiple of 1608
	 [[{{node Reshape_33}}]]
	 [[o_atom_energy/_37]]
  (1) INVALID_ARGUMENT: Input to reshape is a tensor with 504000 values, but the requested shape requires a multiple of 1608
	 [[{{node Reshape_33}}]]
0 successful operations.
0 derived errors ignored.
ERROR on proc 0: DeePMD-kit C API Error: DeePMD-kit Error: TensorFlow Error: INVALID_ARGUMENT: 2 root error(s) found.
  (0) INVALID_ARGUMENT: Input to reshape is a tensor with 504000 values, but the requested shape requires a multiple of 1608
	 [[{{node Reshape_33}}]]
	 [[o_atom_energy/_37]]
  (1) INVALID_ARGUMENT: Input to reshape is a tensor with 504000 values, but the requested shape requires a multiple of 1608
	 [[{{node Reshape_33}}]]
0 successful operations.
0 derived errors ignored. (/public/groups/ai4ec/libs/conda/deepmd/3.0.0b0-cuda118/source/deepmd-kit/source/lmp/pair_deepmd.cpp:586)
Last command: run             ${NSTEPS} upto

Something seems to go wrong when converting the model, which looks like a bug.
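The error class above can be reproduced in isolation: the exported TF graph tries to reshape a flat tensor into rows of a fixed per-atom width, and the width baked into the graph (1608) does not divide the number of values actually produced (504000). A minimal sketch with NumPy (illustrative only, not DeePMD-kit code):

```python
import numpy as np

# The reported sizes from the LAMMPS error log.
n_values = 504000   # values in the incoming tensor
row_width = 1608    # width the Reshape node requires a multiple of

print(n_values % row_width)  # 696 — not zero, so the reshape must fail

try:
    np.zeros(n_values).reshape(-1, row_width)
except ValueError as e:
    # Same failure mode as the TF INVALID_ARGUMENT error above.
    print("reshape failed:", e)
```

This points at a mismatch between the output width the converted graph advertises and the width the tensor actually has, rather than a problem in the LAMMPS input itself.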

DeePMD-kit Version

DeePMD-kit v3.0.0b0

Backend and its version

PyTorch v2.0.0.post200, TensorFlow v2.14.0

How did you download the software?

Offline packages

Input Files, Running Commands, Error Log, etc.

Running command:

dp --pt freeze -o frozen_model.pth --head ener
dp convert-backend frozen_model.pth frozen_model.pb

or use --pt.

The LAMMPS error log is attached below:
slurm-2623892.txt

Steps to Reproduce

Please use the attached frozen_model.pth and the attached LAMMPS task to reproduce the bug.

Further Information, Files, and Links

No response

@Cloudac7 Cloudac7 added the bug label Jul 19, 2024
@njzjz
Member

njzjz commented Jul 19, 2024

DescrptDPA1Compat has the wrong get_dim_out() when concat_output_tebd is true. cc @iProzd
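The diagnosis above can be sketched as follows. All names here are illustrative, not the actual DeePMD-kit API: when the descriptor concatenates the type-embedding output to its own output, the true per-atom width is the descriptor width plus the type-embedding width, but a get_dim_out() that ignores the concatenation reports only the former. Downstream reshape nodes built from the smaller width then fail at runtime, as in the log above.

```python
# Hypothetical sketch of the diagnosed bug (names are illustrative).
class DescriptorSketch:
    def __init__(self, dim_emb, tebd_dim, concat_output_tebd):
        self.dim_emb = dim_emb                    # descriptor output width
        self.tebd_dim = tebd_dim                  # type-embedding width
        self.concat_output_tebd = concat_output_tebd

    def get_dim_out_buggy(self):
        # Ignores the concatenated type embedding.
        return self.dim_emb

    def get_dim_out_fixed(self):
        # Accounts for the type embedding when it is concatenated.
        return self.dim_emb + (self.tebd_dim if self.concat_output_tebd else 0)

d = DescriptorSketch(dim_emb=128, tebd_dim=8, concat_output_tebd=True)
print(d.get_dim_out_buggy(), d.get_dim_out_fixed())  # 128 136
```

With concat_output_tebd true, any graph node sized from the buggy value expects the wrong row width, which matches the "requested shape requires a multiple of N" failure seen when running the converted model.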

@njzjz
Member

njzjz commented Jul 26, 2024

Fixed in #4007.

@njzjz njzjz closed this as completed Jul 26, 2024
@github-project-automation github-project-automation bot moved this from Todo to Done in Bugfixes for DeePMD-kit Jul 26, 2024
njzjz added a commit to njzjz/deepmd-kit that referenced this issue Jul 26, 2024
- [x] (Tomorrow) Test if it works for deepmodeling#3997. 

deepmodeling#3997 needs another fix in deepmodeling#4022 .

## Summary by CodeRabbit

- **New Features**
  - Introduced a method to dynamically determine the output dimension of the descriptor, enhancing its functionality and interaction with other components.
  - Improved tensor dimensionality handling in tests to ensure compatibility with the new output dimension method.

---------

Signed-off-by: Jinzhe Zeng <jinzhe.zeng@rutgers.edu>
Co-authored-by: Jinzhe Zeng <jinzhe.zeng@rutgers.edu>
mtaillefumier pushed a commit to mtaillefumier/deepmd-kit that referenced this issue Sep 18, 2024
@njzjz
Member

njzjz commented Oct 23, 2024

Reopening: #4007 may not fix this issue; more validation is needed.

@njzjz njzjz reopened this Oct 23, 2024
@njzjz
Member

njzjz commented Nov 13, 2024

#4320 should fix the issue.

@njzjz njzjz closed this as completed Nov 13, 2024