A large error induced by compression for se_e3 descriptor #2250

njzjz · 2023-01-13T19:57:59Z

Discussed in #2182

Originally posted by shihao-code December 15, 2022
When I used a hybrid descriptor of se_e2_a and se_e3, the RMSE of the deep potential was very small (3 meV/atom for energy and 59 meV/Ang for atomic force). After compressing the potential, however, the RMSE became much larger (16 meV/atom for energy and 64 meV/Ang for atomic force). If I used only the se_e2_a descriptor, keeping the other parameters in the input.json file unchanged, there was no change before and after compression. If only the se_e3 descriptor was used, compression again induced a large error.
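To make the comparison above concrete, here is a minimal sketch of how the per-atom energy RMSE of the original and compressed models could be computed against the same reference data. The function names and all numbers are hypothetical and chosen only for illustration; they are not taken from this report or from DeePMD-kit.

```python
import math


def rmse(pred, ref):
    """Root-mean-square error between two equal-length sequences."""
    if len(pred) != len(ref):
        raise ValueError("length mismatch")
    return math.sqrt(sum((p - r) ** 2 for p, r in zip(pred, ref)) / len(ref))


def energy_rmse_per_atom(pred_energies, ref_energies, natoms):
    """RMSE of per-atom energies; inputs are per-frame total energies in eV."""
    pred = [e / n for e, n in zip(pred_energies, natoms)]
    ref = [e / n for e, n in zip(ref_energies, natoms)]
    return rmse(pred, ref)


# Made-up numbers (eV), two frames of four atoms each. Illustration only.
ref = [-10.000, -12.000]
original = [-10.004, -11.996]    # hypothetical uncompressed-model predictions
compressed = [-10.040, -11.960]  # hypothetical compressed-model predictions
natoms = [4, 4]

err_orig = energy_rmse_per_atom(original, ref, natoms)
err_comp = energy_rmse_per_atom(compressed, ref, natoms)
# Comparing the two models directly isolates the error added by compression.
drift = energy_rmse_per_atom(compressed, original, natoms)
print(f"original: {err_orig:.4f}  compressed: {err_comp:.4f}  drift: {drift:.4f}")
```

Comparing the compressed model against the uncompressed one (rather than only against the reference) separates the error introduced by compression from the training error.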

Version of deepmd-kit: 2.1.5_cuda11.6

Command I used: dp compress -i FeH.pb -o FeH-compress.pb --step 0.002
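For context on `--step`: as far as I understand the DeePMD-kit documentation, compression replaces the embedding network with tabulated piecewise polynomials, and `--step` sets the table spacing. The toy below is not DeePMD-kit's implementation. It assumes plain linear interpolation of `tanh` on a uniform grid, and `max_interp_error` is a hypothetical helper, used only to show how tabulation error scales with the step size.

```python
import math


def max_interp_error(f, lo, hi, step, samples_per_cell=4):
    """Largest |f(x) - linear interpolation of f| sampled over a uniform table."""
    n = int(round((hi - lo) / step))
    worst = 0.0
    for i in range(n):
        x0 = lo + i * step
        x1 = x0 + step
        f0, f1 = f(x0), f(x1)
        # Probe interior points of the cell; the endpoints are exact by construction.
        for k in range(1, samples_per_cell):
            t = k / samples_per_cell
            approx = (1 - t) * f0 + t * f1
            worst = max(worst, abs(f(x0 + t * step) - approx))
    return worst


fine = max_interp_error(math.tanh, -2.0, 2.0, 0.002)
coarse = max_interp_error(math.tanh, -2.0, 2.0, 0.2)
print(f"step 0.002: {fine:.2e}   step 0.2: {coarse:.2e}")
```

In this toy setting the interpolation error falls roughly with the square of the step, so a small step by itself leaves very little tabulation error. The sketch says nothing about the se_e3 tabulate op specifically.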

The output of compression:

Loading BaseGPU/2021
  Loading requirement: nvhpc/21.3 cuda/11.2 openmpi/4.0.3cu11.2.v2
WARNING:tensorflow:From /sqfs/work/G14979/u6b368/bin/deepmd_kit_gpu_2.1.5_cuda11.6/lib/python3.10/site-packages/tensorflow/python/compat/v2_compat.py:107: disable_resource_variables (from tensorflow.python.ops.variable_scope) is deprecated and will be removed in a future version.
Instructions for updating:
non-resource variables are not supported in the long term
WARNING:root:To get the best performance, it is recommended to adjust the number of threads by setting the environment variables OMP_NUM_THREADS, TF_INTRA_OP_PARALLELISM_THREADS, and TF_INTER_OP_PARALLELISM_THREADS.
WARNING:root:Environment variable KMP_BLOCKTIME is empty. Use the default value 0
WARNING:root:Environment variable KMP_AFFINITY is empty. Use the default value granularity=fine,verbose,compact,1,0
/sqfs/work/G14979/u6b368/bin/deepmd_kit_gpu_2.1.5_cuda11.6/lib/python3.10/importlib/__init__.py:169: UserWarning: The NumPy module was reloaded (imported a second time). This can in some cases result in small but subtle issues and is discouraged.
  _bootstrap._exec(spec, module)
DEEPMD INFO    


DEEPMD INFO    stage 1: compress the model
DEEPMD INFO     _____               _____   __  __  _____           _     _  _   
DEEPMD INFO    |  __ \             |  __ \ |  \/  ||  __ \         | |   (_)| |  
DEEPMD INFO    | |  | |  ___   ___ | |__) || \  / || |  | | ______ | | __ _ | |_ 
DEEPMD INFO    | |  | | / _ \ / _ \|  ___/ | |\/| || |  | ||______|| |/ /| || __|
DEEPMD INFO    | |__| ||  __/|  __/| |     | |  | || |__| |        |   < | || |_ 
DEEPMD INFO    |_____/  \___| \___||_|     |_|  |_||_____/         |_|\_\|_| \__|
DEEPMD INFO    Please read and cite:
DEEPMD INFO    Wang, Zhang, Han and E, Comput.Phys.Comm. 228, 178-184 (2018)
DEEPMD INFO    installed to:         /home/conda/feedstock_root/build_artifacts/deepmd-kit_1663923590539/work/_skbuild/linux-x86_64-3.10/cmake-install
DEEPMD INFO    source :              v2.1.5
DEEPMD INFO    source brach:         HEAD
DEEPMD INFO    source commit:        6e3d4a62
DEEPMD INFO    source commit at:     2022-09-23 16:10:28 +0800
DEEPMD INFO    build float prec:     double
DEEPMD INFO    build variant:        cuda
DEEPMD INFO    build with tf inc:    /sqfs/work/G14979/u6b368/bin/deepmd_kit_gpu_2.1.5_cuda11.6/lib/python3.10/site-packages/tensorflow/include;/sqfs/work/G14979/u6b368/bin/deepmd_kit_gpu_2.1.5_cuda11.6/include
DEEPMD INFO    build with tf lib:    
DEEPMD INFO    ---Summary of the training---------------------------------------
DEEPMD INFO    running on:           gpu0501
DEEPMD INFO    computing device:     gpu:0
DEEPMD INFO    CUDA_VISIBLE_DEVICES: 0,1
DEEPMD INFO    Count of visible GPU: 2
DEEPMD INFO    num_intra_threads:    0
DEEPMD INFO    num_inter_threads:    0
DEEPMD INFO    -----------------------------------------------------------------
DEEPMD INFO    training without frame parameter
DEEPMD INFO    training data with lower boundary: [-0.22680075 -0.29381635]
DEEPMD INFO    training data with upper boundary: [30.16753829 41.82551879]
OMP: Info #155: KMP_AFFINITY: Initial OS proc set respected: 0
OMP: Info #216: KMP_AFFINITY: decoding x2APIC ids.
OMP: Info #157: KMP_AFFINITY: 1 available OS procs
OMP: Info #158: KMP_AFFINITY: Uniform topology
OMP: Info #287: KMP_AFFINITY: topology layer "LL cache" is equivalent to "socket".
OMP: Info #287: KMP_AFFINITY: topology layer "L3 cache" is equivalent to "socket".
OMP: Info #287: KMP_AFFINITY: topology layer "L2 cache" is equivalent to "core".
OMP: Info #287: KMP_AFFINITY: topology layer "L1 cache" is equivalent to "core".
OMP: Info #192: KMP_AFFINITY: 1 socket x 1 core/socket x 1 thread/core (1 total cores)
OMP: Info #218: KMP_AFFINITY: OS proc to physical thread map:
OMP: Info #172: KMP_AFFINITY: OS proc 0 maps to socket 0 core 0 thread 0 
OMP: Info #254: KMP_AFFINITY: pid 449229 tid 449422 thread 0 bound to OS proc set 0
OMP: Info #254: KMP_AFFINITY: pid 449229 tid 449421 thread 1 bound to OS proc set 0
DEEPMD INFO    training data with lower boundary: [-1505.35165116 -4165.88651941]
DEEPMD INFO    training data with upper boundary: [1505.35165116 4165.88651941]
DEEPMD INFO    built lr
DEEPMD INFO    built network
DEEPMD INFO    built training
DEEPMD INFO    initialize model from scratch
INFO:tensorflow:/sqfs2/cmc/0/work/G14979/u6b368/bbb0/model-compression/model.ckpt.index
DEEPMD INFO    /sqfs2/cmc/0/work/G14979/u6b368/bbb0/model-compression/model.ckpt.index
INFO:tensorflow:0
DEEPMD INFO    0
INFO:tensorflow:/sqfs2/cmc/0/work/G14979/u6b368/bbb0/model-compression/model.ckpt.data-00000-of-00001
DEEPMD INFO    /sqfs2/cmc/0/work/G14979/u6b368/bbb0/model-compression/model.ckpt.data-00000-of-00001
INFO:tensorflow:69300
DEEPMD INFO    69300
INFO:tensorflow:/sqfs2/cmc/0/work/G14979/u6b368/bbb0/model-compression/model.ckpt.meta
DEEPMD INFO    /sqfs2/cmc/0/work/G14979/u6b368/bbb0/model-compression/model.ckpt.meta
INFO:tensorflow:1659000
DEEPMD INFO    1659000
DEEPMD INFO    finished compressing
DEEPMD INFO    


DEEPMD INFO    stage 2: freeze the model
INFO:tensorflow:Restoring parameters from model-compression/model.ckpt
DEEPMD INFO    Restoring parameters from model-compression/model.ckpt
DEEPMD INFO    The following nodes will be frozen: ['model_type', 'descrpt_attr/rcut', 'descrpt_attr/ntypes', 'model_attr/tmap', 'model_attr/model_type', 'model_attr/model_version', 'train_attr/min_nbor_dist', 'train_attr/training_script', 'o_energy', 'o_force', 'o_virial', 'o_atom_energy', 'o_atom_virial', 'fitting_attr/dfparam', 'fitting_attr/daparam']
WARNING:tensorflow:From /sqfs/work/G14979/u6b368/bin/deepmd_kit_gpu_2.1.5_cuda11.6/lib/python3.10/site-packages/deepmd/entrypoints/freeze.py:246: convert_variables_to_constants (from tensorflow.python.framework.graph_util_impl) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.compat.v1.graph_util.convert_variables_to_constants`
DEEPMD WARNING From /sqfs/work/G14979/u6b368/bin/deepmd_kit_gpu_2.1.5_cuda11.6/lib/python3.10/site-packages/deepmd/entrypoints/freeze.py:246: convert_variables_to_constants (from tensorflow.python.framework.graph_util_impl) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.compat.v1.graph_util.convert_variables_to_constants`
WARNING:tensorflow:From /sqfs/work/G14979/u6b368/bin/deepmd_kit_gpu_2.1.5_cuda11.6/lib/python3.10/site-packages/tensorflow/python/framework/convert_to_constants.py:925: extract_sub_graph (from tensorflow.python.framework.graph_util_impl) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.compat.v1.graph_util.extract_sub_graph`
DEEPMD WARNING From /sqfs/work/G14979/u6b368/bin/deepmd_kit_gpu_2.1.5_cuda11.6/lib/python3.10/site-packages/tensorflow/python/framework/convert_to_constants.py:925: extract_sub_graph (from tensorflow.python.framework.graph_util_impl) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.compat.v1.graph_util.extract_sub_graph`
DEEPMD INFO    1258 ops in the final graph.

My input.json file

        "descriptor": {
            "type": "hybrid",
            "list": [
                {
                    "type": "se_e2_a",
                    "sel": "auto",
                    "rcut_smth": 0.5,
                    "activation_function": "tanh",
                    "rcut": 6.5,
                    "neuron": [
                        30,
                        60,
                        120
                    ],
                    "resnet_dt": false,
                    "axis_neuron": 32,
                    "seed": 13290,
                    "_comment": " that's all"
                },
                {
                    "type": "se_e3",
                    "sel": "auto",
                    "rcut_smth": 0.5,
                    "activation_function": "tanh",
                    "rcut": 5.0,
                    "neuron": [
                        5,
                        10,
                        20
                    ],
                    "resnet_dt": false,
                    "seed": 1327,
                    "_comment": " that's all"
                }
            ]
        },
        "fitting_net": {
            "neuron": [
                320,
                320,
                320
            ],
            "resnet_dt": true,
            "seed": 6374,
            "_comment": " that's all"
        },
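The isolation test described in the report (training with only se_e2_a, then only se_e3, with everything else unchanged) can be set up mechanically. The sketch below is one hypothetical way to derive single-descriptor variants from a hybrid configuration. `single_descriptor_variants` is an illustrative helper, not part of DeePMD-kit, and the dictionary is a trimmed copy of the fragment above that assumes `descriptor` and `fitting_net` sit side by side in the same section.

```python
import copy


def single_descriptor_variants(model):
    """Map each sub-descriptor type to a copy of `model` that uses only it.

    Variants are keyed by descriptor type, so two sub-descriptors of the
    same type would overwrite each other.
    """
    descriptor = model["descriptor"]
    if descriptor.get("type") != "hybrid":
        raise ValueError("expected a hybrid descriptor")
    variants = {}
    for sub in descriptor["list"]:
        variant = copy.deepcopy(model)              # leave the input untouched
        variant["descriptor"] = copy.deepcopy(sub)  # swap in one sub-descriptor
        variants[sub["type"]] = variant
    return variants


# Trimmed copy of the fragment above; assumed layout, illustration only.
model = {
    "descriptor": {
        "type": "hybrid",
        "list": [
            {"type": "se_e2_a", "sel": "auto", "rcut": 6.5, "neuron": [30, 60, 120]},
            {"type": "se_e3", "sel": "auto", "rcut": 5.0, "neuron": [5, 10, 20]},
        ],
    },
    "fitting_net": {"neuron": [320, 320, 320], "resnet_dt": True},
}

variants = single_descriptor_variants(model)
print(sorted(variants))
```

Each variant can then be written to its own input file, trained, and compressed separately, so that any accuracy loss after compression can be attributed to a single descriptor.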

DingChangjie · 2023-05-21T12:42:29Z

Hi, I've also found this issue in my hybrid-descriptor ZrC potential, where the accuracy deteriorates severely after model compression. I used the latest version of deepmd-kit, v2.2.1. It seems that this issue has not been fixed yet?

njzjz · 2023-05-22T20:36:29Z

Fixed in #2552.

njzjz added the bug and reproduced (This bug has been reproduced by developers) labels Jan 13, 2023

njzjz added this to Bugfixes for DeePMD-kit Feb 16, 2023

github-project-automation bot moved this to Todo in Bugfixes for DeePMD-kit Feb 16, 2023

njzjz added a commit to njzjz/deepmd-kit that referenced this issue May 22, 2023

fix se_e3 tabulate op

bdcf779

Fix deepmodeling#2250. Signed-off-by: Jinzhe Zeng <jinzhe.zeng@rutgers.edu>

njzjz mentioned this issue May 22, 2023

fix se_e3 tabulate op #2552

Merged

njzjz linked a pull request May 22, 2023 that will close this issue

fix se_e3 tabulate op #2552

Merged

njzjz moved this from Todo to Done in Bugfixes for DeePMD-kit May 22, 2023

wanghan-iapcm pushed a commit that referenced this issue May 22, 2023

fix se_e3 tabulate op (#2552)

450455f

Fix #2250. --------- Signed-off-by: Jinzhe Zeng <jinzhe.zeng@rutgers.edu>

njzjz closed this as completed May 22, 2023

njzjz added the critical (Critical bugs that may break the results without messages) label Sep 26, 2023

njzjz mentioned this issue Sep 26, 2023

List of critical bugs giving incorrect results without error messages #2866

Open
