DP-GEN Manual

Description: This version adds support for SIESTA on top of the original source code.

The original source code is available at:

dpdata: https://github.com/deepmodeling/dpdata

dpgen: https://github.com/deepmodeling/dpgen

About DP-GEN

DP-GEN (Deep Generator) is a software package written in Python, designed to generate deep-learning-based models of interatomic potential energy and force fields. DP-GEN depends on DeePMD-kit (https://github.com/deepmodeling/deepmd-kit/blob/master/README.md). With a highly scalable interface to common molecular simulation software, DP-GEN can automatically prepare scripts, maintain job queues on HPC (high-performance computing) machines, and analyze results.

Highlighted features

Accurate and efficient: DP-GEN can sample tens of millions of structures and select only a few for first-principles calculation, ultimately producing a uniformly accurate model.
User-friendly and automatic: DP-GEN is easy to install and run. Once running, it dispatches and handles all jobs on HPC machines without manual intervention.
Highly scalable: With a modular code structure, users and developers can easily extend DP-GEN to fit their needs. DP-GEN currently supports HPC systems (Slurm, PBS, LSF and cloud machines), a Deep Potential interface with DeePMD-kit, an MD interface with LAMMPS, and ab initio calculation interfaces with VASP, PWSCF, SIESTA and Gaussian. Contributions from users are sincerely welcome, including new use cases for DP-GEN.

Code structure and interface

dpgen:
- data: source code for preparing initial data of bulk and surface systems.
- generator: source code for the main process of the deep generator.
- auto_test: source code for materials property analysis.
- remote: source code for automatically submitting scripts, maintaining job queues and collecting results.
- database: source code for collecting data generated by DP-GEN and interfacing with a database.
examples: example JSON files.
tests: unit tests for developers.

DP-GEN is run with:

dpgen TASK PARAM MACHINE

where TASK is the keyword, and PARAM and MACHINE are both JSON files.

Options for TASK:

init_bulk : Generating initial data for bulk systems.
init_surf : Generating initial data for surface systems.
run : Main process of Deep Generator.
test: Auto-test for Deep Potential.
db: Collecting data from DP-GEN.
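
Because PARAM and MACHINE must both be valid JSON, it can help to check them before launching a run. The following is a minimal sketch, not part of DP-GEN: the helper name build_dpgen_command is hypothetical, and it only assembles the command line described above after confirming that the task name is known and that both files parse as JSON.

```python
import json

# Task names taken from the option list above.
TASKS = {"init_bulk", "init_surf", "run", "test", "db"}


def build_dpgen_command(task, param, machine):
    """Return the argv list for `dpgen TASK PARAM MACHINE` after basic checks.

    Hypothetical helper for illustration; it is not provided by DP-GEN.
    """
    if task not in TASKS:
        raise ValueError(f"unknown TASK {task!r}; expected one of {sorted(TASKS)}")
    for path in (param, machine):
        # json.load raises json.JSONDecodeError if the file is malformed.
        with open(path, encoding="utf-8") as handle:
            json.load(handle)
    return ["dpgen", task, str(param), str(machine)]
```

The returned list can be passed to subprocess.run(cmd, check=True) to start the job.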

Download and Install

One can download the source code of dpgen by

git clone https://github.com/LiuGroupHNU/dpgen dpgen

Python 3.6 or higher is required. You can then install DP-GEN with:

cd dpgen
pip install --user .

With this command, the dpgen executable is installed to $HOME/.local/bin/dpgen. You may want to add its directory to PATH:

export PATH=$HOME/.local/bin:$PATH

To test if the installation is successful, you may execute

dpgen -h

and if everything works, it gives

DeepModeling
------------

Version: 0.2.0
Path:    /home/wanghan/.local/lib/python3.6/site-packages/dpgen-0.1.0-py3.6.egg/dpgen
Date:    Aug 13, 2019

usage: dpgen [-h] {init_surf,init_bulk,run,test,db} ...

dpgen is a convenient script that uses DeepGenerator to prepare initial data,
drive DeepMDkit and analyze results. This script works based on several sub-
commands with their own options. To see the options for the sub-commands, type
"dpgen sub-command -h".

positional arguments:
  {init_surf,init_bulk,run,test,db}
    init_surf           Generating initial data for bulk systems.
    init_bulk           Generating initial data for surface systems.
    run                 Main process of Deep Generator.
    test                Auto-test for Deep Potential.
    db                  Collecting data from DP-GEN.


optional arguments:
  -h, --help            show this help message and exit

Init: Preparing Initial Data

Init_bulk

You may prepare initial data for bulk systems with VASP by:

dpgen init_bulk PARAM MACHINE

init_bulk can be divided into four parts, denoted as stages in PARAM:

1. Relax in folder 00.place_ele
2. Perturb and scale in folder 01.scale_pert
3. Run a short AIMD in folder 02.md
4. Collect data in folder 02.md.

Stages must be run in order, but you don't need to run all of them. For example, you may run only stages 1 and 2 to generate supercells as starting points for exploration in dpgen run.

The following is an example of PARAM, which generates data from a typical hcp structure.

{
    "stages" : [1,2,3,4],
    "cell_type":    "hcp",
    "latt":     4.479,
    "super_cell":   [2, 2, 2],
    "elements":     ["Mg"],
    "potcars":      ["....../POTCAR"],
    "relax_incar": "....../INCAR_metal_rlx",
    "md_incar" : "....../INCAR_metal_md",
    "scale":        [1.00],
    "skip_relax":   false,
    "pert_numb":    2,
    "md_nstep" : 5,
    "pert_box":     0.03,
    "pert_atom":    0.01,
    "coll_ndata":   5000,
    "_comment":     "that's all"
}

If you want to specify a structure as the starting point for init_bulk, set the following in PARAM:

"from_poscar":	true,
"from_poscar_path":	"....../C_mp-47_conventional.POSCAR",

The following table gives explicit descriptions on keys in PARAM.

Keys marked in bold (such as elements) are required.

Key	Type	Example	Description
stages	List of Integer	[1,2,3,4]	Stages for `init_bulk`
elements	List of String	["Mg"]	Atom types
cell_type	String	"hcp"	Specifies which typical structure to generate. Options include fcc, hcp, bcc, sc, diamond.
latt	Float	4.479	Lattice constant for a single cell.
from_poscar	Boolean	True	Whether to use a given POSCAR as the starting point of relaxation. If true, the keys `cell_type` and `latt` are ignored. Otherwise, these two keys are required.
from_poscar_path	String	"....../C_mp-47_conventional.POSCAR"	Path of POSCAR. Required if `from_poscar` is true.
relax_incar	String	"....../INCAR"	Path of INCAR for relaxation in VASP. Required if `stages` includes 1.
md_incar	String	"....../INCAR"	Path of INCAR for MD in VASP. Required if `stages` includes 3.
scale	List of float	[0.980, 1.000, 1.020]	Scales for transforming cells.
skip_relax	Boolean	False	If true, you may directly run stage 2 (perturb and scale) using an unrelaxed POSCAR.
pert_numb	Integer	30	Number of perturbations for each POSCAR.
pert_box	Float	0.03	Percentage of perturbation for cells.
pert_atom	Float	0.01	Perturbation of each atom (Angstrom).
md_nstep	Integer	10	Steps of AIMD in stage 3. If it differs from the `NSW` setting in `md_incar`, DP-GEN follows `NSW`.
coll_ndata	Integer	5000	Maximal number of collected data.
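
The conditional requirements in the table can be checked before submitting a job. The sketch below is illustrative only: check_init_bulk_param is a hypothetical helper, not a DP-GEN function, and it reads "stages must be in order" as "the list is sorted", which is an assumption.

```python
def check_init_bulk_param(param):
    """Return a list of problems found in an init_bulk PARAM dict.

    Hypothetical helper; it only encodes the rules stated in the table above.
    """
    problems = []
    stages = param.get("stages", [])
    # Assumption: "in order" means the stage numbers are ascending.
    if stages != sorted(stages):
        problems.append("stages must be listed in order")
    if param.get("from_poscar", False):
        if "from_poscar_path" not in param:
            problems.append("from_poscar_path is required when from_poscar is true")
    else:
        for key in ("cell_type", "latt"):
            if key not in param:
                problems.append(f"{key} is required when from_poscar is not true")
    if 1 in stages and "relax_incar" not in param:
        problems.append("relax_incar is required when stages includes 1")
    if 3 in stages and "md_incar" not in param:
        problems.append("md_incar is required when stages includes 3")
    return problems
```

An empty list means none of these particular rules is violated; it does not guarantee that the run will succeed.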

Init_surf

You may prepare initial data for surface systems with VASP by:

dpgen init_surf PARAM MACHINE

init_surf can be divided into two parts, denoted as stages in PARAM:

1. Build specific surface in folder 00.place_ele
2. Perturb and scale in folder 01.scale_pert

All stages must be in order.

The following is an example of PARAM, which generates data from a typical fcc structure.

{
  "stages": [
    1,
    2
  ],
  "cell_type": "fcc",
  "latt": 4.034,
  "super_cell": [
    2,
    2,
    2
  ],
  "z_min": 9,
  "vacuum_max": 9,
  "vacuum_resol": [
    0.5,
    1
  ],
  "mid_point": 4.0,
  "millers": [
    [
      1,
      0,
      0
    ],
    [
      1,
      1,
      0
    ],
    [
      1,
      1,
      1
    ]
  ],
  "elements": [
    "Al"
  ],
  "potcars": [
    "....../POTCAR"
  ],
  "relax_incar": "....../INCAR_metal_rlx_low",
  "scale": [
    1.0
  ],
  "skip_relax": true,
  "pert_numb": 2,
  "pert_box": 0.03,
  "pert_atom": 0.01,
  "_comment": "that's all"
}

The following table gives explicit descriptions on keys in PARAM.

Keys marked in bold (such as elements) are required.

Key	Type	Example	Description
stages	List of Integer	[1,2]	Stages for `init_surf`
elements	List of String	["Mg"]	Atom types
cell_type	String	"hcp"	Specifies which typical structure to generate. Options include fcc, hcp, bcc, sc, diamond.
latt	Float	4.479	Lattice constant for a single cell.
z_min	Float	9	Thickness of slab (Angstrom).
vacuum_max	Float	9	Maximal thickness of vacuum (Angstrom).
vacuum_resol	List of float	[0.5, 1 ]	Interval of vacuum thickness. If `vacuum_resol` has one element, the interval is fixed to that value. If it has two elements, the interval is `vacuum_resol[0]` before `mid_point` and `vacuum_resol[1]` after `mid_point`.
millers	List of list of Integer	[[1,0,0]]	Miller indices.
relax_incar	String	"....../INCAR"	Path of INCAR for relaxation in VASP. Required if `stages` includes 1.
scale	List of float	[0.980, 1.000, 1.020]	Scales for transforming cells.
skip_relax	Boolean	False	If true, you may directly run stage 2 (perturb and scale) using an unrelaxed POSCAR.
pert_numb	Integer	30	Number of perturbations for each POSCAR.
pert_box	Float	0.03	Percentage of perturbation for cells.
pert_atom	Float	0.01	Perturbation of each atom (Angstrom).
coll_ndata	Integer	5000	Maximal number of collected data.
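
To make the vacuum_resol rule concrete, the sketch below lists the vacuum thicknesses that one reading of the table would produce. It is an interpretation, not DP-GEN's implementation: the function name vacuum_thicknesses and the start value of 0 are assumptions, and the actual code may choose its starting thickness and end point differently.

```python
def vacuum_thicknesses(vacuum_max, vacuum_resol, mid_point=None, start=0.0):
    """List vacuum thicknesses (Angstrom) from `start` up to `vacuum_max`.

    Hypothetical helper. With one interval the spacing is fixed; with two,
    the first applies before `mid_point` and the second from `mid_point` on.
    """
    if len(vacuum_resol) == 1:
        step_before = step_after = vacuum_resol[0]
        mid_point = vacuum_max
    elif len(vacuum_resol) == 2:
        if mid_point is None:
            raise ValueError("mid_point is needed when vacuum_resol has two values")
        step_before, step_after = vacuum_resol
    else:
        raise ValueError("vacuum_resol must have one or two values")
    if step_before <= 0 or step_after <= 0:
        raise ValueError("intervals must be positive")

    eps = 1e-9  # tolerance for floating-point comparisons
    values = []
    current = start
    while current <= vacuum_max + eps:
        values.append(round(current, 6))
        current += step_before if current < mid_point - eps else step_after
    return values
```

With the values from the example PARAM (vacuum_max 9, vacuum_resol [0.5, 1], mid_point 4.0), this reading gives steps of 0.5 up to 4.0 and steps of 1 from there to 9.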

Run: Main Process of Generator

You may call the main process by: dpgen run PARAM MACHINE.

The generator runs a series of iterations, carried out successively in order, for example heating the system to a certain temperature.

In each iteration, there are three stages of work: 00.train, 01.model_devi and 02.fp.

00.train: DP-GEN trains several (default 4) models based on the initial and generated data. The only difference between these models is the random seed used for neural network initialization.
01.model_devi: short for model deviation. DP-GEN uses the models obtained from 00.train to run molecular dynamics (LAMMPS by default). A larger deviation in a structure property (by default, the atomic forces) means lower model accuracy. Using this criterion, a few structures are selected and passed to the next stage, 02.fp, for more accurate first-principles calculation.
02.fp: The selected structures are calculated with first-principles methods (VASP by default). DP-GEN collects the new data and combines them with the initial data and the data generated in previous iterations. A new training is then set up and DP-GEN enters the next iteration.
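
The selection step in 01.model_devi can be pictured with a short sketch. It assumes that each frame has a single force-deviation value and uses the PARAM keys model_devi_f_trust_lo, model_devi_f_trust_hi and fp_task_max as bounds and cap. The three-way split, the random sampling and the function name select_candidates are assumptions for illustration, not DP-GEN's actual code.

```python
import random


def select_candidates(max_devi_f, trust_lo, trust_hi, fp_task_max, seed=0):
    """Split frame indices by force deviation and pick candidates for 02.fp.

    Hypothetical sketch: frames below trust_lo are treated as accurate,
    frames at or above trust_hi as failed, and frames in between as
    candidates, of which at most fp_task_max are kept.
    """
    accurate, candidate, failed = [], [], []
    for index, devi in enumerate(max_devi_f):
        if devi < trust_lo:
            accurate.append(index)
        elif devi < trust_hi:
            candidate.append(index)
        else:
            failed.append(index)
    # Assumption: candidates are sampled at random when there are too many.
    rng = random.Random(seed)
    rng.shuffle(candidate)
    selected = sorted(candidate[:fp_task_max])
    return accurate, selected, failed
```

For example, with bounds 0.05 and 0.15, deviations of 0.01, 0.07, 0.2 and 0.1 would be sorted into one accurate frame, two candidates and one failed frame.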

DP-GEN identifies the current stage from a record file, record.dpgen, which is created and updated by the code. Each line contains two numbers: the first is the index of the iteration, and the second, ranging from 0 to 9, records which stage of the iteration is currently running.

0,1,2 correspond to make_train, run_train, post_train. DP-GEN will write scripts in make_train, run the task by specific machine in run_train and collect result in post_train. The records for model_devi and fp stage follow similar rules.
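
A small reader for record.dpgen shows how the two numbers can be interpreted. This is a sketch under assumptions: the two numbers are taken to be separated by whitespace, and only the names for stages 0 to 2 come from the text above. The names for stages 3 to 8 are filled in by analogy, and last_record is a hypothetical helper.

```python
# Stages 0-2 are named in the text; 3-8 are assumed by analogy.
STAGE_NAMES = [
    "make_train", "run_train", "post_train",
    "make_model_devi", "run_model_devi", "post_model_devi",
    "make_fp", "run_fp", "post_fp",
]


def last_record(text):
    """Return (iteration, stage, stage_name) for the last valid line, or None."""
    last = None
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 2:
            continue  # skip blank or malformed lines
        last = (int(fields[0]), int(fields[1]))
    if last is None:
        return None
    iteration, stage = last
    name = STAGE_NAMES[stage] if 0 <= stage < len(STAGE_NAMES) else "unknown"
    return iteration, stage, name
```

Calling last_record(open("record.dpgen").read()) in the working directory would then report where a run stopped.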

In PARAM, you can customize the task as needed.

{
  "type_map": [
    "H",
    "C"
  ],
  "mass_map": [
    1,
    12
  ],
  "init_data_prefix": "....../init/",
  "init_data_sys": [
    "CH4.POSCAR.01x01x01/02.md/sys-0004-0001/deepmd"
  ],
  "init_batch_size": [
    8
  ],
  "sys_configs_prefix": "....../init/",
  "sys_configs": [
    [
      "CH4.POSCAR.01x01x01/01.scale_pert/sys-0004-0001/scale*/00000*/POSCAR"
    ],
    [
      "CH4.POSCAR.01x01x01/01.scale_pert/sys-0004-0001/scale*/00001*/POSCAR"
    ]
  ],
  "sys_batch_size": [
    8,
    8,
    8,
    8
  ],
  "_comment": " that's all ",
  "numb_models": 4,
  "train_param": "input.json",
  "default_training_param": {
    "_comment": "that's all",
    "use_smooth": true,
    "sel_a": [
      16,
      4
    ],
    "rcut_smth": 0.5,
    "rcut": 5,
    "filter_neuron": [
      10,
      20,
      40
    ],
    "filter_resnet_dt": false,
    "n_axis_neuron": 12,
    "n_neuron": [
      100,
      100,
      100
    ],
    "resnet_dt": true,
    "coord_norm": true,
    "type_fitting_net": false,
    "systems": [],
    "set_prefix": "set",
    "stop_batch": 40000,
    "batch_size": 1,
    "start_lr": 0.001,
    "decay_steps": 200,
    "decay_rate": 0.95,
    "seed": 0,
    "start_pref_e": 0.02,
    "limit_pref_e": 2,
    "start_pref_f": 1000,
    "limit_pref_f": 1,
    "start_pref_v": 0.0,
    "limit_pref_v": 0.0,
    "disp_file": "lcurve.out",
    "disp_freq": 1000,
    "numb_test": 4,
    "save_freq": 1000,
    "save_ckpt": "model.ckpt",
    "load_ckpt": "model.ckpt",
    "disp_training": true,
    "time_training": true,
    "profiling": false,
    "profiling_file": "timeline.json"
  },
  "model_devi_dt": 0.002,
  "model_devi_skip": 0,
  "model_devi_f_trust_lo": 0.05,
  "model_devi_f_trust_hi": 0.15,
  "model_devi_clean_traj": true,
  "model_devi_jobs": [
    {
      "sys_idx": [
        0
      ],
      "temps": [
        100
      ],
      "press": [
        1.0
      ],
      "trj_freq": 10,
      "nsteps": 300,
      "ensemble": "nvt",
      "_idx": "00"
    },
    {
      "sys_idx": [
        1
      ],
      "temps": [
        100
      ],
      "press": [
        1.0
      ],
      "trj_freq": 10,
      "nsteps": 3000,
      "ensemble": "nvt",
      "_idx": "01"
    }
  ],
  "fp_style": "vasp",
  "shuffle_poscar": false,
  "fp_task_max": 20,
  "fp_task_min": 1,
  "fp_pp_path": "....../methane/",
  "fp_pp_files": [
    "POTCAR"
  ],
  "fp_incar": "....../INCAR_methane"
}

The following table gives explicit descriptions on keys in PARAM.

Keys marked in bold (such as type_map) are required.

Key	Type	Example	Description
#Basics
type_map	List of string	["H", "C"]	Atom types
mass_map	List of float	[1, 12]	Standard atomic weights.
#Data
init_data_prefix	String	"/sharedext4/.../data/"	Prefix of initial data directories
init_data_sys	List of string	["CH4.POSCAR.01x01x01/.../deepmd"]	Directories of initial data. You may use either absolute or relative path here.
sys_format	String	"vasp/poscar"	Format of initial data. It will be `vasp/poscar` if not set.
init_batch_size	List of integer	[8]	Each number is the batch_size of the corresponding system in `init_data_sys` for training. A recommended rule for setting `sys_batch_size` and `init_batch_size` is that `batch_size` multiplied by the number of atoms in the structure should be larger than 32. If set to `auto`, the batch size will be 32 divided by the number of atoms.
sys_configs_prefix	String	"/sharedext4/.../data/"	Prefix of `sys_configs`
sys_configs	List of list of string	[ ["/sharedext4/.../POSCAR"], ["....../POSCAR"] ]	Directories of structures to be explored in iterations. Wildcard characters are supported here.
sys_batch_size	List of integer	[8, 8]	Each number is the batch_size for training of corresponding system in `sys_configs`. If set to `auto`, batch size will be 32 divided by number of atoms.
#Training
numb_models	Integer	4 (recommend)	Number of models to be trained in `00.train`.
default_training_param	Dict	{ ... "use_smooth": true, "sel_a": [16, 4], "rcut_smth": 0.5, "rcut": 5, "filter_neuron": [10, 20, 40], ... }	Training parameters for `deepmd-kit` in `00.train`. You can find instructions here: (https://github.com/deepmodeling/deepmd-kit). We commonly let `stop_batch` = 200 * `decay_steps`.
#Exploration
model_devi_dt	Float	0.002 (recommend)	Timestep for MD
model_devi_skip	Integer	0	Number of structures skipped for fp in each MD
model_devi_f_trust_lo	Float	0.05	Lower bound of forces for the selection.
model_devi_f_trust_hi	Float	0.15	Upper bound of forces for the selection.
model_devi_e_trust_lo	Float	1e10	Lower bound of energies for the selection. It is recommended to set this to a high number, since forces provide more precise information. Special cases such as energy minimization may need this.
model_devi_e_trust_hi	Float	1e10	Upper bound of energies for the selection.
model_devi_clean_traj	Boolean	true	Whether to clean the traj folders in MD, since they are too large.
model_devi_jobs	List of dict	[ { "sys_idx": [0], "temps": [100], "press": [1], "trj_freq": 10, "nsteps": 1000, "ensemble": "nvt" }, ... ]	Settings for exploration in `01.model_devi`. Each dict in the list corresponds to one iteration. The index in `model_devi_jobs` corresponds exactly to the index of the iteration.
model_devi_jobs["sys_idx"]	List of integer	[0]	Systems to be selected as the initial structure of MD and be explored. The index corresponds exactly to the `sys_configs`.
model_devi_jobs["temps"]	List of integer	[50, 300]	Temperature (K) in MD
model_devi_jobs["press"]	List of integer	[1,10]	Pressure (Bar) in MD
model_devi_jobs["trj_freq"]	Integer	10	Frequency of trajectory saved in MD.
model_devi_jobs["nsteps"]	Integer	3000	Running steps of MD.
model_devi_jobs["ensemble"]	String	"nvt"	Ensemble used in MD. Options include “npt” and “nvt”.
model_devi_jobs["neidelay"]	Integer	"10"	Delay building neighbor lists until this many steps since the last build.
model_devi_jobs["taut"]	Float	"0.1"	Coupling time of thermostat (fs)
model_devi_jobs["taup"]	Float	"0.5"	Coupling time of barostat (fs)
#Labeling
fp_style	String	"vasp"	Software for first-principles calculation. Options currently include “vasp”, “pwscf”, “siesta” and “gaussian”.
fp_task_max	Integer	20	Maximum of structures to be calculated in `02.fp` of each iteration.
fp_task_min	Integer	5	Minimum of structures to calculate in `02.fp` of each iteration.
fp_style == VASP
fp_pp_path	String	"/sharedext4/.../ch4/"	Directory containing the pseudopotential files to be used for 02.fp.
fp_pp_files	List of string	["POTCAR"]	Pseudopotential files to be used for 02.fp. Note that the order of elements should correspond to the order in `type_map`.
fp_incar	String	"/sharedext4/../ch4/INCAR"	Input file for VASP. INCAR must specify KSPACING.
cvasp	Boolean	true	If `cvasp` is true, DP-GEN will use Custodian to help control VASP calculation.
fp_style == Gaussian
use_clusters	Boolean	false	If set to `true`, clusters will be taken instead of the whole system. This option does not work with DeePMD-kit 0.x.
cluster_cutoff	Float	3.5	The cutoff radius of clusters if `use_clusters` is set to `true`.
fp_params	Dict		Parameters for Gaussian calculation.
fp_params["keywords"]	String or list	"mn15/6-31g** nosymm scf(maxcyc=512)"	Keywords for Gaussian input.
fp_params["multiplicity"]	Integer or String	1	Spin multiplicity for Gaussian input. If set to `auto`, the spin multiplicity will be detected automatically. If set to `frag`, the "fragment=N" method will be used.
fp_params["nproc"]	Integer	4	The number of processors for Gaussian input.
fp_style == siesta
use_clusters	Boolean	false	If set to `true`, clusters will be taken instead of the whole system. This option does not work with DeePMD-kit 0.x.
cluster_cutoff	Float	3.5	The cutoff radius of clusters if `use_clusters` is set to `true`.
fp_params	Dict		Parameters for siesta calculation.
fp_params["ecut"]	Integer	300	Plane-wave cutoff for the grid.
fp_params["ediff"]	Float	1e-4	Tolerance of the density matrix.
fp_params["kspacing"]	Float	0.4	Sampling factor in the Brillouin zone.
fp_params["mixingweight"]	Float	0.05	Proportion of the output density matrix to be used for the input density matrix of the next SCF cycle (linear mixing).
fp_params["NumberPulay"]	Integer	5	Controls the Pulay convergence accelerator.
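
The two rules of thumb in the table (batch size times number of atoms of about 32, and `stop_batch` = 200 * `decay_steps`) can be written out as follows. This is a sketch only: both function names are hypothetical, and rounding the division up so that the product reaches at least 32 is an assumption about how `auto` behaves.

```python
import math


def auto_batch_size(natoms, target=32):
    """Smallest batch size whose product with natoms is at least `target`.

    Hypothetical helper; rounding up is an assumption about the `auto` rule.
    """
    if natoms < 1:
        raise ValueError("natoms must be a positive integer")
    return max(1, math.ceil(target / natoms))


def suggested_stop_batch(decay_steps, factor=200):
    """Apply the rule of thumb stop_batch = 200 * decay_steps."""
    return factor * decay_steps
```

For the training parameters in the example PARAM, decay_steps of 200 corresponds to the stop_batch of 40000 used there.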

Test: Auto-test for Deep Generator

At this step, we assume that you have prepared some graph files like graph.*.pb and the particular pseudopotential POTCAR.

The main command for this step is

dpgen test PARAM MACHINE

where PARAM and MACHINE are both JSON files. MACHINE is the same as above.

The whole program consists of the series of tasks shown below. Each task has three stages of work: generate, run and compute.

00.equi:(default task) the equilibrium state
01.eos: the equation of state
02.elastic: the elasticity, such as Young's modulus
03.vacancy: the vacancy formation energy
04.interstitial: the interstitial formation energy
05.surf: the surface formation energy

We take Al as an example to show the parameter settings of param.json. The first part contains the fundamental settings for a particular alloy system.

    "_comment": "models",
    "potcar_map" : {
	"Al" : "/somewhere/POTCAR"
    },
    "conf_dir":"confs/Al/std-fcc",
    "key_id":"API key of Material project",
    "task_type":"deepmd",
    "task":"eos",

You need to add the specified paths of necessary POTCAR files in "potcar_map". The different POTCAR paths are separated by commas. Then you also need to add the folder path of particular configuration, which contains POSCAR file.

"confs/[element or alloy]/[std-* or mp-**]"
std-*: standard structures, * can be fcc, bcc, hcp and so on.
mp-**: ** means the material ID from the Materials Project.

Usually, if you give the relative path of the POSCAR in the above format, dpgen test checks for the file and automatically downloads the standard and existing configurations of the given element or alloy from the Materials Project, storing them in the confs folder. This requires a Materials Project API key.

task_type contains 3 optional types for testing, i.e. vasp, deepmd and meam.
task contains 7 options, equi, eos, elastic, vacancy, interstitial, surf and all. The option all can do all the tasks.

Note that the subsequent tasks rely on the results of the equilibrium-state calculation, so the equilibrium state must be calculated first. For stability, we also recommend testing the equilibrium state with vasp before running other tests.

The second part is the computational settings for vasp and lammps. The most important setting is model_dir, the folder path of the deepmd model, together with the corresponding element type map. dpgen test can also call common lammps packages, such as meam.

"vasp_params":	{
	"ecut":		650,
	"ediff":	1e-6,
	"kspacing":	0.1,
	"kgamma":	false,
	"npar":		1,
	"kpar":		1,
	"_comment":	" that's all "
    },
    "lammps_params":    {
        "model_dir":"somewhere/example/Al_model",
        "type_map":["Al"],
        "model_name":false,
        "model_param_type":false
    },

The last part is the optional settings for various tasks mentioned above. You can change the parameters according to actual needs.

    "_comment":"00.equi",
    "store_stable":true,

store_stable:(boolean) whether to store the stable energy and volume

    "_comment": "01.eos",
    "vol_start":	12,
    "vol_end":		22,
    "vol_step":		0.5,

vol_start, vol_end and vol_step determine the volumetric range and accuracy of the eos.
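
As an illustration of how these three values define a grid, the sketch below lists the volumes that would be sampled. Whether vol_end itself is included is not stated here; the sketch assumes a half-open range that excludes it, and eos_volumes is a hypothetical name.

```python
import math


def eos_volumes(vol_start, vol_end, vol_step):
    """Volumes from vol_start up to, but not including, vol_end.

    Hypothetical helper; excluding the end point is an assumption.
    """
    if vol_step <= 0:
        raise ValueError("vol_step must be positive")
    # Small tolerance so that an exact multiple does not gain an extra point.
    count = max(0, math.ceil((vol_end - vol_start) / vol_step - 1e-9))
    return [round(vol_start + i * vol_step, 10) for i in range(count)]
```

With the settings above (12, 22, 0.5) this reading gives 20 volumes, from 12 to 21.5.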

    "_comment": "02.elastic",
    "norm_deform":	2e-2,
    "shear_deform":	5e-2,

norm_deform and shear_deform are the scales of material deformation. This task uses the stress-strain relationship to calculate the elastic constant.

    "_comment":"03.vacancy",
    "supercell":[3,3,3],

supercell:(list of integer) the supercell size used to generate vacancy defect and interstitial defect

    "_comment":"04.interstitial",
    "insert_ele":["Al"],
    "reprod-opt":false,

insert_ele:(list of string) the elements used to generate point interstitial defects
reprod-opt:(boolean) whether to reproduce trajectories of interstitial defects

    "_comment": "05.surface",
    "min_slab_size":	10,
    "min_vacuum_size":	11,
    "_comment": "pert xz to work around vasp bug...",
    "pert_xz":		0.01,
    "max_miller": 2,
    "static-opt":false,
    "relax_box":false,

min_slab_size and min_vacuum_size are the minimum slab thickness and the minimum vacuum width.
pert_xz is the perturbation in the xz direction used to compute the surface energy.
max_miller (integer) is the maximum Miller index.
static-opt:(boolean) whether to compute the surface energy statically, without atomic relaxation. If false, the structure will be relaxed.
relax_box:(boolean) set to true to relax the box as well; otherwise only atom positions are relaxed.

Set up machine

When switching to a new machine, you may modify MACHINE according to the actual circumstances. Once you have finished, MACHINE can be reused for any DP-GEN task without extra effort.

An example for MACHINE is:

{
  "train": [
    {
      "machine": {
        "machine_type": "slurm",
        "hostname": "localhost",
        "port": 22,
        "username": "Angus",
        "work_path": "....../work"
      },
      "resources": {
        "numb_node": 1,
        "numb_gpu": 1,
        "task_per_node": 4,
        "partition": "AdminGPU",
        "exclude_list": [],
        "source_list": [
          "....../train_tf112_float.env"
        ],
        "module_list": [],
        "time_limit": "23:0:0",
        "qos": "data"
      },
      "deepmd_path": "....../tf1120-lowprec"
    }
  ],
  "model_devi": [
    {
      "machine": {
        "machine_type": "slurm",
        "hostname": "localhost",
        "port": 22,
        "username": "Angus",
        "work_path": "....../work"
      },
      "resources": {
        "numb_node": 1,
        "numb_gpu": 1,
        "task_per_node": 2,
        "partition": "AdminGPU",
        "exclude_list": [],
        "source_list": [
          "......./lmp_tf112_float.env"
        ],
        "module_list": [],
        "time_limit": "23:0:0",
        "qos": "data"
      },
      "command": "lmp_serial",
      "group_size": 1
    }
  ],
  "fp": [
    {
      "machine": {
        "machine_type": "slurm",
        "hostname": "localhost",
        "port": 22,
        "username": "Angus",
        "work_path": "....../work"
      },
      "resources": {
        "task_per_node": 4,
        "numb_gpu": 1,
        "exclude_list": [],
        "with_mpi": false,
        "source_list": [],
        "module_list": [
          "mpich/3.2.1-intel-2017.1",
          "vasp/5.4.4-intel-2017.1",
          "cuda/10.1"
        ],
        "time_limit": "120:0:0",
        "partition": "AdminGPU",
        "_comment": "that's All"
      },
      "command": "vasp_gpu",
      "group_size": 1
    }
  ]
}

The following table shows which keys are needed for the three types of machine: train, model_devi and fp. Each of them is a list of dicts. Each dict can be considered an independent environment for calculation.

Key	`train`	`model_devi`	`fp`
machine	NEED	NEED	NEED
resources	NEED	NEED	NEED
deepmd_path	NEED
command		NEED	NEED
group_size		NEED	NEED

The following table gives explicit descriptions of the keys in MACHINE.

Key	Type	Example	Description
deepmd_path	String	"......tf1120-lowprec"	Installation directory of DeePMD-kit 0.x, which should contain `bin lib include`.
python_path	String	"....../python3.6/bin/python"	Path of the Python interpreter with DeePMD-kit 1.x installed. This option should not be used together with `deepmd_path`.
machine	Dict		Settings of the machine for TASK.
resources	Dict		Resources needed for calculation.
# The following are keys in resources
numb_node	Integer	1	Node count required for the job
task_per_node	Integer	4	Number of CPU cores required
numb_gpu	Integer	4	Number of GPUs required
source_list	List of string	["....../vasp.env"]	Environment needed for certain job. For example, if "env" is in the list, 'source env' will be written in the script.
module_list	List of string	[ "Intel/2018", "Anaconda3"]	For example, if "Intel/2018" is in the list, "module load Intel/2018" will be written in the script.
time_limit	String (time format)	23:00:00	Maximal time permitted for the job
mem_limit	Integer	16	Maximal memory permitted to apply for the job.
with_mpi	Boolean	true	Whether to use MPI for calculation. If true and the machine type is Slurm, "srun" will be prefixed to `command` in the script.
qos	String	"bigdata"	Deciding priority, dependent on particular settings of your HPC.
# End of resources
command	String	"lmp_serial"	Executable path of software, such as `lmp_serial`, `lmp_mpi` and `vasp_gpu`, `vasp_std`, etc.
group_size	Integer	5	DP-GEN will put these jobs together in one submitting script.
allow_failure	Boolean	false	Allow the command to return a non-zero exit code.
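
The way source_list, module_list and with_mpi end up in a job script can be sketched as below. This only mirrors the rules stated in the table. The function render_script_lines is hypothetical, the order of the module and source lines is an assumption, and the scheduler header (partition, time limit and so on) is left out.

```python
def render_script_lines(resources, command, machine_type):
    """Return the environment and command lines of a job script.

    Hypothetical sketch of the rules in the table above, not DP-GEN's code.
    """
    lines = []
    for module in resources.get("module_list", []):
        lines.append(f"module load {module}")
    for env in resources.get("source_list", []):
        lines.append(f"source {env}")
    # "srun" is prefixed only when MPI is requested on a Slurm machine.
    if resources.get("with_mpi", False) and machine_type.lower() == "slurm":
        command = f"srun {command}"
    lines.append(command)
    return lines
```

For instance, a resources dict with module_list ["Intel/2018"], source_list ["env"] and with_mpi true on a Slurm machine would yield the lines "module load Intel/2018", "source env" and "srun" followed by the command.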

Troubleshooting

The most common problem is that two settings do not correspond with each other, including:
- The order of elements in type_map, mass_map and fp_pp_files.
- Size of init_data_sys and init_batch_size.
- Size of sys_configs and sys_batch_size.
- Size of sel_a and the actual number of atom types in your system.
- Index of sys_configs and sys_idx.
Please verify the directories in sys_configs. If there isn't any POSCAR for 01.model_devi in an iteration, you may have written a wrong path in sys_configs.
Check that the JSON files are correctly formatted.
In 02.fp, the total number of cores you request through task_per_node should be divisible by npar times kpar.
The number of frames of one system should be larger than batch_size and numb_test in default_training_param. It can happen that one iteration adds only a few structures, which causes an error in the next iteration's training. In this case, you may set fp_task_min larger than numb_test.
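
Several of these correspondences are easy to check mechanically before starting a run. The following sketch covers only the size and index checks listed above; check_run_param is a hypothetical helper, and npar and kpar are passed in by hand because they live in the INCAR rather than in PARAM.

```python
def check_run_param(param, task_per_node=None, npar=1, kpar=1):
    """Return a list of mismatches found in a `dpgen run` PARAM dict.

    Hypothetical helper for illustration; it is not part of DP-GEN.
    """
    problems = []

    def compare_sizes(first, second):
        a, b = param.get(first, []), param.get(second, [])
        # A batch size may be the string "auto"; only compare real lists.
        if isinstance(a, list) and isinstance(b, list) and len(a) != len(b):
            problems.append(f"{first} has {len(a)} entries but {second} has {len(b)}")

    compare_sizes("type_map", "mass_map")
    compare_sizes("init_data_sys", "init_batch_size")
    compare_sizes("sys_configs", "sys_batch_size")

    training = param.get("default_training_param", {})
    n_types = len(param.get("type_map", []))
    if "sel_a" in training and len(training["sel_a"]) != n_types:
        problems.append("sel_a does not have one entry per atom type")

    n_sys = len(param.get("sys_configs", []))
    for job_index, job in enumerate(param.get("model_devi_jobs", [])):
        for idx in job.get("sys_idx", []):
            if not 0 <= idx < n_sys:
                problems.append(f"model_devi_jobs[{job_index}]: sys_idx {idx} is out of range")

    if task_per_node is not None and task_per_node % (npar * kpar) != 0:
        problems.append("task_per_node is not divisible by npar * kpar")
    return problems
```

The PARAM file can be loaded with json.load and passed to this function; an empty list means none of these particular mismatches was found.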

License

The project dpgen is licensed under GNU LGPLv3.0.
