PhParser: allow for pattern initialization #1034

Open
wants to merge 4 commits into
base: main
from


bastonero
Collaborator

A ph.x calculation that will be parallelized should be initialized by setting the input parameters start_irr and last_irr to 0. This lets the program exit cleanly and avoids waiting for the wavefunctions to be rewritten, which can be a long and I/O-intensive operation that is not suited to initialization runs.

The parser is adjusted to account for this option because, for some reason, the line containing JOB DONE is not printed in such cases. A simple specialized parser is also added to store the number of q-points and their values, which can later be used to parallelize over q-points by specifying start_q and last_q.

PS: this also avoids the creation of the aiida.EXIT file, which triggers the program to stop.
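
For illustration, here is a minimal sketch of what such an initialization input could look like. The `INPUTPH` namelist and the `start_irr`/`last_irr` keywords come from the ph.x input documentation; the helper `render_namelist` and all other values (`prefix`, `outdir`, the q-grid) are placeholders assumed for this example, not what the plugin actually writes.

```python
def render_namelist(name, params):
    """Render a dict as a Fortran namelist block (hypothetical helper)."""

    def fmt(value):
        # bool must be tested before other types, since bool is a subclass of int
        if isinstance(value, bool):
            return '.true.' if value else '.false.'
        if isinstance(value, str):
            return f"'{value}'"
        return str(value)

    lines = [f'&{name}']
    lines += [f'  {key} = {fmt(value)}' for key, value in params.items()]
    lines.append('/')
    return '\n'.join(lines)


# Placeholder values, assumed for illustration only.
init_parameters = {
    'prefix': 'aiida',
    'outdir': './out/',
    'ldisp': True,
    'nq1': 2,
    'nq2': 2,
    'nq3': 2,
    # The two settings that turn the run into a pattern initialization.
    'start_irr': 0,
    'last_irr': 0,
}

print(render_namelist('INPUTPH', init_parameters))
```

With both keywords set to 0, ph.x should only produce the displacement patterns, as described in the documentation quoted below.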

Note: this is also the approach recommended in the official documentation. The relevant statements are quoted here in case the link breaks in the future:

NB: The program ph.x writes on the tmp_dir/_ph0/{prefix}.phsave directory
a file for each representation of each q point. This file is called
dynmat.#iq.#irr.xml where #iq is the number of the q point and #irr
is the number of the representation. These files contain the
contribution to the dynamical matrix of the irr representation for the
iq point.

If [recover](https://www.quantum-espresso.org/Doc/INPUT_PH.html#recover)=.true. ph.x does not recalculate the
representations already saved in the tmp_dir/_ph0/{prefix}.phsave
directory.  Moreover ph.x writes on the files patterns.#iq.xml in the
tmp_dir/_ph0/{prefix}.phsave directory the displacement patterns that it
is using. If [recover](https://www.quantum-espresso.org/Doc/INPUT_PH.html#recover)=.true. ph.x does not recalculate the
displacement patterns found in the tmp_dir/_ph0/{prefix}.phsave directory.

This mechanism allows:

  1) To recover part of the ph.x calculation even if the recover file
     or files are corrupted. You just remove the _ph0/{prefix}.recover
     files from the tmp_dir directory. You can also remove all the _ph0
     files and keep only the _ph0/{prefix}.phsave directory.

  2) To split a phonon calculation into several jobs for different
     machines (or set of nodes). Each machine calculates a subset of
     the representations and saves its dynmat.#iq.#irr.xml files on
     its tmp_dir/_ph0/{prefix}.phsave directory. Then you collect all the
     dynmat.#iq.#irr.xml files in one directory and run ph.x to
     collect all the dynamical matrices and diagonalize them.

NB: To split the q points in different machines, use the input
variables start_q and last_q. To split the irreducible
representations, use the input variables [start_irr](https://www.quantum-espresso.org/Doc/INPUT_PH.html#start_irr), [last_irr](https://www.quantum-espresso.org/Doc/INPUT_PH.html#last_irr). Please
note that different machines will use, in general, different
displacement patterns and it is not possible to recollect partial
dynamical matrices generated with different displacement patterns.  A
calculation split into different machines will run as follows: A
preparatory run of ph.x with [start_irr](https://www.quantum-espresso.org/Doc/INPUT_PH.html#start_irr)=0, [last_irr](https://www.quantum-espresso.org/Doc/INPUT_PH.html#last_irr)=0 produces the sets
of displacement patterns and save them on the patterns.#iq.xml files.
These files are copied in all the tmp_dir/_ph0/{prefix}.phsave directories
of the machines where you plan to run ph.x. ph.x is run in different
machines with complementary sets of start_q, last_q, [start_irr](https://www.quantum-espresso.org/Doc/INPUT_PH.html#start_irr) and
[last_irr](https://www.quantum-espresso.org/Doc/INPUT_PH.html#last_irr) variables.  All the files dynmat.#iq.#irr.xml are
collected on a single tmp_dir/_ph0/{prefix}.phsave directory (remember to
collect also dynmat.#iq.0.xml).  A final run of ph.x in this
machine collects all the data contained in the files and diagonalizes
the dynamical matrices.  This is done requesting a complete dispersion
calculation without using start_q, last_q, [start_irr](https://www.quantum-espresso.org/Doc/INPUT_PH.html#start_irr), or [last_irr](https://www.quantum-espresso.org/Doc/INPUT_PH.html#last_irr).
See an example in examples/GRID_example.

On parallel machines the q point and the irreps calculations can be split
automatically using the -nimage flag. See the phonon user guide for further
information.
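
As a rough sketch of how the number of q-points from the initialization run could be turned into complementary `start_q`/`last_q` ranges: the function name `split_q_points` and the choice to spread q-points as evenly as possible are assumptions made for this example, not part of the PR.

```python
def split_q_points(number_of_qpoints, number_of_jobs):
    """Return 1-based, inclusive (start_q, last_q) pairs covering all q-points."""
    if number_of_qpoints < 1 or number_of_jobs < 1:
        raise ValueError('both arguments must be positive')

    # Never create more jobs than there are q-points.
    number_of_jobs = min(number_of_jobs, number_of_qpoints)
    base, extra = divmod(number_of_qpoints, number_of_jobs)

    ranges = []
    start = 1
    for job in range(number_of_jobs):
        # The first `extra` jobs take one additional q-point each.
        size = base + (1 if job < extra else 0)
        ranges.append((start, start + size - 1))
        start += size
    return ranges


for start_q, last_q in split_q_points(8, 3):
    print({'start_q': start_q, 'last_q': last_q})
```

Each pair would go into the input of one ph.x run, after the patterns.#iq.xml files from the preparatory run have been copied into that run's phsave directory.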

@bastonero bastonero force-pushed the ph-initialization branch from 4df9fb0 to 5c396c1 Compare June 18, 2024 17:08
@bastonero bastonero requested a review from sphuber June 18, 2024 17:08
Comment on lines 48 to 50
if parameters:
    self.out('output_parameters', orm.Dict(parameters))
    return
Contributor

In the case that parameters is empty, wouldn't that reasonably correspond to some kind of error? Or are you intentionally letting it continue the parsing in that case to find a generic error?

Collaborator Author

Yeah, in principle ph.x can still throw some errors, say if something was wrong with some of the files.

Comment on lines 470 to 472
q_points = [list(map(float, coord)) for coord in coords]

parameters.update({'q_points': q_points})
Contributor

Would it make sense perhaps to return this as an actual KpointsData instead of a Dict?

Collaborator Author

Yes, I thought about that, and maybe it would. On the other hand, I was even wondering whether it's worth parsing it at all. If one has a grid, one can/should use start_q/last_q; if you provide a KpointsData, this would basically return the same node. So, I don't know. Any strong opinion? What about maybe adding some "post-process" parsing via tools?

Contributor

I am not quite sure I fully understand the use case of this initialization run. But if for the common use case a user would actually want to use the parsed grid as an input for the next calculation (i.e. they are going to turn it into a KpointsData anyway) then we might as well have the parser do it here.

If, instead, the kpoints won't be used as is, but in parts and so the KpointsData would have to be transformed, then you might as well just leave it as is.

Collaborator Author

This initialization run is the proper initialization for ph.x. It avoids the use of the .EXIT file, which would still make ph.x rewrite the wavefunctions for nothing and hence waste time (to give an idea, for an 18-atom system this takes ~20 min of wasted node hours). The key ingredient here is just to determine the number of q-points; the next runs would then be parallelized not over specific q-points but using start_q and last_q instead. Parsing the q-points, either as a dictionary in the output parameters or as a KpointsData, is just for completeness and not really meant to be used (at least, not in the way I am thinking of using this initialization run). I could simply remove it at this point.

if parameters['number_of_qpoints'] != len(parameters['q_points']):
    return parameters, False

return parameters, True
Contributor

I am not a big fan of communicating errors through return values, especially not if that means having to turn the return value into a tuple. Since you only really use the parameters result in case there isn't a problem, what is the problem with just raising an exception and catching that in the caller?
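
A minimal sketch of what that suggestion could look like. The names `QpointsParsingError` and `parse_initialization_qpoints` are hypothetical, and the regular expressions assume that the ph.x stdout prints a `(   N q-points):` header followed by one row per q-point with an index and three coordinates; the actual parser in the PR may differ.

```python
import re


class QpointsParsingError(Exception):
    """Raised when the q-point list cannot be parsed consistently (hypothetical)."""


def parse_initialization_qpoints(stdout):
    """Parse the number of q-points and their coordinates from ph.x stdout."""
    match = re.search(r'\(\s*(\d+)\s*q-points\)', stdout)
    if match is None:
        raise QpointsParsingError('number of q-points not found')
    number_of_qpoints = int(match.group(1))

    # Rows are assumed to hold an integer index followed by three floats.
    row = re.compile(r'^[ \t]*\d+((?:[ \t]+[-+]?\d+\.\d+){3})[ \t]*$', re.MULTILINE)
    q_points = [list(map(float, m.group(1).split())) for m in row.finditer(stdout)]

    if number_of_qpoints != len(q_points):
        raise QpointsParsingError(
            f'expected {number_of_qpoints} q-points, parsed {len(q_points)}'
        )
    return {'number_of_qpoints': number_of_qpoints, 'q_points': q_points}


# Made-up sample in the assumed output format.
sample = """
     Dynamical matrices for ( 2, 2, 2)  uniform grid of q-points
     (   3 q-points):
       N         xq(1)         xq(2)         xq(3)
       1   0.000000000   0.000000000   0.000000000
       2   0.500000000  -0.500000000   0.500000000
       3   0.000000000  -1.000000000   0.000000000
"""

try:
    parameters = parse_initialization_qpoints(sample)
except QpointsParsingError as exc:
    print(f'parsing failed: {exc}')
else:
    print(parameters['number_of_qpoints'])
```

The caller then only handles the result on success and can map the exception to whatever exit code is appropriate, instead of unpacking a tuple.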

@bastonero bastonero requested a review from sphuber June 26, 2024 22:06
Contributor

@sphuber sphuber left a comment

Thanks @bastonero. Just the one test is failing. If you fix that, I will merge.

@bastonero bastonero added the pr/blocked PR is blocked by another PR that should be merged first label Jun 28, 2024
Labels
pr/blocked PR is blocked by another PR that should be merged first priority/important topic/parsers