electronWannierTransport calculations crash #225
Hi Siyu, glad to hear you're using the code. Let's see if we can get to the bottom of this. First, can you confirm that you have the most recent version of the code (just run git pull to be sure), and give me a little info about the resources you used to run this calculation? Also, does this happen if you use a smaller k-point mesh? I just want to make sure nothing is overflowing or running out of memory first. Thanks,
Hi Jenny, thank you! I can confirm that I am using the most recent version of the code.
I can also confirm that the calculation finishes properly with a smaller k-mesh. For example, with kMesh = [15,27,39], the job completes on 5 compute nodes (more specifically, 15 MPI processes x 18 OpenMP threads). I am using a cluster in which each node has 56 CPUs and 384 GiB of RAM, that is, 6840 MB per CPU. However, with kMesh = [20,36,52], Phoebe always crashes with the error I reported, no matter how much memory I give it. I have tried launching the job on 10 nodes (10 MPI processes x 56 OpenMP threads, maximizing the memory available to each MPI process), but it still does not work. Happy to provide more information if needed. Best wishes
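As a quick sanity check on the numbers in this report, the sketch below counts the points in each full k-mesh (assuming a full uniform grid with no symmetry reduction) and the RAM available per MPI rank (assuming ranks are spread evenly across nodes with 384 GiB each). It is plain arithmetic only; it makes no claim about how Phoebe itself distributes memory across ranks.

```python
from math import prod

def num_kpoints(k_mesh):
    """Number of points in a full uniform grid (assumes no symmetry reduction)."""
    return prod(k_mesh)

def mem_per_rank_gib(nodes, ranks, mem_per_node_gib=384.0):
    """RAM per MPI rank, assuming ranks are spread evenly over nodes and each
    rank's OpenMP threads share that rank's memory."""
    return nodes * mem_per_node_gib / ranks

small = num_kpoints([15, 27, 39])
large = num_kpoints([20, 36, 52])
print(small, large, round(large / small, 2))          # 15795 37440 2.37
print(mem_per_rank_gib(5, 15), mem_per_rank_gib(10, 10))  # 128.0 384.0
```

So the larger mesh has roughly 2.4 times as many points, while the 10-node launch gives each rank three times the memory of the 5-node run. That the job still failed hints that the problem is not simply total memory.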
Hi Siyu, Indeed, the code should be able to scale that far in kMesh (and well beyond; I've been able to run kMesh = [350,350,350] for some materials). The memory footprint of your job also doesn't seem very large. I have some ideas about what might be happening, and it's likely to be a very minor fix on my part; I can probably fix this in the next day or so. I'm sure you don't want to share your data broadly, but if you're willing to let me look at your files, we can communicate by email. My email address is listed on my GitHub account page, under my name and photo; please write and I'll provide a place for you to upload the data. Thanks for reporting this; we appreciate it when users let us know about these things.
Hi Siyu, Would you mind checking out the branch named activeBandsVelocitiesOverflowBug, then going into your build directory and running "make" again? Let me know if this somehow does not fix your issue, but I was able to reproduce the problem and see it fixed on my machine. It was basically as I suspected: your system is so big that you managed to overflow an integer argument storing the number of band velocities for the phonon band structure :). I just had to change its type from int to size_t. There is a chance you could hit other errors like this simply because you have so many phonon bands. Let me know if something else fails; these issues are usually quick to find (and ideally fix). Best,
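To illustrate the kind of bug described here, the sketch below shows how a count built as a product of moderate sizes can exceed the largest 32-bit signed int, and what value a wrapped 32-bit int would hold instead. The layout (a num_bands x num_bands x 3 block per point) and the sizes 40,000 points and 150 bands are hypothetical illustration values, not Phoebe's actual storage scheme or this user's system. Note that in C++ signed overflow is formally undefined behavior; two's-complement wraparound is merely what commonly happens in practice, whereas a 64-bit size_t holds the value without trouble.

```python
INT32_MAX = 2**31 - 1

def wrap_int32(n):
    """Emulate storing a non-negative integer in a 32-bit signed int
    using two's-complement wraparound."""
    n &= 0xFFFFFFFF
    return n - 2**32 if n > INT32_MAX else n

def velocity_count(num_points, num_bands):
    # Hypothetical layout: one num_bands x num_bands x 3 block per point.
    # Illustration only; not Phoebe's real data structure.
    return num_points * num_bands * num_bands * 3

n = velocity_count(40_000, 150)
print(n, n > INT32_MAX, wrap_int32(n))  # 2700000000 True -1594967296
```

A negative or garbage size like this, passed on to an allocation or loop bound, is enough to crash a run even when plenty of memory is available.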
Hi Jenny, I did as you suggested. However, Phoebe now throws segmentation faults:
Does your Phoebe run get as far as the "started computing scattering matrix" output? Mine still gets stuck at "Computing phonon band structure. Allocating 0.0502 GB (per MPI process)."
Hi Siyu, OK, I was able to reproduce this. For me it's not a seg fault but a very reasonable out-of-memory error. Getting around it properly takes a bit more work, but it could possibly be done. A quick workaround may be to reduce the size of the population window, like this:
However, this can be dangerous if you want to use the Wigner correction, because contributions can then come from far away from the Fermi energy (I think this is noted in the tutorial as well), and in general you should also converge with respect to the window population limit. It would still give you an idea of whether the RTA result is already converged here. I think there is a workaround for this that I've wanted to implement anyway. Let me investigate how hard that change is and get back to you in about a day. Jenny
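To see why tightening the population window saves memory, and why it can cut off contributions far from the Fermi energy, the sketch below computes the energy half-width around the chemical potential inside which the electronic population factor f(1 - f) exceeds a given limit. It assumes the window keeps a state when f(1 - f) is above the window population limit; check the Phoebe documentation for the exact criterion it uses. The 300 K temperature and the limit values are example inputs, not taken from this thread. It uses the identity f(1 - f) = 1 / (4 cosh^2(x/2)) with x = (E - mu) / (k_B T).

```python
import math

K_B_EV = 8.617333262e-5  # Boltzmann constant in eV/K

def fermi(x):
    """Fermi-Dirac occupation as a function of x = (E - mu) / (k_B T)."""
    return 1.0 / (math.exp(x) + 1.0)

def population_factor(x):
    """f(1 - f), equal to -k_B T * df/dE; it peaks at 1/4 when E = mu."""
    f = fermi(x)
    return f * (1.0 - f)

def window_half_width_ev(limit, temperature_k):
    """Half-width in eV around mu where f(1 - f) > limit (needs 0 < limit < 1/4).

    Solves 1 / (4 cosh^2(x / 2)) = limit for x.
    """
    x_max = 2.0 * math.acosh(1.0 / (2.0 * math.sqrt(limit)))
    return x_max * K_B_EV * temperature_k

# A larger limit gives a narrower window, hence fewer states to store.
for limit in (1e-8, 1e-6, 1e-4):
    print(limit, round(window_half_width_ev(limit, 300.0), 3))
```

Because the window width grows only logarithmically as the limit is lowered, checking a few limits like this is a cheap way to judge how far results are from convergence with respect to that parameter.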
Hi, I am running an electron transport calculation with Phoebe, but unfortunately it crashes.
Here is the screen output before the crash:
After this, it crashed with the following error:
Do you have any idea what is causing this error?