OpenFAST chkp corrupted on frontier #984
Comments
In nalu-wind we rely on the openfast-cpp interface, which also loops over indices starting at 0. We'll have to dig deeper into OpenFAST to see if this offset is handled inside the OpenFAST data structures.
Yeah that's interesting. Can you throw a print statement in that if condition I mentioned and see if it ever hits that
I'm not in a position to test this at the moment. I can put it on my backlog though. This would be more of an OpenFAST core issue than a nalu-wind issue.
I tried a naive approach to get rid of the off-by-one error. Basically I replaced this line
with
auto my_tid_local = fi.tid_local+1;
fast_func(FAST_CreateCheckpoint, &my_tid_local, rst_file);
However, that also resulted in segfaults when writing out the chkp files. So in the end I just commented out the if else check in Lawrence
Yeah, that's not the right fix. Turns out this is a bit messed up. You need to increment
The definition of the function: https://github.com/OpenFAST/openfast/blob/main/modules/openfast-library/src/FAST_Subs.f90#L37 has
The call site: https://github.com/OpenFAST/openfast/blob/main/modules/openfast-library/src/FAST_Library.f90#L144 passes in
I am in conversation with @andrew-platt for other fixes.
Tracking this here: OpenFAST/openfast#2064
I did a little digging into this. There is a discrepancy between how we handle the turbine ID with FAST.Farm and with the cpp interface. With FAST.Farm, we index using the Fortran index start of 1 for the turbine array between
As noted above, since the closing of the checkpoint file assumes a start index of 1, it is never closed with the cpp interface. To fix this, I think we have three options.
I'm inclined to pursue option 3, as this will preserve the existing numbering systems for cpp and FAST.Farm. I don't think it will be all that difficult to do in OpenFAST. I'll post here when I have a proposed solution in place.
Hi @andrew-platt, thank you for looking into this! I appreciate you digging through this. Option 3 sounds good. I do still think (as I noted in the openfast issue) that there seems to be an inconsistency in the way
Proposed solution: OpenFAST/openfast#2097
Confirmed that this is fixed with OpenFAST/openfast#2097
OpenFAST checkpoint files aren't closed properly. This is an issue on any machine we use but only results in corrupted chkp files on Frontier (that we've seen so far).
The reason for this is that OpenFAST assumes Fortran-style turbine ID numbering (starting at 1), but we interface to the library using C numbering (starting at 0). So the following check (here):
never gets hit and the chkp file never gets closed.
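To make the failure mode concrete, here is a minimal C++ sketch of how a 0-based caller can miss a close condition written for 1-based IDs. It is not OpenFAST code: `CheckpointWriter`, `write`, and `left_open_after_loop` are hypothetical names, and the close condition (close once the ID equals the turbine count) is an assumption made purely for illustration.

```cpp
// Hypothetical stand-in for a library-side checkpoint writer that assumes
// Fortran-style (1-based) turbine IDs. This is NOT the OpenFAST API.
struct CheckpointWriter {
    bool is_open = false;

    // Write one turbine's data; close the file after the "last" turbine.
    void write(int turbine_id, int n_turbines) {
        if (!is_open) {
            is_open = true;  // opened lazily on the first write
        }
        // Assumed close condition: only true for 1-based IDs, where the
        // last turbine has ID == n_turbines.
        if (turbine_id == n_turbines) {
            is_open = false;
        }
    }
};

// Loop over all turbines starting from first_id (0 for C-style callers,
// 1 for Fortran-style callers) and report whether the file was left open.
bool left_open_after_loop(int n_turbines, int first_id) {
    CheckpointWriter writer;
    for (int i = 0; i < n_turbines; ++i) {
        writer.write(first_id + i, n_turbines);
    }
    return writer.is_open;
}
```

With `first_id = 0` the IDs run from 0 to `n_turbines - 1`, so the close branch is never taken and the file stays open. With `first_id = 1` the last write closes it.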
The fix I am thinking about right now is to send a "fortran id" to OpenFAST from amr-wind. But this causes a segfault somewhere else. Still need to track that down.
@psakievich I am not sure how/if this affects the way Nalu-Wind interfaces with OpenFAST as well.