Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Initialize ED RtHS to zeros #2317

Merged
merged 5 commits into from
Jul 12, 2024

Conversation

andrew-platt
Copy link
Collaborator

Ready to merge

Feature or improvement description
The RtHS derived type in ElastoDyn contains many arrays, none of which were initialized to zero during init. Some of these arrays end up getting used without initializing, so whatever leftover stuff was in memory would then be accessed as values in some of these arrays.

We discovered this issue while converting the parsing of the blade file to use the FileInfo data structure and parsing. If a file of 64 lines long was loaded with the ProcessCom command then discarded without use during the ReadBladeFile routine, the 5MW_Land_Linear_Aero case would give the expected results (tested with gcc 12.3.0, double precision release). However, if more lines were present, whatever was written in those memory locations would end up in an RtHS array. We noticed this as incorrect acceleration values for blade 1 nodes in the y_op array with values upwards of 1e+294, which caused issues with linearization.

If line 65 contained r, the resulting frequencies would be:

46:               Frequency (Hz)                        Damping (%)
46:            ----------------------------         ----------------------------
46:             Ref              New                 Ref              New
46:            0.321115263      0.377324016         0.004761374      0.935368246
46:            0.336468595      0.764882101         0.080286934      0.642368682
46:            0.724206648      0.791899953         0.651774323      0.092775252
46:            0.749265197      1.080146181         0.656903895      0.008118302
46:            0.749326080      1.904270051         0.646239398      0.062853626
46:            1.083362496      1.961998652         0.012309791      0.160466846
46:            1.103154529      1.980423436         0.014866192      0.180651456
46:            1.701439646      2.395306777         0.023216699     -0.521713667
46:            1.928532353      2.618982440         0.180245471      0.577067874
46:            1.945246335      3.155262530         0.177738889      0.021554264

If line 65 contained rrrrrrrr, the resulting frequencies would be off as follows:

46:  Case: 5MW_Land_Linear_Aero:1.lin: :
46:               Frequency (Hz)                        Damping (%)
46:            ----------------------------         ----------------------------
46:             Ref              New                 Ref              New
46:            0.321115263     83.209828452         0.004761374     -0.826506890
46:            0.336468595   8662506957.551628113         0.080286934     -0.000000256
46:            0.724206648   574739393926509749142693497935502516420608.000000000         0.651774323      0.103782929
46:            0.749265197   37471977929674705460798753326446458938499006464.000000000         0.656903895      0.044189924
46:            0.749326080   1596353572817605839899136123782313993697574780928.000000000         0.646239398     -0.221361357
46:            1.083362496   6454645867183894278226561596858301745340078685046177792.000000000         0.012309791     -0.058691704
46:            1.103154529   44775087308082026458361647148189744859863545122018820096.000000000         0.014866192      0.681102728
46:            1.701439646   74380807686665623026286669625369691668842911394732965888.000000000         0.023216699     -0.789905619
46:            1.928532353   301056255025183220231506577309715866922450175489959002112.000000000         0.180245471      0.453293460
46:            1.945246335   3180994479181899036819026195066465440447406087302753026048.000000000         0.177738889      0.017797110

Explicitly setting all RtHS arrays to 0.0_ReKi during allocation would get rid of this issue.

@deslaughter also found an issue in the ProcessCom routine that caused an index issue in Intel compilations.

Related issue, if one exists
There may have been some phantom issues caused by this.

Impacted areas of the software
ED calculations of blade 1 acceleration values, and perhaps other issues.

andrew-platt and others added 4 commits July 10, 2024 17:15
Observation -- read file ./tmp.dat using ProcessComFile inside the ReadBladeFile, and don't keep the data
 - at 64 lines, frequencies are ok and no huge acceleration values in OP
 - at 65 lines, something goes wrong and the frequencies change and huge accelerations in OP show up

Possibility 1:
Maybe the FileInfo%Lines is getting allocated to the stack even thought it should be on the heap at 65 lines?
This would result in clobbering something behind it on the stack that was already there?  Not sure that logic holds

Possibility 2:
At this size, it takes up enough space on the heap that when deallocated, something uninitialized ends up using that memory.  Problems with this include the "why has this never been observed before" question.
…ddressing or null pointer dereferencing

- CountWords would scan past end of line
- ProcessComFile Cleanup would dereference a null pointer when getting NextFile pointer
- WrScr would hide a character string allocation which caused a memory issue in ifx
Some of these values were not getting zeroed out.  This was occasionally leading to spurious root acceleration values when memory that was previously occupied by something else non-zero was used.
If a file of more than 64 lines was read in, then discarded, whatever was on line 65 or later could end up in arrays used by RtHS (gcc 12.3.0, double precision release).  This would show up in linearization with case `5MW_Land_Linear_Aero`.
@andrew-platt
Copy link
Collaborator Author

This should be backported to 3.5.4

Also moved the zeroing into the `else` part of the error checking instead of after -- we could potentially have triggered memory violations otherwise and not gotten our error back.
@andrew-platt andrew-platt merged commit 1a90606 into OpenFAST:dev Jul 12, 2024
21 checks passed
@andrew-platt andrew-platt mentioned this pull request Jul 12, 2024
@bjonkman
Copy link
Contributor

Do you know which arrays were using uninitialized values? FAST 7 used to be built with a compiler option to initialize all variables to 0 (or false), but we were trying to make sure we explicitly initialized variables that need it. This should also give some performance improvements on variables that don't need to be initialized before their use.

NextFile => CurrFile%Next
DO
DO WHILE(ASSOCIATED(currFile))
NextFile => CurrFile%Next
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am hopeful that this fixes some of the weird behavior that was sometimes causing regression tests to fail on file reads when the OpenFAST input file reading was changed to use this routine!

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I hope it improves the stability. This is definitely in the realm of undefined behavior, so it could have been doing anything when CurrFile was not associated

@andrew-platt
Copy link
Collaborator Author

I don't know exactly which arrays weren't getting initialized. I had started looking into that, but then decided to just take brute force approach to make sure everything got initialized.

@bjonkman
Copy link
Contributor

I don't know exactly which arrays weren't getting initialized. I had started looking into that, but then decided to just take brute force approach to make sure everything got initialized.

I'm just a little concerned that some of those variables may need to be explicitly zeroed out in the RHS routine so they don't retain values between time steps. If the issue is in an initialization routine that's a little different.

@andrew-platt
Copy link
Collaborator Author

That's a very good point! I had been assuming it was an initialization issue only, but since it is in the RHS they may need explicit zeroing at every timestep as well.

@andrew-platt andrew-platt deleted the b/FileInfoType_Memory_Clobber branch July 16, 2024 16:22
@andrew-platt andrew-platt mentioned this pull request Oct 21, 2024
28 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants