Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

'decomp' test does not thoroughly check the return code of 'bfbcomp.csh' #524

Closed
phil-blain opened this issue Oct 21, 2020 · 4 comments
Closed
Assignees

Comments

@phil-blain
Copy link
Member

While investigating a failing 'decomp' test in a base_suite run from @JFLemieux73 for an upcoming PR, I noticed that the error-checking in the 'decomp' test script could use some hardening. Here is the part of the script that runs the bfbcompare.csh script:

${ICE_CASEDIR}/casescripts/comparebfb.csh ${base_dir} ${test_dir}
set bfbstatus = $status
if ($bfbstatus == 0) then
set grade = PASS
echo "bfb baseline and test dataset are identical"
else
set grade = FAIL
echo "bfbcomp and test dataset are different"
endif
echo "$grade ${ICE_TESTNAME}_${decomp} bfbcomp ${base_case}" >> ${ICE_CASEDIR}/test_output
endif

We just check if the script returns "0", and if not mark the test as FAIL with the message "bfbcomp and test dataset are different".

However, we should be checking the return code of the 'bfbcompare.csh' script more thoroughly, so that the reason for the failure is not mis-diagnosed, since the script can return different failure codes in case of errors:

# Return Codes (depends on quality of error checking)
# 0 = pass
# 1 = fail
# 2 = missing data
# 9 = error

(as is done in baseline.script ).

@apcraig

@phil-blain
Copy link
Member Author

I also realized that the decomp test is not re-runnable, because the restart.${decomp} folders are not cleaned up at the beginning of the test script.

In fact this was why the test was failing in the first place; JF had mistakenly launched the suite several times, and the third time the mv -f step failed because the destination was not empty (a little hard to explain but you'll see what I mean if you relaunch this test several times)...

@apcraig
Copy link
Contributor

apcraig commented Oct 22, 2020

For the major tests (restart, smoke, etc), I have tried to build tests that were rerunable. That is not always easy and not as easy with the decomp test. I always though rerunable tests were a feature, not a requirement, but maybe we should change that.

My general strategy is not to rerun. Generally, I'll run a test suite and then if something fails, I'll setup a standalone case that duplicates the error. I've found individual cases are easier to debug than a test, any test. Once the error is fixed, I then rerun the test from scratch to confirm it passes. We could get rid of the decomp test and change it to 12 separate tests and then the problem would be fixed. There are pluses and minuses to different strategies.

Having said all that, I'll try to have a look at the rerunability issues with the decomp test and see if I can fix that.

@phil-blain
Copy link
Member Author

I agree, it's more of a nice-to-have to be re-runnable.

I tend to use the same strategy.

In the case of the decomp test, I think just improving the error checking should be sufficient.

apcraig added a commit to apcraig/CICE that referenced this issue Nov 21, 2020
quickstart documentation points to porting (CICE-Consortium#529)
check additional return codes in the bfbcomp tool (CICE-Consortium#524)
fix undefined variable in ice_init output (CICE-Consortium#520)
add documentation about aliases (CICE-Consortium#523)
remove key_ CPPS, can be handled by passing communicator thru interface (CICE-Consortium#498)
apcraig added a commit that referenced this issue Nov 24, 2020
)

* Fix minor issues in documentation, key_ CPPs, bfbcomp return codes

quickstart documentation points to porting (#529)
check additional return codes in the bfbcomp tool (#524)
fix undefined variable in ice_init output (#520)
add documentation about aliases (#523)
remove key_ CPPS, can be handled by passing communicator thru interface (#498)

* update alias documentation
@apcraig
Copy link
Contributor

apcraig commented Nov 24, 2020

The bfbcomp check has been updated in #532. Rerunability of the decomp test is an outstanding low priority issue, but would propose we defer and close this for now. If anyone wants to reopen, please do.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants