'decomp' test does not thoroughly check the return code of 'bfbcomp.csh' #524

phil-blain · 2020-10-21T17:46:31Z

While investigating a failing 'decomp' test in a base_suite run from @JFLemieux73 for an upcoming PR, I noticed that the error checking in the 'decomp' test script could use some hardening. Here is the part of the script that runs the comparebfb.csh script:

CICE/configuration/scripts/tests/test_decomp.script

Lines 74 to 84 in 0ac4cd9

    
      ${ICE_CASEDIR}/casescripts/comparebfb.csh ${base_dir} ${test_dir}
      set bfbstatus = $status
      if ($bfbstatus == 0) then
        set grade = PASS
        echo "bfb baseline and test dataset are identical"
      else
        set grade = FAIL
        echo "bfbcomp and test dataset are different"
      endif
      echo "$grade ${ICE_TESTNAME}_${decomp} bfbcomp ${base_case}" >> ${ICE_CASEDIR}/test_output
    endif

We just check if the script returns "0", and if not mark the test as FAIL with the message "bfbcomp and test dataset are different".

However, we should check the return code of the 'comparebfb.csh' script more thoroughly, so that the reason for a failure is not misdiagnosed, since the script can return different failure codes in case of errors:

CICE/configuration/scripts/tests/comparebfb.csh

Lines 13 to 17 in 0ac4cd9

    
    # Return Codes (depends on quality of error checking)
    #  0 = pass
    #  1 = fail
    #  2 = missing data
    #  9 = error

(as is done in baseline.script).

@apcraig


phil-blain · 2020-10-22T19:07:58Z

I also realized that the decomp test is not re-runnable, because the restart.${decomp} folders are not cleaned up at the beginning of the test script.

In fact this was why the test was failing in the first place; JF had mistakenly launched the suite several times, and the third time the mv -f step failed because the destination was not empty (a little hard to explain but you'll see what I mean if you relaunch this test several times)...

apcraig · 2020-10-22T19:20:06Z

For the major tests (restart, smoke, etc.), I have tried to build tests that are rerunnable. That is not always easy, and it is harder with the decomp test. I always thought rerunnable tests were a feature, not a requirement, but maybe we should change that.

My general strategy is not to rerun. Generally, I'll run a test suite and then, if something fails, I'll set up a standalone case that duplicates the error. I've found individual cases are easier to debug than a test, any test. Once the error is fixed, I then rerun the test from scratch to confirm it passes. We could get rid of the decomp test and change it to 12 separate tests, and then the problem would be fixed. There are pluses and minuses to different strategies.

Having said all that, I'll try to have a look at the rerunnability issues with the decomp test and see if I can fix that.

phil-blain · 2020-10-22T19:30:59Z

I agree, it's more of a nice-to-have to be re-runnable.

I tend to use the same strategy.

In the case of the decomp test, I think just improving the error checking should be sufficient.

* Fix minor issues in documentation, key_ CPPs, bfbcomp return codes
  quickstart documentation points to porting (#529)
  check additional return codes in the bfbcomp tool (#524)
  fix undefined variable in ice_init output (#520)
  add documentation about aliases (#523)
  remove key_ CPPS, can be handled by passing communicator thru interface (#498)
* update alias documentation

apcraig · 2020-11-24T01:58:55Z

The bfbcomp check has been updated in #532. Rerunnability of the decomp test is an outstanding low-priority issue, but I would propose we defer it and close this for now. If anyone wants to reopen, please do.

phil-blain added Scripts Testing labels Oct 21, 2020

apcraig self-assigned this Oct 22, 2020

apcraig mentioned this issue Nov 21, 2020

Fix minor issues in documentation, key_ CPPs, bfbcomp return codes #532

Merged

16 tasks

apcraig closed this as completed Nov 24, 2020

phil-blain mentioned this issue May 21, 2021

'decomp' test is not re-runnable #601

Closed
