Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

create a shr_log_error method #63

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

jedwards4b
Copy link
Contributor

After discussion ESCOMP/CMEPS#534

We decided to just print the error to both the PET log and the component log and then return up the stack with errors.

@jedwards4b jedwards4b requested a review from billsacks January 31, 2025 16:57
@jedwards4b jedwards4b self-assigned this Jan 31, 2025
Copy link
Member

@billsacks billsacks left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for these changes, @jedwards4b ! I'm generally happy with this. Other than my one comment asking for a better comment, the main thing I'd like to ask for is some more clarity on the usage of rc in these routines. It may be enough to just add some documentation of this – particularly, that shr_log_error sets rc to ESMF_FAILURE – but I'll share some thoughts that made this confusing to me and might suggest some rework of this. I'd be happy to talk more about this if it would help.

Thought 1: It looks like the original usage of rc to shr_abort_abort was to provide an error code to be passed to mpi_abort. Now, though, it seems to be a mix of things: it's possible that some uses of this routine still are using it for that original behavior, but now its usage in shr_log_error suggests that it should be interpreted as an ESMF return code. Maybe it's okay to have those two different uses, particularly since ESMF_SUCCESS is 0, but I think it would at least help to document the intended meaning of rc in both shr_abort_abort and shr_log_error.

Thought 2: If part of the idea here is to replace previous code that wrote error messages to ESMF PET files, and if we have an actual ESMF error code in these cases, then it could be better to use ESMF_LogSetError or ESMF_LogFoundError instead of ESMF_LogWrite, since these will translate the integer error code into a meaningful error message. However, this will only work if the rc being operated on is an ESMF error code – not, for example, if it's the return code from a netcdf call (though ESMF_LogFoundNetCDFError could be used for that purpose).

So I wonder if it would make sense to change this interface to have multiple separate rc arguments:

  • Optional input, esmf_rc: if present, this gives the ESMF return code that indicates the error; it will be printed via ESMF_LogSetError or ESMF_LogFoundError (I'm not clear on the difference between those two at this point)
  • Optional input, netcdf_rc: if present, this gives the NetCDF return code that indicates the error; it will be printed via ESMF_LogFoundNetCDFError
  • Optional output, rc: This will be set to ESMF_FAILURE; this is just a convienence so that you don't need to remember to have a separate line setting rc = ESMF_FAILURE before returning. (Maybe unnecessary.)

I don't feel very certain about these thoughts, so definitely open to discussion (or to you just moving ahead with whatever makes sense to you).

@@ -117,4 +118,47 @@ subroutine shr_log_getLogUnit(unit)

end subroutine shr_log_getLogUnit

subroutine shr_log_error(string, rc, line, file)
use esmf, only : ESMF_LOGWRITE, ESMF_LOGMSG_ERROR, ESMF_FINALIZE, ESMF_END_ABORT, ESMF_FAILURE, ESMF_SUCCESS
! Consistent stopping mechanism
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this comment should say something like: "Log the given message to all places a user may want to look for an error message: the ESMF log file, the log unit given by shr_log_unit, and standard error". Actually, there was a good comment at the top of the deleted print_error_to_logs function that could be restored as a starting point for this comment.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have updated the comment as requested.

@jedwards4b
Copy link
Contributor Author

The user is expected to have set a useful error message in string. It may be something generated from esmf or netcdf or directly from the calling function. Both the error message and the code are written to the logs by this routine. Setting rc to the generic ESMF_FAILURE on exit from this routine assures that
the code will abort at the top of the stack.

Copy link
Member

@billsacks billsacks left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the updated comment. I thought more about the rc meaning/handling after reviewing the CDEPS PR. I had originally thought that this would replace the implementation of chkerr, but now I'm seeing that's maybe not the intent, and maybe the intent is to still generally use chkerr after ESMF calls (which will call ESMF_LogFoundError to translate rc to a string message). In that case, I'm okay with the implementation / handling of rc here, so my one remaining request is that you add documentation to shr_log_error that it sets rc to ESMF_FAILURE.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants