Output model data in a portable and friendly format #30
Somehow it didn't connect before that NetCDF is a binary format with embedded strings, rather than a text format. That means it shouldn't be materially larger than the native stuff, and, given the wide array of software that exists for working with it, I now think the product that is MIM should definitely produce output in NetCDF. The presence of metadata in the format also makes me less antsy about storing (some) model outputs in git, as they will be relatively interpretable. This leaves open these follow-on questions:
I agree that using NetCDF for the output is highly desirable. It seems we are agreed on this, and now we just need to decide how to implement it. You've laid out the pros and cons of each choice pretty well.

Placing the NetCDF functionality in the core will complicate the build process and introduce an external dependency for the model. At the moment the compilation is very straightforward, and while I'm loath to sacrifice that simplicity, NetCDF output is one of the few reasons I would (the other main one being parallelisation).

As for which version, I would lean towards NetCDF4, even though scipy.io only supports version 3. I've seen both versions being used in the wild, but I don't see a strong case for choosing the older standard. There are very mature Python libraries for dealing with NetCDF4 files, and MATLAB can read both formats.

I think that making the output CF compliant is indeed a no-brainer. There are a number of data analysis suites that more or less assume this (see e.g. Iris), though some of them can deal with non-CF-compliant input data.

Here's some information about the Fortran NetCDF library. It will definitely complicate the build process, but I think the trade-off is probably worth it, provided the user manual has a sufficiently helpful walkthrough.

In summary, my preference is for CF-compliant NetCDF4 output produced by the Fortran core, but I'm willing to be talked out of those preferences.
There's also another option: wrap the Fortran program in Python. Does this option have downsides in terms of speed, or would it complicate any future attempt to make the model run in parallel? I don't know much about the process.
Any sort of Python wrapping should have a negligible effect on performance, or on parallelism, provided the chunks of work done by the wrappee are large enough. For instance, I expect it wouldn't be materially slower to have the integration loop be managed by Python, so long as computing all the tendencies were still one big chunk of Fortran code. In fact, if numpy or scipy has optimized loops for stencil computations (which they may?), it may not be a terrible exercise to rewrite (a simplified version of) the model entirely in Python+numpy and see what performance looks like. Even if the verdict is that it's terrible, that version can be used as a sanity check on the results from the Fortran. Or, perhaps, we could construct a very simple benchmark program to test this hypothesis before doing a rewrite.
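As a rough illustration of the "stencil computation in numpy" idea, here is a minimal sketch that evaluates a five-point Laplacian two ways: with explicit Python loops and with array slicing. The five-point Laplacian is only a stand-in, since nothing in this thread says which stencils the model actually uses, and the function names `laplacian_loop` and `laplacian_sliced` are made up for the example. The loop version plays the role of a slow reference to check the vectorised one against.

```python
import numpy as np


def laplacian_loop(f, dx):
    """Five-point Laplacian on interior points, written as explicit loops."""
    out = np.zeros_like(f)
    ny, nx = f.shape
    for j in range(1, ny - 1):
        for i in range(1, nx - 1):
            out[j, i] = (
                f[j, i + 1] + f[j, i - 1] + f[j + 1, i] + f[j - 1, i]
                - 4.0 * f[j, i]
            ) / dx**2
    return out


def laplacian_sliced(f, dx):
    """Same stencil, expressed with shifted array slices (no Python loops)."""
    out = np.zeros_like(f)
    out[1:-1, 1:-1] = (
        f[1:-1, 2:] + f[1:-1, :-2] + f[2:, 1:-1] + f[:-2, 1:-1]
        - 4.0 * f[1:-1, 1:-1]
    ) / dx**2
    return out


# Hypothetical test field: f = x^2 + y^2 on a unit-spaced grid.
y, x = np.mgrid[0:8, 0:10].astype(float)
field = x**2 + y**2

# The two implementations should agree everywhere.
same = np.allclose(laplacian_loop(field, 1.0), laplacian_sliced(field, 1.0))
```

Timing the two functions on a realistically sized grid (for example with `timeit`) would be one way to build the very simple benchmark mentioned above before committing to a rewrite.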
That's good to know. Given that Python dependencies should be easier to solve than Fortran dependencies, perhaps a Python wrapping is the best option?
The output is currently dumped as unformatted Fortran native files. These come with no metadata, are not particularly friendly to deal with, and may not be portable across systems.
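To make the lack of metadata concrete, here is a minimal sketch of reading such files from Python using only the standard library. It assumes sequential-access unformatted files with 4-byte little-endian record markers and 8-byte reals, which is a common compiler default but is not confirmed anywhere in this issue. The helpers `write_record` and `read_records` are hypothetical names for the example, and the writer exists only to fake a file for the demo.

```python
import io
import struct


def write_record(fh, values):
    """Write one record: length marker, float64 payload, length marker."""
    payload = struct.pack(f"<{len(values)}d", *values)
    marker = struct.pack("<i", len(payload))
    fh.write(marker + payload + marker)


def read_records(fh):
    """Read every record, assuming each holds only float64 values."""
    records = []
    while True:
        head = fh.read(4)
        if not head:
            break  # clean end of file
        if len(head) < 4:
            raise ValueError("truncated record marker")
        (nbytes,) = struct.unpack("<i", head)
        payload = fh.read(nbytes)
        (tail,) = struct.unpack("<i", fh.read(4))
        if tail != nbytes:
            raise ValueError("leading and trailing record markers disagree")
        records.append(list(struct.unpack(f"<{nbytes // 8}d", payload)))
    return records


# Round trip through an in-memory buffer standing in for an output file.
buf = io.BytesIO()
write_record(buf, [1.0, 2.5])
write_record(buf, [3.0])
buf.seek(0)
records = read_records(buf)
```

The reader has to be told the marker width, byte order, element type, and array shape, because none of that is stored in the file. That is the portability problem described above.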
Other options include: