Use YAML for saving and loading 1D flame simulations #1112

Merged: ischoegl merged 26 commits into Cantera:main from the onedim-yaml branch on Oct 26, 2021

Member

@speth speth commented Oct 6, 2021

Changes proposed in this pull request

In order to move ahead with the deprecation and removal of the XML format, we need a replacement for saving and loading 1D simulation results for all language interfaces. While the HDF5-based format available in the Python module is clearly the best choice for that interface, using YAML is the simplest option for the other interfaces.

  • Save 1D flame simulations to YAML files
  • Load 1D flame simulations from YAML files
  • Fix a couple of errors in the Matlab toolbox related to the use of the now-removed component names u and v (replaced by velocity and spread_rate in Cantera 2.5.0).

If applicable, provide an example illustrating new features this pull request is introducing

Checklist

  • The pull request includes a clear description of this code change
  • Commit messages have short titles and reference relevant issues
  • Build passes (scons build & scons test) and unit tests address code coverage
  • Style & formatting of contributed code follows contributing guidelines
  • The pull request is ready for review

@codecov

codecov bot commented Oct 6, 2021

Codecov Report

Merging #1112 (3268e96) into main (0d7373f) will increase coverage by 0.04%.
The diff coverage is 79.00%.

@@            Coverage Diff             @@
##             main    #1112      +/-   ##
==========================================
+ Coverage   73.45%   73.49%   +0.04%     
==========================================
  Files         365      365              
  Lines       47912    48187     +275     
==========================================
+ Hits        35194    35417     +223     
- Misses      12718    12770      +52     
Impacted Files                       Coverage Δ
include/cantera/base/AnyMap.h        100.00% <ø> (ø)
include/cantera/oneD/Boundary1D.h    50.00% <ø> (ø)
include/cantera/oneD/Domain1D.h      85.57% <ø> (ø)
include/cantera/oneD/IonFlow.h       84.61% <ø> (ø)
include/cantera/oneD/OneDim.h        52.45% <ø> (ø)
src/oneD/IonFlow.cpp                 52.38% <0.00%> (-1.47%) ⬇️
src/oneD/Boundary1D.cpp              53.22% <52.27%> (-0.21%) ⬇️
src/oneD/Sim1D.cpp                   82.58% <90.16%> (+1.63%) ⬆️
src/oneD/StFlow.cpp                  90.44% <91.57%> (+0.35%) ⬆️
src/oneD/Domain1D.cpp                85.20% <92.10%> (+2.00%) ⬆️
... and 5 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 0d7373f...3268e96.

@ischoegl
Member

ischoegl commented Oct 6, 2021

@speth ... thanks for taking this on. Without looking at details, I believe this is a very workable approach to moving away from XML. Do you anticipate creating an equivalent to SolutionArray (currently Python API only) at the Cantera C++ core eventually? (If yes, we may be able to move HDF to C++, and offer it as one of the input/output formats for all interfaces.) PS: The HDF5 structure should be writable by a C++ library; and I am absolutely not advocating for tackling this without a C++ SolutionArray equivalent.

@speth
Member Author

speth commented Oct 6, 2021

While I'd love to have the ability to write HDF5 output files directly, my previous experience with using the HDF5 C++ library was that it was a beastly dependency to add to a project, especially when trying to build on Windows where there isn't (or at least wasn't) an easy way to install the library and headers. That was the main driver for choosing to base this on the already-available YAML input/output capabilities, even if HDF5 is better suited for this mostly numeric array-centric data.

If you want to start an enhancement discussion to talk about a C++ equivalent to the Python SolutionArray class, I'd be happy to share my thoughts on it.

@ischoegl
Member

ischoegl commented Oct 6, 2021

> If you want to start an enhancement discussion to talk about a C++ equivalent to the Python SolutionArray class, I'd be happy to share my thoughts on it.

Not sure that this is on my own short-term horizon, but I believe it may be a good new feature for 3.0. It's a substantial project.

@ischoegl
Member

ischoegl commented Oct 6, 2021

@speth ... I think it still makes sense to start the discussion now (Cantera/enhancements#119).

To me, the ability to export to and import from YAML would be parallel to existing CSV and HDF5 (with the latter only supported for the Python API). In other words, it would be neat if we could import the YAML format from the Python API also (which presumably would be a simple pass-through to C++). You may already have that in the works ...

@speth speth marked this pull request as ready for review October 8, 2021 00:57
@ischoegl
Member

ischoegl commented Oct 15, 2021

Leaving a comment about HDF vs YAML before looking at details.

The output produced by save for the adiabatic_flame.py example for the proposed YAML export is
adiabatic_flame.txt

The same case output generated by write_hdf has the following structure

$ h5dump -n 1 adiabatic_flame_.h5 
HDF5 "adiabatic_flame_.h5" {
FILE_CONTENTS {
 group      /
 group      /group0
 attribute  /group0/Sim1D_type
 attribute  /group0/cantera_version
 attribute  /group0/curve
 attribute  /group0/date
 attribute  /group0/energy_enabled
 attribute  /group0/fixed_temperature
 attribute  /group0/git_commit
 attribute  /group0/max_grid_points
 attribute  /group0/max_time_step_count
 attribute  /group0/prune
 attribute  /group0/radiation_enabled
 attribute  /group0/ratio
 attribute  /group0/slope
 attribute  /group0/soret_enabled
 attribute  /group0/transport_model
 group      /group0/flame
 attribute  /group0/flame/Domain1D_type
 attribute  /group0/flame/emissivity_left
 attribute  /group0/flame/emissivity_right
 attribute  /group0/flame/name
 attribute  /group0/flame/steady_abstol
 attribute  /group0/flame/steady_reltol
 attribute  /group0/flame/transient_abstol
 attribute  /group0/flame/transient_reltol
 dataset    /group0/flame/T
 dataset    /group0/flame/X
 dataset    /group0/flame/density
 dataset    /group0/flame/grid
 group      /group0/flame/phase
 attribute  /group0/flame/phase/name
 attribute  /group0/flame/phase/source
 dataset    /group0/flame/velocity
 group      /group0/products
 attribute  /group0/products/Domain1D_type
 attribute  /group0/products/name
 dataset    /group0/products/T
 dataset    /group0/products/X
 dataset    /group0/products/density
 group      /group0/products/phase
 attribute  /group0/products/phase/name
 attribute  /group0/products/phase/source
 group      /group0/reactants
 attribute  /group0/reactants/Domain1D_type
 attribute  /group0/reactants/name
 dataset    /group0/reactants/T
 dataset    /group0/reactants/X
 dataset    /group0/reactants/density
 group      /group0/reactants/phase
 attribute  /group0/reactants/phase/name
 attribute  /group0/reactants/phase/source
 dataset    /group0/reactants/velocity
 }
}

Member

@ischoegl ischoegl left a comment

@speth ... thank you for taking on the replacement of XML export. While this is not meant to be a full review, I commented on a couple of minor things I noticed. Beyond that, there are a couple of Python examples left where XML is not replaced (and probably should be).

Regarding the PR itself, I don't have any concerns about the implementation per se, but I think it makes sense to discuss the long-term fate of storage structures for YAML. Cantera 2.5.1 introduced HDF output for Python, which is largely centered on SolutionArray, and extends this philosophy to various 1D objects (the way it was conceived centers on the user-facing Python API). The YAML structure here is - as far as I am aware - mostly a translation of the legacy XML structure that focuses on the implementation, and not the user interface.

As mentioned earlier, I hope to see both SolutionArray and (hopefully) HDF support pushed into the C++ core eventually (perhaps for Cantera 3.0?). That way, we can offer users a more convenient 'save' operation for reactor simulations, and have full array support in all languages. Specifically for 1D simulations, I believe more often than not users are interested in post-processing of results when loading an old solution. From that viewpoint, a direct access as SolutionArray would be convenient. So designing the YAML structure with SolutionArray import/export in mind would make some sense.

While I don't think that we necessarily have to have exact equivalence, here is what I noticed when comparing the two structures:

  • the HDF format is a lot flatter, and uses some domain names directly rather than storing them as the id field. I believe that it would make sense to do the same for YAML: if the YAML file were to be imported directly using a generic YAML parser, you can access what belongs to an object by its name, rather than pick from a list after checking the id field.
  • I do not think that the domains level is necessary.
  • soret_enabled, radiation_enabled and emissivity_left/right appear to be missing from YAML
  • The location of some parameters is consistent with C++ for YAML, whereas the HDF implementation follows the Python interface. I do not have an opinion here as you can argue both ways (C++ implementation vs user interface, and a revision of HDF could be done)
  • Storing by species name makes sense for YAML as it is a human-readable format (HDF uses X for efficiency, and isn't meant to be read)
  • One thing I tried to ensure for HDF is that the heritage of the Solution is stored (see the phase entry); the rationale here was that a solution can only be restored if you know what mechanism was used in the first place.

PS: out of curiosity ... what are the long-term plans to support import of xml results after Cantera 2.6? (Drop support or use converter script?)

samples/matlab/catcomb.m (outdated, resolved)
samples/matlab/catcomb.m (resolved)
@speth
Member Author

speth commented Oct 17, 2021

> @speth ... thank you for taking on the replacement of XML export. While this is not meant to be a full review, I commented on a couple of minor things I noticed. Beyond that, there are a couple of Python examples left where XML is not replaced (and probably should be).

Good catch -- I had this on my to-do list for this PR, but it completely slipped my mind.

>   • the HDF format is a lot flatter, and uses some domain names directly rather than storing them as the id field. I believe that it would make sense to do the same for YAML: if the YAML file were to be imported directly using a generic YAML parser, you can access what belongs to an object by its name, rather than pick from a list after checking the id field.

I agree that taking some cues from the Python HDF5 format for 1D flames is a good thought, though I think it's probably fine for these two formats to have some structural differences. I've updated the location of the domains so that they are stored in the top-level mapping, using the "id" as the key.
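
As an illustration of the layout change described here, the following is a minimal Python sketch, not Cantera's implementation. It contrasts finding a domain in a list of entries by its id field with indexing a top-level mapping keyed by that id. The helper `domains_by_id`, the `type` field, and its values are hypothetical; only the domain names `reactants`, `flame`, and `products` come from the HDF5 listing earlier in this thread.

```python
from __future__ import annotations

from typing import Any


def domains_by_id(domains: list[dict[str, Any]]) -> dict[str, dict[str, Any]]:
    """Turn a list of domain entries into a mapping keyed by each entry's "id"."""
    keyed: dict[str, dict[str, Any]] = {}
    for entry in domains:
        fields = dict(entry)          # shallow copy so the input is not modified
        domain_id = fields.pop("id")  # the id becomes the key instead of a field
        if domain_id in keyed:
            raise ValueError(f"duplicate domain id: {domain_id!r}")
        keyed[domain_id] = fields
    return keyed


# Hypothetical data: the "type" field and its values are assumptions.
list_layout = [
    {"id": "reactants", "type": "inlet"},
    {"id": "flame", "type": "flow"},
    {"id": "products", "type": "outlet"},
]

# List layout: scan the entries and compare ids to find one domain.
flame_from_list = next(d for d in list_layout if d["id"] == "flame")

# Mapping layout: index directly by domain name.
mapping_layout = domains_by_id(list_layout)
flame_from_mapping = mapping_layout["flame"]
```

With the mapping layout, a generic YAML parser would hand back a dictionary in which each domain is reachable by name, which is the access pattern the review comment asks for.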

>   • I do not think that the domains level is necessary.

Eliminated, per above.

>   • soret_enabled, radiation_enabled and emissivity_left/right appear to be missing from YAML

Good catch. These were never included in the XML format, so I missed them while adapting the existing serialization code.

>   • The location of some parameters is consistent with C++ for YAML, whereas the HDF implementation follows the Python interface. I do not have an opinion here as you can argue both ways (C++ implementation vs user interface, and a revision of HDF could be done)

Can you be more specific? I think the grouping of the grid-related properties (curve, prune, etc.) into a subgroup makes sense, at least for the YAML format. Likewise for the solver tolerances.

One other difference is that YAML output puts the grid refinement criteria and the energy_enabled etc. fields with the "flow" domain, as they really belong to that domain. In theory you could have multiple flow domains, though we currently have no examples of this and the Python FlameBase class where write_hdf is implemented is built around the assumption of a single flow domain.

>   • One thing I tried to ensure for HDF is that the heritage of the Solution is stored (see the phase entry); the rationale here was that a solution can only be restored if you know what mechanism was used in the first place.

Done.

> PS: out of curiosity ... what are the long-term plans to support import of xml results after Cantera 2.6? (Drop support or use converter script?)

The minimalist option is to say that Cantera 2.6 can be used to convert existing XML solutions to YAML, as it has support for both. That will be viable as long as installing Cantera 2.6 in a conda environment works, which I think ought to be a fairly long time. Given that, I was leaning toward not creating a standalone converter script.

@ischoegl
Member

ischoegl commented Oct 19, 2021

Attaching the updated YAML output (generated by adiabatic_flame.py):
adiabatic_flame.txt

@ischoegl
Member

ischoegl commented Oct 19, 2021

@speth ... thank you for taking the time to consider my observations and also to implement most of them.

I really like the approach of letting AnyMap do all the data collection, and writing YAML from there. If we ever get around to moving HDF to the C++ core, I believe it would make sense to follow the same approach.

You are correct that locating refinement criteria with the flow domain makes sense from the C++ perspective, even if the Python API currently goes a different route. We could address this with a revision of the HDF standard if/when we move it to the C++ core; same with the grouping of tolerances and refinement criteria (otherwise it's non-essential).

Regarding tolerances, I have a slight preference for a flatter structure: it's never clear to me whether to group by absolute/relative tolerance or transient/steady solver, so I'd prefer to see something that is close to the Python/HDF API, i.e.

    tolerances:
      steady_abstol: 1.0e-09
      steady_reltol: 1.0e-04
      transient_abstol: 9.999999999999999e-12 # <-- I know this formatting issue is a current limitation
      transient_reltol: 1.0e-04

Using uppercase Soret_enabled looks like an outlier as other fields are all lowercase (there is a clash of two conventions).

Is there a way to write all settings before the numeric blocks? Currently, tolerances are written before, whereas refinement criteria are written after.

Finally, one observation is that currently some number blocks don't use horizontal space efficiently; this may be addressed by changing:

    velocity: [0.7112641405849059, 0.7112641405822596, 0.7112641404451637,
    0.7112641377426332, 0.7112641104415781, 0.7112637339406135,
    0.7112617435453226, 0.7112560064702751, 0.7112322153842345,
    ...
    4.055246182368149, 4.055246182377726]

to

    velocity: [
      0.7112641405849059, 0.7112641405822596, 0.7112641404451637, 0.7112641377426332, 
      0.7112641104415781, 0.7112637339406135, 0.7112617435453226, 0.7112560064702751, 
      0.7112322153842345, 0.7111349509095231, 0.7109781680863129, 0.7106086071388092,
      ...
      4.055246182368149, 4.055246182377726]

(which in this case still fits 88 characters; I'd even consider going wider for this type of file).

PS: this is what 102 characters would allow for (current output only has 2 columns)

    OH: [
      -2.014801730705695e-17, -4.629141939774994e-19, -4.813801146169683e-20, 1.771106802152406e-20,
      2.545810916390064e-20, 1.623461193735384e-18, 1.841329907298661e-17, 1.31188752889368e-16,
      ...
      4.556290451629765e-04, 4.556290451629765e-04]
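
The wrapped layout proposed above can be sketched with plain string formatting. This is a standalone Python illustration under stated assumptions, not the YAML emitter used by Cantera: the function name `format_flow_sequence` and its `width` and `indent` parameters are hypothetical, and the sample values are the first nine velocity entries quoted above.

```python
import textwrap


def format_flow_sequence(key, values, width=88, indent=2):
    """Format `key: [v1, v2, ...]` with wrapped, indented continuation lines."""
    # repr() of a Python float round-trips exactly when parsed back.
    items = ", ".join(repr(float(v)) for v in values)
    pad = " " * indent
    body = textwrap.fill(
        items + "]",
        width=width,               # maximum line length, indentation included
        initial_indent=pad,
        subsequent_indent=pad,
        break_long_words=False,    # never split a number across lines
        break_on_hyphens=False,    # keep signs and exponents such as e-17 intact
    )
    return f"{key}: [\n{body}"


velocity = [
    0.7112641405849059, 0.7112641405822596, 0.7112641404451637,
    0.7112641377426332, 0.7112641104415781, 0.7112637339406135,
    0.7112617435453226, 0.7112560064702751, 0.7112322153842345,
]
text = format_flow_sequence("velocity", velocity)
print(text)
```

Calling the same function with `width=102` gives the wider variant. The sketch knows the current column only because it formats one key at a time starting from column zero; an emitter serializing nested objects may not have that information.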

PPS: the HDF writer suppresses eField and lambda where it is not applicable.

@speth
Member Author

speth commented Oct 20, 2021

> Regarding tolerances, I have a slight preference for a flatter structure: it's never clear to me whether to group by absolute/relative tolerance or transient/steady solver, so I'd prefer to see something that is close to the Python/HDF API, i.e.
>
>     tolerances:
>       steady_abstol: 1.0e-09
>       steady_reltol: 1.0e-04
>       transient_abstol: 9.999999999999999e-12 # <-- I know this formatting issue is a current limitation
>       transient_reltol: 1.0e-04

Done, though I used names like steady-abstol to be consistent with the naming pattern established for Cantera's YAML input files.

> Using uppercase Soret_enabled looks like an outlier as other fields are all lowercase (there is a clash of two conventions).

For the YAML input file format, the convention is to capitalize proper nouns, but little else, so I'd prefer to leave this as Soret-enabled.
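
The two naming points in this reply (hyphenated keys such as steady-abstol, and capitalized proper nouns such as Soret-enabled) can be summarized in a small Python sketch. It is illustrative only: `to_yaml_key` and the `PROPER_NOUNS` table are hypothetical names, not part of Cantera.

```python
# Hypothetical table of proper nouns that keep their capital letter.
PROPER_NOUNS = {"soret": "Soret"}


def to_yaml_key(python_name: str) -> str:
    """Convert a snake_case attribute name to a hyphenated YAML-style key."""
    words = python_name.split("_")
    # Capitalize known proper nouns; leave every other word unchanged.
    words = [PROPER_NOUNS.get(word.lower(), word) for word in words]
    return "-".join(words)


names = [
    "steady_abstol",
    "steady_reltol",
    "transient_abstol",
    "transient_reltol",
    "soret_enabled",
]
keys = [to_yaml_key(name) for name in names]
```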

> Is there a way to write all settings before the numeric blocks? Currently, tolerances are written before, whereas refinement criteria are written after.

Done.

> Finally, one observation is that currently some number blocks don't use horizontal space efficiently; this may be addressed by changing:

You're right, getting only two values per line was a little silly, so I increased the nominal line length to 88 characters globally. This is really just an estimate, though, because the YAML emitter object doesn't provide enough information to know how many characters into the line you already are when you start serializing a particular sub-object. It also doesn't indent wrapped lines the way you or I might want, so the best we can do without a lot of extra effort is something like:

    H2O: [9.397257458934713e-10, 4.507432204414695e-08, 2.0583678723563e-06,
    7.962288661066346e-05, 8.587434349611319e-04, 7.741273165985068e-03,
    0.0196275630231541, 0.03800532491016567, 0.05808037087942033,
    0.07665600042698147]

> PPS: the HDF writer suppresses eField and lambda where it is not applicable.

Resolved (and likewise spread_rate for freely-propagating flames).

Member

@ischoegl ischoegl left a comment

Overall, this implementation looks good to me. Using the AnyMap emitters makes a lot of sense.

While I don't think that addressing some of the remaining YAML formatting issues needs to be taken on at this moment (these weren't introduced here after all), I hope to see them addressed before the release of 2.6 (I opened issue #1128).

src/base/AnyMap.cpp (outdated, resolved)
src/base/AnyMap.cpp (outdated, resolved)
src/base/AnyMap.cpp (outdated, resolved)
src/oneD/StFlow.cpp (outdated, resolved)
include/cantera/oneD/Domain1D.h (resolved)
Member

@ischoegl ischoegl left a comment

@speth ... thank you for addressing the last comments!

@ischoegl ischoegl merged commit 4570f3e into Cantera:main Oct 26, 2021
@ischoegl ischoegl mentioned this pull request Oct 26, 2021
5 tasks
@speth speth deleted the onedim-yaml branch July 23, 2024 15:37
3 participants