Use YAML for saving and loading 1D flame simulations #1112

speth · 2021-10-06T16:58:24Z

Changes proposed in this pull request

In order to move ahead with the deprecation and removal of the XML format, we need a replacement for saving and loading 1D simulation results for all language interfaces. While the HDF5-based format available in the Python module is clearly the best choice for that interface, using YAML is the simplest option for the other interfaces.

Save 1D flame simulations to YAML files
Load 1D flame simulations from YAML files
Fix a couple of errors in the Matlab toolbox related to the use of the now-removed component names u and v (replaced by velocity and spread_rate in Cantera 2.5.0).

If applicable, provide an example illustrating new features this pull request is introducing

Checklist

The pull request includes a clear description of this code change
Commit messages have short titles and reference relevant issues
Build passes (scons build & scons test) and unit tests address code coverage
Style & formatting of contributed code follows contributing guidelines
The pull request is ready for review

codecov · 2021-10-06T17:42:17Z

Codecov Report

Merging #1112 (3268e96) into main (0d7373f) will increase coverage by 0.04%.
The diff coverage is 79.00%.

@@            Coverage Diff             @@
##             main    #1112      +/-   ##
==========================================
+ Coverage   73.45%   73.49%   +0.04%     
==========================================
  Files         365      365              
  Lines       47912    48187     +275     
==========================================
+ Hits        35194    35417     +223     
- Misses      12718    12770      +52

Impacted Files	Coverage Δ
include/cantera/base/AnyMap.h	`100.00% <ø> (ø)`
include/cantera/oneD/Boundary1D.h	`50.00% <ø> (ø)`
include/cantera/oneD/Domain1D.h	`85.57% <ø> (ø)`
include/cantera/oneD/IonFlow.h	`84.61% <ø> (ø)`
include/cantera/oneD/OneDim.h	`52.45% <ø> (ø)`
src/oneD/IonFlow.cpp	`52.38% <0.00%> (-1.47%)`	⬇️
src/oneD/Boundary1D.cpp	`53.22% <52.27%> (-0.21%)`	⬇️
src/oneD/Sim1D.cpp	`82.58% <90.16%> (+1.63%)`	⬆️
src/oneD/StFlow.cpp	`90.44% <91.57%> (+0.35%)`	⬆️
src/oneD/Domain1D.cpp	`85.20% <92.10%> (+2.00%)`	⬆️
... and 5 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 0d7373f...3268e96. Read the comment docs.

ischoegl · 2021-10-06T19:29:08Z

@speth ... thanks for taking this on. Without looking at details, I believe this is a very workable approach to moving away from XML. Do you anticipate creating an equivalent to SolutionArray (currently Python API only) at the Cantera C++ core eventually? (if yes, we may be able to move HDF to C++, and offer this as one of the input/output formats it for all interfaces). PS: The HDF5 structure should be writable by a C++ library; and I am absolutely not advocating for tackling this without a C++ SolutionArray equivalent.

speth · 2021-10-06T19:53:58Z

While I'd love to have the ability to write HDF5 output files directly, my previous experience with using the HDF5 C++ library was that it was a beastly dependency to add to a project, especially when trying to build on Windows where there isn't (or at least wasn't) an easy way to install the library and headers. That was the main driver for choosing to base this on the already-available YAML input/output capabilities, even if HDF5 is better suited for this mostly numeric array-centric data.

If you want to start an enhancement discussion to talk about a C++ equivalent to the Python SolutionArray class, I'd be happy to share my thoughts on it.

ischoegl · 2021-10-06T20:25:29Z

If you want to start an enhancement discussion to talk about a C++ equivalent to the Python SolutionArray class, I'd be happy to share my thoughts on it.

Not sure that this is on my own short-term horizon, but I believe it may be a good new feature for 3.0. It's a substantial project.

ischoegl · 2021-10-06T22:13:27Z

@speth ... I think it still makes sense to start the discussion now (Cantera/enhancements#119).

To me, the ability to export to and import from YAML would be parallel to existing CSV and HDF5 (with the latter only supported for the Python API). In other words, it would be neat if we could import the YAML format from the Python API also (which presumably would be a simple pass-through to C++). You may already have that in the works ...

ischoegl · 2021-10-15T22:24:25Z

Leaving a comment about HDF vs YAML before looking at details.

The output produced by save for the adiabatic_flame.py example for the proposed YAML export is
adiabatic_flame.txt

The same case output generated by write_hdf has the following structure

$ h5dump -n 1 adiabatic_flame_.h5 
HDF5 "adiabatic_flame_.h5" {
FILE_CONTENTS {
 group      /
 group      /group0
 attribute  /group0/Sim1D_type
 attribute  /group0/cantera_version
 attribute  /group0/curve
 attribute  /group0/date
 attribute  /group0/energy_enabled
 attribute  /group0/fixed_temperature
 attribute  /group0/git_commit
 attribute  /group0/max_grid_points
 attribute  /group0/max_time_step_count
 attribute  /group0/prune
 attribute  /group0/radiation_enabled
 attribute  /group0/ratio
 attribute  /group0/slope
 attribute  /group0/soret_enabled
 attribute  /group0/transport_model
 group      /group0/flame
 attribute  /group0/flame/Domain1D_type
 attribute  /group0/flame/emissivity_left
 attribute  /group0/flame/emissivity_right
 attribute  /group0/flame/name
 attribute  /group0/flame/steady_abstol
 attribute  /group0/flame/steady_reltol
 attribute  /group0/flame/transient_abstol
 attribute  /group0/flame/transient_reltol
 dataset    /group0/flame/T
 dataset    /group0/flame/X
 dataset    /group0/flame/density
 dataset    /group0/flame/grid
 group      /group0/flame/phase
 attribute  /group0/flame/phase/name
 attribute  /group0/flame/phase/source
 dataset    /group0/flame/velocity
 group      /group0/products
 attribute  /group0/products/Domain1D_type
 attribute  /group0/products/name
 dataset    /group0/products/T
 dataset    /group0/products/X
 dataset    /group0/products/density
 group      /group0/products/phase
 attribute  /group0/products/phase/name
 attribute  /group0/products/phase/source
 group      /group0/reactants
 attribute  /group0/reactants/Domain1D_type
 attribute  /group0/reactants/name
 dataset    /group0/reactants/T
 dataset    /group0/reactants/X
 dataset    /group0/reactants/density
 group      /group0/reactants/phase
 attribute  /group0/reactants/phase/name
 attribute  /group0/reactants/phase/source
 dataset    /group0/reactants/velocity
 }
}

ischoegl

@speth ... thank you for taking on the replacement of XML export. While this is not meant to be a full review, I commented on a couple of minor things I noticed. Beyond, there are a couple of Python examples left where XML is not replaced (and probably should be).

Regarding the PR itself, I don't have any concerns about the implementation per se, but I think it makes sense to discuss the long-term fate of storage structures for YAML. Cantera 2.5.1 introduced HDF output for Python, which is largely centered on SolutionArray, and extends this philosophy to various 1D objects (the way it was conceived centers on the user-facing Python API). The YAML structure here is - as far as I am aware - mostly a translation of the legacy XML structure that focuses on the implementation, and not the user interface.

As mentioned earlier, I hope to see both SolutionArray and (hopefully) HDF support pushed into the C++ core eventually (perhaps for Cantera 3.0?). That way, we can offer users a more convenient 'save' operation for reactor simulations, and have full array support in all languages. Specifically for 1D simulations, I believe more often than not users are interested in post-processing of results when loading an old solution. From that viewpoint, a direct access as SolutionArray would be convenient. So designing the YAML structure with SolutionArray import/export in mind would make some sense.

While I don't think that we necessarily have to have exact equivalence, here is what I noticed when comparing the two structures:

the HDF format is a lot flatter, and uses some domain names directly rather than storing them as the id field. I believe that it would make sense to do the same for YAML: if the YAML file were to be imported directly using a generic YAML parser, you can access what belongs to an object by its name, rather than pick from a list after checking the id field.
I do not think that the domains level is necessary.
soret_enabled, radiation_enabled and emissivity_left/right appear to be missing from YAML
The location of some parameters is consistent with C++ for YAML, whereas the HDF implementation follows the Python interface. I do not have an opinion here as you can argue both ways (C++ implementation vs user interface, and a revision of HDF could be done)
Storing by species name makes sense for YAML as it is a human-readable format (HDF uses X for efficiency, and isn't meant to be read)
One thing I tried to ensure for HDF is that the heritage of the Solution is stored (see phase) entry; the rationale here was that a solution can only be restored if you know what mechanism was used in the first place.

PS: out of curiosity ... what are the long-term plans to support import of xml results after Cantera 2.6? (Drop support or use converter script?)

samples/matlab/catcomb.m

speth · 2021-10-17T19:14:13Z

@speth ... thank you for taking on the replacement of XML export. While this is not meant to be a full review, I commented on a couple of minor things I noticed. Beyond, there are a couple of Python examples left where XML is not replaced (and probably should be).

Good catch -- I had this on my to-do list for this PR, but it completely slipped my mind.

the HDF format is a lot flatter, and uses some domain names directly rather than storing them as the id field. I believe that it would make sense to do the same for YAML: if the YAML file were to be imported directly using a generic YAML parser, you can access what belongs to an object by its name, rather than pick from a list after checking the id field.

I agree that taking some cues from the Python HDF5 format for 1D flames is a good thought, though I think it's probably fine for these two formats to have some structural differences. I've updated the location of the domains so that they are stored in the top-level mapping, using the "id" as the key.

I do not think that the domains level is necessary.

Eliminated, per above.

soret_enabled, radiation_enabled and emissivity_left/right appear to be missing from YAML

Good catch. These were never included in the XML format, so I missed them while adapting the existing serialization code.

The location of some parameters is consistent with C++ for YAML, whereas the HDF implementation follows the Python interface. I do not have an opinion here as you can argue both ways (C++ implementation vs user interface, and a revision of HDF could be done)

Can you be more specific? I think the grouping of the grid-related properties (curve, prune, etc.) into a subgroup makes sense, at least for the YAML format. Likewise for the solver tolerances.

One other difference is that YAML output puts the grid refinement criteria and the energy_enabled etc. fields with the "flow" domain, as they really belong to that domain. In theory you could have multiple flow domains, though we currently have no examples of this and the Python FlameBase class where write_hdf is implemented is built around the assumption of a single flow domain.

One thing I tried to ensure for HDF is that the heritage of the Solution is stored (see phase) entry; the rationale here was that a solution can only be restored if you know what mechanism was used in the first place.

Done.

PS: out of curiosity ... what are the long-term plans to support import of xml results after Cantera 2.6? (Drop support or use converter script?)

The minimalist option is to say that Cantera 2.6 can be used convert existing XML solutions to YAML, as it has support for both. That will be viable as long as installing Cantera 2.6 in a conda environment works, which I think ought to be a fairly long time. Given that, I was leaning toward not creating a standalone converter script.

ischoegl · 2021-10-19T20:33:06Z

Attaching the updated YAML output (generated by adiabatic_flame.py):
adiabatic_flame.txt

ischoegl · 2021-10-19T20:59:36Z

@speth ... thank you for taking the time to consider my observations and also to implement most of them.

I really like the approach of letting AnyMap do all the data collection, and writing YAML from there. If we ever get around to moving HDF to the C++ core, I believe it would make sense to follow the same approach.

You are correct that locating refinement criteria with the flow domain makes sense from the C++ perspective, even if the Python API currently goes a different route. We could address this with a revision of the HDF standard if/when we move it to the C++ core; same with the grouping of tolerances and refinement criteria (otherwise it's non-essential).

Regarding tolerances, I have a slight preference for a flatter structure: it's never clear to me whether to group by absolute/relative tolerance or transient/steady solver, so I'd prefer to see something that is close to the Python/HDF API, i.e.

    tolerances:
      steady_abstol: 1.0e-09
      steady_reltol: 1.0e-04
      transient_abstol: 9.999999999999999e-12 # <-- I know this formatting issue is a current limitation
      transient_reltol: 1.0e-04

Using uppercase Soret_enabled looks like an outlier as other fields are all lowercase (there is a clash of two conventions).

Is there a way to write all settings before the numeric blocks? Currently, tolerances are written before, whereas refinement criteria are written after.

Finally, one observation is that currently some number blocks don't use horizontal space efficiently; this may be addressed by changing:

    velocity: [0.7112641405849059, 0.7112641405822596, 0.7112641404451637,
    0.7112641377426332, 0.7112641104415781, 0.7112637339406135,
    0.7112617435453226, 0.7112560064702751, 0.7112322153842345,
    ...
    4.055246182368149, 4.055246182377726]

to

    velocity: [
      0.7112641405849059, 0.7112641405822596, 0.7112641404451637, 0.7112641377426332, 
      0.7112641104415781, 0.7112637339406135, 0.7112617435453226, 0.7112560064702751, 
      0.7112322153842345, 0.7111349509095231, 0.7109781680863129, 0.7106086071388092,
      ...
      4.055246182368149, 4.055246182377726]

(which in this case still fits 88 characters; I'd even consider going wider for this type of file).

PS: this is what 102 characters would allow for (current output only has 2 columns)

    OH: [
      -2.014801730705695e-17, -4.629141939774994e-19, -4.813801146169683e-20, 1.771106802152406e-20,
      2.545810916390064e-20, 1.623461193735384e-18, 1.841329907298661e-17, 1.31188752889368e-16,
      ...
      4.556290451629765e-04, 4.556290451629765e-04]

PPS: the HDF writer suppresses eField and lambda where it is not applicable.

speth · 2021-10-20T03:08:14Z

Regarding tolerances, I have a slight preference for a flatter structure: it's never clear to me whether to group by absolute/relative tolerance or transient/steady solver, so I'd prefer to see something that is close to the Python/HDF API, i.e.
    tolerances:
      steady_abstol: 1.0e-09
      steady_reltol: 1.0e-04
      transient_abstol: 9.999999999999999e-12 # <-- I know this formatting issue is a current limitation
      transient_reltol: 1.0e-04

Done, though I used names like steady-abstol to be consistent with the naming pattern established for Cantera's YAML input files.

Using uppercase Soret_enabled looks like an outlier as other fields are all lowercase (there is a clash of two conventions).

For the YAML input file format, the convention is to capitalize proper nouns, but little less, so I'd prefer to leave this as Soret-enabled.

Is there a way to write all settings before the numeric blocks? Currently, tolerances are written before, whereas refinement criteria are written after.

Done.

Finally, one observation is that currently some number blocks don't use horizontal space efficiently; this may be addressed by changing:

You're right, getting only two values per line was a little silly, so I increased the nominal line length to 88 characters globally, though this is really just an estimate because the YAML emitter object doesn't provide enough information to know how many characters in you already are when you start serializing a particular sub-object. It also doesn't indent wrapped lines the way you or I might want, so the best we can do without a lot of extra effort is something like:

    H2O: [9.397257458934713e-10, 4.507432204414695e-08, 2.0583678723563e-06,
    7.962288661066346e-05, 8.587434349611319e-04, 7.741273165985068e-03,
    0.0196275630231541, 0.03800532491016567, 0.05808037087942033,
    0.07665600042698147]

PPS: the HDF writer suppresses eField and lambda where it is not applicable.

Resolved (and likewise spread_rate for freely-propagating flames).

ischoegl

Overall, this implementation looks good to me. Using the AnyMap emitters makes a lot of sense.

While I don't think that addressing some of the remaining YAML formatting issues needs to be taken on at this moment (these weren't introduced here after all), I hope to see them addressed before the release of 2.6 (I opened issue #1128).

src/base/AnyMap.cpp

src/oneD/StFlow.cpp

include/cantera/oneD/Domain1D.h

If a single value applies for all components, just store the scalar value. If different values are needed for different components, store them as a map with the component names, so the tolerances can be restored even if the number of components changes.

The names 'u' and 'v' were deprecated and replaced with the names 'velocity' and 'spread_rate' in eb63825 and 61b3396. Support for the old names was removed in daf7d4d.

This makes the saved YAML structure more similar to the HDF5 structure used by the Python module.

Rename 'timestamp' to 'date', add 'git-commit', and move all metadata fields ahead of the simulation state.

The new setting of 88 is sufficient to get three full-precision doubles on a single line, even when accounting for the space taken for the key in the first line (usually).

ischoegl

@speth ... thank you for addressing the last comments!

speth marked this pull request as ready for review October 8, 2021 00:57

ischoegl reviewed Oct 16, 2021

View reviewed changes

samples/matlab/catcomb.m Outdated Show resolved Hide resolved

samples/matlab/catcomb.m Show resolved Hide resolved

speth force-pushed the onedim-yaml branch from d284451 to 2150565 Compare October 17, 2021 18:12

speth force-pushed the onedim-yaml branch from 016248c to 38c8f5e Compare October 20, 2021 02:55

ischoegl requested changes Oct 23, 2021

View reviewed changes

src/base/AnyMap.cpp Outdated Show resolved Hide resolved

src/base/AnyMap.cpp Outdated Show resolved Hide resolved

src/base/AnyMap.cpp Outdated Show resolved Hide resolved

src/oneD/StFlow.cpp Outdated Show resolved Hide resolved

include/cantera/oneD/Domain1D.h Show resolved Hide resolved

ischoegl mentioned this pull request Oct 23, 2021

Improve YAML output formatting #1128

Closed

speth added 14 commits October 25, 2021 21:32

[Input] Add a function for clearing cached input files

a2eaa75

[1D] Add basic framework for saving to YAML files

6d3898c

[1D] Implement serialization of Boundary1D objects to YAML

4f99073

[1D] Serialize StFlow to YAML

d141a23

[1D] Make StFlow::flowType a const function

5cbafcb

[1D] Remove unused member variable Domain1D::m_desc

34a968e

[1D] Add description and timestamp to YAML output files

2a5e980

[1D] Implement restoring flame solutions from YAML files

56aedb5

[1D/Test] Add tests of saving and restoring 1D solutions with YAML

1138b89

[1D] Add generator and Cantera version to YAML output

7f686cf

[1D] Update samples to use YAML output instead of XML

7a3465d

[Matlab/1D] Fix usage of outdated component names

adddba3

The names 'u' and 'v' were deprecated and replaced with the names 'velocity' and 'spread_rate' in eb63825 and 61b3396. Support for the old names was removed in daf7d4d.

[1D] Improve test coverage for YAML-based save/restore

8dab8cb

speth added 12 commits October 25, 2021 21:36

[1D] Move domains to top level of saved flame solution

2635009

This makes the saved YAML structure more similar to the HDF5 structure used by the Python module.

[1D] Update YAML metadata fields

3c2f26f

Rename 'timestamp' to 'date', add 'git-commit', and move all metadata fields ahead of the simulation state.

[1D] Include radiation-related properties in YAML output

afa3801

[1D] Include max grid point setting in YAML input/output

f2d272d

[1D] Save phase name and source file in YAML output

4501888

[1D] Include Soret-enabled flag in YAML input/output

bafb6c0

[1D] Update Python examples to use YAML output instead of XML

3ce60a3

[1D] Rename tolerances stored in YAML output files

8a70fe6

[1D] Improve ordering of keys in YAML output

7af8607

[YAML] Increase maximum line length when serializing

950ea20

The new setting of 88 is sufficient to get three full-precision doubles on a single line, even when accounting for the space taken for the key in the first line (usually).

[1D] Avoid serializing components that are not used in a given model

12b8829

[Input] Improve documentation of AnyMap YAML emitters

3268e96

speth force-pushed the onedim-yaml branch from 38c8f5e to 3268e96 Compare October 26, 2021 01:44

speth requested a review from ischoegl October 26, 2021 01:45

ischoegl approved these changes Oct 26, 2021

View reviewed changes

ischoegl merged commit 4570f3e into Cantera:main Oct 26, 2021

ischoegl mentioned this pull request Oct 26, 2021

Improve YAML formatting #1133

Merged

5 tasks

ischoegl mentioned this pull request Feb 21, 2022

C++ Equivalent of SolutionArray Cantera/enhancements#137

Closed

speth deleted the onedim-yaml branch July 23, 2024 15:37

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Use YAML for saving and loading 1D flame simulations #1112

Use YAML for saving and loading 1D flame simulations #1112

speth commented Oct 6, 2021 •

edited

Loading

codecov bot commented Oct 6, 2021 •

edited

Loading

ischoegl commented Oct 6, 2021 •

edited

Loading

speth commented Oct 6, 2021

ischoegl commented Oct 6, 2021

ischoegl commented Oct 6, 2021 •

edited

Loading

ischoegl commented Oct 15, 2021 •

edited

Loading

ischoegl left a comment •

edited

Loading

speth commented Oct 17, 2021

ischoegl commented Oct 19, 2021 •

edited

Loading

ischoegl commented Oct 19, 2021 •

edited

Loading

speth commented Oct 20, 2021

ischoegl left a comment •

edited

Loading

ischoegl left a comment

Use YAML for saving and loading 1D flame simulations #1112

Use YAML for saving and loading 1D flame simulations #1112

Conversation

speth commented Oct 6, 2021 • edited Loading

codecov bot commented Oct 6, 2021 • edited Loading

Codecov Report

ischoegl commented Oct 6, 2021 • edited Loading

speth commented Oct 6, 2021

ischoegl commented Oct 6, 2021

ischoegl commented Oct 6, 2021 • edited Loading

ischoegl commented Oct 15, 2021 • edited Loading

ischoegl left a comment • edited Loading

Choose a reason for hiding this comment

speth commented Oct 17, 2021

ischoegl commented Oct 19, 2021 • edited Loading

ischoegl commented Oct 19, 2021 • edited Loading

speth commented Oct 20, 2021

ischoegl left a comment • edited Loading

Choose a reason for hiding this comment

ischoegl left a comment

Choose a reason for hiding this comment

speth commented Oct 6, 2021 •

edited

Loading

codecov bot commented Oct 6, 2021 •

edited

Loading

ischoegl commented Oct 6, 2021 •

edited

Loading

ischoegl commented Oct 6, 2021 •

edited

Loading

ischoegl commented Oct 15, 2021 •

edited

Loading

ischoegl left a comment •

edited

Loading

ischoegl commented Oct 19, 2021 •

edited

Loading

ischoegl commented Oct 19, 2021 •

edited

Loading

ischoegl left a comment •

edited

Loading