
Dynamic Time History Collection #1360

Merged: 28 commits, Jul 15, 2021

Conversation

wrtobin
Collaborator

@wrtobin wrtobin commented Mar 17, 2021

  • Fix [Bug] History collection give out of bound error with EmbeddedSurfaceSubRegion. #1356
  • Fix [Bug] Time history in parallel: abnormally slow setup and crash on exit #1309
  • Resolve [Feature] Time History output including spatial coordinates  #1203
  • Collect data from arrays with an index set that can change size on any MPI rank during execution
  • Account for the changing sizes in buffered collected data when writing to file
    • Using the minimum index count across the ranks involved in the writing works. Any larger and the HDF5 mpio routines error. Minimally this means doing writes with chunk size 1, since any ranks with 0 indices are excluded from the subcomm used to write a specific history 'row'. The efficiency of this approach is questionable, but this is the largest chunk size that can be used reliably.
  • Unit testing for (1) expanding and contracting data collection/output from the same process (2) repeated file writing with (1)
  • Allow collection of anything in the data repository (that is able to be collected).
    • Since the dynamic collection changes the extent of the serialized portion of the time history 'row' that each MPI rank writes, we should also be able to use this to collect indexing data as well in order to help track specific items in a time history over the course of the simulation.
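The chunking rule from the second bullet can be sketched in pure Python (hypothetical function name; the real implementation lives in the HDF5 buffered-output code): ranks holding zero indices are excluded from the write subcomm, and the largest reliably usable chunk size is the minimum index count among the remaining ranks.

```python
def reliable_chunk_size(index_counts):
    """Return the largest chunk size every participating rank can write
    without the HDF5 mpio routines erroring: the minimum index count
    among ranks that actually hold data (ranks with 0 indices are
    excluded from the subcomm used for the write)."""
    participating = [n for n in index_counts if n > 0]
    if not participating:
        return 0  # no rank has data for this history 'row'
    return min(participating)

# Example: four MPI ranks with differing index counts this step.
print(reliable_chunk_size([5, 3, 0, 7]))  # 3
print(reliable_chunk_size([1, 4, 2]))     # 1 -- the minimal case noted above
print(reliable_chunk_size([0, 0]))        # 0
```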

@wrtobin wrtobin added effort: 1 week type: cleanup / refactor Non-functional change (NFC) type: feature New feature or request labels Mar 18, 2021
@wrtobin wrtobin removed the new label Mar 25, 2021
@castelletto1
Contributor

Hi @wrtobin, are you planning on taking care of #1203 within this PR?

@wrtobin
Collaborator Author

wrtobin commented Apr 22, 2021

@castelletto1
I haven't added that yet, but it should be doable.

This PR DOES include the ability to collect anything in the dataRepository (that is packable), which includes objects that aren't necessarily associated with the mesh (linear algebra package information, etc.). So I'll have to add some internal logic to only add the coordinate information when collecting from mesh objects.

I've added it to the checklist.

wrtobin added 7 commits April 27, 2021 15:06
…tically when collecting from object managers / fields, account for dynamic set changes for the coordinates
…anging file row prealloc to 1 so that coords being written and never updated don't have erroneous zero rows
@wrtobin wrtobin marked this pull request as ready for review May 3, 2021 16:33
@wrtobin
Collaborator Author

wrtobin commented May 3, 2021

This requires a rebaseline, as the tests deriving from sedov_base.xml will fail restart checks due to schema changes.

@wrtobin wrtobin added flag: ready for review flag: requires rebaseline Requires rebaseline branch in integratedTests labels May 3, 2021
HistoryCollection::initializePostSubGroups();
if( !m_initialized )
{
DomainPartition & domain = this->getGroupByPath< DomainPartition >( "/Problem/domain" );
Contributor

@rrsettgast is this preferred over getGlobalState().problemManager().domain()?

Collaborator Author

I would think to prefer getGlobalState().problemManager().domain(), which is what I was using originally, this came in via a merge and I left it.

Contributor

It might be the way it is to avoid dependencies.

Member

We have to fix this still...I was avoiding calls to getGlobalState() due to dependency issues, but I hadn't provided an accessor for domain. It isn't pleasant as it stands.

Collaborator

@CusiniM CusiniM left a comment

This is great! I would add an integrated test in which some non-mesh objects are collected (e.g., nonlinear and linear iterations) so that we have a working example in there.

objectPath="ElementRegions/Fracture/embeddedSurfaceSubRegion"
fieldName="elementCenter"
minSetSize="120"/>
fieldName="displacementJump" />
</Tasks>
Collaborator

No need for a min set size because now the collection just gets resized every time the field changes size?

Collaborator Author

Yes exactly.

There can still be trailing 0s though since the HDF arrays aren't ragged / AoAs. Practically now we just expand the dataset as needed, which will add 0s to all previous 'rows', and we never contract the dataset: it will be sized to accommodate the largest set of collected data from the run.
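The expand-and-pad behavior described here can be sketched in pure Python (hypothetical class name; the real implementation extends an HDF5 dataspace with an unlimited dimension):

```python
class GrowOnlyHistory:
    """Mimics the HDF5 dataset behavior described above: the row width
    only ever grows, and widening zero-pads all previously written rows
    (and any subsequent short rows)."""

    def __init__(self):
        self.width = 0
        self.rows = []

    def append_row(self, values):
        if len(values) > self.width:
            # Expanding the dataset extent appends 0s to every earlier row.
            self.width = len(values)
            self.rows = [r + [0.0] * (self.width - len(r)) for r in self.rows]
        # The dataset never contracts, so short rows are padded too.
        self.rows.append(list(values) + [0.0] * (self.width - len(values)))

h = GrowOnlyHistory()
h.append_row([1.0, 2.0])
h.append_row([3.0, 4.0, 5.0])  # widens: first row becomes [1.0, 2.0, 0.0]
h.append_row([6.0])            # padded to [6.0, 0.0, 0.0]
print(h.rows)
```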

Contributor

How hard would it be to write it out AoA style?

Collaborator Author

IIRC it will require defining a couple data types, one for the values extracted by a single index, and another variable-length type using the first type. Not sure how it plays in a dataspace with an unlimited dimension.

The bigger unknown is dealing with the write in parallel w/ MPIO since HDF5 is a bit cumbersome in that respect... it might handle variable-length writes from processes transparently, or might require explicitly being told everything about the write (which is mostly the case right now).
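For reference, a serial-only sketch of the variable-length ("AoA-style") idea using h5py; this is not what the PR implements, and how such a type behaves under parallel MPIO writes is exactly the open question above:

```python
import h5py
import numpy as np

# A variable-length element type: each dataset cell holds a 1-D float array.
vlen = h5py.vlen_dtype(np.float64)

# In-memory file so the sketch leaves no artifact on disk.
with h5py.File("sketch.h5", "w", driver="core", backing_store=False) as f:
    dset = f.create_dataset("history", shape=(3,), dtype=vlen)
    dset[0] = np.array([1.0, 2.0])
    dset[1] = np.array([3.0, 4.0, 5.0])  # rows may differ in length
    dset[2] = np.array([6.0])            # no zero padding needed
    lengths = [len(dset[i]) for i in range(3)]

print(lengths)  # [2, 3, 1]
```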

Collaborator

I guess this will only happen if the collection has not been written to file yet. So, basically, if the field is resized in the same packCollection event before being written to file then all time-steps will have the field with the maximum size. But this should not happen if an output event occurs before the resizing. Am I wrong?

Collaborator Author

A size increase at any point results in the dimensions of the hdf dataspace being changed, so even rows already written to file will have 0s appended when the extent is changed.

Correlating the rows and the mesh object coordinates should in most cases allow the determination of which elements are padding elements and which are not.
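One way to apply that correlation in post-processing (a hypothetical sketch, assuming padded coordinate entries read back as all-zero rows):

```python
def padding_mask(coord_rows):
    """Given one time step's elementCenter rows, flag entries whose
    coordinates are identically zero as padding.  This heuristic would
    only misfire for a real element centered exactly at the origin."""
    return [all(c == 0.0 for c in row) for row in coord_rows]

coords = [[0.5, 0.0, 0.0],   # real element
          [0.0, 0.0, 0.0],   # padding appended by a later dataset expansion
          [1.5, 2.0, 0.0]]   # real element
print(padding_mask(coords))  # [False, True, False]
```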

Member

@rrsettgast rrsettgast left a comment

just some nitpick things

src/coreComponents/dataRepository/HistoryDataSpec.hpp (outdated, resolved)
src/coreComponents/fileIO/Outputs/TimeHistoryOutput.cpp (outdated, resolved)
src/coreComponents/fileIO/timeHistory/HistoryIO.hpp (outdated, resolved)
HistoryCollection::initializePostSubGroups();
if( !m_initialized )
{
DomainPartition & domain = this->getGroupByPath< DomainPartition >( "/Problem/domain" );
Member

We have to fix this still...I was avoiding calls to getGlobalState() due to dependency issues, but I hadn't provided an accessor for domain. It isn't pleasant as it stands.

src/coreComponents/fileIO/timeHistory/PackCollection.cpp (outdated, resolved)
wrtobin and others added 2 commits June 17, 2021 09:49
Co-authored-by: Randolph Settgast <settgast1@llnl.gov>
@castelletto1
Contributor

I tried the integrated test PoroElasticTerzaghi_FIM.xml. Inspecting the pressure history output I see:

[Screenshot, 2021-06-18: pressure history output]

I understand the pressure dataset: each row gives the pressure at an output time level (as shown in the pressure Time dataset) for the cell collection. What is not clear to me is the pressure elementCenter dataset: I would expect in each row the cell 0-coordinates -- in this snapshot, 1- and 2-coordinates look the same -- at the desired output time level, but I can only see zero values.

Note that all values in the pressure dataset are zero because the traction field specification has to be done differently. This has been fixed in #1401 . However, this should have no impact on the pressure elementCenter dataset.

@wrtobin
Collaborator Author

wrtobin commented Jun 21, 2021

@castelletto1
I made a change to when the coordinates were first collected at some point which removed an extra row from the output (in order to get direct correspondence between the "time index" in all output datasets). Apparently without that extra collection the coordinates of static sets wouldn't be correctly collected due to this bug.

The last commit should fix this bug, thanks for catching it.

@wrtobin
Collaborator Author

wrtobin commented Jul 1, 2021

This should be ready to go in.

https://github.com/GEOSX/integratedTests/pull/139 needs to be merged in first to account for the XML changes to the baselines that use the time history functionality.

@wrtobin wrtobin added ci: run CUDA builds Allows to triggers (costly) CUDA jobs and removed effort: 1 week flag: ready for review flag: requires rebaseline Requires rebaseline branch in integratedTests labels Jul 15, 2021
@rrsettgast rrsettgast merged commit d1e9ef7 into develop Jul 15, 2021
@rrsettgast rrsettgast deleted the feature/wrtobin/dynamic-time-hist branch July 15, 2021 16:39