[Bug] History collection gives an out-of-bounds error with EmbeddedSurfaceSubRegion. #1356

Closed · CusiniM opened this issue Mar 15, 2021 · 5 comments · Fixed by #1360
Labels: type: bug (Something isn't working)

Comments

@CusiniM (Collaborator) commented Mar 15, 2021

Describe the bug
Out-of-bounds error in the pack collection when running cases with an EmbeddedSurfaceSubRegion.

To Reproduce
Just try to run Tutorial 10 with a debug build.

@wrtobin (Collaborator) commented Mar 15, 2021

Just reproduced this on quartz, I'll take a look.

@wrtobin (Collaborator) commented Mar 15, 2021

Bypassing the initial issue -- that the set we're collecting from is initially empty -- is relatively straightforward.

The bigger issue is that later on, when we go to collect the output, the buffer won't be allocated because the collector thinks the size is 0. I'm in the middle of triaging what is needed to add the functionality (some of it is in place, but some things remain to be worked out).

I'll update when I have an idea what will be required to get the new functionality up and running.

@CusiniM (Collaborator, Author) commented Mar 15, 2021

Let me know if I can help. I am also working on synchronizing data across ranks in MPI runs in #1354. I wonder if we can do something at that stage for the timeHistory as well. It's not at all the same issue, but both are caused by the fact that elements are created later on and not when the mesh is generated.

@wrtobin (Collaborator) commented Mar 16, 2021

I spent some time this morning looking at this.

Changing collection operations to allow the collected data to change size is pretty straightforward and doesn't require a big change.

The difficulty is writing the data to file, since we buffer our collection operations internally until we go to write a bunch of them at once. This requires accounting for a number of things with MPI+HDF5 in terms of data pseudo-partitioning, dataspace sizing, and changing the way we write the data from all-at-once to incremental (we can't select a single hyperslab over the whole dataspace in HDF5 when one of the dimensions changes extent, so we treat each collected 'row' as a hyperslab and write them incrementally).
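
To make the incremental pattern concrete, here is a minimal sketch (not GEOSX code; appendRow and its arguments are made up for illustration) of extending a dataset by one row and writing only that row as a hyperslab. It assumes the dataset was created with a chunked layout and H5S_UNLIMITED as the maximum extent in the row dimension:

```cpp
// Hypothetical helper: append one collected "row" to an extensible 2-D dataset.
// Assumes the dataset was created chunked with maxdims[0] == H5S_UNLIMITED.
#include <hdf5.h>
#include <vector>

void appendRow( hid_t dset, std::vector< double > const & row )
{
  // Query the current extent: dims[0] = rows written so far, dims[1] = values per row.
  hid_t fspace = H5Dget_space( dset );
  hsize_t dims[2];
  H5Sget_simple_extent_dims( fspace, dims, nullptr );
  H5Sclose( fspace );

  // Grow the dataset by one row.
  hsize_t newDims[2] = { dims[0] + 1, dims[1] };
  H5Dset_extent( dset, newDims );

  // Select the newly added row as a hyperslab in the (re-queried) file dataspace.
  fspace = H5Dget_space( dset );
  hsize_t start[2] = { dims[0], 0 };
  hsize_t count[2] = { 1, dims[1] };
  H5Sselect_hyperslab( fspace, H5S_SELECT_SET, start, nullptr, count, nullptr );

  // Write only that slab from a matching in-memory dataspace.
  hid_t mspace = H5Screate_simple( 2, count, nullptr );
  H5Dwrite( dset, H5T_NATIVE_DOUBLE, mspace, fspace, H5P_DEFAULT, row.data() );

  H5Sclose( mspace );
  H5Sclose( fspace );
}
```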

So we'll have to change our buffering + file writing operations. I am partway through what needs to get done, but it will probably take a couple of days to get everything put together, since there are a number of edge cases we can run into and it's best to handle them without breaking any of those cases.

Mostly these amount to dealing with data movement/creation/deletion across MPI between write operations:

  • What if we're buffering data and a rank loses all the indices being collected that are associated with it (migration or just creating/deleting underlying objects we're collecting from)
  • What if a rank that has no indices being collected creates/takes ownership of some
  • What if a rank goes from zero to nonzero and back to zero between write operations
  • What if a rank does the opposite

In particular, it is dealing with these in the context of HDF5 that represents the biggest issue, since HDF5 has some caveats w.r.t. using MPIO for simultaneous write operations.

These can all be handled, to be sure; we just need to make a number of adjustments to the current way we buffer + write.
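
On the MPIO caveat specifically, a hedged sketch of the zero-contribution pattern (again, not the actual implementation; collectiveWrite and its parameters are hypothetical): with collective transfers every rank must call H5Dwrite, so a rank that currently owns no collected indices selects nothing and still participates. This assumes a parallel HDF5 build:

```cpp
// Hypothetical collective write where some ranks may contribute zero rows.
// Requires HDF5 built with parallel (MPI-IO) support.
#include <hdf5.h>
#include <vector>

void collectiveWrite( hid_t dset, std::vector< double > const & localRows,
                      hsize_t rowOffset, hsize_t valuesPerRow )
{
  hsize_t const numLocalRows = localRows.size() / valuesPerRow;  // may be 0

  hid_t fspace = H5Dget_space( dset );
  hid_t mspace;
  if( numLocalRows > 0 )
  {
    // Select this rank's block of rows in the file.
    hsize_t start[2] = { rowOffset, 0 };
    hsize_t count[2] = { numLocalRows, valuesPerRow };
    H5Sselect_hyperslab( fspace, H5S_SELECT_SET, start, nullptr, count, nullptr );
    mspace = H5Screate_simple( 2, count, nullptr );
  }
  else
  {
    // Rank owns nothing right now: select no elements, but still make the
    // collective call so the other ranks aren't left waiting.
    H5Sselect_none( fspace );
    hsize_t one[2] = { 1, valuesPerRow };
    mspace = H5Screate_simple( 2, one, nullptr );
    H5Sselect_none( mspace );
  }

  hid_t dxpl = H5Pcreate( H5P_DATASET_XFER );
  H5Pset_dxpl_mpio( dxpl, H5FD_MPIO_COLLECTIVE );

  // Pass a dummy (non-null) buffer when this rank writes nothing.
  double dummy = 0.0;
  void const * buf = localRows.empty() ? static_cast< void const * >( &dummy )
                                       : static_cast< void const * >( localRows.data() );
  H5Dwrite( dset, H5T_NATIVE_DOUBLE, mspace, fspace, dxpl, buf );

  H5Pclose( dxpl );
  H5Sclose( mspace );
  H5Sclose( fspace );
}
```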

@CusiniM (Collaborator, Author) commented Mar 16, 2021

Thanks for looking into this. I think it is important to fix this.

Do you think that writing data incrementally will also allow us to start writing information related to solver performance (i.e., iteration counts etc.) to the HDF5 files? This would be a big plus.

I don't know much about either HDF5 or MPIO, but I think that for the use case I have in mind some of the things you listed should never happen, which may simplify things a bit.

Below is my comment on each difficult scenario you described.

  • What if we're buffering data and a rank loses all the indices being collected that are associated with it (migration or just creating/deleting underlying objects we're collecting from)? I don't think this will ever happen for the embedded fractures. Once a fracture element is created on a rank, I would expect it to stay there.
  • What if a rank that has no indices being collected creates/takes ownership of some? This is certainly something that can happen throughout a simulation. As a matter of fact, this happens all the time when the first fracture elements are created in the fracture generator.
  • What if a rank goes from zero to nonzero and back to zero between write operations? Again, I don't think this will happen. In general I would expect a rank to only see an increase in the number of fractured elements on it.
  • What if a rank does the opposite? Same as before.

I'd like to hear @rrsettgast's and @corbett5's opinions. Also, I wonder if this functionality could be useful for @homel1's work on the particles. I guess particles may introduce some of the cases you mentioned; in particular, I think they can move from one rank to another.
