[Bug] History collection gives an out-of-bounds error with EmbeddedSurfaceSubRegion. #1356

Closed · CusiniM opened this issue Mar 15, 2021 · 5 comments · Fixed by #1360
Labels: type: bug (Something isn't working)

Comments

@CusiniM (Collaborator) commented Mar 15, 2021

Describe the bug
Out-of-bounds error in the pack collection when running cases with an EmbeddedSurfaceSubRegion.

To Reproduce
Just try to run Tutorial 10 with a debug build.

@wrtobin (Collaborator) commented Mar 15, 2021

Just reproduced this on quartz, I'll take a look.

@wrtobin (Collaborator) commented Mar 15, 2021

Bypassing the initial issue -- that the set we're collecting from is initially empty -- is relatively straightforward.

The bigger issue is that later on, when we go to collect the output, the buffer won't be allocated because the collector thinks the size is 0. I'm in the middle of triaging what is needed to add the functionality (some of it is in place, but some things remain to be worked out).

I'll update when I have an idea what will be required to get the new functionality up and running.

@CusiniM (Collaborator, Author) commented Mar 15, 2021

Let me know if I can help. I am also working on synchronizing data across ranks in MPI runs in #1354. I wonder if we can do something at that stage for the timeHistory as well. It's not at all the same issue, but both are caused by the fact that elements are created later on and not when the mesh is generated.

@wrtobin (Collaborator) commented Mar 16, 2021

I spent some time this morning looking at this.

Changing collection operations to allow the collected data to change size is pretty straightforward and doesn't require a big change.

The difficulty is writing the data to file, since we buffer our collection operations internally until we go to write a bunch of them at once. This requires accounting for a number of things with MPI+HDF5 in terms of data pseudo-partitioning, dataspace sizing, and changing the way we write the data from all-at-once to incremental (we can't select a single hyperslab over the whole dataspace in HDF5 when one of the dimensions changes extent, so we treat each collected 'row' as a hyperslab and write them incrementally).
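
To make the incremental pattern concrete, here is a minimal sketch (not GEOSX code; appendRow and its arguments are made up for illustration) of extending a dataset by one row and writing only that row as a hyperslab. It assumes the dataset was created with a chunked layout and H5S_UNLIMITED as the maximum extent in the row dimension:

```cpp
// Hypothetical helper: append one collected "row" to an extensible 2-D dataset.
// Assumes the dataset was created chunked with maxdims[0] == H5S_UNLIMITED.
#include <hdf5.h>
#include <vector>

void appendRow( hid_t dset, std::vector< double > const & row )
{
  // Query the current extent: dims[0] = rows written so far, dims[1] = values per row.
  hid_t fspace = H5Dget_space( dset );
  hsize_t dims[2];
  H5Sget_simple_extent_dims( fspace, dims, nullptr );
  H5Sclose( fspace );

  // Grow the dataset by one row.
  hsize_t newDims[2] = { dims[0] + 1, dims[1] };
  H5Dset_extent( dset, newDims );

  // Select the newly added row as a hyperslab in the (re-queried) file dataspace.
  fspace = H5Dget_space( dset );
  hsize_t start[2] = { dims[0], 0 };
  hsize_t count[2] = { 1, dims[1] };
  H5Sselect_hyperslab( fspace, H5S_SELECT_SET, start, nullptr, count, nullptr );

  // Write only that slab from a matching in-memory dataspace.
  hid_t mspace = H5Screate_simple( 2, count, nullptr );
  H5Dwrite( dset, H5T_NATIVE_DOUBLE, mspace, fspace, H5P_DEFAULT, row.data() );

  H5Sclose( mspace );
  H5Sclose( fspace );
}
```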

So we'll have to change our buffering + file writing operations. I am partway through what needs to get done, but it will probably take a couple of days to get everything put together, since there are a number of edge cases we can run into and it's best to handle them without breaking any of those cases.

Mostly these amount to dealing with data movement/creation/deletion across MPI between write operations:

  • What if we're buffering data and a rank loses all the indices being collected that are associated with it (migration or just creating/deleting underlying objects we're collecting from)
  • What if a rank that has no indices being collected creates/takes ownership of some
  • What if a rank goes from zero to nonzero and back to zero between write operations
  • What if a rank does the opposite

In particular, it is dealing with these in the context of HDF5 that represents the biggest issue, since HDF5 has some caveats w.r.t. using MPIO for simultaneous write operations.

These can all be handled, to be sure; we just need to make a number of adjustments to the current way we buffer + write.
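
On the MPIO caveat specifically, a hedged sketch of the zero-contribution pattern (again, not the actual implementation; collectiveWrite and its parameters are hypothetical): with collective transfers every rank must call H5Dwrite, so a rank that currently owns no collected indices selects nothing and still participates. This assumes a parallel HDF5 build:

```cpp
// Hypothetical collective write where some ranks may contribute zero rows.
// Requires HDF5 built with parallel (MPI-IO) support.
#include <hdf5.h>
#include <vector>

void collectiveWrite( hid_t dset, std::vector< double > const & localRows,
                      hsize_t rowOffset, hsize_t valuesPerRow )
{
  hsize_t const numLocalRows = localRows.size() / valuesPerRow;  // may be 0

  hid_t fspace = H5Dget_space( dset );
  hid_t mspace;
  if( numLocalRows > 0 )
  {
    // Select this rank's block of rows in the file.
    hsize_t start[2] = { rowOffset, 0 };
    hsize_t count[2] = { numLocalRows, valuesPerRow };
    H5Sselect_hyperslab( fspace, H5S_SELECT_SET, start, nullptr, count, nullptr );
    mspace = H5Screate_simple( 2, count, nullptr );
  }
  else
  {
    // Rank owns nothing right now: select no elements, but still make the
    // collective call so the other ranks aren't left waiting.
    H5Sselect_none( fspace );
    hsize_t one[2] = { 1, valuesPerRow };
    mspace = H5Screate_simple( 2, one, nullptr );
    H5Sselect_none( mspace );
  }

  hid_t dxpl = H5Pcreate( H5P_DATASET_XFER );
  H5Pset_dxpl_mpio( dxpl, H5FD_MPIO_COLLECTIVE );

  // Pass a dummy (non-null) buffer when this rank writes nothing.
  double dummy = 0.0;
  void const * buf = localRows.empty() ? static_cast< void const * >( &dummy )
                                       : static_cast< void const * >( localRows.data() );
  H5Dwrite( dset, H5T_NATIVE_DOUBLE, mspace, fspace, dxpl, buf );

  H5Pclose( dxpl );
  H5Sclose( mspace );
  H5Sclose( fspace );
}
```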

@CusiniM (Collaborator, Author) commented Mar 16, 2021

Thanks for looking into this. I think it is important to fix this.

Do you think that writing data incrementally will also allow us to start writing information related to solver performance (i.e., iteration counts etc.) to the HDF5 files? This would be a big plus.

I don't know much about either HDF5 or MPIO, but I think that for the use case I have in mind some of the things you listed should never happen, which may simplify things a bit.

Below is my comment on each difficult scenario you described.

  • What if we're buffering data and a rank loses all the indices being collected that are associated with it (migration or just creating/deleting underlying objects we're collecting from)? I don't think this will ever happen for the embedded fractures. Once a fracture element is created on a rank, I would expect it to stay there.
  • What if a rank that has no indices being collected creates/takes ownership of some? This is certainly something that can happen throughout a simulation. As a matter of fact, this happens all the time when the first fracture elements are created in the fracture generator.
  • What if a rank goes from zero to nonzero and back to zero between write operations? Again, I don't think this will happen. In general I would expect a rank to only see an increase in the number of fractured elements on it.
  • What if a rank does the opposite? Same as before.

I'd like to hear @rrsettgast's and @corbett5's opinions. Also, I wonder if this functionality could be useful for @homel1's work on the particles. I guess particles may introduce some of the cases you mentioned; in particular, I think they can move from one rank to another.
