
availableChunks open logic (HDF5: Early Chunk Read) #1035

Merged: 15 commits into openPMD:dev from fix-h5EarlyChunk on Jul 21, 2021

Conversation

@ax3l (Member, Author) commented on Jul 8, 2021:

This adds a test for early reading of chunks (via availableChunks) with HDF5, a pattern that appears quite often in chunk-based post-processing routines.

I have not yet found the reason for this failure, but it seems to be HDF5-specific (details documented in #961). Either the writable is a bit off or the file is not yet open.
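
For reference, a minimal sketch of the access pattern the new test exercises (serial, file-based HDF5; the sample path matches the test further below, the iteration number and record names are illustrative), assuming openPMD-api's C++ read API:

    #include <openPMD/openPMD.hpp>

    #include <iostream>

    int main()
    {
        using namespace openPMD;

        // serial read, file-based iteration encoding, HDF5 backend
        Series s("../samples/git-sample/data%T.h5", Access::READ_ONLY);

        // query chunks as the very first data-related call, before any load or flush
        auto electrons = s.iterations[400].particles["electrons"];
        ChunkTable chunks = electrons["position"]["x"].availableChunks();

        for (auto const &chunk : chunks)
            std::cout << "offset[0]=" << chunk.offset.at(0)
                      << " extent[0]=" << chunk.extent.at(0) << "\n";

        return 0;
    }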

for( auto & r_c : r.second )
{
    std::cout << r_c.first << "\n";
    if( !r_c.second.constant() )
@ax3l (Member, Author) commented on Jul 8, 2021:

Memo to myself (unrelated to this issue): this if can probably be removed now that #942 is implemented :)

Also: let's do something with the chunks variable (e.g. print it).

@franzpoeschel (Contributor) commented on Jul 8, 2021:

Ref.: https://github.com/openPMD/openPMD-api/pull/862/files
tldr: No file handles are open after parsing in file-based iteration encoding. Fixed by using auto electrons = s.iterations[400].open().particles["electrons"];.
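
A short sketch of that workaround in context, assuming the same git-sample file as in the test; only the Iteration::open() call differs from the early-query pattern above:

    #include <openPMD/openPMD.hpp>

    int main()
    {
        using namespace openPMD;

        Series s("../samples/git-sample/data%T.h5", Access::READ_ONLY);

        // Iteration::open() guarantees an open file handle before the first
        // data query on this iteration (cf. #862)
        auto electrons = s.iterations[400].open().particles["electrons"];
        auto chunks = electrons["position"]["x"].availableChunks();

        return 0;
    }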

This test should fail for other backends, too. I claim "no bug".

@ax3l (Member, Author) commented on Jul 8, 2021:

Thanks for the ideas!

The reason I tend to think that this is a usability bug, and that we should auto-open iterations in availableChunks() calls, is that we do the same for loadChunk().

From the API contract's perspective, it is also logical that if a serial

Series s = Series(...);

auto electrons = s.iterations[400].particles["electrons"];
auto chunks = electrons["position"]["x"].loadChunk();

works, then a

Series s = Series(...);

auto electrons = s.iterations[400].particles["electrons"];
auto chunks = electrons["position"]["x"].availableChunks();

should work the same way, especially since the latter would now be called before the former with #802-guided processing.

> Ref.: https://github.com/openPMD/openPMD-api/pull/862/files

Note that #862 is an MPI-parallel context where we want to trigger an MPI-collective open call with a collective guarantee, because we use loadChunk() and flush() in a non-collective way as the first data calls that follow (see the test in that PR). The problem here is serial.

Of course, if someone called availableChunks() as the first thing in an MPI-parallel context and only from one rank, then they would again be required to first perform a collective ::open() on the iteration, the same way as they need to for loadChunk().

> This test should fail for other backends, too.

The problem became apparent to me in serial reads with src/binding/python/openpmd_api/DaskDataFrame.py, where we have no access to the iteration. In my tests, ADIOS2 reads work and HDF5 reads fail (both are serial, file-based encoding).

this->dirty() = true;
this->seriesFlush();
this->dirty() = false;

@franzpoeschel (Contributor) commented:

I'm a bit doubtful about this:

  1. It makes the availableChunks() call collective; we should at least document that (see the sketch below).
  2. availableChunks() will now silently flush the whole Series. This will break user code that uses our new flushing logic and its guarantee of no writes occurring until flushing. If we really want to do this, this function must not do anything more than merely opening those files.
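
To illustrate the first point, a hypothetical sketch (not part of this PR) of how an implicitly flushing availableChunks() would behave in an MPI-parallel read, assuming openPMD-api's MPI-enabled Series constructor and an MPI-capable build:

    #include <openPMD/openPMD.hpp>

    #include <mpi.h>

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);

        using namespace openPMD;
        Series s("../samples/git-sample/data%T.h5", Access::READ_ONLY, MPI_COMM_WORLD);

        // If availableChunks() implicitly flushes (and thereby collectively opens)
        // the file, every rank has to reach this call; guarding it behind a rank
        // check would risk a deadlock on the first access to the iteration.
        auto chunks = s.iterations[400]
                          .particles["electrons"]["position"]["x"]
                          .availableChunks();

        MPI_Finalize();
        return 0;
    }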

@franzpoeschel (Contributor) commented on Jul 12, 2021:

> Thanks for the ideas!
>
> The reason I tend to think that this is a usability bug, and that we should auto-open iterations in availableChunks() calls, is that we do the same for loadChunk().
>
> From the API contract's perspective, it is also logical that if a serial
>
> Series s = Series(...);
>
> auto electrons = s.iterations[400].particles["electrons"];
> auto chunks = electrons["position"]["x"].loadChunk();

This code does not open the iteration; s.flush() does. Interacting with an iteration fundamentally requires opening it in openPMD, but Series::flush() as well as the Streaming API will do that automatically. The fundamental difference between availableChunks() and loadChunk() at this point is that the latter is a deferred operation and hence receives all the properties of a deferred operation, including that it is only performed upon flushing.

> works, then a
>
> Series s = Series(...);
>
> auto electrons = s.iterations[400].particles["electrons"];
> auto chunks = electrons["position"]["x"].availableChunks();
>
> should work the same way, especially since the latter would now be called before the former with #802-guided processing.
>
> > Ref.: https://github.com/openPMD/openPMD-api/pull/862/files

This would make availableChunks() collective too, while loadChunk() can remain non-collective (collective business is dealt with in Series::flush()). Just as imbalanced as before, though even a bit trickier to debug.

> Note that #862 is an MPI-parallel context where we want to trigger an MPI-collective open call with a collective guarantee, because we use loadChunk() and flush() in a non-collective way as the first data calls that follow (see the test in that PR). The problem here is serial.

We're making the problem parallel by implicitly opening files (and, with the current implementation, implicitly flushing).

> Of course, if someone called availableChunks() as the first thing in an MPI-parallel context and only from one rank, then they would again be required to first perform a collective ::open() on the iteration, the same way as they need to for loadChunk().
>
> > This test should fail for other backends, too.
>
> The problem became apparent to me in serial reads with src/binding/python/openpmd_api/DaskDataFrame.py, where we have no access to the iteration. In my tests, ADIOS2 reads work and HDF5 reads fail (both are serial, file-based encoding).

It's weird that this would even work in ADIOS2; I'll need to have a look at what's going on there.

Do you want Iteration::open() to fundamentally be necessary in parallel contexts only? I'm a bit hesitant to add exceptions for using Iteration::open() here and there, but if we set that as a fundamental guideline and have some clean general concepts, then that's another story.
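
As a minimal sketch of that distinction, assuming openPMD-api's C++ read API and the git-sample file used elsewhere in this PR: loadChunk() is deferred and only performed by Series::flush() (which also opens the file), whereas availableChunks() is answered at call time and therefore needs an already opened file handle.

    #include <openPMD/openPMD.hpp>

    int main()
    {
        using namespace openPMD;

        Series s("../samples/git-sample/data%T.h5", Access::READ_ONLY);
        auto x = s.iterations[400].particles["electrons"]["position"]["x"];

        // deferred: this only registers the read; the buffer is not filled yet
        Offset offset(x.getDimensionality(), 0);
        auto data = x.loadChunk<double>(offset, x.getExtent());

        // the actual I/O happens here; Series::flush() opens the iteration's file
        // as part of performing the deferred operation
        s.flush();

        // immediate: answered at call time; it works here because the flush above
        // (or an explicit Iteration::open()) already opened the file handle
        auto chunks = x.availableChunks();

        return 0;
    }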

@franzpoeschel (Contributor) commented:
After following the behavior of ADIOS2 in the debugger: this line implicitly opens an ADIOS2 engine if that has not yet happened. I would declare this a bug; such things should only happen if explicitly requested by the frontend.

@ax3l (Member, Author) commented on Jul 13, 2021:

We will discuss this further on Wednesday.

I think the cleanest way would be to:

  • implicitly open the file if it is still closed; if this is the first call, then this makes it collective (otherwise it is an independent call)
    • serial reads are also significantly simpler that way, e.g. the demonstrator test in this PR
  • make sure Iteration::open() works for this context; currently that is not the case (I tried it)

I would thus also like to discuss whether we really want to apply #1045, or whether we instead implement a similar solution here.

We currently don't need Iteration::open() except for the independent-first-access context described above. If we want to change that, I think we need to improve error reporting on non-opened backends, and we probably need to advertise that Iteration::open() always needs to be called, for consistency.

@ax3l changed the title from "HDF5: Early Chunk Read" to "availableChunks open logic (HDF5: Early Chunk Read)" on Jul 14, 2021
@franzpoeschel (Contributor) commented:
Memo to self: figure out why Iteration::open() won't work in this context.

try
{
    Series s = Series(
        "../samples/git-sample/data%T.h5",
@ax3l (Member, Author) commented:

Discussed: add a (file-based) ADIOS2 test, too, due to #1045.

@franzpoeschel (Contributor) commented:
elaborate on open() relation to flushing logic

The changes in ax3l#1 make open() independent of the flushing logic.

include/openPMD/backend/Attributable.hpp (review comments resolved)
src/backend/Attributable.cpp (review comments resolved)
@@ -4288,6 +4326,48 @@ void deferred_parsing( std::string const & extension )
std::numeric_limits< float >::epsilon() );
}
}
{
Series series(
@ax3l (Member, Author) commented:

Note: this diff also fixes & tests read_write lazy parsing.
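
For illustration, a sketch of a read_write lazy-parsing read; the defer_iteration_parsing option and the exact calls are assumptions based on openPMD-api's deferred-parsing feature, not copied from this diff:

    #include <openPMD/openPMD.hpp>

    int main()
    {
        using namespace openPMD;

        // with deferred parsing, iterations are not parsed when the Series is
        // constructed, but only when explicitly opened
        Series series(
            "../samples/git-sample/data%T.h5",
            Access::READ_WRITE,
            R"({"defer_iteration_parsing": true})");

        // lazily parsed iterations must be opened before their contents
        // (records, attributes, chunks) can be inspected
        auto it = series.iterations[400].open();
        auto chunks =
            it.particles["electrons"]["position"]["x"].availableChunks();

        return 0;
    }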

@ax3l enabled auto-merge (squash) on July 21, 2021 07:22
@ax3l disabled auto-merge on July 21, 2021 16:09
@ax3l merged commit 020177a into openPMD:dev on Jul 21, 2021
@ax3l deleted the fix-h5EarlyChunk branch on July 21, 2021 16:09
Successfully merging this pull request may close this issue:

Early Query: availableChunks & HDF5