Update examples #3856

Merged
merged 4 commits into from
Oct 20, 2023

Conversation

spyridon97
Contributor

  1. Examples: Add BeginStep/EndStep wherever it was missing
  2. Examples: Use BPFile instead of BP3/4/5 for future-proofing
  3. Update the bpWriterReadHip example's CMake to run on Crusher

This PR fixes #3855, #3840.

@spyridon97 spyridon97 force-pushed the update-examples branch 2 times, most recently from 796bc9c to 8a009bc Compare October 19, 2023 16:58
@spyridon97
Contributor Author

@caitlinross can you understand why this test fails? https://open.cdash.org/test/1266322909

I have beginStep and endStep.

@williamfgc
Contributor

@spyridon97 this could be a race condition in which the reader is faster than the writer. The error message says the metadata is empty, but the exception should be fixed in BP5, or the example could just rely on checking the variable.

@spyridon97
Contributor Author

> @spyridon97 this could be a race condition in which the reader is faster than the writer. The error message says the metadata is empty, but the exception should be fixed in BP5, or the example could just rely on checking the variable.

What do you propose to fix it?

@caitlinross
Collaborator

caitlinross commented Oct 19, 2023

I don't think it's a race condition. The read test doesn't run until the write test completes (I checked: there's a dependency set in the relevant CMakeLists.txt). But the write test is segfaulting (not sure why), so the read test fails because the BP file written by the write test is corrupted. Did you try running that test locally to see if it fails? It looks like I had it set to use the BP4 engine before, so I'm not sure why it would fail after switching to the BP5 engine.

@caitlinross
Collaborator

caitlinross commented Oct 19, 2023

I'm getting something weird when trying to run locally. To run: ctest -V -R Install.CMake.Encryption

Running it results in 2 steps:

  1. Install.Setup, where the ADIOS library is built. This part completes fine locally and looks like it passes in CI as well.
  2. Install.CMake.EncryptionOperator, which has a CMake file that is its own project, separate from the other CMake files for ADIOS. It finds the adios2 package (built in step 1), builds the write and read examples, and then runs those 2 tests. If both tests pass, then the outer step 2 (Install.CMake.EncryptionOperator) passes. If I change the engine back to BP4, there's no issue. But when I change it to BP5, it seems to hang at the part where it's building the write/read examples. I have no idea how that change could cause it to hang at that point...

Actually looks like it hangs now in CI too: https://open.cdash.org/test/1266359116

@caitlinross
Collaborator

Okay, I think the hanging/timeout is a red herring. The tests are being built and run, but because step 2 above contains its own tests, the way those tests fail makes it look like it's hanging when run through CTest.

Anyway, I ran the example outside of CTest to investigate the segfault. It fails at BP5Serializer.cpp:736, where Offsets is null.

@eisenhauer the example code in question is adios2/examples/plugins/operator/examplePluginOperatorWrite.cpp. A local variable is created and then an operator (the EncryptionOperator plugin) is added to it. If I switch the engine back to BP4, it runs successfully. If I set the engine to BP5 and remove the operator, it's fine too. It seems to be an issue with having an operator on a local variable in BP5. Any known issues for this? Perhaps I am doing something incorrectly in the plugin that was fine before BP5?

@eisenhauer
Member

BP5 does a lot of things differently than BP4, including keeping more information in the engine (and not in the variable, which might be shared with other engines). I'll take a look at this, but it'll probably be tomorrow before I get to it.
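
As a rough illustration of the point about keeping state in the engine rather than in a shared variable, here is a toy sketch. It does not use ADIOS2 at all: `Variable`, `Engine`, `Put`, and `BytesWritten` are hypothetical names made up for this example, and the bookkeeping shown (a byte counter) is only a stand-in for whatever BP5 actually tracks.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>

// Hypothetical variable object that several engines may share.
struct Variable
{
    std::string Name;
};

// Hypothetical engine that keeps its own per-variable bookkeeping,
// keyed by the variable's address, instead of storing it in Variable.
class Engine
{
public:
    // Records that `bytes` were written for `var` through this engine only.
    void Put(const Variable &var, std::size_t bytes)
    {
        m_BytesWritten[&var] += bytes;
    }

    // Returns the bytes this engine has recorded for `var` (0 if none).
    std::size_t BytesWritten(const Variable &var) const
    {
        auto it = m_BytesWritten.find(&var);
        return it == m_BytesWritten.end() ? 0 : it->second;
    }

private:
    std::unordered_map<const Variable *, std::size_t> m_BytesWritten;
};
```

With this layout, two engines writing the same `Variable` never see each other's state, which is the property the comment above describes.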

@eisenhauer
Member

Actually did get a chance for a quick look. BP3/4 operator stuff was based on the BPInfo struct which was largely eliminated in BP5 because it was responsible for significant overhead. So operator handling was relocated and yes, we must not have any tests for operators on local variables because that case was missed. Can you try it with this patch?

diff --git a/source/adios2/toolkit/format/bp5/BP5Serializer.cpp b/source/adios2/toolkit/format/bp5/BP5Serializer.cpp
index 8b10c30b4..017f23854 100644
--- a/source/adios2/toolkit/format/bp5/BP5Serializer.cpp
+++ b/source/adios2/toolkit/format/bp5/BP5Serializer.cpp
@@ -733,7 +733,8 @@ void BP5Serializer::Marshal(void *Variable, const char *Name, const DataType Typ
             for (size_t i = 0; i < DimCount; i++)
             {
                 tmpCount.push_back(Count[i]);
-                tmpOffsets.push_back(Offsets[i]);
+		if (Offsets)
+		    tmpOffsets.push_back(Offsets[i]);
             }
             size_t AllocSize = ElemCount * ElemSize + 100;
             BufferV::BufferPos pos = CurDataBuffer->Allocate(AllocSize, ElemSize);

If that works we can format it properly and get it in a PR.
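
The guard in the patch can be looked at in isolation with a small stand-alone sketch. It is not the actual `BP5Serializer::Marshal` code: `DimInfo` and `CollectDims` are hypothetical names, and it assumes (as the discussion suggests) that a null `Offsets` pointer corresponds to the local-variable case.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the dimension data collected per block.
struct DimInfo
{
    std::vector<std::size_t> Count;
    std::vector<std::size_t> Offsets; // stays empty when no offsets are given
};

// Copies per-dimension counts and, only when present, offsets.
// Offsets may be nullptr (assumed here to be the local-variable case),
// so it is checked before being dereferenced.
DimInfo CollectDims(const std::size_t *Count, const std::size_t *Offsets,
                    std::size_t DimCount)
{
    DimInfo info;
    for (std::size_t i = 0; i < DimCount; i++)
    {
        info.Count.push_back(Count[i]);
        if (Offsets)
        {
            info.Offsets.push_back(Offsets[i]);
        }
    }
    return info;
}
```

Calling `CollectDims(count, nullptr, 2)` fills `Count` and leaves `Offsets` empty instead of dereferencing a null pointer.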

@caitlinross
Collaborator

Yup that fixes it.

@pnorbert
Contributor

pnorbert commented Oct 19, 2023 via email

@eisenhauer
Member

eisenhauer commented Oct 19, 2023

> Yup that fixes it.

Sweet! I gotta run now, but I'll do a PR later, or you guys can, whichever. (And I guess we might want to think about a test for that case? Or I guess the examples are essentially that, since they're running?)

@caitlinross
Collaborator

@eisenhauer I can go ahead and take care of it. And it'd be pretty easy to write a test for this case (basically the same thing this encryption operator write test is doing, except using a built-in compression operator to simplify the test), so I can add that as well.

@williamfgc
Contributor

@pnorbert agree to drop BP3/BP4. BP5 has the advantage of all the learning from BP3 and BP4 and breaks too much API. I wouldn't even call it BP5, just BPFile for the end user moving forward.

@caitlinross
Collaborator

@spyridon97 if you rebase on master, that operator test should be fixed now.

@spyridon97
Contributor Author

@caitlinross can you approve this PR?

@spyridon97 spyridon97 merged commit 2d4af78 into ornladios:master Oct 20, 2023
32 checks passed
Development

Successfully merging this pull request may close these issues.

Example fails to build in Crusher
5 participants