phiGRAPE MPI support is broken #1090
Comments
@LourensVeen, I did what you suggested (after modifying the phiGRAPE Makefile to allow for multiple data formats) on OSX 15.2 / gcc14 / python3.12.8 from MacPorts:
[obas-rech-cmb.astro.unistra.fr:24593] pmix_mca_base_component_repository_open: unable to open /opt/locaress_zlib.so: File /opt/local/lib/openmpi-mp/pmix/pmix_mca_pcompress_zlib.so.sl not found (ignored)
===================================== warnings summary =========================
-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
So apart from the warning about a file that cannot be found on the system (new OSX security issues?), I don't have any test failures.
Hm, interesting. It's been a while since I looked at this, and it may be that I misunderstood something. I'll have another go at it after the new release, see if I can either get it to work or get a better understanding of what the problem is. Thanks for testing!
I ran into this again while testing the new installer more, and I think I've found a problem: the way I was injecting values into the phigrape Makefile didn't pass in the right flags, and that caused MPI and non-MPI code to get mixed up. The behaviour of make changes a bit depending on whether variables were set in the environment or not, and I misunderstood how overriding works. So now I can run test11 with 2 workers, but on the other hand, if I set the MPI worker as the default it still fails some of the tests:
Could you try any of these tests with more than one worker?
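For anyone following along, the make behaviour mentioned above can be checked with a small script. This is only a sketch of GNU make's documented precedence rules, not the phigrape Makefile: the variable name `FLAGS`, the toy Makefile and the helper names are all made up for illustration, and the script assumes `make` on the PATH is GNU make.

```python
import os
import shutil
import subprocess
import tempfile

# Toy Makefile (hypothetical, not the phigrape one): it assigns FLAGS itself
# and then prints whatever value make ends up using.
MAKEFILE = "FLAGS = from-makefile\n$(info FLAGS=$(FLAGS))\nall: ;\n"


def have_gnu_make():
    """Return True if a GNU make executable is available as 'make'."""
    if shutil.which("make") is None:
        return False
    version = subprocess.run(["make", "--version"], capture_output=True, text=True)
    return "GNU Make" in version.stdout


def run_make(env_extra=None, args=()):
    """Run make on the toy Makefile and return the FLAGS line it prints."""
    # Start from a clean slate so the caller's environment does not interfere.
    env = {k: v for k, v in os.environ.items()
           if k not in ("MAKEFLAGS", "MFLAGS", "FLAGS")}
    env.update(env_extra or {})
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "Makefile")
        with open(path, "w") as handle:
            handle.write(MAKEFILE)
        out = subprocess.run(["make", "-s", "-f", path, *args], cwd=tmp, env=env,
                             capture_output=True, text=True, check=True)
    return out.stdout.splitlines()[0]


def demo():
    """Compare the three ways of passing FLAGS; None if GNU make is missing."""
    if not have_gnu_make():
        return None
    return {
        "unset": run_make(),
        "environment": run_make(env_extra={"FLAGS": "from-env"}),
        "command line": run_make(args=["FLAGS=from-cmdline"]),
    }


if __name__ == "__main__":
    print(demo())
```

In GNU make, a plain assignment in the Makefile wins over a value from the environment (unless make is run with `-e`), while a value given on the command line wins over the Makefile's assignment (unless the Makefile uses the `override` directive). So a flag exported in the environment can be silently ignored while the same flag on the command line takes effect.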
Describe the bug
While converting phiGRAPE to the new build system, I tried removing the non-MPI worker and just putting in the MPI worker as the default, seeing as we're assuming that MPI is available everywhere anyway. This caused some of the tests to fail, in particular test11, which removes particles and then adds new particles representing merged pairs.
The original tests run only the non-MPI worker, except for tests 15, 16 and 19, which are run with multiple workers but don't remove or add particles. So it looks like this was never tested. I haven't been able to find the exact cause, but it looks like adding and removing particles was simply never implemented correctly for the MPI-enabled case.
To Reproduce
Modify TestPhigrape::test11 to run with two workers rather than one.
Expected behavior
The code should give correct results rather than getting its internal state messed up, and thus pass the test.
Environment (please complete the following information):
Additional context
I'll build the non-MPI worker as well for now in the new build system and disable the MPI one in interface.py to avoid giving people incorrect results.
Another thing to take away from this is that the tests should really be set up so that every test is run for every worker, rather than running most tests only on the default worker and then adding one or two for the other workers. The latter greatly reduces test coverage for the non-default workers, while not saving any work. (Except that you'd then have to fix all the additional bugs you find, but that's better than incorrect results.)
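As a rough illustration of that test layout, here is a minimal sketch using only the standard library's `unittest`. The worker names, the configuration list and the `FakeCode` class are hypothetical stand-ins, not the real phiGRAPE interface; the point is only that one test body runs once per worker configuration.

```python
import unittest

# Hypothetical (worker name, number of workers) combinations. In practice
# these would be the workers the code's interface.py actually offers.
CONFIGURATIONS = [
    ("worker_serial", 1),
    ("worker_mpi", 1),
    ("worker_mpi", 2),
]


class FakeCode:
    """Stand-in for a community code instance, NOT the real phiGRAPE API."""

    def __init__(self, worker, number_of_workers):
        self.worker = worker
        self.number_of_workers = number_of_workers
        self.masses = {}
        self._next_index = 0

    def add_particle(self, mass):
        index = self._next_index
        self._next_index += 1
        self.masses[index] = mass
        return index

    def remove_particle(self, index):
        del self.masses[index]


class TestEveryWorker(unittest.TestCase):
    def test_merge_pair(self):
        # Same test body for every configuration; subTest reports each
        # failing configuration separately instead of stopping at the first.
        for worker, count in CONFIGURATIONS:
            with self.subTest(worker=worker, number_of_workers=count):
                code = FakeCode(worker, count)
                first = code.add_particle(1.0)
                second = code.add_particle(2.0)
                # Replace the pair by a single merged particle.
                code.remove_particle(first)
                code.remove_particle(second)
                merged = code.add_particle(3.0)
                self.assertEqual(code.masses, {merged: 3.0})


def run_suite():
    """Run the test case and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestEveryWorker)
    return unittest.TextTestRunner(verbosity=0).run(suite)


if __name__ == "__main__":
    run_suite()
```

With this layout, adding a worker means adding one entry to the configuration list, and every existing test then covers it automatically.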