Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Navigation refactoring moving navigators in VecGeom #264

Merged
merged 3 commits into from
Oct 13, 2023

Conversation

agheata
Copy link
Contributor

@agheata agheata commented Oct 13, 2023

This combines the VecGeom version change and the draft BVH refactoring, to have a coherent migration in one go:

  • Moving the requested version of VecGeom to 2.0.0-dev.3, which contains the navigators moved from AdePT, and needed by Draft BVH Refactoring #261. We need to use VecGeom_VERSION_STRING to select dev versions of VecGeom until the 2.0.0 release

  • Adapting the BVHNavigator to work with the modified BVH in VecGeom (by @JuanGonzalezCaminero)

  • Validation done:

    • Example17, 1 event, 1000 e-, Gun 10-170 in Theta, 0-360 in Phi: Numerically identical to master
    • TestEM3: Numerically identical to master
  • Performance comparison:

    • Example17, 64 events, 1000 e-, Gun 10-170 in Theta, 0-360 in Phi.
    • Average Event time ratio new/master : 1.00081
    • Average ECAL time ratio new/master : 1.00423
    • Average Non EM time ratio new/master: 0.99945

JuanGonzalezCaminero and others added 3 commits October 13, 2023 09:23
* Adapted the BVHNavigator to work with the modified BVH in Vecgeom
* Increased device stack limit in example17
* Move BVH and Loop navigators to VecGeom
* Take into account that older versions do not provide VecGeom_VERSION_STRING
@phsft-bot
Copy link

Can one of the admins verify this patch?


} // End namespace COPCORE_IMPL
#endif // RT_LOOP_NAVIGATOR_H_
#endif // RT_LOOP_NAVIGATOR_H_
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

RT_NAVIGATOR_H_

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK we fix this in a separate PR not to re-trigger the CI now

@agheata agheata merged commit 0e51a1e into apt-sim:master Oct 13, 2023
1 check passed
@hahnjo
Copy link
Contributor

hahnjo commented Oct 13, 2023

Hi, I ran my usual tests before and after this PR, also switching VecGeom from v1-patches (ie what we used so far) to master:

On AdePT commit eb811d9 (before this PR), comparing v1-patches to VecGeom commit https://gitlab.cern.ch/VecGeom/VecGeom/-/commit/804dcfadd0253ba81a2875da99cf40a2007b3f94 (before the breaking changes) shows many small differences with cms2018 and the BVH navigation (example12 and later). Do we understand where they come from? Even worse, example11 with -gdml_file cms2018.gdml and also example14 crash with an illegal memory access was encountered...

Then updating AdePT and VecGeom to the latest master, I see again differences; also in example17, so not numerically identical results. @agheata can you share the macro file? I use the same validation setup as in the past with 48 events of 1000 electrons: example17_validation.10GeV.48x1000.GPU.t12.mac.txt Moreover I observe the same crashes with example11 and example14...

For performance, I see a quite measurable time difference for both migrations with stronger magnetic fields (example19 with B_z = 3.8T)...

@agheata
Copy link
Contributor Author

agheata commented Oct 13, 2023

The crashes in examples when using cms_2018 are likely due to not increasing the stack limit, after a commit in VecGeom that was setting it to 8KB by default. Many examples are already protected, none is crashing on my machine or on the test machine, but we probably need to extend the protection. About the reproducibility and performance measurements, @JuanGonzalezCaminero can comment.

@JuanGonzalezCaminero
Copy link
Contributor

Hi Jonas, indeed the results are no longer numerically identical, the comparison was done before the commit moving the navigators to VecGeom.
However we can't compare the results from VecGeom commit 804dcfad to the current master because commit 0693dd8e does produce differences in the results. The comparison with commit 44ec0798 does produce identical results, however we can no longer run it with AdePT since this PR doesn't include This commit with the partial changes (Modified navigator but still in AdePT)

@agheata
Copy link
Contributor Author

agheata commented Oct 13, 2023

However we can't compare the results from VecGeom commit 804dcfad to the current master because commit 0693dd8e does produce differences in the results.

This is the fix for extruded solids non-symmetrical about XoY

@hahnjo
Copy link
Contributor

hahnjo commented Oct 13, 2023

indeed the results are no longer numerically identical, the comparison was done before the commit moving the navigators to VecGeom.

Ok, but there are two sources of differences that should be understood individually:

  1. Moving from v1-patches to master before the BVH refactoring - where does this come from?
  2. Refactoring the BVH and moving the navigators from AdePT to VecGeom - if the code is moved 1:1, I would expect it to produce identical results. Why is that not the case?

because commit 0693dd8e does produce differences in the results

This is the fix for extruded solids non-symmetrical about XoY

This commit touches PolygonalShell which is only used by simple extruded solids. There are no extruded solids in cms2018 (they are used in the HGCal for Phase2), so I'm not sure I understand how this can be responsible for changing results?

The comparison with commit 44ec0798 does produce identical results, however we can no longer run it with AdePT since this PR doesn't include This commit with the partial changes (Modified navigator but still in AdePT)

Ok, I picked only that one commit and the BVH refactoring alone seems to give numerical results while staying with VecGeom commit https://gitlab.cern.ch/VecGeom/VecGeom/-/commit/44ec0798fcc7d689246fb214620e520382d6c294 Can run more tests next week, if needed.

@agheata
Copy link
Contributor Author

agheata commented Oct 15, 2023

The only source of differences is actually coming from this merge request: https://gitlab.cern.ch/VecGeom/VecGeom/-/merge_requests/1004 (commit e607cbc72ac323c31747c0181c4e1ead886965ea) introducing operator*= for Transformation3D and applying it for cached transformations. This is one commit after the one fixing the simple extruded. The rationale was twofold: having numerical matching between transformations as computed in Geant4, and doing less copies while getting the top transformation in the non-cached mode. Basically, we replace the weird operation MultiplyFromRight with a normal matrix multiplication like in G4. However, due to the different order of operations, truncation is done differently so the final matrices and not numerically equal. There is a unit test that was implemented at the same time, and replacing the ApproxEqual with == makes it fail.

The move to master before the refactoring is probably an effect of my starting from Juan's branch, and adding the version change on top of it, I was rushing because the CI was broken by a premature VecGeom tag deployment.

@hahnjo
Copy link
Contributor

hahnjo commented Oct 16, 2023

The only source of differences is actually coming from this merge request: https://gitlab.cern.ch/VecGeom/VecGeom/-/merge_requests/1004 (commit e607cbc72ac323c31747c0181c4e1ead886965ea)

Alright, this explains the differences seen together with the refactoring of the BVH navigator and the move from AdePT to VecGeom. The difference between v1-patches to VecGeom commit 804dcfadd0253ba81a2875da99cf40a2007b3f94 is still a mystery to me...

@hahnjo
Copy link
Contributor

hahnjo commented Oct 16, 2023

The difference between v1-patches to VecGeom commit 804dcfadd0253ba81a2875da99cf40a2007b3f94 is still a mystery to me...

It's https://gitlab.cern.ch/VecGeom/VecGeom/-/commit/011010a2 which makes sense in retrospective, I guess... (edit: and https://gitlab.cern.ch/VecGeom/VecGeom/-/commit/380a0d48 as well; but I still can't exactly reproduce the results of v1-patches, with magnetic field at least; edit2: that's because https://gitlab.cern.ch/VecGeom/VecGeom/-/commit/81c7abf4 happened after that point - reverting only the two commits instead of fully resetting to the commit before gives me the same results as v1-patches)

@agheata
Copy link
Contributor Author

agheata commented Oct 16, 2023

The difference between v1-patches to VecGeom commit 804dcfadd0253ba81a2875da99cf40a2007b3f94 is still a mystery to me...

It's https://gitlab.cern.ch/VecGeom/VecGeom/-/commit/011010a2 which makes sense in retrospective, I guess...

Ah, OK, yes, it is understandable now. Can we run the testEM3 and example17 validation vs. Geant4 to make sure that all these transformation changes (tested to alter the values of transformation elements at ~1E-15 level) do not affect physics validation?

@agheata agheata mentioned this pull request Oct 17, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants