Remove duplicate events when generating DL3 files for source-dependent analysis #973
Codecov Report
Coverage diff, master vs. #973:
            master     #973     +/-
Coverage    74.02%   74.08%  +0.05%
Files          123      123
Lines        11870    11901     +31
Hits          8787     8817     +30
Misses        3083     3084      +1
Are you sure it is correct to do this? You want to estimate the background as if that was your source position. I think removing duplicate events will make you underestimate the background. What I am particularly concerned about is that the background rate will now depend implicitly on the number of off positions, since more off positions mean a higher chance of duplicate events and a higher chance of a higher gammaness in another off region.
Thanks @maxnoe for your comment. I'm aware of your points, and I'm still wondering how we should treat such events... If we do not remove duplicated events, the same events are counted as both signal and background events. In that case the number of excess events will be unchanged, but the significance will be higher because of more background events, so it would not be correct from a statistical point of view. But if we remove the duplicates, what you mentioned can happen. That said, only a single off region should be assumed for source-dependent analysis (especially at low energies) to avoid multiple duplicates. Are there any other solutions to this...?
So for events whose axes intersect more than one region (on-source or off-source regions), you select the one which makes the event more gamma-like. Assuming the off regions are equivalent to the on region in terms of background acceptance (as we assume in source-independent analysis), real background will on average have equal probability of ending up in the ON region or in any one of the OFF regions. So it should work. The problem is for signal, since gammas will sometimes end up in the off regions. This also happens in source-independent analysis: the PSF tail contributes to populating the off regions and biases the background estimate. For a bright source it may indeed be a problem. The additional complication in source-dependent analysis is that even the number of ON events will be affected by the number of chosen off regions (more chances to get a higher gammaness elsewhere). This has an equivalent in the source-independent approach: increasing the number of off regions so much that you are forced to reduce their size to avoid overlap, at which point you also start losing signal in the on region. So I think the method should work, but depending on the source brightness, the number of considered regions, and the sharpness of the Alpha distributions, it can lead to biases.
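To get a feeling for how the kept background fraction could depend on the number of off regions, here is a toy calculation. It is only a sketch under strong assumptions that are not part of this PR: every background event is assumed to get an independent, uniformly distributed gammaness in each of the `n_off + 1` regions, and the alpha cut is ignored. Real gammaness values for different assumed source positions are correlated and most events only intersect a few regions, so this probably overstates the effect. The function name is hypothetical.

```python
def on_fraction_after_dedup(gammaness_cut, n_off):
    """Toy model: fraction of background events kept in the ON region.

    ASSUMPTION (not from lstchain): gammaness is i.i.d. uniform in [0, 1]
    for each of the n_off + 1 regions. An event is kept in ON only if its
    ON gammaness passes the cut AND is the highest of all regions:
        P = integral_{cut}^{1} g**n_off dg
          = (1 - cut**(n_off + 1)) / (n_off + 1)
    """
    n_regions = n_off + 1
    return (1.0 - gammaness_cut ** n_regions) / n_regions


if __name__ == "__main__":
    for n_off in (0, 1, 3, 5):
        frac = on_fraction_after_dedup(0.5, n_off)
        print(f"n_off={n_off}: ON background fraction kept = {frac:.4f}")
```

In this toy model each OFF region keeps the same fraction as the ON region by symmetry, so the OFF estimate still matches the ON background, but the absolute rate in every region drops as more off regions are added.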
Maybe we could add a warning here when the number of OFF regions is >3, so that people are aware of the possibility of this effect.
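A minimal sketch of what such a warning could look like, using only the standard library. The function name, the constant and the message text are hypothetical and are not taken from lstchain; only the threshold of 3 comes from the comment above.

```python
import warnings

# Threshold suggested in the comment above; the name is hypothetical.
MAX_OFF_REGIONS_WITHOUT_WARNING = 3


def warn_if_many_off_regions(n_off):
    """Emit a UserWarning if more than 3 OFF regions are requested.

    Returns True if a warning was issued, False otherwise.
    """
    if n_off > MAX_OFF_REGIONS_WITHOUT_WARNING:
        warnings.warn(
            f"{n_off} OFF regions requested: removing duplicate events "
            "may bias the background estimate, please check it carefully.",
            UserWarning,
        )
        return True
    return False
```

Calling `warn_if_many_off_regions(5)` would issue the warning once, while `warn_if_many_off_regions(1)` stays silent.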
Thank you all for your comments. I implemented a change following the comment from @rlopezcoto (warning message), and also added an option to select whether duplicated events are removed or kept after the gammaness/alpha cut.
In this PR @SeiyaNozaki implemented all the comments from myself, @moralejo and @maxnoe. Even though we agree that it could be problematic to assume more than 3 OFF positions (which will very rarely be done in single-telescope analysis), a warning is already implemented to make users aware that they should be careful with the background determination. If nobody objects, I'll merge it.
For source-dependent analysis, it can happen that duplicate events are saved in DL3 files, since each event has multiple gammaness values depending on the number of assumed source positions. The effect should be small if we use a relatively tight cut. But if we use a looser cut (e.g. an efficiency cut like #942), it would be better to take the effect into account.
If duplicate events survive the gammaness/alpha cut, only the one with the highest gammaness is saved.
In this PR, a separate function is prepared and implemented in lstchain_create_dl3_file.py
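The selection described above can be sketched in plain Python as follows. This is not the function added in this PR: the function name, the dictionary keys (`obs_id`, `event_id`, `region`, `gammaness`) and the `remove_duplicates` flag are assumptions made for illustration, and the alpha cut is left out for brevity.

```python
def select_events(events, gammaness_cut, remove_duplicates=True):
    """Apply a gammaness cut, then optionally keep one entry per event.

    `events` is a list of dicts, one per (event, assumed source position).
    All key names here are hypothetical, not the lstchain column names.
    """
    survivors = [ev for ev in events if ev["gammaness"] > gammaness_cut]
    if not remove_duplicates:
        return survivors

    best = {}
    for ev in survivors:
        key = (ev["obs_id"], ev["event_id"])
        # Keep only the entry with the highest gammaness for each event.
        if key not in best or ev["gammaness"] > best[key]["gammaness"]:
            best[key] = ev
    return list(best.values())


example = [
    {"obs_id": 1, "event_id": 10, "region": "on", "gammaness": 0.9},
    {"obs_id": 1, "event_id": 10, "region": "off", "gammaness": 0.8},
    {"obs_id": 1, "event_id": 11, "region": "off", "gammaness": 0.75},
    {"obs_id": 1, "event_id": 12, "region": "on", "gammaness": 0.3},
]
selected = select_events(example, gammaness_cut=0.7)
```

In this example event 10 passes the cut for both the on and the off position, and only its on entry (gammaness 0.9) is kept. Passing `remove_duplicates=False` would keep both entries, which corresponds to the option of keeping duplicates mentioned in the discussion.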