Remove duplicate events when generating DL3 files for source-dependent analysis #973

Merged
rlopezcoto merged 11 commits into master from rm_duplicate_srcdep_event on May 19, 2023

Conversation

SeiyaNozaki
Collaborator

In source-dependent analysis, duplicate events can end up in DL3 files because each event has one gammaness value per assumed source position. The effect should be small with a relatively tight cut, but with a looser cut (e.g. an efficiency cut like #942) it is better to take it into account.

If duplicates of an event survive the gammaness/alpha cut, only the one with the highest gammaness is saved.
In this PR, a separate function is added for this and used in lstchain_create_dl3_file.py.
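
The rule described above can be sketched in a few lines. This is only an illustration in plain Python, not the function added in this PR: the name `remove_duplicate_events` and the field names (`obs_id`, `event_id`, `region`, `gammaness`) are assumptions made for the example.

```python
def remove_duplicate_events(events):
    """Keep one row per (obs_id, event_id): the one with the highest gammaness.

    `events` is a list of dicts. All field names here are hypothetical and
    chosen only for this sketch.
    """
    best = {}
    for row in events:
        key = (row["obs_id"], row["event_id"])
        # Strict '>' keeps the first row seen when gammaness values tie.
        if key not in best or row["gammaness"] > best[key]["gammaness"]:
            best[key] = row
    # dicts preserve insertion order, so the output follows the first
    # appearance of each event in the input.
    return list(best.values())


# The same shower evaluated for an ON and an OFF position, both passing the cuts.
events = [
    {"obs_id": 1, "event_id": 10, "region": "on", "gammaness": 0.71},
    {"obs_id": 1, "event_id": 10, "region": "off", "gammaness": 0.83},
    {"obs_id": 1, "event_id": 11, "region": "on", "gammaness": 0.90},
]
unique = remove_duplicate_events(events)
```

Here event 10 is kept only once, with the entry from the position where it looks more gamma-like.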

@SeiyaNozaki SeiyaNozaki changed the title Remode duplicate events when generating DL3 files for source-dependent analysis Remove duplicate events when generating DL3 files for source-dependent analysis Apr 22, 2022

codecov bot commented Apr 22, 2022

Codecov Report

Patch coverage: 96.87% and project coverage change: +0.05 🎉

Comparison is base (90d0450) 74.02% compared to head (291c41c) 74.08%.

Additional details and impacted files
@@            Coverage Diff             @@
##           master     #973      +/-   ##
==========================================
+ Coverage   74.02%   74.08%   +0.05%     
==========================================
  Files         123      123              
  Lines       11870    11901      +31     
==========================================
+ Hits         8787     8817      +30     
- Misses       3083     3084       +1     
| Impacted Files | Coverage Δ |
|---|---|
| lstchain/io/__init__.py | 100.00% <ø> (ø) |
| lstchain/tools/lstchain_create_dl3_file.py | 92.26% <88.88%> (-0.20%) ⬇️ |
| lstchain/io/io.py | 79.73% <100.00%> (+0.54%) ⬆️ |
| lstchain/io/tests/test_io.py | 100.00% <100.00%> (ø) |

@maxnoe
Member

maxnoe commented Apr 25, 2022

Are you sure it is correct to do this?

You want to estimate the background as if that was your source position. I think removing duplicate events will make you underestimate the background.

What I am particularly concerned about is that the background rate will now depend implicitly on the number of off positions, since more off positions mean a higher chance of duplicate events and a higher chance of a higher gammaness in another off region.

@SeiyaNozaki
Collaborator Author

Thanks @maxnoe for your comment.

I'm aware of your points, and again wondering how we should treat such events...

If we do not remove duplicated events, the same events are counted as both signal and background events. In that case, the number of excess events will be unchanged, but the significance will be higher because of the additional background events. So it would not be correct from the statistical point of view.

But if we remove the duplicates, what you mentioned can happen. That said, only a single off region should be assumed in source-dependent analysis (especially at low energies) to avoid multiple duplicates.

Any other solutions to this...?

@moralejo
Collaborator

So for events whose axes intersect more than one region (on-source or off-source regions) you select the one which makes the event more gamma-like. Assuming the off regions are equivalent to the on region (like we assume in source-independent analysis) in terms of background acceptance, real background will have on average equal probability of ending up in the ON or in any one of the OFF regions. So it should work.

The problem is for signal, since gammas will sometimes end up in the off regions. This also happens in source-independent analysis: the PSF tail contributes to populating the off regions and biases the background estimate. For a bright source it may indeed be a problem. The additional complication in source-dependent analysis is that even the number of ON events will be affected by the number of chosen off regions (more chances to get a lower gammaness elsewhere). This would have an equivalent in the source-independent approach: increasing the number of off regions so much that you are forced to reduce their size to avoid overlap, and then you also start losing signal in the on region.

So I think the method should work, but depending on the source brightness, number of considered regions, and sharpness of the Alpha distributions, it can lead to biases.
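
Both points raised so far (ON and OFF being treated symmetrically, and the per-region background rate depending on the number of regions) can be seen in a toy Monte Carlo. This is a sketch under a strong simplifying assumption that is not taken from lstchain: every background event gets an independent, uniform gammaness in [0, 1) for each assumed position, and the alpha cut is ignored. The function name and parameters are hypothetical. In this toy model, with cut value c and n regions, the expected number of surviving entries per region and per event is (1 - c^n)/n after removing duplicates, versus 1 - c when duplicates are kept.

```python
import random


def mean_counts_per_region(n_regions, cut=0.5, n_events=50_000, dedup=True, seed=0):
    """Average surviving entries per region and per event in a toy model.

    Toy assumption (not from lstchain): each background event gets an
    independent, uniform gammaness in [0, 1) for every assumed position.
    """
    rng = random.Random(seed)
    total = 0
    for _ in range(n_events):
        gammaness = [rng.random() for _ in range(n_regions)]
        passing = [g for g in gammaness if g >= cut]
        if dedup:
            # Only the highest-gammaness copy survives: at most one entry.
            total += 1 if passing else 0
        else:
            # Every copy that passes the cut is kept.
            total += len(passing)
    return total / (n_events * n_regions)


for n in (1, 2, 4):
    kept = mean_counts_per_region(n, dedup=False)
    removed = mean_counts_per_region(n, dedup=True)
    print(f"{n} region(s): duplicates kept {kept:.3f}, duplicates removed {removed:.3f}")
```

Because all regions are drawn identically, the ON region and each OFF region lose the same fraction on average, but that fraction grows with the number of regions. Real gammaness values for different assumed positions are strongly correlated and the alpha cut removes most duplicates, so the size of the effect in data will differ from this toy.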

@rlopezcoto
Contributor

> So I think the method should work, but depending on the source brightness, number of considered regions, and sharpness of the Alpha distributions, it can lead to biases.

Maybe we could add a warning here when the number of OFF regions is >3, so that people are aware of the possibility of this effect.

@SeiyaNozaki
Collaborator Author

Thank you all for your comments. I implemented a change following the comment from @rlopezcoto (warning message), and also added an option to select whether duplicated events are removed or kept after the gammaness/alpha cut.
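
The two changes described above could look roughly like the following. This is a hedged sketch, not the code in this PR: the function `select_events`, the flag `remove_duplicates`, the constant name, and the choice to warn only when duplicates are removed are all assumptions made for illustration.

```python
import warnings

# Threshold suggested in the discussion; the constant name is hypothetical.
MAX_OFF_REGIONS_WITHOUT_WARNING = 3


def select_events(events, n_off_regions, remove_duplicates=True):
    """Optionally drop duplicate events, warning when many OFF regions are used.

    `events` is a list of dicts with hypothetical keys `event_id` and
    `gammaness`, assumed to have already passed the gammaness/alpha cut.
    """
    if not remove_duplicates:
        return list(events)

    if n_off_regions > MAX_OFF_REGIONS_WITHOUT_WARNING:
        # Assumption of this sketch: warn only when duplicates are removed.
        warnings.warn(
            f"{n_off_regions} OFF regions assumed: removing duplicate events "
            "may bias the background estimate.",
            UserWarning,
        )

    best = {}
    for row in events:
        key = row["event_id"]
        if key not in best or row["gammaness"] > best[key]["gammaness"]:
            best[key] = row
    return list(best.values())


rows = [
    {"event_id": 10, "gammaness": 0.71},
    {"event_id": 10, "gammaness": 0.83},
]
kept_all = select_events(rows, n_off_regions=1, remove_duplicates=False)
kept_best = select_events(rows, n_off_regions=1, remove_duplicates=True)
```

With `remove_duplicates=False` both entries of event 10 are returned; with the default only the entry with gammaness 0.83 remains.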

@rlopezcoto
Contributor

@SeiyaNozaki has implemented in this PR all the comments from me, @moralejo and @maxnoe. Even though we agree that it could be problematic if we assume more than 3 OFF positions (which will very rarely be done in single-telescope analysis), a warning is already implemented to make users aware that they should be careful with the background determination. If nobody objects, I'll merge it.

@rlopezcoto rlopezcoto merged commit 91f7b2b into master May 19, 2023
@rlopezcoto rlopezcoto deleted the rm_duplicate_srcdep_event branch May 19, 2023 06:38