Filters on mc_type for RF #589

Merged
merged 9 commits into cta-observatory:master from pred_features
Feb 17, 2021

vuillaut
Member

Issue:

  • Some of the parameters are initialized with a default value of -1.
  • Even if the computation of these parameters fails, they can still be used for training without raising anything (-1 is treated as a valid value).
  • Training and inference then raise no error or warning but make useless predictions.

I ran into the problem when using dl1ab: mc_type was set to -1 for some events, so the RF predicted 3 classes and the gammaness was not assigned to the right class.
So I also raise an error if more than 2 classes are predicted, which should never happen in the current setup.
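
The guard described above can be sketched as follows. This is a minimal pure-Python illustration under stated assumptions, not the lstchain code: the function name `gammaness_from_proba`, its arguments, and the gamma label 0 are hypothetical. `classes` stands for the class labels reported by a trained classifier (for example scikit-learn's `classes_` attribute), and each row of `proba` holds one probability per class in the same order.

```python
def gammaness_from_proba(classes, proba, gamma_label=0):
    """Return the gamma-class probability of each event.

    classes:     class labels, in the column order of `proba`
    proba:       one row per event, one probability per class
    gamma_label: label used for gammas (assumed to be 0 in this sketch)
    """
    classes = list(classes)
    # A stray default such as -1 in the training labels shows up as a third class.
    if len(classes) > 2:
        raise ValueError(
            f"expected at most 2 classes, got {len(classes)}: {classes}"
        )
    if gamma_label not in classes:
        raise ValueError(f"gamma label {gamma_label} not among classes {classes}")
    # Look the column up by label instead of assuming a fixed position.
    column = classes.index(gamma_label)
    return [row[column] for row in proba]
```

With `classes = [0, 101]` the gamma column is the first one; with `classes = [-1, 0, 101]` the function raises instead of silently returning the wrong column.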

rlopezcoto previously approved these changes Feb 10, 2021
@vuillaut
Member Author

vuillaut commented Feb 10, 2021

I did not foresee the writing type issue... (these parameters are requested to be integers and nan is a float)
I will try and think about an elegant solution but suggestions are welcome.
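
The type problem can be reproduced in plain Python: nan is a float with no integer representation, so it cannot be the default of a field that must be written as an integer. A minimal illustration (not lstchain code; the variable names are only for this sketch):

```python
import math

nan = math.nan

# nan is a floating-point value ...
is_float = isinstance(nan, float)

# ... and Python refuses to convert it to an integer.
try:
    int(nan)
    converted = True
except ValueError:
    converted = False
```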

@moralejo
Collaborator

> I did not foresee the writing type issue... (these parameters are requested to be integers and nan is a float)
> I will try and think about an elegant solution but suggestions are welcome.

Then the default necessarily has to be an integer outside of the physical range, right? And filter those before passing the events to the training (and to the application of the trained models)

@@ -95,7 +95,7 @@ class DL1ParametersContainer(Container):
     mc_core_x = Field(None, 'Simulated impact point x position', unit=u.m)
     mc_core_y = Field(None, 'Simulated impact point y position', unit=u.m)
     mc_h_first_int = Field(None, 'Simulated first interaction height', unit=u.m)
-    mc_type = Field(-1, 'Simulated particle type')
+    mc_type = Field(np.nan, 'Simulated particle type')
Member

same here, -1 is ok, particle id is also a positive integer.

Member Author

Actually, I think that in the CORSIKA nomenclature negative integers are possible and denote antiparticles (-1 would then be positrons).
Maybe I'll go for -9999 instead, just in case ;-)

Member

No, CORSIKA uses positive numbers:

From the user's guide:
Screenshot_20210210_212528

Member Author

:o
thanks!
I would have sworn I had seen negative values in the past, my mistake!

Member

Sorry, triple checking this, I found that simtel does not use the CORSIKA codes anymore.

From io_hess.h:

   int primary_id;      ///< Particle ID of primary. Was in CORSIKA convention
                        ///< where detector_prog_vers in MC run header was 0,
                        ///< and is now 0 (gamma), 1(e-), 2(mu-), 100*A+Z
                        ///< for nucleons and nuclei, negative for antimatter.

Member

So this should be 0 for gamma, 101 for proton.

Member

Yes, verified for our current simulations. 0 = gamma, 101 = proton, negative numbers possible.
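
The convention quoted from io_hess.h can be turned into a small decoder. This is only a sketch: the function name `decode_primary_id` and the returned dictionary layout are hypothetical, and IDs between 3 and 99 are reported as "unknown" because the quoted comment does not describe them.

```python
def decode_primary_id(primary_id):
    """Decode a primary ID following the io_hess.h comment quoted above.

    0 = gamma, 1 = e-, 2 = mu-, 100*A+Z for nucleons and nuclei,
    negative values for antimatter.
    """
    antimatter = primary_id < 0
    code = abs(primary_id)

    light = {0: "gamma", 1: "electron", 2: "muon"}
    if code in light:
        return {"kind": light[code], "mass_number": None,
                "charge_number": None, "antimatter": antimatter}

    if code >= 100:
        # 100*A+Z: the hundreds carry A, the last two digits carry Z.
        mass_number, charge_number = divmod(code, 100)
        kind = "proton" if (mass_number, charge_number) == (1, 1) else "nucleus"
        return {"kind": kind, "mass_number": mass_number,
                "charge_number": charge_number, "antimatter": antimatter}

    # Not covered by the quoted comment (assumption of this sketch).
    return {"kind": "unknown", "mass_number": None,
            "charge_number": None, "antimatter": antimatter}
```

For example, `decode_primary_id(101)` gives a proton with A = 1 and Z = 1, and `decode_primary_id(-1)` gives an electron flagged as antimatter, i.e. a positron.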

Member Author

Thanks for the counter check!
I'll update accordingly.

@vuillaut
Member Author

> > I did not foresee the writing type issue... (these parameters are requested to be integers and nan is a float)
> > I will try and think about an elegant solution but suggestions are welcome.
>
> Then the default necessarily has to be an integer outside of the physical range, right? And filter those before passing the events to the training (and to the application of the trained models)

Yes, that was basically how it was implemented.
nan has the advantage of always being outside the physical range and needs no extra care.
But yes, given the type issue, I will introduce some more precise filtering...
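
One way such filtering could look is shown below, as a minimal sketch rather than the code merged in this PR. The names `GAMMA_ID`, `PROTON_ID`, and `filter_known_primaries` are hypothetical, events are represented as plain dictionaries for illustration, and -9999 only stands in for whatever integer default is chosen. Selecting on an explicit list of allowed IDs means the filter does not depend on the exact default value.

```python
GAMMA_ID = 0     # gamma, per the simtel convention discussed above
PROTON_ID = 101  # proton, 100*A+Z with A = 1 and Z = 1


def filter_known_primaries(events, allowed=(GAMMA_ID, PROTON_ID)):
    """Keep only events whose mc_type is one of the allowed particle IDs.

    Events with a placeholder mc_type (or with no mc_type at all) are
    dropped before training or before applying the trained models.
    """
    allowed = set(allowed)
    return [event for event in events if event.get("mc_type") in allowed]


events = [
    {"event_id": 1, "mc_type": 0},
    {"event_id": 2, "mc_type": 101},
    {"event_id": 3, "mc_type": -9999},  # placeholder default, to be rejected
    {"event_id": 4, "mc_type": -1},     # positron under this convention, not a training class
]
selected = filter_known_primaries(events)
```

After this step only events 1 and 2 remain, so a classifier trained on `selected` sees exactly two classes.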

@vuillaut vuillaut changed the title from "Initialize parameters with value = nan" to "Filters on mc_type for RF" Feb 10, 2021
@codecov

codecov bot commented Feb 10, 2021

Codecov Report

Merging #589 (42b2baa) into master (99f4fc0) will increase coverage by 0.00%.
The diff coverage is 87.50%.

@@           Coverage Diff           @@
##           master     #589   +/-   ##
=======================================
  Coverage   37.51%   37.52%           
=======================================
  Files          82       82           
  Lines        7359     7361    +2     
=======================================
+ Hits         2761     2762    +1     
- Misses       4598     4599    +1     
Impacted Files                  Coverage Δ
lstchain/reco/dl1_to_dl2.py     65.60% <80.00%> (+0.01%) ⬆️
lstchain/io/lstcontainers.py    86.22% <100.00%> (-0.07%) ⬇️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@vuillaut vuillaut merged commit b527cb1 into cta-observatory:master Feb 17, 2021
@vuillaut vuillaut deleted the pred_features branch February 17, 2021 13:07
4 participants