
Sprinkle M1 with comments on what the evaluation means #550

Merged
merged 9 commits into INRIA:main from sprinkle_M1 on Feb 11, 2022

Conversation

ArturoAmorQ
Collaborator

Partially addresses #530.

This PR adds comments on the score method in M1 (see the sketch after the list below):

  • the first time the method is used
  • the first time we use a test set for scoring
  • in the pipeline video, where care has to be taken on interpreting the result naively

@ArturoAmorQ changed the title from "Sprinkle m1" to "Sprinkle M1 with comments on what the evaluation means" on Jan 24, 2022
ArturoAmorQ and others added 3 commits February 10, 2022 07:40
Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>
Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>
Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>
@ArturoAmorQ
Collaborator Author

Thanks for the comments @ogrisel !

# But, can this evaluation be trusted, or is it too good to be true?
# This result means that the model makes a correct _prediction_ for
# approximately 82 samples out of 100. But, can a model _predict_ something
# that it already saw? In other words, can this evaluation be trusted, or is it
lesteve
Collaborator

I think I know what you are trying to say (that we are measuring the accuracy on the training data and that it is kind of cheating) but I find the wording super confusing ...

In particular: "can a model predict something that it already saw?" I would answer "yes, why not? Sorry, is this a trick question?"

I think you probably mean "can this really be called prediction when we are learning and predicting from the same data" but I can't find a good wording that convinces me.

I kind of think the next section, on the train-test split, already explains this kind of thing, so I would keep it short, maybe something like this:

Note that here we used the same data to learn and evaluate our model, so can this evaluation be trusted, or is it too good to be true?

Co-authored-by: Loïc Estève <loic.esteve@ymail.com>
@lesteve
Collaborator

lesteve commented Feb 11, 2022

I tried hard to refrain from tweaking the wording, but I did not manage to ...

Thanks, merging this one!

@lesteve lesteve merged commit 4e126c7 into INRIA:main Feb 11, 2022
github-actions bot pushed a commit that referenced this pull request Feb 11, 2022
Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>
Co-authored-by: Loïc Estève <loic.esteve@ymail.com> (commit 4e126c7)
@ArturoAmorQ ArturoAmorQ deleted the sprinkle_M1 branch March 11, 2022 13:21