Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MetaCAT fixes and upgrades #495

Merged
merged 6 commits into from
Oct 14, 2024
Merged

MetaCAT fixes and upgrades #495

merged 6 commits into from
Oct 14, 2024

Conversation

shubham-s-agarwal
Copy link
Collaborator

Pushing for 3 updates:

  1. Removed the check and update for labels with zero data, as this was causing issues during evaluation
  2. Resolved an issue where the confusion matrix couldn't be calculated when testing on a single class with an F1 score of 1, as it expected the original number of training classes (3)
  3. Updated the attention mask creation to dynamically use the actual pad_idx value instead of assuming it to be 0

Pushing for 3 updates:
1) Removed the check and update for labels with zero data, as this was causing issues during evaluation
2) Resolved an issue where the confusion matrix couldn't be calculated when testing on a single class with an F1 score of 1, as it expected the original number of training classes (3)
3) Updated the attention mask creation to dynamically use the actual pad_idx value instead of assuming it to be 0
Copy link
Collaborator

@mart-r mart-r left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can do a slightly better approach where I commented.

But I'll leave a more general comment here as well.

This PR attempts to fix 3 things at once. But it's not immediately clear which part of the changes corresponds to which fix. And the commit messages don't help either
If in the future there's issues with some of this code, we'll come back and see when this was changed. And we'll come to this PR. We'll try to understand why the changes were made. But the commit messages won't be able to help us. And we'll be left having to look into the change logs to deduce which of the 3 fixes are relevant to the code at hand initially.

I would recommend in the future to use more descriptive commits and do one thing at a time

  • Instead of "Pushing change" with all the changes you're making
  • Make one change, and commit it with a message (e.g) "Allow dynamic use of correct pad_idx value instead of defaulting to 0"
  • Then make the next change and commit with a message specific to that change

By doing this you would make the (potential) future trouble shooting easier since we would be able to track a change to a specific commit and its descriptive commit message. And it also has the added bonus of lowering the chance of committing changes that you weren't meaning to include since you're only dealing with one thing at a time.

PS:
In principle, it's also best to try to limit 1 PR to 1 change. But given how small these 3 are, I'm not too worried about it here.

medcat/utils/meta_cat/ml_utils.py Outdated Show resolved Hide resolved
For the issue where the confusion matrix couldn't be calculated when testing on a single class with an F1 score of 1, as it expected the original number of training classes (3), pushing an optimized version w/o the try except block
Copy link
Collaborator

@mart-r mart-r left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me

@mart-r mart-r merged commit 3e01747 into master Oct 14, 2024
8 checks passed
mart-r pushed a commit to mart-r/MedCAT that referenced this pull request Oct 14, 2024
* MetaCAT fixes and upgrades

Pushing for 3 updates:
1) Removed the check and update for labels with zero data, as this was causing issues during evaluation
2) Resolved an issue where the confusion matrix couldn't be calculated when testing on a single class with an F1 score of 1, as it expected the original number of training classes (3)
3) Updated the attention mask creation to dynamically use the actual pad_idx value instead of assuming it to be 0

* Pushing type fix

* Pushing for type fix

* Fixing type issues

* Pushing change

* Pushing update w/o try except block

For the issue where the confusion matrix couldn't be calculated when testing on a single class with an F1 score of 1, as it expected the original number of training classes (3), pushing an optimized version w/o the try except block
mart-r pushed a commit that referenced this pull request Oct 14, 2024
* MetaCAT fixes and upgrades

Pushing for 3 updates:
1) Removed the check and update for labels with zero data, as this was causing issues during evaluation
2) Resolved an issue where the confusion matrix couldn't be calculated when testing on a single class with an F1 score of 1, as it expected the original number of training classes (3)
3) Updated the attention mask creation to dynamically use the actual pad_idx value instead of assuming it to be 0

* Pushing type fix

* Pushing for type fix

* Fixing type issues

* Pushing change

* Pushing update w/o try except block

For the issue where the confusion matrix couldn't be calculated when testing on a single class with an F1 score of 1, as it expected the original number of training classes (3), pushing an optimized version w/o the try except block
@shubham-s-agarwal shubham-s-agarwal deleted the metacat_fixes branch October 14, 2024 17:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants