127 adjust threshold #136

Merged
merged 21 commits into from
Sep 8, 2023

Conversation

yiwen-h
Member

@yiwen-h yiwen-h commented Sep 7, 2023

  • When training models, the dataset is now split into train, validation, and test sets
  • The validation set is used to calculate, for each label, the threshold that gives the best F1 score
  • The calculated thresholds are included in the Excel analysis document for each model
  • predict_multilabel now takes a dict argument containing the threshold to use for each label
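
A minimal sketch of how per-label thresholds could be picked on a validation set, assuming a 2D array of positive-class probabilities with one column per label. The names `f1_binary`, `best_f1_thresholds`, and the candidate grid are hypothetical and are not taken from the pxtextmining code.

```python
import numpy as np


def f1_binary(y_true, y_pred):
    """F1 score for one label, computed from 0/1 arrays."""
    tp = np.sum((y_true == 1) & (y_pred == 1))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    denom = 2 * tp + fp + fn
    return 0.0 if denom == 0 else 2 * tp / denom


def best_f1_thresholds(y_val, val_probs, labels, candidates=None):
    """Return {label: threshold} maximising F1 on the validation set.

    y_val and val_probs are (n_samples, n_labels) arrays (assumed layout).
    Ties are resolved in favour of the lowest candidate threshold.
    """
    if candidates is None:
        # Hypothetical grid: 0.05, 0.10, ..., 0.95
        candidates = np.round(np.linspace(0.05, 0.95, 19), 2)
    thresholds = {}
    for i, label in enumerate(labels):
        scores = [
            f1_binary(y_val[:, i], (val_probs[:, i] >= t).astype(int))
            for t in candidates
        ]
        thresholds[label] = float(candidates[int(np.argmax(scores))])
    return thresholds
```

The resulting dict has the shape described above for predict_multilabel: one threshold per label name. Tuning on the validation set rather than the test set keeps the test metrics an unbiased estimate.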

Testing has shown small improvements to performance.
Next step: integrate into the API.

now set by type of data fed into the function
model_type is now inferred from an isinstance check on the model, or from n_dim of the predicted probabilities
removed enhance_with_probs from all predict_with_sklearn; the default behaviour is now to use probabilities with a threshold of 0.5 unless a threshold_dict is provided
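
The default-threshold behaviour described in these commit messages could look roughly like the sketch below. It assumes that a 3D probability array comes from an sklearn-style multi-output `predict_proba` (a list of per-label `(n_samples, 2)` arrays) and that a 2D array is already `(n_samples, n_labels)`. The function names `to_2d_probs` and `apply_thresholds` are hypothetical, not the library's API.

```python
import numpy as np


def to_2d_probs(pred_probs):
    """Normalise predicted probabilities to shape (n_samples, n_labels)."""
    probs = np.asarray(pred_probs)
    if probs.ndim == 3:
        # Assumed layout (n_labels, n_samples, 2): keep the positive class.
        return probs[:, :, 1].T
    return probs


def apply_thresholds(pred_probs, labels, threshold_dict=None, default=0.5):
    """Binarise probabilities, using 0.5 for any label without a threshold."""
    probs = to_2d_probs(pred_probs)
    lookup = threshold_dict or {}
    thresholds = np.array([lookup.get(label, default) for label in labels])
    # Broadcasting compares each column against its own threshold.
    return (probs >= thresholds).astype(int)
```

With no `threshold_dict`, every label is cut at 0.5; passing a dict overrides only the labels it names.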
@yiwen-h yiwen-h requested a review from ChrisBeeley September 7, 2023 11:35
@codecov

codecov bot commented Sep 7, 2023

Codecov Report

Patch coverage: 81.35% and project coverage change: +0.41% 🎉

Comparison is base (71ece91) 88.36% compared to head (5c1c217) 88.78%.
Report is 16 commits behind head on development.

Additional details and impacted files
@@               Coverage Diff               @@
##           development     #136      +/-   ##
===============================================
+ Coverage        88.36%   88.78%   +0.41%     
===============================================
  Files               11       11              
  Lines              937      945       +8     
===============================================
+ Hits               828      839      +11     
+ Misses             109      106       -3     
Files Changed Coverage Δ
pxtextmining/pipelines/multilabel_pipeline.py 88.54% <67.64%> (-11.46%) ⬇️
...ining/factories/factory_predict_unlabelled_text.py 83.16% <85.48%> (-0.36%) ⬇️
pxtextmining/factories/factory_write_results.py 75.00% <85.71%> (-1.37%) ⬇️
...xtextmining/factories/factory_model_performance.py 93.69% <92.30%> (+16.01%) ⬆️
pxtextmining/factories/factory_pipeline.py 92.94% <100.00%> (ø)


@yiwen-h
Member Author

yiwen-h commented Sep 7, 2023

closes #127

@yiwen-h yiwen-h merged commit 6221306 into development Sep 8, 2023
Member

@ChrisBeeley ChrisBeeley left a comment


👍
