Implementation of Weighted CRF Tagger (handling unbalanced datasets) #341

eraldoluis · 2022-06-22T21:59:49Z

Closes allennlp issue #4619.

Depends on allennlp PR #5676

Changes proposed in this pull request:

I implemented and experimentally compared three sample weighting strategies for CrfTagger.
I added two parameters to CrfTagger: label_weights and weight_strategy.
The parameter label_weights is a Dict[str, float] that maps each label to a weight ({label: weight}); the loss function uses it to weight each token according to its label.
The parameter weight_strategy can be None, 'emission', 'emission_transition', or 'lannoy'.
If label_weights is given and weight_strategy is None or 'emission', the emission score of each tag is multiplied by the corresponding weight (as given by label_weights).
If weight_strategy is 'emission_transition', both the emission and the transition scores of each tag are multiplied by the corresponding weight.
If weight_strategy is 'lannoy', the strategy proposed by Lannoy et al. (2019) is used.
An experimental comparison of these three strategies, with a brief discussion of their differences, is available here.
Tests were created to cover the new feature.
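To make the two new parameters concrete, here is a minimal sketch of how a {label: weight} dict could be aligned with tag indices and how weight_strategy could be normalized as the description above states (None behaves like 'emission' when label_weights is given). The helper names `resolve_weight_strategy` and `label_weights_to_list` are hypothetical and not part of CrfTagger; the defaults for tags missing from label_weights (weight 1.0) and for label_weights=None are assumptions.

```python
from typing import Dict, List, Optional, Sequence

# The four values the PR description lists for weight_strategy.
VALID_WEIGHT_STRATEGIES = (None, "emission", "emission_transition", "lannoy")


def resolve_weight_strategy(
    label_weights: Optional[Dict[str, float]], weight_strategy: Optional[str]
) -> Optional[str]:
    """Hypothetical helper: normalize weight_strategy as the PR text describes."""
    if weight_strategy not in VALID_WEIGHT_STRATEGIES:
        raise ValueError(f"unknown weight_strategy: {weight_strategy!r}")
    if label_weights is None:
        # Assumption: without label_weights no weighting is applied at all.
        return None
    # Per the description, None together with label_weights acts like "emission".
    return weight_strategy or "emission"


def label_weights_to_list(
    label_weights: Dict[str, float], index_to_label: Sequence[str]
) -> List[float]:
    """Hypothetical helper: align {label: weight} with tag indices.

    Assumptions: a label that is not in the tag vocabulary is an error,
    and tags absent from label_weights keep a neutral weight of 1.0.
    """
    unknown = set(label_weights) - set(index_to_label)
    if unknown:
        raise ValueError(f"labels not in the tag vocabulary: {sorted(unknown)}")
    return [label_weights.get(label, 1.0) for label in index_to_label]


if __name__ == "__main__":
    tags = ["O", "B-PER", "I-PER"]
    print(label_weights_to_list({"B-PER": 5.0, "I-PER": 5.0}, tags))  # [1.0, 5.0, 5.0]
    print(resolve_weight_strategy({"B-PER": 5.0}, None))  # emission
```

Upweighting the rare entity tags (here B-PER and I-PER) relative to the frequent O tag is the typical use for unbalanced datasets.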
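The 'emission' and 'emission_transition' strategies described above can be illustrated with a toy weighted linear-chain CRF loss. This is a brute-force sketch, not the allennlp implementation: it enumerates every tag path instead of running the forward algorithm, omits start/end transitions, and assumes that in 'emission_transition' the transition *into* a tag is scaled by that tag's weight (the description does not say which end of the transition is used). The Lannoy et al. (2019) strategy is not sketched here.

```python
import itertools
import math
from typing import List, Sequence


def log_sum_exp(values: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(v)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def path_score(tags: Sequence[int], emissions: List[List[float]],
               transitions: List[List[float]], weights: Sequence[float],
               scale_transitions: bool) -> float:
    """Score of one tag path with weighted emissions (and optionally transitions)."""
    score = 0.0
    for t, tag in enumerate(tags):
        # Emission score of each tag is multiplied by that tag's weight.
        score += weights[tag] * emissions[t][tag]
        if t > 0:
            trans = transitions[tags[t - 1]][tag]
            # Assumption: "emission_transition" scales the transition into `tag`.
            score += (weights[tag] * trans) if scale_transitions else trans
    return score


def weighted_crf_nll(emissions: List[List[float]], transitions: List[List[float]],
                     gold: Sequence[int], weights: Sequence[float],
                     strategy: str = "emission") -> float:
    """Negative log-likelihood of `gold` under the weighted toy CRF.

    Enumerates num_tags ** seq_len paths, so it is only usable on tiny inputs.
    """
    if strategy not in ("emission", "emission_transition"):
        raise ValueError(f"unsupported strategy in this sketch: {strategy!r}")
    scale = strategy == "emission_transition"
    num_tags = len(transitions)
    all_paths = itertools.product(range(num_tags), repeat=len(emissions))
    log_z = log_sum_exp([path_score(p, emissions, transitions, weights, scale)
                         for p in all_paths])
    return log_z - path_score(gold, emissions, transitions, weights, scale)


if __name__ == "__main__":
    emissions = [[2.0, 0.5], [1.0, 1.5], [0.2, 0.3]]  # 3 tokens, 2 tags
    transitions = [[0.5, -0.2], [0.1, 0.4]]
    gold = (0, 1, 1)
    for strategy in ("emission", "emission_transition"):
        print(strategy, weighted_crf_nll(emissions, transitions, gold, [1.0, 3.0], strategy))
```

With all weights equal to 1.0 both strategies reduce to the ordinary CRF loss; a weight above 1.0 sharpens the scores of that tag, so errors on it cost more.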

epwalsh

LGTM!

CHANGELOG.md

eraldoluis added 11 commits June 22, 2022 22:35

(rebase) Weighted CRF: scaled emission scores

012ec54

Added FBetaMeasure to CrfTagger just to test class weights

bb3e695

Added FBetaMeasure2 to CrfTagger.

b264389

Fixed bug regarding label_weights in CrfTagger

6bdaf2a

CrfTagger: using micro and macro average for FBetaMeasure2

63c7fe2

CRF weighting strategies

0f10325

Weighted CRF: adjustments considering refactoring

1d7a97f

Weighted CRF tests

926d3a8

Weighted CRF: tests minor adjustments

c0d9798

CrfTagger: added test regarding FBetaVerboseMeasure

ae401d3

CrfTagger: black formatting

7200ea0

eraldoluis mentioned this pull request Jun 22, 2022

Implementation of Weighted CRF Tagger (handling unbalanced datasets) allenai/allennlp#5676

Merged

5 tasks

epwalsh self-assigned this Jun 30, 2022

eraldoluis added 2 commits July 13, 2022 23:30

Merge branch 'main' into weighted_crf

9be867e

Updated CrfTagger to the new module organization

e7daa53

epwalsh approved these changes Jul 14, 2022


CHANGELOG.md (outdated)

Update CHANGELOG.md

eb4f170

epwalsh enabled auto-merge (squash) July 14, 2022 00:42

epwalsh merged commit 97df196 into allenai:main Jul 14, 2022
