Add BHASA LINDSEA scenarios #2694
Conversation
This looks amazing! I mostly just have minor comments. It is a little long so I did a partial review (not quite done with the scenarios, schema and run entries yet) - will review the rest tomorrow.
```python
def _compute_chrf(self, refs: List[str], pred: str) -> Dict[str, float]:
    metrics: Dict[str, float] = {}
    metrics['ChrF++'] = self.chrf_scorer.sentence_score(pred, refs).score
```
Generally our convention for metrics is camel case - could you make this "chr_f_plus_plus" instead?
Correction - convention is snake case, not camel case.
```python
aggregator.add_scores(self.rouge_scorer.score(ref, pred))
aggregates = aggregator.aggregate()
for key, value in self.rouge_metrics.items():
    metrics[value] = aggregates[key].mid.fmeasure * 100
```
Is this to make the range 0 to 100 instead of 0 to 1? We generally prefer the 0 to 1 range.
```python
        metrics[value] = aggregates[key].mid.fmeasure * 100
    return metrics

def _remove_braces(self, text: str) -> str:
```
Optional: Don't know if this is important, but if you want to ensure that braces are removed in a balanced way, you should do

```python
if text.startswith("{") and text.endswith("}"):
    text = text[1:-1]
```

otherwise you might strip the brace from only the start or only the end. Likewise for the other occurrence of this function.

Also, for my education, why do we need to remove braces?
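As a standalone sketch of the balanced version suggested above (the free-function form and docstring here are illustrative, not the PR's actual method):

```python
def remove_braces(text: str) -> str:
    """Strip one pair of surrounding braces, but only when both are present.

    Checking start and end together avoids mangling strings like
    "{unbalanced" or "unbalanced}", which a one-sided check would clip.
    """
    if text.startswith("{") and text.endswith("}"):
        return text[1:-1]
    return text
```

For example, `remove_braces("{jawaban}")` yields `"jawaban"`, while `"{open"` and `"close}"` pass through unchanged.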
```python
        text = text[:-1]
    return text

def evaluate(
```
nit: can omit this definition, if all you are doing is calling the super.
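The nit rests on ordinary method resolution: an override whose body only forwards to `super()` with the same signature is equivalent to not defining the method at all. A minimal illustration (class names here are hypothetical):

```python
class Base:
    def evaluate(self, x: int) -> int:
        return x * 2

class WithRedundantOverride(Base):
    # Adds nothing: it just forwards to the parent implementation.
    def evaluate(self, x: int) -> int:
        return super().evaluate(x)

class WithoutOverride(Base):
    # Omitting the method entirely: lookup falls through to Base.evaluate.
    pass
```

Both subclasses behave identically, so the shorter one is preferred.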
```python
    return result

def get_bhasa_machine_translation_metric_specs() -> List[MetricSpec]:
```
Move these to a different file, bhasa_metric_specs.py (follow the conventions in common_metric_specs.py). The reason for doing this is that we don't want the bhasa_run_specs file to import bhasa_metrics, which transitively imports optional dependencies; otherwise helm-run would fail for someone who doesn't have the optional dependencies installed.
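The split works because spec objects can name their metric classes as strings, so importing the specs file never imports the metric implementations (and their optional dependencies). A generic sketch of that deferred-resolution pattern using stdlib importlib — the `Spec` shape here is a simplification for illustration, not HELM's actual MetricSpec:

```python
import importlib
from dataclasses import dataclass, field
from typing import Any, Dict

@dataclass(frozen=True)
class Spec:
    # Fully-qualified class name stored as a string, so constructing the
    # spec never imports the (possibly heavy or optional) target module.
    class_name: str
    args: Dict[str, Any] = field(default_factory=dict)

def instantiate(spec: Spec) -> Any:
    # The import happens only here, when the object is actually needed.
    module_name, _, cls_name = spec.class_name.rpartition(".")
    module = importlib.import_module(module_name)
    return getattr(module, cls_name)(**spec.args)

# Building the spec is cheap even if the target module were optional:
spec = Spec(class_name="fractions.Fraction", args={"numerator": 3, "denominator": 6})
```

Only a call to `instantiate(spec)` triggers the import, which is exactly why keeping specs and implementations in separate files avoids the helm-run failure.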
```python
    },
}

def generate_xquad_run_spec(language="th"):
```
optional: can inline this into get_xquad_spec() since it isn't used elsewhere
```python
name=name,
scenario_spec=scenario_spec,
adapter_spec=adapter_spec,
metric_specs=get_f1_metric_specs(),
```
For all the sentiment analysis scenarios, you probably want get_exact_match_metric_specs() + get_classification_metric_specs(). The confusingly-named get_f1_metric_specs() gives you word-overlap F1 instead of classification F1 and is intended for open-ended generation evaluation.
```python
name=name,
scenario_spec=scenario_spec,
adapter_spec=adapter_spec,
metric_specs=get_f1_metric_specs(),
```
Same comment regarding metric_specs as above.
```python
    }
}

def generate_xlsum_run_spec(language="id"):
```
optional: inline this one too; same for the other inline-able functions below.
Could you clarify:
- Is this subfolder a redistributed copy of another code repository? If so:
  - What is the URL for the original source?
  - Has it been modified from the original source?

I see some URLs in README.md and in the source code, but it is hard for me to tell whether these URLs refer to the original source or to other redistributed code contained in the original source.
Mostly looks good, just some minor stuff left.
```python
if split == "train":
    # Select only bottom 20th percentile by length for in-context examples as examples are very long
    data = df[df["passage_text"].apply(len) < df["passage_text"].apply(len).quantile(.2)]
```
Great, thanks for the clarification.
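The percentile filter above is easy to check in isolation. A small sketch — the column name `passage_text` follows the PR, the data is made up, and computing the lengths once (rather than twice, as in the PR line) is a minor tidy-up:

```python
import pandas as pd

# Toy data: ten passages with lengths 1 through 10.
df = pd.DataFrame({"passage_text": ["a" * n for n in range(1, 11)]})

lengths = df["passage_text"].apply(len)
# Keep only rows strictly shorter than the 20th-percentile length,
# mirroring the in-context-example selection in the PR.
data = df[lengths < lengths.quantile(0.2)]
```

With lengths 1..10, `quantile(0.2)` interpolates to 2.8, so only the two shortest passages survive the filter.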
```python
@run_spec_function("lindsea_syntax_minimal_pairs")
def get_lindsea_syntax_minimal_pairs_spec(language: str = "id", method: str = "mcq") -> RunSpec:
    name = f"lindsea_syntax_minimal_pairs_{language}"
    if method == "mcq":
```
What's the reason for doing the "A" and "B" formatting yourself, instead of using get_multiple_choice_joint_adapter_spec(), which will do it for you? get_multiple_choice_joint_adapter_spec() has the added advantage of being more similar to get_multiple_choice_separate_adapter_spec(), differing only in logprobs vs. generation.
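For context, the "joint" presentation puts all options into one prompt with letter labels and asks the model to answer with a letter. A minimal sketch of that formatting in plain Python — this is an illustration of the idea, not HELM's actual adapter code:

```python
import string

def format_joint_prompt(question: str, options: list) -> str:
    # Label options "A.", "B.", ... in a single prompt; the model is
    # expected to answer with the letter of its chosen option.
    lines = [question]
    for letter, option in zip(string.ascii_uppercase, options):
        lines.append(f"{letter}. {option}")
    lines.append("Answer:")
    return "\n".join(lines)
```

Using the shared adapter instead of hand-rolling this keeps the labeling, ordering, and answer parsing consistent across scenarios.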
```yaml
      - efficiency
      - general_information
    environment:
      main_name: accuracy
```
This should be `exact_match` or `quasi_exact_match`.
```yaml
---
############################################################
metrics:
```
Add `exact_match` or `quasi_exact_match` to this list.
Please run the linter: at the root of the repo, run:

```shell
pip install -e '.[dev]'
./pre-commit.sh
```
Could you merge / rebase the changes from #2648?
Hi @yifanmai, I saw that a lot of PR checks (including this one) were failing with the same error message (`Failed to build backports.zoneinfo`). Please advise.

@raileymontalan sorry, the main branch was broken. Could you merge main again? That should pick up the fix.

@yifanmai Thanks! This PR has now passed all the checks :)

As a reminder, we still need to discuss and revamp the calculation of metrics for the LINDSEA scenarios before this can be merged!

I'll merge this first so that I can do some prototyping on top of this.
Add the following for BHASA: