
fix: correct unification implementation for RankingQuestionStrategy #4295

Merged
merged 4 commits into develop from fix/ranking-question-strategy on Nov 26, 2023

Conversation

@plaguss (Contributor) commented Nov 21, 2023

Description

Currently we have the following behaviour for the `RankingQuestionStrategy` unification:

## FeedbackRecord.responses
[
    ResponseSchema(
        user_id=None,
        values={
            'ranking': ValueSchema(
                value=[
                    RankingValueSchema(value='yes', rank=2),
                    RankingValueSchema(value='no', rank=3)
                ]
            )
        },
        status=<ResponseStatus.submitted: 'submitted'>
    ),
    ResponseSchema(
        user_id=None,
        values={
            'ranking': ValueSchema(
                value=[
                    RankingValueSchema(value='yes', rank=2),
                    RankingValueSchema(value='no', rank=1)
                ]
            )
        },
        status=<ResponseStatus.submitted: 'submitted'>
    )
]

## Unified responses:

[UnifiedValueSchema(value='yes', strategy=<RatingQuestionStrategy.MIN: 'min'>)]

whereas we should have:

[UnifiedValueSchema(value=[{'value': 'yes', 'rank': 2}, {'value': 'no', 'rank': 1}], strategy=<RatingQuestionStrategy.MIN: 'min'>)]

This PR fixes the issue.
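
For illustration, here is a minimal sketch of one way a min strategy can select a full ranking from the submitted responses, assuming it picks the response whose ranks sum to the lowest value (this is only an interpretation of the expected output above, not the PR's actual implementation):

# Submitted rankings for the record above, written as plain dicts.
responses = [
    [{"value": "yes", "rank": 2}, {"value": "no", "rank": 3}],
    [{"value": "yes", "rank": 2}, {"value": "no", "rank": 1}],
]

# One reading of "min": keep the complete ranking whose summed ranks are lowest.
unified = min(responses, key=lambda ranking: sum(item["rank"] for item in ranking))
print(unified)  # [{'value': 'yes', 'rank': 2}, {'value': 'no', 'rank': 1}]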

Type of change


  • Bug fix (non-breaking change which fixes an issue)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)

How Has This Been Tested


  • tests/integration/client/feedback/test_unification.py

Checklist

  • I followed the style guidelines of this project
  • I did a self-review of my code
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • I filled out the contributor form (see text above)
  • I have added relevant notes to the CHANGELOG.md file (See https://keepachangelog.com/)

plaguss marked this pull request as ready for review on November 21, 2023 at 14:40

The URL of the deployed environment for this PR is https://argilla-quickstart-pr-4295-ki24f765kq-no.a.run.app

@plaguss (Contributor, Author) commented Nov 21, 2023

Hi @davidberenstein1957, I've removed the mean strategy for the moment; I think it's hard to interpret. For example, using the examples from the tests:

# representation as a dataframe of the rankings
>>> df
       value    rank
0  (yes, no)  (2, 3)
1  (yes, no)  (2, 1)
2  (yes, no)  (2, 3)

The mean in this case would be (assuming we want the mean of the ranks, taking the element-wise mean of the rank tuples across the rows):
(2, 2.3333)
This value by itself doesn't have a clear interpretation, I think, so we would have to find the "most similar" ranking, in a sense,
from the available ranks in the responses. That would be one of the following:

0  (yes, no)  (2, 3)
2  (yes, no)  (2, 3)

I think it's a bit hard to reason about a mean for the rankings, when what we would really aim to obtain is the majority (or mode) in a sense. What do you think?

I can correct the tests that assume the mean strategy exists for the RankingQuestion.
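
For illustration, a minimal sketch of the majority (mode) idea over complete rankings (an assumption about the approach being discussed here, not the code in this PR):

from collections import Counter

# The three submitted rankings from the dataframe above, as (value, rank) tuples.
rankings = [
    (("yes", 2), ("no", 3)),
    (("yes", 2), ("no", 1)),
    (("yes", 2), ("no", 3)),
]

# The mode is the complete ranking that was submitted most often.
mode_ranking, count = Counter(rankings).most_common(1)[0]
print(mode_ranking, count)  # (('yes', 2), ('no', 3)) 2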

@@ -202,13 +204,11 @@ def unify_responses(self, records: List[FeedbackRecord], question: str):
class RankingQuestionStrategy(RatingQuestionStrategyMixin, Enum):
"""
Options:
- "mean": the mean value of the rankings


Hi, this is a breaking change and should be reflected in the changelog and the docs. I would still prefer to include this strategy, to avoid having too much fragmentation in the unification methods.



def calculate_average_ranking(data):
    # Accumulate the sum of ranks and the number of occurrences per label.
    label_rank_sum = {}
    label_count = {}

    for ranking in data:
        for item in ranking:
            # Each item is a single-entry dict mapping a label to its rank;
            # popitem() pops that entry (emptying the dict as a side effect).
            label, rank = item.popitem()
            label_rank_sum[label] = label_rank_sum.get(label, 0) + rank
            label_count[label] = label_count.get(label, 0) + 1

    # Average rank per label across all submitted rankings.
    average_ranking = {label: label_rank_sum[label] / label_count[label] for label in label_rank_sum}

    return average_ranking

# Example usage:
data = [[{"label_1": 2}, {"label_2": 1}], [{"label_1": 1}, {"label_2": 2}]]

result = calculate_average_ranking(data)
print(result)  # {'label_1': 1.5, 'label_2': 1.5}


After this, I would only expect the labels to be mapped back to the original available ranks through a zip(rank, sorted_result_based_on_values).
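
A minimal sketch of that mapping step, assuming the output of calculate_average_ranking above; averages_to_ranks is a hypothetical helper, not part of the PR, and ties are broken by the order in which labels were first seen:

def averages_to_ranks(average_ranking):
    # Sort labels by their average rank (lower average = better rank).
    sorted_labels = sorted(average_ranking, key=average_ranking.get)
    # Zip the sorted labels back onto the original integer ranks 1..n.
    ranks = range(1, len(sorted_labels) + 1)
    return [{"value": label, "rank": rank} for rank, label in zip(ranks, sorted_labels)]

print(averages_to_ranks({"label_1": 1.5, "label_2": 1.5}))
# [{'value': 'label_1', 'rank': 1}, {'value': 'label_2', 'rank': 2}]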


codecov bot commented Nov 22, 2023

Codecov Report

Attention: 50 lines in your changes are missing coverage. Please review.

Comparison is base (b97a4fc) 64.76% compared to head (75593d0) 64.66%.
Report is 2 commits behind head on develop.

Files | Patch % | Lines
src/argilla/client/feedback/unification.py | 5.66% | 50 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #4295      +/-   ##
===========================================
- Coverage    64.76%   64.66%   -0.10%     
===========================================
  Files          321      321              
  Lines        18511    18540      +29     
===========================================
+ Hits         11988    11989       +1     
- Misses        6523     6551      +28     


@plaguss (Contributor, Author) commented Nov 24, 2023

While testing more functionality for the metrics, I've found another bug, which I will try to fix in this PR:

Testing with the plaguss/go_emotions_raw dataset from Hugging Face, and checking the responses of one of the records, for example:

>>> feedback_dataset.records[0].responses

[ResponseSchema(user_id=UUID('00000000-0000-0000-0000-000000000001'), values={'label': ValueSchema(value=['neutral'])}, status=<ResponseStatus.submitted: 'submitted'>),
 ResponseSchema(user_id=UUID('00000000-0000-0000-0000-000000000016'), values={'label': ValueSchema(value=['anger', 'annoyance', 'optimism'])}, status=<ResponseStatus.submitted: 'submitted'>),
 ResponseSchema(user_id=UUID('00000000-0000-0000-0000-000000000028'), values={'label': ValueSchema(value=['approval'])}, status=<ResponseStatus.submitted: 'submitted'>),
 ResponseSchema(user_id=UUID('00000000-0000-0000-0000-000000000039'), values={'label': ValueSchema(value=['neutral'])}, status=<ResponseStatus.submitted: 'submitted'>),
 ResponseSchema(user_id=UUID('00000000-0000-0000-0000-000000000048'), values={'label': ValueSchema(value=['annoyance'])}, status=<ResponseStatus.submitted: 'submitted'>)]

And after unifying the responses:

>>> feedback_dataset.records[0].unified_responses

{'label': [UnifiedValueSchema(value=[], strategy=<RatingQuestionStrategy.MAJORITY: 'majority'>)]}

We should instead get one of the labels contained in the responses.
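
For illustration, a minimal sketch of a majority (mode) unification over these multi-label responses, assuming the most frequent label across annotators is what is wanted (an interpretation of the expected behaviour, not the code in this PR):

from collections import Counter

# Labels submitted for the record above, one list per annotator.
responses = [["neutral"], ["anger", "annoyance", "optimism"], ["approval"], ["neutral"], ["annoyance"]]

# Count every label across all annotators and keep the most frequent one.
counts = Counter(label for labels in responses for label in labels)
majority = counts.most_common(1)[0][0]
print(majority)  # 'neutral' ('annoyance' is tied; most_common keeps first-encountered order on ties)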


Solved in the following commit.

dosubot bot added the size:L label (This PR changes 100-499 lines, ignoring generated files) on Nov 24, 2023
dosubot bot added the lgtm label (This PR has been approved by a maintainer) on Nov 26, 2023
davidberenstein1957 merged commit dcdf788 into develop on Nov 26, 2023
davidberenstein1957 deleted the fix/ranking-question-strategy branch on November 26, 2023 at 13:55
leiyre pushed a commit that referenced this pull request Nov 29, 2023
* develop: (30 commits)
  chore: increase dev version release to 1.21.0
  fix: responses and suggestions filter QA (#4337)
  feat: delete suggestion from record on search engine (#4336)
  feat: update suggestion from record on search engine (#4339)
  bug: fix bug and update test (#4341)
  fix: preserve `TextClassificationSettings.label_schema` order (#4332)
  Update issue templates
  feat: 🚀 support for filtering and sorting by responses and suggestions (#4160)
  fix: handling errors for non-existing endpoints (#4325)
  feat: adding utils module and functions (#4121)
  Update labels in github workflows (#4315)
  fix: correct unification implementation for `RankingQuestionStrategy` (#4295)
  fix: update to solve the error of integration tests in CI (#4314)
  docs: revisit install process (#4261)
  feat: increase timeout minutes for python tests (#4307)
  docs: docs export dataset does not apply coloring for code snippets (#4296)
  docs: update final section of the rag haystack blog post (#4294)
  feat: add multi_modal templates and update vector setting (#4283)
  feat: better logging bar for FeedbackDataset (#4267)
  refactor: ArgillaTrainer for unified variable usage (#4214)
  ...

# Conflicts:
#	frontend/v1/infrastructure/repositories/RecordRepository.ts