
fix: correct unification implementation for RankingQuestionStrategy #4295

Merged
merged 4 commits into develop from fix/ranking-question-strategy on Nov 26, 2023

Conversation

@plaguss (Contributor) commented Nov 21, 2023

Description

Currently we have the following behaviour for the `RankingQuestionStrategy` unification:

## FeedbackRecord.responses
[
    ResponseSchema(
        user_id=None,
        values={
            'ranking': ValueSchema(
                value=[
                    RankingValueSchema(value='yes', rank=2),
                    RankingValueSchema(value='no', rank=3)
                ]
            )
        },
        status=<ResponseStatus.submitted: 'submitted'>
    ),
    ResponseSchema(
        user_id=None,
        values={
            'ranking': ValueSchema(
                value=[
                    RankingValueSchema(value='yes', rank=2),
                    RankingValueSchema(value='no', rank=1)
                ]
            )
        },
        status=<ResponseStatus.submitted: 'submitted'>
    )
]

## Unified responses:

[UnifiedValueSchema(value='yes', strategy=<RatingQuestionStrategy.MIN: 'min'>)]

whereas we should have:

[UnifiedValueSchema(value=[{'value': 'yes', 'rank': 2}, {'value': 'no', 'rank': 1}], strategy=<RatingQuestionStrategy.MIN: 'min'>)]

This PR fixes the issue.
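
For illustration, here is a minimal sketch of one way a min strategy can select a full ranking from the submitted responses, assuming it picks the response whose ranks sum to the lowest value (this is only an interpretation of the expected output above, not the PR's actual implementation):

# Submitted rankings for the record above, written as plain dicts.
responses = [
    [{"value": "yes", "rank": 2}, {"value": "no", "rank": 3}],
    [{"value": "yes", "rank": 2}, {"value": "no", "rank": 1}],
]

# One reading of "min": keep the complete ranking whose summed ranks are lowest.
unified = min(responses, key=lambda ranking: sum(item["rank"] for item in ranking))
print(unified)  # [{'value': 'yes', 'rank': 2}, {'value': 'no', 'rank': 1}]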

Type of change


  • Bug fix (non-breaking change which fixes an issue)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)

How Has This Been Tested


  • tests/integration/client/feedback/test_unification.py

Checklist

  • I followed the style guidelines of this project
  • I did a self-review of my code
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • I filled out the contributor form (see text above)
  • I have added relevant notes to the CHANGELOG.md file (See https://keepachangelog.com/)

plaguss marked this pull request as ready for review on November 21, 2023 at 14:40

The URL of the deployed environment for this PR is https://argilla-quickstart-pr-4295-ki24f765kq-no.a.run.app

@plaguss (Contributor, Author) commented Nov 21, 2023

Hi @davidberenstein1957, I've removed the mean strategy for the moment; I think it's hard to interpret. For example, using the examples from the tests:

# representation as a dataframe of the rankings
>>> df
       value    rank
0  (yes, no)  (2, 3)
1  (yes, no)  (2, 1)
2  (yes, no)  (2, 3)

The mean in this case would be (assuming we want the mean of the ranks, taking the element-wise mean of the rank tuples across the rows):
(2, 2.3333)
This value by itself doesn't have a clear interpretation, I think, so we would have to find the "most similar" ranking, in a sense,
from the available ranks in the responses. That would be one of the following:

0  (yes, no)  (2, 3)
2  (yes, no)  (2, 3)

I think it's a bit hard to reason about a mean for the rankings, when what we would really aim to obtain is the majority (or mode) in a sense. What do you think?

I can correct the tests that assume the mean strategy exists for the RankingQuestion.
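
For illustration, a minimal sketch of the majority (mode) idea over complete rankings (an assumption about the approach being discussed here, not the code in this PR):

from collections import Counter

# The three submitted rankings from the dataframe above, as (value, rank) tuples.
rankings = [
    (("yes", 2), ("no", 3)),
    (("yes", 2), ("no", 1)),
    (("yes", 2), ("no", 3)),
]

# The mode is the complete ranking that was submitted most often.
mode_ranking, count = Counter(rankings).most_common(1)[0]
print(mode_ranking, count)  # (('yes', 2), ('no', 3)) 2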

@@ -202,13 +204,11 @@ def unify_responses(self, records: List[FeedbackRecord], question: str):
class RankingQuestionStrategy(RatingQuestionStrategyMixin, Enum):
"""
Options:
- "mean": the mean value of the rankings


Hi, this is a breaking change and should be reflected in the changelog and the docs. I would still prefer to include this strategy, to avoid having too much fragmentation in the unification methods.



def calculate_average_ranking(data):
    # Accumulate the sum of ranks and the number of occurrences per label.
    label_rank_sum = {}
    label_count = {}

    for ranking in data:
        for item in ranking:
            # Each item is a single-entry dict mapping a label to its rank;
            # popitem() pops that entry (emptying the dict as a side effect).
            label, rank = item.popitem()
            label_rank_sum[label] = label_rank_sum.get(label, 0) + rank
            label_count[label] = label_count.get(label, 0) + 1

    # Average rank per label across all submitted rankings.
    average_ranking = {label: label_rank_sum[label] / label_count[label] for label in label_rank_sum}

    return average_ranking

# Example usage:
data = [[{"label_1": 2}, {"label_2": 1}], [{"label_1": 1}, {"label_2": 2}]]

result = calculate_average_ranking(data)
print(result)  # {'label_1': 1.5, 'label_2': 1.5}


After this, I would only expect the labels to be mapped back to the original available ranks through a zip(rank, sorted_result_based_on_values).
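
A minimal sketch of that mapping step, assuming the output of calculate_average_ranking above; averages_to_ranks is a hypothetical helper, not part of the PR, and ties are broken by the order in which labels were first seen:

def averages_to_ranks(average_ranking):
    # Sort labels by their average rank (lower average = better rank).
    sorted_labels = sorted(average_ranking, key=average_ranking.get)
    # Zip the sorted labels back onto the original integer ranks 1..n.
    ranks = range(1, len(sorted_labels) + 1)
    return [{"value": label, "rank": rank} for rank, label in zip(ranks, sorted_labels)]

print(averages_to_ranks({"label_1": 1.5, "label_2": 1.5}))
# [{'value': 'label_1', 'rank': 1}, {'value': 'label_2', 'rank': 2}]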


codecov bot commented Nov 22, 2023

Codecov Report

Attention: 50 lines in your changes are missing coverage. Please review.

Comparison is base (b97a4fc) 64.76% compared to head (75593d0) 64.66%.
Report is 2 commits behind head on develop.

Files | Patch % | Lines
src/argilla/client/feedback/unification.py | 5.66% | 50 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #4295      +/-   ##
===========================================
- Coverage    64.76%   64.66%   -0.10%     
===========================================
  Files          321      321              
  Lines        18511    18540      +29     
===========================================
+ Hits         11988    11989       +1     
- Misses        6523     6551      +28     


@plaguss (Contributor, Author) commented Nov 24, 2023

While testing more functionality for the metrics, I've found another bug, which I will try to fix in this PR:

Testing with the plaguss/go_emotions_raw dataset from Hugging Face, and checking the responses of one of the records, for example:

>>> feedback_dataset.records[0].responses

[ResponseSchema(user_id=UUID('00000000-0000-0000-0000-000000000001'), values={'label': ValueSchema(value=['neutral'])}, status=<ResponseStatus.submitted: 'submitted'>),
 ResponseSchema(user_id=UUID('00000000-0000-0000-0000-000000000016'), values={'label': ValueSchema(value=['anger', 'annoyance', 'optimism'])}, status=<ResponseStatus.submitted: 'submitted'>),
 ResponseSchema(user_id=UUID('00000000-0000-0000-0000-000000000028'), values={'label': ValueSchema(value=['approval'])}, status=<ResponseStatus.submitted: 'submitted'>),
 ResponseSchema(user_id=UUID('00000000-0000-0000-0000-000000000039'), values={'label': ValueSchema(value=['neutral'])}, status=<ResponseStatus.submitted: 'submitted'>),
 ResponseSchema(user_id=UUID('00000000-0000-0000-0000-000000000048'), values={'label': ValueSchema(value=['annoyance'])}, status=<ResponseStatus.submitted: 'submitted'>)]

And after unifying the responses:

>>> feedback_dataset.records[0].unified_responses

{'label': [UnifiedValueSchema(value=[], strategy=<RatingQuestionStrategy.MAJORITY: 'majority'>)]}

We should instead get one of the labels contained in the responses.
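
For illustration, a minimal sketch of a majority (mode) unification over these multi-label responses, assuming the most frequent label across annotators is what is wanted (an interpretation of the expected behaviour, not the code in this PR):

from collections import Counter

# Labels submitted for the record above, one list per annotator.
responses = [["neutral"], ["anger", "annoyance", "optimism"], ["approval"], ["neutral"], ["annoyance"]]

# Count every label across all annotators and keep the most frequent one.
counts = Counter(label for labels in responses for label in labels)
majority = counts.most_common(1)[0][0]
print(majority)  # 'neutral' ('annoyance' is tied; most_common keeps first-encountered order on ties)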


Solved in the following commit.

dosubot bot added the size:L label (This PR changes 100-499 lines, ignoring generated files) on Nov 24, 2023
dosubot bot added the lgtm label (This PR has been approved by a maintainer) on Nov 26, 2023
davidberenstein1957 merged commit dcdf788 into develop on Nov 26, 2023
davidberenstein1957 deleted the fix/ranking-question-strategy branch on November 26, 2023 at 13:55
leiyre pushed a commit that referenced this pull request Nov 29, 2023
* develop: (30 commits)
  chore: increase dev version release to 1.21.0
  fix: responses and suggestions filter QA (#4337)
  feat: delete suggestion from record on search engine (#4336)
  feat: update suggestion from record on search engine (#4339)
  bug: fix bug and update test (#4341)
  fix: preserve `TextClassificationSettings.label_schema` order (#4332)
  Update issue templates
  feat: 🚀 support for filtering and sorting by responses and suggestions (#4160)
  fix: handling errors for non-existing endpoints (#4325)
  feat: adding utils module and functions (#4121)
  Update labels in github workflows (#4315)
  fix: correct unification implementation for `RankingQuestionStrategy` (#4295)
  fix: update to solve the error of integration tests in CI (#4314)
  docs: revisit install process (#4261)
  feat: increase timeout minutes for python tests (#4307)
  docs: docs export dataset does not apply coloring for code snippets (#4296)
  docs: update final section of the rag haystack blog post (#4294)
  feat: add multi_modal templates and update vector setting (#4283)
  feat: better logging bar for FeedbackDataset (#4267)
  refactor: ArgillaTrainer for unified variable usage (#4214)
  ...

# Conflicts:
#	frontend/v1/infrastructure/repositories/RecordRepository.ts