DebiasWrapper for metrics #98

Closed
blondered opened this issue Feb 21, 2024 · 1 comment
Labels
enhancement New feature or request


blondered (Collaborator) commented Feb 21, 2024

Feature Description

A metric wrapper that enables debiased validation when the test data has a strong popularity bias. One way to do this is to counter the power-law popularity distribution of test interactions on each fold by down-sampling the fold's most popular items.

Why this feature?

It provides a more reliable objective for hyperparameter tuning and model selection.

Additional context

We need an algorithm to detect and down-sample excessively popular items. More algorithms and modifications can be proposed here; for now we can use the IQR (interquartile range) rule, the same logic used for boxplot outliers:

  1. Find the first and third quartiles (Q1 and Q3) of the test items' popularity distribution.
  2. IQR = Q3 - Q1 is the interquartile range: the middle 50% of the observed values lie inside it.
  3. The outlier popularity border is defined as Q3 + iqr_coef * IQR.
  4. The maximum accepted popularity is the largest observed popularity that does not exceed the border.
  5. Every item whose popularity exceeds the border is down-sampled to the maximum accepted popularity.
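Written as formulas, with $p_i$ the number of test interactions of item $i$ in the fold and $c$ standing for iqr_coef (these symbol names are ours, not part of the proposal), the five steps amount to:

```latex
\mathrm{IQR} = Q_3 - Q_1, \qquad
b = Q_3 + c \cdot \mathrm{IQR}, \qquad
p_{\max} = \max\{\, p_i : p_i \le b \,\}, \qquad
p_i' = \min(p_i,\ p_{\max})
```

Items with $p_i \le b$ already satisfy $p_i \le p_{\max}$ and are left untouched; only items above $b$ are cut to $p_{\max}$. For example, with fold popularities 1, 2, 2, 3, 3, 4, 50 and linear-interpolation quartiles (an assumption, since the proposal does not fix a quantile method), Q1 = 2, Q3 = 3.5, IQR = 1.5 and, for c = 1.5, b = 5.75, so $p_{\max} = 4$ and the item with 50 interactions is down-sampled to 4.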

For every exceeding item in the test fold, we randomly keep only the maximum allowed number of its users, i.e. we down-sample its interactions.
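A minimal pandas sketch of steps 1 to 5 plus the random down-sampling could look like this. It assumes each test-fold row is one (user, item) interaction with the item in an `item_id` column, so keeping a random subset of an item's rows keeps a random subset of its users. The function name `downsample_popular_items`, the column name and the use of NumPy's default (linear) quantile interpolation are assumptions for illustration, not part of the proposal.

```python
from typing import Optional

import numpy as np
import pandas as pd


def downsample_popular_items(
    interactions: pd.DataFrame,
    iqr_coef: float = 1.5,
    random_state: Optional[int] = None,
    item_col: str = "item_id",
) -> pd.DataFrame:
    """Cap the test popularity of IQR-outlier items (hypothetical helper)."""
    # Popularity = number of test interactions per item in this fold.
    popularity = interactions[item_col].value_counts()

    # Steps 1-3: quartiles (NumPy default linear interpolation), IQR, outlier border.
    q1, q3 = np.quantile(popularity.to_numpy(), [0.25, 0.75])
    border = q3 + iqr_coef * (q3 - q1)

    # Step 4: the largest observed popularity that does not exceed the border.
    max_accepted = int(popularity[popularity <= border].max())

    # Step 5: rows belonging to items whose popularity is above the border.
    is_popular = interactions[item_col].isin(popularity[popularity > border].index)
    if not is_popular.any():
        return interactions.copy()

    # Randomly keep max_accepted interactions (i.e. users) for each exceeding item.
    sampled = (
        interactions[is_popular]
        .groupby(item_col)
        .sample(n=max_accepted, random_state=random_state)
    )
    # Untouched rows plus the sampled rows, back in the original row order.
    return pd.concat([interactions[~is_popular], sampled]).sort_index()
```

Passing a fixed `random_state` makes the down-sampled fold reproducible, which matters when the same fold is scored for many hyperparameter candidates.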

The wrapper modifies the test interactions; afterwards, any metric can be calculated on them as usual.

from rectools.metrics import DebiasWrapper, Precision

debiased_precision = DebiasWrapper(Precision(k=10), iqr_coef=1.5, random_state=32)
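The wrapper idea itself can be sketched as a thin object that debiases the test interactions and then delegates to the wrapped metric unchanged. This is a hypothetical illustration, not the RecTools implementation: `DebiasWrapperSketch`, `ToyPrecision` and the duck-typed `calc(reco, interactions)` interface are assumptions made for the example, and the debiasing step applies the IQR rule described above.

```python
from dataclasses import dataclass
from typing import Any, Optional

import numpy as np
import pandas as pd


@dataclass
class ToyPrecision:
    """Hypothetical Precision@k; reco has user_id, item_id, rank columns."""

    k: int

    def calc(self, reco: pd.DataFrame, interactions: pd.DataFrame) -> float:
        top_k = reco[reco["rank"] <= self.k]
        relevant = interactions[["user_id", "item_id"]].drop_duplicates()
        hits = top_k.merge(relevant, on=["user_id", "item_id"])
        # Average over users present in the (possibly debiased) test interactions.
        users = interactions["user_id"].unique()
        per_user = hits.groupby("user_id").size().reindex(users, fill_value=0) / self.k
        return float(per_user.mean())


@dataclass
class DebiasWrapperSketch:
    """Hypothetical wrapper: debias test interactions, then call the wrapped metric."""

    metric: Any  # any object with calc(reco, interactions) -> float (assumed interface)
    iqr_coef: float = 1.5
    random_state: Optional[int] = None

    def debias(self, interactions: pd.DataFrame) -> pd.DataFrame:
        # IQR rule: cap outlier items at the maximum accepted popularity.
        popularity = interactions["item_id"].value_counts()
        q1, q3 = np.quantile(popularity.to_numpy(), [0.25, 0.75])
        border = q3 + self.iqr_coef * (q3 - q1)
        max_accepted = int(popularity[popularity <= border].max())
        is_popular = interactions["item_id"].isin(popularity[popularity > border].index)
        if not is_popular.any():
            return interactions
        sampled = interactions[is_popular].groupby("item_id").sample(
            n=max_accepted, random_state=self.random_state
        )
        return pd.concat([interactions[~is_popular], sampled]).sort_index()

    def calc(self, reco: pd.DataFrame, interactions: pd.DataFrame) -> float:
        # The wrapped metric is unchanged; only the test data it sees is debiased.
        return self.metric.calc(reco, self.debias(interactions))
```

A call such as `DebiasWrapperSketch(ToyPrecision(k=10), iqr_coef=1.5, random_state=32).calc(reco, interactions)` mirrors the proposed usage. Keeping the debiasing outside the metric is what lets any metric be wrapped: the metric never needs to know that its test interactions were modified.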

Other possible names: PopDownSamplingWrapper, DownSamplingWrapper, UnbiasedWrapper.

blondered added the enhancement New feature or request label Feb 21, 2024
blondered moved this to 📋 Backlog in RecTools board Feb 21, 2024
blondered moved this from 📋 Backlog to 🆕 New in RecTools board Feb 21, 2024
blondered changed the title from "Down-sampling validation tools for popularity bias cases" to "PopDownSamplingWrapper for metrics" Feb 21, 2024
blondered moved this from 🆕 New to 🔖 Next in RecTools board Feb 26, 2024
blondered changed the title from "PopDownSamplingWrapper for metrics" to "DebiasWrapper for metrics" Feb 26, 2024
In48semenov self-assigned this Mar 13, 2024
blondered moved this from 🔖 Next to 🏗 In progress in RecTools board Mar 22, 2024
blondered moved this from 🏗 In progress to 👀 In review in RecTools board May 17, 2024
feldlime moved this from 👀 In review to ✅ Done in RecTools board Aug 5, 2024
feldlime (Collaborator) commented Aug 5, 2024

Closed with #152

feldlime closed this as completed Aug 5, 2024