DebiasWrapper for metrics #98

blondered · 2024-02-21T14:34:39Z

Feature Description

A metric wrapper that provides debiased validation when test data has a strong popularity bias. One way to do this is to counter the power-law popularity distribution of test interactions by down-sampling the most popular items in each fold.

Why this feature?

It gives a more appropriate objective for hyperparameter tuning and model selection.

Additional context

We need an algorithm to detect and down-sample excessively popular items. More algorithms and modifications can be proposed here. For now we can use the IQR (interquartile range) rule, the same logic that is used for boxplots.

Find the first and third quartiles (Q1 and Q3) of the item popularity distribution in the test data.
IQR = Q3 - Q1 is the interquartile range; 50% of the observed data lies inside it.
The outlier popularity border is defined as Q3 + iqr_coef * IQR.
The maximum accepted popularity is defined as the largest observed popularity that lies inside the border.
Every item whose popularity exceeds the border should be down-sampled to the maximum accepted popularity.
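
The steps above can be sketched as follows. This is only an illustration of the IQR rule, not the implementation that was eventually merged: the function name `max_accepted_popularity`, its signature, and the toy data are assumptions, and it uses only the standard library, whereas real code would more likely work on a pandas interactions table. It also assumes `iqr_coef >= 0`, so that at least one item lies inside the border.

```python
from collections import Counter
from statistics import quantiles


def max_accepted_popularity(test_item_ids, iqr_coef=1.5):
    """Return (border, max_accepted) for item popularity in one test fold.

    Hypothetical helper: name and signature are illustrative only.
    """
    # Popularity of an item = number of test interactions it appears in.
    popularity = Counter(test_item_ids)
    counts = sorted(popularity.values())
    # "inclusive" interpolates linearly between observed values.
    q1, _, q3 = quantiles(counts, n=4, method="inclusive")
    iqr = q3 - q1
    border = q3 + iqr_coef * iqr
    # Largest observed popularity that does not exceed the border.
    max_accepted = max(c for c in counts if c <= border)
    return border, max_accepted


# Toy fold: item "a" is far more popular than the others.
item_ids = ["a"] * 20 + ["b"] * 3 + ["c"] * 2 + ["d"] * 2 + ["e"] * 1
border, max_accepted = max_accepted_popularity(item_ids)
```

In the toy fold the popularities are 1, 2, 2, 3 and 20, so Q1 = 2, Q3 = 3, the border is 3 + 1.5 * 1 = 4.5, and the maximum accepted popularity is 3. Only item "a" exceeds the border.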

For every exceeding item in the test fold, we randomly keep only a subset of its users whose size equals the maximum accepted popularity.
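
A minimal sketch of this down-sampling step, assuming test interactions are given as unique `(user_id, item_id)` pairs and that the maximum accepted popularity has already been computed. The function name `downsample_popular_items` and its parameters are hypothetical and are not part of RecTools.

```python
import random
from collections import defaultdict


def downsample_popular_items(interactions, max_accepted, random_state=None):
    """Keep at most `max_accepted` randomly chosen users per item.

    Hypothetical helper: name and signature are illustrative only.
    """
    rng = random.Random(random_state)  # seeded for reproducible sampling

    # Group users by the item they interacted with.
    users_by_item = defaultdict(list)
    for user_id, item_id in interactions:
        users_by_item[item_id].append(user_id)

    kept = []
    for item_id, users in users_by_item.items():
        if len(users) > max_accepted:
            # Excessively popular item: keep a random subset of its users.
            users = rng.sample(users, max_accepted)
        kept.extend((user_id, item_id) for user_id in users)
    return kept


# Item "a" has 20 users; with max_accepted=3 only 3 of them are kept.
interactions = [(u, "a") for u in range(20)] + [(0, "b"), (1, "b"), (2, "b"), (3, "c")]
debiased = downsample_popular_items(interactions, max_accepted=3, random_state=32)
```

Items at or below the limit keep all of their interactions, so only the head of the popularity distribution is trimmed. Fixing `random_state` makes the sampled subset repeatable between runs.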

The wrapper changes test interactions, but afterwards any metrics can be calculated as usual.

from rectools.metrics import DebiasWrapper, Precision

debiased_precision = DebiasWrapper(Precision(k=10), iqr_coef=1.5, random_state=32)

Other possible names: PopDownSamplingWrapper, DownSamplingWrapper, UnbiasedWrapper


feldlime · 2024-08-05T19:00:08Z

Closed with #152

blondered added the enhancement New feature or request label Feb 21, 2024

blondered added this to RecTools board Feb 21, 2024

blondered moved this to 📋 Backlog in RecTools board Feb 21, 2024

blondered moved this from 📋 Backlog to 🆕 New in RecTools board Feb 21, 2024

blondered changed the title ~~Down-sampling validation tools for popularity bias cases~~ PopDownSamplingWrapper for metrics Feb 21, 2024

blondered moved this from 🆕 New to 🔖 Next in RecTools board Feb 26, 2024

blondered changed the title ~~PopDownSamplingWrapper for metrics~~ DebiasWrapper for metrics Feb 26, 2024

In48semenov self-assigned this Mar 13, 2024

blondered moved this from 🔖 Next to 🏗 In progress in RecTools board Mar 22, 2024

blondered moved this from 🏗 In progress to 👀 In review in RecTools board May 17, 2024

feldlime moved this from 👀 In review to ✅ Done in RecTools board Aug 5, 2024

feldlime closed this as completed Aug 5, 2024
