Wrong estimated accuracy values for multiclass classification using CBPE #346

Closed
nnansters opened this issue Nov 28, 2023 · 0 comments
Labels
bug Something isn't working

Describe the bug
We have a dataset for a multiclass classification problem. When estimating accuracy using CBPE, the results differ depending on the label values being used.

Replacing the string-valued labels with integer values yields different, correct results.
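
Since the result depends on the label values, one quick sanity check before fitting is to confirm that the values found in y_true and y_pred are exactly the keys of the y_pred_proba mapping. The helper below is a minimal sketch in plain pandas; `check_label_keys` and the toy frame are hypothetical and not part of NannyML.

```python
import pandas as pd


def check_label_keys(df, y_true, y_pred, y_pred_proba):
    """Compare labels observed in the data with the keys of the y_pred_proba mapping."""
    observed = set(df[y_true].unique()) | set(df[y_pred].unique())
    declared = set(y_pred_proba.keys())
    return {
        "missing_from_mapping": observed - declared,  # labels in the data without a probability column
        "unused_in_data": declared - observed,        # mapping keys that never occur in the data
    }


# Toy frame standing in for the reference data (values are made up).
toy = pd.DataFrame({
    "y_true": ["water", "desert", "clouds"],
    "y_pred": ["water", "green area", "clouds"],
})
mapping = {
    "water": "pred_proba_0",
    "desert": "pred_proba_1",
    "green area": "pred_proba_2",
    "clouds": "pred_proba_3",
}

report = check_label_keys(toy, "y_true", "y_pred", mapping)
print(report)
```

Both sets being empty means the labels and the mapping keys agree. It does not by itself show that CBPE handles them correctly, which is what this issue is about.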

To Reproduce

import pandas as pd
import nannyml as nml
from nannyml.thresholds import ConstantThreshold

ref = pd.read_parquet('image_satellite_reference_index.pq')
ana = pd.read_parquet('image_satellite_analysis_index.pq')

# No label replacement here: y_true and y_pred keep their original string values.

est = nml.CBPE(
    timestamp_column_name='timestamp',
    chunk_period='D',
    metrics=['accuracy'],
    y_true='y_true',
    y_pred='y_pred',
    y_pred_proba={'water': 'pred_proba_0', 'desert': 'pred_proba_1', 'green area': 'pred_proba_2', 'clouds': 'pred_proba_3'},
    problem_type='classification_multiclass',
    thresholds={'accuracy': ConstantThreshold(lower=0.95, upper=1.0)}
).fit(ref)

res = est.estimate(ana)
res.plot().show()

This code does not fail, but the resulting accuracy estimates are incorrect.
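
The issue does not state the root cause. Purely as an illustration of how a wrong value can appear without any error, the sketch below estimates accuracy as the mean probability assigned to the predicted class, and shows that pairing labels with the wrong probability columns silently changes the number. This is not NannyML's implementation: `estimated_accuracy`, the toy data and the "shifted" mapping are all assumptions made up for the example, and the probabilities are treated as already calibrated.

```python
import numpy as np
import pandas as pd


def estimated_accuracy(df, y_pred, y_pred_proba):
    """Mean probability of the predicted class, looked up through the label-to-column mapping."""
    cols = df[y_pred].map(y_pred_proba)  # per row: name of the column for the predicted label
    picked = [df.at[idx, col] for idx, col in cols.items()]
    return float(np.mean(picked))


# Made-up predictions and class probabilities (each row sums to 1).
toy = pd.DataFrame({
    "y_pred": ["water", "desert", "clouds", "green area"],
    "pred_proba_0": [0.90, 0.10, 0.00, 0.10],  # water
    "pred_proba_1": [0.05, 0.70, 0.10, 0.10],  # desert
    "pred_proba_2": [0.03, 0.10, 0.10, 0.60],  # green area
    "pred_proba_3": [0.02, 0.10, 0.80, 0.20],  # clouds
})

intended = {
    "water": "pred_proba_0",
    "desert": "pred_proba_1",
    "green area": "pred_proba_2",
    "clouds": "pred_proba_3",
}
# Hypothetical mismatch: labels sorted alphabetically, columns kept in their original order.
shifted = dict(zip(sorted(intended), intended.values()))

correct = estimated_accuracy(toy, "y_pred", intended)
wrong = estimated_accuracy(toy, "y_pred", shifted)
print(correct, wrong)
```

Integer labels 0 to 3 sort in the same order as their columns, so a mismatch of this hypothetical kind would not affect them.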

The correct results are given when replacing the labels in y_true and y_pred with integers:

ref = pd.read_parquet('image_satellite_reference_index.pq')
ana = pd.read_parquet('image_satellite_analysis_index.pq')

replacement = {'water': 0, 'desert': 1, 'green area': 2, 'clouds': 3}
ref.replace({'y_true': replacement, 'y_pred': replacement}, inplace=True)
ana.replace({'y_true': replacement, 'y_pred': replacement}, inplace=True)

est = nml.CBPE(
    timestamp_column_name='timestamp',
    chunk_period='D',
    metrics=['accuracy'],
    y_true='y_true',
    y_pred='y_pred',
    y_pred_proba={0: 'pred_proba_0', 1: 'pred_proba_1', 2: 'pred_proba_2', 3: 'pred_proba_3'},
    problem_type='classification_multiclass',
    thresholds={'accuracy': ConstantThreshold(lower=0.95, upper=1.0)}
).fit(ref)

res = est.estimate(ana)
res.plot().show()

Expected behavior
Running both snippets of code (with and without replacing label values) should yield the same results.
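
One way to turn this expectation into a regression check is to compare the per-chunk estimates from both runs within a small tolerance. The helper below is a sketch in plain NumPy; `same_estimates` is a hypothetical name and the numbers are made up, not output from the snippets above.

```python
import numpy as np


def same_estimates(a, b, atol=1e-9):
    """True when two sequences of per-chunk estimates have equal length and match within atol."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return a.shape == b.shape and bool(np.allclose(a, b, atol=atol, equal_nan=True))


# Made-up per-chunk accuracy estimates for the string-label and integer-label runs.
string_run = [0.96, 0.95, 0.97]
integer_run = [0.96, 0.95, 0.97]
diverging_run = [0.96, 0.80, 0.97]

print(same_estimates(string_run, integer_run))    # expectation met
print(same_estimates(string_run, diverging_run))  # the kind of discrepancy reported here
```

With real results, the two inputs would be the estimated accuracy column taken from each run's result data frame, assuming the result object exposes one (for example through a `to_df()` method).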

nnansters added the bug and triage labels on Nov 28, 2023
nnansters self-assigned this on Nov 28, 2023
nnansters removed the triage label on Nov 28, 2023