Pytest can't detect 2 or more outliers in a function #274

Closed
jxnior01 opened this issue May 5, 2023 · 2 comments
Assignees
Labels
invalid This doesn't seem right

Comments

@jxnior01
Contributor

jxnior01 commented May 5, 2023

Describe the bug

The function remove_rows_with_outliers does not work correctly: it detects and removes at most one outlier. When the data contains two or more outliers, all of them should be removed, but they are not.

To Reproduce

  1. Run the test test_should_remove_rows_with_outliers in /Stdlib/tests/safeds/data/tabular/containers/table/test_remove_rows_with_outliers.py

The test runs successfully and removes only the one row containing an outlier.

  2. Add one or more outliers to an existing column, e.g. col1, col2, or col3.

Outlier definition: An outlier is a value that lies more than 3 standard deviations from the column mean. Missing values are not considered outliers, and they are ignored when the standard deviation is calculated.

  3. Repeat step 1.
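
For reference, the outlier definition from step 2 can be sketched in plain Python. This is only an illustration, not the library's implementation: the helper name outlier_indices is hypothetical, None stands in for a missing value, and using the population standard deviation (statistics.pstdev) is an assumption, since the definition does not say which variant is meant.

```python
import statistics


def outlier_indices(values, threshold=3.0):
    """Return indices of values more than `threshold` standard deviations from the mean.

    Hypothetical helper for illustration; None marks a missing value.
    """
    # Missing values are ignored when computing the mean and standard deviation.
    present = [v for v in values if v is not None]
    if not present:
        return []

    mean = statistics.fmean(present)
    # Assumption: population standard deviation (the definition does not specify).
    std = statistics.pstdev(present)

    # Missing values are never reported as outliers.
    return [
        i
        for i, v in enumerate(values)
        if v is not None and abs(v - mean) > threshold * std
    ]


column = [1.0] * 11 + [None, 1000.0]
print(outlier_indices(column))  # [12]
```

Note that the mean and standard deviation are computed from the column including the extreme values, so every value added to a column shifts the threshold for all other values.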

Expected behavior

All Outliers should be detected and removed.

Screenshots (optional)

No response

Additional Context (optional)

No response

@zzril
Contributor

zzril commented May 19, 2023

The test function should also be changed to test_should_remove_rows_with_outliers.

A new test should also be added to catch the incorrect old behaviour.

@Marsmaennchen221
Contributor

The current function can remove multiple outliers. It is possible that you added another high value, but with this addition the outliers were no longer outliers because of the small amount of data you used for testing. In #309 I added 2 new tests that cover multiple outliers in different columns and multiple outliers in the same column.
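
The masking effect described here can be reproduced with a small sketch. It is not the code of Table.remove_rows_with_outliers: the helpers z_scores and count_outliers are hypothetical, the data is made up, and the population standard deviation is an assumption.

```python
import statistics


def z_scores(values):
    """Distance of each value from the mean, in standard deviations.

    Hypothetical helper; assumes the values are not all equal.
    """
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)  # assumption: population standard deviation
    return [abs(v - mean) / std for v in values]


def count_outliers(values, threshold=3.0):
    """Count values that lie more than `threshold` standard deviations from the mean."""
    return sum(z > threshold for z in z_scores(values))


small_one = [1.0] * 11 + [1000.0]      # 12 rows, one extreme value
small_two = [1.0] * 11 + [1000.0] * 2  # same data plus a second extreme value
large_two = [1.0] * 30 + [1000.0] * 2  # more rows, two extreme values

print(count_outliers(small_one))  # 1
print(count_outliers(small_two))  # 0: the second value inflates mean and std
print(count_outliers(large_two))  # 2
```

In this two-valued setup, k identical extreme values among n rows each lie sqrt((n - k) / k) standard deviations from the mean, so with only 13 rows two extreme values reach about 2.35 and fall below the threshold of 3. With enough rows both are detected.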

Marsmaennchen221 closed this as not planned May 26, 2023
github-project-automation bot moved this from In Progress to ✔️ Done in Library May 26, 2023
Marsmaennchen221 added the invalid label May 26, 2023
jxnior01 pushed a commit that referenced this issue Jun 12, 2023
…le outliers in one column and outliers in two different columns (#309)

### Summary of Changes

test: Added test for `Table.remove_rows_with_outliers` to test multiple
outliers in one column and outliers in two different columns

See my
[Comment](#274 (comment))
in the referenced issue for further explanation as there is no bug with
the current method `Table.remove_rows_with_outliers`

### Additional Context

See #274

---------

Co-authored-by: megalinter-bot <129584137+megalinter-bot@users.noreply.github.com>
Co-authored-by: Severin Paul Höfer <84280965+zzril@users.noreply.github.com>

3 participants