Z score returns DivisionByZeroError when N = 0. #40

Closed
ditlevjoergensen opened this issue Sep 23, 2020 · 5 comments

Comments

@ditlevjoergensen

Hello. I hope this is the correct way to report issues I have run into using your program.

When I try using Benford on a dataset containing only single-digit integers, it raises a ZeroDivisionError when calculating F2D/F3D/SD/L2D, simply because the Base() class, which transforms the data, places -1 for all values, thus giving N = 0.

To bypass this I have simply made my test data contain values > 1000. However, I can't manipulate production data. As of today, there are no issues, as every dataset I test Benford() on luckily contains at least one data point with a value > 1000.
I don't like having to wrap Benford in a try/except each time I use it, simply because it calculates all the digit tests, i.e. F1D, F2D, F3D, SD, L2D.

I think there should be an input to Benford() stating which digits to test against?

```python
from numpy import sqrt  # element-wise square root, works on a pandas Series


def Z_score(frame, N):
    """Computes the Z statistics for the proportions studied

    Args:
        frame: DataFrame with the expected proportions and the already calculated
            Absolute Differences between the found and expected proportions
        N: sample size

    Returns:
        Series of computed Z scores
    """
    return (frame.AbsDif - (1 / (2 * N))) / sqrt(
        (frame.Expected * (1. - frame.Expected)) / N)
```

```
E ZeroDivisionError: division by zero
```

Hope this makes sense. There is nothing wrong with the Z_score function; the issue is how Benford() tries to test against e.g. F3D even if there is no 3-digit number in the dataset.
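
A minimal sketch of the failure mode described above, using only the standard library. The helper `count_with_min_digits` is hypothetical and only stands in for the library's data preparation; it is not benford_py's actual code. The scalar `z_score` mirrors the formula quoted above for a single proportion, and the proportions passed to it are made-up illustration values.

```python
from math import sqrt


def count_with_min_digits(values, min_digits):
    """Hypothetical helper: count values that have at least `min_digits` digits."""
    return sum(1 for v in values if len(str(abs(int(v)))) >= min_digits)


def z_score(abs_dif, expected, n):
    """Scalar version of the Z statistic quoted in this issue."""
    return (abs_dif - (1 / (2 * n))) / sqrt((expected * (1.0 - expected)) / n)


data = [3, 7, 1, 9, 4]                   # single-digit integers only
n_f3d = count_with_min_digits(data, 3)   # no value qualifies for a 3-digit test

try:
    z_score(0.01, 0.004, n_f3d)          # illustrative proportions, not real results
    outcome = "ok"
except ZeroDivisionError:
    outcome = "ZeroDivisionError"
```

With no value long enough for the test, the sample size is zero and the first division in the formula fails before any statistic is computed.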

@milcent
Owner

milcent commented Sep 23, 2020

Hi @ditlevjoergensen,
Thanks for posting!
This is a nice suggestion. Maybe an automatic detector of which tests could be performed.
For it to work right now, you could try setting the ‘decimals’ parameter to a value higher than zero (even if you are dealing with integers). During preparation of the data, the values are multiplied by 10**decimals. In theory, this would not affect F1D, but it may affect the results of the other tests, since it appends zeros to the right of numbers that would not originally enter the test.
Or you could use the old functions that perform only the tests of their names, like bf.first_digits(), bf.second_digit(), and bf.last_two_digits(). Their downside is they process the input data every time they are called, while the Benford object does this only once.
Let me know if this works for you.
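
The effect of `decimals` described above can be illustrated with plain Python. This is a sketch of the idea only: `leading_digits` is a hypothetical helper, not the library's preparation code, and returning -1 for numbers that are too short is an assumption based on the behaviour reported earlier in this thread.

```python
def leading_digits(value, n_digits, decimals=0):
    """Hypothetical helper: first `n_digits` digits of value * 10**decimals.

    Returns -1 when the scaled number has fewer than `n_digits` digits.
    """
    scaled = str(abs(int(round(value * 10 ** decimals))))
    if len(scaled) < n_digits:
        return -1
    return int(scaled[:n_digits])


without_scaling = leading_digits(7, 3)            # too short for a 3-digit test
with_scaling = leading_digits(7, 3, decimals=2)   # 7 becomes 700 and now qualifies
first_digit = leading_digits(7, 1, decimals=2)    # the first digit is unchanged
```

The padded zeros are why the first-digit result stays the same while the other tests may shift: numbers such as 7 enter the three-digit test as 700, which they would not have done originally.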

@ditlevjoergensen
Author

Thank you @milcent for your swift response! I will look into using the bf.first_digits() functions.
As of now I've added the try/catch just to be safe as I'm unsure how future data looks. Then I'll manually handle errors and place -1 when it raises ZeroDivisionError.
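
A sketch of the kind of fallback described above, assuming a generic wrapper around whichever test is called. Both `run_or_default` and `failing_test` are hypothetical names for illustration and are not part of benford_py.

```python
def run_or_default(test, *args, default=-1, **kwargs):
    """Hypothetical wrapper: run `test`, return `default` on ZeroDivisionError."""
    try:
        return test(*args, **kwargs)
    except ZeroDivisionError:
        return default


def failing_test(n):
    # Stand-in for a digit test whose sample size turned out to be zero.
    return 1 / (2 * n)


fallback = run_or_default(failing_test, 0)   # division by zero is caught
normal = run_or_default(failing_test, 1)     # regular result passes through
```

Catching only ZeroDivisionError keeps other errors visible, so the -1 placeholder marks just the case where a test had no data to work with.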

@ditlevjoergensen
Author

Hi again @milcent,
Using bf.first_digits() and bf.second_digit() solved it for me! :) Using those I was able to get F1D, F2D and SD, which I needed.

I found that setting the input parameters KS=True and chi_squared=True didn't work. However, it didn't matter, as I simply imported the functions directly from stats and used them manually on the result. The functions only returned [['Counts', 'Found', 'Expected', 'Z_score']].

Thank you for helping me in the right direction, I will close the issue.

regards Ditlev.

@milcent
Owner

milcent commented Sep 24, 2020

Dear @ditlevjoergensen ,
Yes, the KS and Chi-square results in those old functions are only printed if you turn them on, since the functions only return a DataFrame. So if you are using them in production (as you mentioned), the results won't be stored anywhere.
You have given me nice ideas. Even with the issue closed, I shall include them as improvements.
Cheers,
Milcent

@milcent
Owner

milcent commented Nov 20, 2020

I have inserted a fix in an internal function, so the Z-score won't receive N=0.
Clone the latest master branch and try it out.
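
The fix itself is not shown in this thread, so the following is only a guess at the kind of guard that keeps N=0 away from the Z-score formula. The name `z_score_or_none` and the choice of returning None are assumptions for illustration, not the library's implementation.

```python
from math import sqrt


def z_score_or_none(abs_dif, expected, n):
    """Hypothetical guard: return the Z statistic, or None when n is zero."""
    if n == 0:
        return None  # nothing to test, so skip instead of dividing by zero
    return (abs_dif - 1 / (2 * n)) / sqrt(expected * (1.0 - expected) / n)


skipped = z_score_or_none(0.01, 0.3, 0)       # empty sample is skipped
computed = z_score_or_none(0.105, 0.5, 100)   # regular case is computed
```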
