Lack of Clarity on the Parameters of the Distribution #23

Closed
cschiri opened this issue Jun 7, 2020 · 8 comments

@cschiri

cschiri commented Jun 7, 2020

If I use the "get_best" method as follows:

f.get_best(method='sumsquare_error')

It returns the best fitted distribution and its parameters; i.e., a dictionary with one key (the distribution name) and its parameters.

For instance:

{'beta': (1.0900359801761663, 0.8058383063379988, -9.543996466545888, 107.5439964665459)}

Could you please provide clarity on which is the mean, standard deviation, etc? The package documentation does not provide clarity.

@shadowboxingskills

+1 (same question/issue as @cschiri)

@cokelaer
Owner

I believe that the list of values is in exactly the same order as the one scipy uses under the hood. It is not necessarily obvious which is which. I am not sure I will implement this soon, though. If you are willing to help, I'll be happy to include this feature. It may have side effects when plotting the results, so maybe it would be easier to have a new method do the work. Sorry for not helping more.

@easonanalytica

A quick note for those in need: take beta, for instance, whose parameters are (a, b, loc, scale). For scipy.stats continuous distributions, loc and scale are always the last two values, after any shape parameters. They are the mean and standard deviation only for the normal distribution, which has just 2 parameters (loc, scale); for other distributions they shift and stretch the distribution. In the above example, we have {loc: -9.543996466545888, scale: 107.5439964665459}.
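To illustrate why loc and scale should not be read as the mean and standard deviation for beta, here is a minimal sketch that computes both from the fitted tuple using the textbook closed-form beta moments. The helper name beta_mean_std is hypothetical and not part of fitter or scipy.

```python
import math

# Fitted values from the example in this thread, in scipy's order: (a, b, loc, scale).
a, b, loc, scale = (1.0900359801761663, 0.8058383063379988,
                    -9.543996466545888, 107.5439964665459)


def beta_mean_std(a, b, loc=0.0, scale=1.0):
    """Mean and standard deviation of a beta(a, b) shifted by loc and stretched by scale.

    Hypothetical helper for illustration only.
    """
    unit_mean = a / (a + b)                         # mean of the beta on [0, 1]
    unit_var = a * b / ((a + b) ** 2 * (a + b + 1))  # variance of the beta on [0, 1]
    return loc + scale * unit_mean, scale * math.sqrt(unit_var)


mean, std = beta_mean_std(a, b, loc, scale)
print(f"mean={mean:.4f}, std={std:.4f}")
```

With scipy installed, scipy.stats.beta(a, b, loc=loc, scale=scale).mean() and .std() should give the same numbers, and the same frozen-distribution pattern applies to the other continuous distributions.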

@Magiclemond

I have the same question about the erlang distribution: I cannot tell what the parameters mean.

@rahul-raoniar

rahul-raoniar commented Jun 24, 2021

The Fitter package is very useful. Thanks to all contributors. To make it more accessible to students and researchers, I wrote this blog post [I recently added the Streamlit app link to it]:

Medium Blog Link

Yes, this problem exists in the get_best function. Even the SciPy documentation is sometimes unclear. I tried to build a Streamlit app using the Fitter library and faced the same issue. To resolve it, I scraped all the distribution-related data and built a dictionary whose keys are the parameter names and whose values are the best parameter values.

The problem is that a few of the entries [which are not part of the distributions] produce an error when retrieving the best parameters with the dictionary approach.

# Removing these three from the dictionary resolved the issue.
#    "rv_continuous":   ,
#    "rv_histogram":     ,
#    "trapz":                 , 

Even scraping is not consistent, as some of the distribution pages follow the old URL style of the SciPy documentation.

For example, these four URLs follow the old SciPy documentation layout:

# https://docs.scipy.org/doc/scipy-0.14.0/reference/generated/scipy.stats.frechet_l.html
# https://docs.scipy.org/doc/scipy-0.14.0/reference/generated/scipy.stats.frechet_r.html
# https://docs.scipy.org/doc/scipy-0.14.0/reference/generated/scipy.stats.reciprocal.html
# https://docs.scipy.org/doc/scipy-0.17.0/reference/generated/scipy.integrate.trapz.html

The new page style is like this:

https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.norm.html

I hope this issue is resolved soon.

@kabirmdasraful

If I use the "get_best" method as follows:

f.get_best(method='sumsquare_error')

It returns the best fitted distribution and its parameters; i.e., a dictionary with one key (the distribution name) and its parameters.

For instance:

{'beta': (1.0900359801761663, 0.8058383063379988, -9.543996466545888, 107.5439964665459)}

Could you please provide clarity on which is the mean, standard deviation, etc? The package documentation does not provide clarity.

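import scipy.stats as st
from fitter import Fitter

f = Fitter(data)  # data: your 1-D sample to fit (Fitter requires it)
f.fit()
best = f.get_best(method='sumsquare_error')
# best holds a single entry: {distribution name: tuple of fitted values}
dist_name, params = next(iter(best.items()))
distribution = getattr(st, dist_name)
# scipy orders the values as shape parameters first, then loc and scale
param_names = (distribution.shapes + ', loc, scale').split(', ') if distribution.shapes else ['loc', 'scale']

param_dict = dict(zip(param_names, params))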

These few lines of code will help to get a parameter dictionary. @cokelaer, I believe you can find a way to integrate this code so that users can also see the parameter names.
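The same naming step can be sketched as a small standalone function, which makes it easy to check without fitting anything. The name name_params is hypothetical; the only assumption carried over from the snippet above is that the shapes argument looks like scipy's distribution.shapes attribute (a comma-separated string, or None when there are no shape parameters).

```python
def name_params(shapes, values):
    """Pair fitted values with names: shape parameters first, then loc and scale.

    shapes: comma-separated shape names in the style of scipy's
            `distribution.shapes` (None or '' if there are none).
    values: the fitted tuple, in the same order.
    Hypothetical helper for illustration only.
    """
    names = [s.strip() for s in shapes.split(",")] if shapes else []
    names += ["loc", "scale"]
    if len(names) != len(values):
        raise ValueError(f"expected {len(names)} values, got {len(values)}")
    return dict(zip(names, values))


# beta has two shape parameters; norm has none.
print(name_params("a, b", (1.09, 0.81, -9.54, 107.54)))
print(name_params(None, (0.0, 1.0)))
```

The length check is a design choice: it fails loudly if a name list and a fitted tuple ever get out of step, rather than silently dropping values the way zip alone would.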

kabirmdasraful pushed a commit to kabirmdasraful/fitter that referenced this issue Aug 24, 2021
@kabirmdasraful

@cokelaer I am not a good coder, and this is my first contribution to a public repository. The pull request is open. There seems to be a problem on Linux related to a gamma distribution, but it has nothing to do with the function I edited. You can still check and decide whether the pull request is OK to accept.

cokelaer added a commit that referenced this issue Sep 2, 2021
Solved issue #23: Lack of Clarity on the Parameters of the Distribution
@cokelaer
Owner

cokelaer commented Sep 2, 2021

@kabirmdasraful thanks again for your contribution. I will release a new version of fitter on PyPI (1.4.0) and will update the documentation accordingly.

@cokelaer cokelaer closed this as completed Sep 2, 2021

7 participants