modify cumulative_distribution function #220

Open
wants to merge 2 commits into
base: main

lei-1126

Hi, I have been using the copulas package to simulate real data. I fit a GaussianMultivariate model and then drew samples with GaussianMultivariate.sample(), but it was very slow when I fit on 20,000 samples (10 features) and generated 20,000 simulated rows. I found that a lot of the time was spent computing the cumulative distribution function, so I modified some of the source code in the gaussian_kde.py module to speed it up. In the end, my simulation ran about 100 times faster.

Of course, my data are all integers, so each variable has many samples with the same value, and I can count the occurrences of each value and use those counts as weights. Although copulas are intended for continuous data, in real situations integer data is often used for fitting, and a variable then contains many repeated values, especially when there is a lot of training data. In that situation, the modified code makes simulation much faster.
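To make the idea concrete, here is a minimal, dependency-free sketch of the counting trick. It is not the actual diff in this pull request and not the copulas API: the function names (`normal_cdf`, `kde_cdf_naive`, `make_grouped_cdf`) and the fixed `bandwidth` argument are assumptions made for illustration. It relies only on the fact that a Gaussian KDE's CDF is the average of one normal CDF per training sample, so identical samples contribute identical terms and can be merged into one weighted term.

```python
import math
from collections import Counter


def normal_cdf(z):
    """Standard normal CDF, written with math.erf."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def kde_cdf_naive(x, data, bandwidth):
    """Gaussian KDE CDF at x: one normal CDF term per training sample."""
    return sum(normal_cdf((x - xi) / bandwidth) for xi in data) / len(data)


def make_grouped_cdf(data, bandwidth):
    """Build a CDF that uses one weighted term per distinct value.

    Hypothetical helper: duplicates are counted once up front, and each
    distinct value v gets weight count(v) / n.
    """
    n = len(data)
    terms = [(value, count / n) for value, count in Counter(data).items()]

    def cdf(x):
        return sum(w * normal_cdf((x - v) / bandwidth) for v, w in terms)

    return cdf


# Integer-valued data with heavy repetition: 700 samples, 4 distinct values.
data = [1, 1, 2, 3, 3, 3, 5] * 100
grouped_cdf = make_grouped_cdf(data, bandwidth=0.7)

for x in (0.0, 1.5, 3.0, 6.0):
    print(x, kde_cdf_naive(x, data, 0.7), grouped_cdf(x))
```

Both functions should agree up to floating-point rounding, but the grouped version evaluates one term per distinct value instead of one per sample. The gain therefore depends entirely on how many repeated values the data contains; for genuinely continuous data with no duplicates there is nothing to merge.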

Best wishes

@lei-1126 lei-1126 closed this Apr 2, 2021
@lei-1126 lei-1126 reopened this Apr 2, 2021