modify cumulative_distribution function #220

lei-1126 · 2021-03-29T07:52:01Z

Hi, I have been using the copulas package to simulate real data. I fit a GaussianMultivariate model and then sampled from it with GaussianMultivariate.sample(), but this was very slow: fitting on 20000 samples (10 features) and then generating 20000 simulated rows took a long time. I found that most of the time was spent computing the cumulative distribution function, so I modified the source code of the gaussian_kde.py module to speed it up. In the end, my simulation ran about 100 times faster.

Of course, this works because my data are all integers. For each variable, many samples share the same value, so I can count each distinct value and use the counts as weights. Copulas are intended for continuous data, but in practice integer data are often used for fitting, and for a given variable most of the values repeat, especially when there is a lot of training data. In that situation, the modified code speeds up simulation considerably.
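The idea of replacing repeated samples with weighted distinct values can be checked with a small sketch. This is not the code from this PR (the diff is not shown here). It assumes a one-dimensional Gaussian kernel with a fixed bandwidth, and the helper names `kde_cdf`, `weighted_kde_cdf` and `compress` are hypothetical. The CDF of a Gaussian KDE is the average of the normal CDF Phi((x - s) / h) over all n samples. Grouping equal samples turns that into a sum over k distinct values weighted by count / n, which gives the same result with k terms instead of n.

```python
import math
from collections import Counter


def _normal_cdf(z):
    # Standard normal CDF expressed through the error function.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def kde_cdf(x, samples, bandwidth):
    # Hypothetical baseline: one kernel term per sample, O(n) per evaluation.
    return sum(_normal_cdf((x - s) / bandwidth) for s in samples) / len(samples)


def compress(samples):
    # Hypothetical helper: collapse repeated values into
    # (sorted distinct values, weights = count / n, summing to 1).
    counts = Counter(samples)
    n = len(samples)
    values = sorted(counts)
    weights = [counts[v] / n for v in values]
    return values, weights


def weighted_kde_cdf(x, values, weights, bandwidth):
    # Same CDF, but each distinct value contributes once, scaled by its weight:
    # O(k) per evaluation, where k is the number of distinct values.
    return sum(w * _normal_cdf((x - v) / bandwidth) for v, w in zip(values, weights))


if __name__ == "__main__":
    samples = [3, 3, 3, 5, 5, 7, 7, 7, 7, 10]
    h = 1.2  # assumed fixed bandwidth, shared by both versions
    values, weights = compress(samples)
    for x in (2.0, 5.0, 6.5, 11.0):
        print(x, kde_cdf(x, samples, h), weighted_kde_cdf(x, values, weights, h))
```

Both functions agree up to floating-point rounding. The weighted form only pays off when k is much smaller than n, as with integer-valued columns. One caveat applies: the equivalence holds only if the bandwidth stays the one computed from the full data. If the bandwidth is instead re-estimated from the weighted points (for example with SciPy's `gaussian_kde(..., weights=...)`, whose Scott/Silverman factors use the effective sample size `neff`), the kernel width and therefore the CDF can change.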

Best wishes

lei-1126 added 2 commits March 29, 2021 15:48

gaussian_kde.py: modify cumulative_distribution function

916cd57

add self._lower,self._upper

2cd71e7

lei-1126 closed this Apr 2, 2021

lei-1126 reopened this Apr 2, 2021
