Enhancement in CF Generation #259
Thanks @giandos200 for this PR. I ran all the gates. Could you please examine the failures and resubmit so that all the tests and linting pass? Your PR also appears to contain many changes that are out of scope for the performance improvement. It would be great if you could clean these up and send a commit focused only on the perf improvements.
It should be OK now, @gaugup. Maybe I was working from an older version, because I never changed the imports.
19f2e47 to 785bc4a
Please take a look at the failing gates.
Signed-off-by: Gaurav Gupta <gaugup@microsoft.com> Signed-off-by: giandos200 <giando95menico@gmail.com>
Hi @gaugup, I created this pull request to offer some suggestions on CF generation.
Regarding the Random method: although it is very clear and fast, the way it combines feature sampling with substitution is unclear. Instead of gradually replacing more features, the loop in your code only ever replaces one feature per sample, because of:
selected_features = np.random.choice(self.features_to_vary, (sample_size, 1), replace=True)
The 1 should be replaced by num_features_to_vary, and .loc should then be used instead of .at.
This variant is slower but certainly more complete, and it is still faster than Genetic/KDtree (I have deliberately left it commented out for you).
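To illustrate the multi-feature replacement described above, here is a minimal sketch on toy data. It is not the code from this PR: the helper name `replace_sampled_features`, the toy frames, and the choice to sample without replacement within each row (so a row never picks the same feature twice) are all assumptions made for the example.

```python
import numpy as np
import pandas as pd


def replace_sampled_features(candidates, random_instances, features_to_vary,
                             num_features_to_vary, rng):
    """Hypothetical helper: per row, copy several randomly chosen features
    from random_instances into a copy of candidates."""
    # Never ask for more distinct features than are available.
    k = min(num_features_to_vary, len(features_to_vary))
    out = candidates.copy()
    for idx in out.index:
        # Assumption: sample without replacement so each row gets k distinct features.
        cols = rng.choice(features_to_vary, size=k, replace=False)
        # .loc accepts a list of column labels, which .at does not.
        out.loc[idx, cols] = random_instances.loc[idx, cols].to_numpy()
    return out


rng = np.random.default_rng(0)
query_copies = pd.DataFrame(np.zeros((5, 4)), columns=list("abcd"))
random_rows = pd.DataFrame(np.ones((5, 4)), columns=list("abcd"))

candidate_cfs = replace_sampled_features(
    query_copies, random_rows, ["a", "b", "c"], 2, rng
)
# Each row now differs from the query in exactly two of a, b, c; d is untouched.
print(candidate_cfs)
```

The per-row `.loc` assignment is what makes this slower than the single-cell `.at` version, which matches the trade-off mentioned above.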
If you prefer to keep the single-feature variation, I suggest changing .at to ._get_value in the replacement for faster access.
As for the Genetic method: on datasets with many features, random initialization is very slow and seems never to finish. For this reason, I suggest increasing the share of the population that comes from the KDtree initialization (which also cuts the initialization time considerably). In addition, I recommend switching to a binary search when a large number of CFs is requested.