Computational performance - optimize for speed #107
Comments
I can't help directly, but ScikitLearn is built on PyCall.jl, so you can run the same check from there. Something like:
using PyCall
sys = pyimport("numpy.distutils.system_info")
sys.get_info("blas_opt")
Are you making one call with a big […]? Apart from that, it all depends on the Python code, so there's not much I can do there. DecisionTree.jl might provide […]
Hi @cstjean, Thanks for your reply!
No, I actually use the trained random forest classifiers (100 trees) to make single-sample predictions online. That is, each time I have only one datapoint with about 45 features. Extracting the features is almost instantaneous, whereas making the prediction takes about 0.1 seconds. I found this discussion on Stack Overflow: https://stackoverflow.com/questions/50676717/why-sklearn-random-forest-takes-the-same-time-to-predict-one-sample-than-n-sampl
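To separate a fixed per-call latency from the per-sample cost, one option is to time many single-row calls against one batched call. Below is a minimal sketch using only the standard library. `fake_predict` is a hypothetical stand-in for a model's predict method, and its 1 ms fixed overhead is an assumption made up for illustration, not a measurement of scikit-learn.

```python
import time

def fake_predict(rows, overhead_s=0.001):
    """Hypothetical stand-in for model.predict: fixed per-call cost plus trivial per-row work."""
    time.sleep(overhead_s)  # simulated fixed overhead (assumed value, not measured)
    return [sum(row) > 0 for row in rows]

def time_single_vs_batch(predict, rows):
    """Return (seconds for one call per row, seconds for one batched call)."""
    start = time.perf_counter()
    single = [predict([row])[0] for row in rows]
    t_single = time.perf_counter() - start

    start = time.perf_counter()
    batch = predict(rows)
    t_batch = time.perf_counter() - start

    # Both strategies must agree on the predictions.
    assert single == batch
    return t_single, t_batch

# 20 rows with 45 features each, mirroring the feature count mentioned above.
rows = [[float(i - 10)] * 45 for i in range(20)]
t_single, t_batch = time_single_vs_batch(fake_predict, rows)
print(f"20 single calls: {t_single:.4f}s, one batched call: {t_batch:.4f}s")
```

If the real model shows the same pattern (total time grows with the number of calls rather than the number of rows), the cost is per-call overhead, and buffering a few datapoints before predicting may help where the application allows it.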
DecisionTree.jl supports the ScikitLearn interface, so it shouldn't be too hard to give it a try!
Hi,
I am using the ScikitLearn.jl library to train Random Forest classifiers. After training, I notice that applying the trained models to new datapoints takes about 0.2 seconds. After some tests, this time appears to be unrelated to the number of trees and features. Instead, it looks like fixed latency.
I had a look at the scikit-learn webpage here: https://scikit-learn.org/0.15/modules/computational_performance.html
There they mention that the computational performance of scikit-learn relies heavily on NumPy/SciPy and linear algebra, so it makes sense to take care of these libraries. They suggest checking that NumPy is built against an optimized BLAS/LAPACK library, as follows:
from numpy.distutils.system_info import get_info
print(get_info('blas_opt'))
print(get_info('lapack_opt'))
Any idea of how I can check for this in Julia?
Otherwise, do you have any suggestions to speed up the ScikitLearn.jl predictions?
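On the Python side, `numpy.show_config()` also prints the BLAS/LAPACK build information, and it does not depend on `numpy.distutils`, which is deprecated in recent NumPy releases. Here is a small sketch. `numpy_build_report` is a hypothetical helper name, and it returns `None` when NumPy is not installed.

```python
import contextlib
import io

def numpy_build_report():
    """Return the text printed by numpy.show_config(), or None if NumPy is missing."""
    try:
        import numpy as np
    except ImportError:
        return None
    buf = io.StringIO()
    # show_config() prints to stdout, so capture it instead of returning a value.
    with contextlib.redirect_stdout(buf):
        np.show_config()
    return buf.getvalue()

report = numpy_build_report()
print(report if report is not None else "NumPy is not installed")
```

From Julia, the same function should be reachable through PyCall as `pyimport("numpy").show_config()`, assuming PyCall points at the Python environment that ScikitLearn.jl uses.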