Computational performance - optimize for speed #107

CBongiova opened this issue Jan 17, 2022 · 3 comments

@CBongiova

Hi,

I am using the ScikitLearn.jl library to train Random Forest classifiers. After training, I notice that re-applying the trained models to new datapoints takes about 0.2 seconds. After some tests, it seems that this amount of time is unrelated to the number of trees and features. Instead, it seems to be latency.

I had a look at the scikit-learn webpage here: https://scikit-learn.org/0.15/modules/computational_performance.html
There they mention that the computational performance of scikit-learn heavily relies on NumPy/SciPy and linear algebra, and that it makes sense to take care of these libraries. So they propose checking that NumPy is built against an optimized BLAS/LAPACK library, as follows:

from numpy.distutils.system_info import get_info
print(get_info('blas_opt'))
print(get_info('lapack_opt'))

Any idea of how I can check for this in Julia?
Otherwise, do you have any suggestions to speed up the ScikitLearn.jl predictions?

@cstjean
Owner

cstjean commented Jan 17, 2022

Any idea of how I can check for this in Julia?

I can't help directly, but ScikitLearn.jl is built on PyCall.jl, so you can run the same check through it. Something like:

using PyCall
sys_info = pyimport("numpy.distutils.system_info")
sys_info.get_info("blas_opt")  # shows which BLAS NumPy was built against

Otherwise, do you have any suggestions to speed up the ScikitLearn.jl predictions?

Are you making one call with a big n_samples × n_features matrix to get your predictions?
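
For instance, here is a minimal sketch of the difference (the RandomForestClassifier and the toy data are just stand-ins for your setup):

using ScikitLearn
@sk_import ensemble: RandomForestClassifier

# toy data: 1000 samples with 45 features each (placeholder for your real data)
X, y = rand(1000, 45), rand(0:1, 1000)
model = fit!(RandomForestClassifier(n_estimators=100), X, y)

# one call per sample: pays the Python call overhead 1000 times
@time for i in 1:size(X, 1)
    predict(model, X[i:i, :])
end

# a single batched call: pays the overhead once
@time predict(model, X)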

Apart from that, it all depends on the Python code, so there's not much I can do there. DecisionTree.jl might provide a faster, pure-Julia alternative.

@CBongiova
Author

Hi @cstjean,

Thanks for your reply!

Are you making one call with a big n_samples × n_features matrix to get your predictions?

No, I actually use the trained random forest classifiers (100 trees) to make single-sample predictions online. That is, each time I only have one datapoint with about 45 features. Extracting the features is almost instantaneous, whereas making the prediction takes about 0.1 seconds.

I have actually found this discussion on Stack Overflow: https://stackoverflow.com/questions/50676717/why-sklearn-random-forest-takes-the-same-time-to-predict-one-sample-than-n-sampl
The 0.1 seconds seems to be per-call latency that is unavoidable with scikit-learn... maybe other libraries or ML approaches are more appropriate for real-time applications.

@cstjean
Owner

cstjean commented Jan 18, 2022

DecisionTree.jl supports the ScikitLearn interface, so it shouldn't be too hard to give it a try!
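
For reference, a rough sketch of what that could look like with the DecisionTree.jl package's ScikitLearn-style API (the toy data is a placeholder, and keyword names such as n_trees may vary between versions):

using DecisionTree

# toy data standing in for the real 45-feature problem above
X, y = rand(1000, 45), rand(0:1, 1000)

# n_trees plays the role of scikit-learn's n_estimators
model = RandomForestClassifier(n_trees=100)
fit!(model, X, y)

# single-sample prediction stays in Julia, so there is no Python call overhead
predict(model, rand(1, 45))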
