
Added per-class probability prediction for random forests #138

Closed

Conversation

@AlanRace commented Jul 11, 2022

Added a function to predict the per-class probability of each class for each observation.

let probabilities = forest.predict_probs(&data).unwrap();

probabilities is a KxC matrix, where K is the number of observations and C is the number of classes. Probabilities are calculated as the fraction of trees in the random forest that predicted the given class.

Answer to #50 for random forests.
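(As an aside, a minimal sketch of the fraction-of-trees computation described above — illustrative only, not the PR's actual code; the function and variable names are hypothetical. Each tree votes for one class, and the probability of a class is its vote count divided by the number of trees.)

// Given one observation's per-tree predicted class labels, return the
// fraction of trees that voted for each of `n_classes` classes.
fn class_probabilities(tree_votes: &[usize], n_classes: usize) -> Vec<f64> {
    let mut counts = vec![0usize; n_classes];
    for &class in tree_votes {
        counts[class] += 1;
    }
    let n_trees = tree_votes.len() as f64;
    counts.into_iter().map(|c| c as f64 / n_trees).collect()
}

// Example: 10 trees, 7 voting for class 0 and 3 for class 1, gives [0.7, 0.3].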

@VolodymyrOrlov (Collaborator)

@AlanRace thank you for your contribution to Smartcore! The change looks good, but you might want to look at clippy warnings as well as increase test coverage to get this code through automatic checks

@Mec-iS (Collaborator) commented Aug 24, 2022

I have added a test here but there is something wrong, please take a look:
AlanRace#1

@AlanRace (Author)

@Mec-iS Thanks for supplying the test - I am guessing there was a problem due to row-major vs column-major storage of DenseMatrix? Swapping the number of rows and columns in your test and then transposing the matrix results in a passing test.

@codecov-commenter

Codecov Report

Merging #138 (7f7b2ed) into development (b4a807e) will increase coverage by 0.60%.
The diff coverage is 100.00%.

@@               Coverage Diff               @@
##           development     #138      +/-   ##
===============================================
+ Coverage        83.40%   84.01%   +0.60%     
===============================================
  Files               78       81       +3     
  Lines             8377     8751     +374     
===============================================
+ Hits              6987     7352     +365     
- Misses            1390     1399       +9     
Impacted Files Coverage Δ
src/ensemble/random_forest_classifier.rs 75.58% <100.00%> (+4.54%) ⬆️
src/linalg/evd.rs 86.06% <0.00%> (ø)
src/linear/lasso_optimizer.rs 94.11% <0.00%> (ø)
src/algorithm/neighbour/mod.rs 78.57% <0.00%> (ø)
src/algorithm/neighbour/distances.rs 66.66% <0.00%> (ø)
src/preprocessing/numerical.rs 88.88% <0.00%> (ø)
src/algorithm/neighbour/fastpair.rs 95.67% <0.00%> (ø)
src/linalg/naive/dense_matrix.rs 80.11% <0.00%> (+0.89%) ⬆️
src/optimization/first_order/lbfgs.rs 94.44% <0.00%> (+1.58%) ⬆️
src/linalg/mod.rs 58.57% <0.00%> (+5.49%) ⬆️
... and 1 more


@Mec-iS (Collaborator) commented Aug 29, 2022

thanks @AlanRace

It is probably better to adhere to the DenseMatrix format, so it would be nice for the method to return the transposed values, or a DenseMatrix directly.

@AlanRace (Author)

Maybe I am misunderstanding, but predict_probs does return a DenseMatrix.

It looks like DenseMatrix::from_vec (which is called from DenseMatrix::from_array as part of your test) assumes that the given vector is in row-major form, but the entered values in the test are in column-major form.

Would you prefer that the matrix returned from predict_probs is num classes x num observations, rather than the current num observations x num classes?
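(For readers following the row-major vs column-major discussion, here is a minimal sketch in plain Rust, independent of smartcore's internals: a flat buffer interpreted as row-major stores element (row, col) at index row * ncols + col, so values written out column by column are read back as the transposed matrix unless the dimensions are swapped and the result transposed.)

// Interpret a flat buffer as a row-major matrix: element (row, col)
// lives at index row * ncols + col.
fn get_row_major(data: &[f64], ncols: usize, row: usize, col: usize) -> f64 {
    data[row * ncols + col]
}

fn main() {
    // Intended 2x3 matrix: [[1, 2, 3], [4, 5, 6]]
    let row_major = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]; // values listed row by row
    let col_major = [1.0, 4.0, 2.0, 5.0, 3.0, 6.0]; // same values listed column by column
    assert_eq!(get_row_major(&row_major, 3, 1, 0), 4.0); // reads the intended element
    assert_eq!(get_row_major(&col_major, 3, 1, 0), 5.0); // column-major data is misread
}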

@Mec-iS (Collaborator) commented Aug 30, 2022

Yeah, the shape returned by from_vec and from_array is probably handier. Thanks again.

@Mec-iS (Collaborator) commented Aug 30, 2022

@morenol @VolodymyrOrlov could you please take a look at the failing WASM test? It looks like we get different results for different targets. Rounding seems to work differently on WASM; the results look close, but not close enough.
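(One common way to make such a test robust across targets is to compare the probabilities within a tolerance rather than exactly. A minimal sketch follows; the helper and the epsilon value are illustrative, not an existing project convention.)

// Compare expected and actual values element-wise within a tolerance,
// instead of requiring bit-for-bit equality across targets.
fn assert_close(expected: &[f64], actual: &[f64], eps: f64) {
    assert_eq!(expected.len(), actual.len());
    for (e, a) in expected.iter().zip(actual.iter()) {
        assert!((e - a).abs() <= eps, "expected {e}, got {a}");
    }
}

// Usage in a test: assert_close(&expected_probs, &actual_probs, 1e-6);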

@@ -553,6 +554,37 @@ impl<T: RealNumber> RandomForestClassifier<T> {
which_max(&result)
}

/// Predict the per-class probabilities for each observation.
/// The probability is calculated as the fraction of trees that predicted a given class.
pub fn predict_probs<M: Matrix<T>>(&self, x: &M) -> Result<DenseMatrix<f64>, Failed> {
Review comment from a collaborator:

In scikit it is called predict_proba; I think it is better to keep the same name.

20,
2,
&[
1.0, 0.0, 0.78, 0.22, 0.95, 0.05, 0.82, 0.18, 1.0, 0.0, 0.92, 0.08, 0.99, 0.01,
Review comment from a collaborator:

Are these the expected values?

Reply from a collaborator:

Yes, those are the results as returned by the test. They match across all targets except WASM.

@morenol (Collaborator) commented Sep 24, 2022

They are failing for me locally, and they also failed in CI for x86_64-unknown-linux-gnu.

I think the green checks in CI come only from the jobs that build the crate but do not run tests (32-bit archs).

@alexis2804 commented Oct 3, 2022

Hello, when will this function be available? I really need it in order to perform model ensembling!

Thanks a lot

@Mec-iS (Collaborator) commented Oct 3, 2022

@alexis2804 Unfortunately we have problems with some tests; you can take a look at them by fetching this branch.

@Mec-iS (Collaborator) commented Oct 31, 2022

Moved to #211 to solve conflicts.

@Mec-iS closed this Oct 31, 2022