Feat: implement Random Projections #332
Conversation
Contains two algorithms based on variants of the Johnson-Lindenstrauss lemma:
- Random projections with Gaussian coefficients
- Sparse random projections with +/- 1 coefficients (multiplied by a scaling factor)
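For intuition, the Johnson-Lindenstrauss lemma bounds the minimum target dimension k that preserves pairwise distances within a factor of (1 ± eps); scikit-learn's johnson_lindenstrauss_min_dim uses the bound k >= 4 ln(n) / (eps^2/2 - eps^3/3). A minimal Rust sketch of that bound (illustration only, not part of this PR's API):

fn jl_min_dim(n_samples: usize, eps: f64) -> usize {
    // k >= 4 ln(n) / (eps^2 / 2 - eps^3 / 3)
    let denom = eps.powi(2) / 2.0 - eps.powi(3) / 3.0;
    (4.0 * (n_samples as f64).ln() / denom) as usize
}

fn main() {
    // Matches scikit-learn: johnson_lindenstrauss_min_dim(1e6, eps=0.1) == 11841
    assert_eq!(jl_min_dim(1_000_000, 0.1), 11841);
}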
Codecov Report

@@            Coverage Diff             @@
##           master     #332      +/-  ##
==========================================
- Coverage   36.18%   35.87%    -0.32%
==========================================
  Files          92       96        +4
  Lines        6218     6303       +85
==========================================
+ Hits         2250     2261       +11
- Misses       3968     4042       +74

View full report in Codecov by Sentry.
I've done a quick review; this looks good to me, but I have requested that @bytesnake also give it a look, as he is probably more familiar with the algorithm side of things.
Review comments (now outdated and resolved) were left on:
algorithms/linfa-reduction/src/random_projection/sparse/algorithms.rs
algorithms/linfa-reduction/src/random_projection/gaussian/algorithms.rs
Thank you for reviewing, @relf @quietlychris.
Thank you for the reviews, and @relf for the suggestions; I have implemented them. Changes:
- The RNG defaults to Xoshiro256Plus if not provided by the user.
- Added tests for the minimum dimension, using values from scikit-learn.
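For illustration, a hedged usage sketch of the resulting estimators; the builder names (params, params_with_rng, target_dim) are assumptions about this PR's API rather than confirmed signatures:

use rand::SeedableRng;
use rand_xoshiro::Xoshiro256Plus;

// Default RNG (Xoshiro256Plus), hypothetical builder calls:
let model = GaussianRandomProjection::<f64>::params()
    .target_dim(100)
    .fit(&dataset)?;

// Reproducible run with a user-supplied, seeded RNG:
let rng = Xoshiro256Plus::seed_from_u64(42);
let model = SparseRandomProjection::<f64>::params_with_rng(rng)
    .target_dim(100)
    .fit(&dataset)?;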
Thanks for your contribution and the changes. Now that the Gaussian and sparse random projection codes look very alike, I wonder whether you could refactor even further by using zero-sized types and a single generic RandomProjection type, something like:
struct Gaussian;
struct Sparse;

pub struct RandomProjectionValidParams<RandomMethod, R: Rng + Clone> {
    pub params: RandomProjectionParamsInner,
    pub rng: Option<R>,
    pub method: std::marker::PhantomData<RandomMethod>,
}

pub struct RandomProjectionParams<RandomMethod, R: Rng + Clone>(
    pub(crate) RandomProjectionValidParams<RandomMethod, R>,
);

pub struct RandomProjection<RandomMethod, F: Float> {
    projection: Array2<F>,
    method: std::marker::PhantomData<RandomMethod>,
}

// Type aliases keep the public names from the current PR:
pub type GaussianRandomProjection<F> = RandomProjection<Gaussian, F>;
pub type SparseRandomProjection<F> = RandomProjection<Sparse, F>;

impl<F, Rec, T, R> Fit<Rec, T, ReductionError> for RandomProjectionValidParams<Gaussian, R>
where
    F: Float,
    Rec: Records<Elem = F>,
    StandardNormal: Distribution<F>,
    R: Rng + Clone,
{
    type Object = RandomProjection<Gaussian, F>;

    fn fit(&self, dataset: &linfa::DatasetBase<Rec, T>) -> Result<Self::Object, ReductionError> { ... }
}

impl<F, Rec, T, R> Fit<Rec, T, ReductionError> for RandomProjectionValidParams<Sparse, R>
where
    F: Float,
    Rec: Records<Elem = F>,
    StandardNormal: Distribution<F>,
    R: Rng + Clone,
{
    type Object = RandomProjection<Sparse, F>;

    fn fit(&self, dataset: &linfa::DatasetBase<Rec, T>) -> Result<Self::Object, ReductionError> { ... }
}
...
What do you think?
I think that's a very good suggestion; it will be easier to maintain than the previous approach, which used a macro to avoid code duplication. 6b9c2a4 implements a variation of this idea: all the logic has been refactored, and behavior that depends on the projection method has been encapsulated in the projection-method types.
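For context, one way such an encapsulation can look (a sketch only; the trait name and method below are illustrative, not taken from 6b9c2a4):

use ndarray::Array2;
use ndarray_rand::rand_distr::StandardNormal;
use ndarray_rand::RandomExt;
use rand::Rng;

// Each zero-sized method type knows how to build its projection matrix;
// everything else (parameter checking, fit, transform) is written once.
trait ProjectionMethod {
    fn projection_matrix<R: Rng>(rng: &mut R, d: usize, k: usize) -> Array2<f64>;
}

struct Gaussian;

impl ProjectionMethod for Gaussian {
    fn projection_matrix<R: Rng>(rng: &mut R, d: usize, k: usize) -> Array2<f64> {
        // i.i.d. N(0, 1/k) entries, the classic JL construction.
        Array2::random_using((d, k), StandardNormal, rng) / (k as f64).sqrt()
    }
}
// Sparse would implement the same trait with +/- 1 entries and its scaling factor.

With this shape, the two Fit implementations can collapse into a single impl generic over the projection method.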
This PR implements random projection techniques for dimensionality reduction, as seen in the sklearn.random_projection module of scikit-learn.