ENH: add code for marron wand simulator #203

sampan501 · 2024-01-23T03:17:11Z

N/A

Changes proposed in this pull request:

Added Marron and Wand 1992 simulations with multivariate normal simulations
Added truth for each simulation

Before submitting

I've read and followed all steps in the Making a pull request section of the CONTRIBUTING docs.
I've updated or added any relevant docstrings following the syntax described in the Writing docstrings section of the CONTRIBUTING docs.
If this PR fixes a bug, I've added a test that will fail without my fix.
If this PR adds a new feature, I've added tests that sufficiently cover my new functionality.

After submitting

All GitHub Actions jobs for my pull request have passed.

adam2392

A few comments on things that I think can result in runtime issues and even hidden bugs.

Can you also add a unit test? I think an easy thing to test is just that things run, expected error comes out for simulation and

sktree/datasets/hyppo.py

adam2392 · 2024-01-25T14:18:18Z

+        raise ValueError(
+            f"Simulation must be trunk, trunk_overlap, trunk_mix, {MARRON_WAND_SIMS.keys()}"
+        )
+
    y = np.concatenate((np.zeros(n_samples // 2), np.ones(n_samples // 2)))

    if return_params:
        return X, y, [mu_0, mu_1], [cov, cov]


Technically, the mu and covariance are not easily returned anymore. I'm wondering if we should wrap them in an object then instead?

I'm not sure, what do you think we need to return in return_params?

I guess we use return_params to get information about the data generating model as well as possibly parameters of the DGM.

Is it easy to wrap these as a subclass of the rv_continuous class (https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.rv_continuous.html#scipy.stats.rv_continuous)? Then we can just return the instantiated object. That said, I'm not familiar with that API, or with how we could recover the exact parametrization of a specific MW simulation.

@sampan501 if there's no way to address this, then should we just error out on return_params=True if simulation is not trunk?

I think we just want the DGM to be returned, so perhaps it would make sense to return an object such that returned_model.sample(n_samples) can generate new samples of data. With rng.multivariate_normal, or if we can get an API in the framework of rv_continuous, we can do things like:

    from scipy.stats import multivariate_normal

    test = multivariate_normal(mean=2.5, cov=0.5)
    print(test)
    print(test.mean)
    print(test.cov)
    print(test.rvs(size=(10,)))
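The idea of returning the data-generating model itself, rather than loose parameters, could look roughly like the sketch below. GaussianMixtureDGM, its fields, and the mixture parameters are all hypothetical names and values made up for illustration; nothing here is the sktree API or the parametrization of any specific Marron and Wand simulation.

```python
from dataclasses import dataclass

import numpy as np
from scipy.stats import norm


@dataclass(frozen=True)
class GaussianMixtureDGM:
    """Hypothetical container for a 1-D Gaussian mixture (not part of sktree)."""

    weights: tuple
    means: tuple
    stds: tuple

    def sample(self, n_samples, seed=None):
        """Draw n_samples points: pick a component per sample, then sample from it."""
        rng = np.random.default_rng(seed)
        component = rng.choice(len(self.weights), size=n_samples, p=self.weights)
        means = np.asarray(self.means)[component]
        stds = np.asarray(self.stds)[component]
        return rng.normal(means, stds)

    def pdf(self, x):
        """Evaluate the true mixture density at x."""
        x = np.asarray(x, dtype=float)
        return sum(
            w * norm.pdf(x, loc=m, scale=s)
            for w, m, s in zip(self.weights, self.means, self.stds)
        )


# Illustrative two-component mixture; these parameters are assumptions for the example.
dgm = GaussianMixtureDGM(weights=(0.5, 0.5), means=(-1.0, 1.0), stds=(0.5, 0.5))
draws = dgm.sample(1000, seed=0)
grid = np.linspace(-6.0, 6.0, 2001)
density = dgm.pdf(grid)
```

An object like this keeps the parameters inspectable (dgm.weights, dgm.means, dgm.stds) while still letting callers draw fresh samples or evaluate the true density, which is the property the rv_continuous suggestion is after.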

We could just return X, y, norm_params, and G for the Marron and Wand sims, right?

The only additional thing is G, which holds the observations from the mixed Gaussian.

Yeah that sounds reasonable for now. Can you update the docstring accordingly?
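A minimal sketch of what such a return signature might look like is given below. The function name make_mixture_classification, the mixture parameters, and the layout of norm_params are assumptions made for illustration only; the actual sktree function and its return values may differ.

```python
import numpy as np


def make_mixture_classification(n_samples=100, n_dim=4, return_params=False, seed=None):
    """Hypothetical two-class simulation: class 0 is N(0, I), class 1 is a Gaussian mixture."""
    rng = np.random.default_rng(seed)
    n_per_class = n_samples // 2

    # Illustrative mixture parameters (made up for this sketch).
    weights = np.array([0.5, 0.5])
    means = np.array([-1.5, 1.5])
    stds = np.array([0.5, 0.5])

    # Class 0: standard multivariate normal.
    X0 = rng.multivariate_normal(np.zeros(n_dim), np.identity(n_dim), size=n_per_class)

    # Class 1: pick one mixture component per sample, then draw every feature from it.
    component = rng.choice(len(weights), size=n_per_class, p=weights)
    G = rng.normal(means[component, None], stds[component, None], size=(n_per_class, n_dim))

    X = np.vstack((X0, G))
    y = np.concatenate((np.zeros(n_per_class), np.ones(n_per_class)))

    if return_params:
        norm_params = {"weights": weights, "means": means, "stds": stds}
        return X, y, norm_params, G
    return X, y


X, y, norm_params, G = make_mixture_classification(n_samples=200, return_params=True, seed=0)
```

Returning G alongside the parameters lets a caller check the mixture draws directly without re-deriving them from X.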

adam2392 · 2024-02-01T18:00:01Z

Lmk when this is ready for review.

Now n_informative controls the number of informative features. You can set its default to 256. cc: #203 (comment)
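To make the role of n_informative concrete, here is a rough sketch of a Trunk-style simulation in which only the first n_informative dimensions carry signal. The helper names trunk_means and make_trunk_like are hypothetical, and the choice of 1/sqrt(i) means with opposite-sign classes and identity covariance is an assumption based on the classic Trunk setup, not a copy of the sktree code.

```python
import numpy as np


def trunk_means(n_dim, n_informative=256):
    """Mean vector with 1/sqrt(i) signal in the first n_informative dimensions only."""
    mu = np.zeros(n_dim)
    n_signal = min(n_dim, n_informative)
    mu[:n_signal] = 1.0 / np.sqrt(np.arange(1, n_signal + 1))
    return mu


def make_trunk_like(n_samples=100, n_dim=512, n_informative=256, seed=None):
    """Hypothetical Trunk-style generator; dimensions past n_informative are pure noise."""
    rng = np.random.default_rng(seed)
    n_per_class = n_samples // 2
    mu_1 = trunk_means(n_dim, n_informative)
    mu_0 = -mu_1

    # Identity covariance, so independent unit-variance normals per feature suffice.
    X0 = rng.normal(mu_0, 1.0, size=(n_per_class, n_dim))
    X1 = rng.normal(mu_1, 1.0, size=(n_per_class, n_dim))

    X = np.vstack((X0, X1))
    y = np.concatenate((np.zeros(n_per_class), np.ones(n_per_class)))
    return X, y


mu = trunk_means(512)
X, y = make_trunk_like(n_samples=20, n_dim=512, seed=0)
```

With this layout, increasing n_dim beyond n_informative adds noise dimensions without adding signal.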

codecov · 2024-02-07T15:40:33Z

Codecov Report

Attention: 6 lines in your changes are missing coverage. Please review.

Comparison is base (18c2f45) 88.45% compared to head (7c41e75) 88.48%.
Report is 1 commits behind head on main.

Files	Patch %	Lines
sktree/datasets/hyppo.py	90.62%	6 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #203      +/-   ##
==========================================
+ Coverage   88.45%   88.48%   +0.03%     
==========================================
  Files          54       54              
  Lines        4823     4891      +68     
==========================================
+ Hits         4266     4328      +62     
- Misses        557      563       +6


sampan501 · 2024-02-07T16:47:16Z

@adam2392 PR is ready except for formatting. Not sure where the error is

adam2392 · 2024-02-07T17:02:29Z

You can run make pre-commit to run the pre-commit linting pipelines, or set up pre-commit to run automatically before each commit.

I pushed 665dfdf which should fix the issues

Signed-off-by: Adam Li <adam2392@gmail.com>

adam2392

Since we'll be using the simulations a lot, it would be great to document them well at this stage to prevent future confusion. I left a few comments where I think more documentation is necessary.

sktree/datasets/hyppo.py

adam2392 · 2024-02-07T19:29:33Z

A few other nitpicks and then LGTM

sktree/datasets/hyppo.py

adam2392

I'll let you merge once CIs are green

add code for marron wand simulator and true density

c0636fa

adam2392 self-requested a review January 23, 2024 14:06

stop sim signal at 256 dimensions

e2a00e6

adam2392 reviewed Jan 25, 2024

sampan501 added 2 commits January 27, 2024 13:10

fix simulations

d49928d

fix simulation bugs

c3aa597

sampan501 and others added 4 commits February 6, 2024 11:44

Merge branch 'main' into marron-wand-sims

4b55f8a

add unit tests

78f60c2

fix import

1a2a6b8

fix unit test

0797070

fix typing issues

e3b1384

Address style issues

665dfdf

Signed-off-by: Adam Li <adam2392@gmail.com>

sampan501 changed the title ~~ENH: add code for marron wand simulator and true density~~ ENH: add code for marron wand simulator Feb 7, 2024

adam2392 reviewed Feb 7, 2024

sktree/datasets/hyppo.py Outdated

sktree/datasets/hyppo.py Outdated

sktree/datasets/hyppo.py Outdated

sampan501 and others added 3 commits February 7, 2024 12:39

add more details in docstring

b1a0dca

Merge branch 'main' into marron-wand-sims

111261d

update return_params

6688725

adam2392 reviewed Feb 7, 2024

sktree/datasets/hyppo.py Outdated

adam2392 reviewed Feb 7, 2024

sktree/datasets/hyppo.py

fix else if to be trunk-mix instead of trunk-overlap

7f70343

sampan501 requested a review from adam2392 February 8, 2024 00:19

adam2392 reviewed Feb 8, 2024

sktree/datasets/hyppo.py

Update sktree/datasets/hyppo.py

7c41e75

adam2392 approved these changes Feb 8, 2024

sampan501 merged commit 945864c into main Feb 8, 2024
30 checks passed

sampan501 deleted the marron-wand-sims branch February 8, 2024 01:47

adam2392 mentioned this pull request Feb 16, 2024

[ENH] Datasets based on Marron and Wand, Trunk, etc. #185

Closed

5 tasks
