ENH: add code for marron wand simulator #203
A few comments on things that I think could cause runtime issues and even hidden bugs.
Can you also add a unit test? I think an easy thing to test is just that things run, expected error comes out for simulation
and
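A minimal sketch of the kind of test suggested above. The real simulation function in sktree/datasets/hyppo.py is not shown in this thread, so `make_simulation` below is a hypothetical stand-in, and only the simulation names that appear in the error message of the diff are used. It checks that each simulation runs with the expected shapes and that an unknown name raises ValueError.

```python
import numpy as np

# Simulation names taken from the error message quoted in the diff.
KNOWN_SIMULATIONS = ("trunk", "trunk_overlap", "trunk_mix")


def make_simulation(n_samples, n_dim=3, simulation="trunk", seed=None):
    """Hypothetical stand-in for the function under review (assumption)."""
    if simulation not in KNOWN_SIMULATIONS:
        raise ValueError(
            f"Simulation must be one of {KNOWN_SIMULATIONS}, got {simulation!r}"
        )
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n_samples, n_dim))
    # Same label construction as in the diff: first half 0, second half 1.
    y = np.concatenate((np.zeros(n_samples // 2), np.ones(n_samples // 2)))
    return X, y


def test_simulation_runs():
    # "Things run": every known simulation returns arrays of the right shape.
    for name in KNOWN_SIMULATIONS:
        X, y = make_simulation(20, simulation=name, seed=0)
        assert X.shape == (20, 3)
        assert y.shape == (20,)


def test_unknown_simulation_raises():
    # "Expected error comes out": an unknown name must raise ValueError.
    try:
        make_simulation(20, simulation="not_a_simulation")
    except ValueError as err:
        assert "Simulation must be" in str(err)
    else:
        raise AssertionError("expected ValueError for an unknown simulation")
```

In the project's test suite the same checks would more naturally use pytest.mark.parametrize and pytest.raises; plain asserts are used here only to keep the sketch dependency-free.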
sktree/datasets/hyppo.py
raise ValueError(
    f"Simulation must be trunk, trunk_overlap, trunk_mix, {MARRON_WAND_SIMS.keys()}"
)

y = np.concatenate((np.zeros(n_samples // 2), np.ones(n_samples // 2)))

if return_params:
    return X, y, [mu_0, mu_1], [cov, cov]
Technically, the mu and covariance are not easily returned anymore. I'm wondering if we should wrap them in an object then instead?
I'm not sure. What do you think we need to return in return_params?
I guess we use return_params to get information about the data generating model (DGM), as well as possibly the parameters of the DGM.
Is it easy to wrap these as a subclass of the rv_continuous class (https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.rv_continuous.html#scipy.stats.rv_continuous)? Then we can just return the instantiated object. Though I'm not familiar with that API, or with how we can recover the exact parametrization of a specific MW simulation.
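One way the rv_continuous idea could look, as a rough sketch rather than what the PR implements: a one-dimensional Gaussian mixture that overrides `_pdf` and `_rvs`. The class name `GaussianMixture1D` and the mixture parameters are illustrative assumptions, not taken from sktree or from any specific MW simulation.

```python
import numpy as np
from scipy import stats


class GaussianMixture1D(stats.rv_continuous):
    # Hypothetical class for illustration; not part of sktree or scipy.

    def __init__(self, weights, means, stds, **kwargs):
        super().__init__(**kwargs)
        # Keep the parametrization on the object so callers can read it back.
        self.weights = np.asarray(weights, dtype=float)
        self.means = np.asarray(means, dtype=float)
        self.stds = np.asarray(stds, dtype=float)

    def _pdf(self, x):
        # Weighted sum of the component normal densities, broadcast over x.
        x = np.asarray(x, dtype=float)[..., np.newaxis]
        comps = stats.norm.pdf(x, loc=self.means, scale=self.stds)
        return np.sum(self.weights * comps, axis=-1)

    def _rvs(self, size=None, random_state=None):
        # Pick a mixture component per draw, then sample from that component.
        idx = random_state.choice(len(self.weights), size=size, p=self.weights)
        return random_state.normal(self.means[idx], self.stds[idx])


# Illustrative two-component mixture (parameters are assumptions).
dgm = GaussianMixture1D(
    weights=[0.5, 0.5],
    means=[-1.0, 1.0],
    stds=[2 / 3, 2 / 3],
    name="gaussian_mixture_1d",
)
samples = dgm.rvs(size=1000, random_state=0)
density = dgm.pdf(np.linspace(-4, 4, 9))
```

With this shape, the parametrization stays readable from the returned object (`dgm.weights`, `dgm.means`, `dgm.stds`), and new data comes from `dgm.rvs(...)`.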
@sampan501 if there's no way to address this, then should we just error out on return_params=True if simulation is not trunk?
I think we just want the DGM to be returned, so perhaps it would make sense to just return a callable such that returned_model.sample(n_samples) can generate new samples of data. For rng.multivariate_normal, or if we can get an API in the framework of rv_continuous, we can then do things like:
from scipy.stats import multivariate_normal

test = multivariate_normal(mean=2.5, cov=0.5)
print(test)
print(test.mean)
print(test.cov)
print(test.rvs(size=(10,)))
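The `returned_model.sample(n_samples)` idea can also be sketched without scipy. The container below is a hypothetical illustration (the name `TwoClassGaussianDGM` and its fields are assumptions, not the PR's API): it stores the per-class means and covariances that the trunk branch currently returns as `[mu_0, mu_1]` and `[cov, cov]`, and draws fresh labelled samples on demand.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class TwoClassGaussianDGM:
    """Hypothetical container for a Gaussian data generating model."""

    means: list  # one mean vector per class, e.g. [mu_0, mu_1]
    covs: list  # one covariance matrix per class, e.g. [cov, cov]

    def sample(self, n_samples, seed=None):
        """Draw n_samples split evenly across classes; returns (X, y)."""
        rng = np.random.default_rng(seed)
        n_per_class = n_samples // len(self.means)
        X = np.vstack(
            [
                rng.multivariate_normal(mean, cov, size=n_per_class)
                for mean, cov in zip(self.means, self.covs)
            ]
        )
        # Class k labels the k-th block of rows.
        y = np.concatenate(
            [np.full(n_per_class, k) for k in range(len(self.means))]
        )
        return X, y


# Illustrative parameters only (assumptions, not values from the PR).
cov = np.eye(3)
returned_model = TwoClassGaussianDGM(
    means=[np.zeros(3), np.ones(3)], covs=[cov, cov]
)
X_new, y_new = returned_model.sample(10, seed=0)
```

Returning an object like this keeps the parameters inspectable (`returned_model.means`, `returned_model.covs`) while also letting callers generate new data, which is the behaviour described in the comment above.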
We could just return X, Y, norm_params, and G for the Marron and Wand sims, right?
The only thing additional is G, which are the observations from the mixed Gaussian
Yeah that sounds reasonable for now. Can you update the docstring accordingly?
Lmk when this is ready for review.
Codecov Report
Additional details and impacted files:
@@            Coverage Diff             @@
##             main     #203      +/-   ##
==========================================
+ Coverage   88.45%   88.48%   +0.03%
==========================================
  Files          54       54
  Lines        4823     4891      +68
==========================================
+ Hits         4266     4328      +62
- Misses        557      563       +6
@adam2392 PR is ready except for formatting. Not sure where the error is.
You can run [...]. I pushed 665dfdf, which should fix the issues.
Signed-off-by: Adam Li <adam2392@gmail.com>
Since we'll be using the simulations a lot, it would be great to document it well at this stage to prevent future confusion. I left a few comments on where I think it would be necessary.
A few other nitpicks and then LGTM.
I'll let you merge once CIs are green