-
Notifications
You must be signed in to change notification settings - Fork 393
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Don't pass y=None
to the train_split
function
#646
Don't pass y=None
to the train_split
function
#646
Conversation
I had a quick look at this and it looks good so far. However, I'd like to think a bit longer on this since this change might break some existing code. I'd like to make sure that nothing severe breaks because of the change. Meanwhile, could you please add an entry to the CHANGES.md, in the |
5655dbe
to
d1cdeae
Compare
Thanks for reviewing the PR, sure, take your time. |
Thanks for updating the CHANGES.md. I would like to see an explicit test for the new behavior. As of now, it's only tested coincidentally. I'm a tiny bit nervous since this change could break existing code (as witnessed by the tests that needed changing). It's probably not necessary to raise a warning, but I could see adding a comment in line 1215 (before the
(of course with the proper indentations and line breaks) The rest looks good to me. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice test, thanks. I just have two questions, it would be nice if you could answer them (and, if appropriate, add comments to the test for posterity).
skorch/tests/test_net.py
Outdated
(True, lambda x, y: (x, x), ExitStack()), | ||
(True, lambda x: (x, x), pytest.raises(TypeError)), # Raises an error | ||
]) | ||
def test_train_split_with_nans(self, needs_y, train_split, raises): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a bit confused about the name test_train_split_with_nans
, where do nans come into play? Is the test not about y
being passed to train_split
only when it is not None?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
General remark: indeed, I didn't think well about the name of the test case.
where do nans come into play?
The NaN
s here come into play as y=None
and train_split=None
Is the test not about y being passed to train_split only when it is not None?
I am writing a more verbose answer to this question as the PR comment, so we can move the discussion there
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, can we call this function test_passes_y_to_train_split_when_not_nan
? Frankly, I can't invent a better name 😅
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The
NaN
s here come into play asy=None
andtrain_split=None
Can you elaborate on that? Do you mean NaN
in the sense of "not a number"? I don't see how that can happen.
So, can we call this function
test_passes_y_to_train_split_when_not_nan
? Frankly, I can't invent a better name
With tests, it's always better to use a lengthy name that is more descriptive than a short, less descriptive name. Nobody is going to import your test anyway. If in doubt, tests should also rather contain too many than too few explanatory comments (some skorch tests are hard to understand because we didn't do that right from the start).
Regarding the name itself, I would prefer test_passes_y_to_train_split_when_not_none
(last word changed), since that is what y
actually is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you elaborate on that?
I think this is covered in the PR comments. Yes, I just confused None
and NaN
in the conversation, sorry.
I agree to the changes, see the new code.
skorch/tests/test_net.py
Outdated
) | ||
with raises: | ||
net.fit(X, y) | ||
assert net.predict(X) is not None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you explain why this assertion is necessary?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not, I think. I wanted to be more explicit: even though the y=None
the predictions still make sense. I can change this 🤷♂️
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I think it's better to remove it, since it detracts from what is really tested. I actually believe it's impossible for net.predict
to ever return None, so this assertion will never fail.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed, I think this can be resolved
Thanks again for the suggestions. Some explanations about the test case. The tests are applied to the The test cases, these two lines reproduce an old behaviour: (False, None, ExitStack()), # ExitStack = does not raise
(True, None, ExitStack()),
(True, lambda x, y: (x, x), ExitStack()), Where the first two lines correspond to the default Then we need to check if it works when (False, lambda x: (x, x), ExitStack()),
(True, lambda x: (x, x), pytest.raises(TypeError)), # Raises an error These cases forced me to change the test data in Thanks again for the review |
Thanks for adding the explanation. I actually believe that a condensed form of it could be added to the test (as I mentioned above, better to add too many than too few comments in tests).
This is not quite true, since the default
Thanks for the PR ;) |
😮 I overlooked that. For some reason I thought, that it goes like I agree to the comments, I'll try to fix the issues tomorrow (now I am way from my workstation). |
skorch/tests/test_net.py
Outdated
from skorch.net import NeuralNet | ||
from skorch.toy import MLPModule | ||
|
||
# By default, `train_split=CVSplit(5)` in the `NeuralNet` definition | ||
if train_split == "default": |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks a bit ugly, but I didn't want to introduce a new function/import to this namespace.
skorch/tests/test_net.py
Outdated
# By default, `train_split=CVSplit(5)` in the `NeuralNet` definition | ||
if train_split == "default": | ||
from skorch.dataset import CVSplit | ||
train_split = CVSplit(5) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This one is an explicit statement, however it can be replaced by the implicit one for sake of extensibility:
train_split = NeuralNet(None, None).train_split
So, this way you don't have to modify the test when the default value of the parameter is changed.
@BenjaminBossan , what do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would be even better to solve it like this:
kwargs = {} if train_split == 'default' else {'train_split': train_split}
and then below:
net = NeuralNet(
...
**kwargs)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suggested two minor changes, the rest looks good.
skorch/tests/test_net.py
Outdated
# By default, `train_split=CVSplit(5)` in the `NeuralNet` definition | ||
if train_split == "default": | ||
from skorch.dataset import CVSplit | ||
train_split = CVSplit(5) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would be even better to solve it like this:
kwargs = {} if train_split == 'default' else {'train_split': train_split}
and then below:
net = NeuralNet(
...
**kwargs)
skorch/tests/test_net.py
Outdated
n_samples, n_features = 128, 10 | ||
X = np.random.rand(n_samples, n_features).astype(np.float32) | ||
y = np.random.binomial(n=1, p=0.5, size=n_samples) if needs_y else None | ||
|
||
# The `NeuralNetClassifier` or `NeuralNetRegressor` always require `y` | ||
# Only `NeuralNet`can transfer `y=None` to `tran_split` method. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
# Only `NeuralNet`can transfer `y=None` to `tran_split` method. | |
# Only `NeuralNet` can transfer `y=None` to `train_split` method. |
Great work, thanks for being patient. |
By default
skorch
requires thetrain_split
function to have two positional argumentsX
andy
. This leads to unexpected behavior when working on unsupervised tasks:As suggested by @BenjaminBossan, It is possible to define a wrapper function that accepts a dummy positional argument, to solve the problem, but the same can be achieved with the minimum changes on the
skorch
side. This update also eases the integration withtorchtext
iterator/split methods (see the discussion here).