
Feature skip data iteration when caching scoring #557

Merged 8 commits into master on Dec 16, 2019

Conversation

BenjaminBossan
Collaborator

This is the proposed fix for #552.

As discussed there, my proposal is to cache net.forward_iter, so that we can skip the iteration over the data.

It would be very helpful if someone could think of any unintended side-effects this change could have. E.g., in theory, some code could rely on iterating over the data even in case of caching, but I can hardly imagine this happening in reality.

Also, I cannot test on GPU at the moment. I don't see how this would affect the outcome, but it would still be nice if someone could verify that it works.

I deprecated cache_net_infer because it was not "private". The new replacement, _cache_net_forward_iter, is private.
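A minimal sketch of what such a deprecation path can look like: the old public name keeps working but emits a warning while delegating to the new private helper. The helper body here is a stub for illustration only; the real signatures in skorch may differ.

```python
import warnings

def _cache_net_forward_iter(net):
    return net  # stub standing in for the new private caching helper

def cache_net_infer(net):
    # Old public name: warn, then delegate to the private replacement.
    warnings.warn(
        "cache_net_infer is deprecated; caching now goes through "
        "_cache_net_forward_iter.",
        DeprecationWarning,
    )
    return _cache_net_forward_iter(net)

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    cache_net_infer("net")
print(len(caught), caught[0].category.__name__)  # 1 DeprecationWarning
```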

BenjaminBossan added 2 commits November 11, 2019 11:33
* indentation level
* disable some pylint messages
* unused fixtures
Before, net.infer was cached when using a scoring callback with
use_caching=True. This way, the time to make an inference step was
saved. However, there was still an iteration step over the data for
each scoring callback. If iteration is slow, this could incur a
significant overhead.

Now net.forward_iter is cached instead. This way, the iteration over
the data is skipped and the iteration overhead should be gone.
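The idea in the commit message can be shown with a small, self-contained sketch (not skorch's actual code): wrap forward_iter so that its first call records every yielded batch, and later calls replay the recording, skipping both inference and the iteration over the data. DummyNet and cache_forward_iter are illustrative stand-ins.

```python
class DummyNet:
    def __init__(self):
        self.data_reads = 0  # counts how often the data is actually iterated

    def forward_iter(self, X):
        for x in X:
            self.data_reads += 1
            yield x * 2  # stand-in for an inference step


def cache_forward_iter(net):
    cache = []
    original = net.forward_iter

    def cached(X):
        if cache:
            yield from cache  # replay: no inference, no data iteration
            return
        for yp in original(X):  # first pass: iterate, infer, record
            cache.append(yp)
            yield yp

    net.forward_iter = cached


net = DummyNet()
cache_forward_iter(net)
data = [1, 2, 3]
first = list(net.forward_iter(data))   # reads the data once
second = list(net.forward_iter(data))  # served from the cache
print(first, second, net.data_reads)   # [2, 4, 6] [2, 4, 6] 3
```

With several scoring callbacks sharing the cache, only the first one pays the iteration cost; the rest replay the recorded outputs.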
@marrrcin

Can I test this from some nightly build, or do I have to build the package on my own?

@BenjaminBossan
Collaborator Author

@marrrcin You would need to install from source, but it's not difficult: https://github.com/skorch-dev/skorch#from-source

Member

@ottonemo left a comment


In general I think this approach works. We could debate whether this is a problem we need to solve or where PyTorch lacks infrastructure (i.e., caching datasets) but ultimately I think it doesn't hurt to fix this.

skorch/net.py Outdated
def _forward_output(self, yp, device):
    if isinstance(yp, tuple):
        return tuple(n.to(device) for n in yp)
    return yp.to(device)
Member


This is structurally very similar to skorch.utils.to_tensor. Maybe we should introduce a skorch.utils.to_device instead? This might also become handy if we support multiple GPUs in the future.
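A hedged sketch of what the suggested skorch.utils.to_device helper could look like: move a single tensor, or a tuple of tensors, to a target device in one call. FakeTensor is an illustrative stand-in for torch.Tensor so the snippet runs without PyTorch; the real helper would operate on actual tensors.

```python
def to_device(X, device):
    # Recurse into tuples so multi-output modules are moved as a unit.
    if isinstance(X, tuple):
        return tuple(to_device(x, device) for x in X)
    return X.to(device)


class FakeTensor:
    def __init__(self, device="cpu"):
        self.device = device

    def to(self, device):
        return FakeTensor(device)


moved = to_device((FakeTensor(), FakeTensor()), "cuda:0")
print([t.device for t in moved])  # ['cuda:0', 'cuda:0']
```

Centralizing the device transfer in one utility also leaves a single place to extend if multi-GPU support is added later.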

Collaborator Author


Good point, I moved this to skorch.utils.to_device

@BenjaminBossan
Copy link
Collaborator Author

@ottonemo I addressed your comment, please review again.

Before merging this, should we consider making a new release? I wouldn't mind this new feature being only on master for some time, in case it creates some trouble down the line.

@ottonemo ottonemo merged commit 09be626 into master Dec 16, 2019
@BenjaminBossan BenjaminBossan deleted the feature-skip-data-iteration-when-caching-scoring branch February 2, 2020 12:54