
Gaussian processes via gpytorch #782

Merged: 77 commits merged into master from feature/gaussian-processes-via-gpytorch on Oct 9, 2021

Conversation

BenjaminBossan (Collaborator) commented on Jun 13, 2021

This is a feature to add support for Gaussian Processes (GPs) via integration with gpytorch.

Similar to "vanilla" PyTorch, the idea here is that skorch allows the user to focus on what's important (implementing the mean function and kernel function) and not to bother with stuff like the training loop, callbacks, etc. This is probably best illustrated in the accompanying notebook.

GPs are primarily for regression, hence those are the main focus here. Traditionally, there are "exact" solutions and approximations. skorch will provide an ExactGPRegressor and a GPRegressor for those two use cases.

On top of that, a GPBinaryClassifier is offered, though I suspect it will rarely be used. I couldn't get the GPClassifier for multiclass to work, so that code is commented out.

The API is mostly the same as for the normal skorch estimators. There are some additions to make working with GPs easier:

  • predict method takes a return_std argument to return the standard deviation as well (as in sklearn's GaussianProcessRegressor; return_cov is not supported)
  • sample method to draw samples from the model
  • confidence_region method to get the confidence region
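A rough usage sketch of these additions, reusing the RbfModule sketched above; exact argument names (e.g. n_samples) and defaults are assumptions and may differ from the merged API:

```python
import numpy as np
from skorch.probabilistic import ExactGPRegressor

# toy data for the example
X_train = np.random.rand(100, 1).astype(np.float32)
y_train = np.sin(3 * X_train).ravel().astype(np.float32)
X_test = np.random.rand(20, 1).astype(np.float32)

gpr = ExactGPRegressor(RbfModule, max_epochs=20, lr=0.1)
gpr.fit(X_train, y_train)

y_pred, y_std = gpr.predict(X_test, return_std=True)  # mean and standard deviation
samples = gpr.sample(X_test, n_samples=100)           # draw samples from the model
lower, upper = gpr.confidence_region(X_test)          # confidence region around the mean
```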

Before merging this, I would like this to be reviewed by some experts on GPs, since I consider myself to know very little about them. Therefore, I will try to find some external help. Especially the following points should be clarified:

  • Do the gpytorch criteria always take likelihood and module as input arguments?
  • Do gpytorch losses always need to be maximized, i.e. we should minimize the negative objective function?
  • Does sampling work even if we use batching (i.e. for non-exact solutions)?
  • What is the problem with the multiclass classifier? Is there a self-contained example to replicate?

This PR already includes the bugfix in #781

BenjaminBossan added 30 commits March 26, 2021 19:03
Including a notebook to demonstrate.

Open TODOs:

- [ ] adding a bunch of tests (similar to the tests for NeuralNetClassifier et al.)
- [ ] adding most docstrings
- [ ] adding a section to the documentation
- [ ] adding a mention to the README
- [ ] update CHANGES
- [ ] multiclass classification never learns (outputs all NaNs)
- [ ] verifying with the GPyTorch devs if the assumptions made here are correct (hopefully they have time to respond)
Previously, when a parameter on, say, the module was changed via
set_params (e.g. net.set_params(module__hidden_units=123)), set_params
would always trigger (re-)initialization of the module. However, when
the net was not initialized in the first place, this is unnecessary. It
is sufficient to set the new attribute and wait for the net to be
initialized later.

Fortunately, this change doesn't seem to have any further impact, i.e.
we didn't implicitly rely on this behavior anywhere. The only exceptions
are 2 tests in test_cli.py, but those can easily be adjusted and this
shouldn't have any user impact.
These methods started to become complicated because they did the
following:

1. Check if there is anything to initialize at all
2. Print a message about the reason for a potential re-initialization
3. Move to the device

That made it quite difficult to override them without forgetting about
some aspect. With this change, there are now corresponding _initialize_*
methods that are called by net.initialize() and net.set_params. These
new methods now take care of the points above and call the initialize_*
methods inside.

Now, we can more easily make sure that the user can override
initialize_* without anything important being forgotten.
They are not passing yet.
Add a test to check that set_params doesn't initialize the net if it's
not yet initialized at that time.
There were two instances of printing regardless of verbosity.
Is not relevant at the moment.
Removed code for states that could not be reached because of virtual
params. This simplifies the logic considerably.
Check optimizer-related messages for an initialized net with set_params
applied on module.
This is partly WIP because there is more to come, even though this
change per se is already an improvement on the status quo.

So far, the logic for creating custom modules or optimizers was separate
from the logic that created the default module, criterion and optimizer.
E.g., the "prefixes_" attribute was prefilled with 'module_',
'criterion_' and 'optimizer_'. This makes dealing with custom
modules/optimizers (e.g. creating a second module called 'mymodule_')
more difficult, because the logic for treating those was completely
disjoint from the logic of how the default modules/optimizer were
treated.

This change actually removes most of the "special status" of
module/criterion/optimizer. Therefore, the logic to treat those is now
the same as for any custom module. So for instance, they are no longer
pre-registered but instead are only registered later during their
initialize_* methods.

The way this is implemented is to move the registration to the respective
initialize_* methods. This is because during __init__, we don't actually
know if we deal with a module or optimizer yet (passed argument for
'module' can, for instance, be a function, so we cannot type check). But
during 'initialize', when the actual instances are created, we can check
if we deal with a nn.Module or optim.Optimizer. If we do, we register
them.

So overall, the logic and role of 'initialize' have changed. Users will
be expected to set custom modules/optimizers during their respective
'initialize_*' methods from now on (stricter checks and doc updates will
be added). This means we no longer have to rely on the name to infer the
function (remember that previously, a custom module needed to contain
the substring 'module', which is an ugly restriction).

As more of a side effect to these changes, the '_check_kwargs' call was
moved to 'initialize' as well, since we cannot really check for faulty
kwargs as long as we don't know what modules and optimizers will be
registered.
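As a rough illustration of the pattern described above (the class MyNet and the attribute mymodule_ are made up for this example; the exact hooks in the merged code may differ):

```python
from torch import nn
from skorch import NeuralNet


class MyNet(NeuralNet):
    def initialize_module(self):
        # create and register the default module first
        super().initialize_module()
        # a custom module set during initialize_module; since it is an
        # nn.Module instance created at initialize time, skorch can register
        # it and treat it like the default module (device placement,
        # optimizer parameters, train/eval mode)
        self.mymodule_ = nn.Linear(20, 20)
        return self
```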
These are only the tests, which will currently fail, hence WIP.

Right now, there is a big hole in the treatment of custom
modules/optimizers that distinguishes them from the assumed
ones ('module', 'criterion', 'optimizer'). This battery of unit tests
covers behaviors that will fail but really shouldn't:

- custom module parameters should be passed to the optimizer
- set_params on a custom module should trigger re-initialization of
  criterion and optimizer
- set_params on a custom criterion should trigger re-initialization of
  optimizer
- custom modules and criteria are not automatically moved to cuda
Since custom components are no longer matched by name, this became
obsolete.
Before this, only the default "optimizer_" was used and all others were
being ignored. With this change, "zero_grad" and "step" are called on
all optimizers automatically.
- Update example in docstring of get_learnable_params
- Add comments about unused init contexts
- Add deprecation for triggered_directly argument

Also: Improved docstring for named_parameters
Merge in latest branch of the refactoring that elevates custom modules
to first class citizens.

Moreover, provide an ExactGPRegressor class directly, instead of
requiring the user to implement this. That way, we avoid the minor
annoyance of having to write your own initialize_module.

Consequently, the notebook could be simplified and the futureattr
workaround could be removed.

Furthermore, add a lot of tests, most notably tests for
GPRegressor (not exact) and GPBinaryClassifier.

Added some tests for grid search as well, which showed a few issues that
were addressed.
This case was not yet covered: when the module/criterion is already
initialized and none of its parameters changed,
initialize_module/criterion was not called. However, what if a custom
module/criterion does need to be initialized? In that case, not calling
initialize_module/criterion is bad.

With this fix, this bad behavior no longer occurs. Tests were added to
cover this.

In order to achieve this change, we had to unfortunately push down the
checking whether module/criterion is already initialized from
_initialize_module/criterion to initialize_module/criterion. There was
no other way of checking this, since at first, we cannot know which
attributes are modules/criteria.

For the user, this means a little more work if they want to implement
initialize_module/criterion absolutely correctly. However, that's not so
bad because it is only important if the user wants to work with
pre-initialized modules/criteria and with custom modules/criteria, which
should happen very rarely in practice. And even if it does, the user can
just copy the default skorch code and will end up with a correct
implementation.
With the upstream changes on the refactoring of custom module treatment,
this test now passes. It had to be adjusted a bit because
BernoulliLikelihood didn't support some of the given parameters.
Extend tests regarding saving, loading, pickling error.
BenjaminBossan added 5 commits June 27, 2021 13:35
BenjaminBossan (Collaborator, Author) commented:

Note: GPyTorch + PyTorch 1.7.1 don't work together, so I skip GPyTorch tests if this PyTorch version is detected. Officially, PyTorch >= 1.8.1 is supported by GPyTorch, but PyTorch 1.5.1 and 1.6.0 also work, thus I left those tests in.
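Not necessarily how the PR implements the skip, but a minimal sketch of such a version guard with pytest:

```python
import pytest
import torch

# skip every test in this module when the incompatible PyTorch version is found
pytestmark = pytest.mark.skipif(
    torch.__version__.startswith("1.7.1"),
    reason="GPyTorch does not work with PyTorch 1.7.1",
)
```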

thomasjpfan (Member) left a comment:

Initial look at the docs/API.

module__y=y,
)

gpr.fit(X, y)
thomasjpfan (Member):

Passing X and y into ExactGPRegressor.__init__ and into fit is a little strange. Alternatives where we delay initializing the module until fit is called feel too hacky.

BenjaminBossan (Collaborator, Author):

I had started out in a similar direction but it really didn't work well. The current solution is not the most beautiful, but it serves its purpose and uses facilities that are already present. I'm open to suggestions, though.

thomasjpfan (Member):

skorch's API is not flexible enough to use the X, y to initialize the module.

As long as we check that module__X is X in fit, I am okay with this workaround.

BTW Another hack would be to do something like nn.LazyLinear: https://github.com/pytorch/pytorch/blob/e711b5ce6c4982960c293b005da37a6d91253bd7/torch/nn/modules/lazy.py#L52 (I do not recommend it)

BenjaminBossan (Collaborator, Author):

As long as we check that module__X is X in fit, I am okay with this workaround.

GPyTorch already checks for this equality (unless debug mode is explicitly switched off). I believe this is enough and skorch doesn't need to do the same work.

BTW Another hack would be to do something like nn.LazyLinear: https://github.com/pytorch/pytorch/blob/e711b5ce6c4982960c293b005da37a6d91253bd7/torch/nn/modules/lazy.py#L52 (I do not recommend it)

I agree that we shouldn't use it (for now), given that it's still experimental. Maybe GPyTorch will adopt it so that we don't have to.

gpr = ExactGPRegressor(
RbfModule,
module__X=X,
module__y=y,
thomasjpfan (Member):

I see that there is a default likelihood that I assume routes into __init__. I think at this point we can use train_inputs and train_targets and route them into ExactGPRegressor. This would be more in line with ExactGP itself.

BenjaminBossan (Collaborator, Author):

Do you mean that X and y should be renamed to train_inputs and train_targets?

thomasjpfan (Member):

I think just module__X_train, module__y_train, would work. This is to make it clear that we are using training data.

BenjaminBossan (Collaborator, Author):

👍


Comment on lines +177 to +178
if loss.dim() != 0:
loss = loss.mean()
thomasjpfan (Member):

Is getting the mean handled by the Criterion?

BenjaminBossan (Collaborator, Author):

The gpytorch losses don't always return a scalar by default, as e.g. in this example, so I added this conditional to take care of this. Ideally, it wouldn't be necessary.
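Roughly the shape of the workaround being discussed, sketched as a get_loss override (the class name is illustrative and this is not necessarily the exact merged implementation):

```python
from skorch.probabilistic import GPRegressor


class MyGPRegressor(GPRegressor):
    def get_loss(self, y_pred, y_true, X=None, training=False):
        # gpytorch objectives (marginal log likelihoods) are maximized, so
        # minimize the negated value instead
        loss = -self.criterion_(y_pred, y_true)
        # some gpytorch criteria return a per-batch tensor rather than a
        # scalar, hence the defensive reduction
        if loss.dim() != 0:
            loss = loss.mean()
        return loss
```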

return latent_pred


class BaseProbabilisticTests:
thomasjpfan (Member):

My mind is blown that using fixtures + setting class attributes in child classes works.

BenjaminBossan (Collaborator, Author):

I was doing an experiment here. Do you think this overall improves the code or not? It does reduce code duplication but it may be less obvious what is going on.

thomasjpfan (Member):

It's mostly okay with me. Looking at this more closely, it all makes sense and it's a reasonable way to "not repeat yourself". I can see other approaches, but I do not see them as a drastic improvement over what we have now.
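For readers unfamiliar with the pattern, here is a small standalone sketch (unrelated to the actual test file) of fixtures defined on a base class combined with class attributes that child classes override:

```python
import numpy as np
import pytest


class BaseTests:
    # pytest does not collect this class directly because its name does not
    # start with "Test"; child classes override the class attribute below
    n_samples = 50

    @pytest.fixture
    def data(self):
        # the fixture reads the (possibly overridden) class attribute
        X = np.random.rand(self.n_samples, 2).astype(np.float32)
        y = X.sum(axis=1)
        return X, y

    def test_shapes(self, data):
        X, y = data
        assert len(X) == len(y) == self.n_samples


class TestSmall(BaseTests):
    n_samples = 10


class TestLarge(BaseTests):
    n_samples = 200
```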

skorch/utils.py
if (
likelihood
and GPYTORCH_INSTALLED
and isinstance(likelihood, gpytorch.likelihoods.SoftmaxLikelihood)
thomasjpfan (Member):

The code using SoftmaxLikelihood is commented out, so this code path is never reached?

BenjaminBossan (Collaborator, Author):

Indeed, good catch, I commented this code out (adding a comment), as well as the code for _transpose.

BenjaminBossan and others added 3 commits August 15, 2021 23:38
Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com>
This code is only required in case we add multiclass GP classification.
Be explicit about X and y being X_train and y_train.
thomasjpfan (Member) commented:

Ran through the notebook and left some comments at https://app.reviewnb.com/skorch-dev/skorch/blob/feature/gaussian-processes-via-gpytorch/notebooks/Gaussian_Processes.ipynb/ Let me know if you can see them.

BenjaminBossan (Collaborator, Author) commented:

@thomasjpfan Thanks for your comments, I was able to read them. Hopefully you can read my replies as well. I adjusted the notebook according to your comments or replied to them.

This way, we can avoid having to set X and y during the module's
__init__, making the usage of exact GPs easier overall.
jacobrgardner left a comment:

Mostly looks good to me! Left a few comments, primarily about the prediction and sampling methods.


class _GPRegressorPredictMixin:
"""Mixin class that provides a predict method for GP regressors."""
def predict(self, X, return_std=False, return_cov=False):

jacobrgardner:

When return_std is False, consider turning the gpytorch.settings.skip_posterior_variances setting on -- this will avoid doing the math for the posterior variances altogether, which will save a great deal of compute.

BenjaminBossan (Collaborator, Author):

Good point, I will add this.
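A sketch of what that could look like inside the mixin's predict; the structure of the merged method will differ, this only illustrates the setting (assuming forward_iter yields MultivariateNormal distributions):

```python
import gpytorch
import numpy as np


def predict(self, X, return_std=False, return_cov=False):
    self.check_is_fitted()
    if not return_std:
        # nothing downstream needs the variances, so skip computing them
        with gpytorch.settings.skip_posterior_variances():
            means = [p.mean.cpu().numpy()
                     for p in self.forward_iter(X, training=False)]
        return np.concatenate(means)
    y_preds, y_stds = [], []
    for p in self.forward_iter(X, training=False):
        y_preds.append(p.mean.cpu().numpy())
        y_stds.append(p.stddev.cpu().numpy())
    return np.concatenate(y_preds), np.concatenate(y_stds)
```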


"""
self.check_is_fitted()
samples = [p.sample(torch.Size([n_samples])) for p in self.forward_iter(X)]

jacobrgardner:

Obviously not 100% familiar with how skorch's internals work, but assuming p is a MultivariateNormal, is there any reason to call sample instead of rsample here? Outright enforcing that sampling must be done in a torch.no_grad() context seems unnecessary at first glance, and a common use case for GPs looks something like: x -> GP -> p(y|D) -> samples of y -> average some function of y over the samples -> differentiate the average with respect to x

BenjaminBossan (Collaborator, Author):

The main reason I used sample was because rsample was not supported for all distributions, e.g. Bernoulli. I didn't think of the possibility of using the sample gradients. I'll make a switch to rsample if the used distribution supports it and fall back on sample otherwise.
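A minimal sketch of such a switch, using the has_rsample flag that torch distributions expose (the concatenation detail is an assumption):

```python
import torch


def sample(self, X, n_samples=1):
    self.check_is_fitted()
    samples = []
    for p in self.forward_iter(X, training=False):
        shape = torch.Size([n_samples])
        # prefer reparameterized samples so gradients can flow through them;
        # fall back to plain sampling for distributions without rsample
        # support, e.g. Bernoulli
        samples.append(p.rsample(shape) if p.has_rsample else p.sample(shape))
    return torch.cat(samples, dim=-1)
```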


nonlin = self._get_predict_nonlinearity()
y_preds, y_stds = [], []
for yi in self.forward_iter(X, training=False):

jacobrgardner:

One thing that GPyTorch does is cache a couple of things that don't depend on the test points the first time we make predictions, so that the next time we make predictions we don't have O(n^3) complexity. These caches get deleted in some cases, like if the model gets put back in training mode -- I assume that forward_iter wouldn't do any of these things though.

It should be pretty easy to test:

gpr = ...
gpr.fit(...)
with gpytorch.settings.fast_pred_var():
    y_pred, y_std = gpr.predict(test_x, return_std=True)
    # test_x could change here.
    y_pred, y_std = gpr.predict(test_x, return_std=True)   # This call should be significantly faster. 

BenjaminBossan (Collaborator, Author):

Indeed, when using forward_iter, there is no risk of the model being set to training again. Therefore, I'll add the fast_pred_var context for it. When testing, I didn't find any noticeable speed difference. Perhaps, that is because I'm mostly using toy models on small data in the notebook?

)

gpr.fit(X_train, y_train)
y_pred = gpr.predict(X_train)

jacobrgardner:

The Jupyter notebook you sent me currently highlights gpytorch.settings.fast_pred_samples when calling sample. That setting won't actually do anything unless you are using KISS-GP. A setting that definitely will make a perf difference is wrapping predict in gpytorch.settings.fast_pred_var(), though, assuming gpytorch.settings.skip_posterior_variances() isn't also on (see my comment about that below).

BenjaminBossan (Collaborator, Author):

I didn't know that, thanks for clarifying. I will remove the usage of gpytorch.settings.fast_pred_samples from the notebook.

ibayer commented on Sep 30, 2021:

Since jacobrgardner already reviewed as an authority on GPyTorch, I looked at the integration from the perspective of a potential user. This is what I came up with.

When I was reading the tutorial, I was wondering in which situations I should use skorch's GPyTorch integration
and when not. I got the following impression, which (if correct) might be useful for other first-time users as well.

Good use cases:

  • Scaling GPs with simple standard kernels to data sizes that lead to overly slow computation or excessive memory requirements in libraries such as sklearn.
  • Deploying a GP with a complex kernel so that contributors who are not familiar with the nitty-gritty details of GPs and GPyTorch features can still work with it.

Better use GPyTorch directly:

  • Building complex kernel compositions and exploring their data fits in detail.

This would then naturally lead to section titles in the tutorial notebook such as "Scaling GPs" and "Deploying GPyTorch Models with Skorch". The material is already there, but this would have helped me jump right into the parts most relevant to me.

Below are a few nitpicking comments on the tutorial text.

If you need to use batches (say, you don't have enough GPU memory to fit all your data), you have to use variational GPs, as shown later in the notebook.

Sparse GPs can be used to scale to larger data. Not all approaches are based on variational inference.

  • you have to -> you can

even though we have not even trained the model.

This is a bit tricky since GPs are non-parametric methods and "learning" is usually associated with estimating primary parameters and not hyper-parameters as with GPs.

  • trained the model -> have not fitted the hyper-parameters

Since the model learns a real distribution, not just a point prediction as is most often the case for Deep Learning

Again the learning thing ;), I would say the distribution is modeled using kernel engineering and the distribution parameters are fitted.

  • Since the model learns a real distribution -> Since the model represents a probability distribution.

That's all, I think the PR is in good shape. Thanks for making GPs more easily accessible. :)

* Use fast_pred_vars in forward_iter
* Use rsample (where possible) when sampling
* Use skip_posterior_variances when predicting without returning the
  standard deviations
* Don't use fast_pred_samples in notebook since there is no point there

Accompanying these fixes, I made testing more rigorous by checking that
certain values are not all identical. This is because with some of the
introduced optimizations, GPyTorch will return (what I assume to be)
dummy values for certain operations, which we don't want to use.
BenjaminBossan (Collaborator, Author) commented:

Thank you @ibayer for your feedback.

This would then naturally lead to section titles in the tutorial nb such as "Scaling GP's" and "Deploying GPyTorch Models with Skorch". The material is already there but this would have helped me to jump right into the parts most relevant for me.

I couldn't quite manage to rework the sections to fit the titles as you suggested them. Still, as you said, it's important to clarify when and when not to use GPyTorch with skorch. To this end, I added a section at the start of the GP documentation in skorch that discusses this topic. The notebook references the docs too, so that interested readers can look up that information.

If you need to use batches (say, you don't have enough GPU memory to fit all your data), you have to use variational GPs, as shown later in the notebook.

Sparse GPs can be used to scale to larger data. Not all approaches are based on variational inference.

The part you cite talks about using batches and not about scaling per se. For batching, the use of sparse GPs would not help, so the statement is correct as is ;) But I get what you mean, and I'll change the wording so it no longer gives the impression that variational GPs are the only possibility for scaling.

This is a bit tricky since GPs are non-parametric methods and "learning" is usually associated with estimating primary parameters and not hyper-parameters as with GPs.

Again the learning thing ;), I would say the distribution is modeled using kernel engineering and the distribution parameters are fitted.

Thank you, you are right. I'll change the wording to not mislead the reader.

* Add discussion and when (and when not) to use GPyTorch + skorch
* Reword to no longer give the impression that variational GPs are the
  only scaling method
* Reword sentences that refer to "learning", as this can be misleading
BenjaminBossan (Collaborator, Author) commented:

@thomasjpfan @ottonemo @jacobrgardner @ibayer

Thank you all for your helpful reviews. Hopefully I was able to address all of your concerns.

My proposal is to give one more week for any issues with this PR to be raised. If none come up, I'll assume it is ready to go. Then I'd merge and prepare the next skorch release. For that release, I would frame this feature as a "beta" with possible breaking changes in the future in case something is just plain wrong. That way, we are not shackled by possible mistakes I made here.

Please let me know if you have any objections.

thomasjpfan (Member) commented:

For that, I would frame this feature as a "beta" with possible breaking changes in the future in case something is just plain wrong.

Sounds good to me.

BenjaminBossan merged commit 6ca5c4e into master on Oct 9, 2021
BenjaminBossan (Collaborator, Author) commented:

Thanks again everyone for your comments and suggestions. As announced, I merged this PR after waiting one week. A new skorch release will follow soon (I'll aim for tomorrow).

BenjaminBossan added a commit that referenced this pull request Oct 31, 2021
We are happy to announce the new skorch 0.11 release:

Two basic but very useful features have been added to our collection of callbacks. First, by setting `load_best=True` on the  [`Checkpoint` callback](https://skorch.readthedocs.io/en/latest/callbacks.html#skorch.callbacks.Checkpoint), the snapshot of the network with the best score will be loaded automatically when training ends. Second, we added a callback [`InputShapeSetter`](https://skorch.readthedocs.io/en/latest/callbacks.html#skorch.callbacks.InputShapeSetter) that automatically adjusts your input layer to have the size of your input data (useful e.g. when that size is not known beforehand).
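For example, both callbacks can be added like any other skorch callback (MyModule is a placeholder module; InputShapeSetter is assumed to target a parameter named input_dim, its default):

```python
from torch import nn
from skorch import NeuralNetClassifier
from skorch.callbacks import Checkpoint, InputShapeSetter


class MyModule(nn.Module):
    def __init__(self, input_dim=1):
        super().__init__()
        self.lin = nn.Linear(input_dim, 2)

    def forward(self, X):
        return nn.functional.softmax(self.lin(X), dim=-1)


net = NeuralNetClassifier(
    MyModule,
    callbacks=[
        Checkpoint(load_best=True),  # reload the best checkpoint when training ends
        InputShapeSetter(),          # sets module__input_dim from the training data
    ],
)
```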

When it comes to integrations, the [`MlflowLogger`](https://skorch.readthedocs.io/en/latest/callbacks.html#skorch.callbacks.MlflowLogger) now makes it possible to log automatically to [MLflow](https://mlflow.org/). Thanks to a contributor, some regressions in `net.history` have been fixed, and it even runs faster now.

On top of that, skorch now offers a new module, `skorch.probabilistic`. It contains new classes to work with **Gaussian Processes** using the familiar skorch API. This is made possible by the fantastic [GPyTorch](https://github.com/cornellius-gp/gpytorch) library, which skorch uses for this. So if you want to get started with Gaussian Processes in skorch, check out the [documentation](https://skorch.readthedocs.io/en/latest/user/probabilistic.html) and this [notebook](https://nbviewer.org/github/skorch-dev/skorch/blob/master/notebooks/Gaussian_Processes.ipynb). Since we're still learning, it's possible that we will change the API in the future, so please be aware of that.

Moreover, we introduced some changes to make skorch more customizable. First of all, we changed the signature of some methods so that they no longer assume the dataset to always return exactly 2 values. This way, it's easier to work with custom datasets that return e.g. 3 values. Normal users should not notice any difference, but if you often create custom nets, take a look at the [migration guide](https://skorch.readthedocs.io/en/latest/user/FAQ.html#migration-from-0-10-to-0-11).

And finally, we made a change to how custom modules, criteria, and optimizers are handled. They are now "first class citizens" in skorch land, which means: If you add a second module to your custom net, it is treated exactly the same as the normal module. E.g., skorch takes care of moving it to CUDA if needed and of switching it to train or eval mode. This way, customizing your networks architectures with skorch is easier than ever. Check the [docs](https://skorch.readthedocs.io/en/latest/user/customization.html#initialization-and-custom-modules) for more details.

Since these are some big changes, it's possible that you encounter issues. If that's the case, please check our [issue](https://github.com/skorch-dev/skorch/issues) page or create a new one.

As always, this release was made possible by outside contributors. Many thanks to:

- Autumnii
- Cebtenzzre
- Charles Cabergs
- Immanuel Bayer
- Jake Gardner
- Matthias Pfenninger
- Prabhat Kumar Sahu 

Find below the list of all changes:

Added

- Added `load_best` attribute to `Checkpoint` callback to automatically load state of the best result at the end of training
- Added a `get_all_learnable_params` method to retrieve the named parameters of all PyTorch modules defined on the net, including of criteria if applicable
- Added `MlflowLogger` callback for logging to Mlflow (#769)
- Added `InputShapeSetter` callback for automatically setting the input dimension of the PyTorch module
- Added a new module to support Gaussian Processes through [GPyTorch](https://gpytorch.ai/). To learn more about it, read the [GP documentation](https://skorch.readthedocs.io/en/latest/user/probabilistic.html) or take a look at the [GP notebook](https://nbviewer.jupyter.org/github/skorch-dev/skorch/blob/master/notebooks/Gaussian_Processes.ipynb). This feature is experimental, i.e. the API could be changed in the future in a backwards incompatible way (#782)

Changed

- Changed the signature of `validation_step`, `train_step_single`, `train_step`, `evaluation_step`, `on_batch_begin`, and `on_batch_end` such that instead of receiving `X` and `y`, they receive the whole batch; this makes it easier to deal with datasets that don't strictly return an `(X, y)` tuple, which is true for quite a few PyTorch datasets; please refer to the [migration guide](https://skorch.readthedocs.io/en/latest/user/FAQ.html#migration-from-0-10-to-0-11) if you encounter problems (#699)
- Checking of arguments to `NeuralNet` is now during `.initialize()`, not during `__init__`, to avoid raising false positives for yet unknown module or optimizer attributes
- Modules, criteria, and optimizers that are added to a net by the user are now first class: skorch takes care of setting train/eval mode, moving to the indicated device, and updating all learnable parameters during training (check the [docs](https://skorch.readthedocs.io/en/latest/user/customization.html#initialization-and-custom-modules) for more details, #751)
- `CVSplit` is renamed to `ValidSplit` to avoid confusion (#752)

Fixed

- Fixed a few bugs in the `net.history` implementation (#776)
- Fixed a bug in `TrainEndCheckpoint` that prevented it from being unpickled (#773)
BenjaminBossan deleted the feature/gaussian-processes-via-gpytorch branch on July 22, 2022