
Gaussian processes via gpytorch #782

Merged: 77 commits merged into master from feature/gaussian-processes-via-gpytorch on Oct 9, 2021

Conversation

BenjaminBossan (Collaborator) commented on Jun 13, 2021

This is a feature to add support for Gaussian Processes (GPs) via integration with gpytorch.

Similar to "vanilla" PyTorch, the idea here is that skorch allows the user to focus on what's important (implementing the mean function and kernel function) and not to bother with stuff like the training loop, callbacks, etc. This is probably best illustrated in the accompanying notebook.

GPs are primarily for regression, hence those are the main focus here. Traditionally, there are "exact" solutions and approximations. skorch will provide an ExactGPRegressor and a GPRegressor for those two use cases.

On top of that, a GPBinaryClassifier is offered, though I suspect it will rarely be used. I couldn't get the GPClassifier for multiclass to work, so that code is commented out.

The API is mostly the same as for the normal skorch estimators. There are some additions to make working with GPs easier:

  • predict method takes a return_std argument to return the standard deviation as well (as in sklearn's GaussianProcessRegressor; return_cov is not supported)
  • sample method to draw samples from the model
  • confidence_region method to get the confidence region
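A rough usage sketch of these additions, reusing the RbfModule sketched above; exact argument names (e.g. n_samples) and defaults are assumptions and may differ from the merged API:

```python
import numpy as np
from skorch.probabilistic import ExactGPRegressor

# toy data for the example
X_train = np.random.rand(100, 1).astype(np.float32)
y_train = np.sin(3 * X_train).ravel().astype(np.float32)
X_test = np.random.rand(20, 1).astype(np.float32)

gpr = ExactGPRegressor(RbfModule, max_epochs=20, lr=0.1)
gpr.fit(X_train, y_train)

y_pred, y_std = gpr.predict(X_test, return_std=True)  # mean and standard deviation
samples = gpr.sample(X_test, n_samples=100)           # draw samples from the model
lower, upper = gpr.confidence_region(X_test)          # confidence region around the mean
```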

Before merging this, I would like this to be reviewed by some experts on GPs, since I consider myself to know very little about them. Therefore, I will try to find some external help. Especially the following points should be clarified:

  • Do the gpytorch criteria always take likelihood and module as input arguments?
  • Do gpytorch losses always need to be maximized, i.e. we should minimize the negative objective function?
  • Does sampling work even if we use batching (i.e. for non-exact solutions)?
  • What is the problem with the multiclass classifier? Is there a self-contained example to replicate?

This PR already includes the bugfix in #781

BenjaminBossan added 30 commits March 26, 2021 19:03
Including a notebook to demonstrate.

Open TODOs:

- [ ] adding a bunch of tests (similar to the tests for NeuralNetClassifier et al.)
- [ ] adding most docstrings
- [ ] adding a section to the documentation
- [ ] adding a mention to the README
- [ ] update CHANGES
- [ ] multiclass classification never learns (outputs all NaNs)
- [ ] verifying with the GPyTorch devs if the assumptions made here are correct (hopefully they have time to respond)
Previously, when a parameter on, say, the module was changed via
set_params (e.g. net.set_params(module__hidden_units=123)), set_params
would always trigger (re-)initialization of the module. However, when
the net was not initialized in the first place, this is unnecessary. It
is sufficient to set the new attribute and wait for the net to be
initialized later.

Fortunately, this change doesn't seem to have any further impact, i.e.
we didn't implicitly rely on this behavior anywhere. The only exceptions
are 2 tests in test_cli.py, but those can easily be adjusted and this
shouldn't have any user impact.
These methods started to become complicated because they did the
following:

1. Check if there is anything to initialize at all
2. Print a message about the reason for a potential re-initialization
3. Move to the device

That made it quite difficult to override them without forgetting about
some aspect. With this change, there are now corresponding _initialize_*
methods that are called by net.initialize() and net.set_params. These
new methods now take care of the points above and call the initialize_*
methods inside.

Now, we can more easily make sure that the user can override
initialize_* without anything important being forgotten.
They are not passing yet.
Add a test to check that set_params doesn't initialize the net if it's
not yet initialized at that time.
There were two instances of printing regardless of verbosity.
Is not relevant at the moment.
Removed code for states that could not be reached because of virtual
params. This simplifies the logic considerably.
Check optimizer-related messages for an initialized net with set_params
applied on module.
This is partly WIP because there is more to come, even though this
change per se is already an improvement on the status quo.

So far, the logic for creating custom modules or optimizers was separate
from the logic that created the default module, criterion and optimizer.
E.g., the "prefixes_" attribute was prefilled with 'module_',
'criterion_' and 'optimizer_'. This makes dealing with custom
modules/optimizers (e.g. creating a second module called 'mymodule_')
more difficult, because the logic for treating those was completely
disjoint from the logic of how the default modules/optimizer were
treated.

This change actually removes most of the "special status" of
module/criterion/optimizer. Therefore, the logic to treat those is now
the same as for any custom module. So for instance, they are no longer
pre-registered but instead are only registered later during their
initialize_* methods.

The way this is implemented is to move the registration to the respective
initialize_* methods. This is because during __init__, we don't actually
know if we deal with a module or optimizer yet (passed argument for
'module' can, for instance, be a function, so we cannot type check). But
during 'initialize', when the actual instances are created, we can check
if we deal with a nn.Module or optim.Optimizer. If we do, we register
them.

So overall, the logic and role of 'initialize' have changed. Users will
be expected to set custom modules/optimizers during their respective
'initialize_*' methods from now on (stricter checks and doc updates will
be added). This means we no longer have to rely on the name to infer the
function (remember that previously, a custom module needed to contain
the substring 'module', which is an ugly restriction).

As more of a side effect to these changes, the '_check_kwargs' call was
moved to 'initialize' as well, since we cannot really check for faulty
kwargs as long as we don't know what modules and optimizers will be
registered.
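As a rough illustration of the pattern described above (the class MyNet and the attribute mymodule_ are made up for this example; the exact hooks in the merged code may differ):

```python
from torch import nn
from skorch import NeuralNet


class MyNet(NeuralNet):
    def initialize_module(self):
        # create and register the default module first
        super().initialize_module()
        # a custom module set during initialize_module; since it is an
        # nn.Module instance created at initialize time, skorch can register
        # it and treat it like the default module (device placement,
        # optimizer parameters, train/eval mode)
        self.mymodule_ = nn.Linear(20, 20)
        return self
```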
These are only the tests, which will currently fail, hence WIP.

Right now, there is a big hole in the treatment of custom
modules/optimizers that distinguishes them from the assumed
ones ('module', 'criterion', 'optimizer'). This battery of unit tests
covers behaviors that will fail but really shouldn't:

- custom module parameters should be passed to the optimizer
- set_params on a custom module should trigger re-initialization of
  criterion and optimizer
- set_params on a custom criterion should trigger re-initialization of
  optimizer
- custom modules and criteria are not automatically moved to cuda
Since custom components are no longer matched by name, this became
obsolete.
Before this, only the default "optimizer_" was used and all others were
being ignored. With this change, "zero_grad" and "step" are called on
all optimizers automatically.
- Update example in docstring of get_learnable_params
- Add comments about unused init contexts
- Add deprecation for triggered_directly argument

Also: Improved docstring for named_parameters
Merge in latest branch of the refactoring that elevates custom modules
to first class citizens.

Moreover, provide an ExactGPRegressor class directly, instead of
requiring the user to implement this. That way, we avoid the minor
annoyance of having to write your own initialize_module.

Consequently, the notebook could be simplified and the futureattr
workaround could be removed.

Furthermore, add a lot of tests, most notably tests for
GPRegressor (not exact) and GPBinaryClassifier.

Added some tests for grid search as well, which showed a few issues that
were addressed.
This case was not yet covered: when the module/criterion is already
initialized and none of its parameters changed,
initialize_module/criterion was not called. However, what if a custom
module/criterion does need to be initialized? In that case, not calling
initialize_module/criterion is bad.

With this fix, this bad behavior no longer occurs. Tests were added to
cover this.

In order to achieve this change, we had to unfortunately push down the
checking whether module/criterion is already initialized from
_initialize_module/criterion to initialize_module/criterion. There was
no other way of checking this, since at first, we cannot know which
attributes are modules/criteria.

For the user, this means a little more work if they want to implement
initialize_module/criterion absolutely correctly. However, that's not so
bad because it is only important if the user wants to work with
pre-initialized modules/criteria and with custom modules/criteria, which
should happen very rarely in practice. And even if it does, the user can
just copy the default skorch code and will end up with a correct
implementation.
With the upstream changes on the refactoring of custom module treatment,
this test now passes. It had to be adjusted a bit because
BernoulliLikelihood didn't support some of the given parameters.
Extend tests regarding saving, loading, pickling error.
BenjaminBossan added 5 commits June 27, 2021 13:35
BenjaminBossan (Collaborator, Author) commented:

Note: GPyTorch + PyTorch 1.7.1 don't work together, so I skip GPyTorch tests if this PyTorch version is detected. Officially, PyTorch >= 1.8.1 is supported by GPyTorch, but PyTorch 1.5.1 and 1.6.0 also work, thus I left those tests in.
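Not necessarily how the PR implements the skip, but a minimal sketch of such a version guard with pytest:

```python
import pytest
import torch

# skip every test in this module when the incompatible PyTorch version is found
pytestmark = pytest.mark.skipif(
    torch.__version__.startswith("1.7.1"),
    reason="GPyTorch does not work with PyTorch 1.7.1",
)
```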

thomasjpfan (Member) left a comment:

Initial look at the docs/API.

module__y=y,
)

gpr.fit(X, y)
thomasjpfan (Member):

Passing X and y into ExactGPRegressor.__init__ and into fit is a little strange. Alternatives where we delay initializing the module until fit is called feel too hacky.

BenjaminBossan (Collaborator, Author):

I had started out in a similar direction but it really didn't work well. The current solution is not the most beautiful, but it serves its purpose and uses facilities that are already present. I'm open to suggestions, though.

thomasjpfan (Member):

skorch's API is not flexible enough to use the X, y to initialize the module.

As long as we check that module__X is X in fit, I am okay with this workaround.

BTW Another hack would be to do something like nn.LazyLinear: https://github.com/pytorch/pytorch/blob/e711b5ce6c4982960c293b005da37a6d91253bd7/torch/nn/modules/lazy.py#L52 (I do not recommend it)

BenjaminBossan (Collaborator, Author):

As long as we check that module__X is X in fit, I am okay with this workaround.

GPyTorch already checks for this equality (unless debug mode is explicitly switched off). I believe this is enough and skorch doesn't need to do the same work.

BTW Another hack would be to do something like nn.LazyLinear: https://github.com/pytorch/pytorch/blob/e711b5ce6c4982960c293b005da37a6d91253bd7/torch/nn/modules/lazy.py#L52 (I do not recommend it)

I agree that we shouldn't use it (for now), given that it's still experimental. Maybe GPyTorch will adopt it so that we don't have to.

gpr = ExactGPRegressor(
RbfModule,
module__X=X,
module__y=y,
thomasjpfan (Member):

I see that there is a default likelihood that I assume routes into __init__. I think at this point we can use train_inputs and train_targets and route them into ExactGPRegressor. This would be more in line with ExactGP itself.

BenjaminBossan (Collaborator, Author):

Do you mean that X and y should be renamed to train_inputs and train_targets?

thomasjpfan (Member):

I think just module__X_train, module__y_train, would work. This is to make it clear that we are using training data.

BenjaminBossan (Collaborator, Author):

👍


Comment on lines +177 to +178
if loss.dim() != 0:
loss = loss.mean()
thomasjpfan (Member):

Is getting the mean handled by the Criterion?

BenjaminBossan (Collaborator, Author):

The gpytorch losses don't always return a scalar by default, as e.g. in this example, so I added this conditional to take care of this. Ideally, it wouldn't be necessary.
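Roughly the shape of the workaround being discussed, sketched as a get_loss override (the class name is illustrative and this is not necessarily the exact merged implementation):

```python
from skorch.probabilistic import GPRegressor


class MyGPRegressor(GPRegressor):
    def get_loss(self, y_pred, y_true, X=None, training=False):
        # gpytorch objectives (marginal log likelihoods) are maximized, so
        # minimize the negated value instead
        loss = -self.criterion_(y_pred, y_true)
        # some gpytorch criteria return a per-batch tensor rather than a
        # scalar, hence the defensive reduction
        if loss.dim() != 0:
            loss = loss.mean()
        return loss
```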

return latent_pred


class BaseProbabilisticTests:
thomasjpfan (Member):

My mind is blown that using fixtures + setting class attributes in child classes works.

BenjaminBossan (Collaborator, Author):

I was doing an experiment here. Do you think this overall improves the code or not? It does reduce code duplication but it may be less obvious what is going on.

thomasjpfan (Member):

It's mostly okay with me. Looking at this more closely, it all makes sense and it's a reasonable way to "not repeat yourself". I can see other approaches, but I do not see them as a drastic improvement over what we have now.
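For readers unfamiliar with the pattern, here is a small standalone sketch (unrelated to the actual test file) of fixtures defined on a base class combined with class attributes that child classes override:

```python
import numpy as np
import pytest


class BaseTests:
    # pytest does not collect this class directly because its name does not
    # start with "Test"; child classes override the class attribute below
    n_samples = 50

    @pytest.fixture
    def data(self):
        # the fixture reads the (possibly overridden) class attribute
        X = np.random.rand(self.n_samples, 2).astype(np.float32)
        y = X.sum(axis=1)
        return X, y

    def test_shapes(self, data):
        X, y = data
        assert len(X) == len(y) == self.n_samples


class TestSmall(BaseTests):
    n_samples = 10


class TestLarge(BaseTests):
    n_samples = 200
```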

skorch/utils.py
if (
likelihood
and GPYTORCH_INSTALLED
and isinstance(likelihood, gpytorch.likelihoods.SoftmaxLikelihood)
thomasjpfan (Member):

The code using SoftmaxLikelihood is commented out, so this code path is never reached?

BenjaminBossan (Collaborator, Author):

Indeed, good catch, I commented this code out (adding a comment), as well as the code for _transpose.

BenjaminBossan and others added 3 commits August 15, 2021 23:38
Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com>
This code is only required in case we add multiclass GP classification.
Be explicit about X and y being X_train and y_train.
thomasjpfan (Member) commented:

Ran through the notebook and left some comments at https://app.reviewnb.com/skorch-dev/skorch/blob/feature/gaussian-processes-via-gpytorch/notebooks/Gaussian_Processes.ipynb/ Let me know if you can see them.

BenjaminBossan (Collaborator, Author) commented:

@thomasjpfan Thanks for your comments, I was able to read them. Hopefully you can read my replies as well. I adjusted the notebook according to your comments or replied to them.

This way, we can avoid having to set X and y during the module's
__init__, making the usage of exact GPs easier overall.
jacobrgardner left a comment:

Mostly looks good to me! Left a few comments, primarily about the prediction and sampling methods.


class _GPRegressorPredictMixin:
"""Mixin class that provides a predict method for GP regressors."""
def predict(self, X, return_std=False, return_cov=False):

jacobrgardner:

When return_std is False, consider turning the gpytorch.settings.skip_posterior_variances setting on -- this will avoid doing the math for the posterior variances altogether, which will save a great deal of compute.

BenjaminBossan (Collaborator, Author):

Good point, I will add this.
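A sketch of what that could look like inside the mixin's predict; the structure of the merged method will differ, this only illustrates the setting (assuming forward_iter yields MultivariateNormal distributions):

```python
import gpytorch
import numpy as np


def predict(self, X, return_std=False, return_cov=False):
    self.check_is_fitted()
    if not return_std:
        # nothing downstream needs the variances, so skip computing them
        with gpytorch.settings.skip_posterior_variances():
            means = [p.mean.cpu().numpy()
                     for p in self.forward_iter(X, training=False)]
        return np.concatenate(means)
    y_preds, y_stds = [], []
    for p in self.forward_iter(X, training=False):
        y_preds.append(p.mean.cpu().numpy())
        y_stds.append(p.stddev.cpu().numpy())
    return np.concatenate(y_preds), np.concatenate(y_stds)
```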


"""
self.check_is_fitted()
samples = [p.sample(torch.Size([n_samples])) for p in self.forward_iter(X)]

jacobrgardner:

Obviously not 100% familiar with how skorch's internals work, but assuming p is a MultivariateNormal, is there any reason to call sample instead of rsample here? Outright enforcing that sampling must be done in a torch.no_grad() context seems unnecessary at first glance, and a common use case for GPs looks something like: x -> GP -> p(y|D) -> samples of y -> average some function of y over the samples -> differentiate the average with respect to x

BenjaminBossan (Collaborator, Author):

The main reason I used sample was because rsample was not supported for all distributions, e.g. Bernoulli. I didn't think of the possibility of using the sample gradients. I'll make a switch to rsample if the used distribution supports it and fall back on sample otherwise.
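A minimal sketch of such a switch, using the has_rsample flag that torch distributions expose (the concatenation detail is an assumption):

```python
import torch


def sample(self, X, n_samples=1):
    self.check_is_fitted()
    samples = []
    for p in self.forward_iter(X, training=False):
        shape = torch.Size([n_samples])
        # prefer reparameterized samples so gradients can flow through them;
        # fall back to plain sampling for distributions without rsample
        # support, e.g. Bernoulli
        samples.append(p.rsample(shape) if p.has_rsample else p.sample(shape))
    return torch.cat(samples, dim=-1)
```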


nonlin = self._get_predict_nonlinearity()
y_preds, y_stds = [], []
for yi in self.forward_iter(X, training=False):

jacobrgardner:

One thing that GPyTorch does is cache a couple of things that don't depend on the test points the first time we make predictions, so that the next time we make predictions we don't have O(n^3) complexity. These caches get deleted in some cases, like if the model gets put back in training mode -- I assume that forward_iter wouldn't do any of these things though.

It should be pretty easy to test:

gpr = ...
gpr.fit(...)
with gpytorch.settings.fast_pred_var():
    y_pred, y_std = gpr.predict(test_x, return_std=True)
    # test_x could change here.
    y_pred, y_std = gpr.predict(test_x, return_std=True)   # This call should be significantly faster. 

BenjaminBossan (Collaborator, Author):

Indeed, when using forward_iter, there is no risk of the model being set to training again. Therefore, I'll add the fast_pred_var context for it. When testing, I didn't find any noticeable speed difference. Perhaps, that is because I'm mostly using toy models on small data in the notebook?

)

gpr.fit(X_train, y_train)
y_pred = gpr.predict(X_train)

jacobrgardner:

The Jupyter notebook you sent me currently highlights gpytorch.settings.fast_pred_samples when calling sample. That setting won't actually do anything unless you are using KISS-GP. A setting that definitely will make a perf difference is wrapping predict in gpytorch.settings.fast_pred_var(), though, assuming gpytorch.settings.skip_posterior_variances() isn't also on (see my comment about that below).

BenjaminBossan (Collaborator, Author):

I didn't know that, thanks for clarifying. I will remove the usage of gpytorch.settings.fast_pred_samples from the notebook.

ibayer commented on Sep 30, 2021:

Since jacobrgardner already reviewed as an authority on GPyTorch, I looked at the integration from the perspective of a potential user. This is what I came up with.

When I was reading the tutorial, I was wondering in which situations I should use skorch's GPyTorch integration
and when not. I got the following impression, which (if correct) might be useful for other first-time users as well.

Good use cases:

  • Scaling GPs with simple standard kernels to data sizes that lead to overly slow computation or excessive memory requirements in libraries such as sklearn.
  • Deploying a GP with a complex kernel so that contributors who are not familiar with the nitty-gritty details of GPs and GPyTorch features can still work with it.

Better use GPyTorch directly:

  • Building complex kernel compositions and exploring their data fits in detail.

This would then naturally lead to section titles in the tutorial notebook such as "Scaling GPs" and "Deploying GPyTorch Models with Skorch". The material is already there, but this would have helped me jump right into the parts most relevant to me.

Below are a few nitpicking comments on the tutorial text.

If you need to use batches (say, you don't have enough GPU memory to fit all your data), you have to use variational GPs, as shown later in the notebook.

Sparse GPs can be used to scale to larger data. Not all approaches are based on variational inference.

  • you have to -> you can

even though we have not even trained the model.

This is a bit tricky since GPs are non-parametric methods and "learning" is usually associated with estimating primary parameters and not hyper-parameters as with GPs.

  • trained the model -> have not fitted the hyper-parameters

Since the model learns a real distribution, not just a point prediction as is most often the case for Deep Learning

Again the learning thing ;), I would say the distribution is modeled using kernel engineering and the distribution parameters are fitted.

  • Since the model learns a real distribution -> Since the model represents a probability distribution.

That's all, I think the PR is in good shape. Thanks for making GPs more easily accessible. :)

* Use fast_pred_vars in forward_iter
* Use rsample (where possible) when sampling
* Use skip_posterior_variances when predicting without returning the
  standard deviations
* Don't use fast_pred_samples in notebook since there is no point there

Accompanying these fixes, I made testing more rigorous by checking that
certain values are not all identical. This is because with some of the
introduced optimizations, GPyTorch will return (what I assume to be)
dummy values for certain operations, which we don't want to use.
BenjaminBossan (Collaborator, Author) commented:

Thank you @ibayer for your feedback.

This would then naturally lead to section titles in the tutorial nb such as "Scaling GP's" and "Deploying GPyTorch Models with Skorch". The material is already there but this would have helped me to jump right into the parts most relevant for me.

I couldn't quite manage to rework the sections to fit the titles as you suggested them. Still, as you said, it's important to clarify when and when not to use GPyTorch with skorch. To this end, I added a section at the start of the GP documentation in skorch that discusses this topic. The notebook references the docs too, so that interested readers can look up that information.

If you need to use batches (say, you don't have enough GPU memory to fit all your data), you have to use variational GPs, as shown later in the notebook.

Sparse GPs can be used to scale to larger data. Not all approaches are based on variational inference.

The part you cite talks about using batches and not about scaling per se. For batching, the use of sparse GPs would not help, so the statement is correct as is ;) But I get what you mean, and I'll change the wording so it no longer gives the impression that variational GPs are the only possibility for scaling.

This is a bit tricky since GPs are non-parametric methods and "learning" is usually associated with estimating primary parameters and not hyper-parameters as with GPs.

Again the learning thing ;), I would say the distribution is modeled using kernel engineering and the distribution parameters are fitted.

Thank you, you are right. I'll change the wording to not mislead the reader.

* Add discussion and when (and when not) to use GPyTorch + skorch
* Reword to no longer give the impression that variational GPs are the
  only scaling method
* Reword sentences that refer to "learning", as this can be misleading
BenjaminBossan (Collaborator, Author) commented:

@thomasjpfan @ottonemo @jacobrgardner @ibayer

Thank you all for your helpful reviews. Hopefully I was able to address all of your concerns.

My proposal is to give one more week for any issues with this PR to be raised. If none come up, I'll assume it is ready to go. Then I'd merge and prepare the next skorch release. For that release, I would frame this feature as a "beta" with possible breaking changes in the future in case something is just plain wrong. That way, we are not shackled by possible mistakes I made here.

Please let me know if you have any objections.

thomasjpfan (Member) commented:

For that, I would frame this feature as a "beta" with possible breaking changes in the future in case something is just plain wrong.

Sounds good to me.

BenjaminBossan merged commit 6ca5c4e into master on Oct 9, 2021
BenjaminBossan (Collaborator, Author) commented:

Thanks again everyone for your comments and suggestions. As announced, I merged this PR after waiting one week. A new skorch release will follow soon (I'll aim for tomorrow).

BenjaminBossan added a commit that referenced this pull request Oct 31, 2021
We are happy to announce the new skorch 0.11 release:

Two basic but very useful features have been added to our collection of callbacks. First, by setting `load_best=True` on the  [`Checkpoint` callback](https://skorch.readthedocs.io/en/latest/callbacks.html#skorch.callbacks.Checkpoint), the snapshot of the network with the best score will be loaded automatically when training ends. Second, we added a callback [`InputShapeSetter`](https://skorch.readthedocs.io/en/latest/callbacks.html#skorch.callbacks.InputShapeSetter) that automatically adjusts your input layer to have the size of your input data (useful e.g. when that size is not known beforehand).
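For example, both callbacks can be added like any other skorch callback (MyModule is a placeholder module; InputShapeSetter is assumed to target a parameter named input_dim, its default):

```python
from torch import nn
from skorch import NeuralNetClassifier
from skorch.callbacks import Checkpoint, InputShapeSetter


class MyModule(nn.Module):
    def __init__(self, input_dim=1):
        super().__init__()
        self.lin = nn.Linear(input_dim, 2)

    def forward(self, X):
        return nn.functional.softmax(self.lin(X), dim=-1)


net = NeuralNetClassifier(
    MyModule,
    callbacks=[
        Checkpoint(load_best=True),  # reload the best checkpoint when training ends
        InputShapeSetter(),          # sets module__input_dim from the training data
    ],
)
```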

When it comes to integrations, the [`MlflowLogger`](https://skorch.readthedocs.io/en/latest/callbacks.html#skorch.callbacks.MlflowLogger) now makes it possible to log automatically to [MLflow](https://mlflow.org/). Thanks to a contributor, some regressions in `net.history` have been fixed, and it even runs faster now.

On top of that, skorch now offers a new module, `skorch.probabilistic`. It contains new classes to work with **Gaussian Processes** using the familiar skorch API. This is made possible by the fantastic [GPyTorch](https://github.com/cornellius-gp/gpytorch) library, which skorch uses for this. So if you want to get started with Gaussian Processes in skorch, check out the [documentation](https://skorch.readthedocs.io/en/latest/user/probabilistic.html) and this [notebook](https://nbviewer.org/github/skorch-dev/skorch/blob/master/notebooks/Gaussian_Processes.ipynb). Since we're still learning, it's possible that we will change the API in the future, so please be aware of that.

Moreover, we introduced some changes to make skorch more customizable. First of all, we changed the signature of some methods so that they no longer assume the dataset to always return exactly 2 values. This way, it's easier to work with custom datasets that return e.g. 3 values. Normal users should not notice any difference, but if you often create custom nets, take a look at the [migration guide](https://skorch.readthedocs.io/en/latest/user/FAQ.html#migration-from-0-10-to-0-11).

And finally, we made a change to how custom modules, criteria, and optimizers are handled. They are now "first class citizens" in skorch land, which means: If you add a second module to your custom net, it is treated exactly the same as the normal module. E.g., skorch takes care of moving it to CUDA if needed and of switching it to train or eval mode. This way, customizing your networks architectures with skorch is easier than ever. Check the [docs](https://skorch.readthedocs.io/en/latest/user/customization.html#initialization-and-custom-modules) for more details.

Since these are some big changes, it's possible that you encounter issues. If that's the case, please check our [issue](https://github.com/skorch-dev/skorch/issues) page or create a new one.

As always, this release was made possible by outside contributors. Many thanks to:

- Autumnii
- Cebtenzzre
- Charles Cabergs
- Immanuel Bayer
- Jake Gardner
- Matthias Pfenninger
- Prabhat Kumar Sahu 

Find below the list of all changes:

Added

- Added `load_best` attribute to `Checkpoint` callback to automatically load state of the best result at the end of training
- Added a `get_all_learnable_params` method to retrieve the named parameters of all PyTorch modules defined on the net, including of criteria if applicable
- Added `MlflowLogger` callback for logging to Mlflow (#769)
- Added `InputShapeSetter` callback for automatically setting the input dimension of the PyTorch module
- Added a new module to support Gaussian Processes through [GPyTorch](https://gpytorch.ai/). To learn more about it, read the [GP documentation](https://skorch.readthedocs.io/en/latest/user/probabilistic.html) or take a look at the [GP notebook](https://nbviewer.jupyter.org/github/skorch-dev/skorch/blob/master/notebooks/Gaussian_Processes.ipynb). This feature is experimental, i.e. the API could be changed in the future in a backwards incompatible way (#782)

Changed

- Changed the signature of `validation_step`, `train_step_single`, `train_step`, `evaluation_step`, `on_batch_begin`, and `on_batch_end` such that instead of receiving `X` and `y`, they receive the whole batch; this makes it easier to deal with datasets that don't strictly return an `(X, y)` tuple, which is true for quite a few PyTorch datasets; please refer to the [migration guide](https://skorch.readthedocs.io/en/latest/user/FAQ.html#migration-from-0-10-to-0-11) if you encounter problems (#699)
- Checking of arguments to `NeuralNet` is now during `.initialize()`, not during `__init__`, to avoid raising false positives for yet unknown module or optimizer attributes
- Modules, criteria, and optimizers that are added to a net by the user are now first class: skorch takes care of setting train/eval mode, moving to the indicated device, and updating all learnable parameters during training (check the [docs](https://skorch.readthedocs.io/en/latest/user/customization.html#initialization-and-custom-modules) for more details, #751)
- `CVSplit` is renamed to `ValidSplit` to avoid confusion (#752)

Fixed

- Fixed a few bugs in the `net.history` implementation (#776)
- Fixed a bug in `TrainEndCheckpoint` that prevented it from being unpickled (#773)
BenjaminBossan deleted the feature/gaussian-processes-via-gpytorch branch on July 22, 2022