
Multi output #360

Merged: 26 commits merged into main from multi-output on Sep 24, 2023

Conversation

ingmarschuster
Contributor

@ingmarschuster ingmarschuster commented Aug 14, 2023

Type of changes

  • Bug fix
  • New feature
  • Documentation / docstrings
  • Tests
  • Other

Checklist

  • I've formatted the new code by running poetry run pre-commit run --all-files --show-diff-on-failure before committing.
  • I've added tests for new code.
  • I've added docstrings for the new code.

Description

Added multi-output capability with coregionalization: the dataset is expected to have multiple columns in y, one per output dimension. Trainable output correlations can be achieved via a categorical CatKernel used as the out_kernel field in the prior.

The advantage of this approach is that the Kronecker structure of coregionalization, i.e. the fact that K = kron(Kxx, Kyy), can be exploited to invert K efficiently. If we instead keep the Dataset.y array one-dimensional and artificially append the output-dimension index to the input points, the information about the Kronecker structure is lost and the efficiency disappears, unless some clever trick is used.
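As a minimal standalone illustration (not part of this PR's code) of the Kronecker identity being exploited, with made-up toy covariances:

import jax.numpy as jnp

n, m = 5, 3                                # number of inputs, number of outputs
Kxx = jnp.eye(n) + 0.5 * jnp.ones((n, n))  # toy input covariance (positive definite)
Kyy = jnp.eye(m) + 0.1 * jnp.ones((m, m))  # toy output covariance (positive definite)

# Full covariance over all n * m observations.
K = jnp.kron(Kxx, Kyy)

# kron(A, B)^{-1} = kron(A^{-1}, B^{-1}): only the two small factors ever need
# to be inverted, never the full (n * m) x (n * m) matrix.
K_inv = jnp.kron(jnp.linalg.inv(Kxx), jnp.linalg.inv(Kyy))
assert jnp.allclose(K_inv, jnp.linalg.inv(K), atol=1e-4)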

Issue Number: N/A

@ingmarschuster
Contributor Author

ingmarschuster commented Aug 24, 2023

@henrymoss Can you review this? regression_mo.py can be used to try things out.

The main adaptations are in the predict() methods of Prior and ConjugatePosterior and in the Gaussian likelihood.

The changes would still need to be added to NonConjugatePosterior and to the Bernoulli and Poisson likelihoods, and tests need to be written. If somebody can help out here, that would be great!

@ingmarschuster ingmarschuster marked this pull request as ready for review August 24, 2023 08:25
@henrymoss
Contributor

Is there a way for me to just see the MO differences? At the moment it's all mixed in with the masking stuff.

@ingmarschuster
Contributor Author

Yes, just go to
mask-missing...multi-output

Maybe I was a bit too optimistic about the mask PR being merged into main ;)

@ingmarschuster
Contributor Author

one thing I [dislike is trying] to do too much out of the box and then things [get] messy

I'm not sure how to do this using a wrapper such as MultiOutputLikelihood around univariate likelihoods, as suggested by Ti on Slack, mainly because I think we need access not only to the abstract distribution of a univariate likelihood but also to parameters like the covariance matrix if the likelihood is Gaussian.
One obvious option that requires the user to be more explicit would be to have multivariate versions of the different likelihoods.

@ingmarschuster
Contributor Author

I'll think about a wrapper solution, as I also think it's much nicer. If you have ideas for how to design it, I'd be very happy to hear them.

@ingmarschuster
Contributor Author

Mask-missing is now merged, so all changes you see in the diffs are against main, @henrymoss.

@ingmarschuster
Contributor Author

ingmarschuster commented Aug 30, 2023

OK, I think I can draft a solution that is explicit, like Ti suggested. For the likelihood it can, after all, be a pretty simple wrapper. For the posterior, things get slightly more involved, but you can introduce classes that encapsulate the output logic.

Here is a draft. It's rough around the edges, and the method naming is not right, but it illustrates the point so we can discuss:

# Assumed imports for the (pre-CoLA) GPJax internals used below; the output
# model is expected to have `self.prior` set by the posterior that owns it.
import jax.numpy as jnp

from gpjax.distributions import GaussianDistribution
from gpjax.linops import DenseLinearOperator, identity


class AbstractOutput:
    """Encapsulates the output-dimension logic used by a posterior's predict()."""

    def calculate_Sigma(self, x, y, n, obs_noise, jitter, mask, Kxx):
        raise NotImplementedError()

    def calculate_Ktt_and_Kxt(self, t, x, n_test, mask):
        raise NotImplementedError()

    def calculate_distribution(self, n_test, m, mean, covariance):
        raise NotImplementedError()


class UnivariateOutput(AbstractOutput):
    def calculate_Sigma(self, x, y, n, obs_noise, jitter, mask, Kxx):
        # Observation covariance: Kxx + (jitter + noise) * I.
        Sigma = Kxx + identity(n) * (jitter + obs_noise)

        if mask is not None:
            # Zero the masked rows/columns and set the masked diagonal entries
            # to 1 / (2 * pi) so those dimensions cancel against the Gaussian
            # normalising constant in the log-density.
            Sigma_masked = jnp.where(mask + mask.T, 0.0, Sigma.matrix)
            Sigma = Sigma.replace(
                matrix=jnp.where(
                    jnp.diag(jnp.squeeze(mask)), 1 / (2 * jnp.pi), Sigma_masked
                )
            )

        return Sigma

    def calculate_Ktt_and_Kxt(self, t, x, n_test, mask):
        Ktt = self.prior.kernel.gram(t)
        Kxt = self.prior.kernel.cross_covariance(x, t)

        if mask is not None:
            # Zero the cross-covariance rows belonging to masked observations.
            Kxt = jnp.where(mask * jnp.ones((1, n_test), dtype=bool), 0.0, Kxt)

        return Ktt, Kxt

    def calculate_distribution(self, n_test, m, mean, covariance):
        return GaussianDistribution(jnp.atleast_1d(mean.squeeze()), covariance)


class MultivariateOutput(AbstractOutput):
    def __init__(self, out_kernel, num_outputs):
        self.out_kernel = out_kernel
        # Number of output dimensions, stored so methods that do not receive y
        # (e.g. calculate_Ktt_and_Kxt) still know it.
        self.num_outputs = num_outputs

    def calculate_Sigma(self, x, y, n, obs_noise, jitter, mask, Kxx):
        m = y.shape[1]
        n_X_m = n * m
        # Coregionalization covariance over the output indices 0, ..., m - 1.
        Kyy = self.out_kernel.gram(jnp.arange(m)[:, jnp.newaxis])
        # Dense Kronecker product for now; a Kronecker operator would avoid this.
        Sigma = DenseLinearOperator(jnp.kron(Kxx.to_dense(), Kyy.to_dense()))
        Sigma += identity(n_X_m) * (jitter + obs_noise)

        if mask is not None:
            # Same masking trick as the univariate case, applied to the
            # (n * m) x (n * m) covariance.
            Sigma_masked = jnp.where(mask + mask.T, 0.0, Sigma.matrix)
            Sigma = Sigma.replace(
                matrix=jnp.where(
                    jnp.diag(jnp.squeeze(mask)), 1 / (2 * jnp.pi), Sigma_masked
                )
            )

        return Sigma

    def calculate_Ktt_and_Kxt(self, t, x, n_test, mask):
        m = self.num_outputs
        Kyy = self.out_kernel.gram(jnp.arange(m)[:, jnp.newaxis])
        Ktt = jnp.kron(self.prior.kernel.gram(t).to_dense(), Kyy.to_dense())
        Kxt = jnp.kron(self.prior.kernel.cross_covariance(x, t), Kyy.to_dense())

        if mask is not None:
            # Zero the rows of Kxt belonging to masked training observations.
            Kxt = jnp.where(
                mask * jnp.ones((1, n_test * m), dtype=bool), 0.0, Kxt
            )

        return Ktt, Kxt

    def calculate_distribution(self, n_test, m, mean, covariance):
        rval = GaussianDistribution(jnp.atleast_1d(mean.squeeze()), covariance)
        if m == 1:
            return rval
        # Wrap so means/samples are reshaped to (n_test, m); ReshapedDistribution
        # is the wrapper introduced in this PR.
        return ReshapedDistribution(rval, (n_test, m))

Then the predict method of the posterior can look something like this, under the assumption that the posterior is constructed with an output_model (defaults to UnivariateOutput):

def predict(
    self, test_inputs: Num[Array, "N D"], train_data: Dataset
) -> GaussianDistribution:
    x, y, n, mask = train_data.X, train_data.y, train_data.n, train_data.mask
    m = y.shape[1]
    if m > 1 and mask is not None:
        # Flatten the (n, m) mask to a column over the n * m stacked
        # observations, matching the row-major ordering of kron(Kxx, Kyy).
        mask = mask.reshape(-1, 1)

    t, n_test = test_inputs, test_inputs.shape[0]

    obs_noise = self.likelihood.obs_noise
    mx = self.prior.mean_function(x)
    mean_t = self.prior.mean_function(t)
    Kxx = self.prior.kernel.gram(x)

    # Everything output-dimension specific is delegated to the output model.
    Sigma = self.output_model.calculate_Sigma(
        x, y, n, obs_noise, self.prior.jitter, mask, Kxx
    )
    Ktt, Kxt = self.output_model.calculate_Ktt_and_Kxt(t, x, n_test, mask)

    Sigma_inv_Kxt = Sigma.solve(Kxt)

    # Residuals are flattened row-major so they line up with the Kronecker ordering.
    mean = mean_t.flatten() + jnp.matmul(Sigma_inv_Kxt.T, (y - mx).flatten())

    covariance = Ktt - jnp.matmul(Kxt.T, Sigma_inv_Kxt)
    covariance += identity(n_test * m) * self.prior.jitter

    return self.output_model.calculate_distribution(n_test, m, mean, covariance)

What do you think?

For reference, the GPyTorch API provides

gpytorch.means.MultitaskMean(gpytorch.means.ConstantMean(), num_tasks=2)
gpytorch.kernels.MultitaskKernel(gpytorch.kernels.RBFKernel(), num_tasks=2, rank=1)

@thomaspinder
Collaborator

Hey @ingmarschuster, for me, the AbstractOutput approach is cleaner. I feel it demands less of the user, is slightly more intuitive, and keeps the underlying code more didactic. Perhaps this was always your intention, but maybe the user does not even need to specify the output class (i.e., MultivariateOutput); it could instead be formed by passing num_output or num_task into the prior initialiser? I agree on tidying up the naming, e.g., calculate_Sigma -> gram or calculate_covariance.
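For concreteness, a purely hypothetical usage sketch of that suggestion; the num_outputs argument and the internally selected output handling are illustrative only, not an existing GPJax API:

import gpjax as gpx
import jax.numpy as jnp

# Toy two-output dataset: y has one column per output dimension.
X = jnp.linspace(0.0, 1.0, 20).reshape(-1, 1)
y = jnp.hstack([jnp.sin(X), jnp.cos(X)])
D = gpx.Dataset(X=X, y=y)

# Hypothetical: `num_outputs` tells the prior to select multi-output handling
# (e.g. a MultivariateOutput-style model) internally.
prior = gpx.Prior(
    mean_function=gpx.mean_functions.Zero(),
    kernel=gpx.kernels.RBF(),
    num_outputs=2,
)
posterior = prior * gpx.Gaussian(num_datapoints=D.n)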

@daniel-dodd
Member

Hey, nice work @ingmarschuster. Great to see MOGPs coming along.

Just a perhaps naive question: is there a benefit to this PR over adding another input dimension to the kernel (holding the output indices) and using a Kronecker product kernel defined as k((i, x), (j, y)) = k(i, j) k(x, y)? With that route we would actually be computationally efficient with respect to LinearOperator structure, whereas your code assumes everything is dense. (We actually nearly support this: following the CoLA PR merger #370, all we need is a Kronecker compute engine #371, which is a quick PR.) We could create an MOGP wrapper to do so in five lines of code, and it would retain all the computational benefits, be simpler to read, and be extensible:

y  # flattened to [N, 1]
x  # columns: [output_dimension, covariates]

k_out = Kernel(active_dims=output_dimension)
k_cov = Kernel(active_dims=covariates)
k = Kronecker([k_out, k_cov])
gp = Prior(kernel=k)

This seems to be the approach you are trying to mimic? Or maybe I am missing something?

@ingmarschuster
Contributor Author

ingmarschuster commented Sep 2, 2023

@thomaspinder: passing num_output / num_task into the prior would then force one single multi-output handling onto the user, e.g. coregionalization. I'd be OK with this in principle; I'm just saying that explicitly passing the multi-output handling object gives more flexibility (and we can have a sensible default).
@daniel-dodd: Denseness was used because I was too lazy to add a Kronecker operator to the current LinOps architecture and knew the CoLA integration was coming along, which would make this very easy and bring the computational benefits. I totally understand how the CoLA Kronecker would work on the matrix level; on the kernel level I do not see it working at the moment. But maybe you do, since you worked on the CoLA PR? Actually, the problem I saw with the approach of making the output index an input dimension is exactly that guaranteeing efficient Kronecker computation would be difficult if some output indices only occur occasionally in the input.
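A toy standalone illustration of that last point (not from the PR): once some (input, output) pairs are unobserved, the covariance of the observed entries is an arbitrary submatrix of kron(Kxx, Kyy) and is generally not itself a Kronecker product, so the cheap factorised solve is lost.

import jax.numpy as jnp

Kxx = jnp.eye(3) + 0.5 * jnp.ones((3, 3))  # toy input covariance
Kyy = jnp.eye(2) + 0.1 * jnp.ones((2, 2))  # toy output covariance
K = jnp.kron(Kxx, Kyy)  # rows ordered (x0, out0), (x0, out1), (x1, out0), ...

# Suppose output 1 is only observed at x0: keep rows/columns 0, 1, 2 and 4.
observed = jnp.array([0, 1, 2, 4])
K_obs = K[jnp.ix_(observed, observed)]  # 4 x 4 submatrix, generally no Kronecker factorisation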

Contributor

@henrymoss henrymoss left a comment

Sorry it's taken so long for me to get to this. It's been my teaching weeks...

I think the approach is nice. We just need to think about how to do the efficient Kronecker matrix inverse.

I also wonder if the ReshapedDistribution is a bit too far? You could just have some code that maps things into the right shapes where we need them (i.e. in the likelihoods and posteriors?).

Also, perhaps I'm being stupid, but is it possible with this setup to have different kernels for each output?

gpjax/dataset.py Outdated
Comment on lines 42 to 49
mask: (Optional[Union[Bool[Array, "N Q"], Literal["infer automatically"]]]): mask for the output data.
Users can optionally specify a pre-computed mask, or explicitly pass `None` which
means no mask will be used. Defaults to `"infer automatically"` which means that
the mask will be computed from the output data, or set to `None` if no output data is provided.
Contributor

I guess this will be its own PR?

@daniel-dodd
Member

@ingmarschuster, @henrymoss, @thomaspinder, thinking about this further, these are my current thoughts on this PR.

I find some aspects of this approach quite appealing, and I believe there are strong elements we can build upon. However, I have reservations about merging this PR directly into the main GPJax codebase in its current state, with the implication that it would become a stable code design for several releases.

My primary concern is that this implementation seems tailored exclusively for conjugate priors, rendering it incompatible with various other components of the library, including non-conjugate priors, variational families, approximations like RFF, and pathwise sampling. To make it compatible, substantial code modifications would be necessary.

However, I believe there are simpler ways to achieve the same functionality introduced by this PR. These approaches would not necessitate such extensive changes to the existing code, and they would offer enhanced efficiency and extensibility for various modelling scenarios, including heterotopic cases (different outputs observed at different inputs) and isotopic cases (every output observed at every input).

A logical starting point would be to restructure the kernel and incorporate structured likelihoods. In my view, some of the functionality found in your AbstractOutput's methods should be distributed between these components, and I find that specifying the out_kernel within the Prior feels overly restrictive; it aligns better with the kernel's responsibilities. This reorganisation of the code would provide greater flexibility and result in minimal compatibility issues with the rest of the codebase. Now, with the (albeit minimal) CoLA integration, we are in a good position to start thinking about this.

Nonetheless, I appreciate the innovative aspects of your work, @ingmarschuster. There is undoubtedly neat computational structure exemplified here, and I believe it can be effectively utilised with careful kernel and likelihood design. To keep the momentum going, I would recommend that this PR, in its current state, be designated for a gpjax.experimental submodule. This would allow us to develop and refine it more freely until we achieve a sufficient level of compatibility and extensibility.

@ingmarschuster
Contributor Author

@daniel-dodd The current implementation covers only conjugate priors, mostly because I wanted to be sure that the design is well received before continuing the implementation. However, the approach is easy to implement for non-conjugate models etc., since it is basically just making the output-dimension index a kernel input / coregionalization. I didn't think too deeply about variational inference, pathwise sampling, and RFF, but I don't see a problem there.

If you want to integrate structured likelihoods and rework the code like that, of course that is also a viable way, but I will probably not be able to do it. I'm open to gpjax.experimental integration, on the other hand. Maybe you and @thomaspinder just decide either way, because I wouldn't want to spend time on it if it's not used.

@daniel-dodd
Member

Hey @ingmarschuster, thanks for your quick reply and apologies for my slow one.

That sounds good to me. The main intention of the previous messages was to assess how this fits into the broader development plan. Of course, a minimal working implementation is the best place to start, but nowhere did you describe it as such; please briefly mention such known limitations in the PR notes in future so we can open corresponding issues.

A broader concern I did have is that the prior and conjugate-posterior objects are starting to get busy with code, perhaps a bit out of touch with didacticity. But to get the ball rolling, please feel free to merge as is; I would, however, recommend an experimental flag so we avoid frustrating users by claiming stable MOGP support while we have yet to finalise things and address its integration into the rest of the library.

During your rebase you may find it useful to use CoLA's Kronecker linear operator in place of the dense jnp.kron, to realise an actual computational benefit from your approach. :)

I will shortly open a GitHub discussion where I believe we can simplify your implementation and be agnostic to the type of MOGP while achieving full compatibility with the rest of the library. I would be happy to lead that development work and would greatly appreciate your insights, as there are certain things my own mental model could do with learning.

@ingmarschuster ingmarschuster merged commit 83fdab2 into main Sep 24, 2023
14 checks passed
@st-- st-- deleted the multi-output branch November 30, 2023 10:04