
MCMC prediction stability issue - providing NaN values of variance (Issue in Google Colab GPU/High-RAM setting) #34

arpanbiswas52 opened this issue Aug 11, 2023 · 4 comments


arpanbiswas52 commented Aug 11, 2023

This issue was encountered in Google Colab under the T4 GPU and High-RAM settings.

When we run ExactGP.fit(), the resulting model produces NaN values for the standard deviation in prediction. The error can be reproduced under all of the modifications below, which I have already tried:

  • Normalizing the data before fitting the model
  • Trying all the kernels: RBF, Matern, and Periodic
  • Trying different prior functions: LogNormal, Normal, and HalfNormal, which are the most commonly used in GP/BO anyway
  • Trying different noise priors
  • Trying different sets of training data

As a current workaround, reducing the total number of MCMC samples to num_warmup=500, num_samples=500 (the defaults are num_warmup=1000, num_samples=3000) yields reasonable outputs.
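
For anyone hitting this, here is roughly what the workaround looks like in code. This is a minimal sketch with toy data: the gpax calls follow the README-style API, and the data arrays are placeholders standing in for the real training set.

```python
import numpy as np
import gpax

# Enable double (64-bit) precision in JAX before any computation
gpax.utils.enable_x64()

# Two PRNG keys: one for HMC sampling during fit, one for prediction
rng_key_fit, rng_key_predict = gpax.utils.get_keys()

# Toy 1D data standing in for the real training set
X_train = np.linspace(0, 1, 20)
y_train = np.sin(10 * X_train) + 0.1 * np.random.randn(20)
X_test = np.linspace(0, 1, 100)

# Single-input GP with an RBF kernel
gp_model = gpax.ExactGP(1, kernel='RBF')

# Workaround: fewer MCMC samples than the defaults to avoid NaN variances
gp_model.fit(rng_key_fit, X_train, y_train, num_warmup=500, num_samples=500)

# Posterior mean and posterior samples on the test points
y_pred, y_sampled = gp_model.predict(rng_key_predict, X_test)
```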


nmenon97 commented Dec 4, 2023

I encountered the same issue without using Google Colab or a GPU.

ziatdinovmax (Owner) commented

@nmenon97 - did it happen in the 64-bit precision regime?
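
For context, the 64-bit regime is toggled in JAX like this (a minimal sketch; gpax.utils.enable_x64 is, to my understanding, the equivalent convenience helper in gpax):

```python
import jax

# Must run before any JAX arrays are created, ideally at the top of the script
jax.config.update("jax_enable_x64", True)

# The gpax helper that does the same thing (assuming the current gpax.utils API):
# import gpax
# gpax.utils.enable_x64()
```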


nmenon97 commented Dec 5, 2023

@ziatdinovmax yes! I can sometimes get it to work by changing num_warmup and num_samples as suggested above, but it isn't a stable solution.


ziatdinovmax commented Jan 3, 2024

This is due to the peculiar behavior of jax.vmap when approaching a memory limit. There are three ways to deal with this:

  1. Draw multiple smaller random batches of samples (see gpax.acquisition.qEI, .qUCB, etc.; you can specify the batch size using the subsample_size argument) and average them; this and option 2 are sketched below.
  2. Assume that the acquisition function is continuous and use gpax.acquisition.optimize_acq to optimize it with num_initial_guesses << total_number_of_points.
  3. Use a device with more memory :)
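
A rough sketch of options 1 and 2 follows. Treat it as illustrative: apart from subsample_size and num_initial_guesses, which are named above, the argument names are assumptions about the gpax acquisition API and may differ between versions.

```python
import numpy as np
import gpax

gpax.utils.enable_x64()
rng_key_fit, rng_key_acq = gpax.utils.get_keys()

# Toy data and a fitted model, as in the workaround sketch earlier in the thread
X_train = np.linspace(0, 1, 20)
y_train = np.sin(10 * X_train) + 0.1 * np.random.randn(20)
gp_model = gpax.ExactGP(1, kernel='RBF')
gp_model.fit(rng_key_fit, X_train, y_train, num_warmup=500, num_samples=500)

X_grid = np.linspace(0, 1, 2000)  # dense candidate grid (placeholder)

# Option 1: evaluate the acquisition on small random batches of posterior
# samples instead of vmap-ing over all of them at once; subsample_size is
# confirmed above, the rest of the call is assumed from the gpax API.
acq = gpax.acquisition.qEI(rng_key_acq, gp_model, X_grid, subsample_size=8)
x_next = X_grid[np.argmax(acq)]

# Option 2: optimize the (assumed continuous) acquisition function from a
# handful of starting points instead of a dense grid; keyword names other
# than num_initial_guesses are assumptions and may differ between versions.
x_next = gpax.acquisition.optimize_acq(
    rng_key_acq, gp_model, gpax.acquisition.qEI,
    num_initial_guesses=10,
    lower_bound=0.0, upper_bound=1.0,
)
```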
