
Allow optimization based acquisition functions #6

Open
kiudee opened this issue Jan 7, 2020 · 5 comments
Labels: enhancement (New feature or request), Priority: Low

Comments

kiudee (Owner) commented Jan 7, 2020

Computing acquisition functions only on a set of sampled points is problematic in high-dimensional spaces, where the distance of the best sampled point to the true optimum of the acquisition function will be large on average.
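
For illustration, here is a minimal sketch of what optimization-based acquisition could look like, assuming a scikit-learn-style GP with `predict(X, return_std=True)`; the multi-start L-BFGS-B scheme and all names (`optimize_acquisition`, `neg_lcb`, `kappa`) are illustrative, not part of bayes-skopt:

```python
import numpy as np
from scipy.optimize import minimize

def optimize_acquisition(gp, bounds, n_restarts=10, kappa=1.96, seed=None):
    """Minimize a lower-confidence-bound acquisition with multi-start
    L-BFGS-B instead of evaluating it on a finite candidate set only."""
    rng = np.random.default_rng(seed)
    bounds = np.asarray(bounds)  # shape (d, 2): box constraints per dimension

    def neg_lcb(x):
        mu, std = gp.predict(x.reshape(1, -1), return_std=True)
        return float(mu[0] - kappa * std[0])  # lower confidence bound

    best_x, best_val = None, np.inf
    starts = rng.uniform(bounds[:, 0], bounds[:, 1], size=(n_restarts, len(bounds)))
    for x0 in starts:
        res = minimize(neg_lcb, x0, method="L-BFGS-B", bounds=bounds)
        if res.fun < best_val:
            best_x, best_val = res.x, res.fun
    return best_x
```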

@michaelosthege

Hi @kiudee,
The package looks quite good! I'm particularly interested in Thompson Sampling; however, my problem is high-dimensional, so I need to use the approximation from Hernández-Lobato et al. (2014).
Have you already looked into it?
Cheers

kiudee (Owner, Author) commented Mar 14, 2020

Hi @michaelosthege, thank you for the reference.
One of my near-future todos is to support parallel evaluation of configurations in a way that goes beyond a simple constant-liar strategy (see the sketch below).
Thompson Sampling already seemed like the most natural acquisition function for parallelization.

From a cursory glance at the paper, it looks like it should be straightforward to implement.
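
For reference, a rough sketch of the constant-liar baseline mentioned above, assuming a scikit-learn-style GP with `fit`; `propose_point` stands in for any single-point acquisition optimizer and is hypothetical:

```python
import numpy as np

def constant_liar_batch(gp, X_train, y_train, batch_size, propose_point, lie=None):
    """Propose a batch of points by refitting the GP with a constant
    'lie' (here the current best observation) at each pending point."""
    X_aug, y_aug = list(X_train), list(y_train)
    lie = np.min(y_train) if lie is None else lie
    batch = []
    for _ in range(batch_size):
        gp.fit(np.asarray(X_aug), np.asarray(y_aug))
        x_next = propose_point(gp)   # any single-point acquisition strategy
        batch.append(x_next)
        X_aug.append(x_next)         # pretend the pending result is already in
        y_aug.append(lie)
    return batch
```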

@michaelosthege

The paper was accompanied by a MATLAB implementation, for which I also found a Python port.

Unfortunately, due to the code formatting and the absence of useful docstrings/comments, I found it hard to understand. I have trouble figuring out which values I'll have to pass and which constraints this approximation may put on my GP model.

kiudee (Owner, Author) commented Mar 19, 2020

In your case I would look at quadrature Fourier features to implement the Thompson sampling part:
https://papers.nips.cc/paper/8115-efficient-high-dimensional-bayesian-optimization-with-additivity-and-quadrature-fourier-features
I found an implementation here:
https://github.com/Mojusko/QFF

Use it in conjunction with predictive variance reduction search (PVRS):
https://bayesopt.github.io/papers/2017/13.pdf
which works similarly to PES but is faster and simpler to implement.

edit: In bayes-skopt PVRS is implemented as follows:

```python
import numpy as np
from scipy.linalg import cholesky, cho_solve

class PVRS(FullGPAcquisition):  # FullGPAcquisition: bayes-skopt's base class
    """Implements the predictive variance reduction search algorithm.

    The algorithm draws a set of Thompson samples (samples from the optimum
    distribution) and proposes the point which reduces the predictive
    variance of these samples the most.

    References
    ----------
    [1] Nguyen, Vu, et al. "Predictive variance reduction search." Workshop
        on Bayesian optimization at neural information processing systems
        (NIPSW). 2017.
    """

    def __call__(self, X, gp, *args, n_thompson=10, random_state=None, **kwargs):
        n = len(X)
        # Draw Thompson samples on the candidate set and keep their minima:
        thompson_sample = gp.sample_y(
            X, sample_mean=True, n_samples=n_thompson, random_state=random_state
        )
        thompson_points = np.array(X)[np.argmin(thompson_sample, axis=0)]
        covs = np.empty(n)
        for i in range(n):
            # Augment the training set with candidate X[i] and refactorize:
            X_train_aug = np.concatenate([gp.X_train_, [X[i]]])
            K = gp.kernel_(X_train_aug)
            if np.iterable(gp.alpha):
                K[np.diag_indices_from(K)] += np.concatenate([gp.alpha, [0.0]])
            L = cholesky(K, lower=True)
            # Variance of the Thompson points explained by the augmented set
            # (maximizing this minimizes their posterior variance):
            K_trans = gp.kernel_(thompson_points, X_train_aug)
            v = cho_solve((L, True), K_trans.T)
            cov = K_trans.dot(v)
            covs[i] = np.diag(cov).sum()
        return covs
```

Note, however, that this samples from a random set of candidate points instead of using Fourier features.
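
By way of comparison, here is a minimal sketch of how a Thompson sample could instead be drawn as a closed-form function via random Fourier features of an RBF kernel (following Rahimi & Recht; the function `rff_thompson_sample` and all parameter names are illustrative, not bayes-skopt's API):

```python
import numpy as np

def rff_thompson_sample(X_train, y_train, lengthscale, signal_var, noise_var,
                        n_features=300, seed=None):
    """Draw one approximate GP posterior sample f(x) = phi(x) @ w as a
    closed-form function, using random Fourier features of an RBF kernel."""
    rng = np.random.default_rng(seed)
    d = X_train.shape[1]
    W = rng.normal(0.0, 1.0 / lengthscale, size=(n_features, d))
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)

    def phi(X):  # feature map with k(x, x') ~= phi(x) @ phi(x')
        return np.sqrt(2.0 * signal_var / n_features) * np.cos(X @ W.T + b)

    # Bayesian linear regression on the features gives the weight posterior:
    Phi = phi(X_train)
    A = Phi.T @ Phi + noise_var * np.eye(n_features)
    w_mean = np.linalg.solve(A, Phi.T @ y_train)
    w_cov = noise_var * np.linalg.inv(A)
    w = rng.multivariate_normal(w_mean, w_cov)
    # The returned function can be optimized continuously, e.g. with L-BFGS-B:
    return lambda X: phi(np.atleast_2d(X)) @ w
```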

michaelosthege commented Apr 14, 2020

Hi @kiudee,
Thank you for those links. It took me a while to read them, as well as Bradford 2018, which has a really good description of the RFF algorithm.

My impression is that QFF makes some independence assumptions about the problem dimensions that we don't feel comfortable with for our problem. (That might change, though, if we run into problems with computational complexity 😅)

My RFF implementation still has a bug resulting in too much posterior variance, but I think it's just a **2 or numpy.log somewhere.
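
(For what it's worth, a self-contained sanity check of the RFF scaling that tends to surface exactly this kind of variance bug; it assumes an RBF kernel and the standard sqrt(2·σ²/m)·cos(Wx + b) feature map:)

```python
import numpy as np

# With w ~ N(0, I), the prior variance of f(x) = phi(x) @ w is
# phi(x) @ phi(x), which should converge to the kernel variance
# signal_var: E[cos^2] = 1/2 cancels the 2 inside the sqrt scaling.
# A stray factor or missing square shows up here as inflated variance.
rng = np.random.default_rng(0)
d, m, lengthscale, signal_var = 5, 2000, 1.0, 1.0
W = rng.normal(0.0, 1.0 / lengthscale, size=(m, d))
b = rng.uniform(0.0, 2.0 * np.pi, size=m)
phi = lambda X: np.sqrt(2.0 * signal_var / m) * np.cos(X @ W.T + b)
x = rng.normal(size=(1, d))
print(float(phi(x) @ phi(x).T))  # should be close to signal_var = 1.0
```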
