Include Adrien's modifications
rlouf committed Dec 12, 2022
1 parent ac0fc42 commit 4004742
Showing 1 changed file with 18 additions and 21 deletions: examples/GP_Marginal.md
In this section we give a brief overview of the idea behind this particular sampler.

### Motivation: Auxiliary Metropolis-Hastings samplers

Let us recall how to sample from a target density $\pi(\mathbf{x})$ using a Metropolis-Hastings sampler through a *marginal scheme process*. The main idea is to have a mechanism that generates proposals $\mathbf{y}$, which we then accept or reject according to a specific criterion. Concretely, suppose that we have an *auxiliary* scheme given by

1. Sample $\mathbf{u}|\mathbf{x} \sim \pi(\mathbf{u}|\mathbf{x}) = q(\mathbf{u}|\mathbf{x})$.
2. Generate the proposal $\mathbf{y}|\mathbf{u}, \mathbf{x} \sim q(\mathbf{y}|\mathbf{x}, \mathbf{u})$.
3. Compute the Metropolis-Hastings ratio

$$
\tilde{\varrho} = \frac{\pi(\mathbf{y}|\mathbf{u})q(\mathbf{x}|\mathbf{y}, \mathbf{u})}{\pi(\mathbf{x}|\mathbf{u})q(\mathbf{y}|\mathbf{x}, \mathbf{u})}
$$

4. Accept the proposal $\mathbf{y}$ with probability $\min(1, \tilde{\varrho})$ and reject it otherwise.

This scheme targets the auxiliary distribution $\pi(\mathbf{x}, \mathbf{u}) = \pi(\mathbf{x}) q(\mathbf{u}|\mathbf{x})$ in two steps, a strategy known as *Hastings-within-Gibbs*.
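
To make this concrete, here is a minimal sketch of one transition of the auxiliary scheme in JAX. The callables `log_pi_cond`, `sample_q_u`, `sample_q_y` and `log_q_y` are placeholders for the densities named above that the user must supply; none of this is `blackjax` API.

```python
import jax
import jax.numpy as jnp


def auxiliary_mh_step(rng_key, x, log_pi_cond, sample_q_u, sample_q_y, log_q_y):
    """One transition of the auxiliary scheme.

    log_pi_cond(x, u) returns log pi(x | u); sample_q_u and sample_q_y draw
    from q(u | x) and q(y | x, u); log_q_y(y, x, u) returns log q(y | x, u).
    """
    key_u, key_y, key_acc = jax.random.split(rng_key, 3)
    u = sample_q_u(key_u, x)     # 1. u ~ q(u | x)
    y = sample_q_y(key_y, x, u)  # 2. y ~ q(y | x, u)
    log_ratio = (                # 3. log of the ratio rho-tilde
        log_pi_cond(y, u) + log_q_y(x, y, u)
        - log_pi_cond(x, u) - log_q_y(y, x, u)
    )
    accept = jnp.log(jax.random.uniform(key_acc)) < log_ratio  # 4. accept/reject
    return jnp.where(accept, y, x)
```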

Now, suppose that we can instead compute the *marginal* proposal distribution $q(\mathbf{y}|\mathbf{x}) = \int q(\mathbf{y}|\mathbf{x}, \mathbf{u}) q(\mathbf{u}|\mathbf{x}) \mathrm{d}\mathbf{u}$ in closed form. An alternative scheme is then given by:

1. We draw a proposal $\mathbf{y} \sim q(\mathbf{y}\mid\mathbf{x})$.
2. Then we compute the Metropolis-Hastings ratio

$$
\varrho = \frac{\pi(\mathbf{y})q(\mathbf{x}|\mathbf{y})}{\pi(\mathbf{x})q(\mathbf{y}|\mathbf{x})}
$$

3. Accept the proposal $\mathbf{y}$ with probability $\min(1, \varrho)$ and reject it otherwise.
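
Under the same placeholder conventions, the marginal scheme needs only the closed-form $q(\mathbf{y}|\mathbf{x})$, so the auxiliary variable disappears from the accept/reject step. A minimal sketch:

```python
import jax
import jax.numpy as jnp


def marginal_mh_step(rng_key, x, log_pi, sample_q, log_q):
    """One marginal Metropolis-Hastings transition.

    log_pi(x) returns log pi(x); sample_q draws from q(y | x);
    log_q(y, x) returns log q(y | x).
    """
    key_y, key_acc = jax.random.split(rng_key)
    y = sample_q(key_y, x)  # 1. y ~ q(y | x)
    log_ratio = log_pi(y) + log_q(x, y) - log_pi(x) - log_q(y, x)  # 2. rho
    accept = jnp.log(jax.random.uniform(key_acc)) < log_ratio      # 3. accept/reject
    return jnp.where(accept, y, x)
```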

### Example: Auxiliary Metropolis-Adjusted Langevin Algorithm (MALA)

Let's consider the case of an auxiliary random walk proposal $q(\mathbf{u}|\mathbf{x}) = N(\mathbf{u}|\mathbf{x}, (\delta /2) \mathbf{I})$ for $\delta > 0$. In [Section 2.2 of Auxiliary gradient-based sampling algorithms](https://rss.onlinelibrary.wiley.com/doi/abs/10.1111/rssb.12269) it is shown that one can use a first-order approximation to sample from the (intractable) $\pi(\mathbf{x}|\mathbf{u})$ density by choosing

$$
q(\mathbf{y}|\mathbf{u}, \mathbf{x}) \propto N(\mathbf{y}|\mathbf{u} + (\delta/2)\nabla \log \pi(\mathbf{x}), (\delta/2) \mathbf{I}).
$$

The resulting marginal sampler can be shown to correspond to the Metropolis-adjusted Langevin algorithm (MALA) with


$$
q(\mathbf{y}| \mathbf{x}) = N(\mathbf{y}|\mathbf{x} + (\delta/2)\nabla \log \pi(\mathbf{x}), \delta \mathbf{I}).
$$
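
As a sanity check, this update can be written directly from the two formulas above. The sketch below is only illustrative (`blackjax` ships its own MALA kernel), and the standard Gaussian target in the usage line is an arbitrary stand-in for $\pi$:

```python
import jax
import jax.numpy as jnp


def mala_step(rng_key, x, logdensity_fn, delta):
    """One MALA transition, i.e. the marginal scheme with the Gaussian q above."""
    grad_fn = jax.grad(logdensity_fn)

    def drift(z):
        # Proposal mean: z + (delta / 2) * grad log pi(z).
        return z + 0.5 * delta * grad_fn(z)

    def log_q(y, z):
        # log N(y | drift(z), delta * I), up to a constant that cancels in rho.
        return -jnp.sum((y - drift(z)) ** 2) / (2.0 * delta)

    key_prop, key_acc = jax.random.split(rng_key)
    y = drift(x) + jnp.sqrt(delta) * jax.random.normal(key_prop, x.shape)
    log_ratio = logdensity_fn(y) + log_q(x, y) - logdensity_fn(x) - log_q(y, x)
    accept = jnp.log(jax.random.uniform(key_acc)) < log_ratio
    return jnp.where(accept, y, x)


# Example: one step on a standard Gaussian target.
x_new = mala_step(jax.random.PRNGKey(0), jnp.zeros(2),
                  lambda z: -0.5 * jnp.sum(z**2), delta=0.5)
```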

### Latent Gaussian Models
In latent Gaussian models, the target density takes the form

$$
\pi(\mathbf{x}) \propto \overbrace{\exp\{f(\mathbf{x})\}}^{\text{likelihood}} \underbrace{N(\mathbf{x}|\mathbf{0}, \mathbf{C})}_{\text{Gaussian Prior}}
$$

In this case, instead of linearising the full log density $\log \pi(\mathbf{x})$, we can linearise $f$ only. Combined with a random walk proposal $N(\mathbf{u}|\mathbf{x}, (\delta /2) \mathbf{I})$, this yields the following auxiliary proposal:

$$
q(\mathbf{y}|\mathbf{x}, \mathbf{u}) \propto N\left(\mathbf{y}|\frac{2}{\delta} \mathbf{A}\left(\mathbf{u} + \frac{\delta}{2}\nabla f(\mathbf{x})\right), \mathbf{A}\right),
$$

where $\mathbf{A} = \left(\mathbf{C}^{-1} + \frac{2}{\delta}\mathbf{I}\right)^{-1}$.

Sampling from $\pi(\mathbf{x}, \mathbf{u})$ (and therefore from $\pi(\mathbf{x})$) is done via Hastings-within-Gibbs as above.

A crucial point of this algorithm is that $\mathbf{A}$ can be precomputed once and then updated cheaply when $\delta$ varies. This makes it possible to calibrate the step size $\delta$ at low cost.
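
To see why, here is a minimal sketch that assumes the expression for $\mathbf{A}$ given above: one eigendecomposition of $\mathbf{C}$ is paid once, after which each new value of $\delta$ only rescales the eigenvalues. The helper `make_A` and the toy covariance are illustrative, not part of `blackjax`:

```python
import jax.numpy as jnp

# Toy prior covariance; in practice C comes from the Gaussian prior N(0, C).
C = jnp.array([[1.0, 0.3],
               [0.3, 2.0]])

# One-off O(n^3) cost: eigendecomposition of the fixed prior covariance.
eigvals, eigvecs = jnp.linalg.eigh(C)


def make_A(delta):
    # A = (C^{-1} + (2/delta) I)^{-1} = Q diag(lam * delta / (delta + 2 lam)) Q^T,
    # so updating delta costs O(n^2): no new matrix inversion is needed.
    scaled = eigvals * delta / (delta + 2.0 * eigvals)
    return (eigvecs * scaled) @ eigvecs.T
```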

---

Now that we have a high-level understanding of the algorithm, let's see how to use it in `blackjax`.
