Orthogonal Matching Pursuit Method

The orthogonal matching pursuit (OMP) algorithm used in this folder differs from the conventional algorithm in how the background is fitted and how the next atom (gene in this case) is selected. This document expalins how.

The function that carries out the OMP is o.get_omp_coefs. It takes SpotColors normalised by channel and fits the background eigenvectors followed by successive genes, as long as they exceed a certain threshold.

Fitting Background

The background is fitted using the function o.get_spot_residual_background. This assumes that there are o.nBP background eigenvectors and each one is just a strip in a color channel i.e. o.ompBackgroundChannelStrips=true. An example for channel 1 is shown below:

The background fitting procedure will be illustrated by fitting the above background vector to the following (channel normalised) spot color:

The basic procedure, shown below, is to find a weight factor that normalises the contribution from each round so that no one round dominates. Multiply both background code and spot color by this weight. Then the final coefficient for the background code is the dot product of the weighted spot color with the weighted background code.

The weight factor for fitting the channel B background vector in round r is:

$W_{B,r} = \frac{1}{(|s_{B,r}|+\lambda)^\sigma}$

$s_{B,r}}$ is the spot color in channel B, round r. $\lambda$ is o.ompWeightShift which is just a small number to stop $W_{B,r}$ blowing up for small $s_{B,r}}$ . $\sigma$ is o.ompWeightPower, the lower $\sigma$ , the greater the contributions of the rounds where $s_{B,r}}$ is large.

For $\lambda=0.01$ and $\sigma=0.9$ , the weight factor is shown in the top row. The third row just shows the second row multiplied by the weight factor. If $\lambda=0$ and $\sigma=1$ , then the weighted spot color shown would have an absolute value of 1 for all rounds in channel 1.

The final coefficient of the background vector for channel B is given by: $C_B = \frac{\sum_{r=1}^7W_{B,r}^2s_{B,r}g_{B,r}}{\sum_{r=1}^7W_{B,r}^2g_{B,r}^2}$

$g_{B,r}$ is the value of the background vector for channel B in round r. In the procedure shown, this is equivalent to multiplying the two weighted codes in the third row together, then dividing by the square of the weighted background. The result of this is shown as the Weighted Dot Product in the bottom row. is then found by summing this. It is clear from the formula that if $s_{B,r} = \mu g_{B,r}$ then $C_B = \mu$ for any value of $\mu$ .

The spot color found after fitting the background is shown in the middle plot below. The values of for all channels, b, are kept constant after this initial fitting which differs from the usual OMP method where they are re-fit after each subsequent gene is added. The reason for this is that sometimes an unusual background can be fit to justify further genes.

Fitting Genes

After fitting the background, we need to decide which gene to fit next. This is done by the function o.get_weight_gene_dot_product2. It finds a modified dot product for each gene, g, and then the next gene to be fit is the gene with the largest value.

For the spot color shown above, the next gene to be fit is Aldoc, with a bled code shown below.

The below plot illustrates the procedure for finding for Aldoc to be 6.098, with a particular focus on the contribution from round 7. The basic procedure is to find a dot product between the spot color and the gene code for each round and then summing these individual dot products with a weighting indicating the strength of the gene in each round.

First, for each gene, we need to find a weight for each round. This is based on the gene efficiency as shown in the top left plot above. The formula is:

$w_{gr} = \sqrt{\frac{1}{1+e^{-\alpha(\varepsilon_{g,r}-\beta)}}}$

$\varepsilon_{g,r}$ is the gene efficiency for gene g in round r. $\alpha$ is ompNormBledCodeScale, the larger this, the sharper the step function. $\beta$ is ompNormBledCodeShift, this controls where the step function is centered. For the plot shown, $\alpha=7$ and $\beta=0.5$ .

We next square and normalise this as shown in the top right plot. The formula is:

$W_{gr}^2 = \frac{nR \times w_{gr}^2}{\sum_{r=1}^{nR}w_{gr}^2}$

Here, nR = o.nRounds, and it is normalised such that $\sum_{r=1}^{nR}W_{gr}^2=nR$ .

Now we need to find the dot product for each round. For a particular round R, this is:

$DP_{g,R} = \frac{\sum_{b=1}^7s''_{b,R}g_{b,R}}{\Bigg(\sqrt{\sum_{b=1}^7{s''_{b,R}^2}}+\lambda\Bigg)^{\sigma}\sqrt{\sum_{b=1}^7{g_{b,R}^2}}}$

Here, refers to the current spot residual, so for this example it is the spot color - background but if we are on the next iteration it would be spot color - background - first gene. We include both $\lambda$ and $\sigma$ again to stop a blow up and to give slightly greater contribution to the more intense rounds. If $\lambda=0$ and $\sigma=1$ , then we are just finding the cosine of the angle between the vectors.

In the procedure shown, the second row shows and with R=7. The third row shows the spot weight which is:

$\frac{1}{\Bigg(\sqrt{\sum_{b=1}^7{s''_{b,R}^2}}+\lambda\Bigg)^{\sigma}}$

and the gene weight which is:

$\frac{1}{\sqrt{\sum_{b=1}^7{g_{b,R}^2}}}$ .

The next row shows the two preceding rows multiplied by each other i.e.

$\frac{s''_R}{\Bigg(\sqrt{\sum_{b=1}^7{s''_{b,R}^2}}+\lambda\Bigg)^{\sigma}}$ and $\frac{g_R}{\sqrt{\sum_{b=1}^7{g_{b,R}^2}}}$ .

The bottom left plot then shows these two multiplied by each other. Summing over all values in this image then gives $DP_{g,R} = 0.894$ .

The final dot product is then given by:

$DP_g = \sum_{r=1}^7W_{gr}^2DP_{gr}$

In the bottom right plot shown, NormRoundWeight is $W_{gr}^2$ , RoundDotProduct is $DP_{gr}$ and RoundContribution is $W_{gr}^2DP_{gr}$ .

Finding coefficients for selected genes

Now we know the best gene to add, we need to find a coefficiet for it. This is done in the standard OMP least squares way. If we are considering the first gene to be added then:

$C_g = \frac{\sum_{b=1}^7\sum_{r=1}^7s'_{br}g_{br}}{\sum_{b=1}^7\sum_{r=1}^7g_{br}^2}$

Where refers to the spot residual post background removal i.e. spot color - background. It is always this for every iteration as we refit the genes we have previously found when subsequent genes are added.

If we are on a later iteration with more than one gene, then the MATLAB backslash operation is used to solve:

$C_G = \bigg(G^TG\bigg)^{-1}G^Ts'$

where G is a 49 x nGenesAdded matrix where each column is the bled code of a gene that is added and $C_G = [C_{g1}, C_{g2} ...]$ .

Genes are continually added in this way until there are o.ompMaxGenes added to explain a particular spot color or:

$\Delta Residual = Residual_{i-1} - Residual_{i} < ResidualThresh$

Where:

$Residual = \sqrt{\sum_{b=1}^7\sum_{r=1}^7s''_{br}^2}$

ResidualThresh is based on the second largest value in the initial spot color. The process is more easily explained through the example spot we have been using, this has ResidualThresh = 0.036:

Here, i=1 refers to just Aldoc so:
, $Residual_{i-1} = 1.12$ and $\Delta Residual = 0.66$ . $\Delta Residual > ResidualThresh$ so we proceed and accept Aldoc.

i=2:
, $Residual_{i-1} = 0.46$ and $\Delta Residual = 0.22$ . $\Delta Residual > ResidualThresh$ so we proceed and accept Trp53i11.

i=3:
, $Residual_{i-1} = 0.24$ and $\Delta Residual = 0.03$ . $\Delta Residual < ResidualThresh$ so we reject Hapln1 and end.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

OMP Maths.md

OMP Maths.md

Orthogonal Matching Pursuit Method

Fitting Background

Fitting Genes

Finding coefficients for selected genes

Files

OMP Maths.md

Latest commit

History

OMP Maths.md

File metadata and controls

Orthogonal Matching Pursuit Method

Fitting Background

Fitting Genes

Finding coefficients for selected genes