Because of issues #114 and #134 I'm looking at this code again, and I've done a quick investigation into strategies for generating zoomed-out tiles from our data.
Currently, we pick the top-left corner out of four data-points. This introduces an enormous systematic bias: half of the genes are removed at each zoom level! I think we can do better.
For comparison, here is a pixel-perfect zoomed-in view of the cortex.loom dataset:
Ideally, we want to maintain similar brightness, some sense of the noise profile, and visible structures. In practice we will need to compromise on something that does well, but not perfectly, on all three.
top-left pick (current strategy)
This happens to work decently enough on this dataset, presumably because the distribution in the data is random enough to counter the systematic bias. On other datasets the zoomed-out view is almost completely blue, despite having non-blue rows, hiding interesting spots.
Also, structures present in zoomed-in views (rows and columns that have expression levels from top to bottom) are almost completely gone when zooming out.
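For reference, the current strategy amounts to plain strided slicing. A minimal sketch, assuming tiles are 2D numpy arrays with even dimensions (the function name is mine, not the actual code):

```python
import numpy as np

def downsample_topleft(tile):
    # Keep only the top-left value of each 2x2 block,
    # discarding the other three data points entirely.
    return tile[::2, ::2]
```

Every second row and column is dropped outright, which is exactly where the systematic bias comes from.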
Average
Too smooth, and because the value distribution is not uniform it introduces a bias of its own by dragging the high values down. It does preserve structure better.
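A sketch of the plain average, under the same assumptions as before (2D numpy tile with even dimensions; hypothetical name):

```python
import numpy as np

def downsample_average(tile):
    # Average each 2x2 block into a single value.
    h, w = tile.shape
    return tile.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))
```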
Max value
Yeah... moving on...
Max value per column, average per row
Now we're getting somewhere! While still biased too much toward the maximum values, resulting in higher values every time we zoom out, this maintains the structure visible when looking at the zoomed-in tiles.
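This mixed reduction can be sketched with a reshape and two axis-wise reductions. I'm assuming rows are the first axis of the tile, so "max per column" means taking the max along each vertical pair (again, the name is mine):

```python
import numpy as np

def downsample_max_col_avg_row(tile):
    # Within each 2x2 block: take the max along the vertical pair
    # (per column), then average the two resulting values (per row).
    h, w = tile.shape
    blocks = tile.reshape(h // 2, 2, w // 2, 2)
    return blocks.max(axis=1).mean(axis=2)
```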
Max-biased weighted average per column, average per row
We take the weighted average per column, biasing the max:min value 3:1. Then we take the plain average between rows.
While brightness still slowly increases as we zoom out (this might be tweakable with a different weight, but it also depends on the underlying values so I don't think there is a "generic" way of doing this), it is not that pronounced, and it maintains the aforementioned benefits.
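Under the same assumptions as the previous sketches (2D numpy tile, even dimensions, rows as the first axis), the max-biased variant might look like this, with the 3:1 weight exposed as a parameter for the tweaking mentioned above:

```python
import numpy as np

def downsample_weighted(tile, max_weight=3.0):
    # Per column of each 2x2 block, take a weighted average of the
    # larger and smaller value, biased max:min = 3:1 by default.
    # Then take the plain average between the two columns.
    h, w = tile.shape
    blocks = tile.reshape(h // 2, 2, w // 2, 2).astype(float)
    col_max = blocks.max(axis=1)
    col_min = blocks.min(axis=1)
    merged = (max_weight * col_max + col_min) / (max_weight + 1.0)
    return merged.mean(axis=2)
```

With `max_weight=1.0` this degenerates to the plain average, and as the weight grows it approaches the max-per-column strategy, so the weight directly trades brightness drift against structure preservation.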
I think the last strategy is a good replacement for our current one. Also, we're using numpy methods, so this does not create a significant slowdown.
settle on new strategy for merging the data
re-tile all files on the server
- use numpy methods to calculate min/max faster
- improve CLI feedback for user when calculating min/max
- change from "pick top-left datapoint" to
"max-biased weighted average". Preserves structure
a LOT better when zooming out.
See issue #135 on github for more details
(also, the private server is a few versions behind in terms of the loom-viewer. @pl-ki, can you show me tomorrow how it was set up and how I can update it?)