Because of issues #114 and #134 I'm looking at this code again, and I've done a quick investigation into strategies for generating zoomed-out tiles from our data.
Currently, we pick the top-left corner out of four data-points. This introduces an enormous systematic bias: half of the genes are removed at each zoom level! I think we can do better.
For comparison, here is a pixel-perfect zoomed-in view of the cortex.loom dataset:
Ideally, we want to maintain similar brightness, some sense of the noise profile, and visible structures. In practice we will need to compromise on something that does well, but not perfectly, on all three.
top-left pick (current strategy)
This happens to work decently enough on this dataset, presumably because the distribution in the data is random enough to counter the systematic bias. On other datasets the zoomed-out view is almost completely blue, despite having non-blue rows, hiding interesting spots.
Also, structures present in zoomed-in views (rows and columns that have expression levels from top to bottom) are almost completely gone when zooming out.
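For reference, the current strategy amounts to plain strided slicing. A minimal sketch, assuming tiles are 2D numpy arrays with even dimensions (the function name is mine, not the actual code):

```python
import numpy as np

def downsample_topleft(tile):
    # Keep only the top-left value of each 2x2 block,
    # discarding the other three data points entirely.
    return tile[::2, ::2]
```

Every second row and column is dropped outright, which is exactly where the systematic bias comes from.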
Average
Too smooth, and because the value distribution is not uniform it introduces a bias of its own by dragging the high values down. It does preserve structure better.
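A sketch of the plain average, under the same assumptions as before (2D numpy tile with even dimensions; hypothetical name):

```python
import numpy as np

def downsample_average(tile):
    # Average each 2x2 block into a single value.
    h, w = tile.shape
    return tile.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))
```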
Max value
Yeah... moving on...
Max value per column, average per row
Now we're getting somewhere! While still biased too much toward the maximum values, resulting in higher values every time we zoom out, this maintains the structure visible when looking at the zoomed-in tiles.
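This mixed reduction can be sketched with a reshape and two axis-wise reductions. I'm assuming rows are the first axis of the tile, so "max per column" means taking the max along each vertical pair (again, the name is mine):

```python
import numpy as np

def downsample_max_col_avg_row(tile):
    # Within each 2x2 block: take the max along the vertical pair
    # (per column), then average the two resulting values (per row).
    h, w = tile.shape
    blocks = tile.reshape(h // 2, 2, w // 2, 2)
    return blocks.max(axis=1).mean(axis=2)
```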
Max-biased weighted average per column, average per row
We take the weighted average per column, biasing the max:min value 3:1. Then we take the plain average between rows.
While brightness still slowly increases as we zoom out (this might be tweakable with a different weight, but it also depends on the underlying values so I don't think there is a "generic" way of doing this), it is not that pronounced, and it maintains the aforementioned benefits.
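Under the same assumptions as the previous sketches (2D numpy tile, even dimensions, rows as the first axis), the max-biased variant might look like this, with the 3:1 weight exposed as a parameter for the tweaking mentioned above:

```python
import numpy as np

def downsample_weighted(tile, max_weight=3.0):
    # Per column of each 2x2 block, take a weighted average of the
    # larger and smaller value, biased max:min = 3:1 by default.
    # Then take the plain average between the two columns.
    h, w = tile.shape
    blocks = tile.reshape(h // 2, 2, w // 2, 2).astype(float)
    col_max = blocks.max(axis=1)
    col_min = blocks.min(axis=1)
    merged = (max_weight * col_max + col_min) / (max_weight + 1.0)
    return merged.mean(axis=2)
```

With `max_weight=1.0` this degenerates to the plain average, and as the weight grows it approaches the max-per-column strategy, so the weight directly trades brightness drift against structure preservation.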
I think the last strategy is a good replacement for our current one. Also, we're using numpy methods, so this does not create a significant slowdown.
settle on new strategy for merging the data
re-tile all files on the server
- use numpy methods to calculate min/max faster
- improve CLI feedback for user when calculating min/max
- change from "pick top-left datapoint" to
"max-biased weighted average". Preserves structure
a LOT better when zooming out.
See issue #135 on github for more details
(also, the private server is a few versions behind in terms of the loom-viewer. @pl-ki, can you show me tomorrow how it was set up and how I can update it?)