Define the origin w.r.t. the pixel corner or center #89
Comments
Yes, agreed.
For reference, here is how ITK defines image geometry (origin, spacing, directions): https://itk.org/ITKSoftwareGuide/html/Book1/ITKSoftwareGuide-Book1ch4.html#x45-540004.1.4
Thanks for providing the link @lassoan! I read this now, and in light of yesterday's discussion I think that we should not add it to 0.4, but instead tackle this in 0.5, given that @bogovicj will also make the definition of spaces and data-spaces much more explicit than it currently is. With that, the origin definition will make much more sense. If anyone thinks we need a sentence on it right now, please go ahead and propose changes in a PR; I am not opposed to adding it for v0.4 (or to making a v0.4.1 with just that change), but I can't find a good place to put this right now without the more explicit space definition.
Besides, there's one center but 4, 8, or more corners... and one would have to specify which corner.
The corner would surely be the start (i.e. top left) along each dimension, so I don't think there is an issue there. Though I suppose you could in theory allow a choice of center vs "corner" (i.e. boundary) independently for each dimension. I have a vague intuition that if the data will be displayed using interpolation then center is a more natural choice, while if the data will be displayed without interpolation (i.e. "pixelated") then corner may be a more natural choice.

For example: suppose we have a 1-d array of size 10 at a resolution of 4nm. If we say integer coordinates are at the corner, then our data corresponds to the continuous physical range [0nm, 40nm], assuming we don't apply any translation. If we instead had an array of size 5 at a resolution of 8nm, then our data would still correspond to the continuous physical range [0nm, 40nm]. If we say integer coordinates are at the center, then our data corresponds to the continuous physical range [-2nm, 38nm] if we don't apply any translation. If we instead had an array of size 5 at a resolution of 8nm, then we would instead get by default a continuous physical range of [-4nm, 36nm]. On the other hand, if we are always interpolating, and will exclude the outer half of the begin and end pixels, then our array of 10 4nm pixels would correspond to the physical range [0nm, 36nm] and our array of 5 8nm pixels would correspond to the physical range [0nm, 32nm], which is maybe a bit more intuitive.

When I made the choice in Neuroglancer to use corner rather than center, it was more a natural result of how the transforms were implemented than a conscious decision, and changing it later was not an option for backwards compatibility reasons, so I'm curious what others think about this.
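Since the arithmetic in that example is easy to trip over, here is a minimal Python sketch of the two conventions (the `physical_range` helper is purely illustrative, not part of any spec or tool):

```python
# Purely illustrative: physical extent covered by n samples at a given
# spacing, with no translation applied.
def physical_range(n, scale, origin="corner"):
    # "corner": integer index i marks the start of its sample,
    #           so the data covers [0, n * scale].
    # "center": integer index i marks the middle of its sample,
    #           so the data covers [-scale / 2, (n - 0.5) * scale].
    if origin == "corner":
        return (0.0, n * scale)
    return (-scale / 2, (n - 0.5) * scale)

print(physical_range(10, 4))             # (0.0, 40.0)  -> [0 nm, 40 nm]
print(physical_range(5, 8))              # (0.0, 40.0)  -> [0 nm, 40 nm]
print(physical_range(10, 4, "center"))   # (-2.0, 38.0) -> [-2 nm, 38 nm]
print(physical_range(5, 8, "center"))    # (-4.0, 36.0) -> [-4 nm, 36 nm]
```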
Top left does correspond to how most visualization tools orient their rendering. However, bottom left is how most processing tools orient themselves. By using the center, we avoid these issues. We want to support processing and rendering in these non-"pixelated" methods, e.g. @tpietzsch demo'ed a splat-type rendering with BigDataViewer. Pixelated, interpolated-pixelated, and non-pixelated rendering, as well as processing, can all be supported when the transform applies to the pixel center. Considering all use cases, the complexities and dependencies on the size of a pixel and on which corner is taken to be the start are removed.
@thewtex I'm a bit confused when you say "bottom left". Certainly it is fairly common for the origin of a coordinate space to correspond to the "bottom left" corner of the entire image/screen, i.e. the x axis goes left to right, and the y axis goes bottom to top. That is how OpenGL window coordinates are defined, for example. However, I think here we are talking about the origin within an individual pixel/voxel, not of the entire coordinate space. Let's say we have a zarr array of shape (4, 3). The point labeled O is contained within pixel (0, 0), while the point labeled Z is contained within pixel (3, 2).
In terms of the continuous coordinate space, I would say reasonable choices of the (0, 0) origin are A and O. I think the choice of pixel origin is independent of the choice of which screen direction should by default correspond to each dimension of the coordinate space; that could perhaps be indicated by separate metadata. For example, if the diagram were flipped vertically, but we are still assuming that point O is contained in pixel (0, 0) of the zarr array:
Then I would still say the reasonable choices of the (0, 0) origin in the continuous coordinate space are A and O.
In radiology (and in the 3D medical imaging software libraries and applications I know), the debates around pixel corner/center took place in the early 2000s and the community standardized on pixel center. The decision was not contested later and in general everyone is happy with it. An image is a continuous signal that can be reconstructed flawlessly from the discrete samples stored in the voxel values (as long as the Nyquist sampling criterion was respected). Therefore pixelated display is unnecessary, and it can also simply be considered incorrect, because we know that the original signal can be reconstructed using a low-pass filter, yet we construct a signal using a zero-order hold. Some people switch to pixelated display because they want to see the voxel boundaries, but this goal can be achieved much better by overlaying a grid (the grid always shows voxel boundaries clearly, regardless of brightness/contrast settings and the intensity difference between neighboring voxels).
I can tell you about a similar issue that we have in 3D Slicer. At the time when the application was designed, image axis orientation in radiology software was still not standardized. Slicer chose RAS, while over the years the rest of the radiology world ended up using LPS. Slicer kept using RAS for backward compatibility reasons. As time went on, we encountered more and more issues due to this inconsistency, but as more and more features were added, more data was generated, and the community grew larger, switching to the standard convention became such a complex and large task that it remained out of our reach. With several years of careful work we managed to switch to standard LPS in all the files, but internally we still use RAS. This is a source of lots of complications, potential errors, and user confusion. It would have been much better to switch many years ago, when everything was still smaller and simpler. This is of course just one example, but it illustrates how you can get into a tough spot if your software diverges from common conventions in your field. The change to using pixel center when representing 3D images with a 3D array may look hard and/or unnecessary, but in the long term it will only get harder, and it may turn out that the change is practically unavoidable (because you end up spending too much time debugging and fixing half-voxel offset errors, and explaining to users and developers why they need to add half-voxel offsets here and there when they use or write plugins for your software).
Here's a blurb I wrote re this definition that I hope will be in the next version:
What's the status of this discussion? Is it waiting on someone to open a pull request with a concrete proposal?
Regarding the status of the discussion, going by emoji and vibes my read is that most participants are in the "pixel center" camp, but that doesn't mean everyone agrees.

I think the "pixel center or pixel corner" question is basically about how to draw little squares on a screen; this is an important concern, but I think it's out of scope for a file format specification for data sampled in physical coordinates. Zarr arrays don't contain pixels, they just contain values. We use metadata to assign those values to positions in space, i.e. we assign each value to a point (not a square / cube / hypercube). The little squares only enter the story when someone decides to display those values by resampling onto a display coordinate space after using nearest neighbor interpolation, which is just one of several possible interpolation schemes.

As for the future of the spec, I believe John will cover these points better than I can in his forthcoming transformations RFC.
I agree that the file format does not need to talk about pixels, as the data array just stores signal samples at physical locations. It is up to the visualization software to specify the display coordinate system and render pixels. However, based on the discussion in this thread, it may be worth adding a note to the file format specification that explains the difference between physical and display coordinate systems.
The pixel origin matters for mapping between the discrete coordinate space of a zarr array and any continuous coordinate space. In particular, when defining a multiscale, the relative transformations that need to be specified depend on the pixel origin. |
Also, pixel origin at the center is in any case the de facto standard, since that is what existing ome-zarr software and multiscale datasets use. That should be described in the spec.
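As a sketch of how I read that de facto convention (illustrative only, not quoted from the spec), each array index maps to a point via the scale and translation, with no half-pixel offset:

```python
# Illustrative only: under the pixel-center convention a sample's value is
# attached to this point, not to the corner of a little square.
def index_to_physical(index, scale, translation=0.0):
    return scale * index + translation

# e.g. a 1-d level with scale 0.2 (metres) and no translation:
print([index_to_physical(i, 0.2) for i in range(4)])
# ~[0.0, 0.2, 0.4, 0.6] (up to floating point)
```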
To get my head around this I did some ASCII art with a toy example. If the multiscales axes unit is meters, and the coordinate transform scale for this level is 0.2, then an array with four items maps to a physical coordinate system like this:
The transform is:

```json
[
    {
        "type": "scale",
        "scale": [0.2]
    }
]
```

This is all the spec is for, the transform between array indices and a continuous space. No pixels here!

I think there are two common issues with how an array like this is visualised, and how lower resolution multiscales data are calculated from it. These two issues cancel each other out so they don't look like issues, but in reality they are.

Visualisation

For the above array, it is often visualised (I believe this is how neuroglancer currently works?) as:
But this doesn't seem like a very sensible way to visualise - a physical coordinate such as 0.15 m is closer to the neighbouring array value than to the one it is drawn with. As such, it makes more sense to use nearest neighbour interpolation:
Creating downsampled data

To create a lower-resolution level of the multiscale pyramid, we can bin this data by a factor of two. That means take every two values and average them. Where should these new values lie in the physical coordinate space? It seems sensible to put them halfway between the two values we averaged:
The correct transform for this level is then:

```json
[
    {
        "type": "scale",
        "scale": [0.4]
    },
    {
        "type": "translation",
        "translation": [0.1]
    }
]
```

But the translation is easy to miss (certainly, software I have written currently misses the translation and I only noticed when writing this comment 😱). Without the translation we would have:
If you display this array using the first option in Visualisation above, the pixels overlap with the pixels they have been binned from, so two wrongs have conspired to make something that looks right! I am guessing this is the thinking the radiology community went through, as mentioned in #89 (comment). I used to work in astronomy, where the same issue was also present in the past, until we did a drive to educate folks on how array values mapped to pixels (i.e. your array value is defined at the centre of a pixel, not the corner). Sorry for the long post, but I hope it explains stuff for others as well as for me. The conclusions I'm getting from this and the conversation above are:
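As a quick check of the arithmetic in the comment above (a throwaway sketch, not spec or library code), the binned positions and the scale/translation pair can be recomputed like this:

```python
import numpy as np

scale0 = 0.2
pos0 = scale0 * np.arange(4)          # sample centers: 0.0, 0.2, 0.4, 0.6 m

# Bin by 2: each new value is the mean of two neighbours, so its natural
# position is halfway between the two samples it came from.
pos1_expected = pos0.reshape(-1, 2).mean(axis=1)   # 0.1, 0.5 m

scale1 = 2 * scale0                   # 0.4
translation1 = scale0 / 2             # 0.1 (half the *finer* level's scale)
pos1 = scale1 * np.arange(2) + translation1

assert np.allclose(pos1, pos1_expected)
```

More generally, binning by a factor f in this way needs a translation of (f - 1) * scale0 / 2 under the pixel-center convention, assuming the finer level itself has no translation.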
I think the sign is wrong here -- if your original coordinates were
Thanks for catching that, I think it's fixed now!
The image specification (`multiscales`) must define the origin, either w.r.t. pixel center or corner. @lassoan summarized this well in #85 (comment):

So I think we should use the pixel center as origin here as well. Does anyone want to make a small PR to add this, @jbms @d-v-b @lassoan @thewtex? Otherwise I can give this a shot once #85 is merged.