
Define the origin w.r.t. the pixel corner or center #89

Open
constantinpape opened this issue Feb 1, 2022 · 18 comments


@constantinpape
Contributor

The image specification (multiscales) must define the origin, either w.r.t. pixel center or corner.

@lassoan summarized this well in #85 (comment):

In 2D applications, using one of the pixel corners as origin is probably more common, but in 3D imaging the voxel coordinate system origin is typically the center. In VTK, ITK, and all applications based on these toolkits the origin is in the pixel center. In all 3D image file formats that I know of (nrrd, nifti, metaio, dicom) the origin is in the pixel center, too.
The NGFF standard must specify this (if it hasn't been specified already).
If any software uses a different coordinate system convention internally, it can convert to/from the standard coordinate system when reading/writing images.

So I think we should use the pixel center as origin here as well. Anyone wants to make a small PR to add this, @jbms @d-v-b @lassoan @thewtex? Otherwise I can give this a shot once #85 is merged.

@constantinpape constantinpape mentioned this issue Feb 1, 2022
@bogovicj
Contributor

bogovicj commented Feb 1, 2022

So I think we should use the pixel center as origin here as well.

Yes, agreed.

@lassoan

lassoan commented Feb 1, 2022

For reference, here is how ITK defines image geometry (origin, spacing, directions): https://itk.org/ITKSoftwareGuide/html/Book1/ITKSoftwareGuide-Book1ch4.html#x45-540004.1.4

@constantinpape
Contributor Author

Thanks for providing the link @lassoan! I read it now, and in light of yesterday's discussion I think we should not add this to 0.4 but instead tackle it in 0.5, given that @bogovicj will also make the definition of spaces and data-spaces much more explicit than it currently is. With that in place, the origin definition will make much more sense.

If anyone thinks we need a sentence on it right now, please go ahead and propose changes in a PR; I am not opposed to adding it for v0.4 (or to make a v0.4.1 with just that change), but I can't find a good place to put this in right now without the more explicit space definition.

@xulman

xulman commented Feb 2, 2022

So I think we should use the pixel center as origin here as well.

Yes, agreed.

Besides, there is only one center, while there are 4, 8, ... corners, and one would have to specify which corner is meant.

@jbms

jbms commented Feb 2, 2022

The corner would surely be the start (i.e. top left) along each dimension, so I don't think there is an issue there.

Though I suppose you could in theory allow a choice of center vs "corner" (i.e. boundary) independently for each dimension.

I have a vague intuition that if the data will be displayed using interpolation then center is a more natural choice, while if the data will be displayed without interpolation (i.e. "pixelated") then corner may be a more natural choice.

For example:

Suppose we have a 1-d array of size 10 at a resolution of 4nm.

If we say integer coordinates are at the corner, then our data corresponds to the continuous physical range [0nm, 40nm] assuming we don't apply any translation.

If we instead had an array of size 5 at a resolution of 8nm, then our data would still correspond to the continuous physical range [0nm, 40nm].

If we say integer coordinates are at the center, then our data corresponds to the continuous physical range [-2nm, 38nm] if we don't apply any translation. If we instead had an array of size 5 at a resolution of 8nm, then we would instead get by default a continuous physical range [-4nm, 36nm].

On the other hand, if we are always interpolating, and will exclude the outer half of the begin and end pixels, then our array of 10 4nm pixels would correspond to the physical range [0, 36nm] and our array of 5 8nm pixels would correspond to the physical range [0, 32nm] which is maybe a bit more intuitive.
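The ranges above can be checked with a short sketch (plain Python; the function name and `convention` labels are illustrative, not from any spec):

```python
def physical_range(n, res, convention):
    """Continuous physical extent covered by n samples at resolution
    `res`, with no translation applied, under the conventions
    discussed in this thread (names here are illustrative).
    """
    if convention == "corner":
        # sample i occupies [i*res, (i+1)*res)
        return (0.0, n * res)
    if convention == "center":
        # sample i is centered at i*res, covering [(i-0.5)*res, (i+0.5)*res)
        return (-0.5 * res, (n - 0.5) * res)
    if convention == "center-interpolated":
        # interpolation is only defined between the first and last centers
        return (0.0, (n - 1) * res)
    raise ValueError(convention)

# 10 samples at 4nm and 5 samples at 8nm, matching the example above:
assert physical_range(10, 4, "corner") == (0.0, 40)
assert physical_range(5, 8, "corner") == (0.0, 40)
assert physical_range(10, 4, "center") == (-2.0, 38.0)
assert physical_range(5, 8, "center") == (-4.0, 36.0)
assert physical_range(10, 4, "center-interpolated") == (0.0, 36)
assert physical_range(5, 8, "center-interpolated") == (0.0, 32)
```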

When I made the choice in Neuroglancer to use corner rather than center it was more a natural result of how the transforms were implemented rather than a conscious decision, and then changing it later was not an option for backwards compatibility reasons, so I'm curious what others think about this.

@thewtex
Contributor

thewtex commented Feb 3, 2022

Top left does correspond to how most visualization tools orient their rendering. However, bottom left is how most processing tools orient themselves. By using the center, we avoid these issues.

We want to support processing and rendering with these non-"pixelated" methods, e.g. @tpietzsch demoed a splat-type rendering with BigDataViewer. Pixelated, interpolated-pixelated, and non-pixelated rendering and processing can all be supported when the transform applies to the pixel center. Across all of these use cases, the complexities and dependencies on the size of a pixel, and on which corner is taken to be the start, are removed.

@jbms

jbms commented Feb 3, 2022

@thewtex I'm a bit confused when you say "bottom left". Certainly it is fairly common for the origin of a coordinate space to correspond to the "bottom left" corner of the entire image/screen, i.e. the x axis goes left to right, and the y axis goes bottom to top. That is how OpenGL window coordinates are defined, for example.

However, I think here we are talking about the origin within an individual pixel/voxel, not of the entire coordinate space.

Let's say we have a zarr array of shape (4, 3). The point labeled O is contained within pixel (0, 0), while the point labeled Z is contained within pixel (3, 2).

A      B     C     D      E
 +-----+-----+-----+-----+
 |     |     |     |     |
 |  O  |  P  |  Q  |  R  |
 |     |     |     |     |
N+-----+-----+-----+-----+F
 |     |     |     |     |
 |  S  |  T  |  U  |  V  |
 |     |     |     |     |
M+-----+-----+-----+-----+G
 |     |     |     |     |
 |  W  |  X  |  Y  |  Z  |
 |     |     |     |     |
 +-----+-----+-----+-----+
L      K     J     I      H

In terms of the continuous coordinate space, I would say reasonable choices of the (0, 0) origin are A and O.

I think the choice of pixel origin is independent of the choice of which screen direction should by default correspond to each dimension of the coordinate space; that could perhaps be indicated by separate metadata. For example, if the diagram were flipped vertically, but we are still assuming that point O is contained in pixel (0, 0) of the zarr array:

L      K     J     I      H
 +-----+-----+-----+-----+
 |     |     |     |     |
 |  W  |  X  |  Y  |  Z  |
 |     |     |     |     |
M+-----+-----+-----+-----+G
 |     |     |     |     |
 |  S  |  T  |  U  |  V  |
 |     |     |     |     |
N+-----+-----+-----+-----+F
 |     |     |     |     |
 |  O  |  P  |  Q  |  R  |
 |     |     |     |     |
 +-----+-----+-----+-----+
A      B     C     D      E

Then I would still say the reasonable choices of the (0, 0) origin in the continuous coordinate space are A and O.

@lassoan

lassoan commented Feb 3, 2022

In radiology (and in 3D medical imaging software libraries and applications I know), the debates around pixel corner/center took place in the early 2000s and the community standardized on pixel center. The decision was not contested later and in general everyone is happy with it.

An image is a continuous signal that can be reconstructed flawlessly from the discrete samples stored in the voxel values (as long as the Nyquist sampling criterion was respected). Therefore pixelated display is unnecessary, and it can also simply be considered incorrect, because we know that the original signal can be reconstructed using a low-pass filter, yet we construct a signal using a zero-order hold. Some people switch to pixelated display because they want to see the voxel boundaries, but this goal can be achieved much better by overlaying a grid (the grid always shows voxel boundaries clearly, regardless of brightness/contrast settings and the intensity difference between neighboring voxels).

then changing it later was not an option for backwards compatibility reasons, so I'm curious what others think about this.

I can describe a similar issue that we have in 3D Slicer. At the time the application was designed, image axis orientation in radiology software was not yet standardized. Slicer chose RAS, while over the years the rest of the radiology world ended up using LPS. Slicer kept using RAS for backward-compatibility reasons. As time went on, we encountered more and more issues due to this inconsistency, but as more features were added, more data was generated, and the community grew larger, switching to the standard convention became such a complex and large task that it remained out of our reach. With several years of careful work we managed to switch to standard LPS in all the files, but internally we still use RAS. This is a source of many complications, potential errors, and user confusion. It would have been much better to switch many years ago, when everything was still smaller and simpler.

This is of course just one example, but it illustrates how you can get into a tough spot if your software diverges from common conventions in your field. The change to using the pixel center when representing 3D images with a 3D array may look hard and/or unnecessary, but in the long term it will only get harder, and it may turn out that the change is practically unavoidable (because you end up spending too much time debugging and fixing half-voxel offset errors, and explaining to users and developers why they need to add half-voxel offsets here and there when they use or write plugins for your software).

@thewtex
Contributor

thewtex commented Feb 4, 2022

@jbms yes, that's right -- we want to specify O vs A. While we avoid any confusion with the orientation of the coordinate axes, we also support processing, as @lassoan pointed out.

@bogovicj
Contributor

Here's a blurb I wrote re this definition that I hope will be in the next version:

Coordinate convention

The pixel/voxel center is the origin of the continuous coordinate system.

It is vital to consistently define the relationship between the discrete / array and continuous / interpolated
coordinate systems. A pixel / voxel is the continuous region that corresponds to a single sample
in the discrete array, i.e., the area corresponding to nearest-neighbor (NN) interpolation of that sample.
The center of the 2D pixel corresponding to the origin (0,0) of the discrete array is the origin of the continuous space
(0.0, 0.0) (when the transformation is the identity). The continuous rectangle of that pixel is given by the
half-open interval [-0.5, 0.5) x [-0.5, 0.5). See chapter 4 and figure 4.1 of the ITK Software Guide [[itk]].
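As a sketch of that convention (plain Python; `pixel_extent` is an illustrative name, not part of any API):

```python
def pixel_extent(index, scale=1.0, translation=0.0):
    """Center and half-open continuous interval of the pixel at a
    discrete index, under the pixel-center convention above: with
    the identity transform, index 0 maps to center 0.0 and the
    region [-0.5, 0.5).
    """
    center = index * scale + translation
    half = 0.5 * scale
    return center, (center - half, center + half)

# identity transform: pixel 0 is centered at the continuous origin
assert pixel_extent(0) == (0.0, (-0.5, 0.5))
# a 4nm sampling: pixel 3 is centered at 12nm, covering [10nm, 14nm)
assert pixel_extent(3, scale=4.0) == (12.0, (10.0, 14.0))
```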

@dstansby
Contributor

dstansby commented Aug 16, 2024

What's the status of this discussion? Is it waiting on someone to open a pull request with a concrete proposal?

@d-v-b
Contributor

d-v-b commented Aug 16, 2024

Regarding the status of the discussion, going by emoji and vibes my read is that most participants are in the "pixel center" camp, but that doesn't mean everyone agrees.

I think the "pixel center or pixel corner" question is basically about how to draw little squares on a screen; this is an important concern, but I think it's out of scope for a file format specification for data sampled in physical coordinates. Zarr arrays don't contain pixels, they just contain values. We use metadata to assign those values to positions in space, i.e. we assign each value to a point, (not a square / cube / hypercube). The little squares only enter the story when someone decides to display those values by resampling onto a display coordinate space after using nearest neighbor interpolation, which is just one of several possible interpolation schemes.

As for the future of the spec, I believe John will cover these points better than I can in his forthcoming transformations RFC.

@lassoan

lassoan commented Aug 17, 2024

I think the "pixel center or pixel corner" question is basically about how to draw little squares on a screen; this is an important concern, but I think it's out of scope for a file format specification for data sampled in physical coordinates.

I agree that the file format does not need to talk about pixels, as the data array just stores signal samples at physical locations. It is up to the visualization software to specify display coordinate system and render pixels.

However, based on the discussion in this thread, it may be worth adding a note to the file format specification that explains the difference between physical and display coordinate systems.

@jbms

jbms commented Aug 17, 2024

The pixel origin matters for mapping between the discrete coordinate space of a zarr array and any continuous coordinate space. In particular, when defining a multiscale, the relative transformations that need to be specified depend on the pixel origin.

@jbms

jbms commented Aug 17, 2024

Also, a pixel origin at the center is in any case the de facto standard, since that is what existing ome-zarr software and multiscale datasets use. That should be described in the spec.

@dstansby
Contributor

dstansby commented Aug 17, 2024

To get my head around this I did some ASCII art with a toy example.

If the unit of the multiscales axes is meters, and the coordinate transform scale for this level is 0.2, then an array with five items maps to a physical coordinate system like this:

0       1       2       3       4
|       |       |       |       |
---------------------------------
|       |       |       |       |
0.0m    0.2m    0.4m    0.6m    0.8m

The transform is

[
    {
        "type": "scale",
        "scale": [0.2]
    }
]

This is all the spec is for: the transform between array indices and a continuous space. No pixels here!
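A minimal sketch of that index-to-physical mapping (plain Python; the helper name is illustrative):

```python
def index_to_physical(indices, scale):
    """Apply a per-axis scale transform, mapping array indices to
    physical coordinates; under the pixel-center reading, index i
    lands exactly at i * scale."""
    return [i * s for i, s in zip(indices, scale)]

# the 1-d example above: scale 0.2 meters per index step
assert index_to_physical([0], [0.2]) == [0.0]
assert index_to_physical([2], [0.2]) == [0.4]
```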

I think there are two common issues with how an array like this is visualised, and how lower resolution multiscales data are calculated from it. These two issues cancel each other out so they don't look like issues, but in reality they are.

Visualisation

For the above array, it is often visualised like this (I believe this is how neuroglancer currently works?):

0       1       2       3       4
|0000000|1111111|2222222|3333333|...
---------------------------------
|       |       |       |       |
0.0m    0.2m    0.4m    0.6m    0.8m

But this doesn't seem like a very sensible way to visualise it - physical coordinate (e.g.) 0.15m is closer to array value 1, but is being rendered as array value 0. It is worth stressing at this point that the values in the array are not pixel values to be interpreted as being defined over an imaginary pixel or higher-dimensional voxel; they are values defined at a single point in physical space.

As such, it makes more sense to use nearest neighbour interpolation:

0       1       2       3       4
0000|1111111|2222222|3333333|4444...
---------------------------------
|       |       |       |       |
0.0m    0.2m    0.4m    0.6m    0.8m

Creating downsampled data

To create a lower-resolution level of the multiscale pyramid, we can bin this data by a factor of two. That means taking every two values and averaging them. Where should these new values lie in the physical coordinate space? It seems sensible to put them halfway between the two values we averaged:

    0               1
|   |   |   |   |   |   |   |   |
---------------------------------
|       |       |       |       |
0.0m    0.2m    0.4m    0.6m    0.8m

The correct transform for this level is then:

[
    {
        "type": "scale",
        "scale": [0.4]
    },
    {
        "type": "translation",
        "translation": [0.1]
    }
]
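The scale and offset for a binned level follow mechanically from the fine level's scale; a sketch of that arithmetic (the helper name is illustrative, not from any library):

```python
def binned_transform(fine_scale, factor=2):
    """Scale and translation for a pyramid level produced by averaging
    `factor` neighbouring samples, pixel-center convention: the new
    sample sits at the mean physical position of the fine samples it
    averaged, giving an offset of (factor - 1) / 2 fine pixels.
    """
    return fine_scale * factor, (factor - 1) / 2 * fine_scale

# the example above: fine scale 0.2m, 2x binning -> scale 0.4m, offset 0.1m
assert binned_transform(0.2) == (0.4, 0.1)
```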

But the translation is easy to miss (certainly, software I have written currently misses the translation and I only noticed when writing this comment 😱). Without the translation we would have:

0               1   
|   |   |   |   |   |   |   |   |
---------------------------------
|       |       |       |       |
0.0m    0.2m    0.4m    0.6m    0.8m

If you display this array using the first option in Visualisation above, the pixels overlap with the pixels they have been binned from, so two wrongs have conspired to make something that looks right!

I am guessing this is the thinking the radiology community went through, as mentioned in #89 (comment). I used to work in astronomy, where the same issue was also present in the past, until we did a drive to educate folks on how array values map to pixels (i.e., your array value is defined at the centre of a pixel, not the corner).

Sorry for the long post, but I hope it explains stuff for others as well as me? The conclusions I'm getting from this and the conversation above are:

  • The spec does not need to mention anything about pixels
  • ... but as a community we need to educate folks to get their transforms right when creating multiscale data, and check and potentially correct our own software

@d-v-b
Contributor

d-v-b commented Aug 17, 2024

{
   "type": "translation",
  "translation": [-0.5]
}

I think the sign is wrong here -- if your original coordinates were [0, 1, 2, 3, ...], then downsampling by 2x windowed averaging will yield coordinates [0.5, 2.5, ...].
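That coordinate arithmetic is easy to check with a small sketch (plain Python; the helper name is illustrative):

```python
def averaged_positions(n, factor=2):
    """Index-space positions of windowed averages over `factor`
    consecutive samples of an n-sample array (pixel-center reading):
    each coarse sample sits at the mean of the fine indices it covers."""
    return [
        sum(range(j * factor, j * factor + factor)) / factor
        for j in range(n // factor)
    ]

# original coordinates [0, 1, 2, 3] -> 2x binning yields [0.5, 2.5]
assert averaged_positions(4) == [0.5, 2.5]
```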

@dstansby
Contributor

Thanks for catching, I think fixed now!
