Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Imagestack.transform executes applied function on xarray objects instead of numpy arrays #1035

Merged
merged 2 commits into from
Feb 23, 2019

Conversation

ambrosejcarr
Copy link
Member

@ambrosejcarr ambrosejcarr commented Feb 21, 2019

@kevinyamauchi pointed out in #987 and #983 that our apply method hampers our ability to write expressive PipelineComponents because it returns numpy arrays instead of xarray DataArray objects.

This PR:

  1. Modifies ImageStack.transform so that methods used by apply can take advantage of xarray named axes. Because we are rebuilding an array from the sharedmemory object, this does not give us access to data stored in the coordinates. Thus, the returned arrays only guarantee that the Axes are in the correct order. If is_volume is True, xarrays generated by ImageStack.transform have dims = (z, y, x) and no coordinates. If is_volume is False, xarrays generated by ImageStack.transform have dims = (y, x) and no coordinates. These are set based on the information in starfish.imagestack.dataorder.AXES_DATA

  2. Fixes a bug in preserve_float_range. That function has obviously never seen an xarray before. 😆

  3. Minor updates to SpotFinder methods to support receiving xr.DataArray objects

  4. Updates to docstrings to reflect flexibility to receive Union[xr.DataArray, np.ndarray] images.

@codecov-io
Copy link

codecov-io commented Feb 21, 2019

Codecov Report

Merging #1035 into master will increase coverage by 0.03%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #1035      +/-   ##
==========================================
+ Coverage    88.7%   88.74%   +0.03%     
==========================================
  Files         164      164              
  Lines        6036     6048      +12     
==========================================
+ Hits         5354     5367      +13     
+ Misses        682      681       -1
Impacted Files Coverage Δ
starfish/image/_filter/gaussian_high_pass.py 96.87% <ø> (ø) ⬆️
starfish/image/_filter/scale_by_percentile.py 100% <100%> (ø) ⬆️
starfish/spots/_detector/detect.py 96.82% <100%> (-0.05%) ⬇️
...ish/image/_filter/richardson_lucy_deconvolution.py 94.44% <100%> (+0.1%) ⬆️
starfish/image/_filter/white_tophat.py 100% <100%> (ø) ⬆️
...h/spots/_detector/trackpy_local_max_peak_finder.py 91.8% <100%> (+0.27%) ⬆️
starfish/imagestack/imagestack.py 84% <100%> (+0.03%) ⬆️
starfish/image/_filter/util.py 85.71% <100%> (+2.04%) ⬆️
starfish/image/_filter/bandpass.py 94.59% <100%> (+0.15%) ⬆️
starfish/image/_filter/mean_high_pass.py 96.87% <100%> (+0.1%) ⬆️
... and 4 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 7998996...5119aa4. Read the comment docs.

This was referenced Feb 21, 2019
@kevinyamauchi
Copy link
Collaborator

Thanks, @ambrosejcarr! I am not very experienced with multiprocessing, so I think I should leave the approval to @shanaxel42.

I did have a question though: in general, why do the filtering functions in all of the filters retain compatibility for image being passed in as an ndarray?

@ambrosejcarr
Copy link
Member Author

ambrosejcarr commented Feb 21, 2019

I did have a question though: in general, why do the filtering functions in all of the filters retain compatibility for image being passed in as an ndarray?

Not all of them did, but this is by design: xarray wraps a numpy array and implements or delegates most of the numpy array functionality. Pretty cool right? :-)

@kevinyamauchi
Copy link
Collaborator

Sorry for being unclear. I was more specifically wondering if the fact that the type hint for image in many of the filtering methods (e.g., Clip._clip()) include ndarray implies that there are conditions in which ImageStack._processing_workflow() would pass an ndarray instead of an xarray. Or is it just to indicate the compatibility as you described above?

@ambrosejcarr
Copy link
Member Author

ambrosejcarr commented Feb 21, 2019

is it just to indicate the compatibility as you described above?

Just to indicate compatibility. 👍

There is also the case that someone calls the private method with a numpy array. The only execution flow that I'm aware of in the codebase now involves xarray objects.

@kevinyamauchi
Copy link
Collaborator

Got it. Thanks!

Copy link
Collaborator

@shanaxel42 shanaxel42 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good, just a few comments

if np.any(data < 0):
data[array < 0] = 0
if np.any(array > 1):
data[data < 0] = 0
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why did this have to change?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🐛 !

If data is an xarray, we need to convert it to a numpy array to run these comparisons because indexing works differently between the two objects: http://xarray.pydata.org/en/stable/indexing.html#indexing-rules

This function was just never properly tested with an xarray input. np.ndarray[xr.DataArray] does not work.

dims = [
name.value for (name, data) in
sorted(AXES_DATA.items(), key=lambda kv: kv[1].order)
] + [Axes.Y.value, Axes.X.value]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we're just trying to maintain the current axes order here right? why not just dims = [dim for dim in self.xarray.dims]

Copy link
Member Author

@ambrosejcarr ambrosejcarr Feb 21, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but this is a static method so does not have access to self. Happy to simplify this if you can find an easier way!

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ohhhh I missed that. hmm well you could make dims_order a param you pass into _processing_workflow, then do then above in transform()... might be a little more clear

@ambrosejcarr ambrosejcarr merged commit ade276c into master Feb 23, 2019
@ambrosejcarr ambrosejcarr deleted the ajc-transform-xarray branch February 23, 2019 19:18

# build and then slice the xarray to get the piece needed for this worker
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This does not set the xarray coordinates correctly. I imagine that if you use crop-on-load, your labels will all be incorrect.

ttung pushed a commit that referenced this pull request Feb 26, 2019
There are some places in the codebase where we need a numpy array instead of an xarray.  Switch all instances where we do this to `np.asarray(...)`.  Also, fixed type hints where appropriate.

This should fix the master-breaking bug in #1035, but does not resolve the issue of (xarray) coordinates not being transferred to the constituent xarrays.
ttung added a commit that referenced this pull request Feb 28, 2019
There are some places in the codebase where we need a numpy array instead of an xarray.  Switch all instances where we do this to `np.asarray(...)`.  Also, fixed type hints where appropriate.

This should fix the master-breaking bug in #1035, but does not resolve the issue of (xarray) coordinates not being transferred to the constituent xarrays.
ttung pushed a commit that referenced this pull request Apr 11, 2019
ImageStack.apply uses slice indices based on the labels, but the xarray passed to the worker process does not have the labels.  This PR passes the dims and the coordinates of the original xarray to the worker process so it can reconstitute an identical xarray.

Test plan: Added a test that creates an ImageStack with labeled indices, and runs apply on it.  Without the code change, it fails.  With the code change, it works.

Fixes #1108, and the comment raised in #1035 (comment) :)
ttung added a commit that referenced this pull request Apr 11, 2019
ImageStack.apply uses slice indices based on the labels, but the xarray passed to the worker process does not have the labels.  This PR passes the dims and the coordinates of the original xarray to the worker process so it can reconstitute an identical xarray.

Test plan: Added a test that creates an ImageStack with labeled indices, and runs apply on it.  Without the code change, it fails.  With the code change, it works.

Fixes #1108, and the comment raised in #1035 (comment) :)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants