Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Pix preprocessing #579

Merged
merged 23 commits into from
Jun 9, 2022
Merged

Pix preprocessing #579

merged 23 commits into from
Jun 9, 2022

Conversation

ngreenwald
Copy link
Member

If you haven't already, please read through our contributing guidelines before opening your PR

What is the purpose of this PR?
Adds in functionality to normalize each channel of image data data separately prior to pixel clustering. This helps to make sure that markers which have different intensity values are treated equally right from the beginning of the clustering process.

In addition, it changes from removing pixels that have 0 total counts to removing pixels in the bottom 5% of total counts from the image. This better matches the format of the data following rosetta, where there are very few true zeros.

Remaining issues
The testing I put together is very basic. @alex-l-kong, if you could go in and double check that everything is working as intended, and if needed adding more testing, that would be great. Also feel free to change the organization/saving structure if you think it would be better some other way. I didn't add any new tests for create_pixel_matrix, that will likely need to be checked as well.

@alex-l-kong
Copy link
Contributor

@ngreenwald ok I'll take a look at this.

@alex-l-kong
Copy link
Contributor

@ngreenwald just tested it on Candace's dataset on my end, had to make one change to account for all-zero channels in calculate_channel_percentiles but other than that it runs without error.

@ngreenwald
Copy link
Member Author

Okay cool, let me know once it's ready to look at

@alex-l-kong alex-l-kong self-requested a review June 1, 2022 02:45
Copy link
Member Author

@ngreenwald ngreenwald left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couple comments

ark/phenotyping/som_utils.py Outdated Show resolved Hide resolved
provided_chans=chans)

# assert no rows sum to 0
assert np.all(sample_pixel_mat.loc[:, ['chan0', 'chan1']].sum(axis=1).values != 0)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a more precise check we could add here to ensure that this addition is working? This test would pass with the previous version and this version

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ngreenwald because we're dividing by row sums, we can change this test to ensure that all rows sum to 1 to test normalization with different pixel_norm_val parameters. Is this what you were thinking about, or something more specific?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean something that checks that passing pixel_norm_val is working as intended. For example, an expected decrease in the total amount of pixels included in the df or something like that

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ngreenwald oh ok. I can do something along the lines of:

assert sample_pixel_mat.shape[0] < (sample_img_data.shape[0] * sample_img_data.shape[1])

This will assert that we actually generated fewer pixels in sample_pixel_mat than there exist in sample_img_data.

The opposite test:

assert sample_pixel_mat.shape[0] == (sample_img_data.shape[0] * sample_img_data.shape[1])

would be good for the other 2 tests where no pixels are removed by pixel_norm_val.

ark/phenotyping/som_utils_test.py Outdated Show resolved Hide resolved
@alex-l-kong
Copy link
Contributor

@ngreenwald just one clarification question, otherwise should be good to go.

@alex-l-kong
Copy link
Contributor

@ngreenwald OK the above comment about testing pixel filtering with pixel_norm_val should be addressed now.

@ngreenwald
Copy link
Member Author

gonna test this out myself a bit more before merging it in to make sure nothing got missed

@ngreenwald ngreenwald requested a review from cliu72 June 8, 2022 01:33
@ngreenwald
Copy link
Member Author

Looks good, once @cliu72 approves I'll merge it in

ark/phenotyping/som_utils.py Outdated Show resolved Hide resolved
ark/phenotyping/som_utils.py Outdated Show resolved Hide resolved
Copy link
Contributor

@cliu72 cliu72 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a few small comments. Otherwise looks good to me.

ark/phenotyping/som_utils.py Outdated Show resolved Hide resolved
@review-notebook-app
Copy link

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

@ngreenwald ngreenwald merged commit 185cf2f into master Jun 9, 2022
@ngreenwald ngreenwald deleted the pix_preprocessing branch June 9, 2022 21:57
@srivarra srivarra added the enhancement New feature or request label Jun 15, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants