Pix preprocessing #579
Conversation
@ngreenwald ok, I'll take a look at this.
@ngreenwald just tested it on Candace's dataset on my end, had to make one change to account for all-zero channels in
Okay cool, let me know once it's ready to look at.
Couple comments
ark/phenotyping/som_utils_test.py
provided_chans=chans)

# assert no rows sum to 0
assert np.all(sample_pixel_mat.loc[:, ['chan0', 'chan1']].sum(axis=1).values != 0)
Is there a more precise check we could add here to ensure that this addition is working? This test would pass with both the previous version and this version.
@ngreenwald because we're dividing by row sums, we can change this test to ensure that all rows sum to 1, which would test normalization with different pixel_norm_val parameters. Is this what you were thinking of, or something more specific?
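A minimal sketch of that row-sum check, using a hypothetical two-channel matrix (sample_pixel_mat here is a stand-in built inline, not the fixture from the test suite):

```python
import numpy as np
import pandas as pd

# hypothetical raw pixel matrix with two channels
raw = pd.DataFrame({'chan0': [1.0, 5.0, 2.0], 'chan1': [3.0, 5.0, 6.0]})

# divide each row by its row sum, as described in the comment above
sample_pixel_mat = raw.div(raw.sum(axis=1), axis=0)

# after normalization, every row should sum to 1
assert np.allclose(sample_pixel_mat.sum(axis=1).values, 1)
```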
I mean something that checks that passing pixel_norm_val is working as intended. For example, an expected decrease in the total number of pixels included in the df, or something like that.
@ngreenwald oh ok. I can do something along the lines of:

assert sample_pixel_mat.shape[0] < (sample_img_data.shape[0] * sample_img_data.shape[1])

This will assert that we actually generated fewer pixels in sample_pixel_mat than exist in sample_img_data.

The opposite test:

assert sample_pixel_mat.shape[0] == (sample_img_data.shape[0] * sample_img_data.shape[1])

would be good for the other 2 tests where no pixels are removed by pixel_norm_val.
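A self-contained toy version of the first assertion (everything here is a stand-in: a random image, and a simple bottom-5%-of-total-counts threshold standing in for the pixel_norm_val filtering):

```python
import numpy as np

rng = np.random.default_rng(0)
sample_img_data = rng.random((32, 32, 2))  # hypothetical rows x cols x channels image
flat = sample_img_data.reshape(-1, 2)      # one row per pixel
row_sums = flat.sum(axis=1)

# keep only pixels above the bottom 5% of total counts
# (a stand-in for the pixel_norm_val-based filtering in the PR)
sample_pixel_mat = flat[row_sums > np.quantile(row_sums, 0.05)]

# the proposed assertion: filtering actually removed some pixels
assert sample_pixel_mat.shape[0] < (sample_img_data.shape[0] * sample_img_data.shape[1])
```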
@ngreenwald just one clarification question, otherwise should be good to go.
@ngreenwald OK the above comment about testing pixel filtering with
Gonna test this out myself a bit more before merging it in to make sure nothing got missed.
Looks good, once @cliu72 approves I'll merge it in.
Just a few small comments. Otherwise looks good to me.
What is the purpose of this PR?
Adds in functionality to normalize each channel of image data separately prior to pixel clustering. This helps ensure that markers with different intensity ranges are treated equally right from the beginning of the clustering process.
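A sketch of what per-channel normalization could look like. The normalization statistic here (each channel's own 99.9th percentile) is an assumption for illustration; the actual statistic used in the PR may differ:

```python
import numpy as np

rng = np.random.default_rng(0)
# hypothetical image whose channels live on very different intensity scales
img_data = rng.random((64, 64, 3)) * np.array([1.0, 10.0, 100.0])

# per-channel normalization: divide each channel by its own high percentile
norm_vals = np.quantile(img_data.reshape(-1, 3), 0.999, axis=0)
img_norm = img_data / norm_vals

# after normalization, all channels share a comparable scale
```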
In addition, it changes the filtering from removing pixels with 0 total counts to removing pixels in the bottom 5% of total counts in the image. This better matches the format of the data following Rosetta, where there are very few true zeros.
Remaining issues
The testing I put together is very basic. @alex-l-kong, if you could go in and double-check that everything is working as intended, and add more tests if needed, that would be great. Also feel free to change the organization/saving structure if you think another layout would be better. I didn't add any new tests for create_pixel_matrix; that will likely need to be checked as well.