Temporarily remove pixel preprocessing normalization for Candace's paper #913

Merged
merged 23 commits into from
Mar 14, 2023

Conversation

alex-l-kong
Contributor

What is the purpose of this PR?

@cliu72 will need to remove channel_norm_df normalization and pixel_thresh from the pixel preprocessing step of Pixie.

How did you implement your changes

Explicitly remove both of these processes.

Remaining issues

See discussion thread.

@alex-l-kong alex-l-kong self-assigned this Feb 14, 2023
@alex-l-kong
Contributor Author

@cliu72 is it fine to leave in the computations of channel_norm_df and pixel_thresh_df as long as they're not used for preprocessing, or would the reviewers need all references to these completely purged?

@cliu72
Contributor

cliu72 commented Feb 16, 2023

> @cliu72 is it fine to leave in the computations of channel_norm_df and pixel_thresh_df as long as they're not used for preprocessing, or would the reviewers need all references to these completely purged?

I think it's okay to leave in the computations to make it easier for later, but can we remove writing those files? The files with the 99.9% value and the pixel threshold value. I think having those files written will be confusing.

@alex-l-kong
Contributor Author

@cliu72 we're aiming to get this PR merged in by Friday so we can close out as many open Pixie issues as possible before your submission.

Contributor

@cliu72 cliu72 left a comment

Looks good!

Member

@ngreenwald ngreenwald left a comment

Let's wait to merge this until all of the other PRs are merged and Candace has finished all of her testing, to make sure that undoing this is as simple as possible.

@alex-l-kong
Contributor Author

alex-l-kong commented Mar 2, 2023

@ngreenwald @cliu72 with the overwrite branch now merged into main, I think that concludes all the dependencies for this task. If we're ready, we can merge this PR in and tag this as the release for Candace's paper.

@cliu72
Contributor

cliu72 commented Mar 2, 2023

I'm going to do one more check over everything, hopefully today or tomorrow. I think we can wait to merge this PR until after that.

Member

@ngreenwald ngreenwald left a comment

Merge conflicts with the refactoring branch. Once those are resolved this is good to go.

Member

@ngreenwald ngreenwald left a comment

Not sure what happened here, but none of the changes are present anymore

@alex-l-kong
Contributor Author

Oh that's weird, let me fix that up again.

@alex-l-kong
Contributor Author

Looks like GitHub had trouble mapping the changes into pixie_preprocessing.py. Should be good to go now!

@cliu72
Contributor

cliu72 commented Mar 13, 2023

I don't think all the changes have been re-implemented after the merge conflicts. These lines should be gone (were removed in the original review):

# remove any rows with channels with a sum below the threshold
rowsums = pixel_mat[channels].sum(axis=1)
pixel_mat = pixel_mat.loc[rowsums > pixel_thresh_val, :].reset_index(drop=True)

# create vector for normalizing image data
norm_vect = channel_norm_df.iloc[0].values
norm_vect = np.array(norm_vect).reshape([1, 1, len(norm_vect)])
# normalize image data
img_data = img_data / norm_vect
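
For readers unfamiliar with the two steps being removed, here is a minimal sketch of what the quoted lines do, run on toy data. The function names `normalize_channels` and `drop_low_signal_pixels`, the channel names, and all values are hypothetical and are not part of the codebase. The sketch also assumes the row-sum threshold is applied to a flattened pixel matrix built from the already normalized image, which this thread does not confirm.

```python
import numpy as np
import pandas as pd


def normalize_channels(img_data, channel_norm_df):
    """Divide each channel of an (H, W, C) image by its normalization value.

    Hypothetical wrapper around the quoted normalization lines.
    """
    # one normalization value per channel, taken from the first row
    norm_vect = channel_norm_df.iloc[0].values
    # reshape to (1, 1, C) so it broadcasts over the two spatial axes
    norm_vect = np.array(norm_vect).reshape([1, 1, len(norm_vect)])
    return img_data / norm_vect


def drop_low_signal_pixels(pixel_mat, channels, pixel_thresh_val):
    """Drop rows whose summed channel signal is not above the threshold.

    Hypothetical wrapper around the quoted row-sum filter.
    """
    rowsums = pixel_mat[channels].sum(axis=1)
    return pixel_mat.loc[rowsums > pixel_thresh_val, :].reset_index(drop=True)


# toy 2x2 image with two made-up channels
channels = ["chan0", "chan1"]
img_data = np.array([[[2.0, 10.0], [4.0, 0.0]],
                     [[0.0, 5.0], [1.0, 1.0]]])
channel_norm_df = pd.DataFrame({"chan0": [4.0], "chan1": [10.0]})

normalized = normalize_channels(img_data, channel_norm_df)

# flatten to one row per pixel, one column per channel (assumed layout)
pixel_mat = pd.DataFrame(normalized.reshape(-1, len(channels)), columns=channels)
filtered = drop_low_signal_pixels(pixel_mat, channels, pixel_thresh_val=0.75)
```

With both steps removed, as this PR does, the raw `img_data` values pass through unchanged and no pixels are dropped for low total signal.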

After thinking on it some more, I think it would be best to remove these too (to avoid confusion):

# define path to channel normalization values
channel_norm_path = os.path.join(
    base_dir, pixel_output_dir, 'channel_norm.feather'
)
# define path to pixel normalization values
pixel_thresh_path = os.path.join(
    base_dir, pixel_output_dir, 'pixel_thresh.feather'
)
# reset entire cohort if channels provided are different from ones in existing channel_norm
if os.path.exists(channel_norm_path):
    channel_norm_df = feather.read_dataframe(channel_norm_path)
    if set(channel_norm_df.columns.values) != set(channels):
        print("New channels provided: overwriting whole cohort")
        # delete the existing data in data_dir and subset_dir
        rmtree(os.path.join(base_dir, data_dir))
        os.mkdir(os.path.join(base_dir, data_dir))
        rmtree(os.path.join(base_dir, subset_dir))
        os.mkdir(os.path.join(base_dir, subset_dir))
        # delete the existing channel_norm.feather and pixel_thresh.feather
        os.remove(channel_norm_path)
        os.remove(pixel_thresh_path)

# load existing channel_norm_path if exists, otherwise generate
if not os.path.exists(channel_norm_path):
    # compute channel percentiles
    channel_norm_df = pixel_cluster_utils.calculate_channel_percentiles(
        tiff_dir=tiff_dir,
        fovs=fovs,
        channels=channels,
        img_sub_folder=img_sub_folder,
        percentile=channel_percentile
    )
else:
    # load previously generated output
    channel_norm_df = feather.read_dataframe(channel_norm_path)
# load existing pixel_thresh_path if exists, otherwise generate
if not os.path.exists(pixel_thresh_path):
    # compute pixel percentiles
    pixel_thresh_val = pixel_cluster_utils.calculate_pixel_intensity_percentile(
        tiff_dir=tiff_dir, fovs=fovs, channels=channels,
        img_sub_folder=img_sub_folder, channel_percentiles=channel_norm_df
    )
    pixel_thresh_df = pd.DataFrame({'pixel_thresh_val': [pixel_thresh_val]})
else:
    pixel_thresh_df = feather.read_dataframe(pixel_thresh_path)
    pixel_thresh_val = pixel_thresh_df['pixel_thresh_val'].values[0]

Remove channel_norm_df and pixel_thresh_val as inputs to preprocess_fov
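
For context, the load-or-regenerate logic quoted above follows a common caching pattern: reuse a saved file if it exists, but discard it when the requested channel set no longer matches. The sketch below illustrates that pattern only. `load_or_compute` and `fake_percentiles` are hypothetical names, JSON is swapped in for the feather files so the example needs only the standard library, and writing the cache file back is an assumption, since the quoted lines do not show that step.

```python
import json
import os
import tempfile


def load_or_compute(cache_path, channels, compute_fn):
    """Return (values, recomputed) for the requested channels.

    Hypothetical helper: reuses the cached values when the cached channel
    set matches, otherwise deletes the stale cache and recomputes.
    """
    if os.path.exists(cache_path):
        with open(cache_path) as f:
            cached = json.load(f)
        if set(cached) == set(channels):
            return cached, False
        # channel set changed: the cached values are stale
        os.remove(cache_path)

    values = compute_fn(channels)
    # assumption: persist the freshly computed values for the next run
    with open(cache_path, "w") as f:
        json.dump(values, f)
    return values, True


def fake_percentiles(channels):
    """Stand-in for the real percentile computation (made-up values)."""
    return {chan: 1.0 for chan in channels}


with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "channel_norm.json")
    first, first_recomputed = load_or_compute(path, ["chan0", "chan1"], fake_percentiles)
    second, second_recomputed = load_or_compute(path, ["chan1", "chan0"], fake_percentiles)
    third, third_recomputed = load_or_compute(path, ["chan0", "chan2"], fake_percentiles)
```

Removing this logic, as requested here, also means no `channel_norm.feather` or `pixel_thresh.feather` files are left behind to confuse anyone reading the output directory.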

@alex-l-kong
Contributor Author

@cliu72 OK this should do the trick.

Contributor

@cliu72 cliu72 left a comment

Looks good!
