Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a hidden parameter to overwrite certain portions of the pipeline #914

Merged
merged 18 commits into from
Mar 2, 2023

Conversation

alex-l-kong
Copy link
Contributor

What is the purpose of this PR?

Users may have specific use cases where they'll need to rerun a Pixie function again. We need to ensure that these functions have cold start ability.

Closes #891.

How did you implement your changes

For pixel clustering functions, as well as train_cell_som, add an explicit overwrite flag that will allow for cold start runs.

For cluster_cells, ensure that column verification doesn't fail if cell_som_cluster already exists in the cell data.

@alex-l-kong alex-l-kong self-assigned this Feb 15, 2023
@alex-l-kong
Copy link
Contributor Author

@cliu72 we're aiming to get this PR merged in by Friday, just so we can close out the many Pixie-related issues currently open.

Copy link
Contributor

@cliu72 cliu72 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think generate_som_avg_files and generate_meta_avg_files need an overwrite flag as well. Right now, adding the overwrite flag to the clustering and consensus clustering functions works, but it's not propagating to the generation of the average files.

Copy link
Contributor

@cliu72 cliu72 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One more small thing. I ran into an issue when I was testing, where I included all the FOVs in one run, then used the overwrite flag to re-do training with a subset of FOVs (which worked fine), and then used the overwrite flag again to try to re-do training with all the FOVs again (broke at this point).

I think the reason for this is that when I used cluster_pixels with the overwrite flag for the subset of FOVs, it overwrote the files in pixel_mat_data to only be the subset of FOVs (as it should). But then, when I tried to increase the FOVs back to the full set, pixel_mat_data still only had a subset of FOVs (which prevents cluster_pixels from writing to all the FOV feather files). The logical solution is to try to run create_pixel_matrix again, to try to regenerate all of the FOV feather files. However, that results in "There are no more FOVs to preprocess, skipping", because create_pixel_matrix is only checking in the subsetted folder to figure out which FOVs to preprocess:

fovs_sub = io_utils.list_files(os.path.join(base_dir, subset_dir), substrs='.feather')

In the comment for that line, you said that it checks for the case where the data file was written but not the subset file:

# NOTE: this handles the case where the data file was written, but not the subset file

Would it be possible to change that part of the code in create_pixel_matrix to check for files that are missing in either pixel_mat_data or pixel_mat_subset? I think that would solve this problem.

This is probably a very rare thing that would happen, but the fix is so simple that I thought it wouldn't be a big deal.

@cliu72
Copy link
Contributor

cliu72 commented Mar 1, 2023

Also running into this error when trying to subset FOVs in cell clustering (trying to test overwrite flag):

First I get this warning:
image

Then this error:
image

Based off the warning and error, my hunch is that maybe this is caused by the fact that when you select a subset of FOVs from the cell table, the indices of the pandas table are not reset?

@alex-l-kong
Copy link
Contributor Author

@cliu72 I can make the change to address the rare issue you encountered with pixel_mat_data and pixel_mat_sub.

I'll also fix the issue of maintaining Pandas indices. I think we need to update our notebooks tests to handle these better in the future.

@alex-l-kong alex-l-kong requested a review from cliu72 March 1, 2023 23:58
Copy link
Contributor

@cliu72 cliu72 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good!

@alex-l-kong alex-l-kong merged commit 0b093db into main Mar 2, 2023
@alex-l-kong alex-l-kong deleted the overwrite_option branch March 2, 2023 20:12
@srivarra srivarra added the bug Something isn't working label Mar 14, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Can't run cluster_cells function more than once
3 participants