-
Notifications
You must be signed in to change notification settings - Fork 26
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add a hidden parameter to overwrite certain portions of the pipeline #914
Conversation
@cliu72 we're aiming to get this PR merged in by Friday, just so we can close out the many Pixie-related issues currently open. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think generate_som_avg_files
and generate_meta_avg_files
need an overwrite flag as well. Right now, adding the overwrite flag to the clustering and consensus clustering functions works, but it's not propagating to the generation of the average files.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One more small thing. I ran into an issue when I was testing, where I included all the FOVs in one run, then used the overwrite flag to re-do training with a subset of FOVs (which worked fine), and then used the overwrite flag again to try to re-do training with all the FOVs again (broke at this point).
I think the reason for this is that when I used cluster_pixels
with the overwrite flag for the subset of FOVs, it overwrote the files in pixel_mat_data
to only be the subset of FOVs (as it should). But then, when I tried to increase the FOVs back to the full set, pixel_mat_data
still only had a subset of FOVs (which prevents cluster_pixels
from writing to all the FOV feather files). The logical solution is to try to run create_pixel_matrix
again, to try to regenerate all of the FOV feather files. However, that results in "There are no more FOVs to preprocess, skipping", because create_pixel_matrix
is only checking in the subsetted folder to figure out which FOVs to preprocess:
fovs_sub = io_utils.list_files(os.path.join(base_dir, subset_dir), substrs='.feather') |
In the comment for that line, you said that it checks for the case where the data file was written but not the subset file:
# NOTE: this handles the case where the data file was written, but not the subset file |
Would it be possible to change that part of the code in create_pixel_matrix
to check for files that are missing in either pixel_mat_data
or pixel_mat_subset
? I think that would solve this problem.
This is probably a very rare thing that would happen, but the fix is so simple that I thought it wouldn't be a big deal.
Also running into this error when trying to subset FOVs in cell clustering (trying to test overwrite flag): Based off the warning and error, my hunch is that maybe this is caused by the fact that when you select a subset of FOVs from the cell table, the indices of the pandas table are not reset? |
@cliu72 I can make the change to address the rare issue you encountered with I'll also fix the issue of maintaining Pandas indices. I think we need to update our notebooks tests to handle these better in the future. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good!
What is the purpose of this PR?
Users may have specific use cases where they'll need to rerun a Pixie function again. We need to ensure that these functions have cold start ability.
Closes #891.
How did you implement your changes
For pixel clustering functions, as well as
train_cell_som
, add an explicitoverwrite
flag that will allow for cold start runs.For
cluster_cells
, ensure that column verification doesn't fail ifcell_som_cluster
already exists in the cell data.