Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ensure cluster_helpers.PixelSOMCluster receives the fovs argument #905

Merged
merged 9 commits into from
Feb 14, 2023

Conversation

alex-l-kong
Copy link
Contributor

What is the purpose of this PR?

This argument was not propagated over to PixelSOMCluster, so it would always load in all the FOVs. It should only read the subset of FOVs

How did you implement your changes

Add a param fovs to PixelSOMCluster constructor to address this. When loading in the subsetted data frames, file names are verified against the fovs list provided.

@alex-l-kong alex-l-kong self-assigned this Feb 8, 2023
Copy link
Contributor

@cliu72 cliu72 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it'd be a good idea to add this functionality to CellSOMCluster as well (i.e. subset the cell table for only fovs specified in the init)

Copy link
Contributor

@cliu72 cliu72 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me.

Related issue - it doesn't seem like fovs is propagating to cluster_pixels either. In cluster_pixels, there is an fov argument, but the fov list is generated by looking through all fovs in the data folder.

fovs_list = find_fovs_missing_col(base_dir, data_dir, 'pixel_som_cluster')

I think it would be good to add a line after this, only keeping the fovs that are specified in the fovs argument.

Not exactly what is being addressed here - if you want, you can open a new issue for that, and I can approve this one.

@alex-l-kong
Copy link
Contributor Author

@cliu72 if I understand correctly the list of fovs is meant to propagate across the entire pipeline. At the stage you've listed, the only FOVs that can be contained in data_dir are the ones already contained in the list of fovs, since that same list was passed to create_pixel_matrix in creating this folder. Is there a case where the user will want to specify a smaller list of fovs to assign SOM labels to? We can do that, but keep in mind we'd have to add validation checks to later steps in the pipeline to ensure, among other things, we don't attempt to consensus cluster on FOVs that don't already have SOM labels.

One improvement we can make in the meantime, however, is to avoid having to pass this list of FOVs, and instead refer to pixel_pysom.fovs instead. This list of fovs in cluster_pixels is really only meant to check if restarting functionality is needed.

@cliu72
Copy link
Contributor

cliu72 commented Feb 13, 2023

@cliu72 if I understand correctly the list of fovs is meant to propagate across the entire pipeline. At the stage you've listed, the only FOVs that can be contained in data_dir are the ones already contained in the list of fovs, since that same list was passed to create_pixel_matrix in creating this folder. Is there a case where the user will want to specify a smaller list of fovs to assign SOM labels to? We can do that, but keep in mind we'd have to add validation checks to later steps in the pipeline to ensure, among other things, we don't attempt to consensus cluster on FOVs that don't already have SOM labels.

One improvement we can make in the meantime, however, is to avoid having to pass this list of FOVs, and instead refer to pixel_pysom.fovs instead. This list of fovs in cluster_pixels is really only meant to check if restarting functionality is needed.

The reason that this issue even came up is because Hadeesha had pre-processed all the FOVs in his folder, then wanted to run the whole pipeline (not just training) on a subset of them. He was trying to do some time tests, but he didn't want to completely delete the pre-processed matrix folder since it took a long time to make that folder. That's why we wanted the FOVs argument to populate to the PIxelSOMcluster in the first place - before this change, it was just reading all the FOVs in the subsetted folder (which works under the paradigm that you talked about, but not if somebody wants to run the pipeline on only some of the FOVs). So to answer your question, yes, I think there is a situation in which the user will want to specify a smaller list of fovs (when testing, as Hadeesha is doing now).

I don't think it will be that hard to change this for consensus clustering either. That function also uses find_fovs_missing_col. I think it's literally a one line fix in both functions - just take the intersection of find_fovs_missing_col and the list of fovs specified by the user.

…s_missing_col to allow for further fov subsetting
@alex-l-kong
Copy link
Contributor Author

alex-l-kong commented Feb 13, 2023

@cliu72 got it, just wanted to double check. I've just made the change and tested that the pipeline runs on a subset of FOVs.

Copy link
Contributor

@cliu72 cliu72 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good!

@ngreenwald ngreenwald merged commit 9a81365 into main Feb 14, 2023
@ngreenwald ngreenwald deleted the propagate_fovs branch February 14, 2023 18:36
@srivarra srivarra added the enhancement New feature or request label Feb 16, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants