Potential bug in neighborhood assignments #968

Closed
ngreenwald opened this issue Mar 30, 2023 · 4 comments · Fixed by #977

@ngreenwald
Member

Please refer to our FAQ and look at our known issues before opening a bug report.

Describe the bug
I'm running into some weird behavior with the neighborhood analysis script. Specifically, it seems like cells with very similar neighborhoods are being assigned to different clusters.

For example, in the upper-right corner, all of the blue cancer cells seem to have almost exactly the same neighbors.
image

However, they are assigned to different neighborhoods in the output.
image

I'm not sure if this is related to #967. It could be that the visualization isn't working correctly. However, the heatmap of the clusters roughly lines up with the visual, so I think that's less likely. Not sure exactly what's going on. I think a good first step once #967 is resolved will be to re-run on some previous data and confirm that we still get qualitatively the same clustering results, making sure to re-generate the neighbor_counts rather than using the previously extracted ones.
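One way to make the "same clustering results" check concrete is to score the agreement between the old and new cluster assignments in a way that ignores how the cluster IDs happen to be numbered. Below is a minimal sketch using the adjusted Rand index, written in plain Python; the function name and the toy label lists are hypothetical and not part of the project's code.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Agreement between two clusterings, invariant to cluster renumbering.

    Returns 1.0 for identical partitions and values near 0 (possibly
    negative) for partitions that agree no better than chance.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("label lists must have the same length")
    n = len(labels_a)
    if n < 2:
        return 1.0  # no pairs to disagree on

    # Count co-clustered pairs within each contingency cell and each margin.
    cells = Counter(zip(labels_a, labels_b))
    pairs_both = sum(comb(c, 2) for c in cells.values())
    pairs_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    pairs_b = sum(comb(c, 2) for c in Counter(labels_b).values())

    expected = pairs_a * pairs_b / comb(n, 2)
    maximum = (pairs_a + pairs_b) / 2
    if maximum == expected:
        return 1.0  # degenerate case, e.g. everything in one cluster
    return (pairs_both - expected) / (maximum - expected)


# Toy example: same partition, different cluster numbering.
old_labels = [0, 0, 1, 1, 2, 2]
new_labels = [2, 2, 0, 0, 1, 1]
score_same = adjusted_rand_index(old_labels, new_labels)

# Toy example: partitions that split every old cluster.
score_diff = adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1])
```

A score close to 1 on the re-run data would suggest the clustering is unchanged; a clearly lower score would point to a real difference rather than a relabeling.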

@ngreenwald ngreenwald added the bug Something isn't working label Mar 30, 2023
@camisowers
Contributor

Looks like the color assignment is accurate based on the outputted kmeans clustering results. Seems like it could be an issue with the calculation of either the neighbor matrices or the distance matrices, both of which have been adjusted in the last 4 months. Which previous data should I test out?

@ngreenwald
Member Author

ngreenwald commented Mar 31, 2023 via email

@camisowers
Contributor

camisowers commented Apr 24, 2023

I verified that the generated neighbors matrices have not changed, but it looks like there was an issue with the KMeans function call itself.
Scikit-learn 1.2 changed the default n_init param from 10 to 'auto', which then caused the algorithm to run only once (see here). On the left is the clustering using a commit from October and the right is using main.

Screenshot 2023-04-24 at 11 52 42 AM

I was able to get the same results as before by adding n_init=10 to the KMeans() call.

Screenshot 2023-04-24 at 11 43 03 AM
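To illustrate why the number of initializations matters, here is a minimal plain-Python sketch of the "best of n_init runs" idea: k-means is run several times from different starting centers and the run with the lowest inertia is kept. This is not scikit-learn's implementation. It uses random initial centers instead of k-means++, and the function names and toy points are hypothetical. In scikit-learn, `n_init='auto'` performs a single run when `init='k-means++'`, which is the situation described above.

```python
import random


def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans_once(points, k, rng, max_iter=100):
    """One Lloyd's-algorithm run from random centers; returns (inertia, labels)."""
    centers = rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        # Assign each point to its nearest center.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centers[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Move each center to the mean of its assigned points.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    inertia = sum(squared_distance(p, centers[lab]) for p, lab in zip(points, labels))
    return inertia, labels


def kmeans_best_of(points, k, n_init, seed=0):
    """Run k-means n_init times and keep the run with the lowest inertia."""
    rng = random.Random(seed)
    runs = (kmeans_once(points, k, rng) for _ in range(n_init))
    return min(runs, key=lambda run: run[0])


# Toy data: three loose groups in 2D.
points = [
    (0.0, 0.0), (0.2, 0.1), (0.1, 0.3),
    (5.0, 5.0), (5.1, 4.8), (4.9, 5.2),
    (0.0, 5.0), (0.3, 5.1), (0.1, 4.7),
]

inertia_single, _ = kmeans_best_of(points, k=3, n_init=1, seed=0)
inertia_ten, labels_ten = kmeans_best_of(points, k=3, n_init=10, seed=0)
```

With the same seed, the first of the ten runs is identical to the single run, so the best-of-ten inertia can never be worse than the single-run inertia. A single run has no such safety net, which would be consistent with the noisier clusters seen on main.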

I can open a quick PR now to fix this.

@ngreenwald
Member Author

Sounds good, thanks! Then we can redo the TONIC clustering and see if things still look weird or if this was the issue.
