Use `random_state` parameter in KMeans clustering for reproducibility #834

alex-l-kong · 2022-11-16T23:35:06Z

What is the purpose of this PR?

Closes #812. Adds a random seed to KMeans clustering in neighborhood analysis to ensure reproducible results.

How did you implement your changes?

Pass a seed parameter to the cluster metrics and assignment functions of neighborhood analysis. It is analogous to random_state in sklearn.cluster.KMeans and pd.DataFrame.sample.
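As a rough illustration of the approach described above (not the actual code from this PR), the sketch below forwards one seed to both `KMeans` and `DataFrame.sample`. The helper names `cluster_with_seed` and `subsampled_silhouette`, the default seed of 42, and the synthetic data are all hypothetical and chosen only for the example.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score


def cluster_with_seed(features: pd.DataFrame, k: int, seed: int = 42) -> np.ndarray:
    """Hypothetical helper: fit k-means with a fixed seed and return the labels."""
    # random_state fixes the centroid initialization, so repeated runs agree.
    model = KMeans(n_clusters=k, n_init=10, random_state=seed)
    return model.fit_predict(features.to_numpy())


def subsampled_silhouette(
    features: pd.DataFrame, labels: np.ndarray, n: int, seed: int = 42
) -> float:
    """Hypothetical helper: silhouette score on a seeded random subsample."""
    # The same seed is passed by keyword to DataFrame.sample.
    subset = features.sample(n=n, random_state=seed)
    # Map the sampled index labels back to row positions in `labels`.
    positions = features.index.get_indexer(subset.index)
    return float(silhouette_score(subset.to_numpy(), labels[positions]))


# Synthetic data (an assumption for the demo): three well-separated 2-D blobs.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 10.0], [-10.0, 10.0]])
points = np.vstack([c + rng.normal(scale=0.5, size=(50, 2)) for c in centers])
features = pd.DataFrame(points, columns=["f0", "f1"])

# Two runs with the same seed on the same data.
labels_first = cluster_with_seed(features, k=3, seed=42)
labels_second = cluster_with_seed(features, k=3, seed=42)
score_first = subsampled_silhouette(features, labels_first, n=60, seed=42)
score_second = subsampled_silhouette(features, labels_second, n=60, seed=42)
```

Running the two calls back to back is a small-scale version of the replication check requested in review: with the same seed and the same input, both the cluster labels and the subsampled silhouette score should match.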

ngreenwald

Did you run through the whole notebook with the same data twice to make sure you can replicate the same output?

alex-l-kong · 2022-11-17T02:58:05Z

Yes.

cliu72

Looks good to me

alex-l-kong added 3 commits November 16, 2022 14:53

Add random seeds to k-means clustering steps for neighborhood analysis

c825586

Make sure subsampling for silhouette also gets a random seed

90c9fa6

Fix random_state parameter position in sample function for subsampling

c31811b
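The diff for this last commit is not shown here, so the exact fix is unknown. As general background, `random_state` is not one of the leading positional parameters of `pd.DataFrame.sample` (the signature begins with `n`, `frac`, `replace`, `weights`), so passing the seed by keyword is the safe form. A minimal sketch with a made-up DataFrame:

```python
import pandas as pd

# Made-up data for illustration only.
df = pd.DataFrame({"x": range(100)})

# Pass the seed by keyword; a bare positional value in the second slot
# would be interpreted as `frac`, not as the seed.
first = df.sample(frac=0.1, random_state=42)
second = df.sample(frac=0.1, random_state=42)
```

With the same seed, both calls select the same rows.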

alex-l-kong self-assigned this Nov 16, 2022

alex-l-kong requested review from ngreenwald and cliu72 November 17, 2022 00:29

ngreenwald requested changes Nov 17, 2022


alex-l-kong requested a review from ngreenwald November 17, 2022 03:15

cliu72 reviewed Nov 17, 2022


ngreenwald merged commit e1af451 into main Nov 17, 2022

ngreenwald deleted the kmeans_seed branch November 17, 2022 21:30

srivarra added the enhancement label Jan 10, 2023
