Add random_state option to benchmark function #160

Closed
utf opened this issue Jan 15, 2019 · 3 comments

Comments

Member

utf commented Jan 15, 2019

It would be nice to add support for the random_state parameter of pandas.DataFrame.sample() for pipeline benchmarking.

E.g. implemented for this line:

testdf, traindf = np.split(df.sample(frac=1), [int(test_spec * len(df))])

This would allow random but deterministic sampling of the dataframe when choosing the test/train split, so you can benchmark two models on the same dataframe and automatically get the same test/train split. We could just overload the benchmark test_spec variable again (e.g. if test_spec is an int or numpy.random.RandomState object, use it as the random_state argument).
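As a rough illustration of the idea, here is a minimal sketch of a seeded shuffle-and-split. The helper name `split_test_train` and its `test_fraction` argument are hypothetical and not part of the project's API; the only library feature relied on is the `random_state` parameter of `DataFrame.sample()`. It slices with `iloc` rather than `np.split`, which is a substitution made for the sketch.

```python
import numpy as np
import pandas as pd


def split_test_train(df, test_fraction, random_state=None):
    """Shuffle df and split it into (test, train) parts.

    Hypothetical helper for illustration only. random_state is passed
    straight to DataFrame.sample, so an int or a
    numpy.random.RandomState gives a reproducible shuffle.
    """
    shuffled = df.sample(frac=1, random_state=random_state)
    n_test = int(test_fraction * len(shuffled))
    return shuffled.iloc[:n_test], shuffled.iloc[n_test:]


df = pd.DataFrame({"x": np.arange(10), "y": np.arange(10) ** 2})

# Two calls with the same seed produce the same test/train split.
test_a, train_a = split_test_train(df, 0.2, random_state=42)
test_b, train_b = split_test_train(df, 0.2, random_state=42)
```

With `random_state=None` the shuffle differs from call to call, which matches the current unseeded behavior of the quoted line.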

@utf utf added the ugrads label Jan 15, 2019
Contributor

ardunn commented Jan 15, 2019

sounds good to me!

Contributor

ardunn commented Jan 26, 2019

@utf the current benchmarking implementation has you pass a sklearn KFold (or StratifiedKFold) object to the benchmarking function. The KFold object accepts a random_state parameter, so this issue is essentially closed.
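For reference, a minimal sketch of a seeded KFold is below. It stops at building the splitter because the benchmark function's signature is not shown in this thread; the helper `fold_indices` and the toy array `X` are made up for illustration. Note that in scikit-learn `random_state` only has an effect when `shuffle=True`.

```python
import numpy as np
from sklearn.model_selection import KFold

# Toy data, assumed for illustration: 10 samples with 2 features each.
X = np.arange(20).reshape(10, 2)


def fold_indices(random_state):
    """Return the (train, test) index lists produced by a seeded KFold."""
    # shuffle=True is needed for random_state to influence the folds.
    kfold = KFold(n_splits=5, shuffle=True, random_state=random_state)
    return [(train.tolist(), test.tolist()) for train, test in kfold.split(X)]


# The same seed gives identical folds, so two models see the same splits.
folds_a = fold_indices(0)
folds_b = fold_indices(0)
```

The same seeded object (or a StratifiedKFold built the same way) could then be handed to the benchmarking function for each model being compared.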

@ardunn ardunn closed this as completed Jan 26, 2019
Member Author

utf commented Jan 26, 2019

Sounds great
