
Bootstrap fixes #250

Merged
merged 7 commits into from
Apr 3, 2020

Conversation

tcare
Contributor

@tcare tcare commented Apr 2, 2020

  • Project name is now additionally restricted to letters and underscores
  • Fix an issue where we try to load a nonexistent dataset from sklearn after bootstrapping
  • Added some general error handling
  • Standardized arguments to short and long forms & updated README
  • General code cleanup
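
As an illustration of the first item, here is a minimal sketch of a name check that accepts only letters and underscores. The function name `validate_project_name` and the exact pattern are assumptions made for this example, not the code merged in this PR, and the sketch covers only the character rule, not any other restrictions the bootstrap script may apply.

```python
import re

# Hypothetical pattern: one or more ASCII letters or underscores, nothing else.
_PROJECT_NAME_RE = re.compile(r"[A-Za-z_]+")


def validate_project_name(name: str) -> str:
    """Return the name unchanged if it is valid, otherwise raise ValueError."""
    if not _PROJECT_NAME_RE.fullmatch(name):
        raise ValueError(
            f"Invalid project name {name!r}: use only letters and underscores."
        )
    return name
```

Calling `validate_project_name("my_project")` returns the name, while `"my-project"` or `"proj1"` raises `ValueError`.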

@tcare tcare requested review from dtzar, eedorenko and sudivate April 2, 2020 21:25
@tcare tcare force-pushed the tcare/bootstrap-fixes branch from d8a4eee to 6c9cfa4 Compare April 2, 2020 21:32
Contributor

@eedorenko eedorenko left a comment


Added a comment

try:
    from sklearn.datasets import load_diabetes
except ImportError as e:
    print("Project has already been bootstrapped, you must provide your own data.")  # NOQA: E501
Contributor


I wouldn't word this message so strongly. We don't know the reason for the error for sure; we are only guessing. I would go with something like "Failed to load diabetes dataset, perhaps the project has already ..."

The thing is that the bootstrap will still rename load_diabetes to load_we_dont_call_his_name, which introduces buggy code (that we handle with this try-except). Perhaps it would make sense to move all of this load_diabetes dataset creation out to a separate module (imported in this file) and exclude that module/file from "files" in replace_project_name.
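
To make the exclusion idea concrete, here is a rough sketch of a rename pass that skips listed files. The helper `replace_token_in_files` and its parameters are hypothetical and exist only for this example; the real replace_project_name in the repository may have a different signature and behavior.

```python
from pathlib import Path


def replace_token_in_files(root, files, old, new, exclude=frozenset()):
    """Replace `old` with `new` in each listed file, skipping excluded ones.

    `files` and `exclude` hold paths relative to `root`.
    Returns the relative paths that were actually modified.
    """
    changed = []
    for rel in files:
        if rel in exclude:
            # e.g. the module that wraps the sklearn dataset import
            continue
        path = Path(root) / rel
        text = path.read_text()
        updated = text.replace(old, new)
        if updated != text:
            path.write_text(updated)
            changed.append(rel)
    return changed
```

With the dataset code isolated in its own module and that module listed in `exclude`, the sklearn import would keep its original name while the rest of the project is renamed.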

Contributor


Yes, separating dataset creation is the right way to go. A quick hotfix is to call replace_project_name on the training script to rename the specific import.

Contributor Author


I thought ImportError would be a narrow enough scope to avoid any weirdness. However, you made me realize that this would actually be the first time we encounter sklearn in the flow, so this could hide a dependency error. At the very least, we need to check that sklearn exists and the loading function doesn't.
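
One way to tell those two cases apart is sketched below, using only the standard library. The helper `diagnose_missing_loader` and its return strings are made up for illustration; the defaults assume the `sklearn.datasets` / `load_diabetes` names from the snippet under review.

```python
import importlib
import importlib.util


def diagnose_missing_loader(module_name="sklearn.datasets", attr="load_diabetes"):
    """Classify why `from <module_name> import <attr>` would fail.

    Returns "missing_dependency" if the top-level package is not installed,
    "missing_attribute" if the package imports but lacks `attr`
    (for example, because the name was rewritten during bootstrapping),
    and "ok" otherwise.
    """
    top_level = module_name.split(".")[0]
    if importlib.util.find_spec(top_level) is None:
        return "missing_dependency"
    module = importlib.import_module(module_name)
    return "ok" if hasattr(module, attr) else "missing_attribute"
```

Only the "missing_attribute" result would justify an "already bootstrapped" style message; "missing_dependency" points to an environment problem that should be reported as such.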

Re: moving it out: if we're going to do the non-hacky fix, I feel it should be generic enough that you don't need to rely on having a predefined dataset to export and can use a CSV instead.

@tcare
Contributor Author

tcare commented Apr 2, 2020

I decided to take the middle ground. CSV creation from the diabetes data has been factored out, but when the project bootstraps, the CSV loading will fail with a message that the user needs to provide a CSV. This way, we avoid a situation where we silently use diabetes data (happened to me :)) and still make it easy to bring a CSV to use.
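
A minimal sketch of that fail-loudly behavior, assuming a hypothetical `load_rows` helper and error wording that are not taken from the merged code:

```python
import csv
import os


def load_rows(csv_path):
    """Load training rows from a user-supplied CSV, failing loudly if absent."""
    if not os.path.isfile(csv_path):
        # No silent fallback to a bundled dataset: the user must supply data.
        raise FileNotFoundError(
            f"No CSV found at {csv_path!r}. Provide your own CSV to train on."
        )
    with open(csv_path, newline="") as f:
        return list(csv.DictReader(f))
```

Raising instead of falling back keeps a bootstrapped project from training on the sample data by accident.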

@eedorenko
Contributor

lgtm

@eedorenko eedorenko merged commit 3ed9a90 into master Apr 3, 2020
@dtzar dtzar deleted the tcare/bootstrap-fixes branch April 9, 2020 21:12