-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Bootstrap fixes #250
Bootstrap fixes #250
Conversation
tcare
commented
Apr 2, 2020
- Project name is now additionally restricted to letters and underscores
- Fix an issue where we try to load a non-existant dataset from sklearn after bootstrapping
- Added some general error handling
- Standardized arguments to short and long forms & updated README
- General code cleanup
d8a4eee
to
6c9cfa4
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a comment
try: | ||
from sklearn.datasets import load_diabetes | ||
except ImportError as e: | ||
print("Project has already been bootstrapped, you must provide your own data.") # NOQA: E501 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wouldn't be so strong in this message. We don't know the reason of the error for sure. We just guess. I would go with something like "Failed to load diabetes dataset, perhaps the project has already ..."
The thing is that it will still rename load_diabetes into load_we_dont_call_his_name which introduces a buggy code (that we handle with this try-except). Perhaps it would make sense to move out all this load_diabetes dataset creation to a separate module (imported in this file) and exclude that module/file from "files" in replace_project_name.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, separating dataset creation is the right way to go. A quick hotfix is to call replaceprojectname on the training script to rename the specific import.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought ImportError would be enough of a limited scope to avoid any weirdness. However you made me realize that this would actually be the first time that we encounter sklearn in the flow, so this could hide a dependency error. At the very least, we need to check that sklearn exists and the loading function doesn't.
Re: moving it out, if we're going to do the non-hacky fix, I feel that this should be generic enough that you don't need to rely on having a predefined dataset to export and rather can use a csv.
I decided to take the middle ground. CSV creation from the diabetes data has been factored out, but when the project bootstraps the CSV loading will fail with a message that they need to provide a CSV. This way, we avoid a situation where we silently use diabetes data (happened to me :)) and still make it easy to bring a CSV to use. |
lgtm |