Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

pascal_voc.py dataset added #6665

Closed
wants to merge 1 commit into from
Closed

Conversation

ahundt
Copy link
Contributor

@ahundt ahundt commented May 17, 2017

Some of this code is credited to @warmspringwinds https://github.com/warmspringwinds/tf-image-segmentation which is also MIT licensed. Originally at keras-team/keras-contrib#80

Depends on sacred for the command line utilities, I can remove this if desired but in that case I would appreciate a preferred tool recommendation.

To download and configure pascal voc + berkeley sbd for use with keras:

cd /path/to/keras/keras/datasets
python pascal_voc.py pascal_voc_setup

#6538 is the issue via which examples using this dataset would be contributed.

@ahundt ahundt changed the title pascal_voc.py added pascal_voc.py dataset added May 18, 2017
@fchollet
Copy link
Collaborator

This PR does not follow the code and docstring style used in Keras. But apart from that, I am not sure that we should be adding new datasets: Keras datasets are not being actively developed, and at this point they are just meant to get data for debugging, not for real work.

@ahundt
Copy link
Contributor Author

ahundt commented May 20, 2017

The first purpose of Pascal VOC is for examples to debug segmentation/dense prediction use cases, much like you mentioned. I'll detail a second purpose below. With formatting, docstrings, and perhaps cleanup could it become viable for merging?

Reproducibility may be another one of those best practices that could lower the barrier to entry in a valuable way. It could be as easy in Keras as creating models and loading default imagenet weights. Reproducibility is Sacred's goal and something like it might complement Keras examples and datasets well. That's why Sacred is imported in this PR.

Please consider Keras APIs for a limited number of key formats for which many datasets are available. This could be a step towards a lower barrier to using Keras and to reproducibility, with one or two default dataset examples directly included. I've identified several candidate formats, datasets and APIs:

@fchollet
Copy link
Collaborator

It would be ok to add Pascal VOC as a dataset. However, this PR should not go into core Keras, because:

  • it's a very significant amount of code -- amounts to a sizeable fraction of the total codebase
  • the majority of it is not in line with the code conventions of the rest of the codebase
  • the docstrings follow Sphinx conventions, it seems
  • it introduces a new dependency

Basically this PR is dropping a big chunk of code into the codebase, that simply doesn't belong.

This could be mergeable as a much simpler, concise PR that would follow Keras conventions --something similar to what we are doing with the other datasets.

@fchollet fchollet closed this May 23, 2017
@ahundt
Copy link
Contributor Author

ahundt commented May 23, 2017

ok sounds good, will refine over at keras-contrib, thanks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants