
Passed the scheduling argument through the *_generator function. #7236

Merged
merged 13 commits into from
Jul 19, 2017

Conversation

@joeyearsley (Contributor) commented Jul 4, 2017

Previously, the scheduling parameter was not passed to the OrderedEnqueuer, which was needed in training.
Since the scheduling parameter is currently a binary decision, I've also altered the value to a boolean and changed the kwarg to be more descriptive.

It also seems we may need a shuffle function inside the `Sequence` call. Currently only the input is shuffled; while this ensures some randomness, it could be improved via a function call.

Also, @fchollet, do you remember why you passed different seeds in the generators? This is also missing from the current code; however, it is trivial to implement.
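To make the "shuffle function inside the Sequence" idea concrete, here is a minimal sketch (not the actual Keras class; all names are hypothetical) of a `Sequence`-style dataset whose sample indices are reshuffled at the end of each epoch via an `on_epoch_end()` hook, independently of any batch-order shuffling done by the enqueuer:

```python
import numpy as np

# Hypothetical sketch of a Sequence-style dataset: on_epoch_end()
# reshuffles the sample indices, so the contents of each batch change
# between epochs in addition to any batch-order shuffling.
class ShufflingSequence:
    def __init__(self, x, y, batch_size, seed=0):
        self.x, self.y = np.asarray(x), np.asarray(y)
        self.batch_size = batch_size
        self.rng = np.random.RandomState(seed)
        self.indices = np.arange(len(self.x))

    def __len__(self):
        # Number of batches per epoch.
        return int(np.ceil(len(self.x) / self.batch_size))

    def __getitem__(self, idx):
        # Slice the (possibly shuffled) index array, not the raw data.
        batch = self.indices[idx * self.batch_size:(idx + 1) * self.batch_size]
        return self.x[batch], self.y[batch]

    def on_epoch_end(self):
        # Shuffle sample indices so the samples *inside* each batch differ
        # next epoch.
        self.rng.shuffle(self.indices)
```

The key point is that `__getitem__` goes through the index array, so a single shuffle call in `on_epoch_end` changes batch composition without touching the data itself.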

Previously, the scheduling parameter was not passed to the OrderedEnqueuer, which was needed in training.
Since the scheduling parameter is currently a binary decision, I've also altered the value to a boolean.
@@ -1932,7 +1940,8 @@ def evaluate_generator(self, generator, steps,
         try:
             if is_sequence:
                 enqueuer = OrderedEnqueuer(generator,
-                                           use_multiprocessing=use_multiprocessing)
+                                           use_multiprocessing=use_multiprocessing,
+                                           ordered=ordered)
Reviewer (Contributor):

Does it make sense that `evaluate_generator(..., ordered=True)` == `evaluate_generator(..., ordered=False)`?

joeyearsley (Author):

Same as in predict, to keep the kwargs consistent.

@@ -2015,6 +2025,8 @@ def predict_generator(self, generator, steps,
                 non picklable arguments to the generator
                 as they can't be passed
                 easily to children processes.
+            ordered: Sequential querying of data if `True`,
+                random otherwise.
Reviewer (Contributor):

What's the point of shuffling for predict?

joeyearsley (Author):

That was mostly to keep kwargs consistent.

         self.sequence = sequence
         self.use_multiprocessing = use_multiprocessing
-        self.scheduling = scheduling
+        self.ordered = ordered
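For illustration, here is a hypothetical sketch (not the actual Keras implementation; names are assumptions) of how an enqueuer-style worker loop might consume such an `ordered` flag when deciding which batch indices to query each epoch:

```python
import random

def batch_order(num_batches, ordered=True, rng=None):
    """Return the order in which batch indices are queried for one epoch.

    ordered=True  -> sequential 0..n-1 (deterministic evaluation/prediction)
    ordered=False -> a fresh random permutation (useful during training)
    """
    order = list(range(num_batches))
    if not ordered:
        # rng lets callers seed the permutation for reproducibility.
        (rng or random).shuffle(order)
    return order
```

Either way the epoch still visits every batch exactly once; only the traversal order differs.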
@Dref360 (Contributor) commented Jul 5, 2017:

Should we test that ordered works? Probably.

joeyearsley (Author):

What do you mean test it worked?

Reviewer (Contributor):

Just a test that would validate that if ordered=False, there is some shuffling. Nothing fancy, but it could prevent some bugs in the future.
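A sketch of the kind of unit test suggested here (self-contained and hypothetical, with a stand-in `batch_order` helper rather than the real enqueuer):

```python
import random

def batch_order(n, shuffle, rng):
    # Stand-in for the enqueuer's index-ordering logic.
    order = list(range(n))
    if shuffle:
        rng.shuffle(order)
    return order

def test_shuffle_changes_order():
    rng = random.Random(123)
    orders = [batch_order(50, shuffle=True, rng=rng) for _ in range(5)]
    # Each epoch must still visit every batch exactly once...
    assert all(sorted(o) == list(range(50)) for o in orders)
    # ...but at least one epoch should deviate from sequential order
    # (with 50 batches, 5 fully sequential draws is astronomically unlikely).
    assert any(o != list(range(50)) for o in orders)
```

The permutation check guards against dropped or duplicated batches, while the inequality check catches a shuffle flag that is silently ignored.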

joeyearsley (Author):

Sure, I'll add it in, good spot.

@@ -1680,6 +1681,8 @@ def fit_generator(self, generator,
                 non picklable arguments to the generator
                 as they can't be passed
                 easily to children processes.
+            ordered: Sequential querying of data if `True`,
+                random otherwise.
Reviewer (Contributor):

Only for `Sequence`; no effect for generators.

@fchollet (Collaborator):

We already have a `shuffle` argument in `fit`. Is this the same thing?

@Dref360 (Contributor) commented Jul 10, 2017:

Pretty much, but for `Sequence`: every epoch, the order of the batches will be shuffled.

@fchollet (Collaborator):

1 - If this only affects `Sequence`, shouldn't it be part of its API, and not part of the `*_generator` API?
2 - The API should be consistent across the codebase, so we should use `shuffle`, not `ordered`.

@Dref360 (Contributor) commented Jul 10, 2017:

So a property on `Sequence` that the Enqueuer can poke?

My only concern is that it is the Enqueuer's responsibility to know how to iterate over the `Sequence`.

@joeyearsley (Author):

1 - It had to pass through the `*_generator` methods, since the OrderedEnqueuer isn't accessible elsewhere, unless we make users pass the enqueuer into these methods. The plus is that it would be even more flexible; the downside is more work for the user.

2 - I'll alter it when I'm back from vacation.

@joeyearsley (Author):

@fchollet It keeps timing out on this test: tests/keras/wrappers/scikit_learn_test.py::test_regression_class_build_fn

However, I haven't touched anything related to it. How can I fix it?

@joeyearsley joeyearsley reopened this Jul 18, 2017
@joeyearsley (Author):

@Dref360 Thanks for the suggestion!

@@ -1680,6 +1681,9 @@ def fit_generator(self, generator,
                 non picklable arguments to the generator
                 as they can't be passed
                 easily to children processes.
+            shuffle: whether to shuffle the data at the beginning of each
+                epoch. Only used with instances of Sequence (
fchollet (Collaborator):

Use code markers around code keywords (`)

@@ -351,6 +351,12 @@ def __len__(self):
         """
         raise NotImplementedError

+    @abstractmethod
+    def on_epoch_end(self):
+        """A function which is called at the end of the epoch.
fchollet (Collaborator):

"Method called at the end of every epoch."

@@ -351,6 +351,12 @@ def __len__(self):
         """
         raise NotImplementedError

+    @abstractmethod
+    def on_epoch_end(self):
fchollet (Collaborator):

The addition of this method appears unrelated to the shuffle arg?

joeyearsley (Author):

It is and it isn't.
The shuffle arg relates to the shuffling of the batch indices in the sequence; however, you may also wish to implement file-path shuffling in your `Sequence` subclass. Placing that shuffle operation in the `on_epoch_end` method allows this.
Path shuffling would ensure that the files inside the batches differ between epochs, as well as the batch ordering.
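The file-path shuffling described here could look roughly like this hypothetical sketch (not Keras code; class and file names are invented for illustration):

```python
import random

# Hypothetical Sequence-style loader over file paths: on_epoch_end()
# reshuffles the paths themselves, so batch composition changes every
# epoch in addition to any batch-order shuffling done by the enqueuer.
class FilePathSequence:
    def __init__(self, paths, batch_size, seed=0):
        self.paths = list(paths)
        self.batch_size = batch_size
        self.rng = random.Random(seed)

    def __len__(self):
        # Number of batches per epoch (last batch may be smaller).
        return (len(self.paths) + self.batch_size - 1) // self.batch_size

    def __getitem__(self, idx):
        # A real implementation would load and decode these files here.
        return self.paths[idx * self.batch_size:(idx + 1) * self.batch_size]

    def on_epoch_end(self):
        # Reshuffle the path list, changing which files share a batch.
        self.rng.shuffle(self.paths)
```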

@fchollet fchollet merged commit 78f26df into keras-team:master Jul 19, 2017
ahundt added a commit to ahundt/keras that referenced this pull request Jul 26, 2017
* commit '84ceb94055b831c486dbf4955fdf1ba0f63320d1': (42 commits)
  Fix conv reccurent test
  Style fix in conv recurrent tests.
  Support return_state parameter in ConvRecurrent2D (keras-team#7407)
  Small simplification in ResNet50 architecture
  Update FAQ with info about custom object loading.
  add example for passing in custom objects in load_model (keras-team#7420)
  Update applications.md (keras-team#7428)
  Cast constants in optimizer as floatx.
  Fix stop_gradient inconsistent API (keras-team#7416)
  Simplify static shape management in TF backend.
  Fixed warning showing up when channel axis is 1 (keras-team#7392)
  Throw exception in LSTM layer if timesteps=1 and unroll=True (keras-team#7387)
  Style fix
  Passed the scheduling argument through the `*_generator` function. (keras-team#7236)
  Fix typos. (keras-team#7374)
  Fix ImageDataGenerator.standardize to support batches (keras-team#7360)
  Fix learning phase info being left out in multi-input models (keras-team#7135)
  Fix PEP8
  Fix deserialization bug with layer sharing at heterogenous depths
  Bug fix: Support multiple outputs in Lambda layer (keras-team#7222)
  ...