
Sequential.fit_iter() for large datasets #575

Closed · 6 commits

Conversation

loisaidasam

Sequential.fit_iter() method for datasets that are too large to fit into memory

https://groups.google.com/d/msg/keras-users/wv6tw0-QLPw/0H2tovUhDAAJ

@fchollet
Collaborator

Won't merge because this is trivial to do in vanilla Keras:

for X, y in datastream():
    model.train_on_batch(X, y)

From keras.io:

Keep a pragmatic mindset and avoid bloat. Only add to the source if that is the only path forward.
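For illustration, the `datastream()` in the loop above can be any generator that yields `(X, y)` batches lazily, which is what keeps the full dataset out of memory. A minimal sketch (hypothetical, not from the thread; it assumes the data lives in a `.npy` file with labels in the last column):

```python
import numpy as np

def datastream(path, batch_size=32):
    """Yield (X, y) batches lazily from a .npy file on disk.

    mmap_mode="r" memory-maps the array instead of loading it, so each
    slice is materialized only when its batch is consumed. Illustrative
    assumption: the last column holds the labels.
    """
    data = np.load(path, mmap_mode="r")
    for start in range(0, data.shape[0], batch_size):
        batch = np.asarray(data[start:start + batch_size])
        yield batch[:, :-1], batch[:, -1]
```

Each batch from such a generator can be fed straight to `model.train_on_batch(X, y)` as in the loop above.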

@fchollet fchollet closed this Aug 21, 2015
@loisaidasam
Author

👍 Thanks for the insight, @fchollet. It wasn't immediately obvious to me that you could do this.

/cc @amitbeka

@amitbeka
Contributor

My problem with this small loop is that you lose all the other "goodies" that fit() gives you, like handling of sample_weight and class_weight, callbacks, verbosity, etc. You have to implement them around this loop yourself, which is exactly what fit_iter() does: it wraps the loop with those options.

I really understand @fchollet's reluctance to add more APIs to the system, since each one makes the package a little more complex. In addition, there are more features that would be natural to add to this function (like shuffling), so it would only grow more complicated. Still, I think people will end up writing almost-identical code for this case, so we could all benefit from sharing it.
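To make the trade-off concrete, here is a rough sketch of the kind of wrapper @amitbeka describes: the bare `train_on_batch` loop plus a few of fit()'s conveniences. All names and the signature are illustrative, not the PR's actual implementation or the Keras API; `model` only needs a `train_on_batch(X, y, class_weight=...)` method:

```python
def fit_iter(model, datastream, epochs=1, class_weight=None,
             callbacks=None, verbose=False):
    """Train `model` on batches from `datastream`, a callable returning
    an iterable of (X, y) pairs (called fresh each epoch).

    Hypothetical sketch of the fit_iter() idea: the plain
    train_on_batch loop, wrapped with class_weight, callbacks,
    and verbosity handling that fit() would otherwise provide.
    """
    callbacks = callbacks or []
    history = []
    for epoch in range(epochs):
        for batch_idx, (X, y) in enumerate(datastream()):
            loss = model.train_on_batch(X, y, class_weight=class_weight)
            history.append(loss)
            # Give each callback a chance to react to the batch result.
            for cb in callbacks:
                cb(epoch, batch_idx, loss)
            if verbose:
                print("epoch %d batch %d: loss=%s" % (epoch, batch_idx, loss))
    return history
```

Everything here is a thin layer over the two-line loop @fchollet showed, which is exactly the "almost-identical code" point: the loop is trivial, but each user re-implements this same scaffolding around it.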
