
implement CTC with keras? #383

Closed
blackyang opened this issue Jul 12, 2015 · 38 comments


@blackyang
Contributor

Hi there,

Has anyone implemented a Connectionist Temporal Classification (CTC) loss with Keras?

I attempted to add such a cost function to objectives.py, based on rakeshvar's code. The model compiles, but model.fit() raises several errors. I am new to Theano, so it's really tough for me to debug...

It shouldn't be hard in theory, so I guess I made some "naive" mistakes...

@fchollet
Member

Do you have a reference for what you are trying to implement, as well as your attempt so far?

@blackyang
Contributor Author

Hi @fchollet , the original paper on CTC by Alex Graves can be found here.

Basically, CTC is a loss function designed to handle alignment. For example, in speech recognition, if the input sequence has length t (so the output of the RNN also has length t), the target sequence usually has a length w smaller than t. CTC removes the need to pre-segment the inputs and post-segment the network outputs.

I was trying to add a new cost function to objectives.py, based on this ctc.py file. The model compiles, but model.fit() raises several errors. I guess the reason lies in these lines, which imply that the two arguments to the cost function must share the same shape. Correct me if I've misunderstood anything.
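To make that same-shape constraint concrete, here is a minimal plain-NumPy sketch (all names illustrative, not actual Keras code) of how a standard objective is called, versus what CTC would need:

```python
import numpy as np

# A Keras objective is called with exactly two tensors, y_true and
# y_pred, which must line up element-wise (hence the same-shape
# assumption in models.py):
def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    return -np.sum(y_true * np.log(np.clip(y_pred, eps, 1.0)), axis=-1)

y_pred = np.array([[0.7, 0.2, 0.1]])   # one sample, 3 classes
y_true = np.array([[1.0, 0.0, 0.0]])
print(categorical_crossentropy(y_true, y_pred))   # [-log(0.7)]

# CTC breaks this contract: y_true would be a label sequence of length
# w, y_pred a (t, num_symbols) probability matrix with t > w, and the
# loss additionally needs both lengths to sum over all alignments --
# four inputs, not two, with no element-wise correspondence.
```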

@futurely

@amaas implemented the CTC loss strictly faithful to the original paper in a very straightforward way.

@blackyang
Contributor Author

@futurely thanks! Currently I am using this with lasagne :-)

@amaas

amaas commented Jul 23, 2015

It should be relatively straightforward to port our CTC implementation into the Keras framework. Note that our fast version is Cython (which doesn't seem to be used elsewhere in Keras). Without Cython, the loops that compute the alignments required to evaluate the CTC loss were painfully slow.
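For readers wondering what those alignment loops look like, here is a small NumPy sketch of the CTC forward (alpha) recursion from the Graves et al. paper; it is the O(T·S) dynamic program that a Cython port speeds up (toy code, not amaas's actual implementation):

```python
import numpy as np

def ctc_forward_loss(probs, labels, blank=0):
    """Negative log-likelihood of `labels` under `probs` via the CTC
    forward (alpha) recursion.

    probs  : (T, num_symbols) per-frame softmax outputs
    labels : target label sequence, without blanks
    """
    T = probs.shape[0]
    # Interleave blanks: l' = [blank, l1, blank, l2, ..., blank]
    ext = [blank]
    for l in labels:
        ext += [l, blank]
    S = len(ext)

    alpha = np.zeros((T, S))
    alpha[0, 0] = probs[0, ext[0]]
    if S > 1:
        alpha[0, 1] = probs[0, ext[1]]
    for t in range(1, T):
        for s in range(S):
            a = alpha[t - 1, s]
            if s > 0:
                a += alpha[t - 1, s - 1]
            # The skip transition is allowed unless the current symbol
            # is blank or equals the symbol two positions back.
            if s > 1 and ext[s] != blank and ext[s] != ext[s - 2]:
                a += alpha[t - 1, s - 2]
            alpha[t, s] = a * probs[t, ext[s]]
    # Valid paths end on the last label or the final blank.
    tail = alpha[T - 1, S - 1] + (alpha[T - 1, S - 2] if S > 1 else 0.0)
    return -np.log(tail)
```

For numerical stability on real data this is done in log space (or with per-step normalization), but the plain-probability version above shows the structure of the loops.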

@ghost

ghost commented Aug 19, 2015

@amaas : do you have a Theano implementation? Or can your fast version work with Theano?

@amaas

amaas commented Aug 25, 2015

@Jedi00 No, we wrote our RNNs from scratch without Theano. If you want to replace the NN architecture, though, you could take just our CTC loss and make it a Theano function. It only needs to interact with the final layer, so it should be mostly unchanged in a Theano implementation.

@jinserk

jinserk commented Sep 22, 2015

Hi @blackyang, did you port Lasagne's CTC into Keras? If so, could you tell me how?
Keras' loss objects are all functions defined in objectives.py, and they seem to be called from the compile() function in models.py. Each is wrapped by weighted_objective(), which calls the loss function with only two parameters, y_true and y_pred. However, Lasagne's CTC is a class, and its apply() function seems to require 4 parameters. I'm stuck here.
Thank you.
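One workaround (a sketch, not tested against Lasagne itself) is to close over the two extra tensors when the model is built, so Keras only ever sees a two-parameter function:

```python
# Sketch: adapt a four-argument CTC cost to Keras' two-argument
# (y_true, y_pred) objective signature by capturing the extra tensors
# in a closure at model-construction time. `ctc_cost` is a stand-in
# for something like Lasagne's CTC apply(); all names are illustrative.
def make_keras_ctc_objective(ctc_cost, input_length, label_length):
    def objective(y_true, y_pred):
        # Keras calls this with just two tensors; the lengths ride along.
        return ctc_cost(y_true, y_pred, input_length, label_length)
    return objective
```

The catch is that the captured lengths must be symbolic variables fixed when the graph is compiled, not per-batch numpy arrays, which is why this only goes so far with Keras' fit().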

@blackyang
Contributor Author

Hi @jinserk , I was stuck at the same place, so I used Lasagne, which I think is more extensible. By the way, I recommend amaas's implementation over Lasagne's CTC, since the latter is somewhat problematic.

@Michlong

I tried it too; unfortunately, I failed...

@futurely

The following paper trained a convolutional bidirectional LSTM network to recognize natural scene text without text-line segmentation. The open-source code implements CTC in C++ for the Torch7 framework in Lua. The C++ code can be adapted for use from Python.

[1] B. Shi, X. Bai, C. Yao. An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition. CoRR abs/1507.05717, 2015.

@ekelsen

ekelsen commented Jan 14, 2016

Baidu just released their open source CPU and GPU implementation of CTC here:
https://github.com/baidu-research/warp-ctc

It is released as a C library with bindings for Torch. The C library should be easy to integrate into many different projects.

@blackyang
Contributor Author

@ekelsen thanks for the pointer!

@ZhangAustin

Here is an implementation of Theano bindings for Baidu's warp-ctc: https://github.com/sherjilozair/ctc

Is there any plan for Keras to bind this?

@shantanudev

Have you guys had any luck with this implementation?

@ghost

ghost commented May 13, 2016

I maintain a repository of CTC with various implementations, including Cython, Numba/Python and Theano versions; check here: https://github.com/daweileng/Precise-CTC. You can use the CTC_precise or CTC_for_train class; both are fine for RNN training.

The CTC objective is different from the current objective functions in Keras and requires a different masking mechanism. I also maintain a fork of Keras with CTC incorporated; check here: https://github.com/daweileng/keras_MOD. Currently, only train_on_batch() has been modified to be compatible with CTC. This is enough for me, so there's no definite plan to modify other parts of Keras.

@shantanudev

Oh this is perfect and exactly what I am interested in. Thank you!

@nouiz
Contributor

nouiz commented May 13, 2016

Just to let you know, there is this discussion about wrappers for the Baidu version, which could be faster:

Theano/Theano#3871 (comment)

There are two current wrapper versions:

https://github.com/mcf06/theano_ctc

and

https://github.com/sherjilozair/ctc


@lingz

lingz commented Jun 1, 2016

@daweileng Do you have any instructions/examples as to how to use your Keras MOD?

@ghost

ghost commented Jun 2, 2016

Under the repository https://github.com/daweileng/Precise-CTC, there is a folder named 'Test'; you can find a demo script 'mnist_ctc_v4.py' there.

@vkatsouros

vkatsouros commented Jun 22, 2016

@daweileng In mnist_ctc_v4.py you import from NN_auxiliary and from mytheano_utils. Can you share these too? Maybe in Keras MOD?

@ghost

ghost commented Jul 6, 2016

For those interested: I updated my CTC-integrated Keras fork to base version 1.0.4; check here: https://github.com/daweileng/keras_MOD/tree/MOD_1.0.4. So far, the following train/test functions work well with the CTC cost:

  • train_on_batch()
  • test_on_batch()
  • predict_on_batch()

@githubnemo
Contributor

@daweileng Sadly you did not fork the Keras repository; instead you copied the files over and added everything (including your patches) in one commit. Could you do it properly (i.e., press the fork button on GitHub, clone, add your changes, commit separately, push) so your patches actually become visible? That'd be awesome.

@ghost

ghost commented Jul 7, 2016

@githubnemo As explained in the README, the reason I didn't make a pull request is that, to avoid a massive modification of Keras' masking mechanism, I currently override the sample_weights and masks variables of Keras. In theory this should not cause problems for other networks, but I'm not 100% sure. Besides, the modification of the fit() function is not done yet. I'd like to collect enough feedback before making an official pull request to the Keras master branch.

If you just want to know what has changed, you can compare the contents of the two repositories.

Progress: FCN now works with LSTM + CTC!

@pasky
Contributor

pasky commented Aug 12, 2016

See also #3436

@patyork
Contributor

patyork commented Jan 13, 2017

@harikrishnavydana The OCR example runs fine for me on both Theano and TensorFlow.

If it is not working for you, please review the issue guidelines (update Keras), and if the issue persists, open a new issue.

@HariKrishna-Vydana

Thank you, I was using an older version of Keras @patyork

@besanson

Hi, thanks @patyork . Just to understand: you are rendering text into images, and using some of them to train and others to validate? But you are training on full words, not characters. Pycairo is a complicated library to install :)

@anuj-rathore

anuj-rathore commented Aug 17, 2017

I am trying to use Keras' CTC in a bidirectional LSTM, i.e. https://github.com/lvapeab/ABiViRNet
The network is:
https://pastebin.com/9QXbJSwE

Loss functions in Keras take 2 arguments, but ctc_batch_cost takes 4. Can somebody tell me how to handle this?
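The usual answer (essentially the pattern from Keras' image_ocr example) is to compute the CTC loss inside the graph with a Lambda layer and compile with a dummy loss that just passes it through. A minimal sketch, assuming a Theano/TensorFlow-era Keras where K.ctc_batch_cost exists; the toy sizes and the TimeDistributed Dense stand in for your Bi-LSTM:

```python
import numpy as np
from keras import backend as K
from keras.layers import Dense, Input, Lambda, TimeDistributed
from keras.models import Model

T, feat_dim, num_classes, max_label_len = 8, 16, 5, 3  # toy sizes

# Network body; a Bi-LSTM would go here, a Dense suffices for the sketch.
input_data = Input(name='the_input', shape=(T, feat_dim))
y_pred = TimeDistributed(Dense(num_classes, activation='softmax'))(input_data)

# Extra inputs carrying what ctc_batch_cost needs beyond (y_true, y_pred).
labels = Input(name='the_labels', shape=(max_label_len,))
input_length = Input(name='input_length', shape=(1,), dtype='int64')
label_length = Input(name='label_length', shape=(1,), dtype='int64')

def ctc_lambda_func(args):
    y_pred, labels, input_length, label_length = args
    return K.ctc_batch_cost(labels, y_pred, input_length, label_length)

# The loss is computed *inside* the graph...
loss_out = Lambda(ctc_lambda_func, output_shape=(1,), name='ctc')(
    [y_pred, labels, input_length, label_length])
model = Model(inputs=[input_data, labels, input_length, label_length],
              outputs=loss_out)
# ...so the compile-time "loss" just forwards the Lambda output.
model.compile(loss={'ctc': lambda y_true, y_pred: y_pred}, optimizer='sgd')
```

At training time you feed the labels and the two lengths as inputs and a dummy all-zero array as the target; with the TensorFlow backend, the blank index is num_classes - 1, so real labels must stay below that.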

@selcouthlyBlue

selcouthlyBlue commented Jan 16, 2018

Apparently, there is a ctc_loss implementation in Keras. There's an open issue on Keras' ctc_batch_cost in the tensorflow_backend.

@hypernote

Hello... Do we already have a CTC example in the Keras repository?

@selcouthlyBlue

You mean this one? If so, yeah, I know there is already an example. It's just that when I search for "Keras CTC" on Google, this issue comes up, and I thought it would be nice to let people know that such an implementation already exists.

@hypernote

Great

@rasto2211

Is it OK to use ctc_batch_cost as a Keras loss function and pass it to model.compile? All the losses implemented in
https://github.com/keras-team/keras/blob/master/keras/losses.py
take only one sample at a time. Is that efficient?

Is there any plan to integrate warp-ctc into Keras?

@saisumanth007

Could you please explain what input_length and label_length specify? From the documentation, it seems label_length contains the lengths of the ground-truth strings (in the OCR case), but I'm not sure what input_length means.

@aayushee

aayushee commented Feb 5, 2018

I think input_length refers to your input sequence length and label_length refers to the ground-truth label length.
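To make that concrete, here is a toy call to K.ctc_batch_cost (assuming a TensorFlow/Theano-era Keras backend) showing where each length goes; all values are illustrative:

```python
import numpy as np
from keras import backend as K

# Toy batch: 2 samples, the RNN emits T=10 frames over 28 symbol classes.
batch, T, num_classes = 2, 10, 28
y_pred = np.random.rand(batch, T, num_classes).astype('float32')
y_pred /= y_pred.sum(axis=-1, keepdims=True)   # per-frame softmax

# Labels padded to a common width; padding beyond label_length is ignored.
labels = np.array([[3., 7., 9., 0.],           # true length 3
                   [5., 2., 0., 0.]])          # true length 2
input_length = np.array([[10], [8]])  # valid RNN output frames per sample
label_length = np.array([[3], [2]])   # valid label symbols per sample

loss = K.eval(K.ctc_batch_cost(
    K.variable(labels), K.variable(y_pred),
    K.variable(input_length), K.variable(label_length)))
print(loss.shape)   # one CTC loss value per sample
```

So input_length is the number of valid (unpadded) time steps in y_pred for each sample, and label_length is the number of valid symbols in each ground-truth sequence.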
